I like that idea, Alan. The trick is that for QueryBuilder's newSynonymQuery to be useful in that context, you need to pass terms with metadata down to the subclass. This is what I started working on a few weeks ago:
https://github.com/o19s/lucene-solr/commit/0fc3930671ef002cfbb5e3d52b6f8edc3715bf14

I don't think it's as simple as overriding analyzeBoolean/analyzeMultiBoolean, as Rob suggests: there's also analyzeGraphBoolean, and that would need to collect this metadata too. I wouldn't want to copy-paste all of that code into a subclass just to add one token attribute.

-Doug

On Wed, Nov 28, 2018 at 12:25 PM Alan Woodward <[email protected]> wrote:

> I think we can expose this information now with a small tweak to the
> SynonymGraphFilter, using the already-existing TypeAttribute.
>
> SGF is hard-coded to set the type attribute to “SYNONYM” on all tokens
> that it inserts into the stream. It should be simple to add another
> constructor parameter allowing users to change this; then you can chain
> synonym filters, one for each type of expansion you want: synonym, hyponym,
> hypernym, whatever, each setting the type attribute differently.
>
>
> On 28 Nov 2018, at 15:59, Michael Gibney <[email protected]>
> wrote:
> >
> > I think the objection to "boosting" in token filters isn't because it
> > is "too much", but rather because it breaks the abstraction of the
> > analysis chain to directly target scoring (as implied by
> > characterizing as "boosting").
> >
> > That said, I'm sympathetic to an approach that would establish an
> > Attribute to expose the kind of information that would be useful in
> > the context of synonyms (or other sorts of derived tokens discussed
> > here, where it could be useful to express information about token
> > derivation). Such an Attribute would not be directly related to
> > scoring/boosting, but would be related to analysis per se, (e.g.,
> > source token text, thesaurus, degree of confidence, etc.); support
> > could be selectively implemented by TokenFilters, and optionally
> > leveraged by query builders (e.g., translated to boosts) or even
> > recorded to index Payloads by a final custom analysis component ....
> >
> > "You can look at any attribute on the tokenstream you want", "rely on
> > abstract attributes (type, ...) then it should be easy to sub-class
> > the query builder to access them". Obviously that works iff analysis
> > components record the relevant information in attributes on the
> > tokenstream, which I think they currently don't (for much of the
> > information that has been discussed here) ... and I know of no
> > standard way to express the relevant information on the tokenstream.
> >
> > I can see that such an Attribute would be out of place (too
> > specialized) in the context of the Attributes in lucene/core; but
> > there are lots of more specialized Attributes in the various
> > submodules under lucene/analysis/* (SynonymGraphFilter lives in
> > analysis-common, FWIW). Again, this doesn't strike me as terribly
> > specialized, if one thinks of it more generally as a
> > "derivation/relationship" Attribute.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >

-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug
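For illustration, here is a toy Java model of how the pieces discussed in this thread could fit together. It has synonym filters whose emitted type is configurable (Alan's proposal), chained for different kinds of expansion, and a query-building step that receives each term together with its type and turns that into a weight (the "terms with metadata" newSynonymQuery would need). It is a sketch under loud assumptions and deliberately avoids Lucene itself. TypedToken, TypedSynonymFilter, WeightedTerm and TypeAwareQueryBuilder are hypothetical names, not Lucene classes. Tokens are a flat list with explicit positions instead of a TokenStream, and the type-to-boost values are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a token plus its position and type (toy model, not a Lucene class).
// Original tokens carry "word", the default type string of Lucene's TypeAttribute.
final class TypedToken {
    final String term;
    final String type;
    final int position;

    TypedToken(String term, String type, int position) {
        this.term = term;
        this.type = type;
        this.position = position;
    }
}

// Hypothetical model of a synonym filter whose emitted type is a constructor
// parameter rather than a hard-coded "SYNONYM".
final class TypedSynonymFilter {
    private final Map<String, List<String>> expansions;
    private final String emittedType;

    TypedSynonymFilter(Map<String, List<String>> expansions, String emittedType) {
        this.expansions = expansions;
        this.emittedType = emittedType;
    }

    // Passes every token through and stacks each expansion at the source token's position.
    List<TypedToken> apply(List<TypedToken> input) {
        List<TypedToken> out = new ArrayList<>();
        for (TypedToken tok : input) {
            out.add(tok);
            // Assumption of this model: only original tokens are expanded,
            // so chained filters do not expand each other's output.
            if (!"word".equals(tok.type)) {
                continue;
            }
            for (String expansion : expansions.getOrDefault(tok.term, List.of())) {
                out.add(new TypedToken(expansion, emittedType, tok.position));
            }
        }
        return out;
    }
}

// Hypothetical term/weight pair: the metadata a synonym-query factory would need to receive.
final class WeightedTerm {
    final String term;
    final float boost;

    WeightedTerm(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }

    @Override
    public String toString() {
        return term + "^" + boost;
    }
}

// Hypothetical query-building step: reads each token's type and maps it to a boost.
final class TypeAwareQueryBuilder {
    private final Map<String, Float> boostByType;

    TypeAwareQueryBuilder(Map<String, Float> boostByType) {
        this.boostByType = boostByType;
    }

    // Groups tokens by position; each group is what one synonym query would be built from.
    List<List<WeightedTerm>> build(List<TypedToken> tokens) {
        Map<Integer, List<WeightedTerm>> byPosition = new TreeMap<>();
        for (TypedToken tok : tokens) {
            float boost = boostByType.getOrDefault(tok.type, 1.0f); // unknown types get 1.0
            byPosition.computeIfAbsent(tok.position, p -> new ArrayList<>())
                      .add(new WeightedTerm(tok.term, boost));
        }
        return new ArrayList<>(byPosition.values());
    }
}
```

Chaining a filter built with "SYNONYM" (big to large) and one built with "HYPERNYM" (dog to animal) over the tokens big, dog, with SYNONYM boosted at 0.9 and HYPERNYM at 0.5, yields [[big^1.0, large^0.9], [dog^1.0, animal^0.5]]. In real Lucene code the type would travel on the TypeAttribute, and the per-position weights would have to be carried through analyzeBoolean, analyzeMultiBoolean and analyzeGraphBoolean, which is the copy-paste concern raised above.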
