I think the objection to "boosting" in token filters isn't that it is "too much", but rather that directly targeting scoring (as the term "boosting" implies) breaks the abstraction of the analysis chain.
That said, I'm sympathetic to an approach that would establish an Attribute to expose the kind of information that would be useful in the context of synonyms (or the other sorts of derived tokens discussed here, where it could be useful to express information about how a token was derived). Such an Attribute would not be directly related to scoring/boosting, but to analysis per se (e.g., source token text, thesaurus, degree of confidence, etc.). Support could be implemented selectively by TokenFilters, and the information could optionally be leveraged by query builders (e.g., translated to boosts) or even recorded to index Payloads by a final custom analysis component.

Regarding "You can look at any attribute on the tokenstream you want" and "rely on abstract attributes (type, ...) then it should be easy to sub-class the query builder to access them": obviously that works iff analysis components record the relevant information in attributes on the tokenstream, which I think they currently don't (for much of the information that has been discussed here), and I know of no standard way to express that information on the tokenstream.

I can see that such an Attribute would be out of place (too specialized) among the Attributes in lucene/core; but there are lots of more specialized Attributes in the various submodules under lucene/analysis/* (SynonymGraphFilter lives in analysis-common, FWIW). Again, this doesn't strike me as terribly specialized if one thinks of it more generally as a "derivation/relationship" Attribute.
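To make the proposed separation concrete, here is a minimal plain-Java sketch. It deliberately does not use Lucene's real API: in Lucene proper this would presumably be an interface extending org.apache.lucene.util.Attribute with a matching AttributeImpl. The names Derivation, Token, expandSynonyms and boostFor are all hypothetical, chosen for illustration only. The synonym step records facts about derivation (source text, thesaurus, confidence); only the query-builder side decides whether those facts become boosts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch in plain Java; NOT Lucene's Attribute/AttributeImpl API.
public class DerivationSketch {

    // Hypothetical "derivation/relationship" info attached to a derived token.
    public record Derivation(String sourceText, String thesaurus, float confidence) {}

    // Simplified stand-in for a token; derivation is null for non-derived tokens.
    public record Token(String text, String type, Derivation derivation) {}

    // Hypothetical synonym step: emits the original token plus its synonyms,
    // recording where each synonym came from. No scoring decisions here.
    public static List<Token> expandSynonyms(String text, String thesaurus,
                                             Map<String, List<String>> synonyms,
                                             float confidence) {
        List<Token> out = new ArrayList<>();
        out.add(new Token(text, "word", null));
        for (String syn : synonyms.getOrDefault(text, List.of())) {
            out.add(new Token(syn, "SYNONYM", new Derivation(text, thesaurus, confidence)));
        }
        return out;
    }

    // Hypothetical query-builder policy: translate derivation info into a boost.
    // Tokens without derivation info keep the default boost of 1.0.
    public static float boostFor(Token t) {
        return t.derivation() == null ? 1.0f : t.derivation().confidence();
    }

    public static void main(String[] args) {
        Map<String, List<String>> thesaurus = Map.of("car", List.of("automobile", "auto"));
        for (Token t : expandSynonyms("car", "demo-thesaurus", thesaurus, 0.8f)) {
            System.out.println(t.text() + " boost=" + boostFor(t));
        }
    }
}
```

The design point is that the analysis side only describes the relationship; a different query builder could ignore the confidence entirely, or a final analysis component could write it to a payload instead.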
