I think the objection to "boosting" in token filters isn't that it
is "too much", but rather that it breaks the abstraction of the
analysis chain by directly targeting scoring (as implied by
characterizing it as "boosting").

That said, I'm sympathetic to an approach that would establish an
Attribute to expose the kind of information that would be useful in
the context of synonyms (or other sorts of derived tokens discussed
here, where it could be useful to express information about token
derivation). Such an Attribute would not be directly related to
scoring/boosting, but would be related to analysis per se (e.g.,
source token text, thesaurus, degree of confidence, etc.). Support
could be selectively implemented by TokenFilters, and optionally
leveraged by query builders (e.g., translated to boosts) or even
recorded to index Payloads by a final custom analysis component.
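
To make the separation concrete, here is a minimal, dependency-free
sketch of the kind of information I have in mind. All names in it
(DerivationSketch, Derivation, toBoost) are hypothetical and nothing
like them exists in Lucene today. In a real implementation the data
would presumably be exposed through an interface extending
org.apache.lucene.util.Attribute with a matching AttributeImpl; that
wiring is omitted here. The point is only that the analysis side
records derivation facts, and any mapping to a boost is a separate,
query-side policy.

```java
import java.util.Objects;

/** Hypothetical sketch only; none of these names exist in Lucene. */
final class DerivationSketch {

    /** Analysis-side facts about how a derived token relates to its source. */
    static final class Derivation {
        final String sourceText;  // text of the token this one was derived from
        final String origin;      // e.g. a thesaurus or rule-set name (assumed field)
        final float confidence;   // degree of confidence, in [0, 1]

        Derivation(String sourceText, String origin, float confidence) {
            // The negated form also rejects NaN.
            if (!(confidence >= 0f && confidence <= 1f)) {
                throw new IllegalArgumentException("confidence must be in [0, 1]");
            }
            this.sourceText = Objects.requireNonNull(sourceText);
            this.origin = Objects.requireNonNull(origin);
            this.confidence = confidence;
        }
    }

    /**
     * One possible query-side policy (an assumption, not a recommendation):
     * original tokens (no derivation recorded) keep full weight, derived
     * tokens are weighted by their confidence.
     */
    static float toBoost(Derivation derivation) {
        return derivation == null ? 1.0f : derivation.confidence;
    }

    private DerivationSketch() {}
}
```

A different query builder could ignore confidence entirely, or key off
the origin instead, without any change to the analysis chain.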

Regarding "You can look at any attribute on the tokenstream you want"
and "rely on abstract attributes (type, ...) then it should be easy to
sub-class the query builder to access them": obviously that works iff
analysis components record the relevant information in attributes on
the tokenstream, which I think they currently don't (for much of the
information that has been discussed here), and I know of no standard
way to express the relevant information on the tokenstream.

I can see that such an Attribute would be out of place (too
specialized) in the context of the Attributes in lucene/core; but
there are lots of more specialized Attributes in the various
submodules under lucene/analysis/* (SynonymGraphFilter lives in
analysis-common, FWIW). Again, this doesn't strike me as terribly
specialized, if one thinks of it more generally as a
"derivation/relationship" Attribute.
