SynonymTokenFilter, if I understand correctly, maps a given token to a set of tokens representing its synonyms. If used in the filter chain of a query analyzer, it causes "query expansion". (Correct terminology?) If used in the filter chain of an index analyzer, it causes "index expansion".
I was wondering whether anyone has implemented a synonym filter that, instead of mapping tokens to their synonyms, maps tokens to their "synonym-groups". Again, I'm not sure this is correct IR terminology, but borrowing from the SynonymMap implementation, what I mean by a "synonym-group" is a set of words that are considered synonyms of one another. If a word can have different [contextual] meanings, then it would be a member of multiple synonym-groups. The idea here is to minimize the index/query "expansion" by observing that the number of synonym-groups a word belongs to would typically be far smaller than the number of its synonyms. Each synonym-group would be represented by a special, unique term in the index. Unlike SynonymTokenFilter, the filter would have to be used in both the index analyzer and the query analyzer, since the group terms only match if both sides produce them. This is not a new idea. See the comments in LUCENE-1622 (a tangential topic), for example. Has anyone contributed an implementation?

-Babak
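
To make the idea concrete, here is a minimal sketch of the token-to-group mapping in plain Java. It is not a Lucene TokenFilter and uses no Lucene classes. The class name SynonymGroupSketch, the "syngrp_" term prefix, and the sample groups are all made up for illustration. Tokens that belong to no group are passed through unchanged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: maps a token to the ids of the
// synonym-groups it belongs to, instead of to all of its synonyms.
class SynonymGroupSketch {
    // word -> ids of the synonym-groups containing it (sorted for stable output)
    private final Map<String, Set<Integer>> groupsByWord = new HashMap<>();
    private int nextGroupId = 0;

    // Registers one synonym-group; every word in it shares the same group id.
    void addGroup(String... words) {
        int id = nextGroupId++;
        for (String w : words) {
            groupsByWord
                .computeIfAbsent(w.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                .add(id);
        }
    }

    // Returns one special term per group the token belongs to,
    // or the token itself if it is in no group.
    List<String> map(String token) {
        Set<Integer> ids = groupsByWord.get(token.toLowerCase(Locale.ROOT));
        List<String> out = new ArrayList<>();
        if (ids == null) {
            out.add(token);
            return out;
        }
        for (int id : ids) {
            out.add("syngrp_" + id); // assumed naming scheme for group terms
        }
        return out;
    }

    public static void main(String[] args) {
        SynonymGroupSketch s = new SynonymGroupSketch();
        s.addGroup("big", "large", "huge", "enormous"); // group 0
        s.addGroup("big", "important", "major");        // group 1
        System.out.println(s.map("big"));    // [syngrp_0, syngrp_1]
        System.out.println(s.map("huge"));   // [syngrp_0]
        System.out.println(s.map("lucene")); // [lucene]
    }
}
```

In this toy data, "big" has five distinct synonyms but expands to only two group terms, which is the saving described above. The same mapping would have to run at both index and query time for the group terms to match.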
