> Your synonyms will break if you try searching for phrases.

Good point. I did write that filter, but I never actually got around to searching for exact phrases with it (it was a very specific scenario, and we used prefix queries, which worked quite well).

> Building on your example, "food place in new york" will find nothing,
> because 'place' and 'in' share the same position.

You're right, but is it such a big problem in real life? What you're describing is searching for a phrase that spans both the synonym and the actual token sequence. What I had in mind was searching for phrases that are either just synonyms, or synonyms and text with an identical position layout (which is the case with single-word synonyms). I dare say this covers the majority of cases, although I have nothing to support this claim.
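
To make the position layout concrete, here is a toy sketch in plain Python (not Lucene code). It assumes a hypothetical document "restaurant in new york" where an index-time filter stacks the single-word synonym "diner" and the two-word synonym "food place" on "restaurant"; the exact increments a real filter emits may differ.

```python
# Toy positional index; an illustration only, not the Lucene API.
# Assumed (hypothetical) setup: the document is "restaurant in new york" and
# an index-time filter injects "diner" and "food place" for "restaurant".

def build_positions(tokens):
    """tokens: list of (term, position_increment) -> {term: set of positions}."""
    positions = {}
    pos = -1
    for term, inc in tokens:
        pos += inc
        positions.setdefault(term, set()).add(pos)
    return positions

def phrase_matches(positions, phrase):
    """Exact phrase match: the i-th term must occur at position start + i."""
    terms = phrase.split()
    starts = positions.get(terms[0], set())
    return any(
        all(start + i in positions.get(t, set()) for i, t in enumerate(terms))
        for start in starts
    )

indexed = build_positions([
    ("restaurant", 1),  # position 0
    ("diner", 0),       # single-word synonym, stacked at position 0
    ("food", 0),        # first word of the two-word synonym, position 0
    ("place", 1),       # second word advances to position 1 ...
    ("in", 0),          # ... so "in" ends up sharing position 1 with "place"
    ("new", 1),         # position 2
    ("york", 1),        # position 3
])

print(phrase_matches(indexed, "restaurant in new york"))  # original text
print(phrase_matches(indexed, "diner in new york"))       # single-word synonym
print(phrase_matches(indexed, "food place"))              # synonym on its own
print(phrase_matches(indexed, "food place in new york"))  # spans synonym + text
```

Under these assumptions the first three phrases match and only the last one fails, which is the case described above.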

> While building the index, I inject synonym group ids instead of actual
> words, then I detect synonyms in queries and replace them with group
> ids too. The hard part comes after that: you have to adjust
> positionIncrements on syngroup id tokens, with respect to the longest
> [snip]

Yep, hairy ;)
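
As I understand the group-id idea, a rough sketch could look like the following. This is plain Python, not Lucene code; the dictionary, the `SYN_n` ids and the greedy longest-match rule are all assumptions of mine, and the sketch deliberately skips the positionIncrement adjustment, which is the hard part mentioned above.

```python
# Hypothetical synonym dictionary: phrase (tuple of words) -> group id.
SYN_GROUPS = {
    ("restaurant",): "SYN_1",
    ("food", "place"): "SYN_1",
    ("new", "york"): "SYN_2",
    ("nyc",): "SYN_2",
}
MAX_LEN = max(len(phrase) for phrase in SYN_GROUPS)

def to_group_ids(text):
    """Replace every known synonym phrase with its group id (longest match first)."""
    words = text.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i, then shorter ones.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            gid = SYN_GROUPS.get(tuple(words[i:i + n]))
            if gid is not None:
                out.append(gid)
                i += n
                break
        else:
            # No synonym phrase starts here; keep the word as-is.
            out.append(words[i])
            i += 1
    return out

# The same function is applied to documents at index time and to queries.
print(to_group_ids("restaurant in new york"))
print(to_group_ids("food place in NYC"))
```

Both inputs reduce to the same token sequence, so a phrase query over group ids would line up, at the cost of losing the original words and positions.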

> A more correct approach is to index as-is and expand queries with actual
> synonym phrases instead of ids, but then queries become really
> humongous if you have any decent synonym dictionary (I have 20+ phrase
> groups).

Query expansion is not an option for me, unfortunately: too many synonyms. It would be much better to do it once at indexing time and rely on this information afterwards.
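
A quick sketch of why expansion gets out of hand: the number of phrase variants is the product of the group sizes at each slot of the query. The synonym lists below are made up for illustration, and the code is plain Python rather than anything Lucene-specific.

```python
from itertools import product

def expand_query(slots):
    """slots: one list of alternative phrases per query slot.
    Returns every phrase obtained by picking one alternative per slot."""
    return [" ".join(choice) for choice in product(*slots)]

# Hypothetical synonym groups for a three-slot query.
slots = [
    ["restaurant", "food place", "diner"],
    ["in"],
    ["new york", "nyc", "big apple"],
]

expanded = expand_query(slots)
print(len(expanded))  # 3 * 1 * 3 alternatives
for phrase in expanded:
    print(phrase)
```

With larger groups or more synonym-bearing slots the count multiplies accordingly, which is why I would rather pay the cost once at indexing time.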

Thanks for sharing your thoughts, Кирилл.

Dawid
