Independent of the synonym implementation, you might want to consider
vector/similarity search. For example, if the query is "internet device",
then the cosine similarities of the multi-term phrases "internet device",
"wifi router" and "wifi device" using the model "all-mpnet-base-v2" are:

{"cosineSimilarity":1,"cosineDistance":0,"sentenceOne":"internet device","sentenceTwo":"internet device"}

{"cosineSimilarity":0.47380197,"cosineDistance":0.526198,"sentenceOne":"internet device","sentenceTwo":"wifi router"}

{"cosineSimilarity":0.74852204,"cosineDistance":0.25147796,"sentenceOne":"internet device","sentenceTwo":"wifi device"}

As you can see, with this model "wifi device" (0.75) is closer to
"internet device" than "wifi router" (0.47) is. If you consider
"wifi device" a false positive, then this is of course not helpful, but
otherwise it might be useful considering the original question of this
thread.

HTH

Michael
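For reference, the similarity and distance values quoted above are consistent with the usual definitions: cosine similarity is the dot product divided by the product of the vector norms, and the distance is one minus the similarity. A minimal Java sketch of that arithmetic follows. The class and method names are hypothetical, and the vectors in `main` are made-up placeholders, not real "all-mpnet-base-v2" embeddings; producing the embeddings is not shown here.

```java
// Hypothetical helper, not part of Lucene or of any embedding library.
final class CosineSketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Cosine distance as used in the figures above: 1 - similarity. */
    static double cosineDistance(float[] a, float[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    public static void main(String[] args) {
        // Placeholder vectors for illustration only.
        float[] query = {1f, 0f};
        float[] sameDirection = {2f, 0f};
        float[] orthogonal = {0f, 3f};
        System.out.println(cosineSimilarity(query, sameDirection));
        System.out.println(cosineDistance(query, orthogonal));
    }
}
```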



On 02.01.23 at 17:54, Mikhail Khludnev wrote:
Hello Trevor.
Can you help me better understand this approach? If we have the text "wifi
router" and inject "internet device" at indexing time, the terms reside at
the same positions. How do you avoid a false-positive match for the query
"wifi device"?
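To make the concern concrete, here is a small, self-contained Java model of a positional index in which the synonym's tokens are stacked at the same positions as the original tokens. It is a simplification for illustration only: it does not use Lucene, it only handles synonyms with the same number of tokens as the phrase they replace, and all names (`StackedSynonymSketch`, `index`, `matchesPhrase`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of a positional index; not Lucene code.
final class StackedSynonymSketch {

    /** Builds one set of terms per token position and stacks the synonym's tokens onto the phrase. */
    static List<Set<String>> index(String text, String phrase, String synonym) {
        String[] tokens = text.split("\\s+");
        String[] from = phrase.split("\\s+");
        String[] to = synonym.split("\\s+");
        if (from.length != to.length) {
            throw new IllegalArgumentException("this sketch only handles equal-length synonyms");
        }
        List<Set<String>> positions = new ArrayList<>();
        for (String token : tokens) {
            positions.add(new HashSet<>(Arrays.asList(token)));
        }
        for (int start = 0; start + from.length <= tokens.length; start++) {
            String[] window = Arrays.copyOfRange(tokens, start, start + from.length);
            if (Arrays.equals(window, from)) {
                // Inject each synonym token at the position of the corresponding original token.
                for (int i = 0; i < to.length; i++) {
                    positions.get(start + i).add(to[i]);
                }
            }
        }
        return positions;
    }

    /** Phrase-style match: every query term must occur at consecutive positions. */
    static boolean matchesPhrase(List<Set<String>> positions, String query) {
        String[] terms = query.split("\\s+");
        for (int start = 0; start + terms.length <= positions.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < terms.length; i++) {
                if (!positions.get(start + i).contains(terms[i])) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> doc = index("wifi router", "wifi router", "internet device");
        System.out.println(matchesPhrase(doc, "internet device")); // intended match
        System.out.println(matchesPhrase(doc, "wifi device"));     // mixed terms also match
    }
}
```

In this model the query "wifi device" matches because "wifi" sits at the first position and "device" at the second, which is exactly the mixed match the question is asking about.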

On Mon, Jan 2, 2023 at 4:16 PM Trevor Nicholls <tre...@castingthevoid.com>
wrote:

Hi Anh

The two links Michael shared relate to questions I asked when I was trying
to get synonym matching with our application.

I really do have multi-term synonym matching working at this point;
there's always scope for improvement of course, but with the hints supplied
in those threads I was able to index our documents and search them using a
variety of synonymous terms, both single words and phrases.

Our application does not use either BooleanQuery or SynonymQuery; I have
just used the standard QueryParser. Instead, the synonym processing occurs
in the indexing phase, which is not only simpler (one search pattern, one
query), but I think you would also find it gives you superior
performance (because the synonym processing occurs once at indexing time
and not at all during searching, and I'm sure you'll be doing far more
searching than indexing).

cheers
T


-----Original Message-----
From: Michael Wechner <michael.wech...@wyona.com>
Sent: Thursday, 29 December 2022 08:56
To: java-user@lucene.apache.org
Subject: Re: Question for SynonymQuery

Hi Anh

The following Stack Overflow link might help:


https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene

The following thread seems to confirm that escaping the space with a
backslash does not help:

https://lists.apache.org/list?java-user@lucene.apache.org:2022-3

HTH

Michael


On 27.12.22 at 20:22, Anh Dũng Bùi wrote:
Hi Lucene users,

I recently came across SynonymQuery and found out that it only
supports single-term synonyms (since it accepts a list of Term which
will be considered as synonyms). We have some multi-term synonyms like
"internet device" <-> "wifi router" or "dns" <-> "domain name
service". Am I right that I need to use something like a BooleanQuery
for these cases?
I have 2 other follow-up questions:
- Does SynonymQuery have any advantage over BooleanQuery? Or is it
only different in how scores are computed? As I understand it,
SynonymWeight will consider all terms as exactly the same while
BooleanQuery will favor the documents with more matched terms.
- Is it worth it to support multi-term synonyms in SynonymQuery? My
feeling is that it's better to just use BooleanQuery in those cases,
since to support multi-term synonyms it would need to accept a list of
Query, which would make it behave like a BooleanQuery. Also, how
scoring would work with multi-term synonyms is another problem.
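The scoring difference raised in the first question can be illustrated with a toy calculation. The sketch below is not Lucene's scoring code: it assumes a simplified saturating term-frequency function tf / (tf + k1), ignores IDF and length normalization, and uses hypothetical names throughout. It only contrasts "score the summed term frequencies once" (the behaviour described for SynonymQuery) with "score each matching clause and add the results" (optional BooleanQuery clauses).

```java
// Toy scoring model for illustration; not Lucene's BM25 implementation.
final class SynonymScoringSketch {

    // Assumed saturation constant, borrowed from the common BM25 default.
    static final double K1 = 1.2;

    /** Simplified saturating term-frequency score. */
    static double saturate(double termFreq) {
        return termFreq / (termFreq + K1);
    }

    /** Synonym-style: treat all synonyms as one term, so sum the frequencies and score once. */
    static double synonymStyle(int... termFreqs) {
        int total = 0;
        for (int tf : termFreqs) {
            total += tf;
        }
        return saturate(total);
    }

    /** Boolean-style: score each synonym separately and add the clause scores. */
    static double booleanStyle(int... termFreqs) {
        double score = 0.0;
        for (int tf : termFreqs) {
            score += saturate(tf); // a term frequency of 0 contributes 0
        }
        return score;
    }

    public static void main(String[] args) {
        // Document A contains one synonym twice; document B contains two synonyms once each.
        System.out.println("synonym-style A: " + synonymStyle(2, 0));
        System.out.println("synonym-style B: " + synonymStyle(1, 1));
        System.out.println("boolean-style A: " + booleanStyle(2, 0));
        System.out.println("boolean-style B: " + booleanStyle(1, 1));
    }
}
```

Under these assumptions the synonym-style score is identical for both documents, while the boolean-style score is higher for the document that contains more distinct synonyms, which matches the understanding stated in the question.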

Thanks & Regards!


---------------------------------------------------------------------
To unsubscribe, e-mail:java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:java-user-h...@lucene.apache.org

