Independent of the synonym implementation, you might want to consider
vector/similarity search. For example, if the query is "internet device",
then the cosine similarities of the multi-term phrases "internet device",
"wifi router" and "wifi device" using the model "all-mpnet-base-v2" are:

{"cosineSimilarity":1,"cosineDistance":0,"sentenceOne":"internet device","sentenceTwo":"internet device"}
{"cosineSimilarity":0.47380197,"cosineDistance":0.526198,"sentenceOne":"internet device","sentenceTwo":"wifi router"}
{"cosineSimilarity":0.74852204,"cosineDistance":0.25147796,"sentenceOne":"internet device","sentenceTwo":"wifi device"}

As you can see, with this model "wifi device" is closer to "internet
device" than "wifi router" is. If you consider "wifi device" a false
positive, then this is of course not helpful, but otherwise it might be
useful for the original question of this thread.

HTH

Michael
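
For readers who want to see how figures like the ones above relate to each other, here is a minimal sketch of cosine similarity and cosine distance (distance = 1 - similarity, which is consistent with the numbers quoted). The class name CosineSketch and the toy vectors are assumptions for illustration only; they are not output of "all-mpnet-base-v2", and producing real embeddings would need the model itself.

```java
final class CosineSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Cosine distance as used in the figures above: 1 - similarity.
    static double cosineDistance(float[] a, float[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    public static void main(String[] args) {
        // Made-up toy vectors standing in for sentence embeddings.
        float[] query = {1f, 2f, 0f};
        float[] candidate = {2f, 1f, 0f};
        System.out.println("similarity = " + cosineSimilarity(query, candidate));
        System.out.println("distance   = " + cosineDistance(query, candidate));
    }
}
```

In a real setup the two arrays would be the embeddings of the query and of the indexed phrase, and candidates would be ranked by descending similarity.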
On 02.01.23 at 17:54, Mikhail Khludnev wrote:
Hello Trevor.
Can you help me better understand this approach? If we have the text "wifi
router" and inject "internet device" at indexing time, the terms reside at
the same positions. How do you avoid a false-positive match for the query
"wifi device"?
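
The concern raised here can be reproduced with a small stand-alone sketch that does not use Lucene at all. It assumes that the injected synonym terms are stacked on the same positions as the original terms and that a phrase matches when consecutive positions contain the query terms in order. The class and method names (StackedSynonymSketch, indexWithSynonym, phraseMatches) are hypothetical, and the sketch is a simplification, not a description of how any particular analyzer chain behaves.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StackedSynonymSketch {

    // Builds a toy positional index: element i holds every term at position i.
    // The synonym must have the same number of terms as the text (sketch only).
    static List<Set<String>> indexWithSynonym(String text, String synonym) {
        String[] original = text.split(" ");
        String[] injected = synonym.split(" ");
        if (original.length != injected.length) {
            throw new IllegalArgumentException("sketch only handles equal-length phrases");
        }
        List<Set<String>> positions = new ArrayList<>();
        for (int i = 0; i < original.length; i++) {
            Set<String> termsAtPosition = new HashSet<>();
            termsAtPosition.add(original[i]);
            termsAtPosition.add(injected[i]);
            positions.add(termsAtPosition);
        }
        return positions;
    }

    // True if the phrase terms occur at consecutive positions, in order.
    static boolean phraseMatches(List<Set<String>> positions, String phrase) {
        String[] terms = phrase.split(" ");
        for (int start = 0; start + terms.length <= positions.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < terms.length; i++) {
                if (!positions.get(start + i).contains(terms[i])) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> index = indexWithSynonym("wifi router", "internet device");
        for (String query : new String[] {"wifi router", "internet device", "wifi device"}) {
            System.out.println(query + " -> " + phraseMatches(index, query));
        }
    }
}
```

Under these assumptions the mixed phrase "wifi device" matches just like the two intended phrases, because position 0 holds both "wifi" and "internet" and position 1 holds both "router" and "device".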
On Mon, Jan 2, 2023 at 4:16 PM Trevor Nicholls <tre...@castingthevoid.com>
wrote:
Hi Anh
The two links Michael shared relate to questions I asked when I was trying
to get synonym matching with our application.
I really do have multi-term synonym matching working at this point;
there's always scope for improvement of course but with the hints supplied
in those threads I was able to index our documents and search them using a
variety of synonymous terms, both single words and phrases.
Our application does not use either BooleanQuery or SynonymQuery; I have
just used the standard QueryParser. Instead the synonym processing occurs
in the indexing phase, which is not only simpler (one search pattern, one
query), but I think you will also find it gives you superior
performance (because the synonym processing occurs once at indexing time
and not at all during searching - and I'm sure you'll be doing far more
searching than indexing).
cheers
T
-----Original Message-----
From: Michael Wechner <michael.wech...@wyona.com>
Sent: Thursday, 29 December 2022 08:56
To: java-user@lucene.apache.org
Subject: Re: Question for SynonymQuery
Hi Anh
The following Stackoverflow link might help
https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene
The following thread seems to confirm that escaping the space with a
backslash does not help
https://lists.apache.org/list?java-user@lucene.apache.org:2022-3
HTH
Michael
On 27.12.22 at 20:22, Anh Dũng Bùi wrote:
Hi Lucene users,
I recently came across SynonymQuery and found out that it only
supports single-term synonyms (since it accepts a list of Term which
will be considered as synonyms). We have some multi-term synonyms like
"internet device" <-> "wifi router" or "dns" <-> "domain name
service". Am I right that I need to use something like a BooleanQuery
for these cases?
I have 2 other follow-up questions:
- Does SynonymQuery have any advantage over BooleanQuery? Or is it
only different in how scores are computed? As I understand it,
SynonymWeight will consider all terms as exactly the same, while
BooleanQuery will favor the documents with more matched terms.
- Is it worth it to support multi-term synonyms in SynonymQuery? My
feeling is that it's better to just use BooleanQuery in those cases,
since to support multi-term synonyms it needs to accept a list of
Query, which would make it behave like a BooleanQuery. Also, how
scoring would work with multi-term synonyms is another problem.
Thanks & Regards!
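
The scoring difference described in the first follow-up question can be illustrated with a toy calculation. The sketch below assumes a BM25-like saturating term-frequency function and deliberately ignores IDF, boosts and length normalization; the names ScoringSketch, synonymStyle and booleanStyle are hypothetical and nothing here calls Lucene. The "synonym style" sums the frequencies of all synonym terms and scores once, as if they had been indexed as a single term; the "boolean style" scores each term separately and adds the results.

```java
final class ScoringSketch {

    // Assumed saturation constant, in the spirit of BM25's k1.
    static final double K1 = 1.2;

    // Saturating function of term frequency: grows with tf but levels off.
    static double saturate(int termFrequency) {
        return termFrequency / (termFrequency + K1);
    }

    // Treats all synonym terms as one term: sum the frequencies, score once.
    static double synonymStyle(int... termFrequencies) {
        int total = 0;
        for (int tf : termFrequencies) {
            total += tf;
        }
        return saturate(total);
    }

    // Scores every term on its own and adds the per-term scores.
    static double booleanStyle(int... termFrequencies) {
        double score = 0.0;
        for (int tf : termFrequencies) {
            score += saturate(tf);
        }
        return score;
    }

    public static void main(String[] args) {
        // Document A contains one synonym twice; document B contains each synonym once.
        System.out.println("synonym style: A=" + synonymStyle(2, 0) + " B=" + synonymStyle(1, 1));
        System.out.println("boolean style: A=" + booleanStyle(2, 0) + " B=" + booleanStyle(1, 1));
    }
}
```

With these simplifications the synonym style gives both documents the same score, while the boolean style ranks the document that matches both terms higher, which is the behavior the question describes.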
---------------------------------------------------------------------
To unsubscribe, e-mail:java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:java-user-h...@lucene.apache.org