Hello Anh,
I was intrigued by your question, and I managed to get it working.
See
https://github.com/mkhludnev/likely/blob/eval-mulyw-syns/src/test/java/org/apache/lucene/playground/TestMultiPulty.java
Beware: multi-word entries in the synonym file
https://github.com/mkhludnev/likely/blob/eval-mulyw-syns/src/test/resources/org/apache/lucene/playground/multy-syn.txt
should join their words with
https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymMap.html#WORD_SEPARATOR
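
To make the difference concrete, here is a minimal plain-Java sketch. It does not use Lucene and is not taken from the linked test; the class and method names are hypothetical, and the constant only mirrors the NUL character that, as far as I know, SynonymMap.WORD_SEPARATOR stands for. It contrasts the per-position reading of the token stream with the per-phrase reading that a multi-word synonym is meant to give.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SynonymGraphSketch {
    // Assumption: same value as SynonymMap.WORD_SEPARATOR (the NUL character).
    static final char WORD_SEPARATOR = 0;

    // Joins the words of a phrase with WORD_SEPARATOR instead of a space.
    static String joinWords(String phrase) {
        return String.join(String.valueOf(WORD_SEPARATOR), phrase.trim().split("\\s+"));
    }

    // Per-position reading: terms stacked at one position are ORed,
    // and the positions are ANDed together.
    static String perPositionQuery(List<List<String>> positions) {
        List<String> groups = new ArrayList<>();
        for (List<String> terms : positions) {
            groups.add("(" + String.join(" OR ", terms) + ")");
        }
        return String.join(" AND ", groups);
    }

    // Per-phrase reading: the words of one phrase are ANDed,
    // and the alternative phrases are ORed together.
    static String perPhraseQuery(List<String> joinedPhrases) {
        List<String> groups = new ArrayList<>();
        for (String joined : joinedPhrases) {
            String[] words = joined.split(String.valueOf(WORD_SEPARATOR));
            groups.add("(" + String.join(" AND ", words) + ")");
        }
        return String.join(" OR ", groups);
    }

    public static void main(String[] args) {
        // prints: (wifi OR internet) AND (router OR device)
        System.out.println(perPositionQuery(Arrays.asList(
                Arrays.asList("wifi", "internet"),
                Arrays.asList("router", "device"))));
        // prints: (wifi AND router) OR (internet AND device)
        System.out.println(perPhraseQuery(Arrays.asList(
                joinWords("wifi router"),
                joinWords("internet device"))));
    }
}
```

The strings are only a readable stand-in for the query structure; how Lucene actually turns the graph into queries is not modelled here.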
Have a nice hack!

On Thu, Dec 29, 2022 at 10:00 AM Anh Dũng Bùi <dungba...@gmail.com> wrote:

> Thanks everyone for the insight. I guess I'll use BooleanQuery then.
>
> There is also a caveat I noticed (not sure if it's an issue or not), which
> is slightly different from the mentioned thread. Say I have a multi-word
> synonym such as "wifi router" and "internet device". Using
> SynonymGraphFilter at query time (when building the SynonymMap I already
> escaped the space with a backslash) produces this TokenStream for a
> query of "wifi router":
>
> "wifi" (PositionIncrement=1,PositionLength=1), "internet"
> (PositionIncrement=0,PositionLength=1), "router"
> (PositionIncrement=1,PositionLength=1), "device"
> (PositionIncrement=0,PositionLength=1)
>
> This has the same effect as if I had 2 synonyms: "wifi"/"internet" and
> "router"/"device". If I convert this to a BooleanQuery it would become
> ("wifi" OR "internet") AND ("router" OR "device"), but what I would like to
> achieve is ("wifi" AND "router") OR ("internet" AND "device")
>
> I'm curious if there would be some workaround for this case
>
> Thanks,
> Anh Dung Bui
>
>
> On Thu, Dec 29, 2022 at 4:56 AM Michael Wechner <michael.wech...@wyona.com> wrote:
>
> > Hi Anh
> >
> > The following Stack Overflow link might help:
> >
> > https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene
> >
> > The following thread seems to confirm that escaping the space with a
> > backslash does not help:
> >
> > https://lists.apache.org/list?java-user@lucene.apache.org:2022-3
> >
> > HTH
> >
> > Michael
> >
> >
> > On 27.12.22 at 20:22, Anh Dũng Bùi wrote:
> > > Hi Lucene users,
> > >
> > > I recently came across SynonymQuery and found out that it only supports
> > > single-term synonyms (since it accepts a list of Term which will be
> > > considered as synonyms). We have some multi-term synonyms like "internet
> > > device" <-> "wifi router" or "dns" <-> "domain name service". Am I right
> > > that I need to use something like a BooleanQuery for these cases?
> > >
> > > I have 2 other follow-up questions:
> > > - Does SynonymQuery have any advantage over BooleanQuery? Or is it only
> > > different in how scores are computed? As I understand SynonymWeight will
> > > consider all terms as exactly the same while BooleanQuery will favor the
> > > documents with more matched terms.
> > > - Is it worth it to support multi-term synonyms in SynonymQuery? My feeling
> > > is that it's better to just use BooleanQuery in those cases, since to
> > > support multi-term synonyms it needs to accept a list of Query, which would
> > > make it behave like a BooleanQuery. Also how scoring works with multi-term
> > > is another problem.
> > >
> > > Thanks & Regards!
> > >


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
