Hello, Satnam.
It seems I achieved what you were asking for:
https://github.com/mkhludnev/likely/blob/381b491d25e4d2035dd5b8a891dfdcfe2b986b90/src/test/java/org/apache/lucene/playground/TestMultiPulty.java#L32
It expands API and UI into phrases, which match as you expect.
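To make the difference concrete, here is a minimal sketch in plain Java of phrase matching versus single-token matching on the four example documents quoted below. It is an illustration only: it does not use Lucene and is not the code from the linked test. The class `PhraseExpansionSketch`, its methods, and the crude tokenizer are hypothetical names made up for this example, and the list of alternatives assumes the two synonym rules from the quoted message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustration only: plain Java, no Lucene. All names here are hypothetical.
class PhraseExpansionSketch {

    // Assumed expansions of the query "API UI": each alternative is a whole phrase.
    static final List<List<String>> ALTERNATIVES = Arrays.asList(
            Arrays.asList("api"),
            Arrays.asList("application", "program", "interface"),
            Arrays.asList("ui"),
            Arrays.asList("user", "interface"));

    // Crude stand-in for an analyzer: lowercase, split on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Phrase semantics: some alternative must occur as a run of consecutive tokens.
    static boolean matchesAsPhrases(String doc) {
        List<String> tokens = tokenize(doc);
        for (List<String> phrase : ALTERNATIVES) {
            if (Collections.indexOfSubList(tokens, phrase) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Single-token semantics: any one word of any alternative is enough to match.
    static boolean matchesAsSingleTokens(String doc) {
        List<String> tokens = tokenize(doc);
        for (List<String> phrase : ALTERNATIVES) {
            for (String word : phrase) {
                if (tokens.contains(word)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] docs = {
            "This is API and it is called Application program interface",
            "How i help you in UI things",
            "my substance interface",
            "how to write c++ program"
        };
        for (String doc : docs) {
            System.out.println("phrases=" + matchesAsPhrases(doc)
                    + " singleTokens=" + matchesAsSingleTokens(doc) + " : " + doc);
        }
    }
}
```

With phrase semantics only doc1 and doc2 match. With single-token semantics doc3 and doc4 match as well, because "interface" and "program" are each enough on their own.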

On Fri, Jan 20, 2023 at 4:18 PM _ SATNAM <satnamsingh9...@gmail.com> wrote:

> Hey Mikhail and Anh Dung Bui,
> I am also struggling with synonym queries.
> My use case, for example: I created synonyms for these words:
>   API ------> Application program interface
>   UI -------> user interface
>
> doc1 ---> This is API and it is called Application program interface
> doc2 ---> How i help you in UI things
> doc3 ---> my substance interface
> doc4 ---> how to write c++ program
>
> What I want to achieve is, when I search for API UI together:
>
> Expected result:
> it must highlight ---> API and Application program interface in doc1
>                   ---> UI in doc2
>
> But the actual output is:
> it highlighted ---> API and Application program interface in doc1
>                ---> UI in doc2
>                ---> interface in doc3
>                ---> program in doc4
>
> Do you have any suggestions on how I can achieve this?
>
> (API) OR (UI)
> Each term should act as a phrase query for API UI:
> no single tokens should be matched, only the phrase should be matched.
>
> On Thu, Jan 19, 2023 at 6:56 AM Anh Dũng Bùi <dungba...@gmail.com> wrote:
>
> > Thanks Mikhail!
> >
> > It turns out I used FlattenGraphFilter, which caused the PositionLength to be
> > all 1 and resulted in the behavior above =)
> >
> > A side note is that we don't need to use WORD_SEPARATOR in the synonym
> > file. SynonymMap.Parser.analyze would tokenize and append the separator for
> > us.
> >
> > Regards,
> > Anh Dung Bui
> >
> > On Mon, Jan 2, 2023 at 8:07 Mikhail Khludnev <m...@apache.org> wrote:
> >
> > > Hello Anh,
> > > I was intrigued by your question. And I managed to get it to work somehow.
> > > see
> > > https://github.com/mkhludnev/likely/blob/eval-mulyw-syns/src/test/java/org/apache/lucene/playground/TestMultiPulty.java
> > > Beware, synonym files
> > > https://github.com/mkhludnev/likely/blob/eval-mulyw-syns/src/test/resources/org/apache/lucene/playground/multy-syn.txt
> > > should use
> > > https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymMap.html#WORD_SEPARATOR
> > > Have a nice hack!
> > >
> > > On Thu, Dec 29, 2022 at 10:00 AM Anh Dũng Bùi <dungba...@gmail.com> wrote:
> > >
> > > > Thanks everyone for the insight. I guess I'll use BooleanQuery then.
> > > >
> > > > There is also a caveat I noticed (not sure if it's an issue or not), which
> > > > is slightly different from the mentioned thread. When I have a multi-word
> > > > synonym, let's say "wifi router" and "internet device", then using
> > > > SynonymGraphFilter at query time (when building the SynonymMap I already
> > > > escaped space with the backslash) would produce this TokenStream for a
> > > > query of "wifi router":
> > > >
> > > > "wifi" (PositionIncrement=1,PositionLength=1), "internet"
> > > > (PositionIncrement=0,PositionLength=1), "router"
> > > > (PositionIncrement=1,PositionLength=1), "device"
> > > > (PositionIncrement=0,PositionLength=1)
> > > >
> > > > This has the same effect as if I had 2 synonyms: "wifi"/"internet" and
> > > > "router"/"device".
> > > > If I convert this to a BooleanQuery it would become
> > > > ("wifi" OR "internet") AND ("router" OR "device"), but what I would like to
> > > > achieve is ("wifi" AND "router") OR ("internet" AND "device")
> > > >
> > > > I'm curious if there would be some workaround for this case
> > > >
> > > > Thanks,
> > > > Anh Dung Bui
> > > >
> > > > On Thu, Dec 29, 2022 at 4:56 AM Michael Wechner <michael.wech...@wyona.com> wrote:
> > > >
> > > > > Hi Anh
> > > > >
> > > > > The following Stackoverflow link might help
> > > > >
> > > > > https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene
> > > > >
> > > > > The following thread seems to confirm that escaping the space with a
> > > > > backslash does not help
> > > > >
> > > > > https://lists.apache.org/list?java-user@lucene.apache.org:2022-3
> > > > >
> > > > > HTH
> > > > >
> > > > > Michael
> > > > >
> > > > > On 27.12.22 at 20:22, Anh Dũng Bùi wrote:
> > > > > > Hi Lucene users,
> > > > > >
> > > > > > I recently came across SynonymQuery and found out that it only supports
> > > > > > single-term synonyms (since it accepts a list of Term which will be
> > > > > > considered as synonyms). We have some multi-term synonyms like "internet
> > > > > > device" <-> "wifi router" or "dns" <-> "domain name service". Am I right
> > > > > > that I need to use something like a BooleanQuery for these cases?
> > > > > >
> > > > > > I have 2 other follow-up questions:
> > > > > > - Does SynonymQuery have any advantage over BooleanQuery? Or is it only
> > > > > > different in how scores are computed? As I understand SynonymWeight will
> > > > > > consider all terms as exactly the same while BooleanQuery will favor the
> > > > > > documents with more matched terms.
> > > > > > - Is it worth it to support multi-term synonyms in SynonymQuery? My feeling
> > > > > > is that it's better to just use BooleanQuery in those cases, since to
> > > > > > support multi-term synonyms it needs to accept a list of Query, which would
> > > > > > make it behave like a BooleanQuery. Also how scoring works with multi-term
> > > > > > is another problem.
> > > > > >
> > > > > > Thanks & Regards!
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > https://t.me/MUST_SEARCH
> > > A caveat: Cyrillic!

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!