[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected
[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012055#comment-16012055 ]

Tommaso Teofili commented on OAK-4804:
--------------------------------------

I think having different analyzers for index and query time is anyway mandatory for defining custom text analysis pipelines which make sense in practice.

> Synonym analyzer with multiple words in synonym definition can give more results than expected
> ----------------------------------------------------------------------------------------------
>
>                 Key: OAK-4804
>                 URL: https://issues.apache.org/jira/browse/OAK-4804
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>            Priority: Minor
>
> Setting up synonyms such as {{"FTW, For the win"}} would also return
> documents which contain all of {{"For", "the", "win"}}.
> Test case:
> {noformat}
> @Test
> public void fulltextSearchWithPhraseSynonymAnalyzer() throws Exception {
>     Tree idx = createFulltextIndex(root.getTree("/"), "test");
>     TestUtil.useV2(idx);
>     Tree anl = idx.addChild(LuceneIndexConstants.ANALYZERS).addChild(LuceneIndexConstants.ANL_DEFAULT);
>     anl.addChild(LuceneIndexConstants.ANL_TOKENIZER).setProperty(LuceneIndexConstants.ANL_NAME, "Standard");
>     Tree synFilter = anl.addChild(LuceneIndexConstants.ANL_FILTERS).addChild("Synonym");
>     synFilter.setProperty("synonyms", "syn.txt");
>     synFilter.addChild("syn.txt").addChild(JCR_CONTENT).setProperty(JCR_DATA, "FTW, For the win");
>     Tree test = root.getTree("/").addChild("test");
>     test.addChild("1").setProperty("foo", "FTW");
>     test.addChild("2").setProperty("foo", "For the win");
>     test.addChild("3").setProperty("foo", "For gods sake, this is not the way to win it");
>     root.commit();
>     assertQuery("select * from [nt:base] where CONTAINS(*, 'FTW') AND ISDESCENDANTNODE('/test')",
>             asList("/test/1", "/test/2"));//current (failing result is ["/test/1", "/test/2", "/test/3"])
> }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
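To make the reported over-match concrete, here is a minimal, self-contained sketch. It is a toy model and not Oak's or Lucene's code: the class `SynonymOverMatchSketch` and its methods are hypothetical, and it assumes that synonym alternatives are stacked position by position, that stacked terms are ORed, and that positions are ANDed when the query is built. Under those assumptions, expanding synonyms at query time as well as at index time matches the third test document, while expanding at index time only does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SynonymOverMatchSketch {

    // Hypothetical stand-in for the synonym rule "FTW, For the win".
    static final List<List<String>> SYNONYMS = Arrays.asList(
            Arrays.asList("ftw"),
            Arrays.asList("for", "the", "win"));

    // Lower-case the text and split it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Return the synonym phrase that starts at tokens[start], or null if none does.
    static List<String> matchAt(List<String> tokens, int start) {
        for (List<String> phrase : SYNONYMS) {
            int end = start + phrase.size();
            if (end <= tokens.size() && tokens.subList(start, end).equals(phrase)) {
                return phrase;
            }
        }
        return null;
    }

    // Produce one set of terms per position. With expansion on, every alternative
    // of the synonym group is stacked onto the positions where a phrase matched.
    static List<Set<String>> analyze(String text, boolean expandSynonyms) {
        List<String> tokens = tokenize(text);
        List<Set<String>> positions = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            List<String> matched = expandSynonyms ? matchAt(tokens, i) : null;
            if (matched == null) {
                Set<String> single = new LinkedHashSet<>();
                single.add(tokens.get(i));
                positions.add(single);
                i++;
                continue;
            }
            int base = positions.size();
            for (List<String> alternative : SYNONYMS) {
                for (int k = 0; k < alternative.size(); k++) {
                    while (positions.size() <= base + k) {
                        positions.add(new LinkedHashSet<>());
                    }
                    positions.get(base + k).add(alternative.get(k));
                }
            }
            i += matched.size();
        }
        return positions;
    }

    // Documents are always indexed with expansion. A document matches when every
    // query position has at least one term present in the document.
    static boolean matches(String doc, String query, boolean expandAtQueryTime) {
        Set<String> indexed = new HashSet<>();
        for (Set<String> position : analyze(doc, true)) {
            indexed.addAll(position);
        }
        for (Set<String> alternatives : analyze(query, expandAtQueryTime)) {
            if (Collections.disjoint(indexed, alternatives)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] docs = {
            "FTW",
            "For the win",
            "For gods sake, this is not the way to win it"
        };
        for (String doc : docs) {
            System.out.println(doc
                    + " | same analyzer: " + matches(doc, "FTW", true)
                    + " | query without synonyms: " + matches(doc, "FTW", false));
        }
    }
}
```

In this model the expanded query has the stacked terms `ftw`/`for` at the first position, followed by `the` and `win`, so any document containing all of "for", "the" and "win" satisfies it. Analyzing the query without synonym expansion leaves only `ftw`, which exists only in documents whose indexed text contained "FTW" or the exact phrase.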
[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected
[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871309#comment-15871309 ]

Chetan Mehrotra commented on OAK-4804:
--------------------------------------

bq. The only way that I can think of fixing this is to allow for different analyzer for index and querying time.

Bit late here but yes I think we need to have such a support.
[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected
[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646457#comment-15646457 ]

Vikas Saurabh commented on OAK-4804:
------------------------------------

Btw, there's an ugly workaround that solves some of the cases. Assuming that the index definition is only about full text aggregated on the top-level node, then if we have the following nodes
* /AB -> contains {{"A B"}}
* /AXB -> contains {{"A X B"}}
* /CD -> contains {{"C D"}}
* /E -> contains {{E}}
* /F -> contains {{F}}

then a synonym definition of
{noformat}
A B=>A,B
C D=>C,D
A B,C D=>E,F
E,F
{noformat}
would give the following results
||query text||result||comment||
|E|/E, /F, /AB, /CD| |
|A B|/AB, /AXB| |
|A|/AB, /AXB| |
|"A B"|/AB, /AXB, /E, /F, /CD|possibly unexpected|
|C D|/CD| |
|D|/CD| |
|"C D"|/CD, /E, /F, /AB|possibly unexpected|
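The synonym definition in the workaround mixes two rule shapes: explicit mappings written with `=>` and a plain comma-separated group. The sketch below shows one way to read such lines, assuming the Solr-style synonym file conventions (comma-separated entries, `=>` for explicit mapping, a bare group treated as mutually equivalent). The class `SynonymRuleSketch` and its `Rule` type are hypothetical and deliberately simplified (no comments, escaping or case handling); this is not the parser Lucene or Oak uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SynonymRuleSketch {

    /** One parsed rule: any phrase in inputs is rewritten to the phrases in outputs. */
    static final class Rule {
        final List<List<String>> inputs;
        final List<List<String>> outputs;

        Rule(List<List<String>> inputs, List<List<String>> outputs) {
            this.inputs = inputs;
            this.outputs = outputs;
        }

        @Override
        public String toString() {
            return inputs + " -> " + outputs;
        }
    }

    // Split "A B,C D" into [[A, B], [C, D]]: commas separate entries,
    // whitespace separates the words of a multi-word entry.
    static List<List<String>> phrases(String side) {
        List<List<String>> result = new ArrayList<>();
        for (String phrase : side.split(",")) {
            String trimmed = phrase.trim();
            if (!trimmed.isEmpty()) {
                result.add(Arrays.asList(trimmed.split("\\s+")));
            }
        }
        return result;
    }

    // "lhs=>rhs" is an explicit mapping; a line without "=>" is read as an
    // equivalence group in which every entry maps to the whole group.
    static Rule parse(String line) {
        int arrow = line.indexOf("=>");
        if (arrow < 0) {
            List<List<String>> group = phrases(line);
            return new Rule(group, group);
        }
        return new Rule(phrases(line.substring(0, arrow)),
                        phrases(line.substring(arrow + 2)));
    }

    public static void main(String[] args) {
        String[] definition = { "A B=>A,B", "C D=>C,D", "A B,C D=>E,F", "E,F" };
        for (String line : definition) {
            System.out.println(line + "  parsed as  " + parse(line));
        }
    }
}
```

Read this way, the first two rules break the phrases "A B" and "C D" into their individual words, the third rewrites either phrase to `E` and `F`, and the last keeps `E` and `F` equivalent to each other.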
[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected
[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492693#comment-15492693 ]

Vikas Saurabh commented on OAK-4804:
------------------------------------

I couldn't find anything on the web. Single/double quotes didn't work :(. [~teofili], would you know?
[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected
[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492646#comment-15492646 ]

Marcel Reutegger commented on OAK-4804:
---------------------------------------

Is there a way to define a phrase instead of individual words for the synonym? E.g: {{FTW, 'For the win'}}.