[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected

2017-05-16 Thread Tommaso Teofili (JIRA)

[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012055#comment-16012055 ]

Tommaso Teofili commented on OAK-4804:
--

I think having different analyzers at index and query time is mandatory anyway
for defining custom text analysis pipelines that make sense in practice.

> Synonym analyzer with multiple words in synonym definition can give more 
> results than expected
> --
>
> Key: OAK-4804
> URL: https://issues.apache.org/jira/browse/OAK-4804
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Vikas Saurabh
>Priority: Minor
>
> Setting up synonyms such as {{"FTW, For the win"}} makes a query for {{FTW}} also
> return documents that contain all of {{"For", "the", "win"}}, even when these
> words do not appear together as a phrase.
> Test case:
> {noformat}
> @Test
> public void fulltextSearchWithPhraseSynonymAnalyzer() throws Exception {
>     Tree idx = createFulltextIndex(root.getTree("/"), "test");
>     TestUtil.useV2(idx);
>     Tree anl = idx.addChild(LuceneIndexConstants.ANALYZERS).addChild(LuceneIndexConstants.ANL_DEFAULT);
>     anl.addChild(LuceneIndexConstants.ANL_TOKENIZER).setProperty(LuceneIndexConstants.ANL_NAME, "Standard");
>     Tree synFilter = anl.addChild(LuceneIndexConstants.ANL_FILTERS).addChild("Synonym");
>     synFilter.setProperty("synonyms", "syn.txt");
>     synFilter.addChild("syn.txt").addChild(JCR_CONTENT).setProperty(JCR_DATA, "FTW, For the win");
>
>     Tree test = root.getTree("/").addChild("test");
>     test.addChild("1").setProperty("foo", "FTW");
>     test.addChild("2").setProperty("foo", "For the win");
>     test.addChild("3").setProperty("foo", "For gods sake, this is not the way to win it");
>     root.commit();
>
>     // Expected: /test/3 must not match, since "For the win" does not occur there as a phrase.
>     // Currently fails: the actual result is ["/test/1", "/test/2", "/test/3"].
>     assertQuery("select * from [nt:base] where CONTAINS(*, 'FTW') AND ISDESCENDANTNODE('/test')",
>             asList("/test/1", "/test/2"));
> }
> {noformat}
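The over-match can be illustrated with a simplified model in plain Java; this is not Oak's or Lucene's query code. It assumes, as the failing result in the test case suggests, that the expansion of {{FTW}} to {{For the win}} is evaluated as three required terms rather than as a phrase. Lower-casing stands in for the "Standard" tokenizer for simplicity, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Simplified model, not Oak or Lucene code: the query "FTW" is expanded with
// the synonym "For the win", and the expansion is checked either as a set of
// required terms (the over-matching behavior reported here) or as a phrase.
public class SynonymOverMatch {

    // Lower-cases and splits on runs of non-letter/non-digit characters,
    // a rough stand-in for the "Standard" tokenizer used in the test.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Term semantics: each synonym word occurs somewhere in the document.
    static boolean containsAllTerms(List<String> doc, List<String> terms) {
        return doc.containsAll(terms);
    }

    // Phrase semantics: the synonym words occur consecutively and in order.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    // Does a document match the query "FTW" after synonym expansion?
    static boolean matches(String docText, boolean phraseSemantics) {
        List<String> doc = tokenize(docText);
        List<String> synonym = tokenize("For the win");
        boolean viaSynonym = phraseSemantics
                ? containsPhrase(doc, synonym)
                : containsAllTerms(doc, synonym);
        return doc.contains("ftw") || viaSynonym;
    }

    public static void main(String[] args) {
        String[] docs = {
            "FTW",
            "For the win",
            "For gods sake, this is not the way to win it"
        };
        for (int i = 0; i < docs.length; i++) {
            System.out.printf("/test/%d terms=%b phrase=%b%n",
                    i + 1, matches(docs[i], false), matches(docs[i], true));
        }
    }
}
```

Running {{main}} reports {{terms=true}} but {{phrase=false}} for /test/3, so in this model, requiring the synonym words to be adjacent yields the expected {{["/test/1", "/test/2"]}}.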



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected

2017-02-16 Thread Chetan Mehrotra (JIRA)

[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871309#comment-15871309 ]

Chetan Mehrotra commented on OAK-4804:
--

bq. The only way that I can think of fixing this is to allow for different 
analyzer for index and querying time.

A bit late here, but yes, I think we need such support.
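A minimal sketch of what separate index-time and query-time analyzers would buy here, under simplifying assumptions: plain Java rather than Oak's analyzer configuration, a single hard-coded synonym, and AND semantics for query terms. The index-side analyzer adds the token {{ftw}} wherever the exact phrase occurs; the query-side analyzer performs no expansion, so {{FTW}} is no longer turned into three loose terms. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical model of separate index-time and query-time analyzers:
// the synonym is applied only while indexing, never to the query.
public class SplitAnalyzers {

    static final List<String> PHRASE = List.of("for", "the", "win");
    static final String CANONICAL = "ftw";

    // Lower-casing tokenizer, a simplification of the "Standard" tokenizer.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Index-time analysis: if the exact phrase occurs, also index the single
    // token "ftw" (token positions are ignored in this model).
    static List<String> indexAnalyze(String text) {
        List<String> tokens = new ArrayList<>(tokenize(text));
        if (Collections.indexOfSubList(tokens, PHRASE) >= 0) {
            tokens.add(CANONICAL);
        }
        return tokens;
    }

    // Query-time analysis: plain tokenization, no synonym expansion.
    static List<String> queryAnalyze(String text) {
        return tokenize(text);
    }

    // Returns the paths of documents whose indexed tokens contain every query token.
    static List<String> search(List<String> docs, String query) {
        List<String> queryTokens = queryAnalyze(query);
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (indexAnalyze(docs.get(i)).containsAll(queryTokens)) {
                hits.add("/test/" + (i + 1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "FTW",
                "For the win",
                "For gods sake, this is not the way to win it");
        System.out.println(search(docs, "FTW")); // prints [/test/1, /test/2]
    }
}
```

This only models the {{FTW}} direction; in this sketch a query typed as {{For the win}} is still evaluated as individual terms.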



[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected

2016-11-07 Thread Vikas Saurabh (JIRA)

[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646457#comment-15646457 ]

Vikas Saurabh commented on OAK-4804:


Btw, there's an ugly workaround that solves some of these cases.

Assuming the index definition only covers full text aggregated on the top-level node,
then if we have the following nodes:
* /AB -> contains {{"A B"}}
* /AXB -> contains {{"A X B"}}
* /CD -> contains {{"C D"}}
* /E -> contains {{"E"}}
* /F -> contains {{"F"}}

then defining the synonyms as
{noformat}
A B=>A,B
C D=>C,D
A B,C D=>E,F
E,F
{noformat}
would give the following results:
||query text||result||comment||
|E|/E, /F, /AB, /CD| |
|A B|/AB, /AXB| |
|A|/AB, /AXB| |
|"A B"|/AB, /AXB, /E, /F, /CD|possibly unexpected|
|C D|/CD| |
|D|/CD| |
|"C D"|/CD, /E, /F, /AB|possibly unexpected|
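The synonym definition in this workaround mixes the two rule forms of the Solr synonym format, assuming the file is read in that format (the default for Lucene's {{SynonymFilterFactory}}): comma-separated entries declare equivalents, while {{=>}} maps the left-hand entries to the right-hand ones. The hypothetical parser below only shows how each line decomposes; it does not model the token-stream behavior that produces the table, and it ignores comments, escaping, and the {{expand}} option.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical parser for Solr-style synonym lines (no comments, escaping,
// or multi-line handling).
public class SynonymRules {

    static final class Rule {
        final List<String> from;  // entries matched in the token stream
        final List<String> to;    // entries they are mapped to
        final boolean explicit;   // true for "lhs => rhs", false for "a, b, c"

        Rule(List<String> from, List<String> to, boolean explicit) {
            this.from = from;
            this.to = to;
            this.explicit = explicit;
        }

        @Override
        public String toString() {
            return explicit ? from + " => " + to : "equivalent " + from;
        }
    }

    // Splits one side of a rule on commas and trims each entry.
    static List<String> entries(String side) {
        return Arrays.stream(side.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    static Rule parse(String line) {
        int arrow = line.indexOf("=>");
        if (arrow >= 0) {
            return new Rule(entries(line.substring(0, arrow)),
                    entries(line.substring(arrow + 2)), true);
        }
        // Equivalence rule: with the default expand=true, every entry maps to all entries.
        List<String> all = entries(line);
        return new Rule(all, all, false);
    }

    public static void main(String[] args) {
        String definition = "A B=>A,B\nC D=>C,D\nA B,C D=>E,F\nE,F";
        for (String line : definition.split("\n")) {
            System.out.println(parse(line));
        }
    }
}
```

For the third line, {{main}} prints {{[A B, C D] => [E, F]}}: either two-word phrase is mapped to the single tokens {{E}} and {{F}}, which the last line declares equivalent.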



[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected

2016-09-15 Thread Vikas Saurabh (JIRA)

[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492693#comment-15492693 ]

Vikas Saurabh commented on OAK-4804:


I couldn't find anything on the web. Single/double quotes didn't work :(. 
[~teofili], would you know?



[jira] [Commented] (OAK-4804) Synonym analyzer with multiple words in synonym definition can give more results than expected

2016-09-15 Thread Marcel Reutegger (JIRA)

[ https://issues.apache.org/jira/browse/OAK-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492646#comment-15492646 ]

Marcel Reutegger commented on OAK-4804:
---

Is there a way to define a phrase, rather than individual words, as the synonym?
E.g.: {{FTW, 'For the win'}}.
