[jira] Commented: (LUCENE-2287) Unexpected terms are highlighted within nested SpanQuery instances
[ https://issues.apache.org/jira/browse/LUCENE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999311#comment-12999311 ]

Salman Akram commented on LUCENE-2287:
--------------------------------------

Hi,

It seems the last patch was committed with a couple of failures still outstanding. Is there any update on this? Do you think it is still better than the default highlighter?

Thanks!

Unexpected terms are highlighted within nested SpanQuery instances
------------------------------------------------------------------

Key: LUCENE-2287
URL: https://issues.apache.org/jira/browse/LUCENE-2287
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/highlighter
Affects Versions: 2.9.1
Environment: Linux, Solaris, Windows
Reporter: Michael Goddard
Priority: Minor
Attachments: LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch
Original Estimate: 336h
Remaining Estimate: 336h

I haven't yet been able to resolve why I'm seeing spurious highlighting in nested SpanQuery instances. Briefly, the issue is illustrated by the second instance of "Lucene" being highlighted in the test below, when it doesn't satisfy the inner span. There's been some discussion about this on the java-dev list, and I'm opening this issue now because I have made some initial progress on this.
This new test, added to the HighlighterTest class in lucene_2_9_1, illustrates this:

/*
 * Ref: http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
 */
public void testHighlightingNestedSpans2() throws Exception {

  String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was"; // Problem
  //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was"; // Works okay

  String fieldName = "SOME_FIELD_NAME";

  SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term(fieldName, "lucene")),
    new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);

  Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
    new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);

  String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and Lucene great <B>Hadoop</B> was";
  //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and the great <B>Hadoop</B> was";

  String observed = highlightField(query, fieldName, theText);
  System.out.println("Expected: \"" + expected + "\n" + "Observed: \"" + observed);

  assertEquals("Why is that second instance of the term \"Lucene\" highlighted?", expected, observed);
}

Is this an issue that's arisen before? I've been reading through the source to QueryScorer, WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and NearSpansOrdered, but haven't found the solution yet. Initially, I thought that the extractWeightedSpanTerms method in WeightedSpanTermExtractor should be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't get me too far.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
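The expectation in the test above can be checked by hand on token positions. The sketch below does not use Lucene at all; the class and method names (NestedSpanSketch, orderedNear, termsInsideRange) are hypothetical, and the matcher is a simplified stand-in for ordered span-near matching that measures slop as the gap between the end of one clause and the start of the next. It compares the positions that actually took part in the nested match with the positions of every query term that merely falls inside the outer span. That the highlighter works from the outer span's position range is only an assumption about the cause; it has not been confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NestedSpanSketch {
    /** A matched span [start, end) plus the token positions that produced it. */
    static final class Span {
        final int start;
        final int end;
        final List<Integer> hits;

        Span(int start, int end, List<Integer> hits) {
            this.start = start;
            this.end = end;
            this.hits = hits;
        }
    }

    /** One single-token span for every occurrence of term. */
    static List<Span> term(String[] tokens, String term) {
        List<Span> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                out.add(new Span(i, i + 1, Arrays.asList(i)));
            }
        }
        return out;
    }

    /** Simplified ordered near: b starts at or after a ends, with at most slop tokens between. */
    static List<Span> orderedNear(List<Span> first, List<Span> second, int slop) {
        List<Span> out = new ArrayList<>();
        for (Span a : first) {
            for (Span b : second) {
                int gap = b.start - a.end;
                if (gap >= 0 && gap <= slop) {
                    List<Integer> hits = new ArrayList<>(a.hits);
                    hits.addAll(b.hits);
                    out.add(new Span(a.start, b.end, hits));
                }
            }
        }
        return out;
    }

    /** Positions of any of the given terms inside the span's range, matched or not. */
    static List<Integer> termsInsideRange(String[] tokens, Span span, String... terms) {
        List<String> wanted = Arrays.asList(terms);
        List<Integer> out = new ArrayList<>();
        for (int i = span.start; i < span.end; i++) {
            if (wanted.contains(tokens[i])) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens =
            "the lucene was made by doug cutting and lucene great hadoop was".split(" ");
        List<Span> inner = orderedNear(term(tokens, "lucene"), term(tokens, "doug"), 5);
        List<Span> outer = orderedNear(inner, term(tokens, "hadoop"), 4);
        Span match = outer.get(0);
        System.out.println("participating positions: " + match.hits);
        System.out.println("query terms in range:    "
            + termsInsideRange(tokens, match, "lucene", "doug", "hadoop"));
    }
}
```

With the problem text, the participating positions are 1, 5 and 10, while the query terms inside the outer range are 1, 5, 8 and 10. Position 8 is the second "lucene", which is exactly the term the test does not expect to see highlighted.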
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990038#comment-12990038 ]

Salman Akram commented on SOLR-1604:

Reminder! Any updates regarding integration with CommonGrams?

Thanks

Wildcards, ORs etc inside Phrase Queries

Key: SOLR-1604
URL: https://issues.apache.org/jira/browse/SOLR-1604
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
Fix For: Next
Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, SOLR-1604.patch

Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries.
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985771#action_12985771 ]

Salman Akram commented on SOLR-1604:

Any updates on integration with CommonGrams?

Thanks

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984786#action_12984786 ]

Salman Akram commented on SOLR-1604:

Ahmet, I am waiting for your response on CommonGrams. I would be grateful if you could look into it this weekend.

Thanks!
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984820#action_12984820 ]

Salman Akram commented on SOLR-1604:

I will ask this on the mailing list as well, but since it is related to this patch I wanted to check here: would this patch work with SurroundQueryParser, or does Surround already provide this functionality itself? The functionality in this patch is really important for me.

Thanks a lot!
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983208#action_12983208 ]

Salman Akram commented on SOLR-1604:

I tried the patch with the latest non-grayed file, but inOrder still doesn't seem to have any impact. Results for "a b"~5 and "b a"~5 are still different.

Also, is there any feedback about CommonGrams integration?

Thanks a lot for all the help!
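For reference, the behaviour being asked for can be stated on plain token arrays. The sketch below is only an illustration with hypothetical names (ProximityOrderSketch, orderedNear, unorderedNear); it is not the patch's code and does not call Lucene. It assumes slop counts the tokens between the two terms, as in a two-clause SpanNearQuery(clauses, slop, inOrder). If inOrder=false were taking effect, "a b"~5 and "b a"~5 would be expected to behave like unorderedNear and return the same documents.

```java
class ProximityOrderSketch {
    /** True if first occurs before second with at most slop tokens between them. */
    static boolean orderedNear(String[] tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(first)) {
                continue;
            }
            for (int j = i + 1; j < tokens.length; j++) {
                if (tokens[j].equals(second) && j - i - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Unordered proximity: either term may come first. */
    static boolean unorderedNear(String[] tokens, String x, String y, int slop) {
        return orderedNear(tokens, x, y, slop) || orderedNear(tokens, y, x, slop);
    }

    public static void main(String[] args) {
        String[] doc = "b x x a".split(" ");
        // Ordered matching depends on which term is written first in the query.
        System.out.println("ordered   a..b: " + orderedNear(doc, "a", "b", 5));
        System.out.println("ordered   b..a: " + orderedNear(doc, "b", "a", 5));
        // Unordered matching gives the same answer for both spellings.
        System.out.println("unordered a, b: " + unorderedNear(doc, "a", "b", 5));
        System.out.println("unordered b, a: " + unorderedNear(doc, "b", "a", 5));
    }
}
```

On the document "b x x a", the ordered check fails for a..b and succeeds for b..a, while the unordered check succeeds for both, which is the symmetry the report says is missing.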
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982651#action_12982651 ]

Salman Akram commented on SOLR-1604:

I am trying to use CommonGrams with this patch, but it doesn't seem to work. If I don't add {!complexphrase}, the query goes through CommonGramsQueryFilterFactory and proper bi-grams are made, but of course this patch is not used. If I add {!complexphrase}, the query is simply handled the old way, i.e. CommonGrams is ignored.

Can you please help me combine both of these features?
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980660#action_12980660 ]

Salman Akram commented on SOLR-1604:

I integrated the patch and it is working fine; however, there were a couple of issues.

One is already resolved with the above un-ordered proximity parameters.

The other is that although proximity search works with phrases, it is not very accurate. For example, if I want to search for "a b" within 10 words of "c", the query ends up being "a b c"~10, but this also returns cases where a is not necessarily next to b. Any document where these 3 words are within 10 words of each other will match.

Is it possible in Solr to do what I described above? Is there any other patch? Something like "a b" c ~10...

Thanks!
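The difference between the two readings can be made concrete on token positions. This is a rough sketch with hypothetical names (PhraseProximitySketch, flatNear, phraseNear); it does not use Solr or Lucene, and flatNear approximates a sloppy three-term query by a simple window check rather than reproducing the real sloppy-phrase scoring. flatNear corresponds to what "a b c"~10 is reported to do, and phraseNear to the requested behaviour of keeping a and b adjacent.

```java
class PhraseProximitySketch {
    /** Flat reading: a, b and c all fit in one window with at most slop other tokens. */
    static boolean flatNear(String[] t, String a, String b, String c, int slop) {
        for (int i = 0; i < t.length; i++) {
            if (!t[i].equals(a)) continue;
            for (int j = 0; j < t.length; j++) {
                if (!t[j].equals(b)) continue;
                for (int k = 0; k < t.length; k++) {
                    if (!t[k].equals(c)) continue;
                    int min = Math.min(i, Math.min(j, k));
                    int max = Math.max(i, Math.max(j, k));
                    // Window width minus the three matched tokens.
                    if (max - min + 1 - 3 <= slop) return true;
                }
            }
        }
        return false;
    }

    /** Nested reading: a immediately followed by b, and c within slop tokens of that pair. */
    static boolean phraseNear(String[] t, String a, String b, String c, int slop) {
        for (int i = 0; i + 1 < t.length; i++) {
            if (!t[i].equals(a) || !t[i + 1].equals(b)) continue;
            for (int k = 0; k < t.length; k++) {
                if (!t[k].equals(c)) continue;
                // Tokens between the phrase and c, on whichever side c lies.
                int gap = k > i + 1 ? k - (i + 2) : i - (k + 1);
                if (gap >= 0 && gap <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] scattered = "a x x b x c".split(" ");
        String[] adjacent = "a b x x c".split(" ");
        System.out.println("scattered flat:   " + flatNear(scattered, "a", "b", "c", 10));
        System.out.println("scattered nested: " + phraseNear(scattered, "a", "b", "c", 10));
        System.out.println("adjacent flat:    " + flatNear(adjacent, "a", "b", "c", 10));
        System.out.println("adjacent nested:  " + phraseNear(adjacent, "a", "b", "c", 10));
    }
}
```

The scattered document matches only under the flat reading, while the adjacent document matches under both, so the nested check is the stricter one the comment is asking for.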
[jira] Issue Comment Edited: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980660#action_12980660 ]

Salman Akram edited comment on SOLR-1604 at 1/12/11 6:37 AM:

I integrated the patch and it is working fine; however, there were a couple of issues.

One is already resolved with the above un-ordered proximity parameters.

The other is that although proximity search works with phrases, it is not very accurate. For example, if I want to search for "a b" within 10 words of "c", the query ends up being "a b c"~10, but this also returns cases where a is not necessarily next to b. Any document where these 3 words are within 10 words of each other will match.

Is it possible in Solr to do what I described above? Is there any other patch? Something like "a b" c ~10...

Note: I was going through LUCENE-1486, and there Ahmet mentioned that 'Specifically : "(john johathon) smith"~10 works perfectly.' For me there seems to be no difference whether I put the parentheses in or not.

Thanks!
[jira] Issue Comment Edited: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980660#action_12980660 ]

Salman Akram edited comment on SOLR-1604 at 1/12/11 6:41 AM:

I integrated the patch and it is working fine; however, there were a couple of issues.

One is related to un-ordered proximity, which seems to be fixed with the inOrder parameter, but it is not working for me (it doesn't give any error, but matching is still ordered). I will try to get the patch again, because I merged it in early Nov, so maybe the parameter was added after that.

The other issue is that although proximity search works with phrases, it is not very accurate. For example, if I want to search for "a b" within 10 words of "c", the query ends up being "a b c"~10, but this also returns cases where a is not necessarily next to b. Any document where these 3 words are within 10 words of each other will match.

Is it possible in Solr to do what I described above? Is there any other patch? Something like "a b" c ~10...

Note: I was going through LUCENE-1486, and there Ahmet mentioned that 'Specifically : "(john johathon) smith"~10 works perfectly.' For me there seems to be no difference whether I put the parentheses in or not.

Thanks!
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980853#action_12980853 ]

Salman Akram commented on SOLR-1604:

I am using Solr 1.4.1 but integrated this patch in early Nov, so maybe you committed the inOrder parameter after that?

When you say "Regarding parenthesis inside quotes...": if this works and groups the words in the phrase together, won't it work for my case, e.g. "(a b) c"~10?

I guess that if SurroundQuery doesn't use any analyzer, it would be very difficult to make the existing queries work (I am using Standard Analyzer).