[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850608#action_12850608 ]
Peter Wolanin edited comment on SOLR-1852 at 3/27/10 11:52 PM:
---------------------------------------------------------------

This patch was created by Mark Miller. It is a backport of Solr trunk code plus a tweak to let 1.4 compile.

With this updated WordDelimiterFilter, if I reindex, the bug seems to be fixed.

As for reproducing the bug's symptoms, it looks as though Identi.ca is treated as a phrase query, as if I had quoted it like "Identi ca". That phrase search also fails. I had expected that Identi.ca would be treated the same as Identi ca (i.e. 2 separate tokens, not a phrase).

      was (Author: pwolanin):
    This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile

With this updated Whitespace Delimiter if I reindex the bug seems to be fixed.

In terms of the bug's symptoms to reproduce it, it looks as though Identi.ca is treated as phrase query as if I had quoted it like "Identi ca". That phrase search also fails. I had expected that Identi.ca would be the same as Identi ca (i.e. 2 separate tokens, not a phrase).

> enablePositionIncrements="true" can cause searches to fail when they are
> parsed as phrase queries
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1852
>                 URL: https://issues.apache.org/jira/browse/SOLR-1852
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Peter Wolanin
>         Attachments: SOLR-1852.patch
>
>
> Symptom: when searching for a string like a domain name containing a '.', the
> Solr 1.4 analyzer tells me that I will get a match, but when I enter the search
> either in the client or directly in Solr, the search fails.
> test string: Identi.ca
> queries that fail: IdentiCa, Identi.ca, Identi-ca
> query that matches: Identi ca
> schema in use is:
> http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1
> Screen shots:
> analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
> dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
> dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
> standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png
> Whether or not the bug appears is determined by the surrounding text:
> "would be great to have support for Identi.ca on the follow block"
> fails to match "Identi.ca", but when the content is on its own or in another
> sentence:
> "Support Identi.ca"
> the search matches. Testing suggests the word "for" is the problem, and it
> looks like the bug occurs when a stop word precedes a word that is split up
> using the word delimiter filter.
> Setting enablePositionIncrements="false" in the stop filter and reindexing
> causes the searches to match.
> According to Mark Miller in #solr, this bug appears to be fixed already in
> Solr trunk, either due to the upgraded Lucene or changes to the
> WordDelimiterFactory
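
To make the reported symptom easier to reason about, here is a small toy model in Python of how a phrase query depends on token positions. It is not Solr or Lucene code. The token streams, the increments in `faulty`, and the helper names `positions_from_increments` and `phrase_matches` are assumptions made up for illustration; the issue does not state which positions Solr 1.4 actually assigns.

```python
def positions_from_increments(tokens):
    """Turn (term, position_increment) pairs into (term, absolute_position)."""
    out, pos = [], 0
    for term, inc in tokens:
        pos += inc
        out.append((term, pos))
    return out


def phrase_matches(indexed, phrase):
    """Return True if the phrase terms occur at consecutive positions.

    indexed: list of (term, position) pairs for one field of one document.
    phrase:  list of terms that must sit at positions p, p+1, p+2, ...
    """
    if not phrase:
        return False
    positions = {}
    for term, pos in indexed:
        positions.setdefault(term, set()).add(pos)
    # Try every occurrence of the first term as a possible phrase start.
    for start in positions.get(phrase[0], ()):
        if all(start + i in positions.get(t, ()) for i, t in enumerate(phrase)):
            return True
    return False


# Hypothetical analysis of "support for Identi.ca" with "for" removed as a
# stop word. With position increments enabled, the token after the removed
# stop word carries an increment of 2 to preserve the gap.
healthy = positions_from_increments([("support", 1), ("identi", 2), ("ca", 1)])

# Assumed faulty stream (illustration only): the gap is also applied to the
# second sub-token, so "identi" and "ca" are no longer adjacent.
faulty = positions_from_increments([("support", 1), ("identi", 2), ("ca", 2)])

# Identi.ca parsed as the phrase "identi ca".
query = ["identi", "ca"]
healthy_result = phrase_matches(healthy, query)
faulty_result = phrase_matches(faulty, query)
```

In this model the phrase matches only when the two sub-tokens end up at adjacent positions, which is consistent with the report that disabling position increments in the stop filter and reindexing makes the searches match.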