[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased
[ https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015333#comment-13015333 ]

Patrick Allaert commented on SOLR-219:
--

Any plan to implement this?

> Determine if prefix, wildcard, fuzzy queries should be lowercased
>
>                 Key: SOLR-219
>                 URL: https://issues.apache.org/jira/browse/SOLR-219
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Yonik Seeley
>            Priority: Minor
>             Fix For: Next
>         Attachments: lowercase_prefix.patch, wildcardlowercase.patch
>
> Solr should be able to do the right thing when doing prefix/wildcard/fuzzy queries on fields with respect to lowercasing or not.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Handling wildcard search containing special characters (unicode)
Hello,

While investigating a Solr issue, I was told that queries with a term like:

  Kiinteistösih*

will not match the Finnish word Kiinteistösihteeri, and that this is a known limitation of Lucene. Searching for the whole word directly, without a wildcard, does work.

Can you confirm this is a known limitation/bug? If so, is there a registered issue for it? Searching the mailing list archive and the issue trackers of both the Solr and Lucene projects didn't turn up any pointer to this problem.

One of the references I found on the web discussing this problem is:
http://forum.compass-project.org/message.jspa?messageID=227709
But again, it gives no pointer to a discussion or issue.

Thanks in advance for your help,
Patrick
Re: Handling wildcard search containing special characters (unicode)
2011/3/31 Robert Muir <rcm...@gmail.com>:
> On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT <patrick.alla...@gmail.com> wrote:
>> Hello,
>>
>> While investigating a Solr issue, I was told that queries with a term like:
>> Kiinteistösih*
>> will not match the Finnish word Kiinteistösihteeri, and that this is a known limitation of Lucene.
>> Searching for the whole word directly, without a wildcard, does work.
>> Can you confirm this is a known limitation/bug? If so, is there a registered issue for it?
>
> This isn't the case; there's no Unicode limitation here.
> More likely, your analyzer is configured to lowercase text, so in the index Kiinteistösihteeri is really kiinteistösihteeri.
> In other words, try kiinteistösih* and see how that works.

Following your suggestion, I tested with:

  kiinteistösih*

but it still doesn't return the intended result. I have found the reason: the ISOLatin1AccentFilterFactory filter is present in both the index and the query analyzer, so the accent is also stripped at index time and the indexed term is kiinteistosihteeri. Searching with:

  kiinteistosih*

did the trick.

One question remains: why should I have to lowercase terms containing a wildcard and do the ISO Latin-1 accent conversion myself, when the corresponding fieldType already has:

  <analyzer type="query">
    ...
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    ...
  </analyzer>

I would have expected the query analyzer to do this for me.

Your reply helped me a lot in understanding what's going on. Thank you very much for your participation!

Patrick
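The mismatch described in this thread can be reproduced outside Solr. The Java program that follows is only a sketch under stated assumptions. The class WildcardTermNormalizer and its methods are hypothetical and not part of Solr or Lucene. prefixMatches is a naive stand-in for a prefix query run against a single indexed term. Accent folding is approximated with java.text.Normalizer (NFD decomposition followed by removal of combining marks) rather than the actual ISOLatin1AccentFilter mapping. As the thread shows, the query analyzer's filters were not applied to the wildcard term, so the sketch normalizes the pattern on the client side to match what the index analyzer stored.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Solr or Lucene: it mimics on the client side
// what the field's LowerCaseFilterFactory + ISOLatin1AccentFilterFactory do at
// index time, because the wildcard term in this thread was not analyzed.
final class WildcardTermNormalizer {

    private WildcardTermNormalizer() {}

    // Lowercase, then strip diacritics. Wildcard characters (* and ?) pass through.
    // Assumption: NFD + removing combining marks approximates ISOLatin1AccentFilter;
    // it agrees for letters such as o-diaeresis but is not the same mapping table.
    static String normalize(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Naive stand-in for a prefix query: a pattern ending in '*' matches any
    // indexed term starting with the text before the '*'; otherwise it must match exactly.
    static boolean prefixMatches(String pattern, String indexedTerm) {
        if (!pattern.endsWith("*")) {
            return pattern.equals(indexedTerm);
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        // The escape \u00f6 is the precomposed o-diaeresis; it avoids source-encoding issues.
        String indexed = normalize("Kiinteist\u00f6sihteeri"); // what the index stores
        System.out.println(indexed);                                                  // kiinteistosihteeri
        System.out.println(prefixMatches("Kiinteist\u00f6sih*", indexed));            // false: case differs
        System.out.println(prefixMatches("kiinteist\u00f6sih*", indexed));            // false: accent differs
        System.out.println(prefixMatches(normalize("Kiinteist\u00f6sih*"), indexed)); // true
    }
}
```

Running main prints kiinteistosihteeri, false, false, true. Only the pattern normalized the same way as the indexed text matches, which is the behaviour Patrick found by hand. Having Solr apply this kind of per-field normalization to prefix/wildcard/fuzzy terms automatically is roughly what SOLR-219 asks for.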