Re: Handling wildcard search containing special characters (unicode)
: Facing a Solr issue, I have been told that queries with a term like:
:     Kiinteistösih*
: will not match the Finnish word Kiinteistösihteeri and that it's a
: known limitation of Lucene.

That is a misleading statement: that type of query *can* match that word in a document, if the schema is configured in a way that preserves the raw term.

Where people run into trouble is when they use stemming, lowercasing, ASCII folding, or any other form of analysis at indexing time, because at query time the query parser does not apply analysis to prefix and wildcard searches. (If it did, a search for something like dogs* might stem to dog*, which is not what the user asked for.)

PS... http://people.apache.org/~hossman/#solr-user
Please Use solr-user@lucene Not dev@lucene

Your question is better suited for the solr-user@lucene mailing list ... not the dev@lucene list. The dev list is for discussing development of the internals of Solr and the Lucene Java library ... it is *not* the appropriate place to ask questions about how to use Solr or the Lucene Java library when developing your own applications.

-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
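The stemming trade-off in the parenthetical above can be seen in a toy model. This is plain Java, not Lucene's query parser: the "index" is a list of terms produced by a deliberately naive, hypothetical stemmer (`naiveStem`, which strips one trailing "s"), and a prefix query is a simple `startsWith` check. Leaving dogs* unanalyzed misses the document whose "dogs" was indexed as "dog"; stemming the prefix to dog* also pulls in "dogma", which the user never asked for.

```java
import java.util.List;
import java.util.stream.Collectors;

public class WildcardStemmingDemo {

    // Hypothetical, deliberately naive stemmer: strips one trailing "s".
    static String naiveStem(String token) {
        return token.endsWith("s") ? token.substring(0, token.length() - 1) : token;
    }

    // Toy prefix query: returns every indexed term starting with the prefix.
    static List<String> prefixQuery(List<String> indexedTerms, String prefix) {
        return indexedTerms.stream()
                .filter(term -> term.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Index-time analysis stems every token: "dogs" is stored as "dog".
        List<String> indexedTerms = List.of("dogs", "dogma", "dogsled").stream()
                .map(WildcardStemmingDemo::naiveStem)
                .collect(Collectors.toList()); // [dog, dogma, dogsled]

        String userPrefix = "dogs"; // the user typed dogs*

        // Unanalyzed prefix (what the query parser does): misses "dog".
        System.out.println(prefixQuery(indexedTerms, userPrefix)); // [dogsled]

        // Analyzed prefix: dogs* becomes dog*, which also matches "dogma".
        System.out.println(prefixQuery(indexedTerms, naiveStem(userPrefix))); // [dog, dogma, dogsled]
    }
}
```

Neither choice is right in general, which is why the fix usually belongs in the schema (for example, a field that keeps the raw term) rather than in the query parser.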
Handling wildcard search containing special characters (unicode)
Hello,

Facing a Solr issue, I have been told that queries with a term like:
    Kiinteistösih*
will not match the Finnish word Kiinteistösihteeri and that it's a known limitation of Lucene. Using the word directly, without a wildcard, works.

Can you confirm this is a known limitation/bug? If so, is there a registered issue for it? Searching the mailing list archive and the issue trackers of both the SOLR and LUCENE projects didn't give me a pointer to this problem. One of the references I found on the web talking about this problem is:
http://forum.compass-project.org/message.jspa?messageID=227709
But again, no pointer to a discussion or issue.

Thanks in advance for your help,
Patrick
Re: Handling wildcard search containing special characters (unicode)
On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT <patrick.alla...@gmail.com> wrote:
> Hello,
> Facing a Solr issue, I have been told that queries with a term like:
> Kiinteistösih*
> will not match the Finnish word Kiinteistösihteeri and that it's a known limitation of Lucene.
> Using the word directly, without a wildcard, works.
> Can you confirm this is a known limitation/bug? If so, is there a registered issue for it?

This isn't the case; there's no Unicode limitation here. More likely, your analyzer is configured to lowercase text, so in the index Kiinteistösihteeri is really kiinteistösihteeri.

In other words, try kiinteistösih* and see how that works.
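The lowercasing effect described above can be reproduced with a toy model in plain Java (not Lucene code). Each "field" is just a list of indexed terms: one keeps the raw token, the other stores it after a lowercasing step standing in for `LowerCaseFilterFactory`. A prefix query compares the prefix as typed, with no analysis, so only a prefix written the way the term was indexed can match. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class PrefixMatchDemo {

    // Toy index-time analysis: lowercasing only.
    static String analyze(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Toy prefix query: the prefix is compared as-is, without analysis,
    // mirroring how the query parser leaves prefix/wildcard terms alone.
    static List<String> prefixQuery(List<String> indexedTerms, String rawPrefix) {
        return indexedTerms.stream()
                .filter(term -> term.startsWith(rawPrefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String word = "Kiinteistösihteeri";
        List<String> rawField = List.of(word);                 // term kept as-is
        List<String> lowercasedField = List.of(analyze(word)); // "kiinteistösihteeri"

        System.out.println(prefixQuery(rawField, "Kiinteistösih"));        // [Kiinteistösihteeri]
        System.out.println(prefixQuery(lowercasedField, "Kiinteistösih")); // []
        System.out.println(prefixQuery(lowercasedField, "kiinteistösih")); // [kiinteistösihteeri]
    }
}
```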
Re: Handling wildcard search containing special characters (unicode)
2011/3/31 Robert Muir <rcm...@gmail.com>:
> On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT <patrick.alla...@gmail.com> wrote:
>> Hello,
>> Facing a Solr issue, I have been told that queries with a term like:
>> Kiinteistösih*
>> will not match the Finnish word Kiinteistösihteeri and that it's a known limitation of Lucene.
>> Using the word directly, without a wildcard, works.
>> Can you confirm this is a known limitation/bug? If so, is there a registered issue for it?
>
> This isn't the case; there's no Unicode limitation here. More likely, your
> analyzer is configured to lowercase text, so in the index Kiinteistösihteeri
> is really kiinteistösihteeri.
>
> In other words, try kiinteistösih* and see how that works.

Following your suggestion, I tested with:
    kiinteistösih*
but it doesn't return the intended result. I have found the reason: the ISOLatin1AccentFilterFactory filter is present in both the index and query analyzers. Searching with:
    kiinteistosih*
did the trick.

One question remains: why should I have to lowercase terms containing a wildcard and do the ISO Latin1 accent conversion myself, when I do have:

    <analyzer type="query">
      ...
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      ...

for the corresponding fieldType? I would have guessed it would do that for me.

Your reply helped me a lot in understanding what's going on. Thank you very much for your participation!

Patrick
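As Hoss explains in his reply, the query parser does not run the query analyzer on prefix and wildcard terms, so the lowercase and accent filters never see kiinteistösih*. Doing the conversion by hand, as Patrick did, amounts to applying the index-time chain on the client before sending the query. Below is a minimal sketch, assuming the field's relevant filters are only lowercasing plus ISO Latin-1 accent folding; `normalizeWildcardTerm` is a hypothetical helper, not a Solr API. The folding uses `java.text.Normalizer` and is only an approximation: it turns "ö" into "o", but it does not reproduce mappings such as "ß" to "ss" or "ø" to "o" that ISOLatin1AccentFilterFactory performs, since those characters have no canonical decomposition.

```java
import java.text.Normalizer;
import java.util.Locale;

public class WildcardTermNormalizer {

    // Hypothetical helper approximating LowerCaseFilterFactory followed by
    // ISOLatin1AccentFilterFactory. The wildcard characters '*' and '?'
    // pass through unchanged: they are neither uppercase nor accented.
    static String normalizeWildcardTerm(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        // NFD splits "ö" into "o" + U+0308 (combining diaeresis);
        // removing all combining marks (\p{M}) leaves the base letter.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        for (String q : new String[] {"Kiinteistösih*", "kiinteistösih*", "Kiinteist?sih*"}) {
            System.out.println(q + " -> " + normalizeWildcardTerm(q));
        }
        // Kiinteistösih* -> kiinteistosih*
        // kiinteistösih* -> kiinteistosih*
        // Kiinteist?sih* -> kiinteist?sih*
    }
}
```

If the field's analyzer chain changes (for example, a stemmer is added), a client-side helper like this silently drifts out of sync with the index, so it is worth keeping the two definitions side by side.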