[jira] [Commented] (SOLR-219) Determine if prefix, wildcard, fuzzy queries should be lowercased

2011-04-04 Thread Patrick Allaert (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015333#comment-13015333
 ] 

Patrick Allaert commented on SOLR-219:
--

Any plan to implement this?

 Determine if prefix, wildcard, fuzzy queries should be lowercased
 -

 Key: SOLR-219
 URL: https://issues.apache.org/jira/browse/SOLR-219
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
Priority: Minor
 Fix For: Next

 Attachments: lowercase_prefix.patch, wildcardlowercase.patch


 Solr should be able to do the right thing when doing prefix/wildcard/fuzzy 
 queries on fields with respect to lowercasing or not.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Handling wildcard search containing special characters (unicode)

2011-03-31 Thread Patrick ALLAERT
Hello,

Facing a Solr issue, I have been told that queries with a term like:
Kiinteistösih*
will not match the Finnish word Kiinteistösihteeri and that it's a
known limitation of Lucene.
Searching for the full word directly, without a wildcard, works.

Can you confirm that this is a known limitation/bug?
If so, is there a registered issue for it?

Searching the ML archive and the issue tracker in both the SOLR and LUCENE
projects did not turn up a pointer to this problem.

One of the references I found on the web discussing this problem is:
http://forum.compass-project.org/message.jspa?messageID=227709
But again, no pointer to a discussion or issue.

Thanks in advance for your help,
Patrick




Re: Handling wildcard search containing special characters (unicode)

2011-03-31 Thread Patrick ALLAERT
2011/3/31 Robert Muir <rcm...@gmail.com>:
> On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT
> <patrick.alla...@gmail.com> wrote:
>> Hello,
>>
>> Facing a Solr issue, I have been told that queries with a term like:
>> Kiinteistösih*
>> will not match the Finnish word Kiinteistösihteeri and that it's a
>> known limitation of Lucene.
>> Instead, using the word directly, without wildcard, works.
>>
>> Do you confirm this a known limitation/bug?
>> If so do you have any registered issue about that?
>
> this isn't the case, there's no unicode limitation here.
>
> more likely, your analyzer is configured to lowercase text, so in the
> index Kiinteistösihteeri is really kiinteistösihteeri
> in other words, try kiinteistösih* and see how that works.

Following your suggestion, I tested with:
kiinteistösih*

but it did not return the intended result.

I found the reason: the ISOLatin1AccentFilterFactory filter is present
in both the index and query analyzers, so the indexed term has its
accents removed.
Searching with:
kiinteistosih*
did the trick.
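For illustration, the manual normalization that made the query match can be
sketched as follows. This is a minimal Python sketch, not Solr code: the
function name fold_wildcard_term is hypothetical, and Unicode NFD
decomposition is used here only as an approximation of what
LowerCaseFilterFactory and ISOLatin1AccentFilterFactory do. Characters
without a canonical decomposition (for example ß or æ) are left unchanged
by this sketch, so its output may differ from the Solr filter for such input.

```python
import unicodedata


def fold_wildcard_term(term: str) -> str:
    """Lowercase a term and strip combining accents.

    Hypothetical helper: approximates lowercasing plus accent folding.
    Wildcard characters such as '*' and '?' pass through unchanged.
    """
    # NFD splits 'ö' into 'o' followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFD", term.lower())
    # Drop the combining marks, keeping base letters and wildcards.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_wildcard_term("Kiinteistösih*"))  # kiinteistosih*
```

Applying the same folding on the client side before sending a wildcard
query gives a term with the same case and accent treatment as the one
stored in the index.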

One question remains: why do I have to lowercase terms containing a
wildcard and do the ISO Latin1 accent conversion myself when I have:
<analyzer type="query">
...
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
...
for the corresponding fieldType?
I would have guessed it would do that for me.

Your reply helped me a lot in understanding what's going on.
Thank you very much for your participation!

Patrick
