[ https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783392#action_12783392 ]
Simon Willnauer commented on LUCENE-2094:
-----------------------------------------

bq. Why do you use Version.LUCENE_CURRENT for all predefined stop word sets (ok, they do not need a match version, because they are already lowercased).

1. They do not ignore case at all, so the version will not affect those sets.
2. They are private and we have full control over the sets. They are all lowercased (as you figured correctly) and none of them contains any supplementary character.
3. They are static and private, so passing any user-supplied version is not feasible.

bq. In my opinion the whole stuff is only needed for CharArraySets which are not already lowercased. So is there any CharArraySet in Lucene with predefined stop words that is not lowercased?

Either way, whether the set is lowercased or not, the lowercasing is also applied to the values checked against the set.

> Prepare CharArraySet for Unicode 4.0
> ------------------------------------
>
>                 Key: LUCENE-2094
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2094
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.3.3, 2.4, 2.4.1, 2.4.2, 2.9, 2.9.1, 2.9.2, 3.0, 3.0.1, 3.1
>            Reporter: Simon Willnauer
>             Fix For: 3.1
>
>         Attachments: LUCENE-2094.patch, LUCENE-2094.txt, LUCENE-2094.txt, LUCENE-2094.txt
>
>
> CharArraySet lowercases its entries if created with the corresponding flag. As a
> result, String / char[] values containing Unicode 4.0 supplementary characters that
> are in the set cannot be retrieved in "ignorecase" mode.
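
For readers following the issue, here is a minimal sketch of why lowercasing one char at a time breaks for supplementary characters. It uses only the JDK and is not the code from the attached patch; the class and method names (SupplementaryLowercase, lowerPerChar, lowerPerCodePoint) are hypothetical and chosen for illustration. A supplementary character is stored as a surrogate pair, and the individual surrogate halves have no case mapping, so a char-based loop leaves them unchanged, while a code point based loop folds them as expected.

```java
import java.util.Arrays;

public class SupplementaryLowercase {

    // Lowercases one char at a time. Surrogate halves have no case mapping,
    // so a supplementary character passes through unchanged.
    static char[] lowerPerChar(char[] text) {
        char[] out = new char[text.length];
        for (int i = 0; i < text.length; i++) {
            out[i] = Character.toLowerCase(text[i]);
        }
        return out;
    }

    // Lowercases one code point at a time, so a surrogate pair is folded
    // as a single character.
    static char[] lowerPerCodePoint(char[] text) {
        StringBuilder sb = new StringBuilder(text.length);
        for (int i = 0; i < text.length; ) {
            int cp = Character.codePointAt(text, i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString().toCharArray();
    }

    public static void main(String[] args) {
        char[] upper = Character.toChars(0x10400); // DESERET CAPITAL LETTER LONG I
        char[] lower = Character.toChars(0x10428); // DESERET SMALL LETTER LONG I

        System.out.println(Arrays.equals(lowerPerChar(upper), lower));      // false
        System.out.println(Arrays.equals(lowerPerCodePoint(upper), lower)); // true
    }
}
```

Under this sketch, a set that stores the lowercase form and folds lookups per char would never match the uppercase supplementary input, which is the retrieval failure the issue description refers to.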