[
https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783392#action_12783392
]
Simon Willnauer commented on LUCENE-2094:
-----------------------------------------
bq. Why do you use Version.LUCENE_CURRENT for all predefined stop word sets?
(OK, they do not need a match version, because they are already lowercased.)
1. They do not ignore case at all, so the version will not affect those sets.
2. They are private and we have full control over the sets. They are all
lowercased (as you correctly figured) and none of them contains any
supplementary character.
3. They are static and private, so passing any user-supplied version is not
feasible.
bq. In my opinion the whole stuff is only needed for CharArraySets which are
not already lowercased. So is there any CharArraySet in Lucene with predefined
stop words that is not lowercased?
Either way, whether the set is lowercased or not, the lowercasing is also
applied to the values checked against the set.
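To illustrate why it does not matter whether the stored words were lowercased beforehand, here is a minimal sketch of a hypothetical ignore-case set. IgnoreCaseSetSketch and its methods are invented for illustration; this is not Lucene's CharArraySet API. It only shows the idea that the same lowercasing routine is applied both when an entry is added and when a value is checked, so "The" and "the" end up as the same key either way.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an ignore-case set; NOT Lucene's CharArraySet API.
final class IgnoreCaseSetSketch {
    private final Set<String> entries = new HashSet<>();

    // Lowercase by code point, so a supplementary character (a surrogate
    // pair in UTF-16) is treated as a single character.
    static String lowerCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp); // advance past the original code point
        }
        return sb.toString();
    }

    void add(String word) {
        entries.add(lowerCase(word)); // the stored entry is lowercased
    }

    boolean contains(String word) {
        return entries.contains(lowerCase(word)); // the probe is lowercased too
    }

    public static void main(String[] args) {
        IgnoreCaseSetSketch set = new IgnoreCaseSetSketch();
        set.add("The"); // not lowercased by the caller
        set.add("and"); // already lowercased
        System.out.println(set.contains("THE")); // true
        System.out.println(set.contains("AND")); // true
        System.out.println(set.contains("or"));  // false
    }
}
```

Because both sides go through the same routine, a predefined set that is already lowercased simply passes through unchanged on insertion.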
> Prepare CharArraySet for Unicode 4.0
> ------------------------------------
>
> Key: LUCENE-2094
> URL: https://issues.apache.org/jira/browse/LUCENE-2094
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.3.3, 2.4,
> 2.4.1, 2.4.2, 2.9, 2.9.1, 2.9.2, 3.0, 3.0.1, 3.1
> Reporter: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2094.patch, LUCENE-2094.txt, LUCENE-2094.txt,
> LUCENE-2094.txt
>
>
> CharArraySet does lowercasing if created with the corresponding flag. As a
> result, a String / char[] containing Unicode 4.0 (supplementary) characters
> that is in the set cannot be retrieved in "ignoreCase" mode.
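The failure mode in the issue description can be sketched by comparing two ways of lowercasing a UTF-16 string. This assumes, as a simplification, that the problematic path lowercases one char (UTF-16 code unit) at a time; the class and method names are hypothetical and not taken from the patch. U+10400 DESERET CAPITAL LETTER LONG I is a supplementary letter whose lowercase form is U+10428. Lowercasing each half of its surrogate pair separately leaves it unchanged, so an ignore-case comparison against the lowercase form fails, while lowercasing by code point succeeds.

```java
// Hypothetical demo class; illustrates char-wise vs. code-point lowercasing.
final class SupplementaryLowerCaseDemo {

    // Char-by-char: each UTF-16 code unit is lowercased on its own. A lone
    // surrogate has no case mapping, so Character.toLowerCase(char) returns
    // it unchanged and supplementary letters are never lowercased.
    static String lowerCaseChars(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    // Code point by code point: surrogate pairs are decoded first, so
    // Character.toLowerCase(int) sees the whole supplementary character.
    static String lowerCaseCodePoints(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String upper = new String(Character.toChars(0x10400)); // DESERET CAPITAL LETTER LONG I
        String lower = new String(Character.toChars(0x10428)); // DESERET SMALL LETTER LONG I
        System.out.println(lowerCaseChars(upper).equals(lower));      // false
        System.out.println(lowerCaseCodePoints(upper).equals(lower)); // true
    }
}
```

This is why a code-point-aware lowercasing step (whose behavior can be selected via the match version) is needed before supplementary characters can be matched case-insensitively.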
--
This message is automatically generated by JIRA.