[jira] Commented: (LUCENE-2094) Prepare CharArraySet for Unicode 4.0

Simon Willnauer (JIRA) Sun, 29 Nov 2009 04:02:46 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783392#action_12783392
 ]


Simon Willnauer commented on LUCENE-2094:
-----------------------------------------

bq. Why do you use Version.LUCENE_CURRENT for all predefined stop word sets 
(ok, they do not need a match version, because they are already lowercased). 

1. They do not ignore case at all, so the version will not affect those sets.
2. They are private and we have full control over the sets. They are all 
lowercased (as you figured correctly) and none of them contains any 
supplementary characters.
3. They are static and private, so passing any user-supplied version is not 
feasible.

bq. In my opinion the whole stuff is only needed for CharArraySets which are 
not already lowercased. So is there any CharArraySet in Lucene with predefined 
stop words that is not lowercased?

Either way, whether the set is lowercased or not, the lowercasing is also 
applied to the values checked against the set.
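
To illustrate the last point, here is a minimal sketch of an ignore-case set. 
It is not the CharArraySet implementation; the class name IgnoreCaseSetSketch 
and its methods are hypothetical, and it is backed by a plain HashSet. It only 
shows that with ignore-case enabled the lookup key is lowercased too, so the 
lowercasing rules matter even when every stored entry is already lowercase.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an ignore-case set; NOT Lucene's CharArraySet.
public class IgnoreCaseSetSketch {
    private final Set<String> entries = new HashSet<>();
    private final boolean ignoreCase;

    public IgnoreCaseSetSketch(boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
    }

    // The same normalization is applied to stored entries and to lookup keys.
    private String normalize(String s) {
        return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
    }

    public boolean add(String s) {
        return entries.add(normalize(s));
    }

    public boolean contains(String s) {
        return entries.contains(normalize(s));
    }

    public static void main(String[] args) {
        IgnoreCaseSetSketch ignoring = new IgnoreCaseSetSketch(true);
        ignoring.add("the"); // entry is already lowercase
        System.out.println(ignoring.contains("THE")); // true: the key is lowercased

        IgnoreCaseSetSketch exact = new IgnoreCaseSetSketch(false);
        exact.add("the");
        System.out.println(exact.contains("THE")); // false: no lowercasing at all
    }
}
```

The second case corresponds to point 1 above: a set that does not ignore case 
never lowercases anything, so the lowercasing rules cannot affect it.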

> Prepare CharArraySet for Unicode 4.0
> ------------------------------------
>
>                 Key: LUCENE-2094
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2094
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.3.3, 2.4, 
> 2.4.1, 2.4.2, 2.9, 2.9.1, 2.9.2, 3.0, 3.0.1, 3.1
>            Reporter: Simon Willnauer
>             Fix For: 3.1
>
>         Attachments: LUCENE-2094.patch, LUCENE-2094.txt, LUCENE-2094.txt, 
> LUCENE-2094.txt
>
>
> CharArraySet lowercases entries if created with the corresponding flag. As a 
> result, String / char[] values with Unicode 4.0 characters that are in the 
> set cannot be retrieved in "ignorecase" mode.
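
The effect described in the issue can be reproduced with plain JDK calls. The 
sketch below is not taken from the patch; the class and method names are 
hypothetical. It assumes the failure comes from lowercasing one UTF-16 char at 
a time, and compares that with lowercasing whole code points, using U+10400 
(DESERET CAPITAL LETTER LONG I) as a supplementary character that has a 
lowercase form.

```java
public class SupplementaryLowercase {

    // Lowercases one UTF-16 code unit at a time. Surrogate halves have no
    // case mapping, so supplementary characters pass through unchanged.
    static String lowerByChar(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    // Lowercases whole code points, so surrogate pairs are handled as one unit.
    static String lowerByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String upper = new String(Character.toChars(0x10400)); // capital form
        String lower = new String(Character.toChars(0x10428)); // small form

        System.out.println(lowerByChar(upper).equals(lower));      // false
        System.out.println(lowerByCodePoint(upper).equals(lower)); // true
    }
}
```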

