[jira] Issue Comment Edited: (LUCENE-2094) Prepare CharArraySet for Unicode 4.0

Simon Willnauer (JIRA) Thu, 07 Jan 2010 11:55:16 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797772#action_12797772
 ]


Simon Willnauer edited comment on LUCENE-2094 at 1/7/10 7:53 PM:
-----------------------------------------------------------------

bq. Simon, I think Yonik refers to this code in StopFilter itself:
I see. The problem with this piece of code is that it takes the ignoreCase
flag, which would be ignored if I did not create such a set. As far as I can
see, even the previous version did not really do what the javadoc says.
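{code}
if (stopWords instanceof CharArraySet) {
  // Already a CharArraySet: reused as-is, so the ignoreCase argument is not applied.
  this.stopWords = (CharArraySet) stopWords;
} else {
  // Any other collection is copied into a new CharArraySet built with the flag.
  this.stopWords = new CharArraySet(stopWords.size(), ignoreCase);
  this.stopWords.addAll(stopWords);
}
{code}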

I agree we should prevent this costly operation, but that does not seem to be
easy. My first impression is to deprecate the ctors that take the ignoreCase
boolean and fix the documentation to say that a CharArraySet should be passed
if case should be ignored. At the same time we should add a getter to
CharArraySet and only create a new set if the given boolean and the ignoreCase
member of the CharArraySet do not match, provided the argument is an instance
of CharArraySet.
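
The getter-based check proposed above could look roughly like the following. This is only a sketch under stated assumptions, not the actual patch: CaseAwareSet is a hypothetical stand-in for CharArraySet that models nothing but the ignoreCase flag, and the names isIgnoreCase() and resolve() are invented for illustration.

```java
import java.util.AbstractSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for CharArraySet; it only models the ignoreCase flag.
class CaseAwareSet extends AbstractSet<String> {
  private final Set<String> entries = new HashSet<>();
  private final boolean ignoreCase;

  CaseAwareSet(boolean ignoreCase) {
    this.ignoreCase = ignoreCase;
  }

  // The proposed getter (the name is an assumption).
  boolean isIgnoreCase() {
    return ignoreCase;
  }

  @Override
  public boolean add(String word) {
    return entries.add(normalize(word));
  }

  @Override
  public boolean contains(Object o) {
    return o instanceof String && entries.contains(normalize((String) o));
  }

  @Override
  public Iterator<String> iterator() {
    return entries.iterator();
  }

  @Override
  public int size() {
    return entries.size();
  }

  private String normalize(String word) {
    return ignoreCase ? word.toLowerCase(Locale.ROOT) : word;
  }
}

class StopWordSets {
  // Reuse the caller's set when its flag already matches; copy only otherwise.
  static CaseAwareSet resolve(Set<String> stopWords, boolean ignoreCase) {
    if (stopWords instanceof CaseAwareSet) {
      CaseAwareSet existing = (CaseAwareSet) stopWords;
      if (existing.isIgnoreCase() == ignoreCase) {
        return existing; // no copy needed
      }
    }
    CaseAwareSet copy = new CaseAwareSet(ignoreCase);
    copy.addAll(stopWords); // the costly step, now taken only on a mismatch
    return copy;
  }
}
```

With this shape the copy happens only when the argument is not such a set at all or when its flag disagrees with the requested one.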

This should also be backported to 2.9 / 3.0 so that Solr can at least fix
things where possible.



  
> Prepare CharArraySet for Unicode 4.0
> ------------------------------------
>
>                 Key: LUCENE-2094
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2094
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Simon Willnauer
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.patch, 
> LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.txt, 
> LUCENE-2094.txt, LUCENE-2094.txt
>
>
> CharArraySet performs lowercasing if created with the corresponding flag. As
> a result, String / char[] values containing Unicode 4 chars that are in the
> set cannot be retrieved in "ignorecase" mode.
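
To illustrate the kind of mismatch the description refers to, here is a small standalone sketch. It is not CharArraySet code; it assumes the lowercasing is done one char at a time, and the class and method names are made up for the example. It uses U+10400 (DESERET CAPITAL LETTER LONG I), whose lowercase form is U+10428.

```java
class SupplementaryLowercase {
  // Lowercases one UTF-16 code unit at a time. Surrogate halves have no case
  // mapping, so a supplementary character passes through unchanged.
  static String lowerByChar(String s) {
    char[] buf = s.toCharArray();
    for (int i = 0; i < buf.length; i++) {
      buf[i] = Character.toLowerCase(buf[i]);
    }
    return new String(buf);
  }

  // Lowercases whole code points, so supplementary characters are mapped too.
  static String lowerByCodePoint(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      sb.appendCodePoint(Character.toLowerCase(cp));
      i += Character.charCount(cp);
    }
    return sb.toString();
  }
}
```

A set that normalizes entries with one of these methods and lookups with the other (or with String.toLowerCase) would fail to find such a term even though it was added.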

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.