CharArraySet behaves inconsistently in add(Object) and contains(Object)
-----------------------------------------------------------------------

                 Key: LUCENE-1502
                 URL: https://issues.apache.org/jira/browse/LUCENE-1502
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
            Reporter: Shai Erera
             Fix For: 2.4.1, 2.9


CharArraySet's add(Object) method looks like this:
    if (o instanceof char[]) {
      return add((char[])o);
    } else if (o instanceof String) {
      return add((String)o);
    } else if (o instanceof CharSequence) {
      return add((CharSequence)o);
    } else {
      return add(o.toString());
    }
You'll notice that in the case of an Object (for example, Integer), the 
o.toString() is added. However, its contains(Object) method looks like this:
    if (o instanceof char[]) {
      char[] text = (char[])o;
      return contains(text, 0, text.length);
    } else if (o instanceof CharSequence) {
      return contains((CharSequence)o);
    }
    return false;
In case of contains(Integer), it always returns false. I've added a simple test 
to TestCharArraySet, which reproduces the problem:
  public void testObjectContains() {
    CharArraySet set = new CharArraySet(10, true);
    Integer val = new Integer(1);
    set.add(val);
    assertTrue(set.contains(val));
    assertTrue(set.contains(new Integer(1)));
  }
Changing contains(Object) to this, solves the problem:
    if (o instanceof char[]) {
      char[] text = (char[])o;
      return contains(text, 0, text.length);
    } 
    return contains(o.toString());

The patch also includes few minor improvements (which were discussed on the 
mailing list) such as the removal of the following dead code from 
getHashCode(CharSequence):
      if (false && text instanceof String) {
        code = text.hashCode();
and simplifying add(Object):
    if (o instanceof char[]) {
      return add((char[])o);
    }
    return add(o.toString());
(which also aligns with the equivalent contains() method).

One thing that's still left open is whether we can avoid the calls to 
Character.toLowerCase calls in all the char[] array methods, by first 
converting the char[] to lowercase, and then passing it through the equals() 
and getHashCode() methods. It works for add(), but fails for contains(char[]) 
since it modifies the input array.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to