Yes, reset was always "mandatory" in an API contract sense, but in 3.x it was not always enforced in a practical sense (no uniformly extreme negative consequences), as the original emailer indicated. Now it is "mandatory" in a practical sense as well (extremely annoying consequences in all cases of a contract violation). So, I should have said that the contract was mandatory but not enforced... which from a practical perspective negates its mandatory contractual value.

-- Jack Krupansky

-----Original Message----- From: Uwe Schindler
Sent: Monday, April 15, 2013 11:53 AM
To: java-user@lucene.apache.org
Subject: RE: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

Hi,

It was always mandatory! In Lucene 2.x/3.x some Tokenizers just returned bogus, undefined stuff if not correctly reset before usage, especially when Tokenizers are "reused" by the Analyzer, which is now mandatory in 4.x. So in Lucene 4.0 we made it throw an exception (NPE or AIOOBE) by initializing the state fields with default values that cause the exception. The exception is not more specific for performance reasons: there is no explicit check, it is simply a side effect of the new default values set in the constructor.
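
For illustration only, here is a minimal sketch of that mechanism using a toy class, not Lucene's actual code. `ResetContractDemo` and `ToyTokenizer` are hypothetical names, and the sentinel value -1 is an assumption chosen to mirror the index in the reported stack trace. The constructor leaves the offset invalid, only reset() makes it valid, and the consumer follows the order reset(), incrementToken() until it returns false, end(), close(), which is the same order the TokenStream contract asks for.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene code. All names here are hypothetical.
class ResetContractDemo {

    /** Whitespace tokenizer whose state is deliberately invalid until reset(). */
    static class ToyTokenizer {
        private final char[] input;
        private int offset = -1;   // sentinel default; only reset() makes it valid
        private String term = "";

        ToyTokenizer(String text) {
            this.input = text.toCharArray();
        }

        /** Must be called before the first incrementToken(). */
        void reset() {
            offset = 0;
            term = "";
        }

        /** Advances to the next token; returns false when the input is exhausted. */
        boolean incrementToken() {
            // No explicit "was reset() called?" check on the hot path:
            // with offset == -1 the first array read below throws
            // ArrayIndexOutOfBoundsException as a side effect.
            int i = offset;
            while (i < input.length && Character.isWhitespace(input[i])) {
                i++;                       // skip leading whitespace
            }
            int start = i;
            while (i < input.length && !Character.isWhitespace(input[i])) {
                i++;                       // consume the token characters
            }
            offset = i;
            if (start == i) {
                return false;              // nothing left
            }
            term = new String(input, start, i - start);
            return true;
        }

        String term() { return term; }

        void end() { /* end-of-stream bookkeeping would go here */ }

        void close() { /* resources would be released here */ }
    }

    /** Consumes a tokenizer in the required order. */
    static List<String> tokenize(String text) {
        ToyTokenizer ts = new ToyTokenizer(text);
        List<String> tokens = new ArrayList<>();
        ts.reset();                        // the step missing from the failing code
        while (ts.incrementToken()) {
            tokens.add(ts.term());
        }
        ts.end();
        ts.close();
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("this is a test"));   // prints [this, is, a, test]

        try {
            new ToyTokenizer("this is a test").incrementToken();   // reset() skipped
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOBE without reset()");
        }
    }
}
```

Applied to the code in the original question, the same order would mean calling tokenStream.reset() before the while loop, and tokenStream.end() and tokenStream.close() after it.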

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, April 15, 2013 4:25 PM
To: java-user@lucene.apache.org
Subject: Re: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

I didn't read your code, but do you have the "reset" that is now mandatory
and throws AIOOBE if not present?

-- Jack Krupansky

-----Original Message-----
From: andi rexha
Sent: Monday, April 15, 2013 10:21 AM
To: java-user@lucene.apache.org
Subject: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

Hi,
I have tried to get all the tokens from a TokenStream in the same way as I
was doing in the 3.x version of Lucene, but now (at least with
WhitespaceTokenizer) I get an exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at java.lang.Character.codePointAtImpl(Character.java:2405)
    at java.lang.Character.codePointAt(Character.java:2369)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)



The code is quite simple, and I thought that it could have worked, but
obviously it doesn't (unless I have made some mistakes).

Here is the code, in case you spot some bugs in it (although it is trivial):
        String str = "this is a test";
        Reader reader = new StringReader(str);
        TokenStream tokenStream = new WhitespaceTokenizer(Version.LUCENE_42, reader);  //tokenStreamAnalyzer.tokenStream("test", reader);
        CharTermAttribute attribute = tokenStream.getAttribute(CharTermAttribute.class);
        while (tokenStream.incrementToken()) {
            System.out.println(new String(attribute.buffer(), 0, attribute.length()));
        }

I hope you have some idea of why this is happening.
Regards,
Andi



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


