Re: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

Jack Krupansky Mon, 15 Apr 2013 17:02:50 -0700

Yes, reset was always "mandatory" from an API contract sense, but not always enforced in a practical sense in 3.x (no uniformly extreme negative consequences), as the original emailer indicated. Now, it is "mandatory" in a practical sense as well (extremely annoying consequences in all cases of a contract violation). So, I should have said that the contract was mandatory but not enforced... which from a practical perspective negates its mandatory contractual value.


-- Jack Krupansky

-----Original Message-----
From: Uwe Schindler
Sent: Monday, April 15, 2013 11:53 AM
To: java-user@lucene.apache.org
Subject: RE: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

Hi,

It was always mandatory! In Lucene 2.x/3.x some Tokenizers just returned bogus, undefined stuff if not correctly reset before usage, especially when Tokenizers are "reused" by the Analyzer, which is now mandatory in 4.x. So we made it throw some Exception (NPE or AIOOBE) in Lucene 4 by initializing the state fields in Lucene 4.0 with some default values that cause the Exception. The Exception is not more specific for performance reasons (it is simply a side effect of the new default values set in the ctor).
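
To illustrate the mechanism described above, here is a minimal sketch in plain Java. It is not Lucene code: SketchTokenizer, ResetContractSketch, and consume are hypothetical names made up for this example. The only idea taken from the thread is that a state field starts at an invalid default, so skipping reset() fails on the first array access instead of returning undefined results.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a tokenizer; an illustration only, not Lucene's implementation. */
class SketchTokenizer {
    private final char[] buffer;
    // Invalid on purpose: only reset() moves the position to a usable value.
    private int pos = -1;
    private String term = "";

    SketchTokenizer(String input) {
        this.buffer = input.toCharArray();
    }

    /** Must be called before the first incrementToken(). */
    void reset() {
        pos = 0;
        term = "";
    }

    /** Advances to the next whitespace-delimited token; returns false at end of input. */
    boolean incrementToken() {
        // If reset() was skipped, pos is still -1 and buffer[-1] throws
        // ArrayIndexOutOfBoundsException; no explicit state check is needed.
        while (pos < buffer.length && Character.isWhitespace(buffer[pos])) {
            pos++;
        }
        if (pos >= buffer.length) {
            return false;
        }
        int start = pos;
        while (pos < buffer.length && !Character.isWhitespace(buffer[pos])) {
            pos++;
        }
        term = new String(buffer, start, pos - start);
        return true;
    }

    String term() {
        return term;
    }
}

class ResetContractSketch {
    /** Collects all tokens, optionally skipping reset() to show the failure. */
    static List<String> consume(String text, boolean callReset) {
        SketchTokenizer tokenizer = new SketchTokenizer(text);
        if (callReset) {
            tokenizer.reset();
        }
        List<String> tokens = new ArrayList<>();
        while (tokenizer.incrementToken()) {
            tokens.add(tokenizer.term());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(consume("this is a test", true)); // prints [this, is, a, test]
        try {
            consume("this is a test", false);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("without reset(): " + e.getClass().getSimpleName());
        }
    }
}
```

With a real Lucene 4.x TokenStream the consumer side follows the same order: call reset() before the first incrementToken(), then end() and close() once the loop is finished.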


-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, April 15, 2013 4:25 PM
To: java-user@lucene.apache.org
Subject: Re: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

I didn't read your code, but do you have the "reset" that is now mandatory
and throws AIOOBE if not present?

-- Jack Krupansky

-----Original Message-----
From: andi rexha
Sent: Monday, April 15, 2013 10:21 AM
To: java-user@lucene.apache.org
Subject: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException

Hi,
I have tried to get all the tokens from a TokenStream in the same way as I
was doing in the 3.x version of Lucene, but now (at least with
WhitespaceTokenizer) I get an exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at java.lang.Character.codePointAtImpl(Character.java:2405)
    at java.lang.Character.codePointAt(Character.java:2369)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)



The code is quite simple, and I thought that it could have worked, but
obviously it doesn't (unless I have made some mistakes).

Here is the code, in case you spot some bugs in it (although it is trivial):

String str = "this is a test";
Reader reader = new StringReader(str);
TokenStream tokenStream = new WhitespaceTokenizer(Version.LUCENE_42, reader);  //tokenStreamAnalyzer.tokenStream("test", reader);
CharTermAttribute attribute = tokenStream.getAttribute(CharTermAttribute.class);
while (tokenStream.incrementToken()) {
    System.out.println(new String(attribute.buffer(), 0, attribute.length()));
}

I hope you have some idea of why this is happening.
Regards,
Andi



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



