Yes, reset was always "mandatory" in the API contract sense, but in 3.x it
was not always enforced in practice (no uniformly extreme negative
consequences), as the original emailer indicated. Now it is "mandatory" in
a practical sense as well (extremely annoying consequences in all cases of a
contract violation). So, I should have said that the contract was mandatory
but not enforced... which, from a practical perspective, negates its
mandatory contractual value.
-- Jack Krupansky
-----Original Message-----
From: Uwe Schindler
Sent: Monday, April 15, 2013 11:53 AM
To: java-user@lucene.apache.org
Subject: RE: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException
Hi,
It was always mandatory! In Lucene 2.x/3.x some Tokenizers just returned
bogus, undefined stuff if not correctly reset before usage, especially when
Tokenizers are "reused" by the Analyzer, which is now mandatory in 4.x. So
in Lucene 4.0 we made them throw an Exception (NPE or AIOOBE) by
initializing the state fields with default values that cause the Exception.
The Exception is not more specific for performance reasons (there is no
explicit check; it is simply caused by the default values set in the ctor).
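
To illustrate the mechanism described above, here is a minimal sketch in plain Java. It is not Lucene code: SentinelTokenizer and ResetContractDemo are hypothetical names, and the class only imitates the idea of a state field that is invalid until reset() is called, so that the failure is a side effect rather than an explicit check. The consumer loop follows the order reset(), incrementToken(), end(), close().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a tokenizer; NOT the Lucene implementation. */
final class SentinelTokenizer {
    private final char[] input;
    // The constructor leaves the position at an invalid sentinel value.
    // Only reset() moves it to a usable state.
    private int pos = -1;
    private String term = "";

    SentinelTokenizer(String text) {
        this.input = text.toCharArray();
    }

    /** Must be called before the first incrementToken(). */
    void reset() {
        pos = 0;
        term = "";
    }

    /** Advances to the next whitespace-separated token, if any. */
    boolean incrementToken() {
        // No explicit "was reset() called?" check: if pos is still -1,
        // the array read below throws ArrayIndexOutOfBoundsException.
        while (pos < input.length && Character.isWhitespace(input[pos])) {
            pos++;
        }
        if (pos >= input.length) {
            return false;
        }
        int start = pos;
        while (pos < input.length && !Character.isWhitespace(input[pos])) {
            pos++;
        }
        term = new String(input, start, pos - start);
        return true;
    }

    String term() {
        return term;
    }

    /** Signals end of stream; nothing left to read afterwards. */
    void end() {
        pos = input.length;
    }

    /** Returns to the unusable state until the next reset(). */
    void close() {
        pos = -1;
    }
}

final class ResetContractDemo {
    /** Consumes the stream in the order the contract asks for. */
    static List<String> tokens(String text) {
        SentinelTokenizer stream = new SentinelTokenizer(text);
        List<String> out = new ArrayList<>();
        stream.reset();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        stream.end();
        stream.close();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("this is a test")); // prints [this, is, a, test]
        try {
            // Contract violation: incrementToken() without reset().
            new SentinelTokenizer("this is a test").incrementToken();
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("no reset(): " + e.getClass().getSimpleName());
        }
    }
}
```

The point of the sketch is only that the sentinel costs nothing on the hot path, which is why the resulting exception is generic rather than a descriptive one.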
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, April 15, 2013 4:25 PM
To: java-user@lucene.apache.org
Subject: Re: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException
I didn't read your code, but do you have the "reset" that is now mandatory
and throws AIOOBE if not present?
-- Jack Krupansky
-----Original Message-----
From: andi rexha
Sent: Monday, April 15, 2013 10:21 AM
To: java-user@lucene.apache.org
Subject: WhitespaceTokenizer, incrementToke() ArrayOutOfBoundException
Hi,
I have tried to get all the tokens from a TokenStream in the same way as I
was doing in the 3.x version of Lucene, but now (at least with
WhitespaceTokenizer) I get an exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at java.lang.Character.codePointAtImpl(Character.java:2405)
    at java.lang.Character.codePointAt(Character.java:2369)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
The code is quite simple, and I thought it would work, but obviously it
doesn't (unless I have made some mistake).
Here is the code, in case you spot a bug in it (although it is trivial):
String str = "this is a test";
Reader reader = new StringReader(str);
TokenStream tokenStream = new WhitespaceTokenizer(Version.LUCENE_42, reader); // tokenStreamAnalyzer.tokenStream("test", reader);
CharTermAttribute attribute = tokenStream.getAttribute(CharTermAttribute.class);
while (tokenStream.incrementToken()) {
    System.out.println(new String(attribute.buffer(), 0, attribute.length()));
}
I hope you have some idea of why this is happening.
Regards,
Andi
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org