Hi,

It was always mandatory! In Lucene 2.x/3.x, some Tokenizers simply returned bogus,
undefined output if they were not correctly reset before use, especially when
Tokenizers are "reused" by the Analyzer, which is now mandatory in 4.x. So in
Lucene 4.0 we made this throw an exception (NPE or AIOOBE) by initializing the
state fields with default values that trigger the exception. The exception is
not more specific for performance reasons (it is simply caused by the new
default values set in the constructor).
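So the mandatory consumer workflow in 4.x is: reset(), then the incrementToken() loop, then end() and close(). A minimal sketch of the corrected pattern from the code quoted below (assumes Lucene 4.2 on the classpath; the class name ResetExample is just for illustration):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ResetExample {
    public static List<String> tokenize(String text) throws Exception {
        Tokenizer tokenizer =
            new WhitespaceTokenizer(Version.LUCENE_42, new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        tokenizer.reset();  // mandatory since 4.0; omitting it triggers the AIOOBE/NPE described above
        while (tokenizer.incrementToken()) {
            tokens.add(new String(term.buffer(), 0, term.length()));
        }
        tokenizer.end();    // records the final offset state
        tokenizer.close();  // releases the underlying Reader
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokenize("this is a test"));
    }
}
```

The same reset()/end()/close() discipline applies to any TokenStream you consume, not only WhitespaceTokenizer.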

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Monday, April 15, 2013 4:25 PM
> To: java-user@lucene.apache.org
> Subject: Re: WhitespaceTokenizer, incrementToken()
> ArrayIndexOutOfBoundsException
> 
> I didn't read your code, but do you have the "reset" that is now mandatory
> and throws AIOOBE if not present?
> 
> -- Jack Krupansky
> 
> -----Original Message-----
> From: andi rexha
> Sent: Monday, April 15, 2013 10:21 AM
> To: java-user@lucene.apache.org
> Subject: WhitespaceTokenizer, incrementToken() ArrayIndexOutOfBoundsException
> 
> Hi,
> I have tried to get all the tokens from a TokenStream in the same way as I
> did in Lucene 3.x, but now (at least with WhitespaceTokenizer) I get an
> exception:
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.lang.Character.codePointAtImpl(Character.java:2405)
>     at java.lang.Character.codePointAt(Character.java:2369)
>     at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
>     at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
> 
> 
> 
> The code is quite simple and I thought it would work, but obviously it
> doesn't (unless I have made some mistake).
> 
> Here is the code, in case you spot a bug in it (although it is trivial):
> String str = "this is a test";
> Reader reader = new StringReader(str);
> TokenStream tokenStream =
>     new WhitespaceTokenizer(Version.LUCENE_42, reader);  // tokenStreamAnalyzer.tokenStream("test", reader);
> CharTermAttribute attribute = tokenStream.getAttribute(CharTermAttribute.class);
> while (tokenStream.incrementToken()) {
>     System.out.println(new String(attribute.buffer(), 0, attribute.length()));
> }
> 
> I hope you have an idea of why this is happening.
> Regards,
> Andi
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org


