Hi,
I have tried to get all the tokens from a TokenStream the same way I did
in Lucene 3.x, but now (at least with WhitespaceTokenizer) I get an
exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at java.lang.Character.codePointAtImpl(Character.java:2405)
at java.lang.Character.codePointAt(Character.java:2369)
at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
The code is quite simple, so I thought it would work, but obviously it
doesn't (unless I have made a mistake). Here it is, in case you spot a
bug (although it is trivial):
String str = "this is a test";
Reader reader = new StringReader(str);
TokenStream tokenStream = new WhitespaceTokenizer(Version.LUCENE_42, reader);
// tokenStreamAnalyzer.tokenStream("test", reader);
CharTermAttribute attribute = tokenStream.getAttribute(CharTermAttribute.class);
while (tokenStream.incrementToken()) {
    System.out.println(new String(attribute.buffer(), 0, attribute.length()));
}
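While writing this I noticed the TokenStream javadoc describes a stricter workflow in 4.x: reset() must be called before the first incrementToken(), with end() and close() afterwards, and I am not calling reset() anywhere. I haven't confirmed this is the cause of the exception, but a sketch of the same loop with that workflow (assuming Lucene 4.2 on the classpath, and a made-up class name TokenDump) would be:

```java
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenDump {
    public static void main(String[] args) throws Exception {
        Reader reader = new StringReader("this is a test");
        TokenStream tokenStream = new WhitespaceTokenizer(Version.LUCENE_42, reader);
        CharTermAttribute attribute = tokenStream.getAttribute(CharTermAttribute.class);

        tokenStream.reset();   // required in 4.x before the first incrementToken()
        while (tokenStream.incrementToken()) {
            System.out.println(new String(attribute.buffer(), 0, attribute.length()));
        }
        tokenStream.end();     // record final offset/state
        tokenStream.close();   // release resources
    }
}
```

If that is indeed the problem, it would explain why the 3.x-style loop worked before and breaks now.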
I hope you have some idea of why this is happening.
Regards,
Andi