Hi!
I am trying to use WhitespaceAnalyzer from Lucene 4.0 to split strings into words.
I wrote a small test:
@Test
public void whitespaceAnalyzerTest() throws IOException {
    String string = "sdfdsf sdfsdf sd sdf ";
    Analyzer wa = new WhitespaceAnalyzer(Version.LUCENE_40);
    TokenStream tokenStream = wa.tokenStream("", new StringReader(string));
    while (tokenStream.incrementToken()) {
        System.out.println(tokenStream.getAttribute(CharTermAttribute.class).toString());
    }
}
but I got an exception:
java.lang.ArrayIndexOutOfBoundsException: -1
at java.lang.Character.codePointAtImpl(Character.java:2405)
at java.lang.Character.codePointAt(Character.java:2369)
at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
at com.maxx.tests.lucene40test.analyzer.AnalyzerTest.whitespaceAnalyzerTest(AnalyzerTest.java:93)
...
If I replace WhitespaceAnalyzer with StandardAnalyzer, it works correctly.
As a workaround I can create a StandardAnalyzer without stop words, but why
doesn't my code work with WhitespaceAnalyzer?
--
Krasovskiy Maxim