Lucene 4.0 WhitespaceAnalyzer problem

Maksym Krasovskiy Tue, 15 Jan 2013 03:29:10 -0800

Hi!
I am trying to use WhitespaceAnalyzer from Lucene 4.0 to split strings into words.
I wrote a small test:
@Test
public void whitespaceAnalyzerTest() throws IOException {
    String string = "sdfdsf sdfsdf sd sdf ";
    Analyzer wa = new WhitespaceAnalyzer(Version.LUCENE_40);
    TokenStream tokenStream = wa.tokenStream("", new StringReader(string));
    while (tokenStream.incrementToken()) {
        System.out.println(tokenStream.getAttribute(CharTermAttribute.class).toString());
    }
}


but I got an exception:
java.lang.ArrayIndexOutOfBoundsException: -1
    at java.lang.Character.codePointAtImpl(Character.java:2405)
    at java.lang.Character.codePointAt(Character.java:2369)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
    at com.maxx.tests.lucene40test.analyzer.AnalyzerTest.whitespaceAnalyzerTest(AnalyzerTest.java:93)
    ...


If I change WhitespaceAnalyzer to StandardAnalyzer, it works correctly.
As a workaround I can create a StandardAnalyzer without stop words, but why
doesn't my code work?



--
Krasovskiy Maxim
