Robert Muir created LUCENE-5897:
-----------------------------------

             Summary: performance bug ("adversary") in StandardTokenizer
                 Key: LUCENE-5897
                 URL: https://issues.apache.org/jira/browse/LUCENE-5897
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Robert Muir
There seem to be some conditions (I don't know how rare, or exactly what triggers them) that cause StandardTokenizer to essentially hang on its input. I haven't looked hard yet, but since it's essentially a DFA, I think something weird might be going on.

An easy way to reproduce is with 1MB of underscores; it will just hang forever.

{code}
public void testWorthyAdversary() throws Exception {
  char buffer[] = new char[1024 * 1024];
  Arrays.fill(buffer, '_');
  int tokenCount = 0;
  Tokenizer ts = new StandardTokenizer();
  ts.setReader(new StringReader(new String(buffer)));
  ts.reset();
  while (ts.incrementToken()) {
    tokenCount++;
  }
  ts.end();
  ts.close();
  // underscores produce no tokens, but the loop above never finishes
  assertEquals(0, tokenCount);
}
{code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
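Speculating on the DFA angle above: the classic way a tokenizer goes quadratic on adversarial input is maximal-munch scanning with restart, where the scanner runs forward looking for a longer match, never reaches an accepting state, and then restarts one character past where it began. The toy sketch below (purely illustrative; `MaximalMunchDemo` and `scanSteps` are made up for this note and are not Lucene or JFlex code) counts character inspections under that strategy on a run of underscores, which is O(n^2):

```java
import java.util.Arrays;

public class MaximalMunchDemo {
  /**
   * Counts character inspections of a naive maximal-munch scan that,
   * on failing to find any accepting state, restarts one char later.
   * '_' never matches, so every attempt runs to the end of the input.
   */
  static long scanSteps(char[] input) {
    long steps = 0;
    int pos = 0;
    while (pos < input.length) {
      int i = pos;
      // try to extend a match as far as possible; '_' never matches anything
      while (i < input.length && input[i] == '_') {
        i++;
        steps++;
      }
      // no accepting state was ever reached: restart one character later
      pos++;
    }
    return steps;
  }

  public static void main(String[] args) {
    char[] small = new char[1000];
    char[] big = new char[2000];
    Arrays.fill(small, '_');
    Arrays.fill(big, '_');
    long s = scanSteps(small); // exactly n*(n+1)/2 inspections
    long b = scanSteps(big);
    System.out.println(s);
    System.out.println(b);
    System.out.println((double) b / s); // ratio near 4: quadratic growth
  }
}
```

Doubling the input roughly quadruples the work, which is consistent with a 1MB buffer appearing to hang forever rather than merely being slow.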