[ https://issues.apache.org/jira/browse/LUCENE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104379#comment-14104379 ]
Robert Muir commented on LUCENE-5897: ------------------------------------- OK, that makes sense. Can we do something crazy during generation of jflex (e.g. inject a little code) to bound this to at least maxTokenLength() ? > performance bug ("adversary") in StandardTokenizer > -------------------------------------------------- > > Key: LUCENE-5897 > URL: https://issues.apache.org/jira/browse/LUCENE-5897 > Project: Lucene - Core > Issue Type: Bug > Reporter: Robert Muir > > There seem to be some conditions (I don't know how rare or what conditions) > that cause StandardTokenizer to essentially hang on input: I havent looked > hard yet, but as its essentially a DFA I think something wierd might be going > on. > An easy way to reproduce is with 1MB of underscores, it will just hang > forever. > {code} > public void testWorthyAdversary() throws Exception { > char buffer[] = new char[1024 * 1024]; > Arrays.fill(buffer, '_'); > int tokenCount = 0; > Tokenizer ts = new StandardTokenizer(); > ts.setReader(new StringReader(new String(buffer))); > ts.reset(); > while (ts.incrementToken()) { > tokenCount++; > } > ts.end(); > ts.close(); > assertEquals(0, tokenCount); > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org