[ https://issues.apache.org/jira/browse/LUCENE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136439#comment-14136439 ]
Steve Rowe commented on LUCENE-5897: ------------------------------------ LUCENE-5770 (JFlex 1.6 upgrade) happened in 4.10, so backporting to 4.9 will require some changes. > performance bug ("adversary") in StandardTokenizer > -------------------------------------------------- > > Key: LUCENE-5897 > URL: https://issues.apache.org/jira/browse/LUCENE-5897 > Project: Lucene - Core > Issue Type: Bug > Reporter: Robert Muir > Assignee: Steve Rowe > Fix For: 4.10, 5.0, 4.9.1 > > Attachments: LUCENE-5897.patch > > > There seem to be some conditions (I don't know how rare or what conditions) > that cause StandardTokenizer to essentially hang on input: I havent looked > hard yet, but as its essentially a DFA I think something wierd might be going > on. > An easy way to reproduce is with 1MB of underscores, it will just hang > forever. > {code} > public void testWorthyAdversary() throws Exception { > char buffer[] = new char[1024 * 1024]; > Arrays.fill(buffer, '_'); > int tokenCount = 0; > Tokenizer ts = new StandardTokenizer(); > ts.setReader(new StringReader(new String(buffer))); > ts.reset(); > while (ts.incrementToken()) { > tokenCount++; > } > ts.end(); > ts.close(); > assertEquals(0, tokenCount); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org