[ https://issues.apache.org/jira/browse/LUCENE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137225#comment-14137225 ]
ASF subversion and git services commented on LUCENE-5897:
----------------------------------------------------------

Commit 1625586 from [~sar...@syr.edu] in branch 'dev/branches/lucene_solr_4_9' [ https://svn.apache.org/r1625586 ]

LUCENE-5897, LUCENE-5400: change JFlex-generated source munging so that zzRefill() doesn't call Reader.read(buffer,start,len) with len=0

> performance bug ("adversary") in StandardTokenizer
> --------------------------------------------------
>
>                 Key: LUCENE-5897
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5897
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: Steve Rowe
>             Fix For: 4.9.1, 4.10, 5.0
>
>         Attachments: LUCENE-5897.patch
>
>
> There seem to be some conditions (I don't know how rare they are or what triggers them) that cause StandardTokenizer to essentially hang on its input. I haven't looked hard yet, but since it's essentially a DFA I think something weird might be going on.
> An easy way to reproduce it is with 1 MB of underscores: the tokenizer just hangs forever.
> {code}
> import java.io.StringReader;
> import java.util.Arrays;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.standard.StandardTokenizer;
> import static org.junit.Assert.assertEquals;
>
> public void testWorthyAdversary() throws Exception {
>   // 1 MB of underscores: StandardTokenizer should produce no tokens from this input.
>   char[] buffer = new char[1024 * 1024];
>   Arrays.fill(buffer, '_');
>   int tokenCount = 0;
>   Tokenizer ts = new StandardTokenizer();
>   ts.setReader(new StringReader(new String(buffer)));
>   ts.reset();
>   while (ts.incrementToken()) {
>     tokenCount++;
>   }
>   ts.end();
>   ts.close();
>   assertEquals(0, tokenCount);
> }
> {code}
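
For context on the commit message above: per the java.io.Reader contract, read(cbuf, off, len) with len=0 returns 0 without reading anything, so scanner refill code that keeps retrying until it gets at least one character can spin forever once its buffer is full. Below is a minimal, hypothetical sketch of a grow-before-read guard in the spirit of what the commit describes; the class, field, and method names are invented for illustration and are not the actual JFlex-generated zzRefill() code.

{code}
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

// Hypothetical sketch only: avoids calling Reader.read(buffer, start, len)
// with len == 0 by growing the buffer before reading.
final class RefillSketch {
  private char[] zzBuffer = new char[1024];
  private int zzEndRead = 0; // number of valid chars currently in zzBuffer

  /** Returns true at end of stream, false if at least one new char was buffered. */
  boolean refill(Reader reader) throws IOException {
    if (zzEndRead == zzBuffer.length) {
      // Grow first so the requested length is never 0: a zero-length read
      // legitimately returns 0 and would look like "no progress" forever.
      zzBuffer = Arrays.copyOf(zzBuffer, zzBuffer.length * 2);
    }
    int numRead = reader.read(zzBuffer, zzEndRead, zzBuffer.length - zzEndRead);
    if (numRead == -1) {
      return true; // end of stream
    }
    zzEndRead += numRead;
    return false;
  }
}
{code}

Note that the actual fix lives in the build-time munging of the JFlex-generated source rather than in hand-written code like this sketch.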