[ https://issues.apache.org/jira/browse/LUCENE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104804#comment-14104804 ]
Steve Rowe commented on LUCENE-5897:
------------------------------------

bq. can it be based on the way the rules are encoded in our grammar?

I don't know how to do that: as I mentioned on LUCENE-5400, adding large repeat counts to sub-regexes made JFlex OOM at generation time. Were you thinking of something other than repeat counts?

I'm thinking it should be possible to abuse JFlex's buffer handling to just never grow the buffer beyond the initial size, but still allow the contents to be shifted to enable (maximally) buffer-length matches. This would have a nice secondary effect of reducing max memory usage. If I can make it work, I'll add a generation option for this to JFlex.

> performance bug ("adversary") in StandardTokenizer
> --------------------------------------------------
>
>                 Key: LUCENE-5897
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5897
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> There seem to be some conditions (I don't know how rare or what conditions)
> that cause StandardTokenizer to essentially hang on input: I haven't looked
> hard yet, but as it's essentially a DFA I think something weird might be going
> on.
> An easy way to reproduce is with 1MB of underscores: it will just hang
> forever.
> {code}
> import java.io.StringReader;
> import java.util.Arrays;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.standard.StandardTokenizer;
>
> public void testWorthyAdversary() throws Exception {
>   // 1M underscores: no tokens are expected, but tokenizing effectively never finishes
>   char[] buffer = new char[1024 * 1024];
>   Arrays.fill(buffer, '_');
>   int tokenCount = 0;
>   Tokenizer ts = new StandardTokenizer();
>   ts.setReader(new StringReader(new String(buffer)));
>   ts.reset();
>   while (ts.incrementToken()) {
>     tokenCount++;
>   }
>   ts.end();
>   ts.close();
>   assertEquals(0, tokenCount);
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
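The buffer-handling idea in the comment can be illustrated outside JFlex. The idea is to never grow the scanner buffer and only shift its unconsumed contents to the front, so that a single match is at most one buffer long. The Java sketch that follows is hypothetical. `FixedBufferScanner`, its `next()`/`tokenize()` methods and its simplistic "letters, digits and `_`" word rule are made up for illustration. They are not JFlex's generated scanner, a JFlex option, or Lucene's API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not JFlex-generated code): the buffer is allocated once and never
 * grown. Before each refill, unconsumed chars are shifted to the front, so one match is
 * at most buffer.length chars; longer runs come out as buffer-length chunks.
 */
public class FixedBufferScanner {
  private final Reader in;
  private final char[] buffer; // fixed capacity, never reallocated
  private int start = 0;       // start of the current unconsumed text
  private int end = 0;         // one past the last valid char in buffer
  private boolean eof = false;

  public FixedBufferScanner(Reader in, int capacity) {
    this.in = in;
    this.buffer = new char[capacity];
  }

  // Stand-in word rule; the real grammar (UAX#29 word breaks) is far richer.
  private static boolean isWordChar(char c) {
    return Character.isLetterOrDigit(c) || c == '_';
  }

  /** Shifts unconsumed chars to the front, then reads into the free tail. */
  private boolean refill() throws IOException {
    if (start > 0) {
      System.arraycopy(buffer, start, buffer, 0, end - start);
      end -= start;
      start = 0;
    }
    if (eof || end == buffer.length) {
      return false; // at EOF, or buffer full: deliberately do NOT grow it
    }
    int n = in.read(buffer, end, buffer.length - end);
    if (n < 0) {
      eof = true;
      return false;
    }
    end += n;
    return true;
  }

  /** Returns the next word (at most buffer.length chars), or null at end of input. */
  public String next() throws IOException {
    // Skip separator chars, refilling as needed.
    while (true) {
      while (start < end && !isWordChar(buffer[start])) {
        start++;
      }
      if (start < end) {
        break;
      }
      if (!refill()) {
        return null;
      }
    }
    // Longest match, capped at the buffer capacity.
    int pos = start + 1;
    while (true) {
      while (pos < end && isWordChar(buffer[pos])) {
        pos++;
      }
      if (pos < end) {
        break; // found a separator: the match ends here
      }
      int shift = start;
      boolean more = refill(); // may shift contents left by 'shift'
      pos -= shift;
      if (!more) {
        break; // EOF, or the match already fills the whole buffer
      }
    }
    String word = new String(buffer, start, pos - start);
    start = pos;
    return word;
  }

  /** Convenience helper for examples: tokenizes a whole string. */
  public static List<String> tokenize(String text, int capacity) {
    FixedBufferScanner scanner = new FixedBufferScanner(new StringReader(text), capacity);
    List<String> words = new ArrayList<>();
    try {
      for (String w = scanner.next(); w != null; w = scanner.next()) {
        words.add(w);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return words;
  }
}
```

For example, `FixedBufferScanner.tokenize("abcdefghij", 4)` yields `[abcd, efgh, ij]`. The 10-char run is split at the buffer size instead of the buffer being reallocated, so memory stays at the fixed capacity even for a megabyte of underscores. Unlike StandardTokenizer, this toy rule emits underscore-only runs as words. The sketch only shows the bounded-buffer mechanics, not StandardTokenizer's token rules.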