[
https://issues.apache.org/jira/browse/DAFFODIL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Lawrence closed DAFFODIL-2048.
------------------------------------
> Exponential blowup in regular expression matching
> -------------------------------------------------
>
> Key: DAFFODIL-2048
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2048
> Project: Daffodil
> Issue Type: Bug
> Components: Libraries
> Affects Versions: 2.3.0
> Reporter: Brandon Sloane
> Priority: Major
>
> It is possible for simple regular expression to trigger exponential behavior
> at run time.
> For instance, the regular expression "(a|a)*x" is exponential in the number
> of "a" characters that occurs in the beginning of the string.
> If the string does not match, this behavior is directly observable.
> If the string does match, the behavior may still be observable. Daffodil
> performs regular expression matches using a fixed size buffer. If such a
> match fails because it ran out of buffer, Daffodil will try again using a
> bigger buffer. This means that even if a string will end up matching, if it
> requires resizing the buffer, we will observe exponential time when trying to
> match using the smaller buffer.
>
> Below is a testcase demonstrating this:
> {quote}@Test def testUSASCII7BitEncoderMalformedError {
> val encoder = BitsCharsetUSASCII7BitPacked.newEncoder
> val bb = ByteBuffer.allocate(3)
> val cb = CharBuffer.wrap("ab" + 128.toChar) // 128 is not encodable in 7 bits
> val coderResult = encoder.encode(cb, bb, true)
> assertTrue(coderResult.isUnmappable())
> assertEquals(coderResult.length, 1)
> }
> @Test def testRegexpMatch{
>
> val regex = "(a|a)*x".r
> var baseStr=""
>
> // val regex = "(a|a)*y*x".r
> // var baseStr="x"
> // for(i <- 0 to 64){
> // baseStr = "y"+baseStr;
> // }
>
> val oldDecoder = finfo.decoder
> finfo.decoder=BitsCharsetUSASCII.newDecoder();
>
> for(i <- 0 to 300){
> var str="";
> for(j <- 0 to i){
> str = str+"a"
> }
> str=str+baseStr;
> val dis = org.apache.daffodil.io.InputSourceDataInputStream(str.getBytes);
>
> val matcher = regex.pattern.matcher(str)
> matcher.reset()
> val start=System.currentTimeMillis();
> val ans = dis.lookingAt(matcher, finfo)
> val stop = System.currentTimeMillis()
> println(i+": "+(stop-start)+"ms")
> }
>
> finfo.decoder=oldDecoder;
>
> }
> {quote}
> Replace the definition of baseStr and regex with the commented out version to
> see the behaviour when the match is successful.
>
> There are several related issues here:
> 1) The regular expression above is, in fact, regular. It should be able to
> run in linear time. The obvious replacement of "a" for "(a|a)" fixes this.
> However, this same situation can occur in less obvious situations (for
> instance, I ran into this by adding the (?s) flag to a regex containing
> (.|\r)).
> 2) We currently accept regular expressions that are not actually regular, so
> there will be ones that genuinely take exponential time. In this case, we
> should have some form of timeout. (Preferably deterministic)
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)