[ 
https://issues.apache.org/jira/browse/DAFFODIL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Lawrence closed DAFFODIL-2048.
------------------------------------

> Exponential blowup in regular expression matching
> -------------------------------------------------
>
>                 Key: DAFFODIL-2048
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2048
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Libraries
>    Affects Versions: 2.3.0
>            Reporter: Brandon Sloane
>            Priority: Major
>
> It is possible for simple regular expression to trigger exponential behavior 
> at run time.
> For instance, the regular expression "(a|a)*x" is exponential in the number 
> of "a" characters that occurs in the beginning of the string.
> If the string does not match, this behavior is directly observable.
> If the string does match, the behavior may still be observable. Daffodil 
> performs regular expression matches using a fixed size buffer. If such a 
> match fails because it ran out of buffer, Daffodil will try again using a 
> bigger buffer. This means that even if a string will end up matching, if it 
> requires resizing the buffer, we will observe exponential time when trying to 
> match using the smaller buffer.
>  
> Below is a testcase demonstrating this:
> {quote}@Test def testUSASCII7BitEncoderMalformedError {
>  val encoder = BitsCharsetUSASCII7BitPacked.newEncoder
>  val bb = ByteBuffer.allocate(3)
>  val cb = CharBuffer.wrap("ab" + 128.toChar) // 128 is not encodable in 7 bits
>  val coderResult = encoder.encode(cb, bb, true)
>  assertTrue(coderResult.isUnmappable())
>  assertEquals(coderResult.length, 1)
>  }
> @Test def testRegexpMatch{
>  
>  val regex = "(a|a)*x".r
>  var baseStr=""
>  
> // val regex = "(a|a)*y*x".r
> // var baseStr="x"
> // for(i <- 0 to 64){
> // baseStr = "y"+baseStr;
> // }
>  
>  val oldDecoder = finfo.decoder
>  finfo.decoder=BitsCharsetUSASCII.newDecoder();
>  
>  for(i <- 0 to 300){
>  var str="";
>  for(j <- 0 to i){
>  str = str+"a"
>  }
>  str=str+baseStr;
>  val dis = org.apache.daffodil.io.InputSourceDataInputStream(str.getBytes);
>  
>  val matcher = regex.pattern.matcher(str)
>  matcher.reset()
>  val start=System.currentTimeMillis();
>  val ans = dis.lookingAt(matcher, finfo)
>  val stop = System.currentTimeMillis()
>  println(i+": "+(stop-start)+"ms")
>  }
>  
>  finfo.decoder=oldDecoder;
>  
>  }
> {quote}
> Replace the definition of baseStr and regex with the commented out version to 
> see the behaviour when the match is successful.
>  
> There are several related issues here:
> 1) The regular expression above is, in fact, regular. It should be able to 
> run in linear time. The obvious replacement of "a" for "(a|a)" fixes this. 
> However, this same situation can occur in less obvious situations (for 
> instance, I ran into this by adding the (?s) flag to a regex containing 
> (.|\r)).
> 2) We currently accept regular expressions that are not actually regular, so 
> there will be ones that genuinely take exponential time. In this case, we 
> should have some form of timeout. (Preferably deterministic)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to