On Tue, Aug 06, 2002 at 10:38:46PM -0000, Simon Glover wrote:
> 
> On Tue, 6 Aug 2002, Nicholas Clark wrote:
> 
> > On Tue, Aug 06, 2002 at 10:17:48PM +0000, Nicholas Clark wrote:
> >
> >
> >         chopn   S1, 1   # Check that the contents of S1 are no longer constant
> >
> > I spot a trend here.
> >
> 
>  Yep, it looks like it's the trailing comments that are the problem; the
>  code that strips these out was recently patched to fix a different bug
>  -- check out change 82 to assemble.pl
> 
>  Unfortunately, I have no idea why the current code breaks on 5.005_03,
>  or what to do about it :-(

The problem seems to be that a capture around multiple levels of parentheses
(even though the inner ones are non-capturing) gets the 5.005_03 regexp
engine stuck, presumably due to the nested quantifiers (the * in $str_re and
the * outside).

Re-writing it as a match makes it take finite time on 5.005_03 (as appended).
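
As a standalone illustration of the "match, then truncate at pos" idea, here
is a minimal sketch. It is not the appended patch: the $str_re below is a
hypothetical stand-in for the assembler's own, strip_trailing_comment is a
made-up name, and I have assumed it is acceptable to exclude '#' from the
character class and step over one chunk per match so that no quantifier is
nested inside another. I have not run this on 5.005_03.

```perl
use strict;

# Hypothetical stand-in for the assembler's $str_re: a double- or
# single-quoted string that may contain backslash escapes.
my $str_re = qr/"(?:[^\\"]|\\.)*"|'(?:[^\\']|\\.)*'/;

# Made-up helper name; returns the line with any trailing comment removed.
sub strip_trailing_comment {
    my ($line) = @_;

    # Step over one chunk per match: either a run of characters that are
    # not quotes or '#', or one complete quoted string. /c keeps pos()
    # where it was when the match finally fails.
    1 while $line =~ /\G(?:[^'"#]+|$str_re)/gc;

    # If the scan stopped on a '#', the rest of the line is a comment.
    if ($line =~ /\G#/gc) {
        # pos points to the character after the '#'
        substr($line, pos($line) - 1) = '';
    }
    return $line;
}

print strip_trailing_comment(q{chopn   S1, 1   # no longer constant}), "|\n";
print strip_trailing_comment(q{set S1, "a#b" # real comment}), "|\n";
```

Whitespace left in front of the removed comment is kept, on the assumption
that a later step trims it, as the assembler's next substitution does.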

I notice 2 things

1: $str_re is declared with my in several places - is there any reason why
   it and the other building block regexps aren't defined once at the top
   of the file?
2: $str_re is always interpolated. In the 5.x series, qr// interpolation is
   done by stringifying the regexp, interpolating the string and then
   recompiling, isn't it? In which case it would be faster to define the
   building block regexps as strings, surely?
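
To make the two points concrete, a rough sketch of what I mean. All the names
here ($reg_re, $num_src, is_operand_a and so on) are hypothetical and not
taken from assemble.pl; the fragments are only examples. It shows the building
blocks declared once at file scope, in both qr// and plain string form, and
what actually gets interpolated in each case. It makes no claim about which
is faster; that would need measuring.

```perl
use strict;

# Hypothetical building blocks, declared once at file scope.
# Form A: compiled regexp objects.
my $reg_re = qr/[INSP]\d+/;
my $num_re = qr/-?\d+/;

# Form B: the same fragments kept as plain strings.
my $reg_src = '[INSP]\d+';
my $num_src = '-?\d+';

# Either form is interpolated into a larger pattern, which is then
# compiled once and reused.
my $operand_a = qr/^(?:$reg_re|$num_re)$/;
my $operand_b = qr/^(?:$reg_src|$num_src)$/;

sub is_operand_a { return $_[0] =~ $operand_a ? 1 : 0 }
sub is_operand_b { return $_[0] =~ $operand_b ? 1 : 0 }

# Interpolating a qr// object inserts its string form, wrapped in a
# flag group, into the text of the new pattern.
print "qr form:     $reg_re\n";
print "string form: $reg_src\n";
```

Both composed patterns should accept and reject the same operands; the only
difference is the extra flag group around each qr// fragment.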

Also, the assembler is slow, and I wonder if attacking other regexps in this
way would generate a speedup. YAPC::Europe have accepted my talk about making
perl scripts faster, and although I was planning on using the things I did to
Encode's compile for 5.8, it might be easier to attack assemble.pl and
document things that work as speedups as I go along.

Nicholas Clark
-- 
Even better than the real thing:        http://nms-cgi.sourceforge.net/

--- ../../src/parrot/assemble.pl        Tue Aug  6 20:54:48 2002
+++ assemble.pl Thu Aug  8 15:47:03 2002
@@ -435,7 +435,18 @@ sub _annotate_contents {
 
   $self->{pc}++;
   return if $line=~/^\s*$/ or $line=~/^\s*#/; # Filter out the comments and blank lines
-  $line=~s/^((?:[^'"]+|$str_re)*)#.*$/$1/; # Remove trailing comments
+
+  # Doing it this way chews infinite CPU on 5.005_03. I suspect 5.6.1
+  # introduces some cunning optimisation in the regexp engine to avoid
+  # backtracking through the brackets with the multiple levels of *s
+  #
+  # $line=~s/^((?:[^'"]+|$str_re)*)#.*$/$1/; # Remove trailing comments
+  #
+  # This is 5.005_03 friendly:
+  if ($line=~ /^(?:[^'"]+|$str_re)#/g) {
+    # pos will point to the character after the #
+    substr ($line, (pos $line) - 1) = '';
+  }
   $line=~s/(^\s+|\s+$)//g;           # Remove leading and trailing whitespace
   #
   # Accumulate lines that only have labels until an instruction is found.
