On Wed, 2010-02-17 at 06:36 -0800, John Hardin wrote:
> On Wed, 17 Feb 2010, Karsten Bräckelmann wrote:

> > > +rawbody STYLE_GIBBERISH /<style[^>]{0,30}>(?:\s{1,20}|[^\s:;<]){175}/im
> >
> > Heh, I was just looking at the very same and bookmarked it for review
> > tomorrow morning.
> >
> > However, it shouldn't be that bad -- since it is bound, and the
> > alternatives are spaces or non-spaces. That should not lead to massive
> > backtracking.

Argh! On second thought, there is something I overlooked last night. That
RE does have *massive* problems with some pathological cases involving lots
of spaces. To fit exactly 175 occurrences, it might be necessary, e.g., to
split an initial greedy match of 20 whitespace characters into multiple
consecutive matches of fewer than 20 spaces each.
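
To get a feel for how much ambiguity that splitting introduces, here is a
rough sketch in Python (not part of the rule). It only counts the ways a
single run of spaces can be cut into a given number of \s{1,20} chunks. The
name count_splits and the "use up the whole run" simplification are my
assumptions for illustration; a real backtracking engine may also leave
part of the run unmatched, so treat the numbers as indicative only.

```python
from functools import lru_cache


def count_splits(run_length: int, pieces: int, max_piece: int = 20) -> int:
    """Count the ways to cut a run of `run_length` spaces into exactly
    `pieces` consecutive chunks of 1..max_piece spaces each.

    Each distinct cut is a different way for repeated \\s{1,20} matches
    to cover the same run (hypothetical helper, illustration only).
    """

    @lru_cache(maxsize=None)
    def ways(remaining: int, left: int) -> int:
        if left == 0:
            # All chunks placed: valid only if the run is used up exactly.
            return 1 if remaining == 0 else 0
        # Try every allowed size for the next chunk.
        return sum(
            ways(remaining - size, left - 1)
            for size in range(1, min(max_piece, remaining) + 1)
        )

    return ways(run_length, pieces)


if __name__ == "__main__":
    # One run of 40 spaces, cut into a growing number of chunks.
    for pieces in (2, 5, 10):
        print(pieces, count_splits(40, pieces))
```

Each of those cuts is a candidate the engine may have to try before it can
give up on a given starting position.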

The problem is a quantifier nested inside another quantifier, combined
with an alternation.

An alternative approach that should match the desired text would look like
this -- eliminating the alternation that has quantifiers inside it.

  / (?: \s{1,20} [^\s:;<]{1,80} ){80} /x    # spaces for readability

Since there is no alternation and the two char classes are distinct,
this RE can simply be expanded and matched from left to right, without
any ambiguity.
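
As a quick sanity check, the group can be tried out in Python's re module,
which is also a backtracking engine. This is only a sketch under
assumptions: the <style ...> prefix and the rawbody context are left out,
the helper name looks_like_gibberish is made up, and the sample strings are
invented test inputs, not real mail.

```python
import re

# The group from above, in verbose mode so the spaces are ignored.
# The surrounding rule (rawbody, <style ...> prefix) is deliberately omitted.
REWRITTEN = re.compile(r"(?: \s{1,20} [^\s:;<]{1,80} ){80}", re.X)


def looks_like_gibberish(text: str) -> bool:
    """Hypothetical helper: True if the rewritten group matches anywhere."""
    return REWRITTEN.search(text) is not None


# 80 whitespace-separated "words" without any of : ; <
gibberish = " lorem" * 80

# CSS-like text: the ':' and ';' break every run after a single word.
css = " color: red;" * 80

# A long run of spaces followed by ':' -- no non-space word ever follows
# a whitespace run here, so every attempt fails after a few steps.
many_spaces = " " * 200 + ":"

if __name__ == "__main__":
    for name, sample in [("gibberish", gibberish),
                         ("css", css),
                         ("many_spaces", many_spaces)]:
        print(name, looks_like_gibberish(sample))
```

Note that this counts 80 space-plus-word pairs rather than 175 individual
tokens, so the thresholds are not directly comparable with the original
rule and would need tuning against real samples.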


> I was wondering about that one, too. I'll take it back out. I'm thinking 
> of a better way to achieve that.

John, does the above example help? :)


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
