On Wed, 2010-02-17 at 06:36 -0800, John Hardin wrote:
> On Wed, 17 Feb 2010, Karsten Bräckelmann wrote:
> > > +rawbody STYLE_GIBBERISH /<style[^>]{0,30}>(?:\s{1,20}|[^\s:;<]){175}/im
> >
> > Heh, I was just looking at the very same and bookmarked it for review
> > tomorrow morning.
> >
> > However, it shouldn't be that bad -- since it is bound, and the
> > alternatives are spaces or non-spaces. That should not lead to massive
> > backtracking.

Argh! On second thought, there is something I overlooked last night. That
RE does have *massive* problems with some pathological cases of lots of
spaces. To fit exactly 175 occurrences, it might be necessary e.g. to
split an initial greedy match of 20 white-space chars into multiple
consecutive matches of fewer than 20 spaces each.

The problem is the nested quantifiers combined with an alternation: with
\s{1,20} inside (?:...){175}, the same run of spaces can be consumed in
many different ways, and the engine may have to try them all.
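
To put a rough number on that ambiguity, here is a small Python sketch. It is my own illustration, not part of the rule: the helper name count_splits is made up, and it only counts the ways a run of spaces can be cut into pieces of 1 to 20 chars, which is the set of candidates a backtracking engine may have to walk through before giving up. It does not measure what any particular engine really does.

```python
def count_splits(n_spaces: int, max_run: int = 20) -> int:
    """Count the ways to cut n_spaces consecutive spaces into runs of 1..max_run.

    Each way corresponds to one possible sequence of \\s{1,20} iterations
    covering the same stretch of white-space.
    """
    ways = [1] + [0] * n_spaces  # ways[0] = 1: the empty prefix
    for total in range(1, n_spaces + 1):
        # The last run may be 1..max_run spaces long (but no longer than total).
        ways[total] = sum(
            ways[total - run] for run in range(1, min(max_run, total) + 1)
        )
    return ways[n_spaces]


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, "spaces:", count_splits(n), "possible splits")
```

The count roughly doubles with every additional space, so even a modest run of white-space inside a <style> block leaves plenty of room for backtracking.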

An alternative approach that should match the desired text would look like
this -- eliminating the alternation with quantifiers inside.

  / (?: \s{1,20} [^\s:;<]{1,80} ){80} /x    # spaces for readability

Since there is no alternation and the two char classes are distinct,
this RE can be simply expanded and matched from left to right, without
any ambiguity.
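
A quick way to try the alternative outside of SpamAssassin, as a sketch only: this uses Python's re module as a stand-in for Perl (re.VERBOSE plays the role of /x), and the names ALTERNATIVE and has_long_gibberish as well as the sample strings are made up for the test.

```python
import re

# Stand-in for the /x form; re.VERBOSE ignores white-space outside char classes.
ALTERNATIVE = re.compile(r"(?: \s{1,20} [^\s:;<]{1,80} ){80}", re.VERBOSE)


def has_long_gibberish(text: str) -> bool:
    """True if text contains 80 consecutive 'spaces, then non-spaces' groups."""
    return ALTERNATIVE.search(text) is not None


if __name__ == "__main__":
    sample_hit = " abc" * 80           # 80 groups: should match
    sample_miss = " abc" * 79          # one group short: should not match
    lots_of_spaces = " " * 2000 + ":"  # the kind of input that hurt the old RE

    for name, text in [("hit", sample_hit), ("miss", sample_miss),
                       ("spaces", lots_of_spaces)]:
        print(name, has_long_gibberish(text))
```

On the spaces-only input every attempt fails after at most 20 steps per start position, since a run of spaces must be followed by a non-space char and there is no second way to consume it.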

> I was wondering about that one, too. I'll take it back out. I'm thinking
> of a better way to achieve that.

John, does the above example help? :)

--
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}