Loren Wilton writes:
> >It does introduce the danger of algorithmic complexity attacks if .* is
> >used instead of .{0,20} though -- but we may be able to help this if we
> >spot that kind of thing in --lint.
>
> I still don't understand why .* is more dangerous in rawbody rules than
> it is in full rules. Any cases where it would have shown up in rawbody,
> it currently exists in a full rule. Of course, hopefully those are few
> and far between.
True.
> BTW, if .* were the only concern, it would be moderately trivial to
> enhance rule syntax checking to disallow that pattern.
Yeah, that's the problem: it's not *that* trivial. Which of these are
safe?
/.*/ unsafe
/\.*/ safe: the "." is escaped, so this only matches literal "."s
/\Q.*\E/ safe: \Q..\E means literal ".*"
/.{0,10}/ safe: 10 is short enough
/.{0,1000}/ probably safe?
/.{0,100000}/ possibly unsafe?
/x.{0,200}x.{0,200}x.{0,200}x.{0,200}y/
unsafe given the wrong input -- 500KB of "x"s
patches accepted of course ;)
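For a rough sense of why that last pattern is unsafe, here is a back-of-the-envelope count in Python. It assumes a plain backtracking matcher with no early bail-out (a real engine may well give up sooner), and worst_case_paths is just a name made up for this sketch. On input that is all "x"s, each .{0,200} can take any of 201 lengths, the literal "x"s always match and the final "y" never does, so every combination gets tried from each start position:

```python
def worst_case_paths(bound, groups):
    # Each .{0,bound} can match 0..bound characters, so there are
    # (bound + 1) choices per group, and the choices multiply across groups.
    return (bound + 1) ** groups

# Four stacked .{0,200} groups, counted per start position in the input.
print(worst_case_paths(200, 4))  # prints 1632240801
```

That is about 1.6 billion attempts per start position, even though each individual {0,200} looks harmless on its own.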
But agreed, it makes sense to be pragmatic about it instead of striving
for perfection. If we do do something like this, we can just attempt to
spot the common problem cases.
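A minimal sketch of what "spot the common problem cases" could look like, written in Python for illustration only. This is not the actual --lint code; the function names and both thresholds are assumptions picked for the example, and the pattern is passed without its surrounding slashes:

```python
import re

# Hypothetical thresholds; nothing in this thread settles on real numbers.
MAX_SINGLE_BOUND = 1000     # largest N tolerated in a lone .{m,N}
MAX_COMBINED = 1_000_000    # cap on the product of (N + 1) over all .{m,N}

def strip_literals(pattern):
    # Replace \Q...\E spans and backslash-escaped characters with "_",
    # so that an escaped "." is never mistaken for a wildcard.
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith(r"\Q", i):
            end = pattern.find(r"\E", i + 2)
            i = len(pattern) if end == -1 else end + 2
            out.append("_")
        elif pattern[i] == "\\":
            i += 2  # skip the backslash and the character it protects
            out.append("_")
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)

UNBOUNDED = re.compile(r"\.(?:\*|\+|\{\d+,\})")   # .*  .+  .{n,}
BOUNDED = re.compile(r"\.\{\d+,(\d+)\}")          # .{m,N}

def lint(pattern):
    body = strip_literals(pattern)
    if UNBOUNDED.search(body):
        return "unsafe"
    combined = 1
    for m in BOUNDED.finditer(body):
        n = int(m.group(1))
        if n > MAX_SINGLE_BOUND:
            return "suspect"
        combined *= n + 1   # stacked bounded wildcards multiply
    return "suspect" if combined > MAX_COMBINED else "ok"

print(lint(r".*"))          # prints unsafe
print(lint(r"\Q.*\E"))      # prints ok
print(lint(r".{0,100000}")) # prints suspect
```

This is deliberately crude: it knows nothing about character classes such as [.]* or about what the rest of the pattern does, so it will both miss real problems and flag harmless rules.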
Perfection would require introspection into the Perl regexp compiler --
which would be nice, but not so easy as far as I know ;)
--j.