On Thu, Feb 21, 2013 at 12:12 PM, Tikhon B. <[email protected]> wrote:
> Hi All,
>
> I am trying to figure out a performance question with the ruby regex
> matcher. Running the following code takes excessively longer than
> expected.
You are doing the measurement wrong. You do not only measure matching
but also creation of the String. Given that fact, all your results
are moot.
> 1. Benchmark.measure {("a" * 10_000).match(/a.*?b/)}
> => 1.140000 0.010000 1.150000 ( 1.195161)
>
> 2. Benchmark.measure {("a" * 10_000).match(/a[^b]*b/)}
> => 1.400000 0.000000 1.400000 ( 1.462178)
>
> Thinking the parser was doing some unnecessary backtracking I also tried
> atomic grouping, and other simpler expressions with very limited
> success:
>
> 3. Benchmark.measure {("a" * 10_000).match(/(?>a[^b]*b)/)}
> => 1.400000 0.000000 1.400000 ( 1.462178)
>
> 4. Benchmark.measure {("a" * 10_000).match(/aa*b/)}
> => 1.270000 0.000000 1.270000 ( 1.331137)
>
> 5. Benchmark.measure{val.match(/a.*b/)}
> => 0.460000 0.000000 0.460000 ( 0.487267)
Where does 'val' come from and what does it reference?
> This may be a very simple vector for a DDOS attack against a server
> using a similar expression to verify input data. What more, since valid
> input data is not likely to resemble scenarios 1-6 the issue may go
> undetected by a developers and testers not aware of this problem.
>
> Conversely, other developers may be turned off from using a regex in a
> place where it would perform well due to simple performance testing as
> above.
>
> This sort of performance hit is in neither the NodeJS nor Python regex
> engines. Writing this message I think I got a better grasp of the cause
> of the issue, but I still feel that it needs to be addressed for the
> sake of security.
Before we jump to conclusions please provide the proper and complete
test. Note also that it is a common fact that NFA's can suffer from
backtracking in certain conditions. So I am not too surprised nor
worried. I did not look closely at all your test cases but often
backtracking issues can be remedied by using greediness modifiers
(either "greedy" or "reluctant").
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
--
[email protected] |
https://groups.google.com/d/forum/ruby-talk-google?hl=en
---
You received this message because you are subscribed to the Google Groups
"ruby-talk-google" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.