Re: [HACKERS] Pathological regexp match

2010-02-09 Thread Joachim Wieland
On Mon, Feb 8, 2010 at 6:07 PM, David E. Wheeler da...@kineticode.com wrote: On Feb 8, 2010, at 5:15 AM, Magnus Hagander wrote: The text is about 180Kb. PostgreSQL takes ~40 seconds without the patch, ~36 seconds with it, to extract the match from it. Perl takes 0.016 seconds. Obviously we

Re: [HACKERS] Pathological regexp match

2010-02-08 Thread Magnus Hagander
2010/2/1 Michael Glaesemann michael.glaesem...@myyearbook.com: On Jan 31, 2010, at 22:14 , Tom Lane wrote: The Tcl folk accepted that patch, so I went ahead and applied it to our code.  It would still be a good idea for us to do any testing we can on it, though. I applied the patch and ran

Re: [HACKERS] Pathological regexp match

2010-02-08 Thread David E. Wheeler
On Feb 8, 2010, at 5:15 AM, Magnus Hagander wrote: The text is about 180Kb. PostgreSQL takes ~40 seconds without the patch, ~36 seconds with it, to extract the match from it. Perl takes 0.016 seconds. Obviously we need to support Perl regular expressions in core. Not PCRE, but Perl. ;-P

Re: [HACKERS] Pathological regexp match

2010-02-01 Thread Michael Glaesemann
On Jan 31, 2010, at 22:14 , Tom Lane wrote: The Tcl folk accepted that patch, so I went ahead and applied it to our code. It would still be a good idea for us to do any testing we can on it, though. I applied the patch and ran both the test query I submitted as well as original

Re: [HACKERS] Pathological regexp match

2010-01-31 Thread Magnus Hagander
On Sat, Jan 30, 2010 at 08:07, Tom Lane t...@sss.pgh.pa.us wrote: Michael Glaesemann michael.glaesem...@myyearbook.com writes: We came across a regexp that takes very much longer than expected. I poked into this a little bit.  What seems to be happening is that the use of non-greedy

Re: [HACKERS] Pathological regexp match

2010-01-31 Thread Tom Lane
I wrote: I found a possible patch that seems to improve matters: AFAICS the DFA matching is independent of the checking that cdissect() and friends do, so we can apply that check first instead of second. This brings the runtime down from minutes to sub-second on my machine. However I don't

Re: [HACKERS] Pathological regexp match

2010-01-29 Thread Magnus Hagander
2010/1/29 Alvaro Herrera alvhe...@commandprompt.com: Hi Michael, Michael Glaesemann wrote: We came across a regexp that takes very much longer than expected. PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44), 64-bit SELECT 'ooo...'

Re: [HACKERS] Pathological regexp match

2010-01-29 Thread Alvaro Herrera
Magnus Hagander wrote: 2010/1/29 Alvaro Herrera alvhe...@commandprompt.com: (There's a badly needed CHECK_FOR_INTERRUPTS in this code BTW) Incidentally, I ran across the exact same issue with a non-greedy regexp with a client earlier this week, and put on my TODO to figure out a good place

Re: [HACKERS] Pathological regexp match

2010-01-29 Thread Tom Lane
Michael Glaesemann michael.glaesem...@myyearbook.com writes: We came across a regexp that takes very much longer than expected. I poked into this a little bit. What seems to be happening is that the use of non-greedy quantifiers plus backreferences turns off most of the optimization that the

[HACKERS] Pathological regexp match

2010-01-28 Thread Michael Glaesemann
We came across a regexp that takes very much longer than expected. PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44), 64-bit SELECT 'ooo...' ~ $r$Z(Q)[^Q]*A.*?(\1)$r$; -- omitted for email brevity ?column? -- t (1 row) Time:

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Alvaro Herrera
Hi Michael, Michael Glaesemann wrote: We came across a regexp that takes very much longer than expected. PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44), 64-bit SELECT 'ooo...' ~ $r$Z(Q)[^Q]*A.*?(\1)$r$; -- omitted for email

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Michael Glaesemann
On Jan 28, 2010, at 21:59 , Alvaro Herrera wrote: Hi Michael, Michael Glaesemann wrote: We came across a regexp that takes very much longer than expected. PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44), 64-bit SELECT 'ooo...' ~

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Alvaro Herrera
Michael Glaesemann wrote: On Jan 28, 2010, at 21:59 , Alvaro Herrera wrote: Hi Michael, Michael Glaesemann wrote: We came across a regexp that takes very much longer than expected. PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Alvaro Herrera
Michael Glaesemann wrote: However, as you point out, Postgres doesn't appear to take this into account: postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q) [^Q]*A.*(\2))$r$, $s$X$s$); regexp_replace oooXooo (1 row) postgres=# select

Re: [HACKERS] Pathological regexp match

2010-01-28 Thread Michael Glaesemann
On Jan 28, 2010, at 23:21 , Alvaro Herrera wrote: I think the reason for this is that the first * is greedy and thus the entire expression is considered greedy. The fact that you've made the second * non-greedy does not ungreedify the RE ... Note the docs say: The above rules