Walter Farrell added the comment:

Thanks, Gareth. That does work.

Interesting that regex does still seem to work linearly with the original
version, but your version seems cleaner.

On Mon, Nov 14, 2016 at 3:55 PM, Gareth Rees <rep...@bugs.python.org> wrote:

>
> Gareth Rees added the comment:
>
> This is a well-known gotcha with backtracking regexp implementations. The
> problem is that in the alternation "( +|'[^']*'|\"[^\"]*\"|[^>]+)" there
> are some characters (space, apostrophe, double quotes) that match multiple
> alternatives (for example a space matches both " +" and "[^>]+"). This
> causes the regexp engine to have to backtrack for each ambiguous character
> to try out the other alternatives, leading to runtime that's exponential in
> the number of ambiguous characters.
>
> Linear behaviour can be restored if you make the alternation unambiguous,
> like this: ( +|'[^']*'|\"[^\"]*\"|[^>'\"]+)
>
> ----------
> nosy: +Gareth.Rees
>
> _______________________________________
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue28690>
> _______________________________________
>

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28690>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to