Walter Farrell added the comment: Thanks, Gareth. That does work.
Interesting that regex does still seem to work linearly with the original version, but your version seems cleaner. On Mon, Nov 14, 2016 at 3:55 PM, Gareth Rees <rep...@bugs.python.org> wrote: > > Gareth Rees added the comment: > > This is a well-known gotcha with backtracking regexp implementations. The > problem is that in the alternation "( +|'[^']*'|\"[^\"]*\"|[^>]+)" there > are some characters (space, apostrophe, double quotes) that match multiple > alternatives (for example a space matches both " +" and "[^>]+"). This > causes the regexp engine to have to backtrack for each ambiguous character > to try out the other alternatives, leading to runtime that's exponential in > the number of ambiguous characters. > > Linear behaviour can be restored if you make the alternation unambiguous, > like this: ( +|'[^']*'|\"[^\"]*\"|[^>'\"]+) > > ---------- > nosy: +Gareth.Rees > > _______________________________________ > Python tracker <rep...@bugs.python.org> > <http://bugs.python.org/issue28690> > _______________________________________ > ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue28690> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com