I am not on "Python 3000", so am restricting. Mike Klaas <[EMAIL PROTECTED]> wrote: > > > I have needed to push my stack to teach REs (don't ask), and am > > taking a look at the RE code. I may be able to extend it to support > > RFE 694374 and (more importantly) atomic groups and possessive > > quantifiers. While I regard such things as revolting beyond belief, > > they make a HELL of a difference to the efficiency of recognising > > things like HTML tags in a morass of mixed text. > > +1. I would use such a feature.
I think that I am getting somewhere, but I really dislike the style of _sre.c. It has a very complex semi-stack, semi-finite-state design and no comments on how it is supposed to work. And its memory management looks like a recipe for leaks, so I may well introduce some of them. > > The other approach, which is to stick to true regular expressions, > > and wholly or partially convert to DFAs, has already been rendered > > impossible by even the limited Perl/PCRE extensions that Python > > has adopted. > > Impossible? Surely, a sufficiently-competent re engine could detect > when a DFA is possible to construct? I doubt it. While it isn't equivalent to the halting problem, it IS an intractable one! There are two problems: Firstly, things like backreferences are an absolute no-no. They are not regular, and REs with them in cannot be converted to DFAs. That could be 'solved' by a parser that kicked out such constructions, but it would get screams from many users. Secondly, anything involving explicit or implicit negation can lead to (if I recall) a super-exponential explosion in the size of the DFA. That could be 'solved' by imposing a limit, but few people would be able to predict when it would bite. Thirdly, I would require notice of the question of whether capturing parentheses could be supported, and what semantic changes would be to which were set and how. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: [EMAIL PROTECTED] Tel.: +44 1223 334761 Fax: +44 1223 334679 _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com