I have needed to push my stack to teach REs (don't ask), and am taking a look at the RE code. I may be able to extend it to support RFE 694374 and (more importantly) atomic groups and possessive quantifiers. While I regard such things as revolting beyond belief, they make a HELL of a difference to the efficiency of recognising things like HTML tags in a morass of mixed text.
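For illustration, here is a minimal sketch of the semantics in question. It assumes only the existing `re` module and the well-known lookahead-plus-backreference idiom; it is not a proposal for how SRE would implement them. In Perl/PCRE, an atomic group `(?>X)` (and the possessive form `X*+`, which is shorthand for `(?>X*)`) can be emulated as `(?=(X))\1`. This works because a lookahead that has succeeded is never re-entered on backtracking. The pattern names and the `tags` helper are hypothetical.

```python
import re

# Perl/PCRE's atomic group (?>a*) emulated as (?=(a*))\1: the lookahead
# matches a* once and captures it, then \1 consumes exactly that text.
# A successful lookahead is never re-entered, so a* cannot give text back.
GREEDY = re.compile(r"a*a")
ATOMIC = re.compile(r"(?=(a*))\1a")

# A tag recogniser in ordinary form, and in the form equivalent to PCRE's
# possessive <[^>]*+>. On a stray '<' with no closing '>', the atomic form
# fails as soon as '>' is missing, with no backtracking into [^>]*.
TAG = re.compile(r"<[^>]*>")
TAG_ATOMIC = re.compile(r"<(?=([^>]*))\1>")


def tags(pattern: re.Pattern, text: str) -> list:
    """Return the full text of every tag that pattern finds in text."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(bool(GREEDY.fullmatch("aaa")))  # True: a* backtracks, giving one 'a' back
    print(bool(ATOMIC.fullmatch("aaa")))  # False: the atomic a* keeps all three
    text = "a <b>bold</b> and <i>x</i> < not a tag"
    print(tags(TAG, text))         # ['<b>', '</b>', '<i>', '</i>']
    print(tags(TAG_ATOMIC, text))  # same tags; only the failure path differs
```

Both tag patterns accept the same tags. The difference lies in how much work a backtracking engine may do on text that does not match. Native syntax would avoid the clumsy capture group, and the sketch makes no claim about SRE's internal optimisations.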
The other approach, which is to stick to true regular expressions and wholly or partially convert them to DFAs, has already been rendered impossible by even the limited Perl/PCRE extensions that Python has adopted.

My first question is whether this would clash with any ongoing work, including being superseded by any changes in Python 3000. Note that I am NOT proposing to do a fixed task. I will produce a proper proposal only when I know what I can achieve for a small amount of work. If the SRE engine turns out to be unsuitable for extension in these ways, I shall quietly abandon the project.

My second question is about Unicode. I really, but REALLY, regard it as a serious defect that there is no escape for printing characters. Any code that checks arbitrary text is likely to need them. Yes, I know why Perl and hence PCRE don't have one, but let's skip that. That is easy to add, though choosing a letter is tricky. My current choice is \c and \C, for 'character' (I would prefer 'text' or 'printable', but \t is obviously insane and \P is asking for incompatibility with Perl and Java).

But my attempts to rebuild the Unicode database haven't worked. Tools/unicode is, er, a trifle incomplete and out of date. The only file I need to change is Objects/unicodetype_db.h, but my initial attempts to run Tools/unicode/makeunicodedata.py have not been successful. I may be able to reverse-engineer the mechanism well enough to get the files off the Unicode site and run it, but I don't want to spend forever on it. Any clues?

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: [EMAIL PROTECTED]
Tel.: +44 1223 334761 Fax: +44 1223 334679

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev