On Sat, Aug 27, 2011 at 5:48 PM, Terry Reedy <tjre...@udel.edu> wrote:
> Many of the things regex does differently might be called either bug fixes
> or feature changes, depending on one's viewpoint. Regex should definitely
> not be 'bug-compatible'.

Well, as you said, it depends on one's viewpoint. If there's a bug in the treatment of non-BMP character ranges, that's a bug, and fixing it shouldn't break anybody's code (unless it was worth breaking :-). But if there's a change that e.g. (hypothetical example) makes a different choice about how empty matches are treated in some edge case, and the old behavior was properly documented, that's a feature change, and I'd rather introduce a flag to select the new behavior (or, if we have to, a flag to preserve the old behavior, if the new behavior is really considered much better and much more useful).

> I think regex should be unicode-standard compliant as much as possible, and
> let the chips fall where they may.

In most cases the Unicode improvements in regex are not where it is incompatible; e.g. adding \X and named ranges are fine new additions and IIUC the syntax was carefully designed not to introduce any incompatibilities (within the limitations of \-escapes). It's the many other "improvements" to the regex module that sometimes make it incompatible.

There's a comprehensive list here: http://pypi.python.org/pypi/regex . Somebody should just go over it and for each difference make a recommendation for whether to treat this as a bugfix, a compatible new feature, or an incompatibility that requires some kind of flag. (We could have a single flag for all incompatibilities, or several flags.)

> If so, it would be like the decimal
> module, which closely tracks the IEEE decimal standard, rather than the
> binary float standard.

Well, I would hope that for each "major" Python version (i.e. 3.2, 3.3, 3.4, ...) we would pick a specific version of the Unicode standard and declare our desire to be compliant with that Unicode standard version, and not switch allegiances in some bugfix version (e.g. 3.2.3, 3.3.1, ...).

> Regex is already much more compliant than re, as shown by Tom Christiansen.

Nobody disagrees with this or thinks it's a bad thing.
:-)

> This is pretty obviously intentional on MB's part.

That's also clear.

> It is also probably intentional that re *not* match today's Unicode
> TR18 specifications.

That I'm not so sure of. I think it's more the case that TR18 evolved and that the re modules didn't -- probably mostly because nobody had the time and nobody was aware of the TR18 changes.

> These are reasons why both Ezio and I suggested on the tracker adding regex
> without deleting re. (I personally would not mind just replacing re with
> regex, but then I have no legacy re code to break. So I am not suggesting
> that out of respect for those who do.)

That option is definitely still on the table. At the very least a thorough review of the stated differences between re and regex should be done -- I trust that MR has been very thorough in his listing of those differences. The issues regarding maintenance and stability of MR's code can be solved in a number of ways -- if MR doesn't mind I would certainly be willing to give him core committer access (though I'd still recommend that he use his time primarily to train others in maintaining this important code base).

-- 
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev