Jeffrey C. Jacobs <[EMAIL PROTECTED]> added the comment:

Thanks, Jim, for your thoughts!
Amaury has already explained about Perl 5.10.0. I suppose it's like Macintosh version numbering, since Mac Tiger went from version 10.4.9 to 10.4.10 and 10.4.11 a few years ago. Maybe we should call Python 2.6 Python 2.06 just in case. But 2.6 is the known last in the 2 series, so it's not a problem for us! :)

>> as well as add a few python-specific
>
> because this also adds to the scope.

At this point the only python-specific changes I am proposing would be items 2, 3 (discussed below), 5 (discussed below), 6 and 7. 6 is only a documentation change; the code is already implemented. 7 is just a better behavior. I think it is RARE that one compiles more than 100 unique regular expressions, but you never know, as projects tend to grow over time, and in the old code the 101st would be recompiled even if it had been compiled just 2 minutes ago. The patch is available, so I leave it to the community to judge for themselves whether it is worth it, but as you can see, it's not a very large change.

>> 2) Make named matches direct attributes
>> of the match object; i.e. instead of m.group('foo'),
>> one will be able to write simply m.foo.
>
>> 3) (maybe) make Match objects subscriptable, such
>> that m[n] is equivalent to m.group(n) and allow slicing.
>
> (2) and (3) would both be nice, but I'm not sure it makes sense to do
> *both* instead of picking one.

Well, I think named matches are better than numbered ones, so I'd definitely go with 2. The problem with 2, though, is that it still leaves the rather typographically intense m.group(n), since I cannot write m.3. However, since capture groups are always numbered sequentially, they model a list very nicely. So I think for indexing by group number, the subscripting operator makes sense. I was not originally suggesting that m['foo'] be supported, but I can see how that may come out of 3.
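To make the comparison concrete, here is a rough Python-level sketch of how 2 and 3 could look from the user's side. `MatchView` is a hypothetical wrapper invented for this illustration and is not part of `re`; the actual change would be made in the C Match object, and the slicing semantics shown (group 0 included) are an assumption.

```python
import re


class MatchView:
    """Hypothetical wrapper illustrating items 2 and 3; not part of re."""

    def __init__(self, match):
        self._match = match

    def __getattr__(self, name):
        # Item 2: m.foo behaves like m.group('foo').
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._match.group(name)
        except IndexError:
            # group() raises IndexError for an unknown group name.
            raise AttributeError(name) from None

    def __getitem__(self, index):
        # Item 3: m[n] behaves like m.group(n); slices cover groups 0..N.
        if isinstance(index, slice):
            all_groups = (self._match.group(0),) + self._match.groups()
            return all_groups[index]
        return self._match.group(index)


m = MatchView(re.match(r"(?P<first>\w+) (?P<last>\w+)", "Jeffrey Jacobs"))
print(m.first)   # attribute access by group name
print(m[2])      # subscript by group number
print(m[1:3])    # slice of numbered groups
```

Attribute access only works because group names are restricted to valid identifiers, whereas subscripting needs no such restriction.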
But there is a restriction on python named matches that the names have to be valid python identifiers, and that strikes me as favoring 2 over 3, because 2 would require such a restriction but 3 would not. So at least I want 2, but it seems IMHO m[1] is better than m.group(1) and not in the least a hard or confusing way of retrieving the given group. Mind you, the Match object is a C-struct with a python binding and I'm not exactly sure how to add either feature to it, but I'm sure the C-API manual will help with that.

>> 5) Add a well-formed, python-specific comment modifier,
>> e.g. (?P#...);
>
> [handles parens in comments without turning on verbose, but is slower]
>
> Why? It adds another incompatibility, so it has to be very useful or
> clear. What exactly is the advantage over just turning on verbose?

Well, Larry Wall and Guido agreed long ago that we, the python community, own all expressions of the form (?P...), and although it'd be my preference to make (?#...) understand parenthesis nesting, changing the logic behind THAT would make python non-standard. So as far as any conflicting design goes, we needn't worry.

As for speed, this all occurs in the parser and does not affect the compiler or engine. It occurs only after a (?P has been read, and then only as the last check before failure, so it should not be much slower except when the expression is invalid. The actual execution time to find the closing parenthesis of (?P#...) is a bit slower than that for (?#...), but not by much.

Verbose is generally a good idea for anything more than a trivial Regular Expression. However, it can have overhead if not included as the first flag: an expression is always checked for verbose post-compilation, and if it is encountered, the expression is compiled a second time, which is somewhat wasteful.

But the reason I like (?P#...) over (?#...) is that I think people would tend to assume that

r'He(?# 2 (TWO) ls)llo'

should match "Hello", but it doesn't.
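The (?#...) behavior can be checked with a small script. This is only a sketch: `matches_hello` is a helper made up for the illustration, and it treats a compile error the same as a failed match, since depending on the `re` version the leftover ")" may be rejected outright. The proposed (?P#...) syntax is not exercised, because it exists only in the patch under discussion.

```python
import re


def matches_hello(pattern, flags=0):
    """Return True if pattern compiles and matches all of 'Hello'."""
    try:
        return re.fullmatch(pattern, "Hello", flags) is not None
    except re.error:
        # A stray ")" left over after the comment may fail to compile.
        return False


# No nested parentheses: the whole comment is skipped.
print(matches_hello(r"He(?# two ls)llo"))

# Nested parentheses: the comment ends at the first ")",
# so " ls)" is left in the pattern and "Hello" is not matched.
print(matches_hello(r"He(?# 2 (TWO) ls)llo"))

# Verbose mode: "#" starts a comment that runs to the end of the line,
# so parentheses inside it are harmless.
print(matches_hello("He # 2 (TWO) ls\nllo", re.VERBOSE))
```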
That expression only matches "He ls)llo", so I created (?P#...) to make the comment match type more intuitive:

r'He(?P# 2 (TWO) ls)llo'

matches "Hello".

>> 9) C-Engine speed-ups. ...
>> a number of Macros are being eliminated where appropriate.
>
> Be careful on those, particular on str/unicode and different
> compile options.

Will do; thanks for the advice! I have only observed the UNICODE flag controlling whether certain code is used (besides the ones I've added) and have tried to stay true to that when I encounter it. Mind you, unless I can get my extra 10%, it's unlikely I'd actually go with item 9 here, even if it is easier to read IMHO. However, I want to run the new engine proposal through gprof to see if I can track down some bottlenecks.

At some point, I hope to get my current changes on Launchpad if I can get that working. If I do, I'll post a link here showing how people can check out my working code.

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2636>
__________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com