New submission from Serhiy Storchaka:

The regular expression parser parses a pattern into a tree, marking nodes with 
string identifiers. The regular expression compiler converts this tree into a 
plain list of integers, transforming the node identifiers into sequential 
integers. The resulting list is not human-readable. The proposed patch converts 
the string constants in the sre_constants module to named integer constants. 
These constants don't need to be converted to integers, because they already 
are integers, and when printed they look human-friendly. The intermediate 
result of the regular expression compiler now looks much more readable.
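
The idea of a "named integer constant" can be illustrated with a small int 
subclass whose repr() is a symbolic name. This is only a minimal sketch and is 
not taken from the patch: the class name NamedIntConstant is hypothetical, and 
only the values FAILURE = 0 and SUCCESS = 1 are inferred from the example 
output in this message.

```python
class NamedIntConstant(int):
    """An int that prints as a symbolic name (hypothetical helper)."""

    def __new__(cls, value, name):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        # A list's repr() calls repr() on each element, so the name shows up.
        return self.name


# Illustrative constants only; the real module defines many more.
FAILURE = NamedIntConstant(0, "FAILURE")
SUCCESS = NamedIntConstant(1, "SUCCESS")

code = [SUCCESS, 4, FAILURE]
print(code)               # names are shown for the constants
print(code == [1, 4, 0])  # they still compare equal to plain integers
```

Because such objects are still integers, code that consumes the list needs no 
conversion step; only the printed form changes.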

Example.

>>> import re, sre_compile, sre_parse
>>> sre_compile._code(sre_parse.parse('[a-z_][a-z_0-9]+', re.I), re.I)

Before patch:

[17, 4, 0, 2, 2147483647, 16, 7, 27, 97, 122, 19, 95, 0, 29, 16, 1, 2147483647, 
16, 11, 10, 0, 67043328, 2147483648, 134217726, 0, 0, 0, 0, 0, 1, 1]

After patch:

[INFO, 4, 0, 2, MAXREPEAT, IN_IGNORE, 7, RANGE, 97, 122, LITERAL, 95, FAILURE, 
REPEAT_ONE, 16, 1, MAXREPEAT, IN_IGNORE, 11, CHARSET, 0, 67043328, 2147483648, 
134217726, 0, 0, 0, 0, FAILURE, SUCCESS, SUCCESS]

This patch also affects the debugging output when a regular expression is 
compiled with re.DEBUG (identifiers are uppercased and MAXREPEAT is displayed 
instead of 2147483647 in repeat statements).
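
To inspect that debugging output, one can compile a pattern with re.DEBUG and 
capture what is printed. The following is a minimal sketch; the helper name 
debug_dump is hypothetical, and the exact text it returns depends on the 
Python version and on whether the patch is applied, so no output is shown.

```python
import contextlib
import io
import re


def debug_dump(pattern, flags=0):
    """Compile `pattern` with re.DEBUG and return the text that was printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, flags | re.DEBUG)
    return buf.getvalue()


# Same pattern and flag as in the example above.
print(debug_dump('[a-z_][a-z_0-9]+', re.I))
```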

Apart from the debugging output, these changes are invisible to ordinary users. 
They are needed only for developing and debugging the re module itself. The 
patch doesn't affect performance and has almost no effect on memory consumption.

----------
components: Regular Expressions
files: re_named_consts.patch
keywords: patch
messages: 227008
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Use named constants internally in the re module
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file36642/re_named_consts.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22434>
_______________________________________