03.10.17 06:29, INADA Naoki wrote:
Before deferring re.compile, can we make it faster?

I profiled `import string`, and a small optimization can make it 2x faster!
(But it's not backward compatible.)

Please open an issue for this.

I found:

* RegexFlag.__and__ and __new__ are called very often.
* _optimize_charset is slow because of re.UNICODE | re.IGNORECASE.
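
To make the first point concrete, here is a rough sketch (not taken from the profile above) that times bitwise operations on re.RegexFlag members against the same operations on plain ints. The variable names and the iteration count are illustrative assumptions; absolute numbers will vary by Python version and machine.

```python
import re
import timeit

# re.RegexFlag members and their plain-int equivalents (same bits).
flag_a, flag_b = re.IGNORECASE, re.UNICODE
int_a, int_b = int(flag_a), int(flag_b)

# The combined value is identical either way; only the result type differs.
combined_flag = flag_a | flag_b   # goes through the enum's operator methods
combined_int = int_a | int_b      # plain integer arithmetic

N = 20_000  # arbitrary iteration count chosen for illustration

enum_time = timeit.timeit(
    "flag_a | flag_b; flag_a & flag_b", globals=globals(), number=N)
int_time = timeit.timeit(
    "int_a | int_b; int_a & int_b", globals=globals(), number=N)

print(f"RegexFlag ops: {enum_time:.4f}s   int ops: {int_time:.4f}s")
```

Converting the flags to int once, at the point where they enter the compiler, would keep the hot paths on the plain-int side of this comparison.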

diff --git a/Lib/sre_compile.py b/Lib/sre_compile.py
index 144620c6d1..7c662247d4 100644
--- a/Lib/sre_compile.py
+++ b/Lib/sre_compile.py
@@ -582,7 +582,7 @@ def isstring(obj):

 def _code(p, flags):

-    flags = p.pattern.flags | flags
+    flags = int(p.pattern.flags) | int(flags)
     code = []

     # compile info block

Maybe cast flags to int earlier, in sre_compile.compile()?

diff --git a/Lib/string.py b/Lib/string.py
index b46e60c38f..fedd92246d 100644
--- a/Lib/string.py
+++ b/Lib/string.py
@@ -81,7 +81,7 @@ class Template(metaclass=_TemplateMetaclass):
     delimiter = '$'
     idpattern = r'[_a-z][_a-z0-9]*'
     braceidpattern = None
-    flags = _re.IGNORECASE
+    flags = _re.IGNORECASE | _re.ASCII

     def __init__(self, template):
         self.template = template

patched:
import time:      1191 |       8479 | string

Of course, this patch is not backward compatible: [a-z] no longer matches 'ı' or 'ſ'.
But who cares?
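
The difference in matching can be checked directly with the re module. This is a minimal sketch that uses re on its own rather than string.Template; the variable names are illustrative.

```python
import re

# Case-insensitive [a-z] with Unicode rules (the default for str patterns)
# versus the same pattern restricted to ASCII case folding.
unicode_ci = re.compile(r'[a-z]', re.IGNORECASE)
ascii_ci = re.compile(r'[a-z]', re.IGNORECASE | re.ASCII)

# U+0131 LATIN SMALL LETTER DOTLESS I and U+017F LATIN SMALL LETTER LONG S
for ch in ('\u0131', '\u017f'):
    print(repr(ch),
          "unicode:", bool(unicode_ci.fullmatch(ch)),
          "ascii:", bool(ascii_ci.fullmatch(ch)))
```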

This looks like a bug fix. I'm wondering whether it is worth backporting to 3.6. But the change itself can break user code that changes idpattern without touching flags. There is another way, but it should be discussed on the bug tracker.
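
A sketch of the kind of user code that could be affected. Both subclasses are hypothetical; each sets flags explicitly to stand in for the value it would inherit from Template before and after the proposed change.

```python
import re
from string import Template

class BeforePatch(Template):
    # Hypothetical user subclass: widens idpattern to Cyrillic letters.
    idpattern = r'[_a-zа-я][_a-zа-я0-9]*'
    flags = re.IGNORECASE              # stands in for the current default

class AfterPatch(Template):
    idpattern = r'[_a-zа-я][_a-zа-я0-9]*'
    flags = re.IGNORECASE | re.ASCII   # stands in for the patched default

values = {'ИМЯ': 'x', 'имя': 'y'}

# Upper-case Cyrillic relies on Unicode case-insensitive matching.
upper_before = BeforePatch('$ИМЯ').safe_substitute(values)
upper_after = AfterPatch('$ИМЯ').safe_substitute(values)

# Lower-case Cyrillic is inside the character range literally, so it is
# recognized under either set of flags.
lower_before = BeforePatch('$имя').safe_substitute(values)
lower_after = AfterPatch('$имя').safe_substitute(values)

print(upper_before, upper_after, lower_before, lower_after)
```

With the ASCII flag, case-insensitive matching applies only to ASCII letters, so the upper-case placeholder is no longer recognized as an identifier and safe_substitute leaves it untouched.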
