[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
Raymond Hettinger added the comment: Serhiy, Gregory, Raymond, Antoine: so what is your feeling on this issue? Is it worth it? I don't think it is worth it. There may be some cases that benefit, but it adds extra branching code to the common cases (sets and dicts) that already have the identity check. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: If it's hard to see a real speedup, it's probably not interesting to use the hash in string comparison. -- resolution: - invalid status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: I ran pybench with the patch. I don't understand this result (10% slower with the patch): DictWithStringKeys:28ms25ms +10.7%28ms26ms +10.5% This test doesn't use unicode_compare_eq() from Objects/unicodeobject.c but unicode_eq() from Objects/stringlib/eq.h which is not modified by the patch. -- resolution: - invalid status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: (oops, I didn't want to close the issue, it's a mistake) -- resolution: invalid - status: closed - open ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: I added recently a new _PyUnicode_CompareWithId() function: changeset 77bebcf5c4cf (issue #19512). This function can be used instead of PyUnicode_CompareWithASCIIString() when the right parameter is a common string. It is interesting when the right string is probably present in a dictionary. For example, path is always present as sys.path. So interning the string doesn't eat more memory. _PyUnicode_CompareWithId() would be more efficient with compare_hash-3.patch. The function is not used yet in critical path. It is now used in type_new() for example. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: Serhiy, Gregory, Raymond, Antoine: so what is your feeling on this issue? Is it worth it? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
Roundup Robot added the comment: New changeset 536a7c09c7fd by Victor Stinner in branch 'default': Issue #16286: write a new subfunction bytes_compare_eq() http://hg.python.org/cpython/rev/536a7c09c7fd -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
Roundup Robot added the comment: New changeset 5fa291435740 by Victor Stinner in branch 'default': Issue #16286: optimize PyUnicode_RichCompare() for identical strings (same http://hg.python.org/cpython/rev/5fa291435740 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
Roundup Robot added the comment: New changeset da9c6e4ef301 by Victor Stinner in branch 'default': Issue #16286: remove duplicated identity check from unicode_compare() http://hg.python.org/cpython/rev/da9c6e4ef301 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: I applied changes unrelated to the hash. -- Added file: http://bugs.python.org/file32493/compare_hash-3.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: Results of benchmarks using compare_hash-3.patch: $ time ../benchmarks/perf.py -r -b default ./pythonorig ./pythonhash INFO:root:Skipping benchmark slowspitfire; not compatible with Python 3.4 INFO:root:Skipping benchmark slowpickle; not compatible with Python 3.4 INFO:root:Skipping benchmark slowunpickle; not compatible with Python 3.4 INFO:root:Skipping benchmark spambayes; not compatible with Python 3.4 Running 2to3... Running django_v2... Report on Linux smithers 3.9.4-200.fc18.x86_64 #1 SMP Fri May 24 20:10:49 UTC 2013 x86_64 x86_64 Total CPU cores: 8 ### 2to3 ### Min: 6.358000 - 6.055000: 1.05x faster Avg: 6.407600 - 6.179800: 1.04x faster Significant (t=3.53) Stddev: 0.04311 - 0.13785: 3.1979x larger ### nbody ### Min: 0.219029 - 0.212477: 1.03x faster Avg: 0.224940 - 0.219248: 1.03x faster Significant (t=10.13) Stddev: 0.00373 - 0.00420: 1.1288x larger The following not significant results are hidden, use -v to show them: django_v2. At least, Python is not slower with the patch :-) I'm surprised that the benchmark shows a difference. nbody benchmark is focused on float numbers. I checked with gdb, nbody benchmark does not call any Unicode comparison function. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: Updated patch. -- Added file: http://bugs.python.org/file32445/compare_hash-2.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
Antoine Pitrou added the comment: That raises the question of what strings ever have had their hash already computed if the string hasn't been interned or has been used in a dict or set? Currently, none, I think. -- nosy: +pitrou ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: That raises the question of what strings ever have had their hash already computed if the string hasn't been interned or has been used in a dict or set? Currently, none, I think. Strings are used (and compared) outside dict and set. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: Let's try to identify some use cases in the Python test suite using gdb: (gdb) b unicode_compare_eq (gdb) condition 1 ((PyASCIIObject*)str1)-hash != -1 ((PyASCIIObject*)str2)-hash != -1 ((PyASCIIObject*)str1)-hash != ((PyASCIIObject*)str2)-hash (gdb) run I didn't dig to understand why hash of these strings are computed. Tell me if you need more examples. Random examples: (1) compare constant strings (strings from co_consts of code objects) importlib._bootstrap: _setup(): os_details = ('posix', ['/']), ('nt', ['\\', '/']) for builtin_os, path_separators in os_details: ... ... if builtin_os == 'nt': == HERE ... (2) importlib._bootstrap: _LoaderBasics.is_package() def is_package(self, fullname): filename = _path_split(self.get_filename(fullname))[1] filename_base = filename.rsplit('.', 1)[0] tail_name = fullname.rpartition('.')[2] return filename_base == '__init__' and ... == HERE It's surprising that filename_base has its hash computed. I suppose that all these functions (_path_split, .rsplit, .rpartition) return the string unmodified. (3) importlib._bootstrap: PathFinder._path_importer_cache(): @classmethod def _path_importer_cache(cls, path): ... if path == '': == HERE path is an entry of sys.path. (4) str in __all__ (list of str): os.py: if putenv not in __all__: __all__.append(putenv) __all__ is a list of strings. (5) site.py: if __name__ == '__main__': == HERE __name__ is 'site'. (6) Python/ceval.py: PyEval_EvalCodeEx() called with arbitrary keyword for (i = 0; i kwcount; i++) { PyObject **co_varnames; PyObject *keyword = kws[2*i]; PyObject *value = kws[2*i + 1]; int j; ... /* Speed hack: do raw pointer compares. As names are normally interned this should almost always hit. */ co_varnames = ((PyTupleObject *)(co-co_varnames))-ob_item; for (j = 0; j total_args; j++) { PyObject *nm = co_varnames[j]; if (nm == keyword) goto kw_found; } /* Slow fallback, just in case */ for (j = 0; j total_args; j++) { PyObject *nm = co_varnames[j]; int cmp = PyObject_RichCompareBool( == HERE keyword, nm, Py_EQ); if (cmp 0) goto kw_found; else if (cmp 0) goto fail; } It looks like the just in case path is taken. (gdb) where #0 unicode_compare_eq (str1='isTest', str2='func') at Objects/unicodeobject.c:10532 #1 0x0052dd41 in PyUnicode_RichCompare (left='isTest', right='func', op=2) at Objects/unicodeobject.c:10609 #2 0x004be4db in do_richcompare (v='isTest', w='func', op=2) at Objects/object.c:647 #3 0x004be790 in PyObject_RichCompare (v='isTest', w='func', op=2) at Objects/object.c:696 #4 0x004be832 in PyObject_RichCompareBool (v='isTest', w='func', op=2) at Objects/object.c:718 #5 0x005a0f68 in PyEval_EvalCodeEx (...) at Python/ceval.c:3450 ... Traceback (most recent call first): File /home/haypo/prog/python/default/Lib/test/test_xml_etree.py, line 1669, in test_get_keyword_args e1 = ET.Element('foo' , x=1, y=2, z=3) ElementTree.Element() accepts arbitary keywords. (7) letter==letter singletons: xml.etree.ElementPath: iterfind() def iterfind(elem, path, namespaces=None): ... if path[-1:] == /: == HERE Traceback (most recent call first): File /home/haypo/prog/python/default/Lib/xml/etree/ElementPath.py, line 254, in iterfind if path[-1:] == /: path is .//grandchild, path[-1] is 'd' which is a singleton, Python already computed the hash of 'd'. Similar example in the same file: def xpath_tokenizer(pattern, namespaces=None): for token in xpath_tokenizer_re.findall(pattern): tag = token[1] if tag and tag[0] != { and : in tag: == HERE ... tag[0] != { = tag is 'grandchild', tag[0] is a singleton. Another example: Traceback (most recent call first): File /home/haypo/prog/python/default/Lib/sre_parse.py, line 194, in __next if char == \\: (8) str not in (list of str), test_descr.py: test_dir(): File /home/haypo/prog/python/default/Lib/test/test_descr.py, line 2255, in listcomp names = [x for x in dir(minstance) if x not in default_attributes] minstance = M(m) minstance.b = 2 minstance.a = 1 default_attributes = ['__name__', '__doc__', '__package__', '__loader__'] names = [x for x in dir(minstance) if x not in default_attributes] -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe:
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: (4) str in __all__ (list of str): os.py: if putenv not in __all__: __all__.append(putenv) For this example: putenv is probably interned by def putenv(...). putenv string and the name of the function are the same constant. When a function is defined, it is stored in the locals dictionary, so the key (putenv) is interned by PyDict_SetItem(). So dict is not used explicitly, but implicitly by the namespace. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
STINNER Victor added the comment: I will benchmark the overhead of memcmp() on short strings. We may check the first and last characters before calling memcmp() to limit the overhead of calling a function. I created the issue #17628 for this point. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str
Changes by STINNER Victor victor.stin...@gmail.com: -- title: Optimize a==b and a!=b for bytes and str - Use hash if available to optimize a==b and a!=b for bytes and str ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16286 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com