[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2014-04-16 Thread Raymond Hettinger

Raymond Hettinger added the comment:

 Serhiy, Gregory, Raymond, Antoine: so what is your feeling on 
 this issue? Is it worth it?

I don't think it is worth it.  There may be some cases that benefit, but it 
adds extra branching code to the common cases (sets and dicts) that already 
have the identity check.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-12-16 Thread STINNER Victor

STINNER Victor added the comment:

If it's hard to see a real speedup, it's probably not interesting to use the 
hash in string comparison.

--
resolution:  - invalid
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-11-12 Thread STINNER Victor

STINNER Victor added the comment:

I ran pybench with the patch. I don't understand this result (10% slower with 
the patch):

DictWithStringKeys:28ms25ms  +10.7%28ms26ms  +10.5%

This test doesn't use unicode_compare_eq() from Objects/unicodeobject.c but 
unicode_eq() from Objects/stringlib/eq.h which is not modified by the patch.

--
resolution:  - invalid
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-11-12 Thread STINNER Victor

STINNER Victor added the comment:

(oops, I didn't want to close the issue, it's a mistake)

--
resolution: invalid - 
status: closed - open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-11-07 Thread STINNER Victor

STINNER Victor added the comment:

I added recently a new _PyUnicode_CompareWithId() function: changeset 
77bebcf5c4cf (issue #19512).

This function can be used instead of PyUnicode_CompareWithASCIIString() when 
the right parameter is a common string. It is interesting when the right string 
is probably present in a dictionary. For example, path is always present as 
sys.path. So interning the string doesn't eat more memory.

_PyUnicode_CompareWithId() would be more efficient with compare_hash-3.patch. 
The function is not used yet in critical path. It is now used in type_new() for 
example.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-11-07 Thread STINNER Victor

STINNER Victor added the comment:

Serhiy, Gregory, Raymond, Antoine: so what is your feeling on this issue? Is it 
worth it?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-11-04 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 536a7c09c7fd by Victor Stinner in branch 'default':
Issue #16286: write a new subfunction bytes_compare_eq()
http://hg.python.org/cpython/rev/536a7c09c7fd

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-11-04 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 5fa291435740 by Victor Stinner in branch 'default':
Issue #16286: optimize PyUnicode_RichCompare() for identical strings (same
http://hg.python.org/cpython/rev/5fa291435740

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-11-04 Thread Roundup Robot

Roundup Robot added the comment:

New changeset da9c6e4ef301 by Victor Stinner in branch 'default':
Issue #16286: remove duplicated identity check from unicode_compare()
http://hg.python.org/cpython/rev/da9c6e4ef301

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-11-04 Thread STINNER Victor

STINNER Victor added the comment:

I applied changes unrelated to the hash.

--
Added file: http://bugs.python.org/file32493/compare_hash-3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-11-04 Thread STINNER Victor

STINNER Victor added the comment:

Results of benchmarks using compare_hash-3.patch:

$ time ../benchmarks/perf.py -r -b default ./pythonorig ./pythonhash 
INFO:root:Skipping benchmark slowspitfire; not compatible with Python 3.4
INFO:root:Skipping benchmark slowpickle; not compatible with Python 3.4
INFO:root:Skipping benchmark slowunpickle; not compatible with Python 3.4
INFO:root:Skipping benchmark spambayes; not compatible with Python 3.4
Running 2to3...
Running django_v2...

Report on Linux smithers 3.9.4-200.fc18.x86_64 #1 SMP Fri May 24 20:10:49 UTC 
2013 x86_64 x86_64
Total CPU cores: 8

### 2to3 ###
Min: 6.358000 - 6.055000: 1.05x faster
Avg: 6.407600 - 6.179800: 1.04x faster
Significant (t=3.53)
Stddev: 0.04311 - 0.13785: 3.1979x larger

### nbody ###
Min: 0.219029 - 0.212477: 1.03x faster
Avg: 0.224940 - 0.219248: 1.03x faster
Significant (t=10.13)
Stddev: 0.00373 - 0.00420: 1.1288x larger

The following not significant results are hidden, use -v to show them:
django_v2.


At least, Python is not slower with the patch :-) I'm surprised that the 
benchmark shows a difference. nbody benchmark is focused on float numbers. I 
checked with gdb, nbody benchmark does not call any Unicode comparison function.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-10-31 Thread STINNER Victor

STINNER Victor added the comment:

Updated patch.

--
Added file: http://bugs.python.org/file32445/compare_hash-2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-10-28 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 That raises the question of what strings ever have had their hash
 already computed if the string hasn't been interned or has been used
 in a dict or set?

Currently, none, I think.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-10-28 Thread STINNER Victor

STINNER Victor added the comment:

 That raises the question of what strings ever have had their hash
 already computed if the string hasn't been interned or has been used
 in a dict or set?

 Currently, none, I think.

Strings are used (and compared) outside dict and set.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-10-28 Thread STINNER Victor

STINNER Victor added the comment:

Let's try to identify some use cases in the Python test suite using gdb:

(gdb) b unicode_compare_eq
(gdb) condition 1 ((PyASCIIObject*)str1)-hash != -1   
((PyASCIIObject*)str2)-hash != -1  ((PyASCIIObject*)str1)-hash != 
((PyASCIIObject*)str2)-hash 
(gdb) run

I didn't dig to understand why hash of these strings are computed. Tell me if 
you need more examples.


Random examples:

(1) compare constant strings (strings from co_consts of code objects)

importlib._bootstrap: _setup():

os_details = ('posix', ['/']), ('nt', ['\\', '/'])
for builtin_os, path_separators in os_details:
...
...
if builtin_os == 'nt': == HERE
...


(2) importlib._bootstrap: _LoaderBasics.is_package()

def is_package(self, fullname):
filename = _path_split(self.get_filename(fullname))[1]
filename_base = filename.rsplit('.', 1)[0]
tail_name = fullname.rpartition('.')[2]
return filename_base == '__init__' and ... == HERE

It's surprising that filename_base has its hash computed. I suppose that all 
these functions (_path_split, .rsplit, .rpartition) return the string 
unmodified.


(3) importlib._bootstrap: PathFinder._path_importer_cache():

@classmethod
def _path_importer_cache(cls, path):
...
if path == '': == HERE

path is an entry of sys.path.


(4) str in __all__ (list of str):

os.py:

if putenv not in __all__:
__all__.append(putenv)

__all__ is a list of strings.


(5) site.py:

if __name__ == '__main__': == HERE

__name__ is 'site'.


(6) Python/ceval.py: PyEval_EvalCodeEx() called with arbitrary keyword

for (i = 0; i  kwcount; i++) {
PyObject **co_varnames;
PyObject *keyword = kws[2*i];
PyObject *value = kws[2*i + 1];
int j;
...
/* Speed hack: do raw pointer compares. As names are
   normally interned this should almost always hit. */
co_varnames = ((PyTupleObject *)(co-co_varnames))-ob_item;
for (j = 0; j  total_args; j++) {
PyObject *nm = co_varnames[j];
if (nm == keyword)
goto kw_found;
}
/* Slow fallback, just in case */
for (j = 0; j  total_args; j++) {
PyObject *nm = co_varnames[j];
int cmp = PyObject_RichCompareBool(  == HERE
keyword, nm, Py_EQ);
if (cmp  0)
goto kw_found;
else if (cmp  0)
goto fail;
}

It looks like the just in case path is taken.

(gdb) where
#0  unicode_compare_eq (str1='isTest', str2='func') at 
Objects/unicodeobject.c:10532
#1  0x0052dd41 in PyUnicode_RichCompare (left='isTest', right='func', 
op=2) at Objects/unicodeobject.c:10609
#2  0x004be4db in do_richcompare (v='isTest', w='func', op=2) at 
Objects/object.c:647
#3  0x004be790 in PyObject_RichCompare (v='isTest', w='func', op=2) at 
Objects/object.c:696
#4  0x004be832 in PyObject_RichCompareBool (v='isTest', w='func', op=2) 
at Objects/object.c:718
#5  0x005a0f68 in PyEval_EvalCodeEx (...) at Python/ceval.c:3450
...

Traceback (most recent call first):
  File /home/haypo/prog/python/default/Lib/test/test_xml_etree.py, line 1669, 
in test_get_keyword_args
e1 = ET.Element('foo' , x=1, y=2, z=3)

ElementTree.Element() accepts arbitary keywords.


(7) letter==letter singletons:


xml.etree.ElementPath: iterfind()

def iterfind(elem, path, namespaces=None):
...
if path[-1:] == /: == HERE

Traceback (most recent call first):
  File /home/haypo/prog/python/default/Lib/xml/etree/ElementPath.py, line 
254, in iterfind
if path[-1:] == /:

path is .//grandchild, path[-1] is 'd' which is a singleton, Python already 
computed the hash of 'd'.


Similar example in the same file:

def xpath_tokenizer(pattern, namespaces=None):
for token in xpath_tokenizer_re.findall(pattern):
tag = token[1]
if tag and tag[0] != { and : in tag: == HERE
...

tag[0] != { = tag is 'grandchild', tag[0] is a singleton.


Another example:

Traceback (most recent call first):
  File /home/haypo/prog/python/default/Lib/sre_parse.py, line 194, in __next
if char == \\:


(8) str not in (list of str), test_descr.py: test_dir():

  File /home/haypo/prog/python/default/Lib/test/test_descr.py, line 2255, in 
listcomp
names = [x for x in dir(minstance) if x not in default_attributes]

minstance = M(m)
minstance.b = 2
minstance.a = 1
default_attributes = ['__name__', '__doc__', '__package__',
  '__loader__']
names = [x for x in dir(minstance) if x not in default_attributes]

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 

[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-10-28 Thread STINNER Victor

STINNER Victor added the comment:


(4) str in __all__ (list of str):

os.py:

if putenv not in __all__:
__all__.append(putenv)


For this example: putenv is probably interned by def putenv(...). putenv 
string and the name of the function are the same constant. When a function is 
defined, it is stored in the locals dictionary, so the key (putenv) is 
interned by PyDict_SetItem().

So dict is not used explicitly, but implicitly by the namespace.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-04-09 Thread STINNER Victor

STINNER Victor added the comment:

I will benchmark the overhead of memcmp() on short strings. We may
check the first and last characters before calling memcmp() to limit
the overhead of calling a function.

I created the issue #17628 for this point.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16286] Use hash if available to optimize a==b and a!=b for bytes and str

2013-04-07 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
title: Optimize a==b and a!=b for bytes and str - Use hash if available to 
optimize a==b and a!=b for bytes and str

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com