[issue21556] try to use hashtable in pickle

2014-05-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 I'm saying attempt, because although it works correctly, some benchmarks are 
 actually slower.
 I didn't profile it, so I don't know if it's due to the hashtable 
 implementation, function call overheads, etc.

It probably shows that Python dicts (which the pickle hashtable is a rip-off 
of, module refcounting) are more optimized than the average hash table 
implementation :-)

But also, _Py_hashtable_hash_ptr is quite crude in that it doesn't even try to 
compensate for pointer alignment, so there will automatically be many 
collisions. Only one hash bucket every 8 or 16 will be used, at best.

And the straightforward collision resolution in hashtable.c is much less 
efficient at mitigating collisions than a Python dict's.

--
nosy: +tim.peters

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21556
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21556] try to use hashtable in pickle

2014-05-23 Thread STINNER Victor

STINNER Victor added the comment:

_Py_hashtable_hash_ptr is quite crude in that it doesn't even try to 
compensate for pointer alignment, so there will automatically be many 
collisions. Only one hash bucket every 8 or 16 will be used, at best.

I chose to use _Py_HashPointer() to drop (shift) lower bits. Do you mean that 
it's not enough? Python memory allocator uses an alignement on 8 bytes (2**3).

Py_uhash_t
_Py_hashtable_hash_ptr(const void *key)
{
return (Py_uhash_t)_Py_HashPointer((void *)key);
}

Py_hash_t
_Py_HashPointer(void *p)
{
Py_hash_t x;
size_t y = (size_t)p;
/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
   excessive hash collisions for dicts and sets */
y = (y  4) | (y  (8 * SIZEOF_VOID_P - 4));
x = (Py_hash_t)y;
if (x == -1)
x = -2;
return x;
}

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21556
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21556] try to use hashtable in pickle

2014-05-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 _Py_hashtable_hash_ptr is quite crude in that it doesn't even try to 
 compensate for pointer alignment, so there will automatically be many 
 collisions. Only one hash bucket every 8 or 16 will be used, at best.
 
 I chose to use _Py_HashPointer() to drop (shift) lower bits. Do you
 mean that it's not enough? Python memory allocator uses an alignement
 on 8 bytes (2**3).

Ah, sorry, I just hadn't realized that.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21556
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21556] try to use hashtable in pickle

2014-05-23 Thread STINNER Victor

STINNER Victor added the comment:

 And the straightforward collision resolution in hashtable.c is much less 
 efficient at mitigating collisions than a Python dict's.

Modules/hashtable.c comes from http://sourceforge.net/projects/libcfu/ (cfuhash 
type). I adapted the code for my needs (the tracemalloc module), but I didn't 
try to make it fast. My main change was to store data directly in an entry of a 
table instead of using a second memory block for the data (and a pointer in the 
hash table entry).

I didn't want to use the Python dict type for tracemalloc because this type may 
use the Python memory allocator which would lead to reentrant calls to 
tracemalloc. It's also nice to have a function to compute exactly the memory 
usage of an hash table (table + data), it's exposed in 
tracemalloc.get_tracemalloc_memory().

It doesn't mean that I want hashtable.c to be slow :-) Feel free to modify it 
as you want. But trying to make it as fast as Python dict would be hard. The 
design is different. Python dict uses open addressing, whereas _Py_hashtable 
uses linked list to store entries. Python dict is well optimized.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21556
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21556] try to use hashtable in pickle

2014-05-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 I didn't want to use the Python dict type for tracemalloc because this
 type may use the Python memory allocator which would lead to reentrant
 calls to tracemalloc.

Ah, so this means CF's patch will make the pickle memotable invisible to
tracemalloc?

 It doesn't mean that I want hashtable.c to be slow :-) Feel free to
 modify it as you want. But trying to make it as fast as Python dict
 would be hard. The design is different. Python dict uses open
 addressing, whereas _Py_hashtable uses linked list to store entries.
 Python dict is well optimized.

Yes, optimizing hashtable.c probably means converting it to the same
hashing scheme as dicts (or the current memotable in pickle.c).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21556
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21556] try to use hashtable in pickle

2014-05-23 Thread STINNER Victor

STINNER Victor added the comment:

Ah, so this means CF's patch will make the pickle memotable invisible to 
tracemalloc?

Currently, _Py_hashtabe uses PyMem_RawMalloc and PyMem_RawFree by default 
(realloc is not needed, we need to keep the previous buckets on rehash). Using 
_Py_hashtable_new_full, you can choose a different memory allocator.

Hum, wait. tracemalloc uses trace PyMem_RawMalloc and PyMem_RawFree. In fact, 
tracemalloc ignores reentrant calls to the memory allocator. So now I don't 
remember exactly why I chose a custom hash table implementation :-) Maybe 
because Python dict requires Python objects with reference counter, use the 
garbage collector, etc.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21556
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21556] try to use hashtable in pickle

2014-05-22 Thread Charles-François Natali

New submission from Charles-François Natali:

This patch is an attempt at making pickle use Modules/hashtable.{h,c} instead 
of its hash table ad-hoc implementation for its memoization table.
I'm saying attempt, because although it works correctly, some benchmarks are 
actually slower.
I didn't profile it, so I don't know if it's due to the hashtable 
implementation, function call overheads, etc.

If we manage to bring this on par with pickle's ad-hoc implementation, it would 
probably be interesting to replace it. If not, then we can just drop this patch 
:-)

Also, there might be other places in the code base which might benefit from 
this generic hashtable, maybe.

--
files: pickle_use_hashtable.diff
keywords: patch
messages: 218920
nosy: haypo, neologix, pitrou, serhiy.storchaka
priority: normal
severity: normal
status: open
title: try to use hashtable in pickle
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file35319/pickle_use_hashtable.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21556
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21556] try to use hashtable in pickle

2014-05-22 Thread Ned Deily

Changes by Ned Deily n...@acm.org:


--
nosy: +alexandre.vassalotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21556
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com