[issue16427] Faster hash implementation

2013-10-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Since issue19183 supersedes it, yes.

--
resolution:  -> duplicate
stage: patch review -> committed/rejected
status: open -> closed
superseder:  -> PEP 456 Secure and interchangeable hash algorithm

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-10-28 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Shouldn't this issue be obsoleted?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-10-27 Thread Christian Heimes

Changes by Christian Heimes :


--
assignee:  -> christian.heimes

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-10-27 Thread Christian Heimes

Christian Heimes added the comment:

Antoine, I have addressed your concern "Well, the quality of the hash function 
is clearly reduced" in patch 
http://hg.python.org/features/pep-456/rev/765930d944a5

>>> s = set()
>>> for i in range(256):
... s.add(hash("abcdfeg" + chr(i)) & 0xff)
... 
>>> len(s)
256
>>> s = set()
>>> for i in range(256):
... s.add(hash("abcdfeghabcdefg" + chr(i)) & 0xff)
... 
>>> len(s)
256

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-06-05 Thread STINNER Victor

STINNER Victor added the comment:

I'm not sure that micro-benchmarks are revelant for this issue. Hash collisions 
may have an higher cost than the gain of a faster hash function. Some people 
feel also concerned by the dict denial of service issue.

It would be interesting to compare performances using the benchmark suite:
http://hg.python.org/benchmarks/

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-06-05 Thread Lukas Lueg

Changes by Lukas Lueg :


Added file: http://bugs.python.org/file30475/cityhash_fasthast3.txt

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-06-05 Thread Lukas Lueg

Lukas Lueg added the comment:

Here are more benchmarks of vanilla 3.4 vs. cityhash vs. fast_hash_3 on both 
arm7l and x86-64. The patch was applied varbatim, only caching disabled. On 
arm7l, the cpu was fixed to maximum freq (it seems to take ages to switch 
frequencies, at least there is a lot of jitter with ondemand). The cityhash 
implementation was compiled with -O3 on both platforms and -msse4.2 on x86-64.

CityHash and fh3 come out much better than vanilla on x86-64 with cityhash 
being slightly faster (which is surprising). On ARM7 CityHash performs much 
worse than vanilla and fh3 significantly better.

--
Added file: http://bugs.python.org/file30474/cityhash_fashhash3_arm.txt

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-06-02 Thread Lukas Lueg

Lukas Lueg added the comment:

The 10**4-case is an error (see insane %), I've never been able to reproduce. 
Having done more tests with fixed cpu frequency and other daemons' process 
priority reduced, cityhash always comes out much slower on arm7l.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-06-02 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Here are some benchmarks for a arm7l on a rk30-board. CityHash was
> compiled with -mcpu=native -O3.

The results look unbelievable. If you take "Length 10 ** 4", it means
arm7l is able to hash 20 GB/s using the default unicode hash function.

(did you disable caching?)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-06-02 Thread Lukas Lueg

Lukas Lueg added the comment:

Here are some benchmarks for a arm7l on a rk30-board. CityHash was compiled 
with -mcpu=native -O3.

CityHash is around half as fast as the native algorithm for small strings and 
way, way slower on larger ones. My guess would be that the complex arithmetic 
in cityhash outweights the gains of better scheduling.

The results are somewhat inconclusive, as the performance increases again for 
very long strings.

--
Added file: http://bugs.python.org/file30447/cityhash_arm.txt

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-06-02 Thread Lukas Lueg

Lukas Lueg added the comment:

It's a cache sitting between an informix db and and an internal web service. 
Stuff comes out of db, processed, json'ifed, cached and put on the wire. 10**6s 
of strings pass this process per request if uncached...

I use CityHash64WithSeed, the seed being cpython's hash prefix (which I don't 
care about but found reassuring to put in anyway)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-06-02 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> I was investigating a callgrind dump of my code, showing how badly
> unicode_hash() was affecting my performance.

Can you tell us about your use case?
There are several CityHash variants, which one did you use? CityHash64?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-06-02 Thread Lukas Lueg

Lukas Lueg added the comment:

I was investigating a callgrind dump of my code, showing how badly 
unicode_hash() was affecting my performance. Using google's cityhash  instead 
of the builtin algorithm to hash unicode objects improves overall performance 
by about 15 to 20 percent for my case - that is quite a thing.
Valgrind shows that the number of instructions spent by unicode_hash() drops 
from ~20% to ~11%. Amdahl crunches the two-fold performance increase to the 
mentioned 15 percent.

Cityhash was chosen because of it's MIT license and advertisement for 
performance on short strings.

I've now found this bug and attached a log for haypo's benchmark which compares 
native vs. cityhash. Caching was disabled during the test. Cityhash was 
compiled using -O3 -msse4.2 (cityhash uses cpu-native crc instructions). 
CPython's unittests fail due to known_hash and gdb output; besides that, 
everything else seems to work fine.

Cityhash is advertised for it's performance with short strings, which does not 
seem to show in the benchmark. However, longer strings perform *much* better.

If people are insterested, i can repeat the test on a armv7l

--
nosy: +ebfe
Added file: http://bugs.python.org/file30446/cityhash.txt

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-23 Thread Martin Morrison

Changes by Martin Morrison :


--
nosy: +isoschiz

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-10 Thread Mark Dickinson

Mark Dickinson added the comment:

> Note that the patch uses type punning through a union: while GCC allows > 
> this, it's not allowed by ANSI.

I believe it's legal under C99 + TC3.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-09 Thread Charles-François Natali

Charles-François Natali added the comment:

> Is p guaranteed to be size_t aligned?
> If not, unaligned access can segfault (e.g. on Sparc IIRC).

Apparently yes.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-09 Thread Charles-François Natali

Charles-François Natali added the comment:

> I'd expect just casting the pointer type before dereferencing:
>
> unsigned char *p;
> ...
> hash = (multiplier * hash) ^ *((Py_uhash_t *)p);
>
> (don't use size_t, use Py_uhash_t)

Is p guaranteed to be size_t aligned?
If not, unaligned access can segfault (e.g. on Sparc IIRC).

> Also it may make DoS attacks easier.

Indeed.
And the increase in collision you demonstrated in your previous
message worries me (both security and performance wise).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> pybench and perf.py can be used to measure performances of the patch.
> The speedup may not be detected, but a slowdown would be detected at
> least.

The slowdown would only occur for specific, well-chosen patterns. Also it may 
make DoS attacks easier.
(remember the reason we randomized hashes is so that it's hard for attackers to 
find collisions)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-09 Thread Gregory P. Smith

Gregory P. Smith added the comment:

> > Note that the patch uses type punning through a union
>
> What is the standard and portable way to cast an array of bytes to size_t?

I'd expect just casting the pointer type before dereferencing:

unsigned char *p;
...
hash = (multiplier * hash) ^ *((Py_uhash_t *)p);

(don't use size_t, use Py_uhash_t)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-09 Thread STINNER Victor

STINNER Victor added the comment:

> Note that the patch uses type punning through a union

What is the standard and portable way to cast an array of bytes to size_t?

2013/4/9 Charles-François Natali :
>
> Charles-François Natali added the comment:
>
> Note that the patch uses type punning through a union: while GCC allows this, 
> it's not allowed by ANSI (although since we're using a char [], it's somewhat 
> a grey area). An aggresive compiler could optimiza the read/write away.
>
> --
> nosy: +neologix
>
> ___
> Python tracker 
> 
> ___

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-09 Thread Charles-François Natali

Charles-François Natali added the comment:

Note that the patch uses type punning through a union: while GCC allows this, 
it's not allowed by ANSI (although since we're using a char [], it's somewhat a 
grey area). An aggresive compiler could optimiza the read/write away.

--
nosy: +neologix

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-09 Thread STINNER Victor

STINNER Victor added the comment:

> (dicts and sets have a sophisticated probing algorithm which takes into 
> account the whole hash value, not the masked one).

Correct, so your specific example should not be a problem since the
whole hash value is different for the 6 hash values.

> Now to know if that may produce slowdowns in some situations...

pybench and perf.py can be used to measure performances of the patch.
The speedup may not be detected, but a slowdown would be detected at
least.

2013/4/9 Antoine Pitrou :
>
> Antoine Pitrou added the comment:
>
> Well, the quality of the hash function is clearly reduced:
>
 hash("abcdefgh") & 0xff
> 206
 hash("abcdefgi") & 0xff
> 206
 hash("abcdefgj") & 0xff
> 206
 hash("abxx") & 0xff
> 206
 hash("aa11") & 0xff
> 206
 hash("aa12") & 0xff
> 206
>
>
> Now to know if that may produce slowdowns in some situations... (dicts and 
> sets have a sophisticated probing algorithm which takes into account the 
> whole hash value, not the masked one).
>
> --
>
> ___
> Python tracker 
> 
> ___

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Well, the quality of the hash function is clearly reduced:

>>> hash("abcdefgh") & 0xff
206
>>> hash("abcdefgi") & 0xff
206
>>> hash("abcdefgj") & 0xff
206
>>> hash("abxx") & 0xff
206
>>> hash("aa11") & 0xff
206
>>> hash("aa12") & 0xff
206


Now to know if that may produce slowdowns in some situations... (dicts and sets 
have a sophisticated probing algorithm which takes into account the whole hash 
value, not the masked one).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-08 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-08 Thread STINNER Victor

STINNER Victor added the comment:

Does anyone know if fast_hash_3.patch may reduce the quality of the hash 
function? (May the patched hash function produce more collisions? The 
"Avalanche effect" thing.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-08 Thread STINNER Victor

Changes by STINNER Victor :


Added file: http://bugs.python.org/file29743/bench_hash.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-08 Thread STINNER Victor

STINNER Victor added the comment:

fast_hash_3.patch is a litte bit (6%) slower for Unicode string shorter than 10 
characters, but much faster for string equal or longer than 100 characters (up 
to 10x faster).

I used the str type and disabled its cache ("_PyUnicode_HASH(self) = x;" in 
unicode_hash()) to run my benchmark.


Summary    |   original |    patched
---++---
Length 1   | 231 ns (*) |   244 ns (+6%)
Length 3   | 238 ns (*) |   253 ns (+6%)
Length 10  | 254 ns (*) | 251 ns
Length 20  | 280 ns (*) |   256 ns (-8%)
Length 100 | 528 ns (*) |  321 ns (-39%)
Length 10 ** 4 |  32 us (*) | 9.49 us (-70%)
Length 10 ** 8 | 329 ms (*) |  104 ms (-68%)
---++---
Total  | 329 ms (*) |  104 ms (-68%)

--
Added file: http://bugs.python.org/file29742/bench_hash.txt

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-08 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


Removed file: http://bugs.python.org/file27947/fast_hash_2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-08 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


Removed file: http://bugs.python.org/file27950/fast_str_hash.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2013-04-07 Thread STINNER Victor

Changes by STINNER Victor :


--
nosy: +haypo

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-15 Thread Andrew Svetlov

Changes by Andrew Svetlov :


--
nosy: +asvetlov

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-11 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 797de1864fd9 by Antoine Pitrou in branch '3.3':
Add a test for hashing of unaligned memory buffers (from issue #16427).
http://hg.python.org/cpython/rev/797de1864fd9

New changeset 9cb1366b251b by Antoine Pitrou in branch 'default':
Add a test for hashing of unaligned memory buffers (from issue #16427).
http://hg.python.org/cpython/rev/9cb1366b251b

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Stefan, thank you for the suggestion.  The test showed that, in fact, at least 
under some x86 there is no performance decrease when using memcpy on nonaligned 
data.  This is good news.  The code can left simple and even some doubtful 
potential undefined behavior was removed.

Additional microbenchmarks:

$ ./python -m timeit -n 1 -s "t = memview(b'a' * 10**8)"  "hash(t)"
$ ./python -m timeit -n 1 -s "t = memview(b'a' * 10**8)[1:]"  "hash(t)"
$ ./python -m timeit -n 1 -s "t = memview(b'a' * 10**8)[8:]"  "hash(t)"

   original  patchedspeedup

bytes  181 msec  46 msec3.9x
UCS1   429 msec  46.2 msec  9.3x
UCS2   179 msec  91.9 msec  1.9x
UCS4   183 msec  184 msec   1x
memview()  362 msec  91.7 msec  3.9x
memview()[1:]  362 msec  93.2 msec  3.9x
memview()[8:]  362 msec  92.4 msec  3.9x

I don't know how it will be on other platforms.

--
Added file: http://bugs.python.org/file27956/fast_hash_3.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-11 Thread Stefan Krah

Stefan Krah added the comment:

> The code can be identical, but the time will differ significantly for
> aligned and non-aligned data.

Of course, but in most cases the data *is* aligned, so only code that does
something quite special pays the performance penalty.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Unaligned accesses are not a problem on x86(-64), but they will segfault
(bus error, IIRC) on other architectures such as SPARC, unfortunately.

On x86(-64) this kills performance and makes the optimization be senseless.

> FWIW, on x86/x64 gcc often generates identical code for x = y and
memcpy(x, y, 8).

The code can be identical, but the time will differ significantly for aligned 
and non-aligned data.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-11 Thread Stefan Krah

Stefan Krah added the comment:

FWIW, on x86/x64 gcc often generates identical code for x = y and
memcpy(x, y, 8). See e.g. the PACK_SINGLE and UNPACK_SINGLE macros in
Objects/memoryobject.c.

I didn't look at the patch yet, just an idea.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> If a fast hash for bytes/memoryview is desirable, I can write a fast
> robust implementation for nonaligned data.  But this will be more
> cumbersome and a bit slower.

Unaligned accesses are not a problem on x86(-64), but they will segfault
(bus error, IIRC) on other architectures such as SPARC, unfortunately.

(blame RISC for being really too "simplified")

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Oh, I see, it's already here.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Here is a test case for the hash/alignment issue.

I think here should be a test for a shifted data.  Something like 
hash(b'abcd...') == hash(memoryview(b'xabcd...')[1:]).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> The patch is too optimistic, it gives different results depending on the 
> alignment of the memory buffer:

So this method is not applicable for a byte.  Here is a patch only for strings.

If a fast hash for bytes/memoryview is desirable, I can write a fast robust 
implementation for nonaligned data.  But this will be more cumbersome and a bit 
slower.

--
Added file: http://bugs.python.org/file27950/fast_str_hash.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-10 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is a test case for the hash/alignment issue.

--
Added file: http://bugs.python.org/file27948/test_hash_align.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-10 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy: +mark.dickinson, skrah, tim_one

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-10 Thread Antoine Pitrou

Antoine Pitrou added the comment:

The patch is too optimistic, it gives different results depending on the 
alignment of the memory buffer:

>>> b = b"abcd"*100
>>> hash(b[1:])
7264687738704559842
>>> hash(memoryview(b)[1:])
9054791208347464792
>>> memoryview(b)[1:] == b[1:]
True

--
nosy: +pitrou
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-10 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


Removed file: http://bugs.python.org/file27920/fast_hash.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Any reason you're using an unsigned int in your loop instead of a Py_uhash_t?

In fact, there is no serious reason. This should be the type aligned as minimal 
alignment of void*, size_t and Py_hash_t. Since de facto Py_uhash_t is size_t, 
then we can use size_t.

--
Added file: http://bugs.python.org/file27947/fast_hash_2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-10 Thread Gregory P. Smith

Gregory P. Smith added the comment:

yes please!

Any reason you're using an unsigned int in your loop instead of a Py_uhash_t?

--
nosy: +gregory.p.smith

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-09 Thread Jesús Cea Avión

Changes by Jesús Cea Avión :


--
nosy: +jcea

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-07 Thread Christian Heimes

Changes by Christian Heimes :


--
nosy: +christian.heimes

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16427] Faster hash implementation

2012-11-07 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

In the discussion of issue14621 it was noted that much more complex hash 
algorithms can overtake the current one due to the fact that they process more 
data at a time.  Here is a patch that implements this idea for the current 
algorithm.  Also code duplication removed.

Microbenchmarks:

$ ./python -m timeit -n 1 -s "t = b'a' * 10**8"  "hash(t)"
$ ./python -m timeit -n 1 -s "t = 'a' * 10**8"  "hash(t)"
$ ./python -m timeit -n 1 -s "t = '\u0100' * 10**8"  "hash(t)"
$ ./python -m timeit -n 1 -s "t = '\U0001' * 10**8"  "hash(t)"

Results on 32-bit Linux on AMD Athlon 64 X2 4600+:

   original  patchedspeedup

bytes  181 msec  45.7 msec  4x
UCS1   429 msec  45.7 msec  9.4x
UCS2   179 msec  92 msec1.9x
UCS4   183 msec  183 msec   1x

If the idea is acceptable, I will create benchmarks for short strings.

--
components: Interpreter Core
files: fast_hash.patch
keywords: patch
messages: 175093
nosy: serhiy.storchaka
priority: normal
severity: normal
status: open
title: Faster hash implementation
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file27920/fast_hash.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com