[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-21 Thread Roundup Robot

Roundup Robot added the comment:

New changeset eec4758e3a45 by Victor Stinner in branch 'default':
Issue #19183: Simplify test_gdb
http://hg.python.org/cpython/rev/eec4758e3a45

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Christian Heimes

Christian Heimes added the comment:

The problems have been resolved.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 961d832d8734 by Christian Heimes in branch 'default':
Issue #19183: too many tests depend on the sort order of repr().
http://hg.python.org/cpython/rev/961d832d8734

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread STINNER Victor

STINNER Victor added the comment:

Not only test_gdb relies on repr() exact value, there is also test_functools:

http://buildbot.python.org/all/builders/AMD64%20OpenIndiana%203.x/builds/6875/steps/test/logs/stdio

==
FAIL: test_repr (test.test_functools.TestPartialC)
--
Traceback (most recent call last):
  File 
"/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_functools.py",
 line 174, in test_repr
repr(f))
AssertionError: 'func[51 chars]88>, b=, 
[36 chars]00>)' != 'func[51 chars]88>, a=, 
[36 chars]40>)'
- functools.partial(, b=, a=)
+ functools.partial(, a=, b=)


==
FAIL: test_repr (test.test_functools.TestPartialCSubclass)
--
Traceback (most recent call last):
  File 
"/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_functools.py",
 line 174, in test_repr
repr(f))
AssertionError: 'Part[49 chars]88>, b=, 
[36 chars]00>)' != 'Part[49 chars]88>, a=, 
[36 chars]40>)'
- PartialSubclass(, b=, a=)
+ PartialSubclass(, a=, b=)

--
resolution: fixed -> 
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 11cb1c8faf11 by Victor Stinner in branch 'default':
Issue #19183: Fix repr() tests of test_gdb, hash() is now platform dependent
http://hg.python.org/cpython/rev/11cb1c8faf11

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Christian Heimes

Changes by Christian Heimes :


--
assignee: ncoghlan -> christian.heimes
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 422ed27b62ce by Christian Heimes in branch 'default':
Issue #19183: test_gdb's test_dict was failing on some machines as the order or 
dict keys has changed again.
http://hg.python.org/cpython/rev/422ed27b62ce

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Roundup Robot

Roundup Robot added the comment:

New changeset adb471b9cba1 by Christian Heimes in branch 'default':
ssue #19183: Implement PEP 456 'secure and interchangeable hash algorithm'.
http://hg.python.org/cpython/rev/adb471b9cba1

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-19 Thread Christian Heimes

Christian Heimes added the comment:

The PEP should be ready now. I have addressed your input in 
http://hg.python.org/peps/rev/fbe779221a7a

--
assignee: christian.heimes -> ncoghlan

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Christian Heimes

Christian Heimes added the comment:

The numbers are between cpython default tip and my feature branch. I have 
pulled and merged all upstream changes into my feature branch yesterday. The 
results with "sso" in the file name are with small string optimization.

Performance greatly depends on compiler and CPU. In general SipHash24 is faster 
on modern compilers and modern CPUs. MSVC generates slower code for SipHash24 
but I haven't optimized the code for MSVC yet. Perhaps Serhiy is able to do his 
magic. :)

PS: I have uploaded more benchmarks. The directories contain the raw output and 
CSV files.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Benchmark report (without the small strings optimization):

http://bpaste.net/show/UohtA8dmSREbrtsJYfTI/

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

So, amusingly, Christian's patch seems to be 4-5% faster than vanilla on many 
benchmarks here (Sandy Bridge Core i5, 64-bit, gcc 4.8.1). A couple of 
benchmarks are a couple % slower, but nothing severe.

This without the small strings optimization.

On top of that, the small strings optimization seems to have varying effects, 
some positive, some negative, so it doesn't seem to be desirable (on this 
platform anyway).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

For the record, it's better to use a geometric mean when agregating benchmark 
results into a single score.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Nick Coghlan

Nick Coghlan added the comment:

Thanks - are those numbers with the current feature branch, and hence no small 
string optimization?

To be completely clear, I'm happy to accept a performance penalty to fix the 
hash algorithm. I'd just like to know exactly how big a penalty I'm accepting, 
and whether taking advantage of the small string optimization makes it 
measurably smaller.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-15 Thread Christian Heimes

Christian Heimes added the comment:

Here are benchmarks on two Linux machine. It looks like SipHash24 takes 
advantage of newer CPUs. I'm a bit puzzled about the results. Or maybe my super 
simple and naive analyzer doesn't give sensible results...

https://bitbucket.org/tiran/pep-456-benchmarks/

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-15 Thread Nick Coghlan

Changes by Nick Coghlan :


--
assignee: ncoghlan -> christian.heimes

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-15 Thread Nick Coghlan

Nick Coghlan added the comment:

I reviewed the latest PEP text at http://www.python.org/dev/peps/pep-0456/

I'm almost prepared to accept the current version of the implementation, but 
there's one technical decision to be clarified and a few placeholders in the 
PEP that need to be cleaned up prior to formal acceptance:

* The rationale for turning off the small string optimisation by default rather 
than setting the cutoff to 7 bytes isn't at all clear to me. A consistent 3-5% 
speed difference on the benchmark suite isn't trivial, and if we have the small 
string optimization off by default, why aren't we just deleting that code 
instead?

* A link to the benchmark suite at http://hg.python.org/benchmarks should be 
included at the appropriate places in the PEP

* The "Further things to consider" section needs to be moved to a paragraph 
under "Discussion" describing the current implementation (i.e. the hash 
equivalence is tolerated for simplicity and consistency)

* The "TBD" in the performance section needs to go. Reference should be made to 
the numbers in the small string optimisation section.

* The performance numbers need to be clear on what version of the feature 
branch was used to obtain them (preferably the one you plan to commit!).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-13 Thread Christian Heimes

Christian Heimes added the comment:

Hi Nick,

I have updated the patch and the PEP text. The new version has small string 
hash optimization disabled. The other changes are mostly cleanup, reformatting 
and simplifications.

Can you please do a review so I can get the patch into 3.4 before beta1 is 
released?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-13 Thread Christian Heimes

Changes by Christian Heimes :


Added file: http://bugs.python.org/file32606/ac521cef665a.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-31 Thread Christian Heimes

Christian Heimes added the comment:

I had to add the conversion from LE to host endianess. The missing conversion 
was affecting and degrading hash value dispersion.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-31 Thread Christian Heimes

Changes by Christian Heimes :


Added file: http://bugs.python.org/file32440/fb2f9c0bbca9.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Christian Heimes

Changes by Christian Heimes :


Added file: http://bugs.python.org/file32417/4756e9ed0328.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

About memcpy(). Here is sample file. Compile it to assembler:

 gcc -O2 -S -masm=intel fnv.c

With memcpy() main loop is compiled to:

.L3:
mov esi, DWORD PTR [ebx]
imuleax, eax, 103
add ebx, 4
xor eax, esi
sub ecx, 1
mov DWORD PTR [esp+24], esi
jne .L3

With per-byte copy it is compiled to:

.L3:
mov dl, BYTE PTR [ecx]
imuleax, eax, 103
sub ebp, 1
movzx   ebx, BYTE PTR [ecx+1]
movzx   edi, BYTE PTR [ecx+2]
movzx   esi, BYTE PTR [ecx+3]
add ecx, 4
mov dh, bl
sal edi, 16
movzx   edx, dx
sal esi, 24
or  edx, edi
or  edx, esi
xor eax, edx
cmp ebp, -1
jne .L3

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
Removed message: http://bugs.python.org/msg201675

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

About memcpy(). Here is sample file. Compile it to assembler:

 gcc -O2 -S -masm=intel fnv.c

With memcpy() main loop is compiled to:

.L8:
movzx   ecx, BYTE PTR [ebx+edx]
imuleax, eax, 103
add edx, 1
xor eax, ecx
cmp edx, edi
jne .L8

With per-byte copy it is compiled to:

.L3:
mov dl, BYTE PTR [ecx]
imuleax, eax, 103
sub ebp, 1
movzx   ebx, BYTE PTR [ecx+1]
movzx   edi, BYTE PTR [ecx+2]
movzx   esi, BYTE PTR [ecx+3]
add ecx, 4
mov dh, bl
sal edi, 16
movzx   edx, dx
sal esi, 24
or  edx, edi
or  edx, esi
xor eax, edx
cmp ebp, -1
jne .L3

--
Added file: http://bugs.python.org/file32414/fnv.c

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Christian Heimes

Christian Heimes added the comment:

Victor:
I have added the licence to Doc/licence.rst and created a new ticket for 
PY_UINT64_T on Windows #19433.

Nick:
The memory layout of the hash secret is now documented. I have renamed the 
members to reflect their purpose, too. 
http://hg.python.org/features/pep-456/file/f0a7e606c2d0/Include/pyhash.h#l32

I'll update my PEP shortly and address the memory layout of _Py_HashSecret_t, 
the small string hashing optimization and performance/memory alignment.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread STINNER Victor

STINNER Victor added the comment:

To support Windows 32 bit, the following code in PC/pyconfig.h can be modified 
to use __int64 or _W64: see ssize_t definition below in the same file.

#ifndef PY_UINT64_T
#if SIZEOF_LONG_LONG == 8
#define HAVE_UINT64_T 1
#define PY_UINT64_T unsigned PY_LONG_LONG
#endif
#endif

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread STINNER Victor

STINNER Victor added the comment:

+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.

You should copy the license into Doc/license.rst.

--
nosy: +haypo

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Nick Coghlan

Nick Coghlan added the comment:

Christian's general approach looks fine to me - consolidating the "kind" hashes 
(i.e. byte sequences, numbers and pointers) into one place independent of any 
particular type implementation makes sense to me, and the clear abstraction of 
"What is a hash function?" from Python's point of view is a good thing for 
embedding purposes.

If you get the PEP updated accordingly, we should be able to get that formally 
accepted in fairly short order. (I had some other suggestions in the review, 
but they aren't relevant to accepting the PEP).

Regarding the concerns about a potential performance impact for unaligned 
memory access, I think that can be better assessed *after* the simpler patch is 
merged and it's easier for people to get hold of the new hash implementation. 
However, the PEP should mention the concern, and note it as something we will 
be keeping a close eye on during the beta period.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis :


--
nosy: +Arfrever

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is simplified version of the patch.

19427e9cc500.diff:
 22 files changed, 746 insertions(+), 211 deletions(-), 28 modifications(!)
19427e9cc500-simplified.diff:
 21 files changed, 486 insertions(+), 67 deletions(-), 27 modifications(!)

--
Added file: http://bugs.python.org/file32402/19427e9cc500-simplified.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Antoine Pitrou

Changes by Antoine Pitrou :


Added file: http://bugs.python.org/file32398/b8d39bf9ca4a.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I think Christian is right here. Hashing unaligned memory areas will happen 
quite rarely. It should work, but it doesn't have to be as fast as the common 
case.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes

Christian Heimes added the comment:

The code in your example uses volatile. That prevents lots of compiler 
optimizations. In my experience compilers and CPU do a better optimization job 
than humans until the human factor interferes with the compiler. Even 40% might 
not be slower than calling memcpy() for every block or processing the input 
byte by byte instead of uint64 by uint64...

I can't comment on ARM and Barry's ARM box is dead at the moment. Distributors 
or users can select different and more ARM-friendly code, too. After all the 
hash code is easily interchangeable. :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes

Christian Heimes added the comment:

Am 28.10.2013 18:15, schrieb Serhiy Storchaka:
> _PyHash_Fini() should be moved out too._Py_HashBytes() is only function which 
> should be customized.

You still haven' convinced me to scatter hash-related functions over
multiple C files. And it won't work with a static definition of the hash
function and PyHash_FuncDef. They MUST be in one object file.

> Because it is redundant and only complicates the code, both use and 
> declaration.

No, I disagree with you. It makes the code less complicated because it
encapsulates all related data in one place. Please provide a patch that
shows the contrary.

> There are other hash related functions (hashing integers, tuples). Only 
> _Py_HashBytes() should be customized and only it worth moving to separated 
> file.

Python doesn't have a hash function for integers anymore. It has a
specialized function for PyLongObject. pyhash.c contains common helper
functions.

> The benefit is that the code will be simpler if get rid from 
> HAVE_ALIGNED_REQUIRED and related code in ./configure. Only on such archaic 
> architectures hash code will be slower.

And it will clutter other code... Please provide a patch and benchmarks
for your proposal. I'll incorporate your patch if it have zero impact on
speed and doesn't make the code harder to understand.

> Let first land simplified patch and then you could add features such as 
> PY_HASH_EXTERNAL and PyHash_FuncDef.

No, I'm not going to remove code in order to re-add it later. If you
don't like my PEP then please provide patches.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Charles-François Natali

Charles-François Natali added the comment:

>> Well, unaligned memory access is usually slower on all architectures :-)
>> Also, I think some ARM architectures don't support unaligned access, so
>> it's not really a thing of the past...
>
> On modern computers it's either not slower or just a tiny bit slower.
> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

I have other benchmarks that show slowdowns of more than 40%:
http://www.alexonlinux.com/aligned-vs-unaligned-memory-access

Also, x86 has optimized unaligned memory accesses, but the world isn't
x86-only (once again, there's ARM, and AFAICT the performance hit can
be quite high).

Now, I perfectly understand that you don't want to mess with the
implementation, but just don't say that "unaligned access doesn't
matter, and is just a tiny bit slower".

IMO the compile-time check is enough.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Because you can't simple replace the files.
  
Why not? This looks as simplest option when you build hard customized CPython.
 
>  It also contains _Py_HashBytes() and  _PyHash_Fini().
  
_PyHash_Fini() should be moved out too._Py_HashBytes() is only function which 
should be customized.
 
> I don't understand why you want me to get rid of the struct. What's your
 > argument against the struct? I like the PyHash_FuncDef because it groups
 > all information (func ptr, name, hash metadata) in a single structure.
  
Because it is redundant and only complicates the code, both use and declaration.
 
> > Why _Py_HashDouble() and _Py_HashPointer() are moved to pyhash.c? They are
 > > hash algorithm agnostic, and it is unlikely they will be redefined in
 > > custom build.
 > I have moved the functions to pyhash.c in order to keep all related
 > internal function in one file. They do not belong in Objects/object.c.
  
There are other hash related functions (hashing integers, tuples). Only 
_Py_HashBytes() should be customized and only it worth moving to separated file.
 
> > You not need the HAVE_ALIGNED_REQUIRED macros if use PY_UHASH_CPY (or
 > > something like for exact 64 bit) in siphash24. On platforms where aligned
 > > access is required you will use per-bytes copy, otherwise you will use
 > > fast 64-bit copy.
 > I'm not going to make siphash24 compatible with platforms that require
 > aligned memory for integers. It's an unnecessary complication and
 > slow-down for all common platforms. The feature will simply not be
 > available on archaic architectures.
  
The benefit is that the code will be simpler if get rid from 
HAVE_ALIGNED_REQUIRED and related code in ./configure. Only on such archaic 
architectures hash code will be slower.
 
> Serhiy, I would like to land my patch before beta 1 hits the fan. We can
 > always improve the code during beta. Right now I don't want to mess around
 > with SipHash24 code. That includes non-64bit platforms as well as
 > architectures that enforce aligned memory for integers.
  
Let first land simplified patch and then you could add features such as 
PY_HASH_EXTERNAL and PyHash_FuncDef.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes

Christian Heimes added the comment:

I have added an optimization for hashing of small strings. It uses an inline 
version of DJBX33A for small strings [1, 7) on 64bit and [1, 5) on 32bit.

Nick, please use "create patch" before you do your review.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes

Christian Heimes added the comment:

Am 28.10.2013 16:59, schrieb Charles-François Natali:
> Well, unaligned memory access is usually slower on all architectures :-)
> Also, I think some ARM architectures don't support unaligned access, so
> it's not really a thing of the past...

On modern computers it's either not slower or just a tiny bit slower.
http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

Python's str and bytes datatype are always aligned properly. The
majority of bytearray and memoryview instances are aligned, too.
Unaligned memory access is rare case for most applications. About 50% of
strings have less than 8 bytes (!), 90% have less than 16 bytes. For the
Python's test suite the numbers are even smaller: ~45% <=5 bytes, ~90%
<=12 bytes.

You might see a 10% slowdown for very long and unaligned byte arrays on
some older CPUs. I think we can safely ignore the special case. Any
special case for unaligned memory will introduce additional overhead
that *will* slow down the common path.

Oh, and ARM has unaligned memory access:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/CDFEJCBH.html

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Charles-François Natali

Charles-François Natali added the comment:

> Seriously, nobody gives a ... about SPARC and MIPS. :) It's nice that
> Python still works on these CPU architectures. But I neither want to
> deviate from the SipHash24 implementation nor make the code slower on
> all relevant platforms such as X86 and X86_64.

Well, unaligned memory access is usually slower on all architectures :-)
Also, I think some ARM architectures don't support unaligned access, so
it's not really a thing of the past...

--
nosy: +neologix

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes

Christian Heimes added the comment:

Serhiy, I would like to land my patch before beta 1 hits the fan. We can always 
improve the code during beta. Right now I don't want to mess around with 
SipHash24 code. That includes non-64bit platforms as well as architectures that 
enforce aligned memory for integers.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes

Christian Heimes added the comment:

Am 28.10.2013 16:18, schrieb Serhiy Storchaka:
> Christian, why PY_HASH_EXTERNAL is here? Do you plan use it any official 
> build? I think that in custom build of Python whole files pyhash.c and 
> pyhash.h can be replaced.

Because you can't simple replace the files. It also contains
_Py_HashBytes() and  _PyHash_Fini(). With PY_HASH_EXTERNAL embedders can
simply define PY_HASH_ALGORITHM PY_HASH_EXTERNAL and provide the extern
struct inside a separate object file.

> When you will get rid from PY_HASH_EXTERNAL, then you could get rid from 
> PyHash_FuncDef, PyHash_Func, etc.

I don't understand why you want me to get rid of the struct. What's your
argument against the struct? I like the PyHash_FuncDef because it groups
all information (func ptr, name, hash metadata) in a single structure.

> Why _Py_HashDouble() and _Py_HashPointer() are moved to pyhash.c? They are 
> hash algorithm agnostic, and it is unlikely they will be redefined in custom 
> build.

I have moved the functions to pyhash.c in order to keep all related
internal function in one file. They do not belong in Objects/object.c.

> You not need the HAVE_ALIGNED_REQUIRED macros if use PY_UHASH_CPY (or 
> something like for exact 64 bit) in siphash24. On platforms where aligned 
> access is required you will use per-bytes copy, otherwise you will use fast 
> 64-bit copy.

I'm not going to make siphash24 compatible with platforms that require
aligned memory for integers. It's an unnecessary complication and
slow-down for all common platforms. The feature will simply not be
available on archaic architectures.

Seriously, nobody gives a ... about SPARC and MIPS. :) It's nice that
Python still works on these CPU architectures. But I neither want to
deviate from the SipHash24 implementation nor make the code slower on
all relevant platforms such as X86 and X86_64.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Christian, why PY_HASH_EXTERNAL is here? Do you plan use it any official build? 
I think that in custom build of Python whole files pyhash.c and pyhash.h can be 
replaced.

When you will get rid from PY_HASH_EXTERNAL, then you could get rid from 
PyHash_FuncDef, PyHash_Func, etc.

Why _Py_HashDouble() and _Py_HashPointer() are moved to pyhash.c? They are hash 
algorithm agnostic, and it is unlikely they will be redefined in custom build.

You not need the HAVE_ALIGNED_REQUIRED macros if use PY_UHASH_CPY (or something 
like for exact 64 bit) in siphash24. On platforms where aligned access is 
required you will use per-bytes copy, otherwise you will use fast 64-bit copy.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Nick Coghlan

Nick Coghlan added the comment:

On 27 Oct 2013 23:46, "Christian Heimes"  wrote:
>
>
> Christian Heimes added the comment:
>
> Nick, please review the latest patch. I have addressed Antoine's review
in 257597d20fa8.diff. I'll update the PEP as soon as you are happy with the
patch.

Comments from the others sound promising, I should be able to take a look
myself tomorrow night.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes

Changes by Christian Heimes :


Added file: http://bugs.python.org/file32392/19427e9cc500.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes

Christian Heimes added the comment:

PyObject_Hash() and PyObject_HashNotImplemented() should not have been moved to 
pyhash.h. But the other internal helper methods should be kept together.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I suggest move to Include/pyhash.h and Python/pyhash.c only _Py_HashBytes() and 
string hash algorithm related constants, and don't touch PyObject_Hash(), 
_Py_HashDouble(), etc. So if anybody want change string hashing algorithm, it 
need only replace these two minimal files.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes

Christian Heimes added the comment:

Nick, please review the latest patch. I have addressed Antoine's review in 
257597d20fa8.diff. I'll update the PEP as soon as you are happy with the patch.

--
assignee:  -> ncoghlan

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes

Changes by Christian Heimes :


Added file: http://bugs.python.org/file32388/257597d20fa8.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> I can no longer find the configuration for custom path. It's still documented 
> but there is no field for "repo path".

http://buildbot.python.org/all/builders/PPC64%20PowerLinux%20custom

(usually, just replace "3.x" with "custom" in the URL)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes

Christian Heimes added the comment:

I can no longer find the configuration for custom path. It's still documented 
but there is no field for "repo path".

http://buildbot.python.org/all/buildslaves/edelsohn-powerlinux-ppc64

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> I'm still looking for a 64bit big endian box

Have you tried the PPC64 PowerLinux box? It's in the stable buildbots for a 
reason :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-26 Thread Christian Heimes

Christian Heimes added the comment:

Nick, can you do another review? All tests should pass on common boxes.

The latest code hides the struct with the hash function. I have added a 
configure block that detects platforms that don't support unaligned memory 
access. It works correctly on the SPARC Solaris 10 box.

I'm still looking for a 64bit big endian box and a 32bit big endian box that 
support unaligned memory.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-26 Thread Christian Heimes

Changes by Christian Heimes :


Added file: http://bugs.python.org/file32384/31ce9488be1c.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-26 Thread Christian Heimes

Changes by Christian Heimes :


Removed file: http://bugs.python.org/file32365/38b3ad4287ef.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-25 Thread Christian Heimes

Changes by Christian Heimes :


Added file: http://bugs.python.org/file32365/38b3ad4287ef.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-24 Thread Christian Heimes

Christian Heimes added the comment:

I have created a clone for PEP 456 and applied your suggestions. I'm still 
looking for a nice API to handle the hash definition. Do you have some 
suggestions?

--
hgrepos: +212

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-14 Thread Nick Coghlan

Nick Coghlan added the comment:

Added some structural comments to the patch. I'll defer to Serhiy when it comes 
to assessing the algorithm details :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Christian Heimes

Christian Heimes added the comment:

Sure it does. The test for unaligned hashing passes without an error or a 
segfault.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> The test for unaligned hashing passes without an error or a segfault.

On some platforms it can work without a segfault.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

And note that the quality of the FNV hash function is reduced (msg186403). We 
need "shuffle" result's bits.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Since hash algorithm determined at compile time, the _Py_HashSecret_t structure 
and the _Py_HashSecret function are redundant. We need define only the 
_Py_HashBytes function.

Currently SipHash algorithm doesn't work with unaligned data.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I propose extract all hash related stuff from Include/object.h in separated 
file Include/pyhash.h. And perhaps move Objects/hash.c to Python/pyhash.c.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c960bed22bf6 by Christian Heimes in branch 'default':
Make Nick BDFG delegate
http://hg.python.org/peps/rev/c960bed22bf6

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Christian Heimes

Christian Heimes added the comment:

unmodified Python:

1000 loops, best of 3: 307 usec per loop (unicode)
1000 loops, best of 3: 930 usec per loop (memoryview)

SipHash:

1000 loops, best of 3: 300 usec per loop (unicode)
1000 loops, best of 3: 906 usec per loop (memoryview)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Your benchmark is a bit unrealistic because it times the hash cache
> most of the time. Here is a better benchmark (but bytes-only):
> 
> $ ./python -m timeit -s "words=[w.encode('utf-8') for line in
> open('../LICENSE') for w in line.split()]; import collections" -- "c
> = collections.Counter(memoryview(w) for w in words);
> c.most_common(10)"
> 1000 loops, best of 3: 1.63 msec per loop

Good point. Can you also post all benchmark results?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Christian Heimes

Christian Heimes added the comment:

Your benchmark is a bit unrealistic because it times the hash cache most of the 
time. Here is a better benchmark (but bytes-only):

$ ./python -m timeit -s "words=[w.encode('utf-8') for line in 
open('../LICENSE') for w in line.split()]; import collections" -- "c = 
collections.Counter(memoryview(w) for w in words); c.most_common(10)"
1000 loops, best of 3: 1.63 msec per loop

This increases the number of hash calculations from about 28k to over 8.4 mio. 

I also added a little statistic function to see how large typical string are. 
The artificial benchmark: 

hash 1:115185
hash 2:   1440956
hash 3:   1679976
hash 4:873769
hash 5:948124
hash 6:651799
hash 7:676707
hash 8:545459
hash 9:523615
hash10:421232
hash11:161641
hash12:140797
hash13: 86826
hash14: 41702
hash15: 41570
hash16:   332
hash17:   211
hash18:  4275
hash19:   205
hash20:   131
hash21:  4197
hash22:70
hash23:35
hash24:44
hash25:  4145
hash26:  4137
hash27:  4137
hash28:21
hash29:  4124
hash30: 8
hash31: 5
hash32: 1
hash other: 28866
hash total:   8404302

And here is the statistic of a full test run.

hash 1: 18935
hash 2:596761
hash 3:643973
hash 4:645399
hash 5:576231
hash 6:742531
hash 7:497214
hash 8:330890
hash 9:291301
hash10: 93206
hash11:   1417900
hash12:160802
hash13: 58675
hash14: 49324
hash15: 48068
hash16: 90634
hash17: 24163
hash18: 66079
hash19: 23408
hash20: 20695
hash21: 16424
hash22: 17236
hash23: 59135
hash24: 10368
hash25:  6047
hash26:  6784
hash27:  5565
hash28:  5931
hash29:  3469
hash30:  4220
hash31:  2652
hash32:  2911
hash other: 72042
hash total:   6608973

About 50% of the hashed elements are between 1 and 5 characters long. A 
realistic hash collision attack on 32bit needs at least 7, 8 chars. I see the 
chance of a micro-optimization! :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-06 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Microbenchmarking hash computation (Linux, gcc 4.7.3):

* Short strings:
  python -m timeit -s "b=b'x'*20" "hash(memoryview(b))"

- 64-bit build, before: 0.263 usec per loop
- 64-bit build, after: 0.263 usec per loop

- 32-bit build, before: 0.303 usec per loop
- 32-bit build, after: 0.358 usec per loop

* Long strings:
  python -m timeit -s "b=b'x'*1000" "hash(memoryview(b))"

- 64-bit build, before: 1.56 usec per loop
- 64-bit build, after: 1.03 usec per loop

- 32-bit build, before: 1.61 usec per loop
- 32-bit build, after: 2.46 usec per loop

Overall, performance looks fine to me.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-06 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is a simple benchmark (Linux, gcc 4.7.3):

$ ./python -m timeit -s "words=[w for line in open('LICENSE') for w in 
line.split()]; import collections" "c = collections.Counter(words); 
c.most_common(10)"

- 64-bit build, before: 313 usec per loop
- 64-bit build, after: 298 usec per loop

- 32-bit build, before: 328 usec per loop
- 32-bit build, before: 329 usec per loop

- x32 build, before: 291 usec per loop
- x32 build, after: 284 usec per loop

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-06 Thread Christian Heimes

New submission from Christian Heimes:

The patch implements the current state of PEP 456 plus a configure option to 
select the hash algorithm. I have tested it only on 64bit Linux so far.

--
components: Interpreter Core
files: pep-0456-1.patch
keywords: patch
messages: 199078
nosy: christian.heimes, ncoghlan, pitrou
priority: normal
severity: normal
stage: patch review
status: open
title: PEP 456 Secure and interchangeable hash algorithm
type: enhancement
versions: Python 3.4
Added file: http://bugs.python.org/file31974/pep-0456-1.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com