[issue39298] add BLAKE3 to hashlib

2022-03-24 Thread Jack O'Connor


Jack O'Connor  added the comment:

I did reply to that point above with some baseless speculation, but now I can 
back up my baseless speculation with unscientific data :)

https://gist.github.com/oconnor663/aed7016c9dbe5507510fc50faceaaa07

According to whatever `powerstat -R` measures on my laptop, running 
hardware-accelerated SHA-256 in a loop for a minute or so takes 26.86 Watts on 
average. Doing the same with AVX-512 BLAKE3 takes 29.53 Watts, 10% more. 
Factoring in the 4.69x difference in throughput reported by those loops, the 
overall energy/byte for BLAKE3 is 4.27x lower than SHA-256. This is my first 
time running a power benchmark, so if this sounds implausible hopefully someone 
can catch my mistakes.
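
The arithmetic behind that 4.27x figure can be checked in a few lines. This is only a sketch that reuses the numbers quoted above; the variable names are illustrative, and it assumes energy per byte is simply average power divided by throughput.

```python
# Figures quoted in the comment above; variable names are illustrative.
sha256_watts = 26.86
blake3_watts = 29.53
throughput_ratio = 4.69  # BLAKE3 bytes/sec divided by SHA-256 bytes/sec

# Assumption: energy per byte = average power / throughput, so the ratio of
# SHA-256 energy/byte to BLAKE3 energy/byte is throughput ratio / power ratio.
power_ratio = blake3_watts / sha256_watts
energy_per_byte_ratio = throughput_ratio / power_ratio

print(f"power increase: {100 * (power_ratio - 1):.1f}%")
print(f"energy/byte advantage for BLAKE3: {energy_per_byte_ratio:.2f}x")
```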

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39298] add BLAKE3 to hashlib

2022-03-24 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

You missed the key "And certainly more efficient in terms of watt-secs/byte" 
part.




[issue39298] add BLAKE3 to hashlib

2022-03-24 Thread Jack O'Connor


Jack O'Connor  added the comment:

> Truncated sha512 (sha512-256) typically performs 40% faster than sha256 on 
> X86_64.

Without hardware acceleration, yes. But because SHA-NI includes only SHA-1 and 
SHA-256, and not SHA-512, it's no longer a level playing field. OpenSSL's 
SHA-512 and SHA-512/256 both get about 797 MB/s on my machine.




[issue39298] add BLAKE3 to hashlib

2022-03-24 Thread Christian Heimes


Christian Heimes  added the comment:

sha1 should be considered broken anyway and sha256 does not perform well on 
64bit systems. Truncated sha512 (sha512-256) typically performs 40% faster than 
sha256 on X86_64. It should get you close to the performance of BLAKE3 SSE4.1 
on your system.




[issue39298] add BLAKE3 to hashlib

2022-03-24 Thread Jack O'Connor


Jack O'Connor  added the comment:

> Hardware accelerated SHAs are likely faster than blake3 single core.

Surprisingly, they're not. Here's a quick measurement on my recent ThinkPad 
laptop (64 KiB of input, single-threaded, TurboBoost left on), which supports 
both AVX-512 and the SHA extensions:

OpenSSL SHA-256: 1816 MB/s
OpenSSL SHA-1:   2103 MB/s
BLAKE3 SSE2:     2109 MB/s
BLAKE3 SSE4.1:   2474 MB/s
BLAKE3 AVX2:     4898 MB/s
BLAKE3 AVX-512:  8754 MB/s

The main reason SHA-1 and SHA-256 don't do better is that they're fundamentally 
serial algorithms. Hardware acceleration can speed up a single instance of 
their compression functions, but there's just no way for it to run more than 
one instance per message at a time. In contrast, AES-CTR can easily parallelize 
its blocks, and hardware accelerated AES does beat BLAKE3.
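
A rough way to reproduce the OpenSSL rows of this kind of measurement from Python is sketched below. It is not the benchmark used above: the helper name `throughput_mb_per_s` and the 0.1 second timing window are assumptions, it only covers algorithms that hashlib already provides, and Python call overhead will lower the numbers somewhat.

```python
import hashlib
import time

def throughput_mb_per_s(name, data, min_seconds=0.1):
    """Hash `data` repeatedly for at least `min_seconds`; return MB/s."""
    hashed = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        hashlib.new(name, data).digest()
        hashed += len(data)
        elapsed = time.perf_counter() - start
    return hashed / elapsed / 1e6

data = bytes(64 * 1024)  # 64 KiB of zeros, the input size used above
results = {name: throughput_mb_per_s(name, data) for name in ("sha1", "sha256")}
for name, rate in results.items():
    print(f"{name:8s} {rate:8.0f} MB/s")
```

Timing BLAKE3 the same way would need a third-party package, since it is not part of hashlib.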

> And certainly more efficient in terms of watt-secs/byte.

I don't have any experience measuring power myself, so take this with a grain 
of salt: I think the difference in throughput shown above is large enough that, 
even accounting for the famously high power draw of AVX-512, BLAKE3 comes out 
ahead in terms of energy/byte. Probably not on ARM though.




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Larry Hastings


Larry Hastings  added the comment:

> Performance wise... The SHA series have hardware acceleration on
> modern CPUs and SoCs.  External libraries such as OpenSSL are in a
> position to provide implementations that make use of that. Same with
> the Linux Kernel CryptoAPI (https://bugs.python.org/issue47102).
>
> Hardware accelerated SHAs are likely faster than blake3 single core.
> And certainly more efficient in terms of watt-secs/byte.

I don't know if OpenSSL currently uses the Intel SHA1 extensions.
A quick google suggests they added support in 2017.  And:

* I'm using a recent CPU that AFAICT supports those extensions.
  (AMD 5950X)
* My Python build with BLAKE3 support is using the OpenSSL implementation
  of SHA1 (_hashlib.openssl_sha1), which I believe is using the OpenSSL
  provided by the OS.  (I haven't built my own OpenSSL or anything.)
* I'm using a recent operating system release (Pop!_OS 21.10), which
  currently has OpenSSL version 1.1.1l-1ubuntu1.1 installed.
* My Python build with BLAKE3 doesn't support multithreaded hashing.
* In that Python build, BLAKE3 is roughly twice as fast as SHA1 for
  non-trivial workloads.
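
The first few points can be checked from within a given Python build. The sketch below only reports which OpenSSL the interpreter is linked against and which constructor backs `hashlib.sha1`; it cannot tell whether that OpenSSL actually uses the SHA extensions at run time.

```python
import hashlib

try:
    import ssl
    openssl_version = ssl.OPENSSL_VERSION
except ImportError:
    # CPython can be built without the ssl module.
    openssl_version = None

# On OpenSSL-backed builds hashlib.sha1 is the builtin named "openssl_sha1";
# otherwise hashlib falls back to CPython's bundled implementation.
sha1_backend = getattr(hashlib.sha1, "__name__", repr(hashlib.sha1))

print("OpenSSL:", openssl_version)
print("sha1 constructor:", sha1_backend)
```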




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

Rust based anything comes with a baseline level of Rust code overhead. 
https://stackoverflow.com/questions/29008127/why-are-rust-executables-so-huge

That seems expected.




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

Here's a wheel which only includes the portable code (I disabled
all the special cases as you suggested).

Archive:  dist/blake3_experimental_c-0.0.1-cp310-cp310-linux_x86_64.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
   297680  2022-03-23 19:26   blake3.cpython-310-x86_64-linux-gnu.so
     3183  2022-03-23 19:26   blake3_experimental_c-0.0.1.dist-info/METADATA
      105  2022-03-23 19:26   blake3_experimental_c-0.0.1.dist-info/WHEEL
        7  2022-03-23 19:26   blake3_experimental_c-0.0.1.dist-info/top_level.txt
      451  2022-03-23 19:26   blake3_experimental_c-0.0.1.dist-info/RECORD
---------                     -------
   301426                     5 files

I didn't run any benchmarks, but it's clear that the SIMD code was
used in my initial build and this adds some 50kB to the .so file.
This is on an older Linux x64 box with Intel i7-4770k CPU.

Could be that the Rust version adds several such SIMD variants and
then branches based on the platform running the code.

In any case, the C extension is indeed very easy to build and
install with a standard compiler setup.




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

Performance wise... The SHA series have hardware acceleration on modern CPUs 
and SoCs.  External libraries such as OpenSSL are in a position to provide 
implementations that make use of that. Same with the Linux Kernel CryptoAPI 
(https://bugs.python.org/issue47102).

Hardware accelerated SHAs are likely faster than blake3 single core. And 
certainly more efficient in terms of watt-secs/byte.




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

To anyone else who comes along with motivation:

I'm fine with blake3 being in hashlib, but I don't want us to guarantee it by 
carrying the implementation of the algorithm in the CPython codebase itself 
unless it gains wide industry standard-like adoption status.

We should feel free to link to both the Rust blake3 and C blake3-py packages 
from the hashlib docs regardless.




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Larry Hastings


Larry Hastings  added the comment:

I can't answer why the Rust one is so much larger--that's a question for Jack.  
But the blake3-py you built might (should?) have support for SIMD extensions.  
See the setup.py for how that works; it appears to at least try to use the SIMD 
extensions on x86 POSIX (32- and 64-bit), x86_64 Windows, and 64-bit ARM POSIX.

If you were really curious, you could run some quick benchmarks, then hack your 
local setup.py to not attempt adding support for those (see "portable code 
only" in setup.py) and do a build, and run your benchmarks again.  If BLAKE3 
got a lot slower, yup, you (initially) built it with SIMD extension support.




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

With "lean" I meant: doesn't use much code and is easy to compile
and install.

I built a wheel from Jack's experimental package and it comes out to
just under 100kB on Linux x64, compared to around the 1.1MB the
Rust wheel needs:

Archive:  blake3_experimental_c-0.0.1-cp310-cp310-linux_x86_64.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
   348528  2022-03-23 18:38   blake3.cpython-310-x86_64-linux-gnu.so
     3183  2022-03-23 18:38   blake3_experimental_c-0.0.1.dist-info/METADATA
      105  2022-03-23 18:38   blake3_experimental_c-0.0.1.dist-info/WHEEL
        7  2022-03-23 18:38   blake3_experimental_c-0.0.1.dist-info/top_level.txt
      451  2022-03-23 18:38   blake3_experimental_c-0.0.1.dist-info/RECORD
---------                     -------
   352274                     5 files

Archive:  blake3-0.3.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
     3800  2022-01-13 01:26   blake3-0.3.1.dist-info/METADATA
      133  2022-01-13 01:26   blake3-0.3.1.dist-info/WHEEL
       48  2022-01-13 01:26   blake3/__init__.py
  4195392  2022-01-13 01:26   blake3/blake3.cpython-310-x86_64-linux-gnu.so
      382  2022-01-13 01:26   blake3-0.3.1.dist-info/RECORD
---------                     -------
  4199755                     5 files

I don't know why there is such a significant difference in size. Perhaps
the Rust version includes multiple variants for different CPU
optimizations ?!
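
Listings like the two above can also be produced from Python, since a wheel is an ordinary zip archive. A minimal sketch follows; the helper names `wheel_member_sizes` and `print_listing` are made up for illustration.

```python
import zipfile

def wheel_member_sizes(wheel_path):
    """Return {member name: uncompressed size in bytes} for a wheel (a zip file)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return {info.filename: info.file_size for info in wheel.infolist()}

def print_listing(wheel_path):
    """Print one line per member, then the total size and file count."""
    sizes = wheel_member_sizes(wheel_path)
    for name, size in sizes.items():
        print(f"{size:>9}  {name}")
    print(f"{sum(sizes.values()):>9}  {len(sizes)} files")
```

Calling `print_listing()` on each wheel file would show which member accounts for the size difference; in the listings above it is the `.so` file in both cases.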




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Larry Hastings


Larry Hastings  added the comment:

The Rust version is already quite "lean".  And it can be much faster than the C 
version, because it supports internal multithreading.  Even without 
multithreading I bet it's at least a hair faster.

Also, Jack has independently written a Python package based around the C 
version:

  https://github.com/oconnor663/blake3-py/tree/master/c_impl

so my making one would be redundant.

I have no interest in building standalone BLAKE3 PyPI packages for Raspberry Pi 
or Android.  My goal was for BLAKE3 to be one of the "included batteries" in 
Python--which would have meant it would, eventually, be available on the 
Raspberry Pi and Android builds that way.




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

On 23.03.2022 17:53, Larry Hastings wrote:
> 
> Ok, I give up.

Sorry to spoil the fun, but there's no need to throw
everything in the bin ;-)

A lean and fast blake3 C package would still be a great thing
to have on PyPI, e.g. to provide support for platforms that
Jack's blake3 Rust package doesn't cover, such as:

Raspis:
https://www.piwheels.org/project/blake3/

Android (e.g. via termux):
https://wiki.termux.com/wiki/Main_Page
https://wiki.termux.com/wiki/Python

etc.




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Larry Hastings


Larry Hastings  added the comment:

Ok, I give up.

--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed




[issue39298] add BLAKE3 to hashlib

2022-03-23 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

On 23.03.2022 02:12, Gregory P. Smith wrote:
> 
> I view the NIST standard hashes as important enough to attempt to guarantee 
> as present (all the SHAs and MD5) as built-in. Others should really 
> demonstrate practical application popularity to gain included battery status 
> rather than just using PyPI.

+1 on this. I also think the topic deserves a wider discussion.

IMO, Python's stdlib should only provide a basic set of hash algorithms
and not try to add every single new algorithm out there.

PyPI is a much better way to add support for new hash algorithms:
packages there can move much faster than the stdlib, provide
specialized builds for added performance, and also add exotic
features, which are not always needed.

Here's the list of Python 3.10 algos on a typical Linux system:

>>> hashlib.algorithms_available
{'sha512_256', 'mdc2', 'md5-sha1', 'md4', 'ripemd160', 'shake_128', 'sha3_384',
'blake2s', 'sha3_512', 'sha3_256', 'sha256', 'sha1', 'sm3', 'sha512_224',
'whirlpool', 'sha384', 'shake_256', 'sha224', 'sha512', 'sha3_224', 'md5',
'blake2b'}

This already is more than enough. Since we're using OpenSSL in Python
anyway, exposing some of the often used algos from OpenSSL is fine,
since it doesn't add much extra bloat. The above list already goes
way beyond this, IMO.

The longer the list gets, the more confusion it causes among users,
since Python's stdlib doesn't provide any guidance on
basic questions such as "Which hash algo should I use for my
application".

Most applications today will only need these basic hash algos:

{'ripemd160', 'sha3_512', 'sha3_256', 'sha256', 'sha1', 'sha512', 'md5'}
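
For reference, hashlib already distinguishes the algorithms every build must provide from those that merely happen to be available through the linked OpenSSL. A short sketch of how to see the difference on a given interpreter (the output varies by build, so none is shown):

```python
import hashlib

guaranteed = hashlib.algorithms_guaranteed  # present on every platform
available = hashlib.algorithms_available    # also includes OpenSSL extras
build_specific = sorted(available - guaranteed)

print("guaranteed:", sorted(guaranteed))
print("only in this build:", build_specific)
```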

--
nosy: +lemburg




[issue39298] add BLAKE3 to hashlib

2022-03-22 Thread Jack O'Connor


Jack O'Connor  added the comment:

> maintaining a complicated build process in-tree

For what it's worth, if you have any sort of "in a perfect world" vision for 
what the upstream BLAKE3 project could do to make it trivially easy for you to 
integrate, I'd be very interested in getting that done. Making integration easy 
would benefit all callers. We have some issues on the backburner about 
committing CMake build files, but I assume those would be useless for us here. 
Is there anything that would be more useful? If we provided autotools build 
files, could you call into them?

Fundamentally, BLAKE3 wants to build some code on x86 and some other code on 
ARM, and also some code on Unix and some other code on Windows. Currently we 
just ask the caller to do that for us, for lack of a common standard. (And if 
we're building intrinsics rather than assembly, we also need the compiler flags 
that enable our intrinsics.) But maybe we could handle more of that upstream, 
using the preprocessor? If the build instructions said "compile this one giant 
file on all platforms and don't worry about what it does", would that be 
better? Or would that be gross? Is a header-only library the gold standard? Or 
too C++-ish? Has anyone ever done a really good job of this?




[issue39298] add BLAKE3 to hashlib

2022-03-22 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

Because I don't think blake3 or blake2 _(though we've shipped it already so 
there's a challenge in making changes https://bugs.python.org/issue47095)_ are 
important enough to be _guaranteed_ present in all builds (our release binaries 
would include them). Depending on an external library for those to exist makes 
sense.

I do not want CPython to get into the business of maintaining a complicated 
build process in-tree for third party architecture specific optimized code for 
non-core functionality purposes.  That is best handled outside of the project & 
on CI and binary release hosts.

I'm okay with blake3 in hashlib if we can avoid gaining another /impl/ tree 
that is a copy of large third party code and our own build system for it.

Q: What benefits does having blake3 builtin vs getting it from PyPI bring?

Q: Should we instead provide a way for third party provided hashes to be 
registered in `hashlib` similar to `codecs.register()`?
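
To make the second question concrete, here is one shape such a registry could take. Everything in it is hypothetical: hashlib has no `register()` function today, and the names `register`, `new` and `_registry` are invented for this sketch, which is only loosely analogous to `codecs.register()`.

```python
import hashlib

_registry = {}  # hypothetical: lower-cased name -> constructor taking initial data

def register(name, constructor):
    """Register a third-party hash constructor under `name` (hypothetical API)."""
    _registry[name.lower()] = constructor

def new(name, data=b""):
    """Like hashlib.new(), but consult registered constructors first."""
    constructor = _registry.get(name.lower())
    if constructor is not None:
        return constructor(data)
    return hashlib.new(name, data)

# Stand-in registration so the sketch runs without third-party packages;
# a real caller might register a constructor from a PyPI package instead.
register("demo-sha256", hashlib.sha256)
print(new("demo-sha256", b"abc").hexdigest() == hashlib.sha256(b"abc").hexdigest())
```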

I view the NIST standard hashes as important enough to attempt to guarantee as 
present (all the SHAs and MD5) as built-in. Others should really demonstrate 
practical application popularity to gain included battery status rather than 
just using PyPI.




[issue39298] add BLAKE3 to hashlib

2022-03-22 Thread Larry Hastings


Larry Hastings  added the comment:

> Given that I don't want to see us gain new vendored copies of
> significant but non-critical third party hash code in our tree
> (Modules/_blake3/impl/ in PR 31686) for anything but a known
> fixed term need (ex: the sha2 libtomcrypt code is gone from
> our tree as was clearly going to happen from the start),
> the only way I think we should include blake3 support is if
> there is already a plan for that code to leave our tree in
> the future with a high probability of success.

You've said what you want, but not why.  It sounds like you are against merging 
the BLAKE3 PR containing its own impl.  Why?




[issue39298] add BLAKE3 to hashlib

2022-03-22 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

correction: our md5/sha1/sha2/sha3 code is not gone yet, but these are simple C 
implementations used as a fallback when the provider of the optimal versions is 
unavailable (openssl for those).  That keeps the copies of code in our tree 
simple, and most people use the optimal library version.




[issue39298] add BLAKE3 to hashlib

2022-03-22 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

hashlib creator and other maintainer here: I do not think it was a good idea 
for us to add blake2 to hashlib the way we did. So blake3 should not be 
presumed as a given, at least not done in the same manner.

Background:

While OpenSSL gained _some_ blake2 support in 2016, around when we were adding 
it to hashlib (not a coincidence), we made a mistake: We offered an overly 
complex API. OpenSSL's own support for blake2 is a subset, not sufficient to be 
used as a replacement for the API we exposed so we are stuck with our vendored 
copy with no easy way off. https://github.com/openssl/openssl/issues/980

OpenSSL is not going to gain native blake3 support. 
https://github.com/openssl/openssl/issues/11613

Given that I don't want to see us gain new vendored copies of significant but 
non-critical third party hash code in our tree (Modules/_blake3/impl/ in PR 
31686) for anything but a known fixed term need (ex: the sha2 libtomcrypt code 
is gone from our tree as was clearly going to happen from the start), the only 
way I think we should include blake3 support is if there is already a plan for 
that code to leave our tree in the future with a high probability of success.

A `configure.ac` check for an installed blake3 library to build and link 
against would be appropriate.

Along with updating relevant CI systems and Windows and macOS release build 
systems to have that available.  

That'd significantly shrink the PR to reasonable size.

This means blake3 support should be considered optional as anyone doing their 
own CPython build may not have it.  This is primarily a documentation issue: 
list it as such and provide one official documented API to detect its 
availability.  Our binary releases will include it as will most OS distro 
packagers.  It also means implementation details, performance and platform 
tuning are **not our problem** but those of the OS distro or external library 
provider.
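
With today's hashlib, detecting an optional algorithm could look roughly like the sketch below. The helper name `has_algorithm` is an assumption, not an existing or proposed API; it relies only on `hashlib.algorithms_available` and on `hashlib.new()` raising `ValueError` for digests it cannot construct.

```python
import hashlib

def has_algorithm(name):
    """Return True if this build's hashlib can construct `name`."""
    if name not in hashlib.algorithms_available:
        return False
    try:
        hashlib.new(name)
    except ValueError:
        # Listed by the backend but not usable (for example, disabled by policy).
        return False
    return True

print("sha256:", has_algorithm("sha256"))
print("blake3:", has_algorithm("blake3"))
```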

Regarding setup.py, what Christian says is true, that is on its way out. Do not 
add new non-trivial statements to it as that just creates more work for those 
working to untangle the mess. Getting rid of the /impl/ build in favor of an 
autoconf detected library gets rid of that mess.

I'll file a separate issue to track moving blake2 in the same direction so we 
can lose its /impl/.

--
nosy: +gregory.p.smith




[issue39298] add BLAKE3 to hashlib

2022-03-22 Thread Larry Hastings


Larry Hastings  added the comment:

Jack: I've updated the PR, improving compatibility with the "blake3" package on 
PyPI.  I took your notes, and also looked at the C module you wrote.

The resulting commit is here:

https://github.com/python/cpython/pull/31686/commits/37ce72b0444ad63fd1989ad36be5f7790e51f4f1

Specifically:

* derive_key_context is now a string, which I internally encode into UTF-8.

* I added the AUTO member to the module, set to -1.

* max_threads may not be zero; it can be >= 1 or AUTO.

* I added the reset() method.
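
Expressed in pure Python, the argument rules listed above might look like the sketch below. This is an illustration only, not the code from the PR: the function name `check_blake3_args` and the `max_threads` error text are made up, and only the rules stated in this thread are enforced.

```python
AUTO = -1      # special max_threads value, as described above
KEY_LEN = 32   # key length in bytes, per the error message discussed here

def check_blake3_args(key=None, derive_key_context=None, max_threads=1):
    """Validate the arguments; return the UTF-8 encoded context, or None."""
    if key is not None and len(key) != KEY_LEN:
        raise ValueError("key must be exactly 32 bytes")
    if max_threads != AUTO and max_threads < 1:
        raise ValueError("max_threads must be >= 1 or AUTO")
    if derive_key_context is None:
        return None
    if not isinstance(derive_key_context, str):
        raise TypeError("derive_key_context must be a str")
    # The context is a string and is encoded to UTF-8 internally.
    return derive_key_context.encode("utf-8")
```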


Some additional thoughts, both on what I did and on what you did:

* In your new() method, your error string says "keys must be 32 bytes".  I went 
with "key must be exactly 32 bytes"; the name of the parameter is "key", and I 
think "exactly" makes the message clearer.

* In my new() method, I complain if the derive_key_context is zero-length.  In 
your opinion, is that a good idea, or is that overly fussy?

* In your copy() method, you hard-code Blake3Type.  It's considered good form 
to use type(self) here, in case the user is calling copy on a user-created 
subclass of Blake3Type.

* In your copy() method, if the original has a lock created, you create a lock 
in the copy too.  I'm not sure why you bother; I leave the lock member 
uninitialized, and let the existing logic create the lock in the copy on demand.

* In the Blake3_methods array, you list the "update" method twice.  I suspect 
this is totally harmless, but it's unnecessary.
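
The point about `type(self)` in copy() is easiest to see in a Python analogue. The classes below are toys wrapping a hashlib object, not the real Blake3Type, and the names `Hasher` and `LoggingHasher` are invented for the example.

```python
import hashlib

class Hasher:
    """Toy wrapper around a hashlib object (not the real blake3 type)."""

    def __init__(self, data=b""):
        self._inner = hashlib.sha256(data)

    def update(self, data):
        self._inner.update(data)

    def hexdigest(self):
        return self._inner.hexdigest()

    def copy(self):
        # Build the copy from type(self), not from Hasher, so a subclass
        # gets back an instance of its own class.
        clone = type(self).__new__(type(self))
        clone._inner = self._inner.copy()
        return clone

class LoggingHasher(Hasher):
    pass

original = LoggingHasher(b"abc")
duplicate = original.copy()
print(type(duplicate).__name__)
```

Hard-coding `Hasher` in copy() would return a plain `Hasher` here and silently drop the subclass.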




[issue39298] add BLAKE3 to hashlib

2022-03-05 Thread Larry Hastings


Larry Hastings  added the comment:

Right, and I did say "(or BDFL)".  Apparently you didn't bother to consult with 
the BDFL in advance, or at least not in the usual public venues--I haven't 
found a record of such a conversation on the bpo issue, nor in python-dev.

BTW you simultaneously proposed adding SHA3/SHAKE.  The total kloc for all this 
work was over 26k; you didn't mention any discomfort with the size of these 
patches at the time in public correspondence.

In fact, quite the opposite.  On 2016/05/28 you said:

> I also don't get your obsession with lines of code. The gzip and expat
> are far bigger than the KeccakCodePackage.

https://mail.python.org/archives/list/python-...@python.org/message/3YHVN2I74UQC36AVY5BGRJJUE4PMU6GX/




[issue39298] add BLAKE3 to hashlib

2022-03-05 Thread Christian Heimes


Christian Heimes  added the comment:

I didn't consult the steering council in 2016, because I lost the keys to the 
time machine. The very first SC election was in 2019. :)




[issue39298] add BLAKE3 to hashlib

2022-03-04 Thread Larry Hastings


Larry Hastings  added the comment:

Jack O'Connor:
> Was any of the experimental C extension code [...] useful to you?
> I was wondering if it could be possible to copy blake3module.c from
> there verbatim.

I concede I didn't even look at it.  The glue code to mate the library with the 
CPython runtime wasn't the hard part.  And, Python has its own way of working 
(Argument Clinic etc).  I did what seemed easiest, which
was to start with CPython's BLAKE2 support and hack it up.  Given that
I had it working in maybe half a day, this seems reasonable.

I should take a peek at your experimental build though, just to confirm
I got the interfaces right.


> The setup.py build there also has working Windows and NEON support.

The CPython build process doesn't use setup.py to build extensions on
Windows.  I haven't looked recently, but back in the day they were
built like any other DLL, using its own "project file".  This will
require someone to use the MSVS GUI to create a new project, add files,
etc etc.  I assume they can just use the .asm files for the SIMD
extensions so hopefully this will be straightforward.

As for NEON support, the technique I used theoretically should Just Work.
I look forward to hearing back from someone building on ARM.  (I have
a cross-compiler setup for building for MiSTer, which is an ARM platform
with NEON support.  But the cross-compiler is a royal PITA to use...
it's a Docker image, and requires a bunch of manual configuration,
and I haven't touched any of that in more than a year.)


You seem to have done at least a partial code review of my PR, given
your comments to follow.  I appreciate it!  I tried to add you as a
"reviewer" but GitHub wouldn't permit it--I assume this is some sort
of permissions problem.  Still, it'd be nice if you were able to do
future code reviews using the GitHub review tool; are you permitted to
use that for my PR? 

> - My derive_key_context parameter requires a string and refuses to
> accept bytes. This is consistent with our Rust and C APIs (though the C
> API does include a _raw version specifically for bindings, which we're
> using here).

I was considering going the other way with it actually, requiring bytes.
Note that Python has first-class support for hard-coded bytes strings:

b = blake3.blake3(derive_key_context=b"My Funny Valentine (1984)")

The C interface takes "char *", not a "wchar_t *", and this seemed like
the most direct and relatable way to reflect that.  But I'm not militant
about this, and I'm willing to change the interface to require an actual
string (as in Unicode).  I note that your C API already dictates that
Unicode be encoded as UTF-8, so we can do that, and if the encoding fails
the user can deal with it.


> - I've included an `AUTO` constant that provides a special value (-1)
> for the `max_threads` argument, and I explicitly don't support
> `max_threads=0`.

I can do that too; again, I prefer the 0 there, but I'm not militant about
it.  However, it would make sense to me if you had that constant somewhere
in the BLAKE3 C .h files, which AFAICT you currently don't.


> - I enforce that the `data` arguments are positional-only and that the
> other keyword arguments are keyword-only. I think this is consistent
> with the rest of hashlib.

I suspect hashlib is mostly like that, due to being chock full of
legacy code.  But I don't see why that's necessary.  I think permitting
"data" to be a named argument is fine.  So unless you have a strong
conviction about it--which I bet you don't--I'll leave it as
positional-or-keyword.

There are rare circumstances where positional-only arguments are useful;
this isn't one of them.
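
For readers unfamiliar with the two signature styles under discussion, here they are side by side. The function names are hypothetical, and BLAKE2b stands in for BLAKE3 so that the sketch runs on a stock Python (3.8 or later, for the `/` marker).

```python
import hashlib

def strict_hash(data=b"", /, *, digest_size=32):
    """Positional-only data, keyword-only options."""
    return hashlib.blake2b(data, digest_size=digest_size)

def relaxed_hash(data=b"", *, digest_size=32):
    """Positional-or-keyword data, keyword-only options."""
    return hashlib.blake2b(data, digest_size=digest_size)

print(relaxed_hash(data=b"abc").hexdigest() == strict_hash(b"abc").hexdigest())

try:
    strict_hash(data=b"abc")
except TypeError as exc:
    # Passing a positional-only parameter by name is rejected.
    print("rejected:", exc)
```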


> - I include a `.reset()` method.

I don't mind adding that.


> - Unrelated to tests: I haven't made any attempt to zero memory in my
> `dealloc` function. But if that's what other hashlib functions do,
> then I'm certainly in favor of doing it here too.

I inherited that from the BLAKE2 code I carved up to make the BLAKE3
version.  And yeah, it made sense to me, so I kept it.


Christian Heimes:
> GH-31686 is a massive patch set. I'm feeling uncomfortable adding
> so much new code for a new hashing algorithm. Did you ask the
> Steering Council for approval?

I didn't.  Like most hashing algorithms, BLAKE3 doesn't allocate
memory and doesn't perform any I/O.  All it does is meditate on
the data you pass in, and write to various pre-allocated fixed-size
buffers.  As large codebases go this seems pretty harmless, almost inert.

The Modules/_blake3/impl directory is about 32kloc.  I note that the
Modules/_blake2/impl directory you checked in in 2016 is about 21kloc,
and you didn't require Steering Council (or BDFL) approval for that.

As (former) Steering Council member Barry Warsaw says: JFDI!


> The platform detection and compiler flag logic must be added to
> configure.ac instead of setup.py. Erlend and I are in the process
> of making setup.py optional. I plan to remove it entirely along
> with 

[issue39298] add BLAKE3 to hashlib

2022-03-04 Thread Christian Heimes


Christian Heimes  added the comment:

GH-31686 is a massive patch set. I'm feeling uncomfortable adding so much new 
code for a new hashing algorithm. Did you ask the Steering Council for approval?

The platform detection and compiler flag logic must be added to configure.ac 
instead of setup.py. Erlend and I are in the process of making setup.py 
optional. I plan to remove it entirely along with distutils in 3.12.

--
assignee: christian.heimes -> 




[issue39298] add BLAKE3 to hashlib

2022-03-04 Thread Jack O'Connor


Jack O'Connor  added the comment:

Thanks Larry! Was any of the experimental C extension code under 
https://github.com/oconnor663/blake3-py/tree/master/c_impl useful to you? I was 
wondering if it could be possible to copy blake3module.c from there verbatim. 
The setup.py build there also has working Windows and NEON support.

I've patched the blake3-py test suite (which both the production Rust-based 
extension and that experimental C-based extension currently pass) to invoke the 
new hashlib implementation from your branch. You can find the full test output, 
and the procedure I used to run the tests, in this Gist 
https://gist.github.com/oconnor663/533048580b1c0f4a01d1d55f57f92792. Here are 
some differences:

- My derive_key_context parameter requires a string and refuses to accept 
bytes. This is consistent with our Rust and C APIs (though the C API does 
include a _raw version specifically for bindings, which we're using here). For 
a long discussion of why we prefer to do things this way, see 
https://github.com/BLAKE3-team/BLAKE3/issues/13. The short version is that any 
use case that requires arbitrary bytes for the context string is almost 
certainly violating the documented security requirement that the context string 
must be hardcoded.

- I've included an `AUTO` constant that provides a special value (-1) for the 
`max_threads` argument, and I explicitly don't support `max_threads=0`.

- I enforce that the `data` arguments are positional-only and that the other 
keyword arguments are keyword-only. I think this is consistent with the rest of 
hashlib.

- I include a `.reset()` method. This isn't particularly useful in the default 
case, where you might as well create a new hasher. But when `max_threads` is 
greater than 1 in the Rust implementation, the hasher owns a thread pool, and 
`.reset()` is currently the only way to reuse that pool. (A BLAKE3 hasher is 
also ~2 KB, somewhat larger than other hashers, so callers who are pinching 
pennies with their allocator traffic might prefer to reuse the object.)

- Unrelated to tests: I haven't made any attempt to zero memory in my `dealloc` 
function. But if that's what other hashlib functions do, then I'm certainly in 
favor of doing it here too.
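
For illustration, the argument rules in the list above could be sketched in 
pure Python roughly like this. This is not the blake3-py or hashlib code: 
`check_blake3_args` and its error messages are hypothetical, and only the 
rules themselves (positional-only data, keyword-only options, a str-only 
context, AUTO equal to -1, no `max_threads=0`) come from this thread.

```python
AUTO = -1  # special max_threads value described above


def check_blake3_args(data=b"", /, *, key=None, derive_key_context=None,
                      max_threads=1):
    """Hypothetical validator mirroring the argument rules listed above."""
    # The context must be a str; bytes are refused on purpose.
    if derive_key_context is not None and not isinstance(derive_key_context, str):
        raise TypeError("derive_key_context must be a str, not bytes")
    # The keyed hashing mode takes a 32-byte key.
    if key is not None and len(key) != 32:
        raise ValueError("key must be exactly 32 bytes")
    # max_threads must be AUTO (-1) or a positive integer; 0 is rejected.
    if max_threads != AUTO and max_threads < 1:
        raise ValueError("max_threads must be AUTO or >= 1")
    return bytes(data)
```

The `/` and `*` markers in the signature are what make `data` positional-only 
and everything else keyword-only, so `check_blake3_args(data=b"foo")` fails 
with a TypeError before any of the explicit checks run.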

--




[issue39298] add BLAKE3 to hashlib

2022-03-04 Thread Larry Hastings


Larry Hastings  added the comment:

Also, for what it's worth: I just ran my checksum benchmarker using a freshly 
built python a la my PR.  Here are my results when hashing 462183782 bytes 
(dicey-dungeons-linux64.zip):

hash
algorithm   time        bytes/sec   size  hash
----------  ----------  ----------  ----  ---------------------------------
blake3      0.18976406  2435570699    64  0c72d0a07ba767b75b0c99fed38fda...
sha1        0.21692419  2130623562    40  9fb83614394cd3b39e42b1d9f84a3c...
sha256      0.26399648  1750719480    64  2320129f2545ff8606d3db1d706d86...
sha224      0.28097475  1644929957    56  133b5da8d8b387f2bcfd69b0c73ed8...
md4         0.34185237  1351998195    32  dea7585ea9fa4520687bab1dc671858e
blake2b     0.53724666   860282275   128  e3653f33858a83b386c2fe865280a1...
md5         0.58128106   795112407    32  299440e1968cf8f8abc288bac8c0a4fa
sha512_224  0.64589952   715566066    56  413d48b782f114870ef80815540678...
sha384      0.64645893   714946859    96  b1c1cd96cef79c15f2171b8aa81304...
sha512      0.65424513   706438241   128  e7d0cec3fe8b73d1534a7bdb484176...
sha512_256  0.68371638   675987586    64  3f58faba70cea4d6ea8a8371e71bbb...
md5-sha1    0.80361958   575127576    72  299440e1968cf8f8abc288bac8c0a4...
shake_128   0.84424524   547452041    64  c62a813897b81f67822fc07115deae...
blake2s     0.85661793   539544839    64  cb8bd19c6ca446bbf7a8abbec61dc5...
sha3_224    0.95759645   482649850    56  6f96d117c7fcbcd802b222854db644...
shake_256   1.0152032    455262322    64  2d9f9dafe0ddf792c6407910946845...
sha3_256    1.015744     455019929    64  cc5d55fe0ac31f6e335da1bc6abaf3...
sha3_384    1.3235858    349190644    96  13206910ff231fe51a38fe637ded30...
sm3         1.4478934    319211203    64  021cd913540d95b13a03342b54f80d...
ripemd160   1.4737549    313609670    40  1a956000b88267ec8fc23327d22548...
sha3_512    1.9131832    241578418   128  e84b9f499b013956f6f36c93234ca3...

"time" is wall time in seconds.  So, yes, BLAKE3 was the fastest hash algorithm 
available on my machine--2.4GB/sec!

(I'm a little surprised by that result actually.  My CPU is pretty modern, so I 
assume it has the SHA1 extensions.  And I further assume we're using OpenSSL, 
and OpenSSL will use those extensions when available.  If BLAKE3 is *still* 
faster than OpenSSL, well! hats off to the BLAKE3 team.  And that just goes to 
show you how useful SIMD extensions are!)
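
For anyone who wants to try something similar, here is a rough sketch of a 
benchmark loop over whatever hashlib offers. It is not the benchmarker used 
for the table above; `benchmark_hashes` is a hypothetical helper, the zero-filled 
payload is a stand-in for a real file, and the 32-byte SHAKE output length is 
an arbitrary choice.

```python
import hashlib
import time


def benchmark_hashes(data, names=None):
    """Time one full hash of `data` per algorithm; return rows, fastest first."""
    rows = []
    for name in sorted(names or hashlib.algorithms_available):
        try:
            h = hashlib.new(name)
        except ValueError:
            continue  # advertised by OpenSSL but not usable in this build
        start = time.perf_counter()
        h.update(data)
        try:
            digest = h.hexdigest()
        except TypeError:
            # SHAKE variants need an explicit output length in bytes.
            digest = h.hexdigest(32)
        elapsed = time.perf_counter() - start
        rate = len(data) / elapsed if elapsed > 0 else float("inf")
        rows.append((name, elapsed, rate, len(digest), digest))
    rows.sort(key=lambda row: row[1])
    return rows


if __name__ == "__main__":
    payload = bytes(8 * 1024 * 1024)  # 8 MiB of zeros, stand-in for a real file
    for name, secs, rate, size, digest in benchmark_hashes(payload):
        print(f"{name:<12} {secs:10.6f} {rate:14.0f} {size:4d}  {digest[:30]}...")
```

A single timed pass like this is noisy; repeating each measurement and 
taking the best run would give steadier numbers.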

--




[issue39298] add BLAKE3 to hashlib

2022-03-04 Thread Larry Hastings


Larry Hastings  added the comment:

Okay, so.  Here's a PR that adds BLAKE3 support to hashlib.

The code was straightforward; I just took the BLAKE2 module and modified it to 
only have one algorithm.  I also copied over the whole C directory tree from 
BLAKE3, which is totally okay fine by their license.  I didn't have to modify 
it a bit.

The tricky part was building it.  Building extension modules is done inside 
setup.py using distutils (or the modern replacement), and in order to get those 
sexy SIMD implementations I had to compile assembly-language files, or C files 
with extra flags.  What I did works great on my Linux box, and I have my 
fingers crossed that it'll work on other POSIX platforms.  I wrote a long 
comment in setup.py explaining what I did and why.

Steve: I didn't do anything for Windows support.  Do you have time to take a 
pass at it?  Or if you know another core dev who does Windows build stuff, 
please nosy them.  Whoever does the work, I suggest you read 
https://github.com/BLAKE3-team/BLAKE3/blob/master/c/README.md for a little 
guidance on how to build BLAKE3 on Windows with SIMD support.  Also, I see you 
now build Python for Windows on ARM!  Does this mean Python can use BLAKE3's 
NEON support?  Maybe it's time to find out!  Get hyped!

Jack and I corresponded last year (or maybe 2020?) about what the API should 
look like.  The idea is, Jack also maintains a BLAKE3 Python extension on pypi, 
written in Rust.  Our goal was to make the two types behave identically, so 
that it could be like the old stringio / cstringio situation.  You can use the 
built-in one for convenience, but also you can install the Rust version from 
PyPI for even more performance.  Jack, it wouldn't hurt my feelings overly much 
if you checked my PR to make sure I got the interface right.

--




[issue39298] add BLAKE3 to hashlib

2022-03-04 Thread Larry Hastings


Change by Larry Hastings :


--
pull_requests: +29805
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/31686




[issue39298] add BLAKE3 to hashlib

2022-02-19 Thread Jack O'Connor


Jack O'Connor  added the comment:

Yes, everything in https://github.com/BLAKE3-team/BLAKE3 and 
https://github.com/oconnor663/blake3-py is public domain via CC0, and dual 
licensed under Apache for good measure. Hopefully that makes it easy to use it 
anywhere.

--




[issue39298] add BLAKE3 to hashlib

2022-02-18 Thread Larry Hastings


Larry Hastings  added the comment:

Just checking--I can liberally pull code from 
https://github.com/BLAKE3-team/BLAKE3 yes?

--




[issue39298] add BLAKE3 to hashlib

2022-02-17 Thread Larry Hastings


Larry Hastings  added the comment:

I thought someone volunteered to do it--if that's not happening, I could take a 
look at it next week.  Shouldn't be too hard... unless I have to touch 
autoconf, which I only barely understand.

--




[issue39298] add BLAKE3 to hashlib

2022-02-17 Thread Jack O'Connor


Jack O'Connor  added the comment:

What's the best way for me to help with the next steps of this?

--




[issue39298] add BLAKE3 to hashlib

2022-01-12 Thread Jack O'Connor

Jack O'Connor  added the comment:

Yeah by intrinsics I mean stuff like _mm256_add_epi32(). All of that stuff is 
in these vendored files:

blake3_avx2.c
blake3_avx512.c
blake3_neon.c
blake3_sse2.c
blake3_sse41.c

Also to Michał's question above, I'm not necessarily opposed to publishing 
something like "blake3-c" on PyPI once things stabilize. Even if we get BLAKE3 
into hashlib in 3.11, PyPI modules will be useful to folks running older 
versions, and not everyone wants to install the Rust toolchain.

--




[issue39298] add BLAKE3 to hashlib

2022-01-12 Thread Larry Hastings


Larry Hastings  added the comment:

I assume by "intrinsics" you mean using the GCC SIMD stuff, not like inlining 
memcpy() or something.  My assumption is yes, that's fine, we appear to already 
be using them for BLAKE2.

--




[issue39298] add BLAKE3 to hashlib

2022-01-12 Thread Jack O'Connor


Jack O'Connor  added the comment:

> As a first pass I say we merge the reference C implementation.

Do you mean portable-only C code, or portable + intrinsics? If the assembly 
files are out, I'd advocate for the latter. The intrinsics implementations are 
nearly as fast as the assembly code, and both of those are several times faster 
than the portable code. You can test this configuration with my current 
setup.py by setting the env var FORCE_INTRINSICS=1.

--




[issue39298] add BLAKE3 to hashlib

2022-01-12 Thread Larry Hastings


Larry Hastings  added the comment:

> In setup.py I assume that the target platform of the build is the same as the 
> current interpreter's platform.

If this is included in CPython, it won't be using setup.py, so this isn't a 
concern.

I don't think there's a way to use setup.py to cross-compile, so I'm not sure 
this ever was a concern.


> - Compiling assembly files.

AFAICT Python currently ships exactly one assembly file, 
"Modules/_decimal/libmpdec/vcdiv64.asm", which is only built on Windows.  It 
would be a brave new world of configure.ac hacking to build assembly language 
files on POSIX platforms.  As a first pass I say we merge the reference C 
implementation.  Maybe someday we could add the SIMD assembly language 
stuff--or use the one built in to OpenSSL (if they ever add BLAKE3).

> I assume we don't want to check in the .obj files?

Correct, we don't.

> - blake3module.c contains an awful lot of gotos to handle allocation failure 
> cases.

Works for me, please keep it.

--




[issue39298] add BLAKE3 to hashlib

2022-01-12 Thread Jack O'Connor


Jack O'Connor  added the comment:

I was about to say the only missing feature was docstrings, but then I realized 
I hadn't included releasing the GIL. I've added that and pushed an update just 
now. Fingers crossed there's nothing else I've missed. I think it's in 
reasonably good shape, and I'd like to propose it for inclusion in 3.11.
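
Releasing the GIL mostly matters to callers that hash on several threads at 
once. A rough sketch of such a caller follows, using sha256 as a stand-in 
because hashlib documents that it releases the GIL for larger inputs; 
`hash_buffers_in_threads` is a hypothetical helper, not part of any library.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_buffers_in_threads(buffers, workers=4):
    """Hash each buffer independently on a thread pool; digests keep input order."""
    def one(buf):
        # With the GIL released inside the hash, these calls can overlap.
        return hashlib.sha256(buf).hexdigest()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, buffers))
```

Each buffer gets its own hash object here, so no hasher is shared between 
threads and the internal mutex discussed below is never contended.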

However, I'm not very experienced with setup.py or the Python C API, so I might 
not be the best judge of shape. Here are some highlights for reviewers, where I 
think the implementation (mostly the build) could be shaky:

- Platform detection. In setup.py I assume that the target platform of the 
build is the same as the current interpreter's platform. So for example, if the 
current interpreter has sys.maxsize==2**31-1, I assume we're compiling for 
32-bit. This clearly doesn't allow for any sort of cross-compilation, and in 
general I need feedback about whether there's a more correct way to obtain the 
target platform.

- Compiling assembly files. On Unix this is easy, because we can supply *.S 
files as `extra_objects` and GCC/Clang will do the right thing. But on Windows 
this isn't easy. Currently I shell out to vswhere.exe, ask it for the path to 
the latest version of the ml64.exe assembler, and then shell out to that to 
build .obj files. Then I pass those assembled .obj files as `extra_objects`. 
This feels awfully manual, and there's a good chance I've missed some 
better-supported way to do it. I assume we don't want to check in the .obj 
files?

- Does Python support the GNU ABI on Windows? We have assembly files for this 
in vendor/, but I'm not currently building them.

- Compiling intrinsics files for 32-bit x86. In this case, I create a 
`ccompiler.new_compiler()` for each intrinsics file, so that I can set the 
appropriate flags for each. This is relatively clean, but it leads to things 
getting rebuilt every single time, rather than participating in `setup.py 
build` caching. Maybe nobody cares about this, but again it makes me think 
there might be a better-supported way to do it.

- blake3module.c contains an awful lot of gotos to handle allocation failure 
cases. Is this still considered a best practice? These are bug-prone, and I'm 
not sure how to test them.

- Existing hashlib implementations include an optimization where they avoid 
allocating an internal mutex until they see a long input and want to release 
the GIL. As a quirky side effect of this, they handle allocation failures for 
that mutex by falling back to the do-not-release-the-GIL codepath. That feels 
kind of complicated to me, and in my code I'm always allocating the mutex 
during initialization. This doesn't seem to make much of a performance 
difference when I measure it, but there might be use cases I haven't considered.
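
The platform-detection assumption from the first bullet can be sketched as 
below. `describe_host` is a hypothetical helper; it only describes the 
interpreter that is running the build, which is exactly why this approach 
cannot describe a cross-compilation target.

```python
import platform
import struct
import sys


def describe_host():
    """Report the running interpreter's pointer width and machine type."""
    is_64bit = sys.maxsize > 2**32
    return {
        "bits": 64 if is_64bit else 32,
        "pointer_bytes": struct.calcsize("P"),  # size of a C pointer
        "machine": platform.machine(),          # e.g. "x86_64" or "aarch64"
    }
```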

Here are some other API details that might be worth bikeshedding:

- The `max_threads` parameter is currently defined to take a special value, 
`blake3.AUTO`, to indicate that the implementation may use as many threads as 
it likes. (The C implementation doesn't include multithreading support, but 
it's API-compatible with the Rust implementation.) `blake3.AUTO` is currently a 
class attribute equal to -1. We may want to bikeshed this name or propose some 
other representation.

- BLAKE3 has three modes: regular hashing, keyed hashing, and key derivation. 
The keyed hashing mode takes a 32-byte key, and the key derivation mode takes a 
context string. Calling the 32-byte key `key` seems good. Calling the context 
string `context` seems less good. Larry has pointed out before that lots of 
random things are called `context`, and readers might not understand what 
they're seeing. I currently call it `derive_key_context` instead, but we could 
bikeshed this.

- I check `itemsize` on input Py_buffers and throw an exception if it's 
anything other than 1. My test suite exercises this, see 
`test_int_array_fails`. However, some (all?) standard hashes don't do this 
check. For example:

>>> from hashlib import sha256
>>> import array
>>> a = array.array("i", [255])
>>> sha256(a).hexdigest()
'81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'
>>> sha256(bytes([0xff, 0, 0, 0])).hexdigest()
'81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'

Here we can see sha256() hashing an array of int. On my machine, an int is 4 
little-endian bytes, but of course this sort of thing isn't portable. The same 
array will result in a different SHA-256 output on a big-endian machine, or on 
a machine with ints of a different size. This seems undesirable, and I'm 
surprised that hashlib allows it. However, if there's some known compatibility 
reason why we have to allow it, I could remove this check.
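
Callers who want the stricter behavior with today's hashlib can do the check 
themselves. A minimal sketch, assuming ValueError is an acceptable exception 
type (the wrapper name `sha256_bytes_only` is made up for this example):

```python
import array
import hashlib


def sha256_bytes_only(data):
    """SHA-256 of `data`, refusing buffers whose items are wider than one byte."""
    view = memoryview(data)
    if view.itemsize != 1:
        raise ValueError(
            f"expected a buffer with itemsize 1, got itemsize {view.itemsize}"
        )
    return hashlib.sha256(view).hexdigest()
```

With this wrapper, `sha256_bytes_only(array.array("i", [255]))` raises 
instead of silently hashing a platform-dependent byte layout, while bytes and 
bytearray inputs hash as usual.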

--
versions: +Python 3.11 -Python 3.10


[issue39298] add BLAKE3 to hashlib

2022-01-12 Thread Michał Górny

Michał Górny  added the comment:

I would still find it helpful to have a "proper" "blake3-c" package on normal 
pypi, for those of us who can't rely on Rust being present immediately.

--




[issue39298] add BLAKE3 to hashlib

2022-01-11 Thread Larry Hastings


Larry Hastings  added the comment:

So, can we shoot for adding this to 3.11?  Jack, do you consider the code to be 
in good shape?

I'd be up for shepherding it along in the process.  In particular, I can 
contribute the bindings so BLAKE3 is a first-class citizen of hashlib.

--




[issue39298] add BLAKE3 to hashlib

2022-01-11 Thread Jack O'Connor


Jack O'Connor  added the comment:

Ah, good idea. I've published the new C implementation as: 
https://test.pypi.org/project/blake3-experimental-c/

You can install it with: pip install -i https://test.pypi.org/simple/ 
blake3-experimental-c

Despite the package name change, the extension module is still "blake3", so we 
still "import blake3" to get at it. For example:

$ pip install -i https://test.pypi.org/simple/ blake3-experimental-c
$ python
>>> from blake3 import blake3
>>> blake3(b"foo").hexdigest()
'04e0bb39f30b1a3feb89f536c93be15055482df748674b00d26e5a7502e9'
>>> blake3(b"foo", max_threads=blake3.AUTO).hexdigest()
'04e0bb39f30b1a3feb89f536c93be15055482df748674b00d26e5a7502e9'

To run the Rust implementation's test suite against this implementation, you 
could then:

$ pip install pytest numpy
$ git clone https://github.com/oconnor663/blake3-py
$ python -m pytest blake3-py/tests/test_blake3.py
= test session starts =
platform linux -- Python 3.10.1, pytest-6.2.5, py-1.11.0, pluggy-0.13.1
rootdir: /tmp
collected 24 items

blake3-py/tests/test_blake3.py  [100%]

= 24 passed in 0.30s ==

--




[issue39298] add BLAKE3 to hashlib

2022-01-11 Thread Christian Heimes


Christian Heimes  added the comment:

You could upload the code to https://test.pypi.org/

--




[issue39298] add BLAKE3 to hashlib

2022-01-10 Thread Jack O'Connor


Jack O'Connor  added the comment:

Update: There is now a C version of the `blake3` Python module available at 
https://github.com/oconnor663/blake3-py/tree/master/c_impl. It's completely 
API-compatible with the Rust version, and it passes the same test suite. 
Multithreading (which is implemented in upstream Rust but not in upstream C) is 
exposed through a "max_threads" argument, as Larry Hastings suggested. The C 
implementation allows this argument but ignores it.

Unlike my previous attempt, this setup.py build handles the full range of 
target platforms and optimized flavors: x86-64 assembly on Windows-MSVC and 
Unix, 32-bit x86 intrinsics on Windows-MSVC and Unix, NEON intrinsics on 
AArch64, and portable C for everyone else. I'm new to distutils/setuptools and 
not particularly familiar with the MSVC toolchain either, so there's a good 
chance that others can suggest better/cleaner/more robust approaches than what 
I've got, but it's at least working on my machines and on GitHub CI. (I haven't 
tried to get any cross-compilation working though; is that a thing?)

I haven't published this module to PyPI, partly to avoid confusion with the 
Rust-based implementation, which I think most applications should prefer. But 
if it would make a big difference to anyone who wants to review this code, we 
could certainly put it up as `experimental_blake3_c` or something? Let me know 
what's best.

--




[issue39298] add BLAKE3 to hashlib

2021-09-05 Thread Jack O'Connor

Jack O'Connor  added the comment:

Hi Michał, no I haven't done any more work on this since my comments back in 
April. If you wanted to get started on a PyPI implementation, I think that 
would be fantastic. I'd be happy to collaborate over email: 
oconnor...@gmail.com. The branches I linked are still up, but I'm not sure my 
code will be very useful to someone who actually knows what they're doing :) 
Larry also had several ideas about how multithreading could fit in (which would 
be API changes in the Rust case, and forward-looking design work in the C 
case), and if I get permission from Larry I'll forward those emails.

--




[issue39298] add BLAKE3 to hashlib

2021-09-05 Thread Michał Górny

Michał Górny  added the comment:

Jack, are you still working on this?  I was considering allocating the time to 
write the bindings for the C library myself but I've stumbled upon this bug and 
I suppose there's no point in duplicating work.  I'd love to see it on pypi, so 
we could play with it a bit.

--
nosy: +mgorny




[issue39298] add BLAKE3 to hashlib

2021-04-19 Thread Jack O'Connor


Jack O'Connor  added the comment:

Hey Christian, yes these are new bindings, and also incomplete. See comments in 
https://github.com/oconnor663/cpython/commit/dc6f6163ad9754c9ad53e9e3f3613ca3891a77ba, 
but in short only x86-64 Unix is in working order. If 3.10 doesn't seem 
realistic, I'm happy to go the PyPI route. That said, this is my first time 
using the Python C API. (My code in that branch is going to make that pretty 
obvious.) Could you recommend any existing packages that I might be able to use 
as a model?

For OpenSSL, I'm very interested in the abstract but less familiar with their 
project and their schedules. Who might be a good person to get in touch with?

> I assume there's a completely generic platform-agnostic C implementation, for 
> build environments where the assembly won't work, yes?

Yes, that's the vendored file blake3_portable.c. One TODO for my branch here is 
convincing the Python build system not to try to compile the x86-64-specific 
stuff on other platforms. The vendored file blake3_dispatch.c abstracts over 
all the different implementations and takes care of #ifdef'ing 
platform-specific function calls. (It also does runtime CPU feature detection 
on x86.)

> written using the Rust implementation, which I understand is even more 
> performant

A few details here: The upstream Rust and C implementations have been matched 
in single threaded performance for a while now. They share the same assembly 
files, and the rest is a direct port. The big difference is that Rust also 
includes multithreading support, using the Rayon work-stealing runtime. The 
blake3-py module based on the Rust crate exposes this with a simple boolean 
flag, though we've been thinking about ways to give the caller more control 
over the number of threads used.

--




[issue39298] add BLAKE3 to hashlib

2021-04-18 Thread Christian Heimes


Christian Heimes  added the comment:

3.10 feature freeze is in two weeks (May 3). I don't feel comfortable adding so 
much new C code shortly before beta 1. If I understand correctly the code is 
new and hasn't been published on PyPI yet. I also don't have much time to 
properly review the code. OpenSSL 3.0.0 and PEP 644 is keeping me busy.

I would prefer to postpone the inclusion of blake3. Could you please publish 
the C version on PyPI first and let people test it?

Apropos OpenSSL, do you have plans to submit the algorithm to OpenSSL for 
inclusion in 3.1.0?

--




[issue39298] add BLAKE3 to hashlib

2021-04-18 Thread Larry Hastings


Larry Hastings  added the comment:

I note that Python already ships with some #ifdefs around SSE and the like.  
So, yes, we already do this sort of thing, although I think this usually uses 
compiler intrinsics rather than actual assembly.  A quick grep shows zero .s 
files and only one .asm file (./Modules/_decimal/libmpdec/vcdiv64.asm) in the 
Python tree.  Therefore it wouldn't be completely novel for Python but it's 
unusual.

I assume there's a completely generic platform-agnostic C implementation, for 
build environments where the assembly won't work, yes?


Disclaimer: I've been corresponding with Jack sporadically over the past year 
regarding the BLAKE3 Python API.  I also think BLAKE3 is super duper cool 
neat-o, and I have uses for it.  So I'd love to see it in Python 3.10.

One note, just to draw attention to it: the "blake3-py" module, also published 
by Jack, is written using the Rust implementation, which I understand is even 
more performant.  Obviously there's no chance Python would ship that 
implementation.  But by maintaining exact API compatibility between "blake3-py" 
and the "blake3" added to hashlib, this means code can use the fast one when 
it's available, and the built-in one when it isn't, a la CStringIO:

    try:
        from blake3 import blake3
    except ImportError:
        from hashlib import blake3
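
A slightly more defensive variant of the same idea, as a sketch: 
`load_blake3` is a hypothetical helper, and `hashlib.blake3` only exists if 
this proposal is merged, so the function returns None when neither import 
works rather than raising.

```python
def load_blake3():
    """Return a blake3 constructor, or None if no implementation is importable."""
    try:
        from blake3 import blake3  # third-party package from PyPI
        return blake3
    except ImportError:
        pass
    try:
        from hashlib import blake3  # assumed name; only present if merged
        return blake3
    except ImportError:
        return None
```

Calling code can then decide for itself what to do when the result is None, 
for example fall back to another hash or report a clear error.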

--
versions: +Python 3.10 -Python 3.9




[issue39298] add BLAKE3 to hashlib

2021-04-18 Thread Jack O'Connor


Jack O'Connor  added the comment:

An update a year later: I have a proof-of-concept branch that adds BLAKE3 
support to hashlib: https://github.com/oconnor663/cpython/tree/blake3. That 
branch is API compatible with the current master branch of 
https://github.com/oconnor663/blake3-py. Both that module and the upstream 
BLAKE3 repo are ready to be tagged 1.0, just waiting to see whether any 
integrations like this one end up requesting changes.

Would anyone be interested in moving ahead with this? One of the open questions 
would be whether CPython would vendor the BLAKE3 optimized assembly files, or 
whether we'd prefer to stick to C intrinsics.

--




[issue39298] add BLAKE3 to hashlib

2020-03-04 Thread Jack O'Connor


Jack O'Connor  added the comment:

I've just published some Python bindings for the Rust implementation on PyPI: 
https://pypi.org/project/blake3

> I'm guessing Python is gonna hold off until BLAKE3 reaches 1.0.

That's very fair. The spec and test vectors are set in stone at this point, but 
the implementations are new, and I don't see any reason to rush things out. 
(Especially since early adopters can now use the library above.) That said, 
there aren't really any expected implementation changes that would be a natural 
moment for the implementations to tag 1.0. I'll probably end up tagging 1.0 as 
soon as a caller appears who needs it to be tagged to meet their own stability 
requirements.

--




[issue39298] add BLAKE3 to hashlib

2020-02-19 Thread Maor Kleinberger


Change by Maor Kleinberger :


--
nosy: +kmaork




[issue39298] add BLAKE3 to hashlib

2020-02-12 Thread Larry Hastings


Larry Hastings  added the comment:

Personally I'm enjoying these BLAKE3 status updates, and I wouldn't mind at all 
being kept up-to-date during BLAKE3's development via messages on this issue.  
But, given the tenor of the conversation so far, I'm guessing Python is gonna 
hold off until BLAKE3 reaches 1.0.

--




[issue39298] add BLAKE3 to hashlib

2020-02-12 Thread Jack O'Connor


Jack O'Connor  added the comment:

Version 0.2.0 of the BLAKE3 repo includes optimized assembly implementations. 
These are behind the "c" Cargo feature for the `blake3` Rust crate, but 
included by default for the internal bindings crate. So the easiest way to 
rerun our favorite benchmark is:

git clone https://github.com/BLAKE3-team/BLAKE3
cd BLAKE3
git fetch
# I rebased this branch on top of version 0.2.0 today.
git checkout origin/bench_406668786
cd c/blake3_c_rust_bindings
# Nightly is currently broken for unrelated reasons, so
# we use stable with this internal bootstrapping flag.
RUSTC_BOOTSTRAP=1 cargo bench 406668786

Running the above on my machine, I get 2888 MB/s, up another 12% from the 0.1.3 
numbers. As a bonus, we don't need to worry about the difference between GCC 
and Clang.

These new assembly files are essentially drop-in replacements for the 
instruction-set-specific C files we had before, which are also still supported. 
The updated C README has more details: 
https://github.com/BLAKE3-team/BLAKE3/blob/master/c/README.md

--




[issue39298] add BLAKE3 to hashlib

2020-02-01 Thread Jakub Stasiak


Change by Jakub Stasiak :


--
nosy: +jstasiak




[issue39298] add BLAKE3 to hashlib

2020-01-27 Thread Larry Hastings


Larry Hastings  added the comment:

I just tried it with clang, and uff-da!  2,737,446,868 bytes/sec!

p.s. I compiled with -O3 for both gcc and clang

--




[issue39298] add BLAKE3 to hashlib

2020-01-27 Thread Larry Hastings


Larry Hastings  added the comment:

I gave it a go.  And yup, I see a definite improvement: it jumped from 
1,583,326,242 bytes/sec to 2,376,741,703 bytes/sec on my Intel laptop using 
AVX2.  A 50% improvement!

I also *think* I'm seeing a 10% improvement in ARM using NEON.  On my DE10-Nano 
board, BLAKE3 portable gets about 50 MB/sec, and now BLAKE3 using NEON gets 
about 55 MB/sec.  (Roughly.)  I might have goofed up on the old benchmarks 
though, or just not written down the final correct numbers.

I observed no statistically significant performance change in the no-SIMD 
builds on Intel and ARM.

p.s. in my previous comment with that table of benchmarks I said "mb/sec".  I 
meant "bytes/sec".  Oops!
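
The percentages quoted in this message follow directly from the throughput 
figures. A quick way to check them (the helper name `percent_faster` is made 
up for this example, and the NEON inputs are the rough figures from above):

```python
def percent_faster(old_rate, new_rate):
    """Relative throughput gain of new_rate over old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


avx2_gain = percent_faster(1_583_326_242, 2_376_741_703)  # Intel laptop, AVX2
neon_gain = percent_faster(50, 55)                        # DE10-Nano, rough figures

print(f"AVX2: {avx2_gain:.1f}% faster, NEON: {neon_gain:.1f}% faster")
```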

--




[issue39298] add BLAKE3 to hashlib

2020-01-22 Thread Jack O'Connor


Jack O'Connor  added the comment:

Version 0.1.3 of the official BLAKE3 repo includes some significant performance 
improvements:

- The x86 implementations include explicit prefetch instructions, which helps 
with long inputs. (commit b8c33e1)
- The C implementation now uses the same parallel parent hashing strategy that 
the Rust implementation uses. (commit 163f522)

When I repeat the benchmarks above with TurboBoost on, here's what I see now:

BLAKE3 Rust            2578 MB/s
BLAKE3 C (clang -O3)   2502 MB/s
BLAKE3 C (gcc -O2)     2223 MB/s
K12 C (gcc -O2)        2175 MB/s

Larry, if you have time to repeat your benchmarks with the latest C code, I'd 
be curious to see if you get similar results.

--




[issue39298] add BLAKE3 to hashlib

2020-01-17 Thread Jack O'Connor


Jack O'Connor  added the comment:

I plan to bring the C code up to speed with the Rust code this week. As part of 
that, I'll probably remove comments like the one above :) Otherwise, is there 
anything else we can do on our end to help with this?

--




[issue39298] add BLAKE3 to hashlib

2020-01-16 Thread Jack O'Connor


Jack O'Connor  added the comment:

Ok, I've added Rust bindings to the BLAKE3 C implementation, so that I can 
benchmark it in a vaguely consistent way. My laptop is an i5-8250U, which 
should be very similar to yours. (Both are "Kaby Lake Refresh".) My end results 
do look similar to yours with TurboBoost on, but pretty different with 
TurboBoost off:

with TurboBoost on
---------------------------
K12 GCC        | 2159 MB/s
BLAKE3 Rust    | 1787 MB/s
BLAKE3 C Clang | 1588 MB/s
BLAKE3 C GCC   | 1453 MB/s

with TurboBoost off
---------------------------
BLAKE3 Rust    | 1288 MB/s
K12 GCC        | 1060 MB/s
BLAKE3 C Clang | 1094 MB/s
BLAKE3 C GCC   |  943 MB/s

The difference seems to be that with TurboBoost on, the BLAKE3 benchmarks have 
my CPU sitting around 2.4 GHz, while for the K12 benchmarks it's more like 2.9 
GHz. With TurboBoost off, both benchmarks run at 1.6 GHz, and BLAKE3 does 
better. I'm not sure what causes that frequency difference. Perhaps some 
high-power instruction that the BLAKE3 implementation is emitting?
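
One way to look at the frequency effect described above is to normalize throughput by clock speed. The following is only a rough sketch: it assumes 1 MB = 10**6 bytes, it treats the clock speeds as the approximate values mentioned in this message (2.4, 2.9, and 1.6 GHz), and the helper name `cycles_per_byte` is made up for illustration.

```python
def cycles_per_byte(clock_ghz, throughput_mb_per_s):
    """Convert a clock speed and a throughput into cycles spent per byte."""
    cycles_per_second = clock_ghz * 1e9
    bytes_per_second = throughput_mb_per_s * 1e6  # assumes 1 MB = 10**6 bytes
    return cycles_per_second / bytes_per_second


# (label, approximate clock in GHz, measured MB/s), all taken from this message.
measurements = [
    ("BLAKE3 Rust, TurboBoost on", 2.4, 1787),
    ("K12 GCC, TurboBoost on", 2.9, 2159),
    ("BLAKE3 Rust, TurboBoost off", 1.6, 1288),
    ("K12 GCC, TurboBoost off", 1.6, 1060),
]

for label, ghz, mb_per_s in measurements:
    print(f"{label:28s} {cycles_per_byte(ghz, mb_per_s):.2f} cycles/byte")
```

Under those assumptions the two TurboBoost-on rows come out at roughly the same cycles per byte, which would be consistent with the clock speed accounting for most of the gap. The clock figures are eyeballed, so treat this as a plausibility check rather than a measurement.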

To reproduce these numbers you can clone these two repos (the latter is where I 
happen to have a K12 benchmark):

https://github.com/BLAKE3-team/BLAKE3
https://github.com/oconnor663/blake2_simd

Then in both cases check out the "bench_406668786" branch, where I've put some 
benchmarks with the same input length you used.

For Rust BLAKE3, at the root of the BLAKE3 repo, run:

    cargo +nightly bench 406668786

For C BLAKE3, the command is the same, but run it in the 
"./c/blake3_c_rust_bindings" directory. The build defaults to GCC, and you can 
"export CC=clang" to switch it.

For my K12 benchmark, at the root of the blake2_simd repo, run:

    cargo +nightly bench --features=kangarootwelve 406668786

--




[issue39298] add BLAKE3 to hashlib

2020-01-13 Thread Larry Hastings


Larry Hastings  added the comment:

According to my order details it is a "8th Generation Intel Core i7-8650U".

--




[issue39298] add BLAKE3 to hashlib

2020-01-13 Thread Jack O'Connor


Jack O'Connor  added the comment:

I'm in the middle of adding some Rust bindings to the C implementation in 
github.com/BLAKE3-team/BLAKE3, so that `cargo test` and `cargo bench` can cover 
both. Once that's done, I'll follow up with benchmark numbers from my laptop 
(Kaby Lake i5-8250U, also AVX2 with no AVX-512). For benchmark numbers with 
AVX-512 support, see the Performance section of the BLAKE3 paper 
(https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf). Larry, 
what processor did you run your benchmarks on?

Also, is there anything currently in CPython that does dispatch based on 
runtime CPU feature detection? Is this something that BLAKE3 should do for 
itself, or is there existing machinery that we'd want to integrate with?
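
To make the question concrete, here is a small Python sketch of what "dispatch based on runtime CPU feature detection" can look like. It is not how CPython or the BLAKE3 C code does it. The backend names, the preference order, and both function names are assumptions for illustration, the flag lookup only works on Linux (it reads /proc/cpuinfo), and a real AVX-512 check would need more than the single `avx512f` flag.

```python
import platform

# Hypothetical preference order for x86, fastest first.
X86_PREFERENCE = ("avx512f", "avx2", "sse4_1")


def read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags listed in /proc/cpuinfo, or [] if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                key, _, value = line.partition(":")
                # x86 kernels label the line "flags", ARM kernels "Features".
                if key.strip().lower() in ("flags", "features"):
                    return value.split()
    except OSError:
        pass
    return []


def select_backend(cpu_flags, machine):
    """Pick the best available backend name, falling back to portable code."""
    flags = set(cpu_flags)
    if machine in ("x86_64", "AMD64", "i686"):
        for flag in X86_PREFERENCE:
            if flag in flags:
                return flag
    elif machine.startswith(("arm", "aarch64")):
        if "neon" in flags or "asimd" in flags:
            return "neon"
    return "portable"


if __name__ == "__main__":
    print(select_backend(read_linux_cpu_flags(), platform.machine()))
```

The point of the design is that the choice is made once at startup from what the running CPU reports, so a single binary can carry several SIMD code paths and still run on machines that lack them.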

--
nosy: +oconnor663




[issue39298] add BLAKE3 to hashlib

2020-01-11 Thread Larry Hastings


Larry Hastings  added the comment:

For what it's worth, I spent some time producing clean benchmarks.  All these 
were run on the same laptop, and all pre-load the same file (406668786 bytes) 
and run one update() on the whole thing to minimize overhead.  K12 and BLAKE3 
are using a hand-written C driver, and compiled with both gcc and clang; all 
the rest of the algorithms are from hashlib.new, python3 configured with 
--enable-optimizations and compiled with gcc.  K12 and BLAKE3 support several 
SIMD extensions; this laptop only has AVX2 (no AVX512).  All these numbers are 
the best of 3.  All tests were run in a single thread.

-----------------+----------+----------+----+-----------------------
   hash algorithm|elapsed s |mb/sec    |size|hash
-----------------+----------+----------+----+-----------------------
      K12-Haswell 0.176949   2298224495  64  24693954fa0dfb059f99...
K12-Haswell-clang 0.181968   2234841926  64  24693954fa0dfb059f99...
BLAKE3-AVX2-clang 0.250482   1623547723  64  30149a073eab69f76583...
      BLAKE3-AVX2 0.256845   1583326242  64  30149a073eab69f76583...
              md4 0.37684668 1079135924  32  d8a66422a4f0ae430317...
             sha1 0.46739069  870083193  40  a7488d7045591450ded9...
        K12-clang 0.498058    816509323  64  24693954fa0dfb059f99...
           BLAKE3 0.561470    724292378  64  30149a073eab69f76583...
              K12 0.569490    714093306  64  24693954fa0dfb059f99...
     BLAKE3-clang 0.57374370881          64  30149a073eab69f76583...
          blake2b 0.58276098  697831191 128  809ca44337af39792f8f...
              md5 0.59936016  678504863  32  306d7de4d1622384b976...
           sha384 0.64208886  633352818  96  b107ce5d086e9757efa7...
       sha512_224 0.66094102  615287556  56  90931762b9e553bd07f3...
       sha512_256 0.66465768  611846969  64  27b03aacdfbde1c2628e...
           sha512 0.6776549   600111921 128  f0af29e2019a6094365b...
          blake2s 0.86828375  468359318  64  02bee0661cd88aa2be15...
           sha256 0.97720436  416155312  64  48b5243cfcd90d84cd3f...
           sha224 1.0255457   396538907  56  10fb56b87724d59761c6...
        shake_128 1.0895037   373260576  32  2ec12727ac9d59c2e842...
         md5-sha1 1.1171806   364013470  72  306d7de4d1622384b976...
         sha3_224 1.2059123   337229156  56  93eaf083ca3a9b348e14...
        shake_256 1.3039152   311882857  64  b92538fd701791db8c1b...
         sha3_256 1.3417314   303092540  64  69354bf585f21c567f1e...
        ripemd160 1.4846368   273918025  40  30f2fe48fec404990264...
         sha3_384 1.7710776   229616579  96  61af0469534633003d3b...
              sm3 1.8384831   221198006  64  1075d29c75b06cb0af3e...
         sha3_512 2.4839673   163717444 128  c7c250e79844d8dc856e...

If I can't have BLAKE3, I'm definitely switching to BLAKE2 ;-)
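
For anyone who wants to try something similar, here is a minimal Python sketch of the method described above: pre-load the data, time a single update() over all of it, and keep the best of 3 runs. It is not the harness used for the table (K12 and BLAKE3 were driven from hand-written C), the function name `best_throughput` is made up, and it uses a small zero-filled buffer rather than the 406668786-byte file.

```python
import hashlib
import time


def best_throughput(name, data, repeats=3):
    """Return the best observed bytes/sec for one update() call over `data`."""
    best = float("inf")
    for _ in range(repeats):
        h = hashlib.new(name)
        start = time.perf_counter()
        h.update(data)  # a single update() over the whole pre-loaded buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading on very coarse timers.
    return len(data) / max(best, 1e-12)


if __name__ == "__main__":
    data = bytes(8 * 1024 * 1024)  # 8 MiB of zeros, already in memory
    for name in ("sha256", "sha512", "blake2b", "blake2s"):
        print(f"{name:>10} {best_throughput(name, data):>14,.0f} bytes/sec")
```

Taking the best of several runs, rather than the mean, is a common way to reduce noise from other activity on the machine.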

--




[issue39298] add BLAKE3 to hashlib

2020-01-11 Thread Christian Heimes


Christian Heimes  added the comment:

I've been playing with the new algorithm, too. Pretty impressive!

Let's give the reference implementation a while to stabilize. The code has 
comments like: "This is only for benchmarking. The guy who wrote this file 
hasn't touched C since college. Please don't use this code in production."

--
assignee:  -> christian.heimes




[issue39298] add BLAKE3 to hashlib

2020-01-10 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +xtreak




[issue39298] add BLAKE3 to hashlib

2020-01-10 Thread Dong-hee Na


Change by Dong-hee Na :


--
nosy: +corona10




[issue39298] add BLAKE3 to hashlib

2020-01-10 Thread Larry Hastings


New submission from Larry Hastings :

From 3/4 of the team that brought you BLAKE2, now comes... BLAKE3!

https://github.com/BLAKE3-team/BLAKE3

BLAKE3 is a brand new hashing function.  It's fast, it's parallelizable, and 
unlike BLAKE2 there's only one variant.

I've experimented with it a little.  On my laptop (2018 Intel i7 64-bit), the 
portable implementation is kind of middle-of-the-pack, but with AVX2 enabled 
it's second only to the "Haswell" build of KangarooTwelve.  On a 32-bit ARMv7 
machine the results are more impressive--the portable implementation is 
neck-and-neck with MD4, and with NEON enabled it's definitely the fastest hash 
function I tested.  These tests are all single-threaded and eliminate I/O 
overhead.

The above GitHub repo has a reference implementation in C which includes Intel 
and ARM SIMD drivers.  Unsurprisingly, the interface looks roughly the same as 
the BLAKE2 interface(s), so if you took the existing BLAKE2 module and 
s/blake2b/blake3/ you'd be nearly done.  Not quite as close as blake2b and 
blake2s though ;-)
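
For reference, this is the shape of the existing BLAKE2 interface in hashlib that the paragraph above compares against. It is only a sketch of the current blake2b API, with a made-up helper name `hash_in_chunks`; it does not assume anything about what a future blake3 module would look like.

```python
import hashlib


def hash_in_chunks(data, chunk_size=65536):
    """Hash `data` incrementally with blake2b and return the hex digest."""
    h = hashlib.blake2b()
    for offset in range(0, len(data), chunk_size):
        h.update(data[offset:offset + chunk_size])
    return h.hexdigest()


if __name__ == "__main__":
    payload = b"abc" * 100_000
    # Incremental hashing matches hashing the whole buffer in one call.
    print(hash_in_chunks(payload) == hashlib.blake2b(payload).hexdigest())
```

The same create, update(), hexdigest() pattern is what every hashlib algorithm exposes, which is why a new algorithm with a similar C interface would be expected to slot in without much new surface area.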

--
components: Library (Lib)
keywords: patch
messages: 359777
nosy: Zooko.Wilcox-O'Hearn, christian.heimes, larry
priority: normal
severity: normal
stage: needs patch
status: open
title: add BLAKE3 to hashlib
type: enhancement
versions: Python 3.9
