Jack O'Connor <[email protected]> added the comment:
I was about to say the only missing feature was docstrings, but then I realized
I hadn't included releasing the GIL. I've added that and pushed an update just
now. Fingers crossed there's nothing else I've missed. I think it's in
reasonably good shape, and I'd like to propose it for inclusion in 3.11.
However, I'm not very experienced with setup.py or the Python C API, so I might
not be the best judge of its shape. Here are some highlights for reviewers, the
places where I think the implementation (mostly the build) could be shaky:
- Platform detection. In setup.py I assume that the target platform of the
build is the same as the current interpreter's platform. So for example, if the
current interpreter has sys.maxsize == 2**31 - 1, I assume we're compiling for
32-bit (see the first sketch after this list). This clearly doesn't allow for
any sort of cross-compilation, and in general I'd like feedback on whether
there's a more correct way to obtain the target platform.
- Compiling assembly files. On Unix this is easy, because we can supply *.S
files as `extra_objects` and GCC/Clang will do the right thing. But on Windows
this isn't easy. Currently I shell out to vswhere.exe, ask it for the path to
the latest version of the ml64.exe assembler, and then shell out to that to
build .obj files. Then I pass those assembled .obj files as `extra_objects`
(roughly the second sketch after this list). This feels awfully manual, and
there's a good chance I've missed some better-supported way to do it. I assume
we don't want to check in the .obj files?
- Does Python support the GNU ABI on Windows? We have assembly files for this
in vendor/, but I'm not currently building them.
- Compiling intrinsics files for 32-bit x86. In this case, I create a separate
`ccompiler.new_compiler()` for each intrinsics file, so that I can set the
appropriate flags for each one (the third sketch after this list). This is
relatively clean, but it means those files get rebuilt every single time,
rather than participating in `setup.py build` caching. Maybe nobody cares about
this, but again it makes me think there might be a better-supported way to do it.
- blake3module.c contains an awful lot of gotos to handle allocation failure
cases. Is this still considered a best practice? These are bug-prone, and I'm
not sure how to test them.
- Existing hashlib implementations include an optimization where they avoid
allocating an internal mutex until they see a long input and want to release
the GIL. As a quirky side effect of this, they handle allocation failures for
that mutex by falling back to the do-not-release-the-GIL codepath. That feels
kind of complicated to me, and in my code I'm always allocating the mutex
during initialization. This doesn't seem to make much of a performance
difference when I measure it, but there might be use cases I haven't considered.
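To make the platform-detection item concrete, here is roughly the shape of the
check I mean (the helper name and the vendor filename are illustrative, not
copied from setup.py):

    import sys

    def compiling_for_32_bit():
        # sys.maxsize is 2**31 - 1 on a 32-bit interpreter and 2**63 - 1 on a
        # 64-bit one. This describes the interpreter running setup.py, not
        # necessarily the build target, which is exactly the weakness above.
        return sys.maxsize == 2**31 - 1

    # Hypothetical use: skip the x86-64 assembly on a 32-bit target and rely
    # on the portable C code instead.
    extra_objects = [] if compiling_for_32_bit() else ["vendor/blake3_avx2_x86-64_unix.S"]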
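The vswhere.exe/ml64.exe dance looks roughly like this. I'm reproducing it from
memory here, so treat the flags, the glob pattern, and the helper names as
approximate rather than as a copy of setup.py:

    import os
    import subprocess

    # vswhere.exe lives at a fixed path under the Visual Studio Installer.
    VSWHERE = os.path.expandvars(
        r"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe")

    def find_ml64():
        # Ask vswhere for the newest installation that has the MSVC toolchain,
        # and have it search that installation for ml64.exe.
        out = subprocess.run(
            [VSWHERE, "-latest", "-products", "*",
             "-requires", "Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
             "-find", r"VC\Tools\MSVC\**\Hostx64\x64\ml64.exe"],
            check=True, capture_output=True, text=True).stdout
        return out.splitlines()[0]

    def assemble(asm_path, obj_path):
        # /c assembles without linking; /Fo names the output .obj, which then
        # gets passed to the Extension as one of the extra_objects.
        subprocess.run([find_ml64(), "/c", "/Fo", obj_path, asm_path], check=True)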
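And the per-file compiler trick for the intrinsics files looks roughly like
this (filenames and flags are illustrative, and these are the GCC/Clang flag
spellings; MSVC would want /arch:... instead):

    from distutils import ccompiler

    def compile_intrinsics(source, flags):
        # A fresh compiler object per source file lets each file get its own
        # flags, but distutils doesn't cache these objects across runs, so
        # they rebuild every time.
        compiler = ccompiler.new_compiler()
        return compiler.compile([source], extra_preargs=flags)

    objects = []
    objects += compile_intrinsics("vendor/blake3_sse41.c", ["-msse4.1"])
    objects += compile_intrinsics("vendor/blake3_avx2.c", ["-mavx2"])
    # The resulting object files also go into extra_objects.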
Here are some other API details that might be worth bikeshedding:
- The `max_threads` parameter is currently defined to take a special value,
`blake3.AUTO`, to indicate that the implementation may use as many threads as
it likes. (The C implementation doesn't include multithreading support, but
it's API-compatible with the Rust implementation.) `blake3.AUTO` is currently a
class attribute equal to -1 (usage sketched after this list). We may want to
bikeshed this name or propose some other representation.
- BLAKE3 has three modes: regular hashing, keyed hashing, and key derivation.
The keyed hashing mode takes a 32-byte key, and the key derivation mode takes a
context string. Calling the 32-byte key `key` seems good. Calling the context
string `context` seems less good: Larry has pointed out before that lots of
random things are called `context`, and readers might not understand what
they're seeing. I currently call it `derive_key_context` instead (see the
second sketch after this list), but we could bikeshed this.
- I check `itemsize` on input Py_buffers and raise an exception if it's
anything other than 1. My test suite exercises this; see
`test_int_array_fails`. However, some (all?) standard hashes don't do this
check. For example:
>>> from hashlib import sha256
>>> import array
>>> a = array.array("i", [255])
>>> sha256(a).hexdigest()
'81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'
>>> sha256(bytes([0xff, 0, 0, 0])).hexdigest()
'81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'
Here we can see sha256() hashing an array of int. On my machine, an int is 4
little-endian bytes, but of course this sort of thing isn't portable. The same
array will result in a different SHA-256 output on a big-endian machine, or on
a machine with ints of a different size. This seems undesirable, and I'm
surprised that hashlib allows it. However, if there's some known compatibility
reason why we have to allow it, I could remove this check.
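For reference, here's how I'd expect `max_threads` to look in use, assuming the
constructor otherwise mirrors the existing hashlib constructors:

    import blake3

    # AUTO means "use as many threads as you like"; the C implementation
    # accepts it for compatibility but always hashes on one thread.
    h = blake3.blake3(max_threads=blake3.AUTO)
    h.update(b"lots of input data")
    digest = h.hexdigest()

    # An explicit thread count is the other expected spelling:
    blake3.blake3(b"lots of input data", max_threads=4).hexdigest()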
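And the three modes with the parameter names as they currently stand (the
context string below is made up; the call shapes are just how I'd expect them
to read):

    import blake3

    blake3.blake3(b"message").hexdigest()                    # regular hashing
    blake3.blake3(b"message", key=b"\x00" * 32).hexdigest()  # keyed hashing, 32-byte key
    blake3.blake3(b"input key material",
                  derive_key_context="example.com 2021 session keys").hexdigest()  # key derivation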
----------
versions: +Python 3.11 -Python 3.10
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue39298>
_______________________________________