[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

I did reply to that point above with some baseless speculation, but now I can back up my baseless speculation with unscientific data :)

https://gist.github.com/oconnor663/aed7016c9dbe5507510fc50faceaaa07

According to whatever `powerstat -R` measures on my laptop, running hardware-accelerated SHA-256 in a loop for a minute or so draws 26.86 watts on average. Doing the same with AVX-512 BLAKE3 draws 29.53 watts, 10% more. Factoring in the 4.69x difference in throughput reported by those loops, the overall energy/byte for BLAKE3 is 4.27x lower than SHA-256.

This is my first time running a power benchmark, so if this sounds implausible, hopefully someone can catch my mistakes.
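For reference, here's the arithmetic behind that 4.27x figure, using only the numbers quoted above (a sanity check, not part of the original benchmark script):

```python
sha256_watts = 26.86      # average power during the SHA-256 loop
blake3_watts = 29.53      # average power during the BLAKE3 loop
throughput_ratio = 4.69   # BLAKE3 throughput / SHA-256 throughput

power_ratio = blake3_watts / sha256_watts       # ~1.10x more power
energy_ratio = throughput_ratio / power_ratio   # energy/byte advantage
print(f"BLAKE3 uses {energy_ratio:.2f}x less energy per byte")  # ~4.27x
```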
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

> Truncated sha512 (sha512-256) typically performs 40% faster than sha256 on X86_64.

Without hardware acceleration, yes. But because SHA-NI includes only SHA-1 and SHA-256, and not SHA-512, it's no longer a level playing field. OpenSSL's SHA-512 and SHA-512/256 both get about 797 MB/s on my machine.
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

> Hardware accelerated SHAs are likely faster than blake3 single core.

Surprisingly, they're not. Here's a quick measurement on my recent ThinkPad laptop (64 KiB of input, single-threaded, TurboBoost left on), which supports both AVX-512 and the SHA extensions:

    OpenSSL SHA-256: 1816 MB/s
    OpenSSL SHA-1:   2103 MB/s
    BLAKE3 SSE2:     2109 MB/s
    BLAKE3 SSE4.1:   2474 MB/s
    BLAKE3 AVX2:     4898 MB/s
    BLAKE3 AVX-512:  8754 MB/s

The main reason SHA-1 and SHA-256 don't do better is that they're fundamentally serial algorithms. Hardware acceleration can speed up a single instance of their compression functions, but there's just no way for it to run more than one instance per message at a time. In contrast, AES-CTR can easily parallelize its blocks, and hardware-accelerated AES does beat BLAKE3.

> And certainly more efficient in terms of watt-secs/byte.

I don't have any experience measuring power myself, so take this with a grain of salt: I think the difference in throughput shown above is large enough that, even accounting for the famously high power draw of AVX-512, BLAKE3 comes out ahead in terms of energy/byte. Probably not on ARM though.
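If anyone wants to reproduce rough single-threaded numbers like these from Python, here's a minimal sketch using hashlib and the `blake3` module from PyPI. The loop structure and buffer size are my own choices here, not the exact benchmark harness behind the table above:

```python
import time
import hashlib
from blake3 import blake3  # pip install blake3

def throughput_mb_per_s(make_hasher, data, seconds=1.0):
    # Hash the same 64 KiB buffer repeatedly and report MB/s.
    hashed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < seconds:
        make_hasher(data).digest()
        hashed += len(data)
    return hashed / (time.perf_counter() - start) / 1e6

buf = bytes(64 * 1024)
print("sha256:", throughput_mb_per_s(hashlib.sha256, buf), "MB/s")
print("blake3:", throughput_mb_per_s(blake3, buf), "MB/s")
```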
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

> maintaining a complicated build process in-tree

For what it's worth, if you have any sort of "in a perfect world" vision for what the upstream BLAKE3 project could do to make it trivially easy for you to integrate, I'd be very interested in getting that done. Making integration easy would benefit all callers. We have some issues on the backburner about committing CMake build files, but I assume those would be useless for us here. Is there anything that would be more useful? If we provided autotools build files, could you call into them?

Fundamentally, BLAKE3 wants to build some code on x86 and some other code on ARM, and also some code on Unix and some other code on Windows. Currently we just ask the caller to do that for us, for lack of a common standard. (And if we're building intrinsics rather than assembly, we also need the compiler flags that enable our intrinsics.) But maybe we could handle more of that upstream, using the preprocessor? If the build instructions said "compile this one giant file on all platforms and don't worry about what it does", would that be better? Or would that be gross? Is a header-only library the gold standard? Or too C++-ish? Has anyone ever done a really good job of this?
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Thanks Larry! Was any of the experimental C extension code under https://github.com/oconnor663/blake3-py/tree/master/c_impl useful to you? I was wondering whether it might be possible to copy blake3module.c from there verbatim. The setup.py build there also has working Windows and NEON support.

I've patched the blake3-py test suite (which both the production Rust-based extension and that experimental C-based extension currently pass) to invoke the new hashlib implementation from your branch. You can find the full test output, and the procedure I used to run the tests, in this Gist: https://gist.github.com/oconnor663/533048580b1c0f4a01d1d55f57f92792. Here are some differences:

- My derive_key_context parameter requires a string and refuses to accept bytes. This is consistent with our Rust and C APIs (though the C API does include a _raw version specifically for bindings, which we're using here). For a long discussion of why we prefer to do things this way, see https://github.com/BLAKE3-team/BLAKE3/issues/13. The short version is that any use case that requires arbitrary bytes for the context string is almost certainly violating the documented security requirement that the context string must be hardcoded.

- I've included an `AUTO` constant that provides a special value (-1) for the `max_threads` argument, and I explicitly don't support `max_threads=0`.

- I enforce that the `data` arguments are positional-only and that the other keyword arguments are keyword-only. I think this is consistent with the rest of hashlib.

- I include a `.reset()` method. This isn't particularly useful in the default case, where you might as well create a new hasher. But when `max_threads` is greater than 1 in the Rust implementation, the hasher owns a thread pool, and `.reset()` is currently the only way to reuse that pool. (A BLAKE3 hasher is also ~2 KB, somewhat larger than other hashers, so callers who are pinching pennies with their allocator traffic might prefer to reuse the object.)

- Unrelated to tests: I haven't made any attempt to zero memory in my `dealloc` function. But if that's what other hashlib functions do, then I'm certainly in favor of doing it here too.
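To make those differences concrete, here's roughly what the bindings look like in use. Method and parameter names are as described above; treat this as a sketch of the blake3-py API, not of any final hashlib surface:

```python
from blake3 import blake3

# data is positional-only; everything else is keyword-only.
h = blake3(b"some input", max_threads=blake3.AUTO)
h.update(b"more input")
print(h.hexdigest())

# Keyed hashing takes exactly 32 key bytes.
keyed = blake3(b"message", key=b"\x00" * 32)

# Key derivation takes a hardcoded context *string*, never bytes.
kdf = blake3(b"key material", derive_key_context="example.com 2021 session keys")

# reset() clears the state and (in the Rust implementation) keeps the
# hasher's thread pool alive for reuse.
h.reset()
```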
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Yes, everything in https://github.com/BLAKE3-team/BLAKE3 and https://github.com/oconnor663/blake3-py is public domain via CC0, and dual-licensed under Apache for good measure. Hopefully that makes it easy to use it anywhere.
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

What's the best way for me to help with the next steps of this?
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Yeah, by intrinsics I mean stuff like _mm256_add_epi32(). All of that stuff is in these vendored files:

    blake3_avx2.c
    blake3_avx512.c
    blake3_neon.c
    blake3_sse2.c
    blake3_sse41.c

Also, to Michał's question above: I'm not necessarily opposed to publishing something like "blake3-c" on PyPI once things stabilize. Even if we get BLAKE3 into hashlib in 3.11, PyPI modules will be useful to folks running older versions, and not everyone wants to install the Rust toolchain.
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

> As a first pass I say we merge the reference C implementation.

Do you mean portable-only C code, or portable + intrinsics? If the assembly files are out, I'd advocate for the latter. The intrinsics implementations are nearly as fast as the assembly code, and both of those are several times faster than the portable code. You can test this configuration with my current setup.py by setting the env var FORCE_INTRINSICS=1.
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

I was about to say the only missing feature was docstrings, but then I realized I hadn't included releasing the GIL. I've added that and pushed an update just now. Fingers crossed there's nothing else I've missed. I think it's in reasonably good shape, and I'd like to propose it for inclusion in 3.11. However, I'm not very experienced with setup.py or the Python C API, so I might not be the best judge of its shape.

Here are some highlights for reviewers, where I think the implementation (mostly the build) could be shaky:

- Platform detection. In setup.py I assume that the target platform of the build is the same as the current interpreter's platform. So for example, if the current interpreter has sys.maxsize==2**31-1, I assume we're compiling for 32-bit. This clearly doesn't allow for any sort of cross-compilation, and in general I need feedback about whether there's a more correct way to obtain the target platform.

- Compiling assembly files. On Unix this is easy, because we can supply *.S files as `extra_objects` and GCC/Clang will do the right thing. But on Windows this isn't easy. Currently I shell out to vswhere.exe, ask it for the path to the latest version of the ml64.exe assembler, and then shell out to that to build .obj files. Then I pass those assembled .obj files as `extra_objects`. This feels awfully manual, and there's a good chance I've missed some better-supported way to do it. I assume we don't want to check in the .obj files?

- Does Python support the GNU ABI on Windows? We have assembly files for this in vendor/, but I'm not currently building them.

- Compiling intrinsics files for 32-bit x86. In this case, I create a `ccompiler.new_compiler()` for each intrinsics file, so that I can set the appropriate flags for each. This is relatively clean, but it leads to things getting rebuilt every single time, rather than participating in `setup.py build` caching. Maybe nobody cares about this, but again it makes me think there might be a better-supported way to do it.

- blake3module.c contains an awful lot of gotos to handle allocation failure cases. Is this still considered a best practice? These are bug-prone, and I'm not sure how to test them.

- Existing hashlib implementations include an optimization where they avoid allocating an internal mutex until they see a long input and want to release the GIL. As a quirky side effect of this, they handle allocation failures for that mutex by falling back to the do-not-release-the-GIL codepath. That feels kind of complicated to me, and in my code I'm always allocating the mutex during initialization. This doesn't seem to make much of a performance difference when I measure it, but there might be use cases I haven't considered.

Here are some other API details that might be worth bikeshedding:

- The `max_threads` parameter is currently defined to take a special value, `blake3.AUTO`, to indicate that the implementation may use as many threads as it likes. (The C implementation doesn't include multithreading support, but it's API-compatible with the Rust implementation.) `blake3.AUTO` is currently a class attribute equal to -1. We may want to bikeshed this name or propose some other representation.

- BLAKE3 has three modes: regular hashing, keyed hashing, and key derivation. The keyed hashing mode takes a 32-byte key, and the key derivation mode takes a context string. Calling the 32-byte key `key` seems good. Calling the context string `context` seems less good. Larry has pointed out before that lots of random things are called `context`, and readers might not understand what they're seeing. I currently call it `derive_key_context` instead, but we could bikeshed this.

- I check `itemsize` on input Py_buffers and throw an exception if it's anything other than 1. My test suite exercises this; see `test_int_array_fails`. However, some (all?) standard hashes don't do this check. For example:

      >>> from hashlib import sha256
      >>> import array
      >>> a = array.array("i", [255])
      >>> sha256(a).hexdigest()
      '81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'
      >>> sha256(bytes([0xff, 0, 0, 0])).hexdigest()
      '81ff65efc4487853bdb4625559e69ab44f19e0f5efbd6d5b2af5e3ab267c8e06'

  Here we can see sha256() hashing an array of int. On my machine, an int is 4 little-endian bytes, but of course this sort of thing isn't portable. The same array will produce a different SHA-256 output on a big-endian machine, or on a machine with ints of a different size. This seems undesirable, and I'm surprised that hashlib allows it. However, if there's some known compatibility reason why we have to allow it, I could remove this check.
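Here's what that itemsize check looks like from the caller's side. This is a sketch based on the behavior described above; the exact exception type and message in the extension may differ:

```python
import array
from blake3 import blake3

a = array.array("i", [255])  # itemsize == 4, not 1
try:
    blake3(a)
except (TypeError, ValueError, BufferError) as e:
    # The proposed binding rejects non-byte buffers instead of silently
    # hashing their native, platform-dependent representation.
    print("rejected non-byte buffer:", e)
```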
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Ah, good idea. I've published the new C implementation as: https://test.pypi.org/project/blake3-experimental-c/

You can install it with:

    pip install -i https://test.pypi.org/simple/ blake3-experimental-c

Despite the package name change, the extension module is still "blake3", so we still "import blake3" to get at it. For example:

    $ pip install -i https://test.pypi.org/simple/ blake3-experimental-c
    $ python
    >>> from blake3 import blake3
    >>> blake3(b"foo").hexdigest()
    '04e0bb39f30b1a3feb89f536c93be15055482df748674b00d26e5a7502e9'
    >>> blake3(b"foo", max_threads=blake3.AUTO).hexdigest()
    '04e0bb39f30b1a3feb89f536c93be15055482df748674b00d26e5a7502e9'

To run the Rust implementation's test suite against this implementation, you could then:

    $ pip install pytest numpy
    $ git clone https://github.com/oconnor663/blake3-py
    $ python -m pytest blake3-py/tests/test_blake3.py
    === test session starts ===
    platform linux -- Python 3.10.1, pytest-6.2.5, py-1.11.0, pluggy-0.13.1
    rootdir: /tmp
    collected 24 items
    blake3-py/tests/test_blake3.py [100%]
    === 24 passed in 0.30s ===
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Update: There is now a C version of the `blake3` Python module available at https://github.com/oconnor663/blake3-py/tree/master/c_impl. It's completely API-compatible with the Rust version, and it passes the same test suite. Multithreading (which is implemented in upstream Rust but not in upstream C) is exposed through a "max_threads" argument, as Larry Hastings suggested. The C implementation allows this argument but ignores it.

Unlike my previous attempt, this setup.py build handles the full range of target platforms and optimized flavors: x86-64 assembly on Windows-MSVC and Unix, 32-bit x86 intrinsics on Windows-MSVC and Unix, NEON intrinsics on AArch64, and portable C for everyone else. I'm new to distutils/setuptools and not particularly familiar with the MSVC toolchain either, so there's a good chance that others can suggest better/cleaner/more robust approaches than what I've got, but it's at least working on my machines and on GitHub CI. (I haven't tried to get any cross-compilation working though; is that a thing?)

I haven't published this module to PyPI, partly to avoid confusion with the Rust-based implementation, which I think most applications should prefer. But if it would make a big difference to anyone who wants to review this code, we could certainly put it up as `experimental_blake3_c` or something? Let me know what's best.
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Hi Michał, no I haven't done any more work on this since my comments back in April. If you wanted to get started on a PyPI implementation, I think that would be fantastic. I'd be happy to collaborate over email: oconnor...@gmail.com. The branches I linked are still up, but I'm not sure my code will be very useful to someone who actually knows what they're doing :) Larry also had several ideas about how multithreading could fit in (which would be API changes in the Rust case, and forward-looking design work in the C case), and if I get permission from Larry I'll forward those emails.
[issue25782] CPython hangs on error __context__ set to the error itself
Change by Jack O'Connor: nosy: -oconnor663
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Hey Christian, yes these are new bindings, and also incomplete. See comments in https://github.com/oconnor663/cpython/commit/dc6f6163ad9754c9ad53e9e3f3613ca3891a77ba, but in short only x86-64 Unix is in working order. If 3.10 doesn't seem realistic, I'm happy to go the PyPI route.

That said, this is my first time using the Python C API. (My code in that branch is going to make that pretty obvious.) Could you recommend any existing packages that I might be able to use as a model?

For OpenSSL, I'm very interested in the abstract but less familiar with their project and their schedules. Who might be a good person to get in touch with?

> I assume there's a completely generic platform-agnostic C implementation, for build environments where the assembly won't work, yes?

Yes, that's the vendored file blake3_portable.c. One TODO for my branch here is convincing the Python build system not to try to compile the x86-64-specific stuff on other platforms. The vendored file blake3_dispatch.c abstracts over all the different implementations and takes care of #ifdef'ing platform-specific function calls. (It also does runtime CPU feature detection on x86.)

> written using the Rust implementation, which I understand is even more performant

A few details here: The upstream Rust and C implementations have been matched in single-threaded performance for a while now. They share the same assembly files, and the rest is a direct port. The big difference is that Rust also includes multithreading support, using the Rayon work-stealing runtime. The blake3-py module based on the Rust crate exposes this with a simple boolean flag, though we've been thinking about ways to give the caller more control over the number of threads used.
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

An update a year later: I have a proof-of-concept branch that adds BLAKE3 support to hashlib: https://github.com/oconnor663/cpython/tree/blake3. That branch is API compatible with the current master branch of https://github.com/oconnor663/blake3-py. Both that module and the upstream BLAKE3 repo are ready to be tagged 1.0, just waiting to see whether any integrations like this one end up requesting changes.

Would anyone be interested in moving ahead with this? One of the open questions would be whether CPython would vendor the BLAKE3 optimized assembly files, or whether we'd prefer to stick to C intrinsics.
[issue42558] waitpid/waitid race caused by change to Popen.send_signal in Python 3.9
Jack O'Connor added the comment:

Right, the example above is contrived to demonstrate the race and the crash. In real-life code, the good reason I know of to write code like this is to use os.waitid(WNOWAIT) to solve the wait/kill race properly. This is what Duct has been doing, and Nathaniel Smith also described this strategy in https://bugs.python.org/issue38630. The idea is that a waiting thread follows these steps:

1. waitid() with WNOWAIT set, without locking the child
2. after waitid() returns, indicating the child has exited, lock the child
3. waitid() without WNOWAIT, or just waitpid(), to reap the zombie child
4. stash the exit status and unlock

Meanwhile a killing thread follows these steps:

1. lock the child
2. check the stashed exit status, and unlock and exit early if it's set
3. otherwise, signal the child and unlock

This strategy solves the race, as the sketch below illustrates. The killing thread is free to signal while the waiting thread is blocked in step 1. If the killing thread happens to race in between when waitid() returns and when the waiting thread acquires the child lock, the child is a zombie and the kill signal has no effect. This is safe even if other threads (or e.g. the OOM killer) can randomly kill our child: *they* might have to worry about PID reuse, but their signals can never cause *us* to kill an unrelated process.

What breaks this scheme is if some thread calls waitpid() and reaps the child outside of the lock, but normally that'd be a pretty unreasonable thing to do, especially since it can only be done by other threads in the parent process. (There's also some complexity around whether multiple threads are allowed to call waitid(WNOWAIT) on the same PID at the same time. I've just had one thread call it, and had other blocking waiters block on a second lock, but maybe that's overcautious.)

So anyway, if you use the strategy above -- precisely because you care about the PID reuse race and want to solve it properly -- and you also happen to use Popen.kill(), then changing Popen.send_signal to reap the child can break you. I don't think this is a bug per se, but it's a behavior change, which matters to a small set of (abnormally) correct programs. But then again, if Duct is the only project that hits this in practice, maybe I'm making a mountain out of a molehill :)
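Here's a sketch of that protocol in Python. This is my own illustration of the steps above, not Duct's actual code; error handling and the second waiter lock mentioned above are omitted:

```python
import os
import signal
import threading

class Child:
    def __init__(self, pid):
        self.pid = pid
        self.lock = threading.Lock()
        self.exit_status = None  # stashed once the child is reaped

def waiting_thread(child):
    # Step 1: block until exit without the lock, and without reaping (WNOWAIT).
    os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)
    with child.lock:
        # Steps 2-4: reap the zombie under the lock and stash the status.
        info = os.waitid(os.P_PID, child.pid, os.WEXITED)
        child.exit_status = info.si_status

def killing_thread(child):
    with child.lock:
        # Exit early if the child has already been reaped.
        if child.exit_status is not None:
            return
        # Safe: the PID can't be reused yet, because reaping only happens
        # under this same lock.
        os.kill(child.pid, signal.SIGKILL)
```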
[issue38630] subprocess.Popen.send_signal() should poll the process
Jack O'Connor added the comment:

Filed separately as https://bugs.python.org/issue42558.
[issue42558] waitpid/waitid race caused by change to Popen.send_signal in Python 3.9
New submission from Jack O'Connor:

In Python 3.9, Popen.send_signal() was changed to call Popen.poll() internally before signaling. (Tracking bug: https://bugs.python.org/issue38630.) This is a best-effort check for the famous kill/wait race condition. However, because this can now reap an already-exited child process as a side effect, it can cause previously working programs to crash. Here's a simple example:

```
import os
import subprocess
import time

child = subprocess.Popen(["true"])
time.sleep(1)
child.kill()
os.waitpid(child.pid, 0)
```

The program above exits cleanly in Python 3.8 but crashes with ChildProcessError in Python 3.9. It's a race against child process exit, so in practice (without the sleep) it's a heisenbug.

There's a deeper race here that's harder to demonstrate in an example, but parallel to the original wait/kill issue: If the child PID happens to be reused by another parent thread in this same process, the call to waitpid might *succeed* and reap the unrelated child process. That would export the crash to that thread, and possibly create more kill/wait races. In short, the question of when exactly a child process is reaped is important for correct signaling on Unix, and changing that behavior can break programs in confusing ways.

This change affected the Duct library, and I might not've caught it if not for a lucky failing doctest: https://github.com/oconnor663/duct.py/commit/5dfae70cc9481051c5e53da0c48d9efa8ff71507

I haven't searched for more instances of this bug in the wild, but one way to find them would be to look for code that calls both os.waitpid/waitid and also Popen.send_signal/kill/terminate. Duct found itself in this position because it was using waitid(WNOWAIT) on Unix only, to solve this same race condition, and also using Popen.kill on both Unix and Windows.
[issue40550] Popen.terminate fails with ProcessLookupError under certain conditions
Jack O'Connor added the comment:

I'm late to the party, but I want to explain what's going on here in case it's helpful to folks. The issue you're seeing here has to do with whether a child process has been "reaped". (Windows is different from Unix here, because the parent keeps an open handle to the child, so this is mostly a Unix thing.) In short, when a child exits, it leaves a "zombie" process whose only job is to hold some metadata and keep the child's PID reserved. When the parent calls wait/waitpid/waitid or similar, that zombie process is cleaned up. That means that waiting has important correctness properties apart from just blocking the parent -- signaling after wait returns is unsafe, and forgetting to wait also leaks kernel resources. Here's a short example demonstrating this:

```
import os
import signal
import subprocess
import time

# Start a child process and sleep a little bit so that we know it's exited.
child = subprocess.Popen(["true"])
time.sleep(1)

# Signal it. Even though it's definitely exited, this is not an error.
os.kill(child.pid, signal.SIGKILL)
print("signaling before waiting works fine")

# Now wait on it. We could also use os.waitpid or os.waitid here. This reaps
# the zombie child.
child.wait()

# Try to signal it again. This raises ProcessLookupError, because the child's
# PID has been freed. But note that Popen.kill() would be a no-op here,
# because it knows the child has already been waited on.
os.kill(child.pid, signal.SIGKILL)
```

With that in mind, the original behavior with communicate() that started this bug is expected. The docs say that communicate() "waits for process to terminate and sets the returncode attribute." That means internally it calls waitpid, so your terminate() thread is racing against process exit. Catching the exc
[issue38630] subprocess.Popen.send_signal() should poll the process
Jack O'Connor added the comment:

This change caused a crash in the Duct library in Python 3.9. Duct uses the waitid(NOWAIT) strategy that Nathaniel Smith has described in this thread. With this change, the child process is reaped by kill() in a spot we didn't expect, and a subsequent call to os.waitid() raises a ChildProcessError. This is a race condition that only triggers if the child happens to exit before kill() is called.

I just pushed https://github.com/oconnor663/duct.py/commit/5dfae70cc9481051c5e53da0c48d9efa8ff71507 to work around this, which I'll release shortly as Duct version 0.6.4.

Broadly speaking, this change could break any program that uses Popen.kill() together with os.waitpid() or os.waitid(). Checking Popen.returncode before calling the os functions is a good workaround for the single-threaded case, but it doesn't fix the multithreaded case. Duct is going to avoid calling Popen.kill() entirely. There are some longer comments about race conditions in that commit I linked.

I don't feel strongly one way or the other about keeping this new behavior, but we should probably document it clearly in Popen.send_signal() and all of its callers that these functions might reap the child.
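For the single-threaded case, the returncode check mentioned above looks something like this (a sketch; safe only when no other thread can reap the child between the check and the wait):

```python
import os

def wait_if_not_reaped(child):
    # Popen.returncode is set once the Popen object itself has reaped the
    # child, which on Python 3.9+ can happen as a side effect of kill().
    # Only call os.waitpid if that hasn't happened yet.
    if child.returncode is None:
        os.waitpid(child.pid, 0)
```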
[issue39988] Remove AugLoad and AugStore expression context from AST
Jack O'Connor added the comment:

Ah never mind, it looks like that's covered by https://bugs.python.org/issue3
[issue39988] Remove AugLoad and AugStore expression context from AST
Jack O'Connor added the comment:

This change may have broken pyflakes on nightly Python. Is that expected or problematic?

Error message: "pyflakes" failed during execution due to "module 'ast' has no attribute 'AugLoad'"

Seen at: https://travis-ci.org/github/buildinspace/peru/jobs/665005023
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

I've just published some Python bindings for the Rust implementation on PyPI: https://pypi.org/project/blake3

> I'm guessing Python is gonna hold off until BLAKE3 reaches 1.0.

That's very fair. The spec and test vectors are set in stone at this point, but the implementations are new, and I don't see any reason to rush things out. (Especially since early adopters can now use the library above.) That said, there aren't really any expected implementation changes that would be a natural moment for the implementations to tag 1.0. I'll probably end up tagging 1.0 as soon as a caller appears who needs it to be tagged to meet their own stability requirements.
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Version 0.2.0 of the BLAKE3 repo includes optimized assembly implementations. These are behind the "c" Cargo feature for the `blake3` Rust crate, but included by default for the internal bindings crate. So the easiest way to rerun our favorite benchmark is:

    git clone https://github.com/BLAKE3-team/BLAKE3
    cd BLAKE3
    git fetch
    # I rebased this branch on top of version 0.2.0 today.
    git checkout origin/bench_406668786
    cd c/blake3_c_rust_bindings
    # Nightly is currently broken for unrelated reasons, so
    # we use stable with this internal bootstrapping flag.
    RUSTC_BOOTSTRAP=1 cargo bench 406668786

Running the above on my machine, I get 2888 MB/s, up another 12% from the 0.1.3 numbers. As a bonus, we don't need to worry about the difference between GCC and Clang. These new assembly files are essentially drop-in replacements for the instruction-set-specific C files we had before, which are also still supported. The updated C README has more details: https://github.com/BLAKE3-team/BLAKE3/blob/master/c/README.md
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Version 0.1.3 of the official BLAKE3 repo includes some significant performance improvements:

- The x86 implementations include explicit prefetch instructions, which helps with long inputs. (commit b8c33e1)
- The C implementation now uses the same parallel parent hashing strategy that the Rust implementation uses. (commit 163f522)

When I repeat the benchmarks above with TurboBoost on, here's what I see now:

    BLAKE3 Rust           2578 MB/s
    BLAKE3 C (clang -O3)  2502 MB/s
    BLAKE3 C (gcc -O2)    2223 MB/s
    K12 C (gcc -O2)       2175 MB/s

Larry, if you have time to repeat your benchmarks with the latest C code, I'd be curious to see if you get similar results.
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

I plan to bring the C code up to speed with the Rust code this week. As part of that, I'll probably remove comments like the one above :) Otherwise, is there anything else we can do on our end to help with this?
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

Ok, I've added Rust bindings to the BLAKE3 C implementation, so that I can benchmark it in a vaguely consistent way. My laptop is an i5-8250U, which should be very similar to yours. (Both are "Kaby Lake Refresh".) My end results do look similar to yours with TurboBoost on, but pretty different with TurboBoost off:

    with TurboBoost on
    ------------------
    K12 GCC        | 2159 MB/s
    BLAKE3 Rust    | 1787 MB/s
    BLAKE3 C Clang | 1588 MB/s
    BLAKE3 C GCC   | 1453 MB/s

    with TurboBoost off
    -------------------
    BLAKE3 Rust    | 1288 MB/s
    K12 GCC        | 1060 MB/s
    BLAKE3 C Clang | 1094 MB/s
    BLAKE3 C GCC   |  943 MB/s

The difference seems to be that with TurboBoost on, the BLAKE3 benchmarks have my CPU sitting around 2.4 GHz, while for the K12 benchmarks it's more like 2.9 GHz. With TurboBoost off, both benchmarks run at 1.6 GHz, and BLAKE3 does better. I'm not sure what causes that frequency difference. Perhaps some high-power instruction that the BLAKE3 implementation is emitting?

To reproduce these numbers you can clone these two repos (the latter is where I happen to have a K12 benchmark):

    https://github.com/BLAKE3-team/BLAKE3
    https://github.com/oconnor663/blake2_simd

Then in both cases check out the "bench_406668786" branch, where I've put some benchmarks with the same input length you used. For Rust BLAKE3, at the root of the BLAKE3 repo, run:

    cargo +nightly bench 406668786

For C BLAKE3, the command is the same, but run it in the "./c/blake3_c_rust_bindings" directory. The build defaults to GCC, and you can "export CC=clang" to switch it. For my K12 benchmark, at the root of the blake2_simd repo, run:

    cargo +nightly bench --features=kangarootwelve 406668786
[issue39298] add BLAKE3 to hashlib
Jack O'Connor added the comment:

I'm in the middle of adding some Rust bindings to the C implementation in github.com/BLAKE3-team/BLAKE3, so that `cargo test` and `cargo bench` can cover both. Once that's done, I'll follow up with benchmark numbers from my laptop (Kaby Lake i5-8250U, also AVX2 with no AVX-512). For benchmark numbers with AVX-512 support, see the Performance section of the BLAKE3 paper (https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf).

Larry, what processor did you run your benchmarks on?

Also, is there anything currently in CPython that does dispatch based on runtime CPU feature detection? Is this something that BLAKE3 should do for itself, or is there existing machinery that we'd want to integrate with?
[issue31933] some Blake2 parameters are encoded backwards on big-endian platforms
New submission from Jack O'Connor <oconnor...@gmail.com>:

See https://github.com/BLAKE2/libb2/issues/12. All Blake2 params have to be encoded in little-endian byte order. For the two multi-byte integer params, leaf_length and node_offset, that means that assigning a native-endian integer to them appears to work on little-endian platforms, but gives the wrong result on big-endian. The current libb2 API doesn't make that very clear, and @sneves is working on new API functions in the GH issue above. In the meantime, we can work around the problem by explicitly assigning little-endian values to the parameter block.
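That workaround amounts to serializing those two params with an explicit byte order instead of assigning native integers. Here's a sketch of the idea in Python, using the BLAKE2b parameter block layout; the offsets come from the BLAKE2 spec, but treat them as illustrative:

```python
import struct

def set_tree_params(param_block: bytearray, leaf_length: int, node_offset: int):
    # In the BLAKE2b parameter block, leaf_length is a 32-bit integer at
    # byte offset 4, and node_offset is a 64-bit integer at byte offset 8.
    # struct.pack("<...") produces the same little-endian bytes on both
    # little- and big-endian machines.
    param_block[4:8] = struct.pack("<I", leaf_length)
    param_block[8:16] = struct.pack("<Q", node_offset)
```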
[issue26848] asyncio.subprocess's communicate() method mishandles empty input bytes
Jack O'Connor added the comment:

Related: The asyncio communicate() method differs from standard subprocess in how it treats input bytes when stdin is (probably mistakenly) not set to PIPE. Like this:

    proc = await create_subprocess_shell("sleep 5")
    await proc.communicate(b"foo")  # Oops, I forgot stdin=PIPE above!

In the standard, non-async version of this example, communicate() would ignore the input bytes entirely. But here in the asyncio version, communicate() will try to write those bytes to stdin, which is None, and the result is an AttributeError. Since the user probably only hits this case by mistake, I think raising an exception is preferable. But it would be nice to raise an exception that explicitly said "you've forgotten stdin=PIPE" instead of the unhelpful "'NoneType' object has no attribute 'write'". Maybe it would be worth cleaning this up while we're here?
[issue26848] asyncio.subprocess's communicate() method mishandles empty input bytes
Jack O'Connor added the comment:

Thanks for the heads up, Berker, I've re-submitted the PR as https://github.com/python/asyncio/pull/335.
[issue26848] asyncio.subprocess's communicate() method mishandles empty input bytes
New submission from Jack O'Connor:

Setting stdin=PIPE and then calling communicate(b"") should close the child's stdin immediately, similar to stdin=DEVNULL. Instead, communicate() treats b"" like None and leaves the child's stdin open, which makes the child hang forever if it tries to read anything. I have a PR open with a fix and a test: https://github.com/python/cpython/pull/33
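A quick illustration of the intended behavior. This is a sketch of the semantics described above, not a test from the PR; with the bug present, the communicate() call hangs because the child never sees EOF on stdin:

```python
import asyncio
from asyncio.subprocess import PIPE

async def main():
    # `cat` copies stdin to stdout until EOF. communicate(b"") should close
    # stdin immediately, so cat exits with empty output instead of hanging.
    proc = await asyncio.create_subprocess_exec("cat", stdin=PIPE, stdout=PIPE)
    stdout, _ = await proc.communicate(b"")
    assert stdout == b""

asyncio.get_event_loop().run_until_complete(main())
```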
[issue24909] Windows: subprocess.Popen: race condition for leaking inheritable handles
Changes by Jack O'Connor <oconnor...@gmail.com>: nosy: +oconnor663
[issue25779] deadlock with asyncio+contextmanager+ExitStack
New submission from Jack O'Connor:

The following hangs at 100% CPU on Python 3.5, though not on Python 3.4:

1) Start an asyncio coroutine with run_until_complete().
2) Inside the coroutine, enter an ExitStack using a with-statement.
3) Inside the with-statement, call ExitStack.enter_context() with a generator context manager. It doesn't matter what the generator yields.
4) After the enter_context() call, raise an exception.

Here's an example script that does all of this and repros the hang: https://gist.github.com/oconnor663/483db2820bb5f877c9ed
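For readers who don't want to click through, a repro along those lines looks like this (my reconstruction from the steps above, not the gist verbatim):

```python
import asyncio
from contextlib import ExitStack, contextmanager

@contextmanager
def cm():
    yield "it doesn't matter what we yield"

@asyncio.coroutine
def main():
    with ExitStack() as stack:
        stack.enter_context(cm())
        raise Exception("this raise triggers the hang on 3.5")

asyncio.get_event_loop().run_until_complete(main())
```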
[issue25779] deadlock with asyncio+contextmanager+ExitStack
Jack O'Connor added the comment:

Thanks for chasing this down. Yury, can you suggest a workaround?
[issue25781] infinite loop in reprlib
Changes by Jack O'Connor <oconnor...@gmail.com>: nosy: +oconnor663
[issue25782] CPython hangs on error __context__ set to the error itself
Jack O'Connor added the comment:

Yury, do we need to handle more complicated infinite loops, where "self" doesn't actually show up in the loop? Here's an example:

    try:
        raise Exception
    except Exception as ex:
        loop1 = Exception()
        loop2 = Exception()
        loop1.__context__ = loop2
        loop2.__context__ = loop1
        ex.__context__ = loop1
        hasattr(1, 'aa')

I'm unfamiliar with CPython, so I don't know whether full-blown loop detection belongs here. Maybe we could add a hardcoded limit like "fail if we loop more than X times"?
[issue25786] contextlib.ExitStack introduces a cycle in exception __context__
Changes by Jack O'Connor <oconnor...@gmail.com>: nosy: +oconnor663
[issue25779] deadlock with asyncio+contextmanager+ExitStack
Jack O'Connor added the comment:

Yury, can you help me understand why `hasattr("foo", "bar")` triggers the infinite loop there, but not `print("foo")`?
[issue25565] subprocess.Popen creates inheritable file descriptors on Windows, can leak to other child processes
Jack O'Connor added the comment:

Definitely a duplicate, thanks for pointing me to the original. Should I mark it resolved, or let someone from the project do that?
[issue25565] subprocess.Popen creates inheritable file descriptors on Windows, can leak to other child processes
New submission from Jack O'Connor:

The Windows implementation of Popen calls _make_inheritable(), which creates inheritable copies of Popen's file descriptors. If two Popen calls happen at the same time on different threads, these descriptors can leak into both child processes. Here's a demonstration of a deadlock caused by this bug: https://gist.github.com/oconnor663/b1d39d58b232fc627d84

Victor Stinner also wrote up a summary of the security issues associated with leaking file descriptors in PEP 0446.

A workaround for this issue is to protect all Popen calls with a lock. Calls to wait() and communicate() don't need to be protected, so you can release the lock before you make those blocking calls. I don't see a way to safely use run() or the other convenience functions, if you're using pipes and multiple threads. Unfortunately close_fds=True is not allowed on Windows when any of stdin/stdout/stderr are set, which is going to be the case here.

Would it be feasible for Popen.__init__() to automatically protect the inheritable copies it creates, with a lock around that section? We already have the _waitpid_lock for POSIX, so it seems like thread safety is a goal.
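The locking workaround described above looks roughly like this (a sketch; `popen_lock` and `safe_popen` are names I'm introducing for illustration):

```python
import subprocess
import threading

popen_lock = threading.Lock()

def safe_popen(*args, **kwargs):
    # Hold the lock across Popen.__init__, so the temporarily inheritable
    # copies of our pipe handles can't leak into a child being spawned
    # concurrently on another thread.
    with popen_lock:
        return subprocess.Popen(*args, **kwargs)

# wait() and communicate() can block for a long time, so call them only
# after the lock has been released.
p = safe_popen(["hostname"], stdout=subprocess.PIPE)
out, _ = p.communicate()
```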
[issue23548] TypeError in event loop finalizer, new in Python 3.4.3
Jack O'Connor added the comment:

Got it, thanks for the heads up.
[issue23548] TypeError in event loop finalizer, new in Python 3.4.3
Jack O'Connor added the comment:

@gvanrossum, the last two lines you suggested don't give any error, as expected. Not sure why we're getting that error message in the toy example.

@haypo, closing the event loop explicitly works fine for me. But since you mentioned it, I don't see any warning when I leave out closing the event loop and turn on PYTHONASYNCIODEBUG=1. What should I be seeing?
[issue23548] TypeError in event loop finalizer, new in Python 3.4.3
Jack O'Connor added the comment:

`close()` fixes it; thanks for the workaround! When I throw a print statement inside `remove_signal_handler`, it says that sig is 17 and handler is 0. 17 looks to be SIGCHLD, presumably from the little echo subprocess exiting in this example.
[issue23548] TypeError in event loop finalizer, new in Python 3.4.3
New submission from Jack O'Connor:

This toy program:

    import asyncio

    @asyncio.coroutine
    def main():
        p = yield from asyncio.create_subprocess_shell('echo hi')
        yield from p.wait()

    asyncio.get_event_loop().run_until_complete(main())

Produces this output on Arch Linux under Python 3.4.3 (but not 3.4.2):

    hi
    Exception ignored in: <bound method _UnixSelectorEventLoop.__del__ of <_UnixSelectorEventLoop running=False closed=True debug=False>>
    Traceback (most recent call last):
      File "/home/jacko/Downloads/Python-3.4.3/Lib/asyncio/base_events.py", line 361, in __del__
      File "/home/jacko/Downloads/Python-3.4.3/Lib/asyncio/unix_events.py", line 57, in close
      File "/home/jacko/Downloads/Python-3.4.3/Lib/asyncio/unix_events.py", line 138, in remove_signal_handler
    TypeError: signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object
[issue22428] KeyboardInterrupt inside a coroutine causes AttributeError
New submission from Jack O'Connor:

The following test script prints a KeyboardInterrupt traceback (expected), but also an AttributeError traceback (unexpected):

    import asyncio

    @asyncio.coroutine
    def main():
        raise KeyboardInterrupt

    asyncio.get_event_loop().run_until_complete(main())

    Traceback (most recent call last):
      File "test.py", line 9, in <module>
        asyncio.get_event_loop().run_until_complete(main())
      File "/usr/lib/python3.4/asyncio/base_events.py", line 203, in run_until_complete
        self.run_forever()
      File "/usr/lib/python3.4/asyncio/base_events.py", line 184, in run_forever
        self._run_once()
      File "/usr/lib/python3.4/asyncio/base_events.py", line 817, in _run_once
        handle._run()
      File "/usr/lib/python3.4/asyncio/events.py", line 39, in _run
        self._callback(*self._args)
      File "/usr/lib/python3.4/asyncio/tasks.py", line 321, in _step
        result = next(coro)
      File "/usr/lib/python3.4/asyncio/tasks.py", line 103, in coro
        res = func(*args, **kw)
      File "test.py", line 6, in main
        raise KeyboardInterrupt
    KeyboardInterrupt
    --- Logging error ---
    Traceback (most recent call last):
    --- Logging error ---
    Traceback (most recent call last):
    Exception ignored in: <bound method Task.__del__ of <Task(coro) exception=KeyboardInterrupt()>>
    Traceback (most recent call last):
      File "/usr/lib/python3.4/asyncio/futures.py", line 184, in __del__
      File "/usr/lib/python3.4/asyncio/base_events.py", line 725, in call_exception_handler
      File "/usr/lib/python3.4/logging/__init__.py", line 1296, in error
      File "/usr/lib/python3.4/logging/__init__.py", line 1402, in _log
      File "/usr/lib/python3.4/logging/__init__.py", line 1412, in handle
      File "/usr/lib/python3.4/logging/__init__.py", line 1482, in callHandlers
      File "/usr/lib/python3.4/logging/__init__.py", line 846, in handle
      File "/usr/lib/python3.4/logging/__init__.py", line 977, in emit
      File "/usr/lib/python3.4/logging/__init__.py", line 899, in handleError
      File "/usr/lib/python3.4/traceback.py", line 169, in print_exception
      File "/usr/lib/python3.4/traceback.py", line 153, in _format_exception_iter
      File "/usr/lib/python3.4/traceback.py", line 18, in _format_list_iter
      File "/usr/lib/python3.4/traceback.py", line 65, in _extract_tb_or_stack_iter
      File "/usr/lib/python3.4/linecache.py", line 15, in getline
      File "/usr/lib/python3.4/linecache.py", line 41, in getlines
      File "/usr/lib/python3.4/linecache.py", line 126, in updatecache
      File "/usr/lib/python3.4/tokenize.py", line 437, in open
    AttributeError: 'module' object has no attribute 'open'

The issue is that Task._step() calls self.set_exception() for both Exceptions and BaseExceptions, but it reraises BaseExceptions. That means that the BaseEventLoop never gets to call future.result() to clear the exception when it's a BaseException. Future.__del__ eventually tries to log this uncleared exception, but something about the circumstances of program exit (caused by the same exception) has already cleaned up the builtins needed for logging. Is that __del__ violating some best practices for finalizers by calling such a complicated function?

Either way, I'm not sure this case should be considered an unretrieved exception at all. It's propagating outside the event loop after all. Should Task._step() be setting it in the first place, if it's going to reraise it? Should it be set without _log_traceback=True somehow?
[issue22427] TemporaryDirectory attempts to clean up twice
Jack O'Connor added the comment:

My example script is definitely a corner case, but where this actually came up for me was in asyncio. Using a TemporaryDirectory inside a coroutine creates a similar situation.
[issue22427] TemporaryDirectory attempts to clean up twice
New submission from Jack O'Connor:

The following little script prints (but ignores) a FileNotFoundError.

    import tempfile

    def generator():
        with tempfile.TemporaryDirectory():
            yield

    g = generator()
    next(g)

    Exception ignored in: <generator object generator at 0x7fb319fe2c60>
    Traceback (most recent call last):
      File "gen.py", line 6, in generator
      File "/usr/lib/python3.4/tempfile.py", line 691, in __exit__
      File "/usr/lib/python3.4/tempfile.py", line 697, in cleanup
      File "/usr/lib/python3.4/shutil.py", line 454, in rmtree
      File "/usr/lib/python3.4/shutil.py", line 452, in rmtree
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp7wek4xhy'

Putting print statements in the TemporaryDirectory class shows what's happening (confirming Guido's theory from https://groups.google.com/forum/#!topic/python-tulip/QXgWH32P2uM): As the program exits, the TemporaryDirectory object is finalized. This actually rmtree's the directory. After that, the generator is finalized, which raises a GeneratorExit inside of it. That exception causes the with statement to call __exit__ on the already-finalized TemporaryDirectory, which tries to rmtree again and throws the FileNotFoundError.

The main downside of this bug is just garbage on stderr. I suppose in exceptionally unlikely circumstances, a new temp dir by the same name could be created between the two calls, and we might actually delete something we shouldn't. The simple fix would be to store a _was_cleaned flag or something on the object. Is there any black magic in finalizers or multithreading that demands something more complicated?
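The _was_cleaned idea could look something like this (a sketch against the 3.4-era class, not a real patch; I haven't checked every finalization path in tempfile):

```python
import tempfile

class IdempotentTemporaryDirectory(tempfile.TemporaryDirectory):
    _was_cleaned = False

    def cleanup(self, *args, **kwargs):
        # Only the first call actually removes the directory. A later call
        # (e.g. from the with-statement after the finalizer already ran)
        # becomes a harmless no-op instead of raising FileNotFoundError.
        if not self._was_cleaned:
            self._was_cleaned = True
            super().cleanup(*args, **kwargs)
```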
[issue22279] read() vs read1() in asyncio.StreamReader documentation
Jack O'Connor added the comment:

Agreed that changing read() would probably break tons of people. I don't think a naming inconsistency meets the "serious flaws are uncovered" bar for breaking a provisional package. If we actually prefer the asyncio way of doing things, all the better.

That said, another thing I'm noticing is that in asyncio, read() is basically two different functions. This is clear in the code, http://hg.python.org/cpython/file/fb3aee1cff59/Lib/asyncio/streams.py#l433, where the n < 0 case goes off on its own branch and never comes back. (Incidentally there's another n < 0 check at line 453 there that I think always returns false.) We have a read function that makes very different guarantees depending on the value of n. Contrast this with the read function from regular io, where read(n) and read() are effectively the same if n is large enough. Maybe just another point that's worth clarifying in the docs.

Thanks for the quick replies!
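To make the "two different functions" point concrete, here's a small runnable sketch of the 3.4-era semantics (feed_data/feed_eof are used here just to simulate a peer):

```python
import asyncio

@asyncio.coroutine
def demo():
    reader = asyncio.StreamReader()

    # With n >= 0, read() returns as soon as *any* bytes are available,
    # like BufferedIOBase.read1(). Ten bytes buffered, 4096 requested:
    reader.feed_data(b"x" * 10)
    assert (yield from reader.read(4096)) == b"x" * 10

    # Waiting for exactly n bytes (like BufferedIOBase.read) is spelled
    # readexactly() in asyncio.
    reader.feed_data(b"y" * 4)
    assert (yield from reader.readexactly(4)) == b"y" * 4

    # With n < 0, read() buffers everything until EOF.
    reader.feed_data(b"rest")
    reader.feed_eof()
    assert (yield from reader.read(-1)) == b"rest"

asyncio.get_event_loop().run_until_complete(demo())
```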
[issue22279] read() vs read1() in asyncio.StreamReader documentation
New submission from Jack O'Connor:

BufferedIOBase and related classes have a read(n) and read1(n). The first will wait until n bytes are available (or EOF), while the second will return as soon as any bytes are available. In asyncio.StreamReader, there is no read1 method, but the read method seems to behave like BufferedIOBase.read1, and there is a readexactly method that behaves like BufferedIOBase.read. Is there a reason for this naming change? Either way, could the documentation for asyncio.StreamReader.read be clarified?