[issue35459] Use PyDict_GetItemWithError() instead of PyDict_GetItem()
Josh Rosenberg added the comment: #36110 was closed as a duplicate; the superseder is #36109 (which has been fixed). The change should still be documented, just in case anyone gets bitten by it. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35459> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35996] Optional modulus argument for new math.prod() function
Josh Rosenberg added the comment: "One other issue is that the arguments to prod() need not be integers, so a modulus argument wouldn't make sense in those contexts." The arguments to pow don't need to be integers either, yet the optional third argument is only really relevant to integers. Not saying we should do this, but we definitely allow optional arguments that are only meaningful for certain input types in other cases. The best argument for this change I can come up with from other Python functions is the push for an efficient math.comb (#35431); if we didn't want to bother supporting minimizing intermediate values, math.comb could be implemented directly in terms of math.factorial instead of trying to pre-cancel values. But even that's not a strong argument for this, given the relative frequency with which each feature is needed (the binomial coefficient coming up much more often than modular reduction of huge products). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35996> ___
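For illustration of what the proposed argument would do: math.prod() has no modulus parameter, so `prod_mod` below is a hypothetical helper, sketched by analogy with the three-argument form of pow().

```python
# Hypothetical sketch only: math.prod() takes no modulus argument. This shows
# what the proposed feature would compute, by analogy with pow(base, exp, mod).
from functools import reduce

def prod_mod(iterable, mod):
    # Reduce after every multiplication so intermediates stay below mod**2,
    # instead of building a huge exact product and reducing once at the end.
    return reduce(lambda acc, x: (acc * x) % mod, iterable, 1 % mod)

print(prod_mod(range(1, 20), 10**9 + 7))  # same as math.factorial(19) % (10**9 + 7)
print(pow(2, 100, 97))                    # the existing analogue for powers
```

The step-by-step reduction is the whole point of the feature request: the final result is identical to reducing the exact product once, but no intermediate ever grows beyond machine-word-friendly sizes.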
[issue25737] array is not a Sequence
Josh Rosenberg added the comment: Correction: It should actually be registered as a subclass of MutableSequence (which should make it a virtual subclass of Sequence too; list is only registered on MutableSequence as well). -- ___ Python tracker <https://bugs.python.org/issue25737> ___
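A runnable demonstration of the corrected proposal; the register() call below is the suggested fix being performed by hand, not something the array module is confirmed to do itself.

```python
# Demonstrates the proposed registration: registering array on MutableSequence
# also makes it a virtual subclass of Sequence, as the message states.
from array import array
from collections.abc import MutableSequence, Sequence

MutableSequence.register(array)  # the fix this issue asks the stdlib to perform

a = array('b', [1, 2, 3])
print(isinstance(a, MutableSequence))  # True once registered
print(isinstance(a, Sequence))         # True via the MutableSequence registration
```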
[issue25737] array is not a Sequence
Change by Josh Rosenberg : -- versions: +Python 3.7, Python 3.8 -Python 3.5 ___ Python tracker <https://bugs.python.org/issue25737> ___
[issue25737] array is not a Sequence
Josh Rosenberg added the comment: This should not be closed as a duplicate. Yes, array.array isn't automatically a Sequence, but since it isn't, the array module should be modified to explicitly do the equivalent of:

    import _collections_abc
    _collections_abc.Sequence.register(array)

so it's properly registered manually. -- nosy: +josh.r resolution: duplicate -> status: closed -> open superseder: issubclass without registration only works for "one-trick pony" collections ABCs. -> ___ Python tracker <https://bugs.python.org/issue25737> ___
[issue35190] collections.abc.Sequence cannot be used to test whether a class provides a particular interface (doc issue)
Josh Rosenberg added the comment: Wait, why should #25737 be closed? This bug is a docs issue; collections.abc shouldn't claim that all the ABCs do duck-typing checks since Sequence doesn't. But #25737 is specific: array.array *should* be registered as a Sequence, but isn't; that requires a code fix (to make array perform the registration), not a doc fix. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35190> ___
[issue15753] No-argument super in method with variable arguments raises SystemError
Josh Rosenberg added the comment: Moving from pending back to open (not sure what was "pending" about it?). The workaround is viable (and used by Python implemented dict subclasses in the standard library since they must accept **kwargs with arbitrary strings, including self), but it does seem a little silly that it's required. Leaving it as low priority since the workaround exists. Still, would be nice to make super() seamless, pulling the first argument if the function accepts it as non-varargs, and the first element of the first argument if it's varargs. If someone reassigns self/args, that's on them; it's fine to raise a RuntimeError if they use no-arg super(), requiring them to use two-arg super explicitly in that case. -- priority: normal -> low status: pending -> open versions: +Python 3.8 -Python 3.2, Python 3.3, Python 3.4 ___ Python tracker <https://bugs.python.org/issue15753> ___
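A sketch of the workaround discussed above, using an illustrative dict subclass (the class name here is made up): take self via *args so a caller can pass a keyword literally named "self", and use two-arg super() because zero-arg super() cannot locate a first argument in this signature.

```python
# Illustrative sketch of the workaround: *args-only signature plus explicit
# two-arg super(). LooseDict is a hypothetical example class.
class LooseDict(dict):
    def update(*args, **kwargs):
        self, *rest = args  # recover self from the varargs tuple
        # Zero-arg super() fails in this signature (the subject of this
        # issue), so name the class and instance explicitly:
        super(LooseDict, self).update(*rest, **kwargs)

d = LooseDict()
d.update({'a': 1}, self=2)  # "self" arrives as an ordinary keyword key
print(d)  # {'a': 1, 'self': 2}
```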
[issue35988] Python interpreter segfault
Josh Rosenberg added the comment: "your application is using more memory than what is available in the system." Well, it alone may not be using more memory, but the cumulative usage on the system is "too high" by whatever metric the OOM killer is using (IIRC the default rule is that actual committed memory must be less than swap size + 50% of RAM). The OOM killer is a strange and terrible beast, and the behavior varies based on configuration, relative memory usage of each process grouping, minimizing the number of processes killed, etc. You can find deep implementation details on it (including how to exempt a given process from consideration) here: https://linux-mm.org/OOM_Killer

The real solution to problems like this usually amounts to:

1. Install more RAM.

2. Increase the size of your swap partition. Doesn't "fix" being shy of memory if you're actually using more memory than you have RAM, but allows you to handle overcommit (particularly for fork+exec scenarios where the forked process's memory will be freed momentarily) without the OOM killer getting involved, and to limp along slowly, without actually failing, if you actually allocate and use more memory than you have RAM.

3. Tweak the overcommit heuristics to allow more overcommit before invoking the OOM killer.

4. Disable overcommit entirely, so memory allocations fail immediately if sufficient backing storage is not available, rather than succeeding, only to invoke the OOM killer when the allocated memory gets used and the shortage is discovered. This is a good solution if the program(s) in question aren't poorly designed such that they try to allocate many GBs of memory up front even when they're unlikely to need it; unfortunately, there are commonly used programs that overallocate like this and render this solution non-viable if they're part of your setup.

Regardless, this isn't a bug in Python itself.
Any process that uses a lot of memory (Apache, MySQL) and hasn't explicitly removed itself from OOM killer consideration is going to look tasty when an OOM scenario occurs. -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue35988> ___
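For reference, the overcommit knobs behind options 3 and 4 live under the vm.* sysctls on Linux; the specific values below are illustrative examples, not recommendations, and sysctl -w requires root:

```shell
# Illustrative Linux commands for options 3/4 above; values are examples only.
# Inspect current policy: 0 = heuristic overcommit (default), 1 = always
# allow, 2 = never overcommit.
cat /proc/sys/vm/overcommit_memory

# Option 4: disable overcommit so allocations fail up front (needs root).
sysctl -w vm.overcommit_memory=2
# In mode 2 the commit limit is swap + overcommit_ratio% of RAM.
sysctl -w vm.overcommit_ratio=80
```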
[issue35961] test_slice: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
Josh Rosenberg added the comment: +1 on PR 11830 from me. Can always revisit if #11107 is ever implemented and it turns out that the reference count manipulation means startup is too slow due to all the slice interning triggered comparisons (unlikely, but theoretically possible I guess). -- ___ Python tracker <https://bugs.python.org/issue35961> ___
[issue35961] test_slice: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
Josh Rosenberg added the comment: Ah, I see Victor posted an alternative PR that avoids the reference counting overhead by explicitly removing the temporary tuples from GC tracking. I'm mildly worried by that approach, only because the only documented use case for PyObject_GC_UnTrack is in tp_dealloc (that said, the code explicitly allows it to be called twice due to the Py_TRASHCAN mechanism, so it's probably safe so long as the GC design never changes dramatically). If slice comparison really is performance sensitive enough to justify this, so be it, but I'd personally prefer to reduce the custom code involved in a rarely used code path (we're not even caching constant slices yet, so no comparisons are likely to occur for 99.99% of slices, right?). -- nosy: +josh.r versions: +Python 3.6, Python 3.7 ___ Python tracker <https://bugs.python.org/issue35961> ___
[issue35961] test_slice: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
Josh Rosenberg added the comment: Victor found the same bug I found while I was composing this, posting only to incorporate the proposed solution: I *think* I have a cause for this, but someone else with greater understanding of the cycle collector should check me if the suggested fix has non-trivial performance implications (I suspect the answer is no, performance is unaffected). slice_richcompare borrows its behavior from tuple by creating a temporary tuple for each slice, then delegating to the tuple comparison ( https://github.com/python/cpython/blob/master/Objects/sliceobject.c#L591 ). Problem is, it uses borrowed references when creating said tuples, not owned references. Because test_slice's BadCmp.__eq__ is implemented in Python, the comparison can be interrupted by cycle collection during the __eq__ call. When that happens, there are precisely two references to the BadCmp object:

1. In the slice (owned)
2. In the temporary tuple (borrowed)

When a cycle collection occurs during the comparison, and subtract_refs ( https://github.com/python/cpython/blob/master/Modules/gcmodule.c#L398 ) is called, the BadCmp object in question is visited via both the slice and the tuple, and since it has no non-container objects referencing it, it ends up with the initial reference count of 1 attempting to drop to -1, and the assertion is violated. While the code of gcmodule.c appears to have been refactored since 3.7 so the assert occurs in a different function, with a slightly different message, it would break the same way in both 3.7 and master, and whether or not it triggers the bug, the broken behavior of slice_richcompare hasn't changed for a *long* time. The underlying problem would seem to be slice's richcompare believing it's okay to make a tuple from borrowed references, then make a call on it that can trigger calls into Python-level code (and therefore into the cycle collector); everything else is behaving correctly here.
I'm guessing the only reason it's not seen in the wild is that slices based on Python-defined types are almost never compared at all, let alone compared on debug builds that would be checking the assert and with an accelerated cycle collection cycle that would make a hit likely. The solution would be to stop trying to micro-optimize slice_richcompare to avoid reference count manipulation and just build a proper tuple. It would even simplify the code since we could just use PyTuple_Pack, reducing custom code by replacing:

    t1 = PyTuple_New(3);
    if (t1 == NULL)
        return NULL;
    t2 = PyTuple_New(3);
    if (t2 == NULL) {
        Py_DECREF(t1);
        return NULL;
    }

    PyTuple_SET_ITEM(t1, 0, ((PySliceObject *)v)->start);
    PyTuple_SET_ITEM(t1, 1, ((PySliceObject *)v)->stop);
    PyTuple_SET_ITEM(t1, 2, ((PySliceObject *)v)->step);
    PyTuple_SET_ITEM(t2, 0, ((PySliceObject *)w)->start);
    PyTuple_SET_ITEM(t2, 1, ((PySliceObject *)w)->stop);
    PyTuple_SET_ITEM(t2, 2, ((PySliceObject *)w)->step);

with:

    t1 = PyTuple_Pack(3,
                      ((PySliceObject *)v)->start,
                      ((PySliceObject *)v)->stop,
                      ((PySliceObject *)v)->step);
    if (t1 == NULL)
        return NULL;
    t2 = PyTuple_Pack(3,
                      ((PySliceObject *)w)->start,
                      ((PySliceObject *)w)->stop,
                      ((PySliceObject *)w)->step);
    if (t2 == NULL) {
        Py_DECREF(t1);
        return NULL;
    }

and makes cleanup simpler, since you can just delete:

    PyTuple_SET_ITEM(t1, 0, NULL);
    PyTuple_SET_ITEM(t1, 1, NULL);
    PyTuple_SET_ITEM(t1, 2, NULL);
    PyTuple_SET_ITEM(t2, 0, NULL);
    PyTuple_SET_ITEM(t2, 1, NULL);
    PyTuple_SET_ITEM(t2, 2, NULL);

and let the DECREFs for t1/t2 do their work normally. If for some reason the reference count manipulation is unacceptable, this *could* switch between two behaviors depending on whether or not start/stop/step are of known types (e.g.
if all are NoneType/int, this could use the borrowed refs code path safely) where a call back into Python level code is impossible; given that slices are usually made of None and/or ints, this would remove most of the cost for the common case, at the expense of more complicated code. Wouldn't help numpy types though, and I suspect the cost of pre-checking the types for all six values involved would eliminate most of the savings. Sorry for not submitting a proper PR; the work machine I use during the day is not suitable for development (doesn't even have Python installed). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35961> ___
[issue35918] multiprocessing's SyncManager.dict.has_key() method is broken
Change by Josh Rosenberg : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue35918> ___
[issue5996] abstract class instantiable when subclassing built-in types
Josh Rosenberg added the comment: Closed #35958 as a duplicate of this issue (and updated the title, since clearly the problem is not specific to dict). Patch probably needs to be rebased/rewritten against latest trunk (given it dates from Mercurial days). -- nosy: +Jon McMahon, josh.r stage: -> patch review title: abstract class instantiable when subclassing dict -> abstract class instantiable when subclassing built-in types versions: +Python 3.5, Python 3.6, Python 3.7, Python 3.8 ___ Python tracker <https://bugs.python.org/issue5996> ___
[issue35904] Add statistics.fmean(seq)
Josh Rosenberg added the comment: Correct me if I'm wrong, but at least initially, the first listed goal of statistics (per the PEP) was: "Correctness over speed. It is easier to speed up a correct but slow function than to correct a fast but buggy one." numpy already exists for people who need insane speed for these algorithms and are willing to compromise accuracy; am I wrong in my impression that statistics is more about providing correct batteries included that are fast enough for simple uses, not reimplementing numpy piece by piece for hardcore number crunching? Even if such a function were desirable, I don't like the naming symmetry between fsum and fmean; it's kind of misleading. math.fsum is a slower, but more precise, version of the built-in sum. Having statistics.fmean be a faster, less accurate, version of statistics.mean reverses that relationship between the f-prefixed and non-f-prefixed versions of a function. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35904> ___
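The fsum/sum relationship referenced above can be seen directly: math.fsum trades speed for an exactly rounded result, which is the opposite direction from the proposed fmean.

```python
# math.fsum is the slower-but-exact variant of the built-in sum: ten 0.1s
# accumulate binary rounding error under sum, but fsum tracks partials.
import math

naive = sum([0.1] * 10)
exact = math.fsum([0.1] * 10)
print(naive)  # 0.9999999999999999
print(exact)  # 1.0
```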
[issue35862] Change the environment for a new process
Change by Josh Rosenberg : Removed file: https://bugs.python.org/file48088/bq-nix.snapshot.json ___ Python tracker <https://bugs.python.org/issue35862> ___
[issue35862] Change the environment for a new process
Josh Rosenberg added the comment: Why is not having the target assign to the relevant os.environ keys before doing whatever depends on the environment not an option? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35862> ___
[issue35862] Change the environment for a new process
Change by Josh Rosenberg : Removed file: https://bugs.python.org/file48087/core-nix.snapshot.json ___ Python tracker <https://bugs.python.org/issue35862> ___
[issue35862] Change the environment for a new process
Change by Josh Rosenberg : Removed file: https://bugs.python.org/file48089/bq-nix.manifest ___ Python tracker <https://bugs.python.org/issue35862> ___
[issue35862] Change the environment for a new process
Change by Josh Rosenberg : -- Removed message: https://bugs.python.org/msg334593 ___ Python tracker <https://bugs.python.org/issue35862> ___
[issue35866] concurrent.futures deadlock
Josh Rosenberg added the comment: I've only got 3.7.1 Ubuntu bash on Windows (also amd64) immediately available, but I'm not seeing a hang, nor is there any obvious memory leak that might eventually lead to problems (memory regularly drops back to under 10 MB shared, 24 KB private working set). I modified your code to add a sys.stdout.flush() after the write so it would actually echo the dots as they were written instead of waiting for a few thousand of them to build up in the buffer, but otherwise it's the same code. Are you sure you're actually hanging, and it's not just the output getting buffered? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35866> ___
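The buffering tweak described above is just this (a generic sketch, not the reporter's actual code): flush stdout after each dot so progress is visible immediately instead of sitting in the block buffer.

```python
# Flush after each progress dot so it appears immediately rather than
# accumulating in stdout's buffer when output is not a terminal.
import sys

sys.stdout.write('.')
sys.stdout.flush()               # force the dot out now
print('.', end='', flush=True)   # equivalent one-liner form
```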
[issue35842] A potential bug about use of uninitialised variable
Josh Rosenberg added the comment: One additional note, just in case you're wondering. slice explicitly does not set Py_TPFLAGS_BASETYPE (in either Py2 or Py3), so you can't make a subclass of slice with NULLable fields by accident (you'll get a TypeError the moment you try to define it). There is one, and only one, slice type, and its fields are never NULL. -- ___ Python tracker <https://bugs.python.org/issue35842> ___
[issue35842] A potential bug about use of uninitialised variable
Josh Rosenberg added the comment: Yes, the 2.7 version of _PyEval_SliceIndex would bypass the NULL pointer dereference, so *if* you could make a slice with a NULL stop value, you could trigger a read from uninitialized stack memory, rather than dying due to a NULL pointer dereference. But just like Python 3, 2.7's PySlice_New explicitly replaces all NULLs with None ( https://github.com/python/cpython/blob/2.7/Objects/sliceobject.c#L60 ), so such a slice cannot exist. Since you can't make a slice with a NULL value through any supported API, and any unsupported means of doing this means you already have the ability to execute arbitrary code (and do far worse things than just trigger a read from an uninitialized C stack value), the fact that _PyEval_SliceIndex returns success for v == NULL is irrelevant; v isn't NULL in any code path aside from the specific one documented (the SLICE opcode, gone in Py3, which can pass in NULL, but uses defaults of 0 and PY_SSIZE_T_MAX for low and high respectively, so the silent success just leaves the reasonable defaults set), because all other uses use slice objects as the source for v, and they cannot have NULL values. -- resolution: -> not a bug status: open -> closed ___ Python tracker <https://bugs.python.org/issue35842> ___
[issue35707] time.sleep() should support objects with __float__
Josh Rosenberg added the comment: You've got a reference leak in your __index__-based paths. PyNumber_Index is returning a new reference (either to the existing obj, or a new one, if the existing obj isn't already an int). You never release this reference. Simplest fix is to make intobj top level, initialized to NULL, and Py_XDECREF it along the convert_from_int code path (you can't DECREF it in the index-specific path because it needs to survive past the goto, since it's replacing obj). I'm also mildly concerned by how duplicative the code becomes post-patch. If it's not a major performance hit (don't think it is; not even sure the API is even used anymore), perhaps just implement _PyTime_ObjectToTime_t as a wrapper for _PyTime_ObjectToDenominator (with a denominator of 2, so rounding simplifies to just 0 == round down, 1 == round up)? Example:

    int
    _PyTime_ObjectToTime_t(PyObject *obj, time_t *sec, _PyTime_round_t round)
    {
        long numerator;
        if (_PyTime_ObjectToDenominator(obj, sec, &numerator, 2, round) == 0) {
            if (numerator) {
                if (*sec == _Py_IntegralTypeMax(time_t)) {
                    error_time_t_overflow();
                    return -1;
                }
                ++*sec;
            }
            return 0;
        }
        return -1;
    }

Sorry for not commenting on GitHub, but my work computer has a broken Firefox that GitHub no longer supports properly. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35707> ___
[issue35431] Add a function for computing binomial coefficients to the math module
Josh Rosenberg added the comment: Steven: I'm assuming Brett rearranged the title to put emphasis on the new function and to place it earlier in the title. Especially important if you're reading e-mails with the old subject on an e-mail client with limited subject preview lengths, you end up seeing something like: "The math module should provide a function for computing..." rather than the more useful: "Add a function for computing binomial coefficients to t..." -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35431> ___
[issue35842] A potential bug about use of uninitialised variable
Josh Rosenberg added the comment: Your analysis would be (almost) correct if a slice object could have a stop value of NULL. It's wrong in that the error would be a NULL dereference, not a silent use of an uninitialized value, but it would be a bug. In your scenario where v == NULL, it would pass the test for v != Py_None, then call PyIndex_Check(v), and since the macro doesn't check for the passed value being NULL, it would perform a NULL dereference. But even that's not possible; PySlice_New (which is ultimately responsible for all slice construction) explicitly replaces any argument of NULL with Py_None, so there is no such thing as a slice with *any* value being NULL. So since r->stop is definitely non-NULL, either:

1. It's None, PySlice_Unpack line 232 executes, and stop is initialized, or
2. It's non-None, and _PyEval_SliceIndex is called with a v that is definitely not None and non-NULL, so it always enters the `if (v != Py_None) {` block; either it received a valid index integer, in which case it initializes *pi (aka stop) and returns 1 (success), or it returns 0 (failure), which means stop is never used.

The only way you could trigger your bug is to make a slice with an actual NULL for its stop value (and as noted, the bug would be a NULL dereference in PyIndex_Check, not a use of an uninitialized value, because v != Py_None would return true for v == NULL), which is only possible through intentionally misusing PySliceObject (reaching in and tweaking values of the struct directly). And if you can do that, you're already a C extension (or ctypes code) and can crash the interpreter any number of ways without resorting to this level of complexity. -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue35842> ___
[issue20399] Comparison of memoryview
Josh Rosenberg added the comment: Not my use case specifically, but my link in the last post (repeated below) was to a StackOverflow answer to a problem where using buffer was simple and fast, but memoryview required annoying workarounds. Admittedly, in most cases it's people wanting to do this with strings, so in Python 3 it only actually works if you convert to bytes first (possibly wrapped in a memoryview cast to a larger width if you need to support ordinals outside the latin-1 range). But it seems a valid use case. Examples where rich comparisons were needed include: Effcient way to find longest duplicate string for Python (From Programming Pearls) - https://stackoverflow.com/a/13574862/364696 (which provides a side-by-side comparison of code using buffer and memoryview, and memoryview lost, badly) strcmp for python or how to sort substrings efficiently (without copy) when building a suffix array - https://stackoverflow.com/q/2282579/364696 (a case where they needed to sort based on potentially huge suffixes of huge strings, and didn't want to end up copying all of them) -- ___ Python tracker <https://bugs.python.org/issue20399> ___
[issue20399] Comparison of memoryview
Josh Rosenberg added the comment: The lack of support for the rich comparison operators on even the most basic memoryviews (e.g. 'B' format) means that memoryview is still a regression from some of the functionality buffer offered back in Python 2 ( https://stackoverflow.com/a/13574862/364696 ); you either need to convert back to bytes (losing the zero-copy behavior) or hand-write a comparator of your own to allow short-circuiting (which thanks to sort not taking cmp functions anymore, means you need to write it, then wrap it with functools.cmp_to_key if you're sorting, not just comparing individual items). While I'll acknowledge it gets ugly to try to support every conceivable format, it seems like, at the very least, we could provide the same functionality as buffer for 1D contiguous memoryviews in the 'B' and 'c' formats (both of which should be handleable by a simple memcmp). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue20399> ___
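The gap described above can be demonstrated directly: 'B'-format memoryviews support equality, but ordering comparisons raise TypeError, unlike Python 2's buffer.

```python
# Equality works on memoryviews without copying, but ordering does not.
m1 = memoryview(b'abc')
m2 = memoryview(b'abd')

print(m1 == m2)   # False: equality is supported zero-copy
try:
    m1 < m2
except TypeError:
    print('ordering not supported')

# The workaround from the message: compare as bytes, giving up zero-copy.
print(bytes(m1) < bytes(m2))  # True
```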
[issue35757] slow subprocess.Popen(..., close_fds=True)
Josh Rosenberg added the comment: Others can correct me if I'm wrong, but I'm fairly sure 2.7 isn't making changes unless they fix critical or security-related bugs. The code here is suboptimal, but it's already been fixed in Python 3 (in #8052), as part of a C accelerator module (that reduces the risk of race conditions and other conflicts your Python level fix entails). Unless someone corrects me, I'll close this as "Won't Fix". -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35757> ___
[issue35701] [uuid] 3.8 breaks weak references for UUIDs
Josh Rosenberg added the comment: The UUID module documentation (and docstring) begin with: "This module provides immutable UUID objects" Immutable is a stronger guarantee than __slots__ enforces already, so the documentation already ruled out adding arbitrary attributes to UUID (and the __setattr__ that unconditionally raised TypeError('UUID objects are immutable') supported that). Given the behavior hasn't changed in any way that contradicts the docs, nor would it affect anyone who wasn't intentionally working around the __setattr__ block, I don't feel a need to mention the arbitrary attribute limitation. It's fine to leave in the What's New note (it is a meaningful memory savings for applications using lots of UUIDs), but the note can simplify to just: """uuid.UUID now uses __slots__ to reduce its memory footprint.""" -- ___ Python tracker <https://bugs.python.org/issue35701> ___
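The documented immutability guarantee is easy to observe; the exact exception type depends on the Python version (TypeError from the old __setattr__ block, AttributeError once __slots__ enforces it), so this sketch catches both.

```python
# UUID objects reject new attributes; the raising mechanism varies by
# version (__setattr__ before 3.8, __slots__ as well afterwards).
import uuid

u = uuid.uuid4()
try:
    u.extra = 1
except (TypeError, AttributeError) as e:
    print('rejected:', e)
```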
[issue35701] [uuid] 3.8 breaks weak references for UUIDs
Josh Rosenberg added the comment: David, the What's New note about weak references no longer being possible should be removed as part of this change. I'm not sure the note on arbitrary attributes no longer being addable is needed either (__setattr__ blocked that beforehand, it's just even more impossible now). -- ___ Python tracker <https://bugs.python.org/issue35701> ___
[issue29871] Enable optimized locks on Windows
Josh Rosenberg added the comment: I assume you meant #35662 (based on the superseder note in the history). -- ___ Python tracker <https://bugs.python.org/issue29871> ___
[issue35712] Make NotImplemented unusable in boolean context
New submission from Josh Rosenberg : I don't really expect this to go anywhere until Python 4 (*maybe* 3.9 after a deprecation period), but it seems like it would have been a good idea to make NotImplementedType's __bool__ explicitly raise a TypeError (rather than leaving it unset, so NotImplemented evaluates as truthy). Any correct use of NotImplemented per its documented intent would never evaluate it in a boolean context, but rather use identity testing, e.g. back in the Py2 days, the canonical __ne__ delegation to __eq__ for any class should be implemented as something like:

    def __ne__(self, other):
        equal = self.__eq__(other)
        return equal if equal is NotImplemented else not equal

Problem is, a lot of folks would make mistakes like doing:

    def __ne__(self, other):
        return not self.__eq__(other)

which silently returns False when __eq__ returns NotImplemented, rather than returning NotImplemented and allowing Python to check the mirrored operation. Similar issues arise when hand-writing the other rich comparison operators in terms of each other. It seems like, given NotImplemented is a sentinel value that should never be evaluated in a boolean context, at some point it might be nice to explicitly prevent it, to avoid errors like this. Main argument against it is that I don't know of any other type/object that explicitly makes itself unevaluable in a boolean context, so this could be surprising if someone uses NotImplemented as a sentinel unrelated to its intended purpose and suffers the problem. -- messages: 333421 nosy: josh.r priority: normal severity: normal status: open title: Make NotImplemented unusable in boolean context type: behavior ___ Python tracker <https://bugs.python.org/issue35712> ___
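A runnable version of the canonical delegation quoted above, on an illustrative class: NotImplemented is tested by identity and never evaluated in a boolean context.

```python
# Correct __ne__ delegation: check NotImplemented by identity so Python can
# fall back to the reflected operation on the other operand.
class Point:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x

    def __ne__(self, other):
        equal = self.__eq__(other)
        return equal if equal is NotImplemented else not equal

print(Point(1) != Point(2))       # True
print(Point(1).__ne__(object()))  # NotImplemented: Python tries the mirror op
```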
[issue35698] Division by 2 in statistics.median
Josh Rosenberg added the comment: vstinner: The problem isn't the averaging, it's the type inconsistency. In both examples (median([1]), median([1, 1])), the median is unambiguously 1 (no actual averaging is needed; the values are identical), yet it gets converted to 1.0 only in the latter case. I'm not sure it's possible to fix this though; right now, there is consistency between two cases:

1. When the length is odd, you get the median by identity (so type and value are unchanged).
2. When the length is even, you get the median by adding the two middle values and dividing by 2 (so for ints, the result is always float).

A fix would add yet another layer of complexity:

1. When the length is odd, you get the median by identity (so type and value are unchanged).
2. When the length is even:
   a. If the two middle values are equal (possibly only if they have equal types as well, to resolve the issue with [1, 1.0] or [1, True]), return the first of the two middle values (median by identity, as in #1).
   b. Otherwise, you get the median by adding and dividing by 2.

And note the type checking required in 2a just to make it that consistent. Even if we accepted that, we'd pretty quickly get into a debate over whether median([3, 5]) should try to return 4 instead of 4.0, given that the median is representable in the source type (which would further damage consistency). If anything, I think the best design would have been to *always* include a division step (so odd-length cases performed middle_elem / 1, while even-length cases performed (middle_elem1 + middle_elem2) / 2), making the behavior consistent regardless of odd vs. even input length, but that ship has probably sailed, given the documented behavior specifically notes that the precise middle data point is itself returned in the odd case.

I think the solution for people concerned about this is to explicitly convert the int values to be median-ed to fractions.Fraction (or decimal.Decimal) ahead of time, so floating point math never gets involved and the return type is consistent regardless of length. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35698> ___
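The type inconsistency, and the Fraction workaround suggested above, in a short sketch:

```python
from fractions import Fraction
from statistics import median

print(median([1]))     # odd length: the middle value itself -> 1 (an int)
print(median([1, 1]))  # even length: (1 + 1) / 2 -> 1.0 (a float)

# Converting the inputs up front keeps the result type consistent:
print(median([Fraction(1), Fraction(1)]))  # Fraction(1, 1)
```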
[issue35700] Place, Pack and Grid should return the widget
Josh Rosenberg added the comment: Closing as rejected; to my knowledge, *no* built-in Python method both mutates an object and returns the object just mutated, precisely because:

1. It allows for chaining that leads fairly quickly to unreadable code (Python is not Perl/Ruby).
2. It creates doubt as to whether the original object was mutated (if list.sort returned a sorted list, it would be unclear whether the original list was sorted in place as well, or whether a new list was returned; sortedlist = unsortedlist.sort() might give an inaccurate impression of what was going on).

Zachary's example of using top-level functions to do the work instead is basically the same practicality compromise that sorted makes in relation to list.sort. -- nosy: +josh.r resolution: -> rejected stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue35700> ___
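The convention being defended here, in three lines: in-place mutators return None, while the top-level counterparts return a new object:

```python
lst = [3, 1, 2]
assert lst.sort() is None              # in-place mutators return None...
assert lst == [1, 2, 3]                # ...and the original is what changed
assert sorted([3, 1, 2]) == [1, 2, 3]  # sorted() returns a new list instead
```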
[issue35701] 3.8 needlessly breaks weak references for UUIDs
New submission from Josh Rosenberg : I 100% agree with the aim of #30977 (reduce uuid.UUID() memory footprint), but it broke compatibility for any application that was weak referencing UUID instances (which seems a reasonable thing to do; a strong reference to a UUID can be stored in a single master container or passed through a processing pipeline, while also keying a WeakKeyDictionary with cached supplementary data). I specifically noticed this because I was about to do that very thing in a processing flow, then noticed UUIDs in 3.6 were a bit heavyweight, memory-wise, went to file a bug on memory usage suggesting __slots__, and discovered someone had already done it for me. Rather than break compatibility in 3.8, why not simply include '__weakref__' in the __slots__ listing? It would also remove the need for a What's New level description of the change, since the description informs people that:

1. Instances can no longer be weak-referenced (which adding __weakref__ would undo).
2. Instances can no longer add arbitrary attributes (which was already the case in terms of documented API, programmatically enforced via a __setattr__ override, so it seems an unnecessary thing to highlight outside of Misc/NEWS).

The cost of changing __slots__ from:

    __slots__ = ('int', 'is_safe')

to:

    __slots__ = 'int', 'is_safe', '__weakref__'

would only be 4-8 bytes (for 64-bit Python, the total cost of object + int would go from 100 to 108 bytes, still about half the pre-__slots__ cost of 212 bytes), and would avoid breaking any code that relies on being able to weak reference UUIDs. I've marked this as a release blocker for the time being, because if 3.8 actually releases with this change, it will cause back-compat issues that might prevent people relying on UUID weak references from upgrading their code.
-- components: Library (Lib) keywords: 3.7regression, easy messages: 38 nosy: Nir Soffer, josh.r, serhiy.storchaka, taleinat, vstinner, wbolster priority: release blocker severity: normal stage: needs patch status: open title: 3.8 needlessly breaks weak references for UUIDs type: behavior ___ Python tracker <https://bugs.python.org/issue35701> ___
[issue35657] multiprocessing.Process.join() ignores timeout if child process use os.exec*()
Josh Rosenberg added the comment: Looks like the cause of the change was when os.pipe was changed to create non-inheritable pipes by default; if I monkey-patch multiprocessing.popen_fork.Popen._launch to use os.pipe2(0) instead of os.pipe() to get inheritable descriptors, or just clear FD_CLOEXEC in the child with fcntl.fcntl(child_w, fcntl.F_SETFD, 0), the behavior returns to Python 2's behavior. The problem is caused by the mismatch in lifetimes between the pipe fd and the child process itself; normally the pipe lives as long as the child process (it's never actually touched in the child process at all, so it just dies with the child), but when exec gets involved, the pipe is closed long before the child ends. The code in Popen.wait that is commented with "This shouldn't block if wait() returned successfully" is probably the issue; wait() first waits on the parent side of the pipe fd, which returns immediately when the child execs and the pipe is closed. The code assumes the poll on the process itself can be run in blocking mode (since the process should have ended already), but this assumption is wrong of course. Possible solutions:

1. No code changes; document that exec in worker processes is unsupported (use subprocess, possibly with a preexec_fn, for this use case).

2. Precede the call to process_obj._bootstrap() in the child with fcntl.fcntl(child_w, fcntl.F_SETFD, 0) to clear the CLOEXEC flag on the child's descriptor, so the file descriptor remains open in the child post-exec. Using os.pipe2(0) instead of os.pipe() in _launch would also work and restore the precise 3.3-and-earlier behavior, but it would reintroduce race conditions with parent threads, so it's better to limit the scope to the child process alone, for the child's copy of the fd alone.

3. Change multiprocessing.popen_fork.Popen.wait to use os.WNOHANG for all calls with a non-None timeout (not just timeout=0.0), rather than trusting multiprocessing.connection.wait's return value (which only says whether the pipe is closed, not whether the process has exited). Problem is, this would just change the behavior from waiting for the lifetime of the child no matter what, to waiting until the exec and then returning immediately, well before the timeout; it might also introduce race conditions if the fd registers as closed before the process has fully exited. Point is, this approach would likely require a lot of subtle tweaks to make it work.

I'm in favor of either #1 or #2. #2 feels like intentionally opening a resource leak on the surface, but I think it's actually fine, since we already signed up for a file descriptor that lives for the life of the process; the fact that the process is exec-ed seems sort of irrelevant. -- keywords: +3.4regression ___ Python tracker <https://bugs.python.org/issue35657> ___
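The fcntl mechanism from option #2, sketched on a bare pipe (POSIX-only; this demonstrates the flag change itself, not the actual multiprocessing patch):

```python
import fcntl
import os

r, w = os.pipe()                  # non-inheritable (FD_CLOEXEC set) since 3.4
assert not os.get_inheritable(w)

fcntl.fcntl(w, fcntl.F_SETFD, 0)  # clear FD_CLOEXEC: the fd now survives exec
assert os.get_inheritable(w)

os.close(r)
os.close(w)
```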
[issue35657] multiprocessing.Process.join() ignores timeout if child process use os.exec*()
Josh Rosenberg added the comment: I don't know what triggered the change, but I strongly suspect this is not a supported use of the multiprocessing module; Process is for worker processes (still running Python), and it has a lot of coordination machinery set up between parent and child (for use by, among other things, join) that exec severs rather abruptly. Launching unrelated child processes is what the subprocess module is for. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35657> ___
[issue35588] Speed up mod/divmod/floordiv for Fraction type
Josh Rosenberg added the comment: divmod imposes higher fixed overhead in exchange for operating more efficiently on larger values. Given the differences are small either way, and using divmod reduces scalability concerns for larger values (which are more likely to occur in code that delays normalization), I'd be inclined to stick with the simpler divmod-based implementation. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35588> ___
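For reference, the divmod-based approach computes both results from a single division (the specific values here are mine, just to show the relationship):

```python
from fractions import Fraction

a, b = Fraction(7, 3), Fraction(2, 5)
q, r = divmod(a, b)               # one pass yields both quotient and remainder
assert (q, r) == (a // b, a % b)  # equivalent to the two separate operations
assert q == 5 and r == Fraction(1, 3)
```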
[issue35338] set union/intersection/difference could accept zero arguments
Josh Rosenberg added the comment: Given the "feature" in question isn't actually an intended feature (just an accident of how unbound methods work), I'm closing this. We're not going to try to make methods callable without self. -- resolution: -> wont fix stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue35338> ___
[issue35438] Cleanup extension functions using _PyObject_LookupSpecial
Josh Rosenberg added the comment: Agreed with everything in Serhiy's comments. This patch disregards why _PyObject_LookupSpecial and the various _Py_IDENTIFIER related stuff was created in the first place (to handle a non-trivial task efficiently/correctly) in favor of trying to avoid C-APIs that are explicitly okay to use for the CPython standard extensions. The goal is a mistake in the first place; no patch fix will make the goal correct. Closing as not a bug. -- resolution: -> not a bug stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue35438> ___
[issue35438] Extension modules using non-API functions
Josh Rosenberg added the comment: Batteries-included extension modules aren't limited to the public and/or limited API; they use tons of undocumented internal APIs (everything to do with Py_IDENTIFIERs being an obvious and frequently used non-public API). _PyObject_LookupSpecial is necessary to look up special methods on the class of an instance (bypassing the instance itself) when no C level slot is associated with the special method (e.g. the math module using it to look up __ceil__ to implement math.ceil). Sure, each of these modules could reimplement it from scratch, but I'm not seeing the point in doing so. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35438> ___
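The __ceil__ lookup mentioned above is observable from pure Python: math.ceil consults the special method on the type, bypassing the instance, which is exactly what _PyObject_LookupSpecial provides:

```python
import math

class Half:
    def __ceil__(self):      # found via the type, like all special methods
        return 1

h = Half()
h.__ceil__ = lambda: 99      # an instance attribute is ignored...
assert math.ceil(h) == 1     # ...because the lookup bypasses the instance
```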
[issue35434] Wrong bpo linked in What's New in 3.8
New submission from Josh Rosenberg : https://docs.python.org/3.8/whatsnew/3.8.html#optimizations begins with: shutil.copyfile(), shutil.copy(), shutil.copy2(), shutil.copytree() and shutil.move() use platform-specific “fast-copy” syscalls on Linux, macOS and Solaris in order to copy the file more efficiently. ... more explanation ... (Contributed by Giampaolo Rodola’ in bpo-25427.) That's all correct, except bpo-25427 is about removing the pyvenv script; it should be referencing bpo-33671. -- assignee: docs@python components: Documentation keywords: easy messages: 331264 nosy: docs@python, giampaolo.rodola, josh.r priority: low severity: normal status: open title: Wrong bpo linked in What's New in 3.8 versions: Python 3.8 ___ Python tracker <https://bugs.python.org/issue35434> ___
[issue11107] Cache constant "slice" instances
Change by Josh Rosenberg : -- versions: +Python 3.8 -Python 3.5 ___ Python tracker <https://bugs.python.org/issue11107> ___
[issue35338] set union/intersection/difference could accept zero arguments
Josh Rosenberg added the comment: set.union() without constructing the set you call union on only happens to work for the set.union(a) case because `a` is already a set. union takes arbitrary iterables, not just sets, and you're just cheating by explicitly passing `a` as the expected self argument. If you'd set `a = [1, 2]` (a list, not a set), set.union(a) would fail, because set.union(a) was only working by accident of a being interpreted as self; any such use is misuse.

Point is, the zero-args case isn't a unique corner case;

    args = ([1, 2], ANY OTHER ITERABLES HERE)
    set.union(*args)

fails too, because the first argument is interpreted as self, and must be a set for this to work. SilentGhost's solution of constructing the set before union-ing via set().union(*args) is the correct solution; it's free of corner cases, removing the specialness of the first element in args (because self is passed in correctly), and has no trouble with empty args. intersection is the only interesting case here, where preconstruction of the empty set doesn't work, because that would render the result the empty set unconditionally. The solution there is set(args[0]).intersection(*args) (or *args[1:]), but that's obviously uglier.

I'm -1 on making any changes to set.union to support this misuse case. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35338> ___
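A sketch of the misuse and of the recommended forms:

```python
a = [1, 2]                # a list, not a set
try:
    set.union(a, [3, 4])  # 'a' lands in the self slot -> TypeError
except TypeError:
    print("set.union needs an actual set as its first argument")

args = ([1, 2], [3, 4])
# union: seed with an empty set, so every element of args is an ordinary arg
assert set().union(*args) == {1, 2, 3, 4}
# intersection: seed with the first iterable instead (set() would always win)
assert set(args[0]).intersection(*args[1:]) == set()
```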
[issue19865] create_unicode_buffer() fails on non-BMP strings on Windows
Change by Josh Rosenberg : -- keywords: -3.2regression ___ Python tracker <https://bugs.python.org/issue19865> ___
[issue19865] create_unicode_buffer() fails on non-BMP strings on Windows
Change by Josh Rosenberg : -- keywords: +3.2regression versions: +Python 3.6, Python 3.7, Python 3.8 ___ Python tracker <https://bugs.python.org/issue19865> ___
[issue35314] fnmatch failed with leading caret (^)
Josh Rosenberg added the comment: Finished typing this while Serhiy was closing, but just for further explanation: This isn't a bug. fnmatch provides "shell-style" wildcards, but that doesn't mean it supports every shell's extensions to the globbing syntax. It doesn't even claim support for full POSIX globbing syntax. The docs explicitly specify support for only four forms:

    *
    ?
    [seq]
    [!seq]

There is no support for [^seq]; [^seq] isn't even part of POSIX globbing per glob(7): "POSIX has declared the effect of a wildcard pattern "[^...]" to be undefined." -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35314> ___
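The supported negation form is [!seq]; inside a set, '^' is treated as just another literal character:

```python
import fnmatch

assert fnmatch.fnmatch('a', '[!b]')      # documented negation form works
assert not fnmatch.fnmatch('b', '[!b]')
assert fnmatch.fnmatch('^', '[^b]')      # '^' matches literally here...
assert not fnmatch.fnmatch('a', '[^b]')  # ...no negation is performed
```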
[issue35303] A reference leak in _operator.c's methodcaller_repr()
Josh Rosenberg added the comment: This is completely fixed, right? Just making sure there is nothing left to be done to close the issue. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35303> ___
[issue35273] 'eval' in generator expression behave different in dict from list
Josh Rosenberg added the comment: The "bug" is the expected behavior for 2.7, as previously noted, and does not exist on Python 3 (where list comprehensions follow the same rules as generator expressions for scoping), where NameErrors are raised consistently. -- nosy: +josh.r resolution: -> not a bug versions: +Python 2.7 -Python 3.6 ___ Python tracker <https://bugs.python.org/issue35273> ___
[issue34805] Explicitly specify `MyClass.__subclasses__()` returns classes in definition order
Josh Rosenberg added the comment: Keep in mind, had this guarantee been in place in 3.4, the improvement in 3.5 couldn't have been made, and issue17936 might have been closed and never addressed, even once dicts were ordered, simply because we never rechecked it. It makes the whole "potential downside" more obvious, because we would have paid that price not so long ago. Knowing that 3.5 improved by breaking this guarantee was part of what made me cautious here. -- ___ Python tracker <https://bugs.python.org/issue34805> ___
[issue34805] Explicitly specify `MyClass.__subclasses__()` returns classes in definition order
Josh Rosenberg added the comment: I wrote the response to the OP's use case before I saw your response; it wasn't really intended as an additional critique of the proposed change or a counterargument to your post, just a note that the required behavior could be obtained on all versions of Python via metaclasses, including on 3.5. I have no specific plans to rewrite typeobject.c, nor to make a C-implemented WeakSet. I'm just leery of adding language guarantees that limit future development when they:

1. Provide very little benefit (I doubt one package in 10,000 even uses __subclasses__, let alone relies on its ordering).
2. Provide a benefit achievable without herculean effort using existing tools (metaclasses can provide the desired behavior with minimal effort, at the trivial cost of an additional side-band dict on the root class).

If the guarantee never limits a proposed change, then the best case scenario is we provided a guarantee that benefits almost no one (guaranteed upside minimal). But if it limits a proposed change, we might lose out on a significant improvement in performance, code maintainability, what have you (much larger potential downside). I'm just not seeing enough of a benefit to justify the potential cost. -- ___ Python tracker <https://bugs.python.org/issue34805> ___
[issue34805] Explicitly specify `MyClass.__subclasses__()` returns classes in definition order
Josh Rosenberg added the comment: I'm also a little skeptical of the OP's proposed use case for other reasons. In any circumstance other than "all classes are defined in the same module", you can't really make useful guarantees about subclass definition order, because:

1. If the converters are defined in multiple modules in a single package, the module with IntConverter could be imported first explicitly, and now BoolConverter will come second.
2. If all the provided converters occur in a single monolithic module, and some other package tries to make a converter for their own int subclass, well, IntConverter is already first in the list of subclasses, so the other package's converter will never be called (unless it's for the direct subclass of int, rather than a grandchild of int, but that's an implementation detail of the OP's project).

Essentially, to write such a class hierarchy properly, you'd need to rejigger the ordering each time a class was registered, such that any converter for a parent class was pushed until after the converters for all of its descendant classes (and if there is multiple inheritance involved, you're doomed). Even ignoring all that, their use case doesn't require explicit registration if they don't want it to. By making a simple metaclass for the root class, the metaclass's __new__ can perform registration on the descendant class's behalf, either with the definition-time ordering of the current design, or with the more complicated rejiggering I described that would be necessary to ensure parent classes are considered after child classes (assuming no multiple inheritance). -- ___ Python tracker <https://bugs.python.org/issue34805> ___
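The metaclass approach mentioned above, as a minimal sketch (all names here are mine, not from the OP's project):

```python
class ConverterMeta(type):
    registry = []                      # definition-order side-band list

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        if bases:                      # skip the root class itself
            mcls.registry.append(cls)  # registered as each class is defined
        return cls

class Converter(metaclass=ConverterMeta):
    pass

class IntConverter(Converter): pass
class BoolConverter(IntConverter): pass

assert ConverterMeta.registry == [IntConverter, BoolConverter]
```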
[issue34805] Explicitly specify `MyClass.__subclasses__()` returns classes in definition order
Josh Rosenberg added the comment: First off, the OP's original case seems like a use case for functools.singledispatch. Not really related to the problem, just thought I'd mention it.

Secondly, are we sure we want to make such a guarantee? That restricts the underlying storage to ordered types (list/dict; possibly tuple at the cost of making modifications slightly more expensive), or an unordered type with additional ordering layered on it (like old-school OrderedDict). That ties our hands in the future. For example, it seems like it would be a perfectly reasonable approach for the internal collection of subclasses to be implemented as a weakref.WeakSet (some future version of it implemented in C, rather than the current Python-layer version) so as to reduce code duplication and improve handling when a subclass disappears. Right now, tp_subclasses is a dict keyed by the raw memory address of the subclass (understandable, but eww), with a value of a weakref to the subclass itself. There is a ton of custom code involved in handling this (e.g. the dict only self-cleans because the dealloc for classes explicitly removes the subclass from the parent classes, yet every use of the dict still has to assume weakrefs may have gone dead anyway, because of reentrancy issues; these are solved problems in WeakSet, which hides all the complexity from the user). Being able to use WeakSets would mean a huge amount of special-purpose code in typeobject.c could go away, but guaranteeing ordering would make that more difficult (it would require providing an ordering guarantee for WeakSet, which, being built on set, would likely require ordering guarantees for sets in general, or changing WeakSet to be built on dicts).

There is also (at least) one edge case that would need to be fixed (based on a brief skim of the code). type_set_bases (which handles assignment to __bases__, AFAICT, admittedly a niche use case) simplified its own implementation by making the process of changing __bases__ remove the class as a subclass of all of its original bases, then add it as a subclass of the new bases. This is done even if there are overlaps in the bases, and even if the new bases are the same. Minimal repro:

    >>> class A: pass
    >>> class B(A): pass
    >>> class C(A): pass
    >>> A.__subclasses__()  # Appear in definition order
    [__main__.B, __main__.C]
    >>> B.__bases__ = B.__bases__  # Should be a no-op...
    >>> A.__subclasses__()  # But oops, order changed
    [__main__.C, __main__.B]

I'm not going to claim this is common or useful (I've done something like this exactly once, interactively, while making an OrderedCounter from OrderedDict and Counter back before dicts were ordered; I got the inheritance order wrong and reversed it after the fact), but making the guarantee would be more than just stating it; we'd either have to complicate the code to back it up, or qualify the guarantee with some weird, possibly CPython-specific details. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34805> ___
[issue35182] Popen.communicate() breaks when child closes its side of pipe but not exits
Josh Rosenberg added the comment: Hmm... Correction to my previous post. communicate itself has a test for "if self._communication_started and input:" that raises an error if it passes, so the second call to communicate can only be passed None/empty input. And _communicate only explicitly closes self.stdin when input is falsy and _communication_started is False, so the required behavior right now is:

1. The first call *may* pass input.
2. A second call must not pass (non-empty) input under any circumstance.

So I think we're actually okay on the code for stdin, but it would be a good idea to document that input *must* be None on all but the first call, and that the input passed to the first call is cached such that as long as at least one call to communicate completes without a TimeoutExpired (and stdin isn't explicitly closed), it will all be sent. Sorry for the noise; I should have rechecked communicate itself, not just _communicate. -- ___ Python tracker <https://bugs.python.org/issue35182> ___
[issue35182] Popen.communicate() breaks when child closes its side of pipe but not exits
Josh Rosenberg added the comment: Sounds like the solution you'd want here is to change each if check in _communicate, so instead of:

    if self.stdout:
        selector.register(self.stdout, selectors.EVENT_READ)
    if self.stderr:
        selector.register(self.stderr, selectors.EVENT_READ)

it does:

    if self.stdout and not self.stdout.closed:
        selector.register(self.stdout, selectors.EVENT_READ)
    if self.stderr and not self.stderr.closed:
        selector.register(self.stderr, selectors.EVENT_READ)

The `if self.stdin and input:` check would also have to change. Right now it's buggy in a related, but far more complex way. Specifically, if you call communicate with input the first time:

1. If some of the input is sent but not all, and the second time you call communicate you rely on the (undocumented, but necessary for consistency) input caching and don't pass input at all, it won't register the stdin handle with the selector (and in fact will explicitly close the stdin handle), and the remaining cached data won't be sent. If you try to pass some other non-empty input, it just ignores it and sends whatever remains in the cache (and fails out as in the stdout/stderr case if the cached data was sent completely before the timeout).

2. If all of the input was sent on the first call, you *must* pass input=None, or you'll die trying to register self.stdin with the selector.

The fix for this would be to either:

1. Follow the pattern for self.stdout/stderr (adding "and not self.stdin.closed"), and explicitly document that repeated calls to communicate must pass the exact same input each time (and optionally validate this in the _save_input function, which as of right now just ignores the input if a cache already exists); if input is passed the first time, incompletely transmitted, and not passed the second time, the code will error as in the OP's case, but it will have violated the documented requirements (ideally the error would be a little clearer though), or

2. Change the code so populating the cache (if not already populated) is the first step, and replace all subsequent references to input with references to self._input (for the setup tests, also checking whether self._input_offset >= len(self._input), so it doesn't register for notifications on self.stdin if all the input has been sent), making it legal to pass input=None on a second call and rely on the first call to communicate having cached it. It would still ignore new input values on subsequent calls, but at least it would behave in a sane way (not closing self.stdin despite having unsent cached data, then producing a confusing error several steps removed from the actual problem).

Either way, the caching behavior for input should be properly documented; we clearly specify that output is preserved after a timeout and retrying communicate ("If the process does not terminate after timeout seconds, a TimeoutExpired exception will be raised. Catching this exception and retrying communication will not lose any output."), but we say nothing about input, and right now the behavior is the somewhat odd and hard to express: "Retrying a call to communicate when the original call was passed non-None/non-empty input requires subsequent call(s) to pass non-None, non-empty input. The input on said subsequent calls is otherwise ignored; only the unsent remainder of the original input is sent. Also, it will just fail completely if you pass non-empty input and it turns out the original input was sent completely on the previous call, in which case you *must* call it with input=None." It might also be worth changing the selectors module to raise a more obvious exception when register is passed a closed file-like object, but given it only requires non-integer fileobjs to have a .fileno() method, adding a requirement for a "closed" attribute/property could break other code.
-- nosy: +josh.r stage: -> needs patch ___ Python tracker <https://bugs.python.org/issue35182> ___
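The documented retry pattern for output, extended with the input caveat described above (a POSIX example using cat; the timeout value is arbitrary and in the happy path never expires):

```python
import subprocess

p = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
try:
    out, _ = p.communicate(input=b'data', timeout=30)
except subprocess.TimeoutExpired:
    # Retry with input=None: the unsent remainder of the original input is
    # resent from the cache; passing fresh input here is ignored or errors,
    # per the behavior described above.
    out, _ = p.communicate()
assert out == b'data'
```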
[issue35180] Ctypes segfault or TypeError tested for python2.7 and 3
Josh Rosenberg added the comment: As soon as you use ctypes, you sign up for all the security vulnerabilities, including denial of service, buffer overrun, use-after-free, etc. that plain old C programs are subject to. In this case, it's just a NULL pointer dereference (read: segfault in most normal cases), but in general, if you don't use ctypes with the same discipline as you would actual C code (at best it provides a little in the way of automatic memory management), you're subject to all the same problems. Side-note: When replying to e-mails, don't include the quotes from the e-mail you're replying to; it just clutters the tracker. -- ___ Python tracker <https://bugs.python.org/issue35180> ___
[issue35180] Ctypes segfault or TypeError tested for python2.7 and 3
Josh Rosenberg added the comment: The TypeError on Py3 would be because functions taking c_char_p need bytes-like objects, not str, on Python 3. '%s' % directory is pointless when directory is a str; instead you need to encode it to a bytes-like object, e.g. opendir(os.fsencode(directory)) (os.fsencode is Python 3 specific; plain str works fine on Py 2). Your segfault isn't occurring when you load dirfd, it occurs when you call it on the result of opendir, when opendir returned NULL on failure (due to the non-existent directory you call it with). You didn't check the return value, and end up doing flagrantly illegal things with it. In neither case is this a bug in Python; ctypes lets you do evil things that break the rules, and if you break the rules the wrong way, segfaults are to be expected. Fix your argument types (for Py3), check your return values (for Py2). -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue35180> ___
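A sketch of the corrected call on Linux (loading libc via CDLL(None) is platform-dependent, and the directory path is deliberately nonexistent):

```python
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)    # Linux: symbols from libc
libc.opendir.argtypes = [ctypes.c_char_p]   # bytes, not str, on Python 3
libc.opendir.restype = ctypes.c_void_p      # NULL comes back as None

d = libc.opendir(os.fsencode('/no/such/directory'))
if d is None:                               # check the return value!
    print("opendir failed:", os.strerror(ctypes.get_errno()))
```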
[issue35175] Builtin function all() is handling dict() types in a weird way.
Change by Josh Rosenberg : -- resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue35175> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35114] ssl.RAND_status docs describe it as returning True/False; actually returns 1/0
New submission from Josh Rosenberg : The ssl.RAND_status online docs say (with code format on True/False): "Return True if the SSL pseudo-random number generator has been seeded with ‘enough’ randomness, and False otherwise." This is incorrect; the function actually returns 1 or 0 (and the docstring agrees). Fix can be one of: 1. Update docs to be less specific about the return type (use true/false, not True/False) 2. Update docs to match docstring (which specifically says 1/0, not True/False) 3. Update implementation and docstring to actually return True/False (replacing PyLong_FromLong with PyBool_FromLong and changing docstring to use True/False to match online docs) #3 involves a small amount of code churn, but it also means we're not needlessly replicating a C API's use of int return values when the function is logically bool (there is no error status for the C API AFAICT, so it's not like returning int gains us anything on flexibility). bool would be mathematically equivalent to the original 1/0 return value in the rare cases someone uses it mathematically. -- assignee: docs@python components: Documentation, SSL messages: 328917 nosy: docs@python, josh.r priority: low severity: normal status: open title: ssl.RAND_status docs describe it as returning True/False; actually returns 1/0 type: behavior ___ Python tracker <https://bugs.python.org/issue35114> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
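Option 3 is backward compatible for arithmetic callers because bool is a subclass of int; a quick check of that equivalence in plain Python (standing in for the C-level PyBool_FromLong change):

```python
# bool is a subclass of int, so returning True/False instead of 1/0
# keeps any caller doing arithmetic on the result working unchanged.
old_style = 1                  # what the function returns today
new_style = bool(old_style)    # what option 3 would return
total = new_style + new_style  # bools still behave as 1/0 in math
```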
[issue35098] Deleting __new__ does not restore previous behavior
Change by Josh Rosenberg : -- resolution: -> duplicate stage: -> resolved status: open -> closed superseder: -> Assigning and deleting __new__ attr on the class does not allow to create instances of this class ___ Python tracker <https://bugs.python.org/issue35098> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35043] functools.reduce doesn't work properly with itertools.chain
Josh Rosenberg added the comment: Blech. Copy'n'paste error in last post: a = list(itertools.chain.from_iterable(*my_list)) should be: a = list(itertools.chain.from_iterable(my_list)) (Note removal of *, which is the whole point of from_iterable) -- ___ Python tracker <https://bugs.python.org/issue35043> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35043] functools.reduce doesn't work properly with itertools.chain
Josh Rosenberg added the comment: Your example code doesn't behave the way you claim. my_list isn't changed, and `a` is a chain generator, not a list (without a further list wrapping). In any event, there is no reason to involve reduce here; chain's varargs support already handles what you're trying to do:

    a = list(itertools.chain(*my_list))

or, if you prefer to avoid unnecessary unpacking:

    a = list(itertools.chain.from_iterable(*my_list))

Either way, a will be [1, 2, 3, 4], and my_list will be unchanged, with no wasteful use of reduce. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35043> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
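A runnable check of both spellings (using from_iterable without unpacking, per the correction above):

```python
import itertools

my_list = [[1, 2], [3, 4]]

# Both flatten one level and leave my_list untouched; note that
# from_iterable takes the iterable itself, not unpacked varargs.
via_unpacking = list(itertools.chain(*my_list))
via_from_iterable = list(itertools.chain.from_iterable(my_list))
```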
[issue35006] itertools.combinations has wrong type when using the typing package
Josh Rosenberg added the comment: Looks like a bug in the typeshed (which mypy depends on to provide typing info for most of the stdlib, which isn't explicitly typed). Affects both combinations and combinations_with_replacement from a quick check of the code: https://github.com/python/typeshed/blob/94485f9e4f86df143801c1810a58df993b2b79b3/stdlib/3/itertools.pyi#L103 Presumably this should be opened on the typeshed tracker. https://github.com/python/typeshed/issues -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue35006> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34947] inspect.getclosurevars() does not get all globals
Josh Rosenberg added the comment: Problem: The variables from the nested functions (which comprehensions are effectively a special case of) aren't actually closure variables for the function being inspected. Allowing recursive identification of all closure variables might be helpful in some contexts, but you wouldn't want it to be the only behavior; it's easier to convert a non-recursive solution to a recursive solution than the other way around. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34947> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
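A small illustration of the non-recursive behavior described (the outer/inner names are hypothetical):

```python
import inspect

g = 10

def outer():
    def inner():
        return g  # a global referenced only by the nested function
    return inner

# getclosurevars only examines names the given function itself uses, so
# outer reports no globals; inspecting inner directly does find g.
outer_vars = inspect.getclosurevars(outer)
inner_vars = inspect.getclosurevars(outer())
```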
[issue19270] Document that sched.cancel() doesn't distinguish equal events and can break order
Josh Rosenberg added the comment: Victor: "I would be interested of the same test on Windows." Looks like someone performed it by accident, and filed #34943 in response (because time.monotonic() did in fact return the exact same time twice in a row on Windows). -- ___ Python tracker <https://bugs.python.org/issue19270> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34889] int.to_bytes and int.from_bytes should default to the system byte order like the struct module does
Josh Rosenberg added the comment: to_bytes and from_bytes aren't remotely related to native primitive types, struct is. If the associated lengths aren't 2, 4 or 8, there is no real correlation with system level primitives, and providing these defaults makes it easy to accidentally write non-portable code. Providing a default might make sense, but if you do, it should be a fixed default (so output is portable). Making it depend on the system byte order for no real reason aside from "so I can do struct-like things faster in a non-struct way" is not a valid reason to make a behavior both implicit and inconsistent. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34889> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
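A quick sketch of why a fixed, explicit byte order keeps output portable while an implicit sys.byteorder default would not:

```python
import sys

n = 0x0102

big = n.to_bytes(2, 'big')        # identical bytes on every platform
little = n.to_bytes(2, 'little')  # likewise
native = n.to_bytes(2, sys.byteorder)  # what an implicit default would do
round_trip = int.from_bytes(big, 'big')
```

The `native` value silently differs between big- and little-endian machines, which is exactly the portability trap the comment warns about.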
[issue34886] subprocess.run throws exception when input and stdin are passed as kwargs
Josh Rosenberg added the comment: The actual code receives input by name, but stdin is received in **kwargs. The test is just:

    if input is not None:
        if 'stdin' in kwargs:
            raise ValueError(...)
        kwargs['stdin'] = PIPE

Perhaps just change `if 'stdin' in kwargs:` to:

    if kwargs.get('stdin') is not None:

so it obeys the documented API (that says stdin defaults to None, and therefore passing stdin=None explicitly should be equivalent to not passing it at all)? -- ___ Python tracker <https://bugs.python.org/issue34886> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
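A toy model of the proposed check (the validate helper and PIPE stand-in are hypothetical, not the real subprocess internals), showing that an explicit stdin=None would be treated like an omitted stdin:

```python
PIPE = -1  # stand-in for subprocess.PIPE

def validate(input=None, **kwargs):
    if input is not None:
        # kwargs.get('stdin') is not None, rather than 'stdin' in kwargs,
        # lets an explicit stdin=None behave like the documented default.
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE
    return kwargs
```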
[issue34886] subprocess.run throws exception when input and stdin are passed as kwargs
Josh Rosenberg added the comment: I just tried: subprocess.run('ls', input=b'', stdin=None) and I got the same ValueError as for passing using kwargs. Where did you get the idea subprocess.run('ls', input=b'', stdin=None) worked? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34886> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34784] Heap-allocated StructSequences
Josh Rosenberg added the comment: This looks like a duplicate of #28709, though admittedly, that bug hasn't seen any PRs. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34784> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34750] locals().update doesn't work in Enum body, even though direct assignment to locals() does
Josh Rosenberg added the comment: The documentation for locals ( https://docs.python.org/3/library/functions.html#locals ) specifically states: Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter. The docstring for locals is similar, making it clear that any correlation between the returned dict and the state of locals if *either* is subsequently modified is implementation dependent, subject to change without back-compat concerns; even if we made this change, we've given ourselves the freedom to undo it at any time, which makes it useless to anyone who might try to rely on it. The fact that even locals()["a"] = 1 happens to work is an implementation detail AFAICT; normally, locals() is and should remain read-only (or at least, modifications don't actually affect the local scope aside from the dict returned by locals()). I'm worried that making _EnumDict inherit from collections.abc.MutableMapping in general would slow down Enums (at the very least creation, I'm not clear on whether _EnumDict remains, hidden behind the mappingproxy, for future lookups on the class), since MutableMapping would introduce a Python layer of overhead to most calls. I'm also just not inclined to encourage the common assumption that locals() returns a dict where mutating it actually works, since it usually doesn't. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34750> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
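A minimal demonstration of the quoted caveat on CPython: writing to the dict returned by locals() inside a function does not change the actual local.

```python
def f():
    a = 1
    locals()['a'] = 2  # mutates only the snapshot dict, not the frame
    return a

result = f()  # still 1 on CPython
```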
[issue34601] Typo: "which would rather raise MemoryError than give up", than or then?
Josh Rosenberg added the comment: "than" is correct; "giving up" in this context would mean "not even trying to allocate the memory and just preemptively raising OverflowError, like non-integer numeric types with limited ranges". Rather than giving up that way, it chooses to try to allocate the huge integer and raises a MemoryError only if that fails. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34601> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34574] OrderedDict iterators are exhausted during pickling
Josh Rosenberg added the comment: This would presumably be a side-effect of all generic pickling operations of iterators; figuring out what the iterator produces requires running out the iterator. You could special case it case-by-case, but that just makes the behavior unreliable/confusing; now some iterators pickle without being mutated, and others don't. Do you have a proposal to fix it? Is it something that needs to be fixed at all, when the option to pickle the original OrderedDict directly is there? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34574> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
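The safe path mentioned at the end — pickling the OrderedDict itself — round-trips order without involving any live iterators:

```python
import pickle
from collections import OrderedDict

d = OrderedDict([('a', 1), ('b', 2)])
live_iter = iter(d.items())

# Pickling the mapping creates its own traversal internally; the live
# iterator above is not consumed by it.
clone = pickle.loads(pickle.dumps(d))
remaining = list(live_iter)
```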
[issue34535] queue.Queue(timeout=0.001) avg delay Windows:14.5ms, Ubuntu: 0.063ms
Josh Rosenberg added the comment: Victor, that was a little overboard. By that logic, there doesn't need to be a Windows version of Python. That said, Paul doesn't seem to understand that the real resolution limit isn't 1 ms; that's the lower limit on arguments to the API, but the real limit is the system clock, which has a granularity in the 10-16 ms range. It's a problem with Windows in general, and the cure is worse than the disease. Per https://msdn.microsoft.com/en-us/library/windows/desktop/ms724411(v=vs.85).aspx , the resolution of the system timer is typically in the range of 10 milliseconds to 16 milliseconds. Per https://docs.microsoft.com/en-us/windows/desktop/Sync/wait-functions#wait-functions-and-time-out-intervals :

> Wait Functions and Time-out Intervals
> The accuracy of the specified time-out interval depends on the resolution of
> the system clock. The system clock "ticks" at a constant rate. If the
> time-out interval is less than the resolution of the system clock, the wait
> may time out in less than the specified length of time. If the time-out
> interval is greater than one tick but less than two, the wait can be anywhere
> between one and two ticks, and so on.

All the Windows synchronization primitives (e.g. WaitForSingleObjectEx https://docs.microsoft.com/en-us/windows/desktop/api/synchapi/nf-synchapi-waitforsingleobjectex , which is what ultimately implements timed lock acquisition on Windows) are based on the system clock, so without drastic measures, it's impossible to get better granularity than the 10-16 ms of the default system clock configuration. The link on "Wait Functions and Time-out Intervals" does mention that this granularity *can* be increased, but it recommends against fine-grained tuning (so you can't just tweak it before a wait and undo the tweak after; the only safe thing to do is change it on program launch and undo it on program exit). 
Even then, it's a bad idea for Python to use it; per timeBeginPeriod's own docs ( https://docs.microsoft.com/en-us/windows/desktop/api/timeapi/nf-timeapi-timebeginperiod ):

> This function affects a global Windows setting. Windows uses the lowest value
> (that is, highest resolution) requested by any process. Setting a higher
> resolution can improve the accuracy of time-out intervals in wait functions.
> However, it can also reduce overall system performance, because the thread
> scheduler switches tasks more often. High resolutions can also prevent the
> CPU power management system from entering power-saving modes. Setting a
> higher resolution does not improve the accuracy of the high-resolution
> performance counter.

Basically, to improve the resolution of timed lock acquisition, we'd have to change the performance profile of the entire OS while Python was running, likely increasing power usage and possibly reducing performance. Global solutions to local problems are a bad idea. The most reasonable solution to the problem is to simply document it (maybe not for queue.Queue, but for the threading module). Possibly even provide an attribute in the threading module similar to threading.TIMEOUT_MAX that reports the system clock's granularity for informational purposes (might need to be a function so it reports the potentially changing granularity). Other, less reasonable solutions would be:

1. Expose a function (with prominent warnings about not using it in a fine-grained manner, and about the effects on power management and performance) that would increase the system clock granularity as much as timeGetDevCaps reports is possible (possibly limited to a user-provided suggestion, so while the clock could go to 1 ms resolution, the user could request only 5 ms resolution to reduce the costs of doing so). 
Requires some additional state (whether timeBeginPeriod has been called, and with what values) so timeEndPeriod can be called properly before each adjustment and when Python exits. Pro is the code is *relatively* simple and would mostly fix the problem. Cons are that it wouldn't be super discoverable (unless we put notes in every place that uses timeouts, not just in threading docs), it encourages bad behavior (one application deciding its needs are more important than conserving power), and we'd have to be *really* careful to pair our calls universally (timeEndPeriod must be called, even when other cleanup is skipped, such as when calling os._exit; AFAICT, the docs imply that per-process adjustments to the clock aren't undone even when the process completes, which means failure to pair all calls would leave the system with a suboptimal system clock resolution that would remain in effect until rebooted).

2. (Likely a terrible idea, and like option 1, should be explicitly opt-in, not enabled by default) Offer the option to have Python lock timeouts only use WaitForSingleObjectEx
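The clock-tick granularity is easy to observe from Python; a sketch that measures how long a timed acquire on a held lock really blocks (roughly the requested 1 ms on Linux, but a full 10-16 ms tick on a default-configured Windows system):

```python
import threading
import time

lock = threading.Lock()
lock.acquire()  # hold the lock so the timed acquire below must time out

start = time.perf_counter()
acquired = lock.acquire(timeout=0.001)  # requests a 1 ms timeout
elapsed = time.perf_counter() - start   # ~1 ms on Linux, ~15 ms on Windows
lock.release()
```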
[issue34494] simple "sequence" class ignoring __len__
Josh Rosenberg added the comment: That's the documented behavior. Per https://docs.python.org/3/reference/datamodel.html#object.__getitem__ :

> Note: for loops expect that an IndexError will be raised for illegal indexes
> to allow proper detection of the end of the sequence.

The need for *only* __getitem__ is also mentioned in the documentation of the iter builtin ( https://docs.python.org/3/library/functions.html#iter ):

> Without a second argument, object must be a collection object which supports
> the iteration protocol (the __iter__() method), or it must support the
> sequence protocol (the __getitem__() method with integer arguments starting at
> 0).

At no point is a dependency on __len__ mentioned. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34494> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
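A minimal example of the old sequence protocol in action — __getitem__ alone, with IndexError as the terminator and no __len__ anywhere:

```python
class Squares:
    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)  # how for/iter detect the end
        return index * index

values = list(Squares())   # iter() falls back to the sequence protocol
has_nine = 9 in Squares()  # membership tests iterate the same way
```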
[issue34434] Removal of kwargs for built-in types not covered with "changed in Python" note in documentation
Josh Rosenberg added the comment: For tuple and list, no, they couldn't have looked at the help (because the help calls the argument "iterable", while the only keyword accepted was "sequence"). Nor was "sequence" documented in the online docs, nor anywhere else that I can find; it was solely in the C source code. If it was discoverable in any other way, I wouldn't say documenting the change (outside of What's New) was completely unjustifiable (I acknowledge that int, bool and float warrant a mention, since they did document a functioning name for the argument; I was a little too down on them in my original messages). But the only way someone would accidentally use keyword arguments for list/tuple is if they were fuzzing the constructor by submitting random keyword arguments until something worked. That seems an odd thing to worry about breaking. The error message wouldn't help either; the exception raised tells you what argument was unrecognized, but not the names of recognized arguments. Even if you want to document it, it's hard to do so without being confusing, inaccurate, or both. The original PR's versionchanged message was: *iterable* is now a positional-only parameter. But "iterable" was never a legal keyword, so saying it's "now a positional-only parameter" implies that at some point, it wasn't, and you could pass it with the name "iterable", which is wrong/confusing. If you mention "sequence", you're mentioning a now defunct detail (confusing, but not wrong). I suppose you could have the versionchanged say "This function does not accept keyword arguments", but again, for all discoverable purposes, it never did. I'm not saying *no* documentation of the change is needed, but I am saying, for list/tuple, the What's New note is sufficient to cover it for those people who went mucking through the CPython source code to find an undocumented keyword they could use. 
-- ___ Python tracker <https://bugs.python.org/issue34434> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34434] Removal of kwargs for built-in types not covered with "changed in Python" note in documentation
Josh Rosenberg added the comment: Oh, I was checking old docs when I said the online docs didn't call int's argument "x"; the current docs do, so int, float and bool all justify a change (barely), it's just tuple and list for which it's completely unjustifiable. -- ___ Python tracker <https://bugs.python.org/issue34434> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34434] Removal of kwargs for built-in types not covered with "changed in Python" note in documentation
Josh Rosenberg added the comment: Bloating the documentation is almost certainly unjustifiable for list and tuple, and only barely justifiable for int, bool and float, given that:

1. The documentation (at least for Python 3) has *never* claimed the arguments could be passed by keyword (all of them used brackets to indicate the argument was optional without implying a meaningful default, which is typically how "does not take arguments by keyword" was described before the current "/" convention), and

2. Aside from bool and float (and to a lesser extent, int), the documented name of said parameter didn't match the name it was accepted under, e.g.:

   a. The docs for tuple and list claimed the name was "iterable"; the only accepted name was "sequence".

   b. The online docs for int gave a wholly invalid "name", calling it "number | string", when in fact it was accepted only as "x". That said, int's docstring does describe the name "correctly" as "x".

So for tuple/list it would have been impossible to write code that depended on being able to pass the first parameter by keyword unless you'd gone mucking about in the CPython source code to figure out the secret keyword name. I could justify a note for int/bool/float given that the docstrings for all of them named the argument, and bool/float named it in the online docs, but we don't need to document a change that no one could have taken a dependency on without going to extreme trouble. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34434> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
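On CPython 3.8 and later the change is easy to confirm — the old undocumented "x" keyword for int is rejected while the positional spelling is unchanged:

```python
ok = int("10")  # positional works, as always

rejected = False
try:
    int(x="10")  # the historically accepted (3.7-era) keyword spelling
except TypeError:
    rejected = True
```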
[issue34458] No way to alternate options
Josh Rosenberg added the comment: That's a *really* niche use case; you want to store everything to a common destination list, in order, but distinguish which switch added each one? I don't know of any programs that use such a design outside of Python (and therefore, it seems unlikely there would be enough demand from argparse users to justify the development, maintenance, and complexity cost of adding it). argparse does support defining custom Actions, so it's wholly possible to add this sort of support for yourself if there isn't enough demand to add it to argparse itself. For example, a simple implementation would be:

    class AppendWithSwitchAction(argparse.Action):
        def __init__(self, option_strings, dest, *args, **kwargs):
            super().__init__(option_strings, dest, *args, **kwargs)
            # Map all possible switches to the final switch provided
            # so we store a consistent switch name
            self.option_map = dict.fromkeys(option_strings, option_strings[-1])

        def __call__(self, parser, namespace, values, option_string=None):
            option = self.option_map.get(option_string)
            try:
                getattr(namespace, self.dest).append((option, values))
            except AttributeError:
                setattr(namespace, self.dest, [(option, values)])

then use it with:

    parser.add_argument('-p', '--preload', help='preload asset',
                        action=AppendWithSwitchAction, metavar='NAMESPACE')
    parser.add_argument('-f', '--file', help='preload file',
                        action=AppendWithSwitchAction, metavar='FILE',
                        dest='preload')

All that does is append ('--preload', argument) or ('--file', argument) instead of just appending the argument, so you can distinguish one from the other (for switch, arg in args.preload:, then test if switch == '--preload' or '--file'). It's bare bones (the actual class underlying the 'append' action ensures nargs isn't 0, and that if const is provided, nargs is '?'), but it would serve. 
-- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34458> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34410] itertools.tee not thread-safe; can segfault interpreter when wrapped iterator releases GIL
Josh Rosenberg added the comment: Carlo: The point of Xiang's post is that this is only tangentially related to multiprocessing; the real problem is tee-ing an iterator implemented in Python (of which pool.imap_unordered is just one example) and then using the resulting tee-ed iterators in multiple threads (which pool.imap_unordered does implicitly, as there is a thread involved in dispatching work); that combination is not thread-safe. The problem is *exposed* by multiprocessing.pool.imap_unordered, but it is entirely a problem with itertools.tee, and as Xiang's repro indicates, it can be triggered easily without the complexity of multiprocessing being involved. I've updated the bug title to reflect this. -- components: +Library (Lib) nosy: +josh.r title: Segfault/TimeoutError: itertools.tee of multiprocessing.pool.imap_unordered -> itertools.tee not thread-safe; can segfault interpreter when wrapped iterator releases GIL versions: +Python 3.6 ___ Python tracker <https://bugs.python.org/issue34410> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
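Until the underlying bug is fixed, one user-level workaround is to serialize access to the tee'd iterators with a shared lock — a sketch only (the LockedIterator wrapper is illustrative, not part of itertools):

```python
import itertools
import threading

class LockedIterator:
    """Serialize next() calls on an iterator shared across threads."""
    def __init__(self, it, lock):
        self._it = it
        self._lock = lock
    def __iter__(self):
        return self
    def __next__(self):
        with self._lock:
            return next(self._it)

shared_lock = threading.Lock()
a, b = (LockedIterator(t, shared_lock) for t in itertools.tee(range(10)))

results = []
threads = [threading.Thread(target=lambda it: results.append(sum(it)),
                            args=(src,))
           for src in (a, b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread consumes its own tee branch under the lock, so both see the full sequence without racing inside the shared tee state.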
[issue34364] problem with traceback for syntax error in f-string
Josh Rosenberg added the comment: So the bug is that the line number and module are incorrect for the f-string, right? Nothing else? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34364> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34321] mmap.mmap() should not necessarily clone the file descriptor
Josh Rosenberg added the comment: Why would it "cause an issue if the file is closed before accessing the mmapped region"? As shown in your own link, the constructor performs the mmap call immediately after the descriptor is duplicated, with the GIL held; any race condition that could close the file before the mmap occurs could equally well close it before the descriptor is duplicated. The possible issues aren't tied to accessing the memory (once the mapping has been performed, the file descriptor can be safely closed in general), but rather, to the size and resize methods of mmap objects (the former using the fd to fstat the file, the latter using it to ftruncate the file). As long as you don't use size/resize, nothing else depends on the file descriptor after construction has completed. The size method in particular seems like a strange wart on the API; it returns the total file size, not the size of the mapping (len(mapping) gets the size of the actual mapping). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34321> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
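A quick sanity check of the point about the duplicated descriptor — closing the original fd after construction leaves both indexing and size() working:

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'hello world')
    m = mmap.mmap(fd, 0)
    os.close(fd)       # safe: the constructor duplicated the descriptor
    first = m[:5]
    total = m.size()   # total file size via the duplicated fd, not len(m)
    m.close()
finally:
    os.unlink(path)
```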
[issue34259] Improve docstring of list.sort
Josh Rosenberg added the comment: Copying from the sorted built-in's docstring would make sense here, given that sorted is implemented in terms of list.sort in the first place. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34259> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue29842] Make Executor.map work with infinite/large inputs correctly
Josh Rosenberg added the comment: In response to Max's comments: >But consider the case where input is produced slower than it can be processed >(`iterables` may fetch data from a database, but the callable `fn` may be a >fast in-memory transformation). Now suppose the `Executor.map` is called when >the pool is busy, so there'll be a delay before processing begins. In this >case, the most efficient approach is to get as much input as possible while >the pool is busy, since eventually (when the pool is freed up) it will become >the bottleneck. This is exactly what the current implementation does. I'm not sure the "slow input iterable, fast task, competing tasks from other sources" case is all that interesting. Uses of Executor.map in the first place are usually a replacement for complex task submission; perhaps my viewpoint is blinkered, but I see the Executors used for *either* explicit use of submit *or* map, rather than mixing and matching (you might use it for both, but rarely interleave usages). Without a mix and match scenario (and importantly, a mix and match scenario where enough work is submitted before the map to occupy all workers, and very little work is submitted after the map begins to space out map tasks such that additional map input is requested while workers are idle), the smallish default prefetch is an improvement, simply by virtue of getting initial results more quickly. The solution of making a dedicated input thread would introduce quite a lot of additional complexity, well beyond what I think it justifiable for a relatively niche use case, especially one with many available workarounds, e.g. 1. Raising the prefetch count explicitly 2. 
Having the caller listify the iterable (similar to passing an arbitrarily huge prefetch value, with the large prefetch value having the advantage of sending work to the workers immediately, while listifying has the advantage of allowing you to handle any input exceptions up front rather than receiving them lazily during processing) 3. Use cheaper inputs (e.g. the query string, not the results of the DB query) and perform the expensive work as part of the task (after all, the whole point is to parallelize the most expensive work) 4. Using separate Executors so the manually submitted work doesn't interfere with the mapped work, and vice versa 5. Making a separate ThreadPoolExecutor to generate the expensive input values via its own map function (optionally with a larger prefetch count), e.g. instead of:

    with SomeExecutor() as executor:
        for result in executor.map(func, (get_from_db(query) for query in queries)):
            ...

do:

    with SomeExecutor() as executor, ThreadPoolExecutor() as inputexec:
        inputs = inputexec.map(get_from_db, queries)
        for result in executor.map(func, inputs):
            ...

Point is, yes, there will still be niche cases where Executor.map isn't perfect, but this patch is intentionally a bit more minimal to keep the Python code base simple (no marshaling exceptions across thread boundaries) and avoid extreme behavioral changes; it has some smaller changes, e.g. it necessarily means input-iterator-triggered exceptions can be raised after some results are successfully produced, but it doesn't involve adding more implicit threading, marshaling exceptions across threads, etc. Your proposed alternative, with a thread for prefetching inputs, a thread for sending tasks, and a thread for returning results, creates a number of problems: 1. As you mentioned, if no prefetch limit is imposed, memory usage remains unbounded; if the input is cheap to generate and slow to process, memory exhaustion is nearly guaranteed for infinite inputs, and more likely for "very large" inputs. 
I'd prefer the default arguments to be stable in (almost) all cases, rather than try to maximize performance for rare cases at the expense of stability in many cases. 2. When input generation is CPU bound, you've just introduced an additional source of unavoidable GIL contention; granted, after the GIL fixes in 3.2, GIL contention tends to hurt less (before those fixes, I could easily occupy 1.9 cores doing 0.5 cores worth of actual work with just two CPU bound threads). Particularly in the ProcessPoolExecutor case (where avoiding GIL contention is the goal), it's a little weird if you can end up with unavoidable GIL contention in the main process. 3. Exception handling from the input iterator just became a nightmare; in a "single thread performs input pulls and result yield" scenario, the exceptions from the input thread naturally bubble to the caller of Executor.map (possibly after several results have been produced, but eventually). If a separate thread is caching from the input iterator, we'd need to marshal the exception from that thread back to the thread running Executor.map so it's visible to the caller, and providing a traceba
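Workaround 5 from the list above, fleshed out into a self-contained sketch (get_from_db and func here are trivial stand-ins for the real slow fetch and expensive task):

```python
from concurrent.futures import ThreadPoolExecutor

def get_from_db(query):  # stand-in for a slow input fetch
    return query * 2

def func(value):         # stand-in for the expensive parallel task
    return value + 1

queries = range(5)
with ThreadPoolExecutor() as executor, ThreadPoolExecutor() as inputexec:
    inputs = inputexec.map(get_from_db, queries)  # inputs prefetch in parallel
    results = list(executor.map(func, inputs))
```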
[issue29842] Make Executor.map work with infinite/large inputs correctly
Josh Rosenberg added the comment: In any event, sorry to be a pain, but is there any way to get some movement on this issue? One person reviewed the code with no significant concerns to address. A duplicate (#30323) and a closely related issue (#34168) have since been opened that this would address; I'd really like to see Executor.map made more bulletproof against cases that plain map handles with equanimity. Even if it's not applied as is, something similar (with prefetch count defaults tweaked, or, at the expense of code complexity, a separate worker thread to perform the prefetch to address Max's concerns) would be a vast improvement over the status quo. -- ___ Python tracker <https://bugs.python.org/issue29842> ___
[issue34168] RAM consumption too high using concurrent.futures (Python 3.7 / 3.6 )
Josh Rosenberg added the comment: Note: While this particular use case wouldn't be fixed (map returns results in order, not as completed), applying the fix from #29842 would make many similar use cases both simpler to implement and more efficient (or possible at all). That said, no action has been taken on #29842 (no objections, but no action either), so I'm not sure what to do to push it to completion. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34168> ___
[issue1617161] Instance methods compare equal when their self's are equal
Josh Rosenberg added the comment: If [].append == [].append is True in the "unique set of callbacks" scenario, that implies that it's perfectly fine to not call one of them when both are registered. But this means that only one list ends up getting updated, when you tried to register both for updates. That's definitely surprising, and not in a good way. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue1617161> ___
[issue33864] collections.abc.ByteString does not register memoryview
Josh Rosenberg added the comment: memoryview isn't just for byte strings though; the format can make it a sequence of many types, of different widths, meanings, etc. Calling it a ByteString would be misleading in many cases. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue33864> ___
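A short sketch of the point above: a memoryview's items are whatever the underlying buffer's format says they are, so a view need not behave like a string of bytes at all.

```python
from array import array

# A memoryview over an array of C signed ints: items are ints,
# not byte values, so "ByteString" would be a misleading label.
m = memoryview(array('i', [1, 2, 3]))
print(m.format)    # 'i' (signed int), not 'B'
print(m.itemsize)  # platform-dependent, typically 4, not 1
print(m[0])        # 1 -- a multi-byte int item

# Contrast with a view over bytes, where items really are bytes:
b = memoryview(b'abc')
print(b.format, b[0])  # B 97
```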
[issue33361] readline() + seek() on codecs.EncodedFile breaks next readline()
Change by Josh Rosenberg <shadowranger+pyt...@gmail.com>: -- title: readline() + seek() on io.EncodedFile breaks next readline() -> readline() + seek() on codecs.EncodedFile breaks next readline() ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33361> ___
[issue33381] Incorrect documentation for strftime()/strptime() format code %f
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: Note: strftime follows the existing documentation:

    >>> datetime.datetime(1970, 1, 1, microsecond=1).strftime('%f')
    '01'

The strptime behavior bug seems like a duplicate of #32267, which claims to be fixed in master as of early January; may not have made it into a release yet though. I can't figure out how to view the patch on that issue, it doesn't seem to be linked to GitHub like normal. -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33381> ___
[issue33404] Phone Number Generator
Change by Josh Rosenberg <shadowranger+pyt...@gmail.com>: -- versions: -Python 3.6 ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33404> ___
[issue33404] Phone Number Generator
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: You named your loop variable i, overlapping the name of your second-to-last digit, so you end up replacing the original value of i on each (given the break, the only) loop iteration. Before the loop begins, i has the expected value of '6', but on the first iteration, i is rebound to the value of a (the first element in the tuple), '5', and your format string uses that value instead. If you removed the break, you'd see the second-to-last digit cycle through all the other values, because i would be repeatedly rebound to each digit as the loop goes. This is a bug in your code, not a problem with Python; in the future, direct questions of this sort to other online resources (e.g. Stack Overflow); unless you have a provable bug in Python itself, odds are it's a bug in your code's logic. -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33404> ___
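The shadowing described above can be reproduced in a few lines; this is a hypothetical reconstruction of the reported mistake, not the submitter's actual code:

```python
# The loop target `i` shadows a variable named `i` assigned earlier.
i = '6'                      # second-to-last digit, set before the loop
for i in ('5', '0', '8'):    # rebinds `i` to each tuple element
    break                    # with the break, `i` is left as '5'

print(i)  # '5', not '6' -- the original binding is gone
```

Renaming either the loop variable or the digit variable removes the collision.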
[issue33315] Allow queue.Queue to be used in type annotations
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: None of the actual classes outside of the typing module support this either, to my knowledge. You can't do:

    from collections import deque
    a: deque[int]

nor can you do:

    a: list[int]

Adding Queue to the typing module might make sense (feel free to edit the issue if that's what you're looking for), but unless something has changed in 3.7 (my local install is 3.6.4), it's never been legal to do what you're trying to do with queue.Queue itself with the original type, only with the special typing types that exist for that specific purpose. -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33315> ___
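For reference, two workarounds that were available on the versions discussed (3.6/3.7): use the typing module's parallel alias where one exists, or use a string annotation, which is never evaluated at runtime. (In later versions, 3.9+ via PEP 585, queue.Queue and the builtin containers became subscriptable directly, making this moot.)

```python
from collections import deque
from queue import Queue
from typing import Deque

# typing.Deque mirrors collections.deque for annotation purposes:
d: Deque[int] = deque([1, 2, 3])

# queue.Queue had no typing alias, but a string annotation sidesteps
# the runtime subscript entirely -- the text is stored, not evaluated:
q: 'Queue[int]' = Queue()
```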
[issue33319] `subprocess.run` documentation doesn't tell is using `stdout=PIPE` safe
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: If the goal is just to suppress stdout, that's what passing subprocess.DEVNULL is for (doesn't exist in Py2, but opening os.devnull and passing that is a slightly higher overhead equivalent).

subprocess.run includes a call to communicate as part of its default behavior, and stores its results, so call() isn't quite equivalent to run().returncode when PIPE was passed for standard handles, because call only includes an implicit call to wait, not communicate, and therefore pipes are not explicitly read and can block.

Basically, subprocess.run is deadlock-safe (because it uses communicate, not just wait), but if you don't care about the results, and the results might be huge, don't pass it PIPE for stdout/stderr (because it will store the complete outputs in memory, just like any use of communicate with PIPE).

The docs effectively tell you PIPE is safe; it returns a CompletedProcess object, and explicitly tells you that it has attributes that are (completely) populated based on whether capture was requested. If it had such attributes and still allowed deadlocks, it would definitely merit a warning. -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33319> ___
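The two options above side by side, using a trivial child process for illustration:

```python
import subprocess
import sys

# Suppressing output: DEVNULL discards it without buffering anything.
p1 = subprocess.run([sys.executable, '-c', 'print("noise")'],
                    stdout=subprocess.DEVNULL)
print(p1.returncode)  # 0
print(p1.stdout)      # None -- nothing was captured

# Capturing output: PIPE makes run() use communicate(), so there is no
# deadlock risk, but the full output is held in memory on the result.
p2 = subprocess.run([sys.executable, '-c', 'print("data")'],
                    stdout=subprocess.PIPE)
print(p2.stdout)      # b'data\n' (b'data\r\n' on Windows)
```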
[issue33267] ctypes array types create reference cycles
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: Pretty sure this is a problem with classes in general; classes are self-referencing, and using multiplication to create new ctypes array types is creating new classes. -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33267> ___
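A small sketch of why this looks like a "leak" to cycle-detecting tools: the multiplication produces a class object, and classes reference themselves (e.g. through __mro__), so they are only reclaimed by the cyclic garbage collector, not by plain reference counting.

```python
import ctypes
import gc

# Multiplying a ctypes type by a length produces a new class object:
ArrInt4 = ctypes.c_int * 4
print(isinstance(ArrInt4, type))  # True -- it's a class, not an instance
print(ArrInt4.__name__)           # 'c_int_Array_4'

# The class appears among the referents of its own __mro__ tuple,
# i.e. it sits in a reference cycle by construction:
print(ArrInt4 in gc.get_referents(ArrInt4.__mro__))  # True
```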
[issue19993] Pool.imap doesn't work as advertised
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: Related: issue29842 "Make Executor.map work with infinite/large inputs correctly" for a similar problem in concurrent.futures (but worse, since it doesn't even allow you to begin consuming results until all inputs are dispatched). A similar approach to my Executor.map patch could probably be used with imap/imap_unordered. -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue19993> ___
[issue33231] Potential memory leak in normalizestring()
New submission from Josh Rosenberg <shadowranger+pyt...@gmail.com>: Patch is good, but while we're at it, is there any reason why this multi-allocation design was even used? It PyMem_Mallocs a buffer, makes a C-style string in it, then uses PyUnicode_FromString to convert the C-style string to a Python str.

Seems like the correct approach would be to just use PyUnicode_New to preallocate the final string buffer up front, then pull out the internal buffer with PyUnicode_1BYTE_DATA and populate that directly, saving a pointless allocation/deallocation, which also means the failure case needs no cleanup at all, while barely changing the code (aside from removing the need to explicitly NUL-terminate).

The only reason I can see to avoid this would be if the codec names could contain arbitrary Unicode encoded as UTF-8 (and therefore strlen wouldn't tell you the final length in Unicode ordinals), but I'm pretty sure that's not the case (if it is, we're not normalizing properly, since we only lowercase ASCII). If Unicode codec names need to be handled, there are other options, though the easy savings go away. -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33231> ___
[issue33229] Documentation - io — Core tools for working with streams - seek()
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: As indicated in the seek docs ( https://docs.python.org/3/library/io.html#io.IOBase.seek ), all three names were added to the io module in 3.1:

> New in version 3.1: The SEEK_* constants.

Since they're part of the io module too, there is no need to qualify them on the io module docs page. They're available in os as well, but you don't need to import it to use them. The OS-specific additional position flags are explicitly documented to be found on the os module. -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33229> ___
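A quick demonstration that io alone is enough, no os import required:

```python
import io

# io re-exports the whence constants alongside the stream classes:
print(io.SEEK_SET, io.SEEK_CUR, io.SEEK_END)  # 0 1 2

buf = io.BytesIO(b'hello world')
buf.seek(-5, io.SEEK_END)  # position 5 bytes back from the end
tail = buf.read()
print(tail)                # b'world'
```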
[issue33200] Optimize the empty set "literal"
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: I may have immediately latched onto this, dubbing it the "one-eyed monkey operator", the moment generalized unpacking was released. I always hated the lack of an empty set literal, and enjoyed having this exist just to fill that asymmetry with the other built-in collection types (that said, I never use it in production code, nor teach it in classes I run). -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33200> ___
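For anyone who hasn't seen the spelling in question: {} is already taken by dict, so unpacking an empty tuple into a set display is the closest thing Python has to an empty set literal.

```python
# {} is a dict literal; set() is a call; {*()} is the "literal" form:
print(type({}))        # <class 'dict'>
print(type({*()}))     # <class 'set'>
print({*()} == set())  # True
```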
[issue33124] Lazy execution of module bytecode
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: Serhiy: There is a semi-common case where global constants can be quite expensive: specifically, initializing a global full of expensive-to-compute/serialize data so it will be shared post-fork when doing multiprocessing on a POSIX system. That said, that would likely be a case where lazy initialization would be a problem; you don't want each worker independently initializing the global lazily.

Also, for all practical purposes, aren't enums and namedtuples global constants too? Since they don't rely on any syntax-based support at the point of use, they're just a "function call" followed by assignment to a global name; you couldn't really separate the concept of global constants from enum/namedtuple definitions, right? -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33124> ___
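The point about enums and namedtuples being indistinguishable from any other module-level constant can be seen from their functional APIs: both are ordinary calls whose results are bound to global names, with nothing special happening at the point of use.

```python
from collections import namedtuple
from enum import Enum

# Both are "just" calls followed by assignment to a global name --
# exactly the shape a lazy-module-execution scheme would have to handle:
Point = namedtuple('Point', ['x', 'y'])
Color = Enum('Color', ['RED', 'GREEN'])

print(Point(1, 2))      # Point(x=1, y=2)
print(Color.RED.value)  # 1
```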
[issue33087] No reliable clean shutdown method
Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment: To my knowledge, there is no safe way to do this for other threads, for a reason. If you make all your worker threads daemons, then they will terminate with the main thread, but they won't perform cleanup actions. If you don't make them daemons, any "clean exit" procedure risks the threads choosing not to exit (even if you inject a SystemExit into every other thread, they might be in a try/except: or try/finally that suppresses it, or blocked waiting for something from another thread that has already exited, etc.).

Exiting the thread that calls sys.exit() this way is considered okay, since you control when it is called, and it's up to you to do it at a safe place, but doing so asynchronously in other threads introduces all sorts of problems. Basically, you want a reliable "shut down the process" and a reliable "clean up every thread", but anything that allows cleanup in arbitrary threads also allows them to block your desired "shut down the process". Do you have a proposal for handling this? -- nosy: +josh.r ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33087> ___
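The standard workaround, cooperative shutdown, can be sketched briefly: rather than injecting exceptions into worker threads, hand them an Event they poll, so each thread exits, and cleans up, at a point of its own choosing. This is an illustration of the trade-off described above, not a fix for the issue: a worker that ignores the event can still block shutdown.

```python
import threading

stop = threading.Event()
cleaned_up = []

def worker(name):
    # Each wait() is one "unit of work"; the thread exits cooperatively
    # when shutdown is requested, so its cleanup actually runs.
    while not stop.wait(timeout=0.01):
        pass
    cleaned_up.append(name)  # would be skipped if the thread were a
                             # daemon killed at interpreter exit

threads = [threading.Thread(target=worker, args=(n,)) for n in 'ab']
for t in threads:
    t.start()
stop.set()                   # request shutdown...
for t in threads:
    t.join()                 # ...and wait for the workers to honor it
print(sorted(cleaned_up))    # ['a', 'b']
```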