[issue41740] Improve error message for string concatenation via `sum`
Marco Paolini added the comment:

I was thinking of just clarifying the error message that results from PyNumber_Add a bit. This won't make the "hot" path any slower. Something like (not compile tested, sorry):

--- a/Python/bltinmodule.c
+++ b/Python/bltinmodule.c
@@ -2451,8 +2451,13 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
             Py_DECREF(result);
-            Py_DECREF(item);
             result = temp;
-            if (result == NULL)
+            if (result == NULL) {
+                if (PyUnicode_Check(item) || PyBytes_Check(item) ||
+                    PyByteArray_Check(item))
+                    PyErr_SetString(PyExc_TypeError,
+                        "sum() can't sum bytes, strings or byte-arrays "
+                        "[use .join(seq) instead]");
+                Py_DECREF(item);
                 break;
+            }
+            Py_DECREF(item);
         }
     }
     Py_DECREF(iter);
     return result;

--

___ Python tracker <https://bugs.python.org/issue41740> ___
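For illustration, here is the confusing error the patch above targets, next to the `str.join` alternative it points users to (a minimal interactive sketch):

```
>>> sum(['a', 'b', 'c'])          # start defaults to 0, so this is 0 + 'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> ''.join(['a', 'b', 'c'])      # the recommended spelling
'abc'
```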
[issue41740] string concatenation via `sum`
Marco Paolini added the comment:

Also worth noting: the start argument *is* type-checked. Maybe we could apply the same check to the items of the iterable?

python3 -c "print(sum(('a', 'b', 'c'), start='d'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]

See https://github.com/python/cpython/blob/c96d00e88ead8f99bb6aa1357928ac4545d9287c/Python/bltinmodule.c#L2310

--

___ Python tracker <https://bugs.python.org/issue41740> ___
[issue41740] string concatenation via `sum`
Marco Paolini added the comment:

This happens because the default value for the start argument is zero, hence the first operation is `0 + 'a'`.

--
nosy: +mpaolini

___ Python tracker <https://bugs.python.org/issue41740> ___
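To see that first operation fail directly (a minimal interactive sketch):

```
>>> 0 + 'a'   # what sum(('a', 'b', 'c')) effectively attempts first
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```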
[issue34624] -W option and PYTHONWARNINGS env variable do not accept module regexes
Marco Paolini added the comment:

Hello Thomas, do you need any help fixing the conflicts in your PR? Even though Lib/warnings.py has changed a little in the last two years, your PR is still good!

--
nosy: +mpaolini

___ Python tracker <https://bugs.python.org/issue34624> ___
[issue38677] Arraymodule initialization error handling improvements
Change by Marco Paolini:

--
keywords: +patch
pull_requests: +16552
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/17039

___ Python tracker <https://bugs.python.org/issue38677> ___
[issue38677] Arraymodule initialization error handling improvements
New submission from Marco Paolini:

Module two-phase initialization does not report errors correctly to the import machinery.

--
components: Extension Modules
messages: 355913
nosy: mpaolini
priority: normal
severity: normal
status: open
title: Arraymodule initialization error handling improvements
type: behavior
versions: Python 3.8, Python 3.9

___ Python tracker <https://bugs.python.org/issue38677> ___
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

ujson (https://github.com/esnme/ultrajson), by contrast, is faster when decoding non-ASCII input in the same example above, so there is likely room for improvement...

--

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

Oops, sorry, here are the right commands:

python -m pyperf timeit -s 'import json;' -s 'c = "a"; s = json.dumps(c * (2**10 // (len(json.dumps(c)) - 2)))' 'json.loads(s)' -o ascii2k.json
python -m pyperf timeit -s 'import json;' -s 'c = "€"; s = json.dumps(c * (2**10 // (len(json.dumps(c)) - 2)))' 'json.loads(s)' -o nonascii2k.json

Mean +- std dev: [ascii2k] 3.69 us +- 0.05 us -> [nonascii2k] 12.4 us +- 0.1 us: 3.35x slower (+235%)

--

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

Also worth noting: escape sequences for non-ASCII characters are slower to decode, even when the encoded length is the same.

python -m pyperf timeit -s 'import json;' -s 'c = "€"; s = json.dumps(c * (2**10 // len(json.dumps(c)) - 2))' 'json.loads(s)' -o nonascii2k.json
python -m pyperf timeit -s 'import json;' -s 'c = "a"; s = json.dumps(c * (2**10 // len(json.dumps(c)) - 2))' 'json.loads(s)' -o ascii2k.json

Mean +- std dev: [ascii2k] 2.59 us +- 0.04 us -> [nonascii2k] 9.98 us +- 0.12 us: 3.86x slower (+286%)

--

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

I also confirm Inada's patch further improves performance!

All my previous benchmarks were done with gcc, with PGO optimization performed only on the test_json task... maybe this explains the weird results?

I tested the performance of the new master 69f37bcb28d7cd78255828029f895958b5baf6ff, with PGO over *all* tasks, reverting my original patch:

diff --git a/Modules/_json.c b/Modules/_json.c
index 112903ea57..9b63167276 100644
--- a/Modules/_json.c
+++ b/Modules/_json.c
@@ -442,7 +442,7 @@ scanstring_unicode(PyObject *pystr, Py_ssize_t end, int strict, Py_ssize_t *next
         if (d == '"' || d == '\\') {
             break;
         }
-        if (d <= 0x1f && strict) {
+        if (strict && d <= 0x1f) {
             raise_errmsg("Invalid control character at", pystr, next);
             goto bail;
         }

... and surprise...

Mean +- std dev: [69f37bcb28d7cd78255828029f895958b5baf6ff] 5.29 us +- 0.07 us -> [69f37bcb28d7cd78255828029f895958b5baf6ff-patched] 5.11 us +- 0.03 us: 1.04x faster (-4%)

Should we revert my original patch entirely now? Or am I missing something?

--

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

@steve.dower yes, that's what made me discard that experiment we did during the sprint. OK, I will test your new patch soon.

--

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

I forgot to mention: I was inspired by @christian.heimes's talk at EuroPython 2019, https://ep2019.europython.eu/talks/es2pZ6C-introduction-to-low-level-profiling-and-tracing/ (thanks!)

--

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

I am also working on a different patch that uses the "pcmpestri" SSE4 processor instruction; it looks like this for now.

While at it I realized there is (maybe) another potential speedup: avoiding the ucs4lib_find_max_char we do for each chunk of the string (that entails scanning the string in memory one more time)... anyway, that's another (much longer) story, probably for another issue?

```
diff --git a/Modules/_json.c b/Modules/_json.c
index 38beb6f50d..25b1cf4a99 100644
--- a/Modules/_json.c
+++ b/Modules/_json.c
@@ -400,6 +400,38 @@ _build_rval_index_tuple(PyObject *rval, Py_ssize_t idx) {
     Py_CLEAR(chunk); \
 }
 
+
+inline unsigned int
+_fast_search(const void *needle, unsigned int needle_len, const void *haystack, unsigned int haystack_len)
+{
+    unsigned int pos;
+    __asm__ __volatile__("movq (%1), %%xmm1;\n"
+                         "mov %2, %%eax;\n"
+                         "movq %3, %%r8;\n"
+                         "mov %4, %%edx;\n"
+                         ".intel_syntax noprefix;\n"
+                         "loop: pcmpestri xmm1, [r8], 0;\n" /* 0 = equal any */
+                         /* "pcmpestri %%mm1, (%%r8), $0;\n" /\* 0 = equal any *\/ */
+                         ".att_syntax prefix;\n"
+                         "cmp $15, %%ecx;\n"
+                         "jbe found;\n"
+                         "sub $16, %%edx;\n"
+                         "jnge notfound;\n"
+                         "add $16, %%r8;\n"
+                         "jmp loop;\n"
+                         "notfound: movl %4, %%ecx;\n"
+                         "jmp exit;\n"
+                         "found: mov %4, %%eax;\n"
+                         "sub %%edx, %%eax;\n"
+                         "add %%eax, %%ecx;\n"
+                         "exit: mov %%ecx, %0;\n"
+                         :"=m"(pos)
+                         :"r"(needle), "r"(needle_len), "r"(haystack), "r"(haystack_len)
+                         :"%eax", "%edx", "%ecx", "%r8", "%xmm1");
+    return pos;
+}
+
+
 static PyObject *
 scanstring_unicode(PyObject *pystr, Py_ssize_t end, int strict, Py_ssize_t *next_end_ptr)
 {
@@ -431,17 +463,26 @@ scanstring_unicode(PyObject *pystr, Py_ssize_t end, int strict, Py_ssize_t *next
         PyErr_SetString(PyExc_ValueError, "end is out of bounds");
         goto bail;
     }
+    char needle[2];
+    needle[0] = '"';
+    needle[1] = '\\';
     while (1) {
         /* Find the end of the string or the next escape */
         Py_UCS4 c = 0;
-        for (next = end; next < len; next++) {
+        if (kind == PyUnicode_1BYTE_KIND) {
+            next = _fast_search(needle, 2, buf+end, len-end) + end;
+            if (next < len)
                 c = PyUnicode_READ(kind, buf, next);
-            if (c == '"' || c == '\\') {
-                break;
-            }
-            else if (strict && c <= 0x1f) {
-                raise_errmsg("Invalid control character at", pystr, next);
-                goto bail;
+        } else {
+            for (next = end; next < len; next++) {
+                c = PyUnicode_READ(kind, buf, next);
+                if (c == '"' || c == '\\') {
+                    break;
+                }
+                else if (strict && c <= 0x1f) {
+                    raise_errmsg("Invalid control character at", pystr, next);
+                    goto bail;
+                }
             }
         }
         if (!(c == '"' || c == '\\')) {
```

--

___ Python tracker <https://bugs.python.org/issue37587> ___
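For readers who don't speak SSE4.2: the `pcmpestri` "equal any" mode used above finds the first byte of a 16-byte chunk matching any byte of the needle. A rough pure-Python model of the contract `_fast_search` is meant to satisfy (an editor's sketch; `fast_search_model` is an illustrative name, not part of the patch):

```
def fast_search_model(haystack: bytes, needle: bytes = b'"\\') -> int:
    # Index of the first byte of `haystack` matching any byte of `needle`,
    # or len(haystack) when there is no match -- the same contract as the
    # _fast_search assembly above, which scans 16 bytes per instruction.
    hits = [i for i in (haystack.find(bytes([b])) for b in needle) if i != -1]
    return min(hits) if hits else len(haystack)

assert fast_search_model(b'abc"def') == 3
assert fast_search_model(b'abc\\def') == 3
assert fast_search_model(b'abcdef') == 6
```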
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

On gcc, running the tests above, the only change that is relevant for the speedup is switching around the strict check. Removing the extra MOV related to the outer "c" variable is not significant (at least on gcc, in the few tests I did).

Unfortunately I had to change the patch we did together during the sprint, because it was breaking the strict-check logic... I updated my PR accordingly and kept only the bare minimum.

--

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

Here's the real-world example:

$ ls -hs events-100k.json
84M events-100k.json

+-----------+-------------------------+-----------------------------+
| Benchmark | vanilla-bpo-events-100k | patched-bpo-events-100k     |
+===========+=========================+=============================+
| timeit    | 985 ms                  | 871 ms: 1.13x faster (-12%) |
+-----------+-------------------------+-----------------------------+

--

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Marco Paolini added the comment:

Also on my real workload (loading a 60 GB JSONL file containing mostly strings) I measured a 10% improvement.

--

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Change by Marco Paolini:

--
nosy: +ezio.melotti, rhettinger

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
Change by Marco Paolini:

--
keywords: +patch
pull_requests: +14547
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/14752

___ Python tracker <https://bugs.python.org/issue37587> ___
[issue37587] JSON loads performance improvement for long strings
New submission from Marco Paolini:

I analysed the performance of json.loads in one production workload we have.

py-spy tells me the majority of the time is spent in the C _json module (see events.svg). Digging deeper, Linux perf tells me the hottest loop (where 20%+ of the time is spent) in _json.scanstring_unicode is this one:

 189:   movzx  eax,BYTE PTR [rbp+rbx*1+0x0]
        mov    DWORD PTR [rsp+0x44],eax
        cmp    eax,0x22
        je     22f
        cmp    eax,0x5c
        je     22f
        test   r13d,r13d
        je     180
        cmp    eax,0x1f

which is related to this code in Modules/_json.c:

        /* Find the end of the string or the next escape */
        Py_UCS4 c = 0;
        for (next = end; next < len; next++) {
            c = PyUnicode_READ(kind, buf, next);
            if (c == '"' || c == '\\') {
                break;
            }
            else if (strict && c <= 0x1f) {
                raise_errmsg("Invalid control character at", pystr, next);
                goto bail;
            }
        }

Two optimisations can be done:

1. Remove the MOV entirely. It is not needed inside the loop; it is only needed later, outside the loop, to access the variable.

2. Switch around the strict check (in the second if), because strict defaults to 1, so it will likely pass the test, while the likelihood of finding an invalid character is lower.

Running the pyperformance json_loads benchmark I get:

+------------+----------------------+-----------------------------+
| Benchmark  | vanilla-pyperf-pgo38 | patched-pyperf-pgo38        |
+============+======================+=============================+
| json_loads | 54.9 us              | 53.9 us: 1.02x faster (-2%) |
+------------+----------------------+-----------------------------+

A micro-benchmark on a 1 MB long JSON string gives better results:

python -m pyperf timeit -s "import json; x = json.dumps({'k': '1' * 2 ** 20})" "json.loads(x)"

+-----------+------------+-----------------------------+
| Benchmark | vanilla-1m | patched-1m                  |
+===========+============+=============================+
| timeit    | 2.62 ms    | 2.39 ms: 1.10x faster (-9%) |
+-----------+------------+-----------------------------+

--
components: Library (Lib)
files: events.svg
messages: 347832
nosy: christian.heimes, mpaolini, pablogsal
priority: normal
severity: normal
status: open
title: JSON loads performance improvement for long strings
type: performance
versions: Python 3.6, Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file48476/events.svg

___ Python tracker <https://bugs.python.org/issue37587> ___
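For reference, the 1 MB micro-benchmark above can be approximated with the standard library alone (an editor's sketch; pyperf adds warmup, calibration and statistics that this lacks, so absolute numbers will differ from the table):

```
# Rough stdlib-only version of the 1 MB json.loads micro-benchmark above.
import json
import timeit

x = json.dumps({'k': '1' * 2 ** 20})  # ~1 MB ASCII string value
per_call = timeit.timeit(lambda: json.loads(x), number=100) / 100
print('json.loads: %.2f ms per call' % (per_call * 1e3))
```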
[issue26081] Implement asyncio Future in C to improve performance
Marco Paolini added the comment:

The guys developing uvloop say their implementation is already quite fast [1]. Is it worth integrating it?

[1] https://github.com/MagicStack/uvloop

--
nosy: +mpaolini

___ Python tracker <http://bugs.python.org/issue26081> ___
[issue24891] race condition in initstdio() (python aborts running under nohup)
Marco Paolini added the comment:

@haypo, yeah, definitely better than mine! All good for me.

--

___ Python tracker <http://bugs.python.org/issue24891> ___
[issue24891] race condition in initstdio() (python aborts running under nohup)
Marco Paolini added the comment:

@haypo thanks for the quick review. This new issue24891_2.patch covers all of the points you raised except the "check exception type" one, which I am still figuring out.

--
Added file: http://bugs.python.org/file40355/issue24891_2.patch

___ Python tracker <http://bugs.python.org/issue24891> ___
[issue24891] race condition in initstdio() (python aborts running under nohup)
Changes by Marco Paolini:

Removed file: http://bugs.python.org/file40353/issue24891.patch

___ Python tracker <http://bugs.python.org/issue24891> ___
[issue24891] race condition in initstdio() (python aborts running under nohup)
Marco Paolini added the comment:

Oops, wrong patch... trying again.

--
Added file: http://bugs.python.org/file40354/issue24891.patch

___ Python tracker <http://bugs.python.org/issue24891> ___
[issue24891] race condition in initstdio() (python aborts running under nohup)
Marco Paolini added the comment:

Attaching a patch. Does it make any sense?

--
keywords: +patch
nosy: +mpaolini
Added file: http://bugs.python.org/file40353/issue24891.patch

___ Python tracker <http://bugs.python.org/issue24891> ___
[issue24726] OrderedDict has strange behaviour when dict.__setitem__ is used.
Marco Paolini added the comment:

Linking related issues: http://bugs.python.org/issue24721, http://bugs.python.org/issue24667 and http://bugs.python.org/issue24685

--
nosy: +mpaolini

___ Python tracker <http://bugs.python.org/issue24726> ___
[issue23670] Modifications to support iOS as a cross-compilation target
Changes by Marco Paolini:

--
nosy: +mpaolini

___ Python tracker <http://bugs.python.org/issue23670> ___
[issue23496] Steps for Android Native Build of Python 3.4.2
Changes by Marco Paolini:

--
nosy: +mpaolini

___ Python tracker <http://bugs.python.org/issue23496> ___
[issue24080] asyncio.Event.wait() Task was destroyed but it is pending
Marco Paolini added the comment:

KeyboardInterrupt is not handled gracefully by asyncio (see https://groups.google.com/d/msg/python-tulip/sovg7EIBoXs/m7U-0UXqzSQJ). You could cancel all tasks in the signal handler instead:

    ...
    def sig_interrupt():
        print('interrupt')
        for task in asyncio.Task.all_tasks():
            task.cancel()

    loop.add_signal_handler(signal.SIGINT, sig_interrupt)
    ...

--
nosy: +mpaolini

___ Python tracker <http://bugs.python.org/issue24080> ___
[issue23227] Generator's finally block not run if close() called before first iteration
Marco Paolini added the comment:

I think there is an issue in the way you designed your cleanup logic, so I think this issue is invalid.

Usually, the code (function, class, ...) that *opens* the file should also be responsible for closing it.

Option 1) The caller opens and closes the file, and the generator wraps the logged lines in a try/finally:

    def logged_lines(f):
        try:
            for line in f:
                logging.warning(line.strip())
                yield line
        finally:
            logging.warning('closing')

    f = open('yyy', 'r')
    try:
        for l in logged_lines(f):
            print(l)
    finally:
        f.close()

Option 2) The function opens and closes the file:

    def logged_lines(fname):
        f = open(fname, 'r')
        try:
            for line in f:
                logging.warning(line.strip())
                yield line
        finally:
            logging.warning('closing')
            f.close()

    for l in logged_lines('yyy'):
        print(l)

--
nosy: +mpaolini

___ Python tracker <http://bugs.python.org/issue23227> ___
[issue21163] asyncio doesn't warn if a task is destroyed during its execution
Marco Paolini added the comment:

Sorry for keeping this alive. Take a look at the `wait_for.py` just submitted in the unrelated #22448: no strong refs to the tasks are kept; the tasks remain alive only because they are timers and the event loop keeps a strong ref.

Do you think my proposed patch is OK? Should I open a new issue?

--

___ Python tracker <http://bugs.python.org/issue21163> ___
[issue21163] asyncio doesn't warn if a task is destroyed during its execution
Marco Paolini added the comment:

> I don't understand how keeping a strong reference would fix anything. You only provided one example (async-gc-bug.py) which uses Queue objects but keeps weak references to them. Keeping strong references to tasks is not the right fix. You must keep strong references to queues. If a queue is destroyed, how can you put an item into it? Otherwise, the task will wait forever. Keeping a strong reference to the task just hides the bug. Or I missed something.

The original asyncio-gc-issue.py wasn't written by me, and yes, as you say, it does have the reference bug you describe. I argue that bug shouldn't cause tasks to die: it should rather limit (as gc proceeds) the number of queues available to the producer in the WeakSet(), leaving alive all consumers waiting on an unreachable queue.

Please look at my test2.py, or even better test3.py, for a simpler example. Note that in my issue_22163_patch_0.diff I only keep strong refs to futures a task is waiting on, just as asyncio already does with the asyncio.sleep() coroutine.

> I dislike the idea of keeping strong references to tasks, it may create even more reference cycles. We already have too many cycles with exceptions stored in futures (in tasks).

We are already keeping strong refs to futures like asyncio.sleep.

I dislike the idea of randomly losing tasks. I also dislike the idea of forcing the user to manage strong refs to their tasks: all 3rd-party libraries would have to invent their own way, and it would lead to even more leaks/cycles that are very hard to debug. And it's not just exceptions: every time a task is yielding on a future, asyncio creates a reference cycle.

> The current unit test uses low level functions to remove a variable using a frame object. Can you provide an example which shows the bug without using low level functions?

My issue_22163_patch_0.diff only clears references by setting variables to `None`; no low-level stuff needed. My test2.py example script also doesn't use any low-level stuff.

I just uploaded test3.py with a simpler (and possibly more realistic) example.

--
Added file: http://bugs.python.org/file36413/test3.py

___ Python tracker <http://bugs.python.org/issue21163> ___
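test3.py itself is not included in this digest; as a rough editor's sketch in the Python 3.4-era coroutine style (names illustrative), the failure mode under discussion, a task waiting on a future that nothing else references, looks like this:

```
import asyncio
import gc

@asyncio.coroutine
def consumer():
    fut = asyncio.Future()
    yield from fut  # waits on a future that only this frame references

loop = asyncio.get_event_loop()
asyncio.Task(consumer())                     # no strong reference is kept
loop.run_until_complete(asyncio.sleep(0.1))  # let the consumer start waiting
gc.collect()  # the pending task is now only part of an unreachable cycle
              # (task -> coroutine -> future -> task) and may be collected
```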
[issue21163] asyncio doesn't warn if a task is destroyed during its execution
Marco Paolini added the comment:

Submitted a first stab at #2. Let me know what you think. If this works, we'll have to remove the test_gc_pending test, and then maybe even the code that now logs errors when a pending task is gc'ed.

--
Added file: http://bugs.python.org/file36408/issue_22163_patch_0.diff

___ Python tracker <http://bugs.python.org/issue21163> ___
[issue21163] asyncio doesn't warn if a task is destroyed during its execution
Marco Paolini added the comment:

> So you are changing your mind and withdrawing your option #1.

I think option #1 (tell users to keep strong refs to tasks) is OK, but option #2 is better. Yes, I changed my mind ;)

--

___ Python tracker <http://bugs.python.org/issue21163> ___
[issue21163] asyncio doesn't warn if a task is destroyed during its execution
Marco Paolini added the comment:

Asking the user to manage strong refs just passes the potential leak issue outside of the standard library; it doesn't really solve anything. If the user gets the strong refs wrong, he can either lose tasks or leak memory. If the standard library gets it right, both issues are avoided.

> I'm all in favor of documenting that you must keep a strong reference to a task that you want to keep alive. I'm not keen on automatically keeping all tasks alive, that might exacerbate leaks (which are by definition hard to find) in existing programs.

--

___ Python tracker <http://bugs.python.org/issue21163> ___
[issue21163] asyncio doesn't warn if a task is destroyed during its execution
Marco Paolini added the comment:

I finally wrapped my head around this. I wrote a (simpler) script to get a better picture.

What happens
------------

When a consumer task is first instantiated, the loop holds a strong reference to it (_ready).

Later on, as the loop starts, the consumer task is yielded and it waits on an unreachable future. The last strong ref to it is lost (loop._ready). It is not collected immediately, because it has just created a reference loop (task -> coroutine -> stack -> future -> task) that will be broken only at task completion.

gc.collect() called *before* the tasks are ever run has the weird side effect of moving the automatic gc collection forward in time. Automatic gc triggers after a few (but not all) consumers have become unreachable, depending on how many instructions were executed before running the loop.

gc.collect() called after all the consumers are waiting on the unreachable future reaps all consumer tasks as expected. No bug in garbage collection.

Yielding from asyncio.sleep() prevents the consumers from being collected: it creates a strong ref to the future in the loop. I suspect all network-related asyncio coroutines behave this way too.

Summing up: tasks that have no strong refs may be garbage collected unexpectedly, or not at all, depending on which future they yield to. It is very difficult to debug and understand why these tasks disappear.

Side note: the patches submitted and merged in this issue do emit the relevant warnings when PYTHONASYNCIODEBUG is set. This is very useful.

Proposed enhancements
---------------------

1. Document that you should always keep strong refs to tasks, or to the futures/coroutines the tasks yield from. This knowledge is currently passed around among the brave asyncio users like oral tradition.

2. Alternatively, keep strong references to all futures that make it through Task._step. We already keep strong refs to *some* of the asyncio built-in coroutines (`asyncio.sleep` is one of those). We also keep strong references to tasks that are ready to be run (the ones that simply `yield`, or the ones that have not started yet).

If you also think 1. or 2. is needed, let me know and I'll try to cook up a patch. Sorry for the noise.

--
nosy: +mpaolini
Added file: http://bugs.python.org/file36405/test2.py

___ Python tracker <http://bugs.python.org/issue21163> ___
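To make proposal 1 concrete, this is the pattern users currently have to follow (an editor's sketch in the Python 3.4-era coroutine style; the names are illustrative, not code from the issue):

```
import asyncio

@asyncio.coroutine
def consumer(queue):
    item = yield from queue.get()
    return item

loop = asyncio.get_event_loop()
queue = asyncio.Queue()
task = asyncio.Task(consumer(queue))   # the strong ref keeps the task alive
loop.call_soon(queue.put_nowait, 'hello')
print(loop.run_until_complete(task))   # -> hello
```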