[issue41740] Improve error message for string concatenation via `sum`

2020-09-07 Thread Marco Paolini


Marco Paolini added the comment:

I was thinking to just clarify a bit the error message that results from 
PyNumber_Add. This won't make it slower in the "hot" path.

doing something like this (not compile tested, sorry; note `item` must still 
be alive when its type is checked, so the DECREF moves below the check):

--- a/Python/bltinmodule.c
+++ b/Python/bltinmodule.c
@@ -2451,8 +2451,14 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
         Py_DECREF(result);
-        Py_DECREF(item);
         result = temp;
-        if (result == NULL)
+        if (result == NULL) {
+            if (PyUnicode_Check(item) || PyBytes_Check(item) ||
+                PyByteArray_Check(item))
+                PyErr_SetString(PyExc_TypeError,
+                    "sum() can't sum bytes, strings or byte-arrays [use .join(seq) instead]");
+            Py_DECREF(item);
             break;
+        }
+        Py_DECREF(item);
     }
     Py_DECREF(iter);
     return result;
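
With a change along those lines, summing string or bytes items would fail with 
the friendlier message instead of the generic one (hypothetical output, 
assuming the patch above):

>>> sum([b'a', b'b'])
TypeError: sum() can't sum bytes, strings or byte-arrays [use .join(seq) instead]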

--

___
Python tracker 
<https://bugs.python.org/issue41740>



[issue41740] string concatenation via `sum`

2020-09-07 Thread Marco Paolini


Marco Paolini added the comment:

Also worth noting: the start argument is type-checked instead. Maybe we could 
apply the same checks to the items of the iterable?

python3 -c "print(sum(('a', 'b', 'c'), start='d'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]


see 
https://github.com/python/cpython/blob/c96d00e88ead8f99bb6aa1357928ac4545d9287c/Python/bltinmodule.c#L2310
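
A rough pure-Python model of what that would mean (an illustrative sketch 
only; the real check is in C and today covers only the start argument):

def sum_checked(iterable, start=0):
    # Hypothetical: apply the start-argument type check to every item too.
    def reject(obj):
        if isinstance(obj, (str, bytes, bytearray)):
            raise TypeError(
                "sum() can't sum strings [use ''.join(seq) instead]")
    reject(start)
    result = start
    for item in iterable:
        reject(item)
        result = result + item
    return result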

--

___
Python tracker 
<https://bugs.python.org/issue41740>



[issue41740] string concatenation via `sum`

2020-09-07 Thread Marco Paolini


Marco Paolini added the comment:

This happens because the default value for the start argument is zero, hence 
the first operation is `0 + 'a'`.
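
In other words (actual behaviour today):

>>> sum(('a', 'b', 'c'))        # start defaults to 0, so this is 0 + 'a'
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> sum(('a', 'b', 'c'), '')    # a str start hits the dedicated check
TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>> ''.join(('a', 'b', 'c'))    # the recommended spelling
'abc'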

--
nosy: +mpaolini

___
Python tracker 
<https://bugs.python.org/issue41740>



[issue34624] -W option and PYTHONWARNINGS env variable does not accept module regexes

2020-07-08 Thread Marco Paolini


Marco Paolini added the comment:

Hello Thomas,

Do you need any help fixing the conflicts in your PR?

Even though Lib/warnings.py has changed a little in the last 2 years, your PR 
is still good!

--
nosy: +mpaolini

___
Python tracker 
<https://bugs.python.org/issue34624>



[issue38677] Arraymodule initialization error handling improvements

2019-11-03 Thread Marco Paolini


Change by Marco Paolini:


--
keywords: +patch
pull_requests: +16552
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/17039

___
Python tracker 
<https://bugs.python.org/issue38677>



[issue38677] Arraymodule initialization error handling improvements

2019-11-03 Thread Marco Paolini


New submission from Marco Paolini:

Module two-phase initialization does not report errors correctly to the import 
machinery.

--
components: Extension Modules
messages: 355913
nosy: mpaolini
priority: normal
severity: normal
status: open
title: Arraymodule initialization error handling improvements
type: behavior
versions: Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue38677>



[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini


Marco Paolini added the comment:

ujson (https://github.com/esnme/ultrajson), by contrast, is faster when 
decoding non-ASCII input in the same example above, so there is likely room 
for improvement...

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini

Marco Paolini added the comment:

Oops, sorry, here are the right commands:

python -m pyperf timeit -s 'import json;' -s 'c = "a"; s = json.dumps(c * (2**10 // (len(json.dumps(c)) - 2)))' 'json.loads(s)' -o ascii2k.json

python -m pyperf timeit -s 'import json;' -s 'c = "€"; s = json.dumps(c * (2**10 // (len(json.dumps(c)) - 2)))' 'json.loads(s)' -o nonascii2k.json

Mean +- std dev: [ascii2k] 3.69 us +- 0.05 us -> [nonascii2k] 12.4 us +- 0.1 us: 3.35x slower (+235%)

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini

Marco Paolini added the comment:

Also worth noting: escape sequences for non-ASCII characters are slower, even 
when the encoded length is the same.

python -m pyperf timeit -s 'import json;' -s 'c = "€"; s = json.dumps(c * (2**10 // len(json.dumps(c)) - 2))' 'json.loads(s)' -o nonascii2k.json

python -m pyperf timeit -s 'import json;' -s 'c = "a"; s = json.dumps(c * (2**10 // len(json.dumps(c)) - 2))' 'json.loads(s)' -o ascii2k.json

Mean +- std dev: [ascii2k] 2.59 us +- 0.04 us -> [nonascii2k] 9.98 us +- 0.12 us: 3.86x slower (+286%)
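
(For reference, the `len(json.dumps(c)) - 2` term is the encoded length of one 
character, quotes excluded; with the default ensure_ascii=True the euro sign 
is escaped to six characters:)

>>> import json
>>> json.dumps("€")
'"\\u20ac"'
>>> len(json.dumps("€")) - 2
6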

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-08-15 Thread Marco Paolini


Marco Paolini added the comment:

I also confirm Inada's patch further improves performance!

All my previous benchmarks were done with gcc, with PGO optimization trained 
only on the test_json task... maybe this explains the weird results?

I tested the performance of the new master 
(69f37bcb28d7cd78255828029f895958b5baf6ff) with *all* PGO tasks, reverting my 
original patch:

diff --git a/Modules/_json.c b/Modules/_json.c
index 112903ea57..9b63167276 100644
--- a/Modules/_json.c
+++ b/Modules/_json.c
@@ -442,7 +442,7 @@ scanstring_unicode(PyObject *pystr, Py_ssize_t end, int strict, Py_ssize_t *next
         if (d == '"' || d == '\\') {
             break;
         }
-        if (d <= 0x1f && strict) {
+        if (strict && d <= 0x1f) {
             raise_errmsg("Invalid control character at", pystr, next);
             goto bail;
         }

... and surprise...

Mean +- std dev: [69f37bcb28d7cd78255828029f895958b5baf6ff] 5.29 us +- 0.07 us 
-> [69f37bcb28d7cd78255828029f895958b5baf6ff-patched] 5.11 us +- 0.03 us: 
1.04x faster (-4%)

Should we revert my original patch entirely now? Or am I missing something?

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-07-30 Thread Marco Paolini


Marco Paolini added the comment:

@steve.dower yes, that's what made me discard that experiment we did during the 
sprint.

OK, I will test your new patch soon.

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-07-29 Thread Marco Paolini


Marco Paolini added the comment:

I forgot to mention: I was inspired by @christian.heimes's talk at EuroPython 
2019 (thanks!): 
https://ep2019.europython.eu/talks/es2pZ6C-introduction-to-low-level-profiling-and-tracing/

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-07-29 Thread Marco Paolini


Marco Paolini added the comment:

I am also working on a different patch that uses the "pcmpestri" SSE4 
processor instruction; it looks like this for now.

While at it, I realized there is (maybe) another potential speedup: avoiding 
the ucs4lib_find_max_char call we do for each chunk of the string (that 
entails scanning the string in memory one more time)... anyway, that's another 
(much longer) story, probably for another issue?


```
diff --git a/Modules/_json.c b/Modules/_json.c
index 38beb6f50d..25b1cf4a99 100644
--- a/Modules/_json.c
+++ b/Modules/_json.c
@@ -400,6 +400,38 @@ _build_rval_index_tuple(PyObject *rval, Py_ssize_t idx) {
     Py_CLEAR(chunk); \
 }
 
+
+inline unsigned int
+_fast_search(const void *needle, unsigned int needle_len, const void *haystack, unsigned int haystack_len)
+{
+    unsigned int pos;
+    __asm__ __volatile__("movq (%1), %%xmm1;\n"
+                         "mov %2, %%eax;\n"
+                         "movq %3, %%r8;\n"
+                         "mov %4, %%edx;\n"
+                         ".intel_syntax noprefix;\n"
+                         "loop: pcmpestri xmm1, [r8], 0;\n" /* 0 = equal any */
+                         /* "pcmpestri %%mm1, (%%r8), $0;\n" /\* 0 = equal any *\/ */
+                         ".att_syntax prefix;\n"
+                         "cmp $15, %%ecx;\n"
+                         "jbe found;\n"
+                         "sub $16, %%edx;\n"
+                         "jnge notfound;\n"
+                         "add $16, %%r8;\n"
+                         "jmp loop;\n"
+                         "notfound: movl %4, %%ecx;\n"
+                         "jmp exit;\n"
+                         "found: mov %4, %%eax;\n"
+                         "sub %%edx, %%eax;\n"
+                         "add %%eax, %%ecx;\n"
+                         "exit: mov %%ecx, %0;\n"
+                         :"=m"(pos)
+                         :"r"(needle), "r"(needle_len), "r"(haystack), "r"(haystack_len)
+                         :"%eax", "%edx", "%ecx", "%r8", "%xmm1");
+    return pos;
+}
+
+
 static PyObject *
 scanstring_unicode(PyObject *pystr, Py_ssize_t end, int strict, Py_ssize_t *next_end_ptr)
 {
@@ -431,17 +463,26 @@ scanstring_unicode(PyObject *pystr, Py_ssize_t end, int strict, Py_ssize_t *next
         PyErr_SetString(PyExc_ValueError, "end is out of bounds");
         goto bail;
     }
+    char needle[2];
+    needle[0] = '"';
+    needle[1] = '\\';
     while (1) {
         /* Find the end of the string or the next escape */
         Py_UCS4 c = 0;
-        for (next = end; next < len; next++) {
+        if (kind == PyUnicode_1BYTE_KIND) {
+            next = _fast_search(needle, 2, buf+end, len-end) + end;
+            if (next < len)
                 c = PyUnicode_READ(kind, buf, next);
-            if (c == '"' || c == '\\') {
-                break;
-            }
-            else if (strict && c <= 0x1f) {
-                raise_errmsg("Invalid control character at", pystr, next);
-                goto bail;
+        } else {
+            for (next = end; next < len; next++) {
+                c = PyUnicode_READ(kind, buf, next);
+                if (c == '"' || c == '\\') {
+                    break;
+                }
+                else if (strict && c <= 0x1f) {
+                    raise_errmsg("Invalid control character at", pystr, next);
+                    goto bail;
+                }
             }
         }
         if (!(c == '"' || c == '\\')) {
```
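
For readers who don't speak SSE4.2, here is a rough pure-Python model of what 
`_fast_search` is meant to compute (the index of the first byte of `haystack` 
matching any byte of `needle`, or `haystack_len` when there is none). This is 
my reading of the asm above, not tested against it:

```
def fast_search_model(needle: bytes, haystack: bytes) -> int:
    # Models the pcmpestri "equal any" loop above.
    for i, byte in enumerate(haystack):
        if byte in needle:   # e.g. matches '"' or '\\'
            return i
    return len(haystack)     # not found: the caller then sees next == len

assert fast_search_model(b'"\\', b'abc"def') == 3
assert fast_search_model(b'"\\', b'abcdef') == 6
```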

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-07-29 Thread Marco Paolini


Marco Paolini added the comment:

On gcc, running the tests above, the only change that is relevant for speedup 
is switching around the strict check. Removing the extra MOV related to the 
outer "c" variable is not significant (at least on gcc and in the few tests I 
did).

Unfortunately I had to change the patch we did together during the sprint, 
because it was breaking the strict check logic...

I updated my PR accordingly and kept only the bare minimum.

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


Marco Paolini added the comment:

Here's a real-world example:

$ ls -hs events-100k.json 
84M events-100k.json

+-----------+-------------------------+-----------------------------+
| Benchmark | vanilla-bpo-events-100k | patched-bpo-events-100k     |
+===========+=========================+=============================+
| timeit    | 985 ms                  | 871 ms: 1.13x faster (-12%) |
+-----------+-------------------------+-----------------------------+

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


Marco Paolini added the comment:

Also, on my real workload (loading a 60GB JSONL file containing mostly 
strings) I measured a 10% improvement.
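
For context, the workload is essentially this shape (an illustrative sketch; 
the file name is hypothetical):

import json

def load_events(path):
    # One JSON document per line (JSONL); values are mostly long strings.
    with open(path) as f:
        for line in f:
            yield json.loads(line)

count = sum(1 for _ in load_events('events.jsonl'))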

--

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


Change by Marco Paolini:


--
nosy: +ezio.melotti, rhettinger

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


Change by Marco Paolini:


--
keywords: +patch
pull_requests: +14547
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/14752

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue37587] JSON loads performance improvement for long strings

2019-07-13 Thread Marco Paolini


New submission from Marco Paolini:

I analysed the performance of json.loads in one production workload we have.

py-spy tells me the majority of time is spent in the C json module (see 
events.svg).

Digging deeper, Linux perf tells me the hottest loop (where 20%+ of the time 
is spent) is in _json.scanstring_unicode, in this loop:

189:   movzx  eax,BYTE PTR [rbp+rbx*1+0x0]
       mov    DWORD PTR [rsp+0x44],eax
       cmp    eax,0x22
       je     22f
       cmp    eax,0x5c
       je     22f
       test   r13d,r13d
       je     180
       cmp    eax,0x1f

which is related to this code in Modules/_json.c


    /* Find the end of the string or the next escape */
    Py_UCS4 c = 0;
    for (next = end; next < len; next++) {
        c = PyUnicode_READ(kind, buf, next);
        if (c == '"' || c == '\\') {
            break;
        }
        else if (strict && c <= 0x1f) {
            raise_errmsg("Invalid control character at", pystr, next);
            goto bail;
        }
    }

Two optimisations can be done:

1. Remove the mov entirely. It is not needed inside the loop; it is only 
needed later, outside the loop, to access the variable.

2. Switch around the strict check (in the second if): strict defaults to 1, so 
it will likely pass the test, while the likelihood of finding an invalid 
character is lower.

Running the pyperformance json_loads benchmark I get:

+------------+----------------------+-----------------------------+
| Benchmark  | vanilla-pyperf-pgo38 | patched-pyperf-pgo38        |
+============+======================+=============================+
| json_loads | 54.9 us              | 53.9 us: 1.02x faster (-2%) |
+------------+----------------------+-----------------------------+


A micro benchmark on a 1MB long JSON string gives better results:

python -m pyperf timeit -s "import json; x = json.dumps({'k': '1' * 2 ** 20})" "json.loads(x)"

+-----------+------------+-----------------------------+
| Benchmark | vanilla-1m | patched-1m                  |
+===========+============+=============================+
| timeit    | 2.62 ms    | 2.39 ms: 1.10x faster (-9%) |
+-----------+------------+-----------------------------+

--
components: Library (Lib)
files: events.svg
messages: 347832
nosy: christian.heimes, mpaolini, pablogsal
priority: normal
severity: normal
status: open
title: JSON loads performance improvement for long strings
type: performance
versions: Python 3.6, Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file48476/events.svg

___
Python tracker 
<https://bugs.python.org/issue37587>



[issue26081] Implement asyncio Future in C to improve performance

2016-07-21 Thread Marco Paolini

Marco Paolini added the comment:

The guys developing uvloop say their implementation is already quite fast [1]. 
Is it worth integrating it?

[1] https://github.com/MagicStack/uvloop

--
nosy: +mpaolini

___
Python tracker 
<http://bugs.python.org/issue26081>



[issue24891] race condition in initstdio() (python aborts running under nohup)

2015-09-04 Thread Marco Paolini

Marco Paolini added the comment:

@haypo, yeah, definitely better than mine! All good for me.

--

___
Python tracker 
<http://bugs.python.org/issue24891>



[issue24891] race condition in initstdio() (python aborts running under nohup)

2015-09-04 Thread Marco Paolini

Marco Paolini added the comment:

@haypo thanks for the quick review. This new issue24891_2.patch covers all of 
the points you raised, except the "check exception type" one, which I am still 
figuring out.

--
Added file: http://bugs.python.org/file40355/issue24891_2.patch

___
Python tracker 
<http://bugs.python.org/issue24891>



[issue24891] race condition in initstdio() (python aborts running under nohup)

2015-09-04 Thread Marco Paolini

Changes by Marco Paolini:


Removed file: http://bugs.python.org/file40353/issue24891.patch

___
Python tracker 
<http://bugs.python.org/issue24891>



[issue24891] race condition in initstdio() (python aborts running under nohup)

2015-09-04 Thread Marco Paolini

Marco Paolini added the comment:

Oops, wrong patch... trying again.

--
Added file: http://bugs.python.org/file40354/issue24891.patch

___
Python tracker 
<http://bugs.python.org/issue24891>



[issue24891] race condition in initstdio() (python aborts running under nohup)

2015-09-04 Thread Marco Paolini

Marco Paolini added the comment:

Attaching a patch. Does it make any sense?

--
keywords: +patch
nosy: +mpaolini
Added file: http://bugs.python.org/file40353/issue24891.patch

___
Python tracker 
<http://bugs.python.org/issue24891>



[issue24726] OrderedDict has strange behaviour when dict.__setitem__ is used.

2015-07-26 Thread Marco Paolini

Marco Paolini added the comment:

Linking related issues: http://bugs.python.org/issue24721, 
http://bugs.python.org/issue24667 and http://bugs.python.org/issue24685

--
nosy: +mpaolini

___
Python tracker 
<http://bugs.python.org/issue24726>



[issue23670] Modifications to support iOS as a cross-compilation target

2015-07-25 Thread Marco Paolini

Changes by Marco Paolini:


--
nosy: +mpaolini

___
Python tracker 
<http://bugs.python.org/issue23670>



[issue23496] Steps for Android Native Build of Python 3.4.2

2015-07-25 Thread Marco Paolini

Changes by Marco Paolini:


--
nosy: +mpaolini

___
Python tracker 
<http://bugs.python.org/issue23496>



[issue24080] asyncio.Event.wait() Task was destroyed but it is pending

2015-04-30 Thread Marco Paolini

Marco Paolini added the comment:

KeyboardInterrupt is not handled gently by asyncio (see 
https://groups.google.com/d/msg/python-tulip/sovg7EIBoXs/m7U-0UXqzSQJ).

You could cancel all tasks in the signal handler:

...

def sig_interrupt():
    print('interrupt')
    for task in asyncio.Task.all_tasks():
        task.cancel()

loop.add_signal_handler(signal.SIGINT, sig_interrupt)
...

--
nosy: +mpaolini

___
Python tracker 
<http://bugs.python.org/issue24080>



[issue23227] Generator's finally block not run if close() called before first iteration

2015-04-22 Thread Marco Paolini

Marco Paolini added the comment:

I think there is an issue in the way you designed your cleanup logic, so I 
think this issue is invalid.

Usually, the code (function, class, ...) that *opens* the file should also be 
responsible for closing it.

Option 1) the caller opens and closes the file, wrapping the logged lines in 
a try/finally:

import logging

def logged_lines(f):
    try:
        for line in f:
            logging.warning(line.strip())
            yield line
    finally:
        logging.warning('closing')


f = open('yyy', 'r')
try:
    for l in logged_lines(f):
        print(l)
finally:
    f.close()


Option 2) the function opens and closes the file:

def logged_lines(fname):
    f = open(fname, 'r')
    try:
        for line in f:
            logging.warning(line.strip())
            yield line
    finally:
        logging.warning('closing')
        f.close()

for l in logged_lines('yyy'):
    print(l)

--
nosy: +mpaolini

___
Python tracker 
<http://bugs.python.org/issue23227>



[issue21163] asyncio doesn't warn if a task is destroyed during its execution

2014-09-20 Thread Marco Paolini

Marco Paolini added the comment:

Sorry for keeping this alive.

Take a look at the `wait_for.py` just submitted in the unrelated #22448: no 
strong refs to the tasks are kept. The tasks remain alive only because they 
are timers and the event loop keeps a strong ref.

Do you think my proposed patch is OK? Should I open a new issue?

--

___
Python tracker 
<http://bugs.python.org/issue21163>



[issue21163] asyncio doesn't warn if a task is destroyed during its execution

2014-08-19 Thread Marco Paolini

Marco Paolini added the comment:

> I don't understand how keeping a strong refrence would fix anything. You
> only provided one example (async-gc-bug.py) which uses Queue objects but
> keep weak references to them. Keeping strong references to tasks is not the
> right fix. You must keep strong references to queues. If a queue is
> destroyed, how can you put an item into it? Otherwise, the task will wait
> forever. Keeping a strong refrence to the task just hides the bug. Or I
> missed something.

The original asyncio-gc-issue.py wasn't written by me, and yes, as you say, it 
does have the reference bug you describe. I argue that bug shouldn't cause 
tasks to die: it should rather limit (as gc proceeds) the number of queues 
available to the producer in the WeakSet(), leaving alive all consumers 
waiting on an unreachable queue.

Please look at my test2.py, or even better test3.py, for a simpler example.

Note that in my issue_22163_patch_0.diff I only keep strong refs to futures a 
task is waiting on, just as asyncio already does with the asyncio.sleep() 
coroutine.

> I dislike the idea of keeping strong references to tasks, it may create
> even more reference cycles. We already have too many cycles with exceptions
> stored in futures (in tasks).

We are also already keeping strong refs to futures like asyncio.sleep.

I dislike the idea of randomly losing tasks.

I also dislike the idea of forcing the user to manage strong refs to their 
tasks. All 3rd party libraries will have to invent their own way, and it will 
lead to even more leaks/cycles that are very hard to debug.

Not just exceptions: every time a task is yielding on a future, asyncio 
creates a reference cycle.

> The current unit test uses low level functions to remove a variable using a
> frame object. Can you provide an example which shows the bug without using
> low level functions?

My issue_22163_patch_0.diff only clears references by setting variables to 
`None`. No low-level functions needed.

My test2.py example script also doesn't use any low-level functions.

I just uploaded test3.py with a simpler (and possibly more realistic) example.

--
Added file: http://bugs.python.org/file36413/test3.py

___
Python tracker 
<http://bugs.python.org/issue21163>



[issue21163] asyncio doesn't warn if a task is destroyed during its execution

2014-08-18 Thread Marco Paolini

Marco Paolini added the comment:

Submitted a first stab at #2. Let me know what you think.

If this works, we'll have to remove the test_gc_pending test, and then maybe 
even the code that now logs errors when a pending task is gc'ed.

--
Added file: http://bugs.python.org/file36408/issue_22163_patch_0.diff

___
Python tracker 
<http://bugs.python.org/issue21163>



[issue21163] asyncio doesn't warn if a task is destroyed during its execution

2014-08-18 Thread Marco Paolini

Marco Paolini added the comment:

> So you are changing your mind and withdrawing your option #1.

I think option #1 (tell users to keep strong refs to tasks) is
OK but option #2 is better.

Yes, I changed my mind ;)

--

___
Python tracker 
<http://bugs.python.org/issue21163>



[issue21163] asyncio doesn't warn if a task is destroyed during its execution

2014-08-18 Thread Marco Paolini

Marco Paolini added the comment:

Asking the user to manage strong refs is just passing the potential 
leak issue outside of the standard library. It doesn't really solve anything.

If the user gets the strong refs wrong he can either lose tasks or 
leak memory.

If the standard library gets it right, both issues are avoided.

> I'm all in favor of documenting that you must keep a strong reference to a
> task that you want to keep alive. I'm not keen on automatically keep all
> tasks alive, that might exacerbate leaks (which are by definition hard to
> find) in existing programs.

--

___
Python tracker 
<http://bugs.python.org/issue21163>



[issue21163] asyncio doesn't warn if a task is destroyed during its execution

2014-08-18 Thread Marco Paolini

Marco Paolini added the comment:

I finally wrapped my head around this. I wrote a (simpler) script to get a 
better picture.

What happens
------------

When a consumer task is first instantiated, the loop holds a strong reference 
to it (_ready).

Later on, as the loop starts, the consumer task is yielded and it waits on an 
unreachable future. The last strong ref to it is lost (loop._ready).

It is not collected immediately because it just created a reference loop
(task -> coroutine -> stack -> future -> task) that will be broken only at 
task completion.

gc.collect() called *before* the tasks are ever run has the weird side effect 
of moving the automatic gc collection forward in time.
Automatic gc triggers after a few (but not all) consumers have become 
unreachable, depending on how many instructions were executed before running 
the loop.

gc.collect() called after all the consumers are waiting on the unreachable 
future reaps all consumer tasks as expected. No bug in garbage collection.

Yielding from asyncio.sleep() prevents the consumers from being 
collected: it creates a strong ref to the future in the loop.
I suspect all network-related asyncio coroutines behave this way too.

Summing up: tasks that have no strong refs may be garbage collected 
unexpectedly or not at all, depending on which future they yield to. It is 
very difficult to debug and understand why these tasks disappear.

Side note: the patches submitted and merged in this issue do emit the relevant 
warnings when PYTHONASYNCIODEBUG is set. This is very useful.

Proposed enhancements
---------------------

1. Document that you should always keep strong refs to tasks, or to the 
futures/coroutines the tasks yield from. This knowledge is currently passed 
around among the brave asyncio users like oral tradition (see the sketch after 
point 2).

2. Alternatively, keep strong references to all futures that make it through 
Task._step. We already keep strong refs to *some* of the asyncio builtin 
coroutines (`asyncio.sleep` is one of those). Also, we do keep strong 
references to tasks that are ready to be run (the ones that simply `yield` or 
the ones that have not started yet).
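
For point 1, the pattern looks something like this (a minimal sketch with 
assumed names; the idea is that the application owns a strong reference to 
each task until it completes):

import asyncio

tasks = set()  # strong refs: keeps the tasks alive until they are done

def spawn(loop, coro):
    task = loop.create_task(coro)
    tasks.add(task)
    task.add_done_callback(tasks.discard)  # drop the ref once finished
    return task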

If you also think 1. or 2. are needed, let me know and I'll try to cook a 
patch.

Sorry for the noise

--
nosy: +mpaolini
Added file: http://bugs.python.org/file36405/test2.py

___
Python tracker 
<http://bugs.python.org/issue21163>