Nick Coghlan <ncogh...@gmail.com> added the comment:

I was going to note that the algorithm Anthony has pursued here is the same one 
we already use for the list constructor and list.extend(), but Inada-san 
already pointed that out :)

While length_hint is allowed to be somewhat inaccurate, we do expect it to be 
at least *vaguely* accurate (otherwise it isn't very useful, and if it can be 
inaccurate enough to trigger OverflowError or MemoryError in cases that would 
otherwise work reasonably well, it would be better for a type not to implement 
it at all).

While it would be nice to be able to avoid adding a new opcode, the problem is 
that the existing candidate opcodes (BUILD_LIST, BUILD_LIST_UNPACK) are both 
inflexible in what they do:

- BUILD_LIST requires that the final list length be known at compile time
- BUILD_LIST_UNPACK infers the final length from an object reference, but calls 
_PyList_Extend directly, so it combines the preallocation and the iterator 
consumption into a single step, and hence can't be used outside of iterable 
unpacking into a list

At the same time, attempting to generalise either of them isn't desirable, 
since it would slow them down for their existing use cases, and be slower than 
a new opcode for this use case.

The proposed BUILD_LIST_PREALLOC opcode splits the difference: it lets the 
compiler provide the interpreter with a *hint* as to how big the resulting list 
is expected to be.

That said, you'd want to run the result through the benchmark suite rather than 
relying solely on microbenchmarks, as even though unfiltered 
"[some_operation_on_x for x in y]" comprehensions without nested loops or 
filter clauses are pretty common (more common than the relatively new "[*itr]" 
syntax), it's far less clear what the typical distribution in input lengths 
actually is, and how many memory allocations need to be avoided in order to 
offset the cost of the initial _PyObject_LengthHint call (as Pablo's small 
scale results show).

(Note that in the _PyList_Extend code, there are preceding special cases for 
builtin lists and tuples that take those down a much faster path that avoids 
the _PyObject_LengthHint call entirely)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36551>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to