Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Tue, 17 Jul 2012 00:18:55 +1000 Nick Coghlan wrote: > > Given that all standard library containers default to assuming a size > of zero (absent information indicating otherwise), and a special value > would need to be special cased by *every* client of the API (and > almost always defaulted to zero), it's simply not a good trade-off. Actually, dict and set default to a non-zero internal size, but I agree making containers harder to implement is not a good thing. > >> The PEP itself already has this general tone, but I think that it > >> should be even more emphatic that it's about codifying the status quo, > >> *not* about modifying it with ideas haven't been proven useful through > >> past experience. > > > > Then the PEP shouldn't address infinite iterators at all. > > Noting that infinite iterators are free to define __length_hint__ to > always throw an exception *is* the status quo. Being "free" to do unexpected or unconventional things is not the same thing as codifying those behaviours in a PEP, especially when noone is actually doing them. __length_hint__ is supposed to be informative, it shouldn't error out on you. So still -1 from me. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Tue, Jul 17, 2012 at 12:01 AM, Antoine Pitrou wrote: > On Mon, 16 Jul 2012 23:23:18 +1000 > Nick Coghlan wrote: >> >> - distinguishing between different reasons for saying "don't >> preallocate any space" (i.e. returning zero). I still haven't heard a >> convincing rationale for this one > > The point is that zero is a valid value for a length hint. By making it > a special value for "don't know", you are making the protocol > potentially confusing, and you are also departing from the current > semantics. No, it just means the default estimate is always zero. If you don't do that, then *every* client of __length_hint__ has to check for the magic value. It's making the API harder to use for no good reason. > (and, yes, I think distinguishing between zero and "don't know" is > useful: imagine a container that would preallocate 256 entries by > default when the answer is "don't know") Such a container has to already deal with the case when it overestimates severely. The only cost of using zero as a default estimate is that such hypothetical containers will overallocate when they technically didn't need to, which will already happen for any empty iterator that doesn't provide __length_hint__ at all. Given that all standard library containers default to assuming a size of zero (absent information indicating otherwise), and a special value would need to be special cased by *every* client of the API (and almost always defaulted to zero), it's simply not a good trade-off. >> The PEP itself already has this general tone, but I think that it >> should be even more emphatic that it's about codifying the status quo, >> *not* about modifying it with ideas haven't been proven useful through >> past experience. > > Then the PEP shouldn't address infinite iterators at all. Noting that infinite iterators are free to define __length_hint__ to always throw an exception *is* the status quo. We just haven't done it for the stdlib ones. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Mon, 16 Jul 2012 23:23:18 +1000 Nick Coghlan wrote: > > - distinguishing between different reasons for saying "don't > preallocate any space" (i.e. returning zero). I still haven't heard a > convincing rationale for this one The point is that zero is a valid value for a length hint. By making it a special value for "don't know", you are making the protocol potentially confusing, and you are also departing from the current semantics. (and, yes, I think distinguishing between zero and "don't know" is useful: imagine a container that would preallocate 256 entries by default when the answer is "don't know") > The PEP itself already has this general tone, but I think that it > should be even more emphatic that it's about codifying the status quo, > *not* about modifying it with ideas haven't been proven useful through > past experience. Then the PEP shouldn't address infinite iterators at all. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Mon, Jul 16, 2012 at 7:21 PM, Tim Golden wrote: >> Speaking of which - I find this bikeshed disgusting. > > Disgusting? Irritating, perhaps, but why should a PEP -- even one whose > purpose is to codify existing practice -- not result in discussions > about its subject matter? > > The final P stands for Proposal, not for Pronouncement. Indeed - I'd be worried if any PEP sailed through python-dev review without a thorough kicking of the tires. Yes, it can be annoying having to bring people up to speed on issues that they aren't familiar with, but that's generally a sign that there is relevant background information *missing from the PEP*. PEP's aren't supposed to be written just for people that are already intimately familiar with a problem - they're supposed to provide enough background that they stand on their own. In this case, the key points that I think need to be added: - more background on why the __length_hint__ API exists in CPython in the first place: to minimise potentially expensive data copies (due to memory reallocation) when creating a concrete container from an iterator. This includes when creating them from another concrete container via an intermediate iterator. This is why at least the following produce objects that define __length_hint__ in CPython: reversed itertools.repeat iter(dict()) iter(list()) iter(tuple()) iter(str()) iter(bytes()) iter(bytearray()) iter(set()) iter(frozenset()) dict.values() dict.keys() As well as any user defined sequence that relies on the default sequence iterator: >>> class MySeq(): ... def __getitem__(self, idx): ... return idx ... def __len__(self): ... return 10 ... >>> hasattr(iter(MySeq()), "__length_hint__") True - clarification on the implications of it only being a "hint": specifically, as it may be an over- or underestimate, you *cannot* rely on the hint to decide whether or not to iterate over the object when a valid length is returned (as a value of zero may be an underestimate). However, it does allow you to presize your container more appropriately than just starting at zero as usual, thus achieving the aim of reducing the risk of unnecessary memory copies. That's the basic proposal. Separate from that, there are a few suggestions for *enhancement* beyond what CPython currently uses (and has demonstrated a clear need for): - adding operator.length_hint(). This makes sense to me, as it makes it much easier to use the API when implementing a container type in Python. It's also a pretty simple addition. - adding a C level type slot. I'm personally -1 on this one in the context of the PEP. I don't think the current PEP (which is really aimed at standardisation for PyPy's benefit) should be weighed down with this CPython specific implementation detail. As a separate proposal, independent of this PEP, from someone that cares specifically about micro-optimising this API for CPython, and (preferably) can produce benchmark numbers to show the additional implementation complexity is worthwhile, then I wouldn't object. I just don't want this orthogonal concern to derail the standardisation effort. - distinguishing between different reasons for saying "don't preallocate any space" (i.e. returning zero). I still haven't heard a convincing rationale for this one - it seems to be based on the notion that the case of skipping the iteration step for a genuinely empty iterable is worth optimising. This idea just doesn't make sense when any legitimate length value that is returned can't be trusted to be completely accurate - you have to iterate to confirm the actual number. - making it possible to fail *fast* when a known infinite iterator (like itertools.cycle or itertools.count) is passed to a concrete container. I think this is best covered in the PEP by explicitly stating that some types may implement __length_hint__ to always raise TypeError rather than defining a magic return value that means "I'm infinite". - making it possible for iterators like enumerate, map and filter to delegate __length_hint__ to their underlying iterator. This seems potentially worth doing, but requires resolving the problem that Raymond noted with handling the difference in internal behaviour between enumerate("abc") and enumerate(iter("abc")). Again, it would be unfortunate to see the PEP held up over this. - making it possible to define __length_hint__ for generator-iterator objects. While this is a nice idea, again, I don't think it's something that this particular PEP should be burdened with. My main point is that the current __length_hint__ behaviour has already proven its value in the real world. The PyPy team would like that behaviour codified, so they can be reasonably sure both implementations are doing the same thing. Many of the enhancements I have listed above may be suitable material for future enhancement proposals, but I'm not seeing any requeste
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
> *If* resizing list is so slow, then why not make it faster? A simple solution to speed up such problem is to change the overallocation factor, but it may waste memory. Python tries to be fast and to not waste too much memory. > Why is it a significant optimisation? > How much slower is it? > Where is the data? I worked recently on optimizing str%args and str.format(args). Handle correctly the memory allocation is just critical for performances, especially for str with the PEP 393, because we have to shrink the buffer to the exact string length with the formatting function is done. I tried various overallocation factors and I chose 1.25 (5/4) because it was the fastest. See for example this issue for benchmark numbers: http://bugs.python.org/issue14687 The PyUnicodeWriter internal object uses various options to choose how many bytes should be allocated: * an overallocation flag to disable overallocation when we know that we are writing the last character/string into be buffer * a "minimal length" used for the first allocation * an hardcoded overallocation factor (1.25) PyUnicodeWriter is a little bit different than the __length_hint__ issue because PyUnicodeWriter has to shrink the buffer when it is done, but I can say that overallocation is very useful for speed. Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
Maciej Fijalkowski, 16.07.2012 11:14: > On Mon, Jul 16, 2012 at 11:02 AM, Stefan Behnel wrote: >> Mark Shannon, 16.07.2012 10:37: >>> If resizing of lists is too slow, then we should reconsider the 9/8 >>> factor and/or look to tweak the resizing code. >> >> The thing is that the performance is platform specific. On systems with a >> fast memory allocator, especially on Linux, the difference is negligible. >> However, with a slow memory allocator, especially a slow realloc(), e.g. on >> Windows or Solaris, this can substantially hurt the performance, up to a >> quadratically increasing runtime in the worst case. >> >> The length hint was implemented specifically to work around this problem. > > It's not the actual allocation (typically), it's the copying that's the > problem. Note that a call to realloc() includes that part and can avoid copying if possible. A good realloc() implementation can make this use case run in amortised linear time, at least on a system with sufficient memory. A bad one can result in quadratic runtime, which is way more than a change by a constant factor. Thus my above comment that it's platform specific. > there are cases where it *can* be significant (as in the copying is by far > the dominating operation), most notable giant templates with iterators > written in C. Absolutely. This is particularly visible at the C level because C implemented iterators have a very low overhead overall. > Speaking of which - I find this bikeshed disgusting. The purpose of the PEP > is to codify whatever is already written in code in CPython. If you guys > don't want it, we'll just implement it anyway and try to follow the CPython > current implementation from 2.7. The idea behind this bikeshedding is that at the moment where we make this an official protocol rather than an implementation detail, it should be able to communicate the different states on the callee side of such a protocol. I.e. it needs a better design than the current one. Stefan ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
> Speaking of which - I find this bikeshed disgusting. Disgusting? Irritating, perhaps, but why should a PEP -- even one whose purpose is to codify existing practice -- not result in discussions about its subject matter? The final P stands for Proposal, not for Pronouncement. TJG ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Mon, Jul 16, 2012 at 11:02 AM, Stefan Behnel wrote: > Mark Shannon, 16.07.2012 10:37: > > If resizing of lists is too slow, then we should reconsider the 9/8 > factor > > and/or look to tweak the resizing code. > > The thing is that the performance is platform specific. On systems with a > fast memory allocator, especially on Linux, the difference is negligible. > However, with a slow memory allocator, especially a slow realloc(), e.g. on > Windows or Solaris, this can substantially hurt the performance, up to a > quadratically increasing runtime in the worst case. > > The length hint was implemented specifically to work around this problem. > > Stefan > > It's not the actual allocation (typically), it's the copying that's the problem. As far as data goes - preallocation can make 4x difference (on PyPy, although the dominant operation is not different on CPython) on ''.join(some-iterable). It depends grossly on the sizes of the list, so you can't claim that there is a precise speedup of a constant factor, however, there are cases where it *can* be significant (as in the copying is by far the dominating operation), most notable giant templates with iterators written in C. Speaking of which - I find this bikeshed disgusting. The purpose of the PEP is to codify whatever is already written in code in CPython. If you guys don't want it, we'll just implement it anyway and try to follow the CPython current implementation from 2.7. Cheers, fijal ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
On Jul 16, 2012, at 1:37 AM, Mark Shannon wrote: > To quote from PEP 424: > >> Rationale >> = >> Being able to pre-allocate lists based on the expected size, as estimated by >> ``__length_hint__``, can be a significant optimization. CPython has been >> observed to run some code faster than PyPy, purely because of this >> optimization >> being present. > > Why is it a significant optimisation? Unless pre-sized by with a length prediction, a growing list periodically needs to call realloc() which can move all the data to a new location in memory.Pre-sizing avoids that entirely. > If resizing of lists is too slow, then we should reconsider the 9/8 factor > and/or look to tweak the resizing code. A great deal of thought and care went into the current design. It has already been "tweaked". Raymond P.S. The dictionary code also uses presizing for copies, updates, set conversion, etc. It is a perfectly reasonable technique to pre-allocate the correct size container when the ultimate length is knowable in advance. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
Mark Shannon, 16.07.2012 10:37: > If resizing of lists is too slow, then we should reconsider the 9/8 factor > and/or look to tweak the resizing code. The thing is that the performance is platform specific. On systems with a fast memory allocator, especially on Linux, the difference is negligible. However, with a slow memory allocator, especially a slow realloc(), e.g. on Windows or Solaris, this can substantially hurt the performance, up to a quadratically increasing runtime in the worst case. The length hint was implemented specifically to work around this problem. Stefan ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)
To quote from PEP 424: Rationale = Being able to pre-allocate lists based on the expected size, as estimated by ``__length_hint__``, can be a significant optimization. CPython has been observed to run some code faster than PyPy, purely because of this optimization being present. Why is it a significant optimisation? How much slower is it? Where is the data? *If* resizing list is so slow, then why not make it faster? To quote wikipedia (http://en.wikipedia.org/wiki/Dynamic_array) """ As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a-1), while the number of wasted cells is bounded above by (a-1)n. The choice of a depends on the library or application: some textbooks use a = 2, but Java's ArrayList implementation uses a = 3/2 and the C implementation of Python's list data structure uses a = 9/8. """ If resizing of lists is too slow, then we should reconsider the 9/8 factor and/or look to tweak the resizing code. Cheers, Mark. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com