[Armin Rigo]
> There is an oversight in the design of __index__() that only just
> surfaced :-( It is responsible for the following behavior, on a 32-bit
> machine with >= 2GB of RAM:
>
> >>> s = 'x' * (2**100) # works!
> >>> len(s)
> 2147483647
>
> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
> order to call v->sq_repeat.

? I don't see an invocation of __index__ or nb_index in
PySequence_Repeat. To the contrary, its /incoming/ `count` argument is
constrained to Py_ssize_t from the start:

    PyObject * PySequence_Repeat(PyObject *o, Py_ssize_t count)

... OK, I think you mean sequence_repeat() in abstract.c. That does
invoke nb_index. But, as below, I don't think it should in this case.

> However, __index__ is defined to clip the result to fit in a Py_ssize_t.
> This means that the above problem exists
> with all sequences, not just strings, given enough RAM to create such
> sequences with 2147483647 items.
>
> For reference, in 2.4 we correctly get an OverflowError.
>
> Argh! What should be done about it?

IMO, this is plain wrong. PEP 357 isn't entirely clear, but it is clear
the author only had /slicing/ in mind (where clipping makes sense -- and
which makes `__index__` a misleading name). Guido pointed out the
ambiguity here:

    http://mail.python.org/pipermail/python-dev/2006-February/060624.html

    There's also an ambiguity when using simple indexing. When writing
    x[i] where x is a sequence and i an object that isn't int or long
    but implements __index__, I think i.__index__() should be used
    rather than bailing out. I suspect that you didn't think of this
    because you've already special-cased this in your code -- when a
    non-integer is passed, the mapping API is used (mp_subscript). This
    is done to support extended slicing. The built-in sequences (list,
    str, unicode, tuple for sure, probably more) that implement
    mp_subscript should probe for nb_index before giving up. The generic
    code in PyObject_GetItem should also check for nb_index before
    giving up.

So, e.g., plain a[i] shouldn't use __index__ either if i is already int
or long. I don't see any justification for invoking nb_index in
sequence_repeat(), although if someone thinks it should, then as for
plain indexing it certainly shouldn't invoke nb_index if the incoming
count is an int or long to begin with.

Ah, fudge.
Contrary to Guido's advice above, I see that PyObject_GetItem() /also/
unconditionally invokes nb_index (even when the incoming key is already
int or long). It shouldn't do that either (according to me).

OTOH, in the long discussion about PEP 357, I'm not sure anyone except
Travis was clear on whether nb_index was meant to apply only to sequence
/slicing/ or was meant to apply "everywhere an object gets used in an
index-like context". Clipping makes sense only for the former, but it
looks like the implementation treats it more like the latter. This was
probably exacerbated by:

    http://mail.python.org/pipermail/python-dev/2006-February/060663.html

    [Travis]
    There are other places in Python that check specifically for int
    objects and long integer objects and fail with anything else.
    Perhaps all of these should also call the __index__ slot.

    [Guido]
    Right, absolutely.

This is a mess :-)
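
To make the slicing-versus-everything-else distinction concrete, here is a rough sketch in present-day Python, not the 2.5-era C code discussed above. It only illustrates the two policies: clip an out-of-range index where it serves as a slice bound, and raise OverflowError where it serves as a repeat count. The names Idx, clipped_index, strict_index, repeat and slice_upto are all made up for illustration, and sys.maxsize is assumed to stand in for the largest Py_ssize_t value.

```python
import operator
import sys


class Idx:
    """Hypothetical index-like object: not an int, but implements __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


def clipped_index(obj):
    """Convert via __index__ and clip into the Py_ssize_t range.

    Clipping is harmless for slice bounds, because a bound past the end
    of a sequence means the same thing as the end itself.
    """
    n = operator.index(obj)
    return max(-sys.maxsize - 1, min(sys.maxsize, n))


def strict_index(obj):
    """Convert via __index__ but refuse values outside the Py_ssize_t range.

    This is the policy argued for above for repetition and plain
    indexing, where silently clipping changes the meaning.
    """
    n = operator.index(obj)
    if not -sys.maxsize - 1 <= n <= sys.maxsize:
        raise OverflowError("index does not fit in an index-sized integer")
    return n


def repeat(seq, count):
    # Repetition: an oversized count is an error, never a shorter result.
    return seq * strict_index(count)


def slice_upto(seq, stop):
    # Slicing: an oversized bound is simply clipped.
    return seq[:clipped_index(stop)]
```

With these definitions, slice_upto("abc", Idx(2**100)) quietly returns the whole string, while repeat("x", Idx(2**100)) raises OverflowError instead of building a string of 2147483647 characters.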