[Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Armin Rigo
Hi,

There is an oversight in the design of __index__() that only just
surfaced :-(  It is responsible for the following behavior, on a 32-bit
machine with >= 2GB of RAM:

>>> s = 'x' * (2**100)   # works!
>>> len(s)
2147483647

This is because PySequence_Repeat(v, w) works by applying w.__index__ in
order to call v->sq_repeat.  However, __index__ is defined to clip the
result to fit in a Py_ssize_t.  This means that the above problem exists
with all sequences, not just strings, given enough RAM to create such
sequences with 2147483647 items.

For reference, in 2.4 we correctly get an OverflowError.

Argh!  What should be done about it?


A bientot,

Armin.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread David Hopwood
Armin Rigo wrote:
> Hi,
> 
> There is an oversight in the design of __index__() that only just
> surfaced :-(  It is responsible for the following behavior, on a 32-bit
> machine with >= 2GB of RAM:
> 
> >>> s = 'x' * (2**100)   # works!
> >>> len(s)
> 2147483647
> 
> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
> order to call v->sq_repeat.  However, __index__ is defined to clip the
> result to fit in a Py_ssize_t.

Clipping the result sounds like it would *never* be a good idea. What was
the rationale for that? It should throw an exception.

-- 
David Hopwood <[EMAIL PROTECTED]>


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Michael Hudson
David Hopwood <[EMAIL PROTECTED]> writes:

> Armin Rigo wrote:
>> Hi,
>> 
>> There is an oversight in the design of __index__() that only just
>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>> machine with >= 2GB of RAM:
>> 
>> >>> s = 'x' * (2**100)   # works!
>> >>> len(s)
>> 2147483647
>> 
>> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>> result to fit in a Py_ssize_t.
>
> Clipping the result sounds like it would *never* be a good idea. What was
> the rationale for that? It should throw an exception.

Why would you expect range(10)[:2**32-1] and range(10)[:2**32] to do
different things?

Cheers,
mwh

-- 
  This makes it possible to pass complex object hierarchies to
  a C coder who thinks computer science has made no worthwhile
  advancements since the invention of the pointer.
   -- Gordon McMillan, 30 Jul 1998
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Nick Coghlan
David Hopwood wrote:
> Armin Rigo wrote:
>> Hi,
>>
>> There is an oversight in the design of __index__() that only just
>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>> machine with >= 2GB of RAM:
>>
>> >>> s = 'x' * (2**100)   # works!
>> >>> len(s)
>> 2147483647
>>
>> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>> result to fit in a Py_ssize_t.
> 
> Clipping the result sounds like it would *never* be a good idea. What was
> the rationale for that? It should throw an exception.

A simple demonstration of the clipping behaviour that works on machines with 
limited memory:

 >>> (2**100).__index__()
2147483647
 >>> (-2**100).__index__()
-2147483648

PEP 357 doesn't even mention the issue, and the comment on long_index in the 
code doesn't give a rationale - it just notes that the function clips the 
result.

Neither the PyNumber_AsIndex nor the __index__ documentation mention anything 
about the possibility of clipping, and there's no test case to verify this 
behaviour.

I'm inclined to call it a bug, too, but I've cc'ed Travis to see if he can 
shed some light on the question - the implementation of long_index explicitly 
suppresses the overflow error generated by _long_as_ssize_t, so the current 
behaviour appears to be deliberate.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Nick Coghlan
Michael Hudson wrote:
> David Hopwood <[EMAIL PROTECTED]> writes:
> 
>> Armin Rigo wrote:
>>> Hi,
>>>
>>> There is an oversight in the design of __index__() that only just
>>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>>> machine with >= 2GB of RAM:
>>>
>>> >>> s = 'x' * (2**100)   # works!
>>> >>> len(s)
>>> 2147483647
>>>
>>> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
>>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>>> result to fit in a Py_ssize_t.
>> Clipping the result sounds like it would *never* be a good idea. What was
>> the rationale for that? It should throw an exception.
> 
> Why would you expect range(10)[:2**32-1] and range(10)[:2**32] to do
> different things?

In that case, I believe it is the slice object that should be suppressing the 
overflow error (via PyErr_Occurred and PyErr_Matches) when calculating the 
indices for a given length, rather than having silent clipping be part of the 
basic implementation of long.__index__().

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Tim Peters
[Armin Rigo]
> There is an oversight in the design of __index__() that only just
> surfaced :-(  It is responsible for the following behavior, on a 32-bit
> machine with >= 2GB of RAM:
>
> >>> s = 'x' * (2**100)   # works!
> >>> len(s)
> 2147483647
>
> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
> order to call v->sq_repeat.

?  I don't see an invocation of __index__ or nb_index in
PySequence_Repeat.  To the contrary, its /incoming/ `count` argument
is constrained to Py_ssize_t from the start:

PyObject * PySequence_Repeat(PyObject *o, Py_ssize_t count)

... OK, I think you mean sequence_repeat() in abstract.c.  That does
invoke nb_index.  But, as below, I don't think it should in this case.

> However, __index__ is defined to clip the result to fit in a Py_ssize_t.
>  This means that the above problem exists
> with all sequences, not just strings, given enough RAM to create such
> sequences with 2147483647 items.
>
> For reference, in 2.4 we correctly get an OverflowError.
>
> Argh!  What should be done about it?

IMO, this is plain wrong.  PEP 357 isn't entirely clear, but it is
clear the author only had /slicing/ in mind (where clipping makes
sense -- and which makes `__index__` a misleading name).  Guido
pointed out the ambiguity here:

http://mail.python.org/pipermail/python-dev/2006-February/060624.html

There's also an ambiguity when using simple indexing. When writing
x[i] where x is a sequence and i an object that isn't int or long but
implements __index__, I think i.__index__() should be used rather than
bailing out.  I suspect that you didn't think of this because you've
already special-cased this in your code -- when a non-integer is
passed, the mapping API is used (mp_subscript). This is done to
suppose extended slicing. The built-in sequences (list, str, unicode,
tuple for sure, probably more) that implement mp_subscript should
probe for nb_index before giving up. The generic code in
PyObject_GetItem should also check for nb_index before giving up.

So, e.g., plain a[i] shouldn't use __index__ either if i is already
int or long.  I don't see any justification for invoking nb_index in
sequence_repeat(), although if someone thinks it should, then as for
plain indexing it certainly shouldn't invoke nb_index if the incoming
count is an int or long to begin with.

Ah, fudge.  Contrary to Guido's advice above, I see that
PyObject_GetItem() /also/ unconditionally invokes nb_index (even when
the incoming key is already int or long).  It shouldn't do that either
(according to me).

OTOH, in the long discussion about PEP 357, I'm not sure anyone except
Travis was clear on whether nb_index was meant to apply only to
sequence /slicing/ or was meant to apply "everywhere an object gets
used in an index-like context".  Clipping makes sense only for the
former, but it looks like the implementation treats it more like the
latter.  This was probably exacerbated by:

http://mail.python.org/pipermail/python-dev/2006-February/060663.html

[Travis]
There are other places in Python that check specifically for int objects
and long integer objects and fail with anything else.  Perhaps all of
these should aslo call the __index__ slot.

[Guido]
Right, absolutely.

This is a mess :-)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Guido van Rossum
Argh.

I also find it a bug.

I also feel responsible because I reviewed the patch. :-(

In my recollection I tried to avoid this exact behavior. I wanted
__index__() to just return the unclipped int or long value, but have a
C API that clipped it for use in slice operations. It looks like I
failed (the patch went through so many revisions that at some point I
must've stopped caring).

--Guido

On 7/28/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> David Hopwood wrote:
> > Armin Rigo wrote:
> >> Hi,
> >>
> >> There is an oversight in the design of __index__() that only just
> >> surfaced :-(  It is responsible for the following behavior, on a 32-bit
> >> machine with >= 2GB of RAM:
> >>
> >> >>> s = 'x' * (2**100)   # works!
> >> >>> len(s)
> >> 2147483647
> >>
> >> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
> >> order to call v->sq_repeat.  However, __index__ is defined to clip the
> >> result to fit in a Py_ssize_t.
> >
> > Clipping the result sounds like it would *never* be a good idea. What was
> > the rationale for that? It should throw an exception.
>
> A simple demonstration of the clipping behaviour that works on machines with
> limited memory:
>
>  >>> (2**100).__index__()
> 2147483647
>  >>> (-2**100).__index__()
> -2147483648
>
> PEP 357 doesn't even mention the issue, and the comment on long_index in the
> code doesn't give a rationale - it just notes that the function clips the 
> result.
>
> Neither the PyNumber_AsIndex nor the __index__ documentation mention anything
> about the possibility of clipping, and there's no test case to verify this
> behaviour.
>
> I'm inclined to call it a bug, too, but I've cc'ed Travis to see if he can
> shed some light on the question - the implementation of long_index explicitly
> suppresses the overflow error generated by _long_as_ssize_t, so the current
> behaviour appears to be deliberate.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
> ---
>  http://www.boredomandlaziness.org
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Nick Coghlan
Tim Peters wrote:
> OTOH, in the long discussion about PEP 357, I'm not sure anyone except
> Travis was clear on whether nb_index was meant to apply only to
> sequence /slicing/ or was meant to apply "everywhere an object gets
> used in an index-like context".  Clipping makes sense only for the
> former, but it looks like the implementation treats it more like the
> latter.  This was probably exacerbated by:
> 
> http://mail.python.org/pipermail/python-dev/2006-February/060663.html
> 
> [Travis]
> There are other places in Python that check specifically for int objects
> and long integer objects and fail with anything else.  Perhaps all of
> these should aslo call the __index__ slot.
> 
> [Guido]
> Right, absolutely.
> 
> This is a mess :-)

I've been trawling through the code a bit, and I don't think it's as bad as 
all that.

All I believe is really needed is to:
  - remove the PyErr_Occurred() check and its body from long_index in 
longobject.c
  - add a PyErr_Occurred() check to force a -1 return from PyNumber_Index in 
abstract.c
  - add a PyErr_Occurred() and PyErr_ExceptionMatches(PyOverflowError) check 
to invoke PyErr_Clear() in _PyEval_SliceIndex in ceval.c.

Add test cases to test_index.py to check that:

   (2**100).__index__() == 2**100
   (-2**100).__index__() == -2**100
   slice(-2**100, 2**100).indices(sys.maxint) == (0, sys.maxint, 1)
   "a" * 2**100 raises OverflowError

Add test cases to test_operator.py to check that:

   operator.index(2**100) == 2**100
   operator.index(-2**100) == -2**100

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Nick Coghlan
Nick Coghlan wrote:
> Tim Peters wrote:
>> OTOH, in the long discussion about PEP 357, I'm not sure anyone except
>> Travis was clear on whether nb_index was meant to apply only to
>> sequence /slicing/ or was meant to apply "everywhere an object gets
>> used in an index-like context".  Clipping makes sense only for the
>> former, but it looks like the implementation treats it more like the
>> latter.  This was probably exacerbated by:
>>
>> http://mail.python.org/pipermail/python-dev/2006-February/060663.html
>>
>> [Travis]
>> There are other places in Python that check specifically for int objects
>> and long integer objects and fail with anything else.  Perhaps all of
>> these should aslo call the __index__ slot.
>>
>> [Guido]
>> Right, absolutely.
>>
>> This is a mess :-)
> 
> I've been trawling through the code a bit, and I don't think it's as bad as 
> all that.

Damn, it really is a mess. . . nb_index returns the Pyssize_t directly, and a 
whole heap of the code expects errors to be signalled via returning -1 before 
checking PyErr_Occurred().

To get it to work without clipping everywhere, wrap_lenfunc (typeobject.c), 
_PyEval_SliceIndex (ceval.c), PyNumber_Index (abstract.c) and sequence_repeat 
(abstract.c) all had to be modified to recognize PY_SSIZE_T_MIN and 
PY_SSIZE_T_MAX as potential error flags (in order to clear the overflow error 
for _PyEval_SliceIndex, and in order to propagate the exception for the other 
three).

And using this approach still means that (2**100).__index__() raises an 
OverflowError.

It would probably be cleaner to change the signature of nb_index to return a 
PyObject *, and let the code that uses it worry about how (or even whether!) 
to convert PyLong results to a Py_ssize_t.

Cheers,
Nick.


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Guido van Rossum
On 7/28/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Nick Coghlan wrote:
> > Tim Peters wrote:
> >> OTOH, in the long discussion about PEP 357, I'm not sure anyone except
> >> Travis was clear on whether nb_index was meant to apply only to
> >> sequence /slicing/ or was meant to apply "everywhere an object gets
> >> used in an index-like context".  Clipping makes sense only for the
> >> former, but it looks like the implementation treats it more like the
> >> latter.  This was probably exacerbated by:
> >>
> >> http://mail.python.org/pipermail/python-dev/2006-February/060663.html
> >>
> >> [Travis]
> >> There are other places in Python that check specifically for int 
> >> objects
> >> and long integer objects and fail with anything else.  Perhaps all of
> >> these should aslo call the __index__ slot.
> >>
> >> [Guido]
> >> Right, absolutely.
> >>
> >> This is a mess :-)
> >
> > I've been trawling through the code a bit, and I don't think it's as bad as
> > all that.
>
> Damn, it really is a mess. . . nb_index returns the Pyssize_t directly, and a
> whole heap of the code expects errors to be signalled via returning -1 before
> checking PyErr_Occurred().
>
> To get it to work without clipping everywhere, wrap_lenfunc (typeobject.c),
> _PyEval_SliceIndex (ceval.c), PyNumber_Index (abstract.c) and sequence_repeat
> (abstract.c) all had to be modified to recognize PY_SSIZE_T_MIN and
> PY_SSIZE_T_MAX as potential error flags (in order to clear the overflow error
> for _PyEval_SliceIndex, and in order to propagate the exception for the other
> three).
>
> And using this approach still means that (2**100).__index__() raises an
> OverflowError.
>
> It would probably be cleaner to change the signature of nb_index to return a
> PyObject *, and let the code that uses it worry about how (or even whether!)
> to convert PyLong results to a Py_ssize_t.

No time to look through the code here, but IMO it's acceptable (at
least for 2.5) if (2**100).__index__() raises OverflowError, as long
as x[:2**100] silently clips. __index__() is primarily meant to return
a value useful for indexing concrete sequences, and 2**100 isn't.

Certainly the exception is preferrable to the silently truncated
result currently returned.

Fortunately there's some extra time since we're now going to do a third beta.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Tim Peters
[Tim]
>>> ...
>>> This is a mess :-)

[Nick Coghlan]
>> I've been trawling through the code a bit, and I don't think it's as bad as
>> all that.

[also Nick, but older & wiser ;-)]
> Damn, it really is a mess. . . nb_index returns the Pyssize_t directly,

Bingo.  It's a /conceptual/ mess.  Best I can make out, Travis only
cared about sequence slicing (not indexing), and then the machinery
got hijacked to become a more general "can you faithfully act like an
integer?" thing -- but kept a signature that made sense only for the
original slicing use (where clipping is fine).

> and a whole heap of the code expects errors to be signalled via returning -1 
> before
> checking PyErr_Occurred().
>
> To get it to work without clipping everywhere, wrap_lenfunc (typeobject.c),
> _PyEval_SliceIndex (ceval.c), PyNumber_Index (abstract.c) and sequence_repeat
> (abstract.c) all had to be modified to recognize PY_SSIZE_T_MIN and
> PY_SSIZE_T_MAX as potential error flags (in order to clear the overflow error
> for _PyEval_SliceIndex, and in order to propagate the exception for the other
> three).
>
> And using this approach still means that (2**100).__index__() raises an
> OverflowError.
>
> It would probably be cleaner to change the signature of nb_index to return a
> PyObject *,

Given that the more-general use is what everyone else either wanted,
or simply /assumed/, in the original discussions, I expect it would
be, although with the understanding that the PyObject * returned must
be NULL (in case of error), or a Python int or long.

> and let the code that uses it worry about how (or even whether!)
> to convert PyLong results to a Py_ssize_t.

A utility function or two could help, like one that converted to
Py_ssize_t with clipping, and another that did the same but raised
OverflowError if Py_ssize_t isn't big enough (and in the latter case a
caller would do the usual business of checking for a -1 return and
PyErr_Occurred())..
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Travis Oliphant
Nick Coghlan wrote:
> David Hopwood wrote:
>> Armin Rigo wrote:
>>> Hi,
>>>
>>> There is an oversight in the design of __index__() that only just
>>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>>> machine with >= 2GB of RAM:
>>>
>>> >>> s = 'x' * (2**100)   # works!
>>> >>> len(s)
>>> 2147483647
>>>
>>> This is because PySequence_Repeat(v, w) works by applying 
>>> w.__index__ in
>>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>>> result to fit in a Py_ssize_t.
>>
>> Clipping the result sounds like it would *never* be a good idea. What 
>> was
>> the rationale for that? It should throw an exception.
>
> A simple demonstration of the clipping behaviour that works on 
> machines with limited memory:
>
> >>> (2**100).__index__()
> 2147483647
> >>> (-2**100).__index__()
> -2147483648
>
> PEP 357 doesn't even mention the issue, and the comment on long_index 
> in the code doesn't give a rationale - it just notes that the function 
> clips the result.
I can't think of the rationale so it was probably an unclear one and 
should be thought of as a bug.  The fact that it isn't discussed in the 
PEP means it wasn't thought about clearly.  I think I had the vague idea 
that .__index_() should always succeed.  But, this shows a problem with 
that notion.
>
> I'm inclined to call it a bug, too, but I've cc'ed Travis to see if he 
> can shed some light on the question - the implementation of long_index 
> explicitly suppresses the overflow error generated by 
> _long_as_ssize_t, so the current behaviour appears to be deliberate.

If it was deliberate, it was a hurried decision and one that should be 
re-thought and probably changed.  I think the idea came from the fact 
that out-of-bounds slicing returns empty lists and since __index__ was 
primarily developed to allow integer-like objects to be used in slicing 
it adopted that behavior.  In fact it looks like the comment above 
_long_index contains words from the comment above _PyEval_SliceIndex 
showing the direct borrowing of the idea.   But, _long_index is clearly 
the wrong place to handle the situation since it is used by more than 
just the slicing code.  An error return is already handled by the 
_Eval_SliceIndex code anyway.

I say it's a bug that should be fixed.  Don't clear the error, raise it.


-Travis

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Martin v. Löwis
Travis Oliphant wrote:
> I say it's a bug that should be fixed.  Don't clear the error, raise it.

Several people have said this, but I don't think it can work.

If you raise an OverflowError in __index__, the slicing code cannot know
whether this meant as overflow or underflow (in a signed sense).

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread David Hopwood
Martin v. Löwis wrote:
> Travis Oliphant wrote:
> 
>>I say it's a bug that should be fixed.  Don't clear the error, raise it.
> 
> Several people have said this, but I don't think it can work.
> 
> If you raise an OverflowError in __index__, the slicing code cannot know
> whether this meant as overflow or underflow (in a signed sense).

Why not use IndexError for an underflow, and OverflowError for an overflow?

-- 
David Hopwood <[EMAIL PROTECTED]>



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Greg Ewing
Armin Rigo wrote:

> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
> order to call v->sq_repeat.

Why does it do that? Shouldn't __index__ only be used for
numbers which are going to be used as an index?

> However, __index__ is defined to clip the
> result to fit in a Py_ssize_t.

Why is it defined to do this instead of raising OverflowError?

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Greg Ewing
Tim Peters wrote:

> So, e.g., plain a[i] shouldn't use __index__ either if i is already
> int or long.  I don't see any justification for invoking nb_index in
> sequence_repeat(), although if someone thinks it should, then as for
> plain indexing it certainly shouldn't invoke nb_index if the incoming
> count is an int or long to begin with.

Hmmm. So that means anything accepting an integer index
needs to do its own range checking anyway. So having
__index__ do clipping is at best unnecessary and at worst
counterproductive, since it could suppress an overflow
or range exception that *should* be produced by the code
using the index, and would be if it got the equivalent
index value as an int or long directly.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Greg Ewing
Guido van Rossum wrote:

> In my recollection I tried to avoid this exact behavior. I wanted
> __index__() to just return the unclipped int or long value, but have a
> C API that clipped it for use in slice operations.

So is there still a chance to fix it?

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-28 Thread Nick Coghlan
Martin v. Löwis wrote:
> Travis Oliphant wrote:
>> I say it's a bug that should be fixed.  Don't clear the error, raise it.
> 
> Several people have said this, but I don't think it can work.
> 
> If you raise an OverflowError in __index__, the slicing code cannot know
> whether this meant as overflow or underflow (in a signed sense).

It can actually, but you have to allow 3 possible error return values from 
nb_index (-1, PY_SSIZE_T_MIN and PY_SSIZE_T_MAX).

This is ugly as hell [1], so I'm going to try to work out a patch that changes 
the signature of nb_index to return PyObject * (leaving the client code in the 
core to decide whether to clip, overflow, or leave the result alone).

Cheers,
Nick.

[1] http://www.python.org/sf/1530738

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-29 Thread Armin Rigo
Hi Guido,

On Fri, Jul 28, 2006 at 11:31:09AM -0700, Guido van Rossum wrote:
> No time to look through the code here, but IMO it's acceptable (at
> least for 2.5) if (2**100).__index__() raises OverflowError, as long
> as x[:2**100] silently clips. __index__() is primarily meant to return
> a value useful for indexing concrete sequences, and 2**100 isn't.

If nb_index keeps returning a Py_ssize_t with clipping, it means that
there is no way to write in pure Python an object that emulates a long
-- only an int.  Sounds inconsistent with the int/long unification trend
for pure Python code.  It would make it awkward to write, say, pure
Python classes that pretend to be very large sequences, because using
__index__ in such code wouldn't work.

Another example of this is that if places like sequence_repeat are made
to use the following pseudo-logic:

if isinstance(w, long) and w > sys.maxint:
raise OverflowError
else:
i = w.__index__()

then if an object 'l' is an emulated pseudo-long, then  "x"*l  will
still silently clip the pseudo-long to sys.maxint.

I'm more in favor of changing nb_index to return a PyObject *, since now
is our last chance to do so.  A pair of API functions can be added to
return a Py_ssize_t with either the proper clipping, or the proper
OverflowError'ing.


A bientot,

Armin.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-29 Thread Nick Coghlan
Armin Rigo wrote:
> Hi,
> 
> There is an oversight in the design of __index__() that only just
> surfaced :-(  It is responsible for the following behavior, on a 32-bit
> machine with >= 2GB of RAM:
> 
> >>> s = 'x' * (2**100)   # works!
> >>> len(s)
> 2147483647
> 
> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
> order to call v->sq_repeat.  However, __index__ is defined to clip the
> result to fit in a Py_ssize_t.  This means that the above problem exists
> with all sequences, not just strings, given enough RAM to create such
> sequences with 2147483647 items.
> 
> For reference, in 2.4 we correctly get an OverflowError.
> 
> Argh!  What should be done about it?

I've now got a patch on SF that aims to fix this properly [1].

The gist of the patch:

1. Redesign the PyNumber_Index C API to serve the actual use cases in the 
interpreter core and the standard library.

   The PEP 357 abstract C API as written was bypassed by nearly all of the 
uses in the core and the standard library. The patch redesigns that API to 
reduce code duplication between the various parts of the code base that were 
previously calling nb_index directly.

   The principal change is to provide an "is_index" output variable that the 
various mp_subscript implementations can use to determine whether or not the 
passed in object was an index or not, rather than having to repeat the type 
check everywhere. The rationale for doing it this way:
   a. you may want to try something else (e.g. the mp_subscript 
implementations in the standard library try indexing before checking to see if 
the object is a slice object)
   b. a different error message may be wanted (e.g. the normal indexing 
related Type Error doesn't make sense for sequence repetition)
   c. a separate checking function would lead to repeating the check on common 
code paths (e.g. if an mp_subscript implementation did the type check first, 
and then PyNumber_Check did it again to see whether or not to raise an error)

   The output variable solves the problem nicely: either pass in NULL to get 
the default behaviour of raising a sequence indexing TypeError, or pass in a 
pointer to a C int in order to be told whether or not the typecheck succeeded 
without an exception actually being set if it fails (if the typecheck passes, 
but the actual call fails, the exception state is set as normal).

   Additionally, PyNumber_Index is redefined to raise an IndexError for values 
that cannot be represented as a Py_ssize_t. The choice of IndexError was made 
based on the dominant usage in the standard library (IndexError is the correct 
error to raise so that an mp_subscript implementation does the right thing). 
There are only a few places that need to override the IndexError to replace it 
with OverflowError (the length argument to slice.indices, sequence repetition, 
the mmap constructor), whereas all of the sequence objects (list, tuple, 
string, unicode, array), as well as PyObject_Get/Set/DelItem, need it to raise 
IndexError.

   Raising IndexError also benefits sequences implemented in Python, which can 
simply do:

   def __getitem__(self, idx):
  if isinstance(idx, slice):
  return self._get_slice(idx)
  idx = operator.index(idx) # Will raise IndexError on overflow

   A second API function PyNumber_SliceIndex is added so that the clipping 
semantics are still available where needed and _PyEval_SliceIndex is modified 
to use the new public API. This is exposed to Python code as 
operator.sliceindex().

   With the redesigned C API, the *only* code that calls the nb_index slot 
directly is the two functions in abstract.c. Everything else uses one or the 
other of those interfaces. Code duplication was significantly reduced as a 
result, and it should be much easier for 3rd party C libraries to do what they 
need to do (i.e. implementing mp_subscript slots).

2. Redefine nb_index to return a PyObject *

   Returning the PyInt/PyLong objects directly from nb_index greatly 
simplified the implementation of the nb_index methods for the affected 
classes. For classic classes, instance_index could be modified to simply 
return the result of calling __index__, as could slot_nb_index for new-style 
classes. For the standard library classes, the existing int_int method, and 
the long_long method could be used instead of needing new functions.

   This convenience should hold true for extension classes - instead of 
needing to implement __index__ separately, they should be able to reuse their 
existing __int__ or __long__ implementations.

   The other benefit is that the logic to downconvert to Py_ssize_t that was 
formerly invoked by long's __index__ method is now instead invoked by 
PyNumber_Index and PyNumber_SliceIndex. This means that directly calling an 
__index__() method allows large long results to be passed through unaffected, 
but calling the indexing operator will raise IndexError if the long is outside 
the memo

Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-29 Thread Nick Coghlan
Nick Coghlan wrote:
>   The other benefit is that the logic to downconvert to Py_ssize_t that 
> was formerly invoked by long's __index__ method is now instead invoked 
> by PyNumber_Index and PyNumber_SliceIndex. This means that directly 
> calling an __index__() method allows large long results to be passed 
> through unaffected, but calling the indexing operator will raise 
> IndexError if the long is outside the memory address space:
> 
>   (2 ** 100).__index__() == (2**100)  # This works
>   operator.index(2**100)  # This raises IndexError
> 
> The patch includes additions to test_index.py to cover these limit 
> cases, as well as the necessary updates to the C API and operator module 
> documentation.

I forgot to mention the main benefit of this: when working with a 
pseudo-sequence rather than a concrete one, __index__() can be used directly 
to ensure you are working with integral data types while still allowing access 
to the full range of representable integer values.

operator.index is available for when what you have really is a concrete data 
set that is limited to the memory capacity of a single machine, and 
operator.sliceindex for when you want to clamp at the memory address space 
limits rather than raising an exception.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-29 Thread Nick Coghlan
Nick Coghlan wrote:
> Armin Rigo wrote:
>> Hi,
>>
>> There is an oversight in the design of __index__() that only just
>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>> machine with >= 2GB of RAM:
>>
>> >>> s = 'x' * (2**100)   # works!
>> >>> len(s)
>> 2147483647
>>
>> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>> result to fit in a Py_ssize_t.  This means that the above problem exists
>> with all sequences, not just strings, given enough RAM to create such
>> sequences with 2147483647 items.
>>
>> For reference, in 2.4 we correctly get an OverflowError.
>>
>> Argh!  What should be done about it?
> 
> I've now got a patch on SF that aims to fix this properly [1].

I revised this patch to further reduce the code duplication associated with 
the indexing code in the standard library.

The patch now has three new functions in the abstract C API:

   PyNumber_Index (used in a dozen or so places)
 - raises IndexError on overflow
   PyNumber_AsSsize_t (used in 3 places)
 - raises OverflowError on overflow
   PyNumber_AsClippedSsize_t() (used once, by _PyEval_SliceIndex)
 - clips to PY_SSIZE_T_MIN/MAX on overflow

All 3 have an int * output argument allowing type errors to be flagged 
directly to the caller rather than through PyErr_Occurred().

Of the 3, only PyNumber_Index is exposed through the operator module.

Probably the most interesting thing now would be for Travis to review it, and 
see whether it makes things easier to handle for the Numeric scalar types 
(given the amount of code the patch deleted from the builtin and standard 
library data types, hopefully the benefits to Numeric will be comparable).

Cheers,
Nick.

[1] http://www.python.org/sf/1530738


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-08-01 Thread Nick Coghlan
Travis Oliphant wrote:
>> Probably the most interesting thing now would be for Travis to review 
>> it, and see whether it makes things easier to handle for the Numeric 
>> scalar types (given the amount of code the patch deleted from the 
>> builtin and standard library data types, hopefully the benefits to 
>> Numeric will be comparable).
> 
> 
> I noticed most of the checks for PyInt where removed in the patch.  If I 
> remember correctly, I left these in for "optimization."   Other than 
> that, I think the patch is great.

You're right - there was a fast path based on PyInt_Check in 
_PyEval_SliceIndex that got lost, which I'll add back in. I'll also add fast 
paths for PyInt_Check to the functions in abstract.c, too.

The other PyInt_Check's (in slot_nb_index and instance_index) were there to 
check that __index__ returned the right thing. The check was still there in 
slot_nb_index, but I'd incorrectly removed it from instance_index. I'll add 
that one back in, too.

Once that's done, I'll update the tracker item and reassign to Tim for a review.

Cheers,
Nick.

> As far as helping with NumPy,  I think it will help to be able to remove 
> special-checks for all the different integer-types.  But, this has not 
> yet been done in the NumPy code.
> 
> -Travis
> 
> 
> 


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-08-01 Thread Nick Coghlan
Nick Coghlan wrote:
> Travis Oliphant wrote:
>>> Probably the most interesting thing now would be for Travis to review 
>>> it, and see whether it makes things easier to handle for the Numeric 
>>> scalar types (given the amount of code the patch deleted from the 
>>> builtin and standard library data types, hopefully the benefits to 
>>> Numeric will be comparable).
>>
>> I noticed most of the checks for PyInt where removed in the patch.  If I 
>> remember correctly, I left these in for "optimization."   Other than 
>> that, I think the patch is great.
> 
> You're right - there was a fast path based on PyInt_Check in 
> _PyEval_SliceIndex that got lost, which I'll add back in. I'll also add fast 
> paths for PyInt_Check to the functions in abstract.c, too.
> 
> The other PyInt_Check's (in slot_nb_index and instance_index) were there to 
> check that __index__ returned the right thing. The check was still there in 
> slot_nb_index, but I'd incorrectly removed it from instance_index. I'll add 
> that one back in, too.
> 
> Once that's done, I'll update the tracker item and reassign to Tim for a 
> review.

Aside from the 'assign to Tim' part, this is done now (pep357-fix-v3 on the 
tracker item). I also tweaked the clipping version of the C API to optionally 
expose the overflow flag that abstract.c was using internally so the client 
code can find out whether or not clipping actually occurred.

The patch also updates the index unit tests to:
   - cover the various type errors that Travis picked up
   - actually run the full set of unit tests under -O
   - only run the basic sequence independent tests once (instead of once per 
sequence type)

Neal also had a number of questions that I responded to in the tracker comments.

The most significant question related to the API signature, which Neal 
considered potentially confusing:
   PyNumber_Index(PyObject *item, int *is_index)

I added the is_index flag to the function signatures instead of providing a 
separate PyNumber_IndexCheck function in order to avoid doing the same type 
check twice in the typical mp_subscript implementation cases.

One possibility would be to invert the sense of that flag and call it 
"typeerror", which probably more accurately reflects what it's intended for - 
it's a way of telling the function "if this object does not have the correct 
type, tell me by setting this flag instead of by setting the Python error 
state".

Regards,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-08-02 Thread Guido van Rossum
All, please decide without me.

On 8/1/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Nick Coghlan wrote:
> > Travis Oliphant wrote:
> >>> Probably the most interesting thing now would be for Travis to review
> >>> it, and see whether it makes things easier to handle for the Numeric
> >>> scalar types (given the amount of code the patch deleted from the
> >>> builtin and standard library data types, hopefully the benefits to
> >>> Numeric will be comparable).
> >>
> >> I noticed most of the checks for PyInt where removed in the patch.  If I
> >> remember correctly, I left these in for "optimization."   Other than
> >> that, I think the patch is great.
> >
> > You're right - there was a fast path based on PyInt_Check in
> > _PyEval_SliceIndex that got lost, which I'll add back in. I'll also add
> fast
> > paths for PyInt_Check to the functions in abstract.c, too.
> >
> > The other PyInt_Check's (in slot_nb_index and instance_index) were there
> to
> > check that __index__ returned the right thing. The check was still there
> in
> > slot_nb_index, but I'd incorrectly removed it from instance_index. I'll
> add
> > that one back in, too.
> >
> > Once that's done, I'll update the tracker item and reassign to Tim for a
> review.
>
> Aside from the 'assign to Tim' part, this is done now (pep357-fix-v3 on the
> tracker item). I also tweaked the clipping version of the C API to
> optionally
> expose the overflow flag that abstract.c was using internally so the client
> code can find out whether or not clipping actually occurred.
>
> The patch also updates the index unit tests to:
>- cover the various type errors that Travis picked up
>- actually run the full set of unit tests under -O
>- only run the basic sequence independent tests once (instead of once per
> sequence type)
>
> Neal also had a number of questions that I responded to in the tracker
> comments.
>
> The most significant question related to the API signature, which Neal
> considered potentially confusing:
>PyNumber_Index(PyObject *item, int *is_index)
>
> I added the is_index flag to the function signatures instead of providing a
> separate PyNumber_IndexCheck function in order to avoid doing the same type
> check twice in the typical mp_subscript implementation cases.
>
> One possibility would be to invert the sense of that flag and call it
> "typeerror", which probably more accurately reflects what it's intended for
> -
> it's a way of telling the function "if this object does not have the correct
> type, tell me by setting this flag instead of by setting the Python error
> state".
>
> Regards,
> Nick.
>
> --
> Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
> ---
>  http://www.boredomandlaziness.org
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-08-02 Thread Travis Oliphant
Nick Coghlan wrote:
> 
> 
> One possibility would be to invert the sense of that flag and call it 
> "typeerror", which probably more accurately reflects what it's intended for - 
> it's a way of telling the function "if this object does not have the correct 
> type, tell me by setting this flag instead of by setting the Python error 
> state".

+1 on changing the name (type_error might be a better spelling)

-Travis

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-08-03 Thread Travis Oliphant
Nick Coghlan wrote:
> Nick Coghlan wrote:
>> Armin Rigo wrote:
>>> Hi,
>>>
>>> There is an oversight in the design of __index__() that only just
>>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>>> machine with >= 2GB of RAM:
>>>
>>> >>> s = 'x' * (2**100)   # works!
>>> >>> len(s)
>>> 2147483647
>>>
>>> This is because PySequence_Repeat(v, w) works by applying 
>>> w.__index__ in
>>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>>> result to fit in a Py_ssize_t.  This means that the above problem 
>>> exists
>>> with all sequences, not just strings, given enough RAM to create such
>>> sequences with 2147483647 items.
>>>
>>> For reference, in 2.4 we correctly get an OverflowError.
>>>
>>> Argh!  What should be done about it?
>>
>> I've now got a patch on SF that aims to fix this properly [1].
>
> I revised this patch to further reduce the code duplication associated 
> with the indexing code in the standard library.
>
> The patch now has three new functions in the abstract C API:
>
>   PyNumber_Index (used in a dozen or so places)
> - raises IndexError on overflow
>   PyNumber_AsSsize_t (used in 3 places)
> - raises OverflowError on overflow
>   PyNumber_AsClippedSsize_t() (used once, by _PyEval_SliceIndex)
> - clips to PY_SSIZE_T_MIN/MAX on overflow
>
> All 3 have an int * output argument allowing type errors to be flagged 
> directly to the caller rather than through PyErr_Occurred().
>
> Of the 3, only PyNumber_Index is exposed through the operator module.
>
> Probably the most interesting thing now would be for Travis to review 
> it, and see whether it makes things easier to handle for the Numeric 
> scalar types (given the amount of code the patch deleted from the 
> builtin and standard library data types, hopefully the benefits to 
> Numeric will be comparable).


I noticed most of the checks for PyInt where removed in the patch.  If I 
remember correctly, I left these in for "optimization."   Other than 
that, I think the patch is great.

As far as helping with NumPy,  I think it will help to be able to remove 
special-checks for all the different integer-types.  But, this has not 
yet been done in the NumPy code.

-Travis


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com