Re: [Python-Dev] index clipping

Nick Coghlan Thu, 10 Aug 2006 07:18:04 -0700

Guido van Rossum wrote:
>> It seems like Nick's recent patches solved the problems that were
>> identified.
> 
> Nick, can you summarize how your patches differ from my proposal?


nb_index and __index__ are essentially as you proposed. To make an 
object implemented in C usable as an index, you would take either the nb_int 
slot or the nb_long slot and put the same function pointer into the nb_index 
slot. For a Python object, you would write either '__index__ = __int__' or 
'__index__ = __long__' as part of the class definition.
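As a minimal sketch of the Python-level idiom, here it is in modern Python 3, where the int/long split no longer exists and only '__index__ = __int__' applies. The class name `Handle` is hypothetical:

```python
class Handle:
    """Hypothetical integer-like wrapper usable as a sequence index."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value

    # Reuse the integer conversion as the index conversion.
    __index__ = __int__


items = ["a", "b", "c", "d"]
print(items[Handle(2)])             # c
print(items[Handle(1):Handle(3)])   # ['b', 'c']
```

The same object works both as a plain index and as a slice endpoint, because both paths go through __index__.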

operator.index is provided to support writing __getitem__, __setitem__ and 
__delitem__ methods - it raises IndexError on overflow so you don't have to 
catch and reraise to convert an OverflowError to an IndexError.
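A sketch of a Python-level __getitem__ built on operator.index follows. One assumption to flag: in released Python 3, operator.index returns an arbitrary-precision int and does not itself raise IndexError on overflow, so in this hypothetical `Squares` class it is the explicit bounds check that turns an oversized key into IndexError:

```python
import operator


class Squares:
    """Hypothetical fixed-length sequence of squares 0, 1, 4, ..."""

    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, key):
        # Accept ints and any object implementing __index__;
        # anything else makes operator.index raise TypeError.
        i = operator.index(key)
        if i < 0:
            i += self.length
        # Out-of-range indices (including huge ones) become IndexError.
        if not 0 <= i < self.length:
            raise IndexError("Squares index out of range")
        return i * i


sq = Squares(5)
print(sq[3])    # 9
print(sq[-1])   # 16
```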

On the C API side, the 3 functions you suggest are all present (although the 
version returning a Python object is accessed via PyObject_CallMethod), and 
there's a 4th variant that raises IndexError instead of OverflowError (this 
version is convenient when writing mp_subscript and mp_ass_subscript functions).

Avoiding Py_ssize_t -> PyInt -> Py_ssize_t conversions for all integer types 
implemented in C would be nice, but I don't think it's practical (the latest 
version of the patch does at least avoid it for the builtin integer types).

Cheers,
Nick.



P.S. Here's the detailed rationale for the form the patch has evolved to [1]:

In addition to allowing (2**100).__index__() == 2**100, having nb_index return 
a Python object resulted in a decent reduction in code duplication - 
previously the coercion logic to get a Python integer or long value down to a 
Py_ssize_t was present in 3 places (long_index, instance_index, 
slot_nb_index), and would also have needed to be duplicated by any other 
C-implemented index type whose value could exceed the range of a Py_ssize_t. 
With the patch, that logic appears only inside abstract.c and extension types 
can just return a PyLong value and let the interpreter figure out how to 
handle overflow. The biggest benefit of this approach is that a single slot 
(nb_index) can be used to implement four different overflow behaviours in the 
core (return PyLong, raise OverflowError, raise IndexError, clip to 
Py_ssize_t), as well as providing a hook to allow extension module authors to 
define their own overflow handling.
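The benefit of centralising the overflow logic can be modelled in Python: one routine takes the Python integer produced by the nb_index slot and applies whichever of the four behaviours the caller asks for. This is a hypothetical illustration, not the code in abstract.c; `convert_index` and the policy strings are invented names, and sys.maxsize stands in for the Py_ssize_t maximum:

```python
import sys

# sys.maxsize is the largest value a Py_ssize_t can hold on this build.
SSIZE_MAX = sys.maxsize
SSIZE_MIN = -sys.maxsize - 1


def convert_index(value, policy):
    """Hypothetical model of a single overflow-handling routine.

    `value` is the (possibly huge) int returned by __index__;
    `policy` selects one of the four behaviours described above.
    """
    if policy == "object":
        return value                      # return the Python int unchanged
    if SSIZE_MIN <= value <= SSIZE_MAX:
        return value                      # fits, so no overflow handling
    if policy == "overflow":
        raise OverflowError("index does not fit in a Py_ssize_t")
    if policy == "index":
        raise IndexError("index does not fit in a Py_ssize_t")
    if policy == "clip":
        return SSIZE_MAX if value > 0 else SSIZE_MIN
    raise ValueError(f"unknown policy: {policy!r}")
```

Only the caller's choice of policy differs; the slot itself just has to produce a Python integer.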

If the nb_index slot does not return a true Python integer or long, a TypeError 
is raised. Subclasses are not accepted, in order to rule out Armin's 
favourite set of recursion problems :)
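The exact-type rule can be sketched in Python as follows. This is a hypothetical model (the name `checked_index` is invented); it looks __index__ up on the type, as the interpreter does for special methods, and rejects anything whose result is not exactly int:

```python
def checked_index(obj):
    """Hypothetical model: call __index__ and insist on an exact int."""
    index_method = getattr(type(obj), "__index__", None)
    if index_method is None:
        raise TypeError(
            f"'{type(obj).__name__}' object cannot be interpreted as an index")
    result = index_method(obj)
    # Reject int subclasses (such as bool), mirroring the rule above.
    if type(result) is not int:
        raise TypeError(
            f"__index__ returned non-int (type {type(result).__name__})")
    return result
```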

The C level API is based on the use cases in the standard library, with one of 
the functions generalised a bit to allow extension modules to easily handle 
type errors and overflow differently if they want to.

The three different use cases for nb_index in the standard library are:
   - concrete sequence indices (want IndexError on overflow)
   - 'true integer' retrieval (want OverflowError on overflow)
   - slice endpoints (want to clip to Py_ssize_t max/min values)

The proposed fix (Travis & Neal provided some useful comments on earlier 
versions) includes a C API function for each of these different use cases:

   Py_ssize_t PyNumber_Index(PyObject *obj, int *type_err);
   Py_ssize_t PyNumber_AsSsize_t(PyObject *obj, int *type_err);
   Py_ssize_t PyNumber_AsClippedSsize_t(PyObject *obj, int *type_err, int *clipped);

type_err is an output variable to say "obj does not provide nb_index" in order 
to get rid of boilerplate dealing with PyErr_Occurred() in mp_subscript and 
mp_ass_subscript implementations (those methods generally didn't want a 
TypeError raised at this point - they wanted to go on and check if the object 
was a slice object instead). It's also useful if you want to provide a 
specific error message for TypeErrors (sequence repetition takes advantage of 
this). You can also leave the pointer as NULL and the functions will raise a 
fairly generic TypeError for you; PyObject_GetItem and friends use the 
functions that way.
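In Python terms, the type_err output lets a subscript method ask "is this index-like?" without raising and then clearing a TypeError. The sketch below models the pointer as a second return value; `as_index_or_flag` and `Row` are hypothetical names, and idiomatic Python code would more often just test isinstance(key, slice) first:

```python
def as_index_or_flag(obj):
    """Hypothetical analogue of passing a non-NULL type_err pointer.

    Returns (value, type_err); type_err is True when obj has no
    __index__, and no exception is raised in that case.
    """
    index_method = getattr(type(obj), "__index__", None)
    if index_method is None:
        return None, True
    return index_method(obj), False


class Row:
    """Hypothetical container whose subscript accepts indices or slices."""

    def __init__(self, *cells):
        self.cells = list(cells)

    def __getitem__(self, key):
        i, type_err = as_index_or_flag(key)
        if not type_err:
            return self.cells[i]
        # Not index-like: go on and check for a slice, as mp_subscript does.
        if isinstance(key, slice):
            return Row(*self.cells[key])
        raise TypeError(
            f"Row indices must be integers or slices, not {type(key).__name__}")
```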

Avoiding repeated code is also why there are two non-clipping variants, one 
raising IndexError and one raising OverflowError. Raising OverflowError in 
PyNumber_Index broke half a dozen unit tests, while raising IndexError for 
things like sequence repetition turned out to break different unit tests.

The clipping variant is for slice indices. The interpreter core doesn't 
actually care whether or not the result gets clipped in this case (it sets the 
last parameter to NULL), but I kept the output variable in the signature for 
the benefit of extension authors.
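A hypothetical Python model of the clipping variant and its clipped output follows; `clipped_ssize` is an invented name, and sys.maxsize again stands in for the Py_ssize_t limit. The core can ignore the flag because a saturated slice endpoint is later clamped to the sequence length anyway, while an extension might use it to reject or report the oversized value:

```python
import sys

SSIZE_MAX = sys.maxsize
SSIZE_MIN = -sys.maxsize - 1


def clipped_ssize(value):
    """Hypothetical model of the clipping conversion.

    Returns (result, clipped); clipped is True when value lay outside
    the Py_ssize_t range and had to be saturated.
    """
    if value > SSIZE_MAX:
        return SSIZE_MAX, True
    if value < SSIZE_MIN:
        return SSIZE_MIN, True
    return value, False


# A huge slice endpoint is harmless once clipped.
print("abc"[0:2**100])   # abc
```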

All 3 of the C API methods return Py_ssize_t. The "give me a Python object" 
case isn't actually needed anywhere in the core, but is available to extension 
modules via:
   PyObject_CallMethod(obj, "__index__", NULL)
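From Python code, the object-returning case corresponds to calling __index__ directly or, more idiomatically, operator.index(). In modern Python 3 (not the patched interpreter discussed here), both return the full arbitrary-precision integer with no overflow handling:

```python
import operator

big = 2**100
# Both forms return the exact integer; no clipping or overflow check applies.
print(big.__index__() == big)        # True
print(operator.index(big) == big)    # True
```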

As Travis notes, indexing with something other than a builtin integer will be 
slightly slower due to the temporary object created by calling the nb_index 
slot (version 4 of the patch avoids this overhead for ints, version 5 avoids 
it for longs as well). I don't think this is avoidable - a non-PyObject return 
value really doesn't provide the necessary flexibility to detect and handle 
overflow correctly.

[1] http://www.python.org/sf/1530738

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org