I updated the patches destined for the trunk (slice-object support for all objects that supported simple slicing, and actual extended slicing support for most of them) and checked them in. Next stop is cleaning up the actual slice-removal bits. I do have two remaining issues: what do we do about PyMapping_Check(), and should I make the post an actual PEP? I'm thinking PyMapping_Check() should be removed, or made to look at tp_as_sequence->sq_item instead of tp_as_sequence->sq_slice and deprecated. I also think this change should be documented in an actual PEP, as it'll be a pretty big change for any C extension implementing simple slices but not slice-object support.
On 8/24/07, Thomas Wouters <[EMAIL PROTECTED]> wrote: > > > I did some work at last year's Google sprint on removing the simple > slicing API (__getslice__, tp_as_sequence->sq_slice) in favour of the more > flexible sliceobject API (__getitem__ and tp_as_mapping->mp_subscript using > slice objects as index.) For some more detail, see the semi-PEP below. (I > hesitate to call it a PEP because it's way past the Py3k PEP deadline, but > the email I was originally going to send on this subject grew in such a size > that I figured I might as well use PEP layout and use the opportunity to > record some best practices and behaviour. And the change should probably be > recorded in a PEP anyway, even though it has never been formally proposed, > just taken as a given.) > > If anyone is bored and/or interested in doing some complicated work, there > is still a bit of (optional) work to be done in this area: I uploaded > patches to be applied to the trunk SF 8 months ago -- extended slicing > support for a bunch of types. Some of that extended slicing support is > limited to step-1 slices, though, most notably UserString.MutableStringand > ctypes. I can guarantee adding non-step-1 support to them is a > challenging and fulfilling exercise, having done it for several types, but I > can't muster the intellectual stamina to do it for these (to me) fringe > types. The patches can be found in Roundup: > http://bugs.python.org/issue?%40search_text=&title=&%40columns=title&id=&%40columns=id&creation=&creator=twouters&activity=&%40columns=activity&%40sort=activity&actor=&type=&components=&versions=&severity=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&%40pagesize=50&%40startwith=0&%40action=search > (there doesn't seem to be a shorter URL; just search for issues created by > 'twouters' instead.) > > If nobody cares, I will be checking these patches into the trunk this > weekend (after updating them), and then update and check in the rest of the > p3yk-noslice branch into the py3k branch. > > Abstract > ======== > > This proposal discusses getting rid of the two types of slicing Python > uses, > ``simple`` and ``extended``. Extended slicing was added later, and uses a > different API at both the C and the Python level for backward > compatibility. > Extended slicing can express everything simple slicing can express, > however, making the simple slicing API practically redundant. > > A Tale of Two APIs > ================== > > Simple slicing is a slice operation without a step, Ellipsis or tuple of > slices -- the archetypical slice of just `start` and/or `stop`, with a > single colon separating them and both sides being optional:: > > L[1:3] > L[2:] > L[:-5] > L[:] > > An extended slice is any slice that isn't simple:: > > L[1:5:2] > L[1:3, 8:10] > L[1, ..., 5:-2] > L[1:3:] > > (Note that the presence of an extra colon in the last example makes the > very > first simple slice an extended slice, but otherwise expresses the exact > same > slicing operation.) > > In applying a simple slice, Python does the work of translating omitted, > out > of bounds or negative indices into the appropriate actual indices, based > on > the length of the sequence. The normalized ``start`` and ``stop`` indices > are then passed to the appropriate method: ``__getslice__``, > ``__setslice__`` or ``__delslice__`` for Python classes, > ``tp_as_sequence``'s ``sq_slice`` or ``sq_ass_slice`` for C types. > > For extended slicing, no special handling of slice indices is done. The > indices in ``start:stop:step`` are wrapped in a ``slice`` object, with > missing indices represented as None. The indices are otherwise taken > as-is. > The sequence object is then indexed with the slice object as if it were a > mapping: ``__getitem__``,`` __setitem__`` or ``__delitem__`` for Python > classes, ``tp_as_mapping``'s ``mp_subscript`` or ``mp_ass_subscript``. > It is entirely up to the sequence to interpret the meaning of missing, out > > of bounds or negative indices, let alone non-numerical indices like tuples > or Ellipsis or arbitrary objects. > > Since at least Python 2.1, applying a simple slice to an object that does > not > implement the simple slicing API will fall back to using extended slicing, > > calling __getitem__ (or mp_subscript) instead of __getslice__ (or > sq_slice), > and similarly for slice assignment/deletion. > > Problems > ======== > > Aside from the obvious disadvantage of having two ways to do the same > thing, > simple slicing is an inconvenient wart for several reasons: > > 1) It (passively) promotes supporting only simple slicing, as observed by > the builtin types only supporting extended slicing many years after > extended slicing was introduced. > > 2) The Python VM dedicates 12 of its opcodes, about 11%, to support > simple slicing, and effectively reserves another 13 for code > convenience. Reducing the Big Switch in the bytecode interpreter > would certainly not hurt Python performance. > > 5) The same goes for the number of functions, macros and > function-pointers > supporting simple slicing, although the impact would be > maintainability > and readability of the source rather than performance. > > Proposed Solution > ================= > > The proposed solution, as implemented in the p3yk-noslice SVN branch, gets > rid of the simple slicing methods and PyType entries. The simple C API > (using ``Py_ssize_t`` for start and stop) remains, but creates a slice > object as necessary instead. Various types had to be updated to support > slice objects, or improve the simple slicing case of extended slicing. > > The result is that ``__getslice__``, ``__setslice__`` and ``__delslice__`` > are no longer > called in any situation. Classes that delegate ``__getitem__`` (or the C > equivalent) to a sequence type get any slicing behaviour of that type for > free. Classes that implement their own slicing will have to be modified to > > accept slice objects and process the indices themselves. This means that > at > the C level, like is already the case at the Python level, the same method > is used for mapping-like access as for slicing. C types will still want to > > implement ``tp_as_sequence->sq_item``, but that function will only be > called > when using the ``PySequence_*Item()`` API. Those API functions do not > (yet) fall > back to using ``tp_as_mapping->mp_subscript``, although they possibly > should. > > A casualty of this change is ``PyMapping_Check()``. It used to check for > ``tp_as_mapping`` being available, and was modified to check for > ``tp_as_mapping`` but *not* ``tp_as_sequence->sq_slice`` when extended > slicing was added to the builtin types. It could conceivably check for > ``tp_as_sequence->sq_item`` instead of ``sq_slice``, but the added value > is > unclear (especially considering ABCs.) In the standard library and CPython > > itself, ``PyMapping_Check()`` is used mostly to provide early errors, for > instance by checking the arguments to ``exec()``. > > Alternate Solution > ------------------ > > A possible alternative to removing simple slicing completely, would be to > introduce a new typestruct hook, with the same signature as > ``tp_as_mapping->mp_subscript``, which would be called for slicing > operations. All as-mapping index operations would have to fall back to > this > new ``sq_extended_slice`` hook, in order for ``seq[slice(...)]`` to work > as > expected. For some added efficiency and error-checking, expressions using > actual slice syntax could compile into bytecodes specific for slicing (of > which there would only be three, instead of twelve.) This approach would > simplify C types wanting to support extended slicing but not > arbitrary-object indexing (and vice-versa) somewhat, but the benefit seems > too small to warrant the added complexity in the CPython runtime itself. > > > Implementing Extended Slicing > ============================= > > Supporting extended slicing in C types is not as easily done as supporting > simple slicing. There are a number of edgecases in interpreting the odder > combinations of ``start``, ``stop`` and ``step``. This section tries to > give > some explanations and best practices. > > Extended Slicing in C > --------------------- > > Because the mapping API takes precedence over the sequence API, any > ``tp_as_mapping->mp_subscript`` and ``tp_as_mapping->mp_ass_subscript`` > functions need to proper typechecks on their argument. In Python 2.5 and > later, this is best done using ``PyIndex_Check()`` and ``PySlice_Check()`` > > (and possibly ``PyTuple_Check()`` and comparison against ``Py_Ellipsis``.) > For compatibility with Python 2.4 and earlier, ``PyIndex_Check()`` would > have to be replaced with ``PyInt_Check()`` and ``PyLong_Check()``. > > Indices that pass ``PyIndex_Check()`` should be converted to a > ``Py_ssize_t`` using ``PyIndex_AsSsizeT()`` and delegated to > ``tp_as_sequence->sq_item``. (For compatibility with Python 2.4, use > ``PyNumber_AsLong()`` and downcast to an ``int`` instead.) > > The exact meaning of tuples of slices, and of Ellipsis, is up to the type, > as no standard-library types support it. It may be useful to use the same > convention as the Numpy package. Slices inside tuples, if supported, > should > probably follow the same rules as direct slices. > > From slice objects, correct indices can be extracted with > ``PySlice_GetIndicesEx()``. Negative and out-of-bounds indices will be > adjusted based on the provided length, but a negative ``step``, and a > ``stop`` before a ``step`` are kept as-is. This means that, for a getslice > operation, a simple for-loop can be used to visit the correct items in the > correct order:: > > for (cur = start, i = 0; i < slicelength; cur += step, i++) > dest[i] = src[cur]; > > > If ``PySlice_GetIndicesEx()`` is not appropriate, the individual indices > can > be extracted from the ``PySlice`` object. If the indices are to be > converted > to C types, that should be done using ``PyIndex_Check()``, > ``PyIndex_AsSsizeT()`` and the ``Py_ssize_t`` type, except that ``None`` > should be accepted as the default value for the index. > > For deleting slices (``mp_ass_subscript`` called with ``NULL`` as > value) where the order does not matter, a reverse slice can be turned into > > the equivalent forward slice with:: > > if (step < 0) { > stop = start + 1; > start = stop + step*(slicelength - 1) - 1; > step = -step; > } > > > For slice assignment with a ``step`` other than 1, it's usually necessary > to > require the source iterable to have the same length as the slice. When > assigning to a slice of length 0, care needs to be taken to select the > right > insertion point. For a slice S[5:2], the correct insertion point is before > > index 5, not before index 2. > > For both deleting slice and slice assignment, it is important to remember > arbitrary Python code may be executed when calling Py_DECREF() or > otherwise > interacting with arbitrary objects. Because of that, it's important your > datatype stays consistent throughout the operation. Either operate on a > copy > of your datatype, or delay (for instance) Py_DECREF() calls until the > datatype is updated. The latter is usually done by keeping a scratchpad of > > to-be-DECREF'ed items. > > Extended slicing in Python > -------------------------- > > The simplest way to support extended slicing in Python is by delegating to > an underlying type that already supports extended slicing. The class can > simply index the underlying type with the slice object (or tuple) it was > indexed with. > > Barring that, the Python code will have to pretty much apply > the same logic as the C type. ``PyIndex_AsSsizeT()`` is available as > ``operator.index()``, with a ``try/except`` block replacing > ``PyIndex_Check()``. ``isinstance(o, slice)`` and ``sliceobj.indices()`` > replace ``PySlice_Check()`` and ``PySlice_GetIndices()``, but the > slicelength > (which is provided by ``PySlice_GetIndicesEx()``) has to be calculated > manually. > > Testing extended slicing > ------------------------ > > Proper tests of extended slicing capabilities should at least include the > following (if the operations are supported), assuming a sequence of > length 10. Triple-colon notation is used everywhere so it uses extended > slicing even in Python 2.5 and earlier:: > > S[2:5:] (same as S[2:5]) > S[5:2:] (same as S[5:2], an empty slice) > S[::] (same as S[:], a copy of the sequence) > S[:2:] (same as S[:2]) > S[:11:] (same as S[:11], a copy of the sequence) > S[5::] (same as S[5:]) > S[-11::] (same as S[-11:], a copy of the sequence) > S[-5:2:1] (same as S[:2]) > S[-5:-2:2] (same as S[-5:-2], an empty slice) > S[5:2:-1] (the reverse of S[2:4]) > S[-2:-5:-1] (the reverse of S[-4:-1]) > > S[:5:2] ([ S[0], S[2], S[4] ])) > S[9::2] ([ S[9] ]) > S[8::2] ([ S[8] ]) > S[7::2] ([ S[7], S[9]]) > S[1::-1] ([ S[1], S[0] ]) > S[1:0:-1] ([ S[1] ], does not include S[0]!) > S[1:-1:-1] (an empty slice) > S[::10] ([ S[0] ]) > S[::-10] ([ S[9] ]) > > S[2:5:] = [1, 2, 3] ([ S[2], S[3], S[4] ] become [1, 2, 3]) > S[2:5:] = [1] (S[2] becomes 1, S[3] and S[4] are deleted) > S[5:2:] = [1, 2, 3] ([1, 2, 3] inserted before S[5]) > S[2:5:2] = [1, 2] ([ S[2], S[4] ] become [1, 2]) > S[5:2:-2] = [1, 2] ([ S[3], S[5] ] become [2, 1]) > S[3::3] = [1, 2, 3] ([ S[3], S[6], S[9] ] become [1, 2, 3]) > S[:-5:-2] = [1, 2] ([ S[7], S[9] ] become [2, 1]) > > S[::-1] = S (reverse S in-place awkwardly) > S[:5:] = S (replaces S[:5] with a copy of S) > > S[2:5:2] = [1, 2, 3] (error: assigning length-3 to slicelength-2) > S[2:5:2] = None (error: need iterable) > > > -- Thomas Wouters <[EMAIL PROTECTED]> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
_______________________________________________ Python-3000 mailing list [email protected] http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
