Terry Reedy wrote:
> "Paul Rubin" wrote:
>
>>Really it's x[-1]'s behavior that should go, not find/rfind.
>
> I completely disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
> useful, especially when 'x' is an expression instead of a name.
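Terry's point about expressions can be shown with a minimal Python 3 sketch. The word list is an arbitrary example. With from-the-end indexing the expression is written once. Spelling out x[len(x)-1] forces the expression to be bound to a name first, or else evaluated twice.

```python
words = ["alpha", "beta", "gamma"]

# From-the-end indexing: the expression appears exactly once.
last_char = max(words, key=len)[-1]

# Without it, the expression must be bound to a name (or evaluated twice)
# just so that its length is available for the subscript.
longest = max(words, key=len)
last_char_explicit = longest[len(longest) - 1]
```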

Hear us out; your disagreement might not be so complete as you think.
From-the-far-end indexing is too useful a feature to trash. If you look
back several posts, you'll see that the suggestion here is that the
index expression should explicitly call for it, rather than treat
negative integers as a special case.

I wrote up and sent off my proposal, and once the PEP-Editors respond,
I'll be pitching it on the python-dev list. Below is the version I sent
(not yet a listed PEP).

-- 
--Bryan


PEP: -1
Title: Improved from-the-end indexing and slicing
Version: $Revision: 1.00 $
Last-Modified: $Date: 2005/08/26 00:00:00 $
Author: Bryan G. Olson <[EMAIL PROTECTED]>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 26 Aug 2005
Post-History:


Abstract

    To index or slice a sequence from the far end, we propose using a
    symbol, '$', to stand for the length, instead of Python's current
    special-case interpretation of negative subscripts. Where Python
    currently uses:

        sequence[-i]

    We propose:

        sequence[$ - i]

    Python's treatment of negative indexes as offsets from the high
    end of a sequence causes minor obvious problems and major subtle
    ones. This PEP proposes a consistent meaning for indexes, yet still
    supports from-the-far-end indexing. Use of new syntax avoids
    breaking existing code.


Specification

    We propose a new style of slicing and indexing for Python
    sequences. Instead of:

        sequence[start : stop : step]

    new-style slicing uses the syntax:

        sequence[start ; stop ; step]

    It works like current slicing, except that negative start or stop
    values do not trigger from-the-high-end interpretation. Omissions
    and 'None' work the same as in old-style slicing.

    Within the square-brackets, the '$' symbol stands for the length
    of the sequence. One can index from the high end by subtracting
    the index from '$'.
    Instead of:

        seq[3 : -4]

    we write:

        seq[3 ; $ - 4]

    When square brackets appear within other square brackets, the
    innermost bracket pair determines which sequence '$' describes.
    The length of the next-outer sequence is denoted by '$1', the
    next one out after that by '$2', and so on. The symbol '$0'
    behaves identically to '$'. Resolution of $x is syntactic; a
    callable object invoked within square brackets cannot use the
    symbol to examine the context of the call.

    The '$' notation also works in simple (non-slice) indexing.
    Instead of:

        seq[-2]

    we write:

        seq[$ - 2]

    If we did not care about backward compatibility, new-style
    slicing would define seq[-2] to be out of bounds. Of course we do
    care about backward compatibility, and rejecting negative indexes
    would break way too much code. For now, simple indexing with a
    negative subscript (and no '$') must continue to index from the
    high end, as a deprecated feature. The presence of '$' always
    indicates new-style indexing, so a programmer who needs a negative
    index to trigger a range error can write:

        seq[($ - $) + index]


Motivation

    From-the-far-end indexing is such a useful feature that we cannot
    reasonably propose its removal; nevertheless Python's current
    method, which is to treat a range of negative indexes as special
    cases, is warty. The wart bites novice or imperfect Pythoners by
    not raising an exception when they need to know about a bug. For
    example, the following code prints 'y' with no sign of error,
    because find() returns -1 when the substring is absent:

        s = 'buggy'
        print s[s.find('w')]

    The wart becomes an even bigger problem with more sophisticated
    use of Python sequences. What is the 'stop' value for a slice when
    the step is negative and the slice includes the zero index? An
    instance of Python's slice type will report that the stop value is
    -1, but if we use this stop value to slice, it gets misinterpreted
    as the last index in the sequence.
    Here's an example:

        class BuggerAll:

            def __init__(self, somelist):
                self.sequence = somelist[:]

            def __getitem__(self, key):
                if isinstance(key, slice):
                    start, stop, step = key.indices(len(self.sequence))
                    # print 'Slice says start, stop, step are:', start, stop, step
                    return self.sequence[start : stop : step]


        print range(10)[None : None : -2]
        print BuggerAll(range(10))[None : None : -2]

    The above prints:

        [9, 7, 5, 3, 1]
        []

    Un-commenting the print statement in __getitem__ shows:

        Slice says start, stop, step are: 9 -1 -2

    The slice object seems to think that -1 is a valid exclusive bound,
    but when using it to actually slice, Python interprets the negative
    number as an offset from the high end of the sequence.

    Steven Bethard offered the simpler example:

        py> range(10)[slice(None, None, -2)]
        [9, 7, 5, 3, 1]
        py> slice(None, None, -2).indices(10)
        (9, -1, -2)
        py> range(10)[9:-1:-2]
        []

    The double meaning of -1, as both an exclusive stopping bound and
    an alias for the highest valid index, is just plain whacked. So
    what should the slice object return? With Python's current
    indexing/slicing, there is no value that just works. 'None' will
    work as a stop value in a slice, but index arithmetic will fail.
    The value 0 - (len(sequence) + 1) will work as a stop value, and
    slice arithmetic and range() will happily use it, but the result
    is probably not what the programmer intended: for a ten-element
    sequence, range(9, -11, -2) continues past zero through -1, -3,
    ..., -9.

    The problem is subtle. A Python sequence starts at index zero.
    There is some appeal to giving negative indexes a useful
    interpretation, on the theory that they were invalid as subscripts
    and thus useless otherwise. That theory is wrong, because negative
    indexes were already useful, even though not legal subscripts, and
    the reinterpretation often breaks their existing use.
    Specifically, negative indexes are useful in index arithmetic, and
    as exclusive stopping bounds.

    The problem is fixable. We propose that negative indexes not be
    treated as a special case.
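
    One rough way to experiment with these semantics in current
    Python 3 is a wrapper class. This is a sketch under stated
    assumptions, not part of the proposal. The names Dollar, END and
    NewStyle are hypothetical. END stands in for '$', and the existing
    ':' syntax stands in for the proposed ';'. So every slice of a
    NewStyle object is treated as new-style: plain negative bounds lie
    before index zero rather than counting from the end. Nested $1,
    $2 ... are not modelled, and slices return lists for simplicity.

```python
class Dollar:
    """Hypothetical stand-in for '$': the linear form coef * len + offset."""

    def __init__(self, coef=1, offset=0):
        self.coef, self.offset = coef, offset

    def _combine(self, other, sign):
        if isinstance(other, Dollar):  # e.g. END - END keeps the '$' marker
            return Dollar(self.coef + sign * other.coef,
                          self.offset + sign * other.offset)
        return Dollar(self.coef, self.offset + sign * other)

    def __add__(self, other):
        return self._combine(other, 1)

    __radd__ = __add__

    def __sub__(self, other):
        return self._combine(other, -1)

    def __rsub__(self, other):  # e.g. 3 - END
        return Dollar(-self.coef, other - self.offset)

    def resolve(self, length):
        return self.coef * length + self.offset


END = Dollar()


def _resolve(value, length):
    return value.resolve(length) if isinstance(value, Dollar) else value


def _clamp(value, low, high):
    return max(low, min(value, high))


class NewStyle:
    """Wraps a sequence; negative slice bounds are NOT offsets from the end."""

    def __init__(self, seq):
        self.seq = seq

    def __getitem__(self, key):
        n = len(self.seq)  # '$' needs the length, hence __len__
        if isinstance(key, slice):
            start = _resolve(key.start, n)
            stop = _resolve(key.stop, n)
            step = 1 if key.step is None else _resolve(key.step, n)
            if step == 0:
                raise ValueError("slice step cannot be zero")
            if step > 0:
                start = 0 if start is None else _clamp(start, 0, n)
                stop = n if stop is None else _clamp(stop, 0, n)
            else:
                # -1 is a perfectly good exclusive bound when stepping down.
                start = n - 1 if start is None else _clamp(start, -1, n - 1)
                stop = -1 if stop is None else _clamp(stop, -1, n - 1)
            return [self.seq[i] for i in range(start, stop, step)]
        if isinstance(key, Dollar):
            index = key.resolve(n)  # new-style: no wrap-around, strict bounds
            if not 0 <= index < n:
                raise IndexError("new-style index out of range")
            return self.seq[index]
        # Plain ints (no '$') keep the old, deprecated from-the-end meaning.
        return self.seq[key]
```

    With this wrapper, the triple (9, -1, -2) reported by
    slice(None, None, -2).indices(10) can be fed straight back in and
    yields [9, 7, 5, 3, 1]. Internally the sketch simply hands the
    clamped bounds to range(), which already treats -1 as an exclusive
    stopping bound.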
    To index from the far end of a sequence, we use a syntax that
    explicitly calls for far-end indexing.


Rationale

    New-style slicing/indexing is designed to fix the problems
    described above, yet live happily in Python alongside the old
    style. The new syntax leaves the meaning of existing code
    unchanged, and is even more Pythonic than current Python.

    Semicolons look a lot like colons, so the new semicolon syntax
    follows the rule that things that are similar should look similar.
    The semicolon syntax is currently illegal, so its addition will not
    break existing code. Python is historically tied to C, and the
    semicolon syntax is evocative of the similar start-stop-step
    expressions of C's 'for' loop. Jython (originally JPython) is tied
    to Java, which uses a similar 'for' loop syntax.

    The '$' character currently has no place in a Python index, so its
    new interpretation will not break existing code. We chose it over
    other unused symbols because the usage roughly corresponds to its
    meaning in the Python library's regular expression module, where
    '$' matches at the end of the string.

    We expect use of the $0, $1, $2 ... syntax to be rare;
    nevertheless, it has a Pythonic consistency. Thanks to Paul Rubin
    for advocating it over the inferior multiple-$ syntax that this
    author initially proposed.


Backwards Compatibility

    To avoid breaking code, we use new syntax that is currently
    illegal. The new syntax more-or-less looks like current Python,
    which may help Python programmers adjust.

    User-defined classes that implement the sequence protocol are
    likely to work, unchanged, with new-style slicing. 'Likely' is not
    certain; we've found one subtle issue (and there may be others):

    Currently, user-defined classes can implement Python subscripting
    and slicing without supporting Python's len() function (that is,
    without defining __len__). In our proposal, the '$' symbol stands
    for the sequence's length, so classes must be able to report their
    length in order for $ to work within their slices and indexes.
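
    The compatibility issue can be seen in current Python 3 with a
    small sketch. Squares, SizedSquares and length_for_dollar are
    hypothetical names used only for illustration. A class can support
    subscripting, and even iteration through the old __getitem__
    protocol, without defining __len__. For such a class, len() raises
    TypeError, so there is no length for '$' to stand for.

```python
import itertools


class Squares:
    """Subscriptable but with no __len__; legal in current Python."""

    def __getitem__(self, index):
        if index < 0:
            raise IndexError(index)
        return index * index


class SizedSquares(Squares):
    """The same idea, bounded, and able to report its length."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        return super().__getitem__(index)


def length_for_dollar(obj):
    """Hypothetical helper: the value '$' would need, or None if unavailable."""
    try:
        return len(obj)
    except TypeError:  # len() raises this when the class lacks __len__
        return None


# The old __getitem__ iteration protocol works even without __len__.
first_five = list(itertools.islice(iter(Squares()), 5))
```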

    To support new-style slicing, a class that accepts index or slice
    arguments to any of:

        __getitem__
        __setitem__
        __delitem__
        __getslice__
        __setslice__
        __delslice__

    must also consistently implement:

        __len__

    Sane programmers already follow this rule.


Copyright

    This document has been placed in the public domain.

-- 
http://mail.python.org/mailman/listinfo/python-list