Re: [Python-Dev] Maintenance burden of str.swapcase
Glyph Lefkowitz wrote:
> On Sep 11, 2011, at 11:49 AM, Michael Foord wrote:
> > Does anyone *actually* use .title() for this?
>
> Yes. Twisted does, in various MIME-ish places (IMAP, SIP),
> although not in HTTP from what I can see. I imagine other
> similar software would as well.

Not to mention it doesn't work for WWW-Authenticate or TE, to give just a couple of examples.

Robert Brewer
fuman...@aminus.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Maintenance burden of str.swapcase
On Sep 11, 2011, at 11:49 AM, Michael Foord wrote:
> Does anyone *actually* use .title() for this? (And why not just use the
> correct casing in the string literal...)

Yes. Twisted does, in various MIME-ish places (IMAP, SIP), although not in HTTP from what I can see. I imagine other similar software would as well.

One issue is that you don't always have a string literal to work with. If you're proxying traffic, you start from a mis-cased header and you possibly need to correct it to a canonically-cased one. (On at least one occasion I've had to use such a proxy to make certain buggy client software work.)

Of course you could have something like {b"CONNECTION-LOST": b"Connection-Lost", ...} somewhere at module scope, but that feels a bit sillier than just having a nice '.title()' method.

-glyph
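The trade-off between a module-level table and '.title()' can be sketched in a few lines. This is only an illustration, assuming Python 3 bytes objects; the `CANONICAL` table and the `canonical_header` helper are hypothetical names and are not taken from Twisted. It combines both approaches: an explicit table for names that title-casing gets wrong (such as WWW-Authenticate and TE), with '.title()' as the fallback.

```python
# Hypothetical table of header names whose canonical form is not title case.
CANONICAL = {
    b"www-authenticate": b"WWW-Authenticate",
    b"te": b"TE",
}


def canonical_header(name: bytes) -> bytes:
    """Return a canonically cased header name for a possibly mis-cased one."""
    key = name.lower()
    # Use the explicit entry if there is one, otherwise fall back to title case.
    return CANONICAL.get(key, key.title())


print(canonical_header(b"CONNECTION-LOST"))   # b'Connection-Lost'
print(canonical_header(b"www-authenticate"))  # b'WWW-Authenticate'
print(b"www-authenticate".title())            # b'Www-Authenticate'
```

The last line shows why title-casing alone is not enough for every header name.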
Re: [Python-Dev] Maintenance burden of str.swapcase
On 08/09/2011 03:46, Stephen J. Turnbull wrote:
> Glyph Lefkowitz writes:
> > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
> >
> > > How about "title"?
> >
> > >>> 'content-length'.title()
> > 'Content-Length'

Does anyone *actually* use .title() for this? (And why not just use the correct casing in the string literal...)

Michael

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
Re: [Python-Dev] Maintenance burden of str.swapcase
Glyph Lefkowitz writes:
> On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
>
> > How about "title"?
>
> >>> 'content-length'.title()
> 'Content-Length'
>
> You might say that the protocol "has" to be case-insensitive so
> this is a silly frill:

Not me, sir. My whole point about the "bytes should be more like str" controversy is the dual of that: you don't know what will be coming at you, so the regularities and (normally allowable) fuzziness of text processing are inadmissible.

> there are definitely enough case-sensitive crappy bits of network
> middleware out there that this function is critically important for
> an HTTP server.

"Critically important" is surely an overstatement. You could always title-case the literal strings containing field names in the source.

The problem with having lots of str-like features on bytes is that you lose TOOWDTI, or worse, to many performance-happy coders, use of bytes becomes TOOWDTI "because none of the characters[sic] I'm planning to process myself are non-ASCII". This is the road to Babel; it's workable for one-off scripts but it's asking for long-term trouble in multi-module applications. The choice of decoding to str and processing in that form should be made as attractive as possible.

On the other hand, it is undeniably useful for protocol tokens to have mnemonic representations even in binary protocols. Textual manipulations on those tokens should be convenient.

It seems to me that what might be an improvement over the current situation (maybe for Py4k only, though) is for bytes and (PEP-393-style) str to share representation, and have a "cast" method which would convert from one to the other, validating that the range constraints on the representation are satisfied. The problem I see is that this either sanctions the practice of using latin-1 as "ASCII plus anything", which is an unpleasant hack, or you'd need to check in text methods that nothing is done with non-ASCII values other than checks for set membership (including equality comparison, of course).

OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character in a str that happens to contain only ASCII values would convert the representation to multioctets (true), and therefore this doesn't give the desired efficiency properties, is beside the point. Just don't do that! You *can't* do that in a bytes object, anyway; use of str in this way is a "consenting adults" issue. You trade off the convenience of the full suite of text tools vs. the possibility that somebody might insert such a character -- but for the algorithms they're going to be using, they shouldn't be doing that anyway.
Re: [Python-Dev] Maintenance burden of str.swapcase
On Thu, Sep 8, 2011 at 3:51 AM, Glyph Lefkowitz wrote:
> On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
> > How about "title"?
>
> >>> 'content-length'.title()
> 'Content-Length'
>
> You might say that the protocol "has" to be case-insensitive so this is a
> silly frill: there are definitely enough case-sensitive crappy bits of
> network middleware out there that this function is critically important for
> an HTTP server.

Actually, the HTTP header case occurred to me as well shortly after sending my last message, so I think it's a legitimate reason to keep the methods around on bytes and bytearray.

So, putting my "practicality beats purity" hat back on, I would describe the status quo as follows:

1. Binary data is not text, so bytes and bytearray are deliberately conceptualised as arrays of arbitrary integers in the range 0-255 rather than as arrays of 8-bit 'characters'. This distinction is one of the core design principles separating Python 3 from Python 2.

2. However, the use of ASCII words and characters is a common feature of many existing wire protocols, so it is useful to be able to manipulate binary sequences that contain data in an ASCII-compatible format without having to convert them to text first. Retaining additional ASCII-based methods also eases the transition to Python 3 for code that manipulates binary data using the 2.x str type.

3. ASCII whitespace characters are used as delimiters in many formats. Thus, various methods such as split(), partition(), strip() and their variants retain their "ASCII whitespace" default arguments, and expandtabs() is also retained.

4. Padding values out to fill fields of a certain size is needed for some formats. Thus, center(), ljust(), rjust() and zfill() are retained (again retaining their ASCII space default fill character in the case of the first 3 methods).

5. Identifying ASCII alphanumeric data is important for some formats. Thus, isalnum(), isalpha() and isdigit() are retained.

6. Case insensitive ASCII comparisons are important for some formats (e.g. RFC 822 headers, HTTP headers). Thus, upper(), lower(), isupper() and islower() are retained.

7. Even correct mixed case ASCII can be important for some formats (e.g. HTTP headers). Thus, capitalize(), title() and istitle() are retained.

8. A valid use for swapcase() on binary data has not been identified, but once all the other ASCII based methods are being kept around for the various reasons given above, it doesn't seem worth the effort to get rid of this one (despite the additional implementation effort needed for alternate implementations).

9. Algorithms that operate purely on binary data or purely on text can just use literals of the appropriate type (if they use literals at all). Algorithms that are designed to operate on either kind of data may want to adopt an implicit decode/encode approach to handle binary inputs (this allows assumptions regarding the input encoding to be made explicit).

I'm actually fairly happy with that rationalisation for the current Python 3 set up. I'd been thinking recently that we would have been better off if more of the methods that rely on the data using an ASCII compatible encoding scheme had been removed from bytes and bytearray, but swapcase() is really the only one we can't give a decent justification for beyond "it was there in 2.x".

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
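Points 3 to 7 of the list above can be illustrated with one example per category. This is a minimal sketch, assuming Python 3; the sample header and request line are made up for illustration.

```python
line = b"Content-Length:  42 \r\n"

# 3. ASCII whitespace as a delimiter: partition() and strip() work on bytes.
name, _, value = line.partition(b":")
value = value.strip()
request_parts = b"GET /index.html HTTP/1.1".split()

# 4. Padding a value out to a fixed-width field.
padded = value.zfill(6)
label = b"ok".ljust(4)

# 5. Identifying ASCII digits.
is_number = value.isdigit()

# 6. Case-insensitive ASCII comparison.
matches = name.lower() == b"content-length"

# 7. Restoring the conventional mixed case.
pretty = b"content-length".title()

print(name, value, request_parts)
print(padded, label, is_number, matches, pretty)
```

None of these steps decodes the data to str first, which is the convenience point 2 describes.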
Re: [Python-Dev] Maintenance burden of str.swapcase
On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
> How about "title"?

>>> 'content-length'.title()
'Content-Length'

You might say that the protocol "has" to be case-insensitive so this is a silly frill: there are definitely enough case-sensitive crappy bits of network middleware out there that this function is critically important for an HTTP server.

In general I'd like to defend keeping as many of these methods as possible for compatibility (porting to Py3 is already hard enough). Although even I might have a hard time defending 'swapcase', which is never used _at all_ within Twisted, on text or bytes. The only use-case I can think of for that method is goofy joke text filters, and it wouldn't be very good at that either.

-glyph
Re: [Python-Dev] Maintenance burden of str.swapcase
Antoine Pitrou writes:
> You could also point out UTF-16 or EBCDIC, but I fail to see how that's
> relevant. Do you have problems with ISO 2022 when parsing, say, e-mail
> headers?

Yes, of course! Especially when it's say, packed EUC not encapsulated in MIME words. I think Mailman now handles that without crashing, but it took 10 years. Most Emacs MUAs still blow chunks on that. My procmail recipes and my employer's virus checker both occasionally punt.

The point about ISO 2022 is that it allows arbitrary binary crap in the stream, delimited by appropriate well-defined constructs. Just like the ASCII-like tokens in the protocols you talk about. But parsing full-bore ISO 2022 is non-trivial, especially if you're going to try to provide error-handling that's useful to the user. Nobody ever really took it seriously as a solution to the problem of internationalization in the 15 years or so when it was the only solution, and even less so once it became clear that UCSes were going to get traction.

> > > not arbitrary "arrays of bytes". And making indexing of bytes
> > > objects return ints was IMHO a mistake.
> >
> > Bytes objects are not ASCII strings, even though they can be used to
> > represent them.
>
> I'm talking about practice,

So am I, and so is Nick.

> not some idealistic view of the world.
> In many use cases (XML, HTML, e-mail headers, many other text-based
> protocols), you can get a mixture of ASCII "commands", and opaque
> binary stuff (which will or will not, depending on these "commands",
> have a meaningful unicode decoding).

Yeah, so what? Those protocol tokens are deliberately chosen to resemble ASCII text, but you need to parse them out of the binary sludge somehow, and the surrounding content remains binary sludge until deserialized or (for text) decoded. How is having b[0] return a bytes object, rather than an integer, going to help in that? Especially if the value is not in the ASCII range?

> > AFAICS, anything that should be done with ASCII-punned magic numbers
> > ("protocol tokens", if you prefer) can be done with slices and (ta-da!)
> > case conversion.
>
> So, basically, you're saying that we should remove useful functionality

No, that *was* Nick's position; I specifically opposed the suggestion that "lower" and "upper" be removed, and he concurred after a bit of thought. And remember, he's talking about removing "swapcase". Which RFC defines a protocol where that would be useful? How about "title"?

> and tell people to reimplement an adhoc version of it when they
> need it.

Of course not; I'm with Michael Foord on that: nobody should ever be asked to reimplement swapcase!

My position is simply that bytes are not text, and the occasional reminder (such as b[0] returning an integer, not a bytes object) is good. My experience has been that it makes a lot of sense to layer these things, for example transforming a protocol stream serialized as octets into a more structured object composed of protocol tokens and payloads. It's *not* text, and the relevant techniques are different.

It's like the old saw about "aha, I'll use regexps to solve this problem!" and now you have *two* problems. I don't advocate getting rid of regexps, and I don't advocate removing methods from bytes (although I do dream about it occasionally). I do advocate that people think twice before implementing complex text-like algorithms on binary protocol streams. If the stream really is text-like, then transform it into text of a known, well-behaved encoding, and then apply the powerful text-processing facilities provided for str. If it's not, then transform to a token stream or whatever makes sense. In both cases, do as little "text processing" on bytes objects as possible, and put more structure on the content as soon as possible.

If you really need the efficiency, then do what you need to do. As I say, I don't have any practical objection to keeping your tools for that case. But such applications, although important (I guess), are a minority.

> That sounds obnoxious.

Good advice almost always sounds obnoxious to the recipient.
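The layering described in this message (octets in, structured tokens and payloads out) might look roughly like the following. It is a deliberately simplified sketch, assuming CRLF-terminated "name: value" lines with no folding or continuation lines; `parse_headers` is a hypothetical helper, not an API from the standard library or from Mailman.

```python
def parse_headers(raw: bytes):
    """Split a raw header block into (name, value) pairs.

    Field names are protocol tokens, so they are decoded as ASCII and
    normalised to lower case. Values stay as bytes until the caller
    knows how (or whether) they should be decoded.
    """
    pairs = []
    for line in raw.split(b"\r\n"):
        if not line:
            continue
        name, sep, value = line.partition(b":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        pairs.append((name.decode("ascii").lower(), value.strip()))
    return pairs


raw = b"Content-Type: text/plain\r\nX-Payload: \xff\xfe\x00\r\n"
headers = parse_headers(raw)
print(headers)
```

Only the token is treated as text; the payload with non-ASCII octets passes through untouched, and a non-ASCII field name fails loudly with UnicodeDecodeError instead of being processed as if it were text.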
Re: [Python-Dev] Maintenance burden of str.swapcase
On Wed, Sep 7, 2011 at 6:31 PM, Simon Cross wrote:
> http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs
>
> There are quite a few hits but more people appear to be
> re-implementing it than using it (I haven't gone to the trouble of
> mining the search results to get an accurate picture though).

Scratch that -- I should gloss over search results less. It looks like the most common use case is to provide a consistent string-like API somewhere else. So removing it is likely to cause headaches (e.g. test failures) for the people who are wrapping it.
Re: [Python-Dev] Maintenance burden of str.swapcase
On Tue, Sep 6, 2011 at 10:36 PM, "Martin v. Löwis" wrote:
>> Which applications? I'm not sure the number of applications using
>> str.swapcase gets even as high as ten.
>
> I think this is what people underestimate. I can't name
> applications either - but that doesn't mean they don't exist.
> I'm deeply convinced that the majority of Python code (and
> I mean *large* majority) is unpublished.
>
> I expect thousands of uses world-wide.

http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs

There are quite a few hits but more people appear to be re-implementing it than using it (I haven't gone to the trouble of mining the search results to get an accurate picture though).
Re: [Python-Dev] Maintenance burden of str.swapcase
On Wed, 07 Sep 2011 11:15:04 +0900 "Stephen J. Turnbull" wrote:
> Antoine Pitrou writes:
>
> > Bytes objects are often used for partly ASCII strings,
>
> All I can say to that phrase is, "urk, ISO 2022 anyone?"

You could also point out UTF-16 or EBCDIC, but I fail to see how that's relevant. Do you have problems with ISO 2022 when parsing, say, e-mail headers?

> > not arbitrary "arrays of bytes". And making indexing of bytes
> > objects return ints was IMHO a mistake.
>
> Bytes objects are not ASCII strings, even though they can be used to
> represent them.

I'm talking about practice, not some idealistic view of the world. In many use cases (XML, HTML, e-mail headers, many other text-based protocols), you can get a mixture of ASCII "commands", and opaque binary stuff (which will or will not, depending on these "commands", have a meaningful unicode decoding).

In the stdlib, bytes objects are accessed far more often to poke at some text-like data, than to poke at arbitrary numbers.

> With PEP 393,
> there isn't even really a space excuse.

Of course there is. Any single non-ASCII byte of data mingled with aforementioned ASCII "commands" will make it switch to a less efficient representation. And "surrogateescape" will be a performance problem in itself, when used on large binary data; if you use "latin1" instead, you are risking far greater confusion; ask David about that dilemma. :-)

> AFAICS, anything that should be done with ASCII-punned magic numbers
> ("protocol tokens", if you prefer) can be done with slices and (ta-da!)
> case conversion.

So, basically, you're saying that we should remove useful functionality and tell people to reimplement an adhoc version of it when they need it. That sounds obnoxious.

Regards

Antoine.
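The representation switch mentioned in this message can be observed directly, with the caveat that it is a CPython implementation detail. The sketch below assumes CPython 3.3 or later (where PEP 393 is implemented); the exact sizes reported by `sys.getsizeof` vary between versions, so none are quoted here.

```python
import sys

data_ascii = b"a" * 1000
data_mixed = b"a" * 999 + b"\xff"  # a single non-ASCII byte

s_ascii = data_ascii.decode("ascii")
s_latin1 = data_mixed.decode("latin-1")
# surrogateescape maps the undecodable byte 0xFF to the lone surrogate U+DCFF.
s_escape = data_mixed.decode("ascii", errors="surrogateescape")

for label, s in [("ascii", s_ascii),
                 ("latin-1", s_latin1),
                 ("surrogateescape", s_escape)]:
    print(label, len(s), sys.getsizeof(s))

# The escaped form still round-trips to the original bytes.
round_trip = s_escape.encode("ascii", errors="surrogateescape")
print(round_trip == data_mixed)
```

All three strings have the same length, but the surrogate-escaped one needs a wider per-character storage because U+DCFF does not fit in one byte.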
Re: [Python-Dev] Maintenance burden of str.swapcase
On Wed, Sep 7, 2011 at 11:53 AM, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
> > The case-related methods, though, have no place in sane wire
> > protocol handling.
>
> RFC 822 headers are a somewhat insane but venerable (isn't that true
> of anything that's reached age 350 in dog-years?), and venerated,
> counterexample. Specifically, field names are case-insensitive (RFC
> 5322, section 1.2.2). I'll bet you can find plenty of others if you
> look. You can call that "text" and say it should be processed in
> Unicode, if you like, but you're not even going to convince me (and as
> I say, I like the Kool-Aid). Specifically, SMTP processes can (and
> even MUST, under some circumstances IIRC) manipulate the RFC 822 header.
>
> Sorry, Nick, no can do.
>
> -1

Heh, I knew as soon as I sent that message that someone would be able to point out a counter example.

I agree that RFC 822 (and case-insensitive ASCII comparison in general) is enough to save lower() and upper() and co, but what about this even further reduced list of text-specific methods:

'capitalize'
'istitle'
'swapcase'
'title'

While case-insensitive comparison makes sense for wire level data, where do these methods fit in, even when embedded ASCII text fragments are involved?

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Maintenance burden of str.swapcase
Antoine Pitrou writes:
> Bytes objects are often used for partly ASCII strings,

All I can say to that phrase is, "urk, ISO 2022 anyone?"

> not arbitrary "arrays of bytes". And making indexing of bytes
> objects return ints was IMHO a mistake.

Bytes objects are not ASCII strings, even though they can be used to represent them. The practice of using magic numbers that look like English words is a useful one, but by the same token, it should not be too easy to use bytes to represent *text* just because the programmer doesn't know any words that don't fit into 7*N bits. With PEP 393, there isn't even really a space excuse.

AFAICS, anything that should be done with ASCII-punned magic numbers ("protocol tokens", if you prefer) can be done with slices and (ta-da!) case conversion. (Sorry, Nick!) But the components of a bytes object are just numbers; they are not characters until you've run them through a codec.
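The behaviour under dispute is easy to see in a short Python 3 session. This is just an illustration of the existing semantics (indexing yields an int, slicing yields bytes, a codec yields characters); the `token` value is made up.

```python
token = b"GET"

first = token[0]          # indexing returns an int, not a one-byte bytes object
first_slice = token[0:1]  # slicing keeps the bytes type
as_text = token.decode("ascii")  # characters only exist after decoding

print(first, first_slice)       # 71 b'G'
print(token.lower() == b"get")  # True
print(as_text[0])               # G
```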
Re: [Python-Dev] Maintenance burden of str.swapcase
Nick Coghlan writes:
> However, a big +1 for deprecation in the case of bytes and bytearray.
> That's nothing to do with the maintenance burden though, it's to do
> with the semantic confusion between binary data and ASCII-encoded text
> implied by the retention of methods like upper(), lower() and
> swapcase().
[...]
> These are all text operations, not something you do with binary data.

"Yea, Brother, Amen!" I like the taste of this Kool-Aid. But

> The case-related methods, though, have no place in sane wire
> protocol handling.

RFC 822 headers are a somewhat insane but venerable (isn't that true of anything that's reached age 350 in dog-years?), and venerated, counterexample. Specifically, field names are case-insensitive (RFC 5322, section 1.2.2). I'll bet you can find plenty of others if you look. You can call that "text" and say it should be processed in Unicode, if you like, but you're not even going to convince me (and as I say, I like the Kool-Aid). Specifically, SMTP processes can (and even MUST, under some circumstances IIRC) manipulate the RFC 822 header.

Sorry, Nick, no can do.

-1
Re: [Python-Dev] Maintenance burden of str.swapcase
On Wed, 7 Sep 2011 10:47:16 +1000 Nick Coghlan wrote:
>
> However, a big +1 for deprecation in the case of bytes and bytearray.
> That's nothing to do with the maintenance burden though, it's to do
> with the semantic confusion between binary data and ASCII-encoded text
> implied by the retention of methods like upper(), lower() and
> swapcase().

A big -1 on that. Bytes objects are often used for partly ASCII strings, not arbitrary "arrays of bytes". And making indexing of bytes objects return ints was IMHO a mistake.

Besides, if you want an array of ints, there's already array.array() with your typecode of choice. Not sure why other types should conform.

Regards

Antoine.
Re: [Python-Dev] Maintenance burden of str.swapcase
Raymond Hettinger wrote:
> On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote:
> > I think this is what people underestimate. I can't name
> > applications either - but that doesn't mean they don't exist.
>
> Google code search is pretty good indicator that this method
> has near zero uptake. If it dies, I don't think anyone will cry.

Near-zero is not zero, and Terry has already shown some examples of code which use, or misuse, swapcase.

In any case (pun intended *wink*) this was discussed in December and Guido expressed little enthusiasm for the idea:

http://mail.python.org/pipermail/python-dev/2010-December/106650.html

I can't exactly defend the existence of swapcase, it does seem to be a fairly specialised function. But given that it exists, I'm -0.5 on removal on the basis of "if it ain't broke, don't fix it".

--
Steven
Re: [Python-Dev] Maintenance burden of str.swapcase
On Wed, Sep 7, 2011 at 7:23 AM, Raymond Hettinger wrote:
>
> On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote:
>
> > I think this is what people underestimate. I can't name
> > applications either - but that doesn't mean they don't exist.
>
> Google code search is pretty good indicator that this method
> has near zero uptake. If it dies, I don't think anyone will cry.

For str itself, I'm -0 on removing it - the Unicode implications mean implementation isn't completely trivial and there's at least one legitimate use case (i.e. providing, or deliberately reversing, Caps Lock style functionality).

However, a big +1 for deprecation in the case of bytes and bytearray. That's nothing to do with the maintenance burden though, it's to do with the semantic confusion between binary data and ASCII-encoded text implied by the retention of methods like upper(), lower() and swapcase().

Specifically, the methods I consider particularly problematic on that front are:

'capitalize'
'islower'
'istitle'
'isupper'
'lower'
'swapcase'
'title'
'upper'

These are all text operations, not something you do with binary data. There are some other methods that make ASCII specific default assumptions regarding whitespace and line separators, but ASCII whitespace is often used as a delimiter in wire protocols so losing those would be genuinely annoying. I've also left out the methods for identifying ASCII letters and digits, since again, those are useful for interpreting various wire encodings. The case-related methods, though, have no place in sane wire protocol handling.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
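Both halves of this message (the Caps Lock use case and the "Unicode implications") can be shown briefly. The sketch assumes Python 3.3 or later, where str case conversion uses full Unicode case mappings; the sample strings are made up.

```python
# Undoing text typed with Caps Lock accidentally switched on.
typed_with_caps_lock = "hELLO, wORLD"
fixed = typed_with_caps_lock.swapcase()
print(fixed)  # Hello, World

# Unicode case mapping is not always one-to-one, so swapcase()
# applied twice does not necessarily give back the original string.
s = "straße"
once = s.swapcase()
twice = once.swapcase()
print(once, twice)  # STRASSE strasse

# On bytes, only ASCII letters are affected; other octets pass through.
swapped_bytes = b"caf\xe9 OK".swapcase()
print(swapped_bytes)  # b'CAF\xe9 ok'
```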
Re: [Python-Dev] Maintenance burden of str.swapcase
On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote:
> I think this is what people underestimate. I can't name
> applications either - but that doesn't mean they don't exist.

Google code search is pretty good indicator that this method has near zero uptake. If it dies, I don't think anyone will cry.

Raymond
Re: [Python-Dev] Maintenance burden of str.swapcase
> Which applications? I'm not sure the number of applications using
> str.swapcase gets even as high as ten.

I think this is what people underestimate. I can't name applications either - but that doesn't mean they don't exist. I'm deeply convinced that the majority of Python code (and I mean *large* majority) is unpublished.

I expect thousands of uses world-wide.

> We still have our standard deprecation policy that we can follow in
> Python 3. We don't have to wait until Python 4 to remove things.

That's true. However, part of the deprecation procedure is also that there should be a rationale for removing it. In the past, things have been removed that had been superseded with something new, or things that had been flawed in their design so that fixing it wasn't really possible, or that did indeed cause ongoing maintenance effort for a minority of users (such as the support for little-used platforms).

None of these motivations hold for str.swapcase, and I think the "other implementations will have to implement it" is not sufficient motivation. If the other implementations believe that the feature is truly useless and also not used, they just can declare it a deliberate deviation from CPython, and refuse to implement it.

If I had to pick a truly useless feature, I'd kill complex numbers, not str.swapcase.

Regards,
Martin
Re: [Python-Dev] Maintenance burden of str.swapcase
On 6 Sep 2011, at 21:18, Martin v. Löwis wrote: >>> Perhaps I missed something early on, but why are we proposing >>> removing a function which (presumably) is stable and tested and >>> works and is not broken? What maintenance is needed here? >> >> >> The maintenance burden is on other implementations. > > It's not a maintenance burden (at least not in the sense in which > I understand the word "maintenance" - as an ongoing effort). When > they implement it once, the implementation can likely stay forever, > unmodified. Ok, burden rather than "maintenance" burden. > >> Even if there is >> no maintenance burden for CPython having useless methods simply >> because it is less effort to leave them in place creates work for >> new implementations wanting to be fully compatible. > > That's true. > > However, that alone is not enough reason to remove the feature, IMO. > The effort that is saved is not only on the developers of CPython, > but also on users of the feature. My claim is that for any little-used > feature, removing it costs more time world-wide than re-implementing > it in 10 alternative Python implementations (with the number 10 drawn > out of blue air), because of the cost of changing the applications that > actually do use the feature. > Which applications? I'm not sure the number of applications using str.swapcase gets even as high as ten. > With the switch to Python 3, there would have been a chance to remove > little-used features. IMO, the next such chance is with Python 4. > It could be useful to start collecting little-used features that might > be removed with Python 4 - which I don't expect until 2020. We still have our standard deprecation policy that we can follow in Python 3. We don't have to wait until Python 4 to remove things. Changing semantics or syntax is harder because you can't really deprecate. Just removing methods is straightforward. 
Michael

> Regards,
> Martin

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
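For readers unfamiliar with what "deprecating" a method looks like from the user's side, here is a minimal sketch. It is only an illustration: LegacyStr is a hypothetical subclass invented for this example, and the warning text is made up; CPython's own str type does not emit such a warning for swapcase.

```python
import warnings


class LegacyStr(str):
    """Hypothetical str subclass, used only to illustrate a deprecation cycle."""

    def swapcase(self):
        # Warn the caller, then keep behaving exactly as before.
        warnings.warn(
            "swapcase() is deprecated and may be removed in a future release",
            DeprecationWarning,
            stacklevel=2,
        )
        return super().swapcase()


# Record warnings so the example shows what a caller would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = LegacyStr("Hello").swapcase()

print(result)
print([str(w.message) for w in caught])
```

The method keeps working during the deprecation period; only the warning tells users that a later release may drop it.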
Re: [Python-Dev] Maintenance burden of str.swapcase
>> Perhaps I missed something early on, but why are we proposing
>> removing a function which (presumably) is stable and tested and
>> works and is not broken? What maintenance is needed here?
>
> The maintenance burden is on other implementations.

It's not a maintenance burden (at least not in the sense in which
I understand the word "maintenance" - as an ongoing effort). When
they implement it once, the implementation can likely stay forever,
unmodified.

> Even if there is
> no maintenance burden for CPython having useless methods simply
> because it is less effort to leave them in place creates work for
> new implementations wanting to be fully compatible.

That's true.

However, that alone is not enough reason to remove the feature, IMO.
The effort that is saved is not only on the developers of CPython,
but also on users of the feature. My claim is that for any little-used
feature, removing it costs more time world-wide than re-implementing
it in 10 alternative Python implementations (with the number 10 drawn
out of blue air), because of the cost of changing the applications that
actually do use the feature.

With the switch to Python 3, there would have been a chance to remove
little-used features. IMO, the next such chance is with Python 4.
It could be useful to start collecting little-used features that might
be removed with Python 4 - which I don't expect until 2020.

Regards,
Martin
Re: [Python-Dev] Maintenance burden of str.swapcase
On Sep 06, 2011, at 03:42 PM, Fred Drake wrote:

>On Tue, Sep 6, 2011 at 3:36 PM, Steven D'Aprano wrote:
>> pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR
>> APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS.
>
>There's a better solution to that, but the caps lock lobby has a stranglehold
>on keyboard manufacturers.

Fight The Man with xmodmap!

-Barry
Re: [Python-Dev] Maintenance burden of str.swapcase
On Tue, Sep 6, 2011 at 3:36 PM, Steven D'Aprano wrote:
> pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR
> APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS.

There's a better solution to that, but the caps lock lobby has a
stranglehold on keyboard manufacturers.

--
Fred L. Drake, Jr.
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens
Re: [Python-Dev] Maintenance burden of str.swapcase
On 6 Sep 2011, at 20:36, Steven D'Aprano wrote:

> Terry Reedy wrote:
>> On 9/6/2011 12:58 PM, Tres Seaver wrote:
>>> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote:
>>>> Joao S. O. Bueno writes:
>>>>> Removing it would mean explicitly "batteries removal".
>>>>
>>>> That's what we usually do with a dead battery, no?
>>>
>>> Normally one "replaces" dead batteries. :)
>>
>> Not if it is dead and leaking because the device has been unused for years.
>
> Can we please not make decisions about what code should be removed based on
> dodgy analogies? :)
>
> Perhaps I missed something early on, but why are we proposing removing a
> function which (presumably) is stable and tested and works and is not broken?
> What maintenance is needed here?

The maintenance burden is on other implementations. Even if there is no
maintenance burden for CPython, having useless methods simply because it is
less effort to leave them in place creates work for new implementations
wanting to be fully compatible.

> [...]
>> If k is lowercase, .lower() is redundant and k[0].swapcase()+k[1:].lower()
>> == k.title().
>
> Not so.
>
>     >>> k = 'aaaa bbbb'
>     >>> k.title()
>     'Aaaa Bbbb'
>     >>> k[0].swapcase()+k[1:].lower()
>     'Aaaa bbbb'
>
>> If k is uppercase, previous .upper() is redundant. If k is mixed case, code
>> may have problems.
>
> "May" have problems?
>
> pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR
> APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS.

Have you ever used str.swapcase for that purpose?

Michael

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
Re: [Python-Dev] Maintenance burden of str.swapcase
Terry Reedy wrote:
> On 9/6/2011 12:58 PM, Tres Seaver wrote:
>> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote:
>>> Joao S. O. Bueno writes:
>>>> Removing it would mean explicitly "batteries removal".
>>>
>>> That's what we usually do with a dead battery, no?
>>
>> Normally one "replaces" dead batteries. :)
>
> Not if it is dead and leaking because the device has been unused for years.

Can we please not make decisions about what code should be removed based on
dodgy analogies? :)

Perhaps I missed something early on, but why are we proposing removing a
function which (presumably) is stable and tested and works and is not broken?
What maintenance is needed here?

[...]
> If k is lowercase, .lower() is redundant and k[0].swapcase()+k[1:].lower()
> == k.title().

Not so.

    >>> k = 'aaaa bbbb'
    >>> k.title()
    'Aaaa Bbbb'
    >>> k[0].swapcase()+k[1:].lower()
    'Aaaa bbbb'

> If k is uppercase, previous .upper() is redundant. If k is mixed case, code
> may have problems.

"May" have problems?

pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR
APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS.

--
Steven
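The counterexample in this message can be checked directly. The following is a minimal sketch, assuming a lowercase two-word value for k; the helper name swap_first is made up for illustration and does not come from the ydict code.

```python
def swap_first(k):
    """The ydict-style expression: swap the first character, lowercase the rest."""
    return k[0].swapcase() + k[1:].lower()


k = "aaaa bbbb"
titled = k.title()        # capitalizes the first letter of every word
swapped = swap_first(k)   # only changes the first character of the string

print(titled)
print(swapped)
print(titled == swapped)
```

The two expressions only agree when a lowercase k contains a single word, which is why the claimed equivalence with k.title() does not hold in general.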
Re: [Python-Dev] Maintenance burden of str.swapcase
On 9/6/2011 12:58 PM, Tres Seaver wrote:
> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote:
>> Joao S. O. Bueno writes:
>>> Removing it would mean explicitly "batteries removal".
>>
>> That's what we usually do with a dead battery, no?
>
> Normally one "replaces" dead batteries. :)

Not if it is dead and leaking because the device has been unused for years.

https://www.google.com/codesearch#search/&q=lang:^python$%20swapcase%20case:yes&type=cs
returns a mere 300 hits. At least half are definitions of the function, or
tests thereof, or inclusions in lists. Some actual uses:

1. http://pytof.googlecode.com/svn/trunk/pytof/utils.py

    def ListCurrentDirFileFromExt(ext, path):
        """ list file matching extension from a list in the current directory
        emulate a `ls *.{(',').join(ext)` with ext in both upper and downcase}"""
        import glob
        extfiles = []
        for e in ext:
            extfiles.extend(glob.glob(join(path,'*' + e)))
            extfiles.extend(glob.glob(join(path,'*' + e.swapcase())))

If e is all upper or lower, using e.upper() and e.lower() will do the same.
If e is mixed, using .upper and .lower is required to fulfill the spec.
On *nix, where matching of letters is case sensitive, both will fail with
'.Jpg'. On Windows, where letter matching ignores case, the above code will
list everything twice.

2. http://ydict.googlecode.com/svn/trunk/ydict

k is a random word from the database.

    result.replace(k, "").replace(k.upper(), "").replace(k[0].swapcase()+k[1:].lower(),"")

If k is lowercase, .lower() is redundant and k[0].swapcase()+k[1:].lower()
== k.title(). If k is uppercase, previous .upper() is redundant. If k is
mixed case, code may have problems.

3. http://migrid.googlecode.com/svn/trunk/mig/sftp-mount/migaccess.py

    #This is how we could add stub extended attribute handlers...
    #(We can't have ones which aptly delegate requests to the underlying fs
    #because Python lacks a standard xattr interface.)
    #
    #def getxattr(self, path, name, size):
    #    val = name.swapcase() + '@' + path
    #    if size == 0:
    #        # We are asked for size of the value.
    #        return len(val)
    #    return val

This is not actually used. Passing a name with all cases swapped from what
they should be is a bit strange.

4.

    elif char >= 'A' and char <= 'Z':
        element = element + char.swapcase()

uppercasechar.swapcase() == uppercasechar.lower()

My perusal of the first 70 of 300 hits suggests that .swapcase is more of an
attractive nuisance or redundant rather than actually useful.

--
Terry Jan Reedy
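The point about the first example (file extensions) can be illustrated without touching the filesystem. This is a rough sketch of the string logic only; both function names are invented here and are not part of pytof.

```python
def variants_with_swapcase(ext):
    """Extension patterns tried by the pytof-style code: ext and ext.swapcase()."""
    return {ext, ext.swapcase()}


def variants_with_upper_lower(ext):
    """Extension patterns tried when using ext.lower() and ext.upper() instead."""
    return {ext.lower(), ext.upper()}


for ext in (".jpg", ".JPG", ".Jpg"):
    print(ext,
          sorted(variants_with_swapcase(ext)),
          sorted(variants_with_upper_lower(ext)))
```

For an all-lowercase or all-uppercase extension the two approaches produce the same pair of patterns. For a mixed-case input such as '.Jpg', the swapcase version searches for '.Jpg' and '.jPG' and never tries the all-lower or all-upper spelling that the docstring promises.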
Re: [Python-Dev] Maintenance burden of str.swapcase
On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote:
> Joao S. O. Bueno writes:
>
>> Removing it would mean explicitly "batteries removal".
>
> That's what we usually do with a dead battery, no?

Normally one "replaces" dead batteries. :)

Tres.
--
Tres Seaver          +1 540-429-0999          tsea...@palladion.com
Palladion Software   "Excellence by Design"   http://palladion.com
Re: [Python-Dev] Maintenance burden of str.swapcase
Joao S. O. Bueno writes:

> Removing it would mean explicitly "batteries removal".

That's what we usually do with a dead battery, no?
Re: [Python-Dev] Maintenance burden of str.swapcase
On Mon, Sep 5, 2011 at 8:56 AM, Michael Foord wrote:
> Hey all,
> A while ago there was a discussion of the value of apis like str.swapcase,
> and it was suggested that even though it was acknowledged to be useless the
> effort of deprecating and removing it was thought to be more than the value
> in removing it.
> Earlier this year I was at a pypy sprint helping to work on Python 2.7
> compatibility. The bytearray type has much of the string interface,
> including swapcase… So there was effort to implement this method with the
> correct semantics for pypy. Doubtless the same has been true for IronPython,
> and will also be true for Jython.
> Whilst it is too late for Python 2.x, it *is* (in my opinion) worth removing
> unused and unneeded APIs. Even if the effort to remove them is more than any
> effort saved on the part of users it helps other implementations down the
> road that no longer need to provide these APIs.
> All the best,
> Michael Foord

On the other hand, for any users wanting to use this in the future, if it is
not there, they'd have to implement the logic for themselves. If it is a
"burden" for someone in a sprint, looking at other implementations, and with
all the unicode knowledge/documentation around, it would be pretty much
impossible for a casual user to get right.

Removing it would mean explicitly "batteries removal".

If you get some traction on that, at least consider moving it to a pure
python function on the string module.

js
-><-
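As a rough idea of what such a pure-Python helper might look like, here is a minimal sketch. The name swapcase_py is hypothetical, nothing like it exists in the string module, and it is not claimed to match str.swapcase for every code point.

```python
def swapcase_py(s):
    """Pure-Python approximation of str.swapcase (illustrative, not CPython's code)."""
    out = []
    for ch in s:
        if ch.isupper():
            out.append(ch.lower())
        elif ch.islower():
            out.append(ch.upper())
        else:
            out.append(ch)      # digits, punctuation, caseless scripts
    return "".join(out)


print(swapcase_py("Hello World"))

# Case mappings are not always one-to-one, so swapping twice
# does not necessarily give back the original string.
once = swapcase_py("ß")
twice = swapcase_py(once)
print(once, twice)
```

The second part hints at why getting the "correct" Unicode behaviour is harder than it looks: uppercasing 'ß' yields two characters, and lowercasing those does not restore the original.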
Re: [Python-Dev] Maintenance burden of str.swapcase
Michael Foord writes:
>
> Earlier this year I was at a pypy sprint helping to work on Python 2.7
> compatibility. The bytearray type has much of the string interface,
> including swapcase… So there was effort to implement this method with the
> correct semantics for pypy. Doubtless the same has been true for IronPython,
> and will also be true for Jython.

While I haven't used swapcase() a single time, I doubt there is much
difficulty in implementing pure ASCII semantics, is there?

Regards

Antoine.
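To give a sense of how small the ASCII-only case is, here is a minimal sketch for bytes. The function name ascii_swapcase is an assumption made for this example; it is not how PyPy or CPython implement the method.

```python
def ascii_swapcase(data):
    """Swap the case of ASCII letters only; every other byte is left untouched."""
    out = bytearray(data)
    for i, b in enumerate(out):
        # 'A'..'Z' is 0x41..0x5A and 'a'..'z' is 0x61..0x7A;
        # the two ranges differ only in bit 0x20.
        if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:
            out[i] = b ^ 0x20
    return bytes(out)


print(ascii_swapcase(b"Content-Length"))
print(ascii_swapcase(b"caf\xc3\xa9 A"))
```

Non-ASCII bytes pass through unchanged, which is all that "pure ASCII semantics" requires here; the Unicode version for str is where the real work lies.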
[Python-Dev] Maintenance burden of str.swapcase
Hey all,

A while ago there was a discussion of the value of APIs like str.swapcase,
and it was suggested that, even though it was acknowledged to be useless, the
effort of deprecating and removing it was thought to be more than the value
in removing it.

Earlier this year I was at a pypy sprint helping to work on Python 2.7
compatibility. The bytearray type has much of the string interface,
including swapcase… So there was effort to implement this method with the
correct semantics for pypy. Doubtless the same has been true for IronPython,
and will also be true for Jython.

Whilst it is too late for Python 2.x, it *is* (in my opinion) worth removing
unused and unneeded APIs. Even if the effort to remove them is more than any
effort saved on the part of users, it helps other implementations down the
road that no longer need to provide these APIs.

All the best,

Michael Foord

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html