
Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-12 Thread Robert Brewer

Glyph Lefkowitz wrote:
> On Sep 11, 2011, at 11:49 AM, Michael Foord wrote:
> Does anyone *actually* use .title() for this?
> 
> Yes.  Twisted does, in various MIME-ish places (IMAP, SIP),
> although not in HTTP from what I can see.  I imagine other
> similar software would as well.

Not to mention it doesn't work for WWW-Authenticate or TE, to give just a 
couple of examples.
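
To make the limitation concrete, here is a small sketch comparing title()
output with the conventional spelling of a few header names. The list of
conventional spellings is typed in by hand for illustration; it is not
taken from Twisted or any other library.

```python
# Conventional spellings, written out by hand for this illustration.
conventional = [b"Content-Length", b"Connection", b"WWW-Authenticate", b"TE"]

# Names that title() cannot restore once the original casing is lost.
mismatches = [name for name in conventional if name.lower().title() != name]

for name in conventional:
    print(name, "->", name.lower().title())
```

Names made of ordinary words survive the round trip; acronyms such as
WWW and TE do not, because title() only capitalises the first letter of
each run of letters.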


Robert Brewer
fuman...@aminus.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-11 Thread Glyph Lefkowitz

On Sep 11, 2011, at 11:49 AM, Michael Foord wrote:

> Does anyone *actually* use .title() for this? (And why not just use the 
> correct casing in the string literal...)

Yes.  Twisted does, in various MIME-ish places (IMAP, SIP), although not in 
HTTP from what I can see.  I imagine other similar software would as well.

One issue is that you don't always have a string literal to work with.  If 
you're proxying traffic, you start from a mis-cased header and you possibly 
need to correct it to a canonically-cased one.  (On at least one occasion I've 
had to use such a proxy to make certain buggy client software work.)

Of course you could have something like {b"CONNECTION-LOST": 
b"Connection-Lost", ...} somewhere at module scope, but that feels a bit 
sillier than just having a nice '.title()' method.
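
A minimal sketch of the proxy case described above, assuming the proxy
already has one "name: value" line in hand as bytes. The function name
canonicalize_header_line is hypothetical, and the sketch deliberately
ignores folded (continued) header lines.

```python
def canonicalize_header_line(line: bytes) -> bytes:
    """Title-case the field name of a 'name: value' line; leave the value alone."""
    name, sep, value = line.partition(b":")
    if not sep:
        # No colon, so this is not a header line: pass it through untouched.
        return line
    return name.strip().title() + sep + value


fixed = canonicalize_header_line(b"CONNECTION-LOST: yes")
print(fixed)
```

Only the field name is rewritten, so a case-sensitive value is forwarded
exactly as it was received.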

-glyph


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-11 Thread Michael Foord

On 08/09/2011 03:46, Stephen J. Turnbull wrote:

Glyph Lefkowitz writes:
  >  On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
  >
  >  >  How about "title"?
  >
  >  >>>  'content-length'.title()
  >  'Content-Length'
  >  

Does anyone *actually* use .title() for this? (And why not just use the 
correct casing in the string literal...)

Michael

  >  You might say that the protocol "has" to be case-insensitive so
  >  this is a silly frill:

Not me, sir.  My whole point about the "bytes should be more like str"
controversy is the dual of that: you don't know what will be coming at
you, so the regularities and (normally allowable) fuzziness of text
processing are inadmissible.

  >  there are definitely enough case-sensitive crappy bits of network
  >  middleware out there that this function is critically important for
  >  an HTTP server.

"Critically important" is surely an overstatement.  You could always
title-case the literal strings containing field names in the source.

The problem with having lots of str-like features on bytes is that you
lose TOOWDTI, or worse, to many performance-happy coders, use of bytes
becomes TOOWDTI "because none of the characters[sic] I'm planning to
process myself are non-ASCII".  This is the road to Babel; it's
workable for one-off scripts but it's asking for long-term trouble in
multi-module applications.  The choice of decoding to str and
processing in that form should be made as attractive as possible.

On the other hand, it is undeniably useful for protocol tokens to have
mnemonic representations even in binary protocols.  Textual
manipulations on those tokens should be convenient.

It seems to me that what might be an improvement over the current
situation (maybe for Py4k only, though) is for bytes and
(PEP-393-style) str to share representation, and have a "cast" method
which would convert from one to the other, validating that the range
constraints on the representation are satisfied.  The problem I see is
that this either sanctions the practice of using latin-1 as "ASCII
plus anything", which is an unpleasant hack, or you'd need to check in
text methods that nothing is done with non-ASCII values other than
checks for set membership (including equality comparison, of course).

OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character
in a str that happens to contain only ASCII values would convert the
representation to multioctets (true), and therefore this doesn't give
the desired efficiency properties, is beside the point.  Just don't do
that!  You *can't* do that in a bytes object, anyway; use of str in
this way is a "consenting adults" issue.  You trade off the
convenience of the full suite of text tools vs. the possibility that
somebody might insert such a character -- but for the algorithms
they're going to be using, they shouldn't be doing that anyway.


--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Stephen J. Turnbull

Glyph Lefkowitz writes:
 > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
 > 
 > > How about "title"?
 > 
 > >>> 'content-length'.title()
 > 'Content-Length'
 > 
 > You might say that the protocol "has" to be case-insensitive so
 > this is a silly frill:

Not me, sir.  My whole point about the "bytes should be more like str"
controversy is the dual of that: you don't know what will be coming at
you, so the regularities and (normally allowable) fuzziness of text
processing are inadmissible.

 > there are definitely enough case-sensitive crappy bits of network
 > middleware out there that this function is critically important for
 > an HTTP server.

"Critically important" is surely an overstatement.  You could always
title-case the literal strings containing field names in the source.

The problem with having lots of str-like features on bytes is that you
lose TOOWDTI, or worse, to many performance-happy coders, use of bytes
becomes TOOWDTI "because none of the characters[sic] I'm planning to
process myself are non-ASCII".  This is the road to Babel; it's
workable for one-off scripts but it's asking for long-term trouble in
multi-module applications.  The choice of decoding to str and
processing in that form should be made as attractive as possible.

On the other hand, it is undeniably useful for protocol tokens to have
mnemonic representations even in binary protocols.  Textual
manipulations on those tokens should be convenient.

It seems to me that what might be an improvement over the current
situation (maybe for Py4k only, though) is for bytes and
(PEP-393-style) str to share representation, and have a "cast" method
which would convert from one to the other, validating that the range
constraints on the representation are satisfied.  The problem I see is
that this either sanctions the practice of using latin-1 as "ASCII
plus anything", which is an unpleasant hack, or you'd need to check in
text methods that nothing is done with non-ASCII values other than
checks for set membership (including equality comparison, of course).

OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character
in a str that happens to contain only ASCII values would convert the
representation to multioctets (true), and therefore this doesn't give
the desired efficiency properties, is beside the point.  Just don't do
that!  You *can't* do that in a bytes object, anyway; use of str in
this way is a "consenting adults" issue.  You trade off the
convenience of the full suite of text tools vs. the possibility that
somebody might insert such a character -- but for the algorithms
they're going to be using, they shouldn't be doing that anyway.
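
The representation change conceded above can be observed, at least on
CPython 3.3 and later where PEP 393 is implemented. This is only a
sketch: sys.getsizeof() reports implementation-specific numbers, so the
point is the relative sizes, not the absolute values.

```python
import sys

ascii_only = "a" * 1000
with_latin1 = ascii_only + "\xe9"   # U+00E9 still fits in one byte per character
with_bmp = ascii_only + "\u20ac"    # U+20AC is outside Latin-1

sizes = {
    "ascii": sys.getsizeof(ascii_only),
    "latin1": sys.getsizeof(with_latin1),
    "bmp": sys.getsizeof(with_bmp),
}
print(sizes)
```

On such an interpreter the last string takes roughly twice the space of
the first two, because one character outside Latin-1 widens the storage
for every character in the string.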


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Nick Coghlan

On Thu, Sep 8, 2011 at 3:51 AM, Glyph Lefkowitz  wrote:
> On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
>
> How about "title"?
>
> >>> 'content-length'.title()
> 'Content-Length'
> You might say that the protocol "has" to be case-insensitive so this is a
> silly frill: there are definitely enough case-sensitive crappy bits of
> network middleware out there that this function is critically important for
> an HTTP server.

Actually, the HTTP header case occurred to me as well shortly after
sending my last message, so I think it's a legitimate reason to keep
the methods around on bytes and bytearray.

So, putting my "practicality beats purity" hat back on, I would
describe the status quo as follows:

1. Binary data is not text, so bytes and bytearray are deliberately
conceptualised as arrays of arbitrary integers in the range 0-255
rather than as arrays of 8-bit 'characters'. This distinction is one
of the core design principles separating Python 3 from Python 2.

2. However, the use of ASCII words and characters is a common feature
of many existing wire protocols, so it is useful to be able to
manipulate binary sequences that contain data in an ASCII-compatible
format without having to convert them to text first. Retaining
additional ASCII-based methods also eases the transition to Python 3
for code that manipulates binary data using the 2.x str type.

3. ASCII whitespace characters are used as delimiters in many formats.
Thus, various methods such as split(), partition(), strip() and their
variants, retain their "ASCII whitespace" default arguments and
expandtabs() is also retained.

4. Padding values out to fill fields of a certain size is needed for
some formats. Thus, center(), ljust(), rjust(), zfill() are retained
(again retaining their ASCII space default fill character in the case
of the first 3 methods)

5. Identifying ASCII alphanumeric data is important for some formats.
Thus, isalnum(), isalpha() and isdigit() are retained.

6. Case insensitive ASCII comparisons are important for some formats
(e.g. RFC 822 headers, HTTP headers). Thus, upper(), lower(),
isupper() and islower() are retained.

7. Even correct mixed case ASCII can be important for some formats
(e.g. HTTP headers). Thus, capitalize(), title() and istitle() are
retained.

8. A valid use for swapcase() on binary data has not been identified,
but once all the other ASCII based methods are being kept around for
the various reasons given above, it doesn't seem worth the effort to
get rid of this one (despite the additional implementation effort
needed for alternate implementations).

9. Algorithms that operate purely on binary data or purely on text can
just use literals of the appropriate type (if they use literals at
all). Algorithms that are designed to operate on either kind of data
may want to adopt an implicit decode/encode approach to handle binary
inputs (this allows assumptions regarding the input encoding to be
made explicit).
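
Several of the points above can be seen working together in a short
sketch that picks apart an HTTP-style request without ever decoding it.
The request bytes are made up for illustration, and this is not meant as
a complete or robust parser.

```python
request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"host: example.org\r\n"
    b"content-length:  42 \r\n"
    b"\r\n"
)

head, _, body = request.partition(b"\r\n\r\n")      # point 3: partition()
request_line, *header_lines = head.split(b"\r\n")
method, target, version = request_line.split()      # point 3: ASCII whitespace default

headers = {}
for line in header_lines:
    name, _, value = line.partition(b":")
    headers[name.lower()] = value.strip()            # points 6 and 3

length = headers[b"content-length"]
is_numeric = length.isdigit()                        # point 5
padded = length.zfill(8)                             # point 4
wire_names = [name.title() for name in headers]      # point 7
```

Everything here stays in bytes; nothing in the sketch assumes that the
body, or any header value, is text in a known encoding.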

I'm actually fairly happy with that rationalisation for the current
Python 3 set up. I'd been thinking recently that we would have been
better off if more of the methods that rely on the data using an ASCII
compatible encoding scheme had been removed from bytes and bytearray,
but swapcase() is really the only one we can't give a decent
justification for beyond "it was there in 2.x".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Glyph Lefkowitz

On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:

> How about "title"?

>>> 'content-length'.title()
'Content-Length'

You might say that the protocol "has" to be case-insensitive so this is a silly 
frill: there are definitely enough case-sensitive crappy bits of network 
middleware out there that this function is critically important for an HTTP 
server.

In general I'd like to defend keeping as many of these methods as possible for 
compatibility (porting to Py3 is already hard enough).  Although even I might 
have a hard time defending 'swapcase', which is never used _at all_ within 
Twisted, on text or bytes.  The only use-case I can think of for that method is 
goofy joke text filters, and it wouldn't be very good at that either.

-glyph


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Stephen J. Turnbull

Antoine Pitrou writes:

 > You could also point out UTF-16 or EBCDIC, but I fail to see how that's
 > relevant. Do you have problems with ISO 2022 when parsing, say, e-mail
 > headers?

Yes, of course!  Especially when it's say, packed EUC not encapsulated
in MIME words.  I think Mailman now handles that without crashing, but
it took 10 years.  Most Emacs MUAs still blow chunks on that.  My
procmail recipes and my employer's virus checker both occasionally punt.

The point about ISO 2022 is that it allows arbitrary binary crap in
the stream, delimited by appropriate well-defined constructs.  Just
like the ASCII-like tokens in the protocols you talk about.  But
parsing full-bore ISO 2022 is non-trivial, especially if you're going
to try to provide error-handling that's useful to the user.  Nobody
ever really took it seriously as a solution to the problem of
internationalization in the 15 years or so when it was the only
solution, and even less so once it became clear that UCSes were going
to get traction.

 > >  > not arbitrary "arrays of bytes". And making indexing of bytes
 > >  > objects return ints was IMHO a mistake.
 > > 
 > > Bytes objects are not ASCII strings, even though they can be used to
 > > represent them.
 > 
 > I'm talking about practice,

So am I, and so is Nick.

 > not some idealistic view of the world.
 > In many use cases (XML, HTML, e-mail headers, many other text-based
 > protocols), you can get a mixture of ASCII "commands", and opaque
 > binary stuff (which will or will not, depending on these "commands",
 > have a meaningful unicode decoding).

Yeah, so what?  Those protocol tokens are deliberately chosen to
resemble ASCII text, but you need to parse them out of the binary
sludge somehow, and the surrounding content remains binary sludge
until deserialized or (for text) decoded.  How is having b[0] return a
bytes object, rather than an integer, going to help in that?
Especially if the value is not in the ASCII range?

 > > AFAICS, anything that should be done with ASCII-punned magic numbers
 > > ("protocol tokens", if you prefer) can be done with slices and (ta-da!)
 > > case conversion.
 > 
 > So, basically, you're saying that we should remove useful functionality

No, that *was* Nick's position; I specifically opposed the suggestion
that "lower" and "upper" be removed, and he concurred after a bit of
thought.  And remember, he's talking about removing "swapcase".  Which
RFC defines a protocol where that would be useful?  How about "title"?

 > and tell people to reimplement an adhoc version of it when they
 > need it.

Of course not; I'm with Michael Foord on that: nobody should ever be
asked to reimplement swapcase!  My position is simply that bytes are
not text, and the occasional reminder (such as b[0] returning an
integer, not a bytes object) is good.  My experience has been that it
makes a lot of sense to layer these things, for example transforming a
protocol stream serialized as octets into a more structured object
composed of protocol tokens and payloads.  It's *not* text, and the
relevant techniques are different.
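
For readers who have not run into it, the indexing behaviour under
discussion looks like this in Python 3 (a short illustration, nothing
more):

```python
token = b"GET"

first = token[0]          # indexing yields an int
first_slice = token[:1]   # slicing yields a bytes object of length 1

# Comparisons against protocol tokens are therefore written with slices
# (or startswith()), not with single-element indexing.
is_get = token[:3] == b"GET"

high = b"\xff"[0]         # a value outside the ASCII range is still just a number
```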

It's like the old saw about "aha, I'll use regexps to solve this
problem!" and now you have *two* problems.

I don't advocate getting rid of regexps, and I don't advocate removing
methods from bytes (although I do dream about it occasionally).  I do
advocate that people think twice before implementing complex text-like
algorithms on binary protocol streams.  If the stream really is
text-like, then transform it into text of a known, well-behaved
encoding, and then apply the powerful text-processing facilities
provided for str.  If it's not, then transform to a token stream or
whatever makes sense.  In both cases, do as little "text processing"
on bytes objects as possible, and put more structure on the content as
soon as possible.
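
One way to read the layering advice above, sketched under the
assumption that the input is a single "name: value" header line. The
names Header and parse_header are hypothetical, chosen only for this
example.

```python
from typing import NamedTuple


class Header(NamedTuple):
    name: str     # protocol token, decoded because it must be ASCII
    value: bytes  # payload, left as bytes until its encoding is known


def parse_header(line: bytes) -> Header:
    raw_name, sep, raw_value = line.partition(b":")
    if not sep:
        raise ValueError("not a header line: %r" % (line,))
    # The strict ASCII codec rejects anything that is not a plain token.
    name = raw_name.strip().decode("ascii").lower()
    return Header(name, raw_value.strip())


header = parse_header(b"Subject: caf\xe9")
print(header)
```

The token becomes a str as early as possible, while the payload stays
opaque until some later layer knows how to decode it.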

If you really need the efficiency, then do what you need to do.  As I
say, I don't have any practical objection to keeping your tools for
that case.  But such applications, although important (I guess), are a
minority.

 > That sounds obnoxious.

Good advice almost always sounds obnoxious to the recipient.

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Simon Cross

On Wed, Sep 7, 2011 at 6:31 PM, Simon Cross
 wrote:
> http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs
>
> There are quite a few hits but more people appear to be
> re-implementing it than using it (I haven't gone to the trouble of
> mining the search results to get an accurate picture though).

Scratch that -- I should gloss over search results less. It looks like
the most common use case is to provide a consistent string-like API
somewhere else. So removing it is likely to cause headaches (e.g. test
failures) for the people who are wrapping it.

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Simon Cross

On Tue, Sep 6, 2011 at 10:36 PM, "Martin v. Löwis"  wrote:
>> Which applications? I'm not sure the number of applications using
>> str.swapcase gets even as high as ten.
>
> I think this is what people underestimate. I can't name
> applications either - but that doesn't mean they don't exist.
> I'm deeply convinced that the majority of Python code (and
> I mean *large* majority) is unpublished.
>
> I expect thousands of uses world-wide.

http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs

There are quite a few hits but more people appear to be
re-implementing it than using it (I haven't gone to the trouble of
mining the search results to get an accurate picture though).

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Antoine Pitrou

On Wed, 07 Sep 2011 11:15:04 +0900
"Stephen J. Turnbull"  wrote:
> Antoine Pitrou writes:
> 
>  > Bytes objects are often used for partly ASCII strings,
> 
> All I can say to that phrase is, "urk, ISO 2022 anyone?"

You could also point out UTF-16 or EBCDIC, but I fail to see how that's
relevant. Do you have problems with ISO 2022 when parsing, say, e-mail
headers?

>  > not arbitrary "arrays of bytes". And making indexing of bytes
>  > objects return ints was IMHO a mistake.
> 
> Bytes objects are not ASCII strings, even though they can be used to
> represent them.

I'm talking about practice, not some idealistic view of the world.
In many use cases (XML, HTML, e-mail headers, many other text-based
protocols), you can get a mixture of ASCII "commands", and opaque
binary stuff (which will or will not, depending on these "commands",
have a meaningful unicode decoding).

In the stdlib, bytes objects are accessed far more often to poke at
some text-like data, than to poke at arbitrary numbers.

> With PEP 393,
> there isn't even really a space excuse.

Of course there is. Any single non-ASCII byte of data mingled with
aforementioned ASCII "commands" will make it switch to a less efficient
representation.

And "surrogateescape" will be a performance problem in itself, when
used on large binary data; if you use "latin1" instead, you are risking
far greater confusion; ask David about that dilemma. :-)
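
The two decoding choices mentioned above behave as follows; this is a
small illustration on made-up input and says nothing about their
relative speed.

```python
data = b"X-Token: \xff\xfe"

# Option 1: ASCII with surrogateescape. Non-ASCII bytes become lone
# surrogates (U+DC80..U+DCFF) and encode back to the original bytes.
as_escaped = data.decode("ascii", errors="surrogateescape")
round_trip_escaped = as_escaped.encode("ascii", errors="surrogateescape")

# Option 2: latin-1. Every byte maps to some code point, so the result
# looks like ordinary text even where the bytes were never text at all.
as_latin1 = data.decode("latin-1")
round_trip_latin1 = as_latin1.encode("latin-1")
```

Both round-trip losslessly. The difference is what happens in between:
the surrogate-escaped string fails loudly if it is encoded with a strict
codec such as UTF-8, whereas the latin-1 string passes for real text.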

> AFAICS, anything that should be done with ASCII-punned magic numbers
> ("protocol tokens", if you prefer) can be done with slices and (ta-da!)
> case conversion.

So, basically, you're saying that we should remove useful functionality
and tell people to reimplement an adhoc version of it when they need
it. That sounds obnoxious.

Regards

Antoine.

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Nick Coghlan

On Wed, Sep 7, 2011 at 11:53 AM, Stephen J. Turnbull  wrote:
> Nick Coghlan writes:
>  > The case-related methods, though, have no place in sane wire
>  > protocol handling.
>
> RFC 822 headers are a somewhat insane but venerable (isn't that true
> of anything that's reached age 350 in dog-years?), and venerated,
> counterexample.  Specifically, field names are case-insensitive (RFC
> 5322, section 1.2.2).  I'll bet you can find plenty of others if you
> look.  You can call that "text" and say it should be processed in
> Unicode, if you like, but you're not even going to convince me (and as
> I say, I like the Kool-Aid).  Specifically, SMTP processes can (and
> even MUST, under some circumstances IIRC) manipulate the RFC 822 header.
>
> Sorry, Nick, no can do.
>
> -1

Heh, I knew as soon as I sent that message that someone would be able
to point out a counter example. I agree that RFC 822 (and
case-insensitive ASCII comparison in general) is enough to save
lower() and upper() and co, but what about this even further reduced
list of text-specific methods:

 'capitalize'
 'istitle'
 'swapcase'
 'title'

While case-insensitive comparison makes sense for wire level data,
where do these methods fit in, even when embedded ASCII text fragments
are involved?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Stephen J. Turnbull

Antoine Pitrou writes:

 > Bytes objects are often used for partly ASCII strings,

All I can say to that phrase is, "urk, ISO 2022 anyone?"

 > not arbitrary "arrays of bytes". And making indexing of bytes
 > objects return ints was IMHO a mistake.

Bytes objects are not ASCII strings, even though they can be used to
represent them.  The practice of using magic numbers that look like
English words is a useful one, but by the same token, it should not be
too easy to use bytes to represent *text* just because the programmer
doesn't know any words that don't fit into 7*N bits.  With PEP 393,
there isn't even really a space excuse.

AFAICS, anything that should be done with ASCII-punned magic numbers
("protocol tokens", if you prefer) can be done with slices and (ta-da!)
case conversion.  (Sorry, Nick!)  But the components of a bytes object
are just numbers; they are not characters until you've run them
through a codec.

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Stephen J. Turnbull

Nick Coghlan writes:

 > However, a big +1 for deprecation in the case of bytes and bytearray.
 > That's nothing to do with the maintenance burden though, it's to do
 > with the semantic confusion between binary data and ASCII-encoded text
 > implied by the retention of methods like upper(), lower() and
 > swapcase().

[...]

 > These are all text operations, not something you do with binary data.

"Yea, Brother, Amen!"  I like the taste of this Kool-Aid.  But

 > The case-related methods, though, have no place in sane wire
 > protocol handling.

RFC 822 headers are a somewhat insane but venerable (isn't that true
of anything that's reached age 350 in dog-years?), and venerated,
counterexample.  Specifically, field names are case-insensitive (RFC
5322, section 1.2.2).  I'll bet you can find plenty of others if you
look.  You can call that "text" and say it should be processed in
Unicode, if you like, but you're not even going to convince me (and as
I say, I like the Kool-Aid).  Specifically, SMTP processes can (and
even MUST, under some circumstances IIRC) manipulate the RFC 822 header.
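
As an illustration of why lower() earns its keep on bytes, here is a
sketch of a case-insensitive field lookup that never decodes the header.
The helper name get_field and the sample fields are invented for the
example.

```python
def get_field(headers, name: bytes):
    """Return the value of the first field whose name matches, ignoring ASCII case."""
    wanted = name.lower()
    for field_name, value in headers:
        if field_name.lower() == wanted:
            return value
    return None


fields = [(b"SUBJECT", b"hello"), (b"Message-ID", b"<1@example.invalid>")]
subject = get_field(fields, b"subject")
```

bytes.lower() only touches the ASCII letters, so any non-ASCII octets in
a field name are compared exactly as they arrived.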

Sorry, Nick, no can do.

-1

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Antoine Pitrou

On Wed, 7 Sep 2011 10:47:16 +1000
Nick Coghlan  wrote:
> 
> However, a big +1 for deprecation in the case of bytes and bytearray.
> That's nothing to do with the maintenance burden though, it's to do
> with the semantic confusion between binary data and ASCII-encoded text
> implied by the retention of methods like upper(), lower() and
> swapcase().

A big -1 on that.
Bytes objects are often used for partly ASCII strings, not arbitrary
"arrays of bytes". And making indexing of bytes objects return ints was
IMHO a mistake.

Besides, if you want an array of ints, there's already array.array()
with your typecode of choice. Not sure why other types should conform.
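
For comparison, a quick sketch of the array.array() alternative
mentioned above, using the unsigned 8-bit typecode on arbitrary sample
data:

```python
from array import array

raw = b"\x01\x02\xff"
as_array = array("B", raw)   # 'B' is the unsigned 8-bit typecode

# Both containers hold the same numbers...
same_values = list(as_array) == list(raw)

# ...but only bytes carries the ASCII-oriented methods under discussion.
has_upper = (hasattr(raw, "upper"), hasattr(as_array, "upper"))
```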

Regards

Antoine.


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Steven D'Aprano


Raymond Hettinger wrote:
> On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote:
>
>> I think this is what people underestimate. I can't name
>> applications either - but that doesn't mean they don't exist.
>
> Google code search is pretty good indicator that this method
> has near zero uptake.   If it dies, I don't think anyone will cry.


Near-zero is not zero, and Terry has already shown some examples of code 
which use, or misuse, swapcase.


In any case (pun intended *wink*) this was discussed in December and 
Guido expressed little enthusiasm for the idea:


http://mail.python.org/pipermail/python-dev/2010-December/106650.html

I can't exactly defend the existence of swapcase, it does seem to be a 
fairly specialised function. But given that it exists, I'm -0.5 on 
removal on the basis of "if it ain't broke, don't fix it".




--
Steven


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Nick Coghlan

On Wed, Sep 7, 2011 at 7:23 AM, Raymond Hettinger
 wrote:
>
> On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote:
>
> I think this is what people underestimate. I can't name
> applications either - but that doesn't mean they don't exist.
>
> Google code search is pretty good indicator that this method
> has near zero uptake.   If it dies, I don't think anyone will cry.

For str itself, I'm -0 on removing it - the Unicode implications mean
implementation isn't completely trivial and there's at least one
legitimate use case (i.e. providing, or deliberately reversing, Caps
Lock style functionality).
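
Both halves of that sentence can be shown in a few lines. This is a
sketch of behaviour on current Python 3 releases, where str uses full
Unicode case mappings:

```python
typed_with_caps_lock = "hELLO, wORLD"
repaired = typed_with_caps_lock.swapcase()

# Full Unicode case mappings mean swapcase() is not always its own
# inverse: the lowercase sharp s uppercases to two letters.
once = "ß".swapcase()
twice = once.swapcase()
```

The first call undoes a Caps Lock accident; the sharp s shows why a
correct implementation needs more than flipping one bit per character.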

However, a big +1 for deprecation in the case of bytes and bytearray.
That's nothing to do with the maintenance burden though, it's to do
with the semantic confusion between binary data and ASCII-encoded text
implied by the retention of methods like upper(), lower() and
swapcase().

Specifically, the methods I consider particularly problematic on that front are:
 'capitalize'
 'islower'
 'istitle'
 'isupper'
 'lower'
 'swapcase'
 'title'
 'upper'

These are all text operations, not something you do with binary data.

There are some other methods that make ASCII specific default
assumptions regarding whitespace and line separators, but ASCII
whitespace is often used as a delimiter in wire protocols so losing
those would be genuinely annoying. I've also left out the methods for
identifying ASCII letters and digits, since again, those are useful
for interpreting various wire encodings. The case-related methods,
though, have no place in sane wire protocol handling.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Raymond Hettinger


On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote:

> I think this is what people underestimate. I can't name
> applications either - but that doesn't mean they don't exist.

Google code search is pretty good indicator that this method
has near zero uptake.   If it dies, I don't think anyone will cry.


Raymond

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Martin v. Löwis

> Which applications? I'm not sure the number of applications using
> str.swapcase gets even as high as ten.

I think this is what people underestimate. I can't name
applications either - but that doesn't mean they don't exist.
I'm deeply convinced that the majority of Python code (and
I mean *large* majority) is unpublished.

I expect thousands of uses world-wide.

> We still have our standard deprecation policy that we can follow in
> Python 3. We don't have to wait until Python 4 to remove things.

That's true. However, part of the deprecation procedure is also that
there should be a rationale for removing it. In the past, things have
been removed that had been superseded with something new, or things
that had been flawed in their design so that fixing it wasn't really
possible, or that did indeed cause ongoing maintenance effort for
a minority of users (such as the support for little-used platforms).

None of these motivations hold for str.swapcase, and I think the
"other implementations will have to implement it" is not sufficient
motivation. If the other implementations believe that the feature
is truly useless and also not used, they just can declare it a
deliberate deviation from CPython, and refuse to implement it.

If I had to pick a truly useless feature, I'd kill complex numbers,
not str.swapcase.

Regards,
Martin


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Michael Foord

On 6 Sep 2011, at 21:18, Martin v. Löwis wrote:
>>> Perhaps I missed something early on, but why are we proposing
>>> removing a function which (presumably) is stable and tested and
>>> works and is not broken? What maintenance is needed here?
>> 
>> 
>> The maintenance burden is on other implementations.
> 
> It's not a maintenance burden (at least not in the sense in which
> I understand the word "maintenance" - as an ongoing effort). When
> they implement it once, the implementation can likely stay forever,
> unmodified.

Ok, burden rather than "maintenance" burden.

> 
>> Even if there is
>> no maintenance burden for CPython having useless methods simply
>> because  it is less effort to leave them in place creates work for
>> new implementations wanting to be fully compatible.
> 
> That's true.
> 
> However, that alone is not enough reason to remove the feature, IMO.
> The effort that is saved is not only on the developers of CPython,
> but also on users of the feature. My claim is that for any little-used
> feature, removing it costs more time world-wide than re-implementing
> it in 10 alternative Python implementations (with the number 10 drawn
> out of blue air), because of the cost of changing the applications that
> actually do use the feature.
> 

Which applications? I'm not sure the number of applications using str.swapcase 
gets even as high as ten.

> With the switch to Python 3, there would have been a chance to remove
> little-used features. IMO, the next such chance is with Python 4.
> It could be useful to start collecting little-used features that might
> be removed with Python 4 - which I don't expect until 2020.

We still have our standard deprecation policy that we can follow in Python 3. 
We don't have to wait until Python 4 to remove things. Changing semantics or 
syntax is harder because you can't really deprecate. Just removing methods is 
straightforward.
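
For illustration, the deprecation step for a method usually amounts to emitting a DeprecationWarning for a release or two before removal. The sketch below is not CPython's implementation; `LegacyStr` is a hypothetical str subclass used only to show the shape of that step.

```python
import warnings


class LegacyStr(str):
    """Hypothetical str subclass, used only to illustrate a deprecation step."""

    def swapcase(self):
        # Warn the caller (stacklevel=2 points at the calling code),
        # then fall back to the existing behaviour.
        warnings.warn(
            "swapcase() is deprecated and may be removed in a future release",
            DeprecationWarning,
            stacklevel=2,
        )
        return super().swapcase()
```

Callers keep working during the deprecation period and only see the warning if they have warnings enabled.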

Michael

> 
> Regards,
> Martin
> 




--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html






Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Martin v. Löwis

>> Perhaps I missed something early on, but why are we proposing
>> removing a function which (presumably) is stable and tested and
>> works and is not broken? What maintenance is needed here?
> 
> 
> The maintenance burden is on other implementations.

It's not a maintenance burden (at least not in the sense in which
I understand the word "maintenance" - as an ongoing effort). When
they implement it once, the implementation can likely stay forever,
unmodified.

> Even if there is
> no maintenance burden for CPython having useless methods simply
> because  it is less effort to leave them in place creates work for
> new implementations wanting to be fully compatible.

That's true.

However, that alone is not enough reason to remove the feature, IMO.
The effort that is saved is not only on the developers of CPython,
but also on users of the feature. My claim is that for any little-used
feature, removing it costs more time world-wide than re-implementing
it in 10 alternative Python implementations (with the number 10 drawn
out of blue air), because of the cost of changing the applications that
actually do use the feature.

With the switch to Python 3, there would have been a chance to remove
little-used features. IMO, the next such chance is with Python 4.
It could be useful to start collecting little-used features that might
be removed with Python 4 - which I don't expect until 2020.

Regards,
Martin

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Barry Warsaw

On Sep 06, 2011, at 03:42 PM, Fred Drake wrote:

>On Tue, Sep 6, 2011 at 3:36 PM, Steven D'Aprano  wrote:
>> pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR
>> APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS.
>
>There's a better solution to that, but the caps lock lobby has a stranglehold
>on keyboard manufacturers.

Fight The Man with xmodmap!

-Barry

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Fred Drake

On Tue, Sep 6, 2011 at 3:36 PM, Steven D'Aprano  wrote:
> pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR
> APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS.

There's a better solution to that, but the caps lock lobby has a stranglehold
on keyboard manufacturers.


-- 
Fred L. Drake, Jr.    
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Michael Foord

On 6 Sep 2011, at 20:36, Steven D'Aprano wrote:
> Terry Reedy wrote:
>> On 9/6/2011 12:58 PM, Tres Seaver wrote:
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA1
>>> 
>>> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote:
>>>> Joao S. O. Bueno writes:
>>>>
>>>>> Removing it would mean explicitly "batteries removal".
>>>>
>>>> That's what we usually do with a dead battery, no?
>>> 
>>> Normally one "replaces" dead batteries. :)
>> Not if it is dead and leaking because the device has been unused for years.
> 
> 
> Can we please not make decisions about what code should be removed based on 
> dodgy analogies? :)
> 
> Perhaps I missed something early on, but why are we proposing removing a 
> function which (presumably) is stable and tested and works and is not broken? 
> What maintenance is needed here?


The maintenance burden is on other implementations. Even if there is no 
maintenance burden for CPython, having useless methods simply because it is 
less effort to leave them in place creates work for new implementations wanting 
to be fully compatible. 

> 
> 
> [...]
>> If k is lowercase, .lower() is redundant and k[0].swapcase()+k[1:].lower() 
>> == k.title(). 
> 
> Not so.
> 
> >>> k = 'aaaa bbbb'
> >>> k.title()
> 'Aaaa Bbbb'
> >>> k[0].swapcase()+k[1:].lower()
> 'Aaaa bbbb'
> 
> 
>> If k is uppercase, previous .upper() is redundant. If k is mixed case, code 
>> may have problems.
> 
> "May" have problems?
> 
> 
> pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR 
> APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS.


Have you ever used str.swapcase for that purpose?

Michael


--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html







Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Steven D'Aprano

Terry Reedy wrote:
> On 9/6/2011 12:58 PM, Tres Seaver wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>>
>> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote:
>>> Joao S. O. Bueno writes:
>>>
>>>> Removing it would mean explicitly "batteries removal".
>>>
>>> That's what we usually do with a dead battery, no?
>>
>> Normally one "replaces" dead batteries. :)
> Not if it is dead and leaking because the device has been unused for years.

Can we please not make decisions about what code should be removed based 
on dodgy analogies? :)

Perhaps I missed something early on, but why are we proposing removing a 
function which (presumably) is stable and tested and works and is not 
broken? What maintenance is needed here?

[...]
> If k is lowercase, .lower() is redundant and 
> k[0].swapcase()+k[1:].lower() == k.title(). 

Not so.

>>> k = 'aaaa bbbb'
>>> k.title()
'Aaaa Bbbb'
>>> k[0].swapcase()+k[1:].lower()
'Aaaa bbbb'

> If k is uppercase, previous 
> .upper() is redundant. If k is mixed case, code may have problems.

"May" have problems?

pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT 
EDITOR APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS.

--
Steven

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Terry Reedy


On 9/6/2011 12:58 PM, Tres Seaver wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote:
>> Joao S. O. Bueno writes:
>>
>>> Removing it would mean explicitly "batteries removal".
>>
>> That's what we usually do with a dead battery, no?
>
> Normally one "replaces" dead batteries. :)

Not if it is dead and leaking because the device has been unused for years.

https://www.google.com/codesearch#search/&q=lang:^python$%20swapcase%20case:yes&type=cs

returns a mere 300 hits. At least half are definitions of the function, 
or tests thereof, or inclusions in lists. Some actual uses:


1. http://pytof.googlecode.com/svn/trunk/pytof/utils.py
def ListCurrentDirFileFromExt(ext, path):
    """ list file matching extension from a list
    in the current directory
    emulate a `ls *.{(',').join(ext)` with ext in both upper and downcase}"""

    import glob
    extfiles = []
    for e in ext:
        extfiles.extend(glob.glob(join(path,'*' + e)))
        extfiles.extend(glob.glob(join(path,'*' + e.swapcase())))

If e is all upper or lower, using e.upper() and e.lower() will do the same. 
If e is mixed, using .upper and .lower is required to fulfill the spec. 
On *nix, where matching of letters is case sensitive, both will fail 
with '.Jpg'. On Windows, where letter matching ignores case, the above 
code will list everything twice.
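
One way to get the behaviour the docstring asks for without swapcase is to compare extensions case-insensitively and collect each file once. This is a minimal sketch, not the pytof code: the helper name `list_files_by_ext` is hypothetical, and it assumes extensions are passed with their leading dot.

```python
import os


def list_files_by_ext(exts, path):
    """Return files in `path` whose extension matches any of `exts`,
    ignoring case. Hypothetical stand-in for the pytof helper."""
    wanted = {e.lower() for e in exts}
    matches = []
    for name in sorted(os.listdir(path)):
        # splitext keeps the leading dot: 'photo.Jpg' -> ('photo', '.Jpg')
        _, ext = os.path.splitext(name)
        if ext.lower() in wanted:
            matches.append(os.path.join(path, name))
    return matches
```

Because each directory entry is visited once, '.Jpg' is matched on case-sensitive systems and nothing is listed twice on case-insensitive ones.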


2. http://ydict.googlecode.com/svn/trunk/ydict
k is random word from database.

result.replace(k, "").replace(k.upper(), 
"").replace(k[0].swapcase()+k[1:].lower(),"")


If k is lowercase, .lower() is redundant and 
k[0].swapcase()+k[1:].lower() == k.title(). If k is uppercase, previous 
.upper() is redundant. If k is mixed case, code may have problems.
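
A quick check of how that expression compares with the built-in alternatives. This is only a sketch: `swap_first` is a hypothetical name wrapping the ydict expression, and the sample words are made up. For a single lowercase word it agrees with .title(); for multi-word input it matches .capitalize() instead, and for a word that already starts with a capital it lowercases the first letter.

```python
def swap_first(k):
    # The expression used in ydict, wrapped in a (hypothetical) helper.
    return k[0].swapcase() + k[1:].lower()


for k in ("word", "two words", "Word"):
    # Show the ydict expression next to title() and capitalize().
    print(repr(k), repr(swap_first(k)), repr(k.title()), repr(k.capitalize()))
```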


3. http://migrid.googlecode.com/svn/trunk/mig/sftp-mount/migaccess.py

#This is how we could add stub extended attribute handlers...
#(We can't have ones which aptly delegate requests to the underlying fs
#because Python lacks a standard xattr interface.)
#
#def getxattr(self, path, name, size):
#    val = name.swapcase() + '@' + path
#    if size == 0:
#        # We are asked for size of the value.
#        return len(val)
#    return val

This is not actually used. Passing a name with all cases swapped from 
what they should be is a bit strange.


4.
elif char >= 'A' and char <= 'Z':
    element = element + char.swapcase()

uppercasechar.swapcase() == uppercasechar.lower()
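
The equivalence can be checked directly for the range the snippet tests ('A' through 'Z'); the variable name `agree` below is just for illustration.

```python
import string

# For every ASCII uppercase letter, swapcase() and lower() give the same result.
agree = all(c.swapcase() == c.lower() for c in string.ascii_uppercase)
print(agree)
```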


My perusal of the first 70 of 300 hits suggests that .swapcase is an 
attractive nuisance or redundant rather than actually useful.


--
Terry Jan Reedy


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Tres Seaver

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote:
> Joao S. O. Bueno writes:
> 
>> Removing it would mean explicitly "batteries removal".
> 
> That's what we usually do with a dead battery, no?

Normally one "replaces" dead batteries. :)



Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"http://palladion.com


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Stephen J. Turnbull

Joao S. O. Bueno writes:

 > Removing it would mean explicitly "batteries removal".

That's what we usually do with a dead battery, no?

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Joao S. O. Bueno

On Mon, Sep 5, 2011 at 8:56 AM, Michael Foord  wrote:
> Hey all,
> A while ago there was a discussion of the value of apis like str.swapcase,
> and it was suggested that even though it was acknowledged to be useless the
> effort of deprecating and removing it was thought to be more than the value
> in removing it.
> Earlier this year I was at a pypy sprint helping to work on Python 2.7
> compatibility. The bytearray type has much of the string interface,
> including swapcase… So there was effort to implement this method with the
> correct semantics for pypy. Doubtless the same has been true for IronPython,
> and will also be true for Jython.
> Whilst it is too late for Python 2.x, it *is* (in my opinion) worth removing
> unused and unneeded APIs. Even if the effort to remove them is more than any
> effort saved on the part of users it helps other implementations down the
> road that no longer need to provide these APIs.
> All the best,
> Michael Foord
>

On the other hand,
for any users wanting to use this in the future, if it is not there,
they'd have to implement the logic for themselves. If it is a "burden"
for someone in a sprint, looking at other implementations, and with
all the unicode knowledge/documentation around, it would be pretty
much impossible for a casual user to do correctly. Removing it would
mean explicitly "batteries removal".

If you get some traction on that, at least consider moving it to a
pure python function on the string module.
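
As a rough idea of what such a pure Python function might look like, here is a per-character sketch. The function name and its placement are assumptions, and it is only an approximation: the built-in may differ for context-sensitive case mappings, which is the kind of detail a casual reimplementation tends to miss.

```python
def swapcase(s):
    """Per-character approximation of str.swapcase (hypothetical helper)."""
    out = []
    for ch in s:
        if ch.isupper():
            out.append(ch.lower())
        elif ch.islower():
            out.append(ch.upper())
        else:
            # Digits, punctuation and uncased characters pass through.
            out.append(ch)
    return "".join(out)
```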


  js
 -><-

> --
> http://www.voidspace.org.uk/
>
> May you do good and not evil
> May you find forgiveness for yourself and forgive others
> May you share freely, never taking more than you give.
> -- the sqlite blessing http://www.sqlite.org/different.html
>

Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-06 Thread Antoine Pitrou

Michael Foord writes:
> 
> Earlier this year I was at a pypy sprint helping to work on Python 2.7
> compatibility. The bytearray type has much of the string interface, including
> swapcase… So there was effort to implement this method with the correct
> semantics for pypy. Doubtless the same has been true for IronPython, and will
> also be true for Jython.

While I haven't used swapcase() a single time, I doubt there is much difficulty
in implementing pure ASCII semantics, is there?
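
To give a sense of scale, a pure-ASCII version for bytes-like objects fits in a few lines. This is a sketch rather than any implementation's actual code, and `ascii_swapcase` is a hypothetical name.

```python
def ascii_swapcase(data):
    """Swap the case of ASCII letters in a bytes-like object.

    Non-letter bytes (including everything >= 0x80) pass through unchanged.
    """
    out = bytearray(data)
    for i, b in enumerate(out):
        if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:
            # Bit 0x20 is the only difference between 'A'..'Z' and 'a'..'z'.
            out[i] = b ^ 0x20
    return out
```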

Regards

Antoine.



[Python-Dev] Maintenance burden of str.swapcase

2011-09-05 Thread Michael Foord

Hey all,

A while ago there was a discussion of the value of APIs like str.swapcase, and 
it was suggested that, even though it was acknowledged to be useless, the effort 
of deprecating and removing it was greater than the value of removing it. 

Earlier this year I was at a pypy sprint helping to work on Python 2.7 
compatibility. The bytearray type has much of the string interface, including 
swapcase… So there was effort to implement this method with the correct 
semantics for pypy. Doubtless the same has been true for IronPython, and will 
also be true for Jython.

Whilst it is too late for Python 2.x, it *is* (in my opinion) worth removing 
unused and unneeded APIs. Even if the effort to remove them is more than any 
effort saved on the part of users it helps other implementations down the road 
that no longer need to provide these APIs.

All the best,

Michael Foord

-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html