Re: [Python-Dev] email package status in 3.X
On Fri, 18 Jun 2010 18:52:45 -, l...@rmi.net wrote:
> What I'm suggesting is that extreme caution be exercised from this point forward with all things 3.X-related. Whether you wish to accept this or not, 3.X has a negative image to many. This suggestion specifically includes not abandoning current 3.X email package users as a case in point. Ripping the rug out from new 3.X users after they took the time to port seems like it may be just enough to tip the scales altogether.

Catching up on my python-dev email, I just want to clarify this with respect to email. (1) I suspect that the new API will be enough of a carrot that they won't mind converting to it, BUT, (2) the plan is to provide a compatibility API that will fully support the current Python3 email5 API (but with fewer bugs in areas such as header folding and unfolding).

-- R. David Murray www.bitdance.com

___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
Guido van Rossum wrote:
> On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote:
>> Any turdiness (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) "build it and they will come" optimism about adoption rates.
>
> FWIW, my optimism is *not* waning. I think it's good that we're having this discussion and I expect something useful will come out of it; I also expect in general that the (admittedly serious) problem of having to port all dependencies will be solved in the next few years. Not by magic, but because many people are taking small steps in the right direction, and there will be light eventually. In the mean time I don't blame anyone for sticking with 2.x or being too busy to help port stuff to 3.x. Python 3 has been a long time in the making -- it will be a bit longer still, which was expected.

+1

The important thing is to avoid bigotry and FUD, and deal with things the way they are. The #python IRC team have just helped us make a major step forward. This won't be a campaign with a victorious charge over some imaginary finish line.

regards
 Steve
-- Steve Holden, Holden Web LLC http://www.holdenweb.com/
Re: [Python-Dev] email package status in 3.X
On Jun 23, 2010, at 8:17 AM, Steve Holden wrote:
> Guido van Rossum wrote:
>> On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote:
>>> Any turdiness (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) "build it and they will come" optimism about adoption rates.
>>
>> FWIW, my optimism is *not* waning. I think it's good that we're having this discussion and I expect something useful will come out of it; I also expect in general that the (admittedly serious) problem of having to port all dependencies will be solved in the next few years. Not by magic, but because many people are taking small steps in the right direction, and there will be light eventually. In the mean time I don't blame anyone for sticking with 2.x or being too busy to help port stuff to 3.x. Python 3 has been a long time in the making -- it will be a bit longer still, which was expected.
>
> +1
>
> The important thing is to avoid bigotry and FUD, and deal with things the way they are. The #python IRC team have just helped us make a major step forward. This won't be a campaign with a victorious charge over some imaginary finish line.

For sure. I don't speak for Tres, but I don't think he was talking about optimism about *adoption*, overall, but optimism about adoption *rates*. And I don't think he was talking about it coming from Guido :).

There has definitely been some irrational exuberance from some quarters. The form it usually takes is someone making a blog post which assumes, because the author could port their smallish library or application without too much hassle, that Python 2.x is already dead and everyone should be off of it in a couple of weeks. I've never heard this position from the core team or any official communication or documentation.

Far from it: the realistic attitude that the Python 3 migration is something that will take a while has significantly reduced my own concerns. Even the aforementioned blog posts have been encouraging in some ways, because a lot of people are reporting surprisingly easy transitions.
Re: [Python-Dev] email package status in 3.X
Glyph Lefkowitz wrote:
> I don't speak for Tres, but I don't think he was talking about optimism about *adoption*, overall, but optimism about adoption *rates*. And I don't think he was talking about it coming from Guido :).

You channel me correctly here. In particular, the phrase "build it and they will come" was meant to address the idea that the only thing needed to drive adoption was the release of the new, shiny Python3. That particular bit of optimism is what I meant to describe as waning: the community on the whole seems to be more realistic now than two or three years ago about the kind of extra effort required from both core developers and from existing Python 2 folks to get to Python 3.

> There has definitely been some irrational exuberance from some quarters. The form it usually takes is someone making a blog post which assumes, because the author could port their smallish library or application without too much hassle, that Python 2.x is already dead and everyone should be off of it in a couple of weeks. I've never heard this position from the core team or any official communication or documentation.
>
> Far from it: the realistic attitude that the Python 3 migration is something that will take a while has significantly reduced my own concerns. Even the aforementioned blog posts have been encouraging in some ways, because a lot of people are reporting surprisingly easy transitions.

Indeed.

Tres.
-- Tres Seaver tsea...@palladion.com Palladion Software, "Excellence by Design", http://palladion.com
Re: [Python-Dev] email package status in 3.X
Michael Urman writes:
> It is somewhat troublesome that there doesn't appear to be an obvious built-in idempotent-when-possible function that gives back the provided bytes/str,

If you want something idempotent, it's already the case that bytes(b'abc') == b'abc'. What might be desirable is to make bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII (or maybe ISO 8859/1).

Unfortunately, str(b'abc') already does work, but

    st...@uwakimon ~ $ python3.1
    Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
    [GCC 4.3.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> str(b'abc')
    "b'abc'"

Oops. You can see why that probably should be the case.
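The behaviour described in this message can be checked directly. The following is a minimal sketch for a current Python 3 interpreter; the variable names are illustrative and do not come from the thread.

```python
data = b'abc'

# bytes() passes an existing bytes object through unchanged.
passthrough = bytes(data)

# One-argument str() never decodes; it returns the printable repr.
printable = str(data)

# Decoding only happens when a codec is named explicitly.
decoded = str(data, 'ascii')

# bytes() refuses a str unless an encoding is supplied.
try:
    bytes('abc')
    bytes_from_str_failed = False
except TypeError:
    bytes_from_str_failed = True

print(repr(passthrough), repr(printable), repr(decoded), bytes_from_str_failed)
```

Note that running the interpreter with the `-b` option turns the one-argument `str(data)` call into a warning, which is one way to find places where the repr is produced by accident.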
Re: [Python-Dev] email package status in 3.X
P.J. Eby writes: I know, it's a hard thing to wrap one's head around, since on the surface it sounds like unicode is the programmer's savior. I don't need to wrap my head around it. It's been deeply embedded, point first, and the nasty barbs ensure that I have no desire to pull it back out. To wit, I've been dealing with Japanese encoding issues on a daily basis for 20 years, and I'm well aware that programmers have several good reasons (and a lot more bad ones) for avoiding them, and even for avoiding Unicode when they must deal with encodings at all. I don't think any of the good reasons have been offered here yet, that's all. Unfortunately, real-world text data exists which cannot be safely roundtripped to unicode, and must be handled in bytes with encoding form for certain operations. Or Unicode with encoding form. See below for why this makes sense in the context of Python. I personally do not have to deal with this *particular* use case any more -- I haven't been at NTT/Verio for six years now. As mentioned, I have a bit of understanding of the specific problems of Japanese-language computing. In particular, roundtripping Japanese from *any* encoding to *any other* encoding is problematic, because the national standards provide a proper subset of the repertoire actually used by the Japanese people. (Even JIS X 0213.) My current needs are simpler, thank goodness. ;-) However, they *do* involve situations where I'm dealing with *other* encoding-restricted legacy systems, such as software for interfacing with the US Postal Service that only works with a restricted subset of latin1, while receiving mangled ASCII from an ecommerce provider, and storing things in what's effectively a latin-1 database. Yes, I know of similar issues in other applications. For example, TeX error messages do not respect UTF-8 character boundaries, so Emacs has to handle them specially (basically a mechanism similar in spirit to PEP 383 is used). 
Being able to easily assert what kind of bytes I've got would actually let me catch errors sooner, *if* those assertions were being checked when different kinds of strings or bytes were being combined. i.e., at coercion time). I see that this would make life a little easier for you in maintaining without refactoring. I'd say it's a kludge, but without a full list of requirements I'm in no position to claim any authority wink. Eg, for a non-kludgey suggestion, how about defining a codec which takes Latin-1 bytes, checks (with error on failure) for the restricted subset, and converts to str? Then you can manipulate these things as str with abandon internally. Finally you get another check in the outgoing codec which converts from str to effective Latin-1 bytes, however that is defined. But OK, maybe I'm just being naive. You need this unlovely artifice so you can put in asserts in appropriate places. Now, does it belong in the stdlib? It seems to me that in the case of Japanese roundtripping, *most* of the time encoding back to a standard Japanese encoding will work. If you run into one of the problematic characters that JIS doesn't allow but Japanese like to use because they prefer the glyph to the JIS-standard glyph, you get an occasional error on encoding to a standard Japanese encoding, which you handle specially with a database of such characters. Knowing the specific encoding originally used *normally does not help unless you're replying to that person and **only** that person*, because the extended repertoires vary widely and the only standard is Japanese. I conclude ebytes does *no* good here. For the ecommerce/USPS case, well, actually you need special-purpose encodings anyway (ISTM). 'latin-1' loses, the USPS is allergic to some valid 'latin-1' characters. 'ascii' loses, apparently you need some of the Latin-1 repertoire, and anyway AIUI the ecommerce provider munges the ASCII. So what does ebytes actually buy you here, unless you write the codecs? 
If you've got the codecs, what additional benefit do you get from ebytes? Note that you would *also* need to do explicit transcoding anyway if you were dealing with Japan Post instead of the USPS, although I grant your code is probably general enough to deal with Deutsche Telecom (but the German equivalent of your ecommerce provider probably has its own ways of munging Latin-1). I conclude that there may be genuine benefits to ebytes here, but they're probably not general enough to put in the stdlib (or the Python language). Which works if and only if your outputs are truly unicode-able. With PEP 383, they always are, as long as you allow Unicode to be decoded to the same garbage your bytes-based program would have produced anyway. If you work with legacy systems (e.g. those Asian email clients and US postal software), you are really working with a *character set*, not unicode, I think you're missing something. Namely, Unicode is a standard
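The "check on the way in, check on the way out" suggestion made earlier in this message might look roughly like the sketch below. It uses two plain functions rather than a registered codec, and the `ALLOWED` whitelist, the function names and the sample address are all hypothetical: the thread does not say which Latin-1 subset the postal software accepts.

```python
import string

# Hypothetical whitelist; the real restricted subset is not given in the thread.
ALLOWED = frozenset(string.ascii_letters + string.digits + " .,-#/'" + "\xe9\xf1\xfc")


def decode_restricted(raw, allowed=ALLOWED):
    """Decode Latin-1 bytes to str, rejecting characters outside the subset."""
    text = raw.decode('latin-1')  # Latin-1 decoding itself never fails
    rejected = sorted(set(text) - allowed)
    if rejected:
        raise ValueError('outside the restricted subset: %r' % rejected)
    return text


def encode_restricted(text, allowed=ALLOWED):
    """Encode str to Latin-1 bytes, applying the same subset check on output."""
    rejected = sorted(set(text) - allowed)
    if rejected:
        raise ValueError('outside the restricted subset: %r' % rejected)
    return text.encode('latin-1')


# Between the two calls the data is an ordinary str.
address = decode_restricted(b'12 Rue Caf\xe9')
wire = encode_restricted(address)
print(address, wire)
```

With this shape the error surfaces at the boundary where bad data enters or leaves, which is the early-detection property the tagged-bytes proposal was after.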
Re: [Python-Dev] email package status in 3.X
On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull step...@xemacs.org wrote:
>> Which works if and only if your outputs are truly unicode-able.
>
> With PEP 383, they always are, as long as you allow Unicode to be decoded to the same garbage your bytes-based program would have produced anyway.

Could it be that part of the problem here is that we need to better advertise errors='surrogateescape' as a mechanism for decoding incorrectly encoded data according to a nominal codec without throwing UnicodeDecode and UnicodeEncode errors all over the place?

Currently it only garners a mention in the docs in the context of the os module, the list of error handlers in the codecs module and as a default error handler argument in the tarfile module.

Cheers,
Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
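As an illustration of the error handler mentioned above, here is a minimal sketch of a lossless round trip through str for bytes that are not valid in the nominal codec. The sample bytes are made up for the example.

```python
# Mostly ASCII, with two bytes that are not valid UTF-8.
raw = b'caf\xe9 \xff'

# Undecodable bytes become lone surrogates instead of raising an error.
text = raw.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode('utf-8', errors='surrogateescape')

# A strict encode still refuses the escaped characters.
try:
    text.encode('utf-8')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(ascii(text), restored == raw, strict_failed)
```

The strict failure is the useful part: the garbage stays marked as garbage until code explicitly opts in to writing it back out.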
Re: [Python-Dev] email package status in 3.X
Nick Coghlan writes:
> On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull step...@xemacs.org wrote:
>>> Which works if and only if your outputs are truly unicode-able.
>>
>> With PEP 383, they always are, as long as you allow Unicode to be decoded to the same garbage your bytes-based program would have produced anyway.
>
> Could it be that part of the problem here is that we need to better advertise errors='surrogateescape' as a mechanism for decoding incorrectly encoded data according to a nominal codec without throwing UnicodeDecode and UnicodeEncode errors all over the place?

Yes, I think that would make the "use str internally to urllib" strategy a lot more palatable. But it still needs to be combined with a program architecture of decode-process-encode, which might require substantial refactoring for some existing modules.
Re: [Python-Dev] email package status in 3.X
On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull step...@xemacs.org wrote:
> Michael Urman writes:
>> It is somewhat troublesome that there doesn't appear to be an obvious built-in idempotent-when-possible function that gives back the provided bytes/str,
>
> If you want something idempotent, it's already the case that bytes(b'abc') = b'abc'. What might be desirable is to make bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII (or maybe ISO 8859/1).

By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding, errors) that would pass an instance of bytes through, or encode an instance of str. And of course a to_str that performs similarly, passing str through and decoding bytes. While bytes(b'abc') will give me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me the b'abc' I want to see.

These are trivial functions; I just don't fully understand why the capability isn't baked in. A one argument call is idempotent capable; a two argument call isn't as it only converts.

It's not a completely made-up requirement either. A cross-platform piece of software may need to present to a user items that are sometimes str and sometimes bytes - particularly filenames.

> Unfortunately, str(b'abc') already does work, but
>
>     st...@uwakimon ~ $ python3.1
>     Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
>     [GCC 4.3.4] on linux2
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> str(b'abc')
>     "b'abc'"
>
> Oops. You can see why that probably should be the case

Sure, and I love having this there for debugging. But this is hardly good enough for presenting to a user once you leave ascii.

    >>> u = '日本語'
    >>> sjis = bytes(u, 'shift-jis')
    >>> utf8 = bytes(u, 'utf-8')
    >>> str(sjis), str(utf8)
    ("b'\\x93\\xfa\\x96{\\x8c\\xea'", "b'\\xe6\\x97\\xa5\\xe6\\x9c\\xac\\xe8\\xaa\\x9e'")

When I happen to know the encoding, I can reverse it much more cleanly.

    >>> str(sjis, 'shift-jis'), str(utf8, 'utf-8')
    ('日本語', '日本語')

But I can't mix this approach with str instances without writing a different invocation.

    >>> str(u, 'argh')
    TypeError: decoding str is not supported

-- Michael Urman
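The two helpers described in this message are short enough to sketch. The names to_bytes and to_str and the (value, encoding, errors) argument order follow the message; the default values and the TypeError for other input types are assumptions added here, not part of any stdlib API.

```python
def to_bytes(value, encoding='utf-8', errors='strict'):
    """Pass bytes through unchanged; encode str with the given codec."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)


def to_str(value, encoding='utf-8', errors='strict'):
    """Pass str through unchanged; decode bytes with the given codec."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)


u = '日本語'
sjis = to_bytes(u, 'shift-jis')

# The same call works whether the input is already converted or not.
print(to_str(sjis, 'shift-jis'), to_str(u, 'shift-jis'))
print(to_bytes(sjis, 'shift-jis') == sjis)
```

The pass-through branch silently trusts that bytes input is already in the named encoding, which is the kind of hidden assumption that makes some people wary of building such helpers in.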
Re: [Python-Dev] email package status in 3.X
Jesse Noller wrote: On Jun 19, 2010, at 10:13 AM, Tres Seaver tsea...@palladion.com wrote:

Nothing is set in stone; if something is incredibly painful, or worse yet broken, then someone needs to file a bug, bring it to this list, or bring up a patch.

Or walk away.

Ok. If you want.

I specifically said I *didn't* want to walk away. I'm pointing out that in the general case, the ordinary user who finds something incredibly painful or broken is far more likely to walk away from the platform than try to fix it, especially if there are available alternatives (e.g., Ruby, Python 2) where the pain level for that user's application is lower.

I guess "tutorial welcome", rather than "patch welcome" then ;)

The only folks who can write the tutorial are the ones who have already drunk the koolaid. Note that I've been making my living with Python for about twelve years now, and would *like* to use Python3, but can't, yet, and therefore haven't taken the first sip.

Why can't you? Is it a bug?

It's not *a* bug, it is that I do my day to day work on very large applications which depend on a large number of not-yet-ported libraries. This barrier is the negative network effect which is the whole point of this thread: there is nothing wrong with Python3 except that, to use it, I have to stop doing the work which pays to do an indeterminately-large amount of hobby work (of which I already do quite a lot).

Let's file it and fix it. Is it that you need a dependency ported?

I need dozens of them ported, and am working on some of them in the aforementioned copious spare time.

Cool - let's bring it up to the maintainers, or this list, or ask the PSF to push resources into helping port. Anything but nothing.

Nothing is the default: I am already successful with Python 2, and can't be successful with Python 3 (in the sense of delivering timely, cost-effective solutions to my customers) until *all* those dependencies are ported and stable there.
If what you're saying is that python 3 is a completely unsuitable platform, well, then yeah - we can all fix it or walk away.

I didn't say that: I said that Python 3 is unsuitable *today* for the work I'm doing, and that the relative wins it provides over Python 2 are dwarfed by the effort required to do all those ports myself.

IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious ways to do it, but, as per the Zen of Python, that way may not be obvious at first unless you're Dutch. ;-)

OT: The Dutch smiley there doesn't actually help anything but undercut any point to having TOOOWTDI in the list at all.

What areas. We need specifics which can either be:
1. Shot down.
2. Turned into bugs, so they can be fixed.
3. Documented in the core documentation.

That's bloody ironic in a thread which had pointed at reasons why people are not even considering Py3 for their projects: those folks won't even find the issues due to the lack of confidence in the suitability of the platform.

What I saw was a thread about some issues in email, and cgi. We have some work being done to address the issue. This will help resolve some of the issues. If there are other issues, then we should step up and either help, or get out of the way. Arguing about the viability of a platform we knew would take a bit for adoption is silly and breeds ill will.

I'm not arguing about viability: there are obviously users for whom Python 3 is not only viable, but superior to Python 2. However, I am quite confident that many pro-Python 3 folks arguing here underestimate the scope of the issues which have generated the (self-fulfilling) "not yet" perception.

It's not a turd, and it's not hopeless, in fact rumor has it NumPy will be ported soon which is a major stepping stone.

Sure, for the (far from trivial) subset of the community doing numerical work.

The only way to counteract this meme that python 3 is horribly broken is to prove that it's not, fix bugs, and move on.
There's no point debating relative turdiness here. Any turdiness (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) build it and they will come optimism about adoption rates. Tres.
-- Tres Seaver tsea...@palladion.com Palladion Software
Re: [Python-Dev] email package status in 3.X
On Mon, Jun 21, 2010 at 10:28 PM, Stephen J. Turnbull step...@xemacs.org wrote:
> Michael Urman writes:
>> It is somewhat troublesome that there doesn't appear to be an obvious built-in idempotent-when-possible function that gives back the provided bytes/str,
>
> If you want something idempotent, it's already the case that bytes(b'abc') = b'abc'. What might be desirable is to make bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII (or maybe ISO 8859/1).

No, no, no! That's just what Python 2 did.

> Unfortunately, str(b'abc') already does work, but
>
>     st...@uwakimon ~ $ python3.1
>     Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
>     [GCC 4.3.4] on linux2
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> str(b'abc')
>     "b'abc'"
>
> Oops. You can see why that probably should be the case.

There is a near-contract that str() of pretty much anything returns a printable version of that thing.

-- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] email package status in 3.X
On 6/22/2010 9:24 AM, Michael Urman wrote:
> By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding, errors) that would pass an instance of bytes through, or encode an instance of str. And of course a to_str that performs similarly, passing str through and decoding bytes. While bytes(b'abc') will give me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me the b'abc' I want to see.
>
> These are trivial functions; I just don't fully understand why the capability isn't baked in.

Possible reasons: They are special purpose functions easily built on the basic functions provided. Fine for a 3rd party library. Most people do not need them. Some might be misled by them. As others have said, "Not every one-liner should be builtin".

-- Terry Jan Reedy
Re: [Python-Dev] email package status in 3.X
Tres, I am a Python3 enthusiast and realist. I did not expect major adoption for about 3 years (more optimistic than the 5 years of some). If you are feeling pressured to 'move' to Python3, it is not from me. I am sure you will do so on your own, perhaps even with enthusiasm, when it will be good for *you* to do so.

If someone wants to contribute while sticking to Python2, it's easy. The tracker has perhaps 2000 open 2.x issues, hundreds with no responses. If more Python2 people worked on making 2.7 as bug-free as possible, the developers would be freer to make 3.2 as good as possible (which is what *I* want).

The porting of numpy (which I suspect has gotten some urging) will not just benefit 'numerical' computing. For instance, there cannot be a 3.x version of pygame until there is a 3.x version of numpy, its main Python dependency. (The C Simple Directmedia Library it also wraps and builds upon does not care.)

-- Terry Jan Reedy
Re: [Python-Dev] email package status in 3.X
On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote:
> Any turdiness (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) "build it and they will come" optimism about adoption rates.

FWIW, my optimism is *not* waning. I think it's good that we're having this discussion and I expect something useful will come out of it; I also expect in general that the (admittedly serious) problem of having to port all dependencies will be solved in the next few years. Not by magic, but because many people are taking small steps in the right direction, and there will be light eventually. In the mean time I don't blame anyone for sticking with 2.x or being too busy to help port stuff to 3.x. Python 3 has been a long time in the making -- it will be a bit longer still, which was expected.

-- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] email package status in 3.X
On Tue, Jun 22, 2010 at 15:32, Terry Reedy tjre...@udel.edu wrote:
> On 6/22/2010 9:24 AM, Michael Urman wrote:
>> These are trivial functions; I just don't fully understand why the capability isn't baked in.
>
> Possible reasons: They are special purpose functions easily built on the basic functions provided. Fine for a 3rd party library. Most people do not need them. Some might be misled by them. As others have said, "Not every one-liner should be builtin".

Perhaps the two-argument constructions on bytes and str should have been removed in favor of the .decode and .encode methods on their respective classes. Or vice versa; I don't have the history to know in which order they originated, and which is theoretically preferred these days.

-- Michael Urman
Re: [Python-Dev] email package status in 3.X
On Mon, Jun 21, 2010 at 11:58 AM, P.J. Eby p...@telecommunity.com wrote:
> At 08:08 AM 6/21/2010 +1000, Nick Coghlan wrote:
>> Perhaps if people could identify which specific string methods are causing problems?
>
> __getitem__(int) returns an integer rather than a bytestring, so anything that manipulates individual characters can't be given bytes and have it work.

It can if you use length one slices rather than simple indexing. Depending on the details, such algorithms may still fail for multi-byte codecs though.

> That was one of the key differences I had in mind for a bstr type, apart from designing it to coerce normal strings to bstrs in cross-type operations, and to allow O(1) conversion to/from bytes.

Erk, that just sounds like a recipe for recreating the problems 2.x has in a new form.

> Another randomly chosen byte/string incompatibility (Python 3.1; I don't have 3.2 handy at the moment):
>
>     >>> os.path.join(b'x','y')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "c:\Python31\lib\ntpath.py", line 161, in join
>         if b[:1] in seps:
>     TypeError: Type str doesn't support the buffer API
>     >>> os.path.join('x',b'y')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "c:\Python31\lib\ntpath.py", line 161, in join
>         if b[:1] in seps:
>     TypeError: 'in <string>' requires string as left operand, not bytes
>
> Ironically, it seems to me that in trying to make the type distinction more rigid, Py3K fails in this area precisely because it is not a rigidly typed language in the Java or Haskell sense: i.e., os.path.join doesn't say, "I need two stringlike objects of the *same type*", not even in its docstring.

I believe it actually needs the objects to be compatible with the type of os.sep, rather than just with each other (i.e. the type restrictions on os.path.join are the same as those on os.sep.join, even though the join algorithm itself is slightly different). This restriction should be mentioned in the Py3k docstring and docs for os.path.join - if it isn't, that would be a doc bug.

> At least in Java, you would either implement a path type with coercions from bytes and strings, or you'd have a class with overloaded methods for handling join operations on bytes and strings, respectively, thereby avoiding this whole mess.
>
> (Alas, this little example on the 'in' operator also shows that my bstr effort would probably fail anyway, because there's no '__rcontains__' (__lcontains__?) to allow it to override the str type's __contains__.)

OK, these examples convince me that the incompatibility problem is real. However, I don't think a bstr type can solve them even without the __rcontains__ problem - it would just recreate the pain that we already have in the 2.x world.

Something that may make sense to ease the porting process is for some of these "on the boundary" I/O related string manipulation functions (such as os.path.join) to grow encoding keyword-only arguments. The recommended approach would be to provide all strings, but bytes could also be accepted if an encoding was specified. (If you want to mix encodings - tough, do the decoding yourself).

For the idea of avoiding excess copying of bytes through multiple encoding/decoding calls... isn't that meant to be handled at an architectural level (i.e. decode once on the way in, encode once on the way out)? Optimising the single-byte codec case by minimising data copying (possibly through creative use of PEP 3118) may be something that we want to look at eventually, but it strikes me as something of a premature optimisation at this point in time (i.e. the old adage "first get it working, then get it working fast").

Cheers,
Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
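The two points made in this exchange, indexing versus length-one slicing and the same-type requirement of os.path.join, can be reproduced with a short sketch. It assumes a current Python 3; the exact error message differs from the 3.1 tracebacks quoted above, and os.fsdecode was added after this thread (Python 3.2), so it is shown only as one possible way to decode once at the boundary.

```python
import os
import os.path

data = b'abc'
as_int = data[0]      # indexing bytes yields an int
as_bytes = data[:1]   # a length-one slice stays bytes

# Joining works when every component has the same type.
joined_str = os.path.join('x', 'y')
joined_bytes = os.path.join(b'x', b'y')

# Mixing bytes and str components is rejected.
try:
    os.path.join(b'x', 'y')
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decode once on the way in, then work with str throughout.
normalised = os.path.join(os.fsdecode(b'x'), 'y')

print(as_int, as_bytes, joined_str, joined_bytes, mixed_ok, normalised)
```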
Re: [Python-Dev] email package status in 3.X
At 10:20 PM 6/21/2010 +1000, Nick Coghlan wrote: For the idea of avoiding excess copying of bytes through multiple encoding/decoding calls... isn't that meant to be handled at an architectural level (i.e. decode once on the way in, encode once on the way out)? Optimising the single-byte codec case by minimising data copying (possibly through creative use of PEP 3118) may be something that we want to look at eventually, but it strikes me as something of a premature optimisation at this point in time (i.e. the old adage first get it working, then get it working fast). The issue is, I'd like to have an idempotent incantation that I can use to make the inputs and outputs to stdlib functions behave in a type-safe manner with respect to bytes, in cases where bytes are really what I want operated on. Note too that this is an argument for symmetry in wrapping the inputs and outputs, so that the code doesn't have to know what it's dealing with! After all, right now, if a stdlib function might return bytes or unicode depending on runtime conditions, I can't even hardcode an .encode() call -- it would fail if the return type is a bytes. This basically goes against the tell, don't ask pattern, and the Pythonically idempotent approach. That is, Python builtins normally return you back the same thing if it's already what you want - int(someInt)- someInt, iter(someIter)-someIter, etc. Since this incantation may need to be used often, and in places that are not known to me in advance, I would like it to not impose new overhead in unexpected places. (i.e., the usual argument brought against making changes to the 'list' type that would change certain operations from O(1) to O(log something)). It's more about predictability, and having One *Obvious* Way To Do It, as opposed to several ways, which you need to think carefully about and restructure your entire architecture around if necessary. 
One obvious way means I can focus on the mechanical effort of porting *first*, without having to think. So, the performance issue isn't really about performance *per se*, so much as about the mental UI of the language. You could just as easily lie and tell me that your bstr implementation is O(1), and I would probably be happy and never notice, because the issue was never really about performance as such, but about having to *think* about it. (i.e., breaking flow.)

Really, the entire issue can presumably be dealt with by some series of incantations - it's just code after all. But having to sit and think about *every* situation where I'm dealing with bytes/unicode distinctions seems like a torture compared to being able to say, "okay, so when dealing with this sort of API and this sort of data, this is the One Obvious Way to do the conversions."

It's One Obvious Way that I want, but some people seem to be arguing that the One Obvious Way is to Think Carefully About It Every Time -- and that seems to violate the "Obvious" part, IMO. ;-)
Re: [Python-Dev] email package status in 3.X
On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
> Something that may make sense to ease the porting process is for some of these "on the boundary" I/O related string manipulation functions (such as os.path.join) to grow encoding keyword-only arguments. The recommended approach would be to provide all strings, but bytes could also be accepted if an encoding was specified. (If you want to mix encodings - tough, do the decoding yourself).

This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz for it.

Would it make sense to have "encoding-carrying" bytes and str types? Basically, I'm thinking of types (maybe even the current ones) that carry around a .encoding attribute so that they can be automatically encoded and decoded where necessary. This at least would simplify APIs that need to do the conversion.

By default, the .encoding attribute would be some marker to indicate "I have no idea, do it explicitly" and if you combine ebytes or estrs that have incompatible encodings, you'd either throw an exception or reset the .encoding to IAmConfuzzled.

But say you had an email header like:

=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=

And code like the following (made less crappy):

-snip snip-
class ebytes(bytes):
    encoding = 'ascii'

    def __str__(self):
        # Decode with the carried encoding and pass that encoding along.
        s = estr(self.decode(self.encoding))
        s.encoding = self.encoding
        return s


class estr(str):
    encoding = 'ascii'


# The payload of the header above, as raw euc-jp bytes.
s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
b = bytes(s, 'euc-jp')

eb = ebytes(b)
eb.encoding = 'euc-jp'
es = str(eb)
print(repr(eb), es, es.encoding)
-snip snip-

Running this you get:

b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド! euc-jp

Would it be feasible? Dunno. Would it help ease the bytes/str confusion? Dunno. But I think it would help make APIs easier to design and use because it would cut down on the encoding-keyword function signature infection.
-Barry
Re: [Python-Dev] email package status in 3.X
On Mon, Jun 21, 2010 at 09:51, P.J. Eby p...@telecommunity.com wrote:
> The issue is, I'd like to have an idempotent incantation that I can use to make the inputs and outputs to stdlib functions behave in a type-safe manner with respect to bytes, in cases where bytes are really what I want operated on.
>
> Note too that this is an argument for symmetry in wrapping the inputs and outputs, so that the code doesn't have to know what it's dealing with!

It is somewhat troublesome that there doesn't appear to be an obvious built-in idempotent-when-possible function that gives back the provided bytes/str, or converts to the requested type per the listed encoding (as of 3.1.2). Would it be useful to make the second versions of these work, or would that return us to the confusion of the 2.x era? On the other hand, since these are all TypeErrors instead of UnicodeErrors, it's an easy wrapper to write.

    >>> bytes('abc', 'latin-1')
    b'abc'
    >>> bytes(b'abc', 'latin-1')
    TypeError: encoding or errors without a string argument

    >>> str(b'abc', 'latin-1')
    'abc'
    >>> str('abc', 'latin-1')
    TypeError: decoding str is not supported

Interestingly the online docs for str say it can decode either a byte string or a character buffer, a term which doesn't yield a definition in a search; apparently either a string is not a character buffer, or the docs are incorrect.
http://docs.python.org/py3k/library/functions.html?highlight=str#str

However it looks like this is consistent with int.

    >>> int(4, 0)
    TypeError: int() can't convert non-string with explicit base

--
Michael Urman
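A minimal sketch of the "easy wrapper" mentioned above, for readers who want to see its shape. The names as_bytes and as_str are hypothetical (nothing by these names exists in the stdlib), and the sketch assumes that a value already of the requested type should be handed back untouched, without checking that it is valid in the named encoding.

```python
def as_bytes(value, encoding):
    """Hypothetical helper: return bytes unchanged, encode str."""
    if isinstance(value, bytes):
        return value                      # already bytes: idempotent
    if isinstance(value, str):
        return value.encode(encoding)     # may raise UnicodeEncodeError
    raise TypeError("expected str or bytes, got {}".format(type(value).__name__))


def as_str(value, encoding):
    """Hypothetical helper: return str unchanged, decode bytes."""
    if isinstance(value, str):
        return value                      # already str: idempotent
    if isinstance(value, bytes):
        return value.decode(encoding)     # may raise UnicodeDecodeError
    raise TypeError("expected str or bytes, got {}".format(type(value).__name__))
```

With these, as_bytes('abc', 'latin-1') and as_bytes(b'abc', 'latin-1') both give b'abc'. Whether silently passing through bytes of unknown provenance is a feature or a return to 2.x-style confusion is exactly the question raised in the message.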
Re: [Python-Dev] email package status in 3.X
On Mon, Jun 21, 2010 at 11:43:07AM -0400, Barry Warsaw wrote:
> On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
> > Something that may make sense to ease the porting process is for some of these "on the boundary" I/O related string manipulation functions (such as os.path.join) to grow encoding keyword-only arguments. The recommended approach would be to provide all strings, but bytes could also be accepted if an encoding was specified. (If you want to mix encodings - tough, do the decoding yourself).
>
> This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz for it.
>
> Would it make sense to have "encoding-carrying" bytes and str types? Basically, I'm thinking of types (maybe even the current ones) that carry around a .encoding attribute so that they can be automatically encoded and decoded where necessary. This at least would simplify APIs that need to do the conversion.
>
> By default, the .encoding attribute would be some marker to indicate "I have no idea, do it explicitly" and if you combine ebytes or estrs that have incompatible encodings, you'd either throw an exception or reset the .encoding to IAmConfuzzled.
>
> But say you had an email header like:
>
> =?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=
>
> And code like the following (made less crappy):
>
> -snip snip-
> class ebytes(bytes):
>     encoding = 'ascii'
>
>     def __str__(self):
>         s = estr(self.decode(self.encoding))
>         s.encoding = self.encoding
>         return s
>
>
> class estr(str):
>     encoding = 'ascii'
>
>
> s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
> b = bytes(s, 'euc-jp')
>
> eb = ebytes(b)
> eb.encoding = 'euc-jp'
> es = str(eb)
> print(repr(eb), es, es.encoding)
> -snip snip-
>
> Running this you get:
>
> b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド! euc-jp
>
> Would it be feasible? Dunno. Would it help ease the bytes/str confusion? Dunno. But I think it would help make APIs easier to design and use because it would cut down on the encoding-keyword function signature infection.
I like the idea of having encoding information carried with the data. I don't think that an ebytes type that can *optionally* have an encoding attribute makes the situation less confusing, though. To me the biggest problem with python-2.x's unicode/bytes handling was not that it threw exceptions but that it didn't always throw exceptions. You might test this in python2::

    t = u'cafe'
    function(t)

And say, "ah my code works". Then a user gives it this::

    t = u'café'
    function(t)

And get a unicode error because the function only works with unicode in the ascii range.

ebytes seems to have the same pitfall where the code path exercised by your tests could work with::

    eb = ebytes(b)
    eb.encoding = 'euc-jp'
    function(eb)

but the user exercises a code path that does this and fails::

    eb = ebytes(b)
    function(eb)

What do you think of making the encoding attribute a mandatory part of creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``).

-Toshio
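For illustration, here is one way the mandatory-encoding constructor could look. This is only a sketch under assumptions: ebytes is the hypothetical type from this thread, not an existing class, and the sketch does nothing beyond recording the encoding. The point it shows is that forgetting the encoding fails at construction time, in every code path, rather than only in the paths the tests happened to miss.

```python
class ebytes(bytes):
    """Hypothetical bytes subclass that cannot be built without an encoding."""

    def __new__(cls, data, encoding):
        # 'encoding' has no default, so ebytes(b) raises TypeError right away.
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self


raw = b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa'
eb = ebytes(raw, 'euc-jp')
print(eb.encoding)
```

Calling ebytes(raw) with no second argument raises a TypeError about the missing argument, which is the early, unconditional failure asked for above.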
Re: [Python-Dev] email package status in 3.X
At 11:43 AM 6/21/2010 -0400, Barry Warsaw wrote:
> On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
> > Something that may make sense to ease the porting process is for some of these "on the boundary" I/O related string manipulation functions (such as os.path.join) to grow encoding keyword-only arguments. The recommended approach would be to provide all strings, but bytes could also be accepted if an encoding was specified. (If you want to mix encodings - tough, do the decoding yourself).
>
> This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz for it. Would it make sense to have "encoding-carrying" bytes and str types?

It's not a stupid idea, and could potentially work. It also might have a better chance of being able to actually be *implemented* in 3.x than my idea.

> Basically, I'm thinking of types (maybe even the current ones) that carry around a .encoding attribute so that they can be automatically encoded and decoded where necessary. This at least would simplify APIs that need to do the conversion.

I'm not really sure how much use the encoding is on a unicode object - what would it actually mean?

Hm. I suppose it would effectively mean "this string can be represented in this encoding" -- which is useful, in that you could fail operations when combining with bytes of a different encoding.

Hm... no, in that case you should just encode the string to the bytes' encoding, and let that throw an error if it fails. So, really, there's no reason for a string to know its encoding. All you need is the bytes type to have an encoding attribute, and when doing mixed-type operations between bytes and strings, coerce to *bytes of the same encoding*.

However, if .encoding is None, then coercion would follow the same rules as now -- i.e., convert the bytes to unicode, assuming an ascii encoding.
(This would be different than setting an encoding of 'ascii', because in that case, it means you want cross-type operations to result in ascii bytes, rather than a unicode string, and to fail if the unicode part can't be encoded appropriately. The 'None' setting is effectively a nod to compatibility with prior 3.x versions, since I assume we can't just throw out the old coercion behavior.)

Then, a few more changes to the bytes type would round out the implementation:

* Allow .decode() to not specify an encoding, unless .encoding is None
* Add back in the missing string methods (e.g. .encode()), since you can transparently upgrade to a string
* Smart __str__, as shown in your proposal.

> Would it be feasible? Dunno.

Probably, although it might mean adding back in special cases that were previously taken out, and a few new ones.

> Would it help ease the bytes/str confusion? Dunno.

Not sure what confusion you mean -- Web-SIG and I at least are not confused about the difference between bytes and str, or we wouldn't be having an issue. ;-) Or maybe you mean the stdlib's API confusion? In which case, yes, definitely!

> But I think it would help make APIs easier to design and use because it would cut down on the encoding-keyword function signature infection.

Not only that, but I believe it would also retroactively make the stdlib's implementation of those APIs "correct" again, and give us One Obvious Way to work with bytes of a known encoding, while constraining any unicode that gets combined with those bytes to be validly encodable. It also gives you an idempotent constructor for bytes of a specified encoding, that can take either a bytes of unspecified encoding, a bytes of the correct encoding, or a string that can be encoded as such.

In short, +1. (I wish it were possible to go back and make bytes non-strings and have only this ebytes or bstr or whatever type have string methods, but I'm pretty sure that ship has already sailed.)
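The coercion rule described above (str combined with encoded bytes yields bytes of the same encoding, and fails if the str cannot be encoded) can be sketched for concatenation roughly as follows. This is an illustration only: ebytes is the hypothetical type from this thread, only + is handled, and rejecting plain bytes and mismatched encodings with TypeError is an assumption of this sketch rather than anything settled in the discussion.

```python
class ebytes(bytes):
    """Hypothetical encoding-carrying bytes; only '+' coercion is sketched."""

    def __new__(cls, data, encoding):
        if isinstance(data, str):
            data = data.encode(encoding)
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def _coerce(self, other):
        # Turn the other operand into raw bytes in *our* encoding, or fail.
        if isinstance(other, str):
            return other.encode(self.encoding)   # UnicodeEncodeError if impossible
        if isinstance(other, ebytes):
            if other.encoding != self.encoding:
                raise TypeError("cannot mix encodings %r and %r"
                                % (self.encoding, other.encoding))
            return bytes(other)
        # Assumption of this sketch: plain bytes carry no encoding, so refuse.
        raise TypeError("cannot combine ebytes with %s" % type(other).__name__)

    def __add__(self, other):
        return ebytes(bytes(self) + self._coerce(other), self.encoding)

    def __radd__(self, other):
        return ebytes(self._coerce(other) + bytes(self), self.encoding)


eb = ebytes('caf', 'latin-1')
joined = eb + 'é'        # str is encoded to latin-1; result is still ebytes
prefixed = 'x' + eb      # __radd__ handles a str on the left-hand side
```

Under these assumptions, eb + '\u30cf' raises UnicodeEncodeError at the point of mixing, since that character has no latin-1 representation.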
Re: [Python-Dev] email package status in 3.X
At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
> What do you think of making the encoding attribute a mandatory part of creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``).

As long as the coercion rules force str+ebytes (or str % ebytes, ebytes % str, etc.) to result in another ebytes (and fail if the str can't be encoded in the ebytes' encoding), I'm personally fine with it, although I really like the idea of tacking the encoding to bytes objects in the first place.

OTOH, one potential problem with having the encoding on the bytes object rather than the ebytes object is that then you can't easily take bytes from a socket and then say what encoding they are, without interfering with the sockets API (or whatever other place you get the bytes from).

So, on balance, making ebytes a separate type (perhaps one that's just a pointer to the bytes and a pointer to the encoding) would indeed make more sense. It having different coercion rules for interacting with strings would make more sense too in that case.

(The ideal, of course, would still be to not let bytes objects be stringlike at all, with only ebytes acting string-like. That way, you'd be forced to be explicit about your encoding when working with bytes, but all you'd need to do was make an ebytes call.)
Re: [Python-Dev] email package status in 3.X
On 6/21/2010 11:43 AM, Barry Warsaw wrote:
> This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz for it. Would it make sense to have "encoding-carrying" bytes and str types?

On 2009-11-5 I posted 'Add encoding attribute to bytes' to python-ideas. It was shot down at the time.

Terry Jan Reedy
Re: [Python-Dev] email package status in 3.X
P.J. Eby writes:

> Note too that this is an argument for symmetry in wrapping the inputs and outputs, so that the code doesn't have to know what it's dealing with!

and

> After all, right now, if a stdlib function might return bytes or unicode depending on runtime conditions, I can't even hardcode an .encode() call -- it would fail if the return type is a bytes.

I'm lost. What stdlib functions are you talking about whose return type depends on runtime conditions, and what runtime conditions? What do you mean by "wrapping"?

The only times I've run into str/bytes nondeterminacy is when I've mixed str/bytes myself, and passed them into functions that are type-identities (str -> str, bytes -> bytes), which then appear to give a nondeterministic result. It's a deterministic bug, though, always mine. <wink>

> It's One Obvious Way that I want, but some people seem to be arguing that the One Obvious Way is to Think Carefully About It Every Time -- and that seems to violate the "Obvious" part, IMO. ;-)

Nick alluded to the One Obvious Way as a change in architecture.

Specifically: Decode all bytes to typed objects (str, images, audio, structured objects) at input. Do no manipulations on bytes ever except decode and encode (both to text, and to special-purpose objects such as images) in a program that does I/O. (Obviously image manipulation libraries etc will have to operate on bytes, but they should have no functions that consume bytes except constructors a la bytes.decode() for text, and no functions that produce bytes except the output serializers that write files and the like, a la str.encode().) Encode back to bytes on output.

Yes, this is tedious if you live in an ASCII world, compared to using bytes as characters. However, it works for the rest of us, which the old style doesn't.

As for "Think Carefully About It Every Time", that is required only in Porting Programs That Mix Operation On Bytes With Operation On Str.
If you write programs from scratch, however, the decode-process-encode paradigm quickly becomes second nature.
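The decode-process-encode shape described above can be sketched in a few lines. The function name upper_text and the choice of upper-casing as the "process" step are hypothetical, picked only because bytes.upper() touches ASCII letters alone and so makes the difference visible.

```python
def upper_text(raw, encoding):
    """Hypothetical example of decode at input, process as str, encode at output."""
    text = raw.decode(encoding)       # decode once, on the way in
    text = text.upper()               # all manipulation happens on str
    return text.encode(encoding)      # encode once, on the way out


latin1_cafe = 'café'.encode('latin-1')       # b'caf\xe9'
processed = upper_text(latin1_cafe, 'latin-1')
bytes_only = latin1_cafe.upper()             # leaves the non-ASCII byte alone
```

Here processed holds the latin-1 bytes for 'CAFÉ', while bytes_only still ends in the lower-case byte \xe9, which is the "bytes as characters" failure mode for non-ASCII text.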
Re: [Python-Dev] email package status in 3.X
Barry Warsaw writes:

> Would it make sense to have "encoding-carrying" bytes and str types?

Why limit that to bytes and str? Why not have all objects carry their serializer/deserializer around with them?

I think the answer is "no", though, because (1) it would constitute an attractive nuisance (the default would be abused, it would work fine in Kansas, and all hell would break loose in Kagoshima, simply delaying the pain and/or passing it on to third parties), and (2) you really want this under control of higher level objects that have access to some knowledge of the environment, rather than the lowest level.
Re: [Python-Dev] email package status in 3.X
At 01:36 PM 6/21/2010 -0400, Terry Reedy wrote:
> On 6/21/2010 11:43 AM, Barry Warsaw wrote:
> > This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz for it. Would it make sense to have "encoding-carrying" bytes and str types?
>
> On 2009-11-5 I posted 'Add encoding attribute to bytes' to python-ideas. It was shot down at the time.

AFAICT, that's mainly for lack of apparent use cases, and also for confusion. Here, the use case (restoring the "polymorphy" of stdlib APIs) is pretty clear.

However, if we had the string equivalent of a coercion protocol (that core strings and bytes would co-operate with), then it would enable people to write their own versions of either your idea or Barry's idea (or other things altogether), and still get the stdlib to play along.

Personally, I think ebytes() would do the trick and it'd be nice to see it in stdlib, but gaining a string coercion protocol instead might not be a bad tradeoff. ;-)
Re: [Python-Dev] email package status in 3.X
At 02:58 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:
> Nick alluded to the One Obvious Way as a change in architecture.
>
> Specifically: Decode all bytes to typed objects (str, images, audio, structured objects) at input. Do no manipulations on bytes ever except decode and encode (both to text, and to special-purpose objects such as images) in a program that does I/O.

This ignores the existence of use cases where what you have is text that can't be properly encoded in unicode. I know, it's a hard thing to wrap one's head around, since on the surface it sounds like unicode is the programmer's savior. Unfortunately, real-world text data exists which cannot be safely roundtripped to unicode, and must be handled in "bytes with encoding" form for certain operations.

I personally do not have to deal with this *particular* use case any more -- I haven't been at NTT/Verio for six years now. But I do know it exists for e.g. Asian language email handling, which is where I first encountered it. At the time (this *may* have changed), many popular email clients did not actually support unicode, so you couldn't necessarily just send off an email in UTF-8. It drove us nuts on the project where this was involved (an i18n of an existing Python app), and I think we had to compromise a bit in some fashion (because we couldn't really avoid unicode roundtripping due to database issues), but the use case does actually exist.

My current needs are simpler, thank goodness. ;-) However, they *do* involve situations where I'm dealing with *other* encoding-restricted legacy systems, such as software for interfacing with the US Postal Service that only works with a restricted subset of latin1, while receiving mangled ASCII from an ecommerce provider, and storing things in what's effectively a latin-1 database.
Being able to easily assert what kind of bytes I've got would actually let me catch errors sooner, *if* those assertions were being checked when different kinds of strings or bytes were being combined (i.e., at coercion time).

> Yes, this is tedious if you live in an ASCII world, compared to using bytes as characters. However, it works for the rest of us, which the old style doesn't.

I'm not trying to go back to the old style -- ideally, I want something that would actually improve on the "it's not really unicode" use cases above if it were available in 2.x.

I don't want to be "encoding agnostic" or "encoding implicit" -- I want to make it possible to be even *more* explicit and restrictive than it is currently possible to be in either 2.x OR 3.x. It's just that 3.x affords greater opportunity for doing this, and is an ideal place to make the switch -- i.e., at a point where you now have to get explicit about your encodings, anyway!

> As for "Think Carefully About It Every Time", that is required only in Porting Programs That Mix Operation On Bytes With Operation On Str. If you write programs from scratch, however, the decode-process-encode paradigm quickly becomes second nature.

Which works if and only if your outputs are truly unicode-able. If you work with legacy systems (e.g. those Asian email clients and US postal software), you are really working with a *character set*, not unicode, and so putting your data in unicode form is actually *wrong* -- an expedient lie.

Heresy, I know, but there you go. ;-)
Re: [Python-Dev] email package status in 3.X
At 03:08 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:
> Barry Warsaw writes:
> > Would it make sense to have "encoding-carrying" bytes and str types?
>
> I think the answer is "no", though, because (1) it would constitute an attractive nuisance (the default would be abused, it would work fine in Kansas, and all hell would break loose in Kagoshima, simply delaying the pain and/or passing it on to third parties),

You have the proposal exactly backwards, actually. In Kagoshima, you'd pass in an ebytes with your encoding to a stdlib API, and *get back an ebytes with the right encoding*, rather than an (incorrect and useless) unicode object which has lost data you need.

> Why limit that to bytes and str? Why not have all objects carry their serializer/deserializer around with them?

Because it's not a serialization or deserialization. Your conceptual framework here implies that unicode objects are the real thing, and that bytes are "just" a way of transporting unicode around.

But this is not the case at all, for use cases where "no, really, you *have to* work with bytes-encoded text streams". The mere release of Python 3.x will not cause all the world's applications, libraries, and protocols to suddenly work with unicode, where they did not before.

Being explicit about the encoding of the bytes you're flinging around is actually an *increase* in specificity, explicitness, robustness, and error-checking ability over the status quo for either 2.x *or* 3.x... *and* it improves these qualities for essentially *all* string-handling code, without requiring that code to be rewritten to do so.

It's like getting to use the time machine, really.

> and (2) you really want this under control of higher level objects that have access to some knowledge of the environment, rather than the lowest level.

This proposal actually has such a higher-level object: an ebytes.
And it passes that information *through* the lowest level, in such a way as to permit the stringlike operations to be fully polymorphic, without the information being lost inside somebody else's API.
Re: [Python-Dev] email package status in 3.X
On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote:
> At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
> > What do you think of making the encoding attribute a mandatory part of creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``).
>
> As long as the coercion rules force str+ebytes (or str % ebytes, ebytes % str, etc.) to result in another ebytes (and fail if the str can't be encoded in the ebytes' encoding), I'm personally fine with it, although I really like the idea of tacking the encoding to bytes objects in the first place.

I wouldn't like this. It brings us back to the python2 problem where sometimes you pass an ebyte into a function and it works and other times you pass an ebyte into the function and it issues a traceback. The coercion must end up with a str and no traceback (this assumes that we've checked that the ebyte and the encoding match when we create the ebyte).

If you want bytes out the other end, you should either have a different function or explicitly transform the output from str to bytes.

So, what's the advantage of using ebytes instead of bytes?

* It keeps together the text and encoding information when you're taking bytes in and want to give bytes back under the same encoding.
* It takes some of the boilerplate that people are supposed to do (checking that bytes are legal in a specific encoding) and writes it into the initialization of the object. That forces you to think about the issue at two points in the code: when converting into ebytes and when converting out to bytes. For data that's going to be used with both str and bytes, this is the accepted best practice. (For exceptions, the byte type remains which you can do conversion on when you want to).

-Toshio
Re: [Python-Dev] email package status in 3.X
On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:
> I like the idea of having encoding information carried with the data. I don't think that an ebytes type that can *optionally* have an encoding attribute makes the situation less confusing, though.

Agreed. I think the attribute should always be there, but there probably needs to be a magic value (perhaps None) that indicates an unknown, manual, garbage, error, broken encoding.

Examples: you read bytes off a socket and don't know what the encoding is; you concatenate two ebytes that have incompatible encodings.

> To me the biggest problem with python-2.x's unicode/bytes handling was not that it threw exceptions but that it didn't always throw exceptions. You might test this in python2::
>
>     t = u'cafe'
>     function(t)
>
> And say, "ah my code works". Then a user gives it this::
>
>     t = u'café'
>     function(t)
>
> And get a unicode error because the function only works with unicode in the ascii range.

That's an excellent point.

> ebytes seems to have the same pitfall where the code path exercised by your tests could work with::
>
>     eb = ebytes(b)
>     eb.encoding = 'euc-jp'
>     function(eb)
>
> but the user exercises a code path that does this and fails::
>
>     eb = ebytes(b)
>     function(eb)
>
> What do you think of making the encoding attribute a mandatory part of creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``).

If ebytes is a separate type, then definitely +1. If 'ebytes is bytes' then I'd probably want to default the second argument to the magical "i-don't-know" marker.

-Barry
Re: [Python-Dev] email package status in 3.X
On Jun 22, 2010, at 03:08 AM, Stephen J. Turnbull wrote:
> Barry Warsaw writes:
> > Would it make sense to have "encoding-carrying" bytes and str types?
>
> Why limit that to bytes and str? Why not have all objects carry their serializer/deserializer around with them?

Only because the .encoding attribute isn't really a serializer/deserializer. That's still bytes() and str() or the equivalent. This is just a hint to a specific serializer for parameters to that action.

> I think the answer is "no", though, because (1) it would constitute an attractive nuisance (the default would be abused, it would work fine in Kansas, and all hell would break loose in Kagoshima, simply delaying the pain and/or passing it on to third parties), and (2) you really want this under control of higher level objects that have access to some knowledge of the environment, rather than the lowest level.

I'm still not sure ebytes solves the problem, but it avoids one I'm most concerned about seeing proposed. I really really do not want to add encoding=blah arguments to boatloads of function signatures.

-Barry
Re: [Python-Dev] email package status in 3.X
On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote:
> OTOH, one potential problem with having the encoding on the bytes object rather than the ebytes object is that then you can't easily take bytes from a socket and then say what encoding they are, without interfering with the sockets API (or whatever other place you get the bytes from).

Unless the default was the "I don't know" marker and you were able to set it after you've done whatever kind of application-level calculation you needed to do.

-Barry
Re: [Python-Dev] email package status in 3.X
On Jun 21, 2010, at 03:29 PM, Toshio Kuratomi wrote:
> I wouldn't like this. It brings us back to the python2 problem where sometimes you pass an ebyte into a function and it works and other times you pass an ebyte into the function and it issues a traceback. The coercion must end up with a str and no traceback (this assumes that we've checked that the ebyte and the encoding match when we create the ebyte).

Doing this at ebyte construction time does have the nice benefit of getting the exception early, and because the ebyte is immutable, you could cache the results in an attribute on the ebyte.

Well, immutable if the .encoding is also immutable. If that can change, then you'd have to re-run the cached decoding whenever the attribute were set, and there would be a penalty paid each time this was done.

That, plus the socket use case, does argue for a separate ebytes type.

-Barry
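As a rough sketch of the construction-time check with a cached decoding: the class below is hypothetical (ebytes is only a proposal in this thread), and it assumes the encoding is fixed once the object exists. The read-only property only blocks ordinary assignment to .encoding; the underscore attributes remain reachable, so this is immutability by convention.

```python
class ebytes(bytes):
    """Hypothetical ebytes that validates once and caches the decoded text."""

    def __new__(cls, data, encoding):
        self = super().__new__(cls, data)
        # Decode at construction: bad bytes raise UnicodeDecodeError here,
        # and the result is kept so later str() calls cost nothing extra.
        self._text = self.decode(encoding)
        self._encoding = encoding
        return self

    @property
    def encoding(self):
        # No setter: 'eb.encoding = ...' raises AttributeError.
        return self._encoding

    def __str__(self):
        return self._text


raw = b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa'
eb = ebytes(raw, 'euc-jp')
```

With this shape, ebytes(b'\xff', 'ascii') fails immediately, and since the encoding cannot be reassigned there is no cached decoding to invalidate.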
Re: [Python-Dev] email package status in 3.X
At 03:29 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
> On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote:
> > At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
> > > What do you think of making the encoding attribute a mandatory part of creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``).
> >
> > As long as the coercion rules force str+ebytes (or str % ebytes, ebytes % str, etc.) to result in another ebytes (and fail if the str can't be encoded in the ebytes' encoding), I'm personally fine with it, although I really like the idea of tacking the encoding to bytes objects in the first place.
>
> I wouldn't like this. It brings us back to the python2 problem where sometimes you pass an ebyte into a function and it works and other times you pass an ebyte into the function and it issues a traceback.

For stdlib functions, this isn't going to happen unless your ebytes' encoding is not compatible with the ascii subset of unicode, or the stdlib function is working with dynamic data... in which case you really *do* want to fail early!

I don't see this as a repeat of the 2.x situation; rather, it allows you to cause errors to happen much *earlier* than they would otherwise show up if you were using unicode for your encoded-bytes data.

For example, if your program's intent is to end up with latin-1 output, then it would be better for an error to show up at the very *first* point where non-latin1 characters are mixed with your data, rather than only showing up at the output boundary!

However, if you promoted mixed-type operation results to unicode instead of ebytes, then you:

1) can't preserve data that doesn't have a 1:1 mapping to unicode, and

2) can't detect an error until your data reaches the output point in your application -- forcing you to defensively insert ebytes calls everywhere (vs. simply wrapping them around a handful of designated inputs), or else have to go right back to tracing down where the unusable data showed up in the first place.
One thing that seems like a bit of a blind spot for some folks is that having unicode is *not* everybody's goal. Not because we don't believe unicode is generally a good thing or anything like that, but because we have to work with systems that flat out don't *do* unicode, thereby making the presence of (fully-general) unicode an error condition that has to be stamped out!

IOW, if you're producing output that has to go into another system that doesn't take unicode, it doesn't matter how theoretically-correct it would be for your app to process the data in unicode form. In that case, unicode is not a feature: it's a bug.

And as it really *is* an error in that case, it should not pass silently, unless explicitly silenced.

> So, what's the advantage of using ebytes instead of bytes?
>
> * It keeps together the text and encoding information when you're taking bytes in and want to give bytes back under the same encoding.
> * It takes some of the boilerplate that people are supposed to do (checking that bytes are legal in a specific encoding) and writes it into the initialization of the object. That forces you to think about the issue at two points in the code: when converting into ebytes and when converting out to bytes. For data that's going to be used with both str and bytes, this is the accepted best practice. (For exceptions, the byte type remains which you can do conversion on when you want to).

Hm. For the output case, I suppose that means you might also want the text I/O wrappers to be able to be strict about ebytes' encoding.
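The fail-early argument in the message above can be illustrated with plain str in current Python 3, without any new type. This is only a minimal sketch: `require_encodable` is a hypothetical helper invented for illustration, not an existing stdlib function, and Latin-1 stands in for whatever the downstream system accepts.

```python
def require_encodable(text: str, encoding: str) -> str:
    """Return text unchanged, or raise UnicodeEncodeError immediately.

    Hypothetical helper: it validates at the input boundary so that
    unencodable data is reported where it enters the program.
    """
    text.encode(encoding)  # result discarded; only the check matters
    return text


# Fine: every character of "café" exists in Latin-1.
name = require_encodable("caf\u00e9", "latin-1")

# Fails right here, not later at the output boundary.
try:
    require_encodable("caf\u00e9 \u65e5\u672c", "latin-1")
except UnicodeEncodeError as exc:
    failure = exc
```

The exception carries the offending position (`failure.start`), which is the information the thread argues you want as early as possible.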
Re: [Python-Dev] email package status in 3.X
At 04:04 PM 6/21/2010 -0400, Barry Warsaw wrote:
> On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote:
> > OTOH, one potential problem with having the encoding on the bytes object rather than the ebytes object is that then you can't easily take bytes from a socket and then say what encoding they are, without interfering with the sockets API (or whatever other place you get the bytes from).
>
> Unless the default was the "I don't know" marker and you were able to set it after you've done whatever kind of application-level calculation you needed to do.

True, but making it a separate type with a required encoding gets rid of the magical "I don't know" - the "I don't know" encoding is just a plain old bytes object.

(In principle, you could then drop *all* the stringlike methods from plain-old-bytes objects. If it's really text-in-bytes you want, you should use an ebytes with the encoding specified.)
Re: [Python-Dev] email package status in 3.X
On Jun 21, 2010, at 01:17 PM, P.J. Eby wrote: I'm not really sure how much use the encoding is on a unicode object - what would it actually mean? Hm. I suppose it would effectively mean this string can be represented in this encoding -- which is useful, in that you could fail operations when combining with bytes of a different encoding. That's basically what I was thinking. Hm... no, in that case you should just encode the string to the bytes' encoding, and let that throw an error if it fails. So, really, there's no reason for a string to know its encoding. All you need is the bytes type to have an encoding attribute, and when doing mixed-type operations between bytes and strings, coerce to *bytes of the same encoding*. If ebytes were a separate type, and it did the encoding check at constructor time, and the results of the decoding were cached, then I think you would not need the equivalent of an estr type. If you had a string and knew what it could be encoded to, then you could just coerce it to an ebytes and use the cached decoded value wherever you needed it. E.g. mystring = 'some unicode string' myencoding = 'iso--foo' myebytes = ebytes(mystring, myencoding) myebytes.encoding == myencoding True myebytes.string == mystring True So ebytes() could accept a str or bytes as its first argument. mybytes = b'some encoded string' myebytes = ebytes(mybytes, myencoding) mybytes == myebytes True myebytes.encoding == myencoding True In the first example ebytes() encodes mystring to set the internal bytes representation. In the second example, ebytes() decodes the bytes to get the .string attribute value. In both cases, an exception is raised if the encoding/decoding fails. However, if .encoding is None, then coercion would follow the same rules as now -- i.e., convert the bytes to unicode, assuming an ascii encoding. 
> (This would be different than setting an encoding of 'ascii', because in that case, it means you want cross-type operations to result in ascii bytes, rather than a unicode string, and to fail if the unicode part can't be encoded appropriately. The 'None' setting is effectively a nod to compatibility with prior 3.x versions, since I assume we can't just throw out the old coercion behavior.)
>
> Then, a few more changes to the bytes type would round out the implementation:
>
> * Allow .decode() to not specify an encoding, unless .encoding is None
> * Add back in the missing string methods (e.g. .encode(), since you can transparently upgrade to a string)
> * Smart __str__, as shown in your proposal.

If my example above isn't nonsense, then __str__() would just return the .string attribute.

In short, +1. (I wish it were possible to go back and make bytes non-strings and have only this ebytes or bstr or whatever type have string methods, but I'm pretty sure that ship has already sailed.)

Maybe it's PEP time? No, I'm not volunteering. ;)

-Barry
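The proposal in this message can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an implementation anyone in the thread wrote: the names `ebytes`, `.encoding` and `.string` come from the message, but everything else (subclassing bytes, handling only `ebytes + other`, raising ValueError on mismatched encodings) is a guess made for the sake of a runnable example.

```python
class ebytes(bytes):
    """Hypothetical 'encoded bytes': a bytes value plus a mandatory encoding."""

    def __new__(cls, data, encoding):
        if isinstance(data, str):
            # Encoding check happens at construction time.
            raw, text = data.encode(encoding), data
        else:
            raw = bytes(data)
            # Decoding check happens at construction time; result is cached.
            text = raw.decode(encoding)
        self = super().__new__(cls, raw)
        self.encoding = encoding
        self.string = text
        return self

    def __add__(self, other):
        # Mixed-type operations coerce to bytes of the *same* encoding.
        if isinstance(other, str):
            other = other.encode(self.encoding)  # raises if not encodable
        elif isinstance(other, ebytes) and other.encoding != self.encoding:
            raise ValueError("incompatible encodings")
        return type(self)(bytes(self) + bytes(other), self.encoding)

    def __str__(self):
        return self.string


eb = ebytes("caf\u00e9", "latin-1")
combined = eb + " au lait"      # still an ebytes, still latin-1

try:
    eb + "\u65e5\u672c"         # not representable in latin-1
except UnicodeEncodeError as exc:
    mixing_error = exc
```

Only left-hand addition is covered here; `str + ebytes`, `%` formatting and the other string methods discussed in the thread would each need the same coercion rule.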
Re: [Python-Dev] email package status in 3.X
On Jun 21, 2010, at 04:16 PM, P.J. Eby wrote:
> At 04:04 PM 6/21/2010 -0400, Barry Warsaw wrote:
> > On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote:
> > > OTOH, one potential problem with having the encoding on the bytes object rather than the ebytes object is that then you can't easily take bytes from a socket and then say what encoding they are, without interfering with the sockets API (or whatever other place you get the bytes from).
> >
> > Unless the default was the "I don't know" marker and you were able to set it after you've done whatever kind of application-level calculation you needed to do.
>
> True, but making it a separate type with a required encoding gets rid of the magical "I don't know" - the "I don't know" encoding is just a plain old bytes object.
>
> (In principle, you could then drop *all* the stringlike methods from plain-old-bytes objects. If it's really text-in-bytes you want, you should use an ebytes with the encoding specified.)

Yep, agreed!

-Barry
Re: [Python-Dev] email package status in 3.X
On Mon, Jun 21, 2010 at 02:46:57PM -0400, P.J. Eby wrote: At 02:58 AM 6/22/2010 +0900, Stephen J. Turnbull wrote: Nick alluded to the The One Obvious Way as a change in architecture. Specifically: Decode all bytes to typed objects (str, images, audio, structured objects) at input. Do no manipulations on bytes ever except decode and encode (both to text, and to special-purpose objects such as images) in a program that does I/O. This ignores the existence of use cases where what you have is text that can't be properly encoded in unicode. I know, it's a hard thing to wrap one's head around, since on the surface it sounds like unicode is the programmer's savior. Unfortunately, real-world text data exists which cannot be safely roundtripped to unicode, and must be handled in bytes with encoding form for certain operations. I personally do not have to deal with this *particular* use case any more -- I haven't been at NTT/Verio for six years now. But I do know it exists for e.g. Asian language email handling, which is where I first encountered it. At the time (this *may* have changed), many popular email clients did not actually support unicode, so you couldn't necessarily just send off an email in UTF-8. It drove us nuts on the project where this was involved (an i18n of an existing Python app), and I think we had to compromise a bit in some fashion (because we couldn't really avoid unicode roundtripping due to database issues), but the use case does actually exist. My current needs are simpler, thank goodness. ;-) However, they *do* involve situations where I'm dealing with *other* encoding-restricted legacy systems, such as software for interfacing with the US Postal Service that only works with a restricted subset of latin1, while receiving mangled ASCII from an ecommerce provider, and storing things in what's effectively a latin-1 database. 
> Being able to easily assert what kind of bytes I've got would actually let me catch errors sooner, *if* those assertions were being checked when different kinds of strings or bytes were being combined (i.e., at coercion time).

While it's certainly possible that you have a grapheme that has no corresponding unicode codepoint, it doesn't sound like this is the case you're dealing with here. You talk about a "restricted subset of latin1", but all of latin1's graphemes have unicode codepoints. You also talk about not being able to send off an email in UTF-8, but UTF-8 is an encoding of unicode, not unicode itself. Similarly, the statement that some email clients don't support unicode isn't very clear as to the actual problem. The email client supports displaying graphemes using glyphs present on the computer. As long as the graphemes needed have a unicode codepoint, using unicode inside of your application and then encoding to bytes on the way out works fine.

Even in cases where there's no unicode codepoint for the grapheme that you're receiving, unicode gives you a way out. It provides you a private use area where you can map the graphemes to unused codepoints. Your application keeps a mapping from that codepoint to the particular byte sequence that you want. Then you write a codec that converts from unicode with these private codepoints into your particular encoding (and from bytes into unicode).

-Toshio
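The private-use-area approach described above can be sketched as a pair of functions. Everything specific here is an assumption made for illustration: the legacy encoding is imagined as ASCII plus one vendor-specific byte (0x80) that has no Unicode codepoint, U+E000 is an arbitrary choice from the Private Use Area, and `legacy_decode` / `legacy_encode` are hypothetical names. Registering them as a real codec is left out.

```python
# Hypothetical mapping: byte 0x80 is a vendor glyph with no Unicode
# codepoint, so it is parked at U+E000 in the Private Use Area.
PUA_FOR_BYTE = {0x80: "\ue000"}
BYTE_FOR_PUA = {ord(ch): bytes([b]) for b, ch in PUA_FOR_BYTE.items()}


def legacy_decode(data: bytes) -> str:
    """Decode ASCII bytes, mapping the vendor byte to its private codepoint."""
    return "".join(
        PUA_FOR_BYTE[b] if b in PUA_FOR_BYTE else bytes([b]).decode("ascii")
        for b in data
    )


def legacy_encode(text: str) -> bytes:
    """Encode back to the legacy bytes, reversing the private-use mapping."""
    return b"".join(
        BYTE_FOR_PUA[ord(ch)] if ord(ch) in BYTE_FOR_PUA else ch.encode("ascii")
        for ch in text
    )


raw = b"abc\x80def"
text = legacy_decode(raw)
roundtripped = legacy_encode(text)
```

The application can process `text` as ordinary str in between; the private codepoint only has meaning to this pair of functions.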
Re: [Python-Dev] email package status in 3.X
Barry Warsaw wrote: On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote: I like the idea of having encoding information carried with the data. I don't think that an ebytes type that can *optionally* have an encoding attribute makes the situation less confusing, though. Agreed. I think the attribute should always be there, but there probably needs to be a magic value (perhaps None) that indicates and unknown, manual, garbage, error, broken encoding. Examples: you read bytes off a socket and don't know what the encoding is; you concatenate two ebytes that have incompatible encodings. Such extra information tends to be lost whenever you pass the bytes data through a C level API or some other function that doesn't know about the special nature of those objects, treating them just like any bytes object. It may sound nice in theory, but in practice it doesn't work out. Besides, if you do know the encoding, you can easily carry the data around in a Unicode str object. The problem lies elsewhere: What to do with a piece of text for which you don't know the encoding and how to combine that piece of text with other pieces of text for which you do know the encoding. There are a few options at hand: * you keep working on the bytes data and only convert things to Unicode when needed and where the encoding is known * you decode the bytes data for which you don't have the encoding information into some special Unicode form (eg. using the surrogateescape error handler) and hope that when the time comes to encode the Unicode data back into bytes, the codec supports reversing the conversion * you manage the data as a list of Unicode str and bytes objects and don't even try to be clever about encodings of text without unknown encoding It depends a lot on the use case, which of these options fits best. To me the biggest problem with python-2.x's unicode/bytes handling was not that it threw exceptions but that it didn't always throw exceptions. 
> You might test this in python2::
>
>     t = u'cafe'
>     function(t)
>
> And say, "ah, my code works". Then a user gives it this::
>
>     t = u'café'
>     function(t)
>
> And get a unicode error because the function only works with unicode in the ascii range.

That's an excellent point. Here's a little known fact: by changing the Python2 default encoding to 'undefined' (yes, that's a real codec!), you can disable all automatic string coercion in Python2.

-- 
Marc-Andre Lemburg
eGenix.com
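The second option listed in this message, decoding bytes of unknown encoding with the surrogateescape error handler (PEP 383), can be shown with the standard `bytes.decode` and `str.encode` calls. A minimal sketch, assuming the input is Latin-1 data that is mistakenly treated as UTF-8:

```python
raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 alone is not valid UTF-8

# Undecodable bytes are smuggled through as lone surrogates (U+DC80..U+DCFF).
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the escaped data instead of silently passing it on.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    strict_error = exc
```

The round trip only works if the encoding side also uses surrogateescape, which is the "hope that the codec supports reversing the conversion" caveat in the message.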
Re: [Python-Dev] email package status in 3.X
On Jun 21, 2010, at 4:29 PM, M.-A. Lemburg wrote:
> Here's a little known fact: by changing the Python2 default encoding to 'undefined' (yes, that's a real codec!), you can disable all automatic string coercion in Python2.

I tried that once: half the stdlib stops working if you do (for example, the re module), so it's not particularly useful for checking if your own code is unicode-safe.

James
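For readers who have not met the 'undefined' codec: it also ships with Python 3, and its behaviour is easy to see directly. The sketch below only demonstrates the codec itself under Python 3; it does not reproduce the Python 2 default-encoding switch discussed above, which has no Python 3 equivalent.

```python
# The codec refuses every conversion, which is what made it usable as a
# "no implicit coercion" default encoding in Python 2.
try:
    "abc".encode("undefined")
except UnicodeError as exc:
    error = exc
```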
Re: [Python-Dev] email package status in 3.X
On Mon, Jun 21, 2010 at 04:09:52PM -0400, P.J. Eby wrote: At 03:29 PM 6/21/2010 -0400, Toshio Kuratomi wrote: On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote: At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote: What do you think of making the encoding attribute a mandatory part of creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``). As long as the coercion rules force str+ebytes (or str % ebytes, ebytes % str, etc.) to result in another ebytes (and fail if the str can't be encoded in the ebytes' encoding), I'm personally fine with it, although I really like the idea of tacking the encoding to bytes objects in the first place. I wouldn't like this. It brings us back to the python2 problem where sometimes you pass an ebyte into a function and it works and other times you pass an ebyte into the function and it issues a traceback. For stdlib functions, this isn't going to happen unless your ebytes' encoding is not compatible with the ascii subset of unicode, or the stdlib function is working with dynamic data... in which case you really *do* want to fail early! The ebytes encoding will often be incompatible with the ascii subset. It's the reason that people were so often tempted to change the defaultencoding on python2 to utf8. I don't see this as a repeat of the 2.x situation; rather, it allows you to cause errors to happen much *earlier* than they would otherwise show up if you were using unicode for your encoded-bytes data. For example, if your program's intent is to end up with latin-1 output, then it would be better for an error to show up at the very *first* point where non-latin1 characters are mixed with your data, rather than only showing up at the output boundary! That highly depends on your usage. If you're formatting a comment on a web page, checking at output and replacing with '?' is better than a traceback. 
If you're entering key values into a database, then you likely want to know where the non-latin1 data is entering your program, not where it's mixed with your data or the output boundary.

> However, if you promoted mixed-type operation results to unicode instead of ebytes, then you:
>
> 1) can't preserve data that doesn't have a 1:1 mapping to unicode, and

ebytes should be immutable like bytes and str. So you shouldn't lose the data if you keep a reference to it.

> 2) can't detect an error until your data reaches the output point in your application -- forcing you to defensively insert ebytes calls everywhere (vs. simply wrapping them around a handful of designated inputs), or else have to go right back to tracing down where the unusable data showed up in the first place.

Usually, you don't want to know where you are combining two incompatible strings. Instead, you want to know where the incompatible strings are being set in the first place. If function(a, b) tracebacks with certain combinations of a and b, I need to know where a and b are being set, not where function(a, b) is in the source code. So you need to be making input values ebytes() (or str in current python3) no matter what.

> One thing that seems like a bit of a blind spot for some folks is that having unicode is *not* everybody's goal. Not because we don't believe unicode is generally a good thing or anything like that, but because we have to work with systems that flat out don't *do* unicode, thereby making the presence of (fully-general) unicode an error condition that has to be stamped out!

I think that sometimes as well. However, here I think you're in a bit of a blind spot yourself. I'm saying that making ebytes + str coerce to ebytes will only yield a traceback some of the time, which is the python2 behaviour. Having ebytes + str coerce to str will never throw a traceback as long as our implementation checks that the bytes and encoding work together from the start.
Throwing an error in code, only on some input is one of the main reasons that debugging unicode vs byte issues sucks on python2. On my box, with my dataset, everything works. Toss it up on pypi and suddenly I have a user in Japan who reports that he gets a traceback with his dataset that he can't give to me because it's proprietary, overly large, or transient. IOW, if you're producing output that has to go into another system that doesn't take unicode, it doesn't matter how theoretically-correct it would be for your app to process the data in unicode form. In that case, unicode is not a feature: it's a bug. This is not always true. If you read a webpage, chop it up so you get a list of words, create a histogram of word length, and then write the output as utf8 to a database. Should you do all your intermediate string operations on utf8 encoded byte strings? No, you should do them on unicode strings as otherwise you need to know about the details of how utf8 encodes characters. And as it really *is* an error in that case, it should not pass silently,
Re: [Python-Dev] email package status in 3.X
...
> > IOW, if you're producing output that has to go into another system that doesn't take unicode, it doesn't matter how theoretically-correct it would be for your app to process the data in unicode form. In that case, unicode is not a feature: it's a bug.
>
> This is not always true. If you read a webpage, chop it up so you get a list of words, create a histogram of word length, and then write the output as utf8 to a database. Should you do all your intermediate string operations on utf8 encoded byte strings? No, you should do them on unicode strings as otherwise you need to know about the details of how utf8 encodes characters.

You'd still have problems in Unicode given stuff like å =~ å even though u'\xe5' vs u'a\u030a' (those will look the same depending on your Unicode system. IDLE shows them pretty much the same, T-Bird on Windows with my current font shows the second as 2 characters.)

I realize this was a toy example, but it does point out that Unicode complicates the idea of 'equality' as well as the idea of 'what is a character'. And just saying "decode it to Unicode" isn't really sufficient.

John =:-
Re: [Python-Dev] email package status in 3.X
On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby p...@telecommunity.com wrote: True, but making it a separate type with a required encoding gets rid of the magical I don't know - the I don't know encoding is just a plain old bytes object. So, to boil down the ebytes idea, it is basically a request for a second string type that holds an octet stream plus an encoding name, rather than a Unicode character stream. Calling it ebytes seems to emphasise the wrong parallel in that case (you have a 'str' object with a different internal structure, not any kind of bytes object). For now I'll call it an altstr. Then the idea can be described as - altstr would expose the same API as str, NOT the same API as bytes - explicit conversion via str would use the altstr's __str__ method - explicit conversion via bytes would use the altstr's __bytes__ method - implicit interaction with str would convert the str to an altstr object according to the altstr's rules. This may be best handled via a coercion method on altstr, rather than str actually needing to know the details (i.e. an altrstr.__coerce_str__() method). For the 'ebytes' model, this would do something like type(self)(other.encode(self.encoding), self.encoding)). The operation would then be handled by the corresponding method on the coerced object. A new type could then override operations such as __contains__, __mod__, format() and join(). This is still smelling an awful lot like the 2.x str type to me, but supporting a __coerce_str__ method may allow some useful experimentation in this space (as PJE suggested). There's a chance it would be abused, but it offers a greater chance of success than trying to come up with a concrete altstr type without providing a means for experimentation first. (In principle, you could then drop *all* the stringlike methods from plain-old-bytes objects. If it's really text-in-bytes you want, you should use an ebytes with the encoding specified.) 
Except that a lot of those string-like methods are just plain useful, even when you *know* you're dealing with an octet stream rather than latin-1 encoded text.

Cheers, Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] email package status in 3.X
On Mon, Jun 21, 2010 at 04:52:08PM -0500, John Arbash Meinel wrote:
> ...
> > > IOW, if you're producing output that has to go into another system that doesn't take unicode, it doesn't matter how theoretically-correct it would be for your app to process the data in unicode form. In that case, unicode is not a feature: it's a bug.
> >
> > This is not always true. If you read a webpage, chop it up so you get a list of words, create a histogram of word length, and then write the output as utf8 to a database. Should you do all your intermediate string operations on utf8 encoded byte strings? No, you should do them on unicode strings as otherwise you need to know about the details of how utf8 encodes characters.
>
> You'd still have problems in Unicode given stuff like å =~ å even though u'\xe5' vs u'a\u030a' (those will look the same depending on your Unicode system. IDLE shows them pretty much the same, T-Bird on Windows with my current font shows the second as 2 characters.)
>
> I realize this was a toy example, but it does point out that Unicode complicates the idea of 'equality' as well as the idea of 'what is a character'. And just saying "decode it to Unicode" isn't really sufficient.

Ah -- but if you're dealing with unicode objects you can use the unicodedata.normalize() function on them to come out with the right values. If you're using bytes, it's yet another case where you, the programmer, have to know what byte sequences represent combining characters in the particular encoding that you're dealing with.

-Toshio
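The two spellings of å from this exchange, and the effect of `unicodedata.normalize()`, can be checked directly. A short sketch using the same two values quoted in the thread (written as Python 3 str literals rather than u'' literals):

```python
import unicodedata

composed = "\xe5"          # LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "a\u030a"     # 'a' followed by COMBINING RING ABOVE

# As raw str objects the two are different: different lengths, not equal.
raw_equal = composed == decomposed

# After normalizing both to the same form they compare equal.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed
```

Which form to normalize to (NFC or NFD) is an application choice; the point is only that both sides of a comparison use the same one.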
Re: [Python-Dev] email package status in 3.X
On Tue, 22 Jun 2010 06:09:52 am P.J. Eby wrote:
> However, if you promoted mixed-type operation results to unicode instead of ebytes, then you:
>
> 1) can't preserve data that doesn't have a 1:1 mapping to unicode,

Sounds like exactly the sort of thing the Unicode private codepoints were invented for, as Toshio suggests.

In any case, if there are use-cases for text that aren't solved by Unicode, and I'm not convinced that there are, Python doesn't need to solve them. At the very least, such a solution should start off as a third-party package to prove itself before being made a part of the standard library, let alone a built-in.

-- 
Steven D'Aprano
Re: [Python-Dev] email package status in 3.X
On Tue, 22 Jun 2010 08:03:58 am Nick Coghlan wrote:
> On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby p...@telecommunity.com wrote:
> > True, but making it a separate type with a required encoding gets rid of the magical "I don't know" - the "I don't know" encoding is just a plain old bytes object.
>
> So, to boil down the ebytes idea, it is basically a request for a second string type that holds an octet stream plus an encoding name, rather than a Unicode character stream.

Do any other languages have any equivalent to this ebytes type? If not, how do they deal with this issue?

[...]
> This is still smelling an awful lot like the 2.x str type to me

Yes. Virtually the only difference I can see is that it lets the user set a per-object default encoding to use when coercing strings to and from bytes. If this is not the case, can somebody please explain what I'm missing?

-- 
Steven D'Aprano
Re: [Python-Dev] email package status in 3.X
Steven D'Aprano:
> Do any other languages have any equivalent to this ebytes type?

The String type in Ruby 1.9 is a byte string with an encoding attribute. Most online Ruby documentation is for 1.8 but the API can be examined here: http://ruby-doc.org/ruby-1.9/index.html

Here's something more explanatory: http://blog.grayproductions.net/articles/ruby_19s_string

My view is that this actually makes things much more complex by making encoding combination an n*n problem (where n is the number of encodings) rather than an n-sized problem when you have a single core string type.

Neil
Re: [Python-Dev] email package status in 3.X
On 6/21/2010 1:58 PM, Stephen J. Turnbull wrote:
> As for Think Carefully About It Every Time, that is required only in Porting Programs That Mix Operation On Bytes With Operation On Str.

The 2.x anti-pattern

> If you write programs from scratch, however, the decode-process-encode paradigm quickly becomes second nature.

Except in this particular arena, it already should be to anyone reading this list. Decorate-sort-undecorate is another example of the same idea. Transform-compute-untransform is the basis of NP-complete theory. Frequency domain processing sandwiched between forward and reverse Fourier transforms is a third example. And so on.

-- 
Terry Jan Reedy
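The decode-process-encode paradigm mentioned above fits in one small function. This is a minimal sketch with a made-up task (upper-casing) and a hypothetical function name `shout`; the contrast with operating on the encoded bytes directly shows why the processing step belongs on str.

```python
def shout(raw: bytes, encoding: str) -> bytes:
    """Upper-case encoded text using decode-process-encode."""
    text = raw.decode(encoding)       # decode at the input boundary
    result = text.upper()             # process as str
    return result.encode(encoding)    # encode at the output boundary


utf8_input = "caf\u00e9".encode("utf-8")
via_str = shout(utf8_input, "utf-8")

# bytes.upper() only knows about ASCII letters, so the encoded 'é'
# is left untouched when the same operation is done on raw bytes.
via_bytes = utf8_input.upper()
```

Here `via_str` is the UTF-8 encoding of "CAFÉ", while `via_bytes` still contains the lower-case é.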
Re: [Python-Dev] email package status in 3.X
On Mon, Jun 22, 2010 at 7:27:31 PM, Steven D'Aprano st...@pearwood.info wrote:
> On Tue, 22 Jun 2010 08:03:58 am Nick Coghlan wrote:
> > So, to boil down the ebytes idea, it is basically a request for a second string type that holds an octet stream plus an encoding name, rather than a Unicode character stream.
>
> Do any other languages have any equivalent to this ebytes type?

Ruby seems to do this: http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html

I don't use ruby myself, and I'm probably missing some subtle flaws, but the exposition at that link makes sense to me.

cheers, Jess
Re: [Python-Dev] email package status in 3.X
On 6/21/2010 2:46 PM, P.J. Eby wrote: This ignores the existence of use cases where what you have is text that can't be properly encoded in unicode. I think it depends on what you mean by 'properly'. I will try to explain with English examples. 1. Unicode represents a finite set of characters and symbols and a few control or markup operators. The potential set is unbounded, so unicode includes a user area. I include use of that area in 'properly'. I kind of suspect that the statement above does not since any byte or short byte sequence that does not translate can instead use the user area. 2. Unicode disclaims direct representation of font and style information, leaving that to markup either in or out of the text stream. (It made an exception for japanese narrow and wide ascii chars, which I consider to essentially be duplicate font variations of the normal ascii codes.) Html uses both in-band and out-of-band (css) markup. Stripping markup information is a loss of information. If one wants it, one must keep it in one form or another. I believe that some early editors like Wordstar used high-bit-set bytes for bold, underline, italic on and off. Assuming I have the example right, can Wordstar text be 'properly encoded in unicode'? If one insists that that mean replacement of each of the format markup chars with a single defined char in the Basic Multilingual Plane, then 'no'. If one allows replacement by bold, /bold, and so on, then 'yes'. 3. Unicode disclaims direct representation of glyphic variants (though again, exceptions were made for asian acceptance). For example, in English, mechanically printed 'a' and 'g' are different from manually printed 'a' and 'g'. Representing both by the same codepoint, in itself, loses information. One who wishes to preserve the distinction must instead use a font tag or perhaps a handprinted tag. Similarly, older English had a significantly different glyph for 's', which looks more like a modern 'f'. 
If IBM's EBCDIC had codes for these glyph variants, IBM might have insisted that unicode also have such so char for char round-tripping would be possible. It does not and unicode does not. (Wordstar and other 1980s editor publishers were mostly defunct or weak and not in a position to make such demands.) If one wants to write on the history of glyph evolution, say of latin chars, one must either number the variants 'e-0', 'e-1', etc, or resort to the user area. In either case, proprietary software would be needed to actually print the variations with other text.

> I know, it's a hard thing to wrap one's head around, since on the surface it sounds like unicode is the programmer's savior. Unfortunately, real-world text data exists which cannot be safely roundtripped to unicode,

I do not believe that. Digital information can always be recoded one way or another. As it is, the rules were bent for Japanese, in a way that they were not for English, to aid round-tripping of the major public encodings. I can, however, believe that there were private encodings for which round-tripping is more difficult. But there are also difficulties for old proprietary and even private English encodings.

-- 
Terry Jan Reedy
Re: [Python-Dev] email package status in 3.X
Barry Warsaw writes:

I'm still not sure ebytes solves the problem,

I don't see how it can. If you have an encoding to stuff into ebytes, you could just convert to Unicode and guarantee that all internal string operations will succeed. If you use ebytes instead, every string operation has to be wrapped in try ... except EBytesError, to no gain that I can see. If you don't have an encoding, then you just have bytes, which strictly speaking shouldn't be operated on (in the sense of slicing, dicing, or stir-frying) at all if you're in an environment where they are a carrier for formatted information such as non-ASCII characters or PNG images.

but it avoids one I'm most concerned about seeing proposed. I really really do not want to add encoding=blah arguments to boatloads of function signatures.

Agreed. But ebytes isn't a solution to that; it's a regression to one of the hardest problems in Python 2. OTOH, it seems to me that there's only one boatload to worry about. That's the boatload containing protocol-less APIs, ie, Unix OS data (names in the filesystem, content of environment variables). Other platforms (Windows, Mac) are standardizing on protocols for these things and enforcing them in the OS, and free Unices are going to the convention that everything is non-normalized UTF-8. What other boats are you worried about?
Re: [Python-Dev] email package status in 3.X
P.J. Eby writes: In Kagoshima, you'd use pass in an ebytes with your encoding to a stdlib API, and *get back an ebytes with the right encoding*, rather than an (incorrect and useless) unicode object which has lost data you need. How does the stdlib do that? Unless it guesses which encoding for Japanese is being used? And even if this ebytes uses Shift JIS, what makes that the right encoding for anything? On the other hand, I know when *I* need some encoding, and when I figure it out I will store it in an appropriate place in my program. The problem is that for some programs it is not unlikely that I will see all of Shift JIS, EUC-JP, ISO-2022-JP, UTF-8, and UTF-16, and on a very bad day, RFC 2047, GB 2312, and Big5, too, used to encode Japanese. It's not totally unlikely for a browser to send URLs to a server expecting UTF-8 to recover a message/rfc822 object containing ISO-2022-JP in the mail header and EUC-JP in the body. So I need to know which encoding was used by the server that sent the reply, but the ebytes can't tell me that if it fishes an URL in EUC-JP out of the message body. I need to convert that URL to UTF-8, or most servers will 404. But this is not the case at all, for use cases where no, really, you *have to* work with bytes-encoded text streams. The mere release of Python 3.x will not cause all the world's applications, libraries, and protocols to suddenly work with unicode, where they did not before. Sure. That's what .encode() and .decode() are for. The problem is what to do when you don't know what to put in the parentheses, and I can't think of a use case offhand where ebytes(stuff,'garbage') does better than PEP 383-enabled str for: Being explicit about the encoding of the bytes you're flinging around is actually an *increase* in specificity, explicitness, robustness, and error-checking ability over the status quo for either 2.x *or* 3.x... 
*and* it improves these qualities for essentially *all* string-handling code, without requiring that code to be rewritten to do so. A well-spoken piece. But, you see, most of those encodings are *only* interesting so that you can transcode characters to the encoding of interest. What's the e.o.i.? That is easily found in the context or has an obvious default, if you're lucky, or otherwise a hard problem that ebytes does nothing to help solve as far as I can see. Cf. Robert Collins' post aanlktinq_d_vahbw5ikuyy9qgjqoffy4xczc0dyzt...@mail.gmail.com, where he makes it quite explicit that a bytes interface is all about punting in the face of missing encoding information. and (2) you really want this under control of higher level objects that have access to some knowledge of the environment, rather than the lowest level. This proposal actually has such a higher-level object: an ebytes. I don't see how that can be true. An ebytes is a very low-level object that has no idea whether its encoding is interesting (eg, the one that an RFC or a server specifies), or a technical detail of use only until the ebytes is decoded, then can be thrown away. I just don't see, in the case where there is a real encoding in the ebytes, what harm is done by decoding the ebytes to str. If context indicates that the encoding is an interesting one (eg, it should be the default for encoding on output), then you want to save that in an appropriate place that preserves not just the encoding itself, but the context that gives it its importance. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
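A minimal sketch of the transcoding step described in this message, assuming Python 3.1 or later. The sample text and the assumption that the bytes are EUC-JP are hypothetical; in practice the encoding has to come from context, which is exactly the hard part. The last lines show how a PEP 383 style str (the surrogateescape error handler) carries bytes of unknown encoding without loss:

```python
from urllib.parse import quote

# Hypothetical input: a Japanese path segment fished out of a message body.
# We *assume* it is EUC-JP; nothing in the bytes themselves says so.
raw = "\u65e5\u672c\u8a9e".encode("euc_jp")

# Decode with the encoding learned from context, then re-encode as UTF-8
# and percent-escape it for a server that expects UTF-8 URLs.
text = raw.decode("euc_jp")
url_path = quote(text.encode("utf-8"))

# PEP 383: when the encoding is unknown, undecodable bytes can still be
# carried through a str and recovered exactly on the way out.
unknown = b"caf\xe9"
carried = unknown.decode("ascii", errors="surrogateescape")
restored = carried.encode("ascii", errors="surrogateescape")
```

Note that the encoding name lives in the program (here, as a literal), not in the data, which matches the point that the interesting encoding is a matter of context.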
Re: [Python-Dev] email package status in 3.X
Antoine Pitrou writes:

I think it's an unfortunate analogy.

Propose a better one, then. I'm definitely not wedded to the ones I've proposed! But we have a PR problem *now*. The loyal opposition clearly intend to continue trash-talking Python 3 until the libraries get to 100% (or a government-approved approximation of 100%). The topic on #python seems unlikely to change at this point, with both Glyph and JP pointedly failing to denounce it publicly, while Stephen defends it and says it's not going to change as long as the libraries aren't done. What do you suggest? Or do you think there's no PR problem we should worry about, just accept that this is going to be a further drag on adoption and improvement, and keep on keeping on?
Re: [Python-Dev] email package status in 3.X
I can only imagine how difficult can it be to do such a conversion in a project like Twisted or Django where the I/O plays a fundamental role.

For Django, you don't need to imagine, but can look at the actual changes: http://bitbucket.org/loewis/django-3k/

The choice of forcing the user to use Unicode and think in Unicode was a very brave one, and I'm sure it's for the better, but not everyone wants to deal with that because Unicode is hard to swallow. The majority of people prefer to stay with bytes and eventually learn and introduce Unicode only when that is actually needed.

It's not really an issue with Unicode, but rather with characters. Surprisingly, most people don't grasp the notion of abstract character. This is similar to not grasping the notion of abstract integral number, which most programmers master over time (although my students typically need a year or more to get the difference between decimal number, two's complement, and abstract integer; the difference between character string and number is easier (*)). For numbers, programmers are forced to accept the abstraction. For character strings, they apparently resist much more.

Regards, Martin

(*) An anecdotal dialog may read like this:

Teacher: How are numbers represented in Python?
Student: In decimal.
T: How so?
S: I can do x = 47 and it is decimal. I can then do print x and get 47. See?
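The distinction between an abstract value and its written representations can be sketched as follows, assuming Python 3. The variable names and the choice of U+00E9 as the sample character are illustrative only:

```python
# One abstract integer, several spellings in source code.
x = 47
same_number = (x == 0x2F == 0o57 == 0b101111)

# "Decimal" is only how the number is rendered as text on request.
decimal_text = str(x)
binary_text = format(x, "b")

# Likewise one abstract character, several byte representations.
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
utf8_bytes = e_acute.encode("utf-8")
latin1_bytes = e_acute.encode("latin-1")
```

The student in the dialog is looking at the output of the rendering step and taking it for the value itself; the same confusion arises when bytes on the wire are taken for the characters they encode.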
Re: [Python-Dev] email package status in 3.X
On Mon, 21 Jun 2010 02:30:17 +0900 Stephen J. Turnbull step...@xemacs.org wrote:

Antoine Pitrou writes: I think it's an unfortunate analogy. Propose a better one, then. I'm definitely not wedded to the ones I've proposed!

I'm not sure why you want an analogy. Python 3 improves the language and drops legacy cruft. Bringing in C++ makes the description unnecessarily contentious and loaded (because C++ has a rather bad reputation amongst many people; recently Linus Torvalds explained again why he thought C was much more appropriate a programming language). And it's not even warranted, because the situation is vastly different.

What do you suggest? Or do you think there's no PR problem we should worry about, just accept that this is going to be a further drag on adoption and improvement, and keep on keeping on?

I suppose the PR problem could be solved by having an official page on python.org explain what the new features and advantages of Python 3 over Python 2 are. There's no such thing right now; actually, I'm not sure there's a Web page explaining clearly what the difference is about, why it was done in such a compatibility-breaking way, and what we advise (both actual and potential) users to do. I suppose that's a task for the Web content editor community.

Regards

Antoine.
Re: [Python-Dev] email package status in 3.X
On Sun, 20 Jun 2010 14:26:28 +0200 Giampaolo Rodolà g.rod...@gmail.com wrote:

I attempted to port pyftpdlib to python 3 several times and the biggest show stopper has always been the bytes / string difference introduced by Python 3 which forces you to *know* and *use* Unicode every time you deal with some text and 2to3 is completely useless here.

I don't really understand what the difficulties are. A character is a character; converting from bytes to characters requires knowing the encoding, which your protocol should specify somewhere (of course, I suppose FTP is old and crummy enough that it may not specify anything). An encoding is nothing more than a transformation. When you get gzipped data, you must decompress it before doing anything useful with it. Similarly, when you get (say) UTF-8 data, you must decode it before doing anything useful with it.

I can only imagine how difficult can it be to do such a conversion in a project like Twisted or Django where the I/O plays a fundamental role.

Twisted actually seems to enforce the bytes / unicode separation quite well already, so I don't think they should have many problems on that front. Modern Web frameworks seem to be in the same boat (they already give the Web developer unicode strings to play with, and handle the encoding/decoding at the IO boundary transparently).

The choice of forcing the user to use Unicode and think in Unicode was a very brave one, and I'm sure it's for the better, but not everyone wants to deal with that because Unicode is hard to swallow.

Could Google fund a project named Unicode Swallow?

Regards

Antoine.
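The gzip comparison above can be sketched directly, assuming Python 3.2 or later (for gzip.compress and gzip.decompress). The sample text and the choice of UTF-8 are illustrative; a real protocol would have to say which encoding is in use:

```python
import gzip

message = "na\u00efve caf\u00e9"      # text, as the program sees it

# Outgoing: apply the transformations in order at the output boundary.
payload = message.encode("utf-8")      # characters -> bytes
wire = gzip.compress(payload)          # bytes -> compressed bytes

# Incoming: undo them in reverse order before doing anything useful.
received = gzip.decompress(wire).decode("utf-8")
```

Both steps are reversible transformations that need out-of-band knowledge (the compression format, the encoding); neither can be skipped if the program wants to work with the text itself.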
Re: [Python-Dev] email package status in 3.X
On Sun, Jun 20, 2010 at 5:26 AM, Giampaolo Rodolà g.rod...@gmail.com wrote: 2010/6/20 Steven D'Aprano st...@pearwood.info: Python 2.x introduced Unicode strings. Python 3.x merely makes them the default. Merely? To me this looks as the main reason why a lot of projects haven't been ported to Python 3 yet. I attempted to port pyftpdlib to python 3 several times and the biggest show stopper has always been the bytes / string difference introduced by Python 3 which forces you to *know* and *use* Unicode every time you deal with some text Ah, but this is the crux of the difference between Python 2 and 3. The distinction between text and bytes is crucial, and Python 2 tried to paper over the differences in a way that led to endless pain. Many clumsy and shaky hacks have been invented to alleviate the pain but it never goes away. Python 3 takes a much clearer stance on the difference -- your code *must* be aware of the distinction and it *must* deal with it. The problem comes exactly where you find it: when *porting* existing code that uses aforementioned ways to alleviate the pain, you find that the hacks no longer work and a properly layered design is needed that clearly distinguishes between which variables contain bytes and which text. and 2to3 is completely useless here. Alas, this is true, because it is not a matter of changing some simple things. The old ways are no longer supported. I can only imagine how difficult can it be to do such a conversion in a project like Twisted or Django where the I/O plays a fundamental role. Django actually took one of the most principled stances towards this issue and has already been ported (although the port is not maintained by the core Django developers yet). I can't speak for Twisted but I know they have some funding towards a port. The problem is often worse for smaller libraries (like I presume pyftplib is) which don't have a clear stance about bytes vs. text. 
Another problem is some internet protocols (of which FTP I believe is one) which use antiquated models for dealing with binary vs. text data, often focusing entirely on encodings (usually and mistakenly called character sets) rather than on proper Unicode support.

The choice of forcing the user to use Unicode and think in Unicode was a very brave one, and I'm sure it's for the better, but not everyone wants to deal with that because Unicode is hard to swallow.

Education is needed. When you search Google (or Bing, for that matter :-) for python unicode the first hit is http://www.amk.ca/python/howto/unicode, which is highly detailed but probably too much information for the typical person faced with a UnicodeError exception traceback (that page is also focused on Python 2). What we need is a cookbook on how to deal with various common situations.

The majority of people prefer to stay with bytes and eventually learn and introduce Unicode only when that is actually needed.

This is exactly what we tried to do in Python 2 and it was a flagrant disaster. It's just that the work-arounds people have created to deal with it don't port cleanly -- which is by design. This is why I've always said that I assumed that the Python 3 transition would take 5 years.

On the #python issue, I expect that IRC is much less influential than some here fear (and than some fervent IRC users believe). I don't see reason for panic or heavy-handed interference. OTOH engaging the channel operators more in python-dev sounds like a useful approach.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] email package status in 3.X
Pass the ketchup, I need to eat my words. I wrote:

The loyal opposition clearly intend to continue trash-talking Python 3 until the libraries get to 100% (or a government-approved approximation of 100%). The topic on #python seems unlikely to change at this point, with both Glyph and JP pointedly failing to denounce it publicly, while Stephen defends it and says it's not going to change as long as the libraries aren't done.

It would seem from posts I received after replying (local mail glitch, should have known there was more coming :-( ) that the facts are that the topic is quite likely to change soonish, and that trash-talking is being done, if at all, by trolls. (Having spent a few hours on #python today, I see that's a lot more possible than I would have believed in this community. Nobody's immune.) Glyph, JP, and Stephen have my personal apologies.
Re: [Python-Dev] email package status in 3.X
On Sun, Jun 20, 2010 at 7:30 PM, Stephen J. Turnbull step...@xemacs.org wrote: Antoine Pitrou writes: But we have a PR problem *now*. The loyal opposition clearly intend to continue trash-talking Python 3 until the libraries get to 100% (or a government-approved approximation of 100%). The topic on #python seems unlikely to change at this point, with both Glyph and JP pointedly failing to denounce it publicly, while Stephen defends it and says it's not going to change as long as the libraries aren't done. Huh? We just changed the topic on #python because people complained about it. We didn't do it earlier because we didn't know it was a problem. Defending it doesn't mean it's set in stone :-) I don't wanna come across like a jerk but could we please not use loaded terms like loyal opposition and trash-talking? I don't really think that's what people do or are (or at least want to be/intend to do). I've really honestly tried my best to fix this situation (see the other thread) and the people whom I've gotten input from (both here and in the IRC channels) have been nothing but helpful. What do you suggest? Or do you think there's no PR problem we should worry about, just accept that this going to be a further drag on adoption and improvement, and keep on keeping on? I very much like Martin and Antoine's ideas of putting the thing up on python.org, that might also solve people's problems with the apparent dissonance between #python and python-dev/the PSF that neither side really wants. To the contrary, I think everyone wants this situation to improve, including Guido, apparently. Myself included, I think everyone stands to gain here. thanks for listening Laurens ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
Guido van Rossum writes:

On the #python issue, I expect that IRC is much less influential than some here fear (and than some fervent IRC users believe). I don't see reason for panic or heavy-handed interference. OTOH engaging the channel operators more in python-dev sounds like a useful approach.

More vice-versa, I now think. Ie, (somewhat) greater python-dev presence on #python is more important. I sort of assumed that people actually participated in #python, as a number do in c.l.p, but that doesn't seem to be so. At least while I was there, I didn't see anybody else who seemed to be python-dev, whether core or the regular denizens of the peanut gallery. Judging from a few hours monitoring and participating in #python, Laurens gave a pretty accurate summary of the kind of people in the channel. I didn't see anything about Python 3, but I can definitely imagine there being Python-3-baiting trolls. There certainly were a few trollish posters.

Anyway, what I personally plan to do is put in a couple of hours a week on #python, and I probably mostly won't mention Python 3 unless asked, and maybe in discussing Unicode issues. While I don't claim to be particularly *representative* of python-dev, an additional dimension of diversity should go a long way.
Re: [Python-Dev] email package status in 3.X
At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote: The problem comes exactly where you find it: when *porting* existing code that uses aforementioned ways to alleviate the pain, you find that the hacks no longer work and a properly layered design is needed that clearly distinguishes between which variables contain bytes and which text. Actually, I would say that it's more that (in the network protocol case) we *have* bytes, some of which we would like to *treat* as text, yet do not wish to constantly convert back and forth to full-blown unicode -- especially since the protocols themselves designate ASCII or latin-1 at the transport layer (sometimes with odder encodings above, but these already have to be explicitly dealt with by existing code). While reading over this thread, I'm wondering whether at least my (WSGI-related) problems in this area would be solved by the availability of a type (say bstr) that was simply a wrapper providing string-like behavior over an underlying bytes, byte array, or memoryview, that would produce objects of compatible type when combined with strings (by encoding them to match). Then, I could wrap bytes with it to pass them to string operations, and then feed them back into everything else. The bstr type ideally would be directly compatible with bytes I/O, or at least have a .bytes attribute that would be. It seems like that would reduce WSGI porting issues quite a bit, since it would mostly consist of throwing extra bstr() calls in where things are breaking, and maybe grabbing the .bytes attribute for I/O. This approach would still be explicit as to what types you're working with, but would not require O(n) *conversions* at every interaction boundary. It would be limited, of course, to single-byte encodings with all characters (0-255) valid. OTOH, maybe there should just be a bytestrings module with bytestrings.ascii and bytestrings.latin1, and between the two that should cover the network protocol needs quite well. 
Actually, if the Python 3 str() constructor could do O(1) conversion for the latin-1 case (i.e., just wrapped the underlying bytes), I would just put, bstr = lambda x: str(x,'latin-1') at the top of my programs and have roughly the same effect. This idea is still a bit half-baked, but a more baked version might be just the ticket for porting stuff that used str to work with bytes in 2.x, if only because writing, e.g.: newurl = bstr(urljoin(bstr(base), 'subdir')) seems so much saner than writing *this* everywhere: newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1') It is perhaps a bit late to propose this idea, since ideally we would also want to use it in 2.x to aid porting. But I'm curious if any other people here experiencing byte/unicode woes in relation to network protocols would find this a solution to their chief frustration. (i.e., that the stdlib often insists now on strings, where effectively bytes were usable before, and thus one must do conversions both coming and going.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
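A minimal sketch of the latin-1 trick proposed here, assuming Python 3. The URL is a made-up example, and the encode step on the way out is my addition so that the result is bytes again. It shows why latin-1 is the encoding of choice for this: every byte value 0-255 maps to exactly one character and back. Note that each call still converts the whole buffer, which is the cost the message is objecting to:

```python
from urllib.parse import urljoin

# The helper from the message: view bytes as text via latin-1.
bstr = lambda x: str(x, 'latin-1')

# latin-1 round-trips every possible byte value without loss.
all_bytes = bytes(range(256))
as_text = bstr(all_bytes)
back_again = as_text.encode('latin-1')

# Hypothetical use: pass protocol bytes through a str-based stdlib
# function, then return to bytes for I/O.
base = b"http://example.com/app/"
newurl = urljoin(bstr(base), 'subdir').encode('latin-1')
```

This is only safe while the data really is meant to be treated as single-byte text; the resulting str says nothing about which encoding the bytes were originally in.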
Re: [Python-Dev] email package status in 3.X
On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby p...@telecommunity.com wrote: At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote: The problem comes exactly where you find it: when *porting* existing code that uses aforementioned ways to alleviate the pain, you find that the hacks no longer work and a properly layered design is needed that clearly distinguishes between which variables contain bytes and which text. Actually, I would say that it's more that (in the network protocol case) we *have* bytes, some of which we would like to *treat* as text, yet do not wish to constantly convert back and forth to full-blown unicode -- especially since the protocols themselves designate ASCII or latin-1 at the transport layer (sometimes with odder encodings above, but these already have to be explicitly dealt with by existing code). While reading over this thread, I'm wondering whether at least my (WSGI-related) problems in this area would be solved by the availability of a type (say bstr) that was simply a wrapper providing string-like behavior over an underlying bytes, byte array, or memoryview, that would produce objects of compatible type when combined with strings (by encoding them to match). Then, I could wrap bytes with it to pass them to string operations, and then feed them back into everything else. The bstr type ideally would be directly compatible with bytes I/O, or at least have a .bytes attribute that would be. It seems like that would reduce WSGI porting issues quite a bit, since it would mostly consist of throwing extra bstr() calls in where things are breaking, and maybe grabbing the .bytes attribute for I/O. This approach would still be explicit as to what types you're working with, but would not require O(n) *conversions* at every interaction boundary. It would be limited, of course, to single-byte encodings with all characters (0-255) valid. 
OTOH, maybe there should just be a bytestrings module with bytestrings.ascii and bytestrings.latin1, and between the two that should cover the network protocol needs quite well. Actually, if the Python 3 str() constructor could do O(1) conversion for the latin-1 case (i.e., just wrapped the underlying bytes), I would just put, bstr = lambda x: str(x,'latin-1') at the top of my programs and have roughly the same effect. This idea is still a bit half-baked, but a more baked version might be just the ticket for porting stuff that used str to work with bytes in 2.x, if only because writing, e.g.: newurl = bstr(urljoin(bstr(base), 'subdir')) seems so much saner than writing *this* everywhere: newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1') It is perhaps a bit late to propose this idea, since ideally we would also want to use it in 2.x to aid porting. But I'm curious if any other people here experiencing byte/unicode woes in relation to network protocols would find this a solution to their chief frustration. (i.e., that the stdlib often insists now on strings, where effectively bytes were usable before, and thus one must do conversions both coming and going.)

I hate to reply with a simple +1 - but I've heard this pain and proposal from a frightening number of people. Something which allowed you to use bytes with some of the string methods would go a really long way to solving a lot of people's Python 3 pain. I don't relish the idea that once people start moving over, there might be a billion implementations of things like this.

jesse
Re: [Python-Dev] email package status in 3.X
On Sun, Jun 20, 2010 at 10:57:05AM -0700, Guido van Rossum wrote:

Education is needed. When you search Google (or Bing, for that matter :-) for python unicode the first hit is http://www.amk.ca/python/howto/unicode, which is highly detailed but probably too much information for the typical person faced with a UnicodeError exception traceback (that page is also focused on Python 2). What we need is a cookbook on how to deal with various common

Eep! That should be directed to http://docs.python.org/howto/unicode.html, the copy that's actually incorporated in the Python docs. I'll fix that immediately.

Regarding a smaller document for people who hit a UnicodeError exception: could we write a little Unicode FAQ for python.org?

--amk
Re: [Python-Dev] email package status in 3.X
On 6/20/2010 8:26 AM, Giampaolo Rodolà wrote:

I attempted to port pyftpdlib to python 3 several times and the biggest show stopper has always been the bytes / string difference introduced by Python 3 which forces you to *know* and *use* Unicode every time you deal with some text and 2to3 is completely useless here.

I believe the advice in the wiki porting page is to use unicode() and bytes() but never str(), in a version that runs in 2.6. Then 2to3 should do fine. For 2.5 and earlier, add 'bytes = str' somewhere. 2to3 still gets patches, I believe, when someone exhibits code that could and ought to be converted but is not. I suspect that if you posted 'Problems porting pyftpdlib to Python3', you would get some help. If it involved inadequacies in the current tools and guides, it would be on-topic here. Or try python-list.

The choice of forcing the user to use Unicode and think in Unicode was a very brave one, and I'm sure it's for the better, but not everyone wants to deal with that because Unicode is hard to swallow.

I felt that way until my daughter decided to switch from Spanish to Japanese for her foreign language. Once I quit fighting it, it became much easier to swallow and learn. As it turns out, thinking in Unicode is a pretty straightforward generalization of thinking in ascii. There are some annoying glitches due to the need to accommodate legacy systems. The plethora of legacy encodings for various subsets, besides ascii, is also a nuisance.

The majority of people who use latin-char alphabets prefer to stay with bytes and eventually learn and introduce Unicode only when that is actually needed.

The example at http://code.google.com/p/pyftpdlib/ uses names and filenames. Without unicode, these are restricted to ascii, unless you use multiple encodings, which to me would be worse.
Terry Jan Reedy
Re: [Python-Dev] email package status in 3.X
On 6/20/2010 1:30 PM, Stephen J. Turnbull wrote:

The topic on #python seems unlikely to change at this point

I just verified that, thanks to Laurens and whoever, it has been. It is now rather good.

Terry Jan Reedy
Re: [Python-Dev] email package status in 3.X
On Sun, Jun 20, 2010 at 11:30 PM, Terry Reedy tjre...@udel.edu wrote:

On 6/20/2010 8:26 AM, Giampaolo Rodolà wrote: I attempted to port pyftpdlib to python 3 several times and the biggest show stopper has always been the bytes / string difference introduced by Python 3 which forces you to *know* and *use* Unicode every time you deal with some text and 2to3 is completely useless here.

I believe the advice in the wiki porting page is to use unicode() and bytes() but never str(), in a version that runs in 2.6. Then 2to3 should do fine. For 2.5 and earlier, add 'bytes = str' somewhere.

Really? I thought you were supposed to call encode/decode methods on the appropriate thing, depending on whether they're coming from a byte source or a character source. The problems arise when you're doing things like paths, which I believe are bytes on *nix and proper Unicode on Windows (which basically just means they enforce an encoding, UTF-16 if I'm not mistaken). I don't actually use Windows so I might be completely wrong here.

2to3 still gets patches, I believe, when someone exhibits code that could and ought to be converted but is not. I suspect that if you posted 'Problems porting pyftpdlib to Python3', you would get some help. If it involved inadequacies in the current tools and guides, it would be on-topic here. Or try python-list.

The choice of forcing the user to use Unicode and think in Unicode was a very brave one, and I'm sure it's for the better, but not everyone wants to deal with that because Unicode is hard to swallow.

I felt that way until my daughter decided to switch from Spanish to Japanese for her foreign language. Once I quit fighting it, it became much easier to swallow and learn. As it turns out, thinking in Unicode is a pretty straightforward generalization of thinking in ascii. There are some annoying glitches due to the need to accommodate legacy systems. The plethora of legacy encodings for various subsets, besides ascii, is also a nuisance.
I think doing unicode/str properly in 2.x is very important; #python stresses it quite often. I think Py3k's strictness is a good idea because people very often write something that appears to work for a long time, and then someone tries it using funny bytes, and everything blows apart. Convincing people their software is wrong when everything worked five minutes ago is really hard :-)

You'd be surprised how long it can take before some of these problems are found. A couple of weeks ago in #python we had exactly this problem when we were helping Blender folks. There was a bug report from a German Blender user; it turns out Blender ignores unicode in some critical spot, making importing between people who disagree on charsets impossible. And Blender isn't exactly a project that's two weeks old and filled with idiots :) The downside is that *fixing* them then becomes a nontrivial task.

The central problem is probably that a lot of people don't understand Unicode. Recently I learned that even Tanenbaum got it wrong in his latest revision of the computer networks book! (Although that might just be my Dutch translation of it being bad.)

Terry Jan Reedy

Laurens
Re: [Python-Dev] email package status in 3.X
> I hate to reply with a simple +1 - but I've heard this pain and proposal from a frightening number of people, something which allowed you to use bytes with some of the string methods would go a really long way to solving a lot of people's python 3 pain. I don't relish the idea that once people start moving over, there might be a billion implementations of things like this.

My concern with it would be creating the temptation to use these new objects that can't tolerate multibyte or variable character length encodings when the general string type was more relevant (thus to some degree perpetuating Python 2.x issues with incomplete Unicode handling).

Perhaps if people could identify which specific string methods are causing problems? In 3.2, there really aren't that many differences between the available methods for strings and bytes:

    >>> set(dir(str)) - set(dir(bytes))
    {'isprintable', 'format', '__mod__', 'encode', 'isidentifier', '_formatter_field_name_split', 'isnumeric', '__rmod__', 'isdecimal', '_formatter_parser'}
    >>> set(dir(bytes)) - set(dir(str))
    {'decode', 'fromhex'}

Cheers, Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
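The comparison above can be rerun on any Python 3 interpreter with a few lines. This is a minimal sketch; the exact contents of the two sets depend on the interpreter version, so the output will likely differ from the 3.2 listing quoted in the message.

```python
# Compare the attribute names of str and bytes on the running interpreter.
str_only = set(dir(str)) - set(dir(bytes))
bytes_only = set(dir(bytes)) - set(dir(str))
shared = set(dir(str)) & set(dir(bytes))

print("str only:  ", sorted(str_only))
print("bytes only:", sorted(bytes_only))
print("shared public methods:",
      len([name for name in shared if not name.startswith("_")]))
```

Methods such as split, strip and startswith are in the shared set, which is the point being made: most of the text-like API is available on both types.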
Re: [Python-Dev] email package status in 3.X
On 6/20/2010 4:10 PM, Jesse Noller wrote:
> On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby p...@telecommunity.com wrote:
>> While reading over this thread, I'm wondering whether at least my (WSGI-related) problems in this area would be solved by the availability of a type (say bstr) that was simply a wrapper providing string-like behavior over an underlying bytes, byte array, or memoryview, that would produce objects of compatible type when combined with strings (by encoding them to match).
>
> I hate to reply with a simple +1 - but I've heard this pain and proposal from a frightening number of people, something which allowed you to use bytes with some of the string methods would go a really long way to solving a lot of people's python 3 pain. I don't relish the idea that once people start moving over, there might be a billion implementations of things like this.

Given that the 3.x bytes and bytearray classes do retain text methods like .capitalize(), which are meaningless for arbitrary binary data, it is not clear to me what you are asking for or what problem a new class would solve. I am curious though.

Terry Jan Reedy
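As a small illustration of the point about bytes keeping text methods, the sketch below shows them working on ASCII data and leaving non-ASCII bytes alone. The header value is made up for the example.

```python
# bytes keeps many str-like methods, but they only understand ASCII.
header = b"content-type: text/html"
capitalized = header.capitalize()        # first byte upper, rest lower
name, value = header.split(b": ")        # arguments must be bytes too

# Non-ASCII bytes are passed through untouched: the two UTF-8 bytes of
# "é" are not letters as far as bytes.upper() is concerned.
raw = "é".encode("utf-8")
unchanged = raw.upper()
as_text = "é".upper()                    # str knows this is a letter
```

So the methods exist on bytes, but they treat the data as ASCII-compatible octets rather than as characters in some encoding.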
Re: [Python-Dev] email package status in 3.X
On Jun 20, 2010, at 6:21 PM, Terry Reedy tjre...@udel.edu wrote:
> On 6/20/2010 4:10 PM, Jesse Noller wrote:
>> On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby p...@telecommunity.com wrote:
>>> While reading over this thread, I'm wondering whether at least my (WSGI-related) problems in this area would be solved by the availability of a type (say bstr) that was simply a wrapper providing string-like behavior over an underlying bytes, byte array, or memoryview, that would produce objects of compatible type when combined with strings (by encoding them to match).
>>
>> I hate to reply with a simple +1 - but I've heard this pain and proposal from a frightening number of people, something which allowed you to use bytes with some of the string methods would go a really long way to solving a lot of people's python 3 pain. I don't relish the idea that once people start moving over, there might be a billion implementations of things like this.
>
> Given that the 3.x bytes and bytearray classes do retain text methods like .capitalize(), which are meaningless for arbitrary binary data, it is not clear to me what you are asking for or what problem a new class would solve. I am curious though.

Ask the web-sig and wsgi folks for starters. I know they've experienced non-zero pain.
Re: [Python-Dev] email package status in 3.X
On Mon, 21 Jun 2010 08:01:08 am Laurens Van Houtven wrote:
> I think doing unicode/str properly in 2.x is very important, #python stresses it quite often, I think Py3k's strictness is a good idea because people very often write something that appears to work for a long time, and then someone tries it using funny bytes, and everything blows apart. Convincing people their software is wrong when everything worked five minutes ago is really hard :-)

Worse is when you have people who, when faced with their software failing to handle filenames containing non-ASCII characters (those funny letters), insist that the problem is the user for giving non-ASCII characters. Even when they're in the user's native (non-Latin) language. Even when the OS supports them. Gah.

-- 
Steven D'Aprano
Re: [Python-Dev] email package status in 3.X
At 08:08 AM 6/21/2010 +1000, Nick Coghlan wrote:
> Perhaps if people could identify which specific string methods are causing problems?

__getitem__(int) returns an integer rather than a bytestring, so anything that manipulates individual characters can't be given bytes and have it work. That was one of the key differences I had in mind for a bstr type, apart from designing it to coerce normal strings to bstrs in cross-type operations, and to allow O(1) conversion to/from bytes.

Another randomly chosen byte/string incompatibility (Python 3.1; I don't have 3.2 handy at the moment):

    >>> os.path.join(b'x','y')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\Python31\lib\ntpath.py", line 161, in join
        if b[:1] in seps:
    TypeError: Type str doesn't support the buffer API
    >>> os.path.join('x',b'y')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\Python31\lib\ntpath.py", line 161, in join
        if b[:1] in seps:
    TypeError: 'in <string>' requires string as left operand, not bytes

Ironically, it seems to me that in trying to make the type distinction more rigid, Py3K fails in this area precisely because it is not a rigidly typed language in the Java or Haskell sense: i.e., os.path.join doesn't say, "I need two stringlike objects of the *same type*", not even in its docstring.

At least in Java, you would either implement a path type with coercions from bytes and strings, or you'd have a class with overloaded methods for handling join operations on bytes and strings, respectively, thereby avoiding this whole mess.

(Alas, this little example on the 'in' operator also shows that my bstr effort would probably fail anyway, because there's no '__rcontains__' (__lcontains__?) to allow it to override the str type's __contains__.)
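Both incompatibilities described above can be reproduced in a few lines. This is a sketch for a current Python 3; the exact wording of the TypeError differs between versions and platforms, so only the exception type is checked. The helper first_element is hypothetical and only shows the usual slicing workaround.

```python
import os.path

data = b"abc"
indexed = data[0]      # an int (97) in Python 3, not a length-1 bytes
sliced = data[0:1]     # slicing still gives bytes: b"a"


def first_element(s):
    """Return the first element with the same type as s (str or bytes)."""
    return s[:1]


# Joining works when both arguments have the same type ...
joined_text = os.path.join("x", "y")
joined_bytes = os.path.join(b"x", b"y")

# ... and raises TypeError when str and bytes are mixed.
try:
    os.path.join(b"x", "y")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
```

Code that must accept either type therefore has to slice rather than index, and has to pick one type per call rather than relying on coercion.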
Re: [Python-Dev] email package status in 3.X
Steven D'Aprano writes: Frankly, I believe that pushing the meme that Python 3 is different is a strategic mistake. I agree that it's strategically undesirable. Unfortunately, the genuine backward incompatibility, as well as the huge mind-share already garnered by what I consider wrong-headed advice from certain quarters have made pushing the meme that Python 3 is very nearly the same untenable. It's hard to beat something like it's not yet time to use Python 3 with a nuanced explanation. had my experience would have been different. It's bad enough to have to tell people Python 3 is currently lacking some critical libraries, particularly third-party libraries without also telling them (wrongly IMO) oh, and it's a new language too. That's why I propose the C to C++ analogy. True, C++ does introduce a lot of new features, but most programmers migrating from C to C++ don't learn to use them properly for years, if ever, I'm told. Note also that I don't propose this as PSF advertising. I proposed it as a response to Mark's question, what should I tell my readers? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
On Sun, 20 Jun 2010 18:14:02 +0900 Stephen J. Turnbull step...@xemacs.org wrote: had my experience would have been different. It's bad enough to have to tell people Python 3 is currently lacking some critical libraries, particularly third-party libraries without also telling them (wrongly IMO) oh, and it's a new language too. That's why I propose the C to C++ analogy. I think it's an unfortunate analogy. C++ needs new libraries (with brand new APIs) to take advantage of its abstraction capabilities. Python 3 has almost the same abstraction capabilities as Python 2, you don't need to write new libraries: just port the existing ones. True, C++ does introduce a lot of new features, but most programmers migrating from C to C++ don't learn to use them properly for years, if ever, I'm told. I don't see how Python 3 has that problem. You can be productive here and now in Python 3, re-using your knowledge of Python 2 with a bit of added information. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
On Sun, Jun 20, 2010 at 7:32 PM, Antoine Pitrou solip...@pitrou.net wrote: True, C++ does introduce a lot of new features, but most programmers migrating from C to C++ don't learn to use them properly for years, if ever, I'm told. I don't see how Python 3 has that problem. You can be productive here and now in Python 3, re-using your knowledge of Python 2 with a bit of added information. Yeah, the significant issues with Python 3 over Python 2 *only* apply to people with legacy Python 2 code to worry about. The one thing that makes Python 3 potentially less desirable than Python 2 for some new applications is that the third party library support isn't quite as good yet. As more of the big libraries and frameworks provide Python 3 compatible versions, that factor will go away. As far as I can tell, with 3 years still to go on my own original prediction of 5+ years for Python 3 to start to be competitive with Python 2 for programming mindshare, adoption actually seems to be progressing fairly well. A lot of key functionality is either already supported in Python 3 or will be soon, and a lot of the rest is at least talking about plans for Python 3 compatibility. It's just that 5 years can seem like an eternity in the internet age, so sometimes people see the relative lack of adoption of Python 3 at this stage and start to panic about it being a failure. Now, if we're still having this conversation in 2013, then I'll admit we have a problem with the Python 3 uptake rate ;) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
2010/6/20 Steven D'Aprano st...@pearwood.info: Python 2.x introduced Unicode strings. Python 3.x merely makes them the default. Merely? To me this looks as the main reason why a lot of projects haven't been ported to Python 3 yet. I attempted to port pyftpdlib to python 3 several times and the biggest show stopper has always been the bytes / string difference introduced by Python 3 which forces you to *know* and *use* Unicode every time you deal with some text and 2to3 is completely useless here. I can only imagine how difficult can it be to do such a conversion in a project like Twisted or Django where the I/O plays a fundamental role. The choice of forcing the user to use Unicode and think in Unicode was a very brave one, and I'm sure it's for the better, but not everyone wants to deal with that because Unicode is hard to swallow. The majority of people prefer to stay with bytes and eventually learn and introduce Unicode only when that is actually needed. --- Giampaolo http://code.google.com/p/pyftpdlib http://code.google.com/p/psutil ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
Michael Foord wrote:

I didn't make myself clear. The expected disappointment I was referring to was about the rate of adoption, not about the quality of the product.

I'm still baffled as to how a bug in the cgi module (along with the acknowledged email problems) is such a big deal. Was it reported and then languished in the bug tracker? That would be bad on its own but if it was only recently discovered that indicates that it probably isn't such a big deal - either way it needs fixing, but using Python for writing cgis hasn't been a big use case for a long time.

FWIW: some APIs in the cgi module are actually used by a number of Python2 web frameworks and libraries: Paste, for instance, uses it, and is in turn used by BFG, Pylons, TurboGears. Zope has used it that way since for ever.

Tres.

-- 
Tres Seaver +1 540-429-0999 tsea...@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
Re: [Python-Dev] email package status in 3.X
l...@rmi.net writes: I agree that 3.X isn't all bad, and I very much hope it succeeds. And no, I have no answers; I'm just reporting the perception from downwind. The fact is, though, that many of your downwind readers are not the audience for Python 3, not yet. If you want to do Python 3 a favor, make sure that they understand that Python 3 is *not* an upgrade of Python 2. It's a hard task for you, but IMO one strategy is to write in the style that we wrote the DVCS PEP (#374) in: here's how you do the same task in these similar languages. And just as git and Bazaar turned out to have fatal defects in terms of adoption *in that time frame*, Python 3 is not yet adoptable for many, many users. Python 3 is a Python-2-like language, but even though it's built on the same design principles, and uses nearly identical syntax, there are fundamental differences. And it is *very* young. So it's a new language and should be approached in the same way as any new language. Try it on non-mission critical projects, on projects where its library support has a good reputation, etc. Many of your readers have no time (or perhaps no approval from upstairs) for that kind of thing. Too bad, but that's what happens to every great new language. So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that? Why should she make anything of that? Python 3 is a *new* language, possibly as different from Python 2 as C++ was from C (and *more* different in terms of fundamental incompatibilities). And as long as C++ was almost entirely dependent on C libraries, there were problems. 
(Not to mention that even today there are plenty of programmers who are proud to be C programmers, not C++ programmers.) Today, Python 3 is entirely dependent on Python 2 libraries. It's human to hope there will be no problems, but not realistic. BTW, I think what you're missing is that you're wrong about the money. Python 3 is still about the fun and the code. Fun and code are why the core developers spent about five years developing it, because doing that was fun, because the new code has high value as code, and because it promised *them* a more fun and more productive future. Library support, on the other hand, *is* about money. Your readers, down in the trenches of WWW, intraweb, and sysadmin implementation and support, depend on robust libraries to get their day jobs done. They really don't care that writing Python 3 was fun, and that programming in Python 3 is more fun than ever. That doesn't compensate for even one lingering str/bytes bogosity to most of them, and since they don't get paid for fixing Python library bugs, they don't, and they're in no mood to *forgive* any, either. So tell users who feel that way to use Python 2, for now, and check on Python 3 progress every 6 months or so. And users who are just a bit more adventurous to stick to applications where the libraries already have a good reputation *in Python 3*. It's as simple as that, I think. Regards, ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Jesse Noller wrote: On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby p...@telecommunity.com wrote: At 05:22 PM 6/18/2010 +, l...@rmi.net wrote: So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that? Certainly, this was my impression as well, after all the Web-SIG discussions regarding the state of the stdlib in 3.x with respect to URL parsing, joining, opening, etc. Nothing is set in stone; if something is incredibly painful, or worse yet broken, then someone needs to file a bug, bring it to this list, or bring up a patch. Or walk away. This is code we're talking about - nothing is set in stone, and if something is criminally broken it needs to be first identified, and then fixed. To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that actually addresses these kinds of stdlib usage issues, so that I don't have to think about it or futz around with experimenting, possibly to find that some things can't be done at all. I guess tutorial welcome, rather than patch welcome then ;) The only folks who can write the tutorial are the ones who have already drunk the koolaid. Note that I've been making my living with Python for about twelve years now, and would *like* to use Python3, but can't, yet, and therefore haven't taken the first sip. IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious ways to do it, but, as per the Zen of Python, that way may not be obvious at first unless you're Dutch. ;-) What areas. We need specifics which can either be: 1 Shot down. 2 Turned into bugs, so they can be fixed 3 Documented in the core documentation. 
That's bloody ironic in a thread which had pointed at reasons why people are not even considering Py3 for their projects: those folks won't even find the issues due to the lack of confidence in the suitability of the platform. Tres. - -- === Tres Seaver +1 540-429-0999 tsea...@palladion.com Palladion Software Excellence by Designhttp://palladion.com -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwc0I0ACgkQ+gerLs4ltQ6aDgCguYv+BXou0a42Yi7ERGCHOfIv 6REAnjejq4LDbE9c/gCqB+xs1yGfQ4KR =/9fw -END PGP SIGNATURE- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
On Jun 19, 2010, at 10:13 AM, Tres Seaver tsea...@palladion.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Jesse Noller wrote: On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby p...@telecommunity.com wrote: At 05:22 PM 6/18/2010 +, l...@rmi.net wrote: So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that? Certainly, this was my impression as well, after all the Web-SIG discussions regarding the state of the stdlib in 3.x with respect to URL parsing, joining, opening, etc. Nothing is set in stone; if something is incredibly painful, or worse yet broken, then someone needs to file a bug, bring it to this list, or bring up a patch. Or walk away. Ok. If you want. This is code we're talking about - nothing is set in stone, and if something is criminally broken it needs to be first identified, and then fixed. To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that actually addresses these kinds of stdlib usage issues, so that I don't have to think about it or futz around with experimenting, possibly to find that some things can't be done at all. I guess tutorial welcome, rather than patch welcome then ;) The only folks who can write the tutorial are the ones who have already drunk the koolaid. Note that I've been making my living with Python for about twelve years now, and would *like* to use Python3, but can't, yet, and therefore haven't taken the first sip. Why can't you? Is it a bug? Let's file it and fix it. Is it that you need a dependency ported? Cool - let's bring it up to the maintainers, or this list, or ask the PSF to push resources into helping port. Anything but nothing. 
If what you're saying is that python 3 is a completely unsuitable platform, well, then yeah - we can all fix it or walk away. IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious ways to do it, but, as per the Zen of Python, that way may not be obvious at first unless you're Dutch. ;-) What areas. We need specifics which can either be: 1 Shot down. 2 Turned into bugs, so they can be fixed 3 Documented in the core documentation. That's bloody ironic in a thread which had pointed at reasons why people are not even considering Py3 for their projects: those folks won't even find the issues due to the lack of confidence in the suitability of the platform. What I saw was a thread about some issues in email, and cgi. We have some work being done to address the issue. This will help resolve some of the issues. I'd there are other issues, then we should step up and either help, or get out ofthe way. Arguing about the viability of a platform we knew would take a bit for adoption is silly and breeds ill will. It's not a turd, and it's not hopeless, in fact rumor has it NumPy will be ported soon which is a major stepping stone. The only way to counteract this meme that python 3 is horribly broken is to prove that it's not, fix bugs, and move on. There's no point debating relative turdiness here. Jesse ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
On Sat, Jun 19, 2010 at 10:59 AM, Jesse Noller jnol...@gmail.com wrote: On Jun 19, 2010, at 10:13 AM, Tres Seaver tsea...@palladion.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Jesse Noller wrote: On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby p...@telecommunity.com wrote: At 05:22 PM 6/18/2010 +, l...@rmi.net wrote: So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that? Certainly, this was my impression as well, after all the Web-SIG discussions regarding the state of the stdlib in 3.x with respect to URL parsing, joining, opening, etc. Nothing is set in stone; if something is incredibly painful, or worse yet broken, then someone needs to file a bug, bring it to this list, or bring up a patch. Or walk away. Ok. If you want. This is code we're talking about - nothing is set in stone, and if something is criminally broken it needs to be first identified, and then fixed. To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that actually addresses these kinds of stdlib usage issues, so that I don't have to think about it or futz around with experimenting, possibly to find that some things can't be done at all. I guess tutorial welcome, rather than patch welcome then ;) The only folks who can write the tutorial are the ones who have already drunk the koolaid. Note that I've been making my living with Python for about twelve years now, and would *like* to use Python3, but can't, yet, and therefore haven't taken the first sip. Why can't you? Is it a bug? Let's file it and fix it. Is it that you need a dependency ported? Cool - let's bring it up to the maintainers, or this list, or ask the PSF to push resources into helping port. 
Anything but nothing. If what you're saying is that python 3 is a completely unsuitable platform, well, then yeah - we can all fix it or walk away. IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious ways to do it, but, as per the Zen of Python, that way may not be obvious at first unless you're Dutch. ;-) What areas. We need specifics which can either be: 1 Shot down. 2 Turned into bugs, so they can be fixed 3 Documented in the core documentation. That's bloody ironic in a thread which had pointed at reasons why people are not even considering Py3 for their projects: those folks won't even find the issues due to the lack of confidence in the suitability of the platform. What I saw was a thread about some issues in email, and cgi. We have some work being done to address the issue. This will help resolve some of the issues. I'd there are other issues, then we should step up and either help, or get out ofthe way. Arguing about the viability of a platform we knew would take a bit for adoption is silly and breeds ill will. s/I'd/If - stupid phone. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
At 10:55 PM 6/19/2010 +0900, Stephen J. Turnbull wrote: They really don't care that writing Python 3 was fun, and that programming in Python 3 is more fun than ever. That doesn't compensate for even one lingering str/bytes bogosity to most of them, and since they don't get paid for fixing Python library bugs, they don't, and they're in no mood to *forgive* any, either. This is pretty much where I'm at, except that the only potential fun increase Py3 appears to offer me are argument annotations and keyword-only args -- but these are partly balanced by the loss of argument tuple unpacking. The metaclass keyword argument is nice, but the loss of dynamically-settable __metaclass__ is just plain annoying. Really, just about everything that Py3 offers in the way of added fun, seems offset by a matching loss somewhere else. So it's hard to get excited about it - it seems like, ho hum, a new language that's kind of like Python, but just different enough to be annoying. OTOH, I don't know what to do about that, besides adding some sort of killer app feature that makes Python 3 the One Obvious Way to do some specific application domain. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
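For readers who have not used the features being weighed above, here is a minimal sketch of each in Python 3. All names (connect, norm_squared, Registry, Plugin) are invented for the illustration.

```python
# Argument annotations and a keyword-only argument (everything after
# the bare * must be passed by name).
def connect(host: str, port: int = 21, *, timeout: float = 30.0) -> str:
    return "%s:%d (timeout %.1f)" % (host, port, timeout)


ok = connect("example.org", timeout=5)

try:
    connect("example.org", 21, 5)      # timeout passed positionally
    positional_rejected = False
except TypeError:
    positional_rejected = True


# Python 2 allowed "def norm_squared((x, y)):". In Python 3 the tuple
# has to be unpacked inside the body instead.
def norm_squared(point):
    x, y = point
    return x * x + y * y


# The metaclass is now chosen with a keyword in the class statement;
# assigning __metaclass__ in the class body has no effect in Python 3.
class Registry(type):
    classes = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.classes.append(name)
        return cls


class Plugin(metaclass=Registry):
    pass
```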
Re: [Python-Dev] email package status in 3.X
On Jun 18, 2010, at 7:39 PM, Terry Reedy wrote:
> On 6/18/2010 6:51 PM, Raymond Hettinger wrote:
>> There has been a disappointing lack of bug reports across the board for 3.x.
>
> Here is one from this week involving the interaction of array and bytearray. It needs a comment from someone who can understand the C-API based patch, which is beyond me. http://bugs.python.org/issue8990

I'll take a look at this one.

Raymond

P.S. For those who are interested, here is the story on BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/3.1-problems.html
Re: [Python-Dev] email package status in 3.X
On Sun, 20 Jun 2010 12:13:34 am Tres Seaver wrote:
>> I guess tutorial welcome, rather than patch welcome then ;)
>
> The only folks who can write the tutorial are the ones who have already drunk the koolaid. Note that I've been making my living with Python for about twelve years now, and would *like* to use Python3, but can't, yet, and therefore haven't taken the first sip.

You emphatically say you would like to use Python3, but describe those who already have as having drunk the Koolaid. Comparing those who can and have successfully moved to Python3 with the Jonestown cult mass-suicide doesn't really strike me as a sign that you want to join them.

-- 
Steven D'Aprano
Re: [Python-Dev] email package status in 3.X
l...@rmi.net writes: FWIW, after rewriting Programming Python for 3.1, 3.x still feels a lot like a beta to me, almost 2 years after its release. Email, of course, is a big wart. But guess what? Python 2's email module doesn't actually work! Sure, the program runs most of the time, but every program that depends on email must acquire inches of armorplate against all the things that can go wrong. You simply can't rely on it to DTRT except in a pre-MIME, pre-HTML, ASCII-only world. Although they're often addressing general problems, these hacks are *not* integrated back into the email module in most cases, but remain app-specific voodoo. If you live in Kansas, sure, you can concentrate on dodging tornados and completely forget about Unicode and MIME and text/bogus content. For the rest of the world, though, the problem is not Python 3. It's STD 11 (which still points at RFC 822, dated 1982!) It's really inappropriate to point at the email module, whose developers are trying *not* to punt on conformance and robustness, when even the IETF can only run in circles, scream and shout! Maybe there are other problems with Python 3 that deserve to be pointed at, but given the general scarcity of resources I think the email module developers are working on the right things. Unlike many other modules, email really needs to be rewritten from the ground (Python 3) up, because of the centrality of bytes/unicode confusion to all email problems. Python 3 completely changes the assumptions there; a Python 2-style email module really can't work properly. Then on top of that, today we know a lot more about handling issues like text/html content and MIME in general than when the Python 2 email module was designed. New problems have arisen over the period of Python 3 development, like domain keys, which email doesn't handle out of the box AFAIK, but email for Python 3 should IMHO. Should Python 3 have been held back until email was fixed? 
Dunno, but I personally am very glad it was not; where I have a choice, I always use Python 3 now, and have yet to run into a problem. I expect that to change if I can find the time to get involved in email and Mailman 3 development, of course. <wink>
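As a small illustration of the bytes/unicode point made in this message, here is a minimal sketch using the standard `email.header` helpers as they exist in current Python 3. It is not code from the thread: the sample header value is invented, and the behaviour of the 3.1-era package discussed here may differ in detail.

```python
from email.header import Header, decode_header, make_header

# Hypothetical wire-format header value (an RFC 2047 "encoded word"):
# plain ASCII on the wire, but carrying UTF-8 text.
raw_subject = "=?utf-8?q?caf=C3=A9_menu?="

# decode_header() yields (payload, charset) pairs; for an encoded word the
# payload is bytes, because the charset still has to be applied.
parts = decode_header(raw_subject)

# make_header() plus str() applies the charset and produces real text.
subject_text = str(make_header(parts))

# Going the other way, text has to be encoded again before it hits the wire.
wire_again = Header(subject_text, charset="utf-8").encode()
```

The point of the sketch is only that a header crosses the bytes/text boundary twice, once in each direction, which is where a Python 2-style "everything is a str" design has nowhere to record the charset.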
Re: [Python-Dev] email package status in 3.X
Replying en masse to save bandwidth here... Barry Warsaw ba...@python.org writes: We know it, we have extensively discussed how to fix it, we have IMO a good design, and we even have someone willing and able to tackle the problem. We need to find a sufficient source of funding to enable him to do the work it will take, and so far that's been the biggest stumbling block. It will take a focused and determined effort to see this through, and it's obvious that volunteers cannot make it happen. I include myself in the latter category, as I've tried and failed at least twice to do it in my spare time. All understood, and again, not to disparage anyone here. My comments are directed to the development community at large to underscore the grave p/r problems 3.X faces. I realize email parsing is a known issue; I also realize that most people evaluating 3.X today won't care that it is. Most will care only that the new version of a language reportedly used by Google and YouTube still doesn't support CGI uploads a year and a half after its release. As an author, that's a downright horrible story to have to tell the world. Stephen J. Turnbull step...@xemacs.org writes: Email, of course, is a big wart. But guess what? Python 2's email module doesn't actually work! Yes it does (see next point). If you live in Kansas, sure, you can concentrate on dodging tornados and completely forget about Unicode and MIME and text/bogus content. For the rest of the world, though, the problem is not Python 3 Yes it is, and Kansas is a lot bigger than you seem to think. I want to reiterate that I was able to build a feature rich email client with the email package as it exists in 3.1. This includes support on both the receiving and sending sides for HTML, arbitrary attachments, and decoding and encoding of both text payloads and headers according to email, MIME, and Unicode/I18N standards. It's an amazingly useful package, and does work as is in 3.X. 
The two main issues I found have been recently fixed. It's unfortunate that this package is also the culprit behind CGI breakage, but it's not clear why it became a critical path for so much utility in the first place. The package might not be aesthetically ideal, but to me it seems that an utterly incompatible overhaul of this in the name of supporting potentially very different data streams is a huge functional overload. And to those people in Kansas who live outside the pydev clique, replacing it with something different at this point will look as if an incompatible Python is already incompatible with releases in its own line. Why in the world would anyone base a new project on that sort of thrashing? For my part, I've had to add far too many notes to the upcoming edition of Programming Python about major pieces of functionality that worked in 2.X but no longer do in 3.X. That's disappointing to me personally, but it will probably seem a lot worse to the book's tens of thousands of readers. Yet this is the reality that 3.X has created for itself. Should Python 3 have been held back until email was fixed? Dunno, but I personally am very glad it was not; where I have a choice, I always use Python 3 now, and have yet to run into a problem. I guess we'll just have to disagree on that. IMHO, Python 3 shot itself in the foot by releasing in half-baked form. And the 3.0 I/O speed issue (remember that?) came very close to blowing its leg clean off. The reality out there in Kansas today is that 3.X is perceived as so bad that it could very well go the way of POP4 if its story does not improve. I don't know what sort of Python world will be left behind in the wake, but I do know it will probably be much smaller. Steve Holden st...@holdenweb.com writes: Lest the readership think that the PSF is unaware of this issue, allow me to point out that we have already partially funded this effort, and are still offering R. 
David Murray some further matching funds if he can raise sponsorship to complete the effort (on which he has made a very promising start). We are also attempting to enable tax-deductible fund raising to increase the likelihood of David's finding support. Perhaps we need to think about a broader campaign to increase the quality of the Python 3 libraries. I find it very annoying that the #python IRC group still has "Don't use Python 3" in its topic. They adamantly refuse to remove it until there is better library support, and they are the guys who see the issues day in, day out, so it is hard to argue with them (and I don't think an autocratic decision-making process would be appropriate). I'm all for people getting paid for work they do, but with all due respect, I think this underscores part of the problem in the Python world today. If funding had been as stringent a prerequisite in the 90s, I doubt there would be a Python today. It was about the fun and the code, not the bucks and the bureaucracy. As
Re: [Python-Dev] email package status in 3.X
On 18/06/2010 16:09, l...@rmi.net wrote: Replying en masse to save bandwidth here... Barry Warsawba...@python.org writes: We know it, we have extensively discussed how to fix it, we have IMO a good design, and we even have someone willing and able to tackle the problem. We need to find a sufficient source of funding to enable him to do the work it will take, and so far that's been the biggest stumbling block. It will take a focused and determined effort to see this through, and it's obvious that volunteers cannot make it happen. I include myself in the latter category, as I've tried and failed at least twice to do it in my spare time. All understood, and again, not to disparage anyone here. My comments are directed to the development community at large to underscore the grave p/r problems 3.X faces. I realize email parsing is a known issue; I also realize that most people evaluating 3.X today won't care that it is. Most will care only that the new version of a language reportedly used by Google and YouTube still doesn't support CGI uploads a year and a half after its release. As an author, that's a downright horrible story to have to tell the world. Really? How widely used is the CGI module these days? Maybe there is a reason nobody appeared to notice... [snip...] Should Python 3 have been held back until email was fixed? Dunno, but I personally am very glad it was not; where I have a choice, I always use Python 3 now, and have yet to run into a problem. I guess we'll just have to disagree on that. IMHO, Python 3 shot itself in the foot by releasing in half-baked form. And the 3.0 I/O speed issue (remember that?) came very close to blowing its leg clean off. Whilst I agree that there are plenty of issues to workon, and I don't underestimate the difficulty of some of them, I think half-baked is very much overblown. Whilst you have a lot to say about how much of a problem this is I don't understand what you are suggesting be *done*? 
Python 3.0 was *declared* to be an experimental release, and by most standards 3.1 (in terms of the core language and functionality) was a solid release. Any reasonable expectation about Python 3 adoption predicted that it would take years, and would include going through a phase of difficulty and disappointment... All the best, Michael Foord -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
Python 3.0 was *declared* to be an experimental release, and by most standards 3.1 (in terms of the core language and functionality) was a solid release. Any reasonable expectation about Python 3 adoption predicted that it would take years, and would include going through a phase of difficulty and disappointment... Declaring something to be a turd doesn't change the fact that it's a turd. I have a feeling that most people outside this list would have much rather avoided the difficulty and disappointment altogether. Let's be honest here; 3.X was released to the community in part as an extended beta. That's not a problem, unless you drop the word "beta". And if you're still not buying that, imagine the sort of response you'd get if you tried to sell software that billed itself as experimental, and promised a phase of disappointment. Why would you expect the Python world to react any differently? Whilst I agree that there are plenty of issues to work on, and I don't underestimate the difficulty of some of them, I think half-baked is very much overblown. Whilst you have a lot to say about how much of a problem this is I don't understand what you are suggesting be *done*? I agree that 3.X isn't all bad, and I very much hope it succeeds. And no, I have no answers; I'm just reporting the perception from downwind. So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that? --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)
Re: [Python-Dev] email package status in 3.X
On 18/06/2010 18:22, l...@rmi.net wrote: Python 3.0 was *declared* to be an experimental release, and by most standards 3.1 (in terms of the core language and functionality) was a solid release. Any reasonable expectation about Python 3 adoption predicted that it would take years, and would include going through a phase of difficulty and disappointment... Declaring something to be a turd doesn't change the fact that it's a turd. Right - but *you're* the one calling it a turd, which is not a helpful approach or likely to achieve *anything* useful. I still have no idea what you are actually suggesting. I have a feeling that most people outside this list would have much rather avoided the difficulty and disappointment altogether. Let's be honest here; 3.X was released to the community in part as an extended beta. Correction - 3.0 was an experimental release. That is not true of 3.1 and future releases. All the best, Michael That's not a problem, unless you drop the word beta. And if you're still not buying that, imagine the sort of response you'd get if you tried to sell software that billed itself as experimental, and promised a phase of disappointment. Why would you expect the Python world to react any differently? Whilst I agree that there are plenty of issues to workon, and I don't underestimate the difficulty of some of them, I think half-baked is very much overblown. Whilst you have a lot to say about how much of a problem this is I don't understand what you are suggesting be *done*? I agree that 3.X isn't all bad, and I very much hope it succeeds. And no, I have no answers; I'm just reporting the perception from downwind. So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. 
What is a common Python user to make of that? --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
Re: [Python-Dev] email package status in 3.X
Giampaolo Rodolà g.rod...@gmail.com wrote: 2010/6/17 Bill Janssen jans...@parc.com: There's a related meta-issue having to do with antique protocols. Can I know what meta-issue are you talking about exactly? Giampaolo, I believe that you and I have already discussed this on one of the FTP issues. Bill ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
2010/6/18 Bill Janssen jans...@parc.com: Giampaolo Rodolà g.rod...@gmail.com wrote: 2010/6/17 Bill Janssen jans...@parc.com: There's a related meta-issue having to do with antique protocols. Can I know what meta-issue you are talking about exactly? Giampaolo, I believe that you and I have already discussed this on one of the FTP issues. Bill I only remember a discussion in which I was against removing OOB data support from asyncore in order to support certain parts of the FTP protocol using it, but that's all. I don't see how urllib or any other stdlib module is supposed to be penalized by the FTP protocol in any way. --- Giampaolo http://code.google.com/p/pyftpdlib http://code.google.com/p/psutil
Re: [Python-Dev] email package status in 3.X
I wasn't calling Python 3 a turd. I was trying to show the strangeness of the logic behind your rationalization. And failing badly... (maybe I should have used tar ball?) What I'm suggesting is that extreme caution be exercised from this point forward with all things 3.X-related. Whether you wish to accept this or not, 3.X has a negative image to many. This suggestion specifically includes not abandoning current 3.X email package users as a case in point. Ripping the rug out from new 3.X users after they took the time to port seems like it may be just enough to tip the scales altogether. --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) -Original Message- From: Michael Foord fuzzy...@voidspace.org.uk To: l...@rmi.net Subject: Re: [Python-Dev] email package status in 3.X Date: Fri, 18 Jun 2010 18:27:46 +0100 On 18/06/2010 18:22, l...@rmi.net wrote: Python 3.0 was *declared* to be an experimental release, and by most standards 3.1 (in terms of the core language and functionality) was a solid release. Any reasonable expectation about Python 3 adoption predicted that it would take years, and would include going through a phase of difficulty and disappointment... Declaring something to be a turd doesn't change the fact that it's a turd. Right - but *you're* the one calling it a turd, which is not a helpful approach or likely to achieve *anything* useful. I still have no idea what you are actually suggesting. I have a feeling that most people outside this list would have much rather avoided the difficulty and disappointment altogether. Let's be honest here; 3.X was released to the community in part as an extended beta. Correction - 3.0 was an experimental release. That is not true of 3.1 and future releases. All the best, Michael That's not a problem, unless you drop the word beta. 
And if you're still not buying that, imagine the sort of response you'd get if you tried to sell software that billed itself as experimental, and promised a phase of disappointment. Why would you expect the Python world to react any differently? Whilst I agree that there are plenty of issues to workon, and I don't underestimate the difficulty of some of them, I think half-baked is very much overblown. Whilst you have a lot to say about how much of a problem this is I don't understand what you are suggesting be *done*? I agree that 3.X isn't all bad, and I very much hope it succeeds. And no, I have no answers; I'm just reporting the perception from downwind. So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that? --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
At 05:22 PM 6/18/2010 +, l...@rmi.net wrote: So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that? Certainly, this was my impression as well, after all the Web-SIG discussions regarding the state of the stdlib in 3.x with respect to URL parsing, joining, opening, etc. To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that actually addresses these kinds of stdlib usage issues, so that I don't have to think about it or futz around with experimenting, possibly to find that some things can't be done at all. IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious ways to do it, but, as per the Zen of Python, that way may not be obvious at first unless you're Dutch. ;-) Since at the moment Python 3 offers me only cosmetic improvements over 2.x (apart from argument annotations), it's hard to get excited enough about it to want to muck about with porting anything to it, or even trying to learn about all the ramifications of the changes. :-( ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] email package status in 3.X
On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby p...@telecommunity.com wrote: At 05:22 PM 6/18/2010 +, l...@rmi.net wrote: So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that? Certainly, this was my impression as well, after all the Web-SIG discussions regarding the state of the stdlib in 3.x with respect to URL parsing, joining, opening, etc.

Nothing is set in stone; if something is incredibly painful, or worse yet broken, then someone needs to file a bug, bring it to this list, or bring up a patch. This is code we're talking about - nothing is set in stone, and if something is criminally broken it needs to be first identified, and then fixed.

To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that actually addresses these kinds of stdlib usage issues, so that I don't have to think about it or futz around with experimenting, possibly to find that some things can't be done at all.

I guess tutorial welcome, rather than patch welcome then ;)

IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious ways to do it, but, as per the Zen of Python, that way may not be obvious at first unless you're Dutch. ;-)

What areas? We need specifics, which can either be:

1. Shot down.
2. Turned into bugs, so they can be fixed.
3. Documented in the core documentation.

jesse
Re: [Python-Dev] email package status in 3.X
On 18/06/2010 19:52, l...@rmi.net wrote: I wasn't calling Python 3 a turd. I was trying to show the strangeness of the logic behind your rationalization. And failing badly... (maybe I should have used tar ball?) I didn't make myself clear. The expected disappointment I was referring to was about the rate of adoption, not about the quality of the product. I'm still baffled as to how a bug in the cgi module (along with the acknowledged email problems) is such a big deal. Was it reported and then languished in the bug tracker? That would be bad ion its own but if it was only recently discovered that indicates that it probably isn't such a big deal - either way it needs fixing, but using Python for writing cgis hasn't been a big use case for a long time. All the best, Michael What I'm suggesting is that extreme caution be exercised from this point forward with all things 3.X-related. Whether you wish to accept this or not, 3.X has a negative image to many. This suggestion specifically includes not abandoning current 3.X email package users as a case in point. Ripping the rug out from new 3.X users after they took the time to port seems like it may be just enough to tip the scales altogether. --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) -Original Message- From: Michael Foordfuzzy...@voidspace.org.uk To: l...@rmi.net Subject: Re: [Python-Dev] email package status in 3.X Date: Fri, 18 Jun 2010 18:27:46 +0100 On 18/06/2010 18:22, l...@rmi.net wrote: Python 3.0 was *declared* to be an experimental release, and by most standards 3.1 (in terms of the core language and functionality) was a solid release. Any reasonable expectation about Python 3 adoption predicted that it would take years, and would include going through a phase of difficulty and disappointment... Declaring something to be a turd doesn't change the fact that it's a turd. Right - but *you're* the one calling it a turd, which is not a helpful approach or likely to achieve *anything* useful. 
I still have no idea what you are actually suggesting. I have a feeling that most people outside this list would have much rather avoided the difficulty and disappointment altogether. Let's be honest here; 3.X was released to the community in part as an extended beta. Correction - 3.0 was an experimental release. That is not true of 3.1 and future releases. All the best, Michael That's not a problem, unless you drop the word beta. And if you're still not buying that, imagine the sort of response you'd get if you tried to sell software that billed itself as experimental, and promised a phase of disappointment. Why would you expect the Python world to react any differently? Whilst I agree that there are plenty of issues to workon, and I don't underestimate the difficulty of some of them, I think half-baked is very much overblown. Whilst you have a lot to say about how much of a problem this is I don't understand what you are suggesting be *done*? I agree that 3.X isn't all bad, and I very much hope it succeeds. And no, I have no answers; I'm just reporting the perception from downwind. So here it is: The prevailing view is that 3.X developers hoisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that? --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. 
Re: [Python-Dev] email package status in 3.X
Michael Foord: Python 3.0 was *declared* to be an experimental release, and by most standards 3.1 (in terms of the core language and functionality) was a solid release. That looks to me like an after-the-event rationalization. The release note for Python 3.0 (and the What's new) gives no indication that it is experimental but does say "We are confident that Python 3.0 is of the same high quality as our previous releases ... you can safely choose either version (or both) to use in your projects." http://mail.python.org/pipermail/python-dev/2008-December/083824.html Neil
Re: [Python-Dev] email package status in 3.X
On Jun 18, 2010, at 3:08 PM, Michael Foord wrote: I'm still baffled as to how a bug in the cgi module (along with the acknowledged email problems) is such a big deal. Was it reported and then languished in the bug tracker? That would be bad on its own but if it was only recently discovered that indicates that it probably isn't such a big deal - either way it needs fixing, but using Python for writing cgis hasn't been a big use case for a long time. That's one possible explanation. Another possible explanation is the product isn't being heavily exercised for serious work and that it has yet to be shaken out thoroughly. There has been a disappointing lack of bug reports across the board for 3.x. That doesn't mean that the bugs aren't there and that they won't be reported when adoption is heavier. In the cases of email, mime handling, cgi and whatnot, the important point is not whether a given technology is popular. The important part is that it hints at the kind of bytes/text issues that people are going to face and that we will need to help them address (e.g. blobs containing multiple encodings, a need to use byte oriented tools such as md5 in conjunction with text oriented applications, etc.) One other thought: In addition to not getting many 3.x specific bug reports, we don't seem to be getting many 3.x specific help questions (e.g. asking about dictviews or how to make a priority queue in an environment where many callables don't support ordering operations, etc.). Mark Lutz wrote: What I'm suggesting is that extreme caution be exercised from this point forward with all things 3.X-related. Whether you wish to accept this or not, 3.X has a negative image to many. This suggestion specifically includes not abandoning current 3.X email package users as a case in point. Ripping the rug out from new 3.X users after they took the time to port seems like it may be just enough to tip the scales altogether.
A couple other areas that need work (some of them are minor):

* BeautifulSoup was left behind when SGML parsing was removed from the standard lib.
* Shelves were crippled for Windows users when bsddb was ripped out.
* Lists containing None for missing values are no longer sortable.
* The basic heapq approach to making a priority queue no longer works well. Simply decorating with (priority_level, callable_or_object) fails with two tasks at the same priority if the callable or other objects aren't orderable.

Raymond

P.S. I do think it would be great if we could direct some attention to parts of 3.x that are really nice. Am hoping that this conversation doesn't drown in negativity. Instead, it should focus on what improvements are needed to win broader adoption.
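The heapq item in Raymond's list can be reproduced in a few lines. This is a sketch and not code from the thread: the task functions are made up, and the tie-breaking counter shown in the second half is one common workaround, assumed here for illustration rather than being the fix Raymond proposed.

```python
import heapq
import itertools

def task_a():
    return "a"

def task_b():
    return "b"

# Naive decoration: (priority, callable). When two priorities are equal,
# tuple comparison falls through to comparing the functions themselves,
# which Python 3 refuses to do.
naive = []
heapq.heappush(naive, (1, task_a))
try:
    heapq.heappush(naive, (1, task_b))
    naive_failed = False
except TypeError:
    naive_failed = True

# Workaround (an assumption, not from the thread): insert a unique,
# always-comparable sequence number so the callable is never compared.
counter = itertools.count()
queue = []
for priority, task in [(1, task_a), (1, task_b), (0, task_b)]:
    heapq.heappush(queue, (priority, next(counter), task))

# Pop in priority order; ties come out in insertion order.
order = [heapq.heappop(queue)[2]() for _ in range(3)]
```

The counter also keeps equal-priority tasks in first-in, first-out order, which the Python 2 behaviour of comparing arbitrary objects never guaranteed.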
Re: [Python-Dev] email package status in 3.X
On 18/06/2010 23:51, Raymond Hettinger wrote:

On Jun 18, 2010, at 3:08 PM, Michael Foord wrote:

I'm still baffled as to how a bug in the cgi module (along with the acknowledged email problems) is such a big deal. Was it reported and then languished in the bug tracker? That would be bad on its own, but if it was only recently discovered, that indicates that it probably isn't such a big deal - either way it needs fixing, but using Python for writing cgis hasn't been a big use case for a long time.

That's one possible explanation. Another possible explanation is the product isn't being heavily exercised for serious work and that it has yet to be shaken-out thoroughly. There has been a disappointing lack of bug reports across the board for 3.x. That doesn't mean that the bugs aren't there and that they won't be reported when adoption is heavier.

Oh, I quite agree. I don't think it makes py3k a turd either.

In the cases of email, mime handling, cgi and whatnot, the important point is not whether a given technology is popular. The important part is that it hints at the kind of bytes/text issues that people are going to face and that we will need to help them address (such as blobs containing multiple encodings, a need to use byte oriented tools such as md5 in conjunction with text oriented applications, etc.)

One other thought: In addition to not getting many 3.x specific bug reports, we don't seem to be getting many 3.x specific help questions (e.g. asking about dictviews or how to make a priority queue in an environment where many callables don't support ordering operations, etc.).

Most of the questions I've seen about Python 3 are from library authors doing porting rather than application developers. This is to be expected I guess.

Mark Lutz wrote:

What I'm suggesting is that extreme caution be exercised from this point forward with all things 3.X-related. Whether you wish to accept this or not, 3.X has a negative image to many. This suggestion specifically includes not abandoning current 3.X email package users as a case in point. Ripping the rug out from new 3.X users after they took the time to port seems like it may be just enough to tip the scales altogether.

A couple of other areas that need work (some of them are minor):

* BeautifulSoup was left behind when SGML parsing was removed from the standard lib.
* Shelves were crippled for Windows users when bsddb was ripped out.
* Lists containing None for missing values are no longer sortable.

Yeah, this one can be a bugger. :-)

* The basic heapq approach to making a priority queue no longer works well. Simply decorating with (priority_level, callable_or_object) fails with two tasks at the same priority if the callable or other objects aren't orderable.

Raymond

P.S. I do think it would be great if we could direct some attention to parts of 3.x that are really nice. Am hoping that this conversation doesn't drown in negativity. Instead, it should focus on what improvements are needed to win broader adoption.

I definitely agree that our focus should be on fixing problems as we find them and working on increasing adoption. No argument from me.

All the best,

Michael

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
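For readers following the heapq point in the message above, here is a minimal sketch of the failure and of one common workaround: adding a monotonically increasing sequence number between the priority and the task, so that two entries at the same priority are ordered by insertion and the tasks themselves are never compared. The helper names (push_task, pop_task, demo, naive_push_fails) are hypothetical and chosen only for illustration; only heapq and itertools are from the standard library, and this is not presented as the fix anyone in the thread proposed.

```python
import heapq
import itertools

# Hypothetical helpers for illustration; not part of any library.
_sequence = itertools.count()  # monotonically increasing tie-breaker


def push_task(queue, priority, task):
    """Push a task; the sequence number keeps tasks from being compared."""
    heapq.heappush(queue, (priority, next(_sequence), task))


def pop_task(queue):
    """Pop the lowest-priority-number task (FIFO among equal priorities)."""
    priority, _seq, task = heapq.heappop(queue)
    return priority, task


def naive_push_fails():
    """Show that plain (priority, callable) tuples break on a priority tie."""
    queue = []
    heapq.heappush(queue, (1, lambda: "a"))
    try:
        # Equal priorities force a comparison of the two functions,
        # which Python 3 refuses to order.
        heapq.heappush(queue, (1, lambda: "b"))
    except TypeError:
        return True
    return False


def demo():
    """Run three tasks, two of which share a priority."""
    queue = []
    push_task(queue, 2, lambda: "low")
    push_task(queue, 1, lambda: "first high")
    push_task(queue, 1, lambda: "second high")
    return [pop_task(queue)[1]() for _ in range(3)]
```

The sequence number also makes the queue stable: tasks at the same priority come back in the order they were pushed, which is often what a scheduler wants anyway.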
Re: [Python-Dev] email package status in 3.X
On 6/18/2010 6:51 PM, Raymond Hettinger wrote:

There has been a disappointing lack of bug reports across the board for 3.x.

Here is one from this week involving the interaction of array and bytearray. It needs a comment from someone who can understand the C-API based patch, which is beyond me. http://bugs.python.org/issue8990

Another possible reason for the lack: 500 of the current 2800 open issues have NO comment (i.e., message count = 1), some with patches. I just posted '500 tracker orphans; we need more reviewers' on python-list to encourage more participation.

Terry Jan Reedy
Re: [Python-Dev] email package status in 3.X
On Jun 16, 2010, at 08:48 PM, l...@rmi.net wrote:

Well, it looks like I've stumbled onto the other shoe on this issue: the email package's problems are also apparently behind the fact that CGI binary file uploads don't work in 3.1 (http://bugs.python.org/issue4953). Yikes. I trust that people realize this is a show-stopper for broader Python 3.X adoption.

We know it, we have extensively discussed how to fix it, we have IMO a good design, and we even have someone willing and able to tackle the problem. We need to find a sufficient source of funding to enable him to do the work it will take, and so far that's been the biggest stumbling block. It will take a focused and determined effort to see this through, and it's obvious that volunteers cannot make it happen. I include myself in the latter category, as I've tried and failed at least twice to do it in my spare time.

-Barry
Re: [Python-Dev] email package status in 3.X
Nick Coghlan ncogh...@gmail.com wrote:

My personal perspective is that a lot of that code was likely already broken in hard-to-detect ways when dealing with mixed encodings - releasing 3.x just made the associated errors significantly easier to detect.

I have to agree with this, and not just about encodings. I think much of the stdlib code dealing with all aspects of HTTP (urllib and the http package which now includes cgi) is kind of shaky. And it affects (infects) other parts of the stdlib, too; sockets are hacked to support the read-after-close paradigm that httplib uses, for instance. Which means that SSL and other socket-using code also has to support it, etc. Some of this was cleaned up in the move to 3.x, but more work needs to be done. Kudos to the folks working on httplib2 (http://code.google.com/p/httplib2/) and WSGI.

There's a related meta-issue having to do with antique protocols. FTP, for instance, was designed when the Internet had only 19 nodes connected together with custom-built refrigerator-sized routers. A very early experiment in application protocols. It does a few odd things that we've since learned to be inefficient/unwise/unnecessary. Does it make sense that Python support every part of it? On the other hand, it was fairly static when the Python support was added (unlike HTTP, which was under very active development!) so that module is pretty robust.

Bill