Re: [Python-Dev] email package status in 3.X

2010-06-27 Thread R. David Murray
On Fri, 18 Jun 2010 18:52:45 -, l...@rmi.net wrote:
 What I'm suggesting is that extreme caution be exercised from
 this point forward with all things 3.X-related.  Whether you
 wish to accept this or not, 3.X has a negative image to many.
 This suggestion specifically includes not abandoning current
 3.X email package users as a case in point.  Ripping the rug
 out from new 3.X users after they took the time to port seems
 like it may be just enough to tip the scales altogether.

Catching up on my python-dev email, I just want to clarify this with
respect to email.  (1) I suspect that the new API will be enough of a
carrot that they won't mind converting to it, BUT, (2) the plan is to
provide a compatibility API that will fully support the current Python3
email5 API (but with fewer bugs in areas such as header folding and
unfolding).

--
R. David Murray  www.bitdance.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] email package status in 3.X

2010-06-23 Thread Steve Holden
Guido van Rossum wrote:
 On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote:
 Any turdiness (which I am *not* arguing for) is a natural consequence
 of the kinds of backward incompatibilities which were *not* ruled out
 for Python 3, along with the (early, now waning) "build it and they will
 come" optimism about adoption rates.
 
 FWIW, my optimism is *not* waning. I think it's good that we're
 having this discussion and I expect something useful will come out of
 it; I also expect in general that the (admittedly serious) problem of
 having to port all dependencies will be solved in the next few years.
 Not by magic, but because many people are taking small steps in the
 right direction, and there will be light eventually. In the mean time
 I don't blame anyone for sticking with 2.x or being too busy to help
 port stuff to 3.x. Python 3 has been a long time in the making -- it
 will be a bit longer still, which was expected.
 
+1

The important thing is to avoid bigotry and FUD, and deal with things
the way they are. The #python IRC team have just helped us make a major
step forward. This won't be a campaign with a victorious charge over
some imaginary finish line.

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/
All I want for my birthday is another birthday -
 Ian Dury, 1942-2000


Re: [Python-Dev] email package status in 3.X

2010-06-23 Thread Glyph Lefkowitz

On Jun 23, 2010, at 8:17 AM, Steve Holden wrote:

 Guido van Rossum wrote:
 On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote:
 Any turdiness (which I am *not* arguing for) is a natural consequence
 of the kinds of backward incompatibilities which were *not* ruled out
 for Python 3, along with the (early, now waning) "build it and they will
 come" optimism about adoption rates.
 
 FWIW, my optimism is *not* waning. I think it's good that we're
 having this discussion and I expect something useful will come out of
 it; I also expect in general that the (admittedly serious) problem of
 having to port all dependencies will be solved in the next few years.
 Not by magic, but because many people are taking small steps in the
 right direction, and there will be light eventually. In the mean time
 I don't blame anyone for sticking with 2.x or being too busy to help
 port stuff to 3.x. Python 3 has been a long time in the making -- it
 will be a bit longer still, which was expected.
 
 +1
 
 The important thing is to avoid bigotry and FUD, and deal with things
 the way they are. The #python IRC team have just helped us make a major
 step forward. This won't be a campaign with a victorious charge over
 some imaginary finish line.

For sure.

I don't speak for Tres, but I don't think he was talking about optimism 
about *adoption*, overall, but optimism about adoption *rates*.  And I don't 
think he was talking about it coming from Guido :).

There has definitely been some irrational exuberance from some quarters.  The 
form it usually takes is someone making a blog post which assumes, because the 
author could port their smallish library or application without too much 
hassle, that Python 2.x is already dead and everyone should be off of it in a 
couple of weeks.

I've never heard this position from the core team or any official communication 
or documentation.  Far from it: the realistic attitude that the Python 3 
migration is something that will take a while has significantly reduced my own 
concerns.

Even the aforementioned blog posts have been encouraging in some ways, because 
a lot of people are reporting surprisingly easy transitions.



Re: [Python-Dev] email package status in 3.X

2010-06-23 Thread Tres Seaver

Glyph Lefkowitz wrote:

 I don't speak for Tres, but I don't think he was talking about
 optimism about *adoption*, overall, but optimism about adoption
 *rates*.  And I don't think he was talking about it coming from Guido
 :).

You channel me correctly here.  In particular, the phrase "build it and
they will come" was meant to address the idea that the only thing needed
to drive adoption was the release of the new, shiny Python3.  That
particular bit of optimism is what I meant to describe as waning:  the
community on the whole seems to be more realistic now than two or three
years ago about the kind of extra effort required from both core
developers and from existing Python 2 folks to get to Python 3.

 There has definitely been some irrational exuberance from some
 quarters.  The form it usually takes is someone making a blog post
 which assumes, because the author could port their smallish library
 or application without too much hassle, that Python 2.x is already
 dead and everyone should be off of it in a couple of weeks.
 
 I've never heard this position from the core team or any official
 communication or documentation.  Far from it: the realistic attitude
 that the Python 3 migration is something that will take a while has
 significantly reduced my own concerns.
 
 Even the aforementioned blog posts have been encouraging in some
 ways, because a lot of people are reporting surprisingly easy
 transitions.

Indeed.


Tres.
--
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com



Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Stephen J. Turnbull
Michael Urman writes:

  It is somewhat troublesome that there doesn't appear to be an obvious
  built-in idempotent-when-possible function that gives back the
  provided bytes/str,

If you want something idempotent, it's already the case that
bytes(b'abc') == b'abc'.  What might be desirable is to make
bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII
(or maybe ISO 8859/1).

Unfortunately, str(b'abc') already does work, but

st...@uwakimon ~ $ python3.1
Python 3.1.2 (release31-maint, May 12 2010, 20:15:06) 
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b'abc')
"b'abc'"
>>> 

Oops.  You can see why that probably should be the case.
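
A minimal sketch of the behaviour described above, as Python 3 handles it (the variable names are illustrative only): bytes() passes bytes through, refuses a str that comes without an encoding, and str() on bytes gives printable repr-style text instead of decoding.

```python
# Passing bytes to bytes() returns an equal bytes object.
same = bytes(b'abc')

# A str without an encoding is rejected rather than guessed at.
try:
    bytes('abc')
    rejected = False
except TypeError:
    rejected = True

# str() on bytes yields printable repr-style text; it does not decode.
printable = str(b'abc')

# Supplying an encoding is what actually decodes.
decoded = str(b'abc', 'ascii')

print(same, rejected, printable, decoded)
```

Note that the one-argument str() call never raises here, which is why it cannot double as a "decode if ASCII" shortcut.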


Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Stephen J. Turnbull
P.J. Eby writes:

  I know, it's a hard thing to wrap one's head around, since on the
  surface it sounds like unicode is the programmer's savior.

I don't need to wrap my head around it.  It's been deeply embedded,
point first, and the nasty barbs ensure that I have no desire to pull
it back out.

To wit, I've been dealing with Japanese encoding issues on a daily
basis for 20 years, and I'm well aware that programmers have several
good reasons (and a lot more bad ones) for avoiding them, and even for
avoiding Unicode when they must deal with encodings at all.  I don't
think any of the good reasons have been offered here yet, that's all.

  Unfortunately, real-world text data exists which cannot be safely
  roundtripped to unicode, and must be handled in bytes with
  encoding form for certain operations.

Or Unicode with encoding form.  See below for why this makes sense in
the context of Python.

  I personally do not have to deal with this *particular* use case any 
  more -- I haven't been at NTT/Verio for six years now.

As mentioned, I have a bit of understanding of the specific problems
of Japanese-language computing.  In particular, roundtripping Japanese
from *any* encoding to *any other* encoding is problematic, because
the national standards provide a proper subset of the repertoire
actually used by the Japanese people.  (Even JIS X 0213.)

  My current needs are simpler, thank goodness.  ;-)  However, they 
  *do* involve situations where I'm dealing with *other* 
  encoding-restricted legacy systems, such as software for interfacing 
  with the US Postal Service that only works with a restricted subset 
  of latin1, while receiving mangled ASCII from an ecommerce provider, 
  and storing things in what's effectively a latin-1 database.

Yes, I know of similar issues in other applications.  For example, TeX
error messages do not respect UTF-8 character boundaries, so Emacs has
to handle them specially (basically a mechanism similar in spirit to
PEP 383 is used).

  Being able to easily assert what kind of bytes I've got would
  actually let me catch errors sooner, *if* those assertions were
  being checked when different kinds of strings or bytes were being
  combined (i.e., at coercion time).

I see that this would make life a little easier for you in maintaining
without refactoring.  I'd say it's a kludge, but without a full list
of requirements I'm in no position to claim any authority <wink>.  E.g.,
for a non-kludgey suggestion, how about defining a codec which takes
Latin-1 bytes, checks (with error on failure) for the restricted
subset, and converts to str?  Then you can manipulate these things as
str with abandon internally.  Finally you get another check in the
outgoing codec which converts from str to effective Latin-1 bytes,
however that is defined.
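
The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: ALLOWED is a made-up subset (the real one would come from the legacy system's specification), the function names are hypothetical, and plain functions stand in for a registered codec.

```python
# Hypothetical allowed subset: printable ASCII plus a few accented letters.
# A real deployment would take this set from the legacy system's own spec.
ALLOWED = frozenset(range(0x20, 0x7F)) | {0xC9, 0xE9, 0xF1}  # É, é, ñ


def _first_disallowed(data: bytes) -> int:
    """Return the index of the first byte outside ALLOWED, or -1."""
    for pos, byte in enumerate(data):
        if byte not in ALLOWED:
            return pos
    return -1


def decode_restricted(data: bytes) -> str:
    """Incoming check: bytes -> str, failing on bytes outside the subset."""
    pos = _first_disallowed(data)
    if pos >= 0:
        raise UnicodeDecodeError('restricted-latin-1', data, pos, pos + 1,
                                 'byte outside the allowed subset')
    return data.decode('latin-1')


def encode_restricted(text: str) -> bytes:
    """Outgoing check: str -> bytes, failing on characters outside the subset."""
    data = text.encode('latin-1')  # raises UnicodeEncodeError above U+00FF
    pos = _first_disallowed(data)
    if pos >= 0:
        raise UnicodeEncodeError('restricted-latin-1', text, pos, pos + 1,
                                 'character outside the allowed subset')
    return data
```

Because Latin-1 maps each byte to exactly one code point, the byte position reported in the error is also the character position, so the same check serves both directions. Everything between the two calls can then work on ordinary str.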

But OK, maybe I'm just being naive.  You need this unlovely artifice
so you can put in asserts in appropriate places.  Now, does it belong
in the stdlib?

It seems to me that in the case of Japanese roundtripping, *most* of
the time encoding back to a standard Japanese encoding will work.  If
you run into one of the problematic characters that JIS doesn't allow
but Japanese like to use because they prefer the glyph to the
JIS-standard glyph, you get an occasional error on encoding to a
standard Japanese encoding, which you handle specially with a database
of such characters.  Knowing the specific encoding originally used
*normally does not help unless you're replying to that person and
**only** that person*, because the extended repertoires vary widely
and the only standard is Japanese.  I conclude ebytes does *no* good
here.

For the ecommerce/USPS case, well, actually you need special-purpose
encodings anyway (ISTM).  'latin-1' loses, the USPS is allergic to
some valid 'latin-1' characters.  'ascii' loses, apparently you need
some of the Latin-1 repertoire, and anyway AIUI the ecommerce provider
munges the ASCII.  So what does ebytes actually buy you here, unless
you write the codecs?  If you've got the codecs, what additional
benefit do you get from ebytes?

Note that you would *also* need to do explicit transcoding anyway if
you were dealing with Japan Post instead of the USPS, although I grant
your code is probably general enough to deal with Deutsche Telecom
(but the German equivalent of your ecommerce provider probably has its
own ways of munging Latin-1).  I conclude that there may be genuine
benefits to ebytes here, but they're probably not general enough to
put in the stdlib (or the Python language).

  Which works if and only if your outputs are truly unicode-able.

With PEP 383, they always are, as long as you allow Unicode to be
decoded to the same garbage your bytes-based program would have
produced anyway.

  If you work with legacy systems (e.g. those Asian email clients and
  US postal software), you are really working with a *character set*,
  not unicode,

I think you're missing something.  Namely, Unicode is a standard 

Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Nick Coghlan
On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull step...@xemacs.org wrote:
   Which works if and only if your outputs are truly unicode-able.

 With PEP 383, they always are, as long as you allow Unicode to be
 decoded to the same garbage your bytes-based program would have
 produced anyway.

Could it be that part of the problem here is that we need to better
advertise errors='surrogateescape' as a mechanism for decoding
incorrectly encoded data according to a nominal codec without throwing
UnicodeDecode and UnicodeEncode errors all over the place? Currently
it only garners a mention in the docs in the context of the os module,
the list of error handlers in the codecs module and as a default error
handler argument in the tarfile module.
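
A short sketch of what errors='surrogateescape' does, using an arbitrary example byte string that is not valid UTF-8: undecodable bytes are mapped to lone surrogates on the way in and restored to the same bytes on the way out.

```python
raw = b'caf\xe9 \xff'  # Latin-1-ish bytes; not valid UTF-8

# Decoding does not raise: each bad byte 0xNN becomes the surrogate U+DCNN.
text = raw.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode('utf-8', errors='surrogateescape')

# Without the handler, the escaped characters are refused on output.
try:
    text.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(ascii(text), restored == raw, strict_ok)
```

The "garbage" stays confined to the escaped code points, so ordinary str operations on the valid parts of the text are unaffected.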

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Stephen J. Turnbull
Nick Coghlan writes:
  On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull step...@xemacs.org 
  wrote:
     Which works if and only if your outputs are truly unicode-able.
  
   With PEP 383, they always are, as long as you allow Unicode to be
   decoded to the same garbage your bytes-based program would have
   produced anyway.
  
  Could it be that part of the problem here is that we need to better
  advertise errors='surrogateescape' as a mechanism for decoding
  incorrectly encoded data according to a nominal codec without throwing
  UnicodeDecode and UnicodeEncode errors all over the place?

Yes, I think that would make the "use str internally to urllib"
strategy a lot more palatable.  But it still needs to be combined with
a program architecture of decode-process-encode, which might require
substantial refactoring for some existing modules.
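
As an illustration of the decode-process-encode shape combined with surrogateescape, here is a small sketch. The function name and the header-normalising task are hypothetical, chosen only to show where the two boundaries sit.

```python
def title_case_header_name(line: bytes) -> bytes:
    """Normalise the name part of a 'name: value' header line."""
    # Decode once at the input boundary; stray non-ASCII bytes survive
    # as escaped surrogates instead of raising.
    text = line.decode('ascii', errors='surrogateescape')

    # Process purely as str.
    name, sep, value = text.partition(':')
    text = name.title() + sep + value

    # Encode once at the output boundary, restoring any escaped bytes.
    return text.encode('ascii', errors='surrogateescape')


result = title_case_header_name(b'content-type: text/plain; name=\xe9')
print(result)
```

The processing step never sees bytes, and the undecodable byte in the value passes through untouched.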



Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Michael Urman
On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull step...@xemacs.org wrote:
 Michael Urman writes:

   It is somewhat troublesome that there doesn't appear to be an obvious
   built-in idempotent-when-possible function that gives back the
   provided bytes/str,

 If you want something idempotent, it's already the case that
 bytes(b'abc') == b'abc'.  What might be desirable is to make
 bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII
 (or maybe ISO 8859/1).

By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding,
errors) that would pass an instance of bytes through, or encode an
instance of str. And of course a to_str that performs similarly,
passing str through and decoding bytes. While bytes(b'abc') will give
me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me
the b'abc' I want to see.

These are trivial functions; I just don't fully understand why the
capability isn't baked in. A one argument call is idempotent capable;
a two argument call isn't as it only converts.
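
For concreteness, the two helpers described above might look like this. It is a sketch only: to_bytes and to_str are not stdlib functions, and the default encoding and the TypeError for other types are assumptions.

```python
def to_bytes(value, encoding='utf-8', errors='strict'):
    """Pass bytes through unchanged; encode str."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)


def to_str(value, encoding='utf-8', errors='strict'):
    """Pass str through unchanged; decode bytes."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)


print(to_bytes('abc', 'latin-1'), to_bytes(b'abc', 'latin-1'))
```

For filenames specifically, Python 3.2 and later provide os.fsencode and os.fsdecode, which accept either type and convert using the filesystem encoding.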

It's not a completely made-up requirement either. A cross-platform
piece of software may need to present to a user items that are
sometimes str and sometimes bytes - particularly filenames.

 Unfortunately, str(b'abc') already does work, but

 st...@uwakimon ~ $ python3.1
 Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
 [GCC 4.3.4] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> str(b'abc')
 "b'abc'"


 Oops.  You can see why that probably should be the case

Sure, and I love having this there for debugging. But this is hardly
good enough for presenting to a user once you leave ascii.
>>> u = '日本語'
>>> sjis = bytes(u, 'shift-jis')
>>> utf8 = bytes(u, 'utf-8')
>>> str(sjis), str(utf8)
("b'\\x93\\xfa\\x96{\\x8c\\xea'",
 "b'\\xe6\\x97\\xa5\\xe6\\x9c\\xac\\xe8\\xaa\\x9e'")

When I happen to know the encoding, I can reverse it much more cleanly.
>>> str(sjis, 'shift-jis'), str(utf8, 'utf-8')
('日本語', '日本語')

But I can't mix this approach with str instances without writing a
different invocation.
>>> str(u, 'argh')
TypeError: decoding str is not supported

-- 
Michael Urman


Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Tres Seaver

Jesse Noller wrote:
 
 On Jun 19, 2010, at 10:13 AM, Tres Seaver tsea...@palladion.com wrote:

 Nothing is set in stone; if something is incredibly painful, or worse
 yet broken, then someone needs to file a bug, bring it to this list,
 or bring up a patch.
 Or walk away.

 
 Ok. If you want.

I specifically said I *didn't* want to walk away.  I'm pointing out that
in the general case, the ordinary user who finds something incredibly
painful or broken is far more likely to walk away from the platform than
try to fix it, especially if there are available alternatives (e.g.,
Ruby, Python 2) where the pain level for that user's application is lower.

 I guess "tutorial welcome", rather than "patch welcome" then ;)
 The only folks who can write the tutorial are the ones who have
 already drunk the koolaid.  Note that I've been making my living with
 Python for about twelve years now, and would *like* to use Python3,
 but can't, yet, and therefore haven't taken the first sip.
 
 Why can't you? Is it a bug?

It's not *a* bug, it is that I do my day to day work on very large
applications which depend on a large number of not-yet-ported libraries.
 This barrier is the negative network effect which is the whole point
of this thread:  there is nothing wrong with Python3 except that, to use
it, I have to stop doing the work which pays to do an
indeterminately-large amount of hobby work (of which I already do
quite a lot).

 Let's file it and fix it. Is it that you  
 need a dependency ported?

I need dozens of them ported, and am working on some of them in the
aforementioned copious spare time.

 Cool - let's bring it up to the maintainers,  
 or this list, or ask the PSF to push resources into helping port.  
 Anything but nothing.

"Nothing" is the default:  I am already successful with Python 2, and
can't be successful with Python 3 (in the sense of delivering timely,
cost-effective solutions to my customers) until *all* those dependencies
are ported and stable there.

 If what you're saying is that python 3 is a completely unsuitable  
 platform, well, then yeah - we can all fix it or walk away.

I didn't say that:  I said that Python 3 is unsuitable *today* for the
work I'm doing, and that the relative wins it provides over Python 2 are
dwarfed by the effort required to do all those ports myself.

 IOW, 3.x has broken TOOOWTDI for me in some areas.  There may
 be obvious ways to do it, but, as per the Zen of Python, that
 way may not be obvious at first unless you're Dutch.  ;-)

OT:  The Dutch smiley there doesn't actually help anything but undercut
any point to having TOOOWTDI in the list at all.

 What areas. We need specifics which can either be:

 1 Shot down.
 2 Turned into bugs, so they can be fixed
 3 Documented in the core documentation.

 That's bloody ironic in a thread which had pointed at reasons why  
 people are not even considering Py3 for their projects:  those folks won't  
 even find the issues due to the lack of confidence in the suitability of  
 the platform.
 
 What I saw was a thread about some issues in email, and cgi. We have  
 some work being done to address the issue. This will help resolve some  
 of the issues.
 
 If there are other issues, then we should step up and either help, or  
 get out of the way. Arguing about the viability of a platform we knew  
 would take a bit for adoption is silly and breeds ill will.

I'm not arguing about viability:  there are obviously users for whom
Python 3 is not only viable, but superior to Python 2.  However, I am
quite confident that many pro-Python 3 folks arguing here underestimate
the scope of the issues which have generated the (self-fulfilling) "not
yet" perception.

 It's not a turd, and it's not hopeless, in fact rumor has it NumPy  
 will be ported soon which is a major stepping stone.

Sure, for the (far from trivial) subset of the community doing numerical
work.

 The only way to counteract this meme that python 3 is horribly  
 broken is to prove that it's not, fix bugs, and move on. There's no  
 point debating relative turdiness here.

Any turdiness (which I am *not* arguing for) is a natural consequence
of the kinds of backward incompatibilities which were *not* ruled out
for Python 3, along with the (early, now waning) "build it and they will
come" optimism about adoption rates.



Tres.
--
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com


Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Guido van Rossum
On Mon, Jun 21, 2010 at 10:28 PM, Stephen J. Turnbull
step...@xemacs.org wrote:
 Michael Urman writes:

   It is somewhat troublesome that there doesn't appear to be an obvious
   built-in idempotent-when-possible function that gives back the
   provided bytes/str,

 If you want something idempotent, it's already the case that
 bytes(b'abc') == b'abc'.  What might be desirable is to make
 bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII
 (or maybe ISO 8859/1).

No, no, no! That's just what Python 2 did.

 Unfortunately, str(b'abc') already does work, but

 st...@uwakimon ~ $ python3.1
 Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
 [GCC 4.3.4] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> str(b'abc')
 "b'abc'"


 Oops.  You can see why that probably should be the case.

There is a near-contract that str() of pretty much anything returns a
printable version of that thing.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Terry Reedy

On 6/22/2010 9:24 AM, Michael Urman wrote:


By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding,
errors) that would pass an instance of bytes through, or encode an
instance of str. And of course a to_str that performs similarly,
passing str through and decoding bytes. While bytes(b'abc') will give
me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me
the b'abc' I want to see.

These are trivial functions;
I just don't fully understand why the capability isn't baked in.


Possible reasons: They are special purpose functions easily built on the 
basic functions provided. Fine for a 3rd party library. Most people do 
not need them. Some might be misled by them. As others have said, "Not 
every one-liner should be builtin."


--
Terry Jan Reedy



Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Terry Reedy
Tres, I am a Python3 enthusiast and realist. I did not expect major 
adoption for about 3 years (more optimistic than the 5 years of some).


If you are feeling pressured to 'move' to Python3, it is not from me. I 
am sure you will do so on your own, perhaps even with enthusiasm, when 
it will be good for *you* to do so.


If someone wants to contribute while sticking to Python2, it's easy. The 
tracker has perhaps 2000 open 2.x issues, hundreds with no responses. If 
more Python2 people worked on making 2.7 as bug-free as possible, the 
developers would be freer to make 3.2 as good as possible (which is what 
*I* want).


The porting of numpy (which I suspect has gotten some urging) will not 
just benefit 'numerical' computing. For instance, there cannot be a 3.x 
version of pygame until there is a 3.x version of numpy, its main Python 
dependency. (The C Simple Directmedia Library it also wraps and builds 
upon does not care.)


--
Terry Jan Reedy



Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Guido van Rossum
On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote:
 Any turdiness (which I am *not* arguing for) is a natural consequence
 of the kinds of backward incompatibilities which were *not* ruled out
 for Python 3, along with the (early, now waning) "build it and they will
 come" optimism about adoption rates.

FWIW, my optimism is *not* waning. I think it's good that we're
having this discussion and I expect something useful will come out of
it; I also expect in general that the (admittedly serious) problem of
having to port all dependencies will be solved in the next few years.
Not by magic, but because many people are taking small steps in the
right direction, and there will be light eventually. In the mean time
I don't blame anyone for sticking with 2.x or being too busy to help
port stuff to 3.x. Python 3 has been a long time in the making -- it
will be a bit longer still, which was expected.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] email package status in 3.X

2010-06-22 Thread Michael Urman
On Tue, Jun 22, 2010 at 15:32, Terry Reedy tjre...@udel.edu wrote:
 On 6/22/2010 9:24 AM, Michael Urman wrote:
 These are trivial functions;
 I just don't fully understand why the capability isn't baked in.

 Possible reasons: They are special purpose functions easily built on the
 basic functions provided. Fine for a 3rd party library. Most people do not
 need them. Some might be misled by them. As others have said, "Not every
 one-liner should be builtin."

Perhaps the two-argument constructions on bytes and str should have
been removed in favor of the .decode and .encode methods on their
respective classes. Or vice versa; I don't have the history to know in
which order they originated, and which is theoretically preferred
these days.
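
A brief sketch of the overlap in question, showing that the two-argument constructors and the .encode/.decode methods give the same results in Python 3 (the sample text is arbitrary):

```python
text = '日本語'

# str -> bytes: two-argument constructor versus method.
via_constructor = bytes(text, 'utf-8')
via_method = text.encode('utf-8')

# bytes -> str: two-argument constructor versus method.
sjis = text.encode('shift-jis')
back_via_constructor = str(sjis, 'shift-jis')
back_via_method = sjis.decode('shift-jis')

# One difference: the methods default to UTF-8, while bytes(text)
# with no encoding raises TypeError.
default_encoded = text.encode()

print(via_constructor == via_method, back_via_constructor == back_via_method)
```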

-- 
Michael Urman


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Nick Coghlan
On Mon, Jun 21, 2010 at 11:58 AM, P.J. Eby p...@telecommunity.com wrote:
 At 08:08 AM 6/21/2010 +1000, Nick Coghlan wrote:

 Perhaps if people could identify which specific string methods are
 causing problems?

 __getitem__(int) returns an integer rather than a bytestring, so anything
 that manipulates individual characters can't be given bytes and have it
 work.

It can if you use length one slices rather than simple indexing.
Depending on the details, such algorithms may still fail for
multi-byte codecs though.
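
A small sketch of the indexing difference and the slicing workaround. The helper name is hypothetical; it guards against the empty slice because an empty string or bytes object is always "in" any other.

```python
data = b'abc'
text = 'abc'

# Indexing: str gives a length-one str, bytes gives an int.
indexed = (text[0], data[0])

# Length-one slices give the same kind of object as the original.
sliced = (text[0:1], data[0:1])


def starts_with_separator(value, separators):
    """Work for str or bytes, provided both arguments are the same type."""
    head = value[:1]
    return bool(head) and head in separators


# The multi-byte caveat: one byte of UTF-8 is not one character.
first_byte = '日本語'.encode('utf-8')[0:1]

print(indexed, sliced, first_byte)
```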

 That was one of the key differences I had in mind for a bstr type, apart
 from  designing it to coerce normal strings to bstrs in cross-type
 operations, and to allow O(1) conversion to/from bytes.

Erk, that just sounds like a recipe for recreating the problems 2.x
has in a new form.

 Another randomly chosen byte/string incompatibility (Python 3.1; I don't
 have 3.2 handy at the moment):

 >>> os.path.join(b'x','y')
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "c:\Python31\lib\ntpath.py", line 161, in join
     if b[:1] in seps:
 TypeError: Type str doesn't support the buffer API

 >>> os.path.join('x',b'y')
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "c:\Python31\lib\ntpath.py", line 161, in join
     if b[:1] in seps:
 TypeError: 'in <string>' requires string as left operand, not bytes

 Ironically, it seems to me that in trying to make the type distinction more
 rigid, Py3K fails in this area precisely because it is not a rigidly typed
 language in the Java or Haskell sense: i.e., os.path.join doesn't say, "I
 need two stringlike objects of the *same type*", not even in its docstring.

I believe it actually needs the objects to be compatible with the type
of os.sep, rather than just with each other (i.e. the type
restrictions on os.path.join are the same as those on os.sep.join,
even though the join algorithm itself is slightly different). This
restriction should be mentioned in the Py3k docstring and docs for
os.path.join - if it isn't, that would be a doc bug.
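
A quick sketch of the restriction as it behaves in current Python 3 releases, which, as far as can be checked, accept all-str or all-bytes components and reject a mixture with TypeError (the exact message varies between versions):

```python
import os
import os.path

# All str: the result is str.
str_joined = os.path.join('x', 'y')

# All bytes: the result is bytes, using a bytes separator.
bytes_joined = os.path.join(b'x', b'y')

# Mixed types are refused.
mixed_errors = []
for parts in ((b'x', 'y'), ('x', b'y')):
    try:
        os.path.join(*parts)
    except TypeError as exc:
        mixed_errors.append(type(exc).__name__)

print(str_joined, bytes_joined, mixed_errors)
```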

 At least in Java, you would either implement a path type with coercions
 from bytes and strings, or you'd have a class with overloaded methods for
 handling join operations on bytes and strings, respectively, thereby
 avoiding this whole mess.

 (Alas, this little example on the 'in' operator also shows that my bstr
 effort would probably fail anyway, because there's no '__rcontains__'
 (__lcontains__?) to allow it to override the str type's __contains__.)

OK, these examples convince me that the incompatibility problem is
real. However, I don't think a bstr type can solve them even without
the __rcontains__ problem - it would just recreate the pain that we
already have in the 2.x world.

Something that may make sense to ease the porting process is for some
of these "on the boundary" I/O related string manipulation functions
(such as os.path.join) to grow encoding keyword-only arguments. The
recommended approach would be to provide all strings, but bytes could
also be accepted if an encoding was specified. (If you want to mix
encodings - tough, do the decoding yourself).
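
A rough sketch of what such a keyword-only argument could look like. The
name join_path and its behaviour are hypothetical, written as a wrapper
because os.path.join itself takes no encoding argument:

```python
import os


def join_path(*parts, encoding=None):
    """Hypothetical wrapper: join path parts, decoding bytes if told how."""
    decoded = []
    for part in parts:
        if isinstance(part, bytes):
            if encoding is None:
                # No encoding given: refuse to guess, as os.path.join does.
                raise TypeError('bytes path component given without an encoding')
            part = part.decode(encoding)
        decoded.append(part)
    return os.path.join(*decoded)


print(join_path('x', b'y', encoding='utf-8'))
```

The result is always str, which matches the recommendation that callers
provide all strings and treat bytes as the exception.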

For the idea of avoiding excess copying of bytes through multiple
encoding/decoding calls... isn't that meant to be handled at an
architectural level (i.e. decode once on the way in, encode once on
the way out)? Optimising the single-byte codec case by minimising data
copying (possibly through creative use of PEP 3118) may be something
that we want to look at eventually, but it strikes me as something of
a premature optimisation at this point in time (i.e. the old adage
"first get it working, then get it working fast").

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 10:20 PM 6/21/2010 +1000, Nick Coghlan wrote:

For the idea of avoiding excess copying of bytes through multiple
encoding/decoding calls... isn't that meant to be handled at an
architectural level (i.e. decode once on the way in, encode once on
the way out)? Optimising the single-byte codec case by minimising data
copying (possibly through creative use of PEP 3118) may be something
that we want to look at eventually, but it strikes me as something of
a premature optimisation at this point in time (i.e. the old adage
first get it working, then get it working fast).


The issue is, I'd like to have an idempotent incantation that I can 
use to make the inputs and outputs to stdlib functions behave in a 
type-safe manner with respect to bytes, in cases where bytes are 
really what I want operated on.


Note too that this is an argument for symmetry in wrapping the inputs 
and outputs, so that the code doesn't have to know what it's dealing with!


After all, right now, if a stdlib function might return bytes or 
unicode depending on runtime conditions, I can't even hardcode an 
.encode() call -- it would fail if the return type is a bytes.


This basically goes against the "tell, don't ask" pattern, and the
Pythonically idempotent approach.  That is, Python builtins normally
return you back the same thing if it's already what you want -
int(someInt) -> someInt, iter(someIter) -> someIter, etc.
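
A small sketch of the contrast, assuming CPython 3.x. The variable names
are illustrative only:

```python
some_int = 10 ** 30
some_iter = iter([1, 2, 3])

same_int = int(some_int)       # already an int: the value comes back unchanged
same_iter = iter(some_iter)    # already an iterator: the very same object

# The two-argument bytes() conversion is not idempotent in this way:
# handing it bytes plus an encoding is an error rather than a no-op.
try:
    bytes(b'abc', 'latin-1')
except TypeError:
    bytes_is_idempotent = False
else:
    bytes_is_idempotent = True

print(same_int == some_int, same_iter is some_iter, bytes_is_idempotent)
```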


Since this incantation may need to be used often, and in places that 
are not known to me in advance, I would like it to not impose new 
overhead in unexpected places.  (i.e., the usual argument brought 
against making changes to the 'list' type that would change certain 
operations from O(1) to O(log something)).


It's more about predictability, and having One *Obvious* Way To Do 
It, as opposed to several ways, which you need to think carefully 
about and restructure your entire architecture around if 
necessary.  One obvious way means I can focus on the mechanical 
effort of porting *first*, without having to think.


So, the performance issue isn't really about performance *per se*, so 
much as about the mental UI of the language.  You could just as 
easily lie and tell me that your bstr implementation is O(1), and I 
would probably be happy and never notice, because the issue was never 
really about performance as such, but about having to *think* about 
it.  (i.e., breaking flow.)


Really, the entire issue can presumably be dealt with by some series 
of incantations - it's just code after all.  But having to sit and 
think about *every* situation where I'm dealing with bytes/unicode 
distinctions seems like a torture compared to being able to say,
"okay, so when dealing with this sort of API and this sort of data,
this is the One Obvious Way to do the conversions."


It's One Obvious Way that I want, but some people seem to be arguing 
that the One Obvious Way is to Think Carefully About It Every Time -- 
and that seems to violate the Obvious part, IMO.  ;-)




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw
On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:

Something that may make sense to ease the porting process is for some
of these on the boundary I/O related string manipulation functions
(such as os.path.join) to grow encoding keyword-only arguments. The
recommended approach would be to provide all strings, but bytes could
also be accepted if an encoding was specified. (If you want to mix
encodings - tough, do the decoding yourself).

This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
for it.

Would it make sense to have encoding-carrying bytes and str types?
Basically, I'm thinking of types (maybe even the current ones) that carry
around a .encoding attribute so that they can be automatically encoded and
decoded where necessary.  This at least would simplify APIs that need to do
the conversion.

By default, the .encoding attribute would be some marker to indicate "I have
no idea, do it explicitly", and if you combine ebytes or estrs that have
incompatible encodings, you'd either throw an exception or reset the .encoding
to IAmConfuzzled.  But say you had an email header like:

=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=

And code like the following (made less crappy):

-snip snip-
class ebytes(bytes):
    encoding = 'ascii'

    def __str__(self):
        s = estr(self.decode(self.encoding))
        s.encoding = self.encoding
        return s


class estr(str):
    encoding = 'ascii'


s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa',
        'euc-jp')
b = bytes(s, 'euc-jp')

eb = ebytes(b)
eb.encoding = 'euc-jp'
es = str(eb)
print(repr(eb), es, es.encoding)
-snip snip-

Running this you get:

b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド! 
euc-jp
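
For reference, the header shown earlier and the byte string in the snippet
are the same data. A short sketch using the standard library's
email.header.decode_header recovers both the raw bytes and the declared
charset; the variable names are illustrative:

```python
from email.header import decode_header

header = '=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?='

# decode_header returns (payload, charset) pairs; an encoded word yields
# the base64-decoded bytes plus the charset name declared in the header.
(raw, charset), = decode_header(header)
text = raw.decode(charset)

print(repr(raw), charset, text)
```

The (bytes, charset) pair is exactly the information an encoding-carrying
bytes type would keep together.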

Would it be feasible?  Dunno.  Would it help ease the bytes/str confusion?
Dunno.  But I think it would help make APIs easier to design and use because
it would cut down on the encoding-keyword function signature infection.

-Barry




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Michael Urman
On Mon, Jun 21, 2010 at 09:51, P.J. Eby p...@telecommunity.com wrote:
 The issue is, I'd like to have an idempotent incantation that I can use to
 make the inputs and outputs to stdlib functions behave in a type-safe manner
 with respect to bytes, in cases where bytes are really what I want operated
 on.

 Note too that this is an argument for symmetry in wrapping the inputs and
 outputs, so that the code doesn't have to know what it's dealing with!

It is somewhat troublesome that there doesn't appear to be an obvious
built-in idempotent-when-possible function that gives back the
provided bytes/str, or converts to the requested type per the listed
encoding (as of 3.1.2). Would it be useful to make the second versions
of these work, or would that return us to the confusion of the 2.x
era? On the other hand, since these are all TypeErrors instead of
UnicodeErrors, it's an easy wrapper to write.

>>> bytes('abc', 'latin-1')
b'abc'
>>> bytes(b'abc', 'latin-1')
TypeError: encoding or errors without a string argument

>>> str(b'abc', 'latin-1')
'abc'
>>> str('abc', 'latin-1')
TypeError: decoding str is not supported
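
The "easy wrapper" could look something like the following sketch. The
names as_bytes and as_str are hypothetical, not stdlib functions:

```python
def as_bytes(value, encoding):
    """Return bytes unchanged; encode str with the given encoding."""
    if isinstance(value, bytes):
        return value
    return bytes(value, encoding)


def as_str(value, encoding):
    """Return str unchanged; decode bytes with the given encoding."""
    if isinstance(value, str):
        return value
    return str(value, encoding)


print(as_bytes('abc', 'latin-1'), as_bytes(b'abc', 'latin-1'))
print(as_str(b'abc', 'latin-1'), as_str('abc', 'latin-1'))
```

Note that the encoding is silently ignored when no conversion is needed,
which is the 2.x-style ambiguity the question above is worried about.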

Interestingly the online docs for str say it can decode either a byte
string or a character buffer, a term which doesn't yield a definition
in a search; apparently either a string is not a character buffer, or
the docs are incorrect.
http://docs.python.org/py3k/library/functions.html?highlight=str#str

However it looks like this is consistent with int.
>>> int(4, 0)
TypeError: int() can't convert non-string with explicit base

-- 
Michael Urman


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi
On Mon, Jun 21, 2010 at 11:43:07AM -0400, Barry Warsaw wrote:
 On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
 
 Something that may make sense to ease the porting process is for some
 of these on the boundary I/O related string manipulation functions
 (such as os.path.join) to grow encoding keyword-only arguments. The
 recommended approach would be to provide all strings, but bytes could
 also be accepted if an encoding was specified. (If you want to mix
 encodings - tough, do the decoding yourself).
 
 This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
 for it.
 
 Would it make sense to have encoding-carrying bytes and str types?
 Basically, I'm thinking of types (maybe even the current ones) that carry
 around a .encoding attribute so that they can be automatically encoded and
 decoded where necessary.  This at least would simplify APIs that need to do
 the conversion.
 
 By default, the .encoding attribute would be some marker to indicated I have
 no idea, do it explicitly and if you combine ebytes or estrs that have
 incompatible encodings, you'd either throw an exception or reset the .encoding
 to IAmConfuzzled.  But say you had an email header like:
 
 =?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=
 
 And code like the following (made less crappy):
 
 -snip snip-
 class ebytes(bytes):
     encoding = 'ascii'
 
     def __str__(self):
         s = estr(self.decode(self.encoding))
         s.encoding = self.encoding
         return s
 
 
 class estr(str):
     encoding = 'ascii'
 
 
 s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa',
         'euc-jp')
 b = bytes(s, 'euc-jp')
 
 eb = ebytes(b)
 eb.encoding = 'euc-jp'
 es = str(eb)
 print(repr(eb), es, es.encoding)
 -snip snip-
 
 Running this you get:
 
 b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド! 
 euc-jp
 
 Would it be feasible?  Dunno.  Would it help ease the bytes/str confusion?
 Dunno.  But I think it would help make APIs easier to design and use because
 it would cut down on the encoding-keyword function signature infection.
 
I like the idea of having encoding information carried with the data.
I don't think that an ebytes type that can *optionally* have an encoding
attribute makes the situation less confusing, though.  To me the biggest
problem with python-2.x's unicode/bytes handling was not that it threw
exceptions but that it didn't always throw exceptions.  You might test this
in python2::
    t = u'cafe'
    function(t)

And say, "ah my code works."  Then a user gives it this::
    t = u'café'
    function(t)

And get a unicode error because the function only works with unicode in the
ascii range.
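
The python2 behaviour itself can't be run on Python 3, but a rough
analogue of the pitfall is easy to show. Here to_ascii_wire is a
hypothetical stand-in for a function that only copes with the ascii
range:

```python
def to_ascii_wire(text):
    """Stand-in for a function that only handles the ASCII range."""
    return text.encode('ascii')


tested = to_ascii_wire('cafe')       # the developer's test input passes

try:
    to_ascii_wire('caf\u00e9')       # the user's input does not
except UnicodeEncodeError as exc:
    failure = exc
else:
    failure = None

print(tested, type(failure).__name__)
```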

ebytes seems to have the same pitfall where the code path exercised by your
tests could work with::
    eb = ebytes(b)
    eb.encoding = 'euc-jp'
    function(eb)

but the user exercises a code path that does this and fails::
    eb = ebytes(b)
    function(eb)

What do you think of making the encoding attribute a mandatory part of
creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).
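
A minimal sketch of that constructor, under the assumption that ebytes
is a bytes subclass. Validating the data up front is an extra assumption
of this sketch, not part of the suggestion itself:

```python
class ebytes(bytes):
    """Sketch: bytes that always know their encoding."""

    def __new__(cls, data, encoding):
        self = super().__new__(cls, data)
        # Assumption: check the data now, so bad input fails at creation
        # time rather than somewhere deep inside a later code path.
        self.decode(encoding)
        self.encoding = encoding
        return self

    def __str__(self):
        return self.decode(self.encoding)


eb = ebytes(b'\xa5\xcf\xa5\xed\xa1\xbc', 'euc-jp')
print(eb.encoding, str(eb))
```

With no default for the second argument, the untested code path above
(ebytes(b) with no encoding) becomes an immediate TypeError.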

-Toshio




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 11:43 AM 6/21/2010 -0400, Barry Warsaw wrote:

On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
Something that may make sense to ease the porting process is for some
of these on the boundary I/O related string manipulation functions
(such as os.path.join) to grow encoding keyword-only arguments. The
recommended approach would be to provide all strings, but bytes could
also be accepted if an encoding was specified. (If you want to mix
encodings - tough, do the decoding yourself).

This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
for it.

Would it make sense to have encoding-carrying bytes and str types?


It's not a stupid idea, and could potentially work.  It also might 
have a better chance of being able to actually be *implemented* in 
3.x than my idea.



Basically, I'm thinking of types (maybe even the current ones) that carry
around a .encoding attribute so that they can be automatically encoded and
decoded where necessary.  This at least would simplify APIs that need to do
the conversion.


I'm not really sure how much use the encoding is on a unicode object 
- what would it actually mean?


Hm. I suppose it would effectively mean this string can be 
represented in this encoding -- which is useful, in that you could 
fail operations when combining with bytes of a different encoding.


Hm... no, in that case you should just encode the string to the 
bytes' encoding, and let that throw an error if it fails.  So, 
really, there's no reason for a string to know its encoding.  All you 
need is the bytes type to have an encoding attribute, and when doing 
mixed-type operations between bytes and strings, coerce to *bytes of 
the same encoding*.


However, if .encoding is None, then coercion would follow the same 
rules as now -- i.e., convert the bytes to unicode, assuming an ascii 
encoding.  (This would be different than setting an encoding of 
'ascii', because in that case, it means you want cross-type 
operations to result in ascii bytes, rather than a  unicode string, 
and to fail if the unicode part can't be encoded appropriately.  The 
'None' setting is effectively a nod to compatibility with prior 3.x 
versions, since I assume we can't just throw out the old coercion behavior.)


Then, a few more changes to the bytes type would round out the implementation:

* Allow .decode() to not specify an encoding, unless .encoding is None

* Add back in the missing string methods (e.g. .encode()), since you
can transparently upgrade to a string


* Smart __str__, as shown in your proposal.
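
A rough sketch of these rules, written as a bytes subclass rather than a
change to bytes itself. It covers only the + operator and makes the
encoding mandatory; both are simplifications assumed for brevity:

```python
class ebytes(bytes):
    """Sketch of bytes that carry an encoding and absorb str operands."""

    def __new__(cls, data, encoding):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def decode(self, encoding=None, errors='strict'):
        # The encoding argument becomes optional because it is already known.
        return super().decode(encoding or self.encoding, errors)

    def __str__(self):
        return self.decode()

    def _raw(self, other):
        if isinstance(other, str):
            # Raises UnicodeEncodeError if the text does not fit the encoding.
            return other.encode(self.encoding)
        if isinstance(other, bytes):
            return bytes(other)
        return NotImplemented

    def __add__(self, other):
        raw = self._raw(other)
        if raw is NotImplemented:
            return NotImplemented
        return ebytes(bytes(self) + raw, self.encoding)

    def __radd__(self, other):
        raw = self._raw(other)
        if raw is NotImplemented:
            return NotImplemented
        return ebytes(raw + bytes(self), self.encoding)


cafe = ebytes('caf\xe9'.encode('latin-1'), 'latin-1')
phrase = 'le ' + cafe          # str + ebytes -> ebytes in the same encoding
print(repr(phrase), phrase.encoding, str(phrase))
```

Combining cafe with text that latin-1 cannot represent raises
UnicodeEncodeError at the point of combination, which is the early
failure the coercion rule is meant to provide.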



Would it be feasible?  Dunno.


Probably, although it might mean adding back in special cases that 
were previously taken out, and a few new ones.




  Would it help ease the bytes/str confusion?  Dunno.


Not sure what confusion you mean -- Web-SIG and I at least are not 
confused about the difference between bytes and str, or we wouldn't 
be having an issue.  ;-)  Or maybe you mean the stdlib's API 
confusion?  In which case, yes, definitely!




  But I think it would help make APIs easier to design and use because
it would cut down on the encoding-keyword function signature infection.


Not only that, but I believe it would also retroactively make the 
stdlib's implementation of those APIs correct again, and give us 
One Obvious Way to work with bytes of a known encoding, while 
constraining any unicode that gets combined with those bytes to be 
validly encodable.  It also gives you an idempotent constructor for 
bytes of a specified encoding, that can take either a bytes of 
unspecified encoding, a bytes of the correct encoding, or a string 
that can be encoded as such.


In short, +1.  (I wish it were possible to go back and make bytes 
non-strings and have only this ebytes or bstr or whatever type have 
string methods, but I'm pretty sure that ship has already sailed.)




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:

What do you think of making the encoding attribute a mandatory part of
creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).


As long as the coercion rules force str+ebytes (or str % ebytes, 
ebytes % str, etc.) to result in another ebytes (and fail if the str 
can't be encoded in the ebytes' encoding), I'm personally fine with 
it, although I really like the idea of tacking the encoding to bytes 
objects in the first place.


OTOH, one potential problem with having the encoding on the bytes 
object rather than the ebytes object is that then you can't easily 
take bytes from a socket and then say what encoding they are, without 
interfering with the sockets API (or whatever other place you get the 
bytes from).


So, on balance, making ebytes a separate type (perhaps one that's 
just a pointer to the bytes and a pointer to the encoding) would 
indeed make more sense.  It having different coercion rules for 
interacting with strings would make more sense too in that 
case.  (The ideal, of course, would still be to not let bytes objects 
be stringlike at all, with only ebytes acting string-like.  That way, 
you'd be forced to be explicit about your encoding when working with 
bytes, but all you'd need to do was make an ebytes call.)




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Terry Reedy

On 6/21/2010 11:43 AM, Barry Warsaw wrote:


This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
for it.

Would it make sense to have encoding-carrying bytes and str types?


On 2009-11-5 I posted 'Add encoding attribute to bytes' to python-ideas. 
It was shot down at the time.


Terry Jan Reedy



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Stephen J. Turnbull
P.J. Eby writes:

  Note too that this is an argument for symmetry in wrapping the
  inputs and outputs, so that the code doesn't have to know what
  it's dealing with!

and

  After all, right now, if a stdlib function might return bytes or 
  unicode depending on runtime conditions, I can't even hardcode an 
  .encode() call -- it would fail if the return type is a bytes.

I'm lost.  What stdlib functions are you talking about whose return
type depends on runtime conditions, and what runtime conditions?  What
do you mean by "wrapping"?

The only times I've run into str/bytes nondeterminacy is when I've
mixed str/bytes myself, and passed them into functions that are
type-identities (str -> str, bytes -> bytes), which then appear to
give a nondeterministic result.  It's a deterministic bug, though,
always mine. <wink>

  It's One Obvious Way that I want, but some people seem to be arguing 
  that the One Obvious Way is to Think Carefully About It Every Time -- 
  and that seems to violate the Obvious part, IMO.  ;-)

Nick alluded to The One Obvious Way as a change in architecture.

Specifically: Decode all bytes to typed objects (str, images, audio,
structured objects) at input.  Do no manipulations on bytes ever
except decode and encode (both to text, and to special-purpose objects
such as images) in a program that does I/O.  (Obviously image
manipulation libraries etc will have to operate on bytes, but they
should have no functions that consume bytes except constructors a la
bytes.decode() for text, and no functions that produce bytes except
the output serializers that write files and the like, a la
str.encode().)  Encode back to bytes on output.
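
In its simplest form the pattern looks like the sketch below. The
function name shout and the choice of upper() as the "processing" step
are purely illustrative:

```python
def shout(raw, encoding):
    """Decode at input, work on str only, encode again at output."""
    text = raw.decode(encoding)      # decode once, on the way in
    text = text.upper()              # all manipulation happens on str
    return text.encode(encoding)     # encode once, on the way out


wire_in = 'caf\xe9'.encode('latin-1')
wire_out = shout(wire_in, 'latin-1')
print(wire_out)
```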

Yes, this is tedious if you live in an ASCII world, compared to using
bytes as characters.  However, it works for the rest of us, which the
old style doesn't.

As for Think Carefully About It Every Time, that is required only in
Porting Programs That Mix Operation On Bytes With Operation On Str.
If you write programs from scratch, however, the decode-process-encode
paradigm quickly becomes second nature.


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Stephen J. Turnbull
Barry Warsaw writes:

  Would it make sense to have encoding-carrying bytes and str
  types?

Why limit that to bytes and str?  Why not have all objects carry their
serializer/deserializer around with them?

I think the answer is no, though, because (1) it would constitute an
attractive nuisance (the default would be abused, it would work fine
in Kansas, and all hell would break loose in Kagoshima, simply
delaying the pain and/or passing it on to third parties), and (2) you
really want this under control of higher level objects that have
access to some knowledge of the environment, rather than the lowest
level.


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 01:36 PM 6/21/2010 -0400, Terry Reedy wrote:

On 6/21/2010 11:43 AM, Barry Warsaw wrote:


This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
for it.

Would it make sense to have encoding-carrying bytes and str types?


On 2009-11-5 I posted 'Add encoding attribute to bytes' to 
python-ideas. It was shot down at the time.


AFAICT, that's mainly for lack of apparent use cases, and also for 
confusion.  Here, the use case (restoring the polymorphy of stdlib 
APIs) is pretty clear.


However, if we had the string equivalent of a coercion protocol (that 
core strings and bytes would co-operate with), then it would enable 
people to write their own versions of either your idea or Barry's 
idea (or other things altogether), and still get the stdlib to play along.


Personally, I think ebytes() would do the trick and it'd be nice to 
see it in stdlib, but gaining a string coercion protocol instead 
might not be a bad tradeoff.  ;-)




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 02:58 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:

Nick alluded to the The One Obvious Way as a change in architecture.

Specifically: Decode all bytes to typed objects (str, images, audio,
structured objects) at input.  Do no manipulations on bytes ever
except decode and encode (both to text, and to special-purpose objects
such as images) in a program that does I/O.


This ignores the existence of use cases where what you have is text 
that can't be properly encoded in unicode.  I know, it's a hard thing 
to wrap one's head around, since on the surface it sounds like 
unicode is the programmer's savior.  Unfortunately, real-world text 
data exists which cannot be safely roundtripped to unicode, and must 
be handled in "bytes with encoding" form for certain operations.


I personally do not have to deal with this *particular* use case any 
more -- I haven't been at NTT/Verio for six years now.  But I do know 
it exists for e.g. Asian language email handling, which is where I 
first encountered it.  At the time (this *may* have changed), many 
popular email clients did not actually support unicode, so you 
couldn't necessarily just send off an email in UTF-8.  It drove us 
nuts on the project where this was involved (an i18n of an existing 
Python app), and I think we had to compromise a bit in some fashion 
(because we couldn't really avoid unicode roundtripping due to 
database issues), but the use case does actually exist.


My current needs are simpler, thank goodness.  ;-)  However, they 
*do* involve situations where I'm dealing with *other* 
encoding-restricted legacy systems, such as software for interfacing 
with the US Postal Service that only works with a restricted subset 
of latin1, while receiving mangled ASCII from an ecommerce provider, 
and storing things in what's effectively a latin-1 database.  Being 
able to easily assert what kind of bytes I've got would actually let
me catch errors sooner, *if* those assertions were being checked when
different kinds of strings or bytes were being combined (i.e., at
coercion time).




Yes, this is tedious if you live in an ASCII world, compared to using
bytes as characters.  However, it works for the rest of us, which the
old style doesn't.


I'm not trying to go back to the old style -- ideally, I want
something that would actually improve on the "it's not really
unicode" use cases above if it were available in 2.x.


I don't want to be "encoding agnostic" or "encoding implicit" -- I
want to make it possible to be even *more* explicit and restrictive
than it is currently possible to be in either 2.x OR 3.x.  It's just 
that 3.x affords greater opportunity for doing this, and is an ideal 
place to make the switch -- i.e., at a point where you now have to 
get explicit about your encodings, anyway!




As for Think Carefully About It Every Time, that is required only in
Porting Programs That Mix Operation On Bytes With Operation On Str.
If you write programs from scratch, however, the decode-process-encode
paradigm quickly becomes second nature.


Which works if and only if your outputs are truly unicode-able.  If 
you work with legacy systems (e.g. those Asian email clients and US 
postal software), you are really working with a *character set*, not 
unicode, and so putting your data in unicode form is actually *wrong* 
-- an expedient lie.


Heresy, I know, but there you go.  ;-)



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 03:08 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:

Barry Warsaw writes:

  Would it make sense to have encoding-carrying bytes and str
  types?


I think the answer is no, though, because (1) it would constitute an
attractive nuisance (the default would be abused, it would work fine
in Kansas, and all hell would break loose in Kagoshima, simply
delaying the pain and/or passing it on to third parties),


You have the proposal exactly backwards, actually.

In Kagoshima, you'd pass in an ebytes with your encoding to a
stdlib API, and *get back an ebytes with the right encoding*, rather 
than an (incorrect and useless) unicode object which has lost data you need.




Why limit that to bytes and str?  Why not have all objects carry their
serializer/deserializer around with them?


Because it's not a serialization or deserialization.  Your conceptual 
framework here implies that unicode objects are the real thing, and 
that bytes are just a way of transporting unicode around.


But this is not the case at all, for use cases where no, really, you 
*have to* work with bytes-encoded text streams.  The mere release of 
Python 3.x will not cause all the world's applications, libraries, 
and protocols to suddenly work with unicode, where they did not before.


Being explicit about the encoding of the bytes you're flinging around 
is actually an *increase* in specificity, explicitness, robustness, 
and error-checking ability over the status quo for either 2.x *or* 
3.x...  *and* it improves these qualities for essentially *all* 
string-handling code, without requiring that code to be rewritten to do so.


It's like getting to use the time machine, really.



and (2) you
really want this under control of higher level objects that have
access to some knowledge of the environment, rather than the lowest
level.


This proposal actually has such a higher-level object: an 
ebytes.  And it passes that information *through* the lowest level, 
in such a way as to permit the stringlike operations to be fully 
polymorphic, without the information being lost inside somebody else's API.




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi
On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote:
 At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
 What do you think of making the encoding attribute a mandatory part of
 creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).
 
 As long as the coercion rules force str+ebytes (or str % ebytes,
 ebytes % str, etc.) to result in another ebytes (and fail if the str
 can't be encoded in the ebytes' encoding), I'm personally fine with
 it, although I really like the idea of tacking the encoding to bytes
 objects in the first place.
 
I wouldn't like this.  It brings us back to the python2 problem where
sometimes you pass an ebyte into a function and it works and other times you
pass an ebyte into the function and it issues a traceback.  The coercion
must end up with a str and no traceback (this assumes that we've checked
that the ebyte and the encoding match when we create the ebyte).

If you want bytes out the other end, you should either have a different
function or explicitly transform the output from str to bytes.

So, what's the advantage of using ebytes instead of bytes?

* It keeps together the text and encoding information when you're taking
  bytes in and want to give bytes back under the same encoding.
* It takes some of the boilerplate that people are supposed to do (checking
  that bytes are legal in a specific encoding) and writes it into the
  initialization of the object.  That forces you to think about the issue
  at two points in the code:  when converting into ebytes and when
  converting out to bytes.  For data that's going to be used with both
  str and bytes, this is the accepted best practice.  (For exceptions, the
  byte type remains which you can do conversion on when you want to).
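
A sketch of this variant, in which mixing with str always yields str and
getting bytes back is an explicit step. The class body is an assumption
built from the two points above, not a worked-out design:

```python
class ebytes(bytes):
    """Sketch: validated on creation, coerces to str when mixed with str."""

    def __new__(cls, data, encoding):
        self = super().__new__(cls, data)
        self.decode(encoding)    # check the bytes are legal in this encoding
        self.encoding = encoding
        return self

    def __str__(self):
        return self.decode(self.encoding)

    def __add__(self, other):
        if isinstance(other, str):
            return str(self) + other     # cannot fail: data was validated
        return super().__add__(other)

    def __radd__(self, other):
        if isinstance(other, str):
            return other + str(self)
        return NotImplemented


incoming = ebytes('caf\xe9'.encode('latin-1'), 'latin-1')
text = 'un ' + incoming                      # plain str, no traceback
outgoing = text.encode(incoming.encoding)    # explicit conversion back out
print(type(text).__name__, text, outgoing)
```

The two places where the encoding matters are visible in the usage: the
ebytes(...) call on the way in and the .encode(...) call on the way out.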

-Toshio




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw
On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:

I like the idea of having encoding information carried with the data.
I don't think that an ebytes type that can *optionally* have an encoding
attribute makes the situation less confusing, though.

Agreed.  I think the attribute should always be there, but there probably
needs to be a magic value (perhaps None) that indicates an unknown, manual,
garbage, error, broken encoding.

Examples: you read bytes off a socket and don't know what the encoding is; you
concatenate two ebytes that have incompatible encodings.
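
A sketch of how the None marker might behave in those two cases. The
choice of UnicodeError for the "do it explicitly" failure is an
assumption of the sketch:

```python
class ebytes(bytes):
    """Sketch: encoding is always present, but may be the None marker."""

    def __new__(cls, data, encoding=None):
        self = super().__new__(cls, data)
        self.encoding = encoding     # None means "unknown, do it explicitly"
        return self

    def __add__(self, other):
        if not isinstance(other, bytes):
            return NotImplemented
        same = isinstance(other, ebytes) and other.encoding == self.encoding
        # Incompatible or unknown encodings reset the marker to None.
        return ebytes(bytes(self) + bytes(other),
                      self.encoding if same else None)

    def __str__(self):
        if self.encoding is None:
            raise UnicodeError('encoding unknown; decode explicitly')
        return self.decode(self.encoding)


from_socket = ebytes(b'raw data')                 # nothing known yet
ascii_part = ebytes(b'abc', 'ascii')
eucjp_part = ebytes(b'\xa5\xcf', 'euc-jp')
mixed = ascii_part + eucjp_part                   # incompatible encodings
print(from_socket.encoding, mixed.encoding, (ascii_part + ascii_part).encoding)
```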

To me the biggest
problem with python-2.x's unicode/bytes handling was not that it threw
exceptions but that it didn't always throw exceptions.  You might test this
in python2::
    t = u'cafe'
    function(t)

And say, ah my code works.  Then a user gives it this::
    t = u'café'
    function(t)

And get a unicode error because the function only works with unicode in the
ascii range.

That's an excellent point.

ebytes seems to have the same pitfall where the code path exercised by your
tests could work with::
    eb = ebytes(b)
    eb.encoding = 'euc-jp'
    function(eb)

but the user exercises a code path that does this and fails::
    eb = ebytes(b)
    function(eb)

What do you think of making the encoding attribute a mandatory part of
creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).

If ebytes is a separate type, then definitely +1.  If 'ebytes is bytes' then
I'd probably want to default the second argument to the magical "I don't know"
marker.

-Barry




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw
On Jun 22, 2010, at 03:08 AM, Stephen J. Turnbull wrote:

Barry Warsaw writes:

  Would it make sense to have encoding-carrying bytes and str
  types?

Why limit that to bytes and str?  Why not have all objects carry their
serializer/deserializer around with them?

Only because the .encoding attribute isn't really a serializer/deserializer.
That's still bytes() and str() or the equivalent.  This is just a hint to a
specific serializer for parameters to that action.

I think the answer is no, though, because (1) it would constitute an
attractive nuisance (the default would be abused, it would work fine
in Kansas, and all hell would break loose in Kagoshima, simply
delaying the pain and/or passing it on to third parties), and (2) you
really want this under control of higher level objects that have
access to some knowledge of the environment, rather than the lowest
level.

I'm still not sure ebytes solves the problem, but it avoids one I'm most
concerned about seeing proposed.  I really really do not want to add
encoding=blah arguments to boatloads of function signatures.

-Barry




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw
On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote:

OTOH, one potential problem with having the encoding on the bytes object
rather than the ebytes object is that then you can't easily take bytes from a
socket and then say what encoding they are, without interfering with the
sockets API (or whatever other place you get the bytes from).

Unless the default was the "I don't know" marker and you were able to set it
after you've done whatever kind of application-level calculation you needed to
do.

-Barry




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw
On Jun 21, 2010, at 03:29 PM, Toshio Kuratomi wrote:

I wouldn't like this.  It brings us back to the python2 problem where
sometimes you pass an ebyte into a function and it works and other times you
pass an ebyte into the function and it issues a traceback.  The coercion
must end up with a str and no traceback (this assumes that we've checked
that the ebyte and the encoding match when we create the ebyte).

Doing this at ebyte construction time does have the nice benefit of getting
the exception early, and because the ebyte is immutable, you could cache the
results in an attribute on the ebyte.  Well, immutable if the .encoding is
also immutable.  If that can change, then you'd have to re-run the cached
decoding whenever the attribute were set, and there would be a penalty paid
each time this was done.

That, plus the socket use case, does argue for a separate ebytes type.

-Barry




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 03:29 PM 6/21/2010 -0400, Toshio Kuratomi wrote:

On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote:
 At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
 What do you think of making the encoding attribute a mandatory part of
 creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).

 As long as the coercion rules force str+ebytes (or str % ebytes,
 ebytes % str, etc.) to result in another ebytes (and fail if the str
 can't be encoded in the ebytes' encoding), I'm personally fine with
 it, although I really like the idea of tacking the encoding to bytes
 objects in the first place.

I wouldn't like this.  It brings us back to the python2 problem where
sometimes you pass an ebyte into a function and it works and other times you
pass an ebyte into the function and it issues a traceback.


For stdlib functions, this isn't going to happen unless your ebytes' 
encoding is not compatible with the ascii subset of unicode, or the 
stdlib function is working with dynamic data...  in which case you 
really *do* want to fail early!


I don't see this as a repeat of the 2.x situation; rather, it allows 
you to cause errors to happen much *earlier* than they would 
otherwise show up if you were using unicode for your encoded-bytes data.


For example, if your program's intent is to end up with latin-1 
output, then it would be better for an error to show up at the very 
*first* point where non-latin1 characters are mixed with your data, 
rather than only showing up at the output boundary!


However, if you promoted mixed-type operation results to unicode 
instead of ebytes, then you:


1) can't preserve data that doesn't have a 1:1 mapping to unicode, and

2) can't detect an error until your data reaches the output point in 
your application -- forcing you to defensively insert ebytes calls 
everywhere (vs. simply wrapping them around a handful of designated 
inputs), or else have to go right back to tracing down where the 
unusable data showed up in the first place.


One thing that seems like a bit of a blind spot for some folks is 
that having unicode is *not* everybody's goal.  Not because we don't 
believe unicode is generally a good thing or anything like that, but 
because we have to work with systems that flat out don't *do* 
unicode, thereby making the presence of (fully-general) unicode an 
error condition that has to be stamped out!


IOW, if you're producing output that has to go into another system 
that doesn't take unicode, it doesn't matter how 
theoretically-correct it would be for your app to process the data in 
unicode form.  In that case, unicode is not a feature: it's a bug.


And as it really *is* an error in that case, it should not pass 
silently, unless explicitly silenced.




So, what's the advantage of using ebytes instead of bytes?

* It keeps together the text and encoding information when you're taking
  bytes in and want to give bytes back under the same encoding.
* It takes some of the boilerplate that people are supposed to do (checking
  that bytes are legal in a specific encoding) and writes it into the
  initialization of the object.  That forces you to think about the issue
  at two points in the code:  when converting into ebytes and when
  converting out to bytes.  For data that's going to be used with both
  str and bytes, this is the accepted best practice.  (For exceptions, the
  byte type remains which you can do conversion on when you want to).


Hm.  For the output case, I suppose that means you might also want 
the text I/O wrappers to be able to be strict about ebytes' encoding.




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 04:04 PM 6/21/2010 -0400, Barry Warsaw wrote:

On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote:

OTOH, one potential problem with having the encoding on the bytes object
rather than the ebytes object is that then you can't easily take bytes from a
socket and then say what encoding they are, without interfering with the
sockets API (or whatever other place you get the bytes from).

Unless the default was the "I don't know" marker and you were able to set it
after you've done whatever kind of application-level calculation you needed to
do.


True, but making it a separate type with a required encoding gets rid
of the magical "I don't know" - the "I don't know" encoding is just a
plain old bytes object.


(In principle, you could then drop *all* the stringlike methods from 
plain-old-bytes objects.  If it's really text-in-bytes you want, you 
should use an ebytes with the encoding specified.)




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw
On Jun 21, 2010, at 01:17 PM, P.J. Eby wrote:

I'm not really sure how much use the encoding is on a unicode object - what
would it actually mean?

Hm. I suppose it would effectively mean this string can be represented in
this encoding -- which is useful, in that you could fail operations when
combining with bytes of a different encoding.

That's basically what I was thinking.

Hm... no, in that case you should just encode the string to the bytes'
encoding, and let that throw an error if it fails.  So, really, there's no
reason for a string to know its encoding.  All you need is the bytes type to
have an encoding attribute, and when doing mixed-type operations between
bytes and strings, coerce to *bytes of the same encoding*.

If ebytes were a separate type, and it did the encoding check at constructor
time, and the results of the decoding were cached, then I think you would not
need the equivalent of an estr type.  If you had a string and knew what it
could be encoded to, then you could just coerce it to an ebytes and use the
cached decoded value wherever you needed it.

E.g.

>>> mystring = 'some unicode string'
>>> myencoding = 'iso--foo'
>>> myebytes = ebytes(mystring, myencoding)
>>> myebytes.encoding == myencoding
True
>>> myebytes.string == mystring
True

So ebytes() could accept a str or bytes as its first argument.

>>> mybytes = b'some encoded string'
>>> myebytes = ebytes(mybytes, myencoding)
>>> mybytes == myebytes
True
>>> myebytes.encoding == myencoding
True

In the first example ebytes() encodes mystring to set the internal bytes
representation.  In the second example, ebytes() decodes the bytes to get the
.string attribute value.  In both cases, an exception is raised if the
encoding/decoding fails.
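
A rough sketch of how such a type might look, assuming it is written as a
bytes subclass.  No `ebytes` type exists in Python; the class, its
`.encoding` and `.string` attributes, and the use of 'utf-8' and 'latin-1'
in place of the placeholder encoding above are all assumptions made purely
for illustration:

```python
class ebytes(bytes):
    """Hypothetical sketch: bytes that always carry a validated encoding."""

    def __new__(cls, data, encoding):
        if isinstance(data, str):
            text = data
            raw = data.encode(encoding)    # UnicodeEncodeError on failure
        else:
            raw = bytes(data)
            text = raw.decode(encoding)    # UnicodeDecodeError on failure
        self = super().__new__(cls, raw)
        self._encoding = encoding
        self._string = text                # cached decoded value
        return self

    @property
    def encoding(self):
        return self._encoding

    @property
    def string(self):
        return self._string

    def __str__(self):
        # Return the cached decoding instead of the bytes repr.
        return self._string


from_text = ebytes('some unicode string', 'utf-8')
from_bytes = ebytes(b'some encoded string', 'utf-8')
```

Because both attributes are read-only properties, the cached `.string` can
never get out of step with `.encoding`, and a mismatch between data and
encoding is reported at construction time rather than at first use.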

However, if .encoding is None, then coercion would follow the same rules as
now -- i.e., convert the bytes to unicode, assuming an ascii encoding.  (This
would be different than setting an encoding of 'ascii', because in that case,
it means you want cross-type operations to result in ascii bytes, rather than
a unicode string, and to fail if the unicode part can't be encoded
appropriately.  The 'None' setting is effectively a nod to compatibility with
prior 3.x versions, since I assume we can't just throw out the old coercion
behavior.)

Then, a few more changes to the bytes type would round out the implementation:

* Allow .decode() to not specify an encoding, unless .encoding is None

* Add back in the missing string methods (e.g. .encode()), since you can
transparently upgrade to a string

* Smart __str__, as shown in your proposal.

If my example above isn't nonsense, then __str__() would just return the
.string attribute.

In short, +1.  (I wish it were possible to go back and make bytes non-strings
and have only this ebytes or bstr or whatever type have string methods, but
I'm pretty sure that ship has already sailed.)

Maybe it's PEP time?  No, I'm not volunteering. ;)

-Barry





Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Barry Warsaw
On Jun 21, 2010, at 04:16 PM, P.J. Eby wrote:

At 04:04 PM 6/21/2010 -0400, Barry Warsaw wrote:
On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote:

 OTOH, one potential problem with having the encoding on the bytes object
 rather than the ebytes object is that then you can't easily take bytes
 from a socket and then say what encoding they are, without interfering
 with the sockets API (or whatever other place you get the bytes from).

Unless the default was the "I don't know" marker and you were able to set it
after you've done whatever kind of application-level calculation you needed to
do.

True, but making it a separate type with a required encoding gets rid of the
magical "I don't know" - the "I don't know" encoding is just a plain old bytes
object.

(In principle, you could then drop *all* the stringlike methods from 
plain-old-bytes objects.  If it's really text-in-bytes you want, you should 
use an ebytes with the encoding specified.)

Yep, agreed!
-Barry




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi
On Mon, Jun 21, 2010 at 02:46:57PM -0400, P.J. Eby wrote:
 At 02:58 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:
 Nick alluded to the One Obvious Way as a change in architecture.
 
 Specifically: Decode all bytes to typed objects (str, images, audio,
 structured objects) at input.  Do no manipulations on bytes ever
 except decode and encode (both to text, and to special-purpose objects
 such as images) in a program that does I/O.
 
 This ignores the existence of use cases where what you have is text
 that can't be properly encoded in unicode.  I know, it's a hard thing
 to wrap one's head around, since on the surface it sounds like
 unicode is the programmer's savior.  Unfortunately, real-world text
 data exists which cannot be safely roundtripped to unicode, and must
 be handled in bytes-with-encoding form for certain operations.
 
 I personally do not have to deal with this *particular* use case any
 more -- I haven't been at NTT/Verio for six years now.  But I do know
 it exists for e.g. Asian language email handling, which is where I
 first encountered it.  At the time (this *may* have changed), many
 popular email clients did not actually support unicode, so you
 couldn't necessarily just send off an email in UTF-8.  It drove us
 nuts on the project where this was involved (an i18n of an existing
 Python app), and I think we had to compromise a bit in some fashion
 (because we couldn't really avoid unicode roundtripping due to
 database issues), but the use case does actually exist.
 
 My current needs are simpler, thank goodness.  ;-)  However, they
 *do* involve situations where I'm dealing with *other*
 encoding-restricted legacy systems, such as software for interfacing
 with the US Postal Service that only works with a restricted subset
 of latin1, while receiving mangled ASCII from an ecommerce provider,
 and storing things in what's effectively a latin-1 database.  Being
 able to easily assert what kind of bytes I've got would actually let
 me catch errors sooner, *if* those assertions were being checked when
 different kinds of strings or bytes were being combined (i.e., at
 coercion time).
 
While it's certainly possible that you have a grapheme that has no
corresponding unicode codepoint, it doesn't sound like this is the case
you're dealing with here.  You talk about a restricted subset of latin1,
but all of latin1's graphemes have unicode codepoints.  You also talk about
not being able to send off an email in UTF-8, but UTF-8 is an encoding of
unicode, not unicode itself.  Similarly, the statement that some email
clients don't support unicode isn't very clear as to the actual problem.  The
email client supports displaying graphemes using glyphs present on the
computer.  As long as the graphemes needed have a unicode codepoint, using
unicode inside of your application and then encoding to bytes on the way out
works fine.

Even in cases where there's no unicode codepoint for the grapheme that
you're receiving, unicode gives you a way out.  It provides you a private use
area where you can map the graphemes to unused codepoints.  Your
application keeps a mapping from that codepoint to the particular byte
sequence that you want.  Then you write a codec that converts from unicode w/
these private codepoints into your particular encoding (and from bytes into
unicode).
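
As a minimal sketch of that idea, assume a made-up legacy encoding that is
plain ASCII plus two vendor-specific two-byte sequences.  The byte values,
the mapping tables and the function names below are all hypothetical, and a
real implementation would more likely be registered through the codecs
module rather than written as two bare functions:

```python
# Hypothetical legacy byte sequences with no Unicode equivalent, each
# assigned a codepoint in the Private Use Area (U+E000 to U+F8FF).
PUA_FOR_BYTES = {
    b'\x81\x40': '\ue000',
    b'\x81\x41': '\ue001',
}
BYTES_FOR_PUA = {char: raw for raw, char in PUA_FOR_BYTES.items()}


def decode_legacy(data):
    """Decode ASCII text, mapping known two-byte sequences to PUA chars."""
    out = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair in PUA_FOR_BYTES:
            out.append(PUA_FOR_BYTES[pair])
            i += 2
        else:
            out.append(data[i:i + 1].decode('ascii'))
            i += 1
    return ''.join(out)


def encode_legacy(text):
    """Reverse of decode_legacy: PUA chars go back to their byte sequences."""
    return b''.join(
        BYTES_FOR_PUA[ch] if ch in BYTES_FOR_PUA else ch.encode('ascii')
        for ch in text
    )


raw = b'ab\x81\x40c'
text = decode_legacy(raw)
round_tripped = encode_legacy(text)
```

Inside the application the text is an ordinary str of four characters, and
the original bytes come back unchanged on the way out.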

-Toshio




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread M.-A. Lemburg
Barry Warsaw wrote:
 On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:
 
 I like the idea of having encoding information carried with the data.
 I don't think that an ebytes type that can *optionally* have an encoding
 attribute makes the situation less confusing, though.
 
 Agreed.  I think the attribute should always be there, but there probably
 needs to be a magic value (perhaps None) that indicates an unknown, manual,
 garbage, error, broken encoding.
 
 Examples: you read bytes off a socket and don't know what the encoding is; you
 concatenate two ebytes that have incompatible encodings.

Such extra information tends to be lost whenever you pass the
bytes data through a C level API or some other function that
doesn't know about the special nature of those objects, treating
them just like any bytes object.

It may sound nice in theory, but in practice it doesn't work out.

Besides, if you do know the encoding, you can easily carry the
data around in a Unicode str object.

The problem lies elsewhere: What to do with a piece of text for
which you don't know the encoding and how to combine that piece
of text with other pieces of text for which you do know the
encoding.

There are a few options at hand:

 * you keep working on the bytes data and only convert things
   to Unicode when needed and where the encoding is known

 * you decode the bytes data for which you don't have the encoding
   information into some special Unicode form (eg. using the
   surrogateescape error handler) and hope that when the time
   comes to encode the Unicode data back into bytes, the codec
   supports reversing the conversion

 * you manage the data as a list of Unicode str and
   bytes objects and don't even try to be clever about encodings
   of text with unknown encoding

It depends a lot on the use case, which of these options fits
best.
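
To make the second option concrete, here is a small sketch of a round trip
through the surrogateescape error handler (PEP 383, available since Python
3.1).  The sample bytes are invented for the example:

```python
# latin-1 bytes, but pretend the encoding is unknown at this point.
raw = b'caf\xe9 latte'

# Undecodable bytes become lone surrogates instead of raising.
text = raw.decode('ascii', errors='surrogateescape')

# Ordinary string operations still work on the result ...
upper = text.upper()

# ... and encoding with the same handler restores the original bytes.
restored = text.encode('ascii', errors='surrogateescape')
```

The escaped characters are not valid text, so encoding the string with a
strict handler (for example to UTF-8) raises an error; the round trip only
works when the output side uses the same handler.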

 To me the biggest
 problem with python-2.x's unicode/bytes handling was not that it threw
 exceptions but that it didn't always throw exceptions.  You might test this
 in python2::
    t = u'cafe'
    function(t)

 And say, ah my code works.  Then a user gives it this::
    t = u'café'
    function(t)

 And get a unicode error because the function only works with unicode in the
 ascii range.
 
 That's an excellent point.

Here's a little known fact: by changing the Python2 default
encoding to 'undefined' (yes, that's a real codec !), you can disable
all automatic string coercion in Python2.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread James Y Knight

On Jun 21, 2010, at 4:29 PM, M.-A. Lemburg wrote:

Here's a little known fact: by changing the Python2 default
encoding to 'undefined' (yes, that's a real codec !), you can disable
all automatic string coercion in Python2.


I tried that once: half the stdlib stops working if you do (for  
example, the re module), so it's not particularly useful for checking  
if your own code is unicode-safe.


James


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi
On Mon, Jun 21, 2010 at 04:09:52PM -0400, P.J. Eby wrote:
 At 03:29 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
 On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote:
  At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
  What do you think of making the encoding attribute a mandatory part of
  creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).
 
  As long as the coercion rules force str+ebytes (or str % ebytes,
  ebytes % str, etc.) to result in another ebytes (and fail if the str
  can't be encoded in the ebytes' encoding), I'm personally fine with
  it, although I really like the idea of tacking the encoding to bytes
  objects in the first place.
 
 I wouldn't like this.  It brings us back to the python2 problem where
 sometimes you pass an ebyte into a function and it works and other times you
 pass an ebyte into the function and it issues a traceback.
 
 For stdlib functions, this isn't going to happen unless your ebytes'
 encoding is not compatible with the ascii subset of unicode, or the
 stdlib function is working with dynamic data...  in which case you
 really *do* want to fail early!
 
The ebytes encoding will often be incompatible with the ascii subset.
It's the reason that people were so often tempted to change the
defaultencoding on python2 to utf8.

 I don't see this as a repeat of the 2.x situation; rather, it allows
 you to cause errors to happen much *earlier* than they would
 otherwise show up if you were using unicode for your encoded-bytes
 data.
 
 For example, if your program's intent is to end up with latin-1
 output, then it would be better for an error to show up at the very
 *first* point where non-latin1 characters are mixed with your data,
 rather than only showing up at the output boundary!
 
That highly depends on your usage.  If you're formatting a comment on a web
page, checking at output and replacing with '?' is better than a traceback.
If you're entering key values into a database, then you likely want to know
where the non-latin1 data is entering your program, not where it's mixed
with your data or the output boundary.

 However, if you promoted mixed-type operation results to unicode
 instead of ebytes, then you:
 
 1) can't preserve data that doesn't have a 1:1 mapping to unicode, and
 
ebytes should be immutable like bytes and str.  So you shouldn't lose the
data if you keep a reference to it.

 2) can't detect an error until your data reaches the output point in
 your application -- forcing you to defensively insert ebytes calls
 everywhere (vs. simply wrapping them around a handful of designated
 inputs), or else have to go right back to tracing down where the
 unusable data showed up in the first place.
 
Usually, you don't want to know where you are combining two incompatible
strings.  Instead, you want to know where the incompatible strings are being
set in the first place.  If function(a, b) tracebacks with certain
combinations of a and b I need to know where a and b are being set, not
where function(a, b) is in the source code.  So you need to be making input
values ebytes() (or str in current python3) no matter what.

 One thing that seems like a bit of a blind spot for some folks is
 that having unicode is *not* everybody's goal.  Not because we don't
 believe unicode is generally a good thing or anything like that, but
 because we have to work with systems that flat out don't *do*
 unicode, thereby making the presence of (fully-general) unicode an
 error condition that has to be stamped out!
 
I think that sometimes as well.  However, here I think you're in a bit of
a blind spot yourself.  I'm saying that making ebytes + str coerce to ebytes
will only yield a traceback some of the time; which is the python2
behaviour.  Having ebytes + str coerce to str will never throw a traceback
as long as our implementation checks that the bytes and encoding work
together from the start.

Throwing an error only on some inputs is one of the main reasons
that debugging unicode vs byte issues sucks on python2.  On my box, with my
dataset, everything works.  Toss it up on pypi and suddenly I have a user in
Japan who reports that he gets a traceback with his dataset that he can't
give to me because it's proprietary, overly large, or transient.



 IOW, if you're producing output that has to go into another system
 that doesn't take unicode, it doesn't matter how
 theoretically-correct it would be for your app to process the data in
 unicode form.  In that case, unicode is not a feature: it's a bug.
 
This is not always true.  Suppose you read a webpage, chop it up so you get
a list of words, create a histogram of word length, and then write the output as
utf8 to a database.  Should you do all your intermediate string operations
on utf8 encoded byte strings?  No, you should do them on unicode strings as
otherwise you need to know about the details of how utf8 encodes characters.
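
A tiny sketch of why the histogram comes out wrong on encoded bytes; the
word list is made up for the example:

```python
from collections import Counter

words = ['caf\u00e9', 'na\u00efve', 'tea']

# Lengths counted in characters (str) versus in UTF-8 bytes.
by_chars = Counter(len(w) for w in words)
by_bytes = Counter(len(w.encode('utf-8')) for w in words)
```

Each accented letter takes two bytes in UTF-8, so the byte-based histogram
puts the first two words one bucket too high.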

 And as it really *is* an error in that case, it should not pass
 silently, 

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread John Arbash Meinel

...
 IOW, if you're producing output that has to go into another system
 that doesn't take unicode, it doesn't matter how
 theoretically-correct it would be for your app to process the data in
 unicode form.  In that case, unicode is not a feature: it's a bug.

 This is not always true.  Suppose you read a webpage, chop it up so you get
 a list of words, create a histogram of word length, and then write the output
 as utf8 to a database.  Should you do all your intermediate string operations
 on utf8 encoded byte strings?  No, you should do them on unicode strings as
 otherwise you need to know about the details of how utf8 encodes characters.
 

You'd still have problems in Unicode given stuff like å =~ å even though
u'\xe5' vs u'a\u030a' (those will look the same depending on your
Unicode system. IDLE shows them pretty much the same, T-Bird on Windows
with my current font shows the second as 2 characters.)

I realize this was a toy example, but it does point out that Unicode
complicates the idea of 'equality' as well as the idea of 'what is a
character'. And just saying decode it to Unicode isn't really sufficient.

John
=:-



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Nick Coghlan
On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby p...@telecommunity.com wrote:
 True, but making it a separate type with a required encoding gets rid of the
 magical "I don't know" - the "I don't know" encoding is just a plain old
 bytes object.

So, to boil down the ebytes idea, it is basically a request for a
second string type that holds an octet stream plus an encoding name,
rather than a Unicode character stream. Calling it ebytes seems to
emphasise the wrong parallel in that case (you have a 'str' object
with a different internal structure, not any kind of bytes object).
For now I'll call it an altstr. Then the idea can be described as

- altstr would expose the same API as str, NOT the same API as bytes
- explicit conversion via str would use the altstr's __str__ method
- explicit conversion via bytes would use the altstr's __bytes__ method
- implicit interaction with str would convert the str to an altstr
object according to the altstr's rules. This may be best handled via a
coercion method on altstr, rather than str actually needing to know
the details (i.e. an altstr.__coerce_str__() method). For the
'ebytes' model, this would do something like
type(self)(other.encode(self.encoding), self.encoding). The
operation would then be handled by the corresponding method on the
coerced object. A new type could then override operations such as
__contains__, __mod__, format() and join().

This is still smelling an awful lot like the 2.x str type to me, but
supporting a __coerce_str__ method may allow some useful
experimentation in this space (as PJE suggested). There's a chance it
would be abused, but it offers a greater chance of success than trying
to come up with a concrete altstr type without providing a means for
experimentation first.
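
A sketch of what that coercion could look like for concatenation only.
Nothing here exists in Python: `altstr`, `__coerce_str__` and the
`_coerce` helper are hypothetical names, and the hook is called by the
class's own operators rather than by the interpreter:

```python
class altstr:
    """Hypothetical encoded string: an octet stream plus an encoding name."""

    def __init__(self, data, encoding):
        data.decode(encoding)          # validate up front; raises on mismatch
        self.data = data
        self.encoding = encoding

    def __coerce_str__(self, other):
        # Hypothetical hook: convert a str using this object's rules.
        return type(self)(other.encode(self.encoding), self.encoding)

    def _coerce(self, other):
        if isinstance(other, str):
            return self.__coerce_str__(other)
        if isinstance(other, altstr) and other.encoding == self.encoding:
            return other
        raise TypeError('cannot combine with %r' % (other,))

    def __add__(self, other):
        return type(self)(self.data + self._coerce(other).data, self.encoding)

    def __radd__(self, other):
        return type(self)(self._coerce(other).data + self.data, self.encoding)

    def __str__(self):
        return self.data.decode(self.encoding)

    def __bytes__(self):
        return self.data


cafe = altstr(b'caf\xe9', 'latin-1')
suffixed = cafe + ' au lait'
prefixed = 'le ' + cafe
```

Mixing in a str that the encoding cannot represent, such as
`cafe + '\u3042'`, raises UnicodeEncodeError at the point of combination,
which is the fail-early behaviour PJE is arguing for.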

 (In principle, you could then drop *all* the stringlike methods from
 plain-old-bytes objects.  If it's really text-in-bytes you want, you should
 use an ebytes with the encoding specified.)

Except that a lot of those string-like methods are just plain useful,
even when you *know* you're dealing with an octet stream rather than
latin-1 encoded text.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Toshio Kuratomi
On Mon, Jun 21, 2010 at 04:52:08PM -0500, John Arbash Meinel wrote:
 
 ...
  IOW, if you're producing output that has to go into another system
  that doesn't take unicode, it doesn't matter how
  theoretically-correct it would be for your app to process the data in
  unicode form.  In that case, unicode is not a feature: it's a bug.
 
  This is not always true.  Suppose you read a webpage, chop it up so you get
  a list of words, create a histogram of word length, and then write the output
  as utf8 to a database.  Should you do all your intermediate string operations
  on utf8 encoded byte strings?  No, you should do them on unicode strings as
  otherwise you need to know about the details of how utf8 encodes characters.
  
 
 You'd still have problems in Unicode given stuff like å =~ å even though
 u'\xe5' vs u'a\u030a' (those will look the same depending on your
 Unicode system. IDLE shows them pretty much the same, T-Bird on Windows
 with my current font shows the second as 2 characters.)
 
 I realize this was a toy example, but it does point out that Unicode
 complicates the idea of 'equality' as well as the idea of 'what is a
 character'. And just saying decode it to Unicode isn't really sufficient.
 
Ah -- but if you're dealing with unicode objects you can use the
unicodedata.normalize() function on them to come out with the right values.
If you're using bytes, it's yet another case where you, the programmer, have
to know what byte sequences represent combining characters in the particular
encoding that you're dealing with.
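
For instance, a short Python 3 sketch using the two spellings from John's
message:

```python
import unicodedata

composed = '\xe5'         # U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
decomposed = 'a\u030a'    # 'a' followed by U+030A COMBINING RING ABOVE

# The raw strings compare unequal even though they render alike.
same_before = composed == decomposed

# Normalizing both to one form makes them comparable.
nfc = unicodedata.normalize('NFC', decomposed)
nfd = unicodedata.normalize('NFD', composed)
```

After normalization to NFC both are the single codepoint U+00E5, so
equality and length checks behave as a user would expect.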

-Toshio




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Steven D'Aprano
On Tue, 22 Jun 2010 06:09:52 am P.J. Eby wrote:
 However, if you promoted mixed-type operation results to unicode
 instead of ebytes, then you:

 1) can't preserve data that doesn't have a 1:1 mapping to unicode,

Sounds like exactly the sort of thing the Unicode private codepoints 
were invented for, as Toshio suggests.

In any case, if there are use-cases for text that aren't solved by 
Unicode, and I'm not convinced that there are, Python doesn't need to 
solve them. At the very least, such a solution should start off as a 
third-party package to prove itself before being made a part of the 
standard library, let alone a built-in.


-- 
Steven D'Aprano


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Steven D'Aprano
On Tue, 22 Jun 2010 08:03:58 am Nick Coghlan wrote:
 On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby p...@telecommunity.com 
wrote:
  True, but making it a separate type with a required encoding gets
  rid of the magical "I don't know" - the "I don't know" encoding is
  just a plain old bytes object.

 So, to boil down the ebytes idea, it is basically a request for a
 second string type that holds an octet stream plus an encoding name,
 rather than a Unicode character stream.

Do any other languages have any equivalent to this ebytes type?

If not, how do they deal with this issue?

[...]
 This is still smelling an awful lot like the 2.x str type to me

Yes. Virtually the only difference I can see is that it lets the user 
set a per-object default encoding to use when coercing strings to and 
from bytes.

If this is not the case, can somebody please explain what I'm missing?



-- 
Steven D'Aprano


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Neil Hodgson
Steven D'Aprano:

 Do any other languages have any equivalent to this ebytes type?

   The String type in Ruby 1.9 is a byte string with an encoding attribute.

   Most online Ruby documentation is for 1.8 but the API can be examined here:
http://ruby-doc.org/ruby-1.9/index.html
   Here's something more explanatory:
http://blog.grayproductions.net/articles/ruby_19s_string

   My view is that this actually makes things much more complex by
making encoding combination an n*n problem (where n is the number of
encodings) rather than an n-sized problem when you have a single core
string type.
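
A rough sketch of the counting argument, assuming Python 3 and an arbitrary
illustrative list of encodings (neither the list nor the helper name comes
from the thread): with a single core string type, one decoder and one
encoder per encoding are enough to cover every source/destination pair,
because every conversion pivots through str.

```python
# Illustrative choice of encodings; any set of codecs would do.
ENCODINGS = ["utf-8", "latin-1", "utf-16-le", "cp1252"]

def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert between two encodings by pivoting through str."""
    return data.decode(src).encode(dst)

# There are n*n (source, destination) combinations ...
pairs = [(src, dst) for src in ENCODINGS for dst in ENCODINGS]

# ... yet only the n codecs are needed to handle all of them.
sample = "café"
results_match = all(
    transcode(sample.encode(src), src, dst) == sample.encode(dst)
    for src, dst in pairs
)
```

A type that carries its own encoding has to decide what happens for each of
those pairs when two values are combined directly.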

   Neil


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Terry Reedy

On 6/21/2010 1:58 PM, Stephen J. Turnbull wrote:


As for Think Carefully About It Every Time, that is required only in
Porting Programs That Mix Operation On Bytes With Operation On Str.


The 2.x anti-pattern


If you write programs from scratch, however, the decode-process-encode
paradigm quickly becomes second nature.


Except in this particular arena, it already should be to anyone reading 
this list. Decorate-sort-undecorate is another example of the same idea. 
Transform-compute-untransform is the basis of NP-complete theory. 
Frequency domain processing sandwiched between forward and reverse 
Fourier transforms is a third example. And so on.
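
For illustration, a minimal decode-process-encode sketch in Python 3. The
function name and the sample data are made up for this example; the
processing step (upper-casing) stands in for any text operation.

```python
def shout(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Upper-case encoded text using decode -> process -> encode."""
    text = raw.decode(encoding)        # decode at the input boundary
    processed = text.upper()           # work on characters, not bytes
    return processed.encode(encoding)  # encode at the output boundary

raw = "naïve café".encode("utf-8")
via_text = shout(raw)

# Operating on the bytes directly only touches the ASCII letters, so the
# non-ASCII characters are left in lower case.
via_bytes = raw.upper()
```

Here via_text decodes to "NAÏVE CAFÉ", while via_bytes decodes to
"NAïVE CAFé".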


--
Terry Jan Reedy



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Jess Austin
On Mon, Jun 22, 2010 at 7:27:31 PM, Steven D'Aprano st...@pearwood.info wrote:
 On Tue, 22 Jun 2010 08:03:58 am Nick Coghlan wrote:
 So, to boil down the ebytes idea, it is basically a request for a
 second string type that holds an octet stream plus an encoding name,
 rather than a Unicode character stream.

 Do any other languages have any equivalent to this ebytes type?

Ruby seems to do this:

http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html

I don't use ruby myself, and I'm probably missing some subtle flaws,
but the exposition at that link makes sense to me.

cheers,
Jess


Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Terry Reedy

On 6/21/2010 2:46 PM, P.J. Eby wrote:


This ignores the existence of use cases where what you have is text that
can't be properly encoded in unicode.


I think it depends on what you mean by 'properly'. I will try to explain 
with English examples.


1. Unicode represents a finite set of characters and symbols and a few 
control or markup operators. The potential set is unbounded, so unicode 
includes a user area. I include use of that area in 'properly'. I kind 
of suspect that the statement above does not since any byte or short 
byte sequence that does not translate can instead use the user area.


2. Unicode disclaims direct representation of font and style 
information, leaving that to markup either in or out of the text stream. 
(It made an exception for Japanese narrow and wide ASCII chars, which I 
consider to essentially be duplicate font variations of the normal ASCII 
codes.) HTML uses both in-band and out-of-band (CSS) markup. Stripping 
markup information is a loss of information. If one wants it, one must 
keep it in one form or another.


I believe that some early editors like Wordstar used high-bit-set bytes 
for bold, underline, italic on and off. Assuming I have the example 
right, can Wordstar text be 'properly encoded in unicode'? If one 
insists that that mean replacement of each of the format markup chars 
with a single defined char in the Basic Multilingual Plane, then 'no'. 
If one allows replacement by <bold>, </bold>, and so on, then 'yes'.


3. Unicode disclaims direct representation of glyphic variants (though 
again, exceptions were made for Asian acceptance). For example, in 
English, mechanically printed 'a' and 'g' are different from manually 
printed 'a' and 'g'. Representing both by the same codepoint, in itself, 
loses information. One who wishes to preserve the distinction must 
instead use a font tag or perhaps a handprinted tag. Similarly, older 
English had a significantly different glyph for 's', which looks more 
like a modern 'f'.


If IBM's EBCDIC had codes for these glyph variants, IBM might have 
insisted that unicode also have such so char for char round-tripping 
would be possible. It does not and unicode does not. (Wordstar and other 
1980s editor publishers were mostly defunct or weak and not in a 
position to make such demands.)


If one wants to write on the history of glyph evolution, say of Latin 
chars, one must either number the variants 'e-0', 'e-1', etc, or resort 
to the user area. In either case, proprietary software would be needed 
to actually print the variations with other text.



I know, it's a hard thing to wrap
one's head around, since on the surface it sounds like unicode is the
programmer's savior. Unfortunately, real-world text data exists which
cannot be safely roundtripped to unicode,


I do not believe that. Digital information can always be recoded one way 
or another. As it is, the rules were bent for Japanese, in a way that 
they were not for English, to aid round-tripping of the major public 
encodings. I can, however, believe that there were private encodings for 
which round-tripping is more difficult. But there are also difficulties 
for old proprietary and even private English encodings.
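
One possible reading of "use the user area" for round-tripping, sketched in
Python 3. The mapping (0xE000 plus the byte value) and both function names
are arbitrary choices made for this example, not a standard or anything
proposed in the thread; it only shows that bytes with no agreed meaning can
be carried through str and recovered exactly.

```python
PUA_BASE = 0xE000  # start of the BMP private use area (U+E000..U+F8FF)

def to_text(raw: bytes) -> str:
    """Map ASCII bytes to themselves and all other bytes into the PUA."""
    return "".join(chr(b) if b < 0x80 else chr(PUA_BASE + b) for b in raw)

def to_bytes(text: str) -> bytes:
    """Invert to_text(); reject characters outside the private mapping."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif PUA_BASE + 0x80 <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            raise ValueError("character %r is not part of the mapping" % ch)
    return bytes(out)

every_byte = bytes(range(256))
round_tripped = to_bytes(to_text(every_byte))
```

As the message notes, software that agrees on the private mapping is still
needed to give the escaped characters any meaning.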



--
Terry Jan Reedy



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Stephen J. Turnbull
Barry Warsaw writes:

  I'm still not sure ebytes solves the problem,

I don't see how it can.  If you have an encoding to stuff into ebytes,
you could just convert to Unicode and guarantee that all internal
string operations will succeed.  If you use ebytes instead, every
string operation has to be wrapped in try ... except EBytesError, to
no gain that I can see.

If you don't have an encoding, then you just have bytes, which
strictly speaking shouldn't be operated on (in the sense of slicing,
dicing, or stir-frying) at all if you're in an environment where they
are a carrier for formatted information such as non-ASCII characters
or PNG images.

  but it avoids one I'm most concerned about seeing proposed.  I
  really really do not want to add encoding=blah arguments to
  boatloads of function signatures.

Agreed.  But ebytes isn't a solution to that; it's a regression to one
of the hardest problems in Python 2.

OTOH, it seems to me that there's only one boatload to worry about.
That's the boatload containing protocol-less APIs, ie, Unix OS data
(names in the filesystem, content of environment variables).
Other platforms (Windows, Mac) are standardizing on protocols for
these things and enforcing them in the OS, and free Unices are going
to the convention that everything is non-normalized UTF-8.

What other boats are you worried about?



Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Stephen J. Turnbull
P.J. Eby writes:

  In Kagoshima, you'd pass in an ebytes with your encoding to a
  stdlib API, and *get back an ebytes with the right encoding*,
  rather than an (incorrect and useless) unicode object which has
  lost data you need.

How does the stdlib do that?  Unless it guesses which encoding for
Japanese is being used?  And even if this ebytes uses Shift JIS, what
makes that the right encoding for anything?

On the other hand, I know when *I* need some encoding, and when I
figure it out I will store it in an appropriate place in my program.
The problem is that for some programs it is not unlikely that I will
see all of Shift JIS, EUC-JP, ISO-2022-JP, UTF-8, and UTF-16, and on a
very bad day, RFC 2047, GB 2312, and Big5, too, used to encode
Japanese.  It's not totally unlikely for a browser to send URLs to a
server expecting UTF-8 to recover a message/rfc822 object containing
ISO-2022-JP in the mail header and EUC-JP in the body.

So I need to know which encoding was used by the server that sent the
reply, but the ebytes can't tell me that if it fishes an URL in EUC-JP
out of the message body.  I need to convert that URL to UTF-8, or most
servers will 404.
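
A small sketch of the transcoding step described above, in Python 3. The
sample path is invented for illustration; the point is that the source
encoding (the message body's) and the destination encoding (the server's)
come from two different contexts, and the program has to know both.

```python
from urllib.parse import quote

# Hypothetical input: a URL path found in an EUC-JP encoded message body.
path_in_body = "/日本語".encode("euc_jp")

# Decode with the encoding of the source context (the message body) ...
path_text = path_in_body.decode("euc_jp")

# ... then encode for the destination context (a server expecting UTF-8).
utf8_path = quote(path_text.encode("utf-8"))

# Sending the EUC-JP bytes unchanged produces a different request path.
eucjp_path = quote(path_in_body)
```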

  But this is not the case at all, for use cases where no, really, you 
  *have to* work with bytes-encoded text streams.  The mere release of 
  Python 3.x will not cause all the world's applications, libraries, 
  and protocols to suddenly work with unicode, where they did not before.

Sure.  That's what .encode() and .decode() are for.  The problem is
what to do when you don't know what to put in the parentheses, and I
can't think of a use case offhand where ebytes(stuff,'garbage')
does better than PEP 383-enabled str for:

  Being explicit about the encoding of the bytes you're flinging
  around is actually an *increase* in specificity, explicitness,
  robustness, and error-checking ability over the status quo for
  either 2.x *or* 3.x...  *and* it improves these qualities for
  essentially *all* string-handling code, without requiring that code
  to be rewritten to do so.

A well-spoken piece.  But, you see, most of those encodings are *only*
interesting so that you can transcode characters to the encoding of
interest.  What's the e.o.i.?  That is easily found in the context or
has an obvious default, if you're lucky, or otherwise a hard problem
that ebytes does nothing to help solve as far as I can see.
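
For reference, the PEP 383 behaviour mentioned above is exposed in Python 3
as the "surrogateescape" error handler. A minimal sketch, with a made-up
file name whose bytes are not valid UTF-8:

```python
# Latin-1 encoded name; byte 0xE9 is not valid UTF-8 in this position.
raw_name = b"caf\xe9.txt"

# Undecodable bytes are smuggled into str as lone surrogates ...
name = raw_name.decode("utf-8", errors="surrogateescape")

# ... and restored unchanged when encoding with the same handler.
restored = name.encode("utf-8", errors="surrogateescape")

# Ordinary string operations still work on the decodable parts.
stem, dot, extension = name.rpartition(".")
```

The escaped character carries no encoding information of its own; it only
preserves the original byte.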

Cf. Robert Collins' post
aanlktinq_d_vahbw5ikuyy9qgjqoffy4xczc0dyzt...@mail.gmail.com, where
he makes it quite explicit that a bytes interface is all about punting
in the face of missing encoding information.

  and (2) you really want this under control of higher level objects
  that have access to some knowledge of the environment, rather than
  the lowest level.
  
  This proposal actually has such a higher-level object: an 
  ebytes.

I don't see how that can be true.  An ebytes is a very low-level
object that has no idea whether its encoding is interesting (eg, the
one that an RFC or a server specifies), or a technical detail of use
only until the ebytes is decoded, then can be thrown away.

I just don't see, in the case where there is a real encoding in the
ebytes, what harm is done by decoding the ebytes to str.  If context
indicates that the encoding is an interesting one (eg, it should be
the default for encoding on output), then you want to save that in an
appropriate place that preserves not just the encoding itself, but the
context that gives it its importance.



Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Stephen J. Turnbull
Antoine Pitrou writes:

  I think it's an unfortunate analogy.

Propose a better one, then.  I'm definitely not wedded to the ones
I've proposed!

But we have a PR problem *now*.  The loyal opposition clearly intend
to continue trash-talking Python 3 until the libraries get to 100% (or
a government-approved approximation of 100%).  The topic on #python
seems unlikely to change at this point, with both Glyph and JP
pointedly failing to denounce it publicly, while Stephen defends it
and says it's not going to change as long as the libraries aren't
done.

What do you suggest?  Or do you think there's no PR problem we should
worry about, just accept that this is going to be a further drag on
adoption and improvement, and keep on keeping on?



Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Martin v. Löwis

I can only imagine how difficult can it be to do such a conversion in
a project like Twisted or Django where the I/O plays a fundamental
role.


For Django, you don't need to imagine, but can look at the actual changes:

http://bitbucket.org/loewis/django-3k/


The choice of forcing the user to use Unicode and think in Unicode
was a very brave one, and I'm sure it's for the better, but not
everyone wants to deal with that because Unicode is hard to swallow.
The majority of people prefer to stay with bytes and eventually learn
and introduce Unicode only when that is actually needed.


It's not really an issue with Unicode, but rather with characters.
Surprisingly, most people don't grasp the notion of abstract character.

This is similar to not grasping the notion of abstract integral 
number, which most programmers master over time (although my students 
typically need a year or more to get the difference between decimal 
number, two's complement, and abstract integer; the difference 
between character string and number is easier (*)).


For numbers, programmers are forced to accept the abstraction. For 
character strings, they apparently resist much more.


Regards,
Martin

(*) An anecdotal dialog may read like this
Teacher: How are numbers represented in Python?
Student: In decimal.
T: How so?
S: I can do
  x = 47
and it is decimal. I can then do
  print x
and get 47. See?
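
To make the distinction in the dialog concrete, a short Python 3 sketch (not
part of the original anecdote): the integer is one abstract value, and
decimal is only one of several ways to write or print it.

```python
x = 47

# Different literals, same abstract integer.
same_value = x == 0x2F == 0o57 == 0b101111

# Different textual representations of that one value.
as_decimal = str(x)
as_binary = format(x, "b")
as_hex = format(x, "x")

# A fixed-width byte representation is yet another, separate thing.
as_two_bytes = x.to_bytes(2, "big")
```

The parallel with text is that a str is the abstract value and each encoding
is one representation of it.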


Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Antoine Pitrou
On Mon, 21 Jun 2010 02:30:17 +0900
Stephen J. Turnbull step...@xemacs.org wrote:
 Antoine Pitrou writes:
 
   I think it's an unfortunate analogy.
 
 Propose a better one, then.  I'm definitely not wedded to the ones
 I've proposed!

I'm not sure why you want an analogy. Python 3 improves the language
and drops legacy cruft. Bringing up C++ makes the description
unnecessarily contentious and loaded (because C++ has a rather bad
reputation amongst many people; recently Linus Torvalds explained
again why he thought C was much more appropriate a programming
language). And it's not even warranted, because the situation is vastly
different.

 What do you suggest?  Or do you think there's no PR problem we should
 worry about, just accept that this is going to be a further drag on
 adoption and improvement, and keep on keeping on?

I suppose the PR problem could be solved by having an official page on
python.org explain what the new features and advantages of Python 3 over
Python 2 are. There's no such thing right now; actually, I'm not sure
there's a Web page explaining clearly what the difference is about, why
it was done in such a compatibility-breaking way, and what we advise
(both actual and potential) users to do.

I suppose that's a task for the Web content editor community.

Regards

Antoine.


Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Antoine Pitrou
On Sun, 20 Jun 2010 14:26:28 +0200
Giampaolo Rodolà g.rod...@gmail.com wrote:
 I attempted to port pyftpdlib to python 3 several times and the
 biggest show stopper has always been the bytes / string difference
 introduced by Python 3 which forces you to *know* and *use* Unicode
 every time you deal with some text and 2to3 is completely useless
 here.

I don't really understand what the difficulties are. A character is a
character; converting from bytes to characters requires knowing the
encoding, which your protocol should specify somewhere (of course, I
suppose FTP is old and crummy enough that it may not specify anything).

An encoding is nothing more than a transformation. When you get
gzipped data, you must decompress it before doing anything useful out
of it. Similarly, when you get (say) UTF-8 data, you must decode it
before doing anything useful out of it.
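
The analogy can be sketched in Python 3 as two stacked transformations that
are undone in reverse order. The payload below is invented for the example.

```python
import gzip

# Sender side: text -> UTF-8 bytes -> gzipped bytes.
original = "héllo wörld"
payload = gzip.compress(original.encode("utf-8"))

# Receiver side: undo each transformation before using the data.
raw = gzip.decompress(payload)  # gzipped bytes -> UTF-8 bytes
text = raw.decode("utf-8")      # UTF-8 bytes -> characters

# Lengths differ because bytes and characters are different units.
byte_count = len(raw)
char_count = len(text)
```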

 I can only imagine how difficult can it be to do such a conversion in
 a project like Twisted or Django where the I/O plays a fundamental
 role.

Twisted actually seems to enforce the bytes / unicode separation quite
well already, so I don't think they should have many problems on that
front. Modern Web frameworks seem to be in the same boat (they already
give the Web developer unicode strings to play with, and handle the
encoding/decoding at the IO boundary transparently).

 The choice of forcing the user to use Unicode and think in Unicode
 was a very brave one, and I'm sure it's for the better, but not
 everyone wants to deal with that because Unicode is hard to swallow.

Could Google fund a project named Unicode Swallow?

Regards

Antoine.




Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Guido van Rossum
On Sun, Jun 20, 2010 at 5:26 AM, Giampaolo Rodolà g.rod...@gmail.com wrote:
 2010/6/20 Steven D'Aprano st...@pearwood.info:
 Python 2.x introduced Unicode strings. Python 3.x merely makes them the
 default.

 Merely? To me this looks as the main reason why a lot of projects
 haven't been ported to Python 3 yet.
 I attempted to port pyftpdlib to python 3 several times and the
 biggest show stopper has always been the bytes / string difference
 introduced by Python 3 which forces you to *know* and *use* Unicode
 every time you deal with some text

Ah, but this is the crux of the difference between Python 2 and 3. The
distinction between text and bytes is crucial, and Python 2 tried to
paper over the differences in a way that led to endless pain. Many
clumsy and shaky hacks have been invented to alleviate the pain but it
never goes away. Python 3 takes a much clearer stance on the
difference -- your code *must* be aware of the distinction and it
*must* deal with it.

The problem comes exactly where you find it: when *porting* existing
code that uses aforementioned ways to alleviate the pain, you find
that the hacks no longer work and a properly layered design is needed
that clearly distinguishes between which variables contain bytes and
which text.
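
A minimal sketch of what such a layered design might look like for a
line-based protocol, assuming Python 3. The function names and the
FTP-flavoured sample data are hypothetical and are not taken from pyftpdlib
or any other library: bytes are decoded once on the way in, text is encoded
once on the way out, and everything in between handles str only.

```python
def parse_command(line: bytes, encoding: str = "ascii"):
    """Input boundary: decode once, then work purely with text."""
    text = line.decode(encoding).rstrip("\r\n")
    verb, _, argument = text.partition(" ")
    return verb.upper(), argument

def format_reply(code: int, message: str, encoding: str = "ascii") -> bytes:
    """Output boundary: build the reply as text, encode once."""
    return f"{code} {message}\r\n".encode(encoding)

# Python 3 refuses to mix the two types implicitly.
try:
    b"USER " + "anonymous"
    mixed_ok = True
except TypeError:
    mixed_ok = False

command = parse_command(b"user anonymous\r\n")
reply = format_reply(331, "Password required")
```

The ASCII default is an assumption of this sketch; a real protocol layer
would take the encoding from the protocol specification or from negotiation.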

 and 2to3 is completely useless here.

Alas, this is true, because it is not a matter of changing some simple
things. The old ways are no longer supported.

 I can only imagine how difficult can it be to do such a conversion in
 a project like Twisted or Django where the I/O plays a fundamental
 role.

Django actually took one of the most principled stances towards this
issue and has already been ported (although the port is not maintained
by the core Django developers yet). I can't speak for Twisted but I
know they have some funding towards a port.

The problem is often worse for smaller libraries (like I presume
pyftpdlib is) which don't have a clear stance about bytes vs. text.

Another problem is some internet protocols (of which FTP I believe is
one) which use antiquated models for dealing with binary vs. text
data, often focusing entirely on encodings (usually and mistakenly
called character sets) rather than on proper Unicode support.

 The choice of forcing the user to use Unicode and think in Unicode
 was a very brave one, and I'm sure it's for the better, but not
 everyone wants to deal with that because Unicode is hard to swallow.

Education is needed. When you search Google (or Bing, for that matter
:-) for "python unicode" the first hit is
http://www.amk.ca/python/howto/unicode, which is highly detailed but
probably too much information for the typical person faced with a
UnicodeError exception traceback (that page is also focused on Python
2). What we need is a cookbook on how to deal with various common
situations.

 The majority of people prefer to stay with bytes and eventually learn
 and introduce Unicode only when that is actually needed.

This is exactly what we tried to do in Python 2 and it was a flagrant
disaster. It's just that the work-arounds people have created to deal
with it don't port clearly -- which is by design.

This is why I've always said that I assumed that the Python 3
transition would take 5 years.

On the #python issue, I expect that IRC is much less influential than
some here fear (and than some fervent IRC users believe). I don't see
reason for panic or heavy-handed interference. OTOH engaging the
channel operators more in python-dev sounds like a useful approach.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Stephen J. Turnbull
Pass the ketchup, I need to eat my words.

I wrote:

  The loyal opposition clearly intend to continue trash-talking
  Python 3 until the libraries get to 100% (or a government-approved
  approximation of 100%).  The topic on #python seems unlikely to
  change at this point, with both Glyph and JP pointedly failing to
  denounce it publicly, while Stephen defends it and says it's not
  going to change as long as the libraries aren't done.

It would seem from posts I received after replying (local mail glitch,
should have known there was more coming :-( ) that the facts are that
the topic is quite likely to change soonish, and that trash-talking
is being done, if at all, by trolls.  (Having spent a few hours on
#python today, I see that's a lot more possible than I would have
believed in this community.  Nobody's immune.)

Glyph, JP, and Stephen have my personal apologies.



Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Laurens Van Houtven
On Sun, Jun 20, 2010 at 7:30 PM, Stephen J. Turnbull step...@xemacs.org wrote:
 Antoine Pitrou writes:
 But we have a PR problem *now*.  The loyal opposition clearly intend
 to continue trash-talking Python 3 until the libraries get to 100% (or
 a government-approved approximation of 100%).  The topic on #python
 seems unlikely to change at this point, with both Glyph and JP
 pointedly failing to denounce it publicly, while Stephen defends it
 and says it's not going to change as long as the libraries aren't
 done.

Huh? We just changed the topic on #python because people complained
about it. We didn't do it earlier because we didn't know it was a
problem. Defending it doesn't mean it's set in stone :-)

I don't wanna come across like a jerk but could we please not use
loaded terms like loyal opposition and trash-talking? I don't
really think that's what people do or are (or at least want to
be/intend to do). I've really honestly tried my best to fix this
situation (see the other thread) and the people whom I've gotten input
from (both here and in the IRC channels) have been nothing but
helpful.

 What do you suggest?  Or do you think there's no PR problem we should
 worry about, just accept that this is going to be a further drag on
 adoption and improvement, and keep on keeping on?

I very much like Martin and Antoine's ideas of putting the thing up on
python.org, that might also solve people's problems with the apparent
dissonance between #python and python-dev/the PSF that neither side
really wants. To the contrary, I think everyone wants this situation
to improve, including Guido, apparently. Myself included, I think
everyone stands to gain here.


thanks for listening
Laurens


Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Stephen J. Turnbull
Guido van Rossum writes:

  On the #python issue, I expect that IRC is much less influential than
  some here fear (and than some fervent IRC users believe). I don't see
  reason for panic or heavy-handed interference. OTOH engaging the
  channel operators more in python-dev sounds like a useful approach.

More vice-versa, I now think.  Ie, (somewhat) greater python-dev
presence on #python is more important.  I sort of assumed that people
actually participated in #python, as a number do in c.l.p, but that
doesn't seem to be so.  At least while I was there, I didn't see
anybody else who seemed to be python-dev, whether core or the regular
denizens of the peanut gallery.

From a few hours monitoring and participating in #python, Laurens
gives a pretty accurate summary of the kind of people in the channel.  I
didn't see anything about Python 3, but I can definitely imagine there
being Python-3-baiting trolls.  There certainly were a few trollish
posters.

Anyway, what I personally plan to do is put in a couple of hours a
week on #python, and I probably mostly won't mention Python 3 unless
asked, and maybe in discussing Unicode issues.  While I don't claim to
be particularly *representative* of python-dev, an additional
dimension of diversity should go a long way.



Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread P.J. Eby

At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote:

The problem comes exactly where you find it: when *porting* existing
code that uses aforementioned ways to alleviate the pain, you find
that the hacks no longer work and a properly layered design is needed
that clearly distinguishes between which variables contain bytes and
which text.


Actually, I would say that it's more that (in the network protocol 
case) we *have* bytes, some of which we would like to *treat* as 
text, yet do not wish to constantly convert back and forth to 
full-blown unicode -- especially since the protocols themselves 
designate ASCII or latin-1 at the transport layer (sometimes with 
odder encodings above, but these already have to be explicitly dealt 
with by existing code).


While reading over this thread, I'm wondering whether at least my 
(WSGI-related) problems in this area would be solved by the 
availability of a type (say bstr) that was simply a wrapper 
providing string-like behavior over an underlying bytes, byte array, 
or memoryview, that would produce objects of compatible type when 
combined with strings (by encoding them to match).


Then, I could wrap bytes with it to pass them to string operations, 
and then feed them back into everything else.  The bstr type ideally 
would be directly compatible with bytes I/O, or at least have a 
.bytes attribute that would be.


It seems like that would reduce WSGI porting issues quite a bit, 
since it would mostly consist of throwing extra bstr() calls in where 
things are breaking, and maybe grabbing the .bytes attribute for I/O.


This approach would still be explicit as to what types you're working 
with, but would not require O(n) *conversions* at every interaction 
boundary.  It would be limited, of course, to single-byte encodings 
with all characters (0-255) valid.


OTOH, maybe there should just be a bytestrings module with 
bytestrings.ascii and bytestrings.latin1, and between the two that 
should cover the network protocol needs quite well.


Actually, if the Python 3 str() constructor could do O(1) conversion 
for the latin-1 case (i.e., just wrapped the underlying bytes), I 
would just put, bstr = lambda x: str(x,'latin-1') at the top of my 
programs and have roughly the same effect.


This idea is still a bit half-baked, but a more baked version might 
be just the ticket for porting stuff that used str to work with bytes 
in 2.x, if only because writing, e.g.:


 newurl = bstr(urljoin(bstr(base), 'subdir'))

seems so much saner than writing *this* everywhere:

 newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1')

It is perhaps a bit late to propose this idea, since ideally we would 
also want to use it in 2.x to aid porting.  But I'm curious if any 
other people here experiencing byte/unicode woes in relation to 
network protocols would find this a solution to their chief 
frustration.  (i.e., that the stdlib often insists now on strings, 
where effectively bytes were usable before, and thus one must do 
conversions both coming and going.)
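
A rough sketch of the latin-1 shortcut in today's Python 3, under stated
assumptions: bstr below is a plain function rather than the proposed wrapper
type, the URL is made up, and the conversion is an ordinary O(n) decode, not
the O(1) wrapping wished for above. Note also that calling str(x, 'latin-1')
on something that is already a str raises TypeError, so the way back to
bytes is written here as .encode('latin-1').

```python
from urllib.parse import urljoin

def bstr(data: bytes) -> str:
    """Treat bytes as text using latin-1, which maps bytes 0-255 one to one."""
    return str(data, "latin-1")

# latin-1 round-trips every possible byte value without loss.
all_bytes = bytes(range(256))
lossless = bstr(all_bytes).encode("latin-1") == all_bytes

# Hypothetical WSGI-style usage: bytes in, str-based stdlib call, bytes out.
base = b"http://example.com/app/"
newurl = urljoin(bstr(base), "subdir").encode("latin-1")
```

This keeps the types explicit at each step, at the cost of one conversion in
each direction.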




Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Jesse Noller
On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby p...@telecommunity.com wrote:
 At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote:

 The problem comes exactly where you find it: when *porting* existing
 code that uses aforementioned ways to alleviate the pain, you find
 that the hacks no longer work and a properly layered design is needed
 that clearly distinguishes between which variables contain bytes and
 which text.

 Actually, I would say that it's more that (in the network protocol case) we
 *have* bytes, some of which we would like to *treat* as text, yet do not
 wish to constantly convert back and forth to full-blown unicode --
 especially since the protocols themselves designate ASCII or latin-1 at the
 transport layer (sometimes with odder encodings above, but these already
 have to be explicitly dealt with by existing code).

 While reading over this thread, I'm wondering whether at least my
 (WSGI-related) problems in this area would be solved by the availability of
 a type (say bstr) that was simply a wrapper providing string-like behavior
 over an underlying bytes, byte array, or memoryview, that would produce
 objects of compatible type when combined with strings (by encoding them to
 match).

 Then, I could wrap bytes with it to pass them to string operations, and then
 feed them back into everything else.  The bstr type ideally would be
 directly compatible with bytes I/O, or at least have a .bytes attribute that
 would be.

 It seems like that would reduce WSGI porting issues quite a bit, since it
 would mostly consist of throwing extra bstr() calls in where things are
 breaking, and maybe grabbing the .bytes attribute for I/O.

 This approach would still be explicit as to what types you're working with,
 but would not require O(n) *conversions* at every interaction boundary.  It
 would be limited, of course, to single-byte encodings with all characters
 (0-255) valid.

 OTOH, maybe there should just be a bytestrings module with bytestrings.ascii
 and bytestrings.latin1, and between the two that should cover the network
 protocol needs quite well.

 Actually, if the Python 3 str() constructor could do O(1) conversion for the
 latin-1 case (i.e., just wrapped the underlying bytes), I would just put,
 bstr = lambda x: str(x,'latin-1') at the top of my programs and have
 roughly the same effect.

 This idea is still a bit half-baked, but a more baked version might be just
 the ticket for porting stuff that used str to work with bytes in 2.x, if
 only because writing, e.g.:

     newurl = bstr(urljoin(bstr(base), 'subdir'))

 seems so much saner than writing *this* everywhere:

     newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1')

 It is perhaps a bit late to propose this idea, since ideally we would also
 want to use it in 2.x to aid porting.  But I'm curious if any other people
 here experiencing byte/unicode woes in relation to network protocols would
 find this a solution to their chief frustration.  (i.e., that the stdlib
 often insists now on strings, where effectively bytes were usable before,
 and thus one must do conversions both coming and going.)


I hate to reply with a simple +1 - but I've heard this pain and
proposal from a frightening number of people. Something which allowed
you to use bytes with some of the string methods would go a really long
way to solving a lot of people's Python 3 pain. I don't relish the idea
that once people start moving over, there might be a billion
implementations of things like this.

jesse


Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread A.M. Kuchling
On Sun, Jun 20, 2010 at 10:57:05AM -0700, Guido van Rossum wrote:
 Education is needed. When you search Google (or Bing, for that matter
 :-) for python unicode the first hit is
 http://www.amk.ca/python/howto/unicode, which is highly detailed but
 probably too much information for the typical person faced with a
 UnicodeError exception traceback (that page is also focused on Python
 2). What we need is a cookbook on how to deal with various common

Eep!  That should be directed to
http://docs.python.org/howto/unicode.html, the copy that's actually
incorporated in the Python docs.  I'll fix that immediately.

Regarding a smaller document for people who hit a UnicodeError
exception: could we write a little Unicode FAQ for python.org?

--amk



Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Terry Reedy

On 6/20/2010 8:26 AM, Giampaolo Rodolà wrote:


 I attempted to port pyftpdlib to python 3 several times and the
 biggest show stopper has always been the bytes / string difference
 introduced by Python 3 which forces you to *know* and *use* Unicode
 every time you deal with some text and 2to3 is completely useless
 here.


I believe the advice in the wiki porting page is to use unicode() and 
bytes() but never str(), in a version that runs in 2.6. Then 2to3 should 
do fine. For 2.5-, add 'bytes = str' somewhere.
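
A rough sketch of the 'bytes = str' idea, for illustration only. It swaps in a single-source shim rather than the 2to3 workflow described above, and the names text_type, to_wire and from_wire are hypothetical, not taken from the wiki page.

```python
# Hypothetical compatibility shim, meant to run unchanged on 2.x and 3.x.
try:
    bytes                      # exists on 2.6+ and 3.x
except NameError:              # 2.5 and earlier
    bytes = str

try:
    text_type = unicode        # 2.x: the Unicode text type
except NameError:
    text_type = str            # 3.x: str is already Unicode text


def to_wire(value, encoding='utf-8'):
    """Return bytes for the socket, encoding text explicitly."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return bytes(value)


def from_wire(data, encoding='utf-8'):
    """Return text for the program, decoding bytes explicitly."""
    return data.decode(encoding)
```

The point of the shim is that the rest of the code only ever names the text type and bytes, never a bare str() whose meaning changes between versions.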


2to3 still gets patches, I believe, when someone exhibits code that 
could and ought to be converted but is not.


I suspect that if you posted 'Problems porting pyftpdlib to Python3',
you would get some help. If it involved inadequacies in the current
tools and guides, it would be on-topic here. Or try python-list.



 The choice of forcing the user to use Unicode and think in Unicode
 was a very brave one, and I'm sure it's for the better, but not
 everyone wants to deal with that because Unicode is hard to swallow.


I felt that way until my daughter decided to switch from Spanish to
Japanese for her foreign language. Once I quit fighting it, it became
much easier to swallow and learn. As it turns out, thinking in Unicode
is a pretty straightforward generalization of thinking in ascii. There
are some annoying glitches due to the need to accommodate legacy systems.
The plethora of legacy encodings for various subsets, besides ascii, is
also a nuisance.



 The majority of people

who use latin-char alphabets

 prefer to stay with bytes and eventually learn
 and introduce Unicode only when that is actually needed.


The example at
http://code.google.com/p/pyftpdlib/
uses names and filenames. Without unicode, these are restricted to 
ascii, unless you use multiple encodings, which to me would be worse.


Terry Jan Reedy




Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Terry Reedy

On 6/20/2010 1:30 PM, Stephen J. Turnbull wrote:

 The topic on #python seems unlikely to change at this point


I just verified that, thanks to Laurens and whoever, it has been.
It is now rather good.

Terry Jan Reedy



Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Laurens Van Houtven
On Sun, Jun 20, 2010 at 11:30 PM, Terry Reedy tjre...@udel.edu wrote:
 On 6/20/2010 8:26 AM, Giampaolo Rodolà wrote:

 I attempted to port pyftpdlib to python 3 several times and the
 biggest show stopper has always been the bytes / string difference
 introduced by Python 3 which forces you to *know* and *use* Unicode
 every time you deal with some text and 2to3 is completely useless
 here.

 I believe the advice in the wiki porting page is to use unicode() and
 bytes() but never str(), in a version that runs in 2.6. Then 2to3 should do
 fine. For 2.5-, add 'bytes = str' somewhere.

Really? I thought you were supposed to call encode/decode methods on
the appropriate thing, depending on whether it comes from a byte source
or a character source. The problems arise when you're doing things
like paths, which I believe are bytes on *nix and proper Unicode on
Windows (which basically just means they enforce an encoding, UTF-16
if I'm not mistaken). I don't actually use Windows so I might be
completely wrong here.
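
A minimal sketch of that decode-on-input, encode-on-output pattern, assuming UTF-8 on the wire. The handle_line helper, the command format and the reply text are invented for illustration and are not pyftpdlib's API.

```python
def handle_line(raw_line, encoding='utf-8'):
    """Decode at the input boundary, work on text, encode at the output boundary.

    The command format and reply text are invented for illustration.
    """
    text = raw_line.decode(encoding).rstrip('\r\n')    # bytes -> str
    verb, _, argument = text.partition(' ')
    reply = '200 {0} ok: {1}\r\n'.format(verb.upper(), argument)
    return reply.encode(encoding)                      # str -> bytes


reply = handle_line('stor caf\u00e9.txt\r\n'.encode('utf-8'))
```

Everything between the two boundaries works on str only, so a non-ASCII filename fails loudly at decode time instead of somewhere in the middle of the program.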

 2to3 still gets patches, I believe, when someone exhibits code that could
 and ought to be converted but is not.

 I suspect that if you posted 'Problems porting pyftpdlib to Python3', you
 would get some help. If it involved inadequacies in the current tools and
 guides, it would be on-topic here. Or try python-list.

 The choice of forcing the user to use Unicode and think in Unicode
 was a very brave one, and I'm sure it's for the better, but not
 everyone wants to deal with that because Unicode is hard to swallow.

 I felt that way until my daughter decided to switch from Spanish to Japanese
 for her foreign language. Once I quit fighting it, it became much easier
 to swallow and learn. As it turns out, thinking in Unicode is a pretty
 straightforward generalization of thinking in ascii. There are some annoying
 glitches due to the need to accommodate legacy systems. The plethora of
 legacy encodings for various subsets, besides ascii, is also a nuisance.

I think doing unicode/str properly in 2.x is very important, and #python
stresses it quite often. I think Py3k's strictness is a good idea
because people very often write something that appears to work for a
long time, and then someone tries it using funny bytes, and everything
blows apart. Convincing people their software is wrong when
everything worked five minutes ago is really hard :-)

You'd be surprised how long it can take before some of these problems
are found. A couple of weeks ago in #python we had exactly this
problem when we were helping Blender folks. There was a bug report
from a German Blender user; it turns out Blender ignores unicode in some
critical spot, making importing between people who disagree on charsets
impossible. And Blender isn't exactly a project that's two weeks old
and filled with idiots :) The downside is that *fixing* them then
becomes a nontrivial task.

The central problem is probably that a lot of people don't understand
Unicode. Recently I learned that even Tanenbaum got it wrong in his
latest revision of the computer networks book! (Although that might
just be my Dutch translation of it being bad).

 Terry Jan Reedy

Laurens


Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Nick Coghlan
 I hate to reply with a simple +1 - but I've heard this pain and
 proposal from a frightening number of people. Something which allowed
 you to use bytes with some of the string methods would go a really long
 way to solving a lot of people's Python 3 pain. I don't relish the idea
 that once people start moving over, there might be a billion
 implementations of things like this.

My concern with it would be creating the temptation to use these new
objects that can't tolerate multibyte or variable character length
encodings when the general string type was more relevant (thus to some
degree perpetuating Python 2.x issues with incomplete Unicode
handling).

Perhaps if people could identify which specific string methods are
causing problems? In 3.2, there really aren't that many differences
between the available methods for strings and bytes:

>>> set(dir(str)) - set(dir(bytes))
{'isprintable', 'format', '__mod__', 'encode', 'isidentifier',
'_formatter_field_name_split', 'isnumeric', '__rmod__', 'isdecimal',
'_formatter_parser'}
>>> set(dir(bytes)) - set(dir(str))
{'decode', 'fromhex'}
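
The same comparison can be rerun as a small script on whatever interpreter is at hand. This is only a sketch: the exact sets depend on the Python version, so the output will not necessarily match the 3.2 listing above, and filtering out underscore-prefixed names is my own choice.

```python
# Recompute the comparison on whatever interpreter runs this; the exact
# sets depend on the Python version, so nothing is hard-coded here.
str_only = set(dir(str)) - set(dir(bytes))
bytes_only = set(dir(bytes)) - set(dir(str))

# Ignore underscore-prefixed names to focus on the public methods.
public_str_only = sorted(n for n in str_only if not n.startswith('_'))
public_bytes_only = sorted(n for n in bytes_only if not n.startswith('_'))

print('str only:  ', public_str_only)
print('bytes only:', public_bytes_only)
```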

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Terry Reedy

On 6/20/2010 4:10 PM, Jesse Noller wrote:

 On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby p...@telecommunity.com wrote:

 While reading over this thread, I'm wondering whether at least my
 (WSGI-related) problems in this area would be solved by the availability of
 a type (say bstr) that was simply a wrapper providing string-like behavior
 over an underlying bytes, byte array, or memoryview, that would produce
 objects of compatible type when combined with strings (by encoding them to
 match).

 I hate to reply with a simple +1 - but I've heard this pain and
 proposal from a frightening number of people. Something which allowed
 you to use bytes with some of the string methods would go a really long
 way to solving a lot of people's Python 3 pain. I don't relish the idea
 that once people start moving over, there might be a billion
 implementations of things like this.


Given that the 3.x bytes and bytearray classes do retain text methods 
like .capitalize(), which are meaningless for arbitrary binary data, it 
is not clear to me what you are asking for or what problem a new class 
would solve. I am curious though.


Terry Jan Reedy



Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Jesse Noller



On Jun 20, 2010, at 6:21 PM, Terry Reedy tjre...@udel.edu wrote:


 On 6/20/2010 4:10 PM, Jesse Noller wrote:
 On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby p...@telecommunity.com wrote:

 While reading over this thread, I'm wondering whether at least my
 (WSGI-related) problems in this area would be solved by the availability of
 a type (say bstr) that was simply a wrapper providing string-like behavior
 over an underlying bytes, byte array, or memoryview, that would produce
 objects of compatible type when combined with strings (by encoding them to
 match).

 I hate to reply with a simple +1 - but I've heard this pain and
 proposal from a frightening number of people. Something which allowed
 you to use bytes with some of the string methods would go a really long
 way to solving a lot of people's Python 3 pain. I don't relish the idea
 that once people start moving over, there might be a billion
 implementations of things like this.

 Given that the 3.x bytes and bytearray classes do retain text methods
 like .capitalize(), which are meaningless for arbitrary binary data, it
 is not clear to me what you are asking for or what problem a new class
 would solve. I am curious though.

Ask the web-sig and wsgi folks for starters. I know they've  
experienced non-zero pain.



Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Steven D'Aprano
On Mon, 21 Jun 2010 08:01:08 am Laurens Van Houtven wrote:

 I think doing unicode/str properly in 2.x is very important, #python
 stresses it quite often, I think Py3k's strictness is a good idea
 because people very often write something that appears to work for a
 long time, and then someone tries it using funny bytes, and
 everything blows apart. Convincing people their software is wrong
 when everything worked five minutes ago is really hard :-)

Worse is when you have people who, when faced with their software 
failing to handle filenames containing non-ASCII characters (those 
funny letters), insist that the problem is the user for giving 
non-ASCII characters. Even when they're in the user's native 
(non-Latin) language. Even when the OS supports them.

Gah.


-- 
Steven D'Aprano


Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread P.J. Eby

At 08:08 AM 6/21/2010 +1000, Nick Coghlan wrote:

 Perhaps if people could identify which specific string methods are
 causing problems?


__getitem__(int) returns an integer rather than a bytestring, so 
anything that manipulates individual characters can't be given bytes 
and have it work.


That was one of the key differences I had in mind for a bstr type, 
apart from  designing it to coerce normal strings to bstrs in 
cross-type operations, and to allow O(1) conversion to/from bytes.
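
To make the indexing difference and the coercion idea concrete, here is a rough sketch. BStr is a hypothetical class written for this illustration, not an existing type. It makes no attempt at O(1) conversion, and it does not solve the 'in' operator problem, since str.__contains__ cannot be overridden from the right-hand side.

```python
class BStr:
    """Hypothetical sketch of a 'bstr' wrapper; not an existing type."""

    def __init__(self, data, encoding='latin-1'):
        if isinstance(data, BStr):
            data = data.bytes
        elif isinstance(data, str):
            data = data.encode(encoding)     # coerce ordinary strings
        self.bytes = bytes(data)             # the attribute to hand to I/O
        self.encoding = encoding

    def __getitem__(self, index):
        if isinstance(index, int):
            # range() normalises negative indexes and raises IndexError
            # when out of bounds; the result is a length-1 BStr, not an int.
            index = range(len(self.bytes))[index]
            return BStr(self.bytes[index:index + 1], self.encoding)
        return BStr(self.bytes[index], self.encoding)

    def __add__(self, other):
        return BStr(self.bytes + BStr(other, self.encoding).bytes, self.encoding)

    def __radd__(self, other):
        return BStr(BStr(other, self.encoding).bytes + self.bytes, self.encoding)

    def __len__(self):
        return len(self.bytes)

    def __repr__(self):
        return 'BStr({0!r})'.format(self.bytes)


raw = b'GET /index.html'
first_as_int = raw[0]            # plain bytes: indexing gives an integer
first_as_bytes = raw[0:1]        # slicing is needed to get a bytestring
wrapped = BStr(raw)
first_wrapped = wrapped[0]       # the sketch keeps a string-like result
joined = 'X-' + wrapped[0:3] + b'!'
```

Mixed operands on either side end up as a BStr because neither str nor bytes knows how to add a BStr, so Python falls back to the wrapper's __radd__.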


Another randomly chosen byte/string incompatibility (Python 3.1; I 
don't have 3.2 handy at the moment):


>>> os.path.join(b'x','y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python31\lib\ntpath.py", line 161, in join
    if b[:1] in seps:
TypeError: Type str doesn't support the buffer API

>>> os.path.join('x',b'y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python31\lib\ntpath.py", line 161, in join
    if b[:1] in seps:
TypeError: 'in <string>' requires string as left operand, not bytes

Ironically, it seems to me that in trying to make the type
distinction more rigid, Py3K fails in this area precisely because it
is not a rigidly typed language in the Java or Haskell sense: i.e.,
os.path.join doesn't say, "I need two stringlike objects of the *same
type*", not even in its docstring.
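
The same-type requirement can be made explicit with a small helper. This is a sketch under assumptions: join_same_type is a hypothetical name, posixpath is used directly so the result does not depend on the platform, and os.fsencode/os.fsdecode only exist from Python 3.2 on, so this would not have run on the 3.1 shown above.

```python
import os
import posixpath   # used directly so the result does not depend on the OS


def join_same_type(base, *parts):
    """Join path components after coercing them to the type of `base`.

    Hypothetical helper: str/bytes conversion uses the filesystem
    encoding via os.fsencode/os.fsdecode (available since Python 3.2).
    """
    convert = os.fsencode if isinstance(base, bytes) else os.fsdecode
    return posixpath.join(base, *[convert(part) for part in parts])


try:
    posixpath.join(b'x', 'y')          # mixed types are still rejected
    mixed_join_failed = False
except TypeError:
    mixed_join_failed = True

as_bytes = join_same_type(b'x', 'y')   # bytes in, bytes out
as_text = join_same_type('x', b'y')    # str in, str out
```

The first argument decides the result type, which is one possible convention; a path type with coercions, as suggested for Java below, would be another.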


At least in Java, you would either implement a path type with 
coercions from bytes and strings, or you'd have a class with 
overloaded methods for handling join operations on bytes and strings, 
respectively, thereby avoiding this whole mess.


(Alas, this little example on the 'in' operator also shows that my 
bstr effort would probably fail anyway, because there's no 
'__rcontains__' (__lcontains__?) to allow it to override the str 
type's __contains__.)




Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Stephen J. Turnbull
Steven D'Aprano writes:

  Frankly, I believe that pushing the meme that Python 3 is different is 
  a strategic mistake.

I agree that it's strategically undesirable.  Unfortunately, the
genuine backward incompatibility, as well as the huge mind-share
already garnered by what I consider wrong-headed advice from certain
quarters, have made pushing the meme that "Python 3 is very nearly the
same" untenable.  It's hard to beat something like "it's not yet time
to use Python 3" with a nuanced explanation.

  had my experience would have been different. It's bad enough to have to 
  tell people Python 3 is currently lacking some critical libraries, 
  particularly third-party libraries without also telling them (wrongly 
  IMO) oh, and it's a new language too.

That's why I propose the C to C++ analogy.  True, C++ does introduce a
lot of new features, but most programmers migrating from C to C++
don't learn to use them properly for years, if ever, I'm told.

Note also that I don't propose this as PSF advertising.  I proposed it
as a response to Mark's question, "what should I tell my readers?"



Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Antoine Pitrou
On Sun, 20 Jun 2010 18:14:02 +0900
Stephen J. Turnbull step...@xemacs.org wrote:
 
   had my experience would have been different. It's bad enough to have to 
   tell people Python 3 is currently lacking some critical libraries, 
   particularly third-party libraries without also telling them (wrongly 
   IMO) oh, and it's a new language too.
 
 That's why I propose the C to C++ analogy.

I think it's an unfortunate analogy. C++ needs new libraries (with
brand new APIs) to take advantage of its abstraction capabilities.
Python 3 has almost the same abstraction capabilities as Python 2, you
don't need to write new libraries: just port the existing ones.

 True, C++ does introduce a
 lot of new features, but most programmers migrating from C to C++
 don't learn to use them properly for years, if ever, I'm told.

I don't see how Python 3 has that problem. You can be productive here
and now in Python 3, re-using your knowledge of Python 2 with a bit of
added information.

Regards

Antoine.




Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Nick Coghlan
On Sun, Jun 20, 2010 at 7:32 PM, Antoine Pitrou solip...@pitrou.net wrote:
 True, C++ does introduce a
 lot of new features, but most programmers migrating from C to C++
 don't learn to use them properly for years, if ever, I'm told.

 I don't see how Python 3 has that problem. You can be productive here
 and now in Python 3, re-using your knowledge of Python 2 with a bit of
 added information.

Yeah, the significant issues with Python 3 over Python 2 *only* apply
to people with legacy Python 2 code to worry about. The one thing that
makes Python 3 potentially less desirable than Python 2 for some new
applications is that the third party library support isn't quite as
good yet. As more of the big libraries and frameworks provide Python
3 compatible versions, that factor will go away.

As far as I can tell, with 3 years still to go on my own original
prediction of 5+ years for Python 3 to start to be competitive with
Python 2 for programming mindshare, adoption actually seems to be
progressing fairly well. A lot of key functionality is either already
supported in Python 3 or will be soon, and a lot of the rest is at
least talking about plans for Python 3 compatibility. It's just that 5
years can seem like an eternity in the internet age, so sometimes
people see the relative lack of adoption of Python 3 at this stage and
start to panic about it being a failure.

Now, if we're still having this conversation in 2013, then I'll admit
we have a problem with the Python 3 uptake rate ;)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] email package status in 3.X

2010-06-20 Thread Giampaolo Rodolà
2010/6/20 Steven D'Aprano st...@pearwood.info:
 Python 2.x introduced Unicode strings. Python 3.x merely makes them the
 default.

Merely? To me this looks like the main reason why a lot of projects
haven't been ported to Python 3 yet.
I attempted to port pyftpdlib to Python 3 several times and the
biggest show stopper has always been the bytes / string difference
introduced by Python 3, which forces you to *know* and *use* Unicode
every time you deal with some text, and 2to3 is completely useless
here.
I can only imagine how difficult it can be to do such a conversion in
a project like Twisted or Django where I/O plays a fundamental
role.

The choice of forcing the user to use Unicode and think in Unicode
was a very brave one, and I'm sure it's for the better, but not
everyone wants to deal with that because Unicode is hard to swallow.
The majority of people prefer to stay with bytes and eventually learn
and introduce Unicode only when that is actually needed.


--- Giampaolo
http://code.google.com/p/pyftpdlib
http://code.google.com/p/psutil


Re: [Python-Dev] email package status in 3.X

2010-06-19 Thread Tres Seaver
Michael Foord wrote:

 I didn't make myself clear. The expected disappointment I was referring
 to was about the rate of adoption, not about the quality of the product.

 I'm still baffled as to how a bug in the cgi module (along with the
 acknowledged email problems) is such a big deal. Was it reported and
 then languished in the bug tracker? That would be bad on its own but if
 it was only recently discovered that indicates that it probably isn't
 such a big deal - either way it needs fixing, but using Python for
 writing cgis hasn't been a big use case for a long time.

FWIW:  some APIs in the cgi module are actually used by a number of
Python2 web frameworks and libraries:  Paste, for instance, uses it, and
is in turn used by BFG, Pylons, TurboGears.  Zope has used it that way
since forever.


Tres.
- --
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Design   http://palladion.com



Re: [Python-Dev] email package status in 3.X

2010-06-19 Thread Stephen J. Turnbull
l...@rmi.net writes:

  I agree that 3.X isn't all bad, and I very much hope it succeeds.  And 
  no, I have no answers; I'm just reporting the perception from downwind.

The fact is, though, that many of your downwind readers are not the
audience for Python 3, not yet.  If you want to do Python 3 a favor,
make sure that they understand that Python 3 is *not* an upgrade of
Python 2.  It's a hard task for you, but IMO one strategy is to write
in the style that we wrote the DVCS PEP (#374) in: here's how you do
the same task in these similar languages.  And just as git and Bazaar
turned out to have fatal defects in terms of adoption *in that time
frame*, Python 3 is not yet adoptable for many, many users.

Python 3 is a Python-2-like language, but even though it's built on
the same design principles, and uses nearly identical syntax, there
are fundamental differences.  And it is *very* young.  So it's a new
language and should be approached in the same way as any new language.
Try it on non-mission critical projects, on projects where its library
support has a good reputation, etc.  Many of your readers have no time
(or perhaps no approval from upstairs) for that kind of thing.  Too
bad, but that's what happens to every great new language.

  So here it is: The prevailing view is that 3.X developers hoisted things
  on users that they did not fully work through themselves.  Unicode is 
  prime among these: for all the talk here about how 2.X was broken in 
  this regard, the implications of the 3.X string solution remain to be
  fully resolved in the 3.X standard library to this day.  What is a 
  common Python user to make of that?

Why should she make anything of that?  Python 3 is a *new* language,
possibly as different from Python 2 as C++ was from C (and *more*
different in terms of fundamental incompatibilities).  And as long as
C++ was almost entirely dependent on C libraries, there were problems.
(Not to mention that even today there are plenty of programmers who
are proud to be C programmers, not C++ programmers.)  Today, Python 3
is entirely dependent on Python 2 libraries.  It's human to hope there
will be no problems, but not realistic.

BTW, I think what you're missing is that you're wrong about the money.
Python 3 is still about the fun and the code.  Fun and code are why
the core developers spent about five years developing it, because
doing that was fun, because the new code has high value as code, and
because it promised *them* a more fun and more productive future.

Library support, on the other hand, *is* about money.  Your readers,
down in the trenches of WWW, intraweb, and sysadmin implementation and
support, depend on robust libraries to get their day jobs done.  They
really don't care that writing Python 3 was fun, and that programming
in Python 3 is more fun than ever.  That doesn't compensate for even
one lingering str/bytes bogosity to most of them, and since they don't
get paid for fixing Python library bugs, they don't, and they're in no
mood to *forgive* any, either.

So tell users who feel that way to use Python 2, for now, and check on
Python 3 progress every 6 months or so.  And users who are just a bit
more adventurous to stick to applications where the libraries already
have a good reputation *in Python 3*.  It's as simple as that, I think.

Regards,



Re: [Python-Dev] email package status in 3.X

2010-06-19 Thread Tres Seaver
Jesse Noller wrote:
 On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby p...@telecommunity.com wrote:
 At 05:22 PM 6/18/2010 +, l...@rmi.net wrote:
 So here it is: The prevailing view is that 3.X developers hoisted things
 on users that they did not fully work through themselves.  Unicode is
 prime among these: for all the talk here about how 2.X was broken in
 this regard, the implications of the 3.X string solution remain to be
 fully resolved in the 3.X standard library to this day.  What is a
 common Python user to make of that?
 Certainly, this was my impression as well, after all the Web-SIG discussions
 regarding the state of the stdlib in 3.x with respect to URL parsing,
 joining, opening, etc.
 
 Nothing is set in stone; if something is incredibly painful, or worse
 yet broken, then someone needs to file a bug, bring it to this list,
 or bring up a patch.

Or walk away.

 This is code we're talking about - nothing is set
 in stone, and if something is criminally broken it needs to be first
 identified, and then fixed.
 
 To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that
 actually addresses these kinds of stdlib usage issues, so that I don't have
 to think about it or futz around with experimenting, possibly to find that
 some things can't be done at all.
 
 I guess tutorial welcome, rather than patch welcome then ;)

The only folks who can write the tutorial are the ones who have already
drunk the koolaid.  Note that I've been making my living with Python for
about twelve years now, and would *like* to use Python3, but can't, yet,
and therefore haven't taken the first sip.

 IOW, 3.x has broken TOOOWTDI for me in some areas.  There may be obvious
 ways to do it, but, as per the Zen of Python, that way may not be obvious
 at first unless you're Dutch.  ;-)
 
 What areas? We need specifics which can either be:
 
 1 Shot down.
 2 Turned into bugs, so they can be fixed
 3 Documented in the core documentation.

That's bloody ironic in a thread which had pointed at reasons why people
are not even considering Py3 for their projects:  those folks won't even
find the issues due to the lack of confidence in the suitability of the
platform.


Tres.
- --
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Design   http://palladion.com



Re: [Python-Dev] email package status in 3.X

2010-06-19 Thread Jesse Noller



On Jun 19, 2010, at 10:13 AM, Tres Seaver tsea...@palladion.com wrote:



Jesse Noller wrote:
On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby p...@telecommunity.com wrote:

At 05:22 PM 6/18/2010 +, l...@rmi.net wrote:
So here it is: The prevailing view is that 3.X developers hoisted things
on users that they did not fully work through themselves.  Unicode is
prime among these: for all the talk here about how 2.X was broken in
this regard, the implications of the 3.X string solution remain to be
fully resolved in the 3.X standard library to this day.  What is a
common Python user to make of that?

Certainly, this was my impression as well, after all the Web-SIG
discussions regarding the state of the stdlib in 3.x with respect to
URL parsing, joining, opening, etc.


Nothing is set in stone; if something is incredibly painful, or worse
yet broken, then someone needs to file a bug, bring it to this list,
or bring up a patch.


Or walk away.



Ok. If you want.


This is code we're talking about - nothing is set
in stone, and if something is criminally broken it needs to be first
identified, and then fixed.

To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x
that actually addresses these kinds of stdlib usage issues, so that I
don't have to think about it or futz around with experimenting, possibly
to find that some things can't be done at all.


I guess tutorial welcome, rather than patch welcome then ;)


The only folks who can write the tutorial are the ones who have already
drunk the koolaid.  Note that I've been making my living with Python for
about twelve years now, and would *like* to use Python3, but can't, yet,
and therefore haven't taken the first sip.


Why can't you? Is it a bug? Let's file it and fix it. Is it that you  
need a dependency ported? Cool - let's bring it up to the maintainers,  
or this list, or ask the PSF to push resources into helping port.  
Anything but nothing.


If what you're saying is that python 3 is a completely unsuitable  
platform, well, then yeah - we can all fix it or walk away.




IOW, 3.x has broken TOOOWTDI for me in some areas.  There may be obvious
ways to do it, but, as per the Zen of Python, that way may not be obvious
at first unless you're Dutch.  ;-)


What areas. We need specifics which can either be:

1 Shot down.
2 Turned into bugs, so they can be fixed
3 Documented in the core documentation.


That's bloody ironic in a thread which had pointed at reasons why people
are not even considering Py3 for their projects:  those folks won't even
find the issues due to the lack of confidence in the suitability of the
platform.


What I saw was a thread about some issues in email, and cgi. We have  
some work being done to address the issue. This will help resolve some  
of the issues.


If there are other issues, then we should step up and either help, or
get out of the way. Arguing about the viability of a platform we knew
would take a bit for adoption is silly and breeds ill will.


It's not a turd, and it's not hopeless, in fact rumor has it NumPy  
will be ported soon which is a major stepping stone.


 The only way to counteract this meme that python 3 is horribly  
broken is to prove that it's not, fix bugs, and move on. There's no  
point debating relative turdiness here.


Jesse


Re: [Python-Dev] email package status in 3.X

2010-06-19 Thread Jesse Noller
On Sat, Jun 19, 2010 at 10:59 AM, Jesse Noller jnol...@gmail.com wrote:


 On Jun 19, 2010, at 10:13 AM, Tres Seaver tsea...@palladion.com wrote:


 Jesse Noller wrote:

 On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby p...@telecommunity.com wrote:

 At 05:22 PM 6/18/2010 +, l...@rmi.net wrote:

 So here it is: The prevailing view is that 3.X developers hoisted things
 on users that they did not fully work through themselves.  Unicode is
 prime among these: for all the talk here about how 2.X was broken in
 this regard, the implications of the 3.X string solution remain to be
 fully resolved in the 3.X standard library to this day.  What is a
 common Python user to make of that?

 Certainly, this was my impression as well, after all the Web-SIG
 discussions regarding the state of the stdlib in 3.x with respect to
 URL parsing, joining, opening, etc.

 Nothing is set in stone; if something is incredibly painful, or worse
 yet broken, then someone needs to file a bug, bring it to this list,
 or bring up a patch.

 Or walk away.


 Ok. If you want.

 This is code we're talking about - nothing is set
 in stone, and if something is criminally broken it needs to be first
 identified, and then fixed.

 To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x
 that actually addresses these kinds of stdlib usage issues, so that I
 don't have to think about it or futz around with experimenting, possibly
 to find that some things can't be done at all.

 I guess tutorial welcome, rather than patch welcome then ;)

 The only folks who can write the tutorial are the ones who have already
 drunk the koolaid.  Note that I've been making my living with Python for
 about twelve years now, and would *like* to use Python3, but can't, yet,
 and therefore haven't taken the first sip.

 Why can't you? Is it a bug? Let's file it and fix it. Is it that you need a
 dependency ported? Cool - let's bring it up to the maintainers, or this
 list, or ask the PSF to push resources into helping port. Anything but
 nothing.

 If what you're saying is that python 3 is a completely unsuitable platform,
 well, then yeah - we can all fix it or walk away.


 IOW, 3.x has broken TOOOWTDI for me in some areas.  There may be obvious
 ways to do it, but, as per the Zen of Python, that way may not be obvious
 at first unless you're Dutch.  ;-)

 What areas. We need specifics which can either be:

 1 Shot down.
 2 Turned into bugs, so they can be fixed
 3 Documented in the core documentation.

 That's bloody ironic in a thread which had pointed at reasons why people
 are not even considering Py3 for their projects:  those folks won't even
 find the issues due to the lack of confidence in the suitability of the
 platform.

 What I saw was a thread about some issues in email, and cgi. We have some
 work being done to address the issue. This will help resolve some of the
 issues.

 I'd there are other issues, then we should step up and either help, or get
 out of the way. Arguing about the viability of a platform we knew would take
 a bit for adoption is silly and breeds ill will.


s/I'd/If - stupid phone.


Re: [Python-Dev] email package status in 3.X

2010-06-19 Thread P.J. Eby

At 10:55 PM 6/19/2010 +0900, Stephen J. Turnbull wrote:
They really don't care that writing Python 3 was fun, and that 
programming in Python 3 is more fun than ever.  That doesn't 
compensate for even one lingering str/bytes bogosity to most of 
them, and since they don't get paid for fixing Python library bugs, 
they don't, and they're in no mood to *forgive* any, either.


This is pretty much where I'm at, except that the only potential fun 
increase Py3 appears to offer me are argument annotations and 
keyword-only args -- but these are partly balanced by the loss of 
argument tuple unpacking.  The metaclass keyword argument is nice, 
but the loss of dynamically-settable __metaclass__ is just plain annoying.
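
For readers who have not met the features named above, here is a minimal Python 3 sketch of each. The function and class names (scale, norm, Registry, Plugin) are hypothetical and chosen only for illustration; they do not come from the thread.

```python
# Keyword-only argument plus annotations (both new in Python 3).
def scale(value: float, *, factor: float = 2.0) -> float:
    return value * factor


# Python 2 allowed "def norm((x, y)): ..."; in Python 3 the tuple
# has to be unpacked explicitly inside the function body.
def norm(point):
    x, y = point
    return (x * x + y * y) ** 0.5


# The metaclass is now passed as a class keyword argument; assigning
# __metaclass__ in the class body no longer selects a metaclass.
class Registry(type):
    classes = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.classes.append(name)
        return cls


class Plugin(metaclass=Registry):
    pass
```

Calling scale(3.0, 3.0) raises TypeError because factor can only be passed by keyword, which is the trade the paragraph describes: stricter call signatures in exchange for the older, looser idioms.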


Really, just about everything that Py3 offers in the way of added 
fun, seems offset by a matching loss somewhere else.  So it's hard to 
get excited about it - it seems like, ho hum, a new language that's 
kind of like Python, but just different enough to be annoying.


OTOH, I don't know what to do about that, besides adding some sort of 
killer app feature that makes Python 3 the One Obvious Way to do 
some specific application domain.




Re: [Python-Dev] email package status in 3.X

2010-06-19 Thread Raymond Hettinger

On Jun 18, 2010, at 7:39 PM, Terry Reedy wrote:

 On 6/18/2010 6:51 PM, Raymond Hettinger wrote:
 There has been a disappointing
 lack of bug reports across the board for 3.x.
 
 Here is one from this week involving the interaction of array and bytearray. 
 It needs a comment from someone who can understand the C-API based patch, 
 which is beyond me.
 http://bugs.python.org/issue8990

I'll take a look at this one.


Raymond


P.S.  For those who are interested, here is the story on BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/3.1-problems.html


Re: [Python-Dev] email package status in 3.X

2010-06-19 Thread Steven D'Aprano
On Sun, 20 Jun 2010 12:13:34 am Tres Seaver wrote:

  I guess tutorial welcome, rather than patch welcome then ;)

 The only folks who can write the tutorial are the ones who have
 already drunk the koolaid.  Note that I've been making my living with
 Python for about twelve years now, and would *like* to use Python3,
 but can't, yet, and therefore haven't taken the first sip.

You emphatically say you would like to use Python3, but describe those 
who already have as having drunk the Koolaid. Comparing those who can 
and have successfully moved to Python3 with the Jonestown cult 
mass-suicide doesn't really strike me as a sign that you want to join 
them.



-- 
Steven D'Aprano


Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Stephen J. Turnbull
l...@rmi.net writes:

  FWIW, after rewriting Programming Python for 3.1, 3.x still feels
  a lot like a beta to me, almost 2 years after its release.

Email, of course, is a big wart.  But guess what?  Python 2's email
module doesn't actually work!  Sure, the program runs most of the
time, but every program that depends on email must acquire inches of
armorplate against all the things that can go wrong.  You simply can't
rely on it to DTRT except in a pre-MIME, pre-HTML, ASCII-only world.
Although they're often addressing general problems, these hacks are
*not* integrated back into the email module in most cases, but remain
app-specific voodoo.

If you live in Kansas, sure, you can concentrate on dodging tornados
and completely forget about Unicode and MIME and text/bogus content.
For the rest of the world, though, the problem is not Python 3.  It's
STD 11 (which still points at RFC 822, dated 1982!)  It's really
inappropriate to point at the email module, whose developers are
trying *not* to punt on conformance and robustness, when even the IETF
can only run in circles, scream and shout!

Maybe there are other problems with Python 3 that deserve to be
pointed at, but given the general scarcity of resources I think the
email module developers are working on the right things.  Unlike many
other modules, email really needs to be rewritten from the ground
(Python 3) up, because of the centrality of bytes/unicode confusion to
all email problems.  Python 3 completely changes the assumptions
there; a Python 2-style email module really can't work properly.
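
As a rough illustration of the changed assumptions, the sketch below shows the str/bytes separation in Python 3. The sample text is made up, and this is only the general language behaviour, not code from the email package.

```python
# Data read from a socket or a file opened in binary mode is bytes.
wire = "café au lait".encode("utf-8")

# Python 3 refuses to mix bytes and str implicitly.
try:
    wire + " (fwd)"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The decoding step has to be explicit, with a named encoding.
text = wire.decode("utf-8") + " (fwd)"
```

In Python 2 the same concatenation would have been attempted with an implicit ASCII conversion, which is where much of the confusion described above came from.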

Then on top of that, today we know a lot more about handling issues
like text/html content and MIME in general than when the Python 2
email module was designed.  New problems have arisen over the period
of Python 3 development, like domain keys, which email doesn't
handle out of the box AFAIK, but email for Python 3 should IMHO.

Should Python 3 have been held back until email was fixed?  Dunno, but
I personally am very glad it was not; where I have a choice, I always
use Python 3 now, and have yet to run into a problem.  I expect that
to change if I can find the time to get involved in email and Mailman
3 development, of course. <wink>



Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread lutz
Replying en masse to save bandwidth here...

Barry Warsaw ba...@python.org writes:
 We know it, we have extensively discussed how to fix it, we have IMO a good
 design, and we even have someone willing and able to tackle the problem.  We
 need to find a sufficient source of funding to enable him to do the work it
 will take, and so far that's been the biggest stumbling block.  It will take a
 focused and determined effort to see this through, and it's obvious that
 volunteers cannot make it happen.  I include myself in the latter category, as
 I've tried and failed at least twice to do it in my spare time.

All understood, and again, not to disparage anyone here.  My 
comments are directed to the development community at large
to underscore the grave p/r problems 3.X faces.

I realize email parsing is a known issue; I also realize that
most people evaluating 3.X today won't care that it is.  Most
will care only that the new version of a language reportedly 
used by Google and YouTube still doesn't support CGI uploads 
a year and a half after its release.  As an author, that's a 
downright horrible story to have to tell the world.


Stephen J. Turnbull step...@xemacs.org writes:
 Email, of course, is a big wart.  But guess what?  Python 2's email
 module doesn't actually work! 

Yes it does (see next point).

 If you live in Kansas, sure, you can concentrate on dodging tornados
 and completely forget about Unicode and MIME and text/bogus content.
 For the rest of the world, though, the problem is not Python 3

Yes it is, and Kansas is a lot bigger than you seem to think.

I want to reiterate that I was able to build a feature rich
email client with the email package as it exists in 3.1.  This
includes support on both the receiving and sending sides for HTML,
arbitrary attachments, and decoding and encoding of both text 
payloads and headers according to email, MIME, and Unicode/I18N
standards.  It's an amazingly useful package, and does work as is
in 3.X.  The two main issues I found have been recently fixed.  
It's unfortunate that this package is also the culprit behind CGI
breakage, but it's not clear why it became a critical path for so
much utility in the first place.
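
To make the claim above concrete, here is a minimal sketch of the kind of round trip described: an HTML part, an attachment, and an encoded non-ASCII header, built and then parsed back. It assumes the email.mime and email.header interfaces as they exist in current Python 3; behaviour in 3.1 may differ in detail, and all addresses and contents are invented.

```python
from email import message_from_string
from email.header import Header, decode_header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a message with a non-ASCII subject, an HTML body, and a
# text attachment (addresses and contents are made up).
msg = MIMEMultipart()
msg["From"] = "sender@example.com"
msg["To"] = "reader@example.com"
msg["Subject"] = Header("Grüße", "utf-8")
msg.attach(MIMEText("<p>Hello</p>", "html", "utf-8"))

attachment = MIMEText("notes", "plain", "utf-8")
attachment.add_header("Content-Disposition", "attachment", filename="notes.txt")
msg.attach(attachment)

# Serialize to text, parse it back, then decode header and payloads.
parsed = message_from_string(msg.as_string())
raw, charset = decode_header(parsed["Subject"])[0]
subject = raw.decode(charset)

parts = [part for part in parsed.walk() if not part.is_multipart()]
html = parts[0].get_payload(decode=True).decode("utf-8")
filename = parts[1].get_filename()
```

Note that the decoding of both the header and the payload is explicit: the caller goes from the encoded wire form to bytes and then to str with a named charset.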

The package might not be aesthetically ideal, but to me it 
seems that an utterly incompatible overhaul of this in the name
of supporting potentially very different data streams is a huge
functional overload.  And to those people in Kansas who live 
outside the pydev clique, replacing it with something different 
at this point will look as if an incompatible Python is already 
incompatible with releases in its own line.  Why in the world 
would anyone base a new project on that sort of thrashing?

For my part, I've had to add far too many notes to the upcoming
edition of Programming Python about major pieces of functionality
that worked in 2.X but no longer do in 3.X.  That's disappointing
to me personally, but it will probably seem a lot worse to the
book's tens of thousands of readers.  Yet this is the reality 
that 3.X has created for itself.

 Should Python 3 have been held back until email was fixed?  Dunno, but
 I personally am very glad it was not; where I have a choice, I always
 use Python 3 now, and have yet to run into a problem. 

I guess we'll just have to disagree on that.  IMHO, Python 3 shot
itself in the foot by releasing in half-baked form.  And the 3.0 
I/O speed issue (remember that?) came very close to blowing its 
leg clean off.

The reality out there in Kansas today is that 3.X is perceived as 
so bad that it could very well go the way of POP4 if its story does
not improve.  I don't know what sort of Python world will be left
behind in the wake, but I do know it will probably be much smaller.


Steve Holden st...@holdenweb.com writes:
 Lest the readership think that the PSF is unaware of this issue, allow
 me to point out that we have already partially funded this effort, and
 are still offering R. David Murray some further matching funds if he can
 raise sponsorship to complete the effort (on which he has made a very
 promising start).
 
 We are also attempting to enable tax-deductible fund raising to increase
 the likelihood of David's finding support. Perhaps we need to think
 about a broader campaign to increase the quality of the python 3
 libraries. I find it very annoying that the #python IRC group still has
 "Don't use Python 3" in its topic.  They adamantly refuse to remove it
 until there is better library support, and they are the guys who see the
 issues day in day out so it is hard to argue with them (and I don't
 think an autocratic decision-making process would be appropriate).

I'm all for people getting paid for work they do, but with all
due respect, I think this underscores part of the problem in 
the Python world today.  If funding had been as stringent a 
prerequisite in the 90s, I doubt there would be a Python today.
It was about the fun and the code, not the bucks and the 
bureaucracy.  As 

Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Michael Foord

On 18/06/2010 16:09, l...@rmi.net wrote:

Replying en masse to save bandwidth here...

Barry Warsaw ba...@python.org writes:
   

We know it, we have extensively discussed how to fix it, we have IMO a good
design, and we even have someone willing and able to tackle the problem.  We
need to find a sufficient source of funding to enable him to do the work it
will take, and so far that's been the biggest stumbling block.  It will take a
focused and determined effort to see this through, and it's obvious that
volunteers cannot make it happen.  I include myself in the latter category, as
I've tried and failed at least twice to do it in my spare time.
 

All understood, and again, not to disparage anyone here.  My
comments are directed to the development community at large
to underscore the grave p/r problems 3.X faces.

I realize email parsing is a known issue; I also realize that
most people evaluating 3.X today won't care that it is.  Most
will care only that the new version of a language reportedly
used by Google and YouTube still doesn't support CGI uploads
a year and a half after its release.  As an author, that's a
downright horrible story to have to tell the world.

   


Really? How widely used is the CGI module these days? Maybe there is a 
reason nobody appeared to notice...




[snip...]

Should Python 3 have been held back until email was fixed?  Dunno, but
I personally am very glad it was not; where I have a choice, I always
use Python 3 now, and have yet to run into a problem.
 

I guess we'll just have to disagree on that.  IMHO, Python 3 shot
itself in the foot by releasing in half-baked form.  And the 3.0
I/O speed issue (remember that?) came very close to blowing its
leg clean off.

   


Whilst I agree that there are plenty of issues to work on, and I don't 
underestimate the difficulty of some of them, I think half-baked is 
very much overblown. Whilst you have a lot to say about how much of a 
problem this is I don't understand what you are suggesting be *done*?


Python 3.0 was *declared* to be an experimental release, and by most 
standards 3.1 (in terms of the core language and functionality) was a 
solid release.


Any reasonable expectation about Python 3 adoption predicted that it 
would take years, and would include going through a phase of difficulty 
and disappointment...


All the best,

Michael Foord

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of 
your employer, to release me from all obligations and waivers arising from any 
and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, 
clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and 
acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your 
employer, its partners, licensors, agents and assigns, in perpetuity, without 
prejudice to my ongoing rights and privileges. You further represent that you 
have the authority to release me from any BOGUS AGREEMENTS on behalf of your 
employer.




Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread lutz
 Python 3.0 was *declared* to be an experimental release, and by most 
 standards 3.1 (in terms of the core language and functionality) was a 
 solid release.
 
 Any reasonable expectation about Python 3 adoption predicted that it 
 would take years, and would include going through a phase of difficulty 
 and disappointment...

Declaring something to be a turd doesn't change the fact that
it's a turd.  I have a feeling that most people outside this
list would have much rather avoided the difficulty and 
disappointment altogether.

Let's be honest here; 3.X was released to the community in part 
as an extended beta.  That's not a problem, unless you drop the 
word beta.  And if you're still not buying that, imagine the sort
of response you'd get if you tried to sell software that billed 
itself as experimental, and promised a phase of disappointment.  
Why would you expect the Python world to react any differently?

 Whilst I agree that there are plenty of issues to work on, and I don't 
 underestimate the difficulty of some of them, I think half-baked is 
 very much overblown. Whilst you have a lot to say about how much of a 
 problem this is I don't understand what you are suggesting be *done*?

I agree that 3.X isn't all bad, and I very much hope it succeeds.  And 
no, I have no answers; I'm just reporting the perception from downwind.

So here it is: The prevailing view is that 3.X developers hoisted things
on users that they did not fully work through themselves.  Unicode is 
prime among these: for all the talk here about how 2.X was broken in 
this regard, the implications of the 3.X string solution remain to be
fully resolved in the 3.X standard library to this day.  What is a 
common Python user to make of that?

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)





Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Michael Foord

On 18/06/2010 18:22, l...@rmi.net wrote:

Python 3.0 was *declared* to be an experimental release, and by most
standards 3.1 (in terms of the core language and functionality) was a
solid release.

Any reasonable expectation about Python 3 adoption predicted that it
would take years, and would include going through a phase of difficulty
and disappointment...
 

Declaring something to be a turd doesn't change the fact that
it's a turd.


Right - but *you're* the one calling it a turd, which is not a helpful 
approach or likely to achieve *anything* useful. I still have no idea 
what you are actually suggesting.



I have a feeling that most people outside this
list would have much rather avoided the difficulty and
disappointment altogether.

Let's be honest here; 3.X was released to the community in part
as an extended beta.


Correction - 3.0 was an experimental release. That is not true of 3.1 
and future releases.


All the best,

Michael

That's not a problem, unless you drop the
word beta.  And if you're still not buying that, imagine the sort
of response you'd get if you tried to sell software that billed
itself as experimental, and promised a phase of disappointment.
Why would you expect the Python world to react any differently?

   

Whilst I agree that there are plenty of issues to work on, and I don't
underestimate the difficulty of some of them, I think half-baked is
very much overblown. Whilst you have a lot to say about how much of a
problem this is I don't understand what you are suggesting be *done*?
 

I agree that 3.X isn't all bad, and I very much hope it succeeds.  And
no, I have no answers; I'm just reporting the perception from downwind.

So here it is: The prevailing view is that 3.X developers hoisted things
on users that they did not fully work through themselves.  Unicode is
prime among these: for all the talk here about how 2.X was broken in
this regard, the implications of the 3.X string solution remain to be
fully resolved in the 3.X standard library to this day.  What is a
common Python user to make of that?

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)


   



--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog





Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Bill Janssen
Giampaolo Rodolà g.rod...@gmail.com wrote:

 2010/6/17 Bill Janssen jans...@parc.com:
 
  There's a related meta-issue having to do with antique protocols.
 
 Can I know what meta-issue are you talking about exactly?

Giampaolo, I believe that you and I have already discussed this on one
of the FTP issues.

Bill



Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Giampaolo Rodolà
2010/6/18 Bill Janssen jans...@parc.com:
 Giampaolo Rodolà g.rod...@gmail.com wrote:

 2010/6/17 Bill Janssen jans...@parc.com:

  There's a related meta-issue having to do with antique protocols.

 Can I know what meta-issue are you talking about exactly?

 Giampaolo, I believe that you and I have already discussed this on one
 of the FTP issues.

 Bill

I only remember a discussion in which I was against removing OOB data
support from asyncore in order to support certain parts of the FTP
protocol using it, but that's all.
I don't see how urllib or any other stdlib module is supposed to be
penalized by the FTP protocol in any way.

--- Giampaolo
http://code.google.com/p/pyftpdlib
http://code.google.com/p/psutil


Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread lutz
I wasn't calling Python 3 a turd.  I was trying to show
the strangeness of the logic behind your rationalization.
And failing badly... (maybe I should have used tar ball?)

What I'm suggesting is that extreme caution be exercised from
this point forward with all things 3.X-related.  Whether you 
wish to accept this or not, 3.X has a negative image to many.
This suggestion specifically includes not abandoning current 
3.X email package users as a case in point.  Ripping the rug 
out from new 3.X users after they took the time to port seems
like it may be just enough to tip the scales altogether.

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)


 -Original Message-
 From: Michael Foord fuzzy...@voidspace.org.uk
 To: l...@rmi.net
 Subject: Re: [Python-Dev] email package status in 3.X
 Date: Fri, 18 Jun 2010 18:27:46 +0100
 
 On 18/06/2010 18:22, l...@rmi.net wrote:
  Python 3.0 was *declared* to be an experimental release, and by most
  standards 3.1 (in terms of the core language and functionality) was a
  solid release.
 
  Any reasonable expectation about Python 3 adoption predicted that it
  would take years, and would include going through a phase of difficulty
  and disappointment...
   
  Declaring something to be a turd doesn't change the fact that
  it's a turd.
 
 Right - but *you're* the one calling it a turd, which is not a helpful 
 approach or likely to achieve *anything* useful. I still have no idea 
 what you are actually suggesting.
 
  I have a feeling that most people outside this
  list would have much rather avoided the difficulty and
  disappointment altogether.
 
  Let's be honest here; 3.X was released to the community in part
  as an extended beta.
 
 Correction - 3.0 was an experimental release. That is not true of 3.1 
 and future releases.
 
 All the best,
 
 Michael
  That's not a problem, unless you drop the
  word beta.  And if you're still not buying that, imagine the sort
  of response you'd get if you tried to sell software that billed
  itself as experimental, and promised a phase of disappointment.
  Why would you expect the Python world to react any differently?
 
 
  Whilst I agree that there are plenty of issues to work on, and I don't
  underestimate the difficulty of some of them, I think half-baked is
  very much overblown. Whilst you have a lot to say about how much of a
  problem this is I don't understand what you are suggesting be *done*?
   
  I agree that 3.X isn't all bad, and I very much hope it succeeds.  And
  no, I have no answers; I'm just reporting the perception from downwind.
 
  So here it is: The prevailing view is that 3.X developers hoisted things
  on users that they did not fully work through themselves.  Unicode is
  prime among these: for all the talk here about how 2.X was broken in
  this regard, the implications of the 3.X string solution remain to be
  fully resolved in the 3.X standard library to this day.  What is a
  common Python user to make of that?
 
  --Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)
 
 
 
 
 
 -- 
 http://www.ironpythoninaction.com/
 http://www.voidspace.org.uk/blog
 
 
 
 


Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread P.J. Eby

At 05:22 PM 6/18/2010 +, l...@rmi.net wrote:

So here it is: The prevailing view is that 3.X developers foisted things
on users that they did not fully work through themselves.  Unicode is
prime among these: for all the talk here about how 2.X was broken in
this regard, the implications of the 3.X string solution remain to be
fully resolved in the 3.X standard library to this day.  What is a
common Python user to make of that?


Certainly, this was my impression as well, after all the Web-SIG 
discussions regarding the state of the stdlib in 3.x with respect to 
URL parsing, joining, opening, etc.


To be honest, I'm waiting to see some sort of tutorial(s) for using 
3.x that actually addresses these kinds of stdlib usage issues, so 
that I don't have to think about it or futz around with 
experimenting, possibly to find that some things can't be done at all.


IOW, 3.x has broken TOOOWTDI for me in some areas.  There may be 
obvious ways to do it, but, as per the Zen of Python, that way may 
not be obvious at first unless you're Dutch.  ;-)

Since at the moment Python 3 offers me only cosmetic improvements 
over 2.x (apart from argument annotations), it's hard to get excited 
enough about it to want to muck about with porting anything to it, or 
even trying to learn about all the ramifications of the changes.  :-(




Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Jesse Noller
On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby p...@telecommunity.com wrote:
 At 05:22 PM 6/18/2010 +, l...@rmi.net wrote:

 So here it is: The prevailing view is that 3.X developers foisted things
 on users that they did not fully work through themselves.  Unicode is
 prime among these: for all the talk here about how 2.X was broken in
 this regard, the implications of the 3.X string solution remain to be
 fully resolved in the 3.X standard library to this day.  What is a
 common Python user to make of that?

 Certainly, this was my impression as well, after all the Web-SIG discussions
 regarding the state of the stdlib in 3.x with respect to URL parsing,
 joining, opening, etc.

Nothing is set in stone; if something is incredibly painful, or worse
yet broken, then someone needs to file a bug, bring it to this list,
or submit a patch. This is code we're talking about: if something is
criminally broken, it needs to be identified first, and then fixed.

 To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that
 actually addresses these kinds of stdlib usage issues, so that I don't have
 to think about it or futz around with experimenting, possibly to find that
 some things can't be done at all.

I guess that's "tutorial welcome" rather than "patch welcome", then ;)

 IOW, 3.x has broken TOOOWTDI for me in some areas.  There may be obvious
 ways to do it, but, as per the Zen of Python, that way may not be obvious
 at first unless you're Dutch.  ;-)

What areas? We need specifics, which can then be:

1. Shot down.
2. Turned into bugs, so they can be fixed.
3. Documented in the core documentation.

jesse


Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Michael Foord

On 18/06/2010 19:52, l...@rmi.net wrote:

I wasn't calling Python 3 a turd.  I was trying to show
the strangeness of the logic behind your rationalization.
And failing badly... (maybe I should have used tar ball?)

   


I didn't make myself clear. The expected disappointment I was referring 
to was about the rate of adoption, not about the quality of the product.


I'm still baffled as to how a bug in the cgi module (along with the 
acknowledged email problems) is such a big deal. Was it reported and 
then languished in the bug tracker? That would be bad on its own but if 
it was only recently discovered that indicates that it probably isn't 
such a big deal - either way it needs fixing, but using Python for 
writing cgis hasn't been a big use case for a long time.


All the best,

Michael


What I'm suggesting is that extreme caution be exercised from
this point forward with all things 3.X-related.  Whether you
wish to accept this or not, 3.X has a negative image to many.
This suggestion specifically includes not abandoning current
3.X email package users as a case in point.  Ripping the rug
out from new 3.X users after they took the time to port seems
like it may be just enough to tip the scales altogether.

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)


   

-Original Message-
From: Michael Foord fuzzy...@voidspace.org.uk
To: l...@rmi.net
Subject: Re: [Python-Dev] email package status in 3.X
Date: Fri, 18 Jun 2010 18:27:46 +0100

On 18/06/2010 18:22, l...@rmi.net wrote:
 

Python 3.0 was *declared* to be an experimental release, and by most
standards 3.1 (in terms of the core language and functionality) was a
solid release.

Any reasonable expectation about Python 3 adoption predicted that it
would take years, and would include going through a phase of difficulty
and disappointment...

 

Declaring something to be a turd doesn't change the fact that
it's a turd.
   

Right - but *you're* the one calling it a turd, which is not a helpful
approach or likely to achieve *anything* useful. I still have no idea
what you are actually suggesting.

 

I have a feeling that most people outside this
list would have much rather avoided the difficulty and
disappointment altogether.

Let's be honest here; 3.X was released to the community in part
as an extended beta.
   

Correction - 3.0 was an experimental release. That is not true of 3.1
and future releases.

All the best,

Michael
 

That's not a problem, unless you drop the
word beta.  And if you're still not buying that, imagine the sort
of response you'd get if you tried to sell software that billed
itself as experimental, and promised a phase of disappointment.
Why would you expect the Python world to react any differently?


   

Whilst I agree that there are plenty of issues to work on, and I don't
underestimate the difficulty of some of them, I think half-baked is
very much overblown. Whilst you have a lot to say about how much of a
problem this is I don't understand what you are suggesting be *done*?

 

I agree that 3.X isn't all bad, and I very much hope it succeeds.  And
no, I have no answers; I'm just reporting the perception from downwind.

So here it is: The prevailing view is that 3.X developers foisted things
on users that they did not fully work through themselves.  Unicode is
prime among these: for all the talk here about how 2.X was broken in
this regard, the implications of the 3.X string solution remain to be
fully resolved in the 3.X standard library to this day.  What is a
common Python user to make of that?

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)



   


--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




 



--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog


Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Neil Hodgson
Michael Foord:

 Python 3.0 was *declared* to be an experimental release, and by most
 standards 3.1 (in terms of the core language and functionality) was a solid
 release.

   That looks to me like an after-the-event rationalization. The
release note for Python 3.0 (and the What's New) gives no indication
that it is experimental but does say:

  "We are confident that Python 3.0 is of the same high quality as our
  previous releases ... you can safely choose either version (or both)
  to use in your projects."

http://mail.python.org/pipermail/python-dev/2008-December/083824.html

   Neil


Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Raymond Hettinger

On Jun 18, 2010, at 3:08 PM, Michael Foord wrote:

 I'm still baffled as to how a bug in the cgi module (along with the 
 acknowledged email problems) is such a big deal. Was it reported and then 
 languished in the bug tracker? That would be bad on its own but if it was 
 only recently discovered that indicates that it probably isn't such a big 
 deal - either way it needs fixing, but using Python for writing cgis hasn't 
 been a big use case for a long time.

That's one possible explanation.  Another possible explanation is the product 
isn't being heavily exercised for serious work and that it has yet to be 
shaken-out thoroughly.   There has been a disappointing lack of bug reports 
across the board for 3.x.  That doesn't mean that the bugs aren't there and 
that they won't be reported when adoption is heavier.

In the cases of email, mime handling, cgi and whatnot, the important point is 
not whether a given technology is popular.  The important part is that it hints 
at the kind of bytes/text issues that people are going to face and that we will 
need to help them address (e.g. blobs containing multiple encodings, a 
need to use byte-oriented tools such as md5 in conjunction with text-oriented 
applications, etc.)
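
To make the md5 case concrete, here is a minimal sketch of mixing a byte-oriented tool with text in 3.x. The helper name md5_of_text and the default encoding are assumptions for illustration only; the one firm point is that hashlib works on bytes, so the caller has to choose an encoding.

```python
import hashlib


def md5_of_text(text, encoding="utf-8"):
    """Hash a str by explicitly encoding it to bytes first.

    hashlib only accepts bytes-like input in Python 3, so the encoding
    becomes part of the contract: different encodings of the same text
    generally produce different digests.
    """
    return hashlib.md5(text.encode(encoding)).hexdigest()


digest = md5_of_text("hello")

# The same text hashed under two encodings does not match.
utf8_digest = md5_of_text("caf\u00e9", "utf-8")
latin1_digest = md5_of_text("caf\u00e9", "latin-1")
```

Passing the str directly, as in hashlib.md5("hello"), raises TypeError in 3.x, which is the kind of surprise a ported 2.x application may hit.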

One other thought:  In addition to not getting many 3.x specific bug reports, 
we don't seem to be getting many 3.x specific help questions (e.g. asking 
about dictviews or how to make a priority queue in an environment where many 
callables don't support ordering operations, etc.). 


 Mark Lutz wrote

 What I'm suggesting is that extreme caution be exercised from
 this point forward with all things 3.X-related.  Whether you
 wish to accept this or not, 3.X has a negative image to many.
 This suggestion specifically includes not abandoning current
 3.X email package users as a case in point.  Ripping the rug
 out from new 3.X users after they took the time to port seems
 like it may be just enough to tip the scales altogether.

A couple other areas that need work (some of them are minor):

* BeautifulSoup was left behind when SGML parsing was removed from the standard 
lib.
* Shelves were crippled for Windows users when bsddb was ripped out.
* Lists containing None for missing values are no longer sortable.
* The basic heapq approach to making a priority queue no longer works well.
   Simply decorating with (priority_level, callable_or_object) fails with two
   tasks at the same priority if the callable or other objects aren't orderable.
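
One way around the heapq problem in the last item can be sketched as follows. This is only an illustration under stated assumptions: the push and pop helpers and the module-level counter are hypothetical names, and the technique is the usual tie-breaker entry, not anything specific to this thread.

```python
import heapq
import itertools

# A monotonically increasing counter acts as a tie-breaker, so tuple
# comparison never falls through to the (unorderable) task object.
_sequence = itertools.count()


def push(heap, priority, task):
    """Add a task; entries are (priority, insertion_order, task)."""
    heapq.heappush(heap, (priority, next(_sequence), task))


def pop(heap):
    """Remove and return (priority, task) for the lowest priority value."""
    priority, _, task = heapq.heappop(heap)
    return priority, task


heap = []
push(heap, 1, lambda: "first")
push(heap, 1, lambda: "second")   # same priority, callables not orderable
push(heap, 0, lambda: "urgent")

results = [pop(heap)[1]() for _ in range(3)]
```

Tasks at equal priority come back in insertion order, and the bare (priority, task) form still raises TypeError when two priorities tie.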


Raymond

P.S.  I do think it would be great if we could direct some attention
to parts of 3.x that are really nice.  Am hoping that this conversation
doesn't drown in negativity.   Instead, it should focus on what 
improvements are needed to win broader adoption.





Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Michael Foord

On 18/06/2010 23:51, Raymond Hettinger wrote:

On Jun 18, 2010, at 3:08 PM, Michael Foord wrote:

   

I'm still baffled as to how a bug in the cgi module (along with the 
acknowledged email problems) is such a big deal. Was it reported and then 
languished in the bug tracker? That would be bad on its own but if it was only 
recently discovered that indicates that it probably isn't such a big deal - 
either way it needs fixing, but using Python for writing cgis hasn't been a big 
use case for a long time.
 

That's one possible explanation.  Another possible explanation is the product 
isn't being heavily exercised for serious work and that it has yet to be 
shaken-out thoroughly.   There has been a disappointing lack of bug reports 
across the board for 3.x.  That doesn't mean that the bugs aren't there and 
that they won't be reported when adoption is heavier.

   


Oh, I quite agree. I don't think it makes py3k a turd either.


In the cases of email, mime handling, cgi and whatnot, the important point is 
not whether a given technology is popular.  The important part is that it hints 
at the kind of bytes/text issues that people are going to face and that we will 
need to help them address (e.g. blobs containing multiple encodings, a 
need to use byte-oriented tools such as md5 in conjunction with text-oriented 
applications, etc.)

One other thought:  In addition to not getting many 3.x specific bug reports, 
we don't seem to be getting many 3.x specific help questions (e.g. asking 
about dictviews or how to make a priority queue in an environment where many 
callables don't support ordering operations, etc.).

   


Most of the questions I've seen about Python 3 are from library authors 
doing porting rather than application developers. This is to be expected 
I guess.



   

Mark Lutz wrote
 
   

What I'm suggesting is that extreme caution be exercised from
this point forward with all things 3.X-related.  Whether you
wish to accept this or not, 3.X has a negative image to many.
This suggestion specifically includes not abandoning current
3.X email package users as a case in point.  Ripping the rug
out from new 3.X users after they took the time to port seems
like it may be just enough to tip the scales altogether.
 

A couple other areas that need work (some of them are minor):

* BeautifulSoup was left behind when SGML parsing was removed from the standard 
lib.
* Shelves were crippled for Windows users when bsddb was ripped out.
* Lists containing None for missing values are no longer sortable.
   


Yeah, this one can be a bugger. :-)
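
For the None case, a common workaround is a sort key that separates missing values from real ones. A minimal sketch, assuming numeric data; none_first is a hypothetical helper name, and placing None first simply mirrors the 2.x ordering.

```python
def none_first(value):
    """Sort key: None sorts before everything else.

    The first tuple element groups missing values ahead of real ones;
    the second is only compared within a group, so None is never
    compared against a number.
    """
    return (value is not None, 0 if value is None else value)


values = [3, None, 1, None, 2]
ordered = sorted(values, key=none_first)
```

Calling sorted(values) without the key raises TypeError in 3.x, because None and int no longer have a defined ordering.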


* The basic heapq approach to making a priority queue no longer works well.
Simply decorating with (priority_level, callable_or_object) fails with two
tasks at the same priority if the callable or other objects aren't orderable.


Raymond

P.S.  I do think it would be great if we could direct some attention
to parts of 3.x that are really nice.  Am hoping that this conversation
doesn't drown in negativity.   Instead, it should focus on what
improvements are needed to win broader adoption.


   


I definitely agree that our focus should be on fixing problems as we 
find them and working on increasing adoption. No argument from me.


All the best,

Michael






--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog





Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Terry Reedy

On 6/18/2010 6:51 PM, Raymond Hettinger wrote:

There has been a disappointing
lack of bug reports across the board for 3.x.


Here is one from this week involving the interaction of array and 
bytearray. It needs a comment from someone who can understand the C-API 
based patch, which is beyond me.

http://bugs.python.org/issue8990

Another possible reason for the lack: 500 of the current 2800 open 
issues have NO comment (ie, message count = 1), some with patches.
I just posted '500 tracker orphans; we need more reviewers' on 
python-list to encourage more participation.


Terry Jan Reedy



Re: [Python-Dev] email package status in 3.X

2010-06-17 Thread Barry Warsaw
On Jun 16, 2010, at 08:48 PM, l...@rmi.net wrote:

Well, it looks like I've stumbled onto the other shoe on this
issue--that the email package's problems are also apparently 
behind the fact that CGI binary file uploads don't work in 3.1
(http://bugs.python.org/issue4953).  Yikes.

I trust that people realize this is a show-stopper for broader
Python 3.X adoption.

We know it, we have extensively discussed how to fix it, we have IMO a good
design, and we even have someone willing and able to tackle the problem.  We
need to find a sufficient source of funding to enable him to do the work it
will take, and so far that's been the biggest stumbling block.  It will take a
focused and determined effort to see this through, and it's obvious that
volunteers cannot make it happen.  I include myself in the latter category, as
I've tried and failed at least twice to do it in my spare time.
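
For readers on a much later Python 3, the kind of binary round trip at stake in the upload bug can be sketched with the bytes-aware email API. Note the assumptions: message_from_bytes, the policy module and EmailMessage arrived in releases after this thread and are not available in 3.1, and the payload and filename below are made up for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical binary payload standing in for an uploaded file.
payload = bytes(range(256))

outgoing = EmailMessage()
outgoing["Subject"] = "binary upload"
outgoing.set_content("See the attached file.")
outgoing.add_attachment(
    payload,
    maintype="application",
    subtype="octet-stream",
    filename="blob.bin",
)

# Serialize to bytes and parse from bytes, never passing through str.
wire_bytes = outgoing.as_bytes()
incoming = message_from_bytes(wire_bytes, policy=policy.default)

attachment = next(incoming.iter_attachments())
recovered = attachment.get_content()  # bytes for application/* parts
```

The point of the sketch is only that the message is serialized and parsed as bytes end to end, which is what a binary upload needs.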

-Barry




Re: [Python-Dev] email package status in 3.X

2010-06-17 Thread Bill Janssen
Nick Coghlan ncogh...@gmail.com wrote:

 My personal perspective is that a lot of that code was likely already
 broken in hard to detect ways when dealing with mixed encodings -
 releasing 3.x just made the associated errors significantly easier to
 detect.

I have to agree with this, and not just about encodings.  I think much
of the stdlib code dealing with all aspects of HTTP (urllib and the http
package which now includes cgi) is kind of shaky.  And it affects
(infects) other parts of the stdlib, too; sockets are hacked to support
the read-after-close paradigm that httplib uses, for instance.  Which
means that SSL and other socket-using code also has to support it, etc.
Some of this was cleaned up in the move to 3.x, but more work needs to
be done.  Kudos to the folks working on httplib2
(http://code.google.com/p/httplib2/) and WSGI.

There's a related meta-issue having to do with antique protocols.  FTP,
for instance, was designed when the Internet had only 19 nodes connected
together with custom-built refrigerator-sized routers.  A very early
experiment in application protocols.  It does a few odd things that
we've since learned to be inefficient/unwise/unnecessary.  Does it make
sense that Python support every part of it?  On the other hand, it was
fairly static when the Python support was added (unlike HTTP, which was
under very active development!) so that module is pretty robust.

Bill

