On Sun, Jun 20, 2010 at 11:30 PM, Terry Reedy <tjre...@udel.edu> wrote:
> On 6/20/2010 8:26 AM, Giampaolo Rodolà wrote:
>
>> I attempted to port pyftpdlib to python 3 several times and the
>> biggest show stopper has always been the bytes / string difference
>> introduced by Python 3 which forces you to *know* and *use* Unicode
>> every time you deal with some text and 2to3 is completely useless
>> here.
>
> I believe the advice in the wiki porting page is to use unicode() and
> bytes() but never str(), in a version that runs in 2.6. Then 2to3 should do
> fine. For 2.5-, add 'bytes = str' somewhere.

Really? I thought you were supposed to call the encode/decode methods on the
appropriate thing, depending on whether it comes from a byte source or a
character source.

The problems arise when you're dealing with things like paths, which I believe
are bytes on *nix and proper Unicode on Windows (which basically just means
they enforce an encoding, UTF-16 if I'm not mistaken). I don't actually use
Windows, so I might be completely wrong here.

> 2to3 still gets patches, I believe, when someone exhibits code that could
> and ought to be converted but is not.
>
> I suspect that if you posted 'Problems porting pyftpdlib to Python3', you
> would get some help. If it involved inadequacies in the current tools and
> guides, it would be on-topic here. Or try python-list.
>
>> The choice of forcing the user to use Unicode and "think in Unicode"
>> was a very brave one, and I'm sure it's for the better, but not
>> everyone wants to deal with that because Unicode is hard to swallow.
>
> I felt that way until my daughter decided to switch from Spanish to Japanese
> for her foreign language. Once I quit fighting it, it became much easier
> to swallow and learn. As it turns out, thinking in Unicode is a pretty
> straightforward generalization of thinking in ascii. There are some annoying
> glitches due to the need to accommodate legacy systems. The plethora of
> legacy encodings for various subsets, besides ascii, is also a nuisance.

I think doing unicode/str properly in 2.x is very important; #python stresses
it quite often. I think Py3k's strictness is a good idea because people very
often write something that appears to work for a long time, and then someone
tries it using funny bytes, and everything blows apart.
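
As a rough illustration of the points above (decode at the byte boundary, encode on the way out, and the "funny bytes" failure mode), here is a minimal sketch meant to be run under Python 3. The helper names `read_text` and `write_text` and the sample payloads are made up for this example; they do not come from pyftpdlib or any other library, and UTF-8 is only an assumed default encoding.

```python
def read_text(raw, encoding="utf-8"):
    """Hypothetical helper: decode bytes arriving from a byte source."""
    return raw.decode(encoding)


def write_text(text, encoding="utf-8"):
    """Hypothetical helper: encode text just before it leaves the program."""
    return text.encode(encoding)


# Plain ASCII input decodes fine, which is why such code
# "appears to work for a long time".
ascii_payload = b"hello"
greeting = read_text(ascii_payload)

# The same word list sent by someone using a different charset:
# "Gruesse" spelled with u-umlaut and sharp s, encoded as Latin-1.
latin1_payload = u"Gr\u00fc\u00dfe".encode("latin-1")

# Decoding those bytes with the wrong assumption (UTF-8) fails loudly.
try:
    read_text(latin1_payload)
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Python 3 also refuses to mix text and bytes implicitly.
try:
    mixed = u"abc" + b"def"
    mix_failed = False
except TypeError:
    mix_failed = True
```

When the sender's encoding is known, passing it explicitly, as in `read_text(latin1_payload, "latin-1")`, recovers the original text; the hard part in practice is knowing which encoding the other side actually used.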
Convincing people their software is wrong when "everything worked five
minutes ago" is really hard :-) You'd be surprised how long it can take
before some of these problems are found. A couple of weeks ago in #python we
had exactly this problem when we were helping the Blender folks: there was a
bug report from a German Blender user, and it turned out that Blender ignores
Unicode in some critical spot, which makes importing between people who
disagree on charsets impossible. And Blender isn't exactly a project that's
two weeks old and filled with idiots :) The downside is that *fixing* them
then becomes a nontrivial task.

The central problem is probably that a lot of people don't understand
Unicode. Recently I learned that even Tanenbaum got it wrong in his latest
revision of the computer networks book! (Although that might just be my
Dutch translation of it being bad.)

> Terry Jan Reedy

Laurens
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com