On May 1, 2009, at 9:42 PM, Zooko O'Whielacronx wrote:
Yep, I reversed the order of encode() and decode().  However, my whole
statement was utterly wrong and shows that I still didn't fully get it
yet.  I have flip-flopped again and currently think that PEP 383 is
useless for this use case and that my original plan [1] is still the
way to go.  Please let me know if you spot a flaw in my plan or a
ridiculousity in my requirements, or if you see a way that PEP 383 can
help me.

If I were designing a new system such as this, I'd probably just go for utf8b *always*. That is, set the filesystem encoding to utf-8b. The end. All files always keep the same bytes transferring between unix systems. Thus, for the 99% of the world that uses either windows or a utf-8 locale, they get useful filenames inside tahoe. The other 1% of the world that uses something like latin-1, EUC_JP, etc. on their local system sees mojibake filenames in tahoe, but will see the same filename that they put in when they take it back out.

Gnome already uses only utf-8 for filename displays for a few years now, for example, so this isn't exactly an unheard-of position to take...

But if you don't do that, then, I still don't see what purpose your requirements serve. If I have two systems: one with a UTF-8 locale, and one with a Latin-1 locale, why should transmitting filenames from system 1 to system 2 through tahoe preserve the raw bytes, but doing the reverse *not* preserve the raw bytes? (all byte-sequences are valid in latin-1, remember, so they'll all decode into unicode without error, and then be reencoded in utf-8...). This seems rather a useless behavior to me.

James
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to