Ultimately I switched to reading the filenames from file descriptor 0 using os.read(). This returns bytes in 3.x and strings of single-byte characters in 2.x, which are similar enough for my purposes, and it sidesteps the filesystem encoding(s) question nicely.
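For anyone wanting to try the same approach, here is a minimal sketch of reading delimited filenames from a file descriptor with os.read(). It is not the readline0 code; the function name read_delimited, the NUL default delimiter and the block size are all assumptions made for illustration.

```python
import os


def read_delimited(fd=0, delimiter=b"\0", blocksize=65536):
    """Yield delimiter-terminated records read from fd, undecoded.

    Hypothetical helper, not part of readline0. On 3.x the records are
    bytes; on 2.x they are ordinary single-byte strings.
    """
    pending = b""
    while True:
        block = os.read(fd, blocksize)
        if not block:
            # An empty read means end of file.
            break
        pending += block
        fields = pending.split(delimiter)
        # The last field may be incomplete; keep it for the next block.
        pending = fields.pop()
        for field in fields:
            yield field
    if pending:
        # Final record that had no trailing delimiter.
        yield pending
```

Because the names are never decoded, they can be passed straight to open(name, 'rb') on either major version, e.g. `for name in read_delimited(0): ...` with input from something like `find . -print0`.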
I rewrote readline0 (http://stromberg.dnsalias.org/cgi-bin/viewvc.cgi/readline0/trunk/?root=svn) for 2.x and 3.x to facilitate reading null-terminated strings from stdin. It's in better shape now anyway - more OOP than functional, and with a bunch of unit tests. The module now works on CPython 2.x, CPython 3.x and PyPy 1.4 from the same code.

On Mon, Nov 29, 2010 at 9:26 PM, Dan Stromberg <drsali...@gmail.com> wrote:
> I've got a couple of programs that read filenames from stdin, and then
> open those files and do things with them. These programs sort of do
> the *ix xargs thing, without requiring xargs.
>
> In Python 2, these work well. Irrespective of how filenames are
> encoded, things are opened OK, because it's all just a stream of
> single byte characters.
>
> In Python 3, I'm finding that I have encoding issues with characters
> with their high bit set. Things are fine with strictly ASCII
> filenames. With high-bit-set characters, even if I change stdin's
> encoding with:
>
>     import io
>     STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')
>
> ...even with that, when I read a filename from stdin with a
> single-character Spanish n~, the program cannot open that filename
> because the n~ is apparently internally converted to two bytes, but
> remains one byte in the filesystem. I decided to try ISO-8859-1 with
> Python 3, because I have a Java program that encountered a similar
> problem until I used en_US.ISO-8859-1 in an environment variable to
> set the JVM's encoding for stdin.
>
> Python 2 shows the n~ as 0xf1 in an os.listdir('.'). Python 3 with an
> encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1.
>
> Does anyone know what I need to do to read filenames from stdin with
> Python 3.1 and subsequently open them, when some of those filenames
> include characters with their high bit set?
>
> TIA!