Dan Stromberg <drsali...@gmail.com> wrote:
> I've got a couple of programs that read filenames from stdin, and then
> open those files and do things with them. These programs sort of do
> the *ix xargs thing, without requiring xargs.
>
> In Python 2, these work well. Irrespective of how filenames are
> encoded, things are opened OK, because it's all just a stream of
> single byte characters.
>
> In Python 3, I'm finding that I have encoding issues with characters
> with their high bit set. Things are fine with strictly ASCII
> filenames. With high-bit-set characters, even if I change stdin's
> encoding with:
>
>     import io
>     STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')
>
> ...even with that, when I read a filename from stdin with a
> single-character Spanish n~, the program cannot open that filename
> because the n~ is apparently internally converted to two bytes, but
> remains one byte in the filesystem. I decided to try ISO-8859-1 with
> Python 3, because I have a Java program that encountered a similar
> problem until I used en_US.ISO-8859-1 in an environment variable to
> set the JVM's encoding for stdin.
>
> Python 2 shows the n~ as 0xf1 in an os.listdir('.'). Python 3 with an
> encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1.
>
> Does anyone know what I need to do to read filenames from stdin with
> Python 3.1 and subsequently open them, when some of those filenames
> include characters with their high bit set?
>
> TIA!

Try using sys.stdin.buffer instead of sys.stdin. It gives you bytes instead of strings, so no decoding takes place. Also use bytes literals instead of string literals for paths, e.g. os.listdir(b'.'), which then returns the filenames as bytes too.

Marc
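
A minimal sketch of the suggestion above, assuming one newline-terminated filename per line on stdin. The helper names (read_filenames, file_sizes, main) are made up for illustration and are not part of any library; only sys.stdin.buffer and passing bytes paths to open() come from the advice itself.

```python
import sys


def read_filenames(stream):
    """Yield one filename per line from a binary stream, as raw bytes."""
    for line in stream:
        # Strip only the line terminator; every other byte is kept as-is,
        # so a name containing 0xf1 stays a single 0xf1 byte.
        name = line.rstrip(b"\n")
        if name:
            yield name


def file_sizes(stream):
    """Open each named file using its bytes path and yield (name, size)."""
    for name in read_filenames(stream):
        # open() accepts a bytes path and hands it to the OS undecoded.
        with open(name, "rb") as f:
            yield name, len(f.read())


def main():
    # sys.stdin.buffer is the binary layer underneath sys.stdin.
    for name, size in file_sizes(sys.stdin.buffer):
        # repr() keeps the output printable whatever bytes the name holds.
        print(repr(name), size)
```

Calling main() at the end of a script turns this into an xargs-style filter, for example fed by `ls | python3 script.py` (script.py being a placeholder name). Because the names never pass through a text codec, it should not matter whether the filesystem stores them as ISO-8859-1, UTF-8, or anything else.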