gabor wrote:
>> I may have missed something, but did you present a solution that would
>> make the case above work?
>
> if we use the same decoding flags as binary-string.decode(),
> then we could do:
>
> [os.path.join(path,n) for n in os.listdir(path,'ignore')]

That wouldn't work. The characters in the file name that didn't decode
would be dropped, so the resulting file names would be invalid. Trying
to do os.stat() on such a file name would raise an exception that the
file doesn't exist.

> [os.path.join(path,n) for n in os.listdir(path,'replace')]

Likewise. The characters would get replaced with REPLACEMENT CHARACTER;
passing that to os.stat would give an encoding error.

> it's not an elegant solution, but it would solve i think most of the
> problems.

No, it wouldn't. This idea is as bad as or worse than just dropping
these file names from the directory listing.

>> One approach I had been considering is to always make the decoding
>> succeed, by using the private-use-area of Unicode to represent bytes
>> that don't decode correctly.
>>
>
> hmm..an interesting idea..
>
> and what happens with such texts, when they are encoded into let's say
> utf-8? are the in-private-use-area characters ignored?

UTF-8 supports encoding of all Unicode characters, including the PUA
blocks.

py> u"\ue020".encode("utf-8")
'\xee\x80\xa0'

Regards,
Martin
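
To make the points above concrete, here is a minimal sketch of what the
'ignore' and 'replace' error handlers do to an undecodable file name. It
is written in Python 3 syntax (bytes literals, print function), and the
byte string raw_name is a made-up example, not a name from this thread.
It only demonstrates the decoding step; it does not touch the file system.

```python
# Hypothetical file name as raw bytes: "caf\xe9.txt" is Latin-1 for
# "café.txt" and is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# 'ignore' silently drops the undecodable byte.
ignored = raw_name.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD REPLACEMENT CHARACTER for it.
replaced = raw_name.decode("utf-8", "replace")

# Encoding either result again does not give back the original bytes,
# so neither string names the file that is actually on disk.
ignored_roundtrip = ignored.encode("utf-8")
replaced_roundtrip = replaced.encode("utf-8")

# A private-use-area character encodes to UTF-8 like any other character.
pua_encoded = "\ue020".encode("utf-8")

print(repr(ignored), repr(replaced), pua_encoded)
```

In both cases the information about the original byte is lost at decode
time, which is why the resulting name can no longer be mapped back to
the file.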