Re: os.lisdir, gets unicode, returns unicode... USUALLY?!?!?

Martin v. Löwis Sun, 19 Nov 2006 15:56:00 -0800

gabor wrote:
>> I may have missed something, but did you present a solution that would
>> make the case above work?
> 
> if we use the same decoding flags as binary-string.decode(),
> then we could do:
> 
> [os.path.join(path,n) for n in os.listdir(path,'ignore')]


That wouldn't work. The characters in the file name that didn't
decode would be dropped, so the resulting file names would be
invalid. Trying to do os.stat() on such a file name would raise
an exception that the file doesn't exist.
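
As a rough illustration (a sketch only, written for Python 3, with a
made-up Latin-1 file name on a system that assumes UTF-8), here is what
the "ignore" handler does to the bytes of such a name:

```python
# Hypothetical raw file name: "cafe" with e-acute, stored as Latin-1 bytes.
raw_name = b"caf\xe9.txt"

# Decoding as UTF-8 with "ignore" silently drops the undecodable byte.
ignored_name = raw_name.decode("utf-8", "ignore")
print(repr(ignored_name))

# Encoding the result again does not give back the bytes on disk,
# so a lookup with this name would target a different (missing) file.
ignored_bytes = ignored_name.encode("utf-8")
print(ignored_bytes == raw_name)
```

The decoded name is shorter than the real one, and nothing in it tells
you which byte was lost.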

> [os.path.join(path,n) for n in os.listdir(path,'replace')]

Likewise. The characters would get replaced with REPLACEMENT
CHARACTER; passing that to os.stat would give an encoding
error.
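
A similar sketch for "replace" (again Python 3, with the same made-up
file name; the choice of Latin-1 as the second encoding is only an
assumption for the demonstration):

```python
# Hypothetical raw file name: Latin-1 bytes on a system that assumes UTF-8.
raw_name = b"caf\xe9.txt"

# "replace" substitutes U+FFFD REPLACEMENT CHARACTER for the bad byte.
replaced_name = raw_name.decode("utf-8", "replace")
print(repr(replaced_name))

# An encoding that has no U+FFFD cannot encode the name at all.
try:
    replaced_name.encode("latin-1")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
print(encode_failed)

# UTF-8 can encode U+FFFD, but the bytes still differ from the original.
replaced_bytes = replaced_name.encode("utf-8")
print(replaced_bytes == raw_name)
```

Either way, the original byte is gone and the name cannot be mapped
back to the file.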

> it's not an elegant solution, but it would solve i think most of the
> problems.

No, it wouldn't. This idea is as bad as, or worse than, just dropping
these file names from the directory listing.

>> One approach I had been considering is to always make the decoding
>> succeed, by using the private-use-area of Unicode to represent bytes
>> that don't decode correctly.
>>
> 
> hmm..an interesting idea..
> 
> and what happens with such texts, when they are encoded into let's say
> utf-8? are the in-private-use-area characters ignored?

UTF-8 supports encoding of all Unicode characters, including the PUA
blocks.

py> u"\ue020".encode("utf-8")
'\xee\x80\xa0'
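
To make the private-use-area idea more concrete, here is a minimal
sketch in Python 3. Everything in it is an assumption for illustration:
the handler name "pua_escape_sketch", the helpers decode_name() and
encode_name(), and the choice of U+E000..U+E0FF as the target range.
It is not an existing API.

```python
import codecs

PUA_BASE = 0xE000  # assumption: byte 0xNN is represented as U+E0NN


def pua_escape(exc):
    """Error handler: map each undecodable byte to a PUA character."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_bytes = exc.object[exc.start:exc.end]
    replacement = "".join(chr(PUA_BASE + b) for b in bad_bytes)
    return replacement, exc.end


codecs.register_error("pua_escape_sketch", pua_escape)


def decode_name(raw, encoding="utf-8"):
    """Decode a raw file name; decoding never fails."""
    return raw.decode(encoding, "pua_escape_sketch")


def encode_name(name, encoding="utf-8"):
    """Reverse decode_name(): turn PUA characters back into raw bytes."""
    out = bytearray()
    for ch in name:
        code_point = ord(ch)
        if PUA_BASE <= code_point <= PUA_BASE + 0xFF:
            out.append(code_point - PUA_BASE)
        else:
            out.extend(ch.encode(encoding))
    return bytes(out)


raw_name = b"caf\xe9.txt"          # hypothetical Latin-1 file name
pua_name = decode_name(raw_name)   # bad byte becomes U+E0E9
restored = encode_name(pua_name)   # original bytes come back
print(repr(pua_name), restored == raw_name)
```

Unlike "ignore" and "replace", this keeps the byte, so the name can be
converted back for os.stat(). One known weakness of this sketch: a file
name that legitimately contains a character in U+E000..U+E0FF would be
encoded back to a single byte, so that range would have to be reserved
or escaped in a real design.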

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list
