On 2008-12-09 09:41, Anders J. Munch wrote:
> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
>>>> try:
>>>>  files = os.listdir(somedir, errors = strict)
>>>> except OSError as e:
>>>>  log(<verbose error message that includes somedir and e>)
>>>>  files = os.listdir(somedir)
> 
> Instead of a codecs error handler name, how about a callback for
> converting bytes to str?
> 
> os.listdir(somedir, decoder=bytes.decode)
> os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding, 
> errors='xmlcharrefreplace'))
> os.listdir(somedir, decoder=repr)
> 
> ISTM that would be simpler and more flexible than going over the
> codecs registry.  One caveat though is that there's no obvious way of
> telling listdir to skip a name.  But if the default behaviour for
> decoder=None is to skip with a warning, then the need to explicitly
> ask for files to be skipped would be small.
> 
> Terry's example would then be:
> 
>>>> try:
>>>>  files = os.listdir(somedir, decoder=bytes.decode)
>>>> except UnicodeDecodeError as e:
>>>>  log(<verbose error message that includes somedir and e>)
>>>>  files = os.listdir(somedir)

Well, this is not too far away from just putting the whole decoding
logic into the application directly:

files = [filename.decode(filesystemencoding, errors='warnreplace')
         for filename in os.listdir(dir)]

(or os.listdirb() if that's where the discussion is heading)

... and that also tells us something about this discussion: we're
trying to come up with some magic to work around writing two
lines of Python code.
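
To make the "two lines" concrete, here is a minimal sketch of application-level decoding. It assumes a Python version where os.listdir() returns bytes when given a bytes path and where os.fsencode() is available (3.2 and later). The name list_decoded() is made up for illustration. Since 'warnreplace' is not a built-in codecs error handler, the sketch falls back to the standard 'replace' handler.

```python
import os
import tempfile


def list_decoded(path, encoding="utf-8", errors="replace"):
    """Return (raw_bytes_name, decoded_name) pairs for the entries in path.

    Hypothetical helper: the raw bytes name is kept next to the decoded
    one, so the file can still be opened even if decoding was lossy.
    """
    raw_names = os.listdir(os.fsencode(path))  # bytes path in, bytes names out
    return [(raw, raw.decode(encoding, errors)) for raw in raw_names]


# Usage: list a scratch directory and show both forms of each name.
with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "report.txt"), "w").close()
    for raw, text in list_decoded(scratch):
        print(raw, "->", text)
```

The bytes name is what gets passed back to open(); the decoded name is only for display.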

I'd just have all the os APIs return bytes and leave whatever
conversion to Unicode might be necessary to a higher level API.

Think about it: you really only need the Unicode values if you
ever want to output those values in text form somewhere.

In those cases, it's usually a human reading a log file or
screen output. In most other cases, the application just needs
some form of file identifier in order to open the file
and doesn't care about the encoding of the file name
at all.

It's probably better to have two helper functions in the os module,
e.g. os.decodefilename() and os.encodefilename(), that take care of
the conversion on demand, rather than trying to force this conversion
even in cases where the application never really needs to write the
filename anywhere.

These should then provide some reasonable default logic, e.g.
use a 'warnreplace' error handler. Applications are then
free to use these converters or implement their own.
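
A rough sketch of what such helpers could look like, written as plain functions rather than os module members. Everything here is an assumption for illustration: the 'warnreplace' handler is not part of the codecs module and is registered by hand, and the choice of U+FFFD and '?' as replacements is just one reasonable default.

```python
import codecs
import sys
import warnings


def _warnreplace(exc):
    """Hypothetical 'warnreplace' handler: warn, then substitute a marker."""
    bad = exc.object[exc.start:exc.end]
    if isinstance(exc, UnicodeDecodeError):
        warnings.warn(f"undecodable bytes {bad!r} replaced", UnicodeWarning)
        # One U+FFFD per offending byte; resume decoding after the bad span.
        return ("\ufffd" * (exc.end - exc.start), exc.end)
    if isinstance(exc, UnicodeEncodeError):
        warnings.warn(f"unencodable text {bad!r} replaced", UnicodeWarning)
        # One '?' per offending character; resume encoding after the bad span.
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


# Make the handler available under the name used in this message.
codecs.register_error("warnreplace", _warnreplace)


def decodefilename(name, encoding=None, errors="warnreplace"):
    """Convert a bytes file name to str; str input is passed through."""
    if isinstance(name, str):
        return name
    return name.decode(encoding or sys.getfilesystemencoding(), errors)


def encodefilename(name, encoding=None, errors="warnreplace"):
    """Convert a str file name to bytes; bytes input is passed through."""
    if isinstance(name, bytes):
        return name
    return name.encode(encoding or sys.getfilesystemencoding(), errors)
```

An application that only opens files would never call either function. One that writes names to a log would call decodefilename() at the point of output and could pass errors='strict' or its own handler instead.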

-- 
Marc-Andre Lemburg
eGenix.com
