On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>>
>> On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
>>>
>>> Toshio Kuratomi wrote:
>>>
>>>> - If this is true, a definition of os.listdir(<type 'str'>) that would
>>>> better meet programmer expectation would be: "Give me all files in a
>>>> directory with the output as str type". The definition of
>>>> os.listdir(<type 'bytes'>) would be "Give me all files in a directory
>>>> with the output as bytes type". Raising an exception when the filenames
>>>> are undecodable is perfectly reasonable in this situation.
>>>
>>> Your examples (snipped) pretty well convince me that there is a use case
>>> for raising exceptions. We should move beyond arguing over which one way
>>> is right. I think there should be a second argument 'ignorebad=False' to
>>> ignore undecodable files rather than raise the exception (or
>>> 'strict=True' to stop and raise exception on non-decodable names -- then
>>> code is 'if strict: raise ...'). I believe other functions have a similar
>>> parameter.
>
> I was thinking of the "normal Unicode 'errors' parameter", as described by
> Nick.
>
>> If you want the exceptions, just use the bytes API and try to decode
>> the byte strings using the system encoding.
>
> If it was a matter of adding a new method, I might agree. But:
>
> 1. We already have a method that does exactly what you describe. It is only
> a matter of adding flexibility to the response to problems, for which there
> is already precedent.
>
> 2. Suggesting that people who want strings and not bytes should have to deal
> with bytes, just to get an error notification, seems to negate the point of
> moving to 3.0.
>
> 3. A builtin would probably do so better than most programmers would, with
> little touches such as the one suggested below.
>
> 4. An error parameter would ALERT programmers to the possibility of a
> PROBLEM, both in the present and future.
> As you say below, people need to better anticipate the future.
>
>> My problem with raising exceptions *by default* when an undecodable
>> name exists is that it may render an app completely useless in a
>> situation where the developer is no longer around. This happened all
>> the time with the 2.x Unicode API, where the developer hadn't
>> anticipated a particular input potentially containing non-ASCII bytes,
>> and the user fed the application non-ASCII text. Making os.listdir
>> raise an exception when a directory contains a single undecodable file
>> means that the entire directory can't be read, and most likely the
>> entire app crashes at that point. Most likely the developer never
>> anticipated this situation (since in most places it is either
>> impossible or very unlikely) -- after all, if they had anticipated it
>> they would have used the bytes API in the first place. (It's worse
>> because the exception being raised would be UnicodeError -- most
>> people expect os.listdir to raise OSError, not other errors.)
>
> This to me is an argument for keeping the default the current behavior, but
> not for rejecting flexibility. The computing world seems to be messier than
> we would like and worse than I realized until this week. As you say below,
> people need to better anticipate the future, and an errors parameter would
> help do that.

I'm fine with whatever API enhancements you can come up with (assuming
others like them too :-) as long as the default remains the current
behavior.

> Is Windows really immune? What about when it reads the directory of
> possibly old removable media with whatever byte name encodings? Is this a
> possible source of 'unanticipated' problems?
>
> As to your last sentence, os.listdir() with an errors parameter could
> convert a decoding UnicodeError to "OSError: undecodable file name
> <ascii+hex repr>", thereby supplying the expected exception as well as an
> extractable representation of the problematic raw bytes.
>
> Here is a possible use case: I want filenames as 3.0 strings and I
> anticipate no problems at present but, as you say above, something might
> happen years in the future. I am using 3.0 *because* of the strings ==
> unicode feature. I would like to write
>
> try:
>     files = os.listdir(somedir, errors='strict')
> except OSError as e:
>     log(<verbose error message that includes somedir and e>)
>     files = os.listdir(somedir)
>
> and go on without the problem file but not without logging the problem so a
> future maintainer can consider what to do about it, but only when there is
> an actual need to think about it.
>
> Terry Jan Reedy
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
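
For illustration only, here is a rough sketch of how the two ideas in this
thread might be combined in user code today: use the bytes API and decode
with the system encoding, and report an undecodable name as an OSError that
carries a repr of the raw bytes. Everything named here (decode_names,
listdir_checked, listdir_logged, UndecodableNameError) is hypothetical and
not part of the os module, and os.listdir itself takes no errors argument.
The non-strict path simply leaves the problem name out, which is an
assumption about what "go on without the problem file" should mean.

```python
import os
import sys


class UndecodableNameError(OSError):
    """Hypothetical OSError subclass for file names that do not decode."""


def decode_names(raw_names, encoding, strict=False):
    """Decode byte file names; skip or raise on names that do not decode.

    With strict=True an undecodable name raises UndecodableNameError
    (an OSError, not a UnicodeError) whose message includes a repr of
    the raw bytes.
    """
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if strict:
                raise UndecodableNameError(
                    "undecodable file name %r" % (raw,)) from None
            # Non-strict: leave the problem name out and carry on.
    return decoded


def listdir_checked(path, strict=False):
    """List a directory via the bytes API, then decode the names."""
    raw_names = os.listdir(os.fsencode(path))  # bytes in, bytes out
    return decode_names(raw_names, sys.getfilesystemencoding(), strict)


def listdir_logged(path, log=print):
    """Try the strict listing first; log and fall back if a name is bad."""
    try:
        return listdir_checked(path, strict=True)
    except UndecodableNameError as e:
        log("problem listing %r: %s" % (path, e))
        return listdir_checked(path, strict=False)
```

Only UndecodableNameError is caught in listdir_logged, so an ordinary
OSError such as a missing directory still propagates to the caller instead
of being logged and retried.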