Richard Damon <rich...@damon-family.org> writes:

> This does bring up an interesting point. Since the Unix file system
> really has file names that are collection of bytes instead of really
> being strings, and the Python API to it want to treat them as strings,
> then we have an issue that we are going to be stuck with problems with
> filenames.
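The following is a minimal sketch of how Python's text-based path handling can cope with the byte-oriented filenames described in the quote. It assumes a UTF-8 filesystem encoding and the `surrogateescape` error handler. That is what `os.fsdecode`/`os.fsencode` use on a typical POSIX system with a UTF-8 locale, but check `sys.getfilesystemencoding()` and `sys.getfilesystemencodeerrors()` on yours. The filename bytes are made up for illustration.

```python
# A filename as it might be stored on a bytes-oriented filesystem such as ext4:
# "caf" followed by 0xE9 (Latin-1 "é"), which is NOT valid UTF-8 here.
raw_name = b"caf\xe9.txt"

# Decoding with surrogateescape never fails: each undecodable byte becomes
# a lone surrogate code point (0xE9 -> U+DCE9).
as_text = raw_name.decode("utf-8", "surrogateescape")
print(repr(as_text))  # 'caf\udce9.txt'

# Encoding back with the same handler restores the original bytes exactly,
# so the str can still be used to open the file.
round_trip = as_text.encode("utf-8", "surrogateescape")
assert round_trip == raw_name

# But the str is not valid Unicode text: a strict encode fails, e.g. when
# displaying the name or writing it to a UTF-8 log file.
try:
    as_text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("cannot encode strictly:", exc.reason)
```

The string round-trips losslessly but is not real text. That is the sense in which a str-based API can carry byte filenames without really solving the encoding problem.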
I agree with the general statement “we are going to be stuck with problems with filenames”; the world of filesystems is messy, which will always cause problems.

With that said, I don't agree that “the Python API wants to treat [file paths] as strings”. The ‘os’ module explicitly promises to treat bytes as bytes, and text as text, in filesystem paths:

    Note: All of these functions accept either only bytes or only string
    objects as their parameters. The result is an object of the same
    type, if a path or file name is returned.

    <URL:https://docs.python.org/3/library/os.path.html>

There is a *preference* for text, it's true. The opening paragraph includes this:

    Applications are encouraged to represent file names as (Unicode)
    character strings.

That is immediately followed by more specific advice that says when to use bytes:

    Unfortunately, some file names may not be representable as strings
    on Unix, so applications that need to support arbitrary file names on
    Unix should use bytes objects to represent path names. Vice versa,
    using bytes objects cannot represent all file names on Windows (in the
    standard mbcs encoding), hence Windows applications should use string
    objects to access all files.

(IMO that needs a correction because, as already explored in this thread, it's not Unix or Windows that makes the distinction there. It's the specific *filesystem type* that records either bytes or text, and that is true no matter what operating system happens to be reading the filesystem.)

> Ultimately we have a fundamental limitation with trying to abstract out
> the format of filenames in the API, and we need a back door to allow us
> to define what encoding to use for filenames (and be able to detect that
> it doesn't work for a given file, and change it on the fly to try
> again), or we need an alternate API that lets us pass raw bytes as file
> names and the program needs to know how to handle the raw filename for
> that particular file system.
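The following rough sketch covers two things: the bytes-in, bytes-out promise from the ‘os.path’ documentation, and the "try an encoding, detect failure, try again" back door described in the quote. `posixpath` (the POSIX implementation behind `os.path`) is used directly so the results don't depend on the platform. `decode_filename` and its list of candidate encodings are hypothetical, for illustration only, and not part of any library.

```python
import posixpath

# Bytes in, bytes out; text in, text out.
joined_bytes = posixpath.join(b"/srv/data", b"caf\xe9.txt")
joined_text = posixpath.join("/srv/data", "notes.txt")
assert isinstance(joined_bytes, bytes)
assert isinstance(joined_text, str)

# Mixing the two types is rejected rather than silently converted.
try:
    posixpath.join("/srv/data", b"notes.txt")
except TypeError as exc:
    print("TypeError:", exc)


def decode_filename(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: try each candidate encoding strictly, in order.

    Returns (text, encoding) for the first encoding that decodes without
    error. latin-1 maps every byte, so with the default list this always
    succeeds, though not necessarily with the text the creator intended.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # detect that it doesn't work, try the next one
    # No candidate fit: fall back to a lossless, round-trippable form.
    return raw.decode("utf-8", "surrogateescape"), None


print(decode_filename(b"caf\xe9.txt"))  # ('café.txt', 'cp1252')
```

A strict decode that succeeds only shows that the bytes are *valid* in that encoding, not that it is the encoding the file's creator used. For example, b"caf\xc3\xa9" decodes under both UTF-8 and cp1252, to different text. Whatever text is shown to the user, the original bytes are what should be passed back to open the file.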
Yes, I agree that there is an unresolved problem of how to explicitly declare the encoding for filesystem paths on ext4 and other filesystems that store paths as byte strings.

-- 
 \       “Give a man a fish, and you'll feed him for a day; give him a |
  `\     religion, and he'll starve to death while praying for a fish.” |
_o__)                                                       —Anonymous |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list