Robert Elz wrote, on 12 Apr 2022:
>
>   | 1. The vast majority of apps will never need to do that because they know
>   | (or can assume) that the pathnames they handle either always use the
>   | portable filename character set or use the user's locale.
>
> The latter, perhaps, the former, certainly not in an international context.
> The point was that, at least as I read the proposed text, you're defining
> things like '*' to only work (reliably as specified) when the locale is
> POSIX (aka C).  In the user's locale, who knows what happens?

That is how things are at present.  The suggested changes just make it
explicit.  Do you have an alternative proposal?

>   | I.e. the pathnames are not arbitrary (a word I was careful to
>   | include in the proposed changes).
>
> Sure, the problem is that when dealing with user input (as in, for example,
> the command line args) the application cannot assume that the pathnames are
> not arbitrary.  They're anything that's OK for the user.

The application can document that it requires pathnames to be in the
same encoding as the user's locale.

>   | 2. In apps that truly do need to do matching or expansion on arbitrary
>   | pathnames, a C program can call uselocale() before and after calls to
>   | fnmatch(), glob(), and wordexp().  A shell script can set LC_ALL=C before
>   | handling pathnames (and unset it or restore it afterwards).
>
> But how does that help *.doc (in a defined way, as opposed to "of course
> that works in all glob implementations") match a filename that isn't
> entirely ascii (by which I mean, using characters only from the portable
> character set)?

The C locale is specified as containing 256 single-byte characters.
Thus in the C locale all pathnames are valid character strings.

> Even worse perhaps, ???.doc which should match 7 char
> names that end in ".doc" (or is that 7 byte names?) (not counting the \0).

It would match 7-byte names.

-- 
Geoff Clare <[email protected]>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
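
For illustration only, the uselocale() approach from point 2 above could
look roughly like the sketch below. The helper name match_in_c_locale()
and its return convention are hypothetical, not from the standard or the
proposed text; the sketch assumes a POSIX.1-2008 system that provides
newlocale(), uselocale(), and freelocale().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <locale.h>

/*
 * Hypothetical helper: match name against pattern with the calling
 * thread temporarily switched to the C locale, so that every byte is
 * treated as one character.
 * Returns 1 on a match, 0 on no match, -1 on any error.
 */
static int match_in_c_locale(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);

    /* Build a locale object for the C locale. */
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    /* Switch the current thread only; the global locale is untouched. */
    locale_t saved = uselocale(c_loc);
    if (saved == (locale_t)0) {
        freelocale(c_loc);
        return -1;
    }

    int rc = fnmatch(pattern, name, 0);

    /* Restore the previous thread locale before freeing ours. */
    uselocale(saved);
    freelocale(c_loc);

    if (rc == 0)
        return 1;
    if (rc == FNM_NOMATCH)
        return 0;
    return -1;
}
```

Under these assumptions, match_in_c_locale("???.doc", name) would accept
any name that is exactly seven bytes long and ends in ".doc", whatever
encoding those bytes happen to be in.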
