Martin Tarenskeen writes:
| On Tue, 27 Apr 2004, Stephen Kellett wrote:
| > John Chambers wrote:
| > >OSX presents an interesting portability challenge: The default file
| > >system has "caseless" file names. If you look around, you might not
| > >notice this, because mixed-case names abound. But the case of letters
| > >isn't significant when opening files.
| >
| > You have the same problem on Windows. Windows supports both upper and
| > lower case letters in filenames, however filename matching is case
| > insensitive.
| >
| > Try creating textfile.txt and Textfile.txt in the same directory. Can't
| > do it.
| I have an Atari Falcon030 computer running the FreeMiNT operating system.
| (Never heard of ? Never mind :-) It is a sort of hybrid OS a little bit
| like OSX. It is a mix of the TOS operating system that is in the ROM of
| classic Atari computers, combined with a Unix-like multitasking OS. I have
| several partitions on my harddisk with different filesystems. On one
| partition I have a ext2 system that is really case sensitive. On drive
| C:\ I need a FAT filesystem with the old fashion 8+3 case-insensitive DOS
| file names. On another drive I have VFAT: long filenames, with upper- and
| lowercase, but not really case-sensitive.

Hey; you seem to have the worst situation of all. ;-)

One of the lessons about software engineering that I remember being stressed in several classes was that such "policy" decisions don't properly belong in the lower levels of the OS; they belong up at the "application" or "library" level. The unix kernel's approach was often used as an example of the right way to do it: the kernel itself treats a file name as just a character string, and the only special characters are '/' and the terminating NUL char. The rest are "just chars" with no meaning. The kernel just implements the file-access mechanisms; "policy" decisions are the responsibility of the application level.

The advantage of this is that it's easy to implement a name-matching policy in a library file-open routine. Suppose you want caseless matching. First decide on your alphabet (7-bit ASCII that ignores the 8th bit; Latin-1; ISO-8859-7; whatever) so you know which characters are upper- and lower-case letters. Your open routine first calls the system open() routine. If that succeeds, fine. If not, you pass the name to your filenamematch() routine. It splits the name into a directory part and a filename part, does a readdir() on the directory, runs through the list of filenames, and applies whatever test you want to each one. When it finds a match, it returns the matched filename to the caller, which opens that file. I've done this on a number of projects, and it really is that easy. Sometimes you want to apply the matching to the directory portion too, but that's a simple recursive call.

The best example of why this is the right approach is the growing problem of "internationalization". We have any number of competing character sets these days, and what counts as an upper- or lower-case letter differs from one character set to another. Some alphabets don't have a case distinction at all. Some (such as German) even have letters that only come in one case.
Others (Hebrew, Arabic) don't have case but have letters with several forms, and you might want to treat the variants of a letter as equal. If your OS does this matching itself, then it *will* get it wrong for most of the possible alphabets, and there's nothing you can do to fix it. If the OS just says "a character is a chunk of bits without meaning", and the meaning lives up in the runtime libraries, then a problem is easy to fix: you just change the library that you're using.

Lest you think this is way off topic, I might mention that I've been involved in attempts to use non-ASCII character sets in my ABC tunes. I have a lot of "international folk dance" tunes, and it would be really nice to be able to spell the titles right. Also, I like to use single-tune files as my primary data (with little programs that combine them into pages of tunes), so it's really handy if the tune title can be used as the file name. I've done this on my linux box, and at least Latin-1 names work there. But when I rsync a directory over to my Mac PowerBook, it goes berserk on the files with non-ASCII letters in their names. This tells me that OSX "isn't ready for prime time" in the coming international world. If it can't even handle a simple 'ä' or 'ö' in a file name, how is it ever going to handle Chinese or Japanese file names? It can't even handle a Finnish or Arabic file name. You can't expect those people to use English file names. (Well, the Finns do all speak English these days, but still ... ;-)

Actually, my linux box can't handle Chinese file names yet, either. But there's a Chinese version of linux being developed in China as an official computer platform for government and industry. It will be able to do the job right, and I'll bet it will sell well outside of Asia. People making "world music" collections will want a system like that, and programmers will appreciate a system that doesn't force them to fit their names into an English character set.
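
The library-level fallback described earlier (try open() first, then scan the directory with a filenamematch() routine) might look roughly like this. It's only a sketch under assumptions: Python stands in for a C library routine, os.listdir() plays the part of readdir(), and the post doesn't give filenamematch()'s signature, so the one here is made up, as are the hypothetical helpers ascii_fold(), unicode_fold() and caseless_open(). The fold function is the "policy"; swapping it changes the matching rules without touching the kernel or the filesystem.

```python
import os
import unicodedata


def ascii_fold(name: str) -> str:
    """Policy 1 (assumed): fold only 7-bit ASCII A-Z to a-z; every other
    character, including Latin-1 letters such as 'Ä', compares exactly."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def unicode_fold(name: str) -> str:
    """Policy 2 (assumed): Unicode case folding on the NFD form, so that
    'ß' matches 'SS' and precomposed 'ä' matches 'a' + U+0308."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())


def filenamematch(path: str, fold=ascii_fold):
    """Hypothetical signature: return an existing path whose components
    match `path` under `fold`, or None if nothing matches."""
    directory, base = os.path.split(path)
    # If the directory part doesn't exist as given, resolve it recursively.
    if directory and not os.path.isdir(directory):
        directory = filenamematch(directory, fold)
        if directory is None:
            return None
    try:
        entries = os.listdir(directory or ".")  # stands in for readdir()
    except OSError:
        return None
    wanted = fold(base)
    for entry in sorted(entries):  # sorted: deterministic pick among ties
        if fold(entry) == wanted:
            return os.path.join(directory, entry)
    return None


def caseless_open(path: str, mode: str = "r", fold=ascii_fold):
    """Try the exact name first; fall back to policy-based matching."""
    try:
        return open(path, mode)
    except FileNotFoundError:
        match = filenamematch(path, fold)
        if match is None:
            raise  # re-raise the original FileNotFoundError
        return open(match, mode)
```

Because the fold is just a parameter, changing the alphabet means passing a different function, which is the "just change the library" point made above. unicode_fold() makes "Straße" match "STRASSE" and a precomposed "ä" match "a" plus a combining diaeresis; the latter may matter on a Mac, since HFS+ stores file names in a decomposed Unicode form. Neither fold does anything for Hebrew final forms or Arabic letter variants; those would need their own table. When several entries fold to the same key (possible on a case-sensitive filesystem), the sketch simply takes the first in sorted order.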
One of the Chinese developers' reasons for standardizing on linux was that it's an OS with no built-in rules about what's a valid file name, so there's very little in the kernel to undo. Really, all that's necessary is a safe way to handle multi-byte chars, so that a 16-bit char with the '/' byte value in one of its 8-bit halves isn't treated as a directory separator.

One question for our Scandinavian friends: Do any of you use Macs? Can you get filenames that contain the non-ASCII letters of your alphabet? If so, how do you make it work right? I've tried setting my charsets to 8859-1, UTF-8 and others, and none of them make the files in my .../Scand/ directory copy correctly from my linux box. Copying from linux to this FreeBSD system works fine, because both systems treat a character as unanalyzed bits. But when copying to OSX, those files end up with gibberish names.

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html