John Chambers wrote:
> Lest you think this is way off topic, I might mention that I've been
> involved in attempts to use non-ASCII char sets in my ABC tunes. I
> have a lot of "international folk dance" tunes, and it would be
> really nice to be able to spell the titles right. Also, I like to use
> single-tune files as my primary data (with little programs that
> combine them for pages of tunes). It's really handy if the tune title
> can be used in the file name. I've done this on my linux box, and at
> least Latin-1 names work there. But when I rsync a directory over to
> my Mac Powerbook, it goes berserk on the files with non-ASCII letters
> in the names.
>
> This tells me that OSX "isn't ready for prime time" in the coming
> international world. If it can't even handle a simple 'ä' or 'ö' in a
> file name, how is it ever going to handle Chinese or Japanese file
> names? It can't even handle a Finnish or Arabic file name. You can't
> expect those people to use English file names. (Well, the Finns do
> all speak English these days, but still ... ;-)

Snip...

> One question for our Scandinavian friends: Do any of you use Macs?
> Can you get filenames that contain the non-ASCII letters in your
> alphabet? If so, how do you make it work right? I've tried setting my
> charsets to 8859-1 and UTF-8 and others, and none of them seem to
> make the files in my .../Scand/ directory copy correctly from my
> linux box. Copying between linux to this FreeBSD system works fine,
> because those systems treat a character as unanalyzed bits. But when
> copying to OSX, those files end up with gibberish names.

Mac OS X has full support for Unicode, although not all of the BSD UNIX utilities which have been ported over support Unicode to its fullest extent, so there are oddities when you use the command line. That said, of all the systems I've ever programmed on, the Mac has the best international support -- internationalization has been a strong point for Macs since the early days.

Apple's HFS+ file system (the Mac OS X default file system) stores filenames in UTF-16 Unicode format. This means I can (and do) have files with names in just about any language, using just about any characters from anywhere in the Unicode code set (including mixing and matching entirely different language sets). What happens to those names when the files are transferred to a Linux system or a Windows system, who knows.

The problem you're seeing is not that the Mac doesn't support internationalization, it's that it doesn't have any way of telling what the encoding is for the filenames you're giving it.

Most filesystems out there (with a couple of exceptions, like HFS+ and NTFS, which store filenames in UTF-16) store filenames as some 8-bit string. To get international filenames, they use either different charsets or UTF-8. But there's *nothing* in the filesystem itself which says "this filename is encoded in format XXXX". That information is stored in the OS application layer as a *display* parameter.
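
As a small illustration of the point about 8-bit strings, here is a minimal Python sketch. It is not from the original discussion, and the file name is only a hypothetical example. It shows that the same title becomes two different byte strings under Latin-1 and UTF-8, and that a reader who guesses the wrong encoding gets either gibberish or an outright error.

```python
# Hypothetical tune file name, used only for illustration.
name = "Säkkijärven polkka.abc"

latin1_bytes = name.encode("latin-1")  # 'ä' is the single byte 0xE4
utf8_bytes = name.encode("utf-8")      # 'ä' is the two bytes 0xC3 0xA4

# A byte-oriented filesystem stores only these bytes; nothing in the name
# records which encoding produced them.
print(latin1_bytes)
print(utf8_bytes)

# Guessing wrong, case 1: UTF-8 bytes displayed as if they were Latin-1.
misread = utf8_bytes.decode("latin-1")
print(misread)

# Guessing wrong, case 2: Latin-1 bytes handed to something expecting UTF-8.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print("Latin-1 bytes are valid UTF-8:", valid_utf8)
```

The second case is roughly the situation a system that insists on Unicode filenames is in when it receives a Latin-1 name with no indication of its charset.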
So it all looks correct on that system, because the OS translates it into the right characters when they get displayed. But when you try to transfer it to another system, all that system knows is that the file is named some weird 8-bit string. This is why it gets all mangled in the translation. It's even worse when you send it via email, because you have to hope the email programs on both sides know how to deal with the encodings you are sending.

I suspect rsync is the culprit in your case -- I seriously doubt that's been made Unicode-aware.

Probably your safest bet for a valid transfer is to burn the files to a CD using ISO-9660 format. There *is* a standard for filenames stored like this, one that most systems ought to be able to read.

It also should be possible to write a simple Demangle application which would read in a filename (or a directory of filenames) and, given an encoding specified by the user, would translate it to Unicode and rename the file appropriately. Shouldn't be too complicated to write -- the standard OSX string routines have all kinds of support for translating strings between various encodings.

-->Steve Bennett

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html
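
The Demangle idea described above could be sketched roughly as follows. This is only an illustration in Python, not the OS X string routines the post refers to. The function names `demangle_name` and `demangle_directory`, the `dry_run` flag, the choice to write the new names as UTF-8 bytes, and the "already valid UTF-8, so skip it" heuristic are all assumptions made for the example.

```python
import os
import unicodedata


def demangle_name(raw: bytes, encoding: str) -> str:
    """Decode a raw filename using the encoding the user says it is in."""
    text = raw.decode(encoding)
    # Assumption: normalize to composed form (NFC) so results are predictable.
    return unicodedata.normalize("NFC", text)


def demangle_directory(directory: str, encoding: str, dry_run: bool = True):
    """Rename mangled entries in `directory`; return (old, new) byte pairs."""
    dir_bytes = os.fsencode(directory)
    changes = []
    # Listing with a bytes path returns the raw, undecoded names.
    for raw in sorted(os.listdir(dir_bytes)):
        if raw.isascii():
            continue  # plain ASCII names need no translation
        try:
            raw.decode("utf-8")
            continue  # heuristic (assumption): already UTF-8, leave it alone
        except UnicodeDecodeError:
            pass
        try:
            new_raw = demangle_name(raw, encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid in the specified encoding; skip it
        changes.append((raw, new_raw))
        if not dry_run:
            os.rename(os.path.join(dir_bytes, raw),
                      os.path.join(dir_bytes, new_raw))
    return changes
```

With `dry_run=True` (the default here) the function only reports what it would rename, which seems a sensible first step, since a wrong guess at the encoding produces new gibberish rather than an error for charsets like Latin-1 where every byte is valid.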