> prohibits names that are not valid UTF-8

What is an example of a name that is not valid UTF-8? Names that include Tangut or Klingon characters? Or do you mean "broken" UTF-8: bytes (NOT characters) that do not form valid UTF-8, such as the first byte of a 2-byte sequence without the second? Should that be permitted?
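To make the distinction in the question concrete: a name made of exotic characters is still well-formed UTF-8. Tangut is an encoded script, and Klingon, which has no official Unicode encoding, is usually placed in the Private Use Area by unofficial schemes. What is ill-formed is a byte string such as a lead byte with no continuation byte. The sketch below assumes that a strict UTF-8 decoder is a reasonable stand-in for whatever check the Finder actually performs. The helper name `is_valid_utf8_name` is hypothetical.

```python
def is_valid_utf8_name(raw: bytes) -> bool:
    """Hypothetical check: True if the raw file-name bytes are well-formed UTF-8."""
    try:
        raw.decode("utf-8")  # strict mode raises on any ill-formed sequence
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "Cyrillic": "Да".encode("utf-8"),
    "Tangut U+17000": "\U00017000".encode("utf-8"),  # 4-byte sequence, still valid
    "Private Use U+F8D0": "\uf8d0".encode("utf-8"),  # where unofficial Klingon lives
    "truncated 2-byte": b"a\xc3",  # lead byte of a 2-byte sequence, second byte missing
    "stray continuation": b"\x80abc",  # continuation byte with no lead byte
    "overlong '/'": b"\xc0\xaf",  # '/' forced into 2 bytes; forbidden in UTF-8
    "Latin-1 y-diaeresis": b"\xff",  # ISO 8859-1 ÿ; byte 0xFF never occurs in UTF-8
}

for label, raw in samples.items():
    print(f"{label:20} {raw!r:28} valid={is_valid_utf8_name(raw)}")
```

Only the last four samples fail. Any code point the encoder can produce, including private-use ones, yields a valid byte string, so "not valid UTF-8" can only mean damaged or foreign-encoded bytes.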
Whether filenames should be case-insensitive is a matter of taste. I happen to like the Windows scheme of preserving case but performing case-insensitive searches of file names, but that's just me. I code in C and am perfectly happy that Foo is different from foo.

Yes, a system should be consistent across English and Cyrillic, and yes, the "equivalency" (or not!) of año/ano is a completely different issue from that of año/Año.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Paul Gilmartin
Sent: Friday, October 04, 2013 4:30 PM
To: [email protected]
Subject: Re: OT? A cause to join, but somewhat humorous

On Fri, 4 Oct 2013 17:19:44 -0400, Tony Harminc wrote:
>
>To say nothing of y+diaeresis U+00FF, which carries the strange baggage
>of having its lower case version ÿ in ISO 8859-1 (and CP 037, 1047, and
>so on), but finding its upper case version Ÿ languishing in the higher
>reaches of the Unicode BMP at U+0178.
>
Many operating systems nowadays welcome files named in the UTF-8 character set (notable exceptions are z/OS and z/VM). OS X will let me name files in the Finder GUI in Greek, Hebrew, Cyrillic, ... But the GUI complains and prohibits names that are not valid UTF-8. (I can sneak around and assign such names in Terminal line commands.)

But this raises a question for the case-insensitive partisans (Windows bigots, IOW): should the OS or filesystem treat files named "ÿ" and "Ÿ" as equivalent, allowing either to be referred to by the other name and prohibiting the occurrence of both in a single directory? It's unsatisfactory to suggest that it should depend on one's locale settings; it's parochial to suggest that the Roman alphabet should be case-insensitive but the Cyrillic case-sensitive.

Similar concerns apply to diacritics; they can drastically alter semantics. The Spanish word for "year" is "año". It's important not to neglect the tilde; you get a very different word.
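The case-preserving, case-insensitive scheme and the ÿ/Ÿ question can be made concrete with a small sketch. It assumes one particular policy, chosen only for illustration: names are normalized to NFC and then compared under Unicode full case folding (Python's `str.casefold`), with diacritics left significant. The `fold` helper and `CasePreservingDir` class are hypothetical. This is not the comparison NTFS, HFS+, or any other real filesystem performs.

```python
import unicodedata
from typing import Dict, Optional


def fold(name: str) -> str:
    # Hypothetical comparison key: compose to NFC so "an\u0303o" and "año" match,
    # then apply Unicode full case folding. Diacritics are NOT stripped.
    return unicodedata.normalize("NFC", name).casefold()


class CasePreservingDir:
    """Toy directory: keeps names as typed, looks them up case-insensitively."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}  # folded key -> name as created

    def create(self, name: str) -> None:
        key = fold(name)
        if key in self._entries:
            # Refuse a second name that folds to an existing one.
            raise FileExistsError(f"{name!r} collides with {self._entries[key]!r}")
        self._entries[key] = name  # original spelling preserved for display

    def lookup(self, name: str) -> Optional[str]:
        # Any case variant finds the entry; the stored spelling is returned.
        return self._entries.get(fold(name))


d = CasePreservingDir()
d.create("\u0178")  # Ÿ, U+0178
print(d.lookup("\u00ff"))  # ÿ, U+00FF, finds the same entry
d.create("año")
print(d.lookup("AÑO"), d.lookup("ano"))  # case variant found; "ano" is a different name
try:
    d.create("\u00ff")  # ÿ would collide with the existing Ÿ
except FileExistsError as err:
    print("refused:", err)
```

Under this policy "Ÿ" and "ÿ" collide regardless of locale, because U+0178 folds to U+00FF, while "año" and "ano" stay distinct. Locale-independent folding has its own cost: Turkish expects "I" to pair with dotless "ı", but `casefold` maps "I" to "i".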
(Are there similar examples in other languages?)

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
