> prohibits names that are not valid UTF-8

What is an example of a name that is not valid UTF-8? Names that include Tangut 
or Klingon characters? Or do you mean "broken" UTF-8: byte sequences (NOT 
characters) that are not valid UTF-8, such as the first byte of a 2-byte 
sequence without the second? Should that be permitted?
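
To make the distinction concrete, here is a minimal Python sketch (the helper name is_valid_utf8 is mine, not any OS API). It assumes "valid" means "decodes under a strict UTF-8 decoder". Tangut has assigned code points, and Klingon, as far as I know, lives only in the private-use area (the U+F8D0 below is an unofficial ConScript assignment, so treat it as an assumption); both encode to perfectly valid UTF-8. Only the malformed byte sequences fail.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes under a strict UTF-8 decoder."""
    try:
        raw.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Unusual characters, but well-formed UTF-8.
tangut = "\U00017000".encode("utf-8")    # Tangut ideograph, 4 bytes
private_use = "\uf8d0".encode("utf-8")   # private-use code point, 3 bytes

# Malformed byte sequences: these are bytes, not characters.
truncated = b"a\xc3"                     # lead byte of a 2-byte sequence, no second byte
stray_continuation = b"\x80"             # continuation byte with no lead byte
latin1_y = "\u00ff".encode("latin-1")    # y-diaeresis as ISO 8859-1: the single byte 0xFF

for label, raw in [
    ("tangut", tangut),
    ("private_use", private_use),
    ("truncated", truncated),
    ("stray_continuation", stray_continuation),
    ("latin1_y", latin1_y),
]:
    print(f"{label:20} {raw!r:24} valid={is_valid_utf8(raw)}")
```

The last case is probably the one that bites in practice: a name typed in an ISO 8859-1 terminal session is a legal string of bytes to the filesystem but is not UTF-8 at all.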

Whether filenames should be case-insensitive is a matter of taste. I happen to like the 
Windows scheme of preserving case but performing case-insensitive searches of 
file names -- but that's just me. I code in C and am perfectly happy that Foo 
is different from foo. Yes, a system should be consistent across English and 
Cyrillic, and yes, the "equivalency" (or not!) of año/ano is a completely 
different issue than año/Año.
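
As a rough illustration of why those are two separate questions, here is a Python sketch using only the standard unicodedata module. The function names are mine, and this is not a claim about how Windows or OS X actually compare file names; it assumes "case-insensitive" means Unicode case folding after NFC normalization. Case folding treats Roman, Cyrillic and ÿ/Ÿ alike with no locale involved, and it leaves the tilde alone. Ignoring diacritics takes a separate, deliberate step.

```python
import unicodedata


def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two names after NFC normalization and Unicode case folding."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)


def strip_marks(s: str) -> str:
    """Decompose to NFD and drop combining marks (a separate, lossy operation)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(same_ignoring_case("Foo", "foo"))   # True
print(same_ignoring_case("Ж", "ж"))       # True: Cyrillic folds like Roman
print(same_ignoring_case("ÿ", "Ÿ"))       # True: U+00FF and U+0178
print(same_ignoring_case("año", "Año"))   # True: differs only in case
print(same_ignoring_case("año", "ano"))   # False: the tilde survives folding
print(strip_marks("año") == "ano")        # True, but only if you ask for it
```

A filesystem could adopt the first function as its lookup rule without ever touching the second.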

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Paul Gilmartin
Sent: Friday, October 04, 2013 4:30 PM
To: [email protected]
Subject: Re: OT? A cause to join, but somewhat humorous

On Fri, 4 Oct 2013 17:19:44 -0400, Tony Harminc wrote:
>
>To say nothing of y+diaeresis U+00FF, which carries the strange baggage 
>of having its lower case version ÿ in ISO 8859-1 (and CP 037, 1047, and 
>so on), but finding its upper case version Ÿ languishing in the higher 
>reaches of the Unicode BMP at U+0178.
> 
Many operating systems nowadays welcome files named in the
UTF-8 character set (notable exceptions are z/OS and z/VM).
OS X will let me name files in the Finder GUI in Greek, Hebrew, Cyrillic, ...  
But the GUI complains and prohibits names that are not valid UTF-8.  (I can 
sneak around and assign such names in Terminal line commands.)

But this raises a question for the case-insensitive partisans (Windows bigots, 
IOW):  Should the OS or filesystem treat files named "ÿ" and "Ÿ" as equivalent; 
allow either to be referred to by the other name, and prohibit the occurrence 
of both in a single directory?  It's unsatisfactory to suggest that it should 
depend on one's locale settings; it's parochial to suggest that the Roman 
alphabet should be case-insensitive but the Cyrillic case-sensitive.

Similar concerns apply to diacritics; they can drastically alter semantics.  
The Spanish word for "year" is "año".  It's important not to neglect the tilde; 
you get a very different word.  (Are there similar examples in other languages?)

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email to 
[email protected] with the message: INFO IBM-MAIN
