By the way, to everyone in the thread about inputting text in other
languages: I was pointing out something lost relative to ASCII--you can't
type every filename, because some will contain characters you have no way
to enter.  This was a minor point, since (as I've said) it can't really be
fixed.

(Well, it could be fixed, but not cleanly.)

OTOH, the nonprinting character problem is important.  Would it be
reasonable to escape (\u) characters with wcwidth(c)==0 (in tool output,
i.e. ls -b), or is there some reasonable use of them in filenames?

Combining characters at the beginning of a filename probably shouldn't be
output literally, either.
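
To make the two suggestions above concrete, here's a rough Python sketch.
It's only an illustration: escape_name() and its keep_attached_marks
parameter are hypothetical names, and the Unicode general categories
(Cf, Mn, Me) are used as a stand-in for wcwidth(c)==0, which is an
assumption and won't match a real wcwidth() in every case.

```python
import unicodedata

def escape(ch):
    """Return a \\uXXXX (or \\UXXXXXXXX) escape for one character."""
    cp = ord(ch)
    return "\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp

def escape_name(name, keep_attached_marks=True):
    """Escape characters that would be invisible or misleading in a listing.

    Assumption: categories Cf/Mn/Me approximate wcwidth(c) == 0.
    Control characters (Cc) are escaped as well.
    """
    out = []
    have_base = False  # has a literal base character been emitted yet?
    for ch in name:
        cat = unicodedata.category(ch)
        if cat in ("Cc", "Cf"):
            # Control and format characters are never shown literally.
            out.append(escape(ch))
            have_base = False
        elif cat in ("Mn", "Me"):
            # A combining mark is shown only if it has a base to attach to.
            if keep_attached_marks and have_base:
                out.append(ch)
            else:
                out.append(escape(ch))
        else:
            out.append(ch)
            have_base = True
    return "".join(out)
```

With keep_attached_marks=True, "e" followed by U+0301 is left alone, while
a leading U+0301 or an embedded U+200B comes out as \u0301 or \u200b.
Passing False escapes every combining mark, which is the literal
"wcwidth(c)==0" rule and is probably too aggressive for decomposed names.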

On Thu, Feb 21, 2002 at 03:33:40PM +0000, Markus Kuhn wrote:
> > One thing that's bound to be lost in the transition to UTF-8 filenames:
> > the ability to reference any file on the filesystem with a pure CLI.
> 
> I can generate plenty of file names with ISO 8859-1 that you will have
> troubles typing in. Try a file name that starts with CR or NBSP just to
> warm up. Nothing new with UTF-8 here. Keep it simple.

02:01pm [EMAIL PROTECTED]/5 [~/testing] touch "
dquote> hello"
02:01pm [EMAIL PROTECTED]/5 [~/testing] ls
\nhello

ls escapes the control character.  Outside escape mode it prints a
question mark instead; either way, the character is never output
literally.  It doesn't do this for Unicode nonprinting characters.
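
Here's a small Python model of the behavior just described, to show
where the gap is.  It is not GNU ls's actual code; ls_like() is a
made-up name, and the escape table is my assumption about what escape
mode prints.

```python
# Assumed C-style escapes for a few common control characters.
SIMPLE_ESCAPES = {"\n": "\\n", "\t": "\\t", "\r": "\\r"}

def ls_like(name, escape_mode=True):
    """Model of the described behavior: only ASCII controls are handled."""
    out = []
    for ch in name:
        cp = ord(ch)
        if cp < 0x20 or cp == 0x7F:
            if escape_mode:
                # Fall back to an octal escape for controls not in the table.
                out.append(SIMPLE_ESCAPES.get(ch, "\\%03o" % cp))
            else:
                out.append("?")
        else:
            # Everything else, including U+200B, passes through untouched.
            out.append(ch)
    return "".join(out)
```

A name starting with a newline comes back as \nhello (or ?hello outside
escape mode), but a name containing U+200B ZERO WIDTH SPACE is returned
unchanged, so the invisible character reaches the terminal.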

(NBSP isn't a problem here, since it can be copy-and-pasted.)

> Just like with the file £¤¥¦§¨©ª« I guess. Has that been a problem
> in practice so far?

That can still be copy-and-pasted; the control character examples cannot.
Names with heavily stacked combining characters probably couldn't, either.

> We agreed already ages ago here that Normalization Form C should be
> considered to be recommended practice under Linux and on the Web. But

Then we're in agreement.
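
For what it's worth, following that recommendation in an application
(as opposed to the filesystem) is cheap.  A minimal Python sketch, using
the standard unicodedata module; to_nfc() and is_nfc() are just
illustrative helper names:

```python
import unicodedata

def to_nfc(name):
    """Return the name in Normalization Form C."""
    return unicodedata.normalize("NFC", name)

def is_nfc(name):
    """True if the name is already in NFC."""
    return to_nfc(name) == name
```

A tool that creates files could call to_nfc() on names it generates
itself; is_nfc() lets it warn about existing names without touching them.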

> nothing should prevent you in the future from using arbitrary opaque
> byte strings as POSIX file names. In particular, POSIX forbids that the
> file system applies any sort of normalization automatically. All the URL
> security issues that IIS on NTFS had demonstrate what a wise decision
> that was.

> Please do not even think about automatically normalizing file names
> anywhere. There is absolutely no need for introducing such nonsense, and
> deviating from the POSIX requirement that filenames be opaque byte
> strings is a Bad Idea[TM] (also known as NTFS).

Nobody's disagreeing on any of this.
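
To spell out what "opaque byte strings" means in practice, here's a
short Python illustration.  The filename is made up, and same_name() is
a hypothetical helper standing in for the byte-wise comparison a POSIX
filesystem does.

```python
import unicodedata

composed = "caf\u00e9"                                  # precomposed e-acute (NFC)
decomposed = unicodedata.normalize("NFD", composed)     # "e" + U+0301

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

def same_name(a: bytes, b: bytes) -> bool:
    """Byte-wise comparison: no normalization, no case folding."""
    return a == b
```

The two strings render identically but encode to different bytes, so
same_name(composed_bytes, decomposed_bytes) is False and they would name
two different files.  Any normalization has to happen above the
filesystem, if at all.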

> No, it won't. Unicode normalization will not eliminate homoglyphs and
> can't possibly. You try to apply the wrong tool to the wrong problem.
> Again nothing new here. We have lived happily for over a decade with the
> homoglyphs SP and NBSP in ISO 8859-1 in POSIX file systems. Security
> problems have arisen in file systems that attempted to do case
> invariant matching and other forms of normalization and now we know that
> that was a bad idea (see the web attack log I posted here 2002-02-14
> as one example).

(this has been said already)
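
For anyone who wants to check the homoglyph point themselves, a quick
Python sketch follows.  still_distinct() is a hypothetical helper, and
the Latin/Cyrillic pair is my own example, not one from the thread.

```python
import unicodedata

SP, NBSP = " ", "\u00a0"              # the ISO 8859-1 pair mentioned above
LATIN_A, CYRILLIC_A = "a", "\u0430"   # look-alikes from different scripts

def still_distinct(a, b, form="NFC"):
    """True if a and b remain different after normalizing both."""
    return unicodedata.normalize(form, a) != unicodedata.normalize(form, b)
```

Under NFC both pairs stay distinct.  NFKC does fold NBSP into SP, but
the Latin and Cyrillic letters stay distinct under every normalization
form, so normalization can't be relied on to remove look-alikes.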

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
