On Thu, Feb 21, 2002 at 11:59:14AM +0100, Pablo Saratxaga wrote:
> It isn't that much of a problem.

I don't think it's a completely trivial loss, compared to an ASCII environment
where filenames were completely unambiguous (invalid characters being
escaped).  There doesn't seem to be any obvious fix, so I suppose it's
just a price we pay.

> The same thing could happen here; well, not as bad, as I don't think any
> program will purposely *change* the chars composing a filename previously
> selected (e.g. when doing "open" then "save" there wouldn't be any name
> change); but when a user types a filename manually, it could happen

If a program normalizes names internally, it might change them on save,
but that's probably asking for trouble anyway.

> that the system will tell him "no such filename" and he will be puzzled,
> as he can see it is there; there is no visual difference between a
> precomposed character like "aacute" and the two characters "a" and
> "combining acute accent".

Should control characters ever end up in filenames?  I'd be surprised if
many terminal emulators handled copy and paste with control characters
well, if at all.  (Control characters aren't drawn, so I'd expect most
terminals that don't act on them to simply discard them.)

06:29am [EMAIL PROTECTED]/2 [~/testing] perl -e '`touch "\xEF\xBB\xBF"`;'
06:29am [EMAIL PROTECTED]/2 [~/testing] ls

06:29am [EMAIL PROTECTED]/2 [~/testing] ls -l
total 0
-rw-r--r--    1 glenn    users           0 Feb 21 06:29

(rm)

06:31am [EMAIL PROTECTED]/2 [~/testing] perl -e '`touch "\xEF\xBB\xBFfile"`;'
06:31am [EMAIL PROTECTED]/2 [~/testing] ls
file
06:31am [EMAIL PROTECTED]/2 [~/testing] cat file
cat: file: No such file or directory

I can't copy and paste it.  Wildcards wouldn't help much if I'd stuck BOMs
between letters (and *f*i*l*e* isn't very obvious, especially if you
don't know what's going on, or if one's not really the letter it looks
like), and tab completion may or may not help, depending on the shell.
(Someone mentioned moving everything out of the directory and rm -f'ing;
I should never have to do that.)
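For the record, the BOM-prefixed file can be removed without emptying the
directory.  A minimal sketch, assuming a POSIX shell plus a find that
supports -inum (GNU and BSD find both do); the scratch directory is a
stand-in for ~/testing:

```sh
#!/bin/sh
# Scratch directory standing in for ~/testing in the transcript above.
cd "$(mktemp -d)" || exit 1

# U+FEFF (BOM) is EF BB BF in UTF-8, i.e. octal 357 273 277.
bom=$(printf '\357\273\277')
touch "${bom}file"

# Typing the name as it looks fails: the real name has three extra bytes.
cat file 2>/dev/null || echo "no file literally named 'file'"

# Option 1: spell out the exact bytes instead of copying and pasting them.
rm -- "${bom}file"

# Option 2: address the file by inode number instead of by name.
touch "${bom}file"
inode=$(ls -i | awk '/file$/ { print $1 }')
find . -inum "$inode" -exec rm -- {} \;
```

Both only work once you already know which bytes are hiding in the name,
which is rather the point.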

Are control characters (and all non-printing characters) useful in filenames
at all?  If not, they should be escaped, too, to avoid this kind of problem.

(Another one, perhaps: a character with a ton of combining characters on
top of it.  Most terminal emulators won't deal with an arbitrary number
of them.)
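To put a number on that: the sketch below builds a name from "e" followed
by N copies of U+0301 COMBINING ACUTE ACCENT.  It assumes a POSIX shell;
stacked_name is a hypothetical helper.  Ideally the result renders as one
(ever taller) glyph, but its length grows by two bytes per mark:

```sh
#!/bin/sh
# Print "e" followed by $1 copies of U+0301 COMBINING ACUTE ACCENT
# (UTF-8 bytes CC 81, octal 314 201), with no trailing newline.
stacked_name() {
    n=$1
    name=e
    i=0
    while [ "$i" -lt "$n" ]; do
        name="$name$(printf '\314\201')"
        i=$((i + 1))
    done
    printf '%s' "$name"
}

# One visible character, 1 + 2*20 = 41 bytes.
stacked_name 20 | wc -c
```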

> This reminds me of a discussion about Pango and the ability to have different
> view and edit modes: normal (with text shown as expected), and another
> mode where composing chars are decomposed and invisible control characters
> (such as ZWJ, etc.) are made visible.

"Reveal codes" for filenames? :)
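A crude version is possible with standard tools today.  A minimal sketch,
assuming a POSIX shell with od and tr; hexname and list_suspicious are
made-up names for illustration.  It works in the C locale, so it flags
*every* non-ASCII byte, including legitimate UTF-8 names; treat its output
as a list to inspect, not to delete:

```sh
#!/bin/sh
# In the C locale every byte outside printable ASCII counts as
# non-printing, so BOMs, control codes and combining marks all show up
# (and so does "résumé").
LC_ALL=C
export LC_ALL

# Print a name as a run of hex bytes, e.g. "efbbbf66696c65".
hexname() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# Show, in hex, every entry in the current directory (dotfiles included)
# whose name contains a non-printing byte.
list_suspicious() {
    for name in * .[!.]* ..?*; do
        # Skip glob patterns that matched nothing.
        [ -e "$name" ] || [ -L "$name" ] || continue
        case $name in
            *[![:print:]]*) hexname "$name" ;;
        esac
    done
}

# Demo in a scratch directory: one plain name, one with a leading BOM.
cd "$(mktemp -d)" || exit 1
touch plain "$(printf '\357\273\277file')"
list_suspicious
```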

> > I don't know who would actually normalize filenames, though--a shell
> > can't just normalize all args (not all args are filenames) and doing it
> > in all tools would be unreliable.
> 
> The normalization should be done at the input method layer; that way it will
> be transparent and, hopefully, if every OS does the same, the potential problem
> of duplicates will never arise.

See my other response: characters are often entered by means other than a
nice, modular input method; for this to work, terminal emulators will need
to behave the same way as IMs, and so will GUIs at some layer.
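The duplicate problem is easy to reproduce.  A minimal sketch, assuming a
POSIX shell and a filesystem that stores names as opaque bytes (as ext2
and ext3 do on Linux; HFS+ on Mac OS X normalizes names to a decomposed
form, so the result differs there):

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Both render as "á", but the kernel compares bytes, so they are two names.
nfc=$(printf '\303\241')      # U+00E1, precomposed: C3 A1
nfd=$(printf 'a\314\201')     # "a" + U+0301 combining acute: 61 CC 81

touch "$nfc" "$nfd"
ls | wc -l                    # 2: two entries that look identical

# Removing the name typed in one form leaves the other form untouched.
rm -- "$nfc"
[ -e "$nfd" ] && echo "decomposed twin still exists"
```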

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
