2006/1/27, Isak Savo <[EMAIL PROTECTED]>:
> 2006/1/27, Axel Liljencrantz <[EMAIL PROTECTED]>:
> > 2006/1/26, Isak Savo <[EMAIL PROTECTED]>:
> > > Something like this:
> > > $ ls<TAB>
> > > ls (List contents of directory)
> > > lsfoo (<<No description found>>)
> > > lspci (List all PCI devices)
> > > ...
> > >
> > > Perhaps the "No description found" part could be error colored or
> > > something. It could also be completely omitted, leaving just an empty
> > > parenthesis.
> >
> > In this particular case, that can be done. I was thinking about the
> > general case. For example, on my machine the 'locale -a' command seems
> > to output locale names in the native charset of the locale, which
> > will sometimes result in invalid strings. What should a command like:
> >
> > for i in (locales -a)
> > ...
> > end
> >
> > do?
> >
> > Should it skip the broken strings? Try to guess what they are?
> > Skip the broken characters? Maybe the whole command should fail?
>
> Why would you want to skip them? Imagine the following:
>
> for i in (locales -a)
> process_string($i);
> end
>
> process_string() might handle, or even depend on, $i being in weird
> charsets. I'm not familiar with the fish internals, but the logical
> thing would be to not care unless the string is being printed to the
> user.
>
> Isak
>
> PS. I have no idea how other shells handle this. I'm basically arguing
> theoretical points here :-)
After occasionally thinking about this problem for something like two years, I today reached enlightenment.

I have come up with a way to support arbitrary byte sequences, completely independent of the specified character set, in an application that internally uses wide character strings. My approach uses a Unicode private use area to give each illegal byte value a unique wide-string representation, and makes sure that conversions to and from wide strings respect this mapping as well. This means that the 'encoding muck' can be handled exclusively by the character set conversion functions. It also means that all regular wide character functions, including length calculations, keep working.

Lingering problems:

* All unknown/illegal characters are assumed to be one byte long. This means that wildcard matches using '?' may give the wrong result when the illegal characters used a multibyte encoding. This is pretty hard to avoid, and it should be extremely rare.

* The terminal will still have to somehow display broken characters. I'm thinking that maybe the completion code should use a special backslash escape for broken characters. Perhaps \Xxx, where xx is the hexadecimal value of the illegal byte. (Note the uppercase X, which specifies a raw byte, in contrast to \xxx, i.e. a lowercase x, which specifies a character that will be encoded in the locale's character set, possibly making it more than one byte long.)

There is a patch in the Darcs repo implementing the above behaviour. Everything seems to work pretty nicely. As near as I can tell, this solution removes all drawbacks associated with wide characters except the increased memory usage.

In a UTF-8 locale, one can try the new functionality by typing something like:

mkdir foo
touch foo/\Xaa
touch foo/\Xbb
cat foo/*
echo foo/<TAB>

Everything should work as expected, even though the two files do not have filenames that are valid UTF-8.

--
Axel
