On Wed, Aug 19, 2015 at 01:43:51AM +0200, Ángel González wrote:

> And of course, there's the question of what to do if the filename we
> are trying to convert to utf-16 is not in fact valid utf-8.

My current understanding:
(i) there is a current patch that fixes most problems on Unix
and can be applied today;
(ii) one also wants to fix Windows problems, and in the process
do something more general for Unix.

We can discuss a future patch that does something like:

Look at the remote filename.
Assign a character set as follows:
- if the user specified a from-charset, use that
- if the name is printable ASCII (in 0x20-0x7e), take ASCII
- if the name is non-ASCII and valid UTF-8, take UTF-8
- otherwise take Unknown.

Determine a local character set as follows:
- if the user specified a to-charset, use that
- if the locale uses UTF-8, use that
- otherwise take ASCII

Convert the name from from-charset to to-charset:
- if the user asked for unmodified filenames, do nothing
- if the name is ASCII, do nothing
- if the name is UTF-8 and the locale uses UTF-8, do nothing
- convert from Unknown by hex-escaping the entire name
- convert to ASCII by hex-escaping the entire name
- otherwise invoke iconv(); upon failure, escape the illegal bytes

See whether the resulting name can be used.
On Unix all strings (without NUL and '/') are ok.
On Windows there are many restrictions, so on Windows
additionally hex-escape the problematic characters.

Since conversions to 8-bit character sets will often fail,
it is desirable to convince Windows to use Unicode as the
current codeset. Maybe that requires a copy of the common
fileio routines.

That is my view of the result of the present conversation.
Probably some refinements will be needed.
Moreover, there is interference with the iri stuff that should
be looked at.

Once we know what we want it is trivial to write the code,
but it may take a while to figure out what we want.
I think we should start by applying the current patch.

Andries
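
To make the steps listed above concrete, here is a minimal Python sketch of the proposal. It is not wget code and not a patch. All function names are hypothetical, the %XX escape format is an assumption, "hex-escaping the entire name" is read here as escaping every byte outside printable ASCII, Python's codec machinery stands in for iconv(), and the Windows step covers only the well-known reserved characters (not reserved device names or trailing dots and spaces).

```python
import codecs

PRINTABLE_ASCII = range(0x20, 0x7F)  # 0x20..0x7e inclusive
WINDOWS_RESERVED = set('<>:"/\\|?*')


def hex_escape(data: bytes) -> str:
    """Keep printable ASCII, write every other byte as %XX (assumed format)."""
    return "".join(chr(b) if b in PRINTABLE_ASCII else "%%%02X" % b for b in data)


def _escape_illegal(exc):
    """Codec error handler: hex-escape only the offending bytes or characters."""
    bad = exc.object[exc.start:exc.end]
    if isinstance(exc, UnicodeDecodeError):
        raw = bad                   # undecodable input bytes
    elif isinstance(exc, UnicodeEncodeError):
        raw = bad.encode("utf-8")   # assumption: escape the UTF-8 form
    else:
        raise exc
    return "".join("%%%02X" % b for b in raw), exc.end


codecs.register_error("hexescape", _escape_illegal)


def guess_remote_charset(name: bytes, from_charset=None) -> str:
    """Step 1: assign a character set to the remote filename."""
    if from_charset:
        return from_charset
    if all(b in PRINTABLE_ASCII for b in name):
        return "ASCII"
    try:
        name.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return "Unknown"


def pick_local_charset(to_charset=None, locale_charset="ASCII") -> str:
    """Step 2: determine the local character set.

    locale_charset is passed in by the caller, e.g. taken from
    locale.getpreferredencoding(False).
    """
    if to_charset:
        return to_charset
    if locale_charset.upper().replace("-", "") == "UTF8":
        return "UTF-8"
    return "ASCII"


def convert_name(name: bytes, from_charset: str, to_charset: str,
                 keep_unmodified: bool = False) -> bytes:
    """Step 3: convert the name from from_charset to to_charset."""
    if keep_unmodified or from_charset == "ASCII":
        return name
    if from_charset == "UTF-8" and to_charset == "UTF-8":
        return name
    if from_charset == "Unknown" or to_charset == "ASCII":
        return hex_escape(name).encode("ascii")
    try:
        # Stand-in for iconv(); illegal bytes are escaped by the handler.
        text = name.decode(from_charset, errors="hexescape")
        return text.encode(to_charset, errors="hexescape")
    except LookupError:
        # Unknown charset name: fall back to escaping the whole name.
        return hex_escape(name).encode("ascii")


def escape_for_windows(name: str) -> str:
    """Step 4 (Windows only): escape reserved and control characters."""
    return "".join(
        "%%%02X" % ord(c) if c in WINDOWS_RESERVED or ord(c) < 0x20 else c
        for c in name
    )
```

For example, under these assumptions a Latin-1 name of unknown origin such as b"\xe9.txt" is classified as Unknown and comes out as "%E9.txt", while the same bytes with an explicit from-charset of ISO-8859-1 and a UTF-8 locale are converted to the UTF-8 encoding of "é.txt".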
