Kornel Benko wrote:
> Setting 'wrong' lang environment causes lyx to use different encoding for
> filenames.
>
> Setting
> export LANG="en_IE@euro"
>
> Now, reading the file "Testoübernahme.lyx" which needs conversion leads to
> this log snippet:
>
> support/TempFile.cpp (35): Temporary file in
> /home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatXXXXXX.lyx
> support/TempFile.cpp (38): Temporary file
> `/home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatAS8040.lyx'
> created.
> Buffer.cpp (1297): Running 'python -tt
> "/usr/local/share/lyx2.3/lyx2lyx/lyx2lyx" -t 509 -o
> "/home/kornel/.lyx2/tmp/lyx_tmpdir.dkXWbiwl8040/Buffer_convertLyXFormatAS8040.lyx"
> "/usr2/kornel/lyx/privat/Briefe-Edgar/Testoübernahme.lyx"'
> usage: lyx2lyx [options] [file]
> lyx2lyx: error: argument input: invalid cmd_arg value:
> '/usr2/kornel/lyx/privat/Briefe-Edgar/Testo\xc3\xbcbernahme.lyx'
>
> Everything is OK, if using e.g. LANG="en_IE.utf8".
>
> From my POV, encoding of file-names should not depend on locales.

TL;DR: The current behaviour is probably correct, or QFile::encodeName() has a bug. Unfortunately this is complicated, but I'll try to explain.

First let's have a look at how file names are stored in the file system. This depends of course on the file system type. Both NTFS on Windows and HFS+ on OS X store file names encoded in utf-16 (see https://en.wikipedia.org/wiki/NTFS and https://en.wikipedia.org/wiki/HFS_Plus). This is simple and reliable: any program or operating system that deals with the file system directly (e.g. when mounting it on a different machine) knows how to interpret file names and can present them to the user in the correct way.

For other file systems such as FAT or the typical Linux file systems (e.g. ext3) the situation is a mess. ext3 and relatives do not specify in which encoding a file name is stored. They only know bytes (see e.g. http://unix.stackexchange.com/questions/39175/understanding-unix-file-name-encoding). The interpretation of the bytes is left to user space, and this is where the locale comes into play: If the locale is set to en_IE@euro and you create a file, the encoding of the file name will be iso_8859-15. If you do the same while the locale is set to en_IE.utf8, the encoding of the file name will be utf8. This used to cause big trouble in the transition period from fixed width 8bit locales to utf8, when people had file names with non-ascii letters, used the old hard disk on a machine with a newer Linux, and suddenly all file names looked broken. Therefore utilities like convmv were invented, and when mounting FAT file systems on Linux the codepage= and iocharset= options can be used.

What happens in your case is the following: LyX does _not_ use the iso_8859-15 encoding when calling lyx2lyx. This can be seen from the error message: if it used iso_8859-15, then the ü would not be encoded in two bytes.

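To illustrate the byte counts, here is a minimal Python sketch (not taken from LyX or lyx2lyx) that encodes the ü from the file name in both encodings. The variable names are only for illustration.

```python
# The letter from "Testoübernahme.lyx", written as an escape so that the
# example does not depend on the encoding of this source file.
letter = "\u00fc"  # ü

utf8_bytes = letter.encode("utf-8")         # what shows up in the error message
latin9_bytes = letter.encode("iso8859-15")  # what en_IE@euro would imply

print(utf8_bytes)    # b'\xc3\xbc' (two bytes)
print(latin9_bytes)  # b'\xfc' (one byte)
```

The \xc3\xbc in the error message matches the utf8 result, not the iso_8859-15 one.
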
That LyX does not use iso_8859-15 here might be a bug in QFile::encodeName(), which is used internally, but I rather suspect that you still have some LC_* variables set to use a utf8 encoding. Unfortunately the Qt documentation is rather unspecific about how exactly the "local 8-bit encoding determined by the user's locale" (which is used by QFile::encodeName()) is determined; one would have to read the sources.

Assuming that LyX really passed the file name encoded in iso_8859-15 to lyx2lyx, the commandline argument decoding in lyx2lyx would work (I did spend some evenings to understand how this works and to implement the current parsing interface in lyx2lyx). However, when lyx2lyx then tried to read the input file, it would not work. The reason for this is that your original file was created with an active utf8 locale, but the current locale tells lyx2lyx to use iso_8859-15 for decoding the file name. It would work if you called convmv to convert the file name in the file system to iso_8859-15 before starting LyX.

Encoding commandline arguments of programs according to the currently active locale is standard among all operating systems (see e.g. http://stackoverflow.com/questions/5408730/what-is-the-encoding-of-argv). So for the case that the user calls lyx2lyx directly in a terminal, or from a different program than LyX, the current lyx2lyx behaviour is correct (I tested that using different encodings). If you want to test this as well, you need to ensure that you set all locale-related environment variables that are currently set to the wanted locale. These may be LANG, LANGUAGE and LC_*. When using a terminal emulator from X, you also need to change the encoding of the terminal emulator, because this determines how the keyboard input that is fed to the shell is encoded.

If called from LyX we could simply decide to use utf8 for lyx2lyx commandline arguments. Of course this would have to be specified by a special commandline parameter, so that non-LyX usage of lyx2lyx does not break.

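To make the point about the utf8-created file concrete, here is a small byte-level sketch in Python. It does not touch the file system and is not how lyx2lyx is implemented; it only compares the bytes stored on disk with the bytes a process running under an iso_8859-15 locale would ask for. All variable names are made up for the illustration.

```python
# File name from the report, with ü written as an escape.
name = "Testo\u00fcbernahme.lyx"

# Bytes stored in the directory entry when the file was created
# while a utf8 locale was active.
on_disk = name.encode("utf-8")

# Bytes a program would look for if it encodes the same name
# according to an iso_8859-15 locale such as en_IE@euro.
looked_up = name.encode("iso8859-15")

print(on_disk == looked_up)  # False: the file would not be found

# Conceptually, a rename from utf8 to iso_8859-15 (the job convmv is
# meant for) makes the stored bytes match the lookup again.
renamed = on_disk.decode("utf-8").encode("iso8859-15")
print(renamed == looked_up)  # True
```
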
I do not see any real advantage in switching to utf8 here, though. We would not need the ugly FileName::toSafeFilesystemEncoding() on Windows, and we would be able to encode every file name for the lyx2lyx commandline, but on Linux, if the file name is not encodable in the current locale, lyx2lyx would still fail when trying to open the file.

Georg