Hi Anthony,

Anthony J. Bentley wrote on Wed, Nov 29, 2017 at 10:29:28AM -0700:
> Ingo Schwarze writes:
>> That's a bad idea. Do not use non-ASCII bytes in file names.
>> You are in for all kinds of trouble.

> I don't agree. In a situation where a single user will be accessing
> files,

That's a very strong condition, which will rarely hold.
But sure, when it does hold, and when the number of files is
too large to assign sensible file names, it partially mitigates
the problems. But only partially.

> you can use whatever naming scheme you like. UTF-8 works exactly
> how you would expect: the filename you enter is the filename you'll
> get.

Until some program from ports decides to legitimately do Unicode
normalization, uses buggy built-in locale components, assumes the
wrong locale, or incorrectly validates character encoding and
crashes or truncates data. Just as a few examples of what can still
go wrong even on a purely single-user system. All these are fairly
widespread in the wild. Quite certainly, xterm is not the only
program doing normalization, and i have rarely seen any program
that is not buggy with respect to multibyte-character handling.

> Misencoded files can also exist, with exactly the results you would
> expect also: you can't necessarily type it, but if you can pass the
> exact filename, programs will work.

Except those using fgetws(3), mbtowc(3), mbstowcs(3), and friends
for reading UTF-8 data and terminating on encoding errors, which
includes for example almost all of the FreeBSD base system,
including POSIX utilities like cut(1).

[...]
> This is indeed xterm's fault.
>
>    precompose (class Precompose)
>            Tells xterm whether to precompose UTF-8 data into Normalization
>            Form C, which combines commonly-used accents onto base
>            characters. If it does not do this, accents are left as
>            separate characters. The default is "true".
>
> In my opinion, that's a *very* poor default. I don't expect base tools
> to canonicalize text like that.

Base tools certainly shouldn't.
In my opinion, if Xenocara wouldn't, that would be an improvement, too.
In particular in much-used tools like xterm(1).
Even if that causes us to diverge a bit from upstream.

> The only unexpected thing here is xterm doing these transformations
> without asking.

I think i would support a diff to fix that near the end of

  /usr/X11R6/share/X11/app-defaults/XTerm
  == /usr/xenocara/app/xterm/XTerm.ad

Thanks for digging up the root cause of the OP's issue.

Yours,
  Ingo
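
P.S. To make the two failure modes discussed above concrete, here is
a minimal sketch in Python. It is only an illustration, not what
xterm or any base utility actually runs. The file names and the
helper names precompose() and is_valid_utf8() are made up for this
example. The first part shows that a name precomposed to
Normalization Form C has different bytes than the decomposed name,
so a normalizing program can change which file a name refers to.
The second part shows strict validation rejecting a misencoded
name, comparable in spirit to mbstowcs(3) returning an error.

```python
import unicodedata

# Two spellings of the same visible file name (hypothetical examples).
NFC_NAME = "\u00e9.txt"    # U+00E9, precomposed
NFD_NAME = "e\u0301.txt"   # U+0065 U+0301, base letter plus combining acute


def precompose(name: str) -> str:
    """Return name in Normalization Form C."""
    return unicodedata.normalize("NFC", name)


def is_valid_utf8(raw: bytes) -> bool:
    """Strictly validate raw as UTF-8, rejecting any encoding error."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # The two names look identical on screen but differ on disk.
    print(NFC_NAME.encode("utf-8"))
    print(NFD_NAME.encode("utf-8"))
    # Precomposing the decomposed name silently yields the other name.
    print(precompose(NFD_NAME) == NFC_NAME)
    # A Latin-1 encoded name is not valid UTF-8.
    print(is_valid_utf8(b"caf\xe9.txt"))
```

A file created under the decomposed name cannot be opened with the
precomposed name, because the file system compares bytes, not
characters as displayed.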