Hi kre,

Sending to you directly so you see a version that Mailman doesn't touch.

> | >The «"» around `Blind-Carbon-Copy'
>
> I am leaving that there just so you can see what happens...   What I
> see when composing this is (ignoring my "|" quoting marker, ">The "
> (which I assume is fine for everyone),

Yes.

> capital A with a caret (I hope that is the right name, like the ^
> char, but smaller),

Yes, see `dict -d foldoc caret'.

> the opening guillemets (never heard that name before...) a normal
> double quote (ascii), another capital A-caret, and the closing
> guillemets (and then a space, and the rest of the text).

Yes.

> What I think is happening, is that everything I do is "un-localed",
> that is, I have no LC_* or LANG settings at all, which means that
> everything runs in the C (aka POSIX) locale (more or less US-ASCII).
>
> If I use nmh (ie: show) to look at your message, I see:
>
>     >The ?"? around `Blind-Carbon-Copy'
>
> which is correct as I understand things.

I think that's because nmh knows the text has two bytes representing
each guillemet, and iconv(3) says it can't translate either of them,
Unicode U+00AB or U+00BB, to the C locale, so nmh renders each pair of
bytes as a single `?' byte.

> Then, what I expect happens, is that when the reply is composed, and
> the 2 byte UTF-8 character is read, it is instead interpreted as 2
> characters, one of which is the A-Caret, and the other is, probably
> not entirely by fluke, the opening « (which I just pasted from your
> message, no idea in what form it will be sent out).

Correct.  The UTF-8 encoding of U+00AB is 0xc2 0xab.

    $ printf '\uab' | hd
    00000000  c2 ab                                             |..|

This is because the 11 bits, p-z, of the Unicode runes [0x80, 0x800) are
mapped to two bytes.

    110p qrst  10uv wxyz

u-z stay in their original place within the byte.  Their byte is headed
by 10.  If s-t's value is 10 then the second byte retains its original
value.  If p-r are also all zero then the first byte is 1100 0010, and
so runes [0x80, 0xc0) are the ones that are simply prefixed by 0xc2 when
they are UTF-8 encoded.
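The bit layout above can be sketched with plain POSIX shell arithmetic.
This is only an illustration: `utf8_two_bytes' is a name made up here,
not part of nmh or any standard tool, and it assumes the rune is given
as a number in the two-byte range [0x80, 0x800).

```shell
# Hypothetical helper: print, in hex, the two UTF-8 bytes for a rune
# in [0x80, 0x800).  Not an nmh or coreutils command.
utf8_two_bytes() {
    cp=$(( $1 ))        # accepts decimal or 0x-prefixed hex
    if [ "$cp" -lt 128 ] || [ "$cp" -ge 2048 ]; then
        echo "utf8_two_bytes: rune outside [0x80, 0x800)" >&2
        return 1
    fi
    b1=$(( 0xc0 | (cp >> 6) ))      # 110p qrst: top five of the 11 bits
    b2=$(( 0x80 | (cp & 0x3f) ))    # 10uv wxyz: low six bits
    printf '%02x %02x\n' "$b1" "$b2"
}

utf8_two_bytes 0xab     # prints: c2 ab
```

Likewise `utf8_two_bytes 0xbb' gives `c2 bb', but `utf8_two_bytes 0xe9'
gives `c3 a9': there s-t is 11, so the second byte no longer equals the
rune and the prefix is no longer 0xc2.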

Byte 0xc2 is a `Â' in ISO 8859-1 so if you see a pair of runes starting
with `Â' then the second one is what was intended under a common
mis-encoding.

> \xe0\xb9\x80\xe0\xb8\xa3\xe0\xb8\xb5\xe0\xb8\xa2\xe0\xb8\x99
> \xe0\xb8\x84\xe0\xb8\x93\xe0\xb8\xb2\xe0\xb8\x88\xe0\xb8\xb2
> \xe0\xb8\xa3\xe0\xb8\xa2\xe0\xb9\x8c

    $ show | grep \\\\x |
    > tr -dc 0-9a-f | tr a-f A-F |
    > sed 's/.*/16i&0AP/' | dc
    เรียนคณาจารย์
    $

-- 
Cheers, Ralph.

-- 
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers