On 7 December 2011 12:13, Zhi Yong Wu <zwu.ker...@gmail.com> wrote:
> Can you let me know how you see that it is ISO-8859-1 coding, not
> UTF-8? They look same to me.

This gets a bit confusing because mail clients and web browsers tend
to try to fix up what they think are wrongly labelled encodings, so
for example in my mail client the diff looks like it is not changing
anything. However, if you look at the file in the current git repo
with a hex editor:

00000000  23 20 32 30 30 34 2d 30  33 2d 31 36 20 48 61 6c  |# 2004-03-16 Hal|
00000010  6c 64 f3 72 20 47 75 f0  6d 75 6e 64 73 73 6f 6e  |ld.r Gu.mundsson|

you can see that there is an 0xf3 at offset 0x12, which is LATIN SMALL
LETTER O WITH ACUTE in ISO-8859-1. ISO-8859-1 is a one-byte-per-character
encoding, which is why the file has a raw 0xf3 here. Although the
character is also at Unicode code point U+00F3, its UTF-8 encoding is
the two-byte sequence 0xc3 0xb3. Similarly, the 0xf0 at offset 0x17,
LATIN SMALL LETTER ETH, has to be encoded in UTF-8 as 0xc3 0xb0.

If you look at the raw text of Stefan's email it reads:

-# 2004-03-16 Halld=F3r Gu=F0mundsson and Morten Lange
+# 2004-03-16 Halld=C3=B3r Gu=C3=B0mundsson and Morten Lange

which is the quoted-printable encoding of this change.

-- PMM
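
For anyone who wants to check the byte values above without a hex editor, here is a minimal Python sketch. It only uses the name from the changelog line; the variable names and the is_valid_utf8 helper are made up for illustration and are not part of the patch or the repo.

```python
import quopri

# The name from the changelog line, written with explicit code points:
# U+00F3 LATIN SMALL LETTER O WITH ACUTE, U+00F0 LATIN SMALL LETTER ETH.
name = "Halld\u00f3r Gu\u00f0mundsson"

# ISO-8859-1 stores each of these characters as a single byte;
# UTF-8 needs two bytes for each of them.
latin1_bytes = name.encode("iso-8859-1")
utf8_bytes = name.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Quoted-printable forms, comparable to the -/+ lines in the raw email.
qp_latin1 = quopri.encodestring(latin1_bytes)
qp_utf8 = quopri.encodestring(utf8_bytes)

if __name__ == "__main__":
    print("ISO-8859-1:", latin1_bytes.hex(" "))
    print("UTF-8:     ", utf8_bytes.hex(" "))
    print("old bytes valid UTF-8?", is_valid_utf8(latin1_bytes))
    print("new bytes valid UTF-8?", is_valid_utf8(utf8_bytes))
    print("old QP:", qp_latin1.decode("ascii"))
    print("new QP:", qp_utf8.decode("ascii"))
```

The validity check is the practical test here: a lone 0xf3 followed by an ASCII byte is not a legal UTF-8 sequence, so the old file contents fail to decode as UTF-8 while the new ones succeed.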