Hi Ken,

> > Valid UTF-8 and valid GB2312 can share the same sequences,
> > especially if it's just the odd `£' or `拢' in ASCII text.
>
> It was just a suggestion, not one I was particularly crazy about ...
> but not all arbitrary 8-bit sequences are valid UTF-8.

Oh, agreed.

> And it looks like for GB2312 (using the EUC-CN encoding, right?) it
> would be harder, but there are certainly invalid sequences for GB2312.

Yep.  But there are a lot of valid sequences for both that look like
each other.  The UTF-8 for U+00a3, that `£', reads as U+62e2, `拢', if
the UTF-8 bytes 0xc2 0xa3 are treated as (EUC-CN) GB2312.

    $ printf '\x00\xa3' |
    > iconv -f ucs-2be -t utf-8 |
    > iconv -f gb2312 -t ucs-2be |
    > hd
    00000000  62 e2                                             |b.|
    00000002
    $

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy

_______________________________________________
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
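
The overlap described in the message can also be checked without iconv.
What follows is only a minimal Python sketch, not anything from nmh: it
assumes Python's built-in "gb2312" codec is an acceptable stand-in for
EUC-CN GB2312, and try_decode is a hypothetical helper named here purely
for illustration. It shows the bytes 0xc2 0xa3 decoding cleanly under
both encodings, and the swapped pair 0xa3 0xc2 decoding as GB2312 but
being rejected as UTF-8, since 0xa3 cannot start a UTF-8 sequence.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if data is invalid in encoding.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# UTF-8 encoding of U+00A3 (the pound sign): the bytes 0xc2 0xa3.
pound_utf8 = "\u00a3".encode("utf-8")

# The same two bytes with the order swapped.
swapped = bytes(reversed(pound_utf8))

for label, data in (("0xc2 0xa3", pound_utf8), ("0xa3 0xc2", swapped)):
    for encoding in ("utf-8", "gb2312"):
        text = try_decode(data, encoding)
        if text is None:
            print(f"{label} as {encoding}: invalid")
        else:
            codepoints = " ".join(f"U+{ord(ch):04x}" for ch in text)
            print(f"{label} as {encoding}: valid, {codepoints}")
```

So a validity check can rule UTF-8 out for some inputs, but for bytes
like 0xc2 0xa3 it cannot decide between the two encodings on its own.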