Older SGF versions have no way to specify a character set (FF[4] adds
an optional CA root property, but we can't rely on files setting it), so
encodings like Big5 or Shift-JIS can't easily be autodetected. I think
we should try UTF-8 first and, if that fails, fall back to Latin-1.
I say that because I think a lot of Europeans are still using Windows,
which (as far as I know - maybe this has changed recently) has an
unreasonable preference for codepages over Unicode.
Although US-ASCII is 100% compatible with UTF-8, accented Latin letters
encoded in ISO-8859-1 are not. Also, a byte stream can be invalid as
UTF-8 (in which case we fail and fall back to the 8859-1 -> UTF-8
conversion), but I don't believe a byte stream can ever be truly invalid
as Latin-1, so there is no easy way to tell when a Latin-1 guess is
wrong. That is why UTF-8 has to be tried first.
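
To make the idea concrete, here is a rough sketch of the "UTF-8 first,
Latin-1 as fallback" logic. It is not Quarry's code: the function names
is_valid_utf8() and sgf_text_to_utf8() are made up for illustration, and
the Latin-1 conversion is done by hand rather than through the iconv
descriptor the parser actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if the `len` bytes at `s` are
 * well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra, k;
        unsigned long cp;

        if (c < 0x80) {                     /* plain ASCII byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            extra = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            extra = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            extra = 3; cp = c & 0x07;
        } else {
            return 0;   /* stray continuation byte or illegal lead byte */
        }

        if (len - i <= extra)
            return 0;                       /* sequence is truncated */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                   /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if (extra == 2 && cp < 0x800)
            return 0;
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;

        i += extra + 1;
    }
    return 1;
}

/* Hypothetical helper: return a malloc'd UTF-8 copy of `src`.  Valid
 * UTF-8 is copied unchanged; anything else is assumed to be Latin-1,
 * where every byte maps directly to the code point of the same value. */
char *sgf_text_to_utf8(const char *src)
{
    const unsigned char *in = (const unsigned char *) src;
    size_t len, i, j = 0;
    char *out;

    assert(src != NULL);
    len = strlen(src);

    if (is_valid_utf8(in, len)) {
        out = malloc(len + 1);
        if (out != NULL)
            memcpy(out, src, len + 1);
        return out;
    }

    out = malloc(2 * len + 1);              /* worst case: 2 bytes each */
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[j++] = (char) c;
        } else {
            out[j++] = (char) (0xC0 | (c >> 6));
            out[j++] = (char) (0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return out;
}
```

With this, a Latin-1 "caf\xE9" comes out as "caf\xC3\xA9", while input
that is already UTF-8 passes through untouched. Big5 or Shift-JIS text
that happens not to be valid UTF-8 would still be misread as Latin-1,
which is the limitation described above.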
Make sense?
On Sun, 2009-11-08 at 12:33 +0800, LiangXu Wang wrote:
> Hi,
> If I have a UTF-8 encoded SGF file (e.g., in Chinese), Quarry
> cannot recognize the UTF-8 characters in the file, including the
> players' names and the comments.
> I found that the problem is in sgf-parser.c: the parser assumes
> that the SGF file is Latin-1 encoded. If that is changed to UTF-8,
> the problem is solved, and I think Latin-1 encoded files will not
> be affected either (not sure).
> The patch:
> ------------------------------
> --- src/sgf/sgf-parser.c (revision 1003)
> +++ src/sgf/sgf-parser.c (working copy)
> @@ -380,7 +380,7 @@
> data->board = NULL;
> data->error_list = *error_list;
>
> - data->latin1_to_utf8 = iconv_open ("UTF-8", "ISO-8859-1");
> + data->latin1_to_utf8 = iconv_open ("UTF-8", "UTF-8");
> assert (data->latin1_to_utf8 != (iconv_t) (-1));
>
> next_token (data);
>
>
> Liangxu Wang
>
> _______________________________________________
> Quarry-dev mailing list
> [email protected]
> https://mail.gna.org/listinfo/quarry-dev