Re: [Quarry-dev] Read the UTF-8 SGF file

Ethan Baldridge Sat, 07 Nov 2009 21:47:40 -0800

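In practice SGF files usually don't say which character set they use
(FF[4] defines an optional CA property, defaulting to ISO-8859-1, but
many files omit it), so encodings like Big5 or Shift-JIS can't easily be
autodetected. But I think we should try UTF-8 first and, if that fails,
fall back to Latin-1.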
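A minimal sketch of "UTF-8 first, then Latin-1" using POSIX iconv, the
same API sgf-parser.c already uses. The helper names convert_to_utf8 and
decode_sgf_text are hypothetical (not Quarry functions), and the sketch
assumes the whole text is already in one memory buffer.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>

/* Convert LEN bytes at IN from charset FROM to UTF-8.  Returns a newly
   allocated, NUL-terminated string, or NULL if IN is not valid in FROM
   (iconv fails with EILSEQ or, for a truncated sequence, EINVAL). */
static char *
convert_to_utf8 (const char *from, const char *in, size_t len)
{
  iconv_t cd = iconv_open ("UTF-8", from);
  /* Same assumption sgf-parser.c makes: the charset pair is supported. */
  assert (cd != (iconv_t) (-1));

  /* UTF-8 -> UTF-8 keeps the size; ISO-8859-1 -> UTF-8 at most doubles it. */
  size_t out_size = 2 * len + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_ptr = (char *) in;
  size_t in_left = len;
  char *out_ptr = out;
  size_t out_left = out_size - 1;   /* reserve room for the NUL */

  size_t result = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close (cd);

  if (result == (size_t) (-1))
    {
      free (out);
      return NULL;
    }

  *out_ptr = '\0';
  return out;
}

/* Hypothetical helper: try UTF-8 first; if the bytes are not valid
   UTF-8, reinterpret them as Latin-1, which accepts any byte sequence. */
static char *
decode_sgf_text (const char *in, size_t len)
{
  char *text = convert_to_utf8 ("UTF-8", in, len);
  if (text == NULL)
    text = convert_to_utf8 ("ISO-8859-1", in, len);
  return text;
}
```

The quoted patch suggests the parser opens a single descriptor
(data->latin1_to_utf8) up front, so a real change would more likely open
both descriptors once; this sketch opens one per call only to stay short.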

My reasoning: I think a lot of Europeans are still using Windows, which
(as far as I know; maybe this has changed recently) has an unreasonable
preference for codepages over Unicode. Plain US-ASCII is 100% compatible
with UTF-8, but accented Latin letters encoded in ISO-8859-1 are not.
Also, a byte stream can be invalid UTF-8, which lets us detect the
failure and fall back to an ISO-8859-1 -> UTF-8 conversion, whereas
every byte sequence is valid Latin-1, so there is no easy way to tell
that a Latin-1 decode was wrong.
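To illustrate that asymmetry, here is a hand-written sketch with two
hypothetical helpers (is_valid_utf8 and latin1_bytes_to_utf8, not part of
Quarry): a strict UTF-8 check following RFC 3629 can reject input, while
Latin-1 maps each byte value to the code point with the same number and
so never fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the LEN bytes at S are well-formed UTF-8 (RFC 3629:
   no overlong forms, no surrogates, nothing above U+10FFFF). */
static bool
is_valid_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;            /* number of continuation bytes */
      unsigned long cp;    /* decoded code point */

      if (c < 0x80)
        { i++; continue; }
      else if (c >= 0xC2 && c <= 0xDF)
        { n = 1; cp = c & 0x1F; }
      else if (c >= 0xE0 && c <= 0xEF)
        { n = 2; cp = c & 0x0F; }
      else if (c >= 0xF0 && c <= 0xF4)
        { n = 3; cp = c & 0x07; }
      else
        return false;      /* 0x80..0xC1 and 0xF5..0xFF never start a char */

      if (len - i - 1 < n)
        return false;      /* truncated sequence at end of input */

      for (size_t k = 1; k <= n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;  /* expected a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      /* Reject overlong 3- and 4-byte forms, surrogates, > U+10FFFF. */
      if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;

      i += n + 1;
    }
  return true;
}

/* Every byte is a valid ISO-8859-1 character, so this cannot fail.
   OUT must have room for 2 * LEN + 1 bytes; returns the output length. */
static size_t
latin1_bytes_to_utf8 (const unsigned char *in, size_t len,
                      unsigned char *out)
{
  assert (out != NULL && (in != NULL || len == 0));
  size_t j = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] < 0x80)
        out[j++] = in[i];                      /* ASCII is unchanged */
      else
        {
          out[j++] = 0xC0 | (in[i] >> 6);      /* U+0080..U+00FF: 2 bytes */
          out[j++] = 0x80 | (in[i] & 0x3F);
        }
    }
  out[j] = '\0';
  return j;
}
```

One case the check cannot catch: some Latin-1 text is also valid UTF-8.
For example, the Latin-1 pair 0xC3 0xA9 ("Ã©") decodes as UTF-8 "é".
Such sequences are rare in real names and comments, so trying UTF-8
first is a good heuristic rather than a guarantee.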

Make sense?

On Sun, 2009-11-08 at 12:33 +0800, LiangXu Wang wrote:
> Hi,
>    If I have a UTF-8 encoded SGF file (e.g., in Chinese), Quarry
> cannot recognize the UTF-8 characters in it, including the
> player names and comments.
>   I found that the problem is in sgf-parser.c: the parser
> assumes the SGF file is Latin-1 encoded. Changing that to UTF-8
> solves the problem, and I think Latin-1 encoded files will not be
> affected either (not sure).
>    The patch:
> ------------------------------
> --- src/sgf/sgf-parser.c      (revision 1003)
> +++ src/sgf/sgf-parser.c      (working copy)
> @@ -380,7 +380,7 @@
>    data->board = NULL;
>    data->error_list = *error_list;
> 
> -  data->latin1_to_utf8 = iconv_open ("UTF-8", "ISO-8859-1");
> +  data->latin1_to_utf8 = iconv_open ("UTF-8", "UTF-8");
>    assert (data->latin1_to_utf8 != (iconv_t) (-1));
> 
>    next_token (data);
> 
> 
> Liangxu Wang
> 
> _______________________________________________
> Quarry-dev mailing list
> [email protected]
> https://mail.gna.org/listinfo/quarry-dev
