Erik van der Werf wrote on 2014-12-15:

> Also, if you're not careful, it is quite easy to get
> duplicate games in your dataset (I've had cases where
> one game was annotated in Chinese, and the other (duplicate) in
> English, or where the board was simply rotated). My solution
> around this was to always test on games from the most recent
> pro-tournaments, for which I was certain they could not yet be
> in the training database.

It is quite easy to avoid duplicates using a signature.
For full games, where one only wants to defend against
reflection/rotation, use the minimum of the eight md5sums
of the sequences of moves obtained by rotating and/or reflecting
the board. For possibly truncated games, use the same idea, but
based on a Dyer-type signature over six or eight moves.
Software like sgfinfo computes such signatures.
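
A minimal sketch of both signatures in Python, not how sgfinfo does
it. Assumptions not taken from the post: moves are given as 0-based
(x, y) pairs with None for a pass or a missing move, the board is
19x19, colours and handicap stones are ignored, and the default move
numbers 20, 40, 60, 31, 51, 71 are the ones usually quoted for Dyer
signatures (the post only says "six or eight moves"). All function
names are hypothetical.

```python
import hashlib

BOARD_SIZE = 19  # assumed board size


def _transforms(n):
    """Return the eight rotations/reflections of an n x n board."""
    m = n - 1
    return [
        lambda x, y: (x, y),          # identity
        lambda x, y: (m - x, y),      # mirror left-right
        lambda x, y: (x, m - y),      # mirror top-bottom
        lambda x, y: (m - x, m - y),  # rotate 180 degrees
        lambda x, y: (y, x),          # mirror on main diagonal
        lambda x, y: (m - y, x),      # rotate 90 degrees
        lambda x, y: (y, m - x),      # rotate 270 degrees
        lambda x, y: (m - y, m - x),  # mirror on anti-diagonal
    ]


def _encode(moves, transform):
    """Serialise a move list after applying one board symmetry."""
    parts = []
    for move in moves:
        if move is None:              # pass or missing move
            parts.append("--")
        else:
            x, y = transform(*move)
            parts.append(f"{x},{y}")
    return ";".join(parts).encode("ascii")


def game_signature(moves, size=BOARD_SIZE):
    """Minimum of the eight md5sums of the transformed move sequences."""
    return min(
        hashlib.md5(_encode(moves, t)).hexdigest() for t in _transforms(size)
    )


def dyer_type_signature(moves, move_numbers=(20, 40, 60, 31, 51, 71),
                        size=BOARD_SIZE):
    """Same idea, but based only on the moves at the given 1-based numbers."""
    picked = [moves[n - 1] if n <= len(moves) else None for n in move_numbers]
    return min(_encode(picked, t).decode("ascii") for t in _transforms(size))


if __name__ == "__main__":
    game = [(3, 3), (15, 15), (2, 5)]
    rotated = [(BOARD_SIZE - 1 - y, x) for x, y in game]
    # A game and its rotated copy get the same signature.
    print(game_signature(game) == game_signature(rotated))
```

Two records are then treated as duplicates when their signatures are
equal. The Dyer-type variant only looks at a few fixed move numbers,
so a record cut off after the last of those moves still matches the
complete game.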

Andries
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go