Erik van der Werf wrote on 2014-12-15:
> Also, if you're not careful, it is quite easy to get
> duplicate games in your dataset (I've had cases where
> one game was annotated in Chinese, and the other (duplicate) in
> English, or where the board was simply rotated). My solution
> around this was to always test on games from the most recent
> pro-tournaments, for which I was certain they could not yet be
> in the training database.

It is quite easy to avoid duplicates using a signature.

For full games, where one only wants to defend against
reflection/rotation, use the minimum of the eight md5sums of the
move sequences obtained by rotating and/or reflecting the board.

For possibly truncated games, use the same idea, taking as its
basis a Dyer-type signature on six or eight moves.

Software like sgfinfo computes such signatures.

Andries
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
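
The full-game signature described above (minimum of the eight md5sums) could be sketched roughly as follows. This is only an illustration, not how sgfinfo does it: the function name canonical_signature, the 0-based (x, y) move representation, the use of None for a pass, and the text serialization fed to md5 are all assumptions made for the example. Handicap stones and other setup are ignored.

```python
import hashlib

BOARD_SIZE = 19  # assumption: standard 19x19 board


def _symmetries(size):
    """Return the eight rotations/reflections of a square board as functions."""
    m = size - 1
    return [
        lambda x, y: (x, y),          # identity
        lambda x, y: (m - y, x),      # rotate 90 degrees
        lambda x, y: (m - x, m - y),  # rotate 180 degrees
        lambda x, y: (y, m - x),      # rotate 270 degrees
        lambda x, y: (m - x, y),      # mirror left/right
        lambda x, y: (x, m - y),      # mirror top/bottom
        lambda x, y: (y, x),          # mirror on main diagonal
        lambda x, y: (m - y, m - x),  # mirror on anti-diagonal
    ]


def canonical_signature(moves, size=BOARD_SIZE):
    """Hypothetical helper: minimum md5 hex digest over the eight symmetries.

    moves is a list of 0-based (x, y) tuples in playing order; None is a pass.
    """
    digests = []
    for transform in _symmetries(size):
        parts = []
        for move in moves:
            if move is None:
                parts.append("pass")  # a pass is unaffected by symmetry
            else:
                x, y = transform(*move)
                parts.append(f"{x},{y}")
        text = ";".join(parts)
        digests.append(hashlib.md5(text.encode("ascii")).hexdigest())
    # Any rotated or reflected copy yields the same set of eight digests,
    # so the minimum is the same for all of them.
    return min(digests)
```

Two game records can then be treated as duplicates when their signatures are equal. For truncated games the same minimum-over-eight step would be applied to a short Dyer-type selection of moves instead of the full sequence.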