Do you intend to use the same draw values for both sides in the self-play games? They can be independent:

- in a 3/1/0 scenario, neither player is especially happy with a draw (and in fact each would rather throw a game to the other in a two-game match than make two draws, but that's a separate issue);
- in a match with one game left, both players agree that a draw and a Black win (say) are equivalent results;
- in a tournament, the must-win situations of the two players could be independent.
In real life you usually have a good sense of how your opponent's "must-win" parameter is set, but that doesn't really apply here.

On Tue, Feb 13, 2018 at 10:58 AM, David Wu <lightvec...@gmail.com> wrote:

> Actually this pretty much solves the whole issue, right? Of course the
> proof would be to actually test it out, but it seems to me a pretty
> straightforward solution, not nontrivial at all.
>
> On Feb 13, 2018 10:52 AM, "David Wu" <lightvec...@gmail.com> wrote:
>
> Seems to me like you could fix that in the policy too by providing an
> input feature plane that indicates the value of a draw, whether 0 as
> normal, or -1 for must-win, or -1/3 for 3/1/0, or 1 for only-need-not-lose,
> etc.
>
> Then just play games with a variety of values for this parameter in your
> self-play training pipeline so the policy net gets exposed to each kind of
> game.
>
> On Feb 13, 2018 10:40 AM, "Dan Schmidt" <d...@dfan.org> wrote:
>
> The AlphaZero paper says that they just assign values 1, 0, and -1 to
> wins, draws, and losses respectively. This is fine for maximizing your
> expected value over an infinite number of games, given the way that chess
> tournaments (to pick the example that I'm familiar with) are typically
> scored, where you get 1, 0.5, and 0 points respectively for wins, draws,
> and losses.
>
> However, 1) not all tournaments use this scoring system (3/1/0 is popular
> these days, to discourage draws), and 2) this system doesn't account for
> must-win situations where a draw is as bad as a loss (say you are 1 point
> behind your opponent and it's the last game of a match). Ideally you'd keep
> track of all three probabilities and use some linear meta-scoring function
> on top of them. I don't think it's trivial to extend the AlphaZero
> architecture to handle this, though. Maybe it is sufficient to train with
> the standard meta-scoring (while keeping track of the separate W/D/L
> probabilities) but then use the currently applicable meta-scoring while
> playing.
> Your policy network won't quite match your current situation, but
> at least your value network and search will.
>
> On Tue, Feb 13, 2018 at 10:05 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de> wrote:
>
>> Hello,
>>
>> what is known about proper MCTS procedures for games
>> which do not only have wins and losses, but also draws
>> (like chess, Shogi, or Go with integral komi)?
>>
>> Should neural nets provide (win, draw, loss) probabilities
>> for positions in such games?
>>
>> Ingo.
>> _______________________________________________
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
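For concreteness, the two ideas in the quoted messages above — a linear meta-score over separate win/draw/loss probabilities, and a draw-value input plane the net can condition on — might be sketched as follows. This is only an illustration: the function names, the (channels, height, width) input layout, and the plane count are my own assumptions, not code from any actual engine.

```python
import numpy as np

def meta_score(p_win, p_draw, p_loss, draw_value):
    """Linear meta-score over (win, draw, loss) probabilities.
    draw_value is 0 for normal play, -1/3 under 3/1/0 scoring,
    -1 in a must-win game, +1 when a draw is as good as a win."""
    return 1.0 * p_win + draw_value * p_draw - 1.0 * p_loss

def add_draw_value_plane(planes, draw_value):
    """Append a constant input plane encoding this game's draw value,
    so the policy/value net can condition on it (assumes an
    AlphaZero-style (channels, height, width) input stack)."""
    _, h, w = planes.shape
    extra = np.full((1, h, w), draw_value, dtype=planes.dtype)
    return np.concatenate([planes, extra], axis=0)

# The two sides' draw values can differ, so their meta-scores need not
# sum to zero: here White plays under 3/1/0 while Black only needs not
# to lose.
p_win, p_draw, p_loss = 0.2, 0.5, 0.3             # from White's viewpoint
white = meta_score(p_win, p_draw, p_loss, -1/3)
black = meta_score(p_loss, p_draw, p_win, +1.0)

stack = np.zeros((17, 19, 19), dtype=np.float32)  # e.g. a 19x19 Go input
stack = add_draw_value_plane(stack, -1/3)         # game scored 3/1/0
```

In self-play training, each side's plane would carry its own draw value, sampled over a variety of settings as suggested above, and the corresponding meta-score would be the value target for that side.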