Do you intend to use the same draw values for both sides in the self-play
games? They can be independent:
 - in a 3/1/0 scenario, neither player is especially happy with a draw (and
in fact, in a two-game match, they would each rather throw one game to the
other than make two draws, but that's a separate issue);
 - in a match with one game left, both players agree that a draw and a
Black win (say) are equivalent results;
 - in a tournament, the must-win situations of both players could be
independent.

In real life you usually have a good sense of how your opponent's
"must-win" parameter is set, but that doesn't really apply here.


On Tue, Feb 13, 2018 at 10:58 AM, David Wu <lightvec...@gmail.com> wrote:

> Actually, this pretty much solves the whole issue, right? Of course the
> proof would be to actually test it out, but it seems to me a pretty
> straightforward solution, not nontrivial at all.
>
>
> On Feb 13, 2018 10:52 AM, "David Wu" <lightvec...@gmail.com> wrote:
>
> Seems to me like you could fix that in the policy too by providing an
> input feature plane that indicates the value of a draw, whether 0 as
> normal, or -1 for must-win, or -1/3 for 3/1/0, or 1 for only-need-not-lose,
> etc.
>
> Then just play games with a variety of values for this parameter in your
> self-play training pipeline so the policy net gets exposed to each kind of
> game.
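>
> To make that concrete, here's a minimal sketch of the pipeline change
> (hypothetical names, assuming Python; nothing tested):
>
>     import random
>
>     # candidate draw values, rescaled to the [-1, 1] value range:
>     DRAW_VALUES = [
>         0.0,    # normal: a draw is worth half
>         -1.0,   # must-win: a draw is as bad as a loss
>         -1/3,   # 3/1/0 scoring (3 -> 1, 1 -> -1/3, 0 -> -1)
>         1.0,    # only need not to lose: a draw is as good as a win
>     ]
>
>     def selfplay_game_config():
>         """Sample one draw value per self-play game; it fills the
>         extra input plane and scores any terminal draw."""
>         return random.choice(DRAW_VALUES)
>
>     def game_value(result, draw_value):
>         """Training target for the value head."""
>         return {"win": 1.0, "draw": draw_value, "loss": -1.0}[result]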
>
> On Feb 13, 2018 10:40 AM, "Dan Schmidt" <d...@dfan.org> wrote:
>
> The AlphaZero paper says that they just assign values 1, 0, and -1 to
> wins, draws, and losses respectively. This is fine for maximizing your
> expected value over an infinite number of games given the way that chess
> tournaments (to pick the example that I'm familiar with) are typically
> scored, where you get 1, 0.5, and 0 points respectively for wins, draws,
> and losses.
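>
> (To spell out why: with tournament score s in {1, 0.5, 0}, the assigned
> value is v = 2s - 1, and
>
>     E[score] = 1*P(win) + 0.5*P(draw) + 0*P(loss) = (E[v] + 1) / 2,
>
> so maximizing expected value is exactly maximizing expected tournament
> score; the two scales differ only by a fixed affine map.)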
>
> However, 1) not all tournaments use this scoring system (3/1/0 is popular
> these days, to discourage draws), and 2) this system doesn't account for
> must-win situations where a draw is as bad as a loss (say you are 1 point
> behind your opponent and it's the last game of a match). Ideally you'd keep
> track of all three probabilities and use some linear meta-scoring function
> on top of them. I don't think it's trivial to extend the AlphaZero
> architecture to handle this, though. Maybe it is sufficient to train with
> the standard meta-scoring (while keeping track of the separate W/D/L
> probabilities) but then use the currently applicable meta-scoring while
> playing. Your policy network won't quite match your current situation, but
> at least your value network and search will.
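>
> A rough sketch of what I mean, with made-up names (the explicit WDL head
> and the play-time meta-scoring are my own sketch, not from the paper):
>
>     import numpy as np
>
>     def wdl_probs(logits):
>         """Softmax over the value head's (win, draw, loss) logits."""
>         e = np.exp(logits - logits.max())
>         return e / e.sum()
>
>     def meta_value(wdl, score_win=1.0, score_draw=0.0, score_loss=-1.0):
>         """Scalar for MCTS to back up; swap score_draw at play time,
>         e.g. -1.0 for a must-win game or -1/3 under 3/1/0 scoring."""
>         w, d, l = wdl
>         return w * score_win + d * score_draw + l * score_loss
>
>     # Train with the standard meta-scoring (score_draw = 0), then at
>     # play time plug in whatever the tournament situation demands:
>     v = meta_value(wdl_probs(np.array([1.2, 0.3, -0.8])), score_draw=-1.0)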
>
> On Tue, Feb 13, 2018 at 10:05 AM, "Ingo Althöfer" <3-hirn-ver...@gmx.de>
> wrote:
>
>> Hello,
>>
>> What is known about proper MCTS procedures for games
>> which have not only wins and losses but also draws
>> (like chess, Shogi, or Go with integral komi)?
>>
>> Should neural nets provide (win, draw, loss)-probabilities
>> for positions in such games?
>>
>> Ingo.
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
