Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

Erik van der Werf Sun, 11 Dec 2016 03:52:25 -0800

Detlef, I think your result makes sense. For games between
near-equally strong players the winning player's moves will not be
much better than the losing player's moves. The game is typically
decided by subtle mistakes. Even if nearly all my moves are perfect,
just one blunder can throw the game. Of course it depends on how you
implement the details, but in principle reinforcement learning should
be able to deal with such cases (i.e., prevent propagating irrelevant
information all the way back to the starting position).
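
A toy sketch of the point about blunders, under made-up assumptions
(two equally strong players, every move an independent blunder with a
fixed small probability, and the player with fewer blunders wins). The
function name and all parameter values are hypothetical and chosen only
for illustration; this is not a model of real games or of Detlef's
training setup.

```python
import random


def simulate(num_games=2000, moves_per_player=100, blunder_prob=0.02, seed=0):
    """Return the fraction of non-blunder moves for winners and for losers.

    Assumption (toy model): the player with fewer blunders wins the game.
    With an equal blunder count it does not matter who is called the
    winner, because both sides contribute the same number.
    """
    rng = random.Random(seed)
    winner_blunders = 0
    loser_blunders = 0
    for _ in range(num_games):
        a = sum(rng.random() < blunder_prob for _ in range(moves_per_player))
        b = sum(rng.random() < blunder_prob for _ in range(moves_per_player))
        winner_blunders += min(a, b)
        loser_blunders += max(a, b)
    total_moves = num_games * moves_per_player
    return 1 - winner_blunders / total_moves, 1 - loser_blunders / total_moves


winner_good, loser_good = simulate()
print(f"good moves, winner: {winner_good:.3f}  loser: {loser_good:.3f}")
```

In this toy model the two fractions differ by only a few percentage
points, so a training set restricted to the winner's moves looks almost
the same as one that uses both players' moves.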

W.r.t. AlphaGo's reinforcement learning results, as far as I know,
reinforcement learning was only indirectly helpful. The RL policy net
performed worse than the SL policy net in the overall system. Only by
training the value net to predict expected outcomes from the
(over-fitted?) RL policy net did they get some improvement (or so they
claim). In essence this just means that RL may have been effective in
creating a better training set for SL. Don't get me wrong, I love RL,
but the reason why the RL part was hyped so much is in my opinion more
related to marketing, politics and personal ego.

Erik

On Sun, Dec 11, 2016 at 11:38 AM, Detlef Schmicker <d...@physik.de> wrote:
> I want to share some experience training my policy cnn:
>
> Since I wondered why reinforcement learning was so helpful, I trained
> on the GoGoD database using only the moves by the winner of
> each game.
>
> Interestingly, the prediction rate for these moves was slightly higher
> (without training, just taking the previously trained network) than
> when taking into account the moves by both players (53% against 52%).
>
> Training on the winning player's moves did not help a lot; I got a
> statistically significant improvement of about 20-30 Elo.
>
> So I still don't understand why reinforcement learning should give
> around 100-200 Elo :)
>
> Detlef
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go