IMO, training using only the moves of winners is obviously the practical choice.

Worst case: you "waste" half of your data. But that is not really a downside 
provided you have lots of data, and as your program strengthens it helps you 
avoid potential data-quality problems.

Asymptotically, you have to train using only self-play games (and only the 
moves of winners). Otherwise you cannot break through the limitations inherent 
in the quality of the training games.
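
To make that concrete, here is a minimal sketch of the filtering step in 
Python (the Game record and field names are hypothetical, not anyone's 
actual pipeline): keep only the (position, move) pairs played by the 
eventual winner, and fit the policy net on those.

    from collections import namedtuple

    # Hypothetical record format, assumed only for illustration:
    # winner is 'B' or 'W'; moves is a list of (player, position, move).
    Game = namedtuple('Game', ['winner', 'moves'])

    def winner_only_examples(games):
        """Keep only (position, move) pairs played by the eventual winner."""
        examples = []
        for game in games:
            for player, position, move in game.moves:
                if player == game.winner:
                    examples.append((position, move))
        return examples

In the worst case this throws away roughly half of the moves, which is 
exactly the "waste" mentioned above.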

-----Original Message-----
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
Erik van der Werf
Sent: Sunday, December 11, 2016 6:51 AM
To: computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Some experiences with CNN trained on moves by the 
winning player

Detlef, I think your result makes sense. For games between near-equally strong 
players the winning player's moves will not be much better than the losing 
player's moves. The game is typically decided by subtle mistakes. Even if 
nearly all my moves are perfect, just one blunder can throw away the game. Of 
course it depends on how you implement the details, but in principle 
reinforcement learning should be able to deal with such cases (i.e., prevent 
propagating irrelevant information all the way back to the starting position).
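
As an illustration of that last point (just a sketch of bootstrapped credit 
assignment with a learned value function, not AlphaGo's actual update rule): 
if each move is credited with the change in predicted win probability rather 
than with the final result, a single late blunder absorbs most of the blame 
instead of dragging every earlier move down with it.

    import numpy as np

    def td_credit(values, outcome):
        """Per-move credit via one-step TD errors (illustrative only).

        values:  predicted win probability V(s_t) before each of this
                 player's moves (a hypothetical learned critic)
        outcome: 1.0 if this player won, 0.0 if they lost
        """
        v = np.append(np.asarray(values, dtype=float), outcome)
        # Credit for move t is V(s_{t+1}) - V(s_t): steady moves get ~0,
        # the blunder that drops the evaluation gets the big negative.
        return v[1:] - v[:-1]

    # A roughly even game lost by one blunder:
    print(td_credit([0.5, 0.5, 0.55, 0.5, 0.1], outcome=0.0))
    # roughly [0, 0.05, -0.05, -0.4, -0.1]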

W.r.t. AG's reinforcement learning results, as far as I know, reinforcement 
learning was only indirectly helpful. The RL policy net performed worse than 
the SL policy net in the overall system. Only by training the value net to 
predict expected outcomes of self-play games from the (over-fitted?) RL policy 
net did they get some improvement (or so they claim). In essence this just 
means that RL may have been effective in creating a better training set for 
SL. Don't get me wrong, I love RL, but the reason why the RL part was hyped so 
much is, in my opinion, more related to marketing, politics and personal ego.
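
Put differently, the RL policy is used only as a data generator for the value 
net. A rough sketch of that idea (the play_selfplay_game helper is 
hypothetical; this is my reading of the paper, not their code):

    import random

    def value_net_dataset(rl_policy, play_selfplay_game, n_games=10000):
        """Generate (position, outcome) training pairs for the value net.

        play_selfplay_game plays one game with rl_policy and returns a
        list of (position, outcome_for_player_to_move) pairs.
        """
        data = []
        for _ in range(n_games):
            game = play_selfplay_game(rl_policy)
            # Sample a single position per game so the training examples
            # are not strongly correlated (positions within one game are).
            data.append(random.choice(game))
        return data

The value net is then trained by supervised regression on this data set, which 
is why one can say RL mainly produced a better training set for SL.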

Erik


On Sun, Dec 11, 2016 at 11:38 AM, Detlef Schmicker <d...@physik.de> wrote:
> I want to share some experience training my policy cnn:
>
> As I wondered why reinforcement learning was so helpful, I trained 
> from the GoGoD database using only the moves by the winner of 
> each game.
>
> Interestingly, the prediction rate on these moves was slightly higher 
> (without retraining, just evaluating the previously trained network) 
> than the rate on moves by both players (53% against 52%).
>
> Training on the winning player's moves did not help a lot; I got a 
> statistically significant improvement of about 20-30 Elo.
>
> So I still don't understand why reinforcement learning should give 
> around 100-200 Elo :)
>
> Detlef
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go