Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

Rémi Coulom Tue, 13 Dec 2016 03:04:54 -0800

I probably ran matches, but I did not write the result down.

I remember connecting the stochastic policy to KGS. It had a very unnatural 
style, playing blunders from time to time, mixed with strong moves.


If you have one good move with probability 0.3, and 70 bad moves with 
probability 0.01, it will play a blunder with probability 0.7.

I wonder if the policy trained by policy gradient becomes stronger than the 
greedy policy. Is it reported in the AlphaGo paper?

----- Original message -----
From: "Álvaro Begué" <alvaro.be...@gmail.com>
To: "computer-go" <computer-go@computer-go.org>
Sent: Sunday, 11 December 2016 22:52:31
Subject: Re: [Computer-go] Some experiences with CNN trained on moves by the 
winning player







On Sun, Dec 11, 2016 at 4:50 PM, Rémi Coulom < remi.cou...@free.fr > wrote: 


It makes the policy stronger because it makes it more deterministic. The greedy 
policy is way stronger than the probability distribution. 



I suspected this is what it was mainly about. Did you run any experiments to 
see if that explains the whole effect? 





Rémi 

----- Original message ----- 
From: "Detlef Schmicker" < d...@physik.de > 
To: computer-go@computer-go.org 
Sent: Sunday, 11 December 2016 11:38:08 
Subject: [Computer-go] Some experiences with CNN trained on moves by the winning 
player 

I want to share some experience training my policy CNN: 

Because I wondered why reinforcement learning was so helpful, I trained 
on the GoGoD database using only the moves by the winner of 
each game. 

Interestingly, the prediction rate on these moves was slightly higher 
(without retraining, just using the previously trained network) than 
when taking into account the moves by both players (53% against 52%). 

Training on the winning player's moves did not help a lot; I got a 
statistically significant improvement of about 20-30 Elo. 

So I still don't understand why reinforcement learning should give around 
100-200 Elo :) 
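
To put the Elo figures above in terms of win rate, here is a small sketch that assumes the standard logistic Elo model, in which a rating difference d gives an expected score of 1 / (1 + 10^(-d/400)). The function name `expected_score` is made up for this example, and nothing here comes from the experiments described in the message.

```python
def expected_score(elo_diff):
    """Expected score of the stronger side under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Rating differences mentioned in the message.
for diff in (20, 30, 100, 200):
    print(diff, round(expected_score(diff), 3))
```

Under this model, 20-30 Elo corresponds to winning roughly 53-54% of games, and 100-200 Elo to roughly 64-76%.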

Detlef 
_______________________________________________ 
Computer-go mailing list 
Computer-go@computer-go.org 
http://computer-go.org/mailman/listinfo/computer-go 