On Sun, Dec 11, 2016 at 4:50 PM, Rémi Coulom <remi.cou...@free.fr> wrote:
> It makes the policy stronger because it makes it more deterministic. The
> greedy policy is way stronger than the probability distribution.

I suspected this is what it was mainly about. Did you run any experiments to see if that explains the whole effect?

> Rémi
>
> ----- Original message -----
> From: "Detlef Schmicker" <d...@physik.de>
> To: computer-go@computer-go.org
> Sent: Sunday, December 11, 2016 11:38:08
> Subject: [Computer-go] Some experiences with CNN trained on moves by the
> winning player
>
> I want to share some experience training my policy CNN:
>
> I wondered why reinforcement learning was so helpful, so I trained
> on the GoGoD database using only the moves by the winner of
> each game.
>
> Interestingly, the prediction rate on these moves was slightly higher
> (without training, just taking the previously trained network) than
> when taking into account the moves by both players (53% against 52%).
>
> Training on the winning player's moves did not help a lot; I got a
> statistically significant improvement of about 20-30 Elo.
>
> So I still don't understand why reinforcement should do around
> 100-200 Elo :)
>
> Detlef
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
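
To make the "more deterministic" point concrete, here is a minimal sketch that is not from the thread. It assumes a policy that outputs a softmax over candidate moves; the logits, the temperature value, and the helper names are hypothetical and chosen only for illustration. It compares how often the network's top move gets played when sampling from the distribution, when sampling from a sharpened distribution, and when playing greedily.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over moves."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_move(probs):
    """Index of the most probable move (the greedy policy's choice)."""
    return max(range(len(probs)), key=probs.__getitem__)


def chance_of_top_move(probs):
    """Probability that sampling from probs plays the greedy move."""
    return probs[greedy_move(probs)]


# Hypothetical network outputs for four candidate moves (made-up numbers).
logits = [2.0, 1.5, 0.5, -1.0]

plain = softmax(logits, temperature=1.0)     # sample from the raw policy
sharper = softmax(logits, temperature=0.25)  # lower temperature, closer to greedy

print("top move:", greedy_move(plain))
print("P(top move) when sampling:          %.3f" % chance_of_top_move(plain))
print("P(top move) with sharper sampling:  %.3f" % chance_of_top_move(sharper))
print("P(top move) when playing greedily:  1.000")
```

Lowering the temperature concentrates probability on the top move without changing which move is ranked first, so a policy can become stronger as a player simply by becoming more deterministic, even when its ranking of moves is unchanged.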