On Sun, Dec 11, 2016 at 4:50 PM, Rémi Coulom <remi.cou...@free.fr> wrote:

> It makes the policy stronger because it makes it more deterministic. The
> greedy policy is way stronger than the probability distribution.
>

I suspected this was what it was mainly about. Did you run any experiments
to see whether that explains the whole effect?
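Rémi's point about determinism can be illustrated with a small sketch. The policy network outputs a probability distribution over moves; greedy play takes the argmax, while sampling (as in REINFORCE-style self-play) draws from the distribution. This is a hypothetical illustration, not anyone's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a policy CNN's softmax output over a 19x19 board:
# a probability distribution over the 361 points.
policy = rng.dirichlet(np.ones(19 * 19))

# Greedy play: always pick the highest-probability move (deterministic).
greedy_move = int(np.argmax(policy))

# Stochastic play: sample a move from the distribution, which sometimes
# plays moves the network itself considers inferior.
sampled_move = int(rng.choice(len(policy), p=policy))

print(greedy_move, sampled_move)
```

The greedy policy never "wastes" probability mass on low-confidence moves, which is one plausible reason it plays noticeably stronger than sampling from the raw distribution.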



>
> Rémi
>
> ----- Original Message -----
> From: "Detlef Schmicker" <d...@physik.de>
> To: computer-go@computer-go.org
> Sent: Sunday, December 11, 2016 11:38:08
> Subject: [Computer-go] Some experiences with CNN trained on moves by the
> winning player
>
> I want to share some experience training my policy CNN:
>
> Since I wondered why reinforcement learning was so helpful, I trained
> on the GoGoD database using only the moves by the winner of
> each game.
>
> Interestingly, the prediction rate on these moves was slightly higher
> (without training, just taking the previously trained network) than
> when taking the moves by both players into account (53% vs. 52%).
>
> Training on the winning player's moves did not help much: I got a
> statistically significant improvement of about 20-30 Elo.
>
> So I still don't understand why reinforcement learning should gain
> around 100-200 Elo :)
>
> Detlef
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go

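For reference, Detlef's winner-only training setup amounts to filtering each game record down to the moves played by the winning side before building training examples. A minimal sketch, with a made-up game-record format (not his actual pipeline):

```python
# Hypothetical game record: the winner's color plus a list of
# (color, move) pairs, roughly what an SGF parser might yield.
game = {
    "result": "B",  # Black won this game
    "moves": [("B", "dd"), ("W", "pp"), ("B", "dp"), ("W", "pd")],
}

def winner_moves(game):
    """Keep only the moves played by the winner as training targets."""
    return [mv for color, mv in game["moves"] if color == game["result"]]

print(winner_moves(game))  # → ['dd', 'dp']
```

This keeps roughly half of each game's positions, which is consistent with the small (20-30 Elo) but measurable effect Detlef reports.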