Detlef, I think your result makes sense. For games between
near-equally strong players, the winning player's moves will not be
much better than the losing player's moves. The game is typically
decided by subtle mistakes. Even if nearly all my moves are perfect,
just one blunder can throw the game. Of course it depends on how you
implement the details, but in principle reinforcement learning should
be able to deal with such cases (i.e., prevent propagating irrelevant
information all the way back to the starting position).
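One common way to get this behaviour, and not necessarily what I had in mind with "the details", is to learn from temporal-difference (TD) errors instead of from the raw game outcome. Below is a minimal Python sketch. It assumes a value function V(s) that estimates Black's winning probability. The function name `td_errors` and all the V values are invented for illustration, not taken from any real system.

```python
# Minimal sketch (assumption: a value function V(s) estimating Black's
# winning probability). The V values below are invented for illustration.

def td_errors(values, outcome):
    """Return TD(0) errors delta_t = V(s_{t+1}) - V(s_t), one per Black move.

    values  -- V(s_0), ..., V(s_{T-1}) at the positions where Black is to
               move; White's replies are treated as part of the environment
    outcome -- final result (1.0 = Black wins, 0.0 = Black loses), which
               replaces V after Black's last move
    There is no discounting and no intermediate reward, as in Go.
    """
    targets = values[1:] + [outcome]
    return [nxt - cur for cur, nxt in zip(values, targets)]


# Hypothetical game that Black loses: roughly balanced until Black blunders
# with move index 4, after which the position is clearly lost.
values = [0.50, 0.51, 0.49, 0.50, 0.50, 0.12, 0.10, 0.08]
deltas = td_errors(values, outcome=0.0)
for t, d in enumerate(deltas):
    print(f"Black move {t}: delta = {d:+.2f}")
```

If the raw outcome is the learning signal (as in plain REINFORCE), every Black move in this game is pushed down by the same amount. With TD errors, only move 4 gets a large negative signal (about -0.38). The earlier moves get deltas within ±0.02, so very little is propagated back towards the opening. Of course, this works only as well as the value estimates themselves.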

Regarding AlphaGo's reinforcement learning results: as far as I know,
reinforcement learning was only indirectly helpful. The RL policy net
performed worse than the SL policy net in the overall system. Only by
training the value net to predict the expected outcomes of games played
by the (over-fitted?) RL policy net did they get some improvement (or so
they claim). In essence, this just means that RL may have been effective
in creating a better training set for SL. Don't get me wrong, I love RL,
but in my opinion the RL part was hyped so much mainly for reasons of
marketing, politics and personal ego.
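The "better training set for SL" point can be made more concrete. As described (roughly) in the AlphaGo paper, the value net was trained by regression on positions from self-play games of the RL policy net. Only one position was taken per game, and each was labelled with that game's final result. The Python sketch below shows only this data-construction step. The game records and the function name `value_training_set` are hypothetical, and the regression itself (an ordinary supervised fit) is left out.

```python
import random

# Hypothetical game records: (positions visited, final result from Black's
# point of view, 1.0 = Black won, 0.0 = Black lost). Positions are plain
# strings here; a real system would use board feature tensors.

def value_training_set(games, rng):
    """Build (position, outcome) pairs for supervised value-net regression.

    Takes one randomly chosen position per game. Positions from the same
    game are strongly correlated and share one label, so using all of them
    would make it easy for the value net to memorize game outcomes.
    """
    pairs = []
    for positions, outcome in games:
        position = rng.choice(positions)
        pairs.append((position, outcome))
    return pairs


# Toy stand-in for games generated by self-play with the RL policy net.
games = [
    (["g0_m0", "g0_m1", "g0_m2"], 1.0),
    (["g1_m0", "g1_m1"], 0.0),
    (["g2_m0", "g2_m1", "g2_m2", "g2_m3"], 1.0),
]
rng = random.Random(0)
training_set = value_training_set(games, rng)
print(training_set)
```

In this setup the RL policy net serves only as a generator of labelled data. Everything the value net learns from it comes from plain supervised training on these pairs.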


On Sun, Dec 11, 2016 at 11:38 AM, Detlef Schmicker <> wrote:
> I want to share some experience from training my policy CNN:
> Since I wondered why reinforcement learning was so helpful, I trained
> on the GoGoD database using only the moves played by the winner of
> each game.
> Interestingly, the prediction rate for these moves was slightly higher
> (without any training, just using the previously trained network) than
> when taking into account the moves of both players (53% vs. 52%).
> Training on the winning player's moves did not help a lot: I got a
> statistically significant improvement of only about 20-30 Elo.
> So I still don't understand why reinforcement learning should give
> around 100-200 Elo :)
> Detlef
> _______________________________________________
> Computer-go mailing list