IMO, training using only the moves of winners is obviously the practical choice.

Worst case: you "waste" half of your data. But that is not actually a downside provided that you have lots of data, and as your program strengthens you will avoid potential data-quality problems. Asymptotically, you have to train using only self-play games (and only the moves of winners); otherwise you cannot break through the limitations inherent in the quality of the training games.
-----Original Message-----
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Erik van der Werf
Sent: Sunday, December 11, 2016 6:51 AM
To: computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

Detlef, I think your result makes sense. For games between near-equally strong players, the winning player's moves will not be much better than the losing player's moves. The game is typically decided by subtle mistakes. Even if nearly all my moves are perfect, just one blunder can throw the game. Of course it depends on how you implement the details, but in principle reinforcement learning should be able to deal with such cases (i.e., prevent propagating irrelevant information all the way back to the starting position).

W.r.t. AG's reinforcement learning results, as far as I know, reinforcement learning was only indirectly helpful. The RL policy net performed worse than the SL policy net in the overall system. Only by training the value net to predict expected outcomes from the (over-fitted?) RL policy net did they get some improvement (or so they claim). In essence this just means that RL may have been effective in creating a better training set for SL. Don't get me wrong, I love RL, but the reason why the RL part was hyped so much is, in my opinion, more related to marketing, politics and personal ego.

Erik


On Sun, Dec 11, 2016 at 11:38 AM, Detlef Schmicker <d...@physik.de> wrote:
> I want to share some experience training my policy CNN:
>
> Since I wondered why reinforcement learning was so helpful, I trained
> on the GoGoD database using only the moves by the winner of each game.
>
> Interestingly, the prediction rate for these moves was slightly higher
> (without training, just taking the previously trained network) than
> when taking into account the moves by both players (53% against 52%).
>
> Training on winning-player moves did not help a lot; I got a
> statistically significant improvement of about 20-30 Elo.
>
> So I still don't understand why reinforcement learning should gain
> around 100-200 Elo :)
>
> Detlef

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go