Hi,

Thanks for sharing your idea.
In my experience it is rarely efficient to train value functions from very short-term data (i.e., the next move). TD(lambda), or training from the final outcome of the game, is often better because it uses a longer horizon. But of course, it is difficult to tell without experiments whether your idea would work or not. The advantage of your idea is that you can collect a lot of training data more easily.

Rémi

----- Original Message -----
From: "Bo Peng" <b...@withablink.com>
To: computer-go@computer-go.org
Sent: Tuesday, January 10, 2017 23:25:19
Subject: [Computer-go] Training the value network (a possibly more efficient approach)

Hi everyone. It occurs to me there might be a more efficient method to train the value network directly (without using the policy network). You are welcome to check my method: http://withablink.com/GoValueFunction.pdf

Let me know if there are any silly mistakes :)

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
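The horizon trade-off Rémi describes can be made concrete with a λ-return computation. The sketch below assumes a game with a single terminal outcome z (e.g., +1 for a win, 0 for a loss), no intermediate rewards, and no discounting, as in Go; `values` and `lambda_return_targets` are illustrative names, not anything from the linked paper. λ = 0 gives the short-horizon (next-position) target the reply warns about, λ = 1 gives pure training from the final game outcome, and values in between interpolate:

```python
def lambda_return_targets(values, z, lam):
    """Compute TD(lambda) training targets for each position of one game.

    values: list of the current network's value estimates V(s_t), one per
            non-terminal position, in move order.
    z:      final outcome of the game (the only reward).
    lam:    lambda in [0, 1].

    Backward recursion (discount 1, zero intermediate rewards):
        G_{T-1} = z
        G_t     = (1 - lam) * V(s_{t+1}) + lam * G_{t+1}
    lam = 0 -> bootstrap on the next position's value (short horizon);
    lam = 1 -> every target is the final outcome z (long horizon).
    """
    T = len(values)
    targets = [0.0] * T
    g = z  # return seen from the terminal position
    for t in reversed(range(T)):
        if t == T - 1:
            g = z  # last position is trained directly on the outcome
        else:
            g = (1 - lam) * values[t + 1] + lam * g
        targets[t] = g
    return targets
```

For example, with estimates `[0.1, 0.4, 0.9]` and outcome `z = 1.0`, λ = 1 gives `[1.0, 1.0, 1.0]` (every position trained on the final result), while λ = 0 gives `[0.4, 0.9, 1.0]` (each position trained on the next position's current estimate).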