On Fri, Oct 20, 2017, 21:48 Petr Baudis <pa...@ucw.cz> wrote:

> Few open questions I currently have, comments welcome:
>
> - there is no input representing the number of captures; is this
>   information somehow implicit, or can the learned winrate predictor
>   never truly approximate the true values because of this?
They are using Chinese rules, so prisoners don't matter: there are
simply fewer stones of one color on the board. (A sketch of why area
scoring makes capture counts implicit is at the end of this mail.)

> - what ballpark values for c_{puct} are reasonable?

The original paper has the value they used, but this likely needs
tuning. I would tune with a supervised network to get started, but you
need games for that. Does it even matter much early on? The network is
random :) (A sketch of the PUCT selection rule is below.)

> - why is the dirichlet noise applied only at the root node, if it's
>   useful?

It's only used to get some randomness into the move selection, no? It's
not actually useful for anything besides that. (See the root-noise
sketch below.)

> - the training process is quite lazy - it's not like the network sees
>   each game immediately and adjusts, it looks at the last 500k games
>   and samples 1000*2048 positions, meaning about 4 positions per game
>   (if I understood this right) - I wonder what would happen if we
>   trained it more aggressively, and what AlphaGo does during the
>   initial 500k games; currently, I'm training on all positions
>   immediately, I guess I should at least shuffle them ;)

I think the laziness may be related to the concern that reinforcement
methods can easily "forget" things they had learned before. The value
network training also likes positions from distinct games. (A sketch of
the windowed sampling is below.)

--
GCP
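To make the Chinese-rules point concrete: under area scoring the result
is a pure function of the final position, so a capture count never needs
to be an input feature. A minimal Python sketch, assuming a board dict
that maps every on-board (row, col) to 'B', 'W', or None (the
representation and names are illustrative, not from any engine):

    def area_score(board):
        # Every stone still on the board counts; captured stones are
        # simply absent, which is the whole point.
        score = {'B': 0, 'W': 0}
        for color in board.values():
            if color is not None:
                score[color] += 1
        # Flood-fill each empty region; it scores for a color only if
        # all of its bordering stones are that one color.
        seen = set()
        for start, color in board.items():
            if color is not None or start in seen:
                continue
            region, borders, stack = set(), set(), [start]
            while stack:
                r, c = stack.pop()
                if (r, c) in region:
                    continue
                region.add((r, c))
                for nb in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
                    if nb not in board:
                        continue            # off the board
                    if board[nb] is None:
                        stack.append(nb)    # extend the empty region
                    else:
                        borders.add(board[nb])
            seen |= region
            if len(borders) == 1:
                score[borders.pop()] += len(region)
        return score['B'], score['W']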
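For c_{puct}: the PUCT selection rule from the AlphaGo papers picks the
move maximizing Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 +
N(s,a)). A minimal sketch, with an assumed Edge record for the per-move
statistics; c_puct is deliberately left as the parameter you would
sweep, since I'm not asserting any particular value here:

    import math

    class Edge:
        # Assumed per-move statistics: visit count N, total value W,
        # and the prior probability P from the policy network.
        def __init__(self, prior):
            self.N, self.W, self.P = 0, 0.0, prior

    def select_move(children, c_puct):
        # children: dict mapping move -> Edge. Returns argmax of Q + U.
        sqrt_total = math.sqrt(sum(e.N for e in children.values()))

        def q_plus_u(edge):
            q = edge.W / edge.N if edge.N > 0 else 0.0
            u = c_puct * edge.P * sqrt_total / (1 + edge.N)
            return q + u

        return max(children, key=lambda m: q_plus_u(children[m]))

Larger c_puct keeps the prior-driven exploration term dominant for
longer, so the right value interacts with how good the priors are,
which is why sweeping it against a supervised network is a sensible
starting point.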
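On the root noise: the AlphaGo Zero paper mixes the root priors as
P(s,a) = (1 - eps) * p_a + eps * eta_a, with eta ~ Dir(0.03) and
eps = 0.25 on 19x19. Deeper nodes keep the raw network priors, which is
why it only randomizes the move choice at the root. A minimal sketch:

    import numpy as np

    def add_root_noise(priors, eps=0.25, alpha=0.03):
        # `priors` is a 1-D array of network priors over the legal root
        # moves; alpha and eps are the values reported for 19x19.
        noise = np.random.dirichlet([alpha] * len(priors))
        return (1 - eps) * priors + eps * noise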
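And the windowed sampling, as I read the paper: mini-batches of 2048
positions are drawn uniformly from the most recent 500k self-play
games, so 1000 training steps touch roughly 2M positions, about 4 per
game. A minimal sketch (flattening the window on every call is
wasteful; a real implementation would index positions directly):

    import random
    from collections import deque

    WINDOW_GAMES = 500_000
    BATCH_SIZE = 2048

    buffer = deque(maxlen=WINDOW_GAMES)   # each entry: one game's positions

    def add_game(positions):
        # positions: list of (state, search_probs, outcome) tuples;
        # the oldest game falls out once the window is full.
        buffer.append(positions)

    def sample_batch():
        # Uniform over all positions in the window, so each game is
        # revisited many times before it expires - the opposite of
        # training on every position of every game as it arrives.
        pool = [p for game in buffer for p in game]
        return random.sample(pool, min(BATCH_SIZE, len(pool)))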