> Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> https://arxiv.org/pdf/1712.01815.pdf
One of the changes they made (bottom of p.3) was to continuously update the neural net, rather than requiring a new network to beat the current one 55% of the time before it is adopted. (That struck me as strange at the time, when reading the AlphaGo Zero paper - why not just >50%?)

The AlphaZero paper shows it outperforms AlphaGo Zero, but they are comparing against the 20-block, 3-day version, not the 40-block, 40-day version that was even stronger. As papers rarely show failures, can we take it to mean they couldn't outperform their best go bot, do you think? If so, I wonder how hard they tried? In other words, do you think the changes they made from AlphaGo Zero to AlphaZero have made it weaker (viewed purely from the point of view of making the strongest possible go program)?

Darren

_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
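For anyone skimming the papers, the difference being discussed can be sketched in a few lines. This is not DeepMind's code: the Elo-based expected score below is just a deterministic stand-in for a real 400-game evaluation match, and the network records are hypothetical; only the 55% threshold comes from the AlphaGo Zero paper.

```python
# Sketch of the two update rules under discussion (assumed structure,
# not DeepMind's implementation).

GATE_THRESHOLD = 0.55  # AlphaGo Zero's acceptance bar; the question above is why not just >0.5

def expected_win_rate(candidate_elo: float, best_elo: float) -> float:
    """Standard Elo expected score of the candidate against the best network.
    A stand-in for actually playing an evaluation match."""
    return 1.0 / (1.0 + 10 ** ((best_elo - candidate_elo) / 400.0))

def gated_update(best: dict, candidate: dict) -> dict:
    """AlphaGo Zero style: the candidate replaces the best network only if
    its (estimated) win rate clears the 55% gate."""
    if expected_win_rate(candidate["elo"], best["elo"]) >= GATE_THRESHOLD:
        return candidate
    return best

def continuous_update(current: dict, candidate: dict) -> dict:
    """AlphaZero style: always adopt the latest network; no evaluation gate."""
    return candidate

# Hypothetical generations to show the gate in action:
best = {"name": "gen-10", "elo": 1500}
slightly_better = {"name": "gen-11", "elo": 1520}  # ~52.9% expected score: rejected by the gate
much_better = {"name": "gen-12", "elo": 1560}      # ~58.6% expected score: accepted

print(gated_update(best, slightly_better)["name"])       # gen-10 (gate holds it back)
print(gated_update(best, much_better)["name"])           # gen-12
print(continuous_update(best, slightly_better)["name"])  # gen-11 (always adopted)
```

The interesting consequence for the question above: under the gate, small real improvements (anything between 50% and 55%) are thrown away, while continuous updating keeps them but also keeps regressions - which is presumably why one might suspect it could end up weaker for go specifically.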
