For any interested people on this list who don't follow Leela Zero discussion or reddit threads:
I recently released a paper on ways to improve the efficiency of AlphaZero-like learning in Go. Several of the ideas tried deviate a little from "pure zero" (e.g. ladder detection, predicting board ownership), but they still use only self-play starting from random play, with no outside human data.

Longer training runs have NOT yet been tested. So far, though, for reaching about LZ130 strength (strong human pro or just beyond it, depending on hardware), learning is faster than Leela Zero by at least a factor of roughly 5, and by closer to a factor of 30 for merely reaching the earlier level of very strong amateur strength rather than pro or superhuman.

I found some other interesting results, too. For example, contrary to intuition built up from earlier-generation MCTS programs in Go, putting significant weight on score maximization rather than only win/loss seems to help.

Blog post: https://blog.janestreet.com/accelerating-self-play-learning-in-go/
Paper: https://arxiv.org/abs/1902.10565
Code: https://github.com/lightvector/KataGo
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go