I don't remember the content of the paper and can't look at the PDF right now, but one possible explanation is that a simple model trained directly may regularize differently from one trained on the best-fit, pre-smoothed output of a deeper net. The latter could offer better local optimization and regularization at higher accuracy with the same parameter count.

On 12.06.2016 13:05, "Álvaro Begué" <alvaro.be...@gmail.com> wrote:
> I don't understand the point of using the deeper network to train the
> shallower one. If you had enough data to be able to train a model with
> many parameters, you have enough to train a model with fewer parameters.
>
> Álvaro.
>
> On Sun, Jun 12, 2016 at 5:52 AM, Michael Markefka
> <michael.marke...@gmail.com> wrote:
>
>> It might be worthwhile to try the faster, shallower policy network as an
>> MCTS replacement if it were fast enough to support enough breadth. That
>> could cut down on some of the scoring variations that confuse rather
>> than inform the score expectation.
>>
>> On Sun, Jun 12, 2016 at 10:56 AM, Stefan Kaitschick
>> <skaitsch...@gmail.com> wrote:
>>
>>> I don't know how the added training compares to direct training of the
>>> shallow network. It's probably not so important, because both should be
>>> much faster than the training of the deep NN. Accuracy should be
>>> slightly improved.
>>>
>>> Together, that might not justify the effort. But I think the fact that
>>> you can create the mimicking NN after the deep NN has been refined with
>>> self-play is important.
>>>
>>> On Sun, Jun 12, 2016 at 9:51 AM, Petri Pitkanen
>>> <petri.t.pitka...@gmail.com> wrote:
>>>
>>>> Would the expected improvement be reduced training time or improved
>>>> accuracy?
>>>>
>>>> 2016-06-11 23:06 GMT+03:00 Stefan Kaitschick
>>>> <stefan.kaitsch...@hamburg.de>:
>>>>
>>>>> If I understood it right, the playout NN in AlphaGo was created using
>>>>> the same training set as the one used for the large NN in the tree.
>>>>> There is an alternative, though. I don't know if this is the best
>>>>> source, but here is one example: https://arxiv.org/pdf/1312.6184.pdf
>>>>> The idea is to teach a shallow NN to mimic the outputs of a deeper
>>>>> net. For one thing, this seems to give better results than direct
>>>>> training on the same set. But more importantly, this could be done
>>>>> after the large NN has been improved with self-play. And after that,
>>>>> the self-play could be restarted with the new playout NN. So it seems
>>>>> to me there is real room for improvement here.
>>>>>
>>>>> Stefan
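For anyone curious, the mimic-training idea from the quoted paper (arXiv:1312.6184, "Do Deep Nets Really Need to be Deep?") can be sketched in a few lines. This is a toy illustration, not AlphaGo's setup: all network sizes, data, and hyperparameters are made up, and the shallow student simply regresses the logits of a fixed deeper teacher, which is the paper's core trick.

```python
# Toy sketch of mimic training (model compression): a shallow net is trained
# to regress the *logits* of a deeper teacher, rather than the hard labels.
# Sizes, data, and learning rate are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# "Deep" teacher: two hidden layers, fixed random weights (stands in for a
# trained network whose outputs we want to imitate).
D_IN, H, D_OUT = 8, 16, 3
W1 = rng.normal(size=(D_IN, H)) / np.sqrt(D_IN)
W2 = rng.normal(size=(H, H)) / np.sqrt(H)
W3 = rng.normal(size=(H, D_OUT)) / np.sqrt(H)

def teacher_logits(x):
    return relu(relu(x @ W1) @ W2) @ W3

# Shallow student: one (wider) hidden layer.
H_S = 64
V1 = rng.normal(size=(D_IN, H_S)) * 0.1
c1 = np.zeros(H_S)
V2 = rng.normal(size=(H_S, D_OUT)) * 0.1
c2 = np.zeros(D_OUT)

def student_logits(x):
    return relu(x @ V1 + c1) @ V2 + c2

# Mimic training: plain gradient descent on mean squared error between
# student and teacher logits. The inputs need no labels at all; the
# teacher provides the (soft) targets.
X = rng.normal(size=(512, D_IN))
T = teacher_logits(X)

mse0 = float(np.mean((student_logits(X) - T) ** 2))
lr = 0.05
for step in range(1500):
    Hs = relu(X @ V1 + c1)            # student forward pass
    P = Hs @ V2 + c2
    G = 2.0 * (P - T) / len(X)        # dMSE/dP
    # Backprop through the single hidden layer.
    gV2 = Hs.T @ G
    gc2 = G.sum(0)
    GH = (G @ V2.T) * (Hs > 0)
    gV1 = X.T @ GH
    gc1 = GH.sum(0)
    V2 -= lr * gV2; c2 -= lr * gc2
    V1 -= lr * gV1; c1 -= lr * gc1

mse = float(np.mean((student_logits(X) - T) ** 2))
print(f"mimic MSE: {mse0:.4f} -> {mse:.4f}")
```

The point of the sketch is only that the student's training signal comes entirely from the teacher's real-valued outputs, which is why it can be rerun cheaply after the big net has been improved by self-play.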
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go