Re: [Computer-go] Creating the playout NN
> The purpose is to see if there is some sort of "simplification" available
> to the emerged complex functions encoded in the weights. It is a typical
> reductionist strategy, especially where there is an attempt to converge on
> human conceptualization.

That's an interesting way to look at it. If you did this with several different smaller NNs of varying complexity and measured how good each one is, you would get some kind of numeric estimate of the complexity of the encoded concepts. Of course, there is the slight problem that we would also need to map those "simple" NNs back to concepts somehow.
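A minimal sketch of that capacity sweep, assuming PyTorch, a trained 19x19 policy teacher, and hypothetical stand-ins (make_student, teacher, boards) for the real nets and data:

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_student(channels, blocks):
    """Small conv policy net; capacity is set by width and depth.
    Assumes 3 hypothetical input feature planes on a 19x19 board."""
    layers = [nn.Conv2d(3, channels, 3, padding=1), nn.ReLU()]
    for _ in range(blocks):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
    layers += [nn.Conv2d(channels, 1, 1), nn.Flatten()]  # -> 361 move logits
    return nn.Sequential(*layers)

def fit_student(student, teacher, boards, steps=1000, lr=1e-3):
    """Fit the student to the teacher's move distribution; return final loss."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    with torch.no_grad():
        targets = F.softmax(teacher(boards), dim=1)
    for _ in range(steps):
        loss = F.kl_div(F.log_softmax(student(boards), dim=1),
                        targets, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Sweep capacities; the knee of the loss-vs-parameter-count curve is the
# crude numeric complexity estimate discussed above.
# for ch, bl in [(4, 1), (8, 2), (16, 3), (32, 4)]:
#     s = make_student(ch, bl)
#     print(sum(p.numel() for p in s.parameters()), fit_student(s, teacher, boards))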
Re: [Computer-go] Creating the playout NN
> BTW, by improvement, I don't mean higher Go playing skill... I mean
> appearing close to the same level of Go playing skill _per_ _move_ with far
> less computational cost. It's the total game outcomes that will fall.

For the playouts, you always need a relatively inexpensive computation, because for every invocation of the main NN in the tree, you need several hundred cheaper calls in the playout. So it has to be orders of magnitude faster. Surely, replacing a crude fast NN with a slightly less crude fast NN would be beneficial. I don't know if other bots besides AlphaGo are already utilizing the selfplay improvement, but when they do, it will be helpful there too, because the added knowledge of the main NN can be transferred down to the playout NN.
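A back-of-envelope illustration of that "orders of magnitude" requirement, with purely illustrative numbers (not measured AlphaGo figures):

# One tree-node expansion = one deep-NN call plus one playout of roughly
# 200 moves, each move needing a call to the cheap playout net.
deep_ms = 5.0            # hypothetical cost of one tree-NN evaluation
moves_per_playout = 200  # hypothetical average playout length

for speedup in (10, 100, 1000):
    playout_ms = moves_per_playout * deep_ms / speedup
    print(f"{speedup:>5}x cheaper net: playout {playout_ms:7.2f} ms, "
          f"expansion total {deep_ms + playout_ms:7.2f} ms")
# Only somewhere around 100-1000x does the playout stop dominating the
# cost of the expansion.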
Re: [Computer-go] Creating the playout NN
BTW, by improvement, I don't mean higher Go playing skill... I mean appearing close to the same level of Go playing skill _per_ _move_ with far less computational cost. It's the total game outcomes that will fall.

On Sun, Jun 12, 2016 at 3:55 PM, Jim O'Flaherty wrote:
> The purpose is to see if there is some sort of "simplification" available
> to the emerged complex functions encoded in the weights. [...]
Re: [Computer-go] Creating the playout NN
The purpose is to see if there is some sort of "simplification" available to the emerged complex functions encoded in the weights. It is a typical reductionist strategy, especially where there is an attempt to converge on human conceptualization. Given the complexity of the nuances in Go, my intuition says that it will show excellent improvement in short-term play at the cost of nuance in longer-term play.

On Sun, Jun 12, 2016 at 6:05 AM, Álvaro Begué wrote:
> I don't understand the point of using the deeper network to train the
> shallower one. If you had enough data to be able to train a model with
> many parameters, you have enough to train a model with fewer parameters.
Re: [Computer-go] Creating the playout NN
I don't remember the content of the paper and currently can't look at the PDF, but one possible explanation could be that a simple model trained directly regularizes differently from one trained on the best-fit, pre-smoothed output of a deeper net. The second could perhaps offer better local optimization and regularization at higher accuracy with an equal parameter count.

On 12 Jun 2016 at 13:05, "Álvaro Begué" wrote:
> I don't understand the point of using the deeper network to train the
> shallower one. If you had enough data to be able to train a model with
> many parameters, you have enough to train a model with fewer parameters.
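One common way to formalize that "pre-smoothed output" idea is Hinton-style soft-target distillation: the student fits the teacher's full move distribution, optionally softened with a temperature T > 1, instead of hard one-hot labels. A minimal sketch, offered as one plausible reading of the smoothing point rather than the cited paper's exact method:

import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, T=2.0):
    # KL between temperature-softened teacher and student distributions;
    # the T*T factor keeps gradient magnitudes comparable across T.
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

def hard_label_loss(student_logits, labels):
    # Standard cross-entropy on one-hot expert moves, for comparison.
    return F.cross_entropy(student_logits, labels)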
Re: [Computer-go] Creating the playout NN
I don't understand the point of using the deeper network to train the shallower one. If you had enough data to be able to train a model with many parameters, you have enough to train a model with fewer parameters.

Álvaro.

On Sun, Jun 12, 2016 at 5:52 AM, Michael Markefka wrote:
> Might be worthwhile to try the faster, shallower policy network as a
> replacement for the MCTS playout policy if it were fast enough to
> support enough breadth. [...]
Re: [Computer-go] Creating the playout NN
Might be worthwhile to try the faster, shallower policy network as a replacement for the MCTS playout policy if it were fast enough to support enough breadth. It could cut down on some of the scoring variations that confuse rather than inform the score expectation.

On Sun, Jun 12, 2016 at 10:56 AM, Stefan Kaitschick wrote:
> I don't know how the added training compares to direct training of the
> shallow network. [...]
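A minimal sketch of such a policy-net-guided playout, sampling each move from the shallow net's distribution; policy_net, board, and its methods (features, legal_mask, play, score) are hypothetical stand-ins, not a real library API:

import torch

def nn_playout(board, policy_net, max_moves=400):
    """Play one playout to the end, sampling moves from the policy net."""
    for _ in range(max_moves):
        if board.game_over():
            break
        logits = policy_net(board.features())   # (1, 361) move logits
        logits[0, ~board.legal_mask()] = -1e9    # mask out illegal moves
        probs = torch.softmax(logits[0], dim=0)
        board.play(torch.multinomial(probs, 1).item())
    return board.score()                         # e.g. +1/-1 from Black's view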
Re: [Computer-go] Creating the playout NN
I don't know how the added training compares to direct training of the shallow network. It's probably not so important, because both should be much faster than the training of the deep NN. Accuracy should be slightly improved.

Together, that might not justify the effort. But I think the fact that you can create the mimicking NN after the deep NN has been refined with self-play is important.

On Sun, Jun 12, 2016 at 9:51 AM, Petri Pitkanen wrote:
> Would the expected improvement be reduced training time or improved
> accuracy?
Re: [Computer-go] Creating the playout NN
On Sun, Jun 12, 2016 at 10:51:37AM +0300, Petri Pitkanen wrote:
> Would the expected improvement be reduced training time or improved
> accuracy?

Neither: a faster runtime move-scoring procedure, i.e. more board positions scored throughout the game, plus a latency reduction (board scoring available sooner after a move is expanded, so fewer playouts are made without NN scoring in the last few moves).

--
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton
Re: [Computer-go] Creating the playout NN
Would the expected improvement be reduced training time or improved accuracy?

2016-06-11 23:06 GMT+03:00 Stefan Kaitschick:
> If I understood it right, the playout NN in AlphaGo was created by using
> the same training set as the one used for the large NN that is used in the
> tree. There would be an alternative though. I don't know if this is the
> best source, but here is one example: https://arxiv.org/pdf/1312.6184.pdf
> The idea is to teach a shallow NN to mimic the outputs of a deeper net.
> For one thing, this seems to give better results than direct training on
> the same set. But also, more importantly, this could be done after the
> large NN has been improved with selfplay.
> And after that, the selfplay could be restarted with the new playout NN.
> So it seems to me, there is real room for improvement here.
>
> Stefan
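For reference, the cited paper (Ba & Caruana, arXiv:1312.6184) trains the shallow mimic by regressing the deep net's logits with an L2 loss rather than fitting the original hard labels. A minimal sketch of that training loop; teacher, student, and position_batches are hypothetical stand-ins:

import torch
import torch.nn.functional as F

def train_mimic(student, teacher, position_batches, epochs=1, lr=1e-3):
    """Fit the shallow student to the deep teacher's raw logits."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for boards in position_batches:
            with torch.no_grad():
                target_logits = teacher(boards)  # deep net's pre-softmax outputs
            loss = F.mse_loss(student(boards), target_logits)
            opt.zero_grad(); loss.backward(); opt.step()
    return student

# The post's main point: this can be rerun every time the teacher is
# improved by self-play, and self-play then restarted with the new
# playout net.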