Ian Shaw wrote:
>> Our experience is: TD is nice for kickstarting the training
>> process. But supervised training is the real thing. Make a big
>> database of positions and the rollout results according to these
>> positions and train supervised.
>>
>> If you still would like to do TD training with your system, I
>> really recommend looking at Sutton/Barto.
>>
>
> It's probably worth noting that Frank Berger has had a different
> experience. If I recall correctly, Frank used only TD training for
> BgBlitz, with no supervised training. (This was some years ago, so I
> may be out of date or just wrong.)

That is right.

> With the increase in processing power since the current gnubg net was
> developed, I wonder if there is some merit in having another crack at
> it. Are you doing any work on the NN side of things, Øystein? I think
> Joseph has stopped.

I made an effort about 2 years ago, but it did not bear fruit. I'm
hoping to pick that work up again. Among the things I did was to
rewrite/refactor some of the evaluation code. I also tried to make
different position classes with a k-means scheme. I can't say it did
not work, but it has to be fine-tuned and trained further to give
better results, I believe.

I remember I first tried TD training (lambda=0), and I had the same
experience as Joseph reported: TD is slow. However, I was able to run
5000 games/minute. TDG 1.0 was trained with 300,000 games, and I'm able
to reach that in an hour. Maybe TD can be reconsidered.

BTW: I also think Frank's training algorithm uses other values for
lambda. I'm not sure of all the details in his project.

-Øystein
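
For anyone following the thread who has not met the lambda parameter: below is a minimal sketch of TD(lambda) with accumulating eligibility traces, run on the small random-walk task from Sutton/Barto rather than on backgammon. It is not gnubg's or BgBlitz's training code; the function name `train_td`, the step size, and the 5-state task are all assumptions chosen for illustration. With `lam=0.0` only the state just left is updated (the setting mentioned above); larger values spread each TD error back over earlier states. The last lines just redo the throughput arithmetic from the message.

```python
import random

N_STATES = 5  # non-terminal states 1..5; states 0 and 6 are terminal


def train_td(episodes, alpha=0.05, lam=0.0, seed=0):
    """Tabular TD(lambda) on a 5-state random walk (illustrative only)."""
    rng = random.Random(seed)
    v = [0.0] * (N_STATES + 2)  # value estimates; terminals stay at 0
    for _ in range(episodes):
        trace = [0.0] * (N_STATES + 2)  # eligibility traces, reset per game
        s = (N_STATES + 1) // 2         # start in the middle state
        while 0 < s < N_STATES + 1:
            s_next = s + rng.choice((-1, 1))
            # Reward 1 only when stepping off the right-hand end.
            reward = 1.0 if s_next == N_STATES + 1 else 0.0
            delta = reward + v[s_next] - v[s]  # undiscounted TD error
            # Decay every trace by lambda, then mark the state just left.
            for i in range(len(trace)):
                trace[i] *= lam
            trace[s] += 1.0
            # Move each estimate in proportion to its trace.
            for i in range(1, N_STATES + 1):
                v[i] += alpha * delta * trace[i]
            s = s_next
    return v[1:-1]


if __name__ == "__main__":
    # True values for this task are 1/6, 2/6, ..., 5/6.
    print("lambda=0.0:", [round(x, 2) for x in train_td(5000, lam=0.0)])
    print("lambda=0.7:", [round(x, 2) for x in train_td(5000, lam=0.7)])
    # Throughput figures quoted in the message above.
    games_needed, games_per_minute = 300_000, 5_000
    print("minutes for 300,000 games:", games_needed / games_per_minute)
```

In a real backgammon trainer the table `v` would be replaced by the network's output and the trace would hold gradients of that output with respect to the weights; that part is left out here.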
_______________________________________________
Bug-gnubg mailing list
Bug-gnubg@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnubg