Ian Shaw wrote:
>> Our experience is: TD is nice for kickstarting the training 
>> process. But supervised training is the real thing. Make a big
>> database of positions and the rollout results according to these
>> positions and train supervised.
>> 
>> If you still would like to do TD training with your system, I 
>> really recommend looking at Sutton/Barto.
>> 
> 
> It's probably worth noting that Frank Berger has had a different
> experience. If I recall correctly, Frank used only TD training for
> BgBlitz, with no supervised training. (This was some years ago, so I
> may be out of date or just wrong.)

That's right.

> With the increase in processing power since the current gnubg net was
> developed, I wonder if there is some merit in having another crack at
> it. Are you doing any work on the NN side of things, Øystein? I think
> Joseph has stopped.

I put some effort into it about two years ago, but nothing came of it.
I'm hoping to pick that work up again. Among other things, I rewrote and
refactored some of the evaluation code. I also tried to derive different
position classes with a k-means scheme. I can't say it did not work, but
I believe it has to be fine-tuned and trained further to give better
results.
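
For readers who have not seen the idea, here is a minimal sketch of
k-means (Lloyd's algorithm) grouping positions into classes. It is not
the gnubg code: the function name, the parameters, and the two-number
feature vectors are hypothetical stand-ins for whatever position
features are actually used.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length feature tuples."""
    rng = random.Random(seed)
    # Start from k distinct points picked at random from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Hypothetical feature vectors for six positions, forming two obvious groups.
positions = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, classes = kmeans(positions, k=2)
```

In a scheme like this, each resulting class could get its own network,
so the clustering only decides which net evaluates a position.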

I remember that I first tried TD training (lambda=0), and I had the same
experience that Joseph reported: TD is slow. However, I was able to run
5000 games/minute. TDG 1.0 was trained with 300,000 games, and I can
reach that in an hour. Maybe TD can be reconsidered.

BTW: I also think Frank's training algorithm uses other values for
lambda. I'm not sure of all the details in his project.
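
To make the role of lambda concrete, here is a minimal sketch of
tabular TD(lambda) with accumulating eligibility traces, in the style
of Sutton/Barto. It is neither gnubg's nor BgBlitz's training code: a
toy random walk stands in for self-play games, and every name and
parameter below is an assumption made for illustration. With lam=0.0
only the state just left is updated, which is the TD(0) case mentioned
above.

```python
import random

def td_lambda_random_walk(lam=0.0, episodes=2000, n_states=5, alpha=0.05, seed=0):
    """Estimate win probabilities on a toy random walk with TD(lambda)."""
    rng = random.Random(seed)
    # States 1..n_states are non-terminal; 0 and n_states + 1 are terminal.
    values = [0.5] * (n_states + 2)
    values[0] = 0.0
    values[n_states + 1] = 0.0  # the outcome arrives as a reward instead
    for _ in range(episodes):
        traces = [0.0] * (n_states + 2)
        state = (n_states + 1) // 2
        while 0 < state <= n_states:
            next_state = state + rng.choice((-1, 1))
            # Reward 1 for reaching the right end ("win"), 0 otherwise.
            reward = 1.0 if next_state == n_states + 1 else 0.0
            # One-step TD error (no discounting).
            td_error = reward + values[next_state] - values[state]
            # Decay all traces by lam, then mark the state just visited.
            traces = [lam * e for e in traces]
            traces[state] += 1.0
            # Every recently visited state shares in the correction.
            for s in range(1, n_states + 1):
                values[s] += alpha * td_error * traces[s]
            state = next_state
    return values[1:n_states + 1]

td0_values = td_lambda_random_walk(lam=0.0)
td08_values = td_lambda_random_walk(lam=0.8)
```

For this walk the true values are 1/6, 2/6, ..., 5/6, so the estimates
can be checked directly. Larger lambda pushes each game's outcome
further back along the visited states in a single update.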

-Øystein



_______________________________________________
Bug-gnubg mailing list
Bug-gnubg@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnubg
