The learning rate seems much too high. My experience (which is from backgammon 
rather than Go, among other caveats) is that you need tiny learning rates. 
Tiny, as in 1/TrainingSetSize.
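
As a rough illustration of that rule of thumb, the sketch below computes
1/TrainingSetSize for a few data set sizes. The sizes are made-up examples, not
figures from any experiment, and the function name is only for illustration.

```python
def tiny_learning_rate(training_set_size: int) -> float:
    """Rule of thumb from the text: learning rate = 1 / TrainingSetSize."""
    if training_set_size <= 0:
        raise ValueError("training_set_size must be positive")
    return 1.0 / training_set_size


# Hypothetical training set sizes (number of positions), for illustration only.
for size in (40_000, 1_000_000, 30_000_000):
    rate = tiny_learning_rate(size)
    print(f"{size:>12,d} positions -> learning rate {rate:.2e}")
```

Even for the smallest of these sizes the result is far below .01, which is the
point of the rule.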

Neural networks are dark magic. Be prepared to spend many weeks just trying to 
figure things out. You can bet that the Google & FB results are just their 
final runs.

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of Robert Waite
Sent: Tuesday, August 23, 2016 2:40 AM
To: computer-go@computer-go.org
Subject: [Computer-go] Converging to 57%

I subscribed to this mailing list back in the MoGo days... and remember probably
arguing that the game of Go wasn't going to be beaten for years and years. I am
a little late to the game now, but was curious whether anyone here has worked
with supervised learning networks like the ones in the AlphaGo paper.

I have been training some networks along the lines of the AlphaGo paper and the
DarkForest paper... and a couple of others... and am working with a single
GTX 660. I know... laugh... but it's a fair benchmark and I'm being cheap for
the moment.

Breaking 50% accuracy is quite challenging... I have tried many permutations of
learning algorithms... and can hit 40% accuracy in perhaps 4-12 hours,
depending on the parameters. Some things I have tried: the default AlphaGo
setup but with 128 filters, a minibatch size of 32 and a .01 learning rate;
changing the optimizer from vanilla SGD to Adam or RMSProp; and changing the
batching to match the DarkForest style (making sure that a minibatch contains
samples from different game phases... for example beginning, middle and
end-game). Pretty much everything seems to converge at a rate that will really
stretch out. I am planning on picking one configuration and sticking with it
for an extended training run, but was wondering if anyone has ever gotten close
to the convergence rates implied by the DarkForest paper.
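
To make the phase-mixed batching idea concrete, here is a minimal sketch of
drawing a minibatch with roughly equal numbers of samples from the beginning,
middle and end of games. It only illustrates the idea described above and is
not DarkForest's actual code: the phase boundaries (moves 60 and 160), the
sample layout and the function names are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical move-number boundaries between phases (assumptions, not from
# any paper): moves 0-59 opening, 60-159 middle game, 160+ endgame.
OPENING_END = 60
MIDDLE_END = 160
PHASES = ("opening", "middle", "endgame")


def phase_of(move_number: int) -> str:
    """Map a move number to a coarse game phase."""
    if move_number < OPENING_END:
        return "opening"
    if move_number < MIDDLE_END:
        return "middle"
    return "endgame"


def phase_balanced_minibatch(samples, batch_size=32, rng=random):
    """Draw a minibatch that cycles through the game phases.

    `samples` is a list of (move_number, position, move) tuples. Samples are
    drawn with replacement, round-robin over the phases that have data.
    """
    by_phase = defaultdict(list)
    for sample in samples:
        by_phase[phase_of(sample[0])].append(sample)
    available = [phase for phase in PHASES if by_phase[phase]]
    if not available:
        raise ValueError("no samples to draw from")
    batch = [
        rng.choice(by_phase[available[i % len(available)]])
        for i in range(batch_size)
    ]
    rng.shuffle(batch)  # avoid a fixed phase order inside the batch
    return batch


# Usage with placeholder data: one 250-move game, positions as strings.
samples = [(m, f"position-{m}", f"move-{m}") for m in range(250)]
batch = phase_balanced_minibatch(samples, batch_size=32, rng=random.Random(0))
print(len(batch), "samples drawn")
```

With a batch size that is not a multiple of three, the phases differ by at most
one sample per batch.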

For comparison... the Google team had 50 GPUs, spent 3 weeks... and processed
5440M state/action pairs. The FB team had 4 GPUs, spent 2 weeks and processed
150M-200M state/action pairs. Both seemed to get to around 57% accuracy with
their networks.
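
To put those two runs on a common scale, the sketch below divides the number of
state/action pairs by GPU-days, using only the figures quoted above. It is a
back-of-the-envelope comparison: it assumes every GPU ran for the whole period
and ignores differences in hardware and network size. The function name is
only for illustration.

```python
def pairs_per_gpu_day(total_pairs: float, gpus: int, weeks: float) -> float:
    """State/action pairs processed per GPU per day, assuming full utilization."""
    return total_pairs / (gpus * weeks * 7)


google = pairs_per_gpu_day(5440e6, gpus=50, weeks=3)
fb_low = pairs_per_gpu_day(150e6, gpus=4, weeks=2)
fb_high = pairs_per_gpu_day(200e6, gpus=4, weeks=2)

print(f"Google: {google / 1e6:.1f}M pairs per GPU-day")
print(f"FB:     {fb_low / 1e6:.1f}M to {fb_high / 1e6:.1f}M pairs per GPU-day")
```

Under these assumptions the Google run works out to about 5.2M pairs per
GPU-day and the FB run to about 2.7M-3.6M, so the larger gap is in the total
number of pairs processed rather than in per-GPU throughput.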

I have also been testing them against GnuGo as a baseline... and find that
GnuGo can be beaten rather easily with very little network training... my eye
is on Pachi... but I think I have to break 50% accuracy to even worry about
that.

I have also played with the reinforcement learning phase... started with a
learning rate of .01... which I think was too high... that does take quite a
bit of time on my machine... so I haven't played with it too much yet.

Anyway... does anyone have any tales of how long it took to break 50%? What is
the magic bullet that will help me converge there quickly?

Here is a long-view graph of various attempts:

https://drive.google.com/file/d/0B0BbrXeL6VyCUFRkMlNPbzV2QTQ/view

The red and blue lines are from another member who ran a minibatch size of 32,
a .01 learning rate and 128 filters in the middle layers instead of 192. They
had 4 K40 GPUs, I believe. They also used 40000 training pairs to 40000
validation pairs... so I imagine that is why they had such a spread. There is a
jump in the accuracy, which was when the learning rate was decreased to .001, I
believe.
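
The learning rate drop mentioned above can be sketched as a simple step
schedule. The .01 and .001 values come from the text; the step at which the
drop happens is not stated, so `drop_step` below is a hypothetical parameter
and the value used in the example is arbitrary.

```python
def step_decay_learning_rate(step: int, drop_step: int,
                             initial_rate: float = 0.01,
                             dropped_rate: float = 0.001) -> float:
    """Return initial_rate before drop_step and dropped_rate from drop_step on."""
    return initial_rate if step < drop_step else dropped_rate


# Hypothetical drop point, chosen only for the example.
DROP_STEP = 100_000
for step in (0, 99_999, 100_000, 250_000):
    print(step, step_decay_learning_rate(step, DROP_STEP))
```

In practice the drop is usually triggered by hand or by a plateau in validation
accuracy rather than at a fixed step.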

Closer shot:

https://drive.google.com/file/d/0B0BbrXeL6VyCRVUxUFJaWVJBdEE/view

Most stay between the lines... but looking at both graphs makes me wonder if
any of the lines are approaching the convergence of DarkForest. My gut tells me
they were onto something... and I am rather curious about the playing strength
of the DarkForest SL network and the AG SL network.

Also... a picture of the network's view on a position... this one was trained
to 41% accuracy and played itself greedily.

https://drive.google.com/file/d/0B0BbrXeL6VyCNkRmVDBIYldraWs/view

Oh... and another thing... AG used KGS amateur data... FB and my networks have
been trained on pro games only. At one point I tested the 41% network in the
image (trained on pro data) and a 44% network trained on amateur data (KGS
games) against GnuGo... and the pro-data network soundly won... and the
amateur-data network soundly lost... so I have stuck with pro data since. Not
sure if the end result is the same... and kinda glad the AG team used amateur
data, as that removes the argument that it somehow learned Lee Sedol's style.

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
