I think it uses the champion network. That is, the training periodically 
generates a candidate, and there is a playoff against the current champion. If 
the candidate wins more than 55% of the playoff games, it is declared the new champion.
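
For concreteness, here is a rough Python sketch of that gating step. The 
function name, the play_one_game callback, and the 400-game default are 
illustrative assumptions on my part, not details taken from any actual 
implementation:

def gate(champion, candidate, play_one_game, num_games=400, threshold=0.55):
    # Promote the candidate only if it wins more than `threshold` of a
    # head-to-head match against the current champion.
    # play_one_game(first_net, second_net) is assumed to play one game with
    # first_net moving first and to return True if first_net wins.
    wins = 0
    for i in range(num_games):
        if i % 2 == 0:
            won = play_one_game(candidate, champion)       # candidate moves first
        else:
            won = not play_one_game(champion, candidate)   # champion moves first
        if won:
            wins += 1
    return candidate if wins / num_games > threshold else champion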

 

Keeping a champion is an important mechanism, I believe. It creates a 
competitive coevolution dynamic, where the network is evolving to beat the 
best network so far, not just the most recent one. Without that dynamic, the 
strength of the training process can oscillate up and down.

 

From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of 
uurtamo .
Sent: Wednesday, October 25, 2017 6:07 PM
To: computer-go <computer-go@computer-go.org>
Subject: Re: [Computer-go] Source code (Was: Reducing network size? (Was: 
AlphaGo Zero))

 

Does the self-play step use the most recent network for each move?

 

On Oct 25, 2017 2:23 PM, "Gian-Carlo Pascutto" <g...@sjeng.org> wrote:

On 25-10-17 17:57, Xavier Combelle wrote:
> Is there some way to distribute learning of a neural network?

Learning as in training the DCNN: not really, unless there are high-bandwidth
links between the machines (AFAIK - unless the state of the art has changed?).

Learning as in generating self-play games: yes. Especially if you update
the network only every 25 000 games.

My understanding is that this task is bottlenecked much more on game
generation than on DCNN training, until you have quite a lot of machines
generating games.
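
To illustrate why game generation parallelises so easily, here is a rough
sketch of a worker loop (hypothetical names and callbacks, not the actual
client code of any project): each machine only downloads the current weights
now and then and uploads finished game records.

def selfplay_worker(fetch_latest_network, play_one_game, upload_game,
                    games_per_fetch=10):
    # Pull the current weights, play games locally (the expensive part),
    # and push back the small game records. Only weights and game records
    # cross the network, so no high-bandwidth link between machines is needed.
    while True:
        weights = fetch_latest_network()    # weights: tens of MB at most
        for _ in range(games_per_fetch):
            record = play_one_game(weights)
            upload_game(record)             # a single game record is tiny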

--
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
