(resending because I forgot to send this to the mailing list originally)
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, December 9, 2018 8:59 PM, cody2007 <cody2...@protonmail.com> wrote:

> Hi Daniel,
> Thanks for your thoughts/questions.
>
>>a) Do you use Dirichlet noise during training? If so, is it limited to the 
>>first 30 or so plies (the opening phase of chess)?
>>The AlphaZero paper is not clear about it.
> I don't. I implemented it at one point, but I wasn't sure I had it 
> right--the paper wasn't clear--so I disabled it. Have you found the noise 
> useful?
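[Note added on resend: for reference, the AlphaZero paper mixes Dirichlet noise into the prior probabilities at the root of each self-play search only, with P(s,a) = (1-eps)*p_a + eps*eta_a. A minimal sketch of that mixing step, under my own function name (the epsilon/alpha defaults are the paper's reported values for chess):]

```python
import numpy as np

def add_root_dirichlet_noise(priors, epsilon=0.25, alpha=0.3, rng=None):
    """Mix Dirichlet noise into root prior probabilities, AlphaZero-style.

    Applied only at the root node of each self-play search.
    epsilon=0.25; alpha=0.3 is the paper's value for chess (0.03 for Go).
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.dirichlet([alpha] * len(priors))
    # Convex combination keeps the result a valid probability distribution.
    return (1.0 - epsilon) * np.asarray(priors) + epsilon * noise
```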
>
>>b) Do you need to shuffle batches if you are doing one epoch?
> To clarify: I play out 128 games in parallel (my batch size) to 20 
> turns/player (with 1000 simulations), giving 20 turns times 128 games. I 
> train on *all* turns in random order, so 20 gradient-descent steps; 
> otherwise I'd be training on turns 1, 2, 3... in order. Then I repeat with 
> new simulations. Is that what you mean by epoch?
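[Note added on resend: the "all turns in random order" step above can be sketched like this--an illustrative snippet with my own naming, not the tutorial's actual code:]

```python
import numpy as np

def make_minibatches(positions, batch_size, rng=None):
    """Shuffle self-play positions across turns and games, then split
    into minibatches, so no gradient step sees positions in move order."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(positions))
    return [[positions[i] for i in order[k:k + batch_size]]
            for k in range(0, len(order), batch_size)]
```

With 20 turns times 128 games and a batch size of 128, this yields the 20 gradient steps described above.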
>
> I've been concerned I could be overfitting by training on all 20 turns, so 
> I've been running a test where I randomly select 10 and discard the rest. 
> So far, no difference in performance. It could be that I haven't trained 
> long enough to see a difference, or that I'd have to reduce the turn 
> sampling even further (or pool the training examples over a larger number 
> of games to randomize further, which I believe the AlphaGo papers did).
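[Note added on resend: pooling over a larger number of games is usually done with a sliding-window replay buffer that positions are sampled from uniformly. A sketch under my own naming--the window size here is illustrative, not the papers' value:]

```python
import random
from collections import deque

class ReplayBuffer:
    """Pool training positions from many recent games and sample them
    uniformly, decorrelating consecutive turns of the same game."""

    def __init__(self, capacity=100_000):
        # deque(maxlen=...) silently drops the oldest positions.
        self.buffer = deque(maxlen=capacity)

    def add_game(self, positions):
        self.buffer.extend(positions)

    def sample(self, n):
        return random.sample(list(self.buffer), min(n, len(self.buffer)))
```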
>
>>do you shuffle those positions? I found the latter to be very important to 
>>avoid overfitting.
> If you mean random rotations and reflections, yes, I do.
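[Note added on resend: for the curious, the 8 rotations/reflections are the dihedral symmetries of the square board. A minimal sketch (my own helper, not the tutorial's code); note the policy targets must be transformed with the same symmetry as the board planes:]

```python
import numpy as np

def dihedral_symmetries(board):
    """Return the 8 rotations/reflections of a square board plane."""
    syms = []
    b = np.asarray(board)
    for k in range(4):
        r = np.rot90(b, k)       # rotation by k * 90 degrees
        syms.append(r)
        syms.append(np.fliplr(r))  # the same rotation, mirrored
    return syms
```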
>
>>c) Do you think there is a problem with using Adam Optimizer instead of SGD 
>>with learning rate drops?
> I haven't tried--have you? In other domains (some work I've done with A3C 
> objectives), Adam seemed unstable in my hands--but maybe that's just me. 
> (A3C in general, on the other games I've tried it on, has been unstable for 
> me, which is partly why I've gone down the route of exploring the AlphaGo 
> approach.)
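[Note added on resend: for reference, AlphaZero itself used SGD with momentum and stepwise learning-rate drops rather than Adam. As I read the paper, the rate starts at 0.2 and is dropped 10x after 100k, 300k and 500k steps; a sketch of that schedule:]

```python
def alphazero_lr(step, base_lr=0.2, boundaries=(100_000, 300_000, 500_000),
                 decay=0.1):
    """Step-decay schedule of the kind AlphaZero pairs with SGD+momentum.

    The defaults are the paper's reported values; treat them as a
    starting point, not a prescription.
    """
    lr = base_lr
    for b in boundaries:
        if step >= b:
            lr *= decay  # drop by 10x at each boundary passed
    return lr
```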
>
> Have you written up anything about your experiments?
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, December 9, 2018 8:34 PM, Dani <dsha...@gmail.com> wrote:
>
>> Thanks for the tutorial! I have some questions about training:
>>
>> a) Do you use Dirichlet noise during training? If so, is it limited to the 
>> first 30 or so plies (the opening phase of chess)?
>> The AlphaZero paper is not clear about it.
>>
>> b) Do you need to shuffle batches if you are doing one epoch? Also, after 
>> generating game positions from each game,
>> do you shuffle those positions? I found the latter to be very important to 
>> avoid overfitting.
>>
>> c) Do you think there is a problem with using Adam Optimizer instead of SGD 
>> with learning rate drops?
>>
>> Daniel
>>
>> On Sun, Dec 9, 2018 at 6:23 PM cody2007 via Computer-go 
>> <computer-go@computer-go.org> wrote:
>>
>>> Thanks for your comments.
>>>
>>>Looks like you made it work on 7x7; 19x19 would probably give better 
>>>results, especially against yourself if you are a complete novice.
>>> I'd expect that would make me win even more against the algorithm, since 
>>> it would explore a far smaller fraction of the search space, right? 
>>> Certainly something I'd be interested in testing, though I'd expect it to 
>>> take many more months of training. It would be interesting to see how 
>>> much performance falls apart, if at all.
>>>
>>>To avoid cheating against GNU Go, use its --play-out-aftermath parameter
>>> Yep, I evaluate with that parameter. The problem is more that I only play 
>>> 20 turns per player per game, and the network seems to like placing 
>>> stones in territories "owned" by the other player. My scoring system then 
>>> no longer counts that area as owned by the player. Playing more turns out 
>>> and/or using a more sophisticated scoring system would probably fix this.
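[Note added on resend: one simple candidate for the "more sophisticated scoring" mentioned above is Tromp-Taylor area scoring--each player gets their stones plus any empty regions bordered only by their own stones. A sketch of that rule in my own illustrative code, not the code from the tutorial:]

```python
import numpy as np
from collections import deque

EMPTY, BLACK, WHITE = 0, 1, 2

def area_score(board):
    """Tromp-Taylor-style area scoring for a finished position."""
    b = np.asarray(board)
    score = {BLACK: int((b == BLACK).sum()), WHITE: int((b == WHITE).sum())}
    seen = np.zeros_like(b, dtype=bool)
    n, m = b.shape
    for i in range(n):
        for j in range(m):
            if b[i, j] != EMPTY or seen[i, j]:
                continue
            # Flood-fill this empty region, recording bordering colors.
            region, borders, q = [], set(), deque([(i, j)])
            seen[i, j] = True
            while q:
                x, y = q.popleft()
                region.append((x, y))
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < n and 0 <= ny < m:
                        if b[nx, ny] == EMPTY and not seen[nx, ny]:
                            seen[nx, ny] = True
                            q.append((nx, ny))
                        elif b[nx, ny] != EMPTY:
                            borders.add(int(b[nx, ny]))
            # A region bordered by exactly one color is that color's territory.
            if len(borders) == 1:
                score[borders.pop()] += len(region)
    return score
```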
>>>
>>>>If I'm not mistaken, a competitive AI would need a lot more training, 
>>>>like Leela Zero does: https://github.com/gcp/leela-zero
>>> Yeah, I agree more training is probably the key here. I'll take a look at 
>>> leela-zero.
>>>
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Sunday, December 9, 2018 7:41 PM, Xavier Combelle 
>>> <xavier.combe...@gmail.com> wrote:
>>>
>>>> Looks like you made it work on 7x7; 19x19 would probably give better 
>>>> results, especially against yourself if you are a complete novice.
>>>>
>>>> To avoid cheating against GNU Go, use its --play-out-aftermath parameter
>>>>
>>>> If I'm not mistaken, a competitive AI would need a lot more training, 
>>>> like Leela Zero does: https://github.com/gcp/leela-zero
>>>>
>>>> On 10/12/2018 at 01:25, cody2007 via Computer-go wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I've posted an implementation of the AlphaZero algorithm and a brief 
>>>>> tutorial. The code runs on a single GPU. While performance is not that 
>>>>> great, I suspect it's mostly been limited by hardware (my training and 
>>>>> evaluation have been on a single Titan X). The network can beat GNU Go 
>>>>> about 50% of the time, although it "abuses" the scoring a little 
>>>>> bit--which I talk more about in the article:
>>>>>
>>>>> https://medium.com/@cody2007.2/alphazero-implementation-and-tutorial-f4324d65fdfc
>>>>>
>>>>> -Cody
>>>>>
>>>>> _______________________________________________
>>>>> Computer-go mailing list
>>>>> Computer-go@computer-go.org
>>>>>
>>>>> http://computer-go.org/mailman/listinfo/computer-go
>>>
