Oh, I see. I believe I am, in fact, using Tromp-Taylor rules for scoring. I was 
unaware that that's what it was called.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, December 9, 2018 10:09 PM, cody2007 <cody2...@protonmail.com> wrote:

> Sorry, just to make sure I understand: your concern is the network may be 
> learning from the scoring system rather than through the self-play? Or are 
> you concerned the scoring is giving sub-par evaluations of games?
>
> The scoring I use simply counts the number of stones each player has on the 
> board, then adds a point for each unoccupied space that is completely 
> surrounded by a single player. It is simplistic, and I think it does give 
> sub-par evaluations of who the winner is--and it is definitely a potentially 
> serious obstacle to better performance. How much? Maybe a lot. What do you 
> think?
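>
> For concreteness, that counting is roughly the following (a minimal Python 
> sketch; the board representation and names are illustrative, not lifted 
> from my actual code):
>
> # Tromp-Taylor-style area counting on an NxN board.
> # Cells: 0 = empty, 1 = black, 2 = white.
> def area_score(board):
>     n = len(board)
>     score = {1: 0, 2: 0}
>     # Each stone on the board is one point for its owner.
>     for row in board:
>         for cell in row:
>             if cell:
>                 score[cell] += 1
>     # Flood-fill each empty region; it counts for a color only if
>     # every stone bordering the region is that single color.
>     seen = set()
>     for i in range(n):
>         for j in range(n):
>             if board[i][j] != 0 or (i, j) in seen:
>                 continue
>             size, borders, stack = 0, set(), [(i, j)]
>             seen.add((i, j))
>             while stack:
>                 x, y = stack.pop()
>                 size += 1
>                 for nx, ny in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
>                     if 0 <= nx < n and 0 <= ny < n:
>                         if board[nx][ny] == 0:
>                             if (nx, ny) not in seen:
>                                 seen.add((nx, ny))
>                                 stack.append((nx, ny))
>                         else:
>                             borders.add(board[nx][ny])
>             if len(borders) == 1:
>                 score[borders.pop()] += size
>     return score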
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, December 9, 2018 9:31 PM, uurtamo <uurt...@gmail.com> wrote:
>
>> Imagine that your score estimator has a better idea about the outcome of the 
>> game than the players themselves.
>>
>> Then you can build a stronger computer player with the following algorithm: 
>> use the score estimator to pick the next move after evaluating all legal 
>> moves, by evaluating their after-move scores.
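>>
>> In code terms, the construction is just this (a sketch; legal_moves, play, 
>> and estimate_score are hypothetical helpers, not any particular library):
>>
>> # One-ply greedy player driven purely by the score estimator.
>> def greedy_move(board, color):
>>     opp = 2 if color == 1 else 1
>>     best_move, best_margin = None, float("-inf")
>>     for move in legal_moves(board, color):
>>         after = play(board, move, color)    # board copy after the move
>>         scores = estimate_score(after)      # e.g. {1: points, 2: points}
>>         margin = scores[color] - scores[opp]
>>         if margin > best_margin:
>>             best_move, best_margin = move, margin
>>     return best_move
>>
>> If this one-ply player beats the trained players, the estimator rather 
>> than the self-play is doing the work.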
>>
>> If you use something like Tromp-Taylor (not sure what most people use 
>> nowadays) then you can score it less equivocally.
>>
>> Perhaps I was misunderstanding, but if not then this could be a somewhat 
>> serious problem.
>>
>> s
>>
>> On Sun, Dec 9, 2018, 6:17 PM cody2007 <cody2...@protonmail.com> wrote:
>>
>>>>By the way, why only 40 moves? That seems like the wrong place to 
>>>>economize, but maybe on 7x7 it's fine?
>>> I haven't implemented any resign mechanism, so I felt it was a reasonable 
>>> balance to at least see where the players roughly stand. Although I think 
>>> I erred on the side of too few turns.
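>>>
>>> Concretely, the cap is just a fixed bound on the self-play loop (a sketch; 
>>> select_move and apply_move are hypothetical helpers):
>>>
>>> MOVES_PER_PLAYER = 20                  # 40 moves total per game
>>>
>>> def self_play_game(board):
>>>     for turn in range(2 * MOVES_PER_PLAYER):
>>>         color = 1 if turn % 2 == 0 else 2
>>>         board = apply_move(board, select_move(board, color), color)
>>>     return board                       # scored afterward by area counting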
>>>
>>>>A "scoring estimate" by definition should be weaker than the computer 
>>>>players it's evaluating until there are no more captures possible.
>>> I'm not sure I understand entirely, but I would agree that the scoring I 
>>> use is probably a limitation here.
>>>
>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>> On Sunday, December 9, 2018 8:51 PM, uurtamo <uurt...@gmail.com> wrote:
>>>
>>>> A "scoring estimate" by definition should be weaker than the computer 
>>>> players it's evaluating until there are no more captures possible.
>>>>
>>>> Yes?
>>>>
>>>> s.
>>>>
>>>> On Sun, Dec 9, 2018, 5:49 PM uurtamo <uurt...@gmail.com> wrote:
>>>>
>>>>> By the way, why only 40 moves? That seems like the wrong place to 
>>>>> economize, but maybe on 7x7 it's fine?
>>>>>
>>>>> s.
>>>>>
>>>>> On Sun, Dec 9, 2018, 5:23 PM cody2007 via Computer-go 
>>>>> <computer-go@computer-go.org> wrote:
>>>>>
>>>>>> Thanks for your comments.
>>>>>>
>>>>>>> It looks like you made it work on 7x7. 19x19 would probably give 
>>>>>>> better results, especially against yourself if you are a complete 
>>>>>>> novice.
>>>>>> I'd expect that would make me win even more against the algorithm, 
>>>>>> since it would explore a far smaller fraction of the search space, 
>>>>>> right? Certainly something I'd be interested in testing, though--I'd 
>>>>>> expect it would take many more months of training, but it would be 
>>>>>> interesting to see how much performance falls apart, if at all.
>>>>>>
>>>>>>> To avoid cheating against gnugo, use gnugo's --play-out-aftermath 
>>>>>>> parameter.
>>>>>> Yep, I evaluate with that parameter. The problem is more that I only 
>>>>>> play 20 turns per player per game, and the network seems to like 
>>>>>> placing stones in territories "owned" by the other player; my scoring 
>>>>>> system then no longer counts that area as owned by that player. Playing 
>>>>>> out more turns and/or using a more sophisticated scoring system would 
>>>>>> probably fix this.
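>>>>>>
>>>>>> (For reference, the evaluation call is along these lines -- the flags 
>>>>>> other than --play-out-aftermath are from memory:
>>>>>>
>>>>>> gnugo --mode gtp --boardsize 7 --play-out-aftermath
>>>>>>
>>>>>> so that gnugo plays out the aftermath and resolves dead groups before 
>>>>>> scoring.)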
>>>>>>
>>>>>>> If I'm not mistaken, a competitive AI would need a lot more training, 
>>>>>>> such as what Leela Zero does: https://github.com/gcp/leela-zero
>>>>>> Yeah, I agree more training is probably the key here. I'll take a look 
>>>>>> at leela-zero.
>>>>>>
>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>> On Sunday, December 9, 2018 7:41 PM, Xavier Combelle 
>>>>>> <xavier.combe...@gmail.com> wrote:
>>>>>>
>>>>>>> It looks like you made it work on 7x7. 19x19 would probably give 
>>>>>>> better results, especially against yourself if you are a complete 
>>>>>>> novice.
>>>>>>>
>>>>>>> To avoid cheating against gnugo, use gnugo's --play-out-aftermath 
>>>>>>> parameter.
>>>>>>>
>>>>>>> If I'm not mistaken, a competitive AI would need a lot more training, 
>>>>>>> such as what Leela Zero does: https://github.com/gcp/leela-zero
>>>>>>>
>>>>>> On 10/12/2018 at 01:25, cody2007 via Computer-go wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've posted an implementation of the AlphaZero algorithm and a brief 
>>>>>>>> tutorial. The code runs on a single GPU. While performance is not 
>>>>>>>> that great, I suspect it's mostly been limited by hardware (my 
>>>>>>>> training and evaluation have been on a single Titan X). The network 
>>>>>>>> can beat GNU Go about 50% of the time, although it "abuses" the 
>>>>>>>> scoring a little bit--which I talk a little more about in the article:
>>>>>>>>
>>>>>>>> https://medium.com/@cody2007.2/alphazero-implementation-and-tutorial-f4324d65fdfc
>>>>>>>>
>>>>>>>> -Cody
>>>>>>>>
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
