valky...@phmp.se: <19f31e7e5cdf310b9afa91f577997...@phmp.se>:
>I think you misunderstood what I wrote:
>if perfect play on 9x9 is 6000 Elo, and the value function is 3000
>Elo while MC eval is 2000 Elo with 1 second of thinking time, then
>it might be that the combination of a value function and MC eval
>ends up at 2700 Elo. It could also be that it ends up at 3200 Elo.
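(As an aside, the standard Elo expected-score formula makes those
hypothetical ratings concrete. A minimal sketch in Python; the
6000/3000/2000 figures are just the ones assumed above:)

    # Expected score under the Elo model: E = 1 / (1 + 10^((Rb - Ra) / 400)).
    def expected_score(rating_a, rating_b):
        """Expected score of A against B under the standard Elo model."""
        return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

    perfect, value_net, mc_eval = 6000, 3000, 2000  # hypothetical ratings
    print(expected_score(value_net, mc_eval))  # ~0.997: 3000 beats 2000 almost always
    print(expected_score(value_net, perfect))  # ~3e-8: 3000 almost never beats 6000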
>
>I personally believe MC eval alone can be very strong, so it might be
>that the capacity of a neural network is not enough to replace MC eval.
>
>When I wrote "converge to perfection" it was not to claim that the Zero
>approach reaches perfection, just that it gets stronger over time.

How do we guarantee that the improvement does not stop or 
oscillate?  Actually, the first instance (the 40-layer version) 
of AlphaGo Zero stopped improving after three days (at 
least it looks that way).

>The interesting question is if old school MC evaluation can fill up the 
>knowledge gaps of the value function.

That is exactly my point.  Value networks approximate the value 
function only in a very rough manner (i.e., its smoother parts), 
due to not enough freedom and/or samples, while MC rollouts can 
implement (maybe partly) the detailed parts of the function, so 
mixing the two could yield a better approximation (in 
theory :).
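For what it is worth, this is exactly the mixing the original AlphaGo
paper used at the leaves: V(s) = (1 - lambda) * v_theta(s) + lambda * z,
where v_theta is the value network, z is a rollout result, and
lambda = 0.5. A minimal sketch in Python; value_net and run_rollout
are hypothetical stand-ins for an engine's own evaluators:

    # Mixed leaf evaluation in the style of the 2016 AlphaGo paper:
    # the value network supplies the smooth, rough part of the function
    # and the rollout supplies (part of) the detail, as argued above.
    # value_net and run_rollout are hypothetical stand-ins.
    def mixed_eval(position, value_net, run_rollout, lam=0.5):
        v = value_net(position)    # scalar in [-1, 1], current player's view
        z = run_rollout(position)  # +1 win / -1 loss from a Monte-Carlo rollout
        return (1.0 - lam) * v + lam * z

With lam = 0 you trust the network alone (the Zero choice), and with
lam = 1 you are back to pure rollouts, so the threshold Magnus
describes below corresponds to how far from 0 the best lam sits.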

Hideki

>For my own Odin project I am not working on the MC evaluation currently, 
>since only when I have a final neural network solution can I see which 
>weaknesses I need to fix.
>
>Best
>Magnus
>
>
>On 2018-03-05 21:12, Hideki Kato wrote:
>> DCNNs are not magic but just non-linear continuous function
>> approximators with finite freedom and we can provide up to
>> 10^8 samples (board positions) in practice.
>> 
>> Why do most people believe a VN can approximate the (perfect or
>> near-perfect) value function?  What do they estimate the complexity
>> of the value function for 19x19 Go to be?
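(To put the 10^8 figure in perspective: Tromp counted roughly
2.08 * 10^170 legal 19x19 positions, so even a perfectly spread
training set touches a vanishing fraction of the state space, and
generalization has to do essentially all of the work. Quick
arithmetic in Python:)

    # Fraction of the 19x19 state space that 10^8 samples can touch.
    legal_positions = 2.08e170  # Tromp's count of legal 19x19 positions
    samples = 1e8               # practical upper bound quoted above
    print(samples / legal_positions)  # ~4.8e-163, essentially zero coverage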
>> 
>> valky...@phmp.se: <aa3cca138c40cbd620700cc36e950...@phmp.se>:
>>> My guess is that there is some kind of threshold depending on the
>>> relative strength of MC eval and the value function of the NN.
>>> 
>>> If the value function is stronger than MC eval, I would guess MC eval
>>> turns into a bad, noisy feature with little benefit.
>>> 
>>> Depending on how strong MC eval is, this threshold is probably very
>>> different between engines. Also I can imagine that the NN value
>>> function can have some gaping holes in its knowledge that even simple
>>> MC eval can patch up. That is probably true for supervised learning,
>>> where the training data probably has a lot of holes, since bad moves
>>> are not in the data.
>>> 
>>> The Zero approach is different because it should converge to
>>> perfection in the limit, and thus overcome any weaknesses of the
>>> value function early on. At least in theory.
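(The "in the limit" claim leans on the fixed-point argument behind
value iteration: the Bellman backup is a contraction, so repeated
self-improvement converges to the optimal values. A toy illustration
in Python on a small random MDP, a stand-in since Go is far too large
to tabulate:)

    # Value iteration on a random finite MDP: the sup-norm change between
    # successive value estimates shrinks geometrically (by a factor gamma),
    # which is the convergence property the Zero argument appeals to.
    import numpy as np

    rng = np.random.default_rng(0)
    S, A, gamma = 20, 4, 0.95
    P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = next-state distribution
    R = rng.normal(size=(S, A))                 # immediate rewards

    V = np.zeros(S)
    for i in range(201):
        Q = R + gamma * P @ V       # one-step Bellman backup, shape (S, A)
        V_new = Q.max(axis=1)       # greedy policy improvement
        if i % 50 == 0:
            print(i, np.abs(V_new - V).max())  # change shrinks each step
        V = V_new

(Whether a finite network trained on roughly 10^8 samples actually
tracks that limit is, of course, the question above.)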
>>> 
>>> 
>>> On 2018-03-05 14:04, Gian-Carlo Pascutto wrote:
>>>> On 5/03/2018 12:28, valky...@phmp.se wrote:
>>>>> Remi twittered more details here (see the discussion with gghideki:
>>>>> 
>>>>> https://twitter.com/Remi_Coulom/status/969936332205318144
>>>> 
>>>> Thank you. So Remi gave up on rollouts as well. Interesting
>>>> "difference of opinion" there with Zen.
>>>> 
>>>> Last time I tested this in regular Leela, playouts were beneficial,
>>>> but this was before combined value+policy nets and much more
>>>> training data was available. I do not know what the current status
>>>> would be.
>>> 
-- 
Hideki Kato <mailto:hideki_ka...@ybb.ne.jp>