On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote:
> Could use late-move reductions to eliminate the hard pruning. Given
> the accuracy rate of the policy network, I would guess that even move
> 2 should be reduced.
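To spell out what that suggestion would look like in code, here is a minimal sketch of late-move reductions in a plain negamax search over a made-up toy tree. The Node class, the branching factor, the reduction schedule, and the assumption that a policy network has already sorted the moves best-first are all invented for illustration, not taken from any real engine.

import random

random.seed(0)

class Node:
    """Toy game-tree node: a random static score and a fixed branching factor."""
    def __init__(self, depth_left):
        self.score = random.uniform(-1.0, 1.0)
        self.depth_left = depth_left
        self._children = None

    def moves(self):
        # Pretend a policy network has already sorted these best-first.
        if self._children is None:
            self._children = ([] if self.depth_left == 0 else
                              [Node(self.depth_left - 1) for _ in range(8)])
        return self._children

def negamax(node, depth, alpha, beta):
    children = node.moves()
    if depth <= 0 or not children:
        return node.score
    best = -float("inf")
    for i, child in enumerate(children):
        # Late-move reduction: the first couple of moves get full depth,
        # later (presumably worse) moves are searched shallower, but no
        # move is ever dropped outright.
        reduction = 1 if i >= 2 and depth >= 2 else 0
        score = -negamax(child, depth - 1 - reduction, -beta, -alpha)
        # If a reduced move unexpectedly beats alpha, re-search it at
        # full depth before trusting the result.
        if reduction and score > alpha:
            score = -negamax(child, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # ordinary beta cutoff
    return best

root = Node(depth_left=4)
print("root value:", negamax(root, 4, -float("inf"), float("inf")))

The point of the sketch is only that a "late" move still gets searched, just shallower, and is re-searched at full depth if it surprises us.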
The question I always ask is: what's the real difference between MCTS with a small UCT constant and an alpha-beta search with heavy Late Move Reductions? Are the explored trees really so different?

In any case, in my experiments Monte Carlo still gives a strong benefit, even with a not-so-strong Monte Carlo part. IIRC that was the case for AlphaGo too (and they used more training data for the value network than is publicly available), and Zen reported the same: Monte Carlo is important.

The main problem is the "only top x moves" part. Late Move Reductions are very nice because there is never a full pruning. The hard pruning by the policy network, on the other hand, seems to be an issue for me: my program has big tactical holes.

--
GCP
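To make the MCTS side of that comparison concrete, here is a toy sketch of AlphaGo-style PUCT selection with invented priors and a constant playout value; it only shows how a small exploration constant concentrates essentially all visits on the top policy move, which is why the explored tree starts to resemble that of a heavily reduced alpha-beta search. None of the numbers come from a real program.

import math

def select(children, c_puct):
    """Pick the child maximising Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    n_parent = sum(ch["visits"] for ch in children) + 1
    def score(ch):
        # Unvisited children get Q = 0 here; a real program would use some
        # first-play-urgency value instead.
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["prior"] * math.sqrt(n_parent) / (1 + ch["visits"])
        return q + u
    return max(children, key=score)

# Toy root position: one move with a strong policy prior and a few weak ones,
# all with the same (pretend) playout result, so only the prior and the
# exploration constant decide where the visits go.
priors = (0.60, 0.15, 0.10, 0.08, 0.07)

for c_puct in (0.05, 2.0):
    children = [{"prior": p, "visits": 0, "value_sum": 0.0} for p in priors]
    for _ in range(400):
        ch = select(children, c_puct)
        ch["visits"] += 1
        ch["value_sum"] += 0.5   # every playout "returns" the same value
    print(f"c_puct={c_puct}: visits={[ch['visits'] for ch in children]}")

With c_puct = 0.05 virtually every visit goes to the first move, while with c_puct = 2.0 the visits spread out roughly in line with the priors, so the small-constant tree is about as selective as one shaped by aggressive reductions.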