On 24-07-17 16:07, David Wu wrote:
> Hmm. Why would discounting make things worse? Do you mean that you
> want the top move to drop off slower (i.e. for the bot to take longer
> to achieve the correct valuation of the top move) to give it "time"
> to search the other moves enough to find that they're also bad?

I don't want the top move to drop off slower; I just don't want to play
other moves until they've been searched to comparable "depth".

If there's a disaster lurking behind the main variation that we have
only just started to understand, odds are that the same disaster also
lurks in a few of the alternative moves.

> I would have thought that with typical exploration policies, whether
> the top move drops off a little faster or a little slower, once its
> winrate drops down close to the other moves, the other moves should
> get a lot of simulations as well.

Yes. But the goal of the discounting is that a new move can make it
above the old one, despite having had less total search effort.

My point is that it is not always clear this is a positive effect.
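
To make the concern concrete, here is a minimal sketch of what I mean.
It assumes one particular discounting formula (exponential decay of a
move's statistics on every update), which is only an assumption since
no formula has been fixed in this thread. MoveStats, gamma, simulate
and the result sequences are all hypothetical and not taken from any
engine.

```python
from dataclasses import dataclass


@dataclass
class MoveStats:
    visits: float = 0.0   # discounted visit count (equals playouts when gamma == 1.0)
    wins: float = 0.0     # discounted sum of results
    playouts: int = 0     # raw search effort, never discounted

    def update(self, result: float, gamma: float) -> None:
        # Assumed formula: decay the old statistics, then add the new result.
        # gamma == 1.0 gives the plain running average.
        self.visits = gamma * self.visits + 1.0
        self.wins = gamma * self.wins + result
        self.playouts += 1

    @property
    def winrate(self) -> float:
        return self.wins / self.visits if self.visits > 0 else 0.0


def simulate(gamma: float):
    """Feed a fixed, made-up result sequence to an old and a new move."""
    old, new = MoveStats(), MoveStats()
    # Old main variation: 1000 playouts at 60% wins ...
    for i in range(1000):
        old.update(1.0 if i % 5 < 3 else 0.0, gamma)
    # ... then the search finds the disaster: 100 losses in a row.
    for _ in range(100):
        old.update(0.0, gamma)
    # New alternative: only 300 playouts at 50% wins, disaster not found yet.
    for i in range(300):
        new.update(1.0 if i % 2 == 0 else 0.0, gamma)
    return old, new


if __name__ == "__main__":
    for gamma in (1.0, 0.99):
        old, new = simulate(gamma)
        print(gamma, round(old.winrate, 3), round(new.winrate, 3),
              old.playouts, new.playouts)
```

With gamma = 1.0 the old move stays ahead on winrate. With gamma = 0.99
the new move ranks above it on only 300 playouts against 1100. Whether
that is an improvement depends entirely on whether the new move hides
the same disaster, which the sketch cannot tell you.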

> I know that there are ways to handle this at the root, via time
> control or otherwise.

The situation isn't necessarily different here, if you consider that at
the root the best published technique is still "think longer so the new
move can overtake the old one", not "play the new move".
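
As a rough sketch of that root rule, under the assumption that the
engine plays the most-visited move and extends the search while another
move has the better winrate: the data layout and all function names
below are hypothetical, not any program's actual API.

```python
from typing import Dict, Tuple

# Hypothetical layout: move name -> (visits, winrate)
Stats = Tuple[int, float]


def most_visited(root: Dict[str, Stats]) -> str:
    return max(root, key=lambda m: root[m][0])


def best_winrate(root: Dict[str, Stats]) -> str:
    return max(root, key=lambda m: root[m][1])


def should_think_longer(root: Dict[str, Stats], can_afford_more_time: bool) -> bool:
    # Keep searching while the move with the most search effort and the
    # move with the best winrate disagree, as long as the clock allows it.
    return can_afford_more_time and most_visited(root) != best_winrate(root)


def move_to_play(root: Dict[str, Stats]) -> str:
    # The new move is only played once it has overtaken the old one in
    # visits, i.e. once it has been searched to a comparable degree.
    return most_visited(root)
```

The point of the design is that a better-looking winrate alone never
triggers a switch; it only buys the new move more search time.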

Anyway, not saying this can't work. Just pointing out the problem areas.

I would be a bit surprised if discounting worked for Go, because it's
been published for other areas (e.g. Amazons), but I don't remember any
reports of success in Go. But the devil can be in the details (i.e. the
discounting formula) for tricks like this.

-- 
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
