Hi Hendrik,
Thanks.
Congratulations, you have done really nice work. I checked your thesis. My
result is consistent with yours for LBR-2: no benefit at all, so I took it
off. I have adapted LGR-1 to Erica's softmax policy. Basically, I am tuning
the probability offset by checking some artificial test positions. At 3000
playouts, it now scores around 57% after 500 games, almost 60%, which is my
target (my intuition is that LGR-1 alone should already help a lot). :)
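(For readers following along: a minimal sketch of the kind of tuning described above, assuming a softmax playout policy where the stored last-good reply gets a fixed probability offset. The function names and the offset value are illustrative assumptions, not Erica's actual code.)

```python
import math

def softmax(scores):
    # Standard softmax over raw move scores (max-shifted for stability).
    m = max(scores.values())
    exps = {mv: math.exp(s - m) for mv, s in scores.items()}
    z = sum(exps.values())
    return {mv: e / z for mv, e in exps.items()}

def boosted_policy(scores, lgr_move, offset=0.3):
    """Hypothetical LGR-1 integration: add a probability offset to the
    stored last-good-reply move, then renormalize so the distribution
    still sums to 1. The offset is the parameter being tuned."""
    probs = softmax(scores)
    if lgr_move in probs:
        probs[lgr_move] += offset
        z = sum(probs.values())
        probs = {mv: p / z for mv, p in probs.items()}
    return probs
```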
Actually, I have one question and still can't figure out your reasoning. In a
playout, why do you overwrite the earlier replies with the later ones? Using
the earliest one looks more reasonable to me.
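(For concreteness, here is a minimal sketch of the two update rules in question, assuming an LGR-1-style table updated from the move sequence of a won playout. The function names and the "first occurrence per playout" reading of "earliest" are my assumptions.)

```python
def update_lgr_overwrite(reply_table, moves):
    """LGR-style update after a won playout: later replies overwrite
    earlier ones, so the table keeps the reply from the *latest*
    occurrence of each preceding move in the playout."""
    for prev, reply in zip(moves, moves[1:]):
        reply_table[prev] = reply

def update_lgr_keep_earliest(reply_table, moves):
    """The alternative asked about above: within one playout, keep the
    reply from the *earliest* occurrence of each preceding move."""
    seen = set()
    for prev, reply in zip(moves, moves[1:]):
        if prev not in seen:
            reply_table[prev] = reply
            seen.add(prev)
```

With the playout `['a', 'b', 'a', 'c']`, the first rule stores `c` as the reply to `a`, while the second keeps `b`.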
Aja
----- Original Message -----
From: "Hendrik Baier" <>
To: <[email protected]>
Sent: Wednesday, January 26, 2011 5:00 PM
Subject: Re: [Computer-go] Computer-go Digest, Vol 12, Issue 79
Hi Aja,
I would be interested in your results. I think the LGRF policy is only a
small first step in the direction of more adaptive playouts (and,
hopefully, of overcoming the horizon effect).
As for the Last-Bad-Reply idea, you can read about my experiences with
this and related policies in my Master's thesis, if you're interested. It
contains the idea that resulted in the "Power of Forgetting" paper as
well.
http://www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf
regards,
Hendrik
I admit that it's difficult for me to include such a deterministic default
policy. :-)
With a softmax policy, using "last-LOST-reply" information might be a good
direction.
Aja
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go