Re: [Bug-gnubg] pubeval benchmark

Mark Higgins Fri, 03 Feb 2012 08:00:49 -0800

If you're just looking at probability of win, the gammon node doesn't matter,
though of course if you want to look at equity then you'll get value from it.
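For reference, here is a minimal sketch of how the three outputs described further down in this thread (prob of win, prob of gammon win, prob of gammon loss) combine into cubeless equity in points per game. It assumes backgammons are ignored and that the win probability already includes gammon wins; the function name is hypothetical. With both gammon outputs at zero it reduces to 2*p_win - 1, which ranks positions exactly as p_win does, so a win-rate benchmark is unaffected by the gammon node.

```python
def cubeless_equity(p_win: float, p_gammon_win: float, p_gammon_loss: float) -> float:
    """Cubeless equity (points per game), ignoring backgammons.

    p_win includes gammon wins; gammons are worth 2 points, single games 1.
    """
    p_single_win = p_win - p_gammon_win
    p_single_loss = (1.0 - p_win) - p_gammon_loss
    # Expected points won minus expected points lost.
    return (p_single_win + 2 * p_gammon_win) - (p_single_loss + 2 * p_gammon_loss)


# Equivalent closed form: 2 * p_win - 1 + p_gammon_win - p_gammon_loss
print(cubeless_equity(0.6, 0.1, 0.05))  # about 0.25
```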

Using 80 hidden units and simple inputs, I got a player that wins 63% against
pubeval, and gnubg 0-ply (with more hidden units and extended inputs) wins
around 71%.

So 67.5% sounds a bit high but not unbelievable. (Over 20k matches the standard
error of the estimated win percentage is around +/- 0.35%, so 67.5% is
significantly different from 63% or 71%.)
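A quick way to check that figure, assuming the games are independent, is the usual binomial standard error sqrt(p(1-p)/n). The worst case p = 0.5 gives about 0.354% at n = 20,000, and p = 0.675 gives about 0.33%, so +/- 0.35% is a slightly conservative bound. The helper names below are hypothetical.

```python
import math


def win_rate_standard_error(win_rate: float, games: int) -> float:
    """Binomial standard error of an observed win fraction over independent games."""
    return math.sqrt(win_rate * (1.0 - win_rate) / games)


def standard_errors_apart(observed: float, reference: float, games: int) -> float:
    """How many standard errors of `observed` separate it from a fixed reference rate."""
    return (observed - reference) / win_rate_standard_error(observed, games)


print(win_rate_standard_error(0.5, 20_000))    # about 0.00354 (worst case)
print(win_rate_standard_error(0.675, 20_000))  # about 0.00331
print(standard_errors_apart(0.675, 0.63, 20_000))  # about 13.6
```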

It's surprisingly tricky to implement pubeval correctly - I had a bunch of 
mistakes in my first attempt, and gnubg's implementation also had a bug until 
recently.

One subtle implementation point: when comparing potential moves, you have to
make sure that you use the race or contact weights based on the starting
position, not on whether each potential move is contact or race. That's because
pubeval's evaluation function consists of separate linear regressions for
contact and race, so the results of the two regressions aren't sensibly
comparable (i.e., they don't represent probability of win).

That makes a smallish but noticeable difference to average pubeval performance.

On Feb 3, 2012, at 10:32 AM, boomslang wrote:

> 
> I have a net that won 67.5% out of 20k matches.
> It has 40 hidden units and relatively simple inputs (dummies for 1, 2 and
> more than 3 stones, and an integer for the excess over 3).
> 
> Note: it doesn't have a notion of gammons yet. Does this make it less
> comparable?
> 
> Regards, boomslang
> 
> From: Mark Higgins <migg...@gmail.com>
> To: bug-gnubg@gnu.org 
> Sent: Tuesday, 17 January 2012, 6:28
> Subject: [Bug-gnubg] pubeval benchmark
> 
> How does gnubg perform against the pubeval benchmark in cubeless play? 
> 
> I ask because I'm playing around with a backgammon network and have got one 
> that wins 83% of games and +0.945ppg against pubeval (10k cubeless games). 
> This is a single 80-hidden-node network with outputs for prob of win, prob of 
> gammon win, and prob of gammon loss; and just the original Tesauro inputs. 
> 0-ply.
> 
> But in the TD-Gammon scholarpedia article it says that TD-Gammon 2.1 in 1-ply 
> mode wins only +0.596ppg against pubeval. (I think 1-ply here means the gnubg 
> 0-ply.)
> 
> http://www.scholarpedia.org/article/Td-gammon
> 
> That seems really low compared to my result, since I'm pretty sure 2.1 had 
> gammon outputs and also extra customized inputs.
> 
> So I'm wondering if I'm interpreting this correctly, or if I have an 
> incorrectly-setup version of pubeval, or something like that.
> 
> 
> _______________________________________________
> Bug-gnubg mailing list
> Bug-gnubg@gnu.org
> https://lists.gnu.org/mailman/listinfo/bug-gnubg
