Re: [computer-go] MoGo paper at ICML

2007-06-25 Thread Christian Nilsson

Hi,

In the paper you only present results of UCT_RAVE with the MoGo
default policy. Did you run tests with UCT_RAVE using pure random
playouts too?

I'm curious because I've tried millions (well, it feels that way) of
uses for AMAF in my code... but so far all of them have proven
useless, often yielding worse results.

/Christian Nilsson


On 6/23/07, Sylvain Gelly [EMAIL PROTECTED] wrote:

Hello all,

We just presented our paper describing MoGo's improvements at ICML,
and we thought we would pass on some of the feedback and corrections
we have received.
(http://www.machinelearning.org/proceedings/icml2007/papers/387.pdf)

The way we incorporate prior knowledge into UCT can be seen as a
Bayesian prior, and corresponds exactly to a Dirichlet prior (more
precisely, to a Beta prior, since here we are dealing with binomials).
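To make the correspondence concrete, here is a minimal Python sketch (an
illustration only, not MoGo's actual code) of what adding prior knowledge
to a node means: with a Beta(a, b) prior on a move's win probability, the
posterior mean after some real play-outs is (wins + a) / (visits + a + b),
which is exactly what you get by initializing the node's counters with
m_prior virtual play-outs played at the prior's win rate.  The name
prior_winrate below is ours, purely for illustration.

class Node:
    def __init__(self, m_prior=0, prior_winrate=0.5):
        # Virtual experience contributed by the prior:
        # m_prior fake play-outs at the prior's estimated win rate.
        self.visits = m_prior
        self.wins = m_prior * prior_winrate

    def update(self, won):
        # Record one real play-out result (1 = win, 0 = loss).
        self.visits += 1
        self.wins += won

    def value(self):
        # Posterior mean of the win probability (prior + real play-outs).
        return self.wins / self.visits if self.visits > 0 else 0.5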

The cumulative result is given only with the prior knowledge added on
top of RAVE, but it could have been done the other way round with the
same kind of results.  Each individual improvement is largely
independent of the others.

In Figure 5, the label of the horizontal axis should be m_prior rather
than n_prior.

All experiments (except those on the default policy) were played
against GnuGo level 10, not level 8.

Any other comments are welcome!
Sylvain & David
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] scalability study - final results

2007-06-25 Thread Don Dailey
Someone just reminded me of the scalability study I did a few months
back, when I reported that I would continue to run it for perhaps a few
more weeks.

I did run about 20% more games, and the extra data was quite useful
because it increased the number of games sampled at the highest levels.
I had started the highest-level program late, but the auto-tester is
designed to try to equalize the number of games played by each player.

As a reminder, the study was designed to test the improvement of
modern UCT programs as the number of play-outs increases.  In the
study, I had two basic versions, each tested at 12 different levels.

The L series is Lazarus running with light play-outs and the H series
is Lazarus running with heavy play-outs.  Since the study, Lazarus has
actually improved significantly, so these are both older versions of
Lazarus - still relatively strong, and perhaps better candidates for a
study of this type, since the older programs tend to be more universal
(less prone to serious intransitivities).

I don't have a graph like I did before, but one can easily be
constructed from the data:

--- Player Key ---

   H_  is the heavy play-out version of Lazarus
   L_  is the light play-out version of Lazarus

   The numeric portion of the player name gives how
   many play-outs were executed to play each move.

PLAYER      TIME/GME (sec)   RATING   GAMES    Total games: 2895
--------    --------------   ------   -----
H_2048            13350.17   2830.2     168
H_1024             6693.84   2768.0     169
H_0512             3147.28   2547.3     168
H_0256             1547.30   2399.3     168
L_2048             4549.37   2375.5     168
H_0128              758.64   2315.7     168
L_1024             2203.88   2287.8     169
H_0064              381.00   2240.3     339
L_0512             1064.80   2174.1     168
H_0032              214.12   2129.2     318
L_0256              523.12   2105.7     168
L_0128              258.54   2097.8     170
gg-3.7.9             68.97   2000.0     307    Standard GnuGo 3.7.9
L_0064              134.17   1981.7     293
H_0016              125.93   1950.2     284
L_0032               72.72   1941.5     284
H_0008               62.27   1872.4     276
L_0016               43.49   1758.6     261
H_0004               31.22   1679.1     253
L_0008               21.07   1556.2     248
H_0002               14.90   1402.1     250
L_0004               10.55   1347.0     248
L_0002                5.03   1123.6     248
H_0001                7.44   1031.6     249
L_0001                2.49    863.6     248



Observations:

 If you look at the entire range of the HEAVY player, you will notice
 that each doubling (on average) was worth 164 ELO points.

 You will also notice a gradual falloff in improvement as the levels
 increase.

 As a general rule of thumb, there is about 150 ELO per doubling.  I
 figured this by throwing out the highest and lowest rated HEAVY player
 and averaging the increase per doubling.  It seems pragmatic to throw
 out the 2 extremes based on empirical observation - I have always
 noticed that in a pool of players the highest and lowest often have
 at least somewhat distorted ratings.  

 After throwing out the low and high ratings the top 5 players average
 about 132 ELO per doubling and the bottom 5 average an increase of
 about 210 per doubling.

 So there is a definite decrease per doubling, but it's quite gradual.
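For anyone who wants to check the arithmetic, here is a small Python
sketch that recomputes the per-doubling averages directly from the HEAVY
ratings in the table above (this is just the calculation described in the
text, not part of the auto-tester):

# HEAVY ratings ordered from 1 to 2048 play-outs per move.
heavy = [1031.6, 1402.1, 1679.1, 1872.4, 1950.2, 2129.2,
         2240.3, 2315.7, 2399.3, 2547.3, 2768.0, 2830.2]

# Average rating gain per doubling over the whole range.
gains = [b - a for a, b in zip(heavy, heavy[1:])]
print(sum(gains) / len(gains))        # ~163.5, the "164 ELO per doubling"

# The same average after throwing out the lowest and highest rated players.
trimmed = heavy[1:-1]
tgains = [b - a for a, b in zip(trimmed, trimmed[1:])]
print(sum(tgains) / len(tgains))      # ~151.8, the "about 150" rule of thumb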


I did a similar study with 7x7 and found that the tapering is extremely
pronounced.  It was quite obvious which komi to use, because if it was
too low black won every game, and if it was too high white won every
game.  The tapering was pronounced because at higher levels the play
was very close to perfect.  If you are playing perfectly, there is no
improvement to be had by doubling.

It appears, as a general rule of thumb (and this is supported by
empirical evidence in similar studies of other games), that the
rating/resource curve is almost linear when you are far away from
perfect play, but the tapering becomes pronounced as you approach
perfect play.  I suspect Lazarus at the highest level I tested is
within a few hundred ELO points of perfect play.  It's still a long
way off, especially considering that Lazarus at the highest level was
spending almost 4 hours on each 9x9 game!

My auto-tester stores all data, including the configuration, in a
single sqlite3 database.  In it are the SGF game records, individual
results, and even the time spent on each move, and it is available to
anyone who wants it upon request - so you can analyze the results for
yourself and come to your own conclusions!
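As a hypothetical example of the kind of analysis the database allows,
here is a short Python sketch.  The schema of the database is not
described in this post, so the table and column names below (games,
white, result) and the SGF-style result strings are assumptions - inspect
the real schema first and adjust accordingly.

import sqlite3

con = sqlite3.connect("scale.db")

# Assumed schema: count the games each player won when playing White.
query = ("SELECT white, COUNT(*) FROM games "
         "WHERE result LIKE 'W+%' GROUP BY white ORDER BY 2 DESC")
for player, wins in con.execute(query):
    print(player, wins)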

- Don



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] MoGo paper at ICML

2007-06-25 Thread Christian Nilsson

On 6/25/07, Sylvain Gelly [EMAIL PROTECTED] wrote:

I have to admit that it took me several weeks to make the RAVE
algorithm actually work, although the idea is so simple. That may
explain your previous results.
The description in the paper should be sufficient to make it work well.
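For anyone looking for a starting point, here is a minimal sketch of the
core RAVE idea (a rough illustration only - see the paper for the exact
formulation and schedule MoGo uses): each move keeps ordinary play-out
statistics plus AMAF statistics (play-outs in which the move was played
at any later point), and its value blends the two, trusting the AMAF
estimate less as real visits accumulate.  The constant k and the schedule
below are assumptions made for the sketch.

import math

class RaveStats:
    def __init__(self, k=1000.0):
        self.k = k           # "equivalence" parameter; value chosen arbitrarily
        self.n = 0           # real play-outs that started with this move
        self.w = 0.0         # wins among those
        self.n_amaf = 0      # play-outs in which this move appeared later
        self.w_amaf = 0.0    # wins among those

    def value(self):
        q = self.w / self.n if self.n else 0.5
        q_amaf = self.w_amaf / self.n_amaf if self.n_amaf else 0.5
        # Weight on the AMAF estimate: starts near 1 and decays toward 0
        # as the number of real play-outs grows.
        beta = math.sqrt(self.k / (3.0 * self.n + self.k))
        return (1.0 - beta) * q + beta * q_amaf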


Ok, I'll just have to work harder then. :)


Thanks,
Christian
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] scalability study - final results

2007-06-25 Thread dhillismail

 These are very interesting results. Thanks for doing all this work.
- Dave Hillis


Re: [computer-go] scalability study - final results

2007-06-25 Thread Sylvain Gelly

Hi Don,

This is a very interesting study!

Sylvain

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] evalgo autotesting system

2007-06-25 Thread Heikki Levanto
On Mon, Jun 25, 2007 at 03:17:06PM -0400, Don Dailey wrote:
 However I'm releasing my testing system to the public if anyone is
 interested.
 It may have some features in it that you will be interested in.

Thanks. Looks promising.

 It DOES require having sqlite3 and a little bit (but not much) SQL
 knowledge.  You DO have to manually insert registry records into the
 database to specify who the players are and how they should be invoked,
 but no other SQL knowledge is required beyond this - a report is
 furnished by the program, or if you want you can manually query the
 database to find out anything you want about the games.

I have it running, but at the moment I have no idea how to specify my
program for it. sqlite3 seems not to understand the "describe" command
that I normally use to sort out the table layouts before inserting
anything. Maybe you could add a one-line example to the README on how
to add a program (say GNU Go) as a player.
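For reference, sqlite3 has no "describe", but the shell commands .tables
and .schema do the same job, and from code you can use
PRAGMA table_info(table).  A small Python sketch follows - the database
file name is just a placeholder:

import sqlite3

con = sqlite3.connect("test.db")   # substitute the autotester's database file

# The sqlite equivalent of "describe": list every table and its columns.
for (name,) in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"):
    cols = con.execute("PRAGMA table_info(%s)" % name).fetchall()
    print(name, [c[1] for c in cols])   # c[1] is the column name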


Regards

  Heikki

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] evalgo autotesting system

2007-06-25 Thread Heikki Levanto
On Mon, Jun 25, 2007 at 04:33:47PM -0400, Don Dailey wrote:
 Here is how you might set up gnugo:

Thanks! That certainly looks like enough to get me going.

- Heikki

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] evalgo autotesting system

2007-06-25 Thread Don Dailey
Actually, the info table defaults to boardsize 9 and 7.5 komi, so to
change this you need to either delete and re-insert or update the
first row.
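For what it's worth, the kind of update described above might look
something like the following hypothetical sketch.  The column names
boardsize and komi are guesses based on the defaults mentioned, and the
database file name is a placeholder - check the real schema first (for
example with PRAGMA table_info(info)):

import sqlite3

con = sqlite3.connect("test.db")   # substitute the autotester's database file

# Change the defaults in the first row of the info table,
# e.g. to a 19x19 board with komi 6.5.
con.execute("UPDATE info SET boardsize = ?, komi = ? "
            "WHERE rowid = (SELECT MIN(rowid) FROM info)", (19, 6.5))
con.commit()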

- Don


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] evalgo autotesting system

2007-06-25 Thread Don Dailey
Let me know if you get it working.

- Don



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] scalability study - final results

2007-06-25 Thread terry mcintyre
Don,

That's exciting!  If Lazarus with heavy playouts can come within a few
hundred points of perfect play on a 9x9 board, in less than 4 hours of total
game time, then it should do rather well on turn-based servers such as the
Dragon Go Server.  A 30-day clock should be more than adequate.  That would
be something of a milestone: trouncing strong human players on the 9x9
board, with no excuses about the humans running out of time.

 
Terry McIntyre [EMAIL PROTECTED]
They mean to govern well; but they mean to govern. They promise to be kind 
masters; but they mean to be masters. -- Daniel Webster





 

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] scalability study - final results

2007-06-25 Thread elife

Hi Don,

 Thanks for doing this valuable work.
 Where can we get the data? I am interested in it.

Cai Qiang
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] scalability study - final results

2007-06-25 Thread Don Dailey
I believe humans play much stronger too at those time controls.  Unless
of course they are playing many games and are not really focused on any
particular game.  

In fact, I'm quite convinced that a human really trying hard on a
turn-based server would be a formidable opponent - playing much stronger
than he normally would over the board.  But then, so would the program!

- Don


 

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] scalability study - final results

2007-06-25 Thread Don Dailey

I put everything on that web site:

 Just go to  http://www.greencheeks.homelinux.org:8015/

and you can get:
 - the games from April, May and June from CGOS,
 - the autotester,
 - the CGOS server, and
 - scale.db, the data in sqlite3 format from the scalability study.

- Don




 
 

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] scalability study - final results

2007-06-25 Thread Darren Cook
  After throwing out the low and high ratings the top 5 players average
  about 132 ELO per doubling and the bottom 5 average an increase of
  about 210 per doubling.
 ...
 I suspect Lazarus at
 the highest level I tested is within a few hundred ELO points of
 perfect play.  It's still a long way off, especially considering that
 Lazarus at the highest level was spending almost 4 hours on each 9x9
 game!

You're suggesting that it would be practically perfect with, say, three
more doublings (another 3*132 ≈ 400 ELO points), which is only 32 hours
per game. At that level play should be relatively stable (statistically),
and it would be great to run just 2 games of self-play (128 hours = 5
days?) and study the game records.

Do you feel that at these strong levels the experiment is distorted by
the lack of an equally strong non-UCT program? UCT programs tend to be
weaker in the opening and stronger at the end. Once they reach the level
where GnuGo is cannon fodder, it is just a self-play experiment, and
their remaining weaknesses are not being exploited.

Darren
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/