Re: [computer-go] How to design the stronger playout policy?

2008-01-05 Thread Gian-Carlo Pascutto

Yamato wrote:

I guess the current top programs have much better playout policy than
the classical MoGo-style one.

The original policy of MoGo was,

(1) If the last move put a group in atari, play one saving move at random.
(2) If there are interesting moves among the 8 positions around the
last move, play one at random.
(3) If there are moves that capture stones, play one at random.
(4) Play one random move on the board.
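The four rules above form a priority cascade, which can be sketched roughly as follows; computing the candidate list for each rule is the engine's job and is not shown (the four list arguments are hypothetical stand-ins for those computations):

```python
import random

def mogo_move(save_moves, pattern_moves, capture_moves, legal_moves, rng=random):
    """Pick one move following the original MoGo priority order.

    Each argument is the (possibly empty) candidate list for one rule:
    (1) moves saving a group the last move put in atari,
    (2) pattern-matched moves in the 8 points around the last move,
    (3) capturing moves, (4) all legal moves.
    """
    for candidates in (save_moves, pattern_moves, capture_moves):
        if candidates:
            return rng.choice(candidates)   # rules (1)-(3): uniform within the rule
    return rng.choice(legal_moves)          # rule (4): uniform over legal moves
```

Note the cascade is strict: a rule-(1) candidate always wins over any rule-(2) candidate, which is exactly the weakness discussed later in the thread.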

I (and maybe many others) use it with some improvements, however it
is not enough to catch up with the top programs.


What improvements did you try? The obvious ones I know of are prioritizing
saving and capturing moves by the size of the string.


Zen appears quite strong on CGOS. Leela using the above system was 
certainly weaker.



Then I tested many changes to the probability distributions, but
it was very hard to improve the strength.

Any comments?


I had the same problem, i.e. it seems almost impossible to improve the 
MoGo system by having a different pattern set for interesting moves, 
or even by varying the probability of interesting moves by pattern score.


I tried 2 things:

a) I extracted about 5000 positions with a known winner (determined by 
UCT) from CGOS games, and measured the Mean Square Error of the result 
of my playouts against the known result (also described in one of the 
MoGo papers). Then I applied a genetic algorithm to optimize the playout 
patterns.
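The MSE measurement described above could be sketched like this; `run_playout` is a hypothetical hook that plays one random playout from a position and returns 1 on a win for the side to move, 0 otherwise:

```python
def playout_mse(positions, run_playout, n_playouts=100):
    """Mean squared error of playout win-rate estimates against known results.

    positions    -- list of (position, known_result) pairs, where known_result
                    is 1.0 if the side to move is known (e.g. via deep UCT)
                    to win, else 0.0
    run_playout  -- hypothetical engine hook: one playout, returns 0 or 1
    """
    total = 0.0
    for pos, known in positions:
        wins = sum(run_playout(pos) for _ in range(n_playouts))
        winrate = wins / n_playouts          # playout estimate of the result
        total += (winrate - known) ** 2
    return total / len(positions)
```

A lower value means the playouts predict the known results better; as noted below, minimizing this did not translate into a stronger program.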


This worked, in the sense that the MSE measured over the 5000 positions 
dropped. However, it did not produce a stronger program! I found that 
somewhat shocking.


It makes me doubt the value of the MSE measure.

b) I made a simple genetic algorithm that creates a random pool of a few 
hundred playout policies, picks 2 random parents, crossovers/mutates them 
into 2 children, plays a 10 game match between the 2 children with 
simulations = 100, and then keeps the winner.
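In outline, that tournament scheme might look like the sketch below. All four callbacks are hypothetical hooks into a real engine, and exactly how the winner re-enters the pool is a guess (here it replaces a random pool member):

```python
import random

def evolve(pool, play_match, crossover, mutate, rounds=1000, rng=random):
    """Tournament-style GA over playout policies (a sketch).

    pool       -- mutable list of policies
    play_match -- hypothetical hook: play_match(a, b) returns the winning policy
    crossover  -- hypothetical hook: crossover(a, b) returns two children
    mutate     -- hypothetical hook: returns a mutated copy of a policy
    """
    for _ in range(rounds):
        a, b = rng.sample(pool, 2)            # two random parents
        c1, c2 = crossover(a, b)
        c1, c2 = mutate(c1), mutate(c2)
        winner = play_match(c1, c2)           # e.g. a 10-game match
        pool[rng.randrange(len(pool))] = winner
    return pool
```

With only 10 games per match the winner is close to a coin flip, which fits the "match results are simply too random" conclusion below.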


This did not produce anything interesting either. My best guess is that 
the match results are simply too random.


So I did not find any way to automatically optimize the patterns.

I finally improved my playouts by using Remi's ELO system to learn a set 
of interesting patterns, and just randomly fiddling with the 
probabilities (compressing/expanding) until something improved my 
program in self-play by about +25%. Not a very satisfying method or an 
exceptional result. There could be some other magic combination that is 
even better, or maybe not.


I now got some improvement by merging (1), (2) and (3) in the MoGo 
system and using probabilities across them. It makes sense because the 
playouts won't try hopeless saving moves, for example.
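Merging the rules into one weighted choice, rather than a strict cascade, might look like this sketch (the weight values and the flat per-rule weighting are illustrative assumptions, not the actual scheme):

```python
import random

def merged_policy(save_moves, pattern_moves, capture_moves, weights, rng=random):
    """Pick among all rule (1)-(3) candidates at once, weighted by rule.

    Every candidate competes, so a hopeless saving move can lose out to a
    strong pattern move instead of always taking priority.
    weights -- dict with illustrative keys "save", "pattern", "capture".
    """
    pool = ([(m, weights["save"]) for m in save_moves]
            + [(m, weights["pattern"]) for m in pattern_moves]
            + [(m, weights["capture"]) for m in capture_moves])
    if not pool:
        return None                      # fall back to rule (4), a random move
    moves, ws = zip(*pool)
    return rng.choices(moves, weights=ws, k=1)[0]
```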


What is so frustrating is that the playouts are essentially black magic. 
I know of no way to automatically determine what is good and what is not 
besides playing about 500 games between 2 strategies. The results are 
very often completely counterintuitive. There is no systematic way to 
improve.


--
GCP
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] update on networking from phils with new CGOS configuration

2008-01-05 Thread Peter Christopher
Strangely enough, today the network from the phils to boardspace.net
is so bad that even don's new configuration doesn't make up for the
additional network lag (though I think it alleviates the bot's pain).

With .2 seconds of thinking time, the net time deducted from
time_remaining is still decreasing by an average of 1 sec per move.

I did run this same bot from a friend's server at the same
configuration, .2 sec/move, and the bot always still had 300 seconds
remaining after every move [a .7s/move bot ALSO always still had 300 seconds].
So Don's change works with no problem on CGOS in general; the
problem is the network from the Philippines.

Strangely, my ping from the Philippines to boardspace.net at the time
of this testing is 700ms, BUT my ping from the Philippines to MIT.edu
is only 350ms. So I thought maybe boardspace.net has a network
problem, and tested from the server in the US: ping to boardspace.net
- 20ms; ping to mit.edu - 12ms.

I definitely cannot understand those results.

For the present time, it appears I may have to throw in the towel on
getting dependable results running the bots from my laptop in the
Philippines.

Raw output of ping and the CGOS script follows.

Peter

ps I did notice Fat1800 resign a lost game against my bot from my
laptop in the phils, thanks ;-)  Now, if my bot could only achieve
winning positions against Fat1800 on a regular basis ;)

-PING FROM PHILIPPINES 

0 [EMAIL PROTECTED] ping mit.edu
PING mit.edu (18.7.22.69) 56(84) bytes of data.
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=1 ttl=238 time=354 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=2 ttl=238 time=354 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=3 ttl=238 time=363 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=4 ttl=238 time=371 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=5 ttl=238 time=346 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=6 ttl=238 time=355 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=7 ttl=238 time=330 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=8 ttl=238 time=338 ms

--- mit.edu ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 7086ms
rtt min/avg/max/mdev = 330.218/351.896/371.914/12.419 ms

0 [EMAIL PROTECTED] ping  208.100.19.102 [boardspace.net]
PING 208.100.19.102 (208.100.19.102) 56(84) bytes of data.
64 bytes from 208.100.19.102: icmp_seq=1 ttl=49 time=615 ms
64 bytes from 208.100.19.102: icmp_seq=2 ttl=49 time=668 ms
64 bytes from 208.100.19.102: icmp_seq=3 ttl=49 time=761 ms
64 bytes from 208.100.19.102: icmp_seq=4 ttl=49 time=651 ms
64 bytes from 208.100.19.102: icmp_seq=5 ttl=49 time=740 ms

--- 208.100.19.102 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4026ms
rtt min/avg/max/mdev = 615.330/687.537/761.351/55.010 ms



--PING FROM AMAZON EC2 --



127 [EMAIL PROTECTED] ping boardspace.net
PING boardspace.net (208.100.19.102) 56(84) bytes of data.
64 bytes from boardspace.net (208.100.19.102): icmp_seq=1 ttl=50 time=20.6 ms
64 bytes from boardspace.net (208.100.19.102): icmp_seq=2 ttl=50 time=20.6 ms
64 bytes from boardspace.net (208.100.19.102): icmp_seq=3 ttl=50 time=20.7 ms
64 bytes from boardspace.net (208.100.19.102): icmp_seq=4 ttl=50 time=20.9 ms
64 bytes from boardspace.net (208.100.19.102): icmp_seq=5 ttl=50 time=20.9 ms

--- boardspace.net ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 20.622/20.782/20.960/0.228 ms

0 [EMAIL PROTECTED] ping mit.edu
PING mit.edu (18.7.22.69) 56(84) bytes of data.
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=1 ttl=239 time=12.1 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=2 ttl=239 time=12.4 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=3 ttl=239 time=12.4 ms
64 bytes from WEB.MIT.EDU (18.7.22.69): icmp_seq=4 ttl=239 time=12.9 ms

--- mit.edu ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 12.171/12.511/12.929/0.292 ms





-

Output from philippines of .2 sec / move bot with cgos script

18:51:55 S-C genmove b 30
18:51:55 C-E time_left b 300 0
18:51:55 E-C =
18:51:55 C-E genmove b
18:51:55 E-C = G2
18:51:55 C-S G2
18:52:10 S-C info Estimated time until next round: 09:56
18:52:10 Estimated time until next round: 09:56
18:52:12 S-C play w F6 284916
18:52:12 C-E play w F6
18:52:12 E-C =
18:52:12 S-C genmove b 299823
18:52:12 C-E time_left b 299 0
18:52:12 E-C =
18:52:12 C-E genmove b
18:52:13 E-C = D8
18:52:13 C-S D8
18:52:25 S-C info Estimated time until next round: 09:56
18:52:25 Estimated time until next round: 09:56
18:52:28 S-C play w G4 271014
18:52:28 C-E play w G4
18:52:28 E-C =
18:52:29 S-C genmove b 298950
18:52:29 C-E time_left 

RE: [computer-go] update on networking from phils with new CGOS configuration

2008-01-05 Thread dave.devos
It might be possible to automatically compensate for lag by looking up the 
geographic location of a bot's ISP, for instance via 
http://www.hostip.info/use.html .
 
Dave



From: [EMAIL PROTECTED] on behalf of Peter Christopher
Sent: Sat 5-1-2008 12:25
To: computer-go@computer-go.org
Subject: [computer-go] update on networking from phils with new CGOS configuration




Re: [computer-go] How to design the stronger playout policy?

2008-01-05 Thread Yamato
Gian-Carlo Pascutto wrote:
What improvements did you try? The obvious ones I know of are prioritizing 
saving and capturing moves by the size of the string.

Zen appears quite strong on CGOS. Leela using the above system was 
certainly weaker.

I use a static ladder search in the playouts. For example, if a move that
matched a 3x3 pattern is capturable in a ladder, it is not interesting.
Of course such a rule makes the program slower, but I believe it is an
improvement.

I finally improved my playouts by using Remi's ELO system to learn a set 
of interesting patterns, and just randomly fiddling with the 
probabilities (compressing/expanding) until something improved my 
program in self-play with about +25%. Not a very satisfying method or an 
exceptional result. There could be some other magic combination that is 
even better, or maybe not.

I have also implemented Remi's Minorization-Maximization algorithm,
but I could not find out how to use its result to improve the strength.
Would you explain the details of your playout policy?
Do you use only 3x3 patterns?

What is so frustrating is that the playouts are essentially black magic. 
I know of no way to automatically determine what is good and what is not 
besides playing about 500 games between 2 strategies. The results are 
very often completely counterintuitive. There is no systematic way to 
improve.

Yes. In addition, the big problem is that testing policies is very time-
consuming. I think at least 1000 games using 3000 or more playouts
per move are needed to judge whether a change is good or bad.

--
Yamato


Re: [computer-go] How to design the stronger playout policy?

2008-01-05 Thread Don Dailey
Lazarus uses a system very similar to the original MoGo policy as
documented in the paper. However, I did find one significant
improvement. I used Rémi's ELO system to rate patterns and I simply
throw out moves which match the weakest patterns in the play-outs. In
the tree, I also throw out these moves, but progressively: they still
get explored, just not near leaf nodes.

It seems like I also found a minor improvement in giving captures a
higher priority at some point, but I don't remember the details.

I have not put a huge amount of energy into anything other than this.

Programs that play games tend to be highly idiosyncratic. What may
work today may not work tomorrow, or may work for me and not you. It
depends on what is already in your program, what isn't, how you
implemented it, etc. Some improvements work well with others and some
do not cooperate with other changes. The whole process is a bit of a
black art, not just the play-outs.

The very biggest problem of all is how to test a change. It's not so
difficult in the early stages, where you are testing major improvements
and getting 100 ELO at a time. But when you get to the point where you
are refining a program, you are talking about small but important
improvements. You are looking for 10 or 20 ELO and hoping to put
several of them together to get 100 or 200. But measuring even 20 ELO
points takes thousands of games to get a reliable result. If we
could measure 5 ELO improvements in 5 minutes, you can bet most of us
would be able to produce very strong programs.
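The arithmetic behind these numbers can be checked with a simple binomial estimate (draws ignored, two standard deviations; a rough bound, not a proper power analysis):

```python
import math

def elo_to_winrate(d):
    """Expected score of the stronger side at an Elo difference of d."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

def games_needed(d, sigmas=2.0):
    """Rough number of games to separate an Elo gain of d from zero.

    Requires the observed win rate to sit `sigmas` binomial standard
    deviations away from 0.5: n >= sigmas^2 * p(1-p) / (p - 0.5)^2.
    """
    p = elo_to_winrate(d)
    return math.ceil(sigmas ** 2 * p * (1.0 - p) / (p - 0.5) ** 2)
```

At two sigma this gives roughly 1200 games to confirm a +20 ELO change and close to 20,000 games for +5 ELO, consistent with the "thousands" and "tens of thousands" figures in this post.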

Some of the top computer chess guys have testing systems that play
50,000 games or more to measure the value of small improvements, such as
a weight adjustment. They play those games on several computers (or
CPUs) and play relatively fast - I have heard of testing systems that
average a game per second per CPU at reasonably strong levels (they are
still playing at least master strength).

The problem is that if you test too fast, you are not really stressing a
Monte Carlo program or testing the sub-systems in the same way they
would be tested in real games. Monte Carlo relies on statistics
gathering, and that seems to require games that last at least a few
seconds. So unless you have a large bank of workstations it's
difficult to get enough games to reliably measure small improvements -
since this requires tens of thousands of games.


- Don





Re: [computer-go] How to design the stronger playout policy?

2008-01-05 Thread Gian-Carlo Pascutto

Yamato wrote:

I finally improved my playouts by using Remi's ELO system to learn a set 
of interesting patterns, and just randomly fiddling with the 
probabilities (compressing/expanding) until something improved my 
program in self-play with about +25%. Not a very satisfying method or an 
exceptional result. There could be some other magic combination that is 
even better, or maybe not.


I also have implemented Remi's Minorization-Maximization algorithm.
But I could not find how to use the result of it to improve the strength.



Would you explain the details of the playout policy?


(1) Captures of groups that could not save themselves last move.
(2) Save groups in atari due to last move by capturing or extending.
(3) Patterns next to last move.
(4) Global moves.

I quantize the MM pattern scores to 0..255 by multiplying them by a 
large constant and clipping. This causes the very good patterns to 
have close scores. I then use a threshold so I do not play the very bad 
patterns at all. The remaining moves are played with the probabilities 
indicated by the quantized values.
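The quantize-and-threshold step could be sketched like this (the scale and threshold values are illustrative, not the ones actually used):

```python
import random

def quantize(gammas, scale, threshold=8):
    """Quantize MM pattern strengths to 0..255 by multiplying with a
    large constant and clipping; scores below the threshold are zeroed
    so very bad patterns are never played."""
    q = [min(255, int(g * scale)) for g in gammas]
    return [v if v >= threshold else 0 for v in q]

def sample_move(moves, weights, rng=random):
    """Pick a move with probability proportional to its quantized score."""
    total = sum(weights)
    if total == 0:
        return None                      # no candidate survives; fall through
    r = rng.uniform(0, total)
    for move, w in zip(moves, weights):
        r -= w
        if r < 0:
            return move
    return moves[-1]                     # guard against floating-point edge
```

Clipping at 255 is what makes the very good patterns end up with close scores: everything above the clip point becomes equally likely.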


I also throw away very bad moves in phase (4) unless there are no 
alternatives. This gives a small but measurable improvement.


But now I believe all of the above is actually flawed: with this system I 
will play bad saving moves even when there are great pattern moves. It 
might be that your ladder detection avoids these problems somewhat.


Considering the probabilities of all moves as Crazy Stone does avoids 
this problem.


I am now trying to get a similar effect without incrementally updating 
all urgencies.



Do you use only 3x3 patterns?


Yes.

I have not tried bigger ones. For size = 4 the tables would become 2 x 
16M. Might be worth a try.
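For reference, such pattern tables are commonly indexed by packing 2 bits per point (empty/black/white/edge). Under that assumed encoding, the 8 neighbours of a 3x3 pattern give 4^8 = 65536 entries, and a 12-point neighbourhood gives 4^12 = 16M, which matches the 2 x 16M figure above (one table per colour to move). This is a plausible reconstruction of the sizes, not a description of the actual implementation:

```python
EMPTY, BLACK, WHITE, EDGE = 0, 1, 2, 3   # 2 bits per point

def pattern_index(neighbours):
    """Pack the states of the points around a move into a table index,
    2 bits per point, most significant point first."""
    idx = 0
    for state in neighbours:
        idx = (idx << 2) | state
    return idx
```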


--
GCP


[computer-go] Re: update on networking from phils with new CGOS configuration

2008-01-05 Thread Dave Dyer
It's interesting to look at a graphic plot of a traceroute
to see where the actual delays are. I use a program called 
PingPlotter for this, but there are many such programs.

Be warned, though, that seeing a potential problem only leaves you
feeling helpless, since there is typically nothing you can do about
it.



Re: [computer-go] How to design the stronger playout policy?

2008-01-05 Thread Yamato
Don Dailey wrote:
Lazarus uses a system very similar to the original MoGo policy as
documented in the paper. However, I did find one significant
improvement. I used Rémi's ELO system to rate patterns and I simply
throw out moves which match the weakest patterns in the play-outs. In
the tree, I also throw out these moves, but progressively: they still
get explored, just not near leaf nodes.

It is interesting that many people say similar things.
I'll try using ELO ratings of patterns again.

Programs that play games tend to be highly idiosyncratic.   What may
work today may not work tomorrow or may work for me and not you.It
depends on what is already in your program, what isn't, how you
implemented it, etc.Some improvements work well with others and some
do not cooperate with other changes. The whole process is a bit of a
black art,  not just the play-outs.

I fully agree.

--
Yamato