RE: [computer-go] Mogo MCTS is not UCT ? (GENERALIZED_AMAF)

2008-12-02 Thread Denis fidaali

In the mentioned article it is said that:

"A weakness remains, due to the initial time: at the very beginning, neither AMAF values nor standard statistics are available."



I have developed a model that aims at addressing that weakness. It derives virtual_simulations from an arbitrary ancestor, at a rate of 1/2 for each node you go through.

I called that process "GENERALIZED_AMAF". I'm not familiar with mathematical terminology, but I think the idea is dead simple:



AMAF goes like this:
Evaluate the expected result KNOWING that move A has been played.

GENERALIZED_AMAF:
Evaluate the expected result KNOWING that move A AND move B AND move C have been played.



--

For example, you can easily build an AMAF map that records, on one side, the moves marked as "played" by black and, on the other, the moves "played" by white, together with the result (black win/lose).

Given a node, classical AMAF would then check each of those maps, for each simulation starting from that node in which the child you are trying to score has been played.

Generalized AMAF can build statistics from simulations run at arbitrary ancestors of the considered node. You get about (1/2)^n of the ancestor's simulations as GENERALIZED_AMAF_simulations (n being the number of moves on the path from the ancestor to the node you assess).



Suppose you have 1000 simulations for a root node R. AMAF then gives you about 500 simulations for every child. Call these children Cn, where n is the id of the child; Cn also denotes the move made from R to reach that child.

Then for each child of Cn, you can get about 250 generalized-AMAF simulations: from the set of simulations at the root, consider only those in which Cn has been played, and aggregate the results in the AMAF fashion for each son of Cn.
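The aggregation described above can be put in a few lines. This is an illustrative sketch, not Denis's actual code: the `Simulation` record and the filtering step are assumptions about how the AMAF maps would be stored.

```python
from collections import defaultdict

class Simulation:
    """One playout: the moves each side played, and the result."""
    def __init__(self, black_moves, white_moves, black_win):
        self.black_moves = set(black_moves)
        self.white_moves = set(white_moves)
        self.black_win = black_win

def amaf_stats(simulations, color):
    """Classical AMAF: per-move [wins, visits] over all simulations."""
    stats = defaultdict(lambda: [0, 0])
    for sim in simulations:
        moves = sim.black_moves if color == "black" else sim.white_moves
        win = sim.black_win if color == "black" else not sim.black_win
        for move in moves:
            if win:
                stats[move][0] += 1
            stats[move][1] += 1
    return stats

def generalized_amaf_stats(simulations, path, color):
    """Generalized AMAF: keep only those simulations in which every move
    on `path` (from the ancestor down to the node) was played, then
    aggregate as usual. With light playouts each filter step keeps
    roughly half the set, hence ~(1/2)^n simulations for n moves."""
    kept = [s for s in simulations
            if all(m in s.black_moves | s.white_moves for m in path)]
    return amaf_stats(kept, color)
```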

My prototype scored a 30% win rate against GNU Go level 0, without any tuning, using plain standard_light_simulations. It used generalized AMAF as a way to get RAVE values, which then guarantees that the most promising nodes are explored exponentially more often (it used a raw non-deterministic algorithm). I did not, however, compare that win ratio with the one I would have obtained from standard AMAF.

Best regards,
Denis FIDAALI.

> Date: Mon, 1 Dec 2008 21:55:03 +0100
> From: [EMAIL PROTECTED]
> To: computer-go@computer-go.org
> Subject: Re: [computer-go] Mogo MCTS is not UCT ?
> 
> >  I think it's now well known that Mogo doesn't use UCT.
> > I realize that i have no idea at all what Mogo do use for
> > it's MCTS.
> 
> A complicated formula mixing
> (i) patterns (ii) rules (iii) rave values (iv) online statistics
> 
> Also we have a little learning (i.e. late parts of simulations
> are evolved based on online statistics and not only the early parts).
> 
> > I really wonder if there was an article describing
> > the new MCTS of mogo somewhere that i missed.
> > How is it better than UCT ?
> 
> http://www.lri.fr/~teytaud/eg.pdf contains most of the information
> (many other things
> have been tried and kept as they provided small improvements, but
> essentially the
> ideas are in this version)
> 
> Best regards,
> Olivier
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Mogo MCTS is not UCT ? (GENERALIZED_AMAF)

2008-12-02 Thread Mark Boon
At first blush, it sounds like a reasonable idea. However, as always with these things, the proof of the pudding is in the eating: you'd have to try it out against a ref-bot to see if it improves things. You also need to keep an eye on the time used, as it sounds a lot more CPU-intensive than plain AMAF.


As to the quote below: what is meant by the 'very beginning'? If that's the very beginning of the game, then I suppose an opening book can be used for the initial statistics. If it means the beginning of the search process for a move, then I suppose you can start from the data generated by the previous search. Most likely it's very rare that your opponent plays a move that has not been investigated by your previous search at all.


Another thing you can do is use the opponent's time to do a little preparation. I haven't spent time looking at 'pondering' yet, but if it were me I'd start by building a two-ply tree of AMAF values. Say you expand the tree after N playouts. You do N simulations at the current level. Then you choose the best node and do N simulations there. Then at the second-best node, etc. After N*m simulations (m = number of legal moves), you have the initial AMAF data for every possible move your opponent can play. Even with N fairly large you're likely to be able to finish this process before your opponent plays his move. Of course the time saved is very little, as will always be the case with pondering. The 'weakness' noted in the article is a very small one.
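The preparation loop sketched above could look like this. `run_playouts`, `win_ratio`, and `opponent_done` are hypothetical stand-ins for the engine's real routines; only the control flow (N playouts per candidate reply, visited best-first, during the opponent's time) is the point.

```python
def ponder(position, legal_moves, run_playouts, opponent_done, n=1000):
    """Pre-compute AMAF data for every legal opponent reply.

    Returns {move: amaf_data}, possibly partial if the opponent moves
    before all n * len(legal_moves) playouts are finished."""
    # First pass at the current level to get an initial ordering.
    root_data = run_playouts(position, n)
    ordered = sorted(legal_moves,
                     key=lambda m: root_data.win_ratio(m), reverse=True)
    prepared = {}
    for move in ordered:          # best node first, then second-best, ...
        if opponent_done():
            break                 # opponent played; stop pondering
        prepared[move] = run_playouts(position.play(move), n)
    return prepared
```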


Mark



Re: [computer-go] regression testing

2008-12-02 Thread Mark Boon

Yes, I asked that question.

So GnuGo already uses a comment-node to store the information in the  
SGF tree. But twogtp uses information from the .tst file. So why the  
difference?


On 2-dec-08, at 01:18, terry mcintyre wrote:

Someone recently asked about regression testing, and methods of  
expressing the expected results.


Unfortunately, I can't find the post in question.

The GnuGo regression suite appears to encode expected results in  
the .sgf file; here is an example:


$ cat Goemate990902-1.sgf
(;N[Goemate990902-1.gob]ID[Goemate990902-1.gob]BS[0]WS[0]GM[1]FF[3]SZ[19]
AB[eb][gb][lb][qb][cc][dc][hc][ic][kc][mc][qc][ad][dd][nd][od][pd][rd][be][de][ke][af][cf][ef][ff][gf][hf][mf][bg][cg][gg][mg][hh][ih][lh][nh][gi][hi][ji][ki][mi][qi][hj][ij][kj][mj][qj][bk][dk][ek][ik][jk][nk][qk][fl][gl][hl][nl][ql][cm][pm][kn][mn][nn][pn][sn][bo][jo][lo][po][dp][fp][jp][mp][pp][qp][rp][cq][eq][er][hr][ir][jr][gs]
AW[mb][nb][rb][ec][oc][pc][rc][bd][cd][ed][gd][hd][id][jd][qd][ce][ee][pe][qe][re][df][if][jf][dg][hg][ig][lg][og][qg][bh][ch][dh][fh][gh][jh][oh][ei][ni][oi][ej][fj][nj][pj][gk][hk][kk][lk][mk][pk][il][jl][ll][pl][rl][hm][km][lm][qm][rm][dn][gn][in][qn][eo][fo][qo][ep][hp][lp][np][op][fq][gq][lq][mq][pq][qq][rq][fr][kr][fs][js]
;C[move(rk,black):best ])
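Pulling the expected result back out of such a comment node is a one-liner. This sketch assumes the `move(coordinate,color):best` shape seen in the example generalizes; the regex is my guess at its general form, not GnuGo's actual parser.

```python
import re

def expected_move(sgf_text):
    """Return (coordinate, color) from a C[move(xy,color):best] node,
    or None if the file carries no such annotation."""
    m = re.search(r"C\[move\((\w+),(\w+)\):best\s*\]", sgf_text)
    return (m.group(1), m.group(2)) if m else None
```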

Terry McIntyre <[EMAIL PROTECTED]>


-- Libertarians Do It With Consent!






Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread Mark Boon
I have made some minor performance improvements and this is as far as  
I intend to take this particular project. I might make some small  
changes if necessary, but most likely I'll leave this largely  
unchanged from now.


I had set myself as an arbitrary goal that it should do at least 20K  
playouts. But with real liberties, AMAF and a RAVE formula I got  
stuck in the 16K-17K range. According to my profiler that is mostly  
due to the expensive formula used to compare nodes, where it says it  
spends 25% of total time. The formula I'm using is:


beta * (virtual-win-ratio + RAVE) + (1 - beta) * (win-ratio + UCT)

beta = 1 - log(parent-visits) / 20
UCT  = exploration-factor * sqrt( log(parent-visits) / (nr-visits + 1) )
RAVE = exploration-factor * sqrt( log(parent-visits) / (nr-virtual-visits + 1) )
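Written out directly, the formula is straightforward (a sketch only; the names mirror the description above, and the default exploration factor is just a placeholder):

```python
from math import log, sqrt

def node_score(parent_visits, visits, wins,
               virtual_visits, virtual_wins, exploration_factor=1.0):
    """Score a child node by the blended RAVE/UCT formula quoted above."""
    beta = 1.0 - log(parent_visits) / 20.0
    uct = exploration_factor * sqrt(log(parent_visits) / (visits + 1))
    rave = exploration_factor * sqrt(log(parent_visits) / (virtual_visits + 1))
    win_ratio = wins / visits if visits else 0.0
    virtual_win_ratio = virtual_wins / virtual_visits if virtual_visits else 0.0
    return (beta * (virtual_win_ratio + rave)
            + (1.0 - beta) * (win_ratio + uct))
```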


There are probably quite a few possibilities still to tune this  
program with regards to playing strength and performance. But I felt  
it doesn't help to obscure the implementation by too many details.


The implementation of the search algorithm was entirely game-independent, until I introduced AMAF. I didn't see how to fix that, as there's no way getting around that it's based on the fact that a move is represented by a single coordinate, which is mapped to an array to store the statistical values. But strip the AMAF part, which is a block of 30-odd lines of code, and I think it can be used for other games basically as-is. I did this not because I ever see myself programming another game, but because in my experience doing so I get a cleaner separation between modules.


At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF refbot. It wins only about 33%. But that's probably because the 1,000-2,000 playouts range is the sweet spot for that particular type of playing engine. It doesn't scale from there, whereas the MCTS refbot only just gets warmed up with 2,000 playouts.


This leads me to a question. I suppose it might be of some interest  
to put this bot up on CGOS. But what parameters to use? The main one  
being the number of playouts, naturally.


Mark


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread dhillismail
There is another speedup trick that might interest you. IIRC Lukasz Lew came up with it, but I forget what he called it. After calculating the selected move for an internal node (going through UCT and RAVE and whatnot), store that move. Then, for the next N visits to that node (where N is 5 or 10 or so), just repeat that move without recalculating what the move would be.
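A minimal sketch of the trick as described (names are illustrative; a real engine would hang this off its existing node structure):

```python
CACHE_VISITS = 8   # "N is 5 or 10 or so"

class Node:
    def __init__(self):
        self.cached_move = None
        self.cache_uses_left = 0

    def select_move(self, compute_best_move):
        """compute_best_move() is the expensive UCT/RAVE selection.
        Reuse its answer for the next CACHE_VISITS visits."""
        if self.cache_uses_left > 0 and self.cached_move is not None:
            self.cache_uses_left -= 1
            return self.cached_move
        self.cached_move = compute_best_move()
        self.cache_uses_left = CACHE_VISITS
        return self.cached_move
```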

- Dave Hillis


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread Michael Williams
That's going to repeat the exact same path through the tree N times, isn't it? If so, it seems like it would be more efficient to do N playouts from the leaf after each traversal of the tree.





Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread dhillismail
Hmm... it could be that N is picked randomly each time. Now I can't seem to find the description, and my memory is not serving me well here. Perhaps someone else remembers? I'll try to track it down.

- Dave Hillis


