From: [EMAIL PROTECTED]
Sent: Sat 15-11-2008 22:44
To: tony tang
Subject: RE: [computer-go] Monte carlo play?


Hello Tony,
 
Ok, so you know about the minimax algorithm and the like. My impression was 
wrong and I'm very sorry for my analogy.
 
I'm no expert on Monte Carlo, so I can only say what I know. But most of the 
real experts have been rather quiet during the last week, so perhaps I can act 
as a stand-in.
 
The basic idea is indeed absolutely random play (alternating turns and removing 
captured stones as in normal play) until the game finishes. The only exceptions 
to random play are occupied intersections, simple ko recaptures, suicide moves 
(I'm not sure all programs prune suicide if the rules allow it) and moves that 
fill one-point eyes of the color who is to play (there may be slight 
differences in the exact definition of a one-point eye).
 
Without the one eye pruning, random games would never finish, because both 
random players would keep on killing their own stones and refilling the empty 
space after the opponent has captured. 
Even with eye detection and simple ko detection, the game could end up in a 
cycle occasionally because of triple ko or more pathological situations, so one 
should probably also limit the number of moves and discard the result if that 
limit is exceeded. This is probably a much cheaper solution to the cycle 
problem than a more sophisticated detection. 
 
A purely random game is played out until both players are forced to pass. This 
is called a "light playout". Claus Reinke recently dubbed the term "random 
fill-in" for this process 
(http://www.mail-archive.com/computer-go@computer-go.org/msg09749.html). I like 
this term better than "light playout", but it doesn't seem to catch on here.
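
To make the control flow concrete, here is a rough Python sketch of one 
fill-in, including the move limit mentioned above; the Board interface is 
hypothetical, it just names the operations described so far:

    import random

    def light_playout(board, max_moves=400):
        # Assumed (hypothetical) Board methods:
        #   legal_moves()  - legal points; occupied intersections, suicide
        #                    and simple ko recaptures already excluded
        #   is_eye_fill(m) - True if m fills a one-point eye of the mover
        #   play(m), pass_turn(), score()
        passes, moves = 0, 0
        while passes < 2:
            if moves >= max_moves:      # cycle guard (triple ko etc.)
                return None             # discard this fill-in
            candidates = [m for m in board.legal_moves()
                          if not board.is_eye_fill(m)]
            if candidates:
                board.play(random.choice(candidates))
                passes = 0
            else:
                board.pass_turn()
                passes += 1
            moves += 1
        return 1 if board.score() > 0 else 0   # binary result: Black won?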
 
The board is then counted to determine the winner and this binary result is 
added to a statistic for the move that is currently being evaluated. The 
accumulated statistic is effectively the evaluation value for the position. The 
result is then backed up towards the game tree root in a minimax way.
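
In code the bookkeeping is little more than this (the node fields are 
hypothetical; counting wins from the viewpoint of the player who moved into 
each node is what gives the minimax behaviour when values are compared later):

    def backup(path, black_won):
        # path = list of tree nodes from the root down to the position
        # where the fill-in was started
        for node in path:
            node.visits += 1
            if node.mover_is_black == black_won:
                node.wins += 1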
 
In Go it is hard to quickly evaluate a position statically (or even slowly for 
that matter). So it is nice to have a dynamic "evaluation" that becomes more 
accurate as you spend more CPU cycles on it. You can see that it's important 
that random fill-ins are fast, but it's still a rather large investment to do a 
lot of playouts, when you compare this with a static evaluation function. So it 
is even more important to avoid wasting effort in evaluating positions by 
random fill-ins. The trick is to use crude statistics (few fill-ins) to decide 
which statistic to refine by investing more fill-ins. This is where the UCT 
value comes in (http://senseis.xmp.net/?UCT). It is derived from the statistic 
and it decides which position to investigate further, preferring good 
statistics over bad ones, but also preferring crude statistics over accurate 
ones.
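
The formula itself is small; here is a sketch of how I understand it (c is a 
tunable exploration constant, 1.4 is just a commonly quoted choice):

    import math

    def uct(node, parent_visits, c=1.4):
        # Win rate (prefer good statistics) plus an exploration term
        # that is large while the node has few visits (prefer crude
        # statistics); untried moves are most urgent of all.
        if node.visits == 0:
            return float('inf')
        win_rate = node.wins / node.visits
        return win_rate + c * math.sqrt(math.log(parent_visits) / node.visits)

At each node the program descends into the child with the highest UCT value, 
so effort automatically flows to moves that look good or are still poorly 
sampled.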
 
There is no training involved and there is no a priori knowledge involved here 
except for the rules and eye detection. The program has absolutely no idea 
about go concepts.
 
Would you call this process learning?
 
Programs using this algorithm are called "pure" MC programs and although they 
get better with more CPU cycles, they aren't very strong with normal time 
limits on current hardware (or hardware in the near future). You are right, by 
itself pure MC is just not effective enough.
 
So programmers started to build in more Go knowledge to improve on this. This 
makes the fill-in process slower, so they are called "heavy playouts". And this 
is also where my knowledge ends. My guess is that instead of selecting moves in 
a uniformly random way, moves are weighted by a priori Go knowledge built in by 
the programmer, but I don't know the details, nor whether there have been 
publications on this. Perhaps there aren't any, because it is also a 
competitive field and commercial interests might be involved too.
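
If my guess is right, the selection step would change from uniform to weighted 
sampling, something like this (entirely speculative; the weight function 
stands in for whatever knowledge the programmer built in):

    import random

    def heavy_move(board, weight):
        # weight(board, m) is a hypothetical scoring function based on
        # a priori Go knowledge (patterns, captures, ...); moves are
        # drawn proportionally to their (positive) weights
        candidates = [m for m in board.legal_moves()
                      if not board.is_eye_fill(m)]
        weights = [weight(board, m) for m in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]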
 
Other contributors can probably tell you more about this.
 
Best Regards,
Dave 
 

________________________________

From: tony tang [mailto:[EMAIL PROTECTED]
Sent: Sat 15-11-2008 21:22
To: [EMAIL PROTECTED]
Subject: RE: [computer-go] Monte carlo play?


Hello Dave,

Thank you for a thorough introduction to the theory, and I sincerely hope I am 
not wasting your time with amateur questions.
Because it was quite late when I sent the question, I think I didn't word it 
correctly and miscommunicated my question.

Being a computer scientist but new to Go, I can grasp some of the theory. The 
question I was trying to get across was:

In a game of self-play, if both parties are employing only Monte Carlo, surely 
it's not a good conceptual representation of a human, and if the reinforcement 
learning is based on random simulations, wouldn't it be very weak when playing 
a real human?
According to Nick Wedd and some playing around, I am clearly mistaken, because 
it works quite well.
Is it because they are training (not in the neural network sense) their Monte 
Carlo based engine against another engine with prior knowledge, or something 
of that nature?

I have seen papers implementing patterns and prior knowledge with Monte Carlo 
(dated 2006). Has that become a "standard" now, so that when people refer to 
Monte Carlo they don't mean absolutely random?

Thank you

Kind regards

Tony


________________________________

