[computer-go] Re: FW: [computer-go] Monte carlo play?

2008-11-16 Thread Claus Reinke
 In a game of self play, if both parties are employing only Monte
 Carlo, ... random simulations... wouldn't it be very weak...
 ... and some playing around, I am clearly mistaken because it works
 quite well.
Yes, it doesn't make sense, but it does indeed seem to work :-).

Plain Monte-Carlo bots *are* rather weak. But they work better than
might be expected, and they are fairly simple, which is extremely
motivating for potential Go programmers!-) Instead of having to learn
all of Go and a lot of AI before being able to start building anything,
one can start quickly, and then improve together with the bot. And
the current top programs claim plain Monte-Carlo bots as their remote
ancestors, so it isn't a dead end, either (though complete rewrites may
be needed somewhere along the path;-).

Another way to think about playouts is by comparing them with a
human player progressing through the ranks:

1. only knowing the rules makes for very inefficient/slow/buggy play
2. playing lots of games quickly has often been recommended to get a
   better feeling for the game; personally, I don't like fast games(*), but
   at least the worst inadequacies tend to show up even if we only play
   equally clueless opponents (probably because weaknesses aren't
   evenly distributed, so that peaks in understanding in one player
   happen to match weaknesses in understanding in the other)
3. human players learn over time, until all players in a group have
   reached an equal level (or have gone as far as they can, with the
   effort they can afford to put into the game)
4. now further progress depends on innovation in the existing pool
   of players, on joining a different player pool, or on bringing in
   other players, preferably players whose strengths expose
   weaknesses in the current pool (similar effects can be achieved
   by reading books or observing games)

If we allow the learning outcome in steps 2 and 3 to be represented as patterns
(wait a minute, I've fallen into that trap before, so I shouldn't do this,
I usually do this, but that would be faster - I wonder whether it'll work),
then random playouts can emulate a weak form of this process. Instead
of learning from lots of games over time, plain Monte-Carlo bots
extrapolate from lots of dumb games, every time they consider their
next move.
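
In code, that "extrapolate on every move" loop might look roughly like
the sketch below (an illustration only; the board object and its
to_move/legal_moves/copy/play/random_playout members are hypothetical
stand-ins for a real Go board implementation):

def choose_move_plain_mc(board, playouts_per_move=100):
    """Plain Monte-Carlo move selection: no tree, no memory between
    moves. For every legal move, play many uniformly random games to
    the end and keep the move whose playouts won most often."""
    me = board.to_move
    best_move, best_winrate = None, -1.0
    for move in board.legal_moves():
        wins = 0
        for _ in range(playouts_per_move):
            b = board.copy()
            b.play(move)
            # random_playout fills the board with random legal moves
            # until the game ends, scores it, and returns the winner
            if b.random_playout() == me:
                wins += 1
        winrate = wins / playouts_per_move
        if winrate > best_winrate:
            best_move, best_winrate = move, winrate
    return best_move

Everything learned here is thrown away as soon as the move is returned,
which is exactly the "forgets the lessons when the session ends"
behaviour described below.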

Random games are really too dumb to be considered games, but bots
can play lots of them in a short time, so they can at least see trends (a
mixture of on-the-spot learning and very bad reading ahead). If we
assume that quantity compensates somewhat for quality, Monte-Carlo
bots are a bit like beginners who have played lots of games but always
forget the lessons learned when the session ends.

They may have an idea of who is ahead and what is alive, and be able
to organise their games based on those ideas, but a stronger player (who
remembers his lessons) may be able to rip their ideas apart (you might
both think that group is alive, but what do you do if I play here? And who
is ahead if you lose that group?). And then there is the scenario of the
strong player who visits the club and humiliates the local champion (you
might think that playing there kills that group, but what if they reply here?).

Adding patterns or other biases to make playouts heavier (both slower
and more relevant for evaluation) is similar to improving reading skills,
but with differences: bots already read to the end, so eliminating useless
moves doesn't improve the playout depth; at best it improves the density
of information to be gained from each playout (so you don't have to play
as many of them, or can get more out of those you do play). But playouts
are also still used for on-the-spot learning, and biases have limited scope
for success, so one has to be careful not to ruin this aspect. Using a tree
on top of the playouts is not just a way to represent the knowledge
extracted from the playouts; it also helps to direct playout effort towards
relevant situations, and it gives a place where traditional Go knowledge
can be built in without affecting the playouts directly (so the playouts
become more of an evaluation function than a candidate move generator).
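
To make the tree-plus-playouts idea concrete, here is a rough UCT-style
sketch (again hypothetical: the board interface with to_move/legal_moves/
copy/play/game_over/winner/matches_pattern is assumed, and the pattern
bias is only a placeholder for real heavy-playout heuristics):

import math
import random

class Node:
    """One node per position reached from the root; the tree stores what
    the playouts have taught us so far (visit and win counts)."""
    def __init__(self, move=None, player=None, parent=None):
        self.move, self.player, self.parent = move, player, parent
        self.children, self.untried = [], None
        self.wins, self.visits = 0, 0  # wins counted for self.player

def best_uct(node, c=1.4):
    # UCB1: trade off good win rates against rarely tried moves, so
    # playout effort gets directed towards the relevant situations
    return max(node.children, key=lambda ch:
               ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def heavy_playout(board):
    """A 'heavier' playout: bias the random moves with cheap patterns
    (matches_pattern is a hypothetical predicate). The bias must stay
    cheap and mild, or it distorts the statistics the tree relies on."""
    while not board.game_over():
        moves = board.legal_moves()
        preferred = [m for m in moves if board.matches_pattern(m)]
        board.play(random.choice(preferred or moves))
    return board.winner()

def mcts_move(root_board, n_playouts=10000):
    root = Node()
    root.untried = list(root_board.legal_moves())
    for _ in range(n_playouts):
        node, board = root, root_board.copy()
        # 1. selection: follow UCT while the node is fully expanded
        while not node.untried and node.children:
            node = best_uct(node)
            board.play(node.move)
        # 2. expansion: add one child for a move not tried yet
        if node.untried:
            move = node.untried.pop()
            player = board.to_move
            board.play(move)
            node.children.append(Node(move, player, node))
            node = node.children[-1]
            node.untried = list(board.legal_moves())
        # 3. evaluation: the playout acts as a noisy evaluation function
        winner = heavy_playout(board)
        # 4. backpropagation: record the result along the path
        while node is not None:
            node.visits += 1
            if winner == node.player:
                node.wins += 1
            node = node.parent
    # the most-visited child is the most trusted root move
    return max(root.children, key=lambda ch: ch.visits).move

Note how the bias lives only inside heavy_playout, while Go knowledge
proper would go into the tree (say, as priors or move ordering in
best_uct); that separation is what keeps the playouts usable as an
evaluation function.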

One problem is that the top programs tend to be fairly close in strength.
There are variations in method, so there is some further progress, but
how much of the human going-through-the-ranks evolution has been
captured in current top programs, I don't know. I suspect that it is less
than their ranks would indicate. There is certainly reading, discussion,
and inviting of stronger players (or different bots) going on.

Oh, and if any of this seems to make sense, remember that I just made
it all up!-) You might want to consult these slides for overviews from
folks who have been in the business:


Old-fashioned Computer Go vs Monte-Carlo Go, by Bruno Bouzy.
Invited tutorial, IEEE 2007 Symposium on Computational Intelligence
in Games (CIG'07), Hawaii, USA, 1 April 2007.


Re: [computer-go] Re: FW: [computer-go] Monte carlo play?

2008-11-16 Thread Thomas Wolf
On Sun, 16 Nov 2008, Claus Reinke wrote:

 ...
 better feeling for the game; personally, I don't like fast games(*), but
 ...

But there is this saying:

Play quick, lose quick, learn quick!   :)

Thomas
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/