Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-17 Thread Magnus Persson
I use a method, initially from the Mogo team, that sort of randomizes
the position before running the heavy playout. One simply plays
uniformly random *non-contact* moves. The effect of this is that it
preserves the shapes of stones on the board, but it prevents the heavy
playouts from playing long sequences deterministically. I have not
tested this, but I felt that the evaluation on large boards got less
noisy and/or biased doing this.
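
Roughly, the idea looks like this (a minimal Python sketch with a toy
dict-based board, not Valkyria's actual code; note that non-contact
moves can never capture, so no capture logic is needed here):

import random

SIZE = 9
EMPTY, BLACK, WHITE = 0, 1, 2

def neighbors(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE]

def is_contact(board, p):
    # A move is a contact move if it touches any stone.
    return any(board.get(q, EMPTY) != EMPTY for q in neighbors(p))

def randomize_prefix(board, to_move, n_moves=8):
    # Play a few uniformly random non-contact moves before the heavy
    # playout starts, so its near-deterministic sequences get varied.
    for _ in range(n_moves):
        candidates = [(x, y) for x in range(SIZE) for y in range(SIZE)
                      if board.get((x, y), EMPTY) == EMPTY
                      and not is_contact(board, (x, y))]
        if not candidates:
            break
        board[random.choice(candidates)] = to_move
        to_move = BLACK if to_move == WHITE else WHITE
    return to_move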


Magnus

Quoting Michael Williams [EMAIL PROTECTED]:


It seems move selection in the playouts should be very random at first
and more deterministic toward the end of the playout.  Has anyone tried
that?





Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-17 Thread Magnus Persson
I have to add that it is possible that a large part of the advantage
from using heavy playouts in Valkyria comes from using the same code
to bias the exploration part of MCTS.


I could probably test it by simply relying completely on AMAF with the  
proximity heuristic as the only bias.
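
For reference, the core of an AMAF update is tiny; a Python sketch
(the names are illustrative, and the proximity weighting is left out):

from collections import defaultdict

amaf_stats = defaultdict(lambda: (0, 0))   # point -> (wins, visits)

def amaf_update(stats, playout_moves, side_to_move, winner):
    # All-Moves-As-First: credit every move the side to move played
    # anywhere in the playout as if it had been tried first from here.
    # playout_moves is the playout sequence [(player, point), ...].
    seen = set()
    for player, point in playout_moves:
        if (player, point) in seen:        # count first occurrence only
            continue
        seen.add((player, point))
        if player == side_to_move:
            wins, visits = stats[point]
            stats[point] = (wins + (winner == player), visits + 1)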


A second experiment would be to rewrite the playouts and make them superfast.

-Magnus

Quoting Mark Boon [EMAIL PROTECTED]:



On 17-nov-08, at 02:42, George Dahl wrote:


So you say that: ...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation., could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.


That's exactly what it's suggesting. But as I said, I need to do some
more testing to make a hard case for that.

Mark





--
Magnus Persson
Berlin, Germany


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-17 Thread Darren Cook
 So you say that: ...I'm observing that most of the increase in level
 comes from the selection during exploration and only in small part
 from the selection during simulation., could you elaborate at all?
 This is very interesting.  That almost suggests it might be fruitful
 to use the patterns in the tree, but keep lighter playouts.

If I understand David Fotland [1] (and you) correctly, this is the same
approach Many Faces uses: keep the playouts light, but it is worth
spending effort to guide the MCTS tree.

I also think Rémi was saying the same in his paper [2] on using move
patterns. E.g. in section 4.1 he says he just used a few features in the
simulations, but used all the features to control the progressive
widening of the Monte Carlo search tree (section 4.2).
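
The widening itself can be very simple; a Python sketch of the general
idea (the schedule below is illustrative, not the one in Rémi's paper):

import math

def widened_candidates(ranked_moves, node_visits):
    # Progressive widening: search only the k best moves, where the
    # moves are ranked by the full pattern model and k grows slowly
    # with the number of visits to this node.
    k = 1 + int(math.log(node_visits + 1, 1.4))
    return ranked_moves[:k]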

Darren


[1]: http://www.mail-archive.com/computer-go@computer-go.org/msg09759.html

[2]:
http://remi.coulom.free.fr/Amsterdam2007/

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Heikki Levanto
On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
 Being a computer scientist but new to go, I can grasp some of the theory.
 The question I was trying to get across was:

 In a game of self play, if both parties are employing only Monte Carlo,
 surely it's not a good conceptual representation of a human, and if the
 reinforcement learning is based on random simulations, wouldn't it be
 very weak when playing a real human?


Here is another amateur answering.

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.

What they do is build a quite traditional search tree starting from
the current position. They use a random playout as a crude way to evaluate a
position. Based on this evaluation, they decide which branch of the tree to
expand.

This is the way I understand the random playouts: If, in a given position,
white is clearly ahead, he will win the game if both players play perfect
moves. He is also likely to win if both players play reasonably good moves
(say, like human amateurs), but there is a bit more of a chance that one
player hits upon a good combination which the other misses, so the result is
not quite as reliable. If the playouts are totally random, there is still a
better chance for white to win, because both players make equally bad moves.
The results have much more variation, of course. So far it does not sound
like a very good proposal, but things change if you consider the facts that
we don't have perfect oracles, and good humans are slow to play out a
position and cannot be integrated into a computer program. Whereas random
playouts can be done awfully fast, tens of thousands of playouts in a second.
Averaging the results gives a fair indication of who is more likely to win
from that position, just what is needed to decide which part of the search
tree to expand.

The 'random' playouts are not totally random, they include a minimum of
tactical rules (do not fill own eyes, do not pass as long as there are valid
moves). Even this little will produce a few blind spots, moves that the
playouts cannot see, and systematically wrong results. Adding more
go-specific knowledge can make the results much better (more likely to be
right), but can also add some more blind spots. And it costs time, reducing
the number of playouts the program can make.
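
In code, the move selection for such a light playout can be as small as
this (a Python sketch; the eye test is the crude minimal one, real
programs also look at diagonals, and legality checks like ko are left out):

import random

def is_own_one_point_eye(board, p, color, neighbors):
    # Crude eye test: every adjacent point holds a friendly stone.
    return all(board.get(q) == color for q in neighbors(p))

def pick_light_move(board, color, empties, neighbors):
    # Uniformly random among empty points, refusing only to fill our
    # own one-point eyes; return None (pass) when nothing is left.
    candidates = [p for p in empties
                  if not is_own_one_point_eye(board, p, color, neighbors)]
    return random.choice(candidates) if candidates else None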

Hope that explains something of the mystery


Regards

   Heikki

-- 
Heikki Levanto   In Murphy We Trust heikki (at) lsd (dot) dk



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread D Gilder
On Sunday 16 November 2008, Heikki Levanto wrote:
 [...]

 Averaging the results gives a fair indication of who is more likely to
 win from that position, just what is needed to decide which part of the
 search tree to expand.

Do you know what use (if any) is made of the standard deviation of the 
results?




Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Hideki Kato
Hello Heikki,

Heikki Levanto: [EMAIL PROTECTED]:
[...]

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.

I believe CrazyStone's use of patterns does so and it seems 
successful.

Hideki

--
[EMAIL PROTECTED] (Kato)


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Heikki Levanto
On Sun, Nov 16, 2008 at 11:46:28AM +0000, D Gilder wrote:
  [...] Averaging the results gives a fair indication of who is more
  likely to win from that position, just what is needed to decide which
  part of the search tree to expand.
 
 Do you know what use (if any) is made of the standard deviation of the 
 results?

Now statistics isn't my strong point, but one of the first and most
successful systems was called UCT, for Upper Confidence bounds applied to
Trees. It calculated some sort of expected error, and added that to the
winning ratio. Then it expanded the branch that had the highest value. If
that expansion was a win, the error margin would get smaller, but the
average result would get higher. Perhaps some other branch would be tried
next, but this one would still be a good candidate. If the result was a
loss, the average would drop, and so would the error margin, so this move
would become much less likely to be expanded.
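
Concretely, the value is the winning ratio plus an error term that
shrinks as a move gets visited more; something like this Python sketch
of the UCB1 rule:

import math

def ucb1(wins, visits, parent_visits, c=1.4):
    # Winning ratio plus an exploration bonus that shrinks as the
    # move gets visited more often.
    if visits == 0:
        return float("inf")            # untried moves come first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    # children: list of (move, wins, visits); expand the best by UCB1.
    total = sum(visits for _, _, visits in children)
    return max(children, key=lambda ch: ucb1(ch[1], ch[2], total))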

I am sure someone who understands statistics will soon jump in and correct my
explanation :-)

  - Heikki

-- 
Heikki Levanto   In Murphy We Trust heikki (at) lsd (dot) dk



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Magnus Persson

Quoting Hideki Kato [EMAIL PROTECTED]:


Heikki Levanto: [EMAIL PROTECTED]:

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.


I believe CrazyStone's use of patterns does so and it seems
successful.


With Valkyria I try to follow three principles in heavy playouts.


1) In contact fights there are a lot of shapes that are played most of
the time. Thus Valkyria checks, for each move played, whether there is an
obvious local response to it. If so, it plays it deterministically. In many
situations there are two or more such candidates, and then it plays one
of those moves.

2) In many positions the last move played does not trigger any obvious
response, and then a random move is chosen uniformly.

3) There are moves that are inferior 100% of the time, both locally and
globally. These moves are pruned if they are selected, and a new random
move is chosen as long as there are moves left to try.
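
As a skeleton, the move generator looks something like this (a Python
sketch of the three rules; the helper functions are placeholders, not
my real pattern code):

import random

def heavy_policy_move(board, last_move, local_responses, is_inferior,
                      legal_moves):
    # 1) If the last move has obvious local answers, play one of them.
    responses = local_responses(board, last_move)
    if responses:
        return random.choice(responses)
    # 2) Otherwise choose uniformly at random...
    remaining = list(legal_moves(board))
    random.shuffle(remaining)
    # 3) ...but veto 100%-inferior moves and redraw while moves remain.
    for move in remaining:
        if not is_inferior(board, move):
            return move
    return None   # nothing acceptable left: pass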


I have hundreds of handcoded patterns for both 1) and 3). It would be too
time-consuming to test these patterns individually, so I use my knowledge
and intuition (European 2 dan) to simply decide what patterns to include.


So Valkyria has a lot of go knowledge, but mostly such knowledge as all
go players have up to some strength, perhaps 8-10 kyu. It has no
knowledge about global matters. The beauty of MC-evaluation is that
globally strong moves are most of the time evaluated better than
globally weak moves. Heavy playouts remove noise from MC-evaluation
and make it more sensitive to the true value of moves. Still, there
are biases with all heavy playouts, but they are overcome with MC Tree
Search (MCTS), which corrects mistakes in the evaluation recursively.


Here is my latest scaling experiment on 9x9 for Valkyria.

Valkyria plays 1150 random games per second on my 4-year-old laptop.

This test is against gnugo 3.7.10, assumed to be Elo 1800. Most
datapoints are based on 500 games. N sims means Valkyria plays N
heavy playouts per move played. Winrates are in %.


N sims   WinRate   Elo (rel Gnu)
    47     7.4     1361
    94    22       1580
   188    37       1708
   375    53       1821
   750    69.9     1946
  1500    81.2     2054
  3000    88       2146
  6000    92.6     2239
 12000    94       2278
 24000    97.2     2416
 48000    97.4     2429
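
(The Elo column follows from the winrates under the usual logistic
model, with gnugo anchored at 1800; a quick check in Python:)

import math

def elo_vs_gnugo(winrate_percent, gnugo_elo=1800):
    # Relative Elo from a winrate p: diff = 400 * log10(p / (1 - p)).
    p = winrate_percent / 100.0
    return round(gnugo_elo + 400 * math.log10(p / (1 - p)))

print(elo_vs_gnugo(53))   # -> 1821, matching the 375-sims row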

The heavy playouts of Valkyria need just 375 random games per move to
match gnugo, using only 0.3 seconds per move. And even using only 47
simulations per move it can still win.


So obviously the heavy playout code of Valkyria is much weaker (Elo
1361) than gnugo and most human opponents, but compared to CGOS a lot
of programs with no knowledge are about the same level, although they
use 2000 simulations or more.


Searching efficiently using MCTS with AMAF, it apparently can be made
arbitrarily strong.


Hope this explains how both the nature of the playouts and the MCTS
contribute to the playing strength of a program.


Should one go heavy or light? I do not know. I feel that Valkyria is a
little bit too slow on equivalent hardware against most top programs.
On the other hand, I think it could be tweaked and improved upon.
Perhaps it can even be made faster by removing code that does not
improve playing strength. And there is probably still room for adding
code that improves strength without a noticeable slowdown.


I just know that it is a lot of hard work doing it the way I did it.

Best
Magnus


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Don Dailey
The random playouts or even heavy playouts are not intended to emulate
a human player. Heikki is exactly right.

It's a crude measurement of how good the position is. The moves in a
random playout are horrible, and so are the moves in a heavy playout.
In fact, improving them arbitrarily will probably hurt the playouts.
Random playouts work reasonably well because, to a certain extent, bad
moves cancel each other out. Chrilly called this mutual stupidity, I
think.

For example, a group of stones may be weak and ultimately lost, but it
is still possible to defend that group if the attacker isn't diligent.
What happens in the playouts is that the attacker is NOT diligent, but
neither is the defender! So the result comes out correct anyway.

Of course this is not reliable, but it's amazingly good at getting a
reasonable picture of what is weak, what is strong and who owns what.

You can improve that by adding some knowledge to the playouts, but you
must do this with great care. In my example above, let's say you add a
piece of knowledge that causes the defender to succeed. You can argue
that the playouts play better go now, but the conclusion that you
come to for the group we are talking about is now wrong.

- Don



On Sun, 2008-11-16 at 10:08 +0100, Heikki Levanto wrote:
 [...]



RE: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Magnus Persson
Yes, Valkyria does a lot of ladder reading as well. Actually, pattern
matching in the case of Valkyria is a little unclear: it is a
decision tree where the leaves are often procedure calls that look
at a larger portion of the board. The ladder code is called for
various reasons in the tree.


Is 3.7.10 level 10 weaker than the default settings? I will run a test  
using 5000 playouts and 375 playouts to see if it makes a difference.


Also, have you tried this version on CGOS 9x9?

This version http://cgos.boardspace.net/9x9/cross/Valkyria3.2.6_C2.html
used a machine similar to yours and reached 3000 playouts per second  
using two threads. Your code is then 8 times faster and should be much  
stronger.



-Magnus



Quoting David Fotland [EMAIL PROTECTED]:


I thought Valkyria does local search (ladders) during the playouts.

Many Faces is lighter on the playouts.  I have 17 local 3x3 patterns, then
go to uniform random without filling eyes.

Against Gnugo 3.7.10 level 10 on 9x9, with 5000 playouts, I win 92%, so our
performance is similar.  I'm doing 23K playouts per second on my 2.2 GHz
Core 2 Duo, so my performance might be a little better, depending on the
specs of your old machine.




RE: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread David Fotland
I think I added a small capture bias, but it didn't make much difference.
Sorry, I forgot that it isn't quite pure random. Before the uniform random,
if there is an enemy one-liberty group on the board, with some small
probability I capture it.

A pattern includes don't-cares and is matched in any orientation. The 3x3
patterns are only matched adjacent to the last move (8 local places).
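
Matching in any orientation just means pre-expanding each pattern into
its 8 variants; a Python sketch of the idea (the encoding here is
illustrative, not my actual code):

def orientations(pattern):
    # All 8 rotations/reflections of a 3x3 pattern given as 9 chars,
    # row by row: '.' empty, 'X' own, 'O' enemy, '?' don't-care.
    grid = [list(pattern[i * 3:i * 3 + 3]) for i in range(3)]
    results = set()
    for _ in range(4):
        grid = [list(row) for row in zip(*grid[::-1])]   # rotate 90 degrees
        results.add("".join("".join(row) for row in grid))
        results.add("".join("".join(row[::-1]) for row in grid))  # mirror
    return results

def matches(window, pattern):
    # window: the 9 points around a candidate move, same encoding.
    return all(p in ("?", w) for w, p in zip(window, pattern))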

David

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:computer-go-
 [EMAIL PROTECTED] On Behalf Of Jason House
 Sent: Sunday, November 16, 2008 5:06 PM
 To: computer-go
 Subject: Re: [computer-go] FW: computer-go] Monte carlo play?
 
 On Nov 16, 2008, at 11:18 AM, David Fotland [EMAIL PROTECTED]
 games.com wrote:
 
  I thought Valkyria does local search (ladders) during the playouts.
 
  Many Faces is lighter on the playouts.  I have 17 local 3x3
  patterns, then
  go to uniform random without filling eyes.
 
 
 No capture bias in the playouts? I thought that was a big strength
 boost.
 
 Out of curiosity, how do you count your patterns? For example, is it
 still one pattern if it includes a don't-care? How about rotations/
 reflections of the same basic pattern?
 
 
 
 
 [...]

Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Mark Boon
Some months ago I did several experiments with using tactics and
patterns in playouts. Generally I found a big boost in strength using
tactics. I also found a boost in strength using patterns, but with a
severe diminishing return after a certain number, even becoming
detrimental when using large numbers of patterns (1,000s to 10,000s).
Since I was using a generalized pattern-matcher, using it slowed
things down considerably. Although it played a lot better with the
same number of playouts, if I compared MC playouts with patterns to
MC playouts without patterns using the same amount of CPU time, the
gain was not so obvious. Since most of the gain in strength was
gained by just a few patterns, I concluded, just as David did, that it
was probably better to just use a handful of hard-coded patterns
during playouts.


I only recently started to do real experiments with hard-coded
patterns, and so far my results are rather inconclusive. I found when
mixing different things it's not always clear what contributes to any
increased strength observed. So I'm still in the process of trying to
dissect what is actually contributing where. I found for example that
a lot of the increased level of play using patterns does not come
from using them during playouts but comes from the effect they have
on move-exploration. I don't know if this is due to my particular way
of implementing MC playouts in combination with UCT search, but moves
matching a pattern (usually) automatically come first in the
tree-expansion as well, and generally I can say so far I'm observing
that most of the increase in level comes from the selection during
exploration and only in small part from the selection during simulation.
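
In its simplest form that mechanism is just an ordering; a small Python
sketch of the kind of thing I mean (matches_pattern is a placeholder):

def expansion_order(moves, matches_pattern):
    # Put pattern-matching moves first in the node's expansion list,
    # so the tree tries them before the rest; the playouts themselves
    # can stay light. (sorted is stable, so ties keep their order.)
    return sorted(moves, key=lambda m: 0 if matches_pattern(m) else 1)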


For example, in one particular experiment using just 5 patterns, I saw
a win-rate of 65% against the same program not using patterns (with
the same number of playouts). But when not using the patterns during
exploration, I saw the win-rate drop to just 55%.


I still have a lot of testing to do and it's too early to draw any  
hard conclusions. But I think it's worthwhile trying to distinguish  
where the strength is actually gained. Better yet, finding out  
exactly 'why' it gained strength, because with MC playouts I often  
find results during testing highly counter-intuitive, occasionally to  
the point of being (seemingly) nonsensical.


I also think what Don was proposing with his reference-bot could be
interesting: trying to make it play around Elo 1700 on CGOS just
using 5,000 (light) playouts. I don't know if it's possible, but I
think it's a fruitful exercise. At a time where most people are
looking at using more and more hardware to increase playing strength,
knowing what plays best at the other end of the spectrum is valuable
as well. With that I mean finding what plays best using severely
constrained resources.


Mark



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread George Dahl
So you say that: ...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation., could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.
- George

On Sun, Nov 16, 2008 at 10:39 PM, Mark Boon [EMAIL PROTECTED] wrote:
 [...]



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Mark Boon


On 17-nov-08, at 02:42, George Dahl wrote:


So you say that: ...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation., could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.


That's exactly what it's suggesting. But as I said, I need to do some  
more testing to make a hard case for that.


Mark



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread George Dahl
I look forward to hearing more!  Happy testing.
- George

On Sun, Nov 16, 2008 at 11:53 PM, Mark Boon [EMAIL PROTECTED] wrote:

 [...]



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Michael Williams

It seems move selection in the playouts should be very random at first and more 
deterministic toward the end of the playout.  Has anyone tried that?


Mark Boon wrote:


[...]



[computer-go] FW: computer-go] Monte carlo play?

2008-11-15 Thread dave.devos
From: tony tang [mailto:[EMAIL PROTECTED]]
Sent: Sat 15-11-2008 21:22
To: [EMAIL PROTECTED]
Subject: [RE:computer-go] Monte carlo play?


Hello Dave,

Thank you for a thorough introduction to the theory, and I sincerely hope
I am not wasting your time with amateur questions. Because it was quite
late when I sent the question, I think I didn't word it correctly and
miscommunicated my question.

Being a computer scientist but new to go, I can grasp some of the theory.
The question I was trying to get across was:

In a game of self play, if both parties are employing only Monte Carlo,
surely it's not a good conceptual representation of a human, and if the
reinforcement learning is based on random simulations, wouldn't it be very
weak when playing a real human? According to Nick Wedd, and some playing
around, I am clearly mistaken, because it works quite well. Is it because
they are training (not in the neural network sense) their Monte Carlo
based engine against another engine with prior knowledge, or something
of that nature?

I have seen papers implementing patterns and prior knowledge with Monte
Carlo (dated 2006); has that become a standard now, and when people refer
to Monte Carlo do they not mean absolutely random?

Thank you

Kind regards

Tony





[computer-go] FW: computer-go] Monte carlo play?

2008-11-15 Thread dave.devos
From: [EMAIL PROTECTED]
Sent: Sat 15-11-2008 22:44
To: tony tang
Subject: RE: computer-go] Monte carlo play?


Hello Tony,
 
Ok, so you know about the minimax algorithm and the like. My impression was
wrong and I'm very sorry for my analogy.

I'm no expert on Monte Carlo, so I can only say what I know. But most of the
real experts have been rather quiet during the last week, so perhaps I can
act as a stand-in.
 
The basic idea is indeed absolutely random play (alternating turns and
removing captured stones as in normal play) until the game finishes. The
only exceptions to random play are occupied intersections, simple ko
recaptures, suicide moves (not sure if all programs prune suicide if the
rules allow it) and moves that fill one-point eyes of the color who is to
play (there may be slight differences in the exact definition of a
one-point eye).
 
Without the one-eye pruning, random games would never finish, because both
random players would keep on killing their own stones and refilling the
empty space after the opponent has captured.
Even with eye detection and simple ko detection, the game could end up in a
cycle occasionally because of triple ko or more pathological situations, so
one should probably also limit the number of moves and discard the result
if that limit is exceeded. This is probably a much cheaper solution to the
cycle problem than more sophisticated detection.
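 
A sketch of such a capped playout loop in Python (pick_move, play and
count_winner are placeholders for the real board routines):

MAX_MOVES = 3 * 9 * 9   # generous cap for 9x9; discard if exceeded

def light_playout(board, to_move, pick_move, play, count_winner):
    # Random fill-in with a move cap: the rare cycle simply gets its
    # result discarded (None) instead of being detected explicitly.
    passes = 0
    for _ in range(MAX_MOVES):
        move = pick_move(board, to_move)   # random; never fills own eyes
        if move is None:                   # no non-eye-filling move: pass
            passes += 1
            if passes == 2:                # both players pass: count it
                return count_winner(board)
        else:
            passes = 0
            play(board, move, to_move)
        to_move = -to_move                 # colors encoded as +1 / -1
    return None                            # cap exceeded: discard result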
 
A purely random game is played out until both players are forced to pass.
This is called a light playout. Claus Reinke recently dubbed the term
random fill-in for this process
(http://www.mail-archive.com/computer-go@computer-go.org/msg09749.html). I
like this term better than light playout, but it doesn't seem to catch on
here.
 
The board is then counted to determine the winner and this binary result is 
added to a statistic for the move that is currently being evaluated. The 
accumulated statistic is effectively the evaluation value for the position. It 
is then backtracked to the game tree root in a minimax way.
 
In go it is hard to quickly evaluate a position statically (or even slowly,
for that matter). So it is nice to have a dynamic evaluation that becomes
more accurate as you spend more CPU cycles on it. You can see that it's
important that random fill-ins are fast, but it's still a rather large
investment to do a lot of playouts when you compare this with a static
evaluation function. So it is even more important to avoid wasting effort
in evaluating positions by random fill-ins. The trick is to use crude
statistics (few fill-ins) to decide which statistic to refine by investing
more fill-ins. This is where the UCT value comes in
(http://senseis.xmp.net/?UCT). It is derived from the statistic, and it
decides which position to investigate further, preferring good statistics
over bad ones, but also preferring crude statistics over accurate ones.
 
There is no training involved and there is no a priori knowledge involved here 
except for the rules and eye detection. The program has absolutely no idea 
about go concepts.
 
Would you call this process learning?
 
Programs using this algorithm are called pure MC programs, and although
they get better with more CPU cycles, they aren't very strong with normal
time limits on current hardware (or hardware in the near future). You are
right: by itself, pure MC is just not effective enough.
 
So programmers started to build in more Go knowledge to improve on this.
This makes the fill-in process slower, so these are called heavy playouts.
And this is also where my knowledge ends. My guess is that instead of
selecting moves in a uniform random way, moves are weighted by a priori Go
knowledge built in by the programmer, but I know no details and I don't
know if there have been publications on this. Perhaps there aren't any,
because it is also a competitive field and commercial interests might be
involved too.
 
Other contributors can probably tell you more about this.
 
Best Regards,
Dave 
 




[computer-go] FW: computer-go] Monte carlo play?

2008-11-15 Thread dave.devos
From: tony tang [mailto:[EMAIL PROTECTED]]
Sent: Sat 15-11-2008 23:08
To: [EMAIL PROTECTED]
Subject: RE: computer-go] Monte carlo play?


Thanks Dave, that was incredibly helpful; hopefully this new hobby of mine
will materialise into a decent project.
 
All the best
 
Tony
 





Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-15 Thread Darren Cook
 In a game of self play, if both parties are employing only monte
 carlo, ... random simulations... wouldnt it be very weak...
 ... and some playing around I am clearly mistaken because its works
 quite well.

Yes, it doesn't make sense but it does indeed seem to work :-).

 I have seen papers implementing patterns and prior knowledge with
 monte carlo (dated 2006) has that become a standard now, and when
 people refer to monte carlo they dont mean absolutly random?

Just using Monte Carlo playouts is still weak.
Monte Carlo + UCT is strong. (UCT is a specific algorithm, so people are
now referring to the set of similar algorithms as MCTS, for Monte Carlo
Tree Search.)

No one is using purely random playouts in any top program, but David
Fotland described Many Faces' playouts as fairly light. I think Crazy
Stone (as described in detail in one of Rémi Coulom's papers) or
Valkyria perhaps have the heaviest playouts, and Mogo is somewhere in
between?

Too much knowledge in the Monte Carlo playouts and you use too many CPU
cycles and cannot do enough of them. (And the real situation is even
more complex than that, as sometimes adding knowledge that seems really
fundamental for human players does not seem to help, even before
allowing for the extra CPU cycles used.)

Darren


-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)