Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-17 Thread Darren Cook
> So you say that: "...I'm observing that most of the increase in level
> comes from the selection during exploration and only in small part
> from the selection during simulation.", could you elaborate at all?
> This is very interesting.  That almost suggests it might be fruitful
> to use the patterns in the tree, but keep lighter playouts.

If I understand David Fotland [1] (and you) correctly, this is the same
approach Many Faces uses: keep the playouts light, but it is worth
spending effort to guide the MCTS tree.

I also think Remi was saying the same in his paper [2] on using move
patterns. E.g. in section 4.1 he says he just used a few features in the
simulations, but used all the features to control the "progressive
widening of the monte-carlo search tree" (section 4.2).
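For readers who haven't met the term: progressive widening just means a node
only considers its top-ranked candidate moves at first, and unlocks more as
its visit count grows. A minimal sketch in Python follows; the threshold
schedule and the assumption that children come pre-sorted by pattern score
are illustrative guesses, not Remi's published parameters:

    def allowed_children(children, visits, base=40, ratio=1.4):
        """Progressive widening sketch: children are assumed pre-sorted by
        pattern/prior score; the k-th best is unlocked once the node has
        been visited more than base * ratio**(k-1) times (an assumed
        schedule, not from the paper)."""
        k = 1
        while k < len(children) and visits > base * ratio ** (k - 1):
            k += 1
        return children[:k]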

Darren


[1]: http://www.mail-archive.com/computer-go@computer-go.org/msg09759.html

[2]:
http://remi.coulom.free.fr/Amsterdam2007/

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-17 Thread Magnus Persson
I have to add that it is possible that a large part of the advantage
from using heavy playouts in Valkyria comes from using the same code
to bias the exploration part of MCTS.


I could probably test it by simply relying completely on AMAF with the  
proximity heuristic as the only bias.


A second experiment would be to rewrite the playouts and make them superfast.

-Magnus

Quoting Mark Boon <[EMAIL PROTECTED]>:



On 17-nov-08, at 02:42, George Dahl wrote:


So you say that: "...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation.", could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.


That's exactly what it's suggesting. But as I said, I need to do some
more testing to make a hard case for that.

Mark





--
Magnus Persson
Berlin, Germany


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-17 Thread Magnus Persson
I use a method initially from the Mogo team that sort of randomizes
the position before running the heavy playout: one simply plays
uniformly random *non-contact* moves. The effect of this is that it
preserves the shapes of stones on the board, but it prevents the heavy
playouts from playing long sequences deterministically. I have not
tested this, but I felt that the evaluation on large boards got less
noisy and/or biased doing this.
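In rough Python the trick might look like this; the helper functions and the
number of randomizing moves are assumed stand-ins, not Valkyria's actual code:

    import random

    def randomize_position(board, n_random_moves, legal_moves, is_contact):
        """Play a few uniformly random non-contact moves before the heavy
        playout proper, preserving existing shapes while breaking up the
        heavy policy's long deterministic sequences."""
        for _ in range(n_random_moves):
            candidates = [m for m in legal_moves(board)
                          if not is_contact(board, m)]
            if not candidates:
                break
            board.play(random.choice(candidates))
        return board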


Magnus

Quoting Michael Williams <[EMAIL PROTECTED]>:


It seems move selection in the playouts should be very random at first
and more deterministic toward the end of the playout.  Has anyone tried
that?





Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread dhillismail

Yes, the best combination for my program, Antigo, is to use (somewhat) more
stochastic moves for the first 5-9 ply.

I decided to look into this after noticing how surprisingly badly my heavy
playouts did as part of AMAF without UCT/tree search. A more stochastic
version of my heavy playouts was much stronger when using just AMAF, but
weaker when I used UCT. Switching after the first 7 ply gave me the strongest
policy for AMAF with no tree. It also turned out to be the strongest policy
when using UCT, although the improvement was small.
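A sketch of the switch being described (the 7-ply threshold is from the post;
the two policy functions are assumed interfaces):

    def choose_playout_move(board, ply, stochastic_policy, heavy_policy,
                            switch_ply=7):
        """More stochastic move selection for the first few ply of the
        playout, the heavier (more deterministic) policy afterwards."""
        policy = stochastic_policy if ply < switch_ply else heavy_policy
        return policy(board)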


- Dave Hillis


-Original Message-
From: Michael Williams <[EMAIL PROTECTED]>
To: computer-go
Sent: Mon, 17 Nov 2008 12:01 am
Subject: Re: [computer-go] FW: computer-go] Monte carlo play?

It seems move selection in the playouts should be very random at first and more
deterministic toward the end of the playout. Has anyone tried that?

Mark Boon wrote:
> On 17-nov-08, at 02:42, George Dahl wrote:
>> So you say that: "...I'm observing that most of the increase in level
>> comes from the selection during exploration and only in small part
>> from the selection during simulation.", could you elaborate at all?
>> This is very interesting. That almost suggests it might be fruitful
>> to use the patterns in the tree, but keep lighter playouts.
>
> That's exactly what it's suggesting. But as I said, I need to do some
> more testing to make a hard case for that.
>
> Mark


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Michael Williams

It seems move selection in the playouts should be very random at first and more 
deterministic toward the end of the playout.  Has anyone tried that?


Mark Boon wrote:


On 17-nov-08, at 02:42, George Dahl wrote:


So you say that: "...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation.", could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.


That's exactly what it's suggesting. But as I said, I need to do some 
more testing to make a hard case for that.


Mark






Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread George Dahl
I look forward to hearing more!  Happy testing.
- George

On Sun, Nov 16, 2008 at 11:53 PM, Mark Boon <[EMAIL PROTECTED]> wrote:
>
> On 17-nov-08, at 02:42, George Dahl wrote:
>
>> So you say that: "...I'm observing that most of the increase in level
>> comes from the selection during exploration and only in small part
>> from the selection during simulation.", could you elaborate at all?
>> This is very interesting.  That almost suggests it might be fruitful
>> to use the patterns in the tree, but keep lighter playouts.
>
> That's exactly what it's suggesting. But as I said, I need to do some more
> testing to make a hard case for that.
>
> Mark
>


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Mark Boon


On 17-nov-08, at 02:42, George Dahl wrote:


So you say that: "...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation.", could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.


That's exactly what it's suggesting. But as I said, I need to do some  
more testing to make a hard case for that.


Mark



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread George Dahl
So you say that: "...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation.", could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.
- George

On Sun, Nov 16, 2008 at 10:39 PM, Mark Boon <[EMAIL PROTECTED]> wrote:
> Some months ago I did several experiments with using tactics and patterns in
> playouts. Generally I found a big boost in strength using tactics. I also
> found a boost in strength using patterns, but with severely diminishing
> returns after a certain number, even becoming detrimental when using large
> numbers of patterns (1,000s to 10,000s). Since I was using a generalized
> pattern-matcher, using it slowed things down considerably. Although it
> played a lot better with the same number of playouts, if I compared MC
> playouts with patterns to MC playouts without patterns using the same
> amount of CPU time, the gain was not so obvious. Since most of the gain in
> strength came from just a few patterns, I concluded, as David did, that it
> was probably better to use just a handful of hard-coded patterns during
> playouts.
>
> I only recently started to do real experiments with hard-coded patterns and
> so far my results are rather inconclusive. I found when mixing different
> things it's not always clear what contributes to any increased strength
> observed. So I'm still in the process of trying to dissect what is actually
> contributing where. I found for example that a lot of the increased level of
> play using patterns does not come from using them during playouts but comes
> from the effect they have on move-exploration. I don't know if this is due
> to my particular way of implementing MC playouts in combination with UCT
> search, but moves matching a pattern (usually) automatically make it first
> in the tree-expansion as well and generally I can say so far I'm observing
> that most of the increase in level comes from the selection during
> exploration and only in small part from the selection during simulation.
>
> For example, in one particular experiment using just 5 patterns I saw a
> win-rate of 65% against the same program not using patterns (with the same
> number of playouts). But when not using the patterns during exploration,
> I saw the win-rate drop to just 55%.
>
> I still have a lot of testing to do and it's too early to draw any hard
> conclusions. But I think it's worthwhile trying to distinguish where the
> strength is actually gained. Better yet, finding out exactly 'why' it gained
> strength, because with MC playouts I often find results during testing
> highly counter-intuitive, occasionally to the point of being (seemingly)
> nonsensical.
>
> I also think what Don was proposing with his reference-bot could be
> interesting. Trying to make it play around ELO 1700 on CGOS just using 5,000
> (light) playouts. I don't know if it's possible, but I think it's a fruitful
> exercise. At a time where most people are looking at using more and more
> hardware to increase playing-strength, knowing what plays best at the other
> end of the spectrum is valuable as well. With that I mean, finding what
> plays best using severely constrained resources.
>
> Mark
>


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Mark Boon
Some months ago I did several experiments with using tactics and
patterns in playouts. Generally I found a big boost in strength using
tactics. I also found a boost in strength using patterns, but with
severely diminishing returns after a certain number, even becoming
detrimental when using large numbers of patterns (1,000s to 10,000s).
Since I was using a generalized pattern-matcher, using it slowed
things down considerably. Although it played a lot better with the
same number of playouts, if I compared MC playouts with patterns to
MC playouts without patterns using the same amount of CPU time, the
gain was not so obvious. Since most of the gain in strength came from
just a few patterns, I concluded, as David did, that it was probably
better to use just a handful of hard-coded patterns during playouts.


I only recently started to do real experiments with hard-coded
patterns and so far my results are rather inconclusive. I found that
when mixing different things it's not always clear what contributes
to any increased strength observed. So I'm still in the process of
trying to dissect what is actually contributing where. I found, for
example, that a lot of the increased level of play using patterns
does not come from using them during playouts but comes from the
effect they have on move-exploration. I don't know if this is due to
my particular way of implementing MC playouts in combination with UCT
search, but moves matching a pattern (usually) automatically come
first in the tree-expansion as well, and generally I can say so far
I'm observing that most of the increase in level comes from the
selection during exploration and only in small part from the
selection during simulation.


For example, in one particular experiment using just 5 patterns I saw
a win-rate of 65% against the same program not using patterns (with
the same number of playouts). But when not using the patterns during
exploration, I saw the win-rate drop to just 55%.


I still have a lot of testing to do and it's too early to draw any  
hard conclusions. But I think it's worthwhile trying to distinguish  
where the strength is actually gained. Better yet, finding out  
exactly 'why' it gained strength, because with MC playouts I often  
find results during testing highly counter-intuitive, occasionally to  
the point of being (seemingly) nonsensical.


I also think what Don was proposing with his reference-bot could be
interesting. Trying to make it play around Elo 1700 on CGOS just
using 5,000 (light) playouts. I don't know if it's possible, but I
think it's a fruitful exercise. At a time when most people are
looking at using more and more hardware to increase playing-strength,
knowing what plays best at the other end of the spectrum is valuable
as well. By that I mean, finding what plays best using severely
constrained resources.


Mark



RE: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread David Fotland
I think I added a small capture bias, but it didn't make much difference.
Sorry, I forgot that it isn't quite pure random. Before the uniform random
move, if there is an enemy one-liberty group on the board, I capture it
with some small probability.

A pattern includes don't cares and is matched in any orientation.  The 3x3
patterns are only matched adjacent to the last move (8 local places).
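Putting the description together, the playout policy could be sketched
roughly as below; the board/pattern methods and the capture probability are
assumptions for illustration, since the actual value and code aren't given:

    import random

    CAPTURE_PROB = 0.05  # "some small probability" -- actual value not stated

    def playout_move(board, last_move, patterns):
        """Sketch of the described policy: occasionally capture an enemy
        one-liberty group, then try the 3x3 patterns on the 8 points
        around the last move (matched in any orientation, with
        don't-cares), else pick uniformly at random without filling eyes.
        All board and pattern methods are assumed interfaces, not Many
        Faces' real code."""
        if random.random() < CAPTURE_PROB:
            group = board.enemy_group_in_atari()
            if group is not None:
                return board.capturing_move(group)
        for point in board.neighbors8(last_move):
            for pattern in patterns:
                if pattern.matches(board, point):
                    return point
        moves = [m for m in board.legal_moves()
                 if not board.fills_own_eye(m)]
        return random.choice(moves) if moves else None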

David

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:computer-go-
> [EMAIL PROTECTED] On Behalf Of Jason House
> Sent: Sunday, November 16, 2008 5:06 PM
> To: computer-go
> Subject: Re: [computer-go] FW: computer-go] Monte carlo play?
> 
> On Nov 16, 2008, at 11:18 AM, "David Fotland" <[EMAIL PROTECTED]
> games.com> wrote:
> 
> > I thought Valkyria does local search (ladders) during the playouts.
> >
> > Many Faces is lighter on the playouts.  I have 17 local 3x3
> > patterns, then
> > go to uniform random without filling eyes.
> 
> 
> No capture bias in the playouts? I thought that was a big strength
> boost.
> 
> Out of curiosity, how do you count your patterns. For example, is it
> still one pattern if it includes a don't care? How about rotations/
> reflections of the same basic pattern?
> 
> 
> >
> >
> > Against Gnugo 3.7.10 level 10 on 9x9, with 5000 playouts, I win 92%,
> > so our
> > performance is similar.  I'm doing 23K playouts per second on my 2.2
> > GHz
> > Core 2 Duo, so my performance might be a little better, depending on
> > the
> > specs of your old machine.
> >
> > David
> >
> >> -Original Message-
> >> From: [EMAIL PROTECTED] [mailto:computer-go-
> >> [EMAIL PROTECTED] On Behalf Of Magnus Persson
> >> Sent: Sunday, November 16, 2008 5:45 AM
> >> To: computer-go@computer-go.org
> >> Subject: Re: [computer-go] FW: computer-go] Monte carlo play?
> >>
> >> Quoting Hideki Kato <[EMAIL PROTECTED]>:
> >>
> >>> Heikki Levanto: <[EMAIL PROTECTED]>:
> >>>> The way I understand it, modern Monte Carlo programs do not even try to
> >>>> emulate a human player with a random player - obviously that would not
> >>>> work.
> >>>
> >>> I believe CrazyStone's use of patterns does so and it seems
> >>> successful.
> >>
> >> With Valkyria I try to follow two principles in heavy playouts.
> >>
> >>
> >> 1) In contact fights there are a lot of shapes that are played most of
> >> the time. Thus Valkyria checks each move played for an obvious local
> >> response. If so, it plays it deterministically. In many situations
> >> there are two or more such candidates and then it plays one of those
> >> moves.
> >>
> >> 2) In many positions the last move played does not trigger any obvious
> >> response, and then a random move is chosen uniformly.
> >>
> >> 3) There are moves that are inferior 100% of the time, both locally and
> >> globally. These moves are pruned if they are selected, and a new random
> >> move is chosen as long as there are moves left to try.
> >>
> >> I got hundreds of handcoded patterns for both 1 and 3. It would be too
> >> time consuming to test these patterns, so I use my knowledge and
> >> intuition (European 2 Dan) to simply decide what patterns to include.
> >>
> >> So Valkyria has a lot of go knowledge, but mostly such knowledge that
> >> all go players have up to some strength, such as perhaps 8-10 kyu. It
> >> has no knowledge about global matters. The beauty of MC-evaluation is
> >> that globally strong moves are most of the time evaluated better than
> >> globally weak moves. Heavy playouts remove noise from MC-evaluation
> >> and make it more sensitive to the true value of moves. Still there
> >> are biases with all heavy playouts, but they are overcome with MC Tree
> >> Search (MCTS), which corrects mistakes in the evaluation recursively.
> >>
> >> Here is my latest scaling experiment on 9x9 for Valkyria.
> >>
> >> Valkyria plays 1150 random games per second on my 4-year-old laptop.
> >>
> >> This test is against gnugo 3.7.10, assumed to be Elo 1800. Most
> >> datapoints are based on 500 games. "N sims" means Valkyria plays N
> >> heavy playouts per move played. Winrates are in %.

Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Jason House
On Nov 16, 2008, at 11:18 AM, "David Fotland" <[EMAIL PROTECTED] 
games.com> wrote:



I thought Valkyria does local search (ladders) during the playouts.

Many Faces is lighter on the playouts. I have 17 local 3x3 patterns, then
go to uniform random without filling eyes.



No capture bias in the playouts? I thought that was a big strength  
boost.


Out of curiosity, how do you count your patterns? For example, is it
still one pattern if it includes a don't care? How about rotations/
reflections of the same basic pattern?






Against Gnugo 3.7.10 level 10 on 9x9, with 5000 playouts, I win 92%, so our
performance is similar. I'm doing 23K playouts per second on my 2.2 GHz
Core 2 Duo, so my performance might be a little better, depending on the
specs of your old machine.

David


-Original Message-
From: [EMAIL PROTECTED] [mailto:computer-go-
[EMAIL PROTECTED] On Behalf Of Magnus Persson
Sent: Sunday, November 16, 2008 5:45 AM
To: computer-go@computer-go.org
Subject: Re: [computer-go] FW: computer-go] Monte carlo play?

Quoting Hideki Kato <[EMAIL PROTECTED]>:


Heikki Levanto: <[EMAIL PROTECTED]>:
The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.


I believe CrazyStone's use of patterns does so and it seems
successful.


With Valkyria I try to follow two principles in heavy playouts.


1) In contact fights there are a lot of shapes that are played most of
the time. Thus Valkyria checks each move played for an obvious local
response. If so, it plays it deterministically. In many situations there
are two or more such candidates and then it plays one of those moves.

2) In many positions the last move played does not trigger any obvious
response, and then a random move is chosen uniformly.

3) There are moves that are inferior 100% of the time, both locally and
globally. These moves are pruned if they are selected, and a new random
move is chosen as long as there are moves left to try.

I got hundreds of handcoded patterns for both 1 and 3. It would be too
time consuming to test these patterns, so I use my knowledge and
intuition (European 2 Dan) to simply decide what patterns to include.

So Valkyria has a lot of go knowledge, but mostly such knowledge that
all go players have up to some strength, such as perhaps 8-10 kyu. It
has no knowledge about global matters. The beauty of MC-evaluation is
that globally strong moves are most of the time evaluated better than
globally weak moves. Heavy playouts remove noise from MC-evaluation
and make it more sensitive to the true value of moves. Still there
are biases with all heavy playouts, but they are overcome with MC Tree
Search (MCTS), which corrects mistakes in the evaluation recursively.

Here is my latest scaling experiment on 9x9 for Valkyria.

Valkyria plays 1150 random games per second on my 4-year-old laptop.

This test is against gnugo 3.7.10, assumed to be Elo 1800. Most
datapoints are based on 500 games. "N sims" means Valkyria plays N
heavy playouts per move played. Winrates are in %.

N sims   WinRate   Elo (rel Gnu)
47       7.4       1361
94       22        1580
188      37        1708
375      53        1821
750      69.9      1946
1500     81.2      2054
3000     88        2146
6000     92.6      2239
12000    94        2278
24000    97.2      2416
48000    97.4      2429

The heavy playouts of Valkyria need just 375 random games per move to
match gnugo, using only 0.3 seconds per move. And even using only 47
simulations per move it can still win.

So obviously the heavy playout code of Valkyria is much weaker (< Elo
1361) than Gnugo and most human opponents, but compared to CGOS a lot
of programs with no knowledge are about the same level, although they
use 2000 simulations or more.

Searching efficiently using MCTS with AMAF, it apparently can be made
arbitrarily strong.

Hope this explains how both the nature of the playouts and the MCTS
contribute to the playing strength of a program.

Should one go heavy or light? I do not know. I feel that Valkyria is a
little bit too slow on equivalent hardware against most top programs.
On the other hand I think it could be tweaked and improved upon.
Perhaps it can even be made faster by removing code that does not
improve playing strength. And there is probably still room for adding
code that improves strength without a noticeable slowdown.

I just know that it is a lot of hard work doing it the way I did it.

Best
Magnus


RE: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Magnus Persson
Yes, Valkyria does a lot of ladder reading as well. Actually, "pattern
matching" in the case of Valkyria is a little unclear: it is a decision
tree whose leaves are often procedure calls that look at a larger
portion of the board. The ladder code is called for various reasons in
the tree.


Is 3.7.10 level 10 weaker than the default settings? I will run a test  
using 5000 playouts and 375 playouts to see if it makes a difference.


Also, have you tried this version on CGOS 9x9?

This version http://cgos.boardspace.net/9x9/cross/Valkyria3.2.6_C2.html
used a machine similar to yours and reached 3000 playouts per second  
using two threads. Your code is then 8 times faster and should be much  
stronger.



-Magnus



Quoting David Fotland <[EMAIL PROTECTED]>:


I thought Valkyria does local search (ladders) during the playouts.

Many Faces is lighter on the playouts.  I have 17 local 3x3 patterns, then
go to uniform random without filling eyes.

Against Gnugo 3.7.10 level 10 on 9x9, with 5000 playouts, I win 92%, so our
performance is similar.  I'm doing 23K playouts per second on my 2.2 GHz
Core 2 Duo, so my performance might be a little better, depending on the
specs of your old machine.




RE: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread David Fotland
I thought Valkyria does local search (ladders) during the playouts.

Many Faces is lighter on the playouts.  I have 17 local 3x3 patterns, then
go to uniform random without filling eyes.

Against Gnugo 3.7.10 level 10 on 9x9, with 5000 playouts, I win 92%, so our
performance is similar.  I'm doing 23K playouts per second on my 2.2 GHz
Core 2 Duo, so my performance might be a little better, depending on the
specs of your old machine.

David

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:computer-go-
> [EMAIL PROTECTED] On Behalf Of Magnus Persson
> Sent: Sunday, November 16, 2008 5:45 AM
> To: computer-go@computer-go.org
> Subject: Re: [computer-go] FW: computer-go] Monte carlo play?
> 
> Quoting Hideki Kato <[EMAIL PROTECTED]>:
> 
> > Heikki Levanto: <[EMAIL PROTECTED]>:
> >> The way I understand it, modern Monte Carlo programs do not even try to
> >> emulate a human player with a random player - obviously that would not
> >> work.
> >
> > I believe CrazyStone's use of patterns does so and it seems
> > successful.
> 
> With Valkyria I try to follow two principles in heavy playouts.
> 
> 
> 1) In contact fights there are a lot of shapes that are played most of
> the time. Thus Valkyria checks each move played for an obvious local
> response. If so, it plays it deterministically. In many situations there
> are two or more such candidates and then it plays one of those moves.
> 
> 2) In many positions the last move played does not trigger any obvious
> response, and then a random move is chosen uniformly.
> 
> 3) There are moves that are inferior 100% of the time, both locally and
> globally. These moves are pruned if they are selected, and a new random
> move is chosen as long as there are moves left to try.
> 
> I got hundreds of handcoded patterns for both 1 and 3. It would be too
> time consuming to test these patterns, so I use my knowledge and
> intuition (European 2 Dan) to simply decide what patterns to include.
> 
> So Valkyria has a lot of go knowledge, but mostly such knowledge that
> all go players have up to some strength, such as perhaps 8-10 kyu. It
> has no knowledge about global matters. The beauty of MC-evaluation is
> that globally strong moves are most of the time evaluated better than
> globally weak moves. Heavy playouts remove noise from MC-evaluation
> and make it more sensitive to the true value of moves. Still there
> are biases with all heavy playouts, but they are overcome with MC Tree
> Search (MCTS), which corrects mistakes in the evaluation recursively.
> 
> Here is my latest scaling experiment on 9x9 for Valkyria.
> 
> Valkyria plays 1150 random games per second on my 4-year-old laptop.
> 
> This test is against gnugo 3.7.10, assumed to be Elo 1800. Most
> datapoints are based on 500 games. "N sims" means Valkyria plays N
> heavy playouts per move played. Winrates are in %.
> 
> N sims   WinRate   Elo (rel Gnu)
> 47       7.4       1361
> 94       22        1580
> 188      37        1708
> 375      53        1821
> 750      69.9      1946
> 1500     81.2      2054
> 3000     88        2146
> 6000     92.6      2239
> 12000    94        2278
> 24000    97.2      2416
> 48000    97.4      2429
> 
> The heavy playouts of Valkyria need just 375 random games per move to
> match gnugo, using only 0.3 seconds per move. And even using only 47
> simulations per move it can still win.
> 
> So obviously the heavy playout code of Valkyria is much weaker (< Elo
> 1361) than Gnugo and most human opponents, but compared to CGOS a lot
> of programs with no knowledge are about the same level, although they
> use 2000 simulations or more.
> 
> Searching efficiently using MCTS with AMAF, it apparently can be made
> arbitrarily strong.
> 
> Hope this explains how both the nature of the playouts and the MCTS
> contribute to the playing strength of a program.
> 
> Should one go heavy or light? I do not know. I feel that Valkyria is a
> little bit too slow on equivalent hardware against most top programs.
> On the other hand I think it could be tweaked and improved upon.
> Perhaps it can even be made faster by removing code that does not
> improve playing strength. And there is probably still room for adding
> code that improves strength without a noticeable slowdown.
> 
> I just know that it is a lot of hard work doing it the way I did it.
> 
> Best
> Magnus


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Don Dailey
The random playouts or even  heavy playouts are not intended to emulate
a human player.   Heikki is exactly right.   

It's a crude measurement of how good the position is.   The moves in a
random playout are horrible and so are the moves in a heavy playout.
In fact, improving them arbitrarily will probably hurt the playouts.
Random playouts work reasonably well because to a certain extent bad
moves cancel each other.   Chrilly called this "mutual stupidity" I
think.  

For example,  a group of stones may be weak and ultimately lost, but it
is still possible to defend that group if the attacker isn't diligent.
What happens in the playouts is that the attacker is NOT diligent, but
neither is the defender!   So the result comes out "correct" anyway.

Of course this is not reliable, but it's amazingly good at getting a
reasonable picture of what is weak, what is strong and who owns what.

You can improve that by adding some knowledge to the playouts,  but you
must do this with great care.   In my example above, let's say you add a
piece of knowledge that causes the defender to succeed.   You can argue
that the playouts "play better go" now,  but the conclusion that you
come to for the group we are talking about is now wrong.   

- Don



On Sun, 2008-11-16 at 10:08 +0100, Heikki Levanto wrote:
> On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
> > Being a computer scientist but new to go, i can grasp some of the theory.
> > The question I was trying to get across was:
> > 
> > In a game of self play, if both parties are employing only monte carlo,
> > surely its not a good conceptual representation of a human, and if the
> > reinforcement learning is based on random simulations wouldnt it be very
> > weak when playing a real human?
> 
> 
> Here is another amateur answering.
> 
> The way I understand it, modern Monte Carlo programs do not even try to
> emulate a human player with a random player - obviously that would not work.
> 
> What they do is that they build a quite traditional search tree starting from
> the current position. They use a random playout as a crude way to evaluate a
> position. Based on this evaluation, they decide which branch of the tree to
> expand.
> 
> This is the way I understand the random playouts: If, in a given position,
> white is clearly ahead, he will win the game if both parties play perfect
> moves. He is also likely to win if both parties play reasonably good moves
> (say, like human amateurs), but there is a bit more of a chance that one
> player hits upon a good combination which the other misses, so the result is
> not quite as reliable. If the playouts are totally random, there is still a
> better chance for white to win, because both parties make equally bad moves.
> The results have much more variation, of course. So far it does not sound
> like a very good proposal, but things change if you consider the facts that
> we don't have perfect oracles, and good humans are slow to play out a
> position, and cannot be integrated into a computer program. Whereas random
> playouts can be done awfully fast, tens of thousands of playouts in a second.
> Averaging the results gives a fair indication of who is more likely to win
> from that position, just what is needed to decide which part of the search
> tree to expand.
> 
> The 'random' playouts are not totally random, they include a minimum of
> tactical rules (do not fill own eyes, do not pass as long as there are valid
> moves). Even this little will produce a few blind spots, moves that the
> playouts can not see, and systematically wrong results. Adding more
> go-specific knowledge can make the results much better (more likely to be
> right), but can also add some more blind spots. And it costs time, reducing
> the number of playouts the program can make.
> 
> Hope that explains something of the mystery
> 
> 
> Regards
> 
>Heikki
> 



Re: [computer-go] Re: FW: computer-go] Monte carlo play?

2008-11-16 Thread Thomas Wolf
On Sun, 16 Nov 2008, Claus Reinke wrote:

> ...
> better feeling for the game; personally, I don't like fast games(*), but
> ...

But there is this saying:

"Play quick, lose quick, learn quick!"   :)

Thomas


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Magnus Persson

Quoting Hideki Kato <[EMAIL PROTECTED]>:


Heikki Levanto: <[EMAIL PROTECTED]>:

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.


I believe CrazyStone's use of patterns does so and it seems
successful.


With Valkyria I try to follow two principles in heavy playouts.


1) In contact fights there are a lot of shapes that are played most of
the time. Thus Valkyria checks each move played for an obvious local
response. If so, it plays it deterministically. In many situations there
are two or more such candidates and then it plays one of those moves.

2) In many positions the last move played does not trigger any obvious
response, and then a random move is chosen uniformly.

3) There are moves that are inferior 100% of the time, both locally and
globally. These moves are pruned if they are selected, and a new random
move is chosen as long as there are moves left to try.


I got hundreds of handcoded patterns for both 1 and 3. It would be too  
time consuming to test these patterns, so I use my knowledge and  
intuition (European 2 Dan) to simply decide what patterns to include.
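Combining those three rules into one move selector might look roughly like
the sketch below; every interface named here is an assumption for
illustration, since Valkyria's real matcher is a hand-built decision tree:

    import random

    def valkyria_style_move(board, last_move, response_patterns,
                            pruning_patterns):
        """Rule 1: if the last move triggers obvious local responses, play
        one of them (random among equals). Rule 2: otherwise draw
        uniformly random moves. Rule 3: redraw whenever a drawn move
        matches an 'always inferior' pattern, while moves remain."""
        responses = [m for pat in response_patterns
                     for m in pat.responses_to(board, last_move)]
        if responses:
            return random.choice(responses)
        untried = list(board.legal_moves())
        random.shuffle(untried)
        for move in untried:
            if not any(pat.matches(board, move) for pat in pruning_patterns):
                return move
        return None  # nothing acceptable left: pass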


So Valkyria has a lot of go knowledge, but mostly such knowledge that
all go players have up to some strength, such as perhaps 8-10 kyu. It
has no knowledge about global matters. The beauty of MC-evaluation is
that globally strong moves are most of the time evaluated better than
globally weak moves. Heavy playouts remove noise from MC-evaluation
and make it more sensitive to the true value of moves. Still there
are biases with all heavy playouts, but they are overcome with MC Tree
Search (MCTS), which corrects mistakes in the evaluation recursively.


Here is my latest scaling experiment on 9x9 for Valkyria.

Valkyria plays 1150 random games per second on my 4-year-old laptop.

This test is against gnugo 3.7.10, assumed to be Elo 1800. Most
datapoints are based on 500 games. "N sims" means Valkyria plays N
heavy playouts per move played. Winrates are in %.


N sims   WinRate   Elo (rel Gnu)
47       7.4       1361
94       22        1580
188      37        1708
375      53        1821
750      69.9      1946
1500     81.2      2054
3000     88        2146
6000     92.6      2239
12000    94        2278
24000    97.2      2416
48000    97.4      2429

The heavy playouts of Valkyria need just 375 random games per move to
match gnugo, using only 0.3 seconds per move. And even using only 47
simulations per move it can still win.
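As a sanity check, the Elo column is consistent with the standard logistic
winrate model with GNU Go anchored at 1800 (the anchoring is stated above;
the formula is the usual one, not anything specific to Valkyria):

    import math

    def elo_from_winrate(p, opponent_elo=1800):
        """Elo implied by winning probability p under the logistic model."""
        return opponent_elo + 400 * math.log10(p / (1 - p))

    print(round(elo_from_winrate(0.074)))  # 1361, the 47-sims row
    print(round(elo_from_winrate(0.926)))  # 2239, the 6000-sims row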


So obviously the heavy playout code of Valkyria is much weaker (< Elo
1361) than Gnugo and most human opponents, but compared to CGOS a lot
of programs with no knowledge are about the same level, although they
use 2000 simulations or more.


Searching efficiently using MCTS with AMAF, it apparently can be made
arbitrarily strong.


Hope this explains how both the nature of the playouts and the MCTS
contribute to the playing strength of a program.


Should one go heavy or light? I do not know. I feel that Valkyria is a
little bit too slow on equivalent hardware against most top programs.
On the other hand I think it could be tweaked and improved upon.
Perhaps it can even be made faster by removing code that does not
improve playing strength. And there is probably still room for adding
code that improves strength without a noticeable slowdown.


I just know that it is a lot of hard work doing it the way I did it.

Best
Magnus


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Heikki Levanto
On Sun, Nov 16, 2008 at 11:46:28AM +, D Gilder wrote:
> > This is the way I understand the random playouts: If, in a given position,
> > white is clearly ahead, he will win the game if both parties play perfect
> > moves. He is also likely to win if both parties play reasonably good moves
> > (say, like human amateurs), but there is a bit more of a chance that one
> > player hits upon a good combination which the other misses, so the result
> > is not quite as reliable. If the playouts are totally random, there is
> > still a better chance for white to win, because both parties make equally
> > bad moves. The results have much more variation, of course. So far it does
> > not sound like a very good proposal, but things change if you consider the
> > facts that we don't have perfect oracles, and good humans are slow to play
> > out a position, and cannot be integrated into a computer program. Whereas
> > random playouts can be done awfully fast, tens of thousands of playouts in
> > a second. Averaging the results gives a fair indication of who is more
> > likely to win from that position, just what is needed to decide which part
> > of the search tree to expand.
> 
> Do you know what use (if any) is made of the standard deviation of the 
> results?

Now statistics isn't my strong point, but one of the first and most
successful systems was called UCT, for Upper Confidence bounds applied to
Trees. It calculated some sort of expected error and added that to the
winning ratio. Then it expanded the branch that had the highest value. If
that expansion was a win, the error margin would get smaller, but the
average result would get higher. Perhaps some other branch would be tried
next, but this one would still be a good candidate. If the result was a
loss, the average would drop, and so would the error, so this move would
become much less likely to be expanded.
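What is being described is essentially the UCB1 rule; a minimal sketch
(the exploration constant c is a typical choice, not from any particular
program):

    import math

    def ucb1(wins, visits, parent_visits, c=1.0):
        """Winning ratio plus an 'expected error' bonus; the bonus shrinks
        as a move is sampled more often, so the search balances trying
        promising moves against refining crude statistics."""
        if visits == 0:
            return float("inf")  # unvisited moves are tried first
        return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)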

I am sure someone who understands statistics will soon jump in and correct my
explanation :-)

  - Heikki

-- 
Heikki Levanto   "In Murphy We Turst" heikki (at) lsd (dot) dk



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Hideki Kato
Hello Heikki,

Heikki Levanto: <[EMAIL PROTECTED]>:
>On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
>> Being a computer scientist but new to go, i can grasp some of the theory.
>> The question I was trying to get across was:
>> 
>> In a game of self play, if both parties are employing only monte carlo,
>> surely its not a good conceptual representation of a human, and if the
>> reinforcement learning is based on random simulations wouldnt it be very
>> weak when playing a real human?
>
>
>Here is another amateur answering.
>
>The way I understand it, modern Monte Carlo programs do not even try to
>emulate a human player with a random player - obviously that would not work.

I believe CrazyStone's use of patterns does so and it seems 
successful.

Hideki

>What they do is that they build a quite traditional search tree starting from
>the current position. They use a random playout as a crude way to evaluate a
>position. Based on this evaluation, they decide which branch of the tree to
>expand.
>
>This is the way I understand the random playouts: If, in a given position,
>white is clearly ahead, he will win the game if both parties play perfect
>moves. He is also likely to win if both parties play reasonably good moves
>(say, like human amateurs), but there is a bit more of a chance that one
>player hits upon a good combination which the other misses, so the result is
>not quite as reliable. If the playouts are totally random, there is still a
>better chance for white to win, because both parties make equally bad moves.
>The results have much more variation, of course. So far it does not sound
>like a very good proposal, but things change if you consider the facts that
>we don't have perfect oracles, and good humans are slow to play out a
>position, and cannot be integrated into a computer program. Whereas random
>playouts can be done awfully fast, tens of thousands of playouts in a second.
>Averaging the results gives a fair indication of who is more likely to win
>from that position, just what is needed to decide which part of the search
>tree to expand.
>
>The 'random' playouts are not totally random, they include a minimum of
>tactical rules (do not fill own eyes, do not pass as long as there are valid
>moves). Even this little will produce a few blind spots, moves that the
>playouts can not see, and systematically wrong results. Adding more
>go-specific knowledge can make the results much better (more likely to be
>right), but can also add some more blind spots. And it costs time, reducing
>the number of playouts the program can make.
>
>Hope that explains something of the mystery
>
>
>Regards
>
>   Heikki
--
[EMAIL PROTECTED] (Kato)


[computer-go] Re: FW: computer-go] Monte carlo play?

2008-11-16 Thread Claus Reinke
>> In a game of self play, if both parties are employing only monte
>> carlo, ... random simulations... wouldnt it be very weak...
>> ... and some playing around I am clearly mistaken because its works
>> quite well.
>Yes, it doesn't make sense but it does indeed seem to work :-).

Plain Monte-Carlo bots *are* rather weak. But they work better than
might be expected, and they are fairly simple, which is extremely
motivating for potential Go programmers!-) Instead of having to learn
all of Go and a lot of AI before being able to start building anything,
one can start quickly, and then improve together with the bot. And
the current top programs claim plain Monte-Carlo bots as their remote
ancestors, so it isn't a dead end, either (though complete rewrites may
be needed somewhere along the path;-).

Another way to think about playouts is by comparing them with a
human player processing through the ranks:

1 only knowing the rules makes for very inefficient/slow/buggy play
2 playing lots of games quickly has often been recommended to get a
better feeling for the game; personally, I don't like fast games(*), but
at least the worst inadequacies tend to show up even if we only play
equally clueless opponents (probably because weaknesses aren't
evenly distributed, so that peaks in understanding in one player
happen to match weaknesses in understanding in the other)
3 human players learn over time, until all players in a group have
reached an equal level (or have gone as far as they can, with the
effort they can afford to put into the game)
4 now further progress depends on innovation in the existing pool
of players, on joining a different player pool, or on bringing in
other players, preferably players whose strengths expose
weaknesses in the current pool (similar effects can be achieved
by reading books or observing games)

If we allow the learning outcome in 2/3 to be represented as patterns
("wait a minute, I've fallen into that trap before, so I shouldn't do this",
"I usually do this, but that would be faster - I wonder whether it'll work"),
then random playouts can emulate a weak form of this process. Instead
of learning from lots of games over time, plain Monte-Carlo bots
extrapolate from lots of dumb games, every time they consider their
next move.

Random "games" are really too dumb to be considered games, but bots
can play lots of them in a short time, so they can at least see trends (a
mixture of on-the-spot learning and very bad reading ahead). If we
assume that the quantity compensates somewhat for the quality, a
Monte-Carlo bot is a bit like beginners after lots of games who
always forget their lessons learned when the session ends.

They may have an idea of who is ahead and what is alive, and be able
to organise their games based on those ideas, but a stronger player (who
remembers his lessons) may be able to rip their ideas apart ("you might
both think that group is alive but what do you do if I play here? and who
is ahead if you lose that group?"). And then there is the scenario of "strong
player visits club and humiliates local champion" ("you might think that
playing there kills that group, but what if they reply here?").

Adding patterns or other biases to make playouts heavier (both slower
and more relevant for evaluation) is similar to improving reading skills,
but with differences: bots already read to the end, so eliminating useless
moves doesn't improve the playout depth, it improves at best the density
of information to be gained from playouts (so you don't have to play as
many of them, or can get more out of those you do play); but playouts
are also still used for on-the-spot-learning, and biases have limited scope
for success, so one has to be careful not to ruin this aspect. Using a tree
on top of the playouts is not just a way to represent the knowledge
extracted from the playouts, but also helps to direct playout effort towards
relevant situations, and gives a place where traditional Go knowledge
can be built in without affecting the playouts directly (so they become
more of an evaluation function than a candidate move generator).

One problem is that the top programs tend to be fairly close in strength.
There are variations in method, so there is some further progress, but
how much of the human going-through-the-ranks evolution has been
captured in current top programs, I don't know. I suspect that it is less
than their ranks would indicate. There is certainly reading going on,
discussion, and inviting stronger players or different bots.

Oh, and if any of this seems to make sense, remember that I just made
it all up!-) You might want to consult these slides for overviews from
folks who have been in the business:


"Old-fashioned Computer Go vs Monte-Carlo Go"; by Bruno Bouzy,
Invited tutorial, IEEE 2007 Symposium on Computational Intelligence
in Games, CIG 07, Hawaï, USA, 1st April 2007.
Abstract:

http://cswww.essex.ac.

Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread D Gilder
On Sunday 16 November 2008, Heikki Levanto wrote:
> On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
> > Being a computer scientist but new to go, i can grasp some of the theory.
> > The question I was trying to get across was:
> >
> > In a game of self play, if both parties are employing only monte carlo,
> > surely its not a good conceptual representation of a human, and if the
> > reinforcement learning is based on random simulations wouldnt it be very
> > weak when playing a real human?
>
> Here is another amateur answering.
>
> The way I understand it, modern Monte Carlo programs do not even try to
> emulate a human player with a random player - obviously that would not
> work.
>
> What they do is that they build a quite traditional search tree starting
> from the current position. They use a random playout as a crude way to
> evaluate a position. Based on this evaluation, they decide which branch of
> the tree to expand.
>
> This is the way I understand the random playouts: If, in a given position,
> white is clearly ahead, he will win the game if both parties play perfect
> moves. He is also likely to win if both parties play reasonably good moves
> (say, like human amateurs), but there is a bit more of a chance that one
> player hits upon a good combination which the other misses, so the result
> is not quite as reliable. If the playouts are totally random, there is
> still a better chance for white to win, because both parties make equally
> bad moves. The results have much more variation, of course. So far it does
> not sound like a very good proposal, but things change if you consider the
> facts that we don't have perfect oracles, and good humans are slow to play
> out a position, and cannot be integrated into a computer program. Whereas
> random playouts can be done awfully fast, tens of thousands of playouts in
> a second. Averaging the results gives a fair indication of who is more
> likely to win from that position, just what is needed to decide which part
> of the search tree to expand.

Do you know what use (if any) is made of the standard deviation of the 
results?

>
> The 'random' playouts are not totally random, they include a minimum of
> tactical rules (do not fill own eyes, do not pass as long as there are
> valid moves). Even this little will produce a few blind spots, moves that
> the playouts can not see, and systematically wrong results. Adding more
> go-specific knowledge can make the results much better (more likely to be
> right), but can also add some more blind spots. And it costs time, reducing
> the number of playouts the program can make.
>
> Hope that explains something of the mystery
>
>
> Regards
>
>Heikki




Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Heikki Levanto
On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
> Being a computer scientist but new to go, i can grasp some of the theory.
> The question I was trying to get across was:
> 
> In a game of self play, if both parties are employing only monte carlo,
> surely its not a good conceptual representation of a human, and if the
> reinforcement learning is based on random simulations wouldnt it be very
> weak when playing a real human?


Here is another amateur answering.

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.

What they do is that they build a quite traditional search tree starting from
the current position. They use a random playout as a crude way to evaluate a
position. Based on this evaluation, they decide which branch of the tree to
expand.

This is the way I understand the random playouts: If, in a given position,
white is clearly ahead, he will win the game if both parties play perfect
moves. He is also likely to win if both parties play reasonably good moves
(say, like human amateurs), but there is a bit more of a chance that one
player hits upon a good combination which the other misses, so the result is
not quite as reliable. If the playouts are totally random, there is still a
better chance for white to win, because both parties make equally bad moves.
The results have much more variation, of course. So far it does not sound
like a very good proposal, but things change if you consider the facts that
we don't have perfect oracles, and good humans are slow to play out a
position, and cannot be integrated into a computer program. Whereas random
playouts can be done awfully fast, tens of thousands of playouts in a second.
Averaging the results gives a fair indication of who is more likely to win
from that position, just what is needed to decide which part of the search
tree to expand.

The 'random' playouts are not totally random, they include a minimum of
tactical rules (do not fill own eyes, do not pass as long as there are valid
moves). Even this little will produce a few blind spots, moves that the
playouts can not see, and systematically wrong results. Adding more
go-specific knowledge can make the results much better (more likely to be
right), but can also add some more blind spots. And it costs time, reducing
the number of playouts the program can make.

Hope that explains something of the mystery


Regards

   Heikki

-- 
Heikki Levanto   "In Murphy We Turst" heikki (at) lsd (dot) dk



Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-15 Thread Darren Cook
> In a game of self play, if both parties are employing only monte
> carlo, ... random simulations... wouldnt it be very weak...
> ... and some playing around I am clearly mistaken because its works
> quite well.

Yes, it doesn't make sense but it does indeed seem to work :-).

> I have seen papers implementing patterns and prior knowledge with
> monte carlo (dated 2006) has that become a "standard" now, and when
> people refer to monte carlo they dont mean absolutly random?

Just using monte carlo playouts is still weak.
Monte carlo + UCT is strong. (UCT is a specific algorithm, so people are
now referring to the set of similar algorithms as "MCTS" for Monte Carlo
Tree Search.)

No one is using purely random playouts in any top program, but David
Fotland described Many Faces' playouts as "fairly light". I think Crazy
Stone (as described in detail in one of Rémi Coulom's papers) or
Valkyria perhaps have the heaviest playouts, and Mogo is somewhere between?

Too much knowledge in the monte carlo playouts and you use too many CPU
cycles and cannot do enough of them. (And the real situation is even
more complex than that, as sometimes adding knowledge that seems really
fundamental for human players does not seem to help, even before
allowing for the extra CPU cycles used.)

Darren


-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)


[computer-go] FW: computer-go] Monte carlo play?

2008-11-15 Thread dave.devos
From: tony tang [mailto:[EMAIL PROTECTED]
Sent: Sat 15-11-2008 23:08
To: [EMAIL PROTECTED]
Subject: RE: computer-go] Monte carlo play?


Thanks Dave, that was incredibly helpful; hopefully this new hobby of mine will
materialise into a decent project.
 
All the best
 
Tony
 




Subject: RE: computer-go] Monte carlo play?
Date: Sat, 15 Nov 2008 22:44:51 +0100
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]



Hello Tony,
 
Ok, so you know about the minimax algorithm and the like. My impression was 
wrong and I'm very sorry for my analogy.
 
I'm no expert on montecarlo, so I can only say what I know. But most of the 
real experts have been rather quiet during the last week, so perhaps I can act 
as a stand-in.
 
The basic scheme is indeed absolutely random play (alternating turns and
removing captured stones as in normal play) until the game finishes. The only
exceptions to random play are occupied intersections, simple ko recaptures,
suicide moves (not sure if all programs prune suicide if the rules allow it)
and moves that fill one-point eyes of the color who is to play (there may be
slight differences in the exact definition of a one-point eye).
 
Without the one-eye pruning, random games would never finish, because both 
random players would keep on killing their own stones and refilling the empty 
space after the opponent has captured. 
Even with eye detection and simple ko detection, the game could end up in a 
cycle occasionally because of a triple ko or more pathological situations, so 
one should probably also limit the number of moves and discard the result if 
that limit is exceeded. This is probably a much cheaper solution to the cycle 
problem than a more sophisticated detection.
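
As a sketch of that loop, with invented types, just to make the bookkeeping
concrete:

    // Random fill-in with a cycle guard: stop after two consecutive passes,
    // or discard the result when a move cap is exceeded (probable cycle).
    interface PlayoutBoard {
        int randomMove(int color); // -1 for pass; eyes and illegal moves excluded
        void play(int point, int color);
        int winner();              // counts the final position
    }

    class RandomFillIn {
        static final int BLACK = 1, WHITE = 2, NO_RESULT = 0;

        static int run(PlayoutBoard board, int maxMoves) {
            int color = BLACK, passes = 0, moves = 0;
            while (passes < 2) {
                if (++moves > maxMoves) {
                    return NO_RESULT; // triple ko etc.: throw this game away
                }
                int point = board.randomMove(color);
                if (point == -1) {
                    passes++;
                } else {
                    passes = 0;
                    board.play(point, color);
                }
                color = 3 - color; // the same code supplies both sides' moves
            }
            return board.winner();
        }
    }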
 
A purely random game is played out until both players are forced to pass. This 
is called a light playout. Claus Reinke recently coined the term "random 
fill-in" for this process 
(http://www.mail-archive.com/computer-go@computer-go.org/msg09749.html). I like 
this term better than "light playout", but it doesn't seem to catch on here.
 
The board is then counted to determine the winner, and this binary result is 
added to a statistic for the move that is currently being evaluated. The 
accumulated statistic is effectively the evaluation value for the position. It 
is then propagated back to the game tree root in a minimax way.
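
As a sketch, the statistic can be as simple as a pair of counters per move;
the class below is illustrative, not taken from any actual program:

    // Per-move statistic: each fill-in adds one binary result.
    class MoveStat {
        int playouts;
        int wins;

        void addResult(boolean won) {
            playouts++;
            if (won) {
                wins++;
            }
        }

        // The accumulated win rate acts as the evaluation value.
        double winRate() {
            return playouts == 0 ? 0.5 : (double) wins / playouts;
        }
    }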
 
In go it is hard to quickly evaluate a position statically (or even slowly, for 
that matter). So it is nice to have a dynamic "evaluation" that becomes more 
accurate as you spend more CPU cycles on it. You can see that it's important 
that random fill-ins are fast, but doing a lot of playouts is still a rather 
large investment when you compare it with a static evaluation function. So it 
is even more important to avoid wasting effort when evaluating positions by 
random fill-ins. The trick is to use crude statistics (few fill-ins) to decide 
which statistic to refine by investing more fill-ins. This is where the UCT 
value comes in (http://senseis.xmp.net/?UCT). It is derived from the statistic, 
and it decides which position to investigate further, preferring good 
statistics over bad ones, but also preferring crude statistics over accurate 
ones.
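
For concreteness: the UCT value is usually the UCB1 formula, a win rate plus
an exploration bonus that grows for rarely visited moves. A sketch, with the
exploration constant c as a tunable assumption:

    // UCB1 value used by UCT to decide which child to refine next.
    class UctNode {
        int visits;  // number of fill-ins through this node
        double wins; // wins observed in those fill-ins

        double uctValue(int parentVisits, double c) {
            if (visits == 0) {
                return Double.POSITIVE_INFINITY; // unexplored: try it once
            }
            double exploitation = wins / visits; // prefer good statistics...
            double exploration =                 // ...but revisit crude ones
                c * Math.sqrt(Math.log(parentVisits) / visits);
            return exploitation + exploration;
        }
    }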
 
There is no training involved and there is no a priori knowledge involved here 
except for the rules and eye detection. The program has absolutely no idea 
about go concepts.
 
Would you call this process learning?
 
Programs using this algorithm are called "pure" MC programs and although they 
get better with more CPU cycles, they aren't very strong with normal time 
limits on current hardware (or hardware in the near future). You are right, by 
itself pure MC is just not effective enough.
 
So programmers started to build more Go knowledge into this. It makes the 
fill-in process slower, so these are called "heavy playouts". And this is also 
where my knowledge ends. I may guess that instead of selecting moves in a 
uniformly random way, moves are weighted by a priori Go knowledge built in by 
the programmer, but I know no details and I don't know if there have been 
publications on this. Perhaps there aren't any, because it is also a 
competitive field and commercial interests may be involved too.
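
A guess at what that might look like, with made-up weights and helper names
(real programs are surely more refined):

    // Hypothetical weighted move selection for a heavy playout: instead of
    // a uniform choice, each candidate's weight encodes built-in Go knowledge.
    import java.util.List;
    import java.util.Random;

    interface MoveKnowledge {
        boolean captures(int point);
        boolean matchesPattern(int point);
    }

    class HeavyPolicy {
        private final Random rng = new Random();

        int selectMove(List<Integer> candidates, MoveKnowledge knowledge) {
            double[] weight = new double[candidates.size()];
            double total = 0;
            for (int i = 0; i < candidates.size(); i++) {
                int p = candidates.get(i);
                double w = 1.0;                          // uniform base weight
                if (knowledge.captures(p)) w *= 10;      // invented bonus
                if (knowledge.matchesPattern(p)) w *= 5; // invented bonus
                weight[i] = w;
                total += w;
            }
            double r = rng.nextDouble() * total; // sample proportional to weight
            for (int i = 0; i < weight.length; i++) {
                r -= weight[i];
                if (r <= 0) return candidates.get(i);
            }
            return candidates.get(candidates.size() - 1);
        }
    }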
 
Other contributors can probably tell you more about this.
 
Best Regards,
Dave 
 



From: tony tang [mailto:[EMAIL PROTECTED]
Sent: Sat 15-11-2008 21:22
To: [EMAIL PROTECTED]
Subject: [RE:computer-go] Monte carlo play?


Hello Dave,

Thank you for a thorough introduction of the theory, and I sincerely hope I am 
not wasting your time with amateur questions.
Because it was quite late when I sent the question, I think I didn't word it 
correctly and miscommunicated my question.

Being a computer scientist but new to go, I can grasp some of the theory. The 
question I 

[computer-go] FW: computer-go] Monte carlo play?

2008-11-15 Thread dave.devos
From: tony tang [mailto:[EMAIL PROTECTED]
Sent: Sat 15-11-2008 21:22
To: [EMAIL PROTECTED]
Subject: [RE:computer-go] Monte carlo play?


Hello Dave,

Thank you for a thorough introduction of the theory, and I sincerely hope I am 
not wasting your time with amateur questions.
Because it was quite late when I sent the question, I think I didn't word it 
correctly and miscommunicated my question.

Being a computer scientist but new to go, I can grasp some of the theory. The 
question I was trying to get across was:

In a game of self play, if both parties are employing only Monte Carlo, surely 
it's not a good conceptual representation of a human, and if the reinforcement 
learning is based on random simulations, wouldn't it be very weak when playing 
a real human?
According to Nick Wedd and some playing around, I am clearly mistaken because 
it works quite well.
Is it because they are training (not in the neural network sense) their Monte 
Carlo based engine against another engine with prior knowledge or something 
of that nature?

I have seen papers implementing patterns and prior knowledge with Monte Carlo 
(dated 2006); has that become a "standard" now, and when people refer to Monte 
Carlo do they not mean absolutely random?

Thank you

Kind regards

Tony




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte carlo play

2008-11-15 Thread dave.devos
Just a quickstart here, and I'm not one of the real experts, so it may contain 
inaccuracies. Perhaps you already know all this, perhaps not.
 
The basis of many game-playing programs (like chess programs) is the minimax 
(or negamax) algorithm.
It is a simple algorithm by itself and you can find a lot of background and 
examples for chess or other games if you google it.
Basically, minimax explores all possible variations stemming from the current 
position and then selects the best move.
 
This algorithm works in principle, but there is a problem with it:
For interesting games like chess and go, it will usually take the program 
billions of years to find the best move, given a game position.
 
In the past decades several adaptations of the minimax algorithm were invented 
to enable the algorithm to find a reasonably good move in a timespan that 
humans are prepared to wait for (like evaluation functions, alpha-beta pruning 
and transposition tables). The program keeps investigating possibilities for 
some time and then the move seeming best so far is played. The longer the 
program is allowed to think (and the faster the computer), the better the move 
it will come up with.
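
For readers who have never seen it: a bare-bones sketch of minimax in its 
negamax form, with alpha-beta pruning. GameState and its methods are invented 
for this sketch, not any particular program's API:

    import java.util.List;

    interface GameState {
        List<Integer> legalMoves();
        GameState play(int move);
        boolean isTerminal();
        double evaluate(); // static evaluation, from the side to move's view
    }

    class Negamax {
        static double search(GameState s, int depth, double alpha, double beta) {
            if (depth == 0 || s.isTerminal()) {
                return s.evaluate();
            }
            double best = Double.NEGATIVE_INFINITY;
            for (int move : s.legalMoves()) {
                // The opponent's best result is the negation of ours.
                double score = -search(s.play(move), depth - 1, -beta, -alpha);
                best = Math.max(best, score);
                alpha = Math.max(alpha, score);
                if (alpha >= beta) {
                    break; // prune: the opponent will never allow this line
                }
            }
            return best;
        }
    }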
 
These adaptations work by reducing the number of possibilities the program 
explores, without weakening the program too much. The program must ignore many 
possibilities to save time (this is called pruning), but the more possibilities 
it ignores, the higher the risk that it accidentally ignores the best move. 
Also, there is always the danger of the horizon effect: the program 
misinterprets the situation because it simply doesn't get enough time to fully 
investigate every possibility. 
 
The challenge for the programmer is to strike an optimal balance between time 
consumption and losing by accidentally ignoring or misinterpreting the best 
move. Some game programs are even able to fine-tune themselves to optimize this 
balance over time (learning).
 
A combination of these adaptations has proven to be good enough to enable 
chess programs on current hardware to beat any human with reasonable "thinking" 
time.
But computer-go is a harder problem than computer-chess. The adaptations that 
are successful in computer-chess are just not enough for computer-go, mainly 
because there are far more possibilities (and I mean FAR more) to explore in go. 
 
But now there is a recent adaptation that has proven to be fairly successful in 
computer-go: Monte Carlo playouts.
The program reduces the work to be done by gambling a number of times and 
measuring the average payoff. If the average payoff is good, the move is 
assumed to be good for the time being. Also, it doesn't automatically ignore 
moves with a low payoff (remember the risk of ignoring the best move). The 
program just postpones further investigation of seemingly bad moves until the 
moves seeming good initially give more and more disappointing payoffs after 
further investigation. 
Although this invention has had only moderate success in computer-chess, it has 
proven to be fairly successful in computer-go.
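
Stripped of all Go detail, the gamble is just an average of binary outcomes. 
A sketch, with the random game abstracted into a supplier the caller provides:

    import java.util.function.BooleanSupplier;

    class FlatMonteCarlo {
        // playoutWon plays one random game from the position after the
        // candidate move and reports whether we won it.
        static double averagePayoff(BooleanSupplier playoutWon, int n) {
            int wins = 0;
            for (int i = 0; i < n; i++) {
                if (playoutWon.getAsBoolean()) {
                    wins++;
                }
            }
            return (double) wins / n; // the move's Monte Carlo value
        }
    }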
 
But at the heart of a Monte Carlo go program, you can still recognize the 
minimax algorithm.
 
Best Regards,
Dave
 
 


From: [EMAIL PROTECTED] on behalf of tony tang
Sent: Wed 12-11-2008 2:23
To: computer-go@computer-go.org
Subject: [computer-go] Monte carlo play


Greetings all go gurus,
 
I am new to programming go, could someone explain to me how a Monte Carlo 
based evaluation manages to play random games by itself?
i.e.: who/what is the opponent which supplies the opposing moves which allows 
another move to be randomly played after making the initial move at the root.
I am implementing in Java; is there a package/framework which allows me to 
train my software?
 
Thank you
 
Kind regards
 
Tony




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte carlo play

2008-11-15 Thread dave.devos
You are absolutely right. Tony did not imply he wanted to go for the summit 
today. Also, Tony's qualifications are surely a lot better than a Hawaiian 
shirt and a surfboard. 
 
My analogy was an exaggeration with the intention of explaining to Tony that he 
has more catching up to do than he might have expected. 
By no means was it my intention to put Tony down. I hope Tony understands this and 
I sincerely hope that Tony won't give up on joining us in this wonderful 
endeavour, because we need all the help we can get.
 
Best Regards,
Dave



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte carlo play

2008-11-15 Thread Nick Wedd
In message
<[EMAIL PROTECTED]>,
[EMAIL PROTECTED] writes

< Hawaiian shirt analogy snipped >

>I hope you don't feel offended. Indeed you took up a wonderful
>endeavour, but I sense that you're not quite ready to go for the summit
>today.

But Tony never expressed any interest in "going for the summit".  Maybe
he just wants to write a simple MC implementation.  Here is what he
wrote:

>I am new to programming go, could someone explain to me how a monte
>carlo based evaluation manages to play random games by itself?
>i.e.: who/what is the opponent which supplies the opposing moves which
>allows another move to be randomly played after making the initial move
>at the root.

>I am implementing in java, is there a package/framework which allows me
>to train my software.

I will try to answer these questions myself.

> who/what is the opponent which supplies the opposing moves

It is the same piece of your program that supplies your own moves.  For
each "random game" or "rollout", your program places both black and
white stones, alternately, at random.

>is there a package/framework which allows me
>to train my software.

Nothing "trains" your software.  This isn't a neural net.  The way it
works is:
  you write a program
  A: it plays, badly
  you modify it so it plays differently, and with luck, better
  repeat from A

Nick
-- 
Nick Wedd    [EMAIL PROTECTED]
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Monte carlo play

2008-11-15 Thread dave.devos
Hello Tony,
 
I'm just speaking for myself here, but I suspect you may not get the answers 
you are looking for, because you're not asking the right questions.
 
I get the impression that the situation is similar to this scenario:
 

A young guy with a surfboard and a Hawaiian shirt turns up at 
base camp on Mount Everest and says:
"Hey dudes, what's up!
This morning I had this wonderful idea: I wanna climb Mount 
Everest!
So here I am, and I even brought my hiking shoes!
But can anyone give me directions to the nearest McDonalds, cuz 
I'm starving."
 
Everybody in the tent stares at him speechlessly.

I may be wrong, but that's my impression.
I'm not even sure whether I myself should be here at base camp, even though I 
am a 4d in Go and even though I have 25 years of programming experience in a 
couple of different languages (including the last 8 years as a fulltime 
professional developer).
It's only logical that ever since I learned Go 20 years ago I've had a 
special interest in computer-go, and I have spent some time over the years 
attempting to build a strong go playing program myself (although not nearly 
as much time as some others here).
I'm a mountaineer, but climbing Mount Everest is something else. I feel a 
bit like a rookie among all the seasoned mountaineers hanging around here at 
the Everest base camp. Most of the time I keep my mouth shut when one of them 
is talking, because I'm a bit apprehensive that I might say something that is 
utterly obvious to the others or even outright stupid.
 
And then this guy turns up in his Hawaiian shirt.
 
I hope you don't feel offended. Indeed you took up a wonderful endeavour, but I 
sense that you're not quite ready to go for the summit today.
 
Best Regards,
Dave



From: [EMAIL PROTECTED] on behalf of tony tang
Sent: Thu 13-11-2008 22:03
To: go mailing list
Subject: RE: [computer-go] Monte carlo play


Thanks Darren,
 
After reading a couple of interesting papers on this field, I am applying 
pattern matching to limit the number of possibilities in the Monte Carlo tree 
(on 19x19 it appears to take a day to play).
I was wondering if you could comment on whether my following concept is right 
on the evaluation part.
 
The possible patterns for the current situation are passed to a class where 
self play begins. Each different pattern runs in its own instance, and at the 
end the results are written into a file. The game tree will be constructed 
using the stats from the file. 
 
Thanks again
 
Tony
 
 


> Date: Wed, 12 Nov 2008 10:34:21 +0900
> From: [EMAIL PROTECTED]
> To: computer-go@computer-go.org
> Subject: Re: [computer-go] Monte carlo play
> 
> > I am new to programming go, could someone explain to me how a monte
> > carlo based evaluation manages to play random games by itself? i.e.:
> > who/what is the opponent which supplies the opposing moves which
> > allows another move to be randomly played after making the initial
> 
> It is self-play, so both players are random.
> 
> I'm not aware of any programs that use different playout strategies for
> black/white, though it has been briefly mentioned before on this list,
> and I personally think it might be interesting as a way of discovering
> and more accurately evaluating the imbalances present in most go
> positions (i.e. usually one side has more strength and one side has more
> territory, and so to correctly understand the position one side has to
> play aggressive moves and the other has to play defensive moves).
> 
> > move at the root. I am implementing in Java, is there a
> > package/framework which allows me to train my software.
> 
> This page, recently started by Eric Marchand, might be a good starting
> place:
> http://ricoh51.free.fr/go/engineeng.htm
> 
> Darren
> 
> 
> -- 
> Darren Cook, Software Researcher/Developer
> http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
> open source dictionary/semantic network)
> http://dcook.org/work/ (About me and my work)
> http://dcook.org/blogs.html (My blogs and articles)
> ___
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/





___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte carlo play

2008-11-13 Thread tony tang

Thanks Darren,
 
After reading a couple of interesting papers on this field, I am applying 
pattern matching to limit the number of possibilities in the Monte Carlo tree 
(on 19x19 it appears to take a day to play).
I was wondering if you could comment on whether my following concept is right 
on the evaluation part.
 
The possible patterns for the current situation are passed to a class where 
self play begins. Each different pattern runs in its own instance, and at the 
end the results are written into a file. The game tree will be constructed 
using the stats from the file. 
 
Thanks again
 
Tony
 
 
_
Win £1000 John Lewis shopping sprees with BigSnapSearch.com
http://clk.atdmt.com/UKM/go/117442309/direct/01/___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte carlo play

2008-11-11 Thread Darren Cook
> I am new to programming go, could someone explain to me how a monte
> carlo based evaluation manages to play random games by itself? i.e.:
> who/what is the opponent which supplies the opposing moves which
> allows another move to be randomly played after making the initial

It is self-play, so both players are random.

I'm not aware of any programs that use different playout strategies for
black/white, though it has been briefly mentioned before on this list,
and I personally think it might be interesting as a way of discovering
and more accurately evaluating the imbalances present in most go
positions (i.e. usually one side has more strength and one side has more
territory, and so to correctly understand the position one side has to
play aggressive moves and the other has to play defensive moves).
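
The change itself would be mechanical: pick the playout policy by color. A
sketch with invented types, purely to show the shape of the experiment:

    // Hypothetical asymmetric playouts: one policy per color, e.g. an
    // aggressive policy for the strong side and a defensive one for the other.
    interface Policy<B> {
        int selectMove(B board, int color);
    }

    class AsymmetricPlayout<B> {
        private final Policy<B> blackPolicy;
        private final Policy<B> whitePolicy;

        AsymmetricPlayout(Policy<B> blackPolicy, Policy<B> whitePolicy) {
            this.blackPolicy = blackPolicy;
            this.whitePolicy = whitePolicy;
        }

        int nextMove(B board, int colorToPlay) {
            Policy<B> policy = (colorToPlay == 1) ? blackPolicy : whitePolicy;
            return policy.selectMove(board, colorToPlay);
        }
    }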

> move at the root. I am implementing in Java, is there a
> package/framework which allows me to train my software.

This page, recently started by Eric Marchand, might be a good starting
place:
  http://ricoh51.free.fr/go/engineeng.htm

Darren


-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Monte carlo play

2008-11-11 Thread tony tang

Greetings all go gurus,
 
I am new to programming go, could someone explain to me how a Monte Carlo 
based evaluation manages to play random games by itself?
i.e.: who/what is the opponent which supplies the opposing moves which allows 
another move to be randomly played after making the initial move at the root.
I am implementing in Java; is there a package/framework which allows me to 
train my software?
 
Thank you
 
Kind regards
 
Tony
_
See the most popular videos on the web 
http://clk.atdmt.com/GBL/go/115454061/direct/01/___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/