Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread Kensuke Matsuzaki
I used stochastic sampling at internal nodes because of this:
> During the forward simulation phase of SEARCH, the action at each node x is 
> selected by sampling a ∼ π¯(·|x).
> As a result, the full imaginary trajectory is generated consistently 
> according to policy π¯.

> In this section, we establish our main claim namely that AlphaZero’s action 
> selection criteria can be interpreted as approximating the solution to a 
> regularized policy-optimization objective.

I think they say that UCT and PUCT are approximations of directly sampling from π¯,
but I haven't understood section 3 well.
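
For what it's worth, here is a rough Python sketch of how I read Section 3:
compute π¯ from (q, π_theta, visit counts) and then sample a child from it at an
internal node. The λ_N multiplier and the closed form
π¯(a) = λ_N · π_theta(a) / (α − q(a)) are only my reading of the paper (nothing
below is Leela Zero code), so treat the formulas as assumptions.

    import numpy as np

    def pi_bar(q, prior, visits, c_puct=1.25):
        # q: child Q-values, prior: pi_theta over the same children,
        # visits: child visit counts n(b); formulas follow my reading of the paper
        n = visits.sum()
        if n == 0:
            return prior / prior.sum()               # nothing searched yet
        lam = c_puct * np.sqrt(n) / (n + len(q))     # lambda_N multiplier
        lo = (q + lam * prior).max()                 # lower bound on alpha
        hi = q.max() + lam                           # upper bound on alpha
        for _ in range(50):                          # bisect alpha so pi_bar sums to 1
            alpha = 0.5 * (lo + hi)
            total = (lam * prior / (alpha - q)).sum()
            lo, hi = (alpha, hi) if total > 1.0 else (lo, alpha)
        p = lam * prior / (alpha - q)
        return p / p.sum()

    def select_child(q, prior, visits, rng=np.random.default_rng()):
        # SEARCH-style selection: sample a ~ pi_bar instead of the argmax pUCT rule
        return rng.choice(len(q), p=pi_bar(q, prior, visits))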

On Mon, Jul 20, 2020 at 2:51, Daniel wrote:
>
> @Kensuke I suppose all the proposed algorithms ACT, SEARCH and LEARN are 
> meant to be used during training, no?
> I think I understand ACT and LEARN but I am not sure about SEARCH for which 
> they say this:
>
> > During search, we propose to stochastically sample actions according to π¯ 
> > instead of the deterministic action selection rule of Eq. 1.
>
> This sounds much like the random selection done at the root with temperature, 
> but this time applied at internal nodes.
> Does it mean the pUCT formula is not used? Why does the selection have to be 
> stochastic now?
> On selection, you compute π_bar every time from (q, π_theta, n_visits) so I 
> suppose π_bar has everything it needs to balance exploration and exploitation.
>
>
> On Sun, Jul 19, 2020 at 8:10 AM David Wu  wrote:
>>
>> I imagine that at low visits at least, "ACT" behaves similarly to Leela 
>> Zero's "LCB" move selection, which also has the effect of sometimes 
>> selecting a move that is not the max-visits move, if its value estimate has 
>> recently been found to be sufficiently larger to balance the fact that it is 
>> lower prior and lower visits (at least, typically, this is why the move 
>> wouldn't have been the max visits move in the first place). It also scales 
>> in an interesting way with empirical observed playout-by-playout variance of 
>> moves, but I think by far the important part is that it can use sufficiently 
>> confident high value to override max-visits.
>>
>> The gain from "LCB" in match play I recall is on the very very rough order 
>> of 100 Elo, although it could be less or more depending on match conditions 
>> and what neural net is used and other things. So for LZ at least, "ACT"-like 
>> behavior at low visits is not new.
>>
>>
>> On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki  
>> wrote:
>>>
>>> Hi,
>>>
>>> I couldn't improve leela zero's strength by implementing SEARCH and ACT.
>>> https://github.com/zakki/leela-zero/commits/regularized_policy
>>>
>>> On Fri, Jul 17, 2020 at 2:47, Rémi Coulom wrote:
>>> >
>>> > This looks very interesting.
>>> >
>>> > From a quick glance, it seems the improvement is mainly when the number 
>>> > of playouts is small. Also they don't test on the game of Go. Has anybody 
>>> > tried it?
>>> >
>>> > I will take a deeper look later.
>>> >
>>> > On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:
>>> >>
>>> >> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>>> >>
>>> >>
>>> >> --
>>> >> Honesty is a very expensive gift. So, don't expect it from cheap people 
>>> >> - Warren Buffett
>>> >> http://tayek.com/
>>> >>



-- 
Kensuke Matsuzaki
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread Daniel
@Kensuke I suppose all the proposed algorithms ACT, SEARCH and LEARN are
meant to be used during training, no?
I think I understand ACT and LEARN but I am not sure about SEARCH for which
they say this:

> During search, we propose to stochastically sample actions according to
π¯ instead of the deterministic action selection rule of Eq. 1.

This sounds much like the random selection done at the root with
temperature, but this time applied at internal nodes.
Does it mean the pUCT formula is not used? Why does the selection have to
be stochastic now?
On selection, you compute π_bar every time from (q, π_theta, n_visits) so I
suppose π_bar has everything it needs to balance exploration and
exploitation.


On Sun, Jul 19, 2020 at 8:10 AM David Wu  wrote:

> I imagine that at low visits at least, "ACT" behaves similarly to Leela
> Zero's "LCB" move selection, which also has the effect of sometimes
> selecting a move that is not the max-visits move, if its value estimate has
> recently been found to be sufficiently larger to balance the fact that it
> is lower prior and lower visits (at least, typically, this is why the move
> wouldn't have been the max visits move in the first place). It also scales
> in an interesting way with empirical observed playout-by-playout variance
> of moves, but I think by far the important part is that it can use
> sufficiently confident high value to override max-visits.
>
> The gain from "LCB" in match play I recall is on the very very rough order
> of 100 Elo, although it could be less or more depending on match conditions
> and what neural net is used and other things. So for LZ at least,
> "ACT"-like behavior at low visits is not new.
>
>
> On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki 
> wrote:
>
>> Hi,
>>
>> I couldn't improve leela zero's strength by implementing SEARCH and ACT.
>> https://github.com/zakki/leela-zero/commits/regularized_policy
>>
>> On Fri, Jul 17, 2020 at 2:47, Rémi Coulom wrote:
>> >
>> > This looks very interesting.
>> >
>> > From a quick glance, it seems the improvement is mainly when the number
>> of playouts is small. Also they don't test on the game of Go. Has anybody
>> tried it?
>> >
>> > I will take a deeper look later.
>> >
>> > On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:
>> >>
>> >>
>> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>> >>
>> >>
>> >> --
>> >> Honesty is a very expensive gift. So, don't expect it from cheap
>> people - Warren Buffett
>> >> http://tayek.com/
>> >>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread David Wu
I imagine that at low visits at least, "ACT" behaves similarly to Leela
Zero's "LCB" move selection, which also has the effect of sometimes
selecting a move that is not the max-visits move, if its value estimate has
recently been found to be sufficiently larger to balance the fact that it
is lower prior and lower visits (at least, typically, this is why the move
wouldn't have been the max visits move in the first place). It also scales
in an interesting way with empirical observed playout-by-playout variance
of moves, but I think by far the important part is that it can use
sufficiently confident high value to override max-visits.

The gain from "LCB" in match play I recall is on the very very rough order
of 100 Elo, although it could be less or more depending on match conditions
and what neural net is used and other things. So for LZ at least,
"ACT"-like behavior at low visits is not new.

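A minimal sketch of the general idea behind such an LCB selection, for readers who
have not seen it (this is only an illustration, not Leela Zero's exact formula; the
per-move statistics and their dict layout are assumed to come from the search):

    import math

    def lcb_select(moves, z=1.96, min_visits=10):
        # moves: list of dicts with 'visits', 'mean' value and 'var' of the
        # per-playout outcomes (a layout assumed just for this sketch).
        # Pick the move with the highest pessimistic (lower-confidence-bound)
        # value; fall back to the max-visits move if nothing has enough visits.
        fallback = max(moves, key=lambda m: m["visits"])
        best, best_lcb = None, -float("inf")
        for m in moves:
            if m["visits"] < min_visits:
                continue                              # value estimate too noisy
            stderr = math.sqrt(m["var"] / m["visits"])
            lcb = m["mean"] - z * stderr              # confident lower bound on value
            if lcb > best_lcb:
                best, best_lcb = m, lcb
        return best if best is not None else fallback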

On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki 
wrote:

> Hi,
>
> I couldn't improve leela zero's strength by implementing SEARCH and ACT.
> https://github.com/zakki/leela-zero/commits/regularized_policy
>
> On Fri, Jul 17, 2020 at 2:47, Rémi Coulom wrote:
> >
> > This looks very interesting.
> >
> > From a quick glance, it seems the improvement is mainly when the number
> of playouts is small. Also they don't test on the game of Go. Has anybody
> tried it?
> >
> > I will take a deeper look later.
> >
> > On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:
> >>
> >>
> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
> >>
> >>
> >> --
> >> Honesty is a very expensive gift. So, don't expect it from cheap people
> - Warren Buffett
> >> http://tayek.com/
> >>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread Kensuke Matsuzaki
Hi,

I couldn't improve leela zero's strength by implementing SEARCH and ACT.
https://github.com/zakki/leela-zero/commits/regularized_policy

On Fri, Jul 17, 2020 at 2:47, Rémi Coulom wrote:
>
> This looks very interesting.
>
> From a quick glance, it seems the improvement is mainly when the number of 
> playouts is small. Also they don't test on the game of Go. Has anybody tried 
> it?
>
> I will take a deeper look later.
>
> On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:
>>
>> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>>
>>
>> --
>> Honesty is a very expensive gift. So, don't expect it from cheap people - 
>> Warren Buffett
>> http://tayek.com/
>>



-- 
Kensuke Matsuzaki
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-16 Thread Rémi Coulom
This looks very interesting.

From a quick glance, it seems the improvement is mainly when the number of
playouts is small. Also they don't test on the game of Go. Has anybody
tried it?

I will take a deeper look later.

On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:

>
> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>
>
> --
> Honesty is a very expensive gift. So, don't expect it from cheap people -
> Warren Buffett
> http://tayek.com/
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread Urban Hafner
Exactly, you don't check superko in the playouts, and you generally stop
playouts after 3 * boardsize * boardsize moves. But once you start adding more logic
to the playouts (like not playing into single-point eyes) they tend not to hit
this limit.
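
For reference, a tiny sketch of that playout loop (the board helpers are
hypothetical; only the control flow and the 3 * boardsize * boardsize cap are the
point):

    import random

    def playout(board, to_move, board_size=9, rng=random.Random()):
        # light playout: uniform random over legal moves, never filling a
        # single-point eye, hard-capped at 3 * boardsize^2 moves, no superko check
        passes = 0
        for _ in range(3 * board_size * board_size):
            moves = [m for m in board.legal_moves(to_move)        # suicide/ko excluded
                     if not board.is_single_point_eye(m, to_move)]
            if moves:
                board.play(rng.choice(moves), to_move)
                passes = 0
            else:
                passes += 1                          # nothing sensible left: pass
                if passes == 2:
                    break                            # both passed, game over
            to_move = -to_move                       # switch colour (+1/-1 here)
        return board.score()                         # e.g. Tromp-Taylor score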

Sent from my iPhone

 On 28.03.2015 at 11:53, hughperkins2 hughperki...@gmail.com wrote:
 
 Oh wait, the superko check is not that cheap, but it is so rare that you can probably
 ignore it in playouts, and just check before submitting a move to the server.
 If it's superko, then just pass perhaps.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread folkert
Well, you could check only whether a point is empty or not, but you could also
check whether you're putting stones in a dead area, or whether the move would be
suicide, etc.

On Sat, Mar 28, 2015 at 10:15:32AM +0100, Álvaro Begué wrote:
 I am not sure I understand the question. The only thing that is typically
 not checked in the playouts is superko. What other validity checks are
 you performing?
 
 Álvaro.
 
 
 
 On Sat, Mar 28, 2015 at 9:54 AM, holger krekel hol...@merlinux.eu wrote:
 
  On Sat, Mar 28, 2015 at 08:51 +0100, folkert wrote:
   Hi,
  
   For a monte carlo search, are only valid moves performed? Or does it
   work from beginning to the end of a playout using whatever free position
   is available on the board?
 
  I am also interested in the question.
 
   Because I read here that people can do 25k playouts per second while my
   program can only do ~ 20 per second when doing full validity checks on
   all steps.
 
  Do you have a reference, some context for the 25K playouts?
 
  holger
 
 
  
   Folkert van Heusden
  
   --
   --
   Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com



Folkert van Heusden

-- 
MultiTail is a versatile tool for watching logfiles and output of
commands. Filtering, coloring, merging, diff-view, etc.
http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread Urban Hafner
20 playouts per second is a bit slow, Folkert. But I've been there, too! Now my bot is
at around 1000 playouts per second on 19x19 and 5000 on 9x9 (but I also check not to
play into single-point eyes). That's still quite slow and further optimisation will
be required. But OTOH, on a 4-core machine without UCT (only the AMAF heuristic)
I get around 1250 Elo on CGOS 13x13 (
http://cgos.boardspace.net/13x13/cross/Imrscl-017-A-N.html).

Urban

On Sat, Mar 28, 2015 at 11:30 AM, Petr Baudis pa...@ucw.cz wrote:

 On Sat, Mar 28, 2015 at 10:15:32AM +0100, Álvaro Begué wrote:
  I am not sure I understand the question. The only thing that is typically
  not checked in the playouts is superko. What other validity checks are
  you performing?

 There are a few:
 
 (i) No single stone suicide.  Can't think of how to do playouts without
 that; you wouldn't know when to stop playing.

 (ii) No multi stone suicide.  Sometimes in the fastest implementations
 this rule is relaxed (i.e. you proclaim that you are using Tromp-Taylor
 or New Zealand rules).  But in real-world engines, you need to check
 things like self-atari or 2-liberty semeai and then a suicide check
 naturally falls out from that too.

 (iii) No ko. This is a pretty cheap check, and without it again your
 playouts may end up looping.
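
For concreteness, a compact sketch of the checks listed above, plus the positional
superko check that is usually deferred until a real move is sent to the server;
the board object and its methods are hypothetical helpers, not taken from any
particular engine.

    def is_legal(board, move, colour, history_hashes):
        if not board.is_empty(move):
            return False
        nxt = board.copy()
        captured = nxt.place_and_capture(move, colour)   # play stone, remove dead groups
        # (i)/(ii) suicide: after captures, the new stone's group must have a liberty
        if captured == 0 and nxt.liberties_of(move) == 0:
            return False
        # (iii) simple ko: immediate recapture at the marked ko point is forbidden
        if move == board.ko_point:
            return False
        # positional superko: whole-board repetition, cheap with Zobrist hashing,
        # but normally skipped inside playouts and only checked for the real move
        if nxt.zobrist_hash() in history_hashes:
            return False
        return True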

 BTW, the fastest playouts are probably done by libego and that's open
 source.  But as I said in the past - consider not optimizing first;
 write some reasonable playout heuristics first and *then* profile your
 code.

 Petr Baudis




-- 
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread hughperkins2
20 playouts per second is pretty slow. Are you sure you're not just looping
around endlessly or something?

Playing in a dead area is legal, as is filling your own eyes etc. Only suicide
and ko are illegal. Even superko is a pretty cheap check.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread folkert
  Because I read here that people can do 25k playouts per second while my
  program can only do ~ 20 per second when doing full validity checks on
  all steps.
 Do you have a reference, some context for the 25K playouts?

Look for the topic "What's a good playout speed?" in the mailing list
archives.

http://computer-go.org/pipermail/computer-go/


Folkert van Heusden

-- 
Curious about the inner workings of your car? Then check O2OO: it'll
tell you all that there is to know about your car's engine!
http://www.vanheusden.com/O2OO/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread holger krekel
On Sat, Mar 28, 2015 at 08:51 +0100, folkert wrote:
 Hi,
 
 For a monte carlo search, are only valid moves performed? Or does it
 work from beginning to the end of a playout using whatever free position
 is available on the board?
 
I am also interested in the question.

 Because I read here that people can do 25k playouts per second while my
 program can only do ~ 20 per second when doing full validity checks on
 all steps.

Do you have a reference, some context for the 25K playouts?

holger


 
 Folkert van Heusden
 
 -- 
 --
 Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

-- 
about me:http://holgerkrekel.net/about-me/
contracting: http://merlinux.eu
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread Álvaro Begué
I am not sure I understand the question. The only thing that is typically
not checked in the playouts is superko. What other validity checks are
you performing?

Álvaro.



On Sat, Mar 28, 2015 at 9:54 AM, holger krekel hol...@merlinux.eu wrote:

 On Sat, Mar 28, 2015 at 08:51 +0100, folkert wrote:
  Hi,
 
  For a monte carlo search, are only valid moves performed? Or does it
  work from beginning to the end of a playout using whatever free position
  is available on the board?

 I am also interested in the question.

  Because I read here that people can do 25k playouts per second while my
  program can only do ~ 20 per second when doing full validity checks on
  all steps.

 Do you have a reference, some context for the 25K playouts?

 holger


 
  Folkert van Heusden
 
  --
  --
  Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread hughperkins2
Oh wait, the superko check is not that cheap, but it is so rare that you can probably
ignore it in playouts, and just check before submitting a move to the server.
If it's superko, then just pass perhaps.
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [computer-go] monte carlo

2009-10-25 Thread Folkert van Heusden
What method are you guys using for the Monte Carlo search? What do you
do?
I pick a move at random, then for the resulting board that comes out of
that I pick another move, and so on.
Then, after that, I evaluate the number of stones etc.
What do you guys look at to select a move using Monte Carlo?

On Sat, Oct 17, 2009 at 05:02:33PM +0200, Folkert van Heusden wrote:
 People,
 
 I'm trying to implement a Monte Carlo algorithm in my go program. Now
 the results are dramatic: the Elo rating of my go program drops from
 1150 to below 700. I tried:
  - evaluating the number of captured stones
  - evaluating strategic elements (without MC this strategic eval gives
 that 1150 Elo).
 Currently my program can evaluate 500 positions per second and I let it
 think for 5 seconds.
 What could be the cause of these dramatic results? Wrong evaluation? Not
 enough nodes processed?


Folkert van Heusden

-- 
www.vanheusden.com/multitail - multitail is tail on steroids. multiple
   windows, filtering, coloring, anything you can think of
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] monte carlo

2009-10-25 Thread Petr Baudis
On Sun, Oct 25, 2009 at 06:52:56PM +0100, Folkert van Heusden wrote:
 What method are you guys using for the Monte Carlo search? What do you
 do?
 I pick a move at random, then for the resulting board that comes out of
 that I pick another move, and so on.
 Then, after that, I evaluate the number of stones etc.

Do you mean you count the score? Make sure you actually count the stones
correctly, and (quite importantly) that you are picking moves truly at
random, not skewed in any way.

 What do you guys look at to select a move using monte carlo?

People use heuristics to prefer captures, escape atari, play local
moves that match certain 3x3 patterns, etc. - there are plenty of papers
covering this, and I recommend you look at them. It's fairly important to
have all the heuristics somewhat randomized (depending on other parts of
my engine, I've found any value between 60% and 90% probability of
applying each heuristic to be optimal).
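
A rough sketch of that randomisation (the heuristic generators are hypothetical
helpers, and 0.8 is just a placeholder inside the 60%-90% range mentioned above):

    import random

    def playout_move(board, colour, rng=random.Random(), p_apply=0.8):
        # consult each heuristic only with probability p_apply so playouts stay
        # diverse; otherwise fall back to a uniformly random legal move
        heuristics = (board.capture_moves,            # prefer captures
                      board.atari_escape_moves,       # escape atari
                      board.local_3x3_pattern_moves)  # local 3x3 pattern matches
        for heuristic in heuristics:
            if rng.random() < p_apply:
                candidates = heuristic(colour)
                if candidates:
                    return rng.choice(candidates)
        return rng.choice(board.legal_moves(colour))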

-- 
Petr Pasky Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] monte carlo

2009-10-25 Thread Folkert van Heusden
  What method are you guys using for the Monte Carlo search? What do you
  do?
  I pick a move at random, then for the resulting board that comes out of
  that I pick another move, and so on.
  Then, after that, I evaluate the number of stones etc.
 
 Do you mean you count the score? Make sure you actually count the stones
 correctly, and (quite importantly) that you are picking moves truly at
 random, not skewed in any way.

Yes, the score indeed. Well, only the number of stones removed from the
board. I did not implement an algorithm to count areas belonging to a
color yet.

  What do you guys look at to select a move using monte carlo?
 People use heuristics to prefer captures, escaping atari, play local
 moves that match certain 3x3 patterns etc. - there is plenty of papers
 covering this, I recommend you look at them. It's fairly important to
 have all the heuristics somewhat randomized (depending on other parts of
 my engine, I've found any value between 60% to 90% probability of
 applying each heuristic optimal).

Yeah, I rated most of them equally. Some are weighted depending on
applicability.

May I ask: how many board-settings (one move + all evaluation) do your
programs calculate per second?


Folkert van Heusden

-- 
Ever wonder what is out there? Any alien races? Then please support
the s...@home project: setiathome.ssl.berkeley.edu
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] monte carlo

2009-10-25 Thread Petr Baudis
On Sun, Oct 25, 2009 at 10:16:58PM +0100, Folkert van Heusden wrote:
   What method are you guys using for the Monte Carlo search? What do you
   do?
   I pick a move at random, then for the resulting board that comes out of
   that I pick another move, and so on.
   Then, after that, I evaluate the number of stones etc.
  
  Do you mean you count the score? Make sure you actually count the stones
  correctly, and (quite importantly) that you are picking moves truly at
  random, not skewed in any way.
 
 Yes, the score indeed. Well, only the number of stones removed from the
 board. I did not implement an algorithm to count areas belonging to a
 color yet.

That would seem to have only quite limited correlation with winning the
game.

 May I ask: how many board-settings (one move + all evaluation) do your
 programs calculate per second?

Depends on the hardware. ;-) We usually count in the number of complete
playouts per second.

On 9x9, libego can play, I think, about 100k playouts per second on a decent
CPU, but it is very light. Pachi can do 35k light playouts per second and
15k heavy playouts per second on the same CPU (single-threaded). Both
Pachi and libego are written in a compiled language; when your speed is
in the right order of magnitude, I've found playout move selection
improvements much more profitable than further optimizing for speed.

-- 
Petr Pasky Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] monte carlo

2009-10-25 Thread David Fotland
Many Faces of Go is currently much slower.  On 9x9 it does about 18k
playouts per second, on two CPU cores, so about 9K per thread.  The average
9x9 game is about 92 moves, so there are about 1.6 million board-settings
per second.

The playouts are far from random.  Many moves are local responses to the
last move, and when there is no forced local response, the global moves are
not fully random either. 

My original light code did about 55K playouts per second on one CPU core,
but was far weaker.

David

 
  May I ask: how many board-settings (one move + all evaluation) do your
  programs calculate per second?
 
 Depends on the hardware. ;-) We usually count in number of complete
 playouts per second.
 
 On 9x9, libego can play i think about 100k playouts per second on decent
 CPU, but is very light. Pachi can do 35k light playouts per second and
 15k heavy playouts per second on the same CPU (single-threaded). Both
 Pachi and libego are written in a compiled language; when your speed is
 in the right order of magnitude, I've found playout move selection
 improvements much more profitable than further optimizing for speed.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Isaac Deutsch
I admit I had trouble understanding the details of the paper. What I  
think is the biggest problem for applying this to bigger (up to 19x19)  
games is that you somehow need access to the true value of a move,  
i.e. it's a win or a loss. On the 5x5 board they used, this might be  
approximated pretty well, but there's no chance on 19x19 to do so.



Am 13.08.2009 um 05:14 schrieb Michael Williams:

After about the 5th reading, I'm concluding that this is an  
excellent paper.  Is anyone (besides the authors) doing research  
based on this?  There is a lot to do.




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Michael Williams
In future papers they should avoid using a strong authority like Fuego for the training and instead force it to learn from a naive uniform random playout policy 
(with 100x or 1000x more playouts) and then build on that with an iterative approach (as was suggested in the paper).


I also had another thought.  Since they are training the policy to maximize the balance and not the winrate, wouldn't you be able to extract more information 
from each trial by using the score instead of the game result?  The normal pitfalls to doing so do not apply here.




Isaac Deutsch wrote:
I admit I had trouble understanding the details of the paper. What I 
think is the biggest problem for applying this to bigger (up to 19x19) 
games is that you somehow need access to the true value of a move, 
i.e. it's a win or a loss. On the 5x5 board they used, this might be 
approximated pretty well, but there's no chance on 19x19 to do so.



Am 13.08.2009 um 05:14 schrieb Michael Williams:

After about the 5th reading, I'm concluding that this is an excellent 
paper.  Is anyone (besides the authors) doing research based on this?  
There is a lot to do.







___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Jason House
A web search turned up a 2 page and an 8 page version. I read the  
short one. I agree that it's promising work that requires some follow- 
up research.


Now that you've read it so many times, what excites you about it? Can  
you envision a way to scale it to larger patterns and boards on modern  
hardware?


Sent from my iPhone

On Aug 12, 2009, at 11:14 PM, Michael Williams michaelwilliam...@gmail.com 
 wrote:


After about the 5th reading, I'm concluding that this is an  
excellent paper.  Is anyone (besides the authors) doing research  
based on this?  There is a lot to do.



David Silver wrote:

Hi everyone,
Please find attached my ICML paper with Gerry Tesauro on  
automatically learning a simulation policy for Monte-Carlo Go. Our  
preliminary results show a 200+ Elo improvement over previous  
approaches, although our experiments were restricted to simple  
Monte-Carlo search with no tree on small boards.

Abstract
In this paper we introduce the first algorithms for efficiently  
learning a simulation policy for Monte-Carlo search. Our main idea  
is to optimise the balance of a simulation policy, so that an  
accurate spread of simulation outcomes is maintained, rather than  
optimising the direct strength of the simulation policy. We develop  
two algorithms for balancing a simulation policy by gradient  
descent. The first algorithm optimises the balance of complete  
simulations, using a policy gradient algorithm; whereas the second  
algorithm optimises the balance over every two steps of simulation.  
We compare our algorithms to reinforcement learning and supervised  
learning algorithms for maximising the strength of the simulation  
policy. We test each algorithm in the domain of 5x5 and 6x6  
Computer Go, using a softmax policy that is parameterised by  
weights for a hundred simple patterns. When used in a simple Monte- 
Carlo search, the policies learnt by simulation balancing achieved  
significantly better performance, with half the mean squared error  
of a uniform random policy, and equal overall performance to a  
sophisticated Go engine.

-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Olivier Teytaud
 . Pebbles has a Mogo playout design, where you check
 for patterns only around the last move (or two).


In MoGo, it's not only around the last move (at least with some probability,
and when there are empty spaces on the board); this is the "fill board"
modification.

(this provides a big improvement in 19x19 with big numbers of simulations,
see http://www.lri.fr/~rimmel/publi/EK_explo.pdf , Fig 3 page 8 - not only
quantitative improvement, but also, according to players, a qualitative
improvement in the way mogo plays)

Best regards,
Olivier
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Olivier Teytaud


 Just to clarify: I was not saying that Mogo's policy consisted
 *solely* of looking for patterns around the last move. Merely that
 it does not look for patterns around *every* point, which other
 playout policies (e.g., CrazyStone, if I understand Remi's papers
 correctly) appear to do. The RL paper seems to require that
 playout design.


Fine!



 BTW, FillBoard seems to help Pebbles, too. A few percent better
 on 9x9 games. No testing on larger boards. YMMV, and like everything
 about computer go: all predictions are guaranteed to be wrong,
 or your money back.


For us the improvement is essential in 19x19 - and it would say something for the
generality of fillboard if it also helps for you :-) The loss of diversity due to
patterns is really clear in some situations, so the problem solved by fillboard is
understandable, and I believe it should work for you too - but, as you say, all
predictions in computer go are almost guaranteed to be wrong :-)
Olivier
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-12 Thread Michael Williams

After about the 5th reading, I'm concluding that this is an excellent paper.  
Is anyone (besides the authors) doing research based on this?  There is a lot 
to do.


David Silver wrote:

Hi everyone,

Please find attached my ICML paper with Gerry Tesauro on automatically 
learning a simulation policy for Monte-Carlo Go. Our preliminary results 
show a 200+ Elo improvement over previous approaches, although our 
experiments were restricted to simple Monte-Carlo search with no tree on 
small boards.





Abstract

In this paper we introduce the first algorithms for efficiently learning 
a simulation policy for Monte-Carlo search. Our main idea is to optimise 
the balance of a simulation policy, so that an accurate spread of 
simulation outcomes is maintained, rather than optimising the direct 
strength of the simulation policy. We develop two algorithms for 
balancing a simulation policy by gradient descent. The first algorithm 
optimises the balance of complete simulations, using a policy gradient 
algorithm; whereas the second algorithm optimises the balance over every 
two steps of simulation. We compare our algorithms to reinforcement 
learning and supervised learning algorithms for maximising the strength 
of the simulation policy. We test each algorithm in the domain of 5x5 
and 6x6 Computer Go, using a softmax policy that is parameterised by 
weights for a hundred simple patterns. When used in a simple Monte-Carlo 
search, the policies learnt by simulation balancing achieved 
significantly better performance, with half the mean squared error of a 
uniform random policy, and equal overall performance to a sophisticated 
Go engine.


-Dave






___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-05-04 Thread David Silver

Hi,

We used alpha=0.1. There may well be a better setting of alpha, but  
this appeared to work nicely in our experiments.

-Dave

On 3-May-09, at 2:01 AM, elife wrote:


Hi Dave,

 In your experiments what's the constant value alpha you set?

Thanks.

2009/5/1 David Silver sil...@cs.ualberta.ca:


Yes, in our experiments they were just constant numbers M=N=100.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-05-01 Thread David Silver

Hi Yamato,


If M and N are the same, is there any reason to run M simulations and
N simulations separately?  What happens if you combine them and  
calculate

V and g in the single loop?


I think it gives the wrong answer to do it in a single loop. Note that  
the simulation outcomes z are used to compute both V and g, so that  
they are quite strongly correlated. In general, if random variables X  
and Y are correlated then E[X]E[Y] != E[XY].


A simple example of this problem would be sampling E[X]E[X] for some  
random variable X. Let's say X is actually just noise with mean zero.  
If you just take one sample x1, then x1*x1 is always +ve, and you  
would estimate E[X]E[X] = E[X^2] > 0. But if you take two independent
samples x1 and x2, then x1*x2 would be +ve half the time and -ve half  
the time, and you would correctly estimate E[X]E[X]=0.
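
(A quick numerical check of this point, with X taken to be standard normal noise:)

    import random

    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(100000)]         # mean-zero noise

    # one sample used twice: x*x is always positive, so the estimate is biased up
    one_sample = sum(x * x for x in xs) / len(xs)              # ~1.0, i.e. E[X^2]

    # two independent samples: x1*x2 is positive or negative equally often
    pairs = list(zip(xs[0::2], xs[1::2]))
    two_sample = sum(a * b for a, b in pairs) / len(pairs)     # ~0.0, i.e. E[X]E[X]

    print(one_sample, two_sample)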


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom

David Silver wrote:


2. Run another N simulations, average the value of psi(s,a) over 
all positions and moves in games that black won (call this g) 


This is strange: you do not take lost playouts into consideration.

I believe there is a problem with your estimation of the gradient. 
Suppose for instance that you count z = +1 for a win, and z = -1 for a 
loss. Then you would take lost playouts into consideration. This makes 
me a little suspicious.


The fundamental problem here may be that your estimate of the gradient 
is biased by the playout policy. You should probably sample X(s) 
uniformly at random to have an unbiased estimator. Maybe this can be 
fixed with importance sampling, and then you may get a formula that is 
symmetrical regarding wins and losses. I don't have time to do it now, 
but it may be worth taking a look.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom

Rémi Coulom wrote:
The fundamental problem here may be that your estimate of the gradient 
is biased by the playout policy. You should probably sample X(s) 
uniformly at random to have an unbiased estimator. Maybe this can be 
fixed with importance sampling, and then you may get a formula that is 
symmetrical regarding wins and losses. I don't have time to do it now, 
but it may be worth taking a look.


Rémi 


More precisely: you should estimate the value of N playouts as Sum p_i 
z_i / Sum p_i instead of Sum z_i. Then, take the gradient of Sum p_i z_i 
/ Sum p_i. This would be better.


Maybe Sum p_i z_i / Sum p_i would be better for MCTS, too ?
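
Written out (reading p_i as the probability the playout policy assigns to
simulation i and z_i as its outcome - my interpretation), the suggested estimate is

    \hat{V} \;=\; \frac{\sum_{i=1}^{N} p_i\, z_i}{\sum_{i=1}^{N} p_i}

rather than the plain sum (or average) of the z_i.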

Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver

Hi Remi,


This is strange: you do not take lost playouts into consideration.

I believe there is a problem with your estimation of the gradient.  
Suppose for instance that you count z = +1 for a win, and z = -1 for  
a loss. Then you would take lost playouts into consideration. This  
makes me a little suspicious.


Sorry, I should have made it clear that this assumes that we are  
treating black wins as z=1 and white wins as z=0.
In this special case, the gradient is the average of games in which  
black won.
But yes, more generally you need to include games won by both sides.  
The algorithms in the paper still cover this case - I was just trying  
to simplify their description to make it easy to understand the ideas.


The fundamental problem here may be that your estimate of the  
gradient is biased by the playout policy. You should probably sample  
X(s) uniformly at random to have an unbiased estimator. Maybe this  
can be fixed with importance sampling, and then you may get a  
formula that is symmetrical regarding wins and losses. I don't have  
time to do it now, but it may be worth taking a look.


The gradient already compensates for the playout policy (equation 9),  
so in fact it would bias the gradient to sample uniformly at random!


In equation 9, the gradient is taken with respect to the playout  
policy parameters. Using the product rule (third line), the gradient  
is equal to the playout policy probabilities multiplied by the sum  
likelihood ratios multiplied by the simulation outcomes z. This  
gradient can be computed by sampling playouts  instead of multiplying  
by the playout policy probabilities. This is also why games with  
outcomes of z=0 can be ignored - they don't affect this gradient  
computation.


More precisely: you should estimate the value of N playouts as Sum  
p_i z_i / Sum p_i instead of Sum z_i. Then, take the gradient of Sum  
p_i z_i / Sum p_i. This would be better.


Maybe Sum p_i z_i / Sum p_i would be better for MCTS, too ?


I think a similar point applies here. We care about the expected value  
of the playout policy, which can be sampled directly from playouts,  
instead of multiplying by the policy probabilities. You would only  
need importance sampling if you were using a different playout policy  
to the one which you are evaluating. But I guess I'm not seeing any  
good reason to do this?


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom

David Silver wrote:


Sorry, I should have made it clear that this assumes that we are 
treating black wins as z=1 and white wins as z=0.
In this special case, the gradient is the average of games in which 
black won.
But yes, more generally you need to include games won by both sides. 
The algorithms in the paper still cover this case - I was just trying 
to simplify their description to make it easy to understand the ideas.


I understood this. What I find strange is that using -1/1 should be 
equivalent to using 0/1, but your algorithm behaves differently: it 
ignores lost games with 0/1, and uses them with -1/1.


Imagine you add a big constant to z. One million, say. This does not
change the problem. You get either 1000000 or 1000001 as the outcome of a
playout. But then, your estimate of the gradient becomes complete noise.


So maybe using -1/1 is better than 0/1? Since your algorithm depends so
much on the definition of the reward, there must be an optimal way to
set the reward. Or there must be a better way to define an algorithm that
would not depend on an offset in the reward.




The gradient already compensates for the playout policy (equation 9), 
so in fact it would bias the gradient to sample uniformly at random!

Yes, you are right.

There is still something wrong that I don't understand. There may be a 
way to quantify the amount of noise in the unbiased gradient estimate, 
and it would depend on the average reward. Probably setting the average 
reward to zero is what would minimize noise in the gradient estimate. 
This is just an intuitive guess.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver

Hi Remi,

I understood this. What I find strange is that using -1/1 should be  
equivalent to using 0/1, but your algorithm behaves differently: it  
ignores lost games with 0/1, and uses them with -1/1.


Imagine you add a big constant to z. One million, say. This does not
change the problem. You get either 1000000 or 1000001 as the outcome of
a playout. But then, your estimate of the gradient becomes complete
noise.


So maybe using -1/1 is better than 0/1? Since your algorithm
depends so much on the definition of the reward, there must be an
optimal way to set the reward. Or there must be a better way to define
an algorithm that would not depend on an offset in the reward.


There is still something wrong that I don't understand. There may be  
a way to quantify the amount of noise in the unbiased gradient  
estimate, and it would depend on the average reward. Probably  
setting the average reward to zero is what would minimize noise in  
the gradient estimate. This is just an intuitive guess.


Okay, now I understand your point :-) It's a good question - and I  
think you're right. In REINFORCE any baseline can be subtracted from  
the reward, without affecting the expected gradient, but possibly  
reducing its variance. The baseline leading to the best estimate is  
indeed the average reward.  So it should be the case that  {-1,+1}  
would estimate the gradient g more efficiently than {0,1}, assuming  
that we see similar numbers of black wins as white wins across the  
training set.


So to answer your question, we can safely modify the algorithm to use  
(z-b) instead of z, where b is the average reward. This would then  
make the {0,1} and {-1,+1} cases equivalent (with appropriate scaling  
of step-size). I don't think this would have affected the results we  
presented (because all of the learning algorithms converged anyway, at  
least approximately, during training) but it could be an important  
modification for larger boards.
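
In symbols (standard REINFORCE-with-baseline notation, which I am assuming lines
up with the notation of equation 9 in the paper), the point is that

    \nabla_\theta \, \mathbb{E}_{\pi_\theta}[z]
      \;=\; \mathbb{E}_{\pi_\theta}\!\Big[ (z - b) \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Big]
      \qquad \text{for any constant } b,

so choosing b to be the average reward makes the {0,1} and {-1,+1} conventions
give the same estimate up to a scale factor, as discussed above.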


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Michael Williams

I wish I was smart   :(


David Silver wrote:

Hi Remi,

I understood this. What I find strange is that using -1/1 should be 
equivalent to using 0/1, but your algorithm behaves differently: it 
ignores lost games with 0/1, and uses them with -1/1.


Imagine you add a big constant to z. One million, say. This does not
change the problem. You get either 1000000 or 1000001 as the outcome of a
playout. But then, your estimate of the gradient becomes complete noise.


So maybe using -1/1 is better than 0/1? Since your algorithm depends
so much on the definition of the reward, there must be an optimal way
to set the reward. Or there must be a better way to define an algorithm
that would not depend on an offset in the reward.


There is still something wrong that I don't understand. There may be a 
way to quantify the amount of noise in the unbiased gradient estimate, 
and it would depend on the average reward. Probably setting the 
average reward to zero is what would minimize noise in the gradient 
estimate. This is just an intuitive guess.


Okay, now I understand your point :-) It's a good question - and I think 
you're right. In REINFORCE any baseline can be subtracted from the 
reward, without affecting the expected gradient, but possibly reducing 
its variance. The baseline leading to the best estimate is indeed the 
average reward.  So it should be the case that  {-1,+1} would estimate 
the gradient g more efficiently than {0,1}, assuming that we see similar 
numbers of black wins as white wins across the training set.


So to answer your question, we can safely modify the algorithm to use 
(z-b) instead of z, where b is the average reward. This would then make 
the {0,1} and {-1,+1} cases equivalent (with appropriate scaling of 
step-size). I don't think this would have affected the results we 
presented (because all of the learning algorithms converged anyway, at 
least approximately, during training) but it could be an important 
modification for larger boards.


-Dave




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver

Hi Yamato,


Thanks for the detailed explanation.
M, N and alpha are constant numbers, right?  What did you set them to?


You're welcome!
Yes, in our experiments they were just constant numbers M=N=100.


The feature vector is the set of patterns you use, with value 1 if a
pattern is matched and 0 otherwise. The simulation policy selects
actions in proportion to the exponentiated, weighted sum of all
matching patterns. For example let's say move a matches patterns 1  
and

2, move b matches patterns 1 and 3, and move c matches patterns 2 and
4. Then move a would be selected with probability e^(theta1 +
theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 +
theta4)). The theta values are the weights on the patterns which we
would like to learn. They are the log of the Elo ratings in Remi
Coulom's approach.


OK, I guess it is the formula 5 in the paper.


Yes, exactly.


The only tricky part is computing the vector psi(s,a). Each component
of psi(s,a) corresponds to a particular pattern, and is the  
difference

between the observed feature (i.e. whether the pattern actually
occurred after move a in position s) and the expected feature (the
average value of the pattern, weighted by the probability of  
selecting

each action).


I still don't understand this. Is it the formula 6?
Could you please give me an example like the above?


Yes that's right, this is equation 6.

Okay, let's continue the example above. Let's say that in position s,  
using the current theta, moves a, b and c will be selected with  
probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was  
actually selected. Now consider pattern 1, this is matched after a.  
But the probability of matching was (0.5*1 +0.3*1 +0.2*0) = 0.8. So  
psi_1(s,a)=1-0.8 = 0.2. For pattern 2, psi_2(s,a)=1- 
(0.5*1+0.3*0+0.2*1)=0.3, etc.. So in this example the vector psi(s,a)  
= [0.2,0.3,-0.3,-0.2].


In other words, psi tells us whether each pattern was actually matched  
more or less than we could have expected.
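
Just to make the worked example concrete, a few lines of Python reproducing the
numbers above (the dictionary layout is made up for the example; it is not RLGO
code):

    import numpy as np

    # pattern indices 1..4 -> vector components 0..3
    phi = {                                  # phi(s, move): 1 if the pattern matches after the move
        "a": np.array([1.0, 1.0, 0.0, 0.0]),
        "b": np.array([1.0, 0.0, 1.0, 0.0]),
        "c": np.array([0.0, 1.0, 0.0, 1.0]),
    }
    pi = {"a": 0.5, "b": 0.3, "c": 0.2}      # current policy's move probabilities

    expected = sum(pi[m] * phi[m] for m in phi)   # expected pattern match rate
    psi_a = phi["a"] - expected                   # psi(s, a): observed minus expected

    print(expected)   # [0.8 0.7 0.3 0.2]
    print(psi_a)      # [ 0.2  0.3 -0.3 -0.2], matching the example above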


Hope this helps.
-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
IMO other people's equations/code/ideas/papers always seem smarter  
than your own. The stuff you understand and do yourself just seems  
like common sense, and the stuff you don't always has a mystical air  
of complexity, at least until you understand it too :-)


On 30-Apr-09, at 1:59 PM, Michael Williams wrote:


I wish I was smart   :(


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Yamato
David Silver wrote:
Yes, in our experiments they were just constant numbers M=N=100.

If M and N are the same, is there any reason to run M simulations and
N simulations separately?  What happens if you combine them and calculate
V and g in the single loop?

Okay, let's continue the example above. Let's say that in position s,  
using the current theta, moves a, b and c will be selected with  
probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was  
actually selected. Now consider pattern 1, this is matched after a.  
But the probability of matching was (0.5*1 +0.3*1 +0.2*0) = 0.8. So  
psi_1(s,a)=1-0.8 = 0.2. For pattern 2, psi_2(s,a)=1- 
(0.5*1+0.3*0+0.2*1)=0.3, etc.. So in this example the vector psi(s,a)  
= [0.2,0.3,-0.3,-0.2].

In other words, psi tells us whether each pattern was actually matched  
more or less than we could have expected.

I understood what psi was. I am not sure how it works, but anyway I can
see your algorithm now. Thanks.

--
Yamato
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver

Hi Yamato,


Could you give us the source code which you used?  Your algorithm is
too complicated, so it would be very helpful if possible.


Actually I think the source code would be much harder to understand!  
It is written inside RLGO, and makes use of a substantial existing  
framework that would take a lot of effort to understand. (On a  
separate note I am considering making RLGO open source at some point,  
but I'd prefer to spend some effort cleaning it up before making it  
public).


But I think maybe Algorithm 1 is much easier than you think:

A: Estimate value V* of every position in a training set, using deep  
rollouts.


B: Repeat, for each position in the training set
1. Run M simulations, estimate value of position (call this V)
	2. Run another N simulations, average the value of psi(s,a) over all  
positions and moves in games that black won (call this g)

3. Adjust parameters by alpha * (V* - V) * g

The feature vector is the set of patterns you use, with value 1 if a  
pattern is matched and 0 otherwise. The simulation policy selects  
actions in proportion to the exponentiated, weighted sum of all  
matching patterns. For example let's say move a matches patterns 1 and  
2, move b matches patterns 1 and 3, and move c matches patterns 2 and  
4. Then move a would be selected with probability e^(theta1 +  
theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 +  
theta4)). The theta values are the weights on the patterns which we  
would like to learn. They are the log of the Elo ratings in Remi  
Coulom's approach.
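
A few lines of Python just to pin that selection rule down (the pattern-index
layout and the weights below are made up for the example):

    import numpy as np

    def softmax_policy(theta, move_patterns):
        # score each move by exp(sum of the weights of its matching patterns),
        # then normalise the scores into a probability distribution
        scores = {m: np.exp(sum(theta[p] for p in pats))
                  for m, pats in move_patterns.items()}
        total = sum(scores.values())
        return {m: s / total for m, s in scores.items()}

    # the example above: a matches patterns 1,2; b matches 1,3; c matches 2,4
    theta = {1: 0.5, 2: 0.1, 3: -0.2, 4: 0.0}     # made-up pattern weights
    print(softmax_policy(theta, {"a": (1, 2), "b": (1, 3), "c": (2, 4)}))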


The only tricky part is computing the vector psi(s,a). Each component  
of psi(s,a) corresponds to a particular pattern, and is the difference  
between the observed feature (i.e. whether the pattern actually  
occurred after move a in position s) and the expected feature (the  
average value of the pattern, weighted by the probability of selecting  
each action).


It's also very important to be careful about signs and the colour to  
play - it's easy to make a mistake and follow the gradient in the  
wrong direction.


Is that any clearer?
-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread Michael Williams

David Silver wrote:

Hi Michael,

But one thing confuses me: You are using the value from Fuego's 10k 
simulations as an approximation of the actual value of the position.  
But isn't the actual
value of the position either a win or a loss?  On such small boards, 
can't you assume that Fuego is able to correctly determin who is 
winning and round it's
evaluation to the nearest win/loss?  i.e. if it evaluates the position 
to 0.674, that gets rounded to 1.  If such an assumption about Fuego's 
ability to read
the position on a small board is valid, then it should improve the 
results of your balanced simulation strategy, right?  Or am I missing 
something?


It's true that 5x5 Go is solved, so in principle we could have used the 
true minimax values. However we chose to use an approach that can scale 
to larger boards, which means that we should treat the expert 
evaluations as approximate. And in fact Fuego was not always accurate on 
6x6 boards, as we used only 10k simulations in our training set.


Also, I think it really helps to have soft rather than hard expert 
evaluations. We want a simulation policy that helps differentiate e.g. a 
90% winning position from an 85% winning position. Rounding all the 
expert evaluations to 0 or 1 would lose much of this advantage.


-Dave


By this argument (your last paragraph), you need to do some magical
number of simulations for the training data.  Not enough simulations
and you have too much noise.  And infinite simulations give you hard
0 or 1 results.  But I can't argue with your results.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Re: [computer go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver

Hi Remi,


What komi did you use for 5x5 and 6x6 ?

I used 7.5 komi for both board sizes.


I find it strange that you get only 70 Elo points from supervised
learning over uniform random. Don't you have any feature for atari
extension ? This one alone should improve strength immensely (extend
string in atari because of the previous move).
Actually no. The features are very simple, and know how to capture but  
not how to defend ataris. I'm sure that a better set of features could  
improve by more than 70 Elo, but I expect we would still see a benefit  
to balancing the weights correctly. For example, the Fuego policy  
defends ataris and follows several other common-sense rules, but the  
results in 5x5 and 6x6 show that it is not well balanced on small  
boards.


Let us extrapolate: I got more than 200 Elo points of improvements from
my patterns in 9x9 over uniform random (I never really measured that, it
may be even more than 200).

I guess you got more than 200 Elo on 9x9. In Mogo (Gelly et al. 2006)
the improvement from uniform random was at least 400 Elo, and I think
your simulation policy is probably at least as strong.


By the way, I was sad to hear you're not working on Crazystone any
more. Is it because you are busy with other projects?



So maybe I could get 600 more Elo points
with your method. And even more on 19x19.
I noticed that, in general, changes in the playout policy have a much
bigger impact on larger boards than on smaller boards.


As to whether we can extrapolate, I hope so :-)
I share the same feeling that improving the simulation policy will be  
more impactful on bigger boards with longer simulations.
On the other hand I've been surprised by many things in MC Go, so  
nothing is certain until we try it!


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread Yamato
David Silver wrote:
A: Estimate value V* of every position in a training set, using deep  
rollouts.

B: Repeat, for each position in the training set
   1. Run M simulations, estimate value of position (call this V)
   2. Run another N simulations, average the value of psi(s,a) over all  
positions and moves in games that black won (call this g)
   3. Adjust parameters by alpha * (V* - V) * g

Thanks for the detailed explanation.
M, N and alpha are constant numbers, right?  What did you set them to?

The feature vector is the set of patterns you use, with value 1 if a  
pattern is matched and 0 otherwise. The simulation policy selects  
actions in proportion to the exponentiated, weighted sum of all  
matching patterns. For example let's say move a matches patterns 1 and  
2, move b matches patterns 1 and 3, and move c matches patterns 2 and  
4. Then move a would be selected with probability e^(theta1 +  
theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 +  
theta4)). The theta values are the weights on the patterns which we  
would like to learn. They are the log of the Elo ratings in Remi  
Coulom's approach.

OK, I guess it is the formula 5 in the paper.

The only tricky part is computing the vector psi(s,a). Each component  
of psi(s,a) corresponds to a particular pattern, and is the difference  
between the observed feature (i.e. whether the pattern actually  
occurred after move a in position s) and the expected feature (the  
average value of the pattern, weighted by the probability of selecting  
each action).

I still don't understand this. Is it the formula 6?
Could you please give me an example like the above?

--
Yamato
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Rémi Coulom

David Silver wrote:

I don't think the 200+ Elo improvement is so impressive


I agree that it would be much more impressive to report positive 
results on larger boards. But perhaps it is already interesting that 
tuning the balance of the simulation policy can make a big difference 
on small boards? Also, larger boards mean longer simulations, and so 
the importance of simulation balancing should become even more 
exaggerated. 


What komi did you use for 5x5 and 6x6 ?

I find it strange that you get only 70 Elo points from supervised 
learning over uniform random. Don't you have any feature for atari 
extension ? This one alone should improve strength immensely (extend 
string in atari because of the previous move).


Let us extrapolate: I got more than 200 Elo points of improvements from 
my patterns in 9x9 over uniform random (I never really measured that, it 
may be even more than 200). So maybe I could get 600 more Elo points 
with your method. And even more on 19x19.


I noticed that, in general, changes in the playout policy have a much 
bigger impact on larger boards than on smaller boards.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Ivan Dubois
I noticed that, in general, changes in the playout policy have a much 
bigger impact on larger boards than on smaller boards.


Rémi


I think rating differences are amplified on larger boards. This is easy to
see if you think about it this way:


Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that I
would call 4-Go where instead of playing one game, you play simultaneously 4
games and determine the winner by calculating the sum of the scores of the
four games. Certainly rating differences would be bigger with 4-Go than with
Go (given the same two players). This explains why rating differences are
bigger on 19x19 than 9x9.


Ivan 



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Don Dailey
A simplistic model that helps explain this is golf.   On a single hole, even
a casual golfer has a realistic chance of out-golfing Tiger Woods.  Tiger
occasionally shoots a 1 over par on some hole and even weak amateurs
occasionally par or even birdie a hole.It's not going to happen a lot,
but it's not ridiculous either.   Years ago I taught a player how to golf,
and on his third time out with me,  he hit a hole in one on a short par
3. If Tiger Woods had been playing with us, he would have lost that hole
to this beginner.

But in a 9 hole match,  the odds go down enormously - for all practical
purposes there is no chance.

I kind of think of GO like that, even though it's a pretty simplistic
model.   Each move is like a hole of golf,  it can be a good shot or a bad
one. With GO, however, probably a LOT of your moves are just as good as
the moves of a good player.   But it's the ones that fall short, that kill
you.

Go on a big board is like 18 holes of golf  compared to just 1 or 2 holes of
golf.   The better player is far more likely to win the 18 hole match than
the 1 hole match.

- Don





On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr wrote:

 I noticed that, in general, changes in the playout policy have a much
 bigger impact on larger boards than on smaller boards.

 Rémi


 I think rating differences are emplified on larger boards. This is easy to
 see if you think about it this way :

 Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that I
 would call 4-Go where instead of playing one game, you play simultenously 4
 games and determine the winner by calculating the sum of the scores of the
 four games. Certainly rating differences would be bigger with 4-go than with
 go (given the same two players). This explains why rating differences are
 bigger on 19x19 than 9x9.

 Ivan

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread steve uurtamo
also, i'm not sure that a lot of most amateurs' moves are very
good.  the spectrum of bad moves is wide, it's just that it takes
someone many stones stronger to severely punish small differences
between good and nearly-good moves.  among players of relatively
similar strength, these differences will go unnoticed and unpunished.

s.

2009/4/28 Don Dailey dailey@gmail.com:
 A simplistic model that helps explain this is golf.   On a single hole, even
 a casual golfer has a realistic chance of out-golfing Tiger Woods.  Tiger
 occasionally shoots a 1 over par on some hole and even weak amateurs
 occasionally par or even birdie a hole.    It's not going to happen a lot,
 but it's not ridiculous either.   Years ago I taught a player how to golf,
 and on his third time out with me,  he hit a hole in one on a short par
 3. If Tiger Woods had been playing with us, he would have lost that hole
 to this beginner.

 But in a 9 hole match,  the odds go down enormously - for all practical
 purposes there is no chance.

 I kind of think of GO like that, even though it's a pretty simplistic
 model.   Each move is like a hole of golf,  it can be a good shot or a bad
 one. With GO, however, probably a LOT of your moves are just as good as
 the moves of a good player.   But it's the ones that fall short, that kill
 you.

 Go on a big board is like 18 holes of golf  compared to just 1 or 2 holes of
 golf.   The better player is far more likely to win the 18 hole match than
 the 1 hole match.

 - Don





 On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr wrote:

 I noticed that, in general, changes in the playout policy have a much
 bigger impact on larger boards than on smaller boards.

 Rémi

 I think rating differences are emplified on larger boards. This is easy to
 see if you think about it this way :

 Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that
 I would call 4-Go where instead of playing one game, you play simultenously
 4 games and determine the winner by calculating the sum of the scores of the
 four games. Certainly rating differences would be bigger with 4-go than with
 go (given the same two players). This explains why rating differences are
 bigger on 19x19 than 9x9.

 Ivan

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread terry mcintyre
A fair number of amateur moves are exactly what a pro would use; this is no 
great surprise, since the moves are actually copied from professional games. 
The weaker player can blindly emulate, but does not know why a pro plays A 
instead of B or C; does not know how to punish slack moves.

I've been taking some classes from an 8d pro. Sadly, I'm at the bottom of the 
class. The top players in these classes are 8d amateurs; the pro can give them 
3 stones and win. When going over their games, many of their moves are quite 
respectable, but the pro is able to find mistakes which can be exploited with 
devastating effect. You can't afford to be a little bit off against a pro, 
but weaker players find it hard to exploit slack moves.

 
When explaining his moves, the pro often uses very deep reading. We've all seen 
simple cases, where move A is advisable if a ladder works, otherwise B is 
recommended. The pro knows six moves in advance that he'll need to read that 
ladder, before the ladder is actually on the board. These are the simplest 
expositions I've seen; things get more complex from there. 
Terry McIntyre terrymcint...@yahoo.com


Government is an association of men who do violence to the rest of us.
- Leo Tolstoy





From: steve uurtamo uurt...@gmail.com
To: computer-go computer-go@computer-go.org
Sent: Tuesday, April 28, 2009 5:09:20 PM
Subject: Re: [computer-go] Monte-Carlo Simulation Balancing

also, i'm not sure that a lot of most amateurs' moves are very
good.  the spectrum of bad moves is wide, it's just that it takes
someone many stones stronger to severely punish small differences
between good and nearly-good moves.  among players of relatively
similar strength, these differences will go unnoticed and unpunished.

s.

2009/4/28 Don Dailey dailey@gmail.com:
 A simplistic model that helps explain this is golf.   On a single hole, even
 a casual golfer has a realistic chance of out-golfing Tiger Woods.  Tiger
 occasionally shoots a 1 over par on some hole and even weak amateurs
 occasionally par or even birdie a hole.It's not going to happen a lot,
 but it's not ridiculous either.   Years ago I taught a player how to golf,
 and on his third time out with me,  he hit a hole in one on a short par
 3. If Tiger Woods had been playing with us, he would have lost that hole
 to this beginner.

 But in a 9 hole match,  the odds go down enormously - for all practical
 purposes there is no chance.

 I kind of think of GO like that, even though it's a pretty simplistic
 model.   Each move is like a hole of golf,  it can be a good shot or a bad
 one. With GO, however, probably a LOT of your moves are just as good as
 the moves of a good player.   But it's the ones that fall short, that kill
 you.

 Go on a big board is like 18 holes of golf  compared to just 1 or 2 holes of
 golf.   The better player is far more likely to win the 18 hole match than
 the 1 hole match.

 - Don





 On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr wrote:

 I noticed that, in general, changes in the playout policy have a much
 bigger impact on larger boards than on smaller boards.

 Rémi

 I think rating differences are emplified on larger boards. This is easy to
 see if you think about it this way :

 Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that
 I would call 4-Go where instead of playing one game, you play simultenously
 4 games and determine the winner by calculating the sum of the scores of the
 four games. Certainly rating differences would be bigger with 4-go than with
 go (given the same two players). This explains why rating differences are
 bigger on 19x19 than 9x9.

 Ivan

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/



  ___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Yamato
David Silver wrote:
 because the previous approaches were not optimized for such a small  
 boards.

I'm not sure what you mean here? The supervised learning and  
reinforcement learning approaches that we compared against are both  
trained on the small boards, i.e. the pattern weights are specifically  
optimised for that size of board.

Ah, sorry. I missed it.

I agree that the handcrafted policy from Fuego was not optimised for  
small boards, which is why it performed poorly. But perhaps this is  
also interesting, i.e. it suggests that a handcrafted policy for 9x9  
Go may be significantly suboptimal when used in 19x19 Go. So  
automatically learning a simulation policy may ultimately prove to be  
very beneficial.

It is surprising that Fuego was far worse than uniform random on 5x5 and 6x6.
These results show how particular the very small boards are.
I suppose 5x5 is very different from 9x9, but 19x19 is not so different from 9x9.

 I'm looking forward to your results on larger boards.

Me too :-)
Coming soon, will let you know how it goes.
But I hope that others will try these ideas too, it's always much  
better to compare multiple implementations of the same algorithm.

Could you give us the source code which you used?  Your algorithm is
too complicated, so it would be very helpful if possible.

--
Yamato
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Don Dailey
But I'm only trying to make a point, not pin the analogy down perfectly.
Naturally the stronger the player, the more likely his moves will conform to
the level of the top players.

The basic principle is that the longer the contest, the more opportunities a
strong player has to demonstrate his superiority over an inferior player
and/or recover from mistakes or surprises.In my golf analogy,   my
friend hits a hole in one.  Let's say it was on the first hole of 9 against
Tiger Woods.   Tiger likely would have hit it in 2 strokes,  and my beginner
friend would be ahead by 1 stroke!But that 1 stroke lead would almost
certainly disappear by the time the next hole was completed, as my friend
was typically taking 7 or 8 shots on that par 3 course!

Another way to look at this is that a random legal move generator has a
chance to play a perfect game, although that chance is almost infinitesimal.
  But even that small chance can be an order of magnitude larger if you can
shorten the contest even by 1 or 2 moves.

- Don


On Tue, Apr 28, 2009 at 8:09 PM, steve uurtamo uurt...@gmail.com wrote:

 also, i'm not sure that a lot of most amateurs' moves are very
 good.  the spectrum of bad moves is wide, it's just that it takes
 someone many stones stronger to severely punish small differences
 between good and nearly-good moves.  among players of relatively
 similar strength, these differences will go unnoticed and unpunished.

 s.

 2009/4/28 Don Dailey dailey@gmail.com:
  A simplistic model that helps explain this is golf.   On a single hole,
 even
  a casual golfer has a realistic chance of out-golfing Tiger Woods.  Tiger
  occasionally shoots a 1 over par on some hole and even weak amateurs
  occasionally par or even birdie a hole.It's not going to happen a
 lot,
  but it's not ridiculous either.   Years ago I taught a player how to
 golf,
  and on his third time out with me,  he hit a hole in one on a short par
  3. If Tiger Woods had been playing with us, he would have lost that
 hole
  to this beginner.
 
  But in a 9 hole match,  the odds go down enormously - for all practical
  purposes there is no chance.
 
  I kind of think of GO like that, even though it's a pretty simplistic
  model.   Each move is like a hole of golf,  it can be a good shot or a
 bad
  one. With GO, however, probably a LOT of your moves are just as good
 as
  the moves of a good player.   But it's the ones that fall short, that
 kill
  you.
 
  Go on a big board is like 18 holes of golf  compared to just 1 or 2 holes
 of
  golf.   The better player is far more likely to win the 18 hole match
 than
  the 1 hole match.
 
  - Don
 
 
 
 
 
  On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr
 wrote:
 
  I noticed that, in general, changes in the playout policy have a much
  bigger impact on larger boards than on smaller boards.
 
  Rémi
 
  I think rating differences are emplified on larger boards. This is easy
 to
  see if you think about it this way :
 
  Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game
 that
  I would call 4-Go where instead of playing one game, you play
 simultenously
  4 games and determine the winner by calculating the sum of the scores of
 the
  four games. Certainly rating differences would be bigger with 4-go than
 with
  go (given the same two players). This explains why rating differences
 are
  bigger on 19x19 than 9x9.
 
  Ivan
 
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Michael Williams

Finally!  I guess you can add this technique to your list, Lukasz.


David Silver wrote:

Hi everyone,

Please find attached my ICML paper with Gerry Tesauro on automatically 
learning a simulation policy for Monte-Carlo Go. Our preliminary results 
show a 200+ Elo improvement over previous approaches, although our 
experiments were restricted to simple Monte-Carlo search with no tree on 
small boards.





Abstract

In this paper we introduce the first algorithms for efficiently learning 
a simulation policy for Monte-Carlo search. Our main idea is to optimise 
the balance of a simulation policy, so that an accurate spread of 
simulation outcomes is maintained, rather than optimising the direct 
strength of the simulation policy. We develop two algorithms for 
balancing a simulation policy by gradient descent. The first algorithm 
optimises the balance of complete simulations, using a policy gradient 
algorithm; whereas the second algorithm optimises the balance over every 
two steps of simulation. We compare our algorithms to reinforcement 
learning and supervised learning algorithms for maximising the strength 
of the simulation policy. We test each algorithm in the domain of 5x5 
and 6x6 Computer Go, using a softmax policy that is parameterised by 
weights for a hundred simple patterns. When used in a simple Monte-Carlo 
search, the policies learnt by simulation balancing achieved 
significantly better performance, with half the mean squared error of a 
uniform random policy, and equal overall performance to a sophisticated 
Go engine.


-Dave




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Rémi Coulom

David Silver wrote:

Hi everyone,

Please find attached my ICML paper with Gerry Tesauro on automatically 
learning a simulation policy for Monte-Carlo Go. Our preliminary 
results show a 200+ Elo improvement over previous approaches, although 
our experiments were restricted to simple Monte-Carlo search with no 
tree on small boards.


Very interesting, thanks.

If I understand correctly, your method makes your program 250 Elo points 
stronger than my pattern-learning algorithm on 5x5 and 6x6, by just 
learning better weights. That is impressive. I had stopped working on my 
program for a very long time, but maybe I will go back to work and try 
your algorithm.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Michael Williams

My favorite part:
One natural idea is to use the learned simulation policy in Monte-Carlo search, and 
generate new deep search values, in an iterative cycle.

But one thing confuses me: You are using the value from Fuego's 10k simulations as an approximation of the actual value of the position.  But isn't the actual 
value of the position either a win or a loss?  On such small boards, can't you assume that Fuego is able to correctly determine who is winning and round its 
evaluation to the nearest win/loss?  i.e. if it evaluates the position to 0.674, that gets rounded to 1.  If such an assumption about Fuego's ability to read 
the position on a small board is valid, then it should improve the results of your balanced simulation strategy, right?  Or am I missing something?





David Silver wrote:

Hi everyone,

Please find attached my ICML paper with Gerry Tesauro on automatically 
learning a simulation policy for Monte-Carlo Go. Our preliminary results 
show a 200+ Elo improvement over previous approaches, although our 
experiments were restricted to simple Monte-Carlo search with no tree on 
small boards.





Abstract

In this paper we introduce the first algorithms for efficiently learning 
a simulation policy for Monte-Carlo search. Our main idea is to optimise 
the balance of a simulation policy, so that an accurate spread of 
simulation outcomes is maintained, rather than optimising the direct 
strength of the simulation policy. We develop two algorithms for 
balancing a simulation policy by gradient descent. The first algorithm 
optimises the balance of complete simulations, using a policy gradient 
algorithm; whereas the second algorithm optimises the balance over every 
two steps of simulation. We compare our algorithms to reinforcement 
learning and supervised learning algorithms for maximising the strength 
of the simulation policy. We test each algorithm in the domain of 5x5 
and 6x6 Computer Go, using a softmax policy that is parameterised by 
weights for a hundred simple patterns. When used in a simple Monte-Carlo 
search, the policies learnt by simulation balancing achieved 
significantly better performance, with half the mean squared error of a 
uniform random policy, and equal overall performance to a sophisticated 
Go engine.


-Dave




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Yamato
David Silver wrote:
Please find attached my ICML paper with Gerry Tesauro on automatically  
learning a simulation policy for Monte-Carlo Go. Our preliminary  
results show a 200+ Elo improvement over previous approaches, although  
our experiments were restricted to simple Monte-Carlo search with no  
tree on small boards.

I like your idea, but why do you use only 5x5 and 6x6 Go?  I don't think
the 200+ Elo improvement is so impressive because the previous approaches
were not optimized for such small boards. I'm looking forward to your
results on larger boards.

--
Yamato
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-07 Thread dhillismail
Finally got a window of time. It's called the epsilon trick. Here is a cut 
and paste of Lukasz' explanation from the archives, including how to calculate 
N.
- Dave Hillis

--

-Original Message-
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Sun, 14 Jan 2007 4:50 AM
Subject: [computer-go] Fast UCT (epsilon trick)


Hi 
I've looked back into the article and I see that it is not trivial to
understand how the epsilon trick (ET) works. Therefore I will
describe it here using UCT as an example.

The bottleneck of the UCT algorithm is choosing a move. The program has to
iterate over all children, evaluating their UCB values. We want to execute
this loop rarely.
One way to do it is to group playouts together, i.e. play 2 or more MC
simulations per tree traversal. But this is not good enough, as it doesn't
distribute the playouts over the tree and some parts will not get enough of them.

The ET works as follows:
With each node of the tree the program keeps additional information:
- child_to_visit
- visits_to_go

When we visit a node,
if node->visits_to_go > 0 then
    node->visits_to_go = node->visits_to_go - 1
    node = node->child_to_visit

So we have a predetermined strategy to go to a particular child visits_to_go times.

But
if node->visits_to_go == 0 then  // we have to set the strategy first
    node->child_to_visit = UCT_find_child(node)
    node->visits_to_go = Epsilon * node->visited_count  // round up

As we can see, when Epsilon is small or when a particular node has not been
visited many times, UCT-ET is identical to UCT.

But when a node has been visited more times (near the root, far from the leaves),
UCT-ET may save some time and go to the precalculated child.

That's it. 
I hope You like it :) 
Łukasz Lew 

PS. 

In Df-PN, a good Epsilon was 1/4; I believe that here it will be much
smaller, 1/64 maybe.
Or one may even change the Epsilon equation to:
node->visits_to_go = Epsilon * sqrt(node->visited_count)

If someone will do some experiments, I'm eager to hear the results. 

PS2 

float InvSqrt (float x) { 
  float xhalf = 0.5f * x; 
  int i = *(int*)&x; 
  i = 0x5f3759df - (i >> 1); 
  x = *(float*)&i; 
  x = x * (1.5f - xhalf * x * x); 
  return x; 
} 

;-) 

___
--


-Original Message-
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Wed, 3 Dec 2008 12:56 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


Hmm.. it could be that N is picked randomly each time... now I can't seem to 
find the description and my memory is not serving me well here. Perhaps someone 
else remembers? I'll try to track it down.

- Dave Hillis


-Original Message-
From: Michael Williams [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Wed, 3 Dec 2008 12:14 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference b
ot


That's going to repeat the same exact path through the tree three times, isn't 
it? If so, it seems like it would be more efficient to do N playouts from the 
leaf after each traversal of the tree. 
 
[EMAIL PROTECTED] wrote: 
 There is another speedup trick that might interest you. IIRC Lukasz Lew  
 came up with it, but I forget what he called it. After calculating the  
 selected move for an internal node (going through UCT and RAVE and  whatnot) 
 store that move. Then, for the next N visits to that node  (where N is 5 or 
 10 or so), just repeat that move without having to  calculate what the move 
 would be. 
  - Dave Hillis 
  -Original Message- 
 From: Mark Boon [EMAIL PROTECTED] 
 To: computer-go computer-go@computer-go.org 
 Sent: Tue, 2 Dec 2008 11:17 pm 
 Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot 
  I have made some minor performance improvements and this is as far as I  
  intend to take this particular project. I might make some small changes  
  if necessary, but most likely I'll leave this largely unchanged from now.  
   I had set myself as an arbitrary goal that it should do at least 20K  
  playouts. But with real liberties, AMAF and a RAVE formula I got stuck  in 
  the 16K-17K range. According to my profiler that is mostly due to the  
  expensive formula used to compare nodes, where it says it spends 25% of  
total time. The formula I'm using is:   beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)   beta = 1 - 
log(parent-visits) / 20  UCT = exploration-factor *sqrt( log(parent-visits) / 
(nr-visits+1) )  RAVE = exploration-factor *sqrt( log(parent-visits) /  
(nr-virtual-visits+1) )   There are probably quite a few possibilities still 
to tune this program with regards to playing strength and performance. But I 
felt it doesn't help to obscure the implementation by too many details.   
The implementation of the search algorithm was entirely  game-independent, 
until I introduced AMAF. I didn't see how to fix that,  as there's no way 
getting around

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Sylvain Gelly
What I did (was a long time ago, I don't know if it is still used in
Mogo), is to compute the m best moves every so often and most of the
time just do the max over those m moves.
m was on the order of 5, and every so often was an increasing
function like sqrt or log (I don't remember).
That speeds up quite a lot (when the max becomes the bottleneck), and
if you tune the parameters well you almost don't lose any strength at
a fixed number of playouts.
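
(A rough Java sketch of how I read this - all names are invented, and the
schedule for "every so often" is just one possible choice:)

import java.util.*;

class MBestSelection {
    static final int M = 5;

    static class Node {
        List<Node> children = new ArrayList<>();
        double score;                  // stand-in for the full UCT/RAVE value, updated elsewhere
        int visits, visitsSinceScan;
        List<Node> mBest;
    }

    static Node select(Node node) {
        // full pass over all children only every ~sqrt(visits) visits
        if (node.mBest == null || node.visitsSinceScan >= Math.sqrt(node.visits)) {
            node.children.sort((a, b) -> Double.compare(b.score, a.score));
            node.mBest = new ArrayList<>(node.children.subList(0, Math.min(M, node.children.size())));
            node.visitsSinceScan = 0;
        }
        node.visits++;
        node.visitsSinceScan++;
        // cheap pass: max over the cached m children only
        return Collections.max(node.mBest, Comparator.comparingDouble(n -> n.score));
    }
}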

Sylvain

2008/12/3  [EMAIL PROTECTED]:
 There is another speedup trick that might interest you. IIRC Lukasz Lew came
 up with it, but I forget what he called it. After calculating the selected
 move for an internal node (going through UCT and RAVE and whatnot) store
 that move. Then, for the next N visits to that node (where N is 5 or 10 or
 so), just repeat that move without having to calculate what the move would
 be.

 - Dave Hillis

 -Original Message-
 From: Mark Boon [EMAIL PROTECTED]
 To: computer-go computer-go@computer-go.org
 Sent: Tue, 2 Dec 2008 11:17 pm
 Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot

 I have made some minor performance improvements and this is as far as I
 intend to take this particular project. I might make some small changes if
 necessary, but most likely I'll leave this largely unchanged from now.

 I had set myself as an arbitrary goal that it should do at least 20K
 playouts. But with real liberties, AMAF and a RAVE formula I got stuck in
 the 16K-17K range. According to my profiler that is mostly due to the
 expensive formula used to compare nodes, where it says it spends 25% of
 total time. The formula I'm using is:

   beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

   beta = 1 - log(parent-visits) / 20
   UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )
   RAVE = exploration-factor *sqrt( log(parent-visits) /
 (nr-virtual-visits+1) )

 There are probably quite a few possibilities still to tune this program with
 regards to playing strength and performance. But I felt it doesn't help to
 obscure the implementation by too many details.

 The implementation of the search algorithm was entirely game-independent,
 until I introduced AMAF. I didn't see how to fix that, as there's no way
 getting around that it's based on the fact that a move is represented by a
 single coordinate, which is mapped to an array to store the statistical
 values. But strip the AMAF part, which is a block of 30 odd lines of code,
 and I think it can be used for other games basically as-is. I did this not
 because I ever see myself program another game, but because in my experience
 in doing so I get a cleaner separation between modules.

 At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF
 refbot. It wins only about 33%. But that's probably because the 1,000-2,000
 playouts range is the sweet-spot for that particular type of playing engine.
 It doesn't scale from there, whereas the MCTS ref-bot only just gets warmed
 up with 2,000 playouts.

 This leads me to a question. I suppose it might be of some interest to put
 this bot up on CGOS. But what parameters to use? The main one being the
 number of playouts, naturally.

 Mark
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Vlad Dumitrescu
On Wed, Dec 3, 2008 at 05:17, Mark Boon [EMAIL PROTECTED] wrote:
 I had set myself as an arbitrary goal that it should do at least 20K
 playouts. But with real liberties, AMAF and a RAVE formula I got stuck in
 the 16K-17K range. According to my profiler that is mostly due to the
 expensive formula used to compare nodes, where it says it spends 25% of
 total time. The formula I'm using is:

beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

beta = 1 - log(parent-visits) / 20
UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )
RAVE = exploration-factor *sqrt( log(parent-visits) /
 (nr-virtual-visits+1) )

Hi,

This is probably something you already do, but just in case you don't,
here comes a simple optimization trick: since parent-visits is an
integer value in a somewhat bounded range (0..max_playouts), the
logarithm can be precomputed in a table. Then even sqrt(log()) can be
precomputed. Similarly, a table of sqrt(integer) can be used
for the other values.
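
(A tiny Java sketch of the table idea, assuming you know an upper bound on the
number of playouts - the bound and the names are made up:)

class LogTables {
    static final int MAX = 200000;                     // assumed upper bound on parent-visits
    static final double[] SQRT_LOG = new double[MAX + 1];
    static {
        for (int n = 1; n <= MAX; n++) SQRT_LOG[n] = Math.sqrt(Math.log(n));
    }
    static double sqrtLog(int parentVisits) {
        return SQRT_LOG[Math.min(parentVisits, MAX)];  // lookup instead of sqrt(log(n)) every time
    }
}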

best regards,
Vlad
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Mark Boon


On 3-dec-08, at 06:09, Sylvain Gelly wrote:


What I did (was a long time ago, I don't know if it is still used in
Mogo), is to compute the m best moves every so often and most of the
time just do the max over those m moves.
m was on the order of 5, and every so often was an increasing
function like sqrt or log (I don't remember).
That speeds up quite a lot (when the max becomes the bottleneck), and
if you tune the parameters well you almost don't loose any strength at
a fixed number of playouts.



I thought about that, but I was afraid the code would become too  
obscure. After all, this is supposed to be a reference  
implementation. But maybe I should actually give it a try to see what  
it would look like.


I also played with precomputing a table of the log() function but  
found little speedup. Instead I decided to (p)recompute the log(nr- 
parent-visits) only once in 8 times. According to my profiler that  
reduced the log() from using 8% to 1% of computation time. Another  
thing I did was replace log(nr-virtual-parent-visits) in the RAVE  
calculation by the same log(nr-parent-visits) used in the UCT  
calculation, cutting both the number of times I need to call log in  
half and reducing the amount of storage in the node a little. After  
these modifications I didn't notice any degradation in play.
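
(In code the trick might look something like this - the field names are mine,
not the actual refbot's:)

class NodeStats {
    int visits;
    double cachedLog;                     // log of the visit count, refreshed only every 8th visit

    double logParentVisits() {
        if ((visits & 7) == 0) cachedLog = Math.log(visits + 1);
        return cachedLog;                 // reused for both the UCT and the RAVE term
    }
}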


In terms of memory use, I did a few obvious things to keep storage of  
the nodes to a minimum. I had seen people write about memory usage of  
the tree, but never understood the concerns. But that was because I  
always expanded the tree one node at a time. Of course, if you create  
all the children in one go like this reference implementation does,  
it's a little bit a different story. But still, I have to push my  
computer hard to run out of space before I run out of time so I  
didn't see much need to reduce memory too aggressively. Using just a  
moderately small number of playouts before expansion is already  
enough to never have memory problems. But if memory is a concern to  
anyone I see two obvious ways to make significant reductions.  
One is to flatten the data-structure, reducing the object overhead  
Java imposes. That could save up to a third.  The other is to use  
smaller placeholders for nodes that have been created, but not yet  
visited. Once a node gets visited it can be replaced by a full-blown  
node, but until then all you need is the move-coordinate and the AMAF  
statistics. That should save up to 75% or so.
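
(A sketch of that second idea - names invented: unexpanded children carry only
the move and the AMAF statistics, and are promoted to full nodes on their first
real visit.)

class ChildStub {
    short move;                  // board coordinate of the move
    int amafWins, amafVisits;    // AMAF statistics gathered before the node is expanded
}

class TreeNode extends ChildStub {
    int wins, visits;            // real playout statistics
    ChildStub[] children;        // mostly stubs; each is replaced by a TreeNode once visited
}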


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Heikki Levanto
On Wed, Dec 03, 2008 at 09:55:07AM -0200, Mark Boon wrote:
 I thought about that, but I was afraid the code would become too  
 obscure. After all, this is supposed to be a reference  
 implementation. But maybe I should actually give it a try to see what  
 it would look like.

I agree that the reference implementation should be maximally clear code.
Performance is not all that important here, correctness is.


Having said that, I can't help responding to one detail:

 I had seen people write about memory usage of  the tree, but never
 understood the concerns. 

One thing to remember is that more memory use means more cache misses, and
more access to the main memory. On modern computers, those can cost as much
as executing a thousand instructions! So memory optimizing can often pay off,
also with respect to time!

Of course, Java does a lot of memory management behind the scenes, so
optimizing such can be harder... But when writing in C or C++, it does make
sense.

  

  -H

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Mark Boon


On 3-dec-08, at 10:31, Heikki Levanto wrote:


Having said that, I can't help responding to one detail:


I had seen people write about memory usage of  the tree, but never
understood the concerns.


One thing to remember is that more memory use means more cache  
misses, and
more access to the main memory. On modern computers, those can cost  
as much
as executing a thousand instructions! So memory optimizing can  
often pay off,

also with respect to time!

Of course, Java does a lot of memory management behind the scenes, so
optimizing such can be harder... But when writing in C or C++, it  
does make

sense.



I've seen this stated often. But color me skeptical. If what you say  
is true, then the MC-AMAF bot, which hardly uses any memory at all  
and which fits in the secondary cache entirely, code and data  
combined, should show an enormous speed-up in terms of number of  
playouts per second. Instead it does only about twice the number of  
playouts per second as the tree-search version, which can be almost  
if not entirely explained by the extra work that needs to be done to  
traverse and maintain the tree. If this is because I use Java, then  
Don's concise C implementation of the MC-AMAF bot should be a lot  
faster than my bloated Java version. Which, ahem, is not the case at  
all. I think a well-crafted C program can be up to twice as fast as a  
similar Java program. But I doubt that has much to do with cache misses.


I think in computer-Go, trying to control cache misses is futile no  
matter the language. Maybe you can do it for a relatively trivial,  
well-understood and stable sub-part. But then you risk losing the  
gain again as soon as it gets incorporated into a larger whole. So  
your efforts go to waste.


So this is in the show me category for me. You can use C or C++ if  
you like, but show me a tree-search bot that speeds up the number of  
playouts significantly by reducing the node-size. Maybe it's  
possible, but I don't believe it until I see it. This could be  
different for Chess programs, at least I've seen it claimed often.  
But I don't envy them, it must be excruciatingly frustrating to deal  
with.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Heikki Levanto
On Wed, Dec 03, 2008 at 11:24:22AM -0200, Mark Boon wrote:
 
 Heikki wrote:
 One thing to remember is that more memory use means more cache  misses,
 and more access to the main memory. On modern computers, those can cost
 as much as executing a thousand instructions! So memory optimizing can
 often pay off, also with respect to time!

 
 I've seen this stated often. But color me skeptical.

Maybe I was quoting common wisdom without having checked it personally. I
only remember one case - not related to go at all - where optimizing the
memory stuff made a noticeable difference.

I would also like to see hard evidence. But not badly enough to put in the
necessary work to get some ;-)

  - H

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Don Dailey
I had a chess program years ago that was blindingly fast on some
computers,  very slow on others.   It was all about the cache.  The move
generator was hard coded for each piece on each square.  For instance a
white pawn on d7 had its very own specialized move generator.
There was a function for each piece of each color on each square. 

By doing this I avoided the majority of conditional tests.  I did not
have to test for board edge, pawn on 7th, pawn on the second rank, etc.
All the logic was unrolled and I basically had to write a program to
write the move generator.  

The machine I developed this program on had a lot of cache and it was
not a problem.  But when running on, for instance a laptop,  it would
slow down enormously - something like 4 to 1 where other cache friendly
programs would slow down perhaps 25%.   

I don't keep up much with processor technologies now and I don't know
how much data vs instruction cache modern setups have, but I know it is
far better than it used to be.   I bet that old program would run well
on modern computers. 

One trick, which you are probably doing, is to keep data structures
together in memory.  For instance it is better to have children next to
each other so that you are not accessing memory all over the place,
perhaps incurring several cache misses instead of one. 


On Wed, 2008-12-03 at 11:24 -0200, Mark Boon wrote:
 On 3-dec-08, at 10:31, Heikki Levanto wrote:
 
  Having said that, I can't help responding to one detail:
 
  I had seen people write about memory usage of  the tree, but never
  understood the concerns.
 
  One thing to remember is that more memory use means more cache  
  misses, and
  more access to the main memory. On modern computers, those can cost  
  as much
  as executing a thousand instructions! So memory optimizing can  
  often pay off,
  also with respect to time!
 
  Of course, Java does a lot of memory management behind the scenes, so
  optimizing such can be harder... But when writing in C or C++, it  
  does make
  sense.
 
 
 I've seen this stated often. But color me skeptical. If what you say  
 is true, then the MC-AMAF bot, which hardly uses any memory at all  
 and which fits in the secondary cache entirely, code and data  
 combined, should show an enormous speed-up in terms of number of  
 playouts per second. Instead it does only about twice the number of  
 playouts per second as the tree-search version, which can be almost  
 if not entirely explained by the extra work that needs to be done to  
 traverse and maintain the tree. If this is because I use Java, then  
 Don's concise C implementation of the MC-AMAF bot should be a lot  
 faster than my bloated Java version. Which, ahem, is not the case at  
 all. I think a well-crafted C program can be up to twice as fast as a  
 similar Java program. But I doubt that has much to do with cache misses.
 
 I think in computer-Go, trying to control cache misses is futile no  
 matter the language. Maybe you can do it for a relatively trivial,  
 well-understood and stable sub-part. But then you risk losing the  
 gain again as soon as it gets incorporated into a larger whole. So  
 your efforts go to waste.
 
 So this is in the show me category for me. You can use C or C++ if  
 you like, but show me a tree-search bot that speeds up the number of  
 playouts significantly by reducing the node-size. Maybe it's  
 possible, but I don't believe it until I see it. This could be  
 different for Chess programs, at least I've seen it claimed often.  
 But I don't envy them, it must be excruciatingly frustrating to deal  
 with.
 
 Mark
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Don Dailey
On Wed, 2008-12-03 at 11:24 -0200, Mark Boon wrote:
 If this is because I use Java, then  
 Don's concise C implementation of the MC-AMAF bot should be a lot  
 faster than my bloated Java version.

I don't remember the numbers, but my own Java and C implementations were
written in the same style - same basic data structures, and just about the
only difference was the language.   I did do some kind of list trick using
pointers that you cannot do in Java, at least not easily and efficiently,
but I don't think it made that much of a difference.

The Java version was not that bad for speed compared to C; I think Java was
something like 2/3 of the speed of C, or close to that.    But if you got into
heavy duty data structures Java hurts - you won't get as big a tree in
Java for instance and I'm sure caching will be a bigger issue.


- Don



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread Mark Boon
I have made some minor performance improvements and this is as far as  
I intend to take this particular project. I might make some small  
changes if necessary, but most likely I'll leave this largely  
unchanged from now.


I had set myself as an arbitrary goal that it should do at least 20K  
playouts. But with real liberties, AMAF and a RAVE formula I got  
stuck in the 16K-17K range. According to my profiler that is mostly  
due to the expensive formula used to compare nodes, where it says it  
spends 25% of total time. The formula I'm using is:


beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

beta = 1 - log(parent-visits) / 20
UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )
	RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual-visits+1) )
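
(For reference, the same formula written out as a Java method - the parameter
names are mine, not the actual code of the bot; explorationFactor is the
tunable constant.)

class NodeScore {
    static double score(double parentVisits, double visits, double virtualVisits,
                        double winRatio, double virtualWinRatio, double explorationFactor) {
        double logParent = Math.log(parentVisits);
        double beta = 1.0 - logParent / 20.0;
        double uct  = explorationFactor * Math.sqrt(logParent / (visits + 1));
        double rave = explorationFactor * Math.sqrt(logParent / (virtualVisits + 1));
        return beta * (virtualWinRatio + rave) + (1.0 - beta) * (winRatio + uct);
    }
}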


There are probably quite a few possibilities still to tune this  
program with regards to playing strength and performance. But I felt  
it doesn't help to obscure the implementation by too many details.


The implementation of the search algorithm was entirely game- 
independent, until I introduced AMAF. I didn't see how to fix that,  
as there's no way getting around that it's based on the fact that a  
move is represented by a single coordinate, which is mapped to an  
array to store the statistical values. But strip the AMAF part, which  
is a block of 30 odd lines of code, and I think it can be used for  
other games basically as-is. I did this not because I ever see myself  
program another game, but because in my experience in doing so I get  
a cleaner separation between modules.


At 2,000 playouts, it's still quite a bit weaker than the plain MC- 
AMAF refbot. It wins only about 33%. But that's probably because the  
1,000-2,000 playouts range is the sweet-spot for that particular type  
of playing engine. It doesn't scale from there, whereas the MCTS ref- 
bot only just gets warmed up with 2,000 playouts.


This leads me to a question. I suppose it might be of some interest  
to put this bot up on CGOS. But what parameters to use? The main one  
being the number of playouts, naturally.


Mark
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread dhillismail
There is another speedup trick that might interest you. IIRC Lukasz Lew came up 
with it, but I forget what he called it. After calculating the selected move 
for an internal node (going through UCT and RAVE and whatnot) store that move. 
Then, for the next N visits to that node (where N is 5 or 10 or so), just 
repeat that move without having to calculate what the move would be.

- Dave Hillis

-Original Message-
From: Mark Boon [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Tue, 2 Dec 2008 11:17 pm
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


I have made some minor performance improvements and this is as far as I intend 
to take this particular project. I might make some small changes if necessary, 
but most likely I'll leave this largely unchanged from now.

I had set myself as an arbitrary goal that it should do at least 20K playouts. 
But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K 
range. According to my profiler that is mostly due to the expensive formula 
used to compare nodes, where it says it spends 25% of total time. The formula 
I'm using is:

  beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

  beta = 1 - log(parent-visits) / 20
  UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )
  RAVE = exploration-factor *sqrt( log(parent-visits) / (nr-virtual-visits+1) )

There are probably quite a few possibilities still to tune this program with 
regards to playing strength and performance. But I felt it doesn't help to 
obscure the implementation by too many details.

The implementation of the search algorithm was entirely game-independent, until 
I introduced AMAF. I didn't see how to fix that, as there's no way getting 
around that it's based on the fact that a move is represented by a single 
coordinate, which is mapped to an array to store the statistical values. But 
strip the AMAF part, which is a block of 30 odd lines of code, and I think it 
can be used for other games basically as-is. I did this not because I ever see 
myself program another game, but because in my experience in doing so I get a 
cleaner separation between modules.

At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF refbot. 
It wins only about 33%. But that's probably because the 1,000-2,000 playouts 
range is the sweet-spot for that particular type of playing engine. It doesn't 
scale from there, whereas the MCTS ref-bot only just gets warmed up with 2,000 
playouts.

This leads me to a question. I suppose it might be of some interest to put this 
bot up on CGOS. But what parameters to use? The main one being the number of 
playouts, naturally.

Mark
___
computer-go mailing list
[EMAIL PROTECTED]
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread Michael Williams
That's going to repeat the same exact path through the tree three times, isn't it?  If so, it seems like it would be more efficient to do N playouts from the 
leaf after each traversal of the tree.



[EMAIL PROTECTED] wrote:
There is another speedup trick that might interest you. IIRC Lukasz Lew 
came up with it, but I forget what he called it. After calculating the 
selected move for an internal node (going through UCT and RAVE and 
whatnot) store that move. Then, for the next N visits to that node 
(where N is 5 or 10 or so), just repeat that move without having to 
calculate what the move would be.


- Dave Hillis

-Original Message-
From: Mark Boon [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Tue, 2 Dec 2008 11:17 pm
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot

I have made some minor performance improvements and this is as far as I 
intend to take this particular project. I might make some small changes 
if necessary, but most likely I'll leave this largely unchanged from now. 
 
I had set myself as an arbitrary goal that it should do at least 20K 
playouts. But with real liberties, AMAF and a RAVE formula I got stuck 
in the 16K-17K range. According to my profiler that is mostly due to the 
expensive formula used to compare nodes, where it says it spends 25% of 
total time. The formula I'm using is: 
 
  beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) 
 
  beta = 1 - log(parent-visits) / 20 
  UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) 
  RAVE = exploration-factor *sqrt( log(parent-visits) / 
(nr-virtual-visits+1) ) 
 
There are probably quite a few possibilities still to tune this program 
with regards to playing strength and performance. But I felt it doesn't 
help to obscure the implementation by too many details. 
 
The implementation of the search algorithm was entirely 
game-independent, until I introduced AMAF. I didn't see how to fix that, 
as there's no way getting around that it's based on the fact that a move 
is represented by a single coordinate, which is mapped to an array to 
store the statistical values. But strip the AMAF part, which is a block 
of 30 odd lines of code, and I think it can be used for other games 
basically as-is. I did this not because I ever see myself program 
another game, but because in my experience in doing so I get a cleaner 
separation between modules. 
 
At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF 
refbot. It wins only about 33%. But that's probably because the 
1,000-2,000 playouts range is the sweet-spot for that particular type of 
playing engine. It doesn't scale from there, whereas the MCTS ref-bot 
only just gets warmed up with 2,000 playouts. 
 
This leads me to a question. I suppose it might be of some interest to 
put this bot up on CGOS. But what parameters to use? The main one being 
the number of playouts, naturally. 
 
Mark 
___ 
computer-go mailing list 
computer-go@computer-go.org 
http://www.computer-go.org/mailman/listinfo/computer-go/ 








___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread dhillismail
Hmm.. it could be that N is picked randomly each time... now I can't seem to 
find the description and my memory is not serving me well here. Perhaps someone 
else remembers? I'll try to track it down.

- Dave Hillis


-Original Message-
From: Michael Williams [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Wed, 3 Dec 2008 12:14 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


That's going to repeat the same exact path through the tree three times, isn't 
it? If so, it seems like it would be more efficient to do N playouts from the 
leaf after each traversal of the tree.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-28 Thread Denis fidaali

 My own experiments showed a very big increase in strength,
especially with about 1000 light playouts. I had many features
and parameters which may make a difference. But my
main hypothesis was that it shouldn't ...

I think that I weighted real simulations (aka the first move) 10 times more than
virtual ones (aka AMAF moves). 

I assume that by expansions you mean allocating data for every child
of a node. So let's say your root node is called A: you would
then allocate for every one of its children. Then, when making a simulation
run from A, you would first apply the UCT formula to
determine which first move to make ... then
you would both collect the AMAF score for every move
played during the simulation, for virtual scoring,
AND you would use the true first move for real scoring (the normal UCT part).

Of course, this shouldn't behave exactly like AMAF, because for one thing
you get a better judgment of the first move because of UCT, and
you also give the first move more weight. (Although you probably
technically should also do so with plain AMAF, even though the first move
is still 100% random. That is part of what
weighted AMAF does. But the first-move weight would be arbitrary,
thus defeating the purpose of a standard reference.)

Last, you could try to downgrade the weight of the virtual score part
as the number of simulations grows.

For example (if nbsim < 1000):
winRatio = [realRatio * (1000 + nbsim)] / 3000 + [virtualRatio * (2000 - (1000 + nbsim))] / 3000

I'm not sure about the formula :) but you get the idea of taking the realRatio
more and more into account. I think that what I did was not to take the realRatio
into account at all at first when estimating the winRatio, and then increase its
weight linearly up to the point where I no longer took the virtualRatio into account.
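
In code, that phasing-out of the virtual (AMAF) ratio could look something like
this; the 1000-simulation ramp is only an illustration of the idea above, not a
tuned or exact reproduction of the formula:

    // Sketch: weight the real win ratio more and the AMAF (virtual) ratio less
    // as the number of simulations grows. The ramp length is illustrative.
    double blendedWinRatio(double realRatio, double virtualRatio, int nbSim) {
        final int RAMP = 1000;
        double w = Math.min(1.0, nbSim / (double) RAMP);  // 0 = all virtual, 1 = all real
        return w * realRatio + (1.0 - w) * virtualRatio;
    }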


The drawback with all that is that you will certainly have to make several versions,
although they would share a lot of code. I don't think that trying to make one version
behave as several is a good idea. You probably should keep the code safe once you have
done a lot of testing on it, for reference purposes. Then you'll be able to know exactly
the strength of a particular piece of software, even two years from now. I didn't do that.
I tried to make it evolve. And with poor svn versioning discipline, and poor management of
the link between measurements and software, I'm now at a point where all I can do is make
random guesses, although I made millions of experiments in the past years :) I just
can't link those experiments back to a fixed piece of software.

 Then, if by any means you think a piece of code is bug free (which is hard to bet on),
you really should put that somewhere, with all the proof (and experiments)
you can associate with it. (Even on a web page if you ask me, or in a thesis.)
To me that is the most useful part of the standard ref bot:
because a lot of people have tried it, you can be sure of the exact link
between the software and the experimental results. It makes assertions
reproducible, and that's really great.


 From: [EMAIL PROTECTED]
 Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
 Date: Fri, 28 Nov 2008 01:48:09 -0200
 To: computer-go@computer-go.org
 CC: 
 
 
 On 27-nov-08, at 19:50, Denis fidaali wrote:
 
  So, you use AMAF for simulating the first UCT evaluations ?
  I though the classical way to use AMAF, was to affect only the
  win/lose ratio portion of the uct equation.
  Obvioulsy it should be allowed to use an arbitrary
  large number of AMAF simulation accumulating them longer
  than what it take to expand a node.
  I think that a classical way to affect the win/ratio is to
  decrease the effect of the AMAF correction as the number
  of simulation grows.
 
  If you test with a very low number of simulation
  (in the 1000 - 3000 range), i think you should be
  able to get out a very nice improvement out of the
  AMAF version. If you don't, i would think that something
  is wrong somewhere.
 
  What test process do you use for this version ?
 
 I tested it mostly doing 2,000 playouts. When AMAF is true I create a  
 map of virtual-win values of all the moves played during a playout.  
 These values get accumulated over all the playouts (not just the  
 first ones). The 'virtual-value' of a move is calculated as follows:
 
 exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr- 
 playouts*2 + nr-virtual-playouts))
 
 where the exploration-factor is currently sqrt(0.2) and UCT is sqrt 
 ( log( nr-parent-playouts ) / ( nr-playouts+1) )
 
 Like I said, I haven't had time to experiment much so this formula  
 may not be any good. I had also expected to see some positive effect  
 of the virtual-win / virtual-playout ratio from AMAF, but I see none.  
 Of course it's also possible I have a different kind of bug still.
 
 What happens in my 'formula' is that when it never expands beyond the  
 first level, which is what happens if the number of simulations is  
 equal

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-28 Thread Mark Boon
 :
because a lot of persons have tried it, you can be sure of the  
exact link
between the software, and the experimental results. It makes  
assertions

reproducibles, and that's really great.


 From: [EMAIL PROTECTED]
 Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
 Date: Fri, 28 Nov 2008 01:48:09 -0200
 To: computer-go@computer-go.org
 CC:


 On 27-nov-08, at 19:50, Denis fidaali wrote:

  So, you use AMAF for simulating the first UCT evaluations ?
  I though the classical way to use AMAF, was to affect only the
  win/lose ratio portion of the uct equation.
  Obvioulsy it should be allowed to use an arbitrary
  large number of AMAF simulation accumulating them longer
  than what it take to expand a node.
  I think that a classical way to affect the win/ratio is to
  decrease the effect of the AMAF correction as the number
  of simulation grows.
 
  If you test with a very low number of simulation
  (in the 1000 - 3000 range), i think you should be
  able to get out a very nice improvement out of the
  AMAF version. If you don't, i would think that something
  is wrong somewhere.
 
  What test process do you use for this version ?

 I tested it mostly doing 2,000 playouts. When AMAF is true I create a
 map of virtual-win values of all the moves played during a playout.
 These values get accumulated over all the playouts (not just the
 first ones). The 'virtual-value' of a move is calculated as follows:

 exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr-
 playouts*2 + nr-virtual-playouts))

 where the exploration-factor is currently sqrt(0.2) and UCT is sqrt
 ( log( nr-parent-playouts ) / ( nr-playouts+1) )

 Like I said, I haven't had time to experiment much so this formula
 may not be any good. I had also expected to see some positive effect
 of the virtual-win / virtual-playout ratio from AMAF, but I see none.
 Of course it's also possible I have a different kind of bug still.

 What happens in my 'formula' is that when it never expands beyond the
 first level, which is what happens if the number of simulations is
 equal to the number of simulations before expansion, the virtual-
 value becomes completely determined by nr-virtual-wins / nr-virtual-
 playouts making it equivalent to the original ref-bot. In case it
 does expand further and creates a tree, the actual win-loss ratio is
 weighed twice as heavily as the virtual win-loss ratio. That seemed
 like a reasonable first try. I have tried a few others, but usually
 didn't get much different results or much worse results.
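
Transcribed into code, the 'virtual-value' quoted above reads roughly as follows
(a sketch with made-up names, not the ref-bot's actual code):

    // Sketch of the quoted 'virtual-value': real results (wins/playouts) count
    // twice as heavily as AMAF (virtual) results. Names are illustrative only;
    // callers are assumed to have recorded at least one (virtual) playout.
    double virtualValue(int wins, int playouts,
                        int virtualWins, int virtualPlayouts,
                        int parentPlayouts) {
        double explorationFactor = Math.sqrt(0.2);
        double uct = Math.sqrt(Math.log(parentPlayouts) / (playouts + 1));
        double winPart = (wins * 2.0 + virtualWins)
                       / (playouts * 2.0 + virtualPlayouts);
        return explorationFactor * uct + winPart;
    }

When the tree never expands, playouts stays at 0 for every child, so the ranking
is determined by the virtual win ratio alone, which matches the behaviour
described above.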

 Mark

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-27 Thread Denis fidaali

So, you use AMAF for simulating the first UCT evaluations?
I thought the classical way to use AMAF was to affect only the
win/loss ratio portion of the UCT equation.
Obviously it should be allowed to use an arbitrarily
large number of AMAF simulations, accumulating them for longer
than it takes to expand a node.
I think that a classical way to affect the win ratio is to
decrease the effect of the AMAF correction as the number
of simulations grows.

If you test with a very low number of simulations
(in the 1000 - 3000 range), I think you should be
able to get a very nice improvement out of the
AMAF version. If you don't, I would think that something
is wrong somewhere.

What test process do you use for this version?

 From: [EMAIL PROTECTED]
 Date: Thu, 27 Nov 2008 18:15:34 -0200
 To: computer-go@computer-go.org
 CC: 
 Subject: [computer-go] Monte-Carlo Tree Search reference bot
 
 I have added a MCTS implementation to my reference-bot project. It  
 allows you to specify how many nodes to search, after how many nodes  
 to expand and whether to use AMAF or not. If you specify the number  
 of nodes before expansion to be the same as the total number of nodes  
 to search and set AMAF to true you get the equivalent of Don's  
 original ref-bot. When you set AMAF to false and set the number of  
 nodes before expansion to 1 you get my original UCT search algorithm.
 
 Between those extremes there might be a good formula to combine AMAF  
 with tree-search, but unfortunately I haven't had time lately to look  
 for one. The few small attempts I made show no benefit using AMAF in  
 tree-search, only when used on a single level. The contrast between  
 the two extremes is very stark, so I'm actually convinced there  
 should be a way to use both. This implementation is also quite a bit  
 slower than my original search algorithm but I also didn't have time  
 yet to trace it. It might simply be due to the different expansion  
 method, which is much more expensive with a value of 1. Also,  
 searching for the best UCT node gets more expensive with more  
 (unused) nodes on each level. Using a higher expansion value may  
 alleviate the performance hit. Anyway I think this is a reasonable  
 starting point.
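
 As a sketch of those two extremes (the class and field names below are
 hypothetical, not the actual ref-bot API):

     // Hypothetical configuration sketch of the two extremes described above.
     class SearchConfig {
         final int nodesToSearch;        // total playouts
         final int nodesBeforeExpansion; // visits before a leaf is expanded
         final boolean amaf;             // use AMAF statistics or not

         SearchConfig(int nodesToSearch, int nodesBeforeExpansion, boolean amaf) {
             this.nodesToSearch = nodesToSearch;
             this.nodesBeforeExpansion = nodesBeforeExpansion;
             this.amaf = amaf;
         }

         // expansion threshold == total nodes, AMAF on: behaves like the original ref-bot
         static final SearchConfig REF_BOT_LIKE = new SearchConfig(2000, 2000, true);
         // expand after a single visit, AMAF off: plain UCT tree search
         static final SearchConfig PLAIN_UCT   = new SearchConfig(2000, 1, false);
     }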
 
 At first I intended to create a different project for the search  
 reference bot. But half of the code (the MC part) is the same. And I  
 don't want to end up having to maintain the same code in two places.  
 I also didn't want to separate out some code into a separate library  
 and making the structure for the simple ref-bot more complicated.  
 This organization may need some more thought though.
 
 I'll update the project pages tomorrow.
 
 Mark
 
 
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

_
Email envoyé avec Windows Live Hotmail. Dites adieux aux spam et virus, passez 
à Hotmail ! C'est gratuit !
http://www.windowslive.fr/hotmail/default.asp___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte carlo play

2008-11-15 Thread dave.devos
Hello Tony,
 
I'm just speaking for myself here, but I suspect you may not get the answers 
you are looking for, because you're not asking the right questions.
 
I get the impression that the situation is similar to this scenario:
 

A young guy with a surfboard and a Hawaiian shirt turns up at 
base camp on Mount Everest and says:
Hey dudes, what's up!
This morning I had this wonderful idea: I wanna climb Mount Everest!
So here I am, and I even brought my hiking shoes!
But can anyone give me directions to the nearest McDonalds, cuz 
I'm starving.
 
Everybody in the tent stares at him speechlessly.

I may be wrong, but that's my impression.
I'm not even sure whether I myself should be here at base camp, even though I 
am a 4d in Go and even though I have 25 years of programming experience in a 
couple of different languages (including the last 8 years as a fulltime 
professional developer).
It's only logical that ever since I learned Go 20 years ago I've had a 
special interest in computer-go, and I have spent some time over the years attempting 
to build a strong go-playing program myself (although not nearly as much time 
as some others here).
I'm a mountaineer, but climbing Mount Everest is something else. I feel a 
bit like a rookie among all the seasoned mountaineers hanging around here at 
the Everest base camp. Most of the time I keep my mouth shut when one of them is 
talking, because I'm a bit apprehensive that I might say something that is 
utterly obvious to the others or even outright stupid.
 
And then this guy turns up in his Hawaiian shirt.
 
I hope you don't feel offended. Indeed you took up a wonderful endeavour, but I 
sense that you're not quite ready to go for the summit today.
 
Best Regards,
Dave



Van: [EMAIL PROTECTED] namens tony tang
Verzonden: do 13-11-2008 22:03
Aan: go mailing list
Onderwerp: RE: [computer-go] Monte carlo play


Thanks Darren,
 
After reading a couple of interesting papers in this field I am applying 
pattern matching to limit the number of possibilities of the Monte-Carlo tree 
(on 19x19 it appears to take a day to play).
I was wondering if you could comment on whether my following concept is right 
on the evaluation part.
 
The possible patterns for the current situation are passed to a class where 
self-play begins. Each different pattern runs on an instance, and at the end the 
results are written into a file. The game tree will be constructed using the 
stats from the file. 
 
Thanks again
 
Tony
 
 


 Date: Wed, 12 Nov 2008 10:34:21 +0900
 From: [EMAIL PROTECTED]
 To: computer-go@computer-go.org
 Subject: Re: [computer-go] Monte carlo play
 
  I am new to programming go, could some one explain to me how a monte
  carlo based evalution manages to play random games by itself? ie:
  who/what is the oppoent which supplies the opposing moves which
  allows another move to be randomly played after making the initial
 
 It is self-play, so both players are random.
 
 I'm not aware of any programs that use different playout strategies for
 black/white, though it has been briefly mentioned before on this list,
 and I personally think it might be interesting as a way of discovering
 and more accurately evaluating the imbalances present in most go
 positions (i.e. usually one side has more strength and one side has more
 territory, and so to correctly understand the position one side has to
 play aggressive moves and the other has to play defensive moves).
 
  move at the root. I am implementing in java, is there a
  package/framework which allows me to train my software.
 
 This page, recently started by Eric Marchand, might be a good starting
 place:
 http://ricoh51.free.fr/go/engineeng.htm
 
 Darren
 
 
 -- 
 Darren Cook, Software Researcher/Developer
 http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
 open source dictionary/semantic network)
 http://dcook.org/work/ (About me and my work)
 http://dcook.org/blogs.html (My blogs and articles)
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/





___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte carlo play

2008-11-15 Thread Nick Wedd
In message
[EMAIL PROTECTED],
[EMAIL PROTECTED] writes

 Hawaiian shirt analogy snipped 

I hope you don't feel offended. Indeed you took up a wonderful
endeavour, but I sense that you're not quite ready to go for the summit
today.

But Tony never expressed any interest in going for the summit.  Maybe
he just wants to write a simple MC implementation.  Here is what he
wrote:

I am new to programming go, could some one explain to me how a monte
carlo based evalution manages to play random games by itself?
ie: who/what is the oppoent which supplies the opposing moves which
allows another move to be randomly played after making the initial move
at the root.

I am implementing in java, is there a package/framework which allows me
to train my software.

I will try to answer these questions myself.

 who/what is the oppoent which supplies the opposing moves

It is the same piece of your program that supplies your own moves.  For
each random game or rollout, your program places both black and
white stones, alternately, at random.
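
In code, a random playout is just one loop that alternates the colour it hands
to the same move generator; a rough sketch (the Board interface below is a
hypothetical stand-in, not a particular package):

    // Hypothetical board interface, just enough for the sketch below.
    interface Board {
        int BLACK = 1, WHITE = 2;
        java.util.List<Integer> legalMoves(int colour); // empty list means "pass"
        void play(int colour, int point);
        int chineseScore();                             // > 0 means Black wins
    }

    // One random game: the same generator supplies the moves for both colours.
    int randomPlayout(Board board, java.util.Random rng) {
        int colour = Board.BLACK;
        int consecutivePasses = 0;
        while (consecutivePasses < 2) {
            java.util.List<Integer> moves = board.legalMoves(colour);
            if (moves.isEmpty()) {
                consecutivePasses++;
            } else {
                consecutivePasses = 0;
                board.play(colour, moves.get(rng.nextInt(moves.size())));
            }
            colour = (colour == Board.BLACK) ? Board.WHITE : Board.BLACK;
        }
        return board.chineseScore();
    }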

is there a package/framework which allows me
to train my software.

Nothing trains your software.  This isn't a neural net.  The way it
works is:
  you write a program
  A: it plays, badly
 you modify it so it plays differently, and with luck, better
 repeat from A

Nick
-- 
Nick Wedd    [EMAIL PROTECTED]
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Monte carlo play

2008-11-15 Thread dave.devos
You are absolutely right. Tony did not imply he wanted to go for the summit 
today. Also, Tony's qualifications are surely a lot better than a Hawaiian 
shirt and a surfboard. 
 
My analogy was an exaggeration with the intention of explaining to Tony that he 
has more catching up to do than he might have expected. 
By no means was it my intention to put Tony down. I hope Tony understands this and 
I sincerely hope that Tony won't give up on joining us in this wonderful 
endeavour, because we need all the help we can get.
 
Best Regards,
Dave



Van: [EMAIL PROTECTED] namens Nick Wedd
Verzonden: za 15-11-2008 15:48
Aan: computer-go
Onderwerp: Re: [computer-go] Monte carlo play



In message
[EMAIL PROTECTED],
[EMAIL PROTECTED] writes

 Hawaiian shirt analogy snipped 

I hope you don't feel offended. Indeed you took up a wonderful
endeavour, but I sense that you're not quite ready to go for the summit
today.

But Tony never expressed any interest in going for the summit.  Maybe
he just wants to write a simple MC implementation.  Here is what he
wrote:

I am new to programming go, could some one explain to me how a monte
carlo based evalution manages to play random games by itself?
ie: who/what is the oppoent which supplies the opposing moves which
allows another move to be randomly played after making the initial move
at the root.

I am implementing in java, is there a package/framework which allows me
to train my software.

I will try to answer these questions myself.

 who/what is the oppoent which supplies the opposing moves

It is the same piece of your program that supplies your own moves.  For
each random game or rollout, your program places both black and
white stones, alternately, at random.

is there a package/framework which allows me
to train my software.

Nothing trains your software.  This isn't a neural net.  The way it
works is:
  you write a program
  A: it plays, badly
 you modify it so it plays differently, and with luck, better
 repeat from A

Nick
--
Nick Wedd[EMAIL PROTECTED]
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte carlo play

2008-11-15 Thread dave.devos
Just a quickstart here, and I'm not one of the real experts, so it may contain 
inaccuracies. Perhaps you already know all this, perhaps not.
 
The basis of many game-playing programs (like chess programs) is the minimax 
(or negamax) algorithm.
It is a simple algorithm by itself and you can find a lot of background and 
examples for chess or other games if you google it.
Basically, minimax explores all possible variations stemming from the current 
position and then selects the best move.
 
This algorithm works in principle, but there is a problem with it:
For interesting games like chess and go, it will usually take the program 
billions of years to find the best move, given a game position.
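 
As a bare-bones sketch of what minimax looks like before any of those 
adaptations (GameState and its methods are hypothetical stand-ins):

    // Plain negamax: explores every variation to the end and returns the value
    // of the position for the side to move. No pruning, no evaluation function,
    // hence the "billions of years" problem described above.
    interface GameState {
        boolean isTerminal();
        int resultForSideToMove();              // e.g. +1 win, 0 draw, -1 loss
        java.util.List<GameState> successors(); // positions after each legal move
    }

    int negamax(GameState s) {
        if (s.isTerminal()) return s.resultForSideToMove();
        int best = Integer.MIN_VALUE;
        for (GameState next : s.successors()) {
            best = Math.max(best, -negamax(next));
        }
        return best;
    }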
 
In the past decades several adaptations of the minimax algorithm were invented 
to enable the algorithm to find a reasonably good move in a timespan that 
humans are prepared to wait for (like evaluation functions, alpha-beta pruning 
and transposition tables). The program keeps investigating possibilities for 
some time and then the move seeming best so far is played. The longer the 
program is allowed to think (and the faster the computer), the better the move 
it will come up with.
 
These adaptations work by reducing the number of possibilities the program 
explores, without weakening the program too much. The program must ignore many 
possibilities to save time (this is called pruning), but the more possibilities 
it ignores, the higher the risk that it accidentally ignores the best move. 
Also, there is always the danger of the horizon effect: the program 
misinterprets the situation because it simply doesn't get enough time to fully 
investigate every possibility. 
 
The challenge for the programmer is to strike an optimal balance between time 
consumption and losing by accidentally ignoring or misinterpreting the best 
move. Some game programs are even able to finetune themselves to optimize this 
balance over time (learning).
 
A combination of these adaptations has proven to be good enough to enable 
chess programs on current hardware to beat any human with reasonable thinking 
time.
But computer-go is a harder problem than computer-chess. The adaptations that 
are successful in computer-chess are just not enough for computer-go, mainly 
because there are far more possibilities (and I mean FAR more) to explore in go. 
 
But now there is a recent adaptation that has proven to be fairly successful in 
computer-go: Monte-Carlo playouts.
The program reduces the work to be done by gambling a number of times and 
measuring the average payoff. If the average payoff is good, the move is 
assumed to be good for the time being. Also, it doesn't automatically ignore 
moves with a low payoff (remember the risk of ignoring the best move). The 
program just postpones further investigation of seemingly bad moves until the 
moves that initially seemed good give more and more disappointing payoffs after 
further investigation. 
Although this invention has had only moderate success in computer-chess, it has 
proven to be fairly successful in computer-go.
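 
A sketch of that gamble-and-measure idea in its simplest, flat (tree-less) form; 
Position and its methods are hypothetical, and the postponing of bad moves 
described above is left out:

    // Hypothetical position interface for the sketch below.
    interface Position {
        java.util.List<Integer> legalMoves();
        Position play(int move);                       // successor position
        boolean randomPlayoutWin(java.util.Random r);  // one random game; true if we win
    }

    // Flat Monte-Carlo: play some random games after each candidate move and
    // keep the move with the best average payoff.
    int bestMoveByAveragePayoff(Position pos, int playoutsPerMove, java.util.Random rng) {
        int bestMove = -1;
        double bestAverage = Double.NEGATIVE_INFINITY;
        for (int move : pos.legalMoves()) {
            int wins = 0;
            for (int i = 0; i < playoutsPerMove; i++) {
                if (pos.play(move).randomPlayoutWin(rng)) wins++;
            }
            double average = wins / (double) playoutsPerMove;
            if (average > bestAverage) {
                bestAverage = average;
                bestMove = move;
            }
        }
        return bestMove;
    }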
 
But at the heart of a Monte-Carlo go program, you can still recognize the 
minimax algorithm.
 
Best Regards,
Dave
 
 


Van: [EMAIL PROTECTED] namens tony tang
Verzonden: wo 12-11-2008 2:23
Aan: computer-go@computer-go.org
Onderwerp: [computer-go] Monte carlo play


Greetings all go gurus,
 
I am new to programming Go. Could someone explain to me how a Monte-Carlo 
based evaluation manages to play random games by itself?
I.e.: who/what is the opponent which supplies the opposing moves, which allows 
another move to be randomly played after making the initial move at the root?
I am implementing in Java; is there a package/framework which allows me to 
train my software?
 
Thank you
 
Kind regards
 
Tony




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte carlo play

2008-11-11 Thread Darren Cook
 I am new to programming go, could some one explain to me how a monte
 carlo based evalution manages to play random games by itself? ie:
 who/what is the oppoent which supplies the opposing moves which
 allows another move to be randomly played after making the initial

It is self-play, so both players are random.

I'm not aware of any programs that use different playout strategies for
black/white, though it has been briefly mentioned before on this list,
and I personally think it might be interesting as a way of discovering
and more accurately evaluating the imbalances present in most go
positions (i.e. usually one side has more strength and one side has more
territory, and so to correctly understand the position one side has to
play aggressive moves and the other has to play defensive moves).

 move at the root. I am implementing in java, is there a
 package/framework which allows me to train my software.

This page, recently started by Eric Marchand, might be a good starting
place:
  http://ricoh51.free.fr/go/engineeng.htm

Darren


-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Rémi Coulom

Ingo Althöfer wrote:

Hello all, two questions.

(i) Do there exist strong 9x9-go programs on Monte-Carlo base
for Japanese rules?

(ii) Having available only programs for Chinese rules, but playing
in a tournament with Japanese rules, which special tricks and
settings should be used to maximise winning chances? (This is meant
especially in the light of MC's tendency to win games by 0.5
points according to the rules implemented.)

Ingo.
  

Hi Ingo,

The standard trick is to pass as soon as your opponent passes. For this 
to work, you need to take a one-point security margin with the komi. You 
should also score seki the Japanese way.


That is how Crazy Stone played the UEC Cup and the matches at FIT2008. 
In fact, for the UEC cup, Crazy Stone did not understand seki, so I took 
a bigger security margin with the komi. But a bigger security margin is 
not good for 9x9.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Don Dailey
On Thu, 2008-11-06 at 09:19 +0100, Ingo Althöfer wrote:
 Hello all, two questions.
 
 (i) Do there exist strong 9x9-go programs on Monte-Carlo base
 for Japanese rules?
 
 (ii) Having available only programs for Chinese rules, but playing
 in a tournament with Japanese rules, which special tricks and
 settings should be used to maximise winning chances? (This is meant
 especially in the light of MC's tendency to win games by 0.5
 points according to the rules implemented.)

I've thought about those questions myself from time to time.  Let me
think out loud concerning this.   I am by know means an expert in
Japanese scoring or even GO in general, so I'm just giving some thoughts
here and a plan for building a Japanese simple bot that you can be
free to criticize:

It seems to me the primary difference between the two is knowing when to
stop playing and of course scoring dead groups.   The Chinese style bots
do not technically need to know about scoring.  

You can look at the combined statistics at the end of the games for a
given point to get a sense of whether that point is still in play or
whether it's a foregone conclusion.  You can do the same to determine
dead groups.   I don't know how well that works in all cases, but I have
used it and it works pretty well.  

But we also want to recognize dame,  and not play to dame points early
in the game even if it doesn't affect the final Chinese outcome.   So
here is my idea:

  1. If ALL the stones of a particular group belong to the opponent with
high certainty,  they are dead.   

  2. If there are open spaces that belong to you or the opponent with
high certainty don't move to them.   

  3. If an uncertain point is touching stones of both colors and both
colors have high certainty for the color they belong to, it is probably
dame and you shouldn't move to them.   

example:   White has a stone on d4 that is clearly alive.
   Black has a stone on f4 that is clearly alive.  
   An empty point on e4 is highly uncertain.   
   Do not play to e4 - it is probably dame.

   question:  Is that a reasonably good rule or does it need some work?
 

  4. If you have no moves other than these cases, you should pass.   
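
A rough sketch of how rules 2 and 3 might be checked against per-point 
ownership statistics gathered from the playouts; every name and the certainty 
threshold here are illustrative, not a worked-out implementation. Rule 4 then 
falls out: if every legal point gets skipped, pass.

    // Hypothetical view of the board, just enough for the sketch.
    interface BoardView {
        int[] neighbours(int p);   // points adjacent to p
        boolean isStone(int p);    // true if a stone sits on p
    }

    // ownership[p] = fraction of playouts in which point p ends up ours.
    // Rule 2: empty points that clearly belong to one side should not be played.
    // Rule 3: an uncertain point touching a clearly-ours stone and a
    // clearly-theirs stone is probably dame and should not be played either.
    boolean shouldSkipPoint(int p, double[] ownership, BoardView b, double sure) {
        if (ownership[p] > sure || ownership[p] < 1.0 - sure) return true;  // rule 2: settled
        boolean touchesOurLiveStone = false, touchesTheirLiveStone = false;
        for (int n : b.neighbours(p)) {
            if (!b.isStone(n)) continue;
            if (ownership[n] > sure) touchesOurLiveStone = true;
            else if (ownership[n] < 1.0 - sure) touchesTheirLiveStone = true;
        }
        return touchesOurLiveStone && touchesTheirLiveStone;                // rule 3: dame
    }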

You can test this idea by playing a bot on KGS under Japanese rules.
You may have to tweak what you consider your uncertainty margin.   Also,
I'm not considering seki here but we would want to find a way to cope
with that. 

- Don



 Ingo.


signature.asc
Description: This is a digitally signed message part
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Jason House
I think simplistic handling of Japanese rules should play dame points  
that connect chains. This avoids some problems that can arise where  
ownership probability drops after the opponent plays the dame, and a  
point of territory must get filled.


Even if not technically required, I can imagine bots acting like  
beginners and getting nervous over imagined vulnerabilities.


Sent from my iPhone

On Nov 6, 2008, at 9:12 AM, Don Dailey [EMAIL PROTECTED] wrote:


On Thu, 2008-11-06 at 09:19 +0100, Ingo Althöfer wrote:

Hello all, two questions.

(i) Do there exist strong 9x9-go programs on Monte-Carlo base
for Japanese rules?

(ii) Having available only programs for Chinese rules, but playing
in a tournament with Japanese rules, which special tricks and
settings should be used to maximise winning chances? (This is meant
especially in the light of MC's tendency to win games by 0.5
points according to the rules implemented.)


I've thought about those questions myself from time to time.  Let me
think out loud concerning this.   I am by know means an expert in
Japanese scoring or even GO in general, so I'm just giving some  
thoughts

here and a plan for building a Japanese simple bot that you can be
free to criticize:

It seems to me the primary difference between the two is knowing  
when to
stop playing and of course scoring dead groups.   The Chinese style  
bots

do not technically need to know about scoring.

You can look at the combined statistics at the end of the games for a
given point to get a sense of whether that point is still in play or
whether it's a forgone conclusion.  You can do the same to determine
dead groups.   I don't know how well that works in all cases, but I  
have

used it and it works pretty well.

But we also want to recognize dame,  and not play to dame points early
in the game even if it doesn't affect the final Chinese outcome.   So
here is my idea:

 1. If ALL the stones of a particular group belong to the opponent  
with

high certainty,  they are dead.

 2. If there are open spaces that belong to you or the opponent with
high certainty don't move to them.

 3. If an uncertain point is touching stones of both colors and both
colors have high certainty for the color they belong to, it is  
probably

dame and you shouldn't move to them.

   example:   White has a stone on d4 that is clearly alive.
  Black has a stone on f4 that is clearly alive.
  An empty point on e4 is highly uncertain.
  Do not play to e4 - it is probably dame.

  question:  Is that a reasonably good rule or does it need some work?


 4. If you have no moves other than these cases, you should pass.

You can test this idea by playing a bot on KGS under Japanese rules.
You may have to tweak what you consider your uncertainty margin.
Also,

I'm not considering seki here but we would want to find a way to cope
with that.

- Don




Ingo.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Don Dailey
On Thu, 2008-11-06 at 10:44 -0500, Jason House wrote:
 I think simplistic handling of Japanese rules should play dame
 points  
 that connect chains. This avoids some problems that can arise where  
 ownership probability drops after the opponent plays the dame, and a  
 point of territory must get filled.

I'm trying to think of a counter example where my rule will not work.
Of course this has to be understood within the context of the playing
strength of the bot itself, so we don't expect extremely sophisticated
handling of special cases.  

In my example, both black and white have clearly live stones and there
is an empty point touching both of them.  Assuming that the map is
correct of course and both groups are alive,  then I suppose a
connecting move could cause a weaker group to live.   So that suggests
another rule -  in addition to the original condition of at least 1
white and black live group touching,  there should be no connections to
a group that is not alive.  

A group that is alive passes some threshold of certainty.
Otherwise, it is not-alive but not dead.   The threshold might be
something like 90 - 95% or something but some experimentation would be
required to find a good value for this.

- Don


 
 Even if not technically required, I can imagine bots acting like  
 beginners and get nervous over imagined vulnerabilites.


signature.asc
Description: This is a digitally signed message part
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Erik van der Werf
IIRC under official Japanese rules at the end of the game all groups
with liberties shared between opposing colours are by definition in
seki. Therefore eventually (before counting) all dame have to be
filled.

Further, playing dame points is almost equally bad under Chinese rules
as it is under Japanese rules. So, if you have a strong 'Chinese'
engine no special tricks are needed at least until you reach the
endgame. The only thing that is severely penalized under Japanese
rules is playing needless defensive moves inside your own territory
while the opponent is passing.

Erik



On Thu, Nov 6, 2008 at 4:44 PM, Jason House [EMAIL PROTECTED] wrote:
 I think simplistic handling of Japanese rules should play dame points that
 connect chains. This avoids some problems that can arise where ownership
 probability drops after the opponent plays the dame, and a point of
 territory must get filled.

 Even if not technically required, I can imagine bots acting like beginners
 and get nervous over imagined vulnerabilites.

 Sent from my iPhone

 On Nov 6, 2008, at 9:12 AM, Don Dailey [EMAIL PROTECTED] wrote:

 On Thu, 2008-11-06 at 09:19 +0100, Ingo Althöfer wrote:

 Hello all, two questions.

 (i) Do there exist strong 9x9-go programs on Monte-Carlo base
 for Japanese rules?

 (ii) Having available only programs for Chinese rules, but playing
 in a tournament with Japanese rules, which special tricks and
 settings should be used to maximise winning chances? (This is meant
 especially in the light of MC's tendency to win games by 0.5
 points according to the rules implemented.)

 I've thought about those questions myself from time to time.  Let me
 think out loud concerning this.   I am by know means an expert in
 Japanese scoring or even GO in general, so I'm just giving some thoughts
 here and a plan for building a Japanese simple bot that you can be
 free to criticize:

 It seems to me the primary difference between the two is knowing when to
 stop playing and of course scoring dead groups.   The Chinese style bots
 do not technically need to know about scoring.

 You can look at the combined statistics at the end of the games for a
 given point to get a sense of whether that point is still in play or
 whether it's a forgone conclusion.  You can do the same to determine
 dead groups.   I don't know how well that works in all cases, but I have
 used it and it works pretty well.

 But we also want to recognize dame,  and not play to dame points early
 in the game even if it doesn't affect the final Chinese outcome.   So
 here is my idea:

  1. If ALL the stones of a particular group belong to the opponent with
 high certainty,  they are dead.

  2. If there are open spaces that belong to you or the opponent with
 high certainty don't move to them.

  3. If an uncertain point is touching stones of both colors and both
 colors have high certainty for the color they belong to, it is probably
 dame and you shouldn't move to them.

   example:   White has a stone on d4 that is clearly alive.
  Black has a stone on f4 that is clearly alive.
  An empty point on e4 is highly uncertain.
  Do not play to e4 - it is probably dame.

  question:  Is that a reasonably good rule or does it need some work?


  4. If you have no moves other than these cases, you should pass.

 You can test this idea by playing a bot on KGS under Japanese rules.
 You may have to tweak what you consider your uncertainty margin.   Also,
 I'm not considering seki here but we would want to find a way to cope
 with that.

 - Don



 Ingo.

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Mark Boon
Although what Don writes is all correct, I understood the question to  
be rather different. It's not a matter of being able to determine the  
right score at the end or the right way to play, it's a matter of  
determining the right score after each playout. For performance  
reasons MC programs will cut corners there, which could be taken  
advantage of when playing by Japanese rules, because after the  
playout they are prone to getting the wrong score in certain situations.


Mark

On 6-nov-08, at 12:12, Don Dailey wrote:


On Thu, 2008-11-06 at 09:19 +0100, Ingo Althöfer wrote:

Hello all, two questions.

(i) Do there exist strong 9x9-go programs on Monte-Carlo base
for Japanese rules?

(ii) Having available only programs for Chinese rules, but playing
in a tournament with Japanese rules, which special tricks and
settings should be used to maximise winning chances? (This is meant
especially in the light of MC's tendency to win games by 0.5
points according to the rules implemented.)


I've thought about those questions myself from time to time.  Let me
think out loud concerning this.   I am by know means an expert in
Japanese scoring or even GO in general, so I'm just giving some  
thoughts

here and a plan for building a Japanese simple bot that you can be
free to criticize:

It seems to me the primary difference between the two is knowing  
when to
stop playing and of course scoring dead groups.   The Chinese style  
bots

do not technically need to know about scoring.

You can look at the combined statistics at the end of the games for a
given point to get a sense of whether that point is still in play or
whether it's a forgone conclusion.  You can do the same to determine
dead groups.   I don't know how well that works in all cases, but I  
have

used it and it works pretty well.

But we also want to recognize dame,  and not play to dame points early
in the game even if it doesn't affect the final Chinese outcome.   So
here is my idea:

  1. If ALL the stones of a particular group belong to the opponent  
with

high certainty,  they are dead.

  2. If there are open spaces that belong to you or the opponent with
high certainty don't move to them.

  3. If an uncertain point is touching stones of both colors and both
colors have high certainty for the color they belong to, it is  
probably

dame and you shouldn't move to them.

example:   White has a stone on d4 that is clearly alive.
   Black has a stone on f4 that is clearly alive.
   An empty point on e4 is highly uncertain.
   Do not play to e4 - it is probably dame.

   question:  Is that a reasonably good rule or does it need some  
work?



  4. If you have no moves other than these cases, you should pass.

You can test this idea by playing a bot on KGS under Japanese rules.
You may have to tweak what you consider your uncertainty margin.
Also,

I'm not considering seki here but we would want to find a way to cope
with that.

- Don




Ingo.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Don Dailey
On Thu, 2008-11-06 at 17:10 +0100, Erik van der Werf wrote:
 IIRC under official Japanese rules at the end of the game all groups
 with liberties shared between opposing colours are by definition in
 seki. Therefore eventually (before counting) all dame have to be
 filled.
 
 Further, playing dame points is almost equally bad under Chinese rules
 as it is under Japanese rules. So, if you have a strong 'Chinese'
 engine no special tricks are needed at least until you reach the
 endgame. The only thing that is severely penalized under Japanese
 rules is playing needless defensive moves inside your own territory
 while the opponent is passing.

The dame rule I assumed was needed for an MC bot playing because of the
way MC bots play badly when winning or losing. The idea was to
focus on what fights were left on the board so that you don't lose due
to a Japanese technicality.

But it's now not clear to me that my rule would fix that.   Maybe it
should be left out.   I'm also not sure how my rule would affect seki
positions. 
   

- Don

 


signature.asc
Description: This is a digitally signed message part
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Jason House

On Nov 6, 2008, at 11:09 AM, Don Dailey [EMAIL PROTECTED] wrote:


On Thu, 2008-11-06 at 10:44 -0500, Jason House wrote:

I think simplistic handling of Japanese rules should play dame
points
that connect chains. This avoids some problems that can arise where
ownership probability drops after the opponent plays the dame, and a
point of territory must get filled.


I'm trying to think of a counter example where my rule will not work.
Of course this has to be understood within the context of the playing
strength of the bot itself, so we don't expect extremely sophisticated
handling of special cases.


Consider a miai to connect a chain with two liberties. It's easy to  
get the ownership correct in all heavy playouts. If one of the two  
miai points is dame, your rule would say not to play it. If the  
opponent plays there, the chain ends up in atari...








In my example, both black and white have clearly live stones and there
is an empty point touching both of them.  Assuming that the map is
correct of course and both groups are alive,  then I suppose a
connecting move could cause a weaker group to live.   So that suggests
another rule -  in addition to the original condition of at least 1
white and black live group touching,  there should be no connections  
to

a group that is not alive.

A group that is alive passes some thresehold of certainty.
Otherwise, it is not-alive but not dead.   The threshold might be
something like 90 - 95% or something but some experimentation would be
required to find a good value for this.

- Don




Even if not technically required, I can imagine bots acting like
beginners and get nervous over imagined vulnerabilites.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread David Fotland
Many Faces of Go's Monte Carlo engine plays strongly using Japanese rules.
It's required for sales in America and Japan (as AI Igo).  I don't use
Rémi's trick, since there are sometimes points remaining when your opponent
passes when playing against weaker players.

David

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:computer-go-
 [EMAIL PROTECTED] On Behalf Of Rémi Coulom
 Sent: Thursday, November 06, 2008 1:02 AM
 To: computer-go
 Subject: Re: [computer-go] Monte-Carlo and Japanese rules
 
 Ingo Althöfer wrote:
  Hello all, two questions.
 
  (i) Do there exist strong 9x9-go programs on Monte-Carlo base
  for Japanese rules?
 
  (ii) Having available only programs for Chinese rules, but playing
  in a tournament with Japanese rules, which special tricks and
  settings should be used to maximise winning chances? (This is meant
  especially in the light of MC's tendency to win games by 0.5
  points according to the rules implemented.)
 
  Ingo.
 
 Hi Ingo,
 
 The standard trick is to pass as soon as your opponent passes. For this
 to work, you need to take a one-point security margin with the komi. You
 should also score seki the Japanese way.
 
 That is how Crazy Stone played the UEC Cup and the matches at FIT2008.
 In fact, for the UEC cup, Crazy Stone did not understand seki, so I took
 a bigger security margin with the komi. But a bigger security margin is
 not good for 9x9.
 
 Rémi
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread dave.devos
What if the playout uses the AGA rule of paying 1 point for a pass and 
requiring white to pass last (so the game does not end by two passes if black 
plays the second pass)?
Wouldn't the score then be equivalent to the Japanese score?
 
Dave



Van: [EMAIL PROTECTED] namens Mark Boon
Verzonden: do 6-11-2008 17:11
Aan: computer-go
Onderwerp: Re: [computer-go] Monte-Carlo and Japanese rules



Although what Don writes is all correct, I understood the question to 
be rather different. It's not a matter of being able to determine the 
right score at the end or the right way to play, it's a matter of 
determining the right score after each playout. For performance 
reasons MC programs will cut corners there which could be taken 
advantage of when playing by Japanese rules because the after the 
playout it is prone to getting the wrong score in certain situations.

Mark

On 6-nov-08, at 12:12, Don Dailey wrote:

 On Thu, 2008-11-06 at 09:19 +0100, Ingo Althöfer wrote:
 Hello all, two questions.

 (i) Do there exist strong 9x9-go programs on Monte-Carlo base
 for Japanese rules?

 (ii) Having available only programs for Chinese rules, but playing
 in a tournament with Japanese rules, which special tricks and
 settings should be used to maximise winning chances? (This is meant
 especially in the light of MC's tendency to win games by 0.5
 points according to the rules implemented.)

 I've thought about those questions myself from time to time.  Let me
 think out loud concerning this.   I am by know means an expert in
 Japanese scoring or even GO in general, so I'm just giving some 
 thoughts
 here and a plan for building a Japanese simple bot that you can be
 free to criticize:

 It seems to me the primary difference between the two is knowing 
 when to
 stop playing and of course scoring dead groups.   The Chinese style 
 bots
 do not technically need to know about scoring.

 You can look at the combined statistics at the end of the games for a
 given point to get a sense of whether that point is still in play or
 whether it's a forgone conclusion.  You can do the same to determine
 dead groups.   I don't know how well that works in all cases, but I 
 have
 used it and it works pretty well.

 But we also want to recognize dame,  and not play to dame points early
 in the game even if it doesn't affect the final Chinese outcome.   So
 here is my idea:

   1. If ALL the stones of a particular group belong to the opponent 
 with
 high certainty,  they are dead.

   2. If there are open spaces that belong to you or the opponent with
 high certainty don't move to them.

   3. If an uncertain point is touching stones of both colors and both
 colors have high certainty for the color they belong to, it is 
 probably
 dame and you shouldn't move to them.

 example:   White has a stone on d4 that is clearly alive.
Black has a stone on f4 that is clearly alive.
An empty point on e4 is highly uncertain.
Do not play to e4 - it is probably dame.

question:  Is that a reasonably good rule or does it need some 
 work?


   4. If you have no moves other than these cases, you should pass.

 You can test this idea by playing a bot on KGS under Japanese rules.
 You may have to tweak what you consider your uncertainty margin.   
 Also,
 I'm not considering seki here but we would want to find a way to cope
 with that.

 - Don



 Ingo.
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread dave.devos
And of course black should pay 1 point for each extra handicap stone. 
http://www.britgo.org/rules/compare.html#coun
 
Dave



Van: [EMAIL PROTECTED] namens [EMAIL PROTECTED]
Verzonden: do 6-11-2008 19:28
Aan: computer-go
Onderwerp: RE: [computer-go] Monte-Carlo and Japanese rules


What if the playout uses the AGA rule of paying 1 point for a pass and 
requiring white to pass last (so the game does not end by two passes if black 
plays the second pass).
Wouldn't the score then be equivalent to the japanese score?
 
Dave



Van: [EMAIL PROTECTED] namens Mark Boon
Verzonden: do 6-11-2008 17:11
Aan: computer-go
Onderwerp: Re: [computer-go] Monte-Carlo and Japanese rules



Although what Don writes is all correct, I understood the question to 
be rather different. It's not a matter of being able to determine the 
right score at the end or the right way to play, it's a matter of 
determining the right score after each playout. For performance 
reasons MC programs will cut corners there which could be taken 
advantage of when playing by Japanese rules because the after the 
playout it is prone to getting the wrong score in certain situations.

Mark

On 6-nov-08, at 12:12, Don Dailey wrote:

 On Thu, 2008-11-06 at 09:19 +0100, Ingo Althöfer wrote:
 Hello all, two questions.

 (i) Do there exist strong 9x9-go programs on Monte-Carlo base
 for Japanese rules?

 (ii) Having available only programs for Chinese rules, but playing
 in a tournament with Japanese rules, which special tricks and
 settings should be used to maximise winning chances? (This is meant
 especially in the light of MC's tendency to win games by 0.5
 points according to the rules implemented.)

 I've thought about those questions myself from time to time.  Let me
 think out loud concerning this.  I am by no means an expert in
 Japanese scoring or even Go in general, so I'm just giving some thoughts
 here and a plan for building a simple Japanese bot that you are free to
 criticize:

 It seems to me the primary difference between the two is knowing when to
 stop playing and, of course, scoring dead groups.  The Chinese-style bots
 do not technically need to know about scoring.

 You can look at the combined statistics at the end of the games for a
 given point to get a sense of whether that point is still in play or
 whether it's a foregone conclusion.  You can do the same to determine
 dead groups.  I don't know how well that works in all cases, but I have
 used it and it works pretty well.

 But we also want to recognize dame, and not play to dame points early
 in the game even if it doesn't affect the final Chinese outcome.  So
 here is my idea:

   1. If ALL the stones of a particular group belong to the opponent with
 high certainty, they are dead.

   2. If there are open spaces that belong to you or the opponent with
 high certainty, don't move to them.

   3. If an uncertain point is touching stones of both colors, and both
 colors have high certainty for the color they belong to, it is probably
 dame and you shouldn't move there.

 example:   White has a stone on d4 that is clearly alive.
Black has a stone on f4 that is clearly alive.
An empty point on e4 is highly uncertain.
Do not play to e4 - it is probably dame.

 question:  Is that a reasonably good rule or does it need some work?


   4. If you have no moves other than these cases, you should pass.

 You can test this idea by playing a bot on KGS under Japanese rules.
 You may have to tweak what you consider your uncertainty margin.  Also,
 I'm not considering seki here, but we would want to find a way to cope
 with that.

 - Don



 Ingo.
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Jeff Nowakowski
On Thu, 2008-11-06 at 09:43 -0800, David Fotland wrote:
 Many Faces of Go's Monte Carlo engine plays strongly using Japanese rules.

So what do you do in the playouts?  Do you score with area or territory?
Does your program play optimally where different rules would result in a
different winner?

-Jeff

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Michael Williams

I'm sure he meant, "Does your program play optimally in trivial situations where 
different rules would result in a different winner?"

I'm not sure if your last answer also applies to that more specific question.


David Fotland wrote:

I score with area, and adjust for Japanese rules.  It doesn't play optimally
under any rule set :)

David


-Original Message-
From: [EMAIL PROTECTED] [mailto:computer-go-
[EMAIL PROTECTED] On Behalf Of Jeff Nowakowski
Sent: Thursday, November 06, 2008 11:57 AM
To: computer-go
Subject: RE: [computer-go] Monte-Carlo and Japanese rules

On Thu, 2008-11-06 at 09:43 -0800, David Fotland wrote:

Many Faces of Go's Monte Carlo engine plays strongly using Japanese

rules.

So what do you do in the playouts?  Do you score with area or territory?
Does your program play optimally where different rules would result in a
different winner?

-Jeff

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/




Re: [computer-go] Monte-Carlo and Japanese rules

2008-11-06 Thread Darren Cook
 (ii) Having available only programs for Chinese rules, but playing
 in a tournament with Japanese rules, which special tricks and
 settings should be used to maximise winning chances? (This is meant
 especially in the light of MC's tendency to win games by 0.5
 points according to the rules implemented.)

The difference I've noticed most in analyzing strong 9x9 games (in
Japanese rules using a bot playing in Chinese rules) is a half-point ko
at the end of the game.

In Chinese rules, once all the dame have been filled in, if one side has
no ko threats he has to pass, and the opponent fills the ko and gets an
extra point. In Japanese rules passing does not cost anything. So the
score ends up one point different. In these games what tends to happen
is the Chinese-rule-monte-carlo bot, playing for its 0.5pt win, ends up
choosing the wrong move 20 moves from the end of the game.

If that is as clear as mud let me know and I'll try to hunt up an
example game.

Darren

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte Carlo evaluation in chess

2008-07-21 Thread Gian-Carlo Pascutto

Álvaro Begué wrote:

On Sun, Jul 20, 2008 at 3:40 AM, Rémi Coulom
[EMAIL PROTECTED] wrote:
Rybka 3 has Monte-Carlo evaluation: 
http://www.chessbase.com/newsdetail.asp?newsid=4772


If I understand the release note correctly, Monte Carlo Analysis is 
something like a feature of the GUI for analyzing a position and 
giving statistics back to the user, not part of the evaluation 
function used in the search.


Correct. They play a lot of very fast games from a position and look at 
the winning rate.


It is similar to basic Monte Carlo eval in go, which implies that it 
also inherits its weaknesses. Now if they would add UCT... :)
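
A minimal sketch of that kind of flat Monte-Carlo evaluation (the playout
argument here stands in for whatever fast game player is used; this is just
an illustration, not Rybka's or anyone's actual code):

  def mc_eval(position, playout, n_games=1000):
      # Flat Monte-Carlo evaluation: play many fast games from the given
      # position and return the fraction won by the side to move.
      wins = sum(1 for _ in range(n_games) if playout(position))
      return wins / n_games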


I am sure it is not used in the actual playing engine.

--
GCP
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte Carlo evaluation in chess

2008-07-20 Thread Álvaro Begué
On Sun, Jul 20, 2008 at 3:40 AM, Rémi Coulom [EMAIL PROTECTED] wrote:
 Rybka 3 has Monte-Carlo evaluation:
 http://www.chessbase.com/newsdetail.asp?newsid=4772

If I understand the release note correctly, Monte Carlo Analysis is
something like a feature of the GUI for analyzing a position and
giving statistics back to the user, not part of the evaluation
function used in the search.

Álvaro.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte Carlo evaluation in chess

2008-07-20 Thread Álvaro Begué
On Sun, Jul 20, 2008 at 10:10 AM, Jason House
[EMAIL PROTECTED] wrote:
 On Jul 20, 2008, at 9:10 AM, Álvaro Begué [EMAIL PROTECTED] wrote:

 On Sun, Jul 20, 2008 at 3:40 AM, Rémi Coulom [EMAIL PROTECTED]
 wrote:

 Rybka 3 has Monte-Carlo evaluation:
 http://www.chessbase.com/newsdetail.asp?newsid=4772

 If I understand the release note correctly, Monte Carlo Analysis is
 something like a feature of the GUI for analyzing a position and
 giving statistics back to the user, not part of the evaluation
 function used in the search.

 It also mentions a recent strength gain of 80 ELO. I would not be surprised
 if they added MC to influence the search.

Well, I would be. I was a chess programmer for years before I started
toying with go, and I can't believe there is any chance of MC having
anything to do with the improved strength.
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] monte carlo transposition reuse (MCTR)

2007-09-14 Thread Chris Fant
Does it defeat it based on number of samples taken or time allotted per turn?

On 9/14/07, Jason House [EMAIL PROTECTED] wrote:
 I know I'm only wading in the kiddie pool of computer go with my 1-ply bots,
 but I think I may have found a useful enhancement to monte carlo.

 HouseBot supports three 1-ply search modes:
   1plyMC - Uniform sampling
   1plyShuffle - Uniform sampling with monte carlo transposition reuse
   1plyUCT - Non-uniform sampling based on the UCT algorithm (AKA UCB)
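
 For reference, here is a minimal sketch of the per-move score a 1-ply
 UCT/UCB selection maximizes (the exploration constant below is a common
 default, not necessarily HouseBot's actual setting):

   import math

   def ucb1(wins, sims, total_sims, c=1.4):
       # UCB1 value for one candidate move in a 1-ply bandit: the win rate
       # plus an exploration bonus that shrinks as the move gets sampled.
       if sims == 0:
           return float("inf")   # always try unsampled moves first
       return wins / sims + c * math.sqrt(math.log(total_sims) / sims)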

 Obviously, 1plyMC is far inferior to 1plyUCT, as everyone probably expects.
 What may surprise many is that 1plyShuffle defeats 1plyUCT nearly every
 time.  I'm basing this on self-play data from CGOS.  Currently,
 http://cgos.boardspace.net/9x9/cross/housebot-617-UCB.html
 shows 10 matches that housebot-617-UCB has played against housebot-618-shuff.
 housebot-617-UCB (1plyUCT) lost every time.

 While tricky, it should be possible to combine UCT and MCTR for an even
 stronger bot.  MCTR can be thought of as a low-bias alternative to the AMAF
 heuristic.  Rather than using all moves, MCTR takes only the top N moves,
 where N is computed based on which moves were played in the random game.
 From an open board position MCTR uses about 1/3 of the moves that AMAF
 would.  Computation of the resulting winning percentage must also be
 weighted based on the probabilities of duplicating results (roughly
 speaking, it's 1/N).
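
 A rough sketch of that 1/N weighting (these names are illustrative only,
 not HouseBot's actual code):

   from collections import defaultdict

   stats = defaultdict(lambda: {"wins": 0.0, "sims": 0.0})

   def apply_mctr_result(shuffle_list, won):
       # shuffle_list holds the N first moves that could be swapped to the
       # front of this random game without changing the final position;
       # weighting by 1/N keeps such positions from being over-counted.
       if not shuffle_list:
           return
       w = 1.0 / len(shuffle_list)
       for move in shuffle_list:
           stats[move]["sims"] += w
           if won:
               stats[move]["wins"] += w

   def win_rate(move):
       s = stats[move]
       return s["wins"] / s["sims"] if s["sims"] else 0.0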

 As a result of using MCTR, winning rates are no longer the integers one would
 expect.  Here are the estimated winning rates for all three algorithms when
 asked for a white response to black G3:

 1plyMC:   781 / 1272
 1plyShuffle:  140.15 /  231.75
 1plyUCT: 936 / 1515

 1plyShuffle is slower because of the extra information tracking, but
 the variance in its estimates should be far lower than the numbers would
 indicate.  I have yet to do the computations, but a sample size of 231.75
 here has an estimation error comparable to that of around 6000 normal MC runs
 for that position.  That is why my implementation of MCTR is defeating my
 (1-ply) implementation of UCT.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] monte carlo transposition reuse (MCTR)

2007-09-14 Thread Jason House
Time management is identical.  Based on quick profiling, the MCTR version
does about 1/6 of the simulations.  Actually, since the MCTR does extra
tracking of info, it can reuse some old simulations, so it may be more like
1/4 or 1/5 of the simulations.  It's just using the results more
efficiently.

On 9/14/07, Chris Fant [EMAIL PROTECTED] wrote:

 Does it defeat it based on number of samples taken or time allotted per
 turn?

 On 9/14/07, Jason House [EMAIL PROTECTED] wrote:
  I know I'm only wading in the kiddie pool of computer go with my 1-ply
 bots,
  but I think I may have found a useful enhancement to monte carlo.
 
  HouseBot supports three 1-ply search modes:
1plyMC - Uniform sampling
1plyShuffle - Uniform sampling with monte carlo transposition reuse
1plyUCT - Non-uniform sampling based on the UCT algorithm (AKA UCB)
 
  Obviously, 1plyMC is far inferior to 1plyUCT as everyone probably
 expects.
  What may surprise many is that 1plyShuffle defeats 1plyUCT nearly every
  time.  I'm basic this on self-play data from CGOS.  Currently,
  http://cgos.boardspace.net/9x9/cross/housebot-617-UCB.html
  shows 10 matches between housebot-617-UCB has played housebot-618-shuff.
  housebot-617-UCB (1plyUCT) lost every time.
 
  While tricky, it should be possible to combine UCT and MCTR for an even
  stronger bot.  MCTR can be thought of as a low bias alternative to the
 AMAF
  heuristic.  Rather than using all moves, MCTR takes only the top N
 moves,
  where N is computed based on which moves were played in the random game.
  From an open board position MCTR uses about 1/3 of the moves that AMAF
  would.  Computation of the resulting winning percentage must also be
  weighted based on the probabilities of duplicating results (roughly
  speaking, it's 1/N).
 
  As a result of using MCTR, winning rates are no longer integers as one
 would
  expect.  Here's the estimated winning rates for all three algorithms
 when
  asked for a white response to black G3:
 
  1plyMC:   781 / 1272
  1plyShuffle:  140.15 /  231.75
  1plyUCT: 936 / 1515
 
  1plyShuffle is slower because of the extra work information tracking,
 but
  the variance in estimates should be far lower than the numbers would
  indicate.  I have yet to do the computations, but a sample size of
 231.75
  has an estimation error of around 6000 normal MC runs for that position.
  That is why my implementation of MCTR is defeating my (1ply)
 implementation
  of UCT.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] monte carlo transposition reuse (MCTR)

2007-09-14 Thread Jason House
Sure.

I'm not sure how best to answer your question, so I'll respond with mostly
pseudo code. I can answer questions on theory, but I expect that to go back
and forth a bit since it's typically tough to explain.

Where there are grey areas in implementation choice, I have text describing
them (including basic theory).  For the real source code, the logic is in
search.d of HouseBot's (open) source.  HouseBot's home page is
http://housebot.sourceforge.net

Slowdown over basic MC occurs in two ways:
1. Maintaining a list of moves whose order can be shuffled around.
2. Applying the results to all affected leaves in the search tree
(3. In my current implementation, I may save too much info and memory
management may be another problem)

Here's some pseudo code.
Main thinking loop:
  while ( can think ){
    reset board
    reset shuffle list
    shuffleMode = true
    foreach(legal move){
      play(move)
      if ( shuffleMode ){
        shuffleMode = transposition still true   // see below
        if (shuffleMode || first move)
          insert move into shuffle list
      }
      apply result   // see below
    }
  }

Transposition still true:
  By transposition or shuffle, I mean that I can reorder the moves by each
color and arrive at the same board position without violating any rules
along the way... still representing a mostly random game.
  Obviously, any move that's illegal in the starting board position will
violate this and break the move-shuffling property.  Illegal would be
stuff like suicide plays, taking a ko, or playing on top of another stone.
If you assume all captures break the move-shuffling property, life is simple
and strange suicide plays that occur as combinations of stones won't occur.
If you're smart enough to find a way to do that efficiently (or ignore it
like I did), then you must check to ensure that the captured stones are not in
the shuffle list.  If they are, order matters and the latest move breaks the
move-shuffling property.
  My code is quite different, but when I think about it, the simplest form
would be: return not (capture on current board).  I should probably try
that one out...  Allowing captures does have the benefit that the shuffle
list gets longer, but the control logic gets more complex (simple kos,
suicide in the original position or a complex reordered combination of moves
that creates suicides, etc...).
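
A minimal sketch of that simplest form (the legality test is a stand-in for
whatever the real board code provides; the names are illustrative only):

  def keeps_shuffle_property(legal_at_root, move, captured_anything):
      # legal_at_root(move): was this move already legal in the starting
      # position (no suicide, no ko, point not occupied)?  Treating any
      # capture as breaking the reordering property is the conservative
      # choice described above.
      return legal_at_root(move) and not captured_anything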


How to apply the result:
  Essentially, everything in the shuffle list can consider this as a game
that began with that move first.  At first, I had fractional game = 1 in a
naive attempt to test out my theory, and it failed horribly.  The problem is
that positions with long shuffle lists can be reached in many ways by many
starting moves.  Because of this, sequences with long shuffle lists got
counted way too much.  I had to make it so that regardless of the shuffle
list length, a position would not be over-counted in the final average.  To
do this, you should compute the probability of the position being arrived at
by all candidate methods and then do a weighted sum: prob(this method) / [
prob(this method) + prob(alternate method 1) + prob(alternate method 2) +
...].  I made the naive assumption that prob(method a) == prob(method b).  I
think that's reasonable for uniform sampling.  That value is used in the
pseudo code below.

  foreach(random game){
    float fractional game = 1 / number of moves in shuffle list with the
        same color to move
    foreach(move in shuffle list with same color to move){
      num sims (move of interest) += fractional game
      if (random game is a win)
        num wins (move of interest) += fractional game
      // Note that variance of the estimate increases by (0.5 * fractional game)^2
    }
  }

On 9/14/07, Brian Slesinsky [EMAIL PROTECTED] wrote:

 Can you explain a bit more about how 1plyShuffle works?

 On 9/14/07, Jason House [EMAIL PROTECTED] wrote:
  Time management is identical.  Based on quick profiling, the MCTR
 version
  does about 1/6 of the simulations.  Actually, since the MCTR does extra
  tracking of info, it can reuse some old simulations, so it may be more
 like
  1/4 or 1/5 of the simulations.  It's just using the results more
  efficiently.
 
 
  On 9/14/07, Chris Fant [EMAIL PROTECTED] wrote:
   Does it defeat it based on number of samples taken or time allotted
 per
  turn?
  
   On 9/14/07, Jason House [EMAIL PROTECTED] wrote:
I know I'm only wading in the kiddie pool of computer go with my
 1-ply
  bots,
but I think I may have found a useful enhancement to monte carlo.
   
HouseBot supports three 1-ply search modes:
  1plyMC - Uniform sampling
  1plyShuffle - Uniform sampling with monte carlo transposition
 reuse
  1plyUCT - Non-uniform sampling based on the UCT algorithm (AKA
 UCB)
   
Obviously, 1plyMC is far inferior to 1plyUCT as everyone probably
  expects.
What may surprise many is that 1plyShuffle defeats 1plyUCT nearly
 every
time.  I'm basic this on self-play data from 

Re: [computer-go] monte carlo transposition reuse (MCTR)

2007-09-14 Thread Jason House
On 9/14/07, Jason House [EMAIL PROTECTED] wrote:

   // Note that variance of the estimate increases by (0.5 * fractional
 game)^2


I should say that the variance of wins increases by that amount.  The
estimate of the winning percentage will be computed as wins / sims.  The
variance for that is (variance of the wins variable) / sims^2.  In normal Monte
Carlo, each sim causes the variance of wins to increase by 0.5^2, for a total
of 0.5^2 * sims.  The final result is a variance of 0.5^2 / sims.  Of course,
final computations typically use the standard deviation (the square root of
the variance).  This gives the classical result of 0.5 / sqrt(sims)...  For
MCTR, the result is smaller for the same value of sims...
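
A small self-contained sketch of that arithmetic (illustrative only, not
HouseBot code):

  import math

  def std_error_uniform(sims):
      # Plain MC with 0/1 outcomes: each sim adds 0.5**2 to the variance of
      # the win total, so Var(win rate) = 0.25 / sims and the standard
      # deviation is 0.5 / sqrt(sims).
      return 0.5 / math.sqrt(sims)

  def std_error_weighted(weights):
      # MCTR fractional games: a game of weight w adds (0.5 * w)**2 to the
      # variance of the weighted win total; the estimate divides by the
      # total weight, so the variance divides by that total squared.
      total = sum(weights)
      var_wins = sum((0.5 * w) ** 2 for w in weights)
      return math.sqrt(var_wins) / total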
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Go simulation

2007-02-12 Thread David Doshay

On 9, Feb 2007, at 4:40 AM, Sylvain Gelly wrote:

Alain's point, that knowledge can both help narrow the search to  
good

moves and at the same time steer you away from the best move is
absolutely true in SlugGo's case.

I completely agree with that.
However, can we agree that we want a better player as a whole, and not
only better in some particular positions?  So perhaps, I think, being
far from the best move, while always playing good moves, is already
good, no?

Sylvain


Absolutely.  I notice that when SlugGo makes moves that a professional
said look quite playable, a huge mistake is going to happen very soon.

Making the best move from the point of view of a really strong player
is also NOT what I want SlugGo to do.  SlugGo has no concept of what
the implications are and what the required follow-up moves will be.

It is clearly better for SlugGo to make many 90% moves in a row than to
have it try to make any 100% moves.  There is a much better chance of
it finding the correct follow-up to slightly lower-quality moves.



Cheers,
David






___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Go simulation

2007-02-11 Thread Sylvain Gelly

Alain's point, that knowledge can both help narrow the search to good
moves and at the same time steer you away from the best move is
absolutely true in SlugGo's case.

I completely agree with that.
However, can we agree that we want a better player as a whole, and not
only better in some particular positions?  So perhaps, I think, being
far from the best move, while always playing good moves, is already
good, no?

Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Go simulation

2007-02-09 Thread alain Baeckeroot
On Thursday 8 February 2007 at 22:09, Sylvain Gelly wrote:
  It seems i was ambiguous: I was speaking of the simulation player too.
  What i meant is a random simulation player is not biased, whereas a better
  simulation player is biased by its knowledge, and thus can give wrong
  evaluation of a position.
 I think we have to start defining what the bias. For me the bias is
 the difference between the expected value of the outcomes of playouts
 by the simulation player and the real minimax value. In this
 definition the uniform random simulation player is VERY biased and
 gnugo much less.
OK, but I used bias in the common sense, to mean that the strong simulator has
preferences for some moves and doesn't consider them all equally,
or worse, doesn't consider some moves at all.  So it will miss some good points
due to its knowledge, whereas the random player will find the move.

 
  A trivial example is GNU Go: its analyze is sometimes wrong.
 Of course, if not computer go would be solved :-).
 
  Even if it is obviously much stronger than a random player, it would give
  wrong result if used as a simulation player. 
 Hum, are you sure?
I'm 100% sure of this :-)
 
 I think that GnuGo with randomisation, (and much 
 faster of course) would make a very good simulation player (much
 better than any existing simulation player).
Even with randomization, GNU Go considers only a few dozen possible moves,
and makes systematic errors.  Some time ago Rémi Coulom asked for positions
illustrating computer stupidity (2006-11-22)
http://computer-go.org/pipermail/computer-go/2006-November/007107.html
and that thread provided some nice examples where GNU Go's (wrong/misunderstood)
knowledge induces a failure in play.  One very impressive example was GNU Go 3.6
not invading where it is obviously possible to invade (Steven Clark 2006-11-27)
http://computer-go.org/pipermail/computer-go/2006-November/007184.html

 But a weaker player than GnuGo can make an even better simulation player.
yes.
 
  David Doshay experiments with SlugGo showed that
  searching very deep/wide does not improve a lot the strength of the engine,
  which is bound by the underlying weaknesses of GNU Go.
 Yes, this a similar non trivial result. I think there are more
 existing experimental and theoritical analysis of this, though.
 Perhaps such an analysis already exist for MC also, it is just that I
 don't know.
 
  Or maybe i just understood nothing of what you explained ;)
 It was not really explanations, just thoughts. I have no the
 solution, just think that it is an interesting question, and that it
 may be discussed. May be from a strong explanation of this phenomenon
 could come new ideas.
 
 I understand all these counter examples, I just think that it is more
 complicated than that.
 
 Sylvain

I fully agree.
Alain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Go Misnomer?

2007-02-08 Thread Sylvain Gelly

Hello,


Is there any known (by theory or by tests) function of how much an increase
in the strength of the simulation policy increases the strength of the
MC/UCT program as a whole?


I think that is a very interesting question.
In our work on MoGo we found that there could be a decrease in the
strength of the MC/UCT program when using a stronger simulation
policy.  That is why in MoGo it is more the sequence idea than the
strength idea.  Our best simulation policy is quite weak compared to
others we tested.
But we have further experiments, in joint work with David Silver from the
University of Alberta.  We found out that the relation "strong
simulation policy = strong MC program" is wrong at a much larger
scale.  So the intransitivity holds even with much, much stronger
simulation policies.

Of course there is the simple counterexample of a deterministic
player.  But our results hold even if we randomise (in a lot of
ways, tuning the parameters as best we can) the much
stronger policy.

I have some theory about this phenomenon in general, but it is not
polished enough for the moment.  I really think that understanding this
experimental evidence deeply, deeper than some intuition, would help
in going further.
But maybe some already did.

Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Go Misnomer?

2007-02-08 Thread Chris Fant

In our work on MoGo we found that there could be a decrease of the
strength of the MC/UCT program while using a stronger simulation
policy. It is why in MoGo it is more the sequence idea, than the
strength idea. Our best simulation policy is quite weak compared to
others we tested.


Could you elaborate on the sequence idea?  Or point me to somewhere
where you have already done so?
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Go Misnomer?

2007-02-08 Thread Sylvain Gelly

Could you elaborate on the sequence idea?  Or point me to somewhere
where you have already done so?


Sure. We already describe it in our research report, which you can find here:
http://hal.inria.fr/inria-00117266
or here if you prefer the web interface in English, but with a longer URL :-)
http://hal.inria.fr/index.php?halsid=f72d067037f3b8f161448b400d117210view_this_doc=inria-00117266version=3

Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Go simulation

2007-02-08 Thread Sylvain Gelly

One simple explanation could be that a random player shamelessly tries all
moves (very bad ones but also very nice tesuji), whereas the stronger player
is restricted by its knowledge and will always miss some kinds of moves.


Here we are not speaking about pruning in the tree, but about the
simulation player.  The tree must explore every move, to avoid missing
important ones.  However, we totally don't care whether or not all possible
games can be played by the simulation player.  What we care about is
the expectation of the wins in self-play.
If the simulation player sometimes plays meaningful sequences but with
a very small probability, then they have very little influence on the
expectation.

It seems to me that the explanation may be more complicated.

Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Go simulation

2007-02-08 Thread Sylvain Gelly

It seems i was ambiguous: I was speaking of the simulation player too.
What i meant is a random simulation player is not biased, whereas a better
simulation player is biased by its knowledge, and thus can give wrong
evaluation of a position.

I think we have to start by defining what the bias is.  For me the bias is
the difference between the expected value of the outcomes of playouts
by the simulation player and the real minimax value.  In this
definition the uniform random simulation player is VERY biased and
GnuGo much less so.


A trivial example is GNU Go: its analysis is sometimes wrong.

Of course; if it weren't, computer go would be solved :-).


Even if it is obviously much stronger than a random player, it would give the
wrong result if used as a simulation player.

Hum, are you sure?  I think that GnuGo with randomisation (and much
faster, of course) would make a very good simulation player (much
better than any existing simulation player).  But a weaker player than
GnuGo can make an even better simulation player.


David Doshay experiments with SlugGo showed that
searching very deep/wide does not improve a lot the strength of the engine,
which is bound by the underlying weaknesses of GNU Go.

Yes, this is a similar non-trivial result.  I think there are more
existing experimental and theoretical analyses of this, though.
Perhaps such an analysis already exists for MC as well; it is just that I
don't know of it.


Or maybe i just understood nothing of what you explained ;)

Those were not really explanations, just thoughts.  I don't have the
solution; I just think that it is an interesting question, and that it
may be discussed.  Maybe from a strong explanation of this phenomenon
new ideas could come.

I understand all these counterexamples; I just think that it is more
complicated than that.

Sylvain
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

