Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread Kensuke Matsuzaki
I used stochastic sampling at internal nodes because of this:
> During the forward simulation phase of SEARCH, the action at each node x is 
> selected by sampling a ∼ π¯(·|x).
> As a result, the full imaginary trajectory is generated consistently 
> according to policy π¯.

> In this section, we establish our main claim namely that AlphaZero’s action 
> selection criteria can be interpreted as approximating the solution to a 
> regularized policy-optimization objective.

I think they are saying that UCT and PUCT action selection approximate
sampling directly from π¯, but I haven't understood Section 3 well.
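
To make that concrete, here is a rough Python sketch of how I read Section 3 (my own
reading, not code from the paper and not my actual Leela Zero branch): π¯ has a closed
form in terms of (q, π_θ, visit counts), and SEARCH samples from it at each internal
node instead of taking a deterministic argmax. The constant c, the λ_N formula, and the
bisection are my own choices and may not match the paper exactly.

import numpy as np

def pi_bar(q, prior, visits, c=1.25):
    # q: value estimates per child move, prior: pi_theta(.|x), visits: N(x,a); numpy arrays
    prior = np.maximum(prior, 1e-12)                  # avoid zero-prior edge cases in this sketch
    n_total = visits.sum()
    if n_total == 0:
        return prior / prior.sum()
    lam = c * np.sqrt(n_total) / (n_total + len(q))   # regularization strength lambda_N
    # pi_bar(a) is proportional to lam * prior(a) / (alpha - q(a)); find alpha by bisection
    lo, hi = (q + lam * prior).max(), q.max() + lam
    for _ in range(50):
        alpha = 0.5 * (lo + hi)
        lo, hi = (alpha, hi) if (lam * prior / (alpha - q)).sum() > 1.0 else (lo, alpha)
    p = lam * prior / (0.5 * (lo + hi) - q)
    return p / p.sum()

def search_select(q, prior, visits, rng=None):
    # SEARCH: sample a ~ pi_bar(.|x) instead of the deterministic rule of Eq. 1
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(q), p=pi_bar(q, prior, visits)))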

Mon, Jul 20, 2020 at 2:51 Daniel :
>
> @Kensuke I suppose all the proposed algorithms ACT, SEARCH and LEARN are 
> meant to be used during training, no?
> I think I understand ACT and LEARN but I am not sure about SEARCH for which 
> they say this:
>
> > During search, we propose to stochastically sample actions according to π¯ 
> > instead of the deterministic action selection rule of Eq. 1.
>
> This sounds much like the random selection done at the root with temperature, 
> but this time applied at internal nodes.
> Does it mean the pUCT formula is not used? Why does the selection have to be 
> stochastic now?
> On selection, you compute π_bar every time from (q, π_theta, n_visits) so I 
> suppose π_bar has everything it needs to balance exploration and exploitation.
>
>
> On Sun, Jul 19, 2020 at 8:10 AM David Wu  wrote:
>>
>> I imagine that at low visits at least, "ACT" behaves similarly to Leela 
>> Zero's "LCB" move selection, which also has the effect of sometimes 
>> selecting a move that is not the max-visits move, if its value estimate has 
>> recently been found to be sufficiently larger to balance the fact that it is 
>> lower prior and lower visits (at least, typically, this is why the move 
>> wouldn't have been the max visits move in the first place). It also scales 
>> in an interesting way with empirical observed playout-by-playout variance of 
>> moves, but I think by far the important part is that it can use sufficiently 
>> confident high value to override max-visits.
>>
>> The gain from "LCB" in match play I recall is on the very very rough order 
>> of 100 Elo, although it could be less or more depending on match conditions 
>> and what neural net is used and other things. So for LZ at least, "ACT"-like 
>> behavior at low visits is not new.
>>
>>
>> On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki  
>> wrote:
>>>
>>> Hi,
>>>
>>> I couldn't improve leela zero's strength by implementing SEARCH and ACT.
>>> https://github.com/zakki/leela-zero/commits/regularized_policy
>>>
>>> Fri, Jul 17, 2020 at 2:47 Rémi Coulom :
>>> >
>>> > This looks very interesting.
>>> >
>>> > From a quick glance, it seems the improvement is mainly when the number 
>>> > of playouts is small. Also they don't test on the game of Go. Has anybody 
>>> > tried it?
>>> >
>>> > I will take a deeper look later.
>>> >
>>> > On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:
>>> >>
>>> >> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>>> >>
>>> >>
>>> >> --
>>> >> Honesty is a very expensive gift. So, don't expect it from cheap people 
>>> >> - Warren Buffett
>>> >> http://tayek.com/
>>> >>
>>> >> ___
>>> >> Computer-go mailing list
>>> >> Computer-go@computer-go.org
>>> >> http://computer-go.org/mailman/listinfo/computer-go
>>> >
>>> > ___
>>> > Computer-go mailing list
>>> > Computer-go@computer-go.org
>>> > http://computer-go.org/mailman/listinfo/computer-go
>>>
>>>
>>>
>>> --
>>> Kensuke Matsuzaki
>>> ___
>>> Computer-go mailing list
>>> Computer-go@computer-go.org
>>> http://computer-go.org/mailman/listinfo/computer-go
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go



-- 
Kensuke Matsuzaki
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread Daniel
@Kensuke I suppose all the proposed algorithms ACT, SEARCH and LEARN are
meant to be used during training, no?
I think I understand ACT and LEARN but I am not sure about SEARCH for which
they say this:

> During search, we propose to stochastically sample actions according to
π¯ instead of the deterministic action selection rule of Eq. 1.

This sounds much like the random selection done at the root with
temperature, but this time applied at internal nodes.
Does it mean the pUCT formula is not used? Why does the selection have to
be stochastic now?
On selection, you compute π_bar every time from (q, π_theta, n_visits) so I
suppose π_bar has everything it needs to balance exploration and
exploitation.
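
For reference, the deterministic rule of Eq. 1 that this would replace is, as far as I
understand it, the usual AlphaZero-style pUCT argmax; roughly something like the
following (the exploration constant here is only illustrative):

import numpy as np

def puct_select(q, prior, visits, c_puct=1.25):
    # Eq. 1 style: argmax_a Q(x,a) + c * pi_theta(a|x) * sqrt(sum_b N(x,b)) / (1 + N(x,a))
    u = c_puct * prior * np.sqrt(visits.sum()) / (1.0 + visits)
    return int(np.argmax(q + u))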


On Sun, Jul 19, 2020 at 8:10 AM David Wu  wrote:

> I imagine that at low visits at least, "ACT" behaves similarly to Leela
> Zero's "LCB" move selection, which also has the effect of sometimes
> selecting a move that is not the max-visits move, if its value estimate has
> recently been found to be sufficiently larger to balance the fact that it
> is lower prior and lower visits (at least, typically, this is why the move
> wouldn't have been the max visits move in the first place). It also scales
> in an interesting way with empirical observed playout-by-playout variance
> of moves, but I think by far the important part is that it can use
> sufficiently confident high value to override max-visits.
>
> The gain from "LCB" in match play I recall is on the very very rough order
> of 100 Elo, although it could be less or more depending on match conditions
> and what neural net is used and other things. So for LZ at least,
> "ACT"-like behavior at low visits is not new.
>
>
> On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki 
> wrote:
>
>> Hi,
>>
>> I couldn't improve leela zero's strength by implementing SEARCH and ACT.
>> https://github.com/zakki/leela-zero/commits/regularized_policy
>>
>> Fri, Jul 17, 2020 at 2:47 Rémi Coulom :
>> >
>> > This looks very interesting.
>> >
>> > From a quick glance, it seems the improvement is mainly when the number
>> of playouts is small. Also they don't test on the game of Go. Has anybody
>> tried it?
>> >
>> > I will take a deeper look later.
>> >
>> > On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:
>> >>
>> >>
>> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>> >>
>> >>
>> >> --
>> >> Honesty is a very expensive gift. So, don't expect it from cheap
>> people - Warren Buffett
>> >> http://tayek.com/
>> >>
>> >> ___
>> >> Computer-go mailing list
>> >> Computer-go@computer-go.org
>> >> http://computer-go.org/mailman/listinfo/computer-go
>> >
>> > ___
>> > Computer-go mailing list
>> > Computer-go@computer-go.org
>> > http://computer-go.org/mailman/listinfo/computer-go
>>
>>
>>
>> --
>> Kensuke Matsuzaki
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread David Wu
I imagine that at low visits at least, "ACT" behaves similarly to Leela
Zero's "LCB" move selection, which also has the effect of sometimes
selecting a move that is not the max-visits move, if its value estimate has
recently been found to be sufficiently larger to balance the fact that it
is lower prior and lower visits (at least, typically, this is why the move
wouldn't have been the max-visits move in the first place). It also scales
in an interesting way with the empirically observed playout-by-playout variance
of moves, but I think by far the most important part is that it can use a
sufficiently confident high value to override max-visits.

The gain from "LCB" in match play, as I recall, is on the very rough order
of 100 Elo, although it could be less or more depending on match conditions,
what neural net is used, and other things. So for LZ at least,
"ACT"-like behavior at low visits is not new.


On Sun, Jul 19, 2020 at 5:39 AM Kensuke Matsuzaki 
wrote:

> Hi,
>
> I couldn't improve leela zero's strength by implementing SEARCH and ACT.
> https://github.com/zakki/leela-zero/commits/regularized_policy
>
> Fri, Jul 17, 2020 at 2:47 Rémi Coulom :
> >
> > This looks very interesting.
> >
> > From a quick glance, it seems the improvement is mainly when the number
> of playouts is small. Also they don't test on the game of Go. Has anybody
> tried it?
> >
> > I will take a deeper look later.
> >
> > On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:
> >>
> >>
> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
> >>
> >>
> >> --
> >> Honesty is a very expensive gift. So, don't expect it from cheap people
> - Warren Buffett
> >> http://tayek.com/
> >>
> >> ___
> >> Computer-go mailing list
> >> Computer-go@computer-go.org
> >> http://computer-go.org/mailman/listinfo/computer-go
> >
> > ___
> > Computer-go mailing list
> > Computer-go@computer-go.org
> > http://computer-go.org/mailman/listinfo/computer-go
>
>
>
> --
> Kensuke Matsuzaki
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-19 Thread Kensuke Matsuzaki
Hi,

I couldn't improve Leela Zero's strength by implementing SEARCH and ACT.
https://github.com/zakki/leela-zero/commits/regularized_policy

Fri, Jul 17, 2020 at 2:47 Rémi Coulom :
>
> This looks very interesting.
>
> From a quick glance, it seems the improvement is mainly when the number of 
> playouts is small. Also they don't test on the game of Go. Has anybody tried 
> it?
>
> I will take a deeper look later.
>
> On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:
>>
>> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>>
>>
>> --
>> Honesty is a very expensive gift. So, don't expect it from cheap people - 
>> Warren Buffett
>> http://tayek.com/
>>
>> ___
>> Computer-go mailing list
>> Computer-go@computer-go.org
>> http://computer-go.org/mailman/listinfo/computer-go
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go



-- 
Kensuke Matsuzaki
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-16 Thread Rémi Coulom
This looks very interesting.

From a quick glance, it seems the improvement is mainly when the number of
playouts is small. Also, they don't test on the game of Go. Has anybody
tried it?

I will take a deeper look later.

On Thu, Jul 16, 2020 at 9:49 AM Ray Tayek  wrote:

>
> https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/
>
>
> --
> Honesty is a very expensive gift. So, don't expect it from cheap people -
> Warren Buffett
> http://tayek.com/
>
> ___
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


[Computer-go] Monte-Carlo Tree Search as Regularized Policy Optimization

2020-07-16 Thread Ray Tayek

https://old.reddit.com/r/MachineLearning/comments/hrzooh/r_montecarlo_tree_search_as_regularized_policy/


--
Honesty is a very expensive gift. So, don't expect it from cheap people - 
Warren Buffett
http://tayek.com/

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go


Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread Urban Hafner
Exactly, you don't check superko in the playouts, and you generally stop
playouts at 3 * boardsize * boardsize moves. But once you start adding more logic
to the playouts (like not playing into single-point eyes), they tend not to hit
this limit.
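
In sketch form, that playout loop looks something like this (the board methods are
placeholders, not any particular engine's API; the move cap and the eye check are the
only points being illustrated):

import random

def playout(board):
    max_moves = 3 * board.size * board.size            # hard cap; rarely hit once eyes are excluded
    for _ in range(max_moves):
        candidates = [m for m in board.legal_moves()   # note: no superko check in here
                      if not board.is_single_point_eye(m, board.to_move)]
        if not candidates:
            break                                      # treat as both sides passing
        board.play(random.choice(candidates))
    return board.score()                               # e.g. Tromp-Taylor count of the final position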

Sent from my iPhone

 On 28.03.2015 at 11:53, hughperkins2 hughperki...@gmail.com wrote:
 
 Oh wait, the superko check is not that cheap, but it is so rare that you can probably 
 ignore it in playouts, and just check before submitting a move to the server. 
 If it's superko, then just pass, perhaps. 
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread folkert
Well, you could check whether a point is empty or not, but you could also
check whether you're putting stones into a dead area, or whether it would be
suicide, etc.

 On Sat, Mar 28, 2015 at 10:15:32AM +0100, Álvaro Begué wrote:
 I am not sure I understand the question. The only thing that is typically
 not checked in the playouts is superko. What other validity checks are
 you performing?
 
 Álvaro.
 
 
 
 On Sat, Mar 28, 2015 at 9:54 AM, holger krekel hol...@merlinux.eu wrote:
 
  On Sat, Mar 28, 2015 at 08:51 +0100, folkert wrote:
   Hi,
  
   For a monte carlo search, are only valid moves performed? Or does it
   work from beginning to the end of a playout using whatever free position
   is available on the board?
 
  I am also interested in the question.
 
   Because I read here that people can do 25k playouts per second while my
   program can only do ~ 20 per second when doing full validity checks on
   all steps.
 
  Do you have a reference, some context for the 25K playouts?
 
  holger
 
 
  
   Folkert van Heusden
  
   --
   --
   Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
   ___
   Computer-go mailing list
   Computer-go@computer-go.org
   http://computer-go.org/mailman/listinfo/computer-go
 
  --
  about me:http://holgerkrekel.net/about-me/
  contracting: http://merlinux.eu
  ___
  Computer-go mailing list
  Computer-go@computer-go.org
  http://computer-go.org/mailman/listinfo/computer-go
 

 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go



Folkert van Heusden

-- 
MultiTail is a versatile tool for watching logfiles and output of
commands. Filtering, coloring, merging, diff-view, etc.
http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread Urban Hafner
20 playouts per second is a bit slow, Folkert. But I've been there, too! Now my bot is
at around 1000 playouts per second on 19x19 and 5000 on 9x9 (but I also check to not play into
single-point eyes). That's still quite slow, and further optimisation will
be required. But OTOH, on a 4-core machine without UCT (only the AMAF
heuristic) I get around 1250 Elo on CGOS 13x13 (
http://cgos.boardspace.net/13x13/cross/Imrscl-017-A-N.html).

Urban

On Sat, Mar 28, 2015 at 11:30 AM, Petr Baudis pa...@ucw.cz wrote:

 On Sat, Mar 28, 2015 at 10:15:32AM +0100, Álvaro Begué wrote:
  I am not sure I understand the question. The only thing that is typically
  not checked in the playouts is superko. What other validity checks are
  you performing?

 There are a few:

 (i) No single-stone suicide.  I can't think of how to play out without
 that; you wouldn't know when to stop playing.

 (ii) No multi-stone suicide.  Sometimes in the fastest implementations
 this rule is relaxed (i.e. you proclaim that you are using Tromp-Taylor
 or New Zealand rules).  But in real-world engines, you need to check
 things like self-atari or 2-liberty semeai, and then a suicide check
 naturally falls out from that too.

 (iii) No ko.  This is a pretty cheap check, and without it, again, your
 playouts may end up looping.
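
As a concrete sketch (hypothetical board helpers, not from any engine): only suicide
and the simple ko need testing inside the playout, and full positional superko can be
deferred to a check at the root against the Zobrist hashes of the real game history.

def playout_legal(board, move, color):
    # cheap per-move check used inside playouts: no suicide, no simple ko
    if move == board.simple_ko_point:                # immediate recapture forbidden
        return False
    return not board.would_be_suicide(move, color)   # covers single- and multi-stone suicide

def root_legal(board, move, color, history_hashes):
    # full check before actually playing a move in the real game: also reject superko
    if not playout_legal(board, move, color):
        return False
    return board.hash_after(move, color) not in history_hashes   # set of Zobrist hashes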

 BTW, the fastest playouts are probably done by libego and that's open
 source.  But as I said in the past - consider not optimizing first;
 write some reasonable playout heuristics first and *then* profile your
 code.

 Petr Baudis
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go




-- 
Blog: http://bettong.net/
Twitter: https://twitter.com/ujh
Homepage: http://www.urbanhafner.com/
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread hughperkins2
20 playouts per second is pretty slow. Are you sure you're not just looping 
around endlessly for 1 moves or something? 

Playing in a dead area is legal, as is filling your own eyes, etc. Only suicide 
and ko are illegal. Even superko is a pretty cheap check. 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread folkert
  Because I read here that people can do 25k playouts per second while my
  program can only do ~ 20 per second when doing full validity checks on
  all steps.
 Do you have a reference, some context for the 25K playouts?

Look for the topic "What's a good playout speed?" in the mailing list
archives.

http://computer-go.org/pipermail/computer-go/


Folkert van Heusden

-- 
Curious about the inner workings of your car? Then check O2OO: it'll
tell you all that there is to know about your car's engine!
http://www.vanheusden.com/O2OO/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread holger krekel
On Sat, Mar 28, 2015 at 08:51 +0100, folkert wrote:
 Hi,
 
 For a monte carlo search, are only valid moves performed? Or does it
 work from beginning to the end of a playout using whatever free position
 is available on the board?
 
I am also interested in the question.

 Because I read here that people can do 25k playouts per second while my
 program can only do ~ 20 per second when doing full validity checks on
 all steps.

Do you have a reference, some context for the 25K playouts?

holger


 
 Folkert van Heusden
 
 -- 
 --
 Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

-- 
about me:http://holgerkrekel.net/about-me/
contracting: http://merlinux.eu
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread Álvaro Begué
I am not sure I understand the question. The only thing that is typically
not checked in the playouts is superko. What other validity checks are
you performing?

Álvaro.



On Sat, Mar 28, 2015 at 9:54 AM, holger krekel hol...@merlinux.eu wrote:

 On Sat, Mar 28, 2015 at 08:51 +0100, folkert wrote:
  Hi,
 
  For a monte carlo search, are only valid moves performed? Or does it
  work from beginning to the end of a playout using whatever free position
  is available on the board?

 I am also interested in the question.

  Because I read here that people can do 25k playouts per second while my
  program can only do ~ 20 per second when doing full validity checks on
  all steps.

 Do you have a reference, some context for the 25K playouts?

 holger


 
  Folkert van Heusden
 
  --
  --
  Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
  ___
  Computer-go mailing list
  Computer-go@computer-go.org
  http://computer-go.org/mailman/listinfo/computer-go

 --
 about me:http://holgerkrekel.net/about-me/
 contracting: http://merlinux.eu
 ___
 Computer-go mailing list
 Computer-go@computer-go.org
 http://computer-go.org/mailman/listinfo/computer-go

___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

[Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread folkert
Hi,

For a Monte Carlo search, are only valid moves performed? Or does it
work from the beginning to the end of a playout using whatever free positions
are available on the board?

Because I read here that people can do 25k playouts per second while my
program can only do ~ 20 per second when doing full validity checks on
all steps.


Folkert van Heusden

-- 
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] monte carlo search; all valid moves?

2015-03-28 Thread hughperkins2
Oh wait, the superko check is not that cheap, but it is so rare that you can probably 
ignore it in playouts, and just check before submitting a move to the server. 
If it's superko, then just pass, perhaps. 
___
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [computer-go] monte carlo

2009-10-25 Thread Folkert van Heusden
What method are you guys using for the Monte Carlo search? What do you
do?
I pick a move at random, then for the resulting board that comes out of
that I pick another move, and so on.
Then, after that, I evaluate the number of stones etc.
What do you guys look at to select a move using Monte Carlo?

On Sat, Oct 17, 2009 at 05:02:33PM +0200, Folkert van Heusden wrote:
 People,
 
 I'm trying to implement a Monte Carlo algorithm in my go program. Now
 the results are dramatic: the Elo rating of my go program drops from
 1150 to below 700. I tried:
  - evaluate the number of captured stones
  - evaluate strategic elements (without MC this strategic eval gives
 that 1150 Elo).
 Currently my program can evaluate 500 scenes per second and I let it
 think for 5 seconds.
 What could be the cause of these dramatic results? Wrong evaluation? Not
 enough nodes processed?


Folkert van Heusden

-- 
www.vanheusden.com/multitail - multitail is tail on steroids. multiple
   windows, filtering, coloring, anything you can think of
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] monte carlo

2009-10-25 Thread Petr Baudis
On Sun, Oct 25, 2009 at 06:52:56PM +0100, Folkert van Heusden wrote:
 What method are you guys using for the monte carlo search? What do you
 do?
 I pick at random a move, then for the resulting board that comes out of
 that pick another move and so on.
 Then, after that I evaulate the number of stones etc.

Do you mean you count the score? Make sure you actually count the stones
right, and (quite importantly) that you are picking moves really at
random, not skewed in any way.

 What do you guys look at to select a move using monte carlo?

People use heuristics to prefer captures, escaping atari, playing local
moves that match certain 3x3 patterns, etc. - there are plenty of papers
covering this, and I recommend you look at them. It's fairly important to
have all the heuristics somewhat randomized (depending on other parts of
my engine, I've found any value between 60% and 90% probability of
applying each heuristic to be optimal).
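
In sketch form, the randomized-heuristics idea looks something like this
(board.legal_moves() and the heuristic functions are placeholders; 0.8 is just a value
inside the 60%-90% range mentioned above):

import random

def choose_playout_move(board, heuristics, p_apply=0.8):
    # heuristics: functions board -> list of candidate moves (captures, atari escapes,
    # 3x3 pattern matches, ...), tried in order but each only with probability p_apply
    for heuristic in heuristics:
        if random.random() < p_apply:
            moves = heuristic(board)
            if moves:
                return random.choice(moves)
    return random.choice(board.legal_moves())          # fall back to a uniformly random legal move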

-- 
Petr Pasky Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] monte carlo

2009-10-25 Thread Folkert van Heusden
  What method are you guys using for the monte carlo search? What do you
  do?
  I pick at random a move, then for the resulting board that comes out of
  that pick another move and so on.
  Then, after that I evaulate the number of stones etc.
 
 Do you mean you count the score? Make sure you actually count the stone
 right, and (quite important) that you are picking out moves really at
 random, not skewed in any way.

Yes, the score indeed. Well, only the number of stones removed from the
board. I have not implemented an algorithm to count the areas belonging to a
color yet.

  What do you guys look at to select a move using monte carlo?
 People use heuristics to prefer captures, escaping atari, play local
 moves that match certain 3x3 patterns etc. - there is plenty of papers
 covering this, I recommend you look at them. It's fairly important to
 have all the heuristics somewhat randomized (depending on other parts of
 my engine, I've found any value between 60% to 90% probability of
 applying each heuristic optimal).

Yeah, I rated most of them equally. Some are weighted in depending on
applicability.

May I ask: how many board-settings (one move + all evaluation) do your
programs calculate per second?


Folkert van Heusden

-- 
Ever wonder what is out there? Any alien races? Then please support
the s...@home project: setiathome.ssl.berkeley.edu
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] monte carlo

2009-10-25 Thread Petr Baudis
On Sun, Oct 25, 2009 at 10:16:58PM +0100, Folkert van Heusden wrote:
   What method are you guys using for the monte carlo search? What do you
   do?
   I pick at random a move, then for the resulting board that comes out of
   that pick another move and so on.
   Then, after that I evaulate the number of stones etc.
  
  Do you mean you count the score? Make sure you actually count the stone
  right, and (quite important) that you are picking out moves really at
  random, not skewed in any way.
 
 Yes, the score indeed. Well, only the number of stones removed from the
 board. I did not implement an algorithm to count areas belonging to a
 color yet.

That would seem to have only quite limited correlation with winning the
game.

 May I ask: how many board-settings (one move + all evaluation) do your
 programs calculate per second?

Depends on the hardware. ;-) We usually count in number of complete
playouts per second.

On 9x9, libego can play, I think, about 100k playouts per second on a decent
CPU, but it is very light. Pachi can do 35k light playouts per second and
15k heavy playouts per second on the same CPU (single-threaded). Both
Pachi and libego are written in a compiled language; once your speed is
in the right order of magnitude, I've found playout move selection
improvements much more profitable than further optimizing for speed.

-- 
Petr Pasky Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] monte carlo

2009-10-25 Thread David Fotland
Many Faces of Go is currently much slower.  On 9x9 it does about 18k
playouts per second on two CPU cores, so about 9k per thread.  The average
9x9 game is about 92 moves, so that is about 1.6 million board-settings
per second.

The playouts are far from random.  Many moves are local responses to the
last move, and when there is no forced local response, the global moves are
not fully random either. 

My original light code did about 55K playouts per second on one CPU core,
but was far weaker.

David

 
  May I ask: how many board-settings (one move + all evaluation) do your
  programs calculate per second?
 
 Depends on the hardware. ;-) We usually count in number of complete
 playouts per second.
 
 On 9x9, libego can play i think about 100k playouts per second on decent
 CPU, but is very light. Pachi can do 35k light playouts per second and
 15k heavy playouts per second on the same CPU (single-threaded). Both
 Pachi and libego are written in a compiled language; when your speed is
 in the right order of magnitude, I've found playout move selection
 improvements much more profitable than further optimizing for speed.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Isaac Deutsch
I admit I had trouble understanding the details of the paper. What I  
think is the biggest problem for applying this to bigger (up to 19x19)  
games is that you somehow need access to the true value of a move,  
i.e. whether it's a win or a loss. On the 5x5 board they used, this might be  
approximated pretty well, but there's no chance of doing so on 19x19.



Am 13.08.2009 um 05:14 schrieb Michael Williams:

After about the 5th reading, I'm concluding that this is an  
excellent paper.  Is anyone (besides the authors) doing research  
based on this?  There is a lot to do.




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Michael Williams
In future papers they should avoid using a strong authority like Fuego for the training and instead force it to learn from a naive uniform random playout policy 
(with 100x or 1000x more playouts) and then build on that with an iterative approach (as was suggested in the paper).


I also had another thought.  Since they are training the policy to maximize the balance and not the winrate, wouldn't you be able to extract more information 
from each trial by using the score instead of the game result?  The normal pitfalls to doing so do not apply here.




Isaac Deutsch wrote:
I admit I had trouble understanding the details of the paper. What I 
think is the biggest problem for applying this to bigger (up to 19x19) 
games is that you somehow need access to the true value of a move, 
i.e. it's a win or a loss. On the 5x5 board they used, this might be 
approximated pretty well, but there's no chance on 19x19 to do so.



Am 13.08.2009 um 05:14 schrieb Michael Williams:

After about the 5th reading, I'm concluding that this is an excellent 
paper.  Is anyone (besides the authors) doing research based on this?  
There is a lot to do.




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Jason House
A web search turned up a 2 page and an 8 page version. I read the  
short one. I agree that it's promising work that requires some follow- 
up research.


Now that you've read it so many times, what excites you about it? Can  
you envision a way to scale it to larger patterns and boards on modern  
hardware?


Sent from my iPhone

On Aug 12, 2009, at 11:14 PM, Michael Williams michaelwilliam...@gmail.com 
 wrote:


After about the 5th reading, I'm concluding that this is an  
excellent paper.  Is anyone (besides the authors) doing research  
based on this?  There is a lot to do.



David Silver wrote:

Hi everyone,
Please find attached my ICML paper with Gerry Tesauro on  
automatically learning a simulation policy for Monte-Carlo Go. Our  
preliminary results show a 200+ Elo improvement over previous  
approaches, although our experiments were restricted to simple  
Monte-Carlo search with no tree on small boards.

Abstract
In this paper we introduce the first algorithms for efficiently  
learning a simulation policy for Monte-Carlo search. Our main idea  
is to optimise the balance of a simulation policy, so that an  
accurate spread of simulation outcomes is maintained, rather than  
optimising the direct strength of the simulation policy. We develop  
two algorithms for balancing a simulation policy by gradient  
descent. The first algorithm optimises the balance of complete  
simulations, using a policy gradient algorithm; whereas the second  
algorithm optimises the balance over every two steps of simulation.  
We compare our algorithms to reinforcement learning and supervised  
learning algorithms for maximising the strength of the simulation  
policy. We test each algorithm in the domain of 5x5 and 6x6  
Computer Go, using a softmax policy that is parameterised by  
weights for a hundred simple patterns. When used in a simple Monte- 
Carlo search, the policies learnt by simulation balancing achieved  
significantly better performance, with half the mean squared error  
of a uniform random policy, and equal overall performance to a  
sophisticated Go engine.

-Dave
--- 
-

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Brian Sheppard
Is anyone (besides the authors) doing research based on this?

Well, Pebbles does apply reinforcement learning (RL) to improve
its playout policy. But not in the manner described in that paper.
There are practical obstacles to directly applying that paper.

To directly apply that paper, you must have a CrazyStone
playout design, wherein you maintain 3x3 neighborhoods around
each point. Pebbles has a Mogo playout design, where you check
for patterns only around the last move (or two).

To directly pursue this would require a rewrite. Right now, there
is no published evidence that the Mogo design is inferior. In fact,
two of the world's best programs use the Mogo design (Mogo, Fuego).
So I am unwilling to make that commitment.

I would also have to research how to scale that paper to
realistic conditions, including

   1) 9x9 boards at a minimum.
   2) Self-play, instead of assuming an oracle.
   3) Playout after a UCT/RAVE search rather than pure MC.
   4) Pattern sets that have ~1 million parameters.
   5) Pattern sets that have more general geometry than 3x3, perhaps.

My guess is that all of these research problems are solvable. But
that's a lot of work to do. If I had to face this task list, I
would put it off until later, because there is always an easier
way to make progress.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Olivier Teytaud
 . Pebbles has a Mogo playout design, where you check
 for patterns only around the last move (or two).


In MoGo, it's not only around the last move (at least with some probability
and when there are empty spaces on the board); this is the "fill board"
modification.

(This provides a big improvement in 19x19 with large numbers of simulations,
see http://www.lri.fr/~rimmel/publi/EK_explo.pdf , Fig 3 page 8 - not only a
quantitative improvement, but also, according to players, a qualitative
improvement in the way MoGo plays.)

Best regards,
Olivier
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Brian Sheppard
 . Pebbles has a Mogo playout design, where you check
 for patterns only around the last move (or two).


In MoGo, it's not only around the last move (at least with some probability
and when there are empty spaces in the board); this is the fill board
modification.

Just to clarify: I was not saying that Mogo's policy consisted
*solely* of looking for patterns around the last move. Merely that
it does not look for patterns around *every* point, which other
playout policies (e.g., CrazyStone, if I understand Remi's papers
correctly) appear to do. The RL paper seems to require that
playout design.

If you want to prioritize patterns around every point then you
need a more sophisticated board representation than Pebbles uses.

BTW, FillBoard seems to help Pebbles, too. A few percent better
on 9x9 games. No testing on larger boards. YMMV, and like everything
about computer go: all predictions are guaranteed to be wrong,
or your money back.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-13 Thread Olivier Teytaud


 Just to clarify: I was not saying that Mogo's policy consisted
 *solely* of looking for patterns around the last move. Merely that
 it does not look for patterns around *every* point, which other
 playout policies (e.g., CrazyStone, if I understand Remi's papers
 correctly) appear to do. The RL paper seems to require that
 playout design.


Fine!



 BTW, FillBoard seems to help Pebbles, too. A few percent better
 on 9x9 games. No testing on larger boards. YMMV, and like everything
 about computer go: all predictions are guaranteed to be wrong,
 or your money back.


For us the improvement is essential in 19x19 - it would say something for the
generality of fill board if it also helps for you :-) The loss of diversity due to
patterns is really clear in some situations, so the problem solved by fill board is
understandable, and I believe it should work for you too - but, as you say, all
predictions in computer go are almost guaranteed to be wrong :-)
Olivier
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-08-12 Thread Michael Williams

After about the 5th reading, I'm concluding that this is an excellent paper.  
Is anyone (besides the authors) doing research based on this?  There is a lot 
to do.


David Silver wrote:

Hi everyone,

Please find attached my ICML paper with Gerry Tesauro on automatically 
learning a simulation policy for Monte-Carlo Go. Our preliminary results 
show a 200+ Elo improvement over previous approaches, although our 
experiments were restricted to simple Monte-Carlo search with no tree on 
small boards.





Abstract

In this paper we introduce the first algorithms for efficiently learning 
a simulation policy for Monte-Carlo search. Our main idea is to optimise 
the balance of a simulation policy, so that an accurate spread of 
simulation outcomes is maintained, rather than optimising the direct 
strength of the simulation policy. We develop two algorithms for 
balancing a simulation policy by gradient descent. The first algorithm 
optimises the balance of complete simulations, using a policy gradient 
algorithm; whereas the second algorithm optimises the balance over every 
two steps of simulation. We compare our algorithms to reinforcement 
learning and supervised learning algorithms for maximising the strength 
of the simulation policy. We test each algorithm in the domain of 5x5 
and 6x6 Computer Go, using a softmax policy that is parameterised by 
weights for a hundred simple patterns. When used in a simple Monte-Carlo 
search, the policies learnt by simulation balancing achieved 
significantly better performance, with half the mean squared error of a 
uniform random policy, and equal overall performance to a sophisticated 
Go engine.


-Dave




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Monte-Carlo Simulation Balancing

2009-06-22 Thread Isaac Deutsch
Has anyone tried this algorithm improvement on bigger boards and can give us
a result?

Link to original message:
http://computer-go.org/pipermail/computer-go/2009-April/018159.html

Thanks,
ibd

  So maybe I could get 600 more Elo points
  with your method. And even more on 19x19.
  I noticed that, in general, changes in the playout policy have a much
  bigger impact on larger boards than on smaller boards.
 
 As to whether we can extrapolate, I hope so :-)
 I share the same feeling that improving the simulation policy will be  
 more impactful on bigger boards with longer simulations.
 On the other hand I've been surprised by many things in MC Go, so  
 nothing is certain until we try it!
 
 -Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Monte Carlo on GPU

2009-05-04 Thread Michael Williams

See the April 30 2009 posting:   http://www.tobiaspreis.de/
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-05-04 Thread David Silver

Hi,

We used alpha=0.1. There may well be a better setting of alpha, but  
this appeared to work nicely in our experiments.

-Dave

On 3-May-09, at 2:01 AM, elife wrote:


Hi Dave,

 In your experiments what's the constant value alpha you set?

Thanks.

2009/5/1 David Silver sil...@cs.ualberta.ca:


Yes, in our experiments they were just constant numbers M=N=100.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-05-01 Thread David Silver

Hi Yamato,


If M and N are the same, is there any reason to run M simulations and
N simulations separately?  What happens if you combine them and  
calculate

V and g in the single loop?


I think it gives the wrong answer to do it in a single loop. Note that  
the simulation outcomes z are used to compute both V and g, so that  
they are quite strongly correlated. In general, if random variables X  
and Y are correlated then E[X]E[Y] != E[XY].


A simple example of this problem would be sampling E[X]E[X] for some  
random variable X. Let's say X is actually just noise with mean zero.  
If you just take one sample x1, then x1*x1 is always +ve, and you  
would estimate E[X]E[X] = E[X^2] > 0. But if you take two independent  
samples x1 and x2, then x1*x2 would be +ve half the time and -ve half  
the time, and you would correctly estimate E[X]E[X] = 0.
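
A quick numerical illustration of that point (nothing from the paper, just a sanity
check):

import numpy as np

x = np.random.default_rng(0).normal(size=(100000, 2))   # two independent samples of zero-mean noise
print(np.mean(x[:, 0] * x[:, 0]))   # reusing one sample estimates E[X^2] ~= 1, not E[X]E[X]
print(np.mean(x[:, 0] * x[:, 1]))   # independent samples correctly estimate E[X]E[X] ~= 0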


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom

David Silver wrote:


2. Run another N simulations, average the value of psi(s,a) over 
all positions and moves in games that black won (call this g) 


This is strange: you do not take lost playouts into consideration.

I believe there is a problem with your estimation of the gradient. 
Suppose for instance that you count z = +1 for a win, and z = -1 for a 
loss. Then you would take lost playouts into consideration. This makes 
me a little suspicious.


The fundamental problem here may be that your estimate of the gradient 
is biased by the playout policy. You should probably sample X(s) 
uniformly at random to have an unbiased estimator. Maybe this can be 
fixed with importance sampling, and then you may get a formula that is 
symmetrical regarding wins and losses. I don't have time to do it now, 
but it may be worth taking a look.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom

Rémi Coulom wrote:
The fundamental problem here may be that your estimate of the gradient 
is biased by the playout policy. You should probably sample X(s) 
uniformly at random to have an unbiased estimator. Maybe this can be 
fixed with importance sampling, and then you may get a formula that is 
symmetrical regarding wins and losses. I don't have time to do it now, 
but it may be worth taking a look.


Rémi 


More precisely: you should estimate the value of N playouts as Sum p_i 
z_i / Sum p_i instead of Sum z_i. Then, take the gradient of Sum p_i z_i 
/ Sum p_i. This would be better.


Maybe Sum p_i z_i / Sum p_i would be better for MCTS, too ?

Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver

Hi Remi,


This is strange: you do not take lost playouts into consideration.

I believe there is a problem with your estimation of the gradient.  
Suppose for instance that you count z = +1 for a win, and z = -1 for  
a loss. Then you would take lost playouts into consideration. This  
makes me a little suspicious.


Sorry, I should have made it clear that this assumes that we are  
treating black wins as z=1 and white wins as z=0.
In this special case, the gradient is the average of games in which  
black won.
But yes, more generally you need to include games won by both sides.  
The algorithms in the paper still cover this case - I was just trying  
to simplify their description to make it easy to understand the ideas.


The fundamental problem here may be that your estimate of the  
gradient is biased by the playout policy. You should probably sample  
X(s) uniformly at random to have an unbiased estimator. Maybe this  
can be fixed with importance sampling, and then you may get a  
formula that is symmetrical regarding wins and losses. I don't have  
time to do it now, but it may be worth taking a look.


The gradient already compensates for the playout policy (equation 9),  
so in fact it would bias the gradient to sample uniformly at random!


In equation 9, the gradient is taken with respect to the playout  
policy parameters. Using the product rule (third line), the gradient  
is equal to the playout policy probabilities multiplied by the sum of  
likelihood ratios multiplied by the simulation outcomes z. This  
gradient can be computed by sampling playouts instead of multiplying  
by the playout policy probabilities. This is also why games with  
outcomes of z=0 can be ignored - they don't affect this gradient  
computation.


More precisely: you should estimate the value of N playouts as Sum  
p_i z_i / Sum p_i instead of Sum z_i. Then, take the gradient of Sum  
p_i z_i / Sum p_i. This would be better.


Maybe Sum p_i z_i / Sum p_i would be better for MCTS, too ?


I think a similar point applies here. We care about the expected value  
of the playout policy, which can be sampled directly from playouts,  
instead of multiplying by the policy probabilities. You would only  
need importance sampling if you were using a different playout policy  
to the one which you are evaluating. But I guess I'm not seeing any  
good reason to do this?


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Rémi Coulom

David Silver wrote:


Sorry, I should have made it clear that this assumes that we are 
treating black wins as z=1 and white wins as z=0.
In this special case, the gradient is the average of games in which 
black won.
But yes, more generally you need to include games won by both sides. 
The algorithms in the paper still cover this case - I was just trying 
to simplify their description to make it easy to understand the ideas.


I understood this. What I find strange is that using -1/1 should be 
equivalent to using 0/1, but your algorithm behaves differently: it 
ignores lost games with 0/1, and uses them with -1/1.


Imagine you add a big constant to z. One million, say. This does not 
change the problem. You get either 1,000,000 or 1,000,001 as the outcome of a 
playout. But then, your estimate of the gradient becomes complete noise.


So maybe using -1/1 is better than 0/1? Since your algorithm depends so 
much on the definition of the reward, there must be an optimal way to 
set the reward. Or there must be a better way to define an algorithm that 
would not depend on an offset in the reward.




The gradient already compensates for the playout policy (equation 9), 
so in fact it would bias the gradient to sample uniformly at random!

Yes, you are right.

There is still something wrong that I don't understand. There may be a 
way to quantify the amount of noise in the unbiased gradient estimate, 
and it would depend on the average reward. Probably setting the average 
reward to zero is what would minimize noise in the gradient estimate. 
This is just an intuitive guess.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver

Hi Remi,

I understood this. What I find strange is that using -1/1 should be  
equivalent to using 0/1, but your algorithm behaves differently: it  
ignores lost games with 0/1, and uses them with -1/1.


Imagine you add a big constant to z. One million, say. This does not  
change the problem. You get either 1,000,000 or 1,000,001 as the outcome of  
a playout. But then, your estimate of the gradient becomes complete  
noise.


So maybe using -1/1 is better than 0/1? Since your algorithm  
depends so much on the definition of the reward, there must be an  
optimal way to set the reward. Or there must be a better way to define  
an algorithm that would not depend on an offset in the reward.


There is still something wrong that I don't understand. There may be  
a way to quantify the amount of noise in the unbiased gradient  
estimate, and it would depend on the average reward. Probably  
setting the average reward to zero is what would minimize noise in  
the gradient estimate. This is just an intuitive guess.


Okay, now I understand your point :-) It's a good question - and I  
think you're right. In REINFORCE any baseline can be subtracted from  
the reward, without affecting the expected gradient, but possibly  
reducing its variance. The baseline leading to the best estimate is  
indeed the average reward.  So it should be the case that  {-1,+1}  
would estimate the gradient g more efficiently than {0,1}, assuming  
that we see similar numbers of black wins as white wins across the  
training set.


So to answer your question, we can safely modify the algorithm to use  
(z-b) instead of z, where b is the average reward. This would then  
make the {0,1} and {-1,+1} cases equivalent (with appropriate scaling  
of step-size). I don't think this would have affected the results we  
presented (because all of the learning algorithms converged anyway, at  
least approximately, during training) but it could be an important  
modification for larger boards.


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Michael Williams

I wish I was smart   :(


David Silver wrote:

Hi Remi,

I understood this. What I find strange is that using -1/1 should be 
equivalent to using 0/1, but your algorithm behaves differently: it 
ignores lost games with 0/1, and uses them with -1/1.


 Imagine you add a big constant to z. One million, say. This does not 
 change the problem. You get either 1,000,000 or 1,000,001 as the outcome of a 
 playout. But then, your estimate of the gradient becomes complete noise.


 So maybe using -1/1 is better than 0/1? Since your algorithm depends 
 so much on the definition of the reward, there must be an optimal way 
 to set the reward. Or there must be a better way to define an algorithm 
 that would not depend on an offset in the reward.


There is still something wrong that I don't understand. There may be a 
way to quantify the amount of noise in the unbiased gradient estimate, 
and it would depend on the average reward. Probably setting the 
average reward to zero is what would minimize noise in the gradient 
estimate. This is just an intuitive guess.


Okay, now I understand your point :-) It's a good question - and I think 
you're right. In REINFORCE any baseline can be subtracted from the 
reward, without affecting the expected gradient, but possibly reducing 
its variance. The baseline leading to the best estimate is indeed the 
average reward.  So it should be the case that  {-1,+1} would estimate 
the gradient g more efficiently than {0,1}, assuming that we see similar 
numbers of black wins as white wins across the training set.


So to answer your question, we can safely modify the algorithm to use 
(z-b) instead of z, where b is the average reward. This would then make 
the {0,1} and {-1,+1} cases equivalent (with appropriate scaling of 
step-size). I don't think this would have affected the results we 
presented (because all of the learning algorithms converged anyway, at 
least approximately, during training) but it could be an important 
modification for larger boards.


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver

Hi Yamato,


Thanks for the detailed explanation.
M, N and alpha are constant numbers, right?  What did you set them to?


You're welcome!
Yes, in our experiments they were just constant numbers M=N=100.


The feature vector is the set of patterns you use, with value 1 if a
pattern is matched and 0 otherwise. The simulation policy selects
actions in proportion to the exponentiated, weighted sum of all
matching patterns. For example let's say move a matches patterns 1 and
2, move b matches patterns 1 and 3, and move c matches patterns 2 and
4. Then move a would be selected with probability e^(theta1 +
theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 +
theta4)). The theta values are the weights on the patterns which we
would like to learn. They are the log of the Elo ratings in Remi
Coulom's approach.


OK, I guess it is the formula 5 in the paper.


Yes, exactly.


The only tricky part is computing the vector psi(s,a). Each component
of psi(s,a) corresponds to a particular pattern, and is the difference
between the observed feature (i.e. whether the pattern actually
occurred after move a in position s) and the expected feature (the
average value of the pattern, weighted by the probability of selecting
each action).


I still don't understand this. Is it the formula 6?
Could you please give me an example like the above?


Yes that's right, this is equation 6.

Okay, let's continue the example above. Let's say that in position s,  
using the current theta, moves a, b and c will be selected with  
probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was  
actually selected. Now consider pattern 1, this is matched after a.  
But the probability of matching was (0.5*1 +0.3*1 +0.2*0) = 0.8. So  
psi_1(s,a)=1-0.8 = 0.2. For pattern 2, psi_2(s,a)=1- 
(0.5*1+0.3*0+0.2*1)=0.3, etc.. So in this example the vector psi(s,a)  
= [0.2,0.3,-0.3,-0.2].


In other words, psi tells us whether each pattern was actually matched  
more or less than we could have expected.
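
In code, that same computation looks like this (just reproducing the numbers above;
patterns are 0-indexed and each feature vector is the 0/1 pattern indicator for a move):

import numpy as np

probs = {'a': 0.5, 'b': 0.3, 'c': 0.2}                   # pi_theta(.|s) under the current theta
features = {'a': np.array([1, 1, 0, 0]),                 # move a matches patterns 1 and 2
            'b': np.array([1, 0, 1, 0]),                 # move b matches patterns 1 and 3
            'c': np.array([0, 1, 0, 1])}                 # move c matches patterns 2 and 4

expected = sum(probs[m] * features[m] for m in probs)    # [0.8, 0.7, 0.3, 0.2]
psi = features['a'] - expected                           # move a was the one actually selected
print(psi)                                               # [ 0.2  0.3 -0.3 -0.2]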


Hope this helps.
-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread David Silver
IMO other people's equations/code/ideas/papers always seem smarter  
than your own. The stuff you understand and do yourself just seems  
like common sense, and the stuff you don't always has a mystical air  
of complexity, at least until you understand it too :-)


On 30-Apr-09, at 1:59 PM, Michael Williams wrote:


I wish I was smart   :(


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-30 Thread Yamato
David Silver wrote:
Yes, in our experiments they were just constant numbers M=N=100.

If M and N are the same, is there any reason to run M simulations and
N simulations separately?  What happens if you combine them and calculate
V and g in a single loop?

Okay, let's continue the example above. Let's say that in position s,  
using the current theta, moves a, b and c will be selected with  
probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was  
actually selected. Now consider pattern 1, this is matched after a.  
But the probability of matching was (0.5*1 +0.3*1 +0.2*0) = 0.8. So  
psi_1(s,a)=1-0.8 = 0.2. For pattern 2, psi_2(s,a)=1- 
(0.5*1+0.3*0+0.2*1)=0.3, etc.. So in this example the vector psi(s,a)  
= [0.2,0.3,-0.3,-0.2].

In other words, psi tells us whether each pattern was actually matched  
more or less than we could have expected.

I understood what psi was. I am not sure how it works, but anyway I can
see your algorithm now. Thanks.

--
Yamato
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver

Hi Yamato,


Could you give us the source code which you used?  Your algorithm is
too complicated, so it would be very helpful if possible.


Actually I think the source code would be much harder to understand!  
It is written inside RLGO, and makes use of a substantial existing  
framework that would take a lot of effort to understand. (On a  
separate note I am considering making RLGO open source at some point,  
but I'd prefer to spend some effort cleaning it up before making it  
public).


But I think maybe Algorithm 1 is much easier than you think:

A: Estimate value V* of every position in a training set, using deep  
rollouts.


B: Repeat, for each position in the training set
1. Run M simulations, estimate value of position (call this V)
	2. Run another N simulations, average the value of psi(s,a) over all  
positions and moves in games that black won (call this g)

3. Adjust parameters by alpha * (V* - V) * g
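
Just to illustrate how those three steps fit together, here is a rough
sketch (not the RLGO code; the playout and psi machinery, described
below, is hidden behind two made-up interfaces):

// Sketch of the outer loop of Algorithm 1 (illustrative only).
interface Playout    { double run(int position, double[] theta); }    // 1.0 if black wins, else 0.0
interface PsiPlayout { double[] run(int position, double[] theta); }  // average psi over the game, or null if black lost

static void balance(double[] theta, int[] positions, double[] vStar,
                    int M, int N, double alpha, Playout play, PsiPlayout psiPlay) {
  for (int i = 0; i < positions.length; i++) {
    double v = 0.0;                                  // step 1: estimate V with M simulations
    for (int k = 0; k < M; k++) v += play.run(positions[i], theta);
    v /= M;
    double[] g = new double[theta.length];           // step 2: average psi over games black won
    int wins = 0;
    for (int k = 0; k < N; k++) {
      double[] p = psiPlay.run(positions[i], theta);
      if (p != null) { wins++; for (int j = 0; j < g.length; j++) g[j] += p[j]; }
    }
    if (wins > 0) for (int j = 0; j < g.length; j++) g[j] /= wins;
    for (int j = 0; j < theta.length; j++)           // step 3: adjust parameters
      theta[j] += alpha * (vStar[i] - v) * g[j];
  }
}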

The feature vector is the set of patterns you use, with value 1 if a  
pattern is matched and 0 otherwise. The simulation policy selects  
actions in proportion to the exponentiated, weighted sum of all  
matching patterns. For example let's say move a matches patterns 1 and  
2, move b matches patterns 1 and 3, and move c matches patterns 2 and  
4. Then move a would be selected with probability e^(theta1 +  
theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 +  
theta4)). The theta values are the weights on the patterns which we  
would like to learn. They are the log of the Elo ratings in Remi  
Coulom's approach.
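
To make that concrete, a minimal sketch of the selection step
(illustrative only; the matched pattern indices per move are assumed
to be available as arrays):

// Sketch: select a move with probability proportional to
// exp(sum of theta over the patterns that move matches).
static int selectMove(double[] theta, int[][] patterns, java.util.Random rng) {
  double[] score = new double[patterns.length];
  double total = 0.0;
  for (int a = 0; a < patterns.length; a++) {
    double sum = 0.0;
    for (int j : patterns[a]) sum += theta[j];       // weighted sum of matching patterns
    score[a] = Math.exp(sum);
    total += score[a];
  }
  double r = rng.nextDouble() * total;               // sample in proportion to exp(sum)
  for (int a = 0; a < patterns.length; a++) {
    r -= score[a];
    if (r <= 0.0) return a;
  }
  return patterns.length - 1;
}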


The only tricky part is computing the vector psi(s,a). Each component  
of psi(s,a) corresponds to a particular pattern, and is the difference  
between the observed feature (i.e. whether the pattern actually  
occurred after move a in position s) and the expected feature (the  
average value of the pattern, weighted by the probability of selecting  
each action).


It's also very important to be careful about signs and the colour to  
play - it's easy to make a mistake and follow the gradient in the  
wrong direction.


Is that any clearer?
-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread Michael Williams

David Silver wrote:

Hi Michael,

But one thing confuses me: You are using the value from Fuego's 10k 
simulations as an approximation of the actual value of the position.  
But isn't the actual
value of the position either a win or a loss?  On such small boards, 
can't you assume that Fuego is able to correctly determin who is 
winning and round it's
evaluation to the nearest win/loss?  i.e. if it evaluates the position 
to 0.674, that gets rounded to 1.  If such an assumption about Fuego's 
ability to read
the position on a small board is valid, then it should improve the 
results of your balanced simulation strategy, right?  Or am I missing 
something?


It's true that 5x5 Go is solved, so in principle we could have used the 
true minimax values. However we chose to use an approach that can scale 
to larger boards, which means that we should treat the expert 
evaluations as approximate. And in fact Fuego was not always accurate on 
6x6 boards, as we used only 10k simulations in our training set.


Also, I think it really helps to have soft rather than hard expert 
evaluations. We want a simulation policy that helps differentiate e.g. a 
90% winning position from an 85% winning position. Rounding all the 
expert evaluations to 0 or 1 would lose much of this advantage.


-Dave


By this argument (your last paragraph), you need to do some magical
number of simulations for the training data.  Not enough simulations
and you have too much noise.  And infinite simulations gives you hard
0 or 1 results.  But I can't argue with your results.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Re: [computer go] Monte-Carlo Simulation Balancing

2009-04-29 Thread David Silver

Hi Remi,


What komi did you use for 5x5 and 6x6 ?

I used 7.5 komi for both board sizes.


I find it strange that you get only 70 Elo points from supervised
learning over uniform random. Don't you have any feature for atari
extension ? This one alone should improve strength immensely (extend
string in atari because of the previous move).
Actually no. The features are very simple, and know how to capture but  
not how to defend ataris. I'm sure that a better set of features could  
improve by more than 70 Elo, but I expect we would still see a benefit  
to balancing the weights correctly. For example, the Fuego policy  
defends ataris and follows several other common-sense rules, but the  
results in 5x5 and 6x6 show that it is not well balanced on small  
boards.


Let us extrapolate: I got more than 200 Elo points of improvements from
my patterns in 9x9 over uniform random (I never really measured that, it
may be even more than 200).
I guess you got more than 200 Elo on 9x9; in Mogo (Gelly et al. 2006)  
the improvement from uniform random was at least 400 Elo, and I think  
your simulation policy is probably at least as strong.


By the way I was sad to hear you're not working on Crazystone any  
more. Is it because you are busy with other projects?



So maybe I could get 600 more Elo points
with your method. And even more on 19x19.
I noticed that, in general, changes in the playout policy have a much
bigger impact on larger boards than on smaller boards.


As to whether we can extrapolate, I hope so :-)
I share the same feeling that improving the simulation policy will be  
more impactful on bigger boards with longer simulations.
On the other hand I've been surprised by many things in MC Go, so  
nothing is certain until we try it!


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-29 Thread Yamato
David Silver wrote:
A: Estimate value V* of every position in a training set, using deep  
rollouts.

B: Repeat, for each position in the training set
   1. Run M simulations, estimate value of position (call this V)
   2. Run another N simulations, average the value of psi(s,a) over all  
positions and moves in games that black won (call this g)
   3. Adjust parameters by alpha * (V* - V) * g

Thanks for the detailed explanation.
M, N and alpha are constant numbers, right?  What did you set them to?

The feature vector is the set of patterns you use, with value 1 if a  
pattern is matched and 0 otherwise. The simulation policy selects  
actions in proportion to the exponentiated, weighted sum of all  
matching patterns. For example let's say move a matches patterns 1 and  
2, move b matches patterns 1 and 3, and move c matches patterns 2 and  
4. Then move a would be selected with probability e^(theta1 +  
theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 +  
theta4)). The theta values are the weights on the patterns which we  
would like to learn. They are the log of the Elo ratings in Remi  
Coulom's approach.

OK, I guess it is the formula 5 in the paper.

The only tricky part is computing the vector psi(s,a). Each component  
of psi(s,a) corresponds to a particular pattern, and is the difference  
between the observed feature (i.e. whether the pattern actually  
occurred after move a in position s) and the expected feature (the  
average value of the pattern, weighted by the probability of selecting  
each action).

I still don't understand this. Is it the formula 6?
Could you please give me an example like the above?

--
Yamato
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread David Silver

Hi Yamato,


I like you idea, but why do you use only 5x5 and 6x6 Go?


1. Our second algorithm, two-ply simulation balancing, requires a  
training set of two-ply rollouts. Rolling out every position from a  
complete two-ply search is very expensive on larger board sizes, so we  
would probably have to consider some subset of leaf positions. We  
wanted to analyse the full algorithm first, before we started making  
approximations.
2. We can generate a lot more data on small boards, to give high  
confidence on the results we report.
3. IMO it's important to do the science to understand underlying  
principles first, and then scale up to bigger boards, more complex  
Monte-Carlo searches, etc.



I don't think the 200+ Elo improvement is so impressive


I agree that it would be much more impressive to report positive  
results on larger boards. But perhaps it is already interesting that  
tuning the balance of the simulation policy can make a big difference  
on small boards? Also, larger boards mean longer simulations, and so  
the importance of simulation balancing should become even more  
exaggerated.


because the previous approaches were not optimized for such a small  
boards.


I'm not sure what you mean here? The supervised learning and  
reinforcement learning approaches that we compared against are both  
trained on the small boards, i.e. the pattern weights are specifically  
optimised for that size of board.


I agree that the handcrafted policy from Fuego was not optimised for  
small boards, which is why it performed poorly. But perhaps this is  
also interesting, i.e. it suggests that a handcrafted policy for 9x9  
Go may be significantly suboptimal when used in 19x19 Go. So  
automatically learning a simulation policy may ultimately prove to be  
very beneficial.



I'm looking forward to your results on larger boards.


Me too :-)
Coming soon, will let you know how it goes.
But I hope that others will try these ideas too, it's always much  
better to compare multiple implementations of the same algorithm.


-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Rémi Coulom

David Silver wrote:

I don't think the 200+ Elo improvement is so impressive


I agree that it would be much more impressive to report positive 
results on larger boards. But perhaps it is already interesting that 
tuning the balance of the simulation policy can make a big difference 
on small boards? Also, larger boards mean longer simulations, and so 
the importance of simulation balancing should become even more 
exaggerated. 


What komi did you use for 5x5 and 6x6 ?

I find it strange that you get only 70 Elo points from supervised 
learning over uniform random. Don't you have any feature for atari 
extension ? This one alone should improve strength immensely (extend 
string in atari because of the previous move).


Let us extrapolate: I got more than 200 Elo points of improvements from 
my patterns in 9x9 over uniform random (I never really measured that, it 
may be even more than 200). So maybe I could get 600 more Elo points 
with your method. And even more on 19x19.


I noticed that, in general, changes in the playout policy have a much 
bigger impact on larger boards than on smaller boards.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Ivan Dubois
I noticed that, in general, changes in the playout policy have a much 
bigger impact on larger boards than on smaller boards.


Rémi


I think rating differences are amplified on larger boards. This is easy to 
see if you think about it this way :


Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that I 
would call 4-Go where instead of playing one game, you play simultaneously 4 
games and determine the winner by calculating the sum of the scores of the 
four games. Certainly rating differences would be bigger with 4-go than with 
go (given the same two players). This explains why rating differences are 
bigger on 19x19 than 9x9.


Ivan 



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Don Dailey
A simplistic model that helps explain this is golf.   On a single hole, even
a casual golfer has a realistic chance of out-golfing Tiger Woods.  Tiger
occasionally shoots a 1 over par on some hole and even weak amateurs
occasionally par or even birdie a hole.It's not going to happen a lot,
but it's not ridiculous either.   Years ago I taught a player how to golf,
and on his third time out with me,  he hit a hole in one on a short par
3. If Tiger Woods had been playing with us, he would have lost that hole
to this beginner.

But in a 9 hole match,  the odds go down enormously - for all practical
purposes there is no chance.

I kind of think of GO like that, even though it's a pretty simplistic
model.   Each move is like a hole of golf,  it can be a good shot or a bad
one. With GO, however, probably a LOT of your moves are just as good as
the moves of a good player.   But it's the ones that fall short, that kill
you.

Go on a big board is like 18 holes of golf  compared to just 1 or 2 holes of
golf.   The better player is far more likely to win the 18 hole match than
the 1 hole match.

- Don





On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr wrote:

 I noticed that, in general, changes in the playout policy have a much
 bigger impact on larger boards than on smaller boards.

 Rémi


 I think rating differences are emplified on larger boards. This is easy to
 see if you think about it this way :

 Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that I
 would call 4-Go where instead of playing one game, you play simultenously 4
 games and determine the winner by calculating the sum of the scores of the
 four games. Certainly rating differences would be bigger with 4-go than with
 go (given the same two players). This explains why rating differences are
 bigger on 19x19 than 9x9.

 Ivan

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread steve uurtamo
also, i'm not sure that a lot of most amateurs' moves are very
good.  the spectrum of bad moves is wide, it's just that it takes
someone many stones stronger to severely punish small differences
between good and nearly-good moves.  among players of relatively
similar strength, these differences will go unnoticed and unpunished.

s.

2009/4/28 Don Dailey dailey@gmail.com:
 A simplistic model that helps explain this is golf.   On a single hole, even
 a casual golfer has a realistic chance of out-golfing Tiger Woods.  Tiger
 occasionally shoots a 1 over par on some hole and even weak amateurs
 occasionally par or even birdie a hole.    It's not going to happen a lot,
 but it's not ridiculous either.   Years ago I taught a player how to golf,
 and on his third time out with me,  he hit a hole in one on a short par
 3. If Tiger Woods had been playing with us, he would have lost that hole
 to this beginner.

 But in a 9 hole match,  the odds go down enormously - for all practical
 purposes there is no chance.

 I kind of think of GO like that, even though it's a pretty simplistic
 model.   Each move is like a hole of golf,  it can be a good shot or a bad
 one. With GO, however, probably a LOT of your moves are just as good as
 the moves of a good player.   But it's the ones that fall short, that kill
 you.

 Go on a big board is like 18 holes of golf  compared to just 1 or 2 holes of
 golf.   The better player is far more likely to win the 18 hole match than
 the 1 hole match.

 - Don





 On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr wrote:

 I noticed that, in general, changes in the playout policy have a much
 bigger impact on larger boards than on smaller boards.

 Rémi

 I think rating differences are emplified on larger boards. This is easy to
 see if you think about it this way :

 Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that
 I would call 4-Go where instead of playing one game, you play simultenously
 4 games and determine the winner by calculating the sum of the scores of the
 four games. Certainly rating differences would be bigger with 4-go than with
 go (given the same two players). This explains why rating differences are
 bigger on 19x19 than 9x9.

 Ivan

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread terry mcintyre
A fair number of amateur moves are exactly what a pro would use; this is no 
great surprise, since the moves are actually copied from professional games. 
The weaker player can blindly emulate, but does not know why a pro plays A 
instead of B or C; does not know how to punish slack moves.

I've been taking some classes from an 8d pro. Sadly, I'm at the bottom of the 
class. The top players in these classes are 8d amateurs; the pro can give them 
3 stones and win. When going over their games, many of their moves are quite 
respectable, but the pro is able to find mistakes which can be exploited with 
devastating effect. You can't afford to be a little bit off against a pro, 
but weaker players find it hard to exploit slack moves.

 
When explaining his moves, the pro often uses very deep reading. We've all seen 
simple cases, where move A is advisable if a ladder works, otherwise B is 
recommended. The pro knows six moves in advance that he'll need to read that 
ladder, before the ladder is actually on the board. These are the simplest 
expositions I've seen; things get more complex from there. 
Terry McIntyre terrymcint...@yahoo.com


Government is an association of men who do violence to the rest of us.
- Leo Tolstoy





From: steve uurtamo uurt...@gmail.com
To: computer-go computer-go@computer-go.org
Sent: Tuesday, April 28, 2009 5:09:20 PM
Subject: Re: [computer-go] Monte-Carlo Simulation Balancing

also, i'm not sure that a lot of most amateurs' moves are very
good.  the spectrum of bad moves is wide, it's just that it takes
someone many stones stronger to severely punish small differences
between good and nearly-good moves.  among players of relatively
similar strength, these differences will go unnoticed and unpunished.

s.

2009/4/28 Don Dailey dailey@gmail.com:
 A simplistic model that helps explain this is golf.   On a single hole, even
 a casual golfer has a realistic chance of out-golfing Tiger Woods.  Tiger
 occasionally shoots a 1 over par on some hole and even weak amateurs
 occasionally par or even birdie a hole.It's not going to happen a lot,
 but it's not ridiculous either.   Years ago I taught a player how to golf,
 and on his third time out with me,  he hit a hole in one on a short par
 3. If Tiger Woods had been playing with us, he would have lost that hole
 to this beginner.

 But in a 9 hole match,  the odds go down enormously - for all practical
 purposes there is no chance.

 I kind of think of GO like that, even though it's a pretty simplistic
 model.   Each move is like a hole of golf,  it can be a good shot or a bad
 one. With GO, however, probably a LOT of your moves are just as good as
 the moves of a good player.   But it's the ones that fall short, that kill
 you.

 Go on a big board is like 18 holes of golf  compared to just 1 or 2 holes of
 golf.   The better player is far more likely to win the 18 hole match than
 the 1 hole match.

 - Don





 On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr wrote:

 I noticed that, in general, changes in the playout policy have a much
 bigger impact on larger boards than on smaller boards.

 Rémi

 I think rating differences are emplified on larger boards. This is easy to
 see if you think about it this way :

 Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game that
 I would call 4-Go where instead of playing one game, you play simultenously
 4 games and determine the winner by calculating the sum of the scores of the
 four games. Certainly rating differences would be bigger with 4-go than with
 go (given the same two players). This explains why rating differences are
 bigger on 19x19 than 9x9.

 Ivan

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/



  ___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Yamato
David Silver wrote:
 because the previous approaches were not optimized for such a small  
 boards.

I'm not sure what you mean here? The supervised learning and  
reinforcement learning approaches that we compared against are both  
trained on the small boards, i.e. the pattern weights are specifically  
optimised for that size of board.

Ah, sorry. I missed it.

I agree that the handcrafted policy from Fuego was not optimised for  
small boards, which is why it performed poorly. But perhaps this is  
also interesting, i.e. it suggests that a handcrafted policy for 9x9  
Go may be significantly suboptimal when used in 19x19 Go. So  
automatically learning a simulation policy may ultimately prove to be  
very beneficial.

It is surprising that Fuego was far worse than uniform random on 5x5 and 6x6.
These results show how particular the very small boards are.
I suppose 5x5 is very different from 9x9, but 19x19 is not so different from 9x9.

 I'm looking forward to your results on larger boards.

Me too :-)
Coming soon, will let you know how it goes.
But I hope that others will try these ideas too, it's always much  
better to compare multiple implementations of the same algorithm.

Could you give us the source code which you used?  Your algorithm is
too complicated, so it would be very helpful if possible.

--
Yamato
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-28 Thread Don Dailey
But I'm only trying to make a point, not pin the analogy down perfectly.
Naturally the stronger the player, the more likely his moves will conform to
the level of the top players.

The basic principle is that the longer the contest, the more opportunities a
strong player has to demonstrate his superiority over an inferior player
and/or recover from mistakes or surprises.In my golf analogy,   my
friend hits a hole in one.  Let's say it was on the first hole of 9 against
Tiger Woods.   Tiger likely would have hit it in 2 strokes,  and my beginner
friend would be ahead by 1 stroke!But that 1 stroke lead would almost
certainly disappear by the time the next hole was completed, as my friend
was typically taking 7 or 8 shots on that par 3 course!

Another way to look at this is that a random legal move generator has a
chance to play a perfect game, although that chance is almost infinitesimal.
  But even that small chance can be an order of magnitude larger if you can
shorten the contest even by 1 or 2 moves.

- Don


On Tue, Apr 28, 2009 at 8:09 PM, steve uurtamo uurt...@gmail.com wrote:

 also, i'm not sure that a lot of most amateurs' moves are very
 good.  the spectrum of bad moves is wide, it's just that it takes
 someone many stones stronger to severely punish small differences
 between good and nearly-good moves.  among players of relatively
 similar strength, these differences will go unnoticed and unpunished.

 s.

 2009/4/28 Don Dailey dailey@gmail.com:
  A simplistic model that helps explain this is golf.   On a single hole,
 even
  a casual golfer has a realistic chance of out-golfing Tiger Woods.  Tiger
  occasionally shoots a 1 over par on some hole and even weak amateurs
  occasionally par or even birdie a hole.It's not going to happen a
 lot,
  but it's not ridiculous either.   Years ago I taught a player how to
 golf,
  and on his third time out with me,  he hit a hole in one on a short par
  3. If Tiger Woods had been playing with us, he would have lost that
 hole
  to this beginner.
 
  But in a 9 hole match,  the odds go down enormously - for all practical
  purposes there is no chance.
 
  I kind of think of GO like that, even though it's a pretty simplistic
  model.   Each move is like a hole of golf,  it can be a good shot or a
 bad
  one. With GO, however, probably a LOT of your moves are just as good
 as
  the moves of a good player.   But it's the ones that fall short, that
 kill
  you.
 
  Go on a big board is like 18 holes of golf  compared to just 1 or 2 holes
 of
  golf.   The better player is far more likely to win the 18 hole match
 than
  the 1 hole match.
 
  - Don
 
 
 
 
 
  On Tue, Apr 28, 2009 at 1:53 PM, Ivan Dubois dubois_i...@yahoo.fr
 wrote:
 
  I noticed that, in general, changes in the playout policy have a much
  bigger impact on larger boards than on smaller boards.
 
  Rémi
 
  I think rating differences are emplified on larger boards. This is easy
 to
  see if you think about it this way :
 
  Somehow a 19x19 board is like 4 9x9 boards. Let us define a new game
 that
  I would call 4-Go where instead of playing one game, you play
 simultenously
  4 games and determine the winner by calculating the sum of the scores of
 the
  four games. Certainly rating differences would be bigger with 4-go than
 with
  go (given the same two players). This explains why rating differences
 are
  bigger on 19x19 than 9x9.
 
  Ivan
 
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 
  ___
  computer-go mailing list
  computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Michael Williams

Finally!  I guess you can add this technique to your list, Lukasz.


David Silver wrote:

Hi everyone,

Please find attached my ICML paper with Gerry Tesauro on automatically 
learning a simulation policy for Monte-Carlo Go. Our preliminary results 
show a 200+ Elo improvement over previous approaches, although our 
experiments were restricted to simple Monte-Carlo search with no tree on 
small boards.





Abstract

In this paper we introduce the first algorithms for efficiently learning 
a simulation policy for Monte-Carlo search. Our main idea is to optimise 
the balance of a simulation policy, so that an accurate spread of 
simulation outcomes is maintained, rather than optimising the direct 
strength of the simulation policy. We develop two algorithms for 
balancing a simulation policy by gradient descent. The first algorithm 
optimises the balance of complete simulations, using a policy gradient 
algorithm; whereas the second algorithm optimises the balance over every 
two steps of simulation. We compare our algorithms to reinforcement 
learning and supervised learning algorithms for maximising the strength 
of the simulation policy. We test each algorithm in the domain of 5x5 
and 6x6 Computer Go, using a softmax policy that is parameterised by 
weights for a hundred simple patterns. When used in a simple Monte-Carlo 
search, the policies learnt by simulation balancing achieved 
significantly better performance, with half the mean squared error of a 
uniform random policy, and equal overall performance to a sophisticated 
Go engine.


-Dave




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Rémi Coulom

David Silver wrote:

Hi everyone,

Please find attached my ICML paper with Gerry Tesauro on automatically 
learning a simulation policy for Monte-Carlo Go. Our preliminary 
results show a 200+ Elo improvement over previous approaches, although 
our experiments were restricted to simple Monte-Carlo search with no 
tree on small boards.


Very interesting, thanks.

If I understand correctly, your method makes your program 250 Elo points 
stronger than my pattern-learning algorithm on 5x5 and 6x6, by just 
learning better weights. That is impressive. I had stopped working on my 
program for a very long time, but maybe I will go back to work and try 
your algorithm.


Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread David Silver

Hi Remi,

If I understand correctly, your method makes your program 250 Elo points
stronger than my pattern-learning algorithm on 5x5 and 6x6, by just
learning better weights.


Yes, although this is just in a very simple MC setting.

Also we did not compare directly to the algorithm you used  
(minorisation/maximisation), but to a different algorithm for  
maximising the likelihood of an expert policy. But as the softmax  
policy is a log transformation of the generalised Bradley-Terry model,  
I think this comparison is still valid.


I had stopped working on my program for a very long time, but maybe  
I will go back to work and try your algorithm.


I would be really interested to know how this works out! Please let me  
know.


Best wishes
-Dave

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Michael Williams

My favorite part:
One natural idea is to use the learned simulation policy in Monte-Carlo search, and 
generate new deep search values, in an iterative cycle.

But one thing confuses me: You are using the value from Fuego's 10k simulations as an approximation of the actual value of the position.  But isn't the actual 
value of the position either a win or a loss?  On such small boards, can't you assume that Fuego is able to correctly determine who is winning and round its 
evaluation to the nearest win/loss?  i.e. if it evaluates the position to 0.674, that gets rounded to 1.  If such an assumption about Fuego's ability to read 
the position on a small board is valid, then it should improve the results of your balanced simulation strategy, right?  Or am I missing something?





David Silver wrote:

Hi everyone,

Please find attached my ICML paper with Gerry Tesauro on automatically 
learning a simulation policy for Monte-Carlo Go. Our preliminary results 
show a 200+ Elo improvement over previous approaches, although our 
experiments were restricted to simple Monte-Carlo search with no tree on 
small boards.





Abstract

In this paper we introduce the first algorithms for efficiently learning 
a simulation policy for Monte-Carlo search. Our main idea is to optimise 
the balance of a simulation policy, so that an accurate spread of 
simulation outcomes is maintained, rather than optimising the direct 
strength of the simulation policy. We develop two algorithms for 
balancing a simulation policy by gradient descent. The first algorithm 
optimises the balance of complete simulations, using a policy gradient 
algorithm; whereas the second algorithm optimises the balance over every 
two steps of simulation. We compare our algorithms to reinforcement 
learning and supervised learning algorithms for maximising the strength 
of the simulation policy. We test each algorithm in the domain of 5x5 
and 6x6 Computer Go, using a softmax policy that is parameterised by 
weights for a hundred simple patterns. When used in a simple Monte-Carlo 
search, the policies learnt by simulation balancing achieved 
significantly better performance, with half the mean squared error of a 
uniform random policy, and equal overall performance to a sophisticated 
Go engine.


-Dave




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread David Silver

Hi Michael,

But one thing confuses me: You are using the value from Fuego's 10k  
simulations as an approximation of the actual value of the  
position.  But isn't the actual
value of the position either a win or a loss?  On such small boards,  
can't you assume that Fuego is able to correctly determin who is  
winning and round it's
evaluation to the nearest win/loss?  i.e. if it evaluates the  
position to 0.674, that gets rounded to 1.  If such an assumption  
about Fuego's ability to read
the position on a small board is valid, then it should improve the  
results of your balanced simulation strategy, right?  Or am I  
missing something?


It's true that 5x5 Go is solved, so in principle we could have used  
the true minimax values. However we chose to use an approach that can  
scale to larger boards, which means that we should treat the expert  
evaluations as approximate. And in fact Fuego was not always accurate  
on 6x6 boards, as we used only 10k simulations in our training set.


Also, I think it really helps to have soft rather than hard expert  
evaluations. We want a simulation policy that helps differentiate e.g.  
a 90% winning position from an 85% winning position. Rounding all the  
expert evaluations to 0 or 1 would lose much of this advantage.


-Dave


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Simulation Balancing

2009-04-27 Thread Yamato
David Silver wrote:
Please find attached my ICML paper with Gerry Tesauro on automatically  
learning a simulation policy for Monte-Carlo Go. Our preliminary  
results show a 200+ Elo improvement over previous approaches, although  
our experiments were restricted to simple Monte-Carlo search with no  
tree on small boards.

I like your idea, but why do you use only 5x5 and 6x6 Go?  I don't think
the 200+ Elo improvement is so impressive, because the previous approaches
were not optimized for such small boards. I'm looking forward to your
results on larger boards.

--
Yamato
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-07 Thread dhillismail
Finally got a window of time. It's called the epsilon trick. Here is a cut 
and paste of Lukasz' explanation from the archives, including how to calculate 
N.
- Dave Hillis

--

-Original Message-
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Sun, 14 Jan 2007 4:50 AM
Subject: [computer-go] Fast UCT (epsilon trick)


Hi 
I've looked back into the article and I see that it is not trivial to 
understand how the epsilon trick (ET) works. Therefore I will 
describe it here using UCT as an example. 

The bottleneck of the UCT algorithm is choosing a move. The program has to 
iterate over all children, evaluating their UCB values. We want to execute 
this loop rarely. 
One way to do that is to group playouts together, i.e. play 2 or more MC 
simulations per tree traversal. But this is not good enough, as it 
doesn't distribute the playouts over the tree and some parts will not 
get enough of them. 

The ET works as follows: 
With each node of the tree the program keeps additional information: 
- child_to_visit 
- visits_to_go 

When we visit a node, 
if node->visits_to_go > 0 then 
    node->visits_to_go = node->visits_to_go - 1 
    node = node->child_to_visit 

So we have a predetermined strategy to go to a particular child visits_to_go times. 

But 
if node->visits_to_go == 0 then // we have to set the strategy first 
    node->child_to_visit = UCT_find_child(node) 
    node->visits_to_go = Epsilon * node->visited_count // round up 

As we can see, when 
Epsilon is small or when a particular node is 
visited not too many times, UCT-ET is identical to UCT. 

But when a node has been visited more times (near the root, far from leaves), 
then UCT-ET may save some time and go to the precalculated child. 
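
A rough Java rendering of the same descent step (just an illustration 
of the pseudocode above, not Lukasz's code; uctFindChild stands for 
the usual full UCB scan over the children): 

// Sketch of the epsilon-trick descent step.
static class Node {
  Node[] children;
  Node childToVisit;     // precomputed child to follow
  int visitsToGo;        // how many more visits follow it blindly
  int visitedCount;
}

static Node descend(Node node, double epsilon,
                    java.util.function.Function<Node, Node> uctFindChild) {
  node.visitedCount++;
  if (node.visitsToGo > 0) {                // follow the precomputed strategy
    node.visitsToGo--;
    return node.childToVisit;
  }
  node.childToVisit = uctFindChild.apply(node);                    // full scan
  node.visitsToGo = (int) Math.ceil(epsilon * node.visitedCount);  // "round up"
  return node.childToVisit;
}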

That's it. 
I hope You like it :) 
Łukasz Lew 

PS. 

In Df-PN, a good Epsilon was 1/4. I believe that here it will be much 
smaller, 1/64 maybe. 
Or one may even change the Epsilon equation to: 
    node->visits_to_go = Epsilon * sqrt(node->visited_count) 

If someone will do some experiments, I'm eager to hear the results. 

PS2 

float InvSqrt (float x) { 
  float xhalf = 0.5f * x; 
  int i = *(int*)&x; 
  i = 0x5f3759df - (i >> 1); 
  x = *(float*)&i; 
  x = x * (1.5f - xhalf * x * x); 
  return x; 
} 

;-) 

___
--


-Original Message-
From: [EMAIL PROTECTED]
To: computer-go@computer-go.org
Sent: Wed, 3 Dec 2008 12:56 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


Hmm.. it could be that N is picked randomly each time... now I can't seem to 
find the description and my memory is not serving me well here. Perhaps someone 
else remembers? I'll try to track it down.

- Dave Hillis


-Original Message-
From: Michael Williams [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Wed, 3 Dec 2008 12:14 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference b
ot


That's going to repeat the same exact path through the tree three times, isn't 
it? If so, it seems like it would be more efficient to do N playouts from the 
leaf after each traversal of the tree. 
 
[EMAIL PROTECTED] wrote: 
 There is another speedup trick that might interest you. IIRC Lukasz Lew  
 came up with it, but I forget what he called it. After calculating the  
 selected move for an internal node (going through UCT and RAVE and  whatnot) 
 store that move. Then, for the next N visits to that node  (where N is 5 or 
 10 or so), just repeat that move without having to  calculate what the move 
 would be. 
  - Dave Hillis 
  -Original Message- 
 From: Mark Boon [EMAIL PROTECTED] 
 To: computer-go computer-go@computer-go.org 
 Sent: Tue, 2 Dec 2008 11:17 pm 
 Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot 
  I have made some minor performance improvements and this is as far as I  
  intend to take this particular project. I might make some small changes  
  if necessary, but most likely I'll leave this largely unchanged from now.  
   I had set myself as an arbitrary goal that it should do at least 20K  
  playouts. But with real liberties, AMAF and a RAVE formula I got stuck  in 
  the 16K-17K range. According to my profiler that is mostly due to the  
  expensive formula used to compare nodes, where it says it spends 25% of  
  total time. The formula I'm using is: 
      beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) 
      beta = 1 - log(parent-visits) / 20 
      UCT = exploration-factor * sqrt( log(parent-visits) / (nr-visits+1) ) 
      RAVE = exploration-factor * sqrt( log(parent-visits) / (nr-virtual-visits+1) ) 
  There are probably quite a few possibilities still to tune this program 
  with regards to playing strength and performance. But I felt it doesn't 
  help to obscure the implementation by too many details. 
  The implementation of the search algorithm was entirely game-independent, 
  until I introduced AMAF. I didn't see how to fix that, as there's no way 
  getting around

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Sylvain Gelly
What I did (was a long time ago, I don't know if it is still used in
Mogo), is to compute the m best moves every so often and most of the
time just do the max over those m moves.
m was on the order of 5, and every so often was an increasing
function like sqrt or log (I don't remember).
That speeds up quite a lot (when the max becomes the bottleneck), and
if you tune the parameters well you almost don't loose any strength at
a fixed number of playouts.
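
Roughly, as a sketch (the node layout and the scoring function here
are assumptions, not Mogo's code):

// Sketch: every so often do a full scan to cache the m best children,
// otherwise take the max over just those m cached moves.
static class Node {
  Node[] children;
  Node[] cachedBest;     // the m currently best children
  int visits;
  int nextFullScan;      // visit count at which to redo the full scan
}

static Node select(Node node, int m, java.util.function.ToDoubleFunction<Node> score) {
  node.visits++;
  if (node.cachedBest == null || node.visits >= node.nextFullScan) {
    Node[] sorted = node.children.clone();
    java.util.Arrays.sort(sorted,
        (a, b) -> Double.compare(score.applyAsDouble(b), score.applyAsDouble(a)));
    node.cachedBest = java.util.Arrays.copyOf(sorted, Math.min(m, sorted.length));
    node.nextFullScan = node.visits + (int) Math.sqrt(node.visits) + 1;  // "every so often"
  }
  Node best = node.cachedBest[0];
  for (Node c : node.cachedBest)                     // cheap max over only m moves
    if (score.applyAsDouble(c) > score.applyAsDouble(best)) best = c;
  return best;
}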

Sylvain

2008/12/3  [EMAIL PROTECTED]:
 There is another speedup trick that might interest you. IIRC Lukasz Lew came
 up with it, but I forget what he called it. After calculating the selected
 move for an internal node (going through UCT and RAVE and whatnot) store
 that move. Then, for the next N visits to that node (where N is 5 or 10 or
 so), just repeat that move without having to calculate what the move would
 be.

 - Dave Hillis

 -Original Message-
 From: Mark Boon [EMAIL PROTECTED]
 To: computer-go computer-go@computer-go.org
 Sent: Tue, 2 Dec 2008 11:17 pm
 Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot

 I have made some minor performance improvements and this is as far as I
 intend to take this particular project. I might make some small changes if
 necessary, but most likely I'll leave this largely unchanged from now.

 I had set myself as an arbitrary goal that it should do at least 20K
 playouts. But with real liberties, AMAF and a RAVE formula I got stuck in
 the 16K-17K range. According to my profiler that is mostly due to the
 expensive formula used to compare nodes, where it says it spends 25% of
 total time. The formula I'm using is:

   beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

   beta = 1 - log(parent-visits) / 20
   UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )
   RAVE = exploration-factor *sqrt( log(parent-visits) /
 (nr-virtual-visits+1) )

 There are probably quite a few possibilities still to tune this program with
 regards to playing strength and performance. But I felt it doesn't help to
 obscure the implementation by too many details.

 The implementation of the search algorithm was entirely game-independent,
 until I introduced AMAF. I didn't see how to fix that, as there's no way
 getting around that it's based on the fact that a move is represented by a
 single coordinate, which is mapped to an array to store the statistical
 values. But strip the AMAF part, which is a block of 30 odd lines of code,
 and I think it can be used for other games basically as-is. I did this not
 because I ever see myself program another game, but because in my experience
 in doing so I get a cleaner separation between modules.

 At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF
 refbot. It wins only about 33%. But that's probably because the 1,000-2,000
 playouts range is the sweet-spot for that particular type of playing engine.
 It doesn't scale from there, whereas the MCTS ref-bot only just gets warmed
 up with 2,000 playouts.

 This leads me to a question. I suppose it might be of some interest to put
 this bot up on CGOS. But what parameters to use? The main one being the
 number of playouts, naturally.

 Mark
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Vlad Dumitrescu
On Wed, Dec 3, 2008 at 05:17, Mark Boon [EMAIL PROTECTED] wrote:
 I had set myself as an arbitrary goal that it should do at least 20K
 playouts. But with real liberties, AMAF and a RAVE formula I got stuck in
 the 16K-17K range. According to my profiler that is mostly due to the
 expensive formula used to compare nodes, where it says it spends 25% of
 total time. The formula I'm using is:

beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

beta = 1 - log(parent-visits) / 20
UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) )
RAVE = exploration-factor *sqrt( log(parent-visits) /
 (nr-virtual-visits+1) )

Hi,

This is probably something you already do, but just in case you don't,
here comes a simple optimization trick: since parents-visits is an
integer value and in a somewhat bounded range (0..max_playouts), the
logarithm can be precomputed in a table. Then even sqrt(log()) can be
precomputed also. Similarly, a table with sqrt(integer()) can be used
for the other values.
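
For example, a minimal sketch (the table size is an arbitrary assumption):

// Sketch: precompute log(i), sqrt(log(i)) and sqrt(i) for visit counts up to MAX.
static final int MAX = 100000;                 // assumed upper bound on playouts
static final double[] LOG      = new double[MAX + 1];
static final double[] SQRT_LOG = new double[MAX + 1];
static final double[] SQRT     = new double[MAX + 1];
static {
  for (int i = 1; i <= MAX; i++) {
    LOG[i] = Math.log(i);
    SQRT_LOG[i] = Math.sqrt(LOG[i]);
    SQRT[i] = Math.sqrt(i);
  }
}
// e.g. sqrt(log(parent-visits) / (nr-visits+1)) == SQRT_LOG[parentVisits] / SQRT[nrVisits + 1]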

best regards,
Vlad
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Mark Boon


On 3-dec-08, at 06:09, Sylvain Gelly wrote:


What I did (was a long time ago, I don't know if it is still used in
Mogo), is to compute the m best moves every so often and most of the
time just do the max over those m moves.
m was on the order of 5, and every so often was an increasing
function like sqrt or log (I don't remember).
That speeds up quite a lot (when the max becomes the bottleneck), and
if you tune the parameters well you almost don't loose any strength at
a fixed number of playouts.



I thought about that, but I was afraid the code would become too  
obscure. After all, this is supposed to be a reference  
implementation. But maybe I should actually give it a try to see what  
it would look like.


I also played with precomputing a table of the log() function but  
found little speedup. Instead I decided to (p)recompute the log(nr- 
parent-visits) only once in 8 times. According to my profiler that  
reduced the log() from using 8% to 1% of computation time. Another  
thing I did was replace log(nr-virtual-parent-visits) in the RAVE  
calculation by the same log(nr-parent-visits) used in the UCT  
calculation, cutting both the number of times I need to call log in  
half and reducing the amount of storage in the node a little. After  
these modifications I didn't notice any degradation in play.
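
A sketch of that idea (field names invented, not the actual code):

// Sketch: refresh log(visits) only once every 8 visits and reuse the
// same cached value in both the UCT and the RAVE term.
static class Node {
  int visits;
  double cachedLogVisits;

  double logVisits() {
    if ((visits & 7) == 0 || cachedLogVisits == 0.0)   // recompute 1 time in 8
      cachedLogVisits = Math.log(visits + 1);
    return cachedLogVisits;
  }
}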


In terms of memory use, I did a few obvious things to keep storage of  
the nodes to a minimum. I had seen people write about memory usage of  
the tree, but never understood the concerns. But that was because I  
always expanded the tree one node at a time. Of course, if you create  
all the children in one go like this reference implementation does,  
it's a little bit a different story. But still, I have to push my  
computer hard to run out of space before I run out of time so I  
didn't see much need to reduce memory too aggressively. Using just a  
moderately small number of playouts before expansion is already  
enough to never have memory problems. But if memory is a concern to  
anyone I see two obvious ways to make make significant reductions.  
One is to flatten the data-structure, reducing the object overhead  
Java imposes. That could save up to a third.  The other is to use  
smaller placeholders for nodes that have been created, but not yet  
visited. Once a node gets visited it can be replaced by a full-blown  
node, but until then all you need is the move-coordinate and the AMAF  
statistics. That should save up to 75% or so.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Heikki Levanto
On Wed, Dec 03, 2008 at 09:55:07AM -0200, Mark Boon wrote:
 I thought about that, but I was afraid the code would become too  
 obscure. After all, this is supposed to be a reference  
 implementation. But maybe I should actually give it a try to see what  
 it would look like.

I agree that the reference implementation should be maximally clear code.
Performance is not all that important here, correctness is.


Having said that, I can't help responding to one detail:

 I had seen people write about memory usage of  the tree, but never
 understood the concerns. 

One thing to remember is that more memory use means more cache misses, and
more access to the main memory. On modern computers, those can cost as much
as executing a thousand instructions! So memory optimizing can often pay off,
also with respect to time!

Of course, Java does a lot of memory management behind the scenes, so
optimizing such can be harder... But when writing in C or C++, it does make
sense.

  

  -H

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Mark Boon


On 3-dec-08, at 10:31, Heikki Levanto wrote:


Having said that, I can't help responding to one detail:


I had seen people write about memory usage of  the tree, but never
understood the concerns.


One thing to remember is that more memory use means more cache  
misses, and
more access to the main memory. On modern computers, those can cost  
as much
as executing a thousand instructions! So memory optimizing can  
often pay off,

also with respect to time!

Of course, Java does a lot of memory management behind the scenes, so
optimizing such can be harder... But when writing in C or C++, it  
does make

sense.



I've seen this stated often. But color me skeptical. If what you say  
is true, then the MC-AMAF bot, which hardly uses any memory at all  
and which fits in the secondary cache entirely, code and data  
combined, should show an enormous speed-up in terms of number of  
playouts per second. Instead it does only about twice the number of  
playouts per second as the tree-search version, which can be almost  
if not entirely explained by the extra work that needs to be done to  
traverse and maintain the tree. If this is because I use Java, then  
Don's concise C implementation of the MC-AMAF bot should be a lot  
faster than my bloated Java version. Which, ahem, is not the case at  
all. I think a well-crafted C program can be up to twice as fast as a  
similar Java program. But I doubt that has much to do with cache misses.


I think in computer-Go, trying to control cache misses is futile no  
matter the language. Maybe you can do it for a relatively trivial,  
well-understood and stable sub-part. But then you risk losing the  
gain again as soon as it gets incorporated into a larger whole. So  
your efforts go to waste.


So this is in the show me category for me. You can use C or C++ if  
you like, but show me a tree-search bot that speeds up the number of  
playouts significantly by reducing the node-size. Maybe it's  
possible, but I don't believe it until I see it. This could be  
different for Chess programs, at least I've seen it claimed often.  
But I don't envy them, it must be excruciatingly frustrating to deal  
with.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Heikki Levanto
On Wed, Dec 03, 2008 at 11:24:22AM -0200, Mark Boon wrote:
 
 Heikki wrote:
 One thing to remember is that more memory use means more cache  misses,
 and more access to the main memory. On modern computers, those can cost
 as much as executing a thousand instructions! So memory optimizing can
 often pay off, also with respect to time!

 
 I've seen this stated often. But color me skeptical.

Maybe I was quoting common wisdom without having checked it personally. I
only remember one case - not related to go at all - where optimizing the
memory stuff made a noticeable difference.

I would also like to see hard evidence. But not badly enough to put in the
necessary work to get some ;-)

  - H

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Don Dailey
I had a chess program years ago that was blindingly fast on some
computers,  very slow on others.   It was all about the cache.  The move
generator was hard coded for each piece on each square.  For instance a
white pawn on d7 had its very own specialized move generator.
There was a function for each piece of each color on each square. 

By doing this I avoided the majority of conditional tests.  I did not
have to test for board edge, pawn on 7th, pawn on the second rank, etc.
All the logic was unrolled and I basically had to write a program to
write the move generator.  

The machine I developed this program on had a lot of cache and it was
not a problem.  But when running on, for instance a laptop,  it would
slow down enormously - something like 4 to 1 where other cache friendly
programs would slow down perhaps 25%.   

I don't keep up much with processor technologies now and I don't know
how much data vs instruction cache modern setups have, but I know it is
far better than it used to be.   I bet that old program would run well
on modern computers. 

One trick, which you are probably doing, is to keep data structures
together in memory.  For instance it is better to have children next to
each other so that you are not accessing memory all over the place,
perhaps incurring several cache misses instead of one. 


On Wed, 2008-12-03 at 11:24 -0200, Mark Boon wrote:
 On 3-dec-08, at 10:31, Heikki Levanto wrote:
 
  Having said that, I can't help responding to one detail:
 
  I had seen people write about memory usage of  the tree, but never
  understood the concerns.
 
  One thing to remember is that more memory use means more cache  
  misses, and
  more access to the main memory. On modern computers, those can cost  
  as much
  as executing a thousand instructions! So memory optimizing can  
  often pay off,
  also with respect to time!
 
  Of course, Java does a lot of memory management behind the scenes, so
  optimizing such can be harder... But when writing in C or C++, it  
  does make
  sense.
 
 
 I've seen this stated often. But color me skeptical. If what you say  
 is true, then the MC-AMAF bot, which hardly uses any memory at all  
 and which fits in the secondary cache entirely, code and data  
 combined, should show an enormous speed-up in terms of number of  
 playouts per second. Instead it does only about twice the number of  
 playouts per second as the tree-search version, which can be almost  
 if not entirely explained by the extra work that needs to be done to  
 traverse and maintain the tree. If this is because I use Java, then  
 Don's concise C implementation of the MC-AMAF bot should be a lot  
 faster than my bloated Java version. Which, ahem, is not the case at  
 all. I think a well-crafted C program can be up to twice as fast as a  
 similar Java program. But I doubt that has much to do with cache misses.
 
 I think in computer-Go, trying to control cache misses is futile no  
 matter the language. Maybe you can do it for a relatively trivial,  
 well-understood and stable sub-part. But then you risk losing the  
 gain again as soon as it gets incorporated into a larger whole. So  
 your efforts go to waste.
 
 So this is in the "show me" category for me. You can use C or C++ if  
 you like, but show me a tree-search bot that speeds up the number of  
 playouts significantly by reducing the node-size. Maybe it's  
 possible, but I don't believe it until I see it. This could be  
 different for Chess programs, at least I've seen it claimed often.  
 But I don't envy them, it must be excruciatingly frustrating to deal  
 with.
 
 Mark
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/


signature.asc
Description: This is a digitally signed message part
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-03 Thread Don Dailey
On Wed, 2008-12-03 at 11:24 -0200, Mark Boon wrote:
 If this is because I use Java, then  
 Don's concise C implementation of the MC-AMAF bot should be a lot  
 faster than my bloated Java version.

I don't remember the numbers, but my own java and C implementation were
written in the same style - same basic data structure and just about the
only difference was language.   I did do some kind of list trick using
pointers than you cannot do in java, at least not easily and efficiently
but I don't think it made that much of a difference.

The Java version was not that bad for speed compared to C; I think Java was
something like 2/3 of the speed of C, or close to that.  But if you get into
heavy duty data structures Java hurts - you won't get as big a tree in
Java, for instance, and I'm sure caching will be a bigger issue.


- Don



signature.asc
Description: This is a digitally signed message part
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread Mark Boon
I have made some minor performance improvements and this is as far as  
I intend to take this particular project. I might make some small  
changes if necessary, but most likely I'll leave this largely  
unchanged from now.


I had set myself as an arbitrary goal that it should do at least 20K  
playouts. But with real liberties, AMAF and a RAVE formula I got  
stuck in the 16K-17K range. According to my profiler that is mostly  
due to the expensive formula used to compare nodes, where it says it  
spends 25% of total time. The formula I'm using is:


beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

	beta = 1 - log(parent-visits) / 20
	UCT  = exploration-factor * sqrt( log(parent-visits) / (nr-visits+1) )
	RAVE = exploration-factor * sqrt( log(parent-visits) / (nr-virtual-visits+1) )
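
Transcribed directly into code, that comparison value would look something
like the following (just a sketch; the names are made up, it assumes
win-ratio = wins/visits and virtual-win-ratio = virtual-wins/virtual-visits,
and that the parent has at least one visit):

  // Sketch of the node-comparison value above. The log and sqrt calls are
  // what make it expensive; log(parent-visits) can at least be computed
  // once per selection step instead of once per child.
  static double nodeValue(double explorationFactor, int parentVisits,
                          int wins, int visits,
                          int virtualWins, int virtualVisits) {
      double logParent = Math.log(parentVisits);
      double beta = 1.0 - logParent / 20.0;
      double uct  = explorationFactor * Math.sqrt(logParent / (visits + 1));
      double rave = explorationFactor * Math.sqrt(logParent / (virtualVisits + 1));
      double winRatio        = visits == 0 ? 0.0 : (double) wins / visits;
      double virtualWinRatio = virtualVisits == 0 ? 0.0 : (double) virtualWins / virtualVisits;
      return beta * (virtualWinRatio + rave) + (1.0 - beta) * (winRatio + uct);
  }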


There are probably quite a few possibilities still to tune this  
program with regards to playing strength and performance. But I felt  
it doesn't help to obscure the implementation by too many details.


The implementation of the search algorithm was entirely game- 
independent, until I introduced AMAF. I didn't see how to fix that,  
as there's no way getting around that it's based on the fact that a  
move is represented by a single coordinate, which is mapped to an  
array to store the statistical values. But strip the AMAF part, which  
is a block of 30 odd lines of code, and I think it can be used for  
other games basically as-is. I did this not because I ever see myself  
program another game, but because in my experience in doing so I get  
a cleaner separation between modules.


At 2,000 playouts, it's still quite a bit weaker than the plain MC- 
AMAF refbot. It wins only about 33%. But that's probably because the  
1,000-2,000 playouts range is the sweet-spot for that particular type  
of playing engine. It doesn't scale from there, whereas the MCTS ref- 
bot only just gets warmed up with 2,000 playouts.


This leads me to a question. I suppose it might be of some interest  
to put this bot up on CGOS. But what parameters to use? The main one  
being the number of playouts, naturally.


Mark
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread dhillismail
There is another speedup trick that might interest you. IIRC Lukasz Lew came up 
with it, but I forget what he called it. After calculating the selected move 
for an internal node (going through UCT and RAVE and whatnot) store that move. 
Then, for the next N visits to that node (where N is 5 or 10 or so), just 
repeat that move without having to calculate what the move would be.
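
Something like this, if I understand the trick correctly (a sketch; the field
names and the choice of N are made up):

  // Recompute the full UCT/RAVE selection only every N visits to a node;
  // in between, just replay the stored move.
  final class Node {
      int cachedMove = -1;   // move picked by the full formula last time
      int reuseLeft  = 0;    // how many more visits may reuse it
  }

  final class ReuseSelection {
      static final int REUSE_COUNT = 8;   // "N is 5 or 10 or so"

      static int selectMove(Node node,
                            java.util.function.ToIntFunction<Node> fullSelection) {
          if (node.cachedMove >= 0 && node.reuseLeft > 0) {
              node.reuseLeft--;
              return node.cachedMove;                // skip the expensive formula
          }
          int best = fullSelection.applyAsInt(node); // the UCT/RAVE computation
          node.cachedMove = best;
          node.reuseLeft  = REUSE_COUNT - 1;
          return best;
      }
  }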

- Dave Hillis

-Original Message-
From: Mark Boon [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Tue, 2 Dec 2008 11:17 pm
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


I have made some minor performance improvements and this is as far as I intend 
to take this particular project. I might make some small changes if necessary, 
but most likely I'll leave this largely unchanged from now.

I had set myself as an arbitrary goal that it should do at least 20K playouts.
But with real liberties, AMAF and a RAVE formula I got stuck in the 16K-17K
range. According to my profiler that is mostly due to the expensive formula
used to compare nodes, where it says it spends 25% of total time. The formula
I'm using is:

  beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

  beta = 1 - log(parent-visits) / 20
  UCT  = exploration-factor * sqrt( log(parent-visits) / (nr-visits+1) )
  RAVE = exploration-factor * sqrt( log(parent-visits) / (nr-virtual-visits+1) )

There are probably quite a few possibilities still to tune this program with
regards to playing strength and performance. But I felt it doesn't help to
obscure the implementation by too many details.

The implementation of the search algorithm was entirely game-independent, until
I introduced AMAF. I didn't see how to fix that, as there's no way getting
around that it's based on the fact that a move is represented by a single
coordinate, which is mapped to an array to store the statistical values. But
strip the AMAF part, which is a block of 30 odd lines of code, and I think it
can be used for other games basically as-is. I did this not because I ever see
myself program another game, but because in my experience in doing so I get a
cleaner separation between modules.

At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF refbot.
It wins only about 33%. But that's probably because the 1,000-2,000 playouts
range is the sweet-spot for that particular type of playing engine. It doesn't
scale from there, whereas the MCTS ref-bot only just gets warmed up with 2,000
playouts.

This leads me to a question. I suppose it might be of some interest to put this
bot up on CGOS. But what parameters to use? The main one being the number of
playouts, naturally.

Mark
___
computer-go mailing list
[EMAIL PROTECTED]
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread Michael Williams
That's going to repeat the same exact path through the tree three times, isn't it?  If so, it seems like it would be more efficient to do N playouts from the 
leaf after each traversal of the tree.



[EMAIL PROTECTED] wrote:
There is another speedup trick that might interest you. IIRC Lukasz Lew 
came up with it, but I forget what he called it. After calculating the 
selected move for an internal node (going through UCT and RAVE and 
whatnot) store that move. Then, for the next N visits to that node 
(where N is 5 or 10 or so), just repeat that move without having to 
calculate what the move would be.


- Dave Hillis

-Original Message-
From: Mark Boon [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Tue, 2 Dec 2008 11:17 pm
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot

I have made some minor performance improvements and this is as far as I 
intend to take this particular project. I might make some small changes 
if necessary, but most likely I'll leave this largely unchanged from now. 
 
I had set myself as an arbitrary goal that it should do at least 20K 
playouts. But with real liberties, AMAF and a RAVE formula I got stuck 
in the 16K-17K range. According to my profiler that is mostly due to the 
expensive formula used to compare nodes, where it says it spends 25% of 
total time. The formula I'm using is: 
 
  beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT) 
 
  beta = 1 - log(parent-visits) / 20 
  UCT = exploration-factor *sqrt( log(parent-visits) / (nr-visits+1) ) 
  RAVE = exploration-factor *sqrt( log(parent-visits) / 
(nr-virtual-visits+1) ) 
 
There are probably quite a few possibilities still to tune this program 
with regards to playing strength and performance. But I felt it doesn't 
help to obscure the implementation by too many details. 
 
The implementation of the search algorithm was entirely 
game-independent, until I introduced AMAF. I didn't see how to fix that, 
as there's no way getting around that it's based on the fact that a move 
is represented by a single coordinate, which is mapped to an array to 
store the statistical values. But strip the AMAF part, which is a block 
of 30 odd lines of code, and I think it can be used for other games 
basically as-is. I did this not because I ever see myself program 
another game, but because in my experience in doing so I get a cleaner 
separation between modules. 
 
At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF 
refbot. It wins only about 33%. But that's probably because the 
1,000-2,000 playouts range is the sweet-spot for that particular type of 
playing engine. It doesn't scale from there, whereas the MCTS ref-bot 
only just gets warmed up with 2,000 playouts. 
 
This leads me to a question. I suppose it might be of some interest to 
put this bot up on CGOS. But what parameters to use? The main one being 
the number of playouts, naturally. 
 
Mark 
___ 
computer-go mailing list 
computer-go@computer-go.org mailto:computer-go@computer-go.org 
http://www.computer-go.org/mailman/listinfo/computer-go/ 








___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-12-02 Thread dhillismail
Hmm.. it could be that N is picked randomly each time... now I can't seem to 
find the description and my memory is not serving me well here. Perhaps someone 
else remembers? I'll try to track it down.

- Dave Hillis


-Original Message-
From: Michael Williams [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Wed, 3 Dec 2008 12:14 am
Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot


That's going to repeat the same exact path through the tree three times, isn't
it? If so, it seems like it would be more efficient to do N playouts from the
leaf after each traversal of the tree.

[EMAIL PROTECTED] wrote:
 There is another speedup trick that might interest you. IIRC Lukasz Lew
 came up with it, but I forget what he called it. After calculating the
 selected move for an internal node (going through UCT and RAVE and whatnot)
 store that move. Then, for the next N visits to that node (where N is 5 or
 10 or so), just repeat that move without having to calculate what the move
 would be.

 - Dave Hillis

 -Original Message-
 From: Mark Boon [EMAIL PROTECTED]
 To: computer-go computer-go@computer-go.org
 Sent: Tue, 2 Dec 2008 11:17 pm
 Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot

  I have made some minor performance improvements and this is as far as I
  intend to take this particular project. I might make some small changes
  if necessary, but most likely I'll leave this largely unchanged from now.

  I had set myself as an arbitrary goal that it should do at least 20K
  playouts. But with real liberties, AMAF and a RAVE formula I got stuck in
  the 16K-17K range. According to my profiler that is mostly due to the
  expensive formula used to compare nodes, where it says it spends 25% of
  total time. The formula I'm using is:

   beta * (virtual-win-ratio + RAVE) + (1-beta) * (win-ratio + UCT)

   beta = 1 - log(parent-visits) / 20
   UCT  = exploration-factor * sqrt( log(parent-visits) / (nr-visits+1) )
   RAVE = exploration-factor * sqrt( log(parent-visits) / (nr-virtual-visits+1) )

  There are probably quite a few possibilities still to tune this program
  with regards to playing strength and performance. But I felt it doesn't
  help to obscure the implementation by too many details.

  The implementation of the search algorithm was entirely game-independent,
  until I introduced AMAF. I didn't see how to fix that, as there's no way
  getting around that it's based on the fact that a move is represented by
  a single coordinate, which is mapped to an array to store the statistical
  values. But strip the AMAF part, which is a block of 30 odd lines of code,
  and I think it can be used for other games basically as-is. I did this not
  because I ever see myself program another game, but because in my
  experience in doing so I get a cleaner separation between modules.

  At 2,000 playouts, it's still quite a bit weaker than the plain MC-AMAF
  refbot. It wins only about 33%. But that's probably because the
  1,000-2,000 playouts range is the sweet-spot for that particular type of
  playing engine. It doesn't scale from there, whereas the MCTS ref-bot
  only just gets warmed up with 2,000 playouts.

  This leads me to a question. I suppose it might be of some interest to
  put this bot up on CGOS. But what parameters to use? The main one being
  the number of playouts, naturally.

  Mark
  ___
  computer-go mailing list
  computer-go@computer-go.org mailto:computer-go@computer-go.org
  http://www.computer-go.org/mailman/listinfo/computer-go/

 ___
 computer-go mailing list
 [EMAIL PROTECTED]
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
[EMAIL PROTECTED]
http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-28 Thread Denis fidaali

 My own experiments showed a very big increase in strength,
especially with about 1000 light playouts. I had many features
and parameters which may make a difference. But my
main hypothesis was that it shouldn't ...

I think that I weighted real simulations (aka the first move) 10 times more than
virtual ones (aka AMAF moves).

I assume that by expansions you mean allocating data for every child
of a node. So let's assume that your root node is called A; you would
then allocate for every one of its children. Then, when making a simulation
running from A, you would first apply the UCT formula for
determining which first move you should make ... then
you would both get back the AMAF score for every move
that has been played during the simulation, for virtual scoring,
AND you would use the true first move for real scoring (the normal UCT part).

Of course, this shouldn't behave exactly like AMAF, because for one thing
you can get a better judgment for the first move because of UCT, and
then you also make the first move weigh more. (Although you probably
technically should also do so with AMAF, despite the first move
being still 100% random - which is part of what
weighted AMAF does. But the first-move weight would be arbitrary,
thus defeating the purpose of a standard way.)

Last, you could try to downgrade the weight of the virtual score part
as the number of simulations grows.

Like with (if nbsim < 1000):

  winRatio = [realRatio * (1000+nbsim)]/3000 + [virtualRatio * (2000-(1000+nbsim))]/3000

I'm not sure about the formula :) but you get the idea of taking the realRatio
more and more into account. I think that what I did was that I didn't take the
realRatio into account at all at first for estimating the winRatio, and then
increased it linearly to the point where I wouldn't take the virtualRatio into
account.
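
A sketch of that kind of linear fade from the virtual (AMAF) ratio towards the
real ratio, with the weights normalized so they sum to one; the constants are
as arbitrary as in the formula above:

  // Blend the real and virtual win ratios, trusting the real ratio more
  // as the number of simulations grows. Illustrative constants only.
  static double blendedWinRatio(double realRatio, double virtualRatio, int nbSim) {
      int cap = 1000;                                  // fade finished after 'cap' sims
      double w = Math.min(1.0, (double) nbSim / cap);  // 0 = all virtual, 1 = all real
      return w * realRatio + (1.0 - w) * virtualRatio;
  }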


The drawback with all that is that you will certainly have to make several
versions, although they would share a lot of code. I don't think that trying to
get one version to behave as several is a good idea. You probably should keep
the code safe once you have done a lot of testing of it, for reference
purposes. Then you'll be able to know exactly the strength of a particular
piece of software, even two years from now. I didn't do that. I tried to make
it evolve. And with poor svn versioning discipline, and poor management of the
link between measurements and software, I'm now at a point where all I can do
is make random guesses, although I made millions of experiments in the past
years :) I just can't link those experiments back to a fixed piece of software.

 Then, if by any means you think a piece of code is bug free - which is hard to
bet on - you really should put it somewhere, with all the proof (and
experiments) you can associate with it (even on a web page if you ask me, or in
a thesis). To me that is the most useful part of the standard ref bot:
because a lot of people have tried it, you can be sure of the exact link
between the software and the experimental results. It makes assertions
reproducible, and that's really great.


 From: [EMAIL PROTECTED]
 Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
 Date: Fri, 28 Nov 2008 01:48:09 -0200
 To: computer-go@computer-go.org
 CC: 
 
 
 On 27-nov-08, at 19:50, Denis fidaali wrote:
 
  So, you use AMAF for simulating the first UCT evaluations ?
  I though the classical way to use AMAF, was to affect only the
  win/lose ratio portion of the uct equation.
  Obvioulsy it should be allowed to use an arbitrary
  large number of AMAF simulation accumulating them longer
  than what it take to expand a node.
  I think that a classical way to affect the win/ratio is to
  decrease the effect of the AMAF correction as the number
  of simulation grows.
 
  If you test with a very low number of simulation
  (in the 1000 - 3000 range), i think you should be
  able to get out a very nice improvement out of the
  AMAF version. If you don't, i would think that something
  is wrong somewhere.
 
  What test process do you use for this version ?
 
 I tested it mostly doing 2,000 playouts. When AMAF is true I create a  
 map of virtual-win values of all the moves played during a playout.  
 These values get accumulated over all the playouts (not just the  
 first ones). The 'virtual-value' of a move is calculated as follows:
 
 exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr-playouts*2 + nr-virtual-playouts) )
 
 where the exploration-factor is currently sqrt(0.2) and UCT is
 sqrt( log( nr-parent-playouts ) / ( nr-playouts+1 ) )
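 
 Spelled out as code, that weighting is roughly the following (a sketch with
 made-up names, assuming the parent playout count is at least 1):
 
   // Real results count twice as much as virtual (AMAF) results.
   static double virtualValue(int nrWins, int nrPlayouts,
                              int nrVirtualWins, int nrVirtualPlayouts,
                              int nrParentPlayouts) {
       double explorationFactor = Math.sqrt(0.2);
       double uct = Math.sqrt(Math.log(nrParentPlayouts) / (nrPlayouts + 1));
       double ratio = (nrWins * 2.0 + nrVirtualWins)
                    / (nrPlayouts * 2.0 + nrVirtualPlayouts);
       return explorationFactor * uct + ratio;
   }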
 
 Like I said, I haven't had time to experiment much so this formula  
 may not be any good. I had also expected to see some positive effect  
 of the virtual-win / virtual-playout ratio from AMAF, but I see none.  
 Of course it's also possible I have a different kind of bug still.
 
 What happens in my 'formula' is that when it never expands beyond the  
 first level, which is what happens if the number of simulations is  
 equal

Re: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-28 Thread Mark Boon
 :
because a lot of persons have tried it, you can be sure of the  
exact link
between the software, and the experimental results. It makes  
assertions

reproducibles, and that's really great.


 From: [EMAIL PROTECTED]
 Subject: Re: [computer-go] Monte-Carlo Tree Search reference bot
 Date: Fri, 28 Nov 2008 01:48:09 -0200
 To: computer-go@computer-go.org
 CC:


 On 27-nov-08, at 19:50, Denis fidaali wrote:

  So, you use AMAF for simulating the first UCT evaluations ?
  I though the classical way to use AMAF, was to affect only the
  win/lose ratio portion of the uct equation.
  Obvioulsy it should be allowed to use an arbitrary
  large number of AMAF simulation accumulating them longer
  than what it take to expand a node.
  I think that a classical way to affect the win/ratio is to
  decrease the effect of the AMAF correction as the number
  of simulation grows.
 
  If you test with a very low number of simulation
  (in the 1000 - 3000 range), i think you should be
  able to get out a very nice improvement out of the
  AMAF version. If you don't, i would think that something
  is wrong somewhere.
 
  What test process do you use for this version ?

 I tested it mostly doing 2,000 playouts. When AMAF is true I  
create a

 map of virtual-win values of all the moves played during a playout.
 These values get accumulated over all the playouts (not just the
 first ones). The 'virtual-value' of a move is calculated as follows:

 exploration-factor * UCT + ( (nr-wins*2 + nr-virtual-wins) / (nr-
 playouts*2 + nr-virtual-playouts))

 where the exploration-factor is currently sqrt(0.2) and UCT is sqrt
 ( log( nr-parent-playouts ) / ( nr-playouts+1) )

 Like I said, I haven't had time to experiment much so this formula
 may not be any good. I had also expected to see some positive effect
 of the virtual-win / virtual-playout ratio from AMAF, but I see  
none.

 Of course it's also possible I have a different kind of bug still.

 What happens in my 'formula' is that when it never expands beyond  
the

 first level, which is what happens if the number of simulations is
 equal to the number of simulations before expansion, the virtual-
 value becomes completely determined by nr-virtual-wins / nr-virtual-
 playouts making it equivalent to the original ref-bot. In case it
 does expand further and creates a tree, the actual win-loss ratio is
 weighed twice as heavily as the virtual win-loss ratio. That seemed
 like a reasonable first try. I have tried a few others, but usually
 didn't get much different results or much worse results.

 Mark

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] Monte-Carlo Tree Search reference bot

2008-11-27 Thread Mark Boon
I have added a MCTS implementation to my reference-bot project. It  
allows you to specify how many nodes to search, after how many nodes  
to expand and whether to use AMAF or not. If you specify the number  
of nodes before expansion to be the same as the total number of nodes  
to search and set AMAF to true you get the equivalent of Don's  
original ref-bot. When you set AMAF to false and set the number of  
nodes before expansion to 1 you get my original UCT search algorithm.


Between those extremes there might be a good formula to combine AMAF  
with tree-search, but unfortunately I haven't had time lately to look  
for one. The few small attempts I made show no benefit using AMAF in  
tree-search, only when used on a single level. The contrast between  
the two extremes is very stark, so I'm actually convinced there  
should be a way to use both. This implementation is also quite a bit  
slower than my original search algorithm but I also didn't have time  
yet to trace it. It might simply be due to the different expansion  
method, which is much more expensive with a value of 1. Also,  
searching for the best UCT node gets more expensive with more  
(unused) nodes on each level. Using a higher expansion value may  
alleviate the performance hit. Anyway I think this is a reasonable  
starting point.


At first I intended to create a different project for the search  
reference bot. But half of the code (the MC part) is the same. And I  
don't want to end up having to maintain the same code in two places.  
I also didn't want to separate out some code into a separate library  
and making the structure for the simple ref-bot more complicated.  
This organization may need some more thought though.


I'll update the project pages tomorrow.

Mark



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Monte-Carlo Tree Search reference bot

2008-11-27 Thread Denis fidaali

So, you use AMAF for simulating the first UCT evaluations?
I thought the classical way to use AMAF was to affect only the
win/lose ratio portion of the UCT equation.
Obviously it should be allowed to use an arbitrarily
large number of AMAF simulations, accumulating them for longer
than what it takes to expand a node.
I think that a classical way to affect the win ratio is to
decrease the effect of the AMAF correction as the number
of simulations grows.

If you test with a very low number of simulations
(in the 1000 - 3000 range), I think you should be
able to get a very nice improvement out of the
AMAF version. If you don't, I would think that something
is wrong somewhere.

What test process do you use for this version?

 From: [EMAIL PROTECTED]
 Date: Thu, 27 Nov 2008 18:15:34 -0200
 To: computer-go@computer-go.org
 CC: 
 Subject: [computer-go] Monte-Carlo Tree Search reference bot
 
 I have added a MCTS implementation to my reference-bot project. It  
 allows you to specify how many nodes to search, after how many nodes  
 to expand and whether to use AMAF or not. If you specify the number  
 of nodes before expansion to be the same as the total number of nodes  
 to search and set AMAF to true you get the equivalent of Don's  
 original ref-bot. When you set AMAF to false and set the number of  
 nodes before epxansion to 1 you get my original UCT search algorithm.
 
 Between those extremes there might be a good formula to combine AMAF  
 with tree-search, but unfortunately I haven't had time lately to look  
 for one. The few small attempts I made show no benefit using AMAF in  
 tree-search, only when used on a single level. The contrast between  
 the two exptremes is very stark, so I'm actually convinced there  
 should be a way to use both. This implementation is also quite a bit  
 slower than my original search algorithm but I also didn't have time  
 yet to trace it. It might simply be due to the different expansion  
 method, which is much more expensive with a value of 1. Also,  
 searching for the best UCT node gets more expensive with more  
 (unused) nodes on each level. Using a higher expansion value may  
 alleviate the performance hit. Anyway I think this is a reasonable  
 starting point.
 
 At first I intended to create a different project for the search  
 reference bot. But half of the code (the MC part) is the same. And I  
 don't want to end up having to maintain the same code in two places.  
 I also didn't want to separate out some code into a separate library  
 and making the structure for the simple ref-bot more complicated.  
 This organization may need some more thought though.
 
 I'll update the project pages tomorrow.
 
 Mark
 
 
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-17 Thread Magnus Persson
I use a method, initially from the Mogo team, that sort of randomizes
the position before running the heavy playout. One simply plays
uniformly random *non contact* moves. The effect of this is that it
preserves the shapes of stones on the board, but it prevents the heavy
playouts from playing long sequences deterministically. I have not
tested this, but I felt that the evaluation on large boards got less
noisy and/or biased doing this.
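
A sketch of that randomization step (the board API here is made up; the number
of random moves to play is a free parameter):

  // Before a heavy playout, play a few uniformly random non-contact moves
  // (moves with no stone on any adjacent point). This perturbs the position
  // without destroying the existing shapes.
  static void playRandomNonContactMoves(Board board, java.util.Random rng, int nMoves) {
      for (int i = 0; i < nMoves; i++) {
          int[] legal = board.legalMoves();
          int[] nonContact = new int[legal.length];
          int n = 0;
          for (int move : legal) {
              if (!board.hasAdjacentStone(move)) nonContact[n++] = move;
          }
          if (n == 0) return;                       // nothing suitable left
          board.play(nonContact[rng.nextInt(n)]);   // uniformly random choice
      }
  }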


Magnus

Quoting Michael Williams [EMAIL PROTECTED]:


It seems move selection in the playouts should be very random at first
and more deterministic toward the end of the playout.  Has anyone tried
that?



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-17 Thread Magnus Persson
I have to add that it is possible that a large part of the advantage
from using heavy playouts in Valkyria comes from using the same code
to bias the exploration part of MCTS.


I could probably test it by simply relying completely on AMAF with the  
proximity heuristic as the only bias.


A second experiment would be to rewrite the playouts and make them superfast.

-Magnus

Quoting Mark Boon [EMAIL PROTECTED]:



On 17-nov-08, at 02:42, George Dahl wrote:


So you say that: ...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation., could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.


That's exactly what it's suggesting. But as I said, I need to do some
more testing to make a hard case for that.

Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/




--
Magnus Persson
Berlin, Germany
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-17 Thread Darren Cook
 So you say that: ...I'm observing that most of the increase in level
 comes from the selection during exploration and only in small part
 from the selection during simulation., could you elaborate at all?
 This is very interesting.  That almost suggests it might be fruitful
 to use the patterns in the tree, but keep lighter playouts.

If I understand David Fotland [1] (and you) correctly this is the same
approach Many Faces uses. Keep the playouts light, but it is worth
spending effort to guide the MCTS tree.

I also think Remi was saying the same in his paper [2] on using move
patterns. E.g. in section 4.1 he says he just used a few features in the
simulations, but used all the features to control the progressive
widening of the monte-carlo search tree (section 4.2).

Darren


[1]: http://www.mail-archive.com/computer-go@computer-go.org/msg09759.html

[2]:
http://remi.coulom.free.fr/Amsterdam2007/

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
open source dictionary/semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Heikki Levanto
On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
 Being a computer scientist but new to go, i can grasp some of the theory.
 The question I was trying to get across was:
 
 In a game of self play, if both parties are employing only monte carlo,
 surely its not a good conceptual representation of a human, and if the
 reinforcement learning is based on random simulations wouldnt it be very
 weak when playing a real human?


Here is another amateur answering.

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.

What they do is that they build a quite traditional search tree starting from
the current position. They use a random playout as a crude way to evaluate a
position. Based on this evaluation, they decide which branch of the tree to
expand.

This is the way I understand the random playouts: If, in a given position,
white is clearly ahead, he will win the game if both players play perfect
moves. He is also likely to win if both players play reasonably good moves
(say, like human amateurs), but there is a bit more of a chance that one
player hits upon a good combination which the other misses, so the result is
not quite as reliable. If the playouts are totally random, there is still a
better chance for white to win, because both players make equally bad moves.
The results have much more variation, of course. So far it does not sound
like a very good proposal, but things change if you consider the facts that
we don't have perfect oracles, and that good humans are slow to play out a
position and can not be integrated into a computer program. Whereas random
playouts can be done awfully fast, tens of thousands of playouts in a second.
Averaging the results gives a fair indication of who is more likely to win
from that position, just what is needed to decide which part of the search
tree to expand.

The 'random' playouts are not totally random, they include a minimum of
tactical rules (do not fill own eyes, do not pass as long as there are valid
moves). Even this little will produce a few blind spots, moves that the
playouts can not see, and systematically wrong results. Adding more
go-specific knowledge can make the results much better (more likely to be
right), but can also add some more blind spots. And it costs time, reducing
the number of playouts the program can make.
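
A minimal sketch of such a playout loop (the board API is made up, and a move
cap is added just to guarantee termination):

  // Light playout: uniformly random legal moves, never filling a player's
  // own eyes, passing only when nothing else is available.
  static int lightPlayout(Board board, java.util.Random rng, int maxMoves) {
      int consecutivePasses = 0;
      for (int i = 0; i < maxMoves && consecutivePasses < 2; i++) {
          int[] moves = board.legalMovesNotFillingOwnEyes();
          if (moves.length == 0) {
              board.pass();
              consecutivePasses++;
          } else {
              board.play(moves[rng.nextInt(moves.length)]);
              consecutivePasses = 0;
          }
      }
      return board.score();  // e.g. 1 if the side being evaluated won, 0 otherwise
  }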

Hope that explains something of the mystery


Regards

   Heikki

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread D Gilder
On Sunday 16 November 2008, Heikki Levanto wrote:
 On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
  Being a computer scientist but new to go, i can grasp some of the theory.
  The question I was trying to get across was:
 
  In a game of self play, if both parties are employing only monte carlo,
  surely its not a good conceptual representation of a human, and if the
  reinforcement learning is based on random simulations wouldnt it be very
  weak when playing a real human?

 Here is another amateur answering.

 The way I understand it, modern Monte Carlo programs do not even try to
 emulate a human player with a random player - obviously that would not
 work.

 What they do is that they build a quite traditional search tree starting
 from the current position. They use a random playout as a crude way to
 evaluate a position. Based on this evaluation, they decide which branch of
 the tree to expand.

 This is the way I understand the random playouts: If, in a given position,
 white is clearly ahead, he will win the game if both parts play perfect
 moves. He is also likely to win if both parts play reasonably good moves
 (say, like human amateurs), but there is a bit more of a chance that one
 player hits upon a good combination which the other misses, so the result
 is not quite as reliable. If the playouts are totally random, there is
 still a better chance for white to win, because both parts make equally bad
 moves. The results have much more variation, of course. So far it does not
 sound like a very good proposal, but things change if you consider the
 facts that we don't have perfecr oracles, and good humans are slow to play
 out a position, and can not be integrated into a computer program. Whereas
 random playouts can be done awfully fast, tens of thousands of playouts in
 a second. Averaging the reuslts gives a fair indication of who is more
 likely to win from that position, just what is needed to decide which part
 of the search tree to expand.

Do you know what use (if any) is made of the standard deviation of the 
results?


 The 'random' playouts are not totally random, they include a minimum of
 tactical rules (do not fill own eyes, do not pass as long as there are
 valid moves). Even this little will produce a few blind spots, moves that
 the playouts can not see, and systematically wrong results. Adding more
 go-specific knowledge can make the results much better (more likely to be
 right), but can also add some more blind spots. And it costs time, reducing
 the number of playouts the program can make.

 Hope that explains something of the mystery


 Regards

Heikki


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


[computer-go] Re: FW: computer-go] Monte carlo play?

2008-11-16 Thread Claus Reinke
 In a game of self play, if both parties are employing only monte
 carlo, ... random simulations... wouldnt it be very weak...
 ... and some playing around I am clearly mistaken because its works
 quite well.
Yes, it doesn't make sense but it does indeed seem to work :-).

Plain Monte-Carlo bots *are* rather weak. But they work better than
might be expected, and they are fairly simple, which is extremely
motivating for potential Go programmers!-) Instead of having to learn
all of Go and a lot of AI before being able to start building anything,
one can start quickly, and then improve together with the bot. And
the current top programs claim plain Monte-Carlo bots as their remote
ancestors, so it isn't a dead end, either (though complete rewrites may
be needed somewhere along the path;-).

Another way to think about playouts is by comparing them with a
human player processing through the ranks:

1 only knowing the rules makes for very inefficient/slow/buggy play
2 playing lots of games quickly has often been recommended to get a
better feeling for the game; personally, I don't like fast games(*), but
at least the worst inadequacies tend to show up even if we only play
equally clueless opponents (probably because weaknesses aren't
evenly distributed, so that peaks in understanding in one player
happen to match weaknesses in understanding in the other)
3 human players learn over time, until all players in a group have
reached an equal level (or have gone as far as they can, with the
effort they can afford to put into the game)
4 now further progress depends on innovation in the existing pool
of players, on joining a different player pool, or on bringing in
other players, preferably players whose strengths expose
weaknesses in the current pool (similar effects can be achieved
by reading books or observing games)

If we allow the learning outcome in 2/3 to be represented as patterns
(wait a minute, I've fallen into that trap before, so I shouldn't do this,
I usually do this, but that would be faster - I wonder whether it'll work),
then random playouts can emulate a weak form of this process. Instead
of learning from lots of games over time, plain Monte-Carlo bots
extrapolate from lots of dumb games, every time they consider their
next move.

Random games are really too dumb to be considered games, but bots
can play lots of them in a short time, so they can at least see trends (a
mixture of on-the-spot learning and very bad reading ahead). If we
assume that the quantity compensates somewhat for the quality, a
Monte-Carlo bot is a bit like beginners after lots of games who
always forget their lessons learned when the session ends.

They may have an idea of who is ahead and what is alive, and be able
to organise their games based on those ideas, but a stronger player (who
remembers his lessons) may be able to rip their ideas apart (you might
both think that group is alive but what do you do if I play here? and who
is ahead if you lose that group?). And then there is the scenario of strong
player visits club and humiliates local champion (you might think that
playing there kills that group, but what if they reply here?).

Adding patterns or other biases to make playouts heavier (both slower
and more relevant for evaluation) is similar to improving reading skills,
but with differences: bots already read to the end, so eliminating useless
moves doesn't improve the playout depth, it improves at best the density
of information to be gained from playouts (so you don't have to play as
many of them, or can get more out of those you do play); but playouts
are also still used for on-the-spot-learning, and biases have limited scope
for success, so one has to be careful not to ruin this aspect. Using a tree
on top of the playouts is not just a way to represent the knowledge
extracted from the playouts, but also helps to direct playout effort towards
relevant situations, and gives a place where traditional Go knowledge
can be built in without affecting the playouts directly (so they become
more of an evaluation function than a candidate move generator).

One problem is that the top programs tend to be fairly close in strength.
There are variations in method, so there is some further progress, but
how much of the human going-through-the-ranks evolution has been
captured in current top programs, I don't know. I suspect that it is less
than their ranks would indicate. There is certainly reading going on,
discussion, and inviting stronger players or different bots.

Oh, and if any of this seems to make sense, remember that I just made
it all up!-) You might want to consult these slides for overviews from
folks who have been in the business:


Old-fashioned Computer Go vs Monte-Carlo Go; by Bruno Bouzy,
Invited tutorial, IEEE 2007 Symposium on Computational Intelligence
in Games, CIG 07, Hawaï, USA, 1st April 2007.
Abstract:


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Hideki Kato
Hello Heikki,

Heikki Levanto: [EMAIL PROTECTED]:
On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
 Being a computer scientist but new to go, i can grasp some of the theory.
 The question I was trying to get across was:
 
 In a game of self play, if both parties are employing only monte carlo,
 surely its not a good conceptual representation of a human, and if the
 reinforcement learning is based on random simulations wouldnt it be very
 weak when playing a real human?


Here is another amateur answering.

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.

I believe CrazyStone's use of patterns does so and it seems 
successful.

Hideki

What they do is that they build a quite traditional search tree starting from
the current position. They use a random playout as a crude way to evaluate a
position. Based on this evaluation, they decide which branch of the tree to
expand.

This is the way I understand the random playouts: If, in a given position,
white is clearly ahead, he will win the game if both parts play perfect
moves. He is also likely to win if both parts play reasonably good moves
(say, like human amateurs), but there is a bit more of a chance that one
player hits upon a good combination which the other misses, so the result is
not quite as reliable. If the playouts are totally random, there is still a
better chance for white to win, because both parts make equally bad moves.
The results have much more variation, of course. So far it does not sound
like a very good proposal, but things change if you consider the facts that
we don't have perfecr oracles, and good humans are slow to play out a
position, and can not be integrated into a computer program. Whereas random
playouts can be done awfully fast, tens of thousands of playouts in a second.
Averaging the reuslts gives a fair indication of who is more likely to win
from that position, just what is needed to decide which part of the search
tree to expand.

The 'random' playouts are not totally random, they include a minimum of
tactical rules (do not fill own eyes, do not pass as long as there are valid
moves). Even this little will produce a few blind spots, moves that the
playouts can not see, and systematically wrong results. Adding more
go-specific knowledge can make the results much better (more likely to be
right), but can also add some more blind spots. And it costs time, reducing
the number of playouts the program can make.

Hope that explains something of the mystery


Regards

   Heikki
--
[EMAIL PROTECTED] (Kato)
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Heikki Levanto
On Sun, Nov 16, 2008 at 11:46:28AM +, D Gilder wrote:
  This is the way I understand the random playouts: If, in a given position,
  white is clearly ahead, he will win the game if both parts play perfect
  moves. He is also likely to win if both parts play reasonably good moves
  (say, like human amateurs), but there is a bit more of a chance that one
  player hits upon a good combination which the other misses, so the result
  is not quite as reliable. If the playouts are totally random, there is
  still a better chance for white to win, because both parts make equally bad
  moves. The results have much more variation, of course. So far it does not
  sound like a very good proposal, but things change if you consider the
  facts that we don't have perfecr oracles, and good humans are slow to play
  out a position, and can not be integrated into a computer program. Whereas
  random playouts can be done awfully fast, tens of thousands of playouts in
  a second. Averaging the reuslts gives a fair indication of who is more
  likely to win from that position, just what is needed to decide which part
  of the search tree to expand.
 
 Do you know what use (if any) is made of the standard deviation of the 
 results?

Now statistics isn't my strong point, but one of the first and most
successful systems was called UCT, for Upper Confidence bounds applied to
Trees. It calculated some sort of expected error, and added that to the
winning ratio. Then it expanded the branch that had the highest value. If that
expansion was a win, the error margin would get smaller, but the average
result would get higher. Perhaps some other branch would be tried next, but
this one would still be a good candidate. If the result was a loss, the
average would drop, and so would the error margin, so this move would become
much less likely to be expanded.

I am sure someone who understands statistics will soon jump in and correct my
explanation :-)
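
The usual UCB1 form of that idea looks roughly like this (a sketch; c is a
tuning constant):

  // Branch value = observed win ratio + an uncertainty bonus that shrinks
  // as the move gets more visits relative to its parent.
  static double ucb1(int wins, int visits, int parentVisits, double c) {
      if (visits == 0) return Double.POSITIVE_INFINITY;  // try unvisited moves first
      double winRatio = (double) wins / visits;
      return winRatio + c * Math.sqrt(Math.log(parentVisits) / visits);
  }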

  - Heikki

-- 
Heikki Levanto   In Murphy We Turst heikki (at) lsd (dot) dk

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Magnus Persson

Quoting Hideki Kato [EMAIL PROTECTED]:


Heikki Levanto: [EMAIL PROTECTED]:

The way I understand it, modern Monte Carlo programs do not even try to
emulate a human player with a random player - obviously that would not work.


I believe CrazyStone's use of patterns does so and it seems
successful.


With Valkyria I try to follow two principles in heavy playouts.


1) In contact fights there are a lot of shapes that are played most of
the time. Thus Valkyria checks, for each move played, whether there is an
obvious local response to it. If so, it plays it deterministically. In many
situations there are two or more such candidates, and then it plays one
of those moves.


2) In many positions the last move played does not trigger any obvious  
response, and then a random move is chosen uniformly.


3) There are moves that are inferior 100% of the time both locally and  
globally. These moves are pruned if they are selected and a new random  
move is chosen as long as there are moves left to try.


I got hundreds of handcoded patterns for both 1 and 3. It would be too  
time consuming to test these patterns, so I use my knowledge and  
intuition (European 2 Dan) to simply decide what patterns to include.
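
As a rough sketch of those three principles in code (the board and pattern
calls here are invented; Valkyria's real hand-coded decision trees are far
more detailed):

  // 1) obvious local replies to the last move are played deterministically;
  // 2) otherwise a uniformly random move is chosen;
  // 3) moves matching an always-inferior pattern are pruned and redrawn.
  static int heavyPlayoutMove(Board board, int lastMove, java.util.Random rng) {
      int[] localReplies = board.obviousLocalReplies(lastMove);
      if (localReplies.length > 0) {
          return localReplies[rng.nextInt(localReplies.length)];
      }
      java.util.List<Integer> pool = new java.util.ArrayList<>();
      for (int m : board.legalMovesNotFillingOwnEyes()) pool.add(m);
      while (!pool.isEmpty()) {
          int idx = rng.nextInt(pool.size());
          int m = pool.get(idx);
          if (!board.isAlwaysInferior(m)) return m;
          pool.remove(idx);                 // pruned; redraw from what is left
      }
      return Board.PASS;                    // hypothetical pass constant
  }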


So Valkyria has a lot of go knowledge, but mostly such knowledge that  
all go players have up to some strength such as perhaps 8-10 kyu. It  
has no knowledge about global matters. The beauty of MC-evaluation is  
that globally strong moves are most of the time evaluated better than  
globally weak moves. Heavy playouts remove noise from MC-evaluation
and make it more sensitive to the true value of moves. Still there  
are biases with all heavy playouts, but they are overcome with MC Tree  
Search (MCTS) that corrects mistakes in the evaluation recursively.


Here are my latest scaling experiment on 9x9 for Valkyria.

Valkyria plays 1150 random games per second on my 4 year old laptop.

This test is against gnugo 3.7.10 assumed to be Elo 1800. Most  
datapoints are based on 500 games. N sims means Valkyria plays N  
heavy playouts per move played. Winrates are in %.


N sims   WinRate   Elo (rel Gnu)
    47      7.4       1361
    94     22         1580
   188     37         1708
   375     53         1821
   750     69.9       1946
  1500     81.2       2054
  3000     88         2146
  6000     92.6       2239
 12000     94         2278
 24000     97.2       2416
 48000     97.4       2429

The heavy playouts of Valkyria need just 375 random games per move to
match gnugo, using only 0.3 seconds per move. And even using only 47
simulations per move it can still win.


So obviously the heavy playout code of Valkyria is much weaker (Elo
1361) than Gnugo and most human opponents, but compared to CGOS a lot
of programs with no knowledge are about the same level, although they
use 2000 simulations or more.


By searching efficiently using MCTS with AMAF, it apparently can be made
arbitrarily strong.


Hope this explains how both the nature of playouts and the MCTS  
contributes to the playing strength of a program.


Should one go heavy or light? I do not know, I feel that Valkyria is a  
little bit too slow on equivalent hardware against most top programs.  
On the other hand I think it could be tweaked and improved upon.  
Perhaps it can even be made faster by removing code that does not  
improve playing strength. And there is probably still room for adding  
code that improves strength without a noticeable slowdown.


I just know that is a lot of hard work doing it the way I did it.

Best
Magnus
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Re: FW: computer-go] Monte carlo play?

2008-11-16 Thread Thomas Wolf
On Sun, 16 Nov 2008, Claus Reinke wrote:

 ...
 better feeling for the game; personally, I don't like fast games(*), but
 ...

But there is this saying:

Play quick, lose quick, learn quick!   :)

Thomas
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Don Dailey
The random playouts or even  heavy playouts are not intended to emulate
a human player.   Heikki is exactly right.   

It's a crude measurement of how good the position is.   The moves in a
random playout are horrible and so are the moves in a heavy playout.
In fact, improving them arbitrarily will probably hurt the playouts.
Random playouts work reasonably well because to a certain extent bad
moves cancel each other.   Chrilly called this mutual stupidity I
think.  

For example,  a group of stones may be weak and ultimately lost, but it
is still possible to defend that group if the attacker isn't diligent.
What happens in the playouts is that the attacker is NOT diligent, but
neither is the defender!   So the result comes out correct anyway.

Of course this is not reliable, but it's amazingly good at getting a
reasonable picture of what is weak, what is strong and who owns what.

You can improve that by adding some knowledge to the playouts,  but you
must do this with great care.   In my example above, let's say you add a
piece of knowledge that causes the defender to succeed.   You can argue
that the playouts play better go now,  but the conclusion that you
come to for the group we are talking about is now wrong.   

- Don



On Sun, 2008-11-16 at 10:08 +0100, Heikki Levanto wrote:
 On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
  Being a computer scientist but new to go, i can grasp some of the theory.
  The question I was trying to get across was:
  
  In a game of self play, if both parties are employing only monte carlo,
  surely its not a good conceptual representation of a human, and if the
  reinforcement learning is based on random simulations wouldnt it be very
  weak when playing a real human?
 
 
 Here is another amateur answering.
 
 The way I understand it, modern Monte Carlo programs do not even try to
 emulate a human player with a random player - obviously that would not work.
 
 What they do is that they build a quite traditional search tree starting from
 the current position. They use a random playout as a crude way to evaluate a
 position. Based on this evaluation, they decide which branch of the tree to
 expand.
 
 This is the way I understand the random playouts: If, in a given position,
 white is clearly ahead, he will win the game if both parts play perfect
 moves. He is also likely to win if both parts play reasonably good moves
 (say, like human amateurs), but there is a bit more of a chance that one
 player hits upon a good combination which the other misses, so the result is
 not quite as reliable. If the playouts are totally random, there is still a
 better chance for white to win, because both parts make equally bad moves.
 The results have much more variation, of course. So far it does not sound
 like a very good proposal, but things change if you consider the facts that
 we don't have perfecr oracles, and good humans are slow to play out a
 position, and can not be integrated into a computer program. Whereas random
 playouts can be done awfully fast, tens of thousands of playouts in a second.
 Averaging the reuslts gives a fair indication of who is more likely to win
 from that position, just what is needed to decide which part of the search
 tree to expand.
 
 The 'random' playouts are not totally random, they include a minimum of
 tactical rules (do not fill own eyes, do not pass as long as there are valid
 moves). Even this little will produce a few blind spots, moves that the
 playouts can not see, and systematically wrong results. Adding more
 go-specific knowledge can make the results much better (more likely to be
 right), but can also add some more blind spots. And it costs time, reducing
 the number of playouts the program can make.
 
 Hope that explains something of the mystery
 
 
 Regards
 
Heikki
 


signature.asc
Description: This is a digitally signed message part
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Magnus Persson
Yes, Valkyria does a lot of ladder reading as well. Actually, pattern
matching in the case of Valkyria is a little unclear: it is a
decision tree where the leaves are often procedure calls that look
at a larger portion of the board. The ladder code is called for
various reasons in the tree.


Is 3.7.10 level 10 weaker than the default settings? I will run a test  
using 5000 playouts and 375 playouts to see if it makes a difference.


Also have you tried this version on CGOS9x9?

This version http://cgos.boardspace.net/9x9/cross/Valkyria3.2.6_C2.html
used a machine similar to yours and reached 3000 playouts per second  
using two threads. Your code is then 8 times faster and should be much  
stronger.



-Magnus



Quoting David Fotland [EMAIL PROTECTED]:


I thought Valkyria does local search (ladders) during the playouts.

Many Faces is lighter on the playouts.  I have 17 local 3x3 patterns, then
go to uniform random without filling eyes.

Against Gnugo 3.7.10 level 10 on 9x9, with 5000 playouts, I win 92%, so our
performance is similar.  I'm doing 23K playouts per second on my 2.2 GHz
Core 2 Duo, so my performance might be a little better, depending on the
specs of your old machine.


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread David Fotland
I think I added a small capture bias, but it didn't make much difference.
Sorry, I forgot that it isn't quite pure random.  Before the uniform random
selection, if there is an enemy one-liberty group on the board, I capture it
with some small probability.

A pattern includes don't cares and is matched in any orientation.  The 3x3
patterns are only matched adjacent to the last move (8 local places).
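Just to illustrate the shape of such a policy (this is my own sketch, not
David's code; the Position methods and the capture probability are invented
for the example), the selection order would be roughly:

import java.util.List;
import java.util.Random;

/** Sketch of a lightly biased playout policy: 3x3 patterns around the last
 *  move, an occasional capture of a one-liberty enemy group, else uniform random. */
public class BiasedPlayoutPolicy {

    /** Placeholder position abstraction, invented for the example. */
    interface Position {
        List<Integer> patternMovesNear(int lastMove);   // 3x3 matches on the 8 points around the last move
        List<Integer> capturesOfAtariGroups();          // moves capturing an enemy group with one liberty
        List<Integer> legalMoves();                     // uniform-random fallback, own eyes excluded
    }

    private static final Random RNG = new Random();
    private static final double CAPTURE_PROB = 0.3;     // "some small probability"; value invented here

    static int selectMove(Position pos, int lastMove) {
        List<Integer> patternMoves = pos.patternMovesNear(lastMove);
        if (!patternMoves.isEmpty()) {
            return patternMoves.get(RNG.nextInt(patternMoves.size()));
        }
        List<Integer> captures = pos.capturesOfAtariGroups();
        if (!captures.isEmpty() && RNG.nextDouble() < CAPTURE_PROB) {
            return captures.get(RNG.nextInt(captures.size()));
        }
        List<Integer> all = pos.legalMoves();
        return all.get(RNG.nextInt(all.size()));
    }
}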

David

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:computer-go-
 [EMAIL PROTECTED] On Behalf Of Jason House
 Sent: Sunday, November 16, 2008 5:06 PM
 To: computer-go
 Subject: Re: [computer-go] FW: computer-go] Monte carlo play?
 
 On Nov 16, 2008, at 11:18 AM, David Fotland [EMAIL PROTECTED]
 games.com wrote:
 
  I thought Valkyria does local search (ladders) during the playouts.
 
  Many Faces is lighter on the playouts.  I have 17 local 3x3
  patterns, then
  go to uniform random without filling eyes.
 
 
 No capture bias in the playouts? I thought that was a big strength
 boost.
 
 Out of curiosity, how do you count your patterns. For example, is it
 still one pattern if it includes a don't care? How about rotations/
 reflections of the same basic pattern?
 
 
 
 
  Against Gnugo 3.7.10 level 10 on 9x9, with 5000 playouts, I win 92%,
  so our
  performance is similar.  I'm doing 23K playouts per second on my 2.2
  GHz
  Core 2 Duo, so my performance might be a little better, depending on
  the
  specs of your old machine.
 
  David
 
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:computer-go-
  [EMAIL PROTECTED] On Behalf Of Magnus Persson
  Sent: Sunday, November 16, 2008 5:45 AM
  To: computer-go@computer-go.org
  Subject: Re: [computer-go] FW: computer-go] Monte carlo play?
 
  Quoting Hideki Kato [EMAIL PROTECTED]:
 
  Heikki Levanto: [EMAIL PROTECTED]:
  The way I understand it, modern Monte Carlo programs do not even
  try to
  emulate a human player with a random player - obviously that
  would not
  work.
 
  I believe CrazyStone's use of patterns does so and it seems
  successful.
 
  With Valkyria I try to follow a few principles in heavy playouts.
 
 
  1) In contact fights there are a lot of shapes that are played most
  of
  the time. Thus Valkyria checks each move played if there is an
  obvious
  local response to it. If so, it plays it deterministically. In many
  situations there are two or more such candidates and then it plays
  one
  of those moves.
 
  2) In many positions the last move played does not trigger any
  obvious
  response, and then a random move is chosen uniformly.
 
  3) There are moves that are inferior 100% of the time both locally
  and
  globally. These moves are pruned if they are selected and a new
  random
  move is chosen as long as there are moves left to try.
 
  I got hundreds of handcoded patterns for both 1 and 3. It would be
  too
  time consuming to test these patterns, so I use my knowledge and
  intuition (European 2 Dan) to simply decide what patterns to include.
 
  So Valkyria has a lot of go knowledge, but mostly the kind of knowledge that
  all go players have up to some strength, perhaps 8-10 kyu. It
  has no knowledge about global matters. The beauty of MC-evaluation is
  that globally strong moves are most of the time evaluated better than
  globally weak moves. Heavy playouts remove noise from MC-evaluation
  and make it more sensitive to the true value of moves. Still there
  are biases with all heavy playouts, but they are overcome with MC Tree
  Search (MCTS), which corrects mistakes in the evaluation recursively.
 
  Here is my latest scaling experiment on 9x9 for Valkyria.
 
  Valkyria plays 1150 random games per second on my 4 year old laptop.
 
  This test is against gnugo 3.7.10, assumed to be Elo 1800. Most
  datapoints are based on 500 games. N sims means Valkyria plays N
  heavy playouts per move played. Winrates are in %.
 
  N sims    WinRate    Elo (rel Gnu)
  47        7.4        1361
  94        22         1580
  188       37         1708
  375       53         1821
  750       69.9       1946
  1500      81.2       2054
  3000      88         2146
  6000      92.6       2239
  12000     94         2278
  24000     97.2       2416
  48000     97.4       2429
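[Editorial aside: the Elo column in Magnus's table is consistent with the
standard logistic conversion from win rate to rating difference, relative to
the assumed Gnugo rating of 1800; for example 53% gives about +21, i.e. 1821.
A quick check of my own, not part of the original mail:]

/** Convert a win rate against a reference opponent into an Elo estimate. */
public class EloFromWinRate {
    static double eloEstimate(double winRate, double opponentElo) {
        // Standard logistic model: a rating gap d gives an expected score of
        // 1 / (1 + 10^(-d/400)); inverting it recovers the gap from the score.
        return opponentElo + 400.0 * Math.log10(winRate / (1.0 - winRate));
    }

    public static void main(String[] args) {
        System.out.println(eloEstimate(0.074, 1800));  // ~1361, matching the first row
        System.out.println(eloEstimate(0.53, 1800));   // ~1821
        System.out.println(eloEstimate(0.974, 1800));  // ~2429
    }
}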
 
  The heavy playouts of Valkyria need just 375 random games per move to
  match gnugo, using only 0.3 seconds per move. And even with only 47
  simulations per move it can still win some games.
 
  So obviously the heavy playout code of Valkyria on its own is much weaker
  (Elo 1361) than Gnugo and most human opponents, but on CGOS a lot of
  programs with no knowledge are at about the same level, although they
  use 2000 simulations or more.
 
  By searching efficiently using MCTS with AMAF, it can apparently be made
  arbitrarily strong.
 
  Hope this explains how both the nature of playouts and the MCTS
  contributes to the playing strength of a program.
 
  Should one go heavy or light? I do not know; I feel that Valkyria is a
  little bit too slow on equivalent hardware against most top programs.
  On the other hand I think

Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Mark Boon
Some months ago I did several experiments with using tactics and
patterns in playouts. Generally I found a big boost in strength using
tactics. I also found a boost in strength using patterns, but with a
severe diminishing return after a certain number, even becoming
detrimental when using a large number of patterns (1,000s to 10,000s).
Since I was using a generalized pattern-matcher, using it slowed
things down considerably. Although it played a lot better with the
same number of playouts, if I compared MC playouts with patterns to
MC playouts without patterns using the same amount of CPU time, the
gain was not so obvious. Since most of the gain in strength came
from just a few patterns, I concluded, as David did, that it was
probably better to just use a handful of hard-coded patterns during
playouts.


I only recently started to do real experiments with hard-coded
patterns, and so far my results are rather inconclusive. I found that when
mixing different things it's not always clear what contributes to any
increased strength observed, so I'm still in the process of trying to
dissect what is actually contributing where. I found, for example, that
a lot of the increased level of play using patterns does not come
from using them during playouts but from the effect they have
on move exploration. I don't know if this is due to my particular way
of implementing MC playouts in combination with UCT search, but moves
matching a pattern (usually) automatically come first in the tree
expansion as well, and generally I can say that so far I'm observing that
most of the increase in level comes from the selection during
exploration and only in small part from the selection during simulation.


For example, in one particular experiment using just 5 patterns I saw
a win-rate of 65% against the same program not using patterns (with
the same number of playouts). But when not using the patterns during
exploration, I saw the win-rate drop to just 55%.
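One hypothetical way to express that separation in code (my own sketch, not
Mark's implementation; the field names and weights are made up) is to let
pattern matches only affect the order, or prior weight, of children at
expansion time while the playouts stay light:

import java.util.Comparator;
import java.util.List;

/** Sketch: pattern knowledge applied at tree-expansion time only. */
public class PatternPriorExpansion {

    static class Candidate {
        final int move;
        final boolean matchesPattern;   // e.g. one of a handful of hard-coded 3x3 shapes
        final double prior;

        Candidate(int move, boolean matchesPattern) {
            this.move = move;
            this.matchesPattern = matchesPattern;
            this.prior = matchesPattern ? 1.0 : 0.1;   // invented weights for the example
        }
    }

    /** Pattern-matching moves are tried first when a node is expanded;
     *  the playout policy itself is left untouched (light). */
    static void orderForExpansion(List<Candidate> children) {
        children.sort(Comparator.comparingDouble((Candidate c) -> c.prior).reversed());
    }
}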


I still have a lot of testing to do and it's too early to draw any  
hard conclusions. But I think it's worthwhile trying to distinguish  
where the strength is actually gained. Better yet, finding out  
exactly 'why' it gained strength, because with MC playouts I often  
find results during testing highly counter-intuitive, occasionally to  
the point of being (seemingly) nonsensical.


I also think what Don was proposing with his reference-bot could be  
interesting. Trying to make it play around ELO 1700 on CGOS just  
using 5,000 (light) playouts. I don't know if it's possible, but I  
think it's a fruitful exercise. At a time where most people are  
looking at using more and more hardware to increase playing-strength,  
knowing what plays best at the other end of the spectrum is valuable  
as well. With that I mean, finding what plays best using severely  
constrained resources.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread George Dahl
So you say that "...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation." Could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.
- George

On Sun, Nov 16, 2008 at 10:39 PM, Mark Boon [EMAIL PROTECTED] wrote:
 Some months ago I did several experiments with using tactics and patterns in
 playouts. Generally I found a big boost in strength using tactics. I also
 found a boost in strength using patterns but with a severe diminishing
 return after a certain number and even becoming detrimental when using large
 number of patterns (1,000s to 10,000s). Since I was using a generalized
 pattern-matcher, using it slowed things down considerably. Although it
 played a lot better with the same number of playouts, if I compared MC
 playouts with patterns to a MC playout without patterns using the same
 amount of CPU time the gain was not so obvious. Since most of the gain in
 strength was gained by just a few patterns I concluded just as David that it
 was probably better to just use a handful of hard-coded patterns during
 playouts.

 I only recently started to do real experiments with hard-coded patterns and
 so far my results are rather inconclusive. I found when mixing different
 things it's not always clear what contributes to any increased strength
 observed. So I'm still in the process of trying to dissect what is actually
 contributing where. I found for example that a lot of the increased level of
 play using patterns does not come from using them during playouts but comes
 from the effect they have on move-exploration. I don't know if this is due
 to my particular way of implementing MC playouts in combination with UCT
 search, but moves matching a pattern (usually) automatically make it first
 in the tree-expansion as well and generally I can say so far I'm observing
 that most of the increase in level comes from the selection during
 exploration and only in small part from the selection during simulation.

 For example, in one particular experiment using just 5 patterns I saw a
 win-rate of 65% against the same program not using patterns (with the same
 number of playouts). But when not using the patterns during exploration saw
 the win-rate drop to just 55%.

 I still have a lot of testing to do and it's too early to draw any hard
 conclusions. But I think it's worthwhile trying to distinguish where the
 strength is actually gained. Better yet, finding out exactly 'why' it gained
 strength, because with MC playouts I often find results during testing
 highly counter-intuitive, occasionally to the point of being (seemingly)
 nonsensical.

 I also think what Don was proposing with his reference-bot could be
 interesting. Trying to make it play around ELO 1700 on CGOS just using 5,000
 (light) playouts. I don't know if it's possible, but I think it's a fruitful
 exercise. At a time where most people are looking at using more and more
 hardware to increase playing-strength, knowing what plays best at the other
 end of the spectrum is valuable as well. With that I mean, finding what
 plays best using severely constrained resources.

 Mark

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Mark Boon


On 17-nov-08, at 02:42, George Dahl wrote:


So you say that "...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation." Could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.


That's exactly what it's suggesting. But as I said, I need to do some  
more testing to make a hard case for that.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread George Dahl
I look forward to hearing more!  Happy testing.
- George

On Sun, Nov 16, 2008 at 11:53 PM, Mark Boon [EMAIL PROTECTED] wrote:

 On 17-nov-08, at 02:42, George Dahl wrote:

 So you say that: ...I'm observing that most of the increase in level
 comes from the selection during exploration and only in small part
 from the selection during simulation., could you elaborate at all?
 This is very interesting.  That almost suggests it might be fruitful
 to use the patterns in the tree, but keep lighter playouts.

 That's exactly what it's suggesting. But as I said, I need to do some more
 testing to make a hard case for that.

 Mark

 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] FW: computer-go] Monte carlo play?

2008-11-16 Thread Michael Williams

It seems move selection in the playouts should be very random at first and more 
deterministic toward the end of the playout.  Has anyone tried that?
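[I have not seen a published result for this, but as a sketch of what such an
annealed playout policy could look like (names and schedule invented purely
for illustration): sample moves from a softmax over some cheap heuristic
score, with a temperature that shrinks as the playout proceeds, so early moves
are nearly uniform and late moves nearly greedy.]

import java.util.List;
import java.util.Random;

/** Sketch: playout move selection that is random early and greedy late. */
public class AnnealedPlayoutPolicy {

    private static final Random RNG = new Random();

    /**
     * @param moves          candidate moves
     * @param score          cheap heuristic score per candidate (same order as moves)
     * @param movesPlayed    number of moves already played in this playout
     * @param expectedLength rough expected playout length (e.g. number of board points)
     */
    static int sampleMove(List<Integer> moves, double[] score, int movesPlayed, int expectedLength) {
        double progress = Math.min(1.0, (double) movesPlayed / expectedLength);
        double temperature = 2.0 * (1.0 - progress) + 0.05;   // high early, low late (invented schedule)

        // Softmax weights at the current temperature.
        double[] w = new double[moves.size()];
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            w[i] = Math.exp(score[i] / temperature);
            sum += w[i];
        }

        // Roulette-wheel sampling proportional to the weights.
        double r = RNG.nextDouble() * sum;
        for (int i = 0; i < w.length; i++) {
            r -= w[i];
            if (r <= 0) {
                return moves.get(i);
            }
        }
        return moves.get(moves.size() - 1);
    }
}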


Mark Boon wrote:


On 17-nov-08, at 02:42, George Dahl wrote:


So you say that: ...I'm observing that most of the increase in level
comes from the selection during exploration and only in small part
from the selection during simulation., could you elaborate at all?
This is very interesting.  That almost suggests it might be fruitful
to use the patterns in the tree, but keep lighter playouts.


That's exactly what it's suggesting. But as I said, I need to do some 
more testing to make a hard case for that.


Mark

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Monte carlo play

2008-11-15 Thread dave.devos
Hello Tony,
 
I'm just speaking for myself here, but I suspect you may not get the answers 
you are looking for, because you're not asking the right questions.
 
I get the impression that the situation is similar to this scenario:
 

A young guy with a surfboard and a Hawaiian shirt turns up at
base camp on Mount Everest and says:
"Hey dudes, what's up!
This morning I had this wonderful idea: I wanna climb Mount Everest!
So here I am, and I even brought my hiking shoes!
But can anyone give me directions to the nearest McDonalds, cuz
I'm starving."
 
Everybody in the tent stares at him speechlessly.

I may be wrong, but that's my impression.
I'm not even sure whether I myself should be here at base camp, even though I
am a 4d in Go and even though I have 25 years of programming experience in a
couple of different languages (including the last 8 years as a fulltime
professional developer).
It's only logical that ever since I learned Go 20 years ago I've had a
special interest in computer-go, and I have spent some time over the years
attempting to build a strong go playing program myself (although not nearly as
much time as some others here).
I'm a mountaineer, but climbing Mount Everest is something else. I feel a
bit like a rookie among all the seasoned mountaineers hanging around here at
the Everest base camp. Most of the time I keep my mouth shut when one of them
is talking, because I'm a bit apprehensive that I might say something that is
utterly obvious to the others or even outright stupid.
 
And then this guy turns up in his hawai shirt.
 
I hope you don't feel offended. Indeed you took up a wonderful endeavour, but I 
sense that you're not quite ready to go for the summit today.
 
Best Regards,
Dave



From: [EMAIL PROTECTED] on behalf of tony tang
Sent: Thu 13-11-2008 22:03
To: go mailing list
Subject: RE: [computer-go] Monte carlo play


Thanks Darren,
 
After reading a couple of interesting papers in this field, I am applying
pattern matching to limit the number of possibilities in the Monte Carlo tree
(on 19x19 it appears to take a day to play).
I was wondering if you could comment on whether my following concept is right
on the evaluation part.
 
The possible patterns for the current situation are passed to a class where
self play begins. Each different pattern runs on an instance, and at the end
the results are written into a file. The game tree will be constructed using
the stats from the file.
 
Thanks again
 
Tony
 
 


 Date: Wed, 12 Nov 2008 10:34:21 +0900
 From: [EMAIL PROTECTED]
 To: computer-go@computer-go.org
 Subject: Re: [computer-go] Monte carlo play
 
  I am new to programming go, could someone explain to me how a Monte
  Carlo based evaluation manages to play random games by itself? ie:
  who/what is the opponent which supplies the opposing moves which
  allows another move to be randomly played after making the initial
 
 It is self-play, so both players are random.
 
 I'm not aware of any programs that use different playout strategies for
 black/white, though it has been briefly mentioned before on this list,
 and I personally think it might be interesting as a way of discovering
 and more accurately evaluating the imbalances present in most go
 positions (i.e. usually one side has more strength and one side has more
 territory, and so to correctly understand the position one side has to
 play aggressive moves and the other has to play defensive moves).
 
  move at the root. I am implementing in java, is there a
  package/framework which allows me to train my software.
 
 This page, recently started by Eric Marchand, might be a good starting
 place:
 http://ricoh51.free.fr/go/engineeng.htm
 
 Darren
 
 
 -- 
 Darren Cook, Software Researcher/Developer
 http://dcook.org/mlsn/ (English-Japanese-German-Chinese-Arabic
 open source dictionary/semantic network)
 http://dcook.org/work/ (About me and my work)
 http://dcook.org/blogs.html (My blogs and articles)
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/





___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Monte carlo play

2008-11-15 Thread Nick Wedd
In message
[EMAIL PROTECTED],
[EMAIL PROTECTED] writes

 Hawaiian shirt analogy snipped 

I hope you don't feel offended. Indeed you took up a wonderful
endeavour, but I sense that you're not quite ready to go for the summit
today.

But Tony never expressed any interest in going for the summit.  Maybe
he just wants to write a simple MC implementation.  Here is what he
wrote:

I am new to programming go, could someone explain to me how a Monte
Carlo based evaluation manages to play random games by itself?
ie: who/what is the opponent which supplies the opposing moves which
allows another move to be randomly played after making the initial move
at the root.

I am implementing in java, is there a package/framework which allows me
to train my software.

I will try to answer these questions myself.

 who/what is the opponent which supplies the opposing moves

It is the same piece of your program that supplies your own moves.  For
each random game or rollout, your program places both black and
white stones, alternately, at random.

is there a package/framework which allows me
to train my software.

Nothing trains your software.  This isn't a neural net.  The way it
works is:
  you write a program
  A: it plays, badly
 you modify it so it plays differently, and with luck, better
 repeat from A

Nick
-- 
Nick Wedd[EMAIL PROTECTED]
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] Monte carlo play

2008-11-15 Thread dave.devos
You are absolutely right. Tony did not imply he wanted to go for the summit
today. Also, Tony's qualifications are surely a lot better than a Hawaiian
shirt and a surfboard.
 
My analogy was an exaggeration, intended to explain to Tony that he
has more catching up to do than he might have expected.
By no means was it my intention to put Tony down. I hope Tony understands this,
and I sincerely hope that he won't give up on joining us in this wonderful
endeavour, because we need all the help we can get.
 
Best Regards,
Dave



From: [EMAIL PROTECTED] on behalf of Nick Wedd
Sent: Sat 15-11-2008 15:48
To: computer-go
Subject: Re: [computer-go] Monte carlo play



In message
[EMAIL PROTECTED],
[EMAIL PROTECTED] writes

 Hawaiian shirt analogy snipped 

I hope you don't feel offended. Indeed you took up a wonderful
endeavour, but I sense that you're not quite ready to go for the summit
today.

But Tony never expressed any interest in going for the summit.  Maybe
he just wants to write a simple MC implementation.  Here is what he
wrote:

I am new to programming go, could someone explain to me how a Monte
Carlo based evaluation manages to play random games by itself?
ie: who/what is the opponent which supplies the opposing moves which
allows another move to be randomly played after making the initial move
at the root.

I am implementing in java, is there a package/framework which allows me
to train my software.

I will try to answer these questions myself.

 who/what is the opponent which supplies the opposing moves

It is the same piece of your program that supplies your own moves.  For
each random game or rollout, your program places both black and
white stones, alternately, at random.

is there a package/framework which allows me
to train my software.

Nothing trains your software.  This isn't a neural net.  The way it
works is:
  you write a program
  A: it plays, badly
 you modify it so it plays differently, and with luck, better
 repeat from A

Nick
--
Nick Wedd[EMAIL PROTECTED]
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

RE: [computer-go] Monte carlo play

2008-11-15 Thread dave.devos
Just a quickstart here, and I'm not one of the real experts, so it may contain 
inaccuracies. Perhaps you already know all this, perhaps not.
 
The basis of many game-playing programs (like chess programs) is the minimax 
(or negamax) algorithm.
It is a simple algorithm by itself and you can find a lot of background and 
examples for chess or other games if you google it.
Basically, minimax explores all possible variations stemming from the current 
position and then selects the best move.
 
This algorithm works in principle, but there is a problem with it:
For interesting games like chess and go, it will usually take the program 
billions of years to find the best move, given a game position.
 
In the past decades several adaptations of the minimax algorithm were invented
to enable it to find a reasonably good move in a timespan that
humans are prepared to wait for (like evaluation functions, alpha-beta pruning
and transposition tables). The program keeps investigating possibilities for
some time, and then the move that seems best so far is played. The longer the
program is allowed to think (and the faster the computer), the better the move
it will come up with.
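For completeness, here is a bare-bones sketch of the classical backbone Dave
describes, depth-limited negamax with alpha-beta pruning (the GameState
interface is a placeholder invented for the example, not any concrete
engine's API):

import java.util.List;

/** Sketch of depth-limited negamax with alpha-beta pruning. */
public class Negamax {

    /** Placeholder game abstraction for the example. */
    interface GameState {
        List<GameState> successors();   // positions after each legal move
        double evaluate();              // static score from the point of view of the side to move
        boolean terminal();
    }

    static double negamax(GameState s, int depth, double alpha, double beta) {
        if (depth == 0 || s.terminal()) {
            return s.evaluate();
        }
        double best = Double.NEGATIVE_INFINITY;
        for (GameState child : s.successors()) {
            // Our value for a move is the negation of the child's value for the opponent.
            double v = -negamax(child, depth - 1, -beta, -alpha);
            best = Math.max(best, v);
            alpha = Math.max(alpha, v);
            if (alpha >= beta) {
                break;                  // cutoff: the opponent will never allow this line
            }
        }
        return best;
    }
}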
 
These adaptations work by reducing the number of possibilities the program 
explores, without weakening the program too much. The program must ignore many 
possibilities to save time (this is called pruning), but the more possibilities 
it ignores, the higher the risk that it accidentally ignores the best move. 
Also, there is always the danger of the horizon effect: the program 
misinterprets the situation because it simply doesn't get enough time to fully 
investigate every possibility. 
 
The challenge for the programmer is to strike an optimal balance between time 
consumption and losing by accidentally ignoring or misinterpreting the best 
move. Some game programs are even able to fine-tune themselves to optimize this 
balance over time (learning).
 
A combination of these adaptations has proven good enough to enable
chess programs on current hardware to beat any human with reasonable thinking
time.
But computer-go is a harder problem than computer-chess. The adaptations that
are successful in computer-chess are just not enough for computer-go, mainly
because there are far more possibilities (and I mean FAR more) to explore in go.
 
But now there is a recent adaptation that has proven to be fairly successful in
computer-go: Monte Carlo playouts.
The program reduces the work to be done by gambling a number of times and
measuring the average payoff. If the average payoff is good, the move is
assumed to be good for the time being. Also, it doesn't automatically ignore
moves with a low payoff (remember the risk of ignoring the best move). The
program just postpones further investigation of seemingly bad moves until the
moves that seemed good initially give more and more disappointing payoffs on
further investigation.
Although this invention has had only moderate success in computer-chess, it has
proven to be fairly successful in computer-go.
 
But at the heart of a MonteCarlo go program, you can still recognize the 
minimax algorithm.
 
Best Regards,
Dave
 
 


From: [EMAIL PROTECTED] on behalf of tony tang
Sent: Wed 12-11-2008 2:23
To: computer-go@computer-go.org
Subject: [computer-go] Monte carlo play


Greetings all go gurus,
 
I am new to programming go. Could someone explain to me how a Monte Carlo
based evaluation manages to play random games by itself?
ie: who/what is the opponent which supplies the opposing moves, which allows
another move to be randomly played after making the initial move at the root.
I am implementing in Java; is there a package/framework which allows me to
train my software?
 
Thank you
 
Kind regards
 
Tony




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] FW: computer-go] Monte carlo play?

2008-11-15 Thread dave.devos
From: tony tang [mailto:[EMAIL PROTECTED]
Sent: Sat 15-11-2008 21:22
To: [EMAIL PROTECTED]
Subject: [RE:computer-go] Monte carlo play?


Hello Dave,

Thank you for a thorough introduction to the theory, and I sincerely hope I am
not wasting your time with amateur questions.
Because it was quite late when I sent the question, I think I didn't word it
correctly and miscommunicated my question.

Being a computer scientist but new to go, I can grasp some of the theory. The
question I was trying to get across was:
 
In a game of self play, if both parties are employing only Monte Carlo, surely
it's not a good conceptual representation of a human, and if the reinforcement
learning is based on random simulations, wouldn't it be very weak when playing
a real human?
According to Nick Wedd and some playing around, I am clearly mistaken because
it works quite well.
Is it because they are training (not in the neural network sense) their Monte
Carlo based engine against another engine with prior knowledge, or something
of that nature?
 
I have seen papers implementing patterns and prior knowledge with Monte Carlo
(dated 2006); has that become a standard now, and when people refer to Monte
Carlo they don't mean absolutely random?

Thank you

Kind regards

Tony




___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

[computer-go] FW: computer-go] Monte carlo play?

2008-11-15 Thread dave.devos
From: [EMAIL PROTECTED]
Sent: Sat 15-11-2008 22:44
To: tony tang
Subject: RE: computer-go] Monte carlo play?


Hello Tony,
 
Ok, so you know about the minimax algorithm and the like. My impression was 
wrong and I'm very sorry for my analogy.
 
I'm no expert on Monte Carlo, so I can only say what I know. But most of the 
real experts have been rather quiet during the last week, so perhaps I can act 
as a stand-in.
 
The basic idea is indeed absolutely random play (alternating turns and removing
captured stones as in normal play) until the game finishes. The only exceptions
to random play are occupied intersections, simple ko recaptures, suicide moves
(I'm not sure if all programs prune suicide if the rules allow it) and moves
that fill one-point eyes of the colour to play (there may be slight
differences in the exact definition of a one-point eye).
 
Without the one-eye pruning, random games would never finish, because both
random players would keep on killing their own stones and refilling the empty
space after the opponent has captured.
Even with eye detection and simple ko detection, the game could occasionally
end up in a cycle because of triple ko or more pathological situations, so one
should probably also limit the number of moves and discard the result if that
limit is exceeded. (This is probably a much cheaper solution to the cycle
problem than more sophisticated detection.)
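In code, the exclusions listed above might look roughly like this (a simplified
sketch of my own; real programs use more careful eye definitions that also
look at the diagonal points, and the helper names here are invented):

/** Sketch of the usual move exclusions during a random fill-in. */
public class FillInMoveFilter {

    static final int EMPTY = 0;   // board values: 0 empty, +1 / -1 for the two colours

    /** Very rough one-point-eye test: an empty point whose four neighbours are all
     *  stones of the given colour (off-board neighbours count as friendly).
     *  Real programs also check diagonals to avoid filling false eyes too eagerly. */
    static boolean isOwnEye(int[][] board, int x, int y, int colour) {
        int size = board.length;
        int[][] neighbours = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        for (int[] d : neighbours) {
            int nx = x + d[0], ny = y + d[1];
            if (nx < 0 || ny < 0 || nx >= size || ny >= size) {
                continue;   // edge: treat as friendly
            }
            if (board[nx][ny] != colour) {
                return false;
            }
        }
        return true;
    }

    /** A point is playable in the fill-in if it is empty, not a simple-ko recapture,
     *  not suicide, and does not fill one's own one-point eye. */
    static boolean allowedInFillIn(int[][] board, int x, int y, int colour,
                                   boolean isSimpleKoRecapture, boolean isSuicide) {
        return board[x][y] == EMPTY
                && !isSimpleKoRecapture
                && !isSuicide
                && !isOwnEye(board, x, y, colour);
    }
}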
 
A purely random game is played out until both players are forced to pass. This
is called a "light playout". Claus Reinke recently coined the term "random
fill-in" for this process
(http://www.mail-archive.com/computer-go@computer-go.org/msg09749.html). I like
this term better than "light playout", but it doesn't seem to catch on here.
 
The board is then counted to determine the winner and this binary result is 
added to a statistic for the move that is currently being evaluated. The 
accumulated statistic is effectively the evaluation value for the position. It 
is then backtracked to the game tree root in a minimax way.
 
In go it is hard to quickly evaluate a position statically (or even slowly for 
that matter). So it is nice to have a dynamic evaluation that becomes more 
accurate as you spend more CPU cycles on it. You can see that it's important 
that random fill-ins are fast, but it's still a rather large investment to do a 
lot of playouts, when you compare this with a static evaluation function. So it 
is even more important to avoid wasting effort in evaluating positions by 
random fill-ins. The trick is to use crude statistics (few fill-ins) to decide 
which statistic to refine by investing more fill-ins. This is where the UCT 
value comes in (http://senseis.xmp.net/?UCT). It is derived from the statistic 
and it decides which position to investigate further, preferring good 
statistics over bad ones, but also preferring crude statistics over accurate 
ones.
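The usual UCT value adds an exploration term to the observed win rate, so a
child looks attractive either because it has won a lot or because it has
hardly been sampled yet. A minimal sketch (the constant and field names are
illustrative; every program tunes its own):

/** Sketch of UCT child selection: exploit good statistics, but keep refining crude ones. */
public class UctSelect {

    static class Child {
        int visits;   // fill-ins run through this child so far
        int wins;     // of those, how many were won (from the parent's point of view)
    }

    static final double EXPLORATION = 1.4;   // illustrative value only

    static int selectChild(Child[] children, int parentVisits) {
        int best = 0;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < children.length; i++) {
            Child c = children[i];
            double value = (c.visits == 0)
                    ? Double.POSITIVE_INFINITY   // unvisited children are tried first
                    : (double) c.wins / c.visits
                      + EXPLORATION * Math.sqrt(Math.log(parentVisits) / c.visits);
            if (value > bestValue) {
                bestValue = value;
                best = i;
            }
        }
        return best;
    }
}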
 
There is no training involved and there is no a priori knowledge involved here 
except for the rules and eye detection. The program has absolutely no idea 
about go concepts.
 
Would you call this process learning?
 
Programs using this algorithm are called pure MC programs and although they 
get better with more CPU cycles, they aren't very strong with normal time 
limits on current hardware (or hardware in the near future). You are right, by 
itself pure MC is just not effective enough.
 
So programmers started to build in more Go knowledge to improve on this. This 
makes the fill-in process slower, so they are called heavy playouts. And this 
is also where my knowledge ends. I may guess that instead of selecting moves in 
a uniform random way, moves are weighted by a priori Go knowledge built in by 
the programmer, but I know no details and I don't know if there have been 
publications on this. Perhaps there aren't any, because it is also a 
competitive field and commercial interests might be involved too.
 
Other contributors can probably tell you more about this.
 
Best Regards,
Dave 
 



From: tony tang [mailto:[EMAIL PROTECTED]
Sent: Sat 15-11-2008 21:22
To: [EMAIL PROTECTED]
Subject: [RE:computer-go] Monte carlo play?


Hello Dave,

Thank you for a thorough introduction to the theory, and I sincerely hope I am
not wasting your time with amateur questions.
Because it was quite late when I sent the question, I think I didn't word it
correctly and miscommunicated my question.

Being a computer scientist but new to go, I can grasp some of the theory. The
question I was trying to get across was:

In a game of self play, if both parties are employing only Monte Carlo, surely
it's not a good conceptual representation of a human, and if the reinforcement
learning is based on random simulations, wouldn't it be very weak when playing
a real human?
According to Nick Wedd and some playing around, I am clearly mistaken because
it works quite well.
Is it because they are training(not in the neural
