Re: [computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Michael Williams
I don't think this specific test has been done.  But I'm assuming the result would be the same as in previous tests: deviating from maximizing winning percentage leads to a degradation in strength.



Brett Koonce wrote:

[full quote of Brett Koonce's message trimmed]



___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Brett Koonce

Greetings from a lurker,

Forgive me if I am talking through my hat.  It has been a long time
since I have done any real coding.


It seems most of the gains in MC/UCT come fairly quickly (or rather,
you can get within 50% of a good move guess within a few iterations).
It would be interesting to try a progressive stepping-down/widening:
e.g. 1k playouts with komi + 3 as the cutoff, then feed this tree into
2k playouts with komi + 2, then 4k playouts with komi + 1, and then
finally do the usual full-blown analysis, say 50k playouts (numbers
can be tweaked, of course).  You would lose the initial simulations
from your final one, so you would be sacrificing say 10% of the
possible simulations, but on the other hand it would seem to bias the
tree toward moves that have a greater chance of winning by a greedy
margin, without explicitly telling the computer it has to win by a
certain number, which would seem dangerous if the simulations are near
the threshold.
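A rough sketch of the progressive stepping-down schedule described above (the playout counts come from the message, but the search routine is a hypothetical stand-in; a real engine would reuse its UCT tree between stages):

```python
import random

def run_playouts(tree, n_playouts, komi_cutoff):
    """Stand-in for a real MC/UCT search: each playout counts as a win
    only if the final margin beats the (inflated) komi cutoff."""
    for _ in range(n_playouts):
        margin = random.gauss(2.0, 8.0)   # fake final-score margin
        tree["plays"] += 1
        if margin > komi_cutoff:
            tree["wins"] += 1
    return tree

def progressive_search(base_komi):
    # The proposed schedule: big greed bonus with few playouts, then
    # relax it while reusing the accumulated tree statistics.
    schedule = [(1000, base_komi + 3), (2000, base_komi + 2),
                (4000, base_komi + 1), (50000, base_komi)]
    tree = {"plays": 0, "wins": 0}
    for n_playouts, cutoff in schedule:
        tree = run_playouts(tree, n_playouts, cutoff)
    return tree

stats = progressive_search(base_komi=7.5)
print(stats["plays"])   # 57000 simulations total across the four stages
```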


I apologize if this is an obvious idea; I was just wondering whether
there is a flaw in it, or whether someone has already run experiments
in this direction.


-Brett


Re: [computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Jason House
Actually, your summary of what people do sounds exactly like what MC
programs do, except for one point...


MC programs don't differentiate moves by point value; they only look
at winning rate. It's extremely tough to differentiate the one move
sequence with a 99.1% win rate when all other moves have a 99% win rate.
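As a back-of-the-envelope check (my numbers, not Jason's): the standard error of a win-rate estimate shrinks only as 1/sqrt(n), so separating a 99.1% move from 99.0% moves takes an enormous playout budget per move:

```python
import math

def stderr(p, n):
    # Standard error of a Bernoulli win-rate estimate after n playouts
    return math.sqrt(p * (1 - p) / n)

p_best, p_rest = 0.991, 0.990
delta = p_best - p_rest          # 0.1% separation between the moves

# Double the per-move playout budget until the separation exceeds
# two standard errors of the estimate.
n = 1
while 2 * stderr(p_rest, n) > delta:
    n *= 2
print(n)   # 65536: tens of thousands of playouts per candidate move
```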


Without any other heuristics or local search to guide MC programs,  
their play seems reckless...


Sent from my iPhone

On Sep 8, 2008, at 5:45 PM, terry mcintyre <[EMAIL PROTECTED]>  
wrote:



[full quote of terry mcintyre's message trimmed]




Re: [computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Gian-Carlo Pascutto
Don Dailey wrote:

>> Would a discrepancy in the amount of ELO gained or lost per handicap
>> stone, when comparing MC bots to humans & classical computers, be a good
>> measure of the maximum possible improvement?
> 
> Maybe.  How could you accurately make such a measurement without
> thousands of games?  

I guess the number for human players should be known at this point.

I can give my bot a stone and let it play against itself and GNU Go,
and see what happens. After 1000 games we'll have a good impression of
the order of magnitude.
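For what it's worth, here is roughly what 1000 games buys you in ELO terms (a quick no-draw logistic approximation; the win count is made up):

```python
import math

def elo_with_error(wins, games):
    """ELO difference implied by a win rate, plus a rough 95% interval
    (no-draw logistic model; illustrative only)."""
    p = wins / games
    se = math.sqrt(p * (1 - p) / games)       # std. error of the win rate
    to_elo = lambda q: -400 * math.log10(1 / q - 1)
    return to_elo(p), to_elo(p - 2 * se), to_elo(p + 2 * se)

elo, low, high = elo_with_error(wins=650, games=1000)
print(round(elo), round(low), round(high))   # ~108 ELO, roughly +/- 23
```

So 1000 games pins the difference down to a couple dozen ELO either way, enough for an order of magnitude.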

-- 
GCP


Re: [computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread terry mcintyre
Interesting analysis, Don.

Human players sometimes adhere to a simple policy: "rich men don't pick fights."

When one is objectively far ahead, one picks up the easy profits, and otherwise 
takes no risks. If moves A, B, and C are comparable risk-wise, one would prefer 
the more profitable of the lot.

On the other hand, when one is far behind, one takes risks. 

Such a strategy appears to maximize wins, especially when one is uncertain 
about the status. 

Can that strategy be effectively translated to MC terms?

To approach the problem from another angle, strong amateur and professional 
players have a consensus that some moves return maximal value, others are 
unsatisfying, and still others are risky. They seem to have a high level of 
agreement about the value of low-risk moves; disputes arise for high-risk plays 
where the outcome is less certain. 


  


Re: [computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Don Dailey
On Mon, 2008-09-08 at 20:01 +0200, Gian-Carlo Pascutto wrote:
> Don Dailey wrote:
> 
> > That probably just means I have not stumbled on the right ideas or that
> > I was not able to properly tune it.   I would be delighted if someone
> > was able to show us a workable scheme.   I believe if something is found
> > it will result in a very minor improvement, but that it will be an
> > actual improvement.  
> 
> Would a discrepancy in the amount of ELO gained or lost per handicap
> stone, when comparing MC bots to humans & classical computers, be a good
> measure of the maximum possible improvement?

Maybe.  How could you accurately make such a measurement without
thousands of games?  

The problem seems to be a catch-22.   If you are in a dead won position,
it's really risky to tell the program that it is NOT in a dead won
position.   It then doesn't understand what is required to win the game;
it only knows that it must win another stone at all costs, whether
that's possible or not.

Most of the time it IS possible to win another stone or more without
much risk.   But as soon as you do, the dynamic komi adjuster says you
must do it again, and again, and again, until you reach a situation
where it is not possible.

And I believe that is where the trouble comes: at some point you have
set a goal too high to reach.   This signals to the program that it must
try at all costs to win what appears to it to be a dead lost game, and
of course it will very likely play a high-risk desperation move in
order to please its master.

So a simple, naive scheme is not going to work.   However, it would
probably work pretty well if you had some way to gain prior knowledge
about whether it would be safe to "escalate" or not.

One simple way that might work, with some tuning, is to use search.  If
you are winning the game with high confidence, raise the komi a few
stones and do another search (from scratch) to see if you are still
easily winning.  Perhaps something like a binary search would find the
komi value that gives you a high winning confidence with maximum greed,
or some acceptable balance of the two.
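A sketch of that binary search (the search routine is a stub; in a real engine each call would be a full analysis from scratch at the adjusted komi, so this is where the resource cost comes in):

```python
def winrate_at_komi(extra_komi):
    """Stub for a from-scratch search at komi raised by extra_komi,
    returning the root win rate (fake, but monotone as in a real game)."""
    return max(0.0, min(1.0, 0.95 - 0.08 * extra_komi))

def greediest_safe_komi(lo=0.0, hi=10.0, confidence=0.60, steps=20):
    # Find the largest extra komi at which the program still wins with
    # high confidence: below it, escalate greed; above it, back off.
    for _ in range(steps):
        mid = (lo + hi) / 2
        if winrate_at_komi(mid) >= confidence:
            lo = mid
        else:
            hi = mid
    return lo

extra = greediest_safe_komi()
print(round(extra, 1))   # converges near 4.4 extra stones for this stub
```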

However, such a scheme is going to cost you resources, which may
cancel some or all of the benefit.   My own gut feeling tells me that
you are playing with pretty small margins.   At best, how much can we
expect to gain?

I think this is probably something we need to explore, especially if
it's important to the reputation of your product, or to produce a
product that better mimics the style of human players, who are less
concerned with the beauty of omission.    I personally see a bit of
beauty in this style, even though it certainly looks odd when you are
not used to it.

When losing the game, a dynamic adjuster may be safer.  After all, you
are losing anyway, so why not try something?    It's not risky to try
to win a lost game by picking off whatever stones you can.

- Don





[computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Ingo Althöfer
Hello Gian-Carlo,

Gian-Carlo Pascutto wrote:
> There has been discussion here about dynamic komi to keep 
> the winning rate close to 50%. As far as I saw, there was no 
> clear conclusion about whether it works. Some people argued 
> that the effect should not exist and that measuring objective 
> winning rates is always the right thing.
>
> However, there is less and less doubt in my mind that the 
> effect you describe exists and that it hurts playing strength.
>
> What exactly did you do when you say "reproduce in a clear model"?

I have just written a quick report on my experiment.
You can read or download it from

http://www.althofer.de/mc-laziness.pdf

Best regards, Ingo.



Re: [computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Gian-Carlo Pascutto
Don Dailey wrote:

> That probably just means I have not stumbled on the right ideas or that
> I was not able to properly tune it.   I would be delighted if someone
> was able to show us a workable scheme.   I believe if something is found
> it will result in a very minor improvement, but that it will be an
> actual improvement.  

Would a discrepancy in the amount of ELO gained or lost per handicap
stone, when comparing MC bots to humans & classical computers, be a good
measure of the maximum possible improvement?

-- 
GCP


[computer-go] Re: 13x13 anchor hung

2008-09-08 Thread Hideki Kato
Don Dailey: <[EMAIL PROTECTED]>:
>Who is running the 13x13 anchor?   It is hung and needs to be restarted.

It's me and restarted right now.

Hmmm, I cannot find anything strange in the log file.  My guess is 
that no players other than the anchor were running for a while, so the 
match server didn't send any messages to my site, and the NAPT-table 
time-out syndrome happened.

Hideki

>- Don
>
>
--
[EMAIL PROTECTED] (Kato)


Re: [computer-go] Lockless hash table and other parallel search ideas

2008-09-08 Thread Hideki Kato

Olivier Teytaud: <[EMAIL PROTECTED]>:
>>
>>
>> By my recent experiments, 8~9 * (threads - 1) ELO is lost.  This
>> matches my earlier result well.
>>
>
>Do you have tricks for avoiding redundancies between simulations ?

Yes.  I use Sylvain's FPU (first-play urgency) and decrease it a 
little before starting a simulation, say, fpu *= 0.99.  This is very 
simple and fast.
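If I understand the trick correctly, it works because unvisited moves are scored with the FPU constant, so shrinking it slightly per simulation stops concurrent simulations from piling onto the same unexplored move (the node format and numbers below are my own toy example, not Hideki's code):

```python
def select_move(children, fpu):
    # Score visited children by win rate; unvisited ones get the current
    # first-play-urgency value (exploration terms omitted for brevity).
    def value(c):
        return c["wins"] / c["plays"] if c["plays"] > 0 else fpu
    return max(children, key=value)

children = [{"move": "A", "wins": 6, "plays": 10},   # value 0.60
            {"move": "B", "wins": 0, "plays": 0}]    # unvisited

fpu = 0.605
picks = []
for _ in range(3):          # three simulations started back to back
    picks.append(select_move(children, fpu)["move"])
    fpu *= 0.99             # the trick: slightly lower FPU each time
print(picks)                # ['B', 'A', 'A']: later starts diversify
```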

For detailed information, you can read my write-up of PMCTS 
(parallel MCTS) submitted to GPW 2007.  Though it's written in 
Japanese, the pseudocode is in English, and the abstract and captions 
are written in both languages.
http://www.geocities.jp/hideki_katoh/publications/gpw2007/gpw07-private.pdf
If you have any trouble downloading or reading it, please let me 
know.

Hideki

>I suggest simple tricks like "do not go to node X if there is a thread
>currently in node X"
>(simply by setting the score of the moves leading to node X to zero until
>you get out of the node).
--
[EMAIL PROTECTED] (Kato)


Re: [computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Don Dailey
I have some sense that it might be possible to slightly improve the
playing strength with some dynamic komi scheme.   However, I have also
experimented quite a bit with various ways to do this and in each case I
have been able to detect at least a slight weakening of play.  

That probably just means I have not stumbled on the right ideas or that
I was not able to properly tune it.   I would be delighted if someone
was able to show us a workable scheme.   I believe if something is found
it will result in a very minor improvement, but that it will be an
actual improvement.  

When trying ideas like this I require statistically significant
samples, which generally means thousands of games when the improvement
or weakening is fairly small.
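As a rough order of magnitude (my arithmetic, not Don's): for a 10-ELO change to stand out from noise at the usual 95% level, you need on the order of several thousand games:

```python
import math

def games_needed(elo_diff, z=1.96):
    """Approximate games needed so a small ELO edge between two otherwise
    equal programs exceeds the 95% noise band (no-draw logistic model)."""
    p = 1 / (1 + 10 ** (-elo_diff / 400))   # expected win rate of the edge
    delta = p - 0.5
    return math.ceil((z / delta) ** 2 * p * (1 - p))

print(games_needed(10))   # several thousand games for a 10-ELO edge
```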

- Don



On Mon, 2008-09-08 at 15:22 +0200, Gian-Carlo Pascutto wrote:
> > Especially I was able to "reproduce" the
> > following behaviour of MC in a very clear model:
> >
> > MC plays most "goal-directed" ("zielgerichtet"
> > in German) when the position is balanced or when
> > MC's side is slightly behind. However, when
> > MC is clearly ahead or clearly behind, it plays rather
> > lazily.
> >
> > Does someone here know whether there are or have been
> > investigations of this topic in existing papers or
> > projects (for instance in the context of computer go)?
> 
> There has been discussion here about dynamic komi to keep the winning rate
> close to 50%. As far as I saw, there was no clear conclusion about whether
> it works. Some people argued that the effect should not exist and that
> measuring objective winning rates is always the right thing.
> 
> However, there is less and less doubt in my mind that the effect you
> describe exists and that it hurts playing strength.
> 
> What exactly did you do when you say "reproduce in a clear model"?
> 
> We need a better understanding of what happens (which might point to a
> solution).
> 



Re: [computer-go] Lockless hash table and other parallel search ideas

2008-09-08 Thread Álvaro Begué
On Mon, Sep 8, 2008 at 11:45 AM, Olivier Teytaud <[EMAIL PROTECTED]> wrote:
>>
>> By my recent experiments, 8~9 * (threads - 1) ELO is lost.  This
>> matches my earlier result well.
>
>
> Do you have tricks for avoiding redundancies between simulations ?
> I suggest simple tricks like "do not go to node X if there is a thread
> currently in node X"
> (simply by setting the score of the moves leading to node X to zero until
> you get out of the node).

That doesn't sound like a good idea to me, since it would force most
threads out of the interesting moves at the root.

If you are using UCT, you can simply increment the number of plays
when a thread starts exploring a branch; the number of victories will
be incremented later, if it needs to be. This way other threads will
be discouraged from taking this move, but not disproportionately.
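A minimal single-threaded illustration of that bookkeeping, often called "virtual loss" (the node layout and numbers are hypothetical; a real implementation would use atomic increments, as noted below):

```python
import math

def ucb(node, total_plays, c=1.4):
    # Standard UCB1 value; unvisited nodes would be handled separately.
    mean = node["wins"] / node["plays"]
    return mean + c * math.sqrt(math.log(total_plays) / node["plays"])

def start_simulation(node):
    node["plays"] += 1     # count the play immediately ("virtual loss")

def finish_simulation(node, won):
    if won:
        node["wins"] += 1  # the win arrives only when the playout ends

node = {"wins": 60, "plays": 100}
before = ucb(node, total_plays=1000)
start_simulation(node)
during = ucb(node, total_plays=1001)   # what a concurrent thread sees
finish_simulation(node, won=True)
print(during < before)   # True: the move is discouraged, not forbidden
```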

By the way, are people aware that you can do atomic increments with
one line of assembly code?

Álvaro.


Re: [computer-go] Lockless hash table and other parallel search ideas

2008-09-08 Thread Olivier Teytaud
>
>
> By my recent experiments, 8~9 * (threads - 1) ELO is lost.  This
> matches my earlier result well.
>

Do you have tricks for avoiding redundancies between simulations?

I suggest a simple trick like "do not go to node X if there is a thread
currently in node X" (simply by setting the score of the moves leading
to node X to zero until you get out of the node).

Re: [computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Gian-Carlo Pascutto
> Especially I was able to "reproduce" the
> following behaviour of MC in a very clear model:
>
> MC plays most "goal-directed" ("zielgerichtet"
> in German) when the position is balanced or when
> MC's side is slightly behind. However, when
> MC is clearly ahead or clearly behind, it plays rather
> lazily.
>
> Does someone here know whether there are or have been
> investigations of this topic in existing papers or
> projects (for instance in the context of computer go)?

There has been discussion here about dynamic komi to keep the winning rate
close to 50%. As far as I saw, there was no clear conclusion about whether
it works. Some people argued that the effect should not exist and that
measuring objective winning rates is always the right thing.

However, there is less and less doubt in my mind that the effect you
describe exists and that it hurts playing strength.

What exactly did you do when you say "reproduce in a clear model"?

We need a better understanding of what happens (which might point to a
solution).

-- 
GCP


[computer-go] 13x13 anchor hung

2008-09-08 Thread Don Dailey
Who is running the 13x13 anchor?   It is hung and needs to be restarted.

- Don




[computer-go] Goal-directedness of Monte-Carlo

2008-09-08 Thread Ingo Althöfer
In the last few weeks I have been investigating
Monte-Carlo (MC) game-tree search on a rather abstract
level. In particular, I was able to "reproduce" the
following behaviour of MC in a very clear model:

MC plays most "goal-directed" ("zielgerichtet"
in German) when the position is balanced or when
MC's side is slightly behind. However, when
MC is clearly ahead or clearly behind, it plays rather
lazily.
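A toy model of this effect (my own construction, not necessarily Ingo's): treat the final margin as the current lead plus the move's value plus Gaussian noise. The win-probability gap between a good move and a neutral move nearly vanishes once the lead is large, so a pure win-rate maximizer has almost nothing to optimize:

```python
import math

def win_prob(lead, move_value, noise=10.0):
    """P(final margin > 0) when the margin is lead + move_value plus
    Gaussian noise with the given standard deviation."""
    z = (lead + move_value) / noise
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Win-probability gain of a 5-point move: balanced game vs. far ahead
gap_balanced = win_prob(0, 5.0) - win_prob(0, 0.0)
gap_ahead = win_prob(30, 5.0) - win_prob(30, 0.0)
print(round(gap_balanced, 4), round(gap_ahead, 4))   # 0.1915 0.0011
```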

Does someone here know whether there are or have been 
investigations of this topic in existing papers or 
projects (for instance in the context of computer go)?

Thanks in advance for all replies!
Ingo Althofer
