Re: [computer-go] Goal-directedness of Monte-Carlo
I don't think this specific test has been done. But I'm assuming the result will be the same as in previous tests: deviating from winning percentage leads to a degradation in strength.

Brett Koonce wrote:
> It seems most of the gains in MC/UCT come fairly quickly. It would be
> interesting to perhaps do a progressive stepping down/widening, i.e. 1k
> playouts with komi + 3 as the cutoff, then feed this tree into 2k
> playouts with komi + 2, [...]

___ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/
Re: [computer-go] Goal-directedness of Monte-Carlo
Greetings from a lurker. Forgive me if I am talking out of my hat; it has been a long time since I have done any real coding.

It seems most of the gains in MC/UCT come fairly quickly (or rather, you can get within 50% of a good move guess with a few iterations). It would be interesting to do a progressive stepping down/widening: 1k playouts with komi + 3 as the cutoff, then feed this tree into 2k playouts with komi + 2, then 4k playouts with komi + 1, and then finally do the usual full-blown regular analysis, say 50k playouts (the numbers can be tweaked, of course). You would lose the initial simulations from your final one, so you would be sacrificing say 10% of the possible simulations. On the other hand, it would seem to bias the tree toward moves that have a greater chance of winning by a greedy amount, without explicitly telling the computer it has to win by a certain number, which would seem dangerous if the simulations are near the threshold.

I apologize if this is an obvious idea; I was just wondering if there is a flaw with it, or whether someone has done experiments in this direction already.

-Brett
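The schedule Brett describes can be written down directly. This is a minimal sketch with names of my own choosing, and `run_playouts` is a hypothetical stand-in for a real MC/UCT search (here it only records what it was asked to do), so the point is just the budget arithmetic of the proposal:

```python
def run_playouts(tree, n_playouts, komi_bonus):
    """Hypothetical stand-in for an MC/UCT search: would run n_playouts
    simulations, scoring a win only when the result beats komi + komi_bonus,
    reusing the tree passed in. Here it just records the schedule step."""
    tree.append((n_playouts, komi_bonus))
    return tree

def progressive_komi_search(total_budget=57_000):
    """Brett's proposed schedule: seed the tree with small, greedy
    (komi-inflated) searches, then spend the remaining budget on a full
    search at the true komi. The numbers match the ones in the post."""
    tree = []
    spent = 0
    for n, bonus in [(1_000, 3), (2_000, 2), (4_000, 1)]:
        tree = run_playouts(tree, n, bonus)
        spent += n
    # the rest of the budget goes to the usual analysis at the real komi
    tree = run_playouts(tree, total_budget - spent, 0)
    return tree, spent / total_budget

tree, overhead = progressive_komi_search()
```

With a 57k total budget the greedy seeding phase consumes 7k playouts, i.e. roughly the ~10% overhead the post estimates, while the final 50k-playout search still starts from a tree biased toward comfortably winning lines.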
Re: [computer-go] Goal-directedness of Monte-Carlo
Actually, your summary of what people do sounds exactly like what MC programs do, except for one point: MC programs don't differentiate moves by point value. They only look at winning rate. It's extremely tough to differentiate the one move sequence with a 99.1% win rate when all other moves have a 99% win rate. Without any other heuristics or local search to guide MC programs, their play seems reckless...

Sent from my iPhone

On Sep 8, 2008, at 5:45 PM, terry mcintyre <[EMAIL PROTECTED]> wrote:
> Interesting analysis, Don. Human players sometimes adhere to a simple
> policy: "rich men don't pick fights." [...] Can that strategy be
> effectively translated to MC terms?
Re: [computer-go] Goal-directedness of Monte-Carlo
Don Dailey wrote:
>> Would a discrepancy in the amount of ELO gained or lost per handicap
>> stone, when comparing MC bots to humans & classical computers, be a
>> good measure of the maximum possible improvement?
>
> Maybe. How could you accurately make such a measurement without
> thousands of games?

I guess the number for human players should be known at this point. I can give my bot a stone and let it play against itself and GNU Go, and see what happens. After 1000 games we'll have a good impression of the order of magnitude.

-- GCP
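Converting the result of such a match into an Elo figure is standard. This is a sketch under the usual logistic Elo model (function names are mine, not from the thread): a measured win rate p maps to a rating gap of 400*log10(p/(1-p)).

```python
import math

def elo_from_winrate(p):
    """Elo difference implied by a measured win rate p under the usual
    logistic model: p = 1 / (1 + 10**(-d/400))."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_of_handicap_stone(wins, games):
    """Estimate the Elo value of one handicap stone from a match where
    the side receiving the stone scored `wins` out of `games`."""
    return elo_from_winrate(wins / games)
```

For example, if the side getting the stone wins 750 of 1000 games, the stone is worth about 191 Elo by this model; 1000 games gives roughly the order-of-magnitude precision GCP mentions.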
Re: [computer-go] Goal-directedness of Monte-Carlo
Interesting analysis, Don.

Human players sometimes adhere to a simple policy: "rich men don't pick fights." When one is objectively far ahead, one picks up the easy profits and otherwise takes no risks. If moves A, B, and C are comparable risk-wise, one prefers the most profitable of the lot. On the other hand, when one is far behind, one takes risks. Such a strategy appears to maximize wins, especially when one is uncertain about the status. Can that strategy be effectively translated to MC terms?

To approach the problem from another angle, strong amateur and professional players have a consensus that some moves return maximal value, others are unsatisfying, and still others are risky. They seem to have a high level of agreement about the value of low-risk moves; disputes arise for high-risk plays where the outcome is less certain.
Re: [computer-go] Goal-directedness of Monte-Carlo
On Mon, 2008-09-08 at 20:01 +0200, Gian-Carlo Pascutto wrote:
> Don Dailey wrote:
>> That probably just means I have not stumbled on the right ideas or
>> that I was not able to properly tune it. [...]
>
> Would a discrepancy in the amount of ELO gained or lost per handicap
> stone, when comparing MC bots to humans & classical computers, be a
> good measure of the maximum possible improvement?

Maybe. How could you accurately make such a measurement without thousands of games?

The problem seems to be a catch-22. If you are in a dead-won position, it's really risky to tell the program that you are NOT in a dead-won position. It then doesn't understand what is required to win the game; it only knows that it must win another stone at all costs, whether that's possible or not. Most of the time it IS possible to win another stone or more without much risk. But as soon as you do, the dynamic komi adjuster says you must do it again, and again, and again, until you reach a situation where it is not possible. And I believe that is where the trouble comes. At some point, you have set a goal too high to reach. This is a signal to the program that it must try at all costs to win (what appears to it to be) a dead-lost game, and of course it will very likely play a high-risk desperation move in order to please its master. So a simple, naive scheme is not going to work.

However, this would probably work pretty well if you had some way to gain prior knowledge about whether it would be safe to "escalate" or not. One simple way that might work, with some tuning, is to use search. If you are winning the game with high confidence, reset the komi a few stones and do another search (from scratch) to see if you are still easily winning. Perhaps something like a binary search will find the right komi value that gives you a high winning confidence with maximum greed, or some acceptable balance of the two. However, such a scheme is going to cost you resources, which may cancel some or all of the benefit.

My own gut feeling tells me that you are playing with pretty small margins. At best, how much can we expect to gain? I think this is probably something we need to explore and do, especially if it's important to the reputation of your product, or to produce a product that mimics more closely the style of human players, who are less concerned about the beauty of omission. I personally see a bit of beauty in this style, even though it certainly looks odd when you are not used to it.

When losing the game, a dynamic adjuster may be safer. After all, you are losing anyway, so why not try something? It's not risky trying to win a lost game by picking off whatever stones you can.

- Don
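Don's binary-search idea can be sketched directly. This is a toy illustration with names of my own; `winrate_at(bonus)` is a hypothetical stand-in for re-running the search from scratch with komi raised by `bonus` stones (assumed non-increasing in `bonus`), and each probe of it is where the real resource cost he mentions would go:

```python
def find_greedy_komi(winrate_at, target=0.90, max_bonus=20, iters=12):
    """Binary-search the largest komi bonus whose searched winning
    confidence stays above `target`: as greedy as possible while still
    winning comfortably."""
    lo, hi = 0.0, float(max_bonus)   # invariant: lo is safe, hi is not
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if winrate_at(mid) >= target:
            lo = mid    # still winning with high confidence: try greedier
        else:
            hi = mid    # goal set too high: back off
    return lo

# toy oracle: confidence drops linearly from 0.99 as the goal escalates
bonus = find_greedy_komi(lambda b: 0.99 - 0.04 * b)
```

With this toy oracle the search settles near a bonus of 2.25 stones, the largest value keeping confidence at or above 90%. Each iteration costs one full search, which is exactly the trade-off Don flags.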
[computer-go] Goal-directedness of Monte-Carlo
Hello Gian-Carlo,

Gian-Carlo Pascutto wrote:
> There has been discussion here about dynamic komi to keep the winning
> rate close to 50%. [...]
> What exactly did you do when you say "reproduce in a clear model"?

I have just written a quick report on my experiment. You can read or download it from http://www.althofer.de/mc-laziness.pdf

Best regards, Ingo.
Re: [computer-go] Goal-directedness of Monte-Carlo
Don Dailey wrote:
> That probably just means I have not stumbled on the right ideas or
> that I was not able to properly tune it. I would be delighted if
> someone was able to show us a workable scheme. I believe if something
> is found it will result in a very minor improvement, but that it will
> be an actual improvement.

Would a discrepancy in the amount of ELO gained or lost per handicap stone, when comparing MC bots to humans & classical computers, be a good measure of the maximum possible improvement?

-- GCP
[computer-go] Re: 13x13 anchor hung
Don Dailey <[EMAIL PROTECTED]>:
> Who is running the 13x13 anchor? It is hung and needs to be restarted.
> - Don

It's me, and it's restarted right now. Hmmm, I cannot find anything strange in the log file. My guess is that no players other than the anchor were running for a while, so the match server didn't send any messages to my site and the NAPT table time-out syndrome happened.

Hideki

-- [EMAIL PROTECTED] (Kato)
Re: [computer-go] Lockless hash table and other parallel search ideas
Olivier Teytaud <[EMAIL PROTECTED]>:
>> By my recent experiments, 8~9 * (threads - 1) ELO is lost. This
>> matches my earlier result well.
>
> Do you have tricks for avoiding redundancies between simulations ?

Yes. I use Sylvain's FPU and decrease it a little before starting a simulation, say, fpu *= 0.99. This is very simple and fast. For detailed info, you can read my implementation of PMCTS (parallel MCTS) submitted to GPW 2007. Though it's written in Japanese, the pseudocode is in English, and the abstract and captions are written in both languages. http://www.geocities.jp/hideki_katoh/publications/gpw2007/gpw07-private.pdf If you have any trouble downloading or reading it, please let me know.

Hideki

> I suggest simple tricks like "do not go to node X if there is a thread
> currently in node X" (simply by setting the score of the moves leading
> to node X to zero until you get out of the node).

-- [EMAIL PROTECTED] (Kato)
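One way to read Hideki's trick: first-play-urgency (FPU) is the score given to unvisited children, so nudging it down per simulation makes later simulations prefer already-visited moves instead of all rushing the same fresh node. A toy sketch of that effect, assuming a plain UCT selection rule (all function names here are mine, not from Hideki's paper):

```python
import math

def uct_value(wins, visits, parent_visits, fpu, c=1.0):
    """UCT score of a child; unvisited children get the
    first-play-urgency (FPU) value instead of mean + exploration."""
    if visits == 0:
        return fpu
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def pick_move(children, fpu):
    """children: list of (wins, visits). Returns index of the best child."""
    parent_visits = max(1, sum(v for _, v in children))
    scores = [uct_value(w, v, parent_visits, fpu) for w, v in children]
    return scores.index(max(scores))

children = [(6, 10), (0, 0)]      # one visited move, one unvisited
fpu = 1.1
# A simulation with the default FPU heads for the unvisited move...
first = pick_move(children, fpu)
# ...but after fpu *= 0.99 has been applied 20 times (one per simulation,
# as in the post), the visited move is preferred instead.
second = pick_move(children, fpu * 0.99 ** 20)
```

The exact constants are illustrative; the point is only that a slowly decaying FPU diversifies which branches concurrent simulations expand.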
Re: [computer-go] Goal-directedness of Monte-Carlo
I have some sense that it might be possible to slightly improve the playing strength with some dynamic komi scheme. However, I have also experimented quite a bit with various ways to do this, and in each case I have been able to detect at least a slight weakening of play.

That probably just means I have not stumbled on the right ideas or that I was not able to properly tune it. I would be delighted if someone was able to show us a workable scheme. I believe if something is found it will result in a very minor improvement, but that it will be an actual improvement. When trying ideas like this I require statistically significant samples, which generally means thousands of games if the improvement or weakening is fairly small.

- Don

On Mon, 2008-09-08 at 15:22 +0200, Gian-Carlo Pascutto wrote:
> > MC is playing most "goal-directed" ("zielgerichtet" in German) when
> > the position is balanced or when the side of MC is slightly behind.
> > However, when MC is clearly ahead or clearly behind it is playing
> > rather lazy.
>
> There has been discussion here about dynamic komi to keep the winning
> rate close to 50%. [...] What exactly did you do when you say
> "reproduce in a clear model"?
Re: [computer-go] Lockless hash table and other parallel search ideas
On Mon, Sep 8, 2008 at 11:45 AM, Olivier Teytaud <[EMAIL PROTECTED]> wrote:
> Do you have tricks for avoiding redundancies between simulations ?
> I suggest simple tricks like "do not go to node X if there is a thread
> currently in node X" (simply by setting the score of the moves leading
> to node X to zero until you get out of the node).

That doesn't sound like a good idea to me, since it would force most threads out of the interesting moves at the root. If you are using UCT, you can simply increment the number of plays when a thread starts exploring a branch; the number of victories will be incremented later, if it needs to be incremented. This way other threads will be discouraged from taking this move, but not disproportionately.

By the way, are people aware that you can do atomic increments with one line of assembly code?

Álvaro.
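The accounting Álvaro describes (now usually called "virtual loss") can be shown in a few lines. This is a single-threaded sketch of the bookkeeping only, with class and function names of my own; in a real multithreaded engine the two increments would be atomic fetch-and-add operations (the one-instruction atomic increment he alludes to, e.g. `lock xadd` on x86):

```python
import math

def ucb(wins, plays, parent_plays, c=1.0):
    """Plain UCB1 score used for child selection in UCT."""
    return wins / plays + c * math.sqrt(math.log(parent_plays) / plays)

class Node:
    """Minimal child statistics for the scheme in the post."""
    def __init__(self):
        self.wins = 0
        self.plays = 0

    def start_simulation(self):
        # Incremented as soon as a thread descends into this child;
        # in a parallel engine this would be an atomic increment.
        self.plays += 1

    def finish_simulation(self, won):
        # The win is only added later, at backup time.
        if won:
            self.wins += 1

node = Node()
node.wins, node.plays = 60, 100
before = ucb(node.wins, node.plays, 1000)
node.start_simulation()           # another thread enters this branch
during = ucb(node.wins, node.plays, 1000)
node.finish_simulation(True)      # backup restores the true statistics
```

Between `start_simulation` and `finish_simulation` the child's UCB score is slightly lower, so other threads are discouraged from piling onto the same branch, but only in proportion to how many simulations are already in flight there, rather than being excluded outright as in the zero-score trick.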
Re: [computer-go] Lockless hash table and other parallel search ideas
> By my recent experiments, 8~9 * (threads - 1) ELO is lost. This
> matches my earlier result well.

Do you have tricks for avoiding redundancies between simulations? I suggest simple tricks like "do not go to node X if there is a thread currently in node X" (simply by setting the score of the moves leading to node X to zero until you get out of the node).
Re: [computer-go] Goal-directedness of Monte-Carlo
> Especially I was able to "reproduce" the following behaviour of MC in
> a very clear model:
>
> MC is playing most "goal-directed" ("zielgerichtet" in German) when
> the position is balanced or when the side of MC is slightly behind.
> However, when MC is clearly ahead or clearly behind it is playing
> rather lazy.
>
> Does someone here know if there are or have been investigations on
> this topic in existing papers or projects (for instance in the context
> of computer go)?

There has been discussion here about dynamic komi to keep the winning rate close to 50%. As far as I saw, there was no clear conclusion about whether it works. Some people argued that it should not exist and that measuring objective winning rates is always the right thing.

However, there is less and less doubt in my mind that the effect you describe exists and that it hurts playing strength.

What exactly did you do when you say "reproduce in a clear model"? We need a better understanding of what happens (which might point to a solution).

-- GCP
[computer-go] 13x13 anchor hung
Who is running the 13x13 anchor? It is hung and needs to be restarted. - Don
[computer-go] Goal-directedness of Monte-Carlo
In the last few weeks I have been investigating Monte-Carlo (MC) game tree search on a rather abstract level. In particular, I was able to "reproduce" the following behaviour of MC in a very clear model:

MC is playing most "goal-directed" ("zielgerichtet" in German) when the position is balanced or when the side of MC is slightly behind. However, when MC is clearly ahead or clearly behind it is playing rather lazy.

Does someone here know if there are or have been investigations on this topic in existing papers or projects (for instance in the context of computer go)?

Thanks in advance for all replies!

Ingo Althofer