On Sun, 2008-08-10 at 01:59 +0200, Vincent Diepeveen wrote:
> On Aug 9, 2008, at 9:45 PM, Don Dailey wrote:
> 
> >>> I'm curious what you guys think about the scalability of monte carlo
> >>> with UCT.
> >
> > The MCTS technique appears to be extremely scalable.  The theoretical
> > papers about it claim that, in the limit, it converges to perfect play.
> >
> 
> We agree here that this is not true of course.

No, I think we disagree this time, my friend! 

Monte Carlo by itself is of course not scalable.  But when combined with
tree search such as UCT, it is equivalent to a mini-max search with a
high-quality evaluation at the leaf nodes.  It's scalable because the
longer it searches, the more it behaves like a proper mini-max search.

It's not just theoretically scalable; practice is proving that it's
scalable in the real world.
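To make that concrete, here is a minimal, self-contained UCT sketch on a toy Nim game (take 1 or 2 stones; whoever takes the last stone wins).  The game, names, and constants are all illustrative assumptions for this sketch, not Mogo's or any real program's code:

```python
# Minimal UCT sketch on toy Nim: players alternately take 1 or 2 stones,
# and whoever takes the last stone wins.  Illustrative assumptions only.
import math
import random

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

def random_playout(stones):
    """Uniformly random playout; returns 1 if the player to move wins."""
    plies = 0
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        plies += 1
    return 1 if plies % 2 == 1 else 0  # odd plies: the first mover took the last stone

class Node:
    def __init__(self, stones, parent=None):
        self.stones, self.parent = stones, parent
        self.children = {}        # move -> Node
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

def ucb1(child, parent_visits, c=1.4):
    # UCB1: average result plus an exploration bonus for rarely tried moves.
    if child.visits == 0:
        return float("inf")
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def uct_search(stones, iterations):
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.stones > 0 and len(node.children) == len(legal_moves(node.stones)):
            node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child.
        if node.stones > 0:
            m = random.choice([m for m in legal_moves(node.stones) if m not in node.children])
            node.children[m] = Node(node.stones - m, node)
            node = node.children[m]
        # 3. Simulation: random playout scored from the node's mover's view.
        result = random_playout(node.stones)
        # 4. Backpropagation: flip the perspective at every ply.
        while node is not None:
            node.visits += 1
            node.wins += 1 - result        # a win for the player who moved in
            result = 1 - result
            node = node.parent
    # Play the most-visited root move.
    return max(root.children, key=lambda m: root.children[m].visits)

random.seed(1)
print(uct_search(5, 4000))  # taking 2 leaves 3, a lost position for the opponent
```

The more iterations it runs, the more the visit counts concentrate on the true mini-max move, which is exactly the sense in which MCTS "acts like a proper mini-max search" as it scales.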



> 
> You can prove that random forms of search basically search based upon  
> mobility.

In Go, both players have the same move choices, except for self-atari
and ko, depending on the exact rules.


> The insight for that is real simple. If you control more area you
> have more possibilities,
> so the statistical odds of that propagating back to the root are
> bigger than in positions where you
> control little area.

But mobility would be the same for both players in Go, unless the
mobility measure were more sophisticated - counting good moves instead
of just any moves - but then it's no longer uniformly random.

But even so, as long as the evaluation eventually terminates in fully
played-out positions that are scored correctly, it's scalable, even if
horribly slow and impractical.

MCTS programs are not mobility-driven programs.
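As a concrete contrast, here is what a purely mobility-driven searcher might look like: a fixed-depth negamax whose only evaluation is a count of legal moves, sketched on 3x3 Domineering (a toy where mobility really is the natural heuristic).  The game choice, names, and constants are my own illustrative assumptions, not code from any real program:

```python
# A purely mobility-driven searcher, sketched on 3x3 Domineering:
# Vertical places 2x1 vertical dominoes, Horizontal places 1x2 ones,
# and whoever cannot move loses.  Illustrative assumptions only.
from functools import lru_cache

def moves(board, vertical):
    """All placements for the side to move; board is a frozenset of empty (row, col) cells."""
    dr, dc = (1, 0) if vertical else (0, 1)
    return [((r, c), (r + dr, c + dc))
            for (r, c) in board if (r + dr, c + dc) in board]

@lru_cache(maxsize=None)
def negamax(board, vertical, depth):
    my_moves = moves(board, vertical)
    if not my_moves:
        return -100                    # side to move is out of moves and loses
    if depth == 0:
        # The entire evaluation is mobility: my options minus the opponent's.
        return len(my_moves) - len(moves(board, not vertical))
    return max(-negamax(board - frozenset(pair), not vertical, depth - 1)
               for pair in my_moves)

start = frozenset((r, c) for r in range(3) for c in range(3))
print(negamax(start, True, 1))  # Vertical's best first move nets a mobility edge of 2
```

Scale `depth` up and the mobility signal drives play entirely - that is the kind of searcher being contrasted with MCTS here.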

> 
> So a very selective searcher equipped with a good mobility function
> will beat it,
> as its worst case is better. As you correctly notice, at a certain
> point, when programs get really
> strong in certain areas, the randomness of the search tree will
> backfire, as it creates a
> worst case in program-vs-program play.
> 
> There are, however, 2 problems to solve to make it happen, maybe a 3rd:
>     a) making a good mobility evaluation function. Not easy in chess
> nor in go.
>          I must admit that in chess it took me until somewhere in
> 2004-2005, so more than 11 years
>         after starting with chess, to make one. And this as a good chess
> player who has put in a lot of time there.
>          In go, where the literature is even more inaccessible to
> programmers, it will be even harder.
> 
>     b) selective searching is not so easy. How many are there who can
> search ultra-selectively
>         and still improve big-time in playing strength when given more
> CPU time? Well, other than Leela, that is...
> 
> Historical note: we had randomness in chess too. If you use just
> mobility, nothing else, not even material values,
> you soon end up above 2000 Elo.
> 
> Go programs are still far from that.
> 
> So until then you will still have to deal with the busloads of guys
> who claim neural networks
> work very well as a replacement for search and/or evaluation.
> 
> It took until 1998 or so to find great, well-tuned parameters for
> material in chess (done by Chrilly Donninger;
> note he posted publicly that the tuning didn't matter, just the
> patterns themselves - an interesting statement at the time).
> 
> Now the question is: what will happen when someone finds those for
> the most dominant positional components in computer go?
> 
> Note, Don, that the obvious thing in go - which a lot of humans who
> suck at go miss - is that finding the best
> moves in a position, as a subset of all legal moves, is a LOT easier
> than in chess.

I agree with this completely.   The go boys won't want to hear that
because they think chess is trivial.  


> So selective search in go is far easier to do well than in chess.
> It's relatively easy to select, each time, just the 30 to 40 moves
> that are really relevant. If you look at it like that, then you can
> easily achieve far deeper selective search depths that are relevant
> in go than in chess, given the same hardware.

I agree up to a point.  The problem is that you have to search a lot
deeper to get anything very useful unless your evaluation function
truly shines.  But I definitely agree that you can get away with a lot
more pruning in Go.

- Don


> 
> In chess you simply cannot afford not to look at the most stupid line
> possible, simply because sometimes sacrificing the entire board works.
> The goal is just conquering the enemy's king.
> 
> In go, however, if you have a won position with some certain degree
> of sureness, there is NEAR ZERO need to look further.
> Hard pruning works far better there.
> 
> Effectively, top go programs therefore get selective search depths
> similar to those of chess programs at the start of the game,
> given the same hardware and time controls. This despite the fact that
> chess programs get far higher NPS figures.
> 
> When today's top chess programs run on hardware from 1999 - the quad
> Xeon 500 MHz machines that showed up at the world champs -
> at 3 minutes a move, they totally destroy the field from back then
> with 100% scores. Not a single draw.
> 
> So we can argue a lot about search here. Considering the selective
> search depths today's go programs get, I would argue that
> the most important improvement now is the evaluation function. That
> doesn't mean it needs to be implemented in the same manner,
> nor that it needs to be some sort of huge function.
> 
> Considering the go strengths of the top authors in computer-go, one
> would argue they had better implement simple knowledge
> and tune it very well.
> 
> That'll kick more butt than a small improvement in search.
> 
> People who devoutly believe in search depth as the absolute truth are
> interesting people.
> If I play today's Diep on hardware from 1999 against the Fritz of
> those days, one of the
> stronger programs at the time, Fritz would get 17 plies every move,
> versus maybe 12 for my Diep.
> 
> Hell, I'll even give 7 plies of odds when needed. Just let me turn on
> a few extensions so as not to be TOO weak tactically.
> 
> It won't help the Fritz of those days. It loses big-time.
> 
> On paper 7 extra plies of search depth are worth 700 Elo or so, so
> today's Diep should stand 0% chance - just on paper.
> 
> In chess the center is dominant. Everyone figured out real soon that
> pieces and pawns in the center are worth more than
> those outside it. In go all this is more complicated than in chess.
> In today's top chess programs, when they do not know what to do,
> they just put their (light) pieces and pawns in the center. Even at
> the highest levels, that simply wins the games.
> 
> That's how dominant such a simple knowledge rule is. There are just 4
> center squares in chess; you'll realize from that how easy chess is
> from a knowledge viewpoint: material + central placement of (light)
> pieces. Because the goal is to mate the king,
> search cannot abort easily based upon the above positional knowledge.
> 
> In go, termination is far easier; what's hard to discover is HOW TO
> EVALUATE in a simple yet GENERIC and really effective manner.
> Secondly: how to TUNE it?
> 
> Whether you do that with today's random search or not won't matter
> as much as the improvement possible
> in the evaluation function.
> 
> A 16-core Power6 node should be more than enough to beat the
> majority of all professional players.
> 
> Of course it is very well possible that a really good evaluation
> function of the future will require a better
> search to go with it. For now I'd say: try to find that evaluation
> function.
> 
> If I could put today's Diep evaluation back into 1999 Diep's search -
> which sucks real major-league compared to today's
> searching - it still would have won any event from 2004 and earlier,
> given the hardware I used at each championship,
> which was never the fastest hardware except for the 2003 world champs
> (so I would replace the 2003 supercomputer
> with my old dual K7, which even in 2003 was really slow compared to
> the quad Xeon MP 2.8 GHz that most participants
> showed up with).
> 
> Note that I would argue that my evaluation as it is today won't win
> the world champs 2008. I really need to do a lot in the coming
> month to do well there. Most likely in 2009 I'll tell you that my
> 2008 evaluation function was SO WEAK that
> any type of non-totally-bugged search could beat it.
> 
> So I would argue that all the discussion above about search is
> totally irrelevant to the future of computer-go.
> 
> Vincent
> 
> > My feeling is that as you scale up in power, certain things will
> > improve relative to humans faster than other things, as you imply.
> > This happened in computer chess, and to this day computers are
> > inferior to human players in many ways, and yet the computers are
> > superior at playing the game overall.
> >
> > At some point computer go programs will start doing some things
> > better than the top players, while other things will remain behind.
> > So they will still lose.  The things the computers do not do well
> > will continue to slowly improve, helping the situation.  The things
> > computers do better will become overwhelming, and it's only a
> > question of when the combined effect is enough to beat the top
> > players.
> >
> > I don't believe there are any solid barriers as you imply.  There
> > are just some things that are harder than others, and improvements
> > in those areas will come more slowly (when compared to humans).
> >
> > I personally believe that what computers do better will give humans
> > a harder time than the reverse.  It may be that once computers start
> > doing a few things better, humans will have a difficult time.
> >
> > In chess this manifested itself in 2 areas - tactics and consistency.
> > They do "consistency" better than any human will.  A human will make
> > a silly oversight that is "uncharacteristic" of him.  A computer may
> > also make errors, but not "uncharacteristically."  A chess computer
> > will never miss a 3-move checkmate, for instance.  This is a
> > wonderful strength that should not be underestimated.  In tennis it
> > is hard to beat even a mediocre player whose only strength is that
> > he doesn't make errors.  If the ball comes near him, it will always
> > come back and you will never get a free point.  It puts the pressure
> > on you to deliver, and you must never miss.
> >
> > Because of their ability to calculate accurately, chess programs
> > quickly became known for their tactical ability.  Even though they
> > made positional errors frequently, it got to the point where you
> > still had to work hard to beat them, and they never let down.  So
> > players then had to actually change their style in order to win,
> > which in turn is a constraint on the way you play and makes you
> > weaker.
> >
> > It will be harder in Go than in chess.  The weaknesses are more
> > glaring and the strengths are less useful in Go, but I think it will
> > eventually happen.
> >
> > - Don
> >
> >
> >
> >
> >
> > On Sat, 2008-08-09 at 14:54 -0400, Robert Waite wrote:
> >> I'm curious what you guys think about the scalability of monte carlo
> >> with UCT. Let's say we took a cluster like the one that was used for
> >> the Mogo vs. Kim game. Then let's say we made 128 of these clusters
> >> and connected them together efficiently. Putting aside implementation
> >> and latency issues... what kind of stones-strength increase would you
> >> imagine?
> >>
> >> It's a pretty arbitrary guess... but do you think one stone of
> >> improvement... or would this alone be enough to beat even a pro?
> >>
> >> I am wondering because there could be a weakness or limit in MC w/
> >> UCT. I am only now learning about the UCT addition... but there are
> >> vast numbers of possible games that are never visited during the
> >> monte carlo simulations. The random stone simulations are pretty
> >> random, aren't they? I am reading some of the papers on the UCT
> >> addition... and it does seem to show that certain branches are
> >> "better" and worth more time. Pro players may have a counter-strategy
> >> that might come out as Mogo is tested at higher levels of play.
> >> Perhaps there will be a need to combine MC w/ UCT with heuristics or
> >> more knowledge-based play. Going the route of heuristics seems
> >> unpleasant, and the promise of using a more computational method
> >> would be great. However... if MC techniques alone have diminishing
> >> returns... the route of heuristics might come back (or perhaps a
> >> whole new paradigm for game algorithms).
> >>
> >> I am still secretly on the side of humans beating the machine at
> >> go... but the recent match really changed my view on the topic and
> >> really showed the value of statistical analysis. I am just wondering
> >> what kind of roadblocks might show up for the monte carlo techniques.
> >>
> >> _______________________________________________
> >> computer-go mailing list
> >> [email protected]
> >> http://www.computer-go.org/mailman/listinfo/computer-go/
> >
> >
> 
