If you google thompson heuristic ( don't forget the p in thompson ), you'll 
find a number of hits.

 Terry McIntyre <terrymcint...@yahoo.com>


-- Libertarians Do It With Consent!




________________________________
From: "dave.de...@planet.nl" <dave.de...@planet.nl>
To: computer-go <computer-go@computer-go.org>
Sent: Monday, March 2, 2009 12:12:15 PM
Subject: RE: [computer-go] re-CGT approximating the sum of subgames

RE: [computer-go] re-CGT approximating the sum of subgames 
Hi Philipp,
 
Thank you for replying. I've seen your article before, but I read it only 
superficially then. By the way, I couldn't find any online source about the 
Thomson Heuristic. Do you know if ther are any?
 
Fistly, I want to use ownership and local correlation statistics from light 
montecarlo playout for discovering subgames. 
I'm not particularly focussed on combining UCT and CGT directly.
I don't want to perform global UCT or minimax tree search. This kind of tree 
search should be confined to local subgames only (to improve the accuracy of 
the mean and temperature of subgames).
I only want to use global lookahead to discover subgames, not for exploring the 
global game tree.
 
Secondly, discovering temperature and mean of subgames:
Perhaps it is possible to get a rough approximation of the mean and temperature 
of subgames from ownership statistics and their correlations.
Perhaps it is possible to represent large, cool subgames (ownership 
distribution near random, low correlation with surroundings) like the center in 
the opening, by simple statistical properties (the environment) until more 
details are discovered.
 
Thridly, early in the game, moves don't just affect subgames, they also change 
how the global game divides into subgames. 
This is a major complication.
Also, early in the game, subgames could have fuzzy (but cool) edges (like a 
joseki). 
I don't have a clear picture yet for how to deal with dynamic subgames with 
fuzzy edges. I'm envisioning a structure containing subgame divisions, pointing 
to subgames in a hashtable.
So this structure would represent the superposition of subgame divisions 
discovered so far. But this is still very unclear.
Also, lookahead within subgames may stuble upon subgame interference in a 
portion of their subgame trees (which may or may not affect the canonicalform 
of the combined subgame). I have no idea yet how to deal with this.
 
I know, this is just a bunch of vague ideas with a lot of ifs.
So I have some work to do ;)
 
Dave

________________________________
 Van: computer-go-boun...@computer-go.org namens Philipp Hennig
Verzonden: ma 2-3-2009 1:04
Aan: computer-go@computer-go.org
Onderwerp: RE: [computer-go] re-CGT approximating the sum of subgames


Hi Martin, Dave and others,

I have never posted to this list before (so, hi everyone! :-), but 
your discussion on approximate CGT caught my eye.

Dave. you mentioned that you are interested in trying to mix UCT and 
CGT:
Dave Devos wrote on 19 Feb 2009, 19:33 GMT:
> Ok, I'll look into your temperature discovery article. I noticed 
> that Amazons is mentioned a lot in CGT, but I am not familiar with 
> the game. I want to use montecarlo to discover subgames. Then I want 
> to use CGT techniques to evaluate and sum subgames to get a full 
> board evaluation. If this is possible, it could dramatically reduce 
> the combinatorial explosion on large boards (like you said, totally 
> killing full board search) This is the idea. I don't even know if it 
> is possible, but I definitely want to give it a try.


I worked  on a similar idea during a short project early on in my PhD 
(which has since taken other directions). We (David Stern, Thore 
Graepel and myself) tried to use UCT to measure the temperature of 
Amazons games. You can find a technical report at

http://www.inference.phy.cam.ac.uk/ph347/CPGS-Report_Hennig.pdf
(I turned this into my first-year report, so you will have to excuse 
the length and the rambling at the beginning. If you know about UCT 
and CGT, you can directly jump to chapters 2.3 or 3.0).

The short of it is: The results were not very encouraging. The problem 
is that TDS is searching for an event (the switch from the "stack" of 
coupons to play on the board) which happens at some depth >1 in the 
game tree*, and UCT is not very good at performing inference on such 
quantities. The only thing it is really good at is inference about the 
value of the topmost nodes, where the statistical fidelity is good 
(nodes even a few steps deep into the tree will often only be visited 
a handful of times, so there is not much data available to make a good 
estimate from).
Further, adding a coupon stack to the overall game seems to sometimes 
create situations where the roll-outs become biased towards one 
player. In such situations, it can take UCT very long to discover the 
right solution, wasting a lot of search time expanding the wrong part 
of the tree. In chapter 5.1.1 of my report, you will find an 
experimental study of this behaviour using an artificial game tree.
Then, there are a few more subtleties to keep in mind. One is that Go 
games rarely are truly separable into sub-games (End-Games are an 
exception, as Martin showed in his paper mentioned in his earlier 
mail). Independence may hinge on just one particularly "ingenious" 
move deep in the tree and therefore might be hard to find by Monte-
Carlo search (but this is pure conjecture, so don't let yourself be 
stopped from trying). Finally, searching for independence will cost 
time itself.
So, in the end, you need to search for independence, which takes time 
and introduces errors, because you might miss crucial connections 
between sub-games. Then you need to search for the temperature of each 
sub-game, which is more expensive than just searching for a good move 
in the sub-game, since the TDS tree is more complex than the board 
tree itself, and UCT needs way longer to build up any statistical 
evidence around the interesting bits deep in the search tree (where 
you also find the best next move). It also causes more uncertainty, 
since UCT-based TDS will have an unknown error. And, finally, playing 
HOTSTRAT is in itself an approximation to optimal play, although 
apparently a good one, and with bounded error.
UCT on the whole board, on the other hand, has to struggle with the 
combinatorial explosion of the tree. But we know it converges to 
optimal play, eventually.
Bottom line: I would be interested to hear about your results. All of 
my arguments above are obviously not very quantitative, and it might 
still be possible to solve Go by Divide and Conquer (humans do it, 
apparently), so I'm not saying you shouldn't try, not at all. But just 
naively mixing TDS and UCT, as I did, is not recommended. :-)

Good luck with your experiments!

Philipp

*more precisely, the optimal shift happens at depth ~ ((T_max - T) / 
delta), where T is the temperature of the game, T_max is the value of 
the top-most coupon and delta is the resolution of the measurement 
(i.e. the difference between face values of consecutive coupons), 
because you will typically only have an upper bound of the true 
temperature, and the value of the topmost coupon on the stack has to 
be larger than the temperature. Also, TDS robustness to increasing 
delta or varying delta over the stack is not entirely understood, (see 
Martin's original TDS paper). In particular, something like a binary 
search for the right Temperature (starting with a huge delta, and 
subsequently refining the stack) didn't work for us.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/



      
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to