I have been following the scaling study closely. Thanks to everyone for
gathering such interesting data, and especially to Don and the people
donating computers.

 

I have 2 suggestions that could help increase the amount of information
gathered from all the CPU hours being used to play these games. The first
suggestion is about changing the algorithm used to pair players together,
and the second is about "seeding" players to an appropriate starting
position.

 

Suppose we have two players A and B who each have about a 50% chance of
winning when they play each other. We expect to gain about -(0.5*log2(0.5) +
0.5*log2(0.5)) = 1 bit of information from every game they play. On the
other hand, if player A has a 95% chance of beating player B, we get about
-(0.95*log2(0.95) + 0.05*log2(0.05)) = 0.29 bits of information per game.
Therefore, the more often you are playing nearly equal players, the more
information is gained.
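
To make this concrete, here is a minimal sketch in Python (the quantity is
just the binary entropy of the win probability):

import math

def game_bits(p):
    """Expected information (bits) from one game won with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(game_bits(0.50))  # 1.0 bit for evenly matched players
print(game_bits(0.95))  # ~0.29 bits for a 95/5 matchup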

 

Now suppose we have a list of players ranked 0 to N, where the lower the
ranking, the stronger the player. Assume that player n takes twice as long
to play a game as player n+1 for any n. Also assume that the Elo difference
between player n and player n+1 is 80 points, and that the cost of player 0
playing a game is 1.0 time units. These assumptions seem reasonable for the
upper levels of the study if we only consider the Mogo programs up to
Mogo_16 (the ones that have about 500 games). Using these assumptions we can
build the following table.

 

Rank    Cost      WinProb   Bits      Bits/Cost
 0      2.0000    0.5000    1.0000    0.5000
 1      1.5000    0.3869    0.9627    0.6418
 2      1.2500    0.2847    0.8618    0.6895
 3      1.1250    0.2008    0.7234    0.6431
 4      1.0625    0.1368    0.5758    0.5419
 5      1.0313    0.0909    0.4395    0.4262
 6      1.0156    0.0594    0.3249    0.3199
 7      1.0078    0.0383    0.2344    0.2326
 8      1.0039    0.0245    0.1660    0.1654
 9      1.0020    0.0156    0.1160    0.1157
10      1.0010    0.0099    0.0801    0.0801

 

For example, if player 0 plays player 1 then the total cost is 1.5, because
player 0 takes 1 unit of time and player 1 takes 0.5 (half of player 0's
time), for a total of 1.5 units of time. The Elo difference is 80, so using
the formula P = 1 / (1 + 10^(80/400)) = 0.3869 we see that player 1 has
about a 39% chance of winning. The bits of information we expect to get are
-(0.3869*log2(0.3869) + (1 - 0.3869)*log2(1 - 0.3869)) = 0.9627, and the
Bits/Cost is 0.9627/1.5 = 0.6418. The table shows the results of player 0
playing all players ranked 0 to 10 ranks lower. If player 0 plays himself
the cost is 2.0 and the probability of a win is 50%, resulting in 1 bit of
information and a Bits/Cost of 0.5.
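
The whole table can be reproduced with a short sketch under the stated
assumptions (80 Elo per rank, cost halving with each rank down the list):

import math

def win_prob(elo_diff):
    # Elo expectation for a player elo_diff points weaker than the opponent
    return 1.0 / (1.0 + 10.0 ** (elo_diff / 400.0))

def game_bits(p):
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for gap in range(11):
    cost = 1.0 + 0.5 ** gap  # player 0 costs 1.0, player `gap` costs 2^-gap
    p = win_prob(80 * gap)
    bits = game_bits(p)
    print(f"{gap:4d}  {cost:6.4f}  {p:6.4f}  {bits:6.4f}  {bits / cost:6.4f}")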

 

From the table, to maximize the information gained per unit of time, the
best games are played between players that are 2 ranks apart. That has to be
balanced against having a wide variety of games to get robust rankings.

 

I believe the current method is to pick 3 random opponents and choose the
one closest to the player. If there are N players, then the top player
should expect his average opponent to be about N / (3 + 1) ranks down the
list. Since there are 34 players, the top player will play the player about
8.25 ranks down on average. From the table, this would yield about 0.1654
bits per game. The actual rate is higher than this, because a range of
opponents is played, some closer than 8.25 ranks. Also notice that the
larger the pool of players, the further away in rank the top player's
average opponent will be.
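
This is easy to sanity-check by simulation. A quick Monte Carlo sketch of
"closest of 3 uniform random opponents" for the top player (the exact
discrete average comes out near 8.5, in line with the continuous N / (3 + 1)
approximation of 8.25):

import random

trials = 100_000
# Opponents are the 33 players below the top player; take the closest of 3.
total = sum(min(random.sample(range(1, 34), 3)) for _ in range(trials))
print(total / trials)  # ~8.5 ranks down on average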

 

I propose that, instead of the current method, players play an opponent R
ranks away, where R follows a normal distribution with standard deviation 3,
rounded to the nearest nonzero integer. If there is no player at the
suggested rank, generate another rank and try again. For example, no player
ranks above the top player, so any offset pointing up the list is discarded
and new offsets are generated until one pointing down the list appears. This
method is not affected by the size of the player pool, and it also results
in the average opponent being much closer than 8.25 ranks away.
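
A minimal sketch of this pairing rule; the rounding rule and the names are
my own assumptions, and here a positive offset points down the list toward
weaker players:

import random

def pick_opponent(rank, n_players, sigma=3.0):
    """Sample a normally distributed rank offset, resampling until it
    lands on a real opponent (not the player himself, not off the list)."""
    while True:
        r = round(random.gauss(0.0, sigma))
        opponent = rank + r
        if r != 0 and 0 <= opponent < n_players:
            return opponent

print(pick_opponent(0, 34))   # top player: offsets up the list are resampled
print(pick_opponent(17, 34))  # mid-list player: opponents cluster nearby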

 

This should result in more information being gained from each game, while
still preserving the property that any player has a chance of playing any
other player, although once two players are more than about 2.5 standard
deviations apart (2.5 * 3 = 7.5 ranks) the chance drops below roughly 1%.

 

The second suggestion is that when some information is known about new
players being introduced into the ladder, they could be "seeded" at an
appropriate rank instead of having to play their way up the ladder. For
example, if we assume Mogo_16 is about 80 Elo points stronger than Mogo_15,
then we could pretend that Mogo_16 has 2 wins and 1 loss against Mogo_15
before any actual games are played. (Strictly, a 2-1 record corresponds to
about 120 Elo, but with only three virtual games the prior is weak and real
results quickly dominate.) This would seed Mogo_16 into about the right
rank, and it would immediately start playing high-information games instead
of games against opponents that are too weak to yield much information.
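
A sketch of how the seeding might look if the rating calculation consumes a
list of game results; the record format and names here are my own
assumptions, not the study's actual data format:

results = []  # (winner, loser) pairs fed to the rating calculation

def seed_player(new_player, anchor, wins=2, losses=1):
    """Credit new_player with virtual results against a known neighbor."""
    results.extend([(new_player, anchor)] * wins)
    results.extend([(anchor, new_player)] * losses)

# Give Mogo_16 a 2-1 virtual record against Mogo_15 before any real games.
seed_player("Mogo_16", "Mogo_15")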

 

Chuck Paulson

www.PuffinwareLLC.com

 
