I have been following the scaling study closely. Thanks to everyone for gathering such interesting data, and especially to Don and the people donating computers.
I have two suggestions that could help increase the amount of information gathered from all the CPU hours being spent playing these games. The first suggestion is to change the algorithm used to pair players together, and the second is to "seed" new players into an appropriate starting position.

Suppose we have two players A and B who each have about a 50% chance of winning when they play each other. We expect to gain about -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 bit of information from every game they play. On the other hand, if player A has a 95% chance of beating player B, we get only about -(0.95*log2(0.95) + 0.05*log2(0.05)) = 0.29 bits of information per game. Therefore, the closer to evenly matched the two players are, the more information is gained per game.

Now suppose we have a list of players ranked 0 to N, where the lower the rank, the stronger the player. Assume that player n takes twice as long to play a game as player n+1 for any n, that the Elo difference between player n and player n+1 is 80 points, and that the cost of player 0 playing a game is 1.0 time units. These assumptions seem reasonable for the upper levels of the study if we only consider the Mogo programs up to Mogo_16 (the ones that have about 500 games). Using these assumptions we can build the following table.

Rank    Cost  WinProb    Bits  Bits/Cost
   0  2.0000   0.5000  1.0000     0.5000
   1  1.5000   0.3869  0.9627     0.6418
   2  1.2500   0.2847  0.8618     0.6895
   3  1.1250   0.2008  0.7234     0.6431
   4  1.0625   0.1368  0.5758     0.5419
   5  1.0313   0.0909  0.4395     0.4262
   6  1.0156   0.0594  0.3249     0.3199
   7  1.0078   0.0383  0.2344     0.2326
   8  1.0039   0.0245  0.1660     0.1654
   9  1.0020   0.0156  0.1160     0.1157
  10  1.0010   0.0099  0.0801     0.0801

The table shows the results of player 0 playing opponents ranked 0 to 10 ranks lower. For example, if player 0 plays player 1, the total cost is 1.5, because player 0 takes 1 unit of time and player 1 takes 0.5 (half of player 0's time), for a total of 1.5 units of time.
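The table above can be regenerated with a short script. This is just a sketch under the stated assumptions (80 Elo points per rank, cost halving with each rank, and binary-entropy information per game); the function names are my own.

```python
import math

def win_prob(elo_diff):
    """Probability that the weaker player wins, given its Elo deficit."""
    return 1.0 / (1.0 + 10.0 ** (elo_diff / 400.0))

def bits(p):
    """Expected information (binary entropy) of a game won with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

print("Rank    Cost  WinProb    Bits  Bits/Cost")
for r in range(11):
    cost = 1.0 + 0.5 ** r   # player 0 costs 1.0, player r costs 2^-r
    p = win_prob(80 * r)    # 80 Elo points per rank of separation
    b = bits(p)
    print(f"{r:4d}  {cost:6.4f}   {p:6.4f}  {b:6.4f}     {b / cost:6.4f}")
```

Running it reproduces the table, with the Bits/Cost column peaking at a rank difference of 2.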
The Elo difference is 80, so using the formula P = 1 / (1 + 10^(80/400)) = 0.3869 we see that player 1 has about a 39% chance of winning. The bits of information we expect to gain are -(0.3869*log2(0.3869) + (1 - 0.3869)*log2(1 - 0.3869)) = 0.9627, and the Bits/Cost is 0.9627/1.5 = 0.6418. If player 0 plays himself, the cost is 2.0 and the probability of a win is 50%, yielding 1 bit of information and a Bits/Cost of 0.5.

From the table, to maximize the information gained per unit of time, the best games are played between players that are 2 ranks apart. That has to be balanced against playing a wide variety of opponents to get robust rankings. I believe the current method is to pick 3 random opponents and choose the one closest to the player in rank. If there are N players, the top player should then expect his average opponent to be about N / (3 + 1) ranks down the list. Since there are 34 players, the top player will play the player 8.25 ranks down on average, which from the table yields only about 0.1654 bits per game. The actual rate is somewhat higher than this because a range of opponents is played, some closer than 8.25 ranks. Also notice that the larger the pool of players, the further away in rank the top player's average opponent will be.

I propose instead that each player play an opponent R ranks away, where R is drawn from a normal distribution with standard deviation 3. If there is no player at the suggested rank, draw another R and try again. For example, the top player has no player ranked above him, so any R pointing off the top of the list is discarded and new values are drawn until an R pointing down the list is generated. This method is not affected by the size of the player pool, and it results in the average opponent being much closer than 8.25 ranks away.
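The proposed pairing rule could be sketched as follows. The function name, the rounding of R to an integer offset, and the rejection details are my own illustration of the idea, not the study's actual code:

```python
import random

def pick_opponent(my_rank, n_players, sigma=3.0, rng=random):
    """Sample a rank offset R ~ Normal(0, sigma), rounded to the nearest
    integer; redraw until it names a real player other than ourselves."""
    while True:
        r = round(rng.gauss(0.0, sigma))
        opp = my_rank + r
        if r != 0 and 0 <= opp < n_players:
            return opp
```

For the top player (rank 0), every offset pointing above the list is rejected, so all of its opponents come from below, typically within a few ranks.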
This should result in more information being gained from each game while still preserving the property that any player has a chance of playing any other player, although beyond about 2.5 standard deviations (3 * 2.5 = 7.5 ranks) the chance drops below 1%.

The second suggestion is that when something is already known about a new player being introduced into the ladder, it could be "seeded" into an appropriate rank instead of having to play its way up the ladder. For example, if we assume Mogo_16 is about 80 Elo points stronger than Mogo_15, we could pretend that Mogo_16 has 2 wins and 1 loss against Mogo_15 before any actual games are played. This would seed Mogo_16 into about the right rank, and it would immediately start playing high-information games instead of games against opponents too weak to yield much information.

Chuck Paulson
www.PuffinwareLLC.com
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/