Matthew,

You're pretty close with your schema. The main thing you're missing is that, to support the different types of queries you want, you'll end up denormalizing your data: storing it more than once, laid out so that each kind of query is efficient. For the queries you mention, you'd want at least two tables. "achievements" stores/groups everything by the achievement (achievement-centered queries). "players" stores/groups everything by the player (user-centered queries).
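To make the denormalization concrete, here is a rough sketch in plain Python rather than real HBase client code. The two dicts stand in for the two tables, and record_achievement, the player names, and the achievement names are all made up for illustration. The point is only that one event gets written twice, once per access pattern.

```python
from collections import defaultdict

# Plain-Python stand-ins for two HBase tables: row key -> {column -> value}.
# These dicts are illustrative only; they are not HBase client calls.
achievements = defaultdict(dict)  # row = achievement id, columns keyed by player
players = defaultdict(dict)       # row = player id, columns keyed by achievement


def record_achievement(player_id, achievement_id, epoch_seconds):
    """Write one event twice, once per access pattern (denormalization)."""
    # Column names are (timestamp, id) tuples here purely for simplicity.
    achievements[achievement_id][(epoch_seconds, player_id)] = b""
    players[player_id][(epoch_seconds, achievement_id)] = b""


record_achievement("alice", "first_kill", 1250000000)
record_achievement("bob", "first_kill", 1250000100)
record_achievement("alice", "speed_run", 1250000200)

# Achievement-centered read: who has "first_kill"?
holders = sorted(p for _, p in achievements["first_kill"])
# User-centered read: what does alice have?
alice_has = sorted(a for _, a in players["alice"])
print(holders, alice_has)
```

Each read touches a single row in a single table, which is what the extra write buys you.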
> - What players have a given achievement?
> - Who are the first 25 people to have a given achievement?

Table(achievements)
Row(achievementID)
Family(players)
Columns(epochtimestamp+playerid)
Value(could be unused, or store other data about this player-achievement)

To get the first 25, you'd just take the first 25 columns returned. HBase 0.20 should have some good limit/offset-type filters to do that as efficiently as possible.

Prepending an epoch timestamp (encoded with HBase's Bytes.toBytes(long), not stored as ASCII) sorts the achievement entries in the row/family by timestamp, so they are time ordered and you can easily grab the first 25 sequentially.

> - What are all the possible achievements?

Table(achievements)
Row(achievementID)
Family(content)

Each column name could be a key, and the cell value its value. This gives you a key/val dictionary for the given achievementID.

> - What achievements does a given player have?

Table(players)
Row(playerid)
Family(achievements)
Columns(epochtimestamp+achievementid)
Value(same as above)

> - What achievements does a given player NOT have?

You could have a Family(notachievements) in the players table, though that's a bit extreme :) Otherwise you'd basically cache the list of all achievement IDs and subtract from it the result of the query above.

Hope that helps.

Jonathan Gray
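
P.S. A minimal sketch of the timestamp-prefix idea, again in plain Python rather than HBase client code. It assumes non-negative epoch timestamps and uses struct.pack(">q", ...) to produce an 8-byte big-endian long, the same layout Bytes.toBytes(long) gives you. column_name, first_n, and the sample ids are hypothetical names for illustration, and sorting a Python set only mimics the byte ordering HBase applies to column names.

```python
import struct


def column_name(epoch_seconds, member_id):
    """8-byte big-endian timestamp followed by the id, as raw bytes."""
    return struct.pack(">q", epoch_seconds) + member_id.encode("utf-8")


def first_n(columns, n):
    """Return the ids of the n earliest columns, in time order."""
    # bytes compare lexicographically, which is how column names sort too
    ordered = sorted(columns)[:n]
    return [name[8:].decode("utf-8") for name in ordered]


# Columns of one achievement row, inserted out of order on purpose.
row = {
    column_name(1250000300, "carol"),
    column_name(1250000100, "alice"),
    column_name(1250000200, "bob"),
}
earliest = first_n(row, 2)

# Why not ASCII: as text, "9" sorts after "10", so time order would break.
ascii_order = sorted([b"9", b"10"])

# "NOT have" as a set subtraction against a cached list of all ids.
all_achievements = {"first_kill", "speed_run", "pacifist"}
player_row = {
    column_name(1250000000, "first_kill"),
    column_name(1250000200, "speed_run"),
}
missing = all_achievements - set(first_n(player_row, len(player_row)))
print(earliest, ascii_order, missing)
```

The fixed 8-byte width is what makes the prefix safe to slice off again; a variable-width encoding would need a separator.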
