Matthew,

You're pretty close with your schema.  The main thing you're missing is that to 
support the different kinds of queries you want, you'll end up denormalizing 
your data: storing it more than once, with each copy laid out so that one kind 
of query is efficient.  For the queries you mention, you'd want at least two 
tables.  "achievements" stores/groups everything by achievement 
(achievement-centered queries).  "players" stores/groups everything by player 
(player-centered queries).

> - What players have a given achievement?
> - Who are the first 25 people to have a given achievement?

Table(achievements) Row(achievementID) Family(players) 
Columns(epochtimestamp+playerid) Value(could be unused, or store other data 
about this player-achievement)

To get the first 25, you'd just take the first 25 columns returned.  HBase 0.20 
should have some good limit/offset-type filters to do that as efficiently as 
possible.  Prepending an epoch timestamp to each column name (encoded with 
HBase's Bytes.toBytes(long), not stored as an ASCII string) sorts the entries 
in the row/family by time: HBase orders columns by their raw bytes, and the 
8-byte big-endian encoding of a non-negative long sorts the same way as its 
numeric value.  The entries are then time ordered, so you can grab the first 
25 sequentially.
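A minimal Java sketch of that ordering, with no HBase dependency: a TreeMap compared with Arrays.compareUnsigned stands in for one row's sorted "players" family, and ByteBuffer.putLong produces the same 8-byte big-endian layout as Bytes.toBytes(long).  The class and helper names (FirstPlayers, qualifier, firstN) are hypothetical, and it assumes Java 9+ and non-negative millisecond timestamps.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one achievements row's "players" family.
// HBase keeps qualifiers sorted by unsigned byte order; a TreeMap using
// Arrays.compareUnsigned (Java 9+) mimics that ordering.
class FirstPlayers {
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    static TreeMap<byte[], byte[]> newFamily() {
        return new TreeMap<>(UNSIGNED);
    }

    // 8-byte big-endian timestamp (same layout as Bytes.toBytes(long)) + player id.
    static byte[] qualifier(long epochMillis, String playerId) {
        byte[] id = playerId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + id.length)
                .putLong(epochMillis)
                .put(id)
                .array();
    }

    // Strip the 8-byte timestamp prefix to recover the player id.
    static String playerIdOf(byte[] qualifier) {
        return new String(qualifier, Long.BYTES, qualifier.length - Long.BYTES,
                StandardCharsets.UTF_8);
    }

    // Qualifiers iterate oldest-first, so the first n are the earliest earners.
    static List<String> firstN(TreeMap<byte[], byte[]> family, int n) {
        List<String> result = new ArrayList<>();
        for (byte[] q : family.keySet()) {
            if (result.size() >= n) {
                break;
            }
            result.add(playerIdOf(q));
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<byte[], byte[]> players = newFamily();
        players.put(qualifier(1_250_000_300_000L, "carol"), new byte[0]);
        players.put(qualifier(1_250_000_100_000L, "alice"), new byte[0]);
        players.put(qualifier(1_250_000_200_000L, "bob"), new byte[0]);
        System.out.println(firstN(players, 25)); // [alice, bob, carol]
    }
}
```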


> - What are all the possible achievements?

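Table(achievements) Row(achievementID) Family(content)  Each column name is an 
attribute key (such as a name or description) and the cell value is that 
attribute's value.  This gives you a key/value dictionary for the given 
achievementID, and scanning the achievements table lists every possible 
achievement, one per row.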
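A rough Java model of the content family and the scan that answers this question.  AchievementCatalog and its methods are hypothetical names; a TreeMap keyed by achievement ID stands in for the table, since a scan returns rows in row-key order.  String ordering matches HBase's byte ordering here only because the sample IDs are ASCII (an assumption of the sketch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model: row key (achievementID) -> "content" family,
// where each qualifier is an attribute name and each value its value.
class AchievementCatalog {
    private final TreeMap<String, Map<String, String>> contentByRow = new TreeMap<>();

    void put(String achievementId, String attribute, String value) {
        contentByRow.computeIfAbsent(achievementId, id -> new TreeMap<>())
                .put(attribute, value);
    }

    // "All possible achievements" = every row key, visited in sorted order.
    List<String> allAchievementIds() {
        return new ArrayList<>(contentByRow.keySet());
    }

    // The key/value dictionary for one achievement (empty if unknown).
    Map<String, String> describe(String achievementId) {
        return contentByRow.getOrDefault(achievementId, Map.of());
    }

    public static void main(String[] args) {
        AchievementCatalog catalog = new AchievementCatalog();
        catalog.put("first-blood", "name", "First Blood");
        catalog.put("first-blood", "points", "10");
        catalog.put("explorer", "name", "Explorer");
        System.out.println(catalog.allAchievementIds());     // [explorer, first-blood]
        System.out.println(catalog.describe("first-blood")); // {name=First Blood, points=10}
    }
}
```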


> - What achievements does a given player have?

Table(players) Row(playerid) Family(achievements) 
Columns(epochtimestamp+achievementid) Value(same as above)
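This is the denormalization from the top of the message in code form: a sketch, assuming an in-memory model, of how recording one award writes two mirrored cells, one per table.  AwardWriter, award, stampPlus and the two map fields are hypothetical names, not an HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the two tables: row key -> sorted family.
// achievementsPlayers ~ Table(achievements) Family(players)
// playersAchievements ~ Table(players) Family(achievements)
class AwardWriter {
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    final Map<String, TreeMap<byte[], byte[]>> achievementsPlayers = new HashMap<>();
    final Map<String, TreeMap<byte[], byte[]>> playersAchievements = new HashMap<>();

    // 8-byte big-endian timestamp followed by an id (player or achievement).
    static byte[] stampPlus(long epochMillis, String id) {
        byte[] tail = id.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + tail.length)
                .putLong(epochMillis).put(tail).array();
    }

    static String idOf(byte[] qualifier) {
        return new String(qualifier, Long.BYTES, qualifier.length - Long.BYTES,
                StandardCharsets.UTF_8);
    }

    // One award is written twice, once per access pattern.
    void award(String playerId, String achievementId, long epochMillis) {
        byte[] value = new byte[0]; // unused here; could carry extra data
        achievementsPlayers
                .computeIfAbsent(achievementId, k -> new TreeMap<byte[], byte[]>(UNSIGNED))
                .put(stampPlus(epochMillis, playerId), value);
        playersAchievements
                .computeIfAbsent(playerId, k -> new TreeMap<byte[], byte[]>(UNSIGNED))
                .put(stampPlus(epochMillis, achievementId), value);
    }

    List<String> achievementsOf(String playerId) {
        return idsInOrder(playersAchievements.get(playerId));
    }

    List<String> playersWith(String achievementId) {
        return idsInOrder(achievementsPlayers.get(achievementId));
    }

    // Walk a family in qualifier (time) order and strip the timestamps.
    private static List<String> idsInOrder(TreeMap<byte[], byte[]> family) {
        List<String> ids = new ArrayList<>();
        if (family != null) {
            for (byte[] q : family.keySet()) {
                ids.add(idOf(q));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        AwardWriter w = new AwardWriter();
        w.award("alice", "explorer", 200L);
        w.award("alice", "first-blood", 100L);
        w.award("bob", "explorer", 150L);
        System.out.println(w.achievementsOf("alice")); // [first-blood, explorer]
        System.out.println(w.playersWith("explorer")); // [bob, alice]
    }
}
```

In HBase the two writes land in different rows of different tables, and HBase only guarantees atomicity within a single row, so an application would need to tolerate or repair the case where one write succeeds and the other fails.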


> - What achievements does a given player NOT have?

You could have a Family(notachievements) in the players table, though that's a 
bit extreme :)  Otherwise you'd cache the full list of achievement IDs and 
subtract the player's achievements (the previous query) from it.
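The subtraction approach is a plain set difference.  A minimal sketch, assuming the cached achievement ID list and the player's achievement IDs (with the timestamp prefix already stripped from the column names) are available as lists; MissingAchievements.missing is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MissingAchievements {
    // Keep the cached order of all IDs, dropping any the player already has.
    static List<String> missing(List<String> allAchievementIds,
                                List<String> playerAchievementIds) {
        Set<String> owned = new HashSet<>(playerAchievementIds);
        List<String> result = new ArrayList<>();
        for (String id : allAchievementIds) {
            if (!owned.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = List.of("explorer", "first-blood", "marathon");
        List<String> owned = List.of("first-blood");
        System.out.println(missing(all, owned)); // [explorer, marathon]
    }
}
```

Hashing the player's IDs keeps the subtraction linear in the size of the two lists, which matters little for a handful of achievements but is no harder to write.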


Hope that helps.

Jonathan Gray