I've been thinking more about a similar sort of problem.

The major difference between normal relational databases and big hashtables
is that in the former you can sort and retrieve on any column. In big
hashtables (at least in Cassandra), you have only one field to sort on,
and the sort type is predetermined.

From a theoretical perspective, your traditional DBMS typically allows you
to create arbitrary indexes in order to speed up access. I'm thinking the
same idea can be applied to something like this.

So I imagine that for different kinds of entities, you can have a
separate supercolumn family that basically serves as an index table. From
what I've heard, this approach is somewhat recommended.
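
A minimal sketch of that index-table idea, using plain Python dicts as
stand-ins for column families. This is not Cassandra client code, and the
names posts, posts_by_author, add_post and posts_for are made up for
illustration:

```python
from collections import defaultdict

# Primary "table": post key -> post fields (stand-in for a column family).
posts = {}
# Index "table": author -> {timestamp: post key} (stand-in for an index family).
posts_by_author = defaultdict(dict)


def add_post(key, author, timestamp, text):
    """Write the post and its index entry together."""
    posts[key] = {"author": author, "timestamp": timestamp, "text": text}
    posts_by_author[author][timestamp] = key


def posts_for(author):
    """Return the author's post keys, oldest first, by reading only the index."""
    index_row = posts_by_author.get(author, {})
    return [index_row[ts] for ts in sorted(index_row)]


add_post("p1", "bill", 1268300000, "first")
add_post("p2", "peter", 1268300500, "hello")
add_post("p3", "bill", 1268300900, "second")
```

The cost is that every write has to update both structures, which is the
same trade-off a DBMS makes when it maintains an index for you.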

More broadly, you can also use tables that serve as metadata. For example,
you could store the keys of all posts bucketed by some time period (e.g.
month). Expiring old posts would then mean reading the old buckets rather
than looping over every user.
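
Roughly, the bucketing could look like the sketch below. It again uses
in-memory dicts rather than a real Cassandra client, and the names
keys_by_month, bucket_for, add_post and drop_bucket are hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Primary "table": post key -> post fields.
posts = {}
# Metadata "table": "YYYY-MM" bucket -> set of post keys written in that month.
keys_by_month = defaultdict(set)


def bucket_for(timestamp):
    """Map a Unix timestamp to its month bucket name (UTC)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")


def add_post(key, timestamp, text):
    """Store the post and record its key in the matching month bucket."""
    posts[key] = {"timestamp": timestamp, "text": text}
    keys_by_month[bucket_for(timestamp)].add(key)


def drop_bucket(bucket):
    """Delete every post recorded in one bucket; return how many were removed."""
    keys = keys_by_month.pop(bucket, set())
    for key in keys:
        posts.pop(key, None)
    return len(keys)


add_post("p1", 1265000000, "old")  # falls in February 2010 (UTC)
add_post("p2", 1268300000, "new")  # falls in March 2010 (UTC)
```

Calling drop_bucket("2010-02") would remove only the February post. Note
that whole-month buckets only approximate a rolling one-month window; a
finer bucket (day or week) would get closer at the cost of more rows.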

Peter


On Thu, Mar 11, 2010 at 7:34 PM, Bill Au <bill.w...@gmail.com> wrote:

> Let's take Twitter as an example.  All the tweets are timestamped.  I want to
> keep only a month's worth of tweets for each user.  The number of tweets
> that fit within this one month window varies from user to user.  What is the
> best way to accomplish this?  There are millions of users.  Do I need to
> loop through all of them and handle the delete one user at a time?  Or is
> there a better way to do this?  If a user has not posted a new tweet in more
> than a month, I also want to remove the user itself.  Do I also need to
> loop through all the users one at a time?
>
> Bill
>
