HBase schema design and value filtering

Raghava Mutharaju Sun, 16 May 2010 17:44:08 -0700

Hi all,

    Consider a set S(X) = {a, b, c, d, e, f, ...}. I compute the elements of
the set over multiple MR job iterations, i.e. several MR jobs are run one
after another. Each iteration computes a subset of the elements, so the set
is built up incrementally. I am using HBase to store the result. My design
for this scenario is as follows.


Schema Design:

   - S(X) is the row key.
   - Each element is a column in the column family. The column label is the
   iteration number followed by the position of the element within that
   iteration's subset.
   E.g.: if iteration 1 computes the subset {a, b}, the row would be
   S(X) = {contains: {{1.1: a}, {1.2: b}}}. Here, contains is the name of the
   column family.
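
A minimal sketch of the column labelling described above, in plain Python
rather than the HBase client. The dict stands in for the contains column
family of one row; the function names and the zero-padding option are
assumptions added for illustration. HBase returns the qualifiers of a row in
byte-lexicographic order, so unpadded labels place iteration 10 before
iteration 2, while fixed-width labels keep iteration order.

```python
def qualifier(iteration, position, width=0):
    """Build an 'iteration.position' label; width > 0 zero-pads both parts."""
    label = f"{str(iteration).zfill(width)}.{str(position).zfill(width)}"
    return label.encode("ascii")


def add_iteration(row, iteration, subset, width=0):
    """Store each element of one iteration's subset under its own label."""
    for position, element in enumerate(subset, start=1):
        row[qualifier(iteration, position, width)] = element


# Hypothetical data: iteration 1 yields {a, b}, iteration 2 {c}, iteration 10 {d}.
unpadded, padded = {}, {}
for iteration, subset in [(1, ["a", "b"]), (2, ["c"]), (10, ["d"])]:
    add_iteration(unpadded, iteration, subset)
    add_iteration(padded, iteration, subset, width=3)

# sorted() on bytes mimics the byte-wise ordering of qualifiers in a row.
unpadded_order = sorted(unpadded)
padded_order = sorted(padded)
```

Whether padding is needed depends on whether the read side relies on
iteration order; if columns are only ever read back as an unordered set, the
unpadded labels work as well.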

I can add the results of subsequent iterations (other subsets) to S(X) by
adding more columns.
Would this design be appropriate for the above scenario?

There would be many sets S(X), where X can be X1, X2, X3, ..., and each set
S(X) can have many elements.

Filtering:

To retrieve all the sets S(X), a range fetch should be performed. I
wouldn't know the start key and end key because the number of S(X) sets is
not known beforehand. Can I use PrefixFilter for this, by setting the prefix
to 'S'?
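
To illustrate the question, here is a sketch in plain Python that mimics a
prefix match over sorted row keys. It does not call HBase. The row keys
R(Y1) and T(Z1) and both function names are hypothetical. It shows that a
prefix fixes both ends of the range without knowing how many S(X) rows
exist: the start key is the prefix itself, and the stop key is the prefix
with its last byte incremented.

```python
from bisect import bisect_left


def prefix_stop_key(prefix):
    """Smallest key that sorts after every key starting with prefix.

    Returns b"" when no such key exists (empty or all-0xFF prefix).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_scan(sorted_keys, prefix):
    """Return the keys in a sorted list that start with prefix."""
    start = bisect_left(sorted_keys, prefix)
    stop_key = prefix_stop_key(prefix)
    stop = bisect_left(sorted_keys, stop_key) if stop_key else len(sorted_keys)
    return sorted_keys[start:stop]


# Row keys are kept sorted byte-wise, as in an HBase table.
row_keys = sorted([b"T(Z1)", b"S(X2)", b"S(X10)", b"R(Y1)", b"S(X1)"])
matches = prefix_scan(row_keys, b"S")
```

Here matches holds only the three S(...) keys, in byte order. The same
start/stop pair could presumably be given to a scan as its range, with or
without a PrefixFilter on top.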

Thank you in advance.

Regards,
Raghava.
