Hi all,
Let S(X) be a set, S(X) = {a, b, c, d, e, f, ...}. I compute the elements of
the set over multiple MapReduce (MR) iterations, i.e., several MR jobs run
one after another. Each iteration computes a subset of the elements, so the
set is built up incrementally. I am using HBase to store the result. My
design for this scenario is as follows:
Schema Design:
- S(X) is the row key.
- Each element is stored as a column in a column family. The column
qualifier is the iteration number, followed by a number giving the
element's position within that iteration's subset.
E.g., if iteration 1 computes the subset {a, b}, the row would be
S(X) = {contains: {{1.1: a}, {1.2: b}}}, where contains is the name of the
column family.
I can add the results of subsequent iterations (other subsets) to S(X) by
adding more columns.
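A minimal in-memory sketch of this layout, assuming string row keys like "S(X1)" and qualifiers of the form "<iteration>.<position>". It uses Java TreeMaps as a stand-in for an HBase table with the single family "contains"; the class and method names (SetTable, addSubset, elements, columns) are hypothetical and are not HBase APIs. With the real client, each addSubset call would correspond roughly to one Put on row S(X) carrying one column per element; existing columns of the row are left in place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an HBase table with one column
// family ("contains"): row key -> (column qualifier -> cell value).
// TreeMap keeps rows and qualifiers sorted, loosely mirroring HBase's
// lexicographic ordering (String order here, not raw byte order).
class SetTable {
    static final String FAMILY = "contains";

    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Store the subset computed in one MR iteration as new columns of the row.
    // Qualifier = "<iteration>.<position>", e.g. "1.1", "1.2".
    void addSubset(String rowKey, int iteration, List<String> subset) {
        NavigableMap<String, String> columns =
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>());
        for (int i = 0; i < subset.size(); i++) {
            columns.put(iteration + "." + (i + 1), subset.get(i));
        }
    }

    // All columns of the row (qualifier -> element), in qualifier order.
    NavigableMap<String, String> columns(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // All elements of S(X) accumulated so far, in qualifier order.
    List<String> elements(String rowKey) {
        return new ArrayList<>(columns(rowKey).values());
    }

    public static void main(String[] args) {
        SetTable table = new SetTable();
        table.addSubset("S(X1)", 1, List.of("a", "b")); // iteration 1
        table.addSubset("S(X1)", 2, List.of("c", "d")); // iteration 2 adds columns
        System.out.println(FAMILY + " " + table.columns("S(X1)"));
        // prints: contains {1.1=a, 1.2=b, 2.1=c, 2.2=d}
        System.out.println(table.elements("S(X1)"));
        // prints: [a, b, c, d]
    }
}
```

One design point the sketch makes visible: HBase orders qualifiers as bytes, so once iterations reach double digits "10.1" sorts before "2.1". If iteration order matters when reading a row back, zero-padding the numbers (e.g. "0002.0001") is one possible workaround.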
Would this design be appropriate for the scenario above? Note that there
will be many sets S(X), with X = X1, X2, X3, ..., and each set S(X) will
contain many elements.
Filtering:
To retrieve all the sets S(X), I need to perform a range scan. However, I
don't know the start key and end key, because the number of S(X) sets is
not known beforehand. Can I use a PrefixFilter for this, with the prefix
set to 'S'?
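A small sketch of why no end key needs to be known in advance: because rows are kept sorted by key, every row whose key starts with a given prefix falls into one contiguous range [prefix, stopKey), where stopKey is the prefix with its last character incremented ("S" gives "T"). The code models the table with a Java TreeMap; PrefixScanSketch, stopKeyFor and scanPrefix are hypothetical names, and the row keys are made up for illustration. In HBase itself, I believe recent clients offer Scan.setRowPrefixFilter to set start and stop rows this way; a PrefixFilter used without a start row would still begin scanning at the first row of the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of a prefix scan over sorted row keys.
class PrefixScanSketch {

    // Smallest key greater than every key that starts with prefix.
    // Assumes a non-empty prefix whose last char is not Character.MAX_VALUE.
    static String stopKeyFor(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Row keys in [prefix, stopKey): exactly the keys starting with prefix.
    static List<String> scanPrefix(NavigableMap<String, ?> table, String prefix) {
        return new ArrayList<>(table.subMap(prefix, true, stopKeyFor(prefix), false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : List.of("R(Y1)", "S(X1)", "S(X2)", "S(X10)", "T(Z1)")) {
            table.put(key, "...");
        }
        System.out.println(scanPrefix(table, "S"));
        // prints: [S(X1), S(X10), S(X2)]
    }
}
```

The output also shows that keys sort lexicographically, so S(X10) comes before S(X2). Note too that the prefix 'S' matches every row key beginning with S; if other kinds of rows could share that letter, a longer prefix such as "S(" is more selective.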
Thank you in advance.
Regards,
Raghava.