Hi Massimiliano,

I think your design is very thorough. I wouldn't worry about the cardinality of such indexes but about the size of each one (how many keys will a single 2i query return?). In that respect, think of 2i as yet another database (LevelDB). You should test it with many keys and check its performance, but from my perspective I see nothing wrong with it.

If that isn't good enough and what you need is a graph-oriented search, an alternative could be Neo4j, though it is not as highly available and scalable as Riak (in theory). And yes, you are right: for the many-to-many combinations you are talking about, you will have to avoid MapReduce over the whole bucket and use a well-designed 2i, or run a MapReduce that takes a 2i result as its input (as long as the list of keys per 2i index isn't too long).

You might have to do a couple of 2i queries to get your many-to-many list, then fetch the objects key by key, either single-threaded or multi-threaded.
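
To make that concrete, here is a rough Python sketch of the "couple of 2i queries, then fetch key by key" flow. It does not talk to Riak at all: `FakeBucket`, `add_relation` and `fetch_all` are made-up names, and the bucket is just an in-memory stand-in that mimics the "x_i-index" / "y_j-index" layout from the message below. Treat it as an illustration of the access pattern under those assumptions, not as client code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class FakeBucket:
    """In-memory stand-in for the "relation X-Y" bucket (NOT the Riak client API)."""

    def __init__(self):
        self.objects = {}                 # object key -> value
        self.indexes = defaultdict(dict)  # index name -> {index value: object key}

    def put(self, key, value, indexes):
        # Store the object and register it under every (index name, value) pair.
        self.objects[key] = value
        for name, index_value in indexes:
            self.indexes[name][index_value] = key

    def keys_in_index(self, name):
        # Stand-in for a 2i query: all object keys registered under one index.
        return sorted(self.indexes[name].values())

    def get(self, key):
        # Stand-in for a single key fetch.
        return key, self.objects[key]


def add_relation(bucket, x, y):
    # One object per relation, tagged with one index per side of the relation.
    bucket.put(f"{x}-{y}", "True", [(f"{x}-index", y), (f"{y}-index", x)])


def fetch_all(bucket, keys, workers=4):
    # Fetch key by key on a small thread pool; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(bucket.get, keys))


bucket = FakeBucket()
for x, y in [("x_0", "y_0"), ("x_0", "y_1"), ("x_1", "y_1")]:
    add_relation(bucket, x, y)

keys_for_x0 = bucket.keys_in_index("x_0-index")  # relations starting at x_0
keys_for_y1 = bucket.keys_in_index("y_1-index")  # relations ending at y_1
relations_of_x0 = fetch_all(bucket, keys_for_x0)
```

Setting `workers=1` gives you the single-threaded variant. What matters for performance is the length of each key list, not the number of distinct indexes.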

Hope that helps,

Guido.

On 04/08/13 19:43, Massimiliano Ciancio wrote:
Hi all,
I'm new to Riak and I'm evaluating its use in a project in which I have
to map some many-to-many relationships with millions of links.
That is, imagine you have objects of type X={x_0,x_1,...,x_n} and
objects of type Y={y_0,y_1,...,y_m} that are related. Xs are connected
to millions of Ys and vice versa.
If the numbers weren't so high, I could put the relation inside the
objects X as a list of related Ys and vice versa.
But with these numbers I'd have a very big data field associated with each object.
I'm thinking that I can create a bucket "relation X-Y" in which I can
put objects with key "x_i-y_j" and a value of, e.g., "True", meaning that
x_i is related to y_j. This way it will be simple to know whether x_i is
related to y_j or vice versa: if the object with key "x_i-y_j" exists,
then they are related. OK. But now, how can I extract all the Ys
related to x_i or, vice versa, the Xs related to y_j?
Using map-reduce or similar would imply analyzing all the objects in the cluster.
I can use a secondary index on the objects of the bucket "relation
X-Y". But I imagine it could give performance problems because of the
hundreds of millions of items in it.
So I thought of the following use of 2i: when I write a new relation
"x_i-y_j" I can add it to two secondary indexes, one named "x_i-index"
with value y_j and the other "y_j-index" with value x_i. In this way
I won't have one big index but many (millions!) of smaller indexes.
Suppose I want all the Ys related to x_i: I can ask Riak for the
elements in the index "x_i-index". I expect this to be a not so
heavy query...
Now my questions are: is this approach viable? How would millions of
"small" indexes impact Riak's performance? Are there best
practices for doing this?
Thanks in advance
Massimiliano

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

