Chris,

I would agree with Alexander about doing more work up front.

We had a lot of data in SQL that we moved to Riak, and our initial instinct, like yours, was to keep the data normalised and use M/R to work out aggregate values.

After some time experimenting, I believe the better solution, which we now use, is to play to Riak's strengths and treat it more like an infinite-size store with fast lookup on keys. This means denormalising your data and maybe storing the same piece of information in several different ways to match your later access patterns.
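
To make that concrete, here is a rough sketch using the Python client. The bucket names and the meter_id/day key scheme are just illustrative, not what we actually run, but the pattern is: one write per access path, so every later read is a single key lookup.

    import riak

    client = riak.RiakClient()

    def store_reading(meter_id, ts, kwh):
        # Write the same reading once per access pattern.
        reading = {'meter': meter_id, 'ts': ts.isoformat(), 'kwh': kwh}
        day = ts.strftime('%Y-%m-%d')

        # Pattern 1: all of a meter's readings for a given day.
        by_day = client.bucket('readings_by_meter_day')
        obj = by_day.get('%s/%s' % (meter_id, day))
        series = obj.data or []          # data is None if the key is new
        series.append(reading)
        obj.data = series
        obj.store()

        # Pattern 2: the latest reading per meter, overwritten in place.
        client.bucket('latest_reading').new(meter_id, data=reading).store()

Disk is cheap compared with the cost of computing these views on demand later.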

This is antithetical to the SQL view of the world, but it allows us to scale much better. In our application of smart meter data, we keep SQL around for all the low volume data that we like to query in lots of different ways, and use Riak for the high volume but slowly changing stuff. As the latter arrives, we store it in several data structures, pre-computing most of the calculations that were previously done on the fly in SQL.
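
As a sketch of the "pre-compute on arrival" step (again with made-up bucket names, and assuming a single writer per key; concurrent writers would need sibling resolution, or counters on newer Riak):

    import riak

    client = riak.RiakClient()

    def ingest(meter_id, ts, kwh):
        # Fold each reading into a running monthly total as it arrives,
        # so reads never need a MapReduce pass.
        totals = client.bucket('monthly_totals')
        obj = totals.get('%s/%s' % (meter_id, ts.strftime('%Y-%m')))
        agg = obj.data or {'kwh': 0.0, 'count': 0}
        agg['kwh'] += kwh
        agg['count'] += 1
        obj.data = agg
        obj.store()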

M/R as implemented in Riak has its applications, but it is a poor choice when you're starting with 'all the data'. You can help it along by preparing your data to be M/R friendly.
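
For instance (a sketch, not our production code): if each day's readings live under a predictable key, you can hand M/R a short, explicit key list instead of letting it walk the whole bucket, which is where it falls over.

    import riak

    client = riak.RiakClient()

    # Count three days of readings for one meter: M/R over an
    # enumerable handful of keys, never over the full bucket.
    mr = riak.RiakMapReduce(client)
    for day in ('2013-04-01', '2013-04-02', '2013-04-03'):
        mr.add('readings_by_meter_day', 'meter42/%s' % day)
    mr.map('function(v) { return [JSON.parse(v.values[0].data).length]; }')
    mr.reduce('Riak.reduceSum')
    print(mr.run())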

Paul



On 15/04/13 05:47, Alexander Sicular wrote:
> ... by date via a secondary index query or via riak search. Oh, and
> precompute everything. Pick whichever time slice has less keys than the
> number of keys that make your queries go boom. If a month is too big do
> a week or even a day. Persist all computation in materialized keys like ...
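
To spell out Alexander's by-date secondary index suggestion, something like the following (the index name 'day_bin' is made up, and 2i needs a backend that supports it, e.g. eLevelDB):

    import riak

    client = riak.RiakClient()
    bucket = client.bucket('readings_by_meter_day')

    # Tag each object with its day at write time...
    obj = bucket.new('meter42/2013-04-15', data=[])
    obj.add_index('day_bin', '2013-04-15')
    obj.store()

    # ...then pull back one bounded slice (a week here) by index range,
    # rather than scanning or mapping over everything.
    keys = bucket.get_index('day_bin', '2013-04-08', '2013-04-15')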

