Hello Riak enthusiasts,

I am trying to design a solution for storing time series data coming from a very large number of potentially high-frequency sources.
I thought Riak could be of help, though based on what I have read about it, I can't use it without some other layer on top of it. The problem is that I need to be able to do range queries over this data, by source. That is, I want to be able to say "give me the first N data points for source S between time T1 and time T2."

I need to store this data for a rather long time, and the expected volume will grow beyond what a "vanilla" RDBMS can support. Another thing to note is that I can restrict the number of data points returned by a query, so no query would return more than MaxN data points.

I thought about doing this the following way:

1. Bundle the time series data in batches of MaxN, to ensure that any query would require reading at most two batches. The batches would be stored inside Riak.
2. Store the start time, end time, size and Riak batch ID in a MySQL (or PostgreSQL) DB.

My thinking is that such a strategy would allow me to persist the data in Riak and scale linearly with it, while the index would be kept in an RDBMS for fast range queries.

Does it sound sensible to use Riak this way? Does this make you laugh/cry/shake your head in disbelief? Am I overlooking something in Riak which would make all this much better?

Thanks and best regards,
Paul
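
P.S. To make the idea concrete, here is a rough Python sketch of the two-part layout. It is only an illustration under several assumptions: a plain dict stands in for the Riak bucket, an in-memory SQLite table stands in for the MySQL/PostgreSQL index, and the names (BatchStore, batch_index, the "source:start" key scheme, MAX_N = 4) are all made up for the example. It also assumes that, per source, batches do not overlap in time and that every batch except the newest holds exactly MAX_N points, which is what makes reading two batches sufficient.

```python
import json
import sqlite3

MAX_N = 4  # hypothetical cap on points per batch and per query (MaxN)


class BatchStore:
    """Toy model: batch blobs in a key/value store, time index in SQL."""

    def __init__(self):
        self.blobs = {}  # stand-in for a Riak bucket: batch_id -> JSON blob
        self.db = sqlite3.connect(":memory:")  # stand-in for MySQL/PostgreSQL
        self.db.execute(
            "CREATE TABLE batch_index ("
            " source TEXT, start_ts INTEGER, end_ts INTEGER,"
            " size INTEGER, batch_id TEXT PRIMARY KEY)"
        )
        self.db.execute(
            "CREATE INDEX idx_source_time ON batch_index (source, start_ts)"
        )

    def put_batch(self, source, points):
        """Store one batch of (timestamp, value) pairs, sorted by timestamp."""
        assert 0 < len(points) <= MAX_N
        start_ts, end_ts = points[0][0], points[-1][0]
        batch_id = f"{source}:{start_ts}"  # made-up key scheme
        self.blobs[batch_id] = json.dumps(points)
        self.db.execute(
            "INSERT INTO batch_index VALUES (?, ?, ?, ?, ?)",
            (source, start_ts, end_ts, len(points), batch_id),
        )

    def query(self, source, t1, t2, n):
        """Return the first n points of `source` with t1 <= timestamp <= t2."""
        n = min(n, MAX_N)
        # The index lookup finds the first two batches overlapping [t1, t2].
        rows = self.db.execute(
            "SELECT batch_id FROM batch_index"
            " WHERE source = ? AND end_ts >= ? AND start_ts <= ?"
            " ORDER BY start_ts LIMIT 2",
            (source, t1, t2),
        ).fetchall()
        result = []
        for (batch_id,) in rows:
            # Each blob fetch corresponds to one key/value read.
            for ts, value in json.loads(self.blobs[batch_id]):
                if t1 <= ts <= t2:
                    result.append((ts, value))
        return result[:n]
```

With ten points at timestamps 0 to 9 stored as batches of four, a call such as store.query("s1", 2, 9, 4) touches only the first two batches and returns the points at timestamps 2 through 5. The index table stays small (one row per batch rather than per data point), which is the part I am hoping an RDBMS can handle comfortably.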
_______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
