Protip: It's called a SAN.

---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP
On Aug 10, 2011, at 6:39 AM, Ciprian Dorin Craciun wrote:

> On Wed, Aug 10, 2011 at 16:30, Jeremiah Peschka
> <[email protected]> wrote:
>> Ciprian's response this morning has some interesting information in it - I
>> have suspicions about where the PostgreSQL and MySQL performance problems
>> lie. Unfortunately, my suspicion also highlights why some people don't like
>> using RDBMSes: when you get to higher performance scenarios you frequently
>> need specialized knowledge to tune the RDBMS to run well.
>
> Indeed, you need highly qualified DBAs to tune the RDBMS. And in
> my case I've failed to do so successfully :). (I've asked here and
> there for feedback, but nothing I have done "revived" the insert
> speed.)
>
>> Within an RDBMS, you'll get good performance by splitting out your writes
>> into a partitioned table, where you group writes either by some constant
>> value (modulus of sensor location, for example) or by month. You can then
>> spread your writes out across multiple storage devices, gaining some of the
>> benefits of a system like Riak while still getting the benefit of an
>> RDBMS's sequential scans.
>>
>> ---
>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>> Microsoft SQL Server MVP
>
> But then the solution gets quite complicated, and you'll end up
> constructing a specialized "data store" on top of a cluster of RDBMSes,
> when you could have just started from the beginning on top of some
> lightweight storage engine.
>
> Ciprian.
>
>> On Aug 9, 2011, at 6:38 PM, Paul O wrote:
>>
>>> Jeremiah, with all the feedback indicating Riak not to be such a straight
>>> fit for the problem, even accepting batching compromises, I'll have to
>>> strongly reconsider RDBMS solutions, though they come with their own
>>> tradeoffs, as discussed here today. How about storing data using Innostore
>>> instead of Bitcask - would this be a "best of both worlds" situation?
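To make Jeremiah's write-splitting suggestion concrete: a minimal sketch of modulus-based partition routing, in Python rather than SQL. The partition count and the `sensor_data_pN` table names are purely illustrative, not from the thread.

```python
# Hypothetical sketch: route each sensor's writes to one of N partitions by
# taking the modulus of the sensor id, so writes spread across partitions
# (and thus across storage devices).
NUM_PARTITIONS = 4

def partition_for(sensor_id: int) -> str:
    """Pick the (made-up) partition table name for this sensor's writes."""
    return f"sensor_data_p{sensor_id % NUM_PARTITIONS}"

# Neighboring sensors land on different partitions:
print(partition_for(7))  # sensor_data_p3
print(partition_for(8))  # sensor_data_p0
```

The same idea applies whether the partitioning is done in application code, as here, or declaratively via SQL Server partitioned tables or PostgreSQL partitioning.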
>>>
>>> Thanks for these and the previous valuable comments, again,
>>>
>>> Paul
>>>
>>> On Tue, Aug 9, 2011 at 11:14 AM, Jeremiah Peschka
>>> <[email protected]> wrote:
>>> Excellent points, Alex.
>>>
>>> When you compare Riak's storage overhead to something like an RDBMS, where
>>> you have maybe 24 bytes of row overhead (as is the case for PostgreSQL),
>>> you'll find that there's a tremendous economy of space elsewhere.
>>>
>>> Riak is going to excel in applications where reads and writes are truly
>>> random. Yes, there are MapReduce features, but randomly organized data is
>>> still randomly organized data.
>>>
>>> If you look at RDBMSes, horizontally partitioning your system through
>>> RDBMS features (SQL Server's partitioned tables or PostgreSQL's
>>> partitioned views, for example) gives you the ability to take advantage of
>>> many known quantities in that world - range queries can take advantage of
>>> sequential scan speeds across rotational disks.
>>>
>>> You can even avoid the overhead of a traditional RDBMS altogether and use
>>> the InnoDB APIs or something like HandlerSocket to write data very quickly.
>>>
>>> ---
>>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>>> Microsoft SQL Server MVP
>>>
>>> On Aug 9, 2011, at 7:43 AM, Alexander Sicular wrote:
>>>
>>>> A couple of thoughts:
>>>>
>>>> - disk I/O
>>>> - total keys versus memory
>>>> - data-on-disk overhead
>>>>
>>>> As Jeremiah noted, disk I/O is crucial. Thankfully, Riak's distributed
>>>> mesh gives you access to a number of spindles limited only by your
>>>> budget. I think that is a critical bonus of a distributed system like
>>>> Riak that is often not fully appreciated. Here Riak is a win for you.
>>>>
>>>> Bitcask needs all keys to fit in memory. We are talking something like:
>>>>
>>>>     (key length + overhead) * number of keys * replicas < cluster max
>>>>     available RAM
>>>>
>>>> There is a tool on the wiki which should help figure this out.
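Plugging assumed numbers into that sizing rule makes the constraint tangible. Everything below is an illustrative guess, not a figure from the thread: the key length, key count, and per-key overhead are invented, and `n_replicas` is Riak's default n_val of 3.

```python
# Worked example of the Bitcask key-memory rule:
#   (key length + overhead) * number of keys * replicas < cluster RAM
key_length = 36          # e.g. a UUID-style key (assumed)
per_key_overhead = 40    # rough per-key Bitcask bookkeeping, bytes (assumed)
n_keys = 100_000_000     # total distinct keys (assumed)
n_replicas = 3           # Riak's default n_val

ram_needed = (key_length + per_key_overhead) * n_keys * n_replicas
print(ram_needed)               # 22_800_000_000 bytes
print(ram_needed / 2**30)       # ≈ 21.2 GiB across the whole cluster
```

The point is less the exact overhead constant (check the wiki tool for current numbers) than the shape of the formula: RAM demand grows linearly with key count, which is why batching keys matters so much below.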
>>>> What that basically means for you is that you will have to batch your
>>>> data by some sensor/time-granularity metric. Let's say every minute. At
>>>> 10 Hz that is a 600x reduction in total keys. Of course, this doesn't
>>>> come for free. Your application middleware will have to accommodate it,
>>>> and you could lose up to whatever your time-granularity batch is - i.e.,
>>>> you could lose a minute of sensor data should your application fail.
>>>> Here Riak is neutral to negative.
>>>>
>>>> Riak's data structure is not friendly towards small values. Sensors
>>>> generally spit out integers or other small data tuples. If you search
>>>> the list archives you will find a magnificent data-overhead writeup.
>>>> IIRC, it was something on the order of 450 bytes per object. What that
>>>> basically tells you is that you can't use Bitcask for small values if
>>>> disk space is a concern, as I imagine it to be in this case. Also,
>>>> sensor data is generally write-only, i.e. never deleted or modified, so
>>>> compaction should not be a concern when using Bitcask. Here Riak is a
>>>> strong negative.
>>>>
>>>> Data-retrieval issues aside (which, between Riak Search, secondary
>>>> indexes, and third-party indexes, should not be a major concern), I am
>>>> of the opinion that Riak is not a good fit for high-frequency sensor
>>>> data applications.
>>>>
>>>> Cheers,
>>>> Alexander
>>>>
>>>> Sent from my rotary phone.
>>>>
>>>> On Aug 8, 2011 9:40 PM, "Paul O" <[email protected]> wrote:
>>>>> Quite a few interesting points, thanks!
>>>>>
>>>>> On Mon, Aug 8, 2011 at 5:53 PM, Jeremiah Peschka
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Responses inline
>>>>>>
>>>>>> On Aug 8, 2011, at 1:25 PM, Paul O wrote:
>>>>>>
>>>>>> Will any existing data be imported? If this is totally greenfield,
>>>>>> then you're free to do whatever zany things you want!
>>>>>
>>>>> Almost totally greenfield, yes. Some data will need to be imported, but
>>>>> it's already in the format described.
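Alexander's batching and overhead arithmetic, spelled out. The 450-byte figure is the one he recalls from the list archives; the 4-byte value size is an assumption for a small sensor integer.

```python
# A 10 Hz sensor emits 10 values per second; storing one minute of readings
# under a single key turns 600 values into 1 key.
hz = 10
batch_seconds = 60
values_per_key = hz * batch_seconds
print(values_per_key)  # 600 -> the 600x key-count reduction

# Disk overhead: with ~450 bytes of per-object overhead, a lone 4-byte
# integer is almost all overhead, while a 600-value batch amortizes it.
overhead = 450
value_size = 4  # bytes per reading (assumed)
single = overhead / (overhead + value_size)
batched = overhead / (overhead + value_size * values_per_key)
print(f"{single:.1%} overhead unbatched, {batched:.1%} batched")
```

This is also where the durability trade-off comes from: a crash loses at most one in-flight batch, i.e. up to `batch_seconds` of data per sensor.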
>>>>>
>>>>>> Ah, so you need IOPS throughput, not storage capacity. On the hardware
>>>>>> side, make sure your storage subsystem can keep up - don't cheap out
>>>>>> on disks just because you have a lot of nodes. A single rotational HDD
>>>>>> can only handle about 180 IOPS on average. There's a lot you can do on
>>>>>> the storage backend to make sure you're able to keep up there.
>>>>>
>>>>> Indeed, storage capacity is also an issue, but IOPS would be important,
>>>>> too. I assume that sending batches to Riak (opaque blobs) would help a
>>>>> lot with the quantity of writes, but it's still a very important point.
>>>>>
>>>>>> You may want to look into ways to force Riak to clean up the Bitcask
>>>>>> files. I don't entirely remember how it's going to handle cleaning up
>>>>>> deleted records, but you might run into some tricky situations where
>>>>>> compactions aren't occurring.
>>>>>
>>>>> Hm, any references regarding that? It would be a major snag in the
>>>>> whole scheme if Riak doesn't properly reclaim space for deleted
>>>>> records.
>>>>>
>>>>>> Riak is pretty constant-time with Bitcask. The tricky part with the
>>>>>> amount of data you're describing is that Bitcask requires (I think)
>>>>>> that all keys fit into memory. As your data volume increases, you'll
>>>>>> need to do a combination of scaling up and scaling out. Scale up RAM
>>>>>> in the nodes and then add additional nodes to handle load. RAM will
>>>>>> help with data volume; more nodes will help with write throughput.
>>>>>
>>>>> Indeed, for high-frequency sources that would create lots of bundles,
>>>>> even the MaxN-to-1 reduction for key names might still generate loads
>>>>> of keys. Any idea how much RAM Riak requires per record, or a reference
>>>>> that would point me to it?
>>>>>
>>>>>> Since you're searching on time series, mostly, you could build time
>>>>>> indexes in your RDBMS.
>>>>>> The nice thing is that querying temporal data is well documented in
>>>>>> the relational world, especially in the data warehousing world. In
>>>>>> your case, I'd create a dates table and have a foreign key relating to
>>>>>> my RDBMS index table to make it easy to search for dates. Querying
>>>>>> your time table will be fast, which reduces the need for scans in your
>>>>>> index table.
>>>>>>
>>>>>> EXAMPLE:
>>>>>>
>>>>>> CREATE TABLE timeseries (
>>>>>>     time_key INT,
>>>>>>     date TIMESTAMP,
>>>>>>     datestring VARCHAR(30),
>>>>>>     year SMALLINT,
>>>>>>     month TINYINT,
>>>>>>     day TINYINT,
>>>>>>     day_of_week TINYINT
>>>>>>     -- etc
>>>>>> );
>>>>>>
>>>>>> CREATE TABLE riak_index (
>>>>>>     id INT NOT NULL,
>>>>>>     time_key INT NOT NULL REFERENCES timeseries(time_key),
>>>>>>     riak_key VARCHAR(100) NOT NULL
>>>>>> );
>>>>>>
>>>>>> SELECT ri.riak_key
>>>>>> FROM timeseries ts
>>>>>> JOIN riak_index ri ON ts.time_key = ri.time_key
>>>>>> WHERE ts.date BETWEEN '20090702' AND '20100702';
>>>>>
>>>>> My plan was to have riak_index contain something like: (id, start_time,
>>>>> end_time, source_id, record_count).
>>>>>
>>>>>> Without going too much into RDBMS fun, this pattern can get your RDBMS
>>>>>> running pretty quickly, and then you can combine that with Riak's
>>>>>> performance and have a really good idea of how quick any query will
>>>>>> be.
>>>>>
>>>>> That's roughly the plan, thanks again for your contributions to the
>>>>> discussion!
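Jeremiah's date-dimension pattern can be sketched end to end with Python's stdlib `sqlite3` as a simplified stand-in for a real RDBMS: the dimension table is cut down to the columns the query touches, the Riak key format and dates are invented, and SQLite stores the timestamps as ISO text rather than a native TIMESTAMP.

```python
import sqlite3

# In-memory stand-in for the timeseries/riak_index schema from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE timeseries (
    time_key INTEGER PRIMARY KEY,
    date     TEXT            -- ISO date string; compares correctly with BETWEEN
);
CREATE TABLE riak_index (
    id       INTEGER NOT NULL,
    time_key INTEGER NOT NULL REFERENCES timeseries(time_key),
    riak_key TEXT    NOT NULL
);
""")
conn.executemany("INSERT INTO timeseries VALUES (?, ?)",
                 [(1, "2009-07-02"), (2, "2010-01-15"), (3, "2011-03-01")])
conn.executemany("INSERT INTO riak_index VALUES (?, ?, ?)",
                 [(1, 1, "sensor42/2009-07-02"),   # key format is made up
                  (2, 2, "sensor42/2010-01-15"),
                  (3, 3, "sensor42/2011-03-01")])

# Filter the narrow date table first, then join out to the Riak keys,
# which are then fetched from Riak itself.
rows = conn.execute("""
    SELECT ri.riak_key
    FROM timeseries ts
    JOIN riak_index ri ON ts.time_key = ri.time_key
    WHERE ts.date BETWEEN '2009-07-02' AND '2010-07-02'
""").fetchall()
print([r[0] for r in rows])  # the two keys inside the date range
```

Paul's proposed variant of `riak_index` (id, start_time, end_time, source_id, record_count) fits the same pattern, with the range predicate moving onto start_time/end_time.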
>>>>>
>>>>> Paul
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [email protected]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
