Protip: It's called a SAN.

---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP
On Aug 10, 2011, at 6:39 AM, Ciprian Dorin Craciun wrote:

> On Wed, Aug 10, 2011 at 16:30, Jeremiah Peschka
> <[email protected]> wrote:
>> Ciprian's response this morning has some interesting information in it - I
>> have suspicions about where the PostgreSQL and MySQL performance problems
>> lie. Unfortunately, my suspicion also highlights why some people don't like
>> using RDBMSes: when you get to higher performance scenarios you frequently
>> need specialized knowledge to tune the RDBMS to run well.
>
> Indeed, you need highly qualified DBAs to tune the RDBMS. And in
> my case I've failed to do so successfully :). (I've asked here and
> there for feedback, but nothing I have done "revived" the insert
> speed.)
>
>> Within an RDBMS, you'll get good performance by splitting out your writes
>> into a partitioned table, where you group writes either by some constant
>> value (modulus of sensor location, for example) or by month. You can then
>> spread your writes out across multiple storage devices, gaining some of the
>> benefits of a system like Riak while still getting the benefit of an
>> RDBMS's sequential scans.
>>
>> ---
>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>> Microsoft SQL Server MVP
>
> But then the solution gets quite complicated, and you'll end up
> constructing a specialized "data store" on top of a cluster of RDBMSes,
> when you could have just started from the beginning on top of some
> lightweight storage engine.
>
> Ciprian.
>
>> On Aug 9, 2011, at 6:38 PM, Paul O wrote:
>>
>>> Jeremiah, with all the feedback indicating Riak not to be such a straight
>>> fit for the problem, even accepting batching compromises, I'll have to
>>> strongly reconsider RDBMS solutions, though they come with their own
>>> tradeoffs, as discussed here today. How about storing data using Innostore
>>> instead of Bitcask - would this be a "best of both worlds" situation?
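To make Jeremiah's write-splitting suggestion concrete: a minimal sketch of modulus-based partition routing, in Python rather than SQL. The partition count and the `sensor_data_pN` table names are purely illustrative, not from the thread.

```python
# Hypothetical sketch: route each sensor's writes to one of N partitions by
# taking the modulus of the sensor id, so writes spread across partitions
# (and thus across storage devices).
NUM_PARTITIONS = 4

def partition_for(sensor_id: int) -> str:
    """Pick the (made-up) partition table name for this sensor's writes."""
    return f"sensor_data_p{sensor_id % NUM_PARTITIONS}"

# Neighboring sensors land on different partitions:
print(partition_for(7))  # sensor_data_p3
print(partition_for(8))  # sensor_data_p0
```

The same idea applies whether the partitioning is done in application code, as here, or declaratively via SQL Server partitioned tables or PostgreSQL partitioning.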
>>>
>>> Thanks for these and the previous valuable comments, again,
>>>
>>> Paul
>>>
>>> On Tue, Aug 9, 2011 at 11:14 AM, Jeremiah Peschka
>>> <[email protected]> wrote:
>>> Excellent points, Alex.
>>>
>>> When you compare Riak's storage overhead to something like an RDBMS, where
>>> you have maybe 24 bytes of row overhead (as is the case for PostgreSQL),
>>> you'll find that there's a tremendous economy of space elsewhere.
>>>
>>> Riak is going to excel in applications where reads and writes are truly
>>> random. Yes, there are MapReduce features, but randomly organized data is
>>> still randomly organized data.
>>>
>>> If you look at RDBMSes, horizontally partitioning your system through
>>> RDBMS features (SQL Server's partitioned tables or PostgreSQL's
>>> partitioned views, for example) gives you the ability to take advantage of
>>> many known quantities in that world - range queries can take advantage of
>>> sequential scan speeds across rotational disks.
>>>
>>> You can even avoid the overhead of a traditional RDBMS altogether and use
>>> the InnoDB APIs or something like HandlerSocket to write data very quickly.
>>>
>>> ---
>>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>>> Microsoft SQL Server MVP
>>>
>>> On Aug 9, 2011, at 7:43 AM, Alexander Sicular wrote:
>>>
>>>> A couple of thoughts:
>>>>
>>>> - disk I/O
>>>> - total keys versus memory
>>>> - data-on-disk overhead
>>>>
>>>> As Jeremiah noted, disk I/O is crucial. Thankfully, Riak's distributed
>>>> mesh gives you access to a number of spindles limited only by your
>>>> budget. I think that is a critical bonus of a distributed system like
>>>> Riak that is often not fully appreciated. Here Riak is a win for you.
>>>>
>>>> Bitcask needs all keys to fit in memory. We are talking something like:
>>>>
>>>>     (key length + overhead) * number of keys * replicas < cluster max
>>>>     available RAM
>>>>
>>>> There is a tool on the wiki which should help figure this out.
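Plugging assumed numbers into that sizing rule makes the constraint tangible. Everything below is an illustrative guess, not a figure from the thread: the key length, key count, and per-key overhead are invented, and `n_replicas` is Riak's default n_val of 3.

```python
# Worked example of the Bitcask key-memory rule:
#   (key length + overhead) * number of keys * replicas < cluster RAM
key_length = 36          # e.g. a UUID-style key (assumed)
per_key_overhead = 40    # rough per-key Bitcask bookkeeping, bytes (assumed)
n_keys = 100_000_000     # total distinct keys (assumed)
n_replicas = 3           # Riak's default n_val

ram_needed = (key_length + per_key_overhead) * n_keys * n_replicas
print(ram_needed)               # 22_800_000_000 bytes
print(ram_needed / 2**30)       # ≈ 21.2 GiB across the whole cluster
```

The point is less the exact overhead constant (check the wiki tool for current numbers) than the shape of the formula: RAM demand grows linearly with key count, which is why batching keys matters so much below.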
>>>> What that basically means for you is that you will have to batch your
>>>> data by some sensor/time-granularity metric. Let's say every minute. At
>>>> 10 Hz that is a 600x reduction in total keys. Of course, this doesn't
>>>> come for free. Your application middleware will have to accommodate it,
>>>> and you could lose up to whatever your time-granularity batch is - i.e.,
>>>> you could lose a minute of sensor data should your application fail.
>>>> Here Riak is neutral to negative.
>>>>
>>>> Riak's data structure is not friendly towards small values. Sensors
>>>> generally spit out integers or other small data tuples. If you search
>>>> the list archives you will find a magnificent data-overhead writeup.
>>>> IIRC, it was something on the order of 450 bytes per object. What that
>>>> basically tells you is that you can't use Bitcask for small values if
>>>> disk space is a concern, as I imagine it to be in this case. Also,
>>>> sensor data is generally write-only, i.e. never deleted or modified, so
>>>> compaction should not be a concern when using Bitcask. Here Riak is a
>>>> strong negative.
>>>>
>>>> Data-retrieval issues aside (which, between Riak Search, secondary
>>>> indexes, and third-party indexes, should not be a major concern), I am
>>>> of the opinion that Riak is not a good fit for high-frequency sensor
>>>> data applications.
>>>>
>>>> Cheers,
>>>> Alexander
>>>>
>>>> Sent from my rotary phone.
>>>>
>>>> On Aug 8, 2011 9:40 PM, "Paul O" <[email protected]> wrote:
>>>>> Quite a few interesting points, thanks!
>>>>>
>>>>> On Mon, Aug 8, 2011 at 5:53 PM, Jeremiah Peschka
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Responses inline
>>>>>>
>>>>>> On Aug 8, 2011, at 1:25 PM, Paul O wrote:
>>>>>>
>>>>>> Will any existing data be imported? If this is totally greenfield,
>>>>>> then you're free to do whatever zany things you want!
>>>>>
>>>>> Almost totally greenfield, yes. Some data will need to be imported, but
>>>>> it's already in the format described.
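Alexander's batching and overhead arithmetic, spelled out. The 450-byte figure is the one he recalls from the list archives; the 4-byte value size is an assumption for a small sensor integer.

```python
# A 10 Hz sensor emits 10 values per second; storing one minute of readings
# under a single key turns 600 values into 1 key.
hz = 10
batch_seconds = 60
values_per_key = hz * batch_seconds
print(values_per_key)  # 600 -> the 600x key-count reduction

# Disk overhead: with ~450 bytes of per-object overhead, a lone 4-byte
# integer is almost all overhead, while a 600-value batch amortizes it.
overhead = 450
value_size = 4  # bytes per reading (assumed)
single = overhead / (overhead + value_size)
batched = overhead / (overhead + value_size * values_per_key)
print(f"{single:.1%} overhead unbatched, {batched:.1%} batched")
```

This is also where the durability trade-off comes from: a crash loses at most one in-flight batch, i.e. up to `batch_seconds` of data per sensor.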
>>>>>
>>>>>> Ah, so you need IOPS throughput, not storage capacity. On the hardware
>>>>>> side, make sure your storage subsystem can keep up - don't cheap out
>>>>>> on disks just because you have a lot of nodes. A single rotational HDD
>>>>>> can only handle about 180 IOPS on average. There's a lot you can do on
>>>>>> the storage backend to make sure you're able to keep up there.
>>>>>
>>>>> Indeed, storage capacity is also an issue, but IOPS would be important,
>>>>> too. I assume that sending batches to Riak (opaque blobs) would help a
>>>>> lot with the quantity of writes, but it's still a very important point.
>>>>>
>>>>>> You may want to look into ways to force Riak to clean up the Bitcask
>>>>>> files. I don't entirely remember how it's going to handle cleaning up
>>>>>> deleted records, but you might run into some tricky situations where
>>>>>> compactions aren't occurring.
>>>>>
>>>>> Hm, any references regarding that? It would be a major snag in the
>>>>> whole scheme if Riak doesn't properly reclaim space for deleted
>>>>> records.
>>>>>
>>>>>> Riak is pretty constant-time with Bitcask. The tricky part with the
>>>>>> amount of data you're describing is that Bitcask requires (I think)
>>>>>> that all keys fit into memory. As your data volume increases, you'll
>>>>>> need to do a combination of scaling up and scaling out. Scale up RAM
>>>>>> in the nodes and then add additional nodes to handle load. RAM will
>>>>>> help with data volume; more nodes will help with write throughput.
>>>>>
>>>>> Indeed, for high-frequency sources that would create lots of bundles,
>>>>> even the MaxN-to-1 reduction for key names might still generate loads
>>>>> of keys. Any idea how much RAM Riak requires per record, or a reference
>>>>> that would point me to it?
>>>>>
>>>>>> Since you're searching on time series, mostly, you could build time
>>>>>> indexes in your RDBMS.
>>>>>> The nice thing is that querying temporal data is well documented in
>>>>>> the relational world, especially in the data warehousing world. In
>>>>>> your case, I'd create a dates table and have a foreign key relating to
>>>>>> my RDBMS index table to make it easy to search for dates. Querying
>>>>>> your time table will be fast, which reduces the need for scans in your
>>>>>> index table.
>>>>>>
>>>>>> EXAMPLE:
>>>>>>
>>>>>> CREATE TABLE timeseries (
>>>>>>     time_key INT,
>>>>>>     date TIMESTAMP,
>>>>>>     datestring VARCHAR(30),
>>>>>>     year SMALLINT,
>>>>>>     month TINYINT,
>>>>>>     day TINYINT,
>>>>>>     day_of_week TINYINT
>>>>>>     -- etc
>>>>>> );
>>>>>>
>>>>>> CREATE TABLE riak_index (
>>>>>>     id INT NOT NULL,
>>>>>>     time_key INT NOT NULL REFERENCES timeseries(time_key),
>>>>>>     riak_key VARCHAR(100) NOT NULL
>>>>>> );
>>>>>>
>>>>>> SELECT ri.riak_key
>>>>>> FROM timeseries ts
>>>>>> JOIN riak_index ri ON ts.time_key = ri.time_key
>>>>>> WHERE ts.date BETWEEN '20090702' AND '20100702';
>>>>>
>>>>> My plan was to have riak_index contain something like: (id, start_time,
>>>>> end_time, source_id, record_count).
>>>>>
>>>>>> Without going too much into RDBMS fun, this pattern can get your RDBMS
>>>>>> running pretty quickly, and then you can combine that with Riak's
>>>>>> performance and have a really good idea of how quick any query will
>>>>>> be.
>>>>>
>>>>> That's roughly the plan, thanks again for your contributions to the
>>>>> discussion!
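Jeremiah's date-dimension pattern can be sketched end to end with Python's stdlib `sqlite3` as a simplified stand-in for a real RDBMS: the dimension table is cut down to the columns the query touches, the Riak key format and dates are invented, and SQLite stores the timestamps as ISO text rather than a native TIMESTAMP.

```python
import sqlite3

# In-memory stand-in for the timeseries/riak_index schema from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE timeseries (
    time_key INTEGER PRIMARY KEY,
    date     TEXT            -- ISO date string; compares correctly with BETWEEN
);
CREATE TABLE riak_index (
    id       INTEGER NOT NULL,
    time_key INTEGER NOT NULL REFERENCES timeseries(time_key),
    riak_key TEXT    NOT NULL
);
""")
conn.executemany("INSERT INTO timeseries VALUES (?, ?)",
                 [(1, "2009-07-02"), (2, "2010-01-15"), (3, "2011-03-01")])
conn.executemany("INSERT INTO riak_index VALUES (?, ?, ?)",
                 [(1, 1, "sensor42/2009-07-02"),   # key format is made up
                  (2, 2, "sensor42/2010-01-15"),
                  (3, 3, "sensor42/2011-03-01")])

# Filter the narrow date table first, then join out to the Riak keys,
# which are then fetched from Riak itself.
rows = conn.execute("""
    SELECT ri.riak_key
    FROM timeseries ts
    JOIN riak_index ri ON ts.time_key = ri.time_key
    WHERE ts.date BETWEEN '2009-07-02' AND '2010-07-02'
""").fetchall()
print([r[0] for r in rows])  # the two keys inside the date range
```

Paul's proposed variant of `riak_index` (id, start_time, end_time, source_id, record_count) fits the same pattern, with the range predicate moving onto start_time/end_time.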
>>>>>
>>>>> Paul
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [email protected]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
