SSDs are an option, sure. I have one in my laptop, and we have a bunch of X25s on the way already for the servers. Yes, they're good. But IOPS isn't the core issue, since the whole thing can sit in RAM, which is faster still. Flushing to disk "later" isn't time-critical; getting the data into the buckets is.
5k per second per key, over multiple concurrent writers (3-6 initially, possibly more later). Pre-cache+flush doesn't work because you lose the interleave from the multiple writers; NTP's resolution is only "so good." :) The buckets can be cycled/sharded based on time, so slicing it into "5 second buckets of children" is possible, but that's just a specialization of the sharding idea.

Point being: if it's basically used as an append-only bucket (throw it all in, sort it out later), how painful, underneath, is the child resolution vs. the traditional "get it, write it" -- where you have to deal with children ANYWAY when you do get collisions (which, at 5kps, you ARE going to end up with)? It was mentioned that Riak uses lists for siblings underneath. Given high-end modern hardware (6-core CPUs, SSDs, etc.), ballpark, where would you guess the red-line is? 10k children? 25k? 100k? I won't hold anyone to it, but if you say "hell no, children are really expensive" then I'll abort the idea right here, as opposed to "they're pretty efficient underneath, it might be doable."

I'm familiar with all the normal HA/clustering stuff, but I'm curious about Riak in particular: Riak isn't built to be fast, so how much load can you push through a ring before the underlying architecture stresses? I know Yammer was putting some load on theirs -- something around 4k per sec over a few boxes, but not to a single key.

The big "problem" is that you have to have "knowledge of the buckets" to correlate them later. Listing buckets is expensive. I don't want to hard-code bucket names into the application space if I can help it, and writing a "list of buckets" to another key simply moves the bottleneck from one key to another. Shifting buckets based on time works, but it's obnoxious to have to correlate at 10-second intervals ... 8640 buckets worth of obnoxious. Every day. Much easier to sort a large dataset all at once from a single bucket.
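For context, the numbers above pencil out as follows; a back-of-the-envelope sketch (the bucket-naming scheme and all names here are hypothetical illustrations, not anything from Riak's API):

```python
# Back-of-the-envelope numbers for the time-sliced bucket scheme above.

BUCKET_INTERVAL_S = 10        # correlate at 10-second intervals
SECONDS_PER_DAY = 86_400

# Number of time-sliced buckets to correlate per day.
buckets_per_day = SECONDS_PER_DAY // BUCKET_INTERVAL_S   # 8640

# Daily data volume at 5k writes/sec, ~300 bytes per entry.
WRITES_PER_SEC = 5_000
ENTRY_BYTES = 300
bytes_per_day = WRITES_PER_SEC * ENTRY_BYTES * SECONDS_PER_DAY
gb_per_day = bytes_per_day / 1e9                          # ~129.6 GB

def bucket_for(ts: float) -> str:
    """Derive a time-sliced bucket name from a Unix timestamp, so
    writers need no shared registry of bucket names (hypothetical)."""
    return "events-%d" % (int(ts) // BUCKET_INTERVAL_S)
```

The deterministic naming is what lets you avoid both listing buckets and writing a "list of buckets" key -- any box can reconstruct the day's 8640 bucket names from the date alone.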
Assuming an entry size of 300 bytes, that works out to around ~130G per day, which will fit in RAM on the boxes. Correlation can be done on separate boxes later; GigE cards bonded, etc. Removing the hardware limitations, where are the guesses on where Riak itself will curl up in a corner, sob, and not come out? If you had to do it, what suggestions would you all propose?

(Yes, I know I could just memcache with backup writes to secondary/tertiary copies and flush later ... I'm interested in Riak. :)

TIA!

-mox

On Mon, Oct 3, 2011 at 9:11 AM, Ryan Zezeski <rzeze...@basho.com> wrote:
> Mike,
> I'd say you're going to be pushing the limits of Riak pretty hard given the
> fact that you're talking about 5k writes-per-second on a _single_ key. I
> hope you listen to Artur Bergman and run SSDs in your data center, heh [1].
> My first thought would be to batch those writes locally for a given period
> of time and then flush to Riak.
> To your question, if you really have 5k/s then that's 300k siblings for one
> minute. Given that Riak uses lists for siblings underneath, I highly doubt
> this will be feasible. Also, will there be many concurrent writers like
> this? I.e. many keys being rapidly updated?
> -Ryan
> [1]: http://www.youtube.com/watch?v=H7PJ1oeEyGg
>
> On Mon, Sep 19, 2011 at 10:44 PM, Mike Oxford <moxf...@gmail.com> wrote:
>>
>> High performance updates to a single bucket/key space where ordering
>> isn't critical. Say, 5k TPS into a single bucket/key. Data is
>> written out such that it can be ordered later.
>>
>> I'm aware of sharding/fragmenting/splitting and whatnot ... I'm
>> looking purely at intra-bucket performance. Yes, 5k is going to run
>> into a lot of contention; that's the point.
>>
>> Options:
>> 1) Read old data, [NewData|OldData], and write it back out, dealing
>> with siblings as they arise, -or-
>> 2) Go full sibling explosion (read: force it) and resolve the whole
>> thing at intervals, say, once per day, offline or on another system.
>> The logistics of this are doable in my case, so let's not worry about
>> them and just focus on raw TPS.
>>
>> #1 has more round trips and still has siblings to deal with.
>> #2 takes up more space, but you skip the pull/update/push in lieu of
>> "just push it, we'll deal with it later."
>>
>> Thoughts from those in the know? How expensive, really, is forcing
>> the explosion? Has anyone done this (intentionally or not) and can
>> share what they ran into with real data sets?
>>
>> Thanks!
>>
>> -mox
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
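For whatever it's worth, the "sort it out later" step in option #2 above is just an external sort over the accumulated siblings. A minimal sketch, assuming each entry was written with a sortable timestamp prefix (all names hypothetical; nothing here is Riak's client API):

```python
# Hypothetical offline step for option #2: collect all sibling values for a
# key and impose ordering after the fact. Assumes writers prefixed each
# entry with "timestamp|payload" so entries sort without coordination.

def resolve_siblings(siblings):
    """Merge unordered sibling blobs into one time-ordered list of payloads."""
    entries = []
    for blob in siblings:
        for line in blob.splitlines():
            ts, _, payload = line.partition("|")
            entries.append((float(ts), payload))
    entries.sort(key=lambda e: e[0])  # O(n log n); fine for a once-a-day job
    return [payload for _, payload in entries]

# e.g. siblings from interleaved writers, in arbitrary order:
sibs = ["3.0|c\n1.0|a", "2.0|b"]
assert resolve_siblings(sibs) == ["a", "b", "c"]
```

The cost question in the thread is orthogonal to this merge: the merge itself is cheap offline; the open question is what the sibling list does to Riak's read/write path before you ever get to resolve it.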