Re: The power of the siblings....

Mike Oxford Mon, 03 Oct 2011 21:07:47 -0700

SSDs are an option, sure.  I have one in my laptop; we have a bunch
of X25s on the way already for the servers.  Yes, they're good.  But
IOPS is not the core issue since the whole thing can sit in RAM
which is faster yet.  Disk-flush "later" isn't time critical.  Getting the
data into the buckets is.

5k per second per key, over multiple concurrent writers (3-6 initially,
possibly more later.) Pre-cache+flush doesn't work because you
lose the interleave from the multiple writers.  NTP's resolution is only
"so good." :)

The buckets can be cycled/sharded based on time, so slicing it into
"5 second buckets of children" is possible but this is just a specialization
of the sharding ideology.

Point being: If it's basically used as an append-only-bucket (throw it
all in, sort it out later) how painful, underneath, is the child resolution vs
the traditional "get it, write it" and then dealing with children ANYWAY
when you do get collisions (which, at 5kps, you ARE going to end up with)?
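
To make "sort it out later" concrete, here is a minimal Python sketch of the
offline resolution step. It assumes each sibling carries a writer-side
timestamp, a writer id, and a per-writer sequence number; the Entry fields
and the resolve() helper are hypothetical names, not Riak API. The extra
keys only make the order deterministic when timestamps tie, since NTP
cannot be trusted at that granularity.

```python
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    """One appended record (hypothetical layout, not a Riak structure)."""
    ts_micros: int   # writer-side timestamp, microseconds
    writer_id: str   # which concurrent writer produced the record
    seq: int         # per-writer counter, breaks ties within one writer
    payload: bytes


def resolve(siblings: Iterable[Entry]) -> List[Entry]:
    """Order all siblings deterministically: time, then writer, then sequence."""
    return sorted(siblings, key=lambda e: (e.ts_micros, e.writer_id, e.seq))
```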

It was mentioned that Riak uses lists for siblings underneath.  Given high-end modern
hardware, (6 core CPUs, SSDs, etc.) ballpark, where would you guess the
red-line is?  10k children? 25k? 100k?  I won't hold anyone to it, but if
you say "hell no, children are really expensive" then I'll abort the idea
right here compared to "they're pretty efficient underneath, it might be
doable."

I'm familiar with all the HA/clustering "normal stuff" but I'm curious
about Riak in particular because while Riak isn't built to be fast,
I'm curious about how much load you can push a ring through before
the underlying architecture stresses.

I know Yammer was putting some load on theirs; something around 4k
per sec over a few boxes but not to a single key.

The big "problem" is that you have to have "knowledge of the buckets"
to later correlate them. Listing buckets is expensive.  I don't want to
hard-code bucket names into the application space if I can help it.
Writing "list of buckets" to another key simply moves the bottleneck
from one key to another.  Shifting buckets based on time works, but
it's obnoxious to have to correlate at 10 second intervals ....
8640 buckets worth of obnoxious.  Every day.  Much easier to sort a
large dataset all at once from a single bucket.
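
For comparison, a rough Python sketch of the time-sharded variant: if the
bucket name is a pure function of the time window, the correlator can
regenerate every name for a day without listing buckets or storing a list
anywhere. The "events" prefix, the name format, and both helper names are
made up for illustration; nothing here is Riak API.

```python
from datetime import datetime, timedelta, timezone
from typing import List

WINDOW_SECONDS = 10  # assumed window size, matching the 10 second intervals above


def bucket_name(ts: datetime, prefix: str = "events") -> str:
    """Map a timestamp to the (hypothetical) name of its time-window bucket."""
    epoch = int(ts.timestamp())
    window_start = epoch - (epoch % WINDOW_SECONDS)
    return f"{prefix}-{window_start}"


def buckets_for_day(day: datetime, prefix: str = "events") -> List[str]:
    """Regenerate every bucket name for the UTC day containing `day`."""
    start = day.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    count = 86400 // WINDOW_SECONDS  # 8640 windows per day at 10 seconds
    return [
        bucket_name(start + timedelta(seconds=i * WINDOW_SECONDS), prefix)
        for i in range(count)
    ]
```

That removes the hard-coded names and the "list of buckets" key, but the
correlator still has to walk 8640 buckets per day.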

Assuming an entry size of 300 bytes, that works out to roughly
130G per day, which will fit in RAM for the boxes.  Correlation can be
done on separate boxes later.  GigE cards bonded, etc.
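
The arithmetic behind that figure, as a quick Python check. It counts raw
payload only; replication and per-object metadata would add to it, and
neither is estimated here. The constant and function names are just for
illustration.

```python
WRITES_PER_SECOND = 5_000        # 5k per second into the single key
ENTRY_BYTES = 300                # assumed entry size
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def daily_volume_bytes(writes_per_second: int = WRITES_PER_SECOND,
                       entry_bytes: int = ENTRY_BYTES) -> int:
    """Raw payload written per day, in bytes (no replication, no overhead)."""
    return writes_per_second * entry_bytes * SECONDS_PER_DAY


entries_per_day = WRITES_PER_SECOND * SECONDS_PER_DAY
bytes_per_day = daily_volume_bytes()
print(f"{entries_per_day:,} entries/day, {bytes_per_day / 1e9:.1f} GB/day")
# prints: 432,000,000 entries/day, 129.6 GB/day
```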

Removing the hardware limitations, where are the guesses on where
Riak itself will curl up in a corner, sob and not come out?

If you had to do it, what suggestions would you all propose?
(Yes, I know I could just memcache with backup writes to
secondary/tertiary copies and flush later ... I'm interested in Riak.  :)

TIA!

-mox

On Mon, Oct 3, 2011 at 9:11 AM, Ryan Zezeski <rzeze...@basho.com> wrote:
> Mike,
> I'd say you're going to be pushing the limits of Riak pretty hard given the
> fact that you're talking about 5k writes-per-second on a _single_ key.  I
> hope you listen to Artur Bergman and run SSDs in your data center, heh [1].
>  My first thought would be to batch those writes locally for a given period
> of time and then flush to Riak.
> To your question, if you really have 5k/s then that's 300k siblings for one
> minute.  Given that Riak uses lists for siblings underneath I highly doubt
> this will be feasible.  Also, will there be many concurrent writers like
> this?  I.e. many keys being rapidly updated?
> -Ryan
> [1]: http://www.youtube.com/watch?v=H7PJ1oeEyGg
> On Mon, Sep 19, 2011 at 10:44 PM, Mike Oxford <moxf...@gmail.com> wrote:
>>
>> High performance updates to a single bucket/key space where ordering
>> isn't critical.  Say, 5k TPS into a single bucket/key.  Data is
>> written out such that it can be ordered later.
>>
>> I'm aware of sharding/fragmenting/splitting and what not ... I'm
>> looking purely at intra-bucket performance.  Yes, 5k is going to run
>> into a lot of contention; that's the point.
>>
>> Options:
>> 1)  Read old data, [NewData|Olddata] and write it back out, dealing
>> with siblings as they arise, -or-
>> 2)  Go full sibling explosion (read: force it) and resolve the whole
>> thing at intervals, say, once per day, offline or on another system.
>> The logistics of this are doable in my case, so let's not worry about
>> them and just focus on raw TPS.
>>
>> #1 has more round trips and still has siblings to deal with.
>> #2 takes up more space but you skip the pull/update/push in lieu of
>> "just push it, we'll deal with it later."
>>
>> Thoughts from those in the know?  How expensive, really, is forcing
>> the explosion?  Has anyone done this (intentionally or not) and can
>> share what they ran into with real data sets?
>>
>> Thanks!
>>
>> -mox
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
