Hey Roger,

> Is this the way you typically do it for stream/table joins?
Yup, in cases where the source stream has a copy of the data, and you fully
bootstrap, there's no need to have a changelog.

> but the current write throughput puts hard limits on the amount of local
> state that a container can have without really long
> initialization/recovery times.

Yeah. The current guidance we give is <10G/container, and every gig adds to
startup recovery time. We definitely want to decrease startup times.

> In my tests, LevelDB has about the same performance. Have you noticed
> that as well?

I haven't checked, but I'm not surprised. LevelDB and RocksDB are fairly
similar. Anecdotally, we have found LevelDB less predictable. It performs
on par with RocksDB, and then sometimes falls off a cliff for some reason
(data size, access patterns, etc.).

Cheers,
Chris

On 1/20/15 4:05 PM, "Roger Hoover" <[email protected]> wrote:

>Thanks, Chris.
>
>I am not using a changelog for the store because the bootstrap stream
>is a master copy of the data and the job can recover from there. No need
>to write out another copy. Is this the way you typically do it for
>stream/table joins?
>
>Great to know that you're looking into the performance issues. I love
>the idea of local state for isolation and predictable throughput, but the
>current write throughput puts hard limits on the amount of local state
>that a container can have without really long initialization/recovery
>times.
>
>In my tests, LevelDB has about the same performance. Have you noticed
>that as well?
>
>Cheers,
>
>Roger
>
>On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini <[email protected]> wrote:
>
>> Hey Roger,
>>
>> We did some benchmarking, and discovered very similar performance to
>> what you've described. We saw ~40k writes/sec, and ~20k reads/sec,
>> per-container, on a Virident SSD. This was without any changelog. Are
>> you using a changelog on the store?
>>
>> When we attached a changelog to the store, the writes dropped
>> significantly (~1000 writes/sec). When we hooked up VisualVM, we saw
>> that the container was spending >99% of its time in
>> KafkaSystemProducer.send().
>>
>> We're currently doing two things:
>>
>> 1. Working with our performance team to understand and tune RocksDB
>>    properly.
>> 2. Upgrading the Kafka producer to use the new Java-based API
>>    (SAMZA-227).
>>
>> For (1), it seems like we should be able to get a lot higher throughput
>> from RocksDB. Anecdotally, we've heard that RocksDB requires many
>> threads in order to max out an SSD, and since Samza is single-threaded,
>> we could just be hitting a RocksDB bottleneck. We won't know until we
>> dig into the problem (which we started investigating last week). The
>> current plan is to start by benchmarking RocksDB JNI outside of Samza,
>> and see what we can get. From there, we'll know our "speed of light",
>> and can try to get Samza as close as possible to it. If RocksDB JNI
>> can't be made to go "fast", then we'll have to understand why.
>>
>> (2) should help with the changelog issue. I believe that the slowness
>> with the changelog is caused because the changelog is using a sync
>> producer to send to Kafka, and is blocking when a batch is flushed. In
>> the new API, the concept of a "sync" producer is removed. All writes
>> are handled on an async writer thread (though we can still guarantee
>> writes are safely written before checkpointing, which is what we need).
>>
>> In short, I agree, it seems slow. We see this behavior, too. We're
>> digging into it.
>>
>> Cheers,
>> Chris
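As a rough sketch of step (1) above -- timing RocksDB JNI by itself, outside
Samza -- a single-threaded write loop against rocksdbjni might look like the
following. The class name, DB path, and record sizing are illustrative (the
~150-byte value roughly matches Roger's 6GB / 40M-record figure), and the API
shown is the current try-with-resources style; the JNI builds of that era
released native handles with dispose() instead:

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public class RocksDbWriteBench {
      public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        int numRecords = 1_000_000;     // placeholder workload size
        byte[] value = new byte[150];   // ~150-byte payload, roughly 6GB/40M
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-bench")) {
          long start = System.nanoTime();
          for (int i = 0; i < numRecords; i++) {
            // Single-threaded puts, mirroring Samza's one-writer-per-container model.
            db.put(String.valueOf(i).getBytes(), value);
          }
          long elapsedMs = (System.nanoTime() - start) / 1_000_000;
          System.out.printf("%d writes in %d ms (%.0f writes/sec)%n",
              numRecords, elapsedMs, numRecords * 1000.0 / elapsedMs);
        }
      }
    }

Running the same loop from several writer threads would also test the
"RocksDB needs many threads to max out an SSD" theory mentioned above.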
>>
>> On 1/17/15 12:58 PM, "Roger Hoover" <[email protected]> wrote:
>>
>> >Michael,
>> >
>> >Thanks for the response. I used VisualVM and YourKit and see the CPU
>> >is not being used (0.1%). I took a few thread dumps and see the main
>> >thread blocked on the flush() method inside the KV store.
>> >
>> >On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose <[email protected]>
>> >wrote:
>> >
>> >> Is your process at 100% CPU? I suspect you're spending most of your
>> >> time in JSON deserialization, but profile it and check.
>> >>
>> >> Michael
>> >>
>> >> On Friday, January 16, 2015, Roger Hoover <[email protected]>
>> >> wrote:
>> >>
>> >> > Hi guys,
>> >> >
>> >> > I'm testing a job that needs to load 40M records (6GB in Kafka as
>> >> > JSON) from a bootstrap topic. The topic has 4 partitions, and I'm
>> >> > running the job using the ProcessJobFactory, so all four tasks are
>> >> > in one container.
>> >> >
>> >> > Using RocksDB, it's taking 19 minutes to load all the data, which
>> >> > amounts to 35k records/sec or 5MB/s based on input size. I ran
>> >> > iostat during this time and saw that the disk write throughput is
>> >> > 14MB/s.
>> >> >
>> >> > I didn't tweak any of the storage settings.
>> >> >
>> >> > A few questions:
>> >> > 1) Does this seem low? I'm running on a MacBook Pro with SSD.
>> >> > 2) Do you have any recommendations for improving the load speed?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Roger
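For reference, the bootstrap-stream/table-join pattern discussed in this
thread -- materializing the bootstrap topic into a local store with no
changelog, then joining the other stream against it -- might look roughly
like the minimal sketch below, using the 0.x-era task API. The store name
("table-store"), topic names, output stream, and String types are
illustrative placeholders, not details from the thread:

    import org.apache.samza.config.Config;
    import org.apache.samza.storage.kv.KeyValueStore;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.InitableTask;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskContext;
    import org.apache.samza.task.TaskCoordinator;

    public class StreamTableJoinTask implements StreamTask, InitableTask {
      private static final SystemStream OUTPUT =
          new SystemStream("kafka", "joined-output");  // placeholder output

      private KeyValueStore<String, String> table;

      @Override
      @SuppressWarnings("unchecked")
      public void init(Config config, TaskContext context) {
        // Must match the stores.table-store.* entries in the job config; no
        // changelog is configured because the bootstrap topic is the master
        // copy of the data.
        table = (KeyValueStore<String, String>) context.getStore("table-store");
      }

      @Override
      public void process(IncomingMessageEnvelope envelope,
          MessageCollector collector, TaskCoordinator coordinator) {
        String stream = envelope.getSystemStreamPartition().getStream();
        if ("table-topic".equals(stream)) {
          // Table side: materialize the bootstrapped record into local state.
          table.put((String) envelope.getKey(), (String) envelope.getMessage());
        } else {
          // Stream side: join the event against the local table.
          String match = table.get((String) envelope.getKey());
          if (match != null) {
            collector.send(new OutgoingMessageEnvelope(OUTPUT, envelope.getKey(), match));
          }
        }
      }
    }

In the job config, the table topic would be flagged as a bootstrap stream
(systems.kafka.streams.table-topic.samza.bootstrap=true), and the
stores.table-store.changelog property would simply be left unset, per
Chris's point that a fully bootstrapped master copy makes a changelog
unnecessary.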
