Good to know. Thanks, Jay and Chris. Since I want the job to accept updates, it may be worthwhile for me to add a changelog so recoveries are faster.
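Concretely, I'm picturing something like the configuration below: the input consumed as a bootstrap stream (so updates keep arriving after the initial load), plus a separate changelog on the store for fast restores. The system, topic, and store names are made up:

  # Hypothetical names throughout: system "kafka", input topic
  # "model-data", store "model-store".
  task.inputs=kafka.model-data

  # Consume the input as a bootstrap stream: the task reads it from the
  # oldest offset before processing anything else, and keeps receiving
  # new updates afterwards.
  systems.kafka.streams.model-data.samza.bootstrap=true
  systems.kafka.streams.model-data.samza.offset.default=oldest

  # Separate changelog topic, so recovery uses the optimized restore
  # path instead of replaying the master topic through process().
  stores.model-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
  stores.model-store.changelog=kafka.model-store-changelog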
On Tue, Jan 20, 2015 at 4:38 PM, Chris Riccomini <[email protected]> wrote:

> Hey Roger,
>
> To add to Jay's comment, if you don't care about getting updates after
> the initial bootstrap, you can configure a store with a changelog
> pointed to your bootstrap topic. This will cause the SamzaContainer to
> bootstrap using the optimized code that Jay described. Just make sure
> you don't write to the store (since it would put the mutation back into
> your bootstrap stream). This configuration won't allow new updates to
> come into the store until the job is restarted. If you use the
> 'bootstrap stream' concept, then you continue getting updates after the
> initial bootstrap. The 'bootstrap' stream also allows you to have
> arbitrary logic, which might be useful for your job--not sure.
>
> Cheers,
> Chris
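(For reference, the changelog-pointed-at-the-bootstrap-topic setup Chris describes would presumably boil down to something like this -- store and topic names made up, and the task must never write to the store:)

  # Point the store's changelog at the existing master topic. On startup
  # the container restores the store directly from that topic via the
  # optimized restore path; updates arriving later are not picked up
  # until the job restarts.
  stores.model-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
  stores.model-store.changelog=kafka.model-data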
> On 1/20/15 4:30 PM, "Jay Kreps" <[email protected]> wrote:
>
> > It's also worth noting that restoring from a changelog *should* be
> > much faster than restoring from upstream. The restore case is
> > optimized and batches the updates and skips serialization, both of
> > which help a ton with performance.
> >
> > -Jay
> >
> > On Tue, Jan 20, 2015 at 4:19 PM, Chinmay Soman
> > <[email protected]> wrote:
> >
> > > I remember running both RocksDB and LevelDB and RocksDB was
> > > definitely better (in that one test case, it was ~40K vs ~30K
> > > random writes/sec) - but I haven't done any exhaustive comparison.
> > >
> > > Btw, I see that you're using 4 partitions? Any reason you're not
> > > using something like 128 or more and running with more containers?
> > >
> > > On Tue, Jan 20, 2015 at 4:05 PM, Roger Hoover
> > > <[email protected]> wrote:
> > >
> > > > Thanks, Chris.
> > > >
> > > > I am not using a changelog for the store because the bootstrap
> > > > stream is a master copy of the data and the job can recover from
> > > > there. No need to write out another copy. Is this the way you
> > > > typically do it for stream/table joins?
> > > >
> > > > Great to know that you're looking into the performance issues. I
> > > > love the idea of local state for isolation and predictable
> > > > throughput, but the current write throughput puts hard limits on
> > > > the amount of local state that a container can have without
> > > > really long initialization/recovery times.
> > > >
> > > > In my tests, LevelDB has about the same performance. Have you
> > > > noticed that as well?
> > > >
> > > > Cheers,
> > > >
> > > > Roger
> > > >
> > > > On Tue, Jan 20, 2015 at 9:28 AM, Chris Riccomini
> > > > <[email protected]> wrote:
> > > >
> > > > > Hey Roger,
> > > > >
> > > > > We did some benchmarking, and discovered very similar
> > > > > performance to what you've described. We saw ~40k writes/sec,
> > > > > and ~20k reads/sec, per-container, on a Virident SSD. This was
> > > > > without any changelog. Are you using a changelog on the store?
> > > > >
> > > > > When we attached a changelog to the store, the writes dropped
> > > > > significantly (~1000 writes/sec). When we hooked up VisualVM,
> > > > > we saw that the container was spending over 99% of its time in
> > > > > KafkaSystemProducer.send().
> > > > >
> > > > > We're currently doing two things:
> > > > >
> > > > > 1. Working with our performance team to understand and tune
> > > > > RocksDB properly.
> > > > > 2. Upgrading the Kafka producer to use the new Java-based API.
> > > > > (SAMZA-227)
> > > > >
> > > > > For (1), it seems like we should be able to get a lot higher
> > > > > throughput from RocksDB. Anecdotally, we've heard that RocksDB
> > > > > requires many threads in order to max out an SSD, and since
> > > > > Samza is single-threaded, we could just be hitting a RocksDB
> > > > > bottleneck. We won't know until we dig into the problem (which
> > > > > we started investigating last week). The current plan is to
> > > > > start by benchmarking RocksDB JNI outside of Samza, and see
> > > > > what we can get. From there, we'll know our "speed of light",
> > > > > and can try to get Samza as close as possible to it. If RocksDB
> > > > > JNI can't be made to go "fast", then we'll have to understand
> > > > > why.
> > > > >
> > > > > (2) should help with the changelog issue. I believe the
> > > > > slowness with the changelog is caused by the changelog using a
> > > > > sync producer to send to Kafka, which blocks whenever a batch
> > > > > is flushed. In the new API, the concept of a "sync" producer is
> > > > > removed. All writes are handled on an async writer thread
> > > > > (though we can still guarantee writes are safely written before
> > > > > checkpointing, which is what we need).
> > > > >
> > > > > In short, I agree, it seems slow. We see this behavior, too.
> > > > > We're digging into it.
> > > > >
> > > > > Cheers,
> > > > > Chris
> > > > >
> > > > > On 1/17/15 12:58 PM, "Roger Hoover" <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Michael,
> > > > > >
> > > > > > Thanks for the response. I used VisualVM and YourKit and see
> > > > > > that the CPU is not being used (0.1%). I took a few thread
> > > > > > dumps and see the main thread blocked on the flush() method
> > > > > > inside the KV store.
> > > > > >
> > > > > > On Sat, Jan 17, 2015 at 7:09 AM, Michael Rose
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Is your process at 100% CPU? I suspect you're spending most
> > > > > > > of your time in JSON deserialization, but profile it and
> > > > > > > check.
> > > > > > >
> > > > > > > Michael
> > > > > > >
> > > > > > > On Friday, January 16, 2015, Roger Hoover
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi guys,
> > > > > > > >
> > > > > > > > I'm testing a job that needs to load 40M records (6GB in
> > > > > > > > Kafka as JSON) from a bootstrap topic. The topic has 4
> > > > > > > > partitions and I'm running the job using the
> > > > > > > > ProcessJobFactory so all four tasks are in one container.
> > > > > > > >
> > > > > > > > Using RocksDB, it's taking 19 minutes to load all the
> > > > > > > > data, which amounts to 35k records/sec or 5MB/s based on
> > > > > > > > input size. I ran iostat during this time and saw that
> > > > > > > > the disk write throughput was 14MB/s.
> > > > > > > >
> > > > > > > > I didn't tweak any of the storage settings.
> > > > > > > >
> > > > > > > > A few questions:
> > > > > > > > 1) Does this seem low? I'm running on a MacBook Pro with
> > > > > > > > SSD.
> > > > > > > > 2) Do you have any recommendations for improving the load
> > > > > > > > speed?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Roger
> > >
> > > --
> > > Thanks and regards
> > >
> > > Chinmay Soman
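For anyone picking up item (1) later: a bare-bones write micro-benchmark against RocksDB JNI, outside of Samza, might look roughly like the sketch below. The class name, path, and workload sizes are made up, all tuning is left at defaults, and it assumes the rocksdbjni artifact is on the classpath.

  import java.util.Random;

  import org.rocksdb.Options;
  import org.rocksdb.RocksDB;
  import org.rocksdb.RocksDBException;

  public class RocksDbWriteBench {
    public static void main(String[] args) throws RocksDBException {
      RocksDB.loadLibrary();

      int numRecords = 1_000_000;    // scale toward 40M for a realistic run
      byte[] key = new byte[16];
      byte[] value = new byte[150];  // ~150-byte values, roughly 6GB / 40M
      Random random = new Random(42);
      random.nextBytes(value);

      Options options = new Options().setCreateIfMissing(true);
      RocksDB db = RocksDB.open(options, "/tmp/rocksdb-bench");
      try {
        long start = System.nanoTime();
        for (int i = 0; i < numRecords; i++) {
          random.nextBytes(key);     // random writes, like the tests above
          db.put(key, value);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%d writes in %d ms (%.0f writes/sec)%n",
            numRecords, elapsedMs, numRecords * 1000.0 / elapsedMs);
      } finally {
        db.close();
      }
    }
  }

A second variant that runs the same loop from several threads against one RocksDB instance would test the "RocksDB needs many threads to max out an SSD" theory directly.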

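And on item (2), my reading of the new Java producer pattern is sketched below: send() returns a Future immediately and batching happens on a background I/O thread, so nothing blocks per message; the caller only waits on outstanding futures at checkpoint time. This is just an illustration of the pattern, not Samza's actual KafkaSystemProducer -- the broker address and topic name are placeholders.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Properties;
  import java.util.concurrent.Future;

  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.clients.producer.RecordMetadata;

  public class AsyncChangelogWriter {
    public static void main(String[] args) throws Exception {
      Properties props = new Properties();
      props.put("bootstrap.servers", "localhost:9092");  // placeholder
      props.put("acks", "all");
      props.put("key.serializer",
          "org.apache.kafka.common.serialization.ByteArraySerializer");
      props.put("value.serializer",
          "org.apache.kafka.common.serialization.ByteArraySerializer");

      KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
      List<Future<RecordMetadata>> inFlight = new ArrayList<>();

      // Sends are queued and batched by the producer's background thread;
      // each call returns a Future without blocking.
      for (int i = 0; i < 10_000; i++) {
        byte[] k = Integer.toString(i).getBytes("UTF-8");
        inFlight.add(producer.send(
            new ProducerRecord<byte[], byte[]>("store-changelog", k, k)));
      }

      // At checkpoint time, block until every outstanding write is acked,
      // preserving the "safely written before checkpointing" guarantee.
      for (Future<RecordMetadata> f : inFlight) {
        f.get();
      }
      producer.close();
    }
  }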