Re: Initializing StateStores takes *really* long for large datasets

2016-12-03 Thread williamtellme123
Unsubscribe. Original message From: Guozhang Wang <wangg...@gmail.com> Date: 12/2/16 5:13 PM (GMT-06:00) To: users@kafka.apache.org Subject: Re: Initializing StateStores takes *really* long for large datasets Before ...

Re: Initializing StateStores takes *really* long for large datasets

2016-12-02 Thread Guozhang Wang
Until we have a single-knob memory management feature, I'd like to propose reducing Streams' default config values for RocksDB caching and memory block size. For example, I remember that Henry did some fine-tuning of the RocksDB config for his use case: ...

Re: Initializing StateStores takes *really* long for large datasets

2016-11-30 Thread Ara Ebrahimi
+1 on this. Ara. > On Nov 30, 2016, at 5:18 AM, Mathieu Fenniak wrote: > I'd like to quickly reinforce Frank's opinion regarding the RocksDB memory usage. I was also surprised by the amount of non-JVM-heap memory being used and had to tune the 100 MB ...

Re: Initializing StateStores takes *really* long for large datasets

2016-11-30 Thread Eno Thereska
Mathieu, you are absolutely right. We've written up the memory management strategy here: https://cwiki.apache.org/confluence/display/KAFKA/Discussion%3A+Memory+Management+in+Kafka+Streams

Re: Initializing StateStores takes *really* long for large datasets

2016-11-30 Thread Mathieu Fenniak
I'd like to quickly reinforce Frank's opinion regarding the RocksDB memory usage. I was also surprised by how much non-JVM-heap memory was being used, and I had to tune the 100 MB default down considerably. It's also unfortunate that it's hard to estimate the memory requirements for a Kafka Streams (KS) app ...
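Mathieu's point about estimating memory can be made concrete with a rough model. The sketch below is back-of-the-envelope only and assumes that every state store partition gets its own RocksDB instance with its own block cache and memtables, and that the 0.10.x defaults are a 100 MB block cache, a 32 MB write buffer and 3 write buffers (check these against your Streams version). The store count, partition count and "reduced" values are hypothetical; index/filter blocks, OS page cache and window-store segments are ignored.

```java
// Back-of-the-envelope estimate of off-heap memory used by RocksDB-backed state stores.
// ASSUMPTIONS: one RocksDB instance per store per partition, each with its own block
// cache and memtables (nothing shared); index/filter blocks, OS page cache and
// window-store segments are ignored. The "default" values are an assumed reading of
// the 0.10.x RocksDBStore defaults and should be checked against your version.
final class RocksDbMemoryEstimate {

    static final long MIB = 1024L * 1024L;

    /** Rough upper bound for one RocksDB instance: block cache plus all memtables. */
    static long perInstanceBytes(long blockCacheBytes, long writeBufferBytes, int maxWriteBuffers) {
        return blockCacheBytes + writeBufferBytes * maxWriteBuffers;
    }

    /** Sum over every (store, partition) pair hosted by one application instance. */
    static long totalBytes(int stores, int partitionsPerStore,
                           long blockCacheBytes, long writeBufferBytes, int maxWriteBuffers) {
        long instances = (long) stores * partitionsPerStore;
        return instances * perInstanceBytes(blockCacheBytes, writeBufferBytes, maxWriteBuffers);
    }

    public static void main(String[] args) {
        int stores = 10;     // HYPOTHETICAL number of state stores
        int partitions = 4;  // HYPOTHETICAL partitions per store on this instance
        // Assumed defaults: 100 MB block cache, 32 MB write buffer, 3 write buffers.
        long defaults = totalBytes(stores, partitions, 100 * MIB, 32 * MIB, 3);
        // HYPOTHETICAL reduced values, e.g. applied through a RocksDBConfigSetter.
        long reduced = totalBytes(stores, partitions, 8 * MIB, 8 * MIB, 2);
        System.out.printf("defaults: %d MiB, reduced: %d MiB%n", defaults / MIB, reduced / MIB);
    }
}
```

With these hypothetical inputs it prints `defaults: 7840 MiB, reduced: 960 MiB`, which shows how quickly per-store defaults add up once many stores and partitions live in one JVM.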

Re: Initializing StateStores takes *really* long for large datasets

2016-11-28 Thread Frank Lyaruu
Here's an update on where I am now. I have about 40 'primary' topics (some small, some up to about 10M messages) and about 30 internal topics, divided over 6 stream instances, all running in a single app and talking to a 3-node Kafka cluster. I use a single thread per stream instance, as my ...

Re: Initializing StateStores takes *really* long for large datasets

2016-11-28 Thread Guozhang Wang
Hello Frank, how many instances do you have in your app, and how many threads do you use per instance? Note that besides the topology complexity (i.e. the number of state stores, internal topics, etc.), the (re-)initialization process depends on the underlying consumer's membership ...

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Frank Lyaruu
I'm running everything on a single node, so there is no 'data mobility' involved. So if Streams does not use any existing data, I might as well wipe the whole RocksDB store before starting, right? As for RocksDB tuning, I am using a RocksDBConfigSetter to reduce the memory usage a bit: ...

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Damian Guy
Hi Frank, if you ran the app before with the same applicationId, shut it down completely, and then restarted it, it will need to restore all of the state, which can take a while depending on how much data you have. In this case the placement of the partitions doesn't take into ...
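Damian's point that restore time grows with the amount of state can be turned into a quick estimate: state size divided by sustained restore throughput. This is a sketch assuming restoration runs at a constant, throughput-bound rate. The 200 GB figure comes from Frank's description; the 50 MiB/s rate is hypothetical and depends on disks (Svante's spinning-vs-SSD question) and network.

```java
// Back-of-the-envelope restore time: state size divided by sustained restore throughput.
// ASSUMPTION: restoration is throughput-bound and proceeds at a constant rate; the
// throughput is a HYPOTHETICAL input you would measure on your own disks and network.
import java.time.Duration;

final class RestoreTimeEstimate {

    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /** Time to replay stateBytes at bytesPerSecond, rounded up to whole seconds. */
    static Duration restoreTime(long stateBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        long seconds = (stateBytes + bytesPerSecond - 1) / bytesPerSecond; // ceiling division
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        // 200 GiB of state as reported in the thread; 50 MiB/s is a HYPOTHETICAL rate.
        Duration estimate = restoreTime(200 * GIB, 50 * MIB);
        System.out.println(estimate.toMinutes() + " minutes (" + estimate.getSeconds() + " s)");
    }
}
```

With these inputs it prints `68 minutes (4096 s)`; halving the throughput doubles the estimate, which is why disk type matters for large stores.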

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Frank Lyaruu
@Damian: Yes, it ran before, and it has that 200 GB blob of RocksDB data. @Svante: It's on a pretty high-end SAN in a managed private cloud. I'm unsure what the underlying storage is, but I doubt there is a performance problem there. On Fri, 25 Nov 2016 at 13:37, Svante Karlsson ...

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Svante Karlsson
What kind of disk are you using for the RocksDB store, i.e. spinning or SSD? 2016-11-25 12:51 GMT+01:00 Damian Guy: > Hi Frank, > Is this on a restart of the application? > Thanks, > Damian > On Fri, 25 Nov 2016 at 11:09 Frank Lyaruu wrote: >> Hi ...

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Damian Guy
Hi Frank, is this on a restart of the application? Thanks, Damian. On Fri, 25 Nov 2016 at 11:09 Frank Lyaruu wrote: > Hi y'all, > I have a reasonably simple Kafka Streams application that merges about 20 topics a few times. > The thing is, some of those topic datasets ...

Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Frank Lyaruu
Hi y'all, I have a reasonably simple Kafka Streams application that merges about 20 topics a few times. The thing is, some of those topic datasets are pretty big, about 10M messages. In total I've got about 200 GB of state in RocksDB; the largest topic is 38 GB. I had set the ...