Re: Kafka windowed table not aggregating correctly

2016-11-25 Thread Sachin Mittal
Hi, I fixed that sorted set issue but I am facing a weird problem which I am not able to replicate. Here is the sample problem that I could isolate: My class is like this: public static class Message implements Comparable { public long ts; public String message; public

Re: Kafka consumers are not equally distributed

2016-11-25 Thread Guozhang Wang
You can take a look at this FAQ wiki: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified ? And even if you are using the new Java producer, if you specify the key and key distribution is not even, then it will

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Frank Lyaruu
I'm running everything on a single node, so there is no 'data mobility' involved. So if Streams does not use any existing data, I might as well wipe the whole RocksDB before starting, right? As for the RocksDB tuning, I am using a RocksDBConfigSetter to reduce the memory usage a bit:

Re: no luck with kafka-connect on secure cluster

2016-11-25 Thread Koert Kuipers
Well, it seems that if you run Connect in distributed mode, it's again security.protocol=SASL_PLAINTEXT and not producer.security.protocol=SASL_PLAINTEXT. Don't ask me why. On Thu, Nov 24, 2016 at 10:40 PM, Koert Kuipers wrote: > for anyone that runs into this. turns out i also had

Re: A strange controller log in Kafka 0.9.0.1

2016-11-25 Thread Json Tu
Thanks, Guozhang. If it's convenient, can we discuss it in the JIRA issue https://issues.apache.org/jira/browse/KAFKA-4447 ? I guess somebody else may also encounter this problem. > On Nov 25, 2016, at 12:31 PM, Guozhang Wang wrote: > >

RE: Kafka consumers are not equally distributed

2016-11-25 Thread Ghosh, Achintya (Contractor)
So what is the option to make the messages equally distributed from that point? I mean, is there any other option to speed up the consumers? Thanks Acintya -Original Message- From: Guozhang Wang [mailto:wangg...@gmail.com] Sent: Friday, November 25, 2016 12:09 PM To:

test

2016-11-25 Thread Samy CHBINOU
test

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Damian Guy
Hi Frank, If you have run the app before with the same applicationId, completely shut it down, and then restarted it again, it will need to restore all of the state which will take some time depending on the amount of data you have. In this case the placement of the partitions doesn't take into

RE: Kafka consumers are not equally distributed

2016-11-25 Thread Ghosh, Achintya (Contractor)
Thank you, Guozhang. Let me clarify: by "some of the partitions are sitting idle and some are overloaded", I mean we stopped the load after 9 hours because we saw that the messages were being processed very slowly. At that time we observed that some partitions had a lot of messages and some were sitting idle. So my

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Frank Lyaruu
@Damian: Yes, it ran before, and it has that 200 GB blob worth of RocksDB stuff. @Svante: It's on a pretty high-end SAN in a managed private cloud. I'm unsure what the ultimate storage is, but I doubt there is a performance problem there. On Fri, 25 Nov 2016 at 13:37, Svante Karlsson

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Svante Karlsson
What kind of disk are you using for the RocksDB store, i.e. spinning or SSD? 2016-11-25 12:51 GMT+01:00 Damian Guy : > Hi Frank, > > Is this on a restart of the application? > > Thanks, > Damian > > On Fri, 25 Nov 2016 at 11:09 Frank Lyaruu wrote: > > > Hi

Re: Kafka producer dropping records

2016-11-25 Thread Ismael Juma
Hi Varun, You could increase `retries`, but it seems like you have already configured it to be `100`. Another option is to increase `retry.backoff.ms`, which will increase the time between retries. Ismael On Fri, Nov 25, 2016 at 9:38 AM, Phadnis, Varun wrote: > Hello, > >

Re: Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Damian Guy
Hi Frank, Is this on a restart of the application? Thanks, Damian On Fri, 25 Nov 2016 at 11:09 Frank Lyaruu wrote: > Hi y'all, > > I have a reasonably simple KafkaStream application, which merges about 20 > topics a few times. > The thing is, some of those topic datasets

Re: Messages intermittently get lost

2016-11-25 Thread Zac Harvey
Hi Martin, My server.properties looks like this: listeners=PLAINTEXT://0.0.0.0:9092 advertised.host.name= broker.id=2 port=9092 num.partitions=4 zookeeper.connect=zkA:2181,zkB:2181,zkC:2181 num.network.threads=3 num.io.threads=8 socket.send.buffer.bytes=102400

Initializing StateStores takes *really* long for large datasets

2016-11-25 Thread Frank Lyaruu
Hi y'all, I have a reasonably simple Kafka Streams application, which merges about 20 topics a few times. The thing is, some of those topic datasets are pretty big, about 10M messages. In total I've got about 200 GB worth of state in RocksDB; the largest topic is 38 GB. I had set the

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-25 Thread Mikael Högqvist
Thanks, based on this we will re-evaluate the use of internal topics. The main motivation for using the internal changelog topics was to avoid duplication of data and have an easy way to access the update stream of any state store. Best, Mikael On Fri, Nov 25, 2016 at 9:52 AM Michael Noll

Re: Data (re)processing with Kafka (new wiki page)

2016-11-25 Thread saiprasad mishra
This page is really helpful. Thanks for putting this together. Some nice-to-have features could be (not sure if they belong on this wiki page): 1) Pause and resume without having to start and stop. It should drain all the in-flight calculations before doing the actual pause, and a notifier will be helpful that it is actually

Graceful shutdown on Windows when using procrun

2016-11-25 Thread Harald Kirsch
Hi all, we are using apache-daemon (aka procrun) to run the Kafka broker as a Windows service. This does not create a process with 'kafka.Kafka' in the name, so bin\windows\kafka-server-stop.bat does not work. Instead we use stop-service to shut down the service (=procrun=Kafka), but

RE: Kafka producer dropping records

2016-11-25 Thread Phadnis, Varun
Hello, Sorry for the late response. We tried logging the errors received in the callback, and the result is that we are facing TimeoutExceptions: org.apache.kafka.common.errors.TimeoutException: Batch containing 93 record(s) expired due to timeout while requesting metadata from brokers

Re: Data (re)processing with Kafka (new wiki page)

2016-11-25 Thread Michael Noll
Thanks a lot, Matthias! I have already begun to provide feedback. -Michael On Wed, Nov 23, 2016 at 11:41 PM, Matthias J. Sax wrote: > Hi, > > we added a new wiki page that is supposed to collect data (re)processing > scenario with Kafka: > >

kafka balance partition data across directory locations after partition created

2016-11-25 Thread Yuanjia
Hi all, In our Kafka cluster, we configure multiple directories using the "log.dirs" property. Kafka balances the partition data directories across these directory locations when a new topic is created. As the data size increases, some disks may fill up while others do not. So we need to move some

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-25 Thread Michael Noll
Mikael, > Sure, I guess the topic is auto-created the first time I start the topology > and the second time it's there already. It could be possible to create > topics up front for us, or even use an admin call from inside the code. Yes, that (i.e. you are running with auto-topic creation