kafka-streams 0.11.0.1 rocksdb bug?

2017-09-22 Thread Ara Ebrahimi
Hi, We just upgraded to kafka-streams 0.11.0.1 and noticed that in the cluster deployment reduce() never gets called. Funny thing is, it does get called in the unit tests. And no, it’s not a data issue. What I have noticed is that all rocksdb folders (1_0 and so on) are empty. I do not see

Re: session window bug not fixed in 0.10.2.1?

2017-05-08 Thread Ara Ebrahimi
issue still persists in trunk, hence there might be another issue that is not fixed in 2645. Could you help verify if that is the case? In which case we can re-open https://issues.apache.org/jira/browse/KAFKA-4851 and investigate further. Guozhang On Tue, May 2, 2017 at 1:02 PM, Ara Eb

Re: session window bug not fixed in 0.10.2.1?

2017-05-02 Thread Ara Ebrahimi
com/apache/kafka/pull/2645 has gone to both trunk and > 0.10.2.1, I just checked. What error are you seeing, could you give us an > update? > > Thanks > Eno > > On Fri, Apr 28, 2017 at 7:10 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> > wrote: > >> Hi, >>

session window bug not fixed in 0.10.2.1?

2017-04-28 Thread Ara Ebrahimi
o the 0.10.2 branch, so you could > build from source and not have to create the StateStoreSupplier. > > Thanks, > Damian > > On Mon, 27 Mar 2017 at 21:56 Ara Ebrahimi <ara.ebrah...@argyledata.com> > wrote: > > Thanks for the response Matthias! > > The reason we wan

Re: more uniform task assignment across kafka stream nodes

2017-03-28 Thread Ara Ebrahimi
ers@kafka.apache.org>> Created a JIRA: https://issues.apache.org/jira/browse/KAFKA-4969 -Matthias On 3/27/17 4:33 PM, Ara Ebrahimi wrote: Well, even with 4-5x better performance thanks to the session window fix, I expect to get ~10x better performance if I throw 10x more nodes at the pro

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
:users@kafka.apache.org>> Great! So overall, the issue is not related to task assignment. Also, the description below does not indicate that a different task assignment would change anything. -Matthias On 3/27/17 3:08 PM, Ara Ebrahimi wrote: Let me clarify, because I think we’re using different ter

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
hot" keys, so you might need to use a custom partitioner there, too.) Also, "100s of records" does not sound like much to me. Streams can process multiple hundreds of thousands of records per thread. That is the reason why I think that the fix Damian pointed out will most like
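For reference, the "custom partitioner" suggestion maps to the StreamPartitioner interface, which the 0.10.x DSL accepts in its through()/to() overloads. A minimal sketch, with the hot-key value purely hypothetical and the value type left generic:

    import org.apache.kafka.streams.processor.StreamPartitioner;

    // Spreads records for one known hot key over all partitions instead of hashing
    // them onto a single partition (note: this breaks per-key ordering for that key).
    public class HotKeyPartitioner implements StreamPartitioner<String, Object> {
        @Override
        public Integer partition(String key, Object value, int numPartitions) {
            if ("HOT-KEY".equals(key)) {                               // hypothetical hot key
                return (value.hashCode() & 0x7fffffff) % numPartitions; // salt by value
            }
            return (key.hashCode() & 0x7fffffff) % numPartitions;       // hash everything else by key
        }
    }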

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
> The fix has also been cherry-picked to the 0.10.2 branch, so you could > build from source and not have to create the StateStoreSupplier. > > Thanks, > Damian > > On Mon, 27 Mar 2017 at 21:56 Ara Ebrahimi <ara.ebrah...@argyledata.com> > wrote: > > Thanks for the r

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
of tasks are not distinguished? I would like to understand your requirement better -- it might be worth improving Streams here. -Matthias On 3/27/17 12:57 PM, Ara Ebrahimi wrote: Hi, So, I simplified the topology by making sure we have only 1 source topic. Now I have 1 source topi

Re: more uniform task assignment across kafka stream nodes

2017-03-27 Thread Ara Ebrahimi
Please share the rest of your topology code (without any UDFs / business logic). Otherwise, I cannot give further advice. -Matthias On 3/25/17 6:08 PM, Ara Ebrahimi wrote: Via: builder.stream("topic1"); builder.stream("topic2"); builder.stream("topic3"); These are

Re: more uniform task assignment across kafka stream nodes

2017-03-25 Thread Ara Ebrahimi
"topic1"); builder.stream("topic2"); builder.stream("topic3"); Both are handled differently with regard to creating tasks (partition to task assignment also depends on your downstream code though). If this does not help, can you maybe share the structure of processing? To

Re: more uniform task assignment across kafka stream nodes

2017-03-25 Thread Ara Ebrahimi
first place. Streams load-balances tasks over all running instances, and each instance should have the same number of tasks (and thus partitions) assigned. What is the overall assignment? Do you have standby tasks configured? What version do you use? -Matthias On 3/24/17 8:09 PM, Ara Ebrahimi wrote: Hi
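For reference, the standby-task and thread settings Matthias asks about are ordinary StreamsConfig properties; a minimal sketch with placeholder broker address, application id, and counts:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class StreamsProps {
        public static Properties build() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "argyle-streams");   // placeholder id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // placeholder broker
            props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);              // threads per instance
            props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);            // standby tasks (default 0)
            return props;
        }
    }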

Re: kafka streams locking issue in 0.10.2.0

2017-02-25 Thread Ara Ebrahimi
As the first exception suggests, you may need to increase the > max.poll.interval.ms > > Thanks, > Damian > > On Thu, 23 Feb 2017 at 18:26 Ara Ebrahimi <ara.ebrah...@argyledata.com> > wrote: > >> Hi, >> >> After upgrading to 0.10.20.0 I got this: >> >>

kafka streams locking issue in 0.10.2.0

2017-02-23 Thread Ara Ebrahimi
Hi, After upgrading to 0.10.2.0 I got this: Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was
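Damian's suggestion in the reply above maps to a single consumer setting; a minimal sketch of raising it in the Streams properties (broker, application id, and the 10-minute value are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class PollIntervalProps {
        public static Properties build() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "argyle-streams");   // placeholder id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // placeholder broker
            // Allow more time between poll() calls before the group coordinator
            // considers the member dead and triggers a rebalance.
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);     // illustrative value
            return props;
        }
    }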

KTable TTL

2017-02-13 Thread Ara Ebrahimi
Hi, I have a ktable and I want to keep entries in it only for the past 24 hours. How can I do that? I understand rocksdb has support for ttl. Should I set that? How? Should I use kafka-streams window functionality? Would it remove data from old windows? I want to do this because I’m seeing a
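One commonly suggested approach in the 0.10.x-era DSL is to replace the unwindowed KTable with a windowed aggregation and bound retention with until(); a minimal sketch assuming a simple count, with the stream type and store name as placeholders:

    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.kstream.Windowed;

    public class Last24Hours {
        // Counts per key over 24-hour windows; until() bounds how long old windows
        // (and their state store segments) are retained before being dropped.
        static KTable<Windowed<String>, Long> build(KStream<String, String> records) {
            long dayMs = TimeUnit.HOURS.toMillis(24);
            return records.groupByKey()
                          .count(TimeWindows.of(dayMs).until(dayMs), "counts-last-24h");
        }
    }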

Re: kafka-consumer-offset-checker complaining about NoNode for X in zk

2017-02-02 Thread Ara Ebrahimi
ffsets >> in kafka rather than zookeeper. >> You can use the --new-consumer option to check for kafka stored offsets. >> >> Best Jan >> >> >> On 01.02.2017 21:14, Ara Ebrahimi wrote: >>> Hi, >>> >>> For a subset of our topics we g

kafka-consumer-offset-checker complaining about NoNode for X in zk

2017-02-01 Thread Ara Ebrahimi
Hi, For a subset of our topics we get this error: $KAFKA_HOME/bin/kafka-consumer-offset-checker.sh --group argyle-streams --topic topic_name --zookeeper $ZOOKEEPERS [2017-02-01 12:08:56,115] WARN WARNING: ConsumerOffsetChecker is deprecated and will be dropped in releases following 0.9.0. Use

kafka-streams ktable recovery after rebalance crash

2017-02-01 Thread Ara Ebrahimi
Hi, My kafka-streams application crashed due to a rebalance event (seems like I need to increase max.poll.interval.ms even more!) and then when I restarted the app I noticed existing rocksdb files were gone and while the rest of the pipeline was processing the part dealing with ktable was

Re: kafka streams consumer partition assignment is uneven

2017-01-09 Thread Ara Ebrahimi
s is forced to let one of your three Streams > nodes process "more" topics/partitions than the other two nodes. > > -Michael > > > > On Mon, Jan 9, 2017 at 6:52 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> > wrote: > >> Hi, >> >> I have 3 ka

kafka streams consumer partition assignment is uneven

2017-01-09 Thread Ara Ebrahimi
Hi, I have 3 kafka brokers, each with 4 disks. I have 12 partitions. I have 3 kafka streams nodes. Each is configured to have 4 streaming threads. My topology is quite complex and I have 7 topics and lots of joins and states. What I have noticed is that each of the 3 kafka streams nodes gets

kafka streams passes org.apache.kafka.streams.kstream.internals.Change to my app!!

2016-12-08 Thread Ara Ebrahimi
Hi, Once in a while and quite randomly this happens, but it does happen every few hundred thousand messages: 2016-12-03 11:48:05 ERROR StreamThread:249 - stream-thread [StreamThread-4] Streams application error during processing: java.lang.ClassCastException:

Re: Initializing StateStores takes *really* long for large datasets

2016-11-30 Thread Ara Ebrahimi
+1 on this. Ara. > On Nov 30, 2016, at 5:18 AM, Mathieu Fenniak > wrote: > > I'd like to quickly reinforce Frank's opinion regarding the rocksdb memory > usage. I was also surprised by the amount of non-JVM-heap memory being > used and had to tune the 100 MB

Re: Hang while close() on KafkaStream

2016-11-17 Thread Ara Ebrahimi
This happens for me too. On 0.10.1.0. Seems like it just sits there waiting in the streamThread.join() call. Ara. > On Nov 17, 2016, at 6:02 PM, mordac2k wrote: > > Hello all, > > I have a Java application in which I use an instance of KafkaStreams that > is working

Re: kafka streams rocksdb tuning (or memory leak?)

2016-11-16 Thread Ara Ebrahimi
> confluent repository http://packages.confluent.io/maven/ > dependencies: org.apache.kafka:kafka-streams:0.10.1.0-cp2 and org.apache.kafka:kafka-clients:0.10.1.0-cp2 > > Thanks, > > Damian > > > On Wed, 16 Nov 2016 at

kafka streams rocksdb tuning (or memory leak?)

2016-11-16 Thread Ara Ebrahimi
Hi, I have a few KTables in my application. Some of them have unlimited windows. If I leave the application to run for a few hours, I see the java process consume more and more memory, way above the -Xmx limit. I understand this is due to the rocksdb native lib used by kafka streams. What I
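One knob commonly pointed to for this kind of off-heap growth is the RocksDB config hook; a minimal sketch, assuming the RocksDBConfigSetter interface from the 0.10.1-era API, registered via StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG (all sizes illustrative):

    import java.util.Map;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Options;

    // Shrinks the off-heap structures RocksDB allocates per store instance.
    public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {
        @Override
        public void setConfig(String storeName, Options options, Map<String, Object> configs) {
            BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
            tableConfig.setBlockCacheSize(16 * 1024 * 1024L);  // per-store block cache (illustrative)
            options.setTableFormatConfig(tableConfig);
            options.setWriteBufferSize(8 * 1024 * 1024L);      // smaller memtables
            options.setMaxWriteBufferNumber(2);                // fewer memtables held in memory
        }
    }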

Re: kafka streaming rocks db lock bug?

2016-10-25 Thread Ara Ebrahimi
way? If not, I would love to investigate this issue further with you. Guozhang On Sun, Oct 23, 2016 at 1:45 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com<mailto:ara.ebrah...@argyledata.com>> wrote: And then this on a different node: 2016-10-23 13:43:57 INFO StreamThread

Re: kafka streaming rocks db lock bug?

2016-10-23 Thread Ara Ebrahimi
.<init>(ProcessorStateManager.java:98) at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69) ... 13 more Ara. On Oct 23, 2016, at 1:24 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com<mailto:ara.ebrah...@argyledata.com>> wrote: Hi, This happens when I hammer our 5 kaf

kafka streaming rocks db lock bug?

2016-10-23 Thread Ara Ebrahimi
Hi, This happens when I hammer our 5 kafka streaming nodes (each with 4 streaming threads) hard enough for an hour or so: 2016-10-23 13:04:17 ERROR StreamThread:324 - stream-thread [StreamThread-2] Failed to flush state for StreamTask 3_8:

Re: micro-batching in kafka streams

2016-09-28 Thread Ara Ebrahimi
e store. We are going to write a blog post about step-by-step instructions to leverage this feature for use cases just like yours soon. Guozhang On Wed, Sep 28, 2016 at 2:19 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com<mailto:ara.ebrah...@argyledata.com>> wrote: I need this ReadO
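The feature Guozhang refers to here appears to be the queryable-state ("interactive queries") API available since 0.10.1; a minimal sketch of reading from a local store, with the store name and key/value types as placeholders:

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    public class StoreReader {
        // Reads the latest aggregated value for a key from a locally hosted state store.
        static Long lookup(KafkaStreams streams, String key) {
            ReadOnlyKeyValueStore<String, Long> store =
                streams.store("my-store", QueryableStoreTypes.<String, Long>keyValueStore());
            return store.get(key);
        }
    }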

Re: micro-batching in kafka streams

2016-09-28 Thread Ara Ebrahimi
ed to happen, if you are indeed using this feature I'd like to learn more of your error scenario. Guozhang On Tue, Sep 27, 2016 at 9:41 AM, Ara Ebrahimi <ara.ebrah...@argyledata.com<mailto:ara.ebrah...@argyledata.com>> wrote: One more thing: Guozhang pointed me towards this sample f

Re: micro-batching in kafka streams

2016-09-27 Thread Ara Ebrahimi
(or any other such KTable-producing operation) and such periodic triggers. Ara. On Sep 26, 2016, at 10:40 AM, Ara Ebrahimi <ara.ebrah...@argyledata.com<mailto:ara.ebrah...@argyledata.com>> wrote: Hi, So, here’s the situation: - for classic batching of writes to external systems,

Re: micro-batching in kafka streams

2016-09-26 Thread Ara Ebrahimi
> micro-batching behind the scenes should (at least in an ideal world) be of > no concern to the user. > > -Michael > > > > > > On Mon, Sep 5, 2016 at 9:10 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> > wrote: > >> Hi, >> >> What’s the best way

Re: micro-batching in kafka streams

2016-09-12 Thread Ara Ebrahimi
licit "trigger" mechanism. This > is not exactly the same as micro-batching, but it also acts to reduce IO > costs as well as data traffic: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams > > > Let me know if the

Re: Performance issue with KafkaStreams

2016-09-10 Thread Ara Ebrahimi
Hi Eno, Could you elaborate more on tuning Kafka Streams applications? What are the relationships between partitions, num.stream.threads, num.consumer.fetchers, and other such parameters? On a single-node setup with x partitions, what’s the best way to make sure these partitions are

Re: enhancing KStream DSL

2016-09-09 Thread Ara Ebrahimi
llRecord) -> "DATA".equalsIgnoreCase(callRecord.getCallCommType()), (imsi, callRecord) -> !(callRecord.getCallCommType().equalsIgnoreCase("VOICE") || callRecord.getCallCommType().equalsIgnoreCase("DATA"))

enhancing KStream DSL

2016-09-08 Thread Ara Ebrahimi
Let’s say I have this: KStream[] branches = allRecords.branch( (imsi, callRecord) -> "VOICE".equalsIgnoreCase(callRecord.getCallCommType()), (imsi, callRecord) -> "DATA".equalsIgnoreCase(callRecord.getCallCommType()), (imsi,
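For readability, here is the full branch() call as a sketch; the catch-all third predicate is the one quoted back in the reply above, and the poster's CallRecord accessor is stubbed so the example compiles on its own:

    import org.apache.kafka.streams.kstream.KStream;

    public class BranchExample {
        // Minimal stand-in for the poster's CallRecord type.
        interface CallRecord { String getCallCommType(); }

        static void wire(KStream<String, CallRecord> allRecords) {
            KStream<String, CallRecord>[] branches = allRecords.branch(
                (imsi, callRecord) -> "VOICE".equalsIgnoreCase(callRecord.getCallCommType()),
                (imsi, callRecord) -> "DATA".equalsIgnoreCase(callRecord.getCallCommType()),
                (imsi, callRecord) -> !(callRecord.getCallCommType().equalsIgnoreCase("VOICE")
                                     || callRecord.getCallCommType().equalsIgnoreCase("DATA")));
            KStream<String, CallRecord> voice = branches[0];  // index order matches predicate order
            KStream<String, CallRecord> data  = branches[1];
            KStream<String, CallRecord> other = branches[2];
        }
    }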

micro-batching in kafka streams

2016-09-05 Thread Ara Ebrahimi
Hi, What’s the best way to do micro-batching in Kafka Streams? Any plans for a built-in mechanism? Perhaps StateStore could act as the buffer? What exactly are ProcessorContext.schedule()/punctuate() for? They don’t seem to be used anywhere?
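As a rough answer to the schedule()/punctuate() question: in the 0.10.x Processor API, schedule() registers a punctuation interval and punctuate() is the periodic callback, which is one way to build a micro-batch buffer. A minimal sketch, with the actual flush target left as a placeholder comment:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;

    // Buffers records in process() and flushes them as one batch in punctuate().
    public class MicroBatchProcessor implements Processor<String, String> {
        private ProcessorContext context;
        private final List<String> buffer = new ArrayList<>();

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
            context.schedule(30_000L);          // request punctuate() roughly every 30s of stream time
        }

        @Override
        public void process(String key, String value) {
            buffer.add(value);                  // accumulate instead of writing out per record
        }

        @Override
        public void punctuate(long timestamp) {
            // flush 'buffer' to the external system here (placeholder), then:
            buffer.clear();
            context.commit();                   // ask Streams to commit after a successful flush
        }

        @Override
        public void close() { }
    }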

kafka streams: join 2 Windowed tables

2016-09-01 Thread Ara Ebrahimi
Hi, Is joining 2 streams based on Windowed keys supposed to work? I have 2 KTables: - KTable events: I process events and aggregate events that share a common criterion, using aggregateByKey and UnlimitedWindows as the window (for now) - KTable hourlyStats: I calculate some
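For clarity, the setup being asked about looks roughly like the sketch below: both KTables are keyed by Windowed<String>, and a KTable-KTable join only produces output when key and window line up exactly. The value types are simplified to String and the joiner is hypothetical:

    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Windowed;

    public class WindowedJoin {
        // Joins two windowed aggregates that share the same Windowed<String> key.
        static KTable<Windowed<String>, String> join(
                KTable<Windowed<String>, String> events,
                KTable<Windowed<String>, String> hourlyStats) {
            return events.join(hourlyStats,
                (eventAgg, stats) -> eventAgg + " / " + stats);  // hypothetical value joiner
        }
    }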