Kafka Streams vs Spark Streaming

2017-02-24 Thread Tianji Li
Hi there, Can anyone give a good explanation in what cases Kafka Streams is preferred, and in what cases Sparking Streaming is better? Thanks Tianji

Is there a list of companies using Kafka Streams?

2017-02-24 Thread Tianji Li
Just curious... Thanks Tianji

Re: Immutable Record with Kafka Stream

2017-02-24 Thread Matthias J. Sax
First, I want to mention that you do no see "duplicate" -- you see late updates. Kafka Streams embraces "change" and there is no such thing as a final aggregate, but each agg output record is an update/refinement of the result. Strict filtering of "late updates" is hard in Kafka Streams If you

Re: Immutable Record with Kafka Stream

2017-02-24 Thread Jozef.koval
Hi Kohki, Kafka streams windows use so called "segments" internally and their retention time cannot be lower than some minimum. Your configuration is set to less than this minimum, therefore is not accepted. Even Windows#until javadoc specifies it: * Set the window maintain duration

Re: [ANNOUNCE] Apache Kafka 0.10.2.0 Released

2017-02-24 Thread Guozhang Wang
Hello folks, Some people have reported that the java doc was missing for this release. We have updated the web site including the java docs for 0102 just now. Thanks, Guozhang On Wed, Feb 22, 2017 at 2:56 PM, James Cheng wrote: > Woohoo! Thanks for running the release,

Re: Overriding JAAS config for connector

2017-02-24 Thread Ismael Juma
I suggest filing a JIRA with the details to reproduce and the stacktrace. The way it works for the consumer/producer is that the string config gets converted into a Password instance during the parsing stage. Seems like this is not happening in the connect case for some reason. Ismael On 24 Feb

Re: Immutable Record with Kafka Stream

2017-02-24 Thread Kohki Nishio
Guozhang, thanks for the reply, but I'm having trouble understanding, here's the statement from the document Windowing operations are available in the Kafka Streams DSL > , > where users can specify a

Creating topic partitions using python

2017-02-24 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All, I want to create kafka partitions for topic automatically using pykafka or kafka-python. Please do suggest me any solution through which I can do this using python. Thank You! Regards, Vivek Mishra

Re: Immutable Record with Kafka Stream

2017-02-24 Thread Guozhang Wang
Hi Kohki, Note that Streams execute operations based on the "timestamp" of the record, i.e. in your case it is the "event time" not the processing time. When you received 00:00:00,metric,2 After the long pause, it is considered as a "late arrived record" which happens at 00:00:00 but received

Re: Immutable Record with Kafka Stream

2017-02-24 Thread Kohki Nishio
Thanks for the info, however there's an alarming functionality, duplicate message is a tricky thing to manage.. I thought 'retention-period' could work for that purpose, however here's the result My TimeWindow is TimeWindows.of(6).until(6), And here's the input 00:00:00,metric,1

Re: Overriding JAAS config for connector

2017-02-24 Thread Stephen Durfey
Yes, if I set jaas.sasl.config in my connector config it results in a ClassCastException thrown by JaasUtils when it retrieves that key from the config map. It's doing an explicit cast to Password, but I he value type is String at that point.  On Fri, Feb 24, 2017 at 11:30 AM -0600,

Re: Scaling up kafka consumers

2017-02-24 Thread Gerrit Jansen van Vuuren
The kafka fast connector handles this differently than the standard kafka client (which requires one consumer per partition at most), by breaking offsets into consumable ranges which allows one partition to be read by multiple conumers where each consumer uniquely receives a different offset

Re: Overriding JAAS config for connector

2017-02-24 Thread Ismael Juma
Hi Stephen, Did you get an error when you set this as a String? It should work fine. Ismael On Thu, Feb 23, 2017 at 8:09 PM, Stephen Durfey wrote: > Now that 0.10.2.0 is out, I was looking forward to checking out the > inclusion of KIP-85 >

Re: Scaling up kafka consumers

2017-02-24 Thread Ian Wrigley
Hi If you have two consumers in your consumer group, but only one partition in the topic, then only one consumer will do any work. It’s not the case that “those two nodes start competing for messages” — one node will read from the partition, the other will have nothing to do. So to scale up by

Re: Immutable Record with Kafka Stream

2017-02-24 Thread Eno Thereska
Hi Kohki, As you mentioned, this is expected behavior. However, if you are willing to tolerate some more latency, you can improve the chance that a message with the same key is overwritten by increasing the commit time. By default it is 30 seconds, but you can increase it:

Re: Scaling up kafka consumers

2017-02-24 Thread Pradeep Gollakota
A single partition can be consumed by at most a single consumer. Consumers compete to take ownership of a partition. So, in order to gain parallelism you need to add more partitions. There is a library that allows multiple consumers to consume from a single partition

Scaling up kafka consumers

2017-02-24 Thread Jakub Stransky
Hello everyone, I was reading/checking kafka documentation regarding point-2-point and publish subscribe communications patterns in kafka and I am wondering how to scale up consumer side in point to point scenario when consuming from single kafka topic. Let say I have a single topic with single

Immutable Record with Kafka Stream

2017-02-24 Thread Kohki Nishio
Hello Kafka experts I'm trying to do windowed aggregation with Kafka Stream, however I'm getting multiple messages for the same time window, I know this is an expected behavior, however I really want to have a single message for given time window. my test code looks like below

Re: kafka streams locking issue in 0.10.20.0

2017-02-24 Thread Damian Guy
Hi Ara, This usually means that one, or more, of your StreamThreads is taking a long time during recovery or processing. So the rebalance times out and you get another rebalance. The second exception, i.e., 2017-02-22 20:12:48 WARN StreamThread:1184 - Could not create task 3_0. Will retry.