Re: Which is True? Kafka site vs Confluent 3.2 site upgrade doc details contradiction regarding 0.10.2 clients backward compatible to resp. 0.10.0 vs 0.10.1?

2017-04-01 Thread Matthias J. Sax
The reason the Streams API in 0.10.2 is not backward compatible with
0.10.0 brokers is not related to the Producer/Consumer API. The Streams
API (as of 0.10.2) uses a new AdminClient (cf. KIP-4) for topic
management that is not supported by 0.10.0 brokers. Topics on 0.10.0
brokers can be managed only via the ZK client -- and the ZK client was
removed from the Streams API in 0.10.2.

-Matthias
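
To make that concrete, here is a minimal hedged sketch of a 0.10.2 Streams
app (topic names, store name, and broker address are placeholders). The
count() makes the topology stateful, so on startup Streams must create an
internal changelog topic -- the step that needs the broker-side topic
management from KIP-4, which 0.10.1+ brokers support and 0.10.0 brokers
do not:

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "compat-sketch");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

KStreamBuilder builder = new KStreamBuilder();
// The stateful count() needs a state store plus its internal changelog
// topic, which Streams creates for us on startup.
builder.stream("input-topic")
       .groupByKey()
       .count("counts-store");

new KafkaStreams(builder, config).start();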

On 4/1/17 12:27 AM, Hans Jespersen wrote:
> They are both true. The Apache text is talking about the compatibility of the 
> Producer/Consumer API and the Confluent text is talking about the Streams API.
> 
> -hans
> 
>> On Mar 31, 2017, at 11:46 PM, Roger Vandusen 
>>  wrote:
>>
>> Read below and answer: So which is the source of truth?
>> Is 0.10.2.0 compatible with 0.10.0 or 0.10.1?
>>
>> Which site needs correction?
>>
>>
>> From current Kafka docs.
>> Statement from Kafka site:
>> https://kafka.apache.org/documentation/#upgrade
>>
>> Starting with version 0.10.2, Java clients (producer and consumer) have 
>> acquired the ability to communicate with older brokers.
>> Version 0.10.2 clients can talk to version 0.10.0 or newer brokers.
>> However, if your brokers are older than 0.10.0, you must upgrade all the 
>> brokers in the Kafka cluster before upgrading your clients.
>> Version 0.10.2 brokers support 0.8.x and newer clients. Before 0.10.2, Kafka 
>> is backward compatible,
>> which means that clients from Kafka 0.8.x releases (CP 1.0.x) will work with 
>> brokers from Kafka 0.9.x, 0.10.0, 0.10.1 and 0.10.2
>> releases (CP 2.0.x, 3.0.x, 3.1.x and 3.2.x), but not vice-versa.
>> This means you always need to plan upgrades such that all brokers are 
>> upgraded before clients.
>>
>>
>> Hmm... did some more reading and research on the Confluent site and found
>> this, which seems to contradict the above statement:
>> http://docs.confluent.io/3.2.0/streams/upgrade-guide.html
>>
>>
>> Upgrading from CP 3.1.x (Kafka 0.10.1.x-cp2) to CP 3.2.0 (Kafka 0.10.2.0-cp1)
>> Compatibility
>> Kafka Streams applications built with CP 3.2.0 (Kafka 0.10.2.0-cp1) are 
>> forward and backward compatible with certain Kafka clusters.
>>
>> Compatibility Matrix (Streams API rows vs. Kafka Broker columns):
>>
>>                     3.0.x /      3.1.x /      3.2.0 /
>>                     0.10.0.x     0.10.1.x     0.10.2.0
>> 3.0.x / 0.10.0.x    compatible   compatible   compatible
>> 3.1.x / 0.10.1.x                 compatible   compatible
>> 3.2.0 / 0.10.2.0                 compatible   compatible
>>
>>
>> EMPHASIS ON CONTRADICTION BELOW from site description:
>>
>> Backward-compatible to CP 3.1.x clusters (Kafka 0.10.1.x-cp2):
>> This is the first release allowing to upgrade your Kafka Streams application 
>> without a broker upgrade.
>> New Kafka Streams applications built with CP 3.2.0 (Kafka 0.10.2.x-cp1) will 
>> work with older Kafka clusters running CP 3.1.x (Kafka 0.10.1.x-cp2).
>> Kafka clusters running CP 3.0.x (Kafka 0.10.0.x-cp1) are not compatible with 
>> new CP 3.2.0 Kafka Streams applications though.
>>
>> -Roger
>>





Re: Difference between with and without state store cleanup streams startup

2017-04-01 Thread Matthias J. Sax
Yes. That's correct.

-Matthias

On 3/31/17 9:32 PM, Sachin Mittal wrote:
> Hi,
> Ok, so basically what I understand is that there is no global offset
> maintained for the changelog topic at the broker level.
> Every local state store maintains its offset in a local checkpoint file.
> 
> And in order to make sure a state store rebuilds or builds its state
> faster by reading from the changelog topic, we need to ensure that
> changelog topics are compacted efficiently.
> 
> I hope these assumptions are correct.
> 
> Thanks
> Sachin
> 
> 
> On Sat, Apr 1, 2017 at 4:51 AM, Matthias J. Sax 
> wrote:
> 
>> 1. The whole log will be read.
>>
>> 2. It will read all the key-value pairs. However, the store will contain
>> only the latest record for each key, after state recovery finished.
>>
>> For both (1) and (2): note that changelog topics are compacted; thus,
>> recovery will not read everything since you started your app, as log
>> compaction removes old records that got "overwritten" by newer
>> records.
>>
>> 3. The instance will hit the timeout and you will get an exception.
>>
>> 4. and 5. It depends. On a clean shutdown, Streams writes a checkpoint
>> file and thus knows the state of the store. Therefore, it does not need
>> to read the changelog to recover the store. On unclean shutdown (like
>> kill -9) the checkpoint file will be missing, and thus Streams will wipe
>> out the state and recreate it from scratch (thus, you don't need to call
>> .cleanup() manually)
>>
>> 6.
>> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java#L99
>>
>> called here:
>> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L1294
>>
>>
>> -Matthias
>>
>> On 3/31/17 11:46 AM, Sachin Mittal wrote:
>>> Hi,
>>> There are two ways to restart a streams application:
>>> 1. executing streams.cleanUp() before streams.start()
>>> This cleans up the local state store.
>>>
>>> 2. Just by calling streams.start()
>>>
>>> What are the differences between the two?
>>>
>>> As I understand it, in the first case it will try to create the local
>>> state store by replaying the changelog topic.
>>>
>>> So questions here are
>>> 1. Will it try to replay the whole log from the earliest offset or from
>>> the last committed offset?
>>> 2. Will it read and fetch all the values from the topic for a given key,
>>> or only the last value for a key, when creating a state store for that
>>> changelog topic?
>>> 3. What happens if the time to create the state store is greater than
>>> max.poll.interval.ms?
>>>
>>> 4. If we don't delete the state store then what happens? Does it again
>>> try to recreate the same store by reading the entire changelog topic? Or
>>> does it somehow determine from the state store what the latest offset is
>>> and update the store with values from that offset onwards?
>>>
>>> 5. In case of unclean shutdown, say by kill -9, is it advisable to clean
>>> up the local state store before restart, or can we just start it
>>> normally?
>>>
>>> 6. Finally, can anyone point me to the code where it creates the state
>>> store by reading from the changelog topic?
>>>
>>> Thanks
>>> Sachin
>>>
>>
>>
> 
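
For reference, the two restart paths side by side -- a minimal hedged
sketch, assuming builder and config are set up as in any Streams app, and
with wipeLocalState as a hypothetical flag:

KafkaStreams streams = new KafkaStreams(builder, config);

if (wipeLocalState) {
    // (1) Deletes this application's local state directory; on start(),
    // stores are rebuilt by replaying the compacted changelog topics.
    streams.cleanUp(); // must be called before start(), never while running
}

// (2) Plain start: after a clean shutdown the checkpoint file records the
// stores' changelog offsets, so no replay is needed; after an unclean
// shutdown (kill -9) the checkpoint is missing and Streams wipes and
// rebuilds the state by itself.
streams.start();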





Re: How to increase network throughput of Kafka cluster?

2017-04-01 Thread Hans Jespersen
Then you will need even more parallel producers to saturate a 10 GigE network
(if you don't hit your disk I/O limit first).

-hans

> On Apr 1, 2017, at 3:15 PM, Archie  wrote:
> 
> My replication factor is 1.
> 
> Thanks,
> Archie
> 
>> On Sat, Apr 1, 2017 at 3:51 PM, Hans Jespersen  wrote:
>> 
>> What replication factor are you using? If you have a default replication
>> factor = 3 then a publish rate of 1.4 Gbps is actually 1.4 Gbps *3 = 4.2
>> Gbps of network traffic. If you are also consuming at the same time then
>> it’s actually 4.2 Gbps + 1.4 Gbps = 5.6 Gbps.
>> 
>> You would completely saturate the network if you added a second producer
>> and consumer at those rates (if your storage system can keep up to the
>> network bandwidth).
>> 
>> -hans
>> 
>> 
>> 
>>> On Apr 1, 2017, at 10:25 AM, Archie  wrote:
>>> 
>>> I have set up my kafka cluster in a network with 9.3 Gbps links. And am
>>> using the bin/kafka-producer-perf-test.sh script to test the throughput
>>> performance of kafka for a topic which has 1 partition.
>>> 
>>> Right now I am only able to use 1.4 Gbps of network link, i.e. the script
>>> returns a throughput of 1.4 Gbps (179 MBps)
>>> 
>>> My basic question is how can I increase the throughput of kafka to
>>> completely saturate the network?
>>> 
>>> Thanks,
>>> Archie
>> 
>> 


Re: How to increase network throughput of Kafka cluster?

2017-04-01 Thread Archie
My replication factor is 1.

Thanks,
Archie

On Sat, Apr 1, 2017 at 3:51 PM, Hans Jespersen  wrote:

> What replication factor are you using? If you have a default replication
> factor = 3 then a publish rate of 1.4 Gbps is actually 1.4 Gbps *3 = 4.2
> Gbps of network traffic. If you are also consuming at the same time then
> it’s actually 4.2 Gbps + 1.4 Gbps = 5.6 Gbps.
>
> You would completely saturate the network if you added a second producer
> and consumer at those rates (if your storage system can keep up to the
> network bandwidth).
>
> -hans
>
>
>
> > On Apr 1, 2017, at 10:25 AM, Archie  wrote:
> >
> > I have set up my kafka cluster in a network with 9.3 Gbps links. And am
> > using the bin/kafka-producer-perf-test.sh script to test the throughput
> > performance of kafka for a topic which has 1 partition.
> >
> > Right now I am only able to use 1.4 Gbps of network link, i.e. the script
> > returns a throughput of 1.4 Gbps (179 MBps)
> >
> > My basic question is how can I increase the throughput of kafka to
> > completely saturate the network?
> >
> > Thanks,
> > Archie
>
>


kafka connection problem.

2017-04-01 Thread Angshuman Chatterjee
Hi!
   For the last few days I have been trying to connect Spark and Kafka using
direct streaming, but I am not able to do so. I am facing the following
problems:
   http://stackoverflow.com/q/42991780/2315294
   http://stackoverflow.com/q/43050654/2315294

So, please tell me what else I can do?

Thanks.
Angshuman


Re: How to increase network throughput of Kafka cluster?

2017-04-01 Thread Hans Jespersen
What replication factor are you using? If you have a default replication factor
= 3 then a publish rate of 1.4 Gbps is actually 1.4 Gbps * 3 = 4.2 Gbps of
network traffic. If you are also consuming at the same time then it’s actually
4.2 Gbps + 1.4 Gbps = 5.6 Gbps.

You would completely saturate the network if you added a second producer and
consumer at those rates (if your storage system can keep up with the network
bandwidth).

-hans



> On Apr 1, 2017, at 10:25 AM, Archie  wrote:
> 
> I have set up my kafka cluster in a network with 9.3 Gbps links. And am
> using the bin/kafka-producer-perf-test.sh script to test the throughput
> performance of kafka for a topic which has 1 partition.
> 
> Right now I am only able to use 1.4 Gbps of network link, i.e. the script
> returns a throughput of 1.4 Gbps (179 MBps)
> 
> My basic question is how can I increase the throughput of kafka to
> completely saturate the network?
> 
> Thanks,
> Archie



Re: Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Tianji Li
Perfect, thanks so much Eno!

On Sat, Apr 1, 2017 at 12:44 PM, Eno Thereska 
wrote:

> Sure, here is an example of pausing:
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L165
>
> And resuming:
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L215
>
> Cheers
> Eno
> > On 1 Apr 2017, at 17:14, Tianji Li  wrote:
> >
> > Hi Eno,
> >
> > Could you point to me where in code this is happening please?
> >
> > Thanks
> > Tianji
> >
> > On Sat, Apr 1, 2017 at 11:45 AM, Eno Thereska 
> > wrote:
> >
> >> Tianji,
> >>
> >> You shouldn’t have to worry about pausing and resuming the consumer,
> since
> >> that happens internally automatically.
> >>
> >> Eno
> >>
> >>> On Apr 1, 2017, at 3:26 PM, Tianji Li  wrote:
> >>>
> >>> Hi there,
> >>>
> >>> Say a processor that is consuming topic A and producing into topic B,
> and
> >>> somehow the processing takes long time, is it possible to pause the
> >>> consuming from topic A, and later on resume?
> >>>
> >>> Or does it make sense to do so? If not, what are the options to resolve
> >>> this issue?
> >>>
> >>> Thanks
> >>> Tianji
> >>
> >>
>
>


Re: How to increase network throughput of Kafka cluster?

2017-04-01 Thread Manikumar
Producer performance also depends on disk I/O. You can try tuning the
batch.size and compression.type configs. Also, add more partitions for more
throughput.

Useful presentation:
https://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600
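
A hedged example of where those knobs live on the producer -- the values
are illustrative starting points, not recommendations, and linger.ms is a
closely related setting worth tuning alongside batch.size:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092"); // placeholder
props.put("batch.size", "262144");     // bigger batches amortize request overhead
props.put("linger.ms", "10");          // wait briefly so batches actually fill
props.put("compression.type", "lz4");  // fewer bytes per record on the wire
props.put("key.serializer",
    "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("value.serializer",
    "org.apache.kafka.common.serialization.ByteArraySerializer");
Producer<byte[], byte[]> producer = new KafkaProducer<>(props);

The same settings should also be passable to kafka-producer-perf-test.sh
(via its --producer-props argument, if I remember the flag correctly).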

On Sat, Apr 1, 2017 at 10:55 PM, Archie  wrote:

> I have set up my kafka cluster in a network with 9.3 Gbps links. And am
> using the bin/kafka-producer-perf-test.sh script to test the throughput
> performance of kafka for a topic which has 1 partition.
>
> Right now I am only able to use 1.4 Gbps of network link, i.e. the script
> returns a throughput of 1.4 Gbps (179 MBps)
>
> My basic question is how can I increase the throughput of kafka to
> completely saturate the network?
>
> Thanks,
> Archie
>


How to increase network throughput of Kafka cluster?

2017-04-01 Thread Archie
I have set up my Kafka cluster in a network with 9.3 Gbps links, and am
using the bin/kafka-producer-perf-test.sh script to test the throughput
performance of Kafka for a topic which has 1 partition.

Right now I am only able to use 1.4 Gbps of the network link, i.e. the
script returns a throughput of 1.4 Gbps (179 MBps).

My basic question is: how can I increase the throughput of Kafka to
completely saturate the network?

Thanks,
Archie


Re: Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Eno Thereska
Sure, here is an example of pausing:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L165


And resuming:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L215


Cheers
Eno
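
For completeness: if you ever do need manual control, the plain consumer
exposes it directly. A hedged sketch, assuming a subscribed
KafkaConsumer<String, String> named consumer, with process() and
processingBackedUp() standing in for your work and backpressure signal:

consumer.subscribe(Collections.singletonList("topic-a")); // placeholder
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    process(records);
    if (processingBackedUp()) {
        // Stop fetching, but keep calling poll() so the coordinator still
        // sees this member as alive and triggers no rebalance.
        consumer.pause(consumer.assignment());
    } else {
        consumer.resume(consumer.paused());
    }
}

This is essentially what Streams does internally at the lines linked above.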
> On 1 Apr 2017, at 17:14, Tianji Li  wrote:
> 
> Hi Eno,
> 
> Could you point to me where in code this is happening please?
> 
> Thanks
> Tianji
> 
> On Sat, Apr 1, 2017 at 11:45 AM, Eno Thereska 
> wrote:
> 
>> Tianji,
>> 
>> You shouldn’t have to worry about pausing and resuming the consumer, since
>> that happens internally automatically.
>> 
>> Eno
>> 
>>> On Apr 1, 2017, at 3:26 PM, Tianji Li  wrote:
>>> 
>>> Hi there,
>>> 
>>> Say a processor that is consuming topic A and producing into topic B, and
>>> somehow the processing takes long time, is it possible to pause the
>>> consuming from topic A, and later on resume?
>>> 
>>> Or does it make sense to do so? If not, what are the options to resolve
>>> this issue?
>>> 
>>> Thanks
>>> Tianji
>> 
>> 



Re: Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Tianji Li
Hi Eno,

Could you point to me where in code this is happening please?

Thanks
Tianji

On Sat, Apr 1, 2017 at 11:45 AM, Eno Thereska 
wrote:

> Tianji,
>
> You shouldn’t have to worry about pausing and resuming the consumer, since
> that happens internally automatically.
>
> Eno
>
> > On Apr 1, 2017, at 3:26 PM, Tianji Li  wrote:
> >
> > Hi there,
> >
> > Say a processor that is consuming topic A and producing into topic B, and
> > somehow the processing takes long time, is it possible to pause the
> > consuming from topic A, and later on resume?
> >
> > Or does it make sense to do so? If not, what are the options to resolve
> > this issue?
> >
> > Thanks
> > Tianji
>
>


Re: Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Eno Thereska
Tianji,

You shouldn’t have to worry about pausing and resuming the consumer, since that 
happens internally automatically. 

Eno

> On Apr 1, 2017, at 3:26 PM, Tianji Li  wrote:
> 
> Hi there,
> 
> Say a processor that is consuming topic A and producing into topic B, and
> somehow the processing takes long time, is it possible to pause the
> consuming from topic A, and later on resume?
> 
> Or does it make sense to do so? If not, what are the options to resolve
> this issue?
> 
> Thanks
> Tianji



Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Tianji Li
Hi there,

Say a processor that is consuming topic A and producing into topic B, and
somehow the processing takes long time, is it possible to pause the
consuming from topic A, and later on resume?

Or does it make sense to do so? If not, what are the options to resolve
this issue?

Thanks
Tianji


How many levels can Message Set nest recursively?

2017-04-01 Thread Yang Cui
I am thinking about the following:
   1. If a producer compresses a message set that is nested more than two
levels deep and sends it to the broker, how does the broker know which
offsets should be allocated to this message set without decompressing all
the levels and reading all the records?
   2. Assuming there is a message set that compresses just 10 records
(messages) with no deeper nesting (only one level): when a consumer
receives this message set, the consumer thinks it got 10 messages, but it
observes that the offsets of all 10 items are the same value -- right?



Re: Which is True? Kafka site vs Confluent 3.2 site upgrade doc details contradiction regarding 0.10.2 clients backward compatible to resp. 0.10.0 vs 0.10.1?

2017-04-01 Thread Hans Jespersen
They are both true. The Apache text is talking about the compatibility of the 
Producer/Consumer API and the Confluent text is talking about the Streams API.

-hans
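
To spell out the distinction: a plain producer or consumer needs only the
wire protocol the broker already speaks, so per the Apache text a client
like the hedged sketch below, built against 0.10.2, works with 0.10.0 or
newer brokers (broker address and topic are placeholders). A 0.10.2
Streams application additionally needs broker-side topic management that a
0.10.0 broker lacks:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092");
props.put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("my-topic", "key", "value"));
}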

> On Mar 31, 2017, at 11:46 PM, Roger Vandusen 
>  wrote:
> 
> Read below and answer: So which is the source of truth?
> Is 0.10.2.0 compatible with 0.10.0 or 0.10.1?
> 
> Which site needs correction?
> 
> 
> From current Kafka docs.
> Statement from Kafka site:
> https://kafka.apache.org/documentation/#upgrade
> 
> Starting with version 0.10.2, Java clients (producer and consumer) have 
> acquired the ability to communicate with older brokers.
> Version 0.10.2 clients can talk to version 0.10.0 or newer brokers.
> However, if your brokers are older than 0.10.0, you must upgrade all the 
> brokers in the Kafka cluster before upgrading your clients.
> Version 0.10.2 brokers support 0.8.x and newer clients. Before 0.10.2, Kafka 
> is backward compatible,
> which means that clients from Kafka 0.8.x releases (CP 1.0.x) will work with 
> brokers from Kafka 0.9.x, 0.10.0, 0.10.1 and 0.10.2
> releases (CP 2.0.x, 3.0.x, 3.1.x and 3.2.x), but not vice-versa.
> This means you always need to plan upgrades such that all brokers are 
> upgraded before clients.
> 
> 
> Hmm... did some more reading and research on the Confluent site and found
> this, which seems to contradict the above statement:
> http://docs.confluent.io/3.2.0/streams/upgrade-guide.html
> 
> 
> Upgrading from CP 3.1.x (Kafka 0.10.1.x-cp2) to CP 3.2.0 (Kafka 0.10.2.0-cp1)
> Compatibility
> Kafka Streams applications built with CP 3.2.0 (Kafka 0.10.2.0-cp1) are 
> forward and backward compatible with certain Kafka clusters.
> 
> Compatibility Matrix (Streams API rows vs. Kafka Broker columns):
> 
>                     3.0.x /      3.1.x /      3.2.0 /
>                     0.10.0.x     0.10.1.x     0.10.2.0
> 3.0.x / 0.10.0.x    compatible   compatible   compatible
> 3.1.x / 0.10.1.x                 compatible   compatible
> 3.2.0 / 0.10.2.0                 compatible   compatible
> 
> 
> EMPHASIS ON CONTRADICTION BELOW from site description:
> 
> Backward-compatible to CP 3.1.x clusters (Kafka 0.10.1.x-cp2):
> This is the first release allowing to upgrade your Kafka Streams application 
> without a broker upgrade.
> New Kafka Streams applications built with CP 3.2.0 (Kafka 0.10.2.x-cp1) will 
> work with older Kafka clusters running CP 3.1.x (Kafka 0.10.1.x-cp2).
> Kafka clusters running CP 3.0.x (Kafka 0.10.0.x-cp1) are not compatible with 
> new CP 3.2.0 Kafka Streams applications though.
> 
> -Roger
>