Re: Mirrormaker between 0.8.2.1 cluster and 0.10 cluster

2016-07-29 Thread Gwen Shapira
You need to use the old mirrormaker (0.8.2.1) to mirror 0.8.2.1 to 0.10.0.0.

This is true in general - always run the MirrorMaker from the older release,
because newer Kafka brokers can talk to old clients, but not the other way around.
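For example, a rough sketch of running the 0.8.2.1 MirrorMaker against the two
clusters (hostnames, file names and the topic whitelist below are placeholders):

    # sourceConsumer.properties -- old consumer, points at the 0.8.2.1 source cluster
    zookeeper.connect=source-zk-1:2181
    group.id=mm-0821-to-010

    # targetProducer.properties -- old producer, points at the 0.10.0.0 target brokers
    metadata.broker.list=target-broker-1:9092,target-broker-2:9092

    # run MirrorMaker from the 0.8.2.1 distribution
    bin/kafka-run-class.sh kafka.tools.MirrorMaker \
      --consumer.config sourceConsumer.properties \
      --producer.config targetProducer.properties \
      --whitelist '.*'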

Gwen

On Fri, Jul 29, 2016 at 12:04 AM, Yifan Ying  wrote:
> Hi all,
>
> I am trying to use the MirrorMaker on the 0.10 cluster to mirror the
> 0.8.2.1 cluster into the 0.10 cluster. I then got a bunch of consumer errors as
> follows:
>
>  Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@f9533ee
> (kafka.consumer.ConsumerFetcherThread)
>
> java.nio.BufferUnderflowException
>
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:145)
>
> at java.nio.ByteBuffer.get(ByteBuffer.java:694)
>
> at kafka.api.ApiUtils$.readShortString(ApiUtils.scala:40)
>
> at kafka.api.TopicData$.readFrom(FetchResponse.scala:96)
>
> at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:170)
>
> at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:169)
>
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>
> at scala.collection.immutable.Range.foreach(Range.scala:160)
>
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>
> at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
>
> at kafka.api.FetchResponse$.readFrom(FetchResponse.scala:169)
>
> at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:135)
>
> at
> kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:108)
>
> at
> kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:29)
>
> at
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
>
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
>
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>
>
> Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@5d1339d8
>
> java.lang.IllegalArgumentException
>
> at java.nio.Buffer.limit(Buffer.java:267)
>
> at kafka.api.FetchResponsePartitionData$.readFrom(FetchResponse.scala:38)
>
> at kafka.api.TopicData$$anonfun$1.apply(FetchResponse.scala:100)
>
> at kafka.api.TopicData$$anonfun$1.apply(FetchResponse.scala:98)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>
> at scala.collection.immutable.Range.foreach(Range.scala:141)
>
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>
> at kafka.api.TopicData$.readFrom(FetchResponse.scala:98)
>
> at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:170)
>
> at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:169)
>
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>
> at
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>
> at scala.collection.immutable.Range.foreach(Range.scala:141)
>
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>
> at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
>
> at kafka.api.FetchResponse$.readFrom(FetchResponse.scala:169)
>
> at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:135)
>
> at
> kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:108)
>
> at
> kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:29)
>
> at
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)
>
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)
>
> Is there any compatibility issue when using the new mirrormaker to mirror a
> 0.8.2.1 cluster into a 0.10 cluster?
>
> Thanks.
>
>
> --
> Yifan


Re: Chocolatey packages for ZooKeeper, Kafka?

2016-07-29 Thread Gwen Shapira
If anyone packages Kafka with Chocolatey, we'll be happy to add this
to our ecosystem page.

Currently Apache Kafka only publishes tarballs.

Gwen

On Thu, Jul 28, 2016 at 6:58 PM, Andrew Pennebaker
 wrote:
> Could we please publish Chocolatey packages for ZooKeeper and Kafka, to
> make it easier for newbies to get started?
>
> https://chocolatey.org/
>
> --
> Cheers,
> Andrew


Re: Kafka 0.9.0.1 failing on new leader election

2016-07-29 Thread Gwen Shapira
you know, I ran into those null pointer exceptions when I accidentally
tested Kafka with a mismatched version of zkclient.

Can you share the versions of both? And make sure you have only one
zkclient on your classpath?
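For example (paths are illustrative), a quick way to check what's actually on the
classpath:

    # on the broker: which zkclient jar ships in the distribution
    ls libs/ | grep -i zkclient

    # in a Maven-built application: make sure only one zkclient version is pulled in
    mvn dependency:tree | grep -i zkclient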

On Tue, Jul 26, 2016 at 6:40 AM, Sean Morris (semorris)
 wrote:
> I have a setup with 2 brokers and it is going through leader re-election but
> seems to fail to complete. The behavior I start to see is that some publishes
> succeed but others fail with NotLeader exceptions like this:
>
>
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.
>
> at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56)
>
> at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43)
>
> at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
>
>
> My Kafka and zookeeper log file has errors like this
>
>
> [2016-07-26 02:01:12,842] ERROR 
> [kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send 
> metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  notify-eportal-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  psirts-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  notify-pushNotif-low-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)),
>  1 -> Map(eox-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  notify-eportal-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  psirts-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  notify-pushNotif-low-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)))
>
> [2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] 
> [Controller 1]: Forcing the controller to resign
>
>
> Which is then followed by a null pointer exception
>
>
> [2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error 
> handling event ZkEvent[Children of /isr_change_notification changed sent to 
> kafka.controller.IsrChangeNotificationListener@55ca3750]
>
> java.lang.IllegalStateException: java.lang.NullPointerException
>
> at 
> kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434)
>
> at 
> kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029)
>
> at 
> kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372)
>
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359)
>
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)
>
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)
>
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>
> at 
> kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352)
>
> at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>
> Caused by: java.lang.NullPointerException
>
> at 
> kafka.controller.KafkaController.sendRequest(KafkaController.scala:699)
>
> at 
> kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403)
>
> at 
> kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369)
>
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>
> at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>
> at 
> kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:369)
>
> ... 9 more
>
>
> I eventually restarted zookeeper and my brokers. This has happened twice in
> the last week. Any ideas?

Re: Too Many Open Files

2016-07-29 Thread Gwen Shapira
woah, it looks like you have 15,000 replicas per broker?

You can go into the directory you configured for kafka's log.dir and
see how many files you have there. Depending on your segment size and
retention policy, you could have hundreds of files per partition
there...

Make sure you have at least that many file handles and then also add
handles for the client connections.

1 million file handles sounds like a lot, but you are running lots of
partitions per broker...

We normally don't see more than maybe 4000 per broker and most
clusters have a lot fewer, so consider adding brokers and spreading
partitions around a bit.

Gwen

On Fri, Jul 29, 2016 at 12:00 PM, Kessiler Rodrigues
 wrote:
> Hi guys,
>
> I have been experiencing some issues on Kafka, where it's throwing too many
> open files.
>
> I have around 6k topics with 5 partitions each.
>
> My cluster was made with 6 brokers. All of them are running Ubuntu 16 and the 
> file limits settings are:
>
> `cat  /proc/sys/fs/file-max`
> 200
>
>  `ulimit -n`
> 100
>
> Has anyone experienced this before?


Re: Verify log compaction

2016-07-29 Thread John Holland
Check the log-cleaner.log file on the server.  When the thread runs you'll
see output for every partition it compacts and the compaction ratio it
achieved.

The __consumer_offsets topic is compacted; I see log output from it being
compacted frequently.

Depending on your settings for the topic it may take a while for it to
compact.  Compaction doesn't occur on the current log segment.  Look at
these settings for the topic, "segment.bytes" and "segment.ms".  Lower them
to force quicker compaction.
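For example, something along these lines (topic name and values are made up; on
0.9 topic-level overrides are changed with kafka-configs.sh):

    # shrink segments so the cleaner has non-active segments to compact sooner
    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name my-compacted-topic \
      --add-config segment.ms=600000,segment.bytes=104857600

    # the topic itself must be configured with
    cleanup.policy=compact

    # and the brokers need (server.properties)
    log.cleaner.enable=true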

On 0.8.2.2, occasionally, the compaction thread would die and then the
__consumer_offsets topic would grow out of control.  Kafka logs the thread
death in the log-cleaner.log.


On Fri, Jul 29, 2016 at 4:10 PM David Yu  wrote:

> Hi,
>
> We are using Kafka 0.9.0.0. One of our topics is set to use log compaction.
> We have also set log.cleaner.enable. However, we suspect that the topic
> is not being compacted.
>
> What is the best way for us to verify the compaction is happening?
>
> Thanks,
> David
>


Verify log compaction

2016-07-29 Thread David Yu
Hi,

We are using Kafka 0.9.0.0. One of our topics is set to use log compaction.
We have also set log.cleaner.enable. However, we suspect that the topic
is not being compacted.

What is the best way for us to verify the compaction is happening?

Thanks,
David


Too Many Open Files

2016-07-29 Thread Kessiler Rodrigues
Hi guys,

I have been experiencing some issues on Kafka, where it's throwing too many open
files.

I have around 6k topics with 5 partitions each.

My cluster was made with 6 brokers. All of them are running Ubuntu 16 and the 
file limits settings are:

`cat  /proc/sys/fs/file-max`
200

 `ulimit -n`
100

Has anyone experienced this before?

Re: compression ratio

2016-07-29 Thread Ian Wrigley
I believe so, yes.

---
Ian Wrigley
Director, Education Services
Confluent, Inc

> On Jul 29, 2016, at 7:51 PM, Tauzell, Dave  
> wrote:
> 
> A compression ratio of .5 means we are getting about 2x compression?
> 
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
> Connect with us: Twitter I LinkedIn I Facebook I YouTube
> 
> 
> -Original Message-
> From: Tauzell, Dave [mailto:dave.tauz...@surescripts.com] 
> Sent: Friday, July 29, 2016 10:09 AM
> To: users@kafka.apache.org
> Subject: RE: compression ratio
> 
> Thanks.  That's just tracked on the client, right?
> 
> -Dave
> 
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
> Connect with us: Twitter I LinkedIn I Facebook I YouTube
> 
> 
> -Original Message-
> From: Ian Wrigley [mailto:i...@confluent.io] 
> Sent: Friday, July 29, 2016 9:27 AM
> To: users@kafka.apache.org
> Subject: Re: compression ratio
> 
> Hi Dave
> 
> The JMX metric compression-rate-avg should give you that info.
> 
> Regards
> 
> Ian.
> 
> ---
> Ian Wrigley
> Director, Education Services
> Confluent, Inc
>> On Jul 29, 2016, at 2:58 PM, Tauzell, Dave  
>> wrote:
>> 
>> Is there a good way to see what sort of compression ratio is being achieved?
>> 
>> -Dave
>> 
>> Dave Tauzell | Senior Software Engineer | Surescripts
>> O: 651.855.3042 | www.surescripts.com |   
>> dave.tauz...@surescripts.com
>> Connect with us: Twitter I 
>> LinkedIn I 
>> Facebook I 
>> YouTube
>> 
>> 
> 


RE: compression ratio

2016-07-29 Thread Tauzell, Dave
A compression ratio of .5 means we are getting about 2x compression?

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Tauzell, Dave [mailto:dave.tauz...@surescripts.com] 
Sent: Friday, July 29, 2016 10:09 AM
To: users@kafka.apache.org
Subject: RE: compression ratio

Thanks.  That's just tracked on the client, right?

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Ian Wrigley [mailto:i...@confluent.io] 
Sent: Friday, July 29, 2016 9:27 AM
To: users@kafka.apache.org
Subject: Re: compression ratio

Hi Dave

The JMX metric compression-rate-avg should give you that info.

Regards

Ian.

---
Ian Wrigley
Director, Education Services
Confluent, Inc
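If you prefer reading it from code rather than via JMX, the same value is exposed
through the producer's metrics() map; a minimal sketch (the producer instance and
its compression.type setting are assumed to exist already):

    import java.util.Map;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;

    // Prints the average compression rate, e.g. 0.5 means compressed batches
    // are roughly half the size of the original payload (about 2x compression).
    public static void printCompressionRate(KafkaProducer<?, ?> producer) {
        for (Map.Entry<MetricName, ? extends Metric> e : producer.metrics().entrySet()) {
            if ("compression-rate-avg".equals(e.getKey().name())) {
                System.out.println(e.getKey().group() + " / " + e.getKey().name()
                        + " = " + e.getValue().value());
            }
        }
    }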
> On Jul 29, 2016, at 2:58 PM, Tauzell, Dave  
> wrote:
> 
> Is there a good way to see what sort of compression ratio is being achieved?
> 
> -Dave
> 
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   
> dave.tauz...@surescripts.com
> Connect with us: Twitter I 
> LinkedIn I 
> Facebook I 
> YouTube
> 
> 



Re: Same partition number of different Kafka topcs

2016-07-29 Thread Jack Huang
Hi Gerard,

After further digging, I found that the clients we are using also have
different partitioners. The Python one uses murmur2 (
https://github.com/dpkp/kafka-python/blob/master/kafka/partitioner/default.py),
and the NodeJS one uses its own impl (
https://github.com/SOHU-Co/kafka-node/blob/master/lib/partitioner.js). Does
Kafka delegate the task of partitioning to client? From their documentation
it doesn't seem like they provide an option to select the "default Kafka
partitioner".

Thanks,
Jack


On Fri, Jul 29, 2016 at 7:42 AM, Gerard Klijs 
wrote:

> The default partitioner will take the key, make the hash from it, and do a
> modulo operation to determine the partition it goes to. Some things which
> might cause it to end up different for different topics:
> - the partition counts are not the same (you already checked)
> - the key is not exactly the same, for example one might have a space after the
> id
> - the other topic is configured to use another partitioner
> - the serialiser for the key is different for both topics, since the hash
> is created based on the bytes of the serialised key
> - all the topics use another partitioner (for example round robin)
>
> On Thu, Jul 28, 2016 at 9:11 PM Jack Huang  wrote:
>
> > Hi all,
> >
> > I have an application where I need to join events from two different
> > topics. Every event is identified by an id, which is used as the key for
> > the topic partition. After doing some experiment, I observed that events
> > will go into different partitions even if the number of partitions for
> both
> > topics are the same. I can't find any documentation on this point though.
> > Does anyone know if this is indeed the case?
> >
> >
> > Thanks,
> > Jack
> >
>


Re: Kafka 0.9.0.1 failing on new leader election

2016-07-29 Thread Sean Morris (semorris)
Yes. This is happening after several days of processing data, not on initial
startup.

Thanks.




On 7/29/16, 11:54 AM, "David Garcia"  wrote:

>Well, just a dumb question, but did you include all the brokers in your client 
>connection properties?
>
>On 7/29/16, 10:48 AM, "Sean Morris (semorris)"  wrote:
>
>Anyone have any ideas?
>
>From: semorris >
>Date: Tuesday, July 26, 2016 at 9:40 AM
>To: "users@kafka.apache.org" 
> >
>Subject: Kafka 0.9.0.1 failing on new leader election
>
>    I have a setup with 2 brokers and it is going through leader re-election
> but seems to fail to complete. The behavior I start to see is that some
> publishes succeed but others fail with NotLeader exceptions like this:
>
>
>java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
> not the leader for that topic-partition.
>
>at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56)
>
>at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43)
>
>at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
>
>
>My Kafka and zookeeper log file has errors like this
>
>
>[2016-07-26 02:01:12,842] ERROR 
> [kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send 
> metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  notify-eportal-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  psirts-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  notify-pushNotif-low-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)),
>  1 -> Map(eox-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  notify-eportal-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  psirts-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
>  notify-pushNotif-low-1 -> 
> (LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)))
>
>[2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] 
> [Controller 1]: Forcing the controller to resign
>
>
>Which is then followed by a null pointer exception
>
>
>[2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error 
> handling event ZkEvent[Children of /isr_change_notification changed sent to 
> kafka.controller.IsrChangeNotificationListener@55ca3750]
>
>java.lang.IllegalStateException: java.lang.NullPointerException
>
>at 
> kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434)
>
>at 
> kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029)
>
>at 
> kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372)
>
>at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359)
>
>at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)
>
>at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)
>
>at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>
>at 
> kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352)
>
>at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>
>at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>
>Caused by: java.lang.NullPointerException
>
>at 
> kafka.controller.KafkaController.sendRequest(KafkaController.scala:699)
>
>at 
> kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403)
>
>at 
> kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369)
>
>at 
> 

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-29 Thread Michael Noll
Michael,

> Guozhang, in (2) above did you mean "some keys *may be* hashed to different
> partitions and the existing local state stores will not be valid?"
> That fits with our understanding.

Yes, that's what Guozhang meant.

Corrected version:

When you increase the number of input partitions and hence number of
processors / local stores, however, some keys may be hashed to
different partitions and the existing local state stores will not be
valid
in this case. [...]


Hope this helps,
Michael
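(As a practical aside on the over-partitioning recommendation Guozhang makes
further down in this thread: you can simply create the input topic with more
partitions than you currently need. A made-up example, with hypothetical topic
name and counts:

    bin/kafka-topics.sh --zookeeper localhost:2181 --create \
      --topic click-events --partitions 48 --replication-factor 3

48 partitions can be shared by, say, 4 Streams instances today and 16 instances
later without repartitioning the topic or invalidating local state.)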



On Wed, Jul 20, 2016 at 11:13 PM, Michael Ruberry 
wrote:

> Thank you both for your replies. This is incredibly helpful.
>
> Guozhang, in (2) above did you mean "some keys* may be* hashed to different
> partitions and the existing local state stores will not be valid?" That
> fits with our understanding.
>
> As to your caveats in (3) and (4), we are trying to be sure that our
> datastore will be "loaded" properly before we begin processing. Would you
> say the promise when we request a store value for a key given in
> process(key, ...) is that we get the most up date value for that key? Is
> this promise true if we restart the app or create a new app consuming the
> same local store? I believe that's the case but want to double check now.
>
> On Wed, Jul 20, 2016 at 1:14 PM, Guozhang Wang  wrote:
>
> > Hi Michael,
> >
> > 1. Kafka Streams always tries to colocate the local stores with the
> > processing nodes based on the partition key. For example, if you want to
> do
> > an aggregation based on key K1, but the input topic is not keyed on K1
> and
> > hence not partitioned on that. The library then will auto-repartition
> into
> > an intermediate topic based on K1 to make sure that the local stores used
> > for storing the aggregates based on K1 will be co-located with the
> > processor that gets partitions hashed on K1 as well.
> >
> > 2. When you increase the number of input partitions and hence number of
> > processors / local stores, however, some keys may not be hashed to
> > different partitions and the existing local state stores will not be
> valid
> > in this case. In practice, we recommend users to over-partition their
> input
> > topics (note that multiple partitions can be assigned to the same
> > processor) so that when they increase the number of streams instances
> > later, they do not need to add more partitions.
> >
> > 3. If you change your code, again the existing state stores may not be
> > valid (note colocating is still guaranteed) anymore depending on how you
> > modified the computational logic. In this case either you treat the new
> > code as a new application with a different application id so that
> > everything can be restarted from scratch, or you can "wipe out" the
> > existing invalid processing state, for which we have provided some tools
> > for this purpose and are writing a tutorial about how to do
> "re-processing"
> > in general.
> >
> > 4. About bootstrapping, currently Kafka Streams does not have a
> "bootstrap
> > stream" concept so that it can be processed completely before processing
> > other streams. Instead, we are currently relying on using the record
> > timestamp to "synchronize streams" (similar to the message chooser
> > functionality in Samza) and you can find more details here:
> >
> >
> >
> http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps
> >
> > And we are currently working on having finer flow control mechanisms as
> > well:
> >
> > https://issues.apache.org/jira/browse/KAFKA-3478
> >
> >
> > Hope these help.
> >
> > Guozhang
> >
> >
> >
> > On Wed, Jul 20, 2016 at 12:36 PM, Eno Thereska 
> > wrote:
> >
> > > Hi Michael,
> > >
> > > These are good questions and I can confirm that the system works in the
> > > way you hope it works, if you use the DSL and don't make up keys
> > > arbitrarily. In other words, there is nothing currently that prevents
> you
> > > from shooting yourself in the foot e.g., by making up keys and using
> the
> > > same made-up key in different processing nodes.
> > >
> > > However, if you use the Kafka Streams primitives, then such bad
> > situations
> > > are not supposed to happen.
> > >
> > > Thanks
> > > Eno
> > >
> > > > On 18 Jul 2016, at 11:28, Michael Ruberry 
> > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > My company, Taboola, has been looking at Kafka Streams and we are
> still
> > > > confused about some details of partitions and store persistence.
> > > >
> > > > So let's say we have an app consuming a single topic and we're using
> an
> > > > embedded and persisted key-value store:
> > > >
> > > >
> > > >   1. If we restart the app, will each node finish loading values from
> > the
> > > >   store before it begins processing? (Analogous to Samza's boot-strap
> > > >   streams?)
> > > >   2. How are are the store's entries partitioned among the nodes? Is
> > > there
> > > > 

Re: [kafka-clients] [VOTE] 0.10.0.1 RC0

2016-07-29 Thread Dana Powers
+1

tested against kafka-python integration test suite = pass.

Aside: as the scope of kafka gets bigger, it may be useful to organize
release notes into functional groups like core, brokers, clients,
kafka-streams, etc. I've found this useful when organizing
kafka-python release notes.

-Dana

On Fri, Jul 29, 2016 at 7:46 AM, Ismael Juma  wrote:
> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for the release of Apache Kafka 0.10.0.1. This
> is a bug fix release and it includes fixes and improvements from 50 JIRAs
> (including a few critical bugs). See the release notes for more details:
>
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, 1 August, 8am PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging
>
> * Javadoc:
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/javadoc/
>
> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=0c2322c2cf7ab7909cfd8b834d1d2fffc34db109
>
> * Documentation:
> http://kafka.apache.org/0100/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0100/protocol.html
>
> * Successful Jenkins builds for the 0.10.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-0.10.0-jdk7/170/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.0/130/
>
> Thanks,
> Ismael
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAD5tkZYz8fbLAodpqKg5eRiCsm4ze9QK3ufTz3Q4U%3DGs0CRb1A%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.


Re: Kafka 0.9.0.1 failing on new leader election

2016-07-29 Thread David Garcia
Well, just a dumb question, but did you include all the brokers in your client 
connection properties?

On 7/29/16, 10:48 AM, "Sean Morris (semorris)"  wrote:

Anyone have any ideas?

From: semorris >
Date: Tuesday, July 26, 2016 at 9:40 AM
To: "users@kafka.apache.org" 
>
Subject: Kafka 0.9.0.1 failing on new leader election

I have a setup with 2 brokers and it is going through leader re-election
but seems to fail to complete. The behavior I start to see is that some
publishes succeed but others fail with NotLeader exceptions like this:


java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.

at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56)

at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43)

at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)


My Kafka and zookeeper log file has errors like this


[2016-07-26 02:01:12,842] ERROR 
[kafka.controller.ControllerBrokerRequestBatch] Haven't been able to send 
metadata update requests, current state of the map is Map(2 -> Map(eox-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 notify-eportal-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 psirts-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 notify-pushNotif-low-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)),
 1 -> Map(eox-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 notify-eportal-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 psirts-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 notify-pushNotif-low-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)))

[2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] 
[Controller 1]: Forcing the controller to resign


Which is then followed by a null pointer exception


[2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error 
handling event ZkEvent[Children of /isr_change_notification changed sent to 
kafka.controller.IsrChangeNotificationListener@55ca3750]

java.lang.IllegalStateException: java.lang.NullPointerException

at 
kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434)

at 
kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029)

at 
kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372)

at 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359)

at 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)

at 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)

at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)

at 
kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352)

at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)

at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

Caused by: java.lang.NullPointerException

at 
kafka.controller.KafkaController.sendRequest(KafkaController.scala:699)

at 
kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403)

at 
kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369)

at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)

at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)

at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)

at 

Re: Kafka 0.9.0.1 failing on new leader election

2016-07-29 Thread Sean Morris (semorris)
Anyone have any ideas?

From: semorris >
Date: Tuesday, July 26, 2016 at 9:40 AM
To: "users@kafka.apache.org" 
>
Subject: Kafka 0.9.0.1 failing on new leader election

I have a setup with 2 brokers and it is going through leader re-election but
seems to fail to complete. The behavior I start to see is that some publishes
succeed but others fail with NotLeader exceptions like this:


java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.

at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:56)

at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:43)

at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)


My Kafka and zookeeper log file has errors like this


[2016-07-26 02:01:12,842] ERROR [kafka.controller.ControllerBrokerRequestBatch] 
Haven't been able to send metadata update requests, current state of the map is 
Map(2 -> Map(eox-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 notify-eportal-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 psirts-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 notify-pushNotif-low-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)),
 1 -> Map(eox-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:46,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 notify-eportal-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 psirts-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1),
 notify-pushNotif-low-1 -> 
(LeaderAndIsrInfo:(Leader:2,ISR:2,1,LeaderEpoch:51,ControllerEpoch:34),ReplicationFactor:2),AllReplicas:2,1)))

[2016-07-26 02:01:12,845] ERROR [kafka.controller.KafkaController] [Controller 
1]: Forcing the controller to resign


Which is then followed by a null pointer exception


[2016-07-26 02:01:13,021] ERROR [org.I0Itec.zkclient.ZkEventThread] Error 
handling event ZkEvent[Children of /isr_change_notification changed sent to 
kafka.controller.IsrChangeNotificationListener@55ca3750]

java.lang.IllegalStateException: java.lang.NullPointerException

at 
kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:434)

at 
kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029)

at 
kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372)

at 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359)

at 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)

at 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)

at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)

at 
kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352)

at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)

at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)

Caused by: java.lang.NullPointerException

at 
kafka.controller.KafkaController.sendRequest(KafkaController.scala:699)

at 
kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:403)

at 
kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:369)

at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)

at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)

at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)

at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)

at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)

at 
kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:369)

... 9 more


I eventually restarted zookeeper and my brokers. This has happened twice in the 
last week. Any ideas?


Thanks,

Sean



RE: Kafka streams Issue

2016-07-29 Thread Hamza HACHANI
Thanks, I will try that.


Hamza


From: Tauzell, Dave
Sent: Friday, July 29, 2016 03:18:47
To: users@kafka.apache.org
Subject: RE: Kafka streams Issue

Let's say you currently have:

Procesing App---> OUTPUT TOPIC ---> output consumer

You would ideally like the processing app to only write to the output topic 
every minute, but cannot easily do this.  So what you might be able to do is:


Processing App ---> INTERMEDIATE OUTPUT TOPIC ---> Coalesce Process --->
OUTPUT TOPIC

The Coalesce Process is an application that does something like:

Bucket = new list()
Consumer = createConsumer()
While ( message = Consumer.next() ) {
    Window = calculate current window
    If message is after Window:
        Send Bucket to OUTPUT TOPIC
        Bucket = new list containing message
    Else:
        Add message to Bucket
}

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   
dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Hamza HACHANI [mailto:hamza.hach...@supcom.tn]
Sent: Friday, July 29, 2016 9:53 AM
To: users@kafka.apache.org
Subject: RE: Kafka streams Issue

Hi Dave,

Could you explain your idea a little bit more?
I can't figure out what you are suggesting.
Thank you

-Hamza

From: Tauzell, Dave  Sent: Friday, July 29,
2016 02:39:53  To: users@kafka.apache.org  Subject: RE: Kafka streams Issue

You could send the message immediately to an intermediary topic.  Then have a 
consumer of that topic that pull messages off and waits until the minute is up.

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   
dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Hamza HACHANI [mailto:hamza.hach...@supcom.tn]
Sent: Friday, July 29, 2016 9:36 AM
To: users@kafka.apache.org
Subject: Kafka streams Issue

> Good morning,
>
> I'm an ICT student at TELECOM BRETAGNE (a French school).
> I did follow your presentations on YouTube and I found them really
> interesting.
> I'm trying to do some things with Kafka, and I have now been blocked for
> about 3 days.
> I'm trying to control the time at which my processing application sends
> data to the output topic.
> What I'm trying to do is to make the application process data from the
> input topic all the time but send the messages only at the end of a
> minute/an hour/a month (the notion of windowing).
> For the moment, what I managed to do is that the application, instead of
> sending data only at the end of the minute, sends it any time it receives
> it from the input topic.
> Do you have any suggestions to help me?
> I would be really grateful.


Preliminary answer for now:

> For the moment, what I managed to do is that the application, instead of
> sending data only at the end of the minute, sends it any time it receives it
> from the input topic.

This is actually the expected behavior at the moment.

The main reason for this behavior is that, in stream processing, we never know 
whether there is still late-arriving data to be received.  For example, imagine 
you have 1-minute windows based on event-time.  Here, it may happen that, after 
the first 1 minute window has passed, another record arrives five minutes later 
but, according to the record's event-time, it should have still been part of 
the first 1-minute window.  In this case, what we typically want to happen is 
that the first 1-minute window will be updated/reprocessed with the late-arriving
record included.  In other words, just because 1 minute has passed (= the 
1-minute window is "done") it does not mean that actually all the data for that 
time interval has been processed already -- so sending only a single update 
after 1 minute has passed would even produce incorrect results in many cases.  
For this reason you currently see a downstream update anytime there is a new 
incoming data record ("send it anytime it does receive it from the input 
topic").  So the point here is to ensure correctness of processing.

That said, one known drawback of the current behavior is that users haven't 
been able to control (read: decrease/reduce) the rate/volume of the resulting 
downstream updates.  For example, if you have an input topic with a rate of 1 
million msg/s (which is easy for Kafka), some users want to aggregate/window 
results primarily to reduce the input rate to a lower numbers (e.g. 1 thousand 
msg/s) so that the data can be fed from Kafka to other systems that might not 
scale as well as Kafka.  To help these use cases we will have a new 
configuration parameter in the next major version of Kafka that allows you to 
control the rate/volume of downstream updates.
Here, the point is to help users optimize resource usage rather than
correctness of processing.  This new parameter should also help you with your
use case.  But even this new parameter is not based on strict time behavior or
time windows.

RE: Kafka streams Issue

2016-07-29 Thread Tauzell, Dave
Let's say you currently have:

Procesing App---> OUTPUT TOPIC ---> output consumer

You would ideally like the processing app to only write to the output topic 
every minute, but cannot easily do this.  So what you might be able to do is:


Processing App ---> INTERMEDIATE OUTPUT TOPIC ---> Coalesce Process --->
OUTPUT TOPIC

The Coalesce Process is an application that does something like:

Bucket = new list()
Consumer = createConsumer()
While ( message = Consumer.next() ) {
    Window = calculate current window
    If message is after Window:
        Send Bucket to OUTPUT TOPIC
        Bucket = new list containing message
    Else:
        Add message to Bucket
}
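A rough Java sketch of that coalescing consumer (topic names, the window size and
the serializers below are made up; error handling and shutdown are omitted):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CoalesceProcess {
        public static void main(String[] args) {
            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092");
            c.put("group.id", "coalesce-process");
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c);
            consumer.subscribe(Collections.singletonList("intermediate-output-topic"));

            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(p);

            long windowMs = 60_000L;  // one-minute buckets, wall-clock time
            long currentWindow = System.currentTimeMillis() / windowMs;
            List<ConsumerRecord<String, String>> bucket = new ArrayList<>();

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                long window = System.currentTimeMillis() / windowMs;
                if (window != currentWindow) {
                    // the minute is up: forward the whole bucket to the real output topic
                    for (ConsumerRecord<String, String> r : bucket) {
                        producer.send(new ProducerRecord<>("output-topic", r.key(), r.value()));
                    }
                    bucket.clear();
                    currentWindow = window;
                }
                for (ConsumerRecord<String, String> r : records) {
                    bucket.add(r);  // keep collecting until the current window closes
                }
            }
        }
    }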

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Hamza HACHANI [mailto:hamza.hach...@supcom.tn] 
Sent: Friday, July 29, 2016 9:53 AM
To: users@kafka.apache.org
Subject: RE: Kafka streams Issue

Hi Dave,

Could you explain your idea a little bit more?
I can't figure out what you are suggesting.
Thank you

-Hamza

From: Tauzell, Dave  Sent: Friday, July 29,
2016 02:39:53  To: users@kafka.apache.org  Subject: RE: Kafka streams Issue

You could send the message immediately to an intermediary topic.  Then have a 
consumer of that topic that pull messages off and waits until the minute is up.

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   
dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Hamza HACHANI [mailto:hamza.hach...@supcom.tn]
Sent: Friday, July 29, 2016 9:36 AM
To: users@kafka.apache.org
Subject: Kafka streams Issue

> Good morning,
>
> I'm an ICT student at TELECOM BRETAGNE (a French school).
> I did follow your presentations on YouTube and I found them really
> interesting.
> I'm trying to do some things with Kafka, and I have now been blocked for
> about 3 days.
> I'm trying to control the time at which my processing application sends
> data to the output topic.
> What I'm trying to do is to make the application process data from the
> input topic all the time but send the messages only at the end of a
> minute/an hour/a month (the notion of windowing).
> For the moment, what I managed to do is that the application, instead of
> sending data only at the end of the minute, sends it any time it receives
> it from the input topic.
> Do you have any suggestions to help me?
> I would be really grateful.


Preliminary answer for now:

> For the moment, what I managed to do is that the application, instead of
> sending data only at the end of the minute, sends it any time it receives it
> from the input topic.

This is actually the expected behavior at the moment.

The main reason for this behavior is that, in stream processing, we never know 
whether there is still late-arriving data to be received.  For example, imagine 
you have 1-minute windows based on event-time.  Here, it may happen that, after 
the first 1 minute window has passed, another record arrives five minutes later 
but, according to the record's event-time, it should have still been part of 
the first 1-minute window.  In this case, what we typically want to happen is 
that the first 1-minute window will be updated/reprocessed with the late-arriving
record included.  In other words, just because 1 minute has passed (= the 
1-minute window is "done") it does not mean that actually all the data for that 
time interval has been processed already -- so sending only a single update 
after 1 minute has passed would even produce incorrect results in many cases.  
For this reason you currently see a downstream update anytime there is a new 
incoming data record ("send it anytime it does receive it from the input 
topic").  So the point here is to ensure correctness of processing.

That said, one known drawback of the current behavior is that users haven't 
been able to control (read: decrease/reduce) the rate/volume of the resulting 
downstream updates.  For example, if you have an input topic with a rate of 1 
million msg/s (which is easy for Kafka), some users want to aggregate/window 
results primarily to reduce the input rate to a lower number (e.g. 1 thousand
msg/s) so that the data can be fed from Kafka to other systems that might not 
scale as well as Kafka.  To help these use cases we will have a new 
configuration parameter in the next major version of Kafka that allows you to 
control the rate/volume of downstream updates.
Here, the point is to help users optimize resource usage rather than 
correctness of processing.  This new parameter should also help you with your 
use case.  But even this new parameter is not based on strict time behavior or 
time windows.


Re: Chocolatey packages for ZooKeeper, Kafka?

2016-07-29 Thread Patrick Hunt
Hi Andrew, if you want to publish something like that for ZK (on GitHub, say)
we'd be happy to link to it on our wiki "useful tools" page.

Regards,

Patrick

On Thu, Jul 28, 2016 at 6:58 PM, Andrew Pennebaker <
andrew.penneba...@gmail.com> wrote:

> Could we please publish Chocolatey packages for ZooKeeper and Kafka, to
> make it easier for newbies to get started?
>
> https://chocolatey.org/
>
> --
> Cheers,
> Andrew
>


RE: compression ratio

2016-07-29 Thread Tauzell, Dave
Thanks.  That's just tracked on the client, right?

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Ian Wrigley [mailto:i...@confluent.io] 
Sent: Friday, July 29, 2016 9:27 AM
To: users@kafka.apache.org
Subject: Re: compression ratio

Hi Dave

The JMX metric compression-rate-avg should give you that info.

Regards

Ian.

---
Ian Wrigley
Director, Education Services
Confluent, Inc
> On Jul 29, 2016, at 2:58 PM, Tauzell, Dave  
> wrote:
> 
> Is there a good way to see what sort of compression ratio is being achieved?
> 
> -Dave
> 
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   
> dave.tauz...@surescripts.com
> Connect with us: Twitter I 
> LinkedIn I 
> Facebook I 
> YouTube
> 
> 



Re: [VOTE] 0.10.0.1 RC0

2016-07-29 Thread Harsha Ch
Hi Ismael,
  I would like this JIRA to be included in the minor release
https://issues.apache.org/jira/browse/KAFKA-3950.
Thanks,
Harsha

On Fri, Jul 29, 2016 at 7:46 AM Ismael Juma  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for the release of Apache Kafka 0.10.0.1. This
> is a bug fix release and it includes fixes and improvements from 50 JIRAs
> (including a few critical bugs). See the release notes for more details:
>
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, 1 August, 8am PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging
>
> * Javadoc:
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc0/javadoc/
>
> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc0 tag:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=0c2322c2cf7ab7909cfd8b834d1d2fffc34db109
>
> * Documentation:
> http://kafka.apache.org/0100/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0100/protocol.html
>
> * Successful Jenkins builds for the 0.10.0 branch:
> Unit/integration tests:
> https://builds.apache.org/job/kafka-0.10.0-jdk7/170/
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka-0.10.0/130/
>
> Thanks,
> Ismael
>


Re: Same partition number of different Kafka topcs

2016-07-29 Thread Gerard Klijs
The default partitioner will take the key, make the hash from it, and do a
modulo operation to determine the partition it goes to. Some things which
might cause it to end up different for different topics:
- the partition counts are not the same (you already checked)
- the key is not exactly the same, for example one might have a space after the
id
- the other topic is configured to use another partitioner
- the serialiser for the key is different for both topics, since the hash
is created based on the bytes of the serialised key
- all the topics use another partitioner (for example round robin)
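For reference, the Java producer's default partitioning for keyed messages is
essentially murmur2 over the *serialized* key bytes, modulo the partition count.
A rough equivalent of that logic (illustrative only; the real implementation is
DefaultPartitioner in org.apache.kafka.clients.producer.internals):

    import org.apache.kafka.common.utils.Utils;

    // A client that wants two topics to be co-partitioned has to reproduce
    // exactly this: same serialized key bytes, same hash, same partition count.
    static int partitionFor(byte[] serializedKey, int numPartitions) {
        // mask off the sign bit so the result is a valid partition index
        return (Utils.murmur2(serializedKey) & 0x7fffffff) % numPartitions;
    }

So partitioning is done by the producer client, not by the broker; clients that
use a different hash (or hash different bytes) will legitimately place the same
id on different partitions.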

On Thu, Jul 28, 2016 at 9:11 PM Jack Huang  wrote:

> Hi all,
>
> I have an application where I need to join events from two different
> topics. Every event is identified by an id, which is used as the key for
> the topic partition. After doing some experiment, I observed that events
> will go into different partitions even if the number of partitions for both
> topics are the same. I can't find any documentation on this point though.
> Does anyone know if this is indeed the case?
>
>
> Thanks,
> Jack
>


RE: Kafka streams Issue

2016-07-29 Thread Tauzell, Dave
You could send the message immediately to an intermediary topic.  Then have a 
consumer of that topic that pull messages off and waits until the minute is up.

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Hamza HACHANI [mailto:hamza.hach...@supcom.tn]
Sent: Friday, July 29, 2016 9:36 AM
To: users@kafka.apache.org
Subject: Kafka streams Issue

> Good morning,
>
> I'm an ICT student at TELECOM BRETAGNE (a French school).
> I did follow your presentations on YouTube and I found them really
> interesting.
> I'm trying to do some things with Kafka, and I have now been blocked for
> about 3 days.
> I'm trying to control the time at which my processing application sends
> data to the output topic.
> What I'm trying to do is to make the application process data from the
> input topic all the time but send the messages only at the end of a
> minute/an hour/a month (the notion of windowing).
> For the moment, what I managed to do is that the application, instead of
> sending data only at the end of the minute, sends it any time it receives
> it from the input topic.
> Do you have any suggestions to help me?
> I would be really grateful.


Preliminary answer for now:

> For the moment, what I managed to do is that the application, instead of
> sending data only at the end of the minute, sends it any time it receives it
> from the input topic.

This is actually the expected behavior at the moment.

The main reason for this behavior is that, in stream processing, we never know 
whether there is still late-arriving data to be received.  For example, imagine 
you have 1-minute windows based on event-time.  Here, it may happen that, after 
the first 1 minute window has passed, another record arrives five minutes later 
but, according to the record's event-time, it should have still been part of 
the first 1-minute window.  In this case, what we typically want to happen is 
that the first 1-minute window will be updated/reprocessed with the late-arriving
record included.  In other words, just because 1 minute has passed (= the 
1-minute window is "done") it does not mean that actually all the data for that 
time interval has been processed already -- so sending only a single update 
after 1 minute has passed would even produce incorrect results in many cases.  
For this reason you currently see a downstream update anytime there is a new 
incoming data record ("send it anytime it does receive it from the input 
topic").  So the point here is to ensure correctness of processing.

That said, one known drawback of the current behavior is that users haven't 
been able to control (read: decrease/reduce) the rate/volume of the resulting 
downstream updates.  For example, if you have an input topic with a rate of 1 
million msg/s (which is easy for Kafka), some users want to aggregate/window 
results primarily to reduce the input rate to a lower number (e.g. 1 thousand
msg/s) so that the data can be fed from Kafka to other systems that might not 
scale as well as Kafka.  To help these use cases we will have a new 
configuration parameter in the next major version of Kafka that allows you to 
control the rate/volume of downstream updates.
Here, the point is to help users optimize resource usage rather than 
correctness of processing.  This new parameter should also help you with your 
use case.  But even this new parameter is not based on strict time behavior or 
time windows.
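In the meantime, for the original goal of only emitting once per minute, one
option already available is the low-level Processor API: buffer records in
process() and forward them from punctuate(). A rough sketch (class, names and
batching details are made up, and the buffer is kept in memory for brevity):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.kafka.streams.processor.AbstractProcessor;
    import org.apache.kafka.streams.processor.ProcessorContext;

    // Collects values and only forwards them downstream when punctuate() fires.
    // Note: punctuate() is driven by stream time (record timestamps), so it only
    // advances while new records keep arriving.
    public class MinuteBatcher extends AbstractProcessor<String, String> {
        private final List<String> buffer = new ArrayList<>();

        @Override
        public void init(ProcessorContext context) {
            super.init(context);
            context.schedule(60_000L);   // ask for punctuate() roughly every minute
        }

        @Override
        public void process(String key, String value) {
            buffer.add(value);           // just collect, do not emit yet
        }

        @Override
        public void punctuate(long timestamp) {
            for (String v : buffer) {
                context().forward(null, v);   // emit the whole batch downstream
            }
            buffer.clear();
            context().commit();
        }
    }

It would be wired into a topology with TopologyBuilder#addSource / addProcessor /
addSink, with the sink writing to the real output topic.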



RE: SSD or not for Kafka brokers?

2016-07-29 Thread Tauzell, Dave
In addition, for sequential writes, which are common with Kafka, SSDs aren't 
much faster than HDDs.

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com


-Original Message-
From: Gerard Klijs [mailto:gerard.kl...@dizzit.com]
Sent: Friday, July 29, 2016 9:33 AM
To: users@kafka.apache.org
Subject: Re: SSD or not for Kafka brokers?

As I understood it, SSDs won't really have any advantage over HDDs, since most 
things will be served from memory (the OS page cache) anyway. You might want to 
use SSDs for ZooKeeper, though.

On Fri, Jul 29, 2016 at 12:19 AM Kessiler Rodrigues 
wrote:

> Hi guys,
>
> Should I use SSD for my brokers or not?
>
> What are the pros and cons?
>
> Thank you!
>


Kafka streams Issue

2016-07-29 Thread Hamza HACHANI
> Good morning,
>
> I'm an ICT student at TELECOM BRETAGNE (a French school).
> I followed your presentations on YouTube and found them really
> interesting.
> I'm trying to do a few things with Kafka, and I've now been stuck for
> about 3 days.
> I'm trying to control the time at which my processing application sends
> data to the output topic.
> What I'm trying to do is make the application process data from the
> input topic all the time, but send the messages only at the end of a
> minute/an hour/a month (the notion of windowing).
> For the moment, what I managed to do is that the application, instead of
> sending data only at the end of the minute, sends it any time it
> receives it from the input topic.
> Do you have any suggestions to help me?
> I would be really grateful.


Preliminary answer for now:

> For the moment, what I managed to do is that the application, instead of
> sending data only at the end of the minute, sends it any time it receives
> it from the input topic.

This is actually the expected behavior at the moment.

The main reason for this behavior is that, in stream processing, we never
know whether there is still late-arriving data to be received.  For
example, imagine you have 1-minute windows based on event-time.  Here, it
may happen that, after the first 1-minute window has passed, another record
arrives five minutes later but, according to the record's event-time, it
should have still been part of the first 1-minute window.  In this case,
what we typically want to happen is that the first 1-minute window will be
updated/reprocessed with the late-arriving record included.  In other
words, just because 1 minute has passed (= the 1-minute window is "done")
it does not mean that all the data for that time interval has actually been
processed already -- so sending only a single update after 1 minute has
passed would even produce incorrect results in many cases.  For this reason
you currently see a downstream update any time there is a new incoming data
record ("sends it any time it receives it from the input topic").  So the
point here is to ensure correctness of processing.

That said, one known drawback of the current behavior is that users haven't
been able to control (read: decrease/reduce) the rate/volume of the
resulting downstream updates.  For example, if you have an input topic with
a rate of 1 million msg/s (which is easy for Kafka), some users want to
aggregate/window results primarily to reduce that input rate to a lower one
(e.g. 1 thousand msg/s) so that the data can be fed from Kafka to other
systems that might not scale as well as Kafka.  To help these use cases we
will have a new configuration parameter in the next major version of Kafka
that allows you to control the rate/volume of downstream updates.
Here, the point is to help users optimize resource usage rather than
correctness of processing.  This new parameter should also help you with
your use case.  But even this new parameter is not based on strict time
behavior or time windows.



Re: compression ratio

2016-07-29 Thread Ian Wrigley
Hi Dave

The JMX metric compression-rate-avg should give you that info.

Regards

Ian.

---
Ian Wrigley
Director, Education Services
Confluent, Inc
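
If you'd rather read it programmatically than through JMX, here is a rough
sketch against a newer Java client than 0.10.  It assumes an
already-constructed producer; the metric appears in the producer's metrics()
map.

    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;

    import java.util.Map;

    public class CompressionRateReader {

        // Prints every metric named "compression-rate-avg" exposed by the given producer.
        public static void printCompressionRate(Producer<?, ?> producer) {
            for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
                if ("compression-rate-avg".equals(entry.getKey().name())) {
                    System.out.println(entry.getKey().group() + " / " + entry.getKey().name()
                            + " = " + entry.getValue().metricValue());
                }
            }
        }
    }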
> On Jul 29, 2016, at 2:58 PM, Tauzell, Dave  
> wrote:
> 
> Is there a good way to see what sort of compression ratio is being achieved?
> 
> -Dave
> 
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   
> dave.tauz...@surescripts.com



compression ratio

2016-07-29 Thread Tauzell, Dave
Is there a good way to see what sort of compression ratio is being achieved?

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   
dave.tauz...@surescripts.com


Re: SSD or not for Kafka brokers?

2016-07-29 Thread Gerard Klijs
As I understood it, SSDs won't really have any advantage over HDDs, since
most things will be served from memory (the OS page cache) anyway. You might
want to use SSDs for ZooKeeper, though.

On Fri, Jul 29, 2016 at 12:19 AM Kessiler Rodrigues 
wrote:

> Hi guys,
>
> Should I use SSD for my brokers or not?
>
> What are the pros and cons?
>
> Thank you!
>


Re: Jars in Kafka 0.10

2016-07-29 Thread Gerard Klijs
No, if you don't use Streams you don't need them. If you have no clients
(so also no MirrorMaker) running on the same machine, you also don't need
the client jar; and if you run ZooKeeper separately, you don't need the
ZooKeeper jars either.

On Fri, Jul 29, 2016 at 4:22 PM Bhuvaneswaran Gopalasami <
bhuvanragha...@gmail.com> wrote:

> I have recently started looking into Kafka and noticed that the number of
> jars in Kafka 0.10 has increased compared to 0.8.2. Do we really need all
> of those libraries to run Kafka?
>
> Thanks,
> Bhuvanes
>


Jars in Kafka 0.10

2016-07-29 Thread Bhuvaneswaran Gopalasami
I have recently started looking into Kafka and noticed that the number of
jars in Kafka 0.10 has increased compared to 0.8.2. Do we really need all
of those libraries to run Kafka?

Thanks,
Bhuvanes


AUTO: Yan Wang is out of the office (returning 08/08/2016)

2016-07-29 Thread Yan Wang


I am out of the office until 08/08/2016.




Note: This is an automated response to your message  "Mirrormaker between
0.8.2.1 cluster and 0.10 cluster" sent on 7/29/2016 2:04:44 AM.

This is the only notification you will receive while this person is away.


Mirrormaker between 0.8.2.1 cluster and 0.10 cluster

2016-07-29 Thread Yifan Ying
Hi all,

I am trying to use the MirrorMaker on the 0.10 cluster to mirror the
0.8.2.1 cluster into the 0.10 cluster. I then got a bunch of consumer errors
as follows:

 Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@f9533ee
(kafka.consumer.ConsumerFetcherThread)

java.nio.BufferUnderflowException

at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:145)

at java.nio.ByteBuffer.get(ByteBuffer.java:694)

at kafka.api.ApiUtils$.readShortString(ApiUtils.scala:40)

at kafka.api.TopicData$.readFrom(FetchResponse.scala:96)

at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:170)

at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:169)

at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)

at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)

at scala.collection.immutable.Range.foreach(Range.scala:160)

at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)

at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)

at kafka.api.FetchResponse$.readFrom(FetchResponse.scala:169)

at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:135)

at
kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:108)

at
kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:29)

at
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)

at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)

at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)


Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@5d1339d8

java.lang.IllegalArgumentException

at java.nio.Buffer.limit(Buffer.java:267)

at kafka.api.FetchResponsePartitionData$.readFrom(FetchResponse.scala:38)

at kafka.api.TopicData$$anonfun$1.apply(FetchResponse.scala:100)

at kafka.api.TopicData$$anonfun$1.apply(FetchResponse.scala:98)

at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

at scala.collection.immutable.Range.foreach(Range.scala:141)

at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)

at scala.collection.AbstractTraversable.map(Traversable.scala:105)

at kafka.api.TopicData$.readFrom(FetchResponse.scala:98)

at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:170)

at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:169)

at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)

at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)

at scala.collection.immutable.Range.foreach(Range.scala:141)

at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)

at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)

at kafka.api.FetchResponse$.readFrom(FetchResponse.scala:169)

at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:135)

at
kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:108)

at
kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:29)

at
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:107)

at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:98)

Is there any compatibility issue when using the new mirrormaker to mirror a
0.8.2.1 cluster into a 0.10 cluster?

Thanks.


-- 
Yifan