Re: Kafka and spark integration

2016-10-28 Thread Andrew Stevenson
Spark has a Kafka integration; if you want to write data from Kafka to HDFS, use the HDFS Kafka Connect Sink from Confluent. On 27/10/2016, 03:37, "Mohan Nani" wrote: Anybody know the end-to-end Hadoop data flow which has Kafka - Spark integration? I am primarily concerned o
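For reference, a minimal HDFS sink configuration in the spirit of the Confluent quickstart looks roughly like the sketch below; the connector name, topic, and HDFS URL are placeholders, not values from this thread.

```properties
# Sketch of an HDFS sink connector config (Confluent hdfs-connector);
# topics and hdfs.url are placeholder values.
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test_hdfs
hdfs.url=hdfs://localhost:9000
flush.size=3
```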

Problem with timestamp in Producer

2016-10-28 Thread Debasish Ghosh
Hello - I am a beginner in Kafka, working on my first Kafka Streams application. I have a streams application that reads from a topic, does some transformation on the data, and writes to another topic. The record that I manipulate is a CSV record. It runs fine when I run it on a local Kafka instanc
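The per-record step described above (parse a CSV record, transform it, re-serialize it) can be sketched in plain Python; the Streams wiring itself is Java, and the "uppercase the second field" rule here is a made-up example of a transformation, not the poster's actual logic.

```python
import csv
import io

def transform_record(line: str) -> str:
    """Parse one CSV record, apply a transformation, and re-serialize it.
    The 'uppercase the second field' rule is purely illustrative."""
    row = next(csv.reader([line]))
    row[1] = row[1].upper()  # hypothetical transformation
    out = io.StringIO()
    csv.writer(out, lineterminator="").writerow(row)
    return out.getvalue()

print(transform_record("42,alice,2016-10-28"))  # -> 42,ALICE,2016-10-28
```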

Re: Problem with timestamp in Producer

2016-10-28 Thread Debasish Ghosh
I am actually using 0.10.0 and NOT 0.10.1 as I mentioned in the last mail. And I am using Kafka within a DC/OS cluster under AWS. The version that I mentioned works ok is on my local machine using a local Kafka installation. And it works for both single broker and multi broker scenario. Thanks.

Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Mudit Agarwal
Hi, I learned that Confluent Enterprise provides Multi DC failover and HA synchronously and without any lag. I'm looking to learn further information and more detailed documentation on this. I have gone through the white paper and it just talks about Replicator. Any pointers for more information

RE: Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Tauzell, Dave
>> without any lag You are going to have some lag at some point between datacenters. I haven't used this, but from talking to them, they are working on or have created a replacement for MirrorMaker using the Connect framework, which will fix a number of MirrorMaker issues. I haven't talked to anybod

Re: Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Mudit Agarwal
Thanks Dave. Any ideas on how we can achieve HA/failover in Kafka across two DCs? Thanks, Mudit From: "Tauzell, Dave" To: "users@kafka.apache.org" ; Mudit Agarwal Sent: Friday, 28 October 2016 4:02 PM Subject: RE: Kafka Multi DataCenter HA/Failover >> without any lag You are going

Kafka Connect Hdfs Sink not sinking

2016-10-28 Thread Henry Kim
Hi, I was attempting to follow the hdfs-connector quick start guide (http://docs.confluent.io/3.0.0/connect/connect-hdfs/docs/hdfs_connector.html#quickstart), but I'm unable to consume messages using Kafka Connect (hdfs-connector). I did confirm that I am able to consume the messages via con

RE: Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Tauzell, Dave
By failover do you mean: 1. The producers in Datacenter A will start writing to Kafka in Datacenter B if Kafka in A is failing? Or 2. Consumers in Datacenter B have access to messages written to Kafka in Datacenter A -Dave -Original Message- From: Mudit Agarwal [mailto:mudit...@yahoo.c

Re: Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Hans Jespersen
What is the latency between the two datacenters? I ask because unless they are very close, you probably don’t want to do any form of synchronous replication. The Confluent Replicator (coming very soon in Confluent Enterprise 3.1) will do async replication of both messages and configuration metad

Re: Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Mudit Agarwal
Hi Hans, The latency between my two DCs is 150ms. And yes, I'm looking for synchronous replication. Is that possible? Thanks, Mudit From: Hans Jespersen To: users@kafka.apache.org; Mudit Agarwal Sent: Friday, 28 October 2016 4:34 PM Subject: Re: Kafka Multi DataCenter HA/Failover What

Re: Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Mudit Agarwal
I mean 1. The producers in Datacenter A will start writing to Kafka in Datacenter B if Kafka in A is failing. From: "Tauzell, Dave" To: "users@kafka.apache.org" ; Mudit Agarwal Sent: Friday, 28 October 2016 4:22 PM Subject: RE: Kafka Multi DataCenter HA/Failover By failover do yo

RE: Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Tauzell, Dave
I don't know of anything to handle that situation for you, but your application can be written to do that. -Dave -Original Message- From: Mudit Agarwal [mailto:mudit...@yahoo.com.INVALID] Sent: Friday, October 28, 2016 11:08 AM To: Tauzell, Dave; users@kafka.apache.org Subject: Re: Kafk

Re: Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Hans Jespersen
Are you willing to have a maximum throughput of 6.67 messages per second? -hans /** * Hans Jespersen, Principal Systems Engineer, Confluent Inc. * h...@confluent.io (650)924-2670 */ On Fri, Oct 28, 2016 at 9:07 AM, Mudit Agarwal wrote: > Hi Hans, > > The latency between my two DC is 150ms.A
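Hans's 6.67 figure is just round-trip arithmetic: with 150 ms inter-DC latency and one synchronous acknowledgement per message (assuming no batching or pipelining), the ceiling works out as:

```python
# Throughput ceiling for fully synchronous, one-message-at-a-time
# replication over a 150 ms link (assumes no batching or pipelining).
rtt_seconds = 0.150  # inter-DC latency from the thread

max_msgs_per_second = 1 / rtt_seconds
print(round(max_msgs_per_second, 2))  # -> 6.67
```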

RE: Kafka Multi DataCenter HA/Failover

2016-10-28 Thread Tauzell, Dave
I wouldn't use synchronous replication between two datacenters. If your network link ever goes down, all Kafka writes will fail. If you ever need to do maintenance, you'll either need to somehow turn this off or all Kafka writes will fail. Plus, as Hans mentions, this will slow down your thro

Zookeeper fails to see all the brokers at once

2016-10-28 Thread vivek thakre
Hello All, I have a Kafka cluster deployed on AWS. I am noticing this issue randomly, when none of the brokers are registered with ZooKeeper (I have set up a monitor on this using the zk-shell util). During this issue, the cluster continues to operate, i.e. events can be produced and consumed. But th

consumer_offsets partition skew and possibly ignored retention

2016-10-28 Thread Chi Hoang
Hi, We have a 3-node cluster that is running 0.9.0.1, and recently saw that the "__consumer_offsets" topic on one of the nodes seems really skewed, with disk usage that looks like:
73G ./__consumer_offsets-10
0   ./__consumer_offsets-7
0   ./__consumer_offsets-4
0   ./__consumer_off

Re: Problem with timestamp in Producer

2016-10-28 Thread Matthias J. Sax
Hey, we just added a new FAQ entry for the upcoming CP 3.2 release that answers your question. I just copy & paste it here. More concrete answer below. > If you get an exception similar to the one shown below, there are > multiple possible causes: > > Exceptio
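One workaround in this area, assuming the exception is the invalid-timestamp one on records written by pre-0.10 producers, is to tell Streams to use wall-clock time instead of the (missing) embedded record timestamp. In 0.10.x this is a single config entry; whether it fits depends on your semantics, since event time is lost:

```properties
# Hypothetical StreamsConfig entry for 0.10.x: fall back to wall-clock
# time when records carry no embedded timestamp.
timestamp.extractor=org.apache.kafka.streams.processor.WallclockTimestampExtractor
```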

Re: consumer_offsets partition skew and possibly ignored retention

2016-10-28 Thread James Brown
I was having this problem with one of my __consumer_offsets partitions; I used reassignment to move the large partition onto a different set of machines (which forced the cleaner to run through them again) and after the new machines finished replicating, the partition was back down from 41GB to a n
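For anyone following along, the reassignment James describes is driven by a JSON file passed to the kafka-reassign-partitions.sh tool with --execute. A minimal sketch of that file for one offsets partition (the broker IDs 3/4/5 are placeholders for the target machines):

```json
{
  "version": 1,
  "partitions": [
    {"topic": "__consumer_offsets", "partition": 10, "replicas": [3, 4, 5]}
  ]
}
```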

Kafka cannot shutdown

2016-10-28 Thread Json Tu
Hi all, We have a Kafka cluster with 11 nodes, and we found that for some partitions the ISR count is not equal to the replica count. Because our data traffic is small, we think the ISR count should eventually equal the replica count, but it does not recover to normal, so we tried to shut down a brok

Re: consumer_offsets partition skew and possibly ignored retention

2016-10-28 Thread Jeff Widman
James, What version did you experience the problem with? On Oct 28, 2016 6:26 PM, "James Brown" wrote: > I was having this problem with one of my __consumer_offsets partitions; I > used reassignment to move the large partition onto a different set of > machines (which forced the cleaner to run t

Re: Problem with timestamp in Producer

2016-10-28 Thread Debasish Ghosh
Hello Matthias - Thanks a lot for the response. I think what may be happening is a version mismatch between the development & deployment versions of Kafka. The Kafka Streams application that I developed uses 0.10.0-based libraries. And my local environment contains a server installation of the same

Re: Problem with timestamp in Producer

2016-10-28 Thread Matthias J. Sax
That sounds reasonable. However, I am wondering how your Streams application can connect to a 0.9 broker in the first place. Streams internally uses standard Kafka clients, and those are not backward compatible. Thus, the 0.10 Streams clients should no

Re: Problem with timestamp in Producer

2016-10-28 Thread Debasish Ghosh
I will check out all options that you mentioned. I am sure on my local it's all 0.10.0, so no wonder it works correctly. In the cluster, I just checked that the version of Kafka that ships with DC/OS 1.8 (the version I get with dcos package install kafka) is 0.9.0. Regarding .. In case you do have 0.10 b

Re: Problem with timestamp in Producer

2016-10-28 Thread Matthias J. Sax
Btw: I would highly recommend to use Kafka 0.10.1 -- there are many new Streams features and usability improvements and bug fixes. - -Matthias On 10/28/16 11:42 PM, Matthias J. Sax wrote: > That sounds reasonable. However, I am wondering how your St

Re: Problem with timestamp in Producer

2016-10-28 Thread Debasish Ghosh
I agree .. the problem is DC/OS still ships the older version. Let me check if I can upgrade this .. Thanks! On Sat, Oct 29, 2016 at 12:21 PM, Matthias J. Sax wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA512 > > Btw: I would highly recommend to use Kafka 0.10.1 -- there are many > new