Re: Using Multiple Kafka Producers for a single Kafka Topic

2016-05-25 Thread Joe San
I do not mind the ordering, as I have a timestamp in all my messages and all
my messages land in a time-series database. So I understand that it is
better to have just one Producer instance per JVM and use that to write to
any number of topics. Even if I have 10,000 topics, can I just get away
with a single Producer instance per JVM?

On Wed, May 25, 2016 at 8:41 AM, Ewen Cheslack-Postava 
wrote:

> On Mon, Apr 25, 2016 at 6:34 AM, Joe San  wrote:
>
> > I have an application that is currently running and is using Rx Streams
> to
> > move data. Now in this application, I have a couple of streams whose
> > messages I would like to write to a single Kafka topic. Given this, I
> have
> > say Streams 1 to 5 as below:
> >
> > Stream1 - Takes in DataType A Stream2 - Takes in DataType B and so on
> >
> > Where these Streams are Rx Observers. All these data types that I get out
> > of the stream are converted to a common JSON structure. I want this JSON
> > structure to be pushed to a single Kafka topic.
> >
> > Now the questions are:
> >
> >1.
> >
> >Should I create one KafkaProducer for each of those Streams or rather
> Rx
> >Observer instances?
> >
>
> A single producer instance is fine. In fact, it may be better since you
> share TCP connections and requests to produce data can be batched together.
>
>
> >2.
> >
> >What happens if multiple threads using its own instance of a
> >KafkaProducer to write to the same topic?
> >
>
> They can all write to the same topic, but their data will be arbitrarily
> interleaved since there's no ordering guarantee across these producers.
>
>
>
> --
> Thanks,
> Ewen
>


Re: Kafka Scalability with the Number of Partitions

2016-05-25 Thread Yazeed Alabdulkarim
Hi Tom,
Thank you for your help. I have only one broker. I used the Kafka production
server configuration listed on Kafka's documentation page:
http://kafka.apache.org/documentation.html#prodconfig . I have increased
the flush interval and number of messages to prevent the disk from becoming
the bottleneck. For the consumers, I used the following configurations:
Properties props = new Properties();
props.put("enable.auto.commit", "true");
props.put("request.timeout.ms", "5");
props.put("session.timeout.ms", "5000");
props.put("connections.max.idle.ms", "5000");
props.put("fetch.min.bytes", 1);
props.put("fetch.max.wait.ms", "500");
props.put("group.id", "gid");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
props.put("max.partition.fetch.bytes", "128");
consumer = new KafkaConsumer(props);

I am setting the max.partition.fetch.bytes to 128, because I only want to
process one record for each poll.

Thanks a lot for your help. I really appreciate it.

On Tue, May 24, 2016 at 7:51 AM, Tom Crayford  wrote:

> What's your server setup for the brokers and consumers? Generally I'd
> expect something to be exhausted here and that to end up being the
> bottleneck.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Mon, May 23, 2016 at 7:32 PM, Yazeed Alabdulkarim <
> y.alabdulka...@gmail.com> wrote:
>
> > Hi,
> > I am running simple experiments to evaluate the scalability of Kafka
> > consumers with respect to the number of partitions. I assign every
> consumer
> > to a specific partition. Each consumer polls the records in its assigned
> > partition and print the first one, then polls again from the offset of
> the
> > printed record until all records are printed. Prior to running the test,
> I
> > produce 10 Million records evenly among partitions. After running the
> test,
> > I measure the time it took for the consumers to print all the records. I
> > was expecting Kafka to scale as I increase the number of
> > consumers/partitions. However, the scalability diminishes as I increase
> the
> > number of partitions/consumers, beyond certain number. Going from 1,2,4,8
> > the scalability is great as the duration of the test is reduced by the
> > factor increase of the number of partitions/consumers. However, beyond 8
> > consumers/partitions, the duration of the test reaches a steady state. I
> am
> > monitoring the resources of my server and didn't see any bottleneck. Am I
> > missing something here? Shouldn't Kafka consumers scale with the number
> of
> > partitions?
> > --
> > Best Regards,
> > Yazeed Alabdulkarim
> >
>



-- 
Best Regards,
Yazeed Alabdulkarim


Re: Which should scale: Producer or Topic

2016-05-25 Thread Hafsa Asif
A very good question from Joe. I also have the same question.

Hafsa

2016-05-24 18:00 GMT+02:00 Joe San :

> Interesting discussion!
>
> What do you mean here by a process? Is that a thread or the JVM process?
>
> On Tue, May 24, 2016 at 5:49 PM, Tom Crayford 
> wrote:
>
> > Aha, yep that helped a lot.
> >
> > One producer per process. There's not really a per producer topic limit.
> > There's buffering and batching space, but assuming you have sufficient
> > memory (which is by the partition, not by topic), you'll be fine.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, May 24, 2016 at 4:46 PM, Hafsa Asif 
> > wrote:
> >
> > > One more question:
> > > How many topics can be easily handled by one producer?
> > >
> > > Hafsa
> > >
> > > 2016-05-24 17:39 GMT+02:00 Hafsa Asif :
> > >
> > > > Ok, let me rephrase (may be I am not using correct terms):
> > > > Simply consider I have 2 topics, and I have both Java and NodeJS
> client
> > > > for Kafka.
> > > >
> > > > *NodeJS:*
> > > > Is it good that I write two producers per each topic like that :
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > var producer1 = new Producer(client);
> > > > producer1.on('ready', function () {});
> > > > producer1.on('error', function (err) {});
> > > > producer1.send(payloads, cb);
> > > > var producer2 = new Producer(client);
> > > > producer2.on('ready', function () {});
> > > > producer2.on('error', function (err) {});
> > > > producer2.send(payloads, cb);
> > > > Or, I should create one producer for all 2 topics.
> > > >
> > > >
> > > >
> > > > *Java:*Is it good that I write two producers per each topic like
> that :
> > > >
> > > >
> > > >
> > > >
> > > > private static Producer producer1;
> > > > private static Producer producer2;
> > > > producer1 = new Producer<>(new ProducerConfig(properties));
> > > > producer2 = new Producer<>(new ProducerConfig(properties));
> > > >
> > > > Or, I should create one producer for all 2 topics.
> > > >
> > > > Suggest your answer in the light of my estimations (10 topics and 1
> > > > million records per each topic in next week)
> > > >
> > > > Best Regards,
> > > > Hafsa
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > 2016-05-24 17:23 GMT+02:00 Tom Crayford :
> > > >
> > > >> Hi,
> > > >>
> > > >> I think I'm a bit confused. When you say "one producer per topic",
> do
> > > you
> > > >> mean one instance of the JVM application that's producing per topic?
> > > >>
> > > >> Thanks
> > > >>
> > > >> Tom
> > > >>
> > > >> On Tue, May 24, 2016 at 4:19 PM, Hafsa Asif <
> > hafsa.a...@matchinguu.com>
> > > >> wrote:
> > > >>
> > > >> > Tom,
> > > >> >
> > > >> > Thank you for your answer. No, I am talking about one PRODUCER for
> > > each
> > > >> > topic, not one instance of same producer class. I am asking for
> > > general
> > > >> > concept only.
> > > >> >  Actually we are just growing and not so much far from the case
> of 1
> > > >> > million records per sec. Just considering our future case, I need
> > your
> > > >> > suggestion in more detail, that in general is it a good practice
> to:
> > > >> > 1. Prepare a single producer for multiple topics (consider 10
> > topics)
> > > .
> > > >> > 2. Prepare 10 producers for 10 topics respectively.
> > > >> >
> > > >> > Your answer is quite satisfying for me, but I need more details so
> > > that
> > > >> I
> > > >> > can convince my team in a good way.
> > > >> >
> > > >> > Best Regards,
> > > >> > Hafsa
> > > >> >
> > > >> > 2016-05-24 16:11 GMT+02:00 Tom Crayford :
> > > >> >
> > > >> > > Is that "one instance of the producer class per topic"? I'd
> > > recommend
> > > >> > just
> > > >> > > having a single producer shared per process.
> > > >> > >
> > > >> > > 1 million records in a week is not very many records, it works
> > down
> > > to
> > > >> > ~1.6
> > > >> > > records a second on average, which is nothing (we typically see
> 1
> > > >> > million+
> > > >> > > messages per second on our clusters). Or maybe your load is
> > spikier
> > > >> than
> > > >> > > that?
> > > >> > >
> > > >> > > Generally if you have multiple producer instances they will fail
> > > >> slightly
> > > >> > > differently, but most failures that hit one (e.g. a broker going
> > > down
> > > >> and
> > > >> > > the controller not changing over the leader fast enough).
> > > >> > >
> > > >> > > Thanks
> > > >> > >
> > > >> > > Tom Crayford
> > > >> > > Heroku Kafka
> > > >> > >
> > > >> > > On Tue, May 24, 2016 at 3:03 PM, Hafsa Asif <
> > > >> hafsa.a...@matchinguu.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hello Folks,
> > > >> > > >
> > > >> > > > I am using Kafka (0.9) in my company and it is expected that
> we
> > > are
> > > >> > going
> > > >> > > > to receive 1 million records in next week. I have many topics
> > for
> > > >> > solely
> > > >> > > > different purposes. Is it good that I define one producer per
> > > topic
> > > >> or
> > > >> > > > create one producer for every

Re: Kafka encryption

2016-05-25 Thread Tom Crayford
If you're using EBS, then it's a single flag to use encrypted drives at the
time the volume is provisioned. I don't know about the other storage options;
I'd recommend looking at the AWS documentation.
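
For the other approach mentioned further down this thread - encrypting each
message before producing it, inside a custom (de)serializer - a minimal sketch
of the producer side could look like the one below. The "encryption.key" config
entry and the pre-shared AES key are assumptions of mine, and a real
implementation would need proper key management, per-message IVs and an
authenticated cipher mode rather than the illustrative cipher used here.

import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.apache.kafka.common.serialization.Serializer;

// Encrypts each string value with a pre-shared AES key before it is handed to
// the producer's network layer; a matching Deserializer would decrypt on read.
public class EncryptingStringSerializer implements Serializer<String> {
    private SecretKeySpec key;

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Hypothetical config entry carrying the raw key bytes.
        byte[] rawKey = (byte[]) configs.get("encryption.key");
        key = new SecretKeySpec(rawKey, "AES");
    }

    @Override
    public byte[] serialize(String topic, String data) {
        try {
            // Illustration only: a real deployment should use an authenticated mode.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException("Encryption failed", e);
        }
    }

    @Override
    public void close() {}
}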

Thanks

Tom Crayford
Heroku Kafka

On Wednesday, 25 May 2016, Snehalata Nagaje <
snehalata.nag...@harbingergroup.com> wrote:

>
>
> Thanks,
>
> How can we do file system encryption?
>
> we are using aws environment.
>
> Thanks,
> Snehalata
>
> - Original Message -
> From: "Gerard Klijs" >
> To: "Users" >
> Sent: Tuesday, May 24, 2016 7:26:27 PM
> Subject: Re: Kafka encryption
>
> For both old and new consumers/producers you can make your own
> (de)serializer to do some encryption, maybe that could be an option?
>
> On Tue, May 24, 2016 at 2:40 PM Tom Crayford  > wrote:
>
> > Hi,
> >
> > There's no encryption at rest. It's recommended to use filesystem
> > encryption, or encryption of each individual message before producing it
> > for this.
> >
> > Only the new producer and consumers have SSL support.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, May 24, 2016 at 11:33 AM, Snehalata Nagaje <
> > snehalata.nag...@harbingergroup.com > wrote:
> >
> > >
> > >
> > > Thanks for quick reply.
> > >
> > > Do you mean If I see messages in kafka, those will not be readable?
> > >
> > > And also, we are using new producer but old consumer , does old
> consumer
> > > have ssl support?
> > >
> > > As mentioned in document, its not there.
> > >
> > >
> > > Thanks,
> > > Snehalata
> > >
> > > - Original Message -
> > > From: "Mudit Kumar" >
> > > To: users@kafka.apache.org 
> > > Sent: Tuesday, May 24, 2016 3:53:26 PM
> > > Subject: Re: Kafka encryption
> > >
> > > Yes,it does that.What specifically you are looking for?
> > >
> > >
> > >
> > >
> > > On 5/24/16, 3:52 PM, "Snehalata Nagaje" <
> > > snehalata.nag...@harbingergroup.com > wrote:
> > >
> > > >Hi All,
> > > >
> > > >
> > > >We have requirement of encryption in kafka.
> > > >
> > > >As per docs, we can configure kafka with ssl, for secured
> communication.
> > > >
> > > >But does kafka also stores data in encrypted format?
> > > >
> > > >
> > > >Thanks,
> > > >Snehalata
> > >
> >
>


Re: Kafka Scalability with the Number of Partitions

2016-05-25 Thread Tom Crayford
Hi,

Kafka's performance all comes from batching. There's going to be a huge
perf impact from limiting your batching like that, and that's likely the
issue. I'd recommend designing your system around Kafka's batching model,
which involves large numbers of messages per fetch request.
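
For illustration, a minimal sketch of a poll loop that works with the batching
model instead of capping max.partition.fetch.bytes (the broker address, group id
and topic name below are placeholders, and the fetch settings are left at their
defaults so each poll can return a full batch):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("group.id", "gid");
        props.put("enable.auto.commit", "true");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test-topic")); // placeholder topic

        while (true) {
            // Each poll may return many records; process them all before polling again.
            ConsumerRecords<String, String> records = consumer.poll(500);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.partition() + "/" + record.offset() + ": " + record.value());
            }
        }
    }
}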

Thanks

Tom Crayford
Heroku Kafka

On Wednesday, 25 May 2016, Yazeed Alabdulkarim 
wrote:

> Hi Tom,
> Thank you for your help. I have only one broker. I used kafka production
> server configuration listed in kafka's documentation page:
> http://kafka.apache.org/documentation.html#prodconfig . I have increased
> the flush interval and number of messages to prevent the disk from becoming
> the bottleneck. For the consumers, I used the following configurations:
> Properties props = new Properties();
> props.put("enable.auto.commit", "true");
> props.put("request.timeout.ms", "5");
> props.put("session.timeout.ms", "5000");
> props.put("connections.max.idle.ms", "5000");
> props.put("fetch.min.bytes", 1);
> props.put("fetch.max.wait.ms", "500");
> props.put("group.id", "gid");
> props.put("key.deserializer", StringDeserializer.class.getName());
> props.put("value.deserializer", StringDeserializer.class.getName());
> props.put("max.partition.fetch.bytes", "128");
> consumer = new KafkaConsumer(props);
>
> I am setting the max.partition.fetch.bytes to 128, because I only want to
> process one record for each poll.
>
> Thank a lot for your help. I really appreciate it.
>
> On Tue, May 24, 2016 at 7:51 AM, Tom Crayford  > wrote:
>
> > What's your server setup for the brokers and consumers? Generally I'd
> > expect something to be exhausted here and that to end up being the
> > bottleneck.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Mon, May 23, 2016 at 7:32 PM, Yazeed Alabdulkarim <
> > y.alabdulka...@gmail.com > wrote:
> >
> > > Hi,
> > > I am running simple experiments to evaluate the scalability of Kafka
> > > consumers with respect to the number of partitions. I assign every
> > consumer
> > > to a specific partition. Each consumer polls the records in its
> assigned
> > > partition and print the first one, then polls again from the offset of
> > the
> > > printed record until all records are printed. Prior to running the
> test,
> > I
> > > produce 10 Million records evenly among partitions. After running the
> > test,
> > > I measure the time it took for the consumers to print all the records.
> I
> > > was expecting Kafka to scale as I increase the number of
> > > consumers/partitions. However, the scalability diminishes as I increase
> > the
> > > number of partitions/consumers, beyond certain number. Going from
> 1,2,4,8
> > > the scalability is great as the duration of the test is reduced by the
> > > factor increase of the number of partitions/consumers. However, beyond
> 8
> > > consumers/partitions, the duration of the test reaches a steady state.
> I
> > am
> > > monitoring the resources of my server and didn't see any bottleneck.
> Am I
> > > missing something here? Shouldn't Kafka consumers scale with the number
> > of
> > > partitions?
> > > --
> > > Best Regards,
> > > Yazeed Alabdulkarim
> > >
> >
>
>
>
> --
> Best Regards,
> Yazeed Alabdulkarim
>


Re: Which should scale: Producer or Topic

2016-05-25 Thread Tom Crayford
By process I mean a JVM process (if you're using the JVM clients and for
your app).

Thanks

Tom Crayford
Heroku Kafka

On Wednesday, 25 May 2016, Hafsa Asif  wrote:

> A very good question from Joe. I have also the same question.
>
> Hafsa
>
> 2016-05-24 18:00 GMT+02:00 Joe San 
> >:
>
> > Interesting discussion!
> >
> > What do you mean here by a process? Is that a thread or the JVM process?
> >
> > On Tue, May 24, 2016 at 5:49 PM, Tom Crayford  >
> > wrote:
> >
> > > Aha, yep that helped a lot.
> > >
> > > One producer per process. There's not really a per producer topic
> limit.
> > > There's buffering and batching space, but assuming you have sufficient
> > > memory (which is by the partition, not by topic), you'll be fine.
> > >
> > > Thanks
> > >
> > > Tom Crayford
> > > Heroku Kafka
> > >
> > > On Tue, May 24, 2016 at 4:46 PM, Hafsa Asif  >
> > > wrote:
> > >
> > > > One more question:
> > > > How many topics can be easily handled by one producer?
> > > >
> > > > Hafsa
> > > >
> > > > 2016-05-24 17:39 GMT+02:00 Hafsa Asif  >:
> > > >
> > > > > Ok, let me rephrase (may be I am not using correct terms):
> > > > > Simply consider I have 2 topics, and I have both Java and NodeJS
> > client
> > > > > for Kafka.
> > > > >
> > > > > *NodeJS:*
> > > > > Is it good that I write two producers per each topic like that :
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > var producer1 = new Producer(client);
> > > > > producer1.on('ready', function () {});
> > > > > producer1.on('error', function (err) {});
> > > > > producer1.send(payloads, cb);
> > > > > var producer2 = new Producer(client);
> > > > > producer2.on('ready', function () {});
> > > > > producer2.on('error', function (err) {});
> > > > > producer2.send(payloads, cb);
> > > > > Or, I should create one producer for all 2 topics.
> > > > >
> > > > >
> > > > >
> > > > > *Java:*Is it good that I write two producers per each topic like
> > that :
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > private static Producer producer1;
> > > > > private static Producer producer2;
> > > > > producer1 = new Producer<>(new ProducerConfig(properties));
> > > > > producer2 = new Producer<>(new ProducerConfig(properties));
> > > > >
> > > > > Or, I should create one producer for all 2 topics.
> > > > >
> > > > > Suggest your answer in the light of my estimations (10 topics and 1
> > > > > million records per each topic in next week)
> > > > >
> > > > > Best Regards,
> > > > > Hafsa
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2016-05-24 17:23 GMT+02:00 Tom Crayford  >:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> I think I'm a bit confused. When you say "one producer per topic",
> > do
> > > > you
> > > > >> mean one instance of the JVM application that's producing per
> topic?
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >> Tom
> > > > >>
> > > > >> On Tue, May 24, 2016 at 4:19 PM, Hafsa Asif <
> > > hafsa.a...@matchinguu.com >
> > > > >> wrote:
> > > > >>
> > > > >> > Tom,
> > > > >> >
> > > > >> > Thank you for your answer. No, I am talking about one PRODUCER
> for
> > > > each
> > > > >> > topic, not one instance of same producer class. I am asking for
> > > > general
> > > > >> > concept only.
> > > > >> >  Actually we are just growing and not so much far from the case
> > of 1
> > > > >> > million records per sec. Just considering our future case, I
> need
> > > your
> > > > >> > suggestion in more detail, that in general is it a good practice
> > to:
> > > > >> > 1. Prepare a single producer for multiple topics (consider 10
> > > topics)
> > > > .
> > > > >> > 2. Prepare 10 producers for 10 topics respectively.
> > > > >> >
> > > > >> > Your answer is quite satisfying for me, but I need more details
> so
> > > > that
> > > > >> I
> > > > >> > can convince my team in a good way.
> > > > >> >
> > > > >> > Best Regards,
> > > > >> > Hafsa
> > > > >> >
> > > > >> > 2016-05-24 16:11 GMT+02:00 Tom Crayford  >:
> > > > >> >
> > > > >> > > Is that "one instance of the producer class per topic"? I'd
> > > > recommend
> > > > >> > just
> > > > >> > > having a single producer shared per process.
> > > > >> > >
> > > > >> > > 1 million records in a week is not very many records, it works
> > > down
> > > > to
> > > > >> > ~1.6
> > > > >> > > records a second on average, which is nothing (we typically
> see
> > 1
> > > > >> > million+
> > > > >> > > messages per second on our clusters). Or maybe your load is
> > > spikier
> > > > >> than
> > > > >> > > that?
> > > > >> > >
> > > > >> > > Generally if you have multiple producer instances they will
> fail
> > > > >> slightly
> > > > >> > > differently, but most failures that hit one (e.g. a broker
> going
> > > > down
> > > > >> and
> > > > >> > > the controller not changing over the leader fast enough).
> > > > >> > >
> > > > >> > > Thanks
> > > > >> > >
> > > > >> > > Tom Crayford
> > > > >> > > Heroku Kafka
> > > > >> > >
> 

Re: Using Multiple Kafka Producers for a single Kafka Topic

2016-05-25 Thread Tom Crayford
Generally Kafka isn't super great with a giant number of topics. I'd
recommend designing your system around a smaller number than 10k. There's
an upper limit enforced on the total number of partitions by zookeeper
anyway, somewhere around 29k.

I'd recommend having just a single producer per JVM, to reuse TCP
connections and maximize batching. There's no real benefit to having more
producers except slightly reduced lock contention. However, the limiting
factor in most Kafka-based apps usually isn't anything like lock contention
on the producer - I'd expect the network to be the real limiter here.
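
For illustration, a minimal sketch of sharing one producer instance across
threads and topics (the broker list and topic names are placeholders; the point
is simply that KafkaProducer is thread-safe and can be reused everywhere):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SharedProducer {
    private static final Properties PROPS = new Properties();
    static {
        PROPS.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder
        PROPS.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        PROPS.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    }

    // One producer for the whole JVM; it batches per partition and reuses TCP connections.
    private static final Producer<String, String> PRODUCER = new KafkaProducer<>(PROPS);

    // Any thread or stream can call this for any topic.
    public static void send(String topic, String key, String json) {
        PRODUCER.send(new ProducerRecord<>(topic, key, json));
    }
}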

Thanks

Tom Crayford
Heroku Kafka

On Wednesday, 25 May 2016, Joe San  wrote:

> I do not mind the ordering as I have a Timestamp in all my messages and all
> my messaged land in a Timeseries database. So I understand that it is
> better to have just one Producer instance per JVM and use that to write to
> n number of topics. I mean even if I have 10,000 topics, I can just get
> away with a single Producer instance per JVM?
>
> On Wed, May 25, 2016 at 8:41 AM, Ewen Cheslack-Postava  >
> wrote:
>
> > On Mon, Apr 25, 2016 at 6:34 AM, Joe San  > wrote:
> >
> > > I have an application that is currently running and is using Rx Streams
> > to
> > > move data. Now in this application, I have a couple of streams whose
> > > messages I would like to write to a single Kafka topic. Given this, I
> > have
> > > say Streams 1 to 5 as below:
> > >
> > > Stream1 - Takes in DataType A Stream2 - Takes in DataType B and so on
> > >
> > > Where these Streams are Rx Observers. All these data types that I get
> out
> > > of the stream are converted to a common JSON structure. I want this
> JSON
> > > structure to be pushed to a single Kafka topic.
> > >
> > > Now the questions are:
> > >
> > >1.
> > >
> > >Should I create one KafkaProducer for each of those Streams or
> rather
> > Rx
> > >Observer instances?
> > >
> >
> > A single producer instance is fine. In fact, it may be better since you
> > share TCP connections and requests to produce data can be batched
> together.
> >
> >
> > >2.
> > >
> > >What happens if multiple threads using its own instance of a
> > >KafkaProducer to write to the same topic?
> > >
> >
> > They can all write to the same topic, but their data will be arbitrarily
> > interleaved since there's no ordering guarantee across these producers.
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>


Best monitoring tool for Kafka in production

2016-05-25 Thread Hafsa Asif
Hello,

What is the best monitoring tool for Kafka in production, preferably a free
tool? If there is no free tool, then please mention efficient non-free
monitoring tools as well.

We are having a lot of problems without a monitoring tool. Sometimes brokers
go down or a consumer stops working and we are not informed.

Best Regards,
Hafsa


VerifyConsumerRebalance complaining about all consumers?

2016-05-25 Thread Shane Hender
Hi, I'm trying to debug why one particular partition has a huge lag for our
consumers, but while debugging I ran the following 2 commands which seem to
contradict each other. I snipped the output for brevity. I'm focused on
partition 80. I'm also running Kafka 0.8.2.2 with the high level consumer
in the java process connected to it.

$ kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group
mycgroup --zookeeper $zoo --topic maxwell
Group     Topic    Pid  Offset       logSize      Lag        Owner
...
mycgroup  maxwell  75   0            0            0          mycgroup_myhost-1464168755165-8c0adb35-0
mycgroup  maxwell  76   196963610    196963611    1          mycgroup_myhost-1464168755165-8c0adb35-0
mycgroup  maxwell  77   1326987677   1326987950   273        mycgroup_myhost-1464168755165-8c0adb35-0
mycgroup  maxwell  78   255727662    255727843    181        mycgroup_myhost-1464168755165-8c0adb35-0
mycgroup  maxwell  79   0            0            0          mycgroup_myhost-1464168755165-8c0adb35-0
mycgroup  maxwell  80   20450706001  20700099163  249393162  mycgroup_myhost-1464168755399-66315e10-0
mycgroup  maxwell  81   940659152    940667598    8446       mycgroup_myhost-1464168755399-66315e10-0
mycgroup  maxwell  82   607085553    607087055    1502       mycgroup_myhost-1464168755399-66315e10-0
mycgroup  maxwell  83   0            0            0          mycgroup_myhost-1464168755399-66315e10-0
mycgroup  maxwell  84   61361682     61361682     0          mycgroup_myhost-1464168755399-66315e10-0
mycgroup  maxwell  85   343862579    343866453    3874       mycgroup_myhost-1464168755399-66315e10-0
...

$ kafka-run-class.sh kafka.tools.VerifyConsumerRebalance --group
mycgroup --zookeeper.connect $zoo
...
[2016-05-25 13:03:19,851] ERROR No owner for partition [maxwell,75]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,853] ERROR Owner
mycgroup_myhost-1464168755165-8c0adb35-0 for partition [maxwell,75] is
not a valid member of consumer group mycgroup
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,853] ERROR No owner for partition [maxwell,76]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,855] ERROR Owner
mycgroup_myhost-1464168755165-8c0adb35-0 for partition [maxwell,76] is
not a valid member of consumer group mycgroup
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,855] ERROR No owner for partition [maxwell,77]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,857] ERROR Owner
mycgroup_myhost-1464168755165-8c0adb35-0 for partition [maxwell,77] is
not a valid member of consumer group mycgroup
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,857] ERROR No owner for partition [maxwell,78]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,860] ERROR Owner
mycgroup_myhost-1464168755165-8c0adb35-0 for partition [maxwell,78] is
not a valid member of consumer group mycgroup
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,860] ERROR No owner for partition [maxwell,79]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,861] ERROR Owner
mycgroup_myhost-1464168755165-8c0adb35-0 for partition [maxwell,79] is
not a valid member of consumer group mycgroup
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,862] ERROR No owner for partition [maxwell,80]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,864] ERROR Owner
mycgroup_myhost-1464168755399-66315e10-0 for partition [maxwell,80] is
not a valid member of consumer group mycgroup
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,864] ERROR No owner for partition [maxwell,81]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,866] ERROR Owner
mycgroup_myhost-1464168755399-66315e10-0 for partition [maxwell,81] is
not a valid member of consumer group mycgroup
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,866] ERROR No owner for partition [maxwell,82]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,868] ERROR Owner
mycgroup_myhost-1464168755399-66315e10-0 for partition [maxwell,82] is
not a valid member of consumer group mycgroup
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,868] ERROR No owner for partition [maxwell,83]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,870] ERROR Owner
mycgroup_myhost-1464168755399-66315e10-0 for partition [maxwell,83] is
not a valid member of consumer group mycgroup
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,870] ERROR No owner for partition [maxwell,84]
(kafka.tools.VerifyConsumerRebalance$)
[2016-05-25 13:03:19,871] ERROR Owner
mycgroup_myhost-1464168755399-66315e10-0 for partition [maxwell,84] is
not a 

Java Client for Kafka Consumer(0.9) not polling the records from brokers instantly

2016-05-25 Thread Navneet Kumar
Hi
We are facing an issue where our consumer component is not instantly
logging the records polled from the brokers. We are following the
architecture attached below. The following properties are
configured:

Producer.properties

bootstrap.servers=xx.xxx.xxx.140:9092,xx.xxx.xxx.140:9093,xx.xxx.xxx.140:9094,xx.xxx.xxx.141:9092,xx.xxx.xxx.141:9093,xx.xxx.xxx.141:9094,xx.xxx.xxx.142:9092,xx.xxx.xxx.142:9093,xx.xxx.xxx.142:9094
acknowledgement=all
group_id=kafkaClusterVLink
retries=3
batch_size=6384
linger_ms=1
buffer_memory=54432
topic=visualize-link-redirect
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer

We send the JSON string from the producer to the brokers (approx. 250 bytes/message).

Consumer.properties

bootstrap.servers=xx.xxx.xxx.140:9092,xx.xxx.xxx.140:9093,xx.xxx.xxx.140:9094,xx.xxx.xxx.141:9092,xx.xxx.xxx.141:9093,xx.xxx.xxx.141:9094,xx.xxx.xxx.142:9092,xx.xxx.xxx.142:9093,xx.xxx.xxx.142:9094
group.id=kafkaClusterVLink
enable.auto.commit=true
auto.commit.interval.ms=1
session.timeout.ms=3
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
topic=visualize-link-redirect


Please help me resolve this issue as soon as possible. I will be
grateful to you.







Thanks and Regards,
Navneet Kumar


FetchRequest Question

2016-05-25 Thread Heath Ivie
Can someone please explain why, if I write 1 message to the queue, it takes N
FetchRequests to get the data out, where N > 1?

Heath

Warning: This e-mail may contain information proprietary to AutoAnything Inc. 
and is intended only for the use of the intended recipient(s). If the reader of 
this message is not the intended recipient(s), you have received this message 
in error and any review, dissemination, distribution or copying of this message 
is strictly prohibited. If you have received this message in error, please 
notify the sender immediately and delete all copies.


Brokers changing mtime on data files during startup?

2016-05-25 Thread Andrew Otto
Hiya,

We’ve recently upgraded to 0.9.  In 0.8, when we restarted a broker, data
log file mtimes were not changed.  In 0.9, any data log file that was on
disk before the restart has its mtime modified to the time of the broker
restart.

This causes problems with log retention, as all the files then look like
they contain recent data to kafka.  We use the default log retention of 7
weeks, but if all the files are touched at the same time, this can cause us
to retain up to 2 weeks of log data, which can fill up our disks.

We saw this during our initial upgrade, but I had just thought it had
something to do with the change of inter.broker.protocol.version, and
assumed it wouldn’t happen again.  We just did our first broker restart
after the upgrade, and we are seeing this again.  We worked around this
during our upgrade by temporarily setting a high volume topic’s retention
very low, causing brokers to purge more recent data.  This allowed us to
avoid filling up our disks, but we shouldn’t have to do this every time we
bounce brokers.

Has anyone else noticed this?

-Ao


Re: Brokers changing mtime on data files during startup?

2016-05-25 Thread Andrew Otto
“We use the default log retention of 7 *days*" :)*

On Wed, May 25, 2016 at 12:34 PM, Andrew Otto  wrote:

> Hiya,
>
> We’ve recently upgraded to 0.9.  In 0.8, when we restarted a broker, data
> log file mtimes were not changed.  In 0.9, any data log file that was on
> disk before the broker has it’s mtime modified to the time of the broker
> restart.
>
> This causes problems with log retention, as all the files then look like
> they contain recent data to kafka.  We use the default log retention of 7
> weeks, but if all the files are touched at the same time, this can cause us
> to retain up to 2 weeks of log data, which can fill up our disks.
>
> We saw this during our initial upgrade, but I had just thought it had
> something to do with the change of inter.broker.protocol.version, and
> assumed it wouldn’t happen again.  We just did our first broker restart
> after the upgrade, and we are seeing this again.  We worked around this
> during our upgrade by temporarily setting a high volume topic’s retention
> very low, causing brokers to purge more recent data.  This allowed us to
> avoid filling up our disks, but we shouldn’t have to do this every time we
> bounce brokers.
>
> Has anyone else noticed this?
>
> -Ao
>


Heron Spouts & Bolts & Example for Apache Kafka

2016-05-25 Thread Joe Stein
Hey Kafka community, I wanted to pass along some of the work we have been
doing as part of providing commercial support for Heron
https://blog.twitter.com/2016/open-sourcing-twitter-heron Open Sourced
Today.

https://github.com/twitter/heron/pull/751 Kafka 0.8 & 0.9 Spout, Bolt &
Example Topology

Looking forward to continued contributions. If you haven't tried out Heron
yet, you should.

Kafka is the wind to make Heron fly!

/**
  Joe Stein
  Elodina Inc
  http://www.elodina.net
**/


Kafka 0.10.0.0: kafka.utils.VerifiableProperties not found - continuing with a stub.

2016-05-25 Thread Niko Davor
Using a bare-bones build.sbt: http://pastebin.com/CADyngYs

Results in:

[warn] Class kafka.utils.VerifiableProperties not found - continuing with a
stub.

[warn] Class kafka.utils.VerifiableProperties not found - continuing with a
stub.

[warn] two warnings found


Re: Brokers changing mtime on data files during startup?

2016-05-25 Thread tao xiao
I noticed the same issue too with 0.9.

On Wed, 25 May 2016 at 09:49 Andrew Otto  wrote:

> “We use the default log retention of 7 *days*" :)*
>
> On Wed, May 25, 2016 at 12:34 PM, Andrew Otto  wrote:
>
> > Hiya,
> >
> > We’ve recently upgraded to 0.9.  In 0.8, when we restarted a broker, data
> > log file mtimes were not changed.  In 0.9, any data log file that was on
> > disk before the broker has it’s mtime modified to the time of the broker
> > restart.
> >
> > This causes problems with log retention, as all the files then look like
> > they contain recent data to kafka.  We use the default log retention of 7
> > weeks, but if all the files are touched at the same time, this can cause
> us
> > to retain up to 2 weeks of log data, which can fill up our disks.
> >
> > We saw this during our initial upgrade, but I had just thought it had
> > something to do with the change of inter.broker.protocol.version, and
> > assumed it wouldn’t happen again.  We just did our first broker restart
> > after the upgrade, and we are seeing this again.  We worked around this
> > during our upgrade by temporarily setting a high volume topic’s retention
> > very low, causing brokers to purge more recent data.  This allowed us to
> > avoid filling up our disks, but we shouldn’t have to do this every time
> we
> > bounce brokers.
> >
> > Has anyone else noticed this?
> >
> > -Ao
> >
>


Re: Heron Spouts & Bolts & Example for Apache Kafka

2016-05-25 Thread Guozhang Wang
Thanks for the news Joe! I browsed the PR and it looks interesting.


Guozhang



On Wed, May 25, 2016 at 9:54 AM, Joe Stein  wrote:

> Hey Kafka community, I wanted to pass along some of the work we have been
> doing as part of providing commercial support for Heron
> https://blog.twitter.com/2016/open-sourcing-twitter-heron Open Sourced
> Today.
>
> https://github.com/twitter/heron/pull/751 Kafka 0.8 & 0.9 Spout, Bolt &
> Example Topology
>
> Looking forward to continued contributes, if you haven't tried out Heron
> yet you should.
>
> Kafka is the wind to make Heron fly!
>
> /**
>   Joe Stein
>   Elodina Inc
>   http://www.elodina.net
> **/
>



-- 
-- Guozhang


Separating internal and external traffic

2016-05-25 Thread D C
I'm sure I can do this, but I'm just not stumbling on the right
documentation anywhere.  I have a handful of Kafka servers that I am trying
to get ready for production. I'm trying to separate the internal and external
network traffic, but I don't see how to do it.

Each host has two addresses.
10.x.y.z = default interface
192.168.x.y = private network seen only by the kafka nodes.

How can I tell kafka to make use of this?


Add new client to "clients" page please

2016-05-25 Thread Vadim Chekan
Hi all,

I'd like the kafka4net client to be added to the "clients" page:
https://cwiki.apache.org/confluence/display/KAFKA/Clients

This is C# client, asynchronous, all 3 compressions supported (read and
write), tracks leader partition changes transparently, long time in
production.
Maintainer: https://github.com/vchekan/
License: Apache-2.0
Repository: https://github.com/ntent-ad/kafka4net

Thanks,
Vadim Chekan.

-- 
From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is
explicitly specified


Re: Best monitoring tool for Kafka in production

2016-05-25 Thread Alex Loddengaard
Hi Hafsa,

We often see Grafana and Graphite, which are both free. Keep in mind you
should monitor the system's metrics and Kafka's JMX metrics.

Alex

On Wed, May 25, 2016 at 3:42 AM, Hafsa Asif 
wrote:

> Hello,
>
> What is the best monitoring tool for Kafka in production, preferable free
> tool? If there is no free tool, then please mention non-free efficient
> monitoring tools also.
>
> We are feeling so much problem without monitoring tool. Sometimes brokers
> goes down or consumer is not working, we are not informed.
>
> Best Regards,
> Hafsa
>


Relaying UDP packets into Kafka

2016-05-25 Thread Sunil Saggar
Hi All,

I am looking for a Kafka producer to receive UDP packets and send that
information to a specified topic. Is there an out-of-the-box producer which
does this?

There are a few GitHub projects which do udp2kafka (receive UDP packets
and then relay them to Kafka).

Any advice?

-- 
---
Thanks
Sunil Saggar


Re: Separating internal and external traffic

2016-05-25 Thread Alex Loddengaard
Hi there,

You can use the `listeners` config to tell Kafka which interfaces to listen
on. The `listeners` config also supports setting the port and protocol. You
may also want to set `advertised.listeners` if the `listeners` hostnames or
IPs aren't reachable by your clients.
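
For illustration, a minimal server.properties sketch (assuming a 0.9/0.10
broker, where each listener needs a distinct security protocol, so the split
below uses PLAINTEXT on the private network and SSL on the default interface;
substitute your real addresses for the placeholders from your message):

# bind the private interface for plaintext and the default interface for SSL
listeners=PLAINTEXT://192.168.x.y:9092,SSL://10.x.y.z:9093
# only needed if clients should connect to hostnames/IPs other than the bind addresses
advertised.listeners=PLAINTEXT://192.168.x.y:9092,SSL://10.x.y.z:9093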

Alex

On Wed, May 25, 2016 at 11:41 AM, D C  wrote:

> I'm sure i can do this but I'm just not stumbling on the right
> documentation anywhere.  I have a handful of kafka servers that I am trying
> to get ready for production. I'm trying separate the internal and external
> network traffic, but I don't see how to do it.
>
> Each host has two addresses.
> 10.x.y.z = default interface
> 192.168.x.y = private network seen only by the kafka nodes.
>
> How can I tell kafka to make use of this?
>


Re: Relaying UDP packets into Kafka

2016-05-25 Thread Radoslaw Gruchalski
First result in Google for “kafka udp listener” brings this:
https://github.com/agaoglu/udp-kafka-bridge
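
If you end up rolling your own, a minimal sketch of such a bridge using the
Java producer could look like the one below (the UDP port, topic name and
broker address are placeholders; each datagram becomes one Kafka record):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UdpToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        Producer<byte[], byte[]> producer = new KafkaProducer<>(props);

        try (DatagramSocket socket = new DatagramSocket(5140)) { // placeholder UDP port
            byte[] buf = new byte[65535];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet); // blocks until a datagram arrives
                byte[] payload = Arrays.copyOf(packet.getData(), packet.getLength());
                producer.send(new ProducerRecord<byte[], byte[]>("udp-events", payload)); // placeholder topic
            }
        }
    }
}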
–  
Best regards,

Radek Gruchalski

ra...@gruchalski.com
de.linkedin.com/in/radgruchalski

Confidentiality:
This communication is intended for the above-named person and may be 
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must 
you copy or show it to anyone; please delete/destroy and inform the sender 
immediately.

On May 25, 2016 at 10:12:20 PM, Sunil Saggar (sunil.sag...@gmail.com) wrote:

Hi All,  

I am looking for a kafka producer to receive UDP packets and send that  
information to specified topic. Is there a out of box producer which does  
this ?  

There are a few github projects which do udp2kafka ( receives UDP packets  
and then relay them to kafka)  

Any advise ?  

--  
---  
Thanks  
Sunil Saggar  


Re: Relaying UDP packets into Kafka

2016-05-25 Thread Joe San
What about this one: https://github.com/agaoglu/udp-kafka-bridge

On Wed, May 25, 2016 at 6:48 PM, Sunil Saggar 
wrote:

> Hi All,
>
> I am looking for a kafka producer to receive UDP packets and send that
> information to specified topic. Is there a out of box producer which does
> this ?
>
> There are a few github projects which do udp2kafka  ( receives UDP packets
> and then relay them to kafka)
>
> Any advise ?
>
> --
> ---
> Thanks
> Sunil Saggar
>


upgrading Kafka

2016-05-25 Thread Karnam, Kiran
Hi All,

We are using Docker containers to deploy Kafka, and we are planning to use Mesos
for the deployment and maintenance of the containers. Is there a way during an
upgrade to persist the data so that it is available to the upgraded container?

We don't want the cluster to go into chaos with data replicating around the
network because a node that was upgraded suddenly has no data.

Thanks,
Kiran


Re: upgrading Kafka

2016-05-25 Thread Radoslaw Gruchalski
Kiran,

If you’re using Docker, you can use Docker on Mesos: use constraints to force a
relaunched Kafka broker to always come back on the same agent, and use Docker
volumes to persist the data.
Not sure if https://github.com/mesos/kafka provides these capabilities.
–  
Best regards,

Radek Gruchalski

ra...@gruchalski.com
de.linkedin.com/in/radgruchalski

Confidentiality:
This communication is intended for the above-named person and may be 
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must 
you copy or show it to anyone; please delete/destroy and inform the sender 
immediately.

On May 25, 2016 at 10:58:06 PM, Karnam, Kiran (kkar...@ea.com) wrote:

Hi All,  

We are using Docker containers to deploy Kafka, we are planning to use mesos 
for the deployment and maintenance of containers. Is there a way during upgrade 
that we can persist the data so that it is available for the upgraded 
container.  

we don't want the clusters to go into chaos with data replicating around the 
network because a node that was upgraded suddenly has no data  

Thanks,  
Kiran  


Re: Kafka 0.10.0.0: kafka.utils.VerifiableProperties not found - continuing with a stub.

2016-05-25 Thread Ismael Juma
Hi Niko,

VerifiableProperties is part of the kafka jar and your build only depends
on kafka-clients.

Ismael

On Wed, May 25, 2016 at 6:09 PM, Niko Davor  wrote:

> Using a bare-bones build.sbt: http://pastebin.com/CADyngYs
>
> Results in:
>
> [warn] Class kafka.utils.VerifiableProperties not found - continuing with a
> stub.
>
> [warn] Class kafka.utils.VerifiableProperties not found - continuing with a
> stub.
>
> [warn] two warnings found
>


Re: Brokers changing mtime on data files during startup?

2016-05-25 Thread Meghana Narasimhan
I have seen this issue as well with 0.9. I also thought that it was because
of the upgrade, but that doesn't seem to be it. But there were also a
couple of instances when it didn't change the timestamps, so I was unable
to pinpoint the exact root cause or steps and hence had not yet posted it
here.

On Wed, May 25, 2016 at 2:15 PM, tao xiao  wrote:

> I noticed the same issue too with 0.9.
>
> On Wed, 25 May 2016 at 09:49 Andrew Otto  wrote:
>
> > “We use the default log retention of 7 *days*" :)*
> >
> > On Wed, May 25, 2016 at 12:34 PM, Andrew Otto 
> wrote:
> >
> > > Hiya,
> > >
> > > We’ve recently upgraded to 0.9.  In 0.8, when we restarted a broker,
> data
> > > log file mtimes were not changed.  In 0.9, any data log file that was
> on
> > > disk before the broker has it’s mtime modified to the time of the
> broker
> > > restart.
> > >
> > > This causes problems with log retention, as all the files then look
> like
> > > they contain recent data to kafka.  We use the default log retention
> of 7
> > > weeks, but if all the files are touched at the same time, this can
> cause
> > us
> > > to retain up to 2 weeks of log data, which can fill up our disks.
> > >
> > > We saw this during our initial upgrade, but I had just thought it had
> > > something to do with the change of inter.broker.protocol.version, and
> > > assumed it wouldn’t happen again.  We just did our first broker restart
> > > after the upgrade, and we are seeing this again.  We worked around this
> > > during our upgrade by temporarily setting a high volume topic’s
> retention
> > > very low, causing brokers to purge more recent data.  This allowed us
> to
> > > avoid filling up our disks, but we shouldn’t have to do this every time
> > we
> > > bounce brokers.
> > >
> > > Has anyone else noticed this?
> > >
> > > -Ao
> > >
> >
>


Re: upgrading Kafka

2016-05-25 Thread craig w
The Kafka framework can be used to deploy brokers. It will also bring a
broker back up on the server it was last running on (within a certain
amount of time).

However the Kafka framework doesn't run brokers in containers.

On Wednesday, May 25, 2016, Radoslaw Gruchalski 
wrote:

> Kiran,
>
> If you’re using Docker, you can use Docker on Mesos, you can use
> constraints to force relaunched kafka broker to always relaunch at the same
> agent and you can use Docker volumes to persist the data.
> Not sure if https://github.com/mesos/kafka provides these capabilites.
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com 
> de.linkedin.com/in/radgruchalski
>
> Confidentiality:
> This communication is intended for the above-named person and may be
> confidential and/or legally privileged.
> If it has come to you in error you must take no action based on it, nor
> must you copy or show it to anyone; please delete/destroy and inform the
> sender immediately.
>
> On May 25, 2016 at 10:58:06 PM, Karnam, Kiran (kkar...@ea.com
> ) wrote:
>
> Hi All,
>
> We are using Docker containers to deploy Kafka, we are planning to use
> mesos for the deployment and maintenance of containers. Is there a way
> during upgrade that we can persist the data so that it is available for the
> upgraded container.
>
> we don't want the clusters to go into chaos with data replicating around
> the network because a node that was upgraded suddenly has no data
>
> Thanks,
> Kiran
>


-- 

https://github.com/mindscratch
https://www.google.com/+CraigWickesser
https://twitter.com/mind_scratch
https://twitter.com/craig_links


Re: Relaying UDP packets into Kafka

2016-05-25 Thread Andrew Otto
Super old, but: https://github.com/atdt/UdpKafka

On Wed, May 25, 2016 at 4:20 PM, Joe San  wrote:

> What about this one: https://github.com/agaoglu/udp-kafka-bridge
>
> On Wed, May 25, 2016 at 6:48 PM, Sunil Saggar 
> wrote:
>
> > Hi All,
> >
> > I am looking for a kafka producer to receive UDP packets and send that
> > information to specified topic. Is there a out of box producer which does
> > this ?
> >
> > There are a few github projects which do udp2kafka  ( receives UDP
> packets
> > and then relay them to kafka)
> >
> > Any advise ?
> >
> > --
> > ---
> > Thanks
> > Sunil Saggar
> >
>


Log Compact current segment

2016-05-25 Thread Ajay Ramani
Hello,
I'm using Kafka 0.9 and have a topic with the
config: cleanup.policy=compact, delete.retention.ms=3, segment.ms
=3, min.cleanable.dirty.ratio=0.01.

I understand the requirement regarding the latest segment, and that only
segments other than the latest (active) one are compacted.

Imagine I have the messages (key, value) on segment 1 as:
key1 - msg1
key1 - msg2

Then the log rolls over according to segment.ms and creates a new segment
and I insert the value
key1 - msg3

I want to finally see only one message, (key1, msg3), but when I consume from
the oldest offset I get two messages, msg2 and msg3, with the same key1. And I
see that log-cleaner.log only shows:
Start size: 0.0 MB (2 messages)
End size: 0.0 MB (1 messages),
which corresponds to the first segment being compacted. I get that the reason
behind this is that the new segment is active and is not compacted.

But is there any workaround to make sure that I only see (key1, msg3) after
these operations?
One buggy way I can think of is inserting a dummy (key2,msg) after
segment.ms duration, which will compact the key1 messages in the old
segments and give me two messages: (key1, msg3) and (key2, msg). But I do
not want this new key2 message :(

Any help  is greatly appreciated.
Thank You
Ajay


Re: Reporting security issues

2016-05-25 Thread Mayuresh Gharat
Excellent :)

Thanks,

Mayuresh

On Tue, May 24, 2016 at 2:55 AM, Ismael Juma  wrote:

> Hi all,
>
> Since Kafka implements a number of security features, we need a procedure
> for reporting potential security vulnerabilities privately (as per
> http://www.apache.org/security/). We have added a simple page to the
> website that describes the procedure (thanks Flavio):
>
> http://kafka.apache.org/project-security.html
>
> See https://issues.apache.org/jira/browse/KAFKA-3709 for more background.
>
> If you have suggestions on how the page could be improved, pull requests
> are welcome. :)
>
> Ismael
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: upgrading Kafka

2016-05-25 Thread craig w
More specifically, see:
https://github.com/mesos/kafka#failed-broker-recovery

On Wed, May 25, 2016 at 6:02 PM, craig w  wrote:

> The Kafka framework can be used to deploy brokers. It will also bring a
> broker back up on the server it was last running on (within a certain
> amount of time).
>
> However the Kafka framework doesn't run brokers in containers.
>
>
> On Wednesday, May 25, 2016, Radoslaw Gruchalski 
> wrote:
>
>> Kiran,
>>
>> If you’re using Docker, you can use Docker on Mesos, you can use
>> constraints to force relaunched kafka broker to always relaunch at the same
>> agent and you can use Docker volumes to persist the data.
>> Not sure if https://github.com/mesos/kafka provides these capabilites.
>> –
>> Best regards,
>> Radek Gruchalski
>> ra...@gruchalski.com
>> de.linkedin.com/in/radgruchalski
>>
>> Confidentiality:
>> This communication is intended for the above-named person and may be
>> confidential and/or legally privileged.
>> If it has come to you in error you must take no action based on it, nor
>> must you copy or show it to anyone; please delete/destroy and inform the
>> sender immediately.
>>
>> On May 25, 2016 at 10:58:06 PM, Karnam, Kiran (kkar...@ea.com) wrote:
>>
>> Hi All,
>>
>> We are using Docker containers to deploy Kafka, we are planning to use
>> mesos for the deployment and maintenance of containers. Is there a way
>> during upgrade that we can persist the data so that it is available for the
>> upgraded container.
>>
>> we don't want the clusters to go into chaos with data replicating around
>> the network because a node that was upgraded suddenly has no data
>>
>> Thanks,
>> Kiran
>>
>
>
> --
>
> https://github.com/mindscratch
> https://www.google.com/+CraigWickesser
> https://twitter.com/mind_scratch
> https://twitter.com/craig_links
>
>
>


-- 

https://github.com/mindscratch
https://www.google.com/+CraigWickesser
https://twitter.com/mind_scratch
https://twitter.com/craig_links


Re: Kafka stream join scenarios

2016-05-25 Thread Guozhang Wang
A processor is guaranteed to be executed on the same thread at any given
time; its process() and punctuate() will always be triggered to run in a
single thread.

Currently TimestampExtractor is set globally, but you can definitely define
different logic depending on the topic name (which is included in the
input ConsumerRecord).
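
For illustration, a minimal sketch of a per-topic extractor (assuming the
single-argument extract() signature of the 0.10.0 Streams API; the KTable
source topic name is a placeholder):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Gives records from the table topic a fixed early timestamp so they sort before
// the stream's records; everything else falls back to wall-clock time.
public class PerTopicTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        if ("table-dump".equals(record.topic())) { // placeholder KTable source topic
            return 0L;
        }
        return System.currentTimeMillis(); // or parse a timestamp field out of record.value()
    }
}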

Guozhang


On Mon, May 23, 2016 at 6:20 PM, Srikanth  wrote:

> Guozhang,
>
> I guess you are referring to a scenario where noOfThreads < totalNoOfTasks.
> We could have KTable task and KStream task running on the same thread and
> sleep will be counter productive?
> On this note, will a Processor always run on the same thread? Are process()
> and punctuate() guaranteed to never run in parallel?
>
> The jira you gave seems to be on the same lines.
> Can you comment on my question regarding TimestampExtractor?
> We set one TimestampExtractor as a stream config at global level. Timestamp
> extraction logic on the other hand will be specific to each stream.
>
> Srikanth
>
>
> On Mon, May 23, 2016 at 5:55 PM, Guozhang Wang  wrote:
>
> > Srikanth,
> >
> > Note that the same thread maybe used for fetching both the "semi-static"
> > KTable stream as well as the continuous KStream stream, so
> > sleep-on-no-match may not work.
> >
> > I think setting timestamps for this KTable to make sure its values is
> > smaller than the KStream stream will work, and there is a JIRA open for
> > better handling logics:
> >
> > https://issues.apache.org/jira/browse/KAFKA-3478
> >
> >
> > Guozhang
> >
> >
> > On Mon, May 23, 2016 at 1:58 PM, Srikanth  wrote:
> >
> > > Thanks Guozhang & Matthias!
> > >
> > > For 1), it is going to be a common ask. So a DSL API will be good.
> > >
> > > For 2), source for the KTable currently is a file. Think about it as a
> > dump
> > > from a DB table.
> > > We are thinking of ways to stream updates from this table. But for now
> > its
> > > a new file every day or so.
> > > I plan to automate file uploader to write to kafka when it gets a new
> > file.
> > > File is too big to be "broadcasted". The delta changes between files
> > should
> > > be really small though.
> > >
> > > Now, "current content" can be modeled based on timestamp. If I add a
> > > timestamp field when pushing to kafka.
> > > The records themselves have no notion of time. Its just metadata that
> > will
> > > be useful in join.
> > >
> > > Another way is similar to what Matthias suggested. I can make it sleep
> > if a
> > > key is not found in KTable.
> > > I can treat it as a condition to indicated KTable is still being
> > > initialized. Of course, I need a way to break this sleep cycle if key
> > never
> > > comes.
> > >
> > > Or this can be implemented with a custom watermark assigner that knows
> > when
> > > to emit a "special watermark" to indicate current content is read.
> > >
> > > Or for such a slow stream, any poll to kafka broker that returns zero
> > > records can be treated as reaching end of current content.
> > >
> > >
> > > Matthias,
> > > I haven't spent enough time on the approach you outlined. Will let you
> > > know.
> > >
> > > Srikanth
> > >
> > >
> > >
> > > On Mon, May 23, 2016 at 1:40 PM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > > > Hi Srikanth,
> > > >
> > > > as Guozhang mentioned, the problem is the definition of the time,
> when
> > > > your table is read for joining with the stream.
> > > >
> > > > Using transform() you would basically read a changlog-stream within
> > your
> > > > custom Transformer class and apply it via KStream.transform() to your
> > > > regular stream. (ie, your Transformer class has a member KTable).
> > > >
> > > > If Transformer.transform() is called you need to decide somehow, if
> you
> > > > table is read for joining or not (and "sleep" if it is not ready yet,
> > > > effectively stalling your KStream).
> > > >
> > > > As some point in time (not sure how you wanna decide when this point
> in
> > > > time actually is -- see beginning of this mail) you can start to
> > process
> > > > the data from the regular stream.
> > > >
> > > > Have a look into KStreamKTableLeftJoin to get an idea how this would
> > > work.
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > > > On 05/23/2016 06:25 PM, Guozhang Wang wrote:
> > > > > Hi Srikanth,
> > > > >
> > > > > How do you define if the "KTable is read completely" to get to the
> > > > "current
> > > > > content"? Since as you said that table is not purely static, but
> > still
> > > > with
> > > > > maybe-low-traffic update streams, I guess "catch up to current
> > content"
> > > > is
> > > > > still depending on some timestamp?
> > > > >
> > > > > BTW about 1), we are consider adding "read-only global state" as
> well
> > > > into
> > > > > the DSL in the future, but like Matthias said it won't be available
> > in
> > > > > 0.10.0, so you need to do it through the transform() call where you
> > can
> > > > > provide any customized processor.
> > > > >
> > > > >
> > > > > 

Re: Kafka Streams error

2016-05-25 Thread Guozhang Wang
Hi Walter,

Synced up with Ismael regarding your questions, and here are our
suggestions:


"Scala 2.11 is the minimum. The downside of that flag is that it includes
new features that are still changing, may be less stable and may not work
fully.

You should probably consider planning an upgrade, as support for 2.10 will
start to be dropped by various projects once 2.12 is released later this
year."


Guozhang



On Tue, May 24, 2016 at 2:28 PM, Walter rakoff 
wrote:

> Hello,
>
> I'm trying a sample Kafka Streams program in Scala.
>
> val clickRecordKStream: KStream[String, ClickRecord] =
>   kStreamBuilder.stream(stringSerde, stringSerde, "test-topic")
>   .map( (k:String, v:String) => (k, ClickRecord(v)))
>
> The map call throws error "type mismatch; found : (String, String) =>
> (String, ClickRecord) required:
>
> org.apache.kafka.streams.kstream.KeyValueMapper[String,String,org.apache.kafka.streams.KeyValue[?,?]]
>
> A similar example [1] has a comment " Requires a version of Scala that
> supports Java 8 and SAM / Java lambda (e.g. Scala 2.11
> with `-Xexperimental` compiler flag, or 2.12)"
> Is there any way to get this working on Scala 2.10? Everything we write is
> built in 2.10.
> If not, what are the downside of `-Xexperimental` flag in 2.11?
>
> [1]
>
> https://github.com/confluentinc/examples/blob/master/kafka-streams/src/main/scala/io/confluent/examples/streams/MapFunctionScalaExample.scala
>
> Thanks,
> Walter
>



-- 
-- Guozhang


Re: upgrading Kafka

2016-05-25 Thread Mudit Agarwal
Yes, you can use constraints and the same volumes. That can be trusted.

  From: Radoslaw Gruchalski 
 To: "Karnam, Kiran" ; users@kafka.apache.org 
 Sent: Thursday, 26 May 2016 2:31 AM
 Subject: Re: upgrading Kafka
   
Kiran,

If you’re using Docker, you can use Docker on Mesos, you can use constraints to 
force relaunched kafka broker to always relaunch at the same agent and you can 
use Docker volumes to persist the data.
Not sure if https://github.com/mesos/kafka provides these capabilites.
–  
Best regards,

Radek Gruchalski

ra...@gruchalski.com
de.linkedin.com/in/radgruchalski

Confidentiality:
This communication is intended for the above-named person and may be 
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must 
you copy or show it to anyone; please delete/destroy and inform the sender 
immediately.

On May 25, 2016 at 10:58:06 PM, Karnam, Kiran (kkar...@ea.com) wrote:

Hi All,  

We are using Docker containers to deploy Kafka, we are planning to use mesos 
for the deployment and maintenance of containers. Is there a way during upgrade 
that we can persist the data so that it is available for the upgraded 
container.  

we don't want the clusters to go into chaos with data replicating around the 
network because a node that was upgraded suddenly has no data  

Thanks,  
Kiran