Re: Unable to write, leader not available

2016-08-03 Thread Manikumar Reddy
Hi,

Can you enable authorizer debug logs and check for entries related to
denied operations?
We should also enable operations on the Cluster resource.
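
For reference, a sketch of both steps (file paths and the principal name are
assumptions, not from this thread). The authorizer logger ships in
config/log4j.properties and can be raised to DEBUG, and cluster-level
operations can be granted with the kafka-acls tool:

  # config/log4j.properties on the broker
  log4j.logger.kafka.authorizer.logger=DEBUG, authorizerAppender

  # grant cluster-level operations to the producing user (principal is hypothetical)
  bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
    --add --allow-principal User:bryan --operation ClusterAction --cluster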


Thanks,
Manikumar

On Thu, Aug 4, 2016 at 1:51 AM, Bryan Baugher  wrote:

> Hi everyone,
>
> I was trying out Kerberos on Kafka 0.10.0.0 by creating a single-node
> cluster. I managed to get everything setup and past all the authentication
> errors but whenever I try to use the console producer I get 'Error while
> fetching metadata ... LEADER_NOT_AVAILABLE'. In this case I've created the
> topic ahead of time (1 replica, 1 partition) and I can see that broker 0 is
> in the ISR and is the leader. I have also opened an ACL to the topic for my
> user to produce and was previously seeing authentication errors. I
> don't see any errors or helpful logs on the broker side even after turning
> on debug logging. Turning on debug logging on the client the only thing
> that stands out is that it lists the broker as 'node -1' instead of 0. It
> does mention the correct hostname/port and that it was able to successfully
> connect. Any ideas?
>
> Bryan
>


Re: A specific use case

2016-08-03 Thread Hamza HACHANI
Hi,
Yes, in fact.
And I found a solution.
It was by editing the punctuate method in the Kafka Streams processor.
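
For anyone hitting the same problem, a minimal sketch of that approach (class
name and the emitted key are made up), using the 0.10.0 Processor API: the
processor only accumulates in process() and emits a single aggregate from
punctuate(), which fires on the schedule registered in init():

  import org.apache.kafka.streams.processor.Processor;
  import org.apache.kafka.streams.processor.ProcessorContext;

  public class PerMinuteCountProcessor implements Processor<String, String> {
      private ProcessorContext context;
      private long count = 0;

      @Override
      public void init(ProcessorContext context) {
          this.context = context;
          context.schedule(60 * 1000L); // request punctuate() roughly every minute
      }

      @Override
      public void process(String key, String value) {
          count++; // accumulate only; do not forward per incoming message
      }

      @Override
      public void punctuate(long timestamp) {
          context.forward("count", Long.toString(count)); // one message per minute
          context.commit();
          count = 0;
      }

      @Override
      public void close() {}
  }

Note that in 0.10.0 punctuate() is driven by stream time, so it only fires as
incoming records advance the clock.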

- Reply message -
From: "Guozhang Wang" 
To: "users@kafka.apache.org" 
Subject: A specific use case
Date: Wed., Aug 3, 2016 23:38

Hello Hamza,

By saying "broker" I think you are actually referring to a Kafka Streams
instance?


Guozhang

On Mon, Aug 1, 2016 at 1:01 AM, Hamza HACHANI 
wrote:

> Good morning,
>
> I'm working on a specific use case. In fact I'm receiving messages from an
> operator network and trying to do statistics on their number per
> minute, per hour, per day ...
>
> I would like to create a broker that receives the messages and generates a
> message every minute. These produced messages are consumed by a consumer
> on the one hand and also sent to another topic, which receives them and
> generates messages every minute.
>
> I've been doing that for a while without success. In fact the first
> broker, any time it receives a message, produces one and sends it to
> the other topic.
>
> My question is: is what I'm trying to do possible without going through an
> intermediate Java process outside of Kafka?
>
> If yes, how?
>
> Thanks in advance.
>



--
-- Guozhang


Re: Kafka Consumer poll

2016-08-03 Thread sat
Hi,

Sorry, I have one more query regarding consumer.poll().

Since KafkaConsumer fetches records as soon as they are available,
irrespective of poll(timeout), does KafkaConsumer split the poll(timeout)
into multiple intervals and check the Kafka server for messages on each one?

E.g., if the poll timeout is 10 min, does it split it into ten 1-min
intervals, roughly like:

do {
    if (pollForMessageIfAny()) {
        return;
    }
    Thread.sleep(interval);
    timeRemaining = deadline - System.currentTimeMillis();
} while (timeRemaining > 0);

Thanks and Regards
A.SathishKumar

On Wed, Aug 3, 2016 at 5:46 PM, sat  wrote:

> Hi,
>
>
> Thanks for your reply Kamal and Oleg.
>
>
> Thanks and Regards
>
> A.SathishKumar
>
> >Also keep in mind that unfortunately KafkaConsumer.poll(..) will deadlock
> >regardless of the timeout if the connection to the broker cannot be
> >established, and it won't react to thread interrupts. This essentially
> >means that the only way to exit is to kill the JVM. This is all because
> >Kafka fetches topic metadata synchronously before the timeout takes effect.
> >While it is my understanding that the reason for this is a background
> >thread attempting to reconnect in the event of a temporary broker outage,
> >it doesn't help if you accidentally specified a wrong broker URL.
>
> Oleg
>
> > On Aug 2, 2016, at 10:27, Kamal C  wrote:
> >
> > See the answers inline.
> >
> >> On Tue, Aug 2, 2016 at 12:23 AM, sat  wrote:
> >>
> >> Hi,
> >>
> >> I am new to Kafka. We are planning to use Kafka messaging for our
> >> application. I was playing with Kafka 0.9.0.1 and I have the following
> >> queries. Sorry for asking basic questions.
> >>
> >>
> >> 1) I have instantiated a Kafka consumer and invoked
> >> consumer.poll(Long.MAX_VALUE). Although I have specified the timeout as
> >> Long.MAX_VALUE, I observe my consumer fetching records whenever the
> >> publisher publishes a message to a topic. This makes me wonder whether
> >> the Kafka consumer is a push or pull mechanism. Please help us understand
> >> the logic of consumer.poll(timeout).
> >
> > It fetches data from the topic, waiting up to the specified wait time *if
> > necessary* for a record to become available.
> > The Kafka consumer is by design a pull mechanism.
> >
> > Take a look into Kafka Consumer java docs[1]. It's explained in detail.
> >
> >
> >> 2) What are the pros and cons of poll for long timeout vs short timeout.
> >>
> >> Short Timeout
> >
> > Pros:
> > - On shutdown, if no data available in the topic -- Shutdown will be quick
> >
> > Cons:
> > - Number of network trips will be high
> >
> >
> >>
> >> Thanks and Regards
> >> A.SathishKumar
> >
> > [1]:
> > https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> >
> > -- Kamal
>
>
> On Mon, Aug 1, 2016 at 11:53 AM, sat  wrote:
>
>> Hi,
>>
>> I am new to Kafka. We are planning to use Kafka messaging for our
>> application. I was playing with Kafka 0.9.0.1 and I have the following
>> queries. Sorry for asking basic questions.
>>
>>
>> 1) I have instantiated a Kafka consumer and invoked
>> consumer.poll(Long.MAX_VALUE). Although I have specified the timeout as
>> Long.MAX_VALUE, I observe my consumer fetching records whenever the
>> publisher publishes a message to a topic. This makes me wonder whether
>> the Kafka consumer is a push or pull mechanism. Please help us understand
>> the logic of consumer.poll(timeout).
>>
>> 2) What are the pros and cons of poll for long timeout vs short timeout.
>>
>>
>> Thanks and Regards
>> A.SathishKumar
>>
>>
>
>
> --
> A.SathishKumar
> 044-24735023
>



-- 
A.SathishKumar
044-24735023


Kafka consumers in cluster

2016-08-03 Thread sat
Hi,

We have a Kafka server/broker running on a separate machine (say machine A);
for now we are planning to have one node. We have multiple topics, and all
topics have only 1 partition for now.

We have our application, which includes Kafka consumers, installed on machine
B and machine C. Our applications on machines B and C are in a cluster, hence
the Kafka consumers will also be in a cluster. Both consumers will have the
same group id. We want all messages to be consumed by the consumer on machine
B, and only when machine B is down should the consumer on machine C pull
messages.

Since the consumers on machines B and C have the same group id, we came to
know that the consumer on machine B will get messages for some time (10 mins)
and then the consumer on machine C will get messages for some time. Since our
consumers are in a cluster, we want only one consumer to be active and
receiving all the messages as long as it is alive.

Please let us know how to achieve this.


Thanks and Regards
A.SathishKumar
044-24735023


Re: Same partition number of different Kafka topics

2016-08-03 Thread Dana Powers
kafka-python by default uses the same partitioning algorithm as the Java
client. If there are bugs, please let me know. I think the issue here is
with the default nodejs partitioner.

-Dana
On Aug 3, 2016 7:03 PM, "Jack Huang"  wrote:

I see, thanks for the clarification.

On Tue, Aug 2, 2016 at 10:07 PM, Ewen Cheslack-Postava 
wrote:

> Jack,
>
> The partition is always selected by the client -- if it weren't the
brokers
> would need to forward requests since different partitions are handled by
> different brokers. The only "default Kafka partitioner" is the one that
you
> could consider "standardized" by the Java client implementation. Some
> client libraries will make this pluggable like the Java client does so you
> could use a compatible implementation.
>
> -Ewen
>
> On Fri, Jul 29, 2016 at 11:27 AM, Jack Huang  wrote:
>
> > Hi Gerard,
> >
> > After further digging, I found that the clients we are using also have
> > different partitioner. The Python one uses murmur2 (
> >
> >
>
https://github.com/dpkp/kafka-python/blob/master/kafka/partitioner/default.py
> > ),
> > and the NodeJS one uses its own impl (
> > https://github.com/SOHU-Co/kafka-node/blob/master/lib/partitioner.js).
> > Does
> > Kafka delegate the task of partitioning to client? From their
> documentation
> > it doesn't seem like they provide an option to select the "default Kafka
> > partitioner".
> >
> > Thanks,
> > Jack
> >
> >
> > On Fri, Jul 29, 2016 at 7:42 AM, Gerard Klijs 
> > wrote:
> >
> > > The default partitioner will take the key, make a hash from it, and do
> > > a modulo operation to determine the partition it goes to. Some things
> > > which might cause it to end up different for different topics:
> > > - partition numbers are not the same (you already checked)
> > > - the key is not exactly the same, for example one might have a space
> > >   after the id
> > > - the other topic is configured to use another partitioner
> > > - the serialiser for the key is different for both topics, since the
> > >   hash is created based on the bytes of the serialised key
> > > - all the topics use another partitioner (for example round robin)
> > >
> > > On Thu, Jul 28, 2016 at 9:11 PM Jack Huang  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have an application where I need to join events from two different
> > > > topics. Every event is identified by an id, which is used as the key
> > for
> > > > the topic partition. After doing some experiment, I observed that
> > events
> > > > will go into different partitions even if the number of partitions
> for
> > > both
> > > > topics are the same. I can't find any documentation on this point
> > though.
> > > > Does anyone know if this is indeed the case?
> > > >
> > > >
> > > > Thanks,
> > > > Jack
> > > >
> > >
> >
>
>
>
> --
> Thanks,
> Ewen
>
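
For concreteness, a sketch of the rule Ewen and Dana describe (the wrapper
class is made up; Utils is the kafka-clients utility class the Java producer
itself uses): the serialized key bytes are hashed with murmur2 and taken
modulo the partition count, so a compatible client must reproduce exactly
this to co-partition with the Java client.

  import org.apache.kafka.common.utils.Utils;

  public final class DefaultPartitioning {
      public static int partitionForKey(byte[] serializedKey, int numPartitions) {
          // toPositive clears the sign bit so the modulo result is non-negative
          return Utils.toPositive(Utils.murmur2(serializedKey)) % numPartitions;
      }
  }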


Re: Same partition number of different Kafka topics

2016-08-03 Thread Jack Huang
I see, thanks for the clarification.

On Tue, Aug 2, 2016 at 10:07 PM, Ewen Cheslack-Postava 
wrote:

> Jack,
>
> The partition is always selected by the client -- if it weren't the brokers
> would need to forward requests since different partitions are handled by
> different brokers. The only "default Kafka partitioner" is the one that you
> could consider "standardized" by the Java client implementation. Some
> client libraries will make this pluggable like the Java client does so you
> could use a compatible implementation.
>
> -Ewen
>
> On Fri, Jul 29, 2016 at 11:27 AM, Jack Huang  wrote:
>
> > Hi Gerard,
> >
> > After further digging, I found that the clients we are using also have
> > different partitioner. The Python one uses murmur2 (
> >
> >
> https://github.com/dpkp/kafka-python/blob/master/kafka/partitioner/default.py
> > ),
> > and the NodeJS one uses its own impl (
> > https://github.com/SOHU-Co/kafka-node/blob/master/lib/partitioner.js).
> > Does
> > Kafka delegate the task of partitioning to client? From their
> documentation
> > it doesn't seem like they provide an option to select the "default Kafka
> > partitioner".
> >
> > Thanks,
> > Jack
> >
> >
> > On Fri, Jul 29, 2016 at 7:42 AM, Gerard Klijs 
> > wrote:
> >
> > > The default partitioner will take the key, make a hash from it, and do
> > > a modulo operation to determine the partition it goes to. Some things
> > > which might cause it to end up different for different topics:
> > > - partition numbers are not the same (you already checked)
> > > - the key is not exactly the same, for example one might have a space
> > >   after the id
> > > - the other topic is configured to use another partitioner
> > > - the serialiser for the key is different for both topics, since the
> > >   hash is created based on the bytes of the serialised key
> > > - all the topics use another partitioner (for example round robin)
> > >
> > > On Thu, Jul 28, 2016 at 9:11 PM Jack Huang  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have an application where I need to join events from two different
> > > > topics. Every event is identified by an id, which is used as the key
> > for
> > > > the topic partition. After doing some experiment, I observed that
> > events
> > > > will go into different partitions even if the number of partitions
> for
> > > both
> > > > topics are the same. I can't find any documentation on this point
> > though.
> > > > Does anyone know if this is indeed the case?
> > > >
> > > >
> > > > Thanks,
> > > > Jack
> > > >
> > >
> >
>
>
>
> --
> Thanks,
> Ewen
>


Re: Kafka Consumer poll

2016-08-03 Thread sat
Hi,


Thanks for your reply Kamal and Oleg.


Thanks and Regards

A.SathishKumar

>Also keep in mind that unfortunately KafkaConsumer.poll(..) will deadlock
>regardless of the timeout if the connection to the broker cannot be
>established, and it won't react to thread interrupts. This essentially
>means that the only way to exit is to kill the JVM. This is all because
>Kafka fetches topic metadata synchronously before the timeout takes effect.
>While it is my understanding that the reason for this is a background
>thread attempting to reconnect in the event of a temporary broker outage,
>it doesn't help if you accidentally specified a wrong broker URL.

Oleg

> On Aug 2, 2016, at 10:27, Kamal C  wrote:
>
> See the answers inline.
>
>> On Tue, Aug 2, 2016 at 12:23 AM, sat  wrote:
>>
>> Hi,
>>
>> I am new to Kafka. We are planning to use Kafka messaging for our
>> application. I was playing with Kafka 0.9.0.1 and I have the following
>> queries. Sorry for asking basic questions.
>>
>>
>> 1) I have instantiated a Kafka consumer and invoked
>> consumer.poll(Long.MAX_VALUE). Although I have specified the timeout as
>> Long.MAX_VALUE, I observe my consumer fetching records whenever the
>> publisher publishes a message to a topic. This makes me wonder whether
>> the Kafka consumer is a push or pull mechanism. Please help us understand
>> the logic of consumer.poll(timeout).
>
> It fetches data from the topic, waiting up to the specified wait time *if
> necessary* for a record to become available.
> The Kafka consumer is by design a pull mechanism.
>
> Take a look into Kafka Consumer java docs[1]. It's explained in detail.
>
>
>> 2) What are the pros and cons of poll for long timeout vs short timeout.
>>
>> Short Timeout
>
> Pros:
> - On shutdown, if no data available in the topic -- Shutdown will be quick
>
> Cons:
> - Number of network trips will be high
>
>
>>
>> Thanks and Regards
>> A.SathishKumar
>
> [1]:
> https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
>
> -- Kamal
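
For what it's worth, a common middle ground is a modest poll timeout plus
KafkaConsumer.wakeup() from the shutting-down thread. A minimal sketch,
assuming an already-subscribed consumer, a `running` flag, and a `handle()`
method (all hypothetical):

  import java.util.concurrent.atomic.AtomicBoolean;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.errors.WakeupException;

  void pollLoop(KafkaConsumer<String, String> consumer, AtomicBoolean running) {
      try {
          while (running.get()) {
              // blocks at most 1s, but returns early as soon as records arrive
              ConsumerRecords<String, String> records = consumer.poll(1000);
              for (ConsumerRecord<String, String> record : records) {
                  handle(record);
              }
          }
      } catch (WakeupException e) {
          // expected on shutdown: wakeup() was called from another thread
      } finally {
          consumer.close();
      }
  }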


On Mon, Aug 1, 2016 at 11:53 AM, sat  wrote:

> Hi,
>
> I am new to Kafka. We are planning to use Kafka messaging for our
> application. I was playing with Kafka 0.9.0.1 and I have the following
> queries. Sorry for asking basic questions.
>
>
> 1) I have instantiated a Kafka consumer and invoked
> consumer.poll(Long.MAX_VALUE). Although I have specified the timeout as
> Long.MAX_VALUE, I observe my consumer fetching records whenever the
> publisher publishes a message to a topic. This makes me wonder whether
> the Kafka consumer is a push or pull mechanism. Please help us understand
> the logic of consumer.poll(timeout).
>
> 2) What are the pros and cons of poll for long timeout vs short timeout.
>
>
> Thanks and Regards
> A.SathishKumar
>
>


-- 
A.SathishKumar
044-24735023


Re: [kafka-clients] [VOTE] 0.10.0.1 RC1

2016-08-03 Thread Ismael Juma
Thanks Manikumar. I filed KAFKA-4018 with the details of when we regressed
as well as a fix.

Ismael

On Wed, Aug 3, 2016 at 4:18 PM, Manikumar Reddy 
wrote:

> Hi,
>
> There are two versions of slf4j-log4j jar in the build. (1.6.1, 1.7.21).
> slf4j-log4j12-1.6.1.jar is coming from streams:examples module.
>
> Thanks,
> Manikumar
>
> On Tue, Aug 2, 2016 at 8:31 PM, Ismael Juma  wrote:
>
>> Hello Kafka users, developers and client-developers,
>>
>> This is the second candidate for the release of Apache Kafka 0.10.0.1.
>> This is a bug fix release and it includes fixes and improvements from 52
>> JIRAs (including a few critical bugs). See the release notes for more
>> details:
>>
>> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc1/RELEASE_NOTES.html
>>
>> When compared to RC0, RC1 contains fixes for two bugs (KAFKA-4008
>> and KAFKA-3950) and a couple of test stabilisation fixes.
>>
>> *** Please download, test and vote by Friday, 5 August, 8am PT ***
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS
>>
>> * Release artifacts to be voted upon (source and binary):
>> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc1/
>>
>> * Maven artifacts to be voted upon:
>> https://repository.apache.org/content/groups/staging
>>
>> * Javadoc:
>> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc1/javadoc/
>>
>> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc1 tag:
>>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=108580e4594d694827c953264969fe1ce2a7
>>
>> * Documentation:
>> http://kafka.apache.org/0100/documentation.html
>>
>> * Protocol:
>> http://kafka.apache.org/0100/protocol.html
>>
>> * Successful Jenkins builds for the 0.10.0 branch:
>> Unit/integration tests: https://builds.apache.org/job/kafka-0.10.0-jdk7/179/
>> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.0/136/
>>
>> Thanks,
>> Ismael
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "kafka-clients" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to kafka-clients+unsubscr...@googlegroups.com.
>> To post to this group, send email to kafka-clie...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/kafka-clients.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/kafka-clients/CAD5tkZaRxAjQbwS_1q4MqskSYKxQWBFmdPVf_PP020bjY9%3DCgQ%40mail.gmail.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>


Re: Kafka DNS Caching in AWS

2016-08-03 Thread Zuber Saiyed
Thank you all for your responses.

So here is what I tried, and it worked, since an EIP is not an option for me:
1) Created an ENI with a dedicated IP
2) Associated that IP with the A-type address
3) Assigned that ENI to the EC2 instance
4) Created an EBS volume to keep ZK data

As EBS volumes and ENIs are bound to an AZ, I have created an Auto Scaling
group per AZ.


Thanks,
Zuber


On Wed, Aug 3, 2016 at 10:22 AM, Zuber  wrote:
> Hello –
>
>
>
> We are planning to use Kafka as Event Store in a system which is being built
> using event sourcing design approach.
>
> Here is how we deployed the cluster in AWS to verify HA in the cloud (in our
> POC we only had 1 topic with 1 partition and 3 replication factor) -
>
> 1)3 ZK servers running in different AZs (managed by Auto Scaling Group)
>
> 2)3 Kafka brokers EC2 running in different AZs (managed by Auto Scaling
> Group)
>
> 3)Kafka logs are stored in EBS volumes
>
> 4)A type addresses are defined for all ZK servers & Kafka brokers in
> Route53
>
> EC2 instance registers its IP for corresponding A type address (in Route53)
> on startup
>
>
>
> But due to a bug in ZKClient (used by the Kafka broker) which caches the ZK
> IP forever, I don’t see any option other than bouncing all brokers.
>
>
>
> One of the Netflix presentations (following links) mentions the issue
> as well as a couple of ZK JIRA defects, but I haven’t found any concrete
> solution yet.
>
> I would really appreciate any help in this regard.
>
>
>
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
>
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-338
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>
> http://grokbase.com/t/kafka/users/131x67h1bt/zookeeper-caching-dns-entries
>
>
>
> Thanks,
>
> Zuber
>
>



-- 
Thanks & Regards,
Zuber Saiyed


Re: Issue with KTable State Store

2016-08-03 Thread Srinidhi Muppalla
Hi Guozhang,

I believe we are using RocksDB. We are not using the Processor API, just simple
map and countByKey functions, so it is using the default KeyValue Store.

Thanks!

Srinidhi

Hello Srinidhi,

Are you using RocksDB as well like in the WordCountDemo for your
aggregation operator?

Guozhang


On Tue, Aug 2, 2016 at 5:20 PM, Srinidhi Muppalla 
wrote:

> Hey All,
>
> We are having issues successfully storing and accessing a Ktable on our
> cluster which happens to be on AWS. We are trying to store a Ktable of
> counts of ’success' and ‘failure’ strings, similar to the WordCountDemo in
> the documentation. The Kafka Streams application that creates the KTable
> works locally, but doesn’t appear be storing the state on our cluster. Does
> anyone have any experience working with Ktables and AWS or knows what
> configs related to our Kafka brokers or Streams setup could be causing this
> failure on our server but not on my local machine? Any insight into what
> could be causing this issue would be helpful.
>
> Here is what the output topic we are writing the Ktable to looks like
> locally:
>
> SUCCESS 1
> SUCCESS 2
> FAILURE 1
> SUCCESS 3
> FAILURE 2
> FAILURE 3
> FAILURE 4.
>
> Here is what it looks like on our cluster:
>
> SUCCESS 1
> SUCCESS 1
> FAILURE 1
> SUCCESS 1
> FAILURE 1
> FAILURE 1
> FAILURE 1.
>
> Thanks,
> Srinidhi
>
>


--
-- Guozhang


Re: Preserve offset in Mirror Maker

2016-08-03 Thread Gwen Shapira
MirrorMaker actually doesn't have a default - it uses what you
configured in the consumer.properties file you use.

Either:
auto.offset.reset = latest (biggest in old versions)
or
auto.offset.reset = earliest (smallest in old versions)

So you can choose whether MirrorMaker starts from the beginning or the end
when it first comes up.

Note that this is just for the first start. On any restart after that, it
should use the checkpointed offsets like normal consumers and not lose data
(especially in 0.9.0 and above).
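
Concretely, that is one line in the consumer config file passed to
MirrorMaker via --consumer.config (a sketch; the rest of the file, group id,
connection settings, etc., is elided):

  # consumer.properties
  auto.offset.reset=earliest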

Gwen

On Wed, Aug 3, 2016 at 2:15 PM, Sunil Parmar  wrote:
> We're using mirror maker to mirror data from one data center to another
> data center (1 to 1). We noticed that by default the mirror maker starts
> from the latest offset. How do we change the mirror maker producer config
> to start from the last checkpointed offset in case of a crash, without
> losing data?
>
> -Sunil
>


Re: Issue with KTable State Store

2016-08-03 Thread Guozhang Wang
Hello Srinidhi,

Are you using RocksDB as well like in the WordCountDemo for your
aggregation operator?

Guozhang


On Tue, Aug 2, 2016 at 5:20 PM, Srinidhi Muppalla 
wrote:

> Hey All,
>
> We are having issues successfully storing and accessing a Ktable on our
> cluster which happens to be on AWS. We are trying to store a Ktable of
> counts of ’success' and ‘failure’ strings, similar to the WordCountDemo in
> the documentation. The Kafka Streams application that creates the KTable
> works locally, but doesn’t appear be storing the state on our cluster. Does
> anyone have any experience working with Ktables and AWS or knows what
> configs related to our Kafka brokers or Streams setup could be causing this
> failure on our server but not on my local machine? Any insight into what
> could be causing this issue would be helpful.
>
> Here is what the output topic we are writing the Ktable to looks like
> locally:
>
> SUCCESS 1
> SUCCESS 2
> FAILURE 1
> SUCCESS 3
> FAILURE 2
> FAILURE 3
> FAILURE 4.
>
> Here is what it looks like on our cluster:
>
> SUCCESS 1
> SUCCESS 1
> FAILURE 1
> SUCCESS 1
> FAILURE 1
> FAILURE 1
> FAILURE 1.
>
> Thanks,
> Srinidhi
>
>


-- 
-- Guozhang


Re: Opening up Kafka JMX port for Kafka Consumer in Kafka Streams app

2016-08-03 Thread Guozhang Wang
Hello Phillip,

You are right that a Kafka Streams instance has one producer and one consumer
client behind the scenes, and hence you can just monitor it like you would a
normal producer / consumer client.

More specifically, you can find the producer metric names as in:

http://docs.confluent.io/3.0.0/kafka/monitoring.html

And Kafka Streams has a few metrics of its own in addition to those of the
consumer and producer clients, for which we have yet to add docs.
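
As a sketch (the port and flags are assumptions, not from this thread): since
a Kafka Streams app is a plain Java application, you can expose its embedded
clients' metrics with the standard remote-JMX JVM flags, and the consumer lag
then shows up under the kafka.consumer:type=consumer-fetch-manager-metrics
MBeans (e.g. the records-lag-max attribute):

  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=9990
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false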


Guozhang


On Tue, Aug 2, 2016 at 11:49 AM, David Garcia  wrote:

> Have you looked at kafka manager: https://github.com/yahoo/kafka-manager
> It provides consumer level metrics.
>
> -David
>
> On 8/2/16, 12:36 PM, "Phillip Mann"  wrote:
>
> Hello all,
>
> This is a bit of a convoluted title but we are trying to set up
> monitoring on our Kafka Cluster and Kafka Streams app.  I currently have
> JMX port open on our Kafka cluster across our brokers.  I am able to use a
> Java JMX client to get certain metrics that are helpful to us.  However,
> the data we want to monitor the most is consumer level JMX metrics.  This
> is made complicated because there is no documentation for what we are
> trying to do.  Essentially we need to expose the JMX port for the Kafka
> Streams Consumer.  Each Kafka Streams app contains a producer and consumer
> (group?).  We want to get the offset lag metrics for the Kafka Streams
> consumer.  Is this possible?  If so, how do we do it?  Thanks for the help!!
>
> Phillip
>
>
>


-- 
-- Guozhang


Preserve offset in Mirror Maker

2016-08-03 Thread Sunil Parmar
We're using mirror maker to mirror data from one data center to another data
center (1 to 1). We noticed that by default the mirror maker starts from the
latest offset. How do we change the mirror maker producer config to start from
the last checkpointed offset in case of a crash, without losing data?

-Sunil



Re: A specific use case

2016-08-03 Thread Guozhang Wang
Hello Hamza,

By saying "broker" I think you are actually referring to a Kafka Streams
instance?


Guozhang

On Mon, Aug 1, 2016 at 1:01 AM, Hamza HACHANI 
wrote:

> Good morning,
>
> I'm working on a specific use case. In fact I'm receiving messages from an
> operator network and trying to do statistics on their number per
> minute, per hour, per day ...
>
> I would like to create a broker that receives the messages and generates a
> message every minute. These produced messages are consumed by a consumer
> on the one hand and also sent to another topic, which receives them and
> generates messages every minute.
>
> I've been doing that for a while without success. In fact the first
> broker, any time it receives a message, produces one and sends it to
> the other topic.
>
> My question is: is what I'm trying to do possible without going through an
> intermediate Java process outside of Kafka?
>
> If yes, how?
>
> Thanks in advance.
>



-- 
-- Guozhang


Unable to write, leader not available

2016-08-03 Thread Bryan Baugher
Hi everyone,

I was trying out Kerberos on Kafka 0.10.0.0 by creating a single-node
cluster. I managed to get everything setup and past all the authentication
errors but whenever I try to use the console producer I get 'Error while
fetching metadata ... LEADER_NOT_AVAILABLE'. In this case I've created the
topic ahead of time (1 replica, 1 partition) and I can see that broker 0 is
in the ISR and is the leader. I have also opened an ACL to the topic for my
user to produce and was previously seeing authentication errors. I
don't see any errors or helpful logs on the broker side even after turning
on debug logging. Turning on debug logging on the client the only thing
that stands out is that it lists the broker as 'node -1' instead of 0. It
does mention the correct hostname/port and that it was able to successfully
connect. Any ideas?

Bryan


Reg: SSL setup

2016-08-03 Thread BigData dev
Hi,
Can you please provide information on a self-signed certificate setup in
Kafka? The Kafka documentation only covers the CA-signed setup:

http://kafka.apache.org/documentation.html#security_ssl


This matters because we need to provide the truststore and keystore
parameters during configuration.

To work with self-signed certificates, do we need to import all nodes'
certificates into the truststore on all machines?

Can you please provide information on this, if you have worked on it.


Thanks,
Bharat


Re: Kafka DNS Caching in AWS

2016-08-03 Thread Alexis Midon
Hi Gwen,

I have explored and tested this approach in the past. It does not work for
2 reasons:
 A. the first one relates to the ZKClient implementation,
 B. the second is the JVM behavior.


A. The ZkConnection [1] managed by ZkClient uses a legacy constructor of
org.apache.zookeeper.ZooKeeper [2]. The created ZooKeeper instance relies on
a StaticHostProvider [3].
This host provider implementation resolves the DNS names on instantiation. So
as soon as the Kafka broker creates its ZkClient instance, all server
addresses are resolved and the corresponding InetSocketAddress instances
will store the IP for their lifetime.  :(

I believe the right thing to do would be for ZKClient to use a custom
HostProvider implementation that creates a new InetSocketAddress instance on
each invocation of `HostProvider#next()` [4] (therefore re-resolving the
address).
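
A minimal sketch of such an implementation (the class name is made up; the
interface is org.apache.zookeeper.client.HostProvider), ignoring the
spin-delay handling a production version would need:

  import java.net.InetSocketAddress;
  import java.util.List;
  import org.apache.zookeeper.client.HostProvider;

  public class ResolvingHostProvider implements HostProvider {
      private final List<InetSocketAddress> servers; // unresolved host:port pairs
      private int index = -1;

      public ResolvingHostProvider(List<InetSocketAddress> servers) {
          if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
          this.servers = servers;
      }

      @Override
      public int size() { return servers.size(); }

      @Override
      public InetSocketAddress next(long spinDelay) {
          index = (index + 1) % servers.size();
          InetSocketAddress server = servers.get(index);
          // a fresh InetSocketAddress triggers a new DNS lookup (modulo JVM caching)
          return new InetSocketAddress(server.getHostString(), server.getPort());
      }

      @Override
      public void onConnected() { /* no-op in this sketch */ }
  }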

You would think this is enough, but no, because the JVM itself caches DNS.


B. When an InetSocketAddress instance resolves a DNS name, the JVM will
cache the value! So even if a dynamic HostProvider implementation is used,
the JVM might return a cached value.
And the default TTL is implementation-specific. If I remember correctly, the
Oracle JVM caches them forever. [5]

So you must also configure the Kafka JVM correctly.

hope it helps,

Alexis

[1]
https://github.com/sgroschupf/zkclient/blob/ec77080a5d7a5d920fa0e8ea5bd5119fb02a06f1/src/main/java/org/I0Itec/zkclient/ZkConnection.java#L69

[2]
https://github.com/apache/zookeeper/blob/a0fcb8ff6c2eece8804ca6c009c175cf8a86335d/src/java/main/org/apache/zookeeper/ZooKeeper.java#L1210

[3]
https://github.com/apache/zookeeper/blob/a0fcb8ff6c2eece8804ca6c009c175cf8a86335d/src/java/main/org/apache/zookeeper/client/StaticHostProvider.java

[4]
https://github.com/apache/zookeeper/blob/a0fcb8ff6c2eece8804ca6c009c175cf8a86335d/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1071

[5]
http://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html
`networkaddress.cache.ttl`: specified in java.security to indicate the
caching policy for successful name lookups from the name service. The
value is specified as an integer indicating the number of seconds to cache
the successful lookup.

A value of -1 indicates "cache forever". The default behavior is to cache
forever when a security manager is installed, and to cache for an
implementation-specific period of time when a security manager is not
installed.
See also `networkaddress.cache.negative.ttl`.
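
So, as a sketch (and it must run before the JVM performs its first lookup),
the Kafka broker's JVM can be told to re-resolve periodically, either in
$JAVA_HOME/lib/security/java.security or programmatically:

  import java.security.Security;

  // cache successful DNS lookups for 30 seconds instead of forever
  Security.setProperty("networkaddress.cache.ttl", "30");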




On Wed, Aug 3, 2016 at 9:45 AM Gwen Shapira  wrote:

> Can you define a DNS name that round-robins to multiple IP addresses?
> This way ZKClient will cache the name and you can rotate IPs behind
> the scenes with no issues?
>
>
>
> On Wed, Aug 3, 2016 at 7:22 AM, Zuber  wrote:
> > Hello –
> >
> > We are planning to use Kafka as Event Store in a system which is being
> built using event sourcing design approach.
> > Here is how we deployed the cluster in AWS to verify HA in the cloud (in
> our POC we only had 1 topic with 1 partition and 3 replication factor) -
> > 1)3 ZK servers running in different AZs (managed by Auto Scaling
> Group)
> > 2)3 Kafka brokers EC2 running in different AZs (managed by Auto
> Scaling Group)
> > 3)Kafka logs are stored in EBS volumes
> > 4)A type addresses are defined for all ZK servers & Kafka brokers in
> Route53
> > EC2 instance registers its IP for corresponding A type address (in
> Route53) on startup
> >
> > But due to a bug in ZKClient (used by the Kafka broker) which caches the
> > ZK IP forever, I don’t see any option other than bouncing all brokers.
> >
> > One of the Netflix presentations (following links) mentions the issue as
> > well as a couple of ZK JIRA defects, but I haven’t found any concrete
> > solution yet.
> > I would really appreciate any help in this regard.
> >
> >
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
> >
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
> > https://issues.apache.org/jira/browse/ZOOKEEPER-338
> > https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> >
> http://grokbase.com/t/kafka/users/131x67h1bt/zookeeper-caching-dns-entries
> >
> > Thanks,
> > Zuber
> >
>


Re: Kafka DNS Caching in AWS

2016-08-03 Thread Joe Lawson
In the past on classic EC2 with an autoscaling group of zookeeper
instances, I've used elastic IPs for my list. There we subscribed an SQS
queue to the autoscaling SNS topic, and when a new instance was brought
online one of the spare IPs was allocated to the instance. It sometimes has
to retry over and over, e.g. if we had 3 EIPs and all were assigned while a
new instance was being brought online. All of the EIPs and SNS topic
subscriptions were automatically created via CloudFormation.

On Wed, Aug 3, 2016 at 12:54 PM, Gian Merlino  wrote:

> Hey Zuber,
>
> Our AWS ZK deployment involves a subnet that is not used for other things,
> fixed private IP addresses, and EBS volumes for ZK data. That way, if a ZK
> instance fails, it can be replaced with another instance with the same IP
> and data volume.
>
> On Wed, Aug 3, 2016 at 7:22 AM, Zuber  wrote:
>
> > Hello –
> >
> > We are planning to use Kafka as Event Store in a system which is being
> > built using event sourcing design approach.
> > Here is how we deployed the cluster in AWS to verify HA in the cloud (in
> > our POC we only had 1 topic with 1 partition and 3 replication factor) -
> > 1)3 ZK servers running in different AZs (managed by Auto Scaling
> Group)
> > 2)3 Kafka brokers EC2 running in different AZs (managed by Auto
> > Scaling Group)
> > 3)Kafka logs are stored in EBS volumes
> > 4)A type addresses are defined for all ZK servers & Kafka brokers in
> > Route53
> > EC2 instance registers its IP for corresponding A type address (in
> > Route53) on startup
> >
> > But due to a bug in ZKClient (used by the Kafka broker) which caches the
> > ZK IP forever, I don’t see any option other than bouncing all brokers.
> >
> > One of the Netflix presentations (following links) mentions the issue
> > as well as a couple of ZK JIRA defects, but I haven’t found any concrete
> > solution yet.
> > I would really appreciate any help in this regard.
> >
> >
> >
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
> >
> >
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
> > https://issues.apache.org/jira/browse/ZOOKEEPER-338
> > https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> >
> http://grokbase.com/t/kafka/users/131x67h1bt/zookeeper-caching-dns-entries
> >
> > Thanks,
> > Zuber
> >
> >
>



-- 
-Joe


Re: Kafka DNS Caching in AWS

2016-08-03 Thread Gian Merlino
Hey Zuber,

Our AWS ZK deployment involves a subnet that is not used for other things,
fixed private IP addresses, and EBS volumes for ZK data. That way, if a ZK
instance fails, it can be replaced with another instance with the same IP
and data volume.

On Wed, Aug 3, 2016 at 7:22 AM, Zuber  wrote:

> Hello –
>
> We are planning to use Kafka as Event Store in a system which is being
> built using event sourcing design approach.
> Here is how we deployed the cluster in AWS to verify HA in the cloud (in
> our POC we only had 1 topic with 1 partition and 3 replication factor) -
> 1)3 ZK servers running in different AZs (managed by Auto Scaling Group)
> 2)3 Kafka brokers EC2 running in different AZs (managed by Auto
> Scaling Group)
> 3)Kafka logs are stored in EBS volumes
> 4)A type addresses are defined for all ZK servers & Kafka brokers in
> Route53
> EC2 instance registers its IP for corresponding A type address (in
> Route53) on startup
>
> But due to a bug in ZKClient (used by the Kafka broker) which caches the ZK
> IP forever, I don’t see any option other than bouncing all brokers.
>
> One of the Netflix presentations (following links) mentions the issue
> as well as a couple of ZK JIRA defects, but I haven’t found any concrete
> solution yet.
> I would really appreciate any help in this regard.
>
>
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
>
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
> https://issues.apache.org/jira/browse/ZOOKEEPER-338
> https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> http://grokbase.com/t/kafka/users/131x67h1bt/zookeeper-caching-dns-entries
>
> Thanks,
> Zuber
>
>


Re: Kafka DNS Caching in AWS

2016-08-03 Thread Gwen Shapira
Can you define a DNS name that round-robins to multiple IP addresses?
This way ZKClient will cache the name and you can rotate IPs behind
the scenes with no issues?



On Wed, Aug 3, 2016 at 7:22 AM, Zuber  wrote:
> Hello –
>
> We are planning to use Kafka as Event Store in a system which is being built 
> using event sourcing design approach.
> Here is how we deployed the cluster in AWS to verify HA in the cloud (in our 
> POC we only had 1 topic with 1 partition and 3 replication factor) -
> 1)3 ZK servers running in different AZs (managed by Auto Scaling Group)
> 2)3 Kafka brokers EC2 running in different AZs (managed by Auto Scaling 
> Group)
> 3)Kafka logs are stored in EBS volumes
> 4)A type addresses are defined for all ZK servers & Kafka brokers in 
> Route53
> EC2 instance registers its IP for corresponding A type address (in Route53) 
> on startup
>
> But due to a bug in ZKClient (used by the Kafka broker) which caches the ZK
> IP forever, I don’t see any option other than bouncing all brokers.
>
> One of the Netflix presentations (following links) mentions the issue as
> well as a couple of ZK JIRA defects, but I haven’t found any concrete
> solution yet.
> I would really appreciate any help in this regard.
>
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
> http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
> https://issues.apache.org/jira/browse/ZOOKEEPER-338
> https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> http://grokbase.com/t/kafka/users/131x67h1bt/zookeeper-caching-dns-entries
>
> Thanks,
> Zuber
>


Re: Kafka In cloud environment

2016-08-03 Thread Gwen Shapira
No. If you want automatic update, you need to use the same broker id.
Many deployments use EBS to store their broker data. The
auto-generated id is stored with the data, so if a broker dies they
install a new machine and connect it to the existing EBS volume and
immediately get both the old id and the old data.

If you use a different broker id, you need to use the
replica-reassignment tool to move replicas to the new broker.

On Tue, Aug 2, 2016 at 6:53 PM, Digumarthi, Prabhakar Venkata Surya
 wrote:
> In case I use automatic brokerId generation, if a broker dies and a new
> broker is added with a different broker id, will the replica set get
> updated automatically?
>


Re: [kafka-clients] [VOTE] 0.10.0.1 RC1

2016-08-03 Thread Manikumar Reddy
Hi,

There are two versions of slf4j-log4j jar in the build. (1.6.1, 1.7.21).
slf4j-log4j12-1.6.1.jar is coming from streams:examples module.

Thanks,
Manikumar

On Tue, Aug 2, 2016 at 8:31 PM, Ismael Juma  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for the release of Apache Kafka 0.10.0.1.
> This is a bug fix release and it includes fixes and improvements from 52
> JIRAs (including a few critical bugs). See the release notes for more
> details:
>
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc1/RELEASE_NOTES.html
>
> When compared to RC0, RC1 contains fixes for two bugs (KAFKA-4008
> and KAFKA-3950) and a couple of test stabilisation fixes.
>
> *** Please download, test and vote by Friday, 5 August, 8am PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging
>
> * Javadoc:
> http://home.apache.org/~ijuma/kafka-0.10.0.1-rc1/javadoc/
>
> * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc1 tag:
>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=108580e4594d694827c953264969fe1ce2a7
>
> * Documentation:
> http://kafka.apache.org/0100/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0100/protocol.html
>
> * Successful Jenkins builds for the 0.10.0 branch:
> Unit/integration tests: https://builds.apache.org/job/kafka-0.10.0-jdk7/179/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.0/136/
>
> Thanks,
> Ismael
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAD5tkZaRxAjQbwS_1q4MqskSYKxQWBFmdPVf_PP020bjY9%3DCgQ%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


Kafka DNS Caching in AWS

2016-08-03 Thread Zuber
Hello –
 
We are planning to use Kafka as Event Store in a system which is being built 
using event sourcing design approach.
Here is how we deployed the cluster in AWS to verify HA in the cloud (in our 
POC we only had 1 topic with 1 partition and 3 replication factor) -
1) 3 ZK servers running in different AZs (managed by Auto Scaling Group)
2) 3 Kafka broker EC2 instances running in different AZs (managed by Auto
Scaling Group)
3) Kafka logs are stored in EBS volumes
4) A-type addresses are defined for all ZK servers & Kafka brokers in Route53

Each EC2 instance registers its IP for its corresponding A-type address (in
Route53) on startup.
 
But due to a bug in ZKClient (used by the Kafka broker) which caches the ZK IP
forever, I don’t see any option other than bouncing all brokers.

One of the Netflix presentations (following links) mentions the issue as well
as a couple of ZK JIRA defects, but I haven’t found any concrete solution yet.
I would really appreciate any help in this regard.
 
http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
http://image.slidesharecdn.com/netflix-kafka-150325105558-conversion-gate01/95/netflix-data-pipeline-with-kafka-36-638.jpg?cb=1427281139
https://issues.apache.org/jira/browse/ZOOKEEPER-338
https://issues.apache.org/jira/browse/ZOOKEEPER-1506
http://grokbase.com/t/kafka/users/131x67h1bt/zookeeper-caching-dns-entries
 
Thanks,
Zuber



Re: Consumer poll in 0.9.0.1 hanging

2016-08-03 Thread Carlos Rodriguez Fernandez
Thank you Ewen for the help. I'll give that a try.
Regards,
Carlos.

On Tue, Aug 2, 2016, 10:12 PM Ewen Cheslack-Postava 
wrote:

> It seems like we definitely shouldn't block indefinitely, but what is
> probably happening is that the consumer is fetching metadata, not finding
> the topic, then getting hung up somewhere.
>
> It probably won't hang indefinitely -- there is a periodic refresh of
> metadata, defaulting to every 5 minutes, which should pick up the topic
> once you have created it. Presumably then it would start returning data.
>
> -Ewen
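
The periodic refresh Ewen mentions is the consumer's metadata.max.age.ms
setting (default 300000 ms, i.e. 5 minutes). As a sketch, and assuming the
extra metadata traffic is acceptable, lowering it makes a late-created topic
get picked up sooner:

  import java.util.Properties;

  Properties props = new Properties();
  props.put("metadata.max.age.ms", "30000"); // refresh metadata every 30s
  // ...plus the usual bootstrap.servers / group.id / deserializer settings
  // before constructing the KafkaConsumer.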
>
> On Mon, Aug 1, 2016 at 4:31 PM, Carlos Rodriguez Fernandez <
> carlosrodrifernan...@gmail.com> wrote:
>
> > Hi,
> > When using Apache Camel Kafka to consume messages, I notice that when the
> > topic is not created the fetching here:
> >
> >org.apache.camel.component.kafka.KafkaConsumer.run..
> >
> > ConsumerRecords records = consumer.poll(Long.MAX_VALUE);
> >
> >
> > ... just hangs forever, even if I create the topic and publish messages.
> > It seems that I need to create the topic *before* the "poll" is invoked;
> > otherwise, the poll call never comes back no matter what I do.
> >
> >
> > Is this the expected behavior?
> >
> >
> > Thank you,
> >
> > Carlos.
> >
>
>
>
> --
> Thanks,
> Ewen
>


Re: Using automatic brokerId generation

2016-08-03 Thread Tom Crayford
You have to run the kafka-reassign-partitions.sh script to move partitions to
a new replica id.
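
For example (topic name, broker id, and ZooKeeper address are made up), the
new assignment is described in a JSON file and handed to the tool:

  # reassign.json:
  #   {"version":1,"partitions":[{"topic":"my-topic","partition":0,"replicas":[1002]}]}

  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
    --reassignment-json-file reassign.json --execute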

On Wed, Aug 3, 2016 at 3:14 AM, Digumarthi, Prabhakar Venkata Surya <
prabhakarvenkatasurya.digumar...@capitalone.com> wrote:

> Hi ,
>
>
> I am right now using kafka version 0.9.1.0
>
> If I choose to enable automatic brokerId generation, and let’s say one of
> my brokers dies and a new broker gets started with a different brokerId.
> Is there a way the new broker id can become part of the replica set of a
> partition automatically?  Or do I need to run the
> kafka-reassign-partitions.sh script for that to happen?
>
> Thanks,
> PD
>
>


Re: Expose Kafka Server Configuration

2016-08-03 Thread Daniel Schierbeck
This would be of value to me, as well. I'm currently not sure how to avoid
having users of ruby-kafka produce messages that exceed that limit when
using an async producer loop – I'd prefer to not allow such a message into
the buffers at all rather than having to deal with it only when there's a
broker error.

On Tue, Aug 2, 2016 at 3:57 PM Chris Barlock  wrote:

> Does Kafka expose its server configuration in any way that I can get it
> programmatically?  Specifically, I'm interested in knowing the
> message.max.bytes value.
>
> Chris
>
>


UNKNOWN_MEMBER_ID

2016-08-03 Thread dhiraj prajapati
Hi,
I am using Kafka 0.9.0.1 and the corresponding Java client for my consumer.
I see the below error in my consumer logs:

o.a.k.c.c.i.ConsumerCoordinator - Error UNKNOWN_MEMBER_ID occurred while
committing offsets for group consumergroup001

Why could this error occur?