Re: Newbie Question

2020-03-28 Thread Colin Ross
Thanks Hans - this makes sense, except that the debug messages give me
exactly what I need without having to instrument any clients. It should be
noted that for now I am running a single server, so perhaps the messages
change when I cluster?
I may have caused confusion by mentioning that I want to know where the
messages go - that is not quite precise from an individual-message
perspective, but it is close enough for what I want to achieve (for now ;-) ).
I just want a record of each IP address and which topic (or something
that can be traced back to a topic) it is connected to, from a high
level, without having to instrument the clients (of which there can be
upwards of 10,000, and over which I have no control or access).
Currently, as I mentioned, the debug messages have exactly what I need for
this phase:
[2020-03-28 20:32:23,901] DEBUG Principal = User:ANONYMOUS is Allowed
Operation = Read from host = x.x.x.x on resource = Topic:LITERAL:
(kafka.authorizer.logger)
I am just figuring there must be a better way of getting this info than
turning on debug.
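For reference, a lighter-weight route than broker-wide DEBUG is to raise only the authorizer logger. This is a sketch against the stock config/log4j.properties that ships with Kafka (the appender name is assumed from that file); "Allowed" authorizer entries are emitted at DEBUG, so only this one logger needs lowering:

```properties
# Sketch for config/log4j.properties (assumes the stock "authorizerAppender"
# that writes kafka-authorizer.log is already defined, as in the shipped file).
# Only the authorizer logger runs at DEBUG; the rest of the broker stays at INFO.
log4j.logger.kafka.authorizer.logger=DEBUG, authorizerAppender
log4j.additivity.kafka.authorizer.logger=false
```

This keeps the per-connection "Principal = ... from host = ..." lines without the firehose of general broker DEBUG output.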



Re: Newbie Question

2020-03-28 Thread Hans Jespersen
I can tell from the terminology you use that you are familiar with traditional
message queue products. Kafka is very different. That's what makes it so
interesting and revolutionary, in my opinion.

Clients do not connect to topics, because Kafka is a distributed, clustered
system where topics are sharded into pieces called partitions, and the topic
partitions are spread out across all the Kafka brokers in the cluster (and also
replicated several more times across the cluster for fault tolerance). When a
client logically connects to a topic, it's actually making many connections to
many nodes in the Kafka cluster, which enables both parallel processing and
fault tolerance.
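To make the sharding concrete, here is a toy sketch (not Kafka's actual implementation, whose DefaultPartitioner hashes the serialized key bytes with murmur2) of how a keyed record deterministically maps to one partition of a topic:

```java
// Toy illustration only: a keyed record always maps to the same partition,
// so per-key ordering is preserved while the topic is spread across brokers.
// Kafka's real DefaultPartitioner uses murmur2 on the serialized key bytes.
public class PartitionSketch {
    public static int partitionFor(String key, int numPartitions) {
        // mask the sign bit so the modulo result is non-negative
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p = partitionFor("order-42", 6);
        // same key -> same partition, every time
        if (p != partitionFor("order-42", 6)) throw new AssertionError();
        if (p < 0 || p >= 6) throw new AssertionError();
        System.out.println("key \"order-42\" -> partition " + p);
    }
}
```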

Also, when a client consumes a message, the message is not removed from a
queue; it remains in Kafka for many days (sometimes months or years). It is not
“taken off the queue”; rather, it is “copied from the commit log”. It can be
consumed again and again if needed, because it is an immutable record of an
event that happened.
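Concretely, retention is governed by time and size settings rather than by consumption; a sketch of the relevant broker knobs (values here are illustrative):

```properties
# Broker-level retention sketch: segments are deleted when they age out or a
# partition exceeds a size cap - never because a consumer has read them.
log.retention.hours=168     # keep data ~7 days
log.retention.bytes=-1      # no size-based cap
# Individual topics can override this with per-topic settings such as retention.ms.
```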

Now, getting back to your question of how to see where messages get consumed
(copied): the reality is that they go many places and can be consumed many
times. This makes tracing and tracking message delivery more difficult, but not
impossible. There are many tools, both open source and commercial, that can
track data from producer to Kafka (with replication) to multiple consumers.
They typically involve taking telemetry from both clients (producers and
consumers) and brokers (all of them, as they act as a cluster) and aggregating
all the data to see the full flow of messages in the system. That's why the
logs may seem overwhelming and why you need to look at the logs of all the
brokers (and perhaps all the clients as well) to get the full picture.

-hans 



Newbie Question

2020-03-28 Thread Colin Ross
Hi All - I just started to use Kafka. Just one thing is driving me nuts. I want
to get logs of each time a publisher or subscriber connects. I am trying to
get just the IP that they connected from and the topic to which they
connected. I have managed to do this by enabling debug in the
kafka-authorizer; however, the number of logs is overwhelming, as is the
update rate (it looks like 2 per second per client).

What I am actually trying to achieve is to understand where messages go, so
I would be more than happy to just see notifications when messages are
actually sent and actually taken off the queue.

Is there a more efficient way of achieving my goal than turning on debug?

Cheers
Rossi


Re: Streams Newbie Question - Deployment and Management of Stream Processors

2020-01-13 Thread Sachin Mittal
I think the literature from Confluent/ASF, together with the community support
here, is the best way to learn about streaming.



Re: Streams Newbie Question - Deployment and Management of Stream Processors

2020-01-13 Thread M. Manna
Hey Sachin,

On Mon, 13 Jan 2020 at 05:12, Sachin Mittal  wrote:

> Hi,
> The way I have used streams processing in past; use case to process streams
> is when you have a continuous stream of data which needs to be processed
> and used by certain applications.
> Since in kafka streams can be a simple java application, this application
> can run in its own JVM which is different from say actual client
> application.
> It can be on same physical or virtual machine, but some degree of
> separation is best.
>
> Regarding streams the way I look at it that, it is some continuous process
> whose data downstream is used by micro services.
> The downstream data can be stored using stream's state stores or can be
> some external data store (say mongodb, cassandra, etc).
>

I totally get your point. My understanding has been the same too. Stream
processing is all about honouring what a stream is all about - stateless,
non-interfering (almost), and side-effect free.
Also, even though the terminal result from a stream topology can be stored,
it may be needed for decision making only. So storage is one usage (amongst
many).

Thanks a lot for clarifying. I shall continue my endeavour to learn other
things. Apart from the Confluent and ASF examples, do you recommend anything
else for starters?

Regards,



Re: Streams Newbie Question - Deployment and Management of Stream Processors

2020-01-12 Thread Sachin Mittal
Hi,
The way I have used stream processing in the past: the use case for processing
streams is when you have a continuous stream of data which needs to be
processed and used by certain applications.
Since a Kafka Streams app can be a simple Java application, it can run in its
own JVM, separate from, say, the actual client application.
It can be on the same physical or virtual machine, but some degree of
separation is best.

Regarding streams, the way I look at it, a stream is some continuous process
whose downstream data is used by microservices.
The downstream data can be stored using the stream's state stores or in some
external data store (say MongoDB, Cassandra, etc.).
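As a sketch of that separation (plain Java, deliberately not the Kafka Streams API): a long-running processor owns a state store that downstream services query, while the client application stays a separate process:

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a streams processor: it consumes records continuously and
// maintains a "state store" (word counts) that a separate microservice reads.
public class WordCountProcessor {
    private final Map<String, Long> stateStore = new HashMap<>();

    public void process(String line) {
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) stateStore.merge(w, 1L, Long::sum);
        }
    }

    // what a downstream service would query (or read from an external store)
    public long count(String word) {
        return stateStore.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        WordCountProcessor p = new WordCountProcessor();
        p.process("all streams lead to kafka");
        p.process("hello kafka streams");
        System.out.println("streams -> " + p.count("streams")); // prints: streams -> 2
    }
}
```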

Hope it answers some of your questions.

Thanks
Sachin





Streams Newbie Question - Deployment and Management of Stream Processors

2020-01-12 Thread M. Manna
Hello,

Even though I have been using Kafka for a while, it's primarily for
publish/subscribe event messaging (and I understand that reasonably well).
But I would like to do more with streams.

As a first step, I have been going through the code in the "examples" folder.
I apologise in advance for such newbie questions.

With reference to WordCountDemo.java - I wanted to understand something
related to integrating a stream processor with business applications (i.e.
clients). Is it good practice to always keep the stream processor topology
separate from the actual client application that uses the processed data?

My understanding (from what I can see at first glance) is that multiple
streams.start() calls need careful observation for scaling up/out in the long
term. To separate concerns, I would expect this to be deployed separately
(maybe as microservices?). But again, I am just entering this world of
streams, so I could really use some insight into how some of you have tackled
this over the years.

Kindest Regards,


Re: [External] Re: Newbie question using Kafka Producers in Web apps

2019-01-25 Thread Tauzell, Dave
We are using both, and we are leaning towards a web service fronting Kafka
because it gives us the ability to centralize other logic. That said, I don't
think the web service will be much more "stable", and you'll need to consider
what to do with your audit records if the web service call fails.

-Dave
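The last point above - deciding what happens to an audit record when the send fails - can be sketched with a bounded fallback buffer (illustrative only; the "sender" could be the REST call or a Kafka producer):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Sketch: records whose send fails are parked in a bounded queue for a later
// retry; when the queue is full the oldest record is dropped so memory stays
// bounded. Whether dropping or blocking is acceptable is an application choice.
public class AuditBuffer {
    private final Deque<String> pending = new ArrayDeque<>();
    private final int capacity;

    public AuditBuffer(int capacity) { this.capacity = capacity; }

    public void send(String record, Predicate<String> sender) {
        if (!sender.test(record)) {          // sender returns false on failure
            if (pending.size() == capacity) pending.removeFirst();
            pending.addLast(record);
        }
    }

    public int pendingCount() { return pending.size(); }
}
```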



Re: Newbie question using Kafka Producers in Web apps

2019-01-24 Thread Michael Eugene
I don’t feel it would be a big hit to performance, because Kafka works very
fast; I think the speed difference would be negligible. Why are you worried
about stability? I’m just curious, because it doesn’t seem like it would be
unstable - but maybe it would be a bit overkill for one app, and some
decoupling might make sense.

Sent from my iPhone



Newbie question using Kafka Producers in Web apps

2019-01-24 Thread Raghavendran Chellappa
Hi All,
We have a Spring based web app.

We are planning to build an 'Audit Tracking' feature and plan to use Kafka
- as a sink for storing Audit messages (which will then be consumed and
persisted to a common DB).



We are planning to build a simple, ‘pass-through’ REST service which will
take a JSON and push it into the appropriate Kafka topic.

This REST service will be called from various pages (from server side) in
the web app (during Create, View, Edit actions) to store the Audit entries.



My question is: can we have Kafka producers directly in the web app, so that
we post messages to the Kafka topic directly (instead of going through a web
service)?

Will adding a Kafka producer to the web app make the app less stable (make
pages less performant)? This is one of the reasons why we want to hide the
Kafka producer complexity behind the web service. We also feel that this web
service can be a starting point for a generic “Auditing service” that can be
used by other applications in the enterprise in the future.



I think the ‘pass-through’ web service is not required and it is OK to push
messages directly from the web app to Kafka (but I am unable to point to any
examples of this being done, or any benefits of doing so).



What do you think?



Thanks,

Ragha


Re: Newbie question: kerberos with Kafka and zookeeper

2017-02-17 Thread Stephane Maarek
Hi

It seems that your keytab doesn't contain the principal you configured your
"Client" section to use. Post your JAAS config here if you want further help,
but basically you should be able to do:

kinit -V -k -t <keytab> <principal>
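For reference, the "Client" section being discussed lives in the JAAS file passed to the broker via -Djava.security.auth.login.config; a sketch (the keytab path and principal name here are placeholders):

```
Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka.keytab"
    principal="kafka/broker1.example.com@EXAMPLE.COM";
};
```

The principal named here must actually exist in the keytab, which is what the kinit check verifies.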



Newbie question: kerberos with Kafka and zookeeper

2017-02-17 Thread Raghav
Hi

I am trying to create a simple setup with one Kafka broker and ZooKeeper on
the same VM, plus one producer and one consumer on each VM. I have set up a
KDC on a CentOS VM.

I am trying to follow this guide:
http://docs.confluent.io/2.0.0/kafka/sasl.html#kerberos

When I start Kafka, it errors out with the following error. Do I need to set
up anything on the ZooKeeper side as well to fix these errors?

Thanks.



[2017-02-16 19:05:00,583] WARN Could not login: the client is being asked
for a password, but the Zookeeper client code does not currently support
obtaining a password from the user. Make sure that the client is configured
to use a ticket cache (using the JAAS configuration setting
'useTicketCache=true)' and restart the client. If you still get this
message after that, the TGT in the ticket cache has expired and must be
manually refreshed. To do so, first determine if you are using a password
or a keytab. If the former, run kinit in a Unix shell in the environment of
the user who is running this Zookeeper client using the command 'kinit
<princ>' (where <princ> is the name of the client's Kerberos principal). If
the latter, do 'kinit -k -t <keytab> <princ>' (where <princ> is the name of
the Kerberos principal, and <keytab> is the location of the keytab file).
After manually refreshing your cache, restart this client. If you continue
to see this message after manually refreshing your cache, ensure that your
KDC host's clock is in sync with this host's clock.
(org.apache.zookeeper.client.ZooKeeperSaslClient)

[2017-02-16 19:05:00,584] WARN SASL configuration failed:
javax.security.auth.login.LoginException: No password provided Will
continue connection to Zookeeper server without SASL authentication, if
Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn)

[2017-02-16 19:05:00,585] INFO Opening socket connection to server
kafka2.example.com/172.26.230.67:2181 (org.apache.zookeeper.ClientCnxn)

[2017-02-16 19:05:00,585] INFO zookeeper state changed (AuthFailed)
(org.I0Itec.zkclient.ZkClient)

[2017-02-16 19:05:00,586] INFO Terminate ZkClient event thread.
(org.I0Itec.zkclient.ZkEventThread)

[2017-02-16 19:05:00,591] INFO Socket connection established to
kafka2.example.com/172.26.230.67:2181, initiating session
(org.apache.zookeeper.ClientCnxn)

[2017-02-16 19:05:00,597] INFO Session establishment complete on server
kafka2.example.com/172.26.230.67:2181, sessionid = 0x15a4a0678610002,
negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)

[2017-02-16 19:05:00,599] INFO Session: 0x15a4a0678610002 closed
(org.apache.zookeeper.ZooKeeper)

[2017-02-16 19:05:00,599] FATAL Fatal error during KafkaServer startup.
Prepare to shutdown (kafka.server.KafkaServer)
org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
    at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:946)
    at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:923)
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1230)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:156)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:130)
    at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:76)
    at kafka.utils.ZkUtils$.apply(ZkUtils.scala:58)
    at kafka.server.KafkaServer.initZk(KafkaServer.scala:327)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:200)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:39)
    at kafka.Kafka$.main(Kafka.scala:67)
    at kafka.Kafka.main(Kafka.scala)


-- 
Raghav


Re: Kafka Newbie question

2016-04-13 Thread R Krishna
Also a newbie, using 0.9.0.1: I think you meant
auto.offset.reset=earliest. Did the OP intend to use his own commit
strategy/management by setting enable.auto.commit=false?

With auto.offset.reset=earliest, a "new" consumer will get the earliest
partition offsets, commit them, and start consuming; going forward, it will
use whatever was committed to continue where it left off.

You could also do this programmatically:

    consumer.subscribe(Collections.singletonList(KafkaProperties.topic));
    consumer.poll(0);           // first poll updates the consumer's current assignment

    consumer.seekToBeginning(); // then seek to the beginning of the assigned partitions
    consumer.poll(0);

Feel free to correct me.

On Tue, Apr 12, 2016 at 10:10 PM, Pradeep Bhattiprolu 
wrote:

> I tried both the approaches stated above, with no luck :(.
> Let me give concrete examples of what i am trying to achieve here :
>
> 1) Kafka Producer adds multiple JSON messages to a particular topic in the
> message broker (done, this part works)
> 2) I want to have multiple consumers, identified under a single consumer
> group, to read these messages and perform an index into a search engine. To
> have multiple consumers, I have created a process which spawns multiple Java
> threads.
> 3) While creating these threads and adding them to the pool, I select one
> thread as a "leader" thread. The only difference between a leader thread
> and other threads is that the "leader" thread sets the offset to beginning
> by calling Consumer.seekToBeginning() (with no parameters).
> 4) Other threads pick out messages based on the last committed offsets of
> the "leader" thread and other consumer threads.
>
> The idea is to ensure that every group of consumer threads reading a single
> topic always reads from the start of the topic. This is a fair assumption to
> make within my application: each consumer group should read from the start.
>
> PS: I am using the new KafkaConsumer class and the latest 0.9.0.0 version
> of the kafka-clients dependency in my project.
>
> Any help / code samples to help me move forward are highly appreciated.
>
> Thanks
> Pradeep
>
> On Sun, Apr 10, 2016 at 2:43 AM, Liquan Pei  wrote:
>
> > Hi Predeep,
> >
> > I think I misinterpreted your question. Are you trying to consume a topic
> > multiple times for each consumer instance or consume one topic with
> > multiple consumer instances?
> >
> > In case you want to consume a topic multiple times with one consumer
> > instance, `seekToBeginning` will reset the offset to the beginning in the
> > next `poll`.
> >
> > In case you want each thread to consume the same topic multiple times,
> > you need to use multiple consumer groups. Otherwise, only one consumer
> > instance will be consuming the topic.
> >
> > Thanks,
> > Liquan
> >
> >
> >
> > On Sun, Apr 10, 2016 at 12:21 AM, Harsha  wrote:
> >
> > > Pradeep,
> > > How about
> > >
> > >
> >
> https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning%28org.apache.kafka.common.TopicPartition...%29
> > >
> > > -Harsha
> > >
> > > On Sat, Apr 9, 2016, at 09:48 PM, Pradeep Bhattiprolu wrote:
> > > > Liquan , thanks for the response.
> > > > By setting the auto commit to false do i have to manage queue offset
> > > > manually ?
> > > > I am running a multiple threads with each thread being a consumer, it
> > > > would
> > > > be complicated to manage offsets across threads, if i dont use
> kafka's
> > > > automatic consumer group abstraction.
> > > >
> > > > Thanks
> > > > Pradeep
> > > >
> > > > On Sat, Apr 9, 2016 at 3:12 AM, Liquan Pei 
> > wrote:
> > > >
> > > > > Hi Pradeep,
> > > > >
> > > > > Can you try to set enable.auto.commit = false if you want to read
> to
> > > the
> > > > > earliest offset? According to the documentation, auto.offset.reset
> > > controls
> > > > > what to do when there is no initial offset in Kafka or if the
> current
> > > > > offset does not exist any more on the server (e.g. because that
> data
> > > has
> > > > > been deleted). In case that auto commit is enabled, the committed
> > > offset is
> > > > > available in some servers.
> > > > >
> > > > > Thanks,
> > > > > Liquan
> > > > >
> > > > > On Fri, Apr 8, 2016 at 10:44 PM, Pradeep Bhattiprolu <
> > > pbhatt...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All
> > > > > >
> > > > > > I am a newbie to kafka. I am using the new Consumer API in a
> thread
> > > > > acting
> > > > > > as a consumer for a topic in Kafka.
> > > > > > For my testing and other purposes I have read the queue multiple
> > > times
> > > > > > using console-consumer.sh script of kafka.
> > > > > >
> > > > > > To start reading the message from the beginning in my java code
> , I
> > > have
> > > > > > set the value of the auto.offset.reset to "earliest".
> > > > > >
> > > > > > However that property does not guarantee that i begin reading the
> > > > > > messages from start, it goes by the most recent smallest offset for
> > > > > > the consumer group.

Re: Kafka Newbie question

2016-04-12 Thread Pradeep Bhattiprolu
I tried both the approaches stated above, with no luck :(.
Let me give concrete examples of what I am trying to achieve here:

1) A Kafka producer adds multiple JSON messages to a particular topic on the
broker (done, this part works).
2) I want multiple consumers, identified under a single consumer group, to
read these messages and index them into a search engine. To have multiple
consumers, I have created a process which spawns multiple Java threads.
3) While creating these threads and adding them to the pool, I select one
thread as a "leader" thread. The only difference between the leader thread
and the other threads is that the leader calls Consumer.seekToBeginning()
(with no parameters) to reset the offset to the beginning.
4) The other threads pick up messages based on the last committed offsets of
the leader thread and the other consumer threads.

The idea is to ensure that every group of consumer threads reading a single
topic always reads from the start of the topic. Within my application it is
a fair assumption that each consumer group should read from the start.

PS: I am using the new KafkaConsumer class and the latest 0.9.0.0 version
of the kafka-clients dependency in my project.

Any help / code samples to help me move forward are highly appreciated.

Thanks
Pradeep
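
For what it's worth, the seekToBeginning approach Harsha links below can be applied per thread instead of through a leader: each consumer subscribes with a ConsumerRebalanceListener and seeks to the beginning of exactly the partitions it is assigned, whenever they are assigned. A rough, untested sketch against the 0.9.0.0 API (the bootstrap servers, group id, and topic name are placeholders; in 0.9 seekToBeginning takes varargs TopicPartition):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class FromBeginningConsumer implements Runnable {
    public void run() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "indexer-group");           // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"), // placeholder topic
            new ConsumerRebalanceListener() {
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Seek only the partitions this thread actually owns;
                    // 0.9.0.0 exposes seekToBeginning(TopicPartition...).
                    consumer.seekToBeginning(
                        partitions.toArray(new TopicPartition[partitions.size()]));
                }
            });
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                // index record.value() into the search engine here
            }
        }
    }
}
```

With this, no thread needs to know which partitions the others hold, so no leader election or cross-thread offset coordination is required.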

On Sun, Apr 10, 2016 at 2:43 AM, Liquan Pei  wrote:

> Hi Pradeep,
>
> I think I misinterpreted your question. Are you trying to consume a topic
> multiple times for each consumer instance or consume one topic with
> multiple consumer instances?
>
> In case that you want to consume a topic multiple times with one consumer
> instance, `seekToBeginning`` will reset the offset to the beginning in the
> next ``poll``.
>
> In case that you want each thread to consume the same topic multiple times,
> you need to use multiple consumer groups. Otherwise, only one consumer
> instance will be consuming the topic.
>
> Thanks,
> Liquan
>
>
>
> On Sun, Apr 10, 2016 at 12:21 AM, Harsha  wrote:
>
> > Pradeep,
> > How about
> >
> >
> https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning%28org.apache.kafka.common.TopicPartition...%29
> >
> > -Harsha
> >
> > On Sat, Apr 9, 2016, at 09:48 PM, Pradeep Bhattiprolu wrote:
> > > Liquan , thanks for the response.
> > > By setting the auto commit to false do i have to manage queue offset
> > > manually ?
> > > I am running a multiple threads with each thread being a consumer, it
> > > would
> > > be complicated to manage offsets across threads, if i dont use kafka's
> > > automatic consumer group abstraction.
> > >
> > > Thanks
> > > Pradeep
> > >
> > > On Sat, Apr 9, 2016 at 3:12 AM, Liquan Pei 
> wrote:
> > >
> > > > Hi Pradeep,
> > > >
> > > > Can you try to set enable.auto.commit = false if you want to read to
> > the
> > > > earliest offset? According to the documentation, auto.offset.reset
> > controls
> > > > what to do when there is no initial offset in Kafka or if the current
> > > > offset does not exist any more on the server (e.g. because that data
> > has
> > > > been deleted). In case that auto commit is enabled, the committed
> > offset is
> > > > available in some servers.
> > > >
> > > > Thanks,
> > > > Liquan
> > > >
> > > > On Fri, Apr 8, 2016 at 10:44 PM, Pradeep Bhattiprolu <
> > pbhatt...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All
> > > > >
> > > > > I am a newbie to kafka. I am using the new Consumer API in a thread
> > > > acting
> > > > > as a consumer for a topic in Kafka.
> > > > > For my testing and other purposes I have read the queue multiple
> > times
> > > > > using console-consumer.sh script of kafka.
> > > > >
> > > > > To start reading the message from the beginning in my java code , I
> > have
> > > > > set the value of the auto.offset.reset to "earliest".
> > > > >
> > > > > However that property does not guarantee that i begin reading the
> > > > messages
> > > > > from start, it goes by the most recent smallest offset for the
> > consumer
> > > > > group.
> > > > >
> > > > > Here is my question,
> > > > > Is there a assured way of starting to read the messages from
> > beginning
> > > > from
> > > > > Java based Kafka Consumer ?
> > > > > Once I reset one of my consumers to zero, do i have to do offset
> > > > management
> > > > > myself for other consumer threads or does kafka automatically lower
> > the
> > > > > offset to the first threads read offset ?
> > > > >
> > > > > Any information / material pointing to the solution are highly
> > > > appreciated.
> > > > >
> > > > > Thanks
> > > > > Pradeep
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Liquan Pei
> > > > Software Engineer, Confluent Inc
> > > >
> >
>
>
>
> --
> Liquan Pei
> Software Engineer, Confluent Inc
>


Re: Kafka Newbie question

2016-04-10 Thread Liquan Pei
Hi Pradeep,

I think I misinterpreted your question. Are you trying to consume a topic
multiple times for each consumer instance or consume one topic with
multiple consumer instances?

In case you want to consume a topic multiple times with one consumer
instance, `seekToBeginning` will reset the offset to the beginning before the
next `poll`.

In case you want each thread to consume the same topic multiple times,
you need to use multiple consumer groups. Otherwise, only one consumer
instance will be consuming the topic.

Thanks,
Liquan



On Sun, Apr 10, 2016 at 12:21 AM, Harsha  wrote:

> Pradeep,
> How about
>
> https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning%28org.apache.kafka.common.TopicPartition...%29
>
> -Harsha
>
> On Sat, Apr 9, 2016, at 09:48 PM, Pradeep Bhattiprolu wrote:
> > Liquan , thanks for the response.
> > By setting the auto commit to false do i have to manage queue offset
> > manually ?
> > I am running a multiple threads with each thread being a consumer, it
> > would
> > be complicated to manage offsets across threads, if i dont use kafka's
> > automatic consumer group abstraction.
> >
> > Thanks
> > Pradeep
> >
> > On Sat, Apr 9, 2016 at 3:12 AM, Liquan Pei  wrote:
> >
> > > Hi Pradeep,
> > >
> > > Can you try to set enable.auto.commit = false if you want to read to
> the
> > > earliest offset? According to the documentation, auto.offset.reset
> controls
> > > what to do when there is no initial offset in Kafka or if the current
> > > offset does not exist any more on the server (e.g. because that data
> has
> > > been deleted). In case that auto commit is enabled, the committed
> offset is
> > > available in some servers.
> > >
> > > Thanks,
> > > Liquan
> > >
> > > On Fri, Apr 8, 2016 at 10:44 PM, Pradeep Bhattiprolu <
> pbhatt...@gmail.com>
> > > wrote:
> > >
> > > > Hi All
> > > >
> > > > I am a newbie to kafka. I am using the new Consumer API in a thread
> > > acting
> > > > as a consumer for a topic in Kafka.
> > > > For my testing and other purposes I have read the queue multiple
> times
> > > > using console-consumer.sh script of kafka.
> > > >
> > > > To start reading the message from the beginning in my java code , I
> have
> > > > set the value of the auto.offset.reset to "earliest".
> > > >
> > > > However that property does not guarantee that i begin reading the
> > > messages
> > > > from start, it goes by the most recent smallest offset for the
> consumer
> > > > group.
> > > >
> > > > Here is my question,
> > > > Is there a assured way of starting to read the messages from
> beginning
> > > from
> > > > Java based Kafka Consumer ?
> > > > Once I reset one of my consumers to zero, do i have to do offset
> > > management
> > > > myself for other consumer threads or does kafka automatically lower
> the
> > > > offset to the first threads read offset ?
> > > >
> > > > Any information / material pointing to the solution are highly
> > > appreciated.
> > > >
> > > > Thanks
> > > > Pradeep
> > > >
> > >
> > >
> > >
> > > --
> > > Liquan Pei
> > > Software Engineer, Confluent Inc
> > >
>



-- 
Liquan Pei
Software Engineer, Confluent Inc


Re: Kafka Newbie question

2016-04-10 Thread Harsha
Pradeep,
How about

https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning%28org.apache.kafka.common.TopicPartition...%29

-Harsha

On Sat, Apr 9, 2016, at 09:48 PM, Pradeep Bhattiprolu wrote:
> Liquan , thanks for the response.
> By setting the auto commit to false do i have to manage queue offset
> manually ?
> I am running a multiple threads with each thread being a consumer, it
> would
> be complicated to manage offsets across threads, if i dont use kafka's
> automatic consumer group abstraction.
> 
> Thanks
> Pradeep
> 
> On Sat, Apr 9, 2016 at 3:12 AM, Liquan Pei  wrote:
> 
> > Hi Pradeep,
> >
> > Can you try to set enable.auto.commit = false if you want to read to the
> > earliest offset? According to the documentation, auto.offset.reset controls
> > what to do when there is no initial offset in Kafka or if the current
> > offset does not exist any more on the server (e.g. because that data has
> > been deleted). In case that auto commit is enabled, the committed offset is
> > available in some servers.
> >
> > Thanks,
> > Liquan
> >
> > On Fri, Apr 8, 2016 at 10:44 PM, Pradeep Bhattiprolu 
> > wrote:
> >
> > > Hi All
> > >
> > > I am a newbie to kafka. I am using the new Consumer API in a thread
> > acting
> > > as a consumer for a topic in Kafka.
> > > For my testing and other purposes I have read the queue multiple times
> > > using console-consumer.sh script of kafka.
> > >
> > > To start reading the message from the beginning in my java code , I have
> > > set the value of the auto.offset.reset to "earliest".
> > >
> > > However that property does not guarantee that i begin reading the
> > messages
> > > from start, it goes by the most recent smallest offset for the consumer
> > > group.
> > >
> > > Here is my question,
> > > Is there a assured way of starting to read the messages from beginning
> > from
> > > Java based Kafka Consumer ?
> > > Once I reset one of my consumers to zero, do i have to do offset
> > management
> > > myself for other consumer threads or does kafka automatically lower the
> > > offset to the first threads read offset ?
> > >
> > > Any information / material pointing to the solution are highly
> > appreciated.
> > >
> > > Thanks
> > > Pradeep
> > >
> >
> >
> >
> > --
> > Liquan Pei
> > Software Engineer, Confluent Inc
> >


Re: Kafka Newbie question

2016-04-09 Thread Pradeep Bhattiprolu
Liquan, thanks for the response.
By setting auto commit to false, do I have to manage the offsets
manually?
I am running multiple threads with each thread being a consumer; it would
be complicated to manage offsets across threads if I don't use Kafka's
automatic consumer group abstraction.

Thanks
Pradeep

On Sat, Apr 9, 2016 at 3:12 AM, Liquan Pei  wrote:

> Hi Pradeep,
>
> Can you try to set enable.auto.commit = false if you want to read to the
> earliest offset? According to the documentation, auto.offset.reset controls
> what to do when there is no initial offset in Kafka or if the current
> offset does not exist any more on the server (e.g. because that data has
> been deleted). In case that auto commit is enabled, the committed offset is
> available in some servers.
>
> Thanks,
> Liquan
>
> On Fri, Apr 8, 2016 at 10:44 PM, Pradeep Bhattiprolu 
> wrote:
>
> > Hi All
> >
> > I am a newbie to kafka. I am using the new Consumer API in a thread
> acting
> > as a consumer for a topic in Kafka.
> > For my testing and other purposes I have read the queue multiple times
> > using console-consumer.sh script of kafka.
> >
> > To start reading the message from the beginning in my java code , I have
> > set the value of the auto.offset.reset to "earliest".
> >
> > However that property does not guarantee that i begin reading the
> messages
> > from start, it goes by the most recent smallest offset for the consumer
> > group.
> >
> > Here is my question,
> > Is there a assured way of starting to read the messages from beginning
> from
> > Java based Kafka Consumer ?
> > Once I reset one of my consumers to zero, do i have to do offset
> management
> > myself for other consumer threads or does kafka automatically lower the
> > offset to the first threads read offset ?
> >
> > Any information / material pointing to the solution are highly
> appreciated.
> >
> > Thanks
> > Pradeep
> >
>
>
>
> --
> Liquan Pei
> Software Engineer, Confluent Inc
>


Re: Kafka Newbie question

2016-04-09 Thread Liquan Pei
Hi Pradeep,

Can you try setting enable.auto.commit = false if you want to read from the
earliest offset? According to the documentation, auto.offset.reset controls
what to do when there is no initial offset in Kafka or if the current
offset no longer exists on the server (e.g. because that data has been
deleted). If auto commit is enabled, a committed offset may already be
stored on the servers, in which case auto.offset.reset does not apply.

Thanks,
Liquan
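
As a concrete sketch of that configuration (property names come from the 0.9 consumer config; the group id and bootstrap servers are placeholders): with enable.auto.commit=false nothing is committed on the group's behalf, so a fresh group has no stored offset and auto.offset.reset=earliest takes effect on its first poll.

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    // Builds the property set discussed above; group id is a placeholder.
    public static Properties fromBeginningProps(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", groupId);
        // No automatic commits, so no stored offset is created behind your back.
        props.put("enable.auto.commit", "false");
        // Consulted only when the group has no committed offset for a partition.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```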

On Fri, Apr 8, 2016 at 10:44 PM, Pradeep Bhattiprolu 
wrote:

> Hi All
>
> I am a newbie to kafka. I am using the new Consumer API in a thread acting
> as a consumer for a topic in Kafka.
> For my testing and other purposes I have read the queue multiple times
> using console-consumer.sh script of kafka.
>
> To start reading the message from the beginning in my java code , I have
> set the value of the auto.offset.reset to "earliest".
>
> However that property does not guarantee that i begin reading the messages
> from start, it goes by the most recent smallest offset for the consumer
> group.
>
> Here is my question,
> Is there a assured way of starting to read the messages from beginning from
> Java based Kafka Consumer ?
> Once I reset one of my consumers to zero, do i have to do offset management
> myself for other consumer threads or does kafka automatically lower the
> offset to the first threads read offset ?
>
> Any information / material pointing to the solution are highly appreciated.
>
> Thanks
> Pradeep
>



-- 
Liquan Pei
Software Engineer, Confluent Inc


Kafka Newbie question

2016-04-08 Thread Pradeep Bhattiprolu
Hi All

I am a newbie to Kafka. I am using the new Consumer API in a thread acting
as a consumer for a topic in Kafka.
For my testing and other purposes, I have read the topic multiple times
using Kafka's console-consumer.sh script.

To start reading messages from the beginning in my Java code, I have
set the value of auto.offset.reset to "earliest".

However, that property does not guarantee that I begin reading the messages
from the start; it goes by the most recent committed offset for the consumer
group.

Here is my question:
Is there an assured way of starting to read the messages from the beginning
from a Java-based Kafka consumer?
Once I reset one of my consumers to zero, do I have to do offset management
myself for the other consumer threads, or does Kafka automatically lower the
offset to the first thread's read offset?

Any information / material pointing to the solution are highly appreciated.

Thanks
Pradeep
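
To make the behavior described above concrete, here is a toy in-memory model (purely illustrative, not Kafka code) of how a broker picks a consumer group's starting position: a committed offset always wins, and auto.offset.reset is consulted only when the group has never committed one. This is why "earliest" alone does not rewind a group that already has committed offsets.

```java
import java.util.HashMap;
import java.util.Map;

public class StartingOffsetModel {
    // group -> committed offset for a single partition (toy model)
    private final Map<String, Long> committed = new HashMap<>();
    private final long logStart = 0L;
    private final long logEnd;   // offset of the next message to be written

    public StartingOffsetModel(long logEnd) {
        this.logEnd = logEnd;
    }

    public void commit(String group, long offset) {
        committed.put(group, offset);
    }

    // Mirrors the documented rule: a committed offset always wins;
    // auto.offset.reset applies only when the group has none.
    public long startingOffset(String group, String autoOffsetReset) {
        Long c = committed.get(group);
        if (c != null) {
            return c;
        }
        return "earliest".equals(autoOffsetReset) ? logStart : logEnd;
    }
}
```

A fresh group with "earliest" starts at the log start, a fresh group with "latest" starts at the log end, and a group that has committed resumes from its committed offset regardless of the setting.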