Re: Consumer questions

2015-01-17 Thread Manikumar Reddy
AFAIK, we cannot replay the messages with the high level consumer. We need
to use the simple consumer.
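
If you do go the SimpleConsumer route, here is a minimal sketch of finding
the earliest retained offset and fetching from it, adapted from the 0.8.0
SimpleConsumer Example wiki page linked below in this thread. The broker
host, port, topic, partition, and client id are placeholders:

    import java.util.Collections;
    import kafka.api.FetchRequestBuilder;
    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    SimpleConsumer consumer =
        new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "replayClient");

    // Ask the broker for the earliest offset still on disk for this
    // partition; replaying from here re-reads the whole retained log.
    TopicAndPartition tp = new TopicAndPartition("myTopic", 0);
    kafka.javaapi.OffsetRequest offsetRequest = new kafka.javaapi.OffsetRequest(
        Collections.singletonMap(tp, new PartitionOffsetRequestInfo(
            kafka.api.OffsetRequest.EarliestTime(), 1)),
        kafka.api.OffsetRequest.CurrentVersion(), "replayClient");
    OffsetResponse offsets = consumer.getOffsetsBefore(offsetRequest);
    long readOffset = offsets.offsets("myTopic", 0)[0];

    // Fetch from that offset; the last argument (fetch size) is in bytes.
    kafka.api.FetchRequest fetch = new FetchRequestBuilder()
        .clientId("replayClient")
        .addFetch("myTopic", 0, readOffset, 100000)
        .build();
    FetchResponse response = consumer.fetch(fetch);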

On Sun, Jan 18, 2015 at 12:15 AM, Christopher Piggott cpigg...@gmail.com
wrote:

 Thanks.  That helped clear a lot up in my mind.

 I'm trying the high-level consumer now.  Occasionally I need to do a
 replay of the stream.  The example is:

    KafkaStream.iterator();

 which starts wherever zookeeper recorded that you left off.

 With the high level interface, can you request an iterator that starts at
 the very beginning?



 On Fri, Jan 16, 2015 at 8:55 PM, Manikumar Reddy ku...@nmsworks.co.in
 wrote:

  Hi,
 
  1. In SimpleConsumer, you must keep track of the offsets in your
  application. In the example code, the readOffset variable can be saved
  in redis/zookeeper. You should plug this logic into your code. The High
  Level Consumer stores the last read offset information in ZooKeeper.

  2. You will get OffsetOutOfRange for any invalid offset. On error, you
  can decide what to do, i.e. read from the latest, earliest, or some
  other offset.
 
  3. https://issues.apache.org/jira/browse/KAFKA-1779
 
  4. Yes
 
 
  Manikumar
 
  On Sat, Jan 17, 2015 at 2:25 AM, Christopher Piggott cpigg...@gmail.com
  wrote:
 
   Hi,
  
   I am following this link:

   https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
  
   for a consumer design using kafka_2.9.2 version 0.8.1.1 (what I can
   find in maven central).  I have a couple of questions about the
   consumer.  I checked the archives and didn't see these exact questions
   asked already, but I may have missed them -- I apologize if that is
   the case.
  
  
   When I create a consumer I give it a consumer ID.  I assumed that it
   would store my consumer's name as well as the last readOffset in
   zookeeper, but looking in zookeeper that doesn't seem to be the case.
   So it seems to me that when my consumers come up they need to either
   get the entire history from the start of time (which could take a long
   time, as I have 14 day durability); or else they need to somehow keep
   track of the read offset themselves.

   I have redis in my system already, so I have the choice of keeping
   track of this in either redis or zookeeper.  It seems like zookeeper
   would be a better idea.  Am I right, though, that the SimpleConsumer
   and the example I linked above don't keep track of this, so if I want
   to do that I would have to do it myself?
  
   Second question: in the example consumer, there is an error handler
   that checks if you received an OffsetOutOfRange response from kafka.
   If so, it gets a new read offset from .LatestTime().  My
   interpretation of this is that you have asked it for an offset which
   doesn't make sense, so it just scans you to the end of the stream.
   That's a guaranteed data loss.  A simple alternative would be to take
   the beginning of the stream, which if you have idempotent processing
   would be fine - it would be a replay - but it could take a long time.

   I don't know for sure what would cause you to get an OffsetOutOfRange
   - the only thing I can really think of is that someone has changed the
   underlying stream on you (like they deleted and recreated it and
   didn't tell all the consumers).  I guess it's possible that if I have
   a 1 day stream durability and I stop my consumer for 3 days that it
   could ask for a readOffset that no longer exists; it's not clear to me
   whether or not that would result in an OffsetOutOfRange error.
  
   Does that all make sense?
  
   Third question: I set a .maxWait(1000), interpreting that to mean that
   when I make my fetch request the consumer will time out if there are
   no new messages in 1 second.  It doesn't seem to work - my call to
   consumer.fetch() seems to return immediately.  Is that expected?
  
   Final question: just to confirm:
  
   new FetchRequestBuilder().addFetch( topic, shardNum, readOffset,
   FETCH_SIZE )
  
   FETCH_SIZE is in bytes, not number of messages, so presumably it
   fetches as many messages as will fit into a buffer of that many bytes?
   Is that right?
  
   Thanks.
  
  
   Christopher Piggott
   Sr. Staff Engineer
   Golisano Institute for Sustainability
   Rochester Institute of Technology
  
 



Query regarding serialization

2015-01-17 Thread Liju John
Hi,

I am new to kafka and still learning, and I have a query. As per my
understanding, serialization happens before the partitioning and grouping
of messages per broker. Is my understanding correct, and what is the
reason for this?
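
For context on where the two hooks sit in the old 0.8 producer, here is a
minimal sketch (the broker address, topic, key, and value are placeholders,
and this is an illustration rather than a definitive statement of the
internals): serializer.class encodes each message body, while
partitioner.class picks the partition from the unencoded key.

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    Properties props = new Properties();
    props.put("metadata.broker.list", "localhost:9092");  // placeholder
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    props.put("partitioner.class", "kafka.producer.DefaultPartitioner");

    Producer<String, String> producer =
        new Producer<String, String>(new ProducerConfig(props));
    // The key ("user-42") drives partitioning; the value is what the
    // configured encoder serializes before the send.
    producer.send(new KeyedMessage<String, String>("myTopic", "user-42", "hello"));
    producer.close();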

Regards,
Liju John


In Apache Kafka, how can one achieve delay queue support (similar to what ActiveMQ has)?

2015-01-17 Thread vishwambhar Upadhyay
Hi Kafka Users Community,
In Apache Kafka, how can one achieve delay queue support (similar to what
ActiveMQ has)? Has anyone solved a similar problem before?
Thanks in advance!

Regards,
Vish


Re: Consumer questions

2015-01-17 Thread Christopher Piggott
Thanks.  That helped clear a lot up in my mind.

I'm trying the high-level consumer now.  Occasionally I need to do a
replay of the stream.  The example is:

   KafkaStream.iterator();

which starts wherever zookeeper recorded that you left off.

With the high level interface, can you request an iterator that starts at
the very beginning?
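
For reference, a minimal sketch of the high-level consumer pattern being
discussed (assuming the 0.8 Java API; the zookeeper address, group id, and
topic below are placeholders):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    Properties props = new Properties();
    props.put("zookeeper.connect", "localhost:2181");  // placeholder
    props.put("group.id", "myGroup");                  // placeholder
    ConsumerConnector connector =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

    Map<String, List<KafkaStream<byte[], byte[]>>> streams =
        connector.createMessageStreams(Collections.singletonMap("myTopic", 1));
    ConsumerIterator<byte[], byte[]> it =
        streams.get("myTopic").get(0).iterator();
    while (it.hasNext()) {
        byte[] payload = it.next().message();  // resumes at the ZK offset
    }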



On Fri, Jan 16, 2015 at 8:55 PM, Manikumar Reddy ku...@nmsworks.co.in
wrote:

 Hi,

 1. In SimpleConsumer, you must keep track of the offsets in your
 application. In the example code, the readOffset variable can be saved
 in redis/zookeeper. You should plug this logic into your code. The High
 Level Consumer stores the last read offset information in ZooKeeper.

 2. You will get OffsetOutOfRange for any invalid offset. On error, you
 can decide what to do, i.e. read from the latest, earliest, or some
 other offset.

 3. https://issues.apache.org/jira/browse/KAFKA-1779

 4. Yes


 Manikumar

 On Sat, Jan 17, 2015 at 2:25 AM, Christopher Piggott cpigg...@gmail.com
 wrote:

  Hi,
 
  I am following this link:

  https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

  for a consumer design using kafka_2.9.2 version 0.8.1.1 (what I can
  find in maven central).  I have a couple of questions about the
  consumer.  I checked the archives and didn't see these exact questions
  asked already, but I may have missed them -- I apologize if that is
  the case.
 
 
  When I create a consumer I give it a consumer ID.  I assumed that it
  would store my consumer's name as well as the last readOffset in
  zookeeper, but looking in zookeeper that doesn't seem to be the case.
  So it seems to me that when my consumers come up they need to either
  get the entire history from the start of time (which could take a long
  time, as I have 14 day durability); or else they need to somehow keep
  track of the read offset themselves.

  I have redis in my system already, so I have the choice of keeping
  track of this in either redis or zookeeper.  It seems like zookeeper
  would be a better idea.  Am I right, though, that the SimpleConsumer
  and the example I linked above don't keep track of this, so if I want
  to do that I would have to do it myself?
 
  Second question: in the example consumer, there is an error handler
  that checks if you received an OffsetOutOfRange response from kafka.
  If so, it gets a new read offset from .LatestTime().  My interpretation
  of this is that you have asked it for an offset which doesn't make
  sense, so it just scans you to the end of the stream.  That's a
  guaranteed data loss.  A simple alternative would be to take the
  beginning of the stream, which if you have idempotent processing would
  be fine - it would be a replay - but it could take a long time.

  I don't know for sure what would cause you to get an OffsetOutOfRange
  - the only thing I can really think of is that someone has changed the
  underlying stream on you (like they deleted and recreated it and
  didn't tell all the consumers).  I guess it's possible that if I have
  a 1 day stream durability and I stop my consumer for 3 days that it
  could ask for a readOffset that no longer exists; it's not clear to me
  whether or not that would result in an OffsetOutOfRange error.
 
  Does that all make sense?
 
  Third question: I set a .maxWait(1000), interpreting that to mean that
  when I make my fetch request the consumer will time out if there are
  no new messages in 1 second.  It doesn't seem to work - my call to
  consumer.fetch() seems to return immediately.  Is that expected?
 
  Final question: just to confirm:
 
  new FetchRequestBuilder().addFetch( topic, shardNum, readOffset,
  FETCH_SIZE )
 
  FETCH_SIZE is in bytes, not number of messages, so presumably it
  fetches as many messages as will fit into a buffer of that many bytes?
  Is that right?
 
  Thanks.
 
 
  Christopher Piggott
  Sr. Staff Engineer
  Golisano Institute for Sustainability
  Rochester Institute of Technology
 



Re: Consumer questions

2015-01-17 Thread Joe Stein
You can replay the messages with the high level consumer; you can even
start at whatever position you want.

Prior to your consumers starting, call

ZkUtils.maybeDeletePath(zkClientConnection, "/consumers/" + groupId)

make sure you have in your consumer properties

auto.offset.reset=smallest

This way you start at the beginning of the stream once the offsets are gone.
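
A minimal sketch of those two steps together, assuming the 0.8 APIs
(ZkUtils is the Scala helper mentioned above, callable from Java; the
zookeeper address, group id, and topic are placeholders):

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.utils.ZkUtils;

    String groupId = "myGroup";  // placeholder
    // Step 1: drop the group's stored offsets before any consumer starts.
    ZkUtils.maybeDeletePath("localhost:2181", "/consumers/" + groupId);

    // Step 2: with no stored offset, auto.offset.reset decides where to
    // begin; "smallest" means the start of the retained log.
    Properties props = new Properties();
    props.put("zookeeper.connect", "localhost:2181");
    props.put("group.id", groupId);
    props.put("auto.offset.reset", "smallest");
    ConsumerConnector connector =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));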

If you have many consumer processes launching within your group you might
want to have a barrier (
http://zookeeper.apache.org/doc/r3.4.6/recipes.html#sc_recipes_eventHandles)
so that only one of your launching consumer processes does this... if you
have only one process, or have the ability to do the operation
administratively, then there is no need.

You can even trigger this to happen while they are all running... more
code to write, but 100% doable (and it works well if you do it right):
have them watch a node, get the notification, stop what they are doing,
barrier, delete the path (or change the value of the offset so you can
start wherever you want), and start again...
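
A rough sketch of the watch part of that recipe (assumptions: a control
znode such as /myapp/replay that an operator touches to request a replay;
the barrier and restart logic are elided, and error handling is omitted):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ReplayTrigger implements Watcher {
        private final ZooKeeper zk;

        public ReplayTrigger(String connect) throws Exception {
            zk = new ZooKeeper(connect, 30000, this);
            zk.exists("/myapp/replay", this);  // arm the watch
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeCreated
                    || event.getType() == Event.EventType.NodeDataChanged) {
                // 1. stop consuming, 2. wait at the barrier, 3. have one
                //    process delete /consumers/<group>, 4. start again.
                try {
                    zk.exists("/myapp/replay", this);  // re-arm the watch
                } catch (Exception ignored) { }
            }
        }
    }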

You can also just change the groupId to something brand new when you start
up with auto.offset.reset=smallest in your properties; either way works.
The above is less lint in zk long term.

It is all just 1s and 0s, and just a matter of how many you put together
yourself vs take out of the box that are given to you =8^)

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
********************************************/

On Sat, Jan 17, 2015 at 2:11 PM, Manikumar Reddy ku...@nmsworks.co.in
wrote:

 AFAIK, we cannot replay the messages with the high level consumer. We
 need to use the simple consumer.

 On Sun, Jan 18, 2015 at 12:15 AM, Christopher Piggott cpigg...@gmail.com
 wrote:

  Thanks.  That helped clear a lot up in my mind.
 
  I'm trying the high-level consumer now.  Occasionally I need to do a
  replay of the stream.  The example is:

     KafkaStream.iterator();

  which starts wherever zookeeper recorded that you left off.
 
  With the high level interface, can you request an iterator that starts at
  the very beginning?
 
 
 
  On Fri, Jan 16, 2015 at 8:55 PM, Manikumar Reddy ku...@nmsworks.co.in
  wrote:
 
   Hi,
  
   1. In SimpleConsumer, you must keep track of the offsets in your
   application. In the example code, the readOffset variable can be saved
   in redis/zookeeper. You should plug this logic into your code. The
   High Level Consumer stores the last read offset information in
   ZooKeeper.

   2. You will get OffsetOutOfRange for any invalid offset. On error, you
   can decide what to do, i.e. read from the latest, earliest, or some
   other offset.
  
   3. https://issues.apache.org/jira/browse/KAFKA-1779
  
   4. Yes
  
  
   Manikumar
  
    On Sat, Jan 17, 2015 at 2:25 AM, Christopher Piggott cpigg...@gmail.com
    wrote:
  
Hi,
   
    I am following this link:

    https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

    for a consumer design using kafka_2.9.2 version 0.8.1.1 (what I can
    find in maven central).  I have a couple of questions about the
    consumer.  I checked the archives and didn't see these exact
    questions asked already, but I may have missed them -- I apologize if
    that is the case.
   
   
    When I create a consumer I give it a consumer ID.  I assumed that it
    would store my consumer's name as well as the last readOffset in
    zookeeper, but looking in zookeeper that doesn't seem to be the case.
    So it seems to me that when my consumers come up they need to either
    get the entire history from the start of time (which could take a
    long time, as I have 14 day durability); or else they need to somehow
    keep track of the read offset themselves.

    I have redis in my system already, so I have the choice of keeping
    track of this in either redis or zookeeper.  It seems like zookeeper
    would be a better idea.  Am I right, though, that the SimpleConsumer
    and the example I linked above don't keep track of this, so if I want
    to do that I would have to do it myself?
   
    Second question: in the example consumer, there is an error handler
    that checks if you received an OffsetOutOfRange response from kafka.
    If so, it gets a new read offset from .LatestTime().  My
    interpretation of this is that you have asked it for an offset which
    doesn't make sense, so it just scans you to the end of the stream.
    That's a guaranteed data loss.  A simple alternative would be to take
    the beginning of the stream, which if you have idempotent processing
    would be fine - it would be a replay - but it could take a long time.
   
    I don't know for sure what would cause you to get an OffsetOutOfRange
    - the only thing I can really 

Re: dumping JMX data

2015-01-17 Thread Manikumar Reddy
JIRAs related to the issue are

https://issues.apache.org/jira/browse/KAFKA-1680
https://issues.apache.org/jira/browse/KAFKA-1679

On Sun, Jan 18, 2015 at 3:12 AM, Scott Chapman sc...@woofplanet.com wrote:

 While I appreciate all the suggestions on other JMX related tools, my
 question is really about the JmxTool included in and documented in
 Kafka, and how to use it to dump all the JMX data. I can get it to dump
 some mbeans, so I know my config is working. But what I can't seem to do
 (which is described in the documentation) is to dump all attributes of
 all objects.

 Please, does anyone using it have any experience that might be able to
 help me?

 Thanks in advance!

 On Sat Jan 17 2015 at 12:39:56 PM Albert Strasheim full...@gmail.com
 wrote:

  On Fri, Jan 16, 2015 at 5:52 PM, Joe Stein joe.st...@stealth.ly wrote:
   Here are some more tools for that:
   https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters
   Depending on what you have in place and what you are trying to do,
   different options exist.
  
   A lot of folks like JMX Trans.
 
  We tried JMX Trans for a while, but didn't like it very much.
 
  Jolokia looks promising. Trying that now.
 
  http://www.jolokia.org/
 



Re: dumping JMX data

2015-01-17 Thread Scott Chapman
Thanks, that second one might be material. I find that if I run it without
any arguments I get no output and it just keeps running. *sigh*

On Sat Jan 17 2015 at 7:58:52 PM Manikumar Reddy ku...@nmsworks.co.in
wrote:

 JIRAs related to the issue are

 https://issues.apache.org/jira/browse/KAFKA-1680
 https://issues.apache.org/jira/browse/KAFKA-1679

 On Sun, Jan 18, 2015 at 3:12 AM, Scott Chapman sc...@woofplanet.com
 wrote:

  While I appreciate all the suggestions on other JMX related tools, my
  question is really about the JmxTool included in and documented in
  Kafka, and how to use it to dump all the JMX data. I can get it to dump
  some mbeans, so I know my config is working. But what I can't seem to
  do (which is described in the documentation) is to dump all attributes
  of all objects.

  Please, does anyone using it have any experience that might be able to
  help me?
 
  Thanks in advance!
 
  On Sat Jan 17 2015 at 12:39:56 PM Albert Strasheim full...@gmail.com
  wrote:
 
   On Fri, Jan 16, 2015 at 5:52 PM, Joe Stein joe.st...@stealth.ly
   wrote:
    Here are some more tools for that:
    https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters
    Depending on what you have in place and what you are trying to do,
    different options exist.

    A lot of folks like JMX Trans.
  
   We tried JMX Trans for a while, but didn't like it very much.
  
   Jolokia looks promising. Trying that now.
  
   http://www.jolokia.org/
  
 



Re: dumping JMX data

2015-01-17 Thread Scott Chapman
So, related question.

If I query for a specific object name, I always seem to get UNIX time:

./bin/kafka-run-class.sh kafka.tools.JmxTool \
  --object-name 'kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager' \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:/jmxrmi

always returns:
1421543777895
1421543779895
1421543781895
1421543783896
1421543785896

What am I missing?
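
One thing that may be worth trying (an assumption, not verified against
this exact build): JmxTool prints the sample time as its first CSV column,
so a time-only line suggests the attribute values are not being resolved.
Naming the gauge's attribute explicitly may help; the yammer-metrics
gauges expose a single attribute called Value, and the port below is a
placeholder:

./bin/kafka-run-class.sh kafka.tools.JmxTool \
  --object-name 'kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager' \
  --attributes Value \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi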

On Sat Jan 17 2015 at 8:11:38 PM Scott Chapman sc...@woofplanet.com wrote:

 Thanks, that second one might be material. I find that if I run it
 without any arguments I get no output and it just keeps running. *sigh*

 On Sat Jan 17 2015 at 7:58:52 PM Manikumar Reddy ku...@nmsworks.co.in
 wrote:

 JIRAs related to the issue are

 https://issues.apache.org/jira/browse/KAFKA-1680
 https://issues.apache.org/jira/browse/KAFKA-1679

 On Sun, Jan 18, 2015 at 3:12 AM, Scott Chapman sc...@woofplanet.com
 wrote:

  While I appreciate all the suggestions on other JMX related tools, my
  question is really about the JmxTool included in and documented in
  Kafka, and how to use it to dump all the JMX data. I can get it to dump
  some mbeans, so I know my config is working. But what I can't seem to
  do (which is described in the documentation) is to dump all attributes
  of all objects.

  Please, does anyone using it have any experience that might be able to
  help me?
 
  Thanks in advance!
 
  On Sat Jan 17 2015 at 12:39:56 PM Albert Strasheim full...@gmail.com
  wrote:
 
   On Fri, Jan 16, 2015 at 5:52 PM, Joe Stein joe.st...@stealth.ly
   wrote:
    Here are some more tools for that:
    https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters
    Depending on what you have in place and what you are trying to do,
    different options exist.

    A lot of folks like JMX Trans.
  
   We tried JMX Trans for a while, but didn't like it very much.
  
   Jolokia looks promising. Trying that now.
  
   http://www.jolokia.org/
  
 




kafka shutdown automatically

2015-01-17 Thread Yonghui Zhao
Hi,

Our kafka cluster shut down automatically today; here is the log.

I don't find any errors in the log.  Anything wrong?


[2015-01-18 05:01:01,788] INFO [BrokerChangeListener on Controller 0]:
Broker change listener fired for path /brokers/ids with children 0
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)

[2015-01-18 05:01:01,791] INFO [Controller-0-to-broker-1-send-thread],
Shutting down (kafka.controller.RequestSendThread)

[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-1-send-thread],
Stopped  (kafka.controller.RequestSendThread)

[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-1-send-thread],
Shutdown completed (kafka.controller.RequestSendThread)

[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-0-send-thread],
Shutting down (kafka.controller.RequestSendThread)

[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-0-send-thread],
Stopped  (kafka.controller.RequestSendThread)

[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-0-send-thread],
Shutdown completed (kafka.controller.RequestSendThread)

[2015-01-18 05:01:01,792] INFO [Controller 0]: Controller shutdown complete
(kafka.controller.KafkaController)


Command to list my brokers

2015-01-17 Thread Dillian Murphey
Hi all,

I just want a way to query all of my brokers to see if they're all
connected and online, without creating a topic.  Or is creating a topic
the best way to verify all my brokers are up and running?

Thanks
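
One way to check without creating a topic (a sketch, assuming the stock
zookeeper-shell.sh script and a ZooKeeper at localhost:2181): live brokers
register ephemeral nodes under /brokers/ids -- the same path that shows up
in the controller log above -- so listing that path shows which brokers
are currently connected:

./bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids

Each listed id can then be inspected with get /brokers/ids/<id> for its
host and port.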


Re: dumping JMX data

2015-01-17 Thread Albert Strasheim
On Fri, Jan 16, 2015 at 5:52 PM, Joe Stein joe.st...@stealth.ly wrote:
 Here are some more tools for that:
 https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters
 Depending on what you have in place and what you are trying to do,
 different options exist.

 A lot of folks like JMX Trans.

We tried JMX Trans for a while, but didn't like it very much.

Jolokia looks promising. Trying that now.

http://www.jolokia.org/