Re: Consumer questions
AFAIK, we cannot replay the messages with the high-level consumer. We need to use the simple consumer.

On Sun, Jan 18, 2015 at 12:15 AM, Christopher Piggott cpigg...@gmail.com wrote: Thanks. That helped clear a lot up in my mind. I'm trying the high-level consumer now. Occasionally I need to do a replay of the stream. The example is: KafkaStream.iterator(); which starts wherever ZooKeeper recorded you left off. With the high-level interface, can you request an iterator that starts at the very beginning?

On Fri, Jan 16, 2015 at 8:55 PM, Manikumar Reddy ku...@nmsworks.co.in wrote: Hi, 1. In SimpleConsumer, you must keep track of the offsets in your application. In the example code, the readOffset variable can be saved in Redis/ZooKeeper. You should plug this logic into your code. The High Level Consumer stores the last read offset information in ZooKeeper. 2. You will get OffsetOutOfRange for any invalid offset. On error, you can decide what to do, i.e., read from the latest, earliest, or some other offset. 3. https://issues.apache.org/jira/browse/KAFKA-1779 4. Yes. Manikumar

On Sat, Jan 17, 2015 at 2:25 AM, Christopher Piggott cpigg...@gmail.com wrote: Hi, I am following this link: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example for a consumer design using kafka_2.9.2 version 0.8.1.1 (what I can find in Maven Central). I have a couple of questions about the consumer. I checked the archives and didn't see these exact questions asked already, but I may have missed them -- I apologize if that is the case.

When I create a consumer I give it a consumer ID. I assumed that it would store my consumer's name as well as the last readOffset in ZooKeeper, but looking in ZooKeeper, that doesn't seem to be the case. So it seems to me that when my consumers come up, they need to either get the entire history from the start of time (which could take a long time, as I have 14-day durability), or else they need to somehow keep track of the read offset themselves. I have Redis in my system already, so I have the choice of keeping track of this in either Redis or ZooKeeper. It seems like ZooKeeper would be a better idea. Am I right, though, that the SimpleConsumer and the example I linked above don't keep track of this, so if I want to do that I would have to do it myself?

Second question: in the example consumer, there is an error handler that checks if you received an OffsetOutOfRange response from Kafka. If so, it gets a new read offset from .LatestTime(). My interpretation of this is that you have asked it for an offset which doesn't make sense, so it just scans you to the end of the stream. That's a guaranteed data loss. A simple alternative would be to take the beginning of the stream, which would be fine if you have idempotent processing (it would be a replay), but it could take a long time. I don't know for sure what would cause you to get an OffsetOutOfRange; the only thing I can really think of is that someone has changed the underlying stream on you (like they deleted and recreated it and didn't tell all the consumers). I guess it's possible that if I have a 1-day stream durability and I stop my consumer for 3 days, it could ask for a readOffset that no longer exists; it's not clear to me whether or not that would result in an OffsetOutOfRange error. Does that all make sense?

Third question: I set .maxWait(1000), interpreting that to mean that when I make my fetch request, the consumer will time out if there are no new messages in 1 second. It doesn't seem to work; my call to consumer.fetch() seems to return immediately. Is that expected?

Final question: just to confirm, in new FetchRequestBuilder().addFetch(topic, shardNum, readOffset, FETCH_SIZE), FETCH_SIZE is in bytes, not number of messages, so presumably it fetches as many messages as will fit into a buffer of that many bytes? Is that right?

Thanks.
Christopher Piggott
Sr. Staff Engineer
Golisano Institute for Sustainability
Rochester Institute of Technology
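Since the thread keeps circling back to offset bookkeeping, here is a rough sketch of that pattern against the 0.8 SimpleConsumer API from the wiki example linked above. It is a sketch, not the example's actual code: loadOffset()/saveOffset() are hypothetical hooks standing in for Redis or ZooKeeper persistence, and the topic, broker host, and port are placeholders. Note also that, as I understand the fetch semantics, maxWait is only honored together with a minBytes the broker cannot immediately satisfy; with minBytes left at zero the fetch returns at once, which may explain the immediate returns described above.

import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.Map;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.ErrorMapping;
import kafka.common.TopicAndPartition;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

public class TrackedConsumer {
    static final String TOPIC = "my-topic";      // placeholder
    static final int PARTITION = 0;
    static final String CLIENT = "tracked-consumer";
    static final int FETCH_SIZE = 100000;        // bytes, not message count

    // Hypothetical persistence hooks: back these with Redis or ZooKeeper.
    static long loadOffset() { return 0L; }
    static void saveOffset(long offset) { }

    public static void main(String[] args) {
        SimpleConsumer consumer =
            new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, CLIENT);
        long readOffset = loadOffset();

        while (true) {
            FetchRequest req = new FetchRequestBuilder()
                .clientId(CLIENT)
                .addFetch(TOPIC, PARTITION, readOffset, FETCH_SIZE)
                .maxWait(1000)  // only meaningful together with minBytes > 0
                .minBytes(1)
                .build();
            FetchResponse resp = consumer.fetch(req);

            if (resp.hasError()) {
                short code = resp.errorCode(TOPIC, PARTITION);
                if (code == ErrorMapping.OffsetOutOfRangeCode()) {
                    // The requested offset fell outside the retained log,
                    // e.g. the consumer was down longer than the retention
                    // window. Earliest replays; latest would skip ahead.
                    readOffset = offsetBefore(consumer,
                        kafka.api.OffsetRequest.EarliestTime());
                }
                continue;
            }

            for (MessageAndOffset mo : resp.messageSet(TOPIC, PARTITION)) {
                ByteBuffer payload = mo.message().payload();
                // ... process payload ...
                readOffset = mo.nextOffset();
            }
            saveOffset(readOffset);  // persist only after processing succeeds
        }
    }

    // Ask the broker for the earliest (or latest) available offset.
    static long offsetBefore(SimpleConsumer consumer, long time) {
        TopicAndPartition tp = new TopicAndPartition(TOPIC, PARTITION);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
            Collections.singletonMap(tp, new PartitionOffsetRequestInfo(time, 1));
        OffsetResponse resp = consumer.getOffsetsBefore(
            new kafka.javaapi.OffsetRequest(
                info, kafka.api.OffsetRequest.CurrentVersion(), CLIENT));
        return resp.offsets(TOPIC, PARTITION)[0];
    }
}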
Query regarding serialization
Hi, I am new to Kafka and still learning. I have a query: as I understand it, serialization happens before the partitioning and grouping of messages per broker. Is my understanding correct, and what is the reason for this? Regards, Liju John
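There is no reply in this digest, but for orientation: in the 0.8 producer pipeline, as I understand it, values are serialized by the configured encoder before the messages are partitioned and collated per broker, and the partitioner receives the original, unserialized key. Serializing first means the producer can batch and size the final wire-format bytes for each destination broker. A sketch of the two hooks under those assumptions (class names and config values below are illustrative, and each class would live in its own file):

import java.nio.charset.StandardCharsets;
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.Partitioner;
import kafka.producer.ProducerConfig;
import kafka.serializer.Encoder;
import kafka.utils.VerifiableProperties;

// Hook 1: the encoder, applied first, turns each value into wire-format bytes.
class Utf8Encoder implements Encoder<String> {
    public Utf8Encoder(VerifiableProperties props) { }  // ctor Kafka expects
    public byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }
}

// Hook 2: the partitioner, applied afterwards, still sees the original key.
class ModPartitioner implements Partitioner {
    public ModPartitioner(VerifiableProperties props) { }
    public int partition(Object key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}

public class ProducerPipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092");  // placeholder
        props.put("serializer.class", "Utf8Encoder");       // value encoder
        props.put("key.serializer.class", "Utf8Encoder");
        props.put("partitioner.class", "ModPartitioner");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("my-topic", "key", "value"));
        producer.close();
    }
}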
In Apache Kafka, how can one achieve delay queue support (similar to what ActiveMQ has)?
Hi Kafka Users Community, In Apache Kafka, how can one achieve delay queue support (similar to what ActiveMQ has)? Has anyone solved a similar problem before? Thanks in advance! Regards, Vish
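As far as I know, Kafka itself (0.8.x) has no built-in delay-queue support, so people typically build it on top: embed a due-time in each message and have the consumer hold off until it arrives. A minimal sketch of that envelope pattern follows; the 8-byte-header layout is invented for illustration, not a Kafka format.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class DelayedEnvelope {
    // Producer side: prepend an 8-byte "deliver at" timestamp to the payload.
    public static byte[] wrap(long deliverAtMillis, byte[] payload) {
        return ByteBuffer.allocate(8 + payload.length)
                .putLong(deliverAtMillis)
                .put(payload)
                .array();
    }

    // Consumer side: block until the message is due, then return the payload.
    // Each partition is consumed in order, so if messages carry the same
    // fixed delay, waiting on the head never starves later messages.
    public static byte[] awaitAndUnwrap(byte[] envelope) throws InterruptedException {
        ByteBuffer buf = ByteBuffer.wrap(envelope);
        long deliverAt = buf.getLong();
        long wait = deliverAt - System.currentTimeMillis();
        if (wait > 0) {
            Thread.sleep(wait);
        }
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return payload;
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] msg = wrap(System.currentTimeMillis() + 2000,
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(awaitAndUnwrap(msg), StandardCharsets.UTF_8));
    }
}

The caveat is that sleeping blocks the partition behind the head message, so this works best when a topic carries a single fixed delay; for multiple delay levels, one topic per delay is a common arrangement.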
Re: Consumer questions
Thanks. That helped clear a lot up in my mind. I'm trying the high-level consumer now. Occasionally I need to do a replay of the stream. The example is: KafkaStream.iterator(); which starts wherever ZooKeeper recorded you left off. With the high-level interface, can you request an iterator that starts at the very beginning?

On Fri, Jan 16, 2015 at 8:55 PM, Manikumar Reddy ku...@nmsworks.co.in wrote: [quoted text trimmed]
Re: Consumer questions
You can replay the messages with the high-level consumer; you can even start at whatever position you want. Before your consumers start, call ZkUtils.maybeDeletePath(zkClientConnection, "/consumers/" + groupId) and make sure you have auto.offset.reset=smallest in your consumer properties. This way you start at the beginning of the stream once the offsets are gone.

If you have many consumer processes launching within your group, you might want to have a barrier (http://zookeeper.apache.org/doc/r3.4.6/recipes.html#sc_recipes_eventHandles) so that only one of your launching consumer processes does this... if you have only one process, or have the ability to do the operation administratively, then no need. You can even trigger this to happen while they are all running... more code to write, but 100% doable (and it works well if you do it right): have them watch a node, get the notification, stop what they are doing, barrier, delete the path (or change the value of the offset so you can start wherever you want), start again...

You can also just change the groupId to something brand new when you start up with auto.offset.reset=smallest in your properties; either way. The above is less lint in ZK long term. It is all just 1s and 0s, and just a matter of how many you put together yourself vs. take out of the box that are given to you =8^)

/***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
***/

On Sat, Jan 17, 2015 at 2:11 PM, Manikumar Reddy ku...@nmsworks.co.in wrote: [quoted text trimmed]
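For what it's worth, Joe's recipe translates into roughly the following against the 0.8 high-level consumer API. This is a sketch: the ZooKeeper connect string, group id, and topic are placeholders, and the ZkUtils.maybeDeletePath call is taken from the message above.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;
import kafka.utils.ZkUtils;

public class ReplayFromStart {
    public static void main(String[] args) {
        String zk = "localhost:2181";  // placeholder
        String groupId = "my-group";   // placeholder
        String topic = "my-topic";     // placeholder

        // 1. Drop the group's stored offsets before any consumer starts.
        ZkUtils.maybeDeletePath(zk, "/consumers/" + groupId);

        // 2. With no stored offsets, auto.offset.reset decides where to
        //    begin; "smallest" means the earliest retained message.
        Properties props = new Properties();
        props.put("zookeeper.connect", zk);
        props.put("group.id", groupId);
        props.put("auto.offset.reset", "smallest");

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap(topic, 1));

        ConsumerIterator<byte[], byte[]> it = streams.get(topic).get(0).iterator();
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> m = it.next();
            System.out.printf("partition=%d offset=%d%n", m.partition(), m.offset());
        }
    }
}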
Re: dumping JMX data
JIRAs related to the issue are https://issues.apache.org/jira/browse/KAFKA-1680 and https://issues.apache.org/jira/browse/KAFKA-1679.

On Sun, Jan 18, 2015 at 3:12 AM, Scott Chapman sc...@woofplanet.com wrote: While I appreciate all the suggestions on other JMX-related tools, my question is really about the JmxTool included in and documented with Kafka, and how to use it to dump all the JMX data. I can get it to dump some mbeans, so I know my config is working. But what I can't seem to do (which is described in the documentation) is dump all attributes of all objects. Does anyone using it have experience that might be able to help me? Thanks in advance!

On Sat Jan 17 2015 at 12:39:56 PM Albert Strasheim full...@gmail.com wrote: [quoted text trimmed]
Re: dumping JMX data
Thanks, that second one might be material. I find that if I run without any arguments, I get no output and it just keeps running. *sigh*

On Sat Jan 17 2015 at 7:58:52 PM Manikumar Reddy ku...@nmsworks.co.in wrote: JIRAs related to the issue are https://issues.apache.org/jira/browse/KAFKA-1680 and https://issues.apache.org/jira/browse/KAFKA-1679. [remaining quoted text trimmed]
Re: dumping JMX data
So, related question. If I query for a specific object name, I always seem to get UNIX time:

./bin/kafka-run-class.sh kafka.tools.JmxTool --object-name 'kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager' --jmx-url service:jmx:rmi:///jndi/rmi://localhost:/jmxrmi

always returns:

1421543777895
1421543779895
1421543781895
1421543783896
1421543785896

What am I missing?

On Sat Jan 17 2015 at 8:11:38 PM Scott Chapman sc...@woofplanet.com wrote: [quoted text trimmed]
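Those numbers look like JmxTool's leading timestamp column with no attribute columns after it, which suggests the --object-name query matched no attributes (possibly the tooling issues the JIRAs above track). Until the tool is fixed, one workaround is to query JMX directly with the standard javax.management API. A minimal sketch: the port is a placeholder (it was elided in the command above), and the object name is copied from that command.

import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxDump {
    public static void main(String[] args) throws Exception {
        // Port 9999 is a placeholder; use whatever JMX_PORT the broker exposes.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

            // Read one attribute: Kafka's metrics gauges typically expose
            // their current reading as an attribute named "Value".
            ObjectName gauge = new ObjectName(
                "kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager");
            System.out.println(gauge + " = " + mbs.getAttribute(gauge, "Value"));

            // Or dump every attribute of every registered MBean.
            for (ObjectName name : mbs.queryNames(null, null)) {
                for (MBeanAttributeInfo attr : mbs.getMBeanInfo(name).getAttributes()) {
                    try {
                        System.out.println(name + "/" + attr.getName() + " = "
                                + mbs.getAttribute(name, attr.getName()));
                    } catch (Exception e) {
                        // Some attributes throw on read; skip those.
                    }
                }
            }
        }
    }
}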
kafka shutdown automatically
Hi, our Kafka cluster shut down automatically today; here is the log. I don't find any error entries. Anything wrong?

[2015-01-18 05:01:01,788] INFO [BrokerChangeListener on Controller 0]: Broker change listener fired for path /brokers/ids with children 0 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2015-01-18 05:01:01,791] INFO [Controller-0-to-broker-1-send-thread], Shutting down (kafka.controller.RequestSendThread)
[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-1-send-thread], Stopped (kafka.controller.RequestSendThread)
[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-1-send-thread], Shutdown completed (kafka.controller.RequestSendThread)
[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-0-send-thread], Shutting down (kafka.controller.RequestSendThread)
[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-0-send-thread], Stopped (kafka.controller.RequestSendThread)
[2015-01-18 05:01:01,792] INFO [Controller-0-to-broker-0-send-thread], Shutdown completed (kafka.controller.RequestSendThread)
[2015-01-18 05:01:01,792] INFO [Controller 0]: Controller shutdown complete (kafka.controller.KafkaController)
Command to list my brokers
Hi all, I just want a way to query all of my brokers to see if they're all connected and online, without creating a topic. Or is creating a topic the best way to verify all my brokers are up and running? Thanks
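For what it's worth, live brokers register ephemeral nodes under /brokers/ids in ZooKeeper, so you can check who is online without creating a topic by listing that path. A minimal sketch using the ZkClient library Kafka 0.8 already depends on; the connect string is a placeholder.

import java.util.List;
import org.I0Itec.zkclient.ZkClient;

public class ListBrokers {
    public static void main(String[] args) {
        // Placeholder connect string; use your ZooKeeper ensemble.
        ZkClient zk = new ZkClient("localhost:2181");
        try {
            // Each live broker holds an ephemeral node /brokers/ids/<id>,
            // so the children list is exactly the set of online brokers.
            List<String> ids = zk.getChildren("/brokers/ids");
            System.out.println("Online broker ids: " + ids);
        } finally {
            zk.close();
        }
    }
}

Reading the data at /brokers/ids/<id> additionally gives the broker's registration details (host and port) as JSON, though ZkClient then needs a string serializer configured.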
Re: dumping JMX data
On Fri, Jan 16, 2015 at 5:52 PM, Joe Stein joe.st...@stealth.ly wrote: Here are some more tools for that: https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters. Depending on what you have in place and what you are trying to do, different options exist. A lot of folks like JMX Trans.

We tried JMX Trans for a while, but didn't like it very much. Jolokia looks promising. Trying that now. http://www.jolokia.org/