Trouble recovering after a crashed broker
Hi all,

We have a cluster of 3 0.8 brokers, and this morning one of the brokers crashed. It is a test broker, and we stored the logs in /tmp/kafka-logs. All topics in use are replicated on the three brokers. You can guess the problem: when the broker rebooted, it wiped all the data in the logs.

The producers and consumers are fine, but the broker with the wiped data keeps generating a lot of exceptions, and I don't really know what to do to recover. Example exception:

    [2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing fetch request for partition [topic,0] offset 814798 from consumer with correlation id 0 (kafka.server.KafkaApis)
    kafka.common.OffsetOutOfRangeException: Request for offset 814798 but we only have log segments in the range 0 to 19372.

There are a lot of them, something like 10+ per second. I (maybe wrongly) assumed that the broker would catch up; if that's the case, how can I see the progress? In general, what is the recommended way to bring back a broker with wiped data in a cluster?

Thanks.
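One way to see the offset range a broker will actually serve for a partition is to ask it directly with an OffsetRequest via the 0.8 SimpleConsumer API. Below is a minimal Java sketch; the host, topic, and partition are placeholders, not values from the thread:

    import java.util.HashMap;
    import java.util.Map;

    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.OffsetRequest;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class OffsetRangeCheck {
        public static void main(String[] args) {
            // Note: a broker only answers offset requests for partitions it leads.
            SimpleConsumer consumer =
                new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "offset-check");
            TopicAndPartition tp = new TopicAndPartition("topic", 0);

            System.out.println("earliest: "
                + fetchOffset(consumer, tp, kafka.api.OffsetRequest.EarliestTime()));
            System.out.println("latest:   "
                + fetchOffset(consumer, tp, kafka.api.OffsetRequest.LatestTime()));
            consumer.close();
        }

        private static long fetchOffset(SimpleConsumer consumer,
                                        TopicAndPartition tp, long time) {
            Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
                new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
            info.put(tp, new PartitionOffsetRequestInfo(time, 1));
            OffsetRequest req = new OffsetRequest(
                info, kafka.api.OffsetRequest.CurrentVersion(), "offset-check");
            OffsetResponse resp = consumer.getOffsetsBefore(req);
            return resp.offsets(tp.topic(), tp.partition())[0];
        }
    }

Fetches for offsets outside the printed range are what trigger the OffsetOutOfRangeException shown above.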
Re: Trouble recovering after a crashed broker
If a broker crashes and restarts, it will catch up on the missing data from the leader replicas. Normally, while the broker is catching up, it won't be serving any client requests. Are you seeing those errors on the crashed broker?

Also, you are not supposed to see OffsetOutOfRangeException after just one broker failure when there are 3 replicas. Do you see the following in the controller log?

    No broker in ISR is alive for ...

If so, there's potential data loss.

Thanks,

Jun

On Fri, Jan 3, 2014 at 1:23 AM, Vincent Rischmann zecmerqu...@gmail.com wrote:
[quoted message elided; see the original post above]
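To watch the catch-up itself, one option is to poll the partition-state znode that 0.8 keeps in ZooKeeper and wait for the restarted broker's id to reappear in the ISR. A rough Java sketch using the plain ZooKeeper client; the ensemble address, topic, and partition are placeholders:

    import org.apache.zookeeper.ZooKeeper;

    public class IsrCheck {
        public static void main(String[] args) throws Exception {
            // Connect to the ZooKeeper ensemble backing the Kafka cluster.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, null);

            // 0.8 stores per-partition leader/ISR state under this path.
            String path = "/brokers/topics/topic/partitions/0/state";
            byte[] data = zk.getData(path, false, null);

            // Prints JSON like:
            // {"controller_epoch":1,"leader":2,"version":1,"leader_epoch":5,"isr":[2,3,1]}
            // The wiped broker has caught up once its id is back in "isr".
            System.out.println(new String(data, "UTF-8"));
            zk.close();
        }
    }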
Re: problem with high-level consumer stream filter regex....
Thanks Joe, I can confirm that your patch works for me, as applied to 0.8.0.

Jason

On Fri, Dec 20, 2013 at 6:28 PM, Jason Rosenberg j...@squareup.com wrote:

Thanks Joe, I generally build locally and upload to our maven proxy (using a custom pom). I haven't yet had luck using maven central (although I might upgrade to the 2.10 version, which I understand to be in better shape?). I containerize the broker (and all the producers and consumers), so I use the kafka jar directly. I think if you do the patch against 0.8, I can apply and use it. Ultimately, I'll upgrade to 0.8.1, once that's in a beta release state.

Thanks again,
Jason

On Fri, Dec 20, 2013 at 10:29 AM, Joe Stein joe.st...@stealth.ly wrote:

Hey Jason, I was able to reproduce the issue and have a fix in hand to test later today. If it looks good I will post the patch. I am going to do the patch against the 0.8 branch first. How do you deploy and use libraries? Do you download the broker and use maven central?

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
***/

On Wed, Dec 18, 2013 at 4:13 PM, Jason Rosenberg j...@squareup.com wrote:

thanks Joe!

On Wed, Dec 18, 2013 at 11:05 AM, Joe Stein joe.st...@stealth.ly wrote:

Hey Jason, I have someone looking into it now (they just started). I can look at it on Friday, or sooner if I finish up what I am working on for tomorrow.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
***/

On Wed, Dec 18, 2013 at 8:15 AM, Jason Rosenberg j...@squareup.com wrote:

Joe, I think the java code I listed in the Jira ticket should reproduce the issue directly, does that not work?

Jason

On Tue, Dec 17, 2013 at 9:49 AM, Joe Stein joe.st...@stealth.ly wrote:

Hi Jason, I just replied on the ticket. Whether it is a bug to fix or a new filter to create, the update is the same. Can you post some code to help reproduce the problem, so we're comparing apples to apples and such? Thanks!

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
***/

On Tue, Dec 17, 2013 at 1:16 AM, Jason Rosenberg j...@squareup.com wrote:

Ping. Any thoughts on this? Seems like a bug, but then again, we're not sure what the expected behavior for regexes should be here (e.g. is there a way to whitelist topics with a filter that looks for a leading substring, but then blocks subsequent substrings?). E.g. apply a blacklist to a whitelist :).

Jason

On Thu, Dec 12, 2013 at 1:01 PM, Jason Rosenberg j...@squareup.com wrote:

All, I've filed: https://issues.apache.org/jira/browse/KAFKA-1180

We need to create a stream selector that essentially combines the logic of the Blacklist and Whitelist classes. That is, we want to select a topic that contains a certain prefix, as long as it doesn't also contain a secondary string. This should be easy to do with ordinary java regexes, but we're running into some issues trying to do this with the Whitelist class only. We have a pattern that uses negative lookahead, like this:

    test-(?!bad\\b)[\\w]+

So this should select a topic like test-good, but exclude a topic like test-bad, and also exclude a topic without the test prefix, like foo-bar.
Instead, what we see is a NullPointerException in the ConsumerIterator, and the consumer just hangs, after sending a topic of 'test-topic' followed by 'test-bad':

    21700 [ConsumerFetcherThread-group1_square-1a7ac0.local-1386869343370-dc19c7dc-0-1946108683] ERROR kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-group1_square-1a7ac0.local-1386869343370-dc19c7dc-0-1946108683], Error due to
    kafka.common.KafkaException: error processing data for partition [test-bad,0] offset 0
        at
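For reference, a pattern like this plugs into the 0.8 high-level consumer through createMessageStreamsByFilter. A minimal Java sketch; the connection settings are placeholders, and per this thread the negative-lookahead regex only behaves once the KAFKA-1180 fix is applied:

    import java.util.List;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.consumer.Whitelist;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class FilteredConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "filter-test");             // placeholder
            props.put("auto.offset.reset", "smallest");

            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Whitelist with negative lookahead: matches test-good, test-other,
            // but not test-bad (and nothing without the test- prefix).
            List<KafkaStream<byte[], byte[]>> streams =
                consumer.createMessageStreamsByFilter(
                    new Whitelist("test-(?!bad\\b)[\\w]+"), 1);

            ConsumerIterator<byte[], byte[]> it = streams.get(0).iterator();
            while (it.hasNext()) {
                MessageAndMetadata<byte[], byte[]> mm = it.next();
                System.out.println(mm.topic() + ": " + new String(mm.message()));
            }
        }
    }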
Re: node.js client library?
Thanks

On 27/12/2013 18:49, Joe Stein wrote:

I added the wurstmeister client to the wiki. SOHU-Co, can you provide a license file in the project? I would then link it too, please. https://cwiki.apache.org/confluence/display/KAFKA/Clients

I also added wurstmeister's port of storm-kafka for 0.8.0 to the client list, and my company's Scala DSL too.

thnx =)

Joe Stein

On Fri, Dec 27, 2013 at 1:15 AM, 小宇 mocking...@gmail.com wrote:

Hi, here is a Node.js client for the latest Kafka: https://github.com/SOHU-Co/kafka-node.git

2013/12/25 Thomas thomas...@arcor.de

Hi Joe, I've started a node.js implementation for 0.8 (https://github.com/wurstmeister/node-kafka-0.8-plus). I'd welcome any feedback or help.

Regards,
Thomas

On 24/12/2013 15:24, Joe Stein wrote:

Hi, I wanted to reach out to ask if folks are using https://github.com/cainus/Prozess for a node.js client library. Are there other node.js implementations folks are using, or is that primarily it? Are there even folks using node.js and producing to the kafka broker who want 0.8.0 ... 0.8.1 ... etc ... support?

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
***/
Format of Kafka storage on disk
Is there any place where I can learn about the internal structure of the log files where Kafka stores its data? A topic has a .index and a .log file. I want to read the entire log file and parse the contents out.

Thanks
Subbu
Re: Format of Kafka storage on disk
DumpLogSegments should do that for you: https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/tools/DumpLogSegments.scala

    bin/kafka-run-class.sh kafka.tools.DumpLogSegments

    Option                              Description
    ------                              -----------
    --deep-iteration                    if set, uses deep instead of shallow iteration
    --files <file1, file2, ...>         REQUIRED: The comma separated list of data and index log files to be dumped
    --max-message-size <Integer: size>  Size of largest message. (default: 5242880)
    --print-data-log                    if set, printing the messages content when dumping data logs
    --verify-index-only                 if set, just verify the index log without printing its content

Or use the code as an entry point for whatever you want to do :)

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
***/

On Fri, Jan 3, 2014 at 5:10 PM, Subbu Srinivasan ssriniva...@gmail.com wrote:
[quoted message elided; see the original post above]
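If you'd rather parse a segment yourself, the 0.8 .log layout can be walked with a handful of reads. A rough Java sketch, assuming uncompressed messages (a compressed entry nests a whole message set inside the value, which this skips); DumpLogSegments remains the authoritative reader:

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.IOException;

    // 0.8 .log format assumed here: repeated entries of
    // [8-byte offset][4-byte message size][message], where message =
    // [4-byte crc][1-byte magic][1-byte attributes][4-byte key length][key]
    // [4-byte value length][value]; length -1 means a null key/value.
    public class LogSegmentReader {
        public static void main(String[] args) throws IOException {
            DataInputStream in =
                new DataInputStream(new FileInputStream(args[0]));
            try {
                while (true) {
                    long offset = in.readLong();
                    int size = in.readInt();
                    int crc = in.readInt();        // read to advance the stream
                    byte magic = in.readByte();    // 0 in 0.8
                    byte attributes = in.readByte(); // low bits: compression codec

                    int keyLen = in.readInt();
                    byte[] key = new byte[Math.max(keyLen, 0)];
                    in.readFully(key);

                    int valueLen = in.readInt();
                    byte[] value = new byte[Math.max(valueLen, 0)];
                    in.readFully(value);

                    System.out.println("offset=" + offset + " size=" + size
                        + " payload=" + new String(value, "UTF-8"));
                }
            } catch (EOFException done) {
                // reached the end of the segment
            } finally {
                in.close();
            }
        }
    }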