Trouble recovering after a crashed broker

2014-01-03 Thread Vincent Rischmann
Hi all,

We have a cluster of three 0.8 brokers, and this morning one of the brokers
crashed. It is a test broker, and we store the logs in /tmp/kafka-logs. All
topics in use are replicated across the three brokers.

You can guess the problem: when the broker rebooted, all the data in the
logs was wiped.

The producers and consumers are fine, but the broker with the wiped data
keeps generating a lot of exceptions, and I don't really know what to do to
recover.

Example exception:

[2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing fetch
request for partition [topic,0] offset 814798 from consumer with
correlation id 0 (kafka.server.KafkaApis)
kafka.common.OffsetOutOfRangeException: Request for offset 814798 but we
only have log segments in the range 0 to 19372.

There are a lot of them, something like 10+ per second. I (maybe wrongly)
assumed that the broker would catch up; if that's the case, how can I see
the progress?

In general, what is the recommended way to bring a broker with wiped data
back into the cluster?

Thanks.
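
A side note for anyone reproducing this setup: /tmp is usually cleared on
reboot, so storing segments there guarantees this kind of wipe. Pointing the
brokers at a persistent directory in server.properties avoids it; a minimal
sketch, with the path being only an example:

    # server.properties -- keep log segments out of /tmp so they survive reboots
    log.dirs=/var/lib/kafka-logs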


Re: Trouble recovering after a crashed broker

2014-01-03 Thread Jun Rao
If a broker crashes and restarts, it will catch up on the missing data from
the leader replicas. Normally, while this broker is catching up, it won't be
serving any client requests. Are you seeing those errors on the
crashed broker? Also, you should not see an OffsetOutOfRangeException
after just one broker failure with 3 replicas. Do you see the following in
the controller log?

No broker in ISR is alive for ... There's potential data loss.

Thanks,

Jun

On Fri, Jan 3, 2014 at 1:23 AM, Vincent Rischmann zecmerqu...@gmail.com wrote:

 Hi all,

 We have a cluster of three 0.8 brokers, and this morning one of the brokers
 crashed. It is a test broker, and we store the logs in /tmp/kafka-logs. All
 topics in use are replicated across the three brokers.

 You can guess the problem: when the broker rebooted, all the data in the
 logs was wiped.

 The producers and consumers are fine, but the broker with the wiped data
 keeps generating a lot of exceptions, and I don't really know what to do to
 recover.

 Example exception:

 [2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing fetch
 request for partition [topic,0] offset 814798 from consumer with
 correlation id 0 (kafka.server.KafkaApis)
 kafka.common.OffsetOutOfRangeException: Request for offset 814798 but we
 only have log segments in the range 0 to 19372.

 There are a lot of them, something like 10+ per second. I (maybe wrongly)
 assumed that the broker would catch up; if that's the case, how can I see
 the progress?

 In general, what is the recommended way to bring a broker with wiped data
 back into the cluster?

 Thanks.
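
A side note on the "how can I see the progress?" question: one rough way to
tell when the wiped broker has caught up is to watch for its broker id to
reappear in the ISR of each partition. On 0.8 the partition state (leader,
leader epoch, ISR) is kept in ZooKeeper under
/brokers/topics/<topic>/partitions/<partition>/state. Below is a minimal
sketch using the plain ZooKeeper Java client; the connect string, topic and
partition are placeholders, not values from this thread:

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class IsrCheck {
        public static void main(String[] args) throws Exception {
            String topic = "topic";   // placeholder topic name
            int partition = 0;        // placeholder partition id
            // Connect string is an assumption; use your cluster's ZooKeeper ensemble.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, new Watcher() {
                public void process(WatchedEvent event) { }
            });
            try {
                // 0.8 stores per-partition state as JSON in this znode, e.g.
                // {"controller_epoch":5,"leader":2,"version":1,"leader_epoch":7,"isr":[2,3]}
                String path = "/brokers/topics/" + topic + "/partitions/" + partition + "/state";
                byte[] data = zk.getData(path, false, null);
                System.out.println(new String(data, StandardCharsets.UTF_8));
            } finally {
                zk.close();
            }
        }
    }

The restarted broker has fully caught up for a partition once its id is
listed in "isr" again; checking every partition this way, or grepping the
controller log for the message Jun quotes, gives a rough progress view.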



Re: problem with high-level consumer stream filter regex....

2014-01-03 Thread Jason Rosenberg
Thanks Joe,

I can confirm that your patch works for me, as applied to 0.8.0.

Jason

On Fri, Dec 20, 2013 at 6:28 PM, Jason Rosenberg j...@squareup.com wrote:
 Thanks Joe,

 I generally build locally, and upload to our maven proxy (using a custom
 pom).

 I haven't yet had luck using Maven Central (although I might upgrade to the
 2.10 version, in which case I understand it to be in better shape?).

 I containerize the broker (and all the producers and consumers), so I use
 the kafka jar directly.

 I think if you do the patch against 0.8, I can apply and use it. Ultimately,
 I'll upgrade to 0.8.1 once that's in a beta release state.

 Thanks again,

 Jason




 On Fri, Dec 20, 2013 at 10:29 AM, Joe Stein joe.st...@stealth.ly wrote:

 Hey Jason, I was able to reproduce the issue and have a fix in hand to test
 later today. If it looks good I will post the patch. I am going to do the
 patch against the 0.8 branch first. How do you deploy and use libraries? Do
 you download the broker and use Maven Central?

 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 /


 On Wed, Dec 18, 2013 at 4:13 PM, Jason Rosenberg j...@squareup.com wrote:

  thanks Joe!
 
 
  On Wed, Dec 18, 2013 at 11:05 AM, Joe Stein joe.st...@stealth.ly
  wrote:
 
   Hey Jason, I have someone looking into it now (they just started).
  
    I can look at it on Friday, or sooner if I finish up what I am working
    on for tomorrow.
  
   /***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
   /
  
  
   On Wed, Dec 18, 2013 at 8:15 AM, Jason Rosenberg j...@squareup.com
  wrote:
  
Joe,
   
 I think the Java code I listed in the Jira ticket should reproduce the
 issue directly, does that not work?
   
Jason
   
   
On Tue, Dec 17, 2013 at 9:49 AM, Joe Stein joe.st...@stealth.ly
  wrote:
   
 Hi Jason, I just replied on the ticket. Whether the update is to create a
 new filter or to fix this as a bug, it is the same either way.

 Can you post some code to help reproduce the problem? So we can compare
 apples to apples and such, thanks!

 /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop
 http://www.twitter.com/allthingshadoop
 /


 On Tue, Dec 17, 2013 at 1:16 AM, Jason Rosenberg j...@squareup.com wrote:

  Ping
 
  Any thoughts on this?
 
  Seems like a bug, but then again, we're not sure what the expected
  behavior for regexes should be here (e.g. is there a way to whitelist
  topics with a filter that looks for a leading substring, but then blocks
  subsequent substrings)? E.g. apply a blacklist to a whitelist :).
 
  Jason
 
 
  On Thu, Dec 12, 2013 at 1:01 PM, Jason Rosenberg j...@squareup.com wrote:
 
   All, I've filed: https://issues.apache.org/jira/browse/KAFKA-1180

   We are needing to create a stream selector that essentially combines the
   logic of the BlackList and WhiteList classes. That is, we want to select
   a topic that contains a certain prefix, as long as it doesn't also
   contain a secondary string.

   This should be easy to do with ordinary Java regexes, but we're running
   into some issues trying to do this with the WhiteList class only.

   We have a pattern that uses negative lookahead, like this:

   test-(?!bad\\b)[\\w]+

   So this should select a topic like test-good, but exclude a topic like
   test-bad, and also exclude a topic without the test prefix, like
   foo-bar.

   Instead, what we see is a NullPointerException in the ConsumerIterator,
   and the consumer just hangs, after sending a topic of 'test-topic'
   followed by 'test-bad':

   21700
   [ConsumerFetcherThread-group1_square-1a7ac0.local-1386869343370-dc19c7dc-0-1946108683]
   ERROR kafka.consumer.ConsumerFetcherThread -
   [ConsumerFetcherThread-group1_square-1a7ac0.local-1386869343370-dc19c7dc-0-1946108683],
   Error due to
   kafka.common.KafkaException: error processing data for partition
   [test-bad,0] offset 0
   at
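
A quick way to sanity-check the negative-lookahead pattern quoted above,
independent of the consumer code, is to run it through plain java.util.regex.
A small standalone sketch; the topic names are made up:

    import java.util.regex.Pattern;

    public class TopicFilterPatternDemo {
        public static void main(String[] args) {
            // Same pattern as in the thread: accept topics starting with "test-",
            // except when the suffix is exactly "bad".
            Pattern p = Pattern.compile("test-(?!bad\\b)[\\w]+");
            for (String topic : new String[] {"test-good", "test-bad", "test-badge", "foo-bar"}) {
                System.out.println(topic + " -> " + p.matcher(topic).matches());
            }
            // Prints: test-good -> true, test-bad -> false, test-badge -> true, foo-bar -> false
        }
    }

With full-string matches() semantics the pattern behaves exactly as the
whitelist-with-embedded-blacklist described above, so the hang and
NullPointerException look like an issue in how the filter is applied rather
than in the regex itself, which is what KAFKA-1180 tracks.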

Re: node.js client library?

2014-01-03 Thread Thomas

Thanks

On 27/12/2013 18:49, Joe Stein wrote:

I added the wurstmeister client to the wiki

SOHU-Co, can you provide a license file in the project? Then I will link it
too.

https://cwiki.apache.org/confluence/display/KAFKA/Clients

I also added wurstmeister's port of storm-kafka for 0.8.0 to the client
list, and my company's Scala DSL too.

thnx =) Joe Stein


On Fri, Dec 27, 2013 at 1:15 AM, 小宇 mocking...@gmail.com wrote:


Hi, here is a Node.js client for latest Kafka:
https://github.com/SOHU-Co/kafka-node.git


2013/12/25 Thomas thomas...@arcor.de


Hi Joe,

I've started a node.js implementation for 0.8
(https://github.com/wurstmeister/node-kafka-0.8-plus).

I'd welcome any feedback or help.

Regards

Thomas



On 24/12/2013 15:24, Joe Stein wrote:


Hi, I wanted to reach out to see if folks are using
https://github.com/cainus/Prozess for a node.js client library? Are there
other node.js implementations folks are using, or is that primarily it? Are
there even folks using node.js and producing to a Kafka broker who want
0.8.0 ... 0.8.1 ... etc ... support?

/***
   Joe Stein
   Founder, Principal Consultant
   Big Data Open Source Security LLC
   http://www.stealth.ly
   Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/






Format of Kafka storage on disk

2014-01-03 Thread Subbu Srinivasan
Is there any place where I can learn about the internal structure of
the log files where Kafka stores the data? A topic has a .index and a .log
file.

I want to read the entire log file and parse the contents out.

Thanks
Subbu


Re: Format of Kafka storage on disk

2014-01-03 Thread Joe Stein
The DumpLogSegments tool should do that for you:
https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/tools/DumpLogSegments.scala

bin/kafka-run-class.sh kafka.tools.DumpLogSegments

Option                                  Description
------                                  -----------
--deep-iteration                        if set, uses deep instead of shallow
                                          iteration
--files <file1, file2, ...>             REQUIRED: The comma separated list of
                                          data and index log files to be dumped
--max-message-size <Integer: size>      Size of largest message. (default:
                                          5242880)
--print-data-log                        if set, printing the messages content
                                          when dumping data logs
--verify-index-only                     if set, just verify the index log
                                          without printing its content

Or use the code as an entry point for whatever you want to do :)


/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/


On Fri, Jan 3, 2014 at 5:10 PM, Subbu Srinivasan ssriniva...@gmail.com wrote:

 Is there any place where I can learn about the internal structure of
 the log files where Kafka stores the data? A topic has a .index and a .log
 file.

 I want to read the entire log file and parse the contents out.

 Thanks
 Subbu
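
For readers who, like Subbu, want to parse a .log segment themselves rather
than go through DumpLogSegments: in 0.8 each entry in the file is an 8-byte
offset and a 4-byte message size, followed by the message itself (CRC, magic
byte, attributes, optional key, optional value). Below is a minimal Java
sketch under that assumption; it ignores compression (a compressed message's
value wraps a nested message set, which this does not unpack), and the class
name and command-line argument are just examples:

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.nio.charset.StandardCharsets;

    public class LogSegmentReader {
        public static void main(String[] args) throws Exception {
            // args[0]: path to a segment, e.g. /tmp/kafka-logs/mytopic-0/00000000000000000000.log
            DataInputStream in = new DataInputStream(new FileInputStream(args[0]));
            try {
                while (true) {
                    long offset;
                    try {
                        offset = in.readLong();        // 8-byte logical offset
                    } catch (EOFException end) {
                        break;                         // clean end of segment
                    }
                    int messageSize = in.readInt();    // 4-byte size of the message that follows
                    int crc = in.readInt();            // CRC32 of the rest of the message
                    byte magic = in.readByte();        // format version (0 in 0.8)
                    byte attributes = in.readByte();   // low bits carry the compression codec
                    int keyLength = in.readInt();      // -1 means null key
                    byte[] key = keyLength < 0 ? new byte[0] : new byte[keyLength];
                    in.readFully(key);
                    int valueLength = in.readInt();    // -1 means null value
                    byte[] value = valueLength < 0 ? new byte[0] : new byte[valueLength];
                    in.readFully(value);
                    System.out.printf("offset=%d size=%d key=%s value=%s%n",
                            offset, messageSize,
                            new String(key, StandardCharsets.UTF_8),
                            new String(value, StandardCharsets.UTF_8));
                }
            } finally {
                in.close();
            }
        }
    }

The crc, magic and attributes reads are kept even though they are unused so
the stream position stays correct. DumpLogSegments remains the safer option,
since it also understands compressed and corrupted segments.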