Re: Format of Kafka storage on disk

2014-01-03 Thread Joe Stein
The DumpLogSegments tool should do that for you:
https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/tools/DumpLogSegments.scala

bin/kafka-run-class.sh kafka.tools.DumpLogSegments

Option               Description
-------------------  --------------------------------------------------
--deep-iteration     if set, uses deep instead of shallow iteration
--files              REQUIRED: The comma separated list of data and
                     index log files to be dumped
--max-message-size   Size of largest message. (default: 5242880)
--print-data-log     if set, printing the messages content when
                     dumping data logs
--verify-index-only  if set, just verify the index log without
                     printing its content

or use the code as an entry point for whatever you want to do :)
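
For example, a minimal invocation might look like the following (the topic,
partition, and segment path are hypothetical; point --files at a real segment
under your log directory):

# the segment path below is hypothetical; substitute a real one
bin/kafka-run-class.sh kafka.tools.DumpLogSegments --print-data-log \
  --files /tmp/kafka-logs/my-topic-0/00000000000000000000.log

Passing the matching .index file to --files together with --verify-index-only
checks the index without printing its content.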


/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
/


On Fri, Jan 3, 2014 at 5:10 PM, Subbu Srinivasan wrote:

> Is there any place where I can learn about the internal structure of
> the log files where Kafka stores the data? A topic has a .index and a .log
> file.
>
> I want to read the entire log file and parse the contents out.
>
> Thanks
> Subbu
>


Format of Kafka storage on disk

2014-01-03 Thread Subbu Srinivasan
Is there any place where I can learn about the internal structure of
the log files where Kafka stores the data? A topic has a .index and a .log
file.

I want to read the entire log file and parse the contents out.

Thanks
Subbu


Re: node.js client library?

2014-01-03 Thread Thomas

Thanks

On 27/12/2013 18:49, Joe Stein wrote:

I added the wurstmeister client to the wiki.

SOHU-Co, can you provide a license file in the project? I would then link it
too, please.

https://cwiki.apache.org/confluence/display/KAFKA/Clients

I also added wurstmeister's port of storm-kafka for 0.8.0 to the client list,
and my company's Scala DSL too.

thnx =) Joe Stein


On Fri, Dec 27, 2013 at 1:15 AM, 小宇  wrote:


Hi, here is a Node.js client for the latest Kafka:
https://github.com/SOHU-Co/kafka-node.git


2013/12/25 Thomas 


Hi Joe,

I've started a node.js implementation for 0.8.
(https://github.com/wurstmeister/node-kafka-0.8-plus)

I'd welcome any feedback or help.

Regards

Thomas



On 24/12/2013 15:24, Joe Stein wrote:


Hi, I wanted to reach out to ask whether folks are using
https://github.com/cainus/Prozess as a node.js client library. Are there
other node.js implementations folks are using, or is that primarily it?
Are there folks using node.js and producing to a Kafka broker who want
0.8.0 ... 0.8.1 ... etc ... support?

/***
   Joe Stein
   Founder, Principal Consultant
   Big Data Open Source Security LLC
   http://www.stealth.ly
   Twitter: @allthingshadoop 
/






Re: problem with high-level consumer stream filter regex....

2014-01-03 Thread Jason Rosenberg
Thanks Joe,

I can confirm that your patch works for me, as applied to 0.8.0.

Jason

On Fri, Dec 20, 2013 at 6:28 PM, Jason Rosenberg  wrote:
> Thanks Joe,
>
> I generally build locally, and upload to our maven proxy (using a custom
> pom).
>
> I haven't yet had luck using maven central (although I might upgrade to the
> 2.10 version, in which case I understand it to be in better shape?).
>
> I containerize the broker (and all the producers and consumers), so I use
> the kafka jar directly.
>
> I think if you do the patch against 0.8, I can apply and use it.  Ultimately,
> I'll upgrade to 0.8.1, once that's in a beta release state.
>
> Thanks again,
>
> Jason
>
>
>
>
> On Fri, Dec 20, 2013 at 10:29 AM, Joe Stein  wrote:
>>
>> Hey Jason, I was able to reproduce the issue and have a fix in hand to
>> test later today.  If it looks good I will post the patch. I am going to
>> do the patch against the 0.8 branch first.  How do you deploy and use
>> libraries? Do you download the broker and use Maven Central?
>>
>> /***
>>  Joe Stein
>>  Founder, Principal Consultant
>>  Big Data Open Source Security LLC
>>  http://www.stealth.ly
>>  Twitter: @allthingshadoop 
>> /
>>
>>
>> On Wed, Dec 18, 2013 at 4:13 PM, Jason Rosenberg  wrote:
>>
>> > thanks Joe!
>> >
>> >
>> > On Wed, Dec 18, 2013 at 11:05 AM, Joe Stein wrote:
>> >
>> > > Hey Jason, I have someone looking into it now (they just started).
>> > >
>> > > I can look at it on Friday, or sooner if I finish up what I am
>> > > working on for tomorrow.
>> > >
>> > > /***
>> > >  Joe Stein
>> > >  Founder, Principal Consultant
>> > >  Big Data Open Source Security LLC
>> > >  http://www.stealth.ly
>> > >  Twitter: @allthingshadoop 
>> > > /
>> > >
>> > >
>> > > On Wed, Dec 18, 2013 at 8:15 AM, Jason Rosenberg wrote:
>> > >
>> > > > Joe,
>> > > >
>> > > > I think the java code I listed in the Jira ticket should reproduce
>> > > > the issue directly; does that not work?
>> > > >
>> > > > Jason
>> > > >
>> > > >
>> > > > On Tue, Dec 17, 2013 at 9:49 AM, Joe Stein wrote:
>> > > >
>> > > > > Hi Jason, I just replied on the ticket.  If it is a bug, the update
>> > > > > is the same whether we create a new filter or fix it as a bug.
>> > > > >
>> > > > > Can you post some code to help reproduce the problem, so we can
>> > > > > compare apples to apples? Thanks!
>> > > > >
>> > > > > /***
>> > > > >  Joe Stein
>> > > > >  Founder, Principal Consultant
>> > > > >  Big Data Open Source Security LLC
>> > > > >  http://www.stealth.ly
>> > > > >  Twitter: @allthingshadoop
>> > > > > 
>> > > > > /
>> > > > >
>> > > > >
>> > > > > On Tue, Dec 17, 2013 at 1:16 AM, Jason Rosenberg wrote:
>> > > > >
>> > > > > > Ping
>> > > > > >
>> > > > > > Any thoughts on this?
>> > > > > >
>> > > > > > Seems like a bug, but then again, we're not sure what the expected
>> > > > > > behavior for regexes should be here (e.g. is there a way to
>> > > > > > whitelist topics with a filter that looks for a leading substring,
>> > > > > > but then blocks subsequent substrings)?  E.g. apply a blacklist to
>> > > > > > a whitelist :).
>> > > > > >
>> > > > > > Jason
>> > > > > >
>> > > > > >
>> > > > > > On Thu, Dec 12, 2013 at 1:01 PM, Jason Rosenberg wrote:
>> > > > > >
>> > > > > > > All, I've filed: https://issues.apache.org/jira/browse/KAFKA-1180
>> > > > > > >
>> > > > > > > We need to create a stream selector that essentially combines the
>> > > > > > > logic of the BlackList and WhiteList classes.  That is, we want
>> > > > > > > to select a topic that contains a certain prefix, as long as it
>> > > > > > > doesn't also contain a secondary string.
>> > > > > > >
>> > > > > > > This should be easy to do with ordinary Java regexes, but we're
>> > > > > > > running into some issues trying to do this with the WhiteList
>> > > > > > > class only.
>> > > > > > >
>> > > > > > > We have a pattern that uses negative lookahead, like this:
>> > > > > > >
>> > > > > > > "test-(?!bad\\b)[\\w]+"
>> > > > > > >
>> > > > > > > So this should select a topic like "test-good", but exclude a
>> > > > > > > topic like "test-bad", and also exclude a topic without the
>> > > > > > > "test" prefix, like "foo-bar".
>> > > > > > >
>> > > > > > > Instead, what we see is a NullPointerException in the
>> > > > > > > ConsumerIterator, and the consumer just hangs,
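
A minimal, self-contained sketch (plain java.util.regex, not Kafka's filter
code) of how the negative-lookahead pattern quoted above is expected to
behave; the topic names are the ones from the thread and the class name is
arbitrary:

import java.util.regex.Pattern;

// Checks the negative-lookahead pattern from the thread against the topic
// names mentioned in it, outside of Kafka's consumer code.
// Class name is arbitrary; only the pattern and topic names come from the thread.
public class TopicFilterRegexCheck {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("test-(?!bad\\b)[\\w]+");
        for (String topic : new String[] {"test-good", "test-bad", "foo-bar"}) {
            // matches() requires the whole topic name to match the pattern,
            // which is how a whitelist-style filter would typically use it.
            System.out.println(topic + " -> " + p.matcher(topic).matches());
        }
        // Expected output: test-good -> true, test-bad -> false, foo-bar -> false
    }
}

If the pattern behaves as above on its own, the hang and the
NullPointerException reported in the thread point at how the filter is applied
inside the consumer rather than at the regex itself, which is consistent with
the patch Joe posted against the 0.8 branch for KAFKA-1180.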

Re: Trouble recovering after a crashed broker

2014-01-03 Thread Jun Rao
If a broker crashes and restarts, it will catch up on the missing data from
the leader replicas. Normally, while this broker is catching up, it won't be
serving any client requests. Are you seeing those errors on the crashed
broker? Also, you are not supposed to see OffsetOutOfRangeException with just
one broker failure when there are 3 replicas. Do you see the following in the
controller log?

"No broker in ISR is alive for ... There's potential data loss."

Thanks,

Jun

On Fri, Jan 3, 2014 at 1:23 AM, Vincent Rischmann wrote:

> Hi all,
>
> We have a cluster of 3 0.8 brokers, and this morning one of the brokers
> crashed.
> It is a test broker, and we stored the logs in /tmp/kafka-logs. All topics
> in use are replicated on the three brokers.
>
> You can guess the problem: when the broker rebooted, it wiped all the data
> in the logs.
>
> The producers and consumers are fine, but the broker with the wiped data
> keeps generating a lot of exceptions, and I don't really know what to do to
> recover.
>
> Example exception:
>
> [2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing fetch
> request for partition [topic,0] offset 814798 from consumer with
> correlation id 0 (kafka.server.KafkaApis)
> kafka.common.OffsetOutOfRangeException: Request for offset 814798 but we
> only have log segments in the range 0 to 19372.
>
> There are a lot of them, something like 10+ per second. I (maybe wrongly)
> assumed that the broker would catch up; if that's the case, how can I see
> the progress?
>
> In general, what is the recommended way to bring back a broker with wiped
> data in a cluster?
>
> Thanks.
>


Trouble recovering after a crashed broker

2014-01-03 Thread Vincent Rischmann
Hi all,

We have a cluster of 3 0.8 brokers, and this morning one of the brokers
crashed.
It is a test broker, and we stored the logs in /tmp/kafka-logs. All topics
in use are replicated on the three brokers.

You can guess the problem: when the broker rebooted, it wiped all the data
in the logs.

The producers and consumers are fine, but the broker with the wiped data
keeps generating a lot of exceptions, and I don't really know what to do to
recover.

Example exception:

[2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing fetch
request for partition [topic,0] offset 814798 from consumer with
correlation id 0 (kafka.server.KafkaApis)
kafka.common.OffsetOutOfRangeException: Request for offset 814798 but we
only have log segments in the range 0 to 19372.

There are a lot of them, something like 10+ per second. I (maybe wrongly)
assumed that the broker would catch up; if that's the case, how can I see
the progress?

In general, what is the recommended way to bring back a broker with wiped
data in a cluster?

Thanks.