Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Joe Stein
Congrats!


~ Joe Stein

On Fri, Jun 9, 2017 at 6:49 PM, Neha Narkhede  wrote:

> Well deserved. Congratulations Damian!
>
> On Fri, Jun 9, 2017 at 1:34 PM Guozhang Wang  wrote:
>
> > Hello all,
> >
> >
> > The PMC of Apache Kafka is pleased to announce that we have invited
> Damian
> > Guy as a committer to the project.
> >
> > Damian has made tremendous contributions to Kafka. He has not only
> > contributed a lot to the Streams API, but has also been involved in
> > many other areas such as the producer and consumer clients and the
> > broker-side coordinators (the group coordinator and the ongoing
> > transaction coordinator). He has contributed more than 100 patches so
> > far, and has been driving 6 KIP contributions.
> >
> > More importantly, Damian has been a very prolific reviewer of open PRs
> > and has been actively participating in community activities such as the
> > mailing lists and Stack Overflow questions. Through his code
> > contributions and reviews, Damian has demonstrated good judgment on
> > system design and code quality, especially thorough unit test coverage.
> > We believe he will make a great addition to the committers of the
> > community.
> >
> >
> > Thank you for your contributions, Damian!
> >
> >
> > -- Guozhang, on behalf of the Apache Kafka PMC
> >
> --
> Thanks,
> Neha
>


Re: [DISCUSS] Java 8 as a minimum requirement

2016-06-17 Thread Joe Stein
Compatibility shouldn't be broken in a minor release. Minor versions are
for new features added in a backwards-compatible manner. The Kafka bylaws
do not state this explicitly, but I believe it is implied by general
practice, and many other Apache projects explicitly call this out by
documenting and communicating their semantic versioning strategy.

If JDK 8 is so strongly desired, then jump to 0.11 and do only bug fixes
on the 0.10 release (which should be rigorous, and should not force folks
to upgrade unnecessarily to get such improvements).

My 0.2824152382 cents.

Regards,

~ Joe Stein

On Fri, Jun 17, 2016 at 8:53 AM, Marina  wrote:

> +1 - wish it was already done with Kafka 0.9 version :)
>
>
>   From: Tommy Becker 
>  To: users@kafka.apache.org
>  Sent: Friday, June 17, 2016 7:55 AM
>  Subject: Re: [DISCUSS] Java 8 as a minimum requirement
>
> +1 We're on Java 8 already.
>
> On 06/16/2016 04:45 PM, Ismael Juma wrote:
>
> Hi all,
>
> I would like to start a discussion on making Java 8 a minimum requirement
> for Kafka's next feature release (let's say Kafka 0.10.1.0 for now). This
> is the first discussion on the topic so the idea is to understand how
> people feel about it. If people feel it's too soon, then we can pick up the
> conversation again after Kafka 0.10.1.0. If the feedback is mostly
> positive, I will start a vote thread.
>
> Let's start with some dates. Java 7 hasn't received public updates since
> April 2015[1], Java 8 was released in March 2014[2] and Java 9 is scheduled
> to be released in March 2017[3].
>
> The first argument for dropping support for Java 7 is that the last public
> release by Oracle contains a large number of known security
> vulnerabilities. The effectiveness of Kafka's security features is reduced
> if the underlying runtime is not itself secure.
>
> The second argument for moving to Java 8 is that it adds a number of
> compelling features:
>
> * Lambda expressions and method references (particularly useful for the
> Kafka Streams DSL)
> * Default methods (very useful for maintaining compatibility when adding
> methods to interfaces)
> * java.util.stream (helpful for making collection transformations more
> concise)
> * Lots of improvements to java.util.concurrent (CompletableFuture,
> DoubleAdder, DoubleAccumulator, StampedLock, LongAdder, LongAccumulator)
> * Other nice things: SplittableRandom, Optional (and many others I have not
> mentioned)
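To make the first few bullets concrete, here is a small self-contained Java 8
sketch (purely illustrative; the class and names below are made up and are not
Kafka APIs) touching lambdas, default methods, java.util.stream, and
CompletableFuture:

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.stream.Collectors;

    public class Java8Sketch {
        // A default method lets an interface gain behavior without breaking
        // existing implementors -- the compatibility point made above.
        interface Metric {
            double value();
            default String render() { return String.format("%.2f", value()); }
        }

        public static void main(String[] args) throws Exception {
            // Lambda in place of an anonymous inner class.
            Metric m = () -> 42.0;
            System.out.println(m.render());

            // java.util.stream makes collection transformations concise.
            List<String> upper = Arrays.asList("ack", "isr", "log").stream()
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
            System.out.println(upper);

            // CompletableFuture composes async work without callback nesting.
            String result = CompletableFuture.supplyAsync(() -> "sent")
                    .thenApply(s -> s + " and acked")
                    .get();
            System.out.println(result);
        }
    }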
>
> The third argument is that it will simplify our testing matrix: we won't
> have to test with Java 7 any longer (this is particularly useful for system
> tests that take hours to run). It will also make it easier to support Scala
> 2.12, which requires Java 8.
>
> The fourth argument is that many other open-source projects have taken the
> leap already. Examples are Cassandra[4], Lucene[5], Akka[6], Hadoop 3[7],
> Jetty[8], Eclipse[9], IntelliJ[10] and many others[11]. Even Android will
> support Java 8 in the next version (although, sadly, it will take a while
> before most phones use that version). This reduces (but does not
> eliminate) the chance that we would be the first project that would cause a
> user to consider a Java upgrade.
>
> The main argument for not making the change is that a reasonable number of
> users may still be using Java 7 by the time Kafka 0.10.1.0 is released.
> More specifically, we care about the subset who would be able to upgrade to
> Kafka 0.10.1.0, but would not be able to upgrade the Java version. It would
> be great if we could quantify this in some way.
>
> What do you think?
>
> Ismael
>
> [1] https://java.com/en/download/faq/java_7.xml
> [2] https://blogs.oracle.com/thejavatutorials/entry/jdk_8_is_released
> [3] http://openjdk.java.net/projects/jdk9/
> [4] https://github.com/apache/cassandra/blob/trunk/README.asc
> [5] https://lucene.apache.org/#highlights-of-this-lucene-release-include
> [6] http://akka.io/news/2015/09/30/akka-2.4.0-released.html
> [7] https://issues.apache.org/jira/browse/HADOOP-11858
> [8] https://webtide.com/jetty-9-3-features/
> [9] http://markmail.org/message/l7s276y3xkga2eqf
> [10] https://intellij-support.jetbrains.com/hc/en-us/articles/206544879-Selecting-the-JDK-version-the-IDE-will-run-under
> [11] http://markmail.org/message/l7s276y3xkga2eqf
>
>
>
> --
> Tommy Becker
> Senior Software Engineer
>
> Digitalsmiths
> A TiVo Company
>
> www.digitalsmiths.com<http://www.digitalsmiths.com>
> tobec...@tivo.com<mailto:tobec...@tivo.com>
>
> 
>

Heron Spouts & Bolts & Example for Apache Kafka

2016-05-25 Thread Joe Stein
Hey Kafka community, I wanted to pass along some of the work we have been
doing as part of providing commercial support for Heron, which was open
sourced today: https://blog.twitter.com/2016/open-sourcing-twitter-heron

https://github.com/twitter/heron/pull/751 Kafka 0.8 & 0.9 Spout, Bolt &
Example Topology

Looking forward to continued contributions. If you haven't tried out Heron
yet, you should.

Kafka is the wind to make Heron fly!

/******
  Joe Stein
  Elodina Inc
  http://www.elodina.net
**/


Re: Kafka support needed

2016-04-19 Thread Joe Stein
If you use Go, you can use https://github.com/sclasen/event-shuttle, which
is a nice choice in some cases because of its small footprint; it uses
BoltDB, an embedded key/value store similar to LevelDB.

NiFi is cool too: https://nifi.apache.org/

So is Bruce: https://github.com/ifwe/bruce

Those are more out-of-the-box options, and they are reliable because
people use them in production.

On Wed, Apr 20, 2016 at 1:05 AM, Sunny Shah  wrote:

> Hi Yogesh,
>
> You can even use SQLite/LevelDB to buffer the data on the client.
>
> Thanks,
> Sunny
> On Apr 20, 2016 10:31 AM, "Yogesh BG"  wrote:
>
> > Thank You for the reply.
> >
> > I am running the producer on a very resource-constrained device (an IoT
> > hub). I doubt whether I can accommodate a local broker.
> >
> >
> > On Wed, Apr 20, 2016 at 10:07 AM, Sunny Shah 
> wrote:
> >
> > > Hi Yogesh,
> > >
> > > No, Kafka does not provide this functionality out of the box, though
> > > you can easily engineer it by having a localhost Kafka setup:
> > >
> > >    1. Always write data to the localhost Kafka.
> > >    2. When the broker connection is available, read data from the
> > >    localhost Kafka and send it to the remote Kafka broker.
> > >
> > > If you don't want to engineer this system, you can use Apache NiFi; it
> > > is meant for reliable edge-node data ingestion. (A sketch of the
> > > two-step pattern follows this message.)
> > >
> > > Thanks,
> > >  Sunny
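A minimal sketch of Sunny's two-step local-buffer pattern using the Java
clients (broker addresses, topic name, and serializer choices are assumptions;
error handling and retries are elided):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EdgeForwarder {
        public static void main(String[] args) {
            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092"); // local buffering broker
            c.put("group.id", "edge-forwarder");
            c.put("key.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            c.put("value.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            Properties p = new Properties();
            p.put("bootstrap.servers", "remote-broker:9092"); // hypothetical remote cluster
            p.put("acks", "all"); // don't lose data on the way out
            p.put("key.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            p.put("value.serializer",
                    "org.apache.kafka.common.serialization.ByteArraySerializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c);
                 KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p)) {
                // Step 1: read whatever was buffered on the localhost cluster.
                consumer.subscribe(Collections.singletonList("edge-data"));
                while (true) {
                    ConsumerRecords<byte[], byte[]> records =
                            consumer.poll(Duration.ofSeconds(1));
                    // Step 2: forward each record to the remote cluster.
                    for (ConsumerRecord<byte[], byte[]> r : records) {
                        producer.send(new ProducerRecord<>("edge-data", r.key(), r.value()));
                    }
                    producer.flush();      // wait for remote acks...
                    consumer.commitSync(); // ...before committing local offsets
                }
            }
        }
    }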
> > >
> > > On Wed, Apr 20, 2016 at 9:41 AM, Yogesh BG 
> wrote:
> > >
> > > > Hi
> > > >
> > > >
> > > >
> > > > I have a scenario as below; I want to know whether it is supported
> > > > currently. If not, is there any workaround using existing Kafka
> > > > features.
> > > >
> > > > I have a Kafka producer that currently does not have a connection to
> > > > the broker. I want to send the messages to the Kafka broker when the
> > > > connection is available.
> > > >
> > > > Meanwhile, I should be able to delete messages from the producer
> > > > buffer after it reaches some size, or after some days during which
> > > > the connection is not available.
> > > >
> > > > --
> > > > Yogesh..BG
> > > > Senior Software engineer
> > > > Sling Media Pvt. Ltd.
> > > > PSS Plaza, #6,
> > > > Wind Tunnel Road.
> > > > Murghesh Palya,
> > > > Banglore - 560 017
> > > > Contact no: 7760922118
> > > >
> > >
> >
> >
> >
> > --
> > Yogesh..BG
> > Senior Software engineer
> > Sling Media Pvt. Ltd.
> > PSS Plaza, #6,
> > Wind Tunnel Road.
> > Murghesh Palya,
> > Banglore - 560 017
> > Contact no: 7760922118
> >
>


Re: Kafka Connect - Source Connector for Apache Cassandra

2016-04-13 Thread Joe Stein
There is one being worked on here
https://github.com/tuplejump/kafka-connect-cassandra/

On Wed, Apr 13, 2016 at 5:21 PM, Kaz Chehresa  wrote:

> Has anyone tried implementing, or know whether it's possible to implement,
> a "Kafka Connect" source connector for Apache Cassandra, similar to the
> existing JDBCSourceConnector?
>
> Thanks.
>
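The Connect source API the question refers to comes down to implementing a
SourceConnector plus a SourceTask whose poll() returns records; a bare-bones
task might look roughly like the following (the Cassandra query layer is
omitted, and the names here are hypothetical rather than taken from the
tuplejump connector):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    public class CassandraSourceTask extends SourceTask {
        private String topic;

        @Override
        public String version() { return "0.1"; }

        @Override
        public void start(Map<String, String> props) {
            topic = props.get("topic");
            // A real task would open a Cassandra session here.
        }

        @Override
        public List<SourceRecord> poll() throws InterruptedException {
            // A real task would page through a table, tracking its position
            // in the source offset; here we emit a single dummy row.
            Map<String, String> sourcePartition = Collections.singletonMap("table", "demo");
            Map<String, Long> sourceOffset = Collections.singletonMap("position", 0L);
            return Collections.singletonList(new SourceRecord(
                    sourcePartition, sourceOffset, topic,
                    Schema.STRING_SCHEMA, "row-as-string"));
        }

        @Override
        public void stop() {
            // A real task would close the session here.
        }
    }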


Re: Powered by Kafka

2015-11-30 Thread Joe Stein
OK, you should be good to go.

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -

On Mon, Nov 30, 2015 at 1:05 PM, Andrew Schofield <
andrew_schofi...@uk.ibm.com> wrote:

> Hi,
> My Confluence name is "andrew_schofield" and being able to edit would be
> great.
>
> Thanks,
> Andrew
>
> Andrew Schofield
> Chief Architect, Hybrid Cloud Messaging
> Senior Technical Staff Member
> IBM Systems, Middleware
>
> IBM United Kingdom Limited
> Mail Point 211
> Hursley Park
> Winchester
> Hampshire
> SO21 2JN
>
> Phone ext. 37248357 (External:+44-1962-818357), DE2J22
> Internet mail: andrew_schofi...@uk.ibm.com
>
>
>
> From:   Joe Stein 
> To: "users@kafka.apache.org" 
> Date:   30/11/2015 18:02
> Subject:Re: Powered by Kafka
>
>
>
> Hey Andrew, cool, yeah!
>
> What is your Confluence name? You can edit the page once you get
> permission to edit; you just need to ask on the list.
>
> Has anyone thought about working more on that page and putting it
> together better for folks? When I once fit the page onto 7 slides in
> 9-point font it wasn't categorized or anything, but there is a lot of
> helpful info in there for folks.
>
> I think the more services folks know have Kafka behind them, the more
> confidence there is that Kafka can do what is expected of it.
>
> ~ Joe Stein
>
> On Mon, Nov 30, 2015 at 12:39 PM, Andrew Schofield <
> andrew_schofi...@uk.ibm.com> wrote:
>
> > Hi,
> > Please could we be added to the "Powered by Kafka" list.
> >
> > Company: IBM
> > Description: The Message Hub service in our Bluemix PaaS offers
> > Kafka-based messaging in a multi-tenant, pay-as-you-go public cloud.
> It's
> > intended to provide messaging services for microservices, event-driven
> > processing and streaming data into analytics systems.
> >
> > Thanks,
> > Andrew
> >
> > Andrew Schofield
> > Chief Architect, Hybrid Cloud Messaging
> > Senior Technical Staff Member
> > IBM Systems, Middleware
> >
> > IBM United Kingdom Limited
> > Mail Point 211
> > Hursley Park
> > Winchester
> > Hampshire
> > SO21 2JN
> >
> > Phone ext. 37248357 (External:+44-1962-818357), DE2J22
> > Internet mail: andrew_schofi...@uk.ibm.com
> >
>
>
>


Re: Powered by Kafka

2015-11-30 Thread Joe Stein
Hey Andrew, cool, yeah!

What is your Confluence name? You can edit the page once you get
permission to edit; you just need to ask on the list.

Has anyone thought about working more on that page and putting it
together better for folks? When I once fit the page onto 7 slides in
9-point font it wasn't categorized or anything, but there is a lot of
helpful info in there for folks.

I think the more services folks know have Kafka behind them, the more
confidence there is that Kafka can do what is expected of it.

~ Joe Stein

On Mon, Nov 30, 2015 at 12:39 PM, Andrew Schofield <
andrew_schofi...@uk.ibm.com> wrote:

> Hi,
> Please could we be added to the "Powered by Kafka" list.
>
> Company: IBM
> Description: The Message Hub service in our Bluemix PaaS offers
> Kafka-based messaging in a multi-tenant, pay-as-you-go public cloud. It's
> intended to provide messaging services for microservices, event-driven
> processing and streaming data into analytics systems.
>
> Thanks,
> Andrew
>
> Andrew Schofield
> Chief Architect, Hybrid Cloud Messaging
> Senior Technical Staff Member
> IBM Systems, Middleware
>
> IBM United Kingdom Limited
> Mail Point 211
> Hursley Park
> Winchester
> Hampshire
> SO21 2JN
>
> Phone ext. 37248357 (External:+44-1962-818357), DE2J22
> Internet mail: andrew_schofi...@uk.ibm.com
>


Re: [kafka-clients] Re: 0.9.0.0 RC4

2015-11-24 Thread Joe Stein
Thanks to everyone who contributed to this release! It has been a long
time in the works, with some really great new additions for folks waiting
with excitement: the new consumer, security, Connect (Copycat), and
everything else baked in.

Thanks again!

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -

On Mon, Nov 23, 2015 at 11:49 PM, Jun Rao  wrote:

> Thanks everyone for voting.
>
> The following are the results of the votes.
>
> +1 binding = 4 votes (Neha Narkhede, Sriharsha Chintalapani, Guozhang
> Wang, Jun Rao)
> +1 non-binding = 3 votes
> -1 = 0 votes
> 0 = 0 votes
>
> The vote passes.
>
> I will release artifacts to maven central, update the dist svn and download
> site. Will send out an announce after that.
>
> Jun
>
> On Mon, Nov 23, 2015 at 8:46 PM, Jun Rao  wrote:
>
>> +1
>>
>> Thanks,
>>
>> Jun
>>
>> On Fri, Nov 20, 2015 at 5:21 PM, Jun Rao  wrote:
>>
>>> This is the fourth candidate for release of Apache Kafka 0.9.0.0. This is a
>>> major release that includes (1) authentication (through SSL and SASL) and
>>> authorization, (2) a new java consumer, (3) a Kafka connect framework for
>>> data ingestion and egression, and (4) quotas. Since this is a major
>>> release, we will give people a bit more time for trying this out.
>>>
>>> Release Notes for the 0.9.0.0 release
>>>
>>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/RELEASE_NOTES.html
>>>
>>> *** Please download, test and vote by Monday, Nov. 23, 6pm PT
>>>
>>> Kafka's KEYS file containing PGP keys we use to sign the release:
>>> http://kafka.apache.org/KEYS in addition to the md5, sha1
>>> and sha2 (SHA256) checksum.
>>>
>>> * Release artifacts to be voted upon (source and binary):
>>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/
>>>
>>> * Maven artifacts to be voted upon prior to release:
>>> https://repository.apache.org/content/groups/staging/
>>>
>>> * scala-doc
>>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/scaladoc/
>>>
>>> * java-doc
>>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/javadoc/
>>>
>>> * The tag to be voted upon (off the 0.9.0 branch) is the 0.9.0.0 tag
>>>
>>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=132943b0f83831132cd46ac961cf6f1c00132565
>>>
>>> * Documentation
>>> http://kafka.apache.org/090/documentation.html
>>>
>>> /***
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at http://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAFc58G9cig_D_9b4cvcs%3DnLyC-1%2BwF1f%2B%2BnMg%2B89qDHGVcLb_A%40mail.gmail.com
> <https://groups.google.com/d/msgid/kafka-clients/CAFc58G9cig_D_9b4cvcs%3DnLyC-1%2BwF1f%2B%2BnMg%2B89qDHGVcLb_A%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>


Re: One more Kafka Meetup hosted by LinkedIn in 2015 (this time in San Francisco) - does anyone want to talk?

2015-11-04 Thread Joe Stein
They should all be on the user groups section of the Confluence page
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
for the ones for which there was video. It might need some curating, but
that is where they have been going so far.

~ Joe Stein

On Tue, Nov 3, 2015 at 4:48 PM, Grant Henke  wrote:

> Is there a place where we can find all previously streamed/recorded
> meetups?
>
> Thank you,
> Grant
>
> On Tue, Nov 3, 2015 at 2:07 PM, Ed Yakabosky 
> wrote:
>
> > I'm sorry to hear that Lukas.  I have heard that people are starting to
> do
> > carpools via rydeful.com for some of these meetups.
> >
> > Additionally, we will live stream and record the presentations, so you
> can
> > participate remotely.
> >
> > Ed
> >
> > On Tue, Nov 3, 2015 at 10:43 AM, Lukas Steiblys 
> > wrote:
> >
> > > This is sad news. I was looking forward to finally going to a Kafka or
> > > Samza meetup. Going to Mountain View for a meetup is just unrealistic
> > with
> > > 2h travel time each way.
> > >
> > > Lukas
> > >
> > > -Original Message- From: Ed Yakabosky
> > > Sent: Tuesday, November 3, 2015 10:36 AM
> > > To: users@kafka.apache.org ; d...@kafka.apache.org ; Clark Haskins
> > > Subject: Re: One more Kafka Meetup hosted by LinkedIn in 2015 (this
> time
> > > in San Francisco) - does anyone want to talk?
> > >
> > > Hi all,
> > >
> > > Two corrections to the invite:
> > >
> > >   1. The invitation is for November 18, 2015.  *NOT 2016.*  I was a
> > little
> > >   hasty...
> > >   2. LinkedIn has finished remodeling our broadcast room, so we are
> going
> > >
> > >   to host the meet up in Mountain View, not San Francisco.
> > >
> > > We've arranged for speakers from HortonWorks to talk about Security and
> > > LinkedIn to talk about Quotas.  We are still looking for one more
> > speaker,
> > > so please let me know if you are interested.
> > >
> > > Thanks!
> > > Ed
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Oct 30, 2015 at 12:49 PM, Ed Yakabosky <
> eyakabo...@linkedin.com>
> > > wrote:
> > >
> > > Hi all,
> > >>
> > >> LinkedIn is hoping to host one more Apache Kafka meetup this year on
> > >> November 18 in our San Francisco office.  We're working on building
> the
> > >> agenda now.  Does anyone want to talk?  Please send me (and Clark) a
> > >> private email with a short description of what you would be talking
> > about
> > >> if interested.
> > >>
> > >> --
> > >> Thanks,
> > >>
> > >> Ed Yakabosky
> > >> ​Technical Program Management @ LinkedIn>
> > >>
> > >>
> > >
> > > --
> > > Thanks,
> > > Ed Yakabosky
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ed Yakabosky
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


Re: Pushing from Pig to Kafka

2015-09-28 Thread Joe Stein
Hey Stefan, I don't think it matters much which list; a lot of folks read
both (depending on where, and from whom, you are expecting an answer).

It has been about two years since folks were operationalizing that part of
the code (for a lot of reasons, mostly map/reduce now having other
options, etc).

We should maybe think about deprecating it or reducing its support (the
latter has probably already happened in practice), as the case may be.

If you really need to go from Pig <-> Kafka and Kafka <-> Pig, you may want
to ask the Pig community about their best-supported integrations... I
suspect that integrated system will support Kafka too :)

The Hadoop community has spent A LOT of time since 0.8 making all of that
ecosystem more cohesive within that community. I urge support for it
always.

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -

On Mon, Sep 28, 2015 at 4:11 AM, Stefan Kupstaitis-Dunkler <
stefan@gmail.com> wrote:

> Hi all!
>
> I accidentally posted this on the dev mailing list yesterday, when it's
> much better off here. Can anybody help me?
>
> When I run a pig script I get:
>
> java.lang.InstantiationError: org.apache.avro.io.BinaryEncoder
> at
>
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.lang.InstantiationError: org.apache.avro.io.BinaryEncoder
> at
> kafka.bridge.pig.AvroKafkaStorage.prepareToWrite(AvroKafkaStorage.java:77)
>
> I am using an HDP 2.2 installation which comes with Kafka 0.8.1. I try to
> push a single string from Pig into Kafka and follow these instructions:
>
> https://github.com/apache/kafka/blob/43b92f8b1ce8140c432edf11b0c842f5fbe04120/contrib/hadoop-producer/README.md
>
> The script looks like:
> REGISTER
>
> /usr/hdp/current/kafka-broker/contrib/lib/kafka-hadoop-producer-0.8.1.2.2.0.0-2041.jar;
> REGISTER /usr/hdp/current/pig-client/lib/avro-1.4.1.jar;
> REGISTER /usr/hdp/current/pig-client/piggybank.jar;
> REGISTER
> /usr/hdp/current/kafka-broker/libs/kafka_2.10-0.8.1.2.2.0.0-2041.jar;
> REGISTER /usr/hdp/current/pig-client/lib/jackson-core-asl-1.8.8.jar;
> REGISTER /usr/hdp/current/pig-client/lib/jackson-mapper-asl-1.8.8.jar;
> REGISTER /usr/hdp/current/kafka-broker/libs/scala-library-2.10.4.jar;
> STORE my_data INTO 'kafka://MY-BROKER-HOST-NAME:6667/myTopic' USING
> kafka.bridge.pig.AvroKafkaStorage('"string"');
>
> MY-BROKER-HOST-NAME is one of my Kafka brokers, myTopic is an existing
> topic. Version 1.4.1 of Avro is not deployed by default, but I deployed it
> and registered it instead of Avro 1.7.5 because I first thought this would
> solve my problem; the same problem occurs, though. Also, I noticed that the
> same problem still occurs when I do not register Avro at all. When I look
> into the Avro documentation I can see that the BinaryEncoder class is
> abstract, but it is nevertheless instantiated in the AvroKafkaStorage
> class, so the error is not a surprise. Is there anything I am missing to
> make this work? Am I doing something wrong?
>
> Best regards, Stefan
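For context on the error itself: in Avro 1.4, BinaryEncoder was a concrete
class with a public constructor, and it became abstract when the
EncoderFactory API was introduced in Avro 1.5, so contrib code compiled
against 1.4 throws InstantiationError whenever a newer Avro wins on the
classpath. A minimal sketch of the factory-based way to obtain an encoder
under Avro 1.5+:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.util.Utf8;

    public class AvroEncodeSketch {
        public static void main(String[] args) throws IOException {
            Schema schema = Schema.create(Schema.Type.STRING);
            GenericDatumWriter<Object> writer = new GenericDatumWriter<>(schema);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Avro 1.5+: encoders come from the factory, not a constructor.
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            writer.write(new Utf8("hello"), encoder);
            encoder.flush();
            System.out.println(out.size() + " bytes written");
        }
    }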
>


Re: Broker restart, new producer and snappy in 0.8.2.1

2015-09-13 Thread Joe Stein
Hi, the 0.8.2.2 release (whose vote just passed; it should be announced
soon) has a patch that may be related, though I'm not sure:
https://issues.apache.org/jira/browse/KAFKA-2308

Here are the 0.8.2.2 artifacts:
https://people.apache.org/~junrao/kafka-0.8.2.2-candidate1/ I don't see
them yet in Maven Central, so you will need to download them into a local
Maven repository and use the 0.8.2.2 producer. I don't know if it will fix
your problem, but if you can reproduce it, please check, and if it is not
fixed, posting to a JIRA would be great.

Thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -

On Sat, Sep 12, 2015 at 3:32 PM, Vidhya Arvind 
wrote:

> There have been multiple instances of this incident. When I restart the
> broker for config changes, it brings down the consumers and mirror maker,
> and I am seeing CRC corruption in mirror maker and the following error in
> the broker. I have reset the offset in ZooKeeper for certain
> topic/partitions, but I still see this issue popping up for other
> topics/partitions. Please let me know how I can resolve this. This has
> happened twice in the production system.
>
> Please let me know if there is anything I can try to fix the issue before
> the next broker restart.
>
> Vidhya
>
>
> [2015-09-12 19:04:03,409] ERROR [KafkaApi-30001] Error processing
> ProducerRequest with correlation id 3480 from client producer-1 on
> partition [events_prod_oncue.ws.client.ui_UIEvent,29]
> (kafka.server.KafkaApis)
> kafka.common.KafkaException: Error in validating messages while
> appending to log 'events_prod_oncue.ws.client.ui_UIEvent-29'
> at kafka.log.Log.liftedTree1$1(Log.scala:277)
> at kafka.log.Log.append(Log.scala:274)
> at
> kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:379)
> at
> kafka.cluster.Partition$$anonfun$appendMessagesToLeader$1.apply(Partition.scala:365)
> at kafka.utils.Utils$.inLock(Utils.scala:535)
> at kafka.utils.Utils$.inReadLock(Utils.scala:541)
> at
> kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:365)
> at
> kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:291)
> at
> kafka.server.KafkaApis$$anonfun$appendToLocalLog$2.apply(KafkaApis.scala:282)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> at
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at kafka.server.KafkaApis.appendToLocalLog(KafkaApis.scala:282)
> at
> kafka.server.KafkaApis.handleProducerOrOffsetCommitRequest(KafkaApis.scala:204)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:59)
> at
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
> at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
> at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
> at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:444)
> at org.xerial.snappy.Snappy.uncompress(Snappy.java:480)
> at
> org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:362)
> at
> org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:167)
> at
> org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:150)
> at java.io.InputStream.read(InputStream.java:101)
> at
> kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply$mcI$sp(ByteBufferMessageSet.scala:67)
> at
> kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply(ByteBufferMessageSet.scala:67)
> at
> kafka.message.ByteBufferMessageSet$$anonfun$decompress$1.apply(ByteBufferMessageSet.scala:67)
> at
> scala.collection.immutable.Stream$.continually(Stream.scala:1129)
> at
> scala.collection.immutable.Stream$$anonfun$continually$1.apply(Stream.scala:1129)
> at
> scala.collection.immutable.Stream$$anonfun$c

Re: [DISCUSSION] Kafka 0.8.2.2 release?

2015-08-15 Thread Joe Stein
Is this something that should also be in 0.8.2.2?
https://issues.apache.org/jira/browse/KAFKA-2421 I wasn't sure what the
LZ4 usage has been; it seems to break on some JDKs.

+1 on the snappy fixes

~ Joe Stein

On Fri, Aug 14, 2015 at 5:39 PM, Guozhang Wang  wrote:

> +1 for both KAFKA-2189 and 2308.
>
> On Fri, Aug 14, 2015 at 7:03 AM, Gwen Shapira  wrote:
>
> > It would be nice to include KAFKA-2308 and fix two critical snappy
> > issues in the maintenance release.
> >
> > Gwen
> > On Aug 14, 2015 6:16 AM, "Grant Henke"  wrote:
> >
> > > Just to clarify. Will KAFKA-2189 be the only patch in the release?
> > >
> > > On Fri, Aug 14, 2015 at 7:35 AM, Manikumar Reddy  >
> > > wrote:
> > >
> > > > +1  for 0.8.2.2 release
> > > >
> > > > On Fri, Aug 14, 2015 at 5:49 PM, Ismael Juma 
> > wrote:
> > > >
> > > > > I think this is a good idea as the change is minimal on our side
> and
> > it
> > > > has
> > > > > been tested in production for some time by the reporter.
> > > > >
> > > > > Best,
> > > > > Ismael
> > > > >
> > > > > On Fri, Aug 14, 2015 at 1:15 PM, Jun Rao  wrote:
> > > > >
> > > > > > Hi, Everyone,
> > > > > >
> > > > > > Since the release of Kafka 0.8.2.1, a number of people have
> > reported
> > > an
> > > > > > issue with snappy compression (
> > > > > > https://issues.apache.org/jira/browse/KAFKA-2189). Basically, if
> > > they
> > > > > use
> > > > > > snappy in 0.8.2.1, they will experience a 2-3X space increase.
> The
> > > > issue
> > > > > > has since been fixed in trunk (just a snappy jar upgrade). Since
> > > 0.8.3
> > > > is
> > > > > > still a few months away, it may make sense to do an 0.8.2.2
> release
> > > > just
> > > > > to
> > > > > > fix this issue. Any objections?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Grant Henke
> > > Software Engineer | Cloudera
> > > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: abstracting ZooKeeper

2015-08-09 Thread Joe Stein
I have started writing a KIP about this topic
https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems
with hopes to get more of it typed out and then circulate for discussion on
the dev list over the next ~ week.

I think the plug-in that the project should support is the existing
implementation, which will give folks not looking for an alternative a
stable upgrade path.

There are 5 ways I have heard folks wanting to do this. From the project's
perspective I think that 1 (max 2) should be supported, with the rest
available in a contrib repo we can link to.

Other thoughts/comments? Happy to catch those and update the KIP.

Thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -

On Sun, Aug 9, 2015 at 8:30 PM, Julio Castillo <
jcasti...@financialengines.com> wrote:

> The only reason for this request is that I may want to use alternatives
> like Consul.
>
> ** julio
>
> On 8/9/15, 3:40 PM, "Joe Lawson" 
> wrote:
>
> >Inline responses below.
> >
> >Sincerely,
> >
> >Joe Lawson
> >
> >On Aug 9, 2015 1:52 PM, "Julio Castillo" 
> >wrote:
> >>
> >> Thank for the lead.
> >> Does that mean that Kafka is/will be using Curator?
> >
> >I don't think so.
> >
> >>
> >> Also, this appears to simplify the interaction with ZooKeeper, but if I
> >> understand it correctly, it doesn't abstract the interface where one
> >> could plug in a different service.
> >
> >You are right. I misunderstood what you meant and am unaware of any
> >ZooKeeper abstraction api that could allow other k/v stores underneath.
> >That is an interesting idea. Any reasons for the desire?
> >
> >>
> >> Thanks
> >>
> >> ** julio
> >>
> >> On 8/9/15, 10:21 AM, "Joe Lawson" 
> >> wrote:
> >>
> >> >Netflix contributed Curator
> >> >(http://curator.apache.org/) to Apache, which
> >> >implements some generic ZooKeeper recipes.
> >> >On Aug 9, 2015 11:39 AM, "Julio Castillo"
> >> >>
> >> >wrote:
> >> >
> >> >> Has there been any thought about abstracting the interface to ZooKeeper?
> >> >>
> >> >> The reason I'm asking is because I'm looking at Consul for service
> >> >> discovery today, perhaps a different one tomorrow, but the point here
> >is
> >> >> the ability to plug in any type of service discovery, K/V store
> >service.
> >> >>
> >> >> Any thoughts?
> >> >>
> >> >> ** julio
> >> >>
>


Re: Re: Closing socket connection to /192.115.190.61. (kafka.network.Processor)

2015-06-19 Thread Joe Stein
Yup, my bad. I really thought we got that one in, but it is on trunk, so
0.8.3 it is.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, Jun 19, 2015 at 8:52 AM, Todd Palino  wrote:

> I don't think this got changed until after 0.8.2. I believe the change is
> still in trunk and not a released version. We haven't even picked it up
> internally at LinkedIn yet.
>
> -Todd
>
>
> On Fri, Jun 19, 2015 at 12:03 AM, bit1...@163.com  wrote:
>
> > Thank you for the reply.
> > I am using kafka_2.10-0.8.2.1, and I didn't change the logging settings
> > in Kafka.
> >
> >
> >
> > bit1...@163.com
> >
> > From: Joe Stein
> > Date: 2015-06-19 13:43
> > To: users
> > Subject: Re: Closing socket connection to /192.115.190.61.
> > (kafka.network.Processor)
> > What version of Kafka are you using? This was changed to debug level in
> > 0.8.2.
> >
> > ~ Joestein
> > On Jun 18, 2015 10:39 PM, "bit1...@163.com"  wrote:
> >
> > > Hi,
> > > I have started the Kafka server as a background process; however, the
> > > following INFO log appears on the console every 10 seconds.
> > > It looks like it is not an error, since its log level is INFO. How
> > > could I suppress this annoying log? Thanks
> > >
> > >
> > > [2015-06-19 13:34:10,884] INFO Closing socket connection to /
> > > 192.115.190.61. (kafka.network.Processor)
> > >
> > >
> > >
> > > bit1...@163.com
> > >
> >
>
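Until running a version with that change, a common workaround is to raise the
log level for that one class in the broker's log4j configuration (assuming the
stock config/log4j.properties layout):

    # config/log4j.properties -- silence the per-connection INFO messages
    log4j.logger.kafka.network.Processor=WARN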


Re: Closing socket connection to /192.115.190.61. (kafka.network.Processor)

2015-06-18 Thread Joe Stein
What version of Kafka are you using? This was changed to debug level in
0.8.2.

~ Joestein
On Jun 18, 2015 10:39 PM, "bit1...@163.com"  wrote:

> Hi,
> I have started the Kafka server as a background process; however, the
> following INFO log appears on the console every 10 seconds.
> It looks like it is not an error, since its log level is INFO. How could I
> suppress this annoying log? Thanks
>
>
> [2015-06-19 13:34:10,884] INFO Closing socket connection to /
> 192.115.190.61. (kafka.network.Processor)
>
>
>
> bit1...@163.com
>


Re: Keeping Zookeeper and Kafka Server Up

2015-06-17 Thread Joe Stein
+1 to using Exhibitor. Besides managing the ensemble, it also helps with
backups and ZooKeeper log cleanup (without which your machine will
eventually run out of space).

~ Joestein
On Jun 17, 2015 9:44 AM, "Dillian Murphey"  wrote:

> supervisord is pretty easy to use.  Netflix Exhibitor will manage this all
> for zookeeper, if you want to try that tool.
>
> On Wed, Jun 17, 2015 at 7:03 AM, Kashyap Mhaisekar 
> wrote:
>
> > We use supervisord for this. It ensures that the processes are always up
> > and running.
> >
> > Thanks
> > Kashyap
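For reference, a minimal supervisord entry for the broker might look like the
following (the install paths are assumptions; adjust them to your layout):

    [program:kafka]
    command=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
    autostart=true
    autorestart=true
    stopsignal=TERM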
> >
> > On Wednesday, June 17, 2015, Shayne S  wrote:
> >
> > > kafka-server-start.sh has a -daemon option, but I don't think Zookeeper
> > has
> > > it.
> > >
> > > On Tue, Jun 16, 2015 at 11:32 PM, Su She  > > > wrote:
> > >
> > > > It seems like nohup has solved this issue; even when the PuTTY window
> > > > becomes inactive the processes are still running (I didn't need to
> > > > interact with them). I might look into using screen or tmux as a
> > > > long-term solution.
> > > >
> > > > Thanks Terry and Mike!
> > > >
> > > > Best,
> > > >
> > > > Su
> > > >
> > > >
> > > > On Tue, Jun 16, 2015 at 3:42 PM, Terry Bates  > > >
> > > > wrote:
> > > > > Greetings,
> > > > >
> > > > > nohup does the trick, as Mr. Bridge has shared. If you seem to want
> > to
> > > > run
> > > > > these and still have some "interactivity" with
> > > > > the services, consider using "screen" or "tmux" as these will
> enable
> > > you
> > > > to
> > > > > run these programs in foreground, have added
> > > > > windows you can use to access shell, tail logs, and so on, and
> enable
> > > you
> > > > > to disconnect from the session, but still have
> > > > > these sessions available for re-attachment.
> > > > >
> > > > > In addition, using "runit" for service supervision may enable you
> > > > > to keep daemons running, but if your services are dying you may
> > > > > need to introspect more deeply on the root cause versus working
> > > > > around it by restarting them.
> > > > >
> > > > >
> > > > > *Terry Bates*
> > > > >
> > > > > *Email: *terryjba...@gmail.com 
> > > > > *Phone: (*412) 215-0881
> > > > > *Skype*: terryjbates
> > > > > *GitHub*: https://github.com/terryjbates
> > > > > *Linkedin*: http://www.linkedin.com/in/terryjbates/
> > > > >
> > > > >
> > > > > On Tue, Jun 16, 2015 at 3:30 PM, Mike Bridge <
> m...@bridgecanada.com
> > > >
> > > > wrote:
> > > > >
> > > > >> Have you tried using "nohup"
> > > > >>
> > > > >> nohup bin/zookeeper-server-start.sh
> config/zookeeper.properties
> > &
> > > > >> nohup bin/kafka-server-start.sh config/server.properties &
> > > > >>
> > > > >>
> > > > >> On Tue, Jun 16, 2015 at 3:21 PM, Su She  > > > wrote:
> > > > >>
> > > > >> > Hello Everyone,
> > > > >> >
> > > > >> > I'm wondering how to keep Zookeeper and Kafka Server up even
> when
> > my
> > > > >> > SSH (using putty) becomes inactive. I've tried running it in the
> > > > >> > background (using &), but it seems like it stops sometimes
> after a
> > > > >> > couple hours or so and I'll have to restart zookeeper and/or the
> > > kafka
> > > > >> > server.
> > > > >> >
> > > > >> > The only remediation I've found is to export TMOUT=[big number],
> > > > >> > but there must be another solution.
> > > > >> >
> > > > >> > Thank you!
> > > > >> >
> > > > >> > Best,
> > > > >> >
> > > > >> > Su
> > > > >> >
> > > > >>
> > > >
> > >
> >
>


Re: If you run Kafka in AWS or Docker, how do you persist data?

2015-06-11 Thread Joe Stein
If you're running Kafka on Mesos you should use the framework
https://github.com/mesos/kafka, as the scheduler is more opinionated than
a generic launch on Marathon.

~ Joestein
On Jun 11, 2015 10:26 AM, "Domen Pogacnik" <
domen.pogac...@rocket-internet.de> wrote:

> @Jeff Schroeder:
>
> I’m trying to do a similar thing: running Kafka brokers in docker under
> marathon under Mesos. I’m wondering if you're able to do a rolling deploy
> in
> a way that Marathon waits for the old broker instances to replicate the
> data to the new ones before killing them?
>
> Best,
>
> Domen
>


Re: KAFKA 0.9 release timeline

2015-05-29 Thread Joe Stein
Hey Murthy, we discussed 0.8.3 a couple of months back:
https://mail-archives.apache.org/mod_mbox/kafka-dev/201503.mbox/%3ccafc58g8qpqy1mwd3w3oobexepdmgyy9gzuxcnovgesg1txd...@mail.gmail.com%3E
We should be in a better place in the next few weeks to say when that
could ship. A lot of what you might have been looking for could be in
0.8.3; you can see the security issues targeted at 0.8.3 here:
https://issues.apache.org/jira/browse/KAFKA-1882?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%200.8.3%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20security%20ORDER%20BY%20priority%20DESC

We should start the discussion again, maybe in a few weeks, after folks
have had a chance to test and try out all of what is done/almost done.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, May 29, 2015 at 10:04 AM, Kuchibhotla, Murthy <
venkata.murthy.kuchibho...@bnymellon.com> wrote:

> Hi,
>
> I am checking to see when we will get the Kafka 0.9 version, in which
> basic authentication (security) is being implemented.
>
> Thanks
> --
> Murthy Kuchibhotla
>
>


Re: kafkacat

2015-05-19 Thread Joe Stein
Try out Bruce: https://github.com/ifwe/bruce. It's a daemon that listens
on a socket and produces to Kafka; it does exactly what you are looking
for, I think.

~ Joestein
On May 19, 2015 7:05 AM, "clay teahouse"  wrote:

> Thanks Magnus. I'll take a look at n2kafka. I have many data sources
> sending data to kafka and I don't want to spawn lots of kafkacat processes.
>
> On Tue, May 19, 2015 at 2:40 AM, Magnus Edenhill 
> wrote:
>
> > Hi Clay,
> >
> > not really sure what you mean by socket, but if you want something
> > listening on a network port and forwards/produces all data to Kafka then
> > you might want to look at n2kafka: https://github.com/redBorder/n2kafka
> >
> > Another alternative would be to use inetd, socat, or similar to pipe a
> > network socket to kafkacat.
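A rough, untested sketch of that socat approach (the port, broker address, and
topic are placeholders):

    # Accept TCP connections on port 9999 and produce each line to Kafka.
    socat TCP-LISTEN:9999,reuseaddr,fork STDOUT | kafkacat -P -b localhost:9092 -t mytopic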
> >
> > Regards,
> > Magnus
> >
> >
> > 2015-05-19 2:02 GMT+02:00 clay teahouse :
> >
> > > Hi All,
> > >
> > > Does anyone know of an implementation of kafkacat that reads from a
> > > socket? Or, for that matter, any Kafka producer client that can read
> > > from a socket and publish to Kafka?
> > >
> > > thanks
> > >
> > > Clay
> > >
> >
>


Re: Data replication and zero data loss

2015-05-01 Thread Joe Stein
If you want zero data loss you should also look into the
min.insync.replicas setting in 0.8.2.1, as it guarantees data is on
multiple replicas before a produce is acknowledged.

If you don't have that set, then you have this scenario as a possibility:

Let's say 1 topic, 1 partition, replication factor 3. You are producing
with ACK=-1.

b1, b2, b3 (where b=broker; b1 is the leader, b2 and b3 are replicas).

b1 and b2 die; b3 is leader. So far all is well.

10 minutes go by and b3 dies.

1 minute later b1 comes back online; it will truncate essentially 45
minutes of data that upstream thought was saved.

But with this set, an ACK=-1 produce can now fail if you don't have enough
replicas to satisfy the data loss guarantee (min.insync.replicas=2 or 3,
depending on the data).
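A sketch of the settings involved, using the 0.8.2 Java producer (the broker
list, topic, and serializers are examples):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class NoLossProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "b1:9092,b2:9092,b3:9092");
            props.put("acks", "all"); // same as -1: wait on the in-sync replicas
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            // Topic side, set at creation time:
            //   --replication-factor 3 --config min.insync.replicas=2
            // With fewer than 2 in-sync replicas, acks=all sends fail loudly
            // instead of silently accepting data that could later be truncated.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("events", "key", "value"));
            }
        }
    }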

Also take a look at
https://github.com/stealthly/go_kafka_client/tree/master/mirrormaker it
might be helpful for what you are looking for.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, May 1, 2015 at 7:43 AM, Joong Lee  wrote:

> It is based on our understanding from reading the documents.
>
> We aren't concerned about data duplication, as that is going to be
> handled by Elasticsearch.
>
> > On May 1, 2015, at 12:15 AM, Daniel Compton <
> daniel.compton.li...@gmail.com> wrote:
> >
> > When we evaluated MirrorMaker last year we didn't find any risk of data
> > loss, only duplicate messages in the case of a network partition.
> >
> > Did you discover data loss in your tests, or were you just looking at the
> > docs?
> > On Fri, 1 May 2015 at 4:31 pm Jiangjie Qin 
> > wrote:
> >
> >> Which mirror maker version did you look at? The MirrorMaker in trunk
> >> should not have data loss if you just use the default setting.
> >>
> >>> On 4/30/15, 7:53 PM, "Joong Lee"  wrote:
> >>>
> >>> Hi,
> >>> We are exploring Kafka to keep two data centers (primary and DR)
> >>> running hosts of Elasticsearch nodes in sync. One key requirement is
> >>> that we can't lose any data. We POC'd use of MirrorMaker and felt it
> >>> may not meet our data loss requirement.
> >>>
> >>> I would like ask the community if we should look for another solution
> or
> >>> would Kafka be the right solution considering zero data loss
> requirement.
> >>>
> >>> Thanks
> >>
> >>
>


Re: Horizontally Scaling Kafka Consumers

2015-04-29 Thread Joe Stein
The Go Kafka Client also supports offset storage in ZooKeeper and Kafka
https://github.com/stealthly/go_kafka_client/blob/master/docs/offset_storage.md
and has two other strategies for partition ownership with a consensus
server (it currently uses ZooKeeper; we will be implementing Consul in the
near future).

~ Joestein

On Thu, Apr 30, 2015 at 2:15 AM, Nimi Wariboko Jr 
wrote:

> My mistake; it seems the Java drivers are a lot more advanced than
> Shopify's Kafka driver (or I am missing something), and I haven't used
> Kafka before.
>
> With the Go driver it seems you have to manage offsets and partitions
> within the application code, while with the Scala driver it seems you
> have the option of simply subscribing to a topic and having someone else
> manage that part.
>
> After digging around a bit more, I found there is another library,
> https://github.com/wvanbergen/kafka, that speaks the consumer group API
> and accomplishes what I was looking for; I assume it is implemented by
> keeping track of memberships with ZooKeeper.
>
> Thank you for the information - it really helped clear up what I was
> failing to understand with Kafka.
>
> Nimi
>
> On Wed, Apr 29, 2015 at 10:10 PM, Joe Stein  wrote:
>
> > You can do this with the existing Kafka Consumer
> >
> >
> https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L106
> > and probably any other Kafka client too (maybe with minor/major rework
> > to do the offset management).
> >
> > The new consumer approach is more transparent on "Subscribing To Specific
> > Partitions"
> >
> >
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L200-L234
> > .
> >
> > Here is a Docker file (** pull request pending **) for wrapping kafka
> > consumers (doesn't have to be the go client, need to abstract that out
> some
> > more after more testing)
> >
> >
> https://github.com/stealthly/go_kafka_client/blob/mesos-marathon/consumers/Dockerfile
> >
> >
> > Also a VM (** pull request pending **) to build container, push to local
> > docker repository and launch on Apache Mesos
> >
> >
> https://github.com/stealthly/go_kafka_client/tree/mesos-marathon/mesos/vagrant
> as a working example of how to do it.
> >
> > All of this could be done without the Docker container and still work on
> > Mesos ... or even without Mesos and on YARN.
> >
> > You might also want to checkout how Samza integrates with Execution
> > Frameworks
> >
> >
> http://samza.apache.org/learn/documentation/0.9/comparisons/introduction.html
> > which has a Mesos patch https://issues.apache.org/jira/browse/SAMZA-375
> > and
> > built in YARN support.
> >
> > ~ Joe Stein
> > - - - - - - - - - - - - - - - - -
> >
> >   http://www.stealth.ly
> > - - - - - - - - - - - - - - - - -
> >
> > On Wed, Apr 29, 2015 at 8:56 AM, David Corley 
> > wrote:
> >
> > > You're right Stevo, I should re-phrase to say that there can be no more
> > > _active_ consumers than there are partitions (within a single consumer
> > > group).
> > > I'm guessing that's what Nimi is alluding to asking, but perhaps he can
> > > elaborate on whether he's using consumer groups and/or whether the 100
> > > partitions are all for a single topic, or multiple topics.
> > >
> > > On 29 April 2015 at 13:38, Stevo Slavić  wrote:
> > >
> > > > Please correct me if wrong, but I think it is really not a hard
> > > > constraint that one cannot have more consumers (from the same group)
> > > > than partitions on a single topic - all the surplus consumers will
> > > > not be assigned to consume any partition, but they can be there, and
> > > > as soon as one active consumer from the same group goes offline (its
> > > > connection to ZK is dropped), consumers in the group will be
> > > > rebalanced so one passively waiting consumer will become active.
> > > >
> > > > Kind regards,
> > > > Stevo Slavic.
> > > >
> > > > On Wed, Apr 29, 2015 at 2:25 PM, David Corley  >
> > > > wrote:
> > > >
> > > > > If the 100 partitions are all for the same topic, you can have up
> to
> > > 100
> > > > > consumers working as part of a single consumer group for that
> topic.
> > > > > You cannot have more consumers than there are partitions within a
> > given
> > > > > consumer group.
> > > > >
> > > > > On 29 April 2015 at 08:41, Nimi Wariboko Jr  >
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I was wondering what options there are for horizontally scaling
> > kafka
> > > > > > consumers? Basically if I have 100 partitions and 10 consumers,
> and
> > > > want
> > > > > to
> > > > > > temporarily scale up to 50 consumers, what options do I have?
> > > > > >
> > > > > > So far I've thought of just simply tracking consumer membership
> > > somehow
> > > > > > (either through Raft or zookeeper's znodes) on the consumers.
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Horizontally Scaling Kafka Consumers

2015-04-29 Thread Joe Stein
You can do this with the existing Kafka Consumer
https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L106
and probably any other Kafka client too (maybe with minor/major rework
to do the offset management).

The new consumer approach is more transparent on "Subscribing To Specific
Partitions"
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L200-L234
.
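In the new consumer as it ultimately shipped, that partition-specific
subscription became assign(); a rough sketch (brokers, topic, and partition
numbers are placeholders):

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class AssignedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                // Take two partitions explicitly; no group coordination is
                // involved, so scaling out is entirely up to the application.
                consumer.assign(Arrays.asList(
                        new TopicPartition("mytopic", 0),
                        new TopicPartition("mytopic", 1)));
                while (true) {
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                    for (ConsumerRecord<byte[], byte[]> r : records)
                        System.out.printf("partition %d @ offset %d%n",
                                r.partition(), r.offset());
                }
            }
        }
    }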

Here is a Docker file (** pull request pending **) for wrapping kafka
consumers (doesn't have to be the go client, need to abstract that out some
more after more testing)
https://github.com/stealthly/go_kafka_client/blob/mesos-marathon/consumers/Dockerfile


Also a VM (** pull request pending **) to build container, push to local
docker repository and launch on Apache Mesos
https://github.com/stealthly/go_kafka_client/tree/mesos-marathon/mesos/vagrant
as a working example of how to do it.

All of this could be done without the Docker container and still work on
Mesos ... or even without Mesos and on YARN.

You might also want to checkout how Samza integrates with Execution
Frameworks
http://samza.apache.org/learn/documentation/0.9/comparisons/introduction.html
which has a Mesos patch https://issues.apache.org/jira/browse/SAMZA-375 and
built in YARN support.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Wed, Apr 29, 2015 at 8:56 AM, David Corley  wrote:

> You're right Stevo, I should re-phrase to say that there can be no more
> _active_ consumers than there are partitions (within a single consumer
> group).
> I'm guessing that's what Nimi is alluding to asking, but perhaps he can
> elaborate on whether he's using consumer groups and/or whether the 100
> partitions are all for a single topic, or multiple topics.
>
> On 29 April 2015 at 13:38, Stevo Slavić  wrote:
>
> > Please correct me if wrong, but I think it is really not a hard
> > constraint that one cannot have more consumers (from the same group)
> > than partitions on a single topic - all the surplus consumers will not
> > be assigned to consume any partition, but they can be there, and as soon
> > as one active consumer from the same group goes offline (its connection
> > to ZK is dropped), consumers in the group will be rebalanced so one
> > passively waiting consumer will become active.
> >
> > Kind regards,
> > Stevo Slavic.
> >
> > On Wed, Apr 29, 2015 at 2:25 PM, David Corley 
> > wrote:
> >
> > > If the 100 partitions are all for the same topic, you can have up to
> 100
> > > consumers working as part of a single consumer group for that topic.
> > > You cannot have more consumers than there are partitions within a given
> > > consumer group.
> > >
> > > On 29 April 2015 at 08:41, Nimi Wariboko Jr 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I was wondering what options there are for horizontally scaling kafka
> > > > consumers? Basically if I have 100 partitions and 10 consumers, and
> > want
> > > to
> > > > temporarily scale up to 50 consumers, what options do I have?
> > > >
> > > > So far I've thought of just simply tracking consumer membership
> somehow
> > > > (either through Raft or zookeeper's znodes) on the consumers.
> > > >
> > >
> >
>


Re: jruby-kafka 1.4.0 released

2015-03-21 Thread Joe Stein
Nice!

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sun, Mar 22, 2015 at 12:03 AM, Joseph Lawson  wrote:

> Hi everyone,
>
>
> I updated jruby-kafka to version 1.4.0 which uses Kafka 0.8.2.1 and has a
> new interface for the latest KafkaProducer. I also recently updated the
> packaging of the gemset to include the jar files needed to run everything
> so it's easier to use.  Check it out
> https://github.com/joekiller/jruby-kafka or gem install jruby-kafka.
> Feedback is always welcome.
>
>
> Regards,
>
>
> Joe Lawson
>
>


Re: Kafka elastic no downtime scalability

2015-03-13 Thread Joe Stein
https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -
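For reference, the expansion procedure behind that link comes down to handing
the reassignment tool a JSON plan; a minimal hand-written example (the topic,
partitions, and broker ids are made up):

    # expand.json -- where each partition's replicas should live afterwards
    {"version":1,"partitions":[
      {"topic":"mytopic","partition":0,"replicas":[1,2]},
      {"topic":"mytopic","partition":1,"replicas":[2,3]}]}

    # Hand the plan to the tool; re-run with --verify to check progress.
    bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
        --reassignment-json-file expand.json --execute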

On Fri, Mar 13, 2015 at 3:05 PM, sunil kalva  wrote:

> Joe
>
> "Well, I know it is semantic but right now it "can" be elastically scaled
> without down time but you have to integrate into your environment for what
> that means it has been that way since 0.8.0 imho"
>
> Here, what do you mean by "you have to integrate into your environment"?
> How do I achieve an elastically scaled cluster seamlessly?
>
> SunilKalva
>
> On Fri, Mar 13, 2015 at 10:27 PM, Joe Stein  wrote:
>
> > Well, I know it is semantics, but right now it "can" be elastically
> > scaled without downtime; you have to integrate it into your environment
> > for what that means, and it has been that way since 0.8.0, IMHO.
> >
> > My point was just another way to do that out of the box... folks do
> > this elastic scaling today with AWS CloudFormation and internal systems
> > they built, too.
> >
> > So, it can be done... you just have to do it.
> >
> > ~ Joe Stein
> > - - - - - - - - - - - - - - - - -
> >
> >   http://www.stealth.ly
> > - - - - - - - - - - - - - - - - -
> >
> > On Fri, Mar 13, 2015 at 12:39 PM, Stevo Slavić 
> wrote:
> >
> > > OK, thanks for heads up.
> > >
> > > When reading the Apache Kafka docs, I expect what Apache Kafka "can"
> > > do to already be available in the latest general availability
> > > release, not what's planned as part of some other project.
> > >
> > > Kind regards,
> > > Stevo Slavic.
> > >
> > > On Fri, Mar 13, 2015 at 2:32 PM, Joe Stein 
> wrote:
> > >
> > > > Hey Stevo, "can be elastically and transparently expanded without
> > > > downtime" is the goal of Kafka on Mesos
> > > > https://github.com/mesos/kafka, as Kafka has the ability
> > > > (knobs/levers) to do this but has to be made to do it out of the
> > > > box.
> > > >
> > > > e.g. in Kafka on Mesos when a broker fails, after the configurable
> max
> > > fail
> > > > over timeout (meaning it is truly deemed hard failure) then a broker
> > > (with
> > > > the same id) will automatically be started on a another machine, data
> > > > replicated and back in action once that is done, automatically. Lots
> > more
> > > > features already in there... we are also in progress to auto balance
> > > > partitions when increasing/decreasing the size of the cluster and
> some
> > > more
> > > > goodies too.
> > > >
> > > > ~ Joe Stein
> > > > - - - - - - - - - - - - - - - - -
> > > >
> > > >   http://www.stealth.ly
> > > > - - - - - - - - - - - - - - - - -
> > > >
> > > > On Fri, Mar 13, 2015 at 8:43 AM, Stevo Slavić 
> > wrote:
> > > >
> > > > > Hello Apache Kafka community,
> > > > >
> > > > > On Apache Kafka website home page http://kafka.apache.org/ it is
> > > stated
> > > > > that Kafka "can be elastically and transparently expanded without
> > > > > downtime."
> > > > > Is that really true? More specifically, can one just add one more
> > > > > broker, have another partition added for the topic, have the new
> > > > > broker assigned to be the leader for the new partition, have
> > > > > producers correctly write to the new partition, and consumers read
> > > > > from it, with no broker, consumer, or producer downtime, no data
> > > > > loss, and no manual action to move data from existing partitions to
> > > > > the new partition?
> > > > >
> > > > > Kind regards,
> > > > > Stevo Slavic.
> > > > >
> > > >
> > >
> >
>
>
>
> --
> SunilKalva
>


Re: Kafka elastic no downtime scalability

2015-03-13 Thread Joe Stein
Well, I know it is semantic, but right now it "can" be elastically scaled
without downtime; you have to integrate it into your environment, and in
that sense it has been that way since 0.8.0, imho.

My point was just another way to do that out of the box... folks do this
elastic scaling today with AWS CloudFormation and internal systems they
built too.

So, it can be done... you just have to do it.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, Mar 13, 2015 at 12:39 PM, Stevo Slavić  wrote:

> OK, thanks for the heads up.
>
> When reading the Apache Kafka docs, whatever Apache Kafka "can" do I
> expect to already be available in the latest general availability release,
> not in what's planned as part of some other project.
>
> Kind regards,
> Stevo Slavic.
>
> On Fri, Mar 13, 2015 at 2:32 PM, Joe Stein  wrote:
>
> > Hey Stevo, "can be elastically and transparently expanded without
> > downtime." is the goal of Kafka on Mesos https://github.com/mesos/kafka,
> > as Kafka has the ability (knobs/levers) to do this but has to be made to
> > do this out of the box.
> >
> > e.g. in Kafka on Mesos, when a broker fails, after the configurable max
> > failover timeout (meaning it is truly deemed a hard failure), a broker
> > (with the same id) will automatically be started on another machine, the
> > data replicated, and everything back in action once that is done,
> > automatically. Lots more features already in there... we are also working
> > on auto-balancing partitions when increasing/decreasing the size of the
> > cluster, and some more goodies too.
> >
> > ~ Joe Stein
> > - - - - - - - - - - - - - - - - -
> >
> >   http://www.stealth.ly
> > - - - - - - - - - - - - - - - - -
> >
> > On Fri, Mar 13, 2015 at 8:43 AM, Stevo Slavić  wrote:
> >
> > > Hello Apache Kafka community,
> > >
> > > On Apache Kafka website home page http://kafka.apache.org/ it is
> stated
> > > that Kafka "can be elastically and transparently expanded without
> > > downtime."
> > > Is that really true? More specifically, can one just add one more
> > > broker, have another partition added for the topic, have the new broker
> > > assigned to be the leader for the new partition, have producers
> > > correctly write to the new partition, and consumers read from it, with
> > > no broker, consumer, or producer downtime, no data loss, and no manual
> > > action to move data from existing partitions to the new partition?
> > >
> > > Kind regards,
> > > Stevo Slavic.
> > >
> >
>


Re: Kafka elastic no downtime scalability

2015-03-13 Thread Joe Stein
Hey Stevo, "can be elastically and transparently expanded without downtime." is
the goal of Kafka on Mesos https://github.com/mesos/kafka as Kafka as the
ability (knobs/levers) to-do this but has to be made to-do this out of the
box.

e.g. in Kafka on Mesos when a broker fails, after the configurable max fail
over timeout (meaning it is truly deemed hard failure) then a broker (with
the same id) will automatically be started on a another machine, data
replicated and back in action once that is done, automatically. Lots more
features already in there... we are also in progress to auto balance
partitions when increasing/decreasing the size of the cluster and some more
goodies too.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, Mar 13, 2015 at 8:43 AM, Stevo Slavić  wrote:

> Hello Apache Kafka community,
>
> On Apache Kafka website home page http://kafka.apache.org/ it is stated
> that Kafka "can be elastically and transparently expanded without
> downtime."
> Is that really true? More specifically, can one just add one more broker,
> have another partition added for the topic, have the new broker assigned to
> be the leader for the new partition, have producers correctly write to the
> new partition, and consumers read from it, with no broker, consumer, or
> producer downtime, no data loss, and no manual action to move data from
> existing partitions to the new partition?
>
> Kind regards,
> Stevo Slavic.
>


Re: [kafka-clients] Re: [VOTE] 0.8.2.1 Candidate 2

2015-03-10 Thread Joe Stein
Thanks, Jun, for getting this release out the door, and thanks to everyone
that contributed to the work in 0.8.2.1. Awesome!

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Mon, Mar 9, 2015 at 2:12 PM, Jun Rao  wrote:

> The following are the results of the votes.
>
> +1 binding = 3 votes
> +1 non-binding = 2 votes
> -1 = 0 votes
> 0 = 0 votes
>
> The vote passes.
>
> I will release artifacts to maven central, update the dist svn and download
> site. Will send out an announce after that.
>
> Thanks everyone that contributed to the work in 0.8.2.1!
>
> Jun
>
> On Thu, Feb 26, 2015 at 2:59 PM, Jun Rao  wrote:
>
>> This is the second candidate for release of Apache Kafka 0.8.2.1. This
>> fixes 4 critical issues in 0.8.2.0.
>>
>> Release Notes for the 0.8.2.1 release
>>
>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/RELEASE_NOTES.html
>>
>> *** Please download, test and vote by Monday, Mar 2, 3pm PT
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS in addition to the md5, sha1
>> and sha2 (SHA256) checksum.
>>
>> * Release artifacts to be voted upon (source and binary):
>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/
>>
>> * Maven artifacts to be voted upon prior to release:
>> https://repository.apache.org/content/groups/staging/
>>
>> * scala-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/scaladoc/
>>
>> * java-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/javadoc/
>>
>> * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.1 tag
>>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=bd1bfb63ec73c10d08432ac893a23f28281ea021
>> (git commit ee1267b127f3081db491fa1bf9a287084c324e36)
>>
>> /***
>>
>> Thanks,
>>
>> Jun
>>
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at http://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAFc58G9w_bhrPq1wqEJe-R7-_DecvMNCTuOuLtrA%3DUzXXs%2Bt%3Dg%40mail.gmail.com
> <https://groups.google.com/d/msgid/kafka-clients/CAFc58G9w_bhrPq1wqEJe-R7-_DecvMNCTuOuLtrA%3DUzXXs%2Bt%3Dg%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>


Re: framework to load streamed data from kafka into relational database

2015-03-10 Thread Joe Stein
If you wanted to use Go you could code a simple worker strategy
https://github.com/stealthly/go_kafka_client/blob/master/consumers/consumers.go#L162-L170
 with https://godoc.org/golang.org/x/tools/oracle to do it and have an
at-least-once insert/update guarantee.

Not sure if you have language requirements or were looking for something
higher level like Sqoop or NiFi (neither of which, I think, supports this
yet) or something.
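
In Java, for comparison, the same at-least-once shape looks roughly like the
sketch below with the 0.8 high-level consumer: insert the row first, commit
the offset second, so a crash in between yields a duplicate rather than a
loss (use an idempotent insert or a unique key to absorb duplicates). The
JDBC URL, table, and topic names are made up for illustration.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Collections;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class DbLoader {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");
            props.put("group.id", "db-loader");
            props.put("auto.commit.enable", "false"); // we commit by hand
            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            KafkaStream<byte[], byte[]> stream = connector
                    .createMessageStreams(Collections.singletonMap("events", 1))
                    .get("events").get(0);
            Connection db = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass");
            PreparedStatement insert =
                    db.prepareStatement("INSERT INTO events (payload) VALUES (?)");
            for (MessageAndMetadata<byte[], byte[]> msg : stream) {
                insert.setBytes(1, msg.message());
                insert.executeUpdate();
                connector.commitOffsets(); // only after the row is durable
            }
        }
    }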

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Tue, Mar 10, 2015 at 3:14 AM, Vadim Keylis  wrote:

> Good evening. Can someone suggest an existing framework that allows one to
> reliably load data from kafka into a relational database like Oracle in
> real time?
>
> Thanks so much in advance,
> Vadim
>


Re: New Errors in 0.8.2 Protocol

2015-03-04 Thread Joe Stein
Hey Evan, moving forward (so 0.8.3.0 and beyond) the release documentation
is going to match up more with specific KIP changes
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
which elaborate on things like "breaking changes", "major modifications
you should adopt before breaking changes happen", etc. This will be
helpful not only for knowing what is changing but also as a place for
discussion before those changes happen, in a better forum for everyone.

One of the issues we have now (agreed) is that the flattened wire protocol
document doesn't provide the information everyone needs fluidly enough.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Wed, Mar 4, 2015 at 12:44 PM, Evan Huus  wrote:

> Hey all, it seems that 0.8.2 has added a handful more errors to the
> protocol which are not yet reflected on the wiki page [1]. Specifically,
> [2] seems to indicate that codes 17-20 now have associated meanings.
>
> My questions are:
> - Which of these are exposed "publicly"? (for example, the existing error
> 13 is only ever internal to the brokers, so it is irrelevant for third-party
> clients to know about; are any of the new ones like that?)
> - When (if ever) is InvalidTopicException returned, and what is it for,
> such that UnknownTopicOrPartition couldn't be used?
>
> I would also note that this is the *second* issue I've come across where
> the protocol specification is not up-to-date with the protocol actually
> used by the 0.8.2 brokers. That specification is what I, as developer of
> third-party language bindings, rely on in order to be compatible with Kafka
> proper. If you want a healthy community of third-party bindings and
> clients, you *have* to do a better job of keeping that documentation up to
> date, this is getting really frustrating. Fortunately none of these issues
> have caused data loss for us. Yet.
>
> Thanks,
> Evan
>
> [1]
>
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
> (currently down for maintenance apparently)
> [2]
>
> https://github.com/apache/kafka/blob/ee1267b127f3081db491fa1bf9a287084c324e36/clients/src/main/java/org/apache/kafka/common/protocol/Errors.java#L46-L49
>
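
(For readers finding this in the archive: in that revision of Errors.java,
codes 17-20 correspond, as far as I recall, to InvalidTopic,
RecordListTooLarge, NotEnoughReplicas, and NotEnoughReplicasAfterAppend; the
linked file is the authoritative list.)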


Re: [kafka-clients] Re: [VOTE] 0.8.2.1 Candidate 2

2015-03-03 Thread Joe Stein
Ok, let's fix the transient test failure on trunk; agreed, it is not a blocker.

+1 quick start passed, verified artifacts, updates in scala
https://github.com/stealthly/scala-kafka/tree/0.8.2.1 and go
https://github.com/stealthly/go_kafka_client/tree/0.8.2.1 look good

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Tue, Mar 3, 2015 at 12:30 PM, Jun Rao  wrote:

> Hi, Joe,
>
> Yes, that unit test does have transient failures from time to time. The
> issue seems to be with the unit test itself and not the actual code. So,
> this is not a blocker for 0.8.2.1 release. I think we can just fix it in
> trunk.
>
> Thanks,
>
> Jun
>
> On Tue, Mar 3, 2015 at 9:08 AM, Joe Stein  wrote:
>
>> Jun, I have most everything looking good, except I keep getting test
>> failures from wget
>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/kafka-0.8.2.1-src.tgz
>> && tar -xvf kafka-0.8.2.1-src.tgz && cd kafka-0.8.2.1-src && gradle &&
>> ./gradlew test
>>
>> kafka.api.ProducerFailureHandlingTest >
>> testNotEnoughReplicasAfterBrokerShutdown FAILED
>> org.scalatest.junit.JUnitTestFailedError: Expected
>> NotEnoughReplicasException when producing to topic with fewer brokers than
>> min.insync.replicas
>> at
>> org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101)
>> at
>> org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:149)
>> at org.scalatest.Assertions$class.fail(Assertions.scala:711)
>> at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:149)
>> at
>> kafka.api.ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown(ProducerFailureHandlingTest.scala:355)
>>
>> This happens to me all the time on a few different machines.
>>
>> ~ Joe Stein
>> - - - - - - - - - - - - - - - - -
>>
>>   http://www.stealth.ly
>> - - - - - - - - - - - - - - - - -
>>
>> On Mon, Mar 2, 2015 at 7:36 PM, Jun Rao  wrote:
>>
>>> +1 from me. Verified quickstart and unit tests.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Thu, Feb 26, 2015 at 2:59 PM, Jun Rao  wrote:
>>>
>>>> This is the second candidate for release of Apache Kafka 0.8.2.1. This
>>>> fixes 4 critical issues in 0.8.2.0.
>>>>
>>>> Release Notes for the 0.8.2.1 release
>>>>
>>>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/RELEASE_NOTES.html
>>>>
>>>> *** Please download, test and vote by Monday, Mar 2, 3pm PT
>>>>
>>>> Kafka's KEYS file containing PGP keys we use to sign the release:
>>>> http://kafka.apache.org/KEYS in addition to the md5, sha1
>>>> and sha2 (SHA256) checksum.
>>>>
>>>> * Release artifacts to be voted upon (source and binary):
>>>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/
>>>>
>>>> * Maven artifacts to be voted upon prior to release:
>>>> https://repository.apache.org/content/groups/staging/
>>>>
>>>> * scala-doc
>>>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/scaladoc/
>>>>
>>>> * java-doc
>>>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/javadoc/
>>>>
>>>> * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.1 tag
>>>>
>>>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=bd1bfb63ec73c10d08432ac893a23f28281ea021
>>>> (git commit ee1267b127f3081db491fa1bf9a287084c324e36)
>>>>
>>>> /***
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "kafka-clients" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to kafka-clients+unsubscr...@googlegroups.com.
>>> To post to this group, send email to kafka-clie...@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/kafka-clients.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/kafka-clients/CAFc58G_5FuLKx3_kM4PCgqQL8d%2B4sqE0o-%2BXfu3FJicAgn5KPw%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/kafka-clients/CAFc58G_5FuLKx3_kM4PCgqQL8d%2B4sqE0o-%2BXfu3FJicAgn5KPw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>


Re: [kafka-clients] Re: [VOTE] 0.8.2.1 Candidate 2

2015-03-03 Thread Joe Stein
Jun, I have most everything looking good, except I keep getting test failures
from wget
https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/kafka-0.8.2.1-src.tgz
&& tar -xvf kafka-0.8.2.1-src.tgz && cd kafka-0.8.2.1-src && gradle &&
./gradlew test

kafka.api.ProducerFailureHandlingTest >
testNotEnoughReplicasAfterBrokerShutdown FAILED
org.scalatest.junit.JUnitTestFailedError: Expected
NotEnoughReplicasException when producing to topic with fewer brokers than
min.insync.replicas
at
org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101)
at
org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:149)
at org.scalatest.Assertions$class.fail(Assertions.scala:711)
at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:149)
at
kafka.api.ProducerFailureHandlingTest.testNotEnoughReplicasAfterBrokerShutdown(ProducerFailureHandlingTest.scala:355)

This happens to me all the time on a few different machines.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Mon, Mar 2, 2015 at 7:36 PM, Jun Rao  wrote:

> +1 from me. Verified quickstart and unit tests.
>
> Thanks,
>
> Jun
>
> On Thu, Feb 26, 2015 at 2:59 PM, Jun Rao  wrote:
>
>> This is the second candidate for release of Apache Kafka 0.8.2.1. This
>> fixes 4 critical issues in 0.8.2.0.
>>
>> Release Notes for the 0.8.2.1 release
>>
>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/RELEASE_NOTES.html
>>
>> *** Please download, test and vote by Monday, Mar 2, 3pm PT
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS in addition to the md5, sha1
>> and sha2 (SHA256) checksum.
>>
>> * Release artifacts to be voted upon (source and binary):
>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/
>>
>> * Maven artifacts to be voted upon prior to release:
>> https://repository.apache.org/content/groups/staging/
>>
>> * scala-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/scaladoc/
>>
>> * java-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate2/javadoc/
>>
>> * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.1 tag
>>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=bd1bfb63ec73c10d08432ac893a23f28281ea021
>> (git commit ee1267b127f3081db491fa1bf9a287084c324e36)
>>
>> /***
>>
>> Thanks,
>>
>> Jun
>>
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at http://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAFc58G_5FuLKx3_kM4PCgqQL8d%2B4sqE0o-%2BXfu3FJicAgn5KPw%40mail.gmail.com
> <https://groups.google.com/d/msgid/kafka-clients/CAFc58G_5FuLKx3_kM4PCgqQL8d%2B4sqE0o-%2BXfu3FJicAgn5KPw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>


Re: can we send pdf, image files in kafka queue

2015-02-26 Thread Joe Stein
It can be done, sure. We built a prototype a while back
https://github.com/stealthly/f2k though I can't say I have bumped into a
use case where the tradeoffs worked out. Chunking the file across
messages and reconstructing it is going to be an overhead, or you're going
to block waiting to write big messages.
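
To make the chunking tradeoff concrete, here is a rough sketch with the 0.8.2
producer (topic name, chunk size, and the use of the file path as the key are
all illustrative; keying every chunk by file keeps the chunks on one
partition, so the consumer can reassemble them in order before writing to
HDFS):

    import java.io.FileInputStream;
    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class FileChunker {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            Producer<String, byte[]> producer = new KafkaProducer<String, byte[]>(
                    props, new StringSerializer(), new ByteArraySerializer());
            byte[] buf = new byte[64 * 1024]; // chunk size is the tradeoff knob
            FileInputStream in = new FileInputStream(args[0]);
            int n;
            while ((n = in.read(buf)) > 0) {
                // same key => same partition => chunks stay in order
                producer.send(new ProducerRecord<String, byte[]>(
                        "files", args[0], Arrays.copyOf(buf, n)));
            }
            in.close();
            producer.close();
        }
    }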

~ Joestein
On Feb 26, 2015 9:09 AM, "Udbhav Agarwal" 
wrote:

> Hi,
> Can we send pdf, image, etc. files in a kafka queue? Not the url containing
> the address of the pdf etc. files, but the actual pdf etc. files. I want to
> send pdf etc. files from a kafka producer to a kafka consumer, where I want
> to put the files in hdfs.
>
>
>
> Thanks,
> Udbhav Agarwal
>
>


Re: Anyone using log4j Appender for Kafka?

2015-02-24 Thread Joe Stein
Sounds like https://issues.apache.org/jira/browse/KAFKA-1788 maybe
On Feb 24, 2015 2:28 PM, "Scott Chapman"  wrote:

> Yea, however I don't get async behavior. When kafka is down the log blocks,
> which is kinda nasty to my app.
>
> On Tue Feb 24 2015 at 2:27:09 PM Joe Stein  wrote:
>
> > Producer type isn't needed anymore with the new producer, so in the
> > logger properties just leave that out in 0.8.2 and it should work.
> >
> > On Tue, Feb 24, 2015 at 2:24 PM, Joe Stein  wrote:
> >
> > > Interesting, looks like a breaking change from 0.8.1
> > > https://github.com/apache/kafka/blob/0.8.1/core/src/
> > main/scala/kafka/producer/KafkaLog4jAppender.scala
> > > to 0.8.2
> > > https://github.com/apache/kafka/blob/0.8.2/core/src/
> > main/scala/kafka/producer/KafkaLog4jAppender.scala
> > >
> > > On Tue, Feb 24, 2015 at 2:21 PM, Joe Stein 
> wrote:
> > >
> > >> and kafka too :)
> > >>
> > >> On Tue, Feb 24, 2015 at 2:21 PM, Joe Stein 
> > wrote:
> > >>
> > >>> are you including
> > >>>
> https://github.com/stealthly/scala-kafka/blob/master/build.gradle#L122
> > >>> in your project?
> > >>>
> > >>> ~ Joe Stein
> > >>> - - - - - - - - - - - - - - - - -
> > >>>
> > >>>   http://www.stealth.ly
> > >>> - - - - - - - - - - - - - - - - -
> > >>>
> > >>> On Tue, Feb 24, 2015 at 2:02 PM, Scott Chapman  >
> > >>> wrote:
> > >>>
> > >>>> Yea, when I try to set type to async (exactly like the example) I
> get:
> > >>>> log4j:WARN No such property [producerType] in
> > >>>> kafka.producer.KafkaLog4jAppender.
> > >>>>
> > >>>> On Tue Feb 24 2015 at 1:35:54 PM Joe Stein 
> > >>>> wrote:
> > >>>>
> > >>>> > Here is sample log4j.properties
> > >>>> > https://github.com/stealthly/scala-kafka/blob/master/src/tes
> > >>>> > t/resources/log4j.properties#L54-L67
> > >>>> >
> > >>>> > I _almost_ have always pulled the class
> > >>>> > https://github.com/apache/kafka/blob/0.8.2/core/src/main/
> > >>>> > scala/kafka/producer/KafkaLog4jAppender.scala
> > >>>> > internal
> > >>>> > to private repo and changed it as things came up... e.g.
> > setSource(),
> > >>>> > setTags() blah blah...
> > >>>> >
> > >>>> > Paul Otto has an open source version
> > >>>> > https://github.com/potto007/kafka-appender-layout that you could
> > try
> > >>>> out
> > >>>> > too that he built to tackle some of the layout things.
> > >>>> >
> > >>>> > ~ Joe Stein
> > >>>> > - - - - - - - - - - - - - - - - -
> > >>>> >
> > >>>> >   http://www.stealth.ly
> > >>>> > - - - - - - - - - - - - - - - - -
> > >>>> >
> > >>>> > On Mon, Feb 23, 2015 at 4:42 PM, Alex Melville <
> amelvi...@g.hmc.edu
> > >
> > >>>> > wrote:
> > >>>> >
> > >>>> > > ^^ I would really appreciate this as well. It's unclear how to
> get
> > >>>> log4j
> > >>>> > > working with Kafka when you have no prior experience with log4j.
> > >>>> > >
> > >>>> > > On Mon, Feb 23, 2015 at 4:39 AM, Scott Chapman <
> > >>>> sc...@woofplanet.com>
> > >>>> > > wrote:
> > >>>> > >
> > >>>> > > > Thanks. But we're using log4j. I tried setting the type to
> async
> > >>>> but it
> > >>>> > > > generated a warning of no such field. Is there any real
> > >>>> documentation
> > >>>> > on
> > >>>> > > > the log4j appender?
> > >>>> > > >
> > >>>> > > > On Mon Feb 23 2015 at 2:58:54 AM Steven Schlansker <
> > >>>> > > > sschlans...@opentable.com> wrote:
> > >>>> > > >
> > >>>> > > > > We just configure our logback.xml to have two Appenders, an
> > >>>> &

Re: Anyone using log4j Appender for Kafka?

2015-02-24 Thread Joe Stein
Interesting, looks like a breaking change from 0.8.1
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala
to 0.8.2
https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala

On Tue, Feb 24, 2015 at 2:21 PM, Joe Stein  wrote:

> and kafka too :)
>
> On Tue, Feb 24, 2015 at 2:21 PM, Joe Stein  wrote:
>
>> are you including
>> https://github.com/stealthly/scala-kafka/blob/master/build.gradle#L122
>> in your project?
>>
>> ~ Joe Stein
>> - - - - - - - - - - - - - - - - -
>>
>>   http://www.stealth.ly
>> - - - - - - - - - - - - - - - - -
>>
>> On Tue, Feb 24, 2015 at 2:02 PM, Scott Chapman 
>> wrote:
>>
>>> Yea, when I try to set type to async (exactly like the example) I get:
>>> log4j:WARN No such property [producerType] in
>>> kafka.producer.KafkaLog4jAppender.
>>>
>>> On Tue Feb 24 2015 at 1:35:54 PM Joe Stein  wrote:
>>>
>>> > Here is sample log4j.properties
>>> > https://github.com/stealthly/scala-kafka/blob/master/src/tes
>>> > t/resources/log4j.properties#L54-L67
>>> >
>>> > I _almost_ have always pulled the class
>>> > https://github.com/apache/kafka/blob/0.8.2/core/src/main/
>>> > scala/kafka/producer/KafkaLog4jAppender.scala
>>> > internal
>>> > to private repo and changed it as things came up... e.g. setSource(),
>>> > setTags() blah blah...
>>> >
>>> > Paul Otto has an open source version
>>> > https://github.com/potto007/kafka-appender-layout that you could try
>>> out
>>> > too that he built to tackle some of the layout things.
>>> >
>>> > ~ Joe Stein
>>> > - - - - - - - - - - - - - - - - -
>>> >
>>> >   http://www.stealth.ly
>>> > - - - - - - - - - - - - - - - - -
>>> >
>>> > On Mon, Feb 23, 2015 at 4:42 PM, Alex Melville 
>>> > wrote:
>>> >
>>> > > ^^ I would really appreciate this as well. It's unclear how to get
>>> log4j
>>> > > working with Kafka when you have no prior experience with log4j.
>>> > >
>>> > > On Mon, Feb 23, 2015 at 4:39 AM, Scott Chapman >> >
>>> > > wrote:
>>> > >
>>> > > > Thanks. But we're using log4j. I tried setting the type to async
>>> but it
>>> > > > generated a warning of no such field. Is there any real
>>> documentation
>>> > on
>>> > > > the log4j appender?
>>> > > >
>>> > > > On Mon Feb 23 2015 at 2:58:54 AM Steven Schlansker <
>>> > > > sschlans...@opentable.com> wrote:
>>> > > >
>>> > > > > We just configure our logback.xml to have two Appenders, an
>>> > > AsyncAppender
>>> > > > > -> KafkaAppender, and FileAppender (or ConsoleAppender as
>>> > appropriate).
>>> > > > >
>>> > > > > AsyncAppender removes more failure cases too, e.g. a health check
>>> > > hanging
>>> > > > > rather than returning rapidly could block your application.
>>> > > > >
>>> > > > > On Feb 22, 2015, at 11:26 PM, anthony musyoki <
>>> > > anthony.musy...@gmail.com
>>> > > > >
>>> > > > > wrote:
>>> > > > >
>>> > > > > > There's also another one here.
>>> > > > > >
>>> > > > > > https://github.com/danielwegener/logback-kafka-appender.
>>> > > > > >
>>> > > > > > It has a fallback appender which might address the issue of
>>> Kafka
>>> > > being
>>> > > > > > un-available.
>>> > > > > >
>>> > > > > >
>>> > > > > > On Mon, Feb 23, 2015 at 9:45 AM, Steven Schlansker <
>>> > > > > > sschlans...@opentable.com> wrote:
>>> > > > > >
>>> > > > > >> Here’s my attempt at a Logback version, should be fairly
>>> easily
>>> > > > ported:
>>> > > > > >>
>>> > > > > >> https://github.com/opentable/otj-logging/blob/master/kafka/
>>> > > > > src/main/java/com/opentable/logging/KafkaAppender.java
>>> > > > > >>
>>> > > > > >> On Feb 22, 2015, at 1:36 PM, Scott Chapman <
>>> sc...@woofplanet.com>
>>> > > > > wrote:
>>> > > > > >>
>>> > > > > >>> I am just starting to use it and could use a little
>>> guidance. I
>>> > was
>>> > > > > able
>>> > > > > >> to
>>> > > > > >>> get it working with 0.8.2 but am not clear on best practices
>>> for
>>> > > > using
>>> > > > > >> it.
>>> > > > > >>>
>>> > > > > >>> Anyone willing to help me out a bit? Got a few questions,
>>> like
>>> > how
>>> > > to
>>> > > > > >>> protect applications from when kafka is down or unreachable.
>>> > > > > >>>
>>> > > > > >>> It seems like a great idea for being able to get logs from
>>> > existing
>>> > > > > >>> applications to be collected by kafka.
>>> > > > > >>>
>>> > > > > >>> Thanks in advance!
>>> > > > > >>
>>> > > > > >>
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>


Re: Anyone using log4j Appender for Kafka?

2015-02-24 Thread Joe Stein
Producer type isn't needed anymore with the new producer, so in the
logger properties just leave that out in 0.8.2 and it should work.
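
For reference, a minimal log4j.properties along those lines (property names
as I recall them from the 0.8.2 appender; the broker address and topic are
illustrative, so check the KafkaLog4jAppender class for the exact setters):

    log4j.rootLogger=INFO, KAFKA
    log4j.appender.KAFKA=kafka.producer.KafkaLog4jAppender
    log4j.appender.KAFKA.brokerList=localhost:9092
    log4j.appender.KAFKA.topic=app-logs
    log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
    log4j.appender.KAFKA.layout.ConversionPattern=%d %p %c - %m%n
    # note: no producerType property here; that was the 0.8.1 appender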

On Tue, Feb 24, 2015 at 2:24 PM, Joe Stein  wrote:

> Interesting, looks like a breaking change from 0.8.1
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala
> to 0.8.2
> https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala
>
> On Tue, Feb 24, 2015 at 2:21 PM, Joe Stein  wrote:
>
>> and kafka too :)
>>
>> On Tue, Feb 24, 2015 at 2:21 PM, Joe Stein  wrote:
>>
>>> are you including
>>> https://github.com/stealthly/scala-kafka/blob/master/build.gradle#L122
>>> in your project?
>>>
>>> ~ Joe Stein
>>> - - - - - - - - - - - - - - - - -
>>>
>>>   http://www.stealth.ly
>>> - - - - - - - - - - - - - - - - -
>>>
>>> On Tue, Feb 24, 2015 at 2:02 PM, Scott Chapman 
>>> wrote:
>>>
>>>> Yea, when I try to set type to async (exactly like the example) I get:
>>>> log4j:WARN No such property [producerType] in
>>>> kafka.producer.KafkaLog4jAppender.
>>>>
>>>> On Tue Feb 24 2015 at 1:35:54 PM Joe Stein 
>>>> wrote:
>>>>
>>>> > Here is sample log4j.properties
>>>> > https://github.com/stealthly/scala-kafka/blob/master/src/tes
>>>> > t/resources/log4j.properties#L54-L67
>>>> >
>>>> > I _almost_ have always pulled the class
>>>> > https://github.com/apache/kafka/blob/0.8.2/core/src/main/
>>>> > scala/kafka/producer/KafkaLog4jAppender.scala
>>>> > internal
>>>> > to private repo and changed it as things came up... e.g. setSource(),
>>>> > setTags() blah blah...
>>>> >
>>>> > Paul Otto has an open source version
>>>> > https://github.com/potto007/kafka-appender-layout that you could try
>>>> out
>>>> > too that he built to tackle some of the layout things.
>>>> >
>>>> > ~ Joe Stein
>>>> > - - - - - - - - - - - - - - - - -
>>>> >
>>>> >   http://www.stealth.ly
>>>> > - - - - - - - - - - - - - - - - -
>>>> >
>>>> > On Mon, Feb 23, 2015 at 4:42 PM, Alex Melville 
>>>> > wrote:
>>>> >
>>>> > > ^^ I would really appreciate this as well. It's unclear how to get
>>>> log4j
>>>> > > working with Kafka when you have no prior experience with log4j.
>>>> > >
>>>> > > On Mon, Feb 23, 2015 at 4:39 AM, Scott Chapman <
>>>> sc...@woofplanet.com>
>>>> > > wrote:
>>>> > >
>>>> > > > Thanks. But we're using log4j. I tried setting the type to async
>>>> but it
>>>> > > > generated a warning of no such field. Is there any real
>>>> documentation
>>>> > on
>>>> > > > the log4j appender?
>>>> > > >
>>>> > > > On Mon Feb 23 2015 at 2:58:54 AM Steven Schlansker <
>>>> > > > sschlans...@opentable.com> wrote:
>>>> > > >
>>>> > > > > We just configure our logback.xml to have two Appenders, an
>>>> > > AsyncAppender
>>>> > > > > -> KafkaAppender, and FileAppender (or ConsoleAppender as
>>>> > appropriate).
>>>> > > > >
>>>> > > > > AsyncAppender removes more failure cases too, e.g. a health
>>>> check
>>>> > > hanging
>>>> > > > > rather than returning rapidly could block your application.
>>>> > > > >
>>>> > > > > On Feb 22, 2015, at 11:26 PM, anthony musyoki <
>>>> > > anthony.musy...@gmail.com
>>>> > > > >
>>>> > > > > wrote:
>>>> > > > >
>>>> > > > > > There's also another one here.
>>>> > > > > >
>>>> > > > > > https://github.com/danielwegener/logback-kafka-appender.
>>>> > > > > >
>>>> > > > > > It has a fallback appender which might address the issue of
>>>> Kafka
>>>> > > being
>>>> > > > > > un-available.
>>>> > > > > >
>>>> > > > > >
>>>> > > > > > On Mon, Feb 23, 2015 at 9:45 AM, Steven Schlansker <
>>>> > > > > > sschlans...@opentable.com> wrote:
>>>> > > > > >
>>>> > > > > >> Here’s my attempt at a Logback version, should be fairly
>>>> easily
>>>> > > > ported:
>>>> > > > > >>
>>>> > > > > >> https://github.com/opentable/otj-logging/blob/master/kafka/
>>>> > > > > src/main/java/com/opentable/logging/KafkaAppender.java
>>>> > > > > >>
>>>> > > > > >> On Feb 22, 2015, at 1:36 PM, Scott Chapman <
>>>> sc...@woofplanet.com>
>>>> > > > > wrote:
>>>> > > > > >>
>>>> > > > > >>> I am just starting to use it and could use a little
>>>> guidance. I
>>>> > was
>>>> > > > > able
>>>> > > > > >> to
>>>> > > > > >>> get it working with 0.8.2 but am not clear on best
>>>> practices for
>>>> > > > using
>>>> > > > > >> it.
>>>> > > > > >>>
>>>> > > > > >>> Anyone willing to help me out a bit? Got a few questions,
>>>> like
>>>> > how
>>>> > > to
>>>> > > > > >>> protect applications from when kafka is down or unreachable.
>>>> > > > > >>>
>>>> > > > > >>> It seems like a great idea for being able to get logs from
>>>> > existing
>>>> > > > > >>> applications to be collected by kafka.
>>>> > > > > >>>
>>>> > > > > >>> Thanks in advance!
>>>> > > > > >>
>>>> > > > > >>
>>>> > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>


Re: Anyone using log4j Appender for Kafka?

2015-02-24 Thread Joe Stein
are you including
https://github.com/stealthly/scala-kafka/blob/master/build.gradle#L122 in
your project?

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Tue, Feb 24, 2015 at 2:02 PM, Scott Chapman  wrote:

> Yea, when I try to set type to async (exactly like the example) I get:
> log4j:WARN No such property [producerType] in
> kafka.producer.KafkaLog4jAppender.
>
> On Tue Feb 24 2015 at 1:35:54 PM Joe Stein  wrote:
>
> > Here is sample log4j.properties
> > https://github.com/stealthly/scala-kafka/blob/master/src/tes
> > t/resources/log4j.properties#L54-L67
> >
> > I _almost_ have always pulled the class
> > https://github.com/apache/kafka/blob/0.8.2/core/src/main/
> > scala/kafka/producer/KafkaLog4jAppender.scala
> > internal
> > to private repo and changed it as things came up... e.g. setSource(),
> > setTags() blah blah...
> >
> > Paul Otto has an open source version
> > https://github.com/potto007/kafka-appender-layout that you could try out
> > too that he built to tackle some of the layout things.
> >
> > ~ Joe Stein
> > - - - - - - - - - - - - - - - - -
> >
> >   http://www.stealth.ly
> > - - - - - - - - - - - - - - - - -
> >
> > On Mon, Feb 23, 2015 at 4:42 PM, Alex Melville 
> > wrote:
> >
> > > ^^ I would really appreciate this as well. It's unclear how to get
> log4j
> > > working with Kafka when you have no prior experience with log4j.
> > >
> > > On Mon, Feb 23, 2015 at 4:39 AM, Scott Chapman 
> > > wrote:
> > >
> > > > Thanks. But we're using log4j. I tried setting the type to async but
> it
> > > > generated a warning of no such field. Is there any real documentation
> > on
> > > > the log4j appender?
> > > >
> > > > On Mon Feb 23 2015 at 2:58:54 AM Steven Schlansker <
> > > > sschlans...@opentable.com> wrote:
> > > >
> > > > > We just configure our logback.xml to have two Appenders, an
> > > AsyncAppender
> > > > > -> KafkaAppender, and FileAppender (or ConsoleAppender as
> > appropriate).
> > > > >
> > > > > AsyncAppender removes more failure cases too, e.g. a health check
> > > hanging
> > > > > rather than returning rapidly could block your application.
> > > > >
> > > > > On Feb 22, 2015, at 11:26 PM, anthony musyoki <
> > > anthony.musy...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > There's also another one here.
> > > > > >
> > > > > > https://github.com/danielwegener/logback-kafka-appender.
> > > > > >
> > > > > > It has a fallback appender which might address the issue of Kafka
> > > being
> > > > > > un-available.
> > > > > >
> > > > > >
> > > > > > On Mon, Feb 23, 2015 at 9:45 AM, Steven Schlansker <
> > > > > > sschlans...@opentable.com> wrote:
> > > > > >
> > > > > >> Here’s my attempt at a Logback version, should be fairly easily
> > > > ported:
> > > > > >>
> > > > > >> https://github.com/opentable/otj-logging/blob/master/kafka/
> > > > > src/main/java/com/opentable/logging/KafkaAppender.java
> > > > > >>
> > > > > >> On Feb 22, 2015, at 1:36 PM, Scott Chapman <
> sc...@woofplanet.com>
> > > > > wrote:
> > > > > >>
> > > > > >>> I am just starting to use it and could use a little guidance. I
> > was
> > > > > able
> > > > > >> to
> > > > > >>> get it working with 0.8.2 but am not clear on best practices
> for
> > > > using
> > > > > >> it.
> > > > > >>>
> > > > > >>> Anyone willing to help me out a bit? Got a few questions, like
> > how
> > > to
> > > > > >>> protect applications from when kafka is down or unreachable.
> > > > > >>>
> > > > > >>> It seems like a great idea for being able to get logs from
> > existing
> > > > > >>> applications to be collected by kafka.
> > > > > >>>
> > > > > >>> Thanks in advance!
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: Anyone using log4j Appender for Kafka?

2015-02-24 Thread Joe Stein
and kafka too :)

On Tue, Feb 24, 2015 at 2:21 PM, Joe Stein  wrote:

> are you including
> https://github.com/stealthly/scala-kafka/blob/master/build.gradle#L122 in
> your project?
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - -
>
>   http://www.stealth.ly
> - - - - - - - - - - - - - - - - -
>
> On Tue, Feb 24, 2015 at 2:02 PM, Scott Chapman 
> wrote:
>
>> Yea, when I try to set type to async (exactly like the example) I get:
>> log4j:WARN No such property [producerType] in
>> kafka.producer.KafkaLog4jAppender.
>>
>> On Tue Feb 24 2015 at 1:35:54 PM Joe Stein  wrote:
>>
>> > Here is sample log4j.properties
>> > https://github.com/stealthly/scala-kafka/blob/master/src/tes
>> > t/resources/log4j.properties#L54-L67
>> >
>> > I _almost_ have always pulled the class
>> > https://github.com/apache/kafka/blob/0.8.2/core/src/main/
>> > scala/kafka/producer/KafkaLog4jAppender.scala
>> > internal
>> > to private repo and changed it as things came up... e.g. setSource(),
>> > setTags() blah blah...
>> >
>> > Paul Otto has an open source version
>> > https://github.com/potto007/kafka-appender-layout that you could try
>> out
>> > too that he built to tackle some of the layout things.
>> >
>> > ~ Joe Stein
>> > - - - - - - - - - - - - - - - - -
>> >
>> >   http://www.stealth.ly
>> > - - - - - - - - - - - - - - - - -
>> >
>> > On Mon, Feb 23, 2015 at 4:42 PM, Alex Melville 
>> > wrote:
>> >
>> > > ^^ I would really appreciate this as well. It's unclear how to get
>> log4j
>> > > working with Kafka when you have no prior experience with log4j.
>> > >
>> > > On Mon, Feb 23, 2015 at 4:39 AM, Scott Chapman 
>> > > wrote:
>> > >
>> > > > Thanks. But we're using log4j. I tried setting the type to async
>> but it
>> > > > generated a warning of no such field. Is there any real
>> documentation
>> > on
>> > > > the log4j appender?
>> > > >
>> > > > On Mon Feb 23 2015 at 2:58:54 AM Steven Schlansker <
>> > > > sschlans...@opentable.com> wrote:
>> > > >
>> > > > > We just configure our logback.xml to have two Appenders, an
>> > > AsyncAppender
>> > > > > -> KafkaAppender, and FileAppender (or ConsoleAppender as
>> > appropriate).
>> > > > >
>> > > > > AsyncAppender removes more failure cases too, e.g. a health check
>> > > hanging
>> > > > > rather than returning rapidly could block your application.
>> > > > >
>> > > > > On Feb 22, 2015, at 11:26 PM, anthony musyoki <
>> > > anthony.musy...@gmail.com
>> > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > There's also another one here.
>> > > > > >
>> > > > > > https://github.com/danielwegener/logback-kafka-appender.
>> > > > > >
>> > > > > > It has a fallback appender which might address the issue of
>> Kafka
>> > > being
>> > > > > > un-available.
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Feb 23, 2015 at 9:45 AM, Steven Schlansker <
>> > > > > > sschlans...@opentable.com> wrote:
>> > > > > >
>> > > > > >> Here’s my attempt at a Logback version, should be fairly easily
>> > > > ported:
>> > > > > >>
>> > > > > >> https://github.com/opentable/otj-logging/blob/master/kafka/
>> > > > > src/main/java/com/opentable/logging/KafkaAppender.java
>> > > > > >>
>> > > > > >> On Feb 22, 2015, at 1:36 PM, Scott Chapman <
>> sc...@woofplanet.com>
>> > > > > wrote:
>> > > > > >>
>> > > > > >>> I am just starting to use it and could use a little guidance.
>> I
>> > was
>> > > > > able
>> > > > > >> to
>> > > > > >>> get it working with 0.8.2 but am not clear on best practices
>> for
>> > > > using
>> > > > > >> it.
>> > > > > >>>
>> > > > > >>> Anyone willing to help me out a bit? Got a few questions, like
>> > how
>> > > to
>> > > > > >>> protect applications from when kafka is down or unreachable.
>> > > > > >>>
>> > > > > >>> It seems like a great idea for being able to get logs from
>> > existing
>> > > > > >>> applications to be collected by kafka.
>> > > > > >>>
>> > > > > >>> Thanks in advance!
>> > > > > >>
>> > > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


Re: Anyone using log4j Appender for Kafka?

2015-02-24 Thread Joe Stein
Here is a sample log4j.properties
https://github.com/stealthly/scala-kafka/blob/master/src/test/resources/log4j.properties#L54-L67

I have almost always pulled the class
https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala
into a private repo and changed it as things came up... e.g. setSource(),
setTags(), blah blah...

Paul Otto has an open source version,
https://github.com/potto007/kafka-appender-layout, that you could try out
too; he built it to tackle some of the layout things.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Mon, Feb 23, 2015 at 4:42 PM, Alex Melville  wrote:

> ^^ I would really appreciate this as well. It's unclear how to get log4j
> working with Kafka when you have no prior experience with log4j.
>
> On Mon, Feb 23, 2015 at 4:39 AM, Scott Chapman 
> wrote:
>
> > Thanks. But we're using log4j. I tried setting the type to async but it
> > generated a warning of no such field. Is there any real documentation on
> > the log4j appender?
> >
> > On Mon Feb 23 2015 at 2:58:54 AM Steven Schlansker <
> > sschlans...@opentable.com> wrote:
> >
> > > We just configure our logback.xml to have two Appenders, an
> AsyncAppender
> > > -> KafkaAppender, and FileAppender (or ConsoleAppender as appropriate).
> > >
> > > AsyncAppender removes more failure cases too, e.g. a health check
> hanging
> > > rather than returning rapidly could block your application.
> > >
> > > On Feb 22, 2015, at 11:26 PM, anthony musyoki <
> anthony.musy...@gmail.com
> > >
> > > wrote:
> > >
> > > > There's also another one here.
> > > >
> > > > https://github.com/danielwegener/logback-kafka-appender.
> > > >
> > > > It has a fallback appender which might address the issue of Kafka
> being
> > > > un-available.
> > > >
> > > >
> > > > On Mon, Feb 23, 2015 at 9:45 AM, Steven Schlansker <
> > > > sschlans...@opentable.com> wrote:
> > > >
> > > >> Here’s my attempt at a Logback version, should be fairly easily
> > ported:
> > > >>
> > > >> https://github.com/opentable/otj-logging/blob/master/kafka/
> > > src/main/java/com/opentable/logging/KafkaAppender.java
> > > >>
> > > >> On Feb 22, 2015, at 1:36 PM, Scott Chapman 
> > > wrote:
> > > >>
> > > >>> I am just starting to use it and could use a little guidance. I was
> > > able
> > > >> to
> > > >>> get it working with 0.8.2 but am not clear on best practices for
> > using
> > > >> it.
> > > >>>
> > > >>> Anyone willing to help me out a bit? Got a few questions, like how
> to
> > > >>> protect applications from when kafka is down or unreachable.
> > > >>>
> > > >>> It seems like a great idea for being able to get logs from existing
> > > >>> applications to be collected by kafka.
> > > >>>
> > > >>> Thanks in advance!
> > > >>
> > > >>
> > >
> > >
> >
>


Re: [kafka-clients] Re: [VOTE] 0.8.2.1 Candidate 1

2015-02-22 Thread Joe Stein
Jun,

Can we also add https://issues.apache.org/jira/browse/KAFKA-1724 to the
next RC please?

Thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sun, Feb 22, 2015 at 11:59 AM, Jun Rao  wrote:

> We identified at least one more blocker issue KAFKA-1971 during testing.
> So, we will have to roll another RC for 0.8.2.1.
>
> Thanks,
>
> Jun
>
> On Sat, Feb 21, 2015 at 6:04 PM, Joe Stein  wrote:
>
>> Source verified, tests pass, quick start ok.
>>
>> Binaries verified, tests on scala
>> https://github.com/stealthly/scala-kafka/pull/27 and go clients
>> https://github.com/stealthly/go_kafka_client/pull/55 passing.
>>
>> If the release passes we should update the release notes to include the
>> change from KAFKA-1729 please.
>>
>> +1 (binding)
>>
>> ~ Joe Stein
>>
>> On Fri, Feb 20, 2015 at 9:08 PM, ted won  wrote:
>>
>>> +1
>>>
>>> On Friday, February 20, 2015, Guozhang Wang  wrote:
>>>
>>> > +1 binding.
>>> >
>>> > Checked the md5, and quick start.
>>> >
>>> > Some minor comments:
>>> >
>>> > 1. The quickstart section would do better to include the building step
>>> > after download and before starting the server.
>>> >
>>> > 2. There seems to be a bug in Gradle 1.1x with Java 8 causing the
>>> "gradle"
>>> > initialization to fail:
>>> >
>>> > -
>>> >
>>> > FAILURE: Build failed with an exception.
>>> >
>>> > * Where:
>>> > Build file '/home/guwang/Workspace/temp/kafka/build.gradle' line: 199
>>> >
>>> > * What went wrong:
>>> > A problem occurred evaluating root project 'kafka'.
>>> > > Could not create task of type 'ScalaDoc'.
>>> > --
>>> >
>>> > Downgrading Java to 1.7 resolves this issue.
>>> >
>>> > Guozhang
>>> >
>>> > On Wed, Feb 18, 2015 at 7:56 PM, Connie Yang >> > > wrote:
>>> >
>>> > > +1
>>> > > On Feb 18, 2015 7:23 PM, "Matt Narrell" >> > > wrote:
>>> > >
>>> > > > +1
>>> > > >
>>> > > > > On Feb 18, 2015, at 7:56 PM, Jun Rao >> > > wrote:
>>> > > > >
>>> > > > > This is the first candidate for release of Apache Kafka 0.8.2.1.
>>> This
>>> > > > > only fixes one critical issue (KAFKA-1952) in 0.8.2.0.
>>> > > > >
>>> > > > > Release Notes for the 0.8.2.1 release
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/RELEASE_NOTES.html
>>> > > > >
>>> > > > > *** Please download, test and vote by Saturday, Feb 21, 7pm PT
>>> > > > >
>>> > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
>>> > > > > http://kafka.apache.org/KEYS in addition to the md5, sha1
>>> > > > > and sha2 (SHA256) checksum.
>>> > > > >
>>> > > > > * Release artifacts to be voted upon (source and binary):
>>> > > > > https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/
>>> > > > >
>>> > > > > * Maven artifacts to be voted upon prior to release:
>>> > > > > https://repository.apache.org/content/groups/staging/
>>> > > > >
>>> > > > > * scala-doc
>>> > > > >
>>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/scaladoc/
>>> > > > >
>>> > > > > * java-doc
>>> > > > >
>>> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/javadoc/
>>> > > > >
>>> > > > > * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.1
>>> tag
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=c1b4c58531343dce80232e0122d085fc687633f6
>>> > > > >
>>> > > > > /***
>>> > > > >
>>> > > > > Thanks,
>>> > > > >
>>> > > > > Jun
>>> > > >
>>> > > >
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > -- Guozhang
>>> >
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "kafka-clients" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to kafka-clients+unsubscr...@googlegroups.com.
>> To post to this group, send email to kafka-clie...@googlegroups.com.
>> Visit this group at http://groups.google.com/group/kafka-clients.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCDvUNQx2B351P3LaOYAejoxR9M_PbzfmWo5-ssgEJ_%2Bpw%40mail.gmail.com
>> <https://groups.google.com/d/msgid/kafka-clients/CAA7ooCDvUNQx2B351P3LaOYAejoxR9M_PbzfmWo5-ssgEJ_%2Bpw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>


Re: [VOTE] 0.8.2.1 Candidate 1

2015-02-21 Thread Joe Stein
Source verified, tests pass, quick start ok.

Binaries verified, tests on scala
https://github.com/stealthly/scala-kafka/pull/27 and go clients
https://github.com/stealthly/go_kafka_client/pull/55 passing.

If the release passes, we should update the release notes to include the
change from KAFKA-1729, please.

+1 (binding)

~ Joe Stein

On Fri, Feb 20, 2015 at 9:08 PM, ted won  wrote:

> +1
>
> On Friday, February 20, 2015, Guozhang Wang  wrote:
>
> > +1 binding.
> >
> > Checked the md5, and quick start.
> >
> > Some minor comments:
> >
> > 1. The quickstart section would do better to include the building step
> > after download and before starting the server.
> >
> > 2. There seems to be a bug in Gradle 1.1x with Java 8 causing the
> "gradle"
> > initialization to fail:
> >
> > -
> >
> > FAILURE: Build failed with an exception.
> >
> > * Where:
> > Build file '/home/guwang/Workspace/temp/kafka/build.gradle' line: 199
> >
> > * What went wrong:
> > A problem occurred evaluating root project 'kafka'.
> > > Could not create task of type 'ScalaDoc'.
> > --
> >
> > Downgrading Java to 1.7 resolves this issue.
> >
> > Guozhang
> >
> > On Wed, Feb 18, 2015 at 7:56 PM, Connie Yang  > > wrote:
> >
> > > +1
> > > On Feb 18, 2015 7:23 PM, "Matt Narrell"  > > wrote:
> > >
> > > > +1
> > > >
> > > > > On Feb 18, 2015, at 7:56 PM, Jun Rao  > > wrote:
> > > > >
> > > > > This is the first candidate for release of Apache Kafka 0.8.2.1.
> This
> > > > > only fixes one critical issue (KAFKA-1952) in 0.8.2.0.
> > > > >
> > > > > Release Notes for the 0.8.2.1 release
> > > > >
> > > >
> > >
> >
> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/RELEASE_NOTES.html
> > > > >
> > > > > *** Please download, test and vote by Saturday, Feb 21, 7pm PT
> > > > >
> > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > > http://kafka.apache.org/KEYS in addition to the md5, sha1
> > > > > and sha2 (SHA256) checksum.
> > > > >
> > > > > * Release artifacts to be voted upon (source and binary):
> > > > > https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/
> > > > >
> > > > > * Maven artifacts to be voted upon prior to release:
> > > > > https://repository.apache.org/content/groups/staging/
> > > > >
> > > > > * scala-doc
> > > > >
> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/scaladoc/
> > > > >
> > > > > * java-doc
> > > > >
> https://people.apache.org/~junrao/kafka-0.8.2.1-candidate1/javadoc/
> > > > >
> > > > > * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.1
> tag
> > > > >
> > > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=c1b4c58531343dce80232e0122d085fc687633f6
> > > > >
> > > > > /***
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: Kafka on Mesos?

2015-02-21 Thread Joe Stein
We pushed the first bits of the kafka framework we are working on to
https://github.com/mesos/kafka.

We have been using Aurora and Marathon to run Kafka on Mesos but are
cutting over to a framework approach, as described in the ticket, over the
next 5 weeks.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sat, Feb 21, 2015 at 6:31 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi Jeff,
>
> I see a patchless JIRA issue via
> http://search-hadoop.com/?q=%2Bmesos+%2Bkafka
>
> And if Kafka on YARN is of interest, there is KOYA:
> http://search-hadoop.com/?q=koya
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Sat, Feb 21, 2015 at 3:04 PM, Jeff Schroeder <
> jeffschroe...@computer.org>
> wrote:
>
> > I've got several use cases where being able to spin up multiple
> *discrete*
> > kafka clusters would be extremely advantageous. External cloud services
> > aren't an option and we already have a decent Mesos infrastructure
> running
> > on bare metal for these types of things.
> >
> > I was curious if anyone else has done this in a reasonably supportable
> > way and how they went about it. Today I'll be trying out stealth.ly's
> > mesos framework[1] but am not sure how well it will work with a newer
> > mesos (0.21.1).
> >
> > Thanks in advance!
> >
> > [1] https://github.com/stealthly/kafka-mesos
> >
> > --
> > Jeff Schroeder
> >
> > Don't drink and derive, alcohol and analysis don't mix.
> > http://www.digitalprognosis.com
> >
>


Re: Kafka 0.8.2 New Producer Example

2015-02-21 Thread Joe Stein
in scala
https://github.com/stealthly/scala-kafka/blob/master/src/test/scala/KafkaSpec.scala#L146-L168
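
And, for comparison, a minimal Java version using the new producer API that
ships in 0.8.2 (the broker address and topic are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class NewProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            Producer<String, String> producer =
                    new KafkaProducer<String, String>(props);
            // send() is asynchronous and returns a Future<RecordMetadata>
            producer.send(new ProducerRecord<String, String>("test", "key", "hello"));
            producer.close(); // flushes any outstanding sends
        }
    }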

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sat, Feb 21, 2015 at 8:10 AM, Cdhar B  wrote:

> Hi,
>
> Could you please provide some info on a Kafka 0.8.2 new producer example? I
> tried but was not able to find a complete example. Does anybody have a
> sample?
>
> Appreciate your quick help.
>
> Thanks,
> Cdhar
>


Re: still too many files open errors for kafka web console

2015-02-20 Thread Joe Stein
The Apache project doesn't have a web console for kafka.

Have you taken a look at https://github.com/yahoo/kafka-manager as of yet?
I haven't myself; I'm hoping to get some time tonight/this weekend to do so.
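
For the "too many open files" part itself, independent of which console you
run, the usual checks look something like this (the user name and limits are
illustrative):

    # count the file descriptors the console process actually holds
    lsof -p <pid> | wc -l
    # raise the per-user limit in /etc/security/limits.conf
    webconsole  soft  nofile  98304
    webconsole  hard  nofile  98304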


~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, Feb 20, 2015 at 6:09 PM, Sa Li  wrote:

> Hi, All
>
> I'd like to use the kafka web console to monitor the offset/topics stuff;
> it is easy to use, however it is freezing/stopping or dying too frequently.
> I don't think it's a problem on the OS level.
> Seems to be a problem on the application level.
> I've already raised the open file handle limit to 98000 for every user and
> time_waits to 30s instead of the default 5 minutes.
>
> From what I can see from the logs, it starts with play:
> [ESC[31merrorESC[0m] play - Cannot invoke the action, eventually got an
> error: java.lang.RuntimeException: Exception while executing statement : IO
> Exception: "java.io.IOException: Too many open files";
> "/etc/kafka-web-console/play"; SQL statement:
> delete from offsetPoints
> where
> (offsetPoints.offsetHistoryId = ?) [90031-172]
> errorCode: 90031, sqlState: 90031
>
> Caused by: java.lang.RuntimeException: Exception while executing statement
> : IO Exception: "java.io.IOException: Too many open files";
> "/etc/kafka-web-console/play"; SQL statement:
> delete from offsetPoints
> where
> (offsetPoints.offsetHistoryId = ?) [90031-172]
> errorCode: 90031, sqlState: 90031
> delete from offsetPoints
>
> then this seems to cause socket connection errors:
> Caused by: java.io.IOException: Too many open files
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> ~[na:1.7.0_75]
> at java.io.File.createNewFile(File.java:1006) ~[na:1.7.0_75]
> at org.h2.store.fs.FilePathDisk.createTempFile(FilePathDisk.java:367)
> ~[h2.jar:1.3.172]
> at org.h2.store.fs.FileUtils.createTempFile(FileUtils.java:329)
> ~[h2.jar:1.3.172]
> at org.h2.engine.Database.createTempFile(Database.java:1529)
> ~[h2.jar:1.3.172]
> at org.h2.result.RowList.writeAllRows(RowList.java:90) ~[h2.jar:1.3.172]
> [debug] application - Getting partition leaders for topic
> topic-exist-test
> [debug] application - Getting partition leaders for topic
> topic-rep-3-test
> [debug] application - Getting partition leaders for topic
> PofApiTest
> [debug] application - Getting partition leaders for topic
> PofApiTest-2
> [debug] application - Getting partition leaders for topic
> fileread
> [debug] application - Getting partition leaders for topic
> pageview
> [debug] application - Getting partition log sizes for topic
> topic-exist-test from partition leaders 10.100.71.42:9092,
> 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092,
> 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092
> [warn] application - Could not connect to partition leader
> 10.100.71.42:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> 10.100.71.42:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> 10.100.71.42:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> 10.100.71.42:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> 10.100.71.42:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> 10.100.71.42:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> 10.100.71.42:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> 10.100.71.42:9092. Error message: Failed to open a socket.
> [debug] application - Getting partition offsets for topic
> topic-exist-test
>
> harmful-jar:9092, exemplary-birds:9092, voluminous-mass:9092
> [warn] application - Could not connect to partition leader
> voluminous-mass:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> exemplary-birds:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> harmful-jar:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> voluminous-mass:9092. Error message: Fa

Re: rebalancing leadership quite often

2015-02-20 Thread Joe Stein
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whypartitionleadersmigratethemselvessometimes
?
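
The broker-side knobs that control the automatic side of this are also worth
checking; a sketch of the relevant server.properties entries (the names are
from the broker config docs, the values here are only illustrative, check
your version's defaults):

  auto.leader.rebalance.enable=true
  leader.imbalance.per.broker.percentage=10
  leader.imbalance.check.interval.seconds=300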

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, Feb 20, 2015 at 7:19 PM, Sa Li  wrote:

> Hi, All
>
> My dev cluster has three nodes (1, 2, 3), but I've seen quite often that
> node 1 just does not keep working as a leader. I've run
> preferred-replica-election many times; every time I run the replica
> election I see node 1 turn out to be leader for some partitions, but it
> just stops being leader after a while, and those partitions transfer to
> other brokers, meaning node 1 will not be used by consumers.
>
> I thought that happens only when broker 1 has crashed or stopped, but it
> hasn't, yet I still see the leadership shift. Any ideas about this?
>
> thanks
>
> --
>
> Alec Li
>


Re: [DISCUSS] KIP-12 - Kafka Sasl/Kerberos implementation

2015-02-12 Thread Joe Stein
Let's have a KIP for removing JDK 6 support (on another thread), as it would
make for a good discussion and something we should be able to come to
closure on, given it is a dependency for this and other patches that would
favor it happening.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Feb 12, 2015 at 11:37 AM, Harsha  wrote:

>
> Thanks for the review Gwen. I'll keep in mind about java 6 support.
> -Harsha
> On Wed, Feb 11, 2015, at 03:28 PM, Gwen Shapira wrote:
> > Looks good. Thanks for working on this.
> >
> > One note, the Channel implementation from SSL only works on Java7 and up.
> > Since we are still supporting Java 6, I'm working on a lighter wrapper
> > that
> > will be a composite on SocketChannel but will not extend it. Perhaps
> > you'll
> > want to use that.
> >
> > Looking forward to the patch!
> >
> > Gwen
> >
> > On Wed, Feb 11, 2015 at 9:17 AM, Harsha  wrote:
> >
> > > Hi,
> > > Here is the initial proposal for sasl/kerberos implementation for
> > > kafka https://cwiki.apache.org/confluence/x/YI4WAw
> > > and JIRA https://issues.apache.org/jira/browse/KAFKA-1686. I am
> > > currently working on prototype which will add more details to the KIP.
> > > Just opening the thread to say the work is in progress. I'll update the
> > > thread with a initial prototype patch.
> > > Thanks,
> > > Harsha
> > >
>


Re: Ship Kafka in on prem product

2015-02-11 Thread Joe Stein
Cool!!!

Maybe on the ecosystem confluence page we should have a packaging and/or
distro section.

~ Joestein

On Wed, Feb 11, 2015 at 1:34 PM, David Morales  wrote:

> Regarding RPM/DEB packages, we, at stratio.com, have a public repository
> which includes RPM and dev packages
>
> http://repository.stratio.com/sds/1.1/ubuntu/13.10/binary/
>
> http://repository.stratio.com/sds/1.1/RHEL/
>
>
>
> Regards.
>
>
> 2015-02-11 19:22 GMT+01:00 Otis Gospodnetic :
>
> > Hi,
> >
> > For what it's worth, we include Kafka (now 0.8.2) in our On Premises
> > version of SPM and Logsene.  Kafka is used to to transport metrics and
> logs
> > there.
> >
> > I do seem to recall some not-so-elegant stuff we had to do around
> > packaging, though. I think because
> > http://kafka.apache.org/downloads.html has only archive files and not
> > RPM/DEB packages.
> >
> > Otis
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Tue, Feb 10, 2015 at 11:22 AM, Geeta Gharpure 
> > wrote:
> >
> > > Hello
> > >
> > > I am looking for a reference about shipping kafka as part of Enterprise
> > on
> > > prem product. Does anyone know if Kafka can be redistributed safely
> > and/or
> > > is found secure enough to ship as part of enterprise product?
> > >
> > > Thanks in advance for your help.
> > >
> > > Thanks and regards,
> > > Geeta
> > >
> >
>
>
>
> --
>
> David Morales de Frías  ::  +34 607 010 411 :: @dmoralesdf
> 
>
>
> 
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>


Re: [DISCUSS] KIP-12 - Kafka Sasl/Kerberos implementation

2015-02-11 Thread Joe Stein
Thanks Harsha, looks good so far. How were you thinking of running the
KerberosTicketManager: as a standalone process like the controller, or is it
a layer of code that does the plumbing pieces everywhere?

~ Joestein

On Wed, Feb 11, 2015 at 12:18 PM, Harsha  wrote:

> Hi,
> Here is the initial proposal for sasl/kerberos implementation for
> kafka https://cwiki.apache.org/confluence/x/YI4WAw
> and JIRA https://issues.apache.org/jira/browse/KAFKA-1686. I am
> currently working on prototype which will add more details to the KIP.
> Just opening the thread to say the work is in progress. I'll update the
> thread with a initial prototype patch.
> Thanks,
> Harsha
>


Re: ping kafka server

2015-02-10 Thread Joe Stein
One good canary is to have an ops topic where you loop through every
partition and write a timestamp to every broker, something like

for n in $(seq 0 $((P - 1))); do
  date +%s | kafkacat -b host:port -t ops -p "$n"
done

and trap any errors and output accordingly.

You can get fancy and start validating reads and such, but it is a canary
after all.

~ Joestein

On Tue, Feb 10, 2015 at 12:17 PM, Magnus Edenhill 
wrote:

> Relying on just the TCP connection getting established seems a bit poor,
> the easiest non-intrusive approach is probably to query the broker for
> metadata,
> e.g.: kafkacat -b mybroker -L
>
>
> 2015-02-10 1:47 GMT+01:00 Koert Kuipers :
>
> > a simple nagios check_tcp works fine. as gwen indicated kafka closes the
> > connection on me, but this is (supposedly) harmless. i see in server
> logs:
> > [2015-02-09 19:39:17,069] INFO Closing socket connection to /
> 192.168.1.31.
> > (kafka.network.Processor)
> >
> >
> > On Mon, Feb 9, 2015 at 6:06 PM, Scott Clasen  wrote:
> >
> > > I have used nagios in this manner with kafka before and it worked fine.
> > >
> > > On Mon, Feb 9, 2015 at 2:48 PM, Koert Kuipers 
> wrote:
> > >
> > > > i would like to be able to ping kafka servers from nagios to confirm
> > they
> > > > are alive. since kafka servers dont run a http server (web ui) i am
> not
> > > > sure how to do this.
> > > >
> > > > is it safe to establish a "test" tcp connection (so connect and
> > > immediately
> > > > disconnect using telnet or netstat or something like that) to the
> kafka
> > > > server on port 9092 to confirm its alive?
> > > >
> > > > thanks
> > > >
> > >
> >
>


Re: Ship Kafka in on prem product

2015-02-10 Thread Joe Stein
Hey Geeta, have you taken a look at
http://www.cloudera.com/content/cloudera/en/developers/home/cloudera-labs/apache-kafka.html
yet? I really liked what they did with 0.8.2-beta, and the changes for
0.8.2.0 are looking good:
https://github.com/cloudera-labs/kafka/tree/cdh5-0.8.2_1.1.0. I don't know
what their release plan is though.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Tue, Feb 10, 2015 at 11:22 AM, Geeta Gharpure  wrote:

> Hello
>
> I am looking for a reference about shipping kafka as part of Enterprise on
> prem product. Does anyone know if Kafka can be redistributed safely and/or
> is found secure enough to ship as part of enterprise product?
>
> Thanks in advance for your help.
>
> Thanks and regards,
> Geeta
>


Re: java API equivalent of --from-beginning?

2015-02-06 Thread Joe Stein
You have to delete the znode from zookeeper for your consumer group
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/consumer/ConsoleConsumer.scala#L181
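
A sketch of the same from Java with the 0.8.x high-level consumer (the
ZooKeeper address, group and topic names here are made up):

  // wipe the group's stored offsets, mirroring what the console consumer does
  kafka.utils.ZkUtils.maybeDeletePath("localhost:2181", "/consumers/my-group");

  Properties props = new Properties();
  props.put("zookeeper.connect", "localhost:2181");
  props.put("group.id", "my-group");
  // only consulted when no committed offset exists, hence the delete above
  props.put("auto.offset.reset", "smallest");
  ConsumerConnector consumer = kafka.consumer.Consumer
      .createJavaConsumerConnector(new kafka.consumer.ConsumerConfig(props));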

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Fri, Feb 6, 2015 at 5:48 AM, Yang  wrote:

> we are trying to achieve the equivalent of
> bin/kafka-console-consumer.sh
>
>
> we tried props.put("auto.offset.reset", "smallest");
>
> but it still doesn't work.
>
> how can I achieve --from-beginning in java code? thanks
>
> (we are using 8.1.1)
>


Re: Kafka Architecture diagram

2015-02-05 Thread Joe Stein
Ankur,

There is more in the papers and presentations you can check out too,
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
if you haven't already.

- Joestein

On Thu, Feb 5, 2015 at 12:57 PM, Conikee  wrote:

> Michael Noll's blog posting might serve your purpose as well
>
>
> http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
>
> Sent from my iPhone
>
> > On Feb 5, 2015, at 9:52 AM, Gwen Shapira  wrote:
> >
> > The Kafka documentation has several good diagrams. Did you check it out?
> >
> > http://kafka.apache.org/documentation.html
> >
> >> On Thu, Feb 5, 2015 at 6:31 AM, Ankur Jain 
> wrote:
> >>
> >> Hi Team,
> >>
> >> I am looking out high and low level architecture diagram of Kafka with
> >> Zookeeper, but haven't got any good one , showing concepts like
> >> replication, high availability etc.
> >>
> >> Please do let me know if there is any...
> >>
> >> Thank you
> >> Ankur
> >>
>


Re: New Producer - ONLY sync mode?

2015-02-04 Thread Joe Stein
Now that 0.8.2.0 is in the wild I look forward to working with it more and
seeing what folks start to do with this function
https://dist.apache.org/repos/dist/release/kafka/0.8.2.0/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html#send(org.apache.kafka.clients.producer.ProducerRecord,
org.apache.kafka.clients.producer.Callback) and keeping it fully
non-blocking.

One sprint I know of coming up is going to have the new producer as a
component in their reactive calls, handling bookkeeping and retries through
that type of callback approach. It should work well (I haven't tried it, but
I don't see why not) with Akka, ScalaZ, RxJava, Finagle, etc., in functional
languages and frameworks.

I think as JDK 8 gets out in the wild more (maybe after the JDK 7 EOL) the
use of .get() will be reduced (imho) and folks will be thinking more about
non-blocking vs blocking, and not so much sync vs async, but my crystal ball
is just back from the shop so we'll see =8^)
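
For anyone who hasn't tried the callback form yet, a minimal sketch (topic
and payload are made up; assumes an already configured
KafkaProducer<byte[], byte[]> named producer):

  ProducerRecord<byte[], byte[]> record =
      new ProducerRecord<byte[], byte[]>("my-topic", "hello".getBytes());
  producer.send(record, new Callback() {
      public void onCompletion(RecordMetadata metadata, Exception exception) {
          if (exception != null) {
              // bookkeeping / retry goes here, off the caller's critical path
          }
      }
  });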

/*******
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Tue, Feb 3, 2015 at 10:45 PM, Jay Kreps  wrote:

> Hey guys,
>
> I guess the question is whether it really matters how many underlying
> network requests occur? It is very hard for an application to depend on
> this even in the old producer since it depends on the partitions placement
> (a send to two partitions may go to either one machine or two and so it
> will send either one or two requests). So when you send a batch in one call
> you may feel that is "all at once", but that is only actually guaranteed if
> all messages have the same partition.
>
> The challenge is allowing even this in the presence of bounded request
> sizes which we have in the new producer. The user sends a list of objects
> and the serialized size that will result is not very apparent to them. If
> you break it up into multiple requests then that is kind of further ruining
> the illusion of a single send. If you don't then you have to just error out
> which is equally annoying to have to handle.
>
> But I'm not sure if from your description you are saying you actually care
> how many physical requests are issued. I think it is more like it is just
> syntactically annoying to send a batch of data now because it needs a for
> loop.
>
> Currently to do this you would do:
>
> List responses = new ArrayList();
> for(input: recordBatch)
> responses.add(producer.send(input));
> for(response: responses)
> response.get
>
> If you don't depend on the offset/error info we could add a flush call so
> you could instead do
> for (ProducerRecord<byte[], byte[]> input : recordBatch)
>     producer.send(input);
> producer.flush();
>
> But if you do want the error/offset then you are going to be back to the
> original case.
>
> Thoughts?
>
> -Jay
>
> On Mon, Feb 2, 2015 at 1:48 PM, Gwen Shapira 
> wrote:
>
> > I've been thinking about that too, since both Flume and Sqoop rely on
> > send(List) API of the old API.
> >
> > I'd like to see this API come back, but I'm debating how we'd handle
> > errors. IIRC, the old API would fail an entire batch on a single
> > error, which can lead to duplicates. Having N callbacks lets me retry
> > / save / whatever just the messages that had issues.
> >
> > If messages had identifiers from the producer side, we could have the
> > API call the callback with a list of message-ids and their status. But
> > they don't :)
> >
> > Any thoughts on how you'd like it to work?
> >
> > Gwen
> >
> >
> > On Mon, Feb 2, 2015 at 1:38 PM, Pradeep Gollakota 
> > wrote:
> > > This is a great question Otis. Like Gwen said, you can accomplish sync
> > > mode by setting the batch size to 1. But this does highlight a
> > > shortcoming of the new producer API.
> > >
> > > I really like the design of the new API and it has really great
> > > properties, and I'm enjoying working with it. However, one API that I
> > > think we're lacking is a "batch" API. Currently, I have to iterate over
> > > a batch and call .send() on each record, which returns n callbacks
> > > instead of 1 callback for the whole batch. This significantly
> > > complicates recovery logic where we need to commit a batch as opposed
> > > to 1 record at a time.
> > >
> > > Do you guys have any plans to add better semantics around batches?
> > >
> > > On Mon, Feb 2, 2015 at 1:34 PM, Gwen Shapira

Re: How to configure "asynchronous replication"

2015-02-04 Thread Joe Stein
Hi Chen,

Take a look at the quick start
(https://kafka.apache.org/documentation.html#quickstart): in step 3,
changing --replication-factor 1 to --replication-factor 3 would do that. You
need to have at least 3 brokers running live when you do that. You also want
to look at changing your partition count too (same line, --partitions).
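
Concretely, assuming the quick start's local setup (topic name and partition
count are just examples):

  bin/kafka-topics.sh --create --zookeeper localhost:2181 \
      --replication-factor 3 --partitions 6 --topic my-replicated-topic

If I read that benchmark right, the "asynchronous" part refers to the
producer not waiting on the follower replicas, i.e. request.required.acks=1
(acks=1 on the new producer) rather than -1.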

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Wed, Feb 4, 2015 at 5:46 AM, 陈洪海  wrote:

> Hi Kafka,
>
> The doc mentioned that “Single producer thread, 3x
> ASYNChronous replication”, how to configure it?  What’s the configuration
> item?  Or need use async produce api to send?
>
>
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>
>
>
> Thanks,
>
>  Chen
>
>


Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 3

2015-02-02 Thread Joe Stein
Huzzah!

Thanks Jun for preparing the release candidates and getting this out to the
community.

- Joe Stein

On Mon, Feb 2, 2015 at 2:27 PM, Jun Rao  wrote:

> The following are the results of the votes.
>
> +1 binding = 3 votes
> +1 non-binding = 1 votes
> -1 = 0 votes
> 0 = 0 votes
>
> The vote passes.
>
> I will release artifacts to maven central, update the dist svn and download
> site. Will send out an announce after that.
>
> Thanks everyone that contributed to the work in 0.8.2.0!
>
> Jun
>
> On Wed, Jan 28, 2015 at 9:22 PM, Jun Rao  wrote:
>
>> This is the third candidate for release of Apache Kafka 0.8.2.0.
>>
>> Release Notes for the 0.8.2.0 release
>>
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/RELEASE_NOTES.html
>>
>> *** Please download, test and vote by Saturday, Jan 31, 11:30pm PT
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS in addition to the md5, sha1 and sha2
>> (SHA256) checksum.
>>
>> * Release artifacts to be voted upon (source and binary):
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/
>>
>> * Maven artifacts to be voted upon prior to release:
>> https://repository.apache.org/content/groups/staging/
>>
>> * scala-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/scaladoc/
>>
>> * java-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/javadoc/
>>
>> * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.0 tag
>>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=223ac42a7a2a0dab378cc411f4938a9cea1eb7ea
>> (commit 7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4)
>>
>> /***
>>
>> Thanks,
>>
>> Jun
>>
>>
>


Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 3

2015-01-30 Thread Joe Stein
+1 (binding)

verified signatures, ran quick start, tests all passed.

- Joe Stein

On Fri, Jan 30, 2015 at 12:04 PM, Jun Rao  wrote:

> This is a reminder that the vote will close tomorrow night. Please test
> RC3 out and vote before the deadline.
>
> Thanks,
>
> Jun
>
> On Wed, Jan 28, 2015 at 11:22 PM, Jun Rao  wrote:
>
>> This is the third candidate for release of Apache Kafka 0.8.2.0.
>>
>> Release Notes for the 0.8.2.0 release
>>
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/RELEASE_NOTES.html
>>
>> *** Please download, test and vote by Saturday, Jan 31, 11:30pm PT
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS in addition to the md5, sha1 and sha2
>> (SHA256) checksum.
>>
>> * Release artifacts to be voted upon (source and binary):
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/
>>
>> * Maven artifacts to be voted upon prior to release:
>> https://repository.apache.org/content/groups/staging/
>>
>> * scala-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/scaladoc/
>>
>> * java-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate3/javadoc/
>>
>> * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.0 tag
>>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=223ac42a7a2a0dab378cc411f4938a9cea1eb7ea
>> (commit 7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4)
>>
>> /***
>>
>> Thanks,
>>
>> Jun
>>
>>
>


Re: Poll: Producer/Consumer impl/language you use?

2015-01-28 Thread Joe Stein
I kind of look at the Storm, Spark, Samza, etc. integrations as
producers/consumers too.

Not sure if that was maybe also getting lumped into "Other".

I think Jason's 90/10, 80/20, 70/30 splits would be found to be typical.

As far as the Scala API goes, I think we should have a wrapper around the
shiny new Java Consumer. Folks I know use things like scalaz streams
https://github.com/scalaz/scalaz-stream which the new consumer can work
nicely with I think. It would be great if we could come up with a new Scala
layer on top of the Java consumer that we release in the project. One of my
engineers is taking a look at that now, unless someone is already working on
it? We are using static partition assignment in the new consumer and just
using Mesos to handle rebalancing for us. When he gets further along, and if
it makes sense, we will shoot a KIP around so people can chat about it on
dev.

- Joe Stein

On Wed, Jan 28, 2015 at 10:33 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Good point, Jason.  Not sure how we could account for that easily.  But
> maybe that is at least a partial explanation of the Java % being under 50%
> when Java in general is more popular than that...
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Wed, Jan 28, 2015 at 1:22 PM, Jason Rosenberg  wrote:
>
> > I think the results could be a bit skewed, in cases where an organization
> > uses multiple languages, but not equally.  In our case, we overwhelmingly
> > use java clients (>90%).  But we also have ruby and Go clients too.  But
> in
> > the poll, these come out as equally used client languages.
> >
> > Jason
> >
> > On Wed, Jan 28, 2015 at 12:05 PM, David McNelis <
> > dmcne...@emergingthreats.net> wrote:
> >
> > > I agree with Stephen, it would be really unfortunate to see the Scala
> api
> > > go away.
> > >
> > > On Wed, Jan 28, 2015 at 11:57 AM, Stephen Boesch 
> > > wrote:
> > >
> > > > The scala API going away would be a minus. As Koert mentioned we
> could
> > > use
> > > > the java api but it is less ..  well .. functional.
> > > >
> > > > Kafka is included in the Spark examples and external modules and is
> > > popular
> > > > as a component of ecosystems on Spark (for which scala is the primary
> > > > language).
> > > >
> > > > 2015-01-28 8:51 GMT-08:00 Otis Gospodnetic <
> otis.gospodne...@gmail.com
> > >:
> > > >
> > > > > Hi,
> > > > >
> > > > > I don't have a good excuse here. :(
> > > > > I thought about including Scala, but for some reason didn't do
> it.  I
> > > see
> > > > > 12-13% of people chose "Other".  Do you think that is because I
> > didn't
> > > > > include Scala?
> > > > >
> > > > > Also, is the Scala API reeally going away?
> > > > >
> > > > > Otis
> > > > > --
> > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> > Management
> > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > >
> > > > >
> > > > > On Tue, Jan 20, 2015 at 4:43 PM, Koert Kuipers 
> > > > wrote:
> > > > >
> > > > > > no scala? although scala can indeed use the java api, it's ugly;
> > > > > > we prefer to use the scala api (which i believe will go away
> > > > > > unfortunately)
> > > > > >
> > > > > > On Tue, Jan 20, 2015 at 2:52 PM, Otis Gospodnetic <
> > > > > > otis.gospodne...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I was wondering which implementations/languages people use for
> > > their
> > > > > > Kafka
> > > > > > > Producer/Consumers not everyone is using the Java APIs.  So
> > > > here's
> > > > > a
> > > > > > > 1-question poll:
> > > > > > >
> > > > > > >
> > > > >
> > >
> http://blog.sematext.com/2015/01/20/kafka-poll-producer-consumer-client/
> > > > > > >
> > > > > > > Will share the results in about a week when we have enough
> votes.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Otis
> > > > > > > --
> > > > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> > > > Management
> > > > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Regarding Kafka release 0.8.2-beta

2015-01-26 Thread Joe Stein
Maybe we should add "experimental" to the documentation so folks that don't
know understand.

/*******
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
/
On Jan 26, 2015 11:56 AM, "Jason Rosenberg"  wrote:

> shouldn't the new consumer api be removed from the 0.8.2 code base then?
>
> On Fri, Jan 23, 2015 at 10:30 AM, Joe Stein  wrote:
>
> > The new consumer is scheduled for 0.9.0.
> >
> > Currently Kafka release candidate 2 for 0.8.2.0 is being voted on.
> >
> > There is an in progress patch to the new consumer that you can try out
> > https://issues.apache.org/jira/browse/KAFKA-1760
> >
> > /***
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > /
> >
> > On Fri, Jan 23, 2015 at 1:55 AM, Reeni Mathew 
> > wrote:
> >
> > > Hi Team,
> > >
> > > I was playing around with your recent release 0.8.2-beta.
> > > Producer worked fine whereas new consumer did not.
> > >
> > > org.apache.kafka.clients.consumer.KafkaConsumer
> > >
> > > After digging the code I realized that the implementation for the same
> is
> > > not available. Only API is present.
> > > Could you please let me know by when we can expect the implementation
> of
> > > the same.
> > >
> > > Thanks & Regards
> > >
> > > Reeni
> > >
> >
>


Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 2 (with the correct links)

2015-01-26 Thread Joe Stein
+1 (binding)

Artifacts and quick start look good. I ran it in some client code, with
minor edits from 0.8.2-beta: https://github.com/stealthly/scala-kafka/pull/26

On Mon, Jan 26, 2015 at 3:38 AM, Manikumar Reddy 
wrote:

> +1 (Non-binding)
> Verified source package, unit tests, release build, topic deletion,
> compaction and random testing
>
> On Mon, Jan 26, 2015 at 6:14 AM, Neha Narkhede  wrote:
>
>> +1 (binding)
>> Verified keys, quick start, unit tests.
>>
>> On Sat, Jan 24, 2015 at 4:26 PM, Joe Stein  wrote:
>>
>> > That makes sense, thanks!
>> >
>> > On Sat, Jan 24, 2015 at 7:00 PM, Jay Kreps  wrote:
>> >
>> > > But I think the flaw in trying to guess what kind of serializer they
>> will
>> > > use is when we get it wrong. Basically let's say we guess "String".
>> Say
>> > 30%
>> > > of the time we will be right and we will save the two configuration
>> > lines.
>> > > 70% of the time we will be wrong and the user gets a super cryptic
>> > > ClassCastException: "xyz cannot be cast to [B" (because [B is how java
>> > > chooses to display the byte array class just to up the pain), then
>> they
>> > > figure out how to subscribe to our mailing list and email us the
>> cryptic
>> > > exception, then we explain about how we helpfully set these properties
>> > for
>> > > them to save them time. :-)
>> > >
>> > >
>> https://www.google.com/?gws_rd=ssl#q=kafka+classcastexception+%22%5BB%22
>> > >
>> > > I think basically we did this experiment with the old clients and the
>> > > conclusion is that serialization is something you basically have to
>> think
>> > > about to use Kafka and trying to guess just makes things worse.
>> > >
>> > > -Jay
>> > >
>> > > On Sat, Jan 24, 2015 at 2:51 PM, Joe Stein 
>> wrote:
>> > >
>> > >> Maybe. I think the StringSerializer could look more like a typical
>> > >> type of message. Instead of encoding being a property it would be
>> > >> more typically just written in the bytes.
>> > >>
>> > >> On Sat, Jan 24, 2015 at 12:12 AM, Jay Kreps 
>> > wrote:
>> > >>
>> > >> > I don't think so--see if you buy my explanation. We previously
>> > defaulted
>> > >> > to the byte array serializer and it was a source of unending
>> > frustration
>> > >> > and confusion. Since it wasn't a required config people just went
>> > along
>> > >> > plugging in whatever objects they had, and thinking that changing
>> the
>> > >> > parametric types would somehow help. Then they would get a class
>> case
>> > >> > exception and assume our stuff was somehow busted, not realizing we
>> > had
>> > >> > helpfully configured a type different from what they were passing
>> in
>> > >> under
>> > >> > the covers. So I think it is actually good for people to think: how
>> > am I
>> > >> > serializing my data, and getting that exception will make them ask
>> > that
>> > >> > question right?
>> > >> >
>> > >> > -Jay
>> > >> >
>> > >> > On Fri, Jan 23, 2015 at 9:06 PM, Joe Stein 
>> > >> wrote:
>> > >> >
>> > >> >> Should value.serializer in the new java producer be defaulted to
>> > >> >> Array[Byte] ?
>> > >> >>
>> > >> >> I was working on testing some upgrade paths and got this
>> > >> >>
>> > >> >> ! return exception in callback when buffer cannot accept
>> message
>> > >> >>
>> > >> >>   ConfigException: Missing required configuration
>> > >> "value.serializer"
>> > >> >> which has no default value. (ConfigDef.java:124)
>> > >> >>
>> > >> >>
>>  org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:124)
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >>
>> > >>
>> >
>> org.apache.kafka.common.config.AbstractConfig.(AbstractConfig.java:48)
>> > >> >>
>> > >> >>
>> &

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 2 (with the correct links)

2015-01-24 Thread Joe Stein
That makes sense, thanks!

On Sat, Jan 24, 2015 at 7:00 PM, Jay Kreps  wrote:

> But I think the flaw in trying to guess what kind of serializer they will
> use is when we get it wrong. Basically let's say we guess "String". Say 30%
> of the time we will be right and we will save the two configuration lines.
> 70% of the time we will be wrong and the user gets a super cryptic
> ClassCastException: "xyz cannot be cast to [B" (because [B is how java
> chooses to display the byte array class just to up the pain), then they
> figure out how to subscribe to our mailing list and email us the cryptic
> exception, then we explain about how we helpfully set these properties for
> them to save them time. :-)
>
> https://www.google.com/?gws_rd=ssl#q=kafka+classcastexception+%22%5BB%22
>
> I think basically we did this experiment with the old clients and the
> conclusion is that serialization is something you basically have to think
> about to use Kafka and trying to guess just makes things worse.
>
> -Jay
>
> On Sat, Jan 24, 2015 at 2:51 PM, Joe Stein  wrote:
>
>> Maybe. I think the StringSerialzer could look more like a typical type of
>> message.  Instead of encoding being a property it would be more typically
>> just written in the bytes.
>>
>> On Sat, Jan 24, 2015 at 12:12 AM, Jay Kreps  wrote:
>>
>> > I don't think so--see if you buy my explanation. We previously defaulted
>> > to the byte array serializer and it was a source of unending frustration
>> > and confusion. Since it wasn't a required config people just went along
>> > plugging in whatever objects they had, and thinking that changing the
>> > parametric types would somehow help. Then they would get a class case
>> > exception and assume our stuff was somehow busted, not realizing we had
>> > helpfully configured a type different from what they were passing in
>> under
>> > the covers. So I think it is actually good for people to think: how am I
>> > serializing my data, and getting that exception will make them ask that
>> > question right?
>> >
>> > -Jay
>> >
>> > On Fri, Jan 23, 2015 at 9:06 PM, Joe Stein 
>> wrote:
>> >
>> >> Should value.serializer in the new java producer be defaulted to
>> >> Array[Byte] ?
>> >>
>> >> I was working on testing some upgrade paths and got this
>> >>
>> >> ! return exception in callback when buffer cannot accept message
>> >>
>> >>   ConfigException: Missing required configuration
>> "value.serializer"
>> >> which has no default value. (ConfigDef.java:124)
>> >>
>> >>   org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:124)
>> >>
>> >>
>> >>
>> >>
>> org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:48)
>> >>
>> >>
>> >>
>> >>
>> org.apache.kafka.clients.producer.ProducerConfig.<init>(ProducerConfig.java:235)
>> >>
>> >>
>> >>
>> >>
>> org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:129)
>> >>
>> >>
>> >>
>> >>
>> ly.stealth.testing.BaseSpec$class.createNewKafkaProducer(BaseSpec.scala:42)
>> >>
>> >>
>>  ly.stealth.testing.KafkaSpec.createNewKafkaProducer(KafkaSpec.scala:36)
>> >>
>> >>
>> >>
>> >>
>> ly.stealth.testing.KafkaSpec$$anonfun$3$$anonfun$apply$37.apply(KafkaSpec.scala:175)
>> >>
>> >>
>> >>
>> >>
>> ly.stealth.testing.KafkaSpec$$anonfun$3$$anonfun$apply$37.apply(KafkaSpec.scala:170)
>> >>
>> >>
>> >>
>> >> On Fri, Jan 23, 2015 at 5:55 PM, Jun Rao  wrote:
>> >>
>> >> > This is a reminder that the deadline for the vote is this Monday, Jan
>> >> 26,
>> >> > 7pm PT.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jun
>> >> >
>> >> > On Wed, Jan 21, 2015 at 8:28 AM, Jun Rao  wrote:
>> >> >
>> >> >> This is the second candidate for release of Apache Kafka 0.8.2.0.
>> There
>> >> >> has been some changes since the 0.8.2 beta release, especially in
>> the
>> >> new
>> >> >> java producer api and jmx mbean names. It would be great if people
>> can
>> >> test
>> >> >> this out thoroughl

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 2 (with the correct links)

2015-01-24 Thread Joe Stein
Maybe. I think the StringSerializer could look more like a typical type of
message. Instead of encoding being a property it would be more typically
just written in the bytes.

On Sat, Jan 24, 2015 at 12:12 AM, Jay Kreps  wrote:

> I don't think so--see if you buy my explanation. We previously defaulted
> to the byte array serializer and it was a source of unending frustration
> and confusion. Since it wasn't a required config people just went along
> plugging in whatever objects they had, and thinking that changing the
> parametric types would somehow help. Then they would get a class case
> exception and assume our stuff was somehow busted, not realizing we had
> helpfully configured a type different from what they were passing in under
> the covers. So I think it is actually good for people to think: how am I
> serializing my data, and getting that exception will make them ask that
> question right?
>
> -Jay
>
> On Fri, Jan 23, 2015 at 9:06 PM, Joe Stein  wrote:
>
>> Should value.serializer in the new java producer be defaulted to
>> Array[Byte] ?
>>
>> I was working on testing some upgrade paths and got this
>>
>> ! return exception in callback when buffer cannot accept message
>>
>>   ConfigException: Missing required configuration "value.serializer"
>> which has no default value. (ConfigDef.java:124)
>>
>>   org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:124)
>>
>>
>>
>> org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:48)
>>
>>
>>
>> org.apache.kafka.clients.producer.ProducerConfig.<init>(ProducerConfig.java:235)
>>
>>
>>
>> org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:129)
>>
>>
>>
>> ly.stealth.testing.BaseSpec$class.createNewKafkaProducer(BaseSpec.scala:42)
>>
>>   ly.stealth.testing.KafkaSpec.createNewKafkaProducer(KafkaSpec.scala:36)
>>
>>
>>
>> ly.stealth.testing.KafkaSpec$$anonfun$3$$anonfun$apply$37.apply(KafkaSpec.scala:175)
>>
>>
>>
>> ly.stealth.testing.KafkaSpec$$anonfun$3$$anonfun$apply$37.apply(KafkaSpec.scala:170)
>>
>>
>>
>> On Fri, Jan 23, 2015 at 5:55 PM, Jun Rao  wrote:
>>
>> > This is a reminder that the deadline for the vote is this Monday, Jan
>> 26,
>> > 7pm PT.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Wed, Jan 21, 2015 at 8:28 AM, Jun Rao  wrote:
>> >
>> >> This is the second candidate for release of Apache Kafka 0.8.2.0. There
>> >> has been some changes since the 0.8.2 beta release, especially in the
>> new
>> >> java producer api and jmx mbean names. It would be great if people can
>> test
>> >> this out thoroughly.
>> >>
>> >> Release Notes for the 0.8.2.0 release
>> >>
>> >>
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate2/RELEASE_NOTES.html
>> >>
>> >> *** Please download, test and vote by Monday, Jan 26h, 7pm PT
>> >>
>> >> Kafka's KEYS file containing PGP keys we use to sign the release:
>> >> http://kafka.apache.org/KEYS in addition to the md5, sha1 and sha2
>> >> (SHA256) checksum.
>> >>
>> >> * Release artifacts to be voted upon (source and binary):
>> >> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate2/
>> >>
>> >> * Maven artifacts to be voted upon prior to release:
>> >> https://repository.apache.org/content/groups/staging/
>> >>
>> >> * scala-doc
>> >> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate2/scaladoc/
>> >>
>> >> * java-doc
>> >> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate2/javadoc/
>> >>
>> >> * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.0 tag
>> >>
>> >>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=058d58adef2ab2787e49d8efeefd61bb3d32f99c
>> >> (commit 0b312a6b9f0833d38eec434bfff4c647c1814564)
>> >>
>> >> /***
>> >>
>> >> Thanks,
>> >>
>> >> Jun
>> >>
>> >>
>>

Re: [kafka-clients] Re: [VOTE] 0.8.2.0 Candidate 2 (with the correct links)

2015-01-23 Thread Joe Stein
Should value.serializer in the new java producer be defaulted to
Array[Byte] ?

I was working on testing some upgrade paths and got this

! return exception in callback when buffer cannot accept message

  ConfigException: Missing required configuration "value.serializer"
which has no default value. (ConfigDef.java:124)

  org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:124)


org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:48)


org.apache.kafka.clients.producer.ProducerConfig.<init>(ProducerConfig.java:235)


org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:129)


ly.stealth.testing.BaseSpec$class.createNewKafkaProducer(BaseSpec.scala:42)

  ly.stealth.testing.KafkaSpec.createNewKafkaProducer(KafkaSpec.scala:36)


ly.stealth.testing.KafkaSpec$$anonfun$3$$anonfun$apply$37.apply(KafkaSpec.scala:175)


ly.stealth.testing.KafkaSpec$$anonfun$3$$anonfun$apply$37.apply(KafkaSpec.scala:170)
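
For reference, the two config lines that clear this up (class names as in
the 0.8.2.0 javadoc; swap in StringSerializer if you are handing in
strings):

  props.put("key.serializer",
      "org.apache.kafka.common.serialization.ByteArraySerializer");
  props.put("value.serializer",
      "org.apache.kafka.common.serialization.ByteArraySerializer");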



On Fri, Jan 23, 2015 at 5:55 PM, Jun Rao  wrote:

> This is a reminder that the deadline for the vote is this Monday, Jan 26,
> 7pm PT.
>
> Thanks,
>
> Jun
>
> On Wed, Jan 21, 2015 at 8:28 AM, Jun Rao  wrote:
>
>> This is the second candidate for release of Apache Kafka 0.8.2.0. There
>> has been some changes since the 0.8.2 beta release, especially in the new
>> java producer api and jmx mbean names. It would be great if people can test
>> this out thoroughly.
>>
>> Release Notes for the 0.8.2.0 release
>>
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate2/RELEASE_NOTES.html
>>
>> *** Please download, test and vote by Monday, Jan 26h, 7pm PT
>>
>> Kafka's KEYS file containing PGP keys we use to sign the release:
>> http://kafka.apache.org/KEYS in addition to the md5, sha1 and sha2
>> (SHA256) checksum.
>>
>> * Release artifacts to be voted upon (source and binary):
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate2/
>>
>> * Maven artifacts to be voted upon prior to release:
>> https://repository.apache.org/content/groups/staging/
>>
>> * scala-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate2/scaladoc/
>>
>> * java-doc
>> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate2/javadoc/
>>
>> * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.0 tag
>>
>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=058d58adef2ab2787e49d8efeefd61bb3d32f99c
>> (commit 0b312a6b9f0833d38eec434bfff4c647c1814564)
>>
>> /***
>>
>> Thanks,
>>
>> Jun
>>
>>


Re: Regarding Kafka release 0.8.2-beta

2015-01-23 Thread Joe Stein
The new consumer is scheduled for 0.9.0.

Currently Kafka release candidate 2 for 0.8.2.0 is being voted on.

There is an in progress patch to the new consumer that you can try out
https://issues.apache.org/jira/browse/KAFKA-1760

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Fri, Jan 23, 2015 at 1:55 AM, Reeni Mathew 
wrote:

> Hi Team,
>
> I was playing around with your recent release 0.8.2-beta.
> Producer worked fine whereas new consumer did not.
>
> org.apache.kafka.clients.consumer.KafkaConsumer
>
> After digging the code I realized that the implementation for the same is
> not available. Only API is present.
> Could you please let me know by when we can expect the implementation of
> the same.
>
> Thanks & Regards
>
> Reeni
>


Re: Help: Kafka LeaderNotAvailableException

2015-01-22 Thread Joe Stein
Vishal,

Does this error happen every time you are sending, or just the first time?
(If it's only the first send, it's often just topic auto-creation: the topic
gets created on first use, and a leader hasn't been elected yet by the time
the initial metadata request comes back.)

Joe Stein

On Thu, Jan 22, 2015 at 4:33 AM,  wrote:

> Hi,
> Let me overview on the issue that I am facing on producing message in
> Kafka:
> I have horthonworks HDP-2.1 installed, along that we have Kafka on other
> node.
>
> * On kafka node:
> Start Zookeepeer
> Start Kafka Broker service
> Send message/producer
> Consume message - Works (Note: here we start Zookeeper locally on kafka01
> node)
>
> * Issue side:
> >Now in HDP-2.1 we have Zookeeper built in & we have the Zookeeper service
> running on the master node
> >I go to the Kafka server & start the Kafka broker
> (In the config\server.properties file I have set zookeeper.connect to
> masternode:2181)
> Then I start the producer & send a message... after that we got an error
> like kafka.common.LeaderNotAvailableException
>
> [2015-01-17 05:54:09,465] WARN Error while fetching metadata
> [{TopicMetadata for topic fred ->
> No partition metadata for topic fred due to
> kafka.common.LeaderNotAvailableException}] for topic [fred]: class
> kafka.common.LeaderNotAvailableException
> (kafka.producer.BrokerPartitionInfo)
> [2015-01-17 05:54:09,659] WARN Error while fetching metadata
> [{TopicMetadata for topic fred ->
> No partition metadata for topic fred due to
> kafka.common.LeaderNotAvailableException}] for topic [fred]: class
> kafka.common.LeaderNotAvailableException
> (kafka.producer.BrokerPartitionInfo)
> [2015-01-17 05:54:09,659] ERROR Failed to collate messages by topic,
> partition due to: Failed to fetch topic metadata for topic: fred
> (kafka.producer.async.DefaultEventHandler)
> [2015-01-17 05:54:09,802] WARN Error while fetching metadata
> [{TopicMetadata for topic fred ->
> No partition metadata for topic fred due to
> kafka.common.LeaderNotAvailableException}] for topic [fred]: class
> kafka.common.LeaderNotAvailableException
> (kafka.producer.BrokerPartitionInfo)
> [2015-01-17 05:54:09,820] ERROR Failed to send requests for topics fred
> with correlation ids in [0,8] (kafka.producer.async.DefaultEventHandler)
> [2015-01-17 05:54:09,822] ERROR Error in handling batch of 1 events
> (kafka.producer.async.ProducerSendThread)
> kafka.common.FailedToSendMessageException: Failed to send messages after 3
> tries.
> at
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
> at
> kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:104)
> at
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:87)
>
> Can someone suggest what is going wrong.
> Thanks.
>
>
>
>
>
> -Vishal
>
>
> Regards,
> Vishal
> Software Dev Staff Engineer
> Dell | Bangalore
> Ext : 79268
>
>


Re: Kafka Producers and Virtual Kafka Endpoint

2015-01-21 Thread Joe Stein
<< My goal is to be able to hide the underlying hosts from the producer and
be able to use a virtual endpoint (like the vip).

That is a typical approach and a good thing to do if you can, for the
reasons you brought up. The first call of a producer is to "bootstrap"
itself with the list of Kafka brokers (through a metadata request, which any
broker can answer) and their current partition ownership; after that the
client talks directly to the broker leading each partition, so the VIP is
only needed for that initial discovery and the individual brokers must still
be reachable from the clients. More on that here
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Partitioningandbootstrapping
if you are interested.


/*******
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Wed, Jan 21, 2015 at 1:44 PM, Sekhar2, Sinu 
wrote:

> Hello Kafka Users
>
> I am new to Kafka and am evaluating it for using it for our high
> throughput messaging needs.
>
> I tested my producers using two different approaches.
> (a) metadata.broker.list=kafka1:9092,kafka2:9092,kafka3:9092
> And
> (b) metadata.broker.list=vip:9092
> Where vip is an ELB (elastic load balancer in AWS) pointing to kafka1,
> kafka2 and kafka3
>
> partitions=3,replication=3 for the topic that I am using.
>
> Both the approaches seem to work from a producer standpoint and I am able
> to consume these messages fine as well.
>
> My goal is to be able to hide the underlying hosts from the producer and
> be able to use a virtual endpoint (like the vip).
> But I am not sure how the clients will know which node is hosting a
> specific partition. If my number of partitions are less than the number of
> nodes, then there is a chance that the request can end up in a node that is
> not hosting the partition as master….what happens in that case?
>
> Has anyone implemented Kafka with a virtual endpoint using ELB, or some
> other load balancer like Nginx in AWS?
> What are the recommendations to hide the underlying node information from
> the kafka clients when it comes to producers OR even consumers?
>
> Any guidance will be extremely helpful!
>
> Cheers!
> Sinu
>


Re: Isr difference between Metadata Response vs /kafka-topics.sh --describe

2015-01-21 Thread Joe Stein
Sounds like you are bumping into this
https://issues.apache.org/jira/browse/KAFKA-1367

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Wed, Jan 21, 2015 at 10:10 AM, svante karlsson  wrote:

> We are running an external (as in non-supported) C++ client library
> against 0.8.2-rc2 and see differences in the Isr vector in the Metadata
> Response compared to what ./kafka-topics.sh --describe returns.
>
> We have a triple replicated topic that is not updated during the test.
>
> kafka-topics.sh
> returns
>
> Topic: saka.test.int_datastream Partition: 0Leader: 3
> Replicas: 3,1,2 Isr: 2,1,3
> Topic: saka.test.int_datastream Partition: 1Leader: 1
> Replicas: 1,2,3 Isr: 2,1,3
>
>
> After some debugging of the received packet it seems the data is actually
> missing from the server.
>
> After a sequential restart of each broker, everything was back to normal
>
> two pairs of loglines every 10s
>
> initial state:
>
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 1, 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 1, 3,
>
> restart broker 1
>
> handle_connect_retry_timer
> _connect_async_next z8r102-mc12-4-4.sth-tc2.videoplaza.net:9092
>
> saka.test.int_datastream Partition: 1 Leader: 2 Replicas: 1, 2, 3, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 2 Replicas: 1, 2, 3, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> ...
> saka.test.int_datastream Partition: 1 Leader: 2 Replicas: 1, 2, 3, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 2 Replicas: 1, 2, 3, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Leader: 1 Replicas: 1, 2, 3, Isr: 2,
> 3, 1,
> saka.test.int_datastream Partition: 0 Leader: 3 Replicas: 3, 1, 2, Isr: 2,
> 3,
> saka.test.int_datastream Partition: 1 Lead

Re: Can I run Kafka Producer in BG and then run a program that outputs to the console?

2015-01-20 Thread Joe Stein
This is a stdin/stdout producer/consumer that works great for what (I think)
you are trying to do: https://github.com/edenhill/kafkacat
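
On why step 2 below didn't work: backgrounding the console producer with &
doesn't connect your program's stdout to the producer's stdin, so nothing
ever reaches it. You have to pipe one into the other, e.g. (broker, topic
and program name are placeholders):

  java MyProducerApp | kafkacat -P -b localhost:9092 -t test

The same pipe works with bin/kafka-console-producer.sh in place of kafkacat.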

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Tue, Jan 20, 2015 at 3:44 PM, Su She  wrote:

> Hello Everyone,
>
> Sorry for asking multiple questions, but I am currently trying another
> approach to run a kafka producer.
>
> 1) I started the kafka console producer as mentioned here, in the
> background (just added a & to the end of the producer script):
> http://kafka.apache.org/documentation.html#introduction
>
> 2) I then ran a Java program that prints messages to the console, like
> Hello World, etc., but the messages did not get published (I know the
> connection was set up: if I manually typed in messages, I was able to see
> them in my consumer).
>
> 3) Is this set up possible?
>
> Thanks!
>
> -Su
>


Re: Issue size message

2015-01-19 Thread Joe Stein
If you increase the maximum message size for producing then you MUST also
change replica.fetch.max.bytes in the broker's server.properties, otherwise
none of your replicas will be able to fetch from the leader and they will
all fall out of the ISR. You also then need to change fetch.message.max.bytes
in your consumers' properties (wherever that is configured for the specific
consumer being used) so that they can read that data, otherwise you won't
see messages downstream.
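
For the 2MB case in this thread that would look something like (values are
illustrative; all three must be at least as large as your biggest message):

  # broker: server.properties
  message.max.bytes=2097152
  replica.fetch.max.bytes=2097152

  # consumer configuration
  fetch.message.max.bytes=2097152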

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Mon, Jan 19, 2015 at 1:03 PM, Magnus Edenhill  wrote:

> (duplicating the github answer for reference)
>
> Hi Eduardo,
>
> the default maximum fetch size is 1 Meg which means your 2 Meg messages
> will not fit the fetch request.
> Try increasing it by appending -X fetch.message.max.bytes=400 to your
> command line.
>
> Regards,
> Magnus
>
>
> 2015-01-19 17:52 GMT+01:00 Eduardo Costa Alfaia :
>
> > Hi All,
> I am having an issue when using kafka with librdkafka. I've changed
> message.max.bytes to 2MB in my server.properties config file, which is the
> size of my message. When I run the command line ./rdkafka_performance -C
> -t test -p 0 -b computer49:9092, after consuming some messages the consumer
> remains waiting for something that doesn't arrive. My producer continues
> sending messages. Any idea?
> >
> > % Using random seed 1421685059, verbosity level 1
> > % 214 messages and 1042835 bytes consumed in 20ms: 10518 msgs/s and 51.26
> > Mb/s, no compression
> > % 21788 messages and 106128192 bytes consumed in 1029ms: 21154 msgs/s and
> > 103.04 Mb/s, no compression
> > % 43151 messages and 210185259 bytes consumed in 2030ms: 21252 msgs/s and
> > 103.52 Mb/s, no compression
> > % 64512 messages and 314233575 bytes consumed in 3031ms: 21280 msgs/s and
> > 103.66 Mb/s, no compression
> > % 86088 messages and 419328692 bytes consumed in 4039ms: 21313 msgs/s and
> > 103.82 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 5719ms: 17571 msgs/s
> and
> > 85.67 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 6720ms: 14955 msgs/s
> and
> > 72.92 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 7720ms: 13018 msgs/s
> and
> > 63.47 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 8720ms: 11524 msgs/s
> and
> > 56.19 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 9720ms: 10339 msgs/s
> and
> > 50.41 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 10721ms: 9374 msgs/s
> and
> > 45.71 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 11721ms: 8574 msgs/s
> and
> > 41.81 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 12721ms: 7900 msgs/s
> and
> > 38.52 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 13721ms: 7324 msgs/s
> and
> > 35.71 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 14721ms: 6826 msgs/s
> and
> > 33.29 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 15722ms: 6392 msgs/s
> and
> > 31.17 Mb/s, no compression
> > % 100504 messages and 490022646 bytes consumed in 16722ms: 6010 msgs/s
> and
> > 29.30 Mb/s, no
> > 
> >
> >
> > When the software consumes all offsets it sends me the message:
> >
> > % Consumer reached end of unibs.nec [0] message queue at offset 229790
> > RD_KAFKA_RESP_ERR__PARTITION_EOF: [-191]
> >
> > However, having changed message.max.bytes to 2MB, I don't receive that
> > code from Kafka.
> >
> > Anyone has some idea?
> >
> > Thanks guys.
> > --
> > Informativa sulla Privacy: http://www.unibs.it/node/8155
> >
>


Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-18 Thread Joe Stein
It works OK in Gradle but fails if you're using Maven.

Taking a look at the patch you uploaded now.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Sun, Jan 18, 2015 at 8:59 PM, Jun Rao  wrote:

> There seems to be an issue with the pom file for kafka_2.11-0.8.2 jar. It
> references scala-library 2.11, which doesn't exist in maven central
> (2.11.1, etc do exist). This seems to be an issue in the 0.8.2 beta as
> well. I tried to reference kafka_2.11-0.8.2 beta in a project and the build
> failed because scala-library:jar:2.11 doesn't exist. Filed KAFKA-1876 as an
> 0.8.2 blocker. It would be great if people familiar with scala can take a
> look and see if this is a real issue.
>
> Thanks,
>
> Jun
>
> On Tue, Jan 13, 2015 at 7:16 PM, Jun Rao  wrote:
>
> > This is the first candidate for release of Apache Kafka 0.8.2.0. There
> > have been some changes since the 0.8.2 beta release, especially in the new
> > java producer api and jmx mbean names. It would be great if people can
> test
> > this out thoroughly. We are giving people 10 days for testing and voting.
> >
> > Release Notes for the 0.8.2.0 release
> > *
> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/RELEASE_NOTES.html
> > <
> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/RELEASE_NOTES.html
> >*
> >
> > *** Please download, test and vote by Friday, Jan 23rd, 7pm PT
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > *https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/KEYS
> > <https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/KEYS>* in
> > addition to the md5, sha1
> > and sha2 (SHA256) checksum.
> >
> > * Release artifacts to be voted upon (source and binary):
> > *https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/
> > <https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/>*
> >
> > * Maven artifacts to be voted upon prior to release:
> > *
> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/maven_staging/
> > <
> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/maven_staging/
> >*
> >
> > * scala-doc
> > *
> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/scaladoc/#package
> > <
> https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/scaladoc/#package
> >*
> >
> > * java-doc
> > *https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/javadoc/
> > <https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/javadoc/>*
> >
> > * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.0 tag
> > *
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=b0c7d579f8aeb5750573008040a42b7377a651d5
> > <
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=b0c7d579f8aeb5750573008040a42b7377a651d5
> >*
> >
> > /***
> >
> > Thanks,
> >
> > Jun
> >
>


Re: Consumer questions

2015-01-17 Thread Joe Stein
You can replay the messages with the high level consumer; you can even
start at whatever position you want.

Prior to your consumers starting, call

ZkUtils.maybeDeletePath(zkClientConnection, "/consumers/" + groupId)

Make sure you have this in your consumer properties:

auto.offset.reset="smallest"

This way you start at the beginning of the stream once the offsets are gone.
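
A minimal sketch of that sequence with the 0.8-era Scala consumer API (the
ZooKeeper connect string and group id below are placeholders):

  import java.util.Properties
  import kafka.consumer.{Consumer, ConsumerConfig}
  import kafka.utils.ZkUtils

  val zkConnect = "localhost:2181" // placeholder
  val groupId = "my-group"         // placeholder

  // wipe the group's stored offsets before the consumer starts
  ZkUtils.maybeDeletePath(zkConnect, "/consumers/" + groupId)

  val props = new Properties()
  props.put("zookeeper.connect", zkConnect)
  props.put("group.id", groupId)
  props.put("auto.offset.reset", "smallest") // start at the beginning once offsets are gone
  val consumer = Consumer.create(new ConsumerConfig(props))
  // ...create message streams and iterate as usual...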

If you have many consumer processes launching within your group you might
want to use a barrier (
http://zookeeper.apache.org/doc/r3.4.6/recipes.html#sc_recipes_eventHandles)
so that only one of the launching consumer processes does this... if you
have only one process, or can do the operation administratively, then there
is no need.

You can even trigger this while they are all running... more code to write,
but 100% doable (and it works well if you do it right): have them watch a
node, get the notification, stop what they are doing, hit the barrier, delete
the path (or change the value of the offset so you can start wherever you
want), and start again...

You can also just change the groupId to something brand new when you start
up with auto.offset.reset="smallest" in your properties; either way works. The
approach above leaves less lint in ZooKeeper long term.

It is all just 1s and 0s; it's just a matter of how much you put together
yourself vs. take out of the box as given to you =8^)

/***********
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Sat, Jan 17, 2015 at 2:11 PM, Manikumar Reddy 
wrote:

> AFAIK, we cannot replay the messages with the high level consumer. We need
> to use the simple consumer.
>
> On Sun, Jan 18, 2015 at 12:15 AM, Christopher Piggott 
> wrote:
>
> > Thanks.  That helped clear a lot up in my mind.
> >
> > I'm trying the high-level consumer now.  Occasionally I need to do a
> replay
> > of the stream.  The example is:
> >
> >KafkaStream.iterator();
> >
> > which starts wherever zookeeper recorded that you left off.
> >
> > With the high level interface, can you request an iterator that starts at
> > the very beginning?
> >
> >
> >
> > On Fri, Jan 16, 2015 at 8:55 PM, Manikumar Reddy 
> > wrote:
> >
> > > Hi,
> > >
> > > 1. In SimpleConsumer, you must keep track of the offsets in your
> > > application.
> > >In the example code,  "readOffset"  variable  can be saved in
> > > redis/zookeeper.
> > >You should plugin this logic in your code. High Level Consumer
> stores
> > > the last
> > >read offset information in ZooKeeper.
> > >
> > > 2. You will get OffsetOutOfRange for any invalid offset.
> > >On error, you can decide what to do. i.e read from the latest ,
> > earliest
> > > or some other offset.
> > >
> > > 3. https://issues.apache.org/jira/browse/KAFKA-1779
> > >
> > > 4. Yes
> > >
> > >
> > > Manikumar
> > >
> > > On Sat, Jan 17, 2015 at 2:25 AM, Christopher Piggott <
> cpigg...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am following this link:
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > > >
> > > > for a consumer design using kafka_2.9.2 version 0.8.1.1 (what I can
> > find
> > > in
> > > > maven central).  I have a couple of questions about the consumer.  I
> > > > checked the archives and didn't see these exact questions asked
> > already,
> > > > but I may have missed them -- I apologize if that is the case.
> > > >
> > > >
> > > > When I create a consumer I give it a consumer ID.  I assumed that it
> > > would
> > > > store my consumer's name as well as the last readOffset in zookeeper,
> > but
> > > > looking in zookeeper that doesn't seem to be the case.  So it seems
> to
> > me
> > > > that when my consumers come up they need to either get the entire
> > history
> > > > from the start of time (which could take a long time, as I have 14
> day
> > > > durability); or else they need to somehow keep track of the read
> offset
> > > > themselves.
> > > >
> > > > I have redis in my system already, so I have the choice of keeping
> > track
> > > o

Re: dumping JMX data

2015-01-16 Thread Joe Stein
Here are some more tools for that:
https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters. Depending
on what you have in place and what you are trying to do, different options
exist.

A lot of folks like JMX Trans.

My favorite quick out-of-the-box option is using
https://github.com/airbnb/kafka-statsd-metrics2 and sending to
https://github.com/kamon-io/docker-grafana-graphite; you can quickly chart
and see everything going on.

There are also software as a service options too.
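
On the original question, a hedged example of pulling one MBean with the
bundled JmxTool (this assumes JMX_PORT=9999 was exported before the broker
started; the object name is illustrative and its exact form varies across
Kafka versions):

  bin/kafka-run-class.sh kafka.tools.JmxTool \
    --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
    --object-name 'kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec' \
    --reporting-interval 5000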

/***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
/
On Jan 16, 2015 8:42 PM, "Scott Chapman"  wrote:

> I apologize in advance for a noob question; I'm just getting started with
> kafka, and trying to get JMX data from it.
>
> So, I had thought that running JMXTool with no arguments would dump all
> the data, but it doesn't seem to return.
>
> I do know that querying for a specific MBean name seems to work. But I was
> hoping to dump everything.
>
> I had a hard time finding any examples of using JMXTool, hoping someone
> with some experience might be able to point me in the right direction.
>
> Thanks in advance!
>


Syslog Producer to Kafka

2015-01-16 Thread Joe Stein
Hi, we just open sourced a syslog producer for Kafka:
https://github.com/stealthly/go_kafka_client/tree/master/syslog. This
server has a few features that folks might be interested in.

Besides producing your data to Kafka, you can also configure it (via the
command line) to associate metadata with the log data (as a protobuf) when
that happens. The purpose of this is deeper analytics downstream, e.g.
(--source i-59a059a8 --tag dc=ny9 --tag floor=2 --tag aisle=17 --tag rack=5
--tag u=7 --log.type.id 3), which would associate the metadata with the log
data at a persistent point in time... you can set source, tags, and
log.type.id to anything you want and parse them later downstream. A little
bit more is written up about that here:
http://allthingshadoop.com/2015/01/16/syslog-producer-for-apache-kafka/

It is easy to get started with the docker image, too:

*docker run --net=host stealthly/syslog --topic logs --broker.list
brokerHost:9092*
And that's it; now send your data to port 5140 (TCP) or 5141 (UDP), both
configurable, on the host you are running the docker image on.

We are in the process of rolling this out to clients, but wanted to get it
out there for anyone to try and give feedback on, so that as we continue to
stabilize the existing deployments we can build in more usage scenarios.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/


Re: [VOTE] 0.8.2.0 Candidate 1

2015-01-15 Thread Joe Stein
I think that is a change of behavior that organizations may get burned on.
Right now there is no delete data feature. If an operations team upgrades
to 0.8.2 and someone decides to delete a topic then there will be data
loss. The organization may not have wanted that to happen. I would argue
against having a way to delete data "by default". There is something
actionable about consciously turning on a feature that allows anyone with
access to kafka-topics (or zookeeper for that matter) to delete Kafka data.
If folks want that feature then they can flip the switch prior to the
upgrade (or after, with a rolling restart) and have at it. By not setting it
as the default, they will know they have to turn it on and figure out what
they need to do from a security perspective (until Kafka gives them that) to
protect their data (through network or other types of measures).
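
For clarity, the switch being discussed is the broker-side property in
server.properties, shown here with the opt-in value:

  delete.topic.enable=true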

On Thu, Jan 15, 2015 at 8:24 PM, Manikumar Reddy 
wrote:

> Also can we remove "delete.topic.enable" config property and enable topic
> deletion by default?
> On Jan 15, 2015 10:07 PM, "Jun Rao"  wrote:
>
> > Thanks for reporting this. I will remove that option in RC2.
> >
> > Jun
> >
> > On Thu, Jan 15, 2015 at 5:21 AM, Jaikiran Pai 
> > wrote:
> >
> > > I just downloaded the Kafka binary and am trying this on my 32-bit JVM
> > > (Java 7). Trying to start Zookeeper or the Kafka server keeps failing with
> > > "Unrecognized VM option 'UseCompressedOops'":
> > >
> > > ./zookeeper-server-start.sh ../config/zookeeper.properties
> > > Unrecognized VM option 'UseCompressedOops'
> > > Error: Could not create the Java Virtual Machine.
> > > Error: A fatal exception has occurred. Program will exit.
> > >
> > > Same with the Kafka server startup scripts. My Java version is:
> > >
> > > java version "1.7.0_71"
> > > Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
> > > Java HotSpot(TM) Server VM (build 24.71-b01, mixed mode)
> > >
> > > Should there be a check in the script, before adding this option?
> > >
> > > -Jaikiran
> > >
> > > On Wednesday 14 January 2015 10:08 PM, Jun Rao wrote:
> > >
> > >> + users mailing list. It would be great if people can test this out
> and
> > >> report any blocker issues.
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >> On Tue, Jan 13, 2015 at 7:16 PM, Jun Rao  wrote:
> > >>
> > >>  This is the first candidate for release of Apache Kafka 0.8.2.0.
> There
> > >>> have been some changes since the 0.8.2 beta release, especially in the
> > new
> > >>> java producer api and jmx mbean names. It would be great if people
> can
> > >>> test
> > >>> this out thoroughly. We are giving people 10 days for testing and
> > voting.
> > >>>
> > >>> Release Notes for the 0.8.2.0 release
> > >>> *https://people.apache.org/~junrao/kafka-0.8.2.0-
> > >>> candidate1/RELEASE_NOTES.html
> > >>>  > >>> candidate1/RELEASE_NOTES.html>*
> > >>>
> > >>> *** Please download, test and vote by Friday, Jan 23rd, 7pm PT
> > >>>
> > >>> Kafka's KEYS file containing PGP keys we use to sign the release:
> > >>> *https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/KEYS
> > >>> *
> in
> > >>> addition to the md5, sha1
> > >>> and sha2 (SHA256) checksum.
> > >>>
> > >>> * Release artifacts to be voted upon (source and binary):
> > >>> *https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/
> > >>> *
> > >>>
> > >>> * Maven artifacts to be voted upon prior to release:
> > >>> *https://people.apache.org/~junrao/kafka-0.8.2.0-
> > >>> candidate1/maven_staging/
> > >>>  > >>> candidate1/maven_staging/>*
> > >>>
> > >>> * scala-doc
> > >>> *https://people.apache.org/~junrao/kafka-0.8.2.0-
> > >>> candidate1/scaladoc/#package
> > >>>  > >>> candidate1/scaladoc/#package>*
> > >>>
> > >>> * java-doc
> > >>> *https://people.apache.org/~junrao/kafka-0.8.2.0-candidate1/javadoc/
> > >>>  >*
> > >>>
> > >>> * The tag to be voted upon (off the 0.8.2 branch) is the 0.8.2.0 tag
> > >>> *https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> > >>> b0c7d579f8aeb5750573008040a42b7377a651d5
> > >>>  > >>> b0c7d579f8aeb5750573008040a42b7377a651d5>*
> > >>>
> > >>> /***
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Jun
> > >>>
> > >>>
> > >
> >
>


Re: Best Go Client Library

2015-01-14 Thread Joe Stein
The Go Kafka Client https://github.com/stealthly/go_kafka_client is a
wrapper around Sarama https://github.com/Shopify/sarama that implements
high level consumer functionality (including automatic load balancing and
fail-over when consumer clients fail) and other features for producing and
consuming with Kafka through Sarama.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Wed, Jan 14, 2015 at 6:42 PM, Dieterly, Deklan 
wrote:

> I’m looking at implementing Kafka consumers in Go. From the Kafka wiki, I
> see that there are two clients available.
>
> 1. https://github.com/Shopify/sarama
> 2. https://github.com/stealthly/go_kafka_client
>
> Of the two which is more fully featured? Specifically, I’m looking for
> automatic load balancing and fail-over when consumer clients fail. I was
> disappointed with the Python client library (
> https://github.com/mumrah/kafka-python) that did not have this
> functionality built into it and I want to make sure that I don’t make
> another choice where I'll be disappointed. Are there any reasons to prefer
> one of the listed Go clients over the other? Are there any gotchas with
> either of the two client libraries?
>
> Thanks.
>
>
> From the wiki at
> https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-Go%28AKAgolang%29
> Go (AKA golang)
>
> Pure Go implementation with full protocol support. Consumer and Producer
> implementations included, GZIP and Snappy compression supported.
>
> Maintainer: Shopify<http://shopify.com>
> License: MIT
>
> https://github.com/Shopify/sarama
>
> Another pure Go implementation, built to work like the "Consumer Group"
> High Level Consumer, with more features including pluggable partition
> ownership and work fan-out.
>
> Maintainer: Big Data Open Source Security<
> https://www.linkedin.com/company/big-data-open-source-security-llc>
> License: Apache v2.0
>
> https://github.com/stealthly/go_kafka_client
>
>
> Regards.
> --
> Deklan Dieterly
> Hewlett-Packard Company
> Sr. Systems Software Engineer
> HP Cloud
>
>


Re: kafka cluster on aws

2015-01-14 Thread Joe Stein
We have an open source framework you can use to spin up Kafka clusters (any
version, or even any build you want), and Zookeeper, with CloudFormation on
AWS: https://github.com/stealthly/minotaur

It is very nice/handy: you basically specify your instance types, counts,
versions of code, etc. and hit a lab
(https://github.com/stealthly/minotaur/tree/master/labs/kafka), e.g.

./minotaur.py lab deploy kafka -e bdoss-dev -d testing -r us-east-1 -z
us-east-1a -k http://example.com/kafka.tar.gz -n 3 -i m1.small

There is some setup for the bastion host (
https://github.com/stealthly/minotaur/tree/master/infrastructure/aws/bastion)
and supervisor (https://github.com/stealthly/minotaur/tree/master/supervisor)
and after that it is really nice and easy.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Wed, Jan 14, 2015 at 2:54 PM, Joseph Lawson  wrote:

> We have a separate daemon process that assigns EIPs to servers when they
> startup in an autoscaling group based off of an autoscaling message.  So
> for a cluster of 3 we have 3 EIPs. Then we inject the EIPs into startup
> script for Kafka which checks to see if it has one of the EIPs and assigns
> itself the index of that IP so in the list:
> 10.0.0.1 10.0.0.2 10.0.0.3
>
> 1 is broker 0, 2 is broker 1 and 3 is broker 2.  All this is injected via
> cloudformation and then we have a mod value so if we want to spin brokers
> in the same group we do mod 1,2 and get brokers mod * 3 + index to
> determine which is in the group. (the EIPs are different as it is a
> different cloudformation)
>
> For redundancy make sure you run at least two that have full replicas of
> all other partitions.  We run replication factor of 3 with three instances
> so if any goes down the other two bring it back in sync once a fresh server
> spins in the autoscaling group.
>
> 
> From: Dillian Murphey 
> Sent: Wednesday, January 14, 2015 2:42 PM
> To: users@kafka.apache.org
> Subject: kafka cluster on aws
>
> I can't seem to find much information to help me (being green to kafka) on
> setting up a cluster on aws. Does anyone have any sources?
>
> The question I have off the bat is, what methods have already been explored
> to generate a unique broker id? If I spin up a new server, do I just need
> to maintain my own broker-id list somewhere so I don't re-use an already
> allocated broker id?
>
> Also, I read an article about a broker going down and requiring a new
> broker be spun up with the same id. Is this also something I need to
> implement?
>
> I want to set up a kafka auto-scaling group on AWS, so I can add brokers at
> will or based on load. It doesn't seem too complicated, or maybe I'm too
> green to see it, but I don't want to re-invent everything myself.
>
> I know Loggly uses AWS/Kafka, so I'm hunting for more details on that too.
>
> Thanks for any help
>


Re: metric-kafka problems

2015-01-09 Thread Joe Stein
Hi, https://github.com/stealthly/metrics-kafka is a project to be used as
an example of how to use Kafka as a central point to send all of the
metrics for your entire infrastructure. The consumers integrate so as to
abstract the load and coupling of services, so systems can just send their
stats to Kafka and then you can do whatever you want with them from there
(often multiple things). We also built a Yammer Metrics Reporter (which is
what Kafka uses to send its metrics) for Kafka itself, so brokers can send
their stats into a Kafka topic to be used downstream (typically in another
cluster). The issue you reported was caused by changes on github's side, and
I just pushed fixes for them, so things are working again.

If you are not looking for that type of solution and want to just see and
chart broker metrics, then I would suggest taking a look at
https://github.com/airbnb/kafka-statsd-metrics2 and pointing it to
https://github.com/kamon-io/docker-grafana-graphite. I find this a very
quick out-of-the-box way to see what is going on with a broker when no stats
reporter is already in place. If you want a Kafka metrics reporter just for
Graphite, check out https://github.com/damienclaveau/kafka-graphite; just
for Ganglia, https://github.com/criteo/kafka-ganglia; just for Riemann,
https://github.com/TheLadders/KafkaRiemannMetricsReporter. And/or you can
also use a service like SPM
https://apps.sematext.com/spm-reports/mainPage.do?selectedApplication=4293
or DataDog https://www.datadoghq.com/

Hope this helps, thanks!

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Fri, Jan 9, 2015 at 7:51 PM, Sa Li  wrote:

> Hello, all
>
> I'd like to use the metrics-kafka tool, which seems attractive for reporting
> kafka metrics and graphing them with graphite; however I am having trouble
> making it work.
>
> In https://github.com/stealthly/metrics-kafka, it says:
>
> In the main metrics-kafka folder
>
> 1) sudo ./bootstrap.sh 2) ./gradlew test 3) sudo ./shutdown.sh
> When I run ./bootstrap, see this is what I got
> root@DO-mq-dev:/home/stuser/jmx/metrics-kafka# ././bootstrap.sh
> /dev/stdin: line 1: syntax error near unexpected token `newline'
> /dev/stdin: line 1: `'
> /dev/stdin: line 1: syntax error near unexpected token `newline'
> /dev/stdin: line 1: `'
> e348a98a5afb8b89b94fce51b125e8a2045d9834268ec64c3e38cb7b165ef642
> 2015/01/09 16:49:21 Error response from daemon: Could not find entity for
> broker1
>
> And this is how I vagrant up:
> root@DO-mq-dev:/home/stuser/jmx/metrics-kafka# vagrant up
> /usr/share/vagrant/plugins/provisioners/docker/plugin.rb:13:in
> `require_relative':
> /usr/share/vagrant/plugins/provisioners/docker/config.rb:23: syntax error,
> unexpected tPOW (SyntaxError)
>   def run(name, **options)
>   ^
> /usr/share/vagrant/plugins/provisioners/docker/config.rb:43: syntax error,
> unexpected keyword_end, expecting $end
> from /usr/share/vagrant/plugins/provisioners/docker/plugin.rb:13:in
> `block in '
> from /usr/lib/ruby/vendor_ruby/vagrant/registry.rb:27:in `call'
> from /usr/lib/ruby/vendor_ruby/vagrant/registry.rb:27:in `get'
> from
> /usr/share/vagrant/plugins/kernel_v2/config/vm_provisioner.rb:34:in
> `initialize'
> from /usr/share/vagrant/plugins/kernel_v2/config/vm.rb:223:in `new'
> from /usr/share/vagrant/plugins/kernel_v2/config/vm.rb:223:in
> `provision'
> from /home/stuser/jmx/metrics-kafka/Vagrantfile:29:in `block (2
> levels) in '
> from /usr/lib/ruby/vendor_ruby/vagrant/config/v2/loader.rb:37:in
> `call'
> from /usr/lib/ruby/vendor_ruby/vagrant/config/v2/loader.rb:37:in
> `load'
> from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:104:in
> `block (2 levels) in load'
> from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:98:in
> `each'
> from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:98:in
> `block in load'
> from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:95:in
> `each'
> from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:95:in
> `load'
> from /usr/lib/ruby/vendor_ruby/vagrant/environment.rb:335:in
> `machine'
> from /usr/lib/ruby/vendor_ruby/vagrant/plugin/v2/command.rb:142:in
> `block in with_target_vms'
> from /usr/lib/ruby/vendor_ruby/vagrant/plugin/v2/command.rb:175:in
> `call'
> from /usr/lib/ruby/vendor_ruby/vagrant/plugin/v2/command.rb:175:in
> `block in with_target_vms'
>  

Re: kafka monitoring

2015-01-08 Thread Joe Stein
You need to export the JMX_PORT for Kafka to use before starting it up.
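
For example (9999 is an arbitrary port choice):

  export JMX_PORT=9999
  bin/kafka-server-start.sh config/server.properties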

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Thu, Jan 8, 2015 at 2:08 PM, Sa Li  wrote:

> Hello, All
>
> I understand many of you are using jmxtrans along with graphite/ganglia to
> pull out metrics, according to https://kafka.apache.org/081/ops.html,  it
> says "The easiest way to see the available metrics is to fire up jconsole
> and point it at a running kafka client or server; this will allow browsing
> all metrics with JMX..."
>
> I tried to fire up a jconsole on windows attempting to access our dev and
> production clusters, which are running fine;
> here is the main node of my dev:
> 10.100.75.128, broker port:9092, zk port:2181
>
> Jconsole shows:
>
>  New Connection
> Remote Process:
>
> Usage: <hostname>:<port> OR service:jmx:<protocol>:<sap>
> Username:Password:
>
> Sorry about my naivety; I tried to connect based on the above IP but just
> can't connect. Do I need to do something on the dev server to be able to
> make it work?
>
> thanks
>
> --
>
> Alec Li
>


Re: Consumer and offset management support in 0.8.2 and 0.9

2015-01-07 Thread Joe Stein
<< We need to take the versioning of the protocol seriously

amen

<< People are definitely using the offset commit functionality in 0.8.1

agreed

<< I really think we should treat this as a bug and revert the change to
version 0.

What do you mean exactly by revert? Why can't we use the version as a feature
flag and support 0 and 1 at the same time? In the handleOffsetFetch and
handleOffsetCommit functions that process the request messages, just do: if
version == 0, old functionality; else if version == 1, new functionality.
This way everyone works and nothing breaks =8^)
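
Schematically (this is a sketch of the idea, not the actual KafkaApis code;
both helper functions are hypothetical names for the two behaviors):

  def handleOffsetCommitRequest(request: OffsetCommitRequest) {
    if (request.versionId == 0)
      commitOffsetsToZooKeeper(request) // hypothetical: the 0.8.1 behavior
    else if (request.versionId == 1)
      commitOffsetsToKafkaLog(request)  // hypothetical: the new 0.8.2 behavior
  }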

/*******
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Wed, Jan 7, 2015 at 1:04 PM, Jay Kreps  wrote:

> Hey guys,
>
> We need to take the versioning of the protocol seriously. People are
> definitely using the offset commit functionality in 0.8.1 and I really
> think we should treat this as a bug and revert the change to version 0.
>
> -Jay
>
> On Wed, Jan 7, 2015 at 9:24 AM, Jun Rao  wrote:
>
> > Yes, we did make an incompatible change in OffsetCommitRequest in 0.8.2,
> > which is a mistake. The incompatible change was introduced in KAFKA-1012
> in
> > Mar, 2014 when we added the kafka-based offset management support.
> However,
> > we didn't realize that this breaks the wire protocol until much later.
> Now,
> > the wire protocol has evolved again and it's a bit hard to fix the format
> > in version 0. I can see a couple of options.
> >
> > Option 1: Just accept the incompatible change as it is.
> > The argument is that even though we introduced OffsetCommitRequest in
> > 0.8.1, it's not used in the high level consumer. It's possible that some
> > users of SimpleConsumer started using it. However, that number is likely
> > small. Also, the functionality of OffsetCommitRequest has changed since
> > it's writing the offset to a Kafka log, instead of ZK (for good reasons).
> > So, we can document this as a wire protocol and functionality
> incompatible
> > change. For users who don't mind the functionality change, they will need
> > to upgrade the client to the new protocol before they can use the new
> > broker. For users who want to preserve the old functionality, they will
> > have to write the offsets directly to ZK. In either case, hopefully the
> > number of people being affected is small.
> >
> > Option 2: Revert version 0 format to what's in 0.8.1.
> > There will be a few issues here. First, it's not clear how this affects
> > other people who have been deploying from trunk. Second, I am not sure
> that
> > we want to continue supporting writing the offset to ZK in
> > OffsetCommitRequest
> > since that can cause ZK to be overloaded.
> >
> > Joel Koshy,
> >
> > Any thoughts on this?
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Jan 5, 2015 at 11:39 PM, Joe Stein  wrote:
> >
> > > In addition to the issue you bring up, the functionality as a whole has
> > > changed.. when you call OffsetFetchRequest the version = 0 needs to
> > > preserve the old functionality
> > >
> > >
> >
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaApis.scala#L678-L700
> > > and version = 1 the new
> > >
> > >
> >
> https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/server/KafkaApis.scala#L153-L223
> > > .
> > > Also the OffsetFetchRequest functionality even though the wire protocol
> > is
> > > the same after the 0.8.2 upgrade for OffsetFetchRequest if you were
> using
> > > 0.8.1.1 OffsetFetchRequest
> > >
> > >
> >
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaApis.scala#L705-L728
> > > will stop going to zookeeper and start going to Kafka storage
> > >
> > >
> >
> https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/server/KafkaApis.scala#L504-L519
> > > so more errors will happen and things break too.
> > >
> > > I think we should treat the version field not just to stop from
> breaking
> > > the wire protocol calls but also as a "feature flag" preserving
> upgrades
> > > and multiple pathways.
> > >
> > > I updated the JIRA for the feature flag needs for OffsetFetch and
> > > OffsetCommit too.
> > >
> > > /***

Re: messages lost

2015-01-06 Thread Joe Stein
You should never store your log files in /tmp; please change that.

Ack = -1 is what you should be using if you want to guarantee messages are
saved. You should not be seeing high latencies (unless a few milliseconds
is high for you).

Are you using the sync or async producer? What version of Kafka? How are you
counting the data from the topic? How are you counting that you sent each
message and that it was successfully acked? How are you counting from the
topic, and have you verified the counts summed from each partition?

Can you share some sample code that reproduces this issue?

You can try counting the messages from each partition using
https://github.com/edenhill/kafkacat piped to wc -l; it makes for a nice
simple sanity check of where the problem might be.
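
For example (the broker, topic, and partition are placeholders; repeat per
partition and sum the counts):

  kafkacat -C -b broker1:9092 -t mytopic -p 0 -e | wc -l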

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Tue, Jan 6, 2015 at 7:21 PM, Sa Li  wrote:

> Hi, experts
>
> Again, we are still having the issue of losing data: we see 5000
> records sent, but only find 4500 records on the brokers. We did set
> required.acks=-1 to make sure all brokers ack, but that only causes long
> latency; it does not cure the data loss.
>
>
> thanks
>
>
> On Mon, Jan 5, 2015 at 9:55 AM, Xiaoyu Wang  wrote:
>
> > @Sa,
> >
> > the required.acks is producer side configuration. Set to -1 means
> requiring
> > ack from all brokers.
> >
> > On Fri, Jan 2, 2015 at 1:51 PM, Sa Li  wrote:
> >
> > > Thanks a lot, Tim, this is the config of brokers
> > >
> > > --
> > > broker.id=1
> > > port=9092
> > > host.name=10.100.70.128
> > > num.network.threads=4
> > > num.io.threads=8
> > > socket.send.buffer.bytes=1048576
> > > socket.receive.buffer.bytes=1048576
> > > socket.request.max.bytes=104857600
> > > auto.leader.rebalance.enable=true
> > > auto.create.topics.enable=true
> > > default.replication.factor=3
> > >
> > > log.dirs=/tmp/kafka-logs-1
> > > num.partitions=8
> > >
> > > log.flush.interval.messages=1
> > > log.flush.interval.ms=1000
> > > log.retention.hours=168
> > > log.segment.bytes=536870912
> > > log.cleanup.interval.mins=1
> > >
> > > zookeeper.connect=10.100.70.128:2181,10.100.70.28:2181,
> 10.100.70.29:2181
> > > zookeeper.connection.timeout.ms=100
> > >
> > > ---
> > >
> > >
> > > We actually played around with request.required.acks in the producer
> > > config: -1 causes long latency, and 1 is the setting that causes messages
> > > to be lost. But I am not sure if this is the reason we lose the records.
> > >
> > >
> > > thanks
> > >
> > > AL
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jan 2, 2015 at 9:59 AM, Timothy Chen 
> wrote:
> > >
> > > > What's your configured required.acks? And also are you waiting for
> all
> > > > your messages to be acknowledged as well?
> > > >
> > > > The new producer returns futures back, but you still need to wait for
> > > > the futures to complete.
> > > >
> > > > Tim
> > > >
> > > > On Fri, Jan 2, 2015 at 9:54 AM, Sa Li  wrote:
> > > > > Hi, all
> > > > >
> > > > > We are sending messages from a producer; we send 10 records, but we
> > > > > see only 99573 records for that topic. We confirm this by consuming
> > > > > the topic and checking the log size in the kafka web console.
> > > > >
> > > > > Any ideas for the message lost, what is the reason to cause this?
> > > > >
> > > > > thanks
> > > > >
> > > > > --
> > > > >
> > > > > Alec Li
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Alec Li
> > >
> >
>
>
>
> --
>
> Alec Li
>


Re: no space left error

2015-01-06 Thread Joe Stein
There are two parts to this:

1) How to prevent Kafka from filling up disks, which
https://issues.apache.org/jira/browse/KAFKA-1489 is trying to deal with (I
set the ticket to unassigned just now since I don't think anyone is working
on it and it was only assigned by default; could be wrong though, so assign
it back if I am wrong). I don't know the solution off the top of my head,
but I think it is something we should strive for in 0.8.3 (worst case 0.9.0),
as it happens frequently enough that we need a solution.

2) Until then, what to do when it does happen.

For #2 I have been pulled into the situation a number of times, and honestly
the solution has been a bit different each time; I am not sure any one "tool"
or guideline is going to work, since it is not always systematic... TBH...
but we could try to gather some more guidelines and experiences from
different folks. If someone already has this written up that would be great;
otherwise I can carve out some time in the near future and do that (though
honestly I would rather the effort go to #1; it is a balance for sure).

For Sa's problem (where this thread started) << OpenJDK 64-Bit Server VM
warning: Insufficient space for shared memory file:
   /tmp/hsperfdata_root/19721

I don't think this is Kafka related, though, so what we are talking about
with partitions and retention is not applicable, even though important.

/*******
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Tue, Jan 6, 2015 at 3:10 PM, David Birdsong 
wrote:

> I'm keen to hear about how to work one's way out of a filled partition
> since I've run into this many times after having tuned retention bytes or
> retention (time?) incorrectly. The proper path to resolving this isn't
> obvious based on my many harried searches through documentation.
>
> I often end up stopping the particular broker, picking an unlucky
> topic/partition, deleting, modifying the any topics that consumed too much
> space by lowering their retention bytes, and restarting.
>
> On Tue, Jan 6, 2015 at 12:02 PM, Sa Li  wrote:
>
> > Continue this issue, when I restart the server, like
> > bin/kafka-server-start.sh config/server.properties
> >
> > it will fails to start the server, like
> >
> > [2015-01-06 20:00:55,441] FATAL Fatal error during KafkaServerStable
> > startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> > java.lang.InternalError: a fault occurred in a recent unsafe memory
> access
> > operation in compiled Java code
> > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
> > at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
> > at
> > kafka.log.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:188)
> > at
> > kafka.log.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:165)
> > at
> > kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
> > at
> kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
> > at kafka.log.LogSegment.recover(LogSegment.scala:165)
> > at kafka.log.Log.recoverLog(Log.scala:179)
> > at kafka.log.Log.loadSegments(Log.scala:155)
> > at kafka.log.Log.(Log.scala:64)
> > at
> >
> >
> kafka.log.LogManager$$anonfun$loadLogs$1$$anonfun$apply$4.apply(LogManager.scala:118)
> > at
> >
> >
> kafka.log.LogManager$$anonfun$loadLogs$1$$anonfun$apply$4.apply(LogManager.scala:113)
> > at
> >
> >
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> > at
> > scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105)
> > at
> > kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:113)
> > at
> > kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
> > at
> >
> >
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> > at
> > scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
> > at kafka.log.LogManager.loadLogs(LogManager.scala:105)
> > at kafka.log.LogManager.(LogManager.scala:57)
> > at
> kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
> > at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
> > at
> > kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
> > at kafka.Kafka$.main(Kafka.scala:46)
> &

Re: Consumer and offset management support in 0.8.2 and 0.9

2015-01-05 Thread Joe Stein
In addition to the issue you bring up, the functionality as a whole has
changed... when you call OffsetFetchRequest, version = 0 needs to preserve
the old functionality
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaApis.scala#L678-L700
and version = 1 the new
https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/server/KafkaApis.scala#L153-L223.
Also, even though the OffsetFetchRequest wire protocol is the same, after
the 0.8.2 upgrade, if you were using the 0.8.1.1 OffsetFetchRequest
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaApis.scala#L705-L728
it will stop going to zookeeper and start going to Kafka storage
https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/server/KafkaApis.scala#L504-L519
so more errors will happen and things break too.

I think we should treat the version field not just as a guard against
breaking the wire protocol calls, but also as a "feature flag" preserving
upgrades and multiple pathways.

I updated the JIRA with the feature-flag needs for OffsetFetch and
OffsetCommit too.

/*******
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Mon, Jan 5, 2015 at 3:21 PM, Dana Powers  wrote:

> ok, opened KAFKA-1841 .  KAFKA-1634 also related.
>
> -Dana
>
> On Mon, Jan 5, 2015 at 10:55 AM, Gwen Shapira 
> wrote:
>
> > Ooh, I see what you mean - the OffsetAndMetadata (or PartitionData)
> > part of the Map changed, which will modify the wire protocol.
> >
> > This is actually not handled in the Java client either. It will send
> > the timestamp no matter which version is used.
> >
> > This looks like a bug and I'd even mark it as blocker for 0.8.2 since
> > it may prevent rolling upgrades.
> >
> > Are you opening the JIRA?
> >
> > Gwen
> >
> > On Mon, Jan 5, 2015 at 10:28 AM, Dana Powers  wrote:
> > > specifically comparing 0.8.1 --
> > >
> > >
> >
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/api/OffsetCommitRequest.scala#L37-L50
> > > ```
> > > (1 to partitionCount).map(_ => {
> > >   val partitionId = buffer.getInt
> > >   val offset = buffer.getLong
> > >   val metadata = readShortString(buffer)
> > >   (TopicAndPartition(topic, partitionId),
> OffsetMetadataAndError(offset,
> > > metadata))
> > > })
> > > ```
> > >
> > > totrunk --
> > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/api/OffsetCommitRequest.scala#L44-L69
> > > ```
> > > (1 to partitionCount).map(_ => {
> > >   val partitionId = buffer.getInt
> > >   val offset = buffer.getLong
> > >   val timestamp = {
> > > val given = buffer.getLong
> > > if (given == -1L) now else given
> > >   }
> > >   val metadata = readShortString(buffer)
> > >   (TopicAndPartition(topic, partitionId), OffsetAndMetadata(offset,
> > > metadata, timestamp))
> > > })
> > > ```
> > >
> > > should the `timestamp` buffer read be wrapped in an api version check?
> > >
> > >
> > > Dana Powers
> > > Rdio, Inc.
> > > dana.pow...@rd.io
> > > rdio.com/people/dpkp/
> > >
> > > On Mon, Jan 5, 2015 at 9:49 AM, Gwen Shapira 
> > wrote:
> > >
> > >> Ah, I see :)
> > >>
> > >> The readFrom function basically tries to read two extra fields if you
> > >> are on version 1:
> > >>
> > >> if (versionId == 1) {
> > >>   groupGenerationId = buffer.getInt
> > >>   consumerId = readShortString(buffer)
> > >> }
> > >>
> > >> The rest looks identical in version 0 and 1, and still no timestamp in
> > >> sight...
> > >>
> > >> Gwen
> > >>
> > >> On Mon, Jan 5, 2015 at 9:33 AM, Dana Powers 
> wrote:
> > >> > Hi Gwen, I am using/writing kafka-python to construct api requests
> and
> > >> have
> > >> > not dug too deeply into the server source code.  But I believe it is
> > >> > kafka/api/OffsetCommitRequest.scala and specifically the readFrom
> > method
> > >> > used to decode the wire protocol.
> > >> >
> > >> > -Dana
> > >> > OffsetCommitR

Re: Kafka getMetadata api

2015-01-02 Thread Joe Stein
You could do what you are asking with a custom encoder/decoder, so the
message bytes are made up of "messageType+AvroMessage". The message can be
whatever byte structure you want, not just the avro binary, i.e.
https://github.com/linkedin/camus/blob/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageEncoder.java
/
https://github.com/linkedin/camus/blob/master/camus-kafka-coders/src/main/java/com/linkedin/camus/etl/kafka/coders/KafkaAvroMessageDecoder.java
or you could even write the messageType as the key
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/message/MessageAndMetadata.scala#L30
and read that in your consumer.

I am not sure exactly what you are trying to solve, because if you have a
5MB message going over the wire and your consumer doesn't need that data,
then you have just wasted transport costs in your infrastructure... in which
case Jayesh's suggestion makes sense: the Kafka message becomes a pointer
used to query another system that supports reliable larger data/file
storage (lots of options here), which is also a typical type of deployment
in these scenarios.
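
A minimal sketch of the "messageType+AvroMessage" framing (these helpers are
hypothetical, not part of Kafka or Camus):

  // prepend a one-byte message type to the avro payload
  def encode(messageType: Byte, avroBytes: Array[Byte]): Array[Byte] =
    messageType +: avroBytes

  // peek at the type first; only deserialize the avro if you need it
  def decode(payload: Array[Byte]): (Byte, Array[Byte]) =
    (payload(0), payload.drop(1))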

/*******
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Fri, Jan 2, 2015 at 8:25 PM, Mukesh Jha  wrote:

> Indeed my message size varies b/w ~500kb to ~5mb per avro.
>
> I am using kafka as I need a scalable pub-sub messaging architecture with
> multiple producers and consumers and a guarantee of delivery.
> Keeping data on filesystem or hdfs won't give me that.
>
> Also, in the link below [1] there is a LinkedIn performance benchmark of
> kafka wrt message size, which shows that Kafka's throughput increases with
> messages of size ~100kb+.
>
> Agreed for kafka a record is key+value, I'm wondering if kafka can give us
> a way to sneak peek a records metadata via its key.
>
> [1]
>
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> On 3 Jan 2015 01:27, "Jayesh Thakrar"  wrote:
>
> > Just wondering Mukesh - the reason you want this feature is because your
> > value payload is not small (tens of kb). Don't know if that is the right
> > usage of kafka. It might be worthwhile to store the avro files in a
> > filesystem (regular, cluster fs, hdfs or even hbase) and the value in
> your
> > kafka message can be the reference or uri for the avro file.
> >
> > That way you make the best use of each system's features and strengths.
> >
> > Kafka does have api to get metadata - the topics, partitions and  primary
> > for partition etc. If we consider a key-value pair as a "record" than
> what
> > you are looking for is to get a part of the record (ie key only) and not
> > the whole record - so i would still consider that a data query/api.
> >
> >
>


Re: kafka-web-console error

2015-01-02 Thread Joe Stein
The kafka project doesn't have an official web console, so you might need to
open an issue on the github page of the web console project you are using,
as it may not be closing connections and may be using up all of your file
handles regardless of what you have set, etc. If you have the default
setting you may have to increase this value for the operating system, if it
is not a bug in the client you are using. You can give
http://www.cyberciti.biz/faq/howto-linux-get-list-of-open-files/ a try to
ascertain which is the problem, and/or do a ulimit -n on your machine and
see if the value is 1024, which is likely the default for your OS.
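
For example, to check and then raise the limit for the current shell
(persisting it usually means editing /etc/security/limits.conf; the value is
illustrative):

  ulimit -n       # prints the current limit, e.g. 1024
  ulimit -n 8192  # raise it for this shell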

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Fri, Jan 2, 2015 at 7:41 PM, Sa Li  wrote:

> Hi, all
>
> I am running kafka-web-console; I periodically get this error, and it takes
> the UI down:
>
>
>
> ! @6kldaf9lj - Internal server error, for (GET)
> [/assets/images/zookeeper_small.gif] ->
> play.api.Application$$anon$1: Execution exception[[FileNotFoundException:
>
> /vagrant/kafka-web-console-master/target/scala-2.10/classes/public/images/zookeeper_small.gif
> (Too many open files)]]
> at play.api.Application$class.handleError(Application.scala:293)
> ~[play_2.10-2.2.1.jar:2.2.1]
> at play.api.DefaultApplication.handleError(Application.scala:399)
> [play_2.10-2.2.1.jar:2.2.1]
> at
>
> play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$12$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:165)
> [play_2.10-2.2.1.jar:2.2.1]
> at
>
> play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$12$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:162)
> [play_2.10-2.2.1.jar:2.2.1]
> at
>
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
> [scala-library-2.10.2.jar:na]
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
> [scala-library-2.10.2.jar:na]
> Caused by: java.io.FileNotFoundException:
>
> /vagrant/kafka-web-console-master/target/scala-2.10/classes/public/images/zookeeper_small.gif
> (Too many open files)
> at java.io.FileInputStream.open(Native Method) ~[na:1.7.0_65]
> at java.io.FileInputStream.(FileInputStream.java:146)
> ~[na:1.7.0_65]
> at java.io.FileInputStream.(FileInputStream.java:101)
> ~[na:1.7.0_65]
> at
>
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
> ~[na:1.7.0_65]
> at
>
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
> ~[na:1.7.0_65]
> at java.net.URL.openStream(URL.java:1037) ~[na:1.7.0_65]
>
> [debug] application - Getting partition offsets for topic ui_test_topic_6
> [warn] application - Could not connect to partition leader
> exemplary-birds.master:9092. Error message: Failed to open a socket.
> [debug] application - Getting partition offsets for topic ui_test_topic_5
> [warn] application - Could not connect to partition leader
> exemplary-birds.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> harmful-jar.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> voluminous-mass.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> voluminous-mass.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> exemplary-birds.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> exemplary-birds.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> harmful-jar.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> harmful-jar.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> voluminous-mass.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> voluminous-mass.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> exemplary-birds.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to partition leader
> harmful-jar.master:9092. Error message: Failed to open a socket.
> [warn] application - Could not connect to p

Re: kafka logs gone after reboot the server

2015-01-02 Thread Joe Stein
Either should be fine, but it all depends on your environment and how you
want to operate your cluster in the long run; i.e. if you have multiple
volumes then you need to list them each in log.dirs separated by commas,
etc. If there is no good reason to have the broker.id in the directory name
then you should probably keep it consistent on each node.
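
For example, a multi-volume layout in server.properties (the paths are
illustrative):

  log.dirs=/data/disk1/kafka-logs,/data/disk2/kafka-logs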

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Fri, Jan 2, 2015 at 5:26 PM, Sa Li  wrote:

> One more question: when I set log.dirs on different nodes in the cluster,
> should I give them different names, say kafka-logs-1 associated with the
> broker id, or can I set the same directory name, like /var/log/kafka, for
> every node (assuming one broker on each server)?
>
> thanks
>
>
> On Fri, Jan 2, 2015 at 2:20 PM, Sa Li  wrote:
>
> > Thanks a lot!
> >
> >
> > On Fri, Jan 2, 2015 at 12:15 PM, Jay Kreps  wrote:
> >
> >> Nice catch Joe--several people have complained about this as a problem
> and
> >> we were a bit mystified as to what kind of bug could lead to all their
> >> logs
> >> getting deleted and re-replicated when they bounced the server. We
> assumed
> >> "bounced" meant restarted the app, but I think likely what is happening
> is
> >> what you describe--the logs were in /tmp and bouncing the server meant
> >> restarting.
> >>
> >> -Jay
> >>
> >> On Fri, Jan 2, 2015 at 11:02 AM, Joe Stein 
> wrote:
> >>
> >> > That is because your logs are in /tmp which you can change by
> >> > setting log.dirs to something else.
> >> >
> >> > /***
> >> >  Joe Stein
> >> >  Founder, Principal Consultant
> >> >  Big Data Open Source Security LLC
> >> >  http://www.stealth.ly
> >> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> >> > /
> >> >
> >> > On Fri, Jan 2, 2015 at 1:58 PM, Sa Li  wrote:
> >> >
> >> > > Hi, All
> >> > >
> >> > > I've just notice one thing, when I am experiencing some errors in
> >> Kafka
> >> > > servers, I reboot the dev servers (not a good way), after reboot, I
> >> get
> >> > > into zkCli, I can see all the topics still exist. But when I get
> into
> >> > kafka
> >> > > log directory, I found all data gone, see
> >> > >
> >> > > root@DO-mq-dev:/tmp/kafka-logs-1/ui_test_topic_4-0# ll
> >> > > total 8
> >> > > drwxr-xr-x  2 root root 4096 Jan  2 09:39 ./
> >> > > drwxr-xr-x 46 root root 4096 Jan  2 10:46 ../
> >> > > -rw-r--r--  1 root root 10485760 Jan  2 09:39
> >> .index
> >> > > -rw-r--r--  1 root root0 Jan  2 09:39
> .log
> >> > >
> >> > > I wonder, if for some reasons, the server down, and restart it, all
> >> the
> >> > > data in hard drive will be gone?
> >> > >
> >> > > thanks
> >> > >
> >> > > --
> >> > >
> >> > > Alec Li
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> >
> > Alec Li
> >
>
>
>
> --
>
> Alec Li
>


Re: kafka logs gone after reboot the server

2015-01-02 Thread Joe Stein
That is because your logs are in /tmp, which you can change by setting
log.dirs to something else.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Fri, Jan 2, 2015 at 1:58 PM, Sa Li  wrote:

> Hi, All
>
> I've just noticed one thing: when I was experiencing some errors on the
> Kafka servers, I rebooted the dev servers (not a good way). After the reboot
> I got into zkCli and I can see all the topics still exist. But when I get
> into the kafka log directory, I found all the data gone, see:
>
> root@DO-mq-dev:/tmp/kafka-logs-1/ui_test_topic_4-0# ll
> total 8
> drwxr-xr-x  2 root root 4096 Jan  2 09:39 ./
> drwxr-xr-x 46 root root 4096 Jan  2 10:46 ../
> -rw-r--r--  1 root root 10485760 Jan  2 09:39 .index
> -rw-r--r--  1 root root0 Jan  2 09:39 .log
>
> I wonder, if for some reason the server goes down and is restarted, will
> all the data on the hard drive be gone?
>
> thanks
>
> --
>
> Alec Li
>


Re: Kafka getMetadata api

2015-01-02 Thread Joe Stein
I think partitioning is best left to the semantics of the message (i.e.
userId, customerId, etc.) and not the type of message. If your consumers
only need specific message types then separate the message types by topic.
This will make the consumers that don't need those message types work
better, since they won't have to ignore them and can focus on what they are
built for. If you have consumers that need multiple message types that are
now spread across topics, then those consumers should consume from multiple
topics, i.e.
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/consumer/ConsoleConsumer.scala#L196
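
A minimal sketch of keying on message semantics with the 0.8 Scala producer
(the broker list, topic name, and values are placeholders):

  import java.util.Properties
  import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

  val props = new Properties()
  props.put("metadata.broker.list", "broker1:9092") // placeholder
  props.put("serializer.class", "kafka.serializer.StringEncoder")
  val producer = new Producer[String, String](new ProducerConfig(props))

  // key on userId so all of a user's messages land in the same partition
  val userId = "user-42" // placeholder
  producer.send(new KeyedMessage[String, String]("user-events", userId, "payload"))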

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Fri, Jan 2, 2015 at 12:34 PM, Manikumar Reddy 
wrote:

> Hi,
>
> One option is to partition the data using key and consume from relevant
> partition.
> Or your current approach (filtering messages in the application) should be
> OK.
>
> Using separate getMetaData/getkey and getMessage may hit the consumer
> performance/throughput.
>
>
> Regards,
> Kumar
>
> On Fri, Jan 2, 2015 at 9:53 PM, Mukesh Jha 
> wrote:
>
> > Any pointers guys?
> > On 1 Jan 2015 15:26, "Mukesh Jha"  wrote:
> >
> > > Hello Experts,
> > >
> > > I'm using a kafka topic to store bunch of messages where the key
> contains
> > > metadata and value is the data (avro file in our case).
> > > There are multiple consumers for each topic and the consumer can decide
> > if
> > > the message is relevant for it or not based on the metadata i.e. the
> key
> > of
> > > the message.
> > > Using this a group-consumer can check the key if the message is
> required
> > > by it and then it can retrieve the entire message otherwise it'll just
> > > commit the offset and move on to the next message.
> > >
> > > So I was wondering if kafka has an api in kafka that lets the consumer
> to
> > > get just the metadata i.e. key, something like getMetadata instead
> > > of getMessageAndMetadata?
> > >
> > > If kafka has something like this can you help me out with some
> > > documentation for the same?
> > > I think this will be useful in a lot of scenarios so if its not there I
> > > can file a JIRA and take a dig at it. Let me know what you all think.
> > >
> > > Thanks for your help & suggestions.
> > >
> > > --
> > > Thanks & Regards,
> > >
> > > *Mukesh Jha *
> > >
> >
>


Re: under replicated topics

2014-12-30 Thread Joe Stein
It sounds like you are looking for FetcherLagMetrics; see
https://kafka.apache.org/documentation.html#monitoring. If you find your ISR
shrink/growth rate flapping, you can increase the max lag setting
replica.lag.max.messages (the default is 4000) depending on where you see
the lag often hovering... bursty traffic (especially with batched messages)
can upset this, yup.
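
For example, in server.properties on the brokers (the value is illustrative;
pick it based on where you see the lag hovering):

  replica.lag.max.messages=10000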

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Tue, Dec 30, 2014 at 5:42 PM,  wrote:

> I am still trying to find a way to detect how far behind a replica is,
> nicely, so I can differentiate between "10 offsets" and "1 offsets"
> behind. This would help with problems like this one, as we often have
> replicas that are just slightly behind, due to Bursty traffic, but the ones
> that get more and more behind, those are the ones we need to get called for.
>
> T.
>
> Sent from my BlackBerry 10 smartphone on the TELUS network.
>   Original Message
> From: Gene Robichaux
> Sent: Tuesday, December 30, 2014 4:15 PM
> To: users@kafka.apache.org
> Reply To: users@kafka.apache.org
> Subject: RE: under replicated topics
>
> Thanks for the response. We are using python to grab the JMX values and
> stuff them into graphite. We noticed on some graphs that we had a server
> with 2 underreplicated partitions. The restart fixed it.
>
> Gene Robichaux
> Manager, Database Operations
> Match.com
> 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
>
> -Original Message-
> From: t...@borked.ca [mailto:t...@borked.ca]
> Sent: Tuesday, December 30, 2014 2:59 PM
> To: Gene Robichaux; users@kafka.apache.org
> Subject: Re: under replicated topics
>
> We have seen cases with 0.8.1 when, under load, replication threads would
> hang up and not transfer data any longer. Restarting clears this.
>
> I haven't found a way to monitor for this in a nice way, other than seeing
> partitions stay under-replicated for long periods of time.
>
> Sent from my BlackBerry 10 smartphone on the TELUS network.
>   Original Message
> From: Gene Robichaux
> Sent: Tuesday, December 30, 2014 2:43 PM
> To: users@kafka.apache.org
> Reply To: users@kafka.apache.org
> Subject: RE: under replicated topics
>
> We restarted the Kafka brokers this morning. That fixed the issue.
>
> Gene Robichaux
> Manager, Database Operations
> Match.com
> 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
>
> -Original Message-
> From: Allen Wang [mailto:aw...@netflix.com.INVALID]
> Sent: Tuesday, December 30, 2014 1:38 PM
> To: users@kafka.apache.org
> Subject: Re: under replicated topics
>
> Brokers may have temporary problems catching up with the leaders. So I
> would not worry about it if it happens only once a while and goes away.
>
> Occasionally we have seen under replicated topics for long time, which
> might be caused by ZooKeeper session problem as indicated by such log
> messages:
>
> [info] I wrote this conflicted ephemeral node [{"jmx_port":-1,"timestamp":"
> 1418688028517","host":"ec2-54-75-33-226.eu-west-1.compute.amazonaws.com
> ","version":1,"port":7101}] at /brokers/ids/0 a while back in a different
> session, hence I will backoff for this node to be deleted by Zookeeper and
> retry
>
> Typically restarting the Kafka process on such brokers will fix the
> problem.
>
>
>
> On Mon, Dec 29, 2014 at 12:49 PM, Gene Robichaux  >
> wrote:
>
> > My team is new to Kafka so any help is appreciated.
> >
> > We have a situation where we have 3 under replicated topics. What is
> > the best way to correct this?
> >
> > # bin/kafka-topics.sh --describe --under-replicated-partitions
> > --zookeeper
> > ServerName:2181
> > Topic: DA_DbExceptionLog Partition: 8 Leader: 2
> > Replicas: 3,2,4 Isr: 2,3
> > Topic: DA_DebugLog Partition: 12 Leader: 1 Replicas:
> > 3,1,2 Isr: 1,2
> > Topic: EC_Interaction Partition: 8 Leader: 2 Replicas:
> > 3,2,4 Isr: 2,3
> >
> > Gene Robichaux
> > Manager, Database Operations
> > Match.com
> > 8300 Douglas Avenue I Suite 800 I Dallas, TX 75225
> >
> >
>


Re: csv files to kafka

2014-12-28 Thread Joe Stein
In that project the protobuf
https://github.com/stealthly/f2k/blob/master/src/main/proto/FileType.proto
is compiled/generated in the gradle task using the plugin
https://github.com/stealthly/f2k/blob/master/build.gradle#L17

You need to have protobuf https://github.com/google/protobuf/ installed and
have it setup for your environment appropriately (see protoBuf {} gradle
task in build.gradle
https://github.com/stealthly/f2k/blob/master/build.gradle#L19-L24 )

I left the issue you created on github open so we can get around to
updating the README and adding a vagrant up implementation, so it works
right from the repo without others having to do much locally. That way they
won't bump into this and have to work around it by installing protobuf
themselves.
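
In other words, the build wants protoc on the path the plugin expects
(/usr/bin/protoc in your trace) and gradlew run from the repo root, roughly:

# which protoc
# cd f2k
# ./gradlew clean fatJar

Running ./f2k/gradlew from /root fails with "Task 'clean' not found" because
gradle resolves the project from the current directory, not from where the
wrapper script lives.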

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Sun, Dec 28, 2014 at 2:56 AM, kishore kumar 
wrote:

> Hi Joe,
>
> I am cloned this repo with
>
> git clone https://github.com/stealthly/f2k.git
>
> cd f2k
>
> when I run .
> #./gradlew clean fatJar
>
> the error is "Execution failed for task ':compileProto'."
>
> when I run with --debug option it gives the info
>
> Gradle user home: /root/.gradle
> Current dir: /root/f2k
> java.io.IOException: Cannot run program "/usr/bin/protoc" (in directory
> "/root/f2k"): error=2, No such file or directory
>
> but when I run from /root
>
> ./f2k/gradlew clean fatJar --stacktrace
>
> the error is "Task 'clean' not found in root project 'root'."
>
> Please help me how to build this.
>
> Thanks.
>
> On Sun, Dec 14, 2014 at 12:53 PM, Joe Stein  wrote:
>
> > Here is a non-production example of a file load to Kafka using Scala
> >
> >
> https://github.com/stealthly/f2k/blob/master/src/main/scala/ly/stealth/f2k/KafkaUploader.scala
> >
> >
> > If you wanted to convert that to javacode the Scala compiler does that
> for
> > you :) and makes it importable as jar to your java program, or you can
> use
> > as reference and rewrite it in Java.
> >
> > /***
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > /
> >
> > On Sun, Dec 14, 2014 at 1:39 AM, kishore kumar 
> > wrote:
> > >
> > > HI Experts,
> > >
> > > I want to load csv files to kafka, anybody help me to write javacode
> for
> > > this?
> > >
> > > --
> > > Thanks,
> > > Kishore.
> > >
> >
>
>
>
> --
> Thanks,
> Kishore.
>


Re: Kafka in C#

2014-12-22 Thread Joe Stein
Thunder, can you add that to
https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-.net
didn't know it existed but cool that it uses Rx

On Mon, Dec 22, 2014 at 12:52 PM, Thunder Stumpges 
wrote:

> Hi there,
>
> We looked at both of these a while back and ended up writing our own (
> https://github.com/ntent-ad/kafka4net).
>
> The first one in the list was completely synchronous, and had no concept
> of batching. We initially attempted to use the second one (kafka-net) but
> had some issues with detecting leader changes, temporary errors, and
> general topic metadata changes. Also had some problems with the use of
> async and the tracking of correlation IDs on tcp messages.
>
> I know our client does not have a lot of documentation (yet) and says
> "Work in progress, not ready yet!" but we have been testing this a lot, and
> have been running in staging under load with good results. We will be
> entering production in the next few weeks (waiting until after the
> holidays). I wouldn't be comfortable with it going straight to production
> without some testing in your environment, but especially the Producer is
> very robust, and is resilient to all sorts of changes in the environment
> (the integration tests use Vagrant and a set of VMs and test Producing to
> non-existent partition and waiting for auto-creation of topic, partition
> rebalancing, broker down or other partition re-assignment, etc.) The client
> is fully Async and leverages Rx (https://rx.codeplex.com/) and an
> event-loop-scheduler to do all processing. Changes in partition state are
> broadcast Rx style to listening components (Producer, Consumer,
> PartitionRecoveryMonitor).
>
> One of the reasons we have not finalized the documentation and notice on
> the github page is we weren't sure if the API might change based on usage.
> To this end, we'd like to know what you think and if you have any use-cases
> not handled by the API.
>
> I understand if you're not comfortable with the beta state of the client,
> but we'd love to have you check it out. We are actively developing on this
> and can help with any issues.
>
> Thanks,
> Thunder
>
>
> -Original Message-
> From: Matti Waarna [mailto:mwaa...@sapient.com]
> Sent: Monday, December 22, 2014 7:55 AM
> To: users@kafka.apache.org
> Subject: Kafka in C#
>
> We are using kafka version 0.8.1.1 and trying to produce from a C# app.
>
> I realize that there is no official C# library release and want to get
> your experience with the existing solutions that are currently available.
>
> I am looking for a solution that is a) stable enough for production
> environment and b) performs well.
>
> 1) A couple of active github projects are available along with a few forks
> each.
> Has Anybody worked on either of the following two options to contribute
> their findings?
> https://github.com/Jroland/kafka-net
>
> https://github.com/miknil/Kafka4n
>
> 2) Also there is the option of IKVM to import kafka JARS into a .net DLL.
>
> Maybe even another solution?
>
> Thanks
>
> MATTI
>
>


Re: Kafka in C#

2014-12-22 Thread Joe Stein
Another option is an HTTP wrapper around the actual producer, doing an
HTTP POST from C# to a REST server, e.g.
https://github.com/stealthly/dropwizard-kafka-http which I know folks have
used successfully.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Mon, Dec 22, 2014 at 10:55 AM, Matti Waarna  wrote:

> We are using kafka version 0.8.1.1 and trying to produce from a C# app.
>
> I realize that there is no official C# library release and want to get
> your experience with the existing solutions that are currently available.
>
> I am looking for a solution that is a) stable enough for production
> environment and b) performs well.
>
> 1) A couple of active github projects are available along with a few forks
> each.
> Has Anybody worked on either of the following two options to contribute
> their findings?
> https://github.com/Jroland/kafka-net
>
> https://github.com/miknil/Kafka4n
>
> 2) Also there is the option of IKVM to import kafka JARS into a .net DLL.
>
> Maybe even another solution?
>
> Thanks
>
> MATTI
>
>


Re: Max. storage for Kafka and impact

2014-12-19 Thread Joe Stein
Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000
partitions? I think you can take what I said below and change my 250 to 25
as I went with your result (1,000,000) and not your arguments (2,000 x 50).
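
To spell the arithmetic out with the ~4,000 partitions per broker rule of
thumb from below: 2,000 topics x 50 partitions = 100,000 partitions, and
100,000 / 4,000 = 25 brokers (1,000,000 / 4,000 is where the 250 came from).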

And you should think of the processing as a separate step from the fetch,
and commit your offsets in batch after processing. Then you only need more
partitions when you need to fetch batches in parallel.

Regards, Joestein

On Fri, Dec 19, 2014 at 12:01 PM, Joe Stein  wrote:
>
> see some comments inline
>
> On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
> achanta.va...@flipkart.com> wrote:
>>
>> We require:
>> - many topics
>> - ordering of messages for every topic
>>
>
> Ordering is only on a per partition basis so you might have to pick a
> partition key that makes sense for what you are doing.
>
>
>> - Consumers hit different Http EndPoints which may be slow (in a push
>> model). In case of a Pull model, consumers may pull at the rate at which
>> they can process.
>> - We need parallelism to hit with as many consumers. Hence, we currently
>> have around 50 consumers/topic => 50 partitions.
>>
>
> I think you might be mixing up the fetch with the processing. You can have
> 1 partition and still have 50 messages being processed in parallel (so a
> batch of messages).
>
> What language are you working in? How are you doing this processing
> exactly?
>
>
>>
>> Currently we have:
>> 2000 topics x 50 => 1,00,000 partitions.
>>
>
> If this is really the case then you are going to need at least 250 brokers
> (~ 4,000 partitions per broker).
>
> If you do that then you are in the 200TB per day world, which doesn't
> sound like it is the case.
>
> I really think you need to strategize on your processing model some more.
>
>
>>
>> The incoming rate of ingestion at max is 100 MB/sec. We are planning for a
>> big cluster with many brokers.
>
>
> It is possible to handle this on just 3 brokers depending on message size;
> ability to batch and durability are also factors you really need to be
> thinking about.
>
>
>>
>> We have exactly the same use cases as mentioned in this video (usage at
>> LinkedIn):
>> https://www.youtube.com/watch?v=19DvtEC0EbQ​
>>
>> ​To handle the zookeeper scenario, as mentioned in the above video, we are
>> planning to use SSDs​ and would upgrade to the new consumer (0.9+) once
>> its
>> available as per the below video.
>> https://www.youtube.com/watch?v=7TZiN521FQA
>>
>> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar
>> > > wrote:
>>
>> > Technically/conceptually it is possible to have 200,000 topics, but do
>> > you really need it like that? What do you intend to do with those
>> > messages - i.e. how do you foresee them being processed downstream? And
>> > are those topics really there to segregate different kinds of processing
>> > or different ids? E.g. if you were LinkedIn, Facebook or Google, would
>> > you have one topic per user or one topic per kind of event (e.g. login,
>> > pageview, adview, etc.)? Remember there is significant book-keeping done
>> > within Zookeeper - and these many topics will make that book-keeping
>> > significant. As for storage, I don't think it should be an issue with
>> > sufficient spindles, servers and higher than default memory
>> > configuration.
>> > Jayesh
>> >   From: Achanta Vamsi Subhash 
>> >  To: "users@kafka.apache.org" 
>> >  Sent: Friday, December 19, 2014 9:00 AM
>> >  Subject: Re: Max. storage for Kafka and impact
>> >
>> > Yes. We need those many max partitions as we have a central messaging
>> > service and thousands of topics.
>> >
>> > On Friday, December 19, 2014, nitin sharma > >
>> > wrote:
>> >
>> > > hi,
>> > >
>> > > Few things you have to plan for:
>> > > a. Ensure that from resilience point of view, you are having
>> sufficient
>> > > follower brokers for your partitions.
>> > > b. In my testing of kafka (50TB/week) so far, haven't seen much issue
>> > with
>> > > CPU utilization or memory. I had 24 CPU and 32GB RAM.
>> > > c. 200,000 partitions means around 1MB/week/partition. are you sure
>> you
>> > > need so many partitions?
>> > >
>> > > Regards,
>> > > Nitin Kumar Sharma.
>> > >
>> > >
>> > > On Fri, Dec 19, 2014 at 9:12 AM, Achant

Re: Max. storage for Kafka and impact

2014-12-19 Thread Joe Stein
see some comments inline

On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:
>
> We require:
> - many topics
> - ordering of messages for every topic
>

Ordering is only on a per partition basis so you might have to pick a
partition key that makes sense for what you are doing.
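
For example, with the 0.8 Scala producer, keying every message for a given
entity keeps that entity's messages in order within one partition (the
broker, topic and key are made-up placeholders):

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

object KeyedSend extends App {
  val props = new Properties()
  props.put("metadata.broker.list", "broker1:9092") // hypothetical broker
  props.put("serializer.class", "kafka.serializer.StringEncoder")
  val producer = new Producer[String, String](new ProducerConfig(props))
  // both messages hash on key "order-42" to the same partition, so they stay ordered
  producer.send(new KeyedMessage[String, String]("orders", "order-42", "created"))
  producer.send(new KeyedMessage[String, String]("orders", "order-42", "paid"))
  producer.close()
}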


> - Consumers hit different Http EndPoints which may be slow (in a push
> model). In case of a Pull model, consumers may pull at the rate at which
> they can process.
> - We need parallelism to hit with as many consumers. Hence, we currently
> have around 50 consumers/topic => 50 partitions.
>

I think you might be mixing up the fetch with the processing. You can have
1 partition and still have 50 messages being processed in parallel (so a
batch of messages).

What language are you working in? How are you doing this processing
exactly?
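
A rough sketch of that split (one fetch stream feeding a 50-thread pool;
the hosts, group, topic and handle() body are made up, and for simplicity
this keeps auto-committed offsets, so a crash can drop in-flight messages):

import java.util.Properties
import java.util.concurrent.Executors
import kafka.consumer.{Consumer, ConsumerConfig}

object ParallelProcessing extends App {
  val props = new Properties()
  props.put("zookeeper.connect", "zk1:2181") // hypothetical ZooKeeper host
  props.put("group.id", "endpoint-pushers")
  val connector = Consumer.create(new ConsumerConfig(props))
  // a single stream on a single-partition topic...
  val stream = connector.createMessageStreams(Map("events" -> 1))("events").head
  val pool = Executors.newFixedThreadPool(50)
  // ...fanned out to 50 worker threads for the slow HTTP calls
  for (msgAndMeta <- stream) {
    val payload = msgAndMeta.message()
    pool.submit(new Runnable { def run(): Unit = handle(payload) })
  }
  def handle(bytes: Array[Byte]): Unit = {
    // hit the slow HTTP endpoint here
  }
}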


>
> Currently we have:
> 2000 topics x 50 => 1,00,000 partitions.
>

If this is really the case then you are going to need at least 250 brokers
(~ 4,000 partitions per broker).

If you do that then you are in the 200TB per day world, which doesn't
sound like it is the case.

I really think you need to strategize on your processing model some more.


>
> The incoming rate of ingestion at max is 100 MB/sec. We are planning for a
> big cluster with many brokers.


It is possible to handle this on just 3 brokers depending on message size;
ability to batch and durability are also factors you really need to be
thinking about.


>
> We have exactly the same use cases as mentioned in this video (usage at
> LinkedIn):
> https://www.youtube.com/watch?v=19DvtEC0EbQ​
>
> ​To handle the zookeeper scenario, as mentioned in the above video, we are
> planning to use SSDs​ and would upgrade to the new consumer (0.9+) once its
> available as per the below video.
> https://www.youtube.com/watch?v=7TZiN521FQA
>
> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar
>  > wrote:
>
> > Technically/conceptually it is possible to have 200,000 topics, but do
> > you really need it like that? What do you intend to do with those
> > messages - i.e. how do you foresee them being processed downstream? And
> > are those topics really there to segregate different kinds of processing
> > or different ids? E.g. if you were LinkedIn, Facebook or Google, would
> > you have one topic per user or one topic per kind of event (e.g. login,
> > pageview, adview, etc.)? Remember there is significant book-keeping done
> > within Zookeeper - and these many topics will make that book-keeping
> > significant. As for storage, I don't think it should be an issue with
> > sufficient spindles, servers and higher than default memory
> > configuration.
> > Jayesh
> >   From: Achanta Vamsi Subhash 
> >  To: "users@kafka.apache.org" 
> >  Sent: Friday, December 19, 2014 9:00 AM
> >  Subject: Re: Max. storage for Kafka and impact
> >
> > Yes. We need those many max partitions as we have a central messaging
> > service and thousands of topics.
> >
> > On Friday, December 19, 2014, nitin sharma 
> > wrote:
> >
> > > hi,
> > >
> > > Few things you have to plan for:
> > > a. Ensure that from resilience point of view, you are having sufficient
> > > follower brokers for your partitions.
> > > b. In my testing of kafka (50TB/week) so far, haven't seen much issue
> > with
> > > CPU utilization or memory. I had 24 CPU and 32GB RAM.
> > > c. 200,000 partitions means around 1MB/week/partition. are you sure you
> > > need so many partitions?
> > >
> > > Regards,
> > > Nitin Kumar Sharma.
> > >
> > >
> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> > > achanta.va...@flipkart.com > wrote:
> > > >
> > > > We definitely need a retention policy of a week. Hence.
> > > >
> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > > > achanta.va...@flipkart.com > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > We are using Kafka for our messaging system and we have an estimate
> > for
> > > > > 200 TB/week in the coming months. Will it impact any performance
> for
> > > > Kafka?
> > > > >
> > > > > PS: We will be having greater than 2 lakh (200,000) partitions.
> >
> >
> > > > >
> > > > > --
> > > > > Regards
> > > > > Vamsi Subhash
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards
> > > > Vamsi Subhash
> > > >
> > >
> >
> >
> > --
> > Regards
> > Vamsi Subhash
> >
> >
> >
> >
>
>
>
> --
> Regards
> Vamsi Subhash
>


Re: Better exception handling in kafka.producer.async.DefaultEventHandler

2014-12-18 Thread Joe Stein
I would suggest to use the new client java producer in 0.8.2-beta
http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
it handles the case you brought up (among lots of other goodies).
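
For the silent-drop concern specifically, each send can take a callback so
every failed record surfaces. A minimal sketch (the broker address and topic
are made-up placeholders, and config/serializer details shifted between the
beta and the final 0.8.2 release, so check the javadoc above):

import java.util.Properties
import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

object SafeSend extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092") // hypothetical broker
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)
  producer.send(new ProducerRecord[String, String]("events", "key", "value"), new Callback {
    def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
      if (exception != null)
        exception.printStackTrace() // failures surface here per record, nothing is dropped silently
  })
  producer.close()
}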

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Thu, Dec 18, 2014 at 4:27 PM, Xiaoyu Wang  wrote:
>
> Hello,
>
> I am looking at 0.8.1.1, the kafka.producer.async.DefaultEventHandler
> file. Below is the dispatchSerializedData function. It looks like we are
> catching the exception outside the loop and merely logging an error
> message. We then return failedProduceRequests.
>
> In case one broker is having a problem, messages that would be sent to
> brokers after the problematic broker will NOT be included in the
> failedProduceRequests and will be ignored quietly. Is this correct?
> Shall we change the code to catch the exception around each broker's send?
>
> Thanks
>
> private def dispatchSerializedData(messages: Seq[KeyedMessage[K,Message]]): Seq[KeyedMessage[K, Message]] = {
>   val partitionedDataOpt = partitionAndCollate(messages)
>   partitionedDataOpt match {
>     case Some(partitionedData) =>
>       val failedProduceRequests = new ArrayBuffer[KeyedMessage[K,Message]]
>       try {
>         for ((brokerid, messagesPerBrokerMap) <- partitionedData) {
>           if (logger.isTraceEnabled)
>             messagesPerBrokerMap.foreach(partitionAndEvent =>
>               trace("Handling event for Topic: %s, Broker: %d, Partitions: %s"
>                 .format(partitionAndEvent._1, brokerid, partitionAndEvent._2)))
>           val messageSetPerBroker = groupMessagesToSet(messagesPerBrokerMap)
>
>           val failedTopicPartitions = send(brokerid, messageSetPerBroker)
>           failedTopicPartitions.foreach(topicPartition => {
>             messagesPerBrokerMap.get(topicPartition) match {
>               case Some(data) => failedProduceRequests.appendAll(data)
>               case None => // nothing
>             }
>           })
>         }
>       } catch {
>         case t: Throwable => error("Failed to send messages", t)
>       }
>       failedProduceRequests
>     case None => // all produce requests failed
>       messages
>   }
> }
>


Re: Kafka getting some love from Logstash and Elasticsearch

2014-12-15 Thread Joe Stein
Nice!

On Mon, Dec 15, 2014 at 11:41 AM, Roger Hoover 
wrote:
>
> Joseph,
>
> That's great!  Thank you for writing that plugin.
>
> Cheers,
>
> Roger
>
> On Mon, Dec 15, 2014 at 7:24 AM, Joseph Lawson 
> wrote:
> >
> > ?Kafka made some headlines with Logstash announcing their latest version
> > beta (1.5) which includes by default a Kafka input and output plugin.
> Good
> > stuff.  http://www.elasticsearch.org/blog/logstash-1-5-0-beta1-released/
> ?
> >
> >
> >
> >
>


Re: spark kafka batch integration

2014-12-14 Thread Joe Stein
I like the idea of the KafkaRDD and Spark partition/split per Kafka
partition. That is a good use of the SimpleConsumer.

I can see a few different strategies for the commitOffsets and
partitionOwnership.

What use case are you committing your offsets for?

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Sun, Dec 14, 2014 at 8:22 PM, Koert Kuipers  wrote:
>
> hello all,
> we at tresata wrote a library to provide for batch integration between
> spark and kafka. it supports:
> * distributed write of rdd to kafka
> * distributed read of rdd from kafka
>
> our main use cases are (in lambda architecture speak):
> * periodic appends to the immutable master dataset on hdfs from kafka using
> spark
> * make non-streaming data available in kafka with periodic data drops from
> hdfs using spark. this is to facilitate merging the speed and batch layers
> in spark-streaming
> * distributed writes from spark-streaming
>
> see here:
> https://github.com/tresata/spark-kafka
>
> best,
> koert
>


Re: csv files to kafka

2014-12-14 Thread Joe Stein
Here is a non-production example of a file load to Kafka using Scala
https://github.com/stealthly/f2k/blob/master/src/main/scala/ly/stealth/f2k/KafkaUploader.scala


If you wanted to convert that to javacode the Scala compiler does that for
you :) and makes it importable as jar to your java program, or you can use
as reference and rewrite it in Java.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Sun, Dec 14, 2014 at 1:39 AM, kishore kumar 
wrote:
>
> HI Experts,
>
> I want to load csv files to kafka, anybody help me to write javacode for
> this?
>
> --
> Thanks,
> Kishore.
>


Re: how to achieve availability with no data loss when replica = 1?

2014-12-10 Thread Joe Stein
By replica == 1 do you mean replication-factor == 1 or something different?

You should have replication-factor == 3 if you are trying to have durable
writes survive failure. On the producer side, set ack = -1 along with that
for it to work as expected.
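
A minimal sketch of that combination with the 0.8 Scala producer (broker
list and topic are made-up placeholders; the topic itself would be created
with --replication-factor 3):

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

object DurableSend extends App {
  val props = new Properties()
  props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092")
  props.put("serializer.class", "kafka.serializer.StringEncoder")
  props.put("request.required.acks", "-1") // wait for all in-sync replicas
  val producer = new Producer[String, String](new ProducerConfig(props))
  producer.send(new KeyedMessage[String, String]("events", "payload"))
  producer.close()
}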

On Wed, Dec 10, 2014 at 7:14 PM, Helin Xiang  wrote:

> Thanks for the reply , Joe.
>
> In my opinion, when replica == 1, ack == -1 would cause the producer to
> stop sending any data to the kafka cluster if 1 broker is down. That means
> we could not tolerate a single point of failure. Am I right?
>
> What we want is that when 1 broker is down and the topic replica is set to
> 1, the whole system is still available and the data goes to other
> partitions without loss.
>
>
> THANKS again.
>
> On Thu, Dec 11, 2014 at 12:37 AM, Joe Stein  wrote:
>
> > If you want no data loss then you need to set ack = -1
> > Copied from https://kafka.apache.org/documentation.html#producerconfigs
> ==
> > -1, which means that the producer gets an acknowledgement after all
> in-sync
> > replicas have received the data. This option provides the best
> durability,
> > we guarantee that no messages will be lost as long as at least one in
> sync
> > replica remains.
> >
> > /***
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > /
> >
> > On Wed, Dec 10, 2014 at 11:27 AM, Helin Xiang  wrote:
> >
> > > Hi,
> > >
> > > in some topics of our system, the data volume is so huge that we think
> > > doing extra replicas is a waste of disk and network resources (plus the
> > > data is not so important).
> > >
> > > firstly, we used 1 replica + ack=0, and found that when 1 broker is
> > > down, we would lose 1/n of the data.
> > > then we tried 1 replica + ack=1, and found that after 3 tries the data
> > > is still lost. and when we set the try number large enough, no more
> > > data can be produced to Kafka.
> > >
> > > If I did not misunderstand, in 0.7, when 1 broker is down, both
> > > producing and consuming are available with no data loss. I can see the
> > > reason why kafka 0.8 is designed differently from 0.7, but is there a
> > > way to let the producer in 0.8 act like the behavior in 0.7? We don't
> > > care which part of the data goes to which specific partition, as long
> > > as the data goes into kafka with no loss.
> > >
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > > --
> > >
> > >
> > > Best Regards, Xiang Helin
> > >
> >
>
>
>
> --
>
>
> Best Regards, Xiang Helin
>


New Go Kafka Client

2014-12-10 Thread Joe Stein
Hi, we open sourced a new Go Kafka Client
http://github.com/stealthly/go_kafka_client some more info on blog post
http://allthingshadoop.com/2014/12/10/making-big-data-go/ for those working
with Go or looking to get into Go and Apache Kafka.

/***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
/


Re: how to achieve availability with no data loss when replica = 1?

2014-12-10 Thread Joe Stein
If you want no data loss then you need to set ack = -1
Copied from https://kafka.apache.org/documentation.html#producerconfigs ==
-1, which means that the producer gets an acknowledgement after all in-sync
replicas have received the data. This option provides the best durability,
we guarantee that no messages will be lost as long as at least one in sync
replica remains.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Wed, Dec 10, 2014 at 11:27 AM, Helin Xiang  wrote:

> Hi,
>
> in some topics of our system, the data volume is so huge that we think
> doing extra replicas is a waste of disk and network resources (plus the
> data is not so important).
>
> firstly, we used 1 replica + ack=0, and found that when 1 broker is down,
> we would lose 1/n of the data.
> then we tried 1 replica + ack=1, and found that after 3 tries the data is
> still lost. and when we set the try number large enough, no more data can
> be produced to Kafka.
>
> If I did not misunderstand, in 0.7, when 1 broker is down, both producing
> and consuming are available with no data loss. I can see the reason why
> kafka 0.8 is designed differently from 0.7, but is there a way to let the
> producer in 0.8 act like the behavior in 0.7? We don't care which part of
> the data goes to which specific partition, as long as the data goes into
> kafka with no loss.
>
>
> Thanks
>
>
>
>
> --
>
>
> Best Regards, Xiang Helin
>


Re: Kafka Issue [Corrupted broker]

2014-12-09 Thread Joe Stein
It looks like broker 5 is in a bad state. You are likely going to have to
shut it down. From there you have a few options; your environment setup
will dictate if and when you shut it down and what you do after that.
Spinning up another server with broker.id == 5 and letting replication heal
the topics that were durable is a way to go. If you do that then you can go
back to the old server, debug what went wrong, recover the replication
factor == 1 partition data (back it up), and fix that later after you
figure out what went wrong.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Tue, Dec 9, 2014 at 2:54 AM, ashendra bansal 
wrote:

> Hi,
>
> One of the brokers seems to have become corrupted in my cluster of 7
> brokers. All the topic partitions where this broker was the leader are
> hitting NoLeader or UnderReplicated partition exceptions.
>
> All these partitions have no leader and not even a replica in the isr
> (in-sync replica) set.
>
> Corrupt broker id - 5.
>
> topic: topic1 partition: 2 leader: -1 replicas: 5 isr:
> topic: topic1 partition: 8 leader: -1 replicas: 5 isr:
> topic: topic1 partition: 14 leader: -1 replicas: 5 isr:
> topic: topic2 partition: 1 leader: -1 replicas: 5 isr:
> topic: topic2 partition: 8 leader: -1 replicas: 5 isr:
> topic: topic2 partition: 15 leader: -1 replicas: 5 isr:
> topic: topic3 partition: 1 leader: -1 replicas: 5 isr:
> topic: topic3 partition: 8 leader: -1 replicas: 5 isr:
> topic: topic3 partition: 15 leader: -1 replicas: 5 isr:
>
> I have tried the replication tools to manually assign a broker to these
> partitions but that did not help, as none of them are in the isr set.
>
> Unfortunately the replication factor for these topics was 1. But for topics
> where the replication factor was higher, the problem persists. There the
> leader has been assigned to the next preferred replica, but the replica on
> the corrupt broker is not moved to the isr set even after a long time
> (days), and the partitions have logs on the order of 100s.
>
> topic: topic4 partition: 1 leader: 6 replicas: 5,6 isr: 6
>
> For the same topic, in the partitions where the leader was not broker 5
> (the corrupted broker), broker 5 is still in the isr set.
>
> topic: topic4 partition: 0 leader: 4 replicas: 4,5 isr: 4,5
>
> Another observation: the corrupted broker has a topic creation entry in its
> INFO logs, printed very frequently, every minute:
>
> [2014-12-09 13:07:27,878] INFO Topic creation { "partitions":{ "0":[ 4, 3
> ], "1":[ 5, 4 ] }, "version":1 } (kafka.admin.AdminUtils$)
>
> Though no topics are actually being created on the cluster.
>
> Has anyone faced a similar problem? How can I fix it?
>
> Ashendra
>


Re: unable to create topic

2014-12-08 Thread Joe Stein
Cloudera Manager utilizes the chroot znode structures so you should connect
with --zookeeper localhost:2181/kafka

Or whatever value CM has set for the chroot path of your installation.
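
For example, assuming CM used the /kafka chroot:

# bin/kafka-topics.sh --create --zookeeper localhost:2181/kafka \
  --replication-factor 1 --partitions 1 --topic test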

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
/

On Mon, Dec 8, 2014 at 4:04 AM, kishore kumar  wrote:

> I have a running zookeeper 3.4.5-cdh5.2.0 cluster on 3 nodes, managed
> with cm5.2, and integrated kafka per
> http://www.cloudera.com/content/cloudera/en/developers/home/cloudera-labs/apache-kafka.html
>
> The problem is, when I try to create a topic with
>
> # bin/kafka-topics.sh --create --zookeeper localhost:2181
> --replication-factor 1 --partitions 1 --topic test
>
> the error is
>
> NoNode for /brokers/ids
>
> I created the path /brokers/ids/0,1,2 in zookeeper manually, and still
> the problem is the same.
>
> How to get rid of this, any help?
>
> Thanks,
> Kishore.
>

