Re: Questions about .9 consumer API

2015-10-22 Thread Guozhang Wang
Hi Mohit:

In general, the new consumer abstracts developers from network
failures. More specifically:

1) consumers will automatically try to re-fetch messages if the
previous fetch has failed.
2) consumers will remember the current fetch positions after each
successful fetch, and can periodically commit these offsets back to Kafka.
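[Editor's sketch, for illustration: a minimal loop against the 0.9 Java client showing both points. The broker address, group id, and topic name are placeholders; this needs a running broker, so it is a sketch rather than a tested program.]

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AutoCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "my-group");                // placeholder group id
        props.put("enable.auto.commit", "true");          // point 2: periodic offset commit
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"));    // placeholder topic
        try {
            while (true) {
                // point 1: poll() re-fetches on transient failures internally,
                // so no user-level retry logic is needed here.
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("offset=%d value=%s%n",
                                      record.offset(), record.value());
            }
        } finally {
            consumer.close();
        }
    }
}
```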

Guozhang

On Thu, Oct 22, 2015 at 10:11 AM, Mohit Anchlia 
wrote:




-- 
-- Guozhang


Re: future of Camus?

2015-10-22 Thread Henry Cai
Take a look at secor:

https://github.com/pinterest/secor

Secor is a no-frills Kafka->HDFS ingestion tool. It doesn't depend on any
underlying systems such as Hadoop; it only uses the Kafka high-level consumer
to balance the workload.  Very easy to understand and manage.  It's
probably the 2nd most popular Kafka/HDFS ingestion tool (behind Camus).
Lots of web companies use it for Kafka data ingestion
(Pinterest/Uber/Airbnb).


On Thu, Oct 22, 2015 at 3:56 AM, Adrian Woodhead 
wrote:



future of Camus?

2015-10-22 Thread Adrian Woodhead
Hello all,

We're looking at options for getting data from Kafka onto HDFS, and Camus looks 
like the natural choice for this. It's also evident that LinkedIn, who 
originally created Camus, are taking things in a different direction and are 
advising people to use their Gobblin ETL framework instead. We feel that 
Gobblin is overkill for many simple use cases and Camus seems a much simpler 
and better fit. The problem now is that, with LinkedIn apparently withdrawing 
official support, any changes to Camus are being managed by various forks of 
it, and it looks like everyone is building and using their own versions. 
Wouldn't it be better for a community to form around one official fork so 
development efforts can be focused on this? Any thoughts on this?

Thanks,

Adrian



Re: future of Camus?

2015-10-22 Thread Todd Snyder
Another alternative is to check out KaBoom:

  https://github.com/blackberry/KaBoom

It uses a pared-down Kafka consumer library to pull data from Kafka and write 
it to defined (and somewhat dynamic) HDFS paths in a custom (and changeable) 
Avro schema we call Boom. It uses Kerberos for authentication, and supports 
very high throughput.

It's still actively being developed, with a new release coming soon with 
enhanced configuration through a new REST API (kontroller).

Cheers

Todd.



Sent from my BlackBerry 10 smartphone on the TELUS network.
  Original Message
From: Guozhang Wang
Sent: Thursday, October 22, 2015 5:03 PM
To: users@kafka.apache.org
Reply To: users@kafka.apache.org
Subject: Re: future of Camus?




kafka 0.8 consumer polling topic

2015-10-22 Thread Kudumula, Surender
Hi all
General question: does the current Kafka consumer need to be run in Java 
threads in order to poll the topic continuously, and how should it be 
written? Any ideas, please? Thanks

Regards

Surender Kudumula
Big Data Consultant - EMEA
Analytics & Data Management

surender.kudum...@hpe.com
M +44 7795970923

Hewlett-Packard Enterprise
Cain Rd,
Bracknell
RG12 1HN
UK




Re: future of Camus?

2015-10-22 Thread Guozhang Wang
Hi Adrian,

Another alternative is to use Kafka's own Copycat framework for
data ingress / egress. It will be released in our 0.9.0 version,
expected in Nov.

With Copycat, users can write different "connectors" instantiated for
different source / sink systems; for your case there is a built-in
HDFS connector coming along with the framework itself. You can find more
details in these Kafka wikis / Java docs:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767

https://s3-us-west-2.amazonaws.com/confluent-files/copycat-docs-wip/intro.html
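[Editor's sketch, for illustration only: Copycat is unreleased at this point, so the connector class name and every property key below are assumptions loosely modeled on the WIP docs, and the topic and HDFS URL are placeholders. A standalone connector configuration might look roughly like:]

```properties
# Hypothetical Copycat connector properties -- names and keys are
# assumptions; check the released documentation before using.
name=hdfs-sink
connector.class=HDFSSinkConnector
tasks.max=4
topics=my-topic
hdfs.url=hdfs://namenode:8020
```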

Guozhang


On Thu, Oct 22, 2015 at 12:52 PM, Henry Cai 
wrote:




-- 
-- Guozhang


Re: future of Camus?

2015-10-22 Thread vivek thakre
We are using Apache Flume as a router to consume data from Kafka and push
to HDFS.
With Flume 1.6, the Kafka channel, source, and sink are available out of the box.

Here is the blog post from Cloudera
http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
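[Editor's sketch along the lines of that post: in the "Flafka" pattern the Kafka channel feeds an HDFS sink directly, with no separate source needed. The agent name, broker/ZooKeeper addresses, topic, and HDFS path below are placeholders; verify property keys against the Flume 1.6 user guide.]

```properties
# Hypothetical agent "a1": Kafka channel -> HDFS sink, no source required.
a1.channels = kc
a1.sinks = hdfsSink
a1.sinks.hdfsSink.channel = kc

# Kafka channel: events are consumed straight from the topic.
a1.channels.kc.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.kc.brokerList = broker1:9092
a1.channels.kc.zookeeperConnect = zk1:2181
a1.channels.kc.topic = my-topic
a1.channels.kc.parseAsFlumeEvent = false

# HDFS sink writing the consumed events out.
a1.sinks.hdfsSink.type = hdfs
a1.sinks.hdfsSink.hdfs.path = /flume/kafka/%y-%m-%d
a1.sinks.hdfsSink.hdfs.rollInterval = 300
a1.sinks.hdfsSink.hdfs.fileType = DataStream
```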

Thanks,

Vivek Thakre



On Thu, Oct 22, 2015 at 2:29 PM, Hawin Jiang  wrote:



Re: future of Camus?

2015-10-22 Thread Hawin Jiang
Very useful information for us.
Thanks Guozhang.
On Oct 22, 2015 2:02 PM, "Guozhang Wang"  wrote:



Re: kafka 0.8 consumer polling topic

2015-10-22 Thread Guozhang Wang
You can find the Java doc with some examples under "KafkaConsumer" here:

http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/

Guozhang
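[Editor's sketch of the pattern those docs describe, addressing the threading question: a single thread calling poll() in a loop suffices, since the consumer is not thread-safe and should be confined to one thread; another thread may only call wakeup() to break the loop for shutdown. The topic name is a placeholder, and this needs a running broker, so it is a sketch rather than a tested program.]

```java
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

public class PollingLoop implements Runnable {
    private final KafkaConsumer<String, String> consumer;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    PollingLoop(Properties props) {
        this.consumer = new KafkaConsumer<>(props);
    }

    @Override
    public void run() {
        try {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (!closed.get()) {
                // One thread, one loop: poll() drives all consumption.
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records)
                    process(record);
            }
        } catch (WakeupException e) {
            // Thrown by poll() after wakeup(); expected during shutdown.
            if (!closed.get()) throw e;
        } finally {
            consumer.close();
        }
    }

    // May be called from a different thread: wakeup() is thread-safe.
    public void shutdown() {
        closed.set(true);
        consumer.wakeup();
    }

    private void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value()); // placeholder handling logic
    }
}
```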


On Thu, Oct 22, 2015 at 1:28 PM, Kudumula, Surender <
surender.kudum...@hpe.com> wrote:




-- 
-- Guozhang


Re: Questions about .9 consumer API

2015-10-22 Thread Mohit Anchlia
It's in this link; most of the examples have some kind of error handling:

http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/

On Thu, Oct 22, 2015 at 7:45 PM, Guozhang Wang  wrote:



Re: Questions about .9 consumer API

2015-10-22 Thread Guozhang Wang
Could you point me to the exact examples that indicate user error handling?

Guozhang

On Thu, Oct 22, 2015 at 5:43 PM, Mohit Anchlia 
wrote:

> The examples in the javadoc seem to imply that developers need to manage
> all of the aspects around failures. Those examples are for rewinding
> offsets, dealing with failed partitions, for instance.



-- 
-- Guozhang


Questions about .9 consumer API

2015-10-22 Thread Mohit Anchlia
It looks like the new consumer API expects developers to manage the
failures? Or is there some other API that can abstract the failures,
primarily:

1) Automatically resend failed messages because of a network issue or some
other issue between the broker and the consumer
2) Ability to acknowledge receipt of a message by the consumer such that
the message is sent again if the consumer fails to acknowledge the receipt.

Is there such an API or are the clients expected to deal with failure
scenarios?

Docs I am looking at are here:

http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/