Re: What is the best way to write Kafka data into HDFS?

2016-02-11 Thread Jay Kreps
Check out Kafka Connect:

http://www.confluent.io/blog/how-to-build-a-scalable-etl-pipeline-with-kafka-connect

-Jay


On Wed, Feb 10, 2016 at 5:09 PM, R P  wrote:

> Hello All,
>   New Kafka user here. What is the best way to write Kafka data into HDFS?
> I have looked into the following options and found that Flume is the
> quickest and easiest to set up.
>
> 1. Flume
> 2. KaBoom
> 3. Kafka Hadoop Loader
> 4. Camus -> Gobblin
>
> However, Flume can cause small-file problems when your data is
> partitioned and some partitions generate data only sporadically.
>
> What are some best practices and options to write data from Kafka to HDFS?
>
> Thanks,
> R P


Re: What is the best way to write Kafka data into HDFS?

2016-02-11 Thread R P

Hello Steve, Thanks for the suggestion. It looks like this Git repo has not
been updated in more than 10 months.
Is this project still supported?
Where can I find current usage and performance metrics?

Thanks,
R P

From: steve.mo...@gmail.com  on behalf of Steve Morin 

Sent: Wednesday, February 10, 2016 6:36 PM
To: users@kafka.apache.org
Subject: Re: What is the best way to write Kafka data into HDFS?

R P, happy to walk you through https://github.com/DemandCube/Scribengin if
you're interested



--
*Steve Morin | Managing Partner - CTO*

*Nvent*

O 800-407-1156 ext 803 <800-407-1156;803> | M 347-453-5579

smo...@nventdata.com  

*Enabling the Data Driven Enterprise*
*(Ask us how we can setup scalable open source realtime billion+ event/data
collection/analytics infrastructure in weeks)*

Service Areas: Management & Strategy Consulting | Data Engineering | Data
Science & Visualization

BigData Technologies: Hadoop & Ecosystem | NoSql| Hbase | Cassandra | Storm
| Spark | Kafka | Mesos | Docker | & More

Industries: IoT | Advertising | Retail | Manufacturing | TV & Cable |
Energy | Oil & Gas | Insurance | Finance | Telecom


RE: What is the best way to write Kafka data into HDFS?

2016-02-11 Thread Kudumula, Surender
Maybe you can try Apache NiFi; it's quicker as well. Give it a try, and good luck.





Re: What is the best way to write Kafka data into HDFS?

2016-02-11 Thread R P
Hey Jay,
  It's awesome to get a reply from one of the key Kafka contributors :). Thanks
for suggesting Kafka Connect.

How does Kafka Connect deal with HDFS small files? (I assume setting a large
flush.size allows the user to maintain a minimum HDFS file size.)
Does Kafka Connect keep the file handle open until the file is committed? (Flume
keeps file handles open, resulting in too many open files.)
Can I write a custom serializer for Kafka Connect?

Thanks,
R P
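
A minimal sketch of the count-based commit behaviour the question above
assumes, where a file is committed only after flush.size records have been
written. PartitionWriter and its fields are hypothetical names used for
illustration only; this is not the Kafka Connect API, and the real connector
may behave differently.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionWriter:
    """Hypothetical per-partition buffer; not the Kafka Connect API."""
    flush_size: int  # assumed meaning: records written before a file is committed
    pending: list = field(default_factory=list)
    committed_files: list = field(default_factory=list)

    def write(self, record: str) -> None:
        # Records accumulate in one open (uncommitted) file per partition.
        self.pending.append(record)
        if len(self.pending) >= self.flush_size:
            self.commit()

    def commit(self) -> None:
        # Close the open file and record it as one committed file.
        if self.pending:
            self.committed_files.append(list(self.pending))
            self.pending.clear()


writer = PartitionWriter(flush_size=3)
for i in range(7):
    writer.write(f"record-{i}")

print(len(writer.committed_files))  # 2 committed files of 3 records each
print(len(writer.pending))          # 1 record still waiting in the open file
```

In this model a larger flush_size gives fewer, larger files, at the cost of
keeping one uncommitted file per active partition for longer.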




Re: What is the best way to write Kafka data into HDFS?

2016-02-10 Thread Steve Morin
R P, happy to walk you through https://github.com/DemandCube/Scribengin if
you're interested





Re: What is the best way to write Kafka data into HDFS?

2016-02-10 Thread Adam Kunicki
If you're looking for a lightweight, fully open source solution with a
friendly GUI, check out streamsets.com.
It supports writing messages to a parameterized directory hierarchy (e.g.
partitioned Hive tables) and handles late records if your template happens
to involve date/time variables.
The number of messages per file and the maximum file size are also fully
configurable.

Full Disclosure: I'm an engineer actively working on the project.

-Adam
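
As a rough illustration of a date/time-parameterized directory hierarchy, the
sketch below derives a Hive-style partition directory from each record's
timestamp. The function name, base path, and dt=/hour= layout are assumptions
made for this example; it does not use StreamSets' actual template syntax.

```python
from datetime import datetime, timezone


def target_directory(base: str, topic: str, event_time: datetime) -> str:
    """Build a Hive-style partition directory from a record's timestamp."""
    return f"{base}/{topic}/dt={event_time:%Y-%m-%d}/hour={event_time:%H}"


# Hypothetical record timestamps: one on time, one arriving a day late.
on_time = datetime(2016, 2, 10, 17, 9, tzinfo=timezone.utc)
late = datetime(2016, 2, 9, 23, 58, tzinfo=timezone.utc)

print(target_directory("/data", "events", on_time))  # /data/events/dt=2016-02-10/hour=17
print(target_directory("/data", "events", late))     # /data/events/dt=2016-02-09/hour=23
```

Because the path comes from the record's own timestamp, a late record lands in
the directory for the hour it belongs to rather than the hour it arrived.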



-- 
Adam Kunicki
StreamSets | Field Engineer
mobile: 415.890.DATA (3282)



What is the best way to write Kafka data into HDFS?

2016-02-10 Thread R P
Hello All,
  New Kafka user here. What is the best way to write Kafka data into HDFS?
I have looked into the following options and found that Flume is the quickest
and easiest to set up.

1. Flume
2. KaBoom
3. Kafka Hadoop Loader
4. Camus -> Gobblin

However, Flume can cause small-file problems when your data is
partitioned and some partitions generate data only sporadically.

What are some best practices and options to write data from Kafka to HDFS?

Thanks,
R P
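
To make the small-file concern concrete, here is a back-of-the-envelope sketch
assuming purely time-based file rolling. The 5-minute roll interval and the
50 MB/day partition volume are made-up numbers for illustration, not
measurements from any of the tools listed above.

```python
def files_per_day(roll_interval_s: int) -> int:
    """Upper bound on files one partition produces per day with time-based rolling."""
    return 86_400 // roll_interval_s


def avg_file_size_mb(bytes_per_day: int, roll_interval_s: int) -> float:
    """Average file size in MB if data arrives in every roll interval."""
    return bytes_per_day / files_per_day(roll_interval_s) / (1024 * 1024)


# Hypothetical low-volume partition: 50 MB per day, files rolled every 5 minutes.
daily_bytes = 50 * 1024 * 1024
print(files_per_day(300))                             # 288
print(round(avg_file_size_mb(daily_bytes, 300), 2))   # 0.17
```

Under these assumptions the partition writes up to 288 files a day averaging
about 0.17 MB each, far below a typical 128 MB HDFS block, which is why rolling
on size or record count is often preferred for sparse partitions.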