Last time I checked, Camus doesn't support storing data as Parquet, which
is a deal breaker for me. Otherwise it works well for my low-volume Kafka
topics.
I am currently using Spark Streaming to ingest data, generate semi-realtime
stats and publish them to a dashboard, and dump the full dataset
Kafka also has a Hadoop consumer API for doing such things; please refer to
http://kafka.apache.org/081/documentation.html#kafkahadoopconsumerapi
2015-05-06 12:22 GMT+08:00 MrAsanjar . afsan...@gmail.com:
Why not try https://github.com/linkedin/camus? Camus is a Kafka-to-HDFS
pipeline.
Because using Spark Streaming looks a lot simpler. What's the
difference between Camus and Spark Streaming for this case? Where does Camus excel?
Rendy
On Wed, May 6, 2015 at 2:15 PM, Saisai Shao sai.sai.s...@gmail.com wrote:
Also Kafka has a Hadoop consumer API for doing such things, please
Hi all,
I am planning to load data from Kafka to HDFS. Is it normal to use Spark
Streaming to load data from Kafka to HDFS? What are the concerns with doing this?
There is no processing to be done by Spark; the goal is only to store the data
from Kafka in HDFS for storage and for further Spark processing.
Rendy
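
For reference, the store-only job described above might look roughly like the
sketch below. It assumes the Spark 1.x Python API (the receiver-based
KafkaUtils.createStream in pyspark.streaming.kafka, which is not shipped with
current Spark releases). The function names, the ZooKeeper address, the
consumer group, the topic and the HDFS path are all made up for illustration.

```python
def hdfs_prefix(base_dir, topic):
    """Build the path prefix passed to saveAsTextFiles (hypothetical layout).

    Spark appends "-<batch time in ms>" to this prefix for every batch.
    """
    return "{}/{}/batch".format(base_dir.rstrip("/"), topic)


def run(zk_quorum, group_id, topic, base_dir, batch_seconds=60):
    """Copy the values of one Kafka topic to HDFS with no other processing."""
    # Imported lazily so hdfs_prefix can be used without a Spark install.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    # Assumption: Spark 1.x Kafka integration is on the classpath.
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaToHdfs")
    ssc = StreamingContext(sc, batch_seconds)

    # Receiver-based stream of (key, value) pairs; one receiver thread for the topic.
    stream = KafkaUtils.createStream(ssc, zk_quorum, group_id, {topic: 1})

    # Keep only the message value and write one output directory per batch.
    stream.map(lambda kv: kv[1]).saveAsTextFiles(hdfs_prefix(base_dir, topic))

    ssc.start()
    ssc.awaitTermination()


if __name__ == "__main__":
    # Placeholder addresses; replace with real ones before submitting the job.
    run("zk-host:2181", "hdfs-loader", "events", "hdfs://namenode:8020/data")
```

One thing to watch with this approach: every batch interval produces its own
output directory, so a short interval leaves many small files on HDFS.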
On Tue, May 5, 2015 at 11:13 PM, Rendy Bambang Junior
rendy.b.jun...@gmail.com wrote:
Hi all,
I am planning to load data from Kafka to HDFS. Is it normal to use spark
streaming to load data from Kafka to HDFS?