Last time I checked, Camus didn't support storing data as Parquet, which is a deal-breaker for me. Otherwise it works well for my Kafka topics with low data volume.
I am currently using Spark Streaming to ingest data, generate semi-real-time stats and publish them to a dashboard, and dump the full dataset into HDFS as Parquet at a longer interval. One problem is that writing Parquet is sometimes time-consuming, and that delays my regular stats-generating tasks. I am thinking of splitting my streaming job into two, one for Parquet output and one for stats generation, but obviously this would consume the data from Kafka twice.

-Simon

On Wednesday, May 6, 2015, Rendy Bambang Junior <rendy.b.jun...@gmail.com> wrote:

> Because using spark streaming looks like a lot simpler. Whats the
> difference between Camus and Kafka Streaming for this case? Why Camus excel?
>
> Rendy
>
> On Wed, May 6, 2015 at 2:15 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
>> Also Kafka has a Hadoop consumer API for doing such things, please refer
>> to http://kafka.apache.org/081/documentation.html#kafkahadoopconsumerapi
>>
>> 2015-05-06 12:22 GMT+08:00 MrAsanjar . <afsan...@gmail.com>:
>>
>>> why not try https://github.com/linkedin/camus - camus is kafka to HDFS
>>> pipeline
>>>
>>> On Tue, May 5, 2015 at 11:13 PM, Rendy Bambang Junior <rendy.b.jun...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am planning to load data from Kafka to HDFS. Is it normal to use
>>>> spark streaming to load data from Kafka to HDFS? What are concerns on doing
>>>> this?
>>>>
>>>> There are no processing to be done by Spark, only to store data to HDFS
>>>> from Kafka for storage and for further Spark processing
>>>>
>>>> Rendy
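
To make the trade-off of splitting the job in two concrete, here is a toy in-memory sketch in Python. It is not Kafka or Spark code: the `Topic` class and the group names `stats-job` and `parquet-job` are made up for illustration, and the only assumption it models is that each job tracks its own read position in the same log. It shows that a slow archive job can fall behind without holding up the stats job, at the price of every record being read once per job.

```python
from collections import defaultdict


class Topic:
    """Toy stand-in for a topic: an append-only log plus one offset per group."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # group name -> next offset to read

    def append(self, record):
        self.log.append(record)

    def poll(self, group, max_records):
        """Return up to max_records unread records for `group`, advancing its offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


topic = Topic()
for i in range(10):
    topic.append({"id": i, "value": i * 2})

# Stats job: keeps up with the log and aggregates everything available.
stats_batch = topic.poll("stats-job", max_records=10)
stats_total = sum(r["value"] for r in stats_batch)

# Archive job: only gets through 4 records this round (think: slow Parquet write).
archive_batch = topic.poll("parquet-job", max_records=4)

# Per-group lag: how many records each job still has to read.
lag = {group: len(topic.log) - offset for group, offset in topic.offsets.items()}
```

Here the stats job has no lag while the archive job still has six records to read, and neither one blocks the other. The cost is the one mentioned above: the same ten records end up being fetched twice in total.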