Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-06 Thread Mahebub Sayyed
Hello, I have referred to the link https://github.com/dibbhatt/kafka-spark-consumer and have successfully consumed tuples from Kafka. The tuples are JSON objects, and I want to store those objects in HDFS in Parquet format. Please suggest an example of how to do that. Thanks in advance. On Tue, Aug

Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-06 Thread Tathagata Das
You can do that very easily with Spark SQL. Take the RDDs you get from the Kafka input stream, convert them to RDDs of case classes, and save them as Parquet files. More information here: https://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files On Wed, Aug 6, 2014 at 5:23
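The "convert them to RDDs of case classes" step can be sketched without Spark at all. Below is a minimal, stdlib-only Python illustration of turning raw JSON messages into typed records, with a frozen dataclass standing in for a Scala case class. The Event schema and its field names (user, action, ts) are assumptions made up for this example, since the thread never says what the JSON tuples contain. Reading from Kafka and writing Parquet are deliberately left out; in a Spark job, a function like parse_event is roughly what you would map over each RDD before handing the result to Spark SQL as described in the guide linked above.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical schema: the thread does not say which fields the JSON tuples carry.
    user: str
    action: str
    ts: int


def parse_event(raw: str) -> Optional[Event]:
    """Turn one JSON message into a typed record, or None if it is malformed."""
    try:
        obj = json.loads(raw)
        return Event(user=str(obj["user"]), action=str(obj["action"]), ts=int(obj["ts"]))
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, a missing field, or a value of the wrong shape.
        return None


def to_records(batch: Iterable[str]) -> List[Event]:
    """Convert one micro-batch of raw messages, dropping those that fail to parse."""
    return [e for e in map(parse_event, batch) if e is not None]
```

Dropping malformed messages silently is only one possible policy; a real job might instead count them or route them to a separate location for inspection.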

Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread rafeeq s
Hi, I am new to Apache Spark and am trying to develop a Spark Streaming program to *stream data from Kafka topics and output it as Parquet files on HDFS*. Please share a *sample reference* program that does this. Thanks in advance. Regards,

Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread Dibyendu Bhattacharya
You can try this Kafka Spark Consumer, which I recently wrote. It uses the low-level Kafka consumer: https://github.com/dibbhatt/kafka-spark-consumer Dibyendu On Tue, Aug 5, 2014 at 12:52 PM, rafeeq s rafeeq.ec...@gmail.com wrote: Hi, I am new to Apache Spark and Trying to Develop spark

Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread rafeeq s
Thanks, Dibyendu. 1. Spark itself has an API jar for Kafka; do we still need manual offset management (using the simple consumer concept) and a manual consumer? 2. The Kafka Spark Consumer is implemented against Kafka 0.8.0; can we use it with Kafka 0.8.1? 3. How can we use the Kafka Spark Consumer to produce output

RE: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread Shao, Saisai
Hi Rafeeq, I think the current Spark Streaming API can fetch data from Kafka and store it in another external store. If you do not need to manage consumer offsets manually, there’s no need to use a low-level API such as SimpleConsumer. For Kafka 0.8.1 compatibility, you