Hello,
I have referred to the link https://github.com/dibbhatt/kafka-spark-consumer and
have successfully consumed tuples from Kafka.
The tuples are JSON objects, and I want to store those objects in HDFS in Parquet
format.
Please suggest a sample example for that.
Thanks in advance.
On Tue, Aug
You can use Spark SQL for that very easily. You can take the RDDs you get
from the Kafka input stream, convert them to RDDs of case classes, and save
them as Parquet files.
More information here.
https://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
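A minimal sketch of the approach described above, assuming the Spark 1.x APIs current at the time of this thread (SQLContext with the createSchemaRDD implicit and saveAsParquetFile). The Event case class, its field names, the parseEvent helper, and the HDFS output path are all assumptions for illustration; the socket source is only a stand-in for the DStream[String] you already get from your Kafka input stream.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical case class describing the JSON records; adapt the
// fields to your actual payload.
case class Event(id: String, value: Double)

object KafkaJsonToParquet {

  // Hypothetical parser using the Scala 2.10 stdlib JSON support;
  // in practice you would use a proper JSON library.
  def parseEvent(line: String): Option[Event] =
    scala.util.parsing.json.JSON.parseFull(line).collect {
      case m: Map[String, Any] @unchecked =>
        Event(m("id").toString, m("value").toString.toDouble)
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaJsonToParquet")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Placeholder source: substitute the DStream[String] of JSON
    // payloads obtained from your Kafka input stream.
    val jsonLines = ssc.socketTextStream("localhost", 9999)

    jsonLines.foreachRDD { (rdd, time) =>
      val sqlContext = new SQLContext(rdd.sparkContext)
      // Implicitly converts an RDD of case classes to a SchemaRDD,
      // which provides saveAsParquetFile.
      import sqlContext.createSchemaRDD

      val events = rdd.flatMap(parseEvent)
      if (events.count() > 0) {
        // One Parquet directory per batch, keyed by batch time.
        events.saveAsParquetFile(
          s"hdfs:///events/batch-${time.milliseconds}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Writing one directory per batch avoids appending to an existing Parquet file, which the API of that era did not support.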
On Wed, Aug 6, 2014 at 5:23
Hi,
I am new to Apache Spark and am trying to develop a Spark Streaming program
to stream data from Kafka topics and write the output as Parquet files on HDFS.
Please share a sample reference program that streams data from Kafka topics
and writes the output as Parquet files on HDFS.
Thanks in advance.
Regards,
You can try this Kafka Spark Consumer, which I recently wrote. It uses the
low-level Kafka consumer API:
https://github.com/dibbhatt/kafka-spark-consumer
Dibyendu
On Tue, Aug 5, 2014 at 12:52 PM, rafeeq s rafeeq.ec...@gmail.com wrote:
Thanks Dibyendu.
1. Spark itself has an API jar for Kafka; do we still require manual offset
management (using the SimpleConsumer concept) and a manual consumer?
2. The Kafka Spark Consumer is implemented against Kafka 0.8.0; can we use it
for Kafka 0.8.1?
3. How to use the Kafka Spark Consumer to produce output
Hi Rafeeq,
I think the current Spark Streaming API can already fetch data from Kafka and
store it in another external store. If you do not care about managing consumer
offsets manually, there is no need to use a low-level API like
SimpleConsumer.
For Kafka 0.8.1 compatibility, you
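A short sketch of the built-in high-level receiver the reply above refers to, per the Spark Streaming Kafka integration of that era. The ZooKeeper quorum, consumer group id, and topic name are placeholders; the high-level consumer commits offsets to ZooKeeper automatically, so no manual offset management is needed.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BuiltInKafkaStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BuiltInKafkaStream")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // createStream uses Kafka's high-level consumer, which manages and
    // commits offsets in ZooKeeper on your behalf.
    val messages = KafkaUtils.createStream(
      ssc,
      "zk-host:2181",        // ZooKeeper quorum (placeholder)
      "my-consumer-group",   // consumer group id (placeholder)
      Map("my-topic" -> 1))  // topic -> number of receiver threads

    // Each element is a (key, message) pair; the payload is the value.
    messages.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```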