For Q2: the order of the logs within each Kafka partition is guaranteed, but 
there is no such thing as a global order across partitions.
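To make that concrete, here is a minimal sketch using the direct Kafka stream (topic name and brokers are placeholders). With the direct approach, each RDD partition corresponds 1:1 to a Kafka topic-partition, so records seen inside a single partition preserve Kafka's per-partition order; nothing orders records across partitions.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object OrderingSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("ordering-sketch"), Seconds(60))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder

    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("some-topic")) // placeholder topic

    stream.foreachRDD { rdd =>
      // Each RDD partition maps to exactly one Kafka topic-partition,
      // so iteration order inside a partition is Kafka's commit order.
      val offsets = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      offsets.foreach(o => println(s"${o.topic}/${o.partition}: ${o.fromOffset}-${o.untilOffset}"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```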

From: Prashant Bhardwaj [mailto:prashant2006s...@gmail.com]
Sent: Monday, December 07, 2015 5:46 PM
To: user@spark.apache.org
Subject: Spark and Kafka Integration

Hi

Some Background:
We have a Kafka cluster with ~45 topics. Some topics contain logs in JSON 
format and some in PSV (pipe-separated value) format. I want to consume these 
logs using Spark Streaming and store them in Parquet format in HDFS.

Now my question is:
1. Can we create an InputDStream per topic in the same application?

Since the schema of the logs may differ per topic, I want to process some 
topics differently, and store the logs in different output directories based on 
the topic name.
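One way to do this (a sketch, not the only approach — topic names, broker address, and paths here are made up) is to create one direct stream per group of topics that share a format, convert each batch to a DataFrame, and write it to a topic-specific Parquet directory:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-to-parquet"), Seconds(60))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder

    // Hypothetical topic names; one stream per format family.
    val jsonStream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("json-topic"))
    val psvStream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("psv-topic"))

    jsonStream.foreachRDD { rdd =>
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      // Message value (_._2) holds the JSON log line; schema is inferred.
      val df = sqlContext.read.json(rdd.map(_._2))
      df.write.mode("append").parquet("hdfs:///logs/json-topic")
    }

    psvStream.foreachRDD { rdd =>
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._
      // For PSV, split on '|' and name the columns you expect
      // (two illustrative fields here).
      val df = rdd.map(_._2.split('|'))
        .map(f => (f(0), f(1)))
        .toDF("ts", "message")
      df.write.mode("append").parquet("hdfs:///logs/psv-topic")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because each stream is its own DStream, each topic (or topic group) can get its own parsing logic and its own output directory within a single application.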

2. Also, how can we partition the logs based on timestamp?
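For timestamp partitioning, one common option is to derive a date column from the log's timestamp and let the DataFrame writer lay out the Parquet directories. A sketch, assuming the logs carry an epoch-seconds field named `epoch` (hypothetical):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, from_unixtime, to_date}

object TimestampPartitioning {
  // Adds a "date" column derived from the epoch timestamp and writes
  // Parquet partitioned by it, producing .../date=2015-12-07/ directories.
  def writePartitioned(df: DataFrame, outDir: String): Unit = {
    df.withColumn("date", to_date(from_unixtime(col("epoch"))))
      .write
      .partitionBy("date")
      .mode("append")
      .parquet(outDir)
  }
}
```

With `partitionBy`, later reads that filter on `date` only scan the matching directories.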

--
Regards
Prashant
