For Q2: the order of the logs within each partition is guaranteed, but there is 
no such thing as a global order across partitions.
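
To illustrate, here is a rough sketch in plain Python (no Kafka or Spark involved). The records, the round-robin consumption order, the "/logs" base directory and the dt=/hour= layout are all made up for the example. It shows that timestamps stay ordered inside each partition but not across them, which is why it is probably safer to bucket each record by its own timestamp than by arrival order.

```python
from datetime import datetime, timezone

# Hypothetical records: (partition, offset, event_time_epoch_seconds).
records = [
    (0, 0, 100), (0, 1, 130), (0, 2, 160),
    (1, 0, 90), (1, 1, 120), (1, 2, 150),
]

def interleave(recs):
    """One possible consumption order: round-robin across partitions by offset."""
    return sorted(recs, key=lambda r: (r[1], r[0]))

def is_sorted(values):
    return all(a <= b for a, b in zip(values, values[1:]))

def output_dir(topic, epoch_seconds, base="/logs"):
    """Derive a per-topic, per-hour directory from the record's own timestamp (UTC)."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{base}/{topic}/dt={t:%Y-%m-%d}/hour={t:%H}"

consumed = interleave(records)

# Group the consumed timestamps by partition, preserving consumption order.
per_partition = {}
for partition, _offset, ts in consumed:
    per_partition.setdefault(partition, []).append(ts)

partition_ordered = all(is_sorted(v) for v in per_partition.values())
globally_ordered = is_sorted([ts for _, _, ts in consumed])

print("ordered within each partition:", partition_ordered)
print("ordered across partitions:", globally_ordered)
print(output_dir("clicks", 0))
```

In a real job the same idea would likely be expressed by deriving date/hour columns from the log timestamp and partitioning the Parquet output on them, instead of relying on the order in which records arrive.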

From: Prashant Bhardwaj
Sent: Monday, December 07, 2015 5:46 PM
Subject: Spark and Kafka Integration


Some background:
We have a Kafka cluster with ~45 topics. Some of the topics contain logs in JSON 
format and some in PSV (pipe-separated values) format. I want to consume 
these logs using Spark Streaming and store them in Parquet format in HDFS.

Now my questions are:
1. Can we create an InputDStream per topic in the same application?

 Since the schema of the logs might differ for every topic, I want to process some 
topics differently.
I also want to store logs in different output directories based on the topic name.

2. Also, how can I partition the logs based on timestamp?

