Hi

Some Background:
We have a Kafka cluster with ~45 topics. Some topics contain logs in JSON
format and some in PSV (pipe-separated value) format. I want to consume
these logs with Spark Streaming and store them in Parquet format in HDFS.

Now my questions are:
1. Can we create an InputDStream per topic in the same application?

Since the schema of the logs may differ for every topic, I want to process
some topics differently. I also want to store the logs in different output
directories based on the topic name (there is a rough sketch of what I mean
below the second question).

2. Also, how do we partition the logs based on their timestamp?
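
To make question 1 concrete, here is roughly what I have in mind. This is
only a sketch; the topic names, broker address, output paths, and the
PsvLog schema are placeholders from my side:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToParquet {
  // Placeholder schema for one of the PSV topics
  case class PsvLog(host: String, level: String, message: String)

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToParquet"), Seconds(60))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // placeholder brokers

    // One direct stream per topic, so each topic keeps its own parsing logic
    val jsonStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("json-topic"))
    val psvStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("psv-topic"))

    // JSON topic: let Spark SQL infer the schema, write under a per-topic directory
    jsonStream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        sqlContext.read.json(rdd).write.mode("append").parquet("hdfs:///logs/json-topic")
      }
    }

    // PSV topic: split on '|', map to the case class, write under its own directory
    psvStream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        import sqlContext.implicits._
        rdd.map(_.split('|'))
          .collect { case Array(host, level, msg) => PsvLog(host, level, msg) }
          .toDF()
          .write.mode("append").parquet("hdfs:///logs/psv-topic")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Is creating several direct streams like this in one StreamingContext a
reasonable approach, or is there a better pattern?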
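
For question 2, the only idea I have so far is to derive a date column from
the record timestamp and use partitionBy when writing. This sketch assumes
the DataFrame (df) carries an epoch-seconds field that I am calling ts:

import org.apache.spark.sql.functions.{col, from_unixtime, to_date}

// Derive a DateType column from the assumed epoch-seconds field "ts"
val withDate = df.withColumn("date", to_date(from_unixtime(col("ts"))))
withDate.write.mode("append")
  .partitionBy("date")  // produces date=YYYY-MM-DD subdirectories
  .parquet("hdfs:///logs/json-topic")

Is that the recommended way, or should the directory layout be handled
differently?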

-- 
Regards
Prashant
