Hi,

Some background: we have a Kafka cluster with ~45 topics. Some of the topics contain logs in JSON format and some in PSV (pipe-separated values) format. I want to consume these logs with Spark Streaming and store them in Parquet format in HDFS.
Now my questions:

1. Can we create one InputDStream per topic within the same application? Since the log schema may differ from topic to topic, I want to process some topics differently, and I want to store the logs in a different output directory based on the topic name. A minimal sketch of what I have in mind is below.

2. How can I partition the stored logs based on their timestamp? See the second sketch after the first one.
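For question 1, here is a rough sketch of what I am attempting, using the Spark 1.x direct Kafka API. The broker addresses, topic names, and output paths are placeholders, and the per-topic parsing is elided:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-to-parquet"), Seconds(60))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

    // One direct stream per topic, so each topic can be parsed with its own
    // schema and written to its own output directory.
    val jsonLogs = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("json_topic")).map(_._2)
    val psvLogs = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("psv_topic")).map(_._2)

    jsonLogs.foreachRDD { rdd => /* parse JSON, write Parquet under /logs/json_topic */ }
    psvLogs.foreachRDD { rdd => /* split on '|', write Parquet under /logs/psv_topic */ }

    ssc.start()
    ssc.awaitTermination()
  }
}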
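For question 2, this is the kind of timestamp partitioning I am after, assuming each JSON record carries an epoch-seconds field (here hypothetically named "ts"). The idea is to derive a date column from the record's own timestamp and let the DataFrame writer lay out the Parquet files by it:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{col, from_unixtime, to_date}

jsonLogs.foreachRDD { rdd =>
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  val df = sqlContext.read.json(rdd)
  // Derive a date column from the log timestamp and partition on it, which
  // produces .../date=2015-10-28/ style directories under the topic path.
  df.withColumn("date", to_date(from_unixtime(col("ts"))))
    .write
    .mode("append")
    .partitionBy("date")
    .parquet("hdfs:///logs/json_topic")
}

I am not sure whether deriving the partition from the record timestamp like this, or from the batch time, is the better approach, so any guidance is appreciated.

--
Regards,
Prashant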