Hi all,

The requirement is to process a file using Spark Streaming, fed from a Kafka
topic, and once all the transformations are done, turn it into a static batch
DataFrame and pass it into Spark ML model tuning.

As of now, I have been doing it in the fashion below (rough sketch after the list) -

1) Read the file via Kafka
2) Consume it in Spark as a streaming DataFrame
3) Run Spark transformations on the streaming data
4) Append and write the results to HDFS
5) Read the transformed files back as a batch DataFrame in Spark
6) Run the Spark ML model
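
Roughly, the current flow looks like the sketch below (the broker address, topic
name, HDFS paths and the transformation are just placeholders for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.length

val spark = SparkSession.builder().appName("KafkaToML").getOrCreate()

// 1-2) Consume the Kafka topic as a streaming DataFrame
val streamDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // placeholder broker
  .option("subscribe", "input-topic")                  // placeholder topic
  .load()
  .selectExpr("CAST(value AS STRING) AS line")

// 3) Spark transformations on the streaming data (actual feature logic goes here)
val transformedDF = streamDF.withColumn("line_length", length(streamDF("line")))

// 4) Append the transformed stream to HDFS as parquet
val query = transformedDF.writeStream
  .format("parquet")
  .option("path", "hdfs:///data/transformed")                 // placeholder path
  .option("checkpointLocation", "hdfs:///data/checkpoints")   // placeholder path
  .outputMode("append")
  .start()

// 5-6) Later, read the written files back as a static DataFrame
// and run the ML model tuning on it
val batchDF = spark.read.parquet("hdfs:///data/transformed")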

But the requirement is to avoid HDFS, as it may not be installed on certain
clusters. So we have to avoid the disk I/O and, on the fly, append the data
coming from Kafka into a static Spark DataFrame, and then pass that DataFrame to
the ML model.
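
Conceptually, something like the in-memory sink from Structured Streaming seems
close to what I want (sketch below; the query name is just an example), but it
keeps all the accumulated rows in the driver's memory, so I'm not sure it is the
right approach for larger data:

// append every micro-batch into an in-memory table instead of HDFS
val query = transformedDF.writeStream
  .format("memory")
  .queryName("transformed_stream")   // example name for the in-memory table
  .outputMode("append")
  .start()

// once enough data has accumulated, grab it as a static DataFrame
// and pass it to the ML model tuning
val staticDF = spark.table("transformed_stream")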

How should I go about this?

Thanks,
Aakash.
