Hello Team,

I am planning to write the same data to two datasources at the same time, and I have two questions.

Scenario 1: Write the same DataFrame to both HDFS and MinIO without re-executing the transformations and without cache(). The job reads a Parquet file, applies a few transformations, and writes to HDFS and then to MinIO; for each of the two writes, Spark re-executes the whole transformation lineage. How can we avoid this re-execution without using cache()/persist()?

Scenario 2: Writing 3.2 GB of data to HDFS and MinIO takes ~6 minutes. Is there any way to make this write faster? I would like to avoid repartition(), since repartitioning carries shuffle overhead.

Please provide some inputs.
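For Scenario 2, the knobs I have been looking at on the MinIO side (instead of repartition) are the S3A upload and committer settings; the values below are placeholders I would still need to tune and benchmark for our cluster, not measured results:

```
# spark-defaults.conf -- endpoint and sizes are illustrative
spark.hadoop.fs.s3a.endpoint            http://minio:9000
spark.hadoop.fs.s3a.fast.upload         true
spark.hadoop.fs.s3a.multipart.size      128M
spark.hadoop.fs.s3a.threads.max         64
spark.hadoop.fs.s3a.connection.maximum  100
spark.hadoop.fs.s3a.committer.name      magic
```

The committer setting matters because the default file-output committer renames objects after upload, which is slow against an object store like MinIO.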
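For context, the workaround I am considering for Scenario 1 is to materialize the output once (write to HDFS first) and then copy the finished files to MinIO at the byte level (e.g. with `hadoop distcp` or MinIO's `mc mirror`), so the transformations execute only once. Below is a minimal stdlib Python sketch of that pattern; the local directories and file names are just stand-ins for the real HDFS and MinIO paths:

```python
import pathlib
import shutil
import tempfile

# Stand-ins for the two sinks; in practice these would be
# hdfs:// and s3a:// (MinIO) paths.
base = pathlib.Path(tempfile.mkdtemp())
hdfs_sink = base / "hdfs"
minio_sink = base / "minio"
hdfs_sink.mkdir()

# Step 1: run the (expensive) transformation ONCE and write the
# result to the first sink. The counter just proves it ran once.
transform_runs = 0

def transform(rows):
    global transform_runs
    transform_runs += 1
    return [r * 2 for r in rows]

result = transform(range(5))
(hdfs_sink / "part-00000.csv").write_text(",".join(map(str, result)))

# Step 2: byte-level copy of the finished output files to the second
# sink; no transformation is re-executed (the distcp / mc-mirror idea).
shutil.copytree(hdfs_sink, minio_sink)

print(transform_runs)  # -> 1: the lineage ran only once
print((minio_sink / "part-00000.csv").read_text())  # -> 0,2,4,6,8
```

The trade-off is an extra read/write of the materialized bytes, but for large outputs that is usually much cheaper than recomputing the full lineage a second time.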