Hi All,

I need to save a huge DataFrame as a Parquet file. Because it is so large, the write is taking several hours. To improve performance, I understand I should write it out group-wise (a sketch of what I mean follows).
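This is a minimal sketch in Scala of the group-wise write I am attempting; the "year" column, app name, and paths are placeholders for illustration only:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("GroupWiseParquetWrite")  // placeholder app name
      .getOrCreate()

    // The huge DataFrame; the input path is illustrative.
    val df = spark.read.parquet("/data/input/huge_table")

    // Group-wise write: one sub-directory per distinct value of "year".
    df.write
      .mode("overwrite")
      .partitionBy("year")
      .parquet("/data/output/huge_table")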
But with this partitionBy(columns*) / groupBy(columns*) approach, the driver spills a lot of data and performance suffers badly again. How should I handle this situation and save one group after another? A sample scenario of the same is described here:

https://stackoverflow.com/questions/54416623/how-to-group-dataframe-year-wise-and-iterate-through-groups-and-send-each-year-d

Any help is highly appreciated.

Thanks,
Shyam
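P.S. For reference, this is roughly the per-group loop from the linked scenario that I am trying to make work: collect the distinct group keys on the driver, then filter and write one group at a time. A sketch only, assuming an integer "year" column; names and paths are illustrative:

    // Collect the distinct years on the driver (assumed to be a small list).
    val years = df.select("year").distinct().collect().map(_.getInt(0))

    // Filter and write each year's data sequentially, one directory per year.
    years.foreach { y =>
      df.filter(df("year") === y)
        .write
        .mode("overwrite")
        .parquet(s"/data/output/huge_table/year=$y")
    }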