Hi All,
  I need to save a huge DataFrame as a Parquet file. Because it is huge, the
write takes several hours. To improve performance, I understand I should
write it out group-wise.

But when I use partitionBy(columns*) / groupBy(columns*), the driver spills
a lot of data and performance degrades badly again.

So how should I handle this situation and save one group after another?

A sample scenario of the same is described here:

https://stackoverflow.com/questions/54416623/how-to-group-dataframe-year-wise-and-iterate-through-groups-and-send-each-year-d
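Roughly, this is a minimal sketch of what I am attempting (the year column,
paths, and object name are placeholders based on the linked example, not my
real code):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object GroupWiseWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("GroupWiseWrite").getOrCreate()

    // Placeholder input; the real DataFrame is built elsewhere.
    val df = spark.read.parquet("/path/to/input")

    // What I tried first: partitionBy on the grouping column.
    // This is the step where I see heavy spilling.
    // df.write.partitionBy("year").parquet("/path/to/output")

    // What I would like instead: write one group after another.
    val years = df.select("year").distinct().collect().map(_.getAs[Int]("year"))
    for (y <- years) {
      df.filter(col("year") === y)
        .write
        .mode("append")
        .parquet(s"/path/to/output/year=$y")
    }

    spark.stop()
  }
}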

I would highly appreciate your help.

Thanks,
Shyam
