Kumar sp, collect() brings all the data represented by the RDD/DataFrame
into the memory of the single machine acting as the driver. You will
run out of memory if the underlying RDD/DataFrame represents a large volume
of data distributed across several machines.
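One common way to avoid building the Map on the driver is to keep the per-key counts distributed and join them back to the rows. In Spark that is typically written as df.groupBy(col).count() joined to df on that column, or reduceByKey followed by join on an RDD. The snippet below is not Spark code. It is a plain-Python model, assuming partitions are lists of (key, value) tuples, a CRC32 hash partitioner, and a hypothetical "further calculation" that divides each value by its key's count. It contrasts the collect().toMap pattern (one global map) with a per-partition count after a hash shuffle, where each bucket only ever holds counts for its own keys.

```python
import zlib
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, float]  # hypothetical (key, value) record


def shuffle_by_key(partitions: List[List[Row]], n: int) -> List[List[Row]]:
    """Hash-partition rows so all rows with the same key land in one bucket."""
    buckets: List[List[Row]] = [[] for _ in range(n)]
    for part in partitions:
        for key, value in part:
            buckets[zlib.crc32(key.encode()) % n].append((key, value))
    return buckets


def counts_joined_per_partition(partitions: List[List[Row]], n: int) -> List[List[Row]]:
    """Group-by-count and 'join' back, one bucket at a time.

    Each bucket builds a Counter for its own keys only, so no single
    process holds a map of every key (unlike collect().toMap)."""
    out: List[List[Row]] = []
    for bucket in shuffle_by_key(partitions, n):
        local_counts = Counter(key for key, _ in bucket)
        # Hypothetical further calculation: divide each value by its key's count.
        out.append([(key, value / local_counts[key]) for key, value in bucket])
    return out


def counts_via_driver_map(partitions: List[List[Row]]) -> List[List[Row]]:
    """The collect().toMap pattern: one global map built on a single machine."""
    counts = Counter(key for part in partitions for key, _ in part)
    return [[(key, value / counts[key]) for key, value in part] for part in partitions]
```

Both functions produce the same rows (possibly in a different partition layout); the difference is that the first never materializes all keys in one place, which is what lets the Spark equivalent scale past driver memory.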
If your data is huge even
I have a use case where I am using collect().toMap (grouping by a certain
column, computing the count, and creating a map keyed on that column) and
then using that map for some further calculations.
I am getting out-of-memory errors. Is there any alternative to
.collect() for creating a structure like a Map or some