Re: Error using .collect()

2019-05-13 Thread Shahab Yunus
Kumar sp, collect() brings all the data represented by the RDD/DataFrame into the memory of the single machine that is acting as the driver. You will run out of memory if the underlying RDD/DataFrame represents a large volume of data distributed across several machines. If your data is huge even
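
As a rough illustration of avoiding the driver-side copy, here is a minimal sketch that keeps the per-key counts as a distributed DataFrame and joins them back onto the data instead of looking them up in a local Map. The object name, the `withCounts` helper, and the `key`/`value` column names are hypothetical; the thread does not show the real schema or code, so treat this as one possible approach under those assumptions, not the poster's method.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DistributedCounts {
  // Hypothetical helper: adds a "count" column holding the number of rows per key.
  // The aggregation and the join both run on the executors; nothing is collected
  // to the driver.
  def withCounts(df: DataFrame, keyCol: String = "key"): DataFrame = {
    val counts = df.groupBy(keyCol).count() // columns: keyCol, count
    df.join(counts, Seq(keyCol))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("distributed-counts-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the real data.
    val events = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

    withCounts(events).show()
    spark.stop()
  }
}
```

Whether a join is practical depends on what the further calculations need. If the grouped result is known to be small, collecting it may still be acceptable.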

Error using .collect()

2019-05-13 Thread Kumar sp
I have a use case where I am using collect().toMap (grouping by a certain column, finding the count, and creating a map with a key) and then using that map to enable some further calculations. I am getting out-of-memory errors. Is there any alternative to .collect() for creating a structure like a Map or some
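
The post does not include code, so the following is only a guess at the shape of the pattern being described: group by a column, count, then pull every (key, count) pair to the driver as a Map. The object name, the `countsAsDriverMap` helper, and the `key` column are hypothetical, and the key column is assumed to be a string.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CollectToMapSketch {
  // Hypothetical helper: counts rows per key and materializes the result
  // as a Map in driver memory. Assumes keyCol holds strings.
  def countsAsDriverMap(df: DataFrame, keyCol: String): Map[String, Long] =
    df.groupBy(keyCol)
      .count()                                   // columns: keyCol, count
      .collect()                                 // every row is shipped to the driver here
      .map(row => row.getString(0) -> row.getLong(1))
      .toMap

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("collect-to-map-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the real data.
    val events = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

    println(countsAsDriverMap(events, "key"))
    spark.stop()
  }
}
```

With this shape, the driver has to hold one Map entry per distinct key, so memory use on the driver grows with the number of distinct keys.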