Collecting large dataset

Rishikesh Gawade Thu, 05 Sep 2019 11:23:11 -0700

Hi.
I have been trying to collect a large dataset (about 2 GB in size, 30
columns, more than a million rows) onto the driver. I am aware that
collecting such a large dataset is not recommended; however, the application
in which the Spark driver runs requires that data.
While collecting the DataFrame, the Spark job fails with the error
TaskResultLost (result lost from block manager).
I searched for solutions and set the following properties:
spark.blockManager.port, spark.driver.blockManager.port, and
spark.driver.maxResultSize (set to 0, i.e. unlimited). The application in
which the Spark driver runs also has a maximum heap size of 28 GB.
Yet the error still occurs.
There are 22 executors running in my cluster.
Is there a configuration setting or step that I am missing before collecting
data this large?
Alternatively, is there a more effective approach that would reliably collect
such large data without failure?
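
For reference, a minimal PySpark sketch of the driver-side settings listed
above, assuming the driver is created from Python. The port numbers and the
app name are hypothetical placeholders, since the values actually used are not
stated. The 28 GB heap is not shown because it belongs to the host
application's JVM rather than to these Spark properties.

```python
# Sketch of the settings described above (PySpark).
# ASSUMPTION: the port values and app name are hypothetical placeholders.
DRIVER_CONF = {
    # 0 removes the limit on the total size of serialized results
    # returned to the driver by an action such as collect().
    "spark.driver.maxResultSize": "0",
    # Fixed block manager ports (hypothetical values).
    "spark.blockManager.port": "40000",
    "spark.driver.blockManager.port": "40001",
}


def build_session(app_name="collect-large-dataset"):
    """Create (or reuse) a SparkSession with DRIVER_CONF applied."""
    # Imported lazily so DRIVER_CONF can be inspected without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in DRIVER_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In client mode the driver's JVM is already running when this code executes,
so its heap size has to come from the host application's own JVM options
rather than from a property set here.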


Thanks,
Rishikesh
