We have too many (large) files. We have about 30k partitions with about 4
years' worth of data, and we need to process the entire history in a one-time
monolithic job.
I would like to know how Spark decides the number of executors to request.
I've seen test cases where the max executor count is Integer's MAX_VALUE.
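From reading around, my rough understanding is below; this is only a sketch of
the idea (the names and the exact rounding are my assumptions, not Spark's
actual code): the target follows the task backlog and is only clamped by
maxExecutors, so with the default of Integer.MAX_VALUE a large backlog means a
huge request.
{code}
// Rough sketch (not Spark's code): dynamic allocation target derived from the
// task backlog, assuming one task per core, then clamped to the configured bounds.
object ExecutorTargetSketch {
  def targetExecutors(pendingTasks: Int,
                      runningTasks: Int,
                      coresPerExecutor: Int,
                      minExecutors: Int,
                      maxExecutors: Int): Int = {
    // Executors needed to run every pending + running task at once,
    // rounded up to whole executors.
    val needed = math.ceil((pendingTasks + runningTasks).toDouble / coresPerExecutor).toInt
    // With the default maxExecutors of Int.MaxValue this cap is effectively absent.
    math.min(math.max(needed, minExecutors), maxExecutors)
  }

  def main(args: Array[String]): Unit = {
    // ~30k partitions, 4 cores per executor
    println(targetExecutors(30000, 0, 4, 0, Int.MaxValue)) // 7500
    println(targetExecutors(30000, 0, 4, 0, 500))          // 500
  }
}
{code}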
Do you have too many small files you are trying to read? The number of
executors requested is very high.
On 24 Sep 2016 10:28, "Yash Sharma" wrote:
> Have been playing around with configs to crack this. Adding them here
> in case they're helpful to others :)
> The number of executors and the timeout seemed to be the core issues.
Hi Dhruve, thanks.
I've solved the issue by setting a max executors limit.
I'd like to find a place in Spark where I can add this behavior, so that
users don't have to worry about setting max executors themselves.
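For anyone following along, a minimal sketch of setting the same cap
programmatically instead of on the spark-submit command line (the app name is
hypothetical; the values are just the ones from this thread):
{code}
import org.apache.spark.sql.SparkSession

// Cap dynamic allocation so the allocator cannot ask for an executor
// per pending task.
val spark = SparkSession.builder()
  .appName("history-backfill") // hypothetical app name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.maxExecutors", "500")
  .getOrCreate()
{code}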
Cheers
- Thanks, via mobile, excuse brevity.
On Sep 24, 2016 1:15 PM, "dhruve ashar" wrote:
Is there anywhere I can help fix this?
I can see the requests being made in the YARN allocator. What should the
upper limit on the requests be?
https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L222
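For illustration, this is the kind of clamping I had in mind before the
resource requests go out; a hypothetical sketch with made-up names, not the
actual YarnAllocator fields:
{code}
object RequestCapSketch {
  // Hypothetical: how many new containers to ask YARN for, after capping
  // the driver's target at a configured maximum.
  def containersToRequest(targetNumExecutors: Int,
                          runningExecutors: Int,
                          pendingRequests: Int,
                          maxExecutors: Int): Int = {
    val cappedTarget = math.min(targetNumExecutors, maxExecutors)
    // Never request a negative number of containers.
    math.max(cappedTarget - runningExecutors - pendingRequests, 0)
  }
}
{code}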
On Sat, Sep 24, 2016 at
Have been playing around with configs to crack this. Adding them here in case
they're helpful to others :)
The number of executors and the timeout seemed to be the core issues.
{code}
--driver-memory 4G \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.maxExecutors=500 \