I have a Spark job with multiple stages. For now I start it with 100
executors, each with 12G of memory (the max is 16G). I am using Spark 1.3
on YARN 2.4.x.
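
For reference, a rough sketch of how I set it up (the app name and the job
logic are just placeholders; only the resource settings match what I
described above):

  import org.apache.spark.{SparkConf, SparkContext}

  // Resource settings as described above; the rest is illustrative.
  val conf = new SparkConf()
    .setAppName("multi-stage-job")            // placeholder name
    .set("spark.executor.instances", "100")   // 100 executors on YARN
    .set("spark.executor.memory", "12g")      // 12G per executor (max is 16G)
  val sc = new SparkContext(conf)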

For now I run the job with a very limited input (1 file of size 2G);
overall there are 200 files. My first run is yet to complete, as it is
taking too long and throwing OOM exceptions / buffer exceptions (keep that
aside).

How do I know how many resources are required to run this job (# of
cores, executors, memory, serialization buffers, and I don't yet know what
else)?

In the M/R world, all I do is set the split size and the rest is taken care
of automatically (yes, I still need to worry about memory in case of OOM).


1) Can someone explain how they do resource estimation before running a
job, or is there no way and one just has to try it out?
2) Even if I give it 100 executors, the first stage uses only 5. How did
Spark decide this?

Please point me to any resources that talk about this, or please explain
here.

-- 
Deepak
