Also, if not already done, you may want to try repartitioning your data to 50
partitions.
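As a sketch of what that looks like in the Scala API (the RDD name and input
path below are placeholders, not taken from your job):

    // Assumes a SparkContext `sc` is already in scope (e.g. in spark-shell).
    val lines = sc.textFile("hdfs:///path/to/input")   // placeholder path
    // Spread the ~350k records across 50 partitions, so up to 50 tasks
    // (and hence up to 50 cores) can work on this RDD in parallel:
    val repartitioned = lines.repartition(50)
    println(repartitioned.partitions.length)           // prints 50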
On 6 May 2015 05:56, Manu Kaul <manohar.k...@gmail.com> wrote:
Hi All,
For a job I am running on Spark with a dataset of, say, 350,000 lines (not
big), I am finding that even though my cluster has a large number of cores
available (around 100), Spark seems to stop after using just 4 cores, and
beyond that the runtime curve is pretty much a straight line.
Hi,
Do you have information on how many partitions/tasks the stage/job is
running? By default each task uses one core, so the number of concurrent
tasks is capped by the number of partitions, and that may be limiting your
core utilization.
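A quick way to check from the shell (a sketch; `rdd` here stands in for
whichever RDD feeds the slow stage):

    // A stage over this RDD runs one task per partition, so this is
    // also the maximum number of cores that stage can use at once:
    println(rdd.partitions.length)

You can also see the task count per stage in the Spark web UI (on the
driver, port 4040 by default).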
There are a few settings you could play with, assuming your issue is
related to the above:
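For illustration only, since the exact list isn't shown above, the standard
knobs in this area (in spark-defaults.conf form) would be something like:

    # Default number of partitions for shuffles and sc.parallelize:
    spark.default.parallelism   100
    # Upper bound on total cores the application may take
    # (standalone / coarse-grained Mesos mode):
    spark.cores.max             100
    # Cores given to each executor (also settable via --executor-cores):
    spark.executor.cores        4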