You need to debug further and find the bottleneck. Why are you doing
a collect? If the dataset is too large, collect will most likely hang the
driver machine, because it pulls every record into the driver's memory.
It would help if you could paste some sample code; without it, it's
really hard to understand the flow of your program.
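
As a rough sketch of what I mean by debugging further (I am assuming a
Scala job submitted with spark-submit and the elasticsearch-hadoop Spark
support on the classpath; the ES address and the "my-index/my-type"
resource are placeholders, not taken from your setup), you could check the
partition count and record count and pull only a small sample to the
driver before trying a full collect:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // elasticsearch-hadoop: adds esRDD to SparkContext

object CollectCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("collect-check")
      .set("es.nodes", "localhost:9200") // assumption: replace with your ES address
    val sc = new SparkContext(conf)

    // Placeholder index/type; substitute the resource you actually read.
    val rdd = sc.esRDD("my-index/my-type")

    // How many partitions (tasks) does the read produce?
    println(s"partitions: ${rdd.partitions.length}")

    // count() runs the same read but returns only a number to the driver.
    println(s"records: ${rdd.count()}")

    // take(n) brings back at most n records instead of the whole dataset.
    rdd.take(10).foreach(println)

    sc.stop()
  }
}
```

If count() and take(10) also stall on the same node, the problem is more
likely in the read or the scheduling than in the size of what collect
returns.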

Thanks
Best Regards

On Sun, Aug 16, 2015 at 1:14 PM, Sagi r <stsa...@gmail.com> wrote:

> Hi,
> I'm building a spark application in which I load some data from an
> Elasticsearch cluster (using latest elasticsearch-hadoop connector) and
> continue to perform some calculations on the spark cluster.
>
> In one case, I use collect on the RDD as soon as it is created (loaded from
> ES).
> However, it sometimes hangs on one (and sometimes more) node and doesn't
> continue.
> In the web UI, I can see that one node is stuck on scheduler delay and
> prevents the job from continuing
> (while the others have finished).
>
> Do you have any idea what is going on here?
>
> The data that is being loaded is fairly small, and only gets mapped once to
> domain objects before being collected.
>
> Thank you
