Re: Spark on EMR suddenly stalling

2018-01-02 Thread Gourav Sengupta
Hi Jeroen, in case you are using HIVE partitions how many partitions do you have? Also is there any chance that you might post the code? Regards, Gourav Sengupta On Tue, Jan 2, 2018 at 7:50 AM, Jeroen Miller wrote: > Hello Gourav, > > On 30 Dec 2017, at 20:20, Gourav

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Jeroen Miller
Hello Mans, On 1 Jan 2018, at 17:12, M Singh wrote: > I am not sure if I missed it - but can you let us know what is your input > source and output sink ? Reading from S3 and writing to S3. However the never-ending task 0.0 happens in a stage way before outputting

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Jeroen Miller
Hello Gourav, On 30 Dec 2017, at 20:20, Gourav Sengupta wrote: > Please try to use the SPARK UI from the way that AWS EMR recommends, it > should be available from the resource manager. I never ever had any problem > working with it. THAT HAS ALWAYS BEEN MY PRIMARY

Re: Spark on EMR suddenly stalling

2018-01-01 Thread M Singh
Hi Jeroen: I am not sure if I missed it - but can you let us know what is your input source and output sink? In some cases, I found that saving to S3 was a problem. In this case I started saving the output to the EMR HDFS and later copied it to S3 using s3-dist-cp, which solved our issue. Mans
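The workaround Mans describes (write to HDFS on the cluster, then copy to S3) could be sketched as a small helper that assembles the copy command. This is only an illustration: the HDFS path and bucket name are made up, and the helper merely builds the argument list for `s3-dist-cp` with its `--src` and `--dest` options. It would have to be run on the EMR cluster to actually copy anything.

```python
import shlex


def build_s3_dist_cp_command(hdfs_dir, s3_dir):
    """Return the argv list for copying a finished HDFS output directory to S3."""
    if not hdfs_dir.startswith("hdfs://"):
        raise ValueError("source should be an HDFS URI")
    if not s3_dir.startswith("s3://"):
        raise ValueError("destination should be an S3 URI")
    return ["s3-dist-cp", "--src", hdfs_dir, "--dest", s3_dir]


# Hypothetical paths, for illustration only.
cmd = build_s3_dist_cp_command("hdfs:///tmp/job-output", "s3://example-bucket/job-output")
print(" ".join(shlex.quote(part) for part in cmd))
```

The Spark job itself would then write to the HDFS path instead of S3, and the copy would run as a separate step once the job has finished.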

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Rohit Karlupia
Here is the list that I will probably try to fill: 1. Check GC on the offending executor when the task is running. Maybe you need even more memory. 2. Go back to some previous successful run of the job and check the Spark UI for the offending stage and check max task time/max

Re: Spark on EMR suddenly stalling

2017-12-30 Thread Gourav Sengupta
Hi, Please try to use the SPARK UI from the way that AWS EMR recommends, it should be available from the resource manager. I never ever had any problem working with it. THAT HAS ALWAYS BEEN MY PRIMARY AND SOLE SOURCE OF DEBUGGING. Sadly, I cannot be of much help unless we go for a screen share

Re: Spark on EMR suddenly stalling

2017-12-29 Thread Shushant Arora
You may have to recreate your cluster with the configuration below at EMR creation: "Configurations": [ { "Properties": { "maximizeResourceAllocation": "false" }, "Classification": "spark" } ] On
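As a rough sketch, the classification Shushant quotes could be generated programmatically and written out as JSON. The function name is hypothetical. The property value is kept as the string "false" because that is how it appears in the message above.

```python
import json


def spark_classification(maximize_resource_allocation):
    """Build the EMR 'spark' classification with maximizeResourceAllocation set."""
    return [
        {
            "Classification": "spark",
            "Properties": {
                # Kept as a string, matching the configuration quoted in the thread.
                "maximizeResourceAllocation": "true" if maximize_resource_allocation else "false"
            },
        }
    ]


configurations = spark_classification(False)
print(json.dumps(configurations, indent=2))
```

The resulting list could be saved to a file and supplied when the cluster is created, for example through the `--configurations` option of `aws emr create-cluster`.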

Re: Spark on EMR suddenly stalling

2017-12-29 Thread Jeroen Miller
On 28 Dec 2017, at 19:25, Patrick Alwell wrote: > Dynamic allocation is great; but sometimes I’ve found explicitly setting the > num executors, cores per executor, and memory per executor to be a better > alternative. No difference with spark.dynamicAllocation.enabled
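For readers unfamiliar with the alternative Patrick suggests, explicitly sizing executors instead of relying on dynamic allocation might look roughly like the following. The helper name and the numbers are purely illustrative assumptions, not values from the thread. It only assembles `spark-submit` arguments for a YARN deployment.

```python
def explicit_executor_args(num_executors, executor_cores, executor_memory):
    """Return spark-submit arguments that pin executor sizing on YARN."""
    return [
        # Turn dynamic allocation off so the fixed executor count is used.
        "--conf", "spark.dynamicAllocation.enabled=false",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
    ]


# Hypothetical sizing, for illustration only.
args = explicit_executor_args(num_executors=10, executor_cores=4, executor_memory="8g")
print(" ".join(["spark-submit"] + args))
```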

Fwd: Spark on EMR suddenly stalling

2017-12-29 Thread Jeroen Miller
Hello, Just a quick update, as I have not made much progress yet. On 28 Dec 2017, at 21:09, Gourav Sengupta wrote: > can you try to then use the EMR version 5.10 instead or EMR version 5.11 > instead? Same issue with EMR 5.11.0. Task 0 in one stage never finishes. >

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Gourav Sengupta
Hi Jeroen, can you try to then use the EMR version 5.10 instead or EMR version 5.11 instead? Can you please try selecting a subnet which is in a different availability zone? If possible, just try to increase the number of task instances and see the difference. Also, in case you are using caching,

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 19:42, Gourav Sengupta wrote: > In the EMR cluster what are the other applications that you have enabled > (like HIVE, FLUME, Livy, etc). Nothing that I can think of, just a Spark step (unless EMR is doing fancy stuff behind my back). > Are you

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 19:40, Maximiliano Felice wrote: > I experienced a similar issue a few weeks ago. The situation was a result of > a mix of speculative execution and OOM issues in the container. Interesting! However I don't have any OOM exception in the logs.

Fwd: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 19:25, Patrick Alwell wrote: > You are using groupByKey() have you thought of an alternative like > aggregateByKey() or combineByKey() to reduce shuffling? I am aware of this indeed. I do have a groupByKey() that is difficult to avoid, but the
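To illustrate why `aggregateByKey()` or `combineByKey()` can reduce shuffling compared with `groupByKey()`, here is a small pure-Python model. It does not use the Spark API. The function names and the sample data are assumptions made for the sketch. It only counts how many records would cross the shuffle boundary with and without combining values inside each partition first.

```python
from collections import defaultdict


def shuffle_without_combine(partitions):
    """groupByKey-like: every (key, value) record crosses the shuffle."""
    return [record for part in partitions for record in part]


def shuffle_with_combine(partitions, zero, seq_op):
    """aggregateByKey-like: fold values per key inside each partition first,
    so at most one record per key and partition crosses the shuffle."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = seq_op(local.get(key, zero), value)
        shuffled.extend(local.items())
    return shuffled


def merge(shuffled, comb_op):
    """Reduce side: merge the partial results that arrived for each key."""
    result = {}
    for key, partial in shuffled:
        result[key] = comb_op(result[key], partial) if key in result else partial
    return result


# Two hypothetical input partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

raw = shuffle_without_combine(partitions)
combined = shuffle_with_combine(partitions, 0, lambda acc, v: acc + v)

# Summing after grouping gives the same answer as merging the partial sums.
grouped = defaultdict(list)
for key, value in raw:
    grouped[key].append(value)
sums_from_groups = {key: sum(values) for key, values in grouped.items()}
sums_from_aggregate = merge(combined, lambda x, y: x + y)

print(len(raw), len(combined))
print(sums_from_groups == sums_from_aggregate)
```

The saving only materialises when the per-key operation can be expressed as an incremental fold. When the full group of values is genuinely needed, as Jeroen suggests is the case for his job, this rewrite may not apply.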

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Gourav Sengupta
Hi Jeroen, Can I get a few pieces of additional information please? In the EMR cluster what are the other applications that you have enabled (like HIVE, FLUME, Livy, etc). Are you using SPARK Session? If yes, is your application using cluster mode or client mode? Have you read the EC2 service

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Maximiliano Felice
Hi Jeroen, I experienced a similar issue a few weeks ago. The situation was a result of a mix of speculative execution and OOM issues in the container. First of all, when an executor takes too much time in Spark, it is handled by the YARN speculative execution, which will launch a new executor

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Patrick Alwell
Jeroen, Anytime there is a shuffle in the network, Spark moves to a new stage. It seems like you are having issues either pre- or post-shuffle. Have you looked at a resource management tool like Ganglia to determine if this is a memory- or thread-related issue? Or at the Spark UI? You are using

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 17:41, Richard Qiao wrote: > Are you able to specify which path of data filled up? I can narrow it down to a bunch of files but it's not so straightforward. > Any logs not rolled over? I have to manually terminate the cluster but there is nothing

Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
Dear Sparkers, Once again in times of desperation, I leave what remains of my mental sanity to this wise and knowledgeable community. I have a Spark job (on EMR 5.8.0) which had been running daily for months, if not the whole year, with absolutely no supervision. This changed all of a sudden