Re: Spark on EMR suddenly stalling
Hello Mans,

On 1 Jan 2018, at 17:12, M Singh wrote:
> I am not sure if I missed it - but can you let us know what is your input
> source and output sink?

Reading from S3 and writing to S3. However, the never-ending task 0.0 occurs in a stage well before anything is written to S3.

Regards,
Jeroen

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Spark on EMR suddenly stalling
Hello Gourav,

On 30 Dec 2017, at 20:20, Gourav Sengupta wrote:
> Please try to use the SPARK UI from the way that AWS EMR recommends, it
> should be available from the resource manager. I never ever had any problem
> working with it.

THAT HAS ALWAYS BEEN MY PRIMARY AND SOLE SOURCE OF DEBUGGING. For some reason, sometimes there is absolutely nothing showing up in the Spark UI, or the UI is stale: e.g. it reports stage #x as current while the logs show that stage #y (with y > x) is already under way. It may very well be that the source of this problem lies between the keyboard and the chair, but if that is the case, I do not know how to solve it.

> Also, I ALWAYS prefer the maximize Resource Allocation setting in EMR to be
> set to true.

Thanks for the tip -- I will try this setting in my next batch of experiments!

JM

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
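[For readers following along: the setting discussed above is the maximizeResourceAllocation property of EMR's "spark" configuration classification, passed when creating the cluster. This is a sketch of the standard classification form -- verify against the EMR documentation for your release:]

```json
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]
```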
Re: Custom line/record delimiter
Thanks for the update, Kwon.

Regards,

On Mon, Jan 1, 2018 at 7:54 PM Hyukjin Kwon wrote:
> Hi,
>
> There's a PR - https://github.com/apache/spark/pull/18581 and JIRA
> - SPARK-21289
>
> Alternatively, you could check out the multiLine option for CSV and see if
> it is applicable.
>
> Thanks.
>
> 2017-12-30 2:19 GMT+09:00 sk skk:
>
>> Hi,
>>
>> Do we have an option to write a csv or text file with a custom
>> record/line separator through Spark?
>>
>> I could not find any reference in the API. I have an issue while loading
>> data into a warehouse, as one of the columns in the CSV has a newline
>> character and the warehouse does not allow escaping that newline
>> character.
>>
>> Thank you,
>> Sk
Re: Custom line/record delimiter
Hi,

There's a PR - https://github.com/apache/spark/pull/18581 and JIRA - SPARK-21289

Alternatively, you could check out the multiLine option for CSV and see if it is applicable.

Thanks.

2017-12-30 2:19 GMT+09:00 sk skk:
> Hi,
>
> Do we have an option to write a csv or text file with a custom record/line
> separator through Spark?
>
> I could not find any reference in the API. I have an issue while loading
> data into a warehouse, as one of the columns in the CSV has a newline
> character and the warehouse does not allow escaping that newline character.
>
> Thank you,
> Sk
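[A minimal plain-Python illustration of the underlying problem (the column names are made up): a quoted CSV field may legitimately contain a newline, which is exactly why a multi-line-aware parser -- Spark's multiLine option, or Python's csv module below -- is needed to read such files back, and why a naive line-oriented loader chokes on them.]

```python
import csv
import io

# A record whose "comment" field contains an embedded newline.
rows = [["id", "comment"], ["1", "line one\nline two"]]

# Writing: the csv module quotes the field, so the newline stays inside quotes.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Reading it back requires a parser that tolerates newlines inside quoted
# fields -- the same reason Spark needs multiLine for such CSV files.
parsed = list(csv.reader(io.StringIO(text)))
assert parsed == rows

# A pragmatic workaround when the downstream warehouse cannot cope at all:
# replace embedded newlines with a placeholder before writing.
safe = [[cell.replace("\n", "\\n") for cell in row] for row in rows]
```

[Replacing embedded newlines before writing, as in `safe` above, is a common workaround when the warehouse cannot be configured to accept quoted multi-line fields.]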
mesos cluster dispatcher
Hi,

I would like an opinion on using the *mesos cluster dispatcher*. It worked for me on a two-machine Vagrant setup (i.e. a Mesos master and a slave). Is it better to start the Spark driver using Marathon instead of the dispatcher? The --supervise option can become a pain, as you cannot stop the driver. Please share your experience if you use the dispatcher in production.

P.S.: I did see other discussions on this topic, but they are slightly older.

Thanks

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
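[For context, submission through the dispatcher looks roughly like the sketch below; the dispatcher host, application class, jar path, and driver id are all placeholders. Note that a supervised driver is stopped through the dispatcher (spark-submit --kill), not on the node where it happens to be running:]

```shell
# Start the dispatcher on some node (it listens on port 7077 by default).
./sbin/start-mesos-dispatcher.sh --master mesos://mesos-master:5050

# Submit the driver in cluster mode through the dispatcher; --supervise
# makes the dispatcher restart the driver on failure.
./bin/spark-submit \
  --master mesos://dispatcher-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyApp \
  hdfs:///jars/my-app.jar

# Stopping a supervised driver must go through the dispatcher:
./bin/spark-submit --master mesos://dispatcher-host:7077 \
  --kill driver-20180101000000-0001
```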
Re: Spark on EMR suddenly stalling
Hi Jeroen:

I am not sure if I missed it - but can you let us know what is your input source and output sink? In some cases, I found that saving to S3 was a problem. In those cases I started saving the output to the EMR HDFS and later copied it to S3 using s3-dist-cp, which solved our issue.

Mans

On Monday, January 1, 2018 7:41 AM, Rohit Karlupia wrote:
> Here is the list that I will probably try to fill:
>
> - Check GC on the offending executor when the task is running. Maybe you
>   need even more memory.
> - Go back to some previous successful run of the job, check the Spark UI
>   for the offending stage, and check max task time / max input / max
>   shuffle in/out for the largest task. This will help you understand the
>   degree of skew in this stage.
> - Take a thread dump of the executor from the Spark UI and verify whether
>   the task is really doing any work or is stuck in some deadlock. Some of
>   the Hive SerDes are not really usable from multi-threaded/multi-use
>   Spark executors.
> - Take a thread dump of the executor from the Spark UI and verify whether
>   the task is spilling to disk. Playing with the storage and memory
>   fractions, or generally increasing the memory, will help.
> - Check the disk utilisation on the machine running the executor.
> - Look for event loss messages in the logs due to the event queue being
>   full. Loss of events can send some of the Spark components into really
>   bad states.
>
> thanks,
> rohitk
>
> On Sun, Dec 31, 2017 at 12:50 AM, Gourav Sengupta wrote:
>> Hi,
>>
>> Please try to use the SPARK UI from the way that AWS EMR recommends, it
>> should be available from the resource manager. I never ever had any
>> problem working with it. THAT HAS ALWAYS BEEN MY PRIMARY AND SOLE SOURCE
>> OF DEBUGGING.
>>
>> Sadly, I cannot be of much help unless we go for a screen share session
>> over google chat or skype.
>>
>> Also, I ALWAYS prefer the maximize Resource Allocation setting in EMR to
>> be set to true.
>>
>> Besides that, there is a metric in the EMR console which shows on graphs
>> the number of containers generated by your job.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Fri, Dec 29, 2017 at 6:23 PM, Jeroen Miller wrote:
>>> Hello,
>>>
>>> Just a quick update as I did not make much progress yet.
>>>
>>> On 28 Dec 2017, at 21:09, Gourav Sengupta wrote:
>>> > can you try to then use the EMR version 5.10 instead or EMR version
>>> > 5.11 instead?
>>>
>>> Same issue with EMR 5.11.0. Task 0 in one stage never finishes.
>>>
>>> > can you please try selecting a subnet which is in a different
>>> > availability zone?
>>>
>>> I did not try this yet. But why should that make a difference?
>>>
>>> > if possible just try to increase the number of task instances and see
>>> > the difference?
>>>
>>> I tried with 512 partitions -- no difference.
>>>
>>> > also in case you are using caching,
>>>
>>> No caching used.
>>>
>>> > Also can you please report the number of containers that your job is
>>> > creating by looking at the metrics in the EMR console?
>>>
>>> 8 containers, if I trust the directories in
>>> j-xxx/containers/application_xxx/.
>>>
>>> > Also if you see the spark UI then you can easily see which particular
>>> > step is taking the longest period of time - you just have to drill in
>>> > a bit in order to see that. Generally in case shuffling is an issue
>>> > then it definitely appears in the SPARK UI as I drill into the steps
>>> > and see which particular one is taking the longest.
>>>
>>> I always have issues with the Spark UI on EC2 -- it never seems to be up
>>> to date.
>>>
>>> JM
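[The HDFS-then-copy workaround Mans describes can be sketched as follows; the HDFS path and S3 bucket are placeholders:]

```shell
# 1. Have the Spark job write to the cluster's HDFS instead of S3, e.g.:
#    df.write.parquet("hdfs:///tmp/job-output")

# 2. Once the job has finished, bulk-copy the output to S3 with s3-dist-cp,
#    which runs as a distributed copy job on the EMR cluster:
s3-dist-cp --src hdfs:///tmp/job-output \
           --dest s3://my-bucket/job-output
```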
Re: Spark on EMR suddenly stalling
Here is the list that I will probably try to fill:

1. Check GC on the offending executor when the task is running. Maybe you need even more memory.
2. Go back to some previous successful run of the job, check the Spark UI for the offending stage, and check max task time / max input / max shuffle in/out for the largest task. This will help you understand the degree of skew in this stage.
3. Take a thread dump of the executor from the Spark UI and verify whether the task is really doing any work or is stuck in some deadlock. Some of the Hive SerDes are not really usable from multi-threaded/multi-use Spark executors.
4. Take a thread dump of the executor from the Spark UI and verify whether the task is spilling to disk. Playing with the storage and memory fractions, or generally increasing the memory, will help.
5. Check the disk utilisation on the machine running the executor.
6. Look for event loss messages in the logs due to the event queue being full. Loss of events can send some of the Spark components into really bad states.

thanks,
rohitk

On Sun, Dec 31, 2017 at 12:50 AM, Gourav Sengupta wrote:
> Hi,
>
> Please try to use the SPARK UI from the way that AWS EMR recommends, it
> should be available from the resource manager. I never ever had any problem
> working with it. THAT HAS ALWAYS BEEN MY PRIMARY AND SOLE SOURCE OF
> DEBUGGING.
>
> Sadly, I cannot be of much help unless we go for a screen share session
> over google chat or skype.
>
> Also, I ALWAYS prefer the maximize Resource Allocation setting in EMR to
> be set to true.
>
> Besides that, there is a metric in the EMR console which shows on graphs
> the number of containers generated by your job.
>
> Regards,
> Gourav Sengupta
>
> On Fri, Dec 29, 2017 at 6:23 PM, Jeroen Miller wrote:
>
>> Hello,
>>
>> Just a quick update as I did not make much progress yet.
>>
>> On 28 Dec 2017, at 21:09, Gourav Sengupta wrote:
>> > can you try to then use the EMR version 5.10 instead or EMR version
>> > 5.11 instead?
>>
>> Same issue with EMR 5.11.0. Task 0 in one stage never finishes.
>>
>> > can you please try selecting a subnet which is in a different
>> > availability zone?
>>
>> I did not try this yet. But why should that make a difference?
>>
>> > if possible just try to increase the number of task instances and see
>> > the difference?
>>
>> I tried with 512 partitions -- no difference.
>>
>> > also in case you are using caching,
>>
>> No caching used.
>>
>> > Also can you please report the number of containers that your job is
>> > creating by looking at the metrics in the EMR console?
>>
>> 8 containers, if I trust the directories in
>> j-xxx/containers/application_xxx/.
>>
>> > Also if you see the spark UI then you can easily see which particular
>> > step is taking the longest period of time - you just have to drill in a
>> > bit in order to see that. Generally in case shuffling is an issue then
>> > it definitely appears in the SPARK UI as I drill into the steps and see
>> > which particular one is taking the longest.
>>
>> I always have issues with the Spark UI on EC2 -- it never seems to be up
>> to date.
>>
>> JM
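[Steps 1 and 3-5 of the checklist above can also be carried out from a shell on the executor host; a rough sketch, assuming SSH access to the node (the jps pattern and sampling intervals are illustrative, and jstat/jstack ship with the JDK):]

```shell
# Find the executor JVM's PID (the main class name can vary by deploy mode).
PID=$(jps | awk '/CoarseGrainedExecutorBackend/ {print $1}')

# Step 1 -- GC activity: heap-generation utilisation and GC time, every 5 s.
jstat -gcutil "$PID" 5000

# Steps 3-4 -- thread dump: look for threads BLOCKED/WAITING on the same lock.
jstack "$PID" > executor-threads.txt

# Step 5 -- disk utilisation on the executor machine.
df -h
iostat -x 5 3
```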