Re: Spark on EMR suddenly stalling

2018-01-01 Thread Jeroen Miller
Hello Mans,

On 1 Jan 2018, at 17:12, M Singh wrote:
> I am not sure if I missed it - but can you let us know what is your input
> source and output sink ?

Reading from S3 and writing to S3. However, the never-ending task 0.0 happens in a stage way before outputting

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Jeroen Miller
Hello Gourav,

On 30 Dec 2017, at 20:20, Gourav Sengupta wrote:
> Please try to use the SPARK UI from the way that AWS EMR recommends, it
> should be available from the resource manager. I never ever had any problem
> working with it. THAT HAS ALWAYS BEEN MY PRIMARY

Re: Custom line/record delimiter

2018-01-01 Thread sk skk
Thanks for the update, Kwon.

Regards,

On Mon, Jan 1, 2018 at 7:54 PM Hyukjin Kwon wrote:
> Hi,
>
> There's a PR - https://github.com/apache/spark/pull/18581 and JIRA
> - SPARK-21289
>
> Alternatively, you could check out multiLine option for CSV and see if
> applicable.

Re: Custom line/record delimiter

2018-01-01 Thread Hyukjin Kwon
Hi,

There's a PR (https://github.com/apache/spark/pull/18581) and a JIRA (SPARK-21289) for this. Alternatively, you could check out the multiLine option for CSV and see whether it applies.

Thanks.

2017-12-30 2:19 GMT+09:00 sk skk :
> Hi,
>
> Do we have an option to write a csv or text

mesos cluster dispatcher

2018-01-01 Thread puneetloya
Hi,

I would like an opinion on using the *mesos cluster dispatcher*. It worked for me on a two-machine Vagrant setup (i.e. a Mesos master and a slave). Is it better to start the Spark driver using Marathon instead of the dispatcher? The --supervise option can become a pain, as you cannot stop the driver. Please

Re: Spark on EMR suddenly stalling

2018-01-01 Thread M Singh
Hi Jeroen,

I am not sure if I missed it, but can you let us know what your input source and output sink are?

In some cases, I found that saving to S3 was a problem. In those cases I started saving the output to the EMR cluster's HDFS and later copied it to S3 using s3-dist-cp, which solved our issue.

Mans

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Rohit Karlupia
Here is the list of things I would probably try:
1. Check GC on the offending executor while the task is running. Maybe you need even more memory.
2. Go back to some previous successful run of the job, check the Spark UI for the offending stage, and check max task time/max
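For step 1 in Rohit's list, one way to see GC activity on an executor is to turn on the JVM's GC logging through `spark.executor.extraJavaOptions`, using the flags the Spark tuning guide suggests. A minimal `spark-defaults.conf` sketch, assuming a Java 8 executor JVM (these `-XX` print flags were replaced by `-Xlog:gc` in Java 9 and later). If the cluster already sets this property, append the flags to the existing value rather than replacing it:

```
spark.executor.extraJavaOptions  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```

The GC log lines go to each executor's stdout, which on YARN is typically reachable from the Executors tab of the Spark UI; the per-task "GC Time" column on the stage page gives a quicker first look.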
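For step 2, comparing a stage across a good run and the stalled run mostly comes down to how far the slowest task sits from the typical one. The sketch below assumes you have copied the task durations (in ms) for the same stage from each run's stage page by hand; the `SKEW_RATIO` cut-off and the sample numbers are hypothetical, not taken from the thread.

```python
import statistics

# Hypothetical cut-off (not from the thread): treat a stage as skewed when its
# slowest task runs more than this many times longer than the median task.
SKEW_RATIO = 3.0


def stage_summary(durations_ms):
    """Min / median / max task duration, similar to the stage page's summary row."""
    if not durations_ms:
        raise ValueError("need at least one task duration")
    median = statistics.median(durations_ms)
    longest = max(durations_ms)
    if median == 0:
        # All-zero stages are not skewed; a zero median with a long tail is.
        ratio = 0.0 if longest == 0 else float("inf")
    else:
        ratio = longest / median
    return {"min": min(durations_ms), "median": median, "max": longest,
            "max_over_median": ratio}


def looks_skewed(durations_ms, ratio=SKEW_RATIO):
    """True when the slowest task is an outlier relative to the median task."""
    return stage_summary(durations_ms)["max_over_median"] > ratio


if __name__ == "__main__":
    # Hypothetical durations (ms) for the same stage in two runs.
    previous_run = [1200, 1300, 1250, 1400]
    stalled_run = [1200, 1300, 1250, 9000]
    for name, run in (("previous", previous_run), ("stalled", stalled_run)):
        print(name, stage_summary(run), looks_skewed(run))
```

Using the median rather than the mean keeps a single stuck task (like the never-ending task 0.0 above) from dragging the baseline up with it.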