Spark 2.2.1 EMR 5.11.1 Encrypted S3 bucket overwriting parquet file

2018-02-13 Thread Stephen Robinson
Hi All, I am using the latest version of EMR to overwrite Parquet files in an S3 bucket encrypted with a KMS key. I am seeing the attached error whenever I overwrite a Parquet file. For example, the code below produces the attached error and stacktrace:

Re: Spark on EMR suddenly stalling

2018-01-02 Thread Gourav Sengupta
Hi Jeroen, in case you are using HIVE partitions how many partitions do you have? Also is there any chance that you might post the code? Regards, Gourav Sengupta On Tue, Jan 2, 2018 at 7:50 AM, Jeroen Miller wrote: > Hello Gourav, > > On 30 Dec 2017, at 20:20, Gourav

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Jeroen Miller
Hello Mans, On 1 Jan 2018, at 17:12, M Singh wrote: > I am not sure if I missed it - but can you let us know what is your input > source and output sink ? Reading from S3 and writing to S3. However the never-ending task 0.0 happens in a stage way before outputting

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Jeroen Miller
Hello Gourav, On 30 Dec 2017, at 20:20, Gourav Sengupta wrote: > Please try to use the SPARK UI from the way that AWS EMR recommends, it > should be available from the resource manager. I never ever had any problem > working with it. THAT HAS ALWAYS BEEN MY PRIMARY

Re: Spark on EMR suddenly stalling

2018-01-01 Thread M Singh
Hi Jeroen: I am not sure if I missed it, but can you let us know what your input source and output sink are? In some cases, I found that saving to S3 was a problem. In that case I started saving the output to the EMR HDFS and later copied it to S3 using s3-dist-cp, which solved our issue. Mans
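
The workaround Mans describes (write to HDFS first, then copy to S3 with s3-dist-cp) could be scripted roughly as below. This is a minimal sketch, not code from the thread: the helper name and both paths are hypothetical, and it only builds the command line using the tool's `--src` and `--dest` options. It would still have to be run on the cluster, for example as an EMR step.

```python
import shlex


def s3_dist_cp_command(hdfs_src, s3_dest):
    """Build the argument list for copying job output from HDFS to S3."""
    return ["s3-dist-cp", "--src", hdfs_src, "--dest", s3_dest]


# Hypothetical paths, for illustration only.
cmd = s3_dist_cp_command("hdfs:///output/run-1", "s3://example-bucket/output/run-1")

# Print a shell-safe version of the command.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The Spark job itself then writes only to the `hdfs:///` path, so the S3 transfer happens as a separate step after the job has finished.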

Re: Spark on EMR suddenly stalling

2018-01-01 Thread Rohit Karlupia
Here is the list that I will probably try to fill: 1. Check GC on the offending executor when the task is running. Maybe you need even more memory. 2. Go back to some previous successful run of the job and check the spark ui for the offending stage and check max task time/max

Re: Spark on EMR suddenly stalling

2017-12-30 Thread Gourav Sengupta
Hi, Please try to use the SPARK UI from the way that AWS EMR recommends, it should be available from the resource manager. I never ever had any problem working with it. THAT HAS ALWAYS BEEN MY PRIMARY AND SOLE SOURCE OF DEBUGGING. Sadly, I cannot be of much help unless we go for a screen share

Re: Spark on EMR suddenly stalling

2017-12-29 Thread Shushant Arora
You may have to recreate your cluster with the configuration below at EMR creation: "Configurations": [ { "Properties": { "maximizeResourceAllocation": "false" }, "Classification": "spark" } ] On
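
For readers who generate cluster definitions from a script, the classification quoted above can be built and printed as JSON. This is only a sketch: the function name is hypothetical, and how the resulting list is passed to EMR (console, CLI, or SDK) is left out.

```python
import json


def spark_classification(maximize_resource_allocation):
    """Return an EMR 'spark' classification as a list of configuration objects."""
    value = "true" if maximize_resource_allocation else "false"
    return [
        {
            "Classification": "spark",
            # EMR expects property values as strings, not booleans.
            "Properties": {"maximizeResourceAllocation": value},
        }
    ]


configurations = spark_classification(False)
print(json.dumps(configurations, indent=2))
```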

Re: Spark on EMR suddenly stalling

2017-12-29 Thread Jeroen Miller
On 28 Dec 2017, at 19:25, Patrick Alwell wrote: > Dynamic allocation is great; but sometimes I’ve found explicitly setting the > num executors, cores per executor, and memory per executor to be a better > alternative. No difference with spark.dynamicAllocation.enabled
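
Patrick's suggestion of fixing the executor count, cores, and memory explicitly can be expressed with standard Spark properties. The sketch below only assembles `--conf` arguments for spark-submit. The helper names and the sample numbers are assumptions, not values from the thread.

```python
def executor_conf(num_executors, cores, memory):
    """Spark properties that pin executor resources instead of using dynamic allocation."""
    return {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": str(num_executors),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": memory,
    }


def to_submit_args(conf):
    """Turn a property dict into a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical sizing, for illustration only.
submit_args = to_submit_args(executor_conf(10, 4, "8g"))
print(" ".join(submit_args))
```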

Fwd: Spark on EMR suddenly stalling

2017-12-29 Thread Jeroen Miller
Hello, Just a quick update as I have not made much progress yet. On 28 Dec 2017, at 21:09, Gourav Sengupta wrote: > can you try to then use the EMR version 5.10 instead or EMR version 5.11 > instead? Same issue with EMR 5.11.0. Task 0 in one stage never finishes. >

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Gourav Sengupta
, Jeroen Miller <bluedasya...@gmail.com> wrote: > On 28 Dec 2017, at 19:42, Gourav Sengupta <gourav.sengu...@gmail.com> > wrote: > > In the EMR cluster what are the other applications that you have enabled > (like HIVE, FLUME, Livy, etc). > > Nothing that I can think

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 19:42, Gourav Sengupta <gourav.sengu...@gmail.com> wrote: > In the EMR cluster what are the other applications that you have enabled > (like HIVE, FLUME, Livy, etc). Nothing that I can think of, just a Spark step (unless EMR is doing fancy stuff behind my back)

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 19:40, Maximiliano Felice wrote: > I experienced a similar issue a few weeks ago. The situation was a result of > a mix of speculative execution and OOM issues in the container. Interesting! However I don't have any OOM exception in the logs.

Fwd: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 19:25, Patrick Alwell wrote: > You are using groupByKey() have you thought of an alternative like > aggregateByKey() or combineByKey() to reduce shuffling? I am aware of this indeed. I do have a groupByKey() that is difficult to avoid, but the
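
The reason aggregateByKey() or combineByKey() can shuffle less than groupByKey() is that values are combined inside each partition before anything crosses the network. The sketch below imitates those two phases in plain Python. It is not the Spark API, and the function name and sample data are made up for illustration.

```python
def aggregate_by_key(partitions, zero, seq_op, comb_op):
    """Imitate a map-side-combining aggregation over a list of partitions.

    partitions: list of lists of (key, value) pairs.
    zero must be immutable (or otherwise safe to share), as it is reused per key.
    """
    # Phase 1: fold values into one accumulator per key within each partition.
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = seq_op(acc.get(key, zero), value)
        partials.append(acc)

    # Phase 2: only one accumulator per key per partition is merged here,
    # which stands in for the data that would cross the shuffle.
    merged = {}
    for acc in partials:
        for key, partial in acc.items():
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


partitions = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]
sums = aggregate_by_key(partitions, 0, lambda acc, v: acc + v, lambda x, y: x + y)
print(sums)
```

With groupByKey() all five records would be shuffled. Here only four partial sums are merged, and the gap widens as keys repeat more often within a partition.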

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Gourav Sengupta
job in that? Regards, Gourav Sengupta On Thu, Dec 28, 2017 at 4:06 PM, Jeroen Miller <bluedasya...@gmail.com> wrote: > Dear Sparkers, > > Once again in times of desperation, I leave what remains of my mental > sanity to this wise and knowledgeable community. > > I have a

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Maximiliano Felice
Hi Jeroen, I experienced a similar issue a few weeks ago. The situation was a result of a mix of speculative execution and OOM issues in the container. First of all, when an executor takes too much time in Spark, it is handled by the YARN speculative execution, which will launch a new executor

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Patrick Alwell
Jeroen, Anytime there is a shuffle in the network, Spark moves to a new stage. It seems like you are having issues either pre or post shuffle. Have you looked at a resource management tool like ganglia to determine if this is a memory or thread related issue? The spark UI? You are using

Re: Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
On 28 Dec 2017, at 17:41, Richard Qiao wrote: > Are you able to specify which path of data filled up? I can narrow it down to a bunch of files but it's not so straightforward. > Any logs not rolled over? I have to manually terminate the cluster but there is nothing

Spark on EMR suddenly stalling

2017-12-28 Thread Jeroen Miller
Dear Sparkers, Once again in times of desperation, I leave what remains of my mental sanity to this wise and knowledgeable community. I have a Spark job (on EMR 5.8.0) which had been running daily for months, if not the whole year, with absolutely no supervision. This changed all of a sudden

Re: Running Spark on EMR

2017-01-16 Thread Everett Anderson
- >> From: Andrew Holway <andrew.hol...@otternetworks.de> >> Date: 1/15/17 11:37 AM (GMT-05:00) >> To: Marco Mistroni <mmistr...@gmail.com> >> Cc: Neil Jonkers <neilod...@gmail.com>, User <user@spark.apache.org> >> Subject: Re: Running Spark o

Re: Running Spark on EMR

2017-01-15 Thread Andrew Holway
w Holway <andrew.hol...@otternetworks.de> > Date: 1/15/17 11:37 AM (GMT-05:00) > To: Marco Mistroni <mmistr...@gmail.com> > Cc: Neil Jonkers <neilod...@gmail.com>, User <user@spark.apache.org> > Subject: Re: Running Spark on EMR > > Darn. I didn't respond t

Re: Running Spark on EMR

2017-01-15 Thread Darren Govoni
m>, User <user@spark.apache.org> Subject: Re: Running Spark on EMR Darn. I didn't respond to the list. Sorry. On Sun, Jan 15, 2017 at 5:29 PM, Marco Mistroni <mmistr...@gmail.com> wrote: thanks Neil. I followed original suggestion from Andrw and everything is working fine nowkr On Sun,

Re: Running Spark on EMR

2017-01-15 Thread Andrew Holway
<neilod...@gmail.com> wrote: > >> Hello, >> >> Can you drop the url: >> >> spark://master:7077 >> >> The url is used when running Spark in standalone mode. >> >> Regards >> >> >> Original message ---- >

Re: Running Spark on EMR

2017-01-15 Thread Marco Mistroni
ng Spark in standalone mode. > > Regards > > > Original message > From: Marco Mistroni > Date:15/01/2017 16:34 (GMT+02:00) > To: User > Subject: Running Spark on EMR > > hi all > could anyone assist here? > i am trying to run spark 2.0.0 on an EMR c

Re: Running Spark on EMR

2017-01-15 Thread Neil Jonkers
Hello, Can you drop the url:  spark://master:7077 The url is used when running Spark in standalone mode. Regards Original message From: Marco Mistroni <mmistr...@gmail.com> Date:15/01/2017 16:34 (GMT+02:00) To: User <user@spark.apache.org> Subject: Running S
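
In code, Neil's advice amounts to not hard-coding a standalone master URL in the application and letting spark-submit (on YARN, in the EMR case) supply it. A minimal sketch, assuming PySpark is installed. The `create_session` helper is hypothetical and is not the original poster's code.

```python
def create_session(app_name, master=None):
    """Create a SparkSession, setting a master only when one is given explicitly."""
    # Imported lazily so this module can be loaded without PySpark present.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    if master:
        # Only useful for local testing, e.g. master="local[*]".
        builder = builder.master(master)
    return builder.getOrCreate()


# On the cluster, leave the master unset and let spark-submit decide:
#   spark = create_session("DataProcess")
```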

Running Spark on EMR

2017-01-15 Thread Marco Mistroni
hi all could anyone assist here? i am trying to run spark 2.0.0 on an EMR cluster,but i am having issues connecting to the master node So, below is a snippet of what i am doing sc = SparkSession.builder.master(sparkHost).appName("DataProcess").getOrCreate() sparkHost is passed as input

Run Apache Spark on EMR

2016-04-22 Thread Jinan Alhajjaj
Hi All, I would like to ask for two things and I would really appreciate an answer ASAP. 1. How do I implement parallelism in an Apache Spark Java application? 2. How do I run the Spark application in Amazon EMR?

Question around spark on EMR

2016-04-05 Thread Natu Lauchande
Hi, I am setting up a Scala Spark streaming app in EMR. I wonder if anyone on the list can help me with the following questions: 1. What approach have you been using to submit, in an EMR job step, environment variables that will be needed by the Spark application? 2. Can I have

Re: Spark on EMR: out-of-the-box solution for real-time application logs monitoring?

2015-12-11 Thread Roberto Coluccio
and, > since I'm on AWS, those service are pretty well integrated together..but I > could just find examples about it using on standalone EC2 instances. > > > > In my use case, EMR 3.9 and Spark 1.4.1 drivers running on YARN (cluster > mode), I would like to be able to real-

Re: Spark on EMR: out-of-the-box solution for real-time application logs monitoring?

2015-12-10 Thread Steve Loughran
he CloudWatch Logs + Lambda pretty straightforward and, since > I'm on AWS, those service are pretty well integrated together..but I could > just find examples about it using on standalone EC2 instances. > > In my use case, EMR 3.9 and Spark 1.4.1 drivers running on YARN (cluster &

Spark on EMR: out-of-the-box solution for real-time application logs monitoring?

2015-12-10 Thread Roberto Coluccio
..but I could just find examples about it using on standalone EC2 instances. In my use case, EMR 3.9 and Spark 1.4.1 drivers running on YARN (cluster mode), I would like to be able to real-time monitor Spark logs, so not just about when the processing ends and they are copied to S3. Is there any out

RE: Yarn Spark on EMR

2015-11-20 Thread Bozeman, Christopher
be found from the YARN Resource Manager UI (master node:8088) and it would be best to use a SOCKS proxy in order to nicely resolve the URLs. Best regards, Christopher From: SURAJ SHETH [mailto:shet...@gmail.com] Sent: Sunday, November 15, 2015 8:19 AM To: user@spark.apache.org Subject: Yarn Spark

Yarn Spark on EMR

2015-11-15 Thread SURAJ SHETH
Hi, Yarn UI on 18080 stops receiving updates on Spark jobs/tasks immediately after it starts. We see only one task completed in the UI while the other has not been given any resources, although in reality more than 5 tasks would have completed. Hadoop - Amazon 2.6 Spark - 1.5 Thanks and Regards, Suraj Sheth

Zeppelin + Spark on EMR

2015-09-07 Thread shahab
Hi, I am trying to use Zeppelin to work with Spark on Amazon EMR. I used the script provided by Anders (https://gist.github.com/andershammar/224e1077021d0ea376dd) to set up Zeppelin. Zeppelin can connect to Spark, but I get an error when I run the tutorials. I get the following error

Fwd: [Spark + Hive + EMR + S3] Issue when reading from Hive external table backed on S3 with large amount of small files

2015-08-07 Thread Roberto Coluccio
Please community, I'd really appreciate your opinion on this topic. Best regards, Roberto -- Forwarded message -- From: Roberto Coluccio roberto.coluc...@gmail.com Date: Sat, Jul 25, 2015 at 6:28 PM Subject: [Spark + Hive + EMR + S3] Issue when reading from Hive external table

[Spark + Hive + EMR + S3] Issue when reading from Hive external table backed on S3 with large amount of small files

2015-07-25 Thread Roberto Coluccio
Hello Spark community, I currently have a Spark 1.3.1 batch driver, deployed in YARN-cluster mode on an EMR cluster (AMI 3.7.0), that reads input data through a HiveContext, in particular SELECTing data from an EXTERNAL TABLE backed on S3. This table has dynamic partitions and contains *hundreds

Re: Spark on EMR with S3 example (Python)

2015-07-15 Thread Akhil Das
on Amazon. Do I still need to provide the keys? Thank you, *From:* Sujit Pal [mailto:sujitatgt...@gmail.com] *Sent:* Tuesday, July 14, 2015 3:14 PM *To:* Pagliari, Roberto *Cc:* user@spark.apache.org *Subject:* Re: Spark on EMR with S3 example (Python) Hi Roberto, I have written

Re: Spark on EMR with S3 example (Python)

2015-07-15 Thread Sujit Pal
to provide the keys? Thank you, *From:* Sujit Pal [mailto:sujitatgt...@gmail.com] *Sent:* Tuesday, July 14, 2015 3:14 PM *To:* Pagliari, Roberto *Cc:* user@spark.apache.org *Subject:* Re: Spark on EMR with S3 example (Python) Hi Roberto, I have written PySpark code that reads

Re: Spark on EMR with S3 example (Python)

2015-07-14 Thread Sujit Pal
Hi Roberto, I have written PySpark code that reads from private S3 buckets, it should be similar for public S3 buckets as well. You need to set the AWS access and secret keys into the SparkContext, then you can access the S3 folders and files with their s3n:// paths. Something like this: sc =
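
The snippet above is cut off in the archive, so the following is not Sujit's code. It is a hedged sketch of the general approach he describes: set the AWS keys in the Hadoop configuration of the SparkContext, then read `s3n://` paths. The helper names are hypothetical. `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` are the Hadoop properties for the s3n connector, and `sc._jsc` is a private PySpark attribute, so treat this as an assumption that may not hold across versions.

```python
def s3n_credentials_conf(access_key, secret_key):
    """Hadoop properties that carry AWS credentials for s3n:// paths."""
    return {
        "fs.s3n.awsAccessKeyId": access_key,
        "fs.s3n.awsSecretAccessKey": secret_key,
    }


def apply_hadoop_conf(sc, conf):
    """Copy each property into the SparkContext's underlying Hadoop configuration."""
    hadoop_conf = sc._jsc.hadoopConfiguration()
    for key, value in conf.items():
        hadoop_conf.set(key, value)


# Usage with an existing SparkContext `sc` (placeholder values, not real keys):
#   apply_hadoop_conf(sc, s3n_credentials_conf("ACCESS_KEY", "SECRET_KEY"))
#   lines = sc.textFile("s3n://example-bucket/path/to/data")
```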

RE: Spark on EMR with S3 example (Python)

2015-07-14 Thread Pagliari, Roberto
Hi Sujit, I just wanted to access public datasets on Amazon. Do I still need to provide the keys? Thank you, From: Sujit Pal [mailto:sujitatgt...@gmail.com] Sent: Tuesday, July 14, 2015 3:14 PM To: Pagliari, Roberto Cc: user@spark.apache.org Subject: Re: Spark on EMR with S3 example (Python

Spark on EMR with S3 example (Python)

2015-07-14 Thread Pagliari, Roberto
Is there an example about how to load data from a public S3 bucket in Python? I haven't found any. Thank you,

Re: Spark on EMR

2015-06-19 Thread Bozeman, Christopher
You can use Spark 1.4 on EMR AMI 3.8.0 if you install Spark as a 3rd party application using the bootstrap action directly without the native Spark inclusion with 1.3.1. See https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark Refer to https://github.com/awslabs/emr-bootstrap

Re: Spark on EMR

2015-06-17 Thread Eugen Cepoi
It looks like it is a wrapper around https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark So basically adding an option -v,1.4.0.a should work. https://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-spark-configure.html 2015-06-17 15:32 GMT+02:00 Hideyoshi Maeda

Re: Spark on EMR

2015-06-17 Thread Hideyoshi Maeda
Any ideas what version of Spark is underneath? i.e. is it 1.4? and is SparkR supported on Amazon EMR? On Wed, Jun 17, 2015 at 12:06 AM, ayan guha guha.a...@gmail.com wrote: That's great news. Can I assume spark on EMR supports kinesis to hbase pipeline? On 17 Jun 2015 05:29, kamatsuoka ken

Re: Spark on EMR

2015-06-17 Thread Kelly, Jonathan
is currently being used under the hood, passing -v,1.4.0 in the options is not supported. Sent from Ninehttp://www.9folders.com/ From: Eugen Cepoi cepoi.eu...@gmail.com Sent: Jun 17, 2015 6:37 AM To: Hideyoshi Maeda Cc: ayan guha;kamatsuoka;user Subject: Re: Spark on EMR It looks like

Re: Spark on EMR

2015-06-16 Thread ayan guha
That's great news. Can I assume spark on EMR supports kinesis to hbase pipeline? On 17 Jun 2015 05:29, kamatsuoka ken...@gmail.com wrote: Spark is now officially supported on Amazon Elastic Map Reduce: http://aws.amazon.com/elasticmapreduce/details/spark/ -- View this message in context