Re: Quick one... AWS SDK version?

2017-10-08 Thread Jonathan Kelly
tr...@gmail.com> wrote: > Hi JG, out of curiosity, what's your use case? Are you writing to S3? You could use Spark to do that, e.g., using the Hadoop package > org.apache.hadoop:hadoop-aws:2.7.1. That will download the AWS client, which is in line with Hadoop 2.7.1? >
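A minimal sketch of the suggestion above: pull the hadoop-aws module (and its matching AWS SDK) at submit time with `--packages`. The version should match the cluster's Hadoop version; 2.7.1 is the one named in the thread, and the script name and bucket are placeholders.

```shell
# Sketch: fetch hadoop-aws and its transitive AWS SDK dependency at launch.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:2.7.1 \
  my_job.py
# Inside the job, write via an s3a:// URI, e.g.:
#   df.write.parquet("s3a://my-bucket/output/")   # bucket name is hypothetical
```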

Re: Quick one... AWS SDK version?

2017-10-06 Thread Jonathan Kelly
Note: EMR builds Hadoop, Spark, et al. from source against specific versions of certain packages like the AWS Java SDK, httpclient/core, Jackson, etc., sometimes requiring some patches to these applications in order to work with versions of these dependencies that differ from what the applications

Re: RDD blocks on Spark Driver

2017-02-28 Thread Jonathan Kelly
Prithish, It would be helpful for you to share the spark-submit command you are running. ~ Jonathan On Sun, Feb 26, 2017 at 8:29 AM Prithish wrote: > Thanks for the responses, I am running this on Amazon EMR which runs the > Yarn cluster manager. > > On Sat, Feb 25, 2017

Re: Custom log4j.properties on AWS EMR

2017-02-28 Thread Jonathan Kelly
Prithish, I saw you posted this on SO, so I responded there just now. See http://stackoverflow.com/questions/42452622/custom-log4j-properties-on-aws-emr/42516161#42516161 In short, an hdfs:// path can't be used to configure log4j because log4j knows nothing about hdfs. Instead, since you are
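Since log4j can only read configuration from the local filesystem (or the classpath), one common workaround is to ship the file alongside the job rather than pointing at an hdfs:// path. This is a sketch, not the exact SO answer; file names and paths are illustrative.

```shell
# Sketch: distribute a local log4j.properties with the job and point both
# driver and executors at it. The file lands in each container's working
# directory, so a bare filename works in -Dlog4j.configuration.
spark-submit \
  --files /home/hadoop/custom-log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=custom-log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=custom-log4j.properties" \
  my_job.py
```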

Re: [Error:] Viewing Web UI on EMR cluster

2016-09-13 Thread Jonathan Kelly
channel 4: open failed: connect failed: Connection refused >>> channel 5: open failed: connect failed: Connection refused >>> channel 22: open failed: connect failed: Connection refused >>

Re: [Error:] Viewing Web UI on EMR cluster

2016-09-13 Thread Jonathan Kelly
I would not recommend opening port 50070 on your cluster, as that would give the entire world access to your data on HDFS. Instead, you should follow the instructions found here to create a secure tunnel to the cluster, through which you can proxy requests to the UIs using a browser plugin like
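The secure-tunnel approach referred to above is EMR's documented dynamic port forwarding (a SOCKS proxy over SSH). A sketch, with the key path and master DNS as placeholders:

```shell
# Open a SOCKS proxy on local port 8157 to the EMR master node.
# -N: no remote command, -D: dynamic application-level forwarding.
ssh -i ~/mykey.pem -N -D 8157 hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
# Then configure a browser SOCKS proxy plugin (e.g. FoxyProxy) to use
# localhost:8157 and browse to http://<master-private-dns>:50070 through
# the tunnel instead of opening the port to the world.
```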

Re: Unsubscribe - 3rd time

2016-06-29 Thread Jonathan Kelly
If at first you don't succeed, try, try again. But please don't. :) See the "unsubscribe" link here: http://spark.apache.org/community.html I'm not sure I've ever come across an email list that allows you to unsubscribe by responding to the list with "unsubscribe". At least, all of the Apache

Re: Logging trait in Spark 2.0

2016-06-24 Thread Jonathan Kelly
Ted, how is that thread related to Paolo's question? On Fri, Jun 24, 2016 at 1:50 PM Ted Yu wrote: > See this related thread: > > > http://search-hadoop.com/m/q3RTtEor1vYWbsW=RE+Configuring+Log4J+Spark+1+5+on+EMR+4+1+ > > On Fri, Jun 24, 2016 at 6:07 AM, Paolo Patierno

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

2016-06-20 Thread Jonathan Kelly
ave a bug tracking it, in case anyone else has > time to look at it before I do. > > On Mon, Jun 20, 2016 at 1:20 PM, Jonathan Kelly <jonathaka...@gmail.com> > wrote: > > Thanks for the confirmation! Shall I cut a JIRA issue? > > > > On Mon, Jun 20, 2016 at 10:42 AM

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

2016-06-20 Thread Jonathan Kelly
Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly <jonathaka...@gmail.com> > wrote: > > Does anybody have any thoughts on this? > > > > On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jonathaka...@gmail.com> > > wrote: > >>

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

2016-06-20 Thread Jonathan Kelly
Does anybody have any thoughts on this? On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jonathaka...@gmail.com> wrote: > I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT > (commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's > log4j.properties is not ge

Re: Running Spark in local mode

2016-06-19 Thread Jonathan Kelly
Mich, what Jacek is saying is not that you implied that YARN relies on two masters. He's just clarifying that yarn-client and yarn-cluster modes are really both using the same (type of) master (simply "yarn"). In fact, if you specify "--master yarn-client" or "--master yarn-cluster", spark-submit
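In other words, the deprecated `yarn-client` / `yarn-cluster` master strings both resolve to the single `yarn` master; what differs is the deploy mode, which later Spark versions express as a separate flag. A sketch (script name is a placeholder):

```shell
# Both of these use the same "yarn" master; only the deploy mode differs.
spark-submit --master yarn --deploy-mode client  my_job.py   # driver runs locally
spark-submit --master yarn --deploy-mode cluster my_job.py   # driver runs inside the YARN AM
```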

Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

2016-06-17 Thread Jonathan Kelly
I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT (commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's log4j.properties is not getting picked up in the executor classpath (and driver classpath for yarn-cluster mode), so Hadoop's log4j.properties file is taking precedence in the YARN

Re: Configure Spark Resource on AWS CLI Not Working

2016-03-01 Thread Jonathan Kelly
Weiwei, Please see this documentation for configuring Spark and other apps on EMR 4.x: http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-configure-apps.html This documentation about what has changed between 3.x and 4.x should also be helpful:
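The EMR 4.x mechanism described in that documentation is the `--configurations` API: a JSON file of classifications passed at cluster creation. A sketch, with illustrative property values:

```shell
# Sketch: configure Spark on EMR 4.x via a configurations JSON file.
# Classification names come from the EMR docs; values here are examples.
cat > myConfig.json <<'EOF'
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "4g",
      "spark.executor.cores": "2"
    }
  }
]
EOF
aws emr create-cluster --release-label emr-4.3.0 \
  --applications Name=Spark \
  --configurations file://./myConfig.json \
  --instance-type m3.xlarge --instance-count 3 --use-default-roles
```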

Re: scikit learn on EMR PySpark

2016-03-01 Thread Jonathan Kelly
Hi, Myles, We do not install scikit-learn or spark-sklearn on EMR clusters by default, but you may install them yourself by just doing "sudo pip install scikit-learn spark-sklearn" (either by ssh'ing to the master instance and running this manually, or by running it as an EMR Step). ~ Jonathan
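Both installation routes mentioned above can be sketched as follows; the cluster id is a placeholder, and the add-steps invocation is an assumption based on EMR's generic `command-runner.jar` step runner:

```shell
# Option 1: ssh to the master instance and install manually.
sudo pip install scikit-learn spark-sklearn

# Option 2: run the same command as an EMR Step (sketch; j-XXXXXXXXXXXX is
# a placeholder cluster id).
aws emr add-steps --cluster-id j-XXXXXXXXXXXX \
  --steps Type=CUSTOM_JAR,Name=InstallSklearn,Jar=command-runner.jar,\
Args=["sudo","pip","install","scikit-learn","spark-sklearn"]
```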

Re: Spark-avro issue in 1.5.2

2016-02-24 Thread Jonathan Kelly
This error is likely due to EMR including some Hadoop lib dirs in spark.{driver,executor}.extraClassPath. (Hadoop bundles an older version of Avro than what Spark uses, so you are probably getting bitten by this Avro mismatch.) We determined that these Hadoop dirs are not actually necessary to

Re: Error :Type mismatch error when passing hdfs file path to spark-csv load method

2016-02-21 Thread Jonathan Kelly
On the line preceding the one that the compiler is complaining about (which doesn't actually have a problem in itself), you declare df as "df"+fileName, making it a string. Then you try to assign a DataFrame to df, but it's already a string. I don't quite understand your intent with that previous

Re: Memory issues on spark

2016-02-17 Thread Jonathan Kelly
(I'm not 100% sure, but...) I think the SPARK_EXECUTOR_* environment variables are intended to be used with Spark Standalone. Even if not, I'd recommend setting the corresponding properties in spark-defaults.conf rather than in spark-env.sh. For example, you may use the following Configuration
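The recommendation above, as a sketch: instead of SPARK_EXECUTOR_* environment variables in spark-env.sh, set the corresponding properties in spark-defaults.conf. Written here to a local copy (on EMR the file lives under /etc/spark/conf); the values are illustrative.

```shell
# Sketch: equivalent resource settings as spark-defaults.conf properties
# rather than SPARK_EXECUTOR_MEMORY / SPARK_EXECUTOR_CORES env vars.
cat > spark-defaults.conf <<'EOF'
spark.executor.memory     4g
spark.executor.cores      2
spark.executor.instances  10
EOF
```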

Re: AM creation in yarn-client mode

2016-02-09 Thread Jonathan Kelly
In yarn-client mode, the driver is separate from the AM. The AM is created in YARN, and YARN controls where it goes (though you can somewhat control it using YARN node labels--I just learned earlier today in a different thread on this list that this can be controlled by

Re: Dataframe, Spark SQL - Drops First 8 Characters of String on Amazon EMR

2016-01-28 Thread Jonathan Kelly
Just FYI, Spark 1.6 was released on emr-4.3.0 a couple days ago: https://aws.amazon.com/blogs/aws/emr-4-3-0-new-updated-applications-command-line-export/ On Thu, Jan 28, 2016 at 7:30 PM Andrew Zurn wrote: > Hey Daniel, > > Thanks for the response. > > After playing around for a

Re: Terminating Spark Steps in AWS

2016-01-26 Thread Jonathan Kelly
Daniel, The "hadoop job -list" command is a deprecated form of "mapred job -list", which is only for Hadoop MapReduce jobs. For Spark jobs, which run on YARN, you instead want "yarn application -list". Hope this helps, Jonathan (from the EMR team) On Tue, Jan 26, 2016 at 10:05 AM Daniel
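The commands contrasted above, side by side:

```shell
# Deprecated alias of "mapred job -list"; only shows Hadoop MapReduce jobs:
hadoop job -list

# Spark applications run on YARN, so list (and kill) them via the YARN CLI:
yarn application -list
# yarn application -kill <application-id>
```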

Re: Read from AWS s3 with out having to hard-code sensitive keys

2016-01-11 Thread Jonathan Kelly
Yes, IAM roles are actually required now for EMR. If you use Spark on EMR (vs. just EC2), you get S3 configuration for free (it goes by the name EMRFS), and it will use your IAM role for communicating with S3. Here is the corresponding documentation:
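With EMRFS resolving credentials from the cluster's IAM role, no access keys appear in code or configuration. A sketch (bucket and paths are placeholders):

```shell
# No fs.s3 key properties needed anywhere; the attached IAM role is used.
spark-submit my_job.py s3://my-bucket/input/
# and inside the job simply:
#   sc.textFile("s3://my-bucket/input/")
```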

Re: Discover SparkUI port for spark streaming job running in cluster mode

2015-12-14 Thread Jonathan Kelly
tionId > > On Mon, Dec 14, 2015 at 2:33 PM, Jonathan Kelly <jonathaka...@gmail.com> > wrote: > >> Are you running Spark on YARN? If so, you can get to the Spark UI via the >> YARN ResourceManager. Each running Spark application will have a link on >> the YARN Resou

Re: Discover SparkUI port for spark streaming job running in cluster mode

2015-12-14 Thread Jonathan Kelly
Are you running Spark on YARN? If so, you can get to the Spark UI via the YARN ResourceManager. Each running Spark application will have a link on the YARN ResourceManager labeled "ApplicationMaster". If you click that, it will take you to the Spark UI, even if it is running on a slave node in the

Re: spark-ec2 vs. EMR

2015-12-04 Thread Jonathan Kelly
, so hopefully this works... On Wednesday, December 2, 2015, Jonathan Kelly <jonathaka...@gmail.com> wrote: > EMR is currently running a private preview of an upcoming feature allowing > EMR clusters to be launched in VPC private subnets. This will allow you to > launch a cluster in a

Re: Spark Tasks on second node never return in Yarn when I have more than 1 task node

2015-11-19 Thread Jonathan Kelly
I don't know if this actually has anything to do with why your job is hanging, but since you are using EMR you should probably not set those fs.s3 properties but rather let it use EMRFS, EMR's optimized Hadoop FileSystem implementation for interacting with S3. One benefit is that it will

Re: spark-submit stuck and no output in console

2015-11-16 Thread Jonathan Kelly
He means for you to use jstack to obtain a stacktrace of all of the threads. Or are you saying that the Java process never even starts? On Mon, Nov 16, 2015 at 7:48 AM, Kayode Odeyemi wrote: > Spark 1.5.1 > > The fact is that there's no stack trace. No output from that
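Obtaining a thread dump with jstack can be sketched as follows (if no JVM process appears at all, that itself is diagnostic, as the reply notes):

```shell
# List running JVMs with their pids and main classes, then dump the
# stuck spark-submit JVM's threads.
jps -l
jstack <pid> > stack.txt    # replace <pid> with the SparkSubmit pid from jps
```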

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Jonathan Kelly
Christian, Is there anything preventing you from using EMR, which will manage your cluster for you? Creating large clusters would take mins on EMR instead of hours. Also, EMR supports growing your cluster easily and recently added support for shrinking your cluster gracefully (even while jobs are
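Growing (or gracefully shrinking) an EMR cluster is a single CLI call; a sketch, with the instance-group id as a placeholder (find yours with `aws emr describe-cluster`):

```shell
# Resize a running cluster's core or task instance group to 50 nodes.
aws emr modify-instance-groups \
  --instance-groups InstanceGroupId=ig-XXXXXXXXXXXX,InstanceCount=50
```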

Re: Why is the Spark Web GUI failing with JavaScript "Uncaught SyntaxError"?

2015-10-14 Thread Jonathan Kelly
to a private rather than > public IP; replacing IPs brings me to the same Spark GUI. > > Joshua > > > > On Tue, Oct 13, 2015 at 6:23 PM, Jonathan Kelly <jonathaka...@gmail.com> > wrote: > >> Joshua, >> >> Since Spa

Re: Why is the Spark Web GUI failing with JavaScript "Uncaught SyntaxError"?

2015-10-13 Thread Jonathan Kelly
Joshua, Since Spark is configured to run on YARN in EMR, instead of viewing the Spark application UI at port 4040, you should instead start from the YARN ResourceManager (on port 8088), then click on the ApplicationMaster link for the Spark application you are interested in. This will take you to

Re: Spark 1.5.0 on YARN dynamicAllocation - Initial job has not accepted any resources

2015-09-24 Thread Jonathan Kelly
I cut https://issues.apache.org/jira/browse/SPARK-10790 for this issue. On Wed, Sep 23, 2015 at 8:38 PM, Jonathan Kelly <jonathaka...@gmail.com> wrote: > AHA! I figured it out, but it required some tedious remote debugging of > the Spark ApplicationMaster. (But now I understa

Re: Spark 1.5.0 on YARN dynamicAllocation - Initial job has not accepted any resources

2015-09-23 Thread Jonathan Kelly
ly. I can't seem to find a JIRA for this, so shall I file one, or has anybody else seen anything like this? ~ Jonathan On Wed, Sep 23, 2015 at 7:08 PM, Jonathan Kelly <jonathaka...@gmail.com> wrote: > Another update that doesn't make much sense: > > The SparkPi example doe

Re: Spark 1.5.0 on YARN dynamicAllocation - Initial job has not accepted any resources

2015-09-23 Thread Jonathan Kelly
ing dynamic allocation. > > > On Wed, Sep 23, 2015 at 18:04 Jonathan Kelly <jonathaka...@gmail.com> > wrote: > >> I'm running into a problem with YARN dynamicAllocation on Spark 1.5.0 >> after using it successfully on an identically configured cluster with Spark >>

Re: Spark 1.5.0 on YARN dynamicAllocation - Initial job has not accepted any resources

2015-09-23 Thread Jonathan Kelly
work. ~ Jonathan On Wed, Sep 23, 2015 at 6:22 PM, Jonathan Kelly <jonathaka...@gmail.com> wrote: > Thanks for the quick response! > > spark-shell is indeed using yarn-client. I forgot to mention that I also > have "spark.master yarn-client" in my spark-defaults.co

Spark 1.5.0 on YARN dynamicAllocation - Initial job has not accepted any resources

2015-09-23 Thread Jonathan Kelly
I'm running into a problem with YARN dynamicAllocation on Spark 1.5.0 after using it successfully on an identically configured cluster with Spark 1.4.1. I'm getting the dreaded warning "YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers