Re: How to access Spark UI through AWS

2015-08-25 Thread Kelly, Jonathan
I'm not sure why the UI appears broken like that either and haven't investigated it myself yet, but if you instead go to the YARN ResourceManager UI (port 8088 if you are using emr-4.x; port 9026 for 3.x, I believe), then you should be able to click on the ApplicationMaster link (or the History

Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
Would there be any problem in having spark.executor.instances (or --num-executors) be completely ignored (i.e., even for non-zero values) if spark.dynamicAllocation.enabled is true (i.e., rather than throwing an exception)? I can see how the exception would be helpful if, say, you tried to

Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
bump From: Jonathan Kelly jonat...@amazon.com Date: Tuesday, July 14, 2015 at 4:23 PM To: user@spark.apache.org Subject: Unable to use dynamicAllocation if spark.executor.instances is set in

Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-14 Thread Kelly, Jonathan
I've set up my cluster with a pre-calculated value for spark.executor.instances in spark-defaults.conf such that I can run a job and have it maximize the utilization of the cluster resources by default. However, if I want to run a job with dynamicAllocation (by passing -c
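
A minimal sketch of the conflict being described, assuming Spark 1.4-era behavior; the two properties would normally come from spark-defaults.conf and spark-submit rather than being set programmatically, and are shown on a SparkConf here purely for illustration:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("dynamic-allocation-conflict")
      // Cluster-wide default chosen to maximize utilization by default...
      .set("spark.executor.instances", "10")
      // ...which conflicts with dynamic allocation; per this thread, the
      // context below then fails with an exception at startup instead of
      // simply ignoring spark.executor.instances.
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")

    // The master URL would normally be supplied by spark-submit on the cluster.
    val sc = new SparkContext(conf)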

Re: Spark ec2 cluster lost worker

2015-06-24 Thread Kelly, Jonathan
lost worker Hi Jonathan, Thanks for this information! I will take a look into it. However is there a way to reconnect the lost node? Or there's no way that I could do to find back the lost worker? Thanks! Anny On Wed, Jun 24, 2015 at 6:06 PM, Kelly, Jonathan jonat...@amazon.com

Re: Spark ec2 cluster lost worker

2015-06-24 Thread Kelly, Jonathan
Just curious, would you be able to use Spark on EMR rather than on EC2? Spark on EMR will handle lost nodes for you, and it will let you scale your cluster up and down or clone a cluster (its config, that is, not the data stored in HDFS), among other things. We also recently announced official

Re: [ERROR] Insufficient Space

2015-06-19 Thread Kelly, Jonathan
Would you be able to use Spark on EMR rather than on EC2? EMR clusters allow easy resizing of the cluster, and EMR also now supports Spark 1.3.1 as of EMR AMI 3.8.0. See http://aws.amazon.com/emr/spark ~ Jonathan From: Vadim Bichutskiy

Re: Spark on EMR

2015-06-17 Thread Kelly, Jonathan
Yes, for now it is a wrapper around the old install-spark BA, but that will change soon. The currently supported version in AMI 3.8.0 is 1.3.1, as 1.4.0 was released too late to include it in AMI 3.8.0. Spark 1.4.0 support is coming soon though, of course. Unfortunately, though install-spark is

Re: Spark + Kinesis

2015-04-03 Thread Kelly, Jonathan
On Thu, Apr 2, 2015 at 1:15 PM, Kelly, Jonathan jonat...@amazon.com wrote: It looks like you're attempting to mix Scala versions, so that's going to cause some problems. If you really want

Re: Spark + Kinesis

2015-04-03 Thread Kelly, Jonathan
Kelly, Jonathan jonat...@amazon.com wrote: spark-streaming-kinesis-asl is not part of the Spark distribution on your cluster, so you cannot have it be just a provided dependency. This is also why the KCL and its dependencies were not included in the assembly (but yes

Re: Spark + Kinesis

2015-04-02 Thread Kelly, Jonathan
It looks like you're attempting to mix Scala versions, so that's going to cause some problems. If you really want to use Scala 2.11.5, you must also use Spark package versions built for Scala 2.11 rather than 2.10. Anyway, that's not quite the correct way to specify Scala dependencies in
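
A sketch of the point being made, assuming an sbt build: pin one Scala version and let %% append the matching _2.10/_2.11 suffix, rather than hard-coding artifacts for a different Scala version (the exact version numbers below are assumptions):

    // build.sbt
    scalaVersion := "2.10.4"  // Spark 1.3.0 artifacts on Maven Central are built for Scala 2.10

    libraryDependencies ++= Seq(
      // %% appends the Scala binary version automatically, so these always
      // resolve to artifacts matching scalaVersion above:
      "org.apache.spark" %% "spark-core"      % "1.3.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided"
    )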

Spark and OpenJDK - jar: No such file or directory

2015-03-30 Thread Kelly, Jonathan
I'm trying to use OpenJDK 7 with Spark 1.3.0 and noticed that the compute-classpath.sh script is not adding the datanucleus jars to the classpath because it assumes the jar command is located at $JAVA_HOME/bin/jar, which does not exist for OpenJDK. Is this an issue anybody

Re: When will 1.3.1 release?

2015-03-30 Thread Kelly, Jonathan
Are you referring to SPARK-6330 (https://issues.apache.org/jira/browse/SPARK-6330)? If you are able to build Spark from source yourself, I believe you should just need to cherry-pick the following commits in order to backport the fix: 67fa6d1f830dee37244b5a30684d797093c7c134 [SPARK-6330] Fix

Re: Spark and OpenJDK - jar: No such file or directory

2015-03-30 Thread Kelly, Jonathan
Ah, never mind, I found the jar command in the java-1.7.0-openjdk-devel package. I only had java-1.7.0-openjdk installed. Looks like I just need to install java-1.7.0-openjdk-devel then set JAVA_HOME to /usr/lib/jvm/java instead of /usr/lib/jvm/jre. ~ Jonathan Kelly From: Kelly, Jonathan

Using Spark with a SOCKS proxy

2015-03-17 Thread Kelly, Jonathan
I'm trying to figure out how I might be able to use Spark with a SOCKS proxy. That is, my dream is to be able to write code in my IDE then run it without much trouble on a remote cluster, accessible only via a SOCKS proxy between the local development machine and the master node of the cluster
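
For reference, the standard JVM properties for routing java.net socket connections through a SOCKS proxy are shown below; whether this is sufficient for the driver's communication with a remote cluster is exactly the open question in this thread, so treat it as a sketch under that assumption (the port number is arbitrary):

    // Route outbound java.net socket connections through a local SOCKS proxy,
    // e.g. one opened with: ssh -D 8157 <master-node>
    System.setProperty("socksProxyHost", "localhost")
    System.setProperty("socksProxyPort", "8157")

    // ...then build the SparkConf/SparkContext pointing at the remote master as usual.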

Re: sqlContext.parquetFile doesn't work with s3n in version 1.3.0

2015-03-16 Thread Kelly, Jonathan
See https://issues.apache.org/jira/browse/SPARK-6351 ~ Jonathan From: Shuai Zheng szheng.c...@gmail.com Date: Monday, March 16, 2015 at 11:46 AM To: user@spark.apache.org Subject:

problems with spark-streaming-kinesis-asl and sbt assembly (different file contents found)

2015-03-16 Thread Kelly, Jonathan
I'm attempting to use the Spark Kinesis Connector, so I've added the following dependency in my build.sbt: libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" My app works fine with sbt run, but I can't seem to get sbt assembly to work without failing with different

Re: problems with spark-streaming-kinesis-asl and sbt assembly (different file contents found)

2015-03-16 Thread Kelly, Jonathan
sure spark-streaming is marked as provided. spark-streaming is already part of the spark installation so will be present at run time. That might solve some of these, may be!? TD On Mon, Mar 16, 2015 at 11:30 AM, Kelly, Jonathan jonat...@amazon.com wrote: I'm attempting
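
A sketch of what that suggestion looks like in build.sbt (version numbers are assumptions); spark-streaming-kinesis-asl itself stays compile-scoped because, as noted in the related Spark + Kinesis messages above, it is not part of the Spark distribution on the cluster:

    libraryDependencies ++= Seq(
      // Already present in the Spark installation at runtime, so keep these
      // out of the assembly jar:
      "org.apache.spark" %% "spark-core"      % "1.3.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
      // Not on the cluster, so it (and the KCL it pulls in) must be bundled:
      "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"
    )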

Re: problems with spark-streaming-kinesis-asl and sbt assembly (different file contents found)

2015-03-16 Thread Kelly, Jonathan
user@spark.apache.org Subject: Re: problems with spark-streaming-kinesis-asl and sbt assembly (different file contents found) Can you give us your SBT project? Minus the source code if you don't wish to expose it. TD On Mon, Mar 16, 2015 at 12:54 PM, Kelly, Jonathan jonat

Re: Spark v1.2.1 failing under BigTop build in External Flume Sink (due to missing Netty library)

2015-03-05 Thread Kelly, Jonathan
...@gmail.com wrote: You may need to add the -Phadoop-2.4 profile. When building or release packages for Hadoop 2.4 we use the following flags: -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn - Patrick On Thu, Mar 5, 2015 at 12:47 PM, Kelly, Jonathan jonat...@amazon.com wrote: I confirmed that this has

Re: Spark v1.2.1 failing under BigTop build in External Flume Sink (due to missing Netty library)

2015-03-05 Thread Kelly, Jonathan
I confirmed that this has nothing to do with BigTop by running the same mvn command directly in a fresh clone of the Spark package at the v1.2.1 tag. I got the same exact error. Jonathan Kelly Elastic MapReduce - SDE Port 99 (SEA35) 08.220.C2 From: Kelly, Jonathan Kelly jonat

Spark v1.2.1 failing under BigTop build in External Flume Sink (due to missing Netty library)

2015-03-05 Thread Kelly, Jonathan
I'm running into an issue building Spark v1.2.1 (as well as the latest in branch-1.2 and v1.3.0-rc2 and the latest in branch-1.3) with BigTop (v0.9, which is not quite released yet). The build fails in the External Flume Sink subproject with the following error: [INFO] Compiling 5 Scala

Re: kinesis multiple records adding into stream

2015-01-16 Thread Kelly, Jonathan
Are you referring to the PutRecords method, which was added in 1.9.9? (See http://aws.amazon.com/releasenotes/1369906126177804) If so, can't you just depend upon this later version of the SDK in your app even though spark-streaming-kinesis-asl is depending upon this earlier 1.9.3 version that
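
A sketch of that approach in an sbt build, assuming the app declares its own, newer aws-java-sdk dependency so dependency resolution evicts the 1.9.3 version pulled in transitively (the kinesis-asl version shown is an assumption):

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.2.0",
      // Depend on the newer SDK directly so the app can use PutRecords
      // (added in 1.9.9) even though kinesis-asl transitively pulls in 1.9.3:
      "com.amazonaws" % "aws-java-sdk" % "1.9.9"
    )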

Re: SchemaRDD.saveAsTable() when schema contains arrays and was loaded from a JSON file using schema auto-detection

2014-11-27 Thread Kelly, Jonathan
On Wed, Nov 26, 2014 at 9:01 PM, Kelly, Jonathan jonat...@amazon.com wrote: After playing around with this a little more, I discovered that: 1. If test.json contains something like {values:[null,1,2,3]}, the schema auto-determined by SchemaRDD.jsonFile() will have
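
A small sketch, under the setup described above, of how the auto-detected schema can be inspected with that era's API, without asserting what it will contain (sc is assumed to be an existing SparkContext):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    // test.json contains the single line {"values":[null,1,2,3]}
    val schemaRDD = sqlContext.jsonFile("test.json")
    // Shows how the element type and nullability of the "values" array
    // were inferred from the data, including the null element.
    schemaRDD.printSchema()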

SchemaRDD.saveAsTable() when schema contains arrays and was loaded from a JSON file using schema auto-detection

2014-11-26 Thread Kelly, Jonathan
I've noticed some strange behavior when I try to use SchemaRDD.saveAsTable() with a SchemaRDD that I've loaded from a JSON file that contains elements with nested arrays. For example, with a file test.json that contains the single line: {values:[1,2,3]} and with code like the following:
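
The author's actual code is truncated above; what follows is a hypothetical reconstruction of what such code might look like with the Spark 1.1/1.2-era API (sc is assumed to be an existing SparkContext; saveAsTable requires a HiveContext):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // test.json contains the single line {"values":[1,2,3]}
    val schemaRDD = hiveContext.jsonFile("test.json")
    schemaRDD.printSchema()        // inspect the auto-detected ArrayType schema
    schemaRDD.saveAsTable("test")  // persist it as a Hive table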

Re: SchemaRDD.saveAsTable() when schema contains arrays and was loaded from a JSON file using schema auto-detection

2014-11-26 Thread Kelly, Jonathan
. ~ Jonathan On 11/26/14, 5:23 PM, Kelly, Jonathan jonat...@amazon.com wrote: I've noticed some strange behavior when I try to use SchemaRDD.saveAsTable() with a SchemaRDD that I've loaded from a JSON file that contains elements with nested arrays. For example, with a file test.json that contains