Re: Purpose of spark-submit?

2014-07-09 Thread Robert James
compile them inside of their program. That's the one you mention here. You can choose to use this feature or not. If you know your configs are not going to change, then you don't need to set them with spark-submit. On Wed, Jul 9, 2014 at 10:22 AM, Robert James srobertja...@gmail.com wrote: What
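
A minimal sketch of the two approaches being contrasted here; the app name, master URL, and memory setting are purely illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    object ConfExample {
      def main(args: Array[String]): Unit = {
        // Option 1: hard-code everything in the application itself.
        // val conf = new SparkConf()
        //   .setAppName("MyApp")
        //   .setMaster("spark://master-host:7077")
        //   .set("spark.executor.memory", "2g")

        // Option 2: set only what never changes and let spark-submit supply
        // the rest (command-line flags or conf/spark-defaults.conf), so the
        // same jar runs unchanged on local mode, EC2, or YARN.
        val conf = new SparkConf().setAppName("MyApp")
        val sc = new SparkContext(conf)

        println(sc.parallelize(1 to 10).sum())
        sc.stop()
      }
    }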

Re: Comparative study

2014-07-08 Thread Robert James
As a new user, I can definitely say that my experience with Spark has been rather raw. The appeal of interactive, batch, and in between all using more or less straight Scala is unarguable. But the experience of deploying Spark has been quite painful, mainly about gaps between compile time and

Purpose of spark-submit?

2014-07-08 Thread Robert James
What is the purpose of spark-submit? Does it do anything outside of the standard val conf = new SparkConf ... val sc = new SparkContext ... ?
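
For reference, this is roughly the manual setup the question alludes to: without spark-submit, the application itself has to name the master and ship its own jar to the executors. The URL and jar path below are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // Running without spark-submit: the program points at the cluster itself
    // and lists the jar(s) to distribute to the executors.
    val conf = new SparkConf()
      .setAppName("ManualSetupApp")
      .setMaster("spark://master-host:7077")
      .setJars(Seq("target/scala-2.10/myapp-assembly.jar"))
    val sc = new SparkContext(conf)

spark-submit takes care of these details (plus classpath setup, spark-defaults.conf, and deploy modes), which is much of what it adds on top of the plain SparkConf/SparkContext pattern.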

Requirements for Spark cluster

2014-07-08 Thread Robert James
I have a Spark app which runs well on local master. I'm now ready to put it on a cluster. What needs to be installed on the master? What needs to be installed on the workers? If the cluster already has Hadoop or YARN or Cloudera, does it still need an install of Spark?

spark-submit conflicts with dependencies

2014-07-07 Thread Robert James
When I use spark-submit (along with spark-ec2), I get dependency conflicts. spark-assembly includes older versions of apache commons codec and httpclient, and these conflict with many of the libs our software uses. Is there any way to resolve these? Or, if we use the precompiled spark, can we

spark-assembly libraries conflict with needed libraries

2014-07-07 Thread Robert James
spark-submit includes a spark-assembly uber jar, which has older versions of many common libraries. These conflict with some of the dependencies we need. I have been racking my brain trying to find a solution (including experimenting with ProGuard), but haven't been able to: when we use

spark-assembly libraries conflict with application libraries

2014-07-07 Thread Robert James
spark-submit includes a spark-assembly uber jar, which has older versions of many common libraries. These conflict with some of the dependencies we need. I have been racking my brain trying to find a solution (including experimenting with ProGuard), but haven't been able to: when we use
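
One workaround often suggested for this kind of clash is to shade (relocate) the conflicting packages inside the application's own assembly jar so they can no longer collide with the older copies bundled in spark-assembly. A minimal build.sbt sketch, assuming a version of sbt-assembly new enough to support ShadeRule; the package and prefix names are illustrative:

    // build.sbt fragment; requires the sbt-assembly plugin.
    assemblyShadeRules in assembly := Seq(
      // Relocate our commons-codec and httpclient classes under a private
      // prefix so they cannot clash with spark-assembly's older copies.
      ShadeRule.rename("org.apache.commons.codec.**" -> "shaded.commons.codec.@1").inAll,
      ShadeRule.rename("org.apache.http.**" -> "shaded.apache.http.@1").inAll
    )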

Re: spark-assembly libraries conflict with needed libraries

2014-07-07 Thread Robert James
jars in front of classpath, which should do the trick. however i had no luck with this. see here: https://issues.apache.org/jira/browse/SPARK-1863 On Mon, Jul 7, 2014 at 1:31 PM, Robert James srobertja...@gmail.com wrote: spark-submit includes a spark-assembly uber jar, which has older
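
The "jars in front of classpath" setting referred to here is the experimental user-classpath-first switch; as the poster notes, it did not help in this case (see SPARK-1863). A sketch of enabling it, using the property name from the Spark 1.0-era docs (it was renamed in later releases):

    import org.apache.spark.SparkConf

    // Experimental: ask executors to prefer the application's jars over
    // Spark's own when resolving classes.
    val conf = new SparkConf()
      .setAppName("MyApp")
      .set("spark.files.userClassPathFirst", "true")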

Adding and subtracting workers on Spark EC2 cluster

2014-07-06 Thread Robert James
If I've created a Spark EC2 cluster, how can I add or take away workers? Also: If I use EC2 spot instances, what happens when Amazon removes them? Will my computation be saved in any way, or will I need to restart from scratch? Finally: The spark-ec2 scripts seem to use Hadoop 1. How can I

Re: Unable to run Spark 1.0 SparkPi on HDP 2.0

2014-07-06 Thread Robert James
I can say from my experience that getting Spark to work with Hadoop 2 is not for the beginner; after solving one problem after another (dependencies, scripts, etc.), I went back to Hadoop 1. Spark's Maven, ec2 scripts, and others all use Hadoop 1 - not sure why, but, given so, Hadoop 2 has too

Is it possible to use Spark, Maven, and Hadoop 2?

2014-06-29 Thread Robert James
Although Spark's home page offers binaries for Spark 1.0.0 with Hadoop 2, the Maven repository only seems to have one version, which uses Hadoop 1. Is it possible to use a Maven link and Hadoop 2? What is the id? If not: How can I use the prebuilt binaries to use Hadoop 2? Do I just copy the

Re: Is it possible to use Spark, Maven, and Hadoop 2?

2014-06-29 Thread Robert James
to make a jar assembly using your approach? How? If not: How do you distribute the jars to the workers? On Sun, Jun 29, 2014 at 12:20 PM, Robert James srobertja...@gmail.com wrote: Although Spark's home page offers binaries for Spark 1.0.0 with Hadoop 2, the Maven repository only seems to have

Re: Hadoop interface vs class

2014-06-26 Thread Robert James
this problem? (Surely I'm not the only one using Hadoop 2 and sbt or maven or ivy!) On Jun 26, 2014 11:07 AM, Robert James srobertja...@gmail.com wrote: Yes. As far as I can tell, Spark seems to be including Hadoop 1 via its transitive dependency: http://mvnrepository.com/artifact

Spark's Hadoop Dependency

2014-06-25 Thread Robert James
To add Spark to a SBT project, I do: libraryDependencies += org.apache.spark %% spark-core % 1.0.0 % provided How do I make sure that the spark version which will be downloaded will depend on, and use, Hadoop 2, and not Hadoop 1? Even with a line: libraryDependencies += org.apache.hadoop %
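
A hedged build.sbt sketch of the usual pattern: keep spark-core as provided and pin hadoop-client to the Hadoop version the cluster actually runs; the 2.2.0 below is illustrative, not a recommendation:

    // build.sbt fragment
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "1.0.0" % "provided",
      // Match this to the cluster's Hadoop/HDFS version.
      "org.apache.hadoop" %  "hadoop-client" % "2.2.0"
    )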

Spark's Maven dependency on Hadoop 1

2014-06-25 Thread Robert James
According to http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.0.0 , spark depends on Hadoop 1.0.4. What about the versions of Spark that work with Hadoop 2? Do they also depend on Hadoop 1.0.4? How does everyone handle this?

Hadoop interface vs class

2014-06-25 Thread Robert James
After upgrading to Spark 1.0.0, I get this error: ERROR org.apache.spark.executor.ExecutorUncaughtExceptionHandler - Uncaught exception in thread Thread[Executor task launch worker-2,5,main] java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext,

Centralized Spark Logging solution

2014-06-24 Thread Robert James
We need a centralized spark logging solution. Ideally, it should: * Allow any Spark process to log at multiple levels (info, warn, debug) using a single line, similar to log4j * All logs should go to a central location - so, to read the logs, we don't need to check each worker by itself *
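
For the "single line, similar to log4j" part of the wish list, a minimal Scala helper sketch using the log4j 1.x API that ships with Spark; where the records ultimately end up (a central collector, a socket appender, etc.) would still be determined by the log4j configuration, which is not shown here, and the logger name is illustrative:

    import org.apache.log4j.Logger

    // Thin wrapper so driver code and closures running on workers can log
    // at several levels with one call; each JVM resolves its own logger.
    object Log {
      private lazy val logger = Logger.getLogger("myapp")
      def info(msg: => String): Unit  = logger.info(msg)
      def warn(msg: => String): Unit  = logger.warn(msg)
      def debug(msg: => String): Unit = logger.debug(msg)
    }

    // Usage inside a transformation:
    // rdd.foreach { x => Log.info("processing " + x) }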

Upgrading to Spark 1.0.0 causes NoSuchMethodError

2014-06-24 Thread Robert James
My app works fine under Spark 0.9. I just tried upgrading to Spark 1.0 by downloading the Spark distro to a dir, changing the sbt file, and running sbt assembly, but I now get NoSuchMethodErrors when trying to use spark-submit. I copied in the SimpleApp example from

Re: Upgrading to Spark 1.0.0 causes NoSuchMethodError

2014-06-24 Thread Robert James
On 6/24/14, Peng Cheng pc...@uow.edu.au wrote: I got 'NoSuchFieldError', which is of the same type. It's definitely a dependency jar conflict. The Spark driver will load its own jars, which in recent versions pull in many dependencies that are 1-2 years old. And if your newer version dependency is in

Re: Passing runtime config to workers?

2014-05-18 Thread Robert James
--- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Fri, May 16, 2014 at 1:59 PM, Robert James srobertja...@gmail.com wrote: What is a good way to pass config variables to workers? I've tried setting them

Workers unable to find class, even when in the SparkConf JAR list

2014-05-16 Thread Robert James
I'm using spark-ec2 to run some Spark code. When I set master to local, then it runs fine. However, when I set master to $MASTER, the workers immediately fail, with java.lang.NoClassDefFoundError for the classes. I've used sbt-assembly to make a jar with the classes, confirmed using jar tvf

Passing runtime config to workers?

2014-05-16 Thread Robert James
What is a good way to pass config variables to workers? I've tried setting them in environment variables via spark-env.sh, but, as far as I can tell, the environment variables set there don't appear in workers' environments. If I want to be able to configure all workers, what's a good way to do
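
Two sketches of approaches commonly used for this, with illustrative names and values: executor environment variables set through SparkConf, and broadcast variables for settings read inside tasks:

    import org.apache.spark.{SparkConf, SparkContext}

    // 1. Propagate an environment variable to every executor (unlike edits to
    //    spark-env.sh made only on the driver machine).
    val conf = new SparkConf()
      .setAppName("ConfigDemo")
      .setExecutorEnv("MYAPP_MODE", "production")
    val sc = new SparkContext(conf)

    // 2. Broadcast a config map and read it inside tasks.
    val settings = sc.broadcast(Map("threshold" -> "10"))
    val count = sc.parallelize(1 to 100)
      .filter(_ > settings.value("threshold").toInt)
      .count()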

Re: Distribute jar dependencies via sc.AddJar(fileName)

2014-05-16 Thread Robert James
I've experienced the same bug, which I had to workaround manually. I posted the details here: http://stackoverflow.com/questions/23687081/spark-workers-unable-to-find-jar-on-ec2-cluster On 5/15/14, DB Tsai dbt...@stanford.edu wrote: Hi guys, I think it maybe a bug in Spark. I wrote some code

What is the difference between a Spark Worker and a Spark Slave?

2014-05-16 Thread Robert James
What is the difference between a Spark Worker and a Spark Slave?

Debugging Spark AWS S3

2014-05-16 Thread Robert James
I have Spark code which runs beautifully when MASTER=local. When I run it with MASTER set to a spark ec2 cluster, the workers seem to run, but the results, which are supposed to be put to AWS S3, don't appear on S3. I'm at a loss for how to debug this. I don't see any S3 exceptions anywhere.
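
A sketch of the kind of setup involved, with placeholder credentials and bucket; one common first check is that the s3n credentials are set on the SparkContext's Hadoop configuration (so the workers see them) rather than only in the driver's shell environment, and that the worker logs, not just the driver console, are searched for exceptions:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("S3Demo"))

    // Hadoop's s3n connector reads these keys; the values are placeholders.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    // Bucket and path are illustrative.
    sc.parallelize(Seq("a", "b", "c"))
      .map(_.toUpperCase)
      .saveAsTextFile("s3n://my-bucket/output")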