compile them inside of their program. That's
the one you mention here. You can choose to use this feature or not.
If you know your configs are not going to change, then you don't need
to set them with spark-submit.
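For concreteness, a minimal sketch of the two choices (app name and values are illustrative): bake the setting into SparkConf, or leave it out of the code and supply it when launching with spark-submit.

  import org.apache.spark.{SparkConf, SparkContext}

  // Choice 1: the setting will never change, so hard-code it.
  val conf = new SparkConf()
    .setAppName("MyApp")                     // illustrative name
    .set("spark.executor.memory", "2g")
  val sc = new SparkContext(conf)

  // Choice 2: leave it out of the code and pass it at launch time instead:
  //   spark-submit --class MyApp --master spark://host:7077 \
  //     --executor-memory 2g my-app.jar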
On Wed, Jul 9, 2014 at 10:22 AM, Robert James srobertja...@gmail.com
wrote:
As a new user, I can definitely say that my experience with Spark has
been rather raw. The appeal of interactive, batch, and in between all
using more or less straight Scala is unarguable. But the experience
of deploying Spark has been quite painful, mainly about gaps between
compile time and
What is the purpose of spark-submit? Does it do anything outside of
the standard val conf = new SparkConf ... val sc = new SparkContext
... ?
I have a Spark app which runs well on local master. I'm now ready to
put it on a cluster. What needs to be installed on the master? What
needs to be installed on the workers?
If the cluster already has Hadoop or YARN or Cloudera, does it still
need an install of Spark?
When I use spark-submit (along with spark-ec2), I get dependency
conflicts. spark-assembly includes older versions of apache commons
codec and httpclient, and these conflict with many of the libs our
software uses.
Is there any way to resolve these? Or, if we use the precompiled
spark, can we
spark-submit includes a spark-assembly uber jar, which has older
versions of many common libraries. These conflict with some of the
dependencies we need. I have been racking my brain trying to find a
solution (including experimenting with ProGuard), but haven't been
able to: when we use
jars in front of classpath, which should do the trick.
However, I had no luck with this; see here:
https://issues.apache.org/jira/browse/SPARK-1863
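For reference, a minimal sketch of the setting that reply seems to refer to: in the Spark 1.0.x era the experimental option was spark.files.userClassPathFirst, which asks executors to load the application's jars before the classes bundled in spark-assembly. As the JIRA above suggests, it did not always help.

  import org.apache.spark.{SparkConf, SparkContext}

  // Experimental in this era: prefer the application's jars on executors
  // over the older versions packed into spark-assembly.
  val conf = new SparkConf()
    .setAppName("MyApp")   // illustrative
    .set("spark.files.userClassPathFirst", "true")
  val sc = new SparkContext(conf)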
On Mon, Jul 7, 2014 at 1:31 PM, Robert James srobertja...@gmail.com
wrote:
spark-submit includes a spark-assembly uber jar, which has older
If I've created a Spark EC2 cluster, how can I add or take away workers?
Also: If I use EC2 spot instances, what happens when Amazon removes
them? Will my computation be saved in any way, or will I need to
restart from scratch?
Finally: The spark-ec2 scripts seem to use Hadoop 1. How can I
I can say from my experience that getting Spark to work with Hadoop 2
is not for the beginner; after solving one problem after another
(dependencies, scripts, etc.), I went back to Hadoop 1.
Spark's Maven artifacts, ec2 scripts, and others all use Hadoop 1 - not sure
why, but, given that, Hadoop 2 has too
Although Spark's home page offers binaries for Spark 1.0.0 with Hadoop
2, the Maven repository only seems to have one version, which uses
Hadoop 1.
Is it possible to use a Maven dependency with Hadoop 2? If so, what is the artifact id?
If not: How can I use the prebuilt binaries to use Hadoop 2? Do I just
copy the
to make a jar assembly using your approach? How? If
not: How do you distribute the jars to the workers?
On Sun, Jun 29, 2014 at 12:20 PM, Robert James srobertja...@gmail.com
wrote:
Although Spark's home page offers binaries for Spark 1.0.0 with Hadoop
2, the Maven repository only seems to have
this problem? (Surely I'm not the only one
using Hadoop 2 and sbt or maven or ivy!)
On Jun 26, 2014 11:07 AM, Robert James srobertja...@gmail.com wrote:
Yes. As far as I can tell, Spark seems to be including Hadoop 1 via
its transitive dependency:
http://mvnrepository.com/artifact
To add Spark to an SBT project, I do:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"
How do I make sure that the spark version which will be downloaded
will depend on, and use, Hadoop 2, and not Hadoop 1?
Even with a line:
libraryDependencies += "org.apache.hadoop" %
According to
http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.0.0
, spark depends on Hadoop 1.0.4. What about the versions of Spark that
work with Hadoop 2? Do they also depend on Hadoop 1.0.4?
How does everyone handle this?
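One workaround that is often suggested (a sketch, not an official recipe; version numbers are illustrative and must match the cluster): exclude the transitive Hadoop 1 client from spark-core, declare the Hadoop 2 client explicitly, and run against a Spark build that was compiled for Hadoop 2.

  // build.sbt sketch
  libraryDependencies ++= Seq(
    ("org.apache.spark" %% "spark-core" % "1.0.0" % "provided")
      .exclude("org.apache.hadoop", "hadoop-client"),
    "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"
  )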
After upgrading to Spark 1.0.0, I get this error:
ERROR org.apache.spark.executor.ExecutorUncaughtExceptionHandler -
Uncaught exception in thread Thread[Executor task launch
worker-2,5,main]
java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.TaskAttemptContext,
We need a centralized spark logging solution. Ideally, it should:
* Allow any Spark process to log at multiple levels (info, warn,
debug) using a single line, similar to log4j
* All logs should go to a central location - so, to read the logs, we
don't need to check each worker by itself
*
My app works fine under Spark 0.9. I just tried upgrading to Spark
1.0, by downloading the Spark distro to a dir, changing the sbt file,
and running sbt assembly, but I get now NoSuchMethodErrors when trying
to use spark-submit.
I copied in the SimpleApp example from
On 6/24/14, Peng Cheng pc...@uow.edu.au wrote:
I got a 'NoSuchFieldError', which is of the same type. It's definitely a
dependency jar conflict: the Spark driver loads its own jars, which in
recent versions pull in many dependencies that are 1-2 years old. And if your
newer version dependency is in
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Fri, May 16, 2014 at 1:59 PM, Robert James
srobertja...@gmail.comwrote:
What is a good way to pass config variables to workers?
I've tried setting them
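As a quick check for the kind of jar conflict described above, it can help to print which jar a suspect class is actually loaded from, both on the driver and on an executor (a diagnostic sketch; the class name is only an example, and sc is the running SparkContext):

  // Which jar does this class really come from?
  val suspect = Class.forName("org.apache.commons.codec.binary.Base64")
  println("driver:   " + suspect.getProtectionDomain.getCodeSource.getLocation)

  sc.parallelize(1 to 1).foreach { _ =>
    val c = Class.forName("org.apache.commons.codec.binary.Base64")
    // Appears in the executor's stdout log, visible in the web UI.
    println("executor: " + c.getProtectionDomain.getCodeSource.getLocation)
  }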
I'm using spark-ec2 to run some Spark code. When I set master to
local, then it runs fine. However, when I set master to $MASTER,
the workers immediately fail, with java.lang.NoClassDefFoundError for
the classes.
I've used sbt-assembly to make a jar with the classes, confirmed using
jar tvf
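One thing worth checking in that situation (a sketch; the jar path is illustrative): the assembly jar has to be shipped to the executors, either by passing it to spark-submit as the application jar or by listing it on the SparkConf, otherwise the workers cannot find the application's classes.

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setMaster(sys.env("MASTER"))   // e.g. spark://<ec2-master>:7077
    .setAppName("MyApp")
    // Ship the sbt-assembly jar to the executors; path is illustrative.
    .setJars(Seq("target/scala-2.10/my-app-assembly-0.1.jar"))
  val sc = new SparkContext(conf)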
What is a good way to pass config variables to workers?
I've tried setting them in environment variables via spark-env.sh, but, as
far as I can tell, the environment variables set there don't appear in
workers' environments. If I want to be able to configure all workers,
what's a good way to do
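One approach that sidesteps spark-env.sh entirely (a sketch; the keys and values are made up): read the settings on the driver and hand them to the tasks through a broadcast variable, so every worker sees the same values regardless of its local environment.

  // Read config on the driver...
  val settings: Map[String, String] = Map(
    "api.endpoint" -> sys.env.getOrElse("API_ENDPOINT", "http://localhost:8080"),
    "batch.size"   -> "500"
  )
  // ...and broadcast it so tasks on every worker can read it.
  val settingsBc = sc.broadcast(settings)

  val processed = sc.textFile("hdfs:///input").map { line =>
    val batchSize = settingsBc.value("batch.size").toInt
    line.take(batchSize)   // placeholder use of the setting
  }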
I've experienced the same bug, which I had to work around manually. I
posted the details here:
http://stackoverflow.com/questions/23687081/spark-workers-unable-to-find-jar-on-ec2-cluster
On 5/15/14, DB Tsai dbt...@stanford.edu wrote:
Hi guys,
I think it maybe a bug in Spark. I wrote some code
What is the difference between a Spark Worker and a Spark Slave?
I have Spark code which runs beautifully when MASTER=local. When I
run it with MASTER set to a spark ec2 cluster, the workers seem to
run, but the results, which are supposed to be put to AWS S3, don't
appear on S3. I'm at a loss for how to debug this. I don't see any
S3 exceptions anywhere.
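For what it's worth, a sketch of the usual S3 write path in this Spark/Hadoop-1 era (bucket and credentials are placeholders): set the s3n credentials on the Hadoop configuration and save with an s3n:// URL; if nothing appears, the executor stdout/stderr logs in the web UI are the first place to look.

  // Credentials and bucket are placeholders.
  sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
  sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

  val data = sc.parallelize(Seq("a", "b", "c"))
  data.saveAsTextFile("s3n://my-bucket/output/run-1")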