at 2:51 PM, James King jakwebin...@gmail.com wrote:
I have two hosts 192.168.1.15 (Master) and 192.168.1.16 (Worker)
These two hosts have exchanged public keys so they have free access to each
other.
But when I do spark home/sbin/start-all.sh from 192.168.1.15 I still get
192.168.1.16: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
this just needs further tuning?
* Increasing executors, RAM, etc. This doesn't make a difference by itself
for this job, so I'm thinking we're already not fully utilising the
resources we have in a smaller cluster.
Again, any recommendations appreciated. Thanks for the help!
James.
On 4 June 2015
to using
hadoopRDD() with the appropriate Input/Output formats?
Any advice or tips greatly appreciated!
James.
run on a specific port?
Regards
jk
On Wed, May 13, 2015 at 7:51 PM, James King jakwebin...@gmail.com wrote:
Indeed, many thanks.
through a context.
So, master != driver and executor != worker.
Best
Ayan
On Fri, May 15, 2015 at 7:52 PM, James King jakwebin...@gmail.com wrote:
So I'm using code like this to use specific ports:
val conf = new SparkConf()
  .setMaster(master)
  .setAppName("namexxx")
  .set
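In case it helps anyone searching later, a rough Java sketch of the same idea; spark.driver.port, spark.blockManager.port and spark.ui.port are standard Spark settings, but the app name and port numbers below are only placeholders:
```
import org.apache.spark.SparkConf;

// Sketch only: pin Spark's ports through SparkConf (host and ports are placeholders).
String master = "spark://master01:7077";
SparkConf conf = new SparkConf()
        .setMaster(master)
        .setAppName("PortPinnedApp")                // hypothetical app name
        .set("spark.driver.port", "51000")          // driver RPC port
        .set("spark.blockManager.port", "51010")    // block manager port on driver and executors
        .set("spark.ui.port", "4040");              // web UI port
```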
From: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
I'm trying to use the direct approach to read messages from Kafka.
Kafka is running as a cluster and configured with Zookeeper.
On the above page it mentions:
In the Kafka parameters, you must specify either metadata.broker.list or
bootstrap.servers; both refer to the same list of brokers in pre-existing Kafka
project apis. I don't know why the Kafka project chose to use 2 different
configuration keys.
On Wed, May 13, 2015 at 5:00 AM, James King jakwebin...@gmail.com wrote:
From:
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
I'm trying to use
I'm trying Kafka Direct approach (for consume) but when I use only this
config:
kafkaParams.put("group.id", groupId);
kafkaParams.put("zookeeper.connect", zookeeperHostAndPort + "/cb_kafka");
I get this
Exception in thread "main" org.apache.spark.SparkException: Must specify
metadata.broker.list or bootstrap.servers
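For anyone hitting the same exception, a minimal Java sketch of the direct approach with a broker list supplied; the broker hosts and topic name below are placeholders, not taken from the original message:
```
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class DirectKafkaSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("DirectKafkaSketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, String> kafkaParams = new HashMap<>();
        // The direct stream talks to the brokers, not ZooKeeper, so a broker list is required.
        kafkaParams.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder hosts

        Set<String> topics = new HashSet<>();
        topics.add("cb_topic"); // placeholder topic name

        JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        messages.print();
        jssc.start();
        jssc.awaitTermination();
    }
}
```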
I understand that this port value is randomly selected.
Is there a way to enforce which spark port a Worker should use?
, James King jakwebin...@gmail.com wrote:
Looking at Consumer Configs in
http://kafka.apache.org/documentation.html#consumerconfigs
The properties metadata.broker.list and bootstrap.servers are not
mentioned.
Do I need these on the consumer side?
On Wed, May 13, 2015 at 3:52 PM, James King jakwebin...@gmail.com wrote:
Many thanks
Indeed, many thanks.
On Wednesday, 13 May 2015, Cody Koeninger c...@koeninger.org wrote:
I believe most ports are configurable at this point, look at
http://spark.apache.org/docs/latest/configuration.html
search for .port
On Wed, May 13, 2015 at 9:38 AM, James King jakwebin...@gmail.com
Thanks Akhil,
I'm using Spark in standalone mode so i guess Mesos is not an option here.
On Tue, May 12, 2015 at 1:27 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Mesos has a HA option (of course it includes zookeeper)
Thanks
Best Regards
On Tue, May 12, 2015 at 4:53 PM, James King
What I want is if the driver dies for some reason and it is restarted I
want to read only messages that arrived into Kafka following the restart of
the driver program and re-connection to Kafka.
Has anyone done this? any links or resources that can help explain this?
Regards
jk
Best Regards
at 9:01 AM, James King jakwebin...@gmail.com wrote:
Thanks Cody.
Here are the events:
- Spark app connects to Kafka first time and starts consuming
- Messages 1 - 10 arrive at Kafka then Spark app gets them
- Now driver dies
- Messages 11 - 15 arrive at Kafka
- Spark driver program
I know that it is possible to use Zookeeper and File System (not for
production use) to achieve HA.
Are there any other options now or in the near future?
that the linked library is
much more flexible/reliable than what's available in Spark at this point.
James, what you're describing is the default behavior for the
createDirectStream api available as part of spark since 1.3. The kafka
parameter auto.offset.reset defaults to largest, ie start at the most recent offsets.
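A hedged illustration of that parameter (the broker host is a placeholder): "largest" is the default and starts at the newest offsets, "smallest" replays each partition from the beginning.
```
import java.util.HashMap;
import java.util.Map;

Map<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", "broker1:9092"); // placeholder broker
kafkaParams.put("auto.offset.reset", "smallest");        // default is "largest" (most recent)
```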
should set
your master URL to be
spark://host01:7077,host02:7077
And the property spark.deploy.recoveryMode=ZOOKEEPER
See here for more info:
http://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper
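A small Java sketch of the application side under that setup; the host names are placeholders, and spark.deploy.recoveryMode is configured on the master daemons themselves, not in the application:
```
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class HaMasterSketch {
    public static void main(String[] args) {
        // List both masters in one spark:// URL; the app fails over to whichever is ACTIVE.
        SparkConf conf = new SparkConf()
                .setMaster("spark://host01:7077,host02:7077")
                .setAppName("HaMasterSketch");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job logic ...
        sc.stop();
    }
}
```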
From: James King
Date: Friday, May 8, 2015 at 11:22 AM
Many Thanks Silvio,
Someone also suggested using something similar :
./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>
Regards
jk
On Fri, May 8, 2015 at 2:12 AM, Silvio Fiorito
silvio.fior...@granturing.com wrote:
Hi James,
If you’re on Spark 1.3 you can use
Why does this not work
./spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class SomeApp --deploy-mode
cluster --supervise --master spark://host01:7077,host02:7077 Some.jar
With exception:
Caused by: java.lang.NumberFormatException: For input string:
7077,host02:7077
It seems to accept only one
I have two hosts host01 and host02 (lets call them)
I run one Master and two Workers on host01
I also run one Master and two Workers on host02
Now I have 1 LIVE Master on host01 and a STANDBY Master on host02
The LIVE Master is aware of all Workers in the cluster
Now I submit a Spark
BTW I'm using Spark 1.3.0.
Thanks
Many thanks all, your responses have been very helpful. Cheers
On Wed, May 6, 2015 at 2:14 PM, ayan guha guha.a...@gmail.com wrote:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#fault-tolerance-semantics
On Wed, May 6, 2015 at 10:09 PM, James King jakwebin
In the O'Reilly book Learning Spark, Chapter 10, section 24/7 Operation,
it talks about 'Receiver Fault Tolerance'.
I'm unsure of what a Receiver is here; from reading it sounds like when you
submit an application to the cluster in cluster mode, i.e. --deploy-mode
cluster, the driver program will
to see
here, move along. :)
On Sat, May 2, 2015 at 2:44 PM Mohammed Guller moham...@glassbeam.com
wrote:
No, you don’t need to do anything special. Perhaps, your application is
getting stuck somewhere? If you can share your code, someone may be able to
help.
Mohammed
*From:* James
I have the following simple example program:
public class SimpleCount {
    public static void main(String[] args) {
        final String master = System.getProperty("spark.master", "local[*]");
        System.out.printf("Running job against spark master %s ...%n", master);
        final SparkConf
/spark-events. And this folder does not exist.
Best Regards,
Shixiong Zhu
2015-04-29 23:22 GMT-07:00 James King jakwebin...@gmail.com:
I'm unclear why I'm getting this exception.
It seems to have realized that I want to enable Event Logging but
ignoring where I want it to log to i.e. file
In all the examples, it seems that the spark application doesn't really do
anything special in order to exit. When I run my application, however, the
spark-submit script just hangs there at the end. Is there something
special I need to do to get that thing to exit normally?
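Not sure this is the cause, but one thing worth checking is whether the context is stopped at the end of main; a minimal Java sketch:
```
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CleanExit {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("CleanExit"));
        // ... job logic ...
        sc.stop(); // releases executors and lets spark-submit return
    }
}
```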
I'm unclear why I'm getting this exception.
It seems to have realized that I want to enable Event Logging but is ignoring
where I want it to log to, i.e. file:/opt/cb/tmp/spark-events, which does
exist.
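For reference, a hedged sketch of the two event-log settings as they would look in code; the directory is the one mentioned above and must already exist:
```
import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
        .setAppName("EventLogExample")                               // hypothetical app name
        .set("spark.eventLog.enabled", "true")
        .set("spark.eventLog.dir", "file:/opt/cb/tmp/spark-events"); // must exist on the driver
```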
spark-defaults.conf
# Example:
spark.master spark://master1:7077,master2:7077
explicitly
Shouldn't Spark just consult with ZK and use the active master?
Or is ZK only used during failure?
On Mon, Apr 27, 2015 at 1:53 PM, James King jakwebin...@gmail.com wrote:
Thanks.
I've set SPARK_HOME and SPARK_CONF_DIR appropriately in .bash_profile
But when I start worker like
I have multiple masters running and I'm trying to submit an application
using
spark-1.3.0-bin-hadoop2.4/bin/spark-submit
with this config (i.e. a comma separated list of master urls)
--master spark://master01:7077,spark://master02:7077
But getting this exception
suggestions?
Should this work?
James.
I renamed spark-defaults.conf.template to spark-defaults.conf
and invoked
spark-1.3.0-bin-hadoop2.4/sbin/start-slave.sh
But I still get
failed to launch org.apache.spark.deploy.worker.Worker:
--properties-file FILE Path to a custom Spark properties file.
, SPARK_CONF_DIR.
On Sun, Apr 26, 2015 at 6:31 PM, James King jakwebin...@gmail.com wrote:
If I have 5 nodes and I wish to maintain 1 Master and 2 Workers on each
node, so in total I will have 5 master and 10 Workers.
Now to maintain that setup I would like to query spark regarding the number
Masters and Workers that are currently available using API calls and then
take some
I'm trying to find out how to setup a resilient Spark cluster.
Things I'm thinking about include:
- How to start multiple masters on different hosts?
- there isn't a conf/masters file from what I can see
Thank you.
Is there a good resource that covers what kind of chatter (communication)
that goes on between driver, master and worker processes?
Thanks
Hi Emre, thanks for the help will have a look. Cheers!
On Tue, Apr 21, 2015 at 1:46 PM, Emre Sevinc emre.sev...@gmail.com wrote:
Hello James,
Did you check the following resources:
-
https://github.com/apache/spark/tree/master/streaming/src/test/java/org/apache/spark/streaming
-
http
I'm trying to write some unit tests for my spark code.
I need to pass a JavaPairDStream<String, String> to my spark class.
Is there a way to create a JavaPairDStream using Java API?
Also is there a good resource that covers an approach (or approaches) for
unit testing using Java.
Regards
jk
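One hedged approach (not necessarily the recommended one): build the pair stream from a queue of pre-made RDDs with queueStream and mapToPair. The class name and data below are made up for illustration:
```
import java.util.Arrays;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class PairDStreamTestSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("test");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Each queued RDD becomes one micro-batch of the stream.
        Queue<JavaRDD<String>> queue = new LinkedList<>();
        queue.add(jssc.sparkContext().parallelize(Arrays.asList("k1,v1", "k2,v2")));

        JavaDStream<String> lines = jssc.queueStream(queue);
        JavaPairDStream<String, String> pairs = lines.mapToPair(line -> {
            String[] parts = line.split(",", 2);
            return new Tuple2<>(parts[0], parts[1]);
        });

        pairs.print();          // replace with the class under test
        jssc.start();
        Thread.sleep(3000);     // let a few batches run
        jssc.stop(true, true);  // stop the SparkContext too, gracefully
    }
}
```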
In the web UI I can see some jobs marked as 'skipped'. What does that mean? Why are
these jobs skipped? Do they ever get executed?
Regards
jk
/apache/spark/graphx/impl/GraphImpl.scala#L237-266
Ankur
On Thu, Apr 9, 2015 at 3:21 AM, James alcaid1...@gmail.com wrote:
In aggregateMessagesWithActiveSet, Spark still has to read all edges. It
means that a fixed cost which scales with graph size is unavoidable on a
pregel-like iteration
Any idea what this means, many thanks
==
logs/spark-.-org.apache.spark.deploy.worker.Worker-1-09.out.1
==
15/04/13 07:07:22 INFO Worker: Starting Spark worker 09:39910 with 4
cores, 6.6 GB RAM
15/04/13 07:07:22 INFO Worker: Running Spark version 1.3.0
15/04/13 07:07:22 INFO
(...)
Ankur
On Tue, Apr 7, 2015 at 2:56 AM, James alcaid1...@gmail.com wrote:
Hello,
The old api of GraphX mapReduceTriplets has an optional parameter
activeSetOpt: Option[(VertexRDD[_], EdgeDirection)] that limits the input of sendMessage.
However, in the new api aggregateMessages I could not find this option;
why is it not offered any more?
Alcaid
when running the thrift server,
I need to create a Hive table definition first? Is that the case, or did I
miss something? If it is, is there some sensible way to automate this?
Many thanks!
James
[1]
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
by periodically
restarting the server with a new context internally. That certainly beats
manual curation of Hive table definitions, if it will work?
Thanks again,
James.
On 7 April 2015 at 19:30, Michael Armbrust mich...@databricks.com wrote:
1) What exactly is the relationship between the thrift
The example below illustrates how to use the DIMSUM algorithm to calculate
the similarity between each pair of rows and output row pairs with cosine
similarity that is not less than a threshold.
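MLlib's DIMSUM implementation is exposed through RowMatrix.columnSimilarities, which computes similarities between columns (to compare rows you would feed in the transpose). A hedged Java sketch with a tiny made-up matrix and a placeholder threshold:
```
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
import org.apache.spark.mllib.linalg.distributed.MatrixEntry;
import org.apache.spark.mllib.linalg.distributed.RowMatrix;

public class DimsumSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setMaster("local[2]").setAppName("DimsumSketch"));

        JavaRDD<Vector> rows = sc.parallelize(Arrays.asList(
                Vectors.dense(1.0, 2.0, 3.0),
                Vectors.dense(4.0, 5.0, 6.0),
                Vectors.dense(7.0, 8.0, 9.0)));

        RowMatrix mat = new RowMatrix(rows.rdd());

        // DIMSUM: approximate cosine similarities between COLUMNS above the given threshold.
        CoordinateMatrix sims = mat.columnSimilarities(0.5); // threshold is a placeholder

        for (MatrixEntry e : sims.entries().toJavaRDD().collect()) {
            System.out.printf("cols (%d, %d) similarity %.4f%n", e.i(), e.j(), e.value());
        }
        sc.stop();
    }
}
```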
I'm reading a stream of string lines that are in json format.
I'm using Java with Spark.
Is there a way to get this from a transformation? so that I end up with a
stream of JSON objects.
I would also welcome any feedback about this approach or alternative
approaches.
thanks
jk
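One hedged way to do it in Java, assuming Jackson is on the classpath (any JSON library would do): map each line to a parsed java.util.Map and drop lines that fail to parse.
```
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.spark.streaming.api.java.JavaDStream;

public class JsonLinesSketch {
    // Transform a stream of JSON text lines into a stream of parsed key/value maps.
    @SuppressWarnings("unchecked")
    public static JavaDStream<Map<String, Object>> parse(JavaDStream<String> lines) {
        return lines.map(line -> {
            try {
                // A fresh mapper per record keeps the closure trivially serializable;
                // cache one per partition for real workloads.
                return (Map<String, Object>) new ObjectMapper().readValue(line, Map.class);
            } catch (Exception e) {
                return null; // malformed line
            }
        }).filter(m -> m != null);
    }
}
```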
I have a simple setup/runtime of Kafka and Spark.
I have a command line consumer displaying arrivals to Kafka topic. So i
know messages are being received.
But when I try to read from Kafka topic I get no messages, here are some
logs below.
I'm thinking there aren't enough threads. How do i
receiving data from sources like Kafka.
2015-04-01 16:18 GMT+08:00 James King jakwebin...@gmail.com:
Thank you bit1129,
From looking at the web UI i can see 2 cores
Also looking at http://spark.apache.org/docs/1.2.1/configuration.html
But can't see obvious configuration for number of receivers
Please make sure that you have given more cores than Receiver numbers.
*From:* James King jakwebin...@gmail.com
*Date:* 2015-04-01 15:21
*To:* user user@spark.apache.org
*Subject:* Spark + Kafka
I have a simple setup/runtime of Kafka and Sprak.
I have a command line consumer displaying
().getSimpleName())
.setMaster(master);
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf,
Durations.seconds(duration));
return ssc;
}
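On the receiver/cores point above, a hedged sketch of the minimum local setup: with one receiver you need at least two threads, one to receive and one to process batches (the app name is a placeholder).
```
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Sketch: use at least (number of receivers + 1) threads, e.g. local[2] for one receiver.
SparkConf sparkConf = new SparkConf()
        .setMaster("local[2]")
        .setAppName("KafkaWordCount");               // hypothetical app name
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
```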
On Wed, Apr 1, 2015 at 11:37 AM, James King jakwebin...@gmail.com wrote:
Thanks Saisai,
Sure will do.
But just a quick note that when i set master
I'm trying to run the Java NetworkWordCount example against a simple spark
standalone runtime of one master and one worker.
But it doesn't seem to work, the text entered on the Netcat data server is
not being picked up and printed to Eclipse console output.
However if I use
Hello,
Is that possible to delete shuffle data of previous iteration as it is not
necessary?
Alcaid
On Mar 18, 2015, at 2:38 AM, James King jakwebin...@gmail.com wrote:
Hi All,
Which build of Spark is best when using Kafka?
Regards
jk
Hello All,
I'm using Spark for streaming but I'm unclear on which implementation
language to use: Java, Scala or Python.
I don't know anything about Python, familiar with Scala and have been doing
Java for a long time.
I think the above shouldn't influence my decision on which language to use
Many thanks all for the good responses, appreciated.
On Thu, Mar 19, 2015 at 8:36 AM, James King jakwebin...@gmail.com wrote:
Thanks Khanderao.
On Wed, Mar 18, 2015 at 7:18 PM, Khanderao Kand Gmail
khanderao.k...@gmail.com wrote:
I have used various version of spark (1.0, 1.2.1) without
keep the most complex Scala
constructions out of your code)
not including the mailing
list in the response, I'm the only one who will get your message.
Regards,
Jeff
2015-03-18 10:49 GMT+01:00 James King jakwebin...@gmail.com:
Any sub-category recommendations hadoop, MapR, CDH?
On Wed, Mar 18, 2015 at 10:48 AM, James King jakwebin...@gmail.com
wrote
I have got a NullPointerException in aggregateMessages on a graph which is
the output of the mapVertices function of a graph. I found the problem is
that the mapVertices function did not affect all the triplets of the
graph.
// Initialize the graph, assign a counter to each vertex that contains
the
Hello,
I have got a cluster with Spark on YARN. Currently some of its nodes are
running a Spark Streaming program, so their local space is not enough to
support other applications. So I wonder: is it possible to use a
blacklist to avoid using these nodes when running a new Spark program?
My hadoop version is 2.2.0, and my spark version is 1.2.0
2015-03-14 17:22 GMT+08:00 Ted Yu yuzhih...@gmail.com:
Which release of hadoop are you using ?
Can you utilize node labels feature ?
See YARN-2492 and YARN-796
Cheers
On Sat, Mar 14, 2015 at 1:49 AM, James alcaid1...@gmail.com
, James alcaid1...@gmail.com wrote:
Hello,
I want to execute a hql script through `spark-sql` command, my script
contains:
```
ALTER TABLE xxx
DROP PARTITION (date_key = ${hiveconf:CUR_DATE});
```
when I execute
```
spark-sql -f script.hql -hiveconf CUR_DATE=20150119
```
It throws an error like
```
cannot recognize input near
shortest path is an option, you could simply
find the APSP using https://github.com/apache/spark/pull/3619 and then
take the average distance (apsp.map(_._2.toDouble).mean).
Ankur http://www.ankurdave.com/
On Sun, Jan 4, 2015 at 6:28 PM, James alcaid1...@gmail.com wrote:
Recently we want
Hello,
I am trying to load a very large graph to run a GraphX algorithm, and the
graph does not fit in memory.
I found that if I use the DISK_ONLY or MEMORY_AND_DISK_SER storage level, the
program will hit OOM, but if I use MEMORY_ONLY_SER, it will not.
Thus I want to know what kind of
, Nov 1, 2014 at 10:57 PM, James alcaid1...@gmail.com wrote:
Hello,
I am trying to run the Connected Components algorithm on a very big graph. In
practice I found that a small number of partitions would lead to OOM,
while a large number would cause various timeout exceptions. Thus I wonder
how to estimate the number of partitions of a graph in GraphX?
I have a related question. With Hadoop, I would do the same thing for
non-serializable objects and setup(). I also had a use case where it
was so expensive to initialize the non-serializable object that I
would make it a static member of the mapper, turn on JVM reuse across
tasks, and then
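For what it's worth, a rough Spark equivalent of that Hadoop pattern (a sketch only; ExpensiveClient is a hypothetical stand-in for the non-serializable object): hold it in a lazily initialized static field so it is built once per executor JVM, and use mapPartitions so it is looked up once per partition rather than per record.
```
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;

public class PerPartitionInitSketch {
    // Hypothetical, non-serializable, expensive-to-build client.
    static class ExpensiveClient {
        String lookup(String key) { return key.toUpperCase(); }
    }

    // Created lazily, once per executor JVM, like a static member of a Hadoop mapper.
    private static ExpensiveClient client;
    private static synchronized ExpensiveClient getClient() {
        if (client == null) { client = new ExpensiveClient(); }
        return client;
    }

    public static JavaRDD<String> enrich(JavaRDD<String> input) {
        // Spark 1.x FlatMapFunction returns an Iterable per partition.
        return input.mapPartitions((Iterator<String> it) -> {
            ExpensiveClient c = getClient();
            List<String> out = new ArrayList<>();
            while (it.hasNext()) { out.add(c.lookup(it.next())); }
            return out;
        });
    }
}
```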
compile them inside of their program. That's
the one you mention here. You can choose to use this feature or not.
If you know your configs are not going to change, then you don't need
to set them with spark-submit.
On Wed, Jul 9, 2014 at 10:22 AM, Robert James srobertja...@gmail.com
wrote:
What
As a new user, I can definitely say that my experience with Spark has
been rather raw. The appeal of interactive, batch, and in between all
using more or less straight Scala is unarguable. But the experience
of deploying Spark has been quite painful, mainly about gaps between
compile time and
What is the purpose of spark-submit? Does it do anything outside of
the standard val conf = new SparkConf ... val sc = new SparkContext
... ?
I have a Spark app which runs well on local master. I'm now ready to
put it on a cluster. What needs to be installed on the master? What
needs to be installed on the workers?
If the cluster already has Hadoop or YARN or Cloudera, does it still
need an install of Spark?
When I use spark-submit (along with spark-ec2), I get dependency
conflicts. spark-assembly includes older versions of apache commons
codec and httpclient, and these conflict with many of the libs our
software uses.
Is there any way to resolve these? Or, if we use the precompiled
spark, can we
spark-submit includes a spark-assembly uber jar, which has older
versions of many common libraries. These conflict with some of the
dependencies we need. I have been racking my brain trying to find a
solution (including experimenting with ProGuard), but haven't been
able to: when we use
jars in front of classpath, which should do
the trick.
however i had no luck with this. see here:
https://issues.apache.org/jira/browse/SPARK-1863
On Mon, Jul 7, 2014 at 1:31 PM, Robert James srobertja...@gmail.com
wrote:
spark-submit includes a spark-assembly uber jar, which has older
If I've created a Spark EC2 cluster, how can I add or take away workers?
Also: If I use EC2 spot instances, what happens when Amazon removes
them? Will my computation be saved in any way, or will I need to
restart from scratch?
Finally: The spark-ec2 scripts seem to use Hadoop 1. How can I
I can say from my experience that getting Spark to work with Hadoop 2
is not for the beginner; after solving one problem after another
(dependencies, scripts, etc.), I went back to Hadoop 1.
Spark's Maven, ec2 scripts, and others all use Hadoop 1 - not sure
why, but, given so, Hadoop 2 has too
Although Spark's home page offers binaries for Spark 1.0.0 with Hadoop
2, the Maven repository only seems to have one version, which uses
Hadoop 1.
Is it possible to use a Maven link and Hadoop 2? What is the id?
If not: How can I use the prebuilt binaries to use Hadoop 2? Do I just
copy the
to make a jar assembly using your approach? How? If
not: How do you distribute the jars to the workers?
On Sun, Jun 29, 2014 at 12:20 PM, Robert James srobertja...@gmail.com
wrote:
Although Spark's home page offers binaries for Spark 1.0.0 with Hadoop
2, the Maven repository only seems to have
this problem? (Surely I'm not the only one
using Hadoop 2 and sbt or maven or ivy!)
On Jun 26, 2014 11:07 AM, Robert James srobertja...@gmail.com wrote:
Yes. As far as I can tell, Spark seems to be including Hadoop 1 via
its transitive dependency:
http://mvnrepository.com/artifact
To add Spark to a SBT project, I do:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"
How do I make sure that the spark version which will be downloaded
will depend on, and use, Hadoop 2, and not Hadoop 1?
Even with a line:
libraryDependencies += "org.apache.hadoop" %
According to
http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.0.0
, spark depends on Hadoop 1.0.4. What about the versions of Spark that
work with Hadoop 2? Do they also depend on Hadoop 1.0.4?
How does everyone handle this?
After upgrading to Spark 1.0.0, I get this error:
ERROR org.apache.spark.executor.ExecutorUncaughtExceptionHandler -
Uncaught exception in thread Thread[Executor task launch
worker-2,5,main]
java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.TaskAttemptContext,
We need a centralized spark logging solution. Ideally, it should:
* Allow any Spark process to log at multiple levels (info, warn,
debug) using a single line, similar to log4j
* All logs should go to a central location - so, to read the logs, we
don't need to check each worker by itself
*
My app works fine under Spark 0.9. I just tried upgrading to Spark
1.0, by downloading the Spark distro to a dir, changing the sbt file,
and running sbt assembly, but I get now NoSuchMethodErrors when trying
to use spark-submit.
I copied in the SimpleApp example from
On 6/24/14, Peng Cheng pc...@uow.edu.au wrote:
I got 'NoSuchFieldError' which is of the same type. It's definitely a
dependency jar conflict. The Spark driver will load its own jars, which in
recent versions pull in many dependencies that are 1-2 years old. And if your
newer version dependency is in
Unsubscribe
James Jones
On Fri, May 16, 2014 at 1:59 PM, Robert James
srobertja...@gmail.comwrote:
What is a good way to pass config variables to workers?
I've tried setting them
I'm using spark-ec2 to run some Spark code. When I set master to
local, then it runs fine. However, when I set master to $MASTER,
the workers immediately fail, with java.lang.NoClassDefFoundError for
the classes.
I've used sbt-assembly to make a jar with the classes, confirmed using
jar tvf
What is a good way to pass config variables to workers?
I've tried setting them in environment variables via spark-env.sh, but, as
far as I can tell, the environment variables set there don't appear in
workers' environments. If I want to be able to configure all workers,
what's a good way to do
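Two hedged options that may help (spark.executorEnv.* is a real Spark setting; the variable names and values below are placeholders): export an environment variable into each executor via the SparkConf, or broadcast the value and read it inside tasks instead of relying on the environment.
```
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class WorkerConfigSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("WorkerConfigSketch")
                // Becomes the environment variable MY_SETTING in each executor process.
                .set("spark.executorEnv.MY_SETTING", "42");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Alternative: broadcast the value and read it in the task closure.
        Broadcast<String> mode = sc.broadcast("staging"); // placeholder value
        sc.parallelize(Arrays.asList(1, 2, 3))
          .foreach(x -> System.out.println(mode.value() + " " + System.getenv("MY_SETTING")));

        sc.stop();
    }
}
```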
I've experienced the same bug, which I had to workaround manually. I
posted the details here:
http://stackoverflow.com/questions/23687081/spark-workers-unable-to-find-jar-on-ec2-cluster
On 5/15/14, DB Tsai dbt...@stanford.edu wrote:
Hi guys,
I think it maybe a bug in Spark. I wrote some code
What is the difference between a Spark Worker and a Spark Slave?