at 2:51 PM, James King jakwebin...@gmail.com wrote:
I have two hosts 192.168.1.15 (Master) and 192.168.1.16 (Worker).
These two hosts have exchanged public keys so they have free access to each
other.
But when I do $SPARK_HOME/sbin/start-all.sh from 192.168.1.15 I still get
192.168.1.16: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Is there a way to make it run on a specific port?
Regards
jk
On Wed, May 13, 2015 at 7:51 PM, James King jakwebin...@gmail.com wrote:
Indeed, many thanks.
On Wednesday, 13 May 2015, Cody Koeninger c...@koeninger.org wrote:
I believe most ports are configurable at this point, look at
http://spark.apache.org/docs/latest/configuration.html and search for .port
The driver talks to the executors through a context.
So, master != driver and executor != worker.
Best
Ayan
On Fri, May 15, 2015 at 7:52 PM, James King jakwebin...@gmail.com wrote:
So I'm using code like this to use specific ports:
val conf = new SparkConf()
  .setMaster(master)
  .setAppName("namexxx")
  .set
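For reference, a rough sketch (in Java, like most of the code in this thread) of pinning a few ports explicitly; the property names are the ".port" keys from the configuration page linked above, while the port numbers, app name and master URL below are just placeholders:

import org.apache.spark.SparkConf;

// Sketch only: pin some of Spark's configurable ports via SparkConf.
String master = "spark://host01:7077";            // placeholder master URL
SparkConf conf = new SparkConf()
        .setMaster(master)
        .setAppName("myApp")                       // placeholder app name
        .set("spark.driver.port", "51000")         // port the driver listens on
        .set("spark.blockManager.port", "51001")   // block manager port
        .set("spark.ui.port", "4040");             // web UI port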
From: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
I'm trying to use the direct approach to read messages from Kafka.
Kafka is running as a cluster and configured with Zookeeper.
On the above page it mentions:
In the Kafka parameters, you must specify either metadata.broker.list or
bootstrap.servers.
Those are the names used for the list of brokers in the pre-existing Kafka
project APIs. I don't know why the Kafka project chose to use 2 different
configuration keys.
On Wed, May 13, 2015 at 5:00 AM, James King jakwebin...@gmail.com wrote:
From:
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
I'm trying the Kafka direct approach (for the consumer side) but when I use
only this config:
kafkaParams.put("group.id", groupdid);
kafkaParams.put("zookeeper.connect", zookeeperHostAndPort + "/cb_kafka");
I get this:
Exception in thread "main" org.apache.spark.SparkException: Must specify
metadata.broker.list or bootstrap.servers
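For what it's worth, a minimal sketch of what the direct approach expects: the broker list rather than zookeeper.connect. The broker addresses, topic name and variable names below are made up, and ssc is assumed to be an existing JavaStreamingContext:

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import kafka.serializer.StringDecoder;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

// Sketch: the direct approach talks to the brokers directly, so it needs metadata.broker.list.
Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "kafka01:9092,kafka02:9092"); // placeholder brokers

Set<String> topics = new HashSet<String>(Arrays.asList("cb_topic"));  // placeholder topic

JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
        ssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topics);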
I understood that this port value is randomly selected.
Is there a way to enforce which Spark port a Worker should use?
, James King jakwebin...@gmail.com wrote:
Looking at Consumer Configs in
http://kafka.apache.org/documentation.html#consumerconfigs
The properties *metadata.broker.list* or *bootstrap.servers* are not
mentioned.
Do I need these on the consumer side?
On Wed, May 13, 2015 at 3:52 PM, James King jakwebin...@gmail.com wrote:
Many thanks
Indeed, many thanks.
On Wednesday, 13 May 2015, Cody Koeninger c...@koeninger.org wrote:
I believe most ports are configurable at this point, look at
http://spark.apache.org/docs/latest/configuration.html
search for .port
On Wed, May 13, 2015 at 9:38 AM, James King jakwebin...@gmail.com wrote:
Thanks Akhil,
I'm using Spark in standalone mode so I guess Mesos is not an option here.
On Tue, May 12, 2015 at 1:27 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
Mesos has an HA option (of course it includes ZooKeeper)
Thanks
Best Regards
On Tue, May 12, 2015 at 4:53 PM, James King
What I want is: if the driver dies for some reason and it is restarted, I
want to read only messages that arrived into Kafka following the restart of
the driver program and re-connection to Kafka.
Has anyone done this? Any links or resources that can help explain this?
Regards
jk
Best Regards
On Tue, May 12, 2015 at 5:15 PM, James King jakwebin...@gmail.com wrote:
at 9:01 AM, James King jakwebin...@gmail.com wrote:
Thanks Cody.
Here are the events:
- Spark app connects to Kafka first time and starts consuming
- Messages 1 - 10 arrive at Kafka then Spark app gets them
- Now driver dies
- Messages 11 - 15 arrive at Kafka
- Spark driver program is restarted
I know that it is possible to use Zookeeper and File System (not for
production use) to achieve HA.
Are there any other options now or in the near future?
If you're using an earlier version of Spark, you can
accomplish what you want simply by starting the job using a new consumer
group (there will be no prior state in ZooKeeper, so it will start
consuming according to auto.offset.reset).
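A sketch of what that suggestion might look like in the Kafka params; the group name and broker address are placeholders, and "largest" tells the consumer to start from the newest offsets:

import java.util.HashMap;
import java.util.Map;

// Sketch: a fresh consumer group per restart plus auto.offset.reset,
// so the restarted driver only sees messages arriving from now on.
Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "kafka01:9092");              // placeholder broker
kafkaParams.put("group.id", "my-app-" + System.currentTimeMillis());  // new group each run
kafkaParams.put("auto.offset.reset", "largest");                      // start at the newest offsets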
On Tue, May 12, 2015 at 7:26 AM, James King jakwebin...@gmail.com wrote:
You should set
your master URL to be
spark://host01:7077,host02:7077
and the property spark.deploy.recoveryMode=ZOOKEEPER.
See here for more info:
http://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper
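For completeness, a sketch of the corresponding master-side setup from that page, set in conf/spark-env.sh on each master host; the ZooKeeper hosts and the znode dir below are placeholders:

# Sketch only: enable ZooKeeper-based recovery for the standalone masters.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk01:2181,zk02:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"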
From: James King
Date: Friday, May 8, 2015 at 11:22 AM
You can use the kill command in spark-submit to
shut it down. You'll need the driver id from the Spark UI or from when you
submitted the app.
spark-submit --master spark://master:7077 --kill driver-id
Thanks,
Silvio
From: James King
Date: Wednesday, May 6, 2015 at 12:02 PM
To: user
Subject: Stop
Why does this not work?
./spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class SomeApp --deploy-mode
cluster --supervise --master spark://host01:7077,host02:7077 Some.jar
With exception:
Caused by: java.lang.NumberFormatException: For input string:
"7077,host02:7077"
It seems to accept only one master URL.
I have two hosts host01 and host02 (let's call them that).
I run one Master and two Workers on host01.
I also run one Master and two Workers on host02.
Now I have 1 LIVE Master on host01 and a STANDBY Master on host02.
The LIVE Master is aware of all Workers in the cluster.
Now I submit a Spark application
BTW I'm using Spark 1.3.0.
Thanks
On Fri, May 8, 2015 at 5:22 PM, James King jakwebin...@gmail.com wrote:
Many thanks all, your responses have been very helpful. Cheers
On Wed, May 6, 2015 at 2:14 PM, ayan guha guha.a...@gmail.com wrote:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#fault-tolerance-semantics
On Wed, May 6, 2015 at 10:09 PM, James King jakwebin
In the O'Reilly book Learning Spark, Chapter 10, section 24/7 Operation,
it talks about 'Receiver Fault Tolerance'.
I'm unsure of what a Receiver is here; from reading it sounds like when you
submit an application to the cluster in cluster mode, i.e. *--deploy-mode
cluster*, the driver program will run on one of the Worker nodes.
/spark-events. And this folder does
not exist.
Best Regards,
Shixiong Zhu
2015-04-29 23:22 GMT-07:00 James King jakwebin...@gmail.com:
I'm unclear why I'm getting this exception.
It seems to have realized that I want to enable Event Logging but it is
ignoring where I want it to log to, i.e. file:/opt/cb/tmp/spark-events, which does
exist.
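If it helps, the event-log settings I'd expect in spark-defaults.conf look roughly like this; the keys are the standard spark.eventLog.* ones and the directory is the one from the message above:

# Sketch: enable event logging and point it at an existing directory.
spark.eventLog.enabled   true
spark.eventLog.dir       file:/opt/cb/tmp/spark-events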
spark-defaults.conf
# Example:
spark.master spark://master1:7077,master2:7077
Do I have to list both masters explicitly?
Shouldn't Spark just consult with ZK and use the active master?
Or is ZK only used during failure?
On Mon, Apr 27, 2015 at 1:53 PM, James King jakwebin...@gmail.com wrote:
Thanks.
I've set SPARK_HOME and SPARK_CONF_DIR appropriately in .bash_profile
But when I start worker like
I have multiple masters running and I'm trying to submit an application
using
spark-1.3.0-bin-hadoop2.4/bin/spark-submit
with this config (i.e. a comma-separated list of master URLs):
--master spark://master01:7077,spark://master02:7077
But getting this exception
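As a side note, per the earlier reply in this thread the multi-master URL writes spark:// only once, followed by a comma-separated host:port list, so a sketch of the submit line (host names are placeholders) would be:

spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
  --master spark://master01:7077,master02:7077 \
  --class SomeApp Some.jar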
I renamed spark-defaults.conf.template to spark-defaults.conf
and invoked
spark-1.3.0-bin-hadoop2.4/sbin/start-slave.sh
But I still get
failed to launch org.apache.spark.deploy.worker.Worker:
--properties-file FILE Path to a custom Spark properties file.
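That output looks like the Worker's usage message, which it prints when it isn't given a master URL. A guess at an invocation that would work (in 1.3 the script, if I recall correctly, also wants a worker number before the master URL; the host below is a placeholder):

# Sketch: pass a worker number and the master URL to start-slave.sh (Spark 1.3.x).
spark-1.3.0-bin-hadoop2.4/sbin/start-slave.sh 1 spark://master01:7077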
, SPARK_CONF_DIR.
On Mon, Apr 27, 2015 at 12:56 PM James King jakwebin...@gmail.com wrote:
On Sun, Apr 26, 2015 at 6:31 PM, James King jakwebin...@gmail.com wrote:
If I have 5 nodes and I wish to maintain 1 Master and 2 Workers on each
node, then in total I will have 5 Masters and 10 Workers.
Now, to maintain that setup, I would like to query Spark regarding the number of
Masters and Workers that are currently available using API calls and then
take some action.
I'm trying to find out how to set up a resilient Spark cluster.
Things I'm thinking about include:
- How to start multiple masters on different hosts?
- There isn't a conf/masters file from what I can see.
Thank you.
http://twitter.com/deanwampler
http://polyglotprogramming.com
On Fri, Apr 24, 2015 at 5:01 AM, James King jakwebin...@gmail.com wrote:
Is there a good resource that covers what kind of chatter (communication)
goes on between driver, master and worker processes?
Thanks
http://www.slideshare.net/databricks/strata-sj-everyday-im-shuffling-tips-for-writing-better-spark-programs
--
Emre Sevinç
http://www.bigindustries.be/
On Tue, Apr 21, 2015 at 1:26 PM, James King jakwebin...@gmail.com wrote:
I'm trying to write some unit tests for my Spark code.
I need to pass a JavaPairDStream<String, String> to my Spark class.
Is there a way to create a JavaPairDStream using the Java API?
Also, is there a good resource that covers an approach (or approaches) for
unit testing using Java?
Regards
jk
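One way I can imagine building a JavaPairDStream for a test (a sketch only; it assumes Java 8 lambdas and uses made-up sample data): push pre-built RDDs through JavaStreamingContext.queueStream and mapToPair the result.

import java.util.Arrays;
import java.util.LinkedList;
import java.util.Queue;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("test");
JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1));

// Each queued RDD becomes one micro-batch of the test stream.
Queue<JavaRDD<String>> queue = new LinkedList<JavaRDD<String>>();
queue.add(ssc.sparkContext().parallelize(Arrays.asList("k1,v1", "k2,v2")));

JavaPairDStream<String, String> pairs = ssc.queueStream(queue)
        .mapToPair(line -> {
            String[] parts = line.split(",");
            return new Tuple2<String, String>(parts[0], parts[1]);
        });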
In the web UI I can see some jobs as 'skipped'. What does that mean? Why are
these jobs skipped? Do they ever get executed?
Regards
jk
Any idea what this means? Many thanks.
==
logs/spark-.-org.apache.spark.deploy.worker.Worker-1-09.out.1
==
15/04/13 07:07:22 INFO Worker: Starting Spark worker 09:39910 with 4
cores, 6.6 GB RAM
15/04/13 07:07:22 INFO Worker: Running Spark version 1.3.0
15/04/13 07:07:22 INFO
I'm reading a stream of string lines that are in JSON format.
I'm using Java with Spark.
Is there a way to get this from a transformation, so that I end up with a
stream of JSON objects?
I would also welcome any feedback about this approach or alternative
approaches.
thanks
jk
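One possible approach (a sketch; it assumes Jackson is on the classpath and that `lines` is the JavaDStream<String> of raw JSON strings): parse inside mapPartitions so the mapper is created once per partition rather than once per record.

import java.util.ArrayList;
import java.util.List;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.spark.streaming.api.java.JavaDStream;

JavaDStream<JsonNode> jsonObjects = lines.mapPartitions(iter -> {
    ObjectMapper mapper = new ObjectMapper();    // one mapper per partition, not per record
    List<JsonNode> out = new ArrayList<JsonNode>();
    while (iter.hasNext()) {
        out.add(mapper.readTree(iter.next()));   // each line becomes a parsed JSON tree
    }
    return out;                                  // Spark 1.x FlatMapFunction returns an Iterable
});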
I have a simple setup/runtime of Kafka and Spark.
I have a command line consumer displaying arrivals to the Kafka topic, so I
know messages are being received.
But when I try to read from the Kafka topic I get no messages; here are some
logs below.
I'm thinking there aren't enough threads. How do I add more?
Receivers occupy a core while receiving data from sources like Kafka.
2015-04-01 16:18 GMT+08:00 James King jakwebin...@gmail.com:
Thank you bit1129,
From looking at the web UI I can see 2 cores.
Also looking at http://spark.apache.org/docs/1.2.1/configuration.html
but I can't see an obvious configuration for the number of receivers.
Please make sure that you have given more cores than the number of Receivers.
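In other words, when running locally with the receiver-based Kafka stream, give the app at least one more thread than the number of receivers. A sketch (the app name is a placeholder):

import org.apache.spark.SparkConf;

// Sketch: one receiver needs at least local[2] -- one core to receive, one to process.
SparkConf conf = new SparkConf()
        .setAppName("KafkaTest")     // placeholder app name
        .setMaster("local[2]");      // use local[n] with n > number of receivers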
*From:* James King jakwebin...@gmail.com
*Date:* 2015-04-01 15:21
*To:* user user@spark.apache.org
*Subject:* Spark + Kafka
I have a simple setup/runtime of Kafka and Spark.
I have a command line consumer displaying
SparkConf sparkConf = new SparkConf()
    .setAppName(getClass().getSimpleName())
    .setMaster(master);
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf,
    Durations.seconds(duration));
return ssc;
}
On Wed, Apr 1, 2015 at 11:37 AM, James King jakwebin...@gmail.com wrote:
Thanks Saisai,
Sure will do.
But just a quick note that when I set master
I'm trying to run the Java NetworkWordCount example against a simple Spark
standalone runtime of one master and one worker.
But it doesn't seem to work; the text entered on the Netcat data server is
not being picked up and printed to the Eclipse console output.
However if I use
at 6:31 PM, James King jakwebin...@gmail.com wrote:
On Mar 18, 2015, at 2:38 AM, James King jakwebin...@gmail.com wrote:
Hi All,
Which build of Spark is best when using Kafka?
Regards
jk
Hello All,
I'm using Spark for streaming but I'm unclear on which implementation
language to use: Java, Scala or Python.
I don't know anything about Python, am familiar with Scala, and have been doing
Java for a long time.
I think the above shouldn't influence my decision on which language to use.
Many thanks all for the good responses, appreciated.
On Thu, Mar 19, 2015 at 8:36 AM, James King jakwebin...@gmail.com wrote:
Thanks Khanderao.
On Wed, Mar 18, 2015 at 7:18 PM, Khanderao Kand Gmail
khanderao.k...@gmail.com wrote:
I have used various versions of Spark (1.0, 1.2.1) without any issues.
keep the most complex Scala
constructions out of your code)
On Thu, Mar 19, 2015 at 3:50 PM, James King jakwebin...@gmail.com wrote:
Note that by not including the mailing
list in the response, I'm the only one who will get your message.
Regards,
Jeff
2015-03-18 10:49 GMT+01:00 James King jakwebin...@gmail.com:
Any sub-category recommendations: Hadoop, MapR, CDH?
On Wed, Mar 18, 2015 at 10:48 AM, James King jakwebin...@gmail.com wrote: