Re: spark ssh to slave

2015-06-08 Thread James King
at 2:51 PM, James King jakwebin...@gmail.com wrote: I have two hosts 192.168.1.15 (Master) and 192.168.1.16 (Worker). These two hosts have exchanged public keys, so they have free access to each other. But when I do $SPARK_HOME/sbin/start-all.sh from 192.168.1.15 I still get 192.168.1.16

spark ssh to slave

2015-06-08 Thread James King
I have two hosts 192.168.1.15 (Master) and 192.168.1.16 (Worker). These two hosts have exchanged public keys, so they have free access to each other. But when I do $SPARK_HOME/sbin/start-all.sh from 192.168.1.15 I still get 192.168.1.16: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
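
One way to narrow this down (a sketch, assuming the launch scripts run as the same OS user on both hosts; hostnames and commands are illustrative) is to reproduce the ssh call the script makes and check which key is being offered:

  # from 192.168.1.15, as the user that runs start-all.sh
  ssh -v 192.168.1.16 echo ok     # -v shows which keys are offered and why they are rejected

  # if that user's key is not authorized on the worker, push it explicitly
  ssh-copy-id 192.168.1.16

If start-all.sh is run via sudo or as a different user than the one whose keys were exchanged, the worker will reject the login even though a manual ssh works.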

Re: Worker Spark Port

2015-05-15 Thread James King
run on a specific port? Regards jk On Wed, May 13, 2015 at 7:51 PM, James King jakwebin...@gmail.com wrote: Indeed, many thanks. On Wednesday, 13 May 2015, Cody Koeninger c...@koeninger.org wrote: I believe most ports are configurable at this point, look at http://spark.apache.org/docs

Re: Worker Spark Port

2015-05-15 Thread James King
through a context. So, master != driver and executor != worker. Best Ayan On Fri, May 15, 2015 at 7:52 PM, James King jakwebin...@gmail.com wrote: So I'm using code like this to use specific ports: val conf = new SparkConf().setMaster(master).setAppName("namexxx").set
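
For reference, a minimal sketch of pinning ports through SparkConf, written in Java here; the key names come from the configuration page, while the host and port values are placeholders:

  import org.apache.spark.SparkConf;

  public class PortConfig {
      public static void main(String[] args) {
          SparkConf conf = new SparkConf()
              .setMaster("spark://master01:7077")       // placeholder master URL
              .setAppName("PortConfig")
              .set("spark.driver.port", "7001")          // port the driver listens on
              .set("spark.blockManager.port", "7005");   // port used by the block manager
      }
  }

As the thread notes, these settings affect the driver and executors created through the context, not the standalone Worker daemon's own port.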

Kafka Direct Approach + Zookeeper

2015-05-13 Thread James King
From: http://spark.apache.org/docs/latest/streaming-kafka-integration.html I'm trying to use the direct approach to read messages from Kafka. Kafka is running as a cluster and configured with Zookeeper. On the above page it mentions: In the Kafka parameters, you must specify either

Re: Kafka Direct Approach + Zookeeper

2015-05-13 Thread James King
of brokers in pre-existing Kafka project apis. I don't know why the Kafka project chose to use 2 different configuration keys. On Wed, May 13, 2015 at 5:00 AM, James King jakwebin...@gmail.com wrote: From: http://spark.apache.org/docs/latest/streaming-kafka-integration.html I'm trying to use

Kafka + Direct + Zookeeper

2015-05-13 Thread James King
I'm trying the Kafka Direct approach (for consuming) but when I use only this config: kafkaParams.put("group.id", groupId); kafkaParams.put("zookeeper.connect", zookeeperHostAndPort + "/cb_kafka"); I get this: Exception in thread "main" org.apache.spark.SparkException: Must specify metadata.broker.list or
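
For comparison, a sketch of Kafka parameters that satisfy the direct approach (Java, Spark 1.3-era API; broker, topic and group names are placeholders): the direct stream talks to the brokers for offsets, so it needs metadata.broker.list (or bootstrap.servers) rather than only zookeeper.connect.

  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.Map;
  import java.util.Set;

  import kafka.serializer.StringDecoder;
  import org.apache.spark.streaming.api.java.JavaPairInputDStream;
  import org.apache.spark.streaming.kafka.KafkaUtils;

  Map<String, String> kafkaParams = new HashMap<String, String>();
  kafkaParams.put("metadata.broker.list", "broker01:9092,broker02:9092"); // placeholder brokers
  kafkaParams.put("group.id", "my-group");                                // placeholder group

  Set<String> topics = new HashSet<String>();
  topics.add("my-topic");                                                 // placeholder topic

  // ssc is an existing JavaStreamingContext
  JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
      ssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
      kafkaParams, topics);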

Worker Spark Port

2015-05-13 Thread James King
I understand that this port value is randomly selected. Is there a way to enforce which port a Spark Worker should use?
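
For the standalone Worker daemon itself, the usual knob is SPARK_WORKER_PORT in conf/spark-env.sh (a sketch, assuming a standalone deployment; the port numbers are placeholders):

  # conf/spark-env.sh on each worker host
  export SPARK_WORKER_PORT=7078        # port the worker process binds to
  export SPARK_WORKER_WEBUI_PORT=8081  # worker web UI port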

Re: Kafka Direct Approach + Zookeeper

2015-05-13 Thread James King
, James King jakwebin...@gmail.com wrote: Looking at Consumer Configs in http://kafka.apache.org/documentation.html#consumerconfigs The properties *metadata.broker.list* or *bootstrap.servers* are not mentioned. Do I need these on the consume side? On Wed, May 13, 2015 at 3:52 PM, James King

Re: Kafka Direct Approach + Zookeeper

2015-05-13 Thread James King
Looking at Consumer Configs in http://kafka.apache.org/documentation.html#consumerconfigs The properties *metadata.broker.list* or *bootstrap.servers* are not mentioned. Do I need these on the consume side? On Wed, May 13, 2015 at 3:52 PM, James King jakwebin...@gmail.com wrote: Many thanks

Re: Worker Spark Port

2015-05-13 Thread James King
Indeed, many thanks. On Wednesday, 13 May 2015, Cody Koeninger c...@koeninger.org wrote: I believe most ports are configurable at this point, look at http://spark.apache.org/docs/latest/configuration.html search for .port On Wed, May 13, 2015 at 9:38 AM, James King jakwebin...@gmail.com

Re: Master HA

2015-05-12 Thread James King
Thanks Akhil, I'm using Spark in standalone mode so I guess Mesos is not an option here. On Tue, May 12, 2015 at 1:27 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Mesos has a HA option (of course it includes zookeeper) Thanks Best Regards On Tue, May 12, 2015 at 4:53 PM, James King

Reading Real Time Data only from Kafka

2015-05-12 Thread James King
What I want is: if the driver dies for some reason and is restarted, I want to read only messages that arrived in Kafka after the restart of the driver program and its re-connection to Kafka. Has anyone done this? Any links or resources that can help explain this? Regards jk

Re: Reading Real Time Data only from Kafka

2015-05-12 Thread James King
Best Regards On Tue, May 12, 2015 at 5:15 PM, James King jakwebin...@gmail.com wrote: What I want is if the driver dies for some reason and it is restarted I want to read only messages that arrived into Kafka following the restart of the driver program and re-connection to Kafka. Has anyone

Re: Reading Real Time Data only from Kafka

2015-05-12 Thread James King
at 9:01 AM, James King jakwebin...@gmail.com wrote: Thanks Cody. Here are the events: - Spark app connects to Kafka first time and starts consuming - Messages 1 - 10 arrive at Kafka then Spark app gets them - Now driver dies - Messages 11 - 15 arrive at Kafka - Spark driver program

Master HA

2015-05-12 Thread James King
I know that it is possible to use Zookeeper and File System (not for production use) to achieve HA. Are there any other options now or in the near future?

Re: Reading Real Time Data only from Kafka

2015-05-12 Thread James King
using an earlier version of spark, you can accomplish what you want simply by starting the job using a new consumer group (there will be no prior state in zookeeper, so it will start consuming according to auto.offset.reset) On Tue, May 12, 2015 at 7:26 AM, James King jakwebin...@gmail.com wrote
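
A sketch of that suggestion in Java (ZooKeeper address and group prefix are placeholders): a consumer group with no saved offsets lets auto.offset.reset decide where consumption starts, so setting it to the newest offsets skips anything that arrived while the driver was down.

  import java.util.HashMap;
  import java.util.Map;
  import java.util.UUID;

  Map<String, String> kafkaParams = new HashMap<String, String>();
  kafkaParams.put("zookeeper.connect", "zk01:2181");            // placeholder ZooKeeper quorum
  kafkaParams.put("group.id", "restart-" + UUID.randomUUID());  // fresh group, no prior offsets
  kafkaParams.put("auto.offset.reset", "largest");              // begin at the newest messages
  // pass kafkaParams to whichever createStream / createDirectStream variant is in use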

Re: Submit Spark application in cluster mode and supervised

2015-05-09 Thread James King
should set your master URL to be spark://host01:7077,host02:7077, and the property spark.deploy.recoveryMode=ZOOKEEPER. See here for more info: http://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper From: James King Date: Friday, May 8, 2015 at 11:22 AM
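
Putting that advice together, a sketch of the standby-masters setup (hostnames and ZooKeeper addresses are placeholders; per the standalone docs the recovery properties are passed to the master daemons, commonly via SPARK_DAEMON_JAVA_OPTS):

  # conf/spark-env.sh on each master host
  export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
    -Dspark.deploy.zookeeper.url=zk01:2181,zk02:2181"

  # submit against both masters with a single spark:// prefix
  ./bin/spark-submit --master spark://host01:7077,host02:7077 \
    --deploy-mode cluster --supervise --class SomeApp Some.jar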

Re: Stop Cluster Mode Running App

2015-05-08 Thread James King
the kill command in spark-submit to shut it down. You’ll need the driver id from the Spark UI or from when you submitted the app. spark-submit --master spark://master:7077 --kill driver-id Thanks, Silvio From: James King Date: Wednesday, May 6, 2015 at 12:02 PM To: user Subject: Stop

Cluster mode and supervised app with multiple Masters

2015-05-08 Thread James King
Why does this not work: ./spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class SomeApp --deploy-mode cluster --supervise --master spark://host01:7077,host02:7077 Some.jar With exception: Caused by: java.lang.NumberFormatException: For input string: "7077,host02:7077" It seems to accept only one

Submit Spark application in cluster mode and supervised

2015-05-08 Thread James King
I have two hosts, host01 and host02 (let's call them). I run one Master and two Workers on host01. I also run one Master and two Workers on host02. Now I have 1 LIVE Master on host01 and a STANDBY Master on host02. The LIVE Master is aware of all Workers in the cluster. Now I submit a Spark

Re: Submit Spark application in cluster mode and supervised

2015-05-08 Thread James King
BTW I'm using Spark 1.3.0. Thanks On Fri, May 8, 2015 at 5:22 PM, James King jakwebin...@gmail.com wrote: I have two hosts, host01 and host02 (let's call them). I run one Master and two Workers on host01. I also run one Master and two Workers on host02. Now I have 1 LIVE Master on host01

Re: Receiver Fault Tolerance

2015-05-06 Thread James King
Many thanks all, your responses have been very helpful. Cheers On Wed, May 6, 2015 at 2:14 PM, ayan guha guha.a...@gmail.com wrote: https://spark.apache.org/docs/latest/streaming-programming-guide.html#fault-tolerance-semantics On Wed, May 6, 2015 at 10:09 PM, James King jakwebin

Receiver Fault Tolerance

2015-05-06 Thread James King
In the O'Reilly book Learning Spark, Chapter 10, section 24/7 Operation, it talks about 'Receiver Fault Tolerance'. I'm unsure of what a Receiver is here; from reading, it sounds like when you submit an application to the cluster in cluster mode, i.e. *--deploy-mode cluster*, the driver program will

Re: Enabling Event Log

2015-05-01 Thread James King
/spark-events. And this folder does not exist. Best Regards, Shixiong Zhu 2015-04-29 23:22 GMT-07:00 James King jakwebin...@gmail.com: I'm unclear why I'm getting this exception. It seems to have realized that I want to enable Event Logging but is ignoring where I want it to log to, i.e. file

Enabling Event Log

2015-04-30 Thread James King
I'm unclear why I'm getting this exception. It seems to have realized that I want to enable Event Logging but is ignoring where I want it to log to, i.e. file:/opt/cb/tmp/spark-events, which does exist. spark-defaults.conf # Example: spark.master spark://master1:7077,master2:7077
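
For reference, a minimal sketch of the event-log settings in spark-defaults.conf (assuming the target directory already exists and is writable by the user running the driver):

  # spark-defaults.conf
  spark.eventLog.enabled   true
  spark.eventLog.dir       file:/opt/cb/tmp/spark-events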

Re: spark-defaults.conf

2015-04-28 Thread James King
explicitly Shouldn't Spark just consult with ZK and use the active master? Or is ZK only used during failure? On Mon, Apr 27, 2015 at 1:53 PM, James King jakwebin...@gmail.com wrote: Thanks. I've set SPARK_HOME and SPARK_CONF_DIR appropriately in .bash_profile But when I start the worker like

submitting to multiple masters

2015-04-28 Thread James King
I have multiple masters running and I'm trying to submit an application using spark-1.3.0-bin-hadoop2.4/bin/spark-submit with this config (i.e. a comma-separated list of master URLs): --master spark://master01:7077,spark://master02:7077 But I am getting this exception
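
For comparison, a sketch of the URL form described in the standalone docs, where the spark:// scheme appears once and the master host:port pairs are comma-separated (hosts are placeholders):

  ./spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
    --master spark://master01:7077,master02:7077 \
    --class SomeApp Some.jar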

spark-defaults.conf

2015-04-27 Thread James King
I renamed spark-defaults.conf.template to spark-defaults.conf and invoked spark-1.3.0-bin-hadoop2.4/sbin/start-slave.sh, but I still get: failed to launch org.apache.spark.deploy.worker.Worker: --properties-file FILE Path to a custom Spark properties file.

Re: spark-defaults.conf

2015-04-27 Thread James King
, SPARK_CONF_DIR. On Mon, Apr 27, 2015 at 12:56 PM James King jakwebin...@gmail.com wrote: I renamed spark-defaults.conf.template to spark-defaults.conf and invoked spark-1.3.0-bin-hadoop2.4/sbin/start-slave.sh But I still get failed to launch org.apache.spark.deploy.worker.Worker: --properties

Re: Querying Cluster State

2015-04-26 Thread James King
On Sun, Apr 26, 2015 at 6:31 PM, James King jakwebin...@gmail.com wrote: If I have 5 nodes and I wish to maintain 1 Master and 2 Workers on each node, then in total I will have 5 Masters and 10 Workers. Now to maintain that setup I would like to query Spark regarding the number of Masters and Workers

Querying Cluster State

2015-04-26 Thread James King
If I have 5 nodes and I wish to maintain 1 Master and 2 Workers on each node, then in total I will have 5 Masters and 10 Workers. Now to maintain that setup I would like to query Spark regarding the number of Masters and Workers that are currently available, using API calls, and then take some
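
One lightly documented option worth verifying against your Spark version is the standalone master web UI's /json endpoint, which reports the known workers and their state; a minimal Java sketch with a placeholder master host:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class ClusterState {
      public static void main(String[] args) throws Exception {
          // Placeholder master host; the master web UI usually listens on 8080.
          URL url = new URL("http://master01:8080/json");
          HttpURLConnection conn = (HttpURLConnection) url.openConnection();
          BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
          StringBuilder body = new StringBuilder();
          for (String line; (line = in.readLine()) != null; ) {
              body.append(line);
          }
          in.close();
          // The response is JSON describing workers, cores, memory and applications;
          // parse it with any JSON library and count the ALIVE workers.
          System.out.println(body);
      }
  }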

Spark Cluster Setup

2015-04-24 Thread James King
I'm trying to find out how to set up a resilient Spark cluster. Things I'm thinking about include: - How to start multiple masters on different hosts? - there isn't a conf/masters file from what I can see Thank you.
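
There is indeed no conf/masters file; a common pattern (a sketch, with placeholder hosts) is simply to start a master daemon on each designated host and let workers and applications reference all of them:

  # run on each host that should run a Master
  host01$ ./sbin/start-master.sh
  host02$ ./sbin/start-master.sh

  # workers and spark-submit then use the combined URL:
  #   spark://host01:7077,host02:7077

Combined with the ZooKeeper recovery settings discussed elsewhere in this list, one master becomes LIVE and the others STANDBY.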

Re: Spark Cluster Setup

2015-04-24 Thread James King
://twitter.com/deanwampler http://polyglotprogramming.com On Fri, Apr 24, 2015 at 5:01 AM, James King jakwebin...@gmail.com wrote: I'm trying to find out how to setup a resilient Spark cluster. Things I'm thinking about include: - How to start multiple masters on different hosts

Master - chatter - Worker

2015-04-22 Thread James King
Is there a good resource that covers what kind of chatter (communication) goes on between the driver, master and worker processes? Thanks

Re: Spark Unit Testing

2015-04-21 Thread James King
://www.slideshare.net/databricks/strata-sj-everyday-im-shuffling-tips-for-writing-better-spark-programs -- Emre Sevinç http://www.bigindustries.be/ On Tue, Apr 21, 2015 at 1:26 PM, James King jakwebin...@gmail.com wrote: I'm trying to write some unit tests for my spark code. I need to pass

Spark Unit Testing

2015-04-21 Thread James King
I'm trying to write some unit tests for my Spark code. I need to pass a JavaPairDStream<String, String> to my Spark class. Is there a way to create a JavaPairDStream using the Java API? Also, is there a good resource that covers an approach (or approaches) for unit testing using Java? Regards jk
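
One way to build such a stream in a test, sketched below assuming Spark's queueStream API (the class name and key:value layout are illustrative): each queued RDD becomes one micro-batch, and mapToPair turns the lines into a JavaPairDStream.

  import java.util.Arrays;
  import java.util.LinkedList;
  import java.util.Queue;

  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.function.PairFunction;
  import org.apache.spark.streaming.Durations;
  import org.apache.spark.streaming.api.java.JavaDStream;
  import org.apache.spark.streaming.api.java.JavaPairDStream;
  import org.apache.spark.streaming.api.java.JavaStreamingContext;
  import scala.Tuple2;

  public class PairDStreamFixture {
      public static void main(String[] args) throws InterruptedException {
          SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("test");
          JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1));

          // Each queued RDD is delivered as one batch of the stream.
          Queue<JavaRDD<String>> queue = new LinkedList<JavaRDD<String>>();
          queue.add(ssc.sparkContext().parallelize(Arrays.asList("k1:v1", "k2:v2")));

          JavaDStream<String> lines = ssc.queueStream(queue);
          JavaPairDStream<String, String> pairs = lines.mapToPair(
              new PairFunction<String, String, String>() {
                  public Tuple2<String, String> call(String s) {
                      String[] parts = s.split(":", 2);
                      return new Tuple2<String, String>(parts[0], parts[1]);
                  }
              });

          pairs.print(); // hand `pairs` to the class under test instead of printing
          ssc.start();
          ssc.awaitTerminationOrTimeout(3000);
          ssc.stop();
      }
  }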

Skipped Jobs

2015-04-19 Thread James King
In the web UI I can see some jobs as 'skipped'. What does that mean? Why are these jobs skipped? Do they ever get executed? Regards jk

Spark Cluster: RECEIVED SIGNAL 15: SIGTERM

2015-04-13 Thread James King
Any idea what this means? Many thanks. == logs/spark-.-org.apache.spark.deploy.worker.Worker-1-09.out.1 == 15/04/13 07:07:22 INFO Worker: Starting Spark worker 09:39910 with 4 cores, 6.6 GB RAM 15/04/13 07:07:22 INFO Worker: Running Spark version 1.3.0 15/04/13 07:07:22 INFO

A stream of json objects using Java

2015-04-02 Thread James King
I'm reading a stream of string lines that are in JSON format. I'm using Java with Spark. Is there a way to get this from a transformation, so that I end up with a stream of JSON objects? I would also welcome any feedback about this approach or alternative approaches. Thanks jk
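
One way to do it, sketched in Java under the assumption that Jackson is on the classpath (any JSON library would work) and that each line holds one complete JSON object:

  import java.util.Map;

  import com.fasterxml.jackson.databind.ObjectMapper;
  import org.apache.spark.api.java.function.Function;
  import org.apache.spark.streaming.api.java.JavaDStream;

  public class JsonLines {
      // Turns a DStream of JSON text lines into a DStream of parsed key/value maps.
      public static JavaDStream<Map<String, Object>> parse(JavaDStream<String> lines) {
          return lines.map(new Function<String, Map<String, Object>>() {
              @SuppressWarnings("unchecked")
              public Map<String, Object> call(String line) throws Exception {
                  // A new ObjectMapper per record keeps the sketch simple; for real use,
                  // reuse one per partition (e.g. via mapPartitions).
                  return (Map<String, Object>) new ObjectMapper().readValue(line, Map.class);
              }
          });
      }
  }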

Spark + Kafka

2015-04-01 Thread James King
I have a simple setup/runtime of Kafka and Spark. I have a command line consumer displaying arrivals to the Kafka topic, so I know messages are being received. But when I try to read from the Kafka topic I get no messages; here are some logs below. I'm thinking there aren't enough threads. How do I

Re: Spark + Kafka

2015-04-01 Thread James King
receiving data from sources like Kafka. 2015-04-01 16:18 GMT+08:00 James King jakwebin...@gmail.com: Thank you bit1129, From looking at the web UI I can see 2 cores Also looking at http://spark.apache.org/docs/1.2.1/configuration.html But can't see an obvious configuration for the number of receivers

Re: Spark + Kafka

2015-04-01 Thread James King
: Please make sure that you have given more cores than Receiver numbers. *From:* James King jakwebin...@gmail.com *Date:* 2015-04-01 15:21 *To:* user user@spark.apache.org *Subject:* Spark + Kafka I have a simple setup/runtime of Kafka and Spark. I have a command line consumer displaying
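
A sketch of that point (Java, receiver-based API; topic, group and ZooKeeper names are placeholders): each receiver occupies one core, so the context needs at least one more core for processing, e.g. local[2] when running locally with a single Kafka receiver.

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.spark.SparkConf;
  import org.apache.spark.streaming.Durations;
  import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
  import org.apache.spark.streaming.api.java.JavaStreamingContext;
  import org.apache.spark.streaming.kafka.KafkaUtils;

  public class KafkaReceiverExample {
      public static void main(String[] args) {
          // local[2]: one core for the Kafka receiver, one core left for processing
          SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("KafkaReceiverExample");
          JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(5));

          Map<String, Integer> topics = new HashMap<String, Integer>();
          topics.put("my-topic", 1); // placeholder topic, one receiver thread

          JavaPairReceiverInputDStream<String, String> messages =
              KafkaUtils.createStream(ssc, "zk01:2181", "my-group", topics); // placeholder ZK/group

          messages.print();
          ssc.start();
          ssc.awaitTermination();
      }
  }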

Re: Spark + Kafka

2015-04-01 Thread James King
().getSimpleName()) .setMaster(master); JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(duration)); return ssc; } On Wed, Apr 1, 2015 at 11:37 AM, James King jakwebin...@gmail.com wrote: Thanks Saisai, Sure will do. But just a quick note that when I set master

NetworkWordCount + Spark standalone

2015-03-25 Thread James King
I'm trying to run the Java NetworkWordCount example against a simple Spark standalone runtime of one master and one worker. But it doesn't seem to work: the text entered on the Netcat data server is not being picked up and printed to the Eclipse console output. However if I use

Re: NetworkWordCount + Spark standalone

2015-03-25 Thread James King
at 6:31 PM, James King jakwebin...@gmail.com wrote: I'm trying to run the Java NetworkWordCount example against a simple Spark standalone runtime of one master and one worker. But it doesn't seem to work: the text entered on the Netcat data server is not being picked up and printed to the Eclipse

Re: Spark + Kafka

2015-03-19 Thread James King
On Mar 18, 2015, at 2:38 AM, James King jakwebin...@gmail.com wrote: Hi All, Which build of Spark is best when using Kafka? Regards jk

Writing Spark Streaming Programs

2015-03-19 Thread James King
Hello All, I'm using Spark for streaming but I'm unclear on which implementation language to use: Java, Scala or Python. I don't know anything about Python, am familiar with Scala, and have been doing Java for a long time. I think the above shouldn't influence my decision on which language to use

Re: Spark + Kafka

2015-03-19 Thread James King
Many thanks all for the good responses, appreciated. On Thu, Mar 19, 2015 at 8:36 AM, James King jakwebin...@gmail.com wrote: Thanks Khanderao. On Wed, Mar 18, 2015 at 7:18 PM, Khanderao Kand Gmail khanderao.k...@gmail.com wrote: I have used various version of spark (1.0, 1.2.1) without

Re: Writing Spark Streaming Programs

2015-03-19 Thread James King
keep the most complex Scala constructions out of your code) On Thu, Mar 19, 2015 at 3:50 PM, James King jakwebin...@gmail.com wrote: Hello All, I'm using Spark for streaming but I'm unclear on which implementation language to use: Java, Scala or Python. I don't know anything about

Re: Spark + Kafka

2015-03-18 Thread James King
not including the mailing list in the response, I'm the only one who will get your message. Regards, Jeff 2015-03-18 10:49 GMT+01:00 James King jakwebin...@gmail.com: Any sub-category recommendations hadoop, MapR, CDH? On Wed, Mar 18, 2015 at 10:48 AM, James King jakwebin...@gmail.com wrote