[jira] [Commented] (SPARK-9559) Worker redundancy/failover in spark stand-alone mode

2015-08-03 Thread partha bishnu (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651984#comment-14651984 ]

partha bishnu commented on SPARK-9559:
--

Thanks. If I understand correctly, --num-executors is for deploying on a YARN 
cluster and --total-executor-cores is for a Spark standalone cluster. I am using 
a Spark standalone cluster.
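
For anyone following along, the two submit forms look roughly like this (the 
master host and app.jar below are placeholders for illustration, not taken from 
this report):

  # On YARN, the number of executors is requested directly:
  ./bin/spark-submit --master yarn-client --num-executors 2 \
    --executor-memory 1G app.jar

  # On a standalone cluster, the total cores across all executors are capped instead:
  ./bin/spark-submit --master spark://master-host:7077 \
    --total-executor-cores 1 --executor-memory 1G app.jar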

 Worker redundancy/failover in spark stand-alone mode
 

 Key: SPARK-9559
 URL: https://issues.apache.org/jira/browse/SPARK-9559
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.0
Reporter: partha bishnu







[jira] [Commented] (SPARK-9559) Worker redundancy/failover in spark stand-alone mode

2015-08-03 Thread partha bishnu (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651907#comment-14651907 ]

partha bishnu commented on SPARK-9559:
--

The expected behavior should be that the Spark master on n-1 restarts the jobs 
with one new executor under the worker JVM that is still up and running on the 
other worker node (n-3) after n-2 went down. Isn't that the expected behavior? 
But that does not happen.
Thanks for your comments.





[jira] [Commented] (SPARK-9559) Worker redundancy/failover in spark stand-alone mode

2015-08-03 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651923#comment-14651923 ]

Sean Owen commented on SPARK-9559:
--

OK, so you have requested 1 total executor. Did the job fail then? Or are you 
talking about the state after it completed?




[jira] [Commented] (SPARK-9559) Worker redundancy/failover in spark stand-alone mode

2015-08-03 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651924#comment-14651924 ]

Sean Owen commented on SPARK-9559:
--

PS you should try reproducing this on master rather than 1.3, which is 
relatively old at this stage.
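
For anyone wanting to do the same, building the current master for a local test 
is roughly the standard sequence (nothing here is specific to this issue):

  git clone https://github.com/apache/spark.git
  cd spark
  build/mvn -DskipTests clean package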




[jira] [Commented] (SPARK-9559) Worker redundancy/failover in spark stand-alone mode

2015-08-03 Thread partha bishnu (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652933#comment-14652933 ]

partha bishnu commented on SPARK-9559:
--

We tested on 1.4.1 and got the same result, i.e. a new executor JVM did not get 
started on the other worker node after the node running the jobs stopped 
running. So it seems like a major defect.




[jira] [Commented] (SPARK-9559) Worker redundancy/failover in spark stand-alone mode

2015-08-03 Thread partha bishnu (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651965#comment-14651965 ]

partha bishnu commented on SPARK-9559:
--

Hi,
Yes, I requested 1 executor, as I mentioned in the original description [I used 
--total-executor-cores 1 with spark-submit].
We are using 1.3 so far; as you suggested, we will try to reproduce this on 1.4 
and report back. Again, thanks for looking into it.
To recap (a rough startup/submit sketch follows the list below):

With the option --total-executor-cores 1 and check-pointing enabled, I have:

node-1: Spark master running
node-2: 1 worker JVM running, which can start at most one executor
node-3: 1 worker JVM running, which can start at most one executor

 I launched the jobs using spark-submit; they started in one executor on node-2.
 I killed node-2 (both the worker JVM and the executor).
 Expected behavior: the Spark master should ask the worker JVM on node-3 to 
 launch a new executor and restart the jobs in that executor.
 Observed behavior: the jobs got stuck.
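
For reference, the cluster is brought up along these lines (host names are 
placeholders and the start-slave.sh arguments differ a bit between Spark 
versions, so treat this as a sketch rather than the exact commands):

  # on node-1
  ./sbin/start-master.sh

  # on node-2 and node-3 (point each worker at the master)
  ./sbin/start-slave.sh spark://node-1:7077

  # submit from the driver machine
  ./bin/spark-submit --master spark://node-1:7077 \
    --total-executor-cores 1 --executor-memory 1G \
    <application jar>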







[jira] [Commented] (SPARK-9559) Worker redundancy/failover in spark stand-alone mode

2015-08-03 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651968#comment-14651968 ]

Sean Owen commented on SPARK-9559:
--

--total-executor-cores isn't the same as --num-executors, but 1 total core must 
mean 1 executor, yes.
Use master (which is nearly 1.5), not 1.4, just to be most useful.




[jira] [Commented] (SPARK-9559) Worker redundancy/failover in spark stand-alone mode

2015-08-03 Thread partha bishnu (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651882#comment-14651882 ]

partha bishnu commented on SPARK-9559:
--

Hi,
I am running some tests on Spark in standalone mode with a 3-node cluster. The 
Spark master is running on n-1, and the slaves are on n-2 and n-3. Each machine 
has 8 GB RAM and a 4-core CPU. I am trying to test worker redundancy.
I wanted to set up the cluster such that there are two worker JVMs, one on each 
slave (n-2 and n-3), after I start up the cluster.

Then one of the slaves' worker JVMs will launch the executor JVM to process the 
tasks when I submit the job with the following flags:
 --total-executor-cores 1 and --executor-memory 1G

(1) The job submitted successfully in client mode. The worker JVM on n-2 
launched an executor JVM, so n-2 then had one worker JVM and one executor JVM 
running, and n-3 just had its worker JVM running as before.

(2) I killed the worker JVM and the executor JVM on n-2.

(3) I expected the Spark master on n-1 to then ask the worker JVM on n-3 to 
launch a new executor and resume processing the jobs, but that did not happen. 
The driver just hung on the screen. n-2 disappeared from the Spark cluster as 
expected. n-3 just had its worker JVM running as before, and, contrary to my 
expectation, no new executor was launched after n-2 disappeared.
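
For anyone reproducing step (2), one way to kill both JVMs on n-2 is along 
these lines (the grep patterns are the usual standalone worker and executor 
class names; the PIDs are placeholders):

  # on n-2: list the Spark worker and executor JVMs
  jps -l | grep -E 'deploy.worker.Worker|CoarseGrainedExecutorBackend'

  # kill both processes (PIDs below are placeholders)
  kill -9 <worker-pid> <executor-pid>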




[jira] [Commented] (SPARK-9559) Worker redundancy/failover in spark stand-alone mode

2015-08-03 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651891#comment-14651891 ]

Sean Owen commented on SPARK-9559:
--

You should see 1 executor per worker. You lost an entire worker, so your jobs 
now use 1 executor each. I think this is expected behavior?
