Patrick, I haven't changed the configs much. I just ran the spark-ec2 script to create a cluster with 1 master and 2 slaves, and then tried to submit jobs from a remote machine, leaving everything at the defaults configured by the Spark scripts. I've tried changing configs as suggested in other mailing-list and Stack Overflow threads (such as setting spark.driver.host, etc.), and removed (hopefully) all security/firewall restrictions on the AWS side, but it didn't help.
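For reference, the kind of overrides I tried from the laptop looked roughly like this (a sketch, not my exact command; <laptop-public-ip> is a placeholder, and spark.driver.port is just one example of the settings suggested in those threads):

    # Sketch of a remote submit with explicit driver address settings;
    # both settings assume the workers can reach the laptop on these ports,
    # so the matching inbound firewall rules were opened on my side as well.
    ./bin/spark-submit \
        --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
        --conf spark.driver.host=<laptop-public-ip> \
        --conf spark.driver.port=7777 \
        --class SparkPi \
        ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar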
I think that what you are saying is exactly the issue: in my master node UI, at the bottom, I can see the list of "Completed Drivers", all in ERROR state...

Thanks,
Oleg

-----Original Message-----
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Monday, February 23, 2015 12:59 AM
To: Oleg Shirokikh
Cc: user@spark.apache.org
Subject: Re: Submitting jobs to Spark EC2 cluster remotely

Can you list the other configs that you are setting? It looks like the executor can't communicate back to the driver. I'm actually not sure it's a good idea to set spark.driver.host here; you want to let Spark set that automatically.

- Patrick

On Mon, Feb 23, 2015 at 12:48 AM, Oleg Shirokikh <o...@solver.com> wrote:
> Dear Patrick,
>
> Thanks a lot for your quick response. Indeed, following your advice I've uploaded the jar onto S3; the FileNotFoundException is gone now, and the job gets submitted in "cluster" deploy mode.
>
> However, both modes (client and cluster) now fail with the following errors in the executors (they keep exiting and getting killed, as I can see in the UI):
>
> 15/02/23 08:42:46 ERROR security.UserGroupInformation: PriviledgedActionException as:oleg cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>
> The full log is:
>
> 15/02/23 01:59:11 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
> 15/02/23 01:59:12 INFO spark.SecurityManager: Changing view acls to: root,oleg
> 15/02/23 01:59:12 INFO spark.SecurityManager: Changing modify acls to: root,oleg
> 15/02/23 01:59:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, oleg); users with modify permissions: Set(root, oleg)
> 15/02/23 01:59:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 15/02/23 01:59:12 INFO Remoting: Starting remoting
> 15/02/23 01:59:13 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverpropsfetc...@ip-172-31-33-194.us-west-2.compute.internal:39379]
> 15/02/23 01:59:13 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 39379.
> 15/02/23 01:59:43 ERROR security.UserGroupInformation: PriviledgedActionException as:oleg cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         ... 4 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
>         ... 7 more
>
> -----Original Message-----
> From: Patrick Wendell [mailto:pwend...@gmail.com]
> Sent: Monday, February 23, 2015 12:17 AM
> To: Oleg Shirokikh
> Subject: Re: Submitting jobs to Spark EC2 cluster remotely
>
> The reason is that the file needs to be in a globally visible filesystem where the master node can download it. So it needs to be on S3, for instance, rather than on your local filesystem.
>
> - Patrick
>
> On Sun, Feb 22, 2015 at 11:55 PM, olegshirokikh <o...@solver.com> wrote:
>> I've set up the EC2 cluster with Spark. Everything works: the master and all slaves are up and running.
>>
>> I'm trying to submit a sample job (SparkPi). When I ssh to the cluster and submit it from there, everything works fine. However, when the driver is created on a remote host (my laptop), it doesn't work. I've tried both modes for `--deploy-mode`:
>>
>> **`--deploy-mode=client`:**
>>
>> From my laptop:
>>
>> ./bin/spark-submit --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>
>> This results in the following indefinite warnings/errors:
>>
>>> WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>> 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
>>> 15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
>>
>> ...and failed drivers appear in the Spark Web UI under "Completed Drivers" with "State=ERROR".
>>
>> I've tried to pass limits for cores and memory to the submit script, but it didn't help...
>>
>> **`--deploy-mode=cluster`:**
>>
>> From my laptop:
>>
>> ./bin/spark-submit --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 --deploy-mode cluster --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar
>>
>> The result is:
>>
>>> .... Driver successfully submitted as driver-20150223023734-0007
>>> ... waiting before polling master for driver state
>>> ... polling master for driver state
>>> State of driver-20150223023734-0007 is ERROR
>>> Exception from cluster was: java.io.FileNotFoundException: File file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar does not exist.
>>> java.io.FileNotFoundException: File file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar does not exist.
>>>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
>>>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
>>>         at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)
>>>         at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:75)
>>
>> So, I'd appreciate any pointers on what is going wrong, and some guidance on how to deploy jobs from a remote client. Thanks.
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-jobs-to-Spark-EC2-cluster-remotely-tp21762.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
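For anyone who hits the same java.io.FileNotFoundException in cluster mode: as Patrick points out above, the jar must live on a filesystem the cluster nodes can read, not on the submitting laptop. A minimal sketch of staging it on S3 first (the bucket name is hypothetical, and the cluster also needs S3 credentials configured, e.g. fs.s3n.awsAccessKeyId/fs.s3n.awsSecretAccessKey, which is omitted here):

    # Copy the application jar to a globally visible location (hypothetical bucket).
    aws s3 cp ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar \
        s3://my-spark-jars/ec2test_2.10-0.0.1.jar

    # Submit against the S3 copy instead of a local path, so the worker
    # chosen to run the driver can download the jar itself.
    ./bin/spark-submit \
        --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
        --deploy-mode cluster \
        --class SparkPi \
        s3n://my-spark-jars/ec2test_2.10-0.0.1.jar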