Re: Down-scaling Spark on EC2 cluster
What about down-scaling when I use Mesos? Does that really hurt performance? Otherwise we would probably go for Spark on Mesos on EC2 :)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Down-scaling-Spark-on-EC2-cluster-tp10494p12109.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Down-scaling Spark on EC2 cluster
Any idea about the probable timeline for this implementation? I believe it would be a wonderful (and essential) piece of functionality for gaining wider acceptance in the community.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Down-scaling-Spark-on-EC2-cluster-tp10494p10639.html
Down-scaling Spark on EC2 cluster
Hello,

We plan to use Spark on EC2 for our data science pipeline. We successfully managed to set up a cluster as well as launch and run applications on remote clusters. However, to enhance scalability we would like to implement auto-scaling in EC2 for Spark applications, and I did not find any proper reference on this. For example, when we launch training programs that use Matlab scripts on an EC2 cluster, we auto-scale via SQS. Can anyone please suggest what the options are for Spark? This is especially important when down-scaling by removing a machine (how graceful can that be if the machine is in the middle of a task?).

Thanks in advance.
Shubhabrata

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Down-scaling-Spark-on-EC2-cluster-tp10494.html
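For what it's worth, the SQS-driven approach described above can be sketched roughly as below. This is only an illustration, not a tested recipe: the queue name, the per-worker throughput, and the worker limits are all made-up assumptions, and the `boto3` polling helper needs AWS credentials, so it is not exercised here. Actually adding or removing slaves would still have to be wired up separately (the spark-ec2 scripts do not resize a running cluster).

```python
# Sketch: derive a target worker count from SQS backlog, similar to what we
# do for the Matlab pipeline. All names and numbers are illustrative.

def desired_workers(backlog, tasks_per_worker, min_workers=1, max_workers=20):
    """Clamp ceil(backlog / tasks_per_worker) into [min_workers, max_workers]."""
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    wanted = -(-backlog // tasks_per_worker)  # integer ceiling division
    return max(min_workers, min(max_workers, wanted))

def read_backlog(queue_url):
    """Poll SQS for the approximate queue depth (requires AWS credentials)."""
    import boto3  # hypothetical wiring; not exercised in this sketch
    sqs = boto3.client("sqs")
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"])
    return int(attrs["Attributes"]["ApproximateNumberOfMessages"])

if __name__ == "__main__":
    # 95 queued tasks at ~10 tasks per worker suggests 10 workers.
    print(desired_workers(backlog=95, tasks_per_worker=10))
```

For graceful removal, one option is to stop the Worker daemon on the machine first so no new tasks land there, then terminate the instance once its running tasks finish; tasks that are lost anyway will be re-run by the scheduler, at the cost of redone work.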
Log analysis
I am new to Spark, and we are developing a data science pipeline based on Spark on EC2. So far we have been using Python on a standalone Spark cluster. However, being a newbie, I would like to know how I can do program-level debugging from the Spark logs (is it stderr?). I find it difficult to debug since Spark itself writes many messages there. Any ideas or suggestions regarding configuration changes to facilitate this would be highly appreciated!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Log-analysis-tp6168.html
Re: Deploying a python code on a spark EC2 cluster
This is the error from stderr:

Spark Executor Command: java -cp :/root/ephemeral-hdfs/conf:/root/ephemeral-hdfs/conf:/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar -Djava.library.path=/root/ephemeral-hdfs/lib/native/ -Dspark.local.dir=/mnt/spark -Dspark.local.dir=/mnt/spark -Dspark.local.dir=/mnt/spark -Dspark.local.dir=/mnt/spark -Xms2048M -Xmx2048M org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://spark@192.168.122.1:44577/user/CoarseGrainedScheduler 1 ip-10-84-7-178.eu-west-1.compute.internal 1 akka.tcp://sparkwor...@ip-10-84-7-178.eu-west-1.compute.internal:57839/user/Worker app-20140425133749-

14/04/25 13:39:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/04/25 13:39:38 INFO Remoting: Starting remoting
14/04/25 13:39:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkexecu...@ip-10-84-7-178.eu-west-1.compute.internal:36800]
14/04/25 13:39:38 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkexecu...@ip-10-84-7-178.eu-west-1.compute.internal:36800]
14/04/25 13:39:38 INFO worker.WorkerWatcher: Connecting to worker akka.tcp://sparkwor...@ip-10-84-7-178.eu-west-1.compute.internal:57839/user/Worker
14/04/25 13:39:38 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@192.168.122.1:44577/user/CoarseGrainedScheduler
14/04/25 13:39:39 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkwor...@ip-10-84-7-178.eu-west-1.compute.internal:57839/user/Worker
14/04/25 13:41:19 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkexecu...@ip-10-84-7-178.eu-west-1.compute.internal:36800] -> [akka.tcp://spark@192.168.122.1:44577] disassociated! Shutting down.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Deploying-a-python-code-on-a-spark-EC2-cluster-tp4758p4828.html
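A note on reading that log: the driver address the executor is trying to reach, 192.168.122.1, is the address of a local libvirt bridge (virbr0), which EC2 workers cannot route back to, so "Driver Disassociated" here plausibly means a connectivity problem rather than a Spark bug. The usual fix is to make the driver bind a routable address (via `SPARK_LOCAL_IP` or the `spark.driver.host` property). A small sketch of the standard UDP-connect trick for discovering which local address the OS would use to reach an outside host; the probe host 8.8.8.8 is an arbitrary choice:

```python
import socket

def is_loopback(ip):
    """True for 127.0.0.0/8 addresses, which remote workers cannot connect back to."""
    return ip.startswith("127.")

def routable_ip(probe_host="8.8.8.8", probe_port=53):
    """Return the local address the OS would pick to reach probe_host.

    connect() on a UDP socket sends no packets; it only selects a route,
    so no traffic actually goes to probe_host."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((probe_host, probe_port))
        return s.getsockname()[0]
    finally:
        s.close()

# e.g. export SPARK_LOCAL_IP to routable_ip()'s result before launching the driver
```

If `routable_ip()` still returns a bridge or loopback address, the driver machine simply is not reachable from the cluster, and the job should be launched from a host (e.g. the EC2 master) that is.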
Re: Deploying a python code on a spark EC2 cluster
To check whether there is any issue with the Python API, I ran a Scala application provided in the examples. Still the same error:

./bin/run-example org.apache.spark.examples.SparkPi spark://[Master-URL]:7077

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/work/spark-0.9.1/examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/work/spark-0.9.1/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/04/25 17:07:10 INFO Utils: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/04/25 17:07:10 WARN Utils: Your hostname, rd-hu resolves to a loopback address: 127.0.1.1; using 192.168.122.1 instead (on interface virbr0)
14/04/25 17:07:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
14/04/25 17:07:11 INFO Slf4jLogger: Slf4jLogger started
14/04/25 17:07:11 INFO Remoting: Starting remoting
14/04/25 17:07:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.122.1:26278]
14/04/25 17:07:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.122.1:26278]
14/04/25 17:07:11 INFO SparkEnv: Registering BlockManagerMaster
14/04/25 17:07:11 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140425170711-d1da
14/04/25 17:07:11 INFO MemoryStore: MemoryStore started with capacity 16.0 GB.
14/04/25 17:07:11 INFO ConnectionManager: Bound socket to port 9788 with id = ConnectionManagerId(192.168.122.1,9788)
14/04/25 17:07:11 INFO BlockManagerMaster: Trying to register BlockManager
14/04/25 17:07:11 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager 192.168.122.1:9788 with 16.0 GB RAM
14/04/25 17:07:11 INFO BlockManagerMaster: Registered BlockManager
14/04/25 17:07:11 INFO HttpServer: Starting HTTP Server
14/04/25 17:07:11 INFO HttpBroadcast: Broadcast server started at http://192.168.122.1:58091
14/04/25 17:07:11 INFO SparkEnv: Registering MapOutputTracker
14/04/25 17:07:11 INFO HttpFileServer: HTTP File server directory is /tmp/spark-599577a4-5732-4949-a2e8-f59eb679e843
14/04/25 17:07:11 INFO HttpServer: Starting HTTP Server
14/04/25 17:07:12 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
        at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
        at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at org.eclipse.jetty.server.Server.doStart(Server.java:286)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
        at org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:118)
        at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
        at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:118)
        at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:129)
        at org.apache.spark.ui.SparkUI.bind(SparkUI.scala:57)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:159)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:100)
        at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
        at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
14/04/25 17:07:12 WARN AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@74f4b96: java.net.BindException: Address already in use
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
        at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
        at
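Two things stand out in that log. First, the `BindException` on port 4040 means something on this machine already holds the Spark UI port, most likely an earlier SparkContext that was never stopped; Spark typically logs this warning and retries the next port, so it may be benign, but the stray process is worth hunting down. Second, the same virbr0/192.168.122.1 binding warning appears again. If you do want to pick the UI port yourself, a minimal sketch (the 4040-4059 range is an arbitrary assumption) of probing for a free port to pass as `spark.ui.port`:

```python
import socket

def first_free_port(start=4040, end=4060):
    """Return the first TCP port in [start, end) that we can bind, else raise."""
    for port in range(start, end):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("", port))  # succeeds only if the port is currently free
            return port
        except OSError:
            pass  # port in use; try the next one
        finally:
            s.close()
    raise RuntimeError("no free port in %d-%d" % (start, end - 1))

# e.g. pass str(first_free_port()) as the spark.ui.port property
```

Note there is an inherent race here: another process could grab the port between the probe and Spark's own bind, so this is a convenience, not a guarantee.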
Re: Deploying a python code on a spark EC2 cluster
Moreover, it seems all the workers are registered and have sufficient memory (2.7 GB, whereas I asked for 512 MB). The UI also shows the jobs running on the slaves. But the terminal still shows the same error: "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory". Please see the screenshot. Thanks.

http://apache-spark-user-list.1001560.n3.nabble.com/file/n4761/33.png

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Deploying-a-python-code-on-a-spark-EC2-cluster-tp4758p4761.html
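For anyone hitting this later: "Initial job has not accepted any resources" usually means no worker can satisfy the resource request, either because the requested executor memory exceeds a worker's free memory, or because all cores are still held by another (possibly half-dead) application registered with the master; given the log earlier in this thread, an unreachable driver that keeps re-registering is also a candidate. The scheduling condition itself is just this arithmetic, shown with made-up numbers matching the figures above:

```python
def can_schedule(worker_free_mb, worker_free_cores, asked_mb, asked_cores=1):
    """A worker can host an executor only if both memory AND cores fit."""
    return worker_free_mb >= asked_mb and worker_free_cores >= asked_cores

# 2.7 GB free vs. 512 MB asked: memory fits, so if the job still hangs,
# check the UI for zero free cores (held by another or a stale application).
print(can_schedule(worker_free_mb=2700, worker_free_cores=0, asked_mb=512))
```

So when the memory numbers look fine, as they do here, the cores column and the list of registered applications in the master UI are the next things to check.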