Could you run spark-shell from the $SPARK_HOME directory? Try running your command from $SPARK_HOME, or point to README.md with its full path.
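As a rough illustration of the "full path" suggestion: a path passed to sc.textFile without a scheme is resolved against Hadoop's default filesystem, so one option is to hand it an explicit file:// URI. The sketch below only builds such a URI. The object name, the helper name, and the "/root/spark" fallback are assumptions for illustration, and it presumes the file exists at the same local path on every node that reads it.

```scala
import java.nio.file.Paths

object LocalReadmeUri {
  // Build an absolute file:// URI so the path is not resolved against
  // the cluster's default filesystem (for example HDFS).
  def localFileUri(dir: String, name: String): String =
    Paths.get(dir, name).toAbsolutePath.normalize.toUri.toString

  def main(args: Array[String]): Unit = {
    // "/root/spark" is a hypothetical fallback, not taken from the thread.
    val sparkHome = sys.env.getOrElse("SPARK_HOME", "/root/spark")
    val uri = localFileUri(sparkHome, "README.md")
    println(uri)
    // In spark-shell the result could then be used as:
    //   val lines = sc.textFile(uri)
    //   lines.count()
  }
}
```

Whether this avoids the connection error depends on the cluster configuration; it only sidesteps the default filesystem for this one read.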
Peter Zhang
--
Google
Sent with Airmail

On January 19, 2016 at 11:26:14, Oleg Ruchovets (oruchov...@gmail.com) wrote:

It looks like Spark is not working properly: I followed this link (http://spark.apache.org/docs/latest/ec2-scripts.html) and I see spot instances installed on EC2.

From the Spark shell I am counting lines and get a connection exception.

scala> val lines = sc.textFile("README.md")
scala> lines.count()

scala> val lines = sc.textFile("README.md")
16/01/19 03:17:35 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 26.5 KB, free 26.5 KB)
16/01/19 03:17:35 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.6 KB, free 32.1 KB)
16/01/19 03:17:35 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.28.196:44028 (size: 5.6 KB, free: 511.5 MB)
16/01/19 03:17:35 INFO spark.SparkContext: Created broadcast 0 from textFile at <console>:21
lines: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21

scala> lines.count()
16/01/19 03:17:55 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:56 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:57 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:58 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:17:59 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:00 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:01 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:02 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:03 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
16/01/19 03:18:04 INFO ipc.Client: Retrying connect to server: ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
java.lang.RuntimeException: java.net.ConnectException: Call to ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:567)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:291)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
    at $iwC$$iwC$$iwC.<init>(<console>:33)
    at $iwC$$iwC.<init>(<console>:35)
    at $iwC.<init>(<console>:37)
    at <init>(<console>:39)
    at .<init>(<console>:43)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Call to ec2-54-88-242-197.compute-1.amazonaws.com/172.31.28.196:9000 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
    at org.apache.hadoop.ipc.Client.call(Client.java:1118)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at com.sun.proxy.$Proxy15.getProtocolVersion(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at com.sun.proxy.$Proxy15.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
    at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:124)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:563)
    ... 64 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:457)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
    at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
    at org.apache.hadoop.ipc.Client.call(Client.java:1093)
    ... 84 more

scala>

On Tue, Jan 19, 2016 at 1:22 AM, Daniel Darabos <daniel.dara...@lynxanalytics.com> wrote:

On Mon, Jan 18, 2016 at 5:24 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:

I thought the script also tries to install Hadoop / HDFS, and it looks like that failed. The installation is only standalone Spark without Hadoop. Is this the correct behaviour?

Yes, it also sets up two HDFS clusters. Are they not working? Try to see if Spark is working by running some simple jobs on it. (See http://spark.apache.org/docs/latest/ec2-scripts.html.)

There is no program called Hadoop. If you mean YARN, then indeed the script does not set up YARN. It sets up standalone Spark.
Also errors in the log:

ERROR: Unknown Tachyon version
Error: Could not find or load main class crayondata.com.log

As long as Spark is working fine, you can ignore all output from the EC2 script :).
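The repeated "Connection refused" on port 9000 in the trace suggests that nothing was accepting connections at the HDFS address the shell tried to reach. One quick way to check that from the driver machine is a plain TCP probe. This is a minimal sketch, not part of the EC2 scripts: the object name, the helper, and the "localhost" default are assumptions, and only port 9000 comes from the log above.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object NameNodePortProbe {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  // Any failure (refused, timeout, unknown host) is reported as false.
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // Pass the master's hostname as the first argument; "localhost" is a placeholder.
    val host = args.headOption.getOrElse("localhost")
    val port = 9000 // the port that appears in the retry messages
    val status = if (canConnect(host, port)) "reachable" else "not reachable"
    println(s"$host:$port is $status")
  }
}
```

If the port is not reachable, the HDFS daemons on the master are worth checking before retrying the job. A reachable port only shows that something is listening, not that HDFS is healthy.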