What should I put as the value of that environment variable? I want to run the scripts locally on my machine and do not have Hadoop installed.
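In case it helps, the variable should point at a folder that contains bin\winutils.exe; the folder itself can be anywhere. Below is a minimal sketch of the usual workaround (C:\hadoop is just an example path, not a requirement). Hadoop's Shell class reads the hadoop.home.dir system property, falling back to the HADOOP_HOME environment variable, so you can also set the property from the driver before creating the SparkContext:

```java
// Sketch: satisfy Hadoop's winutils.exe lookup on Windows.
// Assumes winutils.exe has been placed at C:\hadoop\bin\winutils.exe
// (example location; use wherever you put it).
public class WinutilsSetup {
    public static void main(String[] args) {
        // Shell.getWinUtilsPath() resolves hadoop.home.dir first, then the
        // HADOOP_HOME environment variable, and expects bin\winutils.exe
        // underneath it. Set this before any Hadoop class is loaded,
        // i.e. before constructing the SparkContext.
        System.setProperty("hadoop.home.dir", "C:\\hadoop");
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```

Equivalently, set HADOOP_HOME=C:\hadoop in the Windows environment variables and restart the shell before running sbt again.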
Thank you

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Mittwoch, 27. August 2014 12:54
To: Hingorani, Vineet
Cc: user@spark.apache.org
Subject: Re: Example File not running

The statement

java.io.IOException: Could not locate executable null\bin\winutils.exe

means that null was substituted when an environment variable was expanded. I'm guessing that you are missing HADOOP_HOME in the environment variables.

Thanks
Best Regards

On Wed, Aug 27, 2014 at 3:52 PM, Hingorani, Vineet <vineet.hingor...@sap.com> wrote:

Hello all,

I am able to use Spark in the shell, but I am not able to run a Spark file. I am using sbt, and the jar is created, but even the SimpleApp class example given on the site http://spark.apache.org/docs/latest/quick-start.html is not running. I installed a pre-built version of Spark, and `sbt package` compiles the Scala file to a jar. I am running it locally on my machine. The error log is huge, but it starts with something like this:

14/08/27 12:14:21 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/08/27 12:14:21 INFO SecurityManager: Changing view acls to: D062844
14/08/27 12:14:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(D062844)
14/08/27 12:14:22 INFO Slf4jLogger: Slf4jLogger started
14/08/27 12:14:22 INFO Remoting: Starting remoting
14/08/27 12:14:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@10.94.74.159:51157]
14/08/27 12:14:22 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@10.94.74.159:51157]
14/08/27 12:14:22 INFO SparkEnv: Registering MapOutputTracker
14/08/27 12:14:22 INFO SparkEnv: Registering BlockManagerMaster
14/08/27 12:14:22 INFO DiskBlockManager: Created local directory at C:\Users\D062844\AppData\Local\Temp\spark-local-20140827121422-dec8
14/08/27 12:14:22 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
14/08/27 12:14:22 INFO ConnectionManager: Bound socket to port 51160 with id = ConnectionManagerId(10.94.74.159,51160)
14/08/27 12:14:22 INFO BlockManagerMaster: Trying to register BlockManager
14/08/27 12:14:22 INFO BlockManagerInfo: Registering block manager 10.94.74.159:51160 with 294.9 MB RAM
14/08/27 12:14:22 INFO BlockManagerMaster: Registered BlockManager
14/08/27 12:14:22 INFO HttpServer: Starting HTTP Server
14/08/27 12:14:22 INFO HttpBroadcast: Broadcast server started at http://10.94.74.159:51161
14/08/27 12:14:22 INFO HttpFileServer: HTTP File server directory is C:\Users\D062844\AppData\Local\Temp\spark-d79d2857-3d85-4b16-8d76-ade83d465f10
14/08/27 12:14:22 INFO HttpServer: Starting HTTP Server
14/08/27 12:14:22 INFO SparkUI: Started SparkUI at http://10.94.74.159:4040
14/08/27 12:14:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/27 12:14:23 INFO SparkContext: Added JAR file:/C:/Users/D062844/Desktop/HandsOnSpark/Install/spark-1.0.2-bin-hadoop2/target/scala-2.10/simple-project_2.10-1.0.jar at http://10.94.74.159:51162/jars/simple-project_2.10-1.0.jar with timestamp 1409134463198
14/08/27 12:14:23 INFO MemoryStore: ensureFreeSpace(138763) called with curMem=0, maxMem=309225062
14/08/27 12:14:23 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 135.5 KB, free 294.8 MB)
14/08/27 12:14:23 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
        at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
        at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
        at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
        at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:145)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:168)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.FilteredRDD.getPartitions(FilteredRDD.scala:29)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        ...

It goes on like this and doesn't show me the result of the count in the file. I had installed the pre-built version for Hadoop 2. The problem, I think, is that I am running it on my local machine and it is not able to find some Hadoop dependencies. Please tell me which file I should download to work on my local machine (pre-built, so that I don't have to build it again).

Thank you

Regards,
Vineet Hingorani