I am seeing this same issue with Spark 1.0.1 (tried with file:// for a local file):
scala> val lines = sc.textFile("file:///home/monir/.bashrc")
lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val linecount = lines.count
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/monir/.bashrc
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)

-----Original Message-----
From: wsun
Sent: Feb 03, 2014; 12:44pm
To: u...@spark.incubator.apache.org
Subject: cannot read file from a local path

After installing Spark 0.8.1 on an EC2 cluster, I launched the Spark shell on the master. This is what happened to me:

scala> val textFile = sc.textFile("README.md")
14/02/03 20:38:08 INFO storage.MemoryStore: ensureFreeSpace(34380) called with curMem=0, maxMem=4082116853
14/02/03 20:38:08 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 33.6 KB, free 3.8 GB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> textFile.count()
14/02/03 20:38:39 WARN snappy.LoadSnappy: Snappy native library is available
14/02/03 20:38:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/02/03 20:38:39 INFO snappy.LoadSnappy: Snappy native library loaded
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ec2-54-234-136-50.compute-1.amazonaws.com:9000/user/root/README.md
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:141)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:886)
        at org.apache.spark.rdd.RDD.count(RDD.scala:698)

Spark seems to be looking for "README.md" in HDFS, even though I did not specify that the file is located in HDFS. I am wondering whether there is any configuration in Spark that forces it to read files from the local file system. Thanks in advance for any help.

wp
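
For what it's worth: a bare path like "README.md" is resolved against whatever default filesystem the Hadoop configuration points at (fs.default.name / fs.defaultFS in core-site.xml), and on the EC2 deployment that is HDFS, which is why Spark prepended hdfs://...:9000/user/root/ above. A minimal sketch of the two usual workarounds follows; the local path and the copy step are placeholders for illustration, not the exact paths from the cluster:

    // Read from the local filesystem with an explicit file:// URI.
    // On a cluster this only works if the file is present at the same
    // path on the driver and on every worker node that runs tasks.
    val local = sc.textFile("file:///root/spark/README.md")
    local.count()

    // Or copy the file into HDFS first and keep using bare paths,
    // which resolve to hdfs://.../user/<username>/:
    //   $ hadoop fs -put README.md /user/root/README.md
    val fromHdfs = sc.textFile("README.md")
    fromHdfs.count()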
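
To see which default filesystem the shell is actually resolving bare paths against, you can read it out of the Hadoop configuration exposed on the SparkContext (fs.default.name is the pre-Hadoop-2 key, fs.defaultFS the newer one; depending on the Hadoop version one of the two may come back null):

    println(sc.hadoopConfiguration.get("fs.default.name"))
    println(sc.hadoopConfiguration.get("fs.defaultFS"))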