I am seeing the same issue with Spark 1.0.1 (tried with file:// for a local file):



scala> val lines = sc.textFile("file:///home/monir/.bashrc")
lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val linecount = lines.count
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/monir/.bashrc
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)

-----Original Message-----
From: wsun
Sent: Feb 03, 2014; 12:44pm
To: u...@spark.incubator.apache.org
Subject: cannot read file from a local path


After installing Spark 0.8.1 on an EC2 cluster, I launched the Spark shell on the 
master. This is what happened to me: 

scala> val textFile = sc.textFile("README.md")
14/02/03 20:38:08 INFO storage.MemoryStore: ensureFreeSpace(34380) called with curMem=0, maxMem=4082116853
14/02/03 20:38:08 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 33.6 KB, free 3.8 GB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12


scala> textFile.count() 
14/02/03 20:38:39 WARN snappy.LoadSnappy: Snappy native library is available 
14/02/03 20:38:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library 
14/02/03 20:38:39 INFO snappy.LoadSnappy: Snappy native library loaded 
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ec2-54-234-136-50.compute-1.amazonaws.com:9000/user/root/README.md
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:141) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199) 
        at scala.Option.getOrElse(Option.scala:108) 
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199) 
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199) 
        at scala.Option.getOrElse(Option.scala:108) 
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199) 
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:886) 
        at org.apache.spark.rdd.RDD.count(RDD.scala:698) 


Spark seems to be looking for "README.md" in HDFS. However, I did not specify that 
the file is located in HDFS. I am just wondering if there is any configuration in 
Spark that forces Spark to read files from the local file system. Thanks in advance 
for any help. 

wp
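
On the original question: a path with no scheme is resolved against the default filesystem from core-site.xml (fs.default.name, or fs.defaultFS in newer Hadoop), which the EC2 setup points at HDFS, hence the lookup under hdfs://.../user/root/README.md. A sketch of what I would try to force a local read, assuming the stock /root/spark install location from the EC2 scripts:

// Give the scheme explicitly so the path is not resolved against HDFS.
// The file must exist at this path on every node that runs a task.
val readme = sc.textFile("file:///root/spark/README.md")  // assumed install path
println(readme.count())

// Alternatively, copy the file into HDFS so the plain path resolves there:
//   hadoop fs -put README.md /user/root/README.md
// after which sc.textFile("README.md") should find it.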
