Starting spark-shell in local mode seems to solve this, but it still cannot read a file whose name begins with a '.', as shown in the session below.
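If I understand the Hadoop side correctly, sc.textFile goes through Hadoop's FileInputFormat, whose default input filter skips paths whose names start with '.' or '_', so the dotted file is treated as hidden rather than genuinely missing. Under that assumption, one crude workaround is to copy the hidden file to a non-hidden name first and read the copy; the paths in this sketch are only examples:

    import java.nio.file.{Files, Paths, StandardCopyOption}

    // Copy the hidden file to a non-hidden temporary name that the
    // default Hadoop input filter will accept, then read the copy.
    // Both paths here are illustrative, not required locations.
    val hidden  = Paths.get("/home/monir/.ref")
    val visible = Paths.get("/tmp/ref_copy")
    Files.copy(hidden, visible, StandardCopyOption.REPLACE_EXISTING)

    val lineCount = sc.textFile(visible.toString).count

(Renaming or symlinking the file to a non-dotted name should work the same way; the point is just to avoid the leading dot.)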
MASTER=local[4] ./bin/spark-shell
....
.....
scala> val lineCount = sc.textFile("/home/monir/ref").count
lineCount: Long = 68

scala> val lineCount2 = sc.textFile("/home/monir/.ref").count
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/monir/.ref

Though I am fine with running spark-shell in local mode for the basic examples, I was wondering whether reading local files on the cluster nodes is possible when all of the worker nodes have the file in question in their local file systems. I am still fairly new to Spark, so bear with me if this is easily tunable via some config parameters.

Bests,
-Monir

-----Original Message-----
From: Mozumder, Monir
Sent: Thursday, September 11, 2014 12:15 PM
To: user@spark.apache.org
Subject: RE: cannot read file from a local path

I am seeing the same issue with Spark 1.0.1 (tried with file:// for a local file):

scala> val lines = sc.textFile("file:///home/monir/.bashrc")
lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val linecount = lines.count
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/monir/.bashrc
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)

-----Original Message-----
From: wsun
Sent: Feb 03, 2014; 12:44pm
To: u...@spark.incubator.apache.org
Subject: cannot read file from a local path

After installing Spark 0.8.1 on an EC2 cluster, I launched the Spark shell on the master. This is what happened:

scala> val textFile = sc.textFile("README.md")
14/02/03 20:38:08 INFO storage.MemoryStore: ensureFreeSpace(34380) called with curMem=0, maxMem=4082116853
14/02/03 20:38:08 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 33.6 KB, free 3.8 GB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> textFile.count()
14/02/03 20:38:39 WARN snappy.LoadSnappy: Snappy native library is available
14/02/03 20:38:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/02/03 20:38:39 INFO snappy.LoadSnappy: Snappy native library loaded
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ec2-54-234-136-50.compute-1.amazonaws.com:9000/user/root/README.md
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:141)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:886)
        at org.apache.spark.rdd.RDD.count(RDD.scala:698)

Spark seems to be looking for "README.md" in HDFS. However, I did not specify that the file is located in HDFS.
I am just wondering whether there is any configuration in Spark that forces it to read files from the local file system. Thanks in advance for any help.

wp
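For what it is worth, as far as I can tell the EC2 deployment sets Hadoop's default filesystem (fs.default.name / fs.defaultFS) to HDFS, which is why the bare path "README.md" gets resolved as hdfs://.../user/root/README.md. Spelling out the file:// scheme selects the local filesystem explicitly; the location below assumes the file sits in the Spark install directory on the master and is only an example:

    // Read from the node-local filesystem by giving the scheme explicitly,
    // instead of the cluster's default filesystem (HDFS on this EC2 setup).
    val textFile = sc.textFile("file:///root/spark/README.md")
    textFile.count()

Note that when the job runs on a real cluster (rather than with MASTER=local), a file:// path must be readable at the same path on every worker node, for example by copying the file to each worker or placing it on a shared NFS mount; otherwise the tasks reading the partitions will fail. Putting the file into HDFS (or S3) and reading it from there avoids that requirement.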