Starting spark-shell in local mode seems to solve this, but it still cannot read a file whose name begins with a '.', as shown in the session below.
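If I understand the Hadoop side correctly, sc.textFile goes through Hadoop's FileInputFormat, whose default input filter skips paths whose names start with '.' or '_', so the dotted file is treated as hidden rather than genuinely missing. Under that assumption, one crude workaround is to copy the hidden file to a non-hidden name first and read the copy; the paths in this sketch are only examples:

    import java.nio.file.{Files, Paths, StandardCopyOption}

    // Copy the hidden file to a non-hidden temporary name that the
    // default Hadoop input filter will accept, then read the copy.
    // Both paths here are illustrative, not required locations.
    val hidden  = Paths.get("/home/monir/.ref")
    val visible = Paths.get("/tmp/ref_copy")
    Files.copy(hidden, visible, StandardCopyOption.REPLACE_EXISTING)

    val lineCount = sc.textFile(visible.toString).count

(Renaming or symlinking the file to a non-dotted name should work the same way; the point is just to avoid the leading dot.)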
MASTER=local[4] ./bin/spark-shell
....
.....
scala> val lineCount = sc.textFile("/home/monir/ref").count
lineCount: Long = 68

scala> val lineCount2 = sc.textFile("/home/monir/.ref").count
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/monir/.ref

Though I am fine with running spark-shell in local mode for the basic examples, I was wondering whether reading local files on the cluster nodes is possible when all of the worker nodes have the file in question in their local file systems. I am still fairly new to Spark, so bear with me if this is easily tunable via some config parameters.

Bests,
-Monir

-----Original Message-----
From: Mozumder, Monir
Sent: Thursday, September 11, 2014 12:15 PM
To: user@spark.apache.org
Subject: RE: cannot read file from a local path

I am seeing the same issue with Spark 1.0.1 (tried with file:// for a local file):

scala> val lines = sc.textFile("file:///home/monir/.bashrc")
lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val linecount = lines.count
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/monir/.bashrc
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)

-----Original Message-----
From: wsun
Sent: Feb 03, 2014; 12:44pm
To: u...@spark.incubator.apache.org
Subject: cannot read file from a local path

After installing Spark 0.8.1 on an EC2 cluster, I launched the Spark shell on the master. This is what happened:

scala> val textFile = sc.textFile("README.md")
14/02/03 20:38:08 INFO storage.MemoryStore: ensureFreeSpace(34380) called with curMem=0, maxMem=4082116853
14/02/03 20:38:08 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 33.6 KB, free 3.8 GB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> textFile.count()
14/02/03 20:38:39 WARN snappy.LoadSnappy: Snappy native library is available
14/02/03 20:38:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/02/03 20:38:39 INFO snappy.LoadSnappy: Snappy native library loaded
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ec2-54-234-136-50.compute-1.amazonaws.com:9000/user/root/README.md
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:141)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
        at scala.Option.getOrElse(Option.scala:108)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:886)
        at org.apache.spark.rdd.RDD.count(RDD.scala:698)

Spark seems to be looking for "README.md" in HDFS. However, I did not specify that the file is located in HDFS.
I am just wondering whether there is any configuration in Spark that forces it to read files from the local file system. Thanks in advance for any help.

wp
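For what it is worth, as far as I can tell the EC2 deployment sets Hadoop's default filesystem (fs.default.name / fs.defaultFS) to HDFS, which is why the bare path "README.md" gets resolved as hdfs://.../user/root/README.md. Spelling out the file:// scheme selects the local filesystem explicitly; the location below assumes the file sits in the Spark install directory on the master and is only an example:

    // Read from the node-local filesystem by giving the scheme explicitly,
    // instead of the cluster's default filesystem (HDFS on this EC2 setup).
    val textFile = sc.textFile("file:///root/spark/README.md")
    textFile.count()

Note that when the job runs on a real cluster (rather than with MASTER=local), a file:// path must be readable at the same path on every worker node, for example by copying the file to each worker or placing it on a shared NFS mount; otherwise the tasks reading the partitions will fail. Putting the file into HDFS (or S3) and reading it from there avoids that requirement.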