Starting spark-shell in local mode seems to solve this, but it still cannot
recognize a file whose name begins with a '.':
MASTER=local[4] ./bin/spark-shell
scala> val lineCount = sc.textFile("/home/monir/ref").count
lineCount: Long = 68

scala> val lineCount2 = sc.textFile("/home/monir/.ref").count
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
file:/home/monir/.ref
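The hidden file is most likely being skipped by Hadoop's FileInputFormat, whose default path filter excludes names beginning with '.' or '_' (my reading of the stack trace, not something I have confirmed in the source). If the file is small enough to read on the driver, one workaround is plain Scala I/O followed by parallelize; the helper below is hypothetical:

```scala
import scala.io.Source

// Hypothetical helper: read a local file on the driver with plain Scala I/O,
// sidestepping Hadoop's input-path filtering entirely.
def readLocalLines(path: String): List[String] =
  Source.fromFile(path).getLines().toList

// In spark-shell the result can then be distributed:
//   val rdd = sc.parallelize(readLocalLines("/home/monir/.ref"))
//   rdd.count
```

This only makes sense for files that fit comfortably in driver memory, since the whole file is materialized before being distributed.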
Though I am OK with running spark-shell in local mode to get basic examples
running, I was wondering whether accessing local files on the cluster nodes is
possible when all of the worker nodes have the file in question in their local
file systems. I am still fairly new to Spark, so bear with me if this is easily
tunable by some config param.
Bests,
-Monir
-Original Message-
From: Mozumder, Monir
Sent: Thursday, September 11, 2014 12:15 PM
To: user@spark.apache.org
Subject: RE: cannot read file form a local path
I am seeing this same issue with Spark 1.0.1 (tried with file:// for the local
file):
scala> val lines = sc.textFile("file:///home/monir/.bashrc")
lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
<console>:12

scala> val linecount = lines.count
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
file:/home/monir/.bashrc
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
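Since file:///home/monir/.bashrc also begins with a dot, it is presumably hitting the same hidden-file filter rather than any problem with the file:// scheme itself. A crude workaround (the paths and helper name are just illustrations) is to copy the file to a non-hidden name before reading it:

```scala
import java.nio.file.{Files, Paths, StandardCopyOption}

// Hypothetical helper: copy a hidden file to a non-hidden name so that
// Hadoop's default input-path filter no longer excludes it.
def unhide(src: String, dst: String): String = {
  Files.copy(Paths.get(src), Paths.get(dst), StandardCopyOption.REPLACE_EXISTING)
  dst
}

// Then, in spark-shell:
//   sc.textFile("file://" + unhide("/home/monir/.bashrc", "/tmp/bashrc-visible")).count
```

On a real cluster the copy would have to exist at the same path on every worker, not just the driver.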
-Original Message-
From: wsun
Sent: Feb 03, 2014; 12:44pm
To: u...@spark.incubator.apache.org
Subject: cannot read file form a local path
After installing Spark 0.8.1 on an EC2 cluster, I launched the Spark shell on
the master. This is what happened to me:

scala> val textFile = sc.textFile("README.md")
14/02/03 20:38:08 INFO storage.MemoryStore: ensureFreeSpace(34380) called with curMem=0, maxMem=4082116853
14/02/03 20:38:08 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 33.6 KB, free 3.8 GB)
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
<console>:12
scala> textFile.count()
14/02/03 20:38:39 WARN snappy.LoadSnappy: Snappy native library is available
14/02/03 20:38:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/02/03 20:38:39 INFO snappy.LoadSnappy: Snappy native library loaded
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
hdfs://ec2-54-234-136-50.compute-1.amazonaws.com:9000/user/root/README.md
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:141)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
at scala.Option.getOrElse(Option.scala:108)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
at scala.Option.getOrElse(Option.scala:108)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:886)
at org.apache.spark.rdd.RDD.count(RDD.scala:698)
Spark seems to be looking for README.md in HDFS. However, I did not specify
that the file is located in HDFS. I am just wondering whether there is any
configuration in Spark that forces Spark to read files from the local file
system. Thanks in advance for any help.
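As far as I understand it, this is Hadoop's normal path resolution: a path with no scheme is resolved against the default filesystem configured in core-site.xml (fs.default.name on these Hadoop versions), which the EC2 scripts point at HDFS, so prefixing file:// is the way to force a local read. A rough sketch of the rule, with the base URI taken from the error message above:

```scala
import java.net.URI

// Sketch of the resolution rule: an input path with no scheme inherits the
// configured default filesystem; an explicit scheme is used as-is.
// The base URI below is copied from the error message and is illustrative.
val defaultFs = new URI("hdfs://ec2-54-234-136-50.compute-1.amazonaws.com:9000/user/root/")

def resolveInput(path: String): URI = {
  val u = new URI(path)
  if (u.getScheme == null) defaultFs.resolve(u) else u
}

// resolveInput("README.md")                    -> hdfs://.../user/root/README.md
// resolveInput("file:///root/spark/README.md") -> stays on the local filesystem
```

Note that with a file:// path, the file must exist at that same path on every worker node, or tasks scheduled there will fail.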
wp
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org