SparkContext.textFile() cannot load file using UNC path on Windows

2014-12-04 Thread Ningjun Wang

I run the following on Windows XP:

val conf = new SparkConf().setAppName("testproj1.ClassificationEngine").setMaster("local")
val sc = new SparkContext(conf)
sc.textFile(raw"\\10.209.128.150\TempShare\SvmPocData\reuters-two-categories.load").count()
// This line throws the following exception

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file://10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
    at org.apache.spark.rdd.RDD.count(RDD.scala:904)
    at testproj1.ClassificationEngine$.buildIndex(ClassificationEngine.scala:49)
    at testproj1.ClassificationEngine$.main(ClassificationEngine.scala:36)
    at testproj1.ClassificationEngine.main(ClassificationEngine.scala)
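
One way to read the error message (my interpretation, not something stated in this thread): once the backslashes become forward slashes, the string starts with "//", so when it is treated as a file: URI the server name is parsed as the URI authority rather than as part of the path, and the local filesystem then only looks for /TempShare/... on the local disk. A minimal illustration with plain java.net.URI, using the exact string from the exception:

import java.net.URI

// Parse the URI reported in the InvalidInputException above.
val uri = new URI("file://10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load")
println(uri.getAuthority) // 10.209.128.150 -- interpreted as a host, not as part of the path
println(uri.getPath)      // /TempShare/SvmPocData/reuters-two-categories.load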

If I use a local path, it works:
sc.textFile(raw"C:/temp/Share/SvmPocData/reuters-two-categories.load").count()
sc.textFile(raw"C:\temp\Share\SvmPocData\reuters-two-categories.load").count()
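
Since drive-letter paths clearly work, one workaround that may be worth trying (a suggestion of mine, not something confirmed in this thread) is to map the share to a drive letter first and read it through that mapping:

// Assumes the share has already been mapped to a drive letter outside Spark,
// e.g. with:  net use Z: \\10.209.128.150\TempShare
// (Z: is an arbitrary unused drive letter chosen for this example.)
sc.textFile(raw"Z:/SvmPocData/reuters-two-categories.load").count()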


I tried the other forms of UNC path below and always got the same exception:
sc.textFile(raw"//10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load").count()

sc.textFile(raw"file://10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load").count()

sc.textFile(raw"file:///10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load").count()

sc.textFile(raw"file:10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load").count()

The UNC path is valid. I can go to Windows Explorer, type \\10.209.128.150\TempShare\SvmPocData\reuters-two-categories.load, and open the file in Notepad.
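
To narrow down whether this is a Hadoop path-handling issue rather than an access problem, it may also help to check the same UNC path from plain Java/Scala inside the JVM that runs Spark (a small diagnostic sketch, not from the original post):

// java.io.File understands UNC paths on Windows, so this should print true
// if the JVM can actually reach the share.
val f = new java.io.File(raw"\\10.209.128.150\TempShare\SvmPocData\reuters-two-categories.load")
println(f.exists())
println(f.length()) // file size in bytes, as an extra sanity check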

Please advise.


SparkContext.textFile() cannot load file using UNC path on Windows

2014-11-26 Thread Wang, Ningjun (LNG-NPV)

I run the following on Windows XP:

val conf = new SparkConf().setAppName("testproj1.ClassificationEngine").setMaster("local")
val sc = new SparkContext(conf)

sc.textFile(raw"\\10.209.128.150\TempShare\SvmPocData\reuters-two-categories.load").count()
// This line throws the following exception

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file://10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
    at org.apache.spark.rdd.RDD.count(RDD.scala:904)
    at testproj1.ClassificationEngine$.buildIndex(ClassificationEngine.scala:49)
    at testproj1.ClassificationEngine$.main(ClassificationEngine.scala:36)
    at testproj1.ClassificationEngine.main(ClassificationEngine.scala)
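
Another diagnostic that might help isolate the problem (my sketch, assuming the hadoop-client classes bundled with Spark are on the classpath): ask Hadoop's filesystem layer directly how it parses the UNC path and whether it can see the file, without going through an RDD at all.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Build the Hadoop Path the same way textFile() would receive it.
val p: Path = new Path(raw"\\10.209.128.150\TempShare\SvmPocData\reuters-two-categories.load")
val fs: FileSystem = p.getFileSystem(new Configuration())
println(p.toUri)      // shows how Hadoop parsed the UNC path into a URI
println(fs.exists(p)) // presumably false here, matching the InvalidInputException above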

If I use a local path, it works:
sc.textFile(raw"C:/temp/Share/SvmPocData/reuters-two-categories.load").count()
sc.textFile(raw"C:\temp\Share\SvmPocData\reuters-two-categories.load").count()

I tried the other forms of UNC path below and always got the same exception:
sc.textFile(raw"//10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load").count()

sc.textFile(raw"file://10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load").count()

sc.textFile(raw"file:///10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load").count()

sc.textFile(raw"file:10.209.128.150/TempShare/SvmPocData/reuters-two-categories.load").count()

The UNC path is valid. I can go to Windows Explorer, type \\10.209.128.150\TempShare\SvmPocData\reuters-two-categories.load, and open the file in Notepad.
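
Since Windows itself can read the file, a blunt fallback (my sketch, only sensible if the file comfortably fits in the driver's memory) would be to read the share with plain Java I/O, which does understand UNC paths on Windows, and hand the lines to Spark with parallelize:

import scala.io.Source

// Read the UNC file on the driver with ordinary java.io, then distribute the lines.
val src   = Source.fromFile(raw"\\10.209.128.150\TempShare\SvmPocData\reuters-two-categories.load")
val lines = try src.getLines().toVector finally src.close()
println(sc.parallelize(lines).count())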

Please advise.

Regards,

Ningjun