Re: wholeTextFiles not working with HDFS
I forgot to say: I am using bin/spark-shell, spark-1.0.2. That host has Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/wholeTextFiles-not-working-with-HDFS-tp7490p12678.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: wholeTextFiles not working with HDFS
I had the same issue with spark-1.0.2-bin-hadoop*1*, and indeed the issue seems related to Hadoop 1. When I switched to spark-1.0.2-bin-hadoop*2*, the issue disappeared.
Re: wholeTextFiles not working with HDFS
That worked for me as well. I was using Spark 1.0 compiled against Hadoop 1.0; switching to 1.0.1 compiled against Hadoop 2 fixed it.
Re: wholeTextFiles not working with HDFS
I have the same issue. This works perfectly fine:

val a = sc.textFile("s3n://MyBucket/MyFolder/*.tif")
a.first

but this does not:

val d = sc.wholeTextFiles("s3n://MyBucket/MyFolder/*.tif")
d.first

It gives the following error message:

java.io.FileNotFoundException: File /MyBucket/MyFolder.tif does not exist.
Re: wholeTextFiles not working with HDFS
I can write one if you'll point me to where I need to write it.
Re: wholeTextFiles not working with HDFS
Hi Sguj and littlebird,

I'll try to fix it tomorrow evening and the day after tomorrow, because I am now busy preparing a talk (slides) for tomorrow. Sorry for the inconvenience. Would you mind writing an issue on the Spark JIRA?

2014-06-17 20:55 GMT+08:00 Sguj:
> I didn't fix the issue so much as work around it. I was running my cluster
> locally, so using HDFS was just a preference. The code worked with the
> local file system, so that's what I'm using until I can get some help.

--
Best Regards
---
Xusen Yin (尹绪森)
Intel Labs China
Homepage: http://yinxusen.github.io/
Re: wholeTextFiles not working with HDFS
I didn't fix the issue so much as work around it. I was running my cluster locally, so using HDFS was just a preference. The code worked with the local file system, so that's what I'm using until I can get some help.
Re: wholeTextFiles not working with HDFS
Hi, I have the same exception. Can you tell me how you fixed it? Thank you!
Re: wholeTextFiles not working with HDFS
My exception stack looks about the same:

java.io.FileNotFoundException: File /user/me/target/capacity-scheduler.xml does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:280)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:240)
    at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:173)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1094)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:717)

I'm using Hadoop 1.2.1, and everything else I've tried in Spark with that version has worked, so I doubt it's a version error.
Re: wholeTextFiles not working with HDFS
Hi Sguj,

Could you give me the exception stack? I tested it on my laptop and found that it gets the wrong FileSystem: it should be DistributedFileSystem, but it finds RawLocalFileSystem instead. If we get the same exception stack, I'll try to fix it. Here is my exception stack:

java.io.FileNotFoundException: File /sen/reuters-out/reut2-000.sgm-0.txt does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:280)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:240)
    at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:173)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:201)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:728)

Besides, what's your Hadoop version?
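[Editor's note] The mis-resolution diagnosed above (RawLocalFileSystem found where DistributedFileSystem was expected) comes down to how a filesystem implementation is looked up from a path's URI scheme. This is a minimal Python sketch of that lookup idea only; the names `FS_REGISTRY` and `resolve_filesystem` are hypothetical illustrations, not Hadoop's actual API:

```python
from urllib.parse import urlparse

# Hypothetical registry mapping URI schemes to filesystem implementations,
# loosely mimicking how Hadoop resolves fs.<scheme>.impl / fs.default.name.
FS_REGISTRY = {
    "hdfs": "DistributedFileSystem",
    "file": "RawLocalFileSystem",
}
DEFAULT_FS = "RawLocalFileSystem"  # fallback when no scheme is present

def resolve_filesystem(path: str) -> str:
    """Return the filesystem implementation name for a path or URI."""
    scheme = urlparse(path).scheme
    return FS_REGISTRY.get(scheme, DEFAULT_FS)

# A fully qualified HDFS URI resolves correctly...
print(resolve_filesystem("hdfs://localhost:9000/sen/reuters-out"))
# -> DistributedFileSystem

# ...but if the code path strips the URI down to a bare path, the scheme is
# lost and the local filesystem is consulted instead, which then throws
# FileNotFoundException for a file that only exists in HDFS.
print(resolve_filesystem("/sen/reuters-out/reut2-000.sgm-0.txt"))
# -> RawLocalFileSystem
```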
wholeTextFiles not working with HDFS
I'm trying to get a list of every filename in a directory from HDFS using pySpark, and the only thing that seems like it would return the filenames is the wholeTextFiles function. My code for just trying to collect that data is this:

files = sc.wholeTextFiles("hdfs://localhost:port/users/me/target")
files = files.collect()

These lines return the error "java.io.FileNotFoundException: File /user/me/target/capacity-scheduler.xml does not exist", which makes it seem like the HDFS information isn't getting used with the wholeTextFiles function. Those lines work if I use them on a local filesystem directory, and the textFile() function works on the HDFS directory I'm trying to use wholeTextFiles() on. I need either a way to fix this or an alternate method of reading the filenames from a directory in HDFS.