[jira] [Commented] (SPARK-4414) SparkContext.wholeTextFiles Doesn't work with S3 Buckets

2015-04-28 Thread Peter Marsh (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516971#comment-14516971 ]

Peter Marsh commented on SPARK-4414:


I managed to get this to work by re-installing Spark. Initially I had built 
Spark from source; after removing that and installing the prebuilt 
spark-1.3.0-bin-hadoop2.4 package, I was able to use wholeTextFiles(...)

 SparkContext.wholeTextFiles Doesn't work with S3 Buckets
 

 Key: SPARK-4414
 URL: https://issues.apache.org/jira/browse/SPARK-4414
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0, 1.2.0
Reporter: Pedro Rodriguez
Priority: Critical

 SparkContext.wholeTextFiles does not read files that SparkContext.textFile 
 can read. Below are general steps to reproduce; my specific case follows in 
 a linked git repo.
 Steps to reproduce:
 1. Create an Amazon S3 bucket containing multiple files and make it public.
 2. Attempt to read from the bucket with
 sc.wholeTextFiles("s3n://mybucket/myfile.txt")
 3. Spark returns the following error, even though the file exists:
 Exception in thread "main" java.io.FileNotFoundException: File does not 
 exist: /myfile.txt
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:517)
   at 
 org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.init(CombineFileInputFormat.java:489)
 4. Change the call to
 sc.textFile("s3n://mybucket/myfile.txt")
 and there is no error message; the application runs fine.
 There is also a question about this on StackOverflow:
 http://stackoverflow.com/questions/26258458/sparkcontext-wholetextfiles-java-io-filenotfoundexception-file-does-not-exist
 This is a link to the relevant lines of code in the repo. The uncommented 
 call doesn't work; the commented call works as expected:
 https://github.com/EntilZha/nips-lda-spark/blob/45f5ad1e2646609ef9d295a0954fbefe84111d8a/src/main/scala/NipsLda.scala#L13-L19
 It would be easy to fall back to textFile with a multi-file argument, but 
 wholeTextFiles should work correctly for S3 bucket files as well.
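As a stopgap until this is fixed, one workaround sketch (my suggestion, not from the ticket): sc.textFile accepts a comma-separated list of paths, so you can enumerate the bucket's keys yourself and build that string. The helper below is hypothetical; bucket and key names are placeholders.

```python
# Hedged workaround sketch, not the fix: sc.textFile accepts a
# comma-separated list of paths, so one can enumerate the bucket's keys
# (e.g. with an S3 client library) and pass the joined string to textFile
# instead of calling wholeTextFiles. Helper name and arguments are
# hypothetical, not from the ticket.
def s3n_paths(bucket, keys):
    """Join S3 keys into the comma-separated path string textFile accepts."""
    return ",".join("s3n://%s/%s" % (bucket, key) for key in keys)

paths = s3n_paths("mybucket", ["a.txt", "b.txt"])
print(paths)  # s3n://mybucket/a.txt,s3n://mybucket/b.txt

# usage sketch (assumes a live SparkContext `sc` with S3 credentials set up):
# rdd = sc.textFile(paths)
```

Note this is only a partial substitute: textFile yields line records, whereas wholeTextFiles yields (filename, content) pairs, so the filenames are lost.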



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4414) SparkContext.wholeTextFiles Doesn't work with S3 Buckets

2015-04-27 Thread Peter Marsh (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514417#comment-14514417 ]

Peter Marsh commented on SPARK-4414:


For what it's worth, I'm also experiencing this problem:

sc.textFile('s3n://mybucket/mylovelyfile') # works
sc.wholeTextFiles('s3n://mybucket/mylovelyfile') # FileNotFoundException

I'm running Spark 1.3.0 on Ubuntu; a colleague is running 1.3.0 on OS X and 
sc.wholeTextFiles works fine for him. Maybe it's a platform-specific issue?
