[jira] [Comment Edited] (SPARK-6527) sc.binaryFiles can not access files on s3
[ https://issues.apache.org/jira/browse/SPARK-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259963#comment-15259963 ]

Steve Loughran edited comment on SPARK-6527 at 4/1/17 12:41 PM:

I've not seen a JIRA surface;
# if anyone does, -link it to HADOOP-11694, S3a Phase II, which I'm trying to wrap up this week.-
# what are the characters in question?
# if it's not just when there are complex characters in a name, how many files in a directory tree does it take to trigger this problem?

Looking into the Hadoop code, this specific error string appears if there is no match on a path containing a pattern:
{code}
Path p = dirs[i];
FileSystem fs = p.getFileSystem(job.getConfiguration());
FileStatus[] matches = fs.globStatus(p, inputFilter);
if (matches == null) {
  errors.add(new IOException("Input path does not exist: " + p));
} else if (matches.length == 0) {
  errors.add(new IOException("Input Pattern " + p + " matches 0 files"));
...
{code}
It might be that odd chars in filenames are confusing that pattern matching.

> sc.binaryFiles can not access files on s3
> -----------------------------------------
>
>                 Key: SPARK-6527
>                 URL: https://issues.apache.org/jira/browse/SPARK-6527
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2, Input/Output
>    Affects Versions: 1.2.0, 1.3.0
>         Environment: I am running Spark on EC2
>            Reporter: Zhao Zhang
>            Priority: Minor
>
> sc.binaryFiles() can not access the files stored on s3. It can correctly
> list the number of files, but reports "file does not exist" when processing
> them. I also tried sc.textFile(), which works fine.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
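The two error branches in the FileInputFormat excerpt quoted in Steve Loughran's comment, and his guess that odd characters confuse the pattern matching, can be illustrated with a small stand-alone sketch. This is only a simulation under stated assumptions: it uses the JDK's `java.nio` glob matcher over an in-memory list of names rather than Hadoop's `FileSystem.globStatus`, so the glob dialect is similar but not identical, and the class name `GlobErrorSketch`, the helper methods and the file names are all hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the quoted FileInputFormat branch.
// Uses java.nio glob matching on an in-memory list, NOT Hadoop's globStatus.
class GlobErrorSketch {

    // True if the string contains a character that java.nio treats as glob syntax.
    static boolean looksLikePattern(String p) {
        return p.matches(".*[*?\\[{].*");
    }

    // Number of names matched by the glob pattern.
    static long countMatches(String pattern, List<String> names) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        return names.stream().filter(n -> m.matches(Paths.get(n))).count();
    }

    // Mirrors the two error messages: a plain path that is missing versus
    // a pattern that matches nothing.
    static String diagnose(String p, List<String> names) {
        if (!looksLikePattern(p)) {
            return names.contains(p) ? "ok" : "Input path does not exist: " + p;
        }
        return countMatches(p, names) == 0
                ? "Input Pattern " + p + " matches 0 files"
                : "ok";
    }

    public static void main(String[] args) {
        // Assumed listing: one file whose name contains glob characters.
        List<String> names = Arrays.asList("img[1].jpg", "00a.jpg");
        System.out.println(diagnose("00*.jpg", names));        // ordinary glob
        System.out.println(diagnose("img[1].jpg", names));     // name read as a pattern
        System.out.println(diagnose("img\\[1\\].jpg", names)); // brackets escaped
        System.out.println(diagnose("missing.jpg", names));    // plain path, absent
    }
}
```

In this sketch the file `img[1].jpg` exists, yet passing its name unescaped yields "Input Pattern img[1].jpg matches 0 files", because `[1]` is read as a character class that only matches `img1.jpg`. Escaping the brackets makes the lookup succeed. Whether the same mechanism is behind the S3 failure reported here is not confirmed by the thread.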
[jira] [Comment Edited] (SPARK-6527) sc.binaryFiles can not access files on s3
[ https://issues.apache.org/jira/browse/SPARK-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249141#comment-15249141 ]

Nicholas Chammas edited comment on SPARK-6527 at 4/20/16 2:27 AM:

Did the s3a suggestion work? If not, did anybody file an issue as Steve suggested?
[jira] [Comment Edited] (SPARK-6527) sc.binaryFiles can not access files on s3
[ https://issues.apache.org/jira/browse/SPARK-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964267#comment-14964267 ]

bin wang edited comment on SPARK-6527 at 10/19/15 11:52 PM:

[~zhaozhang], this error happens to me too, while using a Databricks notebook. I have tons of images in a bucket, say mybucket. When I do binaryfiles('mybucket/*'), it errors out with the same message as yours. However, some of the images contain special characters, and when I do binaryfiles('mybucket/00*.jpg') to restrict the input to a very small number of images, the command runs successfully. So I think there is probably something picky about file names containing certain characters.
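One way to narrow this down, and to answer the "what are the characters in question?" question raised elsewhere in the thread, would be to flag the object names that contain anything outside a conservative character set and then retry with and without them. The sketch below assumes the key names have already been listed by some other means. The class name `SuspectNames`, the choice of "safe" characters and the sample keys are all hypothetical, not taken from Spark or Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: flags names that contain characters outside a
// conservative set, as candidates for the names that break the glob.
class SuspectNames {

    // Assumption: ASCII letters, digits, '.', '_', '-' and '/' count as safe.
    private static final Pattern UNSAFE = Pattern.compile("[^A-Za-z0-9._/-]");

    // Returns the keys that contain at least one character outside the safe set.
    static List<String> suspects(List<String> keys) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            if (UNSAFE.matcher(k).find()) {
                out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample keys for illustration only.
        List<String> keys = Arrays.asList(
                "mybucket/001.jpg",
                "mybucket/cat [1].jpg",
                "mybucket/a:b.jpg");
        for (String k : suspects(keys)) {
            System.out.println("suspect: " + k);
        }
    }
}
```

If the job succeeds on the names this helper leaves out and fails once a flagged name is included, that would support the special-character theory and give a concrete name to attach to a Hadoop issue.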