[jira] [Comment Edited] (SPARK-6527) sc.binaryFiles can not access files on s3

2017-04-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259963#comment-15259963
 ] 

Steve Loughran edited comment on SPARK-6527 at 4/1/17 12:41 PM:


I've not seen a JIRA surface for this.

# If anyone does, -link it to HADOOP-11694, S3a Phase II, which I'm trying to 
wrap up this week.-
# What are the characters in question?
# If it's not just when there are complex characters in a name, how many files 
in a directory tree does it take to trigger this problem?


Looking into the Hadoop code, this specific error string appears if there is no 
match on a path containing a pattern:
{code}
  Path p = dirs[i];
  FileSystem fs = p.getFileSystem(job.getConfiguration());
  FileStatus[] matches = fs.globStatus(p, inputFilter);
  if (matches == null) {
    errors.add(new IOException("Input path does not exist: " + p));
  } else if (matches.length == 0) {
    errors.add(new IOException("Input Pattern " + p + " matches 0 files"));
  ...
{code}

It might be that odd characters in filenames are confusing that pattern matching.
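
As a rough illustration of how that could happen, here is a minimal sketch that uses the JDK's java.nio glob matcher as a stand-in for Hadoop's globStatus. That substitution is an assumption: the two glob dialects differ in details, so this only shows the general failure mode, not the actual cause of this issue. GlobEscapeSketch, escapeGlob and matchesGlob are hypothetical names for illustration, not Hadoop or Spark APIs, and the set of metacharacters is also an assumption.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class GlobEscapeSketch {
    // Characters treated as glob metacharacters in this sketch (assumed set;
    // Hadoop's own glob parser may treat a different set as special).
    private static final String GLOB_META = "\\*?[]{}";

    /** Prefixes each assumed metacharacter with a backslash so it is matched literally. */
    static String escapeGlob(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 8);
        for (char c : name.toCharArray()) {
            if (GLOB_META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Returns true if fileName matches the given java.nio glob pattern. */
    static boolean matchesGlob(String pattern, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        String name = "img[1].jpg"; // hypothetical file name with "odd" characters

        // Used as a pattern, "[1]" is read as a character class, so the
        // name does not match itself.
        System.out.println(matchesGlob(name, name));

        // With the metacharacters escaped, the name is matched literally.
        System.out.println(matchesGlob(escapeGlob(name), name));
    }
}
```

If something similar applies here, a file name that is handed back to a glob call as a path would silently match a different name, or nothing at all, which would fit the "matches 0 files" branch above.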





> sc.binaryFiles can not access files on s3
> -
>
> Key: SPARK-6527
> URL: https://issues.apache.org/jira/browse/SPARK-6527
> Project: Spark
>  Issue Type: Bug
>  Components: EC2, Input/Output
>Affects Versions: 1.2.0, 1.3.0
> Environment: I am running Spark on EC2
>Reporter: Zhao Zhang
>Priority: Minor
>
> The sc.binaryFIles() can not access the files stored on s3. It can correctly 
> list the number of files, but report "file does not exist" when processing 
> them. I also tried sc.textFile() which works fine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6527) sc.binaryFiles can not access files on s3

2016-04-19 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249141#comment-15249141
 ] 

Nicholas Chammas edited comment on SPARK-6527 at 4/20/16 2:27 AM:

Did the s3a suggestion work? If not, did anybody file an issue as Steve 
suggested?






[jira] [Comment Edited] (SPARK-6527) sc.binaryFiles can not access files on s3

2015-10-19 Thread bin wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964267#comment-14964267
 ] 

bin wang edited comment on SPARK-6527 at 10/19/15 11:52 PM:


[~zhaozhang], this error happens to me too while using a Databricks notebook. 
I have tons of images in a bucket, say mybucket, and when I do 
binaryfiles('mybucket/*'), it errors out with the same message as yours. 
Some of the images have special characters in their names; when I do 
binaryfiles('mybucket/00*.jpg') to restrict the match to a very small number of 
images, the command runs successfully. 

Given that, I think there is probably something picky about file names 
containing certain characters. 


