[jira] [Comment Edited] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
[ https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969742#comment-15969742 ]

Kazuyuki Tanimura edited comment on HADOOP-14237 at 4/15/17 12:53 AM:
----------------------------------------------------------------------

I will make this an independent credential provider.

was (Author: kazuyukitanimura):
I will make this an independnet credential provider.

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> -----------------------------------------------------------------------
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
> Assignee: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 nodes / JVM processes.
> This issue prevents users from creating Hadoop clusters on EC2

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
[ https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969742#comment-15969742 ]

Kazuyuki Tanimura commented on HADOOP-14237:
--------------------------------------------

I will make this an independent credential provider.

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> -----------------------------------------------------------------------
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
> Assignee: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 nodes / JVM processes.
> This issue prevents users from creating Hadoop clusters on EC2
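The failure mode behind this ticket — every mapper and reducer JVM hitting the instance-metadata credentials endpoint — is usually mitigated by fetching the credentials once per host and letting every other JVM read the cached copy. A minimal JDK-only sketch of such a host-local cache follows; the class name, file location, and three-line file format are illustrative assumptions, not the actual Hadoop or AWS SDK implementation:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of a host-wide credential cache: the first JVM to fetch
 * instance-profile credentials writes them to a shared file; other JVMs
 * on the same node read the file instead of calling the metadata endpoint.
 * Names and file format are illustrative, not Hadoop's actual code.
 */
public class HostSharedCredentials {
    private final Path cacheFile;

    public HostSharedCredentials(Path cacheFile) {
        this.cacheFile = cacheFile;
    }

    /** Persist credentials plus their expiry time, atomically. */
    public void store(String accessKey, String secretKey, Instant expiry) {
        try {
            Path tmp = Files.createTempFile(cacheFile.getParent(), "creds", ".tmp");
            List<String> lines = Arrays.asList(
                accessKey, secretKey, String.valueOf(expiry.getEpochSecond()));
            Files.write(tmp, lines);
            // Atomic rename so concurrent readers never observe a partial file.
            Files.move(tmp, cacheFile,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Return {accessKey, secretKey} if a non-expired entry exists, else null. */
    public String[] loadIfFresh() {
        try {
            if (!Files.exists(cacheFile)) return null;
            List<String> lines = Files.readAllLines(cacheFile);
            if (lines.size() < 3) return null;
            Instant expiry = Instant.ofEpochSecond(Long.parseLong(lines.get(2)));
            if (Instant.now().isAfter(expiry)) return null; // stale: caller re-fetches
            return new String[] { lines.get(0), lines.get(1) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real provider would also need file permissions restricted to the Hadoop user and a lock or jitter around the refresh, so that only one process re-fetches when the entry expires.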
[jira] [Resolved] (HADOOP-14235) S3A Path does not understand colon (:) when globbing
[ https://issues.apache.org/jira/browse/HADOOP-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura resolved HADOOP-14235.
----------------------------------------
Resolution: Won't Fix

OK, closing in favor of HADOOP-3257

> S3A Path does not understand colon (:) when globbing
> ----------------------------------------------------
>
> Key: HADOOP-14235
> URL: https://issues.apache.org/jira/browse/HADOOP-14235
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
> Priority: Minor
>
> In S3, colons (:) are valid characters in paths. However, the Java URI class, which is used by the Path class, does not allow them.
> This becomes a problem particularly when globbing S3 paths. The globber treats paths with colons as invalid and throws URISyntaxException.
> The reason is that Globber.java is shared with all other filesystems, and some of the rules for regular filesystems, this colon rule being one example, are not applicable to S3.
> The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061
> The good news is that I have a one-line fix that I am about to send as a pull request.
> However, for a proper fix, we should separate the S3 globber from Globber.java, as proposed at https://issues.apache.org/jira/browse/HADOOP-13371
[jira] [Resolved] (HADOOP-14239) S3A Retry Multiple S3 Key Deletion
[ https://issues.apache.org/jira/browse/HADOOP-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura resolved HADOOP-14239.
----------------------------------------
Resolution: Duplicate

I don't have a good way to identify 404 failures among all the failed keys. I think we can still retry the multi-delete on the failed keys, if there are several, instead of deleting them one by one. If the number of failed keys decreases, the retry was the right thing to do. In any case, HADOOP-11572 covers the same topic, so I am closing this ticket as a duplicate.

> S3A Retry Multiple S3 Key Deletion
> ----------------------------------
>
> Key: HADOOP-14239
> URL: https://issues.apache.org/jira/browse/HADOOP-14239
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
>
> When fs.s3a.multiobjectdelete.enable == true, S3A tries to delete multiple S3 keys at once.
> Although this is a great feature, it becomes problematic when AWS fails to delete some of the S3 keys in the deletion list. The aws-java-sdk internally retries the deletion, but that does not help because it simply retries the same list of S3 keys, including the successfully deleted ones. All subsequent retries then fail on the previously deleted keys, since those keys no longer exist. Eventually it throws an exception, and the whole job fails.
> Luckily, the AWS API reports which keys it failed to delete. We should retry only the keys that failed to be deleted from S3A
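The retry strategy discussed above — re-issue the bulk delete only for the keys the service reported as failed — can be sketched independently of the AWS SDK. In the sketch below, `deleteBatch` is a stand-in for the real multi-object delete call (which in the SDK reports per-key failures via its error list); the class and method names are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

/** Sketch of "retry only the failed keys" for bulk S3 deletion. */
public class PartialDeleteRetry {
    /**
     * Repeatedly issue a bulk delete, narrowing each attempt to the keys
     * the previous attempt reported as failed.
     *
     * @param keys        keys to delete
     * @param deleteBatch performs one bulk delete and returns the keys that failed
     * @param maxAttempts upper bound on delete attempts
     * @return keys still undeleted after maxAttempts (empty on success)
     */
    public static Set<String> deleteWithRetry(Set<String> keys,
                                              Function<Set<String>, Set<String>> deleteBatch,
                                              int maxAttempts) {
        Set<String> remaining = new HashSet<>(keys);
        for (int attempt = 0; attempt < maxAttempts && !remaining.isEmpty(); attempt++) {
            // Only the keys that failed last time are re-sent; keys that were
            // already deleted are never retried, so they cannot fail with 404.
            remaining = new HashSet<>(deleteBatch.apply(remaining));
        }
        return remaining;
    }
}
```

Because each attempt shrinks (or at worst repeats) the failed set, progress is easy to detect: if the set stops shrinking, the remaining failures are permanent and can be surfaced as an error.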
[jira] [Updated] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
[ https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura updated HADOOP-14237:
---------------------------------------

Description:
When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 nodes / JVM processes.
This issue prevents users from creating Hadoop clusters on EC2

was:
When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 nodes / processes.
This issue prevents users from creating Hadoop clusters on EC2

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> -----------------------------------------------------------------------
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 nodes / JVM processes.
> This issue prevents users from creating Hadoop clusters on EC2
[jira] [Comment Edited] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
[ https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941523#comment-15941523 ]

Kazuyuki Tanimura edited comment on HADOOP-14237 at 3/25/17 2:29 AM:
---------------------------------------------------------------------

True. Just to be clear, this patch is for making sure the credentials are shared among all Hadoop nodes, not only within a node. As I added more nodes to a cluster, it was too easy to hit the AWS account-level limits.

was (Author: kazuyukitanimura):
True. Just to be clear, this patch is for making sure the credentials is shared among all Hadoop nodes not only shared within a node. As I add more nodes to a cluster, it was too easy to hit the account level limits.

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> -----------------------------------------------------------------------
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 nodes / processes.
> This issue prevents users from creating Hadoop clusters on EC2
[jira] [Commented] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
[ https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941523#comment-15941523 ]

Kazuyuki Tanimura commented on HADOOP-14237:
--------------------------------------------

True. Just to be clear, this patch is for making sure the credentials are shared among all Hadoop nodes, not only within a node. As I added more nodes to a cluster, it was too easy to hit the account-level limits.

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> -----------------------------------------------------------------------
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 nodes / processes.
> This issue prevents users from creating Hadoop clusters on EC2
[jira] [Updated] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
[ https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura updated HADOOP-14237:
---------------------------------------

Description:
When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 nodes / processes.
This issue prevents users from creating Hadoop clusters on EC2

was:
When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 instances / processes.
This issue prevents users from creating Hadoop clusters on EC2

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> -----------------------------------------------------------------------
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 nodes / processes.
> This issue prevents users from creating Hadoop clusters on EC2
[jira] [Updated] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
[ https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura updated HADOOP-14237:
---------------------------------------

Summary: S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes (was: S3A Support Shared Instance Profile Credentials Across All Instances)

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> -----------------------------------------------------------------------
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 instances / processes.
> This issue prevents users from creating Hadoop clusters on EC2
[jira] [Commented] (HADOOP-14239) S3A Retry Multiple S3 Key Deletion
[ https://issues.apache.org/jira/browse/HADOOP-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941457#comment-15941457 ]

Kazuyuki Tanimura commented on HADOOP-14239:
--------------------------------------------

To be clear, deletion is called from rename() as well, which makes this issue even more frequent to encounter...

> S3A Retry Multiple S3 Key Deletion
> ----------------------------------
>
> Key: HADOOP-14239
> URL: https://issues.apache.org/jira/browse/HADOOP-14239
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
>
> When fs.s3a.multiobjectdelete.enable == true, S3A tries to delete multiple S3 keys at once.
> Although this is a great feature, it becomes problematic when AWS fails to delete some of the S3 keys in the deletion list. The aws-java-sdk internally retries the deletion, but that does not help because it simply retries the same list of S3 keys, including the successfully deleted ones. All subsequent retries then fail on the previously deleted keys, since those keys no longer exist. Eventually it throws an exception, and the whole job fails.
> Luckily, the AWS API reports which keys it failed to delete. We should retry only the keys that failed to be deleted from S3A
[jira] [Created] (HADOOP-14239) S3A Retry Multiple S3 Key Deletion
Kazuyuki Tanimura created HADOOP-14239:
---------------------------------------

Summary: S3A Retry Multiple S3 Key Deletion
Key: HADOOP-14239
URL: https://issues.apache.org/jira/browse/HADOOP-14239
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.8.0, 2.8.1
Environment: EC2, AWS
Reporter: Kazuyuki Tanimura

When fs.s3a.multiobjectdelete.enable == true, S3A tries to delete multiple S3 keys at once.

Although this is a great feature, it becomes problematic when AWS fails to delete some of the S3 keys in the deletion list. The aws-java-sdk internally retries the deletion, but that does not help because it simply retries the same list of S3 keys, including the successfully deleted ones. All subsequent retries then fail on the previously deleted keys, since those keys no longer exist. Eventually it throws an exception, and the whole job fails.

Luckily, the AWS API reports which keys it failed to delete. We should retry only the keys that failed to be deleted from S3A
[jira] [Created] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Instances
Kazuyuki Tanimura created HADOOP-14237:
---------------------------------------

Summary: S3A Support Shared Instance Profile Credentials Across All Instances
Key: HADOOP-14237
URL: https://issues.apache.org/jira/browse/HADOOP-14237
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.8.0, 2.8.1
Environment: EC2, AWS
Reporter: Kazuyuki Tanimura

When I run a large Hadoop cluster on EC2 instances with an IAM Role, it fails to get the instance profile credentials, and eventually all jobs on the cluster fail. Since a number of S3A clients (all mappers and reducers) try to get the credentials, the AWS credential endpoint starts responding with 5xx and 4xx error codes.

SharedInstanceProfileCredentialsProvider.java attempts to solve this, but it still does not share the credentials with other EC2 instances / processes.

This issue prevents users from creating Hadoop clusters on EC2
[jira] [Updated] (HADOOP-14235) S3A Path does not understand colon (:) when globbing
[ https://issues.apache.org/jira/browse/HADOOP-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura updated HADOOP-14235:
---------------------------------------

Environment: EC2, AWS

> S3A Path does not understand colon (:) when globbing
> ----------------------------------------------------
>
> Key: HADOOP-14235
> URL: https://issues.apache.org/jira/browse/HADOOP-14235
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
> Reporter: Kazuyuki Tanimura
>
> In S3, colons (:) are valid characters in paths. However, the Java URI class, which is used by the Path class, does not allow them.
> This becomes a problem particularly when globbing S3 paths. The globber treats paths with colons as invalid and throws URISyntaxException.
> The reason is that Globber.java is shared with all other filesystems, and some of the rules for regular filesystems, this colon rule being one example, are not applicable to S3.
> The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061
> The good news is that I have a one-line fix that I am about to send as a pull request.
> However, for a proper fix, we should separate the S3 globber from Globber.java, as proposed at https://issues.apache.org/jira/browse/HADOOP-13371
[jira] [Updated] (HADOOP-14235) S3A Path does not understand colon (:) when globbing
[ https://issues.apache.org/jira/browse/HADOOP-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuyuki Tanimura updated HADOOP-14235:
---------------------------------------

Description:
In S3, colons ":" are valid characters in paths. However, the Java URI class, which is used by the Path class, does not allow them.
This becomes a problem particularly when globbing S3 paths. The globber treats paths with colons as invalid and throws URISyntaxException.
The reason is that Globber.java is shared with all other filesystems, and some of the rules for regular filesystems, this colon rule being one example, are not applicable to S3.
The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061
The good news is that I have a one-line fix that I am about to send as a pull request.
However, for a proper fix, we should separate the S3 globber from Globber.java, as proposed at https://issues.apache.org/jira/browse/HADOOP-13371

was:
In S3, colons (:) are valid characters in paths. However, the Java URI class, which is used by the Path class, does not allow them.
This becomes a problem particularly when globbing S3 paths. The globber treats paths with colons as invalid and throws URISyntaxException.
The reason is that Globber.java is shared with all other filesystems, and some of the rules for regular filesystems, this colon rule being one example, are not applicable to S3.
The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061
The good news is that I have a one-line fix that I am about to send as a pull request.
However, for a proper fix, we should separate the S3 globber from Globber.java, as proposed at https://issues.apache.org/jira/browse/HADOOP-13371

> S3A Path does not understand colon (:) when globbing
> ----------------------------------------------------
>
> Key: HADOOP-14235
> URL: https://issues.apache.org/jira/browse/HADOOP-14235
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Reporter: Kazuyuki Tanimura
>
> In S3, colons ":" are valid characters in paths. However, the Java URI class, which is used by the Path class, does not allow them.
> This becomes a problem particularly when globbing S3 paths. The globber treats paths with colons as invalid and throws URISyntaxException.
> The reason is that Globber.java is shared with all other filesystems, and some of the rules for regular filesystems, this colon rule being one example, are not applicable to S3.
> The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061
> The good news is that I have a one-line fix that I am about to send as a pull request.
> However, for a proper fix, we should separate the S3 globber from Globber.java, as proposed at https://issues.apache.org/jira/browse/HADOOP-13371
[jira] [Created] (HADOOP-14235) S3A Path does not understand colon (:) when globbing
Kazuyuki Tanimura created HADOOP-14235:
---------------------------------------

Summary: S3A Path does not understand colon (:) when globbing
Key: HADOOP-14235
URL: https://issues.apache.org/jira/browse/HADOOP-14235
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.8.0, 2.8.1
Reporter: Kazuyuki Tanimura

In S3, colons (:) are valid characters in paths. However, the Java URI class, which is used by the Path class, does not allow them.

This becomes a problem particularly when globbing S3 paths. The globber treats paths with colons as invalid and throws URISyntaxException.

The reason is that Globber.java is shared with all other filesystems, and some of the rules for regular filesystems, this colon rule being one example, are not applicable to S3.

The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061

The good news is that I have a one-line fix that I am about to send as a pull request.

However, for a proper fix, we should separate the S3 globber from Globber.java, as proposed at https://issues.apache.org/jira/browse/HADOOP-13371
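The root cause described in this issue can be reproduced with java.net.URI alone: when the first segment of a relative path contains a colon, the text before the colon is parsed as a URI scheme, and when that would-be scheme is not syntactically valid, parsing fails outright. A small self-contained demonstration (the key names are made up; prefixing "./" is a common workaround, not the fix Hadoop adopted):

```java
import java.net.URI;

/** Demonstrates why a colon in a leading path segment confuses URI parsing. */
public class ColonPathDemo {
    public static void main(String[] args) {
        // A leading segment with a colon is silently taken as a URI scheme.
        System.out.println(URI.create("key:part-0").getScheme());

        // If the would-be scheme is invalid (schemes must start with a letter),
        // URI parsing rejects the string entirely.
        try {
            URI.create("2017-04:part-0");
            System.out.println("parsed");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Workaround: prefix "./" so the colon-bearing segment is
        // unambiguously a path, not a scheme.
        URI relative = URI.create("./key:part-0");
        System.out.println(relative.getScheme() + " " + relative.getPath());
    }
}
```

This is exactly the situation the globber hits: glob expansion produces bare relative segments, so any object key containing a colon trips URI's scheme detection before S3A ever sees it.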