[jira] [Comment Edited] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes

2017-04-14 Thread Kazuyuki Tanimura (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969742#comment-15969742
 ] 

Kazuyuki Tanimura edited comment on HADOOP-14237 at 4/15/17 12:53 AM:
--

I will make this an independent credential provider.


was (Author: kazuyukitanimura):
I will make this an independnet credential provider.

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> ---
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
> to get the instance profile credentials, and eventually all jobs on the 
> cluster fail. Since many S3A clients (all mappers and reducers) try to get 
> the credentials at the same time, the AWS credential endpoint starts 
> responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java partially addresses this, but 
> it still does not share the credentials with other EC2 nodes / JVM processes.
> This issue prevents users from creating Hadoop clusters on EC2.
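For context, a minimal sketch of what wiring the independent credential provider mentioned above into S3A could look like. fs.s3a.aws.credentials.provider is the real S3A configuration key; com.example.ClusterSharedCredentialsProvider is a hypothetical class name used purely for illustration.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ProviderWiring {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // S3A instantiates the listed provider classes reflectively and consults
    // them in order. The class named here is hypothetical.
    conf.set("fs.s3a.aws.credentials.provider",
        "com.example.ClusterSharedCredentialsProvider");
    FileSystem fs = FileSystem.get(new Path("s3a://my-bucket/").toUri(), conf);
    System.out.println(fs.getUri());
  }
}
{code}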






[jira] [Commented] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes

2017-04-14 Thread Kazuyuki Tanimura (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969742#comment-15969742
 ] 

Kazuyuki Tanimura commented on HADOOP-14237:


I will make this an independent credential provider.

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> ---
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
> to get the instance profile credentials, and eventually all jobs on the 
> cluster fail. Since many S3A clients (all mappers and reducers) try to get 
> the credentials at the same time, the AWS credential endpoint starts 
> responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java partially addresses this, but 
> it still does not share the credentials with other EC2 nodes / JVM processes.
> This issue prevents users from creating Hadoop clusters on EC2.






[jira] [Resolved] (HADOOP-14235) S3A Path does not understand colon (:) when globbing

2017-04-01 Thread Kazuyuki Tanimura (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura resolved HADOOP-14235.

Resolution: Won't Fix

Ok, closing in favor of HADOOP-3257

> S3A Path does not understand colon (:) when globbing
> 
>
> Key: HADOOP-14235
> URL: https://issues.apache.org/jira/browse/HADOOP-14235
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Colons ":" are valid characters in S3 paths. However, the Java URI class, 
> which the Path class uses, does not allow them.
> This becomes a problem particularly when globbing S3 paths: the globber 
> treats paths containing colons as invalid and throws URISyntaxException.
> The root cause is that Globber.java is shared with all other filesystems, 
> and some of the rules for regular filesystems, such as this colon 
> restriction, do not apply to S3.
> The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061
> The good news is that I have a one-line fix and am about to send a pull 
> request.
> However, for a proper fix, we should separate the S3 globber from 
> Globber.java, as proposed in https://issues.apache.org/jira/browse/HADOOP-13371






[jira] [Resolved] (HADOOP-14239) S3A Retry Multiple S3 Key Deletion

2017-04-01 Thread Kazuyuki Tanimura (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura resolved HADOOP-14239.

Resolution: Duplicate

I don't have a good way to identify 404 failures among all the failed keys.
I think we can still retry the multi-delete on the failed keys, if there are 
several, instead of deleting them one by one. If the number of failed keys 
shrinks, the retry was the right thing to do.
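A rough sketch of that shrinking-retry idea against the aws-java-sdk (v1) bulk-delete API. The SDK types are real; the retry policy itself is only the proposal above, not something S3A ships.

{code:java}
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;

public class ShrinkingBulkDelete {
  /** Re-issue the bulk delete for only the failed keys while progress is made. */
  static void deleteAll(AmazonS3 s3, String bucket, List<KeyVersion> keys) {
    List<KeyVersion> pending = keys;
    while (!pending.isEmpty()) {
      try {
        s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(pending));
        return; // every pending key was deleted
      } catch (MultiObjectDeleteException e) {
        List<KeyVersion> failed = new ArrayList<>();
        for (MultiObjectDeleteException.DeleteError err : e.getErrors()) {
          failed.add(new KeyVersion(err.getKey()));
        }
        if (failed.size() >= pending.size()) {
          throw e; // no progress this round, so another retry is pointless
        }
        pending = failed; // fewer keys failed: the retry was worth doing
      }
    }
  }
}
{code}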

Anyway, HADOOP-11572 covers the same topic. 
I am closing this ticket as a duplicate.

> S3A Retry Multiple S3 Key Deletion
> --
>
> Key: HADOOP-14239
> URL: https://issues.apache.org/jira/browse/HADOOP-14239
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>
> When fs.s3a.multiobjectdelete.enable == true, S3A tries to delete multiple 
> S3 keys at once.
> Although this is a great feature, it becomes problematic when AWS fails to 
> delete some of the S3 keys in the deletion list. The aws-java-sdk internally 
> retries the deletion, but that does not help, because it simply retries the 
> same list of S3 keys, including the successfully deleted ones. All 
> subsequent retries then fail on the previously deleted keys, since those no 
> longer exist. Eventually it throws an exception and the whole job fails.
> Luckily, the AWS API reports which keys it failed to delete. S3A should 
> retry only the keys that failed to be deleted.






[jira] [Updated] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes

2017-03-24 Thread Kazuyuki Tanimura (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated HADOOP-14237:
---
Description: 
When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
to get the instance profile credentials, and eventually all jobs on the 
cluster fail. Since many S3A clients (all mappers and reducers) try to get the 
credentials at the same time, the AWS credential endpoint starts responding 
with 5xx and 4xx error codes.

SharedInstanceProfileCredentialsProvider.java partially addresses this, but it 
still does not share the credentials with other EC2 nodes / JVM processes.

This issue prevents users from creating Hadoop clusters on EC2.

  was:
When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
to get the instance profile credentials, and eventually all jobs on the 
cluster fail. Since many S3A clients (all mappers and reducers) try to get the 
credentials at the same time, the AWS credential endpoint starts responding 
with 5xx and 4xx error codes.

SharedInstanceProfileCredentialsProvider.java partially addresses this, but it 
still does not share the credentials with other EC2 nodes / processes.

This issue prevents users from creating Hadoop clusters on EC2.


> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> ---
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
> to get the instance profile credentials, and eventually all jobs on the 
> cluster fail. Since many S3A clients (all mappers and reducers) try to get 
> the credentials at the same time, the AWS credential endpoint starts 
> responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java partially addresses this, but 
> it still does not share the credentials with other EC2 nodes / JVM processes.
> This issue prevents users from creating Hadoop clusters on EC2.






[jira] [Comment Edited] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes

2017-03-24 Thread Kazuyuki Tanimura (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941523#comment-15941523
 ] 

Kazuyuki Tanimura edited comment on HADOOP-14237 at 3/25/17 2:29 AM:
-

True.

Just to be clear, this patch makes sure the credentials are shared among all 
Hadoop nodes, not only within a single node.
As I added more nodes to a cluster, it became too easy to hit the AWS account 
level limits.


was (Author: kazuyukitanimura):
True.

Just to be clear, this patch makes sure the credentials are shared among all 
Hadoop nodes, not only within a single node.
As I added more nodes to a cluster, it became too easy to hit the account 
level limits.

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> ---
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
> to get the instance profile credentials, and eventually all jobs on the 
> cluster fail. Since many S3A clients (all mappers and reducers) try to get 
> the credentials at the same time, the AWS credential endpoint starts 
> responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java partially addresses this, but 
> it still does not share the credentials with other EC2 nodes / processes.
> This issue prevents users from creating Hadoop clusters on EC2.






[jira] [Commented] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes

2017-03-24 Thread Kazuyuki Tanimura (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941523#comment-15941523
 ] 

Kazuyuki Tanimura commented on HADOOP-14237:


True.

Just to be clear, this patch makes sure the credentials are shared among all 
Hadoop nodes, not only within a single node.
As I added more nodes to a cluster, it became too easy to hit the account 
level limits.

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> ---
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
> to get the instance profile credentials, and eventually all jobs on the 
> cluster fail. Since many S3A clients (all mappers and reducers) try to get 
> the credentials at the same time, the AWS credential endpoint starts 
> responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java partially addresses this, but 
> it still does not share the credentials with other EC2 nodes / processes.
> This issue prevents users from creating Hadoop clusters on EC2.






[jira] [Updated] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes

2017-03-24 Thread Kazuyuki Tanimura (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated HADOOP-14237:
---
Description: 
When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
to get the instance profile credentials, and eventually all jobs on the 
cluster fail. Since many S3A clients (all mappers and reducers) try to get the 
credentials at the same time, the AWS credential endpoint starts responding 
with 5xx and 4xx error codes.

SharedInstanceProfileCredentialsProvider.java partially addresses this, but it 
still does not share the credentials with other EC2 nodes / processes.

This issue prevents users from creating Hadoop clusters on EC2.

  was:
When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
to get the instance profile credentials, and eventually all jobs on the 
cluster fail. Since many S3A clients (all mappers and reducers) try to get the 
credentials at the same time, the AWS credential endpoint starts responding 
with 5xx and 4xx error codes.

SharedInstanceProfileCredentialsProvider.java partially addresses this, but it 
still does not share the credentials with other EC2 instances / processes.

This issue prevents users from creating Hadoop clusters on EC2.


> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> ---
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
> to get the instance profile credentials, and eventually all jobs on the 
> cluster fail. Since many S3A clients (all mappers and reducers) try to get 
> the credentials at the same time, the AWS credential endpoint starts 
> responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java partially addresses this, but 
> it still does not share the credentials with other EC2 nodes / processes.
> This issue prevents users from creating Hadoop clusters on EC2.






[jira] [Updated] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes

2017-03-24 Thread Kazuyuki Tanimura (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated HADOOP-14237:
---
Summary: S3A Support Shared Instance Profile Credentials Across All Hadoop 
Nodes  (was: S3A Support Shared Instance Profile Credentials Across All 
Instances)

> S3A Support Shared Instance Profile Credentials Across All Hadoop Nodes
> ---
>
> Key: HADOOP-14237
> URL: https://issues.apache.org/jira/browse/HADOOP-14237
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>
> When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
> to get the instance profile credentials, and eventually all jobs on the 
> cluster fail. Since many S3A clients (all mappers and reducers) try to get 
> the credentials at the same time, the AWS credential endpoint starts 
> responding with 5xx and 4xx error codes.
> SharedInstanceProfileCredentialsProvider.java partially addresses this, but 
> it still does not share the credentials with other EC2 instances / processes.
> This issue prevents users from creating Hadoop clusters on EC2.






[jira] [Commented] (HADOOP-14239) S3A Retry Multiple S3 Key Deletion

2017-03-24 Thread Kazuyuki Tanimura (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941457#comment-15941457
 ] 

Kazuyuki Tanimura commented on HADOOP-14239:


To be clear, deletion is called from rename() as well, which makes this issue 
even more frequent to encounter...

> S3A Retry Multiple S3 Key Deletion
> --
>
> Key: HADOOP-14239
> URL: https://issues.apache.org/jira/browse/HADOOP-14239
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>
> When fs.s3a.multiobjectdelete.enable == true, S3A tries to delete multiple 
> S3 keys at once.
> Although this is a great feature, it becomes problematic when AWS fails to 
> delete some of the S3 keys in the deletion list. The aws-java-sdk internally 
> retries the deletion, but that does not help, because it simply retries the 
> same list of S3 keys, including the successfully deleted ones. All 
> subsequent retries then fail on the previously deleted keys, since those no 
> longer exist. Eventually it throws an exception and the whole job fails.
> Luckily, the AWS API reports which keys it failed to delete. S3A should 
> retry only the keys that failed to be deleted.






[jira] [Created] (HADOOP-14239) S3A Retry Multiple S3 Key Deletion

2017-03-24 Thread Kazuyuki Tanimura (JIRA)
Kazuyuki Tanimura created HADOOP-14239:
--

 Summary: S3A Retry Multiple S3 Key Deletion
 Key: HADOOP-14239
 URL: https://issues.apache.org/jira/browse/HADOOP-14239
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.8.0, 2.8.1
 Environment: EC2, AWS
Reporter: Kazuyuki Tanimura


When fs.s3a.multiobjectdelete.enable == true, S3A tries to delete multiple S3 
keys at once.

Although this is a great feature, it becomes problematic when AWS fails to 
delete some of the S3 keys in the deletion list. The aws-java-sdk internally 
retries the deletion, but that does not help, because it simply retries the 
same list of S3 keys, including the successfully deleted ones. All subsequent 
retries then fail on the previously deleted keys, since those no longer exist. 
Eventually it throws an exception and the whole job fails.

Luckily, the AWS API reports which keys it failed to delete. S3A should retry 
only the keys that failed to be deleted.
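For illustration, the aws-java-sdk (v1) reports exactly which keys failed through MultiObjectDeleteException. A minimal sketch, with a hypothetical wrapper method, of resubmitting only those keys once:

{code:java}
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;

public class RetryFailedKeys {
  /** Bulk delete; on partial failure, retry once with only the failed keys. */
  static void deleteKeys(AmazonS3 s3, String bucket, List<KeyVersion> keys) {
    try {
      s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(keys));
    } catch (MultiObjectDeleteException e) {
      // The exception lists the keys that failed; the successfully deleted
      // ones must not be resubmitted, or the retry will fail on them again.
      List<KeyVersion> failed = new ArrayList<>();
      for (MultiObjectDeleteException.DeleteError err : e.getErrors()) {
        failed.add(new KeyVersion(err.getKey()));
      }
      s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(failed));
    }
  }
}
{code}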






[jira] [Created] (HADOOP-14237) S3A Support Shared Instance Profile Credentials Across All Instances

2017-03-24 Thread Kazuyuki Tanimura (JIRA)
Kazuyuki Tanimura created HADOOP-14237:
--

 Summary: S3A Support Shared Instance Profile Credentials Across 
All Instances
 Key: HADOOP-14237
 URL: https://issues.apache.org/jira/browse/HADOOP-14237
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.8.0, 2.8.1
 Environment: EC2, AWS
Reporter: Kazuyuki Tanimura


When I run a large Hadoop cluster on EC2 instances with an IAM role, it fails 
to get the instance profile credentials, and eventually all jobs on the 
cluster fail. Since many S3A clients (all mappers and reducers) try to get the 
credentials at the same time, the AWS credential endpoint starts responding 
with 5xx and 4xx error codes.

SharedInstanceProfileCredentialsProvider.java partially addresses this, but it 
still does not share the credentials with other EC2 instances / processes.

This issue prevents users from creating Hadoop clusters on EC2.
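To make the proposal concrete, a hypothetical sketch (not Hadoop's actual implementation) of a provider that caches instance-profile credentials in a file visible to every JVM on the node, and to every node if the file lived on shared storage, so that only one process per expiry window hits the metadata endpoint. The cache path and TTL are invented, and a real version would need file locking and restrictive permissions.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.AWSSessionCredentials;
import com.amazonaws.auth.BasicSessionCredentials;
import com.amazonaws.auth.InstanceProfileCredentialsProvider;

public class SharedFileCredentialsProvider implements AWSCredentialsProvider {
  // Invented values for this sketch; a real provider would make them
  // configurable, lock the file, and restrict its permissions.
  private static final long TTL_MS = 10 * 60 * 1000;
  private static final Path CACHE = Paths.get("/tmp/aws-credentials.cache");

  private final InstanceProfileCredentialsProvider delegate =
      new InstanceProfileCredentialsProvider();

  @Override
  public synchronized AWSCredentials getCredentials() {
    try {
      if (Files.exists(CACHE) && System.currentTimeMillis()
          - Files.getLastModifiedTime(CACHE).toMillis() < TTL_MS) {
        // Fresh enough: serve cached credentials without touching the
        // metadata endpoint at all.
        String[] parts = new String(Files.readAllBytes(CACHE),
            StandardCharsets.UTF_8).split("\n", 3);
        return new BasicSessionCredentials(parts[0], parts[1], parts[2]);
      }
      // Stale or missing: one call to the metadata endpoint, then repopulate
      // the cache for every other JVM. Instance-profile credentials are
      // session credentials, so the cast below is assumed to hold.
      AWSSessionCredentials fresh =
          (AWSSessionCredentials) delegate.getCredentials();
      Files.write(CACHE, (fresh.getAWSAccessKeyId() + "\n"
          + fresh.getAWSSecretKey() + "\n"
          + fresh.getSessionToken()).getBytes(StandardCharsets.UTF_8));
      return fresh;
    } catch (IOException e) {
      // Cache unusable: fall back to the metadata endpoint directly.
      return delegate.getCredentials();
    }
  }

  @Override
  public void refresh() {
    delegate.refresh();
  }
}
{code}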






[jira] [Updated] (HADOOP-14235) S3A Path does not understand colon (:) when globbing

2017-03-24 Thread Kazuyuki Tanimura (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated HADOOP-14235:
---
Environment: EC2, AWS

> S3A Path does not understand colon (:) when globbing
> 
>
> Key: HADOOP-14235
> URL: https://issues.apache.org/jira/browse/HADOOP-14235
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
> Environment: EC2, AWS
>Reporter: Kazuyuki Tanimura
>
> Colons ":" are valid characters in S3 paths. However, the Java URI class, 
> which the Path class uses, does not allow them.
> This becomes a problem particularly when globbing S3 paths: the globber 
> treats paths containing colons as invalid and throws URISyntaxException.
> The root cause is that Globber.java is shared with all other filesystems, 
> and some of the rules for regular filesystems, such as this colon 
> restriction, do not apply to S3.
> The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061
> The good news is that I have a one-line fix and am about to send a pull 
> request.
> However, for a proper fix, we should separate the S3 globber from 
> Globber.java, as proposed in https://issues.apache.org/jira/browse/HADOOP-13371






[jira] [Updated] (HADOOP-14235) S3A Path does not understand colon (:) when globbing

2017-03-24 Thread Kazuyuki Tanimura (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuyuki Tanimura updated HADOOP-14235:
---
Description: 
Colons ":" are valid characters in S3 paths. However, the Java URI class, 
which the Path class uses, does not allow them.

This becomes a problem particularly when globbing S3 paths: the globber treats 
paths containing colons as invalid and throws URISyntaxException.

The root cause is that Globber.java is shared with all other filesystems, and 
some of the rules for regular filesystems, such as this colon restriction, do 
not apply to S3.

The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061

The good news is that I have a one-line fix and am about to send a pull request.

However, for a proper fix, we should separate the S3 globber from 
Globber.java, as proposed in https://issues.apache.org/jira/browse/HADOOP-13371

  was:
Colons (:) are valid characters in S3 paths. However, the Java URI class, 
which the Path class uses, does not allow them.

This becomes a problem particularly when globbing S3 paths: the globber treats 
paths containing colons as invalid and throws URISyntaxException.

The root cause is that Globber.java is shared with all other filesystems, and 
some of the rules for regular filesystems, such as this colon restriction, do 
not apply to S3.

The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061

The good news is that I have a one-line fix and am about to send a pull request.

However, for a proper fix, we should separate the S3 globber from 
Globber.java, as proposed in https://issues.apache.org/jira/browse/HADOOP-13371


> S3A Path does not understand colon (:) when globbing
> 
>
> Key: HADOOP-14235
> URL: https://issues.apache.org/jira/browse/HADOOP-14235
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2, 2.8.1
>Reporter: Kazuyuki Tanimura
>
> Colons ":" are valid characters in S3 paths. However, the Java URI class, 
> which the Path class uses, does not allow them.
> This becomes a problem particularly when globbing S3 paths: the globber 
> treats paths containing colons as invalid and throws URISyntaxException.
> The root cause is that Globber.java is shared with all other filesystems, 
> and some of the rules for regular filesystems, such as this colon 
> restriction, do not apply to S3.
> The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061
> The good news is that I have a one-line fix and am about to send a pull 
> request.
> However, for a proper fix, we should separate the S3 globber from 
> Globber.java, as proposed in https://issues.apache.org/jira/browse/HADOOP-13371






[jira] [Created] (HADOOP-14235) S3A Path does not understand colon (:) when globbing

2017-03-24 Thread Kazuyuki Tanimura (JIRA)
Kazuyuki Tanimura created HADOOP-14235:
--

 Summary: S3A Path does not understand colon (:) when globbing
 Key: HADOOP-14235
 URL: https://issues.apache.org/jira/browse/HADOOP-14235
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-alpha2, 3.0.0-alpha1, 2.8.0, 2.8.1
Reporter: Kazuyuki Tanimura


Colons (:) are valid characters in S3 paths. However, the Java URI class, 
which the Path class uses, does not allow them.

This becomes a problem particularly when globbing S3 paths: the globber treats 
paths containing colons as invalid and throws URISyntaxException.

The root cause is that Globber.java is shared with all other filesystems, and 
some of the rules for regular filesystems, such as this colon restriction, do 
not apply to S3.

The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061

The good news is that I have a one-line fix and am about to send a pull request.

However, for a proper fix, we should separate the S3 globber from 
Globber.java, as proposed in https://issues.apache.org/jira/browse/HADOOP-13371
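A small repro sketch of the failure mode, assuming only Hadoop's Path class on the classpath; the file names are made up.

{code:java}
import org.apache.hadoop.fs.Path;

public class ColonPathRepro {
  public static void main(String[] args) {
    // A fully qualified key containing a colon parses fine, because the colon
    // appears after the first slash of the path component.
    System.out.println(new Path("s3a://bucket/logs/2017-03-24T00:00:00.csv"));

    // A bare child segment, like the one the globber builds while expanding
    // s3a://bucket/logs/*, puts the colon before any slash, so java.net.URI
    // reads everything up to it as a (malformed) scheme and rejects it.
    try {
      new Path("2017-03-24T00:00:00.csv");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // wraps a URISyntaxException
    }
  }
}
{code}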


