[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-15 Thread wujinhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326830#comment-16326830
 ] 

wujinhu commented on HADOOP-15027:
--

Thanks [~Sammi] for the review. I have updated the patch. :)

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> Aliyun OSS, so we can refactor it to use multi-threaded pre-reading to 
> improve read performance.
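
For illustration, here is a minimal sketch of the pre-read idea (class and
method names are hypothetical, not the actual patch): split the range ahead of
the current position into fixed-size blocks and fetch several of them
concurrently, so that read() can be served from already-downloaded buffers.

{code}
// Illustrative sketch only; not the HADOOP-15027 patch. Names are hypothetical.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PreReadSketch {
  private static final int BLOCK_SIZE = 1 << 20;   // 1 MB per pre-read task
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  /** Placeholder for a ranged GetObject read of [start, start + len) from OSS. */
  private byte[] fetchRange(long start, int len) {
    return new byte[len];
  }

  /** Kick off the next few block fetches concurrently instead of one blocking read. */
  List<Future<byte[]>> preRead(long pos, int blocks) {
    List<Future<byte[]>> pending = new ArrayList<>();
    for (int i = 0; i < blocks; i++) {
      final long start = pos + (long) i * BLOCK_SIZE;
      pending.add(pool.submit(() -> fetchRange(start, BLOCK_SIZE)));
    }
    return pending;   // read() then drains these buffers in order
  }
}
{code}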



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-15 Thread wujinhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15027:
-
Attachment: HADOOP-15027.013.patch

> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch, HADOOP-15027.013.patch
>
>
> Currently, AliyunOSSInputStream uses single thread to read data from 
> AliyunOSS,  so we can do some refactoring by using multi-thread pre-read to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15173) Possible dead code in CombineFileInputFormat

2018-01-15 Thread Lior Regev (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326802#comment-16326802
 ] 

Lior Regev commented on HADOOP-15173:
-

I forgot to mention in the issue, but creating OneFileInfo has the side effect 
of fetching file info, which, in cases where one is accessing a remote service 
(S3, for example), might actually incur a performance hit.
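
To make the pattern concrete, here is a hypothetical simplification (not the
real CombineFileInputFormat code): the array itself is dead as a value, but
each constructor call does real work, so only the allocation can be dropped.

{code}
// Hypothetical simplification of the pattern discussed above.
class OneFileInfoSketch {
  OneFileInfoSketch(String path) {
    // Stands in for the remote metadata fetch (file status, block locations)
    // that the real constructor performs.
    System.out.println("fetching file info for " + path);
  }

  static void createSplits(String[] paths) {
    // The array is never read again, so it is dead as a value...
    OneFileInfoSketch[] files = new OneFileInfoSketch[paths.length];
    for (int i = 0; i < paths.length; i++) {
      // ...but each constructor call has a side effect, so removing the
      // array is safe only if the constructor calls themselves are kept.
      files[i] = new OneFileInfoSketch(paths[i]);
    }
  }
}
{code}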

> Possible dead code in CombineFileInputFormat
> 
>
> Key: HADOOP-15173
> URL: https://issues.apache.org/jira/browse/HADOOP-15173
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Lior Regev
>Priority: Minor
>  Labels: newbie
>
> I found that in the file:
> hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java:280
> there's a OneFileInfo[] array that is generated but never used again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15173) Possible dead code in CombineFileInputFormat

2018-01-15 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326801#comment-16326801
 ] 

Akira Ajisaka commented on HADOOP-15173:


Yes, we don't need to create the array.

> Possible dead code in CombineFileInputFormat
> 
>
> Key: HADOOP-15173
> URL: https://issues.apache.org/jira/browse/HADOOP-15173
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Lior Regev
>Priority: Minor
>  Labels: newbie
>
> I found that in the file:
> hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java:280
> there's a OneFileInfo[] array that is generated but never used again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15173) Possible dead code in CombineFileInputFormat

2018-01-15 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HADOOP-15173:
---
Priority: Minor  (was: Major)

> Possible dead code in CombineFileInputFormat
> 
>
> Key: HADOOP-15173
> URL: https://issues.apache.org/jira/browse/HADOOP-15173
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Lior Regev
>Priority: Minor
>  Labels: newbie
>
> I found that in the file:
> hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java:280
> there's a OneFileInfo[] array that is generated but never used again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15173) Possible dead code in CombineFileInputFormat

2018-01-15 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HADOOP-15173:
---
Environment: (was: I found that in the file:

hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java:280

There's a generation of OneFileInfo[] without using that array ever again.

 )
 Labels: newbie  (was: )
Description: 
I found that in the file:

hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java:280

there's a OneFileInfo[] array that is generated but never used again.

> Possible dead code in CombineFileInputFormat
> 
>
> Key: HADOOP-15173
> URL: https://issues.apache.org/jira/browse/HADOOP-15173
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Lior Regev
>Priority: Major
>  Labels: newbie
>
> I found that in the file:
> hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java:280
> there's a OneFileInfo[] array that is generated but never used again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15027) AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to Aliyun OSS performance

2018-01-15 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326764#comment-16326764
 ] 

SammiChen commented on HADOOP-15027:


Hi [~wujinhu], the performance data looks very good.  We are very close now.  

The findbugs filter is too general.  It should be as specific as possible. 
Refer to other modules like AWS to see how to specify the filter.  Also please 
make sure the filter is necessary.
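
For reference, a narrowly scoped findbugs exclude entry looks roughly like the
following; the class name and bug pattern here are placeholders, not the
actual entries from the patch.

{code}
<!-- Placeholder entry, for illustration only: scope the exclusion to one
     class and one bug pattern rather than using a blanket match. -->
<FindBugsFilter>
  <Match>
    <Class name="org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream"/>
    <Bug pattern="IS2_INCONSISTENT_SYNC"/>
  </Match>
</FindBugsFilter>
{code}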




> AliyunOSS: Support multi-thread pre-read to improve read from Hadoop to 
> Aliyun OSS performance
> --
>
> Key: HADOOP-15027
> URL: https://issues.apache.org/jira/browse/HADOOP-15027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, 
> HADOOP-15027.003.patch, HADOOP-15027.004.patch, HADOOP-15027.005.patch, 
> HADOOP-15027.006.patch, HADOOP-15027.007.patch, HADOOP-15027.008.patch, 
> HADOOP-15027.009.patch, HADOOP-15027.010.patch, HADOOP-15027.011.patch, 
> HADOOP-15027.012.patch
>
>
> Currently, AliyunOSSInputStream uses a single thread to read data from 
> Aliyun OSS, so we can refactor it to use multi-threaded pre-reading to 
> improve read performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-15 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.005.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed to upload files in parallel and asynchronously:
> - improve the performance of uploading files to the OSS server. First, this
> mechanism splits the result into multiple small blocks and uploads them in
> parallel; producing the result and uploading the blocks are then asynchronous.
> - avoid buffering a very large result on local disk. To cite an extreme
> example, if a task outputs 100GB or even more, we may need to write this
> 100GB to local disk and then upload it, which is inefficient and limited by
> disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference
> between the previous {{AliyunOSSOutputStream}} and
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local
> disk before we can upload it to OSS. This poses two problems:
> - if the output file is too large, it can run out of local disk space.
> - if the output file is too large, the task will wait a long time to upload
> the result to OSS before it finishes, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks,
> i.e. small local files, and each block is packaged into an upload task.
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads
> the blocks in parallel, greatly improving performance.
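
For illustration, a minimal sketch of the block-based idea (names are
hypothetical; the actual patch builds on the OSS multi-part APIs and
{{SemaphoredDelegatingExecutor}}): write() fills a local block buffer, and
each full block becomes an upload task handed to an executor while the writer
keeps producing output.

{code}
// Illustrative sketch of block-based asynchronous upload; not the
// HADOOP-14999 patch itself.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockUploadSketch {
  private static final int BLOCK_SIZE = 8 << 20;   // 8 MB local block
  private final ExecutorService executor = Executors.newFixedThreadPool(4);
  private final List<Future<String>> parts = new ArrayList<>();
  private final byte[] block = new byte[BLOCK_SIZE];
  private int used = 0;
  private int partNumber = 0;

  /** Buffer bytes locally; ship each full block off as an async upload task. */
  void write(byte[] data, int off, int len) {
    while (len > 0) {
      int n = Math.min(len, BLOCK_SIZE - used);
      System.arraycopy(data, off, block, used, n);
      used += n;
      off += n;
      len -= n;
      if (used == BLOCK_SIZE) {
        submitBlock();   // the upload runs while we keep writing
      }
    }
  }

  private void submitBlock() {
    final byte[] data = Arrays.copyOf(block, used);   // snapshot written bytes
    final int part = ++partNumber;
    parts.add(executor.submit(() -> uploadPart(part, data)));
    used = 0;
  }

  /** Placeholder for an OSS UploadPart call; returns a fake part ETag. */
  private String uploadPart(int part, byte[] data) {
    return "etag-" + part + "-" + data.length;
  }

  /** Flush the tail block, wait for all parts, then complete the upload. */
  void close() throws Exception {
    if (used > 0) {
      submitBlock();
    }
    for (Future<String> f : parts) {
      f.get();   // placeholder for waiting + CompleteMultipartUpload
    }
    executor.shutdown();
  }
}
{code}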



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-01-15 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-14999:
---
Attachment: HADOOP-14999.004.patch

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed to upload files in parallel and asynchronously:
> - improve the performance of uploading files to the OSS server. First, this
> mechanism splits the result into multiple small blocks and uploads them in
> parallel; producing the result and uploading the blocks are then asynchronous.
> - avoid buffering a very large result on local disk. To cite an extreme
> example, if a task outputs 100GB or even more, we may need to write this
> 100GB to local disk and then upload it, which is inefficient and limited by
> disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service
> and depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference
> between the previous {{AliyunOSSOutputStream}} and
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local
> disk before we can upload it to OSS. This poses two problems:
> - if the output file is too large, it can run out of local disk space.
> - if the output file is too large, the task will wait a long time to upload
> the result to OSS before it finishes, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks,
> i.e. small local files, and each block is packaged into an upload task.
> These tasks are submitted to {{SemaphoredDelegatingExecutor}}, which uploads
> the blocks in parallel, greatly improving performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13134) WASB's file delete still throwing Blob not found exception

2018-01-15 Thread Rohan Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326413#comment-16326413
 ] 

Rohan Garg commented on HADOOP-13134:
-

[~ste...@apache.org]: one of the cases where I discovered a similar bug (the
parent blob for a key suddenly went missing during a rename operation) was with
a Spark 2.1 query. It was an 'insert into' query. Spark's code calls the
'deleteOnExit' method on the staging directory
([https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala#L118]).
But from what I saw, Spark's code doesn't keep a reference to the filesystem
object on which it registered the staging directory's deleteOnExit. Also, we
were not using filesystem caching in Hadoop. So, whenever the Azure FS object
on which deleteOnExit was called got GCed, the finalize method of the FS object
was called, which in turn calls FileSystem#close(). The filesystem's close
method ended up deleting the staging directory. This posed a problem, as we
were doing renames in the main thread, which then failed, leading to query
failure.
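
A minimal sketch of that failure mode (the URI and staging path are
placeholders; it assumes, per the description above, an FS implementation that
closes itself when finalized, with FS caching disabled):

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class DeleteOnExitHazard {
  static void demo() throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.wasb.impl.disable.cache", true);  // no shared, cached instance
    FileSystem fs = FileSystem.get(
        new URI("wasb://container@account.blob.core.windows.net/"), conf);
    fs.deleteOnExit(new Path("/staging"));  // registered on *this* FS object only
    fs = null;                              // no live reference is kept anywhere...
    System.gc();                            // ...so the FS may be finalized, closing it
                                            // and deleting /staging while renames under
                                            // it are still running in the main thread
  }
}
{code}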

> WASB's file delete still throwing Blob not found exception
> --
>
> Key: HADOOP-13134
> URL: https://issues.apache.org/jira/browse/HADOOP-13134
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.7.1
>Reporter: Lin Chan
>Assignee: Dushyanth
>Priority: Major
>
> WASB is still throwing a blob-not-found exception, as shown in the following 
> stack. Need to catch that and convert it to a boolean return code in WASB delete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15150) in FsShell, UGI params should be overidden through env vars(-D arg)

2018-01-15 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326333#comment-16326333
 ] 

Vinayakumar B commented on HADOOP-15150:


*+1*, using the configuration with overlay values *(keys set using -D args)* 
to initialize the UGI will be really helpful.

> in FsShell, UGI params should be overidden through env vars(-D arg)
> ---
>
> Key: HADOOP-15150
> URL: https://issues.apache.org/jira/browse/HADOOP-15150
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15150.patch
>
>
> org.apache.hadoop.security.UserGroupInformation#ensureInitialized will always
> get the configuration from the configuration files, *so -D args will not
> take effect*.
> {code}
>   private static void ensureInitialized() {
> if (conf == null) {
>   synchronized(UserGroupInformation.class) {
> if (conf == null) { // someone might have beat us
>   initialize(new Configuration(), false);
> }
>   }
> }
>   }
> {code}
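
For illustration, a minimal sketch of the proposed direction (illustrative
flow, not the attached patch): hand the tool's configuration, with its -D
overlays, to UGI before anything triggers ensureInitialized().

{code}
// Illustrative sketch, not the attached patch: push the tool's Configuration
// (which already holds the -D overlays parsed by GenericOptionsParser) into
// UGI before the first UGI call falls back to ensureInitialized().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.util.GenericOptionsParser;

class UgiOverlaySketch {
  static void init(String[] args) throws Exception {
    Configuration conf = new GenericOptionsParser(new Configuration(), args)
        .getConfiguration();                      // -D key=value overlays land here
    UserGroupInformation.setConfiguration(conf);  // UGI now initializes from them
  }
}
{code}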



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14507) extend per-bucket secret key config with explicit getPassword() on fs.s3a.$bucket.secret,key

2018-01-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326204#comment-16326204
 ] 

Steve Loughran commented on HADOOP-14507:
-

 

Patch 006; rebase to trunk + some minor cleanup after review.

test: s3 ireland + s3guard + auth

> extend per-bucket secret key config with explicit getPassword() on 
> fs.s3a.$bucket.secret,key
> 
>
> Key: HADOOP-14507
> URL: https://issues.apache.org/jira/browse/HADOOP-14507
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14507-001.patch, HADOOP-14507-002.patch, 
> HADOOP-14507-003.patch, HADOOP-14507-004.patch, HADOOP-14507-005.patch, 
> HADOOP-14507-006.patch
>
>
> Per-bucket jceks support turns out to be complex, as you have to manage
> multiple jceks files & configure the client to ask for the right one. This is
> because we're calling {{Configuration.getPassword("fs.s3a.secret.key")}}.
> If, before that, we do a check for the explicit id, key, and session key in
> the properties {{fs.s3a.$bucket.secret}} (&c), we could have a single JCEKS
> file with all the secrets for the different buckets. You would only need to
> explicitly point the base config at the secrets file, and the right
> credentials would be picked up, if set.
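
For illustration, a minimal sketch of that lookup order (the helper is
hypothetical, not the patch itself): try the per-bucket property through
getPassword() first, then fall back to the base key.

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

class PerBucketSecrets {
  /** Per-bucket value wins; otherwise fall back to the base fs.s3a key. */
  static char[] lookupSecret(Configuration conf, String bucket)
      throws IOException {
    char[] secret = conf.getPassword("fs.s3a." + bucket + ".secret.key");
    if (secret == null) {
      secret = conf.getPassword("fs.s3a.secret.key");  // base/default key
    }
    return secret;
  }
}
{code}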



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14507) extend per-bucket secret key config with explicit getPassword() on fs.s3a.$bucket.secret,key

2018-01-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14507:

Attachment: HADOOP-14507-006.patch

> extend per-bucket secret key config with explicit getPassword() on 
> fs.s3a.$bucket.secret,key
> 
>
> Key: HADOOP-14507
> URL: https://issues.apache.org/jira/browse/HADOOP-14507
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14507-001.patch, HADOOP-14507-002.patch, 
> HADOOP-14507-003.patch, HADOOP-14507-004.patch, HADOOP-14507-005.patch, 
> HADOOP-14507-006.patch
>
>
> Per-bucket jceks support turns out to be complex, as you have to manage
> multiple jceks files & configure the client to ask for the right one. This is
> because we're calling {{Configuration.getPassword("fs.s3a.secret.key")}}.
> If, before that, we do a check for the explicit id, key, and session key in
> the properties {{fs.s3a.$bucket.secret}} (&c), we could have a single JCEKS
> file with all the secrets for the different buckets. You would only need to
> explicitly point the base config at the secrets file, and the right
> credentials would be picked up, if set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14507) extend per-bucket secret key config with explicit getPassword() on fs.s3a.$bucket.secret,key

2018-01-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14507:

Status: Open  (was: Patch Available)

> extend per-bucket secret key config with explicit getPassword() on 
> fs.s3a.$bucket.secret,key
> 
>
> Key: HADOOP-14507
> URL: https://issues.apache.org/jira/browse/HADOOP-14507
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14507-001.patch, HADOOP-14507-002.patch, 
> HADOOP-14507-003.patch, HADOOP-14507-004.patch, HADOOP-14507-005.patch, 
> HADOOP-14507-006.patch
>
>
> Per-bucket jceks support turns out to be complex, as you have to manage
> multiple jceks files & configure the client to ask for the right one. This is
> because we're calling {{Configuration.getPassword("fs.s3a.secret.key")}}.
> If, before that, we do a check for the explicit id, key, and session key in
> the properties {{fs.s3a.$bucket.secret}} (&c), we could have a single JCEKS
> file with all the secrets for the different buckets. You would only need to
> explicitly point the base config at the secrets file, and the right
> credentials would be picked up, if set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch

2018-01-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326167#comment-16326167
 ] 

Hudson commented on HADOOP-15079:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13498 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13498/])
HADOOP-15079. ITestS3AFileOperationCost#testFakeDirectoryDeletion (stevel: rev 
a0c71dcc33ca7c5539d0ab61c4a276c4f39e5744)
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Retries.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFileOperationCost.java


> ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after 
> OutputCommitter patch
> ---
>
> Key: HADOOP-15079
> URL: https://issues.apache.org/jira/browse/HADOOP-15079
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch, 
> HADOOP-15079-003.patch
>
>
> I see this test failing with "object_delete_requests expected:<1> but 
> was:<2>". I printed stack traces whenever this metric was incremented, and 
> found the root cause to be that innerMkdirs is now causing two calls to 
> delete fake directories when it previously caused only one. It is called once 
> inside createFakeDirectory, and once directly inside innerMkdirs later:
> {code}
> at 
> org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2599)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1498)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$createEmptyObject$11(S3AFileSystem.java:2684)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:108)
> at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:259)
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:255)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:230)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:2682)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.createFakeDirectory(S3AFileSystem.java:2657)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2021)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305)
> at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
> at 
> org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> 

[jira] [Updated] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch

2018-01-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15079:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

> ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after 
> OutputCommitter patch
> ---
>
> Key: HADOOP-15079
> URL: https://issues.apache.org/jira/browse/HADOOP-15079
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch, 
> HADOOP-15079-003.patch
>
>
> I see this test failing with "object_delete_requests expected:<1> but 
> was:<2>". I printed stack traces whenever this metric was incremented, and 
> found the root cause to be that innerMkdirs is now causing two calls to 
> delete fake directories when it previously caused only one. It is called once 
> inside createFakeDirectory, and once directly inside innerMkdirs later:
> {code}
> at 
> org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2599)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1498)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$createEmptyObject$11(S3AFileSystem.java:2684)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:108)
> at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:259)
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:255)
> at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:230)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:2682)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.createFakeDirectory(S3AFileSystem.java:2657)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2021)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305)
> at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
> at 
> org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74
> {code}
> {code}
> at 
> org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
> at 
> org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
> at 
> 

[jira] [Created] (HADOOP-15174) ASF License warning in hadoop-mapreduce-client

2018-01-15 Thread Takanobu Asanuma (JIRA)
Takanobu Asanuma created HADOOP-15174:
-

 Summary: ASF License warning in hadoop-mapreduce-client
 Key: HADOOP-15174
 URL: https://issues.apache.org/jira/browse/HADOOP-15174
 Project: Hadoop Common
  Issue Type: Test
  Components: test
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


It occurred in MAPREDUCE-7021 and MAPREDUCE-7034.

{noformat}

Lines that start with ? in the ASF License report indicate files that do 
not have an Apache license header: !? 
/testptch/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/jobTokenPassword

{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15173) Possible dead code in CombineFileInputFormat

2018-01-15 Thread Lior Regev (JIRA)
Lior Regev created HADOOP-15173:
---

 Summary: Possible dead code in CombineFileInputFormat
 Key: HADOOP-15173
 URL: https://issues.apache.org/jira/browse/HADOOP-15173
 Project: Hadoop Common
  Issue Type: Bug
 Environment: I found that in the file:

hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java:280

There's a generation of OneFileInfo[] without using that array ever again.

 
Reporter: Lior Regev






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org