[jira] [Commented] (HADOOP-15385) Many tests are failing in hadoop-distcp project in branch-2.8

2018-04-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449234#comment-16449234
 ] 

genericqa commented on HADOOP-15385:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
12s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}423m 39s{color} 
| {color:red} hadoop-distcp in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
27s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}438m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestDistCpSystem |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:f667ef1 |
| JIRA Issue | HADOOP-15385 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12920074/HADOOP-15385-branch-2.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d0c3fd34b579 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 99e82e2 |
| maven | version: Apache Maven 3.3.9 
(bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_171 |
| unit | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14517/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14517/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14517/artifact/out/patch-asflicense-problems.txt
 |
| Max. process+thread count | 241 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14517/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Many tests are failing in hadoop-distcp project in branch-2.8
> -
>
> Key: HADOOP-15385
> URL: https://issues.apache.org/jira/browse/HADOOP-15385
> Project: Hadoop Common
>

[jira] [Assigned] (HADOOP-7714) Umbrella for usage of native calls to manage OS cache and readahead

2018-04-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HADOOP-7714:
---

Assignee: (was: Todd Lipcon)

> Umbrella for usage of native calls to manage OS cache and readahead
> ---
>
> Key: HADOOP-7714
> URL: https://issues.apache.org/jira/browse/HADOOP-7714
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io, native, performance
>Affects Versions: 0.24.0
>Reporter: Todd Lipcon
>Priority: Major
> Attachments: 7714-fallocate-20s.patch, graphs.pdf, hadoop-7714-2.txt, 
> hadoop-7714-20s-prelim.txt
>
>
> Especially in shared HBase/MR situations, management of the OS buffer cache 
> is important. Currently, running a big MR job will evict all of HBase's hot 
> data from cache, causing HBase performance to really suffer. However, caching 
> of the MR input/output is rarely useful, since the datasets tend to be larger 
> than cache and not re-read often enough that the cache is used. Having access 
> to the native calls {{posix_fadvise}} and {{sync_data_range}} on platforms 
> where they are supported would allow us to do a better job of managing this 
> cache.
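As a rough illustration of the idea (a sketch, not the attached patches): a JNI binding can expose {{posix_fadvise}} to Java so a stream can drop cached pages behind a sequential read. The names below are illustrative; Hadoop's real hooks live in {{NativeIO}}.

{code:java}
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.IOException;

/**
 * Sketch only: a JNI wrapper exposing posix_fadvise(2) to Java. The native
 * side (not shown) would resolve the raw fd and forward to the libc call.
 */
public class FadviseSketch {
  /** Mirrors POSIX_FADV_DONTNEED from fcntl.h (4 on Linux). */
  public static final int POSIX_FADV_DONTNEED = 4;

  /** Implemented in C via JNI (assumed, not shown). */
  public static native void posixFadvise(
      FileDescriptor fd, long offset, long len, int advice) throws IOException;

  /** Drop cached pages for a file we have finished streaming through. */
  public static void dropBehind(FileInputStream in) throws IOException {
    posixFadvise(in.getFD(), 0, 0, POSIX_FADV_DONTNEED); // len 0 = to EOF
  }
}
{code}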






[jira] [Resolved] (HADOOP-9545) Improve logging in ActiveStandbyElector

2018-04-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-9545.
-
Resolution: Won't Fix

> Improve logging in ActiveStandbyElector
> ---
>
> Key: HADOOP-9545
> URL: https://issues.apache.org/jira/browse/HADOOP-9545
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: auto-failover, ha
>Affects Versions: 2.1.0-beta
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
>
> The ActiveStandbyElector currently logs a lot of stuff at DEBUG level which 
> would be useful for troubleshooting. We've seen one instance in the wild of a 
> ZKFC thinking it should be in standby state when in fact it won the election, 
> but the logging is insufficient to understand why. I'd like to bump most of 
> the existing DEBUG logs to INFO and add some additional logs as well.
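A minimal illustration of the kind of change proposed (not the actual ActiveStandbyElector code):

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/** Sketch only: promote election-decision logging from DEBUG to INFO. */
public class ElectorLoggingSketch {
  private static final Log LOG = LogFactory.getLog(ElectorLoggingSketch.class);

  void enterState(String state, String reason) {
    // Previously LOG.debug(...): invisible at default levels, so a ZKFC that
    // unexpectedly went standby left no trail to explain the decision.
    LOG.info("Entering " + state + " state; reason: " + reason);
  }
}
{code}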






[jira] [Resolved] (HADOOP-10859) Native implementation of java Checksum interface

2018-04-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HADOOP-10859.
--
Resolution: Won't Fix

No plans to work on this.

> Native implementation of java Checksum interface
> 
>
> Key: HADOOP-10859
> URL: https://issues.apache.org/jira/browse/HADOOP-10859
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
>
> Some parts of our code such as IFileInputStream/IFileOutputStream use the 
> java Checksum interface to calculate/verify checksums. Currently we don't 
> have a native implementation of these. For CRC32C in particular, we can get a 
> very big speedup with a native implementation.
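Although this is resolved as Won't Fix, JDK 9 later added {{java.util.zip.CRC32C}}, an intrinsified implementation behind the same {{Checksum}} interface, which provides much of the speedup this issue was after. A minimal usage sketch (assumes JDK 9+):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;   // JDK 9+, intrinsified on CPUs with CRC instructions
import java.util.zip.Checksum;

public class Crc32cSketch {
  public static void main(String[] args) {
    byte[] data = "hello, hadoop".getBytes(StandardCharsets.UTF_8);
    Checksum crc = new CRC32C();   // any Checksum implementation plugs in here
    crc.update(data, 0, data.length);
    System.out.printf("crc32c=%08x%n", crc.getValue());
  }
}
{code}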






[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449145#comment-16449145
 ] 

Esfandiar Manii commented on HADOOP-15407:
--

My bad, the order of the diff was incorrect. Updated with the correct one. :)

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement for WASB. WASB is not 
> deprecated but is in pure maintenance mode, and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads, by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend. This avoids the need for temporary/intermediate files, 
> which increase cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures, including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> JUnit tests provided with the driver can run either sequentially or in 
> parallel, to reduce testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used, but not as the default file system.) 
> Various customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects the version of the code 
> tested and used in our production environment.{color}
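For readers new to the driver, a minimal sketch of addressing ABFS through the standard FileSystem API (the filesystem and account names are placeholders, and the hadoop-azure dependency and credential configuration are omitted):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AbfsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder filesystem/account; auth settings must be present in conf.
    URI root = URI.create("abfs://myfs@myaccount.dfs.core.windows.net/");
    try (FileSystem fs = FileSystem.get(root, conf);
         FSDataOutputStream out = fs.create(new Path("/demo/hello.txt"))) {
      out.writeUTF("hello from abfs");
    }
  }
}
{code}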






[jira] [Updated] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esfandiar Manii updated HADOOP-15407:
-
Attachment: HADOOP-15407-002.patch

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement for WASB. WASB is not 
> deprecated but is in pure maintenance mode, and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads, by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend. This avoids the need for temporary/intermediate files, 
> which increase cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures, including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> JUnit tests provided with the driver can run either sequentially or in 
> parallel, to reduce testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used, but not as the default file system.) 
> Various customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects the version of the code 
> tested and used in our production environment.{color}






[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449125#comment-16449125
 ] 

Devaraj Das commented on HADOOP-15407:
--

[~esmanii], the patch seems to have been generated incorrectly. I'd expect this 
jira to add a lot of new code, but the patch does otherwise :)

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement for WASB. WASB is not 
> deprecated but is in pure maintenance mode, and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads, by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend. This avoids the need for temporary/intermediate files, 
> which increase cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures, including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> JUnit tests provided with the driver can run either sequentially or in 
> parallel, to reduce testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used, but not as the default file system.) 
> Various customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects the version of the code 
> tested and used in our production environment.{color}






[jira] [Comment Edited] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449125#comment-16449125
 ] 

Devaraj Das edited comment on HADOOP-15407 at 4/24/18 12:58 AM:


[~esmanii], the patch seems to have been generated incorrectly. I'd expect this 
jira to add a lot of new code, but the patch does otherwise :)


was (Author: devaraj):
[~esmanii], the patch seems to have been generated incorrectly. I'd expect this 
jira is adding lot of new code, but the patch does otherwise :)

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement for WASB. WASB is not 
> deprecated but is in pure maintenance mode, and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) for Big 
> Data and Analytics workloads, by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp-up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend. This avoids the need for temporary/intermediate files, 
> which increase cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures, including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> JUnit tests provided with the driver can run either sequentially or in 
> parallel, to reduce testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used, but not as the default file system.) 
> Various customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects the version of the code 
> tested and used in our production environment.{color}






[jira] [Updated] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens

2018-04-23 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated HADOOP-15390:
---
Fix Version/s: 2.8.4

> Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
> -
>
> Key: HADOOP-15390
> URL: https://issues.apache.org/jira/browse/HADOOP-15390
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch
>
>
> When looking at a recent issue with [~rkanter] and [~yufeigu], we found that 
> the RM log in a cluster was flooded by KMS token renewal errors below:
> {noformat}
> $ tail -9 hadoop-cmf-yarn-RESOURCEMANAGER.log
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> ...
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> {noformat}
> Further inspection shows the KMS IP is from another cluster. The RM predates 
> HADOOP-14445, so it needs to read the provider from config. The config 
> rightfully doesn't have the other cluster's KMS configured.
> Although HADOOP-14445 will make this a non-issue by creating the provider 
> from the token service, we should fix 2 things here:
> - The KMS token renewer should throw instead of returning 0. Returning 0 when 
> not able to renew should be considered a bug in the renewer.
> - Yarn RM's {{DelegationTokenRenewer}} service should validate the return 
> value and not go into this busy loop.
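A minimal sketch of the first fix, assuming the renewer can detect the missing provider up front ({{lookupProvider}} and {{doRenew}} are hypothetical helpers, not the actual KMS code):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenRenewer;

/** Sketch only: throw instead of returning 0 when renewal is impossible. */
public abstract class FailFastRenewerSketch extends TokenRenewer {
  @Override
  public long renew(Token<?> token, Configuration conf)
      throws IOException, InterruptedException {
    Object provider = lookupProvider(conf);   // hypothetical helper
    if (provider == null) {
      // Throwing surfaces the error once in the caller; returning 0 makes the
      // RM's DelegationTokenRenewer treat the token as instantly expired and
      // re-queue it forever, producing the log flood above.
      throw new IOException("Cannot renew " + token.getKind()
          + ": no key provider configured");
    }
    return doRenew(token, conf);              // hypothetical helper
  }

  protected abstract Object lookupProvider(Configuration conf);

  protected abstract long doRenew(Token<?> token, Configuration conf)
      throws IOException;
}
{code}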






[jira] [Commented] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens

2018-04-23 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449044#comment-16449044
 ] 

Xiao Chen commented on HADOOP-15390:


Thanks a lot Robert!
Cherry-picked to branch-2.8 too. There was a trivial conflict in 
TestDelegationTokenRenewer because some tests were not in branch-2.8. Manually 
resolved it and {{mvn test}}'ed that class locally before pushing.

> Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
> -
>
> Key: HADOOP-15390
> URL: https://issues.apache.org/jira/browse/HADOOP-15390
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch
>
>
> When looking at a recent issue with [~rkanter] and [~yufeigu], we found that 
> the RM log in a cluster was flooded by KMS token renewal errors below:
> {noformat}
> $ tail -9 hadoop-cmf-yarn-RESOURCEMANAGER.log
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> ...
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> {noformat}
> Further inspection shows the KMS IP is from another cluster. The RM predates 
> HADOOP-14445, so it needs to read the provider from config. The config 
> rightfully doesn't have the other cluster's KMS configured.
> Although HADOOP-14445 will make this a non-issue by creating the provider 
> from the token service, we should fix 2 things here:
> - The KMS token renewer should throw instead of returning 0. Returning 0 when 
> not able to renew should be considered a bug in the renewer.
> - Yarn RM's {{DelegationTokenRenewer}} service should validate the return 
> value and not go into this busy loop.






[jira] [Updated] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens

2018-04-23 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated HADOOP-15390:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.3
   2.9.2
   3.1.1
   3.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~xiaochen].  Committed to trunk, branch-3.1, branch-3.0, and branch-2.9!

> Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
> -
>
> Key: HADOOP-15390
> URL: https://issues.apache.org/jira/browse/HADOOP-15390
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch
>
>
> When looking at a recent issue with [~rkanter] and [~yufeigu], we found that 
> the RM log in a cluster was flooded by KMS token renewal errors below:
> {noformat}
> $ tail -9 hadoop-cmf-yarn-RESOURCEMANAGER.log
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> ...
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> {noformat}
> Further inspection shows the KMS IP is from another cluster. The RM predates 
> HADOOP-14445, so it needs to read the provider from config. The config 
> rightfully doesn't have the other cluster's KMS configured.
> Although HADOOP-14445 will make this a non-issue by creating the provider 
> from the token service, we should fix 2 things here:
> - The KMS token renewer should throw instead of returning 0. Returning 0 when 
> not able to renew should be considered a bug in the renewer.
> - Yarn RM's {{DelegationTokenRenewer}} service should validate the return 
> value and not go into this busy loop.






[jira] [Commented] (HADOOP-12427) [JDK8] Upgrade Mockito version to 1.10.19

2018-04-23 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449028#comment-16449028
 ] 

Giovanni Matteo Fumarola commented on HADOOP-12427:
---

[~jlowe], there were 2 unit tests that failed with the new version:

* TestEditLogRace.testSaveRightBeforeSync
* TestResourceLocalizationService.testFailedDirsResourceRelease

This was a couple of years ago, so I am not sure if they are still failing with 
the upgrade.

> [JDK8] Upgrade Mockito version to 1.10.19
> -
>
> Key: HADOOP-12427
> URL: https://issues.apache.org/jira/browse/HADOOP-12427
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: HADOOP-12427.v0.patch
>
>
> The current version is 1.8.5 - inserted in 2011.
> JDK 8 has been supported since 1.10.0. 
> https://github.com/mockito/mockito/blob/master/doc/release-notes/official.md
> "Compatible with JDK8 with exception of defender methods, JDK8 support will 
> improve in 2.0"
> http://mockito.org/
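The attached patch is essentially a version bump in the hadoop-project pom; roughly (a sketch; the exact artifactId and scope in the Hadoop pom may differ):

{code:xml}
<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-all</artifactId>
  <version>1.10.19</version>
  <scope>test</scope>
</dependency>
{code}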






[jira] [Commented] (HADOOP-12427) [JDK8] Upgrade Mockito version to 1.10.19

2018-04-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448999#comment-16448999
 ] 

genericqa commented on HADOOP-12427:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
35m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
11s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 55s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | HADOOP-12427 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12761772/HADOOP-12427.v0.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  |
| uname | Linux 1ebf9a67d26c 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 42e82f0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14518/testReport/ |
| Max. process+thread count | 334 (vs. ulimit of 1) |
| modules | C: hadoop-project U: hadoop-project |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14518/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [JDK8] Upgrade Mockito version to 1.10.19
> -
>
> Key: HADOOP-12427
> URL: https://issues.apache.org/jira/browse/HADOOP-12427
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: HADOOP-12427.v0.patch
>
>
> The current version 

[jira] [Commented] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens

2018-04-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448989#comment-16448989
 ] 

Hudson commented on HADOOP-15390:
-

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14051 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14051/])
HADOOP-15390. Yarn RM logs flooded by DelegationTokenRenewer trying to 
(rkanter: rev 7ab08a9c37a76edbe02d556fcfb2e637f45afc21)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSTokenRenewer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


> Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
> -
>
> Key: HADOOP-15390
> URL: https://issues.apache.org/jira/browse/HADOOP-15390
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch
>
>
> When looking at a recent issue with [~rkanter] and [~yufeigu], we found that 
> the RM log in a cluster was flooded by KMS token renewal errors below:
> {noformat}
> $ tail -9 hadoop-cmf-yarn-RESOURCEMANAGER.log
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> ...
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> {noformat}
> Further inspection shows the KMS IP is from another cluster. The RM predates 
> HADOOP-14445, so it needs to read the provider from config. The config 
> rightfully doesn't have the other cluster's KMS configured.
> Although HADOOP-14445 will make this a non-issue by creating the provider 
> from the token service, we should fix 2 things here:
> - The KMS token renewer should throw instead of returning 0. Returning 0 when 
> not able to renew should be considered a bug in the renewer.
> - Yarn RM's {{DelegationTokenRenewer}} service should validate the return 
> value and not go into this busy loop.






[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448976#comment-16448976
 ] 

Esfandiar Manii commented on HADOOP-15407:
--

{code:java}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCreate
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.924 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCreate
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.623 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Running 
org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.731 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2EScale
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.169 
s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2EScale
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAppend
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.202 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAppend
[INFO] Running 
org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.805 s 
- in org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
27.916 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[INFO] Running 
org.apache.hadoop.fs.azurebfs.services.TestConfigurationServiceFieldsValidation
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.258 s 
- in 
org.apache.hadoop.fs.azurebfs.services.TestConfigurationServiceFieldsValidation
[INFO] Running org.apache.hadoop.fs.azurebfs.services.ITestAbfsHttpServiceImpl
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.977 s 
- in org.apache.hadoop.fs.azurebfs.services.ITestAbfsHttpServiceImpl
[INFO] Running 
org.apache.hadoop.fs.azurebfs.services.TestParameterizedLoggingServiceImpl
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.283 s 
- in org.apache.hadoop.fs.azurebfs.services.TestParameterizedLoggingServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.TestLoggingServiceImpl
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.253 s 
- in org.apache.hadoop.fs.azurebfs.services.TestLoggingServiceImpl
[INFO] Running 
org.apache.hadoop.fs.azurebfs.services.TestNetworkThroughputAnalysisServiceImpl
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.87 s 
- in 
org.apache.hadoop.fs.azurebfs.services.TestNetworkThroughputAnalysisServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.ITestReadWriteAndSeek
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 244.85 s 
- in org.apache.hadoop.fs.azurebfs.services.ITestReadWriteAndSeek
[INFO] Running 
org.apache.hadoop.fs.azurebfs.services.TestAbfsStatisticsServiceImpl
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.195 s 
- in org.apache.hadoop.fs.azurebfs.services.TestAbfsStatisticsServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.ITestTracingServiceImpl
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.893 s 
- in org.apache.hadoop.fs.azurebfs.services.ITestTracingServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.utils.TestUriUtils
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 s 
- in org.apache.hadoop.fs.azurebfs.utils.TestUriUtils
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestWasbAbfsCompatibility
[WARNING] Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
11.948 s - in org.apache.hadoop.fs.azurebfs.ITestWasbAbfsCompatibility
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.894 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDistCp
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 0.834 
s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDistCp
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAzureBlobFileSystemContract
[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.694 
s - in 

[jira] [Updated] (HADOOP-14756) S3Guard: expose capability query in MetadataStore and add tests of authoritative mode

2018-04-23 Thread Aaron Fabbri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14756:
--
   Resolution: Fixed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk after fixing the remaining checkstyle issue.  Thank you for 
the contribution [~gabor.bota].

> S3Guard: expose capability query in MetadataStore and add tests of 
> authoritative mode
> -
>
> Key: HADOOP-14756
> URL: https://issues.apache.org/jira/browse/HADOOP-14756
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-14756.001.patch, HADOOP-14756.002.patch, 
> HADOOP-14756.003.patch
>
>
> {{MetadataStoreTestBase.testListChildren}} would be improved with the ability 
> to query the features offered by the store, and the outcome of {{put()}}, so 
> as to probe the correctness of authoritative mode:
> # Add predicate to MetadataStore interface  
> {{supportsAuthoritativeDirectories()}} or similar
> # If #1 is true, assert that directory is fully cached after changes
> # Add "isNew" flag to MetadataStore.put(DirListingMetadata); use to verify 
> when changes are made
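A minimal sketch of the capability probe in item #1; the names are hypothetical (the committed patch exposes constants through a new MetadataStoreCapabilities class, per the Hudson commit message later in this thread):

{code:java}
import java.util.Map;

/** Sketch only: let tests ask a store whether it persists authoritative bits. */
public interface CapabilityAwareMetadataStore {
  /** Hypothetical capability key for authoritative directory listings. */
  String PERSISTS_AUTHORITATIVE_BIT = "metadatastore.persists.authoritative.bit";

  /** Capabilities advertised by a concrete store implementation. */
  Map<String, Boolean> getCapabilities();

  default boolean supportsAuthoritativeDirectories() {
    return Boolean.TRUE.equals(getCapabilities().get(PERSISTS_AUTHORITATIVE_BIT));
  }
}
{code}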






[jira] [Commented] (HADOOP-14756) S3Guard: expose capability query in MetadataStore and add tests of authoritative mode

2018-04-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448973#comment-16448973
 ] 

Hudson commented on HADOOP-14756:
-

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14050 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14050/])
HADOOP-14756 S3Guard: expose capability query in MetadataStore and add (fabbri: 
rev 989a3929a92edb000cfa486146987fb75a9eda61)
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DynamoDBMetadataStore.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreCapabilities.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LocalMetadataStore.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java


> S3Guard: expose capability query in MetadataStore and add tests of 
> authoritative mode
> -
>
> Key: HADOOP-14756
> URL: https://issues.apache.org/jira/browse/HADOOP-14756
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-14756.001.patch, HADOOP-14756.002.patch, 
> HADOOP-14756.003.patch
>
>
> {{MetadataStoreTestBase.testListChildren}} would be improved with the ability 
> to query the features offered by the store, and the outcome of {{put()}}, so 
> as to probe the correctness of authoritative mode:
> # Add predicate to MetadataStore interface  
> {{supportsAuthoritativeDirectories()}} or similar
> # If #1 is true, assert that directory is fully cached after changes
> # Add "isNew" flag to MetadataStore.put(DirListingMetadata); use to verify 
> when changes are made






[jira] [Updated] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esfandiar Manii updated HADOOP-15407:
-
Attachment: HADOOP-15407-001.patch

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}o This avoids the need for using temporary/intermediate files, 
> increasing the cost (and framework complexity around committing 
> jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
> {color:#212121}The current version reflects the version of the code tested 
> and used in our production environment.{color}
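For readers new to the scheme, a minimal usage sketch (the container and 
account names below are placeholders):

{code:java}
// Read a file through the new scheme once the driver is on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AbfsQuickCheck {
  public static void main(String[] args) throws Exception {
    Path p = new Path(
        "abfss://mycontainer@myaccount.dfs.core.windows.net/data/sample.txt");
    FileSystem fs = p.getFileSystem(new Configuration()); // resolves to ABFS
    System.out.println(fs.getFileStatus(p));
  }
}
{code}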



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esfandiar Manii updated HADOOP-15407:
-
Description: 
*{color:#212121}Description{color}*
 This JIRA adds a new file system implementation, ABFS, for running Big Data 
and Analytics workloads against Azure Storage. This is a complete rewrite of 
the previous WASB driver with a heavy focus on optimizing both performance and 
cost.
 {color:#212121} {color}
 *{color:#212121}High level design{color}*
 At a high level, the code here extends the FileSystem class to provide an 
implementation for accessing blobs in Azure Storage. The scheme abfs is used 
for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
URI scheme is used to address individual paths:
 {color:#212121} {color}
 
{color:#212121}abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>{color}
 {color:#212121} {color}
 {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
deprecated but is in pure maintenance mode and customers should upgrade to ABFS 
once it hits General Availability later in CY18.{color}
 {color:#212121}Benefits of ABFS include:{color}
 {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
Data and Analytics workloads by allowing higher limits on storage 
accounts{color}
 {color:#212121}· Removing any ramp up time with Storage backend 
partitioning; blocks are now automatically sharded across partitions in the 
Storage backend{color}
{color:#212121}o This avoids the need for using temporary/intermediate files, 
increasing the cost (and framework complexity around committing 
jobs/tasks){color}
 {color:#212121}· Enabling much higher read and write throughput on 
single files (tens of Gbps by default){color}
 {color:#212121}· Still retaining all of the Azure Blob features 
customers are familiar with and expect, and gaining the benefits of future Blob 
features as well{color}
 {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the file 
system throughput and operations. Ambari metrics are not currently implemented 
for ABFS, but will be available soon.{color}
 {color:#212121} {color}
 *{color:#212121}Credits and history{color}*
 Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
{color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, 
Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, and 
James Baker. {color}
 {color:#212121} {color}
 *Test*
 ABFS has gone through many test procedures including Hadoop file system 
contract tests, unit testing, functional testing, and manual testing. All the 
Junit tests provided with the driver are capable of running in both 
sequential/parallel fashion in order to reduce the testing time.
 {color:#212121}Besides unit tests, we have used ABFS as the default file 
system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
storage option. (HDFS is also used but not as default file system.) Various 
different customer and test workloads have been run against clusters with such 
configurations for quite some time. Benchmarks such as Tera*, TPC-DS, Spark 
Streaming and Spark SQL, and others have been run to do scenario, performance, 
and functional testing. Third parties and customers have also done various 
testing of ABFS.{color}
 {color:#212121}The current version reflects the version of the code tested 
and used in our production environment.{color}

  was:
{color:#212121}Description{color}
 This JIRA adds a new file system implementation, ABFS, for running Big Data 
and Analytics workloads against Azure Storage. This is a complete rewrite of 
the previous WASB driver with a heavy focus on optimizing both performance and 
cost.
 {color:#212121} {color}
 {color:#212121}High level design{color}
 At a high level, the code here extends the FileSystem class to provide an 
implementation for accessing blobs in Azure Storage. The scheme abfs is used 
for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
URI scheme is used to address individual paths:
 {color:#212121} {color}
 
{color:#212121}abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>{color}
 {color:#212121} {color}
 {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
deprecated but is in pure maintenance mode and customers should upgrade to ABFS 
once it hits General Availability later in CY18.{color}
 {color:#212121}Benefits of ABFS include:{color}
 {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
Data and Analytics workloads by allowing higher limits on storage 
accounts{color}
 {color:#212121}· Removing any ramp up time with Storage backend 
partitioning; blocks are now automatically sharded across partitions in the 
Storage backend{color}
 {color:#212121}o This avoids the need for using temporary/intermediate 
files, increasing 

[jira] [Updated] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esfandiar Manii updated HADOOP-15407:
-
Description: 
{color:#212121}Description{color}
 This JIRA adds a new file system implementation, ABFS, for running Big Data 
and Analytics workloads against Azure Storage. This is a complete rewrite of 
the previous WASB driver with a heavy focus on optimizing both performance and 
cost.
 {color:#212121} {color}
 {color:#212121}High level design{color}
 At a high level, the code here extends the FileSystem class to provide an 
implementation for accessing blobs in Azure Storage. The scheme abfs is used 
for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
URI scheme is used to address individual paths:
 {color:#212121} {color}
 
{color:#212121}abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>{color}
 {color:#212121} {color}
 {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
deprecated but is in pure maintenance mode and customers should upgrade to ABFS 
once it hits General Availability later in CY18.{color}
 {color:#212121}Benefits of ABFS include:{color}
 {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
Data and Analytics workloads by allowing higher limits on storage 
accounts{color}
 {color:#212121}· Removing any ramp up time with Storage backend 
partitioning; blocks are now automatically sharded across partitions in the 
Storage backend{color}
 {color:#212121}o This avoids the need for using temporary/intermediate 
files, increasing the cost (and framework complexity around committing 
jobs/tasks){color}
 {color:#212121}· Enabling much higher read and write throughput on 
single files (tens of Gbps by default){color}
 {color:#212121}· Still retaining all of the Azure Blob features 
customers are familiar with and expect, and gaining the benefits of future Blob 
features as well{color}
 {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the file 
system throughput and operations. Ambari metrics are not currently implemented 
for ABFS, but will be available soon.{color}
 {color:#212121} {color}
 {color:#212121}Credits and history{color}
 Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
{color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, 
Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, and 
James Baker. {color}
 {color:#212121} {color}
 Test
 ABFS has gone through many test procedures including Hadoop file system 
contract tests, unit testing, functional testing, and manual testing. All the 
Junit tests provided with the driver are capable of running in both 
sequential/parallel fashion in order to reduce the testing time.
 {color:#212121}Besides unit tests, we have used ABFS as the default file 
system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
storage option. (HDFS is also used but not as default file system.) Various 
different customer and test workloads have been run against clusters with such 
configurations for quite some time. Benchmarks such as Tera*, TPC-DS, Spark 
Streaming and Spark SQL, and others have been run to do scenario, performance, 
and functional testing. Third parties and customers have also done various 
testing of ABFS.{color}
 {color:#212121}The current version reflects the version of the code tested 
and used in our production environment.{color}

  was:
{color:#33}Description{color}
This JIRA adds a new file system implementation, ABFS, for running Big Data and 
Analytics workloads against Azure Storage. This is a complete rewrite of the 
previous WASB driver with a heavy focus on optimizing both performance and cost.
{color:#33}High level design{color}
At a high level, the code here extends the FileSystem class to provide an 
implementation for accessing blobs in Azure Storage. The scheme abfs is used 
for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
URI scheme is used to address individual paths:
abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>
{color:#33} {color}
ABFS is intended as a replacement to WASB. WASB is not deprecated but is in 
pure maintenance mode and customers should upgrade to ABFS once it hits General 
Availability later in CY18.
Benefits of ABFS include: * Higher scale (capacity, throughput, and IOPS) Big 
Data and Analytics workloads by allowing higher limits on storage accounts
 * Removing any ramp up time with Storage backend partitioning; blocks are now 
automatically sharded across partitions in the Storage backend
 ** This avoids the need for using temporary/intermediate files, increasing the 
cost (and framework complexity around committing jobs/tasks)

 * Enabling much higher read and write throughput on single files (tens of Gbps 
by default)
 * Still retaining all of the Azure Blob features customers are 

[jira] [Updated] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Esfandiar Manii updated HADOOP-15407:
-
Description: 
{color:#33}Description{color}
This JIRA adds a new file system implementation, ABFS, for running Big Data and 
Analytics workloads against Azure Storage. This is a complete rewrite of the 
previous WASB driver with a heavy focus on optimizing both performance and cost.
{color:#33}High level design{color}
At a high level, the code here extends the FileSystem class to provide an 
implementation for accessing blobs in Azure Storage. The scheme abfs is used 
for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
URI scheme is used to address individual paths:
abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>
{color:#33} {color}
ABFS is intended as a replacement to WASB. WASB is not deprecated but is in 
pure maintenance mode and customers should upgrade to ABFS once it hits General 
Availability later in CY18.
Benefits of ABFS include: * Higher scale (capacity, throughput, and IOPS) Big 
Data and Analytics workloads by allowing higher limits on storage accounts
 * Removing any ramp up time with Storage backend partitioning; blocks are now 
automatically sharded across partitions in the Storage backend
 ** This avoids the need for using temporary/intermediate files, increasing the 
cost (and framework complexity around committing jobs/tasks)

 * Enabling much higher read and write throughput on single files (tens of Gbps 
by default)
 * Still retaining all of the Azure Blob features customers are familiar with 
and expect, and gaining the benefits of future Blob features as well

ABFS incorporates Hadoop Filesystem metrics to monitor the file system 
throughput and operations. Ambari metrics are not currently implemented for 
ABFS, but will be available soon.
 
{color:#33}Credits and history{color}
Credit for this work goes to .
{color:#33}Test{color}
ABFS has gone through many test procedures including Hadoop file system 
contract tests, unit testing, functional testing, and manual testing. All the 
Junit tests provided with the driver are capable of running in both 
sequential/parallel fashion in order to reduce the testing time.
Besides unit tests, we have used ABFS as the default file system in Azure 
HDInsight. Azure HDInsight will very soon offer ABFS as a storage option. (HDFS 
is also used but not as default file system.) Various different customer and 
test workloads have been run against clusters with such configurations for 
quite some time. Benchmarks such as Tera*, TPC-DS, Spark Streaming and Spark 
SQL, and others have been run to do scenario, performance, and functional 
testing. Third parties and customers have also done various testing of ABFS.
The current version reflects the version of the code tested and used in our 
production environment.

  was:
{color:#212121}{color:#33}Description{color}{color}
{color:#212121}This JIRA adds a new file system implementation, ABFS, for 
running Big Data and Analytics workloads against Azure Storage. This is a 
complete rewrite of the previous WASB driver with a heavy focus on optimizing 
both performance and cost.{color}
{color:#212121} {color}
{color:#212121}{color:#33}High level design{color}{color}
{color:#212121}At a high level, the code here extends the FileSystem class to 
provide an implementation for accessing blobs in Azure Storage. The scheme abfs 
is used for accessing it over HTTP, and abfss for accessing over HTTPS. The 
following URI scheme is used to address individual paths:{color}
{color:#212121} {color}
{color:#212121}abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>{color}
{color:#212121} {color}
{color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
deprecated but is in pure maintenance mode and customers should upgrade to ABFS 
once it hits General Availability later in CY18.{color}
{color:#212121}Benefits of ABFS include:{color}
{color:#212121}· Higher scale (capacity, throughput, and IOPS) Big Data 
and Analytics workloads by allowing higher limits on storage accounts{color}
{color:#212121}· Removing any ramp up time with Storage backend 
partitioning; blocks are now automatically sharded across partitions in the 
Storage backend{color}
{color:#212121}o This avoids the need for using temporary/intermediate 
files, increasing the cost (and framework complexity around committing 
jobs/tasks){color}
{color:#212121}· Enabling much higher read and write throughput on 
single files (tens of Gbps by default){color}
{color:#212121}· Still retaining all of the Azure Blob features 
customers are familiar with and expect, and gaining the benefits of future Blob 
features as well{color}
{color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the file 
system throughput and operations. Ambari metrics are not currently implemented 
for 

[jira] [Created] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)
Esfandiar Manii created HADOOP-15407:


 Summary: Support Windows Azure Storage - Blob file system in Hadoop
 Key: HADOOP-15407
 URL: https://issues.apache.org/jira/browse/HADOOP-15407
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs/azure
Affects Versions: 3.2.0
Reporter: Esfandiar Manii
Assignee: Esfandiar Manii


{color:#212121}{color:#33}Description{color}{color}
{color:#212121}This JIRA adds a new file system implementation, ABFS, for 
running Big Data and Analytics workloads against Azure Storage. This is a 
complete rewrite of the previous WASB driver with a heavy focus on optimizing 
both performance and cost.{color}
{color:#212121} {color}
{color:#212121}{color:#33}High level design{color}{color}
{color:#212121}At a high level, the code here extends the FileSystem class to 
provide an implementation for accessing blobs in Azure Storage. The scheme abfs 
is used for accessing it over HTTP, and abfss for accessing over HTTPS. The 
following URI scheme is used to address individual paths:{color}
{color:#212121} {color}
{color:#212121}abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>{color}
{color:#212121} {color}
{color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
deprecated but is in pure maintenance mode and customers should upgrade to ABFS 
once it hits General Availability later in CY18.{color}
{color:#212121}Benefits of ABFS include:{color}
{color:#212121}· Higher scale (capacity, throughput, and IOPS) Big Data 
and Analytics workloads by allowing higher limits on storage accounts{color}
{color:#212121}· Removing any ramp up time with Storage backend 
partitioning; blocks are now automatically sharded across partitions in the 
Storage backend{color}
{color:#212121}o This avoids the need for using temporary/intermediate 
files, increasing the cost (and framework complexity around committing 
jobs/tasks){color}
{color:#212121}· Enabling much higher read and write throughput on 
single files (tens of Gbps by default){color}
{color:#212121}· Still retaining all of the Azure Blob features 
customers are familiar with and expect, and gaining the benefits of future Blob 
features as well{color}
{color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the file 
system throughput and operations. Ambari metrics are not currently implemented 
for ABFS, but will be available soon.{color}
{color:#212121} {color}
{color:#212121}{color:#33}Credits and history{color}{color}
{color:#212121}Credit for this work goes to (hope I don't forget anyone): Shane 
Mainali, {color}{color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, 
Esfandiar Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, 
Saurabh Pant, and James Baker. {color}
{color:#212121}{color:#33} {color}{color}
{color:#212121}{color:#33}Test{color}{color}
{color:#212121}ABFS has gone through many test procedures including Hadoop file 
system contract tests, unit testing, functional testing, and manual testing. 
All the Junit tests provided with the driver are capable of running in both 
sequential/parallel fashion in order to reduce the testing time.{color}
{color:#212121}Besides unit tests, we have used ABFS as the default file system 
in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a storage 
option. (HDFS is also used but not as default file system.) Various different 
customer and test workloads have been run against clusters with such 
configurations for quite some time. Benchmarks such as Tera*, TPC-DS, Spark 
Streaming and Spark SQL, and others have been run to do scenario, performance, 
and functional testing. Third parties and customers have also done various 
testing of ABFS.{color}
{color:#212121}The current version reflects the version of the code tested 
and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens

2018-04-23 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448945#comment-16448945
 ] 

Robert Kanter commented on HADOOP-15390:


+1 LGTM

> Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
> -
>
> Key: HADOOP-15390
> URL: https://issues.apache.org/jira/browse/HADOOP-15390
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch
>
>
> When looking at a recent issue with [~rkanter] and [~yufeigu], we found that 
> the RM log in a cluster was flooded by KMS token renewal errors below:
> {noformat}
> $ tail -9 hadoop-cmf-yarn-RESOURCEMANAGER.log
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> ...
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> {noformat}
> Further inspection shows the KMS IP is from another cluster. The RM predates 
> HADOOP-14445, so it needs to read the key provider from config, and the 
> config rightfully doesn't have the other cluster's KMS configured.
> Although HADOOP-14445 will make this a non-issue by creating the provider 
> from the token service, we should fix two things here:
> - The KMS token renewer should throw instead of returning 0; returning 0 when 
> unable to renew should be considered a bug in the renewer.
> - The YARN RM's {{DelegationTokenRenewer}} service should validate the return 
> value and not go into this busy loop.
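A rough sketch of the first point (illustrative; the helper methods are 
assumptions, not the actual patch):

{code:java}
// "Throw instead of return 0" in the KMS token renewer, so the RM cannot
// treat a failed renewal as an instantly-expired token and busy-loop on it.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenRenewer;

abstract class KmsTokenRenewerSketch extends TokenRenewer {
  @Override
  public long renew(Token<?> token, Configuration conf) throws IOException {
    KeyProvider keyProvider = createKeyProvider(token, conf); // assumed helper
    if (keyProvider == null) {
      // Previously: warn and return 0, which the RM treated as an
      // already-expired token and retried in a busy loop.
      throw new IOException("Cannot renew " + token.getKind()
          + " token: no KMS KeyProvider configured for " + token.getService());
    }
    return doRenew(keyProvider, token); // assumed helper
  }

  abstract KeyProvider createKeyProvider(Token<?> token, Configuration conf);
  abstract long doRenew(KeyProvider provider, Token<?> token)
      throws IOException;
}
{code}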



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-12427) [JDK8] Upgrade Mockito version to 1.10.19

2018-04-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448931#comment-16448931
 ] 

Jason Lowe commented on HADOOP-12427:
-

Ran across this as part of analyzing HADOOP-15398.  Was there anything that 
kept this from going in?  HADOOP-15398 proposes the same fix for its transient 
compile issue -- upgrading from 1.8.5 to 1.10.19.

> [JDK8] Upgrade Mockito version to 1.10.19
> -
>
> Key: HADOOP-12427
> URL: https://issues.apache.org/jira/browse/HADOOP-12427
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Minor
> Attachments: HADOOP-12427.v0.patch
>
>
> The current version is 1.8.5, added in 2011.
> JDK 8 has been supported since 1.10.0. 
> https://github.com/mockito/mockito/blob/master/doc/release-notes/official.md
> "Compatible with JDK8 with exception of defender methods, JDK8 support will 
> improve in 2.0"
> http://mockito.org/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-15398) StagingTestBase uses methods not available in Mockito 1.8.5

2018-04-23 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned HADOOP-15398:
---

Assignee: Mohammad Arshad
 Summary: StagingTestBase uses methods not available in Mockito 1.8.5  
(was: Compilation error in trunk in hadoop-aws )

Thanks for the patch!

+1 lgtm.  I'll commit this tomorrow if there are no objections.

> StagingTestBase uses methods not available in Mockito 1.8.5
> ---
>
> Key: HADOOP-15398
> URL: https://issues.apache.org/jira/browse/HADOOP-15398
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
> Attachments: HADOOP-15398.001.patch
>
>
> *Problem:* Hadoop trunk compilation is failing.
>  *Root Cause:*
>  The compilation error comes from 
> {{org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase}}: "The method 
> getArgumentAt(int, Class) is undefined for the type InvocationOnMock".
> StagingTestBase uses the getArgumentAt(int, Class) method, which is not 
> available in mockito-all 1.8.5; it is available only from version 2.0.0-beta.
> *Expectations:*
>  Either the mockito-all version should be upgraded, or the test case should 
> be written using only the methods available in 1.8.5.
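For reference, a sketch of the 1.8.5-compatible pattern (the generic answer 
class here is illustrative, not necessarily what StagingTestBase needs):

{code:java}
// getArgumentAt(int, Class) exists only from Mockito 2.0.0-beta onwards;
// InvocationOnMock.getArguments() exists in mockito-all 1.8.5, so indexing
// and casting by hand works on both versions.
import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;

class FirstArgAnswer<T> implements Answer<T> {
  @Override
  @SuppressWarnings("unchecked")
  public T answer(InvocationOnMock invocation) {
    // instead of: invocation.getArgumentAt(0, SomeRequest.class)
    return (T) invocation.getArguments()[0];
  }
}
{code}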



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15372) Race conditions and possible leaks in the Shell class

2018-04-23 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448892#comment-16448892
 ] 

Eric Badger commented on HADOOP-15372:
--

[~miklos.szeg...@cloudera.com], [~jlowe], I uploaded a patch that addresses the 
issues stated above. I'm not thrilled with how I had to rework the code to make 
the try block begin right after the process is started, but it works. 
Basically, because the {{final}} {{errReader}} variable is declared inside the 
{{try}} block but is also needed in the {{finally}} block, I had to make a copy 
of it outside the {{try}} block to bridge the gap. 
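Roughly this shape (an illustrative sketch of the pattern described above, not 
the actual patch):

{code:java}
// Everything after Process.start() sits inside a try whose finally destroys
// the process; the reader reference is hoisted so finally can close it.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class ShellSketch {
  void run(ProcessBuilder builder) throws IOException {
    Process process = builder.start();
    BufferedReader errReaderCopy = null;
    try {
      final BufferedReader errReader = new BufferedReader(
          new InputStreamReader(process.getErrorStream(),
              Charset.defaultCharset()));
      errReaderCopy = errReader;
      // ... schedule the timeout task, drain stdout/stderr, waitFor() ...
    } finally {
      if (errReaderCopy != null) {
        errReaderCopy.close();
      }
      process.destroy();
    }
  }
}
{code}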

> Race conditions and possible leaks in the Shell class
> -
>
> Key: HADOOP-15372
> URL: https://issues.apache.org/jira/browse/HADOOP-15372
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.2.0
>Reporter: Miklos Szegedi
>Assignee: Eric Badger
>Priority: Minor
> Attachments: HADOOP-15372.001.patch
>
>
> YARN-5641 introduced some cleanup code in the Shell class. It has a race 
> condition. {{Shell.runCommand()}} can be called while/after 
> {{Shell.getAllShells()}} returned all the shells to be cleaned up. This new 
> thread can avoid the clean up, so that the process held by it can be leaked, 
> causing leaked localized files/etc.
> I see another issue as well. {{Shell.runCommand()}} has a finally block with 
> a {{process.destroy();}} to clean up. However, the try catch block does not 
> cover all instructions after the process is started, so for example we can 
> exit the thread and leak the process if 
> {{timeOutTimer.schedule(timeoutTimerTask, timeOutInterval);}} causes an 
> exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15406) hadoop-nfs dependencies for mockito and junit are not test scope

2018-04-23 Thread Jason Lowe (JIRA)
Jason Lowe created HADOOP-15406:
---

 Summary: hadoop-nfs dependencies for mockito and junit are not 
test scope
 Key: HADOOP-15406
 URL: https://issues.apache.org/jira/browse/HADOOP-15406
 Project: Hadoop Common
  Issue Type: Bug
  Components: nfs
Reporter: Jason Lowe


hadoop-nfs asks for mockito-all and junit for its unit tests but does not mark 
these dependencies as being required only for tests.
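The likely fix is a one-line scope element on each dependency in 
hadoop-nfs/pom.xml, along these lines (sketch; version management elided):

{code:xml}
<!-- Confine the test-only dependencies to test scope so they stay off the
     compile/runtime classpath of downstream consumers of hadoop-nfs. -->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-all</artifactId>
  <scope>test</scope>
</dependency>
{code}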



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15372) Race conditions and possible leaks in the Shell class

2018-04-23 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HADOOP-15372:
-
Attachment: HADOOP-15372.001.patch

> Race conditions and possible leaks in the Shell class
> -
>
> Key: HADOOP-15372
> URL: https://issues.apache.org/jira/browse/HADOOP-15372
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.2.0
>Reporter: Miklos Szegedi
>Assignee: Eric Badger
>Priority: Minor
> Attachments: HADOOP-15372.001.patch
>
>
> YARN-5641 introduced some cleanup code in the Shell class. It has a race 
> condition. {{Shell.runCommand()}} can be called while/after 
> {{Shell.getAllShells()}} returned all the shells to be cleaned up. This new 
> thread can avoid the clean up, so that the process held by it can be leaked, 
> causing leaked localized files/etc.
> I see another issue as well. {{Shell.runCommand()}} has a finally block with 
> a {{process.destroy();}} to clean up. However, the try catch block does not 
> cover all instructions after the process is started, so for example we can 
> exit the thread and leak the process if 
> {{timeOutTimer.schedule(timeoutTimerTask, timeOutInterval);}} causes an 
> exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448602#comment-16448602
 ] 

Jason Lowe commented on HADOOP-15403:
-

bq. would a change in config be ok?

A change in the default value for a config is arguably the same thing as a code 
change that changes the default behavior from the perspective of a user.

To be clear I'm not saying we can't ever change the default behavior, but we 
need to be careful about the ramifications.  If we do, it needs to be marked as 
an incompatible change and have a corresponding release note that clearly 
explains the potential for silent data loss relative to the old behavior and 
what users can do to restore the old behavior.

Given the behavior for non-recursive has been this way for quite a long time, 
either users aren't running into this very often or they've set the value to 
recursive.  That leads me to suggest adding the ability to ignore directories 
but _not_ making it the default.  Then we don't have a backward 
incompatibility, and the Hive case you're pursuing can still work once the 
config is updated (or Hive can run the job with that setting automatically if 
it makes sense for that use case).


> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: HADOOP-15403
> URL: https://issues.apache.org/jira/browse/HADOOP-15403
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.
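One possible shape of a fix, sketched against the quoted getSplits() loop (the 
{{ignoreDirs}} flag is hypothetical, and per the discussion above it should 
probably be opt-in rather than the default):

{code:java}
// Hypothetical reconciliation of the two quoted fragments: when recursion
// is off, optionally skip directories instead of failing.
long totalSize = 0;
for (FileStatus file : files) {
  if (file.isDirectory()) {
    if (!ignoreDirs) {          // hypothetical config-driven flag
      throw new IOException("Not a file: " + file.getPath());
    }
    continue;                   // opt-in: silently skip the directory
  }
  totalSize += file.getLen();
}
{code}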



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13327) Add OutputStream + Syncable to the Filesystem Specification

2018-04-23 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13327:

Status: Open  (was: Patch Available)

> Add OutputStream + Syncable to the Filesystem Specification
> ---
>
> Key: HADOOP-13327
> URL: https://issues.apache.org/jira/browse/HADOOP-13327
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13327-002.patch, HADOOP-13327-branch-2-001.patch
>
>
> Write down what a Filesystem output stream should do. While the core API is 
> defined in Java, that doesn't say what's expected about visibility, 
> durability, etc., and the Hadoop Syncable interface is entirely ours to 
> define.
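For context, a sketch of the Syncable calls whose semantics the spec needs to 
pin down ({{fs}}, {{bytes}}, and the path are placeholders):

{code:java}
// The Syncable surface the specification has to define.
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

FSDataOutputStream out = fs.create(new Path("/logs/app.log"));
out.write(bytes);
out.hflush(); // flushed: visible to new readers; durability unspecified
out.hsync();  // synced: durable in the store, per whatever the spec decides
out.close();
{code}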



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448566#comment-16448566
 ] 

Sergey Shelukhin commented on HADOOP-15403:
---

[~jlowe] would a change in config be ok? I think it is better to add another 
config, but we can also make the existing one "true, false, -file not found- 
ignore", where ignore will have the new behavior. False can still work for 
people if they override listFiles.

[~ste...@apache.org] I will fix both, along with the other concerns, once we 
decide on those.

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: HADOOP-15403
> URL: https://issues.apache.org/jira/browse/HADOOP-15403
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15405) adl:// use Configuration.getPassword() to look up fs.adl.oauth2.refresh.url

2018-04-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448397#comment-16448397
 ] 

Steve Loughran commented on HADOOP-15405:
-

stack
{code}
Caused by: java.io.IOException: Password fs.adl.oauth2.refresh.url not found
at 
org.apache.hadoop.fs.adl.AdlFileSystem.getPasswordString(AdlFileSystem.java:990)
at 
org.apache.hadoop.fs.adl.AdlFileSystem.getConfCredentialBasedTokenProvider(AdlFileSystem.java:291)
at 
org.apache.hadoop.fs.adl.AdlFileSystem.getAccessTokenProvider(AdlFileSystem.java:269)
at org.apache.hadoop.fs.adl.AdlFileSystem.initialize(AdlFileSystem.java:175)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:178)
... 14 more
{code}

> adl:// use Configuration.getPassword() to look up fs.adl.oauth2.refresh.url
> ---
>
> Key: HADOOP-15405
> URL: https://issues.apache.org/jira/browse/HADOOP-15405
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
>
> the adl connector uses {{Configuration.getPassword()}} to look up the 
> {{fs.adl.oauth2.refresh.url}} value, and reports it as an unknown password on 
> failure.
> It should be using getTrimmed() to get a trimmed string instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15405) adl:// use Configuration.getPassword() to look up fs.adl.oauth2.refresh.url

2018-04-23 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15405:
---

 Summary: adl:// use Configuration.getPassword() to look up 
fs.adl.oauth2.refresh.url
 Key: HADOOP-15405
 URL: https://issues.apache.org/jira/browse/HADOOP-15405
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/adl
Affects Versions: 3.1.0
Reporter: Steve Loughran


the adl connector uses {{Configuration.getPassword()}} to look up the 
{{fs.adl.oauth2.refresh.url}} value, and reports it as an unknown password on 
failure.

It should be using getTrimmed() to get a trimmed string instead.
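A sketch of the intended behavior (illustrative, not an actual patch):

{code:java}
// The refresh URL is a plain config value, not a secret: read it with
// getTrimmed() and report a missing property rather than a missing password.
String refreshUrl = conf.getTrimmed("fs.adl.oauth2.refresh.url");
if (refreshUrl == null || refreshUrl.isEmpty()) {
  throw new IOException(
      "Missing configuration property fs.adl.oauth2.refresh.url");
}
{code}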



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-14764) Über-jira adl:// Azure Data Lake Phase II: Performance, Resilience and Testing

2018-04-23 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-14764:
---

Assignee: (was: John Zhuge)

> Über-jira adl:// Azure Data Lake Phase II: Performance, Resilience and Testing
> --
>
> Key: HADOOP-14764
> URL: https://issues.apache.org/jira/browse/HADOOP-14764
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Affects Versions: 3.0.0
>Reporter: John Zhuge
>Priority: Major
>
> Uber-JIRA for adl:// phase II
> * Split out integration tests
> * Parallel test execution
> * More metrics
> * Performance optimizations
> * Performance tuning docs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448264#comment-16448264
 ] 

Jason Lowe commented on HADOOP-15403:
-

Does this have backward compatibility ramifications?  The default for 
mapreduce.input.fileinputformat.input.dir.recursive is false, so unless users 
changed it, jobs are failing today if the input contains directories.  If we 
change the behavior to ignore directories, that could lead to silent data loss 
if a job tried to consume an input location that now suddenly contains some 
directories.

In short: is it OK to assume the users will be aware of and agree with the new 
behavior?  Is there any way for users to revert to the old behavior if they do 
not want any inputs to be silently ignored?

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: HADOOP-15403
> URL: https://issues.apache.org/jira/browse/HADOOP-15403
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11766) Generic token authentication support for Hadoop

2018-04-23 Thread Jiajia Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447960#comment-16447960
 ] 

Jiajia Li commented on HADOOP-11766:


[~danilreddy] I'd like to introduce HAS (Hadoop Authentication Service) to 
you. It is a solution to support authentication across the open source big 
data ecosystem; you can implement the plugin interface in HAS to integrate 
your custom OAuth service. Please see 
https://github.com/apache/directory-kerby/tree/has-project/has for details.

> Generic token authentication support for Hadoop
> ---
>
> Key: HADOOP-11766
> URL: https://issues.apache.org/jira/browse/HADOOP-11766
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>Priority: Major
> Attachments: HADOOP-11766-V1.patch
>
>
> As a major goal of the Rhino project, we proposed the *TokenAuth* effort in 
> HADOOP-9392, which aims to provide a common token authentication framework 
> that integrates multiple authentication mechanisms by adding a new 
> {{AuthenticationMethod}} in lieu of {{KERBEROS}} and {{SIMPLE}}. To minimize 
> the required changes and risk, we thought of another approach to achieve the 
> general goals based on Kerberos, since Kerberos itself supports a 
> pre-authentication framework in both spec and implementation; this was 
> discussed in HADOOP-10959 as *TokenPreauth*. For both approaches, we have 
> built workable prototypes covering both the command line console and the 
> Hadoop web UI. 
> As HADOOP-9392 is rather lengthy and heavy, and HADOOP-10959 is mostly 
> focused on the concrete implementation approach based on Kerberos, we open 
> this issue for more general and updated discussion of the requirements, use 
> cases, and concerns around generic token authentication support for Hadoop. 
> We distinguish this token from existing Hadoop tokens in that the token 
> discussed here is primarily for the initial and primary authentication. We 
> will refine our existing code in HADOOP-9392 and HADOOP-10959 and break it 
> down into smaller patches based on the latest trunk. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15398) Compilation error in trunk in hadoop-aws

2018-04-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447946#comment-16447946
 ] 

genericqa commented on HADOOP-15398:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
37m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
12s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | HADOOP-15398 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12920260/HADOOP-15398.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  |
| uname | Linux 9075eb37ef23 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 83e5f25 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14516/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 1) |
| modules | C: hadoop-project U: hadoop-project |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14516/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Compilation error in trunk in hadoop-aws 
> -
>
> Key: HADOOP-15398
> URL: https://issues.apache.org/jira/browse/HADOOP-15398
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
> Attachments: HADOOP-15398.001.patch
>
>
> *Problem:* hadoop trunk compilation is failing
>  *Root Cause:*
>  compilation error is coming from 
> 

[jira] [Commented] (HADOOP-15392) S3A Metrics in S3AInstrumentation Cause Memory Leaks

2018-04-23 Thread Voyta (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447932#comment-16447932
 ] 

Voyta commented on HADOOP-15392:


[~ste...@apache.org]: We collect no metrics at all. Isn't it initialized by 
HBase ExportSnapshot code that we use and that actually uses this class?

> S3A Metrics in S3AInstrumentation Cause Memory Leaks
> 
>
> Key: HADOOP-15392
> URL: https://issues.apache.org/jira/browse/HADOOP-15392
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Voyta
>Priority: Major
>
> While using the HBase S3A Export Snapshot utility, we started to experience 
> memory leaks in the process after a version upgrade.
> By running code analysis we traced the cause to revision 
> 6555af81a26b0b72ec3bee7034e01f5bd84b1564, which added the following static 
> reference (singleton):
> {{private static MetricsSystem metricsSystem = null;}}
> When an application uses an S3AFileSystem instance that is not closed 
> immediately, metrics accumulate in this instance and memory grows without 
> any limit.
>  
> Expectation:
>  * It would be nice to have an option to disable metrics completely, as this 
> is not needed for the Export Snapshot utility.
>  * Usage of S3AFileSystem should not involve any static object that can grow 
> indefinitely.
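
A minimal sketch of the pattern the description above calls out, with 
illustrative names rather than the actual S3AInstrumentation code: each new 
filesystem instance registers per-instance state with a process-wide static 
object, and nothing ever unregisters it.

{noformat}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a static registry that accumulates per-instance state,
// analogous to the static MetricsSystem reference named above.
class LeakyInstrumentation {
  private static final List<Object> METRIC_SOURCES = new ArrayList<>();

  LeakyInstrumentation() {
    // Every new "filesystem" adds a source to the static registry...
    METRIC_SOURCES.add(new Object() { /* per-instance metrics */ });
    // ...but nothing removes it when the instance is discarded, so repeated
    // instantiation (e.g. one instance per ExportSnapshot run) grows the
    // list, and the heap, without bound.
  }
}
{noformat}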



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15392) S3A Metrics in S3AInstrumentation Cause Memory Leaks

2018-04-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447924#comment-16447924
 ] 

Steve Loughran commented on HADOOP-15392:
-

[~Krizek]: what metrics are you collecting here? Is stuff being stored in 
memory? That could be why you are seeing it and others are not.



> S3A Metrics in S3AInstrumentation Cause Memory Leaks
> 
>
> Key: HADOOP-15392
> URL: https://issues.apache.org/jira/browse/HADOOP-15392
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Voyta
>Priority: Major
>
> While using the HBase S3A Export Snapshot utility, we started to experience 
> memory leaks in the process after a version upgrade.
> By running code analysis we traced the cause to revision 
> 6555af81a26b0b72ec3bee7034e01f5bd84b1564, which added the following static 
> reference (singleton):
> {{private static MetricsSystem metricsSystem = null;}}
> When an application uses an S3AFileSystem instance that is not closed 
> immediately, metrics accumulate in this instance and memory grows without 
> any limit.
>  
> Expectation:
>  * It would be nice to have an option to disable metrics completely, as this 
> is not needed for the Export Snapshot utility.
>  * Usage of S3AFileSystem should not involve any static object that can grow 
> indefinitely.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447921#comment-16447921
 ] 

Steve Loughran commented on HADOOP-15403:
-

* best to preallocate the size of the array list from the number of status 
entries returned
* I'm not personally a fan of {{continue}}; it might be cleaner to have the 
files.add call in the isDirectory condition's {{else}} clause, which would 
remove the need for the {{else continue;}} entirely (see the sketch below)
* test?
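
A rough sketch of that restructuring, using hypothetical method and variable 
names rather than the actual patch:

{noformat}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;

class RecursiveOffSketch {
  // Preallocate from the status array; put the add in the else branch so
  // directories are silently skipped with no `continue` needed.
  static List<FileStatus> nonDirectories(FileStatus[] stats) {
    List<FileStatus> files = new ArrayList<>(stats.length);
    for (FileStatus stat : stats) {
      if (stat.isDirectory()) {
        // recursive == false: ignore directories instead of failing later
      } else {
        files.add(stat);
      }
    }
    return files;
  }
}
{noformat}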



> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: HADOOP-15403
> URL: https://issues.apache.org/jira/browse/HADOOP-15403
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> When recursion is disabled, this code adds directories to the results:
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However, the getSplits code that runs afterwards computes the total size 
> like this:
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which always fails when combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14918) remove the Local Dynamo DB test option

2018-04-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447909#comment-16447909
 ] 

Steve Loughran commented on HADOOP-14918:
-

bq. should we ask Gabor Bota if he wants to help finish this one?

gladly!

> remove the Local Dynamo DB test option
> --
>
> Key: HADOOP-14918
> URL: https://issues.apache.org/jira/browse/HADOOP-14918
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14918-001.patch, HADOOP-14918-002.patch, 
> HADOOP-14918-003.patch
>
>
> I'm going to propose cutting out the localdynamo test option for s3guard:
> * the local DDB JAR is unmaintained and lags the SDK we work with... 
> eventually there'll be differences in API.
> * as the local dynamo DB is unshaded, it complicates classpath setup for the 
> build. Remove it and there's no need to worry about versions of anything 
> other than the shaded AWS SDK.
> * it complicates test runs: now we need to test against both localdynamo 
> *and* real dynamo
> * but we can't ignore real dynamo, because that's the one which matters
> While the local option promises to reduce test costs, really, it's just 
> adding complexity. If you are testing with s3guard, you need a real table 
> to test against. And with the exception of those people testing s3a against 
> non-AWS, consistent endpoints, everyone should be testing with S3Guard.
> -Straightforward to remove.-



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11766) Generic token authentication support for Hadoop

2018-04-23 Thread Anil (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447903#comment-16447903
 ] 

Anil commented on HADOOP-11766:
---

May I know if SSO integration with OAuth can be set up for Ambari and other 
Hadoop services? We have a custom OAuth service which needs to be integrated 
for authentication and SSO across Hadoop services.

> Generic token authentication support for Hadoop
> ---
>
> Key: HADOOP-11766
> URL: https://issues.apache.org/jira/browse/HADOOP-11766
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>Priority: Major
> Attachments: HADOOP-11766-V1.patch
>
>
> As a major goal of the Rhino project, we proposed the *TokenAuth* effort in 
> HADOOP-9392, which aims to provide a common token authentication framework 
> to integrate multiple authentication mechanisms by adding a new 
> {{AuthenticationMethod}} in lieu of {{KERBEROS}} and {{SIMPLE}}. To minimize 
> the required changes and risk, we thought of another approach to achieve the 
> general goals based on Kerberos, as Kerberos itself supports a 
> pre-authentication framework in both spec and implementation; this was 
> discussed in HADOOP-10959 as *TokenPreauth*. For both approaches, we built 
> working prototypes covering both the command line console and the Hadoop 
> web UI. 
> As HADOOP-9392 is rather lengthy and heavy, and HADOOP-10959 is mostly 
> focused on the concrete implementation approach based on Kerberos, we open 
> this issue for more general and updated discussion of the requirements, use 
> cases, and concerns around generic token authentication support for Hadoop. 
> We distinguish this token from existing Hadoop tokens in that the token 
> discussed here is mainly for the initial and primary authentication. We will 
> refine our existing code in HADOOP-9392 and HADOOP-10959 and break it down 
> into smaller patches based on the latest trunk. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15398) Compilation error in trunk in hadoop-aws

2018-04-23 Thread Mohammad Arshad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated HADOOP-15398:
-
Status: Patch Available  (was: Open)

> Compilation error in trunk in hadoop-aws 
> -
>
> Key: HADOOP-15398
> URL: https://issues.apache.org/jira/browse/HADOOP-15398
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
> Attachments: HADOOP-15398.001.patch
>
>
> *Problem:* Hadoop trunk compilation is failing.
>  *Root Cause:*
>  The compilation error comes from 
> {{org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase}}. The error is 
> "The method getArgumentAt(int, Class) is undefined for the type 
> InvocationOnMock".
> StagingTestBase uses the getArgumentAt(int, Class) method, which is not 
> available in mockito-all 1.8.5; it is available only from version 
> 2.0.0-beta.
> *Expectations:*
>  Either the mockito-all version should be upgraded, or the test case should 
> be written using only the functions available in 1.8.5.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15398) Compilation error in trunk in hadoop-aws

2018-04-23 Thread Mohammad Arshad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Arshad updated HADOOP-15398:
-
Attachment: HADOOP-15398.001.patch

> Compilation error in trunk in hadoop-aws 
> -
>
> Key: HADOOP-15398
> URL: https://issues.apache.org/jira/browse/HADOOP-15398
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
> Attachments: HADOOP-15398.001.patch
>
>
> *Problem:* Hadoop trunk compilation is failing.
>  *Root Cause:*
>  The compilation error comes from 
> {{org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase}}. The error is 
> "The method getArgumentAt(int, Class) is undefined for the type 
> InvocationOnMock".
> StagingTestBase uses the getArgumentAt(int, Class) method, which is not 
> available in mockito-all 1.8.5; it is available only from version 
> 2.0.0-beta.
> *Expectations:*
>  Either the mockito-all version should be upgraded, or the test case should 
> be written using only the functions available in 1.8.5.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15398) Compilation error in trunk in hadoop-aws

2018-04-23 Thread Mohammad Arshad (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447866#comment-16447866
 ] 

Mohammad Arshad commented on HADOOP-15398:
--

Thanks [~jlowe] for your comments.
I do not face this problem in every environment or every time, but it does 
occur sometimes. Yes, you are right that the required functions are available 
in versions lower than 2.0.0-beta. As you said we can update it, I am 
submitting the patch.
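
For reference, a sketch of the second option, rewriting a stub against the 
1.8.5 API; the mock and argument types here are illustrative, not the actual 
StagingTestBase code. {{InvocationOnMock.getArguments()}} exists in 1.8.5, so 
an indexed access plus a cast can replace {{getArgumentAt(int, Class)}}:

{noformat}
import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;
import static org.mockito.Matchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

interface Echo { String echo(String s); }

class MockitoCompatSketch {
  static Echo stub() {
    Echo mockEcho = mock(Echo.class);
    when(mockEcho.echo(anyString())).thenAnswer(new Answer<String>() {
      @Override
      public String answer(InvocationOnMock invocation) {
        // was: invocation.getArgumentAt(0, String.class) -- 2.0.0-beta only
        return (String) invocation.getArguments()[0];
      }
    });
    return mockEcho;
  }
}
{noformat}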

 

> Compilation error in trunk in hadoop-aws 
> -
>
> Key: HADOOP-15398
> URL: https://issues.apache.org/jira/browse/HADOOP-15398
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
>
> *Problem:* Hadoop trunk compilation is failing.
>  *Root Cause:*
>  The compilation error comes from 
> {{org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase}}. The error is 
> "The method getArgumentAt(int, Class) is undefined for the type 
> InvocationOnMock".
> StagingTestBase uses the getArgumentAt(int, Class) method, which is not 
> available in mockito-all 1.8.5; it is available only from version 
> 2.0.0-beta.
> *Expectations:*
>  Either the mockito-all version should be upgraded, or the test case should 
> be written using only the functions available in 1.8.5.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15404) Remove multibyte characters in DataNodeUsageReportUtil

2018-04-23 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447627#comment-16447627
 ] 

genericqa commented on HADOOP-15404:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
39s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | HADOOP-15404 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12920231/HADOOP-15404.1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ccc191cf0f53 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 63803e7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14515/testReport/ |
| Max. process+thread count | 328 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14515/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HADOOP-15404) Remove multibyte characters in DataNodeUsageReportUtil

2018-04-23 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447612#comment-16447612
 ] 

Arpit Agarwal commented on HADOOP-15404:


+1 pending Jenkins. Thanks for catching and fixing this [~tasanuma0829].

> Remove multibyte characters in DataNodeUsageReportUtil
> --
>
> Key: HADOOP-15404
> URL: https://issues.apache.org/jira/browse/HADOOP-15404
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HADOOP-15404.1.patch
>
>
> DataNodeUsageReportUtil, created by HDFS-13055, includes multibyte 
> characters. We need to remove them so it builds with Java 9.
> {noformat}
> mvn javadoc:javadoc --projects hadoop-hdfs-project/hadoop-hdfs-client
> ...
> [ERROR] 
> /hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/protocol/DataNodeUsageReportUtil.java:26:
>  error: unmappable character (0xE2) for encoding US-ASCII
> [ERROR]  * the delta between??current DataNode usage metrics and the 
> usage metrics
> {noformat}
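
A small illustrative helper for locating offenders like the 0xE2 above (not 
part of the patch; hypothetical class name, assumes the source file is UTF-8):

{noformat}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Print the line number and code point of every non-ASCII character,
// i.e. anything the US-ASCII javadoc pass above would reject.
public class FindNonAscii {
  public static void main(String[] args) throws IOException {
    List<String> lines =
        Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
    for (int i = 0; i < lines.size(); i++) {
      for (char c : lines.get(i).toCharArray()) {
        if (c > 127) {
          System.out.printf("line %d: U+%04X%n", i + 1, (int) c);
        }
      }
    }
  }
}
{noformat}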



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org