[jira] [Created] (HADOOP-16857) ABFS: Optimize HttpRequest retry triggers

2020-02-11 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-16857:
--

 Summary: ABFS: Optimize HttpRequest retry triggers
 Key: HADOOP-16857
 URL: https://issues.apache.org/jira/browse/HADOOP-16857
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


Currently the retry logic is triggered when an access token fetch fails, even for 
irrecoverable errors, causing a long wait before the request failure is 
reported. 

 

Retry logic needs to be optimized to identify such access token fetch failures 
and fail fast.
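
A minimal sketch of the fail-fast idea, assuming a hypothetical classification of 
token-fetch errors; none of the names below come from the actual ABFS / 
AzureADAuthenticator code:

{code:java}
import java.io.IOException;

/**
 * Illustrative only: retry token fetches for transient failures, but fail fast
 * on errors that retrying cannot fix (e.g. HTTP 400/401/403 from the token
 * endpoint). The classification and backoff here are assumptions.
 */
public final class TokenFetchRetrySketch {

  interface TokenFetcher {
    String fetchToken() throws IOException;
  }

  static String fetchWithRetry(TokenFetcher fetcher, int maxRetries, long backoffMs)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return fetcher.fetchToken();
      } catch (HttpStatusException e) {
        // 4xx responses (bad credentials, bad request) will not improve on
        // retry: surface the failure to the caller immediately.
        if (e.statusCode >= 400 && e.statusCode < 500) {
          throw e;
        }
        last = e;
      } catch (IOException e) {
        // network blips and 5xx-style failures are worth retrying
        last = e;
      }
      Thread.sleep(backoffMs * (attempt + 1));
    }
    throw last;
  }

  /** Hypothetical exception carrying the HTTP status of the token endpoint. */
  static final class HttpStatusException extends IOException {
    final int statusCode;
    HttpStatusException(int statusCode, String msg) {
      super(msg);
      this.statusCode = statusCode;
    }
  }
}
{code}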






[jira] [Created] (HADOOP-16856) cmake is missing in the CentOS 8 section of BUILDING.txt

2020-02-11 Thread Akira Ajisaka (Jira)
Akira Ajisaka created HADOOP-16856:
--

 Summary: cmake is missing in the CentOS 8 section of BUILDING.txt
 Key: HADOOP-16856
 URL: https://issues.apache.org/jira/browse/HADOOP-16856
 Project: Hadoop Common
  Issue Type: Bug
  Components: build, documentation
Reporter: Akira Ajisaka


The following command does not install cmake by default:
{noformat}
$ sudo dnf group install 'Development Tools'{noformat}
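
cmake is available from the stock CentOS 8 repositories, so the likely fix is an 
extra install step in BUILDING.txt, e.g.:
{noformat}
$ sudo dnf install -y cmake
{noformat}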






[jira] [Created] (HADOOP-16855) ABFS: hadoop-dist fails to add wildfly in class path for hadoop-azure

2020-02-11 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-16855:
--

 Summary: ABFS: hadoop-dist fails to add wildfly in class path for 
hadoop-azure
 Key: HADOOP-16855
 URL: https://issues.apache.org/jira/browse/HADOOP-16855
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan









[jira] [Created] (HADOOP-16854) ABFS: Tune the logic calculating max concurrent request count

2020-02-11 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-16854:
--

 Summary: ABFS: Tune the logic calculating max concurrent request 
count
 Key: HADOOP-16854
 URL: https://issues.apache.org/jira/browse/HADOOP-16854
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


In environments where memory is restricted, the current max concurrent request 
count logic can require a large number of buffers, blocking execution and 
leading to OutOfMemory exceptions. 
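
A rough sketch of one way the sizing could be bounded by available heap rather 
than by CPU count alone; the 4 * availableProcessors starting point, the 8 MB 
buffer size and all names are assumptions, not the actual AbfsOutputStream code:

{code:java}
/**
 * Illustrative sizing only: cap concurrent write requests by both CPU count
 * and the heap that is actually free for upload buffers.
 */
public final class ConcurrencySizingSketch {

  private static final long BUFFER_SIZE = 8L * 1024 * 1024; // assumed 8 MB buffers

  static int maxConcurrentRequests() {
    int byCpu = 4 * Runtime.getRuntime().availableProcessors();
    long usedHeap = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    long freeHeap = Runtime.getRuntime().maxMemory() - usedHeap;
    // leave half of the free heap for everything else
    int byMemory = (int) Math.max(1, (freeHeap / 2) / BUFFER_SIZE);
    return Math.min(byCpu, byMemory);
  }

  public static void main(String[] args) {
    System.out.println("max concurrent requests: " + maxConcurrentRequests());
  }
}
{code}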






Re: Restrict Frequency of BlockReport To Namenode startup and failover

2020-02-11 Thread Ayush Saxena
Thanx Everyone!!!
Just to conclude the thread.
Have created HDFS-15162 to track this.

-Ayush

> On 09-Feb-2020, at 5:01 PM, Ayush Saxena  wrote:
> 
> Hi Stephen,
> We are trying this on 3.1.1
> We aren't upgrading from 2.x; we are trying to increase the cluster size to 
> go beyond 10K datanodes.
> In the process, we analysed that block reports from this many DNs are quite 
> bothersome.
> There are plenty of reasons why block reports hurt performance, the major one 
> being the namenode holding the lock while processing reports from this many 
> datanodes, as you mentioned.
> HDFS-14657 may improve the situation a bit (I didn't follow it), but our point 
> is that rather than just reducing the impact, we can completely get rid of them 
> in most cases.
> 
> Why carry the load of processing block reports unnecessarily if they aren't 
> doing any good?
> 
> So, just wanted to know if people are aware of any cases, which we might have 
> missed, where eliminating regular BRs could be a problem.
> 
> Let me know if you have reservations about the change or doubt something.
> 
> -Ayush
> 
>>> On 07-Feb-2020, at 4:03 PM, Stephen O'Donnell  
>>> wrote:
>>> 
>> 
>> Are you seeing this problem on the 3.x branch, and if so, did the problem 
>> exist before you upgraded to 3.x? I am wondering if the situation is better 
>> or worse since moving to 3.x.
>> 
>> Also, do you believe the issue is driven by the namenode holding its lock 
>> for too long while it processes each block report, blocking other threads?
>> 
>> There was an interesting proposal in 
>> https://issues.apache.org/jira/browse/HDFS-14657 to allow the NN lock to be 
>> dropped and retaken periodically while processing FBRs, but it has not 
>> progressed recently. I wonder if that would help here?
>> 
>> Thanks,
>> 
>> Stephen.
>> 
>>> On Fri, Feb 7, 2020 at 6:58 AM Surendra Singh Lilhore 
>>>  wrote:
>>> Thanks Wei-Chiu,
>>> 
>>> I feel IBR is now more stable in branch 3.x. If the BR is just there to prevent
>>> bugs in IBR, I feel we should fix such bugs in IBR. Adding one new
>>> functionality to prevent bugs in another is not good.
>>> 
>>> I also think the DN should send a BR only in failure and process-start scenarios.
>>> 
>>> -Surendra
>>> 
>>> On Fri, Feb 7, 2020 at 10:52 AM Ayush Saxena  wrote:
>>> 
>>> > Hi Wei-Chiu,
>>> > Thanx for the response.
>>> > Yes, We are talking about the FBR only.
>>> > Reducing the frequency limits the problem, but doesn't seem to solve it.
>>> > With increasing cluster size the interval needs to be increased further, and
>>> > we cannot increase it indefinitely, as in some cases an FBR is needed.
>>> > One such case is Namenode failover: on failover the namenode marks all the
>>> > storages as stale and corrects them only once an FBR comes in; any
>>> > over-replicated blocks won't be deleted while the storages are in the stale
>>> > state.
>>> >
>>> > Regarding IBR errors, the block is set to Completed post IBR, when the
>>> > client-claimed value and the IBR value match, so if there is a discrepancy
>>> > it would be flagged right there.
>>> >
>>> > If it gets past this point, the FBR would also be sending the same values
>>> > from memory; it doesn't check the actual disk. The DirectoryScanner is what
>>> > checks whether the in-memory data matches what is on disk.
>>> > Another scenario where an FBR could be needed is to counter a split-brain
>>> > scenario, but with QJMs that is unlikely to happen.
>>> >
>>> > In case of any connection loss during the interval we still send a BR, so we
>>> > should be safe here.
>>> >
>>> > Anyway, if a client gets hold of an invalid block, it too will report it to
>>> > the Namenode.
>>> >
>>> > Other than these, we cannot think of a case where not sending an FBR can
>>> > cause any issue.
>>> >
>>> > Let us know your thoughts on this..
>>> >
>>> > -Ayush
>>> >
>>> > >>> On 07-Feb-2020, at 4:12 AM, Wei-Chiu Chuang  wrote:
>>> > >> Hey Ayush,
>>> > >>
>>> > >> Thanks a lot for your proposal.
>>> > >>
>>> > >> Do you mean the Full Block Report that is sent out every 6 hours per
>>> > >> DataNode?
>>> > >> Someone told me they reduced the frequency of FBR to 24 hours and it
>>> > >> seems okay.
>>> > >>
>>> > >> One of the purposes of the FBR was to prevent bugs in the incremental
>>> > >> block report implementation. In other words, it's a fail-safe mechanism.
>>> > >> Any bugs in IBRs get corrected after an FBR that refreshes the state of
>>> > >> blocks at the NameNode. At least, that's my understanding of FBRs in
>>> > >> their early days.
>>> > >>
>>> > >> On Tue, Feb 4, 2020 at 12:21 AM Ayush Saxena  wrote:
>>> > >>
>>> > >> Hi All,
>>> > >> Surendra and I have lately been trying to minimise the impact of block
>>> > >> reports on the Namenode in huge clusters. We observed that in a huge
>>> > >> cluster of about 10k datanodes, the periodic block reports adversely
>>> > >> impact Namenode performance.
>>> > >> We have been thinking of restricting the block reports to be triggered
>>> > >> only during 
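
The interval being discussed above is the full block report interval, controlled 
by dfs.blockreport.intervalMsec (stock default 21600000 ms, i.e. 6 hours); 
stretching it to 24 hours in hdfs-site.xml looks roughly like:
{noformat}
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <!-- 24 hours; the default is 21600000 (6 hours) -->
  <value>86400000</value>
</property>
{noformat}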

[jira] [Created] (HADOOP-16853) ITestS3GuardOutOfBandOperations failing on versioned S3 buckets

2020-02-11 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-16853:
---

 Summary: ITestS3GuardOutOfBandOperations failing on versioned S3 
buckets
 Key: HADOOP-16853
 URL: https://issues.apache.org/jira/browse/HADOOP-16853
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.3.0
Reporter: Steve Loughran
Assignee: Steve Loughran


org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.testListingDelete[auth=true]

failing because the deleted file can still be read when the S3Guard entry has 
the versionId.

Proposed: if the FS is versioned and the file status has a versionId, then we 
switch to tests which assert the file is readable, rather than tests which 
assert it isn't there.
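
A self-contained sketch of that branching; the helper name and structure are 
hypothetical, not the actual ITestS3GuardOutOfBandOperations code:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class VersionedDeleteAssertSketch {

  private VersionedDeleteAssertSketch() {
  }

  /**
   * Hypothetical helper: pick the assertion based on whether the bucket keeps
   * old object versions. On a versioned store the S3Guard entry pins a
   * versionId, so the "deleted" file may still be readable out of band.
   */
  static void assertDeleteVisibility(FileSystem fs, Path path, boolean versionedStore)
      throws IOException {
    if (versionedStore) {
      // only assert that the pinned version can still be opened and read
      try (InputStream in = fs.open(path)) {
        in.read();
      }
    } else {
      // on an unversioned bucket the out-of-band delete must be visible
      try {
        fs.open(path).close();
        throw new AssertionError("Expected FileNotFoundException for " + path);
      } catch (FileNotFoundException expected) {
        // expected: the file really is gone
      }
    }
  }
}
{code}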







[jira] [Created] (HADOOP-16852) ABFS: Send error back to client for Read Ahead request failure

2020-02-11 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-16852:
--

 Summary: ABFS: Send error back to client for Read Ahead request 
failure
 Key: HADOOP-16852
 URL: https://issues.apache.org/jira/browse/HADOOP-16852
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan









Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2020-02-11 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/

[Feb 10, 2020 4:13:11 AM] (iwasakims) HADOOP-16739. Fix native build failure of 
hadoop-pipes on CentOS 8.




-1 overall


The following subsystems voted -1:
asflicense findbugs pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
 
   Class org.apache.hadoop.applications.mawo.server.common.TaskStatus 
implements Cloneable but does not define or use clone method At 
TaskStatus.java:does not define or use clone method At TaskStatus.java:[lines 
39-346] 
   Equals method for 
org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument 
is of type WorkerId At WorkerId.java:the argument is of type WorkerId At 
WorkerId.java:[line 114] 
   
org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does 
not check for null argument At WorkerId.java:null argument At 
WorkerId.java:[lines 114-115] 
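
For context, a minimal sketch of the equals() shape those two FindBugs warnings 
are asking for: reject null and non-WorkerId arguments before casting. The 
single id field is a placeholder, not the real WorkerId layout.

{code:java}
public final class WorkerId {
  private final String id;

  public WorkerId(String id) {
    this.id = id;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof WorkerId)) { // also covers other == null
      return false;
    }
    return id.equals(((WorkerId) other).id);
  }

  @Override
  public int hashCode() {
    return id.hashCode();
  }
}
{code}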

FindBugs :

   module:hadoop-cloud-storage-project/hadoop-cos 
   Redundant nullcheck of dir, which is known to be non-null in 
org.apache.hadoop.fs.cosn.BufferPool.createDir(String) Redundant null check at 
BufferPool.java:is known to be non-null in 
org.apache.hadoop.fs.cosn.BufferPool.createDir(String) Redundant null check at 
BufferPool.java:[line 66] 
   org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may 
expose internal representation by returning CosNInputStream$ReadBuffer.buffer 
At CosNInputStream.java:by returning CosNInputStream$ReadBuffer.buffer At 
CosNInputStream.java:[line 87] 
   Found reliance on default encoding in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, 
byte[]):in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, 
File, byte[]): new String(byte[]) At CosNativeFileSystemStore.java:[line 199] 
   Found reliance on default encoding in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, 
InputStream, byte[], long):in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, 
InputStream, byte[], long): new String(byte[]) At 
CosNativeFileSystemStore.java:[line 178] 
   org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, 
String, String, int) may fail to clean up java.io.InputStream Obligation to 
clean up resource created at CosNativeFileSystemStore.java:fail to clean up 
java.io.InputStream Obligation to clean up resource created at 
CosNativeFileSystemStore.java:[line 252] is not discharged 
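
The default-encoding and unclosed-stream warnings above have standard fixes; a 
small sketch with placeholder method names, not the real CosNativeFileSystemStore 
API:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public final class CosWarningsSketch {

  private CosWarningsSketch() {
  }

  /** Explicit charset instead of the platform default that FindBugs flags. */
  static String bytesToString(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  /** try-with-resources discharges the "obligation to clean up" warning. */
  static long uploadPart(File file) throws IOException {
    long total = 0;
    try (InputStream in = new FileInputStream(file)) {
      byte[] chunk = new byte[8192];
      int n;
      while ((n = in.read(chunk)) != -1) {
        total += n; // stand-in for sending the chunk to the object store
      }
    }
    return total;
  }
}
{code}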

Failed junit tests :

   hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-compile-cc-root.txt
  [8.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-compile-javac-root.txt
  [428K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-checkstyle-root.txt
  [16M]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-patch-shellcheck.txt
  [16K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-patch-shelldocs.txt
  [44K]

   whitespace:

   
https://build

Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-02-11 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 
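
For context, a tiny sketch of the unbox/rebox pattern being flagged and the 
straightforward fix of reusing the boxed value; everything here is illustrative, 
not the ColumnRWHelper code:

{code:java}
import java.util.Map;
import java.util.TreeMap;

public final class ReboxSketch {
  public static void main(String[] args) {
    Map<Long, Object> results = new TreeMap<>();
    Object decoded = Long.valueOf(42L); // stands in for the column value decoder

    long unboxed = (Long) decoded;      // unboxes the Long...
    results.put(1L, unboxed);           // ...and autoboxing immediately re-wraps it

    results.put(2L, decoded);           // reuse the existing box instead
  }
}
{code}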

Failed junit tests :

   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.fs.viewfs.TestViewFileSystemHdfs 
   hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA 
   hadoop.hdfs.TestRollingUpgrade 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.registry.secure.TestSecureLogins 
   hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-cc-root-jdk1.8.0_242.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-javac-root-jdk1.8.0_242.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-shellcheck.txt
  [56K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/whitespace-tabs.txt
  [1.3M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/xml.txt
  [12K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_242.txt
  [1.1M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [236K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/patch-unit-hadoop-yarn-project_hadoop-