[jira] [Created] (HADOOP-16857) ABFS: Optimize HttpRequest retry triggers
Sneha Vijayarajan created HADOOP-16857:
--

Summary: ABFS: Optimize HttpRequest retry triggers
Key: HADOOP-16857
URL: https://issues.apache.org/jira/browse/HADOOP-16857
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan

Currently, retry logic is triggered when an access token fetch fails, even for irrecoverable errors, causing a long wait before the request failure is reported. The retry logic needs to be optimized to identify such access token fetch failures and fail fast.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
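The fail-fast idea can be sketched as a classifier over the token-fetch failure. This is a hypothetical illustration, not the actual ABFS code; the class and method names are invented:

```java
// Hypothetical sketch: classify access-token fetch failures so that
// irrecoverable errors fail fast instead of walking the full
// exponential-backoff retry policy.
public class TokenFetchRetryPolicy {

    /** Retry only failures that can plausibly heal on their own. */
    static boolean shouldRetry(int httpStatus) {
        // 400 (malformed request) and 401/403 (bad or unauthorized
        // credentials) will never succeed on retry: fail fast.
        // 429 (throttling) and 5xx (server-side) may be transient.
        return httpStatus == 429 || (httpStatus >= 500 && httpStatus < 600);
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(401)); // false: fail fast
        System.out.println(shouldRetry(503)); // true: worth retrying
    }
}
```

With such a split, an OAuth misconfiguration surfaces on the first attempt rather than after the whole retry budget is exhausted.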
[jira] [Created] (HADOOP-16856) cmake is missing in the CentOS 8 section of BUILDING.txt
Akira Ajisaka created HADOOP-16856:
--

Summary: cmake is missing in the CentOS 8 section of BUILDING.txt
Key: HADOOP-16856
URL: https://issues.apache.org/jira/browse/HADOOP-16856
Project: Hadoop Common
Issue Type: Bug
Components: build, documentation
Reporter: Akira Ajisaka

The following command does not install cmake by default:
{noformat}
$ sudo dnf group install 'Development Tools'
{noformat}
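A minimal fix for the BUILDING.txt CentOS 8 section would be to install cmake explicitly after the group install (assuming the stock CentOS 8 base repositories provide the `cmake` package, which they do):

```shell
# The 'Development Tools' group on CentOS 8 does not pull in cmake,
# so install it explicitly after the group install:
$ sudo dnf group install 'Development Tools'
$ sudo dnf install cmake

# Verify the toolchain before building Hadoop's native components:
$ cmake --version
```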
[jira] [Created] (HADOOP-16855) ABFS: hadoop-dist fails to add wildfly in class path for hadoop-azure
Sneha Vijayarajan created HADOOP-16855:
--

Summary: ABFS: hadoop-dist fails to add wildfly in class path for hadoop-azure
Key: HADOOP-16855
URL: https://issues.apache.org/jira/browse/HADOOP-16855
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
[jira] [Created] (HADOOP-16854) ABFS: Tune the logic calculating max concurrent request count
Sneha Vijayarajan created HADOOP-16854:
--

Summary: ABFS: Tune the logic calculating max concurrent request count
Key: HADOOP-16854
URL: https://issues.apache.org/jira/browse/HADOOP-16854
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan

In memory-restricted environments, the current max concurrent request count logic triggers allocation of a large number of buffers, blocking execution and leading to OutOfMemory exceptions.
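The tuning direction can be illustrated with a hypothetical sketch (names and the half-heap budget are invented, not the ABFS implementation): bound the concurrent request count by available heap as well as by the CPU-derived cap, since every in-flight request pins a fixed-size buffer.

```java
// Hypothetical sketch: cap concurrent requests by memory, not CPU alone.
public class ConcurrentRequestTuner {

    static int maxConcurrentRequests(long availableHeapBytes,
                                     long bufferSizeBytes,
                                     int cpuBasedCap) {
        // Budget at most half the heap for request buffers (assumed
        // policy for this sketch); one buffer per in-flight request.
        long byMemory = (availableHeapBytes / 2) / bufferSizeBytes;
        return (int) Math.max(1, Math.min(cpuBasedCap, byMemory));
    }

    public static void main(String[] args) {
        // 256 MB heap, 8 MB buffers, CPU cap 32 -> memory allows only 16.
        System.out.println(maxConcurrentRequests(256L << 20, 8L << 20, 32));
    }
}
```

On a large-memory host the CPU cap still wins; the memory bound only bites where the heap is small.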
Re: Restrict Frequency of BlockReport To Namenode startup and failover
Thanx Everyone!!! Just to conclude the thread: have created HDFS-15162 to track this.

-Ayush

> On 09-Feb-2020, at 5:01 PM, Ayush Saxena wrote:
>
> Hi Stephen,
> We are trying this on 3.1.1. We aren't upgrading from 2.x; we are trying
> to increase the cluster size to go beyond 10K datanodes. In the process,
> we analysed that block reports from that many DNs are quite bothersome.
> There are plenty of reasons why block reports hurt performance, the major
> one being the namenode holding the lock for that many datanodes, as you
> mentioned. HDFS-14657 may improve the situation a bit (I didn't follow
> it), but our point is that rather than reducing the impact, we can
> completely get rid of them in most cases.
>
> Why unnecessarily carry the load of processing block reports if they
> aren't doing anything useful?
>
> So, just wanted to know if people are aware of any cases where
> eliminating regular BRs could be a problem, which we might have missed.
>
> Let me know if you have strong reservations about the change or doubt
> something.
>
> -Ayush
>
>> On 07-Feb-2020, at 4:03 PM, Stephen O'Donnell wrote:
>>
>> Are you seeing this problem on the 3.x branch, and if so, did the
>> problem exist before you upgraded to 3.x? I am wondering if the
>> situation is better or worse since moving to 3.x.
>>
>> Also, do you believe the issue is driven by the namenode holding its
>> lock for too long while it processes each block report, blocking other
>> threads?
>>
>> There was an interesting proposal in
>> https://issues.apache.org/jira/browse/HDFS-14657 to allow the NN lock
>> to be dropped and retaken periodically while processing FBRs, but it
>> has not progressed recently. I wonder if that would help here?
>>
>> Thanks,
>>
>> Stephen.
>>
>>> On Fri, Feb 7, 2020 at 6:58 AM Surendra Singh Lilhore wrote:
>>>
>>> Thanks Wei-Chiu,
>>>
>>> I feel IBR is now more stable in branch 3.x. If BR was just added to
>>> prevent bugs in IBR, I feel we should fix such bugs in IBR. Adding new
>>> functionality to work around bugs in another is not good.
>>>
>>> I also think the DN should send a BR only on failure and at process
>>> start.
>>>
>>> -Surendra
>>>
>>> On Fri, Feb 7, 2020 at 10:52 AM Ayush Saxena wrote:
>>>
>>> > Hi Wei-Chiu,
>>> > Thanx for the response.
>>> > Yes, we are talking about the FBR only.
>>> > Increasing the interval limits the problem, but doesn't seem to
>>> > solve it. With increasing cluster size the interval needs to be
>>> > increased further, and we cannot increase it indefinitely, as in
>>> > some cases an FBR is needed.
>>> > One such case is namenode failover: on failover the namenode marks
>>> > all the storages as stale, and it corrects them only once an FBR
>>> > comes. Any overreplicated blocks won't be deleted while the
>>> > storages are in the stale state.
>>> >
>>> > Regarding the IBR error: the block is set to Completed post-IBR,
>>> > when the client-claimed value and the IBR value match, so any
>>> > discrepancy would alarm right there. If it passes that spot, the
>>> > FBR would also be sending the same values from memory; it doesn't
>>> > check the actual disk. The DirectoryScanner checks whether the
>>> > in-memory data matches what is on disk.
>>> > Another scenario where an FBR could be needed is to counter a
>>> > split-brain scenario, but with QJMs that is unlikely to happen.
>>> > In case of any connection losses during the interval, we tend to
>>> > send the BR, so we should be safe here.
>>> > Anyway, if a client gets hold of an invalid block, it too will
>>> > report it to the namenode.
>>> > Beyond these, we cannot think of a case where not sending an FBR
>>> > can cause an issue.
>>> >
>>> > Let us know your thoughts on this.
>>> >
>>> > -Ayush
>>> >
>>> > On 07-Feb-2020, at 4:12 AM, Wei-Chiu Chuang wrote:
>>> >
>>> >> Hey Ayush,
>>> >>
>>> >> Thanks a lot for your proposal.
>>> >>
>>> >> Do you mean the Full Block Report that is sent out every 6 hours
>>> >> per DataNode? Someone told me they reduced the frequency of FBR to
>>> >> 24 hours and it seems okay.
>>> >>
>>> >> One of the purposes of FBR was to prevent bugs in the incremental
>>> >> block report implementation. In other words, it's a fail-safe
>>> >> mechanism. Any bugs in IBRs get corrected after a FBR that
>>> >> refreshes the state of blocks at the NameNode. At least, that's my
>>> >> understanding of FBRs in its early days.
>>> >>
>>> >> On Tue, Feb 4, 2020 at 12:21 AM Ayush Saxena wrote:
>>> >>
>>> >> Hi All,
>>> >> Surendra and I have lately been trying to minimise the impact of
>>> >> block reports on the namenode in huge clusters. We observed that
>>> >> in a huge cluster, about 10k datanodes, the periodic block reports
>>> >> impact the namenode performance adversely.
>>> >> We have been thinking to restrict the block reports to be
>>> >> triggered only during
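For reference, the periodic FBR interval the thread debates is controlled by `dfs.blockreport.intervalMsec` in hdfs-site.xml (default 21600000 ms, i.e. 6 hours). The stopgap Wei-Chiu mentions, stretching it to 24 hours, is a config fragment like:

```xml
<!-- hdfs-site.xml: raise the periodic full block report interval
     from the default 6 hours to 24 hours. -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>86400000</value>
</property>
```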
[jira] [Created] (HADOOP-16853) ITestS3GuardOutOfBandOperations failing on versioned S3 buckets
Steve Loughran created HADOOP-16853:
--

Summary: ITestS3GuardOutOfBandOperations failing on versioned S3 buckets
Key: HADOOP-16853
URL: https://issues.apache.org/jira/browse/HADOOP-16853
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3, test
Affects Versions: 3.3.0
Reporter: Steve Loughran
Assignee: Steve Loughran

org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations.testListingDelete[auth=true] is failing because the deleted file can still be read when the S3Guard entry has the versionId.

Proposed: if the FS is versioned and the file status has a versionId, switch to tests which assert the file is readable, rather than tests which assert it isn't there.
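The proposed branching can be illustrated with a hypothetical helper (names invented for the sketch, not the actual test code): on a versioned bucket whose S3Guard entry carries a versionId, a GET pinned to that versionId still succeeds after the delete, so the test should assert readability rather than absence.

```java
// Hypothetical sketch of the proposed post-delete assertion switch.
public class VersionedDeleteProbe {

    enum Expectation { FILE_GONE, OLD_VERSION_READABLE }

    static Expectation expectationAfterDelete(boolean bucketIsVersioned,
                                              boolean entryHasVersionId) {
        // S3 versioning keeps old object versions addressable by
        // versionId even after a delete marker is written.
        return (bucketIsVersioned && entryHasVersionId)
                ? Expectation.OLD_VERSION_READABLE
                : Expectation.FILE_GONE;
    }

    public static void main(String[] args) {
        System.out.println(expectationAfterDelete(true, true));
        System.out.println(expectationAfterDelete(false, false));
    }
}
```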
[jira] [Created] (HADOOP-16852) ABFS: Send error back to client for Read Ahead request failure
Sneha Vijayarajan created HADOOP-16852:
--

Summary: ABFS: Send error back to client for Read Ahead request failure
Key: HADOOP-16852
URL: https://issues.apache.org/jira/browse/HADOOP-16852
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/

[Feb 10, 2020 4:13:11 AM] (iwasakims) HADOOP-16739. Fix native build failure of hadoop-pipes on CentOS 8.

-1 overall

The following subsystems voted -1:
    asflicense findbugs pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

XML : Parsing Error(s):
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
    Class org.apache.hadoop.applications.mawo.server.common.TaskStatus implements Cloneable but does not define or use clone method. At TaskStatus.java:[lines 39-346]
    Equals method for org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument is of type WorkerId. At WorkerId.java:[line 114]
    org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does not check for null argument. At WorkerId.java:[lines 114-115]

FindBugs : module:hadoop-cloud-storage-project/hadoop-cos
    Redundant nullcheck of dir, which is known to be non-null in org.apache.hadoop.fs.cosn.BufferPool.createDir(String). At BufferPool.java:[line 66]
    org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may expose internal representation by returning CosNInputStream$ReadBuffer.buffer. At CosNInputStream.java:[line 87]
    Found reliance on default encoding in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, byte[]): new String(byte[]). At CosNativeFileSystemStore.java:[line 199]
    Found reliance on default encoding in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, InputStream, byte[], long): new String(byte[]). At CosNativeFileSystemStore.java:[line 178]
    org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, String, String, int) may fail to clean up java.io.InputStream; obligation to clean up resource created at CosNativeFileSystemStore.java:[line 252] is not discharged.

Failed junit tests:
    hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
    hadoop.yarn.applications.distributedshell.TestDistributedShell

cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-compile-cc-root.txt [8.0K]
javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-compile-javac-root.txt [428K]
checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-checkstyle-root.txt [16M]
pathlen: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/pathlen.txt [12K]
pylint: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-patch-pylint.txt [24K]
shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-patch-shellcheck.txt [16K]
shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1407/artifact/out/diff-patch-shelldocs.txt [44K]
whitespace: https://build
Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/

No changes

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

XML : Parsing Error(s):
    hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
    hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
    Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean). At ColumnRWHelper.java:[line 335]

Failed junit tests:
    hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
    hadoop.fs.viewfs.TestViewFileSystemHdfs
    hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
    hadoop.hdfs.TestRollingUpgrade
    hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
    hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
    hadoop.registry.secure.TestSecureLogins
    hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor
    hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2

cc: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt [4.0K]
javac: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt [328K]
cc: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-cc-root-jdk1.8.0_242.txt [4.0K]
javac: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-compile-javac-root-jdk1.8.0_242.txt [308K]
checkstyle: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-checkstyle-root.txt [16M]
hadolint: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-hadolint.txt [4.0K]
pathlen: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/pathlen.txt [12K]
pylint: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-pylint.txt [24K]
shellcheck: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-shellcheck.txt [56K]
shelldocs: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-patch-shelldocs.txt [8.0K]
whitespace: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/whitespace-eol.txt [12M]
            https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/whitespace-tabs.txt [1.3M]
xml: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/xml.txt [12K]
findbugs: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html [8.0K]
javadoc: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt [16K]
         https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_242.txt [1.1M]
unit: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [236K]
      https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [12K]
      https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/593/artifact/out/patch-unit-hadoop-yarn-project_hadoop-