[jira] [Resolved] (HDFS-16352) return the real datanode numBlocks in #getDatanodeStorageReport
[ https://issues.apache.org/jira/browse/HDFS-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He resolved HDFS-16352.
--------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

Committed to trunk. Thanks [~qinyuren] for the report and contribution!

> return the real datanode numBlocks in #getDatanodeStorageReport
> ---------------------------------------------------------------
>
>                 Key: HDFS-16352
>                 URL: https://issues.apache.org/jira/browse/HDFS-16352
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: qinyuren
>            Assignee: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: image-2021-11-23-22-04-06-131.png
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> #getDatanodeStorageReport returns an array of DatanodeStorageReport, each of
> which contains a DatanodeInfo, but the numBlocks field in that DatanodeInfo
> is always zero, which is confusing.
> !image-2021-11-23-22-04-06-131.png|width=683,height=338!
> We could instead return the real numBlocks in DatanodeInfo when
> #getDatanodeStorageReport is called.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
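The change described above can be illustrated with a minimal sketch. The classes below (`Datanode`, `DatanodeInfo`) are simplified stand-ins for Hadoop's DatanodeDescriptor/DatanodeInfo, not the actual HDFS types; the point is only that the report builder should propagate the live block count instead of leaving it at its default of zero.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StorageReportSketch {
    // Simplified, hypothetical stand-ins for DatanodeDescriptor / DatanodeInfo.
    static class Datanode {
        final String id;
        final long numBlocks;
        Datanode(String id, long numBlocks) { this.id = id; this.numBlocks = numBlocks; }
    }
    static class DatanodeInfo {
        final String id;
        final long numBlocks;
        DatanodeInfo(String id, long numBlocks) { this.id = id; this.numBlocks = numBlocks; }
    }

    // The essential fix idea: copy each datanode's block count into the
    // returned info, instead of constructing it with a hard-coded 0.
    static List<DatanodeInfo> getDatanodeStorageReport(List<Datanode> datanodes) {
        return datanodes.stream()
                .map(dn -> new DatanodeInfo(dn.id, dn.numBlocks)) // was effectively: 0
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DatanodeInfo> report = getDatanodeStorageReport(
                List.of(new Datanode("dn1", 1024), new Datanode("dn2", 2048)));
        for (DatanodeInfo info : report) {
            System.out.println(info.id + " numBlocks=" + info.numBlocks);
        }
    }
}
```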
[jira] [Created] (HDFS-16388) The namenode should check the connection channel before the RPC returns data
JiangHua Zhu created HDFS-16388:
-----------------------------------

             Summary: The namenode should check the connection channel before the RPC returns data
                 Key: HDFS-16388
                 URL: https://issues.apache.org/jira/browse/HDFS-16388
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
    Affects Versions: 2.9.2
            Reporter: JiangHua Zhu

Some RBF-related cases were reported in HDFS-15078 and HDFS-15079, which may be similar to the situation here.
When a client talks to the NameNode, e.g. to create a file, the RPC layer sometimes logs exceptions like this:

2021-12-15 14:57:14,502 [2246654536] - WARN [IPC Server handler 58 on 8025:Server$Responder@1518] - IPC Server handler 58 on 8025, call Call#870382577 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.196.73.49:33700: output error
2021-12-15 14:57:14,502 [2246654536] - INFO [IPC Server handler 58 on 8025:Server$Handler@2619] - IPC Server handler 58 on 8025 caught an exception
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:3173)
        at org.apache.hadoop.ipc.Server.access$1700(Server.java:136)
        at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1468)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1538)
        at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2517)
        at org.apache.hadoop.ipc.Server$Connection.access$300(Server.java:1610)
        at org.apache.hadoop.ipc.Server$RpcCall.doResponse(Server.java:935)
        at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:769)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:880)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)

This appears in the NameNode's log. It is somewhat different from HDFS-15078, but the two have a common feature: both surface during RPC access. I think this problem can be solved in a unified way.
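The proposal above amounts to checking the client's channel before writing the response. A hedged sketch of that idea, using plain `java.nio` rather than Hadoop's `Server.channelWrite()` (the helper name `sendResponseIfOpen` and the surrounding structure are hypothetical, not Hadoop code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SocketChannel;

public class ChannelCheckSketch {
    // Returns true if the response was written, false if the peer is gone.
    static boolean sendResponseIfOpen(SocketChannel ch, ByteBuffer resp) throws IOException {
        if (ch == null || !ch.isOpen()) {
            // Client already disconnected: drop the response quietly instead
            // of letting the write blow up with ClosedChannelException.
            return false;
        }
        try {
            ch.write(resp);
            return true;
        } catch (ClosedChannelException e) {
            // Race: the channel closed between the check and the write.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        SocketChannel ch = SocketChannel.open();
        ch.close(); // simulate a client that hung up before the RPC finished
        boolean sent = sendResponseIfOpen(ch, ByteBuffer.wrap("ok".getBytes()));
        System.out.println("sent=" + sent);
    }
}
```

Note the `catch` is still needed: `isOpen()` and `write()` are not atomic together, so the check only avoids the common case, it cannot fully replace exception handling.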
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/514/

No changes

-1 overall

The following subsystems voted -1:
    docker

Powered by Apache Yetus https://yetus.apache.org
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/

[Dec 14, 2021 9:00:46 PM] (Szilard Nemeth) Clean up checkstyle warnings from YARN-11024/10907/10929. Contributed by Benjamin Teke
[Dec 15, 2021 3:16:32 AM] (noreply) HDFS-16378. Add datanode address to BlockReportLeaseManager logs (#3786). Contributed by tomscut.
[Dec 15, 2021 8:47:51 AM] (noreply) YARN-11045. ATSv2 storage monitor fails to read from hbase cluster (#3796)

-1 overall

The following subsystems voted -1:
    blanks pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    mvnsite unit

Specific tests:

    XML : Parsing Error(s):
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    Failed junit tests:
       hadoop.ipc.TestIPC
       hadoop.fs.http.client.TestHttpFSWithHttpFSFileSystem
       hadoop.hdfs.server.federation.router.TestRouterFederationRename
       hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure
       hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver
       hadoop.yarn.csi.client.TestCsiClient
       hadoop.tools.dynamometer.TestDynamometerInfra

   cc: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/results-compile-cc-root.txt [96K]
   javac: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/results-compile-javac-root.txt [360K]
   blanks: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/blanks-eol.txt [13M]
           https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/blanks-tabs.txt [2.0M]
   checkstyle: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/results-checkstyle-root.txt [14M]
   pathlen: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/results-pathlen.txt [16K]
   pylint: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/results-pylint.txt [20K]
   shellcheck: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/results-shellcheck.txt [28K]
   xml: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/xml.txt [24K]
   javadoc: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/results-javadoc-javadoc-root.txt [408K]
   unit: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [216K]
         https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-httpfs.txt [60K]
         https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [168K]
         https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-csi.txt [20K]
         https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer_hadoop-dynamometer-infra.txt [12K]
         https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/720/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt [24K]

Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org
[jira] [Reopened] (HDFS-16384) Upgrade Netty to 4.1.72.Final
[ https://issues.apache.org/jira/browse/HDFS-16384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang reopened HDFS-16384:
------------------------------------

> Upgrade Netty to 4.1.72.Final
> -----------------------------
>
>                 Key: HDFS-16384
>                 URL: https://issues.apache.org/jira/browse/HDFS-16384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.3.1
>            Reporter: Tamas Penzes
>            Assignee: Tamas Penzes
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> New fixes for Netty; nothing else changed. Only the Netty version was bumped,
> plus two more exclusions in hdfs-client because of the new Netty.
> No new tests added, as none are needed.
Trunk broken by HDFS-16384
My bad. There was a transitive dependency problem in the PR causing trunk to fail the build. The commit has since been reverted. Sorry for the inconvenience.
[jira] [Created] (HDFS-16387) [FGL]Access to Create File is more secure
JiangHua Zhu created HDFS-16387:
-----------------------------------

             Summary: [FGL] Access to Create File is more secure
                 Key: HDFS-16387
                 URL: https://issues.apache.org/jira/browse/HDFS-16387
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: namenode
    Affects Versions: Fine-Grained Locking
            Reporter: JiangHua Zhu

After applying this patch, I tried to verify the create path with NNThroughputBenchmark, for example:
./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs hdfs:// -op create -threads 50 -files 200
Across multiple runs it occasionally fails. I found that deadlocks sometimes occur, such as:

Found one Java-level deadlock:
=============================
"CacheReplicationMonitor(72357231)":
  waiting for ownable synchronizer 0x7f6a74c1aa50, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "IPC Server handler 49 on 8020"
"IPC Server handler 49 on 8020":
  waiting for ownable synchronizer 0x7f6a74d14ec8, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "IPC Server handler 24 on 8020"
"IPC Server handler 24 on 8020":
  waiting for ownable synchronizer 0x7f69348ba648, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "IPC Server handler 49 on 8020"

Java stack information for the threads listed above:
===================================================
"CacheReplicationMonitor(72357231)":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x7f6a74c1aa50> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.doLock(FSNamesystemLock.java:386)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeLock(FSNamesystemLock.java:248)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeLock(FSNamesystem.java:1587)
        at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.rescan(CacheReplicationMonitor.java:288)
        at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:189)
"IPC Server handler 49 on 8020":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x7f6a74d14ec8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at org.apache.hadoop.hdfs.server.namenode.INodeMap$INodeMapLock.writeChildLock(INodeMap.java:164)
        at org.apache.hadoop.util.PartitionedGSet.latchWriteLock(PartitionedGSet.java:343)
        at org.apache.hadoop.hdfs.server.namenode.INodeMap.latchWriteLock(INodeMap.java:331)
        at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createMissingDirs(FSDirMkdirOp.java:92)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2346)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2266)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:733)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:501)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:926)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:865)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2687)
"IPC
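The jstack output above shows a classic lock-ordering cycle: two IPC handlers each hold one partition lock and wait for the other's. A common remedy for such cycles, sketched below with plain `ReentrantReadWriteLock` (this is an illustrative technique, not the actual FGL/PartitionedGSet design), is to impose a single global acquisition order on the locks so no cycle can form:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
    // Acquire write locks in one fixed order (here: identity-hash order) so
    // two threads can never hold the same pair of locks in opposite orders.
    static void lockAllInOrder(ReentrantReadWriteLock... locks) {
        ReentrantReadWriteLock[] sorted = locks.clone();
        Arrays.sort(sorted, Comparator.comparingInt(System::identityHashCode));
        for (ReentrantReadWriteLock l : sorted) {
            l.writeLock().lock();
        }
    }

    static void unlockAll(ReentrantReadWriteLock... locks) {
        for (ReentrantReadWriteLock l : locks) {
            l.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock a = new ReentrantReadWriteLock();
        ReentrantReadWriteLock b = new ReentrantReadWriteLock();
        // Two threads request the same pair in opposite textual order, like
        // handlers 24 and 49 above; ordered acquisition prevents the cycle.
        Thread t1 = new Thread(() -> { lockAllInOrder(a, b); unlockAll(a, b); });
        Thread t2 = new Thread(() -> { lockAllInOrder(b, a); unlockAll(b, a); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("no deadlock");
    }
}
```

Identity-hash ordering is only safe when the locks are long-lived singletons (hash collisions would need a tie-breaker); a real fix would order on a stable key such as the partition index.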
[jira] [Created] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working
JiangHua Zhu created HDFS-16386:
-----------------------------------

             Summary: Reduce DataNode load when FsDatasetAsyncDiskService is working
                 Key: HDFS-16386
                 URL: https://issues.apache.org/jira/browse/HDFS-16386
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
    Affects Versions: 2.9.2
            Reporter: JiangHua Zhu

Our DataNode nodes have 36 disks. When FsDatasetAsyncDiskService is working, it puts a high load on the DataNode. Here is some memory-related monitoring:
Since each disk deletes blocks asynchronously, and each volume is allowed up to 4 worker threads, this causes trouble for the DataNode, such as increased CPU and memory usage. We should reduce the total number of threads appropriately so that the DataNode can work better.
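The arithmetic behind the complaint: 36 volumes times 4 threads per volume allows up to 144 deletion threads. One way to bound that, shown here as a hedged sketch (the class name and pool size are illustrative, not the actual FsDatasetAsyncDiskService code), is to funnel all volumes' asynchronous deletes through a single shared pool whose size is independent of the disk count:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncDeleteSketch {
    public static void main(String[] args) throws InterruptedException {
        int volumes = 36;
        // One shared, bounded pool keeps CPU and memory pressure capped
        // no matter how many disks the DataNode has.
        ExecutorService deletePool = Executors.newFixedThreadPool(8);
        AtomicInteger deleted = new AtomicInteger();
        for (int v = 0; v < volumes; v++) {
            deletePool.submit(() -> {
                // stand-in for deleting one replica file on a volume
                deleted.incrementAndGet();
            });
        }
        deletePool.shutdown();
        deletePool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("deleted=" + deleted.get());
    }
}
```

The trade-off is that a single slow disk can now delay deletes for other volumes, so a real implementation would likely keep per-volume queues behind the shared thread cap.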