[jira] [Created] (HDFS-15251) Add new zookeeper event type case after zk updated to 3.5.x
Jianfei Jiang created HDFS-15251:
-------------------------------------

Summary: Add new zookeeper event type case after zk updated to 3.5.x
Key: HDFS-15251
URL: https://issues.apache.org/jira/browse/HDFS-15251
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Affects Versions: 3.2.1
Reporter: Jianfei Jiang

In ZooKeeper 3.5.x, KeeperState adds a new value named Closed, so a Closed case should be added to the switch, since it should no longer be treated as an unexpected ZooKeeper watch event state.

{code:java}
/** @deprecated */
@Deprecated
Unknown(-1),
Disconnected(0),
/** @deprecated */
@Deprecated
NoSyncConnected(1),
SyncConnected(3),
AuthFailed(4),
ConnectedReadOnly(5),
SaslAuthenticated(6),
Expired(-112),
Closed(7);
{code}
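As a sketch of the kind of change proposed, here is a minimal illustrative watcher whose dispatch includes the new state. The class name and the handling in each branch are assumptions for illustration, not the actual Hadoop code:

{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Illustrative watcher; not the Hadoop implementation.
public class StateAwareWatcher implements Watcher {
  @Override
  public void process(WatchedEvent event) {
    switch (event.getState()) {
      case SyncConnected:
        // Connected or reconnected; resume normal operation.
        break;
      case Disconnected:
      case Expired:
        // Existing reconnect / session-expiry handling.
        break;
      case Closed:
        // New in ZooKeeper 3.5.x: the client itself called close().
        // Expected during shutdown, so it should not fall through to the
        // "unexpected watch event state" default branch.
        break;
      default:
        // States that remain genuinely unexpected.
        break;
    }
  }
}
{code}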
[jira] [Created] (HDFS-15250) Setting `dfs.client.use.datanode.hostname` to true can crash the system because of unhandled UnresolvedAddressException
Ctest created HDFS-15250:
-------------------------------------

Summary: Setting `dfs.client.use.datanode.hostname` to true can crash the system because of unhandled UnresolvedAddressException
Key: HDFS-15250
URL: https://issues.apache.org/jira/browse/HDFS-15250
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Ctest

*Problem:*

`dfs.client.use.datanode.hostname` is false by default, which means the client connects to the datanode using its IP address rather than its hostname.

In `org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer`:

{code:java}
try {
  Peer peer = remotePeerFactory.newConnectedPeer(inetSocketAddress, token,
      datanode);
  LOG.trace("nextTcpPeer: created newConnectedPeer {}", peer);
  return new BlockReaderPeer(peer, false);
} catch (IOException e) {
  LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to"
      + "{}", datanode);
  throw e;
}
{code}

If `dfs.client.use.datanode.hostname` is false, the client tries to connect via the IP address. If the IP address is illegal and the connection fails, an IOException is thrown from `newConnectedPeer` and handled. If `dfs.client.use.datanode.hostname` is true, the client tries to connect via the hostname. If the hostname cannot be resolved, an UnresolvedAddressException is thrown from `newConnectedPeer`. However, UnresolvedAddressException is not a subclass of IOException, so `nextTcpPeer` does not handle it at all. This unhandled exception can crash the system.

*Solution:*

Since the method already handles an illegal IP address, an unresolvable hostname should be handled as well. One solution is to add the handling logic to `nextTcpPeer`:

{code:java}
} catch (IOException e) {
  LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to"
      + "{}", datanode);
  throw e;
} catch (UnresolvedAddressException e) {
  ... // handling logic
}
{code}

I am very happy to provide a patch to do this.
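The `...` above is left to the implementer. One plausible approach, an assumption here rather than the committed patch, is to wrap the unchecked UnresolvedAddressException in an IOException so existing callers recover through their normal IO error path. A self-contained sketch of that wrapping pattern, using a hypothetical demo class rather than the HDFS internals:

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical demo class showing the wrapping pattern only.
public class ResolveSafeConnect {
  /** Converts the unchecked UnresolvedAddressException into a checked
   *  IOException so callers with ordinary IO error handling can recover. */
  static SocketChannel connect(InetSocketAddress addr) throws IOException {
    try {
      return SocketChannel.open(addr);
    } catch (UnresolvedAddressException e) {
      throw new IOException("Cannot resolve datanode address: " + addr, e);
    }
  }

  public static void main(String[] args) {
    // createUnresolved() guarantees the unresolved-address path is taken.
    InetSocketAddress addr =
        InetSocketAddress.createUnresolved("no-such-host.invalid", 9866);
    try {
      connect(addr);
    } catch (IOException e) {
      System.out.println("Handled as IOException: " + e.getMessage());
    }
  }
}
{code}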
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/

[Mar 29, 2020 3:17:02 PM] (inigoiri) HDFS-15239. Add button to go to the parent directory in the explorer.
[Mar 29, 2020 5:54:25 PM] (brahma) Preparing for 3.4.0 development
[Mar 29, 2020 6:14:20 PM] (brahma) upate the hadoop.version property in the root pom.xml and
[Mar 29, 2020 9:10:25 PM] (ayushsaxena) HDFS-15245. Improve JournalNode web UI. Contributed by Jianfei Jiang.

-1 overall

The following subsystems voted -1:
    asflicense findbugs pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

XML: Parsing Error(s):
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

FindBugs: module:hadoop-cloud-storage-project/hadoop-cos
    Redundant nullcheck of dir, which is known to be non-null in org.apache.hadoop.fs.cosn.BufferPool.createDir(String). At BufferPool.java:[line 66]
    org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may expose internal representation by returning CosNInputStream$ReadBuffer.buffer. At CosNInputStream.java:[line 87]
    Found reliance on default encoding in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, byte[]): new String(byte[]). At CosNativeFileSystemStore.java:[line 199]
    Found reliance on default encoding in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, InputStream, byte[], long): new String(byte[]). At CosNativeFileSystemStore.java:[line 178]
    org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, String, String, int) may fail to clean up java.io.InputStream; obligation to clean up resource created at CosNativeFileSystemStore.java:[line 252] is not discharged

Failed junit tests:
    hadoop.mapreduce.TestMapreduceConfigFields
    hadoop.mapred.TestNetworkedJob
    hadoop.yarn.applications.distributedshell.TestDistributedShell
    hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer

cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/diff-compile-cc-root.txt [8.0K]
javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/diff-compile-javac-root.txt [428K]
checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/diff-checkstyle-root.txt [16M]
pathlen: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/pathlen.txt [12K]
pylint: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/diff-patch-pylint.txt [24K]
shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/diff-patch-shellcheck.txt [16K]
shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/diff-patch-shelldocs.txt [44K]
whitespace: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/whitespace-eol.txt [9.9M]
    https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/whitespace-tabs.txt [1.1M]
xml: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1454/artifact/out/xml.txt [20K]
findbugs:
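For context on the "reliance on default encoding" FindBugs entries above: `new String(byte[])` decodes with the JVM's platform default charset, so the result varies across machines. A minimal standalone illustration of the flagged pattern and the conventional fix (not the hadoop-cos code itself):

{code:java}
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
  public static void main(String[] args) {
    byte[] raw = "etag-value".getBytes(StandardCharsets.UTF_8);

    // Flagged pattern: decoding depends on the platform default charset.
    String flagged = new String(raw);

    // Conventional fix: name the charset explicitly.
    String fixed = new String(raw, StandardCharsets.UTF_8);

    // Equal only on platforms whose default charset happens to be UTF-8.
    System.out.println(flagged.equals(fixed));
  }
}
{code}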
Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/

No changes

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

XML: Parsing Error(s):
    hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
    hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

FindBugs: module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
    Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean). At ColumnRWHelper.java:[line 335]

Failed junit tests:
    hadoop.util.TestDiskChecker
    hadoop.util.TestReadWriteDiskValidator
    hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
    hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
    hadoop.hdfs.server.datanode.TestDirectoryScanner
    hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
    hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
    hadoop.registry.secure.TestSecureLogins

cc: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt [4.0K]
javac: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt [324K]
cc: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-compile-cc-root-jdk1.8.0_242.txt [4.0K]
javac: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-compile-javac-root-jdk1.8.0_242.txt [304K]
checkstyle: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-checkstyle-root.txt [16M]
hadolint: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-patch-hadolint.txt [4.0K]
pathlen: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/pathlen.txt [12K]
pylint (source tree stderr): https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/patch-pylint-stderr.txt []
shellcheck: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-patch-shellcheck.txt [56K]
shelldocs: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-patch-shelldocs.txt [8.0K]
whitespace: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/whitespace-eol.txt [12M]
    https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/whitespace-tabs.txt [1.3M]
xml: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/xml.txt [12K]
findbugs: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html [8.0K]
javadoc: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt [16K]
    https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_242.txt [1.1M]
unit: https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [180K]
    https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [236K]
    https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/640/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [12K]
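For context on the "boxed value is unboxed and then immediately reboxed" entry above: FindBugs flags a redundant Long-to-long-to-Long round trip. A minimal standalone illustration (not the ColumnRWHelper code itself):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class ReboxDemo {
  public static void main(String[] args) {
    Map<String, Long> timestamps = new HashMap<>();
    timestamps.put("ts", 42L);

    // Flagged pattern: unboxes to long, then immediately reboxes to Long.
    Long rebox = Long.valueOf(timestamps.get("ts").longValue());

    // Preferred: keep the value boxed as-is.
    Long direct = timestamps.get("ts");

    System.out.println(rebox.equals(direct)); // true
  }
}
{code}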
[jira] [Created] (HDFS-15249) ThrottledAsyncChecker is not thread-safe.
Toshihiro Suzuki created HDFS-15249:
-------------------------------------

Summary: ThrottledAsyncChecker is not thread-safe.
Key: HDFS-15249
URL: https://issues.apache.org/jira/browse/HDFS-15249
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Toshihiro Suzuki
Assignee: Toshihiro Suzuki

ThrottledAsyncChecker should be thread-safe because it can be used by multiple threads when we have multiple namespaces. *checksInProgress* and *completedChecks* are a HashMap and a WeakHashMap respectively, neither of which is thread-safe, so we need to put every access to them in a synchronized block.
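A minimal sketch of the fix direction described above, assuming a shared lock guarding both maps. The class shape and method names below are simplified placeholders, not the actual ThrottledAsyncChecker API:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Simplified placeholder, not the real ThrottledAsyncChecker.
public class ThrottledCheckerSketch<K, V> {
  private final Map<K, V> checksInProgress = new HashMap<>();
  private final Map<K, V> completedChecks = new WeakHashMap<>();
  private final Object lock = new Object();

  /** Returns an in-flight or cached result, or registers a fresh one. */
  public V schedule(K target, V freshResult) {
    synchronized (lock) {
      // Every read and write of both maps happens under the same lock,
      // so concurrent callers never observe a torn map state.
      V inProgress = checksInProgress.get(target);
      if (inProgress != null) {
        return inProgress;
      }
      V completed = completedChecks.get(target);
      if (completed != null) {
        return completed;
      }
      checksInProgress.put(target, freshResult);
      return freshResult;
    }
  }

  public void complete(K target, V result) {
    synchronized (lock) {
      checksInProgress.remove(target);
      completedChecks.put(target, result);
    }
  }
}
{code}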
[jira] [Resolved] (HDFS-14503) ThrottledAsyncChecker throws NPE during block pool initialization
[ https://issues.apache.org/jira/browse/HDFS-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Toshihiro Suzuki resolved HDFS-14503.
-------------------------------------
Resolution: Duplicate

> ThrottledAsyncChecker throws NPE during block pool initialization
> ------------------------------------------------------------------
>
> Key: HDFS-14503
> URL: https://issues.apache.org/jira/browse/HDFS-14503
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: Yiqun Lin
> Priority: Major
>
> ThrottledAsyncChecker throws NPE during block pool initialization. The error
> leads to the block pool registration failure.
> The exception:
> {noformat}
> 2019-05-20 01:02:36,003 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Unexpected exception in block pool Block pool (Datanode Uuid
> x) service to xx.xx.xx.xx/xx.xx.xx.xx
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$LastCheckResult.access$000(ThrottledAsyncChecker.java:211)
>     at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker.schedule(ThrottledAsyncChecker.java:129)
>     at org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker.checkAllVolumes(DatasetVolumeChecker.java:209)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3387)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1508)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It looks like this error occurs because the {{WeakHashMap}} {{completedChecks}} has
> removed the target entry while we still get that entry. Although we do a check
> before we get it, there is still a chance that the entry comes back as null.
> We met a corner case for this: in federation mode, with two block pools in the DN,
> {{ThrottledAsyncChecker}} schedules two identical health checks for the same volume.
> {noformat}
> 2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
> 2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
> {noformat}
> {{completedChecks}} cleans up the entry for one successful check after
> {{completedChecks#get}} is called. However, after this, the other check gets null.
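A simplified, non-Hadoop reproduction sketch of the check-then-act race described above: two unsynchronized threads share a WeakHashMap, mirroring the two block pool actors scheduling a check for the same volume. The window is narrow, so any given run may or may not hit the NPE:

{code:java}
import java.util.Map;
import java.util.WeakHashMap;

public class CheckThenActRace {
  static final Map<String, String> completedChecks = new WeakHashMap<>();

  static String lookup(String volume) {
    if (completedChecks.containsKey(volume)) {
      // Another thread can remove the entry right here, so get() may
      // return null even though containsKey() just returned true.
      return completedChecks.get(volume).toUpperCase(); // potential NPE
    }
    return null;
  }

  public static void main(String[] args) throws InterruptedException {
    completedChecks.put("/hadoop/2/hdfs/data/current", "ok");
    Thread reader = new Thread(() -> lookup("/hadoop/2/hdfs/data/current"));
    Thread remover = new Thread(() ->
        completedChecks.remove("/hadoop/2/hdfs/data/current"));
    reader.start();
    remover.start();
    reader.join();
    remover.join();
  }
}
{code}

Serializing all map access under one lock, as proposed in HDFS-15249 above, closes this window, which is presumably why this issue was resolved as a duplicate.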