[jira] [Created] (HDFS-17332) DFSInputStream: avoid logging the stacktrace until we really need to fail a read request with a BlockMissingException
Xing Lin created HDFS-17332:
-------------------------------------------
Summary: DFSInputStream: avoid logging the stacktrace until we really need to fail a read request with a BlockMissingException
Key: HDFS-17332
URL: https://issues.apache.org/jira/browse/HDFS-17332
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Reporter: Xing Lin

In DFSInputStream#actualGetFromOneDataNode(), we send the exception stacktrace to dfsClient.LOG whenever a read from a DN fails. However, in most cases the read request is then served successfully by reading from the next available DN. The stacktrace in the log has led multiple Hadoop users at LinkedIn to mistake this WARN message for the root cause/fatal error of their jobs. We would like to improve the log message and avoid sending the stacktrace to dfsClient.LOG when a read eventually succeeds. The stacktrace from reading each DN is sent to the log only when we really need to fail the read request (when chooseDataNode()/refetchLocations() throws a BlockMissingException).
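For illustration, the proposed pattern might look like the following sketch (simplified names, not the actual DFSInputStream code): per-DN failures get a one-line WARN, and the accumulated stacktraces are emitted only once every replica has failed.
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: log one-line WARNs per replica, keep the exceptions,
// and emit stacktraces only when the read must finally be failed.
class DeferredReadLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(DeferredReadLogging.class);

  interface Replica {
    byte[] read() throws IOException;
  }

  byte[] readWithFailover(List<Replica> replicas) throws IOException {
    List<IOException> failures = new ArrayList<>();
    for (Replica r : replicas) {
      try {
        return r.read();
      } catch (IOException e) {
        // No stacktrace here: the next replica will usually serve the read.
        LOG.warn("Read from replica failed: {}", e.toString());
        failures.add(e);
      }
    }
    // All replicas failed: only now do the stacktraces go to the log.
    IOException missing = new IOException("No live replica for block");
    failures.forEach(missing::addSuppressed);
    LOG.error("Failing read request", missing);
    throw missing;
  }
}
{code}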
[jira] [Created] (HDFS-17286) Add UDP as a transfer protocol for HDFS
Xing Lin created HDFS-17286:
-------------------------------------------
Summary: Add UDP as a transfer protocol for HDFS
Key: HDFS-17286
URL: https://issues.apache.org/jira/browse/HDFS-17286
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Reporter: Xing Lin

Right now, every connection in HDFS goes through RPC/IPC, which is built on TCP. Connections are re-used based on ConnectionID, which includes RpcTimeout as part of the key identifying a connection. The consequence is that using a different RPC timeout between the same two hosts creates a separate TCP connection.

A use case which motivated us to consider UDP is getHAServiceState() in ObserverReadProxyProvider. We'd like getHAServiceState() to time out with a much smaller threshold and then move on to probe the next NameNode. To support this, we used an ExecutorService and set a timeout for the task in HDFS-17030. That implementation could be improved by using UDP to query the HAServiceState. getHAServiceState() does not have to be very reliable, as we can always fall back to the active.

Another motivation is that roughly 5~10% of the RPC calls hitting our active/observer NameNodes are getHAServiceState(). If we can move them off to a UDP server, that should improve RPC latency.
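To make the idea concrete, a best-effort UDP probe with a short timeout might look like the sketch below. The wire format and endpoint are hypothetical — HDFS has no UDP server today; adding one is what this issue proposes.
{code:java}
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Illustration only: query a (hypothetical) NN UDP endpoint for its HA state.
class UdpHaStateProbe {
  static String probeHAState(InetSocketAddress nnUdpAddr, int timeoutMs) {
    try (DatagramSocket socket = new DatagramSocket()) {
      socket.setSoTimeout(timeoutMs); // bound the wait, then move on
      byte[] req = "getHAServiceState".getBytes(StandardCharsets.UTF_8);
      socket.send(new DatagramPacket(req, req.length, nnUdpAddr));
      byte[] buf = new byte[64];
      DatagramPacket resp = new DatagramPacket(buf, buf.length);
      socket.receive(resp); // e.g. "active", "standby", "observer"
      return new String(resp.getData(), 0, resp.getLength(),
          StandardCharsets.UTF_8);
    } catch (SocketTimeoutException e) {
      return null; // unknown state: caller probes the next NN
    } catch (IOException e) {
      return null; // treat errors as unknown; fall back to the active
    }
  }
}
{code}
A lost or delayed datagram simply means "state unknown", which is acceptable here because the caller can always fall back to trying the active.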
[jira] [Created] (HDFS-17281) Add support for reporting RPC round-trip time at the NN
Xing Lin created HDFS-17281:
-------------------------------------------
Summary: Add support for reporting RPC round-trip time at the NN
Key: HDFS-17281
URL: https://issues.apache.org/jira/browse/HDFS-17281
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Reporter: Xing Lin
Assignee: Xing Lin
[jira] [Created] (HDFS-17118) Fix minor checkstyle warnings in TestObserverReadProxyProvider
Xing Lin created HDFS-17118:
-------------------------------------------
Summary: Fix minor checkstyle warnings in TestObserverReadProxyProvider
Key: HDFS-17118
URL: https://issues.apache.org/jira/browse/HDFS-17118
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Affects Versions: 3.4.0
Reporter: Xing Lin

We noticed a few checkstyle warnings when backporting HDFS-17030 from trunk to branch-3.3. The Yetus build was not stable at the time and we did not notice the newly added checkstyle warnings.

PR: https://github.com/apache/hadoop/pull/5700
[jira] [Created] (HDFS-17067) allowCoreThreadTimeOut should be set to true for nnProbingThreadPool in ObserverReadProxyProvider
Xing Lin created HDFS-17067:
-------------------------------------------
Summary: allowCoreThreadTimeOut should be set to true for nnProbingThreadPool in ObserverReadProxyProvider
Key: HDFS-17067
URL: https://issues.apache.org/jira/browse/HDFS-17067
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Affects Versions: 3.4.0
Reporter: Xing Lin
Assignee: Xing Lin

In HDFS-17030, we introduced an ExecutorService to submit getHAServiceState() requests. We constructed the ExecutorService directly from a basic ThreadPoolExecutor, without setting _allowCoreThreadTimeOut_ to true. As a result, the core thread is kept up and running even after the main thread exits. One fix is to set _allowCoreThreadTimeOut_ to true. However, in this PR, we decided to use an existing ExecutorService implementation in Hadoop (_BlockingThreadPoolExecutorService_) instead; it takes care of setting _allowCoreThreadTimeOut_ and also allows setting a thread-name prefix.

A second, minor issue is that we did not shut down the ExecutorService in close(). It is minor because close() is only called when the garbage collector starts to reclaim an ObserverReadProxyProvider object, not when the last reference to the object goes away. The time between when an ObserverReadProxyProvider becomes unreferenced and when the garbage collector actually reclaims it is out of our control/undefined (unless the program shuts down with an explicit System.exit(1)).

{code:java}
private final ExecutorService nnProbingThreadPool =
    new ThreadPoolExecutor(1, 4, 1L, TimeUnit.MINUTES,
        new ArrayBlockingQueue<Runnable>(1024));
{code}
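For reference, the in-place variant of the fix is a one-liner on the pool above (sketch; the actual patch switches to BlockingThreadPoolExecutorService instead):
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ProbingPool {
  // Same pool as before, but idle core threads are now allowed to time out,
  // so the pool can no longer keep the JVM alive after the main thread exits.
  static ExecutorService newProbingPool() {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 4, 1L, TimeUnit.MINUTES,
        new ArrayBlockingQueue<Runnable>(1024));
    pool.allowCoreThreadTimeOut(true); // idle core threads exit after 1 minute
    return pool;
  }
}
{code}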
[jira] [Created] (HDFS-17055) Export HAState as a metric from Namenode for monitoring
Xing Lin created HDFS-17055:
-------------------------------------------
Summary: Export HAState as a metric from Namenode for monitoring
Key: HDFS-17055
URL: https://issues.apache.org/jira/browse/HDFS-17055
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Affects Versions: 3.4.0, 3.3.9
Reporter: Xing Lin

We'd like to measure the uptime of our NameNodes: the percentage of time the active/standby/observer node is available (up and running). We could monitor a namenode from an external service, such as ZKFC, but that would require the external service itself to be 100% available; whenever that third-party monitoring service is down, we would have no information on whether our NameNodes are still up.

We propose a different approach: emit the NameNode state directly from the namenode itself. Whenever we miss a data point for this metric, we consider the corresponding namenode down/not available. In other words, we assume the metric collection/monitoring infrastructure is 100% reliable.

One implementation detail: in Hadoop, we have the _NameNodeMetrics_ class, which is used to emit all metrics for _NameNode.java_. However, we don't think that is a good place to emit the NameNode HAState. HAState is stored in NameNode.java and we should emit it directly from NameNode.java; otherwise we would duplicate this info in two classes and have to keep them in sync. Besides, the _NameNodeMetrics_ class does not hold a reference to the _NameNode_ object it belongs to: a _NameNodeMetrics_ is created by a _static_ function _initMetrics()_ in _NameNode.java_. We shouldn't emit the HA state from FSNamesystem.java either, as it is initialized from NameNode.java and all state transitions are implemented in NameNode.java.
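As an illustration only, a metrics2-style gauge of this kind could look like the sketch below; the class name, the numeric encoding, and the getServiceState() accessor are assumptions for this sketch, not the actual HDFS-17055 patch.
{code:java}
import org.apache.hadoop.hdfs.server.namenode.NameNode;
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;

// Hypothetical sketch: an annotated metrics source, registered from
// NameNode.java itself, that reports the HA state as a numeric gauge.
@Metrics(context = "dfs")
class NameNodeStateSource {
  private final NameNode nn;

  NameNodeStateSource(NameNode nn) {
    this.nn = nn;
  }

  // Assumed encoding via enum ordinal:
  // INITIALIZING=0, ACTIVE=1, STANDBY=2, OBSERVER=3.
  @Metric("Current HA state of this NameNode")
  public int getNNHAState() {
    return nn.getServiceState().ordinal();
  }
}
{code}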
[jira] [Created] (HDFS-17042) Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode
Xing Lin created HDFS-17042:
-------------------------------------------
Summary: Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode
Key: HDFS-17042
URL: https://issues.apache.org/jira/browse/HDFS-17042
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Affects Versions: 3.4.0, 3.3.9
Reporter: Xing Lin
Assignee: Xing Lin

We'd like to add two new types of metrics to the existing RpcMetrics/RpcDetailedMetrics; a sketch of the error-rate derivation follows this list.

* _RpcCallSuccesses_: the number of RPC requests that are successfully processed by a NN (i.e., answered with an _RpcStatusProto.SUCCESS_ response). Together with _RpcQueueNumOps_ (the total number of RPC requests), we can then derive the RpcErrorRate for our NN as (RpcQueueNumOps - RpcCallSuccesses) / RpcQueueNumOps.
* _OverallRpcProcessingTime_ for each RPC method: this metric measures the overall RPC processing time for each RPC method at the NN. It covers the time from when a request arrives at the NN to when the response is sent back. We already emit processingTime for each RPC method today in RpcDetailedMetrics; we want to extend it to also emit overallRpcProcessingTime for each RPC method, which includes enqueueTime, queueTime, processingTime, responseTime, and handlerTime.
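The error-rate derivation is simple arithmetic over the two counters (variable names hypothetical):
{code:java}
// Sketch: deriving RpcErrorRate from the two counters described above.
static double rpcErrorRate(long rpcQueueNumOps, long rpcCallSuccesses) {
  if (rpcQueueNumOps == 0) {
    return 0.0; // no traffic yet; avoid division by zero
  }
  return (double) (rpcQueueNumOps - rpcCallSuccesses) / rpcQueueNumOps;
}
{code}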
[jira] [Created] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReadProxyProvider
Xing Lin created HDFS-17030:
-------------------------------------------
Summary: Limit wait time for getHAServiceState in ObserverReadProxyProvider
Key: HDFS-17030
URL: https://issues.apache.org/jira/browse/HDFS-17030
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Affects Versions: 3.4.0
Reporter: Xing Lin

When HA is enabled and a standby NN is not responsive (either because it is down or because a heap dump is being taken), we wait for either _socket_connection_timeout * socket_max_retries_on_connection_timeout_ or _rpcTimeOut_ before moving on to the next NN. This adds significant latency. For clusters at LinkedIn, we set rpcTimeOut to 120 seconds, so a request can take more than 2 minutes to complete when we take a heap dump on a standby. This has been causing user job failures.

The proposal is to add a timeout on getHAServiceState() calls in ObserverReadProxyProvider: we wait at most that long for an NN to report its HA state, and once the timeout passes, we move on to the next NN.
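A minimal sketch of the idea, assuming a hypothetical probe callable (the actual patch wires this into ObserverReadProxyProvider):
{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;

// Sketch: bound how long we wait for one NN to report its HA state.
// A null return means "unknown"; the caller moves on to the next NN.
class BoundedHAStateProbe {
  static HAServiceState probe(ExecutorService pool,
      Callable<HAServiceState> getHAServiceState, long timeoutMs) {
    Future<HAServiceState> future = pool.submit(getHAServiceState);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // stop waiting on an unresponsive NN
      return null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    } catch (ExecutionException e) {
      return null; // the probe itself failed; treat as unknown
    }
  }
}
{code}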
[jira] [Created] (HDFS-16852) Swallow IllegalStateException in KeyProviderCache
Xing Lin created HDFS-16852:
-------------------------------------------
Summary: Swallow IllegalStateException in KeyProviderCache
Key: HDFS-16852
URL: https://issues.apache.org/jira/browse/HDFS-16852
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Reporter: Xing Lin

When an HDFS client is created, it registers a shutdown hook with the ShutdownHookManager. ShutdownHookManager doesn't allow adding a new shutdown hook when the process is already in shutdown; it throws an IllegalStateException instead.

This behavior is not ideal when a Spark program fails during pre-launch. In that case, during shutdown, Spark calls cleanupStagingDir() to clean the staging dir. In cleanupStagingDir(), it creates a FileSystem object to talk to HDFS. Since this is the first use of a FileSystem object in that process, an HDFS client has to be created, which registers the shutdown hook, and we then hit the IllegalStateException. This IllegalStateException masks the actual exception that caused the Spark program to fail during pre-launch.

We propose to swallow the IllegalStateException in KeyProviderCache and log a warning. The TCP connection between the client and the NameNode will be closed by the OS when the process shuts down.

Example stacktrace:
{code:java}
13-09-2022 14:39:42 PDT INFO - 22/09/13 21:39:41 ERROR util.Utils: Uncaught exception in thread shutdown-hook-0
13-09-2022 14:39:42 PDT INFO - java.lang.IllegalStateException: Shutdown in progress, cannot add a shutdownHook
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:299)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.KeyProviderCache.<init>(KeyProviderCache.java:71)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.ClientContext.<init>(ClientContext.java:130)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.ClientContext.get(ClientContext.java:167)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:383)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:287)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:159)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3261)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3310)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3278)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475)
13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.deploy.yarn.ApplicationMaster.cleanupStagingDir(ApplicationMaster.scala:675)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.deploy.yarn.ApplicationMaster.$anonfun$run$2(ApplicationMaster.scala:259)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
13-09-2022 14:39:42 PDT INFO - at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2023)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
13-09-2022 14:39:42 PDT INFO - at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
13-09-2022 14:39:42 PDT INFO - at scala.util.Try$.apply(Try.scala:213)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
13-09-2022 14:39:42 PDT INFO - at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
13-09-2022 14:39:42 PDT INFO - at java.util.concurrent.FutureTask.run(FutureTask.java:266)
13-09-2022 14:39:42 PDT INFO - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
13-09-2022 14:39:42 PDT INFO - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
13-09-2022 14:39:42 PDT INFO - at java.lang.Thread.run(Thread.java:748)
{code}
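A minimal sketch of the proposed handling (the hook class, priority constant, and logger are assumptions for illustration):
{code:java}
// Sketch: tolerate hook registration during shutdown instead of
// propagating IllegalStateException to the caller.
try {
  ShutdownHookManager.get().addShutdownHook(
      new KeyProviderCacheFinalizer(), SHUTDOWN_HOOK_PRIORITY);
} catch (IllegalStateException e) {
  // Already shutting down: skip the hook; the OS will close the client's
  // TCP connections when the process exits.
  LOG.warn("Skipped adding KeyProviderCache shutdown hook: {}",
      e.getMessage());
}
{code}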
[jira] [Created] (HDFS-16818) RBF: non-deterministic unit test failures in TestRouterRPCMultipleDestinationMountTableResolver
Xing Lin created HDFS-16818:
-------------------------------------------
Summary: RBF: non-deterministic unit test failures in TestRouterRPCMultipleDestinationMountTableResolver
Key: HDFS-16818
URL: https://issues.apache.org/jira/browse/HDFS-16818
Project: Hadoop HDFS
Issue Type: Bug
Components: rbf
Affects Versions: 3.4.0
Reporter: Xing Lin

TestRouterRPCMultipleDestinationMountTableResolver fails nondeterministically when run repeatedly. I ran the following command 10+ times against 454157a3844cdd6c92ef650af6c3b323cbec88af in trunk and observed two types of failed runs.
{code:java}
mvn test -Dtest="TestRouterRPCMultipleDestinationMountTableResolver"{code}
Failed run 1 output:
{code:java}
[ERROR] Failures:
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationHashAllOrder:177->testInvocation:221->testDirectoryAndFileLevelInvocation:296->verifyDirectoryLevelInvocations:395 expected:<[COLD]> but was:<[HOT]>
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationHashOrder:193->testInvocation:221->testDirectoryAndFileLevelInvocation:298->verifyDirectoryLevelInvocations:395 expected:<[COLD]> but was:<[HOT]>
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationLocalOrder:201->testInvocation:221->testDirectoryAndFileLevelInvocation:296->verifyDirectoryLevelInvocations:395 expected:<[COLD]> but was:<[HOT]>
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationRandomOrder:185->testInvocation:221->testDirectoryAndFileLevelInvocation:296->verifyDirectoryLevelInvocations:395 expected:<[COLD]> but was:<[HOT]>
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationSpaceOrder:169->testInvocation:221->testDirectoryAndFileLevelInvocation:296->verifyDirectoryLevelInvocations:395 expected:<[COLD]> but was:<[HOT]>
[INFO]
[ERROR] Tests run: 18, Failures: 5, Errors: 0, Skipped: 0{code}
Failed run 2 output:
{code:java}
[ERROR] Failures:
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testECMultipleDestinations:430
[ERROR] Errors:
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationHashAllOrder:177->testInvocation:221->testDirectoryAndFileLevelInvocation:296->verifyDirectoryLevelInvocations:397 NullPointer
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationHashOrder:193->testInvocation:221->testDirectoryAndFileLevelInvocation:298->verifyDirectoryLevelInvocations:397 NullPointer
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationLocalOrder:201->testInvocation:221->testDirectoryAndFileLevelInvocation:296->verifyDirectoryLevelInvocations:397 NullPointer
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationRandomOrder:185->testInvocation:221->testDirectoryAndFileLevelInvocation:296->verifyDirectoryLevelInvocations:397 NullPointer
[ERROR] TestRouterRPCMultipleDestinationMountTableResolver.testInvocationSpaceOrder:169->testInvocation:221->testDirectoryAndFileLevelInvocation:296->verifyDirectoryLevelInvocations:397 NullPointer
[INFO]
[ERROR] Tests run: 18, Failures: 1, Errors: 5, Skipped: 0{code}
[jira] [Created] (HDFS-16816) RBF: auto-create user home dir for trash paths by router
Xing Lin created HDFS-16816:
-------------------------------------------
Summary: RBF: auto-create user home dir for trash paths by router
Key: HDFS-16816
URL: https://issues.apache.org/jira/browse/HDFS-16816
Project: Hadoop HDFS
Issue Type: Improvement
Components: rbf
Reporter: Xing Lin

In RBF, trash files are moved to the trash root under the user's home dir at the corresponding namespace/namenode where the files reside. This was added in HDFS-16024. When the user's home dir has not been created beforehand at a namenode, we run into permission-denied exceptions when trying to create the parent dir for the trash file before moving the file into it. We propose to enhance the Router to auto-create a user's home dir at the namenode for trash paths, using the router's identity (which is assumed to be a super-user).
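A minimal sketch of the proposed behavior, with hypothetical names (the real change would live in the Router's rename/trash path handling):
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch: before moving a file to trash, ensure the user's home dir exists
// on the target namenode, created under the router's (super-user) identity.
class TrashRootHelper {
  static void ensureHomeDir(FileSystem fs, String user) throws Exception {
    Path homeDir = new Path("/user/" + user);
    UserGroupInformation routerUser = UserGroupInformation.getLoginUser();
    routerUser.doAs((PrivilegedExceptionAction<Void>) () -> {
      if (!fs.exists(homeDir)) {
        fs.mkdirs(homeDir, new FsPermission((short) 0700));
        fs.setOwner(homeDir, user, user); // hand the dir over to the user
      }
      return null;
    });
  }
}
{code}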
[jira] [Created] (HDFS-16790) RBF: wrong path when destination dir is not created
Xing Lin created HDFS-16790:
-------------------------------------------
Summary: RBF: wrong path when destination dir is not created
Key: HDFS-16790
URL: https://issues.apache.org/jira/browse/HDFS-16790
Project: Hadoop HDFS
Issue Type: Bug
Components: rbf
Affects Versions: 3.4.0
Reporter: Xing Lin

Mount table at the router:
{code:java}
$HADOOP_HOME/bin/hdfs dfsrouteradmin -ls
/data1    ns1->/data
/data2    ns2->/data
/data3    ns3->/data {code}
At a client node, when /data is not created in ns2, the error message shows a wrong path (/data2/data2 instead of /data2):
{code:java}
utos@c01:/usr/local/bin/hadoop-3.4.0-SNAPSHOT$ bin/hadoop dfs -ls hdfs://ns-fed/data2
ls: File hdfs://ns-fed/data2/data2 does not exist.
utos@c01:/usr/local/bin/hadoop-3.4.0-SNAPSHOT$ bin/hadoop dfs -ls hdfs://ns-fed/data3
-rw-r--r--   3 utos supergroup   0 2022-10-02 17:35 hdfs://ns-fed/data3/file3 {code}
[jira] [Created] (HDFS-16128) Add support for saving/loading an FS Image
Xing Lin created HDFS-16128:
-------------------------------------------
Summary: Add support for saving/loading an FS Image
Key: HDFS-16128
URL: https://issues.apache.org/jira/browse/HDFS-16128
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs, namenode
Reporter: Xing Lin

We aim to enable fine-grained locking by splitting the in-memory namespace into multiple partitions, each with a separate lock. This is intended to improve the performance of NameNode write operations.
[jira] [Created] (HDFS-16125) Iterator for PartitionedGSet visits the first partition twice
Xing Lin created HDFS-16125:
-------------------------------------------
Summary: Iterator for PartitionedGSet visits the first partition twice
Key: HDFS-16125
URL: https://issues.apache.org/jira/browse/HDFS-16125
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs, namenode
Reporter: Xing Lin

The iterator in PartitionedGSet visits the first partition twice, because we do not advance the keyIterator to the first key during initialization.
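A simplified sketch of the two-level iterator and the fix (illustrative only, not the actual PartitionedGSet code): the constructor must consume the first key from keyIterator when it grabs the first partition's iterator; otherwise the outer loop re-fetches partition 0 after it is exhausted.
{code:java}
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.NoSuchElementException;

// Sketch: iterate over all entries of a key->partition map.
class PartitionedIterator<E> implements Iterator<E> {
  private final NavigableMap<Integer, List<E>> partitions;
  private final Iterator<Integer> keyIterator;
  private Iterator<E> partitionIterator;

  PartitionedIterator(NavigableMap<Integer, List<E>> partitions) {
    this.partitions = partitions;
    this.keyIterator = partitions.keySet().iterator();
    if (keyIterator.hasNext()) {
      // The fix: consuming the first key here keeps keyIterator in sync
      // with partitionIterator, so partition 0 is not visited twice.
      partitionIterator = partitions.get(keyIterator.next()).iterator();
    }
  }

  @Override
  public boolean hasNext() {
    while (partitionIterator != null && !partitionIterator.hasNext()) {
      if (!keyIterator.hasNext()) {
        return false;
      }
      partitionIterator = partitions.get(keyIterator.next()).iterator();
    }
    return partitionIterator != null && partitionIterator.hasNext();
  }

  @Override
  public E next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return partitionIterator.next();
  }
}
{code}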