[DISCUSS] Move HDFS specific APIs to FileSystem abstraction
Hi, Stephen and I are working on a project to make HBase run on Ozone.

HBase, born out of the Hadoop project, depends on a number of HDFS-specific APIs, including recoverLease() and isInSafeMode(). The HBase community [1] strongly voiced that they don't want the project to take a direct dependency on additional FS implementations, due to dependency and vulnerability management concerns.

To make this project successful, we're exploring options to pull these APIs up into the FileSystem abstraction. Eventually, that would make HBase FS-implementation agnostic, and perhaps enable HBase to support other storage systems in the future.

One option: use the PathCapabilities API to probe whether the underlying FS implementation supports these APIs, and then invoke the corresponding FileSystem APIs. This is straightforward, but the FileSystem class would become bloated.

Another option: create a "RecoverableFileSystem" interface, and have both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone) implement it. This way the impact on the Hadoop project and the FileSystem abstraction is even smaller.

Thoughts?

[1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
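To make the two options concrete, here is a minimal, self-contained sketch of the capability-probe pattern in plain Java. Everything here is hypothetical: RecoverableFileSystem, the capability string, and the mock classes are illustrative stand-ins, not the current Hadoop API (only hasPathCapability mirrors the real PathCapabilities method name).

```java
import java.io.IOException;

// Hypothetical "RecoverableFileSystem" interface from the second option.
interface RecoverableFileSystem {
  boolean recoverLease(String path) throws IOException;
}

// Stand-in for the FileSystem base class; hasPathCapability mirrors the
// real PathCapabilities API, which defaults to "not supported".
abstract class AbstractFs {
  boolean hasPathCapability(String path, String capability) {
    return false;
  }
}

// Stand-in for an FS (e.g. DistributedFileSystem or RootedOzoneFileSystem)
// that advertises and implements lease recovery.
class MockRecoverableFs extends AbstractFs implements RecoverableFileSystem {
  // Illustrative capability name, not a real Hadoop constant.
  static final String LEASE_RECOVERY_CAPABILITY = "fs.capability.lease.recoverable";

  @Override
  boolean hasPathCapability(String path, String capability) {
    return LEASE_RECOVERY_CAPABILITY.equals(capability);
  }

  @Override
  public boolean recoverLease(String path) {
    return true; // pretend the lease was recovered
  }
}

public class LeaseRecoveryProbe {
  // How a caller like HBase could stay FS-agnostic: probe first, then cast.
  static boolean tryRecoverLease(AbstractFs fs, String path) throws IOException {
    if (fs.hasPathCapability(path, MockRecoverableFs.LEASE_RECOVERY_CAPABILITY)
        && fs instanceof RecoverableFileSystem) {
      return ((RecoverableFileSystem) fs).recoverLease(path);
    }
    return false; // underlying FS does not support lease recovery
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tryRecoverLease(new MockRecoverableFs(), "/hbase/wal"));
  }
}
```

The design point of the second option is that FileSystem itself stays untouched: only implementations that actually support lease recovery opt into the extra interface.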
[jira] [Created] (HDFS-16957) RBF: Exit status of dfsrouteradmin -rm should be non-zero for unsuccessful attempt
Viraj Jasani created HDFS-16957:
---

Summary: RBF: Exit status of dfsrouteradmin -rm should be non-zero for unsuccessful attempt
Key: HDFS-16957
URL: https://issues.apache.org/jira/browse/HDFS-16957
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Viraj Jasani
Assignee: Viraj Jasani

DFS router admin returns a non-zero status code for an unsuccessful attempt to add or update a mount point. However, the same is not the case for removal of a mount point. For instance,

{code:java}
bin/hdfs dfsrouteradmin -add /data4 ns1 /data4
..
..
Cannot add destination at ns1 /data4

echo $?
255
{code}

{code:java}
/hadoop/bin/hdfs dfsrouteradmin -rm /data4
..
..
Cannot remove mount point /data4

echo $?
0
{code}

Removal of a mount point should stay consistent with the other options and return a non-zero (unsuccessful) status code.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
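The fix pattern the report asks for can be sketched in a few lines of plain Java: map each subcommand's success flag to the process exit status uniformly, so -rm behaves like -add/-update. This is an illustrative stand-in, not the actual RouterAdmin code.

```java
public class ExitStatusSketch {
  // Hypothetical uniform mapping: 0 on success, 255 on failure,
  // matching the status the -add path already returns.
  static int run(boolean commandSucceeded) {
    return commandSucceeded ? 0 : 255;
  }

  public static void main(String[] args) {
    // A failed removal should now surface as a non-zero status.
    System.out.println(run(false)); // prints 255
  }
}
```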
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1167/

[Mar 15, 2023, 4:33:00 AM] (github) HDFS-16942. Addendum. Send error to datanode if FBR is rejected due to bad lease (#5478). Contributed by Stephen O'Donnell
[Mar 15, 2023, 4:45:37 PM] (github) HADOOP-18654. Remove unused custom appender TaskLogAppender (#5457)
[Mar 15, 2023, 4:46:17 PM] (github) HADOOP-18649. CLA and CRLA appenders to be replaced with RFA (#5448)
[Mar 15, 2023, 4:59:55 PM] (github) HDFS-16947. RBF NamenodeHeartbeatService to report error for not being able to register namenode in state store (#5470)
[Mar 15, 2023, 5:10:42 PM] (github) HADOOP-17746. Compatibility table in directory_markers.md doesn't render right. (#3116)
[Mar 15, 2023, 8:03:22 PM] (github) HADOOP-18647. x-ms-client-request-id to identify the retry of an API. (#5437)

-1 overall

The following subsystems voted -1:
    blanks hadolint pathlen spotbugs unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

       Parsing Error(s):
       hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    spotbugs :

       The same warning is reported against module:hadoop-mapreduce-project/hadoop-mapreduce-client, module:hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core, module:hadoop-mapreduce-project, and module:root:
       Write to static field org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, ExceptionReporter, SecretKey) At Fetcher.java:[line 120]

    Failed junit tests :

       hadoop.mapreduce.v2.TestUberAM
       hadoop.mapreduce.v2.TestMRJobsWithProfiler
       hadoop.mapreduce.v2.TestMRJobs

   cc: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1167/artifact/out/results-compile-cc-root.txt [96K]

   javac: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1167/arti
[jira] [Created] (HDFS-16956) Introduce inverse quantiles for metrics where higher numeric value is better
Ravindra Dingankar created HDFS-16956:
-

Summary: Introduce inverse quantiles for metrics where higher numeric value is better
Key: HDFS-16956
URL: https://issues.apache.org/jira/browse/HDFS-16956
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, metrics
Affects Versions: 3.3.0, 3.4.0
Reporter: Ravindra Dingankar

Currently quantiles are used for latencies, where a lower numeric value is better. Hence p90 gives us a value val(p90) such that 90% of our sample set has a value better (lower) than val(p90).

However, for metrics such as transfer rates (e.g. HDFS-16917), a higher numeric value is better, so the current quantiles don't work. For these metrics, in order for p90 to give a value val(p90) where 90% of the sample set is better (higher) than val(p90), we need to invert the selection by choosing the value at the (100 - 90)th position instead of the usual 90th position.
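The proposed inversion can be sketched in plain Java: for a higher-is-better metric, the "inverse p90" is simply the value at the (100 - 90)th percentile of the sorted samples. This is a nearest-rank illustration, not the actual Hadoop MutableQuantiles implementation (which uses a streaming estimator).

```java
import java.util.Arrays;

public class InverseQuantile {
  // Nearest-rank percentile over a sorted copy of the samples
  // (lower-is-better semantics, as used for latencies today).
  static long percentile(long[] samples, double p) {
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank - 1, 0)];
  }

  // Inverse quantile for higher-is-better metrics: 90% of samples
  // are at least this value, per the selection described above.
  static long inversePercentile(long[] samples, double p) {
    return percentile(samples, 100.0 - p);
  }

  public static void main(String[] args) {
    long[] ratesMbPerSec = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
    System.out.println(percentile(ratesMbPerSec, 90));        // latency-style p90: 90
    System.out.println(inversePercentile(ratesMbPerSec, 90)); // rate-style p90: 10
  }
}
```

For the sample rates above, the latency-style p90 (90) would misleadingly suggest a high transfer rate is the 90th-percentile experience, while the inverse p90 (10) correctly says 90% of transfers ran at 10 MB/s or better.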
[jira] [Created] (HDFS-16955) Need to time out group lookup calls if delayed too long
Franklinsam Paul created HDFS-16955:
---

Summary: Need to time out group lookup calls if delayed too long
Key: HDFS-16955
URL: https://issues.apache.org/jira/browse/HDFS-16955
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Reporter: Franklinsam Paul

Similar to "hadoop.security.group.mapping.ldap.directory.search.timeout", we need a timeout for the group lookup call in the other group mapping service providers, such as org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback and org.apache.hadoop.security.ShellBasedUnixGroupsMapping. Currently, a delayed group lookup holds locks for a long time and can crash the NameNode. The goal is to time out the call and report the operation failure to the user when the group lookup is delayed.

{code:java}
2023-03-01 18:49:25,367 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=XX) took 232236 milliseconds.
2023-03-01 18:49:25,368 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of suppressed read-lock reports: 21
Longest read-lock held at 1970-01-11 13:29:34,218+0100 for 232236ms via java.lang.Thread.getStackTrace(Thread.java:1564)
{code}
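One common way to bound such a call is to run the lookup on a worker thread and cap the wait with Future.get(timeout). The sketch below is a self-contained plain-Java illustration of that pattern; BoundedGroupLookup and slowGroupLookup are hypothetical stand-ins for a shell/JNI group mapping provider call, not Hadoop code.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedGroupLookup {
  private static final ExecutorService POOL = Executors.newCachedThreadPool();

  // Run the group lookup with a hard deadline, so a hung resolver
  // cannot hold namesystem locks for minutes as in the log above.
  static List<String> getGroups(String user, long timeoutMs)
      throws TimeoutException, InterruptedException, ExecutionException {
    Future<List<String>> future = POOL.submit(() -> slowGroupLookup(user));
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck lookup thread
      throw e;             // surface the failure to the caller/user
    }
  }

  // Stand-in for a JniBasedUnixGroupsMappingWithFallback /
  // ShellBasedUnixGroupsMapping call; instant here for illustration.
  static List<String> slowGroupLookup(String user) {
    return List.of(user + "-group");
  }

  public static void main(String[] args) throws Exception {
    System.out.println(getGroups("alice", 1000));
    POOL.shutdown();
  }
}
```

In a real provider the interrupted thread may still linger (shell/JNI calls are not always interruptible), so the pool keeps the NameNode handler threads free even when a lookup never returns.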
[jira] [Created] (HDFS-16954) RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw IOException
Max Xie created HDFS-16954:
---

Summary: RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw IOException
Key: HDFS-16954
URL: https://issues.apache.org/jira/browse/HDFS-16954
Project: Hadoop HDFS
Issue Type: Bug
Components: rbf
Affects Versions: 3.4.0
Reporter: Max Xie

The operation of renaming a multi-subcluster directory to a single-cluster directory may cause inconsistent behavior of the file system. This operation should throw an exception to be reasonable. An example is as follows:

1. Add a hash_all mount point:
`hdfs dfsrouteradmin -add /tmp/foo subcluster1,subcluster2 /tmp/foo -order HASH_ALL`
2. Add a single-subcluster mount point:
`hdfs dfsrouteradmin -add /user/foo subcluster1 /user/foo`
3. Mkdir a dir for all subclusters:
`hdfs dfs -mkdir /tmp/foo/123`
4. Check the dir; all subclusters will have dir `/tmp/foo/123`:
`hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`
`hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will show dir `hdfs://subcluster1/tmp/foo/123`
`hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir `hdfs://subcluster2/tmp/foo/123`
5. Rename `/tmp/foo/123` to `/user/foo/123`; the op will succeed:
`hdfs dfs -mv /tmp/foo/123 /user/foo/123`
6. Check the dirs again; the RBF cluster still shows dir `/tmp/foo/123`:
`hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`
`hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will show no dirs
`hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir `hdfs://subcluster2/tmp/foo/123`

Step 5 should throw an exception.
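The guard the report asks for in step 5 can be sketched as follows: resolve both paths against the mount table and reject the rename when the source spans subclusters the destination does not cover. Everything here (the toy mount table, resolveSubclusters, checkRename) is a hypothetical illustration, not the actual router resolver code.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Set;

public class RenameGuard {
  // Toy mount table from the repro: mount path -> subclusters it spans.
  static final Map<String, Set<String>> MOUNTS = Map.of(
      "/tmp/foo", Set.of("subcluster1", "subcluster2"), // HASH_ALL mount
      "/user/foo", Set.of("subcluster1"));              // single subcluster

  static Set<String> resolveSubclusters(String path) {
    for (Map.Entry<String, Set<String>> e : MOUNTS.entrySet()) {
      if (path.startsWith(e.getKey())) {
        return e.getValue();
      }
    }
    return Set.of(); // unresolved path
  }

  // Reject renames whose source spans subclusters the destination lacks,
  // which is exactly the inconsistent case in steps 5-6 above.
  static void checkRename(String src, String dst) throws IOException {
    if (!resolveSubclusters(dst).containsAll(resolveSubclusters(src))) {
      throw new IOException("Rename of " + src + " to " + dst
          + " crosses subclusters not covered by the destination mount");
    }
  }

  public static void main(String[] args) throws IOException {
    checkRename("/user/foo/a", "/user/foo/b"); // same subcluster: allowed
    try {
      checkRename("/tmp/foo/123", "/user/foo/123"); // step 5: rejected
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```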