[DISCUSS] Move HDFS-specific APIs to the FileSystem abstraction

2023-03-16 Thread Wei-Chiu Chuang
Hi,

Stephen and I are working on a project to make HBase run on Ozone.

HBase, born out of the Hadoop project, depends on a number of HDFS-specific
APIs, including recoverLease() and isInSafeMode(). The HBase community [1]
strongly voiced that they don't want the project to have a direct dependency
on additional FS implementations, due to dependency and vulnerability
management concerns.

To make this project successful, we're exploring options to push these APIs
up into the FileSystem abstraction. Eventually, this would make HBase
agnostic to the FS implementation, and perhaps enable HBase to support other
storage systems in the future.

We'd use the PathCapabilities API to probe whether the underlying FS
implementation supports these APIs, and would then invoke the corresponding
FileSystem APIs. This is straightforward, but the FileSystem class would
become bloated.
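
For illustration, a minimal sketch of that probe-then-invoke pattern. The
capability key "fs.capability.lease.recoverable" and a pushed-up
FileSystem#recoverLease(Path) are assumptions here, not existing API;
hasPathCapability() itself is real FileSystem API (Hadoop 3.2+):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class LeaseRecoveryUtil {
  // Hypothetical capability key for this sketch.
  private static final String LEASE_RECOVERABLE =
      "fs.capability.lease.recoverable";

  public static void recoverIfSupported(FileSystem fs, Path file)
      throws IOException {
    // Probe first: only call the (proposed) API if the FS advertises it.
    if (!fs.hasPathCapability(file, LEASE_RECOVERABLE)) {
      throw new UnsupportedOperationException(
          fs.getUri() + " does not advertise lease recovery");
    }
    // fs.recoverLease(file);  // the pushed-up API this proposal would add
  }
}
{code}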

Another option is to create a "RecoverableFileSystem" interface and have both
DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone) implement it.
This way the impact on the Hadoop project and the FileSystem abstraction is
even smaller.
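
For example, such an interface might look roughly like the strawman below;
the name and method set simply mirror DistributedFileSystem's recoverLease()
and isInSafeMode(), and nothing here is settled API:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

// Strawman: any FileSystem able to recover leases and report safe mode
// (DistributedFileSystem, RootedOzoneFileSystem, ...) would implement this.
public interface RecoverableFileSystem {
  /** Start lease recovery; true once the lease on the file is released. */
  boolean recoverLease(Path file) throws IOException;

  /** @return true if the underlying store is currently in safe mode. */
  boolean isInSafeMode() throws IOException;
}
{code}

HBase could then stay implementation agnostic with a plain instanceof check:

{code:java}
// fs is any org.apache.hadoop.fs.FileSystem; path is e.g. a WAL file.
if (fs instanceof RecoverableFileSystem) {
  ((RecoverableFileSystem) fs).recoverLease(path);
}
{code}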

Thoughts?

[1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv


[jira] [Created] (HDFS-16957) RBF: Exit status of dfsrouteradmin -rm should be non-zero for unsuccessful attempt

2023-03-16 Thread Viraj Jasani (Jira)
Viraj Jasani created HDFS-16957:
---

 Summary: RBF: Exit status of dfsrouteradmin -rm should be non-zero 
for unsuccessful attempt
 Key: HDFS-16957
 URL: https://issues.apache.org/jira/browse/HDFS-16957
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Viraj Jasani
Assignee: Viraj Jasani


DFS router admin returns a non-zero status code for an unsuccessful attempt to 
add or update a mount point. However, the same is not true for the removal of a 
mount point.

For instance,
{code:java}
bin/hdfs dfsrouteradmin -add /data4 ns1 /data4
..
..

Cannot add destination at ns1 /data4


echo $?
255 {code}
{code:java}
/hadoop/bin/hdfs dfsrouteradmin -rm /data4
..
..
Cannot remove mount point /data4


echo $?
0{code}
Removal of a mount point should be consistent with the other options and return 
a non-zero (unsuccessful) status code.
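
For illustration, a hypothetical sketch of the expected behavior; this is not 
the actual RouterAdmin code, and removeMountPoint() below is a stub standing in 
for the real mount-table removal call:
{code:java}
public class RemoveMountExitStatus {
  // Stub standing in for the real mount-table removal RPC.
  static boolean removeMountPoint(String path) {
    return false; // simulate "Cannot remove mount point"
  }

  // -rm should fold failure into the exit status the way -add/-update do.
  static int runRemove(String path) {
    if (!removeMountPoint(path)) {
      System.err.println("Cannot remove mount point " + path);
      return -1; // maps to shell status 255, matching the -add example
    }
    return 0;
  }

  public static void main(String[] args) {
    System.exit(runRemove(args.length > 0 ? args[0] : "/data4"));
  }
}
{code}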






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2023-03-16 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1167/

[Mar 15, 2023, 4:33:00 AM] (github) HDFS-16942. Addendum. Send error to 
datanode if FBR is rejected due to bad lease (#5478). Contributed by Stephen 
O'Donnell.
[Mar 15, 2023, 4:45:37 PM] (github) HADOOP-18654. Remove unused custom appender 
TaskLogAppender (#5457)
[Mar 15, 2023, 4:46:17 PM] (github) HADOOP-18649. CLA and CRLA appenders to be 
replaced with RFA (#5448)
[Mar 15, 2023, 4:59:55 PM] (github) HDFS-16947. RBF NamenodeHeartbeatService to 
report error for not being able to register namenode in state store (#5470)
[Mar 15, 2023, 5:10:42 PM] (github) HADOOP-17746. Compatibility table in 
directory_markers.md doesn't render right. (#3116)
[Mar 15, 2023, 8:03:22 PM] (github) HADOOP-18647. x-ms-client-request-id to 
identify the retry of an API. (#5437)




-1 overall


The following subsystems voted -1:
blanks hadolint pathlen spotbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

spotbugs :

   module:hadoop-mapreduce-project/hadoop-mapreduce-client 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

spotbugs :

   
module:hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

spotbugs :

   module:hadoop-mapreduce-project 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

spotbugs :

   module:root 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

Failed junit tests :

   hadoop.mapreduce.v2.TestUberAM 
   hadoop.mapreduce.v2.TestMRJobsWithProfiler 
   hadoop.mapreduce.v2.TestMRJobs 
  

   cc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1167/artifact/out/results-compile-cc-root.txt
 [96K]

   javac:

  

[jira] [Created] (HDFS-16956) Introduce inverse quantiles for metrics where higher numeric value is better

2023-03-16 Thread Ravindra Dingankar (Jira)
Ravindra Dingankar created HDFS-16956:
-

 Summary: Introduce inverse quantiles for metrics where higher 
numeric value is better
 Key: HDFS-16956
 URL: https://issues.apache.org/jira/browse/HDFS-16956
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, metrics
Affects Versions: 3.3.0, 3.4.0
Reporter: Ravindra Dingankar


Currently, quantiles are used for latencies, where a lower numeric value is better.

Hence, p90 gives us a value val(p90) such that 90% of our sample set has a value 
better (lower) than val(p90).

 

However, for metrics such as transfer rates (e.g. HDFS-16917), a higher numeric 
value is better, so the current quantiles don't work for such metrics.

For these metrics, in order for p90 to give a value val(p90) where 90% of the 
sample set is better (higher) than val(p90), we need to invert the selection by 
choosing the value at the (100 - 90)th position instead of the usual 90th 
position.
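
For illustration, a self-contained sketch of the inverse selection; this is not 
the MutableQuantiles implementation, just the indexing idea:
{code:java}
import java.util.Arrays;

public final class InverseQuantileSketch {
  // For higher-is-better samples, return v such that roughly p% of the
  // samples are better (higher) than v: read the ascending-sorted array
  // at the (100 - p)th position instead of the pth.
  static long inverseQuantile(long[] samples, int p) {
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    int idx = ((100 - p) * sorted.length) / 100;
    return sorted[Math.min(idx, sorted.length - 1)];
  }

  public static void main(String[] args) {
    long[] ratesMbps = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
    // p90 for transfer rates: ~90% of samples are at or above the result.
    System.out.println(inverseQuantile(ratesMbps, 90)); // prints 20
  }
}
{code}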






[jira] [Created] (HDFS-16955) Need to time out group lookup calls if delayed too long

2023-03-16 Thread Franklinsam Paul (Jira)
Franklinsam Paul created HDFS-16955:
---

 Summary: Need to time out group lookup calls if delayed too long
 Key: HDFS-16955
 URL: https://issues.apache.org/jira/browse/HDFS-16955
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Franklinsam Paul


Similar to "hadoop.security.group.mapping.ldap.directory.search.timeout" we 
need timeout to be set for group lookup call in other " group mapping service 
providers" such as 
*org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback and* 
*org.apache.hadoop.security.ShellBasedUnixGroupsMapping.* 

 

Currently, a delayed group lookup holds locks for a long time and can crash the 
NameNode. The proposal is to time out the call and report the operation failure 
to the user when the group lookup is delayed; a rough sketch follows the log 
excerpt below.

 
{code:java}
2023-03-01 18:49:25,367 WARN org.apache.hadoop.security.Groups: Potential 
performance problem: getGroups(user=XX) took 232236 milliseconds.
2023-03-01 18:49:25,368 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:   Number of suppressed 
read-lock reports: 21
Longest read-lock held at 1970-01-11 13:29:34,218+0100 for 232236ms via 
java.lang.Thread.getStackTrace(Thread.java:1564) {code}
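
One possible shape for such a timeout, sketched with a plain Future-based 
deadline; this is illustrative only, not existing Hadoop code:
{code:java}
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedGroupLookup {
  private static final ExecutorService POOL = Executors.newCachedThreadPool();

  // Bound the provider call so a slow lookup fails the user's operation
  // instead of holding namesystem locks for minutes.
  static List<String> getGroupsWithTimeout(Callable<List<String>> lookup,
      long timeoutMs) throws IOException {
    Future<List<String>> future = POOL.submit(lookup);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck lookup thread
      throw new IOException("group lookup exceeded " + timeoutMs + " ms", e);
    } catch (Exception e) {
      throw new IOException("group lookup failed", e);
    }
  }
}
{code}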
 






[jira] [Created] (HDFS-16954) RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw IOException

2023-03-16 Thread Max Xie (Jira)
Max  Xie created HDFS-16954:
---

 Summary: RBF: The operation of renaming a multi-subcluster directory 
to a single-cluster directory should throw IOException
 Key: HDFS-16954
 URL: https://issues.apache.org/jira/browse/HDFS-16954
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Affects Versions: 3.4.0
Reporter: Max  Xie


Renaming a multi-subcluster directory to a single-cluster directory may leave 
the file system in an inconsistent state. To be safe, this operation should 
throw an IOException instead of succeeding.

Examples are as follows:
1. Add a HASH_ALL mount point: `hdfs dfsrouteradmin -add /tmp/foo subcluster1,subcluster2 /tmp/foo -order HASH_ALL`
2. Add a single-subcluster mount point: `hdfs dfsrouteradmin -add /user/foo subcluster1 /user/foo`
3. Make a dir for all subclusters: `hdfs dfs -mkdir /tmp/foo/123`
4. Check the dir; all subclusters will have dir `/tmp/foo/123`:
`hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`;
`hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will show dir `hdfs://subcluster1/tmp/foo/123`;
`hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir `hdfs://subcluster2/tmp/foo/123`;
5. Rename `/tmp/foo/123` to `/user/foo/123`; the op will succeed: `hdfs dfs -mv /tmp/foo/123 /user/foo/123`
6. Check the dirs again; the RBF cluster still shows dir `/tmp/foo/123`:
`hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`;
`hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will show no dirs;
`hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir `hdfs://subcluster2/tmp/foo/123`;

Step 5 should throw an exception instead of succeeding.

