[jira] [Created] (HDFS-16487) RBF: getListing uses raw mount table points

2022-02-25 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16487:
-

 Summary: RBF: getListing uses raw mount table points
 Key: HDFS-16487
 URL: https://issues.apache.org/jira/browse/HDFS-16487
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Fengnan Li
Assignee: Fengnan Li


In getListing, the result is a union of the subclusters' results and the mount 
points. However, these two are different concepts, and the latter is internal 
to the Router. It is quite possible that the actual path does not yet exist in 
the destination HDFS.

Can we choose a different strategy that checks each child mount point and 
confirms the HDFS path exists in the destination cluster? If so, we can add it 
to the listing; otherwise we should skip the mount point, because it confuses 
clients. (Clients could otherwise directly create a subdir under a dangling 
mount point.)
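A minimal sketch of the proposed check, assuming hypothetical helpers 
resolveDest and buildMountPointStatus (the real RouterClientProtocol code is 
organized differently):

{code:java}
// Sketch: only surface a child mount point if its destination path exists.
for (MountTable mount : childMountPoints) {
  Path dest = resolveDest(mount);                // hypothetical: dest ns + path
  FileSystem destFs = dest.getFileSystem(conf);  // client for the subcluster
  if (destFs.exists(dest)) {
    listing.add(buildMountPointStatus(mount));   // hypothetical helper
  }
  // else: skip the dangling mount point so clients don't see a phantom dir
}
{code}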






[jira] [Created] (HDFS-16486) RBF: Don't override listing if there is a physical path from subcluster

2022-02-25 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16486:
-

 Summary: RBF: Don't override listing if there is a physical path 
from subcluster
 Key: HDFS-16486
 URL: https://issues.apache.org/jira/browse/HDFS-16486
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Fengnan Li
Assignee: Fengnan Li


In getListing in RouterClientProtocol, a Router mount point currently 
overrides the listing entry returned from the subclusters. This results in a 
different HdfsFileStatus, especially for owner/group permissions, since the 
Router mount and the actual HDFS path may have been created by different users.

 

[https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java#L857]

 

To mitigate this discrepancy, we can skip the mount point if there is already 
such an entry in the listing from the subcluster.
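A minimal sketch of the idea, assuming the subcluster results have already 
been collected into a map keyed by child name (variable and helper names are 
illustrative):

{code:java}
// Sketch: keep the physical entry from the subcluster; only fall back to the
// mount point when no physical path exists.
for (String child : mountPointNames) {
  if (!listingByName.containsKey(child)) {
    listingByName.put(child, buildMountPointStatus(child)); // hypothetical
  }
  // else: keep the subcluster's HdfsFileStatus (real owner/group/permissions)
}
{code}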






[jira] [Created] (HDFS-16483) RBF: DataNode talk to Router requesting block info in WebHDFS

2022-02-24 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16483:
-

 Summary: RBF: DataNode talk to Router requesting block info in 
WebHDFS
 Key: HDFS-16483
 URL: https://issues.apache.org/jira/browse/HDFS-16483
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Reporter: Fengnan Li
Assignee: Fengnan Li


In WebHDFS, before the Router redirects the OPEN call to a DataNode, it 
attaches the namenoderpcaddress parameter. When the DataNode's WebHdfsHandler 
takes the call, it constructs a DFSClient based on that IP address, which 
points back to the Router.

This is fine when the Router and the DataNode are both secure or both 
insecure. However, when the DataNode is insecure but the Router is secure, the 
call fails with org.apache.hadoop.security.AccessControlException: SIMPLE 
authentication is not enabled. Available:[TOKEN, KERBEROS].
Comments are welcome in terms of how to fix this.

One way is to always make the DataNode construct the DFSClient based on the 
default FS, since the default FS is always the NameNode in the same cluster, 
which should have the same security settings as the DataNode.
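A minimal sketch of that option, assuming it replaces the address-based 
construction inside the DataNode's WebHDFS handling (the actual WebHdfsHandler 
wiring differs):

{code:java}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DFSClient;

class DefaultFsClientSketch {
  // Ignore the namenoderpcaddress parameter from the redirect and build the
  // client against fs.defaultFS, i.e. the local cluster's NameNode, whose
  // security settings match the DataNode's.
  static DFSClient newClient(Configuration conf) throws IOException {
    URI defaultFs = FileSystem.getDefaultUri(conf);
    return new DFSClient(defaultFs, conf);
  }
}
{code}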

 






[jira] [Created] (HDFS-16436) RBF: CheckSafeMode before Read Operation

2022-01-24 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16436:
-

 Summary: RBF: CheckSafeMode before Read Operation
 Key: HDFS-16436
 URL: https://issues.apache.org/jira/browse/HDFS-16436
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Reporter: Fengnan Li
Assignee: Fengnan Li


In Router's 
[checkOperation|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java#L630]
 call, the READ operation check happens before the safemode check. This causes 
an issue: when the mount table is unavailable, a READ can still pass the check 
even though the Router cannot resolve the correct path location.
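A minimal sketch of the proposed ordering (simplified; the real method takes 
more parameters and consults the Router's safemode service):

{code:java}
// Sketch: check safemode before short-circuiting on READ, so reads cannot
// slip through while the mount table is unavailable.
void checkOperation(OperationCategory op) throws StandbyException {
  if (router.isInSafeMode()) {                   // hypothetical accessor
    throw new StandbyException("Router is in safe mode");
  }
  if (op == OperationCategory.READ) {
    return;                                      // reads allowed past safemode
  }
  // ... existing UNCHECKED/WRITE handling ...
}
{code}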






[jira] [Resolved] (HDFS-16188) RBF: Router to support resolving monitored namenodes with DNS

2021-09-10 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-16188.
---
Resolution: Fixed

> RBF: Router to support resolving monitored namenodes with DNS
> -
>
> Key: HDFS-16188
> URL: https://issues.apache.org/jira/browse/HDFS-16188
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of monitored namenodes, 
> so we don't have to reconfigure everything when a namenode hostname changes. For 
> example, in containerized environments the hostnames of namenodes/observers can 
> change fairly often.






[jira] [Resolved] (HDFS-16157) Support configuring DNS record to get list of journal nodes.

2021-08-25 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-16157.
---
Resolution: Resolved

> Support configuring DNS record to get list of journal nodes.
> 
>
> Key: HDFS-16157
> URL: https://issues.apache.org/jira/browse/HDFS-16157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of journal nodes, so we 
> don't have to reconfigure everything when a journal node hostname changes. For 
> example, in some containerized environments the hostnames of journal nodes can 
> change fairly often.






[jira] [Reopened] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-05-07 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reopened HDFS-15878:
---

Reopen to change the closing status.

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 24.627 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
>   Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.(WebHdfsFileSystem.java:2296)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.(WebHdfsFileSystem.java:2176)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
> /test/testSyncable not found.
>   at 
> org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:90)
>   at 
> 

[jira] [Resolved] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-05-07 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-15878.
---
Resolution: Not A Problem

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major

[jira] [Resolved] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-05-04 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-15878.
---
Resolution: Resolved

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major

[jira] [Created] (HDFS-16005) RBF: AccessControlException is counted as proxy failure

2021-05-02 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-16005:
-

 Summary: RBF: AccessControlException is counted as proxy failure
 Key: HDFS-16005
 URL: https://issues.apache.org/jira/browse/HDFS-16005
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Fengnan Li
Assignee: Fengnan Li


We are using ProxyOpCommunicateFailure as a metric for monitoring the Router's 
performance. However, we recently noticed that when some clients try to access 
files in the Namenode that they don't have permission for, the 
AccessControlException thrown from the Namenode is counted in this metric.

In our understanding, ProxyOpCommunicateFailure is meant to capture 
network/hardware failures between the Router and the Namenode, not 
communication failures caused by the client side.
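A minimal sketch of the distinction, assuming the metric is bumped in the 
Router's proxy error handling (the proxy call and metric method names are 
illustrative):

{code:java}
// Sketch: don't count client-side permission errors as communication failures.
try {
  return invokeNamenode(method);                 // hypothetical proxy call
} catch (RemoteException re) {
  IOException ioe = re.unwrapRemoteException(AccessControlException.class);
  if (ioe instanceof AccessControlException) {
    throw ioe;                                   // client problem; no metric bump
  }
  metrics.incrProxyOpCommunicateFailure();       // hypothetical metric method
  throw re;
}
{code}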






[jira] [Created] (HDFS-15833) Make ObserverReadProxyProvider able to talk to DNS of Observers

2021-02-10 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15833:
-

 Summary: Make ObserverReadProxyProvider able to talk to DNS of 
Observers
 Key: HDFS-15833
 URL: https://issues.apache.org/jira/browse/HDFS-15833
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Aihua Xu









[jira] [Created] (HDFS-15832) Using DNS to access Zookeeper cluster

2021-02-10 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15832:
-

 Summary: Using DNS to access Zookeeper cluster
 Key: HDFS-15832
 URL: https://issues.apache.org/jira/browse/HDFS-15832
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Aihua Xu









[jira] [Created] (HDFS-15831) Adopt more DNS resolving for HDFS

2021-02-10 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15831:
-

 Summary: Adopt more DNS resolving for HDFS
 Key: HDFS-15831
 URL: https://issues.apache.org/jira/browse/HDFS-15831
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li


There are some opportunities inside HDFS where we can use DNS names for hosts 
instead of individual host names. This will help to a large extent in two 
aspects:
1. Server management, i.e. host replacement.
2. Client transparency, i.e. clients configured with a DNS name without 
knowing the specific hosts.

It is worth mentioning that secure environments should be supported; we 
recommend having principal wildcard matching turned on.






[jira] [Created] (HDFS-15757) RBF: Improving Router Connection Management

2021-01-01 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15757:
-

 Summary: RBF: Improving Router Connection Management
 Key: HDFS-15757
 URL: https://issues.apache.org/jira/browse/HDFS-15757
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Reporter: Fengnan Li
Assignee: Fengnan Li
 Attachments: RBF_ Router Connection Management.pdf

We have seen a high number of connections from the Router to namenodes, 
leaving the namenodes unstable.
This ticket tries to reduce connections through several changes. Please take a 
look at the design and leave comments. 
Thanks!






[jira] [Created] (HDFS-15754) Create packet metrics for DataNode

2020-12-29 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15754:
-

 Summary: Create packet metrics for DataNode
 Key: HDFS-15754
 URL: https://issues.apache.org/jira/browse/HDFS-15754
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Fengnan Li
Assignee: Fengnan Li


In BlockReceiver, slowness in writeToMirror, writeToDisk and writeToOsCache is 
currently only dumped in the debug log. In practice we have found these are 
quite useful signals for detecting issues on a DataNode, so it would be great 
to expose them as metrics via JMX.
We also introduced a totalPacketsReceived count so that a percentage can be 
used as a signal for detecting potentially underperforming datanodes, since 
datanodes across one HDFS cluster may receive different total numbers of 
packets.
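A minimal sketch using the Hadoop metrics2 annotations (metric names here are 
illustrative, not necessarily what the patch uses):

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(name = "DataNodePacketMetricsSketch", context = "dfs")
class DataNodePacketMetricsSketch {
  @Metric("Total packets received") MutableCounterLong totalPacketsReceived;
  @Metric("Slow packet writes to mirror") MutableCounterLong slowWriteToMirror;
  @Metric("Packet write-to-disk time") MutableRate packetWriteToDisk;

  void onPacket(long diskNanos, boolean mirrorWasSlow) {
    totalPacketsReceived.incr();     // denominator for the percentage signal
    packetWriteToDisk.add(diskNanos);
    if (mirrorWasSlow) {
      slowWriteToMirror.incr();
    }
  }
}
{code}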






[jira] [Created] (HDFS-15634) Invalidate block on decommissioning DataNode after replication

2020-10-15 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15634:
-

 Summary: Invalidate block on decommissioning DataNode after 
replication
 Key: HDFS-15634
 URL: https://issues.apache.org/jira/browse/HDFS-15634
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Fengnan Li
Assignee: Fengnan Li


Right now when a DataNode starts decommissioning, the Namenode marks it as 
decommissioning, its blocks are replicated over to different DataNodes, and it 
is then marked as decommissioned. These blocks are not touched afterwards 
since they are not counted as live replicas.

Proposal: invalidate these blocks once they are replicated and there are 
enough live replicas in the cluster.

Reason: a recent shutdown of decommissioned datanodes to finish the flow 
caused a Namenode latency spike, since the namenode needs to remove all of the 
blocks from its memory, and this step requires holding the write lock. If we 
had gradually invalidated these blocks, the deletion would be much easier and 
faster.
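A minimal BlockManager-style sketch of the proposal (heavily simplified; the 
real decommission flow lives elsewhere, and the visibility and exact 
signatures of these calls differ):

{code:java}
// Sketch: once a block has enough live replicas elsewhere, schedule the
// decommissioning node's replica for invalidation instead of keeping it.
void maybeInvalidate(BlockInfo block, DatanodeDescriptor decomNode) {
  NumberReplicas num = blockManager.countNodes(block);
  if (num.liveReplicas() >= blockManager.getExpectedRedundancyNum(block)) {
    blockManager.addToInvalidates(block, decomNode);  // gradual deletion
  }
}
{code}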






[jira] [Created] (HDFS-15599) RBF: Add API to expose resolved destinations (namespace) in Router

2020-09-24 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15599:
-

 Summary: RBF: Add API to expose resolved destinations (namespace) 
in Router
 Key: HDFS-15599
 URL: https://issues.apache.org/jira/browse/HDFS-15599
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li


We quite often see requests asking where a path on the Router actually points. 
Two main use cases are:

1) Calculating the HDFS capacity usage allocation of all Hive tables that have 
been onboarded to the Router.

2) A failure-prevention step for cross-cluster rename: first check the source 
HDFS location and the destination HDFS location, then issue a distcp command 
if possible to avoid the exception.

Inside the Router, the function getLocationsForPath does the work, but it is 
internal only and not visible to clients.

RouterAdmin has getMountTableEntries, but this is a dump of the mount table 
without any resolving.

We are proposing to add such an API, and there are two ways:

1) Add this API in RouterRpcServer, which requires a change in 
ClientNameNodeProtocol to include this new API.

2) Add this API in RouterAdminServer, which requires a protocol between the 
client and the admin server.

There is an existing resolvePath in FileSystem which can be used to implement 
this call from the client side.
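A minimal sketch of what option 1 could look like inside RouterRpcServer (the 
method name and return format are hypothetical, not a committed API):

{code:java}
// Hypothetical client-facing call: resolve a Router path to its destination
// namespace(s) and remote path(s) using the existing internal resolver.
public String[] getResolvedDestinations(String src) throws IOException {
  List<RemoteLocation> locations = getLocationsForPath(src, false);
  String[] result = new String[locations.size()];
  for (int i = 0; i < locations.size(); i++) {
    RemoteLocation loc = locations.get(i);
    result[i] = loc.getNameserviceId() + ":" + loc.getDest();
  }
  return result;
}
{code}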






[jira] [Created] (HDFS-15554) RBF: force router check file existence before adding/updating mount points

2020-09-01 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15554:
-

 Summary: RBF: force router check file existence before 
adding/updating mount points
 Key: HDFS-15554
 URL: https://issues.apache.org/jira/browse/HDFS-15554
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li


Adding/updating mount points right now is purely a Router action, without 
validation of the destination files/directories in the downstream namenodes.

In practice we have ended up with dangling mount points: when clients call 
listStatus they get the file returned, but if they then try to access the 
file, a FileNotFoundException is thrown.
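A minimal sketch of the proposed validation at add/update time (hedged; 
getFileSystem is a hypothetical helper returning a client for the destination 
nameservice):

{code:java}
// Sketch: reject a mount point whose destination does not exist yet.
for (RemoteLocation dest : mountTable.getDestinations()) {
  FileSystem fs = getFileSystem(dest.getNameserviceId());  // hypothetical
  if (!fs.exists(new Path(dest.getDest()))) {
    throw new IOException("Destination " + dest
        + " does not exist; create it before adding/updating the mount point");
  }
}
{code}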



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15447) RBF: Add top owners metrics for delegation tokens

2020-07-25 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-15447.
---
Resolution: Resolved

> RBF: Add top owners metrics for delegation tokens
> -
>
> Key: HDFS-15447
> URL: https://issues.apache.org/jira/browse/HDFS-15447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>
> Over time we have seen token-bombarding behavior multiple times, either due to 
> mistakes or to a user issuing a huge amount of traffic. Having this metric will 
> help figure out much faster which user/service owns these tokens and stop the 
> behavior more quickly.






[jira] [Created] (HDFS-15447) RBF: Add top owners metrics for delegation tokens

2020-06-30 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15447:
-

 Summary: RBF: Add top owners metrics for delegation tokens
 Key: HDFS-15447
 URL: https://issues.apache.org/jira/browse/HDFS-15447
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li


Over time we have seen token-bombarding behavior multiple times, either due to 
mistakes or to a user issuing a huge amount of traffic. Having this metric 
will help figure out much faster which user/service owns these tokens and stop 
the behavior more quickly.






[jira] [Created] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-06-02 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15383:
-

 Summary: RBF: Disable watch in ZKDelegationSecretManager for 
performance
 Key: HDFS-15383
 URL: https://issues.apache.org/jira/browse/HDFS-15383
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li


With the current design for delegation tokens in a secure Router, the total 
number of watches for tokens is the product of the number of routers and the 
number of tokens. This is because ZKDelegationTokenManager uses 
PathChildrenCache from Curator, which automatically sets the watches so that 
ZK pushes sync information to each router. There are some evaluations showing 
that a large number of watches in Zookeeper has a negative performance impact 
on the Zookeeper server.

In our practice, when the number of watches exceeds 1.2 million on a single ZK 
server, there is significant ZK performance degradation. Thus this ticket 
rewrites ZKDelegationTokenManagerImpl.java to explicitly disable the 
PathChildrenCache and have Routers sync periodically from Zookeeper. This has 
been working fine at the scale of 10 Routers with 2 million tokens.
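A minimal sketch of the watch-free periodic sync with Curator (the actual 
token manager rewrite is considerably more involved; refreshLocalCache is a 
hypothetical helper):

{code:java}
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;

// Sketch: instead of PathChildrenCache (which registers watches), poll the
// token parent znode on a fixed schedule and rebuild the local cache.
class PeriodicTokenSyncSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(CuratorFramework zk, String tokensPath, long intervalMs) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        List<String> children = zk.getChildren().forPath(tokensPath); // no watch
        refreshLocalCache(children);             // hypothetical helper
      } catch (Exception e) {
        // log and retry on the next tick
      }
    }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }

  private void refreshLocalCache(List<String> children) { /* ... */ }
}
{code}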






[jira] [Created] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly

2020-02-27 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-15196:
-

 Summary: RouterRpcServer getListing cannot list large dirs 
correctly
 Key: HDFS-15196
 URL: https://issues.apache.org/jira/browse/HDFS-15196
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Fengnan Li
Assignee: Fengnan Li


In RouterRpcServer, the getListing function is handled in two parts:
 # Union all partial listings from the destination ns + paths
 # Append the mount points for the dir being listed

For a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default value 1k), 
batch listing is used, and startAfter defines the boundary of each batch. 
However, step 2 appends the existing mount points, which messes up the 
boundary of the batch, making the next batch's startAfter wrong.

The fix is simply to append the mount points only when no further batch 
queries are necessary.
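A minimal sketch of the fix (variable and helper names are illustrative):

{code:java}
// Sketch: merge mount points only once the last batch of the underlying
// listing is reached, so startAfter boundaries stay intact.
DirectoryListing listing = getListingFromSubclusters(src, startAfter);
if (listing != null && listing.getRemainingEntries() == 0) {
  listing = appendMountPoints(src, listing);     // hypothetical helper
}
return listing;
{code}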






[jira] [Created] (HDFS-14914) Observer should throw StandbyException in Safemode

2019-10-19 Thread Fengnan Li (Jira)
Fengnan Li created HDFS-14914:
-

 Summary: Observer should throw StandbyException in Safemode
 Key: HDFS-14914
 URL: https://issues.apache.org/jira/browse/HDFS-14914
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li
 Attachments: HDFS-14914-001.patch

When an observer is in safemode, calling getBlockLocations makes it throw a 
RetriableException, as in 
[HDFS-13898|https://issues.apache.org/jira/browse/HDFS-13898]. However, during 
startup safemode can take a really long time, and retrying does not help much 
here.

What makes it worse is when Routers talk to Observers: since the Router 
distinguishes StandbyException from RetriableException, it will keep retrying 
(default 3 times) and then return a RetriableException to the client. The 
client will retry again on the same Router, and thus the same Observer, a 
default of 10 times, resulting in 3 * 10 = 30 retries per call.

The change is to make it fail over, so that the Router can immediately try 
another Observer or the Active namenode (depending on the design). The current 
ObserverReadProxyProvider is not affected, since both RetriableException and 
StandbyException make it fail over.
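A minimal sketch of the behavior change (simplified; isObserver and 
isInSafeMode stand in for the real namenode state checks):

{code:java}
// Sketch: while an Observer is still in startup safemode, signal failover
// (StandbyException) rather than retry-in-place (RetriableException).
void checkObserverRead() throws StandbyException {
  if (isObserver() && isInSafeMode()) {
    throw new StandbyException("Observer is in safe mode; try another node");
  }
}
{code}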






[jira] [Created] (HDFS-14647) NPE during secure namenode startup

2019-07-15 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14647:
-

 Summary: NPE during secure namenode startup
 Key: HDFS-14647
 URL: https://issues.apache.org/jira/browse/HDFS-14647
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.2
Reporter: Fengnan Li
Assignee: Fengnan Li


In secure HDFS, while the Namenode is loading the fsimage, hitting the 
Namenode through the REST API throws the exception below. (This is in version 
2.8.2.)
{quote}org.apache.hadoop.hdfs.web.resources.ExceptionHandler: 
INTERNAL_SERVER_ERROR
 java.lang.NullPointerException
 at 
org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:283)
 at org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:226)
 at 
org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:54)
 at 
org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:42)
 at 
com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
 at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
 at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
 at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
 at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
 at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
 at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
 at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
 at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
 at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
 at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
 at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
 at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
 at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
 at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
 at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:87)
 at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1353)
 at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
 at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{quote}
This is because during this phase the namesystem hasn't been initialized yet. 
In a non-HA context, it can throw a RetriableException to let

[jira] [Created] (HDFS-14449) Expose total number of dt in jmx for KMS/Namenode

2019-04-22 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14449:
-

 Summary: Expose total number of dt in jmx for KMS/Namenode
 Key: HDFS-14449
 URL: https://issues.apache.org/jira/browse/HDFS-14449
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li









[jira] [Resolved] (HDFS-14444) RBF: Add safemode to Router UI

2019-04-20 Thread Fengnan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li resolved HDFS-14444.
---
Resolution: Duplicate

> RBF: Add safemode to Router UI
> --
>
> Key: HDFS-14444
> URL: https://issues.apache.org/jira/browse/HDFS-14444
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>
> https://issues.apache.org/jira/browse/HDFS-14259 added the safemode metric, but 
> it is not very visible when opening the Router UI. We should indicate 
> somewhere on the page that safemode is on.






[jira] [Created] (HDFS-14444) Add safemode to Router UI

2019-04-20 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14444:
-

 Summary: Add safemode to Router UI
 Key: HDFS-14444
 URL: https://issues.apache.org/jira/browse/HDFS-14444
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Fengnan Li
Assignee: Fengnan Li


https://issues.apache.org/jira/browse/HDFS-14259 added the safemode metric, but 
it is not very visible when opening the Router UI. We should indicate somewhere 
on the page that safemode is on.






[jira] [Created] (HDFS-14427) Optimize some testing set up logic in MiniRouterDFSCluster

2019-04-14 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14427:
-

 Summary: Optimize some testing set up logic in MiniRouterDFSCluster
 Key: HDFS-14427
 URL: https://issues.apache.org/jira/browse/HDFS-14427
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Fengnan Li
Assignee: Fengnan Li


[https://github.com/apache/hadoop/blob/HDFS-13891/hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/MiniRouterDFSCluster.java#L808]

The comment says one router is created per nameservice, while in the code one 
router is created per namenode in each nameservice.

There are a couple of things to consider optimizing:
 # make the code match the comment
 # add a way to specify the number of routers






[jira] [Created] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics

2019-04-14 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14426:
-

 Summary: RBF: Add delegation token total count as one of the 
federation metrics
 Key: HDFS-14426
 URL: https://issues.apache.org/jira/browse/HDFS-14426
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Fengnan Li
Assignee: Fengnan Li


Currently the router doesn't report the total number of currently valid 
delegation tokens it holds, but this piece of information is useful for 
monitoring and understanding the real-time situation of tokens.






[jira] [Created] (HDFS-14405) RBF: Client should be able to renew dt immediately after it fetched the dt

2019-04-02 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14405:
-

 Summary: RBF: Client should be able to renew dt immediately after 
it fetched the dt
 Key: HDFS-14405
 URL: https://issues.apache.org/jira/browse/HDFS-14405
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Fengnan Li
Assignee: Fengnan Li


By the current design, once a dt is generated it needs to be synced to the 
other routers as well as backed up in the state store, so there is a time gap 
before other routers know about the existence of this token.

Ideally, the same client should be able to renew the token it just created 
through fetchdt, even when the two calls hit two distinct routers.
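A minimal sketch of one possible mitigation on the renewal path 
(syncFromStateStore is a hypothetical refresh hook, and getRemoteUser stands 
in for however the renewer name is obtained):

{code:java}
// Sketch: if a renewal arrives for a token this Router hasn't synced yet,
// refresh from the state store once before failing.
long renewDelegationToken(Token<DelegationTokenIdentifier> token)
    throws IOException {
  try {
    return secretManager.renewToken(token, getRemoteUser());
  } catch (SecretManager.InvalidToken e) {
    secretManager.syncFromStateStore();          // hypothetical refresh hook
    return secretManager.renewToken(token, getRemoteUser());
  }
}
{code}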






[jira] [Created] (HDFS-14327) Support security for DNS resolving

2019-02-28 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14327:
-

 Summary: Support security for DNS resolving
 Key: HDFS-14327
 URL: https://issues.apache.org/jira/browse/HDFS-14327
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li


With DNS resolving, clients get the IPs of the servers (NN/Routers) and use 
the IP addresses to access the machines. This fails in a secure environment, 
since Kerberos uses the domain name in the principal, so it won't recognize 
the IP addresses.

This task mainly adds a reverse lookup after the IP is fetched, to obtain the 
domain name. After that, clients still use the domain name to access the 
servers.
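A minimal sketch of the reverse lookup using the standard Java API:

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: after resolving the DNS record to IPs, reverse-resolve each IP back
// to a fully qualified domain name so Kerberos principals still match.
class ReverseLookupSketch {
  static String toFqdn(String ip) throws UnknownHostException {
    return InetAddress.getByName(ip).getCanonicalHostName();
  }
}
{code}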






[jira] [Created] (HDFS-14310) Improve documents for using DNS to resolve namenodes and routers

2019-02-21 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14310:
-

 Summary: Improve documents for using DNS to resolve namenodes and 
routers
 Key: HDFS-14310
 URL: https://issues.apache.org/jira/browse/HDFS-14310
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li


With https://issues.apache.org/jira/browse/HDFS-14118, clients can use a 
single domain name to access either namenodes or routers, instead of putting 
all of the individual host names in the config.

Update the below documents with this new feature:
 * 
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md
 * 
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md
 * hadoop-hdfs-project/hadoop-hdfs-rbf/src/site/markdown/HDFSRouterFederation.md
 * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ObserverNamenode.md

 






[jira] [Created] (HDFS-14239) Fix the comment for getClass in Configuration

2019-01-28 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14239:
-

 Summary: Fix the comment for getClass in Configuration
 Key: HDFS-14239
 URL: https://issues.apache.org/jira/browse/HDFS-14239
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li


The comment for the getClass method in org.apache.hadoop.conf.Configuration is 
wrong: it uses the property name instead of the actual class name.






[jira] [Created] (HDFS-14118) RBF: Use DNS to help resolve routers

2018-11-30 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14118:
-

 Summary: RBF: Use DNS to help resolve routers
 Key: HDFS-14118
 URL: https://issues.apache.org/jira/browse/HDFS-14118
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Fengnan Li


Clients need to know about the routers to talk to the HDFS cluster 
(obviously), and adding or removing routers would otherwise require a change 
on every client, which is a painful process.

DNS can be used here to resolve the single domain name clients know into the 
list of routers in the current config. However, DNS won't be able to resolve 
only to healthy routers based on certain health thresholds.

There are some ways this can be solved. One way is to have a separate script 
regularly check the status of the routers and update the DNS records if a 
router fails the health thresholds; security needs to be carefully considered 
for this approach. Another way is to have the client do the normal 
connection/failover after it gets the list of routers, which requires changing 
the current failover proxy provider.


