[jira] [Created] (HDFS-16487) RBF: getListing uses raw mount table points
Fengnan Li created HDFS-16487:

Summary: RBF: getListing uses raw mount table points
Key: HDFS-16487
URL: https://issues.apache.org/jira/browse/HDFS-16487
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Fengnan Li
Assignee: Fengnan Li

In getListing, the result is a union of the subclusters' results and the mount points. However, these two are different concepts, and the latter is internal to the Router; it is quite possible that the actual path does not exist in the destination HDFS yet. Can we choose a different strategy that checks each child mount point and confirms the HDFS path exists in the destination cluster? If so, we can add it; otherwise we should skip the mount point, because it confuses clients. (Clients could directly create a subdirectory under a dangling mount point.)

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
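The proposed strategy could be sketched roughly as follows. This is a minimal illustration, not the actual RouterClientProtocol code; the class and parameter names (MountListingFilter, existingDestPaths) are hypothetical, and the downstream existence check is simulated with a set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: only append a child mount point to the listing when
// its destination path is confirmed to exist in the downstream namespace.
public class MountListingFilter {

    /** Merge subcluster entries with child mounts, skipping dangling mounts. */
    public static List<String> merge(List<String> subclusterEntries,
                                     List<String> childMounts,
                                     Set<String> existingDestPaths) {
        List<String> result = new ArrayList<>(subclusterEntries);
        for (String mount : childMounts) {
            // Stand-in for an RPC to the destination cluster checking existence.
            if (!result.contains(mount) && existingDestPaths.contains(mount)) {
                result.add(mount); // destination verified, safe to show
            }
            // Otherwise the mount is dangling and is skipped, so clients
            // never see an entry they cannot actually access.
        }
        return result;
    }
}
```

A dangling mount ("c" below, with no backing path) is silently dropped from the merged listing.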
[jira] [Created] (HDFS-16486) RBF: Don't override listing if there is a physical path from subcluster
Fengnan Li created HDFS-16486:

Summary: RBF: Don't override listing if there is a physical path from subcluster
Key: HDFS-16486
URL: https://issues.apache.org/jira/browse/HDFS-16486
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Fengnan Li
Assignee: Fengnan Li

In getListing in RouterClientProtocol, a Router mount point currently overrides the listing from the subclusters. This results in a different HdfsFileStatus, especially for owner/group/permissions, since Router mounts and the actual HDFS paths are created by different users.

[https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java#L857]

To mitigate this discrepancy we can skip the mount point if there is already such a listing from a subcluster.
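The proposed "don't override" behavior amounts to a merge that prefers the subcluster's status. The sketch below is illustrative only (plain strings stand in for HdfsFileStatus, and the class name is made up): the mount-table entry fills a gap but never replaces a status already returned by a subcluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed fix: when assembling the final listing,
// an entry synthesized from the mount table must not replace an entry a
// subcluster already returned for the same name, so owner/group/permissions
// come from the real path rather than from the mount table.
public class ListingMerge {

    public static Map<String, String> merge(Map<String, String> fromSubclusters,
                                            Map<String, String> fromMountTable) {
        Map<String, String> result = new LinkedHashMap<>(fromSubclusters);
        for (Map.Entry<String, String> e : fromMountTable.entrySet()) {
            // putIfAbsent keeps the subcluster's status when one exists.
            result.putIfAbsent(e.getKey(), e.getValue());
        }
        return result;
    }
}
```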
[jira] [Created] (HDFS-16483) RBF: DataNode talk to Router requesting block info in WebHDFS
Fengnan Li created HDFS-16483:

Summary: RBF: DataNode talks to Router when requesting block info in WebHDFS
Key: HDFS-16483
URL: https://issues.apache.org/jira/browse/HDFS-16483
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Reporter: Fengnan Li
Assignee: Fengnan Li

In WebHDFS, before the Router redirects the OPEN call to a DataNode, it attaches the namenoderpcaddress parameter. When the DataNode's WebHdfsHandler takes the call, it constructs a DFSClient based on that address, which points to the Router. This is fine when Router and DataNode are both secure or both insecure. However, when the DataNode is insecure but the Router is secure, the call fails with org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

Comments are welcome on how to fix this. One option is to always have the DataNode construct the DFSClient from the default FS, since the default FS is always the NameNode in the same cluster, which has the same security settings as the DataNode.
[jira] [Created] (HDFS-16436) RBF: CheckSafeMode before Read Operation
Fengnan Li created HDFS-16436:

Summary: RBF: CheckSafeMode before Read Operation
Key: HDFS-16436
URL: https://issues.apache.org/jira/browse/HDFS-16436
Project: Hadoop HDFS
Issue Type: Improvement
Components: rbf
Reporter: Fengnan Li
Assignee: Fengnan Li

In the Router's [checkOperation|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java#L630] call, the READ operation check happens before the safe mode check. This has one issue: when the mount table is unavailable, a READ can still pass the check even though the Router cannot resolve the correct path location.
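The proposed reordering can be illustrated with a simplified sketch (not the real RouterRpcServer code; the enum and method names are hypothetical): the safe-mode check runs before the read/write categorization, so a READ is rejected while the Router cannot resolve locations reliably.

```java
// Hypothetical sketch of the proposed ordering in checkOperation:
// check safe mode first, then the operation category.
public class OperationCheck {

    enum OpCategory { READ, WRITE, UNCHECKED }

    /** Returns true if the operation may proceed. */
    public static boolean allow(OpCategory op, boolean inSafeMode) {
        if (op == OpCategory.UNCHECKED) {
            return true; // operations that bypass state checks entirely
        }
        // Safe-mode check comes first: both READ and WRITE are rejected
        // while the Router's state store / mount table is unavailable.
        if (inSafeMode) {
            return false;
        }
        return true;
    }
}
```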
[jira] [Resolved] (HDFS-16188) RBF: Router to support resolving monitored namenodes with DNS
[ https://issues.apache.org/jira/browse/HDFS-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fengnan Li resolved HDFS-16188.
Resolution: Fixed

> RBF: Router to support resolving monitored namenodes with DNS
> Key: HDFS-16188
> URL: https://issues.apache.org/jira/browse/HDFS-16188
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of monitored
> namenodes, so we don't have to reconfigure everything when a namenode
> hostname is changed. For example, in a containerized environment the
> hostnames of namenodes/observers can change pretty often.
[jira] [Resolved] (HDFS-16157) Support configuring DNS record to get list of journal nodes.
[ https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fengnan Li resolved HDFS-16157.
Resolution: Resolved

> Support configuring DNS record to get list of journal nodes.
> Key: HDFS-16157
> URL: https://issues.apache.org/jira/browse/HDFS-16157
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: journal-node
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of journal nodes,
> so we don't have to reconfigure everything when a journal node hostname is
> changed. For example, in some containerized environments the hostnames of
> journal nodes can change pretty often.
[jira] [Reopened] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk
[ https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fengnan Li reopened HDFS-15878:
Reopened to change the closing status.

> RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs, rbf
> Reporter: Renukaprasad C
> Assignee: Fengnan Li
> Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 24.627 s <<< FAILURE! - in org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate) Time elapsed: 0.222 s <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
> at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
> at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
> at org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
> at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File /test/testSyncable not found.
> at org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:90)
> at
[jira] [Resolved] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk
[ https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fengnan Li resolved HDFS-15878.
Resolution: Not A Problem

> RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs, rbf
> Reporter: Renukaprasad C
> Assignee: Fengnan Li
> Priority: Major
[jira] [Resolved] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk
[ https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fengnan Li resolved HDFS-15878.
Resolution: Resolved

> RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs, rbf
> Reporter: Renukaprasad C
> Assignee: Fengnan Li
> Priority: Major
[jira] [Created] (HDFS-16005) RBF: AccessControlException is counted as proxy failure
Fengnan Li created HDFS-16005:

Summary: RBF: AccessControlException is counted as proxy failure
Key: HDFS-16005
URL: https://issues.apache.org/jira/browse/HDFS-16005
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Reporter: Fengnan Li
Assignee: Fengnan Li

We are using ProxyOpCommunicateFailure as a metric for monitoring the Router's performance. However, we recently noticed that when clients try to access files in the Namenode that they don't have permission for, the AccessControlException thrown by the Namenode is counted in this metric. In our understanding, ProxyOpCommunicateFailure should capture network/hardware failures between Router and Namenode, not communication failures caused by the client side.
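One way to express the proposed metric fix is a classifier that excludes client-caused exceptions from the proxy-failure count. This is a hypothetical sketch (class name, method name, and the exact exclusion list are illustrative, not the real RBF metrics code):

```java
// Hypothetical sketch: only count an exception toward the proxy-failure
// metric when it indicates a Router<->Namenode communication problem,
// not an error the Namenode raised on the client's behalf.
public class ProxyFailureClassifier {

    public static boolean isProxyCommunicationFailure(Throwable t) {
        String name = t.getClass().getSimpleName();
        // Client-caused errors surfaced by the Namenode are excluded
        // (AccessControlException per this ticket; other candidates are
        // an assumption for illustration).
        if (name.equals("AccessControlException")
                || name.equals("FileNotFoundException")) {
            return false;
        }
        return true;
    }
}
```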
[jira] [Created] (HDFS-15833) Make ObserverReadProxyProvider able to talk to DNS of Observers
Fengnan Li created HDFS-15833:

Summary: Make ObserverReadProxyProvider able to talk to DNS of Observers
Key: HDFS-15833
URL: https://issues.apache.org/jira/browse/HDFS-15833
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Aihua Xu
[jira] [Created] (HDFS-15832) Using DNS to access Zookeeper cluster
Fengnan Li created HDFS-15832:

Summary: Using DNS to access Zookeeper cluster
Key: HDFS-15832
URL: https://issues.apache.org/jira/browse/HDFS-15832
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Aihua Xu
[jira] [Created] (HDFS-15831) Adopt more DNS resolving for HDFS
Fengnan Li created HDFS-15831:

Summary: Adopt more DNS resolving for HDFS
Key: HDFS-15831
URL: https://issues.apache.org/jira/browse/HDFS-15831
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li

There are several places in HDFS where we can use a DNS name instead of a list of host names. This helps to a large extent in two aspects: 1. server management, i.e. host replacement; 2. client transparency, i.e. clients can be configured with a DNS name without knowing the specific hosts. It is worth mentioning that secure environments should be supported; we recommend turning on principal wildcard matching.
[jira] [Created] (HDFS-15757) RBF: Improving Router Connection Management
Fengnan Li created HDFS-15757:

Summary: RBF: Improving Router Connection Management
Key: HDFS-15757
URL: https://issues.apache.org/jira/browse/HDFS-15757
Project: Hadoop HDFS
Issue Type: Improvement
Components: rbf
Reporter: Fengnan Li
Assignee: Fengnan Li
Attachments: RBF_ Router Connection Management.pdf

We have seen a high number of connections from Routers to namenodes, leaving the namenodes unstable. This ticket tries to reduce the number of connections through a few changes. Please take a look at the design and leave comments. Thanks!
[jira] [Created] (HDFS-15754) Create packet metrics for DataNode
Fengnan Li created HDFS-15754:

Summary: Create packet metrics for DataNode
Key: HDFS-15754
URL: https://issues.apache.org/jira/browse/HDFS-15754
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Reporter: Fengnan Li
Assignee: Fengnan Li

In BlockReceiver, when there is slowness in writeToMirror, writeToDisk, or writeToOsCache, it is currently only recorded in the debug log. In practice we have found these to be quite useful signals for detecting issues on a DataNode, so it would be great to expose them via JMX. We also introduced a total-packets-received counter, so that a percentage can be used to flag a potentially underperforming DataNode, since DataNodes across one HDFS cluster may receive different total numbers of packets.
[jira] [Created] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
Fengnan Li created HDFS-15634:

Summary: Invalidate block on decommissioning DataNode after replication
Key: HDFS-15634
URL: https://issues.apache.org/jira/browse/HDFS-15634
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Reporter: Fengnan Li
Assignee: Fengnan Li

Right now when a DataNode starts decommissioning, the Namenode marks it as decommissioning, its blocks are replicated over to other DataNodes, and it is then marked as decommissioned. These blocks are not touched afterwards, since they are no longer counted as live replicas.

Proposal: invalidate these blocks once they are replicated and there are enough live replicas in the cluster.

Reason: a recent shutdown of decommissioned DataNodes to finish the flow caused a Namenode latency spike, since the Namenode needs to remove all of those blocks from memory, a step that requires holding the write lock. If we had gradually invalidated these blocks, the deletion would be much easier and faster.
[jira] [Created] (HDFS-15599) RBF: Add API to expose resolved destinations (namespace) in Router
Fengnan Li created HDFS-15599:

Summary: RBF: Add API to expose resolved destinations (namespace) in Router
Key: HDFS-15599
URL: https://issues.apache.org/jira/browse/HDFS-15599
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li

We quite often get requests asking where a path on the Router actually points. Two main use cases are: 1) calculating the HDFS capacity usage allocation of all Hive tables that have been onboarded to the Router; 2) a failure-prevention step for cross-cluster rename: first check the source and destination HDFS locations, then issue a distcp command if possible to avoid the exception.

Inside the Router, the function getLocationsForPath does this work, but it is internal only and not visible to clients. RouterAdmin has getMountTableEntries, but that is a dump of the mount table without any resolution.

We propose adding such an API, and there are two options: 1) add the API to RouterRpcServer, which requires changing ClientNameNodeProtocol to include the new call; 2) add the API to RouterAdminServer, which requires a protocol between the client and the admin server. There is an existing resolvePath in FileSystem that can be used to implement this call on the client side.
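What such an API would return can be sketched with a longest-prefix match over the mount table. This is a self-contained illustration, not the real getLocationsForPath; the class name, table contents, and "namespace:path" return format are all assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "resolve a Router path to its destination":
// find the longest mount-table entry that prefixes the path and rewrite
// the path against that entry's destination.
public class MountResolver {

    private final TreeMap<String, String> mounts = new TreeMap<>();

    public MountResolver(Map<String, String> table) {
        mounts.putAll(table); // mount path -> "namespace:destPath"
    }

    /** Returns "namespace:path" for the longest matching mount, or null. */
    public String resolve(String path) {
        String best = null;
        for (String m : mounts.keySet()) {
            boolean matches = path.equals(m)
                    || path.startsWith(m.endsWith("/") ? m : m + "/");
            if (matches && (best == null || m.length() > best.length())) {
                best = m; // keep the most specific mount point
            }
        }
        if (best == null) {
            return null;
        }
        return mounts.get(best) + path.substring(best.length());
    }
}
```

For example, with mounts /data -> ns0:/data and /data/warehouse -> ns1:/wh, the path /data/warehouse/t1 resolves against the more specific entry.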
[jira] [Created] (HDFS-15554) RBF: force router check file existence before adding/updating mount points
Fengnan Li created HDFS-15554:

Summary: RBF: force router check file existence before adding/updating mount points
Key: HDFS-15554
URL: https://issues.apache.org/jira/browse/HDFS-15554
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li

Adding/updating mount points is currently a Router-only action, with no validation in the downstream namenodes that the destination files/directories exist. In practice we have set up dangling mount points: when clients call listStatus they get the file returned, but if they then try to access the file, a FileNotFoundException is thrown.
[jira] [Resolved] (HDFS-15447) RBF: Add top owners metrics for delegation tokens
[ https://issues.apache.org/jira/browse/HDFS-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fengnan Li resolved HDFS-15447.
Resolution: Resolved

> RBF: Add top owners metrics for delegation tokens
> Key: HDFS-15447
> URL: https://issues.apache.org/jira/browse/HDFS-15447
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Fengnan Li
> Assignee: Fengnan Li
> Priority: Major
>
> Over time we have seen multiple episodes of token-bombarding behavior,
> either due to mistakes or to a user issuing a huge amount of traffic.
> Having this metric will help figure out much faster which user or service
> owns these tokens, and stop the behavior more quickly.
[jira] [Created] (HDFS-15447) RBF: Add top owners metrics for delegation tokens
Fengnan Li created HDFS-15447:

Summary: RBF: Add top owners metrics for delegation tokens
Key: HDFS-15447
URL: https://issues.apache.org/jira/browse/HDFS-15447
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li

Over time we have seen multiple episodes of token-bombarding behavior, either due to mistakes or to a user issuing a huge amount of traffic. Having this metric will help figure out much faster which user or service owns these tokens, and stop the behavior more quickly.
[jira] [Created] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance
Fengnan Li created HDFS-15383:

Summary: RBF: Disable watch in ZKDelegationSecretManager for performance
Key: HDFS-15383
URL: https://issues.apache.org/jira/browse/HDFS-15383
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li

Based on the current design for delegation tokens in a secure Router, the total number of watches for tokens is the product of the number of Routers and the number of tokens. This is because ZKDelegationTokenManager uses PathChildrenCache from Curator, which automatically sets watches so that ZooKeeper pushes sync information to each Router. There are evaluations showing that a large number of watches in ZooKeeper has a negative performance impact on the ZooKeeper server; in our practice, when the number of watches exceeds 1.2 million on a single ZK server there is significant ZK performance degradation.

Thus this ticket is to rewrite ZKDelegationTokenManagerImpl.java to explicitly disable the PathChildrenCache and have Routers sync periodically from ZooKeeper. This has been working fine at the scale of 10 Routers with 2 million tokens.
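The periodic-sync idea can be sketched without Curator at all. This is an illustration only (the class, the Supplier standing in for a ZooKeeper read, and the scheduling comment are assumptions, not the actual patch): each Router keeps a local token cache and refreshes it from ZooKeeper on a fixed interval instead of holding one watch per token.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: replace per-token ZK watches with a periodic pull.
// zkFetch stands in for reading the token znodes from ZooKeeper.
public class PolledTokenCache {

    private final Map<String, Long> tokens = new ConcurrentHashMap<>();
    private final Supplier<Map<String, Long>> zkFetch;

    public PolledTokenCache(Supplier<Map<String, Long>> zkFetch) {
        this.zkFetch = zkFetch;
    }

    /** Intended to be called from a scheduled executor every sync interval. */
    public void syncFromZk() {
        Map<String, Long> latest = zkFetch.get();
        tokens.keySet().retainAll(latest.keySet()); // drop cancelled tokens
        tokens.putAll(latest);                      // pick up new/renewed ones
    }

    public boolean contains(String tokenId) {
        return tokens.containsKey(tokenId);
    }
}
```

The trade-off is staleness bounded by the sync interval, in exchange for removing Routers x tokens watches from the ZooKeeper server.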
[jira] [Created] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly
Fengnan Li created HDFS-15196:

Summary: RouterRpcServer getListing cannot list large dirs correctly
Key: HDFS-15196
URL: https://issues.apache.org/jira/browse/HDFS-15196
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Fengnan Li
Assignee: Fengnan Li

In RouterRpcServer, the getListing function is handled in two parts:
1. Union all partial listings from the destination ns + paths
2. Append the mount points under the dir being listed

For a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT (default 1k), batched listing is used, and startAfter defines the boundary of each batch. However, step 2 adds the existing mount points, which messes up the boundary of the batch and makes the next batch's startAfter wrong. The fix is to append the mount points only when no more batch queries are necessary.
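The fix can be sketched with plain strings standing in for HdfsFileStatus entries (the class and parameter names are illustrative, not the RouterRpcServer code): mount points are appended only on the final batch, so they can no longer shift the startAfter boundary of an intermediate batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the corrected batching: intermediate batches
// contain only remote entries; mount points join the last batch only.
public class BatchedListing {

    public static List<String> listBatch(List<String> sortedRemoteEntries,
                                         List<String> mountPoints,
                                         String startAfter, int limit) {
        List<String> batch = new ArrayList<>();
        for (String name : sortedRemoteEntries) {
            if (startAfter != null && name.compareTo(startAfter) <= 0) {
                continue; // already returned in a previous batch
            }
            if (batch.size() == limit) {
                // More remote entries remain, so another batch will follow:
                // do NOT append mount points yet, or startAfter breaks.
                return batch;
            }
            batch.add(name);
        }
        batch.addAll(mountPoints); // final batch only
        return batch;
    }
}
```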
[jira] [Created] (HDFS-14914) Observer should throw StandbyException in Safemode
Fengnan Li created HDFS-14914: - Summary: Observer should throw StandbyException in Safemode Key: HDFS-14914 URL: https://issues.apache.org/jira/browse/HDFS-14914 Project: Hadoop HDFS Issue Type: Improvement Reporter: Fengnan Li Assignee: Fengnan Li Attachments: HDFS-14914-001.patch When an Observer is in safemode, calling getBlockLocations makes it throw RetriableException, as in [HDFS-13898|https://issues.apache.org/jira/browse/HDFS-13898]. However, during startup safemode can take a really long time, and retrying does not help much here. What makes it worse is when Routers talk to Observers: since the Router distinguishes StandbyException from RetriableException, it keeps retrying (default 3 times) and then returns a RetriableException to the client. The client retries again on the same Router, and therefore the same Observer, default 10 times, resulting in 3 * 10 = 30 retries per call. The change is to make it fail over so that the Router can immediately try another Observer or the Active namenode (depending on the design). The current ObserverReadProxyProvider is not affected, since both RetriableException and StandbyException make it fail over.
[jira] [Created] (HDFS-14647) NPE during secure namenode startup
Fengnan Li created HDFS-14647: - Summary: NPE during secure namenode startup Key: HDFS-14647 URL: https://issues.apache.org/jira/browse/HDFS-14647 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 2.8.2 Reporter: Fengnan Li Assignee: Fengnan Li In secure HDFS, while the Namenode is loading the fsimage, hitting the Namenode through the REST API throws the exception below (this is in version 2.8.2):
{quote}org.apache.hadoop.hdfs.web.resources.ExceptionHandler: INTERNAL_SERVER_ERROR java.lang.NullPointerException
 at org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:283)
 at org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:226)
 at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:54)
 at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:42)
 at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
 at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
 at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
 at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
 at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
 at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
 at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
 at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
 at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
 at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
 at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
 at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
 at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
 at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
 at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:87)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1353)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582){quote}
This is because during this phase the namesystem hasn't been initialized yet. In a non-HA context, it can throw a RetriableException to let
[jira] [Created] (HDFS-14449) Expose total number of dt in jmx for KMS/Namenode
Fengnan Li created HDFS-14449: - Summary: Expose total number of dt in jmx for KMS/Namenode Key: HDFS-14449 URL: https://issues.apache.org/jira/browse/HDFS-14449 Project: Hadoop HDFS Issue Type: Improvement Reporter: Fengnan Li Assignee: Fengnan Li
[jira] [Resolved] (HDFS-14444) RBF: Add safemode to Router UI
[ https://issues.apache.org/jira/browse/HDFS-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengnan Li resolved HDFS-14444. --- Resolution: Duplicate > RBF: Add safemode to Router UI > -- > > Key: HDFS-14444 > URL: https://issues.apache.org/jira/browse/HDFS-14444 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > > https://issues.apache.org/jira/browse/HDFS-14259 added the safemode metric, but > it is not very obvious when opening the Router UI. We should indicate > somewhere on the page that safemode is on.
[jira] [Created] (HDFS-14444) Add safemode to Router UI
Fengnan Li created HDFS-14444: - Summary: Add safemode to Router UI Key: HDFS-14444 URL: https://issues.apache.org/jira/browse/HDFS-14444 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Fengnan Li Assignee: Fengnan Li https://issues.apache.org/jira/browse/HDFS-14259 added the safemode metric, but it is not very obvious when opening the Router UI. We should indicate somewhere on the page that safemode is on.
[jira] [Created] (HDFS-14427) Optimize some testing set up logic in MiniRouterDFSCluster
Fengnan Li created HDFS-14427: - Summary: Optimize some testing set up logic in MiniRouterDFSCluster Key: HDFS-14427 URL: https://issues.apache.org/jira/browse/HDFS-14427 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Fengnan Li Assignee: Fengnan Li At [https://github.com/apache/hadoop/blob/HDFS-13891/hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/MiniRouterDFSCluster.java#L808] the comment says one router is created per nameservice, while in the code one router is created per namenode in each nameservice. A couple of things might need optimization: (1) make the code match the comment; (2) add a way to specify the number of routers.
[jira] [Created] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics
Fengnan Li created HDFS-14426: - Summary: RBF: Add delegation token total count as one of the federation metrics Key: HDFS-14426 URL: https://issues.apache.org/jira/browse/HDFS-14426 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Fengnan Li Assignee: Fengnan Li Currently the router doesn't report the total number of currently valid delegation tokens it holds, but this information is useful for monitoring and understanding the real-time state of tokens.
[jira] [Created] (HDFS-14405) RBF: Client should be able to renew dt immediately after it fetched the dt
Fengnan Li created HDFS-14405: - Summary: RBF: Client should be able to renew dt immediately after it fetched the dt Key: HDFS-14405 URL: https://issues.apache.org/jira/browse/HDFS-14405 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Fengnan Li Assignee: Fengnan Li By the current design, once a dt is generated it needs to be synced to other routers and backed up in the state store, so there is a time gap before other routers know of the token's existence. Ideally, the same client should be able to renew the token it just created through fetchdt even when the two calls hit two distinct routers.
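One way to close the gap described above can be sketched as follows. This is a hypothetical illustration (the class, method names, and state-store accessor are invented for the sketch, not Hadoop's API): when a renewal arrives for a token this Router has not yet synced, fall back to a direct state-store read before failing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Illustrative sketch (hypothetical names) of one approach to the
 * HDFS-14405 gap: tokens not yet seen via periodic sync are looked
 * up in the state store on demand during renewal.
 */
public class TokenRenewSketch {
  // Local view of token id -> expiry, populated by periodic sync.
  private final Map<String, Long> localTokens = new ConcurrentHashMap<>();

  long renew(String tokenId, Function<String, Long> stateStoreLookup,
             long extensionMillis) {
    Long expiry = localTokens.get(tokenId);
    if (expiry == null) {
      // Not yet synced from the issuing Router; consult the state store.
      expiry = stateStoreLookup.apply(tokenId);
      if (expiry == null) {
        throw new IllegalArgumentException("Unknown token: " + tokenId);
      }
    }
    long newExpiry = expiry + extensionMillis;
    localTokens.put(tokenId, newExpiry);
    return newExpiry;
  }
}
```

The fallback read is the price of eventual consistency between routers; it only triggers in the window between token creation and the next sync.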
[jira] [Created] (HDFS-14327) Support security for DNS resolving
Fengnan Li created HDFS-14327: - Summary: Support security for DNS resolving Key: HDFS-14327 URL: https://issues.apache.org/jira/browse/HDFS-14327 Project: Hadoop HDFS Issue Type: Improvement Reporter: Fengnan Li Assignee: Fengnan Li With DNS resolving, clients get the IPs of the servers (NN/Routers) and use the IP addresses to access the machines. This fails in a secure environment, since Kerberos uses the domain name in the principal and so won't recognize IP addresses. This task mainly adds a reverse lookup on the current basis: after the IP is fetched, the domain name is obtained, and clients then still use the domain name to access the servers.
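The reverse-lookup step described above can be sketched with the standard java.net API. This is an illustrative sketch, not the actual Hadoop patch; the class and method names are invented, while InetAddress.getCanonicalHostName is the real JDK call that performs the reverse (PTR) lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Sketch of the HDFS-14327 idea: map a resolved server IP back to a
 * fully qualified host name so Kerberos principal matching (which uses
 * host names, not IPs) can succeed.
 */
public class ReverseLookupSketch {
  static String hostNameFor(String ip) {
    try {
      // getCanonicalHostName performs the reverse lookup; if no PTR
      // record exists, it returns the textual IP unchanged.
      return InetAddress.getByName(ip).getCanonicalHostName();
    } catch (UnknownHostException e) {
      return ip; // fall back to the raw IP on resolution failure
    }
  }
}
```

Clients would then build the server principal (e.g. nn/<hostname>@REALM) from the returned name rather than from the IP.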
[jira] [Created] (HDFS-14310) Improve documents for using DNS to resolve namenodes and routers
Fengnan Li created HDFS-14310: - Summary: Improve documents for using DNS to resolve namenodes and routers Key: HDFS-14310 URL: https://issues.apache.org/jira/browse/HDFS-14310 Project: Hadoop HDFS Issue Type: Improvement Reporter: Fengnan Li Assignee: Fengnan Li With https://issues.apache.org/jira/browse/HDFS-14118, clients can use a single domain name to access either namenodes or routers, instead of putting all of their domain names in the config. Update the documents below with this new feature: * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithQJM.md * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSHighAvailabilityWithNFS.md * hadoop-hdfs-project/hadoop-hdfs-rbf/src/site/markdown/HDFSRouterFederation.md * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ObserverNamenode.md
[jira] [Created] (HDFS-14239) Fix the comment for getClass in Configuration
Fengnan Li created HDFS-14239: - Summary: Fix the comment for getClass in Configuration Key: HDFS-14239 URL: https://issues.apache.org/jira/browse/HDFS-14239 Project: Hadoop HDFS Issue Type: Improvement Reporter: Fengnan Li Assignee: Fengnan Li The comment for the getClass method in org.apache.hadoop.conf.Configuration is wrong: it uses the property name instead of the actual class name.
[jira] [Created] (HDFS-14118) RBF: Use DNS to help resolve routers
Fengnan Li created HDFS-14118: - Summary: RBF: Use DNS to help resolve routers Key: HDFS-14118 URL: https://issues.apache.org/jira/browse/HDFS-14118 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Fengnan Li Clients need to know about routers to talk to the HDFS cluster (obviously), and adding or removing routers would otherwise require changing every client, which is a painful process. DNS can be used here to resolve the single domain name clients know to the list of routers in the current config. However, DNS cannot resolve only to working routers based on health thresholds. There are a few ways this could be solved. One way is a separate script that regularly checks router status and updates the DNS records when a router fails the health thresholds; security must be carefully considered for this approach. Another way is to have clients do the normal connecting/failover after they get the list of routers, which requires changing the current failover proxy provider.
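The client-side resolution step described above can be sketched with the standard java.net API. This is an illustrative sketch (class and method names are invented, not Hadoop's failover proxy provider API); InetAddress.getAllByName is the real JDK call that returns every address behind one DNS name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the HDFS-14118 idea: resolve a single router domain name
 * to every address behind it, then hand the resulting list to the
 * existing connect/failover logic.
 */
public class RouterDnsResolveSketch {
  static List<String> resolveAll(String domain) {
    List<String> addrs = new ArrayList<>();
    try {
      // One DNS name may map to many routers (one record per router).
      for (InetAddress a : InetAddress.getAllByName(domain)) {
        addrs.add(a.getHostAddress());
      }
    } catch (UnknownHostException e) {
      // Resolution failure yields an empty list; the caller decides
      // whether to retry or fail over.
    }
    return addrs;
  }
}
```

Note this covers only discovery; as the description says, excluding unhealthy routers still needs either DNS record maintenance or client-side failover on top of this list.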