[ https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689993#comment-16689993 ]

Chen Liang edited comment on HDFS-14058 at 12/5/18 12:02 AM:
-------------------------------------------------------------

The tests were done on a setup of 100+ datanodes, 1 Active NameNode and 1 Observer NameNode, with no other standby nodes. The cluster has a light HDFS workload, has YARN deployed, and has security (Kerberos) enabled. The purpose here was not to evaluate performance gain, but mainly to prove functionality and correctness. In all the tests below, it is *verified from both NameNodes' audit logs* that the reads actually went to the Observer node and the writes went to the Active, and it is *verified from job/client logs* that when the client could not talk to the Observer (e.g. for write requests, or when the Observer node was actually in Standby state rather than Observer), it fell back to talking to the Active.

The specific tests done include:

1. basic hdfs IO
- From the hdfs command line:
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program. I wrote some code which creates a DFSClient instance and performs some basic operations against it (see the sketch below):
-- create/delete directory
-- get/renew delegation token

One observation on this: from the command line, depending on the relative order of ANN and ONN in the config, the failover may happen on every single call, with an exception printed. This is because each command line invocation creates a new DFSClient instance, which may start by sending a write to the Observer, causing a failover. With a reused DFSClient (e.g. a Java program that creates and reuses the same DFSClient), this issue does not occur.
2. simple MR job: a simple wordcount job from the mapreduce-examples jar, on a very small input.

3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar, with default parameters. I ran Slive 3 times each with Observer enabled and disabled, and saw a similar number of ops/sec.

4. DFSIO: ran the DFSIO read test several times from the hadoop-mapreduce-client-jobclient jar; the tests were done with 100 files, 100 MB each.

5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate several times from the hadoop-mapreduce-examples jar with 1 TB of data. TeraSort used 1800+ mappers and 500 reducers. All three jobs finished successfully.
was (Author: vagarychen):
The tests I've run include the following. Please note that these tests were done without several recent changes such as HDFS-14035 and HDFS-14017, but with some hacky code changes and workarounds. Although the required changes have since been formalized in recent Jiras, the following tests haven't all been re-run with those changes. Posting here for the record.

The tests were done on a setup of 100+ datanodes, 1 Active NameNode and 1 Observer NameNode, with no other standby nodes. The cluster has a light HDFS workload, has YARN deployed, and has security (Kerberos) enabled. The purpose here was not to evaluate performance gain, but only to prove the functionality. In all the tests below, it is verified from the Observer node's audit log that the reads actually went to the Observer node.

1. basic hdfs IO
- From the hdfs command line:
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program. I wrote some code which creates a DFSClient instance and performs some basic operations against it:
-- create/delete directory
-- get/renew delegation token

One observation on this: from the command line, depending on the relative order of ANN and ONN in the config, the failover may happen on every single call, with an exception printed. I believe this is because each command line invocation creates a new DFSClient instance, which may start by sending a write to the Observer, causing a failover. With a reused DFSClient (e.g. a Java program that creates and reuses the same DFSClient), this issue does not occur.

2. simple MR job: a simple wordcount job from the mapreduce-examples jar, on a very small input.

3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar, without parameters (so it uses the defaults). I ran Slive 3 times each with Observer enabled and disabled, and saw roughly the same ops/sec.

4. DFSIO: ran the DFSIO read test several times from the hadoop-mapreduce-client-jobclient jar, but only with a very small input size (10 files of 1 KB each).

5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate from the hadoop-mapreduce-examples jar with 1 TB of data. TeraSort used 1800+ mappers and 500 reducers. All three jobs finished successfully.


> Test reads from standby on a secure cluster with IP failover
> ------------------------------------------------------------
>
>                 Key: HDFS-14058
>                 URL: https://issues.apache.org/jira/browse/HDFS-14058
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Konstantin Shvachko
>            Assignee: Chen Liang
>            Priority: Major
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA cluster with {{IPFailoverProxyProvider}}.