[ 
https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689993#comment-16689993
 ] 

Chen Liang commented on HDFS-14058:
-----------------------------------

The tests I've run include the following. Please note that these tests 
were done without several recent changes such as HDFS-14035 and HDFS-14017, 
relying instead on some hacky code changes and workarounds. Although the 
required changes have since been formalized in recent Jiras, the tests have 
not all been re-run with those changes. Posting here for the record.

The tests were done on a setup of 100+ datanodes, 1 Active NameNode and 1 
Observer NameNode, with no other standby nodes. The cluster has a light HDFS 
workload, has YARN deployed, and has security (Kerberos) enabled. The purpose 
was not to evaluate performance gain, but only to verify functionality. In all 
the tests below, the Observer node's audit log confirms that the reads 
actually went to the Observer node.

1. basic hdfs IO
- From hdfs command:
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program: I wrote some code that creates a DFSClient 
instance and performs some basic operations against it:
-- create/delete directory
-- get/renew delegation token
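
The Java program described above was roughly along these lines. This is a 
minimal sketch, not the actual test code; the class name and paths are 
hypothetical, and it assumes an HA-enabled core-site.xml/hdfs-site.xml on the 
classpath of a Kerberos-authenticated user:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.token.Token;

public class ObserverReadSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Create one FileSystem (and hence one underlying DFSClient) and reuse
    // it, so the proxy provider's knowledge of which node is Active vs.
    // Observer is retained across calls.
    FileSystem fs = FileSystem.get(conf);

    Path dir = new Path("/tmp/observer-smoke");  // hypothetical test path
    fs.mkdirs(dir);             // write: should go to the Active
    fs.getFileStatus(dir);      // read: should go to the Observer
    fs.delete(dir, true);       // write: should go to the Active

    // Delegation token get/renew on the secure cluster.
    Token<?> token = fs.getDelegationToken("hdfs");  // renewer is an example
    token.renew(conf);

    fs.close();
  }
}
```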

One observation: from the command line, depending on the relative order of the 
ANN and ONN in the config, a failover may happen on every single call, with an 
exception printed. I believe this is because every command line invocation 
creates a new DFSClient instance, which may start by calling the Observer for 
a write, causing a failover. With a reused DFSClient (e.g. a Java program that 
creates and reuses the same instance), this issue does not occur.
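
The ordering in question is the NameNode list in hdfs-site.xml. A 
hypothetical fragment (nameservice and NameNode IDs are examples) showing why 
a fresh DFSClient's first attempt depends on it:

```xml
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <!-- A new DFSClient tries NameNodes in this order. If the Observer is
       listed first, each short-lived CLI client first sends its write there
       and triggers a failover; listing the Active first avoids the spurious
       exception. -->
  <value>nn1,nn2</value>
</property>
```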

2. simple MR job: ran a simple wordcount job from the mapreduce-examples jar 
on a very small input.
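
The invocation was roughly as follows; the jar path/version and the input and 
output paths are examples:

```shell
# Standard wordcount from the examples jar; output dir must not exist yet.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /tmp/wc-input /tmp/wc-output
```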

3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar, 
without parameters (so it uses the defaults). I ran Slive 3 times each with 
the Observer enabled and disabled, and saw roughly the same ops/sec.
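
A sketch of the Slive invocation, assuming the test-jar entry point registered 
by MapredTestDriver; the jar path/version is an example:

```shell
# Slive with all defaults, via the jobclient tests jar.
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar slive
```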

4. DFSIO: ran the DFSIO read test several times from the 
hadoop-mapreduce-client-jobclient jar, but only with a very small input size 
(10 files of 1 KB each).
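
The DFSIO runs would look roughly like the following (a read test needs a 
preceding write to generate its input; the jar path/version is an example, and 
the exact size flag varies slightly across Hadoop versions):

```shell
# Generate the 10 x 1 KB files, then run the read test against them.
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO \
    -write -nrFiles 10 -size 1KB
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO \
    -read -nrFiles 10 -size 1KB
```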

5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate from 
hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and 
500 reducers. All three jobs finished successfully.
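
The three jobs chain together roughly as below; TeraGen's row count of 
10,000,000,000 gives 1 TB at 100 bytes per row. The jar path/version and HDFS 
paths are examples:

```shell
# Generate 1 TB, sort it, then validate the sorted output.
hadoop jar hadoop-mapreduce-examples-*.jar teragen 10000000000 /tera/in
hadoop jar hadoop-mapreduce-examples-*.jar terasort /tera/in /tera/out
hadoop jar hadoop-mapreduce-examples-*.jar teravalidate /tera/out /tera/report
```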

> Test reads from standby on a secure cluster with IP failover
> ------------------------------------------------------------
>
>                 Key: HDFS-14058
>                 URL: https://issues.apache.org/jira/browse/HDFS-14058
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Konstantin Shvachko
>            Assignee: Chen Liang
>            Priority: Major
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
> cluster with {{IPFailoverProxyProvider}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
