[jira] [Commented] (HDFS-9890) libhdfs++: Add test suite to simulate network issues

Xiaowei Zhu (JIRA) Mon, 11 Jul 2016 09:43:28 -0700

    [ https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371124#comment-15371124 ]


Xiaowei Zhu commented on HDFS-9890:
-----------------------------------

I found the root cause of the non-deterministic failures in our unit tests. The
changes to filesystem.cc in our patch raise the number of threads created in
FileSystemImpl::FileSystemImpl(...) from 1 to 2, and that is what causes those
failures. I verified this against the latest HDFS-8707 branch and reproduced the
same issue there by increasing the number of threads. This change was introduced
in the original 000.patch and is not closely related to the scope of this JIRA,
so I plan to change the thread count back to 1 and file a separate JIRA for the
issue I found.
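
One plausible mechanism, offered as an assumption rather than a confirmed diagnosis: with a single worker thread, handlers posted to an event loop run one at a time, so any state they share is serialized by accident; with two threads that guarantee disappears, and unsynchronized shared state becomes a race whose outcome depends on scheduling. The minimal C++ sketch below illustrates this with a hypothetical WorkerPool (not the libhdfs++ API) and measures the largest number of handlers that ever run at the same time.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop driven by N worker threads.
class WorkerPool {
 public:
  explicit WorkerPool(unsigned num_threads) {
    for (unsigned i = 0; i < num_threads; ++i)
      workers_.emplace_back([this] { Run(); });
  }
  ~WorkerPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // workers drain the queue first
  }
  void Post(std::function<void()> handler) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push(std::move(handler));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> handler;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stopping and nothing left to run
        handler = std::move(queue_.front());
        queue_.pop();
      }
      handler();  // runs outside the lock, so handlers overlap when N > 1
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};

// Posts `n` handlers and returns the peak number running concurrently.
int PeakConcurrency(unsigned num_threads, int n) {
  std::atomic<int> active{0};
  std::atomic<int> peak{0};
  {
    WorkerPool pool(num_threads);
    for (int i = 0; i < n; ++i) {
      pool.Post([&] {
        int now = ++active;
        int prev = peak.load();
        // Raise `peak` to `now` if no other thread already raised it higher.
        while (now > prev && !peak.compare_exchange_weak(prev, now)) {
        }
        std::this_thread::sleep_for(std::chrono::microseconds(50));
        --active;
      });
    }
  }  // pool destructor runs every queued handler and joins the workers
  return peak.load();
}
```

With one thread the peak is always 1; with two it may be 1 or 2 depending on scheduling, which is exactly the kind of timing-dependent behavior that makes test failures non-deterministic.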

> libhdfs++: Add test suite to simulate network issues
> ----------------------------------------------------
>
>                 Key: HDFS-9890
>                 URL: https://issues.apache.org/jira/browse/HDFS-9890
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: Xiaowei Zhu
>         Attachments: HDFS-9890.HDFS-8707.000.patch, 
> HDFS-9890.HDFS-8707.001.patch, HDFS-9890.HDFS-8707.002.patch, 
> HDFS-9890.HDFS-8707.003.patch, HDFS-9890.HDFS-8707.004.patch, 
> HDFS-9890.HDFS-8707.005.patch, HDFS-9890.HDFS-8707.006.patch, 
> HDFS-9890.HDFS-8707.007.patch, HDFS-9890.HDFS-8707.008.patch, 
> HDFS-9890.HDFS-8707.009.patch, HDFS-9890.HDFS-8707.010.patch, 
> HDFS-9890.HDFS-8707.011.patch, HDFS-9890.HDFS-8707.012.patch, 
> HDFS-9890.HDFS-8707.012.patch, HDFS-9890.HDFS-8707.013.patch, 
> HDFS-9890.HDFS-8707.013.patch, HDFS-9890.HDFS-8707.014.patch, 
> HDFS-9890.HDFS-8707.015.patch, hs_err_pid26832.log, hs_err_pid4944.log
>
>
> I propose adding a test suite to simulate various network issues/failures in 
> order to get good test coverage of some of the retry paths that aren't easy 
> to hit in mock unit tests.
> At the moment the only things that hit the retry paths are the gmock unit 
> tests.  The gmock tests are only as good as their mock implementations, which 
> do a great job of simulating protocol correctness but not more complex 
> interactions.  They also can't really simulate the types of lock contention 
> and subtle memory stomps that show up while doing hundreds or thousands of 
> concurrent reads.  We should add a new minidfscluster test that focuses on 
> heavy read/seek load and then randomly converts the (normally successful) 
> error codes returned by network functions into errors.
> List of things to simulate (while heavily loaded), roughly in order of how 
> badly I think they need to be tested at the moment:
> - RPC connection disconnect
> - RPC connection slowed down enough to cause a timeout and trigger a retry
> - DN (DataNode) connection disconnect
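
The description proposes randomly turning the results of network calls into errors while the cluster is under load. A minimal C++ sketch of that idea follows, assuming the client's network callbacks report results as std::error_code. FaultInjector and CallWithRetry are hypothetical names, not libhdfs++ code, and the "slowed down" case is modeled by returning timed_out directly instead of actually delaying I/O.

```cpp
#include <functional>
#include <random>
#include <system_error>

// Hypothetical fault injector (not part of libhdfs++). With probability
// `rate`, it replaces a successful network result with a simulated failure.
class FaultInjector {
 public:
  FaultInjector(double rate, unsigned seed) : rate_(rate), rng_(seed) {}

  std::error_code Filter(std::error_code real) {
    if (real) return real;  // genuine errors pass through unchanged
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng_) >= rate_) return real;  // no fault injected this time
    // Simulate either a dropped connection or a stall long enough to time out.
    std::uniform_int_distribution<int> mode(0, 1);
    return mode(rng_) == 0 ? std::make_error_code(std::errc::connection_reset)
                           : std::make_error_code(std::errc::timed_out);
  }

 private:
  double rate_;
  std::mt19937 rng_;
};

// A retry loop of the kind the injected faults are meant to exercise.
// Returns the last error code; `*attempts` receives the number of tries.
std::error_code CallWithRetry(const std::function<std::error_code()>& op,
                              FaultInjector& injector, int max_retries,
                              int* attempts) {
  std::error_code ec;
  *attempts = 0;
  for (int i = 0; i <= max_retries; ++i) {
    ++*attempts;
    ec = injector.Filter(op());
    const bool retriable = ec == std::errc::connection_reset ||
                           ec == std::errc::timed_out;
    if (!retriable) break;  // success, or an error that should not be retried
  }
  return ec;
}
```

A fixed seed makes a failing run reproducible, which matters when chasing the non-deterministic failures such a suite is meant to expose. Where the filter would hook into the real client (for example, in the completion handlers of socket reads and connects) is left open here.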



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
