[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251809#comment-16251809 ]
Allen Wittenauer commented on HDFS-12711:
-----------------------------------------

With the kill code in place, I'm seeing wild fluctuations in the hdfs and mr unit tests. Lots of unreaped processes, which is probably a hint that they are paused for some reason.

I have a hunch that we're pretty much bottlenecked on IO. Tests happen on a single disk that is shared among all the executors on that Jenkins node. Let's say two sets of HDFS tests are running: that could easily be thousands of threads doing IO to the same disk.

It might be smart to decrease the number of parallel tests, at least in HDFS. This obviously impacts runtime (which is already out of control) but will probably make the results more accurate. Or we could attempt to split up the tests so that the compute-heavy ones run in parallel and the IO-heavy ones run serially.

Of course, if no one is paying attention to the tests anyway, we could just disable them altogether, I guess.

> deadly hdfs test
> ----------------
>
>                 Key: HDFS-12711
>                 URL: https://issues.apache.org/jira/browse/HDFS-12711
>             Project: Hadoop HDFS
>          Issue Type: Test
>    Affects Versions: 2.9.0, 2.8.2
>            Reporter: Allen Wittenauer
>            Priority: Critical
>         Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt
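
For illustration only, the idea in the comment of running compute-heavy tests in parallel and IO-heavy tests serially could be sketched roughly as below. This is not how the Hadoop build schedules tests; `run_split`, `run_one`, the `io_heavy` set, and the worker count are all hypothetical names and values invented for the sketch, and deciding which tests count as IO-heavy is assumed to be done by hand.

```python
from concurrent.futures import ThreadPoolExecutor


def run_split(tests, io_heavy, run_one, workers=4):
    """Run compute-heavy tests in parallel, then IO-heavy tests one at a time.

    tests:    iterable of test names
    io_heavy: set of names assumed to be IO-bound (hypothetical, hand-made list)
    run_one:  callable that runs a single test by name and returns its result
    workers:  size of the pool used for the compute-heavy group (assumed value)
    """
    tests = list(tests)
    compute = [t for t in tests if t not in io_heavy]
    serial = [t for t in tests if t in io_heavy]

    results = {}

    # Compute-heavy tests share a thread pool; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, result in zip(compute, pool.map(run_one, compute)):
            results[name] = result

    # IO-heavy tests run strictly one after another, and only once the pool
    # has drained, so they never compete with each other for the disk.
    for name in serial:
        results[name] = run_one(name)

    return results
```

The serial phase starts only after the pool has shut down, so in this sketch an IO-heavy test never overlaps with any other test. The cost is the longer wall-clock time the comment already warns about.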