[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124753#comment-16124753
 ] 

Allen Wittenauer commented on MAPREDUCE-4980:
---------------------------------------------

Thanks!

bq. Is the test failure related?

Sort of.  From what I've observed, the ASF infra can get IO bound.  This 
means that operations that normally complete quickly can get stuck in IO wait.  
If you have a threaded test that has a portion of it dependent upon IO, it may 
end up where some threads are completing but others appear "stuck". This is 
made worse by noisy neighbors on the same node.  Given that this test "failed" while 
doing IO, there's a good chance that due to the extra load this test will 
appear as flaky.  In fact, for this particular test, we've seen it before: 
MAPREDUCE-4753. 
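
A minimal sketch of that failure mode, using only the JDK. The class and 
method names are hypothetical and the sleeps merely stand in for IO wait; 
none of this is taken from the actual test. One slow worker is enough to 
make the whole threaded test miss a fixed deadline, even though the other 
workers finished on time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class IoStallSketch {
    /**
     * Starts one worker per entry in ioDelaysMs. Each worker sleeps for its
     * delay (a stand-in for IO wait) and then counts down the latch.
     * Returns true only if every worker finished within timeoutMs.
     */
    static boolean allWorkersFinish(long[] ioDelaysMs, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(ioDelaysMs.length);
        CountDownLatch done = new CountDownLatch(ioDelaysMs.length);
        try {
            for (long delay : ioDelaysMs) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(delay);   // simulated IO
                        done.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The "test" passes only if all workers report in before the deadline.
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();   // interrupt any worker still "stuck"
        }
    }

    public static void main(String[] args) {
        // Quiet node: every worker's IO is fast.
        System.out.println("quiet node:  " + allWorkersFinish(new long[] {10, 10, 10}, 5000));
        // Loaded node: one worker sits in IO wait past the deadline.
        System.out.println("loaded node: " + allWorkersFinish(new long[] {10, 10, 5000}, 200));
    }
}
```

The code under test is identical in both calls; only the simulated IO 
latency differs, which is why such a test looks flaky rather than broken.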

These are the sort of problems I've been fighting.  Many of the tests in 
mr-jobclient are threaded.  They bring up full or partial clusters in 
@Before/@After clauses with inappropriate timeouts for these situations. 
NotificationTestCase.java (which is what TestLocalMRNotification really is) 
follows this exact same pattern.

I wouldn't be surprised if unit tests in other parts of the code base 
suffer from the same problem.  We might have a test "anti-pattern" happening 
and not realize it.

bq. w.r.t @Test(timeout), I much prefer having a timeout rule per class; if we 
do that rather than simply upping the value in the @Test clauses we set things 
up better for the future.

I've been debating raising the default limit in general for jobclient rather 
than fight these one by one.  But overall, yeah, I agree. Per class timing 
rules really seem like a smarter choice.
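
In JUnit 4 a per-class limit is normally declared with the 
org.junit.rules.Timeout rule on a @Rule field.  The sketch below is not that 
rule; it is a JDK-only illustration of the idea, with hypothetical names and 
an arbitrary 500 ms budget: one limit is defined once for the class and 
applied to every test body, so raising it later is a one-line change.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ClassTimeoutSketch {
    // Single budget shared by every test body in this (hypothetical) class.
    static final long CLASS_TIMEOUT_MS = 500;

    /** Runs body on its own thread; returns "passed", "failed" or "timed out". */
    static String runWithClassTimeout(Runnable body) {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        Future<?> result = exec.submit(body);
        try {
            result.get(CLASS_TIMEOUT_MS, TimeUnit.MILLISECONDS);
            return "passed";
        } catch (TimeoutException e) {
            return "timed out";
        } catch (ExecutionException e) {
            return "failed";      // the body threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "failed";
        } finally {
            exec.shutdownNow();   // interrupt a body that is still running
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithClassTimeout(() -> { }));
    }
}
```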

bq. Could MiniMRClientClusterBuilder implement Closeable.close()? 
bq. If MiniMRClientClusterBuilder.namenode() was overridden to take a URI or 
FileSystem, it'd be easier to set up

In the case of closeable, my hunch is that it will be highly dependent upon 
what the YARN minicluster code does, since it's effectively a wrapper around 
it.  
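
A rough sketch of what a Closeable wrapper buys a test, assuming the wrapped 
cluster exposes some stop-like call.  StoppableCluster and ClosingCluster are 
hypothetical names, not the existing minicluster API.  With 
try-with-resources the cluster is stopped even when the test body throws.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseableClusterSketch {
    /** Hypothetical stand-in for a mini cluster; only stop() matters here. */
    interface StoppableCluster {
        void stop() throws IOException;
    }

    /** Adapter that lets a cluster be used in try-with-resources. */
    static final class ClosingCluster implements Closeable {
        private final StoppableCluster delegate;

        ClosingCluster(StoppableCluster delegate) {
            this.delegate = delegate;
        }

        @Override
        public void close() throws IOException {
            delegate.stop();
        }
    }

    /** Simulates a failing test body and reports whether stop() still ran. */
    static boolean stoppedEvenWhenBodyThrows() {
        boolean[] stopped = {false};
        try (ClosingCluster cluster = new ClosingCluster(() -> stopped[0] = true)) {
            throw new IllegalStateException("simulated test failure");
        } catch (IllegalStateException | IOException e) {
            // close() has already run by the time control reaches this handler.
        }
        return stopped[0];
    }

    public static void main(String[] args) {
        System.out.println("stopped after failure: " + stoppedEvenWhenBodyThrows());
    }
}
```

How useful this is in practice depends on whether the underlying stop call 
actually releases everything, which is the dependency on the YARN minicluster 
code noted above.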

I've been hesitant to fix "too much" here since the tests, despite being slow, 
are (mostly) reliable.  As I go through the patch to fix the checkstyle issues 
(mainly caused by using machine translation), the patch is getting pretty big. 
:(


> Parallel test execution of hadoop-mapreduce-client-core
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-4980
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
>             Project: Hadoop Map/Reduce
>          Issue Type: Test
>          Components: test
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Tsuyoshi Ozawa
>            Assignee: Andrey Klochkov
>         Attachments: MAPREDUCE-4980.010.patch, MAPREDUCE-4980.011.patch, 
> MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, MAPREDUCE-4980--n4.patch, 
> MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, MAPREDUCE-4980--n7.patch, 
> MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, MAPREDUCE-4980.patch
>
>
> The Maven Surefire plugin supports a parallel testing feature. By using it, 
> the tests can be run faster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
