[ https://issues.apache.org/jira/browse/HADOOP-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663101#comment-13663101 ]
Chris Nauroth commented on HADOOP-9287:
---------------------------------------

{quote}
As part of this effort, it would be good to enumerate patterns that can cause concurrent tests to fail
{quote}

This sounds like good material for the code review checklist.

http://wiki.apache.org/hadoop/CodeReviewChecklist

{quote}
Instead of changing individual tests to use unique test folder paths, couldn't we just reconfigure test.build.data from the outside (from maven)?
{quote}

This would be very convenient, but unfortunately, I can't think of a way to make it work. The problem is that our pom.xml code hands over control to maven-surefire-plugin, which then iterates through each test suite class and executes them. When execution enters maven-surefire-plugin, the Maven properties are frozen at a specific state. I don't believe there is any way for our pom.xml code to take back control from maven-surefire-plugin between test suite iterations to generate a different unique ID. Maybe a custom JUnit runner could do it? At that point, it might be more trouble than it's worth. Does anyone else have ideas on this? I'm also not aware of any built-in unique ID property or external plugins that generate unique IDs, so we might end up needing to code another custom plugin of our own.

{quote}
Chris Nauroth, have you had a chance to kick the tires on this patch for Windows?
{quote}

Results look good so far. First, I ran the tests without the parallel-tests profile enabled. As expected, this caused no harm to the test results on Windows. That's a great sign!

Next, I enabled parallel-tests with the default thread count of 4. Performance improvement was similar to what is reported here: from ~15 minutes down to ~8 minutes, and this is on a fairly wimpy VM. I did see some new failures though:

# There were failures due to test timeouts in TestCopyPreserveFlag (testPutWithP, testPutWithoutP, testGetWithP, testGetWithoutP), and TestLocalFileSystem (testWorkingDirectory, testCopy).
These all have very short timeouts (1s). I suspect that multi-threaded execution introduced a bit of context-switching overhead that just barely pushed them over the timeout. I recommend increasing these timeouts to 10s. Unfortunately, this suggests that timeout settings + parallel execution could be another source of flaky test results in the future.
# {{TestTFileNoneCodecsByteArrays#testFailureNegativeLength_3}} failed with an EOFException, which makes me think that 2 tests tried to share a file or directory and saw unexpected data. This test class inherits from a base class, and I see that the code changes in the base class should have prevented a sharing problem, but perhaps we missed something. I think we ought to investigate this one before committing. It's probably not a Windows problem, but rather just a coincidence that the problem manifested on a Windows machine.

[~aklochkov], thanks again for sticking with this issue and responding to the feedback. This is going to be a big help for developer productivity. I got pretty excited when the common tests finished so quickly on my machine! :-)

> Parallel testing hadoop-common
> ------------------------------
>
>                 Key: HADOOP-9287
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9287
>             Project: Hadoop Common
>          Issue Type: Test
>          Components: test
>    Affects Versions: 3.0.0
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Andrey Klochkov
>         Attachments: HADOOP-9287.1.patch, HADOOP-9287--N3.patch,
> HADOOP-9287--N3.patch, HADOOP-9287--N4.patch, HADOOP-9287--N5.patch,
> HADOOP-9287.patch, HADOOP-9287.patch
>
>
> The Maven Surefire plugin supports a parallel testing feature. By using it, the
> tests can be run faster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
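
For illustration only, the "unique test folder paths" idea discussed in the comment above might look roughly like the sketch below. This is not taken from the attached patches: the class name {{UniqueTestDirs}}, the method {{forTest}}, and the fallback path {{target/test/data}} are all hypothetical, and the only name borrowed from the discussion is the {{test.build.data}} property. The helper only computes a path; it does not create the directory.

```java
import java.io.File;
import java.util.UUID;

/** Hypothetical helper for illustration; not part of the HADOOP-9287 patch. */
public final class UniqueTestDirs {
  private UniqueTestDirs() {}

  /**
   * Builds a path under test.build.data that is unique to this call,
   * so that two suites running concurrently never resolve to the same folder.
   */
  public static File forTest(String testName) {
    // test.build.data is the property named in the discussion;
    // the fallback value here is an assumption made for this sketch.
    String base = System.getProperty("test.build.data", "target/test/data");
    // A random UUID suffix makes the folder name unique per call.
    return new File(base, testName + "-" + UUID.randomUUID());
  }
}
```

Because the unique suffix is generated inside the test JVM, this avoids the problem described above of Maven properties being frozen once maven-surefire-plugin takes control, at the cost of touching each test that needs its own folder.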