[ 
https://issues.apache.org/jira/browse/SUREFIRE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15540785#comment-15540785
 ] 

James Taylor commented on SUREFIRE-1287:
----------------------------------------

We start an HBase mini cluster per test category in a @BeforeClass method, so
the same cluster is used by many test classes. We need to be able to modify the
HBase server-side configuration of the cluster for different categories of
tests to mimic different options provided by Phoenix, so we cannot just spin up
a single cluster. We also have a category of tests that simulate various
failure scenarios (e.g. region server crashes), so each of these needs to spin
up and tear down its own mini cluster.

Surefire's parallel framework has been very helpful to us because it has
enabled us to run test methods in parallel, cutting our test time in half. To
do this, we need to create unique HBase tables for each test so that tests
don't conflict (since they're running on the same cluster). With too many
tables, though, HBase becomes overtaxed and we must restart the mini cluster
after a while. We monitor this through our own test runner, which invokes the
teardown once a threshold is reached. Running separate JVMs on top of the
parallelization Surefire provides helps alleviate this. FWIW, we tried
spawning just a single JVM per test category, but unless we ran only a few
tests, the framework would seem to hang inexplicably.
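The per-test table naming and threshold-triggered cluster restart described above could look roughly like the Java sketch below. It is not Phoenix's actual test code: `TableNameAllocator`, its methods, the `_T` naming scheme and the threshold value are all hypothetical, and the real mini cluster start/stop calls are deliberately left out.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not Phoenix's API): hands out unique table names to
 * tests running in parallel on one shared mini cluster, and counts how many
 * tables have been created since the cluster was last (re)started.
 */
class TableNameAllocator {
    private final String prefix;
    private final int restartThreshold;
    // Monotonic across restarts, so a name is never reused within this JVM.
    private final AtomicLong nextId = new AtomicLong();
    // Tables created on the current cluster instance only.
    private final AtomicInteger tablesSinceRestart = new AtomicInteger();

    TableNameAllocator(String prefix, int restartThreshold) {
        this.prefix = prefix;
        this.restartThreshold = restartThreshold;
    }

    /** Returns a table name that no other test in this JVM will receive. */
    String nextTableName() {
        tablesSinceRestart.incrementAndGet();
        return prefix + "_T" + nextId.incrementAndGet();
    }

    /** True once enough tables exist that the runner should restart the cluster. */
    boolean restartNeeded() {
        return tablesSinceRestart.get() >= restartThreshold;
    }

    /** Called by the runner after it has torn down and restarted the cluster. */
    void clusterRestarted() {
        tablesSinceRestart.set(0);
    }
}
```

A runner would check `restartNeeded()` between test classes, tear down and restart the mini cluster when it returns true, and then call `clusterRestarted()`. Keeping the ID counter separate from the per-cluster count means table names stay unique even after a restart.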

More logging would be very helpful. I'm pretty sure the JVM is crashing, but
it's difficult for us to tell which test was running at the time. Maybe the
forked JVM could notify the boot JVM each time a test class completes? Or maybe
the pings you send could (or already do?) carry this information back to the
boot JVM.
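Until Surefire reports this itself, one stopgap in the spirit of the suggestion above is for the forked JVM to record which test classes it has started and finished in a file that the boot JVM (or a person) can read after a crash. The Java sketch below is an assumption, not Surefire's fork protocol: `ProgressLog`, the file location and the `START`/`END` line format are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical progress log written by a forked test JVM. Each event is
 * appended and the file is closed immediately, so lines already written
 * are not lost if the JVM later dies abruptly.
 */
class ProgressLog {
    private final Path file;

    ProgressLog(Path file) {
        this.file = file;
    }

    void started(String testClass) throws IOException {
        append("START " + testClass);
    }

    void finished(String testClass) throws IOException {
        append("END " + testClass);
    }

    // synchronized: test classes running in parallel may report concurrently.
    private synchronized void append(String line) throws IOException {
        Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Test classes that logged START but never END, e.g. after a crash. */
    static Set<String> unfinished(Path file) throws IOException {
        Set<String> running = new LinkedHashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.startsWith("START ")) {
                running.add(line.substring("START ".length()));
            } else if (line.startsWith("END ")) {
                running.remove(line.substring("END ".length()));
            }
        }
        return running;
    }
}
```

The `started`/`finished` calls would still need to be wired into the test lifecycle, for example from a JUnit `@ClassRule` in a shared base class (not shown). After a crashed run, `ProgressLog.unfinished(path)` lists the classes that were still in flight.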

Thanks for your help, [~tibor17]. Really appreciate it!

> Improve logging to understand why test run failed and report the right failed 
> category
> --------------------------------------------------------------------------------------
>
>                 Key: SUREFIRE-1287
>                 URL: https://issues.apache.org/jira/browse/SUREFIRE-1287
>             Project: Maven Surefire
>          Issue Type: Bug
>          Components: Maven Surefire Plugin
>    Affects Versions: 2.19.1
>            Reporter: Samarth Jain
>
> As part of our automated Jenkins builds that run after every check-in, we have
> been seeing a lot of failures like this one:
>
> Failed to execute goal 
> org.apache.maven.plugins:maven-failsafe-plugin:2.19.1:verify 
> (ParallelStatsEnabledTest) on project phoenix-core: There was a timeout or 
> other error in the fork
>
> Sample run:
> https://builds.apache.org/job/Phoenix-master/1420/console
>
> Unfortunately, that error message doesn't really help. It would be good to
> know exactly why the fork timed out or failed. What we do know is that some
> of the tests in the JUnit category ParallelStatsDisabledTest failed to
> complete. However, Failsafe incorrectly reports the failed category as the
> first category that ran, which in this case happened to be
> ParallelStatsEnabledTest. Note also that Failsafe kicks off the next
> category's run before all the tests in the current category have finished.
> I am not sure whether that is by design or a bug.
> FYI, [~jamestaylor].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
