[ 
https://issues.apache.org/jira/browse/HADOOP-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583115#comment-13583115
 ] 

nkeywal commented on HADOOP-9112:
---------------------------------

One issue I have with timeouts is that we have to change them while debugging 
(maybe there is an option I don't know about).

Anyway, a test process can fail in afterSuite (basically, when you're 
shutting down the cluster). Surefire may not kill it, so you won't know 
about it until you find it still running at the next build. 

In HBase, we do this before running the tests:
  ### kill any process remaining from another test, maybe even another project
  jps | grep surefirebooter | cut -d ' ' -f 1 | xargs kill -9 2>/dev/null

And this afterwards:
  ZOMBIE_TESTS_COUNT=`jps | grep surefirebooter | wc -l`
  if [[ $ZOMBIE_TESTS_COUNT != 0 ]] ; then
    #It seems sometimes the tests are not dying immediately. Let's give them 30s
    echo "Suspicious java process found - waiting 30s to see if there are just slow to stop"
    sleep 30
    ZOMBIE_TESTS_COUNT=`jps | grep surefirebooter | wc -l`
    if [[ $ZOMBIE_TESTS_COUNT != 0 ]] ; then
      echo "There are $ZOMBIE_TESTS_COUNT zombie tests, they should have been killed by surefire but survived"
      echo "************ BEGIN zombies jstack extract"
      ZB_STACK=`jps | grep surefirebooter | cut -d ' ' -f 1 | xargs -n 1 jstack | grep ".test" | grep "\.java"`
      jps | grep surefirebooter | cut -d ' ' -f 1 | xargs -n 1 jstack
      echo "************ END  zombies jstack extract"
      JIRA_COMMENT="$JIRA_COMMENT

     {color:red}-1 core zombie tests{color}.  There are ${ZOMBIE_TESTS_COUNT} zombie test(s): ${ZB_STACK}"
      BAD=1
      jps | grep surefirebooter | cut -d ' ' -f 1 | xargs kill -9
    else
      echo "We're ok: there is no zombie test, but some tests took some time to stop"
    fi
  else
    echo "We're ok: there is no zombie test"
  fi
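
For illustration only, the same check could be factored into small functions 
so that the jps pipeline is written once. This is a rough sketch, not the 
actual HBase script: the function names (list_surefire_pids, 
kill_surefire_pids, check_zombies) and the grace-period argument are made up 
here, and the jstack extract and JIRA comment handling are left out.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical; jps, grep, cut and kill
# are the same tools used in the snippets above.

# Print the PID of every surefire-forked JVM, one per line (nothing if none).
list_surefire_pids() {
  # jps prints "<pid> <main class>"; keep the pid column of matching lines.
  jps 2>/dev/null | grep surefirebooter | cut -d ' ' -f 1
}

# Kill every listed PID; does nothing when the list is empty.
kill_surefire_pids() {
  for pid in $(list_surefire_pids); do
    kill -9 "$pid" 2>/dev/null || true
  done
}

# Return 0 if no surefire JVM is left after the grace period, 1 otherwise.
# $1 is the grace period in seconds (defaults to 30, as in the script above).
check_zombies() {
  grace_seconds="${1:-30}"
  # grep -c . counts non-empty lines and exits 1 on zero, hence "|| true".
  count=$(list_surefire_pids | grep -c . || true)
  if [ "$count" -eq 0 ]; then
    echo "No zombie tests"
    return 0
  fi
  # Some tests are just slow to stop, so wait before declaring them zombies.
  sleep "$grace_seconds"
  count=$(list_surefire_pids | grep -c . || true)
  if [ "$count" -eq 0 ]; then
    echo "No zombie tests, but some tests were slow to stop"
    return 0
  fi
  echo "Found $count zombie test(s)"
  kill_surefire_pids
  return 1
}
```

A build script could call kill_surefire_pids before the tests and then use 
"check_zombies 30 || BAD=1" after them. Looping over the PIDs avoids running 
kill with no arguments when nothing matches.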

See http://www.mail-archive.com/issues@hbase.apache.org/msg73169.html for the 
outcome (it's actually an HDFS zombie; this was before we started killing 
zombies at the beginning of our tests). The whole stack is in the build logs.

It has improved the precommit success ratio.



That's my two cents :-)


                
> test-patch should -1 for @Tests without a timeout
> -------------------------------------------------
>
>                 Key: HADOOP-9112
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9112
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Todd Lipcon
>            Assignee: Surenkumar Nihalani
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-9112-1.patch, HADOOP-9112-2.patch, 
> HADOOP-9112-3.patch, HADOOP-9112-4.patch, HADOOP-9112-5.patch, 
> HADOOP-9112-6.patch, HADOOP-9112-7.patch
>
>
> With our current test running infrastructure, if a test with no timeout set 
> runs too long, it triggers a surefire-wide timeout, which for some reason 
> doesn't show up as a failed test in the test-patch output. Given that, we 
> should require that all tests have a timeout set, and have test-patch enforce 
> this with a simple check

