[ https://issues.apache.org/jira/browse/SPARK-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619135#comment-14619135 ]
shane knapp commented on SPARK-8571:
------------------------------------

basically the code would look something like:

#!/bin/bash
# start from a clean workspace
rm -rf ./work
git clean -fdx
export BLAH
# run both builds and remember each exit code
build/mvn BLAH BLAH
retcode1=$?
build/mvn WHEE ZOMG
retcode2=$?
# kill anything left hanging (shorthand, not a literal command)
lsof | xargs kill
# fail the job if either build failed
if [[ $retcode1 -ne 0 || $retcode2 -ne 0 ]]; then
  exit 1
fi

> spark streaming hanging processes upon build exit
> -------------------------------------------------
>
>                 Key: SPARK-8571
>                 URL: https://issues.apache.org/jira/browse/SPARK-8571
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Streaming
>         Environment: centos 6.6 amplab build system
>            Reporter: shane knapp
>            Assignee: shane knapp
>            Priority: Minor
>              Labels: build, test
>
> over the past 3 months i've been noticing that there are occasionally hanging
> processes on our build system workers after various spark builds have
> finished. these are all spark streaming processes.
> today i noticed a 3+ hour spark build that was timed out after 200 minutes
> (https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/2994/),
> and the matrix build hadoop.version=2.0.0-mr1-cdh4.1.2 ran on
> amp-jenkins-worker-02. after the timeout, it left the following process (and
> all of its children) hanging.
> the process' CLI command was:
> {quote}
> [root@amp-jenkins-worker-02 ~]# ps auxwww|grep 1714
> jenkins 1714 733 2.7 21342148 3642740 ? Sl 07:52 1713:41 java
> -Dderby.system.durability=test -Djava.awt.headless=true
> -Djava.io.tmpdir=/home/jenkins/workspace/Spark-Master-Maven-pre-YARN/hadoop.version/2.0.0-mr1-cdh4.1.2/label/centos/streaming/target/tmp
> -Dspark.driver.allowMultipleContexts=true
> -Dspark.test.home=/home/jenkins/workspace/Spark-Master-Maven-pre-YARN/hadoop.version/2.0.0-mr1-cdh4.1.2/label/centos
> -Dspark.testing=1 -Dspark.ui.enabled=false
> -Dspark.ui.showConsoleProgress=false
> -Dbasedir=/home/jenkins/workspace/Spark-Master-Maven-pre-YARN/hadoop.version/2.0.0-mr1-cdh4.1.2/label/centos/streaming
> -ea -Xmx3g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m
> org.scalatest.tools.Runner -R
> /home/jenkins/workspace/Spark-Master-Maven-pre-YARN/hadoop.version/2.0.0-mr1-cdh4.1.2/label/centos/streaming/target/scala-2.10/classes
> /home/jenkins/workspace/Spark-Master-Maven-pre-YARN/hadoop.version/2.0.0-mr1-cdh4.1.2/label/centos/streaming/target/scala-2.10/test-classes
> -o -f
> /home/jenkins/workspace/Spark-Master-Maven-pre-YARN/hadoop.version/2.0.0-mr1-cdh4.1.2/label/centos/streaming/target/surefire-reports/SparkTestSuite.txt
> -u
> /home/jenkins/workspace/Spark-Master-Maven-pre-YARN/hadoop.version/2.0.0-mr1-cdh4.1.2/label/centos/streaming/target/surefire-reports/.
> {quote}
> stracing that process doesn't give us much:
> {quote}
> [root@amp-jenkins-worker-02 ~]# strace -p 1714
> Process 1714 attached - interrupt to quit
> futex(0x7ff3cdd269d0, FUTEX_WAIT, 1715, NULL
> {quote}
> stracing its children gives us a *little* bit more... some loop like this:
> {quote}
> <snip>
> futex(0x7ff3c8012d28, FUTEX_WAKE_PRIVATE, 1) = 0
> futex(0x7ff3c8012f54, FUTEX_WAIT_PRIVATE, 28969, NULL) = 0
> futex(0x7ff3c8012f28, FUTEX_WAKE_PRIVATE, 1) = 0
> futex(0x7ff3c8f17954, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7ff3c8f17950,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x7ff3c8f17928, FUTEX_WAKE_PRIVATE, 1) = 1
> futex(0x7ff3c8012d54, FUTEX_WAIT_BITSET_PRIVATE, 1, {2263862, 865233273},
> ffffffff) = -1 ETIMEDOUT (Connection timed out)
> {quote}
> and others loop on PTRACE_ATTACH (no such process) or restart_syscall
> (resuming interrupted call)
> even though this behavior has been solidly pinned to jobs timing out (which
> ends w/an aborted, not failed, build), i've seen it happen for failed builds
> as well. if i see any hanging processes from failed (not aborted) builds, i
> will investigate them and update this bug as well.
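
A slightly fuller sketch of the wrapper idea from the comment at the top, under assumptions that are not in the thread: the helper names `all_succeeded` and `kill_workspace_procs` are made up for illustration, the directory to clean up is assumed to be the Jenkins `$WORKSPACE`, and `lsof -t +D` is assumed to be an acceptable way to find leftover processes (it can be slow on large trees, and it also matches any unrelated process holding files open under that directory). The build commands stay as the placeholders from the comment.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of the Spark build scripts.

# Return 0 if every exit code passed in is zero, 1 otherwise.
all_succeeded() {
  local rc
  for rc in "$@"; do
    if [[ "$rc" -ne 0 ]]; then
      return 1
    fi
  done
  return 0
}

# Send SIGTERM to every process that still has a file open under directory $1,
# skipping this shell itself.
kill_workspace_procs() {
  local dir="$1" pid
  # lsof -t prints bare PIDs; +D walks the whole directory tree.
  for pid in $(lsof -t +D "$dir" 2>/dev/null | sort -u); do
    if [[ "$pid" -ne $$ ]]; then
      # A PID may already be gone by now; ignore that.
      kill "$pid" 2>/dev/null || true
    fi
  done
}

# Intended use (build arguments left as the placeholders from the comment):
#   build/mvn BLAH BLAH; retcode1=$?
#   build/mvn WHEE ZOMG; retcode2=$?
#   kill_workspace_procs "$WORKSPACE"
#   all_succeeded "$retcode1" "$retcode2" || exit 1
```

Running the cleanup before checking the exit codes keeps the behavior of the original snippet: leftover processes are killed whether or not either build failed, and the job still exits non-zero if one of them did.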