[ 
https://issues.apache.org/jira/browse/SPARK-16752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Pran updated SPARK-16752:
-----------------------------
    Attachment: SJS_JOBS_RUNNING
                SJS_JOB_LOG_CONSOLE
                SJS_JOB_COMP_YARN
                SJS_Limited_Log.txt

Please see the attached files for further reference.

> Spark Job Server not releasing jobs from "running list" even after yarn 
> completes the job
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-16752
>                 URL: https://issues.apache.org/jira/browse/SPARK-16752
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.6.0, 1.5.0
>         Environment: SJS version 0.6.1 and Spark 1.5.0 running on Yarn-client 
> mode
>            Reporter: Ash Pran
>              Labels: patch
>         Attachments: SJS_JOBS_RUNNING, SJS_JOB_COMP_YARN, 
> SJS_JOB_LOG_CONSOLE, SJS_Limited_Log.txt
>
>
> We are seeing a strange issue with Spark Job Server (SJS).
> We are using SJS 0.6.1 and Spark 1.5.0 in "yarn-client" mode. The contents 
> of settings.sh for SJS are as follows:
> ********************************************************************
> INSTALL_DIR=$(cd `dirname $0`; pwd -P)
> LOG_DIR=$INSTALL_DIR/logs
> PIDFILE=spark-jobserver.pid
> JOBSERVER_MEMORY=16G
> SPARK_VERSION=1.5.0
> SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/lib/spark
> SPARK_CONF_DIR=$SPARK_HOME/conf
> SCALA_VERSION=2.10.4
> ********************************************************************
> We are using fair scheduling with 2 pools and 50 executors of 1 GB each.
> max-jobs-per-context is set to the number of cores, which is 48.
> For the first 5 minutes or so everything works and jobs are processed 
> normally. After that, we see the following 2 issues occurring at random:
> 1) The cluster is completely idle, with no jobs running. SJS accepts a 
> request but does not submit it to the cluster for roughly 3 - 4 minutes, 
> and the job stays in the "running job" list for that whole time.
> 2) SJS accepts a request and submits it to the cluster, and the cluster 
> finishes the job, but SJS keeps the job in the "running job" list for 
> 3 - 4 minutes before moving it to the completed job list. During this 
> time, our application keeps waiting for the response.
> More details about the issue are documented at the external issue URL.
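
To put numbers on the second symptom (the job finishes on YARN but stays in 
the "running job" list), one option is to poll the job status from outside 
and record when it changes. The following is only a minimal sketch, not 
part of the original report. It assumes that SJS answers GET /jobs/<jobId> 
with a JSON object containing a "status" field and that "RUNNING" is the 
value used while the job is in progress; both should be checked against the 
SJS version in use. The names watch_job and sjs_status_fetcher, and the 
host and job ID in the usage comment, are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, List, Tuple


def sjs_status_fetcher(base_url: str, job_id: str) -> Callable[[], str]:
    """Build a function that reads the job status over HTTP.

    Assumption: GET {base_url}/jobs/{job_id} returns JSON with a "status" key.
    """
    def fetch() -> str:
        url = f"{base_url}/jobs/{job_id}"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp).get("status", "UNKNOWN")
    return fetch


def watch_job(fetch_status: Callable[[], str],
              running_states: Tuple[str, ...] = ("RUNNING",),  # assumed value
              poll_interval: float = 1.0,
              timeout: float = 600.0,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep
              ) -> List[Tuple[float, str]]:
    """Poll until the status leaves running_states.

    Returns a list of (seconds_since_start, status) pairs, one entry per
    observed status change. Raises TimeoutError if the job is still in a
    running state once `timeout` seconds have elapsed.
    """
    start = clock()
    transitions: List[Tuple[float, str]] = []
    last = None
    while True:
        status = fetch_status()
        elapsed = clock() - start
        if status != last:
            transitions.append((elapsed, status))
            last = status
        if status not in running_states:
            return transitions
        if elapsed >= timeout:
            raise TimeoutError(
                f"job still {status!r} after {elapsed:.0f} seconds")
        sleep(poll_interval)


# Example usage (hypothetical host and job ID):
#   fetch = sjs_status_fetcher("http://sjs-host:8090", "some-job-id")
#   for seconds, status in watch_job(fetch, poll_interval=5.0):
#       print(f"{seconds:7.1f}s  {status}")
```

Comparing the last timestamp reported by watch_job with the finish time 
that YARN shows for the same application gives the length of the gap 
during which SJS still lists the job as running.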



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
