runzhiwang commented on a change in pull request #242: [LIVY-336] Livy should not spawn one thread per job to track the job on Yarn URL: https://github.com/apache/incubator-livy/pull/242#discussion_r333403822
########## File path: server/src/main/scala/org/apache/livy/utils/SparkYarnApp.scala ########## @@ -165,40 +227,24 @@ class SparkYarnApp private[utils] ( /** * Find the corresponding YARN application id from an application tag. * - * @param appTag The application tag tagged on the target application. - * If the tag is not unique, it returns the first application it found. - * It will be converted to lower case to match YARN's behaviour. * @return ApplicationId or the failure. */ - @tailrec - private def getAppIdFromTag( - appTag: String, - pollInterval: Duration, - deadline: Deadline): ApplicationId = { - if (isProcessErrExit()) { - throw new IllegalStateException("spark-submit start failed") - } - - val appTagLowerCase = appTag.toLowerCase() - - // FIXME Should not loop thru all YARN applications but YarnClient doesn't offer an API. - // Consider calling rmClient in YarnClient directly. - yarnClient.getApplications(appType).asScala.find(_.getApplicationTags.contains(appTagLowerCase)) - match { - case Some(app) => app.getApplicationId - case None => - if (deadline.isOverdue) { - process.foreach(_.destroy()) - leakedAppTags.put(appTag, System.currentTimeMillis()) + private def getAppId(): ApplicationId = { + appIdOption.map(ConverterUtils.toApplicationId).getOrElse { + val appTagLowerCase = appTag.toLowerCase() + // FIXME Should not loop thru all YARN applications but YarnClient doesn't offer an API. + // Consider calling rmClient in YarnClient directly. + yarnClient.getApplications(appType).asScala. + find(_.getApplicationTags.contains(appTagLowerCase)) + match { + case Some(app) => app.getApplicationId + case None => throw new IllegalStateException(s"No YARN application is found with tag" + s" $appTagLowerCase in ${livyConf.getTimeAsMs(LivyConf.YARN_APP_LOOKUP_TIMEOUT)/1000}" + " seconds. This may be because 1) spark-submit fail to submit application to YARN; " + "or 2) YARN cluster doesn't have enough resources to start the application in time. " + "Please check Livy log and YARN log to know the details.") - } else { - Clock.sleep(pollInterval.toMillis) - getAppIdFromTag(appTagLowerCase, pollInterval, deadline) Review comment: Because this code call `getAppIdFromTag` recursively until successful or `deadline.isOverdue`, which maybe cost a lot of time, and delay to monitor other app. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services