GitHub user guoyuepeng commented on a diff in the pull request:

    https://github.com/apache/incubator-griffin/pull/421#discussion_r219672109

    --- Diff: service/src/main/java/org/apache/griffin/core/util/YarnNetUtil.java ---
    @@ -56,6 +62,14 @@ public static boolean update(String url, JobInstanceBean instance) {
                     instance.setState(LivySessionStates.toLivyState(state));
                 }
                 return true;
    +        } catch (HttpClientErrorException e) {
    +            LOGGER.warn("client error {} from yarn: {}",
    +                    e.getMessage(), e.getResponseBodyAsString());
    +            if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
    +                // in sync with Livy behavior, see com.cloudera.livy.utils.SparkYarnApp
    +                instance.setState(DEAD);
    --- End diff --

    Agreed that we need to handle the state here. But what if this is caused by a network issue? Should we double-check before concluding that the instance is dead?
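
    One way the "double-check" could look is sketched below. It is only an illustration under stated assumptions, not code from this pull request: `DeadStateConfirmation`, `ProbeResult`, and `confirmedDead` are hypothetical names, and the probe stands in for whatever request `YarnNetUtil.update` sends to YARN. The instance is treated as dead only when several consecutive probes all report NOT_FOUND; any network error or successful response leaves the current state unchanged.

```java
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in Griffin.
final class DeadStateConfirmation {

    /** Assumed outcome of a single status request to YARN. */
    public enum ProbeResult { FOUND, NOT_FOUND, NETWORK_ERROR }

    /**
     * Returns true only if `attempts` consecutive probes all report NOT_FOUND.
     * Stops at the first probe that reports anything else, so a transient
     * network problem never leads to a "dead" verdict.
     */
    public static boolean confirmedDead(Supplier<ProbeResult> probe, int attempts) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        for (int i = 0; i < attempts; i++) {
            if (probe.get() != ProbeResult.NOT_FOUND) {
                return false; // inconclusive: keep the current state
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stub probe that always answers NOT_FOUND; a real caller would
        // wrap the HTTP request and map its exceptions to ProbeResult.
        Supplier<ProbeResult> alwaysMissing = () -> ProbeResult.NOT_FOUND;
        System.out.println(confirmedDead(alwaysMissing, 3));
    }
}
```

    A caller would invoke `instance.setState(DEAD)` only when `confirmedDead` returns true. A delay between attempts is left out to keep the sketch short; whether one is needed depends on how often `update` is already polled.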
---