[
https://issues.apache.org/jira/browse/TEZ-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated TEZ-4541:
------------------------------
Description:
TEZ-1495 introduced a DAG id collection
([here|https://github.com/apache/tez/commit/4cf6472e39018d8809a945f4ccb39155d8c03220#diff-54ba4a2af15261379079ed5a9c1f9eea52da7bdd3f1109fe7b96d4abf07f6173R248])
that tracks all DAG ids, apparently only so that DAGClientHandler can return a
different response for a finished DAG than for an unknown one:
https://github.com/apache/tez/blob/f8c2e11d0b469748ea95381e7021266e25e5ac89/tez-dag/src/main/java/org/apache/tez/dag/api/client/DAGClientHandler.java#L101-L111
{code}
if (!currentDAGIdStr.equals(dagIdStr)) {
  if (getAllDagIDs().contains(dagIdStr)) {
    LOG.debug("Looking for finished dagId {} current dag is {}", dagIdStr, currentDAGIdStr);
    throw new DAGNotRunningException("DAG " + dagIdStr + " Not running, current dag is " +
        currentDAGIdStr);
  } else {
    LOG.warn("Current DAGID : " + currentDAGIdStr + ", Looking for string (not found): " +
        dagIdStr + ", dagIdObj: " + dagId);
    throw new TezException("Unknown dagId: " + dagIdStr);
  }
}
{code}
I can see that DAGNotRunningException is used by DAGClientImpl to handle edge
cases, which is fine. So instead of removing this collection, we might want to
limit its size (e.g. to 500 entries), so that DAGAppMaster keeps responding as
expected for a certain amount of time and the current contract is not broken.
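
One possible shape for the size limit, as a rough sketch rather than a proposed patch: an insertion-ordered set that evicts its oldest entry once a cap is exceeded. The class name BoundedDagIdHistory and its methods are hypothetical and do not exist in Tez. The sketch assumes DAG ids are kept as strings and that the collection may be touched from more than one thread.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Tez): remembers only the most recent DAG ids. */
final class BoundedDagIdHistory {
  private final Set<String> ids;

  BoundedDagIdHistory(final int maxSize) {
    if (maxSize <= 0) {
      throw new IllegalArgumentException("maxSize must be positive: " + maxSize);
    }
    // Insertion-ordered map that drops its eldest entry once the cap is exceeded.
    Map<String, Boolean> backing = new LinkedHashMap<String, Boolean>() {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > maxSize;
      }
    };
    // Synchronized wrapper: assumes concurrent access by several threads.
    this.ids = Collections.synchronizedSet(Collections.newSetFromMap(backing));
  }

  /** Records a DAG id; the oldest recorded id is evicted if the cap is exceeded. */
  void add(String dagIdStr) {
    ids.add(dagIdStr);
  }

  /** Returns true only while the id is still within the retained window. */
  boolean contains(String dagIdStr) {
    return ids.contains(dagIdStr);
  }

  int size() {
    return ids.size();
  }
}
```

With a cap of 500, a lookup for one of the last 500 DAG ids would still lead to DAGNotRunningException, while an older, evicted id would fall through to the "Unknown dagId" TezException branch. That is the "certain amount of time" trade-off mentioned above.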
> Remove evergrowing dag collections from DAGAppMaster
> ----------------------------------------------------
>
> Key: TEZ-4541
> URL: https://issues.apache.org/jira/browse/TEZ-4541
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: László Bodor
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)