[ https://issues.apache.org/jira/browse/TEZ-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated TEZ-4541:
------------------------------
    Description: 
TEZ-1495 introduced a DAG ID collection
([here|https://github.com/apache/tez/commit/4cf6472e39018d8809a945f4ccb39155d8c03220#diff-54ba4a2af15261379079ed5a9c1f9eea52da7bdd3f1109fe7b96d4abf07f6173R248])
that tracks every DAG ID, only to be able to give a different response (?)
when a client asks about a DAG that is not the current one:

https://github.com/apache/tez/blob/f8c2e11d0b469748ea95381e7021266e25e5ac89/tez-dag/src/main/java/org/apache/tez/dag/api/client/DAGClientHandler.java#L101-L111
{code}
    if (!currentDAGIdStr.equals(dagIdStr)) {
      if (getAllDagIDs().contains(dagIdStr)) {
        LOG.debug("Looking for finished dagId {} current dag is {}", dagIdStr, currentDAGIdStr);
        throw new DAGNotRunningException("DAG " + dagIdStr + " Not running, current dag is " +
            currentDAGIdStr);
      } else {
        LOG.warn("Current DAGID : " + currentDAGIdStr + ", Looking for string (not found): " +
            dagIdStr + ", dagIdObj: " + dagId);
        throw new TezException("Unknown dagId: " + dagIdStr);
      }
    }
{code}

I can see that DAGClientImpl relies on DAGNotRunningException to handle edge
cases (it infers that a DAG has completed if the DAG is not the current DAG but
is present in the DAG ID collection), which is fine. So instead of removing
this collection, we might want to limit its size, e.g. to 500 entries, so that
DAGAppMaster keeps responding as expected for a certain amount of time (hence
not breaking the current contract).
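
A minimal Java sketch of the size limit suggested above, assuming the simplest policy: keep DAG IDs in insertion order and evict the oldest one once the cap is exceeded. BoundedDagIdSet and classify() are hypothetical names, not existing Tez API, and 500 is only the example cap from this description; classify() merely mirrors the three branches of the DAGClientHandler snippet.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Tez: a DAG ID collection capped at maxEntries.
final class BoundedDagIdSet {

  private final Set<String> dagIds;

  BoundedDagIdSet(final int maxEntries) {
    if (maxEntries <= 0) {
      throw new IllegalArgumentException("maxEntries must be positive");
    }
    // Insertion-ordered map; removeEldestEntry runs after each new key is added,
    // so the oldest DAG ID is evicted once the cap is exceeded.
    Map<String, Boolean> backing = new LinkedHashMap<String, Boolean>() {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > maxEntries;
      }
    };
    // Synchronized view, assuming client RPC threads read while the AM adds IDs.
    this.dagIds = Collections.synchronizedSet(Collections.newSetFromMap(backing));
  }

  void add(String dagIdStr) {
    dagIds.add(dagIdStr);
  }

  boolean contains(String dagIdStr) {
    return dagIds.contains(dagIdStr);
  }

  int size() {
    return dagIds.size();
  }

  // Mirrors the three outcomes of the DAGClientHandler snippet:
  // current DAG, known-but-not-running DAG, or unknown DAG.
  String classify(String currentDagIdStr, String dagIdStr) {
    if (currentDagIdStr.equals(dagIdStr)) {
      return "CURRENT";
    }
    return contains(dagIdStr) ? "NOT_RUNNING" : "UNKNOWN";
  }

  public static void main(String[] args) {
    BoundedDagIdSet ids = new BoundedDagIdSet(500);
    for (int i = 1; i <= 501; i++) {
      ids.add("dag_1_0001_" + i);
    }
    String current = "dag_1_0001_501";
    System.out.println(ids.size());                            // 500
    System.out.println(ids.classify(current, "dag_1_0001_2")); // NOT_RUNNING
    System.out.println(ids.classify(current, "dag_1_0001_1")); // UNKNOWN (evicted)
  }
}
```

The trade-off of this design: a DAG that has been pushed out by 500 newer DAGs falls through to the "Unknown dagId" TezException branch instead of DAGNotRunningException, so the current contract only holds for the most recent DAGs.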

  was:
TEZ-1495 introduced a DAG ID collection
([here|https://github.com/apache/tez/commit/4cf6472e39018d8809a945f4ccb39155d8c03220#diff-54ba4a2af15261379079ed5a9c1f9eea52da7bdd3f1109fe7b96d4abf07f6173R248])
that tracks every DAG ID, only to be able to give a different response (?)
when a client asks about a DAG that is not the current one:

https://github.com/apache/tez/blob/f8c2e11d0b469748ea95381e7021266e25e5ac89/tez-dag/src/main/java/org/apache/tez/dag/api/client/DAGClientHandler.java#L101-L111
{code}
    if (!currentDAGIdStr.equals(dagIdStr)) {
      if (getAllDagIDs().contains(dagIdStr)) {
        LOG.debug("Looking for finished dagId {} current dag is {}", dagIdStr, currentDAGIdStr);
        throw new DAGNotRunningException("DAG " + dagIdStr + " Not running, current dag is " +
            currentDAGIdStr);
      } else {
        LOG.warn("Current DAGID : " + currentDAGIdStr + ", Looking for string (not found): " +
            dagIdStr + ", dagIdObj: " + dagId);
        throw new TezException("Unknown dagId: " + dagIdStr);
      }
    }
{code}

I can see that DAGClientImpl relies on DAGNotRunningException to handle edge
cases, which is fine. So instead of removing this collection, we might want to
limit its size, e.g. to 500 entries, so that DAGAppMaster keeps responding as
expected for a certain amount of time (hence not breaking the current contract).


> Remove or limit evergrowing DAG collections from DAGAppMaster
> -------------------------------------------------------------
>
>                 Key: TEZ-4541
>                 URL: https://issues.apache.org/jira/browse/TEZ-4541
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
