[ 
https://issues.apache.org/jira/browse/TEZ-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4475:
------------------------------
    Description: 
This is very hard to reproduce, but I managed to trigger it. The relevant log lines were:
{code}
2023-02-13 11:32:16,302 INFO  [DAGAppMaster Thread] app.DAGAppMaster (DAGAppMaster.java:startDAG(2545)) - Running DAG: testMultipleClientsWithoutSession2_useDfs
...
2023-02-13 11:32:16,406 INFO  [Thread-675] client.DAGClientImpl (DAGClientImpl.java:getVertexStatusInternal(280)) - getVertexStatusInternal for Sleep, dagCompleted: true, in cache: false
{code}

In this case, the second log message comes from a statement that was added 
[here|https://github.com/apache/tez/blob/e3e91a150dad44a9daa3102da04542e2e365203d/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L305]
 as:
{code}
    LOG.info("getVertexStatusInternal for {}, dagCompleted: {}, in cache: {}",
        vertexName, dagCompleted, cachedVertexStatus.containsKey(vertexName));
{code}

So the DAG had already completed, but no vertex status update had arrived yet 
(the cache was empty), and the unit test failed with an unhelpful NPE.

This bug has always been there, but it was exposed by TEZ-4447.
The easiest fix is to wait for DAG completion with a Tez client call that 
also collects vertex status: waitForCompletionWithStatusUpdates instead of 
waitForCompletion.
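
The race can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: FakeDagClient and its methods are hypothetical stand-ins, not the real Tez DAGClientImpl, and the sketch assumes that once the DAG has completed a vertex status lookup can only be answered from the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class VertexStatusRaceSketch {

    /** Hypothetical, simplified stand-in for a DAG client; NOT the real Tez DAGClientImpl. */
    static class FakeDagClient {
        private final Map<String, String> cachedVertexStatus = new ConcurrentHashMap<>();
        private boolean dagCompleted = false;

        /** Models waitForCompletion: the DAG finishes, but no vertex status is collected. */
        void waitForCompletion() {
            dagCompleted = true;
        }

        /** Models waitForCompletionWithStatusUpdates: vertex status is cached before completion. */
        void waitForCompletionWithStatusUpdates(String... vertexNames) {
            for (String vertexName : vertexNames) {
                cachedVertexStatus.put(vertexName, "SUCCEEDED");
            }
            dagCompleted = true;
        }

        /** Assumption: after completion only the cache can answer; a miss yields null. */
        String getVertexStatus(String vertexName) {
            if (!dagCompleted) {
                throw new IllegalStateException("this sketch only models lookups after completion");
            }
            return cachedVertexStatus.get(vertexName);
        }
    }

    public static void main(String[] args) {
        // DAG finished too quickly: the cache is still empty, so the lookup returns null.
        FakeDagClient racy = new FakeDagClient();
        racy.waitForCompletion();
        System.out.println("after waitForCompletion: " + racy.getVertexStatus("Sleep"));

        // Waiting with status updates fills the cache first, so the lookup succeeds.
        FakeDagClient safe = new FakeDagClient();
        safe.waitForCompletionWithStatusUpdates("Sleep");
        System.out.println("after waitForCompletionWithStatusUpdates: " + safe.getVertexStatus("Sleep"));
    }
}
```

In this model the first lookup returns null, which is the value a test would then dereference and hit the NPE on; the second lookup returns a cached status because it was collected before completion was reported.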

  was:
It's very hard to reproduce, but I was able to do it like:
{code}
2023-02-13 11:32:16,302 INFO  [DAGAppMaster Thread] app.DAGAppMaster (DAGAppMaster.java:startDAG(2545)) - Running DAG: testMultipleClientsWithoutSession2_useDfs
...
2023-02-13 11:32:16,406 INFO  [Thread-675] client.DAGClientImpl (DAGClientImpl.java:getVertexStatusInternal(280)) - getVertexStatusInternal for Sleep, dagCompleted: true, in cache: false
{code}

in this case, the latter log message was added 
[here|https://github.com/apache/tez/blob/e3e91a150dad44a9daa3102da04542e2e365203d/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L305]
 as:
{code}
    LOG.info("getVertexStatusInternal for {}, dagCompleted: {}, in cache: {}",
        vertexName, dagCompleted, cachedVertexStatus.containsKey(vertexName));
{code}

so, the dag has already completed, but there were no vertex status updates yet 
(cache was empty), so unit tests failed with an inconvenient NPE


> VertexStatus is missing in TestLocalMode if DAG finishes too quickly
> --------------------------------------------------------------------
>
>                 Key: TEZ-4475
>                 URL: https://issues.apache.org/jira/browse/TEZ-4475
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> It's very hard to reproduce, but I was able to do it like:
> {code}
> 2023-02-13 11:32:16,302 INFO  [DAGAppMaster Thread] app.DAGAppMaster (DAGAppMaster.java:startDAG(2545)) - Running DAG: testMultipleClientsWithoutSession2_useDfs
> ...
> 2023-02-13 11:32:16,406 INFO  [Thread-675] client.DAGClientImpl (DAGClientImpl.java:getVertexStatusInternal(280)) - getVertexStatusInternal for Sleep, dagCompleted: true, in cache: false
> {code}
> in this case, the latter log message was added 
> [here|https://github.com/apache/tez/blob/e3e91a150dad44a9daa3102da04542e2e365203d/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java#L305]
>  as:
> {code}
>     LOG.info("getVertexStatusInternal for {}, dagCompleted: {}, in cache: {}",
>         vertexName, dagCompleted, cachedVertexStatus.containsKey(vertexName));
> {code}
> So the DAG had already completed, but no vertex status update had arrived 
> yet (the cache was empty), and the unit test failed with an unhelpful NPE.
> This bug has always been there, but it was exposed by TEZ-4447.
> The easiest fix is to wait for DAG completion with a Tez client call that 
> also collects vertex status: waitForCompletionWithStatusUpdates instead of 
> waitForCompletion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)