amahussein commented on a change in pull request #161:
URL: https://github.com/apache/tez/pull/161#discussion_r749556005



##########
File path: 
tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java
##########
@@ -221,13 +245,14 @@ protected DAGStatus getDAGStatusInternal(@Nullable 
Set<StatusGetOpts> statusOpti
       final DAGStatus dagStatus = getDAGStatusViaAM(statusOptions, timeout);
 
       if (!dagCompleted) {
-        if (dagStatus != null) {
-          cachedDagStatus = dagStatus;
+        if (dagStatus != null) { // update the cached DAGStatus
+          cachedDAGStatusRef.setValue(dagStatus);
           return dagStatus;
         }
-        if (cachedDagStatus != null) {
+        DAGStatus cachedDAG = cachedDAGStatusRef.getValue();
+        if (cachedDAG != null) {
           // could not get from AM (not reachable/ was killed). return cached 
status.
-          return cachedDagStatus;
+          return cachedDAG;

Review comment:
       Thanks for the feedback.
   
   Yes, considering the implementation of 
[TezJob.run()-Line222](https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezJob.java#L222)
 in Pig:
   
   Pig polls on the DAGStatus inside an infinite loop:
   ```java
           while (true) {
               try {
                   dagStatus = dagClient.getDAGStatus(null);
               } catch (Exception e) {
                   log.info("Cannot retrieve DAG status", e);
                   break;
               }
              if (dagStatus.isCompleted()) {
                  // do something
                  // break;
              }
              sleep(1000);
          }  
   ```
   
   
   Let's assume the following scenario on Tez Side:
   - Pig first iteration calls `getDAGStatusViaAM()` which successfully pulls 
the DAGStatus and updates the cachedDAGStatus to running.
   - Pig sleeps 1000
   - second call from Pig calls `getDAGStatusViaAM()` which encounters 
`TezException` or `IOException`. The call would return the last cachedDAGStatus 
(which is running), instead of null.
   - Since the status is `running`, the Pig-thread sleeps
   - This will keep going as long as the `getDAGStatusViaAM()` fails, and the 
last valid DAGStatus is still cached.
   
   The problem in this corner case is that the Pig client will keep looping 
indefinitely as long as it does not receive a null or 
dagClient.getDAGStatus(null) does not throw an exception.
   From a client perspective, it is better to fail early in order to recover 
faster.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@tez.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to