[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486038#comment-14486038
 ] 

Bikas Saha edited comment on TEZ-714 at 4/8/15 9:06 PM:
--------------------------------------------------------


abortVertex() should probably be called for all non-succeeded states? Similar 
to DAGImpl. Otherwise TaskCompletedAfterVertexSuccessTransition will have a 
regression.
{code}      case ERROR:
        addDiagnostic("Vertex: " + logIdentifier + " error due to:" + 
terminationCause);
        if (!StringUtils.isEmpty(diag)) {
          addDiagnostic(diag);
        }
        eventHandler.handle(new DAGEvent(getDAGId(),
            DAGEventType.INTERNAL_ERROR));
        try {
          logJobHistoryVertexFailedEvent(finalState);
        } catch (IOException e) {
          LOG.error("Failed to send vertex finished event to recovery", e);
        }
        break;
      case KILLED:
      case FAILED:
        addDiagnostic("Vertex " + logIdentifier + " killed/failed due to:" + 
terminationCause);
        if (!StringUtils.isEmpty(diag)) {
          addDiagnostic(diag);
        }
        abortVertex(VertexStatus.State.valueOf(finalState.name()));
{code}

testCommitOutputOnVertexSuccess() still has
{code}    // both committers fail
    DAG dag4 = createDAG("testDAGBothCommitsFail", false, true); <<< true/true
{code}

Mis-written comment? v3 is killed due to...
{code}   // killed is failed due to the commit failure of the vertex group 
(v1,v2)
    Assert.assertEquals(VertexState.KILLED, v3.getState());
    Assert.assertEquals(VertexTerminationCause.OTHER_VERTEX_FAILURE,
 {code}

The test is not checking/verifying that one of the commits was indeed not 
scheduled on the threadpool and cancelled before that. Maybe set a flag when 
the callable is scheduled and check that the flag was not set. Also perhaps set 
a flag when interrupted exception is caught when the scheduled event got 
canceled.
{code}
 // test commit will be canceled no matter it is started or still in the 
threadpool
  @Test(timeout = 5000)
  public void testCommitCanceled() throws Exception {
{code}

Please cleanup the findbugs warnings.

Spoke offline with [~hitesh] about aborting all output after any failure and it 
seems like the right thing to do.


was (Author: bikassaha):


abortVertex() should probably be called for all non-succeeded states? Other 
TaskCompletedAfterVertexSuccessTransition will have a regression.
{code}      case ERROR:
        addDiagnostic("Vertex: " + logIdentifier + " error due to:" + 
terminationCause);
        if (!StringUtils.isEmpty(diag)) {
          addDiagnostic(diag);
        }
        eventHandler.handle(new DAGEvent(getDAGId(),
            DAGEventType.INTERNAL_ERROR));
        try {
          logJobHistoryVertexFailedEvent(finalState);
        } catch (IOException e) {
          LOG.error("Failed to send vertex finished event to recovery", e);
        }
        break;
      case KILLED:
      case FAILED:
        addDiagnostic("Vertex " + logIdentifier + " killed/failed due to:" + 
terminationCause);
        if (!StringUtils.isEmpty(diag)) {
          addDiagnostic(diag);
        }
        abortVertex(VertexStatus.State.valueOf(finalState.name()));
{code}

testCommitOutputOnVertexSuccess() still has
{code}    // both committers fail
    DAG dag4 = createDAG("testDAGBothCommitsFail", false, true); <<< true/true
{code}

Mis-written comment? v3 is killed due to...
{code}   // killed is failed due to the commit failure of the vertex group 
(v1,v2)
    Assert.assertEquals(VertexState.KILLED, v3.getState());
    Assert.assertEquals(VertexTerminationCause.OTHER_VERTEX_FAILURE,
 {code}

The test is not checking/verifying that one of the commits was indeed not 
scheduled on the threadpool and cancelled before that. Maybe set a flag when 
the callable is scheduled and check that the flag was not set. Also perhaps set 
a flag when interrupted exception is caught when the scheduled event got 
canceled.
{code}
 // test commit will be canceled no matter it is started or still in the 
threadpool
  @Test(timeout = 5000)
  public void testCommitCanceled() throws Exception {
{code}

Please cleanup the findbugs warnings.

Spoke offline with [~hitesh] about aborting all output after any failure and it 
seems like the right thing to do.

> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-10.patch, 
> TEZ-714-2.patch, TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, 
> TEZ-714-6.patch, TEZ-714-7.patch, TEZ-714-8.patch, TEZ-714-9.patch, 
> Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to