Kuhu Shukla created TEZ-3347:
--------------------------------

             Summary: Vertex UI throws an error while getting vertexProgress for a killed Vertex
                 Key: TEZ-3347
                 URL: https://issues.apache.org/jira/browse/TEZ-3347
             Project: Apache Tez
          Issue Type: Bug
          Components: UI
            Reporter: Kuhu Shukla
         Attachments: ErrorCodeFailedVertex.png

When an AM fails all of its attempts, the application fails, and the very 
first click on the killed/failed vertex in the UI throws the following error:
{code}
 error code: Unknown, message: expected expression, got '<'
{code}
The error corrects itself if the click is retried immediately after the failure.

This is because the RM proxy redirects the call to the AHS server and the REST 
call is malformed for that server. Upon inspection of the responses, it was 
seen that the URL looked something like this:
{code}
http://<hostname>:<ahsport>/applicationhistory/app/application_123_456/ws/v1/tez/vertexProgress?dagID=1&vertexID=01&_=123
{code}
which is not a valid REST call on the AHS.
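
A minimal sketch of how the UI could recognise such a redirected URL before treating the response as vertexProgress data. The helper name looksLikeAhsRedirect and the path check are assumptions for illustration, not existing Tez UI code; only the /applicationhistory/ and /ws/v1/tez/ path segments are taken from the URL above.

```javascript
// Hypothetical helper (not part of the Tez UI): reports whether a URL appears
// to be a Tez AM REST call that the RM proxy redirected to the AHS web app.
function looksLikeAhsRedirect(url) {
  // Strip scheme and host so that only the path and query are inspected.
  var path = String(url).replace(/^[a-z]+:\/\/[^\/]+/i, '');
  // Assumption: AHS pages live under /applicationhistory/, while Tez AM REST
  // calls contain /ws/v1/tez/. Seeing both together suggests a bad redirect.
  return path.indexOf('/applicationhistory/') === 0 &&
         path.indexOf('/ws/v1/tez/') !== -1;
}
```

A check like this only inspects the URL shape, so it would complement, not replace, a check on the response itself.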

I think the following code can cause this issue:
{code}
// Load progress in parallel for v1 version of the api
  _loadProgress: function (vertices) {
    var that = this,
        runningVerticesIdx = vertices
      .filterBy('status', 'RUNNING')
      .map(function(item) {
        return item.get('id').split('_').splice(-1).pop();
      });

    if (runningVerticesIdx.length > 0) {
      this.store.unloadAll('vertexProgress');
      this.store.findQuery('vertexProgress', {
        metadata: {
          appId: that.get('applicationId'),
          dagIdx: that.get('idx'),
          vertexIds: runningVerticesIdx.join(',')
        }
      }).then(function(vertexProgressInfo) {
          App.Helpers.emData.mergeRecords(
            that.get('rowsDisplayed'),
            vertexProgressInfo,
            ['progress']
          );
      }).catch(function(error) {
        error.message = "Failed to fetch vertexProgress. Application Master (AM) is out of reach. Either it's down, or CORS is not enabled for YARN ResourceManager.";
        Em.Logger.error(error);
        var err = App.Helpers.misc.formatError(error);
        var msg = 'Error code: %@, message: %@'.fmt(err.errCode, err.msg);
        App.Helpers.ErrorBar.getInstance().show(msg, err.details);
      });
    }
  }
{code}
which uses AMInfo, which in turn gets its response based on what the loadApp method finds:
{code}
loadApp: function (store, appId, useCache) {
    if(!useCache) {
      App.Helpers.misc.removeRecord(store, 'appDetail', appId);
      App.Helpers.misc.removeRecord(store, 'clusterApp', appId);
    }

    return store.find('clusterApp', appId).catch(function () {
      return store.find('appDetail', appId);
    }).catch(function (error) {
      error.message = "Couldn't get details of application %@. RM is not reachable, and history service is not enabled.".fmt(appId);
      throw error;
    });
  }
{code}

We can check here in the catch block whether the response type is not JSON, or 
else not try to get vertexProgress at all, since the UI already knows that the 
application/AM has failed.
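
A rough sketch of both options, assuming the caller can see the response's Content-Type header and the application's status. The helper names (isJsonResponse, shouldLoadProgress) and the list of finished states are assumptions for illustration, not existing Tez UI code.

```javascript
// Hypothetical helpers (not part of the Tez UI).

// Option 1: treat a response as vertexProgress data only if it is JSON.
// An HTML page (such as one served by the AHS) fails these checks.
function isJsonResponse(contentType, body) {
  if (!/^application\/json\b/i.test(contentType || '')) {
    return false;
  }
  try {
    JSON.parse(body);
    return true;
  } catch (e) {
    return false; // for example, a body that starts with '<'
  }
}

// Option 2: skip the vertexProgress request once the application has ended.
// Assumption: these status strings mark an application that is no longer running.
var FINISHED_STATES = ['FAILED', 'KILLED', 'FINISHED'];

function shouldLoadProgress(appStatus, runningVertexCount) {
  return runningVertexCount > 0 && FINISHED_STATES.indexOf(appStatus) === -1;
}
```

With guards like these, _loadProgress could return early for a failed application instead of surfacing a parse error in the error bar.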



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
