[ 
https://issues.apache.org/jira/browse/TEZ-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3347:
-----------------------------
    Attachment: ErrorCodeFailedVertex.png

> Vertex UI throws an error while getting vertexProgress for a killed Vertex
> --------------------------------------------------------------------------
>
>                 Key: TEZ-3347
>                 URL: https://issues.apache.org/jira/browse/TEZ-3347
>             Project: Apache Tez
>          Issue Type: Bug
>          Components: UI
>            Reporter: Kuhu Shukla
>         Attachments: ErrorCodeFailedVertex.png
>
>
> Given an AM that fails all its attempts, the application fails and the very 
> first click on the killed/failed vertex throws the following error:
> {code}
>  error code: Unknown, message: expected expression, got '<'
> {code}
> It self-corrects if tried again immediately after the failure.
> This is because the RM proxy redirects the call to the AHS server and the 
> REST call is malformed for that server. Upon inspection of the responses, it 
> was seen that the URL looked something like this:
> {code}
> http://<hostname>:<ahsport>/applicationhistory/app/application_123_456/ws/v1/tez/vertexProgress?dagID=1&vertexID=01&_=123
> {code}
> which is not a proper REST call on the AHS.
> I think the following code can cause this issue:
> {code}
> // Load progress in parallel for v1 version of the api
>   _loadProgress: function (vertices) {
>     var that = this,
>         runningVerticesIdx = vertices
>       .filterBy('status', 'RUNNING')
>       .map(function(item) {
>         return item.get('id').split('_').splice(-1).pop();
>       });
>     if (runningVerticesIdx.length > 0) {
>       this.store.unloadAll('vertexProgress');
>       this.store.findQuery('vertexProgress', {
>         metadata: {
>           appId: that.get('applicationId'),
>           dagIdx: that.get('idx'),
>           vertexIds: runningVerticesIdx.join(',')
>         }
>       }).then(function(vertexProgressInfo) {
>           App.Helpers.emData.mergeRecords(
>             that.get('rowsDisplayed'),
>             vertexProgressInfo,
>             ['progress']
>           );
>       }).catch(function(error) {
>         error.message = "Failed to fetch vertexProgress. Application Master " +
>           "(AM) is out of reach. Either it's down, or CORS is not enabled for YARN " +
>           "ResourceManager.";
>         Em.Logger.error(error);
>         var err = App.Helpers.misc.formatError(error);
>         var msg = 'Error code: %@, message: %@'.fmt(err.errCode, err.msg);
>         App.Helpers.ErrorBar.getInstance().show(msg, err.details);
>       });
>     }
>   }
> {code}
> which uses AMInfo, which gets the response based on what the loadApp method finds:
> {code}
> loadApp: function (store, appId, useCache) {
>     if(!useCache) {
>       App.Helpers.misc.removeRecord(store, 'appDetail', appId);
>       App.Helpers.misc.removeRecord(store, 'clusterApp', appId);
>     }
>     return store.find('clusterApp', appId).catch(function () {
>       return store.find('appDetail', appId);
>     }).catch(function (error) {
>       error.message = ("Couldn't get details of application %@. RM is not " +
>         "reachable, and history service is not enabled.").fmt(appId);
>       throw error;
>     });
>   }
> {code}
> We can either check here in the catch block whether the response type is not 
> JSON, or not try to get vertexProgress at all, since the UI already knows that 
> the application/AM has failed.
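
The two options suggested above could be sketched roughly as follows. This is only an illustration, not code from the Tez UI: the names looksLikeJson, shouldLoadProgress and FINISHED_STATES are hypothetical, and the sketch assumes that the caller has access to the response's Content-Type header, its raw body, and the YARN application state.

```javascript
// Hypothetical helpers; none of these names exist in the Tez UI code base.

// Returns true when a response looks like JSON rather than an HTML page
// (for example, a page served by the AHS after the RM proxy redirect).
function looksLikeJson(contentType, body) {
  var type = (contentType || '').toLowerCase();
  if (type.indexOf('json') !== -1) {
    return true;
  }
  if (type.indexOf('html') !== -1) {
    return false;
  }
  // No usable Content-Type header: fall back to trying to parse the body.
  try {
    JSON.parse(body);
    return true;
  } catch (e) {
    return false;
  }
}

// Terminal YARN application states (assumed to be what the UI sees).
var FINISHED_STATES = ['FINISHED', 'FAILED', 'KILLED'];

// Returns true only when there are running vertices to poll and the
// application has not already reached a terminal state.
function shouldLoadProgress(appState, runningVertexCount) {
  return runningVertexCount > 0 && FINISHED_STATES.indexOf(appState) === -1;
}
```

With helpers along these lines, _loadProgress could return early when shouldLoadProgress is false, and the catch block could skip the error bar when looksLikeJson is false, since a non-JSON body most likely means the request never reached the AM.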



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
