[ https://issues.apache.org/jira/browse/TEZ-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuhu Shukla updated TEZ-3347:
-----------------------------
    Attachment: ErrorCodeFailedVertex.png

> Vertex UI throws an error while getting vertexProgress for a killed Vertex
> --------------------------------------------------------------------------
>
>                 Key: TEZ-3347
>                 URL: https://issues.apache.org/jira/browse/TEZ-3347
>             Project: Apache Tez
>          Issue Type: Bug
>          Components: UI
>            Reporter: Kuhu Shukla
>         Attachments: ErrorCodeFailedVertex.png
>
>
> Given an AM that fails all its attempts, the application fails and the very
> first click on the killed/failed vertex throws the following error:
> {code}
> error code: Unknown, message: expected expression, got '<'
> {code}
> It self-corrects if retried immediately after the failure.
> This is because the RM proxy redirects the call to the AHS server and the
> REST call is malformed for that server. Upon inspection of the responses, it
> was seen that the URL looked something like this:
> {code}
> http://<hostname>:<ahsport>/applicationhistory/app/application_123_456/ws/v1/tez/vertexProgress?dagID=1&vertexID=01&_=123
> {code}
> which is not a valid REST call on the AHS.
> I think the following code can cause this issue:
> {code}
> // Load progress in parallel for v1 version of the api
> _loadProgress: function (vertices) {
>   var that = this,
>       runningVerticesIdx = vertices
>         .filterBy('status', 'RUNNING')
>         .map(function(item) {
>           return item.get('id').split('_').splice(-1).pop();
>         });
>   if (runningVerticesIdx.length > 0) {
>     this.store.unloadAll('vertexProgress');
>     this.store.findQuery('vertexProgress', {
>       metadata: {
>         appId: that.get('applicationId'),
>         dagIdx: that.get('idx'),
>         vertexIds: runningVerticesIdx.join(',')
>       }
>     }).then(function(vertexProgressInfo) {
>       App.Helpers.emData.mergeRecords(
>         that.get('rowsDisplayed'),
>         vertexProgressInfo,
>         ['progress']
>       );
>     }).catch(function(error) {
>       error.message = "Failed to fetch vertexProgress. Application Master (AM) is out of reach. Either it's down, or CORS is not enabled for YARN ResourceManager.";
>       Em.Logger.error(error);
>       var err = App.Helpers.misc.formatError(error);
>       var msg = 'Error code: %@, message: %@'.fmt(err.errCode, err.msg);
>       App.Helpers.ErrorBar.getInstance().show(msg, err.details);
>     });
>   }
> }
> {code}
> which uses AMInfo, which in turn gets its response based on what the loadApp
> method finds:
> {code}
> loadApp: function (store, appId, useCache) {
>   if(!useCache) {
>     App.Helpers.misc.removeRecord(store, 'appDetail', appId);
>     App.Helpers.misc.removeRecord(store, 'clusterApp', appId);
>   }
>   return store.find('clusterApp', appId).catch(function () {
>     return store.find('appDetail', appId);
>   }).catch(function (error) {
>     error.message = "Couldn't get details of application %@. RM is not reachable, and history service is not enabled.".fmt(appId);
>     throw error;
>   });
> }
> {code}
> In the catch block here, we could either check whether the response type is
> not JSON, or skip the vertexProgress request altogether, since at that point
> it is already known that the application/AM has failed.
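
A minimal sketch of the two checks suggested at the end of the description, written as plain JavaScript so it can be tried outside the Tez UI. Nothing here is taken from the Tez code base: isJsonContentType, shouldFetchVertexProgress and parseProgressResponse are hypothetical names, and the sketch assumes that the application state is available as a YARN state string such as 'RUNNING' or 'FAILED' and that the Content-Type header of the response can be read. The reading that "expected expression, got '<'" comes from a markup body (for example an HTML page) being parsed as JSON is also an assumption.

```javascript
// Hypothetical helper: true only when the Content-Type names a JSON media type.
function isJsonContentType(contentType) {
  if (typeof contentType !== 'string') {
    return false;
  }
  // Drop parameters such as "; charset=utf-8" and compare the bare media type.
  var mediaType = contentType.split(';')[0].trim().toLowerCase();
  return mediaType === 'application/json' || /\+json$/.test(mediaType);
}

// Hypothetical guard: only ask the AM for progress while the application is
// still running and at least one vertex is in the RUNNING state.
function shouldFetchVertexProgress(appStatus, runningVertexCount) {
  return appStatus === 'RUNNING' && runningVertexCount > 0;
}

// Hypothetical parser: refuse non-JSON bodies instead of letting the JSON
// parser fail on the first '<' of a markup page.
function parseProgressResponse(contentType, body) {
  if (!isJsonContentType(contentType)) {
    return { ok: false, reason: 'non-JSON response (' + contentType + ')' };
  }
  try {
    return { ok: true, data: JSON.parse(body) };
  } catch (e) {
    return { ok: false, reason: 'malformed JSON: ' + e.message };
  }
}
```

In _loadProgress, a guard like shouldFetchVertexProgress would wrap the findQuery call, and a check like parseProgressResponse would sit wherever the raw response is turned into records. Where exactly that happens in the Tez UI is not confirmed by this issue.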