Oleksandr Shevchenko created MAPREDUCE-7133:
-----------------------------------------------

             Summary: History Server task attempts REST API returns invalid data
                 Key: MAPREDUCE-7133
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7133
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Oleksandr Shevchenko


When we send a request to the History Server with the header 
"Accept: application/json" to
[https://nodename:19888/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_000003/attempts|https://192.168.121.199:19890/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_000003/attempts]
 
we get the following JSON:
{code}

{
"taskAttempts": {
"taskAttempt": [{
"type": "reduceTaskAttemptInfo",
"startTime": 1535372984638,
"finishTime": 1535372986149,
"elapsedTime": 1511,
"progress": 100.0,
"id": "attempt_1535363926925_0040_r_000003_0",
"rack": "/default-rack",
"state": "SUCCEEDED",
"status": "reduce > reduce",
"nodeHttpAddress": "node2.cluster.com:8044",
"diagnostics": "",
"type": "REDUCE",
"assignedContainerId": "container_e01_1535363926925_0040_01_000006",
"shuffleFinishTime": 1535372986056,
"mergeFinishTime": 1535372986075,
"elapsedShuffleTime": 1418,
"elapsedMergeTime": 19,
"elapsedReduceTime": 74
}]
}
}
{code}

 

As you can see, the "type" property is duplicated:
"type": "reduceTaskAttemptInfo"

"type": "REDUCE"

This leads to an error when parsing the response body, because parsers that 
reject duplicate keys treat the JSON as invalid.

 

When we use application/xml we get the following response:
{code}

<taskAttempts>
<taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:type="reduceTaskAttemptInfo"><startTime>1535372984638</startTime><finishTime>1535372986149</finishTime><elapsedTime>1511</elapsedTime><progress>100.0</progress><id>attempt_1535363926925_0040_r_000003_0</id><rack>/default-rack</rack><state>SUCCEEDED</state><status>reduce > reduce</status><nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress><diagnostics/><type>REDUCE</type><assignedContainerId>container_e01_1535363926925_0040_01_000006</assignedContainerId><shuffleFinishTime>1535372986056</shuffleFinishTime><mergeFinishTime>1535372986075</mergeFinishTime><elapsedShuffleTime>1418</elapsedShuffleTime><elapsedMergeTime>19</elapsedMergeTime><elapsedReduceTime>74</elapsedReduceTime></taskAttempt>
</taskAttempts>
{code}

Take a look at the following string:
{code}
<taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:type="reduceTaskAttemptInfo">
{code}

We get an "xsi:type" attribute, which is later incorrectly marshalled into a 
duplicated field when the JSON format is used.

This applies only to REDUCE tasks. For a MAP task we get XML without the 
"xsi:type" attribute.
{code}

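<taskAttempts>
<taskAttempt><startTime>1535370756528</startTime><finishTime>1535370760318</finishTime><elapsedTime>3790</elapsedTime><progress>100.0</progress><id>attempt_1535363926925_0029_m_000001_0</id><rack>/default-rack</rack><state>SUCCEEDED</state><status>map > sort</status><nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress><diagnostics/><type>MAP</type><assignedContainerId>container_e01_1535363926925_0029_01_000003</assignedContainerId></taskAttempt>
</taskAttempts>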
 
{code}
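
The difference between the two XML responses can be checked programmatically. The following is only a minimal sketch using the JDK's built-in DOM parser; the class name XsiTypeCheck and the shortened sample documents are hypothetical and not part of Hadoop. It reads the "xsi:type" attribute of the first taskAttempt element and returns an empty string when the attribute is absent.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hadoop.
class XsiTypeCheck {
  static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

  // Returns the xsi:type value of the first <taskAttempt>, or "" if it has none.
  // Assumes the document contains at least one <taskAttempt> element.
  static String xsiTypeOfFirstAttempt(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // required for getAttributeNS to resolve xsi:
      Document doc = factory.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      Element attempt = (Element) doc.getElementsByTagName("taskAttempt").item(0);
      return attempt.getAttributeNS(XSI_NS, "type");
    } catch (Exception e) {
      throw new IllegalStateException("could not parse response", e);
    }
  }

  public static void main(String[] args) {
    // Shortened versions of the responses shown in this report.
    String reduceXml = "<taskAttempts><taskAttempt "
        + "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
        + "xsi:type=\"reduceTaskAttemptInfo\"><type>REDUCE</type>"
        + "</taskAttempt></taskAttempts>";
    String mapXml = "<taskAttempts><taskAttempt><type>MAP</type>"
        + "</taskAttempt></taskAttempts>";
    System.out.println("reduce xsi:type = '" + xsiTypeOfFirstAttempt(reduceXml) + "'");
    System.out.println("map xsi:type    = '" + xsiTypeOfFirstAttempt(mapXml) + "'");
  }
}
```

A non-empty result for a reduce attempt is the marker that later turns into the second "type" key in the JSON output.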

 

This happens because MAP and REDUCE task attempts are represented by two 
classes in one hierarchy: TaskAttemptInfo for MAP and ReduceTaskAttemptInfo 
for REDUCE.

ReduceTaskAttemptInfo extends TaskAttemptInfo, and all task attempts (map and 
reduce) are later marshalled through TaskAttemptsInfo.getTaskAttempt(). At that 
point there is no static information about the ReduceTaskAttemptInfo type, 
because all attempts are stored in an ArrayList<TaskAttemptInfo>. 

During marshalling, the actual type of a reduce attempt turns out to be 
ReduceTaskAttemptInfo instead of TaskAttemptInfo, so type meta information is 
added for it. That is why we get duplicated fields.
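
To illustrate the mismatch between declared and actual type, here is a simplified sketch in plain Java. The classes below are stripped-down stand-ins for the Hadoop DAO classes (the real ones carry JAXB annotations and many more fields), and needsTypeHint is a hypothetical helper that only mimics the condition under which a marshaller would add type information; it is not the actual JAXB logic.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only, not the real Hadoop classes.
class AttemptTypeSketch {
  static class TaskAttemptInfo {
    String type = "MAP";
  }

  static class ReduceTaskAttemptInfo extends TaskAttemptInfo {
    ReduceTaskAttemptInfo() {
      type = "REDUCE";
    }
  }

  // True when the runtime class differs from the declared element type of the
  // list. In that situation a marshaller has to add type metadata (xsi:type in
  // XML) to describe the subclass.
  static boolean needsTypeHint(TaskAttemptInfo attempt) {
    return attempt.getClass() != TaskAttemptInfo.class;
  }

  public static void main(String[] args) {
    // The list is declared with the base type, as in TaskAttemptsInfo.
    List<TaskAttemptInfo> attempts = new ArrayList<>();
    attempts.add(new TaskAttemptInfo());
    attempts.add(new ReduceTaskAttemptInfo());

    for (TaskAttemptInfo attempt : attempts) {
      System.out.println(attempt.type + " needs type hint: " + needsTypeHint(attempt));
    }
  }
}
```

In this sketch only the reduce attempt is flagged, which matches the observation that the extra "type" entry shows up for REDUCE tasks only.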

Unfortunately, TestHsWebServicesAttempts did not catch this earlier because it 
uses the org.codehaus.jettison.json.JSONObject library, which silently 
overwrites duplicated fields. Even a request made with Postman shows valid 
JSON; the issue is visible only after switching the response view to Raw. The 
bug can also be reproduced with the "org.json:json" library, with something 
like this:

{code}

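// Assumes "connection" is an open HttpURLConnection to the attempts URL above,
// with the "Accept: application/json" header already set.
StringBuilder response = new StringBuilder();
try (BufferedReader inReader = new BufferedReader(
    new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
  String inputLine;
  while ((inputLine = inReader.readLine()) != null) {
    response.append(inputLine);
  }
}

// With org.json:json, parsing fails here because of the duplicated "type" key.
JSONObject o = new JSONObject(response.toString());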
{code}
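
Since some JSON libraries hide the duplicate, a check on the raw response text is one way to catch it without depending on a particular parser. The following is only a rough sketch: DuplicateKeyCheck and countKey are hypothetical names, and the scan is a plain regular-expression match, not a JSON parser, so it would also count the same key in nested objects or inside string values. It is meant for a response that contains a single task attempt, like the shortened one below.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop or of any JSON library.
class DuplicateKeyCheck {
  // Counts occurrences of "key": in the raw text (quotes included in the match).
  static int countKey(String rawJson, String key) {
    Pattern pattern = Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:");
    Matcher matcher = pattern.matcher(rawJson);
    int count = 0;
    while (matcher.find()) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    // Shortened version of the response shown in this report.
    String raw = "{\"taskAttempts\":{\"taskAttempt\":[{"
        + "\"type\":\"reduceTaskAttemptInfo\","
        + "\"id\":\"attempt_1535363926925_0040_r_000003_0\","
        + "\"type\":\"REDUCE\"}]}}";
    System.out.println("type occurrences: " + countKey(raw, "type"));
    System.out.println("id occurrences: " + countKey(raw, "id"));
  }
}
```

A count greater than one for "type" within a single attempt object indicates the duplicated field described above.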



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
