[ https://issues.apache.org/jira/browse/MAPREDUCE-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527436#comment-14527436 ]
Zhijie Shen commented on MAPREDUCE-6337:
----------------------------------------

Sangjin, thanks for the patch. Here are some high-level comments:

1. I have a concern about the way MR job history is replayed. The current approach reads all of a job's history, converts it into entities, and writes them in a single put per job. This may not reflect a realistic workload pattern; at the very least, it differs from the way MR currently puts timeline data. Shall we add an option to choose among 1) putting all of a job's entities at once, 2) putting one entity per call, and 3) repeatedly putting an entity for each event? The third option is closer to how MR puts data today, though that does not mean it is the optimal approach. The different options may well affect write performance.

2. TimelineEntityConverter does something similar to what we did in MAPREDUCE-6237, but in a somewhat different way, and the entity composition also differs slightly (for example, counters are saved as metrics). I think the reason the MAPREDUCE-6237 code may not be reusable here is that this patch converts XXXXInfo objects to entities, while MAPREDUCE-6237 converts XXXXEvent objects to entities. Perhaps we can refactor and consolidate the conversion code later.

> add a mode to replay MR job history files to the timeline service
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6337
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6337
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: MAPREDUCE-6337-YARN-2928.001.patch, YARN-3438.000.patch
>
>
> The subtask covers the work on top of YARN-3437 to add a mode to replay MR
> job history files to the timeline service storage.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
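The three replay modes proposed in point 1 of the comment can be sketched as below. This is a minimal Python illustration under stated assumptions, not code from the patch: `ReplayMode`, `Entity`, `RecordingWriter`, and `replay_job` are hypothetical names, `Entity` is a simplified stand-in for a timeline entity, and the writer only records the batches it receives instead of talking to the timeline service. For mode 3 it assumes each put carries just the newly arrived event for the affected entity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class ReplayMode(Enum):
    """Hypothetical option values for the three proposed write patterns."""
    ALL_AT_ONCE = 1      # 1) put all entities of a job in a single call
    ENTITY_PER_CALL = 2  # 2) put one entity per call
    EVENT_PER_CALL = 3   # 3) put an entity once for every event it receives


@dataclass
class Entity:
    """Simplified stand-in for a timeline entity (not the real YARN class)."""
    entity_id: str
    events: List[str] = field(default_factory=list)


@dataclass
class RecordingWriter:
    """Hypothetical writer that only records each batch it is asked to put."""
    calls: List[List[Entity]] = field(default_factory=list)

    def put_entities(self, entities: List[Entity]) -> None:
        self.calls.append(list(entities))


def replay_job(events: List[Tuple[str, str]], writer: RecordingWriter,
               mode: ReplayMode) -> None:
    """Replay one job's (entity_id, event_name) pairs using the given mode."""
    entities: Dict[str, Entity] = {}  # insertion-ordered aggregation
    for entity_id, event_name in events:
        if mode is ReplayMode.EVENT_PER_CALL:
            # Closest to the live MR pattern: write as soon as each event
            # is read, sending only that event for the affected entity.
            writer.put_entities([Entity(entity_id, [event_name])])
        else:
            # Modes 1 and 2 first aggregate all events per entity.
            entities.setdefault(entity_id, Entity(entity_id)).events.append(event_name)
    if mode is ReplayMode.ALL_AT_ONCE:
        writer.put_entities(list(entities.values()))
    elif mode is ReplayMode.ENTITY_PER_CALL:
        for entity in entities.values():
            writer.put_entities([entity])


if __name__ == "__main__":
    history = [("job_1", "JOB_SUBMITTED"), ("task_1", "TASK_STARTED"),
               ("task_1", "TASK_FINISHED"), ("job_1", "JOB_FINISHED")]
    for mode in ReplayMode:
        writer = RecordingWriter()
        replay_job(history, writer, mode)
        print(mode.name, len(writer.calls))
```

For the four-event sample history, this prints 1 put call for ALL_AT_ONCE, 2 for ENTITY_PER_CALL, and 4 for EVENT_PER_CALL, which is the difference in request pattern that point 1 suggests may affect write performance. A real benchmark would time the calls against the actual writer rather than just count them.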