[ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008517#comment-13008517
 ] 

Todd Lipcon commented on MAPREDUCE-279:
---------------------------------------

Hi Arun. I spent the train ride this morning looking over yarn/src/main/avro in 
the branch. Here are a few comments; sorry for the somewhat 
stream-of-consciousness format.


- Is the correct suffix still .genavro? Thought we'd changed the name to 
.avroidl or something?
- Apache licenses needed on these files
- Does AvroIDL convert javadoc-style comments on records/protocols into JavaDoc 
on generated code? If so we should do more of that.


- AMRMProtocol:
-- the "release" parameter to allocate is strange: (a) it seems the function is 
misnamed if you can also release things as you call it, and (b) why isn't it an 
array<ContainerId>?
-- if you want to cancel previous resource requests, do you submit a new one 
with a negative numContainers?
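
To make (b) concrete, here is a rough sketch of an allocate request that 
carries releases as a typed list next to the new asks. All type, field, and 
method names below (ContainerId, ResourceRequest, AllocateRequest) are made up 
for illustration and are not the records in the branch. The sketch takes no 
position on how cancelling an earlier request should work.

```java
import java.util.List;

// Hypothetical types for illustration only; not the records defined in the branch.
record ContainerId(int id) {}

record ResourceRequest(String hostName, int memoryMB, int numContainers) {}

// One allocate() call carries both new asks and containers being handed back.
record AllocateRequest(List<ResourceRequest> asks, List<ContainerId> releases) {
    AllocateRequest {
        // Defensive copies so the request is immutable once built.
        asks = List.copyOf(asks);
        releases = List.copyOf(releases);
    }

    // True when this call also gives containers back to the scheduler.
    boolean releasesContainers() {
        return !releases.isEmpty();
    }
}
```

With a shape like this, the released containers are typed ids rather than an 
ad hoc parameter, and a request with an empty release list is a plain 
allocation.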


- ApplicationSubmissionContext:
-- would be good to have some kind of scheduler-specific parameters here, e.g. 
for a scheduler that has something beyond just "priority" (perhaps a deadline)
-- using just the URL type directly for resources seems not quite flexible 
enough; e.g. one useful construct would be a URL + checksum
-- what's resources_todo going to be?
-- passing "user" - agreed, this should be more flexible than simple string.
-- Why not contain a ContainerLaunchContext to specify the container in which 
to run the AM? Seems like lots of duplicated fields.
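
As an illustration of the "URL + checksum" idea, here is a minimal sketch of a 
resource reference that can verify fetched bytes against an expected digest. 
The names ChecksummedResource and Sha1 are hypothetical, and SHA-1 is only an 
assumption for the example; nothing here is taken from the branch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical record pairing a resource URL with the checksum of its contents.
record ChecksummedResource(String url, String sha1Hex) {

    // Returns true if the fetched bytes hash to the expected SHA-1.
    boolean matches(byte[] contents) {
        return sha1Hex.equalsIgnoreCase(Sha1.hex(contents));
    }
}

final class Sha1 {
    private Sha1() {}

    // Lower-case hex encoding of the SHA-1 digest of the given bytes.
    static String hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

A node that downloads the resource could call matches() on the bytes it 
fetched and refuse to localize the file on a mismatch.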

- ContainerManager:
-- not following YarnContainerTags - these are opaque enums, how do they get 
interpolated in a string?
-- how does one access stderr/stdout contents? both while they're being written 
and after a container has terminated? (maybe I just haven't gotten to that bit 
yet somewhere else)

- yarn-types.avro:
-- For the typesafe ID classes, do we need to specify explicit comparison 
orderings? I don't know Avro behavior here.
-- Did you consider making the ids all strings instead of ints? The pro would 
be that there could be canonical formats, like "AM-<hex id>" for app masters vs 
"C-<hex id>" for containers. AWS does a good job of this.
-- Resource: field names should include units, like "int memoryMB"
-- what are ContainerTokens? could use some extra doc at the protocol layer 
here. (I assume this is for security?)
-- The "Container" type doesn't appear 
-- the URL record is missing user/password used for http basic auth or s3n auth
-- there are some hard tabs in this file
-- ApplicationMaster:
--- httpPort seems like it would be better described as something like 
"httpStatusURL"?
-- LocalResourceVisibility:
--- just to clarify, APPLICATION visibility means "only to this application 
submitted by this user", i.e. if joe and bob both submit MapReduce 2.x.y jobs 
with identical jars, the jars still won't be shared, even if the sha1s match?
--- if bob submits the same application (i.e. MR 2.x.y) twice, do APPLICATION 
visibility files get shared?
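
For the string-id suggestion above, here is a small sketch of what canonical 
"AM-<hex id>" and "C-<hex id>" forms could look like in code. IdKind and 
TypedId are hypothetical names for illustration, and the choice of a long 
value behind the hex string is an assumption, not something from the branch.

```java
// Hypothetical ID helper: "AM-<hex id>" for app masters, "C-<hex id>" for containers.
enum IdKind {
    APP_MASTER("AM"), CONTAINER("C");

    final String prefix;

    IdKind(String prefix) { this.prefix = prefix; }
}

record TypedId(IdKind kind, long value) {

    // Canonical text form, e.g. kind CONTAINER and value 255 -> "C-ff".
    String format() {
        return kind.prefix + "-" + Long.toHexString(value);
    }

    // Parses the canonical form; rejects unknown prefixes and malformed hex.
    static TypedId parse(String text) {
        int dash = text.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("missing '-' in id: " + text);
        }
        String prefix = text.substring(0, dash);
        for (IdKind kind : IdKind.values()) {
            if (kind.prefix.equals(prefix)) {
                return new TypedId(kind, Long.parseUnsignedLong(text.substring(dash + 1), 16));
            }
        }
        throw new IllegalArgumentException("unknown id prefix: " + prefix);
    }
}
```

The prefix makes it obvious from a log line which kind of id is being shown, 
and parse() fails fast if an id of an unknown kind is passed around.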


> Map-Reduce 2.0
> --------------
>
>                 Key: MAPREDUCE-279
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker, tasktracker
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.23.0
>
>         Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move.txt
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
