[ https://issues.apache.org/jira/browse/MAPREDUCE-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120921#comment-13120921 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-3143:
----------------------------------------------------

Just to summarize the design of the system already implemented (but currently 
disabled) in the YARN NodeManagers, and the gaps we need to fill in.

Uploading
 - NM uploads the logs of all the containers of an app into a single file on 
HDFS, named by node-id, in a per-app directory. So for an app, there are at 
most N log files, N being the number of nodes in the cluster.
 - NM starts streaming a container's logs to this file once the container 
finishes.
 - On app-finish, the NM flushes all containers' logs and closes the per-app, 
per-node file.
 - The local container-logs are removed on app-finish, once the aggregated 
file is closed.
 - The log format is a T-File. Keys are container-ids. Values are a list of 
compound text of file-type (syslog/stdout/stderr) and the actual container 
log-file contents. (A read sketch follows this list.)
 - TODO: As of today, the NM silently ignores any failures during the log 
upload. It could increment a counter for these failures, or maintain a per-app 
list of the containers for which it failed to upload the logs.
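
To make the on-disk format concrete, here is a minimal read sketch that lists 
the container-ids stored in one aggregated per-node file, assuming the plain 
org.apache.hadoop.io.file.tfile.TFile reader API. The class name and the path 
argument are illustrative; the real per-app, per-node layout is whatever the 
LogAggregationService writes.

  // Sketch: list the container-ids stored in one aggregated per-node log file.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.file.tfile.TFile;

  public class ListAggregatedContainers {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Illustrative path: <remote-log-dir>/<user>/<app-id>/<node-id>
      Path logFile = new Path(args[0]);
      FileSystem fs = logFile.getFileSystem(conf);
      FSDataInputStream in = fs.open(logFile);
      TFile.Reader reader =
          new TFile.Reader(in, fs.getFileStatus(logFile).getLen(), conf);
      TFile.Reader.Scanner scanner = reader.createScanner();
      while (!scanner.atEnd()) {
        TFile.Reader.Scanner.Entry entry = scanner.entry();
        byte[] key = new byte[entry.getKeyLength()];
        entry.getKey(key);
        // Keys are container-ids; values hold the concatenated
        // (file-type, contents) pairs for that container.
        System.out.println(new String(key, "UTF-8"));
        scanner.advance();
      }
      scanner.close();
      reader.close();
      in.close();
    }
  }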

Coverage of logs: In most cases, we don't need to upload the logs of all the 
containers.
 - Options include
    -- only AM logs uploaded to HDFS for any app
    -- AM logs + failed containers' logs
    -- AM logs + failed containers' logs + x% of successful containers' logs
    -- all logs
 - The above retention policies are already implemented by the 
LogAggregationService, but TODO: they need to be made user-configurable. (An 
illustrative sketch of the options follows this list.)
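
Illustrative only: one possible shape for a user-selectable retention policy 
that restates the four options above. The enum and method names are made up 
for this sketch and are not the actual LogAggregationService API.

  public enum ContainerLogRetentionPolicy {
    AM_ONLY,                     // only AM logs
    AM_AND_FAILED,               // AM logs + failed containers' logs
    AM_AND_FAILED_PLUS_SAMPLE,   // ... + x% of successful containers
    ALL;                         // everything

    /** Decide whether a finished container's logs should be aggregated. */
    public boolean shouldUpload(boolean isAM, boolean failed, double sampleRate) {
      switch (this) {
        case AM_ONLY:
          return isAM;
        case AM_AND_FAILED:
          return isAM || failed;
        case AM_AND_FAILED_PLUS_SAMPLE:
          // Sampling here is illustrative; a real policy would likely use a
          // deterministic or configurable sampler.
          return isAM || failed || Math.random() < sampleRate;
        case ALL:
        default:
          return true;
      }
    }
  }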

Web Serving
 - NM serves the log files of a container till the app finishes. The NM 
doesn't maintain any indices; all it does is print the logs, treating them as 
plain files, one after another, possibly with a header for each log-type.
 - TODO: After the upload finishes, the NM will point the user to a configured 
log-server location. This is mostly the same as the JobHistory server. (A 
rough redirect sketch follows this list.)
 - TODO: For MapReduce: after the app finishes, when users visit their 
job-history, servlets will parse the aggregated file and present the logs per 
container.
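
Purely as a sketch of the redirect behaviour described above: the servlet, its 
parameter names, and the hard-coded log-server URL are all hypothetical, not 
actual NodeManager web-app code.

  import java.io.IOException;
  import javax.servlet.ServletException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  public class AggregatedLogRedirectServlet extends HttpServlet {
    // Hypothetical: in practice this would come from cluster configuration.
    private static final String LOG_SERVER_URL = "http://history-host:19888/logs";

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {
      String nodeId = req.getParameter("nodeId");
      String containerId = req.getParameter("containerId");
      // Local logs are removed after aggregation; send the user to the server
      // that can read the per-app, per-node aggregated file instead.
      resp.sendRedirect(LOG_SERVER_URL + "/" + nodeId + "/" + containerId);
    }
  }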

Command-line user interface
 - A log dumper is already included for clients.
   -- For all container-logs of a single app:
      ./yarn/bin/yarn logs -applicationId application_1304487270789_0001
   -- For a single container's logs:
      ./yarn/bin/yarn logs -applicationId application_1304487270789_0001 
-containerId container_1304487270789_0001_000002 -nodeAddress 127.0.0.1_45454
 - TODO: We need a MapReduce-specific command line that takes a TaskAttemptID 
and returns the corresponding logs.

Life on HDFS
 - The per-app, per-node log file goes into a system log-dir and is written 
with the user's credentials.
 - The log-dir is per-user and has quotas specified by admins. For now, the 
quota is the same reasonable value for all users.
 - Tooling like dfs -cat or dfs -text lets users print out their logs, 
depending on the log format above.
 - Admins can have scripts to garbage-collect/HAR logs in the user-dir that 
have aged beyond a certain time period (e.g. 15 days). (A cleanup sketch 
follows this list.)
 - TODO: What is the behaviour when user-quotas are hit? Fail aggregation and 
skip container-logs? How does the user come to know?
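
As a sketch of what such an admin cleanup could look like: the class name, the 
arguments, and the <remote-log-dir>/<user>/<app-id> layout assumed here are 
illustrative, and archiving to HAR instead of deleting is an equally valid 
choice.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class AggregatedLogCleaner {
    public static void main(String[] args) throws Exception {
      // Illustrative: args[0] is a per-user log dir, e.g. <remote-log-dir>/<user>
      Path userLogDir = new Path(args[0]);
      long retentionMs = 15L * 24 * 60 * 60 * 1000;  // e.g. 15 days, as above
      FileSystem fs = userLogDir.getFileSystem(new Configuration());
      long cutoff = System.currentTimeMillis() - retentionMs;
      for (FileStatus app : fs.listStatus(userLogDir)) {
        // Each child is assumed to be one app's aggregated-log directory.
        if (app.isDirectory() && app.getModificationTime() < cutoff) {
          fs.delete(app.getPath(), true);  // recursively remove the app's logs
        }
      }
    }
  }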
                
> Complete aggregation of user-logs spit out by containers onto DFS
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-3143
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3143
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 0.23.0
>
>
> The feature for handling user-logs spit out by containers in the NodeManager 
> is already implemented, but it is currently disabled due to user-interface 
> issues.
> This is the umbrella ticket for tracking the pending bugs w.r.t. putting 
> container-logs on DFS.
