[ 
https://issues.apache.org/jira/browse/HDFS-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735105#comment-13735105
 ] 

Daryn Sharp commented on HDFS-4680:
-----------------------------------

I made the mistake of looking at the raw patch instead of applying it.

With the way you've done it, I think we may be able to simplify the patch.  
The instanceof check for the default audit logger seems like it can/should be 
avoided.  It appears you added it in part to avoid the performance hit of 
looking up the token identifier and its tracking id every time a message is 
logged.  We should probably think of another way to avoid that cost.

Off the top of my head, conceptually it would be ideal if the connection knew 
the trackingId, and the audit logger would simply log it if not null.  I'll 
think about it more today since I'm trying to contemplate how a forward lookup 
would be an easy drop-in in the future and if there would be any rolling 
upgrade issues.
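
To make the idea concrete, here is a rough sketch rather than a proposed 
patch.  ConnectionContext and AuditLineFormatter are hypothetical names, not 
existing HDFS classes, and the field layout is simplified for illustration.  
The assumption is that the tracking id is resolved once when the connection 
is authenticated, so the logger never looks up the token identifier itself.

```java
/** Hypothetical per-connection state; not an existing HDFS class. */
final class ConnectionContext {
    private final String user;
    // Assumed to be resolved once at authentication time.
    // Null when the caller did not authenticate with a delegation token.
    private final String trackingId;

    ConnectionContext(String user, String trackingId) {
        this.user = user;
        this.trackingId = trackingId;
    }

    String getUser() {
        return user;
    }

    String getTrackingId() {
        return trackingId;
    }
}

/** Hypothetical formatter; the real audit log line has more fields. */
final class AuditLineFormatter {
    private AuditLineFormatter() {
    }

    static String format(ConnectionContext ctx, String cmd, String src) {
        StringBuilder sb = new StringBuilder();
        sb.append("ugi=").append(ctx.getUser())
          .append("\tcmd=").append(cmd)
          .append("\tsrc=").append(src);
        // Log the tracking id only if the connection carries one, so no
        // instanceof check or per-message token lookup is needed here.
        String id = ctx.getTrackingId();
        if (id != null) {
            sb.append("\ttrackingId=").append(id);
        }
        return sb.toString();
    }
}
```

With this shape any audit logger implementation could call the same 
formatter, and connections without a token simply produce a line with no 
trackingId field.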
                
> Audit logging of delegation tokens for MR tracing
> -------------------------------------------------
>
>                 Key: HDFS-4680
>                 URL: https://issues.apache.org/jira/browse/HDFS-4680
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, security
>    Affects Versions: 2.0.3-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-4680-1.patch, hdfs-4680-2.patch, hdfs-4680-3.patch
>
>
> HDFS audit logging tracks HDFS operations made by different users, e.g. 
> creation and deletion of files. This is useful for after-the-fact root cause 
> analysis and security. However, logging merely the username is insufficient 
> for many use cases. For instance, it is common for a single user to run 
> multiple MapReduce jobs (I believe this is the case with Hive). In this 
> scenario, given a particular audit log entry, it is difficult to trace it 
> back to the MR job or task that generated that entry.
> I see a number of potential options for implementing this.
> 1. Make an optional "client name" field part of the NN RPC format. We already 
> pass a {{clientName}} as a parameter in many RPC calls, so this would 
> essentially make it standardized. MR tasks could then set this field to the 
> job and task ID.
> 2. This could be generalized to a set of optional key-value *tags* in the NN 
> RPC format, which would then be audit logged. This has standalone benefits 
> outside of just verifying MR task ids.
> 3. Neither of the above two options actually securely verifies that MR clients 
> are who they claim they are. Doing this securely requires the JobTracker to 
> sign MR task attempts, and then having the NN verify this signature. However, 
> this is substantially more work, and could be built on after idea #2.
> Thoughts welcomed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
