Andrew Wang created HDFS-4680:
---------------------------------
Summary: Audit logging of client names
Key: HDFS-4680
URL: https://issues.apache.org/jira/browse/HDFS-4680
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, security
Affects Versions: 2.0.3-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
HDFS audit logging tracks HDFS operations made by different users, e.g.
creation and deletion of files. This is useful for after-the-fact root-cause
analysis and security. However, logging only the username is insufficient for
many use cases. For instance, it is common for a single user to run multiple
MapReduce jobs (I believe this is the case with Hive). In this scenario, given
a particular audit log entry, it is difficult to trace it back to the MR job or
task that generated that entry.
I see a number of potential options for implementing this.
1. Make an optional "client name" field part of the NN RPC format. We already
pass a {{clientName}} as a parameter in many RPC calls, so this would
essentially standardize it. MR tasks could then set this field to the job
and task ID.
2. This could be generalized to a set of optional key-value *tags* in the NN
RPC format, which would then be audit logged. This has standalone benefits
beyond just identifying MR task IDs.
3. Neither of the above two options securely verifies that MR clients
are who they claim to be. Doing this securely requires the JobTracker to
sign MR task attempts and the NN to verify this signature. However,
this is substantially more work, and could be built on top of idea #2 later.
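
To make options 2 and 3 more concrete, here is a rough sketch of what tagged audit entries and signed tags might look like. Everything in it is hypothetical: the class {{AuditTagSketch}}, its method names, the {{tags=}} field, and the simplified tab-separated line layout are illustrations, not existing HDFS code or the real audit log format. The signature uses HMAC-SHA256 with a shared secret purely as a stand-in for whatever signing scheme the JobTracker and NN would actually agree on.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch only; none of these names exist in HDFS. */
public class AuditTagSketch {

  /** Renders tags as "k1=v1,k2=v2" in sorted key order so the output is deterministic. */
  static String encodeTags(Map<String, String> tags) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : new TreeMap<>(tags).entrySet()) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      // Assumption: keys and values contain no ',' or '='. A real format would need escaping.
      sb.append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }

  /** Option 2: append the client-supplied tags to a simplified, illustrative audit line. */
  static String formatAuditLine(String user, String cmd, String src, Map<String, String> tags) {
    return "ugi=" + user + "\tcmd=" + cmd + "\tsrc=" + src + "\ttags=" + encodeTags(tags);
  }

  /** Option 3, signer side: compute an HMAC-SHA256 over the encoded tags. */
  static byte[] sign(byte[] sharedKey, Map<String, String> tags) {
    try {
      Mac mac = Mac.getInstance("HmacSHA256");
      mac.init(new SecretKeySpec(sharedKey, "HmacSHA256"));
      return mac.doFinal(encodeTags(tags).getBytes(StandardCharsets.UTF_8));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HmacSHA256 unavailable or key invalid", e);
    }
  }

  /** Option 3, verifier side: recompute the HMAC and compare in constant time. */
  static boolean verify(byte[] sharedKey, Map<String, String> tags, byte[] signature) {
    return MessageDigest.isEqual(sign(sharedKey, tags), signature);
  }

  public static void main(String[] args) {
    Map<String, String> tags = new TreeMap<>();
    tags.put("jobId", "job_1");    // illustrative IDs, not real MR ID formats
    tags.put("taskId", "task_1");

    System.out.println(formatAuditLine("alice", "create", "/tmp/f", tags));

    byte[] key = "demo-secret".getBytes(StandardCharsets.UTF_8);
    byte[] sig = sign(key, tags);
    System.out.println("signature valid: " + verify(key, tags, sig));
  }
}
```

With only option 2 in place, the NN would log whatever tags the client sends. The {{sign}}/{{verify}} pair shows the extra step option 3 would add before the tags could be trusted.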
Thoughts welcomed.