Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659

Hi @tgravescs, thank you very much for the review. I have updated the PR based on each of your comments, including adding a `CallerContext` class, updating the Javadoc, and making the caller context string shorter.

I ran manual tests against some Spark applications in Yarn client mode and Yarn cluster mode, and the Spark caller contexts were written into the HDFS `hdfs-audit.log` successfully. The following is a screenshot of the audit log (SparkKMeans in Yarn client mode):

<img width="1407" alt="screen shot 2016-09-14 at 10 34 25 pm" src="https://cloud.githubusercontent.com/assets/8546874/18539563/1eb16748-7acd-11e6-840a-0e8bfabf5954.png">

This is the caller context which was written into `hdfs-audit.log` by the `Yarn Client`:

```
2016-09-14 22:28:59,341 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppName_SparkKMeans_AppID_application_1473908768790_0007
```

The callerContext above has the form `SPARK_AppName_***_AppID_***`.

These are the caller contexts which were written into `hdfs-audit.log` by `Task`:

```
2016-09-14 22:29:06,525 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_1_0
2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_0_0
2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_2_0
```

The callerContext above has the form `SPARK_AppID_***_JobID_***_StageID_***_(StageAttemptID)_TaskId_***_(TaskAttemptNumber)`. The static strings `jobAttemptID`, `stageAttemptID`, and `attemptNumber` of tasks have been removed. (For `jobAttemptID`, please refer to the records below, produced by SparkKMeans run in Yarn cluster mode.)

The records below were written into `hdfs-audit.log` when SparkKMeans ran in Yarn cluster mode:

```
2016-09-14 22:25:30,100 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1473908768790_0006/container_1473908768790_0006_01_000001/spark-warehouse dst=null perm=wyang:supergroup:rwxr-xr-x proto=rpc callerContext=SPARK_AppName_org.apache.spark.examples.SparkKMeans_AppID_application_1473908768790_0006_1
2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_0_0
2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_2_0
2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_1_0
```
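To make the two callerContext layouts above concrete, here is a minimal, hypothetical Python sketch (not part of this PR) that pulls the `callerContext` field out of an audit-log line and splits it into its components. The regexes and field names are assumptions derived only from the log excerpts above, not from the PR's actual code:

```python
import re

# Extract the callerContext=... field from an hdfs-audit.log line.
AUDIT_RE = re.compile(r"callerContext=(\S+)")

# Client-level form: SPARK_AppName_<name>_AppID_<appId>
APP_RE = re.compile(r"SPARK_AppName_(?P<app_name>.+)_AppID_(?P<app_id>.+)")

# Task-level form:
# SPARK_AppID_<appId>_JobID_<job>_StageID_<stage>_<stageAttempt>_TaskId_<task>_<taskAttempt>
TASK_RE = re.compile(
    r"SPARK_AppID_(?P<app_id>.+)_JobID_(?P<job_id>\d+)"
    r"_StageID_(?P<stage_id>\d+)_(?P<stage_attempt>\d+)"
    r"_TaskId_(?P<task_id>\d+)_(?P<task_attempt>\d+)"
)

def parse_caller_context(audit_line):
    """Return a dict of Spark caller-context fields, or None if absent."""
    m = AUDIT_RE.search(audit_line)
    if not m:
        return None
    ctx = m.group(1)
    for pattern in (TASK_RE, APP_RE):
        hit = pattern.fullmatch(ctx)
        if hit:
            return hit.groupdict()
    return {"raw": ctx}  # unrecognized layout: return it unparsed

# One of the task-level audit lines quoted above:
line = ("2016-09-14 22:29:06,525 INFO FSNamesystem.audit: allowed=true "
        "ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt "
        "dst=null perm=null proto=rpc callerContext="
        "SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_1_0")
print(parse_caller_context(line))
```

With this split, the stage attempt ID and task attempt number appear as the trailing numbers after `StageID` and `TaskId`, which is why dropping the static `stageAttemptID`/`attemptNumber` labels still leaves the values recoverable.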