Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659

Hi @steveloughran, thanks a lot for the comments.

If users set a configuration in spark-defaults.conf such as `spark.eventLog.dir hdfs://localhost:9000/spark-history`, a record like the following appears in the audit log:
```
2016-08-21 23:47:50,834 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=setPermission src=/spark-history/application_1471835208589_0013.lz4.inprogress dst=null perm=wyang:supergroup:rwxrwx--- proto=rpc
```
We can see the application ID `application_1471835208589_0013` above. Apart from that case, the audit log carries no Spark application information such as the application name or application ID (on YARN, the appId + attemptId). So I think it is better to include the application name/ID in the caller context, and I have updated the PR to do so.

In commit [5ab2a41](https://github.com/apache/spark/pull/14659/commits/5ab2a41b93bfd73baf3798ba66fc7554b10b78e6), the application name, application ID, and attempt ID (the latter only in YARN cluster mode) are included in the value of the caller context whenever the YARN `Client` (for applications running in YARN client mode) or the `ApplicationMaster` (for applications running in YARN cluster mode) performs operations on HDFS. In the audit log you can then see `callerContext=Spark_AppName_**_AppId_**_AttemptId_**`:

_Applications in YARN cluster mode_
```
2016-08-21 22:55:44,568 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt/_spark_metadata dst=null perm=null proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,573 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,583 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:44,589 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
2016-08-21 22:55:46,163 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1471835208589_0010/container_1471835208589_0010_01_000001/spark-warehouse dst=null perm=wyang:supergroup:rwxr-xr-x proto=rpc callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
```

_Applications in YARN client mode_
```
2016-08-21 22:59:20,775 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt/_spark_metadata dst=null perm=null proto=rpc callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,778 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,785 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
2016-08-21 22:59:20,791 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
```
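For context, here is a minimal sketch of how such an application-level caller context can be set, assuming Hadoop 2.8+ where `org.apache.hadoop.ipc.CallerContext` is available (HDFS-9184). The helper name and signature are illustrative, not the PR's exact code (the PR may need to invoke this API via reflection so Spark keeps compiling against older Hadoop versions):

```scala
import org.apache.hadoop.ipc.CallerContext

// Illustrative helper, not the PR's exact code: build the context string in the
// same shape as the audit-log entries above and install it for the current thread.
def setAppCallerContext(appName: String, appId: String, attemptId: Option[String]): Unit = {
  val context = s"Spark_AppName_${appName}_AppId_${appId}" +
    attemptId.map(id => s"_AttemptId_$id").getOrElse("")
  // The caller context is thread-local; subsequent HDFS RPCs issued from this
  // thread carry it and surface as callerContext= in the NameNode audit log.
  CallerContext.setCurrent(new CallerContext.Builder(context).build())
}

// e.g. in the ApplicationMaster (YARN cluster mode):
// setAppCallerContext("org.apache.spark.examples.SparkKMeans",
//   "application_1471835208589_0010", Some("1"))
```

One thing to keep in mind: HDFS truncates caller contexts longer than `hadoop.caller.context.max.size` (128 bytes by default), so a fully qualified class name in `AppName` plus the IDs can get cut off in the audit log.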
In commit [1512775](https://github.com/apache/spark/pull/14659/commits/1512775a3faddb9de9299662a6f3bfec3f6fe205), the application ID, the application attempt ID (`None` in YARN client mode), and the task-level identifiers (job ID, stage ID, stage attempt ID, task ID, and task attempt number) are included in the value of the caller context when `Task`s perform operations on HDFS. In the audit log you can then see `callerContext=Spark_AppId_**_AppAttemptId_**_JobId_**_StageID_**_stageAttemptId_**_taskID_**_attemptNumber_**`:

_Applications in YARN cluster mode_
```
2016-08-21 22:55:50,977 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppId_application_1471835208589_0010_AppAttemptId_1_JobId_0_StageID_0_stageAttemptId_0_taskID_3_attemptNumber_0
2016-08-21 22:55:50,978 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppId_application_1471835208589_0010_AppAttemptId_1_JobId_0_StageID_0_stageAttemptId_0_taskID_5_attemptNumber_0
2016-08-21 22:55:50,978 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppId_application_1471835208589_0010_AppAttemptId_1_JobId_0_StageID_0_stageAttemptId_0_taskID_1_attemptNumber_0
```

_Applications in YARN client mode_
```
2016-08-21 23:15:43,089 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppId_application_1471835208589_0012_AppAttemptId_None_JobId_0_StageID_0_stageAttemptId_0_taskID_3_attemptNumber_0
2016-08-21 23:15:43,089 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppId_application_1471835208589_0012_AppAttemptId_None_JobId_0_StageID_0_stageAttemptId_0_taskID_5_attemptNumber_0
2016-08-21 23:15:43,089 INFO FSNamesystem.audit: allowed=true ugi=wyang (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/lr_big.txt dst=null perm=null proto=rpc callerContext=Spark_AppId_application_1471835208589_0012_AppAttemptId_None_JobId_0_StageID_0_stageAttemptId_0_taskID_1_attemptNumber_0
```

One design question about commit [1512775](https://github.com/apache/spark/pull/14659/commits/1512775a3faddb9de9299662a6f3bfec3f6fe205): the application ID and attempt ID are passed down to `Task` — is it appropriate for `Task` to see that application-level information? What do you think, @steveloughran? Thanks.
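For reference, a similar sketch of what the task-side context construction might look like, again assuming the Hadoop 2.8+ `CallerContext` API; the parameter names here are illustrative, not the PR's exact signatures:

```scala
import org.apache.hadoop.ipc.CallerContext

// Illustrative helper, not the PR's exact code: tag HDFS RPCs issued while a
// task runs with the task-level identifiers seen in the audit-log entries above.
def setTaskCallerContext(
    appId: String,
    appAttemptId: Option[String], // None in YARN client mode, as in the logs above
    jobId: Int,
    stageId: Int,
    stageAttemptId: Int,
    taskId: Long,
    attemptNumber: Int): Unit = {
  val context = s"Spark_AppId_${appId}" +
    s"_AppAttemptId_${appAttemptId.getOrElse("None")}" +
    s"_JobId_${jobId}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" +
    s"_taskID_${taskId}_attemptNumber_${attemptNumber}"
  // Set once at the start of the task's run() so every HDFS access it performs
  // (e.g. the cmd=open calls on /lr_big.txt above) is attributed to this task.
  CallerContext.setCurrent(new CallerContext.Builder(context).build())
}
```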