Github user Sherry302 commented on the issue:

    https://github.com/apache/spark/pull/14659
  
    Hi @steveloughran, thanks a lot for the comments.
    
    In the audit log, if users set a configuration in spark-defaults.conf such as `spark.eventLog.dir hdfs://localhost:9000/spark-history`, there will be a record like the following:
    ```
    2016-08-21 23:47:50,834 INFO FSNamesystem.audit: allowed=true        
ugi=wyang (auth:SIMPLE)        ip=/127.0.0.1        cmd=setPermission      
src=/spark-history/application_1471835208589_0013.lz4.inprogress     dst=null   
     perm=wyang:supergroup:rwxrwx---       proto=rpc
    ```
    We can see the application ID `application_1471835208589_0013` in the `src` path above. Except for that case, there is no Spark application information (such as the application name or application ID, or in YARN terms the appId + attemptId) in the audit log. So I think it is better to include the application name/ID in the caller context, and I have updated the PR to include that information.
    
    In commit [5ab2a41](https://github.com/apache/spark/pull/14659/commits/5ab2a41b93bfd73baf3798ba66fc7554b10b78e6), the application name, application ID, and attempt ID (the attempt ID only in YARN cluster mode) are included in the value of the caller context when the YARN `Client` (for applications running in YARN client mode) or the `ApplicationMaster` (for applications running in YARN cluster mode) performs operations in HDFS. So in the audit log you can see `callerContext=Spark_AppName_**_AppId_**_AttemptId_**`:
    _Applications in YARN cluster mode_
    ```
    2016-08-21 22:55:44,568 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=getfileinfo 
src=/lr_big.txt/_spark_metadata dst=null        perm=null       proto=rpc       
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
    2016-08-21 22:55:44,573 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=getfileinfo src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
    2016-08-21 22:55:44,583 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=listStatus  src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
    2016-08-21 22:55:44,589 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
    2016-08-21 22:55:46,163 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=mkdirs      
src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1471835208589_0010/container_1471835208589_0010_01_000001/spark-warehouse
       dst=null        perm=wyang:supergroup:rwxr-xr-x proto=rpc       
callerContext=Spark_AppName_org.apache.spark.examples.SparkKMeans_AppId_application_1471835208589_0010_AttemptId_1
    ```
    _Applications in YARN client mode_
    ```
    2016-08-21 22:59:20,775 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=getfileinfo 
src=/lr_big.txt/_spark_metadata dst=null        perm=null       proto=rpc       
callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
    2016-08-21 22:59:20,778 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=getfileinfo src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
    2016-08-21 22:59:20,785 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=listStatus  src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
    2016-08-21 22:59:20,791 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppName_SparkKMeans_AppId_application_1471835208589_0011
    ```
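    For reference, here is a minimal sketch (not the exact code in the PR) of how such a context string can be installed, assuming Hadoop 2.8+'s `org.apache.hadoop.ipc.CallerContext` API; the helper name `setApplicationCallerContext` is hypothetical:
    ```scala
    import org.apache.hadoop.ipc.CallerContext

    // Hypothetical helper: builds a context string of the form
    // Spark_AppName_<name>_AppId_<appId>[_AttemptId_<attemptId>] and installs
    // it on the current thread, so that subsequent HDFS RPCs from this thread
    // carry it into the NameNode audit log as callerContext=...
    def setApplicationCallerContext(
        appName: String,
        appId: String,
        attemptId: Option[String]): Unit = {
      val context = new StringBuilder(s"Spark_AppName_${appName}_AppId_${appId}")
      attemptId.foreach(id => context.append(s"_AttemptId_$id"))
      CallerContext.setCurrent(new CallerContext.Builder(context.toString).build())
    }
    ```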
    In commit [1512775](https://github.com/apache/spark/pull/14659/commits/1512775a3faddb9de9299662a6f3bfec3f6fe205), the application ID and application attempt ID (the latter only in YARN cluster mode), together with the job, stage, and task identifiers, are included in the value of the caller context when `Task`s perform operations in HDFS. So in the audit log you can see `callerContext=Spark_AppId_**_AppAttemptId_**_JobId_**_StageID_**_stageAttemptId_**_taskID_**_attemptNumber_**`:
    _Applications in YARN cluster mode_
    ```
    2016-08-21 22:55:50,977 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppId_application_1471835208589_0010_AppAttemptId_1_JobId_0_StageID_0_stageAttemptId_0_taskID_3_attemptNumber_0
    2016-08-21 22:55:50,978 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppId_application_1471835208589_0010_AppAttemptId_1_JobId_0_StageID_0_stageAttemptId_0_taskID_5_attemptNumber_0
    2016-08-21 22:55:50,978 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppId_application_1471835208589_0010_AppAttemptId_1_JobId_0_StageID_0_stageAttemptId_0_taskID_1_attemptNumber_0
    ```
    _Applications in YARN client mode_
    ```
    2016-08-21 23:15:43,089 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppId_application_1471835208589_0012_AppAttemptId_None_JobId_0_StageID_0_stageAttemptId_0_taskID_3_attemptNumber_0
    2016-08-21 23:15:43,089 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppId_application_1471835208589_0012_AppAttemptId_None_JobId_0_StageID_0_stageAttemptId_0_taskID_5_attemptNumber_0
    2016-08-21 23:15:43,089 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=Spark_AppId_application_1471835208589_0012_AppAttemptId_None_JobId_0_StageID_0_stageAttemptId_0_taskID_1_attemptNumber_0
    ```
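    Similarly, a sketch of the task-side context, again assuming the `CallerContext` API; the helper name is hypothetical, and the field names simply mirror the log format above:
    ```scala
    import org.apache.hadoop.ipc.CallerContext

    // Hypothetical helper: extends the application-level context with
    // job/stage/task identifiers before the task performs HDFS operations.
    // In YARN client mode there is no application attempt ID, hence the
    // "None" placeholder seen in the logs above.
    def setTaskCallerContext(
        appId: String,
        appAttemptId: Option[String],
        jobId: Int,
        stageId: Int,
        stageAttemptId: Int,
        taskId: Long,
        attemptNumber: Int): Unit = {
      val context = s"Spark_AppId_${appId}" +
        s"_AppAttemptId_${appAttemptId.getOrElse("None")}" +
        s"_JobId_${jobId}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" +
        s"_taskID_${taskId}_attemptNumber_${attemptNumber}"
      CallerContext.setCurrent(new CallerContext.Builder(context).build())
    }
    ```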
    For commit [1512775](https://github.com/apache/spark/pull/14659/commits/1512775a3faddb9de9299662a6f3bfec3f6fe205), the application ID and attempt ID are passed down to `Task`. Is it reasonable for `Task` to see that application information? What do you think, @steveloughran? Thanks.

