Github user Sherry302 commented on the issue:

    https://github.com/apache/spark/pull/14659
  
    Hi @tgravescs, thank you very much for the review. I have updated the PR based on all of your comments, including adding a `CallerContext` class, updating the Javadoc, making the caller context string shorter, etc. I ran manual tests against some Spark applications in Yarn client mode and Yarn cluster mode, and the Spark caller contexts were written into the HDFS `hdfs-audit.log` successfully.
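    
    As background, the underlying mechanism is Hadoop's `org.apache.hadoop.ipc.CallerContext` API (HDFS-9184, available since Hadoop 2.8): the client installs a context string on the current thread, and the NameNode appends it to `hdfs-audit.log` when `hadoop.caller.context.enabled` is true. A minimal sketch of that mechanism (not this PR's exact code; `setCallerContext` is a hypothetical helper):
    ```scala
    import org.apache.hadoop.ipc.CallerContext
    
    /** Hypothetical helper: tag all subsequent HDFS RPCs from this thread. */
    def setCallerContext(context: String): Unit = {
      // Build an immutable CallerContext and install it thread-locally;
      // every HDFS RPC issued from this thread afterwards carries the string.
      val callerContext = new CallerContext.Builder(context).build()
      CallerContext.setCurrent(callerContext)
    }
    
    setCallerContext("SPARK_AppName_SparkKMeans_AppID_application_1473908768790_0007")
    ```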
    
    The following is a screenshot of the audit log (SparkKMeans in Yarn client mode):
    
    <img width="1407" alt="screen shot 2016-09-14 at 10 34 25 pm" src="https://cloud.githubusercontent.com/assets/8546874/18539563/1eb16748-7acd-11e6-840a-0e8bfabf5954.png">
    
    This is the caller context written into `hdfs-audit.log` by the `Yarn Client`:
    ```
    2016-09-14 22:28:59,341 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=getfileinfo src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=SPARK_AppName_SparkKMeans_AppID_application_1473908768790_0007
    ```
    The callerContext above is of the form `SPARK_AppName_***_AppID_***`.
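    
    For illustration, the driver-side string can be assembled like this (a sketch; the variable names are assumptions, not the PR's exact code):
    ```scala
    // Illustrative values taken from the audit record above.
    val appName = "SparkKMeans"
    val appId = "application_1473908768790_0007"
    val callerContext = s"SPARK_AppName_${appName}_AppID_${appId}"
    // => SPARK_AppName_SparkKMeans_AppID_application_1473908768790_0007
    ```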
    
    These are the caller contexts written into `hdfs-audit.log` by `Task`:
    ```
    2016-09-14 22:29:06,525 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_1_0
    2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_0_0
    2016-09-14 22:29:06,526 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_2_0
    ```
    The callerContext above is of the form `SPARK_AppID_***_JobID_***_StageID_***_(StageAttemptID)_TaskId_***_(TaskAttemptNumber)`. The static strings `jobAttemptID`, `stageAttemptID`, and `attemptNumber` of tasks have been deleted. (For `jobAttemptID`, please refer to the Yarn cluster mode records further below.)
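    
    As a sketch, a task could assemble this string from the IDs available on the executor (all variable names here are illustrative, not the PR's exact code):
    ```scala
    // Illustrative values matching the first audit record above.
    val appId = "application_1473908768790_0007"
    val jobId = 0
    val stageId = 0
    val stageAttemptId = 0
    val taskId = 1
    val taskAttemptNumber = 0
    // Attempt IDs follow their parent ID with no label of their own.
    val callerContext = s"SPARK_AppID_${appId}_JobID_${jobId}" +
      s"_StageID_${stageId}_${stageAttemptId}_TaskId_${taskId}_${taskAttemptNumber}"
    // => SPARK_AppID_application_1473908768790_0007_JobID_0_StageID_0_0_TaskId_1_0
    ```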
    
    The records below were written into `hdfs-audit.log` when SparkKMeans ran 
in Yarn cluster mode:
    
    ```
    2016-09-14 22:25:30,100 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=mkdirs      
src=/private/tmp/hadoop-wyang/nm-local-dir/usercache/wyang/appcache/application_1473908768790_0006/container_1473908768790_0006_01_000001/spark-warehouse
       dst=null        perm=wyang:supergroup:rwxr-xr-x proto=rpc       
callerContext=SPARK_AppName_org.apache.spark.examples.SparkKMeans_AppID_application_1473908768790_0006_1
    2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_0_0
    2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_2_0
    2016-09-14 22:25:33,635 INFO FSNamesystem.audit: allowed=true       
ugi=wyang (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/lr_big.txt 
dst=null        perm=null       proto=rpc       
callerContext=SPARK_AppID_application_1473908768790_0006_1_JobID_0_StageID_0_0_TaskId_1_0
    ```

