[ 
https://issues.apache.org/jira/browse/HADOOP-17511?focusedWorklogId=597978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-597978
 ]

ASF GitHub Bot logged work on HADOOP-17511:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/May/21 18:19
            Start Date: 17/May/21 18:19
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-842340255


   +gist showing some log output during a terasort test
   https://gist.github.com/steveloughran/8e0aadb51c63f1c3538deda19ee952ae
   
   Some of the events (e.g. 
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8) have the job ID 
in the referrer header: "ji=job_1620911577786_0006". This is only set during the 
FS operations the S3A committer performs for tasks and jobs, as they're the 
only ones we know are explicitly related to a job. If we were confident that 
whichever thread called `Committer.setupTask()` was the only thread making 
FileSystem API calls for that task, then we could set it at the task level.
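
   For anyone reading the logs: a minimal sketch of pulling the job ID back out 
of a logged referrer value. The class name and the example URL are made up for 
illustration; only the "ji=..." fragment is taken from the log output above, and 
the real referrer layout may differ.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits the query string of a referrer URL into key/value pairs. */
final class ReferrerParams {

    static Map<String, String> parse(String referrer) {
        Map<String, String> params = new LinkedHashMap<>();
        // getRawQuery() returns null when the URL has no "?..." part.
        String query = URI.create(referrer).getRawQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                // No URL-decoding here; values are returned as logged.
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Assumed referrer shape; host, path and the "op" parameter are placeholders.
        String referrer =
            "https://audit.example.org/op_rename/?op=op_rename&ji=job_1620911577786_0006";
        System.out.println(parse(referrer).get("ji"));
    }
}
```

   Events without a "ji" entry simply return null from the lookup, which matches 
the point above that only committer operations carry the job ID.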
   
   The `org.apache.hadoop.fs.audit.CommonAuditContext` class provides global and 
thread-local context maps to let apps attach such attributes; the new 
ManifestCommitter will set them so that once ABFS picks up the same 
auditing, the context info will be passed down there too.
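
   To illustrate the idea of global plus thread-local context maps, here is a 
simplified stand-in. It is NOT the Hadoop `CommonAuditContext` class: the class 
name, method names, the "app" key and the rule that thread-local entries 
override global ones are all assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a context holder with global and per-thread entries. */
final class SketchAuditContext {
    // Shared by every thread in the process.
    private static final Map<String, String> GLOBAL = new ConcurrentHashMap<>();
    // One private map per thread, created lazily on first use.
    private static final ThreadLocal<Map<String, String>> LOCAL =
        ThreadLocal.withInitial(HashMap::new);

    static void setGlobal(String key, String value) { GLOBAL.put(key, value); }
    static void put(String key, String value) { LOCAL.get().put(key, value); }
    static void remove(String key) { LOCAL.get().remove(key); }

    /** Merge global and thread-local entries; thread-local values win (an assumption). */
    static Map<String, String> snapshot() {
        Map<String, String> merged = new HashMap<>(GLOBAL);
        merged.putAll(LOCAL.get());
        return merged;
    }

    public static void main(String[] args) throws InterruptedException {
        setGlobal("app", "example-app");
        Thread worker = new Thread(() -> {
            // Attach the job ID only on the thread doing work for that job.
            put("ji", "job_1620911577786_0006");
            try {
                System.out.println("worker: " + snapshot().get("ji"));
            } finally {
                remove("ji"); // avoid leaking into later work on a pooled thread
            }
        });
        worker.start();
        worker.join();
        // The main thread never set "ji", so it sees only the global entries.
        System.out.println("main: " + snapshot().get("ji"));
    }
}
```

   The try/finally around the per-thread entry is the important part for pooled 
threads: a stale job ID left behind would be attributed to whatever that thread 
runs next.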
   
   Modified versions of Hive, Spark etc. could use this API to set any of their 
context info when a specific thread is scheduled to work on a given query; 
the Hadoop committer isn't the right place to try to guess it.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 597978)
    Time Spent: 18h 20m  (was: 18h 10m)

> Add an Audit plugin point for S3A auditing/context
> --------------------------------------------------
>
>                 Key: HADOOP-17511
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17511
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 3.3.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 18h 20m
>  Remaining Estimate: 0h
>
> Add a way for auditing tools to correlate S3 object calls with Hadoop FS API 
> calls.
> Initially just to log/forward to an auditing service.
> Later: let us attach them as parameters in S3 requests, such as opentrace 
> headers or (my initial idea) the HTTP referrer header, where it will get into 
> the log.
> Challenges
> * ensuring the audit span is created for every public entry point. That will 
> have to include those used in s3guard tools, some de facto public APIs
> * and not re-entered for active spans. S3A code must not call back into the 
> FS API points
> * Propagation across worker threads



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
