[ 
https://issues.apache.org/jira/browse/FLINK-25029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450241#comment-17450241
 ] 

David Morávek commented on FLINK-25029:
---------------------------------------

Hi [~liufangqi], I think this feature would be a nice improvement to the HDFS 
integration. It seems you've already done thorough research into how the 
feature is implemented in the HDFS client. Would you be willing to go one step 
further and try to contribute this feature to Flink? I personally don't have 
the capacity to work on this.

As for where to place the call to #setContext, this is pretty flexible, as we 
can tweak the HadoopFileSystem implementation (in the worst case, we can set 
the context before every call to DFS). Take a look at PluginFileSystemFactory 
for inspiration.
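
To make "set the context before every call" more concrete, here is a rough, 
self-contained sketch of the wrapping idea. It does not use the real Hadoop or 
Flink classes: SimpleFs, CallerContextHolder, RecordingFs, ContextSettingFs and 
the "flink_job_1234" id are all hypothetical stand-ins. In an actual change I 
would expect the thread-local to be replaced by Hadoop's own 
org.apache.hadoop.ipc.CallerContext, and the wrapper to live in or around 
HadoopFileSystem.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal view of the file system operations we care about.
interface SimpleFs {
    boolean delete(String path);
}

// Hypothetical stand-in for the per-thread caller context.
// Assumption: in a real change, Hadoop's CallerContext would play this role.
final class CallerContextHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private CallerContextHolder() {}

    static void set(String context) {
        CURRENT.set(context);
    }

    static String get() {
        return CURRENT.get();
    }
}

// Pretend DFS client: records the context visible at call time,
// roughly the way a namenode audit log entry would.
final class RecordingFs implements SimpleFs {
    final List<String> auditLog = new ArrayList<>();

    @Override
    public boolean delete(String path) {
        auditLog.add("cmd=delete src=" + path
                + " callerContext=" + CallerContextHolder.get());
        return true;
    }
}

// Decorator that sets the caller context on the calling thread
// before delegating every operation to the wrapped file system.
final class ContextSettingFs implements SimpleFs {
    private final SimpleFs inner;
    private final String context;

    ContextSettingFs(SimpleFs inner, String context) {
        this.inner = inner;
        this.context = context;
    }

    @Override
    public boolean delete(String path) {
        CallerContextHolder.set(context);
        return inner.delete(path);
    }
}

public class CallerContextSketch {
    public static void main(String[] args) {
        RecordingFs dfs = new RecordingFs();
        // "flink_job_1234" is a made-up tracking id for illustration.
        SimpleFs fs = new ContextSettingFs(dfs, "flink_job_1234");
        fs.delete("/tmp/checkpoint-1");
        System.out.println(dfs.auditLog.get(0));
    }
}
```

Because the context is re-applied on every call, it does not matter which 
thread ends up invoking the file system. This is the same decorator pattern 
that, as far as I recall, PluginFileSystemFactory uses to fix up the class 
loader around each delegated call.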

> Hadoop Caller Context Setting In Flink
> --------------------------------------
>
>                 Key: FLINK-25029
>                 URL: https://issues.apache.org/jira/browse/FLINK-25029
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Task
>            Reporter: 刘方奇
>            Priority: Major
>
> For a given HDFS operation (e.g. deleting a file), it's very helpful to track 
> which upper-level job issued it. The upper-level callers may be specific 
> Oozie tasks, MR jobs, or Hive queries. One scenario is that the namenode 
> (NN) is abused/spammed, and the operator wants to know immediately which MR 
> job is to blame so that she can kill it. To this end, the caller context 
> contains at least the application-dependent "tracking id".
> That is the main purpose of the caller context: the HDFS client sets the 
> caller context, and the namenode records it in its audit log, where it can 
> be used for further analysis.
> Spark and Hive already set the caller context to meet the HDFS job audit 
> requirement.
> In my company, Flink jobs often cause problems for HDFS, so we implemented 
> this internally to help prevent such cases.
> If the feature is general enough to be supported upstream, I can submit a 
> PR for it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
