Re: [DISCUSS] Add remote port information to HDFS audit log

Masatake Iwasaki Mon, 11 Oct 2021 00:26:28 -0700

I am not sure whether we can directly go and change this. Any changes to Audit 
Log format are considered incompatible.


https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Audit_Log_Output


Adding a field for caller context seems to have been accepted because it is an 
optional feature that is disabled by default.
https://github.com/apache/hadoop/blob/rel/release-3.3.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L8480-L8498

If we need to add fields, making it optional might be an option.
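
A minimal sketch of what "optional" could look like, assuming a boolean
switch that defaults to off. This is not the FSNamesystem code and not the
code in the PR; the class name, the method, and the configuration key below
are all invented for illustration.

```java
/** Illustrative only: shows an opt-in port suffix on the audit "ip=" field. */
public class AuditIpField {
    // Hypothetical configuration key, made up for this sketch.
    static final String PORT_ENABLED_KEY = "example.audit.log.remote.port.enabled";
    // Off by default, so the existing log format is unchanged unless opted in.
    static final boolean PORT_ENABLED_DEFAULT = false;

    /**
     * Builds the ip field of an audit record.
     * hostAddress is the textual client address, e.g. "10.0.0.1".
     */
    static String ipField(String hostAddress, int remotePort, boolean portEnabled) {
        StringBuilder sb = new StringBuilder("ip=/").append(hostAddress);
        if (portEnabled && remotePort > 0) {
            // Only the content of the existing field changes; no new field is added.
            sb.append(':').append(remotePort);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(ipField("10.0.0.1", 50010, PORT_ENABLED_DEFAULT));
        System.out.println(ipField("10.0.0.1", 50010, true));
    }
}
```

With the switch off, parsers that expect the current format would see no
difference; only clusters that opt in would get the port.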

Masatake Iwasaki

On 2021/10/11 16:09, tom lee wrote:

However, adding the port only changes the content inside the IP field,
so it has little impact on the overall layout.

In our cluster, we parse the audit log with Vector and send the data to
Kafka, and that pipeline is unaffected.
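
To illustrate why a key=value parser may tolerate the change, here is a
rough Java sketch (not our Vector configuration). It assumes the record is
the tab-separated key=value part of an audit line, starting at "allowed=",
that the port would appear as ":port" at the end of the ip value, and that
addresses are IPv4. The class and method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for the key=value part of an HDFS audit record. */
public class AuditLineParser {

    /** Splits a tab-separated key=value record into an ordered map. */
    static Map<String, String> parseFields(String record) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : record.split("\t")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                // Split on the first '=' only, so values may contain '='.
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return fields;
    }

    /**
     * Returns the port from an ip value such as "/10.0.0.1:50010",
     * or -1 when the value carries no port (the current format).
     * Assumes IPv4; an IPv6 literal would need different handling.
     */
    static int remotePort(String ipValue) {
        int colon = ipValue.lastIndexOf(':');
        if (colon < 0 || colon == ipValue.length() - 1) {
            return -1;
        }
        String tail = ipValue.substring(colon + 1);
        if (tail.length() > 5 || !tail.chars().allMatch(Character::isDigit)) {
            return -1;
        }
        return Integer.parseInt(tail);
    }

    public static void main(String[] args) {
        String record = "allowed=true\tugi=alice (auth:SIMPLE)\tip=/10.0.0.1:50010"
                + "\tcmd=open\tsrc=/tmp/f\tdst=null\tperm=null\tproto=rpc";
        Map<String, String> fields = parseFields(record);
        System.out.println(fields.get("ip") + " -> port " + remotePort(fields.get("ip")));
    }
}
```

The number and order of fields stay the same in this sketch; only consumers
that treat the ip value as a bare address would need to strip the suffix.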

tom lee <tomlees...@gmail.com> wrote on Mon, Oct 11, 2021 at 2:44 PM:

Thanks, Ayush, for the reminder. I have similar concerns, which is why I
started this discussion: to let the community know about the change and to
gather suggestions.

Ayush Saxena <ayush...@gmail.com> wrote on Mon, Oct 11, 2021 at 2:38 PM:

Hey
I am not sure whether we can directly go and change this. Any changes to
Audit Log format are considered incompatible.


https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Audit_Log_Output

-Ayush

On 10-Oct-2021, at 7:57 PM, tom lee <tomlees...@gmail.com> wrote:

Hi all,

In our production environment, we occasionally hit a problem where a user
submits an abnormal computation task that floods the NameNode with
requests. The NameNode's queueTime and processingTime then rise sharply,
and a large backlog of tasks builds up.

We usually locate and kill the specific Spark, Flink, or MapReduce tasks
based on metrics and audit logs. Currently, the audit logs record the IP
and UGI but no port information, so it is sometimes difficult to locate the
specific process. Therefore, I propose that we add the port information to
the audit log, so that we can easily track the upstream process.

Some projects, such as HBase and Alluxio, already include port information
in their audit logs. I think HDFS audit logs should include it as well.

I submitted a PR (https://github.com/apache/hadoop/pull/3538) that has been
tested in our test environment; it works for both RPC and HTTP. I look
forward to your discussion of possible problems and suggestions for
changes. I will actively update the PR.

Best Regards,
Tom


---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
