I think many users parse audit logs in their own way, and they will be affected if the format is changed. So I agree with Masatake's suggestion.

- Takanobu

On Mon, Oct 11, 2021 at 18:19, tom lee <tomlees...@gmail.com> wrote:

> Thanks @Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> for your
> suggestion. This is a good idea.
>
> Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote on Mon, Oct 11, 2021 at 3:26 PM:
>
> > > I am not sure whether we can directly go and change this. Any changes to
> > > Audit Log format are considered incompatible.
> > >
> > > https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Audit_Log_Output
> >
> > Adding a field for caller context seemed to be accepted since it is an
> > optional feature disabled by default.
> >
> > https://github.com/apache/hadoop/blob/rel/release-3.3.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L8480-L8498
> >
> > If we need to add fields, making it optional might be an option.
> >
> > Masatake Iwasaki
> >
> > On 2021/10/11 16:09, tom lee wrote:
> > > However, adding the port only modifies the internal content of the IP field,
> > > which has little impact on the overall layout.
> > >
> > > In our cluster, we parse the audit log through Vector and send the data to
> > > Kafka, which is unaffected.
> > >
> > > tom lee <tomlees...@gmail.com> wrote on Mon, Oct 11, 2021 at 2:44 PM:
> > >
> > >> Thanks to Ayush for reminding me. I also have similar concerns, so I published
> > >> this discussion, hoping to let the members of the community know about this
> > >> matter and then give suggestions.
> > >>
> > >> Ayush Saxena <ayush...@gmail.com> wrote on Mon, Oct 11, 2021 at 2:38 PM:
> > >>
> > >>> Hey
> > >>> I am not sure whether we can directly go and change this. Any changes to
> > >>> Audit Log format are considered incompatible.
> > >>>
> > >>> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Audit_Log_Output
> > >>>
> > >>> -Ayush
> > >>>
> > >>> On 10-Oct-2021, at 7:57 PM, tom lee <tomlees...@gmail.com> wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> In our production environment, we occasionally encounter a problem where a
> > >>> user submits an abnormal computation task, causing a sudden flood of
> > >>> requests. This drives the queueTime and processingTime of the Namenode
> > >>> very high and creates a large backlog of tasks.
> > >>>
> > >>> We usually locate and kill specific Spark, Flink, or MapReduce tasks based
> > >>> on metrics and audit logs. Currently, IP and UGI are recorded in audit
> > >>> logs, but there is no port information, so it is sometimes difficult to
> > >>> locate specific processes. Therefore, I propose that we add the port
> > >>> information to the audit log, so that we can easily track the upstream
> > >>> process.
> > >>>
> > >>> Currently, some projects contain port information in audit logs, such as
> > >>> HBase and Alluxio. I think it is also necessary to add port information
> > >>> for HDFS audit logs.
> > >>>
> > >>> I submitted a PR (https://github.com/apache/hadoop/pull/3538), which has
> > >>> been tested in our test environment, and both RPC and HTTP are in effect.
> > >>> I look forward to your discussion on possible problems and suggestions for
> > >>> modification. I will actively update the PR.
> > >>>
> > >>> Best Regards,
> > >>> Tom
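
To make the compatibility concern in this thread concrete, here is a minimal Python sketch of how a downstream parser might react if a port were appended inside the `ip` field. Everything in it is an assumption for illustration: the sample line is a simplified tab-separated `key=value` record without the timestamp and logger prefix, the `ip=/<address>:<port>` form is a guess at what the change could look like (it is not taken from the PR), and the function names are hypothetical. The tolerant variant only handles IPv4 addresses with a port.

```python
import ipaddress

# Simplified, illustrative audit record (assumed layout, no log prefix).
OLD_LINE = (
    "allowed=true\tugi=alice (auth:SIMPLE)\tip=/10.0.0.5\t"
    "cmd=getfileinfo\tsrc=/tmp/a\tdst=null\tperm=null\tproto=rpc"
)
# Hypothetical variant with a port appended inside the ip field.
NEW_LINE = OLD_LINE.replace("ip=/10.0.0.5", "ip=/10.0.0.5:52314")


def parse_audit_fields(line):
    """Split a tab-separated key=value record into a dict."""
    fields = {}
    for token in line.strip().split("\t"):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def strict_client_ip(fields):
    """Parser that assumes the ip field holds only '/<address>'."""
    return ipaddress.ip_address(fields["ip"].lstrip("/"))


def tolerant_client_endpoint(fields):
    """Accept '/<address>' and the hypothetical '/<ipv4>:<port>' form."""
    raw = fields["ip"].lstrip("/")
    host, sep, port = raw.rpartition(":")
    if sep and port.isdigit() and "." in host:
        return ipaddress.ip_address(host), int(port)
    return ipaddress.ip_address(raw), None


if __name__ == "__main__":
    for line in (OLD_LINE, NEW_LINE):
        fields = parse_audit_fields(line)
        try:
            print("strict:", strict_client_ip(fields))
        except ValueError as exc:
            print("strict parser failed:", exc)
        print("tolerant:", tolerant_client_endpoint(fields))
```

A pipeline that only splits on tabs and forwards the raw `ip` string would keep working, which matches the Vector-to-Kafka case described above, while a parser that validates the value as an address would start failing. That difference is one reason an opt-in setting may be the safer route.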