[jira] [Resolved] (HADOOP-17960) hadoop-auth module cannot import non-guava implementation in hadoop util

2021-10-11 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein resolved HADOOP-17960.

Resolution: Won't Fix

> hadoop-auth module cannot import non-guava implementation in hadoop util
> 
>
> Key: HADOOP-17960
> URL: https://issues.apache.org/jira/browse/HADOOP-17960
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Ahmed Hussein
>Priority: Major
>
> hadoop-common provides several utility implementations in
> {{org.apache.hadoop.util.*}}. Since hadoop-common depends on hadoop-auth,
> none of those utilities can be used within hadoop-auth without creating a
> circular dependency.
> There are several options:
> * similar to {{hadoop-annotations}}, generic utility implementations such
> as maps, Strings, Preconditions, etc. could be moved to a new common-util
> module that has no dependency on other modules.
> * an easier fix is to manually replace the guava calls in the hadoop-auth
> module without importing {{hadoop.util.*}}. Only a few calls need to be
> replaced: {{Splitter}}, {{Preconditions.checkNotNull}}, and
> {{Preconditions.checkArgument}} (see the sketch after this description).
> CC: [~vjasani] , [~ste...@apache.org], [~tasanuma]
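
For illustration, a minimal JDK-only sketch of what the second option amounts
to (the class and method names here are hypothetical, not the actual
hadoop-auth code):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Objects;
    import java.util.regex.Pattern;

    // Hypothetical helper: JDK-only stand-ins for the three Guava calls.
    public final class JdkOnlyUtils {

      // Guava: Preconditions.checkNotNull(value, "value must not be null")
      static <T> T checkNotNull(T value, String message) {
        return Objects.requireNonNull(value, message);
      }

      // Guava: Preconditions.checkArgument(condition, message)
      static void checkArgument(boolean condition, String message) {
        if (!condition) {
          throw new IllegalArgumentException(message);
        }
      }

      // Guava: Splitter.on(separator).trimResults().omitEmptyStrings().split(input)
      static List<String> split(String input, String separator) {
        List<String> result = new ArrayList<>();
        for (String part : input.split(Pattern.quote(separator))) {
          String trimmed = part.trim();
          if (!trimmed.isEmpty()) {
            result.add(trimmed);
          }
        }
        return result;
      }
    }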






[jira] [Created] (HADOOP-17963) Replace Guava VisibleForTesting by Hadoop's own annotation in hadoop-yarn-project modules

2021-10-11 Thread Viraj Jasani (Jira)
Viraj Jasani created HADOOP-17963:
-

 Summary: Replace Guava VisibleForTesting by Hadoop's own 
annotation in hadoop-yarn-project modules
 Key: HADOOP-17963
 URL: https://issues.apache.org/jira/browse/HADOOP-17963
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Viraj Jasani
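
For context, the change in these sub-tasks is essentially a one-line import
swap per class. A sketch (assuming the Hadoop annotation keeps the same simple
name, as provided by the hadoop-annotations module; the class below is
invented for illustration):

    // Before: Guava's annotation
    // import com.google.common.annotations.VisibleForTesting;

    // After: Hadoop's own annotation
    import org.apache.hadoop.classification.VisibleForTesting;

    public class SomeYarnComponent {   // hypothetical class for illustration
      @VisibleForTesting
      int internalCounter() {          // annotation usage stays unchanged
        return 0;
      }
    }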









[jira] [Created] (HADOOP-17962) Replace Guava VisibleForTesting by Hadoop's own annotation in hadoop-tools modules

2021-10-11 Thread Viraj Jasani (Jira)
Viraj Jasani created HADOOP-17962:
-

 Summary: Replace Guava VisibleForTesting by Hadoop's own 
annotation in hadoop-tools modules
 Key: HADOOP-17962
 URL: https://issues.apache.org/jira/browse/HADOOP-17962
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Viraj Jasani









[jira] [Created] (HADOOP-17961) s3 and abfs incremental listing: ask for a smaller first batch

2021-10-11 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17961:
---

 Summary: s3 and abfs incremental listing: ask for a smaller first 
batch
 Key: HADOOP-17961
 URL: https://issues.apache.org/jira/browse/HADOOP-17961
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure, fs/s3
Affects Versions: 3.3.2
Reporter: Steve Loughran


With code gradually adopting listStatusIncremental(), asking for a smaller
initial batch could permit a faster ramp-up of result processing.

Probably most significant on an S3 versioned bucket, where the need to skip
tombstones can make listings significantly slower, but it could benefit ABFS
too.
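
A rough sketch of the idea (the names below are illustrative, not the actual
S3A/ABFS listing code):

    // Hypothetical paging policy: a small first page so callers can start
    // processing results sooner, then full-size pages for throughput.
    final class ListingPageSizes {
      static final int FIRST_PAGE = 100;    // small initial batch (illustrative)
      static final int MAX_PAGE = 5000;     // steady-state batch (illustrative)

      static int pageSize(int pageIndex) {
        return pageIndex == 0 ? FIRST_PAGE : MAX_PAGE;
      }
    }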






Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2021-10-11 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/

No changes




-1 overall


The following subsystems voted -1:
asflicense hadolint mvnsite pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.fs.TestFileUtil 
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain 
   hadoop.hdfs.server.datanode.TestDirectoryScanner 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat 
   hadoop.hdfs.server.federation.router.TestRouterQuota 
   hadoop.hdfs.server.federation.resolver.order.TestLocalResolver 
   hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver 
   hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker 
   hadoop.yarn.server.resourcemanager.TestClientRMService 
   hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter 
   hadoop.mapreduce.lib.input.TestLineRecordReader 
   hadoop.mapred.TestLineRecordReader 
   hadoop.tools.TestDistCpSystem 
   hadoop.yarn.sls.TestSLSRunner 
   hadoop.resourceestimator.service.TestResourceEstimatorService 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
  

   cc:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/diff-compile-javac-root.txt
  [496K]

   checkstyle:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/diff-checkstyle-root.txt
  [14M]

   hadolint:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   mvnsite:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-mvnsite-root.txt
  [880K]

   pathlen:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/diff-patch-pylint.txt
  [48K]

   shellcheck:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/diff-patch-shellcheck.txt
  [56K]

   shelldocs:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/whitespace-eol.txt
  [12M]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/whitespace-tabs.txt
  [1.3M]

   javadoc:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-javadoc-root.txt
  [32K]

   unit:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [232K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [428K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [12K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
  [40K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
  [20K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
  [128K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
  [104K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/447/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
  [104K]
   

Re: [DISCUSS] Add remote port information to HDFS audit log

2021-10-11 Thread tom lee
Thanks @Masatake Iwasaki for your suggestion. This is a good idea.



Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2021-10-11 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/

[Oct 9, 2021 6:00:43 PM] (noreply) HDFS-16265. Refactor HDFS tool tests for 
better reuse (#3536)




-1 overall


The following subsystems voted -1:
blanks pathlen spotbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

Failed junit tests :

   hadoop.hdfs.TestViewDistributedFileSystemContract 
   hadoop.hdfs.TestSnapshotCommands 
   hadoop.hdfs.TestDatanodeDeath 
   hadoop.hdfs.TestHDFSTrash 
   hadoop.hdfs.TestHDFSFileSystemContract 
   hadoop.hdfs.web.TestWebHdfsFileSystemContract 
   hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy 
   hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure 
   hadoop.yarn.csi.client.TestCsiClient 
   hadoop.tools.dynamometer.TestDynamometerInfra 
   hadoop.tools.dynamometer.TestDynamometerInfra 
  

   cc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/results-compile-cc-root.txt
 [96K]

   javac:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/results-compile-javac-root.txt
 [356K]

   blanks:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/blanks-eol.txt
 [13M]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/blanks-tabs.txt
 [2.0M]

   checkstyle:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/results-checkstyle-root.txt
 [14M]

   pathlen:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/results-pathlen.txt
 [16K]

   pylint:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/results-pylint.txt
 [20K]

   shellcheck:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/results-shellcheck.txt
 [28K]

   xml:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/xml.txt
 [24K]

   javadoc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/results-javadoc-javadoc-root.txt
 [408K]

   spotbugs:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/branch-spotbugs-root.txt
 [564K]

   unit:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 [816K]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 [176K]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 [104K]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-csi.txt
 [24K]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer_hadoop-dynamometer-infra.txt
 [12K]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/654/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt
 [24K]

Powered by Apache Yetus 0.14.0-SNAPSHOT   https://yetus.apache.org


Re: [DISCUSS] Add remote port information to HDFS audit log

2021-10-11 Thread tom lee
However, adding the port only changes the internal content of the existing IP
field, so it has little impact on the overall layout.

In our cluster, we parse the audit log through Vector and send the data to
Kafka, which is unaffected.
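
For illustration, a hypothetical audit line before and after the change (the
field values are made up; the exact field set depends on configuration):

    before: ... ugi=alice (auth:SIMPLE) ip=/10.0.0.1 cmd=getfileinfo src=/tmp dst=null perm=null proto=rpc
    after:  ... ugi=alice (auth:SIMPLE) ip=/10.0.0.1:45678 cmd=getfileinfo src=/tmp dst=null perm=null proto=rpc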



Re: [DISCUSS] Add remote port information to HDFS audit log

2021-10-11 Thread tom lee
Thanks, Ayush, for reminding me. I have similar concerns too, which is why I
started this discussion: to make the community aware of the matter and gather
suggestions.



Re: [DISCUSS] Add remote port information to HDFS audit log

2021-10-11 Thread Masatake Iwasaki

> I am not sure whether we can directly go and change this. Any changes to
> Audit Log format are considered incompatible.
>
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Audit_Log_Output


Adding a field for the caller context seemed to be accepted since it is an
optional feature, disabled by default.
https://github.com/apache/hadoop/blob/rel/release-3.3.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L8480-L8498

If we need to add fields, making them optional might be a reasonable approach.
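
A sketch of that opt-in pattern (hadoop.caller.context.enabled is the existing
switch for the caller-context field; the port key below is hypothetical):

    import org.apache.hadoop.conf.Configuration;

    public class AuditFieldToggles {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Existing opt-in switch for the caller-context audit field (off by default).
        boolean callerContext = conf.getBoolean("hadoop.caller.context.enabled", false);
        // A remote-port field could follow the same shape (key name is hypothetical).
        boolean logRemotePort = conf.getBoolean("dfs.namenode.audit.log.remote.port.enabled", false);
        System.out.println("callerContext=" + callerContext + " logRemotePort=" + logRemotePort);
      }
    }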

Masatake Iwasaki




Re: [DISCUSS] Add remote port information to HDFS audit log

2021-10-11 Thread Ayush Saxena
Hey
I am not sure whether we can directly go and change this. Any changes to Audit 
Log format are considered incompatible.

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Audit_Log_Output

-Ayush

> On 10-Oct-2021, at 7:57 PM, tom lee wrote:
> 
> Hi all,
> 
> In our production environment, we occasionally encounter a problem where a
> user submits an abnormal computation task, causing a sudden flood of
> requests, which causes the queueTime and processingTime of the Namenode to
> rise very high, causing a large backlog of tasks.
> 
> We usually locate and kill specific Spark, Flink, or MapReduce tasks based
> on metrics and audit logs. Currently, IP and UGI are recorded in audit
> logs, but there is no port information, so it is difficult to locate
> specific processes sometimes. Therefore, I propose that we add the port
> information to the audit log, so that we can easily track the upstream
> process.
> 
> Currently, some projects contain port information in audit logs, such as
> Hbase and Alluxio. I think it is also necessary to add port information for
> HDFS audit logs.
> 
> I submitted a PR(https://github.com/apache/hadoop/pull/3538), which has
> been tested in our test environment, and both RPC and HTTP are in effect. I
> look forward to your discussion on possible problems and suggestions for
> modification. I will actively update the PR.
> 
> Best Regards,
> Tom


[jira] [Resolved] (HADOOP-17957) Replace Guava VisibleForTesting by Hadoop's own annotation in hadoop-hdfs-project modules

2021-10-11 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HADOOP-17957.
---
Resolution: Fixed

> Replace Guava VisibleForTesting by Hadoop's own annotation in 
> hadoop-hdfs-project modules
> -
>
> Key: HADOOP-17957
> URL: https://issues.apache.org/jira/browse/HADOOP-17957
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>







ApacheCon@Home Big Data tracks recordings!

2021-10-11 Thread Wei-Chiu Chuang
For those who missed the live ApacheCon@Home Big Data tracks, the video
recordings are being uploaded to the official ASF channel!

Big Data:
https://www.youtube.com/playlist?list=PLU2OcwpQkYCzXcumE9UxNirLF1IYLmARj
Big Data Ozone:
https://www.youtube.com/playlist?list=PLU2OcwpQkYCxtPdZ0nSowYLQMgkmoczMl
Big Data SQL/NoSQL:
https://www.youtube.com/playlist?list=PLU2OcwpQkYCwu-bpf3K-OIfAjHpf4kr4L
Big Data Streaming:
https://www.youtube.com/playlist?list=PLU2OcwpQkYCwf7Cl6xsCgHuIa8_NWX2JG

You can find other topics as well:
https://www.youtube.com/c/TheApacheFoundation/playlists

Thanks to all who presented. I saw multiple talks related to Hadoop:

* YARN Resource Management and Dynamic Max by Fang Liu, Fengguang Tian,
  Prashant Golash, Hanxiong Zhang, Shuyi Zhang
* Uber HDFS Unit Storage Cost 10x Deduction by Jeffrey Zhong, Jing Zhao,
Leon
  Gao
* Scaling the Namenode - Lessons learnt by Dinesh Chitlangia
* How Uber achieved millions of savings by managing disk IO across HDFS
  cluster by Leon Gao, Ekanth Sethuramalingam
* Containing an Elephant: How we moved Hadoop/HBase into Kubernetes and
Public
  Cloud by Dhiraj Hegde


You can also find the recordings from ApacheCon Asia (August 2021); some of
our community members who presented include:

* Bigtop 3.0: Rerising community driven Hadoop distribution by Kengo Seki,
  Masatake Iwasaki.
* Technical tips for secure Apache Hadoop cluster by Akira Ajisaka, Kei
KORI.
* Data Lake accelerator on Hadoop-COS in Tencent Cloud by Li Cheng.

I may have missed a few great talks as I glanced through the list, so
please let me know if you find other relevant talks in other tracks.

Cheers,
Wei-Chiu