[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-25 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847810378


   Backporting to branch-3.3 if the tests run successfully. The merge has gone
in and the first test run is happy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-25 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847700540


   The Yetus reports are a bit confused, but the output is good:
   * checkstyle warnings are mistaken or unavoidable
   * tests are good

   Merging.





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-24 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847222690


   Not sure what is up with Yetus there. Submitted again, with some updated docs.





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-24 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847159189


   Rebased to trunk again after the AWS region patch from mehakmeet.





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-24 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847148990


   Somehow the header test had failed on the principal. Changes:
   * how the principal is added has changed
   * fixed up the referrer entry, which, after Friday's change to add a
     hadoop/1 prefix, was no longer a valid URI.
   
   This wasn't just a Yetus failure; I replicated it locally. How did my tests 
pass? They didn't, but I hadn't noticed because the failsafe test run was 
happening anyway, and the unit test failure was in a scrolled-off part of the 
test run I wasn't looking at.
   
   That's not good: I've always expected Maven to fail as soon as unit tests 
do. Will investigate separately.





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-24 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847125985


   Legit test regression: the code to determine the principal is returning null.
   ```
   [ERROR] testHeaderComplexPaths(org.apache.hadoop.fs.s3a.audit.TestHttpReferrerAuditHeader)  Time elapsed: 0.006 s  <<< FAILURE!
   org.junit.ComparisonFailure: [pr] expected:<"jenkins"> but was:
       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
       at org.apache.hadoop.fs.s3a.audit.AbstractAuditingTest.assertMapContains(AbstractAuditingTest.java:210)
       at org.apache.hadoop.fs.s3a.audit.TestHttpReferrerAuditHeader.testHeaderComplexPaths(TestHttpReferrerAuditHeader.java:135)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
       at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
       at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.lang.Thread.run(Thread.java:748)
   ```





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-24 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847009025


   Test run all good, though getting a bit slow (tombstones?).
   
   ```
   [INFO]
   [WARNING] Tests run: 151, Failures: 0, Errors: 0, Skipped: 17
   [INFO]
   [INFO] 

   [INFO] BUILD SUCCESS
   [INFO] 

   [INFO] Total time:  37:52 min (Wall Clock)
   [INFO] Finished at: 2021-05-24T12:48:00+01:00
   [INFO] 

   ```





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-23 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-846551528


   Thanks for the reviews, comments, votes etc.
   I'll address all of @mehakmeet's little details, push up a rebased/squashed 
PR to force it through Yetus, then merge.





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-17 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-842340255


   +gist showing some log output during a terasort test:
   https://gist.github.com/steveloughran/8e0aadb51c63f1c3538deda19ee952ae
   
   Some of the events (e.g. 
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8) have the job 
ID in the referrer header: "ji=job_1620911577786_0006". This is only set during 
the FS operations the S3A committer performs during task and job, as they're 
the only ones we know are explicitly related to a job. If we were confident 
that whichever thread called `Committer.setupTask()` was the only thread making 
FileSystem API calls for that task, then we could set it at the task level.
   
   The `org.apache.hadoop.fs.audit.CommonAuditContext` class provides global 
and thread-local context maps to let apps attach such attributes; the new 
ManifestCommitter will be setting them so that once ABFS picks up the same 
auditing, the context info will come down.
   
   Modified versions of Hive, Spark etc. could use this API to set any of their 
context info when a specific thread is scheduled to work for a given query; 
the Hadoop committer isn't the right place to try to guess it.
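
The global plus thread-local idea can be illustrated with a minimal Python sketch. This is not the Hadoop `CommonAuditContext` API: the `AuditContext` class, its method names, and the `app` key are hypothetical, and only the `ji` key is taken from the referrer header quoted above.

```python
import threading


class AuditContext:
    """Hypothetical stand-in for a global + thread-local context store."""

    _global = {}                      # entries shared by every thread
    _global_lock = threading.Lock()
    _local = threading.local()        # per-thread entries live here

    @classmethod
    def set_global(cls, key, value):
        with cls._global_lock:
            cls._global[key] = value

    @classmethod
    def _entries(cls):
        # Lazily create the map for the calling thread.
        if not hasattr(cls._local, "entries"):
            cls._local.entries = {}
        return cls._local.entries

    @classmethod
    def put(cls, key, value):
        cls._entries()[key] = value

    @classmethod
    def snapshot(cls):
        # Thread-local entries override global ones with the same key.
        with cls._global_lock:
            merged = dict(cls._global)
        merged.update(cls._entries())
        return merged


def worker(job_id, out):
    # An engine would do this when a thread is scheduled for a query/job.
    AuditContext.put("ji", job_id)
    out[job_id] = AuditContext.snapshot()


AuditContext.set_global("app", "example-engine")
results = {}
threads = [threading.Thread(target=worker, args=(job, results))
           for job in ("job_A", "job_B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker sees its own `ji` value plus the shared `app` entry, while the main thread, which never called `put`, sees only the global entry. That isolation is what would let a query engine tag the thread it schedules without guessing from inside the committer.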
   
   
   





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-14 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-841228793


   BTW, terasort tests show that the committers are passing in job IDs in MR jobs:
   ```
   183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 
stevel-london [13/May/2021:13:16:48 +] 109.157.171.170 
arn:aws:iam::152813717728:user/stevel-dev R4THYV1DGS4DASS2 REST.GET.BUCKET - 
"GET 
/?list-type=2&max-keys=5000&prefix=terasort-magic%2Fsortin%2F__magic%2Fjob-job_1620911577786_0004%2Ftasks%2Fattempt_1620911577786_0004_m_01_0%2F&fetch-owner=false
 HTTP/1.1" 200 - 982 - 13 12 
"https://audit.example.org/op_delete/7e459c19-f1fe-4713-9788-35d77206f9cc-0012/?op=op_delete&p1=s3a://stevel-london/terasort-magic/sortin/__magic/job-job_1620911577786_0004/tasks/attempt_1620911577786_0004_m_01_0&pr=stevel&ps=9866ff56-a50d-4744-8f38-7b9d29942e95&id=7e459c19-f1fe-4713-9788-35d77206f9cc-0012&t0=1&fs=7e459c19-f1fe-4713-9788-35d77206f9cc&t1=44&ji=job_1620911577786_0004&ts=1620911808104";
 "Hadoop 3.4.0-SNAPSHOT, aws-sdk-java/1.11.901 Mac_OS_X/10.16 
OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/AdoptOpenJDK" - 
AW4uMFNGBVw+RZyNWphOgrVA27e1wQx7Fkg2/3+yGf4p
 R2lRvEac4NA3UXAEqhSEPs3J8bBG0r0= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader 
stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
   ```
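
For anyone analysing these logs, the audit fields can be recovered from the referrer field with a standard URL parser. A minimal Python sketch, assuming the referrer has already been extracted from the log line; the `audit_fields` helper is hypothetical, and the URL is a shortened copy of the one in the log entry above (the `p1` and `ps` parameters are dropped for brevity).

```python
from urllib.parse import parse_qs, urlsplit


def audit_fields(referrer: str) -> dict:
    """Return the query parameters of an audit referrer URL as a flat dict."""
    query = urlsplit(referrer).query
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Shortened copy of the referrer from the log entry above.
referrer = (
    "https://audit.example.org/op_delete/"
    "7e459c19-f1fe-4713-9788-35d77206f9cc-0012/"
    "?op=op_delete&pr=stevel"
    "&id=7e459c19-f1fe-4713-9788-35d77206f9cc-0012"
    "&t0=1&fs=7e459c19-f1fe-4713-9788-35d77206f9cc"
    "&t1=44&ji=job_1620911577786_0004&ts=1620911808104"
)

fields = audit_fields(referrer)
print(fields["ji"])  # job_1620911577786_0004
```

Entries without a `ji` key are simply requests made outside the committer's task and job operations, so a lookup with `fields.get("ji")` is the safer form when scanning a whole log.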





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-05-14 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-841225106


   I don't get why the patch doesn't work. Going to squash the patches, rebase 
to trunk, and retry.





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-03-31 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-811277712


   I'm going to do a squash of the PR and push it up, as Yetus has completely 
given up trying to build this.





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-03-26 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-808185819


   I'm going to say the failures are related, as they're in the auditor code. 
Interesting that you saw them and I didn't. Will look at it next week.





[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector

2021-03-23 Thread GitBox


steveloughran commented on pull request #2807:
URL: https://github.com/apache/hadoop/pull/2807#issuecomment-805151587


   It's not merging and I've over-squashed things into the AWS metrics patch. 
Will need to unroll it.

