[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847810378 backporting to branch-3.3 if the tests run successfully. Merge has gone in and first test run is happy. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847700540 Yetus reports are a bit confused, but the output is good:
* checkstyles are mistaken/unavoidable
* tests are good

merging
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847222690 not sure what is up with yetus there. Submitted again, with some updated docs
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847159189 rebased to trunk again after the AWS region patch from mehakmeet
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847148990 Somehow the header test had failed on the principal. Changes:
* how the principal is added has changed
* fixed up the referrer entry, which was no longer a valid URI after Friday's change adding a hadoop/1 prefix

This wasn't just a yetus failure; I replicated it locally. How did my tests pass? They didn't, but I hadn't noticed because the failsafe test run was happening anyway, and the unit test failure was in a scrolled-off part of the test run I wasn't looking at. That's not good: I've always expected maven to fail as soon as unit tests do. Will investigate separately
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847125985 legit test regression. The code to determine the principal is returning null
```
[ERROR] testHeaderComplexPaths(org.apache.hadoop.fs.s3a.audit.TestHttpReferrerAuditHeader) Time elapsed: 0.006 s <<< FAILURE!
org.junit.ComparisonFailure: [pr] expected:<"jenkins"> but was:
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at org.apache.hadoop.fs.s3a.audit.AbstractAuditingTest.assertMapContains(AbstractAuditingTest.java:210)
    at org.apache.hadoop.fs.s3a.audit.TestHttpReferrerAuditHeader.testHeaderComplexPaths(TestHttpReferrerAuditHeader.java:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
```
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-847009025 test run all good, getting a bit slow (tombstones?)
```
[INFO]
[WARNING] Tests run: 151, Failures: 0, Errors: 0, Skipped: 17
[INFO]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 37:52 min (Wall Clock)
[INFO] Finished at: 2021-05-24T12:48:00+01:00
[INFO]
```
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-846551528 thanks for the reviews, comments, votes etc. I'll address all of @mehakmeet's little details, push up a rebased/squashed PR to force it through yetus, then merge
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-842340255 +gist showing some log output during a terasort test: https://gist.github.com/steveloughran/8e0aadb51c63f1c3538deda19ee952ae

Some of the events (e.g. 183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8) have the job ID in the referrer header: "ji=job_1620911577786_0006". This is only set during the FS operations the S3A committer performs during task and job, as they're the only ones we know are explicitly related to a job. If we were confident that whichever thread called `Committer.setupTask()` was the only thread making FileSystem API calls for that task, then we could set it at the task level.

The `org.apache.hadoop.fs.audit.CommonAuditContext` class provides global and thread-local context maps to let apps attach such attributes; the new ManifestCommitter will be setting them so that once ABFS picks up the same auditing, the context info will come down. Modified versions of Hive, Spark etc. could use this API to set any of their context info when a specific thread is scheduled to work for a given query; the hadoop committer isn't the right place to try to guess this
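To illustrate the idea of global plus thread-local context maps described above, here is a minimal standalone sketch. It is not the `CommonAuditContext` implementation or its API: the class `SketchAuditContext`, its method names, and the `app` key are all hypothetical, chosen only to show why a per-thread attribute such as `ji` is visible to the thread that set it and not to others.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch only: a process-wide map plus a per-thread map.
 * This does not reproduce the real Hadoop class or its method names.
 */
final class SketchAuditContext {

  /** Entries shared by every thread in the process. */
  private static final Map<String, String> GLOBAL = new ConcurrentHashMap<>();

  /** Entries private to the thread that set them. */
  private static final ThreadLocal<Map<String, String>> LOCAL =
      ThreadLocal.withInitial(HashMap::new);

  private SketchAuditContext() {
  }

  /** Set an attribute visible to all threads. */
  static void setGlobal(String key, String value) {
    GLOBAL.put(key, value);
  }

  /** Set an attribute visible only to the calling thread. */
  static void put(String key, String value) {
    LOCAL.get().put(key, value);
  }

  /** Remove a thread-local attribute, e.g. when the task finishes. */
  static void remove(String key) {
    LOCAL.get().remove(key);
  }

  /** Global entries overlaid with the calling thread's entries. */
  static Map<String, String> snapshot() {
    Map<String, String> merged = new HashMap<>(GLOBAL);
    merged.putAll(LOCAL.get());
    return merged;
  }

  public static void main(String[] args) throws InterruptedException {
    setGlobal("app", "terasort");           // hypothetical global key
    put("ji", "job_1620911577786_0006");    // job ID on this thread only
    System.out.println("committer thread: " + snapshot());

    // A different thread sees the global entry but not the job ID.
    Thread other = new Thread(
        () -> System.out.println("other thread: " + snapshot()));
    other.start();
    other.join();
  }
}
```

The second thread's snapshot lacks `ji`, which is the limitation discussed above: unless the thread doing the FileSystem calls is the one that set the attribute, the job ID does not reach the audit log.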
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-841228793 BTW, terasort tests show that the committers are passing in job IDs in MR jobs
```
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [13/May/2021:13:16:48 +] 109.157.171.170 arn:aws:iam::152813717728:user/stevel-dev R4THYV1DGS4DASS2 REST.GET.BUCKET - "GET /?list-type=2&max-keys=5000&prefix=terasort-magic%2Fsortin%2F__magic%2Fjob-job_1620911577786_0004%2Ftasks%2Fattempt_1620911577786_0004_m_01_0%2F&fetch-owner=false HTTP/1.1" 200 - 982 - 13 12 "https://audit.example.org/op_delete/7e459c19-f1fe-4713-9788-35d77206f9cc-0012/?op=op_delete&p1=s3a://stevel-london/terasort-magic/sortin/__magic/job-job_1620911577786_0004/tasks/attempt_1620911577786_0004_m_01_0&pr=stevel&ps=9866ff56-a50d-4744-8f38-7b9d29942e95&id=7e459c19-f1fe-4713-9788-35d77206f9cc-0012&t0=1&fs=7e459c19-f1fe-4713-9788-35d77206f9cc&t1=44&ji=job_1620911577786_0004&ts=1620911808104"; "Hadoop 3.4.0-SNAPSHOT, aws-sdk-java/1.11.901 Mac_OS_X/10.16 OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/AdoptOpenJDK" - AW4uMFNGBVw+RZyNWphOgrVA27e1wQx7Fkg2/3+yGf4p R2lRvEac4NA3UXAEqhSEPs3J8bBG0r0= SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader stevel-london.s3.eu-west-2.amazonaws.com TLSv1.2
```
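The referrer field in the log entry above is a URL whose query string carries the audit attributes, so the job ID can be pulled back out with nothing more than the JDK. The following is a rough sketch of that, assuming the referrer has already been extracted from the log line; the class name `AuditReferrerParser` is made up for illustration and is not part of the PR. Of the keys, only `ji` (job ID) is explained in this thread, and `pr` appears to be the principal judging by the test failure reported elsewhere in the PR; the meaning of the other keys is not assumed here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits an audit referrer URL into its attributes. */
final class AuditReferrerParser {

  private AuditReferrerParser() {
  }

  /** Return the decoded key/value pairs of the referrer's query string. */
  static Map<String, String> parse(String referrer) {
    Map<String, String> attrs = new LinkedHashMap<>();
    String query = URI.create(referrer).getRawQuery();
    if (query == null) {
      return attrs;
    }
    for (String pair : query.split("&")) {
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        continue; // skip entries with no key or no '='
      }
      attrs.put(decode(pair.substring(0, eq)),
          decode(pair.substring(eq + 1)));
    }
    return attrs;
  }

  /** Percent-decode using the Java 8 compatible overload. */
  private static String decode(String s) {
    try {
      return URLDecoder.decode(s, StandardCharsets.UTF_8.name());
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  public static void main(String[] args) {
    // Shortened form of the referrer in the log entry above.
    String referrer = "https://audit.example.org/op_delete/"
        + "7e459c19-f1fe-4713-9788-35d77206f9cc-0012/"
        + "?op=op_delete&pr=stevel&ji=job_1620911577786_0004"
        + "&ts=1620911808104";
    Map<String, String> attrs = parse(referrer);
    System.out.println("job ID = " + attrs.get("ji"));
  }
}
```

Because `URI.create` rejects strings that are not valid URIs, a helper like this would also have thrown on the malformed referrer described in the later "no longer a valid URI" comment, which makes it a cheap sanity check in a test.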
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-841225106 I don't get why patch doesn't work. Going to squash the patches, rebase to trunk, retry
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-811277712 I'm going to do a squash of the PR and push up, as yetus has completely given up trying to build this
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-808185819 I'm going to say the failures are related, as they're in the auditor code. Interesting that you saw them and I didn't. Will look at it next week
[GitHub] [hadoop] steveloughran commented on pull request #2807: HADOOP-17511. Add audit/telemetry logging to S3A connector
steveloughran commented on pull request #2807: URL: https://github.com/apache/hadoop/pull/2807#issuecomment-805151587 it's not merging, and I've over-squashed things into the AWS metrics patch. Will need to unroll it