steveloughran commented on a change in pull request #3930: URL: https://github.com/apache/hadoop/pull/3930#discussion_r801755256
########## File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing_architecture.md ########## @@ -119,16 +119,81 @@ The auditor then creates and returns a span for the specific operation. The AuditManagerS3A will automatically activate the span returned by the auditor (i.e. assign it the thread local variable tracking the active span in each thread). -### Memory Leakage through `ThreadLocal` use +### Memory Leakage through `ThreadLocal` misuse -This architecture contains a critical defect, +The original implementation of the integration with the S3AFileSystem class +contained a critical defect, [HADOOP-18091](https://issues.apache.org/jira/browse/HADOOP-18091) _S3A auditing leaks memory through ThreadLocal references_. -The code was written assuming that when the `ActiveAuditManagerS3A` service is -stopped, it's `ThreadLocal` fields would be freed. -In fact, they are retained until the threads with references are terminated. +The original code was written with the assumption that when the `ActiveAuditManagerS3A` service was +garbage collected, references in its `ThreadLocal` field would be freed. +In fact, they are retained until all threads with references are terminated. +If any long-lived thread had performed an s3 operation which created a span, +a reference back to the audit manager instance was created +*whose lifetime was that of the thread* + +In short-lived processes, and long-lived processes where a limited set of +`S3AFileSystem` instances were reused, this had no adverse effect. +Indeed, if the filesystem instances were retained in the cache until +the process was shut down, there would be strong references to the owning +`S3AFileSystem` instance anyway. + +Where it did have problems was when the following conditions were met +1. Process was long-lived. +2. Long-lived threads in the process invoked filesystem operations on `s3a://` URLs. +3. Either `S3AFileSystem` instances were created repeatedly, rather than retrieved + from the cache of active instances. +4. Or, after a query for a specific user was completed, + `FileSystem.closeAllForUGI(UserGroupInformation)` was invoked to remove all + cached FS instances of that user. + +Conditions 1, 2 and 4 are exactly those which long-lived Hive services can +encounter. + +_Auditing was disabled by default until a fix was implemented._ + +The memory leak has been fixed using a new class `org.apache.hadoop.util.WeakReferenceMap` +to store a map of thread IDs to active spans. When the S3A fileystem is closed, +its audit manager service is stopped and all references to spans removed from the +map of thread ID to span. + +Weak References are used for the map so that span references do not consume memory even if +threads are terminated without resetting the span reference of that thread. +There is therefore a theoretical risk that if a garbage collection takes place during +execution of a spanned operation, the reference will be lost. + +This is not considered an issue as all bounded entry points into the S3A filesystem +retain a strong reference to their audit span. + +All entry points which return an object which can invoke s3 operations (input and output +streams, list iterators, etc.) also retain a strong reference to their span, a reference +they activate before performing S3 operations. +A factory method is provided to demand-generate a new span if, somehow, these conditions +are not met. The "unbounded span" is used here. +Except in deployments where `fs.s3a.audit.reject.out.of.span.operations` is true, +invoking S3 operations within the unbounded span are permitted. +That option is set to `true` within S3A test suites. +Therefore it is unlikely that any operations are invoked in unbounded spans except +for the special case of copy operations invoked by the transfer manager threads. +Those are already ignored in the logging auditor, whose unbounded span ignores +requests which `AWSRequestAnalyzer.isRequestNotAlwaysInSpan()` indicates +may happen outside of a span. +This is restricted to bucket location probes possibly performed by the SDK +on instantiatio, and copy part/complete calls. + + +```java + public static boolean + isRequestNotAlwaysInSpan(final Object request) { + return request instanceof CopyPartRequest + || request instanceof CompleteMultipartUploadRequest + || request instanceof GetBucketLocationRequest; + } +``` + +Auditing is still disabled by default for consistency with previous releases. Review comment: good point -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org