[ https://issues.apache.org/jira/browse/HADOOP-19330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-19330:
------------------------------------
Description:
A recurring problem is that applications forget to close their input streams;
eventually the HTTP connection pool runs out.
Having the finalizer close streams during GC ensures that the HTTP connections
are returned to the pool after a GC. While this is an improvement on today, it
is insufficient:
* It only happens during GC, so it may not fix the problem entirely.
* It doesn't let developers know things are going wrong.
* It doesn't let us differentiate well between a stream leak and an overloaded FS.
Proposed enhancements:
* Collect a stack trace in the constructor.
* Log in finalize() at WARN, including path, thread and stack.
* Use a dedicated log for this, so it can be turned off in production (libraries
telling end users off for developer errors is simply an annoyance).
h2. Leak Reporting
* The log for leak reporting is {{org.apache.hadoop.fs.resource.leaks}}.
* An error message is reported at WARN, including the file name.
* A stack trace of where the stream was created is reported at INFO.
* A best-effort attempt is made to release any active HTTPS connection.
* The filesystem IOStatistic {{stream_leaks}} is incremented.
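The reporting steps listed above could look roughly like the following sketch. It is not the Hadoop LeakReporter code: the class and member names (LeakReporterSketch, TrackedStream, reportIfLeaked, STREAM_LEAKS, releaseConnection) are hypothetical, java.util.logging stands in for Hadoop's own logging, and a plain counter stands in for the stream_leaks IOStatistic. Only the logger name is taken from the description. Note that finalize() is deprecated in recent Java releases; it is used here only because the description refers to the finalizer.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch only; not the Hadoop implementation. */
final class LeakReporterSketch {

  /** Dedicated logger so leak reports can be silenced independently. */
  static final Logger LEAK_LOG =
      Logger.getLogger("org.apache.hadoop.fs.resource.leaks");

  /** Assumed stand-in for the filesystem's stream_leaks statistic. */
  static final AtomicLong STREAM_LEAKS = new AtomicLong();

  static final class TrackedStream implements AutoCloseable {
    private final String path;
    private final String threadName;
    private final Throwable creationSite;
    private final Runnable releaseConnection;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    TrackedStream(String path, Runnable releaseConnection) {
      this.path = path;
      this.releaseConnection = releaseConnection;
      // Capture the creating thread and stack in the constructor.
      this.threadName = Thread.currentThread().getName();
      this.creationSite = new Throwable("stream created here");
    }

    /** Normal close: release the connection exactly once. */
    @Override
    public void close() {
      if (closed.compareAndSet(false, true)) {
        releaseConnection.run();
      }
    }

    /** Reports a leak if close() was never called; returns true if reported. */
    boolean reportIfLeaked() {
      if (!closed.compareAndSet(false, true)) {
        return false; // already closed or already reported
      }
      LEAK_LOG.warning("Stream not closed: " + path
          + " (opened in thread " + threadName + ")");
      LEAK_LOG.log(Level.INFO, "Stream was created at", creationSite);
      try {
        releaseConnection.run(); // best effort only
      } catch (RuntimeException e) {
        LEAK_LOG.log(Level.FINE, "Failed to release connection", e);
      }
      STREAM_LEAKS.incrementAndGet();
      return true;
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
      reportIfLeaked();
    }
  }
}
```

Keeping the report logic in its own method, rather than only in finalize(), means it can be exercised without waiting for a GC.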
The intent is to make it easier to identify where streams
are being opened and not closed, as these consume resources,
often including HTTPS connections from the limited-size
connection pool.
Leak reporting MUST NOT be relied on as a way to clean up open
files/streams automatically; some of the normal actions of
the close() method are omitted.
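Since leak reporting is not a substitute for close(), callers still need to close streams themselves. A minimal sketch of doing that with try-with-resources follows. It uses only java.io so that it is self-contained; in a Hadoop application the stream would typically come from FileSystem.open(Path) instead. The names CloseStreamsSketch, CloseTrackingStream and readFirstByte are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of closing a stream reliably in application code. */
final class CloseStreamsSketch {

  /** In-memory stream that records whether close() was called. */
  static final class CloseTrackingStream extends ByteArrayInputStream {
    boolean closed;

    CloseTrackingStream(byte[] data) {
      super(data);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  /**
   * Reads the first byte of the stream. try-with-resources closes the
   * stream on every exit path, including when read() throws.
   */
  static int readFirstByte(InputStream source) throws IOException {
    try (InputStream in = source) {
      return in.read();
    }
  }

  public static void main(String[] args) throws IOException {
    CloseTrackingStream in =
        new CloseTrackingStream("data".getBytes(StandardCharsets.UTF_8));
    System.out.println("first byte: " + readFirstByte(in));
    System.out.println("closed: " + in.closed);
  }
}
```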
> S3A: Add LeakReporter; use in S3AInputStream
> --------------------------------------------
>
> Key: HADOOP-19330
> URL: https://issues.apache.org/jira/browse/HADOOP-19330
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0, 3.4.2