[ 
https://issues.apache.org/jira/browse/HADOOP-19330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19330:
------------------------------------
    Description: 
A recurring problem is that applications forget to close their input streams; 
eventually the HTTP connection pool runs out.

Having the finalizer close streams during GC will ensure that after a GC the 
HTTP connections are returned. While this is an improvement on today, it is 
insufficient:
* it only happens during GC, so may not fix the problem entirely
* it doesn't let developers know things are going wrong
* it doesn't let us differentiate well between a stream leak and an overloaded FS

Proposed enhancements:
* collect a stack trace in the constructor
* log in finalize() at WARN, including path, thread and stack
* use a dedicated log for this, so it can be turned off in production (libraries 
telling end users off for developer errors is simply an annoyance)

h2. Leak Reporting

* The log for leak reporting is {{org.apache.hadoop.fs.resource.leaks}}.
* An error message is reported at WARN, including the file name.
* A stack trace of where the stream was created is reported
  at INFO.
* A best-effort attempt is made to release any active HTTPS
  connection.
* The filesystem IOStatistic stream_leaks is incremented.
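
The reporting steps above could be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the actual Hadoop LeakReporter: the class name, the releaseAction callback, the AtomicLong standing in for the stream_leaks IOStatistic, and the use of java.util.logging are all assumptions made to keep the sketch self-contained. Only the logger name and the WARN/INFO split come from the description.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of a leak reporter; names are illustrative only. */
final class LeakReporterSketch implements AutoCloseable {

  /** Logger name from the issue; java.util.logging is an assumption. */
  static final Logger LEAK_LOG =
      Logger.getLogger("org.apache.hadoop.fs.resource.leaks");

  /** Stand-in for the filesystem's stream_leaks statistic. */
  static final AtomicLong STREAM_LEAKS = new AtomicLong();

  private final String path;
  private final String threadName;
  private final Throwable createdAt;
  private final Runnable releaseAction;
  private final AtomicBoolean closed = new AtomicBoolean(false);

  LeakReporterSketch(String path, Runnable releaseAction) {
    this.path = path;
    this.threadName = Thread.currentThread().getName();
    // Capture the creation stack now so it can be logged if the stream leaks.
    this.createdAt = new Throwable("stream created here");
    this.releaseAction = releaseAction;
  }

  /** Normal close path: marks the stream closed and releases the connection. */
  @Override
  public void close() {
    if (closed.compareAndSet(false, true)) {
      releaseAction.run();
    }
  }

  /**
   * Reports a leak if close() was never called.
   * @return true if a leak was reported.
   */
  boolean reportIfLeaked() {
    if (!closed.compareAndSet(false, true)) {
      return false; // already closed or already reported
    }
    LEAK_LOG.log(Level.WARNING,
        "Stream not closed: " + path + " (opened in thread " + threadName + ")");
    LEAK_LOG.log(Level.INFO, "Stack trace of stream creation", createdAt);
    STREAM_LEAKS.incrementAndGet();
    try {
      releaseAction.run(); // best-effort release only
    } catch (RuntimeException e) {
      LEAK_LOG.log(Level.FINE, "Failed to release connection", e);
    }
    return true;
  }

  /** Finalizers are deprecated since Java 9; shown because the issue uses one. */
  @Override
  @SuppressWarnings("deprecation")
  protected void finalize() {
    reportIfLeaked();
  }
}
```

Because the report goes to a dedicated logger, raising that logger's level above WARN in a deployment's logging configuration would silence it without affecting other filesystem logs.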

The intent is to make it easier to identify where streams
are being opened and not closed, as these consume resources,
often including HTTPS connections from a connection pool
of limited size.

It MUST NOT be relied on as a way to clean up open
files/streams automatically; some of the normal actions of
the close() method are omitted.
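
For contrast, the pattern applications are expected to follow is an explicit close, most simply via try-with-resources. The sketch below uses a plain java.io.InputStream and a hypothetical helper class so that it is self-contained; with a Hadoop filesystem the same shape would apply to the stream returned by FileSystem.open().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper showing the explicit-close pattern. */
final class CloseExample {

  /**
   * Reads the first byte of the stream, or -1 at end of stream.
   * try-with-resources closes the stream on both the normal and
   * the exceptional path, so nothing is left for a finalizer to report.
   */
  static int readFirstByte(InputStream source) {
    try (InputStream in = source) {
      return in.read();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```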



> S3A: Add LeakReporter; use in S3AInputStream
> --------------------------------------------
>
>                 Key: HADOOP-19330
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19330
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0, 3.4.2
>
>


