Hi Carter, I don't have a super strong feeling for this either. I just would like us to try and reuse a library before reinventing the wheel, but my time is limited, like everyone else.
Gary On Mon, Aug 3, 2020, 13:19 Carter Kozak <cko...@ckozak.net> wrote: > Hi Gary, > > That sounds reasonable, however there are some subtle distinctions > between general purpose object pools and those used for http > connections, perhaps the most obvious of which is http/2 multiplexed > connections. It's certainly possible that it could work, but I imagine it > could become a large project and may result in something very > specific to http clients. Pools used by blocking and asynchronous > clients may also differ substantially, so I'm not sure how general this > implementation could be. > > I'd be happy to help in any way I can if folks more familiar than myself > with the current hc pool implementations are supportive of your proposal, > but I'm worried that it may become a blocker that prevents incremental > improvements. > > In conclusion, I don't feel comfortable taking a stance on this and I trust > the team to make the right decision. > > -ck > > On Fri, Jul 31, 2020, at 17:38, Gary Gregory wrote: > > What I would prefer we do is not reinvent yet another pooling feature, > all > > its potential bugs and added maintenance load when that wheel's been > > invented over and over. For me personally, I would prefer that this > project > > adopt a FOSS pooling library and improve _it_ if that were needed, > thereby > > improving the whole ecosystem. > > > > Gary > > > > On Fri, Jul 31, 2020, 16:22 Carter Kozak <cko...@ckozak.net> wrote: > > > > > On Fri, Jul 31, 2020, at 14:31, Gary Gregory wrote: > > > > On Fri, Jul 31, 2020 at 12:45 PM Oleg Kalnichevski <ol...@apache.org > > > > > wrote: > > > > > > > > > On Thu, 2020-07-30 at 13:28 -0400, Carter Kozak wrote: > > > > > > Hello Friends, > > > > > > > > > > > > I recently had a service crash due to an out of memory error as > the > > > > > > result of a response leak. While this is caused by code that uses > > > > > > closeable resources incorrectly, and that code should be fixed, > I'd > > > > > > like to discuss some ideas which could avoid catastrophic > failure in > > > > > > this scenario. > > > > > > > > > > > > A little background on my environment: I maintain an RPC library > > > > > > which wraps an hc5 client. There's no limit set on the hc5 > connection > > > > > > pool (Integer.MAX_VALUE), and resource use is bounded using a > > > > > > variation of additive-increase/multiplicative-decrease limiters > to > > > > > > scale based on load. Some endpoints are capable of returning a > raw > > > > > > binary stream, this introduces some risk of resource leaks -- > ideally > > > > > > we would take an inputstream consumer to fully own the > lifecycle, but > > > > > > unfortunately that's not an API change we can make at this > point. We > > > > > > return permits before passing the resulting inputstream to custom > > > > > > code to prevent leaks from blocking the entire service. > > > > > > > > > > > > Unfortunately leased connections are anchored directly to both > > > > > > StrictConnPool and LaxConnPool in order to validate that > connections > > > > > > are returned to the pool where they originated, which prevents a > > > > > > relatively large swath of resources from being garbage collected > in > > > > > > the event of a leak. I realize my case is a bit different due to > the > > > > > > effectively unlimited pool size, where memory becomes the > limiting > > > > > > factor rather than requests blocking against the pool limit, but > I > > > > > > believe there are a few options that can help us to both detect > and > > > > > > endure leaks. > > > > > > > > > > > > A couple potential solutions: > > > > > > > > > > > > Use weak references to track leased connections with a periodic > sweep > > > > > > to detect leaks and reclaim per-route permits. This approach > would be > > > > > > complex, but allows the default configuration to detect and > handle > > > > > > rare response leaks without eventually blocking connection > requests. > > > > > > Using this approach, a leaked connection wouldn't necessarily be > > > > > > closed when the pool is shut down, but may survive until a full > GC. > > > > > > In most cases I expect clients to be long-lived, and I'm not sure > > > > > > what guarantees are made about leaked connections. > > > > > > > > > > > > Use a counter to track leased connections instead of tracking > leased > > > > > > entries directly. This loses the safety guarantees that > connections > > > > > > aren't already leased, and that they're being returned to the > pool > > > > > > where they originated, however it eliminates the need for the > > > > > > connection pools hash table and references to checked-out values. > > > > > > This solution would not be helpful when connections are leaked > and > > > > > > the route limit is reached, so it may make more sense as a > > > > > > PoolConcurrencyPolicy.UNLIMITED option. This design would not be > able > > > > > > to close leased connections when the pool is closed, they would > > > > > > remain open until they are released, when the closed pool could > close > > > > > > the incoming connection. > > > > > > > > > > > > Have you encountered similar problems? If you would consider a > > > > > > contribution with one of these designs or another I have not yet > > > > > > considered, I will open a ticket and plan to implement a > solution. > > > > > > Any and all feedback is greatly appreciated. > > > > > > > > > > > > Thanks, > > > > > > Carter Kozak > > > > > > > > > > > > > > > > Hi Carter > > > > > > > > > > Automatic recovery of leaked connections has been actively > discussed in > > > > > HC 3.x and early HC 4.x days without ever producing any practical > > > > > outcome. It is _massively_ easier to just release connections then > jump > > > > > through many hoops to try and reclaim what might or might not be > some > > > > > leaked connections. > > > > > > > > > > You are very welcome to give it a shot but I would kindly ask you > to > > > > > start it off as a completely new connection manager initially. Once > > > > > there is a working solution we can decide how to fold it into the > > > > > project and in what form. > > > > > > > > > > > > > This might be made easier by using Apache Commons Pool which can > track > > > > borrowed but abandoned objects: > > > > > > > > The pool can also be configured to detect and remove "abandoned" > objects, > > > > i.e. objects that have been checked out of the pool but neither used > nor > > > > returned before the configured removeAbandonedTimeout > > > > < > > > > https://commons.apache.org/proper/commons-pool/apidocs/org/apache/commons/pool2/impl/AbandonedConfig.html#getRemoveAbandonedTimeout-- > > > >. > > > > Abandoned object removal can be configured to happen when > borrowObject is > > > > invoked and the pool is close to starvation, or it can be executed > by the > > > > idle object evictor, or both. If pooled objects implement the > TrackedUse > > > > < > > > > https://commons.apache.org/proper/commons-pool/apidocs/org/apache/commons/pool2/TrackedUse.html > > > > > > > > interface, > > > > their last use will be queried using the getLastUsed method on that > > > > interface; otherwise abandonment is determined by how long an object > has > > > > been checked out from the pool. > > > > > > > > See > > > > > > > > https://commons.apache.org/proper/commons-pool/apidocs/index.html?org/apache/commons/pool2/impl/GenericObjectPool.html > > > > > > > > Gary > > > > > > > > Cheers > > > > > > > > > > Oleg > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org > > > > > For additional commands, e-mail: dev-h...@hc.apache.org > > > > > > > > > > > > > > > > > > > > Thank you both, > > > > > > Oleg, I agree that it's much easier to close resources than it is to > write > > > a pool that is capable of recovery, and we shouldn't take on undue > > > complexity to force a solution to work. That said, the surface area of > code > > > that has the potential to leak (code that uses HC) is orders of > magnitude > > > larger than the connection pool, and a fix could reduce JIRA load on > the > > > project. I will plan to implement an initial experimental version as an > > > alternative to PoolingHttpClientConnectionManager so we can avoid > impacting > > > users, and I can validate it in a high-load production environment > before > > > it considered for a release. > > > > > > Gary, thanks for pointing out the abandonment feature in commons-pool! > I > > > worry about detection based on a timeout because it's difficult to be > > > certain an object is abandoned. While multi-hour requests waiting for > > > response headers aren't a good idea, I've run into a few places that > depend > > > on them and wouldn't want to break users. > > > > > > Carter > > > > > >