oversearch opened a new pull request #10220: URL: https://github.com/apache/pulsar/pull/10220
### Motivation The HandlerBase class has a shared_ptr to a boost ASIO deadline timer object. This deadline timer instance has a bare reference (internally) to a corresponding io_service object it takes as a parameter, but there's no shared ownership relationship between these objects. Boost expects io_service objects to outlive any users of them - it's more designed to be a stack allocated object setting at the top of a worker thread, passed down into any code that needs to use it, as opposed to this heap-allocated shared ownership scheme. This can still be problematic however, if a thread_local/global variable indirectly references it and outlives it (which is how I tripped over this), because deadline timers will reference the io_service in their destructor, leading to use-after-free undefined behavior. As can been seen by looking through the Pulsar C++ client code, all but one of the existing use cases carefully ensure that a shared reference to an ExecutorService, which owns the io_service instances, is kept above any deadline timer instances in the class order, so that they always outlive the timers. Unfortunately, this is very easy to overlook, as we see in the exceptional case here. ### Modifications This change applies a band-aid solution to the lifetime issue. The fix is simple: add an additional shared pointer reference to the executor service in the HandlerBase object above the timer object. Since HandlerBase is a parent of PublisherImpl, which also needs the same pointer, I removed the redundant one from PublisherImpl and let it use HandlerBase's instance. A good long term goal would be to change the ownership model of these ASIO objects entirely to better fit the expectations of the boost library, but I won't attempt that here. ### Verifying this change - [ ] Make sure that the change passes the CI checks. The problem here is undefined behavior, and is only explicitly exposed by building the code with Address Sanitizer enabled, or by using Valgrind. I couldn't think of an easy way to write a test for it. That said, I've run the existing test suite a few times and the code passes, and have verified that the ASAN build of my company's product now passes without errors where it crashed before. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
