[ https://issues.apache.org/jira/browse/HDFS-12427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Clampffer updated HDFS-12427:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed the 001 patch that [~mtracy] reviewed to HDFS-8707. The later patches were just there to poke the CI system. I'm still trying to sort out the CI issues with HDFS-12640, so prior to committing I checked docker runs locally, ran stress tests on a few large clusters that were hitting the issue, and ran valgrind tests. Everything looked good.

> libhdfs++: Prevent Requests from holding dangling pointer to RpcEngine
> ----------------------------------------------------------------------
>
>                 Key: HDFS-12427
>                 URL: https://issues.apache.org/jira/browse/HDFS-12427
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>            Priority: Critical
>         Attachments: HDFS-12427.HDFS-8707.000.patch,
> HDFS-12427.HDFS-8707.001.patch, HDFS-12427.HDFS-8707.002.patch,
> HDFS-12427.HDFS-8707.003.patch, HDFS-12427.HDFS-8707.004.patch
>
>
> The lifetime of Request objects is tied to the worker thread(s) in the async
> event loop. In the current code, nothing prevents a request from outliving
> the RpcEngine (bound to FileSystem) while it's waiting for IO. If the Request,
> or a task that makes a new request, outlives the RpcEngine, it attempts to
> dereference a dangling pointer and either crashes or continues to run with
> bad data.
> Proposed fix is to reference count the RpcEngine via shared_ptr so that
> Requests can hold a weak_ptr to it. When a Request, or an RpcConnection
> attempting to make a request, needs something from the RpcEngine, such as a
> call id number, it can promote the weak_ptr to a shared_ptr. If it's unable
> to promote because the RpcEngine has been destroyed, the Request's handler
> can be invoked with an appropriate error message. A weak_ptr must be used
> rather than a shared_ptr to avoid reference cycles.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)