westonpace commented on pull request #10968: URL: https://github.com/apache/arrow/pull/10968#issuecomment-908913026
> If I understand correctly, nothing here actually keeps the tasks alive. What this relies upon is that the user stored the shared_ptr or unique_ptr returned by Nursery::MakeXXXCloseable somewhere so that lifetimes are handled correctly?

There are two concerns here: 1) keeping the object alive (the nursery does not do this, but the smart pointers do) and 2) not returning until all tasks have finished (the nursery does this because it blocks until every task has finished).

> Is there a risk that these pointers may be kept alive too long (and delay the DoClose calls accordingly)?

Yes, the nursery could very much trigger deadlock if a task never finishes. I'll come back to this below.

> What is the recommended strategy for using this facility? Did you try to use this in the codebase to check that it actually reduces the burden of managing the sequencing of destructor calls?

See https://github.com/apache/arrow/pull/11017 @pitrou

---

I also received some negative feedback offline from @bkietz, so I tried today to rewrite this in another light, using only the asynchronous smart pointers and an asynchronous task group. The result is [here](https://github.com/apache/arrow/compare/master...westonpace:experiment/async-smart-ptr?expand=1). However, it still doesn't quite solve the problem. So let me state the problem I am trying to solve in clear terms. I am open to any solution someone can come up with, but at the moment this PR is still my preferred solution.

# Problem Statement

The use case here is from the perspective of a developer writing some code who wants to ensure that the code they write is used safely. For example, let's consider the case of writing a file writer queue. A synchronous declaration might look something like this...

```
class FileWriterQueue {
 public:
  FileWriterQueue(std::unique_ptr<FileWriter> writer, const FileWriteOptions& options);
  ~FileWriterQueue();

  Status QueueBatch(std::shared_ptr<RecordBatch> batch);
  void Finish();

 private:
  std::unique_ptr<FileWriter> writer;
  const FileWriteOptions& options;
};
```

Now, let's pretend our implementation creates a dedicated writer thread, and every call to QueueBatch adds the batch to a producer-consumer queue that the writer thread drains (a rough sketch of one possible implementation is included just below the guarantees list). Our destructor would then look like this...

```
FileWriterQueue::~FileWriterQueue() {
  EnsureFinished();
  writer_thread.join();
}
```

Now let's look at how this is used...

```
Status WriteBatches(std::vector<std::shared_ptr<RecordBatch>> batches,
                    const FileWriteOptions& options) {
  std::unique_ptr<FileWriter> file_writer = OpenWriter();
  FileWriterQueue file_writer_queue(std::move(file_writer), options);
  for (const auto& batch : batches) {
    ARROW_RETURN_NOT_OK(file_writer_queue.QueueBatch(batch));
  }
  file_writer_queue.Finish();  // I could also skip this and just rely on the EnsureFinished in the destructor
  return Status::OK();
}
```

We are ensured several things:

1) The FileWriterQueue will not be deleted until the thread is joined
2) The options remain valid until the thread is joined
3) All work is completely done when WriteBatches returns
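Here is that sketch. To be clear, this is not code from Arrow or from this PR; it is a minimal, hypothetical illustration of the synchronous design described above, using a mutex/condition-variable queue. `FileWriter` and `FileWriteOptions` are the hypothetical types from the declaration above, and `FileWriter::Write` is a made-up method standing in for the real write call.

```
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>

#include "arrow/record_batch.h"
#include "arrow/status.h"

// Hypothetical synchronous implementation: a dedicated thread drains a
// producer-consumer queue, and the destructor joins that thread.
class FileWriterQueue {
 public:
  FileWriterQueue(std::unique_ptr<FileWriter> writer, const FileWriteOptions& options)
      : writer(std::move(writer)), options(options) {
    writer_thread = std::thread([this] { WriteLoop(); });
  }

  ~FileWriterQueue() {
    EnsureFinished();
    writer_thread.join();
  }

  Status QueueBatch(std::shared_ptr<RecordBatch> batch) {
    std::lock_guard<std::mutex> lk(mutex);
    if (finished) return Status::Invalid("Already finished");
    queue.push_back(std::move(batch));
    cv.notify_one();
    return Status::OK();
  }

  void Finish() { EnsureFinished(); }

 private:
  void EnsureFinished() {
    std::lock_guard<std::mutex> lk(mutex);
    finished = true;
    cv.notify_one();
  }

  void WriteLoop() {
    for (;;) {
      std::unique_lock<std::mutex> lk(mutex);
      cv.wait(lk, [this] { return finished || !queue.empty(); });
      if (queue.empty()) return;  // finished and fully drained
      auto batch = std::move(queue.front());
      queue.pop_front();
      lk.unlock();
      // Error handling for the actual write is omitted for brevity;
      // FileWriter::Write is a made-up method.
      writer->Write(*batch);
    }
  }

  std::unique_ptr<FileWriter> writer;
  const FileWriteOptions& options;  // held by reference, so its lifetime matters
  std::thread writer_thread;
  std::mutex mutex;
  std::condition_variable cv;
  std::deque<std::shared_ptr<RecordBatch>> queue;
  bool finished = false;
};
```

The important property is that the destructor's `join()` is what provides guarantees 1-3, and that is exactly the property that disappears once the finishing work becomes a future.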
My goal is to allow those same guarantees to exist when the "finishing work" (the stuff in the destructor) is a future. For example, the asynchronous analogue of the above might be the following (a sketch of what one possible implementation of Finish() could look like is included at the end of this comment)...

```
class FileWriterQueue {
 public:
  FileWriterQueue(std::unique_ptr<FileWriter> writer, const FileWriteOptions& options);
  ~FileWriterQueue();

  Status QueueBatch(std::shared_ptr<RecordBatch> batch);
  Future<> Finish();

 private:
  std::unique_ptr<FileWriter> writer;
  const FileWriteOptions& options;
};
```

Now our destructor can't do anything. It can't block on Finish() because that would defeat the purpose of being asynchronous. Someone naively using this class might do...

```
Status WriteBatches(std::vector<std::shared_ptr<RecordBatch>> batches,
                    const FileWriteOptions& options) {
  std::unique_ptr<FileWriter> file_writer = OpenWriter();
  FileWriterQueue file_writer_queue(std::move(file_writer), options);
  for (const auto& batch : batches) {
    ARROW_RETURN_NOT_OK(file_writer_queue.QueueBatch(batch));
  }
  return file_writer_queue.Finish().status();
}
```

This works ok until a call to QueueBatch returns an error status halfway through the write. Then the code bails out and both `file_writer_queue` and `options` go out of scope while asynchronous work may still be in flight. This means that if there are any leftover captures and they get executed, they will segfault. With the nursery you can write...

```
Status WriteBatches(std::vector<std::shared_ptr<RecordBatch>> batches,
                    const FileWriteOptions& options) {
  return util::RunInNursery([&](Nursery* nursery) {
    std::unique_ptr<FileWriter> file_writer = OpenWriter();
    FileWriterQueue file_writer_queue(nursery, std::move(file_writer), options);
    for (const auto& batch : batches) {
      ARROW_RETURN_NOT_OK(file_writer_queue.QueueBatch(batch));
    }
    return file_writer_queue.status();
  });
}
```

...and now you can be assured that an error occurring during QueueBatch will not cause a segmentation fault, because the nursery will still block until any outstanding captures have resolved.

> Is there a risk that these pointers may be kept alive too long (and delay the DoClose calls accordingly)?

Yes, if there is a bug, but the risk is no greater than what you have with...

```
FileWriterQueue::~FileWriterQueue() {
  EnsureFinished();
  writer_thread.join();  // Could deadlock / delay for a long time if there is a bug
}
```
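Finally, to make "the finishing work is a future" concrete, here is a minimal, hypothetical sketch of the finishing-related pieces of an asynchronous FileWriterQueue. It is written against Arrow's `Future<>`/`MarkFinished` API, but nothing here is taken from this PR: `OnWriteDone` and the member names are made up, and only the members relevant to finishing are shown.

```
#include <mutex>

#include "arrow/status.h"
#include "arrow/util/future.h"

using arrow::Future;
using arrow::Status;

class FileWriterQueue {
 public:
  // Hands back a future that completes once every queued write has completed.
  Future<> Finish() {
    std::lock_guard<std::mutex> lk(mutex_);
    finished_ = true;
    if (num_outstanding_ == 0) {
      finished_fut_.MarkFinished(final_status_);
    }
    return finished_fut_;
  }

 private:
  // Called by each write task as it completes.  The task's capture of `this`
  // is exactly the kind of "leftover capture" that dangles in the naive
  // example above if the queue is destroyed before the writes are done.
  void OnWriteDone(const Status& st) {
    std::lock_guard<std::mutex> lk(mutex_);
    if (final_status_.ok()) final_status_ = st;
    if (--num_outstanding_ == 0 && finished_) {
      finished_fut_.MarkFinished(final_status_);
    }
  }

  std::mutex mutex_;
  bool finished_ = false;
  int num_outstanding_ = 0;
  Status final_status_ = Status::OK();
  Future<> finished_fut_ = Future<>::Make();
};
```

The nursery (or the asynchronous smart pointers in the linked experiment) exists to make sure that the object backing these captures stays alive, and that the caller does not return, until that finish future has actually been marked finished.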