westonpace commented on pull request #10968: URL: https://github.com/apache/arrow/pull/10968#issuecomment-908913026
> If I understand correctly, nothing here actually keeps the tasks alive. What this relies upon is that the user stored the shared_ptr or unique_ptr returned by Nursery::MakeXXXCloseable somewhere so that lifetimes are handled correctly?

There are two concerns here: 1) keeping the object alive (the nursery does not do this, but the smart pointers do) and 2) not returning until all tasks have finished (the nursery does this because it blocks until every task has finished).

> Is there a risk that these pointers may be kept alive too long (and delay the DoClose calls accordingly)?

Yes, the nursery could very much trigger deadlock if a task never finishes. I'll come back to this below.

> What is the recommended strategy for using this facility? Did you try to use this in the codebase to check that it actually reduces the burden of managing the sequencing of destructor calls?

See https://github.com/apache/arrow/pull/11017 @pitrou

---

I also received some negative feedback offline from @bkietz, so I tried today to rewrite this in another light, using only the asynchronous smart pointers and an asynchronous task group. The result is [here](https://github.com/apache/arrow/compare/master...westonpace:experiment/async-smart-ptr?expand=1). However, it still doesn't quite solve the problem. So let me state the problem I am trying to solve in clear terms. I am open to any solution someone can come up with, but at the moment this PR is still my preferred solution.

# Problem Statement

The use case here is from the perspective of a developer writing some code who wants to ensure that the code they write is used safely. For example, let's consider the case of writing a file writer queue. A synchronous declaration might look something like this...

```
class FileWriterQueue {
 public:
  FileWriterQueue(std::unique_ptr<FileWriter> writer, const FileWriteOptions& options);
  ~FileWriterQueue();

  Status QueueBatch(std::shared_ptr<RecordBatch> batch);
  void Finish();

 private:
  std::unique_ptr<FileWriter> writer;
  const FileWriteOptions& options;
};
```

Now, let's pretend our implementation creates a dedicated writer thread, and every call to QueueBatch adds the batch to a producer-consumer queue that the writer thread drains (a rough sketch of one possible implementation is included just below the guarantees list). Our destructor would then look like this...

```
FileWriterQueue::~FileWriterQueue() {
  EnsureFinished();
  writer_thread.join();
}
```

Now let's look at how this is used...

```
Status WriteBatches(std::vector<std::shared_ptr<RecordBatch>> batches,
                    const FileWriteOptions& options) {
  std::unique_ptr<FileWriter> file_writer = OpenWriter();
  FileWriterQueue file_writer_queue(std::move(file_writer), options);
  for (const auto& batch : batches) {
    ARROW_RETURN_NOT_OK(file_writer_queue.QueueBatch(batch));
  }
  file_writer_queue.Finish();  // I could also skip this and just rely on the EnsureFinished in the destructor
  return Status::OK();
}
```

We are ensured several things:

1) The FileWriterQueue will not be deleted until the thread is joined
2) The options remain valid until the thread is joined
3) All work is completely done when WriteBatches returns
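Here is that sketch. To be clear, this is not code from Arrow or from this PR; it is a minimal, hypothetical illustration of the synchronous design described above, using a mutex/condition-variable queue. `FileWriter` and `FileWriteOptions` are the hypothetical types from the declaration above, and `FileWriter::Write` is a made-up method standing in for the real write call.

```
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>

#include "arrow/record_batch.h"
#include "arrow/status.h"

// Hypothetical synchronous implementation: a dedicated thread drains a
// producer-consumer queue, and the destructor joins that thread.
class FileWriterQueue {
 public:
  FileWriterQueue(std::unique_ptr<FileWriter> writer, const FileWriteOptions& options)
      : writer(std::move(writer)), options(options) {
    writer_thread = std::thread([this] { WriteLoop(); });
  }

  ~FileWriterQueue() {
    EnsureFinished();
    writer_thread.join();
  }

  Status QueueBatch(std::shared_ptr<RecordBatch> batch) {
    std::lock_guard<std::mutex> lk(mutex);
    if (finished) return Status::Invalid("Already finished");
    queue.push_back(std::move(batch));
    cv.notify_one();
    return Status::OK();
  }

  void Finish() { EnsureFinished(); }

 private:
  void EnsureFinished() {
    std::lock_guard<std::mutex> lk(mutex);
    finished = true;
    cv.notify_one();
  }

  void WriteLoop() {
    for (;;) {
      std::unique_lock<std::mutex> lk(mutex);
      cv.wait(lk, [this] { return finished || !queue.empty(); });
      if (queue.empty()) return;  // finished and fully drained
      auto batch = std::move(queue.front());
      queue.pop_front();
      lk.unlock();
      // Error handling for the actual write is omitted for brevity;
      // FileWriter::Write is a made-up method.
      writer->Write(*batch);
    }
  }

  std::unique_ptr<FileWriter> writer;
  const FileWriteOptions& options;  // held by reference, so its lifetime matters
  std::thread writer_thread;
  std::mutex mutex;
  std::condition_variable cv;
  std::deque<std::shared_ptr<RecordBatch>> queue;
  bool finished = false;
};
```

The important property is that the destructor's `join()` is what provides guarantees 1-3, and that is exactly the property that disappears once the finishing work becomes a future.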
My goal is to allow those same guarantees to exist when the "finishing work" (the stuff in the destructor) is a future. For example, the asynchronous analogue of the above might be the following (a sketch of what one possible implementation of Finish() could look like is included at the end of this comment)...

```
class FileWriterQueue {
 public:
  FileWriterQueue(std::unique_ptr<FileWriter> writer, const FileWriteOptions& options);
  ~FileWriterQueue();

  Status QueueBatch(std::shared_ptr<RecordBatch> batch);
  Future<> Finish();

 private:
  std::unique_ptr<FileWriter> writer;
  const FileWriteOptions& options;
};
```

Now our destructor can't do anything. It can't block on Finish() because that would defeat the purpose of being asynchronous. Someone naively using this class might do...

```
Status WriteBatches(std::vector<std::shared_ptr<RecordBatch>> batches,
                    const FileWriteOptions& options) {
  std::unique_ptr<FileWriter> file_writer = OpenWriter();
  FileWriterQueue file_writer_queue(std::move(file_writer), options);
  for (const auto& batch : batches) {
    ARROW_RETURN_NOT_OK(file_writer_queue.QueueBatch(batch));
  }
  return file_writer_queue.Finish().status();
}
```

This works ok until a call to QueueBatch returns an error status halfway through the write. Then the code bails out and both `file_writer_queue` and `options` go out of scope while asynchronous work may still be in flight. This means that if there are any leftover captures and they get executed, they will segfault. With the nursery you can write...

```
Status WriteBatches(std::vector<std::shared_ptr<RecordBatch>> batches,
                    const FileWriteOptions& options) {
  return util::RunInNursery([&](Nursery* nursery) {
    std::unique_ptr<FileWriter> file_writer = OpenWriter();
    FileWriterQueue file_writer_queue(nursery, std::move(file_writer), options);
    for (const auto& batch : batches) {
      ARROW_RETURN_NOT_OK(file_writer_queue.QueueBatch(batch));
    }
    return file_writer_queue.status();
  });
}
```

...and now you can be assured that an error occurring during QueueBatch will not cause a segmentation fault, because the nursery will still block until any outstanding captures have resolved.

> Is there a risk that these pointers may be kept alive too long (and delay the DoClose calls accordingly)?

Yes, if there is a bug, but the risk is no greater than what you have with...

```
FileWriterQueue::~FileWriterQueue() {
  EnsureFinished();
  writer_thread.join();  // Could deadlock / delay for a long time if there is a bug
}
```
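Finally, to make "the finishing work is a future" concrete, here is a minimal, hypothetical sketch of the finishing-related pieces of an asynchronous FileWriterQueue. It is written against Arrow's `Future<>`/`MarkFinished` API, but nothing here is taken from this PR: `OnWriteDone` and the member names are made up, and only the members relevant to finishing are shown.

```
#include <mutex>

#include "arrow/status.h"
#include "arrow/util/future.h"

using arrow::Future;
using arrow::Status;

class FileWriterQueue {
 public:
  // Hands back a future that completes once every queued write has completed.
  Future<> Finish() {
    std::lock_guard<std::mutex> lk(mutex_);
    finished_ = true;
    if (num_outstanding_ == 0) {
      finished_fut_.MarkFinished(final_status_);
    }
    return finished_fut_;
  }

 private:
  // Called by each write task as it completes.  The task's capture of `this`
  // is exactly the kind of "leftover capture" that dangles in the naive
  // example above if the queue is destroyed before the writes are done.
  void OnWriteDone(const Status& st) {
    std::lock_guard<std::mutex> lk(mutex_);
    if (final_status_.ok()) final_status_ = st;
    if (--num_outstanding_ == 0 && finished_) {
      finished_fut_.MarkFinished(final_status_);
    }
  }

  std::mutex mutex_;
  bool finished_ = false;
  int num_outstanding_ = 0;
  Status final_status_ = Status::OK();
  Future<> finished_fut_ = Future<>::Make();
};
```

The nursery (or the asynchronous smart pointers in the linked experiment) exists to make sure that the object backing these captures stays alive, and that the caller does not return, until that finish future has actually been marked finished.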