OliLay commented on code in PR #43096:
URL: https://github.com/apache/arrow/pull/43096#discussion_r1698113895


##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -1027,16 +1052,38 @@ class ObjectAppendStream final : public 
io::OutputStream {
     return Status::OK();
   }
 
+  Status EnsureReadyToFlushFromClose() {

Review Comment:
   I think this is more of a style question, as this function is called twice 
from different places, I actually prefer it being a separate method to remove 
duplication. But I don't have a strong opinion, so I'll just removed it.



##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -1066,20 +1113,95 @@ class ObjectAppendStream final : public 
io::OutputStream {
       // flush. This also avoids some unhandled errors when flushing in the 
destructor.
       return Status::OK();
     }
-    return CommitBlockList(block_blob_client_, block_ids_, 
commit_block_list_options_);
+
+    auto fut = FlushAsync();
+    RETURN_NOT_OK(fut.status());
+
+    std::unique_lock<std::mutex> lock(upload_state_->mutex);
+    return CommitBlockList(block_blob_client_, upload_state_->block_ids,
+                           commit_block_list_options_);

Review Comment:
   If we restructure the lock as you've suggested, we get a deadlock because 
the future can actually never be marked as finished by a succeeding background 
write (because in `HandleUploadOutcome`, where we try to mark the future as 
finished, we also need to acquire the lock).
   I think the race you pointed out is only an issue if there is a parallel 
`Write()` and `Flush()` to the stream. That is the only way how 
`pending_blocks_completed` could change the state during `Flush()`. To my 
knowledge, arrow's output streams are not supposed to be thread-safe (e.g. the 
S3 output stream isn't), so it should not be a problem in practice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to