OliLay commented on code in PR #43096:
URL: https://github.com/apache/arrow/pull/43096#discussion_r1698113895
##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -1027,16 +1052,38 @@ class ObjectAppendStream final : public
io::OutputStream {
return Status::OK();
}
+ Status EnsureReadyToFlushFromClose() {
Review Comment:
I think this is more of a style question, as this function is called twice
from different places, I actually prefer it being a separate method to remove
duplication. But I don't have a strong opinion, so I'll just removed it.
##########
cpp/src/arrow/filesystem/azurefs.cc:
##########
@@ -1066,20 +1113,95 @@ class ObjectAppendStream final : public
io::OutputStream {
// flush. This also avoids some unhandled errors when flushing in the
destructor.
return Status::OK();
}
- return CommitBlockList(block_blob_client_, block_ids_,
commit_block_list_options_);
+
+ auto fut = FlushAsync();
+ RETURN_NOT_OK(fut.status());
+
+ std::unique_lock<std::mutex> lock(upload_state_->mutex);
+ return CommitBlockList(block_blob_client_, upload_state_->block_ids,
+ commit_block_list_options_);
Review Comment:
If we restructure the lock as you've suggested, we get a deadlock because
the future can actually never be marked as finished by a succeeding background
write (because in `HandleUploadOutcome`, where we try to mark the future as
finished, we also need to acquire the lock).
I think the race you pointed out is only an issue if there is a parallel
`Write()` and `Flush()` to the stream. That is the only way how
`pending_blocks_completed` could change the state during `Flush()`. To my
knowledge, arrow's output streams are not supposed to be thread-safe (e.g. the
S3 output stream isn't), so it should not be a problem in practice.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]