[ 
https://issues.apache.org/jira/browse/ARROW-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519176#comment-17519176
 ] 

Micah Kornfield commented on ARROW-16147:
-----------------------------------------

Thanks for the thorough tests.  It isn't clear to me why we wouldn't close the 
sink here, but I'd be a little concerned about the change in behavior.  
Unfortunately, this isn't specified in the contract.  I think we should fix 
GcsOutputFile to close on destruction.
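
To make the "close on destruction" idea concrete, here is a minimal, 
self-contained sketch.  It does not use the Arrow or GCS APIs: FakeBucket, 
BufferedSink, FooterWriter and WriteThenDrop are all hypothetical names made 
up for illustration, and the writer only mimics the behavior described in the 
issue (its Close() writes a footer but never closes the sink).

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for remote storage: bytes only become visible
// once a sink commits them.
struct FakeBucket {
  std::string committed;
};

// Hypothetical buffered sink (NOT the real Arrow GCS stream class).
class BufferedSink {
 public:
  explicit BufferedSink(FakeBucket* bucket) : bucket_(bucket) {}

  // Close on destruction: the buffered data is committed even if the
  // owner never calls Close() explicitly.
  ~BufferedSink() { Close(); }

  BufferedSink(const BufferedSink&) = delete;
  BufferedSink& operator=(const BufferedSink&) = delete;

  void Write(const std::string& data) {
    if (!closed_) buffer_ += data;
  }

  // Idempotent, so an explicit Close() followed by the destructor is safe.
  void Close() {
    if (closed_) return;
    closed_ = true;
    bucket_->committed += buffer_;
    buffer_.clear();
  }

 private:
  FakeBucket* bucket_;
  std::string buffer_;
  bool closed_ = false;
};

// Hypothetical writer that mirrors the reported behavior: Close() writes
// the footer into the sink but leaves the sink itself open.
class FooterWriter {
 public:
  explicit FooterWriter(std::shared_ptr<BufferedSink> sink)
      : sink_(std::move(sink)) {}

  void Close() {
    if (is_open_) {
      is_open_ = false;
      sink_->Write("PAR1");  // placeholder for the real footer
    }
  }

 private:
  std::shared_ptr<BufferedSink> sink_;
  bool is_open_ = true;
};

// Returns what the bucket held right after writer.Close(), i.e. before the
// last reference to the sink was dropped.
std::string WriteThenDrop(FakeBucket* bucket) {
  auto sink = std::make_shared<BufferedSink>(bucket);
  FooterWriter writer(sink);
  sink->Write("data");
  writer.Close();
  return bucket->committed;  // the sink is destroyed after this copy is made
}
```

In this sketch nothing is visible in the bucket right after writer.Close(); 
the data is committed only when the last reference to the sink goes away and 
the destructor runs.  A real implementation would also need to decide how to 
report errors from a close that happens inside a destructor.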



> [C++] ParquetFileWriter doesn't call sink_.Close when using 
> GcsRandomAccessFile
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-16147
>                 URL: https://issues.apache.org/jira/browse/ARROW-16147
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Rok Mihevc
>            Priority: Major
>              Labels: GCP
>
> On parquet::arrow::FileWriter::Close the underlying sink is not closed. The 
> implementation goes to FileSerializer::Close:
> {code:cpp}
> void Close() override {
>     if (is_open_) {
>       // If any functions here raise an exception, we set is_open_ to be false
>       // so that this does not get called again (possibly causing segfault)
>       is_open_ = false;
>       if (row_group_writer_) {
>         num_rows_ += row_group_writer_->num_rows();
>         row_group_writer_->Close();
>       }
>       row_group_writer_.reset();
>       // Write magic bytes and metadata
>       auto file_encryption_properties = properties_->file_encryption_properties();
>       if (file_encryption_properties == nullptr) {  // Non encrypted file.
>         file_metadata_ = metadata_->Finish();
>         WriteFileMetaData(*file_metadata_, sink_.get());
>       } else {  // Encrypted file
>         CloseEncryptedFile(file_encryption_properties);
>       }
>     }
>   }
> {code}
> It doesn't call sink_->Close(), which leads to resource leaks and bugs.
> With regular files (which call close() in their destructor) it works fine, 
> but it doesn't work with fs::GcsRandomAccessFile. When I call 
> parquet::arrow::FileWriter::Close, the data is not flushed to storage until 
> the sink stream is closed manually (or the stack space changes).
> Is this intentional, or is it a bug?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
