[GitHub] [arrow] schelhorn commented on issue #11781: Is adding Parquet partitions/part files using R's arrow::write_dataset() transactional?

GitBox Fri, 10 Dec 2021 07:24:43 -0800


schelhorn commented on issue #11781:
URL: https://github.com/apache/arrow/issues/11781#issuecomment-991062577



   I see, thank you @nealrichardson. So I guess the safer approach is indeed to 
write to a second, temporary dataset in the same directory as the main dataset 
and, once writing has completed successfully, move all new part files from the 
temporary dataset into the main dataset. Since a file move within a single 
filesystem is atomic at the OS level, this should safeguard against partial 
writes. For concurrent writes I'll use the `basename_template` as you proposed.
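   
   To make the idea concrete, here is a rough sketch of the "write to a staging 
dataset, then move the part files" step. It is only an illustration in Python 
with the standard library, not anything `arrow` provides: the function names 
`publish_staged_files` and `unique_basename_template` are made up, and it 
assumes the staging and main directories sit on the same filesystem, since 
that is the case in which `os.replace` is an atomic rename.

```python
import os
import uuid
from pathlib import Path


def unique_basename_template(ext="parquet"):
    """Return a per-writer template such as 'part-<random hex>-{i}.parquet'.

    Hypothetical helper: the '{i}' placeholder is left in place so the string
    can be passed on as a `basename_template` by the actual dataset writer.
    """
    return f"part-{uuid.uuid4().hex}-{{i}}.{ext}"


def publish_staged_files(staging_dir, dataset_dir):
    """Move every file under staging_dir to the same relative path in dataset_dir.

    Each file is moved with os.replace, which is an atomic rename when source
    and destination are on the same filesystem (an assumption of this sketch).
    Call this only after the write to staging_dir has finished successfully.
    """
    staging_dir, dataset_dir = Path(staging_dir), Path(dataset_dir)
    moved = []
    for src in sorted(p for p in staging_dir.rglob("*") if p.is_file()):
        dst = dataset_dir / src.relative_to(staging_dir)
        # Create the partition directory (e.g. year=2021/) if it is new.
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Best-effort guard against clobbering an existing part file; unique
        # basenames per writer are what really prevents collisions.
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        os.replace(src, dst)
        moved.append(dst)
    return moved
```

   Each part file then shows up in the main dataset either completely or not 
at all. The set of files as a whole is still not moved in one atomic step, so 
a reader may briefly see only some of the new part files.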
   
   I know that partial writes may sound like an edge case to many, but once you 
write to shared storage in an HPC environment these things just happen from 
time to time.
   
   Please feel free to close this issue. I would also suggest adding your 
answer to the `arrow` documentation if it's not already captured there.


