schelhorn commented on issue #11781: URL: https://github.com/apache/arrow/issues/11781#issuecomment-991062577
I see, thank you @nealrichardson. So I guess it would be safer to write to a second, temporary dataset in the same directory as the main dataset, and then move all new part files from the temporary dataset into the main dataset once writing has completed successfully. Since file moves within the same file system are atomic at the OS level, this should safeguard against partial writes. For concurrent writes I'll use `basename_template` as you proposed. I know that partial writes may sound like an edge case to many, but once you write to shared storage in an HPC environment, these things just happen from time to time. Please feel free to close this issue (I also suggest adding your answer to the `arrow` documentation if it's not already captured there).
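For anyone finding this later, here is a rough sketch of the "write to a staging dataset, then move the part files" idea in Python. It is not from the Arrow docs and is only an illustration: the function name `promote_part_files` and the directory layout are hypothetical, and it assumes the staging and main directories sit on the same file system, so that `os.replace` is a rename rather than a copy. Whether renames are truly atomic on a given shared or network file system is something to verify for your own storage.

```python
import os
from pathlib import Path


def promote_part_files(staging_dir, main_dir):
    """Move every file under staging_dir to the same relative path under main_dir.

    Hypothetical helper: assumes both directories are on the same file system
    and that the part files have unique names (for example via a per-writer
    `basename_template` when the staging dataset was written).
    """
    staging, main = Path(staging_dir), Path(main_dir)
    moved = []
    # sorted() materializes the file list before anything is moved.
    for src in sorted(p for p in staging.rglob("*") if p.is_file()):
        dst = main / src.relative_to(staging)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            # Refuse to overwrite an existing part file in the main dataset.
            raise FileExistsError(f"refusing to overwrite {dst}")
        os.replace(src, dst)  # a rename when src and dst share a file system
        moved.append(dst)
    return moved
```

The intent is to call this only after the write to the staging dataset has returned without error, so a crashed or interrupted write leaves its partial files in the staging directory and never touches the main dataset. Note that the existence check and the move are two separate steps, so this sketch relies on unique file names rather than on locking to keep concurrent writers apart.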
