kaybhutani opened a new pull request, #3635:
URL: https://github.com/apache/celeborn/pull/3635
### What changes were proposed in this pull request?
Add a new configuration `celeborn.worker.commitFiles.fsync` (default
`false`). When it is enabled, `LocalTierWriter.closeStreams()` calls
`FileChannel.force(false)` (fdatasync) before closing the channel.
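
For illustration, a minimal standalone sketch of the force-before-close pattern described above. It is not the actual `LocalTierWriter` code: the class `FsyncOnCloseSketch`, the method `writeAndClose`, and the `fsyncOnCommit` parameter are hypothetical names, with `fsyncOnCommit` standing in for the value of `celeborn.worker.commitFiles.fsync`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not Celeborn code.
final class FsyncOnCloseSketch {

    // Writes data to path and closes the channel. When fsyncOnCommit is true,
    // the file content is forced to the storage device before the close.
    static void writeAndClose(Path path, byte[] data, boolean fsyncOnCommit)
            throws IOException {
        FileChannel channel = FileChannel.open(
                path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
        try {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            // A single write() call may be partial, so loop until drained.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            if (fsyncOnCommit) {
                // force(false) flushes file content but does not require
                // file metadata updates to be written.
                channel.force(false);
            }
        } finally {
            channel.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fsync-sketch", ".data");
        try {
            writeAndClose(tmp, new byte[] {1, 2, 3}, true);
            System.out.println("bytes on disk: " + Files.size(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With the flag off, the write path is unchanged and the kernel decides when the pages reach disk. With it on, each close pays for a synchronous flush, which is presumably why the option defaults to `false`.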
### Why are the changes needed?
Without this, committed shuffle data can sit in the OS page cache before
the kernel flushes it to disk. A hard crash in that window loses data even
though Celeborn considers it committed. This option lets operators opt into
stronger durability guarantees.
### Does this PR resolve a correctness bug?
No. It adds an optional durability enhancement.
### Does this PR introduce _any_ user-facing change?
Yes. New configuration key `celeborn.worker.commitFiles.fsync` (boolean,
default `false`).
### How was this patch tested?
Existing unit tests. Configuration verified via `ConfigurationSuite`.
Additional context:
[slack](https://apachecelebor-kw08030.slack.com/archives/C04B1FYS6SY/p1774259245973229)