On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote:
> After an off-list discussion with Andreas, proposing here a patch that
> basically replaces ReadDir call with ReadDirExtended and gets rid of
> lstat entirely. With this chance, the checkpoint will only care about
> the snapshot and mapping files and not fail if it finds other files in
> the directories. Removing lstat enables us to make things faster as we
> avoid a bunch of extra system calls - one lstat call per each mapping
> or snapshot file.

I think removing the lstat() is probably reasonable. We currently aren't
doing proper error checking, and the chances of a non-regular file
matching the prefix are likely pretty low. In the worst case, we'll LOG
or ERROR when unlinking or fsyncing fails.

However, I'm not sure about the change to ReadDirExtended(). That might
be okay for CheckPointSnapBuild(), which is just trying to remove old
files, but CheckPointLogicalRewriteHeap() is responsible for ensuring
that files are flushed to disk for the checkpoint. If we stop reading
the directory after an error and let the checkpoint continue, isn't it
possible that some mapping files won't be persisted to disk?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
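
To make the ReadDirExtended() concern concrete, here is a rough
standalone sketch using plain POSIX calls. It is not PostgreSQL code:
fsync_files_with_prefix() and its return convention are made up for
illustration. The point is only that readdir() returns NULL both at
end-of-directory and on error, so a caller that must guarantee
durability has to tell the two apart and fail rather than carry on with
a partial scan.

```c
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): fsync every entry in
 * "dir" whose name starts with "prefix".
 *
 * Returns the number of files flushed, or -1 if the directory could
 * not be read to the end or any matching file could not be flushed.
 * A caller acting like a checkpoint should treat -1 as a hard failure.
 */
int
fsync_files_with_prefix(const char *dir, const char *prefix)
{
    DIR        *d = opendir(dir);
    struct dirent *de;
    size_t      prefixlen = strlen(prefix);
    int         nflushed = 0;
    int         failed = 0;

    if (d == NULL)
        return -1;

    for (;;)
    {
        char        path[4096];
        int         fd;

        /* readdir() returns NULL at end-of-directory and on error */
        errno = 0;
        de = readdir(d);
        if (de == NULL)
        {
            if (errno != 0)
                failed = 1;     /* scan was cut short: some files unseen */
            break;
        }

        /* No lstat() here: entries are selected by name prefix only */
        if (strncmp(de->d_name, prefix, prefixlen) != 0)
            continue;

        if (snprintf(path, sizeof(path), "%s/%s", dir, de->d_name) >=
            (int) sizeof(path))
        {
            failed = 1;         /* path too long for the buffer */
            continue;
        }

        fd = open(path, O_RDONLY);
        if (fd < 0)
        {
            failed = 1;
            continue;
        }
        if (fsync(fd) != 0)
            failed = 1;
        else
            nflushed++;
        close(fd);
    }

    closedir(d);
    return failed ? -1 : nflushed;
}
```

If the errno check after readdir() were dropped, or downgraded to a log
message, a read error partway through would look the same as a clean
end-of-directory, and the caller would proceed as though every matching
file had been flushed. That is the case I'm worried about for
CheckPointLogicalRewriteHeap().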