On tridi 3 Floréal, year CCXXIII, David Wright wrote:
> OK. Here's a demonstration of a file going AWOL by moving *up* the
> directory listing. Because of read-ahead, readdir still sees the old
> name and the stat() fails.
What are you trying to prove with that test? You would get the same failure if you put your delay between readdir() and stat(); a sketch of exactly that loop is appended below. And on a preemptive multitasking OS (or even worse: with multiprocessing), that "delay" could simply be the normal run time of the program. That is called a race condition, as I am sure you know. The Unix filesystem API has race conditions all over the place, everybody knows it. Eliminating them would require an explicit transactional API, and those cause a whole lot of problems of their own (deadlocks). I do not see any merit in singling out this particular race condition above all the others.

> Again, because of read-ahead, I can't
> demonstrate the opposite effect in the same program because
> you'd have to have a directory bigger than the read-ahead buffer
> in order to see any effect.

Please do. Creating a few thousand files takes only a few seconds, and strace can show you the getdents() calls that lie underneath readdir() and tell you how many entries are read at once; a small program for that is also appended below.

> But, as was said already, its occurrence
> can be discovered by checking the inode numbers for duplicate returns.

I am not convinced that this occurrence happens. I believe that readdir() should offer the following guarantee over the course of a single "opendir + full readdir loop": all entries that were present in the directory during the whole run are returned exactly once, under any of the names they had during the run. And so far, I have not seen any indication that this property was violated, i.e. the same entry returned twice or not at all. A duplicate check along the lines you suggest is appended below as well.

(There may be a more subtle issue: what happens if file9999 is renamed to file0042 while readdir() is scanning around file5000? Would "file0042" be returned twice, but with different inode values?)

I remember someone asking what happens with backup programs. I do not see it as an issue, for two reasons.

First, a carefully written backup program could simply make a consistency check at the end: if stat()ing any file failed with ENOENT, assume something has moved and run again (the last sketch below shows the idea). But this is useless, because:

Second, the issue is much broader than that. Imagine you move the "billion_dollars_project" directory from ~/experimental to ~/finished while the backup program is running. If the backup program proceeds in this order: ~/finished, ~/music, ~/experimental, and the move happens while it is scanning ~/music, then it never sees billion_dollars_project at all, and never sees an error for it.

To make reliable backups, you need a way of getting the state of the full tree atomically. Nowadays, that is done with filesystem snapshots. Unless you use snapshots, you have to assume that any file that was moved in any way during the backup was moved the stupid way, i.e. first delete the source, then re-create the target.
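For concreteness, here are a few sketches; they are mine, written for this message as illustrations only, not as tested tools. First, the readdir()/stat() loop with an artificial delay in the window. The sleep(1) stands in for whatever preemption or concurrent activity would normally fill that gap; rename a file in the scanned directory while it runs and the stat() fails with ENOENT:

/* racewindow.c -- illustrative sketch, not David's program.
 * The sleep(1) is an artificial stand-in for whatever delay (preemption,
 * another CPU, plain slowness) opens the window between readdir() and
 * stat(). */
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    DIR *dir = opendir(path);
    if (!dir) { perror("opendir"); return 1; }
    if (chdir(path) < 0) { perror("chdir"); return 1; }

    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        sleep(1);                       /* the race window, made visible */
        struct stat st;
        if (stat(ent->d_name, &st) < 0)
            printf("%-24s stat: %s\n", ent->d_name, strerror(errno));
        else
            printf("%-24s inode %llu\n", ent->d_name,
                   (unsigned long long)st.st_ino);
    }
    closedir(dir);
    return 0;
}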
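Second, the "thousands of files" experiment. The program below (the directory name "testdir" and the count of 5000 are arbitrary choices) creates enough entries that one full readdir() pass needs several getdents() calls underneath; run its read mode under something like "strace -e trace=getdents,getdents64" to see how many entries the kernel hands back per call:

/* mkmany.c -- creates 5000 files so that one readdir() pass needs
 * several getdents() calls; "5000" and "testdir" are arbitrary.
 * Usage:  ./mkmany [dir]          create the files
 *         ./mkmany [dir] read     just walk the directory
 * Run the read mode under strace to watch the getdents() batching. */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "testdir";

    if (argc > 2) {                     /* read mode */
        DIR *dir = opendir(path);
        if (!dir) { perror("opendir"); return 1; }
        long n = 0;
        while (readdir(dir) != NULL)
            n++;
        closedir(dir);
        printf("saw %ld entries\n", n);
        return 0;
    }

    if (mkdir(path, 0755) < 0)
        perror("mkdir");                /* may already exist; keep going */
    for (int i = 0; i < 5000; i++) {
        char name[512];
        snprintf(name, sizeof name, "%s/file%04d", path, i);
        int fd = open(name, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }
        close(fd);
    }
    printf("created 5000 files in %s\n", path);
    return 0;
}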
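Third, the duplicate check mentioned in the quote, reduced to a single directory: record the d_ino of every entry returned during one opendir/readdir pass and report any inode seen more than once. Hard links within the directory would also trigger it, so it is only a rough test of the property stated above:

/* dupscan.c -- sketch of the duplicate check: one readdir() pass,
 * reporting any inode number returned more than once.  Crude O(n^2)
 * comparison, good enough for a few thousand entries. */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    DIR *dir = opendir(path);
    if (!dir) { perror("opendir"); return 1; }

    ino_t *seen = NULL;
    size_t n = 0, cap = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        for (size_t i = 0; i < n; i++)
            if (seen[i] == ent->d_ino)
                printf("inode %llu returned again as \"%s\"\n",
                       (unsigned long long)ent->d_ino, ent->d_name);
        if (n == cap) {
            cap = cap ? cap * 2 : 1024;
            seen = realloc(seen, cap * sizeof *seen);
            if (!seen) { perror("realloc"); return 1; }
        }
        seen[n++] = ent->d_ino;
    }
    closedir(dir);
    printf("%zu entries scanned\n", n);
    free(seen);
    return 0;
}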
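Last, the end-of-run consistency check, again reduced to one flat directory and a fixed-size table to keep the sketch short. A real backup program would archive each file where the comment stands, then re-stat everything it recorded and start over if anything has vanished; and, as said above, this still does nothing for a whole directory that moved out from under the scan:

/* recheck.c -- sketch of the end-of-run check, single directory only.
 * The fixed-size table is a simplification; a real tool would keep a
 * proper list of every path it backed up. */
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_ENTRIES 100000

static char names[MAX_ENTRIES][NAME_MAX + 1];

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    size_t n = 0;

    DIR *dir = opendir(path);
    if (!dir) { perror("opendir"); return 1; }
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL && n < MAX_ENTRIES) {
        /* ... a real backup would archive the file here ... */
        snprintf(names[n++], sizeof names[0], "%s", ent->d_name);
    }
    closedir(dir);

    if (chdir(path) < 0) { perror("chdir"); return 1; }
    int dirty = 0;
    for (size_t i = 0; i < n; i++) {
        struct stat st;
        if (stat(names[i], &st) < 0 && errno == ENOENT) {
            printf("\"%s\" vanished during the run\n", names[i]);
            dirty = 1;
        }
    }
    if (dirty)
        printf("inconsistent: the backup should be redone\n");
    return dirty;
}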
Regards,

--
Nicolas George