Thomas Munro <thomas.mu...@gmail.com> writes:
> we have a page at offset 638976, and we can find all system calls that
> touched that offset:
> [pid 26031] 23:26:48.521123 pwritev(50,
> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> iov_len=8192}], 1, 638976) = 8192
> [pid 26040] 23:26:48.568975 pwrite64(5,
> "\0\0\0\0\0Nj\1\0\0\0\0\240\3\300\3\0 \4
> \0\0\0\0\340\2378\0\300\2378\0"..., 8192, 638976) = 8192
> [pid 26040] 23:26:48.593157 pread64(6,
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 8192, 638976) = 8192

Boy, it's hard to look at that trace and not call it a filesystem bug.
Given the apparent dependency on COW, I wonder if this has something
to do with getting confused about which copy is current?

Another thing that struck me is that the two calls from pid 26040
are issued on different FDs.  I checked the strace log and verified
that these do both refer to "base/5/16384".  It looks like there was
a cache flush at about 23:26:48.575023 that caused 26040 to close
and later reopen all its database relation FDs.  Maybe that is
somehow contributing to the filesystem's confusion?  And more to the
point, could that explain why other O_DIRECT users aren't up in arms
over this bug?  Maybe they don't switch FDs as readily as we do.

			regards, tom lane
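
For anyone who wants to poke at the FD-switching idea, the sequence in
the trace (write a page through one descriptor, close it, reopen the
file, read the same offset through the new descriptor) can be sketched
roughly as below.  This is only an illustration, not a reproducer: it
does not use O_DIRECT (which needs aligned buffers), it says nothing
about COW, and the function name, file name and page contents are all
made up for the example.

```python
import os
import tempfile

BLCKSZ = 8192      # 8 kB page, matching the 8192-byte calls in the trace
OFFSET = 638976    # the page offset from the trace (block 78 of the file)


def write_then_read_via_new_fd(path, page, offset=OFFSET):
    """Write `page` at `offset` through one FD, then read it back through
    a freshly opened FD.  (Hypothetical helper, not PostgreSQL code.)"""
    wfd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        written = os.pwrite(wfd, page, offset)
    finally:
        # Stand-in for the cache flush that closed the relation FDs.
        os.close(wfd)
    assert written == len(page), "short write"

    rfd = os.open(path, os.O_RDONLY)   # reopen: a different descriptor
    try:
        return os.pread(rfd, len(page), offset)
    finally:
        os.close(rfd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "16384")   # arbitrary file name
        page = b"\x01" * BLCKSZ           # any recognizable non-zero page
        got = write_then_read_via_new_fd(path, page)
        print("match" if got == page else "MISMATCH: read differs from write")
```

On a healthy filesystem the read has to return exactly what was
written; getting zeroes back, as in the pread64 above, is the symptom
in question.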