On 2014-10-20 09:02, Zygo Blaxell wrote:
Just now saw this thread, but IIRC 'No such file or directory' also gets returned sometimes when trying to automount a share that can't be enumerated by the client, and also sometimes when there is a stale NFS file handle.

> On Mon, Oct 20, 2014 at 04:38:28AM +0000, Duncan wrote:
>> Russell Coker posted on Sat, 18 Oct 2014 14:54:19 +1100 as excerpted:
>>> # find . -name "*546"
>>> ./1412233213.M638209P10546
>>> # ls -l ./1412233213.M638209P10546
>>> ls: cannot access ./1412233213.M638209P10546: No such file or directory
>>
>> Does your mail server do a lot of renames? Is one perhaps stuck? If so, that sounds like the same thing "Zygo Blaxell" is reporting in the "3.16.3..3.17.1 hang in renameat2()" thread, OP on Sun, 19 Oct 2014 15:25:26 -400, Msg-ID: <20141019192525.ga29...@hungrycats.org>, as linked here:
>>
>> <http://permalink.gmane.org/gmane.comp.file-systems.btrfs/39539>
>>
>> I pointed him at this thread too. I hadn't seen you mention a hung rename, but the other symptoms sound similar.
>
> Not really. It looks like Russell is having an NFS client-side problem, I'm having a server-side one (maybe). Also, all Russell's system calls seem to be returning promptly, while some of mine are not. Even if there were timeouts, an NFS server timeout gives a different error than 'No such file or directory'. Finally, the one and only thing I _can_ do with my bug is 'ls' on the renamed files (for me, the find would get stuck before returning any output).
>
> For Russell's issue...most of the stuff I can think of has been tried already. I didn't see if there was any attempt to ls the file from the NFS server as well as the client side. If ls is OK on the server but not the client, it's an NFS issue (possibly interacting with some btrfs-specific quirk); otherwise, it's likely a corrupted filesystem (mail servers seem to be unusually good at making these).
>
> Most of the I/O time on mail servers tends to land in the fsync() system call, and some nasty fsync() btrfs bugs were fixed in 3.17 (i.e. after 3.16, and not in the 3.16.x stable update for x <= 5 (the last one I've checked)). That said, I'm not familiar with how fsync() translates over NFS, so it might not be relevant after all.
>
> If the NFS server's view of the filesystem is OK, check the NFS protocol version from /proc/mounts on the client. Sometimes NFS clients will get some transient network error during connection and fall back to some earlier (and potentially buggier) NFS version. I've seen very different behavior in some important corner cases from v4 and v3 clients, for example, and if the client is falling all the way back to v2 the bugs and their workarounds start to get just plain _weird_ (e.g. filenames which produce specific values from some hash function or that contain specific character sequences are unusable). v2 is so old it may even have issues with 64-bit inode numbers.
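
For the /proc/mounts check suggested above, something along these lines should do as a rough sketch. The function name nfs_versions is made up for illustration, and it assumes the client kernel lists the negotiated version as a vers= entry in the mount options column, which is what Linux NFS clients normally show; if no such entry is present it prints "unknown" rather than guessing.

```shell
# Hypothetical helper: print mount point, filesystem type, and negotiated
# NFS version for every NFS mount. Reads /proc/mounts by default; pass a
# different file to inspect a saved copy from another machine.
nfs_versions() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        ver = "unknown"
        # Field 4 is the comma-separated mount option list.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            # "vers=" is 5 characters, so the value starts at position 6.
            if (opts[i] ~ /^vers=/) ver = substr(opts[i], 6)
        }
        print $2, $3, ver
    }' "${1:-/proc/mounts}"
}
```

Running nfs_versions on the client with no arguments lists each NFS mount; a version lower than the one requested at mount time would point to the fallback described above.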