Dear Cephalopodians,

Now that our Ceph cluster is under high I/O load, we are getting user reports of
files not being visible on some clients,
but showing up after forcing a stat() syscall.

For example, one user added several files to a directory via an NFS client
attached to nfs-ganesha (which uses libcephfs).
Afterwards, all other nfs-ganesha servers saw them, as did 44 of our
Fuse clients,
but one single client still saw the old contents of the directory, i.e. the
files seemed to be missing(!).
This happened both when using "ls" on the directory and when trying to access
the seemingly non-existent files directly.

I could also confirm this observation in a fresh login shell on that machine.
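To check which clients are affected, the directory listings collected from each client can be compared. The following is only a minimal sketch of that comparison: the function name and the client labels are made up for illustration, and it assumes the listings have already been gathered by some other means (e.g. by running "ls" on each machine). Note that it uses the union of all listings as the reference, so it would also flag clients that are correct when a stale client still shows a deleted file.

```python
from typing import Dict, Iterable, Set


def find_stale_clients(listings: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each client to the entries it lacks relative to the union of all listings.

    `listings` maps a (hypothetical) client label to the file names that
    client reported for the same directory. Clients missing nothing are omitted.
    """
    as_sets = {client: set(names) for client, names in listings.items()}
    # Reference set: every name seen by at least one client.
    all_names = set().union(*as_sets.values())
    result = {}
    for client, names in as_sets.items():
        missing = all_names - names
        if missing:
            result[client] = missing
    return result


if __name__ == "__main__":
    # Illustrative data only, not taken from the real cluster.
    example = {
        "ganesha-1": ["old.txt", "new1.txt", "new2.txt"],
        "fuse-01": ["old.txt", "new1.txt", "new2.txt"],
        "fuse-17": ["old.txt"],
    }
    for client, missing in find_stale_clients(example).items():
        print(client, "is missing", sorted(missing))
```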

Then, on the "broken" client, I entered the directory which seemed to
contain only the "old" content, and I created a new file in there.
This worked fine, and all other clients saw the file immediately. 
On the broken client, the metadata was now also updated and all other files
appeared - i.e. everything was "in sync" again.
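The manual step above (creating a file in the stale directory) could be scripted roughly as follows. This is only a sketch of what I did by hand: the function name is made up, removing the scratch file afterwards is my own addition, and I have only observed once that creating a file brings the directory back in sync, so it is not a confirmed fix.

```python
import os
import tempfile


def poke_directory(directory: str) -> list:
    """Create and remove a scratch file in `directory`, then return a fresh listing.

    Assumption: modifying the directory from the affected client makes it
    refresh its view, as observed once by hand. The ".poke-" prefix is arbitrary.
    """
    fd, scratch_path = tempfile.mkstemp(prefix=".poke-", dir=directory)
    os.close(fd)
    os.unlink(scratch_path)  # clean up; the original manual test left the file in place
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    # Demonstration on a local temporary directory, not on CephFS.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "old.txt"), "w").close()
        print(poke_directory(demo_dir))
```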

There's nothing in the Ceph logs of our MDS, or in the syslogs of the client
machine or the MDS.


Another user observed the same behaviour, but not limited to one particular
machine (it seems random).
As a workaround, he now runs "stat" on the file he expects to exist (but which
is not shown by "ls").
The stat returns "No such file", but a subsequent "ls" then lists the file,
and it can be accessed normally.
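The user's workaround amounts to something like the sketch below: stat the expected file, ignore the "No such file" error, then list the parent directory again. The function name is made up, and whether the stat reliably triggers the refresh is exactly what is unclear, so this only illustrates the sequence of calls.

```python
import os


def stat_then_list(path: str) -> bool:
    """Stat `path`, ignoring ENOENT, then check a fresh listing of its parent.

    Returns True if the file name appears in the parent directory listing
    taken after the stat attempt.
    """
    full_path = os.path.abspath(path)
    try:
        os.stat(full_path)
    except FileNotFoundError:
        # This is the "No such file" result seen on affected clients.
        pass
    parent = os.path.dirname(full_path)
    name = os.path.basename(full_path)
    return name in os.listdir(parent)
```

On a healthy file system this simply reports whether the file exists; on an affected client it reproduces the stat followed by "ls" sequence described above.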

This feels like something is messed up concerning the client caps - these are 
all 12.2.4 Fuse clients. 

Any ideas on how to find the cause?
It has only been happening recently, under high I/O load with many metadata
operations.

Cheers,
        Oliver

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
