I have a strange problem that began manifesting after I rebuilt my cluster a month or so back. A tiny subset of my files on CephFS are being zero-padded out to the length of ceph.dir.layout.stripe_unit when the files are later *read* (not when they are written). Tonight I realized the padding matched the stripe_unit value of 1048576, so I changed it to 4194304, after which the files that get padded took on the new stripe_unit length instead. I've since changed it back. I've searched Google and the Ceph bug tracker, but have had no luck so far.
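For reference, this is roughly what I used to inspect and flip the layout (paraphrasing slightly; it's just getfattr/setfattr on the layout xattrs of the mount shown below):

$ getfattr -n ceph.dir.layout /ceph/ka
$ setfattr -n ceph.dir.layout.stripe_unit -v 4194304 /ceph/ka    # temporary change for testing
$ setfattr -n ceph.dir.layout.stripe_unit -v 1048576 /ceph/ka    # changed back afterwards

And roughly how I've been spotting affected files, since their size ends up exactly equal to one of the stripe_unit values, and confirming that everything past the original data really is zero bytes (file 881 is from the Subversion listing below):

$ find /ceph/ka -type f \( -size 1048576c -o -size 4194304c \)   # exact-size matches for either stripe_unit
$ tail -c 1024 881 | od -c | head -3                             # tail of a padded file shows nothing but \0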
Current ceph.dir.layout setting for the entire cluster:

$ getfattr -n ceph.dir.layout /ceph/ka
getfattr: Removing leading '/' from absolute path names
# file: ceph/ka
ceph.dir.layout="stripe_unit=1048576 stripe_count=2 object_size=8388608 pool=cephfs_data"

Ceph is mounted on all machines using the kernel driver. The problem is not isolated to a single machine.

$ grep /ceph/ka /etc/mtab
backupz/ceph/ka /backupz/ceph/ka zfs rw,noatime,xattr,noacl 0 0
172.16.0.11:6789,172.16.0.19:6789:/ /ceph/ka ceph rw,noatime,nodiratime,name=admin,secret=<hidden>,acl 0 0

Files from a Subversion repository, where the last one was padded after I tried to check out the repo.

[kward@ka02 2016-10-20T22:57:13 %0]/ceph/ka/data/repoz/forestent/forestent/db/revs/0
$ ls -lrt | tail -5
-rw-r--r-- 1 www-data www-data    1079 Oct 20 08:53 877
-rw-r--r-- 1 www-data www-data    1415 Oct 20 08:55 878
-rw-r--r-- 1 www-data www-data    1059 Oct 20 09:01 879
-rw-r--r-- 1 www-data www-data    1318 Oct 20 09:36 880
-rw-r--r-- 1 www-data www-data 4194304 Oct 20 19:18 881

Files stored, then later accessed via WebDAV. Only those files accessed were subsequently padded.

[kward@ka02 2016-10-20T23:01:42 %0]~/www/webdav/OmniFocus.ofocus
$ ls -l
total 16389
-rw-r--r-- 1 www-data www-data 4194304 Oct 20 20:08 00000000000000=ay-_KSusSw8+jOtYClSC2kx.zip
-rw-r--r-- 1 www-data www-data    1383 Oct 20 19:22 20161020172209=pP4DpDOXAaA.client
-rw-r--r-- 1 www-data www-data 4194304 Oct 20 20:20 20161020182047=pP4DpDOXAaA.client
-rw-r--r-- 1 www-data www-data 4194304 Oct 20 21:11 20161020191117=pP4DpDOXAaA.client
-rw-r--r-- 1 www-data www-data    1309 Oct 20 21:56 20161020195647=jY9iwiPfUhB.client
-rw-r--r-- 1 www-data www-data 4194304 Oct 20 22:04 20161020200427=pP4DpDOXAaA.client
-rw-r--r-- 1 www-data www-data    1309 Oct 20 22:54 20161020205415=jY9iwiPfUhB.client

The cluster lists as healthy. (Yes, I'm aware one of the OSDs is currently down; the issue was there two months before it went down.)

$ ceph status
    cluster f13b6373-0cdc-4372-85a2-66bf2841e313
     health HEALTH_OK
     monmap e3: 3 mons at {ka01=172.16.0.11:6789/0,ka03=172.16.0.15:6789/0,ka04=172.16.0.17:6789/0}
            election epoch 36, quorum 0,1,2 ka01,ka03,ka04
      fsmap e1140219: 1/1/1 up {0=ka01=up:active}, 2 up:standby
     osdmap e1234338: 16 osds: 15 up, 15 in
            flags sortbitwise
      pgmap v2296058: 1216 pgs, 3 pools, 7343 GB data, 1718 kobjects
            14801 GB used, 19360 GB / 34161 GB avail
                1216 active+clean

Details:
- Ceph 10.2.2 (Ubuntu 16.04.1 packages)
- 4x servers, each with 4x OSDs on HDDs (mixture of 2T and 3T drives); journals on SSD
- 3x MONs and 3x MDSs
- Data is replicated 2x
- The only usage of the cluster is via CephFS

Kate
https://ch.linkedin.com/in/kate-ward-1119b9