Re: [ceph-users] Consistency problems when taking RBD snapshot
On 09/22/2016 06:36 PM, Ilya Dryomov wrote: > On Thu, Sep 15, 2016 at 3:18 PM, Ilya Dryomov wrote: >> On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov wrote: >>> >>> [snipped] >>> >>> cat /sys/bus/rbd/devices/47/client_id >>> client157729 >>> cat /sys/bus/rbd/devices/1/client_id >>> client157729 >>> >>> Client client157729 is alxc13, based on correlation by the ip address >>> shown by the rados -p ... command. So it's the only client where the rbd >>> images are mapped. >> >> Well, the watches are there, but cookie numbers indicate that they may >> have been re-established, so that's inconclusive. >> >> My suggestion would be to repeat the test and do repeated freezes to >> see if snapshot continues to follow HEAD. >> >> Further, to rule out a missed snap context update, repeat the test, but >> stick >> >> # echo 1 >/sys/bus/rbd/devices//refresh >> >> after "rbd snap create" (for the today's test, ID_OF_THE_ORIG_DEVICE >> would be 47). > > Hi Nikolay, > > Any news on this? Hello, I was on holiday hence the radio silence. Here is the latest set of tests that were run: Results: c11579 (100GB - used: 83GB): root@alxc13:~# rbd showmapped |grep c11579 47 rbd c11579 -/dev/rbd47 root@alxc13:~# fsfreeze -f /var/lxc/c11579 root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M) 12800+0 records in 12800+0 records out 107374182400 bytes (107 GB) copied, 686.382 s, 156 MB/s f2edb5abb100de30c1301b0856e595aa /dev/fd/63 root@alxc13:~# rbd snap create rbd/c11579@snap_test root@alxc13:~# rbd map c11579@snap_test /dev/rbd1 root@alxc13:~# echo 1 >/sys/bus/rbd/devices/47/refresh root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M) 12800+0 records in 12800+0 records out 107374182400 bytes (107 GB) copied, 915.225 s, 117 MB/s f2edb5abb100de30c1301b0856e595aa /dev/fd/63 root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M) 12800+0 records in 12800+0 records out 107374182400 bytes (107 GB) copied, 863.464 s, 143 MB/s f2edb5abb100de30c1301b0856e595aa /dev/fd/63 root@alxc13:~# file -s /dev/rbd1 /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files) root@alxc13:~# fsfreeze -u /var/lxc/c11579 root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M) 12800+0 records in 12800+0 records out 107374182400 bytes (107 GB) copied, 730.243 s, 147 MB/s 65294ce9eae5694a56054ec4af011264 /dev/fd/63 root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M) 12800+0 records in 12800+0 records out 107374182400 bytes (107 GB) copied, 649.373 s, 165 MB/s f2edb5abb100de30c1301b0856e595aa /dev/fd/63 30min later: root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M) 12800+0 records in 12800+0 records out 107374182400 bytes (107 GB) copied, 648.328 s, 166 MB/s f2edb5abb100de30c1301b0856e595aa /dev/fd/63 c12607 (30GB - used: 4GB): root@alxc13:~# rbd showmapped |grep c12607 39 rbd c12607 -/dev/rbd39 root@alxc13:~# fsfreeze -f /var/lxc/c12607 root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M) 3840+0 records in 3840+0 records out 32212254720 bytes (32 GB) copied, 228.2 s, 141 MB/s e6ce3ea688a778b9c732041164b4638c /dev/fd/63 root@alxc13:~# rbd snap create rbd/c12607@snap_test root@alxc13:~# rbd map c12607@snap_test /dev/rbd21 root@alxc13:~# rbd snap protect rbd/c12607@snap_test root@alxc13:~# echo 1 >/sys/bus/rbd/devices/39/refresh root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M) 3840+0 records in 3840+0 records out 32212254720 bytes (32 GB) copied, 217.138 s, 148 MB/s e6ce3ea688a778b9c732041164b4638c /dev/fd/63 root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct 
bs=8M) 3840+0 records in 3840+0 records out 32212254720 bytes (32 GB) copied, 212.254 s, 152 MB/s e6ce3ea688a778b9c732041164b4638c /dev/fd/63 root@alxc13:~# file -s /dev/rbd21 /dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files) root@alxc13:~# fsfreeze -u /var/lxc/c12607 root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M) 3840+0 records in 3840+0 records out 32212254720 bytes (32 GB) copied, 322.964 s, 99.7 MB/s 71c5efc24162452473cda50155cd4399 /dev/fd/63 root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M) 3840+0 records in 3840+0 records out 32212254720 bytes (32 GB) copied, 326.273 s, 98.7 MB/s e6ce3ea688a778b9c732041164b4638c /dev/fd/63 root@alxc13:~# file -s /dev/rbd21 /dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge files) root@alxc13:~# 30min later: root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M) 3840+0 records in 3840+0 records out 32212254720 bytes (32 GB) copied, 359.917 s, 89.5 MB/s e6ce3ea688a778b9c732041164b4638c /dev/fd/63 Everything seems consistent, but when an rsync was initiated from the snapshot it again failed. Unfortunately I deem those results rather unstable because they now contradict the ones which I showed you earlier with the differing checksums. > > Thanks, > > Ilya >
Re: [ceph-users] Consistency problems when taking RBD snapshot
On Thu, Sep 15, 2016 at 3:18 PM, Ilya Dryomov wrote: > On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov wrote: >> >> [snipped] >> >> cat /sys/bus/rbd/devices/47/client_id >> client157729 >> cat /sys/bus/rbd/devices/1/client_id >> client157729 >> >> Client client157729 is alxc13, based on correlation by the ip address >> shown by the rados -p ... command. So it's the only client where the rbd >> images are mapped. > > Well, the watches are there, but cookie numbers indicate that they may > have been re-established, so that's inconclusive. > > My suggestion would be to repeat the test and do repeated freezes to > see if snapshot continues to follow HEAD. > > Further, to rule out a missed snap context update, repeat the test, but > stick > > # echo 1 >/sys/bus/rbd/devices//refresh > > after "rbd snap create" (for the today's test, ID_OF_THE_ORIG_DEVICE > would be 47). Hi Nikolay, Any news on this? Thanks, Ilya ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Consistency problems when taking RBD snapshot
On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov wrote:
>
> [snipped]
>
> cat /sys/bus/rbd/devices/47/client_id
> client157729
> cat /sys/bus/rbd/devices/1/client_id
> client157729
>
> Client client157729 is alxc13, based on correlation by the ip address
> shown by the rados -p ... command. So it's the only client where the
> rbd images are mapped.

Well, the watches are there, but the cookie numbers indicate that they
may have been re-established, so that's inconclusive.

My suggestion would be to repeat the test and do repeated freezes to
see if the snapshot continues to follow HEAD.

Further, to rule out a missed snap context update, repeat the test, but
stick

# echo 1 >/sys/bus/rbd/devices/<ID_OF_THE_ORIG_DEVICE>/refresh

after "rbd snap create" (for today's test, ID_OF_THE_ORIG_DEVICE would
be 47).

Thanks,

                Ilya
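In script form, the suggested repeated-freeze test might look like the
following sketch (the device paths and mount point are the ones used in
this thread; the loop checks whether the snapshot's checksum stays fixed
while HEAD changes):

# rbd47 = HEAD of c11579, rbd1 = the mapped snap_test snapshot
for i in 1 2 3; do
    fsfreeze -f /var/lxc/c11579
    head_sum=$(md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M) | awk '{print $1}')
    snap_sum=$(md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M) | awk '{print $1}')
    fsfreeze -u /var/lxc/c11579
    echo "pass $i: head=$head_sum snap=$snap_sum"
    sleep 600    # let the container mutate HEAD between passes
done
# snap_sum should be identical across passes; if it keeps converging to
# the current head_sum instead, the snapshot is following HEAD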
Re: [ceph-users] Consistency problems when taking RBD snapshot
On 09/15/2016 03:15 PM, Ilya Dryomov wrote:
[snipped]

> root@alxc13:~# fsfreeze -u /var/lxc/c11579
> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 647.01 s, 166 MB/s
> 92b7182591d7d7380435cfdea79a8897 /dev/fd/63  <--- After unfreeze
> checksum is different - OK
> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 590.556 s, 182 MB/s
> bc3b68f0276c608d9435223f89589962 /dev/fd/63  <--- Why the heck the
> checksum of the snapshot is different after unfreeze? BAD?
> root@alxc13:~# file -s /dev/rbd1
> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (needs journal recovery)
> (extents) (large files) (huge files)
> root@alxc13:~#

And something even more peculiar - taking an md5sum some hours after
the above test produced this:

root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 636.836 s, 169 MB/s
e68e41616489d41544cd873c73defb08 /dev/fd/63

Meaning the read-only snapshot somehow has "mutated". E.g. it wasn't
recreated, just the same old snapshot. Is this normal?
Re: [ceph-users] Consistency problems when taking RBD snapshot
On 09/15/2016 01:24 PM, Ilya Dryomov wrote:
[snipped]
>
> Hrm, I wonder if it missed a snapshot context update. Please pastebin
> entire dmesg for that boot.

The machine has been up more than 2 and the dmesg has been rewritten
several times for that time. Also the node is rather busy, so there's
plenty of irrelevant stuff in the dmesg. Grepped for rbd1/0 and found
Re: [ceph-users] Consistency problems when taking RBD snapshot
On Thu, Sep 15, 2016 at 10:22 AM, Nikolay Borisov wrote:
[snipped]
>
> And something even more peculiar - taking an md5sum some hours after
> the above test produced this:
>
> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 636.836 s, 169 MB/s
> e68e41616489d41544cd873c73defb08 /dev/fd/63
>
> Meaning the read-only snapshot somehow has "mutated". E.g. it wasn't
> recreated, just the same old snapshot. Is this normal?

Hrm, I wonder if it missed a snapshot context update. Please pastebin
entire dmesg for that boot.

Have those devices been remapped or alxc13 rebooted since then? If not,
what's the output of

$ rados -p rbd listwatchers $(rbd info c11579 | grep block_name_prefix |
      awk '{ print $2 }' | sed 's/rbd_data/rbd_header/')

and can you check whether that snapshot is continuing to mutate as the
image is mutated - freeze /var/lxc/c11579 again and check rbd47 and
rbd1?

Thanks,

                Ilya
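In command form, that last check is roughly the following (a sketch;
rbd47 is the mapped HEAD of c11579 and rbd1 the mapped snapshot, per the
earlier messages):

fsfreeze -f /var/lxc/c11579
md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)   # HEAD while quiesced
md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)    # the supposedly immutable snapshot
fsfreeze -u /var/lxc/c11579
# a snapshot checksum that keeps changing along with HEAD means the
# snapshot is (incorrectly) tracking HEAD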
Re: [ceph-users] Consistency problems when taking RBD snapshot
On 09/14/2016 05:53 PM, Ilya Dryomov wrote:
[snipped]
>
> Sorry, for the filesystem device you should do
>
> md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)
>
> to get what's actually on disk, so that it's apples to apples.

root@alxc13:~# rbd showmapped |egrep "device|c11579"
id pool image  snap device
47 rbd  c11579 -    /dev/rbd47
root@alxc13:~# fsfreeze -f /var/lxc/c11579
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 617.815 s, 174 MB/s
2ddc99ce1b3ef51da1945d9da25ac296 /dev/fd/63  <--- Checksum after freeze
root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test
/dev/rbd1
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 610.043 s, 176 MB/s
2ddc99ce1b3ef51da1945d9da25ac296 /dev/fd/63  <--- Checksum of snapshot
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 592.164 s, 181 MB/s
2ddc99ce1b3ef51da1945d9da25ac296 /dev/fd/63  <--- Checksum of original
device, not changed - GOOD
root@alxc13:~# file -s /dev/rbd1
/dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files)
(huge files)
root@alxc13:~# fsfreeze -u /var/lxc/c11579
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 647.01 s, 166 MB/s
92b7182591d7d7380435cfdea79a8897 /dev/fd/63  <--- After unfreeze
checksum is different - OK
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 590.556 s, 182 MB/s
bc3b68f0276c608d9435223f89589962 /dev/fd/63  <--- Why the heck the
checksum of the snapshot is different after unfreeze? BAD?
root@alxc13:~# file -s /dev/rbd1
/dev/rbd1: Linux rev 1.0 ext4 filesystem data (needs journal recovery)
(extents) (large files) (huge files)
root@alxc13:~#
Re: [ceph-users] Consistency problems when taking RBD snapshot
On Wed, Sep 14, 2016 at 3:30 PM, Nikolay Borisov wrote:
[snipped]
>
> Here is what the checksum tests showed:
>
> fsfreeze -f /mountpoint
> md5sum /dev/rbd0
> f33c926373ad604da674bcbfbe6460c5 /dev/rbd0
> rbd snap create xx@xxx && rbd snap protect xx@xxx
> rbd map xx@xxx
> md5sum /dev/rbd1
> 6f702740281874632c73aeb2c0fcf34a /dev/rbd1
>
> where rbd1 is a snapshot of the rbd0 device. So the checksum is indeed
> different, worrying.

Sorry, for the filesystem device you should do

md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)

to get what's actually on disk, so that it's apples to apples.

Thanks,

                Ilya
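The iflag=direct matters here: a plain md5sum of the block device reads
through the page cache, so the HEAD checksum can reflect cached state
rather than the bytes the snapshot actually captured. A minimal sketch
of the like-for-like comparison, using the device names from the test
above:

# read both devices with O_DIRECT so both checksums reflect on-disk
# bytes, not the page cache
md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)   # HEAD image
md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)   # snapshot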
Re: [ceph-users] Consistency problems when taking RBD snapshot
On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
[snipped]
>
> I *thought* it should (well, except for orphan inodes), but now I'm not
> sure. Have you tried reproducing with loop devices yet?

Here is what the checksum tests showed:

fsfreeze -f /mountpoint
md5sum /dev/rbd0
f33c926373ad604da674bcbfbe6460c5 /dev/rbd0
rbd snap create xx@xxx && rbd snap protect xx@xxx
rbd map xx@xxx
md5sum /dev/rbd1
6f702740281874632c73aeb2c0fcf34a /dev/rbd1

where rbd1 is a snapshot of the rbd0 device. So the checksum is indeed
different, which is worrying.
Re: [ceph-users] Consistency problems when taking RBD snapshot
On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
[snipped]
>
> I *thought* it should (well, except for orphan inodes), but now I'm not
> sure. Have you tried reproducing with loop devices yet?

Unfortunately not yet, since this is being tested in our production
setup, which is non-trivial to replicate in a test environment. Tonight
the results of the checksumming experiments should be available.

While on the topic, this might very well be caused by a race in the
fsfreeze code, as seen here: https://lkml.org/lkml/2016/9/12/337

Also, this is observed only on large and busy volumes (e.g. testing
with a 10g volume which is not very busy doesn't exhibit the
corruption).
Re: [ceph-users] Consistency problems when taking RBD snapshot
On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov wrote:
> On 09/14/2016 09:55 AM, Adrian Saul wrote:
> [snipped]
>>
>> It's impossible without clones to do it without norecovery.
>
> But shouldn't freezing the fs and doing a snapshot constitute a "clean
> unmount", hence no need to recover on the next mount (of the snapshot) -
> Ilya?

I *thought* it should (well, except for orphan inodes), but now I'm not
sure. Have you tried reproducing with loop devices yet?

Thanks,

                Ilya
Re: [ceph-users] Consistency problems when taking RBD snapshot
> But shouldn't freezing the fs and doing a snapshot constitute a "clean
> unmount", hence no need to recover on the next mount (of the snapshot) -
> Ilya?

It's what I thought as well, but XFS seems to want to attempt to replay
the log regardless on mount, and it needs to write to the device to do
so. This was the only way I found to mount it without converting the
snapshot to a clone (which I couldn't do with the image options enabled
anyway). I have this script snapshotting, mounting and backing up
multiple file systems on my cluster with no issue.
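For ext4, which is what the affected images in this thread use, the
closest equivalent would be the noload mount option, which skips
loading/replaying the journal entirely. A sketch, assuming the snapshot
is mapped at /dev/rbd1 and /mnt/snap is a scratch mount point:

# ro + noload: mount the ext4 snapshot without replaying the journal,
# so no writes are attempted against the read-only device
mount -o ro,noload /dev/rbd1 /mnt/snap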
Re: [ceph-users] Consistency problems when taking RBD snapshot
On 09/14/2016 09:55 AM, Adrian Saul wrote:
>
> I found I could ignore the XFS issues and just mount it with the
> appropriate options (below from my backup scripts):
>
> [snipped]
>
> It's impossible without clones to do it without norecovery.

But shouldn't freezing the fs and doing a snapshot constitute a "clean
unmount", hence no need to recover on the next mount (of the snapshot) -
Ilya?
Re: [ceph-users] Consistency problems when taking RBD snapshot
I found I could ignore the XFS issues and just mount it with the
appropriate options (below from my backup scripts):

        #
        # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
        #
        if ! mount -o ro,nouuid,norecovery $SNAPDEV /backup${FS}; then
                echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - cleaning up"
                rbd unmap $SNAPDEV
                rbd snap rm ${RBDPATH}@${DATESTAMP}
                exit 3;
        fi
        echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"

It's impossible without clones to do it without norecovery.
Re: [ceph-users] Consistency problems when taking RBD snapshot
On Tue, Sep 13, 2016 at 4:11 PM, Nikolay Borisov wrote:
> On 09/13/2016 04:30 PM, Ilya Dryomov wrote:
> [SNIP]
>>
>> Could you please provide the output for the above?
>
> Here you go : http://paste.ubuntu.com/23173721/

OK, so that explains it: the frozen filesystem is "needs journal
recovery", so mounting it off of a read-only block device leads to
errors.

root@alxc13:~# fsfreeze -f /var/lxc/c11579
root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test
/dev/rbd151
root@alxc13:~# fsfreeze -u /var/lxc/c11579
root@alxc13:~# file -s /dev/rbd151
/dev/rbd151: Linux rev 1.0 ext4 filesystem data (needs journal
recovery) (extents) (large files) (huge files)

Now, to isolate the problem, the easiest would probably be to try to
reproduce it with loop devices. Can you try dding one of these images
to a file, make sure that the filesystem is clean, losetup + mount,
freeze, make a "snapshot" with cp and losetup -r + mount?

Try sticking file -s before unfreeze and also compare md5sums:

root@alxc13:~# fsfreeze -f /var/lxc/c11579
<md5sum the device>
root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test
<md5sum the device>
root@alxc13:~# file -s /dev/rbd151
root@alxc13:~# fsfreeze -u /var/lxc/c11579
<md5sum the device>
root@alxc13:~# file -s /dev/rbd151

Thanks,

                Ilya
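A sketch of that loop-device reproduction, under the stated assumptions
(the file paths and mount points are placeholders, and /dev/rbd47
stands in for one of the affected images):

# copy the image to a file and verify the filesystem is clean
dd if=/dev/rbd47 of=/var/tmp/img.ext4 bs=8M iflag=direct
fsck.ext4 -n /var/tmp/img.ext4

# attach and mount the copy as "HEAD"
LOOP=$(losetup -f --show /var/tmp/img.ext4)
mount "$LOOP" /mnt/orig

# freeze, take a "snapshot" with cp, unfreeze
fsfreeze -f /mnt/orig
cp /var/tmp/img.ext4 /var/tmp/snap.ext4
fsfreeze -u /mnt/orig

# expose the copy read-only, as rbd map does for a snapshot
SNAPLOOP=$(losetup -f --show -r /var/tmp/snap.ext4)
file -s "$SNAPLOOP"
fsck.ext4 -n "$SNAPLOOP"
mount "$SNAPLOOP" /mnt/snap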
Re: [ceph-users] Consistency problems when taking RBD snapshot
On 09/13/2016 04:30 PM, Ilya Dryomov wrote:
[SNIP]
>
> Hmm, it could be about whether it is able to do journal replay on
> mount. When you mount a snapshot, you get a read-only block device;
> when you mount a clone image, you get a read-write block device.
>
> Let's try this again, suppose the image is foo and the snapshot is snap:
>
> # fsfreeze -f /mnt
>
> # rbd snap create foo@snap
> # rbd map foo@snap
> /dev/rbd0
> # file -s /dev/rbd0
> # fsck.ext4 -n /dev/rbd0
> # mount /dev/rbd0 /foo
> # umount /foo
>
> # file -s /dev/rbd0
> # fsck.ext4 -n /dev/rbd0
>
> # rbd clone foo@snap bar
> $ rbd map bar
> /dev/rbd1
> # file -s /dev/rbd1
> # fsck.ext4 -n /dev/rbd1
> # mount /dev/rbd1 /bar
> # umount /bar
>
> # file -s /dev/rbd1
> # fsck.ext4 -n /dev/rbd1
>
> Could you please provide the output for the above?

Here you go : http://paste.ubuntu.com/23173721/

[SNIP]
Re: [ceph-users] Consistency problems when taking RBD snapshot
On 09/13/2016 01:33 PM, Ilya Dryomov wrote:
> On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov wrote:
>> Hello list,
>>
>> I have the following cluster:
>>
>> ceph status
>>     cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>>      health HEALTH_OK
>>      monmap e2: 5 mons at
>> {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
>>             election epoch 196, quorum 0,1,2,3,4 alxc10,alxc5,alxc6,alxc7,alxc11
>>      mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
>>      osdmap e11243: 50 osds: 50 up, 50 in
>>       pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
>>             4323 GB used, 85071 GB / 89424 GB avail
>>                 8192 active+clean
>>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>>
>> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>> and kernel 4.4.14.
>>
>> I have multiple rbd devices which are used as the root for lxc-based
>> containers, with ext4 on top. At some point I want to create an rbd
>> snapshot; the sequence of operations I use is:
>>
>> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted
>
> fsfreeze?

Yes, indeed, my bad.

>> 2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}"
>>
>> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>>
>> <= At this point normal container operation continues =>
>>
>> 4. Mount the newly created snapshot to a 2nd location as read-only and
>> rsync the files from it to a remote server.
>>
>> However, as I start rsyncing to the remote server, certain files in the
>> snapshot are reported as corrupted.
>
> Can you share some dmesg snippets? Is there a pattern - the same
> file/set of files, etc?

[1718059.910038] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
[1718060.044540] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #52269: comm rsync: deleted inode referenced: 46393
[1718060.044978] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718060.045246] rbd: rbd143: write 1000 at 0 result -30
[1718060.045249] blk_update_request: I/O error, dev rbd143, sector 0
[1718060.045487] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
[1718071.404057] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #385038: comm rsync: deleted inode referenced: 46581
[1718071.404466] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.404739] rbd: rbd143: write 1000 at 0 result -30
[1718071.404742] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.404999] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
[1718071.419172] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #769039: comm rsync: deleted inode referenced: 410848
[1718071.419575] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.419844] rbd: rbd143: write 1000 at 0 result -30
[1718071.419847] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.420081] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
[1718071.420758] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #769039: comm rsync: deleted inode referenced: 410848
[1718071.421196] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.421441] rbd: rbd143: write 1000 at 0 result -30
[1718071.421443] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.421671] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
[1718071.543020] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #52269: comm rsync: deleted inode referenced: 46393
[1718071.543422] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.543680] rbd: rbd143: write 1000 at 0 result -30
[1718071.543682] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.543945] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
[1718083.388635] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #385038: comm rsync: deleted inode referenced: 46581
[1718083.389060] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718083.389324] rbd: rbd143: write 1000 at 0 result -30
[1718083.389327] blk_update_request: I/O error, dev rbd143, sector 0
[1718083.389561] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
[1718083.403910] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #769039: comm rsync: deleted inode referenced: 410848
[1718083.404319] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718083.404581] rbd: rbd143: write 1000 at 0 result -30
[1718083.404583] blk_update_request: I/O error, dev rbd143, sector 0
[1718083.404816] Buffer I/O error on dev rbd143, logical block 0, lost sync page write
[1718083.405484] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #769039: comm rsync: deleted inode referenced: 410848
[1718083.405893] EXT4-fs (rbd143): previous I/O error to superblock detected
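A note on the rbd errors interleaved with the ext4 complaints above: result -30 is almost certainly -EROFS ("Read-only file system"). A mapped snapshot is read-only at the block layer, so when ext4 trips over the "deleted inode referenced" inconsistency and tries to record the error in the superblock (the write at offset 0), rbd rejects the write, which is what produces the blk_update_request and "lost sync page write" lines. The errno is easy to double-check (assuming python3 is available; any errno table will do):

    python3 -c 'import os; print(os.strerror(30))'
    # prints: Read-only file system

In other words, the write errors are a consequence of mounting an already-inconsistent snapshot, not a separate failure.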
Re: [ceph-users] Consistency problems when taking RBD snapshot
On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov wrote:
> Hello list,
>
> I have the following cluster:
>
> ceph status
>     cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>      health HEALTH_OK
>      monmap e2: 5 mons at
> {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
>             election epoch 196, quorum 0,1,2,3,4 alxc10,alxc5,alxc6,alxc7,alxc11
>      mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
>      osdmap e11243: 50 osds: 50 up, 50 in
>       pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
>             4323 GB used, 85071 GB / 89424 GB avail
>                 8192 active+clean
>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>
> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
> and kernel 4.4.14.
>
> I have multiple rbd devices which are used as the root for lxc-based
> containers, with ext4 on top. At some point I want to create an rbd
> snapshot; the sequence of operations I use is:
>
> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted

fsfreeze?

> 2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}"
>
> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>
> <= At this point normal container operation continues =>
>
> 4. Mount the newly created snapshot to a 2nd location as read-only and
> rsync the files from it to a remote server.
>
> However, as I start rsyncing to the remote server, certain files in the
> snapshot are reported as corrupted.

Can you share some dmesg snippets? Is there a pattern - the same
file/set of files, etc?

> freezefs implies filesystem syncing. I also tested with manually doing
> sync/syncfs on the fs which is being snapshotted, before and after the
> freezefs, and the corruption is still present. So it's unlikely there
> are dirty buffers in the page cache. I'm using the kernel rbd driver
> for the clients. The current theory is that there are caches, other
> than the Linux page cache, which are not being flushed. Reading the
> docs implies that only librbd uses separate caching, but I'm not using
> librbd.

What happens if you run fsck -n on the snapshot (ro mapping)?

What happens if you create a clone from the snapshot and run fsck on it (rw mapping)?

What happens if you mount the clone without running fsck and run rsync?

Can you try taking more than one snapshot and then compare them?

Thanks,

                Ilya
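The suggested checks could be scripted roughly as follows (a sketch, not from the thread itself; "rbd/c11579" and "snap_test" are example names, fsck.ext4 is assumed since the filesystems are ext4, and cloning assumes format 2 images, which clones require):

    # fsck the snapshot directly; snapshot mappings are read-only, so
    # only a no-change check (-n) can run against them
    DEV=$(rbd map rbd/c11579@snap_test)
    fsck.ext4 -f -n "$DEV"
    rbd unmap "$DEV"

    # for a read-write fsck, clone the snapshot first (cloning requires
    # the snapshot to be protected) and repair the clone
    rbd snap protect rbd/c11579@snap_test
    rbd clone rbd/c11579@snap_test rbd/c11579_fsck
    DEV=$(rbd map rbd/c11579_fsck)
    fsck.ext4 -f -y "$DEV"

    # for the third question, take a fresh clone, skip fsck, mount it,
    # and rerun the rsync to see whether the same files come up corrupted
    rbd clone rbd/c11579@snap_test rbd/c11579_rsync
    DEV=$(rbd map rbd/c11579_rsync)
    mount "$DEV" /mnt/clone-test

The last suggestion (comparing snapshots taken back to back) can be approximated by md5summing the two mapped snapshot devices.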
[ceph-users] Consistency problems when taking RBD snapshot
Hello list,

I have the following cluster:

ceph status
    cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
     health HEALTH_OK
     monmap e2: 5 mons at
{alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
            election epoch 196, quorum 0,1,2,3,4 alxc10,alxc5,alxc6,alxc7,alxc11
     mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
     osdmap e11243: 50 osds: 50 up, 50 in
      pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
            4323 GB used, 85071 GB / 89424 GB avail
                8192 active+clean
  client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s

It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
and kernel 4.4.14.

I have multiple rbd devices which are used as the root for lxc-based
containers, with ext4 on top. At some point I want to create an rbd
snapshot; the sequence of operations I use is:

1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted

2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}"

3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted

<= At this point normal container operation continues =>

4. Mount the newly created snapshot to a 2nd location as read-only and
rsync the files from it to a remote server.

However, as I start rsyncing to the remote server, certain files in the
snapshot are reported as corrupted.

freezefs implies filesystem syncing. I also tested with manually doing
sync/syncfs on the fs which is being snapshotted, before and after the
freezefs, and the corruption is still present. So it's unlikely there are
dirty buffers in the page cache. I'm using the kernel rbd driver for the
clients. The current theory is that there are caches, other than the
Linux page cache, which are not being flushed. Reading the docs implies
that only librbd uses separate caching, but I'm not using librbd.

Any ideas would be much appreciated.

Regards,
Nikolay
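For concreteness, the four steps above amount to roughly the following (a sketch with placeholder pool, image, snapshot, and mount point names; fsfreeze is the actual util-linux tool behind the "freezefs" typo discussed above):

    POOL=rbd                     # placeholder pool name
    IMG=c11579                   # placeholder image name
    SNAP=snap_$(date +%F)        # placeholder snapshot name
    MNT=/var/lxc/c11579          # where the ext4 on top of the rbd is mounted

    fsfreeze -f "$MNT"                      # 1. quiesce writes and flush ext4
    rbd snap create "$POOL/$IMG@$SNAP"      # 2. take the snapshot
    fsfreeze -u "$MNT"                      # 3. let the container resume

    # 4. map the snapshot (read-only), mount it elsewhere, rsync it off;
    #    if the snapshot contains an unreplayed journal, ext4 may need
    #    "-o ro,noload" to mount from a read-only device
    DEV=$(rbd map "$POOL/$IMG@$SNAP")
    mount -o ro "$DEV" /mnt/snap-backup
    rsync -a /mnt/snap-backup/ remote:/backups/"$IMG"/
    umount /mnt/snap-backup
    rbd unmap "$DEV"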