Apologies; it seems that in order to shrink the device, the --allow-shrink
parameter must be used.
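
For reference, a minimal sketch of both directions, with a placeholder
pool/image name (--size is given in megabytes here):

    rbd resize --size 4096000 rbd/test-image                  # grow: no extra flag needed
    rbd resize --size 2048000 --allow-shrink rbd/test-image   # shrink: data beyond the new size is lost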

> On 12 Nov 2015, at 22:49, Jan Schermer <j...@schermer.cz> wrote:
> 
> xfs_growfs "autodetects" the block device size. You can force a re-read of the 
> block device to refresh this info, but it might not do anything at all.
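> 
> For illustration, a quick way to see what size the kernel currently believes
> the device has (the device name is just an example; --rereadpt only re-reads
> the partition table, so it is relevant only when partitions are involved):
> 
>     blockdev --getsize64 /dev/rbd1    # size in bytes, as the kernel sees it
>     blockdev --rereadpt /dev/rbd1     # force a re-read of the partition table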
> 
> There are situations when the block device size will not reflect reality - for 
> example, you can't (or at least couldn't) resize a partition that is in use 
> (mounted, mapped, used in LVM...) without serious hacks, and ioctls on that 
> partition will return the old size until you reboot.
> The block device can also simply lie (e.g. if you triggered a bug that made 
> the rbd device appear larger than it really is).
> Device-mapper devices have their own issues.
> 
> The only advice I can give is to never, ever shrink LUNs or block devices, and 
> to avoid partitions if you can. I usually set up a fairly large OS drive (with 
> oversized partitions to be safe; with thin provisioning they waste no real 
> space) and a separate data volume without any partitioning. This also works 
> around possible alignment issues...
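> 
> As a rough sketch of that layout (names and sizes here are made up):
> 
>     rbd create --size 204800 rbd/data-vol   # ~200 GB data image, no partition table
>     rbd map rbd/data-vol                    # shows up as e.g. /dev/rbd1
>     mkfs.xfs /dev/rbd1                      # filesystem on the whole device
> 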
> Growing is always safe; shrinking destroys data. I am very surprised that 
> "rbd resize" doesn't require a parameter like 
> "--i-really-really-know-what-i-am-doing --please-eatmydata" to shrink the 
> image (or does it at least ask for confirmation when shrinking? I can't try it 
> now). Making a typo == instawipe?
> 
> My bet would still be that the original image was larger and you shrunk it by 
> mistake. The kernel client most probably never gets the capacity change 
> notification, and you end up creating a filesystem that points outside the 
> device (I'm not sure whether mkfs.xfs actually tries seeking over the full 
> sector range). This is the most plausible explanation I can think of, but 
> anything is possible. I have other ideas if you want to investigate, but I'd 
> take it off-list...
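> 
> (A quick sanity check on the numbers quoted further down, assuming the kernel
> reports want/limit in 512-byte sectors:
> 
>     limit = 4194304 sectors x 512 B = 2147483648 B = 2 GiB
> 
> which is exactly the size the image was originally created with, so the mapped
> device apparently still reflects the pre-resize capacity.)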
> 
> Jan
> 
> P.S. Your image is not 2TB but rather 2000 GiB ;-)
> 
> 
> 
>> On 12 Nov 2015, at 22:10, Bogdan SOLGA <bogdan.so...@gmail.com> wrote:
>> 
>> Unfortunately I can no longer execute those commands for that rbd5, as I had 
>> to delete it; I couldn't 'resurrect' it, at least not in a reasonable amount 
>> of time.
>> 
>> Here is the output for another image, which is 2TB big:
>> 
>> ceph-admin@ceph-client-01:~$ sudo blockdev --getsz --getss --getbsz /dev/rbd1
>> 4194304000
>> 512
>> 512
>> 
>> ceph-admin@ceph-client-01:~$ xfs_info /dev/rbd1
>> meta-data=/dev/rbd2              isize=256    agcount=8127, agsize=64512 blks
>>          =                       sectsz=512   attr=2
>> data     =                       bsize=4096   blocks=524288000, imaxpct=25
>>          =                       sunit=1024   swidth=1024 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>> log      =internal               bsize=4096   blocks=2560, version=2
>>          =                       sectsz=512   sunit=8 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
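>> 
>> (For what it's worth, the two outputs above are consistent with each other,
>> assuming 512-byte sectors and 4096-byte XFS data blocks:
>> 
>>     4194304000 sectors x 512 B  = 2147483648000 B = 2000 GiB
>>     524288000 blocks   x 4096 B = 2147483648000 B = 2000 GiB
>> 
>> so the device and filesystem sizes match for this image.)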
>> 
>> 
>> I know rbd can also shrink the image, but I'm sure I haven't shrunk it. What 
>> I did try, accidentally, was to resize the image to the same size it 
>> previously had, and that operation failed after trying for some time. 
>> Hmm... I think the failed resize was the culprit for its malfunctioning, 
>> then.
>> 
>> Any (additional) advice on how to prevent this type of issue in the future? 
>> Should the resizing and the xfs_growfs be executed with some parameters, for 
>> a better configuration of the image and/or filesystem?
>> 
>> Thank you very much for your help!
>> 
>> Regards,
>> Bogdan
>> 
>> 
>> On Thu, Nov 12, 2015 at 11:00 PM, Jan Schermer <j...@schermer.cz> wrote:
>> Can you post the output of:
>> 
>> blockdev --getsz --getss --getbsz /dev/rbd5
>> and
>> xfs_info /dev/rbd5
>> 
>> rbd resize can actually (?) shrink the image as well - is it possible that 
>> the device was larger and you shrunk it?
>> 
>> Jan
>> 
>>> On 12 Nov 2015, at 21:46, Bogdan SOLGA <bogdan.so...@gmail.com> wrote:
>>> 
>>> By running rbd resize 
>>> <http://docs.ceph.com/docs/master/rbd/rados-rbd-cmds/> and then 'xfs_growfs 
>>> -d' on the filesystem.
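>>> 
>>> As a rough sketch of that sequence (pool/image name, size and mount point are
>>> placeholders; --size is in megabytes):
>>> 
>>>     rbd resize --size 2048000 rbd/my-image   # grow the image to 2000 GiB
>>>     blockdev --getsize64 /dev/rbd5           # check that the mapped device picked up the new size
>>>     xfs_growfs -d /mnt/rbd5                  # grow the mounted filesystem to fill the device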
>>> 
>>> Is there a better way to resize an RBD image and the filesystem?
>>> 
>>> On Thu, Nov 12, 2015 at 10:35 PM, Jan Schermer <j...@schermer.cz> wrote:
>>> 
>>>> On 12 Nov 2015, at 20:49, Bogdan SOLGA <bogdan.so...@gmail.com> wrote:
>>>> 
>>>> Hello Jan!
>>>> 
>>>> Thank you for your advices, first of all!
>>>> 
>>>> The filesystem was created using mkfs.xfs, after creating the RBD block 
>>>> device and mapping it on the Ceph client. I didn't specify any parameters 
>>>> when creating the filesystem; I just ran mkfs.xfs on the image name.
>>>> 
>>>> Regarding your point about the filesystem thinking the block device should 
>>>> be larger than it is: I initially created that image as a 2 GB image and 
>>>> then resized it to be much bigger. Could this be the issue?
>>> 
>>> Sounds more than likely :-) How exactly did you grow it?
>>> 
>>> Jan
>>> 
>>>> 
>>>> There are several RBD images mounted on one Ceph client, but only one of 
>>>> them had issues. I have made a clone, and I will try running fsck on it.
>>>> 
>>>> Fortunately it's not important data, just testing data. If I don't succeed 
>>>> in repairing it, I will of course trash and re-create it.
>>>> 
>>>> Thank you, once again!
>>>> 
>>>> 
>>>> 
>>>> On Thu, Nov 12, 2015 at 9:28 PM, Jan Schermer <j...@schermer.cz> wrote:
>>>> How did you create filesystems and/or partitions on this RBD block device?
>>>> The obvious causes would be
>>>> 1) you partitioned it, and the partition on which you ran mkfs points (or 
>>>> pointed, during mkfs) outside the block device size (this happens if, for 
>>>> example, you automate this and confuse sectors and cylinders, or if you 
>>>> copied the partition table with dd or from some image)
>>>> or
>>>> 2) mkfs created the filesystem with pointers outside of the block device 
>>>> for some other reason (bug?)
>>>> or
>>>> 3) this RBD device is a snapshot that got corrupted (or wasn't snapshotted 
>>>> in a crash-consistent state and you got "lucky") and some reference points 
>>>> to a nonsensical block number (fsck could fix this, but I wouldn't trust 
>>>> the data integrity anymore)
>>>> 
>>>> Basically the filesystem thinks the block device should be larger than it 
>>>> is and tries to reach beyond.
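>>>> 
>>>> One way to check for that mismatch, for illustration (the device name is 
>>>> just an example):
>>>> 
>>>>     blockdev --getsize64 /dev/rbd5   # device size in bytes
>>>>     xfs_info /dev/rbd5               # data blocks x bsize should not exceed the size above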
>>>> 
>>>> Is this just one machine or RBD image or is there more?
>>>> 
>>>> I'd first create a snapshot and then try running fsck on it; it should 
>>>> hopefully tell you whether there's a problem in the setup or a corruption.
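>>>> 
>>>> A sketch of that check using a protected snapshot and a clone, so the 
>>>> original image stays untouched (names are placeholders; cloning requires 
>>>> format 2 images, and the clone must not be mounted while xfs_repair runs):
>>>> 
>>>>     rbd snap create rbd/my-image@before-fsck
>>>>     rbd snap protect rbd/my-image@before-fsck
>>>>     rbd clone rbd/my-image@before-fsck rbd/my-image-check
>>>>     rbd map rbd/my-image-check     # shows up as e.g. /dev/rbd6
>>>>     xfs_repair -n /dev/rbd6        # -n = check only, do not modify anything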
>>>> 
>>>> If it's not important data and it's just one instance of this problem then 
>>>> I'd just trash and recreate it.
>>>> 
>>>> Jan
>>>> 
>>>>> On 12 Nov 2015, at 20:14, Bogdan SOLGA <bogdan.so...@gmail.com> wrote:
>>>>> 
>>>>> Hello everyone!
>>>>> 
>>>>> We have a recently installed Ceph cluster (v 0.94.5, Ubuntu 14.04), and 
>>>>> today I noticed a lot of 'attempt to access beyond end of device' 
>>>>> messages in the /var/log/syslog file. They are related to a mounted RBD 
>>>>> image, and have the following format:
>>>>> 
>>>>> Nov 12 21:06:44 ceph-client-01 kernel: [438507.952532] attempt to access 
>>>>> beyond end of device
>>>>> Nov 12 21:06:44 ceph-client-01 kernel: [438507.952534] rbd5: rw=33, 
>>>>> want=6193176, limit=4194304
>>>>> 
>>>>> After restarting that Ceph client, I see a lot of 'metadata I/O error' 
>>>>> messages in the boot log:
>>>>> 
>>>>> XFS (rbd5): metadata I/O error: block 0x46e001 
>>>>> ("xfs_buf_iodone_callbacks") error 5 numblks 1
>>>>> 
>>>>> Any idea why these messages are shown? The health of the cluster shows as 
>>>>> OK, and I can access that block device without (apparent) issues...
>>>>> 
>>>>> Thank you!
>>>>> 
>>>>> Regards,
>>>>> Bogdan
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
