On 22 March 2017 at 05:51, Dan van der Ster <d...@vanderster.com> wrote:
> On Wed, Mar 22, 2017 at 8:24 AM, Marcus Furlong <furlo...@gmail.com> wrote:
>> Hi,
>>
>> I'm experiencing the same issue as outlined in this post:
>>
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013330.html
>>
>> I have also deployed this jewel cluster using ceph-deploy.
>>
>> This is the message I see at boot (happens for all drives, on all OSD nodes):
>>
>> [ 92.938882] XFS (sdi1): Mounting V5 Filesystem
>> [ 93.065393] XFS (sdi1): Ending clean mount
>> [ 93.175299] attempt to access beyond end of device
>> [ 93.175304] sdi1: rw=0, want=19134412768, limit=19134412767
>>
>> and again while the cluster is in operation:
>>
>> [429280.254400] attempt to access beyond end of device
>> [429280.254412] sdi1: rw=0, want=19134412768, limit=19134412767
>>
>
> We see these as well, and I'm also curious what's causing it. Perhaps
> sgdisk is doing something wrong when creating the ceph-data partition?

Apologies for reviving an old thread, but I figured out what happened and
never documented it, so I thought an update might be useful.

The disk layout I've ascertained is as follows:

sector 0 = protective MBR (or empty)
sectors 1 to 33 = GPT (33 sectors)
sectors 34 to 2047 = free (as confirmed by sgdisk -f -E)
sectors 2048 to 19134414814 (19134412767 sectors: Data Partition 1)
sectors 19134414815 to 19134414847 (33 sectors: GPT backup data)
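As a sanity check, the layout above adds up (shell arithmetic, using the sector counts from the layout):

```shell
# First data sector, partition length, and GPT backup size, from the layout above
start=2048
part_sectors=19134412767
gpt_backup=33

echo $((start + part_sectors - 1))            # 19134414814, last data sector
echo $((start + part_sectors + gpt_backup))   # 19134414848, total disk sectors
```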

And the error:

[ 92.938882] XFS (sdi1): Mounting V5 Filesystem
[ 93.065393] XFS (sdi1): Ending clean mount
[ 93.175299] attempt to access beyond end of device
[ 93.175304] sdi1: rw=0, want=19134412768, limit=19134412767

This shows that the error occurs when XFS tries to access sector 19134412768
of partition 1, which, as the layout above shows, doesn't exist.
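A quick check with the numbers from the kernel message confirms two things: the
rejected sector is exactly one past the end of the partition, and the partition
length is not a whole number of 8-sector (4 KiB) blocks:

```shell
# Numbers from the dmesg line: sdi1: rw=0, want=19134412768, limit=19134412767
limit=19134412767   # partition size in 512-byte sectors
want=19134412768    # sector XFS tried to access

echo $((want - limit))   # 1: the access is one sector past the end
echo $((limit % 8))      # 7: leftover sectors after the last full 4 KiB block
```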

I noticed that the file system size is 3.5KiB less than the size of the
partition, and the XFS block size is 4KiB.

EMDS = 19134412767 * 512  = 9796819336704 <- actual partition size in bytes
CDS  = 9567206383 * 1024  = 9796819336192 (512 bytes less than EMDS)
       <- /proc/partitions, which uses 1024-byte units and so rounds down
FSS  = 2391801595 * 4096  = 9796819333120 (3072 bytes less than CDS)
       <- filesystem size

It turns out that if I create a partition whose size is an exact multiple of
the 4 KiB XFS block size, the error does not occur, i.e. there is no error
when the filesystem both starts _and_ ends on a 4 KiB boundary.

For example, the following partition causes no issue. It is 7 sectors smaller
than the one referenced above, which makes its size an exact multiple of 8
sectors (4 KiB).

# sgdisk --new=0:2048:19134414807 -- /dev/sdi
Creating new GPT entries.
The operation has completed successfully.

# sgdisk -p /dev/sdi
Disk /dev/sdi: 19134414848 sectors, 8.9 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 3E61A8BA-838A-4D7E-BB8E-293972EB45AE
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 19134414814
Partitions will be aligned on 2048-sector boundaries
Total free space is 2021 sectors (1010.5 KiB)

When the end of the partition is not aligned to the 4KiB blocks used by
XFS, the error occurs. This explains why the defaults from parted work
correctly, as the 1MiB "padding" is 4K-aligned.
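For this disk, the aligned end sector can be computed as follows (a sketch;
the start sector and last usable sector are taken from the sgdisk output
above):

```shell
start=2048               # first sector of the data partition
last_usable=19134414814  # last usable sector reported by sgdisk
blk=8                    # one 4 KiB XFS block = 8 x 512-byte sectors

size=$((last_usable - start + 1))   # 19134412767 sectors available
aligned=$((size / blk * blk))       # round down to whole 4 KiB blocks
echo $((start + aligned - 1))       # 19134414807, the end sector used above
```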

This non-alignment happens because ceph-deploy uses sgdisk, and sgdisk
seems to align the start of the partition with 2048-sector boundaries, but
_not_ the end of the partition, when used with the -L parameter.

The fix was to recreate the partition table, and reduce the unused sectors
down to the max filesystem size:

https://gist.github.com/furlongm/292aefa930f40dc03f21693d1fc19f35

In my testing, I could only reproduce this with XFS, not with other
filesystems. It can also be reproduced on smaller XFS filesystems, but it
takes longer to trigger.

Cheers,
Marcus.
-- 
Marcus Furlong
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
