Hi Tom,

Responding to this briefly so that people are in the loop; I'll have
more details in a blog post that I hope to get around to writing.

On 12/08/2019 11:34, Thomas Byrne - UKRI STFC wrote:
>> And bluestore should refuse to start if the configured limit is > 4GB.  Or 
>> something along those lines...
> 
> Just on this point - Bluestore OSDs will fail to start with an 
> osd_max_object_size >=4GB with a helpful error message about the Bluestore 
> hard limit. I was mildly amused when I discovered that luminous OSDs can 
> start with osd_max_object_size = 4GB - 1 byte, but mimic OSDs require it to 
> be <= 4GB - 2 bytes to start without an error. I haven't checked to see if 
> nautilus OSDs require <= 4GB - 3 bytes yet.

Yes, but that doesn't help users much for clusters where very large
objects already exist. Even in Luminous, osd_max_object_size defaults
to 128M, but if an OSD already holds objects larger than that, it will
still happily start up and serve data with FileStore, and it will
crash any newly added BlueStore OSD unfortunate enough to be mapped to
a PG containing one or more objects of 4GiB or larger.
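
If anyone wants to check whether they are affected before adding
BlueStore OSDs, something along the lines of the following python-rados
sketch should do. It's only a rough outline (the pool name is a
placeholder, and iterating over every object will obviously take a
while on a big pool):

import rados

LIMIT = 4 * 1024 ** 3  # BlueStore's 4 GiB hard limit on object size

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('mypool')  # placeholder pool name
    try:
        # Walk the default namespace; report anything at or above the limit.
        for obj in ioctx.list_objects():
            size, _mtime = ioctx.stat(obj.key)
            if size >= LIMIT:
                print('%s: %d bytes' % (obj.key, size))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()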

The pending PR to make this a scrub error even on FileStore OSDs
(https://github.com/ceph/ceph/pull/29579) mitigates the issue, but it
will still be an unwelcome surprise for people who have just updated
to a version including that fix and suddenly see tons of scrub errors;
they could easily be forgiven for assuming they've hit a regression
that produces false positives on scrub. "Hey, none of these errors
were here before the upgrade, surely there's a problem with the
software rather than my data!"

We've made further progress in the interim, and it looks like I can
give the all-clear on a couple of concerns we had:

1. It looks like these objects were not created by an RBD going
haywire, but by something that actually used librados to create them,
presumably long before the cluster ever went into production (a sketch
of how that can happen follows below).

2. I am not changing the subject line, so as not to mess up people's
list archives if their MUA doesn't thread correctly based on
In-Reply-To or References, but it's now evident that this is *not*
related to bug #38724; it really is just objects being too large for
BlueStore, as Sage said in his first reply.
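
For the record, creating such an object doesn't take anything exotic:
any librados client that keeps appending to the same object can do it,
as long as the OSD-side osd_max_object_size permits it (and its
pre-Luminous default was far above 4GiB). Here is a purely
illustrative python-rados sketch with made-up pool and object names;
please don't run this against a cluster you care about:

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('somepool')       # made-up pool name
chunk = b'\x00' * (4 * 1024 * 1024)          # 4 MiB per append
for _ in range(1280):                        # 1280 x 4 MiB = 5 GiB total
    ioctx.append('oversized-object', chunk)  # made-up object name
ioctx.close()
cluster.shutdown()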

Thanks for the answer. By the way, I have been imploring all my
colleagues to watch your Cephalocon talk,[1] which was excellent.

Cheers,
Florian

[1] https://youtu.be/niFNZN5EKvE
