On Wed, Jan 20, 2021 at 6:01 PM Peter Lieven <p...@kamp.de> wrote:
>
>
> > On 19.01.2021 at 15:20, Jason Dillaman <jdill...@redhat.com> wrote:
> >
> > On Tue, Jan 19, 2021 at 4:36 AM Peter Lieven <p...@kamp.de> wrote:
> >>> On 18.01.21 at 23:33, Jason Dillaman wrote:
> >>> On Fri, Jan 15, 2021 at 10:39 AM Peter Lieven <p...@kamp.de> wrote:
> >>>> On 15.01.21 at 16:27, Jason Dillaman wrote:
> >>>>> On Thu, Jan 14, 2021 at 2:59 PM Peter Lieven <p...@kamp.de> wrote:
> >>>>>> On 14.01.21 at 20:19, Jason Dillaman wrote:
> >>>>>>> On Sun, Dec 27, 2020 at 11:42 AM Peter Lieven <p...@kamp.de> wrote:
> >>>>>>>> since we implement byte interfaces and librbd supports aio on byte
> >>>>>>>> granularity we can lift the 512 byte alignment.
> >>>>>>>> Signed-off-by: Peter Lieven <p...@kamp.de>
> >>>>>>>> ---
> >>>>>>>> block/rbd.c | 2 --
> >>>>>>>> 1 file changed, 2 deletions(-)
> >>>>>>>> diff --git a/block/rbd.c b/block/rbd.c
> >>>>>>>> index 27b4404adf..8673e8f553 100644
> >>>>>>>> --- a/block/rbd.c
> >>>>>>>> +++ b/block/rbd.c
> >>>>>>>> @@ -223,8 +223,6 @@ done:
> >>>>>>>>  static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error **errp)
> >>>>>>>>  {
> >>>>>>>>      BDRVRBDState *s = bs->opaque;
> >>>>>>>> -    /* XXX Does RBD support AIO on less than 512-byte alignment? */
> >>>>>>>> -    bs->bl.request_alignment = 512;
> >>>>>>> Just a suggestion, but perhaps improve discard alignment, max discard,
> >>>>>>> optimal alignment (if that's something QEMU handles internally) if not
> >>>>>>> overridden by the user.
> >>>>>> Qemu supports max_discard and discard_alignment. Is there a call to
> >>>>>> get these limits from librbd?
> >>>>>> What do you mean by optimal_alignment? The object size?
> >>>>> krbd does a good job of initializing defaults [1] where optimal and
> >>>>> discard alignment is 64KiB (can actually be 4KiB now), max IO size for
> >>>>> writes, discards, and write-zeroes is the object size * the stripe
> >>>>> count.
> >>>> Okay, I will have a look at it. If qemu issues a write, discard, or
> >>>> write_zero greater than obj_size * stripe count, will librbd split it
> >>>> internally or will the request fail?
> >>> librbd will handle it as needed. My goal is really just to get the
> >>> hints down to the guest OS.
> >>>> Regarding the alignment it seems that rbd_dev->opts->alloc_size is
> >>>> something that comes from the device configuration and not from rbd?
> >>>> I don't have that information inside the Qemu RBD driver.
> >>> librbd doesn't really have the information either. The 64KiB guess
> >>> that krbd uses was a compromise since that was the default OSD
> >>> allocation size for HDDs since Luminous. Starting with Pacific that
> >>> default is going down to 4KiB.
> >> I will try to adjust these values as far as it is possible and makes sense.
> >> Is there a way to check the minimum supported OSD release in the backend
> >> from librbd / librados?
> >
> > It's not a minimum -- RADOS will gladly accept 1 byte writes as well.
> > It's really just the optimal (performance and space-wise). Sadly,
> > there is no realistic way to query this data from the backend.
>
> So you would suggest to advertise an optimal transfer length of 64k and max
> transfer length of obj size * stripe count to the guest unless we have an API
> in the future to query these limits from the backend?

I'll open a Ceph tracker ticket to expose these via the API in a future release.

> I would leave request alignment at 1 byte as otherwise Qemu will issue RMWs
> for all write requests that do not align. Everything that comes from a guest
> OS is very likely 4k aligned anyway.

My goal is definitely not to have QEMU do any extra work for splitting or
aligning IOs. I am really only trying to get hints passed down to the guest
via the virtio drivers. If there isn't the plumbing in QEMU for that yet,
disregard.

> Peter
>
>

--
Jason
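
For illustration only, the hints discussed in this thread could be derived
roughly as sketched below. This is not QEMU or librbd code: the RbdIoHints
struct, the RBD_ASSUMED_ALLOC_SIZE constant, and both helper functions are
hypothetical names invented for the example. The 64 KiB value is the krbd
guess mentioned above, and obj_size and stripe_count are assumed to have been
obtained from the image by whatever means the driver has available. The second
helper shows why a 512-byte request alignment would cost read-modify-write
work for unaligned requests, while an alignment of 1 byte does not.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64 KiB, the krbd default guess quoted in the thread. */
#define RBD_ASSUMED_ALLOC_SIZE (64u * 1024u)

/* Hypothetical container for the advertised hints; not a QEMU type. */
typedef struct RbdIoHints {
    uint32_t request_alignment;  /* bytes; 1 means no alignment requirement */
    uint32_t opt_transfer;       /* bytes; preferred I/O size */
    uint32_t discard_alignment;  /* bytes; preferred discard granularity */
    uint64_t max_transfer;       /* bytes; largest I/O advertised */
    uint64_t max_discard;        /* bytes; largest discard advertised */
} RbdIoHints;

/*
 * Derive hints from the image layout: maximum sizes are
 * object size * stripe count, alignments are the assumed allocation size.
 * These are hints only; larger or unaligned requests would still be valid.
 */
static RbdIoHints rbd_guess_io_hints(uint64_t obj_size, uint64_t stripe_count)
{
    assert(obj_size > 0 && stripe_count > 0);

    RbdIoHints h;
    uint64_t stripe_period = obj_size * stripe_count;

    h.request_alignment = 1;  /* byte granularity, so no RMW is forced */
    h.opt_transfer = RBD_ASSUMED_ALLOC_SIZE;
    h.discard_alignment = RBD_ASSUMED_ALLOC_SIZE;
    h.max_transfer = stripe_period;
    h.max_discard = stripe_period;
    return h;
}

/*
 * Number of bytes that must be touched when a request [offset, offset+bytes)
 * is widened to the given alignment. Anything beyond 'bytes' is data that
 * would have to be read and rewritten around the caller's payload.
 */
static uint64_t rmw_span(uint64_t offset, uint64_t bytes, uint32_t alignment)
{
    assert(alignment > 0);

    uint64_t start = offset - offset % alignment;  /* round start down */
    uint64_t end = offset + bytes;
    uint64_t rem = end % alignment;

    if (rem != 0) {
        end += alignment - rem;                    /* round end up */
    }
    return end - start;
}
```

With a 4 MiB object size and a stripe count of 1, the sketch yields a maximum
transfer of 4 MiB and an optimal transfer of 64 KiB. A 10-byte write at offset
100 spans 512 bytes under a 512-byte request alignment, but only its own 10
bytes when the alignment is 1. In QEMU these values would presumably map onto
the BlockLimits fields in bs->bl (request_alignment, opt_transfer,
max_transfer, max_pdiscard, pdiscard_alignment); whether the guest ever sees
them depends on the plumbing questioned above.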