On Sat, Jan 3, 2026 at 6:14 AM Ming Lei <[email protected]> wrote:
>
> On Fri, Jan 02, 2026 at 05:45:14PM -0700, Caleb Sander Mateos wrote:
> > Indicate to the ublk server when an incoming request has integrity data
> > by setting UBLK_IO_F_INTEGRITY in the ublksrv_io_desc's op_flags field.
> > If the ublk device doesn't support integrity, the request will never
> > provide integrity data. If the ublk device supports integrity, the
> > request may omit the integrity buffer only if metadata_size matches the
> > PI tuple size determined by csum_type. In this case, the ublk server
> > should internally generate/verify the protection information from the
> > data and sector offset.
> > Set the UBLK_IO_F_CHECK_{GUARD,REFTAG,APPTAG} flags based on the
> > request's BIP_CHECK_{GUARD,REFTAG,APPTAG} flags, indicating whether to
> > verify the guard, reference, and app tags in the protection information.
> > The expected reference tag (32 or 48 bits) and app tag (16 bits) are
> > indicated in ublksrv_io_desc's new struct ublksrv_io_integrity integrity
> > field. This field is unioned with the addr field to avoid changing the
>
> It might be fine to set per-rq app_tag, but bios in one request might have
> different app_tag in case of io merge actually.
I based this logic largely on the code under if (ns->head->ms) in
nvme_setup_rw(). That also assumes a single app_tag for the request.
Sounds like an existing bug if bios with different app_tags can be
merged together?

> Also block layer builds ref_tag for each internal, please see

What do you mean by "internal"? "interval"?

> t10_pi_generate() and ext_pi_crc64_generate().

Yes, the reftag increases by 1 for each integrity interval. That's why
it suffices for an NVMe command reading multiple blocks to specify only
the expected reftag for the first block; the reftags for subsequent
blocks are incremented accordingly.

Actually, I think we probably don't need to communicate the reftag seed
to the ublk server at all. NVMe doesn't use the reftag seed (which can
be overridden by struct uio_meta's seed field). Instead,
nvme_set_ref_tag() always uses the offset into the block device divided
by the integrity interval size, as required by all the existing
csum_type formats the kernel supports. So a ublk server could just use
the start_sector field of struct ublksrv_io_desc to compute the
expected reftags. And using start_sector as the reftag seed also means
merging requests would preserve their expected reftags.

> So looks this way is wrong.
>
> More importantly reusing iod->addr for other purpose not related with
> IO buffer is very unfriendly for adding new features, and one lesson is
> for ZONED support by reusing ublksrv_io_cmd->addr for zoned's append
> lba.

That's a fair point.

> For example, there is chance to support dma-buf based zero copy for
> ublk, and please see the io-uring dma-buf support[1], and iod->addr
> might carry IO buffer info in dma-buf format in future.
>
> [1] https://lore.kernel.org/io-uring/[email protected]/#t
>
> > size of struct ublksrv_io_desc. UBLK_F_INTEGRITY requires
> > UBLK_F_USER_COPY and the addr field isn't used for UBLK_F_USER_COPY,
> > so the two fields aren't needed simultaneously.
> > Signed-off-by: Caleb Sander Mateos <[email protected]>
> > ---
> >  drivers/block/ublk_drv.c      | 43 +++++++++++++++++++++++++++++++----
> >  include/uapi/linux/ublk_cmd.h | 27 ++++++++++++++++++++--
> >  2 files changed, 64 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 2f9316febf83..51469e0627ff 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -316,10 +316,36 @@ static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
> >  static inline bool ublk_queue_is_zoned(const struct ublk_queue *ubq)
> >  {
> >         return ubq->flags & UBLK_F_ZONED;
> >  }
> >
> > +static void ublk_setup_iod_buf(const struct ublk_queue *ubq,
> > +                              const struct request *req,
> > +                              struct ublksrv_io_desc *iod)
> > +{
> > +#ifdef CONFIG_BLK_DEV_INTEGRITY
> > +       if (ubq->flags & UBLK_F_INTEGRITY) {
> > +               struct bio_integrity_payload *bip;
> > +               sector_t ref_tag_seed;
> > +
> > +               if (!blk_integrity_rq(req))
> > +                       return;
> > +
> > +               bip = bio_integrity(req->bio);
> > +               ref_tag_seed = bip_get_seed(bip);
>
> As mentioned, t10_pi_generate() and ext_pi_crc64_generate() builds
> per-internal ref tag.

As mentioned, the reftags for subsequent intervals can be computed by
simply incrementing the seed. If the seed is assumed to always be
start_sector >> (interval_exp - SECTOR_SHIFT), then it may not be
necessary to communicate ref_tag_seed at all.

> > +               iod->integrity.ref_tag_lo = ref_tag_seed;
> > +               iod->integrity.ref_tag_hi = ref_tag_seed >> 32;
> > +               iod->integrity.app_tag = bip->app_tag;
>
> In case of io merge, each bio may have different ->app_tag.

It seems like it would make more sense to prevent merging bios with
different app_tags. In the common case where a request contains a single
bio, which has a single app_tag, it would be much more efficient to
communicate only the 1 app_tag instead of having to pass a separate
app_tag for every logical block/integrity interval.
> Given you have to copy meta data via user copy, I suggest to follow the
> PI standard and make it per-internal.

How are you suggesting the ublk server access bip->app_tag and
bip_get_seed(bip) (if overriding the reftag seed is supported)? Would
the ublk server need to make another user copy syscall?

Or would you prefer I drop the BIP_CHECK_* flag support from this patch
set for now?

Thanks,
Caleb
