On 8/1/23 10:49, Jeff Moyer wrote:
> Hi, Christoph,
>
> Christoph Hellwig <[email protected]> writes:
>
>> On Tue, Aug 01, 2023 at 11:23:36AM -0400, Jeff Moyer wrote:
>>> I am slightly embarrassed to have to ask this question, but what are the
>>> implications of setting this queue flag?  Is the submit_bio routine
>>> expected to never block?
>> Yes, at least not significantly.
> If there happens to be poisoned memory, the write path can make an acpi
> device specific method call.  That involves taking a mutex (see the call
> chain starting at acpi_ex_enter_interpreter()).  I'm not sure how long a
> DSM takes, but I doubt it's fast.

For this case I can add a bio->bi_opf & REQ_NOWAIT check and fail the bio
with BLK_STS_AGAIN, so the submission path completes it with -EAGAIN
instead of blocking. Something like this (untested) as a prep patch for
the write path :-

linux-block (pmem-nowait-on) # git diff
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ddd485c377eb..eff100f74357 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -179,12 +179,16 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,

  static blk_status_t pmem_do_write(struct pmem_device *pmem,
                         struct page *page, unsigned int page_off,
-                       sector_t sector, unsigned int len)
+                       sector_t sector, unsigned int len, struct bio *bio)
  {
         phys_addr_t pmem_off = to_offset(pmem, sector);
         void *pmem_addr = pmem->virt_addr + pmem_off;

         if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) {
+               if (bio && (bio->bi_opf & REQ_NOWAIT)) {
+                       /* don't block in pmem_clear_poison(); the caller
+                        * sets bi_status and ends the bio exactly once */
+                       return BLK_STS_AGAIN;
+               }
                 blk_status_t rc = pmem_clear_poison(pmem, pmem_off, len);

                 if (rc != BLK_STS_OK)
@@ -217,7 +221,7 @@ static void pmem_submit_bio(struct bio *bio)
         bio_for_each_segment(bvec, bio, iter) {
                 if (op_is_write(bio_op(bio)))
                         rc = pmem_do_write(pmem, bvec.bv_page, bvec.bv_offset,
-                               iter.bi_sector, bvec.bv_len);
+                               iter.bi_sector, bvec.bv_len, bio);
                 else
                         rc = pmem_do_read(pmem, bvec.bv_page, bvec.bv_offset,
                                 iter.bi_sector, bvec.bv_len);
@@ -297,7 +301,7 @@ static int pmem_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,

         return blk_status_to_errno(pmem_do_write(pmem, ZERO_PAGE(0), 0,
                                    PFN_PHYS(pgoff) >> SECTOR_SHIFT,
-                                  PAGE_SIZE));
+                                  PAGE_SIZE, NULL));
  }

  static long pmem_dax_direct_access(struct dax_device *dax_dev,


>>> Is the I/O expected to be performed
>>> asynchronously?
>> Not necessarily, if it is fast enough.
> Clear as mud.  :) It's a memcpy on potentially "slow" memory.  So, the
> amount of time spent copying depends on the speed of the cpu, the media
> and the size of the I/O.  I don't know off-hand what the upper bound
> would be on today's pmem.

The above scenario depends on the system, and I'm not sure we can
generalize it across all deployments. If it does turn out to be a general
problem, we can always add a modparam in an incremental patch to disable
nowait, so the user has full control over whether to enable it.

For deployments that are not suffering from the "memcpy on potentially
slow memory" issue, this patch shows a clear performance win when
enabling NOWAIT for io_uring.
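
For reference, the kind of io_uring workload where this matters can be
described with a fio job along these lines; the device path, block size,
and queue depth below are illustrative placeholders, not the settings
used for the posted numbers:

```ini
; hypothetical fio job: random reads through io_uring against a pmem
; block device, to compare throughput with nowait enabled vs disabled
[global]
filename=/dev/pmem0   ; adjust to the pmem namespace under test
ioengine=io_uring
direct=1
rw=randread
bs=4k
iodepth=32
time_based=1
runtime=30

[pmem-nowait]
numjobs=4
```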

-ck

