Re: [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush

2022-02-16 Thread Pankaj Gupta
> >
> > > >
> > > > Enable asynchronous flush for virtio pmem using work queue. Also,
> > > > coalesce the flush requests when a flush is already in process.
> > > > This functionality is copied from md/RAID code.
> > > >
> > > > When a flush is already in process, new flush requests wait till
> > > > previous flush completes in another context (work queue). For all
> > > > the requests come between ongoing flush and new flush start time, only
> > > > single flush executes, thus adhers to flush coalscing logic. This is
> > >
> > > s/adhers/adheres/
> > >
> > > s/coalscing/coalescing/
> > >
> > > > important for maintaining the flush request order with request 
> > > > coalscing.
> > >
> > > s/coalscing/coalescing/
> >
> > o.k. Sorry for the spelling mistakes.
> >
> > >
> > > >
> > > > Signed-off-by: Pankaj Gupta 
> > > > ---
> > > >  drivers/nvdimm/nd_virtio.c   | 74 +++-
> > > >  drivers/nvdimm/virtio_pmem.c | 10 +
> > > >  drivers/nvdimm/virtio_pmem.h | 16 
> > > >  3 files changed, 83 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > > > index 10351d5b49fa..179ea7a73338 100644
> > > > --- a/drivers/nvdimm/nd_virtio.c
> > > > +++ b/drivers/nvdimm/nd_virtio.c
> > > > @@ -100,26 +100,66 @@ static int virtio_pmem_flush(struct nd_region 
> > > > *nd_region)
> > > >  /* The asynchronous flush callback function */
> > > >  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > >  {
> > > > -   /*
> > > > -* Create child bio for asynchronous flush and chain with
> > > > -* parent bio. Otherwise directly call nd_region flush.
> > > > +   /* queue asynchronous flush and coalesce the flush requests */
> > > > +   struct virtio_device *vdev = nd_region->provider_data;
> > > > +   struct virtio_pmem *vpmem  = vdev->priv;
> > > > +   ktime_t req_start = ktime_get_boottime();
> > > > +   int ret = -EINPROGRESS;
> > > > +
> > > > +   spin_lock_irq(&vpmem->lock);
> > >
> > > Why a new lock and not continue to use ->pmem_lock?
> >
> > This spinlock protects the entry into 'wait_event_lock_irq',
> > and the other spinlock protects the virtio queue data.
>
> Understood, but md shares the mddev->lock for both purposes, so I
> would ask that you either document what motivates the locking split,
> or just reuse the lock until a strong reason to split them arises.

O.k. Will check again whether we can use the same lock, or document it.

Thanks,
Pankaj



Re: [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush

2022-02-16 Thread Dan Williams
On Wed, Feb 16, 2022 at 12:47 AM Pankaj Gupta
 wrote:
>
> > >
> > > Enable asynchronous flush for virtio pmem using work queue. Also,
> > > coalesce the flush requests when a flush is already in process.
> > > This functionality is copied from md/RAID code.
> > >
> > > When a flush is already in process, new flush requests wait till
> > > previous flush completes in another context (work queue). For all
> > > the requests come between ongoing flush and new flush start time, only
> > > single flush executes, thus adhers to flush coalscing logic. This is
> >
> > s/adhers/adheres/
> >
> > s/coalscing/coalescing/
> >
> > > important for maintaining the flush request order with request coalscing.
> >
> > s/coalscing/coalescing/
>
> o.k. Sorry for the spelling mistakes.
>
> >
> > >
> > > Signed-off-by: Pankaj Gupta 
> > > ---
> > >  drivers/nvdimm/nd_virtio.c   | 74 +++-
> > >  drivers/nvdimm/virtio_pmem.c | 10 +
> > >  drivers/nvdimm/virtio_pmem.h | 16 
> > >  3 files changed, 83 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > > index 10351d5b49fa..179ea7a73338 100644
> > > --- a/drivers/nvdimm/nd_virtio.c
> > > +++ b/drivers/nvdimm/nd_virtio.c
> > > @@ -100,26 +100,66 @@ static int virtio_pmem_flush(struct nd_region 
> > > *nd_region)
> > >  /* The asynchronous flush callback function */
> > >  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > >  {
> > > -   /*
> > > -* Create child bio for asynchronous flush and chain with
> > > -* parent bio. Otherwise directly call nd_region flush.
> > > +   /* queue asynchronous flush and coalesce the flush requests */
> > > +   struct virtio_device *vdev = nd_region->provider_data;
> > > +   struct virtio_pmem *vpmem  = vdev->priv;
> > > +   ktime_t req_start = ktime_get_boottime();
> > > +   int ret = -EINPROGRESS;
> > > +
> > > +   spin_lock_irq(&vpmem->lock);
> >
> > Why a new lock and not continue to use ->pmem_lock?
>
> This spinlock protects the entry into 'wait_event_lock_irq',
> and the other spinlock protects the virtio queue data.

Understood, but md shares the mddev->lock for both purposes, so I
would ask that you either document what motivates the locking split,
or just reuse the lock until a strong reason to split them arises.
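The md-style single-lock arrangement Dan points to can be modeled outside the kernel. Below is a minimal userspace sketch (pthreads, all names hypothetical) of the patch's coalescing rule under ONE lock: a request waits while a flush that started before it is in flight, becomes the flusher if no later-started flush has completed, and is otherwise satisfied by a flush someone else already ran. This models the logic only, not the kernel API; the real patch runs the device flush from a workqueue rather than dropping and retaking the mutex inline.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* One mutex guards both the coalescing state and the simulated
 * device, mirroring how md reuses mddev->lock for both purposes. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flush_done = PTHREAD_COND_INITIALIZER;
static bool flush_in_flight;      /* analogue of vpmem->flush_bio      */
static long prev_flush_start;     /* analogue of vpmem->prev_flush_start */
static long ticks;                /* logical clock; bump only under lock */
static atomic_int device_flushes; /* flushes that reached the "device" */

static long now(void) { return ++ticks; }

static void device_flush(void)    /* simulated slow device flush */
{
    usleep(20000);
    atomic_fetch_add(&device_flushes, 1);
}

static void flush_request(void)
{
    pthread_mutex_lock(&lock);
    long req_start = now();

    /* Wait while a flush that started BEFORE this request is still
     * in flight; its completion wakes us via broadcast. */
    while (flush_in_flight && prev_flush_start < req_start)
        pthread_cond_wait(&flush_done, &lock);

    if (prev_flush_start < req_start) {
        /* No flush started after us has completed: become the
         * flusher.  Every request arriving during device_flush()
         * below is satisfied by this single flush. */
        flush_in_flight = true;
        long start = now();

        pthread_mutex_unlock(&lock);
        device_flush();           /* the kernel patch does this step
                                     from a workqueue instead */
        pthread_mutex_lock(&lock);

        prev_flush_start = start;
        flush_in_flight = false;
        pthread_cond_broadcast(&flush_done);
    }
    /* else: a flush that started after req_start already finished,
     * so this request was coalesced into it. */
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
    (void)arg;
    flush_request();
    return NULL;
}
```

With eight concurrent callers, the first one flushes and the rest collapse into at most a couple of follow-up flushes, while every caller still observes a flush that began at or after its request time.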



Re: [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush

2022-02-16 Thread Pankaj Gupta
> >
> > Enable asynchronous flush for virtio pmem using work queue. Also,
> > coalesce the flush requests when a flush is already in process.
> > This functionality is copied from md/RAID code.
> >
> > When a flush is already in process, new flush requests wait till
> > previous flush completes in another context (work queue). For all
> > the requests come between ongoing flush and new flush start time, only
> > single flush executes, thus adhers to flush coalscing logic. This is
>
> s/adhers/adheres/
>
> s/coalscing/coalescing/
>
> > important for maintaining the flush request order with request coalscing.
>
> s/coalscing/coalescing/

o.k. Sorry for the spelling mistakes.

>
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/nvdimm/nd_virtio.c   | 74 +++-
> >  drivers/nvdimm/virtio_pmem.c | 10 +
> >  drivers/nvdimm/virtio_pmem.h | 16 
> >  3 files changed, 83 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > index 10351d5b49fa..179ea7a73338 100644
> > --- a/drivers/nvdimm/nd_virtio.c
> > +++ b/drivers/nvdimm/nd_virtio.c
> > @@ -100,26 +100,66 @@ static int virtio_pmem_flush(struct nd_region 
> > *nd_region)
> >  /* The asynchronous flush callback function */
> >  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> >  {
> > -   /*
> > -* Create child bio for asynchronous flush and chain with
> > -* parent bio. Otherwise directly call nd_region flush.
> > +   /* queue asynchronous flush and coalesce the flush requests */
> > +   struct virtio_device *vdev = nd_region->provider_data;
> > +   struct virtio_pmem *vpmem  = vdev->priv;
> > +   ktime_t req_start = ktime_get_boottime();
> > +   int ret = -EINPROGRESS;
> > +
> > +   spin_lock_irq(&vpmem->lock);
>
> Why a new lock and not continue to use ->pmem_lock?

This spinlock protects the entry into 'wait_event_lock_irq',
and the other spinlock protects the virtio queue data.
>
> Have you tested this with CONFIG_PROVE_LOCKING?
No, I only ran xfs tests and some of my unit test programs.
Will enable and test with CONFIG_PROVE_LOCKING.

>
> Along those lines do you have a selftest that can be added to the
> kernel as well so that 0day or other bots could offer early warnings
> on regressions?

Will try to add one.

Thank you Dan for the feedback!

Best regards,
Pankaj
>
> > +   /* flush requests wait until ongoing flush completes,
> > +* hence coalescing all the pending requests.
> >  */
> > -   if (bio && bio->bi_iter.bi_sector != -1) {
> > -   struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > -
> > -   if (!child)
> > -   return -ENOMEM;
> > -   bio_copy_dev(child, bio);
> > -   child->bi_opf = REQ_PREFLUSH;
> > -   child->bi_iter.bi_sector = -1;
> > -   bio_chain(child, bio);
> > -   submit_bio(child);
> > -   return 0;
> > +   wait_event_lock_irq(vpmem->sb_wait,
> > +   !vpmem->flush_bio ||
> > +   ktime_before(req_start, 
> > vpmem->prev_flush_start),
> > +   vpmem->lock);
> > +   /* new request after previous flush is completed */
> > +   if (ktime_after(req_start, vpmem->prev_flush_start)) {
> > +   WARN_ON(vpmem->flush_bio);
> > +   vpmem->flush_bio = bio;
> > +   bio = NULL;
> > +   }
> > +   spin_unlock_irq(&vpmem->lock);
> > +
> > +   if (!bio)
> > +   queue_work(vpmem->pmem_wq, &vpmem->flush_work);
> > +   else {
> > +   /* flush completed in other context while we waited */
> > +   if (bio && (bio->bi_opf & REQ_PREFLUSH))
> > +   bio->bi_opf &= ~REQ_PREFLUSH;
> > +   else if (bio && (bio->bi_opf & REQ_FUA))
> > +   bio->bi_opf &= ~REQ_FUA;
> > +
> > +   ret = vpmem->prev_flush_err;
> > }
> > -   if (virtio_pmem_flush(nd_region))
> > -   return -EIO;
> >
> > -   return 0;
> > +   return ret;
> >  };
> >  EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +
> > +void submit_async_flush(struct work_struct *ws)
>
> This name is too generic to be exported from drivers/nvdimm/nd_virtio.c
>
> ...it strikes me that there is little reason for nd_virtio and
> virtio_pmem to be separate modules. They are both enabled by the same
> Kconfig, so why not combine them into one module and drop the exports?

makes sense.
>
> > +{
> > +   struct virtio_pmem *vpmem = container_of(ws, struct virtio_pmem, 
> > flush_work);
> > +   struct bio *bio = vpmem->flush_bio;
> > +
> > +   vpmem->start_flush = ktime_get_boottime();
> > +   vpmem->prev_flush_err = virtio_pmem_flush(vpmem->nd_region);
> > +   vpmem->prev_flush_start = vpmem->start_flush;
> > +   vpmem->flush_bio = NULL;
> > +   wake_up(&vpmem->sb_wait);
> > +
> > + 

Re: [RFC v3 1/2] virtio-pmem: Async virtio-pmem flush

2022-02-15 Thread Dan Williams
On Tue, Jan 11, 2022 at 8:23 AM Pankaj Gupta
 wrote:
>
> Enable asynchronous flush for virtio pmem using work queue. Also,
> coalesce the flush requests when a flush is already in process.
> This functionality is copied from md/RAID code.
>
> When a flush is already in process, new flush requests wait till
> previous flush completes in another context (work queue). For all
> the requests come between ongoing flush and new flush start time, only
> single flush executes, thus adhers to flush coalscing logic. This is

s/adhers/adheres/

s/coalscing/coalescing/

> important for maintaining the flush request order with request coalscing.

s/coalscing/coalescing/

>
> Signed-off-by: Pankaj Gupta 
> ---
>  drivers/nvdimm/nd_virtio.c   | 74 +++-
>  drivers/nvdimm/virtio_pmem.c | 10 +
>  drivers/nvdimm/virtio_pmem.h | 16 
>  3 files changed, 83 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> index 10351d5b49fa..179ea7a73338 100644
> --- a/drivers/nvdimm/nd_virtio.c
> +++ b/drivers/nvdimm/nd_virtio.c
> @@ -100,26 +100,66 @@ static int virtio_pmem_flush(struct nd_region 
> *nd_region)
>  /* The asynchronous flush callback function */
>  int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
>  {
> -   /*
> -* Create child bio for asynchronous flush and chain with
> -* parent bio. Otherwise directly call nd_region flush.
> +   /* queue asynchronous flush and coalesce the flush requests */
> +   struct virtio_device *vdev = nd_region->provider_data;
> +   struct virtio_pmem *vpmem  = vdev->priv;
> +   ktime_t req_start = ktime_get_boottime();
> +   int ret = -EINPROGRESS;
> +
> +   spin_lock_irq(&vpmem->lock);

Why a new lock and not continue to use ->pmem_lock?

Have you tested this with CONFIG_PROVE_LOCKING?

Along those lines do you have a selftest that can be added to the
kernel as well so that 0day or other bots could offer early warnings
on regressions?

> +   /* flush requests wait until ongoing flush completes,
> +* hence coalescing all the pending requests.
>  */
> -   if (bio && bio->bi_iter.bi_sector != -1) {
> -   struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> -
> -   if (!child)
> -   return -ENOMEM;
> -   bio_copy_dev(child, bio);
> -   child->bi_opf = REQ_PREFLUSH;
> -   child->bi_iter.bi_sector = -1;
> -   bio_chain(child, bio);
> -   submit_bio(child);
> -   return 0;
> +   wait_event_lock_irq(vpmem->sb_wait,
> +   !vpmem->flush_bio ||
> +   ktime_before(req_start, vpmem->prev_flush_start),
> +   vpmem->lock);
> +   /* new request after previous flush is completed */
> +   if (ktime_after(req_start, vpmem->prev_flush_start)) {
> +   WARN_ON(vpmem->flush_bio);
> +   vpmem->flush_bio = bio;
> +   bio = NULL;
> +   }
> +   spin_unlock_irq(&vpmem->lock);
> +
> +   if (!bio)
> +   queue_work(vpmem->pmem_wq, &vpmem->flush_work);
> +   else {
> +   /* flush completed in other context while we waited */
> +   if (bio && (bio->bi_opf & REQ_PREFLUSH))
> +   bio->bi_opf &= ~REQ_PREFLUSH;
> +   else if (bio && (bio->bi_opf & REQ_FUA))
> +   bio->bi_opf &= ~REQ_FUA;
> +
> +   ret = vpmem->prev_flush_err;
> }
> -   if (virtio_pmem_flush(nd_region))
> -   return -EIO;
>
> -   return 0;
> +   return ret;
>  };
>  EXPORT_SYMBOL_GPL(async_pmem_flush);
> +
> +void submit_async_flush(struct work_struct *ws)

This name is too generic to be exported from drivers/nvdimm/nd_virtio.c

...it strikes me that there is little reason for nd_virtio and
virtio_pmem to be separate modules. They are both enabled by the same
Kconfig, so why not combine them into one module and drop the exports?

> +{
> +   struct virtio_pmem *vpmem = container_of(ws, struct virtio_pmem, 
> flush_work);
> +   struct bio *bio = vpmem->flush_bio;
> +
> +   vpmem->start_flush = ktime_get_boottime();
> +   vpmem->prev_flush_err = virtio_pmem_flush(vpmem->nd_region);
> +   vpmem->prev_flush_start = vpmem->start_flush;
> +   vpmem->flush_bio = NULL;
> +   wake_up(&vpmem->sb_wait);
> +
> +   if (vpmem->prev_flush_err)
> +   bio->bi_status = errno_to_blk_status(-EIO);
> +
> +   /* Submit parent bio only for PREFLUSH */
> +   if (bio && (bio->bi_opf & REQ_PREFLUSH)) {
> +   bio->bi_opf &= ~REQ_PREFLUSH;
> +   submit_bio(bio);
> +   } else if (bio && (bio->bi_opf & REQ_FUA)) {
> +   bio->bi_opf &= ~REQ_FUA;
> +   bio_endio(bio);
> +   }
> +}
> +EXPORT_SYMBOL_GPL(submit_async_flush);
>  MODULE_LICENSE("GPL");
> diff --git
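The tail of the patch's submit_async_flush() encodes a per-flag completion rule: a REQ_PREFLUSH bio still carries its data payload, so after the flush it is resubmitted with the flag cleared, while a REQ_FUA bio's data has already been written, so the flush was the final step and the bio completes. A standalone sketch of that decision, using hypothetical flag constants in place of the kernel's:

```c
#include <assert.h>

/* Hypothetical stand-ins for REQ_PREFLUSH / REQ_FUA; the values are
 * illustrative, not the kernel's. */
enum { X_PREFLUSH = 1u << 0, X_FUA = 1u << 1 };

enum after_flush_action {
    AF_RESUBMIT,   /* flush done, now submit the data part      */
    AF_COMPLETE,   /* data already on media, flush was last step */
    AF_NONE,       /* bio was not a flush request                */
};

/* Mirrors the decision at the end of the patch's submit_async_flush():
 * clear the flush-related flag and pick what happens to the bio. */
static enum after_flush_action after_flush(unsigned int *opf)
{
    if (*opf & X_PREFLUSH) {
        *opf &= ~X_PREFLUSH;      /* patch then calls submit_bio(bio) */
        return AF_RESUBMIT;
    }
    if (*opf & X_FUA) {
        *opf &= ~X_FUA;           /* patch then calls bio_endio(bio)  */
        return AF_COMPLETE;
    }
    return AF_NONE;
}
```

The same flag-stripping also appears in async_pmem_flush() for the "flush completed while we waited" path: a coalesced request must not trigger a second device flush, so the flush flag is cleared before the bio continues.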