From: Boqun Feng <boqun.f...@gmail.com> Sent: Saturday, September 5, 2020 7:15 AM
>
> On Sat, Sep 05, 2020 at 02:55:48AM +0000, Michael Kelley wrote:
> > From: Boqun Feng <boqun.f...@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM
> > >
> > > Hyper-V always uses a 4k page size (HV_HYP_PAGE_SIZE), so when
> > > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE
> > > as the unit for page-related data. For storvsc, that data is
> > > vmbus_packet_mpb_array. And since scsi_cmnd uses an sglist of pages (in
> > > units of PAGE_SIZE), we need to convert the pages in the sglist of
> > > scsi_cmnd into Hyper-V pages in vmbus_packet_mpb_array.
> > >
> > > This patch does the conversion by dividing the pages in the sglist into
> > > Hyper-V pages; the offset and indexes in vmbus_packet_mpb_array are
> > > recalculated accordingly.
> > >
> > > Signed-off-by: Boqun Feng <boqun.f...@gmail.com>
> > > ---
> > >  drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++----
> > >  1 file changed, 54 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > index 8f5f5dc863a4..3f6610717d4e 100644
> > > --- a/drivers/scsi/storvsc_drv.c
> > > +++ b/drivers/scsi/storvsc_drv.c
> > > @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
> > >          payload_sz = sizeof(cmd_request->mpb);
> > >
> > >          if (sg_count) {
> > > -                if (sg_count > MAX_PAGE_BUFFER_COUNT) {
> > > +                unsigned int hvpg_idx = 0;
> > > +                unsigned int j = 0;
> > > +                unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK;
> > > +                unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length);
> > >
> > > -                        payload_sz = (sg_count * sizeof(u64) +
> > > +                if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> > > +
> > > +                        payload_sz = (hvpg_count * sizeof(u64) +
> > >                                        sizeof(struct vmbus_packet_mpb_array));
> > >                          payload = kzalloc(payload_sz, GFP_ATOMIC);
> > >                          if (!payload)
> > >                                  return SCSI_MLQUEUE_DEVICE_BUSY;
> > >                  }
> > >
> > > +                /*
> > > +                 * sgl is a list of PAGEs, and payload->range.pfn_array
> > > +                 * expects the page number in the unit of HV_HYP_PAGE_SIZE (the
> > > +                 * page size that Hyper-V uses), so here we need to divide PAGEs
> > > +                 * into HV_HYP_PAGEs in case that PAGE_SIZE > HV_HYP_PAGE_SIZE.
> > > +                 */
> > >                  payload->range.len = length;
> > > -                payload->range.offset = sgl[0].offset;
> > > +                payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK;
> > > +                hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
> > >
> > >                  cur_sgl = sgl;
> > > -                for (i = 0; i < sg_count; i++) {
> > > -                        payload->range.pfn_array[i] =
> > > -                                page_to_pfn(sg_page((cur_sgl)));
> > > +                for (i = 0, j = 0; i < sg_count; i++) {
> > > +                        /*
> > > +                         * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the #
> > > +                         * of HV_HYP_PAGEs in the current PAGE.
> > > +                         *
> > > +                         * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs.
> > > +                         *
> > > +                         * As shown in the following, the minimum of the two is
> > > +                         * the # of HV_HYP_PAGEs we need to handle in this
> > > +                         * PAGE.
> > > +                         *
> > > +                         * |------------------ PAGE ----------------------|
> > > +                         * |   PAGE_SIZE / HV_HYP_PAGE_SIZE in total      |
> > > +                         * |hvpg|hvpg| ...              |hvpg| ...   |hvpg|
> > > +                         *       ^                       ^
> > > +                         *     hvpg_idx                  |
> > > +                         *       ^                       |
> > > +                         *       +---(hvpg_count - j)----+
> > > +                         *
> > > +                         * or
> > > +                         *
> > > +                         * |------------------ PAGE ----------------------|
> > > +                         * |   PAGE_SIZE / HV_HYP_PAGE_SIZE in total      |
> > > +                         * |hvpg|hvpg| ...              |hvpg| ...   |hvpg|
> > > +                         *       ^                                       ^
> > > +                         *     hvpg_idx                                  |
> > > +                         *       ^                                       |
> > > +                         *       +---(hvpg_count - j)--------------------+
> > > +                         */
> > > +                        unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE) - hvpg_idx,
> > > +                                                   hvpg_count - j);
> > > +                        unsigned int k;
> > > +
> > > +                        for (k = 0; k < nr_hvpg; k++) {
> > > +                                payload->range.pfn_array[j] =
> > > +                                        page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k;
> > > +                                j++;
> > > +                        }
> > >                          cur_sgl = sg_next(cur_sgl);
> > > +                        hvpg_idx = 0;
> > >                  }
> >
> > This code works; I don't see any errors.  But I think it can be made
> > simpler based on doing two things:
> > 1) Rather than iterating over the sg_count, and having to calculate
> > nr_hvpg on each iteration, base the exit decision on having filled up the
> > pfn_array[].  You've already calculated the exact size of the array that
> > is needed given the data length, so it's easy to exit when the array is
> > full.
> > 2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE
> > rather than from 0 to a calculated value.
> >
> > Also, as an optimization, pull page_to_hvpfn(sg_page(cur_sgl)) out of the
> > inner loop.
> >
> > I think this code does it (though I haven't tested it):
> >
> >         for (j = 0; ; sgl = sg_next(sgl)) {
> >                 unsigned int k;
> >                 unsigned long pfn;
> >
> >                 pfn = page_to_hvpfn(sg_page(sgl));
> >                 for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE/HV_HYP_PAGE_SIZE); k++) {
> >                         payload->range.pfn_array[j] = pfn + k;
> >                         if (++j == hvpg_count)
> >                                 goto done;
> >                 }
> >                 hvpg_idx = 0;
> >         }
> > done:
> >
> > This approach also makes the limit of the inner loop a constant, and that
> > constant will be 1 when the page size is 4K.  So the compiler should be
> > able to optimize away the loop in that case.
> >
>
> Good point! I like your suggestion, and after thinking a bit harder
> based on your approach, I came up with the following:
>
> #define HV_HYP_PAGES_IN_PAGE ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE))
>
>         for (j = 0; j < hvpg_count; j++) {
>                 unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE;
>
>                 /*
>                  * Two cases where we need to fetch a page:
>                  * a) j == 0: the first step, or
>                  * b) k == 0: when we reach the boundary of a page.
>                  */
>                 if (k == 0 || j == 0) {
>                         pfn = page_to_hvpfn(sg_page(cur_sgl));
>                         cur_sgl = sg_next(cur_sgl);
>                 }
>
>                 payload->range.pfn_array[j] = pfn + k;
>         }
>
> Given that HV_HYP_PAGES_IN_PAGE is always a power of 2, I think
> compilers can easily optimize the "%" into a bit-masking operation. And
> when HV_HYP_PAGES_IN_PAGE is 1, compilers can easily figure out that k
> is always zero, so the if-statement can be optimized as always taken.
> And that gives us the same code as before ;-)
>
> Thoughts? I will try with a test to see if I'm missing something subtle.
>
> Thanks for looking into this!
>
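
(For readers without the rest of the series at hand: the patch and the two
loops above lean on Hyper-V page helpers -- HV_HYP_PAGE_SIZE, HV_HYP_PAGE_MASK,
HVPFN_UP() and page_to_hvpfn() -- that are introduced by earlier patches in the
same series and are not quoted here. The sketch below shows roughly what they
amount to, with page_to_pfn() and PAGE_SIZE being the usual kernel definitions;
treat these as approximations rather than the exact upstream definitions.)

/*
 * Sketch only -- approximations of the Hyper-V page helpers assumed by
 * the patch, not the exact definitions from the series.
 */
#define HV_HYP_PAGE_SHIFT      12
#define HV_HYP_PAGE_SIZE       (1UL << HV_HYP_PAGE_SHIFT)       /* always 4K */
#define HV_HYP_PAGE_MASK       (~(HV_HYP_PAGE_SIZE - 1))

/* Round a byte count up to a number of Hyper-V pages. */
#define HVPFN_UP(x)            (((x) + HV_HYP_PAGE_SIZE - 1) >> HV_HYP_PAGE_SHIFT)

/*
 * Convert a struct page to a Hyper-V page frame number: one guest PAGE
 * covers PAGE_SIZE / HV_HYP_PAGE_SIZE consecutive Hyper-V pages.
 */
#define page_to_hvpfn(page)    (page_to_pfn(page) * (PAGE_SIZE / HV_HYP_PAGE_SIZE))

(With 4K guest pages, page_to_hvpfn() reduces to page_to_pfn() and HVPFN_UP()
matches PFN_UP(), so the conversion only does real work when
PAGE_SIZE > HV_HYP_PAGE_SIZE -- e.g. a 64K-page ARM64 kernel, where each PAGE
spans 16 Hyper-V pages.)
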
Your newest version looks right to me -- very clever!  I like it even better
than my version.

Michael
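
(Boqun mentions running a test to look for something subtle; purely as an
illustration, here is a minimal userspace sketch of the index arithmetic in
his final loop. Everything in it is a hypothetical stand-in -- a 64K
PAGE_SIZE, and a plain array of base Hyper-V PFNs in place of the real sg
list and page_to_hvpfn() -- so it exercises only the j/k/hvpg_idx
bookkeeping, not the actual driver path.)

#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins: a 64K guest PAGE made of 4K Hyper-V pages. */
#define HV_HYP_PAGE_SIZE        4096UL
#define PAGE_SIZE               65536UL
#define HV_HYP_PAGES_IN_PAGE    ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE))

int main(void)
{
        /* Pretend sg list: base Hyper-V PFN of each (non-contiguous) PAGE. */
        unsigned long sg_hvpfn[] = { 16, 256, 4096 };
        unsigned long pfn_array[64];
        unsigned long pfn = 0;
        unsigned int i = 0, j;

        /* Buffer starts 8K into the first PAGE and is 100K long. */
        unsigned int hvpg_idx = 8192 / HV_HYP_PAGE_SIZE;
        unsigned int hvpg_count =
                (8192 % HV_HYP_PAGE_SIZE + 102400 + HV_HYP_PAGE_SIZE - 1) / HV_HYP_PAGE_SIZE;

        for (j = 0; j < hvpg_count; j++) {
                unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE;

                /* Fetch the next PAGE on the first step or at a PAGE boundary. */
                if (k == 0 || j == 0)
                        pfn = sg_hvpfn[i++];

                pfn_array[j] = pfn + k;
        }

        /* First entry starts hvpg_idx Hyper-V pages into the first PAGE... */
        assert(pfn_array[0] == sg_hvpfn[0] + hvpg_idx);
        /* ...entries within one PAGE are consecutive Hyper-V PFNs... */
        assert(pfn_array[13] == pfn_array[0] + 13);
        /* ...and the 15th entry crosses into the second PAGE. */
        assert(pfn_array[14] == sg_hvpfn[1]);

        for (j = 0; j < hvpg_count; j++)
                printf("pfn_array[%u] = %lu\n", j, pfn_array[j]);

        return 0;
}

(For a 100K buffer starting 8K into the first 64K page, the first 14 entries
are consecutive Hyper-V PFNs from the first page and the remaining 11 come
from the second, which is the layout vmbus_packet_mpb_array expects.)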