On 27.01.2021 12:33, Paul Durrant wrote: >> -----Original Message----- >> From: Jan Beulich <jbeul...@suse.com> >> Sent: 27 January 2021 11:21 >> To: p...@xen.org >> Cc: 'Paul Durrant' <pdurr...@amazon.com>; 'Konrad Rzeszutek Wilk' >> <konrad.w...@oracle.com>; 'Roger Pau >> Monné' <roger....@citrix.com>; 'Jens Axboe' <ax...@kernel.dk>; 'Dongli Zhang' >> <dongli.zh...@oracle.com>; linux-kernel@vger.kernel.org; >> linux-bl...@vger.kernel.org; xen- >> de...@lists.xenproject.org >> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page >> rings >> >> On 27.01.2021 12:09, Paul Durrant wrote: >>>> -----Original Message----- >>>> From: Jan Beulich <jbeul...@suse.com> >>>> Sent: 27 January 2021 10:57 >>>> To: Paul Durrant <p...@xen.org> >>>> Cc: Paul Durrant <pdurr...@amazon.com>; Konrad Rzeszutek Wilk >>>> <konrad.w...@oracle.com>; Roger Pau >>>> Monné <roger....@citrix.com>; Jens Axboe <ax...@kernel.dk>; Dongli Zhang >>>> <dongli.zh...@oracle.com>; >>>> linux-kernel@vger.kernel.org; linux-bl...@vger.kernel.org; >>>> xen-de...@lists.xenproject.org >>>> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page >>>> rings >>>> >>>> On 27.01.2021 11:30, Paul Durrant wrote: >>>>> From: Paul Durrant <pdurr...@amazon.com> >>>>> >>>>> Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid >>>>> inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the >>>>> behaviour of xen-blkback when connecting to a frontend was: >>>>> >>>>> - read 'ring-page-order' >>>>> - if not present then expect a single page ring specified by 'ring-ref' >>>>> - else expect a ring specified by 'ring-refX' where X is between 0 and >>>>> 1 << ring-page-order >>>>> >>>>> This was correct behaviour, but was broken by the afforementioned commit >>>>> to >>>>> become: >>>>> >>>>> - read 'ring-page-order' >>>>> - if not present then expect a single page ring >>>>> - expect a ring specified by 'ring-refX' where X is between 0 and >>>>> 1 << ring-page-order >>>>> - if that didn't work then see if there's a single page ring specified by >>>>> 'ring-ref' >>>>> >>>>> This incorrect behaviour works most of the time but fails when a frontend >>>>> that sets 'ring-page-order' is unloaded and replaced by one that does not >>>>> because, instead of reading 'ring-ref', xen-blkback will read the stale >>>>> 'ring-ref0' left around by the previous frontend will try to map the wrong >>>>> grant reference. >>>>> >>>>> This patch restores the original behaviour. >>>> >>>> Isn't this only the 2nd of a pair of fixes that's needed, the >>>> first being the drivers, upon being unloaded, to fully clean up >>>> after itself? Any stale key left may lead to confusion upon >>>> re-use of the containing directory. >>> >>> In a backend we shouldn't be relying on, nor really expect IMO, a frontend >>> to clean up after itself. >> Any backend should know *exactly* what xenstore nodes it’s looking for from >> a frontend. >> >> But the backend can't know whether a node exists because the present >> frontend has written it, or because an earlier instance forgot to >> delete it. It can only honor what's there. (In fact the other day I >> was wondering whether some of the writes of boolean "false" nodes >> wouldn't better be xenbus_rm() instead.) > > In the particular case this patch is fixing for me, the frontends are the > Windows XENVBD driver and the Windows crash version of the same driver > (actually built from different code). The 'normal' instance is multi-page > aware and the crash instance is not quite, i.e. it uses the old ring-ref but > knows to clean up 'ring-page-order'. > Clearly, in a crash situation, we cannot rely on frontend to clean up
Ah, I see (and agree). > so what you say does highlight that there indeed needs to be a second patch > to xen-blkback to make sure it removes 'ring-page-order' itself as 'state' > cycles through Closed and back to InitWait. And not just this one node then, I suppose? > I think this patch does still stand on its own though. Perhaps, yes. Jan