On Wed, May 09, 2018 at 07:54:31PM +0200, Max Reitz wrote:
> On 2018-05-09 12:16, Stefan Hajnoczi wrote:
> > On Tue, May 08, 2018 at 05:03:09PM +0200, Kevin Wolf wrote:
> >> On 08.05.2018 at 16:41, Eric Blake wrote:
> >>> On 12/25/2017 01:33 AM, He Junyan wrote:
> >>
> >> 2. Make the nvdimm device use the QEMU block layer so that it is
> >>    backed by a non-raw disk image (such as a qcow2 file representing
> >>    the content of the nvdimm) that supports snapshots.
> >>
> >>    This part is hard because it requires some completely new
> >>    infrastructure such as mapping clusters of the image file to guest
> >>    pages, and doing cluster allocation (including the copy on write
> >>    logic) by handling guest page faults.
> >>
> >> I think it makes sense to invest some effort into such interfaces,
> >> but be prepared for a long journey.
> >
> > I like the suggestion but it needs to be followed up with a concrete
> > design that is feasible and fair for Junyan and others to implement.
> > Otherwise the "long journey" is really just a way of rejecting this
> > feature.
> >
> > Let's discuss the details of using the block layer for NVDIMM and try
> > to come up with a plan.
> >
> > The biggest issue with using the block layer is that persistent
> > memory applications use load/store instructions to directly access
> > data. This is fundamentally different from the block layer, which
> > transfers blocks of data to and from the device.
> >
> > Because of block DMA, QEMU is able to perform processing at each
> > block driver graph node. This doesn't exist for persistent memory
> > because software does not trap I/O. Therefore the concept of filter
> > nodes doesn't make sense for persistent memory - we certainly do not
> > want to trap every I/O because performance would be terrible.
> >
> > Another difference is that persistent memory I/O is synchronous.
> > Load/store instructions execute quickly. Perhaps we could use KVM
> > async page faults in cases where QEMU needs to perform processing,
> > but again the performance would be bad.
> 
> Let me first say that I have no idea how the interface to NVDIMM looks.
> I just assume it works pretty much like normal RAM (so the interface is
> just that it’s a part of the physical address space).
> 
> Also, it sounds a bit like you are already discarding my idea, but here
> goes anyway.
> 
> Would it be possible to introduce a buffering block driver that
> presents the guest an area of RAM/NVDIMM through an NVDIMM interface
> (so I suppose as part of the guest address space)? For writing, we’d
> keep a dirty bitmap on it, and then we’d asynchronously move the dirty
> areas through the block layer, so basically like mirror. On flushing,
> we’d block until everything is clean.
> 
> For reading, we’d follow a COR/stream model, basically, where
> everything is unpopulated in the beginning and everything is loaded
> through the block layer both asynchronously all the time and on-demand
> whenever the guest needs something that has not been loaded yet.
> 
> Now I notice that that looks pretty much like a backing file model
> where we constantly run both a stream and a commit job at the same
> time.
> 
> The user could decide how much memory to use for the buffer, so it
> could either hold everything or be partially unallocated.
> 
> You’d probably want to back the buffer by NVDIMM normally, so that
> nothing is lost on crashes (though this would imply that for partial
> allocation the buffering block driver would need to know the mapping
> between the area in real NVDIMM and its virtual representation of it).
> 
> Just my two cents while scanning through qemu-block to find emails that
> don’t actually concern me...
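(For concreteness, a rough userspace sketch of the dirty-bitmap writeback
scheme described above. It is not QEMU block layer code: buffer_write(),
writeback_page() and buffer_flush() are made-up names, a plain pwrite()
backend stands in for the block layer, and real dirty tracking would come
from dirty memory logging or write-protection faults rather than an
explicit call.)

    /* Sketch of the proposed buffering scheme: guest stores land in a
     * RAM/NVDIMM buffer, touched pages are tracked in a bitmap and written
     * back to a backing file; flush only returns once everything is clean.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGE_SIZE  4096
    #define BUF_PAGES  1024                          /* 4 MiB guest-visible buffer */

    static uint8_t  buffer[BUF_PAGES * PAGE_SIZE];   /* area mapped into the guest */
    static uint64_t dirty[BUF_PAGES / 64];           /* one bit per buffer page */

    static void set_dirty(size_t p)   { dirty[p / 64] |=  UINT64_C(1) << (p % 64); }
    static void clear_dirty(size_t p) { dirty[p / 64] &= ~(UINT64_C(1) << (p % 64)); }
    static bool is_dirty(size_t p)    { return dirty[p / 64] >> (p % 64) & 1; }

    /* Guest write path: copy into the buffer and mark the touched pages dirty. */
    static void buffer_write(size_t offset, const void *data, size_t len)
    {
        memcpy(buffer + offset, data, len);
        for (size_t p = offset / PAGE_SIZE; p <= (offset + len - 1) / PAGE_SIZE; p++) {
            set_dirty(p);
        }
    }

    /* Background writeback of one dirty page, like one iteration of mirror. */
    static int writeback_page(int backing_fd, size_t page)
    {
        if (pwrite(backing_fd, buffer + page * PAGE_SIZE,
                   PAGE_SIZE, (off_t)(page * PAGE_SIZE)) != PAGE_SIZE) {
            return -1;
        }
        clear_dirty(page);
        return 0;
    }

    /* Guest flush: block until every dirty page has been written back. */
    static int buffer_flush(int backing_fd)
    {
        for (size_t p = 0; p < BUF_PAGES; p++) {
            if (is_dirty(p) && writeback_page(backing_fd, p) < 0) {
                return -1;
            }
        }
        return fdatasync(backing_fd);
    }

A real driver would of course issue these writes through the block layer
and would also need the COR/stream half to populate the buffer on demand,
but the "dirty bitmap, asynchronous writeback, flush waits until clean"
control flow would look roughly like this.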
The guest kernel already implements this - it's the page cache and the
block layer! Doing it in QEMU with dirty memory logging enabled is less
efficient than doing it in the guest. That's why I said it's better to
just use block devices than to implement buffering.

I'm saying that persistent memory emulation on top of the iscsi:// block
driver (for example) does not make sense. It could be implemented but
the performance wouldn't be better than block I/O and the
complexity/code size in QEMU isn't justified IMO.

Stefan

> > Most protocol drivers do not support direct memory access. iscsi,
> > curl, etc just don't fit the model. One might be tempted to implement
> > buffering but at that point it's better to just use block devices.
> >
> > I have CCed Pankaj, who is working on the virtio-pmem device. I need
> > to be clear that emulated NVDIMM cannot be supported with the block
> > layer since it lacks a guest flush mechanism. There is no way for
> > applications to let the hypervisor know the file needs to be fsynced.
> > That's what virtio-pmem addresses.
> >
> > Summary:
> > A subset of the block layer could be used to back virtio-pmem. This
> > requires a new block driver API and the KVM async page fault
> > mechanism for trapping and mapping pages. Actual emulated NVDIMM
> > devices cannot be supported unless the hardware specification is
> > extended with a virtualization-friendly interface in the future.
> >
> > Please let me know your thoughts.
> >
> > Stefan
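(To make the "no guest flush mechanism" point concrete, here is a rough,
hand-written x86 sketch of how an application inside the guest persists
data to a DAX-mapped region, roughly what libpmem's pmem_persist() does.
The persist() helper and the fixed 64-byte cache line size are illustrative
assumptions, not code from any patch in this thread; it needs -mclwb to
build.)

    /* Persistence from inside the guest: plain stores followed by cache
     * line write-backs and a fence. Nothing in this path is device I/O
     * that the hypervisor could trap and turn into an fsync of the image
     * file, which is the gap virtio-pmem closes with an explicit flush
     * request from the guest driver.
     */
    #include <immintrin.h>      /* _mm_clwb (CLWB), _mm_sfence */
    #include <stdint.h>
    #include <string.h>

    #define CACHELINE 64

    static void persist(void *pmem_dst, const void *src, size_t len)
    {
        memcpy(pmem_dst, src, len);                 /* ordinary stores */

        uintptr_t line = (uintptr_t)pmem_dst & ~(uintptr_t)(CACHELINE - 1);
        uintptr_t end  = (uintptr_t)pmem_dst + len;
        for (; line < end; line += CACHELINE) {
            _mm_clwb((void *)line);                 /* write back each cache line */
        }
        _mm_sfence();                               /* order the write-backs */
    }

That is the reason emulated NVDIMM cannot be supported with the block
layer as-is: the flush never leaves the CPU/memory path, so the interface
would have to grow a virtualization-friendly flush, which is exactly what
virtio-pmem provides as a paravirtual device.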