Am 02.07.2018 um 14:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 29.06.2018 20:40, Eric Blake wrote:
> > On 06/29/2018 12:30 PM, John Snow wrote:
> > > 
> > > 
> > > On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> > > > We need to synchronize backup job with reading from fleecing image
> > > > like it was done in block/replication.c.
> > > > 
> > > > Otherwise, the following situation is theoretically possible:
> > > > 
> > > > 1. the client starts reading
> > > > 2. the client sees that there is no corresponding cluster in the
> > > >     fleecing image
> > > 
> > > I don't think the client refocuses the read, but QEMU does. (the client
> > > has no idea if it is reading from the fleecing node or the backing
> > > file.)
> > > 
> > > ... but understood:
> > > 
> > > > 3. the client is about to read from the backing file (i.e. the
> > > >     active image)
> > > > 4. the guest writes to the active image
> > > 
> > > My question here is whether QEMU will allow reads and writes to
> > > interleave in general. If the client is reading from the backing file,
> > > the active image, will QEMU allow a write to modify that data before
> > > we're done getting that data?
> > > 
> > > > 5. this write is intercepted by backup(sync=none) and the cluster is
> > > >     copied to the fleecing image
> > > > 6. the guest write continues...
> > > > 7. and the client reads _new_ (or partly new) data from the active image
> > > > 
> > > 
> > > I can't answer this for myself one way or the other right now, but I
> > > imagine you observed a failure in practice in your test setups, which
> > > motivated this patch?
> > > 
> > > A test or some proof would help justify this patch. It would also help
> > > prove that it solves what it claims to!
> > 
> > In fact, do we really need a new filter, or do we just need to make the
> > "sync":"none" blockdev-backup job take the appropriate synchronization
> > locks?
> > 
> 
> How? We'd need an additional mechanism like serializing requests, or a way
> to reuse serializing requests. Using the backup job's internal
> synchronization looks simpler, and it is already used in block replication.

But it is also just an ugly hack that fixes one special case and leaves
everything else broken. replication is usually not a good example for
anything; it always gives me bad surprises when I have to look at it.

We'll have to figure out where to fix this problem (or what it really
is, once you look at more than just fleecing), but I think requiring the
user to add a filter driver to work around missing serialisation in
other code, and corrupting their image if they forget to, is not a
reasonable solution.

I see at least two things wrong in this context:

* The permissions don't seem to match reality. The NBD server
  unconditionally shares PERM_WRITE, which is wrong in this case. The
  client wants to see a point-in-time snapshot that never changes. This
  should become an option so that it can be properly reflected in the
  permissions used.

* Once we have proper permissions, the fleecing setup breaks down
  because the guest needs PERM_WRITE on the backing file, but the
  fleecing overlay allows that only if the NBD client allows it (which
  it doesn't for fleecing).

  Now we could build an exception right into backup that installs a
  backup filter driver between source and target if the source is the
  backing file of the target. The filter driver would be similar to the
  commit filter driver in that it simply promises !PERM_WRITE to its
  parents, but allows PERM_WRITE on the source because it has installed
  the before_write_notifier that guarantees this condition.

  All writes to the target that are made by the backup job in this setup
  (including before_write_notifier writes) need to be marked as
  serialising so that any concurrent reads are completed first.
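
  Roughly what I have in mind for the permission part, as a completely
  untested sketch modelled on bdrv_commit_top_child_perm() in
  block/commit.c (the backup_top name is made up here, and this assumes
  the NBD export stops sharing PERM_WRITE as described above):

static void bdrv_backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
                                       const BdrvChildRole *role,
                                       BlockReopenQueue *reopen_queue,
                                       uint64_t perm, uint64_t shared,
                                       uint64_t *nperm, uint64_t *nshared)
{
    /* The filter itself needs no permissions on the source... */
    *nperm = 0;

    /* ...and it shares everything on the source, including PERM_WRITE
     * for the guest, even though the filter's own parents (e.g. the NBD
     * export) don't get PERM_WRITE shared on the filter node. The
     * copy-before-write logic is what keeps the parents' view of the
     * data constant. */
    *nshared = BLK_PERM_ALL;
}

  The copy-before-write writes to the target would then be submitted
  with something like a new serialising request flag (BDRV_REQ_SERIALISING
  doesn't exist, so this fragment is purely hypothetical):

    /* in the copy-before-write path: make the write to the target
     * serialising so that overlapping in-flight reads from the fleecing
     * overlay complete before the old data is overwritten */
    ret = blk_co_pwritev(job->target, start, n, &bounce_qiov,
                         BDRV_REQ_SERIALISING);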

And if we decide to add a target filter to backup, we should probably at
the same time use a filter driver for intercepting source writes instead
of using before_write_notifier.
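
For the source side, the filter's write handler could look roughly like
this (again untested; backup_top_co_pwritev() and backup_top_cbw() are
made-up names, and whether the source hangs off bs->backing or bs->file
is up for discussion):

static coroutine_fn int backup_top_co_pwritev(BlockDriverState *bs,
                                              uint64_t offset, uint64_t bytes,
                                              QEMUIOVector *qiov, int flags)
{
    /* Copy the old data to the target before it can be overwritten;
     * backup_top_cbw() stands in for the job's copy-before-write logic */
    int ret = backup_top_cbw(bs, offset, bytes);
    if (ret < 0) {
        return ret;
    }

    /* Only now let the guest write reach the source image */
    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
}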

Max, I think you intended to make both source and target children of the
same block job node (or at least for mirror). But wouldn't that create
loops in a setup like this? I think we may need two filters that are
only connected through the block job, but not with block graph edges.

Kevin
