Kevin Wolf <kw...@redhat.com> wrote on 04/28/2020 07:11:24 AM:
> 
> Am 27.04.2020 um 21:49 hat Bryan S Rosenburg geschrieben:
> > Blockdev community,
> > 
> > Our group would like to write block device backups directly to an
> > object store, using an interface such as s3fs or rclone-mount. We've
> > run into problems with both interfaces, and in both cases the
> > problems revolve around fdatasync system calls. With s3fs, fdatasync
> > calls are painfully slow. With rclone-mount, the calls are very fast
> > but don't do anything.
> > 
> > Syncing files to an object store is inherently problematic, as a
> > proper sync requires finalizing the object that holds the file. After
> > finalization, additional writes to the file require a new object to
> > be created and the old object to be copied and destroyed. This
> > process results in an N-squared performance problem for files that
> > are synced periodically as they are written, as is the case for qemu
> > backups.
> > 
> > Empirically, s3fs implements fdatasync, and hence backups written to
> > s3fs take an untenably long time. I can provide data and straces, if
> > needed.
> > 
> > Backups written to rclone-mount are much faster, but there are
> > obvious semantic problems. The backup job completes successfully
> > before the file is actually stable in the object store. And in fact,
> > a lot of the work of finalizing the file occurs during the "close"
> > system call that is invoked as part of the qmp_blockdev_del
> > operation. The syscall causes that operation to take so long that
> > other commands time out waiting to "acquire state change lock (held
> > by monitor qemuProcessEventHandler)".
> > 
> > My questions for the group are: Has anyone else tried writing backups
> > to file systems that don't have good support for fdatasync, and do
> > you have any advice other than "Don't do that." ?
> 
> I think "don't do that" is a good answer actually.
> 
> You may want to put an NBD indirection between QEMU and your object
> store, so that the close() syscall will just block a qemu-nbd process
> that has already closed its connection to QEMU instead of blocking all
> of QEMU.
> 
> It is possible to disable fdatasync() by specifying cache=unsafe for
> the block device, so you could avoid the penalty of repeated syncs on
> s3fs.
> 
> Of course, if s3fs requires an fsync before data is actually stable, in
> this case you couldn't consider your backup completed when the backup
> block job finishes successfully, but you would have to issue an fsync
> manually and wait for its result before you can consider the backup
> successful.
> 
> Kevin

Thanks, Kevin.

It sounds like we should be specifying cache=unsafe when using 
rclone-mount, at least, so qemu won't think the file system is 
implementing fdatasyncs when it's not.
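
For the manual fsync step Kevin describes, something along these lines
might work as a post-backup check. This is only a rough sketch: the helper
name fsync_path is made up for illustration, and it assumes that an fsync
issued through a freshly opened read-only descriptor is enough to make the
file system finalize the object, which would need to be verified for s3fs
and rclone-mount.

```python
import os


def fsync_path(path: str) -> None:
    """Open an existing file and block until fsync() on it returns.

    Hypothetical helper: whether the data is really stable afterwards
    depends on how the underlying file system implements fsync.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Raises OSError if the file system reports a failure.
        os.fsync(fd)
    finally:
        os.close(fd)
```

The idea would be to call fsync_path() on the backup file after the backup
block job reports success, and to treat the backup as complete only if the
call returns without raising OSError.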

- Bryan
