Kevin Wolf <kw...@redhat.com> wrote on 04/28/2020 07:11:24 AM:

>
> On 27.04.2020 at 21:49, Bryan S Rosenburg wrote:
> > Blockdev community,
> >
> > Our group would like to write block device backups directly to an object
> > store, using an interface such as s3fs or rclone-mount. We've run into
> > problems with both interfaces, and in both cases the problems revolve
> > around fdatasync system calls. With s3fs, fdatasync calls are painfully
> > slow. With rclone-mount, the calls are very fast but don't do anything.
> >
> > Syncing files to an object store is inherently problematic, as a proper
> > sync requires finalizing the object that holds the file. After
> > finalization, additional writes to the file require a new object to be
> > created and the old object to be copied and destroyed. This process
> > results in an N-squared performance problem for files that are synced
> > periodically as they are written, as is the case for qemu backups.
> >
> > Empirically, s3fs implements fdatasync, and hence backups written to s3fs
> > take an untenably long time. I can provide data and straces, if needed.
> >
> > Backups written to rclone-mount are much faster, but there are obvious
> > semantic problems. The backup job completes successfully before the file
> > is actually stable in the object store. And in fact, a lot of the work of
> > finalizing the file occurs during the "close" system call that is invoked
> > as part of the qmp_blockdev_del operation. The syscall causes that
> > operation to take so long that other commands time out waiting to "acquire
> > state change lock (held by monitor qemuProcessEventHandler)".
> >
> > My questions for the group are: Has anyone else tried writing backups to
> > file systems that don't have good support for fdatasync, and do you have
> > any advice other than "Don't do that."?
>
> I think "don't do that" is a good answer actually.
>
> You may want to put an NBD indirection between QEMU and your object
> store, so that the close() syscall will just block a qemu-nbd process
> that has already closed its connection to QEMU instead of blocking all
> of QEMU.
>
> It is possible to disable fdatasync() by specifying cache=unsafe for
> the block device, so you could avoid the penalty of repeated syncs on
> s3fs.
>
> Of course, if s3fs requires an fsync before data is actually stable, in
> this case you couldn't consider your backup completed when the backup
> block job finishes successfully, but you would have to issue an fsync
> manually and wait for its result before you can consider the backup
> successful.
>
> Kevin

Thanks, Kevin. It sounds like we should be specifying cache=unsafe when
using rclone-mount, at least, so qemu won't think the file system is
implementing fdatasyncs when it's not.

- Bryan
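
For reference, the manual fsync step Kevin describes could look roughly
like the sketch below. It is not taken from QEMU or from either mount
tool. The helper name fsync_backup and the idea of running it after the
backup block job reports completion are assumptions for illustration. It
only helps on a file system that really implements fsync (s3fs, per the
observations above); on a mount where the call is a no-op it proves
nothing.

```python
import os


def fsync_backup(path):
    """Force an already-written backup file to stable storage.

    Hypothetical helper: meant to be called after the backup job has
    finished and QEMU has released the file. Any OSError from open,
    fsync, or close propagates to the caller, so the backup should be
    treated as successful only if this returns without raising.
    """
    # Open without O_TRUNC or O_CREAT so the existing data is untouched.
    fd = os.open(path, os.O_RDWR)
    try:
        # Blocks until the file system reports the data as stable.
        os.fsync(fd)
    finally:
        # close() can also do real work (and fail) on FUSE mounts,
        # so its error is deliberately not swallowed.
        os.close(fd)
```

Calling fsync_backup("/mnt/s3fs/backup.img") (path made up) would then be
the last step before declaring the backup done, paying the sync cost once
rather than on every periodic flush.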