On 18.12.2013 at 15:42, ronnie sahlberg <ronniesahlb...@gmail.com> wrote:
> On Wed, Dec 18, 2013 at 2:00 AM, Orit Wasserman <owass...@redhat.com> wrote:
>> On 12/18/2013 01:03 AM, Peter Lieven wrote:
>>>
>>>> On 17.12.2013 at 18:32, "Daniel P. Berrange" <berra...@redhat.com> wrote:
>>>>
>>>>> On Tue, Dec 17, 2013 at 10:15:25AM +0100, Peter Lieven wrote:
>>>>> This patch adds native support for accessing images on NFS shares
>>>>> without the requirement to actually mount the entire NFS share on
>>>>> the host.
>>>>>
>>>>> NFS images can simply be specified by a URL of the form:
>>>>> nfs://<host>/<export>/<filename>
>>>>>
>>>>> For example:
>>>>> qemu-img create -f qcow2 nfs://10.0.0.1/qemu-images/test.qcow2
>>>>
>>>> Does it support other config tunables, eg specifying which
>>>> NFS version to use 2/3/4 ? If so will they be available as
>>>> URI parameters in the obvious manner ?
>>>
>>> currently only v3 is supported by libnfs. what other tunables would you
>>> like to see?
>>>
>>
>> For live migration we need the sync option (async ignores O_SYNC and
>> O_DIRECT sadly),
>> will it be supported? or will it be the default?
>>
>
> If you use the high-level API that provides posix like functions, such
> as nfs_open(), then libnfs does support it.
> nfs_open()/nfs_open_async() takes a mode parameter and libnfs checks
> the O_SYNC flag in modes.
>
> By default libnfs will translate any nfs_write*() or nfs_pwrite*() to
> NFS/WRITE3+UNSTABLE that allows the server to just write to
> cache/memory.
>
> If you specify O_SYNC in the mode argument to nfs_open/nfs_open_async
> then libnfs will flag this handle as sync and any calls to
> nfs_write/nfs_pwrite will translate to NFS/WRITE3+FILE_SYNC
>
> Calls to nfs_fsync are translated to NFS/COMMIT3

If this NFS/COMMIT3 issues a sync on the server, that would be all we
actually need, and in that case migration should be safe. Even if we open
a file with cache=none, qemu would issue such a commit after every write.
This also allows for writeback caching, where the filesystem flushes would
go right through to the server.

> Of course, as for normal file I/O this is useful but not great since
> you can only control the sync vs async per open filehandle.
> Libnfs does also allow you to send raw rpc commands to the server and
> using this API you can control the sync behaviour for individual
> writes.
> This means you could do something like
> * emulate SCSI to the guest.
> * if guest sends SCSI/WRITE* without any FUA bits set, then for that
>   I/O you send a NFS3/WRITE+UNSTABLE
> * if guest sends SCSI/WRITE* with FUA bits set, then for that I/O you
>   send NFS3/WRITE+FILE_SYNC
> and then the guest kernel can control for each individual write
> whether it is sync or not.
>
> But that is probably something that can wait until later and doesn't
> need to be part of the initial patch?
> If Peter wants to do this in the future I can create a small writeup
> on how to mix in the two different APIs using the same context.

We can do that, but I would like to focus on the basic functionality first.

Peter
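
For reference, the two write policies discussed in this thread (sync per
open handle via O_SYNC, versus sync per individual write via the SCSI FUA
bit) can be sketched roughly as below. This is only an illustration under
stated assumptions, not code from the patch and not libnfs's actual
implementation: struct sketch_handle, sketch_open(),
stability_for_handle() and stability_for_scsi_write() are hypothetical
names, and no libnfs calls are made. Only the stable_how values are taken
from the NFSv3 specification (RFC 1813).

```c
#include <assert.h>
#include <fcntl.h>    /* O_SYNC */
#include <stdbool.h>
#include <stddef.h>

/* stable_how values for the NFSv3 WRITE procedure, as in RFC 1813. */
enum stable_how {
    UNSTABLE  = 0,   /* server may keep the data in cache/memory */
    DATA_SYNC = 1,   /* data must reach stable storage */
    FILE_SYNC = 2    /* data and metadata must reach stable storage */
};

/* Hypothetical stand-in for a file handle; NOT libnfs's struct nfsfh. */
struct sketch_handle {
    bool is_sync;    /* set once at open time, applies to every write */
};

/* Per-handle policy: O_SYNC in the open flags marks the handle as sync. */
static struct sketch_handle sketch_open(int flags)
{
    struct sketch_handle h = { .is_sync = (flags & O_SYNC) != 0 };
    return h;
}

/* Every write on a sync handle uses FILE_SYNC, otherwise UNSTABLE. */
static enum stable_how stability_for_handle(const struct sketch_handle *h)
{
    assert(h != NULL);
    return h->is_sync ? FILE_SYNC : UNSTABLE;
}

/* Per-write policy: the guest's FUA bit decides for each single write. */
static enum stable_how stability_for_scsi_write(bool fua)
{
    return fua ? FILE_SYNC : UNSTABLE;
}
```

With the per-handle policy a handle opened with O_WRONLY | O_SYNC always
yields FILE_SYNC, whereas the per-write policy lets a guest mix UNSTABLE
and FILE_SYNC writes on the same file. In both cases, data written as
UNSTABLE still needs a later COMMIT3 to be guaranteed on stable storage.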