Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On Sun, 07 Aug 2011 21:28:17 +0300 Ronen Hod r...@redhat.com wrote: Well, we want to support Microsoft's VSS, and that requires a guest agent that communicates with all the writers (applications), waiting for them to flush their app data in order to generate a consistent app-level snapshot. The VSS platform does most of the work. Still, the bottom line is that the agent's role is only to find the right moment in time. This moment can be relayed back to libvirt, and from there things can be done according to your suggestion, so that the guest agent does not do the freeze itself and is actually not a mandatory component. I think this discussion has reached the point where patches will speak louder than words.
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
Well, we want to support Microsoft's VSS, and that requires a guest agent that communicates with all the writers (applications), waiting for them to flush their app data in order to generate a consistent app-level snapshot. The VSS platform does most of the work. Still, the bottom line is that the agent's role is only to find the right moment in time. This moment can be relayed back to libvirt, and from there things can be done according to your suggestion, so that the guest agent does not do the freeze itself and is actually not a mandatory component. Ronen.
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On Thu, Jul 28, 2011 at 11:53:50AM +0900, Fernando Luis Vázquez Cao wrote: On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote: making sure no lib is calling any I/O function to be able to defreeze the filesystems later, making sure the oom killer or a wrong kill -9 $RANDOM isn't killing the agent by mistake while the I/O is blocked and the copy is going. Yes, with the current API, if the agent is killed while the filesystems are frozen we are screwed. I have just submitted patches that implement a new API that should make the virtualization use case more reliable. Basically, I am adding a new ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns a file descriptor; as long as that file descriptor is held open, the filesystem remains frozen. If the freeze file descriptor is closed (be it through an explicit call to close(2) or as part of process exit housekeeping) the associated filesystem is automatically thawed. - fsfreeze: add ioctl to create a fd for freeze control http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2 - fsfreeze: add freeze fd ioctls http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2 This is probably how the API should have been implemented originally instead of FIFREEZE/FITHAW. It looks a bit overkill though; I would think it'd be enough to have the fsfreeze forced at FIGETFREEZEFD, and the only way to thaw be closing the file, without requiring any of FS_FREEZE_FD/FS_THAW_FD/FS_ISFROZEN_FD. But I guess you have use cases for those if you implemented it, maybe to check whether root is stepping on its own toes by checking if the fs is already frozen before freezing it and returning failure if it is; running an ioctl instead of opening and closing the file isn't necessarily better. At the very least the get_user(should_freeze, argp) doesn't seem so necessary; it just complicates the ioctl API a bit without much gain. I think it'd be cleaner if FS_FREEZE_FD was the only way to freeze then.
It's certainly a nice reliability improvement and a safer API. Now, if you add a file descriptor to epoll/poll that userland can open and talk to, to know when an fsfreeze is requested on a certain fs, an fsfreeze userland agent (not virt related either) could open it and start the scripts when that filesystem is being frozen, before freeze_super() is called. Then a PARAVIRT_FSFREEZE=y/m driver could just invoke the fsfreeze without any dependency on a virt-specific guest agent. Maybe Christoph's right that there are filesystems in userland (not sure how the storage is related; it's all about filesystems and apps as far as I can see, and it's all blkdev agnostic) that may make things more complicated, but those usually have a kernel backend too (like fuse). I may not see the full picture of the filesystem in userland or how the storage agent in guest userland relates to this. If you believe having libvirt talk QMP/QAPI over a virtio-serial vmchannel with some virt-specific guest userland agent, bypassing qemu entirely, is better, that's ok with me, but there should be a strong reason for it, because the paravirt_fsfreeze.ko approach with a small qemu backend and a qemu monitor command that starts paravirt-fsfreeze in the guest before going ahead blocking all I/O (to provide backwards compatibility and reliable snapshots to guest OSes that won't have the paravirt fsfreeze) looks more reliable, more compact and simpler to use to me. I'll surely be OK either way though. Thanks, Andrea
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On 07/27/11 18:40, Andrea Arcangeli wrote: Another thing to note is that snapshotting is not necessarily something that should be completely transparent to the guest. One of the planned future features for the guest agent (mentioned in the snapshot wiki, and a common use case that I've seen come up elsewhere as well in the context of database applications) is a way for userspace applications to register callbacks to be made in the event of a freeze (dumping application-managed caches to disk and things along that line). Not sure if the scripts are really needed or if they would just open a brand new fsfreeze-specific unix domain socket (created by the database) to tell the database to freeze. If the latter is the case, then rather than changing the database to open a unix domain socket so the script can connect to it when invoked (or maybe just adding some new function to the protocol of an existing open unix domain socket), it'd be better to change the database to open a /dev/virtio-fsfreeze device, created by the virtio-fsfreeze.ko virtio driver through udev. The database would poll it; it could read the request to freeze, and write into it that it finished freezing when done. Then, when all openers of the device had frozen, virtio-fsfreeze.ko would go ahead freezing all the filesystems, and then tell qemu when it's finished freezing. Then qemu can finally block all the I/O and tell libvirt to go ahead with the snapshot. I think it could also be a combined operation, i.e. having the freeze happen in the kernel, but doing the callouts using a userspace daemon. I like the userspace daemon for the callouts because it allows providing a more sophisticated API than if we provide just a socket-like interface. In addition the callout is less critical wrt crashes than the fsfreeze operations. Cheers, Jes
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On 07/27/11 20:36, Christoph Hellwig wrote: Initiating the freeze from kernelspace doesn't make much sense. With virtio we could add an in-band freeze request to the protocol, and although that would be a major change in the way virtio-blk works right now, it's at least doable. But all other real storage targets only communicate with their initiators over out-of-band protocols that are entirely handled in userspace, and given their high-level nature are better off there - that is, if we know them at all, given how vendors like to keep this secret IP closed and just offer userspace management tools in binary form. Building new infrastructure in the kernel just for virtio, while needing to duplicate the same thing in userspace for all real storage, seems like a really bad idea. That is in addition to the userspace freeze notifier similar to what e.g. Windows has - if the freeze process is driven from userspace it's much easier to handle those properly compared to requiring kernel upcalls. The freeze operation would really just be a case of walking the list of mounted file systems and calling the FIFREEZE ioctl operation on them. I wouldn't anticipate doing anything else in a virtio-fsfreeze.ko module. Cheers, Jes
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On 07/28/2011 03:03 AM, Andrea Arcangeli wrote: On Thu, Jul 28, 2011 at 11:53:50AM +0900, Fernando Luis Vázquez Cao wrote: On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote: making sure no lib is calling any I/O function to be able to defreeze the filesystems later, making sure the oom killer or a wrong kill -9 $RANDOM isn't killing the agent by mistake while the I/O is blocked and the copy is going. Yes, with the current API, if the agent is killed while the filesystems are frozen we are screwed. I have just submitted patches that implement a new API that should make the virtualization use case more reliable. Basically, I am adding a new ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns a file descriptor; as long as that file descriptor is held open, the filesystem remains frozen. If the freeze file descriptor is closed (be it through an explicit call to close(2) or as part of process exit housekeeping) the associated filesystem is automatically thawed. - fsfreeze: add ioctl to create a fd for freeze control http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2 - fsfreeze: add freeze fd ioctls http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2 This is probably how the API should have been implemented originally instead of FIFREEZE/FITHAW. It looks a bit overkill though; I would think it'd be enough to have the fsfreeze forced at FIGETFREEZEFD, and the only way to thaw be closing the file, without requiring any of FS_FREEZE_FD/FS_THAW_FD/FS_ISFROZEN_FD. But I guess you have use cases One of the crappy things about the current implementation is the inability to determine whether or not a filesystem is frozen. At least in the context of the guest agent, it'd be nice if guest-fsfreeze-status checked the actual system state rather than some internal state that may not necessarily reflect reality (if we freeze, and some other application thaws, we currently still report the state as frozen).
Also, in the context of the guest agent, we are indeed screwed if the agent gets killed while in a frozen state, and we remain screwed even if it's restarted, since we have no way of determining whether or not we're in a frozen state and thus should disable logging operations. We could check status by looking for a failure from the freeze operation, but if you're just interested in getting the state, having to potentially induce a freeze just to get at the state is really heavy-handed. So having an open operation that doesn't force a freeze/thaw/status operation serves some fairly common use cases, I think. for those if you implemented it, maybe to check whether root is stepping on its own toes by checking if the fs is already frozen before freezing it and returning failure if it is; running an ioctl instead of opening and closing the file isn't necessarily better. At the very least the get_user(should_freeze, argp) doesn't seem so necessary; it just complicates the ioctl API a bit without much gain. I think it'd be cleaner if FS_FREEZE_FD was the only way to freeze then. It's certainly a nice reliability improvement and a safer API. Now, if you add a file descriptor to epoll/poll that userland can open and talk to, to know when an fsfreeze is requested on a certain fs, an fsfreeze userland agent (not virt related either) could open it and start the scripts when that filesystem is being frozen, before freeze_super() is called. Then a PARAVIRT_FSFREEZE=y/m driver could just invoke the fsfreeze without any dependency on a virt-specific guest agent. Maybe Christoph's right that there are filesystems in userland (not sure how the storage is related; it's all about filesystems and apps as far as I can see, and it's all blkdev agnostic) that may make things more complicated, but those usually have a kernel backend too (like fuse). I may not see the full picture of the filesystem in userland or how the storage agent in guest userland relates to this.
If you believe having libvirt talk QMP/QAPI over a virtio-serial vmchannel with some virt-specific guest userland agent, bypassing qemu entirely, is better, that's ok with me, but there should be a strong reason for it, because the paravirt_fsfreeze.ko approach with a small qemu backend and a qemu monitor command that starts paravirt-fsfreeze in the guest before going ahead blocking all I/O (to provide backwards compatibility and reliable snapshots to guest OSes that won't have the paravirt fsfreeze) looks more reliable, more compact and simpler to use to me. I'll surely be OK either way though. Thanks, Andrea
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On 07/28/2011 03:54 AM, Jes Sorensen wrote: On 07/27/11 18:40, Andrea Arcangeli wrote: Another thing to note is that snapshotting is not necessarily something that should be completely transparent to the guest. One of the planned future features for the guest agent (mentioned in the snapshot wiki, and a common use case that I've seen come up elsewhere as well in the context of database applications) is a way for userspace applications to register callbacks to be made in the event of a freeze (dumping application-managed caches to disk and things along that line). Not sure if the scripts are really needed or if they would just open a brand new fsfreeze-specific unix domain socket (created by the database) to tell the database to freeze. If the latter is the case, then rather than changing the database to open a unix domain socket so the script can connect to it when invoked (or maybe just adding some new function to the protocol of an existing open unix domain socket), it'd be better to change the database to open a /dev/virtio-fsfreeze device, created by the virtio-fsfreeze.ko virtio driver through udev. The database would poll it; it could read the request to freeze, and write into it that it finished freezing when done. Then, when all openers of the device had frozen, virtio-fsfreeze.ko would go ahead freezing all the filesystems, and then tell qemu when it's finished freezing. Then qemu can finally block all the I/O and tell libvirt to go ahead with the snapshot. I think it could also be a combined operation, i.e. having the freeze happen in the kernel, but doing the callouts using a userspace daemon. I like the userspace daemon for the callouts because it allows providing a more sophisticated API than if we provide just a socket-like interface. In addition the callout is less critical wrt crashes than the fsfreeze operations. I'd prefer this approach as well.
We could potentially implement it with a more general mechanism for executing scripts in the guest for whatever reason, rather than an fsfreeze-specific one. Let the management layer handle the orchestration between the two. Whether the freeze is kernel-driven or not can, I think, go either way, though the potential issues I mentioned in response to Fernando's post seem to suggest those proposed changes are required for a proper guest agent implementation, and at that point we're talking about kernel changes either way for the functionality we ultimately want. I think there may still be value in retaining the current fsfreeze support in the agent for older guests, however. What I'm convinced of now, though, is that the freeze operation should not be tethered to the application callback operation, since the latter is applicable to other potential fsfreeze mechanisms. Cheers, Jes
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
Michael Roth wrote: On 07/28/2011 03:03 AM, Andrea Arcangeli wrote: On Thu, Jul 28, 2011 at 11:53:50AM +0900, Fernando Luis Vázquez Cao wrote: On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote: making sure no lib is calling any I/O function to be able to defreeze the filesystems later, making sure the oom killer or a wrong kill -9 $RANDOM isn't killing the agent by mistake while the I/O is blocked and the copy is going. Yes, with the current API, if the agent is killed while the filesystems are frozen we are screwed. I have just submitted patches that implement a new API that should make the virtualization use case more reliable. Basically, I am adding a new ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns a file descriptor; as long as that file descriptor is held open, the filesystem remains frozen. If the freeze file descriptor is closed (be it through an explicit call to close(2) or as part of process exit housekeeping) the associated filesystem is automatically thawed. - fsfreeze: add ioctl to create a fd for freeze control http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2 - fsfreeze: add freeze fd ioctls http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2 This is probably how the API should have been implemented originally instead of FIFREEZE/FITHAW. It looks a bit overkill though; I would think it'd be enough to have the fsfreeze forced at FIGETFREEZEFD, and the only way to thaw be closing the file, without requiring any of FS_FREEZE_FD/FS_THAW_FD/FS_ISFROZEN_FD. But I guess you have use cases One of the crappy things about the current implementation is the inability to determine whether or not a filesystem is frozen. At least in the context of the guest agent, it'd be nice if guest-fsfreeze-status checked the actual system state rather than some internal state that may not necessarily reflect reality (if we freeze, and some other application thaws, we currently still report the state as frozen).
Also, in the context of the guest agent, we are indeed screwed if the agent gets killed while in a frozen state, and we remain screwed even if it's restarted, since we have no way of determining whether or not we're in a frozen state and thus should disable logging operations. That is precisely the reason I added the new API. We could check status by looking for a failure from the freeze operation, but if you're just interested in getting the state, having to potentially induce a freeze just to get at the state is really heavy-handed. So having an open operation that doesn't force a freeze/thaw/status operation serves some fairly common use cases, I think. Yep. If you think there is something missing API-wise, let me know and I will implement it. Thanks, Fernando
[Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
Hello everyone, I've been thinking about the current design of the fsfreeze feature used by libvirt. It currently relies on a userland agent in the guest talking to qemu over some vmchannel communication. The guest agent walks the filesystems in the guest and calls the fsfreeze ioctl on them. Fsfreeze is an optional feature; it's not required to do safe snapshots. After fsfreeze (regardless of whether it is available or not) QEMU must still block all I/O for all qemu blkdevices before the image is saved, to allow safe snapshotting of non-linux guests. Then, if a VM is restarted from the snapshot, it becomes identical to a fault-tolerance fallback with NFS or DRBD in a highly available configuration. Fsfreeze just provides some further (minor) benefit on top of that (which probably won't be available for non-linux guests any time soon). The benefits this optional fsfreeze feature provides to the snapshot are: 1) more peace of mind by not relying on the kernel journal replay code when snapshotting journaled/cow filesystems like ext4/btrfs/xfs 2) all dirty outstanding cache is flushed, which reduces the chances of running into userland journaling data replay bugs if userland is restarted from the snapshot 3) allows safe live snapshotting of non-journaled filesystems like vfat/ext2 on linux (not so common, and vfat on non-linux guests won't benefit) 4) allows mounting the snapshotted image readonly without requiring metadata journal replay. The problem is that having a daemon in guest userland is not my preference, considering it can be done with a virtio-fsfreeze.ko kernel module in the guest without requiring any userland modification to the guest (and no interprocess communication through vmchannel or any similar mechanism). This means a kernel upgrade in the guest that adds the virtio-fsfreeze.ko virtio paravirt driver would be enough to provide fsfreeze during snapshots.
A virtio-fsfreeze.ko would certainly be more developer friendly: you could just build the kernel and even boot it with -kernel bzImage (after building it with VIRTIO_FSFREEZE=y). Then it'd just work without any daemon or vmchannel or any other change to the guest userland. I could see some advantage in not having to modify qemu if libvirt was talking directly to the guest agent, to avoid any knowledge of FSFREEZE in qemu. But it's not even like that; I see FSFREEZE guest agent patches floating around. So if qemu has to be modified and be aware of the fsfreeze feature in the userland guest agent (and not just asked to block all I/O, which doesn't require any guest knowledge and in turn would leave it agnostic about fsfreeze), I think it'd be better if the fsfreeze qemu code just went into a virtio backend. There is also an advantage in reliability, as there's no more need to worry about mlocking the memory of the userland guest agent, making sure no lib is calling any I/O function so the filesystems can be thawed later, or making sure the oom killer or a wrong kill -9 $RANDOM isn't killing the agent by mistake while the I/O is blocked and the copy is going. The guest kernel is a more reliable and natural place to call fsfreeze, through a virtio-fsfreeze guest driver, without having to spend time worrying about the reliability of the guest-agent feature. It'd surely also waste less memory in the guest (not that the agent takes much memory, but a few kbytes of .text for a kernel module would surely take a fraction of the mlocked RAM the agent would take; the RAM saving is the least interesting part, of course). If there was no hypervisor behind the kernel, it could only be userland starting an fsfreeze, so we shouldn't be fooled into thinking userland is the best place from which to start an fsfreeze invocation; it's most certainly not, but on the host (without virt) there's no other thing that could possibly ask for it.
But here we have a hypervisor behind the guest kernel that asks for it, so starting the fsfreeze through a virtio-fsfreeze.ko kernel module loaded into the guest kernel (or linked into it) sounds like a cleaner and more reliable solution (maybe simpler too). It'd certainly be a friendlier solution for developers to test or run: libvirt would talk only with qemu, and qemu would only talk with the guest kernel, without requiring any modification to the guest userland. My feeling is that what feels much simpler for developers to use usually tends to be the better solution (not guaranteed), and to me a virtio-fsfreeze.ko solution looks much simpler to use. There are drawbacks, like the fact that respinning an update to the fsfreeze code would then require an upgrade of the guest kernel instead of a package update. But there are advantages too in terms of coverage, as an updated kernel would also run on top of an older guest userland that may not have an agent package to install through a repository. In any case if the virtio-fsfreeze.ko doesn't register into the qemu virtio-fsfreeze backend, the qemu monitor command should
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On 07/27/2011 10:24 AM, Andrea Arcangeli wrote: Hello everyone, I've been thinking about the current design of the fsfreeze feature used by libvirt. It currently relies on a userland agent in the guest talking to qemu over some vmchannel communication. The guest agent walks the filesystems in the guest and calls the fsfreeze ioctl on them. Fsfreeze is an optional feature; it's not required to do safe snapshots. After fsfreeze (regardless of whether it is available or not) QEMU must still block all I/O for all qemu blkdevices before the image is saved, to allow safe snapshotting of non-linux guests. Then, if a VM is restarted from the snapshot, it becomes identical to a fault-tolerance fallback with NFS or DRBD in a highly available configuration. Fsfreeze just provides some further (minor) benefit on top of that (which probably won't be available for non-linux guests any time soon). The benefits this optional fsfreeze feature provides to the snapshot are: 1) more peace of mind by not relying on the kernel journal replay code when snapshotting journaled/cow filesystems like ext4/btrfs/xfs 2) all dirty outstanding cache is flushed, which reduces the chances of running into userland journaling data replay bugs if userland is restarted from the snapshot 3) allows safe live snapshotting of non-journaled filesystems like vfat/ext2 on linux (not so common, and vfat on non-linux guests won't benefit) 4) allows mounting the snapshotted image readonly without requiring metadata journal replay. The problem is that having a daemon in guest userland is not my preference, considering it can be done with a virtio-fsfreeze.ko kernel module in the guest without requiring any userland modification to the guest (and no interprocess communication through vmchannel or any similar mechanism). This means a kernel upgrade in the guest that adds the virtio-fsfreeze.ko virtio paravirt driver would be enough to provide fsfreeze during snapshots.
A virtio-fsfreeze.ko would certainly be more developer friendly: you could just build the kernel and even boot it with -kernel bzImage (after building it with VIRTIO_FSFREEZE=y). Then it'd just work without any daemon or vmchannel or any other change to the guest userland. I could see some advantage in not having to modify qemu if libvirt was talking directly to the guest agent, to avoid any knowledge of FSFREEZE in qemu. But it's not even like that; I see FSFREEZE guest agent patches floating around. So if qemu has to be modified and be aware of the fsfreeze feature in the userland guest agent (and not just asked to block all I/O, which doesn't require any guest knowledge and in turn would leave it agnostic about fsfreeze), I think it'd be better if the fsfreeze qemu code just went into a virtio backend. Currently, QEMU doesn't know about fsfreeze. I don't think it ever will either. I understand an agent may be needed for other features, but I think whenever a feature is better off not requiring userland guest support, it shouldn't require it. To me, requiring modifications to the guest userland looks like the least transparent and most intrusive possible way to implement a libvirt feature, so it should only be used when it has advantages, and I see mostly disadvantages here. I also dislike having to orchestrate all of the freezing stuff because it's extremely hard to do reliably in userspace. One challenge though is that it's highly desirable to have script hooks as part of the freeze process to let other userspace applications participate, which means you will always need some userspace daemon to kick things off. Instead of having a virtio-fsfreeze, I think it would be better to think about whether the kernel needs a higher-level interface such that the userspace operation is dirt-simple. But I don't see a way to avoid userspace involvement in this set of operations, unfortunately. Regards, Anthony Liguori This is just a suggestion; I think the agent should work too.
Thanks a lot, Andrea
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
Hello Michael,

On Wed, Jul 27, 2011 at 11:07:13AM -0500, Michael Roth wrote: One thing worth mentioning is that the current host-side interface to the guest agent is not what we're hoping to build libvirt interfaces around. It's a standalone, out-of-band tool for now, but when QMP is converted to QAPI the guest agent interfaces will be exposed to the host transparently as normal QMP commands. libvirt shouldn't need to distinguish a guest-agent-induced fsfreeze from a guest-kernel-induced fsfreeze (except perhaps to identify extended capabilities in a particular case): http://wiki.qemu.org/Features/QAPI/GuestAgent

Sounds good.

Another thing to note is that snapshotting is not necessarily something that should be completely transparent to the guest. One of the planned future features for the guest agent (mentioned in the snapshot wiki, and a common use case I've seen come up elsewhere as well, in the context of database applications) is a way for userspace applications to register callbacks to be made in the event of a freeze (dumping application-managed caches to disk and things along that line). The implementation of this would likely be a directory where applications can place scripts that get called in the event of a freeze, something that would require a user-space daemon anyway.

I'm not sure the scripts are really needed, or whether they would just open a brand-new fsfreeze-specific unix domain socket (created by the database) to tell the database to freeze. If the latter is the case, then rather than changing the database to open a unix domain socket the script can connect to when invoked (or adding some new function to the protocol of an existing open unix domain socket), it would be better to change the database to open a /dev/virtio-fsfreeze device, created by the virtio-fsfreeze.ko virtio driver through udev. The database would poll it, read the request to freeze, and write back that it finished freezing when done. Then, once all openers of the device have frozen, virtio-fsfreeze.ko would go ahead and freeze all the filesystems, and tell qemu when it's finished. Then qemu can finally block all the I/O and tell libvirt to go ahead with the snapshot.

If a script hangs (guest-agent approach), or if the database hangs while keeping the /dev/virtio-fsfreeze device open (virtio-fsfreeze.ko approach), that would hang the whole fsfreeze operation in the virtio-fsfreeze.ko driver; otherwise a timeout would be required. The general point is that the more stuff has to be frozen (especially when userland is involved, rather than just guest kernel code as with virtio-fsfreeze.ko), the higher the risk of a hang (or, alternatively, of a false-positive timeout, if there's a timeout). If scripts are needed, then the agent starting the scripts with execve could also open /dev/virtio-fsfreeze instead of being invoked through the QMP/QAPI communication with libvirt. The advantage, at least, is that if the database is killed, closing the file will not lead to a hang or a failure of the fsfreeze; if the agent is killed, things go bad instead (either hang or timeout). Maybe it's more a matter of taste, and maybe my taste makes me prefer a virtio-fsfreeze.ko that registers a /dev/virtio-fsfreeze device that any app can open. The permissions on the device would also define which apps may cause a false-positive timeout of the snapshotting, or a hang.

Also, in terms of supporting older guests, the proposed guest tools ISO (akin to the virtualbox/vmware guest tools): http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg02239.html would give us a distribution channel that doesn't require any involvement from distro maintainers. A distro package to bootstrap the agent would still be preferable, but the ISO approach seems to work well in practice.
And for managed environments, getting custom packages installed generally isn't as much of a problem as requiring reboots or kernel changes.

Nice to see it works for more hypervisors. I think it boils down to whether an agent is needed for fsfreeze or not. I think it's not, but I also tend to agree it can work with the agent. As a developer I don't have much doubt that a virtio driver with no userland change would be much simpler for me to use, but I may be biased. I just don't see many cons to the kernel solution, except perhaps that to change the fsfreeze code you have to respin a kernel update.
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On Wed, Jul 27, 2011 at 11:34:44AM -0500, Anthony Liguori wrote: Currently, QEMU doesn't know about fsfreeze. I don't think it ever will either.

Ah, sorry, thanks for the correction; it's some other repo that you were modifying (qga).

One challenge though is that it's highly desirable to have script hooks as part of the freeze process to let other userspace applications participate which means you will always need some userspace daemon to kick things off. Instead of having a virtio-fsfreeze, I think it would be better to think about if the kernel needs a higher level interface such that the userspace operation is dirt-simple. But I don't see a way to avoid userspace involvement in this set of operations unfortunately.

A /dev/virtio-fsfreeze chardevice created by udev when virtio-fsfreeze.ko is loaded may be enough to do it. Or maybe it should be a generic kernel solution, /dev/fsfreeze, that talks to fsfreeze (not just the virtio case). The apps likely must be modified for this; I doubt the scripts would do much on their own (they'd likely just tell the app to do something through a unix domain socket), but if scripts are needed the agent could open that chardev instead of talking QMP/QAPI. It also depends on whether people prefer a single do-it-all agent, or an fsfreeze agent plus some other agent for everything else. Even if they want a single agent for everything, it could still talk QMP/QAPI on the virtio-serial vmchannel for everything else and open /dev/virtio-fsfreeze or /dev/fsfreeze if available. It's up to you... you understand the customer requirements better. For me a kernel update and no agent sounds nicer and looks more reliable, considering what fsfreeze does.
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
Initiating the freeze from kernelspace doesn't make much sense. With virtio we could add an in-band freeze request to the protocol, and although that would be a major change in the way virtio-blk works right now, it's at least doable. But all other real storage targets only communicate with their initiators over out-of-band protocols that are entirely handled in userspace, and given their high-level nature that is where they belong; that is, if we know them at all, given how vendors like to keep this secret IP closed and just offer userspace management tools in binary form. Building new infrastructure in the kernel just for virtio, while needing to duplicate the same thing in userspace for all real storage, seems like a really bad idea. That is in addition to the userspace freeze notifiers, similar to what e.g. Windows has: if the freeze process is driven from userspace it's much easier to handle those properly, compared to requiring kernel upcalls.
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On Wed, Jul 27, 2011 at 08:36:10PM +0200, Christoph Hellwig wrote: Initiating the freeze from kernelspace doesn't make much sense. With virtio we could add an in-band freeze request to the protocol, and although that would be a major change in the way virtio-blk works right now it's at least doable. But all other real storage targets only communicate with their initiators over out-of-band protocols that are entirely handled in userspace, and given their high-level nature that is where they belong; that is, if we know them at all, given how vendors like to keep this secret IP closed and just offer userspace management tools in binary form.

I don't see how block devices are relevant here, or how virtio-blk is related to this. Clearly there would be no ring for this, just a paravirt driver calling into ioctl_fsfreeze(). What would those real storage targets be? It's just a matter of looping over the superblocks and calling freeze_super() on those whose sb->s_op->freeze_fs is not NULL. We don't even need to go through a fake file handle to reach the fs if we do it in the guest kernel.

building new infrastructure in the kernel just for virtio, while needing

It doesn't need to be virtio as in a ring. Maybe I should have called it paravirt-fsfreeze (as in PARAVIRT_CLOCK); virtio as in doing I/O it is not.

to duplicate the same thing in userspace for all real storage seems like a really bad idea. That is in addition to the userspace freeze notifier similar to what e.g. Windows has - if the freeze process is driven from userspace it's much easier to handle those properly compared to requiring kernel upcalls.

I'm not sure how it is simpler to talk some protocol through virtio-serial than to poll a /dev/fsfreeze or /dev/paravirt-fsfreeze.
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote: making sure no lib is calling any I/O function to be able to defreeze the filesystems later, making sure the oom killer or a wrong kill -9 $RANDOM isn't killing the agent by mistake while the I/O is blocked and the copy is going.

Yes, with the current API, if the agent is killed while the filesystems are frozen we are screwed. I have just submitted patches that implement a new API that should make the virtualization use case more reliable. Basically, I am adding a new ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns a file descriptor; as long as that file descriptor is held open, the filesystem remains frozen. If the freeze file descriptor is closed (be it through an explicit call to close(2) or as part of process exit housekeeping), the associated filesystem is automatically thawed.

- fsfreeze: add ioctl to create a fd for freeze control http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2
- fsfreeze: add freeze fd ioctls http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2