On 09/06/2017 01:42 PM, Daniel P. Berrange wrote:
> On Wed, Sep 06, 2017 at 01:35:45PM +0200, Michal Privoznik wrote:
>> On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
>>> On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
>>>> On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
>>>>> On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
>>>>>> On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
>>>>>>> On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
>>>>>>>> Dear list,
>>>>>>>>
>>>>>>>> there is the following bug [1] which I'm not quite sure how to grasp. So
>>>>>>>> there is this application/infrastructure called Kove [2] that allows you
>>>>>>>> to have memory for your application stored on a distant host in network
>>>>>>>> and basically fetch needed region on pagefault. Now imagine that
>>>>>>>> somebody wants to use it for backing up domain memory. However, the way
>>>>>>>> that the tool works is it has some kernel module and then some userland
>>>>>>>> binary that is fed with the path of the mmaped file. I don't know all
>>>>>>>> the details, but the point is, in order to let users use this we need to
>>>>>>>> expose the paths for mem-path for the guest memory. I know we did not
>>>>>>>> want to do this in the past, but now it looks like we don't have a way
>>>>>>>> around it, do we?
>>>>>>>
>>>>>>> We don't want to expose the concept of paths in the XML because this is
>>>>>>> a linux specific way to configure hugepages / shared memory. So we hide
>>>>>>> the particular path used in the internal impl of the QEMU driver, and/or
>>>>>>> via the qemu.conf global config file. I don't really want to change
>>>>>>> that approach, particularly if the only reason is to integrate with a
>>>>>>> closed source binary like Kove.
>>>>>>
>>>>>> Yep, I agree with that. However, if you read the discussion in the
>>>>>> linked bug you'll find that they need to know what file in the
>>>>>> memory_backing_dir (from qemu.conf) corresponds to which domain. The
>>>>>> reporter suggested using UUID based filenames, which I fear is not
>>>>>> enough because one can have multiple <memory type='dimm'/> -s configured
>>>>>> for their domain. But I guess we could go with:
>>>>>>
>>>>>> ${memory_backing_dir}/${domName} for generic memory
>>>>>> ${memory_backing_dir}/${domName}_N for Nth <memory/>
>>>>>
>>>>> This feels like it is going to lead to hell when you add in memory
>>>>> hotplug/unplug, with inevitable races.
>>>>>
>>>>>> BTW: IIUC they want predictable names because they need to create the
>>>>>> files before spawning qemu so that they are picked by qemu instead of
>>>>>> using temporary names.
>>>>>
>>>>> I would like to know why they even need to associate particular memory
>>>>> files with particular QEMU processes. eg if they're just exposing a
>>>>> new type of tmpfs filesystem from the kernel why does it matter what
>>>>> each file is used for.
>>>>
>>>> This might get you an answer:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
>>>>
>>>> So the way I understand it is that they will create the files, and
>>>> provide us with paths. So luckily, we don't have to make up the paths on
>>>> our own.
>>>
>>> IOW it is pretending to be tmpfs except it is not behaving like tmpfs.
>>> This doesn't really make me any more inclined to support this closed
>>> source stuff in libvirt.
>>
>> Yeah, that's my feeling too. So, what about the following: let's assume
>> they will fix their code so that it is proper tmpfs. Libvirt can then
>> treat it just like it already does hugetlbfs. For us
>> it'll be just yet another type of hugepages. I mean, for hugepages we
>> already create /hugepages/mount/point/libvirt/$domain per each domain so
>> the separation is there (even though this is considered internal impl),
>> since it would be a proper tmpfs they can see the pid of qemu which is
>> trying to mmap() (and take the name or whatever unique ID they want from
>> there).
>
> Yep, we can at least make a reasonable guarantee that all files belonging
> to a single QEMU process will always be within the same sub-directory.
> This allows the kmod to distinguish 2 files owned by separate VMs, from 2
> files owned by the same VM and do what's needed. I don't see why it would
> need to care about naming conventions beyond the layout.
>
>> I guess what I'm trying to ask is if it was proper tmpfs, we would be
>> okay with it, wouldn't we?
>
> If it is indistinguishable from tmpfs/hugetlbfs from libvirt's POV, we
> should be fine - at most you would need /etc/libvirt/qemu.conf change
> to explicitly point at the custom mount point if libvirt doesn't
> auto-detect the right one.
>
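
To make the layout guarantee above concrete, here is a minimal sketch of how a
consumer could tell files of one VM apart from files of another using nothing
but the sub-directory they live in. It is only an illustration under stated
assumptions: the function names and the idea of scanning a backing root are
hypothetical, and the per-domain directory layout itself is an internal
implementation detail that may change.

```python
import os
from collections import defaultdict


def group_files_by_vm(backing_root):
    """Map each per-VM sub-directory name to the sorted files inside it.

    Assumption (hypothetical): every direct sub-directory of backing_root
    belongs to exactly one VM, and file names inside it carry no meaning.
    """
    groups = defaultdict(list)
    for entry in os.scandir(backing_root):
        if not entry.is_dir():
            continue  # only sub-directories are assumed to represent VMs
        for f in os.scandir(entry.path):
            if f.is_file():
                groups[entry.name].append(f.name)
    return {vm: sorted(files) for vm, files in groups.items()}


def same_vm(path_a, path_b):
    """Two backing files belong to the same VM iff they share a parent dir."""
    parent_a = os.path.dirname(os.path.abspath(path_a))
    parent_b = os.path.dirname(os.path.abspath(path_b))
    return parent_a == parent_b
```

Note that nothing here depends on how the individual files are named, which is
the point: only the directory layout is relied upon.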

Zack, can you join the discussion and tell us if our design sounds
reasonable to you?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list