On Thu, Aug 14, 2014 at 05:09:00PM +0800, zhanghailiang wrote: > On 2014/8/14 15:15, Hu Tao wrote: > >On Thu, Aug 14, 2014 at 02:31:41PM +0800, zhanghailiang wrote: > >>On 2014/8/13 19:50, Michael S. Tsirkin wrote: > >>>On Wed, Aug 13, 2014 at 07:21:57PM +0800, zhanghailiang wrote: > >>>>If we configure qemu with realtime-mlock-on and memory-node-bind at the > >>>>same time, > >>>>Qemu will fail to start, and mbind() fails with message "Input/output > >>>>error". > >>>> > >>>>> From man page: > >>>>int mbind(void *addr, unsigned long len, int mode, > >>>> unsigned long *nodemask, unsigned long maxnode, > >>>> unsigned flags); > >>>>The *MPOL_BIND* mode specifies a strict policy that restricts memory > >>>>allocation > >>>>to the nodes specified in nodemask. > >>>>If *MPOL_MF_STRICT* is passed in flags and policy is not MPOL_DEFAULT(In > >>>>qemu > >>>>here is MPOL_BIND), then the call will fail with the error EIO if the > >>>>existing > >>>>pages in the memory range don't follow the policy. > >>>> > >>>>The memory locked ahead by mlockall can not guarantee to follow the > >>>>policy above, > >>>>And if that happens, it will result in an EIO error. > >>>> > >>>>So we should call mlock after mbind, here we adjust the place where > >>>>called mlock, > >>>>Move it to function pc_memory_init. > >>>> > >>>>Signed-off-by: xiexiangyou<xiexiang...@huawei.com> > >>>>Signed-off-by: zhanghailiang<zhang.zhanghaili...@huawei.com> > >>> > >>>OK but won't this still fail in case of memory hotplug? > >>>We set MCL_FUTURE so the same will apply? > > > >It has already been set. > > > >>>Maybe it's enough to set MPOL_MF_MOVE? > >>>Does the following work for you? > >>> > >> > >>Hi Michael, > >> > >>I have tested memory hotplug, use virsh command like > >>'virsh setmem redhat-6.4 6388608 --config --live', it is OK, and > >>it will not call mbind when do such memory hotplug. > >>but i don't know if there is command like 'memory-node hotplug' ? > > > >Using qemu monitor command: > >object_add memory-backend-ram,id=ram1,size=128M,host-nodes=0,policy=bind > > > > Hi Hu Tao, > > I have tested it use the above command, and yes, it failed. > And if i remove the *MCL_FUTURE* flag of mlockall(), it will work fine. > > *MCL_FUTURE* Lock all pages which will become mapped into the > address space of the process in the future.(From man page) > > So i think we could remove this flag, and call mlockall(MCL_CURRENT) > every time when we do the above 'memory object add'.
No that's wrong. We want allocations of qemu memory locked as well - not just guest memory. > >> > >>MPOL_MF_MOVE can work, it is more simple, but it is not perfect. It > >>consumes more time to *move the memory*(i guess will reconstruct > >>pages and copy memory)which has been locked by mlockall. The result > >>is VM will start slower than the above scenario. > > > >I think your patch makes sense but it doesn't work for hotplugged > >memory. We need MPOL_MF_MOVE, too. > > > > Thank you very much, before send another patch, I will look into the > qemu process of hotplug memory, and try to solve this fault.:) Does setting MPOL_MF_MOVE solve all problems? > >> > >>BTW, i think the follow process is clearer and more logical: > >>Allocate memory--->Set memory policy--->Lock memory. > >> > >>So what's your opinion? Thanks very much. > >> > >> > >>>--> > >>> > >>>hostmem: set MPOL_MF_MOVE > >>> > >>>When memory is allocated on a wrong node, MPOL_MF_STRICT > >>>doesn't move it - it just fails the allocation. > >>>A simple way to reproduce the failure is with mlock=on > >>>realtime feature. > >>> > >>>The code comment actually says: "ensure policy won't be ignored" > >>>so setting MPOL_MF_MOVE seems like a better way to do this. > >>> > >>>Signed-off-by: Michael S. Tsirkin<m...@redhat.com> > >>> > >>>--- > >>> > >>>diff --git a/backends/hostmem.c b/backends/hostmem.c > >>>index ca10c51..a9905c0 100644 > >>>--- a/backends/hostmem.c > >>>+++ b/backends/hostmem.c > >>>@@ -304,7 +304,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, > >>>Error **errp) > >>> /* ensure policy won't be ignored in case memory is preallocated > >>> * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages > >>> so > >>> * this doesn't catch hugepage case. */ > >>>- unsigned flags = MPOL_MF_STRICT; > >>>+ unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE; > >>> > >>> /* check for invalid host-nodes and policies and give more > >>> verbose > >>> * error messages than mbind(). */ > >>> > >> > > > > > >. > > >