Re: [PATCH] mlockall() problem in OpenBSD 5.2
Hi, just wanted to notice that the difference in locking and resource counting algorithms is not fixed yet. That means that mlockall() in 5.3 is still broken. Given that src tree is unlocked now, is there any chance that it will be fixed? By the way, we're using the patched version of uvm_map_pageable_all() for a long time now and haven't noticed any problems. Thanks! On Sunday 09 December 2012 00:26:35 Ariane van der Steldt wrote: > Ilya Bakulin does point out a serious bug in the vmmap code however: the > resource counting algorithms and locking algorithm count differently. > The code ought to be in sync; if no developer is going to fix the > commit-part of the code, I would seriously recommend putting Ilya's diff > in. signature.asc Description: This is a digitally signed message part.
Re: [PATCH] mlockall() problem in OpenBSD 5.2
On Sun, 09 Dec 2012 00:26:35 +0100 Ariane van der Steldt wrote: > On 11/09/12 08:56, Gerhard Roth wrote: > > On Thu, 08 Nov 2012 16:22:41 -0500 > > Ted Unangst wrote: > >> On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote: > >> > >>> The problem seems to be in uvm_map_pageable_all() function > >>> (sys/uvm/uvm_map.c). This function is a "special case of > >>> uvm_map_pageable", > >>> which tries to mlockall() all mapped memory regions. > >>> Prior to calling uvm_map_pageable_wire(), which actually does locking, it > >>> tries to count how many memory bytes will be locked, and compares this > >>> number > >>> with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK. > >>> The problem is that counting algorithm doesn't take into account that some > >>> pages have VM_PROT_NONE flag set and hence won't be locked anyway. > >>> Later in uvm_map_pageable_wire() these pages are skipped when doing actual > >>> job. > >> I don't know if this is right. Should prot_none pages not be wired? > >> > >> I think the opposite should happen. prot_none pages should be locked > >> as well. The app may be using prot_none as a way to protect its super > >> secret secrets from itself. It certainly wouldn't want them being > >> swapped out. > >> > > As long as they have VM_PROT_NONE, they can't be accessed and wiring them > > is just a waste of resources. > > > > If your scenario applies then uvm_map_protect() kicks in. It takes care of > > wiring pages if the protection changes from VM_PROT_NONE to some different > > value, though I have to admit that this happens only in case the > > VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me. > > Tedu is right and you're wrong. PROT_NONE protected pages must be wired > when calling mlock* functions. > > The main argument: malloc protects its bookkeeping data using > mprotect(PROT_NONE), which you definitely want to wire if you call > mlockall (either because you want to prevent information leaking to disk > or you have a time-sensitive program like ntpd and swap hurts). So could you explain to me how malloc() accomplishes to store meta data in memory with PROT_NONE? It can't! It has to change the protection to PROT_READ|PROT_WRITE first. As I pointed out before, changing the protection to something different than PROT_NONE will wire the pages; but changing it back to PROT_NONE won't un-wire them. Hence this change really affects only PROT_NONE areas that are never accessed and probably serve as red zones or guard pages. Gerhard
Re: [PATCH] mlockall() problem in OpenBSD 5.2
On Sun, Dec 09, 2012 at 08:28:08AM +0100, Otto Moerbeek wrote: > On Sun, Dec 09, 2012 at 12:26:35AM +0100, Ariane van der Steldt wrote: > > > On 11/09/12 08:56, Gerhard Roth wrote: > > >On Thu, 08 Nov 2012 16:22:41 -0500 > > >Ted Unangst wrote: > > >>On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote: > > >> > > >>>The problem seems to be in uvm_map_pageable_all() function > > >>>(sys/uvm/uvm_map.c). This function is a "special case of > > >>>uvm_map_pageable", > > >>>which tries to mlockall() all mapped memory regions. > > >>>Prior to calling uvm_map_pageable_wire(), which actually does locking, it > > >>>tries to count how many memory bytes will be locked, and compares this > > >>>number > > >>>with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK. > > >>>The problem is that counting algorithm doesn't take into account that > > >>>some > > >>>pages have VM_PROT_NONE flag set and hence won't be locked anyway. > > >>>Later in uvm_map_pageable_wire() these pages are skipped when doing > > >>>actual > > >>>job. > > >>I don't know if this is right. Should prot_none pages not be wired? > > >> > > >>I think the opposite should happen. prot_none pages should be locked > > >>as well. The app may be using prot_none as a way to protect its super > > >>secret secrets from itself. It certainly wouldn't want them being > > >>swapped out. > > >> > > >As long as they have VM_PROT_NONE, they can't be accessed and wiring them > > >is just a waste of resources. > > > > > >If your scenario applies then uvm_map_protect() kicks in. It takes care of > > >wiring pages if the protection changes from VM_PROT_NONE to some different > > >value, though I have to admit that this happens only in case the > > >VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me. > > > > Tedu is right and you're wrong. PROT_NONE protected pages must be > > wired when calling mlock* functions. > > > > The main argument: malloc protects its bookkeeping data using > > mprotect(PROT_NONE), which you definitely want to wire if you call > > mlockall (either because you want to prevent information leaking to > > disk or you have a time-sensitive program like ntpd and swap hurts). > > As for wasting resources: the kernel has insufficient information to > > fix wasteful programs, nor does it have sufficient information to > > consider PROT_NONE pages on a case-by-case basis. > > > > Also consider that there is a limitation on wired memory, if you are > > concerned about wasting resources. > > > > > > Ilya Bakulin does point out a serious bug in the vmmap code however: > > the resource counting algorithms and locking algorithm count > > differently. The code ought to be in sync; if no developer is going > > to fix the commit-part of the code, I would seriously recommend > > putting Ilya's diff in. > > -- > > Ariane > > A corection is needed here: malloc uses PROT_NONE for guard pages, > PROT_NONE is not used to protect meta data. However, if the F flag is > used, cached free pages are protected by PROT_NONE. The only other > case is pages pointed to by the return values of malloc(0) calls. An an addition: by default malloc dos not use guard pages for user data, only a few for the main meta data. So you could argue that PROT_NONE ise used to protect meta data, but this is only two pages. -Otto
Re: [PATCH] mlockall() problem in OpenBSD 5.2
On Sun, Dec 09, 2012 at 12:26:35AM +0100, Ariane van der Steldt wrote: > On 11/09/12 08:56, Gerhard Roth wrote: > >On Thu, 08 Nov 2012 16:22:41 -0500 > >Ted Unangst wrote: > >>On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote: > >> > >>>The problem seems to be in uvm_map_pageable_all() function > >>>(sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable", > >>>which tries to mlockall() all mapped memory regions. > >>>Prior to calling uvm_map_pageable_wire(), which actually does locking, it > >>>tries to count how many memory bytes will be locked, and compares this > >>>number > >>>with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK. > >>>The problem is that counting algorithm doesn't take into account that some > >>>pages have VM_PROT_NONE flag set and hence won't be locked anyway. > >>>Later in uvm_map_pageable_wire() these pages are skipped when doing actual > >>>job. > >>I don't know if this is right. Should prot_none pages not be wired? > >> > >>I think the opposite should happen. prot_none pages should be locked > >>as well. The app may be using prot_none as a way to protect its super > >>secret secrets from itself. It certainly wouldn't want them being > >>swapped out. > >> > >As long as they have VM_PROT_NONE, they can't be accessed and wiring them > >is just a waste of resources. > > > >If your scenario applies then uvm_map_protect() kicks in. It takes care of > >wiring pages if the protection changes from VM_PROT_NONE to some different > >value, though I have to admit that this happens only in case the > >VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me. > > Tedu is right and you're wrong. PROT_NONE protected pages must be > wired when calling mlock* functions. > > The main argument: malloc protects its bookkeeping data using > mprotect(PROT_NONE), which you definitely want to wire if you call > mlockall (either because you want to prevent information leaking to > disk or you have a time-sensitive program like ntpd and swap hurts). > As for wasting resources: the kernel has insufficient information to > fix wasteful programs, nor does it have sufficient information to > consider PROT_NONE pages on a case-by-case basis. > > Also consider that there is a limitation on wired memory, if you are > concerned about wasting resources. > > > Ilya Bakulin does point out a serious bug in the vmmap code however: > the resource counting algorithms and locking algorithm count > differently. The code ought to be in sync; if no developer is going > to fix the commit-part of the code, I would seriously recommend > putting Ilya's diff in. > -- > Ariane A corection is needed here: malloc uses PROT_NONE for guard pages, PROT_NONE is not used to protect meta data. However, if the F flag is used, cached free pages are protected by PROT_NONE. The only other case is pages pointed to by the return values of malloc(0) calls. -Otto
Re: [PATCH] mlockall() problem in OpenBSD 5.2
On 11/09/12 08:56, Gerhard Roth wrote: On Thu, 08 Nov 2012 16:22:41 -0500 Ted Unangst wrote: On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote: The problem seems to be in uvm_map_pageable_all() function (sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable", which tries to mlockall() all mapped memory regions. Prior to calling uvm_map_pageable_wire(), which actually does locking, it tries to count how many memory bytes will be locked, and compares this number with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK. The problem is that counting algorithm doesn't take into account that some pages have VM_PROT_NONE flag set and hence won't be locked anyway. Later in uvm_map_pageable_wire() these pages are skipped when doing actual job. I don't know if this is right. Should prot_none pages not be wired? I think the opposite should happen. prot_none pages should be locked as well. The app may be using prot_none as a way to protect its super secret secrets from itself. It certainly wouldn't want them being swapped out. As long as they have VM_PROT_NONE, they can't be accessed and wiring them is just a waste of resources. If your scenario applies then uvm_map_protect() kicks in. It takes care of wiring pages if the protection changes from VM_PROT_NONE to some different value, though I have to admit that this happens only in case the VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me. Tedu is right and you're wrong. PROT_NONE protected pages must be wired when calling mlock* functions. The main argument: malloc protects its bookkeeping data using mprotect(PROT_NONE), which you definitely want to wire if you call mlockall (either because you want to prevent information leaking to disk or you have a time-sensitive program like ntpd and swap hurts). As for wasting resources: the kernel has insufficient information to fix wasteful programs, nor does it have sufficient information to consider PROT_NONE pages on a case-by-case basis. Also consider that there is a limitation on wired memory, if you are concerned about wasting resources. Ilya Bakulin does point out a serious bug in the vmmap code however: the resource counting algorithms and locking algorithm count differently. The code ought to be in sync; if no developer is going to fix the commit-part of the code, I would seriously recommend putting Ilya's diff in. -- Ariane
Re: [PATCH] mlockall() problem in OpenBSD 5.2
Hi Mark, could you please also review my patch? It fixes a problem, which is not related to RLIMIT thing described by mickey. On Thursday 08 November 2012 14:30:07 Mark Kettenis wrote: > > Date: Thu, 8 Nov 2012 13:08:24 + > > From: Stuart Henderson > > > > Oh talking of RLIMIT reminds me...can someone who knows this area take > > a look at http://thread.gmane.org/gmane.os.aeriebsd.general/100 please? > > Change looks reasonable, but I think that instead of multiplying > vm_ssize by PAGE_SIZE, the could should use atop() to convert > rlim_cur/rlim_max to a number of pages. That is how the checks are > done in uvm/uvm_unix.c.
Re: [PATCH] mlockall() problem in OpenBSD 5.2
On Thu, 08 Nov 2012 16:22:41 -0500 Ted Unangst wrote: > On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote: > > > The problem seems to be in uvm_map_pageable_all() function > > (sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable", > > which tries to mlockall() all mapped memory regions. > > Prior to calling uvm_map_pageable_wire(), which actually does locking, it > > tries to count how many memory bytes will be locked, and compares this > > number > > with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK. > > The problem is that counting algorithm doesn't take into account that some > > pages have VM_PROT_NONE flag set and hence won't be locked anyway. > > Later in uvm_map_pageable_wire() these pages are skipped when doing actual > > job. > > I don't know if this is right. Should prot_none pages not be wired? > > I think the opposite should happen. prot_none pages should be locked > as well. The app may be using prot_none as a way to protect its super > secret secrets from itself. It certainly wouldn't want them being > swapped out. > As long as they have VM_PROT_NONE, they can't be accessed and wiring them is just a waste of resources. If your scenario applies then uvm_map_protect() kicks in. It takes care of wiring pages if the protection changes from VM_PROT_NONE to some different value, though I have to admit that this happens only in case the VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me. Gerhard
Re: [PATCH] mlockall() problem in OpenBSD 5.2
On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote: > The problem seems to be in uvm_map_pageable_all() function > (sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable", > which tries to mlockall() all mapped memory regions. > Prior to calling uvm_map_pageable_wire(), which actually does locking, it > tries to count how many memory bytes will be locked, and compares this > number > with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK. > The problem is that counting algorithm doesn't take into account that some > pages have VM_PROT_NONE flag set and hence won't be locked anyway. > Later in uvm_map_pageable_wire() these pages are skipped when doing actual > job. I don't know if this is right. Should prot_none pages not be wired? I think the opposite should happen. prot_none pages should be locked as well. The app may be using prot_none as a way to protect its super secret secrets from itself. It certainly wouldn't want them being swapped out.
Re: [PATCH] mlockall() problem in OpenBSD 5.2
> Date: Thu, 8 Nov 2012 13:08:24 + > From: Stuart Henderson > > Oh talking of RLIMIT reminds me...can someone who knows this area take > a look at http://thread.gmane.org/gmane.os.aeriebsd.general/100 please? Change looks reasonable, but I think that instead of multiplying vm_ssize by PAGE_SIZE, the could should use atop() to convert rlim_cur/rlim_max to a number of pages. That is how the checks are done in uvm/uvm_unix.c.
Re: [PATCH] mlockall() problem in OpenBSD 5.2
On 11/08/2012 02:08 PM, Stuart Henderson wrote: > Oh talking of RLIMIT reminds me...can someone who knows this area take > a look at http://thread.gmane.org/gmane.os.aeriebsd.general/100 please? > To me the fix looks reasonable. Limiting the stack size below the current usage shouldn't be allowed. And returning EINVAL is the right error code according to SUS ("The limit specified cannot be lowered because current usage is already higher than the limit."). However, the EINVAL error should then be added to the man page (lib/libc/sys/getrlimit.2). That's missing in Mickey's patch. Gerhard
Re: [PATCH] mlockall() problem in OpenBSD 5.2
Oh talking of RLIMIT reminds me...can someone who knows this area take a look at http://thread.gmane.org/gmane.os.aeriebsd.general/100 please?
Re: [PATCH] mlockall() problem in OpenBSD 5.2
I did a similar change recently (http://marc.info/?l=openbsd-cvs&m=135055003602935&w=2). Therefore I think that Ilya's patch is valid and should be applied. If anyone is willing to ok, I can commit it. Gerhard On 11/08/2012 01:34 PM, Ilya Bakulin wrote: > Hi list, > after upgrade on OpenBSD 5.2 we observe the following message from ntpd: > > Oct 22 17:20:13 gg74 ntpd[2918]: ntpd 4.2.6p2@1.2194-o Tue Oct 16 20:26:47 > UTC > 2012 (1) > Oct 22 17:20:13 gg74 ntpd[10103]: mlockall(): Cannot allocate memory > Oct 22 17:20:13 gg74 ntpd[10103]: signal_no_reset: signal 13 had flags 12 > Oct 22 17:20:13 gg74 ntpd[10103]: proto: precision = 6.390 usec > ... > This doesn't prevent ntpd from starting, however. > We tried to debug this problem. This occurs when an application tries to set > resource limits (more precisely, RLIMIT_MEMLOCK) to some relatively low value > and then does mlockall(). > > The problem seems to be in uvm_map_pageable_all() function > (sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable", > which tries to mlockall() all mapped memory regions. > Prior to calling uvm_map_pageable_wire(), which actually does locking, it > tries to count how many memory bytes will be locked, and compares this number > with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK. > The problem is that counting algorithm doesn't take into account that some > pages have VM_PROT_NONE flag set and hence won't be locked anyway. > Later in uvm_map_pageable_wire() these pages are skipped when doing actual > job. > > Attached patch fixes the problem on OpenBSD 5.2. > > I'm also attaching a simple test application, that can be used to reproduce > the bug. Just compile and run as root (otherwise mlockall() doesn't work > anyway). On OpenBSD 5.2 the RLIMIT_MEMLOCK limit will be significantly higher > than on OpenBSD 5.1. After applying my patch they will be almost the same. > > Thank you for your attention. > > // Ilya