Re: [PATCH] mlockall() problem in OpenBSD 5.2

2013-03-04 Thread Ilya Bakulin
Hi,
just wanted to notice that the difference in locking and resource counting 
algorithms is not fixed yet. That means that mlockall() in 5.3 is still 
broken.

Given that src tree is unlocked now, is there any chance that it will be 
fixed?
By the way, we're using the patched version of uvm_map_pageable_all() for a 
long time now and haven't noticed any problems.

Thanks!

On Sunday 09 December 2012 00:26:35 Ariane van der Steldt wrote:
> Ilya Bakulin does point out a serious bug in the vmmap code however: the
> resource counting algorithms and locking algorithm count differently.
> The code ought to be in sync; if no developer is going to fix the
> commit-part of the code, I would seriously recommend putting Ilya's diff
> in.



signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-12-10 Thread Gerhard Roth
On Sun, 09 Dec 2012 00:26:35 +0100 Ariane van der Steldt  
wrote:
> On 11/09/12 08:56, Gerhard Roth wrote:
> > On Thu, 08 Nov 2012 16:22:41 -0500
> > Ted Unangst  wrote:
> >> On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote:
> >>
> >>> The problem seems to be in uvm_map_pageable_all() function
> >>> (sys/uvm/uvm_map.c). This function is a "special case of 
> >>> uvm_map_pageable",
> >>> which tries to mlockall() all mapped memory regions.
> >>> Prior to calling uvm_map_pageable_wire(), which actually does locking, it
> >>> tries to count how many memory bytes will be locked, and compares this
> >>> number
> >>> with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK.
> >>> The problem is that counting algorithm doesn't take into account that some
> >>> pages have VM_PROT_NONE flag set and hence won't be locked anyway.
> >>> Later in uvm_map_pageable_wire() these pages are skipped when doing actual
> >>> job.
> >> I don't know if this is right.  Should prot_none pages not be wired?
> >>
> >> I think the opposite should happen.  prot_none pages should be locked
> >> as well.  The app may be using prot_none as a way to protect its super
> >> secret secrets from itself.  It certainly wouldn't want them being
> >> swapped out.
> >>
> > As long as they have VM_PROT_NONE, they can't be accessed and wiring them
> > is just a waste of resources.
> >
> > If your scenario applies then uvm_map_protect() kicks in. It takes care of
> > wiring pages if the protection changes from VM_PROT_NONE to some different
> > value, though I have to admit that this happens only in case the
> > VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me.
> 
> Tedu is right and you're wrong.  PROT_NONE protected pages must be wired 
> when calling mlock* functions.
> 
> The main argument: malloc protects its bookkeeping data using 
> mprotect(PROT_NONE), which you definitely want to wire if you call 
> mlockall (either because you want to prevent information leaking to disk 
> or you have a time-sensitive program like ntpd and swap hurts). 

So could you explain to me how malloc() accomplishes to store meta data
in memory with PROT_NONE?

It can't! It has to change the protection to PROT_READ|PROT_WRITE first.
As I pointed out before, changing the protection to something different
than PROT_NONE will wire the pages; but changing it back to PROT_NONE
won't un-wire them. Hence this change really affects only PROT_NONE
areas that are never accessed and probably serve as red zones or guard
pages.


Gerhard



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-12-08 Thread Otto Moerbeek
On Sun, Dec 09, 2012 at 08:28:08AM +0100, Otto Moerbeek wrote:

> On Sun, Dec 09, 2012 at 12:26:35AM +0100, Ariane van der Steldt wrote:
> 
> > On 11/09/12 08:56, Gerhard Roth wrote:
> > >On Thu, 08 Nov 2012 16:22:41 -0500
> > >Ted Unangst  wrote:
> > >>On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote:
> > >>
> > >>>The problem seems to be in uvm_map_pageable_all() function
> > >>>(sys/uvm/uvm_map.c). This function is a "special case of 
> > >>>uvm_map_pageable",
> > >>>which tries to mlockall() all mapped memory regions.
> > >>>Prior to calling uvm_map_pageable_wire(), which actually does locking, it
> > >>>tries to count how many memory bytes will be locked, and compares this
> > >>>number
> > >>>with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK.
> > >>>The problem is that counting algorithm doesn't take into account that 
> > >>>some
> > >>>pages have VM_PROT_NONE flag set and hence won't be locked anyway.
> > >>>Later in uvm_map_pageable_wire() these pages are skipped when doing 
> > >>>actual
> > >>>job.
> > >>I don't know if this is right.  Should prot_none pages not be wired?
> > >>
> > >>I think the opposite should happen.  prot_none pages should be locked
> > >>as well.  The app may be using prot_none as a way to protect its super
> > >>secret secrets from itself.  It certainly wouldn't want them being
> > >>swapped out.
> > >>
> > >As long as they have VM_PROT_NONE, they can't be accessed and wiring them
> > >is just a waste of resources.
> > >
> > >If your scenario applies then uvm_map_protect() kicks in. It takes care of
> > >wiring pages if the protection changes from VM_PROT_NONE to some different
> > >value, though I have to admit that this happens only in case the
> > >VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me.
> > 
> > Tedu is right and you're wrong.  PROT_NONE protected pages must be
> > wired when calling mlock* functions.
> > 
> > The main argument: malloc protects its bookkeeping data using
> > mprotect(PROT_NONE), which you definitely want to wire if you call
> > mlockall (either because you want to prevent information leaking to
> > disk or you have a time-sensitive program like ntpd and swap hurts).
> > As for wasting resources: the kernel has insufficient information to
> > fix wasteful programs, nor does it have sufficient information to
> > consider PROT_NONE pages on a case-by-case basis.
> > 
> > Also consider that there is a limitation on wired memory, if you are
> > concerned about wasting resources.
> > 
> > 
> > Ilya Bakulin does point out a serious bug in the vmmap code however:
> > the resource counting algorithms and locking algorithm count
> > differently.  The code ought to be in sync; if no developer is going
> > to fix the commit-part of the code, I would seriously recommend
> > putting Ilya's diff in.
> > -- 
> > Ariane
> 
> A corection is needed here: malloc uses PROT_NONE for guard pages,
> PROT_NONE is not used to protect meta data. However, if the F flag is
> used, cached free pages are protected by PROT_NONE. The only other
> case is pages pointed to by the return values of malloc(0) calls.


An an addition: by default malloc dos not use guard pages for user
data, only a few for the main meta data. So you could argue that
PROT_NONE ise used to protect meta data, but this is only two pages. 

-Otto



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-12-08 Thread Otto Moerbeek
On Sun, Dec 09, 2012 at 12:26:35AM +0100, Ariane van der Steldt wrote:

> On 11/09/12 08:56, Gerhard Roth wrote:
> >On Thu, 08 Nov 2012 16:22:41 -0500
> >Ted Unangst  wrote:
> >>On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote:
> >>
> >>>The problem seems to be in uvm_map_pageable_all() function
> >>>(sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable",
> >>>which tries to mlockall() all mapped memory regions.
> >>>Prior to calling uvm_map_pageable_wire(), which actually does locking, it
> >>>tries to count how many memory bytes will be locked, and compares this
> >>>number
> >>>with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK.
> >>>The problem is that counting algorithm doesn't take into account that some
> >>>pages have VM_PROT_NONE flag set and hence won't be locked anyway.
> >>>Later in uvm_map_pageable_wire() these pages are skipped when doing actual
> >>>job.
> >>I don't know if this is right.  Should prot_none pages not be wired?
> >>
> >>I think the opposite should happen.  prot_none pages should be locked
> >>as well.  The app may be using prot_none as a way to protect its super
> >>secret secrets from itself.  It certainly wouldn't want them being
> >>swapped out.
> >>
> >As long as they have VM_PROT_NONE, they can't be accessed and wiring them
> >is just a waste of resources.
> >
> >If your scenario applies then uvm_map_protect() kicks in. It takes care of
> >wiring pages if the protection changes from VM_PROT_NONE to some different
> >value, though I have to admit that this happens only in case the
> >VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me.
> 
> Tedu is right and you're wrong.  PROT_NONE protected pages must be
> wired when calling mlock* functions.
> 
> The main argument: malloc protects its bookkeeping data using
> mprotect(PROT_NONE), which you definitely want to wire if you call
> mlockall (either because you want to prevent information leaking to
> disk or you have a time-sensitive program like ntpd and swap hurts).
> As for wasting resources: the kernel has insufficient information to
> fix wasteful programs, nor does it have sufficient information to
> consider PROT_NONE pages on a case-by-case basis.
> 
> Also consider that there is a limitation on wired memory, if you are
> concerned about wasting resources.
> 
> 
> Ilya Bakulin does point out a serious bug in the vmmap code however:
> the resource counting algorithms and locking algorithm count
> differently.  The code ought to be in sync; if no developer is going
> to fix the commit-part of the code, I would seriously recommend
> putting Ilya's diff in.
> -- 
> Ariane

A corection is needed here: malloc uses PROT_NONE for guard pages,
PROT_NONE is not used to protect meta data. However, if the F flag is
used, cached free pages are protected by PROT_NONE. The only other
case is pages pointed to by the return values of malloc(0) calls.

-Otto



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-12-08 Thread Ariane van der Steldt

On 11/09/12 08:56, Gerhard Roth wrote:

On Thu, 08 Nov 2012 16:22:41 -0500
Ted Unangst  wrote:

On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote:


The problem seems to be in uvm_map_pageable_all() function
(sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable",
which tries to mlockall() all mapped memory regions.
Prior to calling uvm_map_pageable_wire(), which actually does locking, it
tries to count how many memory bytes will be locked, and compares this
number
with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK.
The problem is that counting algorithm doesn't take into account that some
pages have VM_PROT_NONE flag set and hence won't be locked anyway.
Later in uvm_map_pageable_wire() these pages are skipped when doing actual
job.

I don't know if this is right.  Should prot_none pages not be wired?

I think the opposite should happen.  prot_none pages should be locked
as well.  The app may be using prot_none as a way to protect its super
secret secrets from itself.  It certainly wouldn't want them being
swapped out.


As long as they have VM_PROT_NONE, they can't be accessed and wiring them
is just a waste of resources.

If your scenario applies then uvm_map_protect() kicks in. It takes care of
wiring pages if the protection changes from VM_PROT_NONE to some different
value, though I have to admit that this happens only in case the
VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me.


Tedu is right and you're wrong.  PROT_NONE protected pages must be wired 
when calling mlock* functions.


The main argument: malloc protects its bookkeeping data using 
mprotect(PROT_NONE), which you definitely want to wire if you call 
mlockall (either because you want to prevent information leaking to disk 
or you have a time-sensitive program like ntpd and swap hurts).  As for 
wasting resources: the kernel has insufficient information to fix 
wasteful programs, nor does it have sufficient information to consider 
PROT_NONE pages on a case-by-case basis.


Also consider that there is a limitation on wired memory, if you are 
concerned about wasting resources.



Ilya Bakulin does point out a serious bug in the vmmap code however: the 
resource counting algorithms and locking algorithm count differently.  
The code ought to be in sync; if no developer is going to fix the 
commit-part of the code, I would seriously recommend putting Ilya's diff in.

--
Ariane



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-11-19 Thread Ilya Bakulin
Hi Mark,
could you please also review my patch? It fixes a problem,
which is not related to RLIMIT thing described by mickey.

On Thursday 08 November 2012 14:30:07 Mark Kettenis wrote:
> > Date: Thu, 8 Nov 2012 13:08:24 +
> > From: Stuart Henderson 
> >
> > Oh talking of RLIMIT reminds me...can someone who knows this area take
> > a look at http://thread.gmane.org/gmane.os.aeriebsd.general/100 please?
>
> Change looks reasonable, but I think that instead of multiplying
> vm_ssize by PAGE_SIZE, the could should use atop() to convert
> rlim_cur/rlim_max to a number of pages.  That is how the checks are
> done in uvm/uvm_unix.c.



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-11-09 Thread Gerhard Roth
On Thu, 08 Nov 2012 16:22:41 -0500
Ted Unangst  wrote:
> On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote:
> 
> > The problem seems to be in uvm_map_pageable_all() function
> > (sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable",
> > which tries to mlockall() all mapped memory regions.
> > Prior to calling uvm_map_pageable_wire(), which actually does locking, it
> > tries to count how many memory bytes will be locked, and compares this
> > number
> > with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK.
> > The problem is that counting algorithm doesn't take into account that some
> > pages have VM_PROT_NONE flag set and hence won't be locked anyway.
> > Later in uvm_map_pageable_wire() these pages are skipped when doing actual
> > job.
> 
> I don't know if this is right.  Should prot_none pages not be wired?
> 
> I think the opposite should happen.  prot_none pages should be locked
> as well.  The app may be using prot_none as a way to protect its super
> secret secrets from itself.  It certainly wouldn't want them being
> swapped out.
> 

As long as they have VM_PROT_NONE, they can't be accessed and wiring them
is just a waste of resources.

If your scenario applies then uvm_map_protect() kicks in. It takes care of
wiring pages if the protection changes from VM_PROT_NONE to some different
value, though I have to admit that this happens only in case the
VM_MAP_WIREFUTURE flag was specified. But that looks acceptable to me.

Gerhard



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-11-08 Thread Ted Unangst
On Thu, Nov 08, 2012 at 13:34, Ilya Bakulin wrote:

> The problem seems to be in uvm_map_pageable_all() function
> (sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable",
> which tries to mlockall() all mapped memory regions.
> Prior to calling uvm_map_pageable_wire(), which actually does locking, it
> tries to count how many memory bytes will be locked, and compares this
> number
> with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK.
> The problem is that counting algorithm doesn't take into account that some
> pages have VM_PROT_NONE flag set and hence won't be locked anyway.
> Later in uvm_map_pageable_wire() these pages are skipped when doing actual
> job.

I don't know if this is right.  Should prot_none pages not be wired?

I think the opposite should happen.  prot_none pages should be locked
as well.  The app may be using prot_none as a way to protect its super
secret secrets from itself.  It certainly wouldn't want them being
swapped out.



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-11-08 Thread Mark Kettenis
> Date: Thu, 8 Nov 2012 13:08:24 +
> From: Stuart Henderson 
> 
> Oh talking of RLIMIT reminds me...can someone who knows this area take
> a look at http://thread.gmane.org/gmane.os.aeriebsd.general/100 please?

Change looks reasonable, but I think that instead of multiplying
vm_ssize by PAGE_SIZE, the could should use atop() to convert
rlim_cur/rlim_max to a number of pages.  That is how the checks are
done in uvm/uvm_unix.c.



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-11-08 Thread Gerhard Roth
On 11/08/2012 02:08 PM, Stuart Henderson wrote:
> Oh talking of RLIMIT reminds me...can someone who knows this area take
> a look at http://thread.gmane.org/gmane.os.aeriebsd.general/100 please?
> 

To me the fix looks reasonable. Limiting the stack size below
the current usage shouldn't be allowed. And returning EINVAL is the
right error code according to SUS ("The limit specified cannot be
lowered because current usage is already higher than the limit.").

However, the EINVAL error should then be added to the man page
(lib/libc/sys/getrlimit.2). That's missing in Mickey's patch.

Gerhard



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-11-08 Thread Stuart Henderson
Oh talking of RLIMIT reminds me...can someone who knows this area take
a look at http://thread.gmane.org/gmane.os.aeriebsd.general/100 please?



Re: [PATCH] mlockall() problem in OpenBSD 5.2

2012-11-08 Thread Gerhard Roth
I did a similar change recently
(http://marc.info/?l=openbsd-cvs&m=135055003602935&w=2). Therefore I
think that Ilya's patch is valid and should be applied.

If anyone is willing to ok, I can commit it.

Gerhard


On 11/08/2012 01:34 PM, Ilya Bakulin wrote:
> Hi list,
> after upgrade on OpenBSD 5.2 we observe the following message from ntpd:
> 
> Oct 22 17:20:13 gg74 ntpd[2918]: ntpd 4.2.6p2@1.2194-o Tue Oct 16 20:26:47 
> UTC 
> 2012 (1)
> Oct 22 17:20:13 gg74 ntpd[10103]: mlockall(): Cannot allocate memory
> Oct 22 17:20:13 gg74 ntpd[10103]: signal_no_reset: signal 13 had flags 12
> Oct 22 17:20:13 gg74 ntpd[10103]: proto: precision = 6.390 usec
> ...
> This doesn't prevent ntpd from starting, however.
> We tried to debug this problem. This occurs when an application tries to set 
> resource limits (more precisely, RLIMIT_MEMLOCK) to some relatively low value 
> and then does mlockall().
> 
> The problem seems to be in uvm_map_pageable_all() function 
> (sys/uvm/uvm_map.c). This function is a "special case of uvm_map_pageable", 
> which tries to mlockall() all mapped memory regions.
> Prior to calling uvm_map_pageable_wire(), which actually does locking, it 
> tries to count how many memory bytes will be locked, and compares this number 
> with uvmexp.wiredmax, which is set by RLIMIT_MEMLOCK.
> The problem is that counting algorithm doesn't take into account that some 
> pages have VM_PROT_NONE flag set and hence won't be locked anyway.
> Later in uvm_map_pageable_wire() these pages are skipped when doing actual 
> job.
> 
> Attached patch fixes the problem on OpenBSD 5.2.
> 
> I'm also attaching a simple test application, that can be used to reproduce 
> the bug. Just compile and run as root (otherwise mlockall() doesn't work 
> anyway). On OpenBSD 5.2 the RLIMIT_MEMLOCK limit will be significantly higher 
> than on OpenBSD 5.1. After applying my patch they will be almost the same.
> 
> Thank you for your attention.
> 
> // Ilya