Re: sigprocmask and fork

2011-10-26 Thread Kostik Belousov
On Wed, Oct 26, 2011 at 05:02:00PM +0400, Alexandr Matveev wrote:
> Hi,
> 
> We are using FreeBSD 8.2 on our servers for high-load projects.
> While preparing a system for production I saw strange (as I think)
> behavior that leads to increased load on the servers.
> 
> When I ran truss on an httpd (apache22) process, I saw a great many
> sigprocmask syscalls:
> 
> 24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
> 24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
> 24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
> 24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
> 24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
> ... too many lines ...
> 
> and
> 
> 24822: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> 24822: sigprocmask(SIG_SETMASK,0x0,0x0)  = 0 (0x0)
> 24822: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> 24822: sigprocmask(SIG_SETMASK,0x0,0x0)  = 0 (0x0)
> 24822: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> 24822: sigprocmask(SIG_SETMASK,0x0,0x0)  = 0 (0x0)
> ... too many lines ...
> 
> 
> but Apache and the modules it loads do not call this directly.
> I also tried DTrace to get information about the syscalls, and I
> got the same result.
> 
> I wrote a tiny sample program:
> $ cat sigproc_test.c
> #include <unistd.h>
> 
> int
> main(void)
> {
> 	fork();
> 	return 0;
> }
> I ran it on FreeBSD with different compilers:
> $ cc sigproc_test.c -o sigproc_test
> $ truss ./sigproc_test 2>&1 | grep sigprocmask | wc -l
>    8
> $ g++ sigproc_test.c -o sigproc_test
> $ truss ./sigproc_test 2>&1 | grep sigprocmask | wc -l
>   20
> 
> Is it normal to make so many sigprocmask syscalls for such a simple program?
> For example, there are no sigprocmask syscalls when I run it on Debian
> Linux.
Yes, it is normal. I do not quite understand the relation between
"simple program" and "so many sigprocmask syscalls" claim, but alas.

The calls to sigprocmask originate from the rtld, which needs to
guarantee async-signal safety of the lazy binding process. The rtld
locks, besides other things, block signals. Also, due to an additional
requirement that rtld is functional after the fork, it has to acquire
the internal locks around fork in multithreaded programs.

All lock acquisitions in the program which does only one fork(2) call
in main(), as well as in the program that does nothing in main at all,
come from the rtld initialization before main, and shared library
finalizations after. For empty main(), there are 4 lock acquisitions
after return from main, so you get a total of 12 for the forked version.

In essence, the trivial program does the same amount of locking in
the startup/exit path as a non-trivial one (the cost is proportional to
the number of shared libraries loaded).

The typical cause for the rtld lock acquisitions during the program
execution, besides the dlopen(3) activity, is the lazy resolution of
the PLT entries. LD_BIND_NOW=1 moves the load to the program startup.

> 
> Another sample. Here we have sigprocmask syscalls on Linux too, but 
> FreeBSD makes this syscall significantly more often:
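> $ cat test.c
> #include <unistd.h>
> 
> int
> main(void)
> {
> 	int i;
> 
> 	sleep(2);
> 	for (i = 0; i < 3; i++) {
> 		pid_t pid = fork();
> 		if (!pid) {
> 			/* sleep() takes an unsigned int, so 0.5 truncates to 0 */
> 			sleep(0.5);
> 			return 0;
> 		}
> 		sleep(2);
> 	}
> 	return 0;
> }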
> 
> FreeBSD 8.2:
> # truss -f ./test
> ... SKIPPED ...
> 48666: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> 48666: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
> 48666: __sysctl(0xbfbfe6a4,0x2,0x28192700,0xbfbfe6ac,0x0,0x0) = 0 (0x0)
> 48666: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> 48666: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
> 48666: nanosleep({2.0 }) = 0 (0x0)
> 48666: fork() = 48667 (0xbe1b)
> 48667: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
> 48667: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
> 48667: sigprocmask(SIG_

Re: _SC_GETPW_R_SIZE_MAX undefined in sysconf.c, what is correct value?

2011-10-26 Thread Christopher J. Ruwe
On Tue, 25 Oct 2011 16:27:38 -0500
Dan Nelson  wrote:

> In the last episode (Oct 25), Christopher J. Ruwe said:
> > On Mon, 24 Oct 2011 15:42:10 -0500
> > Dan Nelson  wrote:
> > > In the last episode (Oct 24), Christopher J. Ruwe said:
> > > > On Sun, 23 Oct 2011 19:10:34 -0500
> > > > Dan Nelson  wrote:
> > > > > In the last episode (Oct 23), Christopher J. Ruwe said:
> > > > > > I need to get the maximum size of a pwd entry to determine
> > > > > > the correct buffer size for calling getpwnam_r("uname",&pwd,
> > > > > > buf, bufsize, &pwdp).  I would like to use
> > > > > > sysconf(_SC_GETPW_R_SIZE_MAX) to determine bufsize, which
> > > > > > unfortunately fails (returns -1).  Currently, I use 16384,
> > > > > > which seems to be too much, but works for the time being.
> > > [..]
> > > > > From looking at the libc/gen/getpwent.c file, it looks like a
> > > > > maximum size might be 1MB.  The wrapper functions that convert
> > > > > getpw*_r functions into ones that simply return a pointer to
> > > > > malloced data all use the getpw() helper function, which
> > > > > starts with a 1k buffer and keeps doubling its size until the
> > > > > data fits or it hits PWD_STORAGE_MAX (1MB).  PWD_STORAGE_MAX
> > > > > is only checked within that getpw() function, though, so it's
> > > > > possible that an nss library might return an even longer
> > > > > string to a get*_r call.  It's up to you to decide what your
> > > > > own limit is :)
> > > >
> > > > Uh ... it's just that I hoped I had not to decide ;-)
> > > 
> > > The getpwnam_r function needs enough space to store the "struct
> > > passwd" itself (which has a constant size) plus the strings
> > > pointed to by pw_name, pw_class, pw_gecos, pw_dir, and pw_shell.
> > > If you have enough control over your environment that you can
> > > guarantee that the sum of those strings won't be larger than 4k,
> > > then you can just use a fixed buffer of that size.  Even 1k is
> > > probably large enough for 99.999% of all systems.  That's a
> > > really long home directory or shell path :) On the other hand,
> > > the GECOS field is theoretically free-form and could contain a lot
> > > of data.  I've never seen it hold more than an office number
> > > myself, though.
> > > 
> > 
> > Thanks for your help so far. Just assuming (I am not sufficiently
> > clear about myself and my own intents) I want to be precise and am
> > afraid of guessing: Can I assume that the gecos field is an entry
> > in /etc/passwd and can therefore never exceed LINE_MAX, i.e., 2048B
> > (limits.h, line 72)?  Or, more precisely, ( 2048B - sum( length(all
> > fields except passwd) ) )? Would that be an acceptable limit to set
> > the getpwnam_r( ...  ) buffer to and/or would that be an acceptable
> > value to replace the following bit from sysconf.c?
> > 
> > #if _POSIX_THREAD_SAFE_FUNCTIONS > -1
> > 	case _SC_GETGR_R_SIZE_MAX:
> > 	case _SC_GETPW_R_SIZE_MAX:
> > #error "somebody needs to implement this"
> > #endif
> 
> If your nsswitch.conf has "passwd: files" in it, then yes, you can
> assume that the 2048-byte limit applies.  However, if you are using
> nss_ldap, nss_mysql, nss_winbind, or some other nsswitch module that
> provides user info, that backend user system may be capable of
> returning longer strings. If you want to be able to handle any struct
> passwd that might be thrown at you, you should implement a "retry
> with doubling" loop similar to the one in
> libc/gen/getpwent.c:getpw() . 

This method has been suggested at some other sites as well ... I had just hoped
it would be possible to implement that in a more elegant and concise manner.
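
For reference, a minimal sketch of such a "retry with doubling" loop around
getpwnam_r() might look like the following. It is not the code from
libc/gen/getpwent.c; the helper name lookup_user() and the two size constants
are assumptions chosen for this sketch (the 1MB cap only mirrors the
PWD_STORAGE_MAX value mentioned earlier in the thread).

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

#define PW_BUF_START	1024		/* assumed starting size */
#define PW_BUF_LIMIT	(1024 * 1024)	/* assumed cap, our own choice */

/*
 * Hypothetical helper: look up `name`, growing the scratch buffer whenever
 * getpwnam_r() reports ERANGE.  On success returns 0 and stores in *bufp a
 * malloc'ed buffer that backs the strings in *pwd; the caller frees it.
 * Returns ENOENT if there is no such user, or another errno value.
 */
static int
lookup_user(const char *name, struct passwd *pwd, char **bufp)
{
	struct passwd *result;
	size_t size;
	char *buf, *nbuf;
	int error;

	assert(name != NULL && pwd != NULL && bufp != NULL);
	*bufp = NULL;
	buf = NULL;
	for (size = PW_BUF_START; size <= PW_BUF_LIMIT; size *= 2) {
		nbuf = realloc(buf, size);
		if (nbuf == NULL) {
			free(buf);
			return (ENOMEM);
		}
		buf = nbuf;
		error = getpwnam_r(name, pwd, buf, size, &result);
		if (error == ERANGE)
			continue;	/* buffer too small: double and retry */
		if (error == 0 && result != NULL) {
			*bufp = buf;
			return (0);
		}
		free(buf);
		return (error != 0 ? error : ENOENT);
	}
	free(buf);
	return (ERANGE);	/* entry did not fit even in PW_BUF_LIMIT */
}
```

A caller would declare a struct passwd and a char pointer, call
lookup_user("uname", &pwd, &buf), use the fields of pwd, and free(buf) when
done. The cap remains a policy decision, since an nss module may return more.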

Anyways, thank you for your kind help, cheers,
-- 
Christopher J. Ruwe
TZ GMT + 2

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


sigprocmask and fork

2011-10-26 Thread Alexandr Matveev

Hi,

We are using FreeBSD 8.2 on our servers for high-load projects.
While preparing a system for production I saw strange (as I think)
behavior that leads to increased load on the servers.

When I ran truss on an httpd (apache22) process, I saw a great many
sigprocmask syscalls:


24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
24822: sigprocmask(SIG_BLOCK,0x0,0x0)= 0 (0x0)
... too many lines ...

and

24822: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
24822: sigprocmask(SIG_SETMASK,0x0,0x0)  = 0 (0x0)
24822: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
24822: sigprocmask(SIG_SETMASK,0x0,0x0)  = 0 (0x0)
24822: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
24822: sigprocmask(SIG_SETMASK,0x0,0x0)  = 0 (0x0)
... too many lines ...


but Apache and the modules it loads do not call this directly.
I also tried DTrace to get information about the syscalls, and I
got the same result.

I wrote a tiny sample program:
$ cat sigproc_test.c
#include <unistd.h>

int
main(void)
{
	fork();
	return 0;
}
I ran it on FreeBSD with different compilers:
$ cc sigproc_test.c -o sigproc_test
$ truss ./sigproc_test 2>&1 | grep sigprocmask | wc -l
   8
$ g++ sigproc_test.c -o sigproc_test
$ truss ./sigproc_test 2>&1 | grep sigprocmask | wc -l
  20

Is it normal to make so many sigprocmask syscalls for such a simple program?
For example, there are no sigprocmask syscalls when I run it on Debian
Linux.


Another sample. Here we have sigprocmask syscalls on Linux too, but 
FreeBSD makes this syscall significantly more often:

$ cat test.c
#include <unistd.h>

int
main(void)
{
	int i;

	sleep(2);
	for (i = 0; i < 3; i++) {
		pid_t pid = fork();
		if (!pid) {
			/* sleep() takes an unsigned int, so 0.5 truncates to 0 */
			sleep(0.5);
			return 0;
		}
		sleep(2);
	}
	return 0;
}

FreeBSD 8.2:
# truss -f ./test
... SKIPPED ...
48666: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
48666: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
48666: __sysctl(0xbfbfe6a4,0x2,0x28192700,0xbfbfe6ac,0x0,0x0) = 0 (0x0)
48666: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
48666: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
48666: nanosleep({2.0 }) = 0 (0x0)
48666: fork() = 48667 (0xbe1b)
48667: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
48667: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
48667: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
48667: process exit, rval = 0
48666: nanosleep({2.0 }) = 0 (0x0)
48666: fork() = 48669 (0xbe1d)
48669: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
48669: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
48669: process exit, rval = 0
48666: nanosleep({2.0 }) = 0 (0x0)
48666: fork() = 48674 (0xbe22)
48674: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
48674: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
48674: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
48674: process exit, rval = 0
48666: nanosleep({2.0 }) = 0 (0x0)
48666: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
48666: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
48666: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
48666: sigprocmask(SIG_SETMASK,0x0,0x0) = 0

Re: mmap performance and memory use

2011-10-26 Thread Svatopluk Kraus
Hi,

well, I'm working on a new port (arm11 mpcore) and pmap_enter_object()
is what I'm debugging right now. I did not find any way in userland
to force the kernel to call pmap_enter_object() so that it makes a
SUPERPAGE mapping without promotion. I tried calling mmap() with
MAP_PREFAULT_READ, without success. I tried calling madvise() with
MADV_WILLNEED, also without success.

To make a SUPERPAGE mapping, it's obvious that all physical pages under
the SUPERPAGE must be allocated in the vm_object. And the SUPERPAGE
mapping must be made before the first access to them, otherwise a
promotion is on the way. MAP_PREFAULT_READ does not help with that. If
madvise() is used, vm_object_madvise() is called, but only cached pages
are allocated in advance. Of course, allocating all the physical memory
behind a virtual address range in advance is not preferred in most
situations.

For example, I want to do some computation on a 4M memory space (I know
that each byte will be accessed) and want to use a SUPERPAGE mapping
without promotion, and so save a 4K page table (i386 machine). However,
malloc() leads to promotion; mmap() with MAP_PREFAULT_READ does
nothing, so the SUPERPAGE mapping is promoted; and madvise() with
MADV_WILLNEED calls vm_object_madvise(), but because the pages are not
cached (how could they be, on anonymous memory?), it does not work
without promotion either.

So, SUPERPAGE mapping without promotion is fine, but it can be done
only if the physical memory being mapped is already allocated. Is it
really possible to force that from userland?

Moreover, the SUPERPAGE mapping is initially made read-only. So, even if
I have a SUPERPAGE mapping without promotion, the mapping is demoted
after the first write, and promoted again after all underlying pages
have been written to. The 4K page table saving is then lost.

   Svata

On Wed, Oct 26, 2011 at 1:35 AM, Alan Cox  wrote:
> On 10/10/2011 4:28 PM, Wojciech Puchar wrote:
>>>
>>> Notice that vm.pmap.pde.promotions increased by 31.  This means that 31
>>> superpage mappings were created by promotion from small page mappings.
>>
>> thank you. I looked at .mappings as it seemed logical to me that it shows
>> the total.
>>
>>> In contrast, vm.pmap.pde.mappings counts superpage mappings that are
>>> created directly and not by promotion from small page mappings.  For
>>> example, if a large executable, such as gcc, is resident in memory, the text
>>> segment will be pre-mapped using superpage mappings, avoiding soft fault and
>>> promotion overhead.  Similarly, mmap(..., MAP_PREFAULT_READ) on a large,
>>> memory resident file may pre-map the file using superpage mappings.
>>
>> your options are not described in mmap manpage nor madvise
>> (MAP_PREFAULT_READ).
>>
>> where can I find the up to date manpage or description?
>>
>
> A few minutes ago, I merged the changes to support and document
> MAP_PREFAULT_READ into 8-STABLE.  So, now it exists in HEAD, 9.0, and
> 8-STABLE.
>
> Alan
>
>
>