Re: [9fans] Calling vac from C

2009-02-24 Thread erik quanstrom
> about 5 years ago i took a class on performance tuning Solaris.
>
> The instructor claimed that fork was expensive because accounting is never
> really turned off, just piped to /dev/null.  there is no accounting overhead
> for threads.
>
> I never bothered to verify this, but now that this comes up, I'm tempted.

there's no need to guess.  here's the source code.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/fork.c

cfork is ~525 lines long and seems to take the curious tack of
forking all the lwps associated with a process.  i don't
see any accounting, but i see at least 9 + nlwp + nresourcectls
mutex locks if you follow the regular fork path.  what is the
accounting that you're thinking of?  it would be easy to miss.

- erik



Re: [9fans] Calling vac from C

2009-02-24 Thread Roman V. Shaposhnik
On Tue, 2009-02-24 at 10:54 -0500, erik quanstrom wrote:
> > about 5 years ago i took a class on performance tuning Solaris.
> >
> > The instructor claimed that fork was expensive because accounting is never
> > really turned off, just piped to /dev/null.  there is no accounting
> > overhead for threads.
> >
> > I never bothered to verify this, but now that this comes up, I'm tempted.
>
> there's no need to guess.  here's the source code.
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/fork.c
>
> cfork is ~525 lines long and seems to take the curious tack of
> forking all the lwps associated with a process.

that would be forkall(), not fork1()/fork()

Thanks,
Roman.




Re: [9fans] Calling vac from C

2009-02-24 Thread erik quanstrom
> > cfork is ~525 lines long and seems to take the curious tack of
> > forking all the lwps associated with a process.
>
> that would be forkall(), not fork1()/fork()

my mistake.  i assumed that since isfork1 was
a flag, that it was not the normal path through
the code.  silly me.

so where's the mythical accounting?

- erik



Re: [9fans] Calling vac from C

2009-02-24 Thread Roman V. Shaposhnik
On Tue, 2009-02-24 at 11:22 -0500, erik quanstrom wrote:
> > > cfork is ~525 lines long and seems to take the curious tack of
> > > forking all the lwps associated with a process.
> >
> > that would be forkall(), not fork1()/fork()
>
> my mistake.  i assumed that since isfork1 was
> a flag, that it was not the normal path through
> the code.  silly me.
>
> so where's the mythical accounting?

It could have been the old accounting. Solaris 10
changed a lot of that and turned things like microstate
accounting on by default, possibly eliminating
the kind of bottleneck the instructor was referring
to. More on that here:
   http://blogs.sun.com/eschrock/entry/microstate_accounting_in_solaris_10

Thanks,
Roman.





Re: [9fans] Calling vac from C

2009-02-24 Thread erik quanstrom
> It could have been the old accounting. Solaris 10
> changed a lot of that and turned things like microstate
> accounting on by default, possibly eliminating
> the kind of bottleneck the instructor was referring
> to. More on that here:
>    http://blogs.sun.com/eschrock/entry/microstate_accounting_in_solaris_10

data would be helpful.  nobody here has shown any version
of solaris to be slow forking, much less that it is slow
forking because of accounting.  my dimly-remembered
data support the conclusion that the now-ancient 670mp
had a poor mmu, not that solaris was good or bad.
i couldn't find rob's original email online, so i put it up.
http://www.quanstro.net/plan9/p9fork.txt

- erik



Re: [9fans] Calling vac from C

2009-02-23 Thread Ben Calvert

On Fri, 20 Feb 2009, erik quanstrom wrote:


> On Fri Feb 20 11:18:41 EST 2009, urie...@gmail.com wrote:
> > One of the main costs of dynamic linking is making fork much slower.
> > Even on linux statically linked binaries fork a few orders of magnitude
> > faster than dynamically linked ones.
> >
> > The main source of anti-fork FUD turns out to be the alleged
> > 'solution' to a problem that didn't exist until the geniuses at Sun
> > decided dynamic linking was such a wonderful idea.
>
> very generally, i agree with the direction of your
> post.  but i do remember things a bit differently.
>
> iirc, this went the other way 'round.  fork itself
> was very expensive on sun hardware in the early
> 90s if one had some memory mapped.  sun mmus
> had issues.  i benchmarked a vax 11/780 vs a sun
> 670mp.  the 4x50mhz 670mp was scheduled to replace the
> 1x5mhz (?) vaxen.  the vax forked maybe 10x faster when no
> memory was allocated.  however, when a moderate
> amount of memory was allocated, the vax pounded
> the sun by many (3, i think) orders of magnitude.


about 5 years ago i took a class on performance tuning Solaris.

The instructor claimed that fork was expensive because accounting is never 
really turned off, just piped to /dev/null.  there is no accounting overhead 
for threads.

I never bothered to verify this, but now that this comes up, I'm tempted.


> - erik






Re: [9fans] Calling vac from C

2009-02-20 Thread anooop . anooop
On Feb 19, 8:03 am, quans...@quanstro.net (erik quanstrom) wrote:

> what's wrong with the tools-based approach
> you're currently using?
>
> this may be hard to believe coming from unix,
> but your approach is what many tools do.  nobody
> links to a tcs library.  one uses the tcs(1)
> executable.
>
> executables.  god's answer to dynamic linking.
>
> - erik

just a matter of preference  :-)

I believe that
1) It's too much trouble parsing the output every time.
2) Calling some function from an included library will be faster.

~Anoop



Re: [9fans] Calling vac from C

2009-02-20 Thread erik quanstrom
> I believe that
> 1) It's too much trouble parsing the output every time.

i don't buy that.  that takes very little code.  since you
have evidently already written the code, the cost
is zero.

(if you're worried about runtime, i measure parsing
time at 338ns on a core i7 920.  cf. attached digestspd.c)
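
For readers on a unix system, the parsing under discussion can be sketched in portable C roughly as below. This is not the attached digestspd.c; the names parse_vac_score, hexval and SCORE_LEN are made up for illustration, and the only assumption about vac's output is the "vac:" prefix followed by 40 hex digits, as in the score shown in this thread.

```c
#include <stddef.h>
#include <string.h>

enum { SCORE_LEN = 20 };	/* a SHA-1 score is 20 bytes, 40 hex digits */

/* value of one hex digit, or -1 if c is not a hex digit */
static int
hexval(int c)
{
	if(c >= '0' && c <= '9')
		return c - '0';
	if(c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if(c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * parse a line of the form "vac:<40 hex digits>", optionally followed
 * by a newline, into score.  returns 0 on success, -1 on malformed input.
 * (hypothetical helper, not part of any vac or venti library)
 */
int
parse_vac_score(const char *line, unsigned char score[SCORE_LEN])
{
	size_t n;
	int i, hi, lo;

	if(strncmp(line, "vac:", 4) != 0)
		return -1;
	line += 4;
	n = strcspn(line, "\r\n");	/* length up to the line ending */
	if(n != 2*SCORE_LEN)
		return -1;
	for(i = 0; i < SCORE_LEN; i++){
		hi = hexval((unsigned char)line[2*i]);
		lo = hexval((unsigned char)line[2*i + 1]);
		if(hi < 0 || lo < 0)
			return -1;
		score[i] = hi<<4 | lo;
	}
	return 0;
}
```

A caller that already reads vac's output through popen could hand each line from fgets to parse_vac_score and treat a -1 return as "not a score line".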

> 2) Calling some function from an included library will be faster.

maybe.  are you sure that it matters?  i measure
base fork/exec latency on a 1.8ghz xeon5000 at 330µs.
(files served from the fileserver, not a ram disk.)
the attached fork.c and nop.c were used to do the
measurement.  i measure vac throughput at ~3mb/s
for small files from a brand new venti running from a
ramdisk.  the venti was tiny with 5mb isect and 100mb
arenas, and empty.  at that rate, 330µs will cost you
1038 bytes, or 0.3%.
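
The 1038-byte figure is just throughput times latency, taking 3mb/s as 3*2^20 bytes per second. A small sketch of that arithmetic follows; the function names are hypothetical, and since the message does not say what size the 0.3% is measured against, the second function takes the file size as a parameter rather than guessing it.

```c
/* bytes that could have been written during latency_s seconds */
double
bytes_forgone(double bytes_per_s, double latency_s)
{
	return bytes_per_s * latency_s;
}

/*
 * fraction of the total time spent on fork/exec when storing
 * file_bytes at bytes_per_s with latency_s of startup cost.
 */
double
overhead_fraction(double bytes_per_s, double latency_s, double file_bytes)
{
	double transfer_s;

	transfer_s = file_bytes / bytes_per_s;
	return latency_s / (latency_s + transfer_s);
}
```

With bytes_per_s = 3.0*1024*1024 and latency_s = 330e-6, bytes_forgone comes out a little over 1038.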

that comparison assumes that dynamic linking is free,
and it is not.

- erik

digestspd.c:
#include <u.h>
#include <libc.h>
#include <libsec.h>

/* convert one hex digit to its value; 0xff marks an invalid digit */
static int
nibble(int c)
{
	if(c >= '0' && c <= '9')
		return c - '0';
	/* fold upper-case hex digits to lower case */
	if(c >= 'A' && c <= 'F')
		c += 0x20;
	if(c >= 'a' && c <= 'f')
		return c - 'a'+10;
	return 0xff;
}

/* convert the 2*SHA1dlen hex digits in s to SHA1dlen bytes in t */
static void
bindigest(char *s, uchar *t)
{
	int i;

	if(strlen(s) != 2*SHA1dlen)
		sysfatal("bad digest %s", s);
	for(i = 0; i < SHA1dlen; i++)
		t[i] = nibble(s[2*i])<<4 | nibble(s[2*i + 1]);
}

static char *vs = "vac:da6b4b5549383cffc1b5691d824fc4bd381f0f6b";

void
main(void)
{
	int i, n;
	uchar score[SHA1dlen];
	uvlong t0, t1;

	n = 1000*1000;
	t0 = nsec();
	for(i = 0; i < n; i++){
		if(strncmp(vs, "vac:", 4) == 0)
			bindigest(vs + 4, score);
		else
			sysfatal("bad digest");
	}
	t1 = nsec();
	/* average nanoseconds per parse */
	print("%g\n", 1.*(t1 - t0)/(1.*n));
	exits("");
}

fork.c:
#include <u.h>
#include <libc.h>

char *argv[] = {"nop", 0};

void
main(void)
{
	int i, n;
	uvlong t0, t1;

	n = 1;
	t0 = nsec();
	for(i = 0; i < n; i++)
		switch(fork()){
		case 0:
			exec(*argv, argv);
			_exits("exec");
		case -1:
			sysfatal("fork");
		default:
			free(wait());
		}
	t1 = nsec();
	/* average nanoseconds per fork/exec/wait */
	print("%g\n", 1.*(t1 - t0)/(1.*n));
	exits("");
}

nop.c:
#include <u.h>
#include <libc.h>

void
main(void)
{
	exits("");
}

Re: [9fans] Calling vac from C

2009-02-20 Thread Uriel
One of the main costs of dynamic linking is making fork much slower.
Even on linux statically linked binaries fork a few orders of magnitude
faster than dynamically linked ones.

The main source of anti-fork FUD turns out to be the alleged
'solution' to a problem that didn't exist until the geniuses at Sun
decided dynamic linking was such a wonderful idea.

On linux with ancient hardware one can do hundreds of forks per second
without any problems, it works great for werc[1], and it is amusing to
see people inventing hacks like fcgi and writing pthreaded web servers
to avoid as much as one fork call per request, when making hundreds of
them is a non-issue.

It is an example of how one mistake in systems design (the introduction
of dynamic linking) leads to even greater mistakes down the road
(pthreads, fcgi, all kinds of hacks to avoid fork), and people never
step back to think if the original design decision was really worth
it.

Peace

uriel

[1]: http://werc.cat-v.org

On Fri, Feb 20, 2009 at 4:41 PM, erik quanstrom <quans...@quanstro.net> wrote:
> > I believe that
> > 1) It's too much trouble parsing the output every time.
>
> i don't buy that.  that takes very little code.  since you
> have evidently already written the code, the cost
> is zero.
>
> (if you're worried about runtime, i measure parsing
> time at 338ns on a core i7 920.  cf. attached digestspd.c)
>
> > 2) Calling some function from an included library will be faster.
>
> maybe.  are you sure that it matters?  i measure
> base fork/exec latency on a 1.8ghz xeon5000 at 330µs.
> (files served from the fileserver, not a ram disk.)
> the attached fork.c and nop.c were used to do the
> measurement.  i measure vac throughput at ~3mb/s
> for small files from a brand new venti running from a
> ramdisk.  the venti was tiny with 5mb isect and 100mb
> arenas, and empty.  at that rate, 330µs will cost you
> 1038 bytes, or 0.3%.
>
> that comparison assumes that dynamic linking is free,
> and it is not.
>
> - erik



Re: [9fans] Calling vac from C

2009-02-20 Thread erik quanstrom
On Fri Feb 20 11:18:41 EST 2009, urie...@gmail.com wrote:
> One of the main costs of dynamic linking is making fork much slower.
> Even on linux statically linked binaries fork a few orders of magnitude
> faster than dynamically linked ones.
>
> The main source of anti-fork FUD turns out to be the alleged
> 'solution' to a problem that didn't exist until the geniuses at Sun
> decided dynamic linking was such a wonderful idea.

very generally, i agree with the direction of your
post.  but i do remember things a bit differently.

iirc, this went the other way 'round.  fork itself
was very expensive on sun hardware in the early
90s if one had some memory mapped.  sun mmus
had issues.  i benchmarked a vax 11/780 vs a sun
670mp.  the 4x50mhz 670mp was scheduled to replace the
1x5mhz (?) vaxen.  the vax forked maybe 10x faster when no
memory was allocated.  however, when a moderate
amount of memory was allocated, the vax pounded
the sun by many (3, i think) orders of magnitude.

i posted this info way back when, but can't find
a reference.

threading became a really hot topic at the time,
too.  maybe just coincidence, but i'm sure it didn't
hurt to be able to show such great improvement.

the fork test run on my underpowered p3 machine
gets 1800µs/fork-exec.  since the p3 does 1836 bogomips
and the i7 does 43173, it's safe to assume that linux
has fine fork performance, given a reasonable amount
of shared libraries.
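
For anyone who wants to repeat the measurement on linux or another unix, a POSIX sketch of the same fork/exec/wait loop might look like the following. It is a rough port written for illustration, not the fork.c from this thread; the name fork_exec_ns and the choice of CLOCK_MONOTONIC are assumptions of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* monotonic clock reading in nanoseconds */
static double
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec*1e9 + ts.tv_nsec;
}

/*
 * fork, exec argv[0] and wait for the child, n times.
 * returns the mean nanoseconds per fork/exec/wait, or -1 if n is not
 * positive or fork or waitpid fails.  a failed exec is still timed:
 * the child simply exits with status 127.
 */
double
fork_exec_ns(char *const argv[], int n)
{
	double t0;
	pid_t pid;
	int i;

	if(n <= 0)
		return -1;
	t0 = now_ns();
	for(i = 0; i < n; i++){
		pid = fork();
		if(pid == -1)
			return -1;
		if(pid == 0){
			execv(argv[0], argv);
			_exit(127);
		}
		if(waitpid(pid, NULL, 0) == -1)
			return -1;
	}
	return (now_ns() - t0)/n;
}
```

Pointing argv at a do-nothing program built once statically and once against shared libraries would give a first, crude comparison of the two cases.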

it would be very interesting if someone would
see how fork performance relates to the size and
number of dynamic libraries.  i'm not sure i know
how to do this without devoting weeks to the project.

- erik



Re: [9fans] Calling vac from C

2009-02-19 Thread erik quanstrom
On Thu Feb 19 05:04:15 EST 2009, anooop.ano...@gmail.com wrote:
> Hello once again,
>
> I was wondering whether there are any libraries that I can include
> to call vac and unvac directly from my C code. Currently I am
> executing them in the shell using popen and capturing the output. I am
> looking for better ways.

what's wrong with the tools-based approach
you're currently using?

this may be hard to believe coming from unix,
but your approach is what many tools do.  nobody
links to a tcs library.  one uses the tcs(1)
executable.

executables.  god's answer to dynamic linking.

- erik