Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-08 Thread Matus UHLAR - fantomas

> On Thu, 7 Mar 2013 14:18:12 +0100
> Matus UHLAR - fantomas wrote:
>
> > I'm not talking about the semantics but about the implementation.
> > Simply said, vfork() was developed to avoid process memory copying
> > used at fork(). On Linux, fork() does NOT copy process memory.

On 07.03.13 09:48, David F. Skoll wrote:

> vfork() also suspends execution of the parent until the child calls
> execve or _exit.  If the child happens to write into its memory, the parent
> sees the changes... very different from fork().


I think Giampaolo Tomassoni got the point in his reply to the same mail I
was replying to. 


> Now, as for the great benefits of copy-on-write: It is actually almost
> useless with Perl programs.  Here's the reason: Perl uses
> reference-counting to know when to free memory.  So even if you access
> memory "read-only" by creating a new reference to the underlying object,
> that effectively becomes a write operation and Linux needs to copy the
> page.


Luckily, this does not happen at fork() time but only when the memory is
changed. Memory may stay unchanged, so even after some time the memory
footprint can be smaller.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"One World. One Web. One Program." - Microsoft promotional advertisement
"Ein Volk, ein Reich, ein Fuhrer!" - Adolf Hitler


RE: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-08 Thread Giampaolo Tomassoni
> Memory management is tricky though. It is hard to tell which values sum
> up to the real thing.
>
> Probably the best meter on Linux is the actual free value highlighted
> below? Check it before starting amavisd/spamd/whatnot and check it again
> after running for a while.  Also double-check it after killing all the
> processes. I'm open to being proven otherwise.
> 
> $ free
>              total       used       free     shared    buffers     cached
> Mem:       1047496     944236     103260          0       2904     284336
> -/+ buffers/cache:     656996  ___390500___
> Swap:       524272         28     257604

Let's see if Private_ entries in smaps do the right thing.

This is the command I use to get the private memory allocated by a process
(in kB):

 awk 'BEGIN {p=0;} $1 ~ /Private_/ {p += $2;} END {print p;}' /proc/PID/smaps

Besides, one can pass the smaps files of several processes at once, not
only one.
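
The same summation can be sketched in Python, which is handy when the result has to be processed further. This is only an illustration under assumptions: the function names and the SAMPLE text are made up for the example, and the match is on fields that start with "Private_" rather than the looser "contains Private_" test of the awk pattern.

```python
def private_kb(smaps_text):
    """Sum every Private_* field (in kB) found in smaps-formatted text."""
    total = 0
    for line in smaps_text.splitlines():
        fields = line.split()
        # Lines of interest look like: "Private_Dirty:         4 kB"
        if len(fields) >= 2 and fields[0].startswith("Private_"):
            total += int(fields[1])
    return total


def private_kb_for_pid(pid):
    """Read /proc/<pid>/smaps (Linux only) and sum its Private_* fields."""
    with open("/proc/%d/smaps" % pid) as smaps:
        return private_kb(smaps.read())


# Hypothetical excerpt of one mapping, for illustration only.
SAMPLE = """\
Size:                132 kB
Rss:                  48 kB
Shared_Clean:         40 kB
Shared_Dirty:          0 kB
Private_Clean:         4 kB
Private_Dirty:         4 kB
"""

print(private_kb(SAMPLE))  # prints 8
```

Calling private_kb_for_pid() once per child PID and adding the results corresponds to passing several smaps files to awk.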

Regards,

Giampaolo



R: Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread Giampaolo Tomassoni
The Private_ entries in /proc/.../smaps are reported to be the right choice
here: they count only pages that are not shared with any other process,
i.e. the ones "touched" after fork and the newly allocated ones.

Also, smaps is a relatively new proc entry, meant exactly to cope with the
whole Linux memory stats mess.

Giampaolo


Henrik K wrote:

On Thu, Mar 07, 2013 at 07:02:00PM +0100, Giampaolo Tomassoni wrote:
>
> I just took a look at the /proc//smaps files of my amavisd's 5 children,
> summing together the Private_{Clean|Dirty} entries.
> 
> I got this:
> 
> p1: 74,164 kB
> p2: 70,772 kB
> p3: 71,548 kB
> p4: 74,064 kB
> p5: 70,784 kB
> 
> This accounts for a unique total of 287,168 kB (say 280 MB?), ~56 MB on
> average.
> 
> Does this sound good?

Memory management is tricky though. It is hard to tell which values sum up
to the real thing.

Probably the best meter on Linux is the actual free value highlighted below?
Check it before starting amavisd/spamd/whatnot and check it again after
running for a while.  Also double-check it after killing all the processes.
I'm open to being proven otherwise.

$ free
             total       used       free     shared    buffers     cached
Mem:       1047496     944236     103260          0       2904     284336
-/+ buffers/cache:     656996  ___390500___
Swap:       524272         28     257604



Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread Henrik K
On Thu, Mar 07, 2013 at 07:02:00PM +0100, Giampaolo Tomassoni wrote:
>
> I just took a look at the /proc//smaps files of my amavisd's 5 children,
> summing together the Private_{Clean|Dirty} entries.
> 
> I got this:
> 
>   p1: 74,164 kB
>   p2: 70,772 kB
>   p3: 71,548 kB
>   p4: 74,064 kB
>   p5: 70,784 kB
> 
> This accounts for a unique total of 287,168 kB (say 280 MB?), ~56 MB on
> average.
> 
> Does this sound good?

Memory management is tricky though. It is hard to tell which values sum up
to the real thing.

Probably the best meter on Linux is the actual free value highlighted below?
Check it before starting amavisd/spamd/whatnot and check it again after
running for a while.  Also double-check it after killing all the processes.
I'm open to being proven otherwise.

$ free
             total       used       free     shared    buffers     cached
Mem:       1047496     944236     103260          0       2904     284336
-/+ buffers/cache:     656996  ___390500___
Swap:       524272         28     257604
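
For readers unfamiliar with this older "free" layout, the "-/+ buffers/cache" row follows from the "Mem:" row by moving buffers and cache from "used" to "free". A minimal Python sketch of that arithmetic, using only the figures shown above (the function name is made up for the example):

```python
def without_buffers_cache(used, free, buffers, cached):
    """Return (used, free) with buffers and page cache counted as free."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# Values in kB, copied from the "Mem:" row of the output above.
real_used, real_free = without_buffers_cache(
    used=944236, free=103260, buffers=2904, cached=284336
)
print(real_used, real_free)  # prints 656996 390500
```

Both results match the row in the output, so the highlighted 390500 kB is the free memory plus whatever the kernel could reclaim from buffers and cache.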



RE: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread Giampaolo Tomassoni
> On Thu, Mar 07, 2013 at 11:37:33AM -0500, David F. Skoll wrote:
> > On Thu, 7 Mar 2013 17:47:22 +0200
> > Henrik K  wrote:
> >
> > > Memory measured with "free" (without buffers/cache etc):
> >
> > > begin 2588084
> > > end 1296756
> >
> > > About 25MB non-shared memory used per child,
> >
> > Are you sure your measurements are correct?  I use MIMEDefang which
> > also has a preforked-children architecture and I see only about 4MB
> > shared per child with the vast majority of per-child memory non-shared.
> > This is based on what top reports.
> 
> You provide no data on how you end up with the 4MB etc. And MD is not SA,
> it might do all sorts of funky stuff.
> 
> How about actually trying the provided spamd line yourself instead of
> theorizing yet again about how someone is measuring wrong etc?
> 
> Well actually here is the one I used to get 50 children.. pasted the
> wrong one.
> spamd -4 -p 1234 -m 50 --min-children=50 --min-spare=40 \
>   --max-conn-per-child=1000 --round-robin -L
> 
> Just feed a lot of random messages with spamc -p 1234.

I just took a look at the /proc//smaps files of my amavisd's 5 children,
summing together the Private_{Clean|Dirty} entries.

I got this:

p1: 74,164 kB
p2: 70,772 kB
p3: 71,548 kB
p4: 74,064 kB
p5: 70,784 kB

This accounts for a unique total of 287,168 kB (say 280 MB?), ~56 MB on
average.

Does this sound good?

Giampaolo
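
As a quick check of the arithmetic, the five per-child values listed in the message can be summed directly. This sketch uses only the numbers shown; they add up to 361,332 kB rather than the 287,168 kB quoted, so the quoted total may have come from a different set of readings.

```python
# Private memory per amavisd child in kB, as listed in the message.
per_child_kb = [74164, 70772, 71548, 74064, 70784]

total_kb = sum(per_child_kb)
mean_kb = total_kb / len(per_child_kb)

print(total_kb)                  # prints 361332
print(round(mean_kb / 1024, 1))  # prints 70.6 (MiB per child)
```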



Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread David F. Skoll
On Thu, 7 Mar 2013 18:56:45 +0200
Henrik K  wrote:

> You provide no data on how you end up with the 4MB etc. And MD is not
> SA, it might do all sorts of funky stuff.

I wrote MD, so I'm pretty sure it's not doing any funky stuff.

> How about actually trying the provided spamd line yourself instead of
> theorizing yet again about how someone is measuring wrong etc?

I don't have any machine with spamd installed.

Regards,

David.


Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread Henrik K
On Thu, Mar 07, 2013 at 11:37:33AM -0500, David F. Skoll wrote:
> On Thu, 7 Mar 2013 17:47:22 +0200
> Henrik K  wrote:
> 
> > Memory measured with "free" (without buffers/cache etc):
> 
> > begin 2588084
> > end 1296756
> 
> > About 25MB non-shared memory used per child,
> 
> Are you sure your measurements are correct?  I use MIMEDefang which also
> has a preforked-children architecture and I see only about 4MB shared
> per child with the vast majority of per-child memory non-shared.  This
> is based on what top reports.

You provide no data on how you end up with the 4MB etc. And MD is not SA, it
might do all sorts of funky stuff.

How about actually trying the provided spamd line yourself instead of
theorizing yet again about how someone is measuring wrong etc?

Well actually here is the one I used to get 50 children.. pasted the wrong
one.
spamd -4 -p 1234 -m 50 --min-children=50 --min-spare=40 \
  --max-conn-per-child=1000 --round-robin -L

Just feed a lot of random messages with spamc -p 1234.



Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread David F. Skoll
On Thu, 7 Mar 2013 17:47:22 +0200
Henrik K  wrote:

> Memory measured with "free" (without buffers/cache etc):

> begin 2588084
> end 1296756

> About 25MB non-shared memory used per child,

Are you sure your measurements are correct?  I use MIMEDefang which also
has a preforked-children architecture and I see only about 4MB shared
per child with the vast majority of per-child memory non-shared.  This
is based on what top reports.

> So in the case of SA, it's not anywhere near "very little memory
> shared after a short while".

My measurements completely disagree with yours, so one of us (or both?) is
wrong.

Regards,

David.



Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread Henrik K
On Thu, Mar 07, 2013 at 09:48:19AM -0500, David F. Skoll wrote:
>
> I think if you measure what happens to Perl processes that fork a number
> of children to handle requests, you'll see that there's very little memory
> sharing after a short while.

Please let's stop the techno-theorizing and provide actual results.
We already had this exact same discussion at least once *sigh*.

Start something like:

spamd -4 -p 1234 --min-children=50 --min-spare=50 --max-conn-per-child=1000 \
  --round-robin -L

50 non-recycled children, fed 1000 requests (~20 each).

Memory measured with "free" (without buffers/cache etc):

begin 2588084
end 1296756

About 25MB non-shared memory used per child, which is pretty normal
since SA uses lots of internal per-message data.  On 32-bit systems the
usage could be half of that.
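
The per-child figure can be reproduced from the two "free" readings. A small Python sketch, assuming both readings are free memory in kB and that the whole drop is attributed to the 50 children (the variable names are made up for the example):

```python
# Free memory in kB ("free" without buffers/cache), from the message.
begin_free_kb = 2588084
end_free_kb = 1296756
children = 50

consumed_kb = begin_free_kb - end_free_kb
per_child_mib = consumed_kb / children / 1024

print(consumed_kb)              # prints 1291328
print(round(per_child_mib, 1))  # prints 25.2
```

The result is consistent with the "about 25MB" estimate, with the caveat that anything else running on the machine during the test also ends up in the difference.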

So in the case of SA, it's not anywhere near "very little memory shared
after a short while".



Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread David F. Skoll
On Thu, 7 Mar 2013 14:18:12 +0100
Matus UHLAR - fantomas  wrote:

> I'm not talking about the semantics but about the implementation.
> Simply said, vfork() was developed to avoid process memory copying
> used at fork(). On Linux, fork() does NOT copy process memory.

vfork() also suspends execution of the parent until the child calls
execve or _exit.  If the child happens to write into its memory, the parent
sees the changes... very different from fork().
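
The fork() half of this contrast can be illustrated with a short Python sketch. Python's os module exposes fork() but not vfork(), so only the fork() behaviour is shown; the function name is made up for the example and the code needs a Unix-like system. After fork() the child's write lands in its own copy of the page, and the parent keeps running with its original value.

```python
import os


def fork_write_demo():
    """Return (value the child wrote, value the parent still sees)."""
    value = [1]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: modify the data, report it through the pipe, then exit
        # immediately without running the parent's cleanup handlers.
        try:
            os.close(read_fd)
            value[0] = 2
            os.write(write_fd, str(value[0]).encode())
        finally:
            os._exit(0)
    # Parent: it was never suspended and its copy is untouched.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_value, value[0]


if __name__ == "__main__":
    print(fork_write_demo())  # prints (2, 1)
```

With vfork() the parent would instead be suspended until the child calls execve or _exit, and a write like the one above would be visible to it.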

Now, as for the great benefits of copy-on-write: It is actually almost
useless with Perl programs.  Here's the reason: Perl uses
reference-counting to know when to free memory.  So even if you access
memory "read-only" by creating a new reference to the underlying object,
that effectively becomes a write operation and Linux needs to copy the
page.
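
The effect is easy to observe in CPython, which also uses reference counting and keeps the counter inside the object itself. The sketch below is an analogy, not Perl: it only shows that taking a new reference changes the counter stored with the object, which is the kind of write that forces a copy-on-write page to be copied. The function name is made up for the example.

```python
import sys


def refcount_delta():
    """Return how much an object's refcount changes when it is merely aliased."""
    data = ["some", "shared", "data"]
    before = sys.getrefcount(data)
    alias = data  # a "read-only" use: no element of the list is modified
    after = sys.getrefcount(data)
    del alias
    return after - before


print(refcount_delta())  # prints 1
```

The list contents never change, yet the counter held next to them does.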

I think if you measure what happens to Perl processes that fork a number
of children to handle requests, you'll see that there's very little memory
sharing after a short while.

Regards,

David.



RE: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread Giampaolo Tomassoni
> On Thu, 7 Mar 2013 13:47:55 +0100
> Matus UHLAR - fantomas  wrote:
> 
> > the implementation of fork() in linux makes it nearly the same as
> > vfork().
> 
> That is completely wrong.  Just because modern forks use copy-on-write
> doesn't make them anything at all like vfork; the semantics are utterly
> different.

Uhu! You need to put things in their own context in order to get the
semantics.

Should I have said: "a fork under Linux attains performance close to that
of a vfork"?

I'm replying to a list, not writing a CS book, come on...

Giampaolo


> Regards,
> 
> David.



Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread Matus UHLAR - fantomas

> On Thu, 7 Mar 2013 13:47:55 +0100
> Matus UHLAR - fantomas wrote:
>
> > the implementation of fork() in linux makes it nearly the same as
> > vfork().

On 07.03.13 07:53, David F. Skoll wrote:

> That is completely wrong.  Just because modern forks use copy-on-write
> doesn't make them anything at all like vfork; the semantics are utterly
> different.


I'm not talking about the semantics but about the implementation.  Simply
said, vfork() was developed to avoid the process memory copying done at
fork().  On Linux, fork() does NOT copy process memory.




Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread David F. Skoll
On Thu, 7 Mar 2013 13:47:55 +0100
Matus UHLAR - fantomas  wrote:

> the implementation of fork() in linux makes it nearly the same as
> vfork().

That is completely wrong.  Just because modern forks use copy-on-write
doesn't make them anything at all like vfork; the semantics are utterly
different.

Regards,

David.



Re: fork is vfork? (was Re: "With similar rules, rspamd is about ten times faster than SpamAssassin.")

2013-03-07 Thread Matus UHLAR - fantomas

> On Thu, 7 Mar 2013 08:57:28 +0100
> "Giampaolo Tomassoni" wrote:
>
> > I don't see too many differences with running more SA
> > processes with linuxes (in which a fork() is actually a vfork()).

On 07.03.13 07:01, David F. Skoll wrote:

> I don't believe that's true.  Do you have evidence to back up that claim?
> fork() and vfork() have very different semantics and vfork() would not
> work at all for spamd.


the implementation of fork() in linux makes it nearly the same as vfork().