RE: fork is vfork? (was Re: With similar rules, rspamd is about ten times faster than SpamAssassin.)
Memory management is tricky, though. It is hard to tell which values sum up to the real thing. Probably the best meter on Linux is the actual free value highlighted below. Check it before starting amavisd/spamd/whatnot and check it again after running for a while. Also double-check it after killing all the processes. I'm open to being proved wrong.

$ free
             total       used       free     shared    buffers     cached
Mem:       1047496     944236     103260          0       2904     284336
-/+ buffers/cache:     656996  ___390500___
Swap:       524272         28     257604

Let's see if the Private_ entries in smaps do the right thing. This is the command I use to get the allocated private memory of a process (in kB):

awk 'BEGIN {p=0;} $1 ~ /Private_/ {p += $2;} END {print p;}' /proc/PID/smaps

Besides, one can pass the smaps files of several processes at once, not just one.

Regards, Giampaolo
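The smaps one-liner above can be wrapped in a small shell function that accepts several smaps files in one call. This is only a sketch: the function name sum_private_kb is made up for illustration, and it assumes the usual smaps layout of "Field: value kB".

```shell
#!/bin/sh
# Sum every smaps field whose name starts with "Private_" (values are in kB)
# across one or more smaps files. The function name is illustrative only.
sum_private_kb() {
    awk '$1 ~ /^Private_/ { p += $2 } END { print p + 0 }' "$@"
}

# Example with hypothetical PIDs:
#   sum_private_kb /proc/1234/smaps /proc/1235/smaps
```

Passing all files to a single awk process avoids one awk start-up per PID, and "p + 0" makes the function print 0 rather than an empty line when nothing matches.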
Re: fork is vfork?
On Thu, 2013-03-07 at 13:12 -0500, David F. Skoll wrote:
On Thu, 07 Mar 2013 19:04:22 +0100 Bernd Petrovitsch be...@petrovitsch.priv.at wrote:

MD forks the worker process and the worker process initializes libperl and loads the perl script.

Nope.

Hmm, I had the impression ages ago with md-2.56 - did this change since then? [ And I mean with embedded-perl compiled-in and activated. ]

To share more memory on a fork, it should initialize libperl and load the perl script before forking off the worker processes.

That's what it does.

Then I have no idea why the mimedefang-multiplexor processes have next to nothing shared - except that perl actually changes a lot in the internal data structures, killing any copy-on-write benefits. Hmm, perl probably does not separate code from data internally that much.

Bernd
--
Bernd Petrovitsch Email : be...@petrovitsch.priv.at
LUGA : http://www.luga.at
Re: R: Rspamd project
On 07/03/13 23:26, Giampaolo Tomassoni wrote: I see there would be problems in naming your project RSA. Nevertheless, is there any plan to have the current rspamd features in a library, in order to allow third parties to develop their own message handling interface wrapping it?

At the moment, rspamd consists of several shared libraries (rspamd-util, rspamd-mime and rspamd-server) and the main application logic (main process + worker processes). However, without the application logic there is no convenient way to use things such as plugins or event loops. Therefore, if you can give a concrete example of what functionality should be used in such third-party applications, I can provide more details.

Nevertheless, some features, including DKIM, SPF and other parsers, are worth moving into separate BSD-licensed libraries for those who want fast and lightweight support for these techniques.

-- Vsevolod Stakhov
RE: R: Rspamd project
On 07/03/13 23:26, Giampaolo Tomassoni wrote: I see there would be problems in naming your project RSA. Nevertheless, is there any plan to have the current rspamd features in a library, in order to allow third parties to develop their own message handling interface wrapping it?

At the moment, rspamd consists of several shared libraries (rspamd-util, rspamd-mime and rspamd-server) and the main application logic (main process + worker processes). However, without the application logic there is no convenient way to use things such as plugins or event loops. Therefore, if you can give a concrete example of what functionality should be used in such third-party applications, I can provide more details.

Something like Amavisd is a concrete example (http://www.ijs.si/software/amavisd/). It has many features built around SA which might be replicated in a piece of code using an rspamd library. Also, what if someone wants to integrate your product into some kind of JavaMail server (Apache James, for example: http://james.apache.org)? Yes, one can use rspamd's spamd-like protocol, but this means a message has to be copied back and forth between different processes, and result codes converted to ASCII and then re-interpreted by some other process... Merging software layers which basically have to cooperate anyway would probably give better performance. Also, a library could expose a response object carrying much more information than can be exposed via a thin socket protocol.

The main difficulties in a transition to a library model are in managing (loading) the configuration and in the event pump. I guess a good starting point would be to put the configuration parser and the event pump model in the rspamd application, while all other functionality could be in the library. This way an application could even, say, store the configuration in a database instead of in an XML file, as well as adopt a different model than the event-based one.

Nevertheless, some features, including DKIM, SPF and other parsers, are worth moving into separate BSD-licensed libraries for those who want fast and lightweight support for these techniques.

Right. There could even be an intermediate layer in the library, allowing users to adopt their preferred DKIM/SPF implementation. This may even be true for DNS resolution (which basically means that it is the user who chooses an event-based approach instead of a multithreaded or whatever one).

Thanks, Giampaolo

-- Vsevolod Stakhov
[OT] Re: R: Rspamd project
On 3/8/2013 8:39 AM, Giampaolo Tomassoni wrote: Also, what if someone wants to integrate your product into some kind of javamail server? (Apache James, for example

Vsevolod, do you have a forum for your project where you can post? This is a bit off topic for an SA list.

Regards, KAM
RE: [OT] Re: R: Rspamd project
On 3/8/2013 8:39 AM, Giampaolo Tomassoni wrote: Also, what if someone wants to integrate your product into some kind of javamail server? (Apache James, for example

Vsevolod, do you have a forum for your project where you can post? This is a bit off topic for an SA list. Regards, KAM

Oh, sorry Kevin: I'm the one truly guilty of this. If you don't mind, I suggest letting Vsevolod reply and then switching to his list for anything further. After all, some of Vsevolod's work could even result in an SA plugin... ;)

Giampaolo
RE: Checking for email attachment name for containing Javscript code that could get potentially executed when displayed on a webpage.
Can you pastebin an example? Not sure what you mean with the attachment *name* contains JS code.

Here is the requested sample: http://pastebin.com/DN7PRnH4

The attachment name contains the JavaScript code at the bottom of the pasted file.

Thanks,
Ashish

-----Original Message-----
From: Axb [mailto:axb.li...@gmail.com]
Sent: Wednesday, March 06, 2013 3:59 PM
To: users@spamassassin.apache.org
Subject: Re: Checking for email attachment name for containing Javscript code that could get potentially executed when displayed on a webpage.

On 03/06/2013 11:20 AM, Sharma, Ashish wrote:
All, I have a mail receiving server that parses incoming emails for attachments, and the files are listed on a web page for users to see. Here I need to check whether an attachment name contains JavaScript code that could potentially get executed when displayed on a web page. Is there a way in the SpamAssassin conf to test for the above scenario?

Can you pastebin an example? Not sure what you mean with the attachment *name* contains JS code.
Re: fork is vfork?
On Fri, 08 Mar 2013 13:26:50 +0100 Bernd Petrovitsch be...@petrovitsch.priv.at wrote: Hmm, I had the impression ages ago with md-2.56 - did this change since then? [ And I mean with embedded-perl compiled-in and activated. ] No, I think it's been that way for ages. Then I have no idea, why the mimedefang-multiplexor processes have next to nothing shared - except that perl actually changes a lot in the internal data structures killing any copy-on-write benefits. I think that's what I said originally... but was met with skepticism from some. Regards, David.
[OT] Re: R: Rspamd project
On 08/03/13 13:48, Giampaolo Tomassoni wrote: On 3/8/2013 8:39 AM, Giampaolo Tomassoni wrote: Also, what if someone wants to integrate your product into some kind of javamail server? (Apache James, for example

Vsevolod, do you have a forum for your project where you can post? This is a bit off topic for an SA list. Regards, KAM

Oh, sorry Kevin: I'm the one truly guilty of this. If you don't mind, I suggest letting Vsevolod reply and then switching to his list for anything further. After all, some of Vsevolod's work could even result in an SA plugin... ;)

I completely agree that rspamd design should be discussed outside of this group. Hence, I've created a Google group at: https://groups.google.com/forum/#!forum/rspamd

So could you please repeat your post there, to avoid flooding the SA list?

-- Vsevolod Stakhov
Re: [OT] Re: R: Rspamd project
On 3/8/2013 8:48 AM, Giampaolo Tomassoni wrote: On 3/8/2013 8:39 AM, Giampaolo Tomassoni wrote: Also, what if someone wants to integrate your product into some kind of javamail server? (Apache James, for example

Vsevolod, do you have a forum for your project where you can post? This is a bit off topic for an SA list. Regards, KAM

Oh, sorry Kevin: I'm the one truly guilty of this. If you don't mind, I suggest letting Vsevolod reply and then switching to his list for anything further. After all, some of Vsevolod's work could even result in an SA plugin... ;)

No worries. I don't mind seeing the occasional off-topic items and just general how to fight spam bastard discussions. And I look forward to seeing if there is synergy with the rspamd project!

Regards, KAM
Re: fork is vfork?
On Fri, Mar 08, 2013 at 09:09:27AM -0500, David F. Skoll wrote: I think that's what I said originally... but was met with skepticism from some.

There is a difference between saying something and actually providing some data. I'm sorry, but this sounds like True Believers (no need to prove anything) vs Sceptics (no matter what you prove, it doesn't matter to them). :-)

Here is my full documentation. I don't really care if you are on the sa-users list and claim to not even have access to spamd, but if we keep going, let's base the claims on hard data. If MD behaves differently, that's another matter and not relevant to SA.

$ uname -a
Linux ubuntu 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:42:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

$ /usr/local/perl/bin/perl -v
This is perl 5, version 16, subversion 2 (v5.16.2) built for x86_64-linux

$ free
             total       used       free     shared    buffers     cached
Mem:       4049584    3593572     456012          0     290380    1636144
-/+ buffers/cache:    1667048    2382536
Swap:            0          0          0

(even disabled swap so it doesn't interfere)

$ /usr/local/perl/bin/spamd -4 -p 1234 -m 50 --min-children=50 --min-spare=40 --max-conn-per-child=1000 --round-robin -L -d

$ pgrep -f spamd |wc -l
51

$ free
             total       used       free     shared    buffers     cached
Mem:       4049584    3730644     318940          0     290396    1636152
-/+ buffers/cache:    1804096    2245488
Swap:            0          0          0

(memory difference to before spamd running: 137MB)

$ pgrep -f 'spamd child' | while read p; do grep Private_ /proc/$p/smaps; done | awk 'BEGIN {p=0;} $1 ~ /Private_/ {p += $2;} END {print p;}'
78700

(smaps claims virgin children are using 78MB / 50 = ~1.5MB/child)

$ find ham.old -type f | while read f; do spamc -p 1234 < $f; done

(zzz.
about 25 messages processed per child)

$ ps axu |grep spamd
hege 23755 0.2 1.0 121608 44244 ? Ss 17:23 0:01 /usr/local/perl/bin/spamd -4 -p 1234 -m 50 --min-children=50 --min-spare=40 --max-conn-per-child=1000 --round-robin -L -d
hege 23756 2.1 1.2 129400 52400 ? S 17:23 0:09 spamd child
hege 23757 2.0 1.3 130432 53284 ? S 17:23 0:09 spamd child
hege 23758 1.0 1.2 127168 50244 ? S 17:23 0:04 spamd child
hege 23759 0.9 1.2 129636 52628 ? S 17:23 0:04 spamd child
hege 23760 1.0 1.2 127984 50960 ? S 17:23 0:04 spamd child
hege 23761 1.2 1.3 130204 53080 ? S 17:23 0:05 spamd child
hege 23762 1.1 1.2 128312 51324 ? S 17:23 0:05 spamd child
hege 23763 1.4 1.2 126356 49512 ? S 17:23 0:06 spamd child
hege 23764 0.9 1.2 128636 51408 ? S 17:23 0:04 spamd child
hege 23765 1.2 1.2 129112 52056 ? S 17:23 0:05 spamd child
hege 23766 1.1 1.2 128048 51068 ? S 17:23 0:05 spamd child
hege 23767 1.0 1.2 127604 50588 ? S 17:23 0:04 spamd child
hege 23768 1.2 1.2 129244 52376 ? S 17:23 0:05 spamd child
hege 23769 0.9 1.2 128680 51632 ? S 17:23 0:04 spamd child
hege 23770 1.8 1.3 133168 56184 ? S 17:23 0:08 spamd child
hege 23771 0.8 1.2 129112 52096 ? S 17:23 0:03 spamd child
hege 23772 0.6 1.2 128268 51220 ? S 17:23 0:03 spamd child
hege 23773 2.4 1.3 132232 55132 ? S 17:23 0:11 spamd child
hege 23774 1.3 1.2 129260 52288 ? S 17:23 0:06 spamd child
hege 23775 1.0 1.3 130308 53360 ? S 17:23 0:04 spamd child
hege 23776 1.8 1.2 126960 49876 ? S 17:23 0:08 spamd child
hege 23777 1.0 1.2 128608 51692 ? S 17:23 0:04 spamd child
hege 23778 0.9 1.2 127232 50248 ? S 17:23 0:04 spamd child
hege 23779 1.5 1.2 128808 51936 ? S 17:23 0:07 spamd child
hege 23780 1.1 1.3 131680 54504 ? S 17:23 0:05 spamd child
hege 23781 1.5 1.2 126596 49760 ? S 17:23 0:07 spamd child
hege 23782 1.0 1.3 130636 53632 ? S 17:23 0:04 spamd child
hege 23783 1.0 1.3 130816 53716 ? S 17:23 0:04 spamd child
hege 23784 1.7 1.3 132872 55820 ? S 17:23 0:08 spamd child
hege 23785 0.8 1.3 131696 54608 ? S 17:23 0:03 spamd child
hege 23786 2.1 1.3 130104 53208 ? S 17:23 0:10 spamd child
hege 23787 1.1 1.3 130940 53892 ? S 17:23 0:05 spamd child
hege 23788 1.0 1.3 130116 52932 ? S 17:23 0:04 spamd child
hege 23789 1.3 1.3 132556 55388 ? S 17:23 0:06 spamd child
hege 23790 0.8 1.2 127744 50784 ? S 17:23 0:04 spamd child
hege 23791 1.1 1.2 125812 48984 ? S 17:23 0:05 spamd child
hege 23792 1.6 1.3 129812 52804 ? S 17:23 0:07 spamd child
hege 23793 0.8 1.3
How to log detected locale/language?
Hey there all, It seems a pretty core function in SA is the ok_languages and ok_locales function. I'd like to be able to turn on LOGGING of detected locales before I set which are ok (or specifically, which are less ok) I'm sure there's a knob for this somewhere, can anyone tell me where? -- Dan Mahoney Techie, Sysadmin, WebGeek Gushi on efnet/undernet IRC ICQ: 13735144 AIM: LarpGM Site: http://www.gushi.org ---
Re: fork is vfork?
On Fri, 8 Mar 2013 17:42:58 +0200 Henrik K h...@hege.li wrote:

$ pgrep -f 'spamd child' | while read p; do grep Private_ /proc/$p/smaps; done | awk 'BEGIN {p=0;} $1 ~ /Private_/ {p += $2;} END {print p;}'

I would be interested in seeing the output of:

pgrep -f 'spamd child' | while read p; do grep Shared_ /proc/$p/smaps; done | awk 'BEGIN {p=0;} $1 ~ /Shared_/ {p += $2;} END {print p;}'

Regards, David.
Re: How to log detected locale/language?
On 03/08/2013 04:46 PM, Dan Mahoney, System Admin wrote: Hey there all, It seems a pretty core function in SA is the ok_languages and ok_locales function. I'd like to be able to turn on LOGGING of detected locales before I set which are ok (or specifically, which are less ok) I'm sure there's a knob for this somewhere, can anyone tell me where?

Nice, someone documented this: http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt

_LANGUAGES_

So now what? A few lines later it tells us what to do:

add_header all X-BLAHTYPE _LANGUAGES_

Add that to your local.cf and reload SA, glue, coffee machine. Does this do what you want?
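A slightly fuller local.cf fragment for the same idea is sketched below. It rests on two assumptions worth checking against your installed version's Mail::SpamAssassin::Conf documentation: that add_header prepends "X-Spam-" to the header name by itself (so a name of Languages would come out as X-Spam-Languages), and that the _LANGUAGES_ tag is only filled in when the TextCat plugin is loaded. The header name Languages is an arbitrary choice.

```
# local.cf (sketch; the header name "Languages" is an arbitrary choice)

# Assumption: language guessing comes from the TextCat plugin, which must be
# enabled in one of the *.pre files, e.g.:
#   loadplugin Mail::SpamAssassin::Plugin::TextCat

# Record the detected languages on every scanned message, ham and spam alike.
add_header all Languages _LANGUAGES_
```

Running spamassassin --lint after the change should catch syntax slips before the daemon is reloaded.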
Re: fork is vfork?
On Fri, Mar 08, 2013 at 11:26:39AM -0500, David F. Skoll wrote: On Fri, 8 Mar 2013 17:42:58 +0200 Henrik K h...@hege.li wrote:

$ pgrep -f 'spamd child' | while read p; do grep Private_ /proc/$p/smaps; done | awk 'BEGIN {p=0;} $1 ~ /Private_/ {p += $2;} END {print p;}'

I would be interested in seeing the output of:

pgrep -f 'spamd child' | while read p; do grep Shared_ /proc/$p/smaps; done | awk 'BEGIN {p=0;} $1 ~ /Shared_/ {p += $2;} END {print p;}'

Here's a new run..

Virgin children:

$ pgrep -f 'spamd child' | while read p; do grep Private_ /proc/$p/smaps; done | awk 'BEGIN {p=0;} $1 ~ /Private_/ {p += $2;} END {print p;}'
79364

$ pgrep -f 'spamd child' | while read p; do grep Shared_ /proc/$p/smaps; done | awk 'BEGIN {p=0;} $1 ~ /Shared_/ {p += $2;} END {print p;}'
2057688

Used children:

$ pgrep -f 'spamd child' | while read p; do grep Private_ /proc/$p/smaps; done | awk 'BEGIN {p=0;} $1 ~ /Private_/ {p += $2;} END {print p;}'
1348048

$ pgrep -f 'spamd child' | while read p; do grep Shared_ /proc/$p/smaps; done | awk 'BEGIN {p=0;} $1 ~ /Shared_/ {p += $2;} END {print p;}'
1271252
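The two pipelines above read every smaps file twice, once per field family. A single awk pass can produce both totals from the same snapshot; the following is a minimal sketch, with the function name smaps_totals and the output format invented for illustration.

```shell
#!/bin/sh
# Print the summed Private_* and Shared_* values (kB) for the given smaps
# files in one pass. Function name and output format are illustrative only.
smaps_totals() {
    awk '
        $1 ~ /^Private_/ { priv += $2 }
        $1 ~ /^Shared_/  { shr  += $2 }
        END { printf "private_kB=%d shared_kB=%d\n", priv, shr }
    ' "$@"
}

# Possible use against running children (assumes pgrep finds at least one):
#   smaps_totals $(pgrep -f 'spamd child' | sed 's|.*|/proc/&/smaps|')
```

Keep in mind that each process reports the shared pages it maps, so a Shared_ total summed over 50 children counts the same physical pages many times over.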
Re: fork is vfork?
On Fri, 8 Mar 2013 18:44:54 +0200 Henrik K h...@hege.li wrote:

Virgin children: 79MB private; 2GB shared (~40MB shared/child)
Used children: 1.2GB private; 1.2GB shared (~24MB shared/child)

This is roughly what I am seeing with MIMEDefang also: Only 50% shared. It's better than nothing, but not nearly as good as one might have hoped.

On my system, about 3.5MB of the shared memory is the text portion of libraries such as libperl.so and libc.so which you'd expect to be shared anyway, so really only about 20.5MB/46.5MB of Perl memory is shared for each slave.

Regards, David.
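The per-child figures quoted above can be rechecked from the raw smaps totals for the used children (1348048 kB private, 1271252 kB shared, 50 children). The helper below is only a sketch of that arithmetic; its name and output labels are made up, and it divides by 1024, so the results differ slightly from the rounded numbers in the message.

```shell
#!/bin/sh
# $1 = number of children, $2 = total private kB, $3 = total shared kB.
# Function name and labels are illustrative only.
per_child() {
    awk -v n="$1" -v priv="$2" -v shr="$3" 'BEGIN {
        printf "private/child: %.1f MB\n", priv / n / 1024
        printf "shared/child: %.1f MB\n", shr / n / 1024
        printf "shared fraction: %.1f%%\n", 100 * shr / (priv + shr)
    }'
}

# Totals for the used children reported earlier in the thread.
per_child 50 1348048 1271252
```

With those inputs the shared fraction comes out just under one half, which matches the "only 50% shared" reading.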
Re: fork is vfork? (was Re: With similar rules, rspamd is about ten times faster than SpamAssassin.)
On Thu, 7 Mar 2013 14:18:12 +0100 Matus UHLAR - fantomas uh...@fantomas.sk wrote: I'm not talking about the semantics but about the implementation. Simply said, vfork() was developed to avoid process memory copying used at fork(). On Linux, fork() does NOT copy process memory.

On 07.03.13 09:48, David F. Skoll wrote: vfork() also suspends execution of the parent until the child calls execve or _exit. If the child happens to write into its memory, the parent sees the changes... very different from fork().

I think Giampaolo Tomassoni got the point in his reply to the same mail I was replying to.

Now, as for the great benefits of copy-on-write: It is actually almost useless with Perl programs. Here's the reason: Perl uses reference-counting to know when to free memory. So even if you access memory read-only by creating a new reference to the underlying object, that effectively becomes a write operation and Linux needs to copy the page.

Luckily, this does not happen at fork() time but at the time memory is changed. Memory may stay unchanged, so even after some time the memory footprint can be smaller.

-- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. One World. One Web. One Program. - Microsoft promotional advertisement Ein Volk, ein Reich, ein Fuhrer! - Adolf Hitler
Re: fork is vfork?
On Fri, Mar 08, 2013 at 11:55:43AM -0500, David F. Skoll wrote: On Fri, 8 Mar 2013 18:44:54 +0200 Henrik K h...@hege.li wrote:

Virgin children: 79MB private; 2GB shared (~40MB shared/child)
Used children: 1.2GB private; 1.2GB shared (~24MB shared/child)

This is roughly what I am seeing with MIMEDefang also: Only 50% shared. It's better than nothing, but not nearly as good as one might have hoped. On my system, about 3.5MB of the shared memory is the text portion of libraries such as libperl.so and libc.so which you'd expect to be shared anyway, so really only about 20.5MB/46.5MB of Perl memory is shared for each slave.

But at least ~10-20MB of the private data per spamd child is the per-message scan data, blobs etc. (it can be seen as a 20MB heap allocation). This should not be counted in any conclusions about memory ratios.
Re: Checking for email attachment name for containing Javscript code that could get potentially executed when displayed on a webpage.
On 08/03/13 14:05, Sharma, Ashish wrote: Can you pastebin an example? Not sure what you mean with the attachment *name* contains JS code. Here is the requested sample http://pastebin.com/DN7PRnH4 The attachment name contains the javascript code at the bottom of the pasted file. thanks Ashish

You could try this (untested):

mimeheader L_CT_DOCWRITE Content-Type =~ /document\.write/
score L_CT_DOCWRITE 1
describe L_CT_DOCWRITE Content-type contains document.write

Score as you see fit.

-----Original Message-----
From: Axb [mailto:axb.li...@gmail.com]
Sent: Wednesday, March 06, 2013 3:59 PM
To: users@spamassassin.apache.org
Subject: Re: Checking for email attachment name for containing Javscript code that could get potentially executed when displayed on a webpage.

On 03/06/2013 11:20 AM, Sharma, Ashish wrote: All, I have a mail receiving server that parses incoming emails for email attachment and the files are listed on a web page for users to see. Here I need to check for email attachment name for containing Javscript code that could get potentially executed when displayed on a webpage. Is there a way in Spamassassin conf that can help me in testing for the above mentioned scenario?

Can you pastebin an example? Not sure what you mean with the attachment *name* contains JS code.
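A variation on the rule above, equally untested, is sketched here. It assumes the attachment name may show up in either the name= parameter of Content-Type or the filename= parameter of Content-Disposition, and that the MIMEHeader plugin (which provides mimeheader) is loaded. The rule names, patterns and scores are placeholders, not tuned values.

```
# local.cf (sketch; rule names, patterns and scores are placeholders)

# Script-like text in the Content-Type header of a MIME part.
mimeheader L_CT_SCRIPT Content-Type =~ /<script|document\.write/i
describe   L_CT_SCRIPT Content-Type of a MIME part contains script-like text
score      L_CT_SCRIPT 1

# Same check against Content-Disposition, where filename= usually lives.
mimeheader L_CD_SCRIPT Content-Disposition =~ /<script|document\.write/i
describe   L_CD_SCRIPT Content-Disposition of a MIME part contains script-like text
score      L_CD_SCRIPT 1
```

A rule like this only flags messages; the page that lists the attachment names would still need to escape them on output.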
Re: fork is vfork?
On Fri, Mar 08, 2013 at 07:09:12PM +0200, Henrik K wrote: On Fri, Mar 08, 2013 at 11:55:43AM -0500, David F. Skoll wrote: On Fri, 8 Mar 2013 18:44:54 +0200 Henrik K h...@hege.li wrote:

Virgin children: 79MB private; 2GB shared (~40MB shared/child)
Used children: 1.2GB private; 1.2GB shared (~24MB shared/child)

This is roughly what I am seeing with MIMEDefang also: Only 50% shared. It's better than nothing, but not nearly as good as one might have hoped. On my system, about 3.5MB of the shared memory is the text portion of libraries such as libperl.so and libc.so which you'd expect to be shared anyway, so really only about 20.5MB/46.5MB of Perl memory is shared for each slave.

But at least ~10-20MB of the private data per spamd child is the per-message scan data, blobs etc. (it can be seen as a 20MB heap allocation). This should not be counted in any conclusions about memory ratios.

Also note that these shared values are completely vague anyway. All that matters is how much real system memory is taken. As seen in my previous free reports, the master + 50 virgin children only take a total of 137MB. At full blast everything takes ~1.2GB. So the real ratio per child might be something like 4MB of bogus perl data and 20MB of per-message data. If the master process takes 50MB of memory, this means the child ratio is _much_ better than 50% (4MB/50MB). I might dig deeper into this later with some tools.
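The free-based estimate can be turned into a rough per-child figure for idle children. The sketch below uses the two "-/+ buffers/cache" used values from the free output earlier in the thread (1667048 kB before starting spamd, 1804096 kB after) and takes the 50MB master size as the assumption stated above; the function name is invented for illustration.

```shell
#!/bin/sh
# All values in kB. Function name and argument order are illustrative only.
# $1 = used memory before starting spamd ("-/+ buffers/cache" row of free)
# $2 = the same value once the master and its children are running
# $3 = assumed size of the master process
# $4 = number of children
idle_child_cost_kb() {
    awk -v before="$1" -v after="$2" -v master="$3" -v n="$4" 'BEGIN {
        printf "%d\n", (after - before - master) / n
    }'
}

# 50000 kB for the master is an assumption, not a measured value.
idle_child_cost_kb 1667048 1804096 50000 50
```

With these inputs the result is a little over 1.7MB per idle child, in the same range as the ~1.5MB/child that the smaps Private_ total suggested.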
RE: Checking for email attachment name for containing Javscript code that could get potentially executed when displayed on a webpage.
Sharma, Ashish wrote on 2013-03-08 15:05: The attachment name contains the javascript code at the bottom of the pasted file.

extracttext plugin? So that Bayes learns the JavaScript attachments?
RE: Checking for email attachment name for containing Javscript code that could get potentially executed when displayed on a webpage.
John Hardin wrote on 2013-03-08 20:31: This is a simple, standard and robust solution to your problem that also prevents other attack vectors you haven't thought of yet.

If PHP is built with tidy, it's simple :)
Re: How to log detected locale/language?
Dan Mahoney, System Admin wrote on 2013-03-08 16:46: I'm sure there's a knob for this somewhere, can anyone tell me where?

perldoc Mail::SpamAssassin::Conf

Did you mean syslog?