Re: HarrisPoll
On Saturday 17 February 2007 23:29, Jeff Chan wrote: > I should have added, we are removing the Harris Poll domain > hpolsurveys.com from the blacklist. Actually, checking more closely, this domain is not on any SURBL blacklist. If you got this result recently, you may be suffering from the DNS bug: http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3997 If you're using a SpamAssassin version before 3.1, you should upgrade to 3.1 or later, since the bug is fixed in those versions. Or are you using a DNS proxy? What happens if you run: dig hpolsurveys.com.multi.surbl.org or ping it?
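For anyone who wants to script the same check Jeff suggests with dig, here is a minimal Python sketch. It assumes the multi.surbl.org convention that a listed domain resolves to an address in 127.0.0.0/8 whose last octet is a bitmask of list memberships, and that an unresolvable name means "not listed". The bit-to-list table is an assumption written from memory and should be checked against the SURBL documentation; the function names are hypothetical.

```python
import socket

# ASSUMPTION: bit values returned by multi.surbl.org as documented around 2007;
# check them against the current SURBL documentation before relying on them.
SURBL_BITS = {2: "SC", 4: "WS", 8: "PH", 16: "OB", 32: "AB", 64: "JP"}

def query_name(domain, zone="multi.surbl.org"):
    """Build the name to resolve, e.g. hpolsurveys.com.multi.surbl.org."""
    return f"{domain.strip('.').lower()}.{zone}"

def decode_answer(address):
    """Map a 127.0.0.x answer to the names of the lists whose bits are set."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or octets[0] != 127:
        # Basic sanity check: a listing answer should be in 127.0.0.0/8.
        raise ValueError(f"unexpected SURBL answer: {address}")
    mask = octets[3]
    return [name for bit, name in sorted(SURBL_BITS.items()) if mask & bit]

def lookup(domain):
    """Return the lists a domain is on, or [] if the name does not resolve."""
    try:
        address = socket.gethostbyname(query_name(domain))
    except socket.gaierror:
        # NXDOMAIN means "not listed"; other resolver errors also land here,
        # so an empty result does not prove the resolver is healthy.
        return []
    return decode_answer(address)
```

Running `lookup("hpolsurveys.com")` on the affected host and on a second host is a quick way to see whether a local resolver or DNS proxy is returning answers the public DNS does not.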
Re: HarrisPoll
I should have added, we are removing the Harris Poll domain hpolsurveys.com from the blacklist.
Re: HarrisPoll
On Saturday 17 February 2007 10:44, LuKreme wrote: > On 17-Feb-2007, at 06:39, Michael Scheidell wrote: > >> -----Original Message----- > >> From: LuKreme [mailto:[EMAIL PROTECTED] > >> Sent: Friday, February 16, 2007 1:26 PM > >> To: users@spamassassin.apache.org > >> Subject: HarrisPoll > >> > >> > >> Where does the WS-SURBL info come from? I ask because the Harris > >> Poll email is getting tagged with it. As far as I know, I've > >> never received spam from them, so I'd like to check out the actual > >> rbl. > > > > If harrispoll emailed you then you DID get spam from them. > > I get mail from them all the time. > > > We get it all the time, and I don't know ANY user that signed up > > for it. > > I did. > > > Did you sign up for it on purpose? ;-) > > Yep. Although I recently (Friday) unsubbed. It's a false positive (an error). Please report all false positives on SURBL lists to: [EMAIL PROTECTED] Ideally, have the owner of the domain contact us with the information at: http://www.surbl.org/lists.html#removal
Re: Google Summer of Code 2007 ...
>> Not quite. Those show how many times *others* have seen it, not how >> many times *I* have seen it. Also, these have hysteresis, so if you are >> unfortunate enough to be at the start of the spam run and receive multiple >> mails all with the same body, then Razor, DCC and Pyzor might not >> help. Though if this were implemented, there would have to be a >> whitelist for mailing lists to which multiple users have subscribed. >> Hi, ixhash, which also works that way, definitely started its life as an in-house mail counter. You could probably use ixhash or Razor with your own server rather than the public one. Wolfgang
Re: Google Summer of Code 2007 ...
Theo Van Dinter <[EMAIL PROTECTED]> writes: > Doesn't SA have at least 3 of those already? Razor, DCC, and Pyzor. Not quite. Those show how many times *others* have seen it, not how many times *I* have seen it. Also, these have hysteresis, so if you are unfortunate enough to be at the start of the spam run and receive multiple mails all with the same body, then Razor, DCC and Pyzor might not help. Though if this were implemented, there would have to be a whitelist for mailing lists to which multiple users have subscribed.
Re: FuzzyOCR
On 2007-02-12, Sujit Choudhury <[EMAIL PROTECTED]> wrote: > Is there an easy way to get everything needed for FuzzyOCR? Has > somebody built a complete install, so that we don't have to go to > various sites to build various bits of FuzzyOCR? On FreeBSD, when you install from the ports collection (p5-FuzzyOCR), it fetches, builds and installs all the pieces it needs automatically. -- John ([EMAIL PROTECTED])
Re: [2] How can I configure spamassassin to filter spam jpgs?
On 2007-02-15, NIbbLLe <[EMAIL PROTECTED]> wrote: > The problem is that we are running spamassassin through Plesk 7 and we are > running it on a Windows machine. I went to the FuzzyOCR site, and I see the only > files they have are .tar archives (for Linux). Do you have any > suggestions on how I can install the plugin on the Windows machine? If not, > do you know of another product I can use? You could use VMware to run it in a Linux virtual machine. -- John ([EMAIL PROTECTED])
Re: Google Summer of Code 2007 ...
On Saturday February 17 2007 03:01, Quinn Comendant wrote: > How about an extensive statistics reporting tool, ..., that > can show how well a current spamassassin installation is performing > and where it needs improvements. Not exactly in your words, but in the same spirit, and this time belonging to SA itself: instrument SA with a couple of performance-measuring probes, providing an easier way to spot where bottlenecks lie. Just something simple enough to say: look, waiting for the Razor server response (or some RBL) is currently taking 80% of elapsed time. Or: the Bayes db is very sluggish, it is taking 5 seconds to provide a result. A timing breakdown by subtask is not much work to provide, but it gives great insight for troubleshooting and performance improvements. Here is an example of a timing breakdown as currently provided in the log (at log level 2) by amavisd-new. Without getting into specific details, the numbers are the elapsed time for each subtask in milliseconds, followed by that subtask's share of the total in parentheses and then the cumulative percentage of all sections so far: TIMING [total 1840 ms] - SMTP pre-DATA-flush: 4 (0%)0, SMTP DATA: 95 (5%)5, check_init: 1 (0%)5, sql-enter: 69 (4%)9, mime_decode: 16 (1%)10, get-file-type2: 26 (1%)11, parts_decode: 1 (0%)12, check_header: 3 (0%)12, AV-scan-1: 14 (1%)12, AV-scan-2: 20 (1%)14, spam-wb-list: 5 (0%)14, SA call: 1517 (82%)96, update_cache: 3 (0%)97, decide_mail_destiny: 6 (0%)97, ^ write-header: 15 (1%)98, save-to-local-mailbox: 1 (0%)98, prepare-dsn: 3 (0%)98, main_log_entry: 12 (1%)99, sql-update: 20 (1%)100, update_snmp: 2 (0%)100, SMTP pre-response: 1 (0%)100, SMTP response: 1 (0%)100, unlink-2-files: 1 (0%)100, rundown: 0 (0%)100 It tells at a glance that message checking and I/O for this particular message took 1840 ms in total, that receiving the message over SMTP took 5% of this, that the virus scanners were very quick (14 and 20 ms), and that the SA call took 1517 ms, which is 82% of all elapsed time; all sections up to and including SA (cumulative) took 96% of total elapsed time. Now imagine something like this relatively simple timing breakdown, but drilled down into an SA call, telling the administrator where it is worth spending effort, or why all of a sudden SA takes 10 seconds instead of the usual 2. Mark
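The breakdown Mark describes (milliseconds per subtask, that subtask's share of the total, then the running cumulative share) can be sketched in Python as below. This is an illustrative reimplementation of the log format only, not amavisd-new's code and not an existing SpamAssassin API; the class and method names are hypothetical, and percentages are rounded with Python's default formatting.

```python
import time
from contextlib import contextmanager

class SectionTimer:
    """Hypothetical per-subtask timer producing an amavisd-new style TIMING line."""

    def __init__(self):
        self.sections = []  # (name, elapsed milliseconds), in order of completion

    @contextmanager
    def section(self, name):
        """Time the body of a `with` block and record it under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.sections.append((name, elapsed_ms))

    def add(self, name, elapsed_ms):
        """Record a duration that was measured elsewhere."""
        self.sections.append((name, float(elapsed_ms)))

    def report(self):
        """Format 'name: ms (section%)cumulative%' for every recorded section."""
        total = sum(ms for _, ms in self.sections)
        parts, running = [], 0.0
        for name, ms in self.sections:
            running += ms
            share = 100.0 * ms / total if total else 0.0
            cumulative = 100.0 * running / total if total else 0.0
            parts.append(f"{name}: {ms:.0f} ({share:.0f}%){cumulative:.0f}")
        return f"TIMING [total {total:.0f} ms] - " + ", ".join(parts)
```

Wrapping each phase of a scan (DNS tests, Bayes lookup, Razor and so on) in `timer.section(...)` would give the kind of drill-down into an SA call that Mark asks for; which phases to instrument is up to whoever adds the probes.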
Re: Google Summer of Code 2007 ...
On Sat, Feb 17, 2007 at 06:56:28PM -0500, Tim B. wrote: > How about a "How many times have I seen this message body" plugin... > > So each time SA sees the same or similar enough message body, it > increases the score. Doesn't SA have at least 3 of those already? Razor, DCC, and Pyzor. -- Randomly Selected Tagline: "I love deadlines. I like the whooshing sound they make as they fly by." - Douglas Adams
Re: Google Summer of Code 2007 ...
Justin Mason wrote: Theo Van Dinter writes: I'm assuming that there will be a Google Summer of Code 2007 going on, and that the ASF will be involved again. So it's a good time to start thinking about things we'd like to put up as possible projects. We still have a number of items from last year that we could use again. Anything else that we'd like people to code up? Also, any suggestions from outside the dev team? Anyone got good ideas for new SpamAssassin features that would be good to pay someone to work on for 3 months? --j. How about a "How many times have I seen this message body" plugin... So each time SA sees the same or similar enough message body, it increases the score.
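A minimal Python sketch of the local counter Tim proposes, under several assumptions: bodies are normalized only by lower-casing and collapsing whitespace before hashing (so "similar enough" means nearly identical), counts live in an in-memory dict rather than a persistent store, and messages carrying a whitelisted List-Id are skipped, since the thread points out that mailing-list traffic seen by many local users would otherwise look like a spam run. The class name, the scoring curve and the List-Id whitelist are all hypothetical; this is not an existing SpamAssassin plugin.

```python
import hashlib
import re

class BodyCounter:
    """Hypothetical local 'how many times have I seen this body' counter."""

    def __init__(self, whitelisted_list_ids=(), score_per_repeat=0.5, max_score=3.0):
        self.counts = {}
        self.whitelist = {list_id.lower() for list_id in whitelisted_list_ids}
        self.score_per_repeat = score_per_repeat
        self.max_score = max_score

    @staticmethod
    def fingerprint(body):
        """Hash a normalized body: whitespace collapsed, lower-cased."""
        normalized = re.sub(r"\s+", " ", body).strip().lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def check(self, body, list_id=None):
        """Record one sighting and return the score to add for this message."""
        if list_id is not None and list_id.lower() in self.whitelist:
            return 0.0  # mailing-list traffic legitimately repeats; not counted
        key = self.fingerprint(body)
        self.counts[key] = self.counts.get(key, 0) + 1
        repeats = self.counts[key] - 1  # the first sighting scores nothing
        return min(repeats * self.score_per_repeat, self.max_score)
```

Unlike Razor, DCC and Pyzor, this counts only what the local site has seen, which is the distinction drawn elsewhere in the thread; a real plugin would also need shared storage between spamd children and some form of expiry.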
Re: Google Summer of Code 2007 ...
On Fri, 16 Feb 2007, Quinn Comendant wrote: How about an extensive statistics reporting tool, possibly web-based, that can show how well a current spamassassin installation is performing and where it needs improvements. It could provide trends in different classes of spam and how each is marked. Also show info on whether expensive (as in CPU time) rules and plugins are actually doing any good. I don't know that this belongs in SA itself. It'd be a nice add-on, but SA already does logging that should be quite sufficient to write something like this. Not to mention, the best measure of the success of a spam filtering plan is user satisfaction. Chris St. Pierre Unix Systems Administrator Nebraska Wesleyan University -- Never send mail to [EMAIL PROTECTED]
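As a rough illustration of Chris's point that existing logging is enough for such an add-on, here is a Python sketch that tallies per-rule hits from spamd log lines, split by verdict. It assumes spamd writes result lines of the form `result: Y 15 - RULE_A,RULE_B scantime=...` (with `.` instead of `Y` for non-spam); that format is an assumption and should be checked against your own logs, since it may differ between versions, and setups that call SA through amavisd-new log differently.

```python
import re
from collections import Counter

# ASSUMPTION: spamd result lines look roughly like
#   spamd: result: Y 15 - BAYES_99,HTML_MESSAGE scantime=1.2,size=3456,...
# with "." instead of "Y" for messages below the spam threshold.
RESULT_RE = re.compile(r"result: ([Y.]) (-?\d+(?:\.\d+)?) - (\S*)")

def summarize(lines):
    """Count messages and per-rule hits, split into spam and ham."""
    messages = Counter()
    rule_hits = {"spam": Counter(), "ham": Counter()}
    for line in lines:
        match = RESULT_RE.search(line)
        if not match:
            continue  # not a spamd result line
        flag, _score, rules = match.groups()
        verdict = "spam" if flag == "Y" else "ham"
        messages[verdict] += 1
        rule_hits[verdict].update(rule for rule in rules.split(",") if rule)
    return messages, rule_hits
```

Comparing how often an expensive rule fires on spam versus ham is one crude way to answer Quinn's question about whether it is pulling its weight.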
Re: HarrisPoll
On 17-Feb-2007, at 06:39, Michael Scheidell wrote: -----Original Message----- From: LuKreme [mailto:[EMAIL PROTECTED] Sent: Friday, February 16, 2007 1:26 PM To: users@spamassassin.apache.org Subject: HarrisPoll Where does the WS-SURBL info come from? I ask because the Harris Poll email is getting tagged with it. As far as I know, I've never received spam from them, so I'd like to check out the actual rbl. If harrispoll emailed you then you DID get spam from them. I get mail from them all the time. We get it all the time, and I don't know ANY user that signed up for it. I did. Did you sign up for it on purpose? ;-) Yep. Although I recently (Friday) unsubbed. -- "I used to hate the sun, because it'd shone on everything I'd done. Made me feel that all that I had done was overfill the ashtray of my life."
Re: SA not working?
Matt Kettler wrote on 17.02.2007 15:08: > David Obando wrote: > >> Dear all, >> >> I installed SA on a Debian Etch system together with Postfix and Amavis. >> Strangely SA doesn't score mails at all, but I don't see why. >> >> See the output of a spam mail I checked manually. When I run a check on >> the same mail on a different machine, it is scored: >> >> [EMAIL PROTECTED] tmp]# spamassassin -D < spam >> > > . > > OK, we know SA works when invoked as the "spamassassin" script. What's > your amavis configuration for SpamAssassin like? > > Is your @local_domains_acl set correctly? > > What is your tag_level set to? (note: this doesn't mean tagged as spam, > it means has any SA type headers added at all, set to -999 if you want > sane behavior) > > See also the amavis faq: > > http://www.ijs.si/software/amavisd/#faq-spam > > > Hi, I don't think that SA is working, because no tests are run! When I check a GTUBE mail with SA, it is not scored, but it should be scored with at least 1000 points! The problem has nothing to do with amavis, but here are my configs: my @local_domains_acl is: 05-domain_id:@local_domains_acl = ( ".$mydomain" ); My tag levels: 20-debian_defaults:$sa_tag_level_deflt = 2.0; # add spam info headers if at, or above that level 20-debian_defaults:$sa_tag2_level_deflt = 8.31; # add 'spam detected' headers at that level 50-user:$sa_tag_level_deflt = -10; # always show spam info in the mail header Regards, David -- The day microsoft makes something that doesn't suck is the day they start making vacuum cleaners. gpg --keyserver pgp.mit.edu --recv-keys 1920BD87 Key fingerprint = 3326 32CE 888B DFF1 DED3 B8D2 105F 29CB 1920 BD87
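To repeat David's GTUBE check reproducibly, a short Python sketch can write a minimal test message containing the standard GTUBE string; on a working installation, feeding it to `spamassassin -t < gtube.eml` should show the GTUBE rule firing with a score of at least 1000. The file name, addresses and subject are placeholders chosen for this sketch.

```python
from email.message import EmailMessage

# Standard GTUBE test string; SpamAssassin's GTUBE rule matches it in the body.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"

def gtube_message(sender="test@example.com", recipient="postmaster@example.com"):
    """Build a minimal plain-text message whose body contains GTUBE."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "GTUBE test"
    # Plain ASCII with short lines, so the body is written out unencoded.
    msg.set_content(f"This is a GTUBE test message.\n\n{GTUBE}\n")
    return msg.as_string()

def write_gtube(path="gtube.eml"):
    """Write the test message to `path` for use with `spamassassin -t`."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(gtube_message())
```

If the same file scores 1000 on one machine and nothing on the other, the problem is in the local SA installation or rules, not in how amavis calls it.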
Re: Export and append Bayes DB
Michael Parker wrote: Sam Przyswa wrote: Hi, Is it possible to export a Bayes DB from a server and then append (not restore) it to other servers? No, you generally can't combine two bayes databases that way. Best bet is to pick the most complete one and use it. For more details see a really long post on the users mailing list from me a while back. Ok, thanks. Sam. -- This message has been checked by MailScanner for viruses and spam, and nothing suspicious was found.
Re: Google Summer of Code 2007 ...
Raul Dias writes: **snip > If I remember correctly spamd was using something between 2 and 5% of > memory reported by top (45 process max). > > If it was really shared, it would not have collapsed. > > My bet is that the model used on Linux is copy-on-write. So after a > fork, when the child spamd changes a value, the kernel makes its own > copy of the memory. (please correct me if I am wrong). To make it worse > a perl script (AFAIK) is data and not code, which makes it harder to reuse > (especially with evals around). > > Even if sharing does happen it is not enough. > > OTOH, with an I/O model, the total memory used would be: > - the perl interpreter and libraries (this is truly shared on a fork > model). > - the compiled perl code and perl libraries. > - one copy of the parsed rules and compiled regular expressions and non >message/scanner related data. Yeah. It's the lists and rules and regexes that do it for me. > - one M::SA::PerMsgStatus object for each simultaneous scanned message >(this is a place to put a limit on). > >> Still, if someone tries it and can demo increased efficiency... >> go for it ;) > > This might require some internal changes to SA. Every Sync call would > have to be changed to Async (NON BLOCKING). This might include SQL > calls, DNS calls, exec'ing external apps and even file I/O. An async version of Net::DNS is http://search.cpan.org/~msergeant/ParaDNS-1.1/
Re: Bayes db size....
Is there a consensus on this need? I deal with the seen db issue by scheduled deletion of that file. That said, with SA becoming more and more prominent all the time, I suspect the Average Joe will miss this oddity until they wind up with a sluggish system, out of drive space, or other related issues. I was mostly curious about the logic behind NOT doing maintenance on the seen and AWL db files. If there is a consensus that this needs to happen, then perhaps I can take the time to create a proper patch. I just want to make sure I am not missing something fundamental here. Michael Parker wrote: > Dave Koontz wrote: > >> I am sure this has been asked numerous times before, but what is the logic >> in having auto expiry on the bayes DB, and not seen? Seems that once tokens >> have been removed from the DB there is little to no use for 'unlearning' any >> associated messages. Besides on a busy system, this seen file gets large >> very fast. I'd vote for auto expiry and maintenance on seen as well as AWL. >> >> > > Patches welcome. > > Michael > > >
Re: SA not working?
David Obando wrote: > Dear all, > > I installed SA on a Debian Etch system together with Postfix and Amavis. > Strangely SA doesn't score mails at all, but I don't see why. > > See the output of a spam mail I checked manually. When I run a check on > the same mail on a different machine, it is scored: > > [EMAIL PROTECTED] tmp]# spamassassin -D < spam . OK, we know SA works when invoked as the "spamassassin" script. What's your amavis configuration for SpamAssassin like? Is your @local_domains_acl set correctly? What is your tag_level set to? (note: this doesn't mean tagged as spam, it means has any SA type headers added at all, set to -999 if you want sane behavior) See also the amavis faq: http://www.ijs.si/software/amavisd/#faq-spam
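Matt's questions about `@local_domains_acl` and the tag level can be illustrated with a deliberately simplified Python model. It is not amavisd-new's logic: it ignores kill levels, per-recipient lookups and every other setting. It assumes (a) spam headers are only added for recipients matched by `@local_domains_acl`, and (b) Debian's conf.d snippets are read in file-name order, so a later assignment such as `50-user` overrides `20-debian_defaults`. Both function names are hypothetical.

```python
def amavis_spam_headers(score, tag_level, tag2_level, recipient_is_local=True):
    """Simplified model of amavisd-new's tag thresholds (not its real code).

    Returns "info" when spam-info headers (X-Spam-Status and friends) would be
    added, and "spam" when the message would also be marked as spam.
    """
    if not recipient_is_local:
        return []  # assumed: headers only go to recipients in @local_domains_acl
    labels = []
    if score >= tag_level:
        labels.append("info")
    if score >= tag2_level:
        labels.append("spam")
    return labels

def effective_setting(assignments):
    """Last assignment wins when conf.d files are read in file-name order.

    `assignments` is a list of (file_name, value) pairs, in any order.
    """
    value = None
    for _file_name, new_value in sorted(assignments, key=lambda pair: pair[0]):
        value = new_value
    return value
```

Under these assumptions David's effective tag level is -10, so any scored message addressed to a local recipient should carry spam-info headers; if none appear, either the recipient is not considered local or SA is not being run at all.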
Re: Bayes db size....
Dave Koontz wrote: > I am sure this has been asked numerous times before, but what is the logic > in having auto expiry on the bayes DB, and not seen? Seems that once tokens > have been removed from the DB there is little to no use for 'unlearning' any > associated messages. Besides on a busy system, this seen file gets large > very fast. I'd vote for auto expiry and maintenance on seen as well as AWL. > Patches welcome. Michael > > -----Original Message----- > From: Theo Van Dinter [mailto:[EMAIL PROTECTED] > Sent: Friday, February 16, 2007 7:19 PM > To: spam mailling list > Subject: Re: Bayes db size > > On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote: >> So you're saying that right now seen isn't capped like tokens right? > > seen has no max size nor expiry features. > > -- > Randomly Selected Tagline: > "Like any French restaurant in America, it was overpriced, noisy, moody, > and would put you in mortal danger if you had an accident with anything > larger than a croissant." - Unknown about the Renault LeCar > >
RE: HarrisPoll
> -----Original Message----- > From: LuKreme [mailto:[EMAIL PROTECTED] > Sent: Friday, February 16, 2007 1:26 PM > To: users@spamassassin.apache.org > Subject: HarrisPoll > > > Where does the WS-SURBL info come from? I ask because the Harris > Poll email is getting tagged with it. As far as I know, I've never > received spam from them, so I'd like to check out the actual rbl. If harrispoll emailed you then you DID get spam from them. We get it all the time, and I don't know ANY user that signed up for it. Did you sign up for it on purpose? ;-)
Re: Google Summer of Code 2007 ...
On Sat, 2007-02-17 at 11:21 +, Justin Mason wrote: > Raul Dias writes: > > On Sat, 2007-02-17 at 02:07 +0100, Mark Martinec wrote: > > > On Saturday February 17 2007 01:49, Matthew Wilson wrote: > > > > I was/am primarily concerned with RAM usage for high-concurrency > > > > situations. > > > > > > Ok. Still, in my experience about 30 (maybe 50) SA processes can > > > fully utilize today's CPU & I/O, and it's probably no big deal > > > to provide about 2 GB of memory to cater for such system. > > > Also, and unfortunately, multithreading in Perl is rather > > > cumbersome and not significantly less expensive than fully > > > individual processes. > > > > After experimenting with the sa-blacklist.cf some time ago, 45 > > processes brought my system to its knees with 3.5GB (out of memory). > > > > I agree about the thread model. > > > > But sticking to an async I/O model is a valid point. If implemented > > correctly it will save a lot of memory and even improve performance a > > little. > > > > Having separated processes saves the need to check for garbage > > after filtering a message, which would cause the code to have to be > > rechecked. > > > > However, for uniprocessor systems, having multiple processes running is > > actually more expensive than an async I/O one. For multiprocessor > > systems, just keep one process per CPU or fewer. > > > > In the past I have played a lot with perl-loop (any loopers around?) > > which was the only way to go. It is too low level for most people, but > > perhaps POE is the way to go today (which can use perl-loop as its > > base). > > I'm dubious about the benefits for SpamAssassin... > > An async model works very well for network-bound and I/O-bound servers; > however, SpamAssassin is mainly CPU-bound, since the network and I/O parts > are already mostly run async during the scan operation. 
> > Also, the multiple spamd processes share quite a lot of RAM with each > other -- there's a bug in how linux reports "shared" memory which makes it > appear much worse than it is. Read the FAQ for more details. yep, but ... 01:01:37 kernel: Out of Memory: Killed process 10024 (spamd). 01:01:51 kernel: Out of Memory: Killed process 10044 (spamd). 01:02:05 kernel: Out of Memory: Killed process 10612 (spamd). 01:02:19 kernel: Out of Memory: Killed process 10038 (spamd). 01:02:32 kernel: Out of Memory: Killed process 10602 (spamd). 01:02:45 kernel: Out of Memory: Killed process 10398 (spamd). 01:03:04 kernel: Out of Memory: Killed process 10020 (spamd). 01:03:29 kernel: Out of Memory: Killed process 10015 (spamd). 01:03:42 kernel: Out of Memory: Killed process 10237 (spamd). 01:04:00 kernel: Out of Memory: Killed process 11037 (spamd). 01:04:18 kernel: Out of Memory: Killed process 10478 (spamd). 01:04:34 kernel: Out of Memory: Killed process 11065 (spamd). 01:04:40 kernel: Out of Memory: Killed process 10405 (spamd). ...and it goes... If I remember correctly spamd was using something between 2 and 5% of memory reported by top (45 process max). If it was really shared, it would not have collapsed. My bet is that the model used on Linux is copy-on-write. So after a fork, when the child spamd changes a value, the kernel makes its own copy of the memory. (please correct me if I am wrong). To make it worse a perl script (AFAIK) is data and not code, which makes it harder to reuse (especially with evals around). Even if sharing does happen it is not enough. OTOH, with an I/O model, the total memory used would be: - the perl interpreter and libraries (this is truly shared on a fork model). - the compiled perl code and perl libraries. - one copy of the parsed rules and compiled regular expressions and non message/scanner related data. - one M::SA::PerMsgStatus object for each simultaneous scanned message (this is a place to put a limit on). 
> Still, if someone tries it and can demo increased efficiency... > go for it ;) This might require some internal changes to SA. Every Sync call would have to be changed to Async (NON BLOCKING). This might include SQL calls, DNS calls, exec'ing external apps and even file I/O. -Raul Dias > --j.
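To illustrate the async model Raul describes, the Python asyncio sketch below compares waiting for several simulated lookups one after another with keeping them all in flight at once in a single process. The lookups are stand-ins built on `asyncio.sleep`; the sketch says nothing about how SpamAssassin itself would be restructured, and all names are hypothetical.

```python
import asyncio
import time

async def fake_lookup(name, latency):
    """Stand-in for a network query (DNSBL, Razor, SQL): it only waits."""
    await asyncio.sleep(latency)
    return name

async def scan_sequentially(names, latency):
    """Blocking style: each lookup starts after the previous one finishes."""
    return [await fake_lookup(name, latency) for name in names]

async def scan_concurrently(names, latency):
    """Async style: all lookups are in flight at once; gather keeps their order."""
    return list(await asyncio.gather(*(fake_lookup(name, latency) for name in names)))

def timed(scan, names, latency=0.1):
    """Run one of the scan coroutines and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(scan(names, latency))
    return results, time.perf_counter() - start
```

With five 0.1 s lookups the sequential version waits about 0.5 s and the concurrent one about 0.1 s. Justin's caveat still applies: SpamAssassin already runs most of its network tests asynchronously, so the remaining cost is mostly CPU time, which this pattern does not reduce.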
SA not working?
Dear all, I installed SA on a Debian Etch system together with Postfix and Amavis. Strangely SA doesn't score mails at all, but I don't see why. See the output of a spam mail I checked manually. When I run a check on the same mail on a different machine, it is scored: [EMAIL PROTECTED] tmp]# spamassassin -D < spam [20449] dbg: logger: adding facilities: all [20449] dbg: logger: logging level is DBG [20449] dbg: generic: SpamAssassin version 3.1.7-deb [20449] dbg: config: score set 0 chosen. [20449] dbg: util: running in taint mode? yes [20449] dbg: util: taint mode: deleting unsafe environment variables, resetting PATH [20449] dbg: util: PATH included '/usr/local/sbin', keeping [20449] dbg: util: PATH included '/usr/local/bin', keeping [20449] dbg: util: PATH included '/usr/sbin', keeping [20449] dbg: util: PATH included '/usr/bin', keeping [20449] dbg: util: PATH included '/sbin', keeping [20449] dbg: util: PATH included '/bin', keeping [20449] dbg: util: PATH included '/usr/bin/X11', keeping [20449] dbg: util: final PATH set to: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11 [20449] dbg: message: MIME PARSER START [20449] dbg: message: main message type: text/plain [20449] dbg: message: parsing normal part [20449] dbg: message: added part, type: text/plain [20449] dbg: message: MIME PARSER END [20449] dbg: dns: is Net::DNS::Resolver available? 
yes [20449] dbg: dns: Net::DNS version: 0.59 [20449] dbg: config: using "/etc/spamassassin" for site rules pre files [20449] dbg: config: using "/var/lib/spamassassin/3.001007" for sys rules pre files [20449] dbg: config: read file /var/lib/spamassassin/3.001007/saupdates_openprotect_com.pre [20449] dbg: config: using "/var/lib/spamassassin/3.001007" for default rules dir [20449] dbg: config: read file /var/lib/spamassassin/3.001007/saupdates_openprotect_com.cf [20449] dbg: config: using "/etc/spamassassin" for site rules dir [20449] dbg: config: using "/root/.spamassassin" for user state dir [20449] dbg: config: using "/root/.spamassassin/user_prefs" for user prefs file [20449] dbg: config: read file /root/.spamassassin/user_prefs [20449] dbg: plugin: fixed relative path: /var/lib/spamassassin/3.001007/saupdates_openprotect_com/loadplugins.pre [20449] dbg: config: using "/var/lib/spamassassin/3.001007/saupdates_openprotect_com/loadplugins.pre" for included file [20449] dbg: config: read file /var/lib/spamassassin/3.001007/saupdates_openprotect_com/loadplugins.pre [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::SPF from @INC [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::SPF=HASH(0x99dd224) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::Hashcash from @INC [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::Hashcash=HASH(0x99e74c0) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayCountry from @INC [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x99c20b8) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::Razor2 from @INC [20449] dbg: razor2: razor2 is available, version 2.81 [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::Razor2=HASH(0x9a14b84) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::SpamCop from @INC [20449] dbg: reporter: network tests on, attempting SpamCop [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::SpamCop=HASH(0x9d03cdc) 
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::URIDNSBL from @INC [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x9d3caa0) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::Pyzor from @INC [20449] dbg: pyzor: network tests on, attempting Pyzor [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::Pyzor=HASH(0x9d5953c) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::AWL from @INC [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::AWL=HASH(0x9d73998) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::AutoLearnThreshold from @INC [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::AutoLearnThreshold=HASH(0x9d810c8) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::WhiteListSubject from @INC [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::WhiteListSubject=HASH(0x9d8dc5c) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::MIMEHeader from @INC [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::MIMEHeader=HASH(0x9d98eec) [20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::ReplaceTags from @INC [20449] dbg: plugin: registered Mail::SpamAssassin::Plugin::ReplaceTags=HASH(0x9da5e20) [20449] dbg: plugin: fixed relative path: /var/lib/spamassassin/3.001007/saupdates_openprotect_com/70_sare_adult.cf [20449] dbg: config: using "/var/lib/spamassassin/3.001007/saupdates_openprotect_com/70_sare_adult.cf" for included file [20449] dbg: config: read file /var/lib/spamassassin/3.001007/saupdates_openprotect_com/70_sare_adult.cf [20449] dbg: plugin: fixed relativ
Re: Google Summer of Code 2007 ...
Raul Dias writes: > On Sat, 2007-02-17 at 02:07 +0100, Mark Martinec wrote: > > On Saturday February 17 2007 01:49, Matthew Wilson wrote: > > > I was/am primarily concerned with RAM usage for high-concurrency > > > situations. > > > > Ok. Still, in my experience about 30 (maybe 50) SA processes can > > fully utilize today's CPU & I/O, and it's probably no big deal > > to provide about 2 GB of memory to cater for such system. > > Also, and unfortunately, multithreading in Perl is rather > > cumbersome and not significantly less expensive than fully > > individual processes. > > After experimenting with the sa-blacklist.cf some time ago, 45 > processes brought my system to its knees with 3.5GB (out of memory). > > I agree about the thread model. > > But sticking to an async I/O model is a valid point. If implemented > correctly it will save a lot of memory and even improve performance a > little. > > Having separated processes saves the need to check for garbage > after filtering a message, which would cause the code to have to be > rechecked. > > However, for uniprocessor systems, having multiple processes running is > actually more expensive than an async I/O one. For multiprocessor > systems, just keep one process per CPU or fewer. > > In the past I have played a lot with perl-loop (any loopers around?) > which was the only way to go. It is too low level for most people, but > perhaps POE is the way to go today (which can use perl-loop as its > base). I'm dubious about the benefits for SpamAssassin... An async model works very well for network-bound and I/O-bound servers; however, SpamAssassin is mainly CPU-bound, since the network and I/O parts are already mostly run async during the scan operation. Also, the multiple spamd processes share quite a lot of RAM with each other -- there's a bug in how linux reports "shared" memory which makes it appear much worse than it is. Read the FAQ for more details. 
Still, if someone tries it and can demo increased efficiency... go for it ;) --j.
RE: Bayes db size....
I am sure this has been asked numerous times before, but what is the logic in having auto expiry on the bayes DB, and not seen? Seems that once tokens have been removed from the DB there is little to no use for 'unlearning' any associated messages. Besides on a busy system, this seen file gets large very fast. I'd vote for auto expiry and maintenance on seen as well as AWL. -----Original Message----- From: Theo Van Dinter [mailto:[EMAIL PROTECTED] Sent: Friday, February 16, 2007 7:19 PM To: spam mailling list Subject: Re: Bayes db size On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote: > So you're saying that right now seen isn't capped like tokens right? seen has no max size nor expiry features. -- Randomly Selected Tagline: "Like any French restaurant in America, it was overpriced, noisy, moody, and would put you in mortal danger if you had an accident with anything larger than a croissant." - Unknown about the Renault LeCar
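A minimal sketch of the age-based expiry Dave proposes for the seen database, using Python's sqlite3. It assumes a hypothetical schema in which each entry records when the message was learned; the `learned_at` column, the table layout and the function names are assumptions of this sketch, not SpamAssassin's actual storage format. Per Theo, the current seen store has no max size or expiry features, so something like a per-entry timestamp would be needed before entries could be aged out.

```python
import sqlite3
import time

def open_seen(path=":memory:"):
    """Create a hypothetical seen table: message id, learned as spam/ham, when."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        " msgid TEXT PRIMARY KEY,"
        " flag TEXT NOT NULL CHECK (flag IN ('s', 'h')),"
        " learned_at INTEGER NOT NULL)"
    )
    return conn

def mark_seen(conn, msgid, flag, now=None):
    """Record (or refresh) that a message was learned as spam ('s') or ham ('h')."""
    now = int(time.time()) if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO seen (msgid, flag, learned_at) VALUES (?, ?, ?)",
        (msgid, flag, now),
    )

def expire_seen(conn, max_age_seconds, now=None):
    """Delete entries older than max_age_seconds; return how many were removed."""
    now = int(time.time()) if now is None else now
    cursor = conn.execute(
        "DELETE FROM seen WHERE learned_at < ?", (now - max_age_seconds,)
    )
    conn.commit()
    return cursor.rowcount
```

Following Dave's argument, the expiry horizon would naturally track Bayes token expiry: once a message's tokens are gone, there is little point in remembering that the message was learned.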