Re: Parallelizing Spam Assassin

2009-07-31 Thread Paweł Sasin
 In my tests there was no MTA. The mails/spam were collected from
 some server in mbox format and fed to SA using the --mbox switch. The
 size of the messages was not altered in any fashion - just the usual
 size of incoming spam/mails.

If you're interested in testing/tuning SpamAssassin for heavy loads, you
should consider using the spamd daemon. You can then use SLAMD [1] as a
performance evaluation platform [2].

It takes some effort to set up the environment, but SLAMD helps in
repetitive testing and keeping track of the results (comparison,
history, charts).
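
Before setting up a full SLAMD environment, a rough first measurement can be
scripted. The following is only a minimal sketch, not a replacement for
SLAMD: it assumes spamd is already running locally and that the spamc client
is on the PATH. The function names scan_with_spamc and time_mbox are
hypothetical and were picked for this illustration.

```python
import mailbox
import subprocess
import time


def scan_with_spamc(raw_message: bytes) -> bytes:
    """Pipe one message through spamc (assumes a local spamd is running)."""
    result = subprocess.run(
        ["spamc"], input=raw_message, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


def time_mbox(path, scan=scan_with_spamc):
    """Scan every message in an mbox file; return (count, elapsed_seconds)."""
    box = mailbox.mbox(path, create=False)
    count = 0
    start = time.perf_counter()
    try:
        for message in box:
            # Each message is passed to the scanner unmodified, one at a time.
            scan(message.as_bytes())
            count += 1
    finally:
        box.close()
    return count, time.perf_counter() - start
```

Dividing the count by the elapsed time gives a sequential messages-per-second
figure. It says nothing about behaviour under concurrent load, which is where
a tool like SLAMD is useful.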

[1] http://www.slamd.com
[2] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5689

-- 
Pawel Sasin

WIRTUALNA POLSKA Spolka Akcyjna, with its registered office in Gdansk at
ul. Traugutta 115 C, entered in the National Court Register - Register
of Entrepreneurs kept by the District Court Gdansk-Polnoc in Gdansk
under KRS number 068548, with share capital of 67,980,024.00 zlotys,
fully paid up, and Tax Identification Number 957-07-51-216.


Re: bayes autolearn off but journal updated

2009-01-22 Thread Paweł Sasin
  Yes, more specifically, it's mostly going to be updating the
  atime, or time of last access, records for tokens. This time is
  used by the expiry process to drop the least recently used tokens.
 
 
  What does SA do if it can't open the Bayes database r/w? Will it
  skip the BAYES checks or just tie it r/o?
 
  (I notice occasional missing BAYES in X-Spam headers)
 
  Well, first let's be clear.. it's R/W opening the journal, not the
  database itself.
 
  The main _toks and _seen files are only locked R/W if one of the
  following is going on:
   - learning without bayes_learn_to_journal set
   - a journal sync
   - token expiry is running
 
  As for write locks to the journal, if for some reason there's a
  conflict, the update is just dropped with a warning. This isn't
  incredibly likely unless your bayes is really busy, as journal
  updates are pretty short in nature.
 
 On POSIX filesystems, this should be virtually impossible, since the
 file is opened for append with atomic writes.

It is quite common on Solaris with 40+ working spamds and a really high
traffic volume. We had such a situation some time ago: the server was
50% idle while the spamds were contending for the journal lock
(auto_learn and auto_expire disabled) rather than going on to handle the
next message. I.e. the machine was 50% idle but unable to handle more
messages, and the bottleneck was the journal updates.
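
To make the "update is just dropped" behaviour from the quoted text concrete,
here is a minimal sketch of a non-blocking append under an exclusive lock. It
is an illustration only and not SpamAssassin's actual locking code: it uses
Python's fcntl.flock on a Unix system, and the function name
append_to_journal is hypothetical.

```python
import fcntl
import os


def append_to_journal(path: str, record: bytes) -> bool:
    """Try to append one record; drop it if another process holds the lock."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # lock is busy: drop the update and move on
        try:
            os.write(fd, record)
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

With many writers, the alternative to dropping is waiting or retrying for the
lock, and that waiting would show up as idle CPU with no gain in throughput,
much like the situation described above.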

-- 
Paweł Sasin



Re: Trying out a new concept

2008-09-24 Thread Paweł Sasin
 I don't know how this will work but I'm building the data now. For
 those of you who are familiar with Day old bread lists to detect new
 domains, as you know there's a lag time in the data and they often
 don't have data from all the registries. So - here's a different
 solution.
 
 What I'm thinking is to accumulate every domain name that interacts
 with my system and store it in a list. Eventually after a week or
 so I should have a good list. Then the idea is to do a lookup to see
 if a new domain is NOT on the list. This will catch all really new
 domains, but will have some false positives. But - if it is mixed
 with other conditionals it might be a good way to detect and block
 spam from or linking to tasting domains.

If you use the AWL, you already have the list. Just scan the AWL DB for
domain names.

The AWL has even more precise data than you want to gather, and we could
use that as well. If we assume we trust a new sender less than a sender
we've already seen, then just add a score (e.g. +1.0) to any message
whose sender is not in the AWL DB.
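
The scoring idea can be sketched in a few lines. This is only an illustration
under simplifying assumptions: the set known_senders stands in for addresses
already extracted from the AWL DB, the real AWL storage format is not
modelled, and the names sender_domain, known_domains and new_sender_score are
hypothetical.

```python
def sender_domain(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def known_domains(known_senders):
    """Derive the set of already-seen domains from the known senders."""
    return {sender_domain(sender) for sender in known_senders}


def new_sender_score(sender: str, known_senders, penalty: float = 1.0) -> float:
    """Return the penalty for a sender we have never seen, else 0.0."""
    return 0.0 if sender.strip().lower() in known_senders else penalty


# Stand-in for addresses already present in the AWL DB (assumed data).
known = {"alice@example.com", "bob@example.org"}
print(new_sender_score("eve@tasting.example", known))  # unseen sender
print(sorted(known_domains(known)))
```

Matching on the full address is stricter than matching on the domain alone;
known_domains gives the coarser, domain-only list that the original proposal
asks for.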

-- 
Paweł Sasin



Re: spamd throughput issues

2007-12-09 Thread Paweł Sasin
Hi,

Are you using network tests?

Try evaluating spamd performance when it is run with the -L flag (local
tests only).

-- 
Pawel Sasin
