Re: Parallelizing Spam Assassin

2009-07-31 Thread Paweł Sasin
> In my tests there was no MTA. The mails/spam were collected from
> some server in mbox format and fed to SA using --mbox switch. The
> size of msgs was not altered in any fashion - just the usual size of
> incoming spam/mails

If you're interested in testing/tuning SpamAssassin for heavy loads, you
should consider using the spamd daemon. Then you can use SLAMD [1] as a
performance evaluation platform [2].

It takes some effort to set up the environment, but SLAMD helps in
repetitive testing and keeping track of the results (comparison,
history, charts).
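
For a quick sanity check before setting up SLAMD, single requests can be
timed against spamd directly. The sketch below is only an illustration
and is not part of SLAMD or SpamAssassin. It assumes spamd listens on
127.0.0.1:783 and uses the plain-text CHECK request and "Spam:" response
line of the spamd protocol; all function names are made up for this
example.

```python
import re
import socket
import time

# Matches the "Spam: True ; 15.0 / 5.0" line of a CHECK response.
_SPAM_LINE = re.compile(
    rb"^Spam:\s*(True|False)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)",
    re.MULTILINE | re.IGNORECASE,
)


def build_check_request(message: bytes) -> bytes:
    """Build a CHECK request for one raw message (hypothetical helper)."""
    header = (
        "CHECK SPAMC/1.2\r\n"
        f"Content-length: {len(message)}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + message


def parse_check_response(response: bytes):
    """Return (is_spam, score, threshold) parsed from the response."""
    match = _SPAM_LINE.search(response)
    if match is None:
        raise ValueError("no 'Spam:' line found in spamd response")
    is_spam = match.group(1).lower() == b"true"
    return is_spam, float(match.group(2)), float(match.group(3))


def check_once(message: bytes, host="127.0.0.1", port=783, timeout=30.0):
    """Send one message to spamd; return (parsed result, seconds elapsed).

    Host and port are assumptions; adjust them to your setup.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    elapsed = time.perf_counter() - start
    return parse_check_response(b"".join(chunks)), elapsed
```

Calling check_once() in a loop over the messages of an mbox gives a rough
per-message latency; it does not replace the concurrency, history and
charts that SLAMD provides.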

[1] http://www.slamd.com
[2] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5689

-- 
Pawel Sasin

"WIRTUALNA POLSKA" Spolka Akcyjna (joint-stock company) with its
registered office in Gdansk at ul. Traugutta 115 C, entered in the
National Court Register - Register of Entrepreneurs kept by the District
Court Gdansk-Polnoc in Gdansk under KRS number 068548, with a share
capital of 67,980,024.00 zlotys, fully paid up, and Tax Identification
Number 957-07-51-216.


Re: bayes autolearn off but journal updated

2009-01-22 Thread Paweł Sasin
> >>> Yes, more specifically, it's mostly going to be updating the
> >>> "atime", or time of last access, records for tokens. This time is
> >>> used by the expiry process to drop the least recently used tokens.
> >>>
> >>
> >> What does SA do, if it can't r/w open bayes database? Will it skip
> >> BAYES checks or just tie it r/o ?
> >>
> >> (I notice occasional missing BAYES in X-Spam headers)
> >>
> > Well, first let's be clear.. it's R/W opening the journal, not the
> > database itself.
> >
> > The main _toks and _seen files are only locked R/W if there's one
> > of the following going on:
> > learning without bayes_learn_to_journal set
> > a journal sync
> > token expiry is running
> >
> > As for write locks to the journal, if for some reason there's a
> > conflict, the update is just dropped with a warning. This isn't
> > incredibly likely unless your bayes is really busy, as journal
> > updates are pretty short in nature.
> 
> On POSIX filesystems, this should be virtually impossible, since the
> file is opened for append with atomic writes.

It is quite common on Solaris with 40+ working spamds and a really high
traffic volume. We ran into exactly this some time ago. With auto_learn
and auto_expire disabled, the spamds were still fighting to lock the
journal instead of moving on to the next message. The machine was 50%
idle yet could not handle any more messages: the bottleneck was the
journal updates.
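
To make the two behaviours discussed here concrete, below is a minimal
sketch of "try to lock the journal, and drop the update on conflict". It
is not SpamAssassin's actual code. It assumes a POSIX system with
flock(), and the function name and record format are invented for the
example. A worker that waits for the lock instead of dropping the update
sits idle without using CPU, which is one way a half-idle machine can
still be saturated.

```python
import fcntl
import os


def append_journal(path: str, record: bytes) -> bool:
    """Append one record under a non-blocking exclusive lock.

    Returns True if the record was written, False if the lock was
    held elsewhere and the update was dropped (hypothetical helper).
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB: fail immediately instead of waiting for the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # contention: drop the update, keep scanning
        try:
            # O_APPEND makes each write land at the current end of file.
            os.write(fd, record)
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

Counting how often such a call returns False under load would show how
much contention there is on the journal.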

-- 
Paweł Sasin


Re: Trying out a new concept

2008-09-24 Thread Paweł Sasin
> I don't know how this will work but I'm building the data now. For
> those of you who are familiar with Day old bread lists to detect new
> domains, as you know there's a lag time in the data and they often
> don't have data from all the registries. So - here's a different
> solution.
> 
> What I'm thinking is to accumulate every domain name that interacts
> with my system and storing it in a list. Eventually after a week or
> so I should have a good list. Then the idea is to do a lookup to see
> if a new domain is NOT on the list. This will catch all really new
> domains, but will have some false positives. But - if it is mixed
> with other conditionals it might be a good way to detect and block
> spam from or linking to tasting domains.

If you use the AWL, you already have the list. Just scan the AWL DB for
domain names.

The AWL even holds more precise data than you want to gather, and we
could use that as well. If we assume that a new sender deserves less
trust than one we've already seen, then just add a score (e.g. +1.0) to
any message whose sender is not in the AWL DB.
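
The idea can be sketched as follows. This is only an illustration under
assumptions: known_senders stands for a set of addresses that has
already been exported from the AWL DB by some other means (the export
itself and the real AWL key format are not shown), and the function
names and the 1.0 penalty are hypothetical.

```python
def sender_domain(address: str) -> str:
    """Return the lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def known_domains(known_senders) -> set:
    """Derive the list of already-seen domains from known senders."""
    return {sender_domain(addr) for addr in known_senders}


def new_sender_score(sender: str, known_senders, penalty: float = 1.0) -> float:
    """Score a message: 0.0 for a known sender, `penalty` for a new one."""
    seen = {addr.strip().lower() for addr in known_senders}
    return 0.0 if sender.strip().lower() in seen else penalty


# Example with made-up data standing in for an AWL export.
awl_export = {"alice@example.com", "bob@example.org"}
print(sorted(known_domains(awl_export)))
print(new_sender_score("Alice@Example.com", awl_export))
print(new_sender_score("stranger@new-domain.test", awl_export))
```

The same check can be done per domain rather than per address by testing
sender_domain(sender) against known_domains(...), which is closer to the
"new domain" lookup proposed above.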

-- 
Paweł Sasin


Re: spamd throughput issues

2007-12-09 Thread Paweł Sasin
Hi,

Are you using network tests?

Try evaluating spamd performance when it is run with the -L flag (local
tests only).

-- 
Pawel Sasin

WIRTUALNA POLSKA SA, ul. Traugutta 115c, 80-226 Gdansk; NIP (tax ID):
957-07-51-216; District Court Gdansk-Polnoc, KRS 068548, share capital
62,880,024 zlotys (fully paid up)