Re: Parallelizing Spam Assassin
> In my tests - there was no MTA. The mails/spam were collected from
> some server in mbox format and fed to SA using --mbox switch. The
> size of msgs was not altered in any fashion - just the usual size of
> incoming spam/mails

If you're interested in testing/tuning SpamAssassin for heavy loads, you should consider using the spamd daemon. You can then use SLAMD [1] as a performance evaluation platform [2]. It takes some effort to set up the environment, but SLAMD helps with repetitive testing and with keeping track of the results (comparison, history, charts).

[1] http://www.slamd.com
[2] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5689

--
Pawel Sasin

"WIRTUALNA POLSKA" Spolka Akcyjna z siedziba w Gdansku przy ul. Traugutta 115 C, wpisana do Krajowego Rejestru Sadowego - Rejestru Przedsiebiorcow prowadzonego przez Sad Rejonowy Gdansk - Polnoc w Gdansku pod numerem KRS 068548, o kapitale zakladowym 67.980.024,00 zlotych oplaconym w calosci oraz Numerze Identyfikacji Podatkowej 957-07-51-216.
Re: bayes autolearn off but journal updated
> >>> Yes, more specifically, it's mostly going to be updating the
> >>> "atime", or time of last access, records for tokens. This time is
> >>> used by the expiry process to drop the least recently used tokens.
> >>>
> >>
> >> What does SA do, if it can't r/w open bayes database? Will it skip
> >> BAYES checks or just tie it r/o ?
> >>
> >> (I notice occasional missing BAYES in X-Spam headers)
> >>
> >
> > Well, first let's be clear.. it's R/W opening the journal, not the
> > database itself.
> >
> > The main _toks and _seen files are only locked R/W if there's one
> > of the following going on:
> > learning without bayes_learn_to_journal set
> > a journal sync
> > token expiry is running
> >
> > As for write locks to the journal, if for some reason there's a
> > conflict, the update is just dropped with a warning. This isn't
> > incredibly likely unless your bayes is really busy, as journal
> > updates are pretty short in nature.
>
> on POSIX filesystems, this should be virtually impossible, since the
> file is opened for append with atomic writes.

It is quite common on Solaris with 40+ working spamds and a really high traffic volume. We had exactly this situation some time ago. The server was 50% idle while the spamds were struggling to lock the journal (auto_learn and auto_expire disabled) rather than moving on to handle the next message. In other words, the machine was 50% idle but unable to handle more messages, and the bottleneck was the journal updates.

--
Paweł Sasin
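The "drop the update with a warning on a lock conflict" behaviour quoted in the message above can be illustrated with a minimal sketch. This is not SpamAssassin's actual locking code: the function name append_journal, the record format and the use of flock() are assumptions made only for illustration, and the sketch assumes a Unix system where the fcntl module is available.

```python
import fcntl
import os
import warnings


def append_journal(path: str, record: bytes) -> bool:
    """Try to append one record to a journal file.

    Returns True if the record was written, or False (after a warning)
    if another process currently holds the lock. Hypothetical helper,
    not part of SpamAssassin.
    """
    # O_APPEND makes each write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fail at once instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            warnings.warn("journal busy, update dropped")
            return False
        os.write(fd, record)
        return True
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

A worker that drops the update this way can move straight on to the next message. A worker that waits for the lock instead sits idle, which matches the symptom described above: spare CPU but no extra throughput.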
Re: Trying out a new concept
> I don't know how this will work but I'm building the data now. For
> those of you who are familiar with Day old bread lists to detect new
> domains, as you know there's a lag time in the data and they often
> don't have data from all the registries. So - here's a different
> solution.
>
> What I'm thinking is to accumulate every domain name that interacts
> with my system and storing it in a list. Eventually after a week or
> so I should have a good list. Then the idea is to do a lookup to see
> if a new domain is NOT on the list. This will catch all really new
> domains, but will have some false positives. But - if it is mixed
> with other conditionals it might be a good way to detect and block
> spam from or linking to tasting domains.

If you use the AWL, you already have the list. Just scan the AWL DB for domain names. The AWL holds even more precise data than you want to gather.

We could use it as well. If we assume that we trust a new sender less than a sender we have already seen, then we can simply add a score (e.g. +1.0) to any message whose sender is not in the AWL DB.

--
Paweł Sasin
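The "score any sender not already seen" idea from the message above can be sketched as follows. This is only an illustration under stated assumptions: the names domain_of and new_sender_score are hypothetical, and the set of seen domains is hard-coded here, whereas in practice it would have to be extracted from the AWL database. The sketch does not read the AWL itself.

```python
from typing import Set


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def new_sender_score(sender: str, seen_domains: Set[str],
                     penalty: float = 1.0) -> float:
    """Return `penalty` if the sender's domain was never seen, else 0.0."""
    return 0.0 if domain_of(sender) in seen_domains else penalty


# Hypothetical sample standing in for addresses dumped from the AWL DB.
seen = {domain_of(a) for a in ["alice@example.com", "bob@Example.org"]}
```

For example, new_sender_score("carol@example.com", seen) gives 0.0 because the domain is already known, while a sender from a domain that has never appeared gets the penalty. As the original poster notes, such a rule will produce some false positives, so it makes more sense as one small score among several than as a blocking condition.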
Re: spamd throughput issues
Hi,

are you using network tests? Try evaluating spamd performance when it is run with the -L flag (local tests only).

--
Pawel Sasin