ham source for site-wide bayes?

2015-05-20 Thread Steve Rainwater
I've set up spamassassin with a site-wide bayes configuration. I have
some spamtrap email addresses that supply fresh spam into bayes for
training on a cron job. However, from what I've read, bayes needs to
have ongoing ham as well as spam for training in order to work well.
What's the usual method of supplying the ham? Does that have to be done
manually (how often?) or has anyone come up with a way to automatically
supply ham. 

I have the spamtrap email boxes that receive spam-only but all the real
email addresses on the server receive a mix of ham and spam, which is
why I need spamassassin in the first place :)  I can't find anything in
spamassassin docs so far that explains a non-manual way of supplying
ham. Have I missed something? Is there some sort of service where I can
subscribe to an updated ham corpus automatically like with the clamav
database? 

-Steve




Re: ham source for site-wide bayes?

2015-05-20 Thread Kevin A. McGrail

On 5/20/2015 12:29 PM, Steve Rainwater wrote:

I've set up spamassassin with a site-wide bayes configuration. I have
some spamtrap email addresses that supply fresh spam into bayes for
training on a cron job. However, from what I've read, bayes needs to
have ongoing ham as well as spam for training in order to work well.
What's the usual method of supplying the ham? Does that have to be done
manually (how often?) or has anyone come up with a way to automatically
supply ham.

I have the spamtrap email boxes that receive spam-only but all the real
email addresses on the server receive a mix of ham and spam, which is
why I need spamassassin in the first place :)  I can't find anything in
spamassassin docs so far that explains a non-manual way of supplying
ham. Have I missed something? Is there some sort of service where I can
subscribe to an updated ham corpus automatically like with the clamav
database?

One way people often supply ham is to use sent items from your legit users.

Regards,
KAM
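
A minimal sketch of the sent-items approach, assuming Maildir++ storage under a hypothetical /var/vmail/USER/.Sent layout. The function name, MAIL_ROOT and the folder name are illustrative; `sa-learn --ham` is the standard way to train ham.

```shell
#!/bin/sh
# Sketch: feed every user's Sent folder to the site-wide Bayes DB as ham.
# MAIL_ROOT and the ".Sent" folder name are assumptions; adjust to your layout.
SA_LEARN=${SA_LEARN:-sa-learn}
MAIL_ROOT=${MAIL_ROOT:-/var/vmail}

learn_sent_as_ham() {
    # Maildir keeps stored messages in cur/; run sa-learn once per folder.
    for dir in "$MAIL_ROOT"/*/.Sent/cur; do
        [ -d "$dir" ] || continue
        "$SA_LEARN" --ham "$dir"
    done
}

# Usage (e.g. from cron, as the user that owns the site-wide Bayes DB):
#   learn_sent_as_ham
```

sa-learn skips messages it has already learned, so re-running this over the same folders should be harmless, if slow on large mailboxes.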


Re: ham source for site-wide bayes?

2015-05-20 Thread Axb

On 20.05.2015 18:29, Steve Rainwater wrote:

I've set up spamassassin with a site-wide bayes configuration. I have
some spamtrap email addresses that supply fresh spam into bayes for
training on a cron job. However, from what I've read, bayes needs to
have ongoing ham as well as spam for training in order to work well.
What's the usual method of supplying the ham? Does that have to be done
manually (how often?)


it doesn't have to be done - you *can* do it manually.


or has anyone come up with a way to automatically supply ham.


it's called auto_learn [works for me]

you'll find all the details in

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.txt

LEARNING OPTIONS



I have the spamtrap email boxes that receive spam-only but all the real
email addresses on the server receive a mix of ham and spam, which is
why I need spamassassin in the first place :)  I can't find anything in
spamassassin docs so far that explains a non-manual way of supplying
ham. Have I missed something? Is there some sort of service where I can
subscribe to an updated ham corpus automatically like with the clamav
database?


your ham is specific to your traffic - you cannot inherit somebody 
else's ham and expect it to work nicely with your traffic.


You'll soon read a dozen ways to do it.

I'll add mine: I use autolearn AND feed bayes trap data to a 6GB Redis 
DB [works for me]


Axb
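
A sketch of enabling auto-learning, written as a small shell helper that drops a config snippet into place. The three option names come from the LEARNING OPTIONS section of Mail::SpamAssassin::Conf; the values shown are what I understand to be the stock defaults, and the helper name and target path are illustrative only.

```shell
#!/bin/sh
# Sketch: write an auto-learn snippet to a SpamAssassin .cf file.
# The function name and destination path are assumptions, not SA conventions.
write_autolearn_conf() {
    # $1 = destination .cf file
    cat > "$1" <<'EOF'
# Train Bayes automatically from messages with a clear-cut score.
bayes_auto_learn 1
# Messages scoring below this are learned as ham.
bayes_auto_learn_threshold_nonspam 0.1
# Messages scoring above this are learned as spam.
bayes_auto_learn_threshold_spam 12.0
EOF
}

# Usage:
#   write_autolearn_conf /etc/mail/spamassassin/autolearn.cf && spamassassin --lint
```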


Re: Site-wide bayes and individual bayes

2014-10-12 Thread LuKreme
On 10 Oct 2014, at 06:49 , RW rwmailli...@googlemail.com wrote:
 And, if not, is it generally better to do sitewide?
 
 It's hard to say, there are advantages and disadvantages either way.

OK, so specific example then.

Small server with a few dozen email users spread over several domains. Almost 
none of these users does any spam training at all, the rest just delete 
unwanted messages (not even marking them as junk) or even worse, just ignore 
them. One user is very aggressive in marking Spam and in keeping the Inbox 
clear of all spam.

I am of two minds. First, that everyone else would benefit from this user’s 
actions or, alternatively, that the user’s aggressive tagging will actually 
‘poison’ the bayes db for the other users who maybe do not think that endless 
emails from pinterest or some political candidate are actually spam.

-- 
You see, in this world there's two kinds of people, my friend: Those
with loaded guns and those who dig. You dig.



Re: Site-wide bayes and individual bayes

2014-10-12 Thread Reindl Harald


On 12.10.2014 at 18:59, LuKreme wrote:

On 10 Oct 2014, at 06:49 , RW rwmailli...@googlemail.com wrote:

And, if not, is it generally better to do sitewide?


It's hard to say, there are advantages and disadvantages either way.


OK, so specific example then.

Small server with a few dozen email users spread over several domains. Almost 
none of these users does any spam training at all, the rest just delete 
unwanted messages (not even marking them as junk) or even worse, just ignore 
them. One user is very aggressive in marking Spam and in keeping the Inbox 
clear of all spam.

I am of two minds. First, that everyone else would benefit from this user’s 
actions or, alternatively, that the user’s aggressive tagging will actually 
‘poison’ the bayes db for the other users who maybe do not think that endless 
emails from pinterest or some political candidate are actually spam.


if nobody trains his user-specific bayes (like here), site-wide is the 
way to go, simply because until a user has flagged 200 ham messages his 
bayes won't get used, regardless of the number of messages marked as spam


merging a user's aggressive training into the site-wide db means you need 
to trust that user's actions - meaning he needs to be careful and not just 
flag anything he doesn't want to see as spam


if it is really one or two users like here i would stay with a normal 
site-wide bayes. i implemented that with IMAP shared folders where those 
users see a ham/spam folder to move messages into, and are advised to be 
careful that ham samples do not leak sensitive content


i review that stuff, save the eml messages to the training folders on 
the mailserver and call the sa-learn script; so far a nearly 100% 
result over 8 weeks of production (99% of spam caught, no false positives)






Re: Site-wide bayes and individual bayes

2014-10-12 Thread Ted Mittelstaedt



On 10/12/2014 9:59 AM, LuKreme wrote:

On 10 Oct 2014, at 06:49 , RW rwmailli...@googlemail.com wrote:

And, if not, is it generally better to do sitewide?


It's hard to say, there are advantages and disadvantages either
way.


OK, so specific example then.

Small server with a few dozen email users spread over several
domains. Almost none of these users does any spam training at all,
the rest just delete unwanted messages (not even marking them as
junk) or even worse, just ignore them. One user is very aggressive in
marking Spam and in keeping the Inbox clear of all spam.

I am of two minds. First, that everyone else would benefit from this
user’s actions or, alternatively, that the user’s aggressive tagging
will actually ‘poison’ the bayes db for the other users who maybe do
not think that endless emails from pinterest or some political
candidate are actually spam.



For starters your problem isn't SPAM it's HAM.

You can get all the spam you want.  Just parse the mail log file every
day for a few weeks, looking for delivery attempts to nonexistent 
mailboxes.  When you see repeated delivery attempts to a specific 
mailbox, create an email address for that nonexistent mailbox and 
redirect all the email sent to it into a spam box.


My experience is that once spammers think they have discovered an
email address they will never leave it alone, they will send increasing
amounts of spam to that address.

If you are lucky enough to never have spammers trying to probe your
server, you can create your honeypot email addresses, just make them up,
and then take these email addresses and post them into the Unsubscribe 
links on spam.  That is a good way to contaminate spammers' mailing lists
with honeypot addresses.  A legitimate mail sender will ignore these; a
spammer will happily pull addresses out of unsubscribe replies.

That's your centralized spam source.  Do this for a couple dozen 
nonexistent email addresses on your server domains and you will have
all the input you want for the Bayes learner.

By definition ANY email to a nonexistent address (not an old address
that was closed down years ago) is unsolicited, AKA SPAM.

As for desired political mail, on my servers I classify all of it as
spam, I can think of maybe only 2 users over the last decade who have
complained about not getting it and for those it's easy to do an
all_spam_to to them and then tell them they will have to do their own
spam filtering.

Since overwhelmingly the political email I have seen coming in is the
offensive conservative anti-women, anti-blacks, anti-latinos, beg for
more money email, I have to say that I'm not particularly concerned 
about the wishes of customers who WANT that kind of mail - I'm quite
happy if they go find another provider.

And, naturally, that kind of email is never ever appropriate for a
business and no employee in a business is ever going to dare complain to 
their bosses that they aren't getting it.


If the politicos want to drown people in hate mail, they have paper
mail to do it - might as well make them help reduce my taxes by
subsidizing the US Post Office with their hate mail, that's about the 
only thing that's good about it.


Anyway, as I said HAM is the problem.  If you don't have large 
quantities of ham, Bayes won't work.  Of course, nothing is preventing
you from copying people's folders  (if they are using IMAP) into one
giant mailbox and using that as a HAM source.  You can probably assume
that if a user has gone to the trouble of saving mail to a folder that
it is ham.

Ted
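
The log-parsing step described above can be sketched as follows. This assumes Postfix-style reject lines that contain "User unknown" and a `to=<address>` field; other MTAs log differently, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: rank nonexistent recipients by how often delivery was attempted.
# Assumes Postfix-style log lines with "User unknown" and "to=<addr>".
count_unknown_rcpts() {
    # Reads log lines on stdin, prints "count address", most-probed first.
    grep 'User unknown' \
        | sed -n 's/.* to=<\([^>]*\)>.*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}

# Usage:
#   count_unknown_rcpts < /var/log/mail.log | head -n 25
```

Addresses near the top of the list are the candidates for spamtrap mailboxes, subject to the caveat above about old addresses that once belonged to real users.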


Re: Site-wide bayes and individual bayes

2014-10-10 Thread RW
On Wed, 8 Oct 2014 15:26:25 -0600
LuKreme wrote:

 Is it possible to have a site-wide bayes AND individual bayes for
 some users (or all users)?

Not as things stand. You could use Bayes for one and a separate filter
for the other.

 And, if not, is it generally better to do sitewide?

It's hard to say, there are advantages and disadvantages either way.
 
 And, is it possible to take all the individual bayes and combine them
 into a sitewide db?

It should be fairly straightforward to combine the results from running 
sa-learn --backup on multiple accounts. It's just a matter of
combining the total ham/spam message counts and the counts for each
token.
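
A rough sketch of that merge using awk. It assumes the `sa-learn --backup` output is tab-separated, with `v` lines carrying num_spam and num_nonspam, `t` lines laid out as spam count, ham count, atime, token, and `s` lines recording seen message IDs. Check that against a real backup before relying on it; the function name is illustrative.

```shell
#!/bin/sh
# Sketch: merge several "sa-learn --backup" dumps into one restorable file.
# The column layout described in the comments is an assumption.
merge_bayes_backups() {
    # $@ = backup files; merged result goes to stdout
    awk -F '\t' '
        $1 == "v" && $3 ~ /^db_version/ { if (vline == "") vline = $0; next }
        $1 == "v" && $3 == "num_spam"    { nspam += $2; next }
        $1 == "v" && $3 == "num_nonspam" { nham  += $2; next }
        $1 == "t" {
            # Sum per-token counts, keep the most recent access time.
            spam[$5] += $2; ham[$5] += $3
            if ($4 + 0 > atime[$5] + 0) atime[$5] = $4
            next
        }
        $1 == "s" { seen[$3] = $2 }   # last file wins on conflicting IDs
        END {
            print vline
            printf "v\t%d\tnum_spam\n", nspam
            printf "v\t%d\tnum_nonspam\n", nham
            for (tok in spam)
                printf "t\t%d\t%d\t%d\t%s\n", spam[tok], ham[tok], atime[tok], tok
            for (id in seen)
                printf "s\t%s\t%s\n", seen[id], id
        }
    ' "$@"
}

# Usage:
#   merge_bayes_backups user1.txt user2.txt > merged.txt
#   sa-learn --restore merged.txt
```

Messages that several users trained independently are counted once per user here, so the merged totals are an approximation rather than an exact recount.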


Re: Site-wide bayes and individual bayes

2014-10-10 Thread John Hardin

On Fri, 10 Oct 2014, RW wrote:


On Wed, 8 Oct 2014 15:26:25 -0600
LuKreme wrote:


Is it possible to have a site-wide bayes AND individual bayes for
some users (or all users)?


Not as things stand.


Not as things stand, possibly absent a hack like: any user who wants to 
use the site-wide bayes has symlinks to the shared bayes database files in 
their local dir.


Not sure how well that would work in practice (locking if you autolearn), 
and it would be somewhat tedious to maintain.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Maxim VI: If violence wasn’t your last resort, you failed to resort
  to enough of it.
---
 862 days since the first successful private support mission to ISS (SpaceX)

Site-wide bayes and individual bayes

2014-10-08 Thread LuKreme
Is it possible to have a site-wide bayes AND individual bayes for some users 
(or all users)?

And, if not, is it generally better to do sitewide?

And, is it possible to take all the individual bayes and combine them into a 
sitewide db?

-- 
You've got to dance like nobody's watching. - Kathy Mattea



Re: sa-learn site-wide bayes on Redis

2014-08-21 Thread Marcin Mirosław
On 20.08.2014 at 14:42, Axb wrote:
 On 08/20/2014 02:25 PM, Matteo Dessalvi wrote:
 Hi all.


 I am managing a bunch of Linux MTAs which are placed in
 front of some Exchange servers. In such a configuration
 the Bayes filter is deployed site-wide.

 For a new deployment of these servers I am planning
 to use Redis as a centralized backend (previously
 the bayes db were just files saved on the disk).

 My question is: do I have to use a specific option
 to tell sa-learn that the bayes db is now hosted on
 Redis? Or sa-learn will use the info from the
 bayes_sql_dsn directive in my local.cf?

 Looking into the wiki:
 http://wiki.apache.org/spamassassin/SiteWideBayesSetup

 or into the sa-learn docs:
 http://spamassassin.apache.org/full/3.4.x/doc/sa-learn.html

 did not give me any clues.
 
 see
 
 http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/
 
 
 hope that helps.
 This is not an official doc, so if you see anything that needs to be
 added/changed, pls let me know.

Hi!
I'm reading bayes_redis.cf and I can see:

#NOTE: We're not using authentication assuming the Redis server/port
should not be reachable form the outside
# You can add authentication once you've seen it work.


Does this mean that this example config doesn't include authentication
options, or does it mean that SA doesn't support auth for Redis?

Marcin






Re: sa-learn site-wide bayes on Redis

2014-08-21 Thread Matteo Dessalvi

I am pretty sure SA supports the Redis authentication mechanism.
For my tests I have used the following line:

bayes_sql_dsn  server=127.0.0.1:6379;password=MySecretPWD;database=2

Matteo

On 21.08.2014 12:56, Marcin Mirosław wrote:


Hi!
I'm reading bayes_redis.cf and I can see:

#NOTE: We're not using authentication assuming the Redis server/port
should not be reachable form the outside
# You can add authentication once you've seen it work.


Does this mean that this example config doesn't include authentication
options, or does it mean that SA doesn't support auth for Redis?

Marcin






Re: sa-learn site-wide bayes on Redis

2014-08-21 Thread Marcin Mirosław
On 21.08.2014 at 13:45, Matteo Dessalvi wrote:
 I am pretty sure SA supports the Redis authentication mechanism.
 For my tests I have used the following line:
 
 bayes_sql_dsn  server=127.0.0.1:6379;password=MySecretPWD;database=2

Thanks Matteo,
I should have tried first and then written to the ML :) So now I did my
own check. It looks like SA doesn't authenticate when it connects to
Redis. It didn't work for me with your example, nor when I used
bayes_sql_password   password

When Redis requires a password, SA throws "bayes: Redis failed: Redis
error: ERR operation not permitted"; tcpdump also confirms that SA
doesn't do AUTH.
It's strange because in Redis.pm I can see that authentication is
supported. Now I'm wondering where I could have made a mistake in the
configuration...

Thanks,
Marcin


Re: sa-learn site-wide bayes on Redis

2014-08-21 Thread Matteo Dessalvi

Which version of Redis are you using? I had some
problems with the 2.4 version packaged by Debian, and
I solved a similar problem by using a more recent
version, such as 2.7 or 2.8.

Matteo

On 21.08.2014 14:45, Marcin Mirosław wrote:

On 21.08.2014 at 13:45, Matteo Dessalvi wrote:

I am pretty sure SA supports the Redis authentication mechanism.
For my tests I have used the following line:

bayes_sql_dsn  server=127.0.0.1:6379;password=MySecretPWD;database=2


Thanks Matteo,
I should have tried first and then written to the ML :) So now I did my
own check. It looks like SA doesn't authenticate when it connects to
Redis. It didn't work for me with your example, nor when I used
bayes_sql_password   password

When Redis requires a password, SA throws "bayes: Redis failed: Redis
error: ERR operation not permitted"; tcpdump also confirms that SA
doesn't do AUTH.
It's strange because in Redis.pm I can see that authentication is
supported. Now I'm wondering where I could have made a mistake in the
configuration...

Thanks,
Marcin



Re: BayesStore::Redis can't do AUTH when Redis is <=2.6 (was: sa-learn site-wide bayes on Redis)

2014-08-21 Thread Marcin Mirosław
On 21.08.2014 at 15:20, Matteo Dessalvi wrote:
 Which version of Redis are you using? I did have some
 problems with the 2.4 version packaged by Debian and
 I did solve a similar problem using a more recent
 version, like the 2.7 or 2.8.

And you fixed my problem! Indeed, upgrading from redis-2.6.15 to 2.8.13
fixed the problem with AUTH not working.
Thanks Matteo!



sa-learn site-wide bayes on Redis

2014-08-20 Thread Matteo Dessalvi

Hi all.


I am managing a bunch of Linux MTAs which are placed in
front of some Exchange servers. In such a configuration
the Bayes filter is deployed site-wide.

For a new deployment of these servers I am planning
to use Redis as a centralized backend (previously
the bayes db were just files saved on the disk).

My question is: do I have to use a specific option
to tell sa-learn that the bayes db is now hosted on
Redis? Or sa-learn will use the info from the
bayes_sql_dsn directive in my local.cf?

Looking into the wiki:
http://wiki.apache.org/spamassassin/SiteWideBayesSetup

or into the sa-learn docs:
http://spamassassin.apache.org/full/3.4.x/doc/sa-learn.html

did not give me any clues.


Thanks in advance!


Best regards,
  Matteo


Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Axb

On 08/20/2014 02:25 PM, Matteo Dessalvi wrote:

Hi all.


I am managing a bunch of Linux MTAs which are placed in
front of some Exchange servers. In such a configuration
the Bayes filter is deployed site-wide.

For a new deployment of these servers I am planning
to use Redis as a centralized backend (previously
the bayes db were just files saved on the disk).

My question is: do I have to use a specific option
to tell sa-learn that the bayes db is now hosted on
Redis? Or sa-learn will use the info from the
bayes_sql_dsn directive in my local.cf?

Looking into the wiki:
http://wiki.apache.org/spamassassin/SiteWideBayesSetup

or into the sa-learn docs:
http://spamassassin.apache.org/full/3.4.x/doc/sa-learn.html

did not give me any clues.


see

http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/

hope that helps.
This is not an official doc, so if you see anything that needs to be 
added/changed, pls let me know.




Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Matteo Dessalvi

No, unfortunately it does not help me.
I already have a proper config file for SA
to access Redis as backend and most of
the configurations are done automatically
through a Chef cookbook (Redis included).

In the docs you pointed me there's nothing
about the interaction between sa-learn and
Redis.

Best regards,
   Matteo

On 20.08.2014 14:42, Axb wrote:


see

http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/


hope that helps.
This is not an official doc, so if you see anything that needs to be
added/changed, pls let me know.



Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Axb

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis

tells SA to use the Redis backend. To sa-learn this becomes transparent, 
as with any other backend (DBD,SDBM,SQL)


bayes_redis.cf shows what parameters are mandatory/optional
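
As a sketch, the two directives mentioned in this thread can be written to a config file with a small helper; sa-learn then needs no extra options. The helper name, file path and argument order are illustrative, and only `bayes_store_module` and `bayes_sql_dsn` are taken from the messages here.

```shell
#!/bin/sh
# Sketch: point SpamAssassin (and therefore sa-learn) at a Redis Bayes store.
# Function name and destination path are assumptions.
write_redis_bayes_conf() {
    # $1 = destination .cf file, $2 = host:port, $3 = Redis database number
    cat > "$1" <<EOF
bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn       server=$2;database=$3
EOF
}

# Usage:
#   write_redis_bayes_conf /etc/mail/spamassassin/bayes_redis.cf 127.0.0.1:6379 2
#   sa-learn --dump magic    # should now report counts from the Redis store
```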

On 08/20/2014 03:02 PM, Matteo Dessalvi wrote:

No, unfortunately it does not help me.
I already have a proper config file for SA
to access Redis as backend and most of
the configurations are done automatically
through a Chef cookbook (Redis included).

In the docs you pointed me there's nothing
about the interaction between sa-learn and
Redis.

Best regards,
Matteo

On 20.08.2014 14:42, Axb wrote:


see

http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/



hope that helps.
This is not an official doc, so if you see anything that needs to be
added/changed, pls let me know.





Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Matteo Dessalvi

Ok, perfect! Thanks a lot! This is what I wanted to know
and was not so sure about.

I may be wrong but it looks to me the fact that
tools like sa-learn can access transparently the
backends configured for SA is not exactly clear
from the docs.

It would be great if the wiki maintainers could add
a short note somewhere in the pages regarding the
SiteWide deployment or related topics.

Best regards,
 Matteo

On 20.08.2014 15:08, Axb wrote:

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis

tells SA to use the Redis backend. To sa-learn this becomes transparent,
as with any other backend (DBD,SDBM,SQL)

bayes_redis.cf shows what parameters are mandatory/optional




Re: sa-learn site-wide bayes on Redis

2014-08-20 Thread Axb

I so love top posters.

On 08/20/2014 03:33 PM, Matteo Dessalvi wrote:

Ok, perfect! Thanks a lot! This is what I want to know
and I was not so sure about.

I may be wrong but it looks to me the fact that
tools like sa-learn can access transparently the
backends configured for SA is not exactly clear
from the docs.

It would be great if the wiki maintainers could add
a short note somewhere in the pages regarding the
SiteWide deployment or related topics.

Best regards,
  Matteo

On 20.08.2014 15:08, Axb wrote:

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis

tells SA to use the Redis backend. To sa-learn this becomes transparent,
as with any other backend (DBD,SDBM,SQL)

bayes_redis.cf shows what parameters are mandatory/optional


Watch your memory usage:

If you configure Redis to dump data from memory to file, it's safe to 
*double* the amount of memory you planned for Redis usage



as in my case:

sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0   25218483  0  non-token data: nspam
0.000  0   11919587  0  non-token data: nham

# Memory
used_memory:3637407032
used_memory_human:3.39G
used_memory_rss:4068585472
used_memory_peak:3702485960
used_memory_peak_human:3.45G
used_memory_lua:205824
mem_fragmentation_ratio:1.12
mem_allocator:jemalloc-3.2.0


I keep at least 5 GB of free memory for the dump to file to avoid ugly 
swaps or crashes.


free
             total       used       free     shared    buffers     cached
Mem:      14262648    5786664    8475984          0     162744    1343408
-/+ buffers/cache:    4280512    9982136
Swap:      2046968          0    2046968
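
The doubling rule of thumb above can be turned into a quick check. This sketch reads the output of `redis-cli info memory` and prints twice `used_memory` in bytes as the amount to budget; the function name is illustrative and the factor of two is simply the rule quoted above, not a Redis guarantee.

```shell
#!/bin/sh
# Sketch: estimate the RAM to budget for Redis when dumps to disk are enabled.
redis_dump_headroom() {
    # Reads "redis-cli info memory" output on stdin; prints 2 x used_memory.
    # INFO output uses CRLF line endings, so strip the carriage returns first.
    tr -d '\r' | awk -F: '$1 == "used_memory" { printf "%.0f\n", $2 * 2 }'
}

# Usage:
#   redis-cli info memory | redis_dump_headroom
```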





Re: Site-wide Bayes

2009-12-17 Thread RW
On Wed, 16 Dec 2009 09:36:12 -0500
Michael Scheidell scheid...@secnap.net wrote:

 On 12/16/09 9:27 AM, Thomas Harold wrote:
  I'm guessing that you'd also want to change the autolearn
  thresholds to be stricter?  Like only auto-learning if it scores
  below -2 or above +10?
 
  (That might be an amavisd-new feature.)
 I still use 0, but have the high score at +15.

The default is 0.1 IIRC, and I wouldn't recommend setting it lower
without negative-scoring custom rules - it's set positive for good
reasons. 

BAYES and userconf whitelisting rules don't count for autolearning, so
if you set a negative threshold with the default rules, you rely on
DNS whitelisting to define ham - the likes of HABEAS.

Setting it at exactly 0.0 is also problematical since the decision to
learn is commonly going to be determined by nominally scored rules that
score 0.001 and -0.001.


Re: Site-wide Bayes

2009-12-17 Thread Thomas Harold

On 12/17/2009 10:30 AM, RW wrote:

On Wed, 16 Dec 2009 09:36:12 -0500
Michael Scheidell scheid...@secnap.net wrote:


On 12/16/09 9:27 AM, Thomas Harold wrote:

I'm guessing that you'd also want to change the autolearn
thresholds to be stricter?  Like only auto-learning if it scores
below -2 or above +10?

(That might be an amavisd-new feature.)

I still use 0, but have the high score at +15.


The default is 0.1 IIRC, and I wouldn't recommend setting it lower
without negative-scoring custom rules - it's set positive for good
reasons.

BAYES and userconf whitelisting rules don't count for autolearning, so
if you set a negative threshold with the default rules, you rely on
DNS whitelisting to define ham - the likes of HABEAS.

Setting it at exactly 0.0 is also problematical since the decision to
learn is commonly going to be determined by nominally scored rules that
score 0.001 and -0.001.


Looking at the wiki...

http://wiki.apache.org/spamassassin/BasicConfiguration

We're not using userconf whitelisting, our whitelisting is done by 
amavisd-new mappings (where we score specific domains/addresses with a 
small -2 to -5 score).


The wiki, as it is currently, makes it sound like the +0.1 default for 
ham auto-learn is not conservative enough.  And that the +6.0 default 
for auto-learning spam is too risky.


(We run with -0.5 and +9.5 as our boundaries for auto-learning.)


Re: Site-wide Bayes

2009-12-16 Thread Thomas Harold

On 12/15/2009 11:55 AM, Michael Scheidell wrote:

On 12/15/09 11:49 AM, Charles Gregory wrote:

On Tue, 15 Dec 2009, Matt Garretson wrote:

Heartily agreed. Site-wide bayes here (single database for 2000+
users) catches 40% of the spam here.


But what is the FP rate? Is it safe for an ISP with a widely varied
user base to use site-wide Bayes?


I find that you should reduce scores on the high and low end (bayes_00
and bayes_95) and the 'meta rules' that might combine them also.

(so, yes, an ISP, or for our hosted clients, we have modified the bayes
scores. . if one client is a plastic surgeon, one is a stock broker, and
one is a mortgage broker, each will be getting wildly different ham)

setting up a 'per domain' bayes might work, might be tricky, especially
if an inbound email is going to several domains, and only if you are
doing B2B (commercial clients)



I'm guessing that you'd also want to change the autolearn thresholds to 
be stricter?  Like only auto-learning if it scores below -2 or above +10?


(That might be an amavisd-new feature.)


Re: Site-wide Bayes

2009-12-16 Thread Michael Scheidell

On 12/16/09 9:27 AM, Thomas Harold wrote:
I'm guessing that you'd also want to change the autolearn thresholds 
to be stricter?  Like only auto-learning if it scores below -2 or 
above +10?


(That might be an amavisd-new feature.)

I still use 0, but have the high score at +15.

watch the output of 'sa-learn --dump magic'

if you can keep the 'spam/ham' ratio close to your site's 'spam vs ham' 
ratio, you should be ok.
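
A small sketch for watching that ratio: it reads `sa-learn --dump magic` output and prints the learned spam count divided by the learned ham count. The function name is illustrative; the `nspam` and `nham` labels are the ones that appear in the dump output.

```shell
#!/bin/sh
# Sketch: report the learned spam/ham ratio from "sa-learn --dump magic".
bayes_spam_ham_ratio() {
    # Reads the dump on stdin; the message count is the third column.
    awk '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham  = $3 }
        END {
            if (nham > 0) printf "%.2f\n", nspam / nham
            else print "no ham learned yet"
        }
    '
}

# Usage:
#   sa-learn --dump magic | bayes_spam_ham_ratio
```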




--
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
 *| *SECNAP Network Security Corporation

   * Certified SNORT Integrator
   * 2008-9 Hot Company Award Winner, World Executive Alliance
   * Five-Star Partner Program 2009, VARBusiness
   * Best Anti-Spam Product 2008, Network Products Guide
   * King of Spam Filters, SC Magazine 2008

_
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/

_
  


Re: Site-wide Bayes (was: Spam from compromised web mails)

2009-12-15 Thread Charles Gregory

On Tue, 15 Dec 2009, Matt Garretson wrote:
Heartily agreed. Site-wide bayes here (single database for 2000+ users) 
catches 40% of the spam here.


But what is the FP rate? Is it safe for an ISP with a widely varied user 
base to use site-wide Bayes?


- Charles


Re: Site-wide Bayes

2009-12-15 Thread Michael Scheidell

On 12/15/09 11:49 AM, Charles Gregory wrote:

On Tue, 15 Dec 2009, Matt Garretson wrote:
Heartily agreed. Site-wide bayes here (single database for 2000+ 
users) catches 40% of the spam here.


But what is the FP rate? Is it safe for an ISP with a widely varied 
user base to use site-wide Bayes?


I find that you should reduce scores on the high and low end (bayes_00 
and bayes_95) and the 'meta rules' that might combine them also.


(so, yes, an ISP, or for our hosted clients, we have modified the bayes 
scores. .  if one client is a plastic surgeon, one is a stock broker, 
and one is a mortgage broker, each will be getting wildly different ham)


setting up a 'per domain' bayes might work, might be tricky, especially 
if an inbound email is going to several domains, and only if you are 
doing B2B (commercial clients)







Re: Site-wide Bayes

2009-12-15 Thread Yet Another Ninja

On 12/15/2009 5:49 PM, Charles Gregory wrote:

On Tue, 15 Dec 2009, Matt Garretson wrote:
Heartily agreed. Site-wide bayes here (single database for 2000+ 
users) catches 40% of the spam here.


But what is the FP rate? Is it safe for an ISP with a widely varied user 
base to use site-wide Bayes?


from my experience, yes.

the auto-fodder is just as diverse making Bayes very rugged and 
effective. You just need a good amount of ham traffic...




per-user and site-wide bayes databases together

2007-01-26 Thread Raul Dias
Hi,

I would like to have side by side a per-user and a site-wide database.

AFAIK, right now I can have either one or the other.

IMHE, I think that the per-user database is more effective, especially
for HAM, but a site-wide one will help improve SPAM detection (fewer
false negatives) and help users with low mail counts.

So, is this possible right now? 
(I don't think so, but had to ask.)

I have no problem writing Perl code.  If I have to implement/hack
this, any tips on where to start or how to implement it are very welcome.

Any opinions on why not to do this (or why to do it) are also welcome.


Raul Dias



RE: per-user and site-wide bayes databases together

2007-01-26 Thread Dan Barker
If they say you can't, then this is how you'd do it (grin). (Training would
need to be via scripts, not Autolearn, I imagine)

SpamAssassin uses Bayes via database queries. So, you rename the tables to
something different, and define a view of the same name as the table had
been. It will be called by SA, but will return whatever you want the view to
return. In this case, I'd guess it would be the union of the personal bayes
and the site-wide bayes. You'd need to look into the actual columns to see
if you must sum them for dups, but I imagine that would be pretty trivial
logic.

The only hack I see is to update the sa-learn process to use the correct
(renamed) table names. Views are your friend!

Dan

ps: they are the folks who know SpamAssassin. I know squirrel (er, ah, Ess
Que El).

-Original Message-
From: Raul Dias [mailto:[EMAIL PROTECTED]
Sent: Friday, January 26, 2007 1:13 PM
To: users@spamassassin.apache.org
Subject: per-user and site-wide bayes databases together


Hi,

I would like to have side by side a per-user and a site-wide database.

AFAIK, right now I can have either one or the other.

IMHE, I think that the per-user database is more effective, especially
for HAM, but a site-wide one will help improve SPAM detection (fewer
false negatives) and help users with low mail counts.

So, is this possible right now?
(I don't think so, but had to ask.)

I have no problem writing Perl code.  If I have to implement/hack
this, any tips on where to start or how to implement it are very welcome.

Any opinions on why not to do this (or why to do it) are also welcome.


Raul Dias




Site-Wide Bayes Question

2004-11-09 Thread Jeff Grossman
I have just set up a Sendmail server with MIMEDefang and SpamAssassin 
3.0.1.  This machine is a front-end box to my IMAP server.  I am using a 
site wide bayes database.  I am curious how other people are handling 
spam and ham with the bayes database.  I have set up two accounts on the 
front-end server for a spam mailbox and a ham mailbox for sa-learn.  If 
my users just forward the message to either one of those mailboxes, will 
sa-learn be able to properly register that e-mail?  Or should the user 
be using redirect?  Or since it has already been sent on to another mail 
server, is it worthless without the raw message?

Thanks for any help you can offer me.

Jeff



RE: Site-Wide Bayes Question

2004-11-09 Thread Matthew.van.Eerde
Jeff Grossman wrote:
> I have just set up a Sendmail server with MIMEDefang and SpamAssassin
> 3.0.1.  This machine is a front-end box to my IMAP server.

I have a similar setup but with Exchange 2000 as the IMAP server.
I've created two public folders:
FN: spam but not tagged
FP: tagged but not spam

Users drag and drop misclassified messages into the appropriate folder, which
preserves the headers.

If your IMAP server supports public folders, this may be the best way to go.

Otherwise you might consider having a pair of error folders inside each
mailbox, then have a script with universal access to all mailboxes walk
through each mailbox, pulling from the error folders only.
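
A rough sketch of such a walker, assuming Maildir-style storage on disk. The
directory layout and the folder names .FN and .FP are hypothetical; adjust
them to however the mailboxes are actually stored.

```python
from pathlib import Path

# Hypothetical folder names, mirroring the FN/FP public folders above.
ERROR_FOLDERS = {".FN": "spam", ".FP": "ham"}

def collect_error_messages(mail_root):
    """Yield (label, path) for every message in each user's error folders.

    Assumes a layout of <mail_root>/<user>/Maildir/<folder>/{cur,new}/<file>.
    """
    for user_dir in sorted(Path(mail_root).iterdir()):
        for folder, label in ERROR_FOLDERS.items():
            for sub in ("cur", "new"):
                msg_dir = user_dir / "Maildir" / folder / sub
                if not msg_dir.is_dir():
                    continue  # this user has no such error folder
                for msg in sorted(msg_dir.iterdir()):
                    if msg.is_file():
                        yield label, msg
```

Each (label, path) pair could then be passed to sa-learn with --spam or --ham
as appropriate. Only the error folders are read; the ordinary inboxes are
never touched.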

Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com Software Engineer
perl -emap{y/a-z/l-za-k/;print}shift Jjhi pcdiwtg Ptga wprztg,


Re: Site-Wide Bayes Question

2004-11-09 Thread Jeff Grossman
[EMAIL PROTECTED] wrote:
> Jeff Grossman wrote:
>> I have just set up a Sendmail server with MIMEDefang and SpamAssassin
>> 3.0.1.  This machine is a front-end box to my IMAP server.
>
> I have a similar setup but with Exchange 2000 as the IMAP server.
> I've created two public folders:
> FN: spam but not tagged
> FP: tagged but not spam
>
> Users drag and drop errors to the appropriate folder, preserving the headers
>
> If your IMAP server supports public folders, this may be the best way to go
>
> Otherwise you might consider having a pair of error folders inside each
> mailbox - then have a script with universal access to all mailboxes walk
> through each mailbox, pulling from the error folders only

Thank you for the suggestions.

Jeff



Site-wide bayes database, autolearn address

2004-11-02 Thread Gaby Vanhegan
Hi,
Just upgraded to 3.0.1 running under qmail on OpenBSD and am happy to 
report no problems.  However, whilst I was doing this, I had a few 
ideas.  I've had a shufty through the archives for these but I didn't 
find an appropriate answer.  I have 3 questions:

1. I would like to set up a sitewide bayes database that all mailboxes
will use.  This saves having to make every user learn their own spam and
should improve the overall accuracy of the system.  Is this particularly
difficult to set up with an SQL backend?  What happens if the database is
unavailable?  What is the performance hit on the database in these
situations?  We see around 2 messages a day on the server.

2. I would like to set up an automatic email address that people can send
uncaught spam to, which will then be learnt as spam and put into the
bayes database.  Has anyone managed to do this?  The problem I foresee is
handling the forward as attachment or forward inline that different mail
clients use.  Presumably we would need to make people forward them as
attachments, then have a procmail script that handles all mail accordingly.

3. I see entries such as:
autolearn=ham
autolearn=spam
autolearn=unavailable
autolearn=none
In the mail logs.  Is there a spam score threshold that triggers the 
autolearning behaviour?  Is the default sensible?  Should it be a little 
lower?  I see high-scored spam not being learned as such and wonder if 
this ought to be tweaked a little.
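
For question 3, a simplified sketch of the threshold logic as I understand it
from the SpamAssassin documentation. The settings are
bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam, with
stock defaults of 0.1 and 12.0 as far as I know; verify them for your
version. The function name, its return strings and the exact comparison at
the boundaries are assumptions for illustration, not SpamAssassin's own code.

```python
def autolearn_decision(learn_score, header_points, body_points,
                       nonspam_threshold=0.1, spam_threshold=12.0):
    """Return 'ham', 'spam' or 'not learned' for one message (simplified).

    learn_score is the score used for the learning decision, which is
    computed separately from the final score shown in the headers.
    """
    if learn_score < nonspam_threshold:
        return "ham"
    # Learning as spam also needs some evidence from both header and body.
    if (learn_score >= spam_threshold
            and header_points >= 3.0 and body_points >= 3.0):
        return "spam"
    return "not learned"
```

The header and body requirement may explain high-scored spam that is not
learned: a message can pass the spam threshold overall while getting nearly
all of its points from only one of the two.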

Gaby
--
Ha! Ha! Ha!  Dislocation...
- Phil Ken Sebben
[EMAIL PROTECTED]
http://vanhegan.net


Re: Site-wide bayes database, autolearn address

2004-11-02 Thread Keith Hackworth
> Hi,
>
> Just upgraded to 3.0.1 running under qmail on OpenBSD and am happy to
> report no problems.  However, whilst I was doing this, I had a few
> ideas.  I've had a shufty through the archives for these but I didn't
> find an appropriate answer.  I have 3 questions:
>
> 1. I would like to set up a sitewide bayes database that all mailboxes
> will use.  This saves having to make every user learn their own spam and
> should improve the overall accuracy of the system.  Is this particularly
> difficult to set up with an SQL backend?  What happens if the database is
> unavailable?  What is the performance hit on the database in these
> situations?  We see around 2 messages a day on the server.
>
> 2. I would like to set up an automatic email address that people can send
> uncaught spam to, which will then be learnt as spam and put into the
> bayes database.  Has anyone managed to do this?  The problem I foresee is
> handling the forward as attachment or forward inline that different mail
> clients use.  Presumably we would need to make people forward them as
> attachments, then have a procmail script that handles all mail
> accordingly.
>
> 3. I see entries such as:
>
> autolearn=ham
> autolearn=spam
> autolearn=unavailable
> autolearn=none
>
> In the mail logs.  Is there a spam score threshold that triggers the
> autolearning behaviour?  Is the default sensible?  Should it be a little
> lower?  I see high-scored spam not being learned as such and wonder if
> this ought to be tweaked a little.
>
> Gaby
>
> --
> Ha! Ha! Ha!  Dislocation...
> - Phil Ken Sebben
>
> [EMAIL PROTECTED]
> http://vanhegan.net


As for 1 and 3, I don't know, but 2, I did myself.
Actually, the biggest problem you'll run into is that when you forward the
message, it tinkers with the headers of the message.   I found a solution
to this that doesn't require special scripts to strip the 'false' headers.

We run SquirrelMail as a webmail front-end to courier-imap.  I created a
couple of buttons as an extension to the amavis-sa plugins in SquirrelMail.
The buttons are "this is spam" and "this isn't spam".  When a user clicks
one of these, it actually moves the message (yes, at the OS level) from
the mbox of the user who is viewing their email to my spam-only mailbox.
Fortunately, courier is pretty tolerant of this type of abuse.

Keith



Re: Site-wide bayes database, autolearn address

2004-11-02 Thread Gaby Vanhegan
Keith Hackworth wrote:
> As for 1 and 3, I don't know, but 2, I did myself.
> Actually, the biggest problem you'll run into is that when you forward the
> message, it tinkers with the headers of the message.   I found a solution
> to this that doesn't require special scripts to strip the 'false' headers.

Forwarding the email as an attachment may help, but as you say, it will
rip out most of the headers.  We do have SquirrelMail installed on our
server though, but not many of our users use that, preferring to use POP
from home.

I suppose we could put some instructions up where the user would view
the message source, paste that into a web form and that would get piped
directly into sa-learn and then into the SQL bayes database.  It's
pernickety but it would work, and relies on the sitewide SQL database
working.
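
For the forward-as-attachment route, a minimal sketch using Python's standard
email package to pull the original message back out of the forwarding
wrapper. It assumes the mail client attaches the original as a message/rfc822
part; the function name extract_forwarded is made up for this example.

```python
import email
from email import policy

def extract_forwarded(raw_bytes):
    """Return the original messages attached to a forwarded mail, as bytes."""
    wrapper = email.message_from_bytes(raw_bytes, policy=policy.default)
    originals = []
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding one message.
            inner = part.get_payload(0)
            originals.append(inner.as_bytes())
    return originals
```

Each item returned carries the headers of the attached message rather than
those of the forwarding mail, so it is a better candidate for handing to
sa-learn than the wrapper. Messages forwarded inline have no such part and
would come back as an empty list.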

Gaby
--
Ha! Ha! Ha!  Dislocation...
- Phil Ken Sebben
[EMAIL PROTECTED]
http://vanhegan.net