Re: Bayes DB on single-node MySQL cluster

2010-07-26 Thread Michael Scheidell


On 7/26/10 5:40 PM, Paul Hirose wrote:

> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5998

I just added a comment.

> Hmm, I'm still on 5.0.77 (the basic RHEL5 repository version.)  Do you know
> which plugin?

The InnoDB plugin for MySQL.  It supports compression on InnoDB tables, etc.
I am ASSUMING that most of your performance issues are reads/writes to
disk, not CPU?  If so, compression will help.

What kind of volume are you seeing before the drop-off?

> There's a tangential bug #4508 which sends writes to one host and reads to
> another (presumably for master/slave setups) which I look forward to in a
> future version of SA as well.

I'd like to see it be resilient: allow us to put in more than one hostname.

--
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
SECNAP Network Security Corporation

   * Certified SNORT Integrator
   * 2008-9 Hot Company Award Winner, World Executive Alliance
   * Five-Star Partner Program 2009, VARBusiness
   * Best in Email Security,2010: Network Products Guide
   * King of Spam Filters, SC Magazine 2008


__
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/

__  

Re: Bayes DB on single-node MySQL cluster

2010-07-26 Thread Paul Hirose
>> RHEL5.5, MySQL GA 5.0.77, MySQL Cluster 7.1.4b, 64bit, SpamAssassin 3.2.5 
>> (but hoping to go to 3.3.1 soon.)
>> In short, I stumbled across: 
>> http://www.clusterdb.com/mysql-cluster/how-can-a-database-be-in-memory-and-durable-at-the-same-time/
>>  which
>> essentially shows how to create a MySQL Cluster, but of only one node.  This 
>> gets me an all-in-memory database *and* row-level locking.  Sorta
>> the best of both worlds, compared to using Heap/Memory vs InnoDB engine.  
>> Has anyone tried this, and  did it work for you?
>   
> and if you have a 3.5GB bayes database, don't you need 3.5GB ram?

Yep, and we're running on 8GB systems, and have innodb_buffer_pool_size set 
upwards of 4GB (or max_heap_table_size the same, if we're trying this in 
Memory/Heap engine instead.)  

> where is that bugzilla report?  I might have a solution for it.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5998

> > Given this, I know there are folks using m/m-replication, and have seen 
> > reference to various threads.  So far, I haven't seen anyone post a glaring 
> > example about how it failed or anything, but I'm still a touch shy about 
> > going against the devs :)  
>   
> biggest issues seem to be, you need a 5.1.47 or newer mysql, and I think you 
> want to use the plugin (i think).
> still get deadlocks while multi threads are trying to update the bayes DB.  
> but if you 'swatch' it, maybe you just retry?
> or, heck, its just bayes, who care? the spammers will hit you again (and if 
> you got the deadlock, they did)

Hmm, I'm still on 5.0.77 (the basic RHEL5 repository version.)  Do you know 
which plugin?  I'm just using BayesStore::MySQL if that's what you mean.  If 
there's something else, I'd appreciate any tips.   All in all, the anecdotal 
gist of random searches seems to be that m/m-replication basically works, and 
if it really does blow up, emptying and starting it clean is perfectly fine.

There's a tangential bug #4508 which sends writes to one host and reads to 
another (presumably for master/slave setups) which I look forward to in a 
future version of SA as well.

For us, this really all started because of some performance drop-off at 
crossing a certain load threshold.  Don't know why yet :( and we're looking 
into that too.  This alternative just kinda crossed our radar during our 
investigations into our InnoDB/Memory set up failing on us.

PH

==
Paul Hirose
pthir...@ucdavis.edu

Re: Bayes DB on single-node MySQL cluster

2010-07-26 Thread Michael Scheidell


On 7/26/10 5:02 PM, Paul Hirose wrote:

> RHEL5.5, MySQL GA 5.0.77, MySQL Cluster 7.1.4b, 64bit, SpamAssassin 3.2.5 (but
> hoping to go to 3.3.1 soon.)
>
> In short, I stumbled across:
> http://www.clusterdb.com/mysql-cluster/how-can-a-database-be-in-memory-and-durable-at-the-same-time/
> which essentially shows how to create a MySQL Cluster, but of only one node.
> This gets me an all-in-memory database *and* row-level locking.  Sorta the best
> of both worlds, compared to using Heap/Memory vs InnoDB engine.  Has anyone
> tried this, and did it work for you?

And if you have a 3.5GB bayes database, don't you need 3.5GB of RAM?

Where is that bugzilla report?  I might have a solution for it.

> There've been threads against using master/master replication or cluster, and
> a couple bugzilla entries specifically state cluster/replication is "unsafe".
> I think the main reason behind this is simply the duplication of data, and a
> clear example was given in one bugzilla report.  But if I do a single-node
> cluster (only one data/MySQL node), then there are no copies of data.  Thus,
> it can't get out of sync, because there's nothing else to get out of sync
> with.  Would this then be "safe"?  Or is there something inherent in the
> clustering/replication that just doesn't work?
>
> Given this, I know there are folks using m/m-replication, and have seen
> reference to various threads.  So far, I haven't seen anyone post a glaring
> example about how it failed or anything, but I'm still a touch shy about going
> against the devs :)

The biggest issues seem to be that you need MySQL 5.1.47 or newer, and I
think you want to use the plugin.
You still get deadlocks while multiple threads are trying to update the
bayes DB, but if you 'swatch' it, maybe you just retry?
Or, heck, it's just bayes, who cares?  The spammers will hit you again (and
if you got the deadlock, they did).
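
The "just retry" idea can be sketched as a small wrapper. This is a minimal Python illustration, not SpamAssassin code: `DeadlockError`, `retry_on_deadlock`, and the backoff values are hypothetical names and numbers chosen for the example. A real MySQL driver raises its own exception type (MySQL reports a deadlock as error 1213).

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver's deadlock exception (MySQL error 1213)."""


def retry_on_deadlock(operation, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call operation(); on DeadlockError wait briefly and try again.

    Re-raises the last DeadlockError once `attempts` tries are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == attempts:
                raise
            # Exponential backoff with a little jitter so that competing
            # writers do not all retry at the same instant.
            sleep(base_delay * (2 ** (attempt - 1)) * (1 + random.random()))
```

Whether giving up after a few tries is acceptable depends on the site; as noted above, a lost Bayes update is usually not critical.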


--
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259

Re: Bayes db and token expiry questions

2010-03-29 Thread Alex
Hi,

>> Well, what's the missing 120 MB? The journal? Do a complete sync and
>> then delete it.
>
> Probably the signatures in bayes_seen - there's no mechanism for ageing
> them out.

And I assume that isn't a problem then?

>> "too big" is not an absolute figure. If you store 1-occurence tokens
>> you will obviously have more tokens than without them.
>
> There's not really a choice since all tokens start that way.

Maybe a better estimate would be in terms of time. For how long should
the unseen tokens (only occurred once, I guess) remain in the
database? Perhaps that's a good metric. For me it's about a week now.

>> You should use autolearn if you don't do yet.
>
> Autolearning can make things worse by dropping the retention period.

Yes, I'm using autolearn, but how does that affect the retention
period? What do the two have to do with each other? Do you mean
auto-expire, not auto-learn?

My database seems to have improved slightly over the past few days
after increasing the max db size to 1.6M. I guess there is also a lot
of expiry pending, because the database is currently much larger
than that:

0.000          0    2050481          0  non-token data: ntokens

Looks like about 345k to be purged, if I understand correctly?

Thanks,
Alex







Re: Bayes db and token expiry questions

2010-03-29 Thread RW
On Mon, 29 Mar 2010 13:03:59 +0200
Kai Schaetzl wrote:

> Alex wrote on Sun, 28 Mar 2010 13:38:25 -0400:
> 
> > I have a bayes db that's about 160MB with a 40MB token db on a
> > system with about 100k messages per day.
> 
> Well, what's the missing 120 MB? The journal? Do a complete sync and
> then delete it.
 
Probably the signatures in bayes_seen - there's no mechanism for ageing
them out.

> You should be
> aware that the expiry kicks in at 75%, not at 100% of max_db_size.

And it may reduce the tokens to 37.5% of nominal
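
To put numbers on the two percentages quoted in this exchange, here is a rough Python sketch. It takes the thread's figures at face value (expiry starting at 75% of max_db_size, and a worst case of 37.5% of it remaining). The function name is made up for illustration, the 1,100,000 value is the max_db_size Alex mentioned, and the real expiry logic in SpamAssassin is more involved than this.

```python
def expiry_bounds(max_db_size, trigger_fraction=0.75, floor_fraction=0.375):
    """Return (trigger, floor) token counts for a given bayes_expiry_max_db_size.

    Both fractions are the ones quoted in this thread, not values read
    from the SpamAssassin source.
    """
    trigger = int(max_db_size * trigger_fraction)
    floor = int(max_db_size * floor_fraction)
    return trigger, floor


trigger, floor = expiry_bounds(1_100_000)
print(f"expiry starts around {trigger:,} tokens")
print(f"worst case leaves about {floor:,} tokens")
```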

> I suggest you change to SQL. This eliminates the journal.

Isn't that slower than a journalled DB?


> > database was too big, so I lowered it back down, but I think that
> > was a mistake.
> 
> "too big" is not an absolute figure. If you store 1-occurence tokens
> you will obviously have more tokens than without them.

There's not really a choice since all tokens start that way.

> You should use autolearn if you don't do yet. 

Autolearning can make things worse by dropping the retention period.



Re: Bayes db and token expiry questions

2010-03-29 Thread Kai Schaetzl
Alex wrote on Sun, 28 Mar 2010 13:38:25 -0400:

> I have a bayes db that's about 160MB with a 40MB token db on a system
> with about 100k messages per day.

Well, what's the missing 120 MB? The journal? Do a complete sync and then 
delete it.

> I've just raised the max_db_size set
> to 1.1M tokens (there are currently 1.06M tokens in there).

That's not much for a system with 100,000 messages a day. I don't mean 
it's not sufficient, it is just not "too much". You should be aware that 
the expiry kicks in at 75%, not at 100% of max_db_size.

> I've also
> changed bayes to write to the journal instead of directly to the
> database and just checking it periodically to see if the journal needs
> to be synced.

I suggest you change to SQL. This eliminates the journal.

> 
> Can someone explain to me the relationship between the frequency of
> "1-occurrence tokens" and the size of the database? Here is the output
> from a recent manual sync:
> 
> token frequency: 1-occurrence tokens: 72.60%
> token frequency: less than 8 occurrences: 18.11%
> 
> I was thinking that the because the tokens are seen only once,

It probably means you get a lot of fresh tokens in. Do you autolearn?

> the
> database was too big, so I lowered it back down, but I think that was
> a mistake.

"too big" is not an absolute figure. If you store 1-occurence tokens you 
will obviously have more tokens than without them. If you slash the db 
(which cuts from all tokens, not just the 1-occurrence ones) and the 
performance goes down afterwards, that was obviously a wrong decision ;-) I 
don't know if and how this is reflected in the size of the database itself. 
This is a DBM database, which will have certain sizes by design no matter 
how many tokens are in it. If the token database is only 40 MB, that is not 
overly large, it's normal.

> Now some of the same emails are continually hitting only
> BAYES_50 while others seemingly the same hit BAYES_99. I've now raised
> the number of tokens available and continue to manually train the
> database with spam and ham (there are about 1.1M spam and 500k ham
> currently).

You should use autolearn if you don't already. If you want to be safe you 
can change the learning thresholds to safer values. (I think I use 8 for 
spam and keep the default for ham.)

> Have I configured something wrong, or am I misunderstanding how this
> works? Is there something else I should read?

I think your db was OK as it was. You should read up on how to change to SQL 
;-) Run the expiry once per night from cron.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: Bayes DB growing without bound; expiry not working

2008-04-21 Thread Michael Parker


On Apr 21, 2008, at 8:40 AM, Chris St. Pierre wrote:

> On Mon, 21 Apr 2008, Michael Parker wrote:
>
>> select * from bayes_vars;
>
> ...
> 2289 rows in set (0.00 sec)
>
>> What user do you run bayes under on your MXs?
>
> I think you've found the issue.  We run as spamd.
>
> # sa-learn -u spamd --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0    1492123          0  non-token data: nspam
> 0.000          0     660634          0  non-token data: nham
> 0.000          0   73178711          0  non-token data: ntokens
> 0.000          0 1189775610          0  non-token data: oldest atime
> 0.000          0 1208785034          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
>
> That leads to two issues:
>
> 1.  I need to straighten things out and figure out why I've got a
> strange mix of per-user and global data in my Bayes DB.  Whee.

You should use the bayes override username if you want global and then
just sa-learn -u  clear everything else (PITA, I know).  I
personally don't believe individual bayes dbs are an issue, if you've
got the space and CPU on your database machine.  See below for some
solutions.

> 2.  Does this mean that, if I use per-user Bayes, I have to run
> expiration as each user individually?
>
> Manual expiration was recommended to me a long time ago as a way to
> increase database performance, but it seems like it may not be worth
> it if I have to run N forced expirations, for potentially large values
> of N.

This is true for DBM based bayes databases, but generally (with an
exception I'll talk about in a second) MySQL based bayes expiration is
very fast (just a few seconds).  I would go ahead and turn auto-expire
on, after running a manual expire to clear out the current backlog.

One reason that expiration slows down is an unoptimized db.  I've
found for my small uses if I run optimization every couple of weeks I
get much better performance.  It looks like you get a lot more traffic
so I would recommend running it more often.  With frequent
optimizations and auto-expire your database will stay in much better
shape.

Michael

> Thanks for your help.
>
> Chris St. Pierre
> Unix Systems Administrator
> Nebraska Wesleyan University




Re: Bayes DB growing without bound; expiry not working

2008-04-21 Thread Chris St. Pierre

On Mon, 21 Apr 2008, Michael Parker wrote:

> select * from bayes_vars;

...
2289 rows in set (0.00 sec)

> What user do you run bayes under on your MXs?

I think you've found the issue.  We run as spamd.

# sa-learn -u spamd --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0    1492123          0  non-token data: nspam
0.000          0     660634          0  non-token data: nham
0.000          0   73178711          0  non-token data: ntokens
0.000          0 1189775610          0  non-token data: oldest atime
0.000          0 1208785034          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
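
For anyone who wants to pull these numbers out in a script, here is a minimal Python sketch. It assumes the usual `--dump magic` layout of four whitespace-separated numeric columns followed by `non-token data: <name>`, with the value in the third column. `parse_dump_magic` is a hypothetical helper, not part of SpamAssassin.

```python
import re

# One "--dump magic" row: four numeric columns, then the label.
# The value of interest sits in the third column.
MAGIC_LINE = re.compile(r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")


def parse_dump_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


sample = """\
0.000          0    1492123          0  non-token data: nspam
0.000          0     660634          0  non-token data: nham
0.000          0   73178711          0  non-token data: ntokens
0.000          0 1189775610          0  non-token data: oldest atime
0.000          0 1208785034          0  non-token data: newest atime
"""

magic = parse_dump_magic(sample)
span_days = (magic["newest atime"] - magic["oldest atime"]) / 86400
print(magic["ntokens"], "tokens spanning about", round(span_days), "days")
```

With the figures from this dump, the oldest and newest atime are roughly 220 days apart.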

That leads to two issues:

1.  I need to straighten things out and figure out why I've got a
strange mix of per-user and global data in my Bayes DB.  Whee.

2.  Does this mean that, if I use per-user Bayes, I have to run
expiration as each user individually?

Manual expiration was recommended to me a long time ago as a way to
increase database performance, but it seems like it may not be worth
it if I have to run N forced expirations, for potentially large values
of N.

Thanks for your help.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University



Re: Bayes DB growing without bound; expiry not working

2008-04-21 Thread Michael Parker


On Apr 21, 2008, at 8:17 AM, Chris St. Pierre wrote:


> Consequently, my database is growing, apparently without bound.
>
> Any ideas how I can get expiry to work properly again?  (Hopefully
> without completely dumping the database?)



select * from bayes_vars;

What user do you run bayes under on your MXs?

Michael



Re: Bayes DB file locations help

2007-08-26 Thread Matt Kettler
got2go wrote:
> Hello all,
>
> I am trying to get Bayes working on CentoS 4.3 with Postfix, MailScanner,
> IMP (with Spam reporting feature).
>   
Check your /etc/mail/spamassassin/mailscanner.cf. (if you don't have
one, your MailScanner is ancient)

If you've got this line:
bayes_path /var/spool/MailScanner/spamassassin/bayes

Then that's what you're using for everything.

If it's commented out, then
 MailScanner should be using /root/.spamassassin/.
 IMP might be using  /var/www


If you have no /etc/mail/spamassassin/mailscanner.cf, and a REALLY old
mailscanner, check your spam.assassin.prefs.conf, for a bayes_path.

If that's the case then:
locally logged in as root is using /root/.spamassassin
IMP might be using /var/www/
MailScanner is probably using spam.assassin.prefs.conf, which
probably has the /var/spool/MailScanner bayes_path.
   



Re: Bayes DB

2007-05-11 Thread Daniel Aquino

Ok it looks like using sa-learn created the databases fine even with
only 1 ham/spam...


Re: Bayes DB

2007-05-11 Thread Daniel Aquino

I didn't even realize my replies were not being sent to the thread I started...
Sorry!


RE: Bayes DB

2007-05-11 Thread Bowie Bailey
Daniel Aquino wrote:
> > run these commands as the defang user.
> 
> Would it be bad to use "root" because defang is not a real user..

"spamd" will not run as root.  If you try it, it will switch to
"nobody".

You can deal with this two ways:

If your mail accounts are owned by real users on the system, you can let
SA run as the user you are delivering to.  In this case, you must make
sure that all of your users have read/write access to the Bayes files.

If all of your mail accounts are owned by a single user, you can tell
spamd which user to run as and set the ownership of the Bayes files to
that user.  If you are calling "spamassassin" directly, you will need to
switch to the correct user yourself and then run it.  For spamd, it
looks like this:

spamd -u mailacct

-- 
Bowie


RE: Bayes DB

2007-05-11 Thread Bowie Bailey
Daniel Aquino wrote:
> I really don't know if I can extract emails from Outlook 2003 into a
> standard mbox format...

Maildir is the preferred format.  You can extract emails from Outlook,
but Outlook and Exchange tend to rewrite portions of the message which
makes this less than ideal for SA's purposes.

> So I'm thinking "after" I install this gateway, I could set it up to
> trap incoming messages some how and collect 200 spam/ham to train the
> bayes db...

That is what I am doing for the accounts that I control.  The gateway
sorts the mail into ham and spam folders on the server and also forwards
it along to Exchange.  Every day or so, I scan through the ham and spam
folders to make sure the messages are classified properly and then run
sa-learn on the directories.

-- 
Bowie


RE: Bayes DB

2007-05-11 Thread Bowie Bailey
Daniel Aquino wrote:
> > 1) What (exactly) did you do?
> 
> # local.cf  config file at this url
> http://pastie.caboo.se/60756
> 
> > What user is SA running as?  What are the permissions on the bayes
> > directory? 
> 
> drwx------ 2 defang defang 4096 2007-05-11 10:48
> /var/spool/MD-Databases/ 
> 
> > 2) What (exactly) was the result?
> 
> ls /var/spool/MD-Databases/
> auto-whitelist*  auto-whitelist.mutex
> 
> > Why do you say it didn't work?
> 
> I would like to see the bayes_* db files show up in
> /var/spool/MD-Database 
> I sent a few spam messages through SpamAssassin and it created the
> whitelist, db files but not the bayes files...

Please post replies to the list so that others can learn from and
comment on them.

Assuming that SA is running as the defang user, I don't see anything
obviously wrong with your setup.  It may be that Bayes simply hasn't
seen anything to learn from yet.  Take a couple of your messages and
learn from them manually and see if the Bayes files are created.

sa-learn --ham sample_nonspam.msg
sa-learn --spam sample_spam.msg

You can also query the database manually and see what is there.

sa-learn --dump magic

This will give you all of the message and token counts.  Make sure you
run these commands as the defang user.

-- 
Bowie


RE: Bayes DB

2007-05-11 Thread Bowie Bailey
Luis Hernán Otegui wrote:
> First, RTFM.
> Second, Google.
> Third, oh, well... You NEED to feed Bayes a significant amount of
> data, so it knows what is spam and waht is ham, due to the fact that
> the kind of spam and ham you receive is different from the ones I get
> on my servers. Then it will start auto learning on that basis. But, to
> start, it needs you to feed it data...
> 
> Luix
> 
> 2007/5/11, Daniel Aquino <[EMAIL PROTECTED]>:
> > > Have you trained the bayes database? Is this a fresh install? It
> > > needs at least 200 spam and 200 ham messages to get it going.
> > > However, the more ham and spam you can feed it, the better it
> > > will perform... 
> > 
> > Well I thought I could use the auto-learning feature ?

You can use auto-learning, you just have to watch it at first and
manually re-learn any messages that it mis-classifies.  Also, depending
on your traffic patterns, it can take a while for Bayes to learn the
required 200 spam and 200 ham messages from auto-learning alone.  For
best results teach it manually at the beginning.  This way, you know you
have a good database when you start.  With auto-learning alone, the
database can get corrupted before you ever get a chance to use it.
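
The 200-message minimum is easy to check against the nspam and nham counts that `sa-learn --dump magic` reports. A minimal Python sketch follows. The function name is made up for illustration, and the threshold of 200 per class is simply the figure given in this thread.

```python
MIN_MESSAGES = 200  # per class, as stated in this thread


def still_needed(nspam, nham, minimum=MIN_MESSAGES):
    """Return how many more (spam, ham) messages must be learned."""
    return max(0, minimum - nspam), max(0, minimum - nham)


print(still_needed(nspam=1, nham=1))
print(still_needed(nspam=250, nham=120))
```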

-- 
Bowie


Re: Bayes DB

2007-05-11 Thread Luis Hernán Otegui

First, RTFM.
Second, Google.
Third, oh, well... You NEED to feed Bayes a significant amount of
data, so it knows what is spam and what is ham, because the kind of
spam and ham you receive is different from what I get on my servers.
Then it will start auto-learning on that basis. But, to start, it
needs you to feed it data...

Luix

2007/5/11, Daniel Aquino <[EMAIL PROTECTED]>:

>> Have you trained the bayes database? Is this a fresh install? It needs
>> at least 200 spam and 200 ham messages to get it going. However, the
>> more ham and spam you can feed it, the better it will perform...
>
> Well I thought I could use the auto-learning feature ?




--
-
GNU-GPL: "May The Source Be With You...
Linux Registered User #448382.
-


RE: Bayes DB

2007-05-11 Thread Bowie Bailey
Daniel Aquino wrote:
> I setup Bayes and whitelist db paths in my local.cf
> The whitelist db created succesfully but the bayes_* db's did not...

More information please...  Just saying that it doesn't work isn't very
helpful.

Before we can help you, we need the two basic pieces of information:

1) What (exactly) did you do?

Show us the path line you put in the local.cf as well as any other
configuration changes you made that may be relevant.  What user is SA
running as?  What are the permissions on the bayes directory?

2) What (exactly) was the result?

Why do you say it didn't work?  Show us any error messages you got.
Describe any problems that you saw.

-- 
Bowie


Re: Bayes DB

2007-05-11 Thread Luis Hernán Otegui

Have you trained the bayes database? Is this a fresh install? It needs
at least 200 spam and 200 ham messages to get it going. However, the
more ham and spam you can feed it, the better it will perform...


Luix

2007/5/11, Daniel Aquino <[EMAIL PROTECTED]>:

> I setup Bayes and whitelist db paths in my local.cf
> The whitelist db created succesfully but the bayes_* db's did not...




--
-
GNU-GPL: "May The Source Be With You...
Linux Registered User #448382.
-


Re: Bayes db size....

2007-02-19 Thread Ken Menzel
- Original Message - 
From: "Dave Koontz" <[EMAIL PROTECTED]>
To: "'spam mailling list'"
Sent: Saturday, February 17, 2007 9:30 AM
Subject: Re: Bayes db size

> Is there a consensus on this need?  I deal with the seen db issue by
> scheduled deletion of that file.  That said,  with SA becoming more and
> more prominent all the time, I suspect the Average Joe will miss this
> oddity until they wind up with a sluggish system, out of drive space or
> other related issues.
>
> I was mostly curious of the logic on NOT doing maintenance on the Seen
> and AWL db files.  If there is a consensus this needs to occur, then
> perhaps I can take the time to create a proper patch.  I just want to
> make sure I am not missing something fundamental here
>
> Michael Parker wrote:
>> Dave Koontz wrote:

I use the SQL interface and expire bayes_seen like this.  I believe 6
months to be overly conservative.  I added a lastupdate column as a
timestamp.  For the Perl DBM backend I would recommend using a similar
technique and updating the timestamp in Perl.  It converts nicely to SQL.

Here is my query for cleaning bayes_seen:

mysql -u$USER -p$PW -h$SERVER -e \
"DELETE FROM bayes_seen WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 6 MONTH);" \
$DB
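
The same ageing idea can be tried out without a MySQL server. The sketch below uses Python's built-in sqlite3 module and an in-memory table. Note the assumptions: the `lastupdate` column is my own addition rather than part of the stock schema, the table here is cut down to two columns for illustration, and the 183-day cutoff only approximates six months.

```python
import sqlite3
from datetime import datetime, timedelta


def expire_seen(conn, now, max_age_days=183):
    """Delete bayes_seen rows whose lastupdate is older than max_age_days."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat(sep=" ")
    cursor = conn.execute("DELETE FROM bayes_seen WHERE lastupdate <= ?", (cutoff,))
    conn.commit()
    return cursor.rowcount


# Illustrative two-column table; timestamps are stored as ISO-format text,
# which compares correctly as plain strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bayes_seen (msgid TEXT PRIMARY KEY, lastupdate TEXT)")
now = datetime(2007, 2, 19, 12, 0, 0)
conn.executemany(
    "INSERT INTO bayes_seen VALUES (?, ?)",
    [
        ("old@example", (now - timedelta(days=400)).isoformat(sep=" ")),
        ("recent@example", (now - timedelta(days=10)).isoformat(sep=" ")),
    ],
)
removed = expire_seen(conn, now)
print("rows removed:", removed)
```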

Hope this helps,
Ken 



Re: Bayes db size....

2007-02-17 Thread Dave Koontz
Is there a consensus on this need?  I deal with the seen db issue by
scheduled deletion of that file.  That said,  with SA becoming more and
more prominent all the time, I suspect the Average Joe will miss this
oddity until they wind up with a sluggish system, out of drive space or
other related issues.

I was mostly curious of the logic on NOT doing maintenance on the Seen
and AWL db files.  If there is a consensus this needs to occur, then
perhaps I can take the time to create a proper patch.  I just want to
make sure I am not missing something fundamental here

Michael Parker wrote:
> Dave Koontz wrote:
>   
>> I am sure this has been asked numerous times before, but what is the logic
>> in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
>> have been removed from the DB there is little to no use for 'unlearning' any
>> associated messages.  Besides on a busy system, this seen file gets large
>> very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.
>>
>> 
>
> Patches welcome.
>
> Michael
>
>   
>



Re: Bayes db size....

2007-02-17 Thread Michael Parker
Dave Koontz wrote:
> I am sure this has been asked numerous times before, but what is the logic
> in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
> have been removed from the DB there is little to no use for 'unlearning' any
> associated messages.  Besides on a busy system, this seen file gets large
> very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.
> 

Patches welcome.

Michael


> 
> -Original Message-
> From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 16, 2007 7:19 PM
> To: spam mailling list
> Subject: Re: Bayes db size
> 
> On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
>> So you're saying that right now seen isn't capped like tokens right?
> 
> seen has no max size nor expiry features.
> 
> --
> Randomly Selected Tagline:
> "Like any French restaurant in America, it was overpriced, noisy, moody,
> and would put you in mortal danger if you had an accident with anything
> larger than a croissant." - Unknown about the Renault LeCar
> 
> 



RE: Bayes db size....

2007-02-17 Thread Dave Koontz
I am sure this has been asked numerous times before, but what is the logic
in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
have been removed from the DB there is little to no use for 'unlearning' any
associated messages.  Besides on a busy system, this seen file gets large
very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.


-Original Message-
From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 16, 2007 7:19 PM
To: spam mailling list
Subject: Re: Bayes db size

On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
> So you're saying that right now seen isn't capped like tokens right?

seen has no max size nor expiry features.

--
Randomly Selected Tagline:
"Like any French restaurant in America, it was overpriced, noisy, moody,
and would put you in mortal danger if you had an accident with anything
larger than a croissant." - Unknown about the Renault LeCar




Re: Bayes db size....

2007-02-16 Thread Theo Van Dinter
On Fri, Feb 16, 2007 at 06:45:51PM -0600, Robert Nicholson wrote:
> Well then I only care about tokens and not repeated emails can I  
> disable seen?

You can't disable it, but you can delete it, as previously stated.

-- 
Randomly Selected Tagline:
54% of all statistics are made up.  No, make that 82%...




Re: Bayes db size....

2007-02-16 Thread Robert Nicholson
Well then I only care about tokens and not repeated emails can I  
disable seen?


On Feb 16, 2007, at 6:19 PM, Theo Van Dinter wrote:

> On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
>> So you're saying that right now seen isn't capped like tokens right?
>
> seen has no max size nor expiry features.
>
> --
> Randomly Selected Tagline:
> "Like any French restaurant in America, it was overpriced, noisy, moody,
>  and would put you in mortal danger if you had an accident with anything
>  larger than a croissant." - Unknown about the Renault LeCar




Re: Bayes db size....

2007-02-16 Thread Theo Van Dinter
On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
> So you're saying that right now seen isn't capped like tokens right?

seen has no max size nor expiry features.

-- 
Randomly Selected Tagline:
"Like any French restaurant in America, it was overpriced, noisy, moody,
 and would put you in mortal danger if you had an accident with anything
 larger than a croissant." - Unknown about the Renault LeCar




Re: Bayes db size....

2007-02-16 Thread Robert Nicholson

So you're saying that right now seen isn't capped like tokens right?

On Feb 16, 2007, at 5:45 PM, Theo Van Dinter wrote:

> On Fri, Feb 16, 2007 at 05:42:13PM -0600, Robert Nicholson wrote:
>> Why then is my Bayes DB 20MEG in size right now if
>> =item bayes_expiry_max_db_size  (default: 150000)
>
> That's in number of tokens, not physical size in bytes.
>
>> 100,000 tokens, whichever has a larger value.  150,000 tokens is roughly
>> equivalent to a 8Mb database file.
>
> That's an estimate, but depends on your platforms, libraries, etc.
>
>> How do I control the size of the _seen file?
>
> You can delete it if you want to.  You'll be able to release messages again,
> but that may not be an issue for you.
>
> --
> Randomly Selected Tagline:
> "Truly unencumbered by the engineering process."
>  - Unknown about the Renault Dauphine




Re: Bayes db size....

2007-02-16 Thread Theo Van Dinter
On Fri, Feb 16, 2007 at 05:42:13PM -0600, Robert Nicholson wrote:
> Why then is my Bayes DB 20MEG in size right now if
> =item bayes_expiry_max_db_size  (default: 150000)

That's in number of tokens, not physical size in bytes.

> 100,000 tokens, whichever has a larger value.  150,000 tokens is roughly
> equivalent to a 8Mb database file.

That's an estimate, but depends on your platforms, libraries, etc.

> How do I control the size of the _seen file?

You can delete it if you want to.  You'll be able to release messages again,
but that may not be an issue for you.
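
As a back-of-the-envelope check of the figure quoted from the documentation (150,000 tokens is roughly 8 MB), the Python sketch below scales that ratio to other token counts. It is only an estimate, since the real size depends on platform and libraries, and `estimate_db_mb` is a hypothetical helper.

```python
# Ratio taken from the quoted documentation: 150,000 tokens ~ 8 MB.
DOC_TOKENS = 150_000
DOC_SIZE_MB = 8


def estimate_db_mb(ntokens):
    """Scale the documented tokens-to-size ratio linearly to ntokens."""
    return ntokens * DOC_SIZE_MB / DOC_TOKENS


for ntokens in (150_000, 375_000):
    print(f"{ntokens:,} tokens -> roughly {estimate_db_mb(ntokens):.0f} MB")
```

By this ratio a 20 MB token file would hold on the order of 375,000 tokens, though a 20 MB total may also include the seen file.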

-- 
Randomly Selected Tagline:
"Truly unencumbered by the engineering process."
 - Unknown about the Renault Dauphine




Re: bayes db version

2007-01-14 Thread Péntek Imre
At 2007. january 14. 20.32 Theo Van Dinter wrote:
> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3563
Well, AFAIK all my sa-learn instances run through procmail and are correctly
locked:
:0cw:/tmp/some.lock
|sa-learn --spam --no-sync --single
However, this happened when procmail received 21 mails to learn, and the
warning occurred only once in those 21 runs, so it is possibly a race
condition.
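
What the procmail lockfile is meant to guarantee, namely that only one learner touches the database at a time, can be sketched in Python with an advisory lock. This is an illustration for POSIX systems only: procmail uses its own lock-file mechanism rather than flock, the lock path and the `learn_one` callback are hypothetical, and the sketch does not explain why the warning appeared once despite the lock.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until no one else holds it
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def learn_serialized(messages, learn_one, lock_path):
    """Run learn_one(message) for each message, one at a time under the lock."""
    results = []
    for message in messages:
        with exclusive_lock(lock_path):
            results.append(learn_one(message))
    return results


lock_path = os.path.join(tempfile.gettempdir(), "sa-learn-example.lock")
print(learn_serialized(["msg1", "msg2"], str.upper, lock_path))
```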
-- 
With regards: Imre Péntek
E-Mail: [EMAIL PROTECTED]


Re: bayes db version

2007-01-14 Thread Theo Van Dinter
On Sun, Jan 14, 2007 at 11:29:38AM +0100, Péntek Imre wrote:
> this output was generated by sa-learn:
> bayes: bayes db version 0 is not able to be used, aborting! 
[...]
> So far this is the first and only time I saw this warning, and since no 
> warnings like this displayed.

Sounds like:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3563

-- 
Randomly Selected Tagline:
A production of the digitally insane.




Re: bayes db site wide or per user

2006-12-09 Thread Theo Van Dinter
On Sat, Dec 09, 2006 at 01:48:51PM +0100, Alex Handle wrote:
> I could disable the spamchecks in amavisd-new and invoke sa through
> maildrop.
> But i don't know if a per-user database would scale for 100,000 mailboxes?

IMO, Bayes will likely be OK if you use SQL (though your DB will be quite
a bit larger).  I think the issue is going to be CPU: more expiry runs,
scanning mail delivered to multiple people multiple times, etc.

Generally speaking, I believe large installations go site-wide.

-- 
Randomly Selected Tagline:
Leela: "He's crude and gross and he treats me like a slave." 
 Fry: "Then dump his one-eyed ass." 




Re: bayes db site wide or per user

2006-12-09 Thread Alex Handle

Theo Van Dinter wrote:
> On Fri, Dec 08, 2006 at 09:44:04PM +0100, Alex Handle wrote:
>> postfix/mysql/nfs/amavisd-new/spamassassin and now we
>>
>> Is it a bad idea to use a site wide bayes database or is it better
>> to use a per user database in this scenario?
>
> Per user DBs will give you better results, but since you're running from
> the MTA, your only choice is site-wide.

I could disable the spam checks in amavisd-new and invoke SA through
maildrop.
But I don't know if a per-user database would scale for 100,000 mailboxes?




Re: bayes db site wide or per user

2006-12-08 Thread Theo Van Dinter
On Fri, Dec 08, 2006 at 09:44:04PM +0100, Alex Handle wrote:
> postfix/mysql/nfs/amavisd-new/spamassassin and now we
> 
> Is it a bad idea to use a site wide bayes database or is it better
> to use a per user database in this scenario?

Per user DBs will give you better results, but since you're running from
the MTA, your only choice is site-wide.

-- 
Randomly Selected Tagline:
"Wheee! ...ow, I bit my tongue!"
 
--Ralph Wiggum
  Bart's Inner Child (Episode 1F05)




RE: RE: Bayes DB version issue 3.1.3 => 3.1.4

2006-08-08 Thread Gary W. Smith
Nigel, 

I ended up taking the approach you listed a little earlier.  The problem
is that I now have two separate bayes databases: one for RH/3.1.3 and
one for rPath/3.1.4.

This isn't so much a resource problem as a redundancy problem
(as I replicate the databases to our DR location, etc).

So I imported the data and started testing.  For some reason it was
taking upwards of 70 seconds per message.  This was when starting SA right
after installing.

After a reboot it did drop down to the 0.5-1.5 second range though.  I was
getting worried.  I now have two 3.1.4 machines up and running.  I will swap
out two of my four other 3.1.3 machines and upgrade those in a couple of
days, after it has run for a while.

Gary Wayne Smith

> -----Original Message-----
> From: Nigel Frankcom [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 08, 2006 7:03 PM
> To: Gary W. Smith
> Subject: Re: RE: Bayes DB version issue 3.1.3 => 3.1.4
> 
> Hi Gary,
> 
> A dump from the SA db should reimport; you may have to kill the latin
> line in the dump and replace it with UTF8, beyond that it should be a
> straight forward dump and reload?
> 
> Let me know how it goes?
> 
> Kind regards
> 
> Nigel
> 

RE: Bayes DB version issue 3.1.3 => 3.1.4

2006-08-08 Thread Gary W. Smith
I've created a new database in UTF8 format.  I will see how this works
out.  I might try to copy the data from the Latin database to the UTF8
database but in past experience this hasn't worked that great.  I might
also make a backup as well and try that.




RE: Bayes DB version issue 3.1.3 => 3.1.4

2006-08-08 Thread Gary W. Smith
Okay, I have a little more information now.  I ran the same command that
SQL.pm would run.  It appears to be a collation issue.  Can we force the
collation with 3.1.4 to a specific type?  In my case the database is in
latin1 because 3.1.3 choked on UTF8.  This was on RHEL4 (which defaults
to UTF8).  The kernel is 2.6.9.

I'm trying to get this to run on rPath Linux which is on 2.6.16.  I
would suspect that they have implemented more libraries in UTF8 now than
back on kernel 2.6.9.

Anyway, here is the command I issued to catch this point:

echo "SELECT value FROM bayes_global_vars WHERE variable = 'VERSION';" |
mysql -u user -D database -h 10.0.13.13 -ppassword  

ERROR 1267 (HY000) at line 1: Illegal mix of collations
(latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for
operation '='
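
A small sketch for reading that error, in case it helps. MySQL lists each operand's collation together with its coercibility; as far as I know, IMPLICIT is what a column value gets and COERCIBLE is what a string literal gets. The helper names below are hypothetical and the code only pulls the pairs out of the message text; it does not talk to a server.

```python
import re
from typing import List, Tuple

# The error text exactly as reported above, joined onto one line.
ERROR_TEXT = (
    "ERROR 1267 (HY000) at line 1: Illegal mix of collations "
    "(latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for "
    "operation '='"
)


def parse_collation_mix(message: str) -> List[Tuple[str, str]]:
    """Return (collation, coercibility) pairs found in an 'Illegal mix of collations' error."""
    return re.findall(r"\(([A-Za-z0-9_]+),([A-Z]+)\)", message)


def charset_of(collation: str) -> str:
    """MySQL collation names begin with the character set name, e.g. latin1_swedish_ci."""
    return collation.split("_", 1)[0]


if __name__ == "__main__":
    for collation, coercibility in parse_collation_mix(ERROR_TEXT):
        print(charset_of(collation), collation, coercibility)
```

Read that way, the stored column is latin1 while the 'VERSION' literal arrived as utf8, which is consistent with a latin1 database being queried over a UTF8 client connection.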

Any help would be greatly appreciated.

Gary Wayne Smith


RE: Bayes DB version issue 3.1.3 => 3.1.4

2006-08-08 Thread Gary W. Smith
Daryl, 

Thanks for the info.  I will update the .8.  As for the database, which
is the primary concern, the user account is correct.  I have logged into
the database from that server using the same credentials from the
local.cf file.  I had thought that we might have restricted by subnet so
I did indeed try that last night.  

[EMAIL PROTECTED] spamassassin]# mysql -u xxx -h xx.xx.xx.xx -D spamassassin -p
Enter password: 
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6649341 to server version: 4.1.7-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show tables;
+------------------------+
| Tables_in_spamassassin |
+------------------------+
| awl                    |
| bayes_expire           |
| bayes_global_vars      |
| bayes_seen             |
| bayes_token            |
| bayes_vars             |
| userpref               |
+------------------------+
7 rows in set (0.00 sec)

mysql> select * from bayes_global_vars;
+----------+-------+
| variable | value |
+----------+-------+
| VERSION  | 3     |
+----------+-------+
1 row in set (0.00 sec)

mysql> 



Re: Bayes DB version issue 3.1.3 => 3.1.4

2006-08-08 Thread Daryl C. W. O'Shea

On 8/8/2006 3:29 AM, Gary W. Smith wrote:

> Hello,
>
> I can’t remember smoking crack when copying the config files over but
> anything’s possible.
>
> I built out a new machine today and installed SA.  We have a list of
> CPAN modules that were installed (same list as from the 3.1.3 servers).
> I copied everything in the /etc/mail/spamassassin from our productions
> servers to the test server and after starting we receive errors.  I have
> checked and the MySQL data instance is accessible from this server.
> There are also several rules that are errors as well.
>
> I know that someone has asked this question already but I didn’t find
> the answer in the thread archive.
>
> Here are the contents of the log file:
>
> Aug  7 21:45:59 labtest01c spamd[2693]: config: score: the non-numeric
> score (.8) is not valid, a numeric score is required
>
> Aug  7 21:45:59 labtest01c spamd[2693]: config: SpamAssassin failed to
> parse line, "SUBJ_HAS_UNIQ_ID .8 0.212 0.682 1.677" is not valid for
> "score", skipping: score SUBJ_HAS_UNIQ_ID .8 0.212 0.682 1.677


".8" requires a leading zero.


> Aug  7 21:46:01 labtest01c spamd[2693]: bayes: database version 0 is
> different than we understand (3), aborting! at
> /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/BayesStore/SQL.pm line
> 135.
>
> Aug  7 21:46:03 labtest01c spamd[2693]: bayes: database version 0 is
> different than we understand (3), aborting! at
> /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/BayesStore/SQL.pm line
> 135.


SQL server privilege issue?


> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test DIGEST_MULTIPLE
> has undefined dependency 'RAZOR2_CHECK'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test DIGEST_MULTIPLE
> has undefined dependency 'DCC_CHECK'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test DRUGS_ERECTILE
> has undefined dependency '__DRUGS_ERECTILE7'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test
> SARE_SUB_ACCEPT_CCARDS has undefined dependency '__SARE_SUB_FROM_PAYPAL'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test
> SARE_SPEC_PROLEO_M2a has dependency 'MIME_QP_LONG_LINE' with a zero score
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test
> SARE_HEAD_SUBJ_RAND has undefined dependency 'SARE_XMAIL_SUSP2'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test
> SARE_HEAD_SUBJ_RAND has undefined dependency 'SARE_HEAD_XAUTH_WARN'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test
> SARE_HEAD_SUBJ_RAND has dependency 'X_AUTH_WARN_FAKED' with a zero score
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_RD_SAFE
> has undefined dependency 'SARE_RD_SAFE_MKSHRT'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_RD_SAFE
> has undefined dependency 'SARE_RD_SAFE_GT'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_RD_SAFE
> has undefined dependency 'SARE_RD_SAFE_TINY'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test
> SARE_MSGID_LONG45 has undefined dependency '__SARE_MSGID_LONG50'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test
> SARE_MSGID_LONG45 has undefined dependency '__SARE_MSGID_LONG55'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test
> SARE_MSGID_LONG45 has undefined dependency '__SARE_MSGID_LONG65'
>
> Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test
> SARE_MSGID_LONG45 has undefined dependency '__SARE_MSGID_LONG75'
>
> Aug  7 21:46:06 labtest01c spamd[2693]: rules: meta test
> VIRUS_WARNING_DOOM_BNC has undefined dependency 'VIRUS_WARNING_MYDOOM4'
>
> Aug  7 21:46:06 labtest01c spamd[2693]: rules: meta test
> SARE_OBFU_CIALIS has undefined dependency 'SARE_OBFU_CIALIS2'
>
> Aug  7 21:46:06 labtest01c spamd[2693]: rules: meta test FP_MIXED_PORN3
> has undefined dependency 'FP_PENETRATION'


Not errors, just info.
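
If you still want to see at a glance which meta rules point at rules that are not defined on the new box, something like the following sketch would group them. It assumes the wrapped log lines have been joined back onto one line each (only three are shown), and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Three of the log lines from above, each joined back onto a single line.
LOG = """\
Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test DIGEST_MULTIPLE has undefined dependency 'RAZOR2_CHECK'
Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
Aug  7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_SPEC_PROLEO_M2a has dependency 'MIME_QP_LONG_LINE' with a zero score
"""

PATTERN = re.compile(r"meta test (\S+) has (undefined )?dependency '([^']+)'")


def missing_dependencies(log_text: str) -> dict:
    """Map each meta rule to the dependencies the log reports as undefined."""
    missing = defaultdict(list)
    for match in PATTERN.finditer(log_text):
        rule, undefined, dependency = match.groups()
        if undefined:  # skip the "with a zero score" variety
            missing[rule].append(dependency)
    return dict(missing)


if __name__ == "__main__":
    for rule, deps in missing_dependencies(LOG).items():
        print(rule, "->", ", ".join(deps))
```
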


> Aug  7 21:46:07 labtest01c spamd[2693]: spamd: server started on port
> 783/tcp (running version 3.1.4)
>
> Aug  7 21:46:07 labtest01c spamd[2693]: spamd: server pid: 2693
>
> Aug  7 21:46:07 labtest01c spamd[2693]: spamd: server successfully
> spawned child process, pid 2700
>
> Aug  7 21:46:07 labtest01c spamd[2693]: spamd: server successfully
> spawned child process, pid 2701
>
> Aug  7 21:46:07 labtest01c spamd[2693]: prefork: child states: II


Normal startup info.


Daryl
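
On the first point in the reply (the ".8" score needing a leading zero), a quick pre-flight check along these lines could flag such lines before spamd starts. This is only a sketch: the pattern is my assumption of what "numeric with a leading digit" means, it is not SpamAssassin's own parser, and the function name is hypothetical.

```python
import re

# Assumption for this sketch: a valid score has an optional sign and at least
# one digit before any decimal point.
SCORE_RE = re.compile(r"^[-+]?\d+(?:\.\d+)?$")


def bad_scores(line: str) -> list:
    """Return the score values on a 'score RULE v1 [v2 v3 v4]' line that lack a leading digit."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "score":
        return []
    return [value for value in parts[2:] if not SCORE_RE.match(value)]


if __name__ == "__main__":
    print(bad_scores("score SUBJ_HAS_UNIQ_ID .8 0.212 0.682 1.677"))
    print(bad_scores("score SUBJ_HAS_UNIQ_ID 0.8 0.212 0.682 1.677"))
```
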


Re: Bayes db corrupt, not fixable?

2006-06-27 Thread Theo Van Dinter
On Tue, Jun 27, 2006 at 10:50:03AM -0500, Larry Starr wrote:
> I don't believe that it is referring to the SpamAssassin version, but rather
> the version of Berkeley DB.  Have you updated any packages lately?

Actually it is the SA DB version being referred to.  v2 is for databases
in SA 2.6x, v3 is SA 3.0.x and later.

> > When I run sa-learn --dump or --sync, it tells me the database is
> > version 2.  This machine never ran less than Spamassassin version 3.
> > The --sync will run continueing to try and get a lock on the file
> > forever.  The only related process running was amavisd, killed it.
> > Still couldn't get a lock on the file.

Well, the lock is really a different thing from the DB.  If the files are
local to the machine (ie: not NFS), you may want to switch to "lock_method
flock" which is better, but doesn't work on remote file systems.

> > I then removed bayes.lock* files in the same directory.  Ran sa-learn -D
> > --sync, reported that it was upgrading the database to v3 and completed.
> > When run again it can't get a lock on the file.

It sounds like either you have a lot of contention for the lock, or processes
are being killed before they get the chance to unlock, or ...

I'd try the flock method if you can, that usually clears up a lot of problems.
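
To illustrate why flock tends to behave better when processes get killed: the lock belongs to an open file and the kernel drops it when that file is closed or the process exits, so nothing stale is left behind. The sketch below uses Python's fcntl.flock purely as an illustration; it is not SpamAssassin's locking code, the file name is a placeholder, and it is Unix-only.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path: str):
    """Return an open file holding an exclusive flock, or None if someone else holds it."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting forever.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "bayes.lock")  # placeholder name
    first = try_exclusive_lock(lock_path)
    second = try_exclusive_lock(lock_path)  # separate open file, so it is refused
    print("first got lock:", first is not None)
    print("second refused:", second is None)
    first.close()                            # closing the file releases the lock
    third = try_exclusive_lock(lock_path)
    print("third got lock:", third is not None)
    third.close()
```

As noted above, this kind of lock does not work on remote file systems such as NFS.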

-- 
Randomly Generated Tagline:
"So, the long and short of it--if you have one sysadmin, you have a
 "system administrator."  If you have two sysadmins, you have two "system
 administrators."  If you have two thousand sysadmins, you're at LISA."
  - Trey Harris <[EMAIL PROTECTED]>




Re: Bayes db corrupt, not fixable?

2006-06-27 Thread Larry Starr
I don't believe that it is referring to the SpamAssassin version, but rather
the version of Berkeley DB.  Have you updated any packages lately?

On Tuesday 27 June 2006 10:45, Bobby Johnson wrote:
> When I run sa-learn --dump or --sync, it tells me the database is
> version 2.  This machine never ran less than Spamassassin version 3.
> The --sync will run continueing to try and get a lock on the file
> forever.  The only related process running was amavisd, killed it.
> Still couldn't get a lock on the file.
>
> I then removed bayes.lock* files in the same directory.  Ran sa-learn -D
> --sync, reported that it was upgrading the database to v3 and completed.
> When run again it can't get a lock on the file.
>
> I'd rather not rebuild this database if possible.  Any ideas?
>
> Bobby

-- 
Larry G. Starr - [EMAIL PROTECTED] or [EMAIL PROTECTED]
Software Engineer: Full Compass Systems LTD.
Phone: 608-831-7330 x 1347  FAX: 608-831-6330
===
There are only three sports: bullfighting, mountaineering and motor
racing, all the rest are merely games! - Ernest Hemmingway



RE: bayes db issue

2006-04-11 Thread Gary V

> I recently switched to using mysql bayes.  I am getting a [1135] dbg:
> bayes: unable to initialize database for root user, aborting! When I do
> spamassassin -d --lint  any idea what I need to change?
>
> Best regards,
>
> JD Smith


You possibly have not learned a message as root yet. As root, try this:

sa-learn --spam < sample-spam.txt





Re: bayes db issue

2006-04-11 Thread Michael Parker
JD Smith wrote:
> I recently switched to using mysql bayes.  I am getting a [1135] dbg:
> bayes: unable to initialize database for root user, aborting! When I do
> spamassassin -d --lint  any idea what I need to change?  
> 

It's kind of a bad warning message.  Bayes will not attempt to initialize
the database until you actually try to write to it.  In general you can
probably ignore it unless you are trying to do some sort of learning.

Michael


Re: bayes db issue

2006-04-11 Thread Dhawal Doshy
JD Smith writes:

> I recently switched to using mysql bayes.  I am getting a [1135] dbg:
> bayes: unable to initialize database for root user, aborting! When I do
> spamassassin -d --lint  any idea what I need to change?

Try a "select id,username,spam_count,ham_count from bayes_vars" on your
bayes database to find the username under which your bayes data exists.


Next use the username in the above query to add this line in your local.cf
bayes_sql_override_username username 
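
A sketch of those two steps, for illustration only. It uses an in-memory SQLite table as a stand-in for the MySQL bayes database, with a trimmed-down bayes_vars schema and a made-up row ("amavis" is hypothetical); against the real database you would run the same SELECT through your MySQL client.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the MySQL bayes database (trimmed, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_vars ("
        "id INTEGER PRIMARY KEY, username TEXT, spam_count INTEGER, ham_count INTEGER)"
    )
    # Made-up row purely for illustration.
    conn.execute(
        "INSERT INTO bayes_vars (username, spam_count, ham_count) VALUES (?, ?, ?)",
        ("amavis", 1200, 3400),
    )
    return conn


def bayes_usernames(conn: sqlite3.Connection) -> list:
    """Run the query suggested above and return the rows."""
    query = "select id,username,spam_count,ham_count from bayes_vars"
    return conn.execute(query).fetchall()


def override_line(username: str) -> str:
    """Format the local.cf setting for the chosen username."""
    return f"bayes_sql_override_username {username}"


if __name__ == "__main__":
    rows = bayes_usernames(make_demo_db())
    print(rows)
    print(override_line(rows[0][1]))
```
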


hth,
- dhawal 

> Best regards,
>
> JD Smith










Re: bayes db from SA 3.0.2 to 3.0.4

2005-06-22 Thread Kai Schaetzl
Roman Serbski wrote on Mon, 20 Jun 2005 14:55:56 +0600:

> 1. sa-learn --backup > db.txt (on old server)

Since you transferred the complete db, I don't see a reason to export and
import the data. This is like backing up your notebook at home, taking the
notebook and the backup with you in the car to work, and then recovering your
data from the backup before starting to work. Just move the whole directory,
that's all you need to do.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





Re: bayes db from SA 3.0.2 to 3.0.4

2005-06-20 Thread jdow
I suspect you wanted to perform a "sa-learn --sync" first. But I do not
know for sure.
{^_^}
----- Original Message -----
From: "Roman Serbski" <[EMAIL PROTECTED]>


Dear colleagues,

Could you please share the correct procedure for moving bayes database
from the server powered by SA 3.0.2 to another server with 3.0.4
installed?

Here is what I did:

1. sa-learn --backup > db.txt (on old server)
2. Transfer of bayes db files from old server to a new one.

cd /var/spool/spamd/.spamassassin/ && ls -al

drwx------  2 spamd  spamd       512 Jun 20 14:46 .
drwxr-xr-x  3 spamd  spamd       512 Feb 20 17:03 ..
-rw-------  1 spamd  spamd      3798 Jun 20 14:56 bayes.mutex
-rw-rw-rw-  1 root   spamd     33480 Jun 20 14:56 bayes_journal
-rw-------  1 spamd  spamd  10174464 Jun 20 14:56 bayes_seen
-rw-rw-rw-  1 root   spamd   5324800 Jun 20 14:56 bayes_toks
-rw-r--r--  1 spamd  spamd      1175 Jan 30 12:08 user_prefs
-rw-rw-rw-  1 spamd  spamd     65536 Feb 19 17:41 whitelist
-rw-------  1 spamd  spamd         6 Feb 19 17:41 whitelist.mutex

3. sa-learn --restore db.txt (on new server)

`spamassassin -D --lint` doesn't show any errors.

Does this procedure look correct?
Thank you for your time!

Roman



Re: bayes DB in CDB format

2005-05-31 Thread Matt Kettler
Asif Iqbal wrote:
> Hi All
> 
> I see notes on using MySQL/PgSQL and other SQL database and migration
> from Berkeley DB to MySQL. I was wondering if anyone knows how to
> migrate to DAN's CDB from Berkeley DB for bayes DB. I like to use that
> (CDB) as the bayes DB.
> 
> Thanks for any help/suggestion/tip
> 

CDB would be rather difficult to support for SA's bayes system. It's designed
for "constant databases", i.e. those which are built once and read many times
without change.

To this end, CDB does not support single-record inserts or deletes, which are
key operations for SpamAssassin's learning and expiry. Any learning or
expiry operation would require deleting the entire bayes database and rebuilding
the whole thing.
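
To illustrate the cost, here is a toy model in Python. It is not the real CDB
file format or API; the class "ConstantDB" and its methods are hypothetical
and only mimic the build-once, read-many behaviour described above.

```python
class ConstantDB:
    """Toy stand-in for a build-once, read-many database (not the real CDB format)."""

    def __init__(self, records):
        # The whole table is written in one pass and never modified afterwards.
        self._table = dict(records)
        self.records_written = len(self._table)

    def get(self, key, default=None):
        return self._table.get(key, default)

    def rebuilt_with(self, key, value):
        """'Update' one record by building a complete new database."""
        records = dict(self._table)
        records[key] = value
        return ConstantDB(records)


tokens = ConstantDB({"viagra": 10, "meeting": 3, "invoice": 7})
updated = tokens.rebuilt_with("viagra", 11)  # learning one token occurrence
print(updated.get("viagra"), updated.records_written)
```

Changing a single counter rewrites every record. With a bayes database of a
hundred thousand tokens or more, each learn or expiry run would pay that price.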



Re: bayes DB in CDB format

2005-05-31 Thread Arvinn Løkkebakken



Rick Macdougall wrote:

> Asif Iqbal wrote:
>
>> Hi All
>>
>> I see notes on using MySQL/PgSQL and other SQL database and migration
>> from Berkeley DB to MySQL. I was wondering if anyone knows how to
>> migrate to DAN's CDB from Berkeley DB for bayes DB. I like to use
>> that (CDB) as the bayes DB.
>>
>> Thanks for any help/suggestion/tip
>
> Hi,
>
> While I do not know the answer to that (I believe it's going to be
> "Sorry, CDB is not currently supported"), you should really look at
> MySQL or PgSQL for bayes if you are going to migrate.  On a heavily
> loaded server you almost have to run bayes with MySQL or suffer the
> consequences of bayes locking / expiry.

Not to mention when you have more than one spamd server scanning the same
"stream" of messages, which should therefore share the same data.


Arvinn


Re: bayes DB in CDB format

2005-05-30 Thread Michael Parker

Asif Iqbal wrote:

> Hi All
>
> I see notes on using MySQL/PgSQL and other SQL database and
> migration from Berkeley DB to MySQL. I was wondering if anyone
> knows how to migrate to DAN's CDB from Berkeley DB for bayes DB. I
> like to use that (CDB) as the bayes DB.
>
> Thanks for any help/suggestion/tip
>

Doesn't CDB work best in a read only situation?  or is that TDB?  If
it's got a perl module that follows the same interface as the other
*_File (ie DB_File, SDBM_File, etc) modules, it could be tested I suppose.

FYI, 3.1 has native support for SDBM, as well as Berkeley DB.  It also
has PgSQL and MySQL specific modules that offer features specific to
those databases.

Michael



Re: bayes DB in CDB format

2005-05-30 Thread Rick Macdougall



Asif Iqbal wrote:

> Hi All
>
> I see notes on using MySQL/PgSQL and other SQL database and migration
> from Berkeley DB to MySQL. I was wondering if anyone knows how to
> migrate to DAN's CDB from Berkeley DB for bayes DB. I like to use that
> (CDB) as the bayes DB.
>
> Thanks for any help/suggestion/tip

Hi,

While I do not know the answer to that (I believe it's going to be
"Sorry, CDB is not currently supported"), you should really look at
MySQL or PgSQL for bayes if you are going to migrate.  On a heavily
loaded server you almost have to run bayes with MySQL or suffer the
consequences of bayes locking / expiry.

CDB may actually be worse than Berkeley DB for Bayes use, even if it
was/is supported, because of the method it uses for updates.


With a real DB backend, those problems go away.

Just my $0.02 with a couple of hundred thousand messages scanned a day.

Regards,

Rick


Re: bayes DB in CDB format

2005-05-30 Thread Theo Van Dinter
On Mon, May 30, 2005 at 07:18:31PM -0400, Asif Iqbal wrote:
> from Berkeley DB to MySQL. I was wondering if anyone knows how to
> migrate to DAN's CDB from Berkeley DB for bayes DB. I like to use that (CDB) 
> as the
> bayes DB.

There's no native support for CDB, so you'd have to do up your own
BayesStore backend module, etc.

-- 
Randomly Generated Tagline:
Is it progress if a cannibal uses a knife and fork?




Re: bayes db keeps die

2005-05-18 Thread Martin Hepworth
Craig

Best to install SpamAssassin from source or CPAN. I've seen lots of
problems with the RPM-based install. Not specifically bayes but

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300

Craig White wrote:

> CentOS 3.4
>
> # rpm -qa spamassassin
> spamassassin-3.0.3-1.1.el3.rf
> # rpm -qa mailscanner
> mailscanner-4.41.3-1
>
> I start with starter db from Fortress Systems since my old bayes db from
> 2.6x was creamed by this same issue...
>
> # spamassassin -p /etc/MailScanner/spam.assassin.prefs.conf -D --lint
> much snippage...
> debug: bayes: 5791 tie-ing to DB file R/O /etc/MailScanner/bayes/bay
> debug: bayes: 5791 tie-ing to DB file R/O /etc/MailScanner/bayes/bayes_seen
> debug: bayes: found bayes db version 3
>
> ok - looks good
>
> # sa-learn -p /etc/MailScanner/spam.assassin.prefs.conf --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0       1733          0  non-token data: nspam
> 0.000          0        313          0  non-token data: nham
> 0.000          0     140671          0  non-token data: ntokens
> 0.000          0 1051647943          0  non-token data: oldest atime
> 0.000          0 1095956416          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
>
> still looks good - but within 2 minutes...
>
> # sa-learn -p /etc/MailScanner/spam.assassin.prefs.conf --dump magic
> bayes: bayes db version 0 is not able to be used, aborting!
> at /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm
> line 160.
> bayes: bayes db version 0 is not able to be used, aborting!
> at /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm
> line 160.
> ERROR: Bayes dump returned an error, please re-run with -D for more
> information
>
> It seems that no matter how I execute things like...
>
> sa-learn --rebuild
> or
> sa-learn --sync -D
>
> it always corrupts in this fashion.
>
> Any clues?
>
> Thanks
>
> Craig
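
For watching a database like this over time, the "--dump magic" output can be
read by a script. A minimal sketch in Python follows, assuming the output has
the five-column layout shown above with the value in the third column; the
function name "parse_dump_magic" is made up, and the input must be the raw
command output without mail quote markers.

```python
def parse_dump_magic(text):
    """Map each 'non-token data' name to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[4].startswith("non-token data:"):
            name = parts[4].split(":", 1)[1].strip()
            values[name] = int(parts[2])
    return values


SAMPLE = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0  1733  0  non-token data: nspam
0.000  0  313  0  non-token data: nham
0.000  0  140671  0  non-token data: ntokens
"""

magic = parse_dump_magic(SAMPLE)
print(magic["bayes db version"], magic["nspam"], magic["nham"], magic["ntokens"])
```

Logging these numbers every minute or so would show when the counters change,
which may help narrow down what touches the files before the dump starts
failing.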


Re: Bayes DB does not grow anymore

2005-03-23 Thread jdow
From: "Kai Schaetzl" <[EMAIL PROTECTED]>

> > in a degree I have set my SA score to be more or less equal with the
> > BAYES_99 score (around 8).
>
> Your BAYES_99 score is 8? I would never do this. General rule is that no
> single rule should be able to mark a message as ham or spam. That cries
> for false positives.

I'd not do that with Bayes scores. However, there are a few rules that
are iron clad spam detectors here and they get VERY high scores. They
are unique to me and uniquely usable by me so I don't bother to pass
them along. (I have a string of wrong names associated with products
people spam me about that I use to send a score well over 5 to SA. And
I have some additional PayPal antispam of my own which involve some
fancy dancing with meta rules that get an automatic 105 to make sure
they never get through to anything but my spam folder.) I do scan the
spam folder, though. If I didn't scan it I'd not be so vicious about
some of my spam scores.

{^_-}




Re: Bayes DB does not grow anymore

2005-03-23 Thread Kai Schaetzl
GRP Productions wrote on Fri, 18 Mar 2005 10:38:29 +0200:

> It seems SURBL is now enabled by default. It has also changed its name to 
> URIDNSBL :-)

SURBL refers generally to those xx_SURBL rules and to URIDNSBL, since the only
other distributed rule is SBL and SURBL started it all.

> I do not use SARE rules (although I am trying to find time to
> look at them, as I am aware of their credibility). I use Gray's rules 
> (http://files.grayonline.id.au), they seem quite efficient.

I wasn't aware of that site, but now that I visited it, I remember I visited it 
at least once. Use whatever works for you. After all, all this stuff isn't done 
to make you try out again and again but to help you focus your time on the 
important things.

> I understand what you say. The point is, what should be the criteria to 
> understand if the time for an expiration has come? I mean, supposing we take 
> only the size in consideration, could be a problem. What if some old tokens 
> are still common nowadays in spam mail?

This is not a problem. Expiry isn't done by "addition time", but by access time 
(short: atime). So, items which didn't occur recently drop to the "end" of the 
db and get removed by expiry. There's always the chance that old tokens which 
haven't been seen for a long time "come back". But the chance is slimmer the 
older the atime of that token is. There's probably some statistical curve 
algorithm which could be used to determine the best "break point". Because of 
the way dbx databases work expiry can't be done this way, though.
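
A simplified Python sketch of expiry by access time follows. It only shows
the principle; the real SpamAssassin expiry also takes the configured maximum
db size into account, and the function name and sample tokens here are made
up.

```python
def expire_by_atime(tokens, max_age_seconds):
    """Keep tokens whose atime is within max_age_seconds of the newest atime."""
    if not tokens:
        return {}
    newest = max(tokens.values())
    cutoff = newest - max_age_seconds
    return {tok: atime for tok, atime in tokens.items() if atime >= cutoff}


DAY = 86400
tokens = {
    "viagra": 1110636450,              # seen in a recent message
    "mortgage": 1110636450 - 5 * DAY,  # seen a few days ago
    "y2k": 1110636450 - 400 * DAY,     # not seen for over a year
}
kept = expire_by_atime(tokens, max_age_seconds=30 * DAY)
print(sorted(kept))
```

A token that keeps showing up in current mail keeps getting a fresh atime, so
it survives no matter when it was first learned.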

> As I told you, since my last post I have reset everything.  It seems to me 
> it works fine, and it learns rapidly. It gives me no reason not to trust it, 
> in a degree I have set my SA score to be more or less equal with the 
> BAYES_99 score (around 8).

Your BAYES_99 score is 8? I would never do this. General rule is that no single 
rule should be able to mark a message as ham or spam. That cries for false 
positives.
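
The arithmetic behind that rule, as a small Python sketch. The threshold of 5
is the one referred to in this thread; "SOME_OTHER_RULE" and its score are
hypothetical and only there to show the sum.

```python
REQUIRED_SCORE = 5.0  # threshold referred to in this thread


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """Sum the scores of the rules that hit and compare with the threshold."""
    return sum(scores[rule] for rule in hits) >= required


aggressive = {"BAYES_99": 8.0, "SOME_OTHER_RULE": 2.5}
moderate = {"BAYES_99": 3.0, "SOME_OTHER_RULE": 2.5}

# With a score of 8, one Bayes hit alone marks the message as spam.
print(is_spam(["BAYES_99"], aggressive))
# With a score of 3, Bayes needs support from at least one other rule.
print(is_spam(["BAYES_99"], moderate))
print(is_spam(["BAYES_99", "SOME_OTHER_RULE"], moderate))
```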

> Of course I keep doing mistake-based learning, 
> but most of the times I feed it with 'subjective' spam mail (ie. mail that 
> my users don't want to receive, but is definitely not spam).

What kind of mail is that? Newsletters they once subscribed to and don't like 
anymore? They should unsubscribe instead of declaring it as spam.


Kai

-- 
Kai Schätzl, Berlin, Germany





Re: Bayes DB does not grow anymore

2005-03-18 Thread GRP Productions
> Thanks for the offer. You can send it to the email address I use for this
> list, or you could just send me an FTP URL for retrieval.

Sorry I did not find the time to do this, but I will try to send it during
the weekend.

> Oh, yes. You need to have SURBL switched on via the init.pre (I think it's
> off by default) and you should use custom rules. I use a set of carefully
> chosen rulesets mostly from SARE and updated via rulesdujour and some more
> rules of my own accumulated over time.

It seems SURBL is now enabled by default. It has also changed its name to
URIDNSBL :-) I do not use SARE rules (although I am trying to find time to
look at them, as I am aware of their credibility). I use Gray's rules
(http://files.grayonline.id.au), they seem quite efficient.

> I think on a heavy traffic machine it's preferable to have it off,
> especially when using MailScanner. Otherwise the expiry can kick in at
> random times every few hours (you can set a minimum time, though, f.i. one
> day). Some people run a scheduled expiry three times a day. That's advice
> which often comes up on the Mailscanner list (which is a very helpful
> list, btw).
> Depends on how often you need it (whether it reaches the limit you want to
> hold more often or not). Starting with one expiry per night should be fine,
> but you should occasionally expire manually and look at the output, in case
> there are problems.
>
> No. One should get rid of really old tokens, they are only "ballast" in the
> db. I don't know how a big db behaves on a busy site. Ours contain 1 Mio.
> tokens and have a size of 40 MB. They work very well with no resource
> hogging. But I have only a few thousand messages running thru each of our
> servers, there's probably none which gets more than 10.000 a day. If you
> get 100.000 it may be different.

I understand what you say. The point is, what should be the criteria to
understand if the time for an expiration has come? I mean, taking only the
size into consideration could be a problem. What if some old tokens are
still common nowadays in spam mail? You could say it doesn't matter, it
will start again and recognize all the bad stuff. In that sense, we
could just stop maintaining Bayes completely.

> That's what we do. I only learn messages which were categorized wrong. Not
> by Bayes, but by SA. Most messages which get a score lower than 5 still get
> a BAYES_99 which means that Bayes identifies them all. Nevertheless, I
> learn these messages because they are spam and it reassures Bayes that they
> are spam.
> BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.

As I told you, since my last post I have reset everything.  It seems to me
it works fine, and it learns rapidly. It gives me no reason not to trust it,
to the degree that I have set my SA score to be more or less equal to the
BAYES_99 score (around 8). Of course I keep doing mistake-based learning,
but most of the time I feed it with 'subjective' spam mail (ie. mail that
my users don't want to receive, but is definitely not spam). I monitor it
constantly and I am happy about it.

> No problem :-) I tend to be a bit snappy on first messages which look to me
> like the author could have done a bit more research, but once we are over
> that stage I hope I can give some good advice based on my experience.

I have to admit that our communication was valuable to me, I learned so much
about how the whole thing works. Once again, I appreciate it.

Greg



Re: Bayes DB does not grow anymore

2005-03-15 Thread Kai Schaetzl
GRP Productions wrote on Tue, 15 Mar 2005 01:12:53 +0200:

> >I have been trying to get something from CVS for several days now, no luck. 
>  
> Send me your email in private ([EMAIL PROTECTED]) to send it to you.

Thanks for the offer. You can send it to the email address I use for this list, 
or you could just send me an FTP URL for retrieval.

> I will probably start again from scratch. One point: Do you think I should 
> put custom rules inside /etc/mail/spamassassin or the default installation 
> is enough? 

Oh, yes. You need to have SURBL switched on via the init.pre (I think it's off 
by default) and you should use custom rules. I use a set of carefully chosen 
rulesets mostly from SARE and updated via rulesdujour and some more rules of my 
own accumulated over time.

> Yes I just added this. Should auto_expire remain always at 0?

I think on a heavy traffic machine it's preferable to have it off, especially
when using MailScanner. Otherwise the expiry can kick in at random times every
few hours (you can set a minimum time, though, for instance one day). Some
people run a scheduled expiry three times a day. That's advice which often
comes up on the Mailscanner list (which is a very helpful list, btw).
How often depends on how often you need it (whether the db reaches the limit
you want to hold more often or not). Starting with one expiry per night should
be fine, but you should occasionally expire manually and look at the output, in
case there are problems.
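
A scheduled expiry could be wrapped in a few lines of Python and called from
cron. This is a sketch under assumptions: the function names are made up, the
prefs path is the MailScanner one quoted elsewhere in this thread, and the
only sa-learn options used are -p and --force-expire.

```python
import subprocess


def expire_command(prefs_file=None):
    """Build the sa-learn call that forces an expiry run."""
    cmd = ["sa-learn"]
    if prefs_file:
        cmd += ["-p", prefs_file]  # same -p option used elsewhere in this thread
    cmd.append("--force-expire")
    return cmd


def run_expiry(prefs_file=None):
    """Run the expiry and return (exit code, output) so the result can be checked."""
    result = subprocess.run(expire_command(prefs_file),
                            capture_output=True, text=True, check=False)
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    print(expire_command("/opt/MailScanner/etc/spam.assassin.prefs.conf"))
```

Keeping the output (for example by letting cron mail it) covers the point
about looking at the expire output now and then.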


> Also, do you 
> think it would be better if the db NEVER expired?

No. One should get rid of really old tokens, they are only "ballast" in the db.
I don't know how a big db behaves on a busy site. Ours contain 1 Mio. tokens
and have a size of 40 MB. They work very well with no resource hogging. But I
have only a few thousand messages running thru each of our servers, there's
probably none which gets more than 10.000 a day. If you get 100.000 it may be
different.


> Would this value of 50 
> achieve that? I don't want to come at work some day and see my tokens were 
> lost again :-( 

Just look at what the dump says about your oldest token. If your bayes
"performance" is good then the hold time is probably of no interest, but if the
spam detection from bayes is bad and you have a short hold time, the short hold
time is one of the things I would look at.


>  
> In general, should I do as you said, ie. trust the autolearn system and 
> never use sa-learn again, provided that I do not have the time to do full 
> training. 

That's what we do. I only learn messages which were categorized wrong. Not by 
Bayes, but by SA. Most messages which get a score lower than 5 still get a 
BAYES_99 which means that Bayes identifies them all. Nevertheless, I learn 
these messages because they are spam and it reassures Bayes that they are spam.
BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.

>  
> Thanks for giving me so much of your time, and being so patient with my 
> silly questions.

No problem :-) I tend to be a bit snappy on first messages which look to me 
like the author could have done a bit more research, but once we are over that 
stage I hope I can give some good advice based on my experience.


Kai

-- 
Kai Schätzl, Berlin, Germany





Re: Bayes DB does not grow anymore

2005-03-14 Thread GRP Productions
> I have been trying to get something from CVS for several days now, no luck.

Send me your email in private ([EMAIL PROTECTED]) so I can send it to you.

> Bayes needs constant training, but this doesn't mean it needs any manual
> training. Once it's up and running and "well-greased" it should take care
> of itself by auto-learning (bayes_auto_learn 1, don't know if on by
> default). About 70 or 80% of our spam and ham (especially the spam) is
> autolearned.

I will probably start again from scratch. One point: Do you think I should
put custom rules inside /etc/mail/spamassassin or is the default installation
enough?

> Actually, with those "few" tokens you won't lose much if you throw it away
> ;-) As I said upping that should help, no need to throw it away unless you
> think that's easier (if most spam you get scores at BAYES_50 it might be
> better to start over than to convince the db that it's spam).

I'll probably do it.

> > bayes_auto_expire 0
> > bayes_expiry_max_db_size 50
> I assume you just added/changed that?

Yes I just added this. Should auto_expire remain always at 0? Also, do you
think it would be better if the db NEVER expired? Would this value of 50
achieve that? I don't want to come to work some day and see my tokens were
lost again :-(

In general, should I do as you said, ie. trust the autolearn system and
never use sa-learn again, provided that I do not have the time to do full
training?

Thanks for giving me so much of your time, and being so patient with my
silly questions.
Best regards,
Greg




Re: Bayes DB does not grow anymore

2005-03-14 Thread Kai Schaetzl
GRP Productions wrote on Mon, 14 Mar 2005 03:41:40 +0200:

> Indeed, this is the CVS version :-) 

I have been trying to get something from CVS for several days now, no luck.

> This is perhaps because I have been using only 'mistake-based' training (ie 
> training only when false classificaiton happens). However this used to work 
> fine. 

Bayes needs constant training, but this doesn't mean it needs any manual 
training. Once it's up and running and "well-greased" it should take care of 
itself by auto-learning (bayes_auto_learn 1, don't know if on by default). 
About 70 or 80% of our spam and ham (especially the spam) is autolearned.

>  
> >your "hold time" is quite low, it's about a month. I think we haven tokens 
> >from 
> >even a year ago. That's maybe a bit too much, but I strongly suggest upping 
> >your bayes_expiry_max_db_size to something like 500.000 or so. Since you 
> >have a 
> >much higher flux of messages than we have on that machine you are literally 
> >"burning" your db to uselessness. 
>  
> So what would you suggest? I certainly dont want to lose everything that has 
> been learned till now. 

Actually, with those "few" tokens you won't lose much if you throw it away ;-)
As I said upping that should help, no need to throw it away unless you think
that's easier (if most spam you get scores at BAYES_50 it might be better to
start over than to convince the db that it's spam).

> Nope, there is definitely only the one comng with MS. I never use SA from 
> the command line anyway.

Well, let's go back:
you sa-learn a message, it says it learned, you dump magic and see there's no 
change, you look in the directory and there's no journal. There *has* to be at 
least one additional Bayes db. Or something happens which I haven't heard of in 
my about three years of using SA+Bayes. What's the output of "sa-learn --dump 
magic"? Don't specify a config file!
 
> bayes_path  /var/spool/MailScanner/bayes/bayes 

and what's in your /etc/mail/spamassassin/local.conf?

> bayes_auto_expire 0
ok, that means it won't expire. Of course, if it doesn't grow this isn't 
necessary ... ;-)

> bayes_expiry_max_db_size 50
I assume you just added/changed that?

> If I get it you mean that the tokens are lost very quickly?

Yes. However, now that I know that your bayes_expiry is off, we have a different
case. Since when has it been off? Since Feb. 11, as your dump magic suggests?
Your oldest token is Feb. 2. So that either means you started the db that day
or you are burning your tokens in 10 days. That's one problem; upping to a
higher ceiling, as you already did, should take care of that. The other problem
is that it's apparently not growing. One of the reasons is, of course, that you
only learn by mistake. So, how often is that done? How many do you actually add
this way? The second part of this other problem is that even if you learn, it
doesn't seem to learn. I don't see any other possibility than that it uses
different dbs.

> I think am 
> confused , if bayes works with tokens, why does it need nspam and nham? Or 
> are they just counters? 

It's just the number of spam and ham messages you learned to it. Yes, it's more 
or less informational only.

>  
> In general, do you think that setting bayes_expiry_max_db_size would be 
> enough? 

To cure the fast expiration, yes, but you didn't expire for the last 30 days, 
anyway.

> One final thing: Why even if i manually expire, the date of last expiration 
> remains old?

Same reason as above: you work on different dbs. What does the expire output 
show?


Kai

-- 
Kai Schätzl, Berlin, Germany





Re: Bayes DB does not grow anymore

2005-03-14 Thread GRP Productions
> That's okay, the problem just is one cannot be sure how accurate it is.
> Knowing that you use MS would have been useful, anyway :-)
> (BTW: my version of Mailwatch can't show this, do you use a CVS version?)

Indeed, this is the CVS version :-)

> See the number of tokens, we have ten times yours with less learned mail.
> That means that our db has much more tokens to qualify an email as ham or
> spam.

This is perhaps because I have been using only 'mistake-based' training (ie
training only when false classification happens). However this used to work
fine.

> Also your "hold time" is quite low, it's about a month. I think we have
> tokens from even a year ago. That's maybe a bit too much, but I strongly
> suggest upping your bayes_expiry_max_db_size to something like 500.000 or
> so. Since you have a much higher flux of messages than we have on that
> machine you are literally "burning" your db to uselessness.

So what would you suggest? I certainly don't want to lose everything that has
been learned till now.

> And you learned by specifying the config file? I suspect that you are at
> least occasionally using two SA configurations, the one coming with MS and
> the one coming with SA.

Nope, there is definitely only the one coming with MS. I never use SA from
the command line anyway.

> Oh. Still possible, though. You don't need to have one, but on high volume
> systems it's highly recommended. Check your SA config (wherever it is :-)
> for bayes_learn_to_journal 1. I don't know if it is 1 by default, though.
> What do you have starting with bayes in your config file?

# grep bayes /opt/MailScanner/etc/spam.assassin.prefs.conf
# be created as /var/spool/spamassassin/bayes_msgcount, etc.
#bayes_path /var/spool/spamassassin/bayes
#bayes_file_mode 0600
bayes_path  /var/spool/MailScanner/bayes/bayes
bayes_file_mode 0666
# MailScanner: big bayes_toks.new files wasting space.
bayes_auto_expire 0
bayes_expiry_max_db_size 50
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
# use_bayes 0

> Don't know if this would be of any help. As I said, I suspect you are using
> at least two different bayes dbs. At least when you do it from the command
> line. Run an "updatedb" and then "locate bayes" (this may not locate all
> files, f.i. not in /var !).

I think there is only one.

> MS, of course, can only use one and doesn't have a chance of confusing
> that, so when it uses SA that learns and checks the same db. And so far
> that part seems to be okay (except for the bigger size of bayes_seen, but
> as I said, this may be normal for your setup, I really don't know). But you
> burn your tokens too fast. At least that's what I think.

If I get it, you mean that the tokens are lost very quickly? I think I am
confused: if bayes works with tokens, why does it need nspam and nham? Or
are they just counters?

In general, do you think that setting bayes_expiry_max_db_size would be
enough?
One final thing: Why, even if I manually expire, does the date of last
expiration remain old?




Re: Bayes DB does not grow anymore

2005-03-14 Thread Kai Schaetzl
GRP Productions wrote on Mon, 14 Mar 2005 00:32:42 +0200:

> You are right, I am using MailWatch. I just posted this output to be easy 
> for one to see the actual dates without having to convert.

That's okay, the problem just is one cannot be sure how accurate it is. Knowing 
that you use MS would have been useful, anyway :-)
(BTW: my version of Mailwatch can't show this, do you use a CVS version?)

> Here is the 
> actual output: 
>  
> # /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump 
> magic 
> 0.000  0  3  0  non-token data: bayes db version 
> 0.000  0  49740  0  non-token data: nspam 
> 0.000  0  47167  0  non-token data: nham 
> 0.000  0 123325  0  non-token data: ntokens

I didn't look at this closely before, but I think this ratio indicates a 
problem, f.i. this is from our own mail server (just getting our own mail, not 
our clients'):

0.000          0      30089          0  non-token data: nspam
0.000          0      12515          0  non-token data: nham
0.000          0    1001630          0  non-token data: ntokens

See the number of tokens, we have nearly ten times yours with less learned mail.
That means that our db has many more tokens to qualify an email as ham or spam.
Also your "hold time" is quite low, it's about a month. I think we have tokens
from even a year ago. That's maybe a bit too much, but I strongly suggest upping
your bayes_expiry_max_db_size to something like 500.000 or so. Since you have a
much higher flux of messages than we have on that machine you are literally
"burning" your db to uselessness.
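
The comparison can be put into one number, tokens kept per learned message.
A small Python sketch using the figures from the two dumps above (the function
name is made up for illustration):

```python
def tokens_per_message(nspam, nham, ntokens):
    """Average number of stored tokens per learned message."""
    return ntokens / (nspam + nham)


greg = tokens_per_message(nspam=49740, nham=47167, ntokens=123325)
kai = tokens_per_message(nspam=30089, nham=12515, ntokens=1001630)
print(f"{greg:.2f} vs {kai:.2f} tokens per learned message")
```

Roughly one token kept per learned message against more than twenty: nearly
everything that was learned has already been expired again.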

> No it isn't. This is exactly the point I mentioned.

But you didn't prove it ;-)

> But as I said earlier, 
> sa-learn claims it has learned, even from the web interface: 
> >SA Learn: Learned from 1 message(s) (1 message(s) examined). 

And you learned by specifying the config file? I suspect that you are at least 
occasionally using two SA configurations, the one coming with MS and the one 
coming with SA.

> This is getting more suspicious: there is no bayes_journal file! 

Oh. Still possible, though. You don't need to have one, but on high volume
systems it's highly recommended. Check your SA config (wherever it is :-) for
bayes_learn_to_journal 1. I don't know if it is 1 by default, though. What do
you have starting with bayes in your config file?

> -rw-rw-rw-  1 root nobody 1236 Mar 14 00:22 bayes.mutex 
> -rw-rw-rw-  1 root nobody 10452992 Mar 14 00:22 bayes_seen 
> -rw-rw-rw-  1 root nobody  5509120 Mar 14 00:02 bayes_toks 

bayes_seen is quite big. I haven't ever seen it bigger than bayes_toks
on our systems. But maybe that's normal for high volume systems, I don't know.
On the Mailscanner list many people complain about very big bayes_seen files.
Someone else on this list should comment on the size.

> I can assure you noone has touched anything inside this directory. If this 
> is the reason for the problems I've been facing, is there a way to recreate 
> the file without having to lose my current data? (perhaps by copying the 
> above files somewhere, execute sa-learn --clear and some time later restore 
> the above files?)

Don't know if this would be of any help. As I said, I suspect you are using at 
least two different bayes dbs. At least when you do it from the command line. 
Run an "updatedb" and then "locate bayes" (this may not locate all files, f.i. 
not in /var !).
MS, of course, can only use one and doesn't have a chance of confusing that, so 
when it uses SA that learns and checks the same db. And so far that part seems 
to be okay (except for the bigger size of bayes_seen, but as I said, this may 
be normal for your setup, I really don't know). But you burn your tokens too 
fast. At least that's what I think.
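
If "locate" misses directories, the search for a second set of Bayes files
can also be done with a short Python script. A sketch only: the function name
is made up and the directories in the example call are guesses, to be adjusted
to the system at hand.

```python
import os


def find_bayes_dbs(roots, marker="bayes_toks"):
    """Return every directory under the given roots that contains a bayes_toks file."""
    found = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if marker in filenames:
                found.append(dirpath)
    return sorted(found)


if __name__ == "__main__":
    # Candidate locations are guesses; add home directories of any user
    # that has ever run sa-learn or spamassassin.
    for path in find_bayes_dbs(["/var/spool", "/root", "/home", "/etc"]):
        print(path)
```

More than one directory in the output would support the two-databases theory.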


Kai

-- 
Kai Schätzl, Berlin, Germany





Re: Bayes DB does not grow anymore

2005-03-13 Thread GRP Productions
> That is the output of --dump magic? I haven't ever seen it formatted that
> nicely. I assume you skipped the first line, but the expire atime delta is
> missing as well. So, where did you get this from? Not directly from
> sa-learn --dump magic, I'd say. Are you running SA thru some interface? You
> should have said something about the whereabouts of your installation.

You are right, I am using MailWatch. I just posted this output to make it easy
to see the actual dates without having to convert. Here is the
actual output:

# /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      49740          0  non-token data: nspam
0.000          0      47167          0  non-token data: nham
0.000          0     123325          0  non-token data: ntokens
0.000          0 1107319073          0  non-token data: oldest atime
0.000          0 1110636450          0  non-token data: newest atime
0.000          0 1108137790          0  non-token data: last journal sync atime
0.000          0 1108129534          0  non-token data: last expiry atime
0.000          0     804361          0  non-token data: last expire atime delta
0.000          0       3475          0  non-token data: last expire reduction count
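
The atime values are Unix timestamps. A short Python sketch that converts the
four values from this dump and derives the hold time; the fixed +0200 offset
is taken from the dates in the mail headers, and the helper name "fmt" is made
up.

```python
from datetime import datetime, timedelta, timezone

EET = timezone(timedelta(hours=2))  # the +0200 offset shown in the mail headers


def fmt(epoch):
    """Format a Unix timestamp the way the dates appear in this thread."""
    return datetime.fromtimestamp(epoch, tz=EET).strftime("%a, %d %b %Y %H:%M:%S %z")


oldest, newest = 1107319073, 1110636450
last_sync, last_expiry = 1108137790, 1108129534

print("oldest token:     ", fmt(oldest))
print("newest token:     ", fmt(newest))
print("last journal sync:", fmt(last_sync))
print("last expiry:      ", fmt(last_expiry))
print("hold time in days:", round((newest - oldest) / 86400, 1))
```

The last sync and last expiry both fall on 11 Feb 2005, a month before the
newest token, and the hold time comes out at a little over 38 days.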

> Ok. Get the values. Then learn a message to it. Make sure it says that it
> actually learned, then check the values again. Is either the spam or ham
> count increased by one or not?

No it isn't. This is exactly the point I mentioned. But as I said earlier,
sa-learn claims it has learned, even from the web interface:
SA Learn: Learned from 1 message(s) (1 message(s) examined).

> Ok, this finally looks a bit suspicious. No sync and no expire for a month.
> If it doesn't sync you don't get new tokens. Check in your bayes directory
> how big your bayes_journal is. I'd think it's quite big. Do a sync now.
> (Please don't do it via an interface, do it on the command line.) What's
> the output? Is the journal gone and the number of tokens increased now? If
> so, you need to investigate why it doesn't sync anymore. Also do an expire
> then.

This is getting more suspicious: there is no bayes_journal file!
# ll /var/spool/MailScanner/bayes/
total 11780
drwxrwxrwx  2 root nobody     4096 Mar 14 00:22 .
drwxr-xr-x  4 root nobody     4096 Mar 13 11:55 ..
-rw-rw-rw-  1 root nobody     1236 Mar 14 00:22 bayes.mutex
-rw-rw-rw-  1 root nobody 10452992 Mar 14 00:22 bayes_seen
-rw-rw-rw-  1 root nobody  5509120 Mar 14 00:02 bayes_toks
I can assure you no one has touched anything inside this directory. If this
is the reason for the problems I've been facing, is there a way to recreate
the file without having to lose my current data? (perhaps by copying the
above files somewhere, executing sa-learn --clear and some time later restoring
the above files?)

Thanks for your help



Re: Bayes DB does not grow anymore

2005-03-13 Thread Kai Schaetzl
GRP Productions wrote on Sun, 13 Mar 2005 22:54:22 +0200:

> Perhaps I have not been clear enough. It's not only that the files' size is 
> constant. I am pasting the output of dump magic,

That is the output of --dump magic? I haven't ever seen it formatted that
nicely. I assume you skipped the first line, but the expire atime delta is also
missing. So, where did you get this from? Not directly from sa-learn
--dump magic I'd say. Are you running SA through some interface? You should have
said something about the details of your installation.

> and I have to explain that
> the nham and nspam values are the same for many days now.

Ok. Get the values. Then learn a message to it. Make sure it says that it 
actually learned, then check the values again. Is either the spam or ham count 
increased by one or not?

> work fine. If I send to myself a message from Yahoo, with subject 'Viagra 
> sex teen " and other nice words, I certainly do not want it to pass. 
> Bayes classifies it as 50% spam.  I tried to sa-learn --forget, and then 
> re-learn, still is BAYES_50.

Again, this is NOT how Bayes works. You can't teach it one message and then
expect it to flag that message as spam next time. Bayes does not work like
this!
And that it classifies that message as 50%, meaning it cannot determine whether
it's ham or spam, just says that the tokens in the db are not good enough for
that message. Or maybe it contains enough hammy tokens, whatever.

> Number of Spam Messages: 49,740 
> Number of Ham Messages: 47,167 
> Number of Tokens: 123,325 
> Oldest Token: Wed, 2 Feb 2005 06:37:53 +0200 
> Newest Token: Sat, 12 Mar 2005 16:07:30 +0200 

That says it added or changed a token yesterday.

> Last Journal Sync: Fri, 11 Feb 2005 18:03:10 +0200 
> Last Expiry: Fri, 11 Feb 2005 15:45:34 +0200 
> Last Expiry Reduction Count: 3,475 tokens

Ok, this finally looks a bit suspicious. No sync and no expire for a month. If 
it doesn't sync you don't get new tokens. Check in your bayes directory how big 
your bayes_journal is. I'd think it's quite big. Do a sync now. (Please don't 
do it via an interface, do it on the command line.) What's the output? Is the 
journal gone and the number of tokens increased now? If so, you need to 
investigate why it doesn't sync anymore. Also do an expire then.
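
The "no sync and no expire for a month" observation can be checked with simple arithmetic. This is only a sketch: the epoch values are the ones from the raw --dump magic output posted later in this thread, and the helper name lag is made up for illustration.

```python
from datetime import timedelta

newest_atime = 1110636450       # "newest atime" from the raw dump
last_journal_sync = 1108137790  # "last journal sync atime"
last_expiry = 1108129534        # "last expiry atime"

def lag(newer, older):
    """Time elapsed between two epoch timestamps."""
    return timedelta(seconds=newer - older)

print("sync lag:  ", lag(newest_atime, last_journal_sync))
print("expiry lag:", lag(newest_atime, last_expiry))
```

Both gaps come out at roughly four weeks, which is what makes the missing sync stand out on a busy server.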


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





Re: Bayes DB does not grow anymore

2005-03-13 Thread GRP Productions
> This doesn't prove anything. sa-learn --dump magic shows you what's inside.
> Also, Bayes is not a checksum system like Razor, that's its strength. If you
> learn something to it that means that it extracts tokens (short pieces) from
> the message and adjusts its internal probability for them being ham or spam by
> a certain factor. Or if it doesn't know that token yet it adds it.
> That the size doesn't grow can have several reasons, f.i. expiry or the fact
> that the db format seems to have some "air" in it, so that it grows in jumps
> and not continually.

Perhaps I have not been clear enough. It's not only that the files' size is
constant. I am pasting the output of dump magic, and I have to explain that
the nham and nspam values have been the same for many days now. This is not
normal, since we are talking about a very busy server (more than 4,000
messages per day). This has not always been the case; it used to
work fine. If I send myself a message from Yahoo, with subject 'Viagra
sex teen' and other nice words, I certainly do not want it to pass.
Bayes classifies it as 50% spam. I tried to sa-learn --forget, and then
re-learn, and it is still BAYES_50. The nham and nspam values used to increase very
rapidly (sometimes by 200-300 per day). No errors are produced. I
wouldn't have noticed this particular problem, but over the
last few days more spam than usual has been getting through. Also, I
tried to force an expiration many times, but as you can see the expiration
did not take place. It's definitely not a file permission issue.

Thanks
Number of Spam Messages:       49,740
Number of Ham Messages:        47,167
Number of Tokens:              123,325
Oldest Token:                  Wed, 2 Feb 2005 06:37:53 +0200
Newest Token:                  Sat, 12 Mar 2005 16:07:30 +0200
Last Journal Sync:             Fri, 11 Feb 2005 18:03:10 +0200
Last Expiry:                   Fri, 11 Feb 2005 15:45:34 +0200
Last Expiry Reduction Count:   3,475 tokens



Re: Bayes DB does not grow anymore

2005-03-13 Thread Kai Schaetzl
GRP Productions wrote on Sun, 13 Mar 2005 11:21:12 +0200:

> for some days now my bayesian DB does not seem to grow. Its size remains 
> stable. It is read with no problems by SA 3.0.2, but nothing new is written. 
> I send an email to me, it is classified as BAYES_50. I sa-learn it as spam, 
> send it again, and it is still BAYES_50 (I expected to see it as BAYES_99).
>

This doesn't prove anything. sa-learn --dump magic shows you what's inside. 
Also, Bayes is not a checksum system like Razor, that's its strength. If you 
learn something to it that means that it extracts tokens (short pieces) from 
the message and adjusts its internal probability for them being ham or spam by 
a certain factor. Or if it doesn't know that token yet it adds it.
That the size doesn't grow can have several reasons, f.i. expiry or the fact 
that the db format seems to have some "air" in it, so that it grows in jumps 
and not continually.
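
To illustrate why learning a single message only nudges the result, here is a toy token counter in Python. It is deliberately NOT SpamAssassin's real algorithm or storage format; the class ToyBayes, its method names, and the sample messages are all made up for this sketch.

```python
from collections import Counter

class ToyBayes:
    """Toy token counter; not SpamAssassin's actual Bayes implementation."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def token_spamminess(self, token):
        """Crude per-token estimate; 0.5 means no evidence either way."""
        if self.nspam == 0 or self.nham == 0:
            return 0.5
        spam_rate = self.spam_tokens[token] / self.nspam
        ham_rate = self.ham_tokens[token] / self.nham
        if spam_rate + ham_rate == 0:
            return 0.5
        return spam_rate / (spam_rate + ham_rate)

db = ToyBayes()
for _ in range(100):
    db.learn("meeting agenda", is_spam=False)
    db.learn("cheap pills", is_spam=True)
for _ in range(10):
    db.learn("special offer today", is_spam=False)
    db.learn("special offer pills", is_spam=True)

before = db.token_spamminess("offer")
db.learn("special offer", is_spam=True)   # learn one more spam message
after = db.token_spamminess("offer")
print(before, after)
```

In this toy model the token "offer" moves only slightly above 0.5 after one extra spam message, which mirrors the point that one learned message does not flip a classification.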

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





Re: bayes db version error

2005-02-08 Thread Matias Lopez Bergero
Michael Parker wrote:
> On Tue, Feb 08, 2005 at 05:28:51PM -0300, Matias Lopez Bergero wrote:
>>> 2) Throttle the calls to spamd to reduce lock contention.
>>
>> Sorry to ask this again, but I'm not native English speaker :-P
>> Did you mean to increase the number of spamd children processes, like
>> spamd -m x? I have currently set spamd -m 10.
>
> No, it means to slow/reduce the calls to spamd.  Increasing the number
> of children will probably make the problem get worse.

OK, I'm using milter-spamc to talk with sendmail milter and pass the
messages to spamd. For what I know, there is no way to control the calls
to spamd from the milter-spamc command.
I would have to reduce the spamd child processes or increase the milter
timeout in order to reduce the calls to spamd right?

>>> 3) Switch to SQL based bayes which won't (well shouldn't) have that
>>>    issue.
>>
>> That's an interesting idea.
>> I'm going to keep that in mind :)
>
> You can view the notes/slides from my ApacheCon presentation on
> Storing SpamAssassin User Data in SQL Databases here:
> http://www.apache.org/~parker/presentations/
> Hopefully it will help move things along.

That's good,
Thank you very much Michael.
BR,
Matías.


Re: bayes db version error

2005-02-08 Thread Michael Parker
On Tue, Feb 08, 2005 at 05:28:51PM -0300, Matias Lopez Bergero wrote:
> 
> >2) Throttle the calls to spamd to reduce lock contention.
> 
> Sorry to ask this again, but I'm not native English speaker :-P
> Did you mean to increase the number of spamd children processes, like 
> spamd -m x? I have currently set spamd -m 10.
> 

No, it means to slow/reduce the calls to spamd.  Increasing the number
of children will probably make the problem get worse.

> 
> >3) Switch to SQL based bayes which won't (well shouldn't) have that
> >   issue.
> 
> That's an interesting idea.
> I'm going to keep that in mind :)

You can view the notes/slides from my ApacheCon presentation on
Storing SpamAssassin User Data in SQL Databases here:
http://www.apache.org/~parker/presentations/

Hopefully it will help move things along.

Michael




Re: bayes db version error

2005-02-08 Thread Matias Lopez Bergero
Michael Parker wrote:
> On Tue, Feb 08, 2005 at 04:37:50PM -0300, Matias Lopez Bergero wrote:
>> I'm seeing a lot of messages about and version error in the bayes db in
>> my log file:
>>
>> spamd[6562]: bayes: bayes db version 0 is not able to be used, aborting!
>> at /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm
>> line 160.
>
> I'm assuming you're running sitewide bayes (or at least running as a
> single user) and on a somewhat busy server.

Yes, I forgot to say that. I'm running a sitewide install with about
6 incoming messages per day.

> That error message is a pretty good indication that SA couldn't get a
> lock on the bayes db files.  It's actually just a warning, not an
> error, and it may or may not have actually aborted.
> You'll see this on a setup that is getting a good amount of traffic
> and using shared/sitewide bayes db files.
> Several things you can try:
> 1) If you db files aren't on an NFS filesystem switch your lock_method
>    to flock (the default is nfssafe).  If your shared db files are on
>    an NFS filesystem then consider moving them off and switch your
>    lock_method.

Done.

> 2) Throttle the calls to spamd to reduce lock contention.

Sorry to ask this again, but I'm not native English speaker :-P
Did you mean to increase the number of spamd children processes, like
spamd -m x? I have currently set spamd -m 10.

> 3) Switch to SQL based bayes which won't (well shouldn't) have that
>    issue.

That's an interesting idea.
I'm going to keep that in mind :)
Thanks a lot Michael
BR,
Matías.


Re: bayes db version error

2005-02-08 Thread Michael Parker
On Tue, Feb 08, 2005 at 04:37:50PM -0300, Matias Lopez Bergero wrote:
> I'm seeing a lot of messages about and version error in the bayes db in 
> my log file:
> 
> spamd[6562]: bayes: bayes db version 0 is not able to be used, aborting! 
> at /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm 
> line 160.
> 

I'm assuming you're running sitewide bayes (or at least running as a
single user) and on a somewhat busy server.

That error message is a pretty good indication that SA couldn't get a
lock on the bayes db files.  It's actually just a warning, not an
error, and it may or may not have actually aborted.

You'll see this on a setup that is getting a good amount of traffic
and using shared/sitewide bayes db files.

Several things you can try:

1) If your db files aren't on an NFS filesystem, switch your lock_method
   to flock (the default is nfssafe).  If your shared db files are on
   an NFS filesystem, then consider moving them off and switching your
   lock_method.

2) Throttle the calls to spamd to reduce lock contention.

3) Switch to SQL based bayes which won't (well shouldn't) have that
   issue.
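
One way to read "throttle" in option 2 is to cap how many messages are handed to spamd at the same time. The Python sketch below shows the general idea with a semaphore; it is not a feature of spamd or of any milter, the limit of 2 is an arbitrary example value, and scan_message is a placeholder for whatever actually calls spamd.

```python
import threading
import time

MAX_CONCURRENT_SCANS = 2   # hypothetical limit; tune for your own server
scan_slots = threading.BoundedSemaphore(MAX_CONCURRENT_SCANS)

active = 0
peak = 0
counter_lock = threading.Lock()

def scan_message(message):
    """Stand-in for handing one message to spamd (for example via spamc)."""
    global active, peak
    with scan_slots:                 # blocks while all slots are busy
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)             # placeholder for the real scan
        with counter_lock:
            active -= 1

threads = [threading.Thread(target=scan_message, args=(f"msg {i}",))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent scans:", peak)
```

Fewer simultaneous scans means fewer processes competing for the lock on the shared bayes files, at the cost of messages waiting a little longer in the queue.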

> 
> Could this be affecting the spam filtering?
> 

In theory, things should filter just fine, you just won't get BAYES
results.  If you're seeing something different then it's probably a
bug.

Michael




Re: bayes db - export/import

2005-01-31 Thread Nix
On Fri, 28 Jan 2005, Justin Mason stated:
> Rodney Green writes:
>> I'd like to copy the bayes db to the temporary mail server so it can
>> continue to be used and continue learning.
>> 
>> Will I need to do some special export/import procedure or will I be
>> able to just copy the db files into the directory, set permissions and
>> be good to go?
> 
> If it's the same architecture, and the same OS release, you can
> probably just copy.   For safety I'd recommend using sa-learn --backup
> and --restore.

You shouldn't need to do that. Berkeley DB databases are byte-order-
independent (well, they can be read and written by machines with any
byte order), and the things SA puts in them are byte-order-independent
too.

As evidence, I'm sharing a Bayes database between an i586, two
UltraSPARCs, GNU/Linux (x2) and Solaris (x1) with no trouble. That's
multiple architectures and multiple OSes; no problems are evident. :)


If you *do* need to do a backup-and-restore, it's a sign that
something's compromised the byte-order-independence of what's being put
in there: probably the wrong string as argument to a pack() in
BayesStore.
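
The byte-order point can be illustrated with Python's struct module, which plays a similar role to Perl's pack(). This is a sketch of the general idea only; the thread does not say which pack templates BayesStore actually uses, and the value below is arbitrary.

```python
import struct

value = 1078099200  # an arbitrary 32-bit unsigned value

big = struct.pack(">I", value)      # explicit big-endian: same bytes on every CPU
little = struct.pack("<I", value)   # explicit little-endian: same bytes on every CPU
native = struct.pack("=I", value)   # host byte order: differs between x86 and SPARC

# Data written with an explicit order reads back correctly anywhere.
assert struct.unpack(">I", big)[0] == value
print(big.hex(), little.hex(), native.hex())
```

A database written with an explicit byte order can be copied between architectures as-is, while one written in host order would only read back correctly on a machine with the same endianness.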

-- 
`Blish is clearly in love with language. Unfortunately,
 language dislikes him intensely.' --- Russ Allbery


Re: bayes db - export/import

2005-01-28 Thread Rodney Green
On Fri, 28 Jan 2005 11:48:32 -0800, Justin Mason <[EMAIL PROTECTED]> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> 
> Rodney Green writes:
> > Hello,
> >
> > I'm setting up a temporary mail server so I can do some work on the
> > regular production machine, without interrupting service.
> >
> > I'd like to copy the bayes db to the temporary mail server so it can
> > continue to be used and continue learning.
> >
> > Will I need to do some special export/import procedure or will I be
> > able to just copy the db files into the directory, set permissions and
> > be good to go?
> 
> If it's the same architecture, and the same OS release, you can
> probably just copy.   For safety I'd recommend using sa-learn --backup
> and --restore.
> 

Thanks Justin. I'll use sa-learn --backup and --restore. 

Rod


Re: bayes db - export/import

2005-01-28 Thread Justin Mason
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1


Rodney Green writes:
> Hello,
> 
> I'm setting up a temporary mail server so I can do some work on the
> regular production machine, without interrupting service.
> 
> I'd like to copy the bayes db to the temporary mail server so it can
> continue to be used and continue learning.
> 
> Will I need to do some special export/import procedure or will I be
> able to just copy the db files into the directory, set permissions and
> be good to go?

If it's the same architecture, and the same OS release, you can
probably just copy.   For safety I'd recommend using sa-learn --backup
and --restore.

- --j.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFB+pcQMJF5cimLx9ARAronAJ9R00cpm3kAZoa143nNRLfCU/AV8wCcC5oJ
TDOiZ5cFN5j+yk5pzrcRHAc=
=hs43
-END PGP SIGNATURE-



RE: Bayes DB Get Corrupted Quickly

2004-12-06 Thread Gray, Richard
Hi Tim,

The script I sent you dumps the tokens out to a text file because SA
stores them in a Berkeley DB format. If you want to do it in place then
just have a look at the script and edit the appropriate values. If you
get really desperate then the two processes (encoding and decoding) are
essentially just perl functions. Call one and then the other in the same
start function and it will do the whole thing.

With regards to the atime problem that you have, look at this section of
code

# drop tokens last accessed before 1078099200 (2004-03-01 00:00:00 UTC)
if ($atime < 1078099200)
{
#  print STDERR "\nThrowing away key that is too old:\n$k$ts$th$atime\n";
  print STDERR '*';
  $droppedcount++;
  next;
}

# reset atime if it is in the future
if ($atime > time)
{
#  print STDERR "\nResetting atime of key in the future:\n$k$ts$th$atime\n";
  print STDERR 'o';
  $atime = time;
  $resetcount++;
}

I had to write the first if statement because after removing future
tokens the DB still wouldn't expire. It was only when I pulled out the
really old tokens as well that it worked. You might need to change these
values in order to be successful.

Just change the values to accept/deny tokens into your new database
based on date. We run an almost identical system to yours and this
worked for us.
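
For readers who want to experiment with the same filtering logic outside the Perl script, here is a rough Python sketch. The function name clean_atimes and the tuple layout (token, spam count, ham count, atime) are assumptions made for illustration; only the cutoff value and the two rules come from the snippet above.

```python
import time

OLDEST_ALLOWED = 1078099200  # cutoff from the Perl snippet (2004-03-01 UTC)

def clean_atimes(tokens, now=None):
    """Drop tokens older than the cutoff; clamp future atimes to 'now'.

    tokens: iterable of (token, spam_count, ham_count, atime) tuples.
    Returns (kept_tokens, dropped_count, reset_count).
    """
    now = int(time.time()) if now is None else now
    kept, dropped, reset = [], 0, 0
    for token, spam_count, ham_count, atime in tokens:
        if atime < OLDEST_ALLOWED:
            dropped += 1          # too old: leave it out of the new database
            continue
        if atime > now:
            atime = now           # in the future: pull it back to the present
            reset += 1
        kept.append((token, spam_count, ham_count, atime))
    return kept, dropped, reset

sample = [
    ("a", 1, 0, 1000),           # far too old
    ("b", 2, 1, 1100000000),     # within range
    ("c", 0, 3, 4000000000),     # in the future
]
kept, dropped, reset = clean_atimes(sample, now=1102291200)
print(kept, dropped, reset)
```

As the message says, the cutoff probably needs adjusting to suit the age of your own database.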

You should read through the whole Perl script though, because there is
nothing more dangerous than executing someone else's code without
understanding what it is going to do. I know there is nothing bad in the
script, but do you?










Re: bayes db version 2 is not able to be used, aborting!

2004-11-14 Thread Kai Schaetzl
You sent three messages to the list in a row without indicating 
if the earlier problems are solved or if the three are actually 
connected. Don't you think that your problems are connected 
somehow? You seem to have upgraded from SA 2.6x to 3.0. I assume 
you either have a mixed setup now or you need to upgrade some 
Perl modules SA depends on. You should start over.
Why don't you put all this in a broader perspective and give 
more information (f.i. which way you upgraded from what etc.)?


Kai

-- 

Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: 
http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





Re: Bayes DB seemingly corrupted during v2 to v3 upgrade

2004-09-25 Thread Michael Parker
On Sat, Sep 25, 2004 at 12:59:57PM -0500, Jeremy M. Dolan wrote:
> Hi all. Hoping someone might be able to help me out here. Just
> upgraded from 2.6x to 3.0.0 this morning, and, though I followed the
> Bayes DB upgrade steps in the UPGRADE file to a T, my token names all
> seem to be garbage now.
> 
> Here's a few lines of the output from "sa-learn --dump all":
> 
> 0.560 21  3 1094789733  dc60473720
> 0.992  6  0 1090849205  20d2b3d689
> 0.958  1  0 1092129562  23c375c031
> 0.998 20  0 1095699812  cc75bc02df
> 

We no longer store the raw token value in the database, instead it is
a hashed value.  There is a small blurb about this in UPGRADE.

The values in the dump are actually hex representations of the binary
values stored in the database.
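
As a rough illustration of why the dump shows ten hex digits per token, the Python sketch below hashes a token and keeps a five-byte slice of the digest. The message above only says that tokens are hashed; the use of SHA-1, the choice of the last five bytes, and the function name token_id are assumptions picked to match the width seen in the dump.

```python
import hashlib

def token_id(token, length=5):
    """Hex form of a truncated SHA-1 digest (assumed scheme, see above)."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return digest[-length:].hex()   # 5 bytes -> 10 hex characters

print(token_id("viagra"))
```

Because the hash is one-way, the original token text cannot be read back from the dump, which is why the output looks like garbage after the upgrade.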

So, relax, your database is fine.

Michael


RE: bayes db problem upgrade from 2.63 --> 3.0

2004-09-22 Thread Greg Deputy
From the MailScanner list: run sa-learn with the --sync option to
rebuild the bayes db; that seems to fix the problem.  I was having
the exact same issues, did that, and now the error about the bayes db is
no longer there and I can see BAYES rules in action in the logs.

Debug output now says:

debug: bayes: 22150 tie-ing to DB file R/O
/home/pfuser/.spamassassin/bayes_toks
debug: bayes: 22150 tie-ing to DB file R/O
/home/pfuser/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: Score set 3 chosen.

I did have to reset the ownership of the bayes files from root to the
postfix user, but that's probably just something on my setup.


> -Original Message-
> From: Nichols, William [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, September 22, 2004 11:18 AM
> To: users@spamassassin.apache.org
> Subject: RE: bayes db problem upgrade from 2.63 --> 3.0
> 
> 
> I am using SUSE 9.1
> 
> I cannot install berkely db it dies, can't install db_file it dies, 
> 
> Any ideas" SA is working, just no bayes
> 
> 
> -Original Message-
> From: Rick Macdougall [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, September 22, 2004 11:10 AM
> To: users@spamassassin.apache.org
> Subject: Re: bayes db problem upgrade from 2.63 --> 3.0
> 
> 
> 
> Nichols, William wrote:
> > Tried that as well before mailing the list - I am having a problem with
> > install DB_File
> > 
> > version.c:30:16: db.h: No such file or directory
> > make: *** [version.o] Error 1
> >   /usr/bin/make  -- NOT OK
> > Running make test
> >   Can't test without successful make
> > Running make install
> >   make had returned bad status, install seems impossible
> > 
> > hmmm - any help?
> 
> Install berkley db ?  and if you are using redhat the devel 
> rpm as well 
> (if there is one, dunno, I don't do redhat).
> 
> Regards,
> 
> Rick
> 
> 
> 



RE: bayes db problem upgrade from 2.63 --> 3.0

2004-09-22 Thread Nichols, William
I am using SUSE 9.1

I cannot install Berkeley DB (it dies), and I can't install DB_File (it dies).

Any ideas? SA is working, just no Bayes.


-Original Message-
From: Rick Macdougall [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 22, 2004 11:10 AM
To: users@spamassassin.apache.org
Subject: Re: bayes db problem upgrade from 2.63 --> 3.0



Nichols, William wrote:
> Tried that as well before mailing the list - I am having a problem with
> install DB_File
> 
> version.c:30:16: db.h: No such file or directory
> make: *** [version.o] Error 1
>   /usr/bin/make  -- NOT OK
> Running make test
>   Can't test without successful make
> Running make install
>   make had returned bad status, install seems impossible
> 
> hmmm - any help?

Install berkley db ?  and if you are using redhat the devel rpm as well 
(if there is one, dunno, I don't do redhat).

Regards,

Rick



RE: bayes db problem upgrade from 2.63 --> 3.0

2004-09-22 Thread Greg Deputy
I'm having the exact same problem.  Upgraded to 3.0, everything seems ok
except the Bayes DB error.  Tried 

perl -MCPAN -e "install DB_File"

And got same results as William below.  SA is working, but no bayes is
happening anymore.

> --  ORIGINAL MESSAGE
> From: Nichols, William  ci.redding.ca.us>
> Subject: RE: bayes db problem upgrade from 2.63 --> 3.0
> Newsgroups: gmane.mail.spam.spamassassin.general
> Date: Wed, 22 Sep 2004 10:39:40 +
> 
> Tried that as well before mailing the list - I am having a problem with
> install DB_File
> 
> version.c:30:16: db.h: No such file or directory
> make: *** [version.o] Error 1
>   /usr/bin/make  -- NOT OK
> Running make test
>   Can't test without successful make
> Running make install
>   make had returned bad status, install seems impossible
> 
> hmmm - any help?
> 
> -Original Message-
> From: Rick Macdougall [mailto:rickm  nougen.com]
> Sent: Wednesday, September 22, 2004 10:35 AM
> To: Nichols, William
> Cc: users  spamassassin.apache.org
> Subject: Re: bayes db problem upgrade from 2.63 --> 3.0
> 
> 
> Nichols, William wrote:
> > I installed spamassassin (test box) from cpan over my existing SA 2.63
> > 
> > 
> > When I try to sa-learn -sync I get the following "bayes db version 2 is
> > not able to be used, aborting! at
> > /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/BayesStore/DBM.pm line 160"
> 
> Same problem till I upgraded DB_File
> 
> perl -MCPAN -e "install DB_File"
> 
> on a side note, I also needed to install Storable as well.
> 
> Regards,
> 
> Rick
> 




Re: bayes db problem upgrade from 2.63 --> 3.0

2004-09-22 Thread Rick Macdougall

Nichols, William wrote:
> Tried that as well before mailing the list - I am having a problem with
> install DB_File
>
> version.c:30:16: db.h: No such file or directory
> make: *** [version.o] Error 1
>   /usr/bin/make  -- NOT OK
> Running make test
>   Can't test without successful make
> Running make install
>   make had returned bad status, install seems impossible
>
> hmmm - any help?

Install Berkeley DB? And if you are using Red Hat, install the devel RPM as well
(if there is one, dunno, I don't do Red Hat).

Regards,
Rick


RE: bayes db problem upgrade from 2.63 --> 3.0

2004-09-22 Thread Nichols, William
Tried that as well before mailing the list - I am having a problem with
install DB_File

version.c:30:16: db.h: No such file or directory
make: *** [version.o] Error 1
  /usr/bin/make  -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

hmmm - any help?

-Original Message-
From: Rick Macdougall [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 22, 2004 10:35 AM
To: Nichols, William
Cc: users@spamassassin.apache.org
Subject: Re: bayes db problem upgrade from 2.63 --> 3.0



Nichols, William wrote:
> I installed spamassassin (test box) from cpan over my existing SA 2.63
> 
> 
> When I try to sa-learn -sync I get the following "bayes db version 2 is
> not able to be used, aborting! at
> /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/BayesStore/DBM.pm line 160"

Same problem till I upgraded DB_File

perl -MCPAN -e "install DB_File"

on a side note, I also needed to install Storable as well.

Regards,

Rick



Re: bayes db problem upgrade from 2.63 --> 3.0

2004-09-22 Thread Rick Macdougall

Nichols, William wrote:
> I installed spamassassin (test box) from cpan over my existing SA 2.63
>
> When I try to sa-learn --sync I get the following “bayes db version 2 is
> not able to be used, aborting! at
> /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/BayesStore/DBM.pm line 160”

Same problem till I upgraded DB_File

perl -MCPAN -e "install DB_File"

on a side note, I also needed to install Storable as well.

Regards,
Rick