Re: Bayes DB on single-node MySQL cluster
On 7/26/10 5:40 PM, Paul Hirose wrote:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5998

I just added a comment.

> Hmm, I'm still on 5.0.77 (the basic RHEL5 repository version.) Do you know which plugin?

The InnoDB plugin for MySQL. It supports compression on InnoDB databases, etc. I am ASSUMING that most of your performance issues are reads/writes to disk, not CPU? If so, compression will help. What kind of volume are you seeing before the drop-off?

> There's a tangential bug #4508 which sends writes to one host and reads to another (presumably for master/slave setups) which I look forward to in a future version of SA as well.

I'd like to see it be resilient: allow us to put in more than one hostname.

-- Michael Scheidell, CTO, SECNAP Network Security Corporation
Re: Bayes DB on single-node MySQL cluster
>> RHEL5.5, MySQL GA 5.0.77, MySQL Cluster 7.1.4b, 64bit, SpamAssassin 3.2.5
>> (but hoping to go to 3.3.1 soon.)
>> In short, I stumbled across:
>> http://www.clusterdb.com/mysql-cluster/how-can-a-database-be-in-memory-and-durable-at-the-same-time/
>> which essentially shows how to create a MySQL Cluster, but of only one node. This
>> gets me an all-in-memory database *and* row-level locking. Sorta
>> the best of both worlds, compared to using Heap/Memory vs InnoDB engine.
>> Has anyone tried this, and did it work for you?
>
> and if you have a 3.5GB bayes database, don't you need 3.5GB ram?

Yep, and we're running on 8GB systems, and have innodb_buffer_pool_size set upwards of 4GB (or max_heap_table_size the same, if we're trying this in Memory/Heap engine instead.)

> where is that bugzilla report? I might have a solution for it.

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5998

>> Given this, I know there are folks using m/m-replication, and have seen
>> reference to various threads. So far, I haven't see anyone post a glaring
>> example about how it failed or anything, but I'm still a touch shy about
>> going against the devs :)
>
> biggest issues seem to be, you need a 5.1.47 or newer mysql, and I think you
> want to use the plugin (i think).
> still get deadlocks while multi threads are trying to update the bayes DB.
> but if you 'swatch' it, maybe you just retry?
> or, heck, its just bayes, who care? the spammers will hit you again (and if
> you got the deadlock, they did)

Hmm, I'm still on 5.0.77 (the basic RHEL5 repository version.) Do you know which plugin? I'm just using BayesStore::MySQL if that's what you mean. If there's something else, I'd appreciate any tips.

All in all, the anecdotal gist of random searches seems to be that m/m-replication basically works, and if it really does blow up, emptying and starting it clean is perfectly fine.

There's a tangential bug #4508 which sends writes to one host and reads to another (presumably for master/slave setups) which I look forward to in a future version of SA as well.

For us, this really all started because of some performance drop-off at crossing a certain load threshold. Don't know why yet :( and we're looking into that too. This alternative just kinda crossed our radar during our investigations into our InnoDB/Memory set up failing on us.

PH
==
Paul Hirose
pthir...@ucdavis.edu
Re: Bayes DB on single-node MySQL cluster
On 7/26/10 5:02 PM, Paul Hirose wrote:
> RHEL5.5, MySQL GA 5.0.77, MySQL Cluster 7.1.4b, 64bit, SpamAssassin 3.2.5 (but hoping to go to 3.3.1 soon.)
> In short, I stumbled across:
> http://www.clusterdb.com/mysql-cluster/how-can-a-database-be-in-memory-and-durable-at-the-same-time/
> which essentially shows how to create a MySQL Cluster, but of only one node. This gets me an all-in-memory database *and* row-level locking. Sorta the best of both worlds, compared to using Heap/Memory vs InnoDB engine. Has anyone tried this, and did it work for you?

And if you have a 3.5GB bayes database, don't you need 3.5GB of RAM?

Where is that bugzilla report? I might have a solution for it.

> There've been threads against using master/master replication or cluster, and a couple bugzilla entries specifically state cluster/replication is "unsafe". I think the main reason behind this is simply the duplication of data, and clear example was given in one bugzilla report. But if I do a single-node cluster (only one data/MySQL node), then there are no copies of data. Thus, it can't get out of sync, because there's nothing else to get out of sync with. Would this then be "safe"? Or is there something inherent in the clustering/replication that just doesn't work?
>
> Given this, I know there are folks using m/m-replication, and have seen reference to various threads. So far, I haven't see anyone post a glaring example about how it failed or anything, but I'm still a touch shy about going against the devs :)

The biggest issues seem to be that you need MySQL 5.1.47 or newer, and I think you want to use the plugin. You still get deadlocks while multiple threads are trying to update the bayes DB, but if you 'swatch' it, maybe you just retry? Or, heck, it's just bayes, who cares? The spammers will hit you again (and if you got the deadlock, they did).

-- Michael Scheidell, CTO, SECNAP Network Security Corporation
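The "maybe you just retry" idea above can be sketched roughly as follows. This is only an illustration in Python, not anything SpamAssassin does itself: DeadlockError, retry_on_deadlock and the attempts/delay parameters are hypothetical names, and a real setup would catch whatever exception its MySQL driver raises for a deadlock.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver-specific deadlock exception."""


def retry_on_deadlock(operation, attempts=3, delay=0.0):
    """Call operation(); on a deadlock, wait and retry up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == attempts:
                raise  # give up: it's just bayes, the next message will retrain it
            time.sleep(delay)


if __name__ == "__main__":
    calls = []

    def flaky_update():
        # Simulates a token update that deadlocks twice, then succeeds.
        calls.append(1)
        if len(calls) < 3:
            raise DeadlockError("deadlock found when trying to get lock")
        return "stored"

    print(retry_on_deadlock(flaky_update), "after", len(calls), "tries")
```

Giving up after a few attempts matches the point made above: a lost Bayes update is cheap, so the retry loop does not need to be persistent.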
Re: Bayes db and token expiry questions
Hi,

>> Well, what's the missing 120 MB? The journal? Do a complete sync and
>> then delete it.
>
> Probably the signatures in bayes_seen - there's no mechanism for ageing
> them out.

And I assume that isn't a problem then?

>> "too big" is not an absolute figure. If you store 1-occurence tokens
>> you will obviously have more tokens than without them.
>
> There's not really a choice since all tokens start that way.

Maybe a better estimate would be in terms of time. For how long should the unseen tokens (only occurred once, I guess) remain in the database? Perhaps that's a good metric. For me it's about a week now.

>> You should use autolearn if you don't do yet.
>
> Autolearning can make things worse by dropping the retention period.

Yes, I'm using autolearn, but how does that affect the retention period? What do the two have to do with each other? Do you mean auto-expire, not auto-learn?

My database seems to have improved slightly over the past few days after increasing the max db size to 1.6M. I guess there is also a lot of expiry pending, because the database is currently much larger than that today:

    0.000          0    2050481          0  non-token data: ntokens

Looks like about 345k to be purged, if I understand correctly?

Thanks,
Alex
Re: Bayes db and token expiry questions
On Mon, 29 Mar 2010 13:03:59 +0200 Kai Schaetzl wrote: > Alex wrote on Sun, 28 Mar 2010 13:38:25 -0400: > > > I have a bayes db that's about 160MB with a 40MB token db on a > > system with about 100k messages per day. > > Well, what's the missing 120 MB? The journal? Do a complete sync and > then delete it. Probably the signatures in bayes_seen - there's no mechanism for ageing them out. > You should be > aware that the expiry kicks in at 75%, not at 100% of max_db_size. And it may reduce the tokens to 37.5% of nominal > I suggest you change to SQL. This eliminates the journal. Isn't that slower than journalled db? > > database was too big, so I lowered it back down, but I think that > > was a mistake. > > "too big" is not an absolute figure. If you store 1-occurence tokens > you will obviously have more tokens than without them. There's not really a choice since all tokens start that way. > You should use autolearn if you don't do yet. Autolearning can make things worse by dropping the retention period.
Re: Bayes db and token expiry questions
Alex wrote on Sun, 28 Mar 2010 13:38:25 -0400:

> I have a bayes db that's about 160MB with a 40MB token db on a system
> with about 100k messages per day.

Well, what's the missing 120 MB? The journal? Do a complete sync and then delete it.

> I've just raised the max_db_size set
> to 1.1M tokens (there are currently 1.06M tokens in there).

That's not much for a system with 100.000 messages a day. I don't mean it's not sufficient, it is just not "too much". You should be aware that the expiry kicks in at 75%, not at 100% of max_db_size.

> I've also
> changed bayes to write to the journal instead of directly to the
> database and just checking it periodically to see if the journal needs
> to be synced.

I suggest you change to SQL. This eliminates the journal.

> Can someone explain to me the relationship between the frequency of
> "1-occurrence tokens" and the size of the database? Here is the output
> from a recent manual sync:
>
> token frequency: 1-occurrence tokens: 72.60%
> token frequency: less than 8 occurrences: 18.11%

Because the tokens are seen only once, it probably means you get a lot of fresh tokens in. Do you autolearn?

> I was thinking that the
> database was too big, so I lowered it back down, but I think that was
> a mistake.

"Too big" is not an absolute figure. If you store 1-occurrence tokens you will obviously have more tokens than without them. If you slash the db (which slashes from all tokens, not just the 1-occurrence ones) and the performance goes down afterwards, that was obviously a wrong decision ;-) I don't know if and how this is reflected in the size of the database itself. This is a DBM database, which will have certain sizes by design no matter how many tokens are in it. If the token database is only 40 MB, that is not overly large, it's normal.

> Now some of the same emails are continually hitting only
> BAYES_50 while others seemingly the same hit BAYES_99. I've now raised
> the number of tokens available and continue to manually train the
> database with spam and ham (there are about 1.1M spam and 500k ham
> currently).

You should use autolearn if you don't already. If you want to be safe you can change the learning thresholds to safer values. (I think I use 8 for spam and keep the default for ham.)

> Have I configured something wrong, or am I misunderstanding how this
> works? Is there something else I should read?

I think your db was ok as it was. You should read how to change to SQL ;-) Do the expiry once per night via cron.

Kai

-- Get your web at Conactive Internet Services: http://www.conactive.com
Re: Bayes DB growing without bound; expiry not working
On Apr 21, 2008, at 8:40 AM, Chris St. Pierre wrote:
> On Mon, 21 Apr 2008, Michael Parker wrote:
>> select * from bayes_vars;
>
> ... 2289 rows in set (0.00 sec)
>
>> What user do you run bayes under on your MXs?
>
> I think you've found the issue. We run as spamd.
>
> # sa-learn -u spamd --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0    1492123          0  non-token data: nspam
> 0.000          0     660634          0  non-token data: nham
> 0.000          0   73178711          0  non-token data: ntokens
> 0.000          0 1189775610          0  non-token data: oldest atime
> 0.000          0 1208785034          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
>
> That leads to two issues:
>
> 1. I need to straighten things out and figure out why I've got a
> strange mix of per-user and global data in my Bayes DB. Whee.

You should use the bayes override username if you want global and then just sa-learn -u clear everything else (PITA, I know). I personally don't believe individual bayes dbs are an issue, if you've got the space and CPU on your database machine. See below for some solutions.

> 2. Does this mean that, if I use per-user Bayes, I have to run
> expiration as each user individually? Manual expiration was
> recommended to me a long time ago as a way to increase database
> performance, but it seems like it may not be worth it if I have to run
> N forced expirations, for potentially large values of N.

This is true for DBM based bayes databases, but generally (with an exception I'll talk about in a second) MySQL based bayes expiration is very fast (just a few seconds). I would go ahead and turn auto-expire on, after running a manual expire to clear out the current backlog.

One reason that expiration slows down is an unoptimized db. I've found for my small uses if I run optimization every couple of weeks I get much better performance. It looks like you get a lot more traffic so I would recommend running it more often. With frequent optimizations and auto-expire your database will stay in much better shape.

Michael

> Thanks for your help.
>
> Chris St. Pierre
> Unix Systems Administrator
> Nebraska Wesleyan University
Re: Bayes DB growing without bound; expiry not working
On Mon, 21 Apr 2008, Michael Parker wrote:
> select * from bayes_vars;

... 2289 rows in set (0.00 sec)

> What user do you run bayes under on your MXs?

I think you've found the issue. We run as spamd.

# sa-learn -u spamd --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0    1492123          0  non-token data: nspam
0.000          0     660634          0  non-token data: nham
0.000          0   73178711          0  non-token data: ntokens
0.000          0 1189775610          0  non-token data: oldest atime
0.000          0 1208785034          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

That leads to two issues:

1. I need to straighten things out and figure out why I've got a strange mix of per-user and global data in my Bayes DB. Whee.

2. Does this mean that, if I use per-user Bayes, I have to run expiration as each user individually? Manual expiration was recommended to me a long time ago as a way to increase database performance, but it seems like it may not be worth it if I have to run N forced expirations, for potentially large values of N.

Thanks for your help.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University
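For anyone who wants to watch these counters from a script, here is a minimal Python sketch that turns the "--dump magic" output quoted above into a dictionary. It assumes the column layout seen in this thread (four numeric columns, with the value of interest in the third); parse_dump_magic and SAMPLE are names made up for the illustration, not part of SpamAssassin.

```python
# Abbreviated copy of the dump quoted in this thread, used as sample input.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0    1492123          0  non-token data: nspam
0.000          0     660634          0  non-token data: nham
0.000          0   73178711          0  non-token data: ntokens
0.000          0          0          0  non-token data: last expiry atime
"""


def parse_dump_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    stats = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue  # skip token rows and blank lines
        fields, label = line.split("non-token data:", 1)
        columns = fields.split()
        if len(columns) != 4:
            continue  # layout differs from the one assumed here
        stats[label.strip()] = int(columns[2])
    return stats


if __name__ == "__main__":
    stats = parse_dump_magic(SAMPLE)
    print("tokens:", stats["ntokens"])
    print("expiry has never run:", stats["last expiry atime"] == 0)
```

A "last expiry atime" of 0 next to tens of millions of tokens is the symptom being discussed here: expiry has never completed for that user.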
Re: Bayes DB growing without bound; expiry not working
On Apr 21, 2008, at 8:17 AM, Chris St. Pierre wrote:
> Consequently, my database is growing, apparently without bound. Any
> ideas how I can get expiry to work properly again? (Hopefully without
> completely dumping the database?)

select * from bayes_vars;

What user do you run bayes under on your MXs?

Michael
Re: Bayes DB file locations help
got2go wrote:
> Hello all,
>
> I am trying to get Bayes working on CentOS 4.3 with Postfix, MailScanner,
> IMP (with Spam reporting feature).

Check your /etc/mail/spamassassin/mailscanner.cf. (If you don't have one, your MailScanner is ancient.)

If you've got this line:

    bayes_path /var/spool/MailScanner/spamassassin/bayes

then that's what you're using for everything. If it's commented out, then MailScanner should be using /root/.spamassassin/. IMP might be using /var/www.

If you have no /etc/mail/spamassassin/mailscanner.cf, and a REALLY old MailScanner, check your spam.assassin.prefs.conf for a bayes_path. If that's the case then:

- locally logged in as root is using /root/.spamassassin
- IMP might be using /var/www/
- MailScanner is probably using spam.assassin.prefs.conf, which probably has the /var/spool/MailScanner bayes_path.
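The "is bayes_path set or commented out" check described above is easy to get wrong by eye, so here is a small Python sketch of it. It is only an illustration: find_bayes_path is a made-up helper, the parsing is deliberately naive (it treats everything after "#" as a comment), and it assumes that a later setting overrides an earlier one.

```python
def find_bayes_path(config_text):
    """Return the active bayes_path from config text, or None if unset."""
    found = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "bayes_path":
            found = parts[1].strip()  # assumption: the last setting wins
    return found


if __name__ == "__main__":
    active = "bayes_path /var/spool/MailScanner/spamassassin/bayes\n"
    commented = "# bayes_path /var/spool/MailScanner/spamassassin/bayes\n"
    print(find_bayes_path(active))     # the MailScanner spool path
    print(find_bayes_path(commented))  # None: fall back to ~/.spamassassin
```

In practice you would feed it the contents of mailscanner.cf (or spam.assassin.prefs.conf on old installs) and compare the result against where sa-learn is writing.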
Re: Bayes DB
Ok it looks like using sa-learn created the databases fine even with only 1 ham/spam...
Re: Bayes DB
I didn't even realize my replies were not being sent to the thread I started... Sorry!
RE: Bayes DB
Daniel Aquino wrote: > > run these commands as the defang user. > > Would it be bad to use "root" because defang is not a real user.. "spamd" will not run as root. If you try it, it will switch to "nobody". You can deal with this two ways: If your mail accounts are owned by real users on the system, you can let SA run as the user you are delivering to. In this case, you must make sure that all of your users have read/write access to the Bayes files. If all of your mail accounts are owned by a single user, you can tell spamd which user to run as and set the ownership of the Bayes files to that user. If you are calling "spamassassin" directly, you will need to switch to the correct user yourself and then run it. For spamd, it looks like this: spamd -u mailacct -- Bowie
RE: Bayes DB
Daniel Aquino wrote: > I really don't know if I can extract emails from Outlook 2003 into a > standard mbox format... Maildir is the preferred format. You can extract emails from Outlook, but Outlook and Exchange tend to rewrite portions of the message which makes this less than ideal for SA's purposes. > So I'm thinking "after" I install this gateway, I could set it up to > trap incoming messages some how and collect 200 spam/ham to train the > bayes db... That is what I am doing for the accounts that I control. The gateway sorts the mail into ham and spam folders on the server and also forwards it along to Exchange. Every day or so, I scan through the ham and spam folders to make sure the messages are classified properly and then run sa-learn on the directories. -- Bowie
RE: Bayes DB
Daniel Aquino wrote:
> > 1) What (exactly) did you do?
>
> # local.cf config file at this url
> http://pastie.caboo.se/60756
>
> > What user is SA running as? What are the permissions on the bayes
> > directory?
>
> drwx------ 2 defang defang 4096 2007-05-11 10:48 /var/spool/MD-Databases/
>
> > 2) What (exactly) was the result?
>
> ls /var/spool/MD-Databases/
> auto-whitelist* auto-whitelist.mutex
>
> > Why do you say it didn't work?
>
> I would like to see the bayes_* db files show up in
> /var/spool/MD-Database
> I sent a few spam messages through SpamAssassin and it created the
> whitelist, db files but not the bayes files...

Please post replies to the list so that others can learn from and comment on them.

Assuming that SA is running as the defang user, I don't see anything obviously wrong with your setup. It may be that Bayes simply hasn't seen anything to learn from yet. Take a couple of your messages and learn from them manually and see if the Bayes files are created.

    sa-learn --ham sample_nonspam.msg
    sa-learn --spam sample_spam.msg

You can also query the database manually and see what is there.

    sa-learn --dump magic

This will give you all of the message and token counts. Make sure you run these commands as the defang user.

-- Bowie
RE: Bayes DB
Luis Hernán Otegui wrote: > First, RTFM. > Second, Google. > Third, oh, well... You NEED to feed Bayes a significant amount of > data, so it knows what is spam and waht is ham, due to the fact that > the kind of spam and ham you receive is different from the ones I get > on my servers. Then it will start auto learning on that basis. But, to > start, it needs you to feed it data... > > Luix > > 2007/5/11, Daniel Aquino <[EMAIL PROTECTED]>: > > > Have you trained the bayes database? Is this a fresh install? It > > > needs at least 200 spam and 200 ham messages to get it going. > > > However, the more ham and spam you can feed it, the better it > > > will perform... > > > > Well I thought I could use the auto-learning feature ? You can use auto-learning, you just have to watch it at first and manually re-learn any messages that it mis-classifies. Also, depending on your traffic patterns, it can take a while for Bayes to learn the required 200 spam and 200 ham messages from auto-learning alone. For best results teach it manually at the beginning. This way, you know you have a good database when you start. With auto-learning alone, the database can get corrupted before you ever get a chance to use it. -- Bowie
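To make the "200 spam and 200 ham" rule concrete, here is a tiny Python sketch. It assumes the thresholds behave like the bayes_min_spam_num and bayes_min_ham_num settings with their default of 200; bayes_is_active is a hypothetical helper name, and the counts would come from the nspam and nham lines of "sa-learn --dump magic".

```python
def bayes_is_active(nspam, nham, min_spam=200, min_ham=200):
    """True once both learned-message counts reach their minimums."""
    return nspam >= min_spam and nham >= min_ham


if __name__ == "__main__":
    # Plenty of spam learned, but not enough ham yet: Bayes stays idle.
    print(bayes_is_active(nspam=450, nham=120))
    # Both minimums reached: Bayes scores start to apply.
    print(bayes_is_active(nspam=450, nham=200))
```

The point of the check is that both counts matter: a site that only auto-learns spam can sit below the ham minimum for a long time.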
Re: Bayes DB
First, RTFM. Second, Google. Third, oh, well... You NEED to feed Bayes a significant amount of data, so it knows what is spam and what is ham, due to the fact that the kind of spam and ham you receive is different from the ones I get on my servers. Then it will start auto learning on that basis. But, to start, it needs you to feed it data... Luix 2007/5/11, Daniel Aquino <[EMAIL PROTECTED]>: > Have you trained the bayes database? Is this a fresh install? It needs > at least 200 spam and 200 ham messages to get it going. However, the > more ham and spam you can feed it, the better it will perform... Well I thought I could use the auto-learning feature ? -- - GNU-GPL: "May The Source Be With You... Linux Registered User #448382. -
RE: Bayes DB
Daniel Aquino wrote: > I setup Bayes and whitelist db paths in my local.cf > The whitelist db created succesfully but the bayes_* db's did not... More information please... Just saying that it doesn't work isn't very helpful. Before we can help you, we need the two basic pieces of information: 1) What (exactly) did you do? Show us the path line you put in the local.cf as well as any other configuration changes you made that may be relevant. What user is SA running as? What are the permissions on the bayes directory? 2) What (exactly) was the result? Why do you say it didn't work? Show us any error messages you got. Describe any problems that you saw. -- Bowie
Re: Bayes DB
Have you trained the bayes database? Is this a fresh install? It needs at least 200 spam and 200 ham messages to get it going. However, the more ham and spam you can feed it, the better it will perform... Luix 2007/5/11, Daniel Aquino <[EMAIL PROTECTED]>: > I setup Bayes and whitelist db paths in my local.cf > The whitelist db created successfully but the bayes_* db's did not... -- - GNU-GPL: "May The Source Be With You... Linux Registered User #448382. -
Re: Bayes db size....
----- Original Message -----
From: "Dave Koontz" <[EMAIL PROTECTED]>
To: "'spam mailling list'"
Sent: Saturday, February 17, 2007 9:30 AM
Subject: Re: Bayes db size

> Is there a consensus on this need? I deal with the seen db issue by scheduled deletion of that file. That said, with SA becoming more and more prominent all the time, I suspect the Average Joe will miss this oddity until they wind up with a sluggish system, out of drive space or other related issues.
>
> I was mostly curious of the logic on NOT doing maintenance on the Seen and AWL db files. If there is a consensus this needs to occur, then perhaps I can take the time to create a proper patch. I just want to make sure I am not missing something fundamental here
>
> Michael Parker wrote:
>> Dave Koontz wrote:

I use the SQL interface and expire the bayes_seen like this. I believe 6 months to be overly conservative. I added a lastupdate column as a timestamp. In the perl DBM I would recommend you use a technique such as this and update the timestamp in perl. It converts nicely to SQL. Here is my query for cleaning bayes_seen:

    mysql -u$USER -p$PW -h$SERVER -e \
      "DELETE FROM bayes_seen WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 6 MONTH);" \
      $DB

Hope this helps,
Ken
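The same age-based cleanup can be tried out without touching a live database. The Python sketch below uses an in-memory SQLite table instead of MySQL, purely so it runs anywhere; the lastupdate column is the custom addition described above (it is not part of the stock bayes_seen schema), the table layout is simplified, and purge_old_seen and demo are names invented for this example.

```python
import sqlite3


def purge_old_seen(conn, cutoff):
    """Delete bayes_seen rows last updated at or before `cutoff`."""
    cur = conn.execute(
        "DELETE FROM bayes_seen WHERE lastupdate <= ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount  # number of rows removed


def demo():
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for bayes_seen plus the extra timestamp column.
    conn.execute(
        "CREATE TABLE bayes_seen (msgid TEXT, flag TEXT, lastupdate TEXT)"
    )
    conn.executemany(
        "INSERT INTO bayes_seen VALUES (?, ?, ?)",
        [
            ("old@example", "s", "2006-01-15 00:00:00"),
            ("new@example", "h", "2007-02-01 00:00:00"),
        ],
    )
    # Cutoff roughly six months before the date of this thread.
    removed = purge_old_seen(conn, "2006-08-17 00:00:00")
    remaining = [row[0] for row in conn.execute("SELECT msgid FROM bayes_seen")]
    return removed, remaining


if __name__ == "__main__":
    print(demo())
```

Passing the cutoff in as a parameter, rather than computing it inside the query, keeps the example deterministic; the MySQL one-liner above computes it with DATE_SUB instead.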
Re: Bayes db size....
Is there a consensus on this need? I deal with the seen db issue by scheduled deletion of that file. That said, with SA becoming more and more prominent all the time, I suspect the Average Joe will miss this oddity until they wind up with a sluggish system, out of drive space or other related issues. I was mostly curious of the logic on NOT doing maintenance on the Seen and AWL db files. If there is a consensus this needs to occur, then perhaps I can take the time to create a proper patch. I just want to make sure I am not missing something fundamental here Michael Parker wrote: > Dave Koontz wrote: > >> I am sure this has been asked numerous times before, but what is the logic >> in having auto expiry on the bayes DB, and not seen? Seems that once tokens >> have been removed from the DB there is little to no use for 'unlearning' any >> associated messages. Besides on a busy system, this seen file gets large >> very fast. I'd vote for auto expiry and maintenance on seen as well as AWL. >> >> > > Patches welcome. > > Michael > > >
Re: Bayes db size....
Dave Koontz wrote: > I am sure this has been asked numerous times before, but what is the logic > in having auto expiry on the bayes DB, and not seen? Seems that once tokens > have been removed from the DB there is little to no use for 'unlearning' any > associated messages. Besides on a busy system, this seen file gets large > very fast. I'd vote for auto expiry and maintenance on seen as well as AWL. > Patches welcome. Michael > > -Original Message- > From: Theo Van Dinter [mailto:[EMAIL PROTECTED] > Sent: Friday, February 16, 2007 7:19 PM > To: spam mailling list > Subject: Re: Bayes db size > > On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote: >> So you're saying that right now seen isn't capped like tokens right? > > seen has no max size nor expiry features. > > -- > Randomly Selected Tagline: > "Like any French restaurant in America, it was overpriced, noisy, moody, > and would put you in mortal danger if you had an accident with anything > larger than a croissant." - Unknown about the Renault LeCar > >
RE: Bayes db size....
I am sure this has been asked numerous times before, but what is the logic in having auto expiry on the bayes DB, and not seen? Seems that once tokens have been removed from the DB there is little to no use for 'unlearning' any associated messages. Besides on a busy system, this seen file gets large very fast. I'd vote for auto expiry and maintenance on seen as well as AWL. -Original Message- From: Theo Van Dinter [mailto:[EMAIL PROTECTED] Sent: Friday, February 16, 2007 7:19 PM To: spam mailling list Subject: Re: Bayes db size On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote: > So you're saying that right now seen isn't capped like tokens right? seen has no max size nor expiry features. -- Randomly Selected Tagline: "Like any French restaurant in America, it was overpriced, noisy, moody, and would put you in mortal danger if you had an accident with anything larger than a croissant." - Unknown about the Renault LeCar
Re: Bayes db size....
On Fri, Feb 16, 2007 at 06:45:51PM -0600, Robert Nicholson wrote: > Well then I only care about tokens and not repeated emails can I > disable seen? You can't disable it, but you can delete it, as previously stated. -- Randomly Selected Tagline: 54% of all statistics are made up. No, make that 82%...
Re: Bayes db size....
Well then I only care about tokens and not repeated emails can I disable seen? On Feb 16, 2007, at 6:19 PM, Theo Van Dinter wrote: On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote: So you're saying that right now seen isn't capped like tokens right? seen has no max size nor expiry features. -- Randomly Selected Tagline: "Like any French restaurant in America, it was overpriced, noisy, moody, and would put you in mortal danger if you had an accident with anything larger than a croissant." - Unknown about the Renault LeCar
Re: Bayes db size....
On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote: > So you're saying that right now seen isn't capped like tokens right? seen has no max size nor expiry features. -- Randomly Selected Tagline: "Like any French restaurant in America, it was overpriced, noisy, moody, and would put you in mortal danger if you had an accident with anything larger than a croissant." - Unknown about the Renault LeCar
Re: Bayes db size....
So you're saying that right now seen isn't capped like tokens right? On Feb 16, 2007, at 5:45 PM, Theo Van Dinter wrote: On Fri, Feb 16, 2007 at 05:42:13PM -0600, Robert Nicholson wrote: Why then is my Bayes DB 20MEG in size right now if =item bayes_expiry_max_db_size (default: 15) That's in number of tokens, not physical size in bytes. 100,000 tokens, whichever has a larger value. 150,000 tokens is roughly equivalent to a 8Mb database file. That's an estimate, but depends on your platforms, libraries, etc. How do I control the size of the _seen file? You can delete it if you want to. You'll be able to release messages again, but that may not be an issue for you. -- Randomly Selected Tagline: "Truly unencumbered by the engineering process." - Unknown about the Renault Dauphine
Re: Bayes db size....
On Fri, Feb 16, 2007 at 05:42:13PM -0600, Robert Nicholson wrote: > Why then is my Bayes DB 20MEG in size right now if > =item bayes_expiry_max_db_size (default: 15) That's in number of tokens, not physical size in bytes. > 100,000 tokens, whichever has a larger value. 150,000 tokens is roughly > equivalent to a 8Mb database file. That's an estimate, but depends on your platforms, libraries, etc. > How do I control the size of the _seen file? You can delete it if you want to. You'll be able to release messages again, but that may not be an issue for you. -- Randomly Selected Tagline: "Truly unencumbered by the engineering process." - Unknown about the Renault Dauphine
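Since the setting counts tokens rather than bytes, a quick calculation helps when picking a value. The Python sketch below assumes the rule from the documentation fragment quoted above, read in full as "keep either 75% of the maximum value, or 100,000 tokens, whichever is larger", and scales the "150,000 tokens is roughly 8 MB" estimate linearly, which is a simplification. Both function names are made up for the example.

```python
def tokens_kept_after_expiry(max_db_size):
    """Tokens retained by an expiry run: max(75% of the limit, 100,000)."""
    return max(max_db_size * 3 // 4, 100_000)


def rough_file_size_mb(tokens):
    """Very rough size guess, scaling '150,000 tokens ~ 8 MB' linearly."""
    return tokens * 8 / 150_000


if __name__ == "__main__":
    for limit in (150_000, 1_100_000):
        kept = tokens_kept_after_expiry(limit)
        print(limit, "->", kept, "tokens kept, about",
              round(rough_file_size_mb(kept), 1), "MB")
```

As noted above, the real file size depends on platform and libraries, and the _seen file is not covered by this limit at all.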
Re: bayes db version
At 2007. january 14. 20.32 Theo Van Dinter wrote:
> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3563

Well, afaik all my sa-learn instances run through procmail and are correctly locked:

    :0cw:/tmp/some.lock
    |sa-learn --spam --no-sync --single

However, this happened when procmail received 21 mails to learn, and the warning occurred only once out of the 21 possibilities, so it is possibly a race condition.

-- With regards: Imre Péntek E-Mail: [EMAIL PROTECTED]
Re: bayes db version
On Sun, Jan 14, 2007 at 11:29:38AM +0100, Péntek Imre wrote: > this output was generated by sa-learn: > bayes: bayes db version 0 is not able to be used, aborting! [...] > So far this is the first and only time I saw this warning, and since no > warnings like this displayed. Sounds like: http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3563 -- Randomly Selected Tagline: A production of the digitally insane.
Re: bayes db site wide or per user
On Sat, Dec 09, 2006 at 01:48:51PM +0100, Alex Handle wrote: > I could disable the spamchecks in amavisd-new and invoke sa through > maildrop. > But i don't know if a per-user database would scale for 100,000 mailboxes? IMO, Bayes will likely be ok if you use SQL (though your DB will be quite a bit larger). I think the issue is going to be CPU -- more expires, scanning mail delivered to multiple people multiple times, etc. Generally speaking, I believe large user installations go site-wide. -- Randomly Selected Tagline: Leela: "He's crude and gross and he treats me like a slave." Fry: "Then dump his one-eyed ass."
Re: bayes db site wide or per user
Theo Van Dinter schrieb: On Fri, Dec 08, 2006 at 09:44:04PM +0100, Alex Handle wrote: postfix/mysql/nfs/amavisd-new/spamassassin and now we Is it a bad idea to use a site wide bayes database or is it better to use a per user database in this scenario? Per user DBs will give you better results, but since you're running from the MTA, your only choice is site-wide. I could disable the spamchecks in amavisd-new and invoke sa through maildrop. But i don't know if a per-user database would scale for 100,000 mailboxes?
Re: bayes db site wide or per user
On Fri, Dec 08, 2006 at 09:44:04PM +0100, Alex Handle wrote: > postfix/mysql/nfs/amavisd-new/spamassassin and now we > > Is it a bad idea to use a site wide bayes database or is it better > to use a per user database in this scenario? Per user DBs will give you better results, but since you're running from the MTA, your only choice is site-wide. -- Randomly Selected Tagline: "Wheee! ...ow, I bit my tongue!" --Ralph Wiggum Bart's Inner Child (Episode 1F05)
RE: RE: Bayes DB version issue 3.1.3 => 3.1.4
Nigel,

I ended up taking the approach you listed a little earlier. The problem is that I now have two separate bayes databases: one for RH/3.1.3 and one for rPath/3.1.4. This isn't so much a resource problem as a redundancy problem (as I replicate the databases to our DR location, etc).

So I imported the data and started testing. For some reason it was taking upwards of 70 seconds per message. That was with SA started right after installing. After a reboot it did drop down to the 0.5-1.5 second range, though. I was getting worried.

I now have two 3.1.4 machines up and running. I will swap out two of my 4 other 3.1.3 machines and upgrade those in a couple of days, after it has run for a while.

Gary Wayne Smith

> -----Original Message-----
> From: Nigel Frankcom [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 08, 2006 7:03 PM
> To: Gary W. Smith
> Subject: Re: RE: Bayes DB version issue 3.1.3 => 3.1.4
>
> Hi Gary,
>
> A dump from the SA db should reimport; you may have to kill the latin
> line in the dump and replace it with UTF8, beyond that it should be a
> straight forward dump and reload?
>
> Let me know how it goes?
>
> Kind regards
>
> Nigel
>
> On Tue, 8 Aug 2006 16:12:21 -0700, "Gary W. Smith"
> <[EMAIL PROTECTED]> wrote:
>
> >I've created a new database in UTF8 format. I will see how this works
> >out. I might try to copy the data from the Latin database to the UTF8
> >database but in past experience this hasn't worked that great. I might
> >also make a backup as well and try that.
> >
> >-----Original Message-----
> >From: Gary W. Smith [mailto:[EMAIL PROTECTED]
> >Sent: Tuesday, August 08, 2006 2:23 PM
> >To: users@spamassassin.apache.org
> >Subject: RE: Bayes DB version issue 3.1.3 => 3.1.4
> >
> >Okay, I have a little more information now. I run the same command that
> >sql.pm would run. It appears to be a collation issue. Can we force the
> >collation with 3.1.4 to a specific type? In my case the database is in
> >latin because 3.1.3 choked on UTF8.
RE: Bayes DB version issue 3.1.3 => 3.1.4
I've created a new database in UTF8 format. I will see how this works out. I might try to copy the data from the Latin database to the UTF8 database but in past experience this hasn't worked that great. I might also make a backup as well and try that.

-----Original Message-----
From: Gary W. Smith [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 08, 2006 2:23 PM
To: users@spamassassin.apache.org
Subject: RE: Bayes DB version issue 3.1.3 => 3.1.4

Okay, I have a little more information now. I run the same command that sql.pm would run. It appears to be a collation issue. Can we force the collation with 3.1.4 to a specific type? In my case the database is in latin because 3.1.3 choked on UTF8. This was on RHEL4 (which defaults to UTF8). The kernel is 2.6.9.

I'm trying to get this to run on rPath Linux which is on 2.6.16. I would suspect that they have implemented more libraries in UTF8 now than back on kernel 2.6.9.

Anyway, here is the command I issued to catch this point:

echo "SELECT value FROM bayes_global_vars WHERE variable = 'VERSION';" | mysql -u user -D database -h 10.0.13.13 -ppassword

ERROR 1267 (HY000) at line 1: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='

Any help would be greatly appreciated.

Gary Wayne Smith

-----Original Message-----
From: Gary W. Smith [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 08, 2006 8:06 AM
To: Daryl C. W. O'Shea
Cc: users@spamassassin.apache.org
Subject: RE: Bayes DB version issue 3.1.3 => 3.1.4

Daryl,

Thanks for the info. I will update the .8. As for the database, which is the primary concern, the user account is correct. I have logged into the database from that server using the same credentials from the local.cf file. I had thought that we might have restricted by subnet so I did indeed try that last night.
RE: Bayes DB version issue 3.1.3 => 3.1.4
Okay, I have a little more information now. I run the same command that sql.pm would run. It appears to be a collation issue. Can we force the collation with 3.1.4 to a specific type? In my case the database is in latin because 3.1.3 choked on UTF8. This was on RHEL4 (which defaults to UTF8). The kernel is 2.6.9.

I'm trying to get this to run on rPath Linux which is on 2.6.16. I would suspect that they have implemented more libraries in UTF8 now than back on kernel 2.6.9.

Anyway, here is the command I issued to catch this point:

echo "SELECT value FROM bayes_global_vars WHERE variable = 'VERSION';" | mysql -u user -D database -h 10.0.13.13 -ppassword

ERROR 1267 (HY000) at line 1: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='

Any help would be greatly appreciated.

Gary Wayne Smith

-----Original Message-----
From: Gary W. Smith [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 08, 2006 8:06 AM
To: Daryl C. W. O'Shea
Cc: users@spamassassin.apache.org
Subject: RE: Bayes DB version issue 3.1.3 => 3.1.4

Daryl,

Thanks for the info. I will update the .8. As for the database, which is the primary concern, the user account is correct. I have logged into the database from that server using the same credentials from the local.cf file. I had thought that we might have restricted by subnet so I did indeed try that last night.

[EMAIL PROTECTED] spamassassin]# mysql -u xxx -h xx.xx.xx.xx -D spamassassin -p
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 6649341 to server version: 4.1.7-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
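For the latin1/utf8 collation mismatch reported in this message, the dump-and-reload route suggested elsewhere in the thread could look roughly like the sketch below. The function name `latin1_to_utf8` and the database names in the usage comment are invented for illustration, and this has not been checked against a real Bayes schema. A blind `sed` also rewrites any matching text inside the data itself, so inspect the dump before loading it.

```shell
# Hypothetical filter: rewrite latin1 charset declarations in a mysqldump
# stream to utf8. Reads the dump on stdin, writes the edited dump on stdout.
latin1_to_utf8() {
    sed -e 's/CHARSET=latin1/CHARSET=utf8/g' \
        -e 's/SET NAMES latin1/SET NAMES utf8/g' \
        -e 's/COLLATE=latin1_[a-z0-9_]*/COLLATE=utf8_general_ci/g'
}

# Usage sketch (database names are assumptions; the target database must
# already exist and be created with a utf8 default character set):
#   mysqldump --default-character-set=latin1 spamassassin \
#       | latin1_to_utf8 \
#       | mysql --default-character-set=utf8 spamassassin_utf8
```

Keeping the original latin1 database untouched until the reloaded copy answers the VERSION query without ERROR 1267 leaves an easy way back.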
RE: Bayes DB version issue 3.1.3 => 3.1.4
Daryl,

Thanks for the info. I will update the .8. As for the database, which is the primary concern, the user account is correct. I have logged into the database from that server using the same credentials from the local.cf file. I had thought that we might have restricted by subnet so I did indeed try that last night.

[EMAIL PROTECTED] spamassassin]# mysql -u xxx -h xx.xx.xx.xx -D spamassassin -p
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 6649341 to server version: 4.1.7-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show tables;
+------------------------+
| Tables_in_spamassassin |
+------------------------+
| awl                    |
| bayes_expire           |
| bayes_global_vars      |
| bayes_seen             |
| bayes_token            |
| bayes_vars             |
| userpref               |
+------------------------+
7 rows in set (0.00 sec)

mysql> select * from bayes_global_vars;
+----------+-------+
| variable | value |
+----------+-------+
| VERSION  | 3     |
+----------+-------+
1 row in set (0.00 sec)

mysql>

-----Original Message-----
From: Daryl C. W. O'Shea [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 08, 2006 12:38 AM
To: Gary W. Smith
Cc: users@spamassassin.apache.org
Subject: Re: Bayes DB version issue 3.1.3 => 3.1.4

On 8/8/2006 3:29 AM, Gary W. Smith wrote:
> Hello,
>
> I can't remember smoking crack when copying the config files over but
> anything's possible.
>
> I built out a new machine today and installed SA. We have a list of
> CPAN modules that were installed (same list as from the 3.1.3 servers).
> I copied everything in the /etc/mail/spamassassin from our productions
> servers to the test server and after starting we receive errors. I have
> checked and the MySQL data instance is accessible from this server.
> There are also several rules that are errors as well.
>
> I know that someone has asked this question already but I didn't find
> the answer in the thread archive.
>
> Here are the contents of the log file:

[...]
Re: Bayes DB version issue 3.1.3 => 3.1.4
On 8/8/2006 3:29 AM, Gary W. Smith wrote:
> Hello,
>
> I can’t remember smoking crack when copying the config files over but
> anything’s possible.
>
> I built out a new machine today and installed SA. We have a list of
> CPAN modules that were installed (same list as from the 3.1.3 servers).
> I copied everything in the /etc/mail/spamassassin from our productions
> servers to the test server and after starting we receive errors. I have
> checked and the MySQL data instance is accessible from this server.
> There are also several rules that are errors as well.
>
> I know that someone has asked this question already but I didn’t find
> the answer in the thread archive.
>
> Here are the contents of the log file:
>
> Aug 7 21:45:59 labtest01c spamd[2693]: config: score: the non-numeric score (.8) is not valid, a numeric score is required
> Aug 7 21:45:59 labtest01c spamd[2693]: config: SpamAssassin failed to parse line, "SUBJ_HAS_UNIQ_ID .8 0.212 0.682 1.677" is not valid for "score", skipping: score SUBJ_HAS_UNIQ_ID .8 0.212 0.682 1.677

".8" requires a leading zero.

> Aug 7 21:46:01 labtest01c spamd[2693]: bayes: database version 0 is different than we understand (3), aborting! at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/BayesStore/SQL.pm line 135.
> Aug 7 21:46:03 labtest01c spamd[2693]: bayes: database version 0 is different than we understand (3), aborting! at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/BayesStore/SQL.pm line 135.

SQL server privilege issue?

> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test DIGEST_MULTIPLE has undefined dependency 'RAZOR2_CHECK'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test DRUGS_ERECTILE has undefined dependency '__DRUGS_ERECTILE7'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_SUB_ACCEPT_CCARDS has undefined dependency '__SARE_SUB_FROM_PAYPAL'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_SPEC_PROLEO_M2a has dependency 'MIME_QP_LONG_LINE' with a zero score
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_HEAD_SUBJ_RAND has undefined dependency 'SARE_XMAIL_SUSP2'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_HEAD_SUBJ_RAND has undefined dependency 'SARE_HEAD_XAUTH_WARN'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_HEAD_SUBJ_RAND has dependency 'X_AUTH_WARN_FAKED' with a zero score
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_RD_SAFE has undefined dependency 'SARE_RD_SAFE_MKSHRT'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_RD_SAFE has undefined dependency 'SARE_RD_SAFE_GT'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_RD_SAFE has undefined dependency 'SARE_RD_SAFE_TINY'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_MSGID_LONG45 has undefined dependency '__SARE_MSGID_LONG50'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_MSGID_LONG45 has undefined dependency '__SARE_MSGID_LONG55'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_MSGID_LONG45 has undefined dependency '__SARE_MSGID_LONG65'
> Aug 7 21:46:05 labtest01c spamd[2693]: rules: meta test SARE_MSGID_LONG45 has undefined dependency '__SARE_MSGID_LONG75'
> Aug 7 21:46:06 labtest01c spamd[2693]: rules: meta test VIRUS_WARNING_DOOM_BNC has undefined dependency 'VIRUS_WARNING_MYDOOM4'
> Aug 7 21:46:06 labtest01c spamd[2693]: rules: meta test SARE_OBFU_CIALIS has undefined dependency 'SARE_OBFU_CIALIS2'
> Aug 7 21:46:06 labtest01c spamd[2693]: rules: meta test FP_MIXED_PORN3 has undefined dependency 'FP_PENETRATION'

Not errors, just info.

> Aug 7 21:46:07 labtest01c spamd[2693]: spamd: server started on port 783/tcp (running version 3.1.4)
> Aug 7 21:46:07 labtest01c spamd[2693]: spamd: server pid: 2693
> Aug 7 21:46:07 labtest01c spamd[2693]: spamd: server successfully spawned child process, pid 2700
> Aug 7 21:46:07 labtest01c spamd[2693]: spamd: server successfully spawned child process, pid 2701
> Aug 7 21:46:07 labtest01c spamd[2693]: prefork: child states: II

Normal startup info.

Daryl
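The ".8" rejection in the log above is easy to reintroduce when rule files are copied between hosts. Here is a minimal sketch for spotting such lines before starting spamd, assuming custom rules live in `*.cf` files. The function name `find_bad_scores` is invented here and the pattern is only an approximation of what the parser accepts.

```shell
# Hypothetical helper: list "score" lines containing a value such as ".8"
# or "-.5" (no digit before the decimal point), like the one the log above
# reports as a non-numeric score.
find_bad_scores() {
    # -n prints line numbers; -E enables extended regular expressions.
    # Pattern: optional indent, "score", a rule name, any number of numeric
    # fields, then a field that starts with an optional "-" and a bare ".".
    grep -nE '^[[:space:]]*score[[:space:]]+[A-Za-z0-9_]+([[:space:]]+[-0-9.]+)*[[:space:]]+-?\.[0-9]' "$@"
}

# Usage sketch (directory as mentioned in the thread):
#   find_bad_scores /etc/mail/spamassassin/*.cf
```

Running `spamassassin --lint` afterwards remains the authoritative check. The grep only narrows down where to look.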
Re: Bayes db corrupt, not fixable?
On Tue, Jun 27, 2006 at 10:50:03AM -0500, Larry Starr wrote:
> I don't believe that it is referring to the Spamassassin Version, but rather
> the version of "Berekly DB". Have you updated any packages lately?

Actually it is the SA DB version being referred to. v2 is for databases in SA 2.6x, v3 is SA 3.0.x and later.

> > When I run sa-learn --dump or --sync, it tells me the database is
> > version 2. This machine never ran less than Spamassassin version 3.
> > The --sync will run continueing to try and get a lock on the file
> > forever. The only related process running was amavisd, killed it.
> > Still couldn't get a lock on the file.

Well, the lock is really a different thing from the DB. If the files are local to the machine (ie: not NFS), you may want to switch to "lock_method flock" which is better, but doesn't work on remote file systems.

> > I then removed bayes.lock* files in the same directory. Ran sa-learn -D
> > --sync, reported that it was upgrading the database to v3 and completed.
> > When run again it can't get a lock on the file.

It sounds like either you have a lot of contention for the lock, or processes are being killed before they get the chance to unlock, or ... I'd try the flock method if you can, that usually clears up a lot of problems.

--
Randomly Generated Tagline:
"So, the long and short of it--if you have one sysadmin, you have a "system administrator." If you have two sysadmins, you have two "system administrators." If you have two thousand sysadmins, you're at LISA." - Trey Harris <[EMAIL PROTECTED]>
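The "lock_method flock" suggestion above is a one-line config change. A small sketch of applying it idempotently follows. The helper name `set_flock` is made up, and which file to edit (for example a site-wide local.cf) is an assumption that depends on how SA is invoked. As the message says, this only makes sense when the Bayes files are on a local filesystem, not NFS.

```shell
# Hypothetical helper: make sure a SpamAssassin config file contains
# "lock_method flock", replacing any existing lock_method line.
set_flock() {
    cf=$1
    tmpcf=$(mktemp) || return 1
    # Keep every line except an existing lock_method setting.
    grep -v '^[[:space:]]*lock_method[[:space:]]' "$cf" > "$tmpcf" || true
    echo 'lock_method flock' >> "$tmpcf"
    # Write back through the original file so its owner and mode are kept.
    cat "$tmpcf" > "$cf" && rm -f "$tmpcf"
}

# Usage sketch (path is an assumption):
#   set_flock /etc/mail/spamassassin/local.cf
```

Any long-running scanner (spamd, amavisd) would need a restart to pick up the change.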
Re: Bayes db corrupt, not fixable?
I don't believe that it is referring to the SpamAssassin version, but rather the version of Berkeley DB. Have you updated any packages lately?

On Tuesday 27 June 2006 10:45, Bobby Johnson wrote:
> When I run sa-learn --dump or --sync, it tells me the database is
> version 2. This machine never ran less than Spamassassin version 3.
> The --sync will run continueing to try and get a lock on the file
> forever. The only related process running was amavisd, killed it.
> Still couldn't get a lock on the file.
>
> I then removed bayes.lock* files in the same directory. Ran sa-learn -D
> --sync, reported that it was upgrading the database to v3 and completed.
> When run again it can't get a lock on the file.
>
> I'd rather not rebuild this database if possible. Any ideas?
>
> Bobby

--
Larry G. Starr - [EMAIL PROTECTED] or [EMAIL PROTECTED]
Software Engineer: Full Compass Systems LTD.
Phone: 608-831-7330 x 1347 FAX: 608-831-6330
===
There are only three sports: bullfighting, mountaineering and motor racing, all the rest are merely games! - Ernest Hemmingway
RE: bayes db issue
> I recently switched to using mysql bayes. I am getting a [1135] dbg:
> bayes: unable to initialize database for root user, aborting! When I do
> spamassassin -d --lint any idea what I need to change?
>
> Best regards,
> JD Smith

You possibly have not learned a message as root yet. As root, try this:

sa-learn --spam < sample-spam.txt
Re: bayes db issue
JD Smith wrote:
> I recently switched to using mysql bayes. I am getting a [1135] dbg:
> bayes: unable to initialize database for root user, aborting! When I do
> spamassassin -d --lint any idea what I need to change?

It's kind of a bad warning message. Bayes will not attempt to initialize the database until you actually try to write to it. In general you can probably ignore it unless you are trying to do some sort of learning.

Michael
Re: bayes db issue
JD Smith writes:
> I recently switched to using mysql bayes. I am getting a [1135] dbg:
> bayes: unable to initialize database for root user, aborting! When I do
> spamassassin -d --lint any idea what I need to change?

Try a "select id,username,spam_count,ham_count from bayes_vars" on your bayes database to find the username under which your bayes data exists. Next, use the username from the above query to add this line to your local.cf:

bayes_sql_override_username username

hth,
- dhawal

> Best regards,
> JD Smith
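The two steps above (query `bayes_vars`, then set `bayes_sql_override_username`) can be strung together as in the sketch below. The helper name `override_line` is hypothetical, and so is the rule it uses: it picks the username with the largest spam+ham total, which is only a guess at the "real" Bayes user.

```shell
# Hypothetical helper: read tab-separated rows of
#   id  username  spam_count  ham_count
# on stdin and print a local.cf line for the most-trained username.
# Choosing the row with the highest spam+ham total is an assumption of
# this sketch, not something the thread prescribes.
override_line() {
    awk -F '\t' '
        ($3 + $4) >= best { best = $3 + $4; user = $2 }
        END { if (user != "") print "bayes_sql_override_username " user }
    '
}

# Usage sketch (credentials and database name are placeholders):
#   mysql -N -B -u user -p -D spamassassin \
#       -e "SELECT id,username,spam_count,ham_count FROM bayes_vars" \
#       | override_line
```

Review the printed line before adding it to local.cf. If several usernames have meaningful counts, the choice is a policy decision rather than something a script should make.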
Re: bayes db from SA 3.0.2 to 3.0.4
Roman Serbski wrote on Mon, 20 Jun 2005 14:55:56 +0600:
> 1. sa-learn --backup > db.txt (on old server)

Since you transferred the complete db I don't see a reason to export and import the data. This is like backing up your notebook at home, taking the notebook and the backup with you in the car to work, and then recovering your data from the backup before starting to work. Just move the whole directory, that's all you need to do.

Kai

--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org
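"Just move the whole directory" could be sketched as below for the case where old and new locations are reachable from one host. The helper name `copy_bayes_dir` is invented. Stopping the scanner before copying is an extra precaution assumed here, not something stated in the thread.

```shell
# Hypothetical helper: copy a Bayes directory, keeping modes and timestamps
# (and ownership when run as root).
copy_bayes_dir() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # "src/." copies the directory contents, dotfiles included.
    cp -pR "$src"/. "$dst"/
}

# Usage sketch (paths are assumptions; between hosts, something like
# "scp -rp" or a tarball would take the place of cp):
#   copy_bayes_dir /var/spool/spamd/.spamassassin /mnt/newhost/.spamassassin
#   sa-learn --dump magic    # on the new server, to confirm the DB is readable
```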
Re: bayes db from SA 3.0.2 to 3.0.4
I suspect you wanted to perform a "sa-learn --sync" first. But I do not know for sure.
{^_^}

----- Original Message -----
From: "Roman Serbski" <[EMAIL PROTECTED]>

Dear colleagues,

Could you please share the correct procedure for moving bayes database from the server powered by SA 3.0.2 to another server with 3.0.4 installed?

Here is what I did:

1. sa-learn --backup > db.txt (on old server)

2. Transfer of bayes db files from old server to a new one.

cd /var/spool/spamd/.spamassassin/ && ls -al
drwx------  2 spamd spamd      512 Jun 20 14:46 .
drwxr-xr-x  3 spamd spamd      512 Feb 20 17:03 ..
-rw-------  1 spamd spamd     3798 Jun 20 14:56 bayes.mutex
-rw-rw-rw-  1 root  spamd    33480 Jun 20 14:56 bayes_journal
-rw-------  1 spamd spamd 10174464 Jun 20 14:56 bayes_seen
-rw-rw-rw-  1 root  spamd  5324800 Jun 20 14:56 bayes_toks
-rw-r--r--  1 spamd spamd     1175 Jan 30 12:08 user_prefs
-rw-rw-rw-  1 spamd spamd    65536 Feb 19 17:41 whitelist
-rw-------  1 spamd spamd        6 Feb 19 17:41 whitelist.mutex

3. sa-learn --restore db.txt (on new server)

`spamassassin -D --lint` doesn't show any errors.

Does this procedure look correct?

Thank you for your time!

Roman
Re: bayes DB in CDB format
Asif Iqbal wrote:
> Hi All
>
> I see notes on using MySQL/PgSQL and other SQL database and migration
> from Berkeley DB to MySQL. I was wondering if anyone knows how to
> migrate to DAN's CDB from Berkeley DB for bayes DB. I like to use that (CDB)
> as the
> bayes DB.
>
> Thanks for any help/suggestion/tip

CDB would be rather difficult to support for SA's bayes system. It's designed for "constant databases", i.e. those which are built once and read many times without change. To this end, CDB does not support single-record inserts or deletes, which are key operations for SpamAssassin's learning and expiry. Any learning or expiry operation would require deleting the entire bayes database and rebuilding the whole thing.
Re: bayes DB in CDB format
Rick Macdougall wrote:
> Asif Iqbal wrote:
>> Hi All
>>
>> I see notes on using MySQL/PgSQL and other SQL database and migration
>> from Berkeley DB to MySQL. I was wondering if anyone knows how to
>> migrate to DAN's CDB from Berkeley DB for bayes DB. I like to use that
>> (CDB) as the bayes DB.
>>
>> Thanks for any help/suggestion/tip
>
> Hi,
>
> While I do not know the answer to that (I believe it's going to be
> "Sorry, CDB is not currently supported"), you should really look at
> MySQL or PgSQL for bayes if you are going to migrate. On a heavily
> loaded server you almost have to run bayes with MySQL or suffer the
> consequences of bayes locking / expiry.

Not to mention when you have more than one spamd server scanning the same "stream" of messages; those servers should share the same data.

Arvinn
Re: bayes DB in CDB format
Asif Iqbal wrote:
> Hi All
>
> I see notes on using MySQL/PgSQL and other SQL database and
> migration from Berkeley DB to MySQL. I was wondering if anyone
> knows how to migrate to DAN's CDB from Berkeley DB for bayes DB. I
> like to use that (CDB) as the bayes DB.
>
> Thanks for any help/suggestion/tip

Doesn't CDB work best in a read only situation? Or is that TDB?

If it's got a perl module that follows the same interface as the other *_File (i.e. DB_File, SDBM_File, etc) modules, it could be tested I suppose.

FYI, 3.1 has native support for SDBM, as well as Berkeley DB. It also has PgSQL and MySQL specific modules that offer features specific to those databases.

Michael
Re: bayes DB in CDB format
Asif Iqbal wrote:
> Hi All
>
> I see notes on using MySQL/PgSQL and other SQL database and migration
> from Berkeley DB to MySQL. I was wondering if anyone knows how to
> migrate to DAN's CDB from Berkeley DB for bayes DB. I like to use that
> (CDB) as the bayes DB.
>
> Thanks for any help/suggestion/tip

Hi,

While I do not know the answer to that (I believe it's going to be "Sorry, CDB is not currently supported"), you should really look at MySQL or PgSQL for bayes if you are going to migrate. On a heavily loaded server you almost have to run bayes with MySQL or suffer the consequences of bayes locking / expiry.

CDB may actually be worse than Berkeley DB for Bayes use, even if it were supported, because of the method it uses for updates.

With a real DB backend, those problems go away.

Just my $0.02 with a couple of hundred thousand messages scanned a day.

Regards,
Rick
Re: bayes DB in CDB format
On Mon, May 30, 2005 at 07:18:31PM -0400, Asif Iqbal wrote:
> from Berkeley DB to MySQL. I was wondering if anyone knows how to
> migrate to DAN's CDB from Berkeley DB for bayes DB. I like to use that (CDB)
> as the
> bayes DB.

There's no native support for CDB, so you'd have to do up your own BayesStore backend module, etc.

--
Randomly Generated Tagline:
Is it progress if a cannibal uses a knife and fork?
Re: bayes db keeps die
Craig,

Best to install SpamAssassin from source or CPAN. I've seen lots of problems with the RPM-based install. Not specifically bayes but

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300

Craig White wrote:
> CentOS 3.4
>
> # rpm -qa spamassassin
> spamassassin-3.0.3-1.1.el3.rf
> # rpm -qa mailscanner
> mailscanner-4.41.3-1
>
> I start with starter db from Fortress Systems since my old bayes db from
> 2.6x was creamed by this same issue...
>
> # spamassassin -p /etc/MailScanner/spam.assassin.prefs.conf -D --lint
> much snippage...
> debug: bayes: 5791 tie-ing to DB file R/O /etc/MailScanner/bayes/bay
> debug: bayes: 5791 tie-ing to DB file R/O /etc/MailScanner/bayes/bayes_seen
> debug: bayes: found bayes db version 3
>
> ok - looks good
>
> # sa-learn -p /etc/MailScanner/spam.assassin.prefs.conf --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0       1733          0  non-token data: nspam
> 0.000          0        313          0  non-token data: nham
> 0.000          0     140671          0  non-token data: ntokens
> 0.000          0 1051647943          0  non-token data: oldest atime
> 0.000          0 1095956416          0  non-token data: newest atime
> 0.000          0          0          0  non-token data: last journal sync atime
> 0.000          0          0          0  non-token data: last expiry atime
> 0.000          0          0          0  non-token data: last expire atime delta
> 0.000          0          0          0  non-token data: last expire reduction count
>
> still looks good - but within 2 minutes...
>
> # sa-learn -p /etc/MailScanner/spam.assassin.prefs.conf --dump magic
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm line 160.
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm line 160.
> ERROR: Bayes dump returned an error, please re-run with -D for more information
>
> It seems that no matter how I execute things like...
> sa-learn --rebuild
> or
> sa-learn --sync -D
> it always corrupts in this fashion.
>
> Any clues?
>
> Thanks
>
> Craig
Re: Bayes DB does not grow anymore
From: "Kai Schaetzl" <[EMAIL PROTECTED]>
> > in a degree I have set my SA score to be more or less equal with the
> > BAYES_99 score (around 8).
>
> Your BAYES_99 score is 8? I would never do this. General rule is that no single
> rule should be able to mark a message as ham or spam. That cries for false
> positives.

I'd not do that with Bayes scores. However, there are a few rules that are iron-clad spam detectors here and they get VERY high scores. They are unique to me and uniquely usable by me, so I don't bother to pass them along. (I have a string of wrong names associated with products people spam me about that I use to send a score well over 5 to SA. And I have some additional PayPal antispam of my own which involves some fancy dancing with meta rules that get an automatic 105 to make sure they never get through to anything but my spam folder.) I do scan the spam folder, though. If I didn't scan it I'd not be so vicious about some of my spam scores.

{^_-}
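The "no single rule should decide" guideline quoted above can be made concrete with a toy model of additive scoring. This is only a sketch: the threshold of 5 and the BAYES_99 values of 8 and 3.0 come from this thread, while OTHER_RULE and its score of 2.5 are hypothetical, and real SpamAssassin scoring involves far more than a plain sum.

```python
# Toy model of additive rule scoring (illustration only, not SpamAssassin code).
REQUIRED_SCORE = 5.0  # spam threshold referred to in the thread


def total_score(hits, scores):
    """Sum the scores of every rule that fired on a message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """A message is tagged as spam once the summed score reaches the threshold."""
    return total_score(hits, scores) >= required


# BAYES_99 at 8 (one poster's setting) versus 3.0 (another poster's setting).
# OTHER_RULE is a made-up rule name with a made-up score.
aggressive = {"BAYES_99": 8.0, "OTHER_RULE": 2.5}
moderate = {"BAYES_99": 3.0, "OTHER_RULE": 2.5}

print(is_spam(["BAYES_99"], aggressive))              # Bayes alone decides
print(is_spam(["BAYES_99"], moderate))                # Bayes alone is not enough
print(is_spam(["BAYES_99", "OTHER_RULE"], moderate))  # needs corroboration
```

With the score at 8, a single Bayes misjudgment is enough to produce a false positive; at 3.0 a second rule has to agree before the message is tagged.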
Re: Bayes DB does not grow anymore
GRP Productions wrote on Fri, 18 Mar 2005 10:38:29 +0200:

> It seems SURBL is now enabled by default. It has also changed its name to
> URIDNSBL :-)

SURBL refers generally to those xx_SURBL rules and to URIDNSBL, since the only other distributed rule is SBL, and SURBL started it all.

> I do not use SARE rules (although I am trying to find time to
> look at them, as I am aware of their credibility). I use Gray's rules
> (http://files.grayonline.id.au), they seem quite efficient.

I wasn't aware of that site, but now that I visited it, I remember I visited it at least once. Use whatever works for you. After all, all this stuff isn't done to make you try out again and again but to help you focus your time on the important things.

> I understand what you say. The point is, what should be the criteria to
> understand if the time for an expiration has come? I mean, supposing we take
> only the size in consideration, could be a problem. What if some old tokens
> are still common nowadays in spam mail?

This is not a problem. Expiry isn't done by "addition time", but by access time (short: atime). So, items which didn't occur recently drop to the "end" of the db and get removed by expiry. There's always the chance that old tokens which haven't been seen for a long time "come back". But the chance is slimmer the older the atime of that token is. There's probably some statistical curve algorithm which could be used to determine the best "break point". Because of the way dbx databases work expiry can't be done this way, though.

> As I told you, since my last post I have reset everything. It seems to me
> it works fine, and it learns rapidly. It gives me no reason not to trust it,
> in a degree I have set my SA score to be more or less equal with the
> BAYES_99 score (around 8).

Your BAYES_99 score is 8? I would never do this. General rule is that no single rule should be able to mark a message as ham or spam. That cries for false positives.

> Of course I keep doing mistake-based learning,
> but most of the times I feed it with 'subjective' spam mail (ie. mail that
> my users don't want to receive, but is definitely not spam).

What kind of mail is that? Newsletters they once subscribed to and don't like anymore? They should unsubscribe instead of declaring it as spam.

Kai

--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org
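The atime-based expiry described in this message can be pictured with a small sketch. It only illustrates the principle (a token survives as long as it keeps being seen) and is not SpamAssassin's actual expiry code. The token names and the 30-day window are made up for the example; the two larger timestamps are the oldest and newest atime from the `--dump magic` output quoted elsewhere in this thread.

```python
DAY = 86400  # seconds per day


def touch(tokens, token, now):
    """Record that a token was seen in a scanned message: refresh its atime."""
    tokens[token] = now


def expire_by_atime(tokens, newest_atime, max_age_seconds):
    """Keep only tokens accessed within max_age_seconds of the newest atime."""
    cutoff = newest_atime - max_age_seconds
    return {tok: atime for tok, atime in tokens.items() if atime >= cutoff}


# Hypothetical token table: token -> last access time (Unix epoch seconds).
tokens = {
    "viagra": 1110636450,   # seen at the newest atime
    "invoice": 1110000000,  # seen about a week before that
    "oldword": 1107319073,  # not seen for roughly 38 days
}

kept = expire_by_atime(tokens, newest_atime=1110636450, max_age_seconds=30 * DAY)
print(sorted(kept))

# If the old token shows up in mail again, its atime is refreshed and it survives.
touch(tokens, "oldword", 1110636450)
kept_after_touch = expire_by_atime(tokens, 1110636450, 30 * DAY)
print(sorted(kept_after_touch))
```

This is why an old token that is still common in current spam is not at risk: every time it is seen its atime moves forward, so only tokens that have genuinely gone quiet fall behind the cutoff.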
Re: Bayes DB does not grow anymore
> Thanks for the offer. You can send it to the email address I use for this
> list, or you could just send me an FTP URL for retrieval.

Sorry I did not find the time to do this, but I will try to send it during the weekend.

> Oh, yes. You need to have SURBL switched on via the init.pre (I think it's
> off by default) and you should use custom rules. I use a set of carefully
> chosen rulesets mostly from SARE and updated via rulesdujour and some more
> rules of my own accumulated over time.

It seems SURBL is now enabled by default. It has also changed its name to URIDNSBL :-) I do not use SARE rules (although I am trying to find time to look at them, as I am aware of their credibility). I use Gray's rules (http://files.grayonline.id.au), they seem quite efficient.

> I think on a heavy traffic machine it's preferrable to have it off,
> especially when using MailScanner. Otherwise the expiry can kick in at
> random times every few hours (you can set a minimum time, though, f.i. one
> day). Some people run a scheduled expiry three times a day. That's an advice
> which often comes up on the Mailscanner list (which is a very helpful list,
> btw). Depends on how often you need it (whether it reaches the limit you
> want to hold more often or not). Starting with one expiry per night should
> be fine, but you should occasionally expire manually and look at the output,
> in case there are problems.
>
> No. One should get rid of really old tokens, they are only "ballast" in the
> db. I don't know how a big db behaves on a busy site. Ours contain 1 Mio.
> tokens and have a size of 40 MB. They work very well with no ressource
> hogging. But I have only a few thousand messages running thru each of our
> servers, there's probably none which gets more than 10.000 a day. If you get
> 100.000 it may be different.

I understand what you say. The point is, what should be the criteria to understand if the time for an expiration has come? I mean, supposing we take only the size in consideration, could be a problem. What if some old tokens are still common nowadays in spam mail? You could say it doesn't matter, it will be started again and recognize all the bad stuff. In that sense, we could just stop maintaining Bayes completely.

> That's what we do. I only learn messages which were categorized wrong. Not
> by Bayes, but by SA. Most messages which get a score lower than 5 still get
> a BAYES_99 which means that Bayes identifies them all. Nevertheless, I learn
> these messages because they are spam and it reassures Bayes that they are
> spam. BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.

As I told you, since my last post I have reset everything. It seems to me it works fine, and it learns rapidly. It gives me no reason not to trust it, in a degree I have set my SA score to be more or less equal with the BAYES_99 score (around 8). Of course I keep doing mistake-based learning, but most of the times I feed it with 'subjective' spam mail (ie. mail that my users don't want to receive, but is definitely not spam). I monitor it constantly and I am happy about it.

> No problem :-) I tend to be a bit snappy on first messages which look to me
> like the author could have done a bit more research, but once we are over
> that stage I hope I can give some good advice based on my experience.

I have to admit that our communication was valuable to me, I learned so much about how the whole thing works. Once again, I appreciate it.

Greg
Re: Bayes DB does not grow anymore
GRP Productions wrote on Tue, 15 Mar 2005 01:12:53 +0200:

> >I have been trying to get something from CVS for several days now, no luck.
>
> Send me your email in private ([EMAIL PROTECTED]) to send it to you.

Thanks for the offer. You can send it to the email address I use for this list, or you could just send me an FTP URL for retrieval.

> I will probably start again from scratch. One point: Do you think I should
> put custom rules inside /etc/mail/spamassassin or the default installation
> is enough?

Oh, yes. You need to have SURBL switched on via the init.pre (I think it's off by default) and you should use custom rules. I use a set of carefully chosen rulesets mostly from SARE and updated via rulesdujour and some more rules of my own accumulated over time.

> Yes I just added this. Should auto_expire remain always at 0?

I think on a heavy traffic machine it's preferable to have it off, especially when using MailScanner. Otherwise the expiry can kick in at random times every few hours (you can set a minimum time, though, f.i. one day). Some people run a scheduled expiry three times a day. That's advice which often comes up on the MailScanner list (which is a very helpful list, btw). It depends on how often you need it (whether it reaches the limit you want to hold more often or not). Starting with one expiry per night should be fine, but you should occasionally expire manually and look at the output, in case there are problems.

> Also, do you
> think it would be better if the db NEVER expired?

No. One should get rid of really old tokens, they are only "ballast" in the db. I don't know how a big db behaves on a busy site. Ours contain 1 Mio. tokens and have a size of 40 MB. They work very well with no resource hogging. But I have only a few thousand messages running thru each of our servers, there's probably none which gets more than 10.000 a day. If you get 100.000 it may be different.

> Would this value of 50
> achieve that? I don't want to come at work some day and see my tokens were
> lost again :-(

Just look at what the dump says about your oldest token. If your bayes "performance" is good then the hold time is probably of no interest, but if the spam detection from bayes is bad and you have a short hold time, one of the things I would look at is the short hold time.

> In general, should I do as you said, ie. trust the autolearn system and
> never use sa-learn again, provided that I do not have the time to do full
> training.

That's what we do. I only learn messages which were categorized wrong. Not by Bayes, but by SA. Most messages which get a score lower than 5 still get a BAYES_99, which means that Bayes identifies them all. Nevertheless, I learn these messages because they are spam and it reassures Bayes that they are spam. BTW: I have set BAYES_99 to 3.0, because it's so accurate for us.

> Thanks for giving me so much of your time, and being so patient with my
> silly questions.

No problem :-) I tend to be a bit snappy on first messages which look to me like the author could have done a bit more research, but once we are over that stage I hope I can give some good advice based on my experience.

Kai
Re: Bayes DB does not grow anymore
> I have been trying to get something from CVS for several days now, no luck.

Send me your email in private ([EMAIL PROTECTED]) to send it to you.

> Bayes needs constant training, but this doesn't mean it needs any manual
> training. Once it's up and running and "well-greased" it should take care of
> itself by auto-learning (bayes_auto_learn 1, don't know if on by default).
> About 70 or 80% of our spam and ham (especially the spam) is autolearned.

I will probably start again from scratch. One point: Do you think I should put custom rules inside /etc/mail/spamassassin or the default installation is enough?

> Actually, with those "few" tokens you won't loose much if you throw it away
> ;-) As I said upping that should help, no need to throw it away unless you
> think that's easier (if most spam you get scores at BAYES_50 it might be
> better to start over than to convince the db that it's spam).

I'll probably do it.

> > bayes_auto_expire 0
> > bayes_expiry_max_db_size 50
>
> I assume you just added/changed that?

Yes I just added this. Should auto_expire remain always at 0? Also, do you think it would be better if the db NEVER expired? Would this value of 50 achieve that? I don't want to come at work some day and see my tokens were lost again :-(

In general, should I do as you said, ie. trust the autolearn system and never use sa-learn again, provided that I do not have the time to do full training.

Thanks for giving me so much of your time, and being so patient with my silly questions.

Best regards,
Greg
Re: Bayes DB does not grow anymore
GRP Productions wrote on Mon, 14 Mar 2005 03:41:40 +0200:

> Indeed, this is the CVS version :-)

I have been trying to get something from CVS for several days now, no luck.

> This is perhaps because I have been using only 'mistake-based' training (ie
> training only when false classificaiton happens). However this used to work
> fine.

Bayes needs constant training, but this doesn't mean it needs any manual training. Once it's up and running and "well-greased" it should take care of itself by auto-learning (bayes_auto_learn 1, don't know if on by default). About 70 or 80% of our spam and ham (especially the spam) is autolearned.

> >your "hold time" is quite low, it's about a month. I think we haven tokens
> >from even a year ago. That's maybe a bit too much, but I strongly suggest
> >upping your bayes_expiry_max_db_size to something like 500.000 or so. Since
> >you have a much higher flux of messages than we have on that machine you
> >are literally "burning" your db to uselessness.
>
> So what would you suggest? I certainly dont want to lose everything that has
> been learned till now.

Actually, with those "few" tokens you won't lose much if you throw it away ;-) As I said, upping that should help, no need to throw it away unless you think that's easier (if most spam you get scores at BAYES_50 it might be better to start over than to convince the db that it's spam).

> Nope, there is definitely only the one comng with MS. I never use SA from
> the command line anyway.

Well, let's go back: you sa-learn a message, it says it learned, you dump magic and see there's no change, you look in the directory and there's no journal. There *has* to be at least one additional Bayes db. Or something happens which I haven't heard of in my about three years of using SA+Bayes. What's the output of "sa-learn --dump magic"? Don't specify a config file!

> bayes_path /var/spool/MailScanner/bayes/bayes

and what's in your /etc/mail/spamassassin/local.conf?

> bayes_auto_expire 0

ok, that means it won't expire. Of course, if it doesn't grow this isn't necessary ... ;-)

> bayes_expiry_max_db_size 50

I assume you just added/changed that?

> If I get it you mean that the tokens are lost very quickly?

Yes. However, now that I know that your bayes_expiry is off we have a different case? Since when has it been off? Since Feb. 11 as your dump magic suggests? Your oldest token is Feb. 2. So that either means you started the db that day or you are burning your tokens in 10 days. That's one problem; upping to a higher ceiling, as you already did, should take care of that.

The other problem is that it's apparently not growing. One of the reasons is, of course, that you only learn by mistake. So, how often is that done? How many do you actually add this way? The second part of this other problem is that even if you learn it doesn't seem to learn. I don't see any other possibility than that it uses different dbs.

> I think am
> confused , if bayes works with tokens, why does it need nspam and nham? Or
> are they just counters?

It's just the number of spam and ham messages you learned to it. Yes, it's more or less informational only.

> In general, do you think that setting bayes_expiry_max_db_size would be
> enough?

To cure the fast expiration, yes, but you didn't expire for the last 30 days, anyway.

> One final thing: Why even if i manually expire, the date of last expiration
> remains old?

Same reason as above: you work on different dbs. What does the expire output show?

Kai
Re: Bayes DB does not grow anymore
> That's okay, the problem just is one cannot be sure how accurate it is.
> Knowing that you use MS would have been useful, anyway :-) (BTW: my version
> of Mailwatch can't show this, do you use a CVS version?)

Indeed, this is the CVS version :-)

> See the number of tokens, we have ten times yours with less learned mail.
> That means that our db has much more tokens to qualify an email as ham or
> spam. Also

This is perhaps because I have been using only 'mistake-based' training (ie training only when false classification happens). However this used to work fine.

> your "hold time" is quite low, it's about a month. I think we haven tokens
> from even a year ago. That's maybe a bit too much, but I strongly suggest
> upping your bayes_expiry_max_db_size to something like 500.000 or so. Since
> you have a much higher flux of messages than we have on that machine you are
> literally "burning" your db to uselessness.

So what would you suggest? I certainly don't want to lose everything that has been learned till now.

> And you learned by specifying the config file? I suspect that you are at
> least occasionally using two SA configurations, the one coming with MS and
> the one coming with SA.

Nope, there is definitely only the one coming with MS. I never use SA from the command line anyway.

> Oh. Still possible, though. You don't need to have one, but on high volume
> systems it's highly recommended. Check your SA config (whereever it is :-)
> for bayes_learn_to_journal 1. I don't know if it is 1 by default, though.
> What do you have starting with bayes in your config file?

# grep bayes /opt/MailScanner/etc/spam.assassin.prefs.conf
# be created as /var/spool/spamassassin/bayes_msgcount, etc.
#bayes_path /var/spool/spamassassin/bayes
#bayes_file_mode 0600
bayes_path /var/spool/MailScanner/bayes/bayes
bayes_file_mode 0666
# MailScanner: big bayes_toks.new files wasting space.
bayes_auto_expire 0
bayes_expiry_max_db_size 50
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
# use_bayes 0

> Don't know if this would be of any help. As I said, I suspect you are using
> at least two different bayes dbs. At least when you do it from the command
> line. Run an "updatedb" and then "locate bayes" (this may not locate all
> files, f.i. not in /var !).

I think there is only one.

> MS, of course, can only use one and doesn't have a chance of confusing that,
> so when it uses SA that learns and checks the same db. And so far that part
> seems to be okay (except for the bigger size of bayes_seen, but as I said,
> this may be normal for your setup, I really don't know). But you burn your
> tokens too fast. At least that's what I think.

If I get it you mean that the tokens are lost very quickly? I think I am confused: if bayes works with tokens, why does it need nspam and nham? Or are they just counters?

In general, do you think that setting bayes_expiry_max_db_size would be enough?

One final thing: Why, even if I manually expire, does the date of last expiration remain old?
Re: Bayes DB does not grow anymore
GRP Productions wrote on Mon, 14 Mar 2005 00:32:42 +0200:

> You are right, I am using MailWatch. I just posted this output to be easy
> for one to see the actual dates without having to convert.

That's okay, the problem just is one cannot be sure how accurate it is. Knowing that you use MS would have been useful, anyway :-) (BTW: my version of Mailwatch can't show this, do you use a CVS version?)

> Here is the
> actual output:
>
> # /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0      49740          0  non-token data: nspam
> 0.000          0      47167          0  non-token data: nham
> 0.000          0     123325          0  non-token data: ntokens

I didn't look at this closely before, but I think this ratio indicates a problem, f.i. this is from our own mail server (just getting our own mail, not our clients'):

0.000          0      30089          0  non-token data: nspam
0.000          0      12515          0  non-token data: nham
0.000          0    1001630          0  non-token data: ntokens

See the number of tokens, we have ten times yours with less learned mail. That means that our db has much more tokens to qualify an email as ham or spam. Also your "hold time" is quite low, it's about a month. I think we have tokens from even a year ago. That's maybe a bit too much, but I strongly suggest upping your bayes_expiry_max_db_size to something like 500.000 or so. Since you have a much higher flux of messages than we have on that machine you are literally "burning" your db to uselessness.

> No it isn't. This is exactly the point I mentioned.

But you didn't prove it ;-)

> But as I said earlier,
> sa-learn claims it has learned, even from the web interface:
> >SA Learn: Learned from 1 message(s) (1 message(s) examined).

And you learned by specifying the config file? I suspect that you are at least occasionally using two SA configurations, the one coming with MS and the one coming with SA.

> This is getting more suspicious: there is no bayes_journal file!

Oh. Still possible, though. You don't need to have one, but on high volume systems it's highly recommended. Check your SA config (wherever it is :-) for bayes_learn_to_journal 1. I don't know if it is 1 by default, though. What do you have starting with bayes in your config file?

> -rw-rw-rw- 1 root nobody     1236 Mar 14 00:22 bayes.mutex
> -rw-rw-rw- 1 root nobody 10452992 Mar 14 00:22 bayes_seen
> -rw-rw-rw- 1 root nobody  5509120 Mar 14 00:02 bayes_toks

bayes_seen is quite high. I haven't ever seen that it is higher than bayes_toks on our systems. But maybe that's normal for high volume systems, I don't know. On the MailScanner list many people complain about very big bayes_seen files. Someone else on this list should comment on the size.

> I can assure you noone has touched anything inside this directory. If this
> is the reason for the problems I've been facing, is there a way to recreate
> the file without having to lose my current data? (perhaps by copying the
> above files somewhere, execute sa-learn --clear and some time later restore
> the above files?)

Don't know if this would be of any help. As I said, I suspect you are using at least two different bayes dbs. At least when you do it from the command line. Run an "updatedb" and then "locate bayes" (this may not locate all files, f.i. not in /var !). MS, of course, can only use one and doesn't have a chance of confusing that, so when it uses SA that learns and checks the same db. And so far that part seems to be okay (except for the bigger size of bayes_seen, but as I said, this may be normal for your setup, I really don't know). But you burn your tokens too fast. At least that's what I think.

Kai
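The "hold time" mentioned in this message is simply the span between the oldest and newest token atime. A minimal sketch of that arithmetic, using the two epoch values from the `--dump magic` output posted in this thread (the function names are just for illustration):

```python
from datetime import datetime, timezone


def as_utc(epoch_seconds):
    """Render a Unix timestamp from --dump magic as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def hold_time_days(oldest_atime, newest_atime):
    """Days between the oldest and the newest token access time."""
    return (newest_atime - oldest_atime) / 86400


oldest_atime = 1107319073  # "oldest atime" from the posted dump
newest_atime = 1110636450  # "newest atime" from the posted dump

print(as_utc(oldest_atime))
print(as_utc(newest_atime))
print(round(hold_time_days(oldest_atime, newest_atime), 1), "days")
```

The result is a little over five weeks, which matches the "about a month" estimate above, and the converted dates agree with the MailWatch summary once the +0200 offset is taken into account.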
Re: Bayes DB does not grow anymore
> That is the output of --dump magic? I haven't ever seen it formatted that
> nicely. I assume you skipped the first line, but there's also missing the
> expire atime delta. So, where do you got this from? Not directly from
> sa-learn --dump magic I'd say. You are running SA thru some interface? You
> should have said something about the whereabouts of your installation.

You are right, I am using MailWatch. I just posted this output to be easy for one to see the actual dates without having to convert. Here is the actual output:

# /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      49740          0  non-token data: nspam
0.000          0      47167          0  non-token data: nham
0.000          0     123325          0  non-token data: ntokens
0.000          0 1107319073          0  non-token data: oldest atime
0.000          0 1110636450          0  non-token data: newest atime
0.000          0 1108137790          0  non-token data: last journal sync atime
0.000          0 1108129534          0  non-token data: last expiry atime
0.000          0     804361          0  non-token data: last expire atime delta
0.000          0       3475          0  non-token data: last expire reduction count

> Ok. Get the values. Then learn a message to it. Make sure it says that it
> actually learned, then check the values again. Is either the spam or ham
> count increased by one or not?

No it isn't. This is exactly the point I mentioned. But as I said earlier, sa-learn claims it has learned, even from the web interface:

SA Learn: Learned from 1 message(s) (1 message(s) examined).

> Ok, this finally looks a bit suspicious. No sync and no expire for a month.
> If it doesn't sync you don't get new tokens. Check in your bayes directory
> how big your bayes_journal is. I'd think it's quite big. Do a sync now.
> (Please don't do it via an interface, do it on the command line.) What's the
> output? Is the journal gone and the number of tokens increased now? If so,
> you need to investigate why it doesn't sync anymore. Also do an expire then.

This is getting more suspicious: there is no bayes_journal file!

# ll /var/spool/MailScanner/bayes/
total 11780
drwxrwxrwx 2 root nobody     4096 Mar 14 00:22 .
drwxr-xr-x 4 root nobody     4096 Mar 13 11:55 ..
-rw-rw-rw- 1 root nobody     1236 Mar 14 00:22 bayes.mutex
-rw-rw-rw- 1 root nobody 10452992 Mar 14 00:22 bayes_seen
-rw-rw-rw- 1 root nobody  5509120 Mar 14 00:02 bayes_toks

I can assure you no one has touched anything inside this directory. If this is the reason for the problems I've been facing, is there a way to recreate the file without having to lose my current data? (perhaps by copying the above files somewhere, execute sa-learn --clear and some time later restore the above files?)

Thanks for your help
Re: Bayes DB does not grow anymore
GRP Productions wrote on Sun, 13 Mar 2005 22:54:22 +0200:

> Perhaps I have not been clear enough. It's not only that the files' size is
> constant. I am pasting the output of dump magic,

That is the output of --dump magic? I haven't ever seen it formatted that nicely. I assume you skipped the first line, but the expire atime delta is also missing. So, where did you get this from? Not directly from sa-learn --dump magic, I'd say. You are running SA thru some interface? You should have said something about the whereabouts of your installation.

> and I have to explain that
> the nham and nspam values are the same for many days now.

Ok. Get the values. Then learn a message to it. Make sure it says that it actually learned, then check the values again. Is either the spam or ham count increased by one or not?

> work fine. If I send to myself a message from Yahoo, with subject 'Viagra
> sex teen " and other nice words, I certainly do not want it to pass.
> Bayes classifies it as 50% spam. I tried to sa-learn --forget, and then
> re-learn, still is BAYES_50.

Again, this is NOT how Bayes works. You can't learn it one message and then expect it to flag that message as spam next time. Bayes does not work like this! And that it classifies that message as 50%, which means it cannot determine if it's ham or spam, just says that the tokens in the db are not good enough for that message. Or maybe it contains enough hammy tokens, whatever.

> Number of Spam Messages: 49,740
> Number of Ham Messages: 47,167
> Number of Tokens: 123,325
> Oldest Token: Wed, 2 Feb 2005 06:37:53 +0200
> Newest Token: Sat, 12 Mar 2005 16:07:30 +0200

Says it added or changed a token yesterday.

> Last Journal Sync: Fri, 11 Feb 2005 18:03:10 +0200
> Last Expiry: Fri, 11 Feb 2005 15:45:34 +0200
> Last Expiry Reduction Count: 3,475 tokens

Ok, this finally looks a bit suspicious. No sync and no expire for a month. If it doesn't sync you don't get new tokens. Check in your bayes directory how big your bayes_journal is. I'd think it's quite big. Do a sync now. (Please don't do it via an interface, do it on the command line.) What's the output? Is the journal gone and the number of tokens increased now? If so, you need to investigate why it doesn't sync anymore. Also do an expire then.

Kai
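The check suggested above (note the counters, learn one message, compare the counters again) can be scripted once the text of two `sa-learn --dump magic` runs has been captured. The sketch below only parses text in the format shown in this thread; the function names are illustrative, and the `after_learning` sample with nspam raised by one is hypothetical.

```python
PREFIX = "non-token data: "


def parse_dump_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[4].startswith(PREFIX):
            values[parts[4][len(PREFIX):].strip()] = int(parts[2])
    return values


def learned_count_changed(before, after):
    """True if nspam or nham grew between two dumps of the same database."""
    return any(after[key] > before[key] for key in ("nspam", "nham"))


# Counters as posted in the thread.
before_learning = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      49740          0  non-token data: nspam
0.000          0      47167          0  non-token data: nham
0.000          0     123325          0  non-token data: ntokens
"""

# Hypothetical second dump, as it would look if one spam had been learned.
after_learning = before_learning.replace("49740", "49741")

before = parse_dump_magic(before_learning)
print(learned_count_changed(before, parse_dump_magic(before_learning)))
print(learned_count_changed(before, parse_dump_magic(after_learning)))
```

If sa-learn reports "Learned from 1 message(s)" but the comparison still shows no change, the learn and the dump are most likely operating on different databases, which is the suspicion raised in this thread.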
Re: Bayes DB does not grow anymore
> This doesn't prove anything. sa-learn --dump magic shows you what's inside.
> Also, Bayes is not a checksum system like Razor, that's its strength. When
> you train it on a message, it extracts tokens (short pieces) from the message
> and adjusts its internal probability of each being ham or spam by a certain
> factor. If it doesn't know a token yet, it adds it. That the size doesn't
> grow can have several reasons, for instance expiry or the fact that the db
> format seems to have some "air" in it, so that it grows in jumps and not
> continually.

Perhaps I have not been clear enough. It's not only that the files' size is constant. I am pasting the output of dump magic, and I have to explain that the nham and nspam values have been the same for many days now. This is not normal, since we are talking about a very busy server (more than 4,000 messages per day). It has not always behaved this way; it used to work fine.

If I send myself a message from Yahoo with the subject "Viagra sex teen" and other nice words, I certainly do not want it to pass. Bayes classifies it as 50% spam. I tried sa-learn --forget and then re-learned it; it is still BAYES_50. The nham and nspam values used to increase very rapidly (sometimes by 200-300 per day). No errors are produced. I wouldn't have noticed the problem, but fortunately over the last few days more spam than usual started getting through.

Also, I tried to force an expiration many times, but as you can see the expiration did not take place. It's definitely not a file permission issue.

Thanks

Number of Spam Messages: 49,740
Number of Ham Messages: 47,167
Number of Tokens: 123,325
Oldest Token: Wed, 2 Feb 2005 06:37:53 +0200
Newest Token: Sat, 12 Mar 2005 16:07:30 +0200
Last Journal Sync: Fri, 11 Feb 2005 18:03:10 +0200
Last Expiry: Fri, 11 Feb 2005 15:45:34 +0200
Last Expiry Reduction Count: 3,475 tokens
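The dump magic figures in the message above already show how far the journal sync and the expiry have fallen behind the newest token. A minimal sketch of that arithmetic follows; the three timestamps are copied from the message, while the variable names are only illustrative and nothing here is part of SpamAssassin itself.

```python
from email.utils import parsedate_to_datetime

# Timestamps copied from the "sa-learn --dump magic" output quoted above.
newest_token = parsedate_to_datetime("Sat, 12 Mar 2005 16:07:30 +0200")
last_sync = parsedate_to_datetime("Fri, 11 Feb 2005 18:03:10 +0200")
last_expiry = parsedate_to_datetime("Fri, 11 Feb 2005 15:45:34 +0200")

# How long the database kept receiving tokens without a sync or an expiry.
sync_lag = newest_token - last_sync
expiry_lag = newest_token - last_expiry

print(f"whole days between last journal sync and newest token: {sync_lag.days}")
print(f"whole days between last expiry and newest token: {expiry_lag.days}")
```

A gap of about four weeks on a server handling thousands of messages a day is consistent with the suggestion elsewhere in the thread to check the journal and run a sync by hand.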
Re: Bayes DB does not grow anymore
GRP Productions wrote on Sun, 13 Mar 2005 11:21:12 +0200:

> for some days now my bayesian DB does not seem to grow. Its size remains
> stable. It is read with no problems by SA 3.0.2, but nothing new is written.
> I send an email to me, it is classified as BAYES_50. I sa-learn it as spam,
> send it again, and it is still BAYES_50 (I expected to see it as BAYES_99).

This doesn't prove anything. sa-learn --dump magic shows you what's inside. Also, Bayes is not a checksum system like Razor, that's its strength. When you train it on a message, it extracts tokens (short pieces) from the message and adjusts its internal probability of each being ham or spam by a certain factor. If it doesn't know a token yet, it adds it.

That the size doesn't grow can have several reasons, for instance expiry or the fact that the db format seems to have some "air" in it, so that it grows in jumps and not continually.

Kai
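The learning step described above can be pictured with a toy model. The sketch below is an illustration only: the class name ToyBayes, the whitespace tokenizer and the simple ratio formula are assumptions made for the example, not SpamAssassin's actual code or its real probability calculation. It shows why training on one message shifts per-token probabilities instead of marking that exact message as spam, which is what a checksum system would do.

```python
from collections import defaultdict


class ToyBayes:
    """Toy token store for illustration; NOT SpamAssassin's implementation."""

    def __init__(self):
        self.nspam = 0  # number of messages learned as spam
        self.nham = 0   # number of messages learned as ham
        self.counts = defaultdict(lambda: [0, 0])  # token -> [spam, ham]

    def learn(self, text, is_spam):
        """Count each distinct token once for the learned message."""
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        for token in set(text.lower().split()):
            self.counts[token][0 if is_spam else 1] += 1

    def token_prob(self, token):
        """Estimated probability that a message containing token is spam."""
        spam, ham = self.counts.get(token, (0, 0))
        if spam + ham == 0:
            return 0.5  # unknown token: no evidence either way
        spam_rate = spam / max(self.nspam, 1)
        ham_rate = ham / max(self.nham, 1)
        return spam_rate / (spam_rate + ham_rate)


db = ToyBayes()
db.learn("cheap pills now", is_spam=True)
db.learn("meeting now", is_spam=False)
for word in ("pills", "now", "meeting", "unseen"):
    print(word, db.token_prob(word))
```

In this toy model a token seen equally often in ham and spam stays at 0.5 however many times it is learned, so a message made of common words can remain near the middle of the scale after a single round of training.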
Re: bayes db version error
Michael Parker wrote:
> On Tue, Feb 08, 2005 at 05:28:51PM -0300, Matias Lopez Bergero wrote:
>>> 2) Throttle the calls to spamd to reduce lock contention.
>>
>> Sorry to ask this again, but I'm not a native English speaker :-P
>> Did you mean to increase the number of spamd children processes, like
>> spamd -m x? I have currently set spamd -m 10.
>
> No, it means to slow/reduce the calls to spamd. Increasing the number of
> children will probably make the problem get worse.

OK, I'm using milter-spamc to talk with sendmail milter and pass the messages to spamd. As far as I know, there is no way to control the calls to spamd from the milter-spamc command. I would have to reduce the spamd child processes or increase the milter timeout in order to reduce the calls to spamd, right?

>>> 3) Switch to SQL based bayes which won't (well shouldn't) have that
>>> issue.
>>
>> That's an interesting idea.
>> I'm going to keep that in mind :)
>
> You can view the notes/slides from my ApacheCon presentation on Storing
> SpamAssassin User Data in SQL Databases here:
> http://www.apache.org/~parker/presentations/
>
> Hopefully it will help move things along.

That's good. Thank you very much, Michael.

BR,
Matías.
Re: bayes db version error
On Tue, Feb 08, 2005 at 05:28:51PM -0300, Matias Lopez Bergero wrote:
> >2) Throttle the calls to spamd to reduce lock contention.
>
> Sorry to ask this again, but I'm not native English speaker :-P
> Did you mean to increase the number of spamd children processes, like
> spamd -m x? I have currently set spamd -m 10.

No, it means to slow/reduce the calls to spamd. Increasing the number of children will probably make the problem get worse.

> >3) Switch to SQL based bayes which won't (well shouldn't) have that
> > issue.
>
> That's an interesting idea.
> I'm going to keep that in mind :)

You can view the notes/slides from my ApacheCon presentation on Storing SpamAssassin User Data in SQL Databases here:

http://www.apache.org/~parker/presentations/

Hopefully it will help move things along.

Michael
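One way to picture "slow/reduce the calls" is a cap on how many scans run at the same time on the calling side. The sketch below is a generic illustration under stated assumptions: the limit of 3 and the function name throttled are hypothetical, and it does not describe any real milter-spamc or spamd option.

```python
import threading

# Hypothetical cap on simultaneous scans; pick a value that suits the server.
MAX_CONCURRENT_SCANS = 3
_slots = threading.BoundedSemaphore(MAX_CONCURRENT_SCANS)


def throttled(scan, message):
    """Run scan(message) while holding one of the limited slots.

    Callers beyond the cap block here until a slot frees up, so fewer
    scanners compete for the shared Bayes lock at any one moment.
    """
    with _slots:
        return scan(message)
```

The design point is that extra callers wait in a queue in front of the scanner instead of piling up inside it, which is the opposite of raising the number of children.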
Re: bayes db version error
Michael Parker wrote:
> On Tue, Feb 08, 2005 at 04:37:50PM -0300, Matias Lopez Bergero wrote:
>> I'm seeing a lot of messages about a version error in the bayes db in
>> my log file:
>>
>> spamd[6562]: bayes: bayes db version 0 is not able to be used, aborting!
>> at /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm
>> line 160.
>
> I'm assuming you're running sitewide bayes (or at least running as a
> single user) and on a somewhat busy server.

Yes, I forgot to say that. I'm running a sitewide install with about 6 incoming messages per day.

> That error message is a pretty good indication that SA couldn't get a
> lock on the bayes db files. It's actually just a warning, not an error,
> and it may or may not have actually aborted.
>
> You'll see this on a setup that is getting a good amount of traffic and
> using shared/sitewide bayes db files.
>
> Several things you can try:
>
> 1) If your db files aren't on an NFS filesystem switch your lock_method
> to flock (the default is nfssafe). If your shared db files are on an
> NFS filesystem then consider moving them off and switch your
> lock_method.

Done.

> 2) Throttle the calls to spamd to reduce lock contention.

Sorry to ask this again, but I'm not a native English speaker :-P
Did you mean to increase the number of spamd children processes, like spamd -m x? I have currently set spamd -m 10.

> 3) Switch to SQL based bayes which won't (well shouldn't) have that
> issue.

That's an interesting idea.
I'm going to keep that in mind :)

Thanks a lot, Michael.

BR,
Matías.
Re: bayes db version error
On Tue, Feb 08, 2005 at 04:37:50PM -0300, Matias Lopez Bergero wrote:
> I'm seeing a lot of messages about and version error in the bayes db in
> my log file:
>
> spamd[6562]: bayes: bayes db version 0 is not able to be used, aborting!
> at /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm
> line 160.

I'm assuming you're running sitewide bayes (or at least running as a single user) and on a somewhat busy server.

That error message is a pretty good indication that SA couldn't get a lock on the bayes db files. It's actually just a warning, not an error, and it may or may not have actually aborted.

You'll see this on a setup that is getting a good amount of traffic and using shared/sitewide bayes db files.

Several things you can try:

1) If your db files aren't on an NFS filesystem, switch your lock_method to flock (the default is nfssafe). If your shared db files are on an NFS filesystem, then consider moving them off and switching your lock_method.

2) Throttle the calls to spamd to reduce lock contention.

3) Switch to SQL based bayes which won't (well shouldn't) have that issue.

> Could this be affecting the spam filtering?

In theory, things should filter just fine, you just won't get BAYES results. If you're seeing something different then it's probably a bug.

Michael
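The behaviour described above, where a scan that cannot get the lock still completes but without a BAYES result, can be sketched with a non-blocking flock. This is an illustration only and not SpamAssassin's locking code: the helper names try_lock and score are hypothetical, and the fcntl module it relies on is available on Unix-like systems only.

```python
import fcntl


def try_lock(path):
    """Return an open, exclusively locked file, or None if the lock is busy."""
    fh = open(path, "a+")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        fh.close()
        return None
    return fh


def score(message, db_path):
    """Hypothetical scorer: use the Bayes db only when its lock is free."""
    fh = try_lock(db_path)
    if fh is None:
        # Lock contention: the message is still scored, just without BAYES.
        return "scored without BAYES"
    try:
        return "scored with BAYES"
    finally:
        fcntl.flock(fh, fcntl.LOCK_UN)
        fh.close()
```

On a busy sitewide setup many scanners hit the same files at once, so the "busy" branch is taken more often; that matches the advice to reduce contention instead of adding children.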
Re: bayes db - export/import
On Fri, 28 Jan 2005, Justin Mason stated:
> Rodney Green writes:
>> I'd like to copy the bayes db to the temporary mail server so it can
>> continue to be used and continue learning.
>>
>> Will I need to do some special export/import procedure or will I be
>> able to just copy the db files into the directory, set permissions and
>> be good to go?
>
> If it's the same architecture, and the same OS release, you can
> probably just copy. For safety I'd recommend using sa-learn --backup
> and --restore.

You shouldn't need to do that. Berkeley DB databases are byte-order-independent (well, they can be read and written by machines with any byte order), and the things SA puts in them are byte-order-independent too.

As evidence, I'm sharing a Bayes database between an i586, two UltraSPARCs, GNU/Linux (x2) and Solaris (x1) with no trouble. That's multiple architectures and multiple OSes; no problems are evident. :)

If you *do* need to do a backup-and-restore, it's a sign that something's compromised the byte-order-independence of what's being put in there: probably the wrong string as argument to a pack() in BayesStore.

-- 
`Blish is clearly in love with language. Unfortunately, language dislikes him intensely.' --- Russ Allbery
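The point about the pack() argument can be illustrated outside Perl. The sketch below uses Python's struct module as a stand-in; the sample value is arbitrary and the comparison to Perl's pack templates ("N" for big-endian, "L" for native order) is offered as an analogy, not as a description of what BayesStore actually does.

```python
import struct
import sys

atime = 1094789733  # arbitrary sample value, e.g. a token timestamp

# Explicit big-endian: the same four bytes on every machine
# (comparable to Perl's pack("N", ...)).
portable = struct.pack(">I", atime)

# Host byte order: the bytes depend on the CPU that wrote them
# (comparable to Perl's pack("L", ...)).
native = struct.pack("=I", atime)

print("host byte order:", sys.byteorder)
print("portable bytes :", portable.hex())
print("native bytes   :", native.hex())

# Reading portable data back works on any host, as long as the reader
# names the same explicit byte order.
assert struct.unpack(">I", portable)[0] == atime
```

If values were written in native order on a little-endian box and read on a big-endian one, the reader would see a different number, which is the kind of corruption a backup-and-restore would paper over.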
Re: bayes db - export/import
On Fri, 28 Jan 2005 11:48:32 -0800, Justin Mason <[EMAIL PROTECTED]> wrote:
> Rodney Green writes:
> > Hello,
> >
> > I'm setting up a temporary mail server so I can do some work on the
> > regular production machine, without interrupting service.
> >
> > I'd like to copy the bayes db to the temporary mail server so it can
> > continue to be used and continue learning.
> >
> > Will I need to do some special export/import procedure or will I be
> > able to just copy the db files into the directory, set permissions and
> > be good to go?
>
> If it's the same architecture, and the same OS release, you can
> probably just copy. For safety I'd recommend using sa-learn --backup
> and --restore.

Thanks Justin. I'll use sa-learn --backup and --restore.

Rod
Re: bayes db - export/import
Rodney Green writes:
> Hello,
>
> I'm setting up a temporary mail server so I can do some work on the
> regular production machine, without interrupting service.
>
> I'd like to copy the bayes db to the temporary mail server so it can
> continue to be used and continue learning.
>
> Will I need to do some special export/import procedure or will I be
> able to just copy the db files into the directory, set permissions and
> be good to go?

If it's the same architecture, and the same OS release, you can probably just copy. For safety I'd recommend using sa-learn --backup and --restore.

--j.
RE: Bayes DB Get Corrupted Quickly
Hi Tim,

The script I sent you dumps the tokens out to a text file because SA stores them in a Berkeley DB format. If you want to do it in place then just have a look at the script and edit the appropriate values. If you get really desperate then the two processes (encoding and decoding) are essentially just perl functions. Call one and then the other in the same start function and it will do the whole thing.

With regards to the atime problem that you have, look at this section of code:

    # 1078099200 is 2004-03-01 00:00:00 UTC
    if ($atime < 1078099200) {
        # print STDERR "\nThrowing away key that is too old:\n $k$ts$th$atime\n";
        print STDERR '*';
        $droppedcount++;
        next;
    }

    # reset atime if it is in the future
    if ($atime > time) {
        # print STDERR "\nResetting atime of key in the future:\n $k$ts$th$atime\n";
        print STDERR 'o';
        $atime = time;
        $resetcount++;
    }

I had to write the first if statement because after removing future tokens the DB still wouldn't expire. It was only when I pulled out the really old tokens as well that it worked. You might need to change these values in order to be successful. Just change the values to accept/deny tokens into your new database based on date.

We run an almost identical system to yours and this worked for us. You should read through the whole perl script though, because there is nothing more dangerous than executing someone else's code without understanding what it is going to do. I know there is nothing bad in the script, but do you?
Re: bayes db version 2 is not able to be used, aborting!
You sent three messages to the list in a row without indicating if the earlier problems are solved or if the three are actually connected. Don't you think that your problems are connected somehow?

You seem to have upgraded from SA 2.6x to 3.0. I assume you either have a mixed setup now or you need to upgrade some Perl modules SA depends on. You should start over. Why don't you put all this in a broader perspective and give more information (for instance, how you upgraded and from which version)?

Kai
Re: Bayes DB seemingly corrupted during v2 to v3 upgrade
On Sat, Sep 25, 2004 at 12:59:57PM -0500, Jeremy M. Dolan wrote:
> Hi all. Hoping someone might be able to help me out here. Just
> upgraded from 2.6x to 3.0.0 this morning, and, though I followed the
> Bayes DB upgrade steps in the UPGRADE file to a T, my token names all
> seem to be garbage now.
>
> Here's a few lines of the output from "sa-learn --dump all":
>
> 0.560 21 3 1094789733 dc60473720
> 0.992 6 0 1090849205 20d2b3d689
> 0.958 1 0 1092129562 23c375c031
> 0.998 20 0 1095699812 cc75bc02df

We no longer store the raw token value in the database; instead it is a hashed value. There is a small blurb about this in UPGRADE. The values in the dump are actually hex representations of the binary values stored in the database.

So, relax, your database is fine.

Michael
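To see why the dump now shows ten hex characters per token, here is a minimal sketch of hashing a token and printing it as hex. The message above only says the stored value is a hash; the specific choice below (SHA-1, keeping the last five bytes) is an assumption made for illustration that happens to match the width of the dump lines, and the function name hashed_token is hypothetical.

```python
import hashlib


def hashed_token(token: str) -> str:
    """Hex form of a truncated hash of a token (assumed scheme, see above)."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    # Keep five bytes, which print as ten hex characters like "dc60473720".
    return digest[-5:].hex()


for word in ("viagra", "meeting"):
    print(word, "->", hashed_token(word))
```

A hash like this is one-way, so the original words cannot be read back from the dump; the probabilities and counts in the other columns are unaffected.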
RE: bayes db problem upgrade from 2.63 --> 3.0
From the MailScanner list: run sa-learn with the --sync option to rebuild the bayes db; that seems to fix the problem. I was having the exact same issues, did that, and now the error about the bayes db is no longer there and I can see BAYES rules in action in the logs.

Debug output now says:

debug: bayes: 22150 tie-ing to DB file R/O /home/pfuser/.spamassassin/bayes_toks
debug: bayes: 22150 tie-ing to DB file R/O /home/pfuser/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: Score set 3 chosen.

I did have to reset the ownership of the bayes files from root to the postfix user, but that's probably just something on my setup.

> -----Original Message-----
> From: Nichols, William [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 22, 2004 11:18 AM
> To: users@spamassassin.apache.org
> Subject: RE: bayes db problem upgrade from 2.63 --> 3.0
>
> I am using SUSE 9.1
>
> I cannot install Berkeley DB, it dies; can't install DB_File, it dies.
>
> Any ideas? SA is working, just no bayes
>
> -----Original Message-----
> From: Rick Macdougall [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 22, 2004 11:10 AM
> To: users@spamassassin.apache.org
> Subject: Re: bayes db problem upgrade from 2.63 --> 3.0
>
> Nichols, William wrote:
> > Tried that as well before mailing the list - I am having a problem with
> > install DB_File
> >
> > version.c:30:16: db.h: No such file or directory
> > make: *** [version.o] Error 1
> > /usr/bin/make -- NOT OK
> > Running make test
> > Can't test without successful make
> > Running make install
> > make had returned bad status, install seems impossible
> >
> > hmmm - any help?
>
> Install Berkeley DB? And if you are using redhat the devel rpm as well
> (if there is one, dunno, I don't do redhat).
>
> Regards,
>
> Rick
RE: bayes db problem upgrade from 2.63 --> 3.0
I am using SUSE 9.1

I cannot install Berkeley DB, it dies; can't install DB_File, it dies.

Any ideas? SA is working, just no bayes

-----Original Message-----
From: Rick Macdougall [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 22, 2004 11:10 AM
To: users@spamassassin.apache.org
Subject: Re: bayes db problem upgrade from 2.63 --> 3.0

Nichols, William wrote:
> Tried that as well before mailing the list - I am having a problem with
> install DB_File
>
> version.c:30:16: db.h: No such file or directory
> make: *** [version.o] Error 1
> /usr/bin/make -- NOT OK
> Running make test
> Can't test without successful make
> Running make install
> make had returned bad status, install seems impossible
>
> hmmm - any help?

Install Berkeley DB? And if you are using redhat the devel rpm as well (if there is one, dunno, I don't do redhat).

Regards,

Rick
RE: bayes db problem upgrade from 2.63 --> 3.0
I'm having the exact same problem. Upgraded to 3.0, everything seems ok except the Bayes DB error.

Tried

perl -MCPAN -e "install DB_File"

and got the same results as William below. SA is working, but no bayes is happening anymore.

> -- ORIGINAL MESSAGE
> From: Nichols, William ci.redding.ca.us>
> Subject: RE: bayes db problem upgrade from 2.63 --> 3.0
> Newsgroups: gmane.mail.spam.spamassassin.general
> Date: Wed, 22 Sep 2004 10:39:40 +
>
> Tried that as well before mailing the list - I am having a problem with
> install DB_File
>
> version.c:30:16: db.h: No such file or directory
> make: *** [version.o] Error 1
> /usr/bin/make -- NOT OK
> Running make test
> Can't test without successful make
> Running make install
> make had returned bad status, install seems impossible
>
> hmmm - any help?
>
> -----Original Message-----
> From: Rick Macdougall [mailto:rickm nougen.com]
> Sent: Wednesday, September 22, 2004 10:35 AM
> To: Nichols, William
> Cc: users spamassassin.apache.org
> Subject: Re: bayes db problem upgrade from 2.63 --> 3.0
>
> Nichols, William wrote:
> > I installed spamassassin (test box) from cpan over my existing SA 2.63
> >
> > When I try to sa-learn --sync I get the following "bayes db version 2 is
> > not able to be used, aborting! at
> > /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/BayesStore/DBM.pm line 160"
>
> Same problem till I upgraded DB_File
>
> perl -MCPAN -e "install DB_File"
>
> on a side note, I also needed to install Storable as well.
>
> Regards,
>
> Rick
Re: bayes db problem upgrade from 2.63 --> 3.0
Nichols, William wrote:
> Tried that as well before mailing the list - I am having a problem with
> install DB_File
>
> version.c:30:16: db.h: No such file or directory
> make: *** [version.o] Error 1
> /usr/bin/make -- NOT OK
> Running make test
> Can't test without successful make
> Running make install
> make had returned bad status, install seems impossible
>
> hmmm - any help?

Install Berkeley DB? And if you are using redhat the devel rpm as well (if there is one, dunno, I don't do redhat).

Regards,

Rick
RE: bayes db problem upgrade from 2.63 --> 3.0
Tried that as well before mailing the list - I am having a problem with install DB_File

version.c:30:16: db.h: No such file or directory
make: *** [version.o] Error 1
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
make had returned bad status, install seems impossible

hmmm - any help?

-----Original Message-----
From: Rick Macdougall [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 22, 2004 10:35 AM
To: Nichols, William
Cc: users@spamassassin.apache.org
Subject: Re: bayes db problem upgrade from 2.63 --> 3.0

Nichols, William wrote:
> I installed spamassassin (test box) from cpan over my existing SA 2.63
>
> When I try to sa-learn --sync I get the following "bayes db version 2 is
> not able to be used, aborting! at
> /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/BayesStore/DBM.pm line 160"

Same problem till I upgraded DB_File

perl -MCPAN -e "install DB_File"

on a side note, I also needed to install Storable as well.

Regards,

Rick
Re: bayes db problem upgrade from 2.63 --> 3.0
Nichols, William wrote:
> I installed spamassassin (test box) from cpan over my existing SA 2.63
>
> When I try to sa-learn --sync I get the following "bayes db version 2 is
> not able to be used, aborting! at
> /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/BayesStore/DBM.pm line 160"

Same problem till I upgraded DB_File

perl -MCPAN -e "install DB_File"

on a side note, I also needed to install Storable as well.

Regards,

Rick