Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 11:12 AM, John Hardin wrote:
> A week or so back they briefly listed some of the MailControl.com MTAs, due to apparent exploits. They were quickly removed, though.

So the message here is that some DNSBL's are better than others about including and removing addresses quickly and responsibly. Perhaps. I take no position on that. But that does not address the issue of collateral damage to users who share an ISP's email server with someone else who happened to get a spam through and was reported to the DNSBL.

Not long ago, I had another client blocked from sending response emails to their on-line customers about their purchases. Turned out one of the users on the hosting provider's system had sent some spam. Now the hosting provider (Webfaction) is quite responsible, very diligent, and has *fantastic* support. (I can recommend them for dynamic language apps with no reservations.)

But guess what? The DNSBL's interface for interacting with them was down. For over a week. (We're sorry, but... Please come back when... No guaranty as to...) And emails to the affected customers were blocked for all that time.

I use DNSBL's. But I don't like them.

SA is indispensable. I like it. But it's a huge compilation of kluges that happen to mostly work. Expedient. Pragmatic. Not a real solution to the actual problem.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 11:10 AM, Jim Popovitch wrote:
> Just a heads-up... that sort of biting comment is probably not welcome

I'm familiar with adapting to the relative insularities of various lists. But thanks for the heads-up, Jim.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On Wed, 2 Jul 2014, Axb wrote:
> If a sender's IP is listed @Spamhaus, he has a serious problem reaching many, many destinations. If he's been exploited, you get good evidence and fast delisting processing and I have yet to see a real FP with ZEN.

A week or so back they briefly listed some of the MailControl.com MTAs, due to apparent exploits. They were quickly removed, though.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
There is no better measure of the unthinking contempt of the environmentalist movement for civilization than their call to turn off the lights and sit in the dark. -- Sultan Knish
---
2 days until the 238th anniversary of the Declaration of Independence
Re: Bayes, Manual and Auto Learning Strategies
On Wed, Jul 2, 2014 at 11:54 AM, Steve Bergman wrote:
>> I suggest you join the SDLU list where you can discuss anti spam
>> philosophy.
>
> Thanks. I suggest that you consult for an ISP-dependent business someday.
> ;-)
>
> It's an education, too.
>
> -Steve

Just a heads-up... that sort of biting comment is probably not welcome on the SDLU list.

-Jim P.
Re: Bayes, Manual and Auto Learning Strategies
> I suggest you join the SDLU list where you can discuss anti spam philosophy.

Thanks. I suggest that you consult for an ISP-dependent business someday. ;-)

It's an education, too.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 05:39 PM, Steve Bergman wrote:
> On 07/02/2014 09:48 AM, Axb wrote:
>> If an IP is exploited/sends spam and a legitimate msg is rejected then somebody hasn't done due diligence and I see the reject as legitimated.
>
> The legitimate senders and receivers of the good message, neither of whose companies have anything to do with the spam, would not see it that way. And I agree with their perspective.
>
> Some of the perspectives I'm reading here seem really off in the ether. I get the impression that some are so frustrated with SA's limitations that they are willing to resort to desperate measures which normal users would instantly recognize as insane.
>
> No rudeness intended. But some of the things I'm reading here are just bizarre.

I suggest you join the SDLU list where you can discuss anti spam philosophy. It's a great resource for knowledge.

List Guidelines: http://www.new-spam-l.com/admin/faq.html
List Information: https://spammers.dontlike.us/mailman/listinfo/list

The Mailop list is also a good place to lurk and bathe in hundreds of years of mail related experience:
http://chilli.nosignal.org/mailman/listinfo/mailop
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 09:48 AM, Axb wrote:
> If an IP is exploited/sends spam and a legitimate msg is rejected then somebody hasn't done due diligence and I see the reject as legitimated.

The legitimate senders and receivers of the good message, neither of whose companies have anything to do with the spam, would not see it that way. And I agree with their perspective.

Some of the perspectives I'm reading here seem really off in the ether. I get the impression that some are so frustrated with SA's limitations that they are willing to resort to desperate measures which normal users would instantly recognize as insane.

No rudeness intended. But some of the things I'm reading here are just bizarre.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 04:40 PM, Steve Bergman wrote: You are discussing about DNSBLs but not being specific. I'm specific in that all the DNSBL's blacklist IP addresses or blocks. And that in today's world many, many companies share sets of mail servers with many other companies and individuals. If an IP is exploited/sends spam and a legitimate msg is rejected then somebody hasn't done due diligence and I see the reject as legitimated. If I need to open up, I have options as the DNSWL, etc.
Re: Bayes, Manual and Auto Learning Strategies
You are discussing about DNSBLs but not being specific. I'm specific in that all the DNSBL's blacklist IP addresses or blocks. And that in today's world many, many companies share sets of mail servers with many other companies and individuals. I'll let others sell you this Hoover. No sale necessary. I continue to recognize the overall expediency of the DNSBL kluge, and continue to use it myself. I wouldn't buy a Hoover anyway. I'm a Kirby kind of guy. I have a 1969 Dual Sanitronic 80 that my grandmother gave our family new, as a Christmas gift. https://c1.staticflickr.com/7/6071/6056367963_f06f08c7f6_z.jpg A 1976 Classic III that I picked up at a garage sale. http://cdn3.volusion.com/maxg3.xen6j/v/vspfiles/photos/KirbyClassicIII-4.jpg?1329982229 And a really cool model 516, manufactured in 1956 that someone had set out on the curb for garbage pickup, which I rescued and restored. http://www.1377731.com/kirby/516_5.jpg All stock photos. Not mine. -Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 03:54 PM, Steve Bergman wrote:
> On 07/02/2014 06:45 AM, Axb wrote:
>> I'm pretty sure, a huge amount of SA users trust Spamhaus' ZEN at smtp level for outright rejects.
>
> At this point, I'm using the defaults, other than upping BAYES_999 enough to total to 5.0 when added to BAYES_99.
>
>> If a sender's IP is listed @Spamhaus, he has a serious problem reaching many, many destinations.
>
> Many, many destinations? Or a high percentage of destinations?
>
> I recently had to explain to the owner of the company why an important email from one of his business associates at another company was blocked. I told him that they were on a couple of spam block lists (which they were) and that contributed to the mail's rejection. I made the same pitch. "This should affect their outgoing mail to many sites, etc.". But I'm not sure I believe it.
>
> When I interact with people who've had their emails rejected (often related to DNSBLs) I've been listening for any mention of other mails of theirs to other companies being blocked. But when the DNSBL rules in SA are the major contributors to the rejecting, it seems that we are the only domain they interact with which is doing so.
>
> Entries in the DNSBL databases do great collateral damage. And of course none of these companies are spammers. They're with this or that ISP who has, at one time, had someone exploit their servers to send spam.
>
> DNSBL's are like a guy with a bazooka trying to play sniper.

You are discussing DNSBLs but not being specific.

With millions of sessions/day I'm glad Spamhaus keeps my servers from melting.

I'll let others sell you this Hoover.
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 06:45 AM, Axb wrote:
> I'm pretty sure, a huge amount of SA users trust Spamhaus' ZEN at smtp level for outright rejects.

At this point, I'm using the defaults, other than upping BAYES_999 enough to total to 5.0 when added to BAYES_99.

> If a sender's IP is listed @Spamhaus, he has a serious problem reaching many, many destinations.

Many, many destinations? Or a high percentage of destinations?

I recently had to explain to the owner of the company why an important email from one of his business associates at another company was blocked. I told him that they were on a couple of spam block lists (which they were) and that contributed to the mail's rejection. I made the same pitch. "This should affect their outgoing mail to many sites, etc.". But I'm not sure I believe it.

When I interact with people who've had their emails rejected (often related to DNSBLs) I've been listening for any mention of other mails of theirs to other companies being blocked. But when the DNSBL rules in SA are the major contributors to the rejecting, it seems that we are the only domain they interact with which is doing so.

Entries in the DNSBL databases do great collateral damage. And of course none of these companies are spammers. They're with this or that ISP who has, at one time, had someone exploit their servers to send spam.

DNSBL's are like a guy with a bazooka trying to play sniper.

-Steve
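The score adjustment described above is plain addition: if both BAYES_99 and BAYES_999 fire on the same message, their scores sum, so BAYES_999 only needs to cover the gap up to the 5.0 total. A minimal sketch of that arithmetic follows. The BAYES_99 value of 3.5 is an assumed figure for illustration, not one given in the thread, and `extra_score_needed` is a hypothetical helper.

```python
def extra_score_needed(target_total: float, base_rule_score: float) -> float:
    """Score a second rule must carry so that both rules together reach target_total."""
    return max(0.0, target_total - base_rule_score)


TARGET_TOTAL = 5.0      # the total mentioned in the post
ASSUMED_BAYES_99 = 3.5  # assumption for illustration; check your own rule scores

if __name__ == "__main__":
    print(extra_score_needed(TARGET_TOTAL, ASSUMED_BAYES_99))
```

With a different BAYES_99 score in the local configuration, the same subtraction gives the BAYES_999 value to set.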
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 10:47 AM, Steve Bergman wrote:
> The DNSBL's are problematic because so many ISPs' mail servers are on them. We get quite a few emails from employees at companies whose ISPs are on Spamhaus lists, or whatever, due to nothing that has anything to do with them.

I'm pretty sure, a huge amount of SA users trust Spamhaus' ZEN at smtp level for outright rejects.

If a sender's IP is listed @Spamhaus, he has a serious problem reaching many, many destinations. If he's been exploited, you get good evidence and fast delisting processing and I have yet to see a real FP with ZEN.

Consider that it is better for a sender to get a hard reject than to have msgs land in some spam folder and remain unseen.

but then...
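For readers following the collateral-damage argument, it may help to see why a listing hits everyone behind a shared mail server: a DNSBL is keyed on the connecting IP address alone. The client reverses the IPv4 octets, appends the list's zone, and does an ordinary DNS lookup; an answer in 127.0.0.0/8 means "listed", and NXDOMAIN means "not listed". A minimal sketch of building that query name is below. It does no network lookup, `dnsbl_query_name` is a hypothetical helper, and the address used is from the documentation range.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name a DNSBL client would look up for an IPv4 address."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    # 192.0.2.25 -> 25.2.0.192.<zone>
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


if __name__ == "__main__":
    # Every customer sending through this one address shares this one name.
    print(dnsbl_query_name("192.0.2.25"))
```

Nothing in the query identifies the sender's domain or account, which is the mechanism behind both positions in the thread: the list cannot tell the exploited customer from its neighbours, and the receiving site cannot either.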
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 10:47 AM, Steve Bergman wrote:
> But for all the discussion today, we never really had a good talk about postscreen, which is something I'd like to hear someone expound a bit upon.

probably Wrong list ... review Postfix list archives
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 10:47 AM, Steve Bergman wrote:
> I'll add you to the list of people telling me that jumping out of an airplane at 20,000 feet with nothing but a parachute and a pair of underwear is fun.

Yep... it is... though you could catch a cold...
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 03:05 AM, Dave Funk wrote:
> Unless you've explicitly disabled them, the network based rules (razor, pyzor, dcc, DNS based rules, RBLs, URIBLs, etc) constitute an external 'reputation' system to pass judgment on messages.

Actually, DCC is not included in the default due to arbitrary restrictions on request volume for the public servers. 100,000 per day or something. And neither is Pyzor, presumably for similar reasons? Razor2 is in by default.

I use all these, but have reservations about them. DCC, Pyzor, and Razor2 are lists of bulk email. Not specifically of *unsolicited* bulk email. Many of my users are on lists of various sorts.

The DNSBL's are problematic because so many ISPs' mail servers are on them. We get quite a few emails from employees at companies whose ISPs are on Spamhaus lists, or whatever, due to nothing that has anything to do with them.

> It's not uncommon to take a low-scoring spam and find that it gets a higher score on retest as it has been added to various bad-boy lists.

Except that the "bad-boy" lists flag more ham than spam.

> This is also one way that gray-listing helps.

Review the thread. You don't want to talk to me about greylisting. ;-)

But for all the discussion today, we never really had a good talk about postscreen, which is something I'd like to hear someone expound a bit upon.

> I've used site-wide Bayes with auto-learning at a site with ~3000 users and have had to flush & restart our Bayes database twice in 10 years.

I'll add you to the list of people telling me that jumping out of an airplane at 20,000 feet with nothing but a parachute and a pair of underwear is fun.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 02:39 AM, Dave Funk wrote:
> Steve, For some reason you seem to be hung-up on Bayes "autolearning".

Skip down the thread. I was demonstrated to be wrong. :-)

> Is it possible that you're confusing it with "Auto-White listing"? (which is now deprecated and has -nothing- to do with Bayes).

No. I know the difference. AWL, planned to be replaced with TxRep and all that. (I'd mention that TxRep has problems, but it's too late at night for me to engage in yet another argument.)

> SA's Bayesian scorer is a system based upon a method that parses a message, extracts 'tokens' from it and uses an algorithm to calculate a score for the message based upon a dictionary of previously seen tokens and their relative merit.

Yeah. Bayesian statistics is pretty cool.

> or via an automated process from within SA as it scores messages (known as 'auto' learning). So regardless of whether manual or auto learning is utilized, tokens are added to the dictionary.

See, that's where things stop making sense to me. I would not expect the Bayesian filter to do any better than its training. And if its training is via input from static rules (plus DNSBL's and DCC's) I would not expect it to be able to do any better. And it's not hard to imagine pathological behavior developing. But people are telling me different. And I'm open to considering alternative possibilities.

> It's also possible to employ both auto & manual learning methods in the same installation.

That would be the scenario I am considering.

> There can be one dictionary used for scoring all messages processed (called "site wide Bayes") or many separate dictionaries, one used for each recognized user ("per user Bayes"). Either way, the dictionary(s) need to be updated (and the update process could be either manual, auto, or both).

Yes. I've been devoted to individual fileDB's, each individually trained for a particular user's spam^Wemail stream. People are telling me that system-wide databases work well.

> It's been this way for the past 10+ years AFAIK (well, maybe 10 years ago it didn't have as many options for back-end database storage, mostly limited to Berkeley-DB type methods).

I think it was around 2003, in SA 2.5(?) that SA got a Bayesian classifier. IIRC, there was a project called dspam (which I think is still around). For a while the dspam guys were pushing the fact that *dspam* was a modern spam filter, and SA was old, clunky, and too outdated to use.

Anyway, in the very early versions of SA Bayes, everything was system-wide. Later they added the option to use individual user files. And the only info I've seen that described autolearn and how it worked was a mailing list post from 2004 which specifically stated that it was system-wide, in memory, and was lost upon restart. Maybe that's correct and maybe it's not. But today, it looks to be user-specific, if configured that way.

I'm still working out whether I want to use it, and if so, how.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On Wed, 2 Jul 2014, Steve Bergman wrote: Well... I just turned on autolearn for a moment, deleted the bayes_* files on the test account I use, and sent myself a message from my usual outside account. And new bayes_* files were created. So I was wrong, and I win. More options. So now I can proceed to the "what does this mean?" phase. If I leave things as they are, then training is perfect if the users are diligent. But if they are not, then... what? I see plenty of spams getting through with a 0.0 score. IIRC, the autolearn spam threshold is 7? Pretty much everything there is spam. But I'm not sure I quite buy having the static rules of SA training Bayes. Isn't Bayes just learning to emulate the static rules, with all their imperfections? Unless you've explicitly disabled them, the network based rules (razor, pyzor, dcc, DNS based rules, RBLs, URIBLs, etc) constitute an external 'reputation' system to pass judgment on messages. It's not uncommon to take a low-scoring spam and find that it gets a higher score on retest as it has been added to various bad-boy lists. This is also one way that gray-listing helps. If you stiff-arm the first pass of a spam run a later check may hit it more accurately as it's been added to block-lists in the mean-time. If it starts going wrong, doesn't that mean the errors are going to spiral out of control? That is a possible risk of relying solely on auto-learning. The autolearn system has been carefully crafted and tuned over the years to try to prevent a feed-back loop from throwing it into a tail-spin. For example the internal scoring system used to determine if a message is spam or ham WRT the choice for auto-learning explicitly excludes the Bayes score (and other particular kinds of scores such as white/black lists) to try to prevent tail-eating. Occasional judicious manual learning can help to 'tweak' things when Bayes looks like it's not in top shape. (IE manual learning of FPs & FNs). 
I've used site-wide Bayes with auto-learning at a site with ~3000 users and have had to flush & restart our Bayes database twice in 10 years.

Dave

--
Dave Funk, University of Iowa College of Engineering
319/335-5751 FAX: 319/384-0549
1256 Seamans Center, Iowa City, IA 52242-1527
Sys_admin/Postmaster/cell_admin
#include
Better is not better, 'standard' is better. B{
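The feedback-loop safeguard described above (the autolearn decision ignores the Bayes score itself) can be sketched roughly as follows. This is a simplified illustration, not the plugin's code: the rule names and scores are made up, the prefix list in `EXCLUDED_PREFIXES` is an assumption, and the real AutoLearnThreshold plugin applies further conditions that are not modelled here. The default thresholds of 0.1 and 12.0 are the ones given in the plugin's documentation.

```python
from typing import Dict, Optional

# Rule-name prefixes left out of the autolearn score in this sketch.
# The exact list is an assumption; the thread only says that Bayes and
# white/black-list style scores are excluded.
EXCLUDED_PREFIXES = ("BAYES_", "USER_IN_WHITELIST", "USER_IN_BLACKLIST")


def autolearn_decision(hits: Dict[str, float],
                       ham_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> Optional[str]:
    """Return 'spam', 'ham', or None (do not learn) for one scanned message.

    `hits` maps each rule that fired to the score it contributed.
    """
    score = sum(points for rule, points in hits.items()
                if not rule.startswith(EXCLUDED_PREFIXES))
    if score > spam_threshold:
        return "spam"
    if score < ham_threshold:
        return "ham"
    return None  # in the uncertain middle band nothing is learned


if __name__ == "__main__":
    # Hypothetical message: Bayes is confident, but the other rules alone
    # add up to 9.0, so the message is not fed back into Bayes.
    print(autolearn_decision({"BAYES_99": 3.5, "RULE_A": 9.0}))
```

The point of the design is visible in the last example: Bayes cannot vote a message into its own training set, so a drifting database does not directly reinforce itself. It still inherits whatever mistakes the remaining rules make, which is the concern raised in the thread.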
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 02:14 AM, Axb wrote:
> You don't need to trust me or believe me (I'm not selling anything - just commenting on what works for me)

Well, I know you know what I meant.

> Ever thought of running a newer distro in a VM, only for SA and let spamass-milter use that? That would mean you can play with SA 3.4 without having to redo all your mail infra?

I'm pushing to do our Ubuntu 14.04 upgrade soon to get the dovecot full text search. And then a memory upgrade. And these days I just max them out on memory. 4GB -> 32GB. Plus adding a 4TB RAID1. So it ought to be able to handle almost anything. And I've just confirmed that SA 3.4 made it into 14.04. That should, at least, avert all those annoying "time to upgrade" responses like I got here earlier.

It's very late here. 2:45AM, I see. But it's been a lot of fun arguing with you guys today. And thanks for all the help. Pyzor seems to be functioning fine now.

General rules of thumb to keep in mind: Whenever there are inexplicable problems, it's probably SELinux causing them. And if not that, regular old POSIX permissions. And if ever there is an article of clothing you need but can't find anywhere in the house, there's usually a dog sleeping on it. Or possibly a cat.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On Wed, 2 Jul 2014, Steve Bergman wrote:
> On 07/01/2014 11:49 PM, Karsten Bräckelmann wrote:
>> Those do not tell you about using file or SQL based databases?
>
> They do. But not specifically with respect to autolearn.
>
>> You never thought about googling for "spamassassin per user" and friends? You never checked the SA wiki?
>
> I have, indeed. No reference to autolearn and persistent storage. The lack of mention is notable. I'd expect people to be lining up to tell me I'm mistaken if I absolutely were. Can you point me to a change log somewhere documenting autolearn moving from in-memory and system-wide to per user and persistent?
>
> I don't hold a strong opinion on this. It would be nice if I were wrong. It would open more options. I'm just waiting for evidence that it's the case. My perception is that it's not.
>
> -Steve

Steve,

For some reason you seem to be hung-up on Bayes "autolearning". Is it possible that you're confusing it with "Auto-White listing"? (which is now deprecated and has -nothing- to do with Bayes).

SA's Bayesian scorer is a system based upon a method that parses a message, extracts 'tokens' from it and uses an algorithm to calculate a score for the message based upon a dictionary of previously seen tokens and their relative merit.

The dictionary is created and updated by a process called 'learning' wherein already-classified messages are tokenized and their tokens are stored in the dictionary along with a merit value derived from their instance count and a factor taken from being classified as spam or ham.

This learning process can be either externally driven (known as 'manual' learning) or via an automated process from within SA as it scores messages (known as 'auto' learning). So regardless of whether manual or auto learning is utilized, tokens are added to the dictionary. It's also possible to employ both auto & manual learning methods in the same installation.

There can be one dictionary used for scoring all messages processed (called "site wide Bayes") or many separate dictionaries, one used for each recognized user ("per user Bayes"). Either way, the dictionary(s) need to be updated (and the update process could be either manual, auto, or both).

The Bayes dictionary(s) need to be stored somehow; the usual method is via some kind of database. It could be a simple file based DB, some kind of fancy SQL server based system or something else. This is a DBA'ish kind of choice as to what particular technology is used to store the dictionary DB. (usually on disk in some way, could be in some kind of memory resident set of tables, or something else???).

So you have a multi-dimensional matrix WRT your Bayes system configuration, and manual VS auto learning is just one factor.

It's been this way for the past 10+ years AFAIK (well, maybe 10 years ago it didn't have as many options for back-end database storage, mostly limited to Berkeley-DB type methods).

I hope this helps you.

--
Dave Funk, University of Iowa College of Engineering
319/335-5751 FAX: 319/384-0549
1256 Seamans Center, Iowa City, IA 52242-1527
Sys_admin/Postmaster/cell_admin
#include
Better is not better, 'standard' is better. B{
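The learn-then-score cycle described above can be sketched with a toy token dictionary. This is a plain naive-Bayes stand-in written for illustration only: SpamAssassin's real tokenizer, storage, and probability combining are more elaborate, and the class name `TokenDictionary`, the tokenizing regex, and the smoothing constants are all assumptions.

```python
import math
import re
from collections import Counter


class TokenDictionary:
    """Toy Bayes dictionary: per-token counts of spam and ham messages seen."""

    def __init__(self) -> None:
        self.spam_tokens: Counter = Counter()
        self.ham_tokens: Counter = Counter()
        self.nspam = 0  # messages learned as spam
        self.nham = 0   # messages learned as ham

    @staticmethod
    def tokenize(text: str) -> set:
        # Each distinct token counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def learn(self, text: str, is_spam: bool) -> None:
        """Manual and auto learning both end up here; only the caller differs."""
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def spam_probability(self, text: str) -> float:
        """Combine per-token evidence into a single probability in [0, 1]."""
        if self.nspam == 0 or self.nham == 0:
            return 0.5  # untrained: no opinion
        log_odds = 0.0
        for tok in self.tokenize(text):
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.spam_tokens[tok] + 1) / (self.nspam + 2)
            p_ham = (self.ham_tokens[tok] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    d = TokenDictionary()
    d.learn("cheap pills buy now", is_spam=True)
    d.learn("buy cheap watches now", is_spam=True)
    d.learn("meeting agenda for monday", is_spam=False)
    d.learn("lunch on monday?", is_spam=False)
    print(d.spam_probability("buy cheap pills"))
    print(d.spam_probability("monday meeting"))
```

In these terms, "site wide" versus "per user" is only a question of how many `TokenDictionary` instances exist and which one a given message is scored against, and the storage backend is a question of where the two counters live.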
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 02:02 AM, Axb wrote:
> and don't count on that - they may do it the first week, new toy, but for how long?

Not new. They'd previously been training SA with Evolution for some years. I have some confidence in many of them doing it right.

> Also: keep in mind each user's Bayes folder also gets a bayes_seen file which grows and grows and grows and never gets truncated.

Well, I have the maximum bayes toks set at 2,000,000. Is bayes_seen likely to become a problem with ~100 users and 4TB of disk space? My largest email volume user has accumulated only 320k of "seen" in 10 days. And I assume that repeat spams don't add to it.

> Do you really want to spend time watching each user's Bayes?

Not really. But I'll do whatever is necessary.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 09:01 AM, Steve Bergman wrote:
> Axb, I'm not sure I quite believe it. And I'm not quite sure I trust you. But you do make an attractive pitch. Excellent spam filtering, system-wide, with no responsibility for training on the part of the users?

You don't need to trust me or believe me (I'm not selling anything - just commenting on what works for me)

You can try it and after a couple of weeks, see if it works for you and then if necessary come up with new methods for extra training or dump the concept totally. Bayes is yet another scoring mechanism in SA. If you have enough traffic, you can wipe the data any time and it's not like you're switching SA off totally.

During the dev/test process of the Redis backend, as stuff changed on a daily basis I was forced to purge the Bayes data several times/week. It even became a running joke (wave Henrik/Marc).

> This sounds like the kind of "too good to be true" message that I'd expect to receive in a spam mail. :-)
>
> But hmm. This is good dream material for tonight. I wonder if our Ubuntu 14.04 upgrade has SA 3.4 with redis built in. I do hear that the redis backend is amazing.

Ever thought of running a newer distro in a VM, only for SA and let spamass-milter use that? That would mean you can play with SA 3.4 without having to redo all your mail infra?
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 08:48 AM, Steve Bergman wrote:
> Someone, please convince me that I should turn it on.

autolearn doesn't mean you cannot also train manually...

> Should I turn it on and take my "train as ham" entry out of .forward? Or should I not?

manually training ham from unreviewed data? bad idea.

> I suppose that largely depends upon my individual users' levels of diligence.

and don't count on that - they may do it the first week, new toy, but for how long?

Also: keep in mind each user's Bayes folder also gets a bayes_seen file which grows and grows and grows and never gets truncated.

Do you really want to spend time watching each user's Bayes?
Re: Bayes, Manual and Auto Learning Strategies
Axb, I'm not sure I quite believe it. And I'm not quite sure I trust you. But you do make an attractive pitch. Excellent spam filtering, system-wide, with no responsibility for training on the part of the users? This sounds like the kind of "too good to be true" message that I'd expect to receive in a spam mail. But hmm. This is good dream material for tonight. I wonder if our Ubuntu 14.04 upgrade has SA 3.4 with redis built in. I do hear that the redis backend is amazing. -Steve
Re: Bayes, Manual and Auto Learning Strategies
Well... I just turned on autolearn for a moment, deleted the bayes_* files on the test account I use, and sent myself a message from my usual outside account. And new bayes_* files were created. So I was wrong, and I win. More options. So now I can proceed to the "what does this mean?" phase. If I leave things as they are, then training is perfect if the users are diligent. But if they are not, then... what? I see plenty of spams getting through with a 0.0 score. IIRC, the autolearn spam threshold is 7? Pretty much everything there is spam. But I'm not sure I quite buy having the static rules of SA training Bayes. Isn't Bayes just learning to emulate the static rules, with all their imperfections? If it starts going wrong, doesn't that mean the errors are going to spiral out of control? Leaving autolearn off puts everything in the hands of the users. And that's where I've left things for now. Someone, please convince me that I should turn it on. Should I turn it on and take my "train as ham" entry out of .forward? Or should I not? I suppose that largely depends upon my individual users' levels of diligence. -Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 08:00 AM, Steve Bergman wrote:
> On 07/02/2014 12:52 AM, Axb wrote:
>> Site wide bayes works VERY well even under such ugly conditions as traffic with multiple languages, for ham as well as spam.
>
> Please tell me more. This goes against Paul Graham's original advice, IIRC. And it goes against intuition. Then again. Bayesian statistics go against intuition. It's hard to let go and trust a system-wide Bayes. But I'm listening...

It works, trust me. SA's Bayes implementation is incredibly robust.

My site wide Bayes DB is not exactly small.

0.000 0 23850755 0 non-token data: nspam
0.000 0 10702302 0 non-token data: nham

Would I run a monster this size if it didn't work? Nope.

I waited a long time to be able to use something really 100% site wide (not per server) till we got the ability to use Redis which was FAST, robust and doesn't cause me headaches as sql, file permissions issues, etc.

I can't give you a scientific reason for not using per user Bayes.

Site wide works for my +2000 corp domains which includes .tr, .ru, .cn, .ua, .es, .fr, .de plus a ton of other major ccTLD domains

AND: I only run autolearn. NO manual/scheduled training.
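The two lines above are the `nspam` and `nham` counters from `sa-learn --dump magic`, i.e. how many messages the database has learned as spam and as ham. A small sketch for pulling those counters out of such output is below. The column layout is taken only from the two lines quoted in this message (value in the third whitespace-separated field), so treat the regex as an assumption, and `parse_dump_magic` as a hypothetical helper.

```python
import re
from typing import Dict

# Assumed layout, based on the lines quoted in the thread:
#   <prob> <n> <value> <n> non-token data: <name>
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$"
)


def parse_dump_magic(output: str) -> Dict[str, int]:
    """Map each 'non-token data' name to its integer value."""
    values: Dict[str, int] = {}
    for line in output.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


SAMPLE = """\
0.000 0 23850755 0 non-token data: nspam
0.000 0 10702302 0 non-token data: nham
"""

if __name__ == "__main__":
    counts = parse_dump_magic(SAMPLE)
    total = counts["nspam"] + counts["nham"]
    print(f"learned messages: {total}")
    print(f"spam share: {counts['nspam'] / total:.1%}")
```

Watching the ratio of the two counters over time is one cheap way to notice the situation reported elsewhere in the thread, where a per-user database showed nspam growing and nham staying at zero.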
Re: Bayes, Manual and Auto Learning Strategies
On Wed, 2 Jul 2014, Steve Bergman wrote: On 07/01/2014 11:14 PM, John Hardin wrote: Autolearn trains the bayes database. The bayes data is stored wherever you configured it to be stored, in a DBM database or SQL or redis, and it's per-user if you configure per-user Bayes databases and scan emails using different usernames (vs. a global user like root or amavis). That is interesting. How sure are you of this? Because if you're pretty sure, it's a piece of information I've been keen to confirm for a while. The bayes database is the only thing in SA that can be trained. (I'm excluding submission of the message to pyzor et. al. because that's obviously not local.) Odd, though, that before I set up .forward to train incoming mails as ham and disabled autolearn, no nhams were showing up in "sa-learn --dump magic" for the individual users. Just nspams. That is rather odd. Very-low-scoring hams should be autolearned as ham unless the default thresholds have been changed. -- John Hardin KA7OHZhttp://www.impsec.org/~jhardin/ jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- News flash: Lowest Common Denominator down 50 points --- 3 days until the 238th anniversary of the Declaration of Independence
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 12:52 AM, Axb wrote:
> Site wide bayes works VERY well even under such ugly conditions as traffic with multiple languages, for ham as well as spam.

Please tell me more. This goes against Paul Graham's original advice, IIRC. And it goes against intuition. Then again. Bayesian statistics go against intuition. It's hard to let go and trust a system-wide Bayes. But I'm listening...

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 07:37 AM, Steve Bergman wrote:
>> Let's turn this around? Can you prove autolearn was ever done to memory?
>
> I'm not really interested in proving anything. I'm interested in being convinced that autolearn is individual file-based when spamc is run as the individual user.

It's in the code... but yes, autolearn is always file based and respects the per user settings unless you run spamd with -x

> I'm not quite sure how that would affect my strategy. But it might (or might not) make autolearn useful.

More important, what you may need to reconsider is whether per user Bayes will give you the level of quality you're aiming for, and from experience I can tell you: it won't.

Site wide bayes works VERY well even under such ugly conditions as traffic with multiple languages, for ham as well as spam.
Re: Bayes, Manual and Auto Learning Strategies
> Let's turn this around? Can you prove autolearn was ever done to memory?

I'm not really interested in proving anything. I'm interested in being convinced that autolearn is individual file-based when spamc is run as the individual user.

I'm not quite sure how that would affect my strategy. But it might (or might not) make autolearn useful.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/02/2014 07:19 AM, Steve Bergman wrote:
> On 07/01/2014 11:49 PM, Karsten Bräckelmann wrote:
>> Those do not tell you about using file or SQL based databases?
>
> They do. But not specifically with respect to autolearn.
>
>> You never thought about googling for "spamassassin per user" and friends? You never checked the SA wiki?
>
> I have, indeed. No reference to autolearn and persistent storage. The lack of mention is notable. I'd expect people to be lining up to tell me I'm mistaken if I absolutely were. Can you point me to a change log somewhere documenting autolearn moving from in-memory and system-wide to per user and persistent?
>
> I don't hold a strong opinion on this. It would be nice if I were wrong. It would open more options. I'm just waiting for evidence that it's the case. My perception is that it's not.

Let's turn this around? Can you prove autolearn was ever done to memory?

If you mean "autolearn to journal", this is also file based.

I've been using SA since before it was an Apache project, when it was developed by McAfee and the sources were on Sourceforge and back then it was already file based.
Re: Bayes, Manual and Auto Learning Strategies
On 07/01/2014 11:14 PM, John Hardin wrote:
> Autolearn trains the bayes database. The bayes data is stored wherever you configured it to be stored, in a DBM database or SQL or redis, and it's per-user if you configure per-user Bayes databases and scan emails using different usernames (vs. a global user like root or amavis).

That is interesting. How sure are you of this? Because if you're pretty sure, it's a piece of information I've been keen to confirm for a while.

Odd, though, that before I set up .forward to train incoming mails as ham and disabled autolearn, no nhams were showing up in "sa-learn --dump magic" for the individual users. Just nspams.

-Steve
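For anyone wanting to check the nspam/nham counters mentioned above, here is a minimal sketch. It assumes the usual layout of "sa-learn --dump magic" output: four numeric columns followed by "non-token data: <name>", with the counter in the third column. The sample text and its numbers are invented for illustration, and parse_counts / bayes_active are hypothetical helper names, not part of SA. The minimum of 200 is SA's documented default for bayes_min_ham_num and bayes_min_spam_num.

```python
# Illustrative sample only; real output comes from running
# "sa-learn --dump magic" as the user whose Bayes DB you want to inspect.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        412          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0      58231          0  non-token data: ntokens
"""


def parse_counts(dump_text):
    """Return {'nspam': int, 'nham': int} for the counters found in the dump."""
    counts = {}
    for line in dump_text.splitlines():
        if "non-token data:" not in line:
            continue
        fields = line.split()
        name = fields[-1]
        if name in ("nspam", "nham"):
            # Assumption: the third column holds the counter value.
            counts[name] = int(fields[2])
    return counts


def bayes_active(counts, min_ham=200, min_spam=200):
    """True once both counters reach the minimums Bayes needs before scoring."""
    return counts.get("nham", 0) >= min_ham and counts.get("nspam", 0) >= min_spam


counts = parse_counts(SAMPLE_DUMP)
print(counts, "active:", bayes_active(counts))
```

With a dump like the sample (plenty of nspam, zero nham), Bayes would not be contributing to scores yet, which matches the symptom described in the message above.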
Re: Bayes, Manual and Auto Learning Strategies
On 07/01/2014 11:49 PM, Karsten Bräckelmann wrote:
> Those do not tell you about using file or SQL based databases?

They do. But not specifically with respect to autolearn.

> You never thought about googling for "spamassassin per user" and friends? You never checked the SA wiki?

I have, indeed. No reference to autolearn and persistent storage. The lack of mention is notable. I'd expect people to be lining up to tell me I'm mistaken if I absolutely were.

Can you point me to a change log somewhere documenting autolearn moving from in-memory and system-wide to per user and persistent?

I don't hold a strong opinion on this. It would be nice if I were wrong. It would open more options. I'm just waiting for evidence that it's the case. My perception is that it's not.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On Tue, 2014-07-01 at 22:40 -0500, Steve Bergman wrote: > On 07/01/2014 10:21 PM, Karsten Bräckelmann wrote: > > > > http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html > > http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html > > I've read those over and over. It never says anything about where the > data is maintained, or whether it's per-user or not. The *only* solid > claim I have is a ten year old (yes, at the dawn of SA Bayes) post which > specifically says it's in memory, system-wide, and lost upon SA restart. Those do not tell you about using file or SQL based databases? You never thought about googling for "spamassassin per user" and friends? You never checked the SA wiki? FWIW, the links given do NOT refer to in-memory only at all. An in-memory only Bayes database definitely is much more than ten years ago. If it ever existed. No need for me to even check. > > Milter usually means system-wide. (But since you just asked, it is.) > > I'm using spamass-milter. It suid's to the recipient user for most > mails. For aliases it defaults to a particular user who gets an > unbelievable amount of spam at the gate, and whom I know sorts his > ham/spam religiously. So you want to check back with your specific setup and its docs. Suid'ing is pretty likely to be per-user, though the definition of user is not specifically clear in the context of a milter (and the final recipient). In either case, that is not SA specific. (SA happily uses both, per-user or site-wide config AND bayes database, depending on context.) Refer to your milter's docs. > > Irrespective of your feeling -- cheers! /me having a beer > > Whew! After the conversations I've had here, today, I need one, too! ;-) Don't see this as an attack on you. It isn't. Just pointers on helping your understanding of the situation and your issues. Not always gentle, but that also reflects the initial stance. 
-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Bayes, Manual and Auto Learning Strategies
On Tue, 2014-07-01 at 22:18 -0500, Steve Bergman wrote: > On 07/01/2014 09:53 PM, Karsten Bräckelmann wrote: > > > Frankly, it appears you don't understand what auto-learning is. > > So please specify, explicitly, what it is. I asked some specific > questions about it. And I'm very interested in the answers. If you want my opinion, please re-phrase your questions. I locally deleted most of this previous (originally unrelated) thread. > Is auto-learn still system-wide? I'd need it to apply to individual > users. Is it in-memory only? Or can I have it update the users' filedb > token databases? SA itself never was system-wide, neither user-specific. It is both, can be either. It depends on the context of calling SA. > If it's now per user and uses the user databases, then I am more than > ready to reconsider my opinion. But I've not been able to get a clear > answer to this. I haven't had an opportunity to test. And I'd want > confirmation from someone in the know anyway, before I changed strategies. It does not depend on SA, but on how you invoke SA. We cannot give you a clear answer. It depends on your system, your SMTP, glue, system wide calling of SA, and possibly per-user invocations even after system-wide. To be clear: SA is a filter. It does nothing itself, other than classification. Being called, and at which point, is outside the scope of SA. Rejecting, deleting, delivering or any other kind of action is outside the scope of SA. That's actions performed by the calling layer, based on the result of SA evaluation. > >> This method shields the user from the worst of the spam, while giving > >> them full control of what gets relearned as spam. > > > > Wrong. It is not "this" (your) method, that shields the user from the > > worst of the spam. That's SA. Not your style of auto-training. > > Mine is not autotraining at all. it's giving the user a way of > explicitly training the backend spam filter. 
Quoting your previous post, you "have a line in the users' default .forward file to train incoming mail as ham". That is auto-training. > > (Besides, you *are* doing auto-learning, which you just claimed to be a > > complete joke.) > > No. The messages are assumed ham until the user classifies it as spam. > It is explicit learning. Under user control, Being "assumed" is not the same as being "treated and automatically reinforced". The latter is what you do. (And btw, Yes. You are auto-learning.) > > At this point I won't get into details. It should suffice to highlight > > that a default ham auto-learning threshold of 0.1 is part of the safety > > concepts. (See the M::SA::Plugin::AutoLearnThreshold man-page for more.) > > I really don't think you understand what it is I'm doing. Anything below > a score of 5.0 goes into their mailbox and learned as ham. If it's ham, > that's great. If it's spam, they move it to Junk and it gets learned as > spam. auto-learn is as brain dead as the defunct AWL. I perfectly understood what you are doing. You didn't understand why that is bad. Failing to explain might be my bad, though I'll leave re-explaining for tomorrow my timezone. Or you carefully re-reading my posts. > > I never checked the TB internal Bayes implementation and auto-learn > > strategy, but I'd be surprised if they do train on black/white, without > > any gray area in between. > > Optimally, I would have an "incoming folder" and then the user could > manually move the messages from there to spam or ham. But considering Which is basically what you came from, using Dovecot antispam plugin with SA, and dedicated folders "where the user could manually move the messages" to. Why didn't you just set that up? (Hint: That's your set-up without auto-learning ham Inbox deliveries.) 
> that this was not even remotely necessary with our old email provider, I > don't feel that I can put my users to that level of extra trouble that > they never even thought about having to deal with before, just because > SA is not performing as well as the spam filter they are used to. The Do initial manual training. Then get back to us. > mail needs to go into the inbox directly. And for SA's bayesian tp work, > it needs to be assumed as ham initially. No. It seems your previous "email provider", whatever that might be, had some sort of spam filtering service. Now you're on your own. Which you are, unless you decide to ask for free (as in beer) support by the community providing the software for free (as in speech) to help you weed out the spam. You did ask, which is just fine, but your assumptions are kind of hostile. Like your previous "email provider" would not use SA internally. He most likely does. > The only thing I see which might change my view would be explicit > details about where autolearn stores its data and how it is used on a > per user basis. So the only thing that might change your view would be reading the docs. Go read them. Auto-learn stores its data exactly wher
Re: Bayes, Manual and Auto Learning Strategies
On Tue, 1 Jul 2014, Steve Bergman wrote:
> On 07/01/2014 10:21 PM, Karsten Bräckelmann wrote:
> > http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html
> > http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html
>
> I've read those over and over. It never says anything about where the data is maintained, or whether it's per-user or not. The *only* solid claim I have is a ten year old (yes, at the dawn of SA Bayes) post which specifically says it's in memory, system-wide, and lost upon SA restart.

Autolearn trains the bayes database. The bayes data is stored wherever you configured it to be stored, in a DBM database or SQL or redis, and it's per-user if you configure per-user Bayes databases and scan emails using different usernames (vs. a global user like root or amavis).

-- 
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
News flash: Lowest Common Denominator down 50 points
---
3 days until the 238th anniversary of the Declaration of Independence
Re: Bayes, Manual and Auto Learning Strategies
On 07/01/2014 10:21 PM, Karsten Bräckelmann wrote:
> http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html
> http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

I've read those over and over. It never says anything about where the data is maintained, or whether it's per-user or not. The *only* solid claim I have is a ten year old (yes, at the dawn of SA Bayes) post which specifically says it's in memory, system-wide, and lost upon SA restart.

> Milter usually means system-wide. (But since you just asked, it is.)

I'm using spamass-milter. It suid's to the recipient user for most mails. For aliases it defaults to a particular user who gets an unbelievable amount of spam at the gate, and whom I know sorts his ham/spam religiously.

> Which, referring to my previous post, also means, a single sloppy user deleting your custom-auto-learned FN ham messages affects all your other users.

No. I make sure to keep each user solely responsible for their own email welfare.

> Irrespective of your feeling -- cheers! /me having a beer

Whew! After the conversations I've had here, today, I need one, too! ;-)

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On Tue, 2014-07-01 at 20:53 -0500, Steve Bergman wrote: > On 07/01/2014 07:32 PM, Karsten Bräckelmann wrote: > > > That's pretty bad practice. Fundamentally, you are implementing a custom > > auto-learn flavor, overruling the SA configurable auto-learn behavior > > BTW, that reminds me of a question I had been meaning to ask on the > list. Autolearn. There's very little written about it, so far as I am http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html > aware. But from what I have gleaned, from old posts, is that it is > system-wide and in-memory. It depends on how you call SA (SMTP or MDA level). SA itself is a filter, called by your mail-processing chain. Thus, there is no SA default context of system-wide or per-user. It depends on how you call it. > Now, I have Spamass-milter set to run SA 3.3 > as the recipient user, using the filedb backend. So in 3.3, is autolearn > system wide and in memory, or per user and on disk? Milter usually means system-wide. (But since you just asked, it is.) Which, referring to my previous post, also means, a single sloppy user deleting your custom-auto-learned FN ham messages affects all your other users. Or a non-sloppy, but on-vacation-mode user. Moreover, there is no in-memory only, not on-disk mode. Unless you don't have to ask about it. > This makes a difference regarding what Karsten and I are discussing. I > don't suppose I would object to being wrong. But I have a feeling that > I'm right. Irrespective of your feeling -- cheers! /me having a beer -- char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Bayes, Manual and Auto Learning Strategies
On 07/01/2014 09:53 PM, Karsten Bräckelmann wrote:
> Frankly, it appears you don't understand what auto-learning is.

So please specify, explicitly, what it is. I asked some specific questions about it. And I'm very interested in the answers.

Is auto-learn still system-wide? I'd need it to apply to individual users. Is it in-memory only? Or can I have it update the users' filedb token databases?

If it's now per user and uses the user databases, then I am more than ready to reconsider my opinion. But I've not been able to get a clear answer to this. I haven't had an opportunity to test. And I'd want confirmation from someone in the know anyway, before I changed strategies.

> > This method shields the user from the worst of the spam, while giving them full control of what gets relearned as spam.
>
> Wrong. It is not "this" (your) method, that shields the user from the worst of the spam. That's SA. Not your style of auto-training.

Mine is not autotraining at all. It's giving the user a way of explicitly training the backend spam filter.

> And unless you disabled Bayes auto-learning in SA (dunno, might have been mentioned deep in the thread), the user does not have full control of what gets relearned as spam.

I have disabled autolearning. I thought I mentioned that to you.

> (Besides, you *are* doing auto-learning, which you just claimed to be a complete joke.)

No. The messages are assumed ham until the user classifies them as spam. It is explicit learning. Under user control.

> At this point I won't get into details. It should suffice to highlight that a default ham auto-learning threshold of 0.1 is part of the safety concepts. (See the M::SA::Plugin::AutoLearnThreshold man-page for more.)

I really don't think you understand what it is I'm doing. Anything below a score of 5.0 goes into their mailbox and is learned as ham. If it's ham, that's great. If it's spam, they move it to Junk and it gets learned as spam. Auto-learn is as brain dead as the defunct AWL.

> I never checked the TB internal Bayes implementation and auto-learn strategy, but I'd be surprised if they do train on black/white, without any gray area in between.

Optimally, I would have an "incoming folder" and then the user could manually move the messages from there to spam or ham. But considering that this was not even remotely necessary with our old email provider, I don't feel that I can put my users to that level of extra trouble that they never even thought about having to deal with before, just because SA is not performing as well as the spam filter they are used to. The mail needs to go into the inbox directly. And for SA's Bayesian filter to work, it needs to be assumed as ham initially.

The only thing I see which might change my view would be explicit details about where autolearn stores its data and how it is used on a per user basis.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On Tue, 2014-07-01 at 20:36 -0500, Steve Bergman wrote: > On 07/01/2014 07:32 PM, Karsten Bräckelmann wrote: > > > > That's pretty bad practice. Fundamentally, you are implementing a custom > > auto-learn flavor, overruling the SA configurable auto-learn behavior > > SA's autolearn behavior doesn't make much sense. I have no confidence in it. The auto-learning feature is NOT meant to be a fully automated training system. It's an aid for the user to eliminate the need to care about the extremes, while focusing on the close-calls. There are options to tweak to your specific needs, and there even is no single "SA autolearn behavior" as you stated, but different flavors. And an option to turn it off. Frankly, it appears you don't understand what auto-learning is. > This method shields the user from the worst of the spam, while giving > them full control of what gets relearned as spam. Wrong. It is not "this" (your) method, that shields the user from the worst of the spam. That's SA. Not your style of auto-training. And unless you disabled Bayes auto-learning in SA (dunno, might have been mentioned deep in the thread), the user does not have full control of what gets relearned as spam. > > and ignoring all safety concepts implemented by SA. > > What safety concepts? autolearn is a complete joke. Even the docs > explain that it's only there as a last resort method of kinda sorta > training the spam filter. You are doing (custom) auto-learning as ham of any message with a score less than required_score of 5.0. *That* is a joke. (Besides, you *are* doing auto-learning, which you just claimed to be a complete joke.) At this point I won't get into details. It should suffice to highlight that a default ham auto-learning threshold of 0.1 is part of the safety concepts. (See the M::SA::Plugin::AutoLearnThreshold man-page for more.) > > So if a user in a hurry simply deletes some spam, it will remain ham, as > > far as Bayes is concerned. > > Same as with Thunderbird, I think. 
I never checked the TB internal Bayes implementation and auto-learn strategy, but I'd be surprised if they do train on black/white, without any gray area in between. You stated it. Please back up your claim. > And it's working very well for them. > If they act irresponsibly, they'll get more spam. It takes no longer to > highlight the spam and click "Junk" than it does to highlight the spam > and click "Delete". While I am aware I'm not the average user -- there's a "delete" action key on my keyboard. There's no "junk" equivalent. Yes, I avoid using the mouse if keyboard interaction is more productive... > I've pretty much decided at this point that if the users don't do what I > tell them to do, repeatedly, then what results is not my responsibility. > > And it's not. Do you hate your users or your job? (Sorry, snide-remark I couldn't resist. Feel free to ignore.) > The alternative is to not mark incoming mail as ham, and allow the SA > Bayesian filter to remain inactive forever. No. I can only guess, but it appears there are some mis-interpretations in that conclusion. The SA Bayesian classifier to "remain inactive forever" can only refer to insufficient initial training. Manual training. Of at least 200 ham and spam each (by default, you can lower that to 0). You will easily get that by manual training of existing messages. And even default auto- learning would eventually cross the ham number. Less than forever. More importantly, SA still marks (classifies) incoming mail as ham. Just because its overall score is less than 5.0. It just does not *learn* all of them as ham. Because there's a chance it might not actually be ham, but a FN. That area, between (default) auto-learning as ham and classifying as spam is the gray area, where actual user input is of much value. For both, learning spam AND ham, for that matter. In particular, because generally (and as SA principle), a FP is *much* worse than a FN. 
Your approach of force learning those as ham, is biasing your Bayes DB. At the very least temporarily (unless a fresh spam campaign has been re-trained by your users on Monday). At worst, until you clear it. Btw, is that per-user, or are you gambling a site-wide Bayes DB? > I opted to give the users the choice of being responsible for sorting, > and reaping the benefits of that if they do. And yes, I know that some > are not going to. > > I'd be interested if you have a better solution in mind. Do not auto-learn ham every message that scores below required_score. Introduce train-on-error for your users, with an extended manual training option. Specific ham and spam folders, where moving or copying mail into trains the Bayes classifier. Kind of optional for the user, unless they feel there's too much mis-classification. -- char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
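The folder-based manual training suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular plugin implements it: the folder names "TrainSpam" and "TrainHam" and the function training_command are hypothetical. Only the sa-learn options --spam and --ham are real. The function merely builds the command line; it does not run anything.

```python
# Hypothetical training folders mapped to the matching sa-learn option.
TRAINING_FOLDERS = {
    "TrainSpam": "--spam",
    "TrainHam": "--ham",
}


def training_command(target_folder, message_path):
    """Build the sa-learn call for a message moved into target_folder.

    Returns None for every other folder, so ordinary delivery to the
    inbox (or a plain delete) never trains Bayes one way or the other.
    """
    option = TRAINING_FOLDERS.get(target_folder)
    if option is None:
        return None
    return ["sa-learn", option, message_path]


print(training_command("TrainSpam", "/tmp/msg.eml"))
print(training_command("INBOX", "/tmp/msg.eml"))
```

The point of the design is that only an explicit user action trains the classifier. A message that merely scored below required_score and landed in the inbox is left untrained.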
Re: Bayes, Manual and Auto Learning Strategies
On 07/01/2014 07:32 PM, Karsten Bräckelmann wrote:
> That's pretty bad practice. Fundamentally, you are implementing a custom auto-learn flavor, overruling the SA configurable auto-learn behavior

BTW, that reminds me of a question I had been meaning to ask on the list. Autolearn. There's very little written about it, so far as I am aware. But what I have gleaned from old posts is that it is system-wide and in-memory. Now, I have Spamass-milter set to run SA 3.3 as the recipient user, using the filedb backend. So in 3.3, is autolearn system-wide and in memory, or per user and on disk?

This makes a difference regarding what Karsten and I are discussing. I don't suppose I would object to being wrong. But I have a feeling that I'm right.

-Steve
Re: Bayes, Manual and Auto Learning Strategies
On 07/01/2014 07:32 PM, Karsten Bräckelmann wrote:
> That's pretty bad practice. Fundamentally, you are implementing a custom auto-learn flavor, overruling the SA configurable auto-learn behavior

SA's autolearn behavior doesn't make much sense. I have no confidence in it. This method shields the user from the worst of the spam, while giving them full control of what gets relearned as spam.

> and ignoring all safety concepts implemented by SA.

What safety concepts? autolearn is a complete joke. Even the docs explain that it's only there as a last resort method of kinda sorta training the spam filter.

> So if a user in a hurry simply deletes some spam, it will remain ham, as far as Bayes is concerned.

Same as with Thunderbird, I think. And it's working very well for them. If they act irresponsibly, they'll get more spam. It takes no longer to highlight the spam and click "Junk" than it does to highlight the spam and click "Delete".

I've pretty much decided at this point that if the users don't do what I tell them to do, repeatedly, then what results is not my responsibility. And it's not.

The alternative is to not mark incoming mail as ham, and allow the SA Bayesian filter to remain inactive forever.

I opted to give the users the choice of being responsible for sorting, and reaping the benefits of that if they do. And yes, I know that some are not going to.

I'd be interested if you have a better solution in mind.

-Steve
Bayes, Manual and Auto Learning Strategies (was: Re: getting tons of SPAM)
On Tue, 2014-07-01 at 18:43 -0500, Steve Bergman wrote: > On 07/01/2014 06:09 PM, RW wrote: > > I'm sceptical about the use of Dovecot-Antispam with Spamassassin. > > The problem is that it trains on SpamAssassin errors rather than Bayes > > errors. It may be possible to get sufficient spam this way, but ham > > is learned very slowly through avoidable FPs. > > We currently (early days for this installation) get plenty of spam for > the users to train by moving it to the junk folder. Ham was the problem. > Dovecot does nothing about training ham. Dovecot (and its antispam plugin) does nothing about training ham, either. It offers target folders and triggers, for easy manual (re-) classification -- and thus training -- of ham and spam. > That's why I have a line in the users' default .forward file to train > incoming mail as ham. That's pretty bad practice. Fundamentally, you are implementing a custom auto-learn flavor, overruling the SA configurable auto-learn behavior and ignoring all safety concepts implemented by SA. There's a reason for the ham and spam learning thresholds, and the ham threshold to be 0.1 by default, *not* equaling required_score's default of 5.0. > Then if they or Thunderbird decide to move the mail to Junk, it gets > re-trained as spam. So if a user in a hurry simply deletes some spam, it will remain ham, as far as Bayes is concerned. > dovecot-antispam is *not* a complete solution, so far as I can see. > > At this early stage, it *is* painful to watch all that spam coming in > over the weekend getting trained as ham. I tell my users to mark it as > spam on Monday morning. And if they don't, I just figure it's not my fault. It is your fault to implement a broken training strategy. > Once the token databases get larger there won't be so much potential > flux back and forth, I guess. 
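To make the point about the thresholds concrete, here is a deliberately simplified sketch of the three zones. The 0.1 and 5.0 defaults are the ones named in the message above. The spam auto-learn default of 12.0 comes from the AutoLearnThreshold documentation, not from this thread. The function names are hypothetical, and the sketch feeds one score to both decisions, whereas the real plugin computes a separate auto-learn score and applies further conditions that are left out here.

```python
# Documented defaults; all three are configurable in SA.
HAM_AUTOLEARN_THRESHOLD = 0.1    # bayes_auto_learn_threshold_nonspam
SPAM_AUTOLEARN_THRESHOLD = 12.0  # bayes_auto_learn_threshold_spam
REQUIRED_SCORE = 5.0             # required_score


def classify(score):
    """What the user sees: spam at or above required_score, ham below it."""
    return "spam" if score >= REQUIRED_SCORE else "ham"


def autolearn_zone(score):
    """What Bayes learns automatically (simplified; see the caveats above)."""
    if score < HAM_AUTOLEARN_THRESHOLD:
        return "autolearn ham"
    if score > SPAM_AUTOLEARN_THRESHOLD:
        return "autolearn spam"
    return "no autolearn"


for score in (-1.0, 3.2, 6.5, 15.0):
    print(score, classify(score), autolearn_zone(score))
```

A message scoring 3.2 is delivered as ham but is not learned as ham. That gap between 0.1 and 5.0 is exactly what a "train everything under 5.0 as ham" rule in .forward removes.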