On Thu, Jan 02, 2003 at 12:12:00PM -0500, Jim Frost wrote:
> Anyway, in case this spurs someone to do some work, I did spend some
> time working on an imap server based bayesian system. The idea was that
> with imap the folders are all on the server and I can easily create a
> special "spam" folder that users can drag and drop spam into, and use
> their personal folders for the not-spam side of things. My system was
> rebuilding the databases every once in awhile out of cron but with a
> built-in system you could do it as-you-go (which would be cool).

My problem with this method is that you need all of your spam and
nonspam messages to rebuild the databases properly. Either you use
large amounts of disk space to keep them, or the rebuilt database will
not match the previous one.

> This was drop-dead simple to use from the user's point of view (my goal
> was that my wife should be able to use it without my help). The
> downfall was that I haven't had the time to get the delivery stuff
> working and integrated into my mail delivery system.
>
> Apple's mail client with Jaguar (OSX 10.2) does something more or less
> like this, but instead of a spam folder there's a "this is spam"
> button. And instead of moving probable spam into a special folder it
> colorizes them or destroys them (at your option). In some ways I like
> this, but I would kind of like to be able to go in and edit the spam
> template messages so I think I'd still rather have a spam folder and
> have colorization or prioritization versus a trash folder as an option.

I think the Jaguar email client has a better balance of ease of use and
usefulness. It works much the same way bogofilter does. All incoming
messages are marked as spam or not spam, and I just go through and
correct a message when the filter gets it wrong. Interface wise, this
is as simple as a "this is spam" button (and, conversely, a "you were
wrong, this is not spam" button).

What you lose is the ability to edit your templates readily. What you
gain is not having to maintain the databases in their original email
form.

The whole goal of the bayesian filter is to learn what you think is
spam and what you think is not spam. I don't find myself reclassifying
my email that often. If I thought a message was spam yesterday, I'm not
likely to think it's not spam today.

Requiring users to track all their spam messages also requires them to
track their nonspam messages if they want accurate results, since you
are keeping track not only of bad words but of good words too.
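
To illustrate the point about keeping word counts rather than the
messages themselves, here is a minimal sketch in Python. It is not
bogofilter's actual implementation: the TokenCounts class, its
tokenizer, and the plain naive Bayes scoring with add-one smoothing are
all assumptions made for illustration. What it shows is that a
correction only needs the one message being corrected, so no archive of
old spam and nonspam has to be kept.

```python
import math
import re
from collections import Counter


class TokenCounts:
    """Hypothetical word-count store: keeps counts, not the original messages."""

    def __init__(self):
        # Good words are tracked as well as bad words.
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase alphabetic words, each counted once per message.
        return set(re.findall(r"[a-z']+", text.lower()))

    def train(self, text, label):
        self.words[label].update(self.tokenize(text))
        self.messages[label] += 1

    def untrain(self, text, label):
        self.words[label].subtract(self.tokenize(text))
        self.messages[label] -= 1

    def correct(self, text, wrong, right):
        # The "this is spam" / "this is not spam" button:
        # move this one message's counts from one side to the other.
        self.untrain(text, wrong)
        self.train(text, right)

    def spam_probability(self, text):
        # Naive Bayes in log-odds form with add-one smoothing.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for word in self.tokenize(text):
            p_spam = (self.words["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.words["ham"][word] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))
```

In this sketch a delivery agent would call train() with whatever label
the filter assigned, and the mail client's button would call correct()
on the message currently selected. The trade-off is the one described
above: the counts cannot be rebuilt from scratch later, because the
messages they came from are gone.
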

The so-called "power" user should have the option to view the
databases, add words, modify weights, and so forth.

> I note that I looked into spamassassin, which seems to be the preferred
> technique using an external filter, and I really dislike its rule-based
> system. Way too many false positives, and a lot of work to set up and
> maintain too. Spam filtering would be a great integrated feature and
> doesn't look like it'd be a lot of work to implement.
>

spamassassin isn't that bad, though it can't beat a bayesian filter.
Its advantage is that it comes fully ready out of the box. Bayesian
filters do poorly before they are trained.

The command line bayesian filters I saw were bogofilter (what I use)
and ifile (which uses the spam folders and such). If you are using a
pop host, I'd recommend popfile (I've used it with OE, and recommended
it to all my friends who use Outlook or OE with a pop host). It has a
nice web based interface for modifying the word lists and such.

_______________________________________________
evolution maillist - [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/evolution
