Re: Am I fscking up my bayes db?
On 09.07.09 09:30, Daniel Schaefer wrote:
> I have a similar setup. If a Spam message makes it to my inbox with less
> than the required_score, I put it into a SPAM folder and run sa-learn on
> the folder. Should I also implement the following ignore rules?
>
> bayes_ignore_header X-Spam-Flag
> bayes_ignore_header X-Spam-Level
> bayes_ignore_header X-Spam-Status
> bayes_ignore_header X-Spam...etc.

Not needed, these are already ignored by spamassassin itself.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Re: Am I fscking up my bayes db?
On Thu, 09 Jul 2009 09:30:37 -0400 Steve Bertrand wrote:
> It's extremely infrequent how often I have to touch my email setup,
> but I've always been curious about this.
>
> Given your recommendation, would you say that a reset on the db should
> be performed?
> Essentially, is it fair to say that what I've done has possibly caused
> damage?

The Barracuda headers don't matter much unless you get similar headers
in your legitimate incoming mail, in which case just tell Bayes to
ignore them. The irrelevant tokens will eventually age out of the
database.

The Received headers are a bit more of a problem, because you're
weighting Bayes against your work domain, IP addresses, etc. You could
try sending yourself a mail from work and see if it looks spammy.
Re: Am I fscking up my bayes db?
On Thu, 9 Jul 2009, Martin Gregorie wrote:
> On Thu, 2009-07-09 at 08:50 -0400, Steve Bertrand wrote:
>> My question is, given that the messages have already been processed by
>> the 'cuda's (with their header stamps in place), am I damaging, or at
>> risk of confusing the learning process of SA when I classify these
>> messages as SPAM?
>
> Not really answering your question, but I find it's helpful to strip SA
> headers out of the message collection I use for testing private rules.
> Here's a simple bash shell script fragment that does the job and does it
> fairly fast:
>
>     for f in data/*.txt
>     do
>         echo "Cleaning $f"
>         gawk '
>             BEGIN        { act = "copy" }
>             /^X-Spam/    { act = "skip" }
>             /^[A-WYZ]/   { act = "copy" }
>                          { if (act == "copy") { print } }
>         ' <$f >temp.txt
>         mv temp.txt $f
>     done

...wouldn't that mangle wrapped X-Spam headers?

-- 
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: Am I fscking up my bayes db?
On Thu, 09 Jul 2009, Martin Gregorie wrote:
> Here's a simple bash shell script fragment that does the job and does it
> fairly fast:
>
> for f in data/*.txt
> ...
> gawk '
> ...
> done

With non-Linux users also on the list, you might have explained that
THIS script needs 'gawk' (old awk would be enough) and works on all the
files in one directory whose names end in '.txt' :-)

E.g. my mail collection files mostly end in '.box' or '.eml', and my
old Solaris never had any 'gawk'.

The trick of deleting every run of 'X' headers from 'X-Spam' onwards is
a good idea (except e.g. if the next header is 'X-remote-IP' and you
want to check for internal mail :-).

Stucki

-- 
Christoph von Stuckrad ('stucki')
Freie Universitaet Berlin, Mathematik & Informatik EDV
Takustr. 9 / 14195 Berlin
Re: Am I fscking up my bayes db?
On Thu, 2009-07-09 at 08:50 -0400, Steve Bertrand wrote:
> My question is, given that the messages have already been processed by
> the 'cuda's (with their header stamps in place), am I damaging, or at
> risk of confusing the learning process of SA when I classify these
> messages as SPAM?

Not really answering your question, but I find it's helpful to strip SA
headers out of the message collection I use for testing private rules.
Here's a simple bash shell script fragment that does the job and does it
fairly fast:

    for f in data/*.txt
    do
        echo "Cleaning $f"
        gawk '
            BEGIN        { act = "copy" }
            /^X-Spam/    { act = "skip" }
            /^[A-WYZ]/   { act = "copy" }
                         { if (act == "copy") { print } }
        ' <$f >temp.txt
        mv temp.txt $f
    done

Martin
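[Editor's sketch] The fragment above keeps skipping until it sees a line that starts with a capital letter other than X. That means a following header such as 'X-remote-IP' is dropped too, and so is the start of the body when X-Spam headers come last. A more defensive variant might look like the following. It is only a sketch under stated assumptions: each file holds exactly one message, line endings are plain LF, and the function name strip_sa_headers is made up for illustration. It uses plain awk rather than gawk.

```shell
#!/bin/sh
# strip_sa_headers FILE...   (hypothetical helper name)
# Remove X-Spam-* headers, including their folded continuation lines,
# from the header section of each file. The body is copied unchanged.
# Assumes one message per file and LF line endings.
strip_sa_headers() {
    for f in "$@"; do
        tmp="$f.tmp.$$"
        awk '
            in_body  { print; next }               # body: copy verbatim
            /^$/     { in_body = 1; print; next }  # first blank line ends the headers
            /^[ \t]/ { if (!skip) print; next }    # folded line shares the fate of its header
                     { skip = (tolower($0) ~ /^x-spam-/) }  # a new header starts here
            !skip    { print }
        ' "$f" > "$tmp" && mv "$tmp" "$f"
    done
}
```

Called as `strip_sa_headers data/*.txt` (or with '.eml' or '.box' files, as long as each holds a single message), it leaves other X- headers and any body line that happens to start with "X-Spam" in place.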
Re: Am I fscking up my bayes db?
Mike Cardwell wrote:
> Steve Bertrand wrote:
>> My question is, given that the messages have already been processed by
>> the 'cuda's (with their header stamps in place), am I damaging, or at
>> risk of confusing the learning process of SA when I classify these
>> messages as SPAM?
>>
>> Are there any negative consequences by doing this?
>
> You should configure bayes to ignore those headers. In your local.cf,
> list each of the cuda headers like this:
>
> bayes_ignore_header X-CudaHeader1
> bayes_ignore_header X-CudaHeader2
> bayes_ignore_header X-CudaHeader3

Thanks Mike.

It's extremely infrequent how often I have to touch my email setup, but
I've always been curious about this.

Given your recommendation, would you say that a reset on the db should
be performed? Essentially, is it fair to say that what I've done has
possibly caused damage?

Steve

ps. fwiw, I feel that my SA setup is not under-performing in any way at
this time.
Re: Am I fscking up my bayes db?
Mike Cardwell wrote:
> Steve Bertrand wrote:
>> Hi everyone,
>>
>> I aggregate my work and personal email accounts within the same email
>> client. All accounts are IMAP-based.
>>
>> My $work employs a Barracuda cluster, and of course my box runs SA.
>>
>> From time-to-time, I'll get a SPAM message come through the 'cuda's.
>> From there, I move the message from one IMAP folder in my MUA into
>> another SPAM folder, which essentially is a transfer from a work
>> storage server onto my server.
>>
>> Every few days, I run sa-learn against the collected SPAM messages.
>>
>> My question is, given that the messages have already been processed by
>> the 'cuda's (with their header stamps in place), am I damaging, or at
>> risk of confusing the learning process of SA when I classify these
>> messages as SPAM?
>>
>> Are there any negative consequences by doing this?
>
> You should configure bayes to ignore those headers. In your local.cf,
> list each of the cuda headers like this:
>
> bayes_ignore_header X-CudaHeader1
> bayes_ignore_header X-CudaHeader2
> bayes_ignore_header X-CudaHeader3

I have a similar setup. If a Spam message makes it to my inbox with less
than the required_score, I put it into a SPAM folder and run sa-learn on
the folder. Should I also implement the following ignore rules?

bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Level
bayes_ignore_header X-Spam-Status
bayes_ignore_header X-Spam...etc.

-- 
Dan Schaefer
Re: Am I fscking up my bayes db?
Steve Bertrand wrote:
> Hi everyone,
>
> I aggregate my work and personal email accounts within the same email
> client. All accounts are IMAP-based.
>
> My $work employs a Barracuda cluster, and of course my box runs SA.
>
> From time-to-time, I'll get a SPAM message come through the 'cuda's.
> From there, I move the message from one IMAP folder in my MUA into
> another SPAM folder, which essentially is a transfer from a work
> storage server onto my server.
>
> Every few days, I run sa-learn against the collected SPAM messages.
>
> My question is, given that the messages have already been processed by
> the 'cuda's (with their header stamps in place), am I damaging, or at
> risk of confusing the learning process of SA when I classify these
> messages as SPAM?
>
> Are there any negative consequences by doing this?

You should configure bayes to ignore those headers. In your local.cf,
list each of the cuda headers like this:

bayes_ignore_header X-CudaHeader1
bayes_ignore_header X-CudaHeader2
bayes_ignore_header X-CudaHeader3

-- 
Mike Cardwell - IT Consultant and LAMP developer
Cardwell IT Ltd. (UK Reg'd Company #06920226)
http://cardwellit.com/
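[Editor's sketch] The X-CudaHeader names above are placeholders, so the real header names have to come from the messages themselves. One possible way to collect them is sketched below. Its assumptions: one message per file, LF line endings, and all the appliance's headers sharing a common prefix. The function name suggest_ignore_headers and the prefix used in the usage note are illustrative only and do not come from this thread.

```shell
#!/bin/sh
# suggest_ignore_headers PREFIX FILE...   (hypothetical helper name)
# Scan only the header section of each message file and print one
# "bayes_ignore_header NAME" line per distinct header name that starts
# with PREFIX (compared case-insensitively).
suggest_ignore_headers() {
    prefix=$1
    shift
    awk -v prefix="$prefix" '
        FNR == 1 { in_body = 0 }          # a new file means a new message
        in_body  { next }                 # ignore message bodies
        /^$/     { in_body = 1; next }    # first blank line ends the headers
        /^[ \t]/ { next }                 # folded continuation line, no new name
        {
            name = $0
            sub(/:.*/, "", name)          # keep only the field name
            if (index(tolower(name), tolower(prefix)) == 1)
                print "bayes_ignore_header " name
        }
    ' "$@" | LC_ALL=C sort -u
}
```

Running something like `suggest_ignore_headers X-Barracuda spam/*.eml` (the prefix here is a guess; check your own messages) prints candidate lines that can be reviewed by hand before anything goes into local.cf.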
Am I fscking up my bayes db?
Hi everyone,

I aggregate my work and personal email accounts within the same email
client. All accounts are IMAP-based.

My $work employs a Barracuda cluster, and of course my box runs SA.

From time-to-time, I'll get a SPAM message come through the 'cuda's.
From there, I move the message from one IMAP folder in my MUA into
another SPAM folder, which essentially is a transfer from a work storage
server onto my server.

Every few days, I run sa-learn against the collected SPAM messages.

My question is, given that the messages have already been processed by
the 'cuda's (with their header stamps in place), am I damaging, or at
risk of confusing the learning process of SA when I classify these
messages as SPAM?

Are there any negative consequences by doing this?

Steve