Re: Bayes db and token expiry questions
Hi,

>> Well, what's the missing 120 MB? The journal? Do a complete sync and
>> then delete it.
>
> Probably the signatures in bayes_seen - there's no mechanism for ageing
> them out.

And I assume that isn't a problem then?

>> "too big" is not an absolute figure. If you store 1-occurrence tokens
>> you will obviously have more tokens than without them.
>
> There's not really a choice since all tokens start that way.

Maybe a better estimate would be in terms of time. For how long should the unseen tokens (those that have only occurred once, I guess) remain in the database? Perhaps that's a good metric. For me it's about a week now.

>> You should use autolearn if you don't do yet.
>
> Autolearning can make things worse by dropping the retention period.

Yes, I'm using autolearn, but how does that affect the retention period? What do the two have to do with each other? Do you mean auto-expire, not auto-learn?

My database seems to have improved slightly over the past few days after increasing the max db size to 1.6M. I guess there is also a lot of expiry pending, because the database is currently much larger than that today:

0.000          0    2050481          0  non-token data: ntokens

Looks like about 345k to be purged, if I understand correctly?

Thanks,
Alex
RE: ATTN DEVELOPERS: Mega-Spam
On Mon, 29 Mar 2010, Brent Kennedy wrote:

> Ya know, this got me thinking. Wonder if I could create a VM with all the settings and a script to customize the setup. Then organizations could just deploy the VM. Sort of an all in one deployment. Just update the VM template every now and then. Ahh but the learning db might be an issue oh well just a thought.

A second VM hosting the bayes DB on MySQL or Postgres. That way you can drop-in upgrade the SA VM without destabilizing the bayes DB VM.

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 Our government wants to do everything it can "for the children," except sparing them crushing tax burdens.
---
 3 days until April Fools' day
Re: trusted_networks
On 3/29/2010 11:40 AM, Kaleb Hosie wrote: > I'm having a problem with the trusted_networks option. Right now I have it > set to: > > trusted_networks 10.0.1/24 > > In postfix, I need to have spamassassin listed under > "smtpd_recipient_restrictions" so that it will only scan incoming emails > however it would be handy to get this option working if at all possible so it > won't scan outgoing emails. > > When I try to use this option; I login through telnet port 25, and send the > test spam string (from the 10.0.1.0 subnet) it still gets caught in spam. Am > I doing something wrong or is there another option I need to choose? > > Thanks! > Kaleb > > Trusted in this case means "trusted to not forge headers, and while unlikely to originate spam, this host might relay it." For example, your front-end MX would be trusted if your SA runs on an internal server the MX relays to. It will definitely forward whatever spam it gets, because it forwards all mail. trusted_networks is not a whitelisting mechanism. You can check if your trust is working by seeing if messages that are only handled by trusted hosts match the ALL_TRUSTED rule. This rule carries a small negative score, but cannot outweigh the GTUBE sample. In fact, even our whitelist mechanisms won't outweigh a GTUBE. GTUBE is meant to *ALWAYS* be marked as spam if SA scans it, regardless of whitelist settings.
RE: ATTN DEVELOPERS: Mega-Spam
Graylisting does work. We have been using SQLGrey (http://sqlgrey.sourceforge.net/) for three years now. The minute I turned it on, spam to my junk email folder (what SA used to catch) dropped by 90%. SQLGrey sits at the MTA level, so it hits the sender when they connect and before they actually submit email. Obviously, it does allow them through if they come back, but most botnet senders do not retry messages or never have the chance.

I think after I turned it on, the botnet plug-in got bored. My stats for it dropped significantly. So that's my proof it does adversely affect botnets. I wish I still had the stats graphs for when I turned it on. However, you can see its effect on my graph here: http://brain.chcfl.com/postfix/ (noted as rejections). I also have Active Directory set up with the MTA, so no messages ever hit the server that do not belong, nor are NDRs generated. If they try a dictionary attack, they will be on tarpit duty for a long time.

When you see the relief on someone's face after they realize they only have 10 junk emails to glance at rather than 100, you see the value of graylisting. I have put my setup in a few other locations and they also report back to me that their users are now getting work done rather than parsing emails.

Ya know, this got me thinking. Wonder if I could create a VM with all the settings and a script to customize the setup. Then organizations could just deploy the VM. Sort of an all in one deployment. Just update the VM template every now and then. Ahh but the learning db might be an issue oh well just a thought.

-Brent

-----Original Message-----
From: Jonas Eckerman [mailto:jonas_li...@frukt.org]
Sent: Monday, March 29, 2010 6:41 PM
To: John Hardin
Subject: Re: ATTN DEVELOPERS: Mega-Spam

On 2010-03-30 00:12, John Hardin wrote:
> While greylisting will help, it won't spank the offender in that manner.
> It will postpone the message very early in the SMTP exchange, not after
> the body has been received.
Unless the greylisting is done *after* receiving the body. Of course, this will spank innocent senders as well. (My selective greylisting implementation for MIMEDefang does this, originally because some stupid MTAs didn't handle tempfails correctly at earlier stages... The "selective" stuff keeping delays and spanking of innocents down.) BTW: While I like greylisting because it stops a lot of spam, I've never seen any data substantiating claims that it has a measurable negative impact on botnets. So I'm not convinced it really does a lot of spanking of offenders... Regards /Jonas -- Jonas Eckerman Fruktträdet & Förbundet Sveriges Dövblinda http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
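The greylisting behaviour discussed in this thread (tempfail the first delivery attempt, accept the sender if it comes back later) can be sketched roughly as below. This is only a toy illustration in Python and is not how SQLGrey or the MIMEDefang implementation mentioned above actually work: the `Greylist` class, the 300-second delay and the (client IP, sender, recipient) key are all assumptions made for the example.

```python
import time


class Greylist:
    """Toy greylist keyed on (client_ip, sender, recipient). Hypothetical."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay   # seconds a sender must wait before a retry counts
        self.clock = clock           # injectable clock, handy for testing
        self.first_seen = {}         # triplet -> timestamp of the first attempt

    def check(self, client_ip, sender, recipient):
        """Return 'tempfail' for new or too-early triplets, 'accept' otherwise."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        seen = self.first_seen.get(key)
        if seen is None:
            # First contact: remember it and ask the sender to retry later.
            self.first_seen[key] = now
            return "tempfail"
        if now - seen < self.min_delay:
            # Retried too quickly: still deferred.
            return "tempfail"
        return "accept"
```

A sender that never retries (as most botnet senders reportedly do not) never gets past the first `tempfail`, while a normal MTA that retries after the delay is accepted.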
Re: ATTN DEVELOPERS: Mega-Spam
On 2010-03-30 00:12, John Hardin wrote: While greylisting will help, it won't spank the offender in that manner. It will postpone the message very early in the SMTP exchange, not after the body has been received. Unless the greylisting is done *after* receiving the body. Of course, this will spank innocent senders as well. (My selective greylisting implementation for MIMEDefang does this, originally because some stupid MTAs didn't handle tempfails correctly at earlier stages... The "selective" stuff keeping delays and spanking of innocents down.) BTW: While I like greylisting because it stops a lot of spam, I've never seen any data substantiating claims that it has a measurable negative impact on botnets. So I'm not convinced it really does a lot of spanking of offenders... Regards /Jonas -- Jonas Eckerman Fruktträdet & Förbundet Sveriges Dövblinda http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: ATTN DEVELOPERS: Mega-Spam
> We've got plenty of time, but I suggest not waiting until it becomes a
> big problem before desperately rushing to fix it :)

Depends on how one defines where a problem starts to become 'big'. For me the problem of large messages was big enough early last year so that I had to implement a solution for it in amavisd-new 2.6.3 - with a corresponding support in the SpamAssassin 3.3.0 library (most of it in its DKIM plugin, as mentioned by Michael).

From 2.6.3 release notes (April 22, 2009):

- large messages beyond $sa_mail_body_size_limit are now partially passed to SpamAssassin and other spam scanners for checking: a copy passed to a spam scanner is truncated near or slightly past the indicated limit. Large messages are no longer given an almost free passage through spam checks.

  Note that message truncation can invalidate a DKIM or DK signature. If using (non-default) SpamAssassin rules to assign score points to mail with no valid signatures from authors which are expected to always provide a valid signature, the message truncation can cause false positives on these rules. As a workaround, to a truncated message passed to spam scanners, amavisd inserts a header field:

    X-Amavis-MessageSize: m, TRUNCATED to n

  which can be captured by SpamAssassin rules, e.g.:

    header __TRUNCATED X-Amavis-MessageSize =~ m{\A[^\n]*TRUNCATED}m

  and used in rules like NOTVALID_EBAY to prevent them from triggering.

  Starting with version 3.3.0 of SpamAssassin, its DKIM plugin understands the issue and receives undamaged DKIM signature objects directly from amavisd, so the above workaround is not needed. Also, a hit on a __TRUNCATED rule is automatically generated (explicit header rule is not necessary), just in case it might be useful for some purpose.

Just did a grep on our log, seeking out large spam messages (beyond 420 kB) and print their sizes. Below is the complete list for March 2010.
Seems we are getting about 4.5 large spam messages daily on the average (out of about 75k messages daily). A lot? Depends on one's point of view.

Mar 1 score: 12.7, size: 533 kB
Mar 1 score: 23.9, size: 435 kB
Mar 1 score: 16.4, size: 533 kB
Mar 1 score: 7.4, size: 490 kB
Mar 1 score: 9.1, size: 490 kB
Mar 1 score: 19.6, size: 721 kB
Mar 2 score: 15.1, size: 1132 kB
Mar 2 score: 16.9, size: 643 kB
Mar 2 score: 16.9, size: 643 kB
Mar 2 score: 7.3, size: 587 kB
Mar 2 score: 21.9, size: 721 kB
Mar 2 score: 21.6, size: 527 kB
Mar 2 score: 24.8, size: 436 kB
Mar 2 score: 20.5, size: 527 kB
Mar 2 score: 20.6, size: 528 kB
Mar 3 score: 23.6, size: 435 kB
Mar 3 score: 30.4, size: 543 kB
Mar 3 score: 30.2, size: 543 kB
Mar 3 score: 30.2, size: 543 kB
Mar 3 score: 31.5, size: 543 kB
Mar 3 score: 18.3, size: 1132 kB
Mar 3 score: 31.5, size: 543 kB
Mar 4 score: 31.5, size: 543 kB
Mar 4 score: 32.9, size: 543 kB
Mar 4 score: 10.3, size: 719 kB
Mar 4 score: 10.3, size: 719 kB
Mar 4 score: 10.1, size: 719 kB
Mar 4 score: 10.0, size: 720 kB
Mar 4 score: 10.2, size: 719 kB
Mar 5 score: 16.0, size: 1132 kB
Mar 5 score: 24.2, size: 513 kB
Mar 5 score: 25.9, size: 513 kB
Mar 6 score: 9.9, size: 719 kB
Mar 7 score: 29.6, size: 699 kB
Mar 7 score: 12.3, size: 682 kB
Mar 7 score: 10.2, size: 433 kB
Mar 7 score: 10.2, size: 433 kB
Mar 7 score: 38.0, size: 543 kB
Mar 7 score: 38.0, size: 543 kB
Mar 7 score: 38.0, size: 543 kB
Mar 7 score: 7.5, size: 1787 kB
Mar 7 score: 18.1, size: 643 kB
Mar 7 score: 38.0, size: 543 kB
Mar 7 score: 38.0, size: 543 kB
Mar 8 score: 38.0, size: 543 kB
Mar 8 score: 18.2, size: 643 kB
Mar 8 score: 7.1, size: 1050 kB
Mar 8 score: 18.0, size: 1132 kB
Mar 8 score: 25.7, size: 501 kB
Mar 9 score: 30.6, size: 813 kB
Mar 9 score: 13.7, size: 779 kB
Mar 10 score: 36.9, size: 470 kB
Mar 10 score: 29.3, size: 1407 kB
Mar 10 score: 13.5, size: 910 kB
Mar 11 score: 8.0, size: 812 kB
Mar 11 score: 8.4, size: 821 kB
Mar 11 score: 25.5, size: 435 kB
Mar 12 score: 8.8, size: 857 kB
Mar 12 score: 32.5, size: 543 kB
Mar 12 score: 32.5, size: 543 kB
Mar 12 score: 32.5, size: 543 kB
Mar 12 score: 32.5, size: 543 kB
Mar 12 score: 32.5, size: 543 kB
Mar 12 score: 32.5, size: 543 kB
Mar 12 score: 32.5, size: 543 kB
Mar 12 score: 32.5, size: 543 kB
Mar 12 score: 32.5, size: 543 kB
Mar 13 score: 32.5, size: 543 kB
Mar 13 score: 7.8, size: 732 kB
Mar 13 score: 7.9, size: 926 kB
Mar 13 score: 7.7, size: 732 kB
Mar 13 score: 19.9, size: 513 kB
Mar 14 score: 8.7, size: 821 kB
Mar 14 score: 8.8, size: 821 kB
Mar 14 score: 8.8, size: 821 kB
Mar 14 score: 8.7, size: 821 kB
Mar 14 score: 8.8, size: 821 kB
Mar 14 score: 8.7, size: 821 kB
Mar 14
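To see what the `__TRUNCATED` header rule quoted from the release notes in this message would and would not match, the pattern can be tried outside SpamAssassin. The sketch below is a Python translation of the Perl expression `m{\A[^\n]*TRUNCATED}m`; the function name and the sample sizes are made up for illustration, and it only tests the header value, not SpamAssassin's own header handling.

```python
import re

# Python equivalent of m{\A[^\n]*TRUNCATED}m from the release notes.
# \A anchors at the very start of the string in both Perl and Python,
# so the word TRUNCATED has to appear on the first line of the value.
TRUNCATED_RE = re.compile(r"\A[^\n]*TRUNCATED", re.MULTILINE)


def is_truncated(header_value):
    """True if an X-Amavis-MessageSize value marks a truncated copy."""
    return TRUNCATED_RE.search(header_value) is not None
```

For example, `is_truncated("1159168, TRUNCATED to 409600")` is true, while a value without the TRUNCATED marker on its first line is not matched.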
RE: ATTN DEVELOPERS: Mega-Spam
On Mon, 29 Mar 2010, Brent Kennedy wrote:

> My suggestion would be to use graylisting, force them to send that 1MB message twice.

While greylisting will help, it won't spank the offender in that manner. It will postpone the message very early in the SMTP exchange, not after the body has been received.

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 Windows Vista: Windows ME for the XP generation.
---
 3 days until April Fools' day
Re: ATTN DEVELOPERS: Mega-Spam
On Mon, 2010-03-29 at 23:01 +0200, Mathias Homann wrote:
> I think it has, I get about 2-5 mega spams per day by now.
> and I can't do greylisting because I have to fetchmail from a central
> mail server at my hoster that is not under my direct control.
> And no, moving from a vhost to a root server just to be able to
> greylist is not an option. 5 euro per month versus 50 euro per
> month...
>
Can you persuade your hosting site to implement grey-listing?

My ISP implemented grey-listing over a year ago. When they did, my overall spam rate immediately dropped from 80% of my mail stream to under 10%. Currently spam is running at less than 5%. As a result my SA subsystem is mostly trapping spam sent over less-common channels, e.g. mailing lists and an ISP-provided address I no longer use.

Having to handle a stream of large spam messages can't be improving the throughput and disk usage of your hosting site's mail server either, so it's worth pointing that out to them. They may be more amenable to introducing grey-listing than you realise.

Martin
Re: ATTN DEVELOPERS: Mega-Spam
On Mon, 2010-03-29 at 16:57 -0400, Charles Gregory wrote: > The spams I've seen so far look more 'amateur' than 'pro'. Easily tracable > IP's. Blacklistable domains. I'm just throwing my idea into the queue now > so that it can be smoothly integrated with a future release. We've got > plenty of time, but I suggest not waiting until it becomes a big problem > before desperately rushing to fix it :) Agreed on the latter part. But then again, this is a topic for the dev list [1] to start a discussion, not here. guenther [1] Also note your very own Subject. -- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: ATTN DEVELOPERS: Mega-Spam
Am Montag 29 März 2010 schrieb Karsten Bräckelmann: > On Mon, 2010-03-29 at 16:23 -0400, Brent Kennedy wrote: > > Wow, I knew this was coming at some point. I just figured it was > > too expensive. > > You did read the entire thread, right? :) There's nothing new > about this. Moreover, this still is a rare occurrence. Note even > Charles, who started this thread, claims to have received *one* > such spam. And it appears to be his first. ;) > > Now, if this starts to become a more general pattern... I think it has, I get about 2-5 mega spams per day by now. and I can't do greylisting because I have to fetchmail from a central mail server at my hoster that is not under my direct control. And no, moving from a vhost to a root server just to be able to greylist is not an option. 5 euro per month versus 50 euro per month... bye, MH
Re: ATTN DEVELOPERS: Mega-Spam
On Mon, 29 Mar 2010, Karsten Bräckelmann wrote:

> You did read the entire thread, right? :) There's nothing new about this. Moreover, this still is a rare occurrence. Note even Charles, who started this thread, claims to have received *one* such spam. And it appears to be his first. ;)

Last September the number of spams exceeding 256KB became frequent enough that I bumped up my limit. Now I'm starting to see spams past the new limit (400KB). But when they jump up to 1MB, maybe it's time for a different solution, and maybe regain some system efficiency by adding the suggested mechanism to SA and only doing significant body scans on messages less than 256KB again :)

> Now, if this starts to become a more general pattern...

The spams I've seen so far look more 'amateur' than 'pro'. Easily traceable IPs. Blacklistable domains. I'm just throwing my idea into the queue now so that it can be smoothly integrated with a future release. We've got plenty of time, but I suggest not waiting until it becomes a big problem before desperately rushing to fix it :)

My 0.02 dollars

- C
Re: Sought Rules Back?
On Mon, 2010-03-29 at 16:05 -0400, Jason Bertoch wrote: > > Btw, the three rules JM_SOUGHT_FRAUD_{1,2,3} have a score of zero > > as per Justin's request (Bug 6155 c 38, c72, c89, c124). > > Not sure if people using the channel realize that scores > > need to be bumped up. Btw, I prefer to avoid them monopolizing > > the score when more than one hits: > > > > score JM_SOUGHT_FRAUD_1 0.1 > > score JM_SOUGHT_FRAUD_2 0.1 > > score JM_SOUGHT_FRAUD_3 0.1 > > meta JM_SOUGHT_FRAUD_ANY JM_SOUGHT_FRAUD_1 || JM_SOUGHT_FRAUD_2 || > > JM_SOUGHT_FRAUD_3 > > score JM_SOUGHT_FRAUD_ANY 3.0 > Bug 6155 is now closed, but the SOUGHT rules still have a score of 0. > Anyone have an idea on when these rules will be activated again? The zero score request applies *only* to the SOUGHT_FRAUD sub-set. It does *not* affect SOUGHT. Those do have scores according to the GA run. Also, this applies *only* to 3.3, where this moved into stock. Again, the dedicated sa-update channel (also suitable for 3.2) is *not* affected and still has the same scores it used to. Now, regarding activating again -- just do. They are merely disabled by default (in 3.3 stock). You can "activate" them on your site, simply by dropping score lines into your local config. guenther -- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
RE: ATTN DEVELOPERS: Mega-Spam
On Mon, 2010-03-29 at 16:23 -0400, Brent Kennedy wrote: > Wow, I knew this was coming at some point. I just figured it was too > expensive. You did read the entire thread, right? :) There's nothing new about this. Moreover, this still is a rare occurrence. Note even Charles, who started this thread, claims to have received *one* such spam. And it appears to be his first. ;) Now, if this starts to become a more general pattern... guenther > -Original Message- > From: Charles Gregory [mailto:cgreg...@hwcn.org] > > Literally, Mega-Spam. I just got a spam with 1MB of images. -- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
RE: ATTN DEVELOPERS: Mega-Spam
Wow, I knew this was coming at some point. I just figured it was too expensive. My suggestion would be to use graylisting, force them to send that 1MB message twice. Course zombie bots don't do that generally, so you would never even have to deal with it. You could also use the botnet plug-in. It would be good if SA could handle this though. The above are only temporary solutions to a bigger problem.

-Brent

-----Original Message-----
From: Charles Gregory [mailto:cgreg...@hwcn.org]
Sent: Monday, March 29, 2010 1:09 PM
To: users@spamassassin.apache.org
Subject: ATTN DEVELOPERS: Mega-Spam

Literally, Mega-Spam. I just got a spam with 1MB of images.

My suggestion has been made before, but I would like to ask that it now be taken a bit more seriously. SA needs an option to allow efficient 'partial' scanning of large e-mails, so that, for example, we can perform all the valuable header checks, and maybe even scan for URIBL hits within the first few hundred K of the body?

Is it possible (and easy!) to set a flag that tells SA to stop testing against the body when it reaches a certain byte count?

Or perhaps, if I understand the docs correctly, most rules only trigger on textual message parts anyway, so by simply disabling 'full' rules and possibly 'rawbody', we could get the desired result without too much of a processing hit?

- C
Re: Sought Rules Back?
On 2010/02/01 10:30 AM, Mark Martinec wrote:
> Update returned sought rules 1/31/2010.
>
> Actually back since Jan 6. :) Re-viewed about 1k fraud spam the following days, for the Sought Fraud sub-set.
>
> Btw, the three rules JM_SOUGHT_FRAUD_{1,2,3} have a score of zero as per Justin's request (Bug 6155 c 38, c72, c89, c124). Not sure if people using the channel realize that scores need to be bumped up. Btw, I prefer to avoid them monopolizing the score when more than one hits:
>
> score JM_SOUGHT_FRAUD_1 0.1
> score JM_SOUGHT_FRAUD_2 0.1
> score JM_SOUGHT_FRAUD_3 0.1
> meta JM_SOUGHT_FRAUD_ANY JM_SOUGHT_FRAUD_1 || JM_SOUGHT_FRAUD_2 || JM_SOUGHT_FRAUD_3
> score JM_SOUGHT_FRAUD_ANY 3.0
>
> Mark

Bug 6155 is now closed, but the SOUGHT rules still have a score of 0. Anyone have an idea on when these rules will be activated again?

--
/Jason
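The effect of the scoring scheme Mark describes (tiny scores on the three sub-rules, with the real weight on a meta rule that fires when any of them hits) can be checked with a little arithmetic. The sketch below is only a model in Python of how those five config lines add up; it is not SpamAssassin code, and `total_score` is a name invented for the example.

```python
# Scores taken from the config lines quoted in this message.
SCORES = {
    "JM_SOUGHT_FRAUD_1": 0.1,
    "JM_SOUGHT_FRAUD_2": 0.1,
    "JM_SOUGHT_FRAUD_3": 0.1,
    "JM_SOUGHT_FRAUD_ANY": 3.0,
}
SUB_RULES = ("JM_SOUGHT_FRAUD_1", "JM_SOUGHT_FRAUD_2", "JM_SOUGHT_FRAUD_3")


def total_score(hits):
    """Sum the scores of the sub-rules that hit, plus the meta rule if any hit."""
    fired = set(hits) & set(SUB_RULES)
    if fired:
        # meta JM_SOUGHT_FRAUD_ANY = rule_1 || rule_2 || rule_3
        fired.add("JM_SOUGHT_FRAUD_ANY")
    return round(sum(SCORES[name] for name in fired), 2)
```

With one sub-rule hitting the total is 3.1, and with all three it is only 3.3, so multiple hits no longer multiply the contribution the way three independently scored rules would.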
Re: FREEMAIL_ENVFROM_END_DIGIT score
On Mon, 2010-03-29 at 13:52 -0400, Jason Bertoch wrote: > I recently received a FP report on an e-mail that hit on, among other > things, FREEMAIL_ENVFROM_END_DIGIT. This rule has a score of 1.6, which > seems maybe a little high. Henrik mentioned the same thing in comment > 185 [1] of Bug 6155 which is closed as resolved/fixed. The assumption > was that there probably isn't much ham in the corpora that matches > addresses like these and therefore the score may be unfairly high. > > The closed bug was addressing overall score generation and not directly > related to this rule. Have any of the devs already looked at this > particular issue, or should this be opened as a new bug for further > investigation? Please do. We might want to lock down the score -- given there's no way yet to do "minimum of score X and GA result", which would be even better. guenther > [1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155#c185 -- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: FREEMAIL_ENVFROM_END_DIGIT score
On 3/29/10 1:52 PM, Jason Bertoch wrote:
> I recently received a FP report on an e-mail that hit on, among other things, FREEMAIL_ENVFROM_END_DIGIT. This rule has a score of 1.6, which seems maybe a little high. Henrik mentioned the same thing in comment 185 [1] of Bug 6155 which is closed as resolved/fixed. The assumption was that there probably isn't much ham in the corpora that matches addresses like these and therefore the score may be unfairly high.
>
> The closed bug was addressing overall score generation and not directly related to this rule. Have any of the devs already looked at this particular issue, or should this be opened as a new bug for further investigation?

WAY too many gmail and hotmail and yahoo accounts out there, and they HAVE TO END IN DIGITS. So FREEMAIL_ENVFROM_END_DIGIT is redundant with FREEMAIL.

Oh, and I have clients who claim their lawyer uses aol for his corporate email address. And guess what? Yes, it ends in a digit, since his lastname, first/last and last/first were already taken.

--
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
SECNAP Network Security Corporation
* Certified SNORT Integrator
* 2008-9 Hot Company Award Winner, World Executive Alliance
* Five-Star Partner Program 2009, VARBusiness
* Best Anti-Spam Product 2008, Network Products Guide
* King of Spam Filters, SC Magazine 2008

This email has been scanned and certified safe by SpammerTrap(r). For Information please see http://www.secnap.com/products/spammertrap/
Re: spamc syslog loglevel for "skipped message, greater than max message size"
Philipp, > why does > > spamc[28825]: [ID 702911 mail.error] skipped message, greater than max > message size (512000 bytes) > > have to be log level error? > > Instead of error would "warn" not be enough? That was fixed in 3.3.0: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5325 Mark
spamc syslog loglevel for "skipped message, greater than max message size"
Hi why does spamc[28825]: [ID 702911 mail.error] skipped message, greater than max message size (512000 bytes) have to be log level error? Instead of error would "warn" not be enough? thanks, Philipp
FREEMAIL_ENVFROM_END_DIGIT score
I recently received a FP report on an e-mail that hit on, among other things, FREEMAIL_ENVFROM_END_DIGIT. This rule has a score of 1.6, which seems maybe a little high. Henrik mentioned the same thing in comment 185 [1] of Bug 6155 which is closed as resolved/fixed. The assumption was that there probably isn't much ham in the corpora that matches addresses like these and therefore the score may be unfairly high.

The closed bug was addressing overall score generation and not directly related to this rule. Have any of the devs already looked at this particular issue, or should this be opened as a new bug for further investigation?

[1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155#c185

--
/Jason
Re: ATTN DEVELOPERS: Mega-Spam
Aw, is that shouting really necessary? Oh, yes, it is indeed -- you are trying to get heard over on the dev list, so you need to be quite loud from here... ;) The dev list is what you want. On Mon, 2010-03-29 at 13:09 -0400, Charles Gregory wrote: > Literally, Mega-Spam. I just got a spam with 1MB of images. The largest one I've seen included about 4.5 MByte worth of 7 jpeg images, the largest one of which 1.2 MByte. And that doesn't even include the considerable base64 overhead for the mail... On the other hand: Guess what, I get about one spam per year exceeding the default size threshold of 500 kByte. guenther -- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: ATTN DEVELOPERS: Mega-Spam
On 3/29/10 1:09 PM, Charles Gregory wrote:
> Literally, Mega-Spam. I just got a spam with 1MB of images.
>
> My suggestion has been made before, but I would like to ask that it now be taken a bit more seriously. SA needs an option to allow efficient 'partial' scanning of large e-mails, so that, for example, we can perform all the valuable header checks, and maybe even scan for URIBL hits within the first few hundred K of the body?

could, will and does mess up dkim checks and language checks.

That said, amavisd-new has a switch to do this already, and for the very same reason. (yes, it costs the scumbags nothing to have aunt martha and her zombot send out 600MM 1MB spams)

--
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
SECNAP Network Security Corporation
ATTN DEVELOPERS: Mega-Spam
Literally, Mega-Spam. I just got a spam with 1MB of images.

My suggestion has been made before, but I would like to ask that it now be taken a bit more seriously. SA needs an option to allow efficient 'partial' scanning of large e-mails, so that, for example, we can perform all the valuable header checks, and maybe even scan for URIBL hits within the first few hundred K of the body?

Is it possible (and easy!) to set a flag that tells SA to stop testing against the body when it reaches a certain byte count?

Or perhaps, if I understand the docs correctly, most rules only trigger on textual message parts anyway, so by simply disabling 'full' rules and possibly 'rawbody', we could get the desired result without too much of a processing hit?

- C
RE: trusted_networks
> On 29.3.2010 18:40, Kaleb Hosie wrote: > > I'm having a problem with the trusted_networks option. > Right now I have it set to: > > > > trusted_networks 10.0.1/24 > > > > In postfix, I need to have spamassassin listed under > "smtpd_recipient_restrictions" so that it will only scan > incoming emails however it would be handy to get this option > working if at all possible so it won't scan outgoing emails. > > > > When I try to use this option; I login through telnet port > 25, and send the test spam string (from the 10.0.1.0 subnet) > it still gets caught in spam. Am I doing something wrong or > is there another option I need to choose? > > > > What is your glue to SpamAssassin? How is it called? > > I call SA from maildrop or procmail, which automatically > makes it for incoming only. There are so many ways to do it. > > -- > http://www.iki.fi/jarif/ > > You've been leading a dog's life. Stay off the furniture. > I thought that with that option, SA is able to decide itself as to whether to scan it or not. The program that I use to interface with SA is a rather unknown program called SpamAssassin Quarantine (SAQ). If SA isn't able to decide for itself to not scan particular emails depending upon whether it's from the internal network or not then I'll have to see about reprogramming SAQ to work. Kaleb
Re: trusted_networks
On Mon, 2010-03-29 at 11:40 -0400, Kaleb Hosie wrote: > I'm having a problem with the trusted_networks option. Right now I have > it set to: > > trusted_networks 10.0.1/24 > When I try to use this option; I login through telnet port 25, and send > the test spam string (from the 10.0.1.0 subnet) it still gets caught > in spam. Am I doing something wrong or is there another option I need > to choose? Please re-read the documentation about trusted_networks again. It is *not* an option for bypassing SA. In fact, there is no such option. SA will scan whatever it gets fed. So if you want to bypass SA for mail generated from a particular network, you need to adjust the glue that calls SA to just not do that. -- char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
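Since the decision to skip scanning has to live in the glue rather than in SA, that glue needs some test along the lines of "did this message come from my own network?". A minimal sketch of such a test in Python is shown below. It is not part of SpamAssassin, SAQ or Postfix; the function name is hypothetical, and the subnet is the one from the question, written out in full because Python's `ipaddress` module expects a complete dotted-quad address.

```python
import ipaddress

# Assumption: the internal subnet from the question (10.0.1/24),
# spelled out as 10.0.1.0/24 for Python's ipaddress module.
INTERNAL_NETS = [ipaddress.ip_network("10.0.1.0/24")]


def should_scan(client_ip, internal_nets=INTERNAL_NETS):
    """Return False for mail submitted from an internal network, True otherwise."""
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in internal_nets)
```

The calling glue would then hand the message to SA only when `should_scan()` returns true, for example using the connecting client address that the MTA passes along.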
Re: trusted_networks
On 29.3.2010 18:40, Kaleb Hosie wrote:
> I'm having a problem with the trusted_networks option. Right now I have it
> set to:
>
> trusted_networks 10.0.1/24
>
> In postfix, I need to have spamassassin listed under
> "smtpd_recipient_restrictions" so that it will only scan incoming emails
> however it would be handy to get this option working if at all possible so it
> won't scan outgoing emails.
>
> When I try to use this option; I login through telnet port 25, and send the
> test spam string (from the 10.0.1.0 subnet) it still gets caught in spam. Am
> I doing something wrong or is there another option I need to choose?
>

What is your glue to SpamAssassin? How is it called?

I call SA from maildrop or procmail, which automatically makes it for incoming only. There are so many ways to do it.

--
http://www.iki.fi/jarif/

You've been leading a dog's life. Stay off the furniture.
trusted_networks
I'm having a problem with the trusted_networks option. Right now I have it set to: trusted_networks 10.0.1/24 In postfix, I need to have spamassassin listed under "smtpd_recipient_restrictions" so that it will only scan incoming emails however it would be handy to get this option working if at all possible so it won't scan outgoing emails. When I try to use this option; I login through telnet port 25, and send the test spam string (from the 10.0.1.0 subnet) it still gets caught in spam. Am I doing something wrong or is there another option I need to choose? Thanks! Kaleb
Re: Bayes db and token expiry questions
On Mon, 29 Mar 2010 13:03:59 +0200 Kai Schaetzl wrote: > Alex wrote on Sun, 28 Mar 2010 13:38:25 -0400: > > > I have a bayes db that's about 160MB with a 40MB token db on a > > system with about 100k messages per day. > > Well, what's the missing 120 MB? The journal? Do a complete sync and > then delete it. Probably the signatures in bayes_seen - there's no mechanism for ageing them out. > You should be > aware that the expiry kicks in at 75%, not at 100% of max_db_size. And it may reduce the tokens to 37.5% of nominal > I suggest you change to SQL. This eliminates the journal. Isn't that slower than journalled db? > > database was too big, so I lowered it back down, but I think that > > was a mistake. > > "too big" is not an absolute figure. If you store 1-occurence tokens > you will obviously have more tokens than without them. There's not really a choice since all tokens start that way. > You should use autolearn if you don't do yet. Autolearning can make things worse by dropping the retention period.
Re: Bayes db and token expiry questions
Alex wrote on Sun, 28 Mar 2010 13:38:25 -0400: > I have a bayes db that's about 160MB with a 40MB token db on a system > with about 100k messages per day. Well, what's the missing 120 MB? The journal? Do a complete sync and then delete it. I've just raised the max_db_size set > to 1.1M tokens (there are currently 1.06M tokens in there). That's not much for a system with 100.000 messages a day. I don't mean it's not sufficient, it is just not "too much". You should be aware that the expiry kicks in at 75%, not at 100% of max_db_size. I've also > changed bayes to write to the journal instead of directly to the > database and just checking it periodically to see if the journal needs > to be synced. I suggest you change to SQL. This eliminates the journal. > > Can someone explain to me the relationship between the frequency of > "1-occurrence tokens" and the size of the database? Here is the output > from a recent manual sync: > > token frequency: 1-occurrence tokens: 72.60% > token frequency: less than 8 occurrences: 18.11% > > I was thinking that the because the tokens are seen only once, it probably means you get a lot of fresh tokens in. Do you autolearn? the > database was too big, so I lowered it back down, but I think that was > a mistake. "too big" is not an absolute figure. If you store 1-occurence tokens you will obviously have more tokens than without them. If you slash the db (which slashes from all tokens, not just those 1.o ones) and the performance goes down afterwards that was obviously a wrong decision ;-) I don't know if and how this is reflected in the database itself in size. This is a DBM database which will have certain sizes by design no matter how many tokens are in it. If the token database is only 40 MB that is not overly large, it's normal. Now some of the same emails are continually hitting only > BAYES_50 while others seemingly the same hit BAYES_99. 
I've now raised > the number of tokens available and continue to manually train the > database with spam and ham (there are about 1.1M spam and 500k ham > currently). You should use autolearn if you don't do yet. If you want to be safe you can change the learning thresholds to safer values. (I think I use 8 for spam and keep the default for ham.) > Have I configured something wrong, or am I misunderstanding how this > works? Is there something else I should read? I think your db was ok as it was. You should read how to change to SQL ;-) Do the expiry once per night per cron. Kai -- Get your web at Conactive Internet Services: http://www.conactive.com
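The percentages mentioned in this thread (expiry kicking in at 75% of the configured maximum, and possibly cutting the token count down to 37.5% of it) are easy to turn into token counts. The sketch below takes those two figures from the messages above at face value; it does not reproduce SpamAssassin's actual expiry algorithm, and `expiry_figures` is a name made up for illustration.

```python
def expiry_figures(max_db_size):
    """Token counts implied by the 75% and 37.5% figures quoted in the thread."""
    return {
        "trigger": max_db_size * 3 // 4,  # 75% of the configured maximum
        "floor": max_db_size * 3 // 8,    # 37.5% of the configured maximum
    }
```

For the 1.1M-token limit discussed above this gives 825,000 and 412,500 tokens respectively, so a database holding 1.06M tokens is already well past the 75% mark.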