Re: FuzzyOCR only runs when specifying spamassassin -D
Andrew Bruce wrote:
> I've been looking at some of the spam emails I've received lately with
> images attached and noticed that FuzzyOCR wasn't running against them.
>
> The same seems to be true when I take these messages and run them with:
>
> spamassassin -t < img-email.eml
>
> However if I run them through as follows, I get FuzzyOCR showing up in
> the results:
>
> spamassassin -t -D < img-email.eml

Well, the rule that tripped was FUZZY_OCR_KNOWN_HASH. I'm no FuzzyOCR expert, but I'm guessing that's related to it storing the hashes of images attached to previous spam in a SQL database. So, in that case, it would have fired the second time regardless of whether -D was enabled. It's just firing because it has already seen the image once before and cataloged it as belonging to spam.

Glancing at FuzzyOCR's code for the first time, I think this is related to the focr_enable_image_hashing option.

> I also get substantially different AWL results between the two
> (although I guess that may be part of the debug procedure).

-D does not change the AWL. The AWL score change is a function of two things:

1) Scanning the message multiple times. Every time you process it, the AWL will change, because every scanned message gets factored into the AWL's historical average score.

2) FuzzyOCR triggering, which raised the pre-AWL score and therefore drove the AWL score down. (Remember, the AWL score is based on the difference between this message and the past average.)

Adding +10 to the pre-AWL score (which FuzzyOCR did) should change the AWL score by -5.0, assuming the default AWL factor of 0.5. You saw a total swing of -7.0 (from +1.8 to -5.2), so it looks like the first run lowered the average by 4.0, in turn changing the AWL score by -2.0, and then FuzzyOCR caused another -5.0 change in the AWL.

In both cases the AWL still "thought" the message was spam, but in the second case it noted that the message had a much higher score than the previous spam, so it brought it back down a bit to split the difference. That's what the AWL does.

See also:
http://wiki.apache.org/spamassassin/AwlWrongWay
http://wiki.apache.org/spamassassin/AutoWhitelist
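A minimal Python sketch of the arithmetic above, assuming the AWL term is simply factor x (historical mean - this message's pre-AWL score), with the factor at the 0.5 default (the SpamAssassin 3.x option is auto_whitelist_factor). The mean of 24.0 is an assumption, back-calculated so that the first run's pre-AWL score of 20.4 (22.2 total minus the +1.8 AWL line) reproduces +1.8; it is not a value read from anyone's AWL database. A higher pre-AWL score and a lower historical mean both pull the AWL term down.

```python
def awl_adjustment(mean_score: float, pre_awl_score: float, factor: float = 0.5) -> float:
    """AWL contribution: a fraction of (historical mean - this message's pre-AWL score)."""
    return factor * (mean_score - pre_awl_score)


if __name__ == "__main__":
    mean = 24.0     # assumed historical mean (back-calculated, illustrative only)
    pre_awl = 20.4  # run 1: 22.2 total minus the +1.8 AWL line

    first = awl_adjustment(mean, pre_awl)
    # FuzzyOCR adds +10 before the AWL is computed, so the AWL term drops by 5.0.
    with_ocr = awl_adjustment(mean, pre_awl + 10.0)
    # A mean that is 4.0 lower (after rescanning) moves the AWL term a further -2.0.
    with_ocr_lower_mean = awl_adjustment(mean - 4.0, pre_awl + 10.0)

    print(f"run 1 AWL:       {first:+.1f}")
    print(f"+10 pre-AWL:     {with_ocr:+.1f} (change {with_ocr - first:+.1f})")
    print(f"+10 and mean -4: {with_ocr_lower_mean:+.1f} (total change {with_ocr_lower_mean - first:+.1f})")
```

Run as a script it reports +1.8, then -3.2, then -5.2, a total swing of -7.0; under these assumptions the last value lines up with the AWL line in the -D report.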
Re: FuzzyOCR only runs when specifying spamassassin -D
Andrew Bruce wrote:
> I've been looking at some of the spam emails I've received lately with
> images attached and noticed that FuzzyOCR wasn't running against them.
> [snip]
> However if I run them through as follows, I get FuzzyOCR showing up in
> the results:
>
> spamassassin -t -D < img-email.eml
> [snip]
> Does anyone know why this might be happening? I seem to recall
> experiencing this before, but can't remember what I did to fix it.

That's the way FuzzyOCR works: if a message has already scored above a configurable threshold, it doesn't scan it. If you run in debug mode, the threshold is ignored.

--
René Berber
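René's explanation amounts to a simple gate in front of the expensive OCR step. The Python sketch below models it under stated assumptions: `should_run_ocr` and the threshold value are hypothetical stand-ins, not FuzzyOCR's actual code or option names, and "above the threshold" is read as strictly greater than.

```python
def should_run_ocr(score_so_far: float, threshold: float, debug: bool) -> bool:
    """Hypothetical model of the skip logic René describes (not FuzzyOCR's code)."""
    if debug:
        return True  # in debug mode the threshold is ignored
    # Skip the OCR scan once the message has already scored above the threshold.
    return score_so_far <= threshold


if __name__ == "__main__":
    threshold = 10.0  # hypothetical value, for illustration only
    score = 20.4      # Andrew's message scored this much before any FuzzyOCR rule
    print("normal run scans:", should_run_ocr(score, threshold, debug=False))
    print("-D run scans:    ", should_run_ocr(score, threshold, debug=True))
```

With any threshold below the roughly 20 points the message already had, only the -D run reaches the OCR step, which fits the two reports in the original post.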
FuzzyOCR only runs when specifying spamassassin -D
I've been looking at some of the spam emails I've received lately with images attached and noticed that FuzzyOCR wasn't running against them.

The same seems to be true when I take these messages and run them with:

spamassassin -t < img-email.eml

However if I run them through as follows, I get FuzzyOCR showing up in the results:

spamassassin -t -D < img-email.eml

I also get substantially different AWL results between the two (although I guess that may be part of the debug procedure).

Does anyone know why this might be happening? I seem to recall experiencing this before, but can't remember what I did to fix it.

spamassassin -t:

Content analysis details:   (22.2 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
                            [68.186.154.187 listed in zen.spamhaus.org]
 3.0 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
 0.9 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [68.186.154.187 listed in dnsbl.sorbs.net]
 3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.]
 1.0 FH_HELO_EQ_CHARTER     Helo is d-d-d-d charter.com
 4.3 HELO_DYNAMIC_HCC       Relay HELO'd using suspicious hostname (HCC)
 4.4 HELO_DYNAMIC_IPADDR2   Relay HELO'd using suspicious hostname (IP addr 2)
 0.0 FH_HELO_EQ_D_D_D_D     Helo is d-d-d-d
 2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see ]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
                            dynamic-looking rDNS
 1.8 AWL                    AWL: From: address is in the auto white-list

spamassassin -t -D:

Content analysis details:   (25.7 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.0 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [68.186.154.187 listed in zen.spamhaus.org]
 1.2 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
 0.9 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [68.186.154.187 listed in dnsbl.sorbs.net]
 3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.]
 1.0 FH_HELO_EQ_CHARTER     Helo is d-d-d-d charter.com
 4.3 HELO_DYNAMIC_HCC       Relay HELO'd using suspicious hostname (HCC)
 4.4 HELO_DYNAMIC_IPADDR2   Relay HELO'd using suspicious hostname (IP addr 2)
 0.0 FH_HELO_EQ_D_D_D_D     Helo is d-d-d-d
 2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see ]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
                            dynamic-looking rDNS
  10 FUZZY_OCR_KNOWN_HASH   BODY:
-5.2 AWL                    AWL: From: address is in the auto white-list
Re: 'anti' AWL
On 28-Apr-2009, at 20:14, Matt Kettler wrote:
> The AWL uses the LAST non-private..
> This is, IMO, completely broken.

Yep, have to agree. This is seriously retarded.

--
I love as only I can, with all my heart
Re: 'anti' AWL
Matt Kettler wrote:
> LuKreme wrote:
>> On 28-Apr-2009, at 15:38, RW wrote:
>>> It's based on the first routable IP address,
>>
>> Well, that's a very silly thing for it to be looking at. It should be
>> looking at the LAST routable IP address outside of the trusted
>> network. Looking at the first routable address is completely worthless.
>
> It's actually based on the last IP not matching your internal_networks.
> If you haven't declared internal_networks or trusted_networks manually,
> then the auto-guesser is going to set it to be the second-to-last
> routable IP (it assumes the last routable is your MX, which may or may
> not be correct depending on how you route/firewall your DMZ).
>
> Of course, first or last depends on your perspective. I assume RW was
> thinking of "first" from a "starting at the inside, working backwards in
> time" approach. This is backwards, if you think about the chronology of
> the headers, like SA does. However, it makes sense from a "I'm at my
> server looking outward at the world" point of view that most folks work
> from when thinking about network topologies.

Darnit, I should have checked before sending.

The AWL uses the LAST non-private..

This is, IMO, completely broken. Why are we allowing folks to declare internal_networks if we're not going to use it, and instead assume the last non-private address is "external"? (Which, mind you, is different from what the trust-path guesser does: it assumes that IP is your MX.)

Relevant code:

  foreach my $rly (reverse (@{$pms->{relays_trusted}}, @{$pms->{relays_untrusted}})) {
    next if ($rly->{ip_private});
    if ($rly->{ip}) {
      $origip = $rly->{ip};
      last;
    }
  }
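For readers who don't read Perl, here is a line-for-line Python transliteration of the quoted loop. The relay dicts with `ip` and `ip_private` keys mirror the Perl hash fields; the function name and the sample addresses are made up for illustration. It only reproduces the loop's mechanics (trusted relays followed by untrusted ones, walked in reverse, private addresses skipped, first relay with an IP wins); which end of the Received chain that lands on depends on how SpamAssassin orders its relay lists, which is exactly what the thread is arguing about.

```python
from typing import Dict, Optional, Sequence

Relay = Dict[str, object]


def awl_origin_ip(relays_trusted: Sequence[Relay], relays_untrusted: Sequence[Relay]) -> Optional[str]:
    """Python transliteration of the quoted Perl loop (hypothetical helper name)."""
    # reverse (@trusted, @untrusted): walk the combined list from its end.
    for relay in reversed(list(relays_trusted) + list(relays_untrusted)):
        if relay.get("ip_private"):
            continue  # next if ($rly->{ip_private});
        ip = relay.get("ip")
        if ip:
            return str(ip)  # $origip = $rly->{ip}; last;
    return None


if __name__ == "__main__":
    trusted = [{"ip": "192.0.2.25", "ip_private": False}]
    untrusted = [
        {"ip": "198.51.100.7", "ip_private": False},
        {"ip": "10.1.2.3", "ip_private": True},
    ]
    print(awl_origin_ip(trusted, untrusted))  # 10.1.2.3 is skipped as private
```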
Re: 'anti' AWL
LuKreme wrote:
> On 28-Apr-2009, at 15:38, RW wrote:
>> It's based on the first routable IP address,
>
> Well, that's a very silly thing for it to be looking at. It should be
> looking at the LAST routable IP address outside of the trusted
> network. Looking at the first routable address is completely worthless.

It's actually based on the last IP not matching your internal_networks. If you haven't declared internal_networks or trusted_networks manually, then the auto-guesser is going to set it to be the second-to-last routable IP (it assumes the last routable is your MX, which may or may not be correct depending on how you route/firewall your DMZ).

Of course, first or last depends on your perspective. I assume RW was thinking of "first" from a "starting at the inside, working backwards in time" approach. This is backwards, if you think about the chronology of the headers, like SA does. However, it makes sense from a "I'm at my server looking outward at the world" point of view that most folks work from when thinking about network topologies.
Re: 'anti' AWL
On 28-Apr-2009, at 15:38, RW wrote:
> It's based on the first routable IP address,

Well, that's a very silly thing for it to be looking at. It should be looking at the LAST routable IP address outside of the trusted network. Looking at the first routable address is completely worthless.

--
Adolescence is the period between childhood and adultery
Re: Physician List
On Tue, 2009-04-28 at 19:43 -0400, Casartello, Thomas wrote:
> Has anyone else noticed these messages as a problem? I have had a few
> complaints about messages getting through my spam filter involving
> “Physicians List in the USA” or something like that usually talking

I have seen quite a few myself. Unfortunately, they tend to slip by. I made a first attempt at catching them, which helped, though I do see new variants going under the radar of a few of my metas.

I'd be interested in getting more samples (contact me off-list first!) from anyone, to tighten and broaden (yes, both) my local rules and release them publicly.

Interestingly, I only ever seem to get them on list role accounts and non-published OSS forwarder addresses.

  guenther

--
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
my emailBL is live!
This was actually rather simple to set up. I'll publish the code (AGPL) that runs it in a bit (I need to clean it up to withstand the heavy-handed criticism on this list ...).

Note, I'm using ZoneEdit's free NS mirroring, which has limited bandwidth. I'm willing to pay their minimum threshold if it gets that popular, but any more than that and I'll be looking for other options. (NOT PRODUCTION GRADE!)

A SpamAssassin plugin will be needed to get it working, too ... I suspect there are gurus here who can do that part as easily as I did the scraper and BIND code. If nobody bites, I'll get to it in time. For now, we have a functional proof-of-concept.

I'll post the code, a more formal announcement, and more documentation to my blog and website in a few days ("a few" might be a large number).

The emailBL syncs with the upstream every 4h (I'd reduce the TTL and increase the syncing frequency, but I'd risk running out of bandwidth). (Note, the DNS will take another 1-4 hours to propagate.)

The structure of the upstream list:

  ADDRESS,TYPE[TYPE...],DATE

  ADDRESS is an email address.
  TYPE is one or more letters of A B C D as follows:
    A (reply-to)
    B (from, !reply-to)
    C (msg body has ADDRESS)
    D (msg body has ADDRESS obfuscated)
  DATE is the last time it was seen, formatted MMDD, in UTC(?).

The structure of domains in my emailBL index:

  USER.DOMAIN.emailbl.khopesh.com TXT DATE
  USER.DOMAIN.emailbl.khopesh.com A   127.0.0.N_TYPE

  USER is the ADDRESS's username, altered as follows:
    s/^([...@+]{1,16})[...@]*@.*/$1/;  # truncate to 16 characters
    s/^[^a-z0-9]*|[^a-z0-9]*$//g;       # fix leading/trailing chars
    s/[^-a-z.0-9]/-/g;                  # fix illegal chars
  DOMAIN is the ADDRESS's domain
  N_TYPE is a numerical version of TYPE above (A=1, B=2, C=3, D=4)

Main test points (with no space after the at sign, obviously):

  test@ example.com         -> test.example.com.emailbl.khopesh.com
  test@ emailbl.khopesh.com -> test.emailbl.khopesh.com.emailbl.khopesh.com

Alternate test point (mimicking DNSBLs):

  2.0.0.127.emailbl.khopesh.com

Let's pretend we're in a shell (I've spaced all emails):

  $ # Look up TXT record (last-seen DATE) for test@ example.com
  $ host -t txt test.example.com.emailbl.khopesh.com.
  test.example.com.emailbl.khopesh.com descriptive text "20090328"
  $ # Look up A record (inclusion TYPE[s]) for test@ example.com
  $ host test.example.com.emailbl.khopesh.com.
  test.example.com.emailbl.khopesh.com has address 127.0.0.3
  test.example.com.emailbl.khopesh.com has address 127.0.0.4
  test.example.com.emailbl.khopesh.com has address 127.0.0.1
  test.example.com.emailbl.khopesh.com has address 127.0.0.2
  $

More comments in-line:

Jesse Thompson (developer of anti-phishing-email-reply) wrote me:
> Yes, I and others have thought of it. But I don't need it since we
> only use the list to scan log files and populate mapping tables. I
> don't have time or money to do any of this, and I'm kept pretty
> busy just updating the list...on top of my bazillion other
> responsibilities.
>
> You are welcome to use the list to create your own URIBL of course.

(Jesse is BCC'd.) And so I did. Thanks for keeping the list updated. Hopefully this emailBL will open your list to new horizons. Clearly, credit for the real work goes to you and the other APER developers.

Rob McEwen wrote:
>>> Personally, I think the obfuscation is overkill.
>>> Instead, I'd prefer to change the "@" symbol to an underscore (and any
>>> other minor change that might be needed to work with DNS queries) and
>>> be done with it. This would also make the implementation easier,
>>> and research by ISPs easier.

Mike Cardwell contended:
>> It would definitely require a hashing algorithm, like MD5. IIRC
>> there is a maximum length for a hostname, and that is 255
>> characters. What if the hostname in your email address is 255
>> characters long on its own...?

When MD5sums were first proposed (in place of my wild escaping), it seemed like a great idea. However, a voice in the back of my head, now spoken (typed?) by Rob, has been growing louder. My implementation now merely truncates email usernames to 16 characters (plus the noted defanging, which makes it complicated again ...) and replaces the @ with a dot (not an underscore; that's not a legal character).

In fact, collisions here could be regarded as good, as usernames that long can include tracking strings (e.g. the mailer for our list, users-return-12345-joe=bob.com@ spamassassin.apache.org, becomes users-return-123.spamassassin.apache.org), which should help.

I did fully implement my proposed latter 16 characters (of MD5's 32) plus dot plus the domain, complete with hash lookups, but I just removed it (which is why non-test lookups will fail for the next ~4h).

>> Having access to the plain text email address would only make it
>> easier for ISPs to do anything if they had access to the zone file.
>> In which case, you could just give the
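The lookup scheme described in this post can be sketched in Python. This approximates the USER rewriting from its prose description: lowercase, truncate to 16 characters, trim non-alphanumerics at both ends, and replace anything outside [-a-z.0-9] with a hyphen. The first Perl substitution is garbled in this archive, so any special handling it had (for example of "+" sub-addresses) is not reproduced, and lowercasing the whole address is an assumption. `query_name` and `decode_answers` are hypothetical helpers, not part of the published code.

```python
import re
from typing import Iterable, List

ZONE = "emailbl.khopesh.com"
N_TYPE = {"A": 1, "B": 2, "C": 3, "D": 4}  # TYPE letter -> last octet of 127.0.0.N_TYPE


def query_name(address: str) -> str:
    """Build USER.DOMAIN.emailbl.khopesh.com for an email address (approximation)."""
    user, _, domain = address.strip().lower().rpartition("@")
    user = user[:16]                                     # truncate to 16 characters
    user = re.sub(r"^[^a-z0-9]+|[^a-z0-9]+$", "", user)  # fix leading/trailing chars
    user = re.sub(r"[^-a-z.0-9]", "-", user)             # fix illegal chars
    return f"{user}.{domain}.{ZONE}"


def decode_answers(addresses: Iterable[str]) -> List[str]:
    """Map A-record answers such as 127.0.0.3 back to TYPE letters."""
    letter_for = {n: letter for letter, n in N_TYPE.items()}
    return sorted(letter_for[int(a.rsplit(".", 1)[1])] for a in addresses)


if __name__ == "__main__":
    print(query_name("users-return-12345-joe=bob.com@spamassassin.apache.org"))
    print(decode_answers(["127.0.0.3", "127.0.0.4", "127.0.0.1", "127.0.0.2"]))
```

A live lookup could pass the generated name to Python's `socket.gethostbyname_ex()`, whose third return value is the list of A-record addresses; the TXT (DATE) record needs a separate DNS library, since the standard library has no TXT query.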
Physician List
Has anyone else noticed these messages as a problem? I have had a few complaints about messages getting through my spam filter involving "Physicians List in the USA" or something like that, usually talking about dentists too. I made this to target it (someone on the list showed me how to do things like this, which really seems to be helping to block EDU spear-phishing attacks):

body WSC_DENTISTSCAM /Dent ists|Send an email to Slater|Directory in the United States|have won a prize money|D.entists|Reach Dentists|Physician Mailing List|receive money|you will have your email taken off|Physicians in the US|Pharmaceutical Company List|List of US Hospitals|Directory of US Dentists/i
describe WSC_DENTISTSCAM Dentist scam.
score WSC_DENTISTSCAM 15

body WSC_DENTIST_D /dentist/i
describe WSC_DENTIST_D Email contains dentist
score WSC_DENTIST_D 0.1

body WSC_DENTIST_P /physician|\bMD\b/i
describe WSC_DENTIST_P Email contains physician
score WSC_DENTIST_P 0.1

body WSC_DENTIST_L /list|directory/i
describe WSC_DENTIST_L Email contains directory/list
score WSC_DENTIST_L 0.1

body WSC_DENTIST_U /United States/i
describe WSC_DENTIST_U Email contains United States
score WSC_DENTIST_U 0.1

meta WSC_DENTIST_1 WSC_DENTIST_D && WSC_DENTIST_P && WSC_DENTIST_L
describe WSC_DENTIST_1 Likely dentist/physician list spam..contains physician, dentist, and list or directory
score WSC_DENTIST_1 7

meta WSC_DENTIST_2 WSC_DENTIST_D && WSC_DENTIST_P && WSC_DENTIST_L && WSC_DENTIST_U
describe WSC_DENTIST_2 Very Likely dentist/physician list spam
score WSC_DENTIST_2 10

Has anyone else been seeing these types of messages?

Thomas E. Casartello, Jr.
Staff Assistant - Wireless Technician/Linux Administrator
Information Technology
Wilson 105A
Westfield State College
(413) 572-8245
Red Hat Certified Technician (RHCT)
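To see how the sub-rules and the two meta rules combine, here is a rough Python model. It is an approximation: a plain string stands in for the rendered message body that SpamAssassin's body rules actually see, and the big WSC_DENTISTSCAM rule is left out. It uses `\bMD\b` rather than a bare `MD`, because `/MD/i` also matches inside ordinary words such as "command". Scores come from the rules above, assuming the `score WSC_DENTIST_3 10` line was meant for WSC_DENTIST_2.

```python
import re
from typing import Set

SUBRULES = {
    "WSC_DENTIST_D": re.compile(r"dentist", re.I),
    "WSC_DENTIST_P": re.compile(r"physician|\bMD\b", re.I),
    "WSC_DENTIST_L": re.compile(r"list|directory", re.I),
    "WSC_DENTIST_U": re.compile(r"United States", re.I),
}
SCORES = {
    "WSC_DENTIST_D": 0.1, "WSC_DENTIST_P": 0.1, "WSC_DENTIST_L": 0.1, "WSC_DENTIST_U": 0.1,
    "WSC_DENTIST_1": 7.0, "WSC_DENTIST_2": 10.0,
}


def hits(body: str) -> Set[str]:
    """Return the names of the sub-rules and metas that would fire on this text."""
    fired = {name for name, rx in SUBRULES.items() if rx.search(body)}
    # meta WSC_DENTIST_1: D && P && L
    if {"WSC_DENTIST_D", "WSC_DENTIST_P", "WSC_DENTIST_L"} <= fired:
        fired.add("WSC_DENTIST_1")
        # meta WSC_DENTIST_2: D && P && L && U
        if "WSC_DENTIST_U" in fired:
            fired.add("WSC_DENTIST_2")
    return fired


def total(body: str) -> float:
    """Sum the scores of every rule that fired."""
    return sum(SCORES[name] for name in hits(body))
```

For example, "Directory of US Dentists and Physicians in the United States" fires all four sub-rules and both metas, for 0.4 + 7 + 10 = 17.4 points, while "Run this command" fires nothing.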
Re: How can I tell if the rules are being read?
On Tue, 2009-04-28 at 14:44 -0700, Adam Harrison wrote:
> I’m seeing a lot of mail with Viagra in the subject coming through,
> even though there is the drugs rules file (20_drugs.cf) in the updates
> directory (/var/lib/spamassassin/3.002004/updates_spamassassin_org).

That doesn't necessarily suffice, regardless of that name. ;) Usually there are lots of other things to tweak: third-party rule-sets and plugins...

If you upload one or two full samples including all headers somewhere and provide a link, someone is likely to have a look and can give hints.

> Is there a simple way to see what rules files are being read?

Sure. :) Watch out for "config: read file" in the full debug output, and of course any errors, warnings, or other strange stuff.

  $ spamassassin -D --lint 2>&1 | less

Alternatively, limit it to the config only:

  $ spamassassin -D config --lint 2>&1 | less
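If the debug output is long, a small filter can pull out just the rule files that were loaded. This Python sketch assumes each relevant debug line contains the phrase "config: read file " followed by the path, as Karsten describes; the prefix SpamAssassin puts in front of it (PID, "dbg:") may vary by version, so the pattern anchors only on that phrase. `files_read` is a hypothetical helper.

```python
import re
from typing import Iterable, List

READ_FILE = re.compile(r"config: read file (\S+)")


def files_read(lines: Iterable[str]) -> List[str]:
    """Return the paths from 'config: read file <path>' debug lines, in order."""
    paths = []
    for line in lines:
        match = READ_FILE.search(line)
        if match:
            paths.append(match.group(1))
    return paths
```

Called as `files_read(sys.stdin)` from a small script at the end of `spamassassin -D config --lint 2>&1 | ...`, it prints a list you can check for 20_drugs.cf.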
RE: sa-compile command-line?
Never mind, it works. :) Just calling it without any parameters makes it default to doing The Right Thing™.

- Mark

From: Mark [mailto:ad...@asarian-host.net]
Sent: Tuesday 28 April 2009 23:24
To: users@spamassassin.apache.org
Subject: sa-compile command-line?

Ok, finally got re2c compiled. :) But now sa-compile doesn't seem to output anything. I run:

/usr/local/bin/sa-compile --config-file=/etc/mail/spamassassin --updatedir=/var/db/spamassassin/

But no rules are being generated anywhere (that I can find). A single command-line example in the sa-compile docs wouldn't have hurt. :) So, can someone give me an example of a working command-line for sa-compile?

Thanks,

- Mark
How can I tell if the rules are being read?
I'm seeing a lot of mail with Viagra in the subject coming through, even though the drugs rules file (20_drugs.cf) is in the updates directory (/var/lib/spamassassin/3.002004/updates_spamassassin_org). Is there a simple way to see what rules files are being read?

Thanks,
-Adam
Re: 'anti' AWL
On Tue, 28 Apr 2009 11:13:56 -0600 LuKreme wrote:
> On 28-Apr-2009, at 08:56, Matus UHLAR - fantomas wrote:
> > We have more servers users send mail through. Users can't choose
> > which server will they connect.
>
> That already happens now.

I think his point is that that doesn't currently cause a problem, but would with your scheme.

> The AWL has a confidence based on number of
> messages received, right? If I get messages from b...@example.com that
> come from a variety of servers, the confidence is much lower than if
> they all come from the same server, so the adjustment is lower.

I'm not aware that it has any such concept. AFAIK the AWL score is a configurable fraction of (average score - current score).

> No, if they get spam from the SAME senders on DIFFERENT servers, the
> AWL would go up even faster.

It's based on the first routable IP address, not the last hop into the trusted network, so someone using other people's wireless networks could go through a huge number of addresses even with the same outgoing SMTP server. Note also that the email address and IP address used by the AWL are both forgeable by spammers.
Re: Procmail Setup NOT Working
2009/4/28 Robert Ober :
> It was global and I want it to stay global. The old procmailrc is:
>
> DROPPRIVS=yes
>
> :0fw
> | /usr/bin/spamc

That's a global config, but you're running it per-user due to the DROPPRIVS line. fyi.

> All I want to do now is have all the identified spam(X-Spam-Status: Yes ?)
> go to a global file instead of delivered to the users. The global spam file
> will be readable by only myself and management.

Just create a file and set the permissions to be globally writable, then point procmail at it. You can set the read perms however you want.

This makes it hard for users to figure out that some of their mail is missing though, and makes it harder for them to recover it.
sa-compile command-line?
Ok, finally got re2c compiled. :) But now sa-compile doesn't seem to output anything. I run: /usr/local/bin/sa-compile --config-file=/etc/mail/spamassassin --updatedir=/var/db/spamassassin/ But no rules are being generated anywhere (that I can find). A single command-line example in the sa-compile docs wouldn't have hurt. :) So, can someone give me an example of a working command-line for sa-compile? Thanks, - Mark
Re: Procmail Setup NOT Working
On Tue, 28 Apr 2009, Robert Ober wrote:

> All I want to do now is have all the identified spam (X-Spam-Status: Yes ?)
> go to a global file instead of delivered to the users. The global spam
> file will be readable by only myself and management. Company owned
> systems, so no privacy implied nor should be expected.

Do you really want that mailbox file to be world-writable?

Alternative: create a spam user, and have procmail _forward_ spams to that user. Procmail would have to skip SA scoring and forwarding if it was running as that user, of course. Then you don't need to worry about access permissions on the spam box.

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174
 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 Ignorance doesn't make stuff not exist. -- Bucky Katt
---
 10 days until the 64th anniversary of VE day
Re: Procmail Setup NOT Working
On 4/28/09 3:00 PM, Karsten Bräckelmann wrote: On Tue, 2009-04-28 at 13:32 -0500, Robert Ober wrote: On 4/28/09 11:34 AM, Karsten Bräckelmann wrote: It was global and I want it to stay global. The old procmailrc is: DROPPRIVS=yes :0fw | /usr/bin/spamc No .procmailrc for the users. And Spamassassin is set to rewrite the subject with *Possible SPAM* All I want to do now is have all the identified spam(X-Spam-Status: Yes ?) go to a global file instead of delivered to the users. The global spam file will be readable by only myself and management. Company owned systems, so no privacy implied nor should be expected. I appreciate the responses. Thanks, Robert A. Ober PS: If not, how else?
Re: sa-compile problem
> > > I was just doing an update and compile and ran into this problem which is
> > > new, as I never had trouble before. Error is token exceeds limit, as
> > > below. Any help would be appreciated.
>
> > What's your re2c version?
>
> as below, you are correct, re2c.0.13.3
>
> > > re2c: error: line 159, column 2: Token exceeds limit ^^^
> > > command failed! at /usr/bin/sa-compile line 288, <$fh> line 6173.
> >
> > May I take a guess? re2c 0.13.3 -- if so, update to 0.13.5 or newer.
>
> many thanks for your input and quick reply..

No problem, glad it helped... Always happy to do someone else's googling. ;) It's a pretty clear picture if you search for sa-compile and the error message.

  guenther
Re: Procmail Setup NOT Working
On Tue, 2009-04-28 at 13:32 -0500, Robert Ober wrote:
> On 4/28/09 11:34 AM, Karsten Bräckelmann wrote:
>
> >> DROPPRIVS=yes
> >
> > procmail is being run on behalf of the recipient.
>
> Makes sense, any way to make sure the log is writable other than to
> put all the users in a group?

Ah, just answered the same question at the very end. ;)

> >> LOGFILE=/var/log/procmail.log
> >> VERBOSE=yes
> >> LOGABSTRACT=all
> >
> > MAILDIR is not set, so it defaults to $HOME.
>
> How does this apply for doing Spamassassin globally?

It doesn't. I mentioned it to point out where mail will be delivered to by procmail. Or rather would be, if $HOME existed...

However, there *is* a point here that matters to SA. It's not the delivering, which is important only to your IMAP server, or whatever else you plan to use to access the "spam" folders procmail delivers to. The point that matters to SA is the existence of a $HOME.

Since you told procmail to drop privs, and do the filtering on behalf of the recipient user, spamc will be invoked as that user, too -- and spamd will attempt to access per-user configs, and maybe even attempt to create it.

How exactly did you do the SA filtering before? Site-wide config and dedicated SA or mail processing user? Are these email users real system users, or virtual?

Sounds like you have been using some site-wide setup before -- and now you just switched to a per-user config. Do you really want that?

> > Does your "main offsite user" even have a $HOME? What user is this being
> > run as? Check its home...
>
> Yes, but all mail goes to /var/spool/mail. Each user has a file there
> under their name.

So? See my post again, about the setting of MAILDIR and where procmail will deliver according to your recipes. Which, BTW, does not impact the default folder, when procmail reaches the end of the recipes. It most likely will be the same as it currently is -- given you're doing *per-user* processing with procmail...

Which might not be what you want to switch to.
Humm... Site-wide SA integration with procmail using a single, site-wide quarantine folder. Anyone? :) Did you check the SA site and wiki for some hints?

> >> SPAMFOLDER=spam
> >> :0:
> >> * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> >> #/dev/null
> >> almost-certainly-spam
> >
> > This would deliver in *mbox* format into $MAILDIR/almost-certainly-spam
> >
> >> :0 w :$SPAMFOLDER/.lock

That lock file likely isn't writable either.

> >> * ^X-Spam-Status: Yes
> >> $SPAMFOLDER/.
> >
> > Here you specify *MH* format, delivering into $MAILDIR/spam/
>
> Well I just copied from an article. How do I change it for mbox?

You'd better carefully review the source you copy from. That's quite a gross misconfiguration. Oh, and also carefully check if the source actually applies to your case.

As for changing to mbox, see man procmailrc, last paragraph of the section "Recipe action line". Spoiler: mbox format will be used if you specify a regular *file*, that is, no / or /. suffix.

> >> No spam is going to the spam file in /var/spool/mail although the main
> >> offsite user did have a .lock . I even dropped the level from 8 to 5 .
> >> The main offsite user is being flooded and sees all the spam on his
> >> phone. I even rebooted the server (Fedora Linux Core 6) last night.
> >> Also, what ownership should the logfile (procmail.log) have? I did 660
> >> and tried mail.mail and it still complains in the maillog that it cannot
> >> write to the logfile.
> >
> > procmail is not being run as user mail. See DROPPRIVS in man procmailrc.
>
> Will do.
>
> > You should sort out *where* to deliver, and what *format* to use. Also
> > it seems the user procmail runs as is not allowed to write to the
> > delivery destinations -- and/or does not have a $HOME.
>
> Sendmail with mbox. As I stated, it was working just for rewriting the

Well, *how* was it working before? How did you integrate SA? (see above)

> subject. How do I set procmail to run as mail or whatever. This is
> unclear to me.
> I want this to work globally, all spam to the same file.

Hmm, never done such a stunt, but this *could* work. NOTE: I did NOT try it, use at your own risk!

In the global procmailrc file, first do the filtering through spamc/d, deliver spam to dedicated, system mbox files -- and then set DROPPRIVS for default mail spool delivery.

Again, this is untested! And I really don't like the idea of a global quarantine anyway, possibly containing sensitive and private data. Who will review the spam!?

> > You will see the failed delivery attempts and falling through to the
> > next recipe / default mailbox in the procmail logs, once they are
> > writable...
>
> Still do not understand how to do that.

Add the user to the group? Or even make it world-writable, just for debugging purposes. But without a log, you're stabbing in the dark. Procmail can't even complain to you, which it would do loudly.
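Karsten's spoiler can be written down as a tiny decision function. This Python sketch encodes the suffix rule (a trailing "/." means MH, no suffix means an mbox file) plus procmail's trailing "/" for maildir; it deliberately ignores the case where a suffix-less name is an existing directory, which procmail treats differently, so check man procmailrc for the full rules. `folder_format` is a hypothetical helper.

```python
def folder_format(destination: str) -> str:
    """Classify a procmail action-line destination by its suffix (simplified)."""
    if destination.endswith("/."):
        return "MH"
    if destination.endswith("/"):
        return "maildir"
    return "mbox"  # a plain file name


if __name__ == "__main__":
    for dest in ("almost-certainly-spam", "$SPAMFOLDER/.", "spam/"):
        print(f"{dest:22} -> {folder_format(dest)}")
```

Applied to Robert's recipes, the first destination is an mbox file and the second (`$SPAMFOLDER/.`) is an MH folder, which is the mismatch Karsten points out.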
Re: sa-compile problem
On Tue, Apr 28, 2009 at 07:44:08PM +0200 or thereabouts, Karsten Bräckelmann wrote:
> On Tue, 2009-04-28 at 11:16 -0500, Gary wrote:
> > I was just doing an update and compile and ran into this problem which is
> > new, as I never had trouble before. Error is token exceeds limit, as
> > below. Any help would be appreciated.
>
> What's your re2c version?

as below, you are correct, re2c.0.13.3

> > SA ~ # sa-update --gpgkey 6C6191E3 --channel sought.rules.yerp.org
> > --channel updates.spamassassin.org
> > SA ~ # sa-compile
> ...
> > re2c -i -b -o scanner15.c scanner15.re
> > re2c: error: line 159, column 2: Token exceeds limit
> > command failed! at /usr/bin/sa-compile line 288, <$fh> line 6173.
>
> May I take a guess? re2c 0.13.3 -- if so, update to 0.13.5 or newer.

many thanks for your input and quick reply..

--
Gary
Re: Procmail Setup NOT Working
On 4/28/09 11:34 AM, Karsten Bräckelmann wrote:
>> DROPPRIVS=yes
>
> procmail is being run on behalf of the recipient.

Makes sense, any way to make sure the log is writable other than to put all the users in a group?

>> LOGFILE=/var/log/procmail.log
>> VERBOSE=yes
>> LOGABSTRACT=all
>
> MAILDIR is not set, so it defaults to $HOME.

How does this apply for doing Spamassassin globally?

> Does your "main offsite user" even have a $HOME? What user is this being
> run as? Check its home...

Yes, but all mail goes to /var/spool/mail. Each user has a file there under their name.

>> :0fw
>> | /usr/bin/spamc
>>
>> # Mail that is very likely spam (>15) can be dropped on the floor.
>> # Move the # down one line to drop it.
>> # Note that dropping mail on the floor is a *bad*
>> # idea unless you really, really believe no false positives will
>> # have a score greater than 15.
>> SPAMFOLDER=spam
>> :0:
>> * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
>> #/dev/null
>> almost-certainly-spam
>
> This would deliver in *mbox* format into $MAILDIR/almost-certainly-spam
>
>> :0 w :$SPAMFOLDER/.lock
>> * ^X-Spam-Status: Yes
>> $SPAMFOLDER/.
>
> Here you specify *MH* format, delivering into $MAILDIR/spam/

Well I just copied from an article. How do I change it for mbox?

>> No spam is going to the spam file in /var/spool/mail although the main
>> offsite user did have a .lock . I even dropped the level from 8 to 5 .
>> The main offsite user is being flooded and sees all the spam on his
>> phone. I even rebooted the server (Fedora Linux Core 6) last night.
>> Also, what ownership should the logfile (procmail.log) have? I did 660
>> and tried mail.mail and it still complains in the maillog that it cannot
>> write to the logfile.
>
> procmail is not being run as user mail. See DROPPRIVS in man procmailrc.

Will do.

> You should sort out *where* to deliver, and what *format* to use. Also
> it seems the user procmail runs as is not allowed to write to the
> delivery destinations -- and/or does not have a $HOME.

Sendmail with mbox. As I stated, it was working just for rewriting the subject.
How do I set procmail to run as mail or whatever? This is unclear to me. I want this to work globally, all spam to the same file.

> You will see the failed delivery attempts and falling through to the
> next recipe / default mailbox in the procmail logs, once they are
> writable...

Still do not understand how to do that.

Thanks for the help,
Robert :-)
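To check which of Robert's two recipes a given message would hit, here is a rough Python model of their header conditions, using the standard `email` package. It assumes SpamAssassin has already added X-Spam-Level and X-Spam-Status, and it only mirrors the two tests in order (15 or more stars first, then a status starting with "Yes"); it delivers nothing and ignores procmail's case-insensitive matching. `which_recipe` is a hypothetical name.

```python
from email import message_from_string


def which_recipe(raw_message: str) -> str:
    """Return the destination of the first recipe whose header condition matches."""
    msg = message_from_string(raw_message)
    level = msg.get("X-Spam-Level", "")
    status = msg.get("X-Spam-Status", "")
    if level.startswith("*" * 15):   # * ^X-Spam-Level: followed by 15 escaped stars
        return "almost-certainly-spam"
    if status.startswith("Yes"):     # * ^X-Spam-Status: Yes
        return "$SPAMFOLDER/."
    return "default mailbox"


if __name__ == "__main__":
    sample = (
        "X-Spam-Status: Yes, score=7.2 required=5.0\n"
        "X-Spam-Level: *******\n"
        "Subject: test\n"
        "\n"
        "body\n"
    )
    print(which_recipe(sample))
```

The sample headers are illustrative; a 7-point spam skips the first recipe and lands in the second, so it only reaches a shared spam file if that recipe's destination and permissions are right.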
Re: sa-compile problem
On Tue, 2009-04-28 at 11:16 -0500, Gary wrote:
> I was just doing an update and compile and ran into this problem which is
> new, as I never had trouble before. Error is token exceeds limit, as
> below. Any help would be appreciated.

What's your re2c version?

> SA ~ # sa-update --gpgkey 6C6191E3 --channel sought.rules.yerp.org
> --channel updates.spamassassin.org
> SA ~ # sa-compile
...
> re2c -i -b -o scanner15.c scanner15.re
> re2c: error: line 159, column 2: Token exceeds limit
> command failed! at /usr/bin/sa-compile line 288, <$fh> line 6173.

May I take a guess? re2c 0.13.3 -- if so, update to 0.13.5 or newer.
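Karsten's guess boils down to a version comparison. A minimal Python sketch, assuming the version text contains a dotted number like the "0.13.3" quoted in this thread; `parse_version` and `re2c_too_old` are hypothetical helpers, and 0.13.5 is simply the minimum Karsten recommends.

```python
import re
from typing import Tuple

MIN_RE2C: Tuple[int, ...] = (0, 13, 5)


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number, e.g. 're2c 0.13.3' -> (0, 13, 3)."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def re2c_too_old(version_text: str) -> bool:
    """True if the reported re2c version is older than the recommended minimum."""
    return parse_version(version_text) < MIN_RE2C
```

Tuples compare element by element, so (0, 13, 3) sorts before (0, 13, 5) while (1, 0, 1) does not. The input could come from `subprocess.run(["re2c", "--version"], capture_output=True, text=True).stdout`, assuming your re2c build accepts `--version`.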
Re: 'anti' AWL
On 28-Apr-2009, at 08:56, Matus UHLAR - fantomas wrote: > We have more servers users send mail through. Users can't choose which server they will connect to. That already happens now. > It can also happen when a user switches ISP or mail provider, or the mail provider changes IP addresses, DNS names or whatever is used there. This would require much more logic than is currently in the AWL. No, it wouldn't. The AWL has a confidence based on the number of messages received, right? If I get messages from b...@example.com that come from a variety of servers, the confidence is much lower than if they all come from the same server, so the adjustment is lower. > > This would even be useful if the original AWL entry is spammish since multiple servers might be a sign of a botnet or host hopping, so applying a little spammish nudge to these messages is probably going to help out a lot, especially if spam...@fakedoamin.tld is sending mails from, say, 10 different servers; then all those AWL mismatches are going to feed each other into moving that AWL up very, very fast. > The question is if users tend to repeatedly get spam from the same sender through the same servers. No, if they get spam from the SAME senders on DIFFERENT servers, the AWL would go up even faster. On 28-Apr-2009, at 09:07, Jeff Mincy wrote: > Your idea will FP anytime anybody adds a new email device or the ISP changes (etc). That's why the adjustment would be, initially, small. f...@example.com sends me lots of mail. Say it's over 100. It's all ham and it all comes from mail.example.com. The AWL for this email couplet is, say, -2.1. An email comes in from f...@example.com but sent from spam.spammer.tld and scores 7.0. It gets an additional, say, +.42 (20% of the AWL) to score 7.42 instead. Now another mail from f...@example.com comes in from mail.spam2.tld; this one scores 4.3.
It gets +.42 for missing the match on mail.example.com, and +.288 for missing the match on spam.spammer.tld (1% of the AWL, doubled for being positive, doubled again for being over 5), for a total score of 4.3 + .288 + .42 = 5.008, pushing it over the spam threshold. Now say example.com adds a second mail server, mail2.example.com. It will start off with a 'penalty' of +0.708 (.42 + .288) for being an unknown sender. But if the message scores under 0, we don't adjust the AWL at all. If the message is over 0, yes, it will have an initial penalty, but the AWL is pretty darn good at adjusting. Now say another AWL entry is based on only 20 emails: instead of adjusting by 20% of the AWL, we adjust by only 4% (or something; the point is, the more emails the AWL is based on, the more confident it is, and that confidence should count AGAINST messages that don't match the AWL). -- When we woke up that morning we had no way of knowing that in a matter of hours we'd changed the way we were going. Where would I be now? Where would I be now if we'd never met? Would I be singing this song to someone else instead?
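The arithmetic of this proposal can be checked with a small model. This is a Python sketch of the proposed "anti-AWL" nudge, not of anything SpamAssassin implements: each known (address, server) entry that the new message fails to match adds |mean| times weight, doubled if the mean is positive and doubled again if it is over 5, and messages scoring under 0 are left alone. The thread does not give the stored average for spam.spammer.tld; a mean of 7.2 with a 1% weight is simply the value that reproduces the quoted +.288. Applying the nudge only when no entry matches the sending server is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class AwlEntry:
    """One (sender address, sending server) history, as in the proposal."""
    server: str
    mean_score: float  # historical average score for this pairing
    weight: float      # confidence fraction, e.g. 0.20 for ~100 messages


def mismatch_penalty(entry):
    """Nudge contributed by one known entry whose server did not match."""
    penalty = abs(entry.mean_score) * entry.weight
    if entry.mean_score > 0:   # "doubled for being positive"
        penalty *= 2
    if entry.mean_score > 5:   # "doubled again for being over 5"
        penalty *= 2
    return penalty


def anti_awl_score(score, entries, server):
    """Message score after the proposed nudge (hypothetical model)."""
    if score < 0:
        # Proposal: messages scoring under 0 are not adjusted at all.
        return score
    if any(e.server == server for e in entries):
        # Known pairing: the normal AWL would apply instead (not modelled).
        return score
    return score + sum(mismatch_penalty(e) for e in entries)
```

With the single mail.example.com entry (-2.1, weight 0.20), a 7.0 message from spam.spammer.tld comes out at 7.42; adding a spam.spammer.tld entry (7.2, weight 0.01) takes the 4.3 message from mail.spam2.tld to 5.008, and an unknown mail2.example.com picks up the same +0.708.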
Re: Procmail Setup NOT Working
On Tue, 2009-04-28 at 11:07 -0500, Robert Ober wrote: > filter in Outlook. Problem is that some users are setup to have their > email forwarded to their cellphone/blackberry and the spam is in that > inbox. So I found some articles and decided to have the spam go to a > file. The following is the new version of the /etc/procmailrc: > > DROPPRIVS=yes procmail is being run on behalf of the recipient. > LOGFILE=/var/log/procmail.log > VERBOSE=yes > LOGABSTRACT=all MAILDIR is not set, so it defaults to $HOME. Does your "main offsite user" even have a $HOME? What user is this being run as? Check its home... > :0fw > | /usr/bin/spamc > > > # Mail that is very likely spam (>15) can be dropped on the floor. > # Move the # down one line to drop it. > # Note that dropping mail on the floor is a *bad* > # idea unless you really, really believe no false positives will > # have a score greater than 15. > SPAMFOLDER=spam > :0: > * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\* > #/dev/null > almost-certainly-spam This would deliver in *mbox* format into $MAILDIR/almost-certainly-spam > :0 w :$SPAMFOLDER/.lock > * ^X-Spam-Status: Yes > $SPAMFOLDER/. Here you specify *MH* format, delivering into $MAILDIR/spam/ > No spam is going to the spam file in /var/spool/mail although the main > offsite user did have a .lock . I even dropped the level from 8 to 5 . > The main offsite user is being flooded and sees all the spam on his > phone. I even rebooted the server (Fedora Linux Core 6) last night. > Also, what ownership should the logfile(procmail.log) have? I did 660 > and tried mail.mail and it still complains in the maillog that it cannot > write to the logfile. procmail is not being run as user mail. See DROPPRIVS in man procmailrc. You should sort out *where* to deliver, and what *format* to use. Also it seems the user procmail runs as is not allowed to write to the delivery destinations -- and/or does not have a $HOME. 
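Karsten's distinction between *mbox* and *MH* comes down to how the destination is spelled. A minimal Python sketch of the naming rules as described in procmailrc(5): a trailing `/.` means an MH folder, a trailing `/` means maildir, an existing directory means a plain directory folder, and anything else is an mbox file; relative names resolve against MAILDIR, which defaults to $HOME. The function name and the `existing_dirs` parameter (used instead of touching the filesystem) are hypothetical.

```python
import os


def procmail_folder_format(folder, maildir, existing_dirs=()):
    """Classify a procmail destination and resolve it against MAILDIR.

    `existing_dirs` stands in for a filesystem check, so the sketch has
    no side effects.
    """
    # Relative destinations are taken relative to MAILDIR ($HOME if unset).
    path = folder if os.path.isabs(folder) else os.path.join(maildir, folder)
    if folder.endswith("/."):
        fmt = "MH"
    elif folder.endswith("/"):
        fmt = "maildir"
    elif path in existing_dirs:
        fmt = "directory"
    else:
        fmt = "mbox"
    return fmt, path
```

For a user whose home is /home/robert (a made-up path) and no MAILDIR setting, `$SPAMFOLDER/.` becomes an MH folder under /home/robert/spam/, not a file in /var/spool/mail. Writing the destination without the trailing `/.`, or as an absolute path to a file, would make it an mbox file instead; that is a reading of the man page, not a tested recipe.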
You will see the failed delivery attempts and falling through to the next recipe / default mailbox in the procmail logs, once they are writable...
sa-compile problem
Hi guys, I was just doing an update and compile and ran into this problem, which is new, as I never had trouble before. The error is "Token exceeds limit", as below. Any help would be appreciated. SA ~ # sa-update --gpgkey 6C6191E3 --channel sought.rules.yerp.org --channel updates.spamassassin.org SA ~ # sa-compile [13915] info: generic: base extraction starting. this can take a while... [13915] info: generic: extracting from rules of type body_0 100% [==] 662.83 rules/sec 00m04s DONE 100% [==] 26.10 bases/sec 02m31s DONE [13915] info: body_0: 3450 base strings extracted in 155 seconds [13915] info: generic: extracting from rules of type body_500 100% [==] 5.35 rules/sec 00m00s DONE 100% [==] 48.23 bases/sec 00m00s DONE [13915] info: body_500: 3 base strings extracted in 0 seconds cd /tmp/.spamassassin13915f5BFZ7tmp cd Mail-SpamAssassin-CompiledRegexps-body_0 Wide character in print at /usr/bin/sa-compile line 385, <$fh> line 5635. Wide character in print at /usr/bin/sa-compile line 385, <$fh> line 5684. re2c -i -b -o scanner1.c scanner1.re re2c -i -b -o scanner2.c scanner2.re re2c -i -b -o scanner3.c scanner3.re re2c -i -b -o scanner4.c scanner4.re re2c -i -b -o scanner5.c scanner5.re re2c -i -b -o scanner6.c scanner6.re re2c -i -b -o scanner7.c scanner7.re re2c -i -b -o scanner8.c scanner8.re re2c -i -b -o scanner9.c scanner9.re re2c -i -b -o scanner10.c scanner10.re re2c -i -b -o scanner11.c scanner11.re re2c -i -b -o scanner12.c scanner12.re re2c -i -b -o scanner13.c scanner13.re re2c -i -b -o scanner14.c scanner14.re re2c -i -b -o scanner15.c scanner15.re re2c: error: line 159, column 2: Token exceeds limit command failed! at /usr/bin/sa-compile line 288, <$fh> line 6173. -- Gary
Procmail Setup NOT Working
Hello Folks, I am using SpamAssassin 3.2.5 with Sendmail 8.14.1 in an installation for office and offsite users. The initial setup was to have SpamAssassin rewrite the subject so that the users could set up a filter in Outlook. The problem is that some users are set up to have their email forwarded to their cellphone/BlackBerry, and the spam ends up in that inbox. So I found some articles and decided to have the spam go to a file. The following is the new version of the /etc/procmailrc: DROPPRIVS=yes LOGFILE=/var/log/procmail.log VERBOSE=yes LOGABSTRACT=all :0fw | /usr/bin/spamc # Mail that is very likely spam (>15) can be dropped on the floor. # Move the # down one line to drop it. # Note that dropping mail on the floor is a *bad* # idea unless you really, really believe no false positives will # have a score greater than 15. SPAMFOLDER=spam :0: * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\* #/dev/null almost-certainly-spam :0 w :$SPAMFOLDER/.lock * ^X-Spam-Status: Yes $SPAMFOLDER/. No spam is going to the spam file in /var/spool/mail, although the main offsite user did have a .lock. I even dropped the level from 8 to 5. The main offsite user is being flooded and sees all the spam on his phone. I even rebooted the server (Fedora Linux Core 6) last night. Also, what ownership should the logfile (procmail.log) have? I did 660 and tried mail.mail, and it still complains in the maillog that it cannot write to the logfile. Ideas would be most welcome. Thanks, Robert A. Ober
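SpamAssassin writes one asterisk per whole score point into X-Spam-Level by default, so the first recipe's fifteen escaped asterisks match scores of 15 and above, and procmail stops at the first delivering recipe that matches. A rough Python model of that decision, assuming procmail's default case-insensitive matching; `route` and its `spamfolder` default are illustrative names, and it models only the recipe logic, not delivery, locking or DROPPRIVS.

```python
import re

# Patterns copied from the two recipes; procmail matches case-insensitively
# by default and "^" anchors at the start of any header line.
LEVEL_15 = re.compile(r"^X-Spam-Level: \*{15}", re.MULTILINE | re.IGNORECASE)
STATUS_YES = re.compile(r"^X-Spam-Status: Yes", re.MULTILINE | re.IGNORECASE)


def route(headers, spamfolder="spam/."):
    """Destination the recipes would pick, or None for default delivery."""
    if LEVEL_15.search(headers):   # first recipe: 15+ stars
        return "almost-certainly-spam"
    if STATUS_YES.search(headers):  # second recipe: flagged as spam
        return spamfolder
    return None                     # fall through to the user's mailbox
```

If `route` picks a folder but nothing arrives there, the recipe logic is fine and the problem lies in where that folder resolves to or whether procmail may write there, which is what the replies below point at.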
Re: emailBL
On Tue, 28 Apr 2009, Mike Cardwell wrote: > Alternatively, just stick the original email address in the TXT record. So in rbldnsd, you'd have a record like this: > 98f22901b17b13d910456597685c1963 :127.0.0.1:the.r...@email.address I was going to suggest that. Another thing to put in the TXT record might be a URL to evidence, e.g. (one of) the phishing emails containing that address as the contact point. > There's no advantage of sticking the email address in the TXT record rather than having a separate file, apart from keeping the data together. Ease of access? OTOH, if you're (not you, Mike) going to host this data, you'll probably have a webby interface for interactive lookups, and that might be the proper way to publish the evidence. If the email address typed into the web form hits, offer a link to view the evidence supporting the listing. I don't think there's any reason to keep the email address or the evidence (suitably sanitized of the targeted victim's contact information) confidential. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Windows Genuine Advantage (WGA) means that now you use your computer at the sufferance of Microsoft Corporation. They can kill it remotely without your consent at any time for any reason; it also shuts down in sympathy when the servers at Microsoft crash. --- 10 days until the 64th anniversary of VE day
Re: emailBL
Rob McEwen wrote: > > If you're worried about spammers gaming the hash system > Most likely, they won't care. They'll happily pursue the "low hanging fruit". The only exception is if/when freemail ISPs started using such a list to start investigating individual accounts for possible termination. But, even then, that is a good problem to have. Personally, I think the obfuscation is overkill. Instead, I'd prefer to change the "@" symbol to an underscore (and any other minor change that might be needed to work with DNS queries) and be done with it. This would also make the implementation easier, and research by ISPs easier. It would definitely require a hashing algorithm, like MD5. IIRC there is a maximum length for a hostname, and that is 255 characters. What if the hostname in your email address is 255 characters long on its own? Having access to the plain text email address would only make it easier for ISPs to do anything if they had access to the zone file, in which case you could just give them access to a separate list which has the email addresses in plain text. Alternatively, just stick the original email address in the TXT record. So in rbldnsd, you'd have a record like this: 98f22901b17b13d910456597685c1963 :127.0.0.1:the.r...@email.address Doing an A record lookup on 98f22901b17b13d910456597685c1963.example.com would return "127.0.0.1" and doing a TXT record lookup returns "the.r...@email.address". There's no advantage of sticking the email address in the TXT record rather than having a separate file, apart from keeping the data together. -- Mike Cardwell (https://secure.grepular.com/) (http://perlcv.com/)
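A minimal Python sketch of the scheme Mike describes: the query name is the hex MD5 of the address under the list's zone, and the rbldnsd line carries the plain address as TXT data. `example.com` is the placeholder zone from the message; stripping and lowercasing the address before hashing is an assumption of this sketch (lister and querier just have to agree on one normalisation), and the function names are hypothetical. Hashing also sidesteps the length problem: the label is always 32 characters, however long the address is.

```python
import hashlib


def _normalise(address):
    # Assumption: both sides strip whitespace and lowercase before hashing.
    return address.strip().lower()


def emailbl_query_name(address, zone="example.com"):
    """DNS name to look up for `address` in an MD5-keyed list."""
    digest = hashlib.md5(_normalise(address).encode("utf-8")).hexdigest()
    return digest + "." + zone


def rbldnsd_line(address):
    """Zone-data line in the layout from the message: hash, A value, TXT text."""
    addr = _normalise(address)
    digest = hashlib.md5(addr.encode("utf-8")).hexdigest()
    return digest + " :127.0.0.1:" + addr
```

A client would then ask for the A record of `emailbl_query_name(addr)` to test for a listing and for the TXT record to recover the address or, as John suggests, a pointer to the evidence.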
Re: Code Rot?
On Tue, 28 Apr 2009, Matt wrote: Steve Freegard wrote: Is it possible to get SVN access just to the sandboxes though? I'd be happy to submit rules for testing. Ditto +1 -- John Hardin
Re: 'anti' AWL
From: LuKreme Date: Tue, 28 Apr 2009 08:43:46 -0600 > OK, working on my first cup of coffee this morning, so maybe this has potential. > The way the AWL works is by keeping track of the origin of emails, both the address and the server (the top line Received header?) that sent the email. So, let's say that I have a lot of email from f...@example.com and that foo's email is sent to me via mail.example.com. > Now, I get an email claiming to be from f...@example.com but sent to me from suspiciousserver.tld, so the AWL is not applied. Your idea will FP anytime anybody adds a new email device or the ISP changes (etc). You could use the sagrey plugin to add a point to email from new email address + IP pairs. -jeff
Re: emailBL
Ben Winslow wrote: > If you're worried about spammers gaming the hash system Most likely, they won't care. They'll happily pursue the "low hanging fruit". The only exception is if/when freemail ISPs started using such a list to start investigating individual accounts for possible termination. But, even then, that is a good problem to have. Personally, I think the obfuscation is overkill. Instead, I'd prefer to change the "@" symbol to an underscore (and any other minor change that might be needed to work with DNS queries) and be done with it. This would also make the implementation easier, and research by ISPs easier. As with all DNSBLs, the really hard part is not listing legitimate items. For example, consider a guy out there who sends financial newsletters to his very own clients, uses his ISP's MTA for sending, but uses a gmail "from" address. His e-mail address might have a high chance of being mistakenly blacklisted! The last 2-3 times I saw this idea come up on either SA or Spam-L, I recall that it was strongly shot down by a number of people for this and other reasons. But I kept out of the discussion, and I actually thought this could be a great idea... if done right and if FPs are kept to a minimum. I'd been planning on starting such a list for quite some time, but it kept getting delayed by more urgent needs. -- Rob McEwen http://dnsbl.invaluement.com/ r...@invaluement.com +1 (478) 475-9032
Re: 'anti' AWL
On 28.04.09 08:43, LuKreme wrote: > OK, working on my first cup of coffee this morning, so maybe this has > potential. > > The way the AWL works is by keeping track of the origin of emails, both > the address and the server (the top line Received header?) that sent the > email. So, let's say that I have a lot of email from f...@example.com and > that foo's email is sent to me via mail.example.com. > > Now, I get an email claiming to be from f...@example.com but sent to me > from suspiciousserver.tld, so the AWL is not applied. > > But if I've gotten 50 emails from f...@example.com and all came through > mail.example.com it seems that it would be beneficial to have an 'anti' > AWL score applied to this particular email, since it claims to be > from one place, but doesn't match the AWL entry. This, naturally, would > start off a new AWL entry, but with a slightly higher score than > otherwise. We have more servers users send mail through. Users can't choose which server they will connect to. It can also happen when a user switches ISP or mail provider, or the mail provider changes IP addresses, DNS names or whatever is used there. This would require much more logic than is currently in the AWL. > This would even be useful if the original AWL entry is spammish since > multiple servers might be a sign of a botnet or host hopping, so > applying a little spammish nudge to these messages is probably going to > help out a lot, especially if spam...@fakedoamin.tld is sending mails > from, say, 10 different servers; then all those AWL mismatches are going to > feed each other into moving that AWL up very, very fast. The question is whether users tend to repeatedly get spam from the same sender through the same servers. -- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. If Barbie is so popular, why do you have to buy her friends?
'anti' AWL
OK, working on my first cup of coffee this morning, so maybe this has potential. The way the AWL works is by keeping track of the origin of emails, both the address and the server (the top line Received header?) that sent the email. So, let's say that I have a lot of email from f...@example.com and that foo's email is sent to me via mail.example.com. Now, I get an email claiming to be from f...@example.com but sent to me from suspiciousserver.tld, so the AWL is not applied. But if I've gotten 50 emails from f...@example.com and all came through mail.example.com, it seems that it would be beneficial to have an 'anti' AWL score applied to this particular email, since it claims to be from one place but doesn't match the AWL entry. This, naturally, would start off a new AWL entry, but with a slightly higher score than otherwise. This would even be useful if the original AWL entry is spammish, since multiple servers might be a sign of a botnet or host hopping, so applying a little spammish nudge to these messages is probably going to help out a lot, especially if spam...@fakedoamin.tld is sending mails from, say, 10 different servers; then all those AWL mismatches are going to feed each other into moving that AWL up very, very fast. -- The Germans wore gray, you wore blue.
Re: Stop Counting!
On 28-Apr-2009, at 08:27, John ffitch wrote: > On Tue, 28 Apr 2009, LuKreme wrote: > > I was thinking that, particularly for people who trash messages over a certain threshold and are worried about the SA overhead, a stop-counting threshold might be a good idea. So, for example, for my personal mail I could set stop_counting at 7.0: once a message hits 7.0 (with bayes), SA simply passes it along with a score of 7.0+ (to indicate it stopped processing) and is done. > As long as you do not have negative scores. Oh, right. Must stop posting before coffee. -- we all have our moments when we lose it the key is though, to conceal the evidence before the police arrive
Re: Stop Counting!
On Tue, 28 Apr 2009, LuKreme wrote: > I was thinking that, particularly for people who trash messages over a certain threshold and are worried about the SA overhead, a stop-counting threshold might be a good idea. So, for example, for my personal mail I could set stop_counting at 7.0: once a message hits 7.0 (with bayes), SA simply passes it along with a score of 7.0+ (to indicate it stopped processing) and is done. As long as you do not have negative scores. This has come up before, and I seem to remember that sorting the rules into order was considered more expensive than simply running them all by brute force without attempting this optimisation. But others may have better memories. ==John ff
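John's caveat can be made precise: stopping early is only safe once the rules still to run cannot pull the score back below the threshold, which means accounting for every remaining negative score. A minimal Python sketch of that bound, with rules as hypothetical (name, score, test) tuples; real SpamAssassin scoring also involves meta rules, Bayes, network tests and the AWL, none of which are modelled here.

```python
from typing import Callable, List, Tuple

Rule = Tuple[str, float, Callable[[str], bool]]


def score_with_early_exit(message: str, rules: List[Rule], threshold: float):
    """Return (total, hits, stopped_early) for `message`."""
    # neg_suffix[i] = sum of the negative scores among rules[i:], i.e. the
    # most the not-yet-run rules could still subtract from the total.
    neg_suffix = [0.0] * (len(rules) + 1)
    for i in range(len(rules) - 1, -1, -1):
        neg_suffix[i] = neg_suffix[i + 1] + min(rules[i][1], 0.0)

    total = 0.0
    hits: List[str] = []
    for i, (name, score, test) in enumerate(rules):
        # Even if every remaining negative rule fired, we'd stay >= threshold.
        if total + neg_suffix[i] >= threshold:
            return total, hits, True
        if test(message):
            total += score
            hits.append(name)
    return total, hits, False
```

With a large negative rule such as a whitelist entry still pending, the bound never permits an early exit. Running the positive rules first lets it trigger sooner, which is exactly where the sorting cost John remembers comes in.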
Re: emailBL
On Tue, 28 Apr 2009 02:09:02 +0100 Steve Freegard wrote: > Well in the case of an emailBL - the worst that can happen is that one > listed md5 collides with an innocent e-mail address. By adding in the > string length it reduces that possibility because both colliding > addresses would have to be exactly the same length. I believe you'll > find that ClamAV uses this method for its MD5 signatures - to get a > match it has to match the MD5 and the file size has to match. MD5 already adds the message length (in bits, as a 64-bit integer) at the very end of the input before the hash is finalized, so adding it again as an ASCII representation of bytes isn't really going to improve anything. If you're worried about spammers gaming the hash system (e.g. using a botnet to compute an address with a hash which collides with some target address), you should bite the bullet and use a longer hash (something in the SHA family, maybe?) You could make up for the extra hash length (in terms of DNS traffic) by using a more efficient encoding of the hash than hex (e.g. base64 or better), with the obvious caveat that it'd be more difficult to query. Given that most software will need new code to support an email-address-based BL, you should give operational concerns (e.g. bandwidth requirements) some serious thought while you have the chance. -- Ben Winslow
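Ben's encoding point has a concrete DNS angle: a single label may be at most 63 octets (RFC 1035), and hex SHA-256 is 64 characters, so it would not fit in one label at all. A minimal Python sketch comparing label lengths. Note that it swaps in base32 rather than the base64 Ben mentions, because base64 mixes upper and lower case and uses `+` and `/`, while DNS names compare case-insensitively; the function names are hypothetical.

```python
import base64
import hashlib

MAX_LABEL = 63  # RFC 1035 limit for one DNS label, in octets


def hex_label(data: bytes, algo: str) -> str:
    """Hash `data` and hex-encode it (two characters per byte)."""
    return hashlib.new(algo, data).hexdigest()


def base32_label(data: bytes, algo: str) -> str:
    """Hash `data` and base32-encode it: case-insensitive, DNS-safe characters."""
    raw = hashlib.new(algo, data).digest()
    # Padding '=' is not valid in a hostname label, so strip it.
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()
```

For SHA-256 that gives 64 hex characters (one too many for a label) against 52 base32 characters; for SHA-1, 40 against 32; MD5's 32 hex characters shrink to 26.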
Stop Counting!
I was thinking that, particularly for people who trash messages over a certain threshold and are worried about the SA overhead, a stop-counting threshold might be a good idea. So, for example, for my personal mail I could set stop_counting at 7.0: once a message hits 7.0 (with bayes), SA simply passes it along with a score of 7.0+ (to indicate it stopped processing) and is done. Or is this a silly idea? -- Everybody hates a tourist, especially one who thinks it's all such a laugh. Yeah, and the chip stains and grease will come out in the bath. You will never understand how it feels to live your life with no meaning or control, and with nowhere left to go. You are amazed that they exist, and they burn so bright whilst you can only wonder why.
Debugging update channels (was: sought.rules.yerp.org site down?)
On Sun, 2009-04-26 at 08:17 -0700, Bill Landry wrote: > >dig sought.rules.yerp.org > > finds no "A" record. Although yerp.org has an "A" record, the site > cannot be accessed via browser, at least not from here... Yeah, there was another downtime, obviously fixed since. However, just to clarify the debugging technique -- as I mentioned the other day in this thread, you're dig'ing up the wrong name. sought.rules.yerp.org is *NOT* supposed to have either an A or a TXT record. Only with the reversed SA version prepended does it have any record at all -- a TXT record encoding the latest channel version. $ host -t TXT 5.2.3.sought.rules.yerp.org 5.2.3.sought.rules.yerp.org descriptive text "320769313" The actual http mirror is not necessarily in the same domain, and usually *not* the channel name (your dig above). The mirrors are cached in the MIRRORED.BY file, and can be checked fresh with a DNS lookup: $ host -t TXT mirrors.sought.rules.yerp.org Bottom line: Please stop digging the channel. :) guenther
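Karsten's point is that the TXT record lives under the reversed SpamAssassin version, not under the channel name itself. A small Python sketch that builds both names discussed here, ready to hand to `host -t TXT`; the function names are hypothetical and no DNS query is made.

```python
def channel_version_name(sa_version: str, channel: str) -> str:
    """Reverse the dotted SA version and prepend it to the channel name."""
    return ".".join(reversed(sa_version.split("."))) + "." + channel


def channel_mirrors_name(channel: str) -> str:
    """Name whose TXT record lists the channel's mirrors."""
    return "mirrors." + channel
```

For SpamAssassin 3.2.5 and the sought channel, these give `5.2.3.sought.rules.yerp.org` and `mirrors.sought.rules.yerp.org`, the two names queried above.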
Re: emailBL
John Hardin wrote: > > I suppose I should ask, what do you mean by a spammer "reversing the list"? > I guess I meant that it makes it harder for the spammer if he/she gets a copy of the list to casually look for addresses to avoid without doing the extra work of encoding the address in the same way and looking it up. But with fresh eyes this morning the benefit of this is tenuous - it just means that they have to do a bit of extra work ;-) My idea for creating an emailBL was in the vain hope that if I could get it to work well enough that the actual mailbox providers hosting the dropboxes might actually use it to terminate the mailbox provided I let them see evidence for each address (I know - probably no chance of that; but I can hope). I'm also thinking of doing the same with 'full URIs' that cannot be listed by the existing URI blacklists due to the spammers abusing services specifically to avoid the existing lists so they don't burn up an actual domain name e.g. http://groups.yahoo.com/groupname/message/1 would be as easy as: s...@laptop-smf:~$ perl -MDigest::MD5 -e '$uri="http://groups.yahoo.com/groupname/message/1";; print Digest::MD5::md5_hex($uri).length($uri).".bl.org\n"' f499f872e8276a4777c3dba48481915a43.bl.org Cheers, Steve.
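For anyone not using Perl, here is a Python rendering of Steve's one-liner, assuming the same construction: hex MD5 of the URI, the URI's length in characters appended, then the `bl.org` example zone. Ben argues elsewhere in the thread that MD5 already folds the input length into its padding, so the appended length adds little beyond a different label format. The function name is hypothetical.

```python
import hashlib


def uri_bl_name(uri: str, zone: str = "bl.org") -> str:
    """md5_hex(uri) . length(uri) . "." . zone, as in the Perl one-liner."""
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    # For ASCII URIs, len() matches Perl's length() on the same string.
    return digest + str(len(uri)) + "." + zone
```

For the example URI, which is 43 characters long, the result ends in `43.bl.org`, with a 34-character first label; it should reproduce the name shown above if the URI string is byte-for-byte identical.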
Re: X-Spam-Report: not wrapped sometimes
On Tue, 2009-04-28 at 12:21 +0200, Matus UHLAR wrote: > I often receive mail where the X-Spam-Report header is longer than 80 > characters. This causes mutt to re-wrap the header, which makes it > hard to read. Since SA already wraps other headers, can we consider > this a bug, or is there a reason/option to tune it? No option to tune. Come on. ;) After a quick look at the code, I think I see what's going on, and looking at your headers confirmed my suspicion. It's probably worth filing a low-priority enhancement / minor bug. > * 1.8 HTML_NONELEMENT_30_40 BODY: 30% to 40% of HTML elements are > * non-standard > * 0.6 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image area This shows it quite nicely. The first one is exactly 80 chars long (yes, indeed), while the second one is shorter, only 76 chars. So why should we wrap the latter? I guess the problem is the leading tab. The \t is a *single* char, which leads to the wrapping problem when a tab is displayed 8 spaces wide. M::SA::PerMsgStatus::_process_header() calls M::SA::Util::wrap() with a line width of 79. Simply using 72 instead is quite nasty with respect to the first line, though. So we'd need to make wrap() smarter, at least understanding a leading tab's width... guenther
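Karsten's diagnosis in numbers: a 76-character line whose first character is a tab occupies 83 columns once the tab is shown 8 wide. A minimal Python sketch that measures display width with tabs expanded and wraps tab-indented continuation lines to a column budget rather than a character count. It is a hypothetical illustration, not Mail::SpamAssassin::Util::wrap(), and it ignores the `X-Spam-Report: ` prefix on the first line that makes simply switching to 72 awkward.

```python
import textwrap


def display_width(line: str, tabsize: int = 8) -> int:
    """Columns a line occupies on screen once tabs are expanded."""
    return len(line.expandtabs(tabsize))


def wrap_continuation(text: str, width: int = 79, tabsize: int = 8):
    """Wrap `text` into tab-indented lines that fit in `width` columns."""
    # Reserve the tab's on-screen width, not one character, for the indent.
    return ["\t" + chunk for chunk in textwrap.wrap(text, width - tabsize)]
```

Measuring in columns rather than characters is the "smarter wrap()" idea in miniature; the real fix would also have to handle the header name on the first line.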
Re: Code Rot?
On 4/28/2009 12:52 PM, Matt wrote: > Steve Freegard wrote: > > Is it possible to get SVN access just to the sandboxes though? I'd be happy to submit rules for testing. My membership of the -dev list was after the PreflightByMail announcement and I would have definitely used it had I been aware of it. > Ditto on both counts. Me too!
Re: Code Rot?
Steve Freegard wrote: Is it possible to get SVN access just to the sandboxes though? I'd be happy to submit rules for testing. My membership of the -dev list was after the PreflightByMail announcement and I would have definitely used it had I been aware of it. Ditto on both counts. matt
Re: Code Rot?
Justin Mason wrote: > On Mon, Apr 27, 2009 at 17:38, John Hardin wrote: >> On Mon, 27 Apr 2009, Justin Mason wrote: >> >>> On Mon, Apr 27, 2009 at 17:03, Yet Another Ninja wrote: >>> SARE had a nice system where you could submit a rule via email and got the masscheck results via email. Sadly all the boxes which did this are dead. >>> actually, I _did_ come up with one of those, but nobody used it :( >>> >>> http://wiki.apache.org/spamassassin/PreflightByMail >> Did you announce it to the users list? > > nope -- on the dev list. A couple of SARE folks responded saying > "cool!" though. > >>> btw, don't bother trying it now -- I turned it off again after it was >>> never used. >> Ooo. Can it be resurrected? >> >> But this is only part of the problem. How difficult is it for third parties >> to submit rules for review and inclusion in the base ruleset without >> necessarily joining the dev group? Is posting the proposed rule to bugzilla >> sufficient? > > getting the rule into the "rulesrc" area is all that's needed. it > gets auto-promoted > based on linting ok, getting good performance etc > > it's a hell of a lot easier to use SVN these days though. Would it > really be impossible > to do it that way? that's as simple as > > svn up > edit rulesrc/sandbox/jm/20_whatever.cf > svn commit rulesrc/sandbox/jm/20_whatever.cf > > and wait ;) > Is it possible to get SVN access just to the sandboxes though? I'd be happy to submit rules for testing. My membership of the -dev list was after the PreflightByMail announcement and I would have definitely used it had I been aware of it. Cheers, Steve.
X-Spam-Report: not wrapped sometimes
Hello, I often receive mail where the X-Spam-Report header is longer than 80 characters. This causes mutt to re-wrap the header, which makes it hard to read. Since SA already wraps other headers, can we consider this a bug, or is there a reason/option to tune it? Examples (from 2 different systems, both 3.2.5): X-Spam-Report: * 0.0 MISSING_MID Missing Message-Id: header * 0.0 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME * 0.0 HTML_MESSAGE BODY: HTML included in message * 1.1 MPART_ALT_DIFF BODY: HTML and text parts are different * 1.8 HTML_NONELEMENT_30_40 BODY: 30% to 40% of HTML elements are * non-standard * 0.6 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image area * 2.8 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/) * 0.2 URIBL_GREY Contains an URL listed in the URIBL greylist * [URIs: streamsend.com] X-Spam-Report: * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record * -0.0 SPF_PASS SPF: sender matches SPF record * 1.4 DATE_IN_FUTURE_96_XX Date: is 96 hours or more after Received: date * 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60% * [score: 0.4989] The descriptions of DATE_IN_FUTURE_96_XX and HTML_IMAGE_RATIO_02 are too long, while HTML_NONELEMENT_30_40, URIBL_GREY and BAYES_50 are wrapped correctly. -- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. Eagles may soar, but weasels don't get sucked into jet engines.
Re: emailBL
On Tue, Apr 28, 2009 at 10:51:33AM +0100, Matt wrote: > Henrik K wrote: >> >> If someone wants to try it on their mail feed: >> >> http://sa.hege.li/pra.cf >> >> Don't mind the size, as optimized they only take a millisecond or two to run. >> >> Of course, if it starts getting 10x the size, DNS will start looking >> attractive.. >> >> > > I have been publishing an sa-update channel for this for some time > > the details are on Julian Field's blog (he wrote a script to do what > Regexp::Assemble does) > > http://www.jules.fm/Logbook/files/anti-spear-phishing.html Ah, nice... though I'd rather see an actually optimized regexp and not 200 separate rules. :) As for my previous files, since it isn't clear to some of you: my code is just an example, and I make no claims about its use and no promise to update the rules. Use it at your own discretion. Hopefully someone will come up with a DNS-based list; it would certainly stop the need for costly SpamAssassin reloads.
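Henrik prefers one optimized regexp over 200 separate rules. Regexp::Assemble does that in Perl; as a rough Python analogue, the sketch below builds a prefix trie of the addresses and emits a single alternation that shares common prefixes. It only factors prefixes (Regexp::Assemble does more), the function name is hypothetical, and the addresses in the usage below are made up.

```python
import re


def assemble(words):
    """Build one regex matching any of `words`, sharing common prefixes."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker

    def build(node):
        # A node holding only the end marker contributes nothing further.
        if "" in node and len(node) == 1:
            return ""
        optional = "" in node  # a word ends here, so the rest is optional
        alts = [re.escape(ch) + build(child)
                for ch, child in sorted(node.items()) if ch != ""]
        if len(alts) == 1 and not optional:
            return alts[0]
        group = "(?:" + "|".join(alts) + ")"
        return group + "?" if optional else group

    return build(trie)
```

For `["phish@example.com", "phisher@example.com", "drop@example.net"]` the shared `phish` prefix is written once, and `["ab", "abc"]` collapses to `ab(?:c)?`. Whether one large pattern actually beats many small rules inside SpamAssassin is something to measure, not assume.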
Re: emailBL
Henrik K wrote: > > > > This might sound a bit picky, but using backticks to call the date command in a perl script is horrible. Try using the standard gmtime function. Eg: > > > > $date = gmtime().' (UTC)'; > > > > Rather than: > > > > $date = `date -u`; chomp($date); > > > /me too busy to man perlfunc > > > Let this thread be an inspiration for all coders out there. > > > Now back to the real world.. > > Sorry, I assumed that if you were releasing source code to the public, you'd want to make sure it was cross-platform compatible. I won't point out the various other limitations with your script then. > Are you actually serious or is this some geek humor that I don't get? I was serious. Your code is a bit shit. I was just trying to help. Never mind. > If you are serious, would you be willing to audit SpamAssassin code with such enthusiasm? It might actually _matter_. No, I'm too busy. -- Mike Cardwell (https://secure.grepular.com/) (http://perlcv.com/)
Re: emailBL
On Tue, Apr 28, 2009 at 10:31:42AM +0100, Mike Cardwell wrote: > Henrik K wrote: > >>> This might sound a bit picky, but using backticks to call the date >>> command in a perl script is horrible. Try using the standard gmtime >>> function. Eg: >>> >>> $date = gmtime().' (UTC)'; >>> >>> Rather than: >>> >>> $date = `date -u`; chomp($date); >> >> /me too busy to man perlfunc >> >> Let this thread be an inspiration for all coders out there. >> >> Now back to the real world.. > > Sorry, I assumed that if you were releasing source code to the public, > you'd want to make sure it was cross-platform compatible. I won't point > out the various other limitations with your script then. Are you actually serious or is this some geek humor that I don't get? If you are serious, would you be willing to audit SpamAssassin code with such enthusiasm? It might actually _matter_.
Re: Pyzor ?
> > > On 22.04.09 13:39, Benny Pedersen wrote:
> > > > still running here as server and client
> >
> > On 24.04.09 15:19, Matus UHLAR - fantomas wrote:
> > > client only here. searching for PYZOR string in SA logs didn't
> > > find anything
> > > for the last two days (gotta re-check).
> > > seems I will turn pyzor off too...
>
> On 24.04.09 15:51, Matus UHLAR - fantomas wrote:
> > no hit for a week, at least on my employer's machines. Got some on this one.
> > Does anyone get HITS from PYZOR?

On 24.04.09 23:29, Matus UHLAR - fantomas wrote:
> OK, thank you. I see the problem is apparently on our side; I'll look for
> it.

It seems it had something to do with the new pyzor servers. I added

pyzor_options --homedir /etc/pyzor

to local.cf, ran

pyzor --homedir /etc/pyzor discover

and it's hitting now. I even have FPs because of pyzor :) but that should be solved elsewhere :)

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Notice: I do NOT wish to receive any advertising mail at this address.
Depression is merely anger without enthusiasm.
Re: emailBL
Henrik K wrote:
> If someone wants to try it on their mail feed:
>
> http://sa.hege.li/pra.cf
>
> Don't mind the size; optimized, they only take a millisecond or two to run.
>
> Of course, if it starts getting 10x the size, DNS will start looking
> attractive..

I have been publishing an sa-update channel for this for some time.

The details are on Julian Field's blog (he wrote a script to do what Regexp::Assemble does):

http://www.jules.fm/Logbook/files/anti-spear-phishing.html

matt
Re: emailBL
Henrik K wrote:
>> This might sound a bit picky, but using backticks to call the date
>> command in a perl script is horrible. Try using the standard gmtime
>> function. Eg:
>>
>> $date = gmtime().' (UTC)';
>>
>> Rather than:
>>
>> $date = `date -u`; chomp($date);
>
> /me too busy to man perlfunc
>
> Let this thread be an inspiration for all coders out there.
>
> Now back to the real world..

Sorry, I assumed that if you were releasing source code to the public, you'd want to make sure it was cross platform compatible. I won't point out the various other limitations with your script then.

--
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)
Re: emailBL
On Tue, Apr 28, 2009 at 09:46:44AM +0100, Mike Cardwell wrote:
> Henrik K wrote:
>
>>> (note, I'm guessing at the appropriate mailing list for cross-post)
>>>
>>> Dennis Davis wrote:
>>>> http://code.google.com/p/anti-phishing-email-reply/ is also useful as it
>>>> attempts to detail the compromised accounts. Just block/quarantine email
>>>> for those accounts.
>>>
>>> Interesting ... this seems like it would be best served by DNS in a
>>> manner similar to URIBLs ... does such an "emailBL" exist?
>>
>> If someone wants to try it on their mail feed:
>>
>> http://sa.hege.li/pra.cf
>>
>> Don't mind the size; optimized, they only take a millisecond or two to run.
>>
>> Of course, if it starts getting 10x the size, DNS will start looking
>> attractive..
>
> This might sound a bit picky, but using backticks to call the date
> command in a perl script is horrible. Try using the standard gmtime
> function. Eg:
>
> $date = gmtime().' (UTC)';
>
> Rather than:
>
> $date = `date -u`; chomp($date);

/me too busy to man perlfunc

Let this thread be an inspiration for all coders out there.

Now back to the real world..
Re: emailBL
Henrik K wrote:
>> (note, I'm guessing at the appropriate mailing list for cross-post)
>>
>> Dennis Davis wrote:
>>> http://code.google.com/p/anti-phishing-email-reply/ is also useful as it
>>> attempts to detail the compromised accounts. Just block/quarantine email
>>> for those accounts.
>>
>> Interesting ... this seems like it would be best served by DNS in a
>> manner similar to URIBLs ... does such an "emailBL" exist?
>
> If someone wants to try it on their mail feed:
>
> http://sa.hege.li/pra.cf
>
> Don't mind the size; optimized, they only take a millisecond or two to run.
>
> Of course, if it starts getting 10x the size, DNS will start looking
> attractive..

This might sound a bit picky, but using backticks to call the date command in a perl script is horrible. Try using the standard gmtime function. Eg:

$date = gmtime().' (UTC)';

Rather than:

$date = `date -u`; chomp($date);

--
Mike Cardwell
(https://secure.grepular.com) (http://perlcv.com/)
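For anyone weighing Mike's suggestion, here is a minimal sketch of both approaches in core Perl, so nothing forks `date(1)`, whose output format can vary between platforms and locales. The sub names are hypothetical. `utc_stamp` is the `gmtime().' (UTC)'` idiom from the post wrapped in a sub; `iso_utc_stamp` goes one step further and formats the list-context fields as an ISO 8601-style timestamp, which is an assumption about what format a script might want, not something proposed in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the gmtime() approach from the post, wrapped in a sub.
# In scalar context gmtime() returns a ctime(3)-style UTC string such as
# "Tue Apr 28 09:46:44 2009"; no external process is started.
sub utc_stamp {
    my ($epoch) = @_;
    $epoch = time() unless defined $epoch;
    return scalar(gmtime($epoch)) . ' (UTC)';
}

# Hypothetical helper: list-context gmtime() plus sprintf gives a fixed,
# sortable ISO 8601-style UTC timestamp using only core Perl.
sub iso_utc_stamp {
    my ($epoch) = @_;
    $epoch = time() unless defined $epoch;
    my ($sec, $min, $hour, $mday, $mon, $year) = gmtime($epoch);
    return sprintf '%04d-%02d-%02dT%02d:%02d:%02dZ',
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
}

print utc_stamp(), "\n";
print iso_utc_stamp(), "\n";
```

Both subs accept an optional epoch value, which makes them easy to check against known dates.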
Re: emailBL
Dave Funk wrote:
>> Nah - I really don't like it that way; it doesn't really bring you any
>> benefit and is more likely to cause collisions if you do it that way.
>> Don't see how it can cause less DNS traffic either. At least using MD5
>> hashes your DNS query will only be 32 characters + blacklist zone name
>> regardless of the size of the input string.
>>
>> To reduce the likelihood of collisions, it's better to add the input
>> string length at the end of the MD5, like ClamAV does in its MD5 sigs, e.g.
>>
>> s...@laptop-smf:~$ perl -MDigest::MD5 -e '$email="s...@fsg.com"; print Digest::MD5::md5_hex($email).length($email).".emailbl.org\n"'
>> c18782f8d94595d5e016e3ab9ab3f8f610.emailbl.org
>>
>> This also has the benefit of making it impossible to reverse the list if
>> the spammer were to rsync the list.
>
> Silly question: given that RFC-2181 says that you can put almost anything
> you want into a DNS zone file, why go to the bother of the munging? Why not
> just put the raw, unadulterated e-mail address in there and do direct
> queries on it? EG:
>
> nslookup syst...@administrativos.com.marc.icaen.uiowa.edu.
>
> Assuming you're running reasonably up-to-date DNS stuff, it does just work.

You can also put pretty much any character you want in an email address local part. Eg, this is a valid email address...

"Personal em...@o'Reilly, Peter"@example.com

MD5 is cryptographically secure enough for this purpose. Just hashing the entire address with MD5 is the simplest and most workable solution. I expect it would be simple to use such a BL in all modern MTAs without too much hacking. Eg, in Exim, the configuration to look up such an address against an emailBL called "example.com" would be (untested):

deny dnslists = example.com/${md5:$sender_address}
     message = $sender_address is listed on $dnslist_domain

--
Mike Cardwell
(https://secure.grepular.com) (http://perlcv.com/)
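The hashed-lookup scheme discussed above can be sketched in Perl with the core Digest::MD5 module. `emailbl_query_name` and `emailbl_lookup` are hypothetical helpers, and `emailbl.example` is a placeholder zone, not a real list. The optional length suffix mirrors the one-liner quoted above; whether to append it, and how to normalize addresses before hashing, would be up to whoever runs the list.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Socket qw(inet_ntoa);

# Placeholder zone (assumption); a real emailBL would publish its own.
my $ZONE = 'emailbl.example';

# Hypothetical helper: build the DNS name to query. The MD5 hex digest is
# always 32 characters, so the query length does not depend on the address.
# The optional length suffix mirrors the ClamAV-style one-liner. No case
# folding is done; a real list would have to define its normalization.
sub emailbl_query_name {
    my ($address, $zone, $append_length) = @_;
    my $label = md5_hex($address);
    $label .= length($address) if $append_length;
    return "$label.$zone";
}

# Hypothetical helper: query via the system resolver. gethostbyname() in
# list context returns (name, aliases, addrtype, length, @packed_addrs);
# an unlisted address yields no addresses.
sub emailbl_lookup {
    my ($address, $zone) = @_;
    my (undef, undef, undef, undef, @packed) =
        gethostbyname(emailbl_query_name($address, $zone, 0));
    return map { inet_ntoa($_) } @packed;
}

my $addr = shift(@ARGV) // 'user@example.com';
print emailbl_query_name($addr, $ZONE, 0), "\n";
print emailbl_query_name($addr, $ZONE, 1), "\n";
```

Run it with an address as the argument to print both query names. `emailbl_lookup` is deliberately not called, because it needs a live zone; a listed address would come back as one or more A records, conventionally in 127.0.0.0/8 as with other DNSBLs.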
Re: Code Rot?
On Tue, Apr 28, 2009 at 02:33, RW wrote:
> On Mon, 27 Apr 2009 18:04:36 +0100
> Justin Mason wrote:
>
>> that's pretty much it. low FPs and a useful number of hits (ie. over
>> 1% iirc).
>
> Unfortunately, that doesn't necessarily mean that the rule is useful.
> It's easy to create rules that match the above criteria, but most of
> them never make a difference as they only fire on spam that's already
> caught with a high score. It's much harder to create new rules that
> really make a difference - I've found that those that do are mostly
> specific to my own mail.
>
> I'm not really convinced that a *lot* of new rules are really needed,
> particularly when you consider that the main complaint against SA is
> the number of CPU cycles it consumes.

yes. we have ways to measure and mitigate this -- once we have the rules in SVN.

--j.
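RW's point, that a rule can have low FPs and a decent hit rate yet rarely change a verdict, can be made measurable. Below is a minimal sketch. It takes per-message results for spam (total score and the score the rule under test contributed) and counts how often the rule fired and how often it was decisive, meaning the message reached the threshold only because of that rule. The record layout, the sub name, and the sample records are made up for illustration. 5.0 is SpamAssassin's default `required_score`. This is not a description of the tooling Justin refers to.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical: for spam messages only, count how often a rule fired and
# how often it was decisive, i.e. the message reached the threshold only
# because of that rule's score.
sub rule_marginal_stats {
    my ($threshold, @messages) = @_;
    my ($hits, $decisive) = (0, 0);

    for my $m (@messages) {
        # rule_score is 0 (or missing) when the rule did not fire.
        next unless $m->{rule_score};
        $hits++;

        # Caught now, but would have been missed without this rule?
        $decisive++
            if $m->{total} >= $threshold
            && $m->{total} - $m->{rule_score} < $threshold;
    }
    return ($hits, $decisive);
}

# Made-up sample records, one per spam message.
my @spam = (
    { total => 12.0, rule_score => 1.5 },  # fired, already well over 5.0
    { total =>  5.5, rule_score => 1.0 },  # fired, and made the difference
    { total => 20.0, rule_score => 0   },  # did not fire
    { total =>  3.0, rule_score => 1.0 },  # fired, but the spam still got through
);

my ($hits, $decisive) = rule_marginal_stats(5.0, @spam);
print "rule hit $hits spam messages; decisive on $decisive\n";
```

A rule with many hits but few decisive ones is the kind RW describes: it costs CPU on every message while mostly adding points to spam that would have been caught anyway.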