Re: FSL_HELO_BARE_IP_2 & RCVD_NUMERIC_HELO
>Operators of newsgroups which mirror/archive mailing lists, and allow posting from a web interface, are adding forged Received: headers before sending an email to the respective list server.

In what way are they forged? Do they contain addresses that don't match the system adding the Received line or the system it received the message from?

>In both cases the last two Received: headers in each message are forgeries as no SMTP transaction occurred.

Do those headers say that an SMTP transaction occurred? If they don't, what is forged?

I'm not sure whether you mean "last in insertion order" or "last in reading order", so I'll answer for both. :-)

Insertion order:

>Received: from list by plane.gmane.org with local (Exim 4.69)
> (envelope-from )
> id 1VVzEY-0005lJ-P1
> for debian-u...@lists.debian.org; Tue, 15 Oct 2013 09:40:02 +0200

This one says the message was received locally, without using SMTP. That is normal when a message is sent/queued by a local application.

>Received: from plane.gmane.org (plane.gmane.org [80.91.229.3])
> (using TLSv1 with cipher AES256-SHA (256/256 bits))
> (Client did not present a certificate)
> by bendel.debian.org (Postfix) with ESMTPS id 7DD8CA6
> for ; Tue, 15 Oct 2013 07:40:05 + (UTC)

This one says the message was received with ESMTP. Do you know that it wasn't?

Reading order:

>Received: from 94.79.44.98 ([94.79.44.98])
>by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
>id 1AlnuQ-0007hv-00
>for ; Sun, 13 Oct 2013 19:40:43 +0200

This one says it was received with ESMTP. Again, do you know it wasn't?

>Received: from freehck by 94.79.44.98 with local (Gmexim 0.1 (Debian))
>id 1AlnuQ-0007hv-00
>for ; Sun, 13 Oct 2013 19:40:43 +0200

This one says it was received locally, without SMTP. That is perfectly normal if it came from a local application, for example a web server running a PHP script or a gateway fetching messages from something else.
>I'm sure this violates more than one SMTP RFC, but I doubt Gmane will change the way they do this any time soon.

I don't think it does. Trace headers are useful for mail regardless of the protocol used for the transfers between systems/applications, and they are defined in the Internet Message Format RFCs (the 822 descendants; I'm not sure what the current one is, but if you start at 2822 you should be able to find it). (Also, do the SMTP RFCs really apply when you're not using SMTP?)

Regards
/jonas
Re: Very spammy messages yield BAYES_00 (-1.9)
On 2012-08-15 20:56, Ben Johnson wrote:
On 8/15/2012 2:24 PM, John Hardin wrote:

You may also want to set up some mechanism for users to submit misclassified messages for training.

That sounds like a good idea. [...] this server runs Ubuntu 10.04 with Dovecot

Since you're using Dovecot you might be able to use the antispam plugin for Dovecot. It lets you specify a special spam folder, and when users move mail into or out of that folder the messages are spooled or piped for retraining as spam or ham. This way, the user running sa-learn does not need access to the users' maildirs.

<http://wiki2.dovecot.org/Plugins/Antispam>
<http://johannes.sipsolutions.net/Projects/dovecot-antispam>

Regards
/Jonas
--
Jonas Eckerman
http://www.truls.org/
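To illustrate the retraining side, here is a minimal sketch of the kind of pipe script such a plugin can hand messages to. This is not the actual dovecot-antispam backend; the script name and arguments are made up for illustration, but the sa-learn options used (--spam, --ham, --username) are real.

```python
#!/usr/bin/env python3
"""Toy retraining pipe: reads one message on stdin and feeds it to
sa-learn as spam or ham for a given user. Hypothetical invocation:
    retrain.py spam user@example.com < message"""
import subprocess
import sys

def learn_command(direction, username):
    # direction is "spam" or "ham"; --username lets a single trusted
    # user (the one running the script) train per-user Bayes databases.
    return ["sa-learn", "--" + direction, "--username", username]

if __name__ == "__main__" and len(sys.argv) >= 3:
    direction, username = sys.argv[1], sys.argv[2]
    subprocess.run(learn_command(direction, username),
                   stdin=sys.stdin.buffer, check=True)
```

The point of the indirection through a script like this is exactly what the mail describes: only the training user needs to run sa-learn, not the mailbox owners.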
Re: which is better for virtual domains
Please keep discussions on-list.

On 2012-04-23 20:44, "Николай Г. Петров" wrote:

My mail system: OS: FreeBSD; MTA: sendmail; MDA: maildrop; database: ldap (openldap); pop/imap: courier-imap

I still have no idea how you call spamassassin or spamc, or if you use some other method to connect to spamd.

-l -c -i 127.0.0.1 -m 3 --max-conn-per-child=5 --round-robin -u vmail -x --virtual-config-dir='/corpmail/%d/.spamassassin/' -d -r ${pidfile} -s /var/log/spamd.log

, but in the log I have: spamd[7256]: spamd: using default config for root: /corpmail//.spamassassin//user_prefs Why 'root'?

Maybe because you haven't successfully told spamd what user mail address to scan the mail for, so it falls back to the default.

Why does the domain not appear?

Maybe because spamd doesn't know the domain.

My question is: previously, you said that you save an awl in mysql - what is 'awl' - auto-white-lists?

Yes, AWL is short for Auto White-List. (Which is a bad name for it.)

And may I save it in LDAP?

I don't know. I've never used SA with LDAP.

I read the manual about the LDAP database. I don't understand the attribute: spamassassin: add_header all Foo LDAP read What do they mean by this example? Is it an 'awl' or 'user_prefs' or something else?

I have no idea where in what man-page you found that, so I have no context at all.

If I understand right, I'm trying to reach a layout on my mail system like this (please criticize if something is wrong):

/corpmail/domain1/.spamassassin/bayes_seen
/corpmail/domain1/.spamassassin/bayes_toks
/corpmail/domain1/.spamassassin/auto-whitelist
/corpmail/domain1/user/Maildir/user_prefs (optional)

AFAICT you need to skip the optional one, since you can't keep multiple user dirs for one user, and in your scheme the domain is used instead of the user.

I can automate training spamassassin from an individual MDA filter for each user, but a normal message could possibly go to 'ham'.

I don't know what "a normal message could possibly go to 'ham'" means here.
Re-learning I think to configure with forwarding messages to [spam|nospam]@domain[1|2].ru for each of the domains, and having a cron script which re-learns from the spam|nospam folders on the domains. How do you think it will work? Or maybe there is some better idea?

I've done something similar myself. How well it works depends a lot on your users.

/Jonas
--
Jonas Eckerman
http://www.truls.org/
Re: which is better for virtual domains
On 2012-04-23 12:23, "Николай Г. Петров" wrote:

If there are a lot of virtual domains with many virtual users in them, which is the better variant of configuring spamassassin for spam/ham:
- an individual database for each user
- or the same database for all of the supported domains

You forgot one option:
- an individual database for each domain

The answer depends on the situation (I assume you're asking about the bayes database(s)).

If the users have separate bayes databases, will they actually train them? If the users don't train their databases, a common database could work a lot better than individual databases.

How much do the mail streams for the domains have in common? If they have a lot in common it makes sense to have a common bayes database for them. Otherwise separate databases for each domain might be better.

Regards
/Jonas
--
Jonas Eckerman
http://www.truls.org/
Re: checking and processing scores different
On 2010-04-29 14:58, Raphael Bauduin wrote:

>>> The difference is:
>>> * BAYES_95 in place of BAYES_05
>>> * score is 6.9 in place of 3.9

http://pastebin.org/192054

As you say, the mail has been processed twice, with different configurations or databases, or with the same databases but different users. Since the headers only contain full scores for one of the passes, it's impossible to know for sure where all the difference in scores came from.

As you noted, one of the passes does have a negative bayes score, while the other has no bayes score, but that's not the only possible difference. Both passes have AWL scores, but we cannot see what score the AWL applied to one of them. The AWL score may well be quite different between the two passes.

There's nothing strange in getting different total scores when running with different databases and/or configurations. Both bayes and AWL are supposed to be able to give different scores to the same mail in different mail streams. It is of course possible that there are other differences in scores as well, if the two passes were run with different local score settings.

/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: dcc: [26896] terminated: exit 241
On 2010-04-21 20:05, Michael Scheidell wrote:
On 4/21/10 2:03 PM, Stefan Hornburg (Racke) wrote:

The part of MySQL which is used in Debian (the code without the manual) is licensed under GPL.

so, the same with DCC.

Not as far as I can see. At both <http://www.rhyolite.com/dcc/> and <http://www.dcc-servers.net/dcc/> they link to another, non-generic, license.

Quote about the free license from the general info page:
---8<---
You can redistribute unchanged copies of the free source, but you may not redistribute modified, "fixed," or "improved" versions of the source or binaries.
---8<---

The actual license says:
---8<---
This agreement is not applicable to any entity which sells anti-spam solutions to others or provides an anti-spam solution as part of a security solution sold to other entities, or to a private network which employs the DCC or uses data provided by operation of the DCC but does not provide corresponding data to other users. Permission to use, copy, modify, and distribute this software without changes for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies and any distributed versions or copies are either unchanged or not called anything similar to "DCC" or "Distributed Checksum Clearinghouse".
---8<---

Which is more permissive than the info page indicates, but it's not the GNU General Public License. Debian *might* be able to distribute DCC under another name, like they did with Firefox / Iceweasel etc.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: More freemail URI spam
On 2010-04-17 23:51, Alex wrote:

Somebody on this list wrote a parser to actually parse shorteners to their obscured URLs. That would sure be great. I hadn't seen that, but would like to know more about it. Sounds like a better solution...

That'd be me. It's a plugin called URLRedirect and it's available at <http://whatever.frukt.org/spamassassin.text.shtml>

It can use Marc's DNS based URL shortener list.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: More freemail URI spam
On 2010-04-17 21:04, Alex wrote:

Maybe someone knows of a list of all the URL shorteners to be used in a combo uri/meta rule?

I very much doubt that you'll find a list of *all* the URL shorteners. New ones crop up all the time, and old ones disappear.

Marc Perkel posted about a DNS based list he's hosting a while back. I'm attaching that message to this one.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/

--- Begin Message ---
I don't know if it will be useful but I made a short URL provider list that is DNS readable. I got the list here: http://longurl.org/services

It's a host name RBL and you can read it as follows:

dig tinyurl.com.shorturl.junkemailfilter.com

Let me know if you find a use for it.
--- End Message ---
Re: on greylisting...
On 2010-04-01 19:06, Adam Katz wrote:

For what it's worth, I reconfigured my greylisting relay from a blanket delay to delaying only spamcop neighbors, anything that hits a DNSBL, and any Windows *desktop* (using p0f).

I once tried that, but had to refrain from it. The groupware system FirstClass installed on Windows NT+ machines (of different flavors, including "desktop" OSes) is (or was) popular with Swedish disability NGOs, and being an NGO for deafblind people, we need to be able to communicate with those systems.

I probably should analyze our current mail stream to see if we still get lots of mail from FC systems, and what OSes those seem to be running on nowadays. (The fact that admins of the above mentioned FirstClass systems tended to configure outgoing SMTP in "odd" ways also made me put in some country/domain-based exemptions...)

If I recall correctly, Jonas's implementation also uses p0f and could therefore benefit from my analysis.

Yes, my implementation can use p0f. It uses a list of tests that are checked in order to decide whether a sending system should be handled by the greylist or not. I'm currently using tests for OS (p0f), DNS black- and white-lists, RDNS, MX, SPF, country (GeoIP), sender domain, local spam/ham history and local outgoing history to make that decision.

p0f's results with the (perl-compatible) regular expression /Windows (?:XP|2000(?!SP4)|Vista)/ will safely block only desktops.

Interesting. I hope I'll have time to check that against our logs. It would be nice to have Windows desktops greylisted while still being able to exempt Windows mail and groupware systems.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
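For anyone wanting to sanity-check that pattern before pointing a greylister at it, here is a quick harness. The sample OS labels are made-up approximations, not strings from a real p0f run, so verify against your own p0f output before relying on it.

```python
import re

# The desktop-only pattern quoted above; PCRE syntax that works
# unchanged in Python. Note the (?!SP4) lookahead only excludes a
# label where "SP4" directly follows "2000" with nothing in between.
DESKTOP = re.compile(r"Windows (?:XP|2000(?!SP4)|Vista)")

samples = [
    "Windows XP",      # desktop: would be greylisted
    "Windows Vista",   # desktop: would be greylisted
    "Windows 2003",    # server edition: passes
    "Linux 2.6",       # not Windows: passes
]
for label in samples:
    verdict = "greylist" if DESKTOP.search(label) else "pass"
    print(label, "->", verdict)
```

Running this against a log of actual p0f OS labels, as suggested above, would show how many legitimate FirstClass-style senders the pattern would catch.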
Re: ATTN DEVELOPERS: Mega-Spam
On 2010-03-30 13:31, Kai Schaetzl wrote:
Jonas Eckerman wrote on Tue, 30 Mar 2010 00:41:01 +0200:

Unless the greylisting is done *after* receiving the body. Of course, this will spank innocent senders as well.

Ooops? It spanks *yourself*.

Not really. It does force us to accept the mail before rejecting it, but it still rejects a lot of stuff that would otherwise have been scanned by ClamAV and SpamAssassin before being rejected. So, while it does not save as much bandwidth and work as greylisting after RCPT would, it still saves compared to no greylisting.

And the filter does some more stuff. For example:

We also greylist with *one* temporary failure at connect for each host the first time the gateway sees it. This stops more than I first expected when I tried it. Once a mail from an MTA has passed the greylist test, that IP is exempt from the greylist.

We keep track of behaviour we don't like: unknown RCPTs, spam, too many retries before the greylist period (3 minutes) has passed, etc., and tempfail hosts at connect based on those counters.

We also make exceptions from the greylist based on DNS whitelists, RDNS etc. so that most mail from real outgoing MTAs passes right through it.

> Good strategy.

My filter works for us. Most spam is stopped without the gateway having to scan it with SpamAssassin. Most ham is passed through without being subjected to the greylist or being scanned by SpamAssassin. And if there are still any stupid MTAs that can't handle tempfails correctly at earlier stages trying to send mail to us, we have a good chance of receiving their mail.

When I first implemented greylisting I did the tempfailing after RCPT, but some stupid Novell MTA and a security appliance (I think it was from Symantec) saw no difference between temporary failures and permanent rejects of RCPT TO. And of course, one of them discarded the response it got from our server when bouncing the mail back to the sender.

Even worse, some other idiotic piece of crap (I forget what) reacted to temporary failures at RCPT by simply deleting the mail from its queue without notifying anyone.

So, we lost some incoming mail from organizations that for different reasons didn't just throw out or fix their junk, and I moved the greylist to after receiving the message data. Hopefully I could now move it back to RCPT, but I actually like being able to log the message-id and subject from greylisted mail, and I know it works the way it is now.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
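The core of the per-IP greylisting described above (one tempfail on first contact, exemption after a proper retry) can be sketched in a few lines. This is a toy model, not the actual MIMEDefang filter, and it omits the DNS-whitelist/RDNS exemptions and misbehaviour counters the mail mentions.

```python
import time

class Greylist:
    """Toy sketch of selective greylisting: tempfail a new IP once,
    require the retry to come after a delay, then exempt the IP."""

    def __init__(self, delay=180):          # 3-minute period, as above
        self.delay = delay
        self.first_seen = {}                # ip -> time of first attempt
        self.exempt = set()                 # ips that have passed the test

    def check(self, ip, now=None):
        now = time.time() if now is None else now
        if ip in self.exempt:
            return "accept"
        if ip not in self.first_seen:
            self.first_seen[ip] = now
            return "tempfail"               # first contact: always deferred
        if now - self.first_seen[ip] < self.delay:
            return "tempfail"               # retried too soon
        self.exempt.add(ip)                 # proper retry: exempt from now on
        return "accept"
```

A real filter would of course persist this state and, as described above, count too-early retries against the sending host rather than just deferring again.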
Re: ATTN DEVELOPERS: Mega-Spam
On 2010-03-30 01:29, Brent Kennedy wrote:

Graylisting does work. I know it works. That's why I said I like it because it stops spam. Been using my own implementation for years.

I think after I turned it on, the botnet plug-in got bored. My stats for it dropped significantly. So that's my proof it does adversely affect botnets.

No, that's your proof that it has a positive impact on your incoming mail stream. It does not prove that it has a significant negative impact on the botnets.

From what I see, botnets seem to have resources to spare. A lot of sending bots still haven't adapted to greylisting. Bots still try to send to addresses we have been rejecting for 10 years.

I suspect that if the botnets were short on bandwidth and computing power, the programmers would have fixed those issues a long time ago. And the simple fact that they still haven't adapted to greylisting indicates that its impact is not (so far) big enough to care about.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: ATTN DEVELOPERS: Mega-Spam
On 2010-03-30 00:12, John Hardin wrote:

While greylisting will help, it won't spank the offender in that manner. It will postpone the message very early in the SMTP exchange, not after the body has been received.

Unless the greylisting is done *after* receiving the body. Of course, this will spank innocent senders as well.

(My selective greylisting implementation for MIMEDefang does this, originally because some stupid MTAs didn't handle tempfails correctly at earlier stages... The "selective" stuff keeping delays and spanking of innocents down.)

BTW: While I like greylisting because it stops a lot of spam, I've never seen any data substantiating claims that it has a measurable negative impact on botnets. So I'm not convinced it really does a lot of spanking of offenders...

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: The Impossible Rule??? Bug???
On 2010-03-24 14:34, Martin Gregorie wrote:

It's named MimeMagic and is available at <http://whatever.frukt.org/spamassassin.text.shtml>

Thanks, Jonas. That looks very useful. I've replaced my old IMAGE_MISMATCH rule with an equivalent based on MimeMagic that uses:

Please make sure to evaluate the results. As stated on the web page, I still consider the plugin to be somewhat experimental, and I haven't had a lot of feedback on it.

header IMAGE_MISMATCH eval:mimemagic_mismatch_contenttype('jpg', 'gif', 'png', 'bmp', 'svg')

That will miss parts with MIME types image/jpeg or image/x-jpeg. Replacing jpg with jpe?g would be better.

It will also miss anything where those substrings are not in the declared MIME type for the part. So a JPEG image with a .gif extension and an application/octet-stream MIME type will not be caught.

It will include parts where any of those strings happens to be a substring of any other MIME type, including non-image ones. Not sure if that will ever matter, though.

A rule that should catch quite a lot of image types might be (just off the top of my head, untested):

header IMAGE_MISMATCH eval:mimemagic_mismatch_datatype('image/')

This should do a magic check on all parts, and see if any part identified (by the freedesktop database) as image/* has a mismatched MIME type or file name extension.

I don't think MimeMagic is overkill. It is probably only a matter of time before non-image files turn up with equivalent lying content types and/or extensions, and adding rules to catch them will be trivial.

That's what I thought when I wrote it. :-) At that time I wanted to catch some stuff where a RAR attachment had a ZIP MIME type (or maybe it was the other way round).

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
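The mismatch check being discussed boils down to comparing a part's leading bytes against its declared MIME type. Here is a hand-rolled illustration of the idea; the real MimeMagic plugin uses the freedesktop magic database rather than this tiny hard-coded table, so this is only a sketch of the principle.

```python
# A few well-known file signatures ("magic numbers").
MAGIC = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}

def sniff(data):
    """Return the MIME type implied by the leading bytes, if known."""
    for magic, mime in MAGIC.items():
        if data.startswith(magic):
            return mime
    return None

def mismatch(data, declared_mime):
    """True when the content is a recognized type that contradicts
    the declared MIME type (a lying Content-Type header)."""
    actual = sniff(data)
    return actual is not None and actual != declared_mime.lower()
```

So a GIF body declared as image/jpeg, or the JPEG-with-.gif-extension case mentioned above, is flagged, while unrecognized content is simply left alone.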
Re: The Impossible Rule??? Bug???
On 2010-03-23 12:14, Martin Gregorie wrote:

Is there any possibility that somebody who is more knowledgeable than I am about images and Perl can extend it to handle BMP and SVG (as a pre-emptive strike)?

Not exactly that, but I have written a non-image-specific SA plugin that can check for mismatches. It's a bit overkill if you only want to check for mismatches for images though.

It uses the freedesktop file magic database to recognize file content, and provides eval rules to check for file types and mismatches between content, MIME type and file extension. If the freedesktop database contains info for SVG and BMP (it should) it can check for those mismatches.

It's named MimeMagic and is available at <http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Finding URLs in html attachments
On 2010-03-01 15:39, John Hardin wrote:

[About ExtractText.pm] Jonas, what's the current status of that plugin? It looks pretty stable to me.

It works fine here. Don't know how it works for others. I haven't tested it with 3.3 yet.

And, can it extract from basic text attachments? I assume so...

It doesn't have any predefined extractor for that, but yes it can.

extracttext_external text {CS:UTF-8} /bin/cat -
extracttext_use text .txt .htm .html text/(?:plain|html)

That ought to work for text/plain. It should be easy to write a minimal plugin to extract text/plain, though, and avoid the external call.

For text/html we need to strip out the HTML as well. A plugin for that should also be easy to write. It should probably use SA's existing HTML renderer.

The plugin currently always calls set_rendered with no type parameter, which means it always specifies text/plain. It probably should be able to add text/html as html, so that an HTML-stripping extractor plugin would be redundant. I'll look into this. Can't be sure when I'll have the time, though.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
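The "strip out the HTML" step mentioned above is simple enough to sketch. This is not how ExtractText or SA's own HTML renderer does it (the plugin is Perl and SA has its own renderer); it just shows, with a stdlib parser, the kind of markup-stripping an HTML text extractor performs.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML part, discarding markup."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called only for character data, never for tags or attributes.
        self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)

def strip_html(html):
    parser = TextExtractor()
    parser.feed(html)
    return parser.text()
```

A production extractor would also decode entities and character sets and skip script/style contents, which is why reusing an existing renderer, as suggested above, is the sensible route.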
Re: new (small) shortener campaign & suggestion for URLRedirect
> links, and have that all ready for when we send out HTTP requests.

I don't get this either. How would the UDP requests help them find bad links? How would they help them distinguish between a spamvertized URL and one referenced in a legitimate message to a high-traffic mailing list or newsgroup and then quoted in replies for a month or so?

They need to have all working redirects ready at all times anyway for all regular browsers, and the non-working redirects should return error codes at any time. So I'm not sure what it is you mean they should have ready for our HEAD requests.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: MTX public blacklist implemented Re: MTX plugin functionally complete?
On 2010-02-15 15:04, Charles Gregory wrote:
On Sun, 14 Feb 2010, Jonas Eckerman wrote:

1: The participation record is optional, so you only use it if you want "everything else" to be rejected.

This is why I would support mtamark... It permits the sysadmin to determine the default behaviour for his IP range, rather than defining a dangerous default in the client.

In what way does the above define a dangerous default? The default in the statement above is to consider a domain as *not* participating unless otherwise stated by whoever manages the DNS for the domain. If the domain does not participate it should not be punished when an MTX record isn't found.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage
On 2010-02-14 19:20, dar...@chaosreigns.com wrote:
On 02/14, Jonas Eckerman wrote:

* I think there should be a way to tell the world whether you are using the scheme for a domain (not host) or not. This could easily be done in DNS.

I need to think about this more, thanks for the suggestion. (More on registrar boundaries below.)

* I think you should follow conventions in DNS naming, using an underscore to signify that the DNS record is a "special" type of record. This is quite common.

That's probably a good idea, hmm.

You could use SpamAssassin's registrar boundaries stuff for getting the domain in a SA plugin, and score higher for a missing MTX host record if there is an MTX domain record.

How good is SA's registrar boundaries stuff?

Not sure, but it's used in various places if you use SA, so if it isn't good that will have effects on SA anyway.

I don't think "Use SpamAssassin's registrar boundaries" would be good in an RFC.

I only meant that SA's Mail::SpamAssassin::Util::RegistrarBoundaries could be used for this in an SA plugin. In the RFC I'd suggest it be specified that domain policies should be checked based on domain registry boundaries (but with better wording than mine).

I don't even know where the record should be for wildlife.state.nh.us. www.state.nh.us exists, which would indicate mtx.state.nh.us.

Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain returns "wildlife.state.nh.us" for "wildlife.state.nh.us" (and for "whatever.wildlife.state.nh.us"), suggesting that a policy record should be "policy._mtx.wildlife.state.nh.us" or similar. Whether that makes sense or not, I don't know. It does trim for example "mail.microsoft.us" to "microsoft.us", so I guess there's a special reason for it to trim the "state.nh.us" subdomains to more than two levels.

Even if SA's registrar boundaries pointed to mtx.wildlife.state.nh.us, you'd still need to be able to delegate to another subdomain.

Yes, you'd need that. As I see it, there are two simple ways to do this.

* Make it possible to indicate policy delegation in the policy record. I see you thought about this one already. :-)

* Or, make an MTX checker traverse domains from the one it checks towards the registry boundary when checking for policy. This means more DNS lookups but might be easier to administrate. (I have a vague recollection that DKIM or ADSP works this way... Not sure, though.)

Or maybe participant._mtx.frukt.org. Giving an A record to the _mtx subdomain itself seems potentially problematic,

Agreed. And seeing as a hostname should not contain an underscore, that wasn't a very good idea of mine.

Any suggestions other than "participant"?

"policy" seems better than "participant" to me.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
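The second option (traversing towards the registry boundary) can be made concrete. This sketch assumes the "policy._mtx." naming floated in this thread (it is a proposal here, not any published standard), and takes the registrar boundary as an input, standing in for what trim_domain would compute.

```python
def policy_candidates(hostname, boundary):
    """Names to try when looking for a policy record, walking from
    the host's immediate domain up to (and including) the registrar
    boundary, most specific first. The "policy._mtx." prefix is the
    label suggested in this thread, not an established convention."""
    labels = hostname.lower().rstrip(".").split(".")
    blabels = boundary.lower().rstrip(".").split(".")
    names = []
    # Drop the host label itself, keep every parent down to the boundary.
    for i in range(1, len(labels) - len(blabels) + 1):
        names.append("policy._mtx." + ".".join(labels[i:]))
    if not names:  # hostname is itself the boundary domain
        names.append("policy._mtx." + ".".join(blabels))
    return names
```

For "mail.wildlife.state.nh.us" with boundary "wildlife.state.nh.us" this yields just "policy._mtx.wildlife.state.nh.us"; deeper hosts generate one extra DNS lookup per intermediate subdomain, which is the cost the mail mentions.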
HELO SPF + FCDNS (was: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage)
On 2010-02-14 19:20, dar...@chaosreigns.com wrote:
On 02/14, Jonas Eckerman wrote:

The SPF record above says that a host using "panic.chaosreigns.com" in HELO should not be allowed to send mail unless it has the IP address 64.71.152.40, regardless of the domain in the envelope from, From: header, etc..

You're right, I missed that, thank you. The complication, of course, is where a spammer owns the (forgeable) HELO domain but not the IP (PTR). Full circle DNS handles that. Has the combination been implemented?

I've no idea whether any software actually checks the combination of HELO SPF and FCDNS. It does seem a logical thing to do in software like SpamAssassin or MIMEDefang. Maybe I should implement it in my MIMEDefang filter just to log the results and see if it'd be a good idea to reject on it...

Possibly a lack of separate SPF records for HELO and MAIL FROM if they are the same.

Agreed. I think they should have separated those records. But then I also think they should have created an _spf subdomain from the start instead of using the TXT record for the domain without any special qualifier...

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
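The FCDNS half of that combination is mechanical: the connecting IP's PTR must name the HELO host, and that name must resolve back to the same IP. A sketch, with the two DNS lookups injected as callables so the logic can be shown (and tested) without a network; the host and IP in the usage are the ones from this thread.

```python
def fcdns_ok(ip, helo_name, ptr_lookup, a_lookup):
    """Full-circle DNS check: reverse lookup of the IP must yield the
    HELO name, and a forward lookup of that name must include the IP.
    ptr_lookup(ip) and a_lookup(name) return lists of names/IPs."""
    names = ptr_lookup(ip)
    if helo_name.lower() not in (n.lower() for n in names):
        return False
    return ip in a_lookup(helo_name)  # forward-confirm the PTR

# Usage with canned data (real code would call the resolver):
ptr = lambda ip: {"64.71.152.40": ["panic.chaosreigns.com"]}.get(ip, [])
fwd = lambda name: {"panic.chaosreigns.com": ["64.71.152.40"]}.get(name, [])
```

Passing FCDNS plus a HELO SPF "pass" is then the combined test the mail speculates about logging before rejecting on it.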
Re: _mtx Re: MTX plugin functionally complete?
On 2010-02-15 02:06, dar...@chaosreigns.com wrote:

Thank you for contacting us. An underscore is only legal for specific types of DNS records, such as 'SRV'. 'A' records should only contain letters, numbers and dashes. You may want to consider using '-' as a substitute. I hope this helps. Please don't hesitate to contact us should you have any further questions or concerns.

I'm finding *nothing* else that uses underscores in the names of A records. I'm thinking I should stick with "mtx" instead of "_mtx". Please let me know if there is some evidence I'm missing that it's reasonable to use an underscore in this context.

The point of using an underscore in "special" records is that the "host" is *not* a normal hostname.

DKIM (including ADSP) uses _domainkey.domain.example:
http://dkim.org/specs/rfc4871-dkimbase.html#rfc.section.7.4
http://www.rfc-editor.org/rfc/rfc5617.txt

According to the DKIM and OpenSPF folks (and, less importantly, Wikipedia), underscore is forbidden in hostnames only:
http://domainkeys.sourceforge.net/underscore.html
http://www.openspf.org/DNS/Underscore
http://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names

I could use TXT records. I kind of like the A records. Well established for DNS BLs and WLs and all.

TXT records might be, in principle, the "correct" way to do this, but A records are more efficient, and some caching-only DNS proxies might be set up to cache A record lookups (negative and positive) better than TXT records. If there is to be a policy record, maybe that should be a TXT record, but I too like the A record for the actual MTX lookup.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: MTX public blacklist implemented Re: MTX plugin functionally complete?
On 2010-02-14 20:06, dar...@chaosreigns.com wrote:

I remembered why (else) I didn't want to do that. It effectively says "Everything else should be rejected." Which will discourage some people from using it. So you would at least need to provide a way to say "Yes, I'm participating, but anything without an MTX record is valid too."

The first two solutions for this that pop into my head:

1: The participation record is optional, so you only use it if you want "everything else" to be rejected.

2: Make it a policy record rather than a participation record, so you can specify more stuff. Either a TXT record or a bitmapped A record, for example. Call it "_policy._mtx.*".

More later on the other very valid concerns about this...

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage
On 2010-02-13 21:48, dar...@chaosreigns.com wrote:

Looks like it ties the helo domain to the delivering IP, breaking (broken) forwarding just like SPF?

Tying the HELO domain to an IP does not break forwarding. The host name (including domain) used in HELO is independent from the domain used in MAIL FROM. (It's not that use of SPF that breaks (broken) forwarding; it's the limits connected to the domain used in MAIL FROM.)

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage
On 2010-02-13 04:24, dar...@chaosreigns.com wrote:

Still http://www.chaosreigns.com/mtx/

I still have the following comments (which you didn't answer previously):

* I think there should be a way to tell the world whether you are using the scheme for a domain (not host) or not. This could easily be done in DNS.

* I think you should follow conventions in DNS naming, using an underscore to signify that the DNS record is a "special" type of record. This is quite common.

You could use SpamAssassin's registrar boundaries stuff for getting the domain in a SA plugin, and score higher for a missing MTX host record if there is an MTX domain record.

An example (off the top of my head) could be:

To say that "marmaduke.frukt.org" [195.67.112.219] is allowed to send mail:

219.112.67.195._mtx.marmaduke.frukt.org. IN A 127.0.0.1

To say that we're using your scheme for all hosts under "frukt.org":

_mtx.frukt.org. IN A 127.0.0.1

If anyone connects from a host where reverse lookup or HELO puts it in the "frukt.org" domain, you know that you should reject or score high unless it has FCDNS and a matching MTX record.

(And of course, if this catches on, you'll have to provide RFC style documentation.)

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
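A checker for the host record proposed above would build its query name by reversing the IP's octets and inserting the "_mtx" label. This follows the underscore variant suggested in this mail, not the scheme as published at chaosreigns.com.

```python
def mtx_query_name(ip, hostname):
    """Build the MTX lookup name from the example above: reversed
    IPv4 octets, then the "_mtx" label, then the sending host's name.
    Uses the "_mtx" spelling proposed in this thread."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + "._mtx." + hostname.rstrip(".")
```

mtx_query_name("195.67.112.219", "marmaduke.frukt.org") reproduces the owner name of the example A record above; an A answer (e.g. 127.0.0.1) at that name would mean the host is allowed to send mail.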
Re: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage
On 2010-02-13 04:24, dar...@chaosreigns.com wrote:
>> panic.chaosreigns.com. IN SPF "v=spf1 a:64.71.152.40 -all"
> No. MTX defines 64.71.152.40 as a legitimate transmitting mail server,
> regardless of the domain in the envelope from, From: header, etc..

Popular misconception, it seems. The SPF record above says that a host using "panic.chaosreigns.com" in HELO should not be allowed to send mail unless it has the IP address 64.71.152.40, regardless of the domain in the envelope from, From: header, etc.

That's not exactly the same as your MTX scheme, but it has similar results when combined with an FCDNS check on HELO (provided your scheme is universally adopted).

If you're serious about your proposal, you should explain (in your documentation) in what important way it differs from SPF as used against HELO and other similar schemes, and why it is better.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Spam filtering similar to SPF, less breakage
On 2010-02-09 22:31, dar...@chaosreigns.com wrote:
> [Ideas for a new scheme similar to a subset of SPF.]

I don't think the SpamAssassin users list is the right place to discuss a new general scheme like this, but here goes anyway. Please note that the comments below are just a first reaction; I haven't really thought this through.

A general thought is: what does your current scheme give that HELO SPF + FCDNS doesn't? (SPF can be used with HELO as well as MAIL FROM.)

> What format should this arbitrary A record be?

I suggest you use a leading underscore for your magic subdomain (2.0.0.10._mtx.smallbusiness.com). I suggest this because I think your scheme needs one more thing to be of any use at all: a way for the domain owner to specify that they are using it. This could be done by creating a record for "_mtx.smallbusiness.com". Without a way to indicate whether the scheme is used or not, it'll be unusable for blocking until *all* major email providers, as well as almost everyone else, are using it.

Using an underscore makes it less likely to collide with existing host names. It also makes it more apparent that it's not a regular hostname. A new record type might be even better.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Faked _From_ field using our domain - how to filter/score?
> 1. It shows up as internal mail so gets -6 points or so from the
> auto-whitelist thus giving it a decent chance of getting through.

If it shows up as internal mail even though it's external, something is wrong. The AWL takes both the sender's email address and the sending system's IP address into account. For some reason it seems it can't differentiate between the relevant sending systems in your setup.

Regards
/Jonas
Re: Cooperative data gathering project.
Per Jessen wrote:
> DNS lookups are usually tried with UDP first,

Sure, DNS usually uses UDP, but the DNS resolver also waits for an answer, which is simply a waste of time when the sender doesn't need the answer. Add to this that resolving one address may result in multiple queries, and that a DNS answer often contains more than the queried info, and you get more overhead.

> but I agree, just use UDP.

Absolutely. IMO, the approach suggested by Marc is a text-book example of when to use UDP. (And if more security is needed, the easiest way would be to simply limit access to approved IP addresses.)

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Cooperative data gathering project.
Jason Haar wrote:
> Then the third field is NONE.

That's how I do it. But the idea is that any kind of data can be collectively gathered and distributed.

> Instead of a TCP channel (which means software), what about using DNS?
> If the SA clients did RBL lookups that contained the details as part
> of the query,

With any sane SpamAssassin setup for multiple users this wouldn't work. Any SA install except for very small mail flows should use a caching DNS server/proxy, preferably one that caches negative results. It's also a good idea if the caching server used for DNSL checks enforces a minimum TTL.

This results in repeated queries not making it to the origin servers, even if the origin server uses ridiculously low TTLs. The distributed caching nature of DNS is a reason why DNSLs are so efficient, but also one reason why DNS isn't suitable for everything.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Cooperative data gathering project.
Marc Perkel wrote:
> spam 1.2.3.4 example.com
> ham 5.6.7.8 example2.com
> Sending these one line TCP messages is fairly easy.

Why use TCP for this? Establishing a connection for simple short messages where a return code is not required introduces pointless overhead. It'd be much simpler using UDP instead.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
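To illustrate: sending one of those one-line reports as a UDP datagram is a single sendto() with no handshake and no reply to wait for. A minimal sketch (the function name and report format follow Marc's example; nothing here is an agreed protocol):

```python
import socket

# One "verdict ip domain" report per UDP datagram: no connection
# setup, no reply expected. A loopback receiver is used below only
# to demonstrate the round trip.

def send_report(sock, addr, verdict, ip, domain):
    line = "%s %s %s" % (verdict, ip, domain)
    sock.sendto(line.encode("ascii"), addr)

# Receiver bound to an ephemeral port on localhost.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))

send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_report(send_sock, recv_sock.getsockname(), "spam", "1.2.3.4", "example.com")

received = recv_sock.recv(512).decode("ascii")
send_sock.close()
recv_sock.close()
```

The sender never blocks on the collector being up, which is exactly the property you want for fire-and-forget reporting.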
Re: URLRedirect.pm & Short URL Providers RBL List
Jonas Eckerman wrote:
> At the time I mentioned that I planned to add support for that list in
> my URLRedirect plugin. That support is there, and it seems to be
> working.

Of course I forgot to include where the module can be found... It's available at <http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
URLRedirect.pm & Short URL Providers RBL List
Hi!

In November Marc Perkel announced a trial of a DNS based list of short URL providers. At the time I mentioned that I planned to add support for that list in my URLRedirect plugin. That support is there, and it seems to be working.

This module follows URLs (in parallel, using HEAD requests) matching specifications or found in a DNSL, and adds the location of redirections to the message metadata (so that the "real" sites are checked by URIBLs and other rules). The addition of Marc's DNSL might make the plugin a lot better, since I did not have an updated list of URL shorteners for it.

The module should be seen as a proof of concept. If spammers abuse URL redirectors, this module and Marc's DNSL could help, but I have not collected any stats to see how helpful it is in practice, and I don't know whether Marc's list contains the URL shortener services most abused by spammers.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
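The core idea — follow a chain of HTTP redirects, with a recursion cap, and hand the final targets to other rules — can be sketched in a few lines. This is an illustration of the technique, not the plugin's actual API; the fetch function is injected so the sketch touches no network, and all names and URLs are made up:

```python
# Resolve a chain of HTTP redirects using only HEAD-style
# (status, Location) answers, stopping at a hop limit.

def resolve_redirects(url, fetch_head, max_hops=5):
    """Return the list of URLs visited, starting with url itself."""
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch_head(chain[-1])
        if status not in (301, 302, 303, 307, 308) or not location:
            break
        chain.append(location)
    return chain

# Fake shortener responses for demonstration:
# t.example/x -> t.example/y -> real-site.example/
table = {
    "http://t.example/x": (301, "http://t.example/y"),
    "http://t.example/y": (302, "http://real-site.example/"),
    "http://real-site.example/": (200, None),
}
chain = resolve_redirects("http://t.example/x", lambda u: table[u])
```

Everything in `chain` after the first entry would then be added to the message metadata so URIBL-style rules see the real destination, and the chain length itself can feed a recursion-based score.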
Re: Suggestion for use by ANY whitelist service....
Assuming "they" below refers to Habeas. Please ignore this mail if it refers to Return Path.

Ted Mittelstaedt wrote:
> They have had the option to do this already for years, now, and have
> elected to use implied threats to the world's ISP's, rather than
> regularly participating on this list.

To my knowledge Return Path hasn't owned Habeas for "years" yet (I think they bought it a little more than a year ago or so). If your view of Return Path is the same as your view of Habeas your statement makes sense, but otherwise I think you ought to let your view of Return Path color your opinions of Habeas.

This might still be a good time (though a little late) to get Habeas' current owners to make the necessary changes to the Habeas part of their company for the Habeas brand to get a somewhat better reputation among anti-spam folk. After all, the reputation of Habeas can now tarnish the reputation of their main brand as well.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Need help running SA in a (comparative) anti-spam test
Martijn Grooten wrote:
> - I'm happy to add any extensions as long as these are also free and
> open source -- note that our 'target audience' includes big ISPs and
> unfortunately for them things as Spamhaus's RBL aren't free;

This doesn't make any sense. You are comparing SA to commercial products that aren't free, and which may use their providers' own blacklists or even include a volume license for third party lists, and yet you won't allow SA to use lists that aren't completely free?

I'd assume that a big ISP using SA (and wanting the best from an SA install) would pay to use the better DNSBLs.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Short URL Providers RBL List
Mike Cardwell wrote:
>> I don't know if it will be useful but I made a short URL provider
>> list that is DNS readable.
> Been done. See http://rhs.mailpolice.com/#rhsredir

They don't seem to be the same thing. Quote from the page you linked to:

---8<---
This includes any website which provides an open mechanism to redirect a web browser to another website, ie, by adding a url=http://anotherwebsite in the URL.
---8<---

I would not consider an open redirector of that type to be the same thing as a URL shortener service (especially not a well run service).

/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Short URL Providers RBL List
RW wrote:
> On Thu, 05 Nov 2009 20:05:25 +0100

Thanks. That could be usable in my URLRedirect plugin. A current list of URL redirectors is the main thing missing from that plugin. It would be even better if it included info about whether a URL shortener uses HTTP redirects (which is what my plugin checks).

> One other thing is that sometimes the links have already been
> cancelled for abuse, and the redirection goes to a page saying that.
> Such pages aren't going to be in any URIBL list, but obviously they
> are very strong spam indicators. Ideally there would be a regex to
> match those links on each redirection service.

Since my plugin adds redirect targets to the message metadata, that check could be done with a normal URL rule. (It might be useful if the plugin flags redirections that redirect to the same domain.)

/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Short URL Providers RBL List
John Rudd wrote:
> The point is: the URL shortening service isn't the interesting part of
> the equation. The expanded URL is.

If the service uses HTTP redirects it can be checked pretty cheaply, which is what my URLRedirect plugin does. It adds the redirected-to URL to a message's metadata so that other checks (URIBL for example) see it.

Using a DNS based, well maintained list of redirectors could make that plugin much more useful than it is currently.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Short URL Providers RBL List
Marc Perkel wrote:
> I don't know if it will be useful but I made a short URL provider list
> that is DNS readable.

Thanks. That could be usable in my URLRedirect plugin. A current list of URL redirectors is the main thing missing from that plugin. It would be even better if it included info about whether a URL shortener uses HTTP redirects (which is what my plugin checks). My plugin won't check for frames, meta refresh or other "redirection" variants that require content to be fetched.

> Let me know if you find a use for it.

I've got a use for it. Now I just need to implement it as well. I'll post here when/if I implement it in URLRedirect.pm.

For more info on URLRedirect.pm check http://whatever.frukt.org/spamassassin.text.shtml

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: word file spam
Matus UHLAR - fantomas wrote:
> Yes, but generic plugin should be able extract images for later
> processing (FuzzyOCR or maybe even things like Bayes) too ;)

That would depend on what you mean by "generic". :-)

It's a generic text extractor plugin, with the ability to call an OCR program for getting text from images. Which is what I wanted, and is what John mentioned in his post. It's not a generic attachment parser and object extractor (though it might become one).

I do want it to be able to add stuff rendered to HTML, but Mail::SpamAssassin::Message::Node doesn't (currently) have a set_rendered variant for doing that, and I haven't had the time to work on Mail::SpamAssassin::Message::Node.

I'm not sure exactly what would be the correct way to add parts (such as extracted images) to the message. I have thought about it, and the plugin architecture does support this. I just haven't had the time to find out how to do it.

I don't know what you mean by "even things like Bayes". The plugin does make the extracted text available to bayes (this is what I made it for), and it can call OCR programs. Making extracted images available for FuzzyOCR is (as mentioned above) something I want to do. Since I don't do any OCR at all here, that's a pretty low priority though (unless people start asking for it more).

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: word file spam
John Hardin wrote:
> There were mutterings about a generic plugin that would take an
> attachment, process it somehow (e.g. wvHtml, antiword, ps2ascii, or
> whatever was appropriate), and insert the results into the body text
> to be scanned by the regular rules.

That sounds very much like my ExtractText plugin. It can use command line tools or perl plugins to extract text from attachments. There were a bit more than mutterings about it here. :-)

> I don't think anything has come of that yet.

The plugin works, and we are using it in our mail gateway. It's listed on the Custom Plugins wiki page, and is available at <http://whatever.frukt.org/spamassassin.text.shtml>. It comes with a config for extracting text from Word, OpenXML, RTF, ODF and PDF files.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: word file spam
McDonald, Dan wrote:
> The word doc has a pretty standard 419 body in it. I recall some
> mutterings on this list about using wvHtml to regularize word docs.

My ExtractText plugin can use a command line tool to extract text from Word documents and add the text to the message so it is available to bayes and rules. It comes with a config that uses antiword to do this.

It's available at http://whatever.frukt.org/spamassassin.text.shtml#ExtractText.pm

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Pyzor or DCC
Michael Hutchinson wrote:
>> I saw a test message with just the word test in the subject hit DCC
>> once.
> That's really strange, I don't see how DCC would fire on the subject..
> the checksum of the message must have somehow matched some Spam..

That's perfectly normal. DCC doesn't just match spam, it matches things that have been seen before. That means it matches bulk, but also anything that happens to be very common for other reasons.

I imagine that an empty message with the subject "test" is pretty common, so it's perfectly reasonable for DCC to have seen such messages many times before.

I don't know if DCC cares about the subject at all. If it doesn't, it's even more likely that it would hit on an empty test message.

/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
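A toy illustration of why this happens (this is emphatically not DCC's actual fuzzy-checksum algorithm, just a simplified stand-in): if the checksum covers only a normalized message body, two unrelated senders' empty "test" messages produce the same checksum, and the count for that checksum climbs toward "seen in bulk":

```python
import hashlib

# Simplified body-only checksum: lowercase and collapse whitespace,
# then hash. Anything that ignores the subject and headers will
# collide for all empty "test" bodies, whoever sent them.

def body_checksum(body):
    normalized = " ".join(body.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

a = body_checksum("test\n")
b = body_checksum("  Test ")
# a == b: both messages count as repeats of the same content.
```

DCC's real checksums are fuzzier than this, but the principle is the same: identical (or near-identical) content accumulates a count, regardless of whether it is spam.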
Re: Display Bayes tokens?
Peter Sabaini wrote:
> I'd like to verify the tokens Bayes uses to classify; [...] Is this
> encoded in some way?

Yes.

If you use SQL for bayes you can use my CollectTokens plugin to collect new tokens indexed by the encoded value used by the bayes system. That way you can look up tokens and see what they were. Of course, you'll only be able to look up tokens that were learned after you started using the plugin.

I've only tested the plugin with MySQL, but it shouldn't be hard to modify it to use another SQL system.

The plugin is available at <http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
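As far as I can tell (hedged: check your SA version's Bayes store code), the encoding is a truncated SHA1 digest of the token text — the SQL schema keeps only a few bytes of it — which is why the stored values are one-way and a side table is needed to map them back to readable text:

```python
import hashlib

# Assumed encoding: first 5 bytes of the SHA1 digest of the token
# string. The 5-byte truncation matches the SQL schema's token column
# as I remember it; verify against your SA version before relying on it.

def encoded_token(token):
    return hashlib.sha1(token.encode("utf-8")).digest()[:5]

enc = encoded_token("viagra")
# enc is 5 opaque bytes; the original word cannot be recovered from it.
```

Because the mapping is one-way, something like CollectTokens has to record (encoded, plaintext) pairs at learn time if you ever want to read the database.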
Re: Plugin extracting text from docs
Matus UHLAR - fantomas wrote:
> I've been thinking about it. The pdftohtml could provide interesting
> information, like colour information, that could lead to better spam
> detection. Any experiences with this?

I've been thinking a bit more about this. My current plan is to download the trunk version of SA from SVN to a development system and add a decent way for plugins to ask SA to render the "extracted" HTML into visible, invisible, meta, etc. Once done and somewhat tested I'll see what the devs think about my patch. It shouldn't be hard at all, it's a small change to Mail::SpamAssassin::Message::Node, but I never seem to have as much time as I need for even half of my work and projects... :-/

If the patch is accepted, my ExtractText plugin will use the opened up functionality if it's there. If it's not, any extracted HTML will be added using set_rendered as it does now.

/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Plugin extracting text from docs
Matus UHLAR - fantomas wrote:
>> Ah. I didn't see that option. That's nice. I'm now using pdftotext
>> instead of pdftohtml here as well. :-)
> I've been thinking about it. The pdftohtml could provide interesting
> information, like colour information, that could lead to better spam
> detection. Any experiences with this?

You're right. It could be useful to extract to HTML when possible, and then use Mail::SpamAssassin::HTML to get and then set properties just like the rendered method of Mail::SpamAssassin::Message::Node does.

The nice way to do this would IMHO be to make it possible for a plugin to call the "rendered" method of Mail::SpamAssassin::Message::Node, passing type and extracted data as parameters. Something like this (completely untested):

---8<---
--- Node.pm	Thu Jun 12 17:40:48 2008
+++ Node-new.pm	Mon Jul 13 17:22:20 2009
@@ -411,16 +411,17 @@
 =cut

 sub rendered {
-  my ($self) = @_;
+  my ($self, $type, $text) = @_;

-  if (!exists $self->{rendered}) {
+  if ((defined($type) && defined($text)) || !exists $self->{rendered}) {
     # We only know how to render text/plain and text/html ...
     # Note: for bug 4843, make sure to skip text/calendar parts
     # we also want to skip things like text/x-vcard
     # text/x-aol is ignored here, but looks like text/html ...
-    return(undef,undef) unless ( $self->{'type'} =~ /^text\/(?:plain|html)$/i );
+    $type = $self->{'type'} unless (defined($type));
+    return(undef,undef) unless ( $type =~ /^text\/(?:plain|html)$/i );
-    my $text = $self->_normalize($self->decode(), $self->{charset});
+    $text = $self->_normalize($self->decode(), $self->{charset}) unless (defined($text));
     my $raw = length($text);

     # render text/html always, or any other text|text/plain part as text/html
---8<---

This way, AFAICT, any extracted (or generated) HTML should be treated the same way as normal text/html, making it available to HTML eval tests for example.

Otherwise my plugin could of course use Mail::SpamAssassin::HTML itself.
Unfortunately Mail::SpamAssassin::Message::Node has no nice methods for setting the separate relevant properties, though, so either the set_rendered method needs to be expanded or complemented to allow this anyway, or my plugin will have to set the relevant properties directly (which makes it depend on Mail::SpamAssassin::Message::Node not being changed too much).

I guess I could do the hack version now, and then update it if/when Mail::SpamAssassin::Message::Node is updated to support this in a nice way. :-)

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Short URL provider list?
Marc Perkel wrote:
>> Does anyone have a list of all domains that provide short url
>> redirection?

An added wish from me: does anyone have a list of URL shorteners actively used by spammers?

> Thanks for the lists. I'm not sure what I'm going to do with it but
> I'm going to see if I can find a way to use it.

If I have the time I'll check those lists and add more URL shorteners to the example config for my URLRedirect plugin. AFAICT my plugin works, but to be effective it does need a list of URL shorteners used by spammers, which I haven't had the time to compile.

I've just updated that module; it can now read lists of redirectors from flat files, and has eval tests for redirect recursion checks.

In case you (or anyone else) wants to experiment with fetching redirect locations from URL shorteners (so that normal URL and URIDNSBL rules can get at the real site), or score based on recursive redirects from URL shorteners, please download and test the plugin.

Note: the plugin only does "head" requests, and only to sites in its redirector lists, so it does not have all the cons that actually fetching pages or sending requests to the spamvertised web sites has.

It's available at <http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: SpamAssasin .pm & .cf file
chauhananshul wrote:
> I'm new to linux world can some one please help in understanding .cf
> & .pm files.

Neither of those file types is specific to Linux.

The .pm files are perl modules. To understand how those work in detail you need to learn perl. You don't need to know this when using SpamAssassin though.

The .cf files are specific to SpamAssassin. To learn how they work, read the SpamAssassin documentation. Particularly

perldoc Mail::SpamAssassin::Conf

also available at <http://search.cpan.org/~jmason/Mail-SpamAssassin-3.2.5/lib/Mail/SpamAssassin/Conf.pm>

> I've used .cf files from http://www.rulesemporium.com i used to copy
> in /usr/share/spamassassin/

Don't do that. You should put your custom .cf files in the site rules directory. Usually "/etc/mail/spamassassin" or "/usr/local/etc/mail/spamassassin".

> it works

But it will stop working if you use sa-update or upgrade SpamAssassin.

> but at some sites both .pm & .cf files are available can some one
> please guide me what to do with .pm files how to install them or make
> them work for me.

Read about "loadplugin" in the above mentioned documentation.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
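For concreteness, a site config fragment usually looks something like the sketch below. The file and module names are made-up examples; the loadplugin directive itself is real SA syntax:

```
# /etc/mail/spamassassin/local_plugins.cf  (path and names illustrative)

# Load a third-party plugin module, giving loadplugin the perl class
# name and the path to the .pm file:
loadplugin My::Plugin /etc/mail/spamassassin/MyPlugin.pm

# Downloaded rule .cf files go in their own files in this same
# directory; sa-update and upgrades will leave them alone there.
```

After editing, run `spamassassin --lint` to check that the plugin and rules load cleanly.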
Re: Managing SA/sa-learn with clamav
Steven W. Orr wrote:
> http://wiki.apache.org/spamassassin/ClamAVPlugin
> It looks like what I thought I wanted already exists. Based on what I
> wrote above, and that I like the result of running sa + clamav via the
> two milters, does anyone have any caveats for me?

1: When running ClamAV inside SA you have to run SA even if ClamAV finds a virus. This requires more resources than just running ClamAV, which is way faster and requires far less than SA does.

2: If an infected whitelisted mail comes in, you would need a much higher score than the example (10) to stop the virus from passing.

3: If you just tag (and don't block) spam, using ClamAV only from within SA will actually let virus infected mail through to users.

All this said, we run ClamAV both from a milter (MIMEDefang) before SA *and* from SA with the plugin, using different configurations. The clamd instance used *before* SA has only the official ClamAV sigs, and has phishing sigs and some checks turned off. The clamd instance used *in* SA has the official sigs as well as some third party sig sets, and has phishing, broken exe, etc checks turned on.

> One question I have: If I use the plugin and it fires, will it in fact
> contribute to the bayes and AWL tables ending up as I described above?
> Or is there a placement question of where the plugin should be invoked?

That plugin simply makes an eval test available that you can use for scoring. The effect of its scores on bayes and AWL is the same as for any other scoring rules in SA.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Plugin extracting text from docs
Rosenbaum, Larry M. wrote:
> I have found the Xpdf package [...] has a pdftotext command line
> utility. If you build it with the "--without-x" option,

Ah. I didn't see that option. That's nice. I'm now using pdftotext instead of pdftohtml here as well. :-)

And I've just uploaded a new version of the ExtractText plugin with a few changes. Also, it's now included on my SA page at <http://whatever.frukt.org/spamassassin.text.shtml> as well as the CustomPlugin page in the SA wiki.

(For those having problems downloading from the above server, the zip archive should be automagically mirrored to <http://mmm.truls.org/m/ExtractText.zip>.)

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: constantcontact.com
rich...@buzzhost.co.uk wrote:
>> (You do know what "legacy" means, right?)
> Sure - do you? If it's left in the core code because the URI never
> listed CC in the past that makes it legacy to me. If we consider that
> argument now that cc *is* listed by urbl then the legacy argument that
> was used, is gone. It becomes an SA issue for effectively white
> listing *from urbl lookups* a known rotten/black listed uri.

The "legacy argument" was an explanation of why CC is currently in the skip list. As such, it still stands: it still explains why CC is currently skipped. It was never an argument for why CC should be skipped.

The fact that CC now is listed is an argument for removing the skip, but it does not change the reason why the skip was included in the first place, nor does it change the reasons why the skip hasn't, so far, been removed.

>> Seems like you think missing a score of 0.25 would be worth money to
>> someone. I think that's pretty silly.
> Depends. If you are sitting at 4.79 and you have a block score of 5.00
> it makes a difference.

Do you mean to say that a large enough amount of mail from CC gets from 4.76 to 4.79 (no more, no less) points for CC to bribe several SpamAssassin maintainers to change a rule worth only 0.25 points (with a bribe big enough for those maintainers to risk both their own and their handiwork's reputation)? Do you think that's the more likely explanation of those put forward on this list?

>> Calling it whitelisting also seems silly.
> Jonas I always thought you were grown up enough to be able to fill in
> the blanks here. White listed from URI lookups.

Please, don't be silly now. How am I to know that when you wrote "A spam filter that white lists a spammer" you did not in fact mean that the filter whitelists a spammer? How am I to know that when you wrote "SpamAssassin effectively white listing spammers" you did not in fact imply that SpamAssassin is whitelisting spammers?
If you think I'm silly for believing that you mean what you write, then please keep considering me silly.

/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: constantcontact.com
rich...@buzzhost.co.uk wrote:
> m...@haven:~$ host constantcontact.com.multi.uribl.com
> constantcontact.com.multi.uribl.com A 127.0.0.4
> m...@haven:~$
> Oh Dear - that kind of rains on the parade of the 'legacy' argument
> and puts the ball into the SA court.

Actually, it gives strength to the "legacy" argument, and the ball was already in the SA court. (You do know what "legacy" means, right?)

> constantcontact.com.multi.uribl.com. 1800 IN A 127.0.0.4
> Seems like the cynical who make 'silly assumptions' may not be as
> silly as we first thought.

Seems like you think missing a score of 0.25 would be worth money to someone. I think that's pretty silly. Calling it whitelisting also seems silly.

I do think that the skipping of CC should be reviewed though. It might be listed in other URIDNSBLs for example.

If the main purpose of the default list of domains to skip URIDNSBL checks for is to save resources by not checking domains that won't be hit anyway, then the whole list should probably be regularly checked by a script that simply flags for review any domains present on URIDNSBLs (or possibly just comments them out of the list).

/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: constantcontact.com
rich...@buzzhost.co.uk wrote:
> Should that be Hi$torical Rea$ons ?

If there was a monetary reason (aka a bribe), I'd think CC would have been whitelisted. As it is, CC is *not* whitelisted in SA. At least not according to your own posts. What you have noted is that CC is *skipped* by *one* (1) type of rule (URIBL checks). No more, no less.

> As it stands this is simply white listing a bulker.

No, it isn't. Skipping URIBL checks for a domain, when done in SA, is very far from whitelisting the domain. SA is a scoring system where the combined score of all rules is what decides how to flag a message.

> I'm cynical. The only logical reason I can see for anything of this
> nature is money changing hands.

That's not being cynical. It's being unbelievably unimaginative.

/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Plugin extracting text from docs
Rosenbaum, Larry M. wrote:
> It appears that "pdftohtml" is only available as a Windows executable
> (on Sourceforge).

If you want a precompiled executable it seems Windows is the only platform, but AFAICS the source code is also available at http://sourceforge.net/projects/pdftohtml/files/

> I need something that will run on Solaris.

I've no idea whether it compiles on Solaris or not, but since I installed it from ports on FreeBSD I do know that it compiles on at least one Unix-like OS and doesn't require Windows.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Plugin extracting text from docs
Benny Pedersen wrote:
> pdftohtml is imho not found in gentoo, but pdf2html is maybe the same ?

I wouldn't know since I haven't got any Gentoo machines. The "pdftohtml" I'm using is installed from FreeBSD ports. It can be downloaded from <http://pdftohtml.sourceforge.net/>

>>> only problem i had was that unrtf nedd to have ${file} in the
>>> example cf to work all else works
>> I'm using unrtf 0.21.0. Are you using an older version?
> 0.20.5 latest unstable on gentoo, unless i self bump it

Ah. Then I guess reading from stdin is a new feature in 0.21.

> one thing i need to know is how to control the tmp file path, i cant
> find where this is made

I'm using Mail::SpamAssassin::Util::secure_tmpfile, so it's SA that controls the path to the temp files. I don't know if you can set that in SA's config or not.

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Plugin extracting text from docs
Benny Pedersen wrote:
> just tested this plugin here, all i can say it rooks viagra out of
> docs rtf files :)

I just saw it extract a 419 from a Word doc so that it was caught by bayes and a bunch of rules (it would actually have slipped past our filter otherwise). :-)

> well done

Thanks.

> only problem i had was that unrtf nedd to have ${file} in the example
> cf to work all else works

Odd. I don't need ${file} for unrtf here. I'm using unrtf 0.21.0. Are you using an older version?

Regards
/Jonas
-- 
Jonas Eckerman, Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: ExtractText plugin
Jonas Eckerman wrote:
> For anyone who likes to test stuff, I've uploaded my plugin that extracts text from documents to <http://whatever.frukt.org/graphdefang/ExtractText.zip>

In case any of you have problems downloading the file, it's now mirrored at <http://mmm.truls.org/m/ExtractText.zip>

And, please tell me of any problems.

Regards
/Jonas
Re: Plugin extracting text from docs
Benny Pedersen wrote:
> <http://whatever.frukt.org/graphdefang/ExtractText.zip>).

I've now mirrored the file at <http://mmm.truls.org/m/ExtractText.zip>

I hope that will work better.

Regards
/Jonas
Re: Plugin extracting text from docs
Rosenbaum, Larry M. wrote:
> We can use antiword to render text from MSWord files, and unrtf to render text from RTF files. What is the best tool to render text from PDF files?

I don't know what the best tool is, but I'm currently using pdftohtml in XML mode (and then stripping the XML) in my ExtractText plugin.

(For more info about the plugin, see my post with subject "ExtractText plugin", or download it from <http://whatever.frukt.org/graphdefang/ExtractText.zip>.)

Regards
/Jonas
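The "XML mode, then strip the XML" step described in the message above amounts to something like the following sketch. The `strip_xml` helper and the sample line are illustrative; real pdftohtml XML output is richer, and the input would come from something like `pdftohtml -xml -stdout message.pdf` (flag spelling can differ between pdftohtml versions).

```shell
# Sketch: pdftohtml's XML mode wraps extracted text in <text ...> elements;
# stripping the tags leaves plain text that bayes and SA rules can scan.
strip_xml() {
  sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}

# Stand-in for real pdftohtml output, so the sketch is self-contained:
printf '<text top="118" left="72">Cheap meds inside</text>\n' | strip_xml
```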
ExtractText plugin
Hello!

For anyone who likes to test stuff, I've uploaded my plugin that extracts text from documents to <http://whatever.frukt.org/graphdefang/ExtractText.zip>

I started writing it last week, so it hasn't been heavily tested yet, but it has been running here over the weekend with no showstopping problems.

What it does is use external tools and simple (interface-wise) extractor plugins to extract text from message parts. The extractors are chosen by MIME type, file name and optionally content magic. The extracted text is seen by bayes and SA rules. It is completely possible to create an OCR extractor, but I haven't done so, and I currently don't plan on doing it.

The plugin currently comes with a *very* rudimentary OpenXML (recent MS Word) extractor, and a configuration using the external tools "antiword", "unrtf", "odt2txt" and "pdftohtml" to extract text from MS Word, RTF, OpenDocument (OpenOffice/StarOffice) and PDF files.

It is also possible for an extractor plugin to return several binary objects as well as text. These objects will also be processed by all extractors, so an extractor for a container type of file can return (as an example) a bunch of images, which are then processed by an OCR extractor. I have not implemented any extractor that does this, so it's completely untested.

Stuff I already know is missing:
* A safeguarding maximum depth of processing.
* A way for extractor plugins to get config lines.

Test it if you feel like it.

Regards
/Jonas
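To give a feel for the external-tool mapping the announcement describes, a configuration might look roughly like this. The option name `extracttext_external` and the column layout are made up for illustration; check the plugin's POD for the real syntax (only `loadplugin` is standard SA).

```
loadplugin Mail::SpamAssassin::Plugin::ExtractText ExtractText.pm

# Hypothetical syntax: map MIME types / file extensions to external tools.
extracttext_external  antiword   application/msword                        .doc
extracttext_external  unrtf      text/rtf                                  .rtf
extracttext_external  odt2txt    application/vnd.oasis.opendocument.text   .odt
extracttext_external  pdftohtml  application/pdf                           .pdf
```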
Re: user filtering attachments
Matus UHLAR - fantomas wrote:
>> oh, dirty workaround, but doable.
> However, highly depend on the way your MTA calls the spamassasin. With milter, you can't push _any_ header to the mail, only those compiled in.

That would depend on which milter. With MIMEDefang, SA itself can't add headers directly, but MIMEDefang can use the results from SA to add headers.

OTOH, if one is using MIMEDefang, then one would most likely use MIMEDefang to strip attachments rather than a combination of SA and maildrop.

Regards
/Jonas
Re: Plugin extracting text from docs
Theo Van Dinter wrote:
> the convolution is a fingerprint that you could write a rule for and then you don't care what the content actually is. For example, you'd render something like "doc_pdf_jpg", which would make an obvious Bayes token. In the same way for a zip file, you could do "zip_pdf zip_jpg zip_txt", etc, and they'd all be different tokens.

That's really a good idea. Put the chains of extraction in a pseudoheader that can be tested in rules and seen as a token by bayes. I'm putting that in the todo for the plugin.

>> The most common thing to extract apart from text will most likely be images. Any OCR text extractor tied into my plugin would get to see those images, but any OCR SA plugins run after my plugin won't. It might be good to make extracted images available to those, and other image handling plugins.
> But yours already ran, so who cares about the others?

Because they work very differently? An OCR plugin that adds the rendered text to the message for bayes and text rules is very different from one that does its own scoring based on the OCRed text.

> If you're expending the resources to OCR the same image in an email multiple times ... You clearly either have a lot of hardware or not a lot of mail.

*I* don't use any OCR at all. We don't have the resources for that (being a small non-profit NGO), and so far I haven't seen any need for OCR either since we never had much image spam slip through anyway. So I will not implement an OCR extractor for my plugin. I'll leave that for others.

This is actually one of the reasons I'd like to let existing OCR plugins have access to any images extracted by my plugin: so that those who already do use OCR can get a benefit from the extraction.

I'm not going to spend much time on it though. I'm happy just extracting text. :-) And it does extract text (currently from Word, OpenXML, OpenDocument and RTF documents). :-) I actually hadn't even thought about this image/OCR etc stuff before Matus suggested it.
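Such an extraction-chain pseudoheader could then be used from an ordinary ruleset along these lines. This is entirely hypothetical: the header name `X-ExtractText-Chain` and the rule are illustrations of the idea, not something the plugin emits today, and they assume the plugin injects the chain as a header visible to `header` rules.

```
header   ET_CHAIN_DOC_PDF_JPG  X-ExtractText-Chain =~ /\bdoc_pdf_jpg\b/
describe ET_CHAIN_DOC_PDF_JPG  Image inside a PDF inside a Word document
score    ET_CHAIN_DOC_PDF_JPG  0.5
```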
Regards
/Jonas
Re: Plugin extracting text from docs
Theo Van Dinter wrote:
> I would comment that plugins should probably skip parts they want to render that already has rendered text available.

Ah. That's a good idea. Now I'll have to search for a nice way to check that. :-)

>> I can't see how "set_rendered" would help in creating a functioning chain where one converter could put an arbitrary extracted object (image, pdf, whatever) where another converter could have a go at it.
> If a plugin wants to get image/* parts and do something with the contents, they can do that already.

Not if the image/* parts are actually inside a document.

> If you want to have a plugin do some work on a part's contents, then store that result and let another plugin pick up and continue doing other work ... There's no official method to do that.

I guessed as much. This however is what Matus and I were talking about.

> You can store data as part of the Node object. But what would be a use case for that?

Matus' example was a Word document that contained a PDF (which might in turn contain an image). A plugin that knows how to read Word documents could extract the text of the Word document and then use "set_rendered" to make that available to SA. It cannot currently extract the PDF and make it available to any plugins that know how to read PDFs though.

Matus' idea about chains would be that in this example the plugin reading the Word document would store any other objects somehow. In this case a PDF. After that, any plugin that knows how to handle PDFs will get to look at the PDF and extract text and other stuff from it. In case it extracts an image, it would then store it the same way, and any image handling plugins would find it.

I really don't know how common that is. I have never seen a Word document with a PDF inside it myself. I have however seen many documents that contain images, and I think it would be a good idea to make those images available to things like FuzzyOCR and ImageInfo.

> Arguably, there could be multiple people developing plugins for different types, but you'd need some coordination for the register_method_priority calls to figure out who goes in what order.

For some stuff coordination would be needed, yes. But not for what I'm thinking of.

The text extraction plugin I'm working on (which started this) itself has simple extractor plugins. These plugins will be able to return arbitrary objects as well as text, and my plugin will check the returned objects the same way it checks the original message parts. This way, all the extractors that are tied into my plugin will be able to extract stuff from objects extracted by other extractors. So far so good.

The most common thing to extract apart from text will most likely be images. Any OCR text extractor tied into my plugin would get to see those images, but any OCR SA plugins run after my plugin won't. It might be good to make extracted images available to those, and other image handling plugins.

My plugin is called after the message is parsed, which is very good for a text extractor. FuzzyOCR (as an example) however works by scoring OCR output (which may well be very different from the text in the image as we see it), and therefore has to be called at a later stage. The same goes for ImageInfo. It might therefore be a good idea to make the extracted images and other objects available to scoring plugins as well.

> I just found the register_method_priority() method. \o/

It's nice, isn't it? :-) I'm using it in my URLRedirect plugin.

> Note: Do not try to add or remove parts in the tree. The tree is meant to represent the mime structure of the mail, and each node relates to that specific mime part. The tree is not meant to be a temporary data storage mechanism.

Ok. That makes things both easier and less easy for me.

I know that I'll have to implement my own list of stuff to loop through when extractors return additional parts in my plugin. That's the easy part.

The difficult part is how to make extracted stuff available to other plugins in a way they understand. I see two main ways to do this:

1: Invent a new way. This would require modifications of any plugins that should check the extracted objects.

2: Add a container part somewhere that "find_parts" would find, but which is not actually a member of the message tree, and then add a simple way to add parts to that container. This would require modification of Mail::SpamAssassin::Message, but not of the plugins.

Regards
/Jonas
Re: Plugin extracting text from docs
Theo Van Dinter wrote:
>> I am not sure but I think something alike was done. What I mean is to have generic chain of format converters, where at the end would be plain image or even text, that could be processed by classic rules like bayes, replacetags etc.
> Already exists, check recent list history for "set_rendered". :)

I thought that was for text only. In any case, any plugin looking for images, or a PDF, will most likely look at MIME type and/or file name, and then use the "decode" method to get the data, and AFAICT the "set_rendered" method doesn't have any impact on any of that.

I can't see how "set_rendered" would help in creating a functioning chain where one converter could put an arbitrary extracted object (image, pdf, whatever) where another converter could have a go at it.

Since the "set_rendered" method seems very undocumented I could of course be wrong here. In that case I hope to be verbosely corrected. :-)

/Jonas
Re: Plugin extracting text from docs
Matus UHLAR - fantomas wrote:
>> This I don't understand. Do they put PDFs inside .doc files as if the .doc was an archive?
> I am not sure but I think something alike was done.

Considering that an OpenXML format is basically a zip file with XML files inside, and that the actual document can contain hyperlinks, I guess it could be possible to do something like that. I don't know enough about the format to know, though.

> What I mean is to have generic chain of format converters, where at the end would be plain image or even text, that could be processed by classic rules like bayes, replacetags etc.

If I manage to figure out how to add new parts to a message from within the "post_message_parse" method, that should work just fine. An extractor plugin can return a list of parts to be added to the message, and my module will keep looping through the message parts as new parts are added.

So, if a Word extractor extracts a PDF and returns it, the PDF would be added as a new part, and in the next loop the PDF part will be sent to a PDF extractor if one exists. And so on.

I'm running "post_message_parse" at priority -1, so any added image parts should be available to plugins like FuzzyOCR as well as plugins running "post_message_parse" at default priority.

The missing parts are:

1: How do I add a new part to a parsed message (including a singlepart one)? This is of course the main problem.

2: The actual extractor plugin that extracts whatever files are included in the Word document. Antiword only extracts text, and my extractor for OpenXML is little more than an extremely basic XML remover.

Regards
/Jonas
Re: Plugin extracting text from docs
Jonas Eckerman wrote:
> You mean extract images and add them as parts to the message? I guess that should be doable. I know that "unrtf" can extract images from RTF files. I'll probably implement support for this, but I'll probably not implement actually doing it right away.

This'll probably have to wait. Browsing the POD and source of Mail::SpamAssassin::Message::Node and Mail::SpamAssassin::Message I found no obvious way of adding new parts to a message node. Especially if the node is a leaf node (I'm guessing that singlepart messages only have a leaf node).

Regards
/Jonas
Re: Plugin extracting text from docs
Matus UHLAR - fantomas wrote:
>> I'm currently working on a modular plugin for extracting text and adding it to SA message parts.
> if possible, extract images too, so the fuzzyocr and similar plugins would be able to look at that too.

You mean extract images and add them as parts to the message? I guess that should be doable. I know that "unrtf" can extract images from RTF files. I'll probably implement support for this, but I'll probably not implement actually doing it right away.

> IIRC spammers did even put PDF's to .doc files to make the stuff harder, but if you manage the above, it shouldn't be hard to extract PDF's too :)

This I don't understand. Do they put PDFs inside .doc files as if the .doc was an archive?

Regards
/Jonas
Plugin extracting text from docs (was: new spam using large images)
Jason Haar wrote:
> Speaking of image/rtf/word attachment spam; is there any work going on to standardize this so that the textual output of such attachments could be fed back into SA?

Just as a note: I'm currently working on a modular plugin for extracting text and adding it to SA message parts.

The plugin can use either external tools or its own simple plugin modules. How to extract text from parts is configurable, and based on MIME types and file names, so new formats can be added by simply configuring new external tools or creating a new plugin module.

My *far* from finished module currently manages to extract text from Word documents (using antiword), OpenXML text documents (using a simple plugin) and RTF (using unrtf).

I haven't tested where and how the extracted text is available to SpamAssassin yet (as noted, it's *far* from finished), but I am using the "set_rendered" method as in the example, so it should work. ;-)

Regards
/Jonas
Re: List headers and footers [Re: Unsubscribe]
McDonald, Dan wrote:
> List servers like mailman resend the message with a different envelope header.

Which doesn't invalidate a DKIM, PGP or S/MIME signature.

> The MTA receiving this message looks for policy statements about spamassassin.apache.org, not for policy statements from fantomas.sk.

For SPF, yes. For DKIM it should look for policy statements from "fantomas.sk" since that is the domain of the address used in the From header. If the message had contained a DKIM signature, it should of course look for a DKIM key for the domain specified in the DKIM-Signature header.

Regards
/Jonas
Re: List headers and footers [Re: Unsubscribe]
David Gibbs wrote:
> Since Mailman adds it's own headers to the messages it processes, any existing signatures in the message are invalidated.

But... They aren't. Some may be, but not all. As an example, the post from mouss which you replied to was verified with DKIM by our MX to have passed through a system correctly signing for "mo...@ml.netoyen.net".

DKIM specifies which headers it includes in the signature, and ignores headers that are prepended after the signature. As long as mailman leaves the specified headers below the signature alone, adding its own headers won't invalidate DKIM signatures.

Also, some signatures simply don't care about the *message* headers at all, only about the body or the signed MIME part(s).

> Thus, Mailman has to remove any existing signatures and let the MTA resign the message after it's been processed.

If mailman has been set up to change the body (adding a footer for example) or change headers that can reasonably be expected to appear in signatures (like From or Subject), it should remove certain signatures (like DKIM) and preferably replace them with the authentication results at the current point (of course, it should, when applicable, include any prepended results header(s) in its own signature if it then resigns the message). Otherwise I see no reason for it to remove signatures.

Which is an obvious reason *not* to add a footer or a subject tag, as well as a reason not to rewrite From and Reply-To. Whether or not that reason is important is a personal opinion, but it is valid.

If signatures are left in place and important data isn't changed, our regular verification methods can verify whether a post purporting to be from mouss (for example) came from a system that should send mail from mouss. If mailman removes existing signatures or changes important data, we can not verify that the mail really was sent through a system supposed to send mail from mouss.

If mailman (or its MTA) adds authentication results, we have to trust that system (and its administrator(s)) in order to be reasonably sure whether the mail was sent from an authorized system or not. This may not be reasonable for all list hosts.

Note: Important data for the mail from mouss that you replied to is the body, and the following headers: Date:From:Reply-To:MIME-Version:To:Subject:References:In-Reply-To:Content-Type:Content-Transfer-Encoding;

As long as mailman (or anything else) doesn't change that data, the DKIM signature will still be valid and verifiable, which it is here.

Regards
/Jonas
Re: Plugin configuration
Martin Gregorie wrote:
> Now I'd like to configure the database configuration details from a .cf file, preferably the one containing the associated SA rule, so is there a recommended way of doing this?

The "parse_config" plugin method?

> Pointers to documentation or examples would be much appreciated.

Documentation:
perldoc Mail::SpamAssassin::Plugin
<http://search.cpan.org/~jmason/Mail-SpamAssassin-3.2.5/lib/Mail/SpamAssassin/Plugin.pm>

Examples 1, stock plugins that came with SpamAssassin at:
[...@inc]/Mail/SpamAssassin/Plugin/*
<http://search.cpan.org/~jmason/Mail-SpamAssassin-3.2.5/>

Examples 2, third party plugins at:
<http://wiki.apache.org/spamassassin/CustomPlugins>

Regards
/Jonas
Re: Plugin for URL shorteners / redirects
Benny Pedersen wrote:
> http://wiki.apache.org/spamassassin/WebRedirectPlugin know this plugin ?

Yes. Though I had forgotten its name.

> what is the diff in the testing ?

Reading the descriptions of the two plugins would have given you some good hints. Reading the documentation (both have PODs) would have given you the answer.

They are very different.

The WebRedirect plugin fetches pages. My plugin only fetches headers.

The WebRedirect plugin adds the content of pages as pseudoheaders. My plugin adds the "Location" of a redirect to the existing canonicalized list of URIs (so that existing URI checkers see them).

The WebRedirect plugin provides an eval test to check the status code for queried links. My plugin doesn't.

/Jonas
Re: New image spam
Matus UHLAR - fantomas wrote:
>> You need to check the file's contents to catch that, and the ImageInfo plugin isn't meant to understand just any kind of content.
> Well, first issue was only to compare file extension to provided mime type, so it would hit .gif file of type image/jpeg

Ah, yes. That could be done in a much more lightweight way than what my MimeMagic plugin does. It should be pretty easy to make a plugin doing that.

>> compares the content-type with the content (using File::MimeInfo::Magic, which uses the freedesktop file database).
> that's more complicated but apparently good to have. I wonder if the real filetype will match the extension or the mime type (or neither one)

I made it to check for Windows executables sent with the MIME type and extension of an MS Office document. (I did this after discovering that a couple of machines here gladly ran a Win32 executable if it had a .doc or .xls extension when the user double-clicked it.)

For the current image spam it's overkill, but it did hit when I checked the OP's example message here.

Regards
/Jonas
Plugin for URL shorteners / redirects
Hi!

I just threw together a plugin that can check URLs for redirections, and add whatever they redirect to to the message meta-data so that the true destinations are checked by URIBLs etc.

It doesn't do this for all URLs in a message. It will only follow those URLs it is specifically told to follow. Also, it only asks for HEAD rather than full pages in order to keep the traffic down.

I'm not sure whether this is really worthwhile or if it is just a waste of time and resources, but the idea is to use it for URL shorteners that are being abused by spammers. To be really useful it needs a list of abused URL shorteners. I don't know which shorteners are most abused, so I don't know what the list should contain. (The three example shorteners are in the POD because I knew about them, not because they are being used by spammers.)

If anyone thinks this is a good idea you can check the plugin at <http://whatever.frukt.org/spamassassin.text.shtml?accessibler#URLRedirect.pm>. Suggestions and criticism are very welcome. URL shortener addresses (with formats) even more welcome.

Notes:

This is not extensively tested. It may well contain bugs. It's not a finished thing.

If this plugin is a good idea, making it do its HEAD requests in parallel would be a good idea, but I don't know what the best way to do that in perl for SA would be. (Currently it has a hardcoded timeout of 10 seconds around its requesting stage, but no other time saving stuff.)

Using a cache should also be implemented so that repeatedly seen URLs aren't followed over and over again. This should be pretty simple.

Since it needs URL meta-data to be checked before it runs, and needs to add its own meta-data before the rest of the scan runs, it can't really work asynchronously AFAICS. Currently it uses parsed_metadata at priority -1 in order to add its own meta-data. Maybe this isn't the right way to do this.
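The HEAD-only lookup described above boils down to pulling the redirect target out of a Location header, roughly as in this sketch. It is not the plugin's Perl code; in the real thing the response would come from an actual HTTP HEAD request, while here it is canned so the example is self-contained and touches no network.

```shell
# Sketch: extract the first Location header from an HTTP response,
# i.e. the URL a shortener redirects to.
location_of() {
  # Strip CRs, match the header case-insensitively, keep only the first hit.
  tr -d '\r' | sed -n 's/^[Ll]ocation:[[:space:]]*//p' | head -n 1
}

# Canned stand-in for a real HEAD response:
response='HTTP/1.1 301 Moved Permanently
Location: http://example.com/landing-page
Content-Length: 0'

target=$(printf '%s\n' "$response" | location_of)
echo "redirects to: $target"
```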
Regards
/Jonas
Re: New image spam
mouss wrote:
> is there a way to generalize this to other MIME types? I mean a file claiming to be a .pdf when it is a .wmv...?

You need to check the file's contents to catch that, and the ImageInfo plugin isn't meant to understand just any kind of content.

> or do we need a FileType plugin?

I guess you missed my post where I said that I've got an experimental plugin that is just that. It compares the content-type with the content (using File::MimeInfo::Magic, which uses the freedesktop file database).

The plugin was called TypeMismatch for a couple of days, which is closer to FileType than the current name: MimeMagic.

Anyway, it can be found at <http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas
Re: New image spam
Bob Proulx wrote:
> I like the idea of tagging mismatched types where the actual content doesn't match the stated type. That would be a good idea for a plugin enhancement. Perhaps something based upon libmagic?

I've got a plugin that does this. It's the MimeMagic plugin at <http://whatever.frukt.org/spamassassin.text.shtml#MimeMagic.pm>.

FWIW the spam put up by the OP got hit by a mismatch rule when I ran it through spamassassin here.

The plugin uses File::MimeInfo::Magic, which in turn uses the freedesktop MIME database.

Please note that while the plugin isn't new I still consider it experimental since I haven't done enough evaluation of its results.

Regards
/Jonas
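The content-vs-declared-type check discussed in the thread above can be illustrated with a few magic bytes. This is a rough sketch of the idea only: the MimeMagic plugin itself uses File::MimeInfo::Magic and the freedesktop MIME database, not a hand-rolled table like this.

```shell
# Sniff a file's real type from its first bytes, ignoring name/declared type.
sniff() {
  hex=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case $hex in
    474946383*) echo image/gif ;;                 # "GIF8"
    ffd8ff*)    echo image/jpeg ;;                # JPEG SOI marker
    89504e47*)  echo image/png ;;                 # PNG signature
    4d5a*)      echo application/x-msdownload ;;  # "MZ", Win32 executable
    *)          echo application/octet-stream ;;
  esac
}

# Demo: GIF content delivered with a JPEG content-type is a mismatch.
f=$(mktemp)
printf 'GIF89a' > "$f"
declared=image/jpeg
actual=$(sniff "$f")
[ "$declared" = "$actual" ] || echo "mismatch: declared $declared, magic says $actual"
rm -f "$f"
```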
Re: over-representing non-English spam?
Karsten Bräckelmann wrote:
> This is not about OpenProtect or their decisions. Actually, there are more than this one sa-update mirror for the SARE rules.

I think you missed my point.

The OpenProtect channel adds a bunch of SARE rulesets in a single channel. This means that when you use that channel, you delegate the decision on which SARE rulesets to include to OpenProtect. This is fine as long as their decisions fit your mail flow and policy (I use OpenProtect's channel myself). If their decisions don't fit your mail flow and policy, it's better to manually add the rulesets you want (for example using Daryl's SARE channels).

> OpenProtect just happens to be one of the mirrors to provide that service to the >= 3.1.1 SA users out there. :) They didn't write the rules, and they are not responsible for FP hits *long* after the rules have been validated and updated last time.

They didn't write the rules, but they do decide which rulesets to put in their combined channel. And of course they are not responsible for FPs. The person who configured a system to use their channel is responsible for resulting FPs (if any) in that system. Which fits what I said to the OP as well.

Regards
/Jonas
Re: over-representing non-English spam?
Jason Haar wrote:
> As you can see, MIME_CHARSET_FARAWAY, CHARSET_FARAWAY_HEADER, and SARE_SUB_ENC_GB2312 (from openprotect rules) all triggered - total of 8.0 points. Sounds good - but of course that's very bad! Doesn't that mean an actual legitimate Chinese email would *default to a score of 8.0*!?!?!?!

About MIME_CHARSET_FARAWAY and CHARSET_FARAWAY_HEADER:

Setting ok_locales to something not including Chinese charsets implies that you want Chinese email to get a rather high score. If you don't want to punish Chinese mail, don't tell SA to do so.

Hint: The default setting is to allow all charsets. It's you (or your admin) that has decided to punish Chinese mail.

About SARE_SUB_ENC_GB2312:

This is not a standard SA rule. Adding that rule to your SA ruleset implies that you wish to use it. If you don't want SA to use a specific custom rule or ruleset, don't tell SA to do so.

Hint: Using the OpenProtect channel means that you (or your admin) have decided to trust OpenProtect to decide for you which rules to add to your ruleset. If you find that you don't agree with OpenProtect's decisions, simply stop using their channel and make the decisions yourself.

Regards
/Jonas
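For reference, the charset punishment discussed above is controlled by the `ok_locales` setting in a `.cf` file; the default is permissive, and the CHARSET_FARAWAY rules only become harsh once it is restricted:

```
# Default: accept all charsets/locales.
ok_locales all

# Restrictive example: only English is "ok"; mail in Chinese charsets
# will then hit the CHARSET_FARAWAY family of rules.
#ok_locales en
```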
Re: should the spam score increase
Jari Fredriksson wrote:
> As the mail contains no text, there probably is not much to learn.

Why not? Bayes learns from headers as well, and headers can be just as useful as body text for classifying mail.

(Note: I haven't seen a single one of these PNG-only spams, so I don't know how telling their headers are in practice.)

Regards
/Jonas
Re: should the spam score increase
Lists wrote:
> question is should the system now be 'learning' these and thus changing the bayes_00 to bayes_50 etc

It's actually quite hard for us to know if you have autolearn turned on or off.

> If not, what is the best method to go about 'learning' these spam.

If you have shell access:
man sa-learn
man spamc

Regards
/Jonas
Re: mcafee sees drop in spam?
Chris Hoogendyk wrote:
> The first quarter ended just over a week ago.

Actually, it ended over a month ago.

Michael Scheidell wrote:
> looks like mcafee sees a 20% drop in spam? wonder what that is about. I'm not seeing a drop in ATTEMPTED spam

I see a recent (late April or early May) increase in the amount of botnet connections, but that's in the second quarter of 2009. McAfee are comparing the first quarter of 2009 with the first quarter of 2008.

McAfee's belief that the lower amount of spam is thanks to the takedown of McColo seems reasonable. Similar figures were reported by others as well in January or February IIRC.

Regards
/Jonas
Re: The weirdest problem I have ever met
John Hardin wrote:
> spamassassin --remove-addr-from-whitelist=problemacco...@clientdomain.com

An additional note (since, IIRC, the OP said he did this already): Make sure to run this for the same user as the one that scans the mail when it gets the ridiculously high score.

Regards
/Jonas
Re: Odd behaviour under load.
s, network problems, or for some other temporary reason takes a long time to respond. This is not a failure to follow RFC 2821.

This seems to be what happened in this case. It is the reason part two is needed.

Part two: Fix the sending systems so that they do not use an inappropriately low timeout after end of data (the "." line). There's a reason why it SHOULD be 10 minutes.

Regards
/Jonas
Re: The weirdest problem I have ever met
Jodizzz wrote: Result: Email was labelled as very very high spam. Mail headers as below Unfortunately those headers do not include the actual rules that hit. Without knowing this, we can only give you educated guesses. Please include the list of hits for the message. It should be possible to get your software to either put this in a log or include the hits or report in the mail (as headers or a MIME part). In conclusion the email is only treated as major spam if it is from that particular user problemacco...@clientdomain.com and via the LAN/ISP connection. This *really* sounds like a high-scoring AWL entry. Are you sure that you have removed the relevant entries from the AWL for the right user? What seems to be the problem? This is so weird! The problem *seems* to be that the AWL contains a very high score for that address+relay combo. The fact that the last score is so much lower than the previous score, while still being very high, could be an indication of this. Of course, since we don't know what rules actually hit the message, this is just a guess. Regards /Jonas -- Jonas Eckerman Fruktträdet & Förbundet Sveriges Dövblinda http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
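For readers wondering how an AWL entry can drag a message up to a "very very high" score: the adjustment SpamAssassin applies can be sketched roughly as below. This is based on the documented auto_whitelist_factor behaviour (default 0.5); check your version's AWL plugin documentation for the exact formula.

```python
# Rough sketch of the AWL score adjustment (not SpamAssassin's actual
# code): the final score is blended toward the historical mean score
# stored for this sender+relay pair, weighted by auto_whitelist_factor.

def awl_adjusted(rule_score, historical_mean, factor=0.5):
    """Blend the message's rule score with the sender's historical mean."""
    return rule_score + factor * (historical_mean - rule_score)

# A sender+relay pair whose stored history averages 30 points drags a
# 4-point message up to 17 -- enough to look like major spam.
print(awl_adjusted(4.0, 30.0))   # 17.0
```

This is why removing the stale AWL entry (for the right user) makes the symptom disappear, and why successive messages score lower each time: every new message pulls the stored mean back down.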
Re: Personal SPF
Charles Gregory wrote: Please, stop the PSPF discussions and go implement something that will work without changing the whole internet LOL! Please stop discussing ideas? To be fair, this is the SpamAssassin users list. The purpose of this list isn't the discussion of the validity of ideas about possible future extensions to SPF, DKIM or whatever, except as to how those ideas might have a direct impact on the usage or development of SpamAssassin. I can't speak for others, but this is one reason why I haven't given my opinions about your proposed PSPF. Regards /Jonas -- Jonas Eckerman Fruktträdet & Förbundet Sveriges Dövblinda http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Personal SPF
Matus UHLAR - fantomas 5.5.'09, 8:55: > > Strictly speaking, getting them to use it consistently and properly will > > be MORE difficult, > more difficult than what? I parsed it as him stating that getting users to use his proposed PSPF will be more difficult than getting them to use authenticated SMTP to his servers. /Jonas
Re: Personal SPF
On 04.05.09 10:31, Charles Gregory wrote: >> OUR mail server *requires* that a user be connected via our dialups. [...] Matus UHLAR - fantomas wrote: Configuring the mail account in their MUA independently of their internet connection is much easier than changing SMTP server every time they connect to another network. This really is an important point. Your current system makes things unnecessarily difficult for roadwarriors. Being able to use authenticated SMTP to port 587 at *one* address is much easier than having to set up different outgoing servers for different connections, which can become quite tedious if you tend to use the connections provided by hotels, for example. FWIW, this was actually the main justification here for setting up authenticated SMTP using a custom SMTP proxy which authenticated against different (local) POP mailboxes depending on user name and server IP. Our users (me included) understandably wanted mail on laptops to be easier. The possibility of using SPF and DKIM were just bonuses. /Jonas -- Jonas Eckerman Fruktträdet & Förbundet Sveriges Dövblinda http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: Personal SPF
Charles Gregory wrote: Proposal: "Personal SPF" - A DNS-based lookup system to allow individual senders of e-mail to publish a *personal* SPF record within the context of their domain's SPF records, that would identify an IP or range of IPs which they would be 'stating' are the only possible sources of their mail. The only other possible work-around for this is to enforce a 'hard' SPF and establish 'POP before SMTP' or 'SMTP auth' protocols, then spam our membership informing them that use of our server is mandatory. But that would cause problems, because we don't really know *who* is using third party servers, and too many of them wouldn't read the notice... :( Why do you think it would be easier to get those of your users that send through other servers to publish a personal SPF record with correct information about the external IP address of the outgoing relay they use than it would be to get them to use SMTP auth with your servers? How many users have any idea at all about the external IPs of their ISP's mail relays? How many of the users who do have a good idea about the external IPs of their ISP's mail relays have no idea how to tell their mail client to send using authenticated SMTP with your servers? I might just be confused, but to me it seems that your solution requires more from your users, not less. And, even if (big, big if) the big mail receivers (Yahoo, Google, big ISPs, etc) do eventually support your personal SPF, it'll take years until it becomes effective. Regards /Jonas But if we had a 'personal' system, then for as many members as we reach (who pay attention to notices), we could let them 'opt in' to a voluntary "I only send my mail from here" type of system, and then that would at least provide *some* address protection/confirmation. Do they all have static IP addresses or do you simply allow users from dynamic addresses to send mail directly? As noted above, we can control our (dynamic) dialups, but not third party usage. 
So effectively, anyone, anywhere, can use an hwcn.org return address. This is something I'd really like to limit to legitimate users without enforcing use of our mail server only (though I realize this may be the best long term solution for us). Of course, my suggestion also hinges on whether there are a sufficient number of other systems out there in a similar 'position' as us, who would also benefit from this 'next level' of SPF verification... - Charles -- Jonas Eckerman Fruktträdet & Förbundet Sveriges Dövblinda http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
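To make the proposal concrete, here is a purely hypothetical sketch of how a receiver might evaluate such a "personal SPF" record. The record format ("v=pspf1"), its tokens, and the idea of publishing it per-user are all invented for illustration; nothing like this is specified anywhere.

```python
import ipaddress

# Hypothetical record a user might publish in DNS, e.g. as a TXT record
# under their domain. Both the "v=pspf1" tag and the ip4: token syntax
# are made up for this sketch (loosely mirroring classic SPF).

def pspf_permits(record, sender_ip):
    """Return True if sender_ip falls inside any ip4: range of the record."""
    ip = ipaddress.ip_address(sender_ip)
    for token in record.split():
        if token.startswith("ip4:"):
            # strict=False lets a bare address stand for a /32 network
            if ip in ipaddress.ip_network(token[4:], strict=False):
                return True
    return False

record = "v=pspf1 ip4:192.0.2.0/24 ip4:198.51.100.7"
print(pspf_permits(record, "192.0.2.55"))    # True
print(pspf_permits(record, "203.0.113.9"))   # False
```

Even in this toy form, the problem raised above is visible: the user has to know and maintain the external addresses of every relay they send through, which is more work than configuring SMTP auth once.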
Re: 'anti' AWL
RW wrote: By your chronological definition of first and last (which is the same as mine), that's the FIRST non-private address. Or the address in the fake Received header the spambot put in the mail? I hope this is not how it works... It makes sense to me: if I send you an email, the AWL entry should use my IP address, not a random gmail server. Considering that lots of people have dynamic routable addresses, this seems like a bad idea for a big group of people not using webmail. Regards /Jonas -- Jonas Eckerman Fruktträdet & Förbundet Sveriges Dövblinda http://www.fsdb.org/ http://www.frukt.org/ http://whatever.frukt.org/
Re: user-db size, excess growth...limits ignored
Linda Walsh wrote: Yeah -- then this refers back to the bug about there being no way to prune that file -- it just slowly grows and needs to be read in when spamd starts(?) No. The AWL is stored in a database, and spamd does not read the whole database into memory. It just looks up and updates the address pairs as needed. The same principle is true for the bayes database. So the only real harm is the increased read-initialization and the run-time AWL length? I don't know what you mean by "run-time AWL length", but I don't think the time to open a Berkeley DB grows much as the file grows. What will become slower as the file grows is the database updates and, to a lesser degree, the lookups. If the AWL or bayes database grows enough for this to actually do harm, I'd suggest moving to a SQL database (where expiration of old address pairs is pretty easy to implement). Regards /Jonas
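As an illustration of why expiration is easy once the AWL lives in SQL, here is a self-contained sketch using a simplified stand-in for the AWL table. The real SpamAssassin SQL schema has more columns (username, email, ip, count, totscore), and depending on your schema version a last-hit timestamp may need to be added; the names below are for illustration only.

```python
import sqlite3, time

# Simplified stand-in for SpamAssassin's awl table; the real schema
# differs, and your version may or may not carry a last-hit timestamp.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE awl (email TEXT, ip TEXT, count INT, last_hit INT)")
now = int(time.time())
db.executemany("INSERT INTO awl VALUES (?,?,?,?)", [
    ("old@example.org", "192.0.2.1",    3, now - 400 * 86400),  # stale pair
    ("new@example.org", "198.51.100.2", 5, now - 2 * 86400),    # recent pair
])

# Expire address pairs not seen for a year -- the kind of pruning that
# is one statement in SQL but has no supported equivalent for the
# Berkeley DB backend.
db.execute("DELETE FROM awl WHERE last_hit < ?", (now - 365 * 86400,))
remaining = [row[0] for row in db.execute("SELECT email FROM awl")]
print(remaining)   # ['new@example.org']
```

A cron job running such a DELETE keeps the table from growing without bound, which also keeps updates and lookups fast.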
Re: How to disable DNSWL?
Matthias Leisi wrote: Speaking of which, it may actually make sense to use all of dnswl.org's entries as trusted_networks entries... That seems like a way to get false positives when someone with a listed dynamic IP sends through the smarthost of their ISP or ESP. By extending trust to the ESP/ISP smarthost, SA will do RBL checks on the system that sent the mail to the smarthost. That system may well be a SOHO or private user with a dynamic IP address, possibly even a dynamic IP address that has previously been used by someone else to send spam. (Please note that I've currently got a fever and may therefore be tricked by a brain that isn't working optimally into writing things that simply aren't correct...) Regards /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Re: netlawyers: why is this patentable?
Michael Scheidell wrote: wonder why this is patentable? Loads of things are patentable in the sense that someone manages to get a patent. That doesn't mean the patent can withstand a challenge. You never know for sure whether a patent (or a trademark) is fully valid until it is disputed (in court) and survives. > sounds like prequeue filtering available in every mta since the early 90's... It sounds more like the bastard child of a packet-sniffing, traffic-analyzing firewall and a spam-scanning SMTP proxy. looks for 'helo/mailfrom/recpt to' then drops or accepts connection. From the abstract it's possible that it does so at a firewall level rather than as an MTA, though it might also describe a more common SMTP proxy. /J -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Re: country in africa
RobertH wrote: looking hard? of course i did. You did say you didn't see Nigeria anywhere. I took this to mean that you didn't see it anywhere in the SA default rules, which you would have done using a quick grep. Now I don't know what you meant when you said you didn't see it anywhere. You said it wasn't mentioned, which it obviously was. how many legitimate emails a day do you people get with the word Nigeria in it? I get one every now and then. Those usually have to do with spam, but not always. Sometimes we get quite a few from TT (a Swedish news agency). At those times it's likely to also be mentioned in our own specialized newspaper (made for deafblind people) as well as in several newsletters people subscribe to. We have had correspondence with non-profits in Nigeria as well, but I've no idea how common that is. In contrast, I can't even remember the last time a 419-type mail mentioning Nigeria slipped through our filter. As an aside: We once got a legitimate mail from a Nigerian NGO seeking financial help for their work with disabled people. We're a Swedish NGO for deafblind people with a few projects in Africa, so it's not a spammy thing for them to do. It got stuck in our quarantine (which is reviewed most workdays), so we actually received it. I do feel sorry for them since it was most likely stopped almost everywhere. Their mail mentioned money, transfers of money, the government of Nigeria and banks, and was sent from Nigeria. yeah, that is what i thought. :-) It was? when i get an nigerian email scam email that hits squat, well you get the idea. Yeah. You get mail that I don't. I don't get Nigerian scam email myself, and our users don't report any to me. We reject and quarantine at 9 points, and reject without quarantine at 18 points. So Nigerian scams get at least 9 points here. Most Nigerian scam mail is either stopped by our greylist or gets 18 points or more, and virtually none gets lower than 9 points here. 
Regards /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Re: country in africa
RobertH wrote: how is it that the country in africa so often mentioned in email scams is not worth a point in SA default config You mean the rules? nor do i see it anywhere You must not be looking very hard. It's there, both in the default ruleset and in the updated ruleset, but not as a single-word rule:

jo...@chip:~$ grep -i nigeria /var/db/spamassassin/3.002005/updates_spamassassin_org/*
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:# SpamAssassin rules file: advance fee fraud rules (Nigerian 419 scams)
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:body __FRAUD_NEB /(?:government|bank) of nigeria/i
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:body __FRAUD_BEP /\b(?:bank of nigeria|central bank of|trust bank|apex bank|amalgamated bank)\b/i
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:body __FRAUD_YQV /nigerian? (?:national|government)/i
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:describe ADVANCE_FEE_2 Appears to be advance fee fraud (Nigerian 419)
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:describe ADVANCE_FEE_3 Appears to be advance fee fraud (Nigerian 419)
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:describe ADVANCE_FEE_4 Appears to be advance fee fraud (Nigerian 419)

jo...@chip:~$ grep -i nigeria /usr/local/share/spamassassin/*
/usr/local/share/spamassassin/20_advance_fee.cf:# SpamAssassin rules file: advance fee fraud rules (Nigerian 419 scams)
/usr/local/share/spamassassin/20_advance_fee.cf:body __FRAUD_NEB /(?:government|bank) of nigeria/i
/usr/local/share/spamassassin/20_advance_fee.cf:body __FRAUD_BEP /\b(?:bank of nigeria|central bank of|trust bank|apex bank|amalgamated bank)\b/i
/usr/local/share/spamassassin/20_advance_fee.cf:body __FRAUD_YQV /nigerian? (?:national|government)/i
/usr/local/share/spamassassin/20_advance_fee.cf:describe ADVANCE_FEE_2 Appears to be advance fee fraud (Nigerian 419)
/usr/local/share/spamassassin/20_advance_fee.cf:describe ADVANCE_FEE_3 Appears to be advance fee fraud (Nigerian 419)
/usr/local/share/spamassassin/20_advance_fee.cf:describe ADVANCE_FEE_4 Appears to be advance fee fraud (Nigerian 419)

/Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Re: excessive scan time
Brian J. Murrell wrote: I'd also suggest using SQL for user preferences. The user interface (i.e. editing a file) for user preferences is a different story. Now users need to know how to edit SQL records, or I need to install a web interface for that. Or you could use a small script that reads the user's preferences from a file (when the file has been modified) and updates the SQL database. Regards /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
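A minimal sketch of such a sync script, using sqlite3 so it runs self-contained. The userpref table with username/preference/value columns follows the usual SpamAssassin SQL prefs layout, but treat the table and column names, and the very simple prefs-file parsing, as assumptions to adjust for your own setup.

```python
import sqlite3

# Parse a user_prefs-style file ("option value" lines, '#' comments)
# and upsert the pairs into a userpref table. Real prefs files can be
# messier (tabs, repeated options); this parser is deliberately minimal.
def sync_prefs(username, prefs_text, db):
    for line in prefs_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        option, _, value = line.partition(" ")
        # delete-then-insert keeps the table in step with the file
        db.execute("DELETE FROM userpref WHERE username=? AND preference=?",
                   (username, option))
        db.execute("INSERT INTO userpref (username, preference, value) "
                   "VALUES (?,?,?)", (username, option, value.strip()))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE userpref (username TEXT, preference TEXT, value TEXT)")
sync_prefs("jonas", "required_score 7.5\n# comment\nskip_rbl_checks 1\n", db)
print(sorted(db.execute("SELECT preference, value FROM userpref")))
# [('required_score', '7.5'), ('skip_rbl_checks', '1')]
```

Run from cron (or an inotify watcher) only when the file's mtime has changed, users keep editing a plain file while spamd reads its prefs from SQL.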
Re: excessive scan time
Brian J. Murrell wrote: One thing worth noting is that I have spamassassin using ~/.spamassassin here and people's home dirs can be (i.e. NFS) mounted from remote machines (i.e. their primary workstations), which do occasionally get shut down. If you're not already using a SQL database for bayes and AWL I'd suggest you do that. I'd also suggest using SQL for user preferences. I wonder what happens in the MTA->SA->local delivery process chain when ~/.spamassassin is unavailable, or worse, on a stale mount. With bayes, AWL and user prefs in a SQL database that problem ought to be avoided. (Maybe there's more than those that should be moved out of ~/.spamassassin though.) /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Re: Botnet plugin
Henrik K wrote: Less info only if you are running a sad MTA that doesn't properly resolve. I guess the SOHO rule is an exception, That was what I meant. :-) Check for IP in hostname? Does anyone have actual stats that it's somehow better than a generic \d+-\d+ regex or whatever? Sometimes it's just better to KISS. I don't have any stats now, but I use a similar check in our selective greylisting and once checked stats for that. There was a clear difference (catching more FQDNs with fewer FPs) when I changed from a simple check to a more complex one. (Comparing the FQDN with the IP address allows you to match patterns that might otherwise lead to FPs.) Regards /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
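To illustrate the kind of "compare the name to the address" check being discussed (this is my own sketch, not Botnet's actual code): matching the client's real octets, rather than any digit run, is what keeps ordinary numbered hostnames from triggering.

```python
import re

def ip_in_hostname(fqdn, ip):
    """Heuristic: does the rDNS name embed the client's own IP address?
    Requiring the actual octets (decimal, or the address hex-packed) is
    stricter than a bare digits-and-dashes pattern, so a name like
    mta5-12.example.net for an unrelated IP does not hit."""
    octets = ip.split(".")
    tokens = re.split(r"[.\-_]", fqdn.lower())
    dec_hits = sum(1 for o in octets if o in tokens)
    hexed = "".join("%02x" % int(o) for o in octets)
    return dec_hits >= 3 or hexed in fqdn.lower()

print(ip_in_hostname("host-94-79-44-98.customer.example.net", "94.79.44.98"))  # True
print(ip_in_hostname("mta5-12.example.net", "94.79.44.98"))                    # False
```

A generic \d+-\d+ regex would flag both names above; tying the match to the connecting IP is the refinement that cut FPs in the greylisting stats mentioned.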
Re: Botnet plugin
Mark Martinec wrote: In a while I'll send a patch to the author. That is noble, but apparently it doesn't have any effect. When Botnet was known as RelayChecker I made a suggestion to the author. That suggestion was incorporated in the code. For some reason I take that as an indication that my suggestion did have an effect at that time, and that there is a possibility that my new suggestion will also have an effect (depending on, among other things, what the author thinks about it). I also seem to recall that the author gives credit (in some file included in the Botnet tar) to a whole bunch of people for suggestions and/or changes. Presumably at least some of those suggestions and/or changes did have some kind of effect on the plugin. /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Re: Botnet plugin
Benny Pedersen wrote: i have changed to use BadRelay from http://sa.hege.li/BadRelay.pm http://sa.hege.li/BadRelay.cf After reading BadRelay.pm I see that it does not really replace Botnet. Some of the differences in what is checked are due to Botnet doing DNS lookups while BadRelay avoids that. That's fair enough since one of the points of BadRelay is to avoid those lookups. It does mean that BadRelay has less info to base decisions on than Botnet though. One difference is simply due to the fact that all BadRelay does is the simple regexp matches: BadRelay doesn't have Botnet's check for IP in host name, which it could do without DNS lookups. Also, it should be a small and simple change to Botnet in order to use some of its functions without making it do its own DNS lookups AFAICT. The eval checks "botnet_ipinhostname", "botnet_clientwords" and "botnet_serverwords" should be able to work without any DNS lookups with this small change. I might do a patch for this (if there is any interest). What would be nice though would be a plugin that: 1: Has a simple (for the user) cf option to decide whether *any* additional DNS lookups should *ever* be done or not. 2: If told to do lookups, does as many of those as possible asynchronously, the way SA's DNSBL checks are done. This would require a redesign of the plugin's structure though. I *might* do this (in that case I'd do a completely new plugin based on Botnet) if I get time for it, but I currently have no way of knowing when or if that might be. Regards /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Botnet plugin patch - avoid FPs from DNS timeouts
Hello! Here's a small patch for the Botnet plugin. The difference from the original is that it doesn't treat a timeout or DNS error the same as a not-found answer. This should avoid FPs due to overloaded or slow DNS responses. This patch is against a version that has already been patched in order to get short timeouts from the resolver. When using the original version, DNS timeouts will probably not occur as often and this patch will not make as big a difference. Please note that I've only tested this since earlier today. If you see or notice a mistake or problem in it, please tell me and the list about it. Regards /Jonas -- Jonas Eckerman, FSDB http://www.fsdb.org/

--- Botnet.pm	Thu Jan 15 21:35:42 2009
+++ Botnet.pm.new	Thu Jan 15 21:36:25 2009
@@ -721,8 +721,16 @@
     dnsrch=>0,
     defnames=>0,
   );
-  if ($query = $resolver->search($name, $type)) {
-    # found matches
+  if ($query = $resolver->send($name, $type)) {
+    if ($query->header->rcode eq 'SERVFAIL') {
+      # avoid FP due to timeout or other error
+      return (-1);
+    }
+    if ($query->header->rcode eq 'NXDOMAIN') {
+      # found no matches
+      return (0);
+    }
+    # check for matches
     $i = 0;
     foreach $rr ($query->answer()) {
       $i++;
@@ -744,12 +752,12 @@
       }
     }
   }
-  # $ip isn't in the A records for $name at all
+  # found no matches
   return(0);
 } else {
-  # the sender leads to a host that doesn't have an A record
-  return (0);
+  # avoid FP due to timeout or other error
+  return (-1);
 }
 }
 # can't resolve an empty name nor ip that doesn't look like an address
Botnet plugin (was: Temporary 'Replacements' for SaneSecurity)
Daniel J McDonald wrote: I too found botnet to be a great source of FP. By combining it with p0f it's moderately useful. I just found one reason for FPs in the Botnet plugin: it doesn't make a difference between timeouts (and other DNS errors) and negative answers. So if your DNS server/proxy is overloaded (or slow for some other reason), you'll get FPs. Since 15 minutes ago, I'm running a slightly modified version of the plugin that tries to avoid this. In a while I'll send a patch to the author. Apart from this the plugin seems to work fine here with a score of +2 (with an extra +1 if p0f says it's a Windows system). Regards /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/
Re: sa-update does not pick up newest German spam wave
Richard Hartmann wrote: While I agree in general, the text is very static and antivirus eats CPU, SA does not (so much). What AV application do you use? Is it daemonized or does it have to load it's database for every call? Here SA uses lots more CPU than clamd and fprotd does. /Jonas -- Jonas Eckerman, FSDB & Fruktträdet http://whatever.frukt.org/ http://www.fsdb.org/ http://www.frukt.org/