Re: FSL_HELO_BARE_IP_2 & RCVD_NUMERIC_HELO

2013-10-16 Thread Jonas Eckerman
>Operators of newsgroups which mirror/archive mailing
>lists, and allow posting from a web interface, are adding forged
>Received: headers before sending an email to the respective list
>server.

In what way are they forged? Do they contain addresses that don't match the 
system adding the Received line or the system it received the message from?

>In both cases the last two Received: headers in each message are
>forgeries as no SMTP transaction occurred.

Do those headers say that an SMTP transaction occurred? If they don't, what is 
forged?

I'm not sure whether you mean "last in insertion order" or "last in reading 
order", so I'll answer for both. :-)

Insertion order:

>Received: from list by plane.gmane.org with local (Exim 4.69)
>   (envelope-from )
>   id 1VVzEY-0005lJ-P1
>   for debian-u...@lists.debian.org; Tue, 15 Oct 2013 09:40:02 +0200

This one says it was received locally without using SMTP. This is normal when a 
message is sent/queued by a local application.

>Received: from plane.gmane.org (plane.gmane.org [80.91.229.3])
>   (using TLSv1 with cipher AES256-SHA (256/256 bits))
>   (Client did not present a certificate)
>   by bendel.debian.org (Postfix) with ESMTPS id 7DD8CA6
>   for ; Tue, 15 Oct 2013 07:40:05 +0000 (UTC)

This one says that the message was received with ESMTPS. Do you 
know that it wasn't?

Reading order:

>Received: from 94.79.44.98 ([94.79.44.98])
>by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
>id 1AlnuQ-0007hv-00
>for ; Sun, 13 Oct 2013 19:40:43 +0200

This one says it was received with ESMTP. Again, do you know it wasn't?

>Received: from freehck by 94.79.44.98 with local (Gmexim 0.1 (Debian))
>id 1AlnuQ-0007hv-00
>for ; Sun, 13 Oct 2013 19:40:43 +0200

This one says it was received locally without SMTP. This is perfectly normal if 
it was received from a local application, for example a web server running a 
PHP script or a gateway fetching messaging from something else.

>I'm sure this violates more
>than one SMTP RFC, but I doubt Gmane will change the way they do this
>any time soon.

I don't think it does. Trace headers are useful for mail regardless of the 
protocol used for the transfers between systems/applications, and are defined 
in the Internet Message Format RFCs (the 822 descendants; I'm not sure what 
the current one is, but if you start at 2822 you should be able to find it).

(Also, do the SMTP RFCs really apply when you're not using SMTP?)

Regards
/jonas
--


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-21 Thread Jonas Eckerman

On 2012-08-15 20:56, Ben Johnson wrote:


On 8/15/2012 2:24 PM, John Hardin wrote:

You may also want to set up some mechanism for users to submit
misclassified messages for training.



That sounds like a good idea.
[...] this server runs Ubuntu 10.04 with Dovecot


Since you're using Dovecot you might be able to use the antispam plugin 
for Dovecot. It lets you specify a special spam folder, and when users 
move mail into or out of that folder the messages are spooled or piped 
for retraining as spam or ham.


This way, the user running sa-learn does not need access to the users' 
maildirs.


<http://wiki2.dovecot.org/Plugins/Antispam>
<http://johannes.sipsolutions.net/Projects/dovecot-antispam>
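A minimal sketch of what that might look like with the plugin's pipe backend 
feeding sa-learn. The option names below are from memory and vary between 
antispam plugin versions, so treat them as placeholders and check the plugin 
documentation linked above:

```
# Dovecot antispam plugin sketch (e.g. conf.d/90-antispam.conf).
# Option names are illustrative; verify against your plugin version.
plugin {
  antispam_backend                  = pipe
  antispam_spam                     = Spam        # folder(s) that count as spam
  antispam_trash                    = Trash       # moves here are ignored
  antispam_pipe_program             = /usr/bin/sa-learn
  antispam_pipe_program_spam_arg    = --spam
  antispam_pipe_program_notspam_arg = --ham
}
```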

Regards
/Jonas
--
Jonas Eckerman
http://www.truls.org/


Re: which is better for virtual domains

2012-05-08 Thread Jonas Eckerman

Please keep discussions on-list.

On 2012-04-23 20:44, "Николай Г. Петров" wrote:


My mail system:
OS: FreeBSD
МТА: sendmail
MDA: maildrop
database: ldap (openldap)
pop/imap: courier-imap


I still have no idea how you call spamassassin or spamc or if you use 
some other method to connect to spamd.



-l -c -i 127.0.0.1 -m 3 --max-conn-per-child=5 --round-robin -u vmail
-x --virtual-config-dir='/corpmail/%d/.spamassassin/' -d -r ${pidfile}
-s /var/log/spamd.log



, but in log I have a:



spamd[7256]: spamd: using default config for root:
/corpmail//.spamassassin//user_prefs

Why 'root'?


Maybe because you haven't successfully told spamd which user/mail address 
to scan the mail for, so it falls back to the default.
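If spamc is being called from maildrop, one way to pass the recipient along is 
something like this sketch. The $RECIPIENT variable is illustrative only; the 
real variable depends on how maildrop is invoked by your MTA:

```
# maildroprc sketch: hand the message to spamc and tell spamd which
# virtual user (user@domain) it is for, so --virtual-config-dir can
# expand %d to the right domain instead of falling back to root.
xfilter "/usr/local/bin/spamc -u $RECIPIENT"
```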



Why do the domains not appear?


Maybe because spamd doesn't know the domain.


My question is: previously, you said that you save the AWL in MySQL. What is
'awl', an auto-white-list?


Yes, AWL is short for Auto White-List. (Which is a bad name for it.)


And may I save in ldap?


I don't know. I've never used SA with LDAP.


I read manual about
ldap database: I don't understand atribute:

spamassassin: add_header all Foo LDAP read

What do they mean with this example?
Is it an 'awl' or 'user_prefs' or something else?


I have no idea in what man page you found that, so I have no 
context at all.



If I understand correctly, I'm trying to reach a layout on my mail system 
like this (please criticise if something is wrong):



/corpmail/domain1/.spamassassin/bayes_seen
/corpmail/domain1/.spamassassin/bayes_toks
/corpmail/domain1/.spamassassin/auto-whitelist
/corpmail/domain1/user/Maildir/user_prefs - (optionally)


AFAICT you need to skip the optional one, since you can't keep multiple 
user dirs for one user, and in your scheme the domain is used instead of 
the user.



I can automate train spamassassin from individual MDA filter per each of
users, but normal mesage possible goto 'ham'.


I don't know what "normal mesage possible goto 'ham'" means here.


For re-learning, I'm thinking of configuring forwarding of messages to
[spam|nospam]@domain[1|2].ru for each of the domains, and a cron script
that re-learns from the spam|nospam folders for the domains.



How do you think it will work? Or may be some better idea?


I've done something similar myself. How well it works depends a lot on 
your users.


/Jonas
--
Jonas Eckerman
http://www.truls.org/


Re: which is better for virtual domains

2012-04-23 Thread Jonas Eckerman

On 2012-04-23 12:23, "Николай Г. Петров" wrote:


If there are a lot of virtual domains with many virtual users in them,
which is the better way to configure SpamAssassin's spam/ham handling:
   - individual database for each of users
   - or the same database for all of supported domains


You forgot one option:
 - individual database for each domain

The answer depends on the situation (I assume you're asking about the 
bayes database(s)).


If the users have separate bayes databases, will they actually train 
them? If the users don't train their databases, a common database 
could work a lot better than individual databases.


How much do the mail streams for the domains have in common? If they 
have a lot in common, it makes sense to have a common bayes database for 
them. Otherwise separate databases for each domain might be better.


Regards
/Jonas
--
Jonas Eckerman
http://www.truls.org/


Re: checking and processing scores different

2010-04-29 Thread Jonas Eckerman

On 2010-04-29 14:58, Raphael Bauduin wrote:

>>> The difference is:
>>> * BAYES_95 in place of BAYES_05
>>> * score is 6.9 in place of 3.9


http://pastebin.org/192054


As you say the mail has been processed twice, with different 
configurations or databases, or with the same databases but different users.


Since the headers only contain full scores for one of the passes, it's 
impossible to know for sure where all the difference in scores came from.


As you noted, one of the passes does have a negative bayes score, while 
the other has no bayes score, but that's not the only possible difference.


Both passes have AWL scores, but we cannot see what score the AWL 
applied to one of them. The AWL score may well be quite different 
between the two passes.


There's nothing strange in getting different total scores when running 
with different databases and/or configurations. Both bayes and AWL are 
supposed to be able to give different scores to the same mail in 
different mail streams.


It is of course possible that there are other differences in scores as 
well if the two passes were run with different local score settings.


/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: dcc: [26896] terminated: exit 241

2010-04-21 Thread Jonas Eckerman

On 2010-04-21 20:05, Michael Scheidell wrote:


On 4/21/10 2:03 PM, Stefan Hornburg (Racke) wrote:

The part of MySQL which is used in Debian (the code without the manual)
is licensed under GPL.



so, the same with DCC.


Not as far as I can see. At both <http://www.rhyolite.com/dcc/> and 
<http://www.dcc-servers.net/dcc/> they link to another, non-generic, 
license.


Quote about the free license from the general info page:
---8<---
You can redistribute unchanged copies of the free source, but you may 
not redistribute modified, "fixed," or "improved" versions of the source 
or binaries.

---8<---

The actual license says:
---8<---
This agreement is not applicable to any entity which sells anti-spam 
solutions to others or provides an anti-spam solution as part of a 
security solution sold to other entities, or to a private network which 
employs the DCC or uses data provided by operation of the DCC but does 
not provide corresponding data to other users.


Permission to use, copy, modify, and distribute this software without 
changes for any purpose with or without fee is hereby granted, provided 
that the above copyright notice and this permission notice appear in all 
copies and any distributed versions or copies are either unchanged or 
not called anything similar to "DCC" or "Distributed Checksum 
Clearinghouse".

---8<---

Which is more permissive than the info page indicates, but it's not the 
GNU General Public License.


Debian *might* be able to distribute DCC under another name, like they 
did with Firefox / Iceweasel etc.


Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: More freemail URI spam

2010-04-20 Thread Jonas Eckerman

On 2010-04-17 23:51, Alex wrote:


Somebody on this list wrote a parser to actually parse shorteners to
their obscured URLs.



That would sure be great. I hadn't seen that, but would like to know
more about it. Sounds like a better solution...


That'd be me. It's a plugin called URLRedirect and it's available at
<http://whatever.frukt.org/spamassassin.text.shtml>

It can use Marc's DNS based URL shortener list.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: More freemail URI spam

2010-04-20 Thread Jonas Eckerman

On 2010-04-17 21:04, Alex wrote:


Maybe someone knows of a list of all the URL shorteners to be used in
a combo uri/meta rule?


I very much doubt that you'll find a list of *all* the URL shorteners. 
New ones crop up all the time, and old ones disappear.


Marc Perkel posted about a DNS based list he's hosting a while back. I'm 
attaching that message to this one.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/
--- Begin Message ---
I don't know if it will be useful but I made a short URL provider list 
that is DNS readable.


I got the list here:

http://longurl.org/services

It's a host name RBL and you can read it as follows:

dig tinyurl.com.shorturl.junkemailfilter.com

Let me know if you find a use for it.



--- End Message ---
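A host-name DNS list like the one above could probably be queried from 
SpamAssassin with a URIDNSBL-plugin rule. A sketch, untested: the rule name 
is invented and the score is a placeholder, so verify the syntax against the 
URIDNSBL plugin documentation before use:

```
# Sketch only: look up the domain of each URI in the message body
# against the shorturl zone; a 127.x answer means it is listed.
urirhssbl  URIBL_SHORTENER  shorturl.junkemailfilter.com.  A
body       URIBL_SHORTENER  eval:check_uridnsbl('URIBL_SHORTENER')
describe   URIBL_SHORTENER  Body contains a URL from a known URL shortener
score      URIBL_SHORTENER  0.1
```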


Re: on greylisting...

2010-04-07 Thread Jonas Eckerman

On 2010-04-01 19:06, Adam Katz wrote:

For what it's worth, I reconfigured my greylisting relay from a
blanket delay to delaying only spamcop neighbors, anything that hits a
DNSBL, and any Windows *desktop* (using p0f).


I once tried that, but had to back away from it. The groupware system 
FirstClass installed on Windows NT+ machines (of different flavors, including 
"desktop" OSes) is (or was) popular with Swedish disability 
NGOs, and being an NGO for deafblind people, we need to be able to 
communicate with those systems.


I probably should analyze our current mail stream to see if we still get 
lots of mail from FC systems, and what OSes those seem to be running on 
nowadays.


(The fact that admins of the above mentioned FirstClass systems tended to 
configure outgoing SMTP in "odd" ways also made me put in some 
country/domain-based exemptions...)



If I recall correctly, Jonas's implementation also uses p0f and could
therefore benefit from my analysis.


Yes, my implementation can use p0f. It uses a list of tests that are 
checked in order to decide whether a sending system should be handled by 
the greylist or not.


I'm currently using tests for OS (p0f), DNS black- and white-lists, 
RDNS, MX, SPF, country (GeoIP), sender domain, local spam/ham history 
and local outgoing history to make that decision.



p0f's results with the (perl-compatible) regular expression
 /Windows (?:XP|2000(?!SP4)|Vista)/
will safely block only desktops.


Interesting. I hope I'll have time to check that against our logs. It 
would be nice to have Windows desktops greylisted while still being able 
to exempt Windows mail and groupware systems.
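For anyone wanting to sanity-check that expression before using it, a quick 
sketch in Python. The sample labels are assumptions about what p0f emits, not 
verified p0f output:

```python
import re

# Adam's expression: the negative lookahead keeps plain "Windows 2000"
# matching while letting "Windows 2000SP4" (often a server) through.
desktop = re.compile(r"Windows (?:XP|2000(?!SP4)|Vista)")

# Hypothetical p0f OS labels, for illustration only.
for label in ("Windows XP", "Windows 2000", "Windows Vista"):
    assert desktop.search(label)          # desktops: greylist these
assert not desktop.search("Windows 2000SP4")  # excluded by the lookahead
assert not desktop.search("Windows 2003")     # servers are not matched
```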


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: ATTN DEVELOPERS: Mega-Spam

2010-03-30 Thread Jonas Eckerman

On 2010-03-30 13:31, Kai Schaetzl wrote:

Jonas Eckerman wrote on Tue, 30 Mar 2010 00:41:01 +0200:



Unless the greylisting is done *after* receiving the body. Of course,
this will spank innocent senders as well.



Ooops? It spanks *yourself*.


Not really. It does force us to accept the mail before rejecting it, but 
it still rejects a lot of stuff that would otherwise have been scanned 
by ClamAV and SpamAssassin before being rejected.


So, while it does not save as much bandwidth and work as greylisting 
after RCPT would, it still saves compared to no greylisting. And the 
filter does some more stuff. For example:


We also greylist with *one* temporary failure at connect for each host 
the first time the gateway sees it. This stops more than I first expected 
when I tried it.


Once a mail from an MTA has passed the greylist test, that IP is exempt 
from the greylist.


We keep track of behaviour we don't like (unknown RCPTs, spam, too many 
retries before the greylist period of 3 minutes has passed, etc.) and 
tempfail hosts at connect based on those counters.


We also make exceptions from the greylist based on DNS whitelists, RDNS 
etc so that most mail from real outgoing MTAs pass right through it.


> Good strategy.

My filter works for us.

Most spam is stopped without the gateway having to scan it with 
SpamAssassin.
Most ham is passed through without being subjected to the greylist or 
being scanned by SpamAssassin.


And if there still are any stupid MTAs that can't handle tempfails 
correctly at earlier stages trying to send mail to us, we have a good 
chance of receiving it.


When I first implemented greylisting I did the tempfailing after RCPT, 
but some stupid Novell MTA and a security appliance (I think it was 
from Symantec) saw no difference between temporary failures and 
permanent rejects of RCPT TO. And of course one of them discarded 
the response it got from our server when bouncing the mail back to the 
sender. Even worse, some other idiotic piece of crap (I forget what) 
reacted to temporary failures at RCPT by simply deleting the mail from 
its queue without notifying anyone.


So, we lost some incoming mail from organizations that for different 
reasons didn't just throw out or fix their junk, and I moved the 
greylist to after receiving the message data.


Hopefully I could now move it to RCPT, but I actually like being able 
to log the message-id and subject of greylisted mail, and I know it works 
the way it is now.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: ATTN DEVELOPERS: Mega-Spam

2010-03-30 Thread Jonas Eckerman

On 2010-03-30 01:29, Brent Kennedy wrote:


Graylisting does work.


I know it works. That's why I said I like it because it stops spam. Been 
using my own implementation for years.



I think after I turned it on, the botnet plug-in got bored.  My stats for it
dropped significantly.  So that’s my proof it does adversely affect botnets.


No, that's your proof that it has a positive impact on your incoming 
mail stream. It does not prove that it has a significant negative 
impact on the botnets.


From what I see, botnets seem to have resources to spare.
A lot of sending bots still haven't adapted to greylisting.
Bots still try to send to addresses we have been rejecting for 10 years.

I suspect that if the botnets were short on bandwidth and computing 
power, the programmers would have fixed those issues a long time ago. And 
the simple fact that they still haven't adapted to greylisting indicates 
that its impact is not (so far) big enough to care about.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Jonas Eckerman

On 2010-03-30 00:12, John Hardin wrote:


While greylisting will help, it won't spank the offender in that manner.
It will postpone the message very early in the SMTP exchange, not after
the body has been received.


Unless the greylisting is done *after* receiving the body. Of course, 
this will spank innocent senders as well.


(My selective greylisting implementation for MIMEDefang does this, 
originally because some stupid MTAs didn't handle tempfails correctly at 
earlier stages... The "selective" stuff keeping delays and spanking of 
innocents down.)


BTW: While I like greylisting because it stops a lot of spam, I've never 
seen any data substantiating claims that it has a measurable negative 
impact on botnets. So I'm not convinced it really does a lot of spanking 
of offenders...


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: The Impossible Rule??? Bug???

2010-03-26 Thread Jonas Eckerman

On 2010-03-24 14:34, Martin Gregorie wrote:


It's named MimeMagic and is available at
<http://whatever.frukt.org/spamassassin.text.shtml>


Thanks, Jonas. That looks very useful. I've replaced my old
IMAGE_MISMATCH rule with an equivalent based on MimeMagic that uses:


Please make sure to evaluate the results. As stated on the web page, I 
still consider the plugin to be somewhat experimental, and I haven't had 
a lot of feedback on it.



header IMAGE_MISMATCH eval:mimemagic_mismatch_contenttype('jpg', 'gif',
'png', 'bmp', 'svg')


That will miss parts with MIME types image/jpeg or image/x-jpeg. 
Replacing jpg with jpe?g would be better.


It will also miss anything where those substrings are not in the 
declared MIME type for the part. So a JPEG image with a .gif extension 
and an application/octet-stream MIME type will not be caught.


It will include parts where any of those strings happen to be 
substrings of any other MIME type, including non-image ones. Not sure if 
that will ever matter though.


A rule that should catch quite a lot of image types might be (just off 
the top of my head, untested):

header IMAGE_MISMATCH eval:mimemagic_mismatch_datatype('image/')

This should do a magic check on all parts, and see if any part 
identified (by the freedesktop database) as image/* has a mismatched 
MIME type or file name extension.



I don't think MimeMagic is overkill. It is probably only a matter of
time before non-image files turn up with equivalent lying content types
and/or extensions and adding rules to catch them will be trivial.


That's what I thought when I wrote it. :-)

At that time I wanted to catch some stuff where a RAR attachment had 
a ZIP MIME type (or maybe it was the other way round).


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: The Impossible Rule??? Bug???

2010-03-23 Thread Jonas Eckerman

On 2010-03-23 12:14, Martin Gregorie wrote:


Is there any possibility that somebody who is more knowledgeable than I
am about images and Perl can extend it to handle BMP and SVG (as a
pre-emptive strike)?


Not exactly that, but I have written a non-image-specific SA plugin that 
can check for mismatches. It's a bit overkill if you only want to check 
for mismatches for images though.


It uses the freedesktop file magic database to recognize file content, 
and provides eval rules to check for file types and mismatches between 
content, mime type and file extension. If the freedesktop database 
contains info for SVG and BMP (it should) it can check for those mismatches.


It's named MimeMagic and is available at
<http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Finding URLs in html attachments

2010-03-11 Thread Jonas Eckerman

On 2010-03-01 15:39, John Hardin wrote:

[ About ExtractText.pm]


Jonas, what's the current status of that plugin? It looks pretty stable
to me.


It works fine here. Don't know how it works for others. I haven't tested 
it with 3.3 yet.



And, can it extract from basic text attachments? I assume so...


It doesn't have any predefined extractor for that, but yes it can.


extracttext_external text {CS:UTF-8} /bin/cat -
extracttext_use text .txt .htm .html text/(?:plain|html)


That ought to work for text/plain. It should be easy to write a minimal 
plugin to extract text/plain though, and avoid the external call.


For text/html we need to strip out the HTML as well. A plugin for that 
should also be easy to write. It should probably use SA's existing HTML 
renderer.


The plugin currently always calls set_rendered with no type parameter, 
which means it always specifies text/plain. It probably should be able to 
add text/html as html, so that an HTML-stripping extractor plugin would 
be redundant. I'll look into this. Can't be sure when I'll have the time 
though.


Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: new (small) shortener campaign & suggestion for URLRedirect

2010-03-01 Thread Jonas Eckerman
 links, and have that all
> ready for when we send out HTTP requests.

I don't get this either. How would the UDP requests help them find bad 
links? How would it help them distinguish between a spamvertised URL and one 
referenced in a legitimate message to a high-traffic mailing list or 
newsgroup and then quoted in replies for a month or so?


They do need to have all working redirects ready at all times anyway for 
all regular browsers, and the non-working redirects should return error 
codes at any time. So I'm not sure what it is you mean they should have 
ready for our HEAD requests.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: MTX public blacklist implemented Re: MTX plugin functionally complete?

2010-02-15 Thread Jonas Eckerman

On 2010-02-15 15:04, Charles Gregory wrote:

On Sun, 14 Feb 2010, Jonas Eckerman wrote:



1: The participation record is optional, so you only use it if you
want "everything else" to be rejected.



This is why I would support mtamark... It permits the sysadmin to
determine the default behaviour for his IP range, rather than defining a
dangerous default in the client.


In what way does the above define a dangerous default?

The default in the statement above is to consider a domain as *not* 
participating unless otherwise stated by whoever manages the DNS for the 
domain.


If the domain does not participate, it should not be punished when an MTX 
record isn't found.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage

2010-02-15 Thread Jonas Eckerman

On 2010-02-14 19:20, dar...@chaosreigns.com wrote:


On 02/14, Jonas Eckerman wrote:



* I think there should be a way to tell the world whether you are using
the scheme for a domain (not host) or not. This could easily be done in
DNS.



I need to think about this more, thanks for the suggestion.  (More on
registrar boundaries below.)



* I think you should follow conventions in DNS naming, using an
underscore to signify that the DNS record is a "special" type of record.
This is quite common.



That's probably a good idea, hmm.



You could use SpamAssassins registrar boundaries stuff for getting the
domain in a SA plugin, and score higher for missing MTX host record if
there is an MTX domain record.



How good is SA's registrar boundaries stuff?


Not sure, but it's used in various places if you use SA, so if it isn't 
good that will have effects on SA anyway.



I don't think
"Use SpamAssassin's registrar boundaries" would be good in an RFC.


I only meant that SA's Mail::SpamAssassin::Util::RegistrarBoundaries 
could be used for this in an SA plugin.


In the RFC I'd suggest it be specified that domain policies should be 
checked based on domain registry boundaries (but with better wording 
than mine).



I don't even know where the record should be for wildlife.state.nh.us.
www.state.nh.us exists, which would indicate mtx.state.nh.us.


Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain returns 
"wildlife.state.nh.us" for "wildlife.state.nh.us" (and for 
"whatever.wildlife.state.nh.us"), suggesting that a policy record 
should be "policy._mtx.wildlife.state.nh.us" or similar.


Whether that makes sense or not, I don't know. It does trim for example 
"mail.microsoft.us" to "microsoft.us", so I guess there's a special 
reason for it to trim the "state.nh.us" subdomains to more than two levels.



Even if SA's registrar boundaries pointed to mtx.wildlife.state.nh.us,
you'd still need to be able to delegate to another subdomain.


Yes, you'd need that. As I see it, there are two simple ways to do this.

* Make it possible to indicate policy delegation in the policy record. I 
see you thought about this one already. :-)


* Or, make an MTX checker traverse domains from the one it checks towards 
the registry boundary when checking for policy. This means more DNS 
lookups but might be easier to administrate. (I have a vague 
recollection that DKIM or ADSP works this way... Not sure though.)



Or maybe participant._mtx.frukt.org.  Giving an A record to the _mtx
subdomain itself seems potentially problematic,


Agreed. And seeing as a hostname should not contain an underscore, that 
wasn't a very good idea of mine.

Any suggestions other than
"participant"?


"policy" seems better than "participant" to me.

Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


HELO SPF + FCDNS (was: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage)

2010-02-15 Thread Jonas Eckerman

On 2010-02-14 19:20, dar...@chaosreigns.com wrote:


On 02/14, Jonas Eckerman wrote:



The SPF record above says that a host using "panic.chaosreigns.com"
in HELO should not be allowed to send mail unless it has the IP
address 64.71.152.40, regardless of the domain in the envelope
from, From: header, etc..



You're right, I missed that, thank you.  The complication, of course,
is where a spammer owns the (forgable) HELO domain but not the IP
(PTR). Full circle DNS handles that.  Has the combination been
implemented?


I've no idea whether any software actually checks the combination of HELO
SPF and FCDNS. It does seem a logical thing to do in software like
SpamAssassin or MIMEDefang. Maybe I should implement it in my
MIMEDefang filter just to log the results and see if it'd be a good idea
to reject on it...


Possibly a lack of separate SPF records for HELO and MAIL FROM if
they are the same.


Agreed. I think they should have separated those records. But then I 
also think they should have created an _spf subdomain from the start 
instead of using the TXT record for the domain without any special 
qualifier...


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: _mtx Re: MTX plugin functionally complete?

2010-02-15 Thread Jonas Eckerman

On 2010-02-15 02:06, dar...@chaosreigns.com wrote:


Thank you for contacting us. An underscore is only legal for specific
types of DNS records, such as 'SRV'. 'A' records should only contain
letters, numbers and dashes. You may want to consider using '-' as
a substitute. I hope this helps. Please don't hesitate to contact us
should you have any further questions or concerns.


I'm finding *nothing* else that uses underscores in the names of A records.
I'm thinking I should stick with "mtx" instead of "_mtx".

Please let me know if there is some evidence I'm missing that it's
reasonable to use an underscore in this context.


The point of using an underscore in "special" records is that the "host" 
is *not* a normal hostname.


DKIM (including ADSP) uses _domainkey.domain.example:
http://dkim.org/specs/rfc4871-dkimbase.html#rfc.section.7.4
http://www.rfc-editor.org/rfc/rfc5617.txt

According to the DKIM and OpenSPF folks (and, less important, 
WikiPedia), underscore is forbidden in hostnames only:

http://domainkeys.sourceforge.net/underscore.html
http://www.openspf.org/DNS/Underscore
http://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names



I could use TXT records.  I kind of like the A records.  Well established
for DNS BLs and WLs and all.


TXT records might be, in principle, the "correct" way to do this, but A 
records are more efficient and some caching-only DNS proxies might be 
set up to cache A record lookups (negative and positive) better than TXT 
records.


If there is to be a policy record, maybe that should be a TXT record, 
but I too like the A record for the actual MTX lookup.


Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: MTX public blacklist implemented Re: MTX plugin functionally complete?

2010-02-14 Thread Jonas Eckerman

On 2010-02-14 20:06, dar...@chaosreigns.com wrote:


I remembered why (else) I didn't want to do that.  It effectively says
"Everything else should be rejected."  Which will discourage some people
from using it.  So you would at least need to provide a way to say "Yes,
I'm participating, but anything without an MTX record is valid too."


The first two solutions for this that pop into my head:

1: The participation record is optional, so you only use it if you want 
"everything else" to be rejected.


2: Make it a policy record rather than a participation record, so you 
can specify more stuff. Either a TXT record or a bitmapped A record for 
example. Call it "_policy._mtx.*".



More on the other very valid concerns later about this...


Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage

2010-02-14 Thread Jonas Eckerman

On 2010-02-13 21:48, dar...@chaosreigns.com wrote:


Looks like it ties the helo domain to the delivering IP, breaking (broken)
forwarding just like SPF?


Tying the HELO domain to an IP does not break forwarding. The host 
name (including domain) used in HELO is independent from the domain used 
in MAIL FROM.


(It's not the use of SPF itself that breaks (broken) forwarding; it's the 
limits tied to the domain used in MAIL FROM.)


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage

2010-02-14 Thread Jonas Eckerman

On 2010-02-13 04:24, dar...@chaosreigns.com wrote:


Still http://www.chaosreigns.com/mtx/


I still have the following comments (which you didn't answer previously):

* I think there should be a way to tell the world whether you are using 
the scheme for a domain (not host) or not. This could easily be done in DNS.


* I think you should follow DNS naming conventions and use a leading 
underscore to signify that the DNS record is a "special" type of record. 
This is quite common.



You could use SpamAssassin's registrar-boundaries support to get the 
domain in an SA plugin, and score higher for a missing MTX host record if 
there is an MTX domain record.



An example (off the top of my head) could be:

To say that "marmaduke.frukt.org" [195.67.112.219] is allowed to send mail:
219.112.67.195._mtx.marmaduke.frukt.org. IN A 127.0.0.1

To say that we're using your scheme for all hosts under "frukt.org":
_mtx.frukt.org. IN A 127.0.0.1

If anyone connects from a host where reverse lookup or HELO puts it in 
the "frukt.org" domain, you know that you should reject or score high 
unless it has FCDNS and a matching MTX record.
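The naming in the example above is mechanical enough to sketch. A minimal sketch in Python; the function names are mine, and the record layout is just what the example shows (reversed IPv4 octets, then "_mtx", then the hostname):

```python
import ipaddress

def mtx_lookup_name(ip: str, helo_host: str) -> str:
    """Per-host MTX query name, per the example in the text:
    IPv4 octets reversed, then "_mtx", then the HELO hostname."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "._mtx." + helo_host.rstrip(".") + "."

def mtx_policy_name(domain: str) -> str:
    """Per-domain participation/policy record name ("_mtx.<domain>.")."""
    return "_mtx." + domain.rstrip(".") + "."

# The example from the text: marmaduke.frukt.org at 195.67.112.219
print(mtx_lookup_name("195.67.112.219", "marmaduke.frukt.org"))
# 219.112.67.195._mtx.marmaduke.frukt.org.
```

A receiver would then look up an A record for that name and treat 127.0.0.1 as "allowed to send".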



(And of course, if this catches on, you'll have to provide RFC style 
documentation.)



Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: MTX plugin functionally complete? Re: Spam filtering similar to SPF, less breakage

2010-02-14 Thread Jonas Eckerman

On 2010-02-13 04:24, dar...@chaosreigns.com wrote:


panic.chaosreigns.com. IN SPF "v=spf1 a:64.71.152.40 -all"


No.  MTX defines 64.71.152.40 as a legitimate transmitting mail server,
regardless of the domain in the envelope from, From: header, etc..
Popular misconception, it seems.


The SPF record above says that a host using "panic.chaosreigns.com" in 
HELO should not be allowed to send mail unless it has the IP address 
64.71.152.40, regardless of the domain in the envelope from, From: 
header, etc..


That's not exactly the same as your MTX scheme, but it has similar 
results when combined with an FCDNS check on HELO (provided your scheme 
is universally adopted).
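Side by side as zone-file lines, the two approaches look like this. The SPF line is the quoted record; the MTX line is my own reconstruction from the scheme's naming examples, not something taken from its documentation:

```
; SPF tied to the HELO name (the record quoted above):
panic.chaosreigns.com.                   IN SPF "v=spf1 a:64.71.152.40 -all"

; A roughly comparable MTX record (naming per the scheme's examples):
40.152.71.64._mtx.panic.chaosreigns.com. IN A   127.0.0.1
```

Both publish "this name belongs with this IP"; the difference is mainly in which side of the lookup (name or IP) keys the query.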


If you're serious about your proposal, you should explain (in your 
documentation) in what important way it differs from SPF as used against 
HELO and other similar schemes, and why it is better.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Spam filtering similar to SPF, less breakage

2010-02-10 Thread Jonas Eckerman

On 2010-02-09 22:31, dar...@chaosreigns.com wrote:
[Ideas for a new scheme similar to a subset of SPF.]

I don't think the SpamAssassin users list is the right place to discuss 
a new general scheme like this, but here goes anyway.


Please note that the comments below are just a first reaction. I haven't 
really thought this through.


A general thought is:
What does your current scheme give that HELO SPF + FCDNS doesn't?
(SPF can be used with HELO as well as MAIL FROM).


What format should this arbitrary A record be?


I suggest you use a leading underscore for your magic subdomain 
(2.0.0.10._mtx.smallbusiness.com).


I suggest this because I think your scheme needs one more thing to be of 
any use at all. It needs a way for the domain owner to specify that they 
are using it. This could be done by creating a record for 
"_mtx.smallbusiness.com".


Without a way to indicate whether the scheme is used or not, it'll be 
unusable for blocking until *all* major email providers, as well as 
almost everyone else, are using it.


Using an underscore makes it less likely to collide with existing host 
names. It also makes it more apparent that it's not a regular hostname.


A new record type might be even better.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Faked _From_ field using our domain - how to filter/score?

2010-01-15 Thread Jonas Eckerman
1.  It shows up as internal mail so gets -6 points or so from the 
auto-whitelist thus giving it a decent chance of getting through.


If it shows up as internal mail even though it's external, something is 
wrong.


The AWL takes both the sender's email address and the sending system's 
IP address into account. For some reason it seems it can't differentiate 
between the relevant sending systems in your setup.


Regards
/Jonas 


Re: Cooperative data gathering project.

2009-12-18 Thread Jonas Eckerman

Per Jessen wrote:


DNS lookups are usually tried done with UDP first,


Sure, DNS usually uses UDP, but the DNS resolver also waits for an 
answer, which is simply a waste of time when the sender doesn't need the 
answer.


Add to this that resolving one address may result in multiple queries, 
and that a DNS answer often contains more than the queried info, and you 
get more overhead.


> but I agree, just use UDP.

Absolutely. IMO, the approach suggested by Marc is a textbook example 
of when to use UDP.


(And if more security is needed, the easiest way would be to simply limit 
access to approved IP addresses.)


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/



Re: Cooperative data gathering project.

2009-12-18 Thread Jonas Eckerman

Jason Haar wrote:

Then the third field is NONE. That's how I do it. But the idea is that 
any kind of data can be collectively gathered and distributed.




Instead of a TCP channel (which means software), what about using DNS? 
If the SA clients did RBL lookups that contained the details as part of 
the query,


With any sane SpamAssassin setup for multiple users this wouldn't work.

Any SA install except for very small mail flows should use a caching DNS 
server/proxy, preferably one that caches negative results. It's also a 
good idea if the caching server used for DNSL checks enforces a minimum TTL.


This means repeated queries never make it to the origin servers, even if 
the origin server uses ridiculously low TTLs.


The distributed caching nature of DNS is a reason why DNSLs are so 
efficient, but also one reason why DNS isn't suitable for everything.
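This is also easy to see from how a conventional DNSL query name is formed: repeated lookups of the same IP produce the same query name, which is exactly what lets intermediate caches absorb them. A sketch (function name and zone are illustrative; the reversed-octet convention is the standard DNSBL one):

```python
def dnsbl_query_name(ip: str, zone: str) -> str:
    """Standard DNSBL convention: reverse the IPv4 octets and append
    the list's zone. Every client checking the same IP asks for the
    same name, so a caching resolver answers repeats locally."""
    octets = ip.split(".")
    assert len(octets) == 4 and all(o.isdigit() for o in octets)
    return ".".join(reversed(octets)) + "." + zone

print(dnsbl_query_name("94.79.44.98", "bl.example.org"))
# 98.44.79.94.bl.example.org
```

A data-gathering scheme, by contrast, needs every report to reach the origin server, so this cache-friendly property works against it.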


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/



Re: Cooperative data gathering project.

2009-12-18 Thread Jonas Eckerman

Marc Perkel wrote:


spam 1.2.3.4 example.com
ham 5.6.7.8 example2.com



Sending these one line TCP messages is fairly easy.


Why use TCP for this? Establishing a connection for simple short 
messages, where a return code is not required, introduces pointless overhead.


It'd be much simpler using UDP instead.
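A minimal sketch of the fire-and-forget UDP approach; the host, port, and function name are illustrative, and the line format is from Marc's example:

```python
import socket

def send_report(line: str, host: str, port: int) -> None:
    """Fire-and-forget: one datagram per report line. No connection
    setup, no waiting for a reply (the sender doesn't need one)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(line.encode("ascii"), (host, port))

# Hypothetical collector address:
# send_report("spam 1.2.3.4 example.com", "collector.example.net", 9999)
```

The whole exchange is one packet, versus at least the three-packet handshake (plus teardown) that TCP would spend per report.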

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/



Re: URLRedirect.pm & Short URL Providers RBL List

2009-12-16 Thread Jonas Eckerman

Jonas Eckerman wrote:

At the time I mentioned that I planned to add support for 
that list in my URLRedirect plugin.

That support is there, and it seems to be working.


Of course I forgot to include where the module can be found...

It's available at
<http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


URLRedirect.pm & Short URL Providers RBL List

2009-12-16 Thread Jonas Eckerman

Hi!

In November Marc Perkel announced a trial of a DNS-based list of short 
URL providers. At the time I mentioned that I planned to add support for 
that list in my URLRedirect plugin.


That support is there, and it seems to be working.

This module follows URLs (in parallel, using HEAD requests) matching 
specifications or found in a DNSL and adds the location of redirections 
to metadata (so that the "real" sites are checked by URIBLs and other 
rules). The addition of Marc's DNSL might make the plugin a lot better, 
since I did not have an updated list of URL shorteners for it.


The module should be seen as a proof of concept. If spammers abuse URL 
redirectors, this module and Marc's DNSL could help, but I have not 
collected any stats to see how helpful it is in practice, and I don't 
know whether Marc's list contains the URL shortener services most abused 
by spammers.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Suggestion for use by ANY whitelist service....

2009-12-08 Thread Jonas Eckerman
Assuming "they" below refers to Habeas. Please ignore this mail if it 
refers to Return Path.


Ted Mittelstaedt wrote:

They have had the option to do this already for years, now, and have 
elected to use implied threats to the world's ISP's, rather than 
regularly participating on this list.


To my knowledge Return Path hasn't owned Habeas for "years" yet (I think 
they bought it a little more than a year ago or so).


If your view of Return Path is the same as your view of Habeas your 
statement makes sense, but otherwise I think you ought to let your view 
of Return Path color your opinions of Habeas.


This might still be a good time (though a little late) to get Habeas' 
current owners to make the necessary changes to the Habeas part of their 
company for the Habeas brand to get a somewhat better reputation among 
anti-spam folk. After all, the reputation of Habeas can now tarnish the 
reputation of their main brand as well.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Need help running SA in a (comparative) anti-spam test

2009-11-29 Thread Jonas Eckerman

Martijn Grooten wrote:


- I'm happy to add any extensions as long as these are also free and
open source -- note that our 'target audience' includes big ISPs and
unfortunately for them things as Spamhaus's RBL aren't free;


This doesn't make any sense. You are comparing SA to commercial products 
that aren't free, which may use their providers' own blacklists or 
even include a volume license for third-party lists, and yet you won't 
allow SA to use lists that aren't completely free?


I'd assume that a big ISP using SA (and wanting the best from an SA 
install) would pay to use the better DNSBLs.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Short URL Providers RBL List

2009-11-09 Thread Jonas Eckerman

Mike Cardwell wrote:

I don't know if it will be useful but I made a short URL provider list 
that is DNS readable.



Been done. See http://rhs.mailpolice.com/#rhsredir


They don't seem to be the same thing. Quote from the page you linked to:
---8<---
This includes any website which provides an open mechanism to redirect a 
web browser to another website, ie, by adding a 
url=http://anotherwebsite in the URL.

---8<---

I would not consider an open redirector of that type to be the same thing 
as a URL shortener service (especially not a well-run service).


/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Short URL Providers RBL List

2009-11-09 Thread Jonas Eckerman

RW wrote:

On Thu, 05 Nov 2009 20:05:25 +0100



Thanks. That could be usable in my URLRedirect plugin. A current
list of URL redirectors is the main thing missing from that plugin.

It would be even better if it included info about whether a URL shortener 
uses HTTP redirects (which is what my plugin checks). 



One other thing is that sometimes the links have already been cancelled
for abuse, and the redirection goes to a page saying that. Such pages
aren't going to be in any URIBL list, but obviously they are very
strong spam indicators.  Ideally there would be a regex to match those
links on each redirection service.


Since my plugin adds redirect targets to the message metadata that check 
could be done with a normal URL rule.


(It might be useful if the plugin flagged redirections that redirect to 
the same domain.)


/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/



Re: Short URL Providers RBL List

2009-11-05 Thread Jonas Eckerman

John Rudd wrote:


The point is: the URL shortening service isn't the interesting part of
the equation.  The expanded URL is.


If the service uses HTTP redirects it can be checked pretty cheaply, which 
is what my URLRedirect plugin does. It adds the redirected-to URL to a 
message's metadata so that other checks (URIBL for example) see it.


Using a well-maintained DNS-based list of redirectors could make that 
plugin much more useful than it currently is.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Short URL Providers RBL List

2009-11-05 Thread Jonas Eckerman

Marc Perkel wrote:

I don't know if it will be useful but I made a short URL provider list 
that is DNS readable.


Thanks. That could be usable in my URLRedirect plugin. A current list 
of URL redirectors is the main thing missing from that plugin.


It would be even better if it included info about whether a URL shortener 
uses HTTP redirects (which is what my plugin checks). My plugin won't 
check for frames, meta refresh or other "redirection" variants that 
require content to be fetched.



Let me know if you find a use for it.


I've got a use for it. Now I just need to implement it as well. I'll 
post here when/if I implement it in URLRedirect.pm.


For more info on URLRedirect.pm check at
http://whatever.frukt.org/spamassassin.text.shtml

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: word file spam

2009-10-13 Thread Jonas Eckerman

Matus UHLAR - fantomas wrote:


Yes, but generic plugin should be able extract images for later processing

> (FuzzyOCR or maybe even things like Bayes) too ;)

That would depend on what you mean by "generic". :-)

It's a generic text extractor plugin, with the ability to call an OCR 
program for getting text from images. Which is what I wanted, and is what 
John mentioned in his post.


It's not a generic attachment parser and object extractor (though it 
might become one).


I do want it to be able to add stuff rendered to HTML, but 
Mail::SpamAssassin::Message::Node doesn't (currently) have a 
set_rendered variant for doing that, and I haven't had the time to work 
on Mail::SpamAssassin::Message::Node.


I'm not sure exactly what would be the correct way to add parts (such as 
extracted images) to the message. I have thought about it, and the 
plugin architecture does support this. I just haven't had the 
time to find out how to do it.


I don't know what you mean by "even things like Bayes". The plugin does 
make the extracted text available to Bayes (this is what I made it for), 
and it can call OCR programs.


Making extracted images available for FuzzyOCR is (as mentioned above) 
something I want to do. Since I don't do any OCR at all here, that's a 
pretty low priority though (unless people start asking for it more).


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: word file spam

2009-10-13 Thread Jonas Eckerman

John Hardin wrote:

There were mutterings about a generic plugin that would take an 
attachment, process it somehow (e.g. wvHtml, antiword, ps2ascii, or 
whatever was appropriate), and insert the results into the body text to 
be scanned by the regular rules.


That sounds very much like my ExtractText plugin. It can use command 
line tools or perl plugins to extract text from attachments.


There were a bit more than mutterings about it here. :-)

> I don't think anything has come of that yet.

The plugin works, and we are using it in our mail gateway.

It's listed on the Custom Plugins wiki page, and is available at 
<http://whatever.frukt.org/spamassassin.text.shtml>.


It comes with a config for extracting text from Word, OpenXML, RTF, ODF 
and PDF files.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: word file spam

2009-10-13 Thread Jonas Eckerman

McDonald, Dan wrote:


The word doc has a pretty standard 419 body in it,  I recall some
mutterings on this list about using wvHtml to regularize word docs.


My ExtractText plugin can use a command line tool to extract text from 
Word documents and add the text to the message so it is available to 
Bayes and rules. It comes with a config that uses antiword to do this.
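The extract-via-external-tool idea can be sketched like this. This is a generic sketch, not the plugin's actual code; the antiword rule at the bottom is a hypothetical config entry (check the plugin's shipped examples for the real invocation):

```python
import subprocess

def extract_text(cmd_template, path):
    """Run an external extractor over an attachment saved to disk and
    return its stdout as text. Items containing "{path}" in the command
    template are replaced with the attachment's file name."""
    cmd = [arg.replace("{path}", path) for arg in cmd_template]
    out = subprocess.run(cmd, capture_output=True, check=True)
    return out.stdout.decode("utf-8", "replace")

# Hypothetical rule: Word documents through antiword.
WORD_RULE = ["antiword", "{path}"]
```

The returned text would then be fed back into the message so body rules and the Bayes classifier can see it.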


It's available at
http://whatever.frukt.org/spamassassin.text.shtml#ExtractText.pm

Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Pyzor or DCC

2009-07-23 Thread Jonas Eckerman

Michael Hutchinson wrote:


I saw a test
message with just the word test in the subject hit DCC once.



That's really strange, I don't see how DCC would fire on the subject..
the checksum of the message must have somehow matched some Spam.. 


That's perfectly normal. DCC doesn't just match spam; it matches things 
that have been seen before. That means it matches bulk, but also anything 
that happens to be very common for other reasons.


I imagine that an empty message with the subject "test" is pretty 
common, so it's perfectly reasonable for DCC to have seen such messages 
many times before.


I don't know if DCC cares about the subject at all. If it doesn't, it's 
even more likely that it would hit on an empty test message.


/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Display Bayes tokens?

2009-07-21 Thread Jonas Eckerman

Peter Sabaini wrote:


I'd like to verify the tokens Bayes uses to classify;

[...]

Is this encoded in some way?


Yes.

If you use SQL for Bayes you can use my CollectTokens plugin to 
collect new tokens indexed by the encoded value used by the Bayes 
system. That way you can look up tokens and see what they were. Of 
course, you'll only be able to look up tokens that were learned after 
you started using the plugin.
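A sketch of why a side table is needed to "decode" at all. This assumes the Bayes backend stores each token as a truncated SHA-1 hash (SpamAssassin 3.x keeps 5 bytes of the digest, if I recall correctly; verify the exact truncation against your version). The original text is unrecoverable from the hash, so it has to be recorded at learn time:

```python
import hashlib

def encoded_token(token: bytes) -> bytes:
    """Assumption: SHA-1 truncated to its last 5 bytes, as in SA 3.x's
    Bayes code. One-way: the token text can't be computed back."""
    return hashlib.sha1(token).digest()[-5:]

# A CollectTokens-style side table maps encoded value -> original text:
seen = {}
for t in (b"viagra", b"sk:howdy"):
    seen[encoded_token(t)] = t
```

Looking up a hash found in the bayes_token table in `seen` then recovers the readable token, but only for tokens learned after collection started.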


I've only tested the plugin with MySQL, but it shouldn't be hard to 
modify it to use another SQL system.


The plugin is available at
<http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/



Re: Plugin extracting text from docs

2009-07-17 Thread Jonas Eckerman

Matus UHLAR - fantomas wrote:


I've been thinking about it. The pdftohtml could provide interesting
infromations like colour informations that could lead to better spam
detection. Any experiences with this?


I've been thinking a bit more about this.

My current plan is to download the trunk version of SA from SVN to a 
development system and add a decent way for plugins to ask SA to render 
"extracted" HTML into visible, invisible, meta, etc.


Once done and somewhat tested, I'll see what the devs think about my patch.

It shouldn't be hard at all; it's a small change to 
Mail::SpamAssassin::Message::Node, but I never seem to have as much time 
as I need for even half of my work and projects... :-/


If the patch is accepted, my ExtractText plugin will use the opened-up 
functionality if it's there. If it's not, any extracted HTML will be 
added using set_rendered as it does now.


/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-07-13 Thread Jonas Eckerman

Matus UHLAR - fantomas wrote:

Ah. I didn't see that option. That's nice. I'm now using pdftotext  
instead of pdftohtml here as well. :-)



I've been thinking about it. The pdftohtml could provide interesting
infromations like colour informations that could lead to better spam
detection. Any experiences with this?


You're right. It could be useful to extract to HTML when possible, and 
then use Mail::SpamAssassin::HTML to get and then set properties just 
like the rendered method of Mail::SpamAssassin::Message::Node does.


The nice way to do this would IMHO be to make it possible for a plugin 
to call the "rendered" method of Mail::SpamAssassin::Message::Node 
passing type and extracted data as parameters.


Something like this (completely untested):
---8<---
--- Node.pm Thu Jun 12 17:40:48 2008
+++ Node-new.pm Mon Jul 13 17:22:20 2009
@@ -411,16 +411,17 @@
 =cut
 
 sub rendered {
-  my ($self) = @_;
+  my ($self, $type, $text) = @_;
 
-  if (!exists $self->{rendered}) {
+  if ((defined($type) && defined($text)) || !exists $self->{rendered}) {
     # We only know how to render text/plain and text/html ...
     # Note: for bug 4843, make sure to skip text/calendar parts
     # we also want to skip things like text/x-vcard
     # text/x-aol is ignored here, but looks like text/html ...
+    $type = $self->{'type'} unless (defined($type));
-    return(undef,undef) unless ( $self->{'type'} =~ /^text\/(?:plain|html)$/i );
+    return(undef,undef) unless ( $type =~ /^text\/(?:plain|html)$/i );
 
-    my $text = $self->_normalize($self->decode(), $self->{charset});
+    $text = $self->_normalize($self->decode(), $self->{charset}) unless (defined($text));
 
     my $raw = length($text);
 
     # render text/html always, or any other text|text/plain part as text/html
---8<---

This way, AFAICT, any extracted (or generated) HTML should be treated 
the same way a normal text/html part is, making it available to HTML eval 
tests for example.


Otherwise my plugin could of course use Mail::SpamAssassin::HTML itself.
Unfortunately Mail::SpamAssassin::Message::Node has no nice methods for 
setting the separate relevant properties, so either the set_rendered 
method needs to be expanded or complemented to allow this anyway, or my 
plugin will have to set the relevant properties directly (which makes it 
depend on Mail::SpamAssassin::Message::Node not being changed too much).


I guess I could do the hack version now, and then update it if/when 
Mail::SpamAssassin::Message::Node is updated to support this in a nice 
way. :-)


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Short URL provider list?

2009-07-10 Thread Jonas Eckerman

Marc Perkel wrote:

> Does anyone have a list of all domains that provide short url
> redirection?

An added wish from me:

Does anyone have a list of URL shorteners actively used by spammers?

Thanks for the lists. I'm not sure what I'm going to do with them, but 
I'm going to see if I can find a way to use them.


If I have the time I'll check those lists and add more URL shorteners to 
the example config for my URLRedirect plugin.


AFAICT my plugin works, but to be effective it does need a list of URL 
shorteners used by spammers, which I haven't had the time to compile.


I've just updated that module, and it can now read lists of redirectors 
from flat files, and has eval tests for redirect recursion checks.


In case you (or anyone else) wants to experiment with fetching redirect 
locations from URL shorteners (so that normal URL and URIDNSBL rules can 
get at the real site), or score based on recursive redirects from URL 
shorteners, please download and test the plugin.


Note: the plugin only does "head" requests, and only to sites in its 
redirector lists, so it does not have all the cons that actually 
fetching pages or sending requests to the spamvertised web sites has.
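The HEAD-only redirect chase with a recursion limit can be sketched as follows. This is not the plugin's actual code; the fetcher is injected (it would wrap a real HEAD request and return the Location header, or None) so the logic can be shown without network access, and all the URLs are made up:

```python
from typing import Callable, Optional, Tuple

def resolve_redirects(url: str,
                      head: Callable[[str], Optional[str]],
                      max_hops: int = 5) -> Tuple[str, int]:
    """Follow HTTP redirects via HEAD requests only, stopping at a
    recursion limit. Returns the final URL and the hop count (usable
    for scoring recursive redirects)."""
    hops = 0
    while hops < max_hops:
        location = head(url)
        if location is None:  # no Location header: end of the chain
            break
        url = location
        hops += 1
    return url, hops

# A dict stands in for real HEAD requests (hypothetical URLs):
table = {"http://sho.rt/x": "http://sho.rt/y",
         "http://sho.rt/y": "http://spamvertised.example/"}
final, hops = resolve_redirects("http://sho.rt/x", table.get)
print(final, hops)  # http://spamvertised.example/ 2
```

The final URL is what would be added to the message metadata so URIBL and other rules see the "real" site instead of the shortener.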


It's available at
<http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: SpamAssasin .pm & .cf file

2009-07-10 Thread Jonas Eckerman

chauhananshul wrote:


I'm new to linux world can some one please help in understanding .cf &.pm
files.


Neither of those file types is specific to Linux.

The .pm files are Perl modules. To understand how those work in detail 
you need to learn Perl. You don't need to know this to use 
SpamAssassin, though.


The .cf files are specific to SpamAssassin. To learn how they work read 
the SpamAssassin documentation. Particularly

perldoc Mail::SpamAssassin::Conf
also available at
<http://search.cpan.org/~jmason/Mail-SpamAssassin-3.2.5/lib/Mail/SpamAssassin/Conf.pm>


I've used .cf files from http://www.rulesemporium.com i used to copy in
/usr/share/spamassassin/


Don't do that.

You should put your custom .cf files in the site rules directory. 
Usually "/etc/mail/spamassassin" or "/usr/local/etc/mail/spamassassin".



it works


But it will stop working if you use sa-update or upgrade SpamAssassin.


but at some sites both .pm & .cf fiels
are available can some one please guide me wht to do with .pm files

> how to install them or make them work for me.

Read about "loadplugin" in the above mentioned documentation.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Managing SA/sa-learn with clamav

2009-07-10 Thread Jonas Eckerman

Steven W. Orr wrote:


http://wiki.apache.org/spamassassin/ClamAVPlugin



It looks like what I thought I wanted already exists. Based on what I wrote
above, and that I like the result of running sa + clamav via the two milters,
does anyone have any caveats for me?


1: When running ClamAV inside SA you have to run SA even if ClamAV finds 
a virus. This requires more resources than just running ClamAV, since 
ClamAV is way faster and needs far less than SA does.


2: If an infected whitelisted mail comes in, you would need a much 
higher score than the example (10) to stop the virus from passing.


3: If you just tag (and don't block) spam, using ClamAV only from within 
SA will actually let virus-infected mail through to users.

All this said, we run ClamAV both from a milter (MIMEDefang) before SA 
*and* from SA with the plugin, using different configurations.
The clamd instance used *before* SA has only the official ClamAV sigs, 
and has phishing sigs and some checks turned off.
The clamd instance used *in* SA has the official sigs as well as some 
third-party sig sets, and has phishing, broken-exe, etc. checks turned on.



Once question I have: If I use the plugin and it fires, will it in fact
contribute to the bayes and AWL tables ending up as I described above? Or is
there a placement question of where the plugin should be invoked?


That plugin simply makes an eval test available that you can use for 
scoring. The effect of its scores on Bayes and AWL is the same as for 
any other scoring rules in SA.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-07-10 Thread Jonas Eckerman

Rosenbaum, Larry M. wrote:


I have found the Xpdf package [...] has a pdftotext command line utility.

> If you build it with the "--without-x" option,

Ah. I didn't see that option. That's nice. I'm now using pdftotext 
instead of pdftohtml here as well. :-)


And I've just uploaded a new version of the ExtractText plugin with a 
few changes.


Also, it's now included on my SA page at 
<http://whatever.frukt.org/spamassassin.text.shtml> as well as on the 
CustomPlugin page in the SA wiki.


(For those having problems downloading from the above server, the zip 
archive should be automagically mirrored to 
<http://mmm.truls.org/m/ExtractText.zip>.)


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: constantcontact.com

2009-07-03 Thread Jonas Eckerman

rich...@buzzhost.co.uk wrote:


(You do know what "legacy" means, right?)



Sure - do you? If it's left in the core code because the URI never
listed CC in the past that makes it legacy to me. If we consider that
argument now that cc *is* listed by urbl then the legacy argument that
was used, is gone. It becomes an SA issue for effectively white listing
*from urbl lookups* a known rotten/black listed uri.


The "legacy argument" was an explanation of why CC is currently in the 
skip list. As, such, it still stands. It still explains why CC is 
currently skipped.


It was never an argument for why CC should be skipped. The fact that CC 
is now listed is an argument for removing the skip, but it does not 
change the reason why the skip was included in the first place, nor 
does it change the reasons why the skip hasn't, so far, been removed.


Seems like you think missing a score of 0.25 would be worth money to 
someone. I think that's pretty silly.



Depends. If you are sitting at 4.79 and the have a block score of 5.00
it makes a difference.


Do you mean to say that a large enough amount of mail from CC gets from 
4.76 to 4.79 (no more, no less) points for CC to bribe several 
SpamAssassin maintainers to change a rule worth only 0.25 points (with a 
bribe big enough for those maintainers to risk both their own and their 
handiwork's reputations)?


Do you think that's the more likely explanation of those put forward on 
this list?



Calling it whitelisting also seems silly.



Jonas I always thought you were grown up enough to be able to fill in
the blanks here. White listed from URI lookups. Please, don't be silly
now.


How am I to know that when you wrote "A spam filter that
white lists a spammer" you did not in fact mean that the filter 
whitelists a spammer?


How am I to know that when you wrote "SpamAssassin effectively white 
listing spammers" you did not in fact imply that SpamAssassin is 
whitelisting spammers?


If you think I'm silly for believing that you mean what you write, then 
please keep considering me silly.


/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: constantcontact.com

2009-07-03 Thread Jonas Eckerman

rich...@buzzhost.co.uk wrote:


m...@haven:~$ host constantcontact.com.multi.uribl.com
constantcontact.com.multi.uribl.com A   127.0.0.4
m...@haven:~$



Oh Dear - that kind of rains on the parade of the 'legacy' argument and
puts the ball into the SA court.


Actually, it gives strength to the "legacy" argument, and the ball was 
already in the SA court.


(You do know what "legacy" means, right?)


constantcontact.com.multi.uribl.com. 1800 IN A  127.0.0.4



Seems like the cynical who make 'silly assumptions' may not be as silly
as we first thought.


Seems like you think missing a score of 0.25 would be worth money to 
someone. I think that's pretty silly.


Calling it whitelisting also seems silly.


I do think that the skipping of CC should be reviewed though. It might 
be listed in other URIDNSBLs for example.


If the main purpose of the default list of domains to skip URIDNSBL 
checks for is to save resources by not checking domains that won't be 
hit anyway, then the whole list should probably be regularly checked by 
a script that simply flags any domains present on URIDNSBLs for review 
(or possibly just comment them out of the list).
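The review script suggested above could be sketched like this; the lookup is injected so the flagging logic needs no network, and the zone and domain names are purely illustrative:

```python
# Sketch of the suggested review script: flag any skip-listed domain that
# a URIBL now reports as listed. A real script would query something like
# <domain>.multi.uribl.com; here the DNS lookup is a stand-in function.

def flag_listed(skip_domains, lookup):
    """Return the skip-list entries that the URIBL reports as listed."""
    return [d for d in skip_domains if lookup(d)]

def fake_lookup(domain):
    # Stand-in for the DNS query; pretend only example.net is listed.
    return domain == "example.net"

flagged = flag_listed(["example.com", "example.net"], fake_lookup)
print(flagged)  # ['example.net'] -> review these, or comment them out
```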



/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: constantcontact.com

2009-07-03 Thread Jonas Eckerman

rich...@buzzhost.co.uk wrote:


Should that be Hi$torical Rea$ons ?


If there was a monetary reason (aka bribe), I'd think CC would have been 
whitelisted.


As it is, CC is *not* whitelisted in SA. At least not according to your 
own posts. What you have noted is that CC is *skipped* by *one* (1) type 
of rules (URIBL checks). No more, no less.



As it stands the is simply white listing a bulker.


No, it isn't. Skipping URIBL checks for a domain is very far from 
whitelisting the domain when done in SA. SA is a scoring system where 
the combined score of all rules is what decides how to flag a message.



I'm cynical. The only logical
reason I can see for anything of this nature is money changing hands.


That's not being cynical. It's being unbelievably unimaginative.

/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-07-02 Thread Jonas Eckerman

Rosenbaum, Larry M. wrote:


It appears that "pdftohtml" is only available as a Windows executable (on 
Sourceforge).


If you want a precompiled executable it seems Windows is the only 
platform, but AFAICS the source code is also available at

http://sourceforge.net/projects/pdftohtml/files/

> I need something that will run on Solaris.

I've no idea whether it compiles on Solaris or not, but since I installed 
it from ports on FreeBSD I do know that it compiles on at least one 
Unix-like OS and doesn't require Windows.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-07-02 Thread Jonas Eckerman

Benny Pedersen wrote:


pdftohtml is imho not found in gentoo, but pdf2html is maybe the same ?


I wouldn't know since I haven't got any Gentoo machines.

The "pdftohtml" I'm using is installed from FreeBSD ports.
It can be downloaded from
<http://pdftohtml.sourceforge.net/>


only problem i had was that unrtf nedd to have ${file} in the example cf to 
work all else works



I'm using unrtf 0.21.0. Are you using an older version?



0.20.5 latest unstable on gentoo, unless i self bump it


Ah. Then I guess reading from stdin is a new feature in 0.21.


one thing i need to know is how to control the tmp file path, i cant find where 
this is made


I'm using Mail::SpamAssassin::Util::secure_tmpfile, so it's SA that 
controls the path to the temp files. I don't know if you can set that in 
SA's config or not.
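As a general illustration (not SpamAssassin's own API), most temp-file helpers accept an explicit directory argument, which is the usual hook for controlling where temp files land:

```python
import os
import tempfile

# Illustration only: tempfile.mkstemp takes a dir= argument, so the caller
# (not the library) decides where the temp file is created.
workdir = tempfile.mkdtemp()  # a private scratch directory
fd, path = tempfile.mkstemp(dir=workdir, prefix="extracttext-")
os.close(fd)

# The temp file was created under our chosen directory.
print(path.startswith(workdir))  # True
```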


Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-07-02 Thread Jonas Eckerman

Benny Pedersen wrote:


just tested this plugin here, all i can say it rooks viagra out of docs rtf 
files :)


I just saw it extract a 419 from a Word doc so that it was caught by 
bayes and a bunch of rules (it would actually have slipped past our 
filter otherwise). :-)


> well done

Thanks.


only problem i had was that unrtf nedd to have ${file} in the example cf to 
work all else works


Odd. I don't need ${file} for unrtf here.

I'm using unrtf 0.21.0. Are you using an older version?

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: ExtractText plugin

2009-07-02 Thread Jonas Eckerman

Jonas Eckerman wrote:

For anyone who likes to test stuff, I've uploaded my plugin that 
extracts text from documents to

<http://whatever.frukt.org/graphdefang/ExtractText.zip>


In case any of you have problems downloading the file, it's now mirrored as
<http://mmm.truls.org/m/ExtractText.zip>

And, please tell me of problems.

Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-07-02 Thread Jonas Eckerman

Benny Pedersen wrote:


<http://whatever.frukt.org/graphdefang/ExtractText.zip>).


I've now mirrored the file as
<http://mmm.truls.org/m/ExtractText.zip>

I hope that will work better.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-07-01 Thread Jonas Eckerman

Rosenbaum, Larry M. wrote:


We can use antiword to render text from MSWord files, and unrtf to render text 
from RTF files.  What is the best tool to render text from PDF files?


I don't know what the best tool is, but I'm currently using pdftohtml in 
XML mode (and then stripping the XML) in my ExtractText plugin.


(For more info about the plugin, see my post with subject "ExtractText 
plugin", or download it from 
<http://whatever.frukt.org/graphdefang/ExtractText.zip>).


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


ExtractText plugin

2009-06-29 Thread Jonas Eckerman

Hello!

For anyone who likes to test stuff, I've uploaded my plugin that 
extracts text from documents to

<http://whatever.frukt.org/graphdefang/ExtractText.zip>

I started writing last week, so it hasn't been heavily tested yet, but 
it has been running here over the weekend with no showstopping problems.


What it does is use external tools and simple (interface-wise) extractor 
plugins to extract text from message parts. The extractors are chosen 
by MIME type, file name and optionally content magic. The extracted text 
is seen by bayes and SA rules. It is entirely possible to create an 
OCR extractor, but I haven't done so, and I currently don't plan on 
doing it.


The plugin currently comes with a *very* rudimentary OpenXML (recent MS 
Word) extractor, and a configuration using external tools "antiword", 
"unrtf", "odt2txt" and "pdftohtml" to extract text from MS Word, RTF, 
OpenDocument (OpenOffice/StarOffice) and PDF files.
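The dispatch described above can be sketched like this; the tool names come from the post itself, but the command-line flags and the extension fallback map are illustrative assumptions, not the plugin's shipped configuration:

```python
# Choose an external extraction tool by MIME type, with a file-name
# fallback. Flags and the extension map are assumptions for illustration.

TOOLS = {
    "application/msword": ["antiword"],
    "text/rtf": ["unrtf", "--text"],
    "application/vnd.oasis.opendocument.text": ["odt2txt"],
    "application/pdf": ["pdftohtml", "-xml", "-stdout"],
}

EXT_MAP = {".doc": "application/msword", ".rtf": "text/rtf",
           ".pdf": "application/pdf"}

def pick_tool(mime_type, filename):
    if mime_type in TOOLS:
        return TOOLS[mime_type]
    for ext, mt in EXT_MAP.items():
        # Fall back on the extension when the declared type is unhelpful.
        if filename.lower().endswith(ext):
            return TOOLS[mt]
    return None  # no extractor configured for this part

print(pick_tool("application/pdf", "x.bin")[0])                # pdftohtml
print(pick_tool("application/octet-stream", "letter.DOC")[0])  # antiword
```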


It is also possible for an extractor plugin to return several binary 
objects as well as text. These objects will also be processed by all 
extractors, so an extractor for a container type of file can return (as 
an example) a bunch of images, that is then processed by an OCR 
extractor. I have not implemented any extractor that does this, so it's 
completely untested.
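The reprocessing described above amounts to a work-list loop; this sketch also includes the safety depth cap the post notes is still missing. The extractor interface and names here are hypothetical:

```python
# Work-list sketch of recursive extraction: every object returned by an
# extractor is fed back through extraction, with a depth cap as a guard.

MAX_DEPTH = 5

def extract_all(root_obj, extract):
    """extract(obj) returns (text, nested_objects); collect all text."""
    texts = []
    queue = [(root_obj, 0)]
    while queue:
        obj, depth = queue.pop()
        if depth >= MAX_DEPTH:
            continue  # guard against pathological nesting
        text, children = extract(obj)
        if text:
            texts.append(text)
        queue.extend((child, depth + 1) for child in children)
    return texts

def toy_extract(obj):
    # A "container" yields two children; leaves yield text.
    if obj == "container":
        return "", ["leaf1", "leaf2"]
    return "text from " + obj, []

print(sorted(extract_all("container", toy_extract)))
# ['text from leaf1', 'text from leaf2']
```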


Stuff I already know is missing:

* A safe-guarding maximum depth of processing.

* A way for extractor plugins to get config lines.

Test it if you feel like it.

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: user filtering attachments

2009-06-28 Thread Jonas Eckerman

Matus UHLAR - fantomas wrote:


oh, dirty workaround, but doable. However, highly depend on the way your MTA
calls the spamassasin. With milter, you can't push _any_ header to the mail,
only those compiled in.


That would depend on which milter. With MIMEDefang SA itself can't add 
headers directly, but MIMEDefang can use the results from SA to add headers.


OTOH, if one is using MIMEDefang, then one would most likely use 
MIMEDefang to strip attachments rather than a combination of SA and 
maildrop.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-06-25 Thread Jonas Eckerman

Theo Van Dinter wrote:


the convolution is a
fingerprint that you could write a rule for and then you don't care
what the content actually is.  For example, you'd render something
like "doc_pdf_jpg", which would make an obvious Bayes token.  In the
same way for a zip file, you could do "zip_pdf zip_jpg zip_txt", etc,
and they'd all be different tokes.


That's really a good idea. Put the chains of extraction in a 
pseudoheader that can be tested in rules and seen as a token by bayes.


I'm putting that in the todo for the plugin.
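Theo's fingerprint idea above amounts to joining the extraction chain into a single token; a trivial sketch (the naming scheme is illustrative):

```python
# The path of container types an object passed through becomes one token
# that rules can match and Bayes can learn.

def chain_token(path):
    return "_".join(path)

print(chain_token(["doc", "pdf", "jpg"]))  # doc_pdf_jpg
print(chain_token(["zip", "txt"]))         # zip_txt
```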


The most common thing to extract apart from text will most likely be images.
Any OCR text extractor tied into my plugin would get to see those images,
but any OCR SA plugins run after my plugin won't. It might be good to make
extracted images available to those, and other image handling plugins.



But yours already ran, so who cares about the others?


Because they work very differently?

An OCR plugin that adds the rendered text to the message for bayes and 
text rules is very different from one that does its own scoring based 
on the OCRed text.



If you're expending the resources to OCR the same image in an email
multiple times ...  You clearly either have a lot of hardware or not a
lot of mail.


*I* don't use any OCR at all. We don't have the resources for that 
(being a small non-profit NGO), and so far I haven't seen any need for 
OCR either since we never had much image spam slip through anyway.


So I will not implement an OCR extractor for my plugin. I'll leave that 
for others. This is actually one of the reasons I'd like to let existing 
OCR plugins have access to any images extracted by my plugin: so that 
those who already use OCR can benefit from the extraction.


I'm not going to spend much time on it though. I'm happy just extracting 
text. :-) And it does extract text (currently from Word, OpenXML, 
OpenDocument and RTF documents). :-)


I actually hadn't even thought about this image/OCR etc stuff before 
Matus suggested it.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-06-25 Thread Jonas Eckerman

Theo Van Dinter wrote:


I would comment that plugins should probably skip parts they want to
render that already has rendered text available.


Ah. That's a good idea. Now I'll have to search for a nice way to check 
that. :-)



I can't see how "set_rendered" would help in creating a functioning chain
where one converter could put an arbitrary extracted object (image, pdf,
whatever) where another converter could have a go at it.



If a plugin wants to get image/* parts and do something with the
contents, they can do that already.


Not if the image/* parts are actually inside a document.


If you want to have a plugin do some work on a part's contents, then
store that result and let another plugin pick up and continue doing
other work ...  There's no official method to do that.


I guessed as much. This however is what Matus and I were talking about.


You can store
data as part of the Node object.



But what would be a use case for that?


Matus's example was a Word document that contained a PDF (which might in 
turn contain an image). A plugin that knows how to read Word documents 
could extract the text of the Word document and then use "set_rendered" 
to make that available to SA. It cannot currently extract the PDF and 
make it available to any plugins that know how to read PDFs though.


Matus's idea about chains would be that in this example the plugin 
reading the Word document would store any other objects somehow. In this 
case a PDF. After that, any plugin that knows how to handle PDFs will 
get to look at the PDF and extract text and other stuff from it. In case 
it extracts an image, it would then store it the same way, and any image 
handling plugins would find it.


I really don't know how common that is. I have never seen a Word 
document with a PDF inside it myself.


I have however seen many documents that contain images, and I think it 
would be a good idea to make those images available to things like 
FuzzyOCR and ImageInfo.



Arguably, there could be multiple people developing plugins for
different types, but you'd need some coordination for the
register_method_priority calls to figure out who goes in what order.


For some stuff coordination would be needed, yes. But not for what I'm 
thinking of.


The text extraction plugin I'm working on (which started this) itself 
has simple extractor plugins. These plugins will be able to return 
arbitrary objects as well as text, and my plugin will check the returned 
objects the same way it checks the original message parts. This way, all 
the extractors that are tied into my plugin will be able to extract 
stuff from objects extracted by other extractors. So far so good.


The most common thing to extract apart from text will most likely be 
images. Any OCR text extractor tied into my plugin would get to see 
those images, but any OCR SA plugins run after my plugin won't. It might 
be good to make extracted images available to those, and other image 
handling plugins.


My plugin is called after the message is parsed, which is very good for a 
text extractor. FuzzyOCR (as an example) however works by scoring OCR 
output (which may well be very different from the text in the image as we 
see it), and therefore has to be called at a later stage. The same goes 
for ImageInfo.


It might therefore be a good idea to make the extracted images and other 
objects available to scoring plugins as well.


> I just found the register_method_priority() method. \o/)

It's nice, isn't it? :-)

I'm using it in my URLRedirect plugin.


Note: Do not try to add or remove parts in the tree.  The tree is
meant to represent the mime structure of the mail, and each node
relates to that specific mime part.  The tree is not meant to be a
temporary data storage mechanism.


Ok. That makes things both easier and harder for me. I know that I'll have 
to implement my own list of stuff to loop through when extractors return 
additional parts in my plugin. That's the easy part.


The difficult part is how to make extracted stuff available to other 
plugins in a way they understand. I see two main ways to do this:


1: Invent a new way. This would require modifications of any plugins 
that should check the extracted objects.


2: Add a container part somewhere that "find_parts" would find, but which 
is not actually a member of the message tree, and then add a simple way 
to add parts to that container. This would require modification of 
Mail::SpamAssassin::Message, but not of the plugins.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-06-25 Thread Jonas Eckerman

Theo Van Dinter wrote:


I am not sure but I think something alike was done. What I mean is to have
generic chain of format converters, where at the end would be plain image
or even text, that could be processed by classic rules like bayes,
replacetags etc.



Already exists, check recent list history for "set_rendered".
:)


I thought that was for text only.

In any case, any plugin looking for images, or a PDF, will most likely 
look at MIME type and/or file name, and then use the "decode" method to 
get the data, and AFAICT the "set_rendered" method doesn't have any 
impact on any of that.


I can't see how "set_rendered" would help in creating a functioning 
chain where one converter could put an arbitrary extracted object 
(image, pdf, whatever) where another converter could have a go at it.


Since the "set_rendered" method seems very undocumented I could of 
course be wrong here. In that case I hope to be verbosely corrected. :-)


/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-06-25 Thread Jonas Eckerman

Matus UHLAR - fantomas wrote:

This I don't understand. Do they put PDFs inside .doc files as if the 
.doc was an archive?


I am not sure but I think something alike was done.


Considering that an OpenXML format is basically a zip file with XML 
files inside and that the actual document can contain hyperlinks, I guess 
it could be possible to do something like that. I don't know enough about 
the format to say though.



What I mean is to have
generic chain of format converters, where at the end would be plain image
or even text, that could be processed by classic rules like bayes,
replacetags etc.


If I manage to figure out how to add new parts to a message from within 
the "post_message_parse" method, that should work just fine.


An extractor plugin can return a list of parts to be added to the 
message, and my module will keep looping through the message parts if 
new parts are added.


So, if a Word extractor extracts a PDF and returns it, the PDF would be 
added to a new part, and in the next loop the PDF part will be sent to a 
PDF extractor if that exists. And so on. I'm running 
"post_message_parse" at priority -1 so any added image parts should be 
available to plugins like FuzzyOCR as well as plugins running 
"post_message_parse" at default priority.


The missing parts are:

1: How do I add a new part to a parsed message (including a singlepart 
one). This is of course the main problem.


2: The actual extractor plugin that extracts whatever files are included 
in the word document. Antiword only extracts text, and my extractor for 
OpenXML is little more than an extremely basic XML remover.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-06-25 Thread Jonas Eckerman

Jonas Eckerman wrote:


You mean extract images and add them as parts to the message?

I guess that should be doable. I know that "unrtf" can extract images 
from RTF files. I'll probably implement support for this, but I'll 
probably not implement actually doing it right away.


This'll probably have to wait. Browsing the POD and source of 
Mail::SpamAssassin::Message::Node and Mail::SpamAssassin::Message I 
found no obvious way of adding new parts to a message node. Especially 
if the node is a leaf node (I'm guessing that singlepart messages only 
have a leaf node).


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin extracting text from docs

2009-06-25 Thread Jonas Eckerman

Matus UHLAR - fantomas wrote:

I'm currently working on a modular plugin for extracting text and adding 
it to SA message parts.


if possible, extract images too, so the fuzzyocr and similar plugins would
be able to look at that too.


You mean extract images and add them as parts to the message?

I guess that should be doable. I know that "unrtf" can extract images 
from RTF files. I'll probably implement support for this, but I'll 
probably not implement actually doing it right away.



IIRC spammers did even put PDF's to .doc files to make the stuff harder, but
if you manage the above, it shouldn't be hard to extract PDF's too :)


This I don't understand. Do they put PDFs inside .doc files as if the 
.doc was an archive?


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Plugin extracting text from docs (was: new spam using large images)

2009-06-24 Thread Jonas Eckerman

Jason Haar wrote:


Speaking of image/rtf/word attachment spam; is there any work going on
to standardize this so that the textual output of such attachments could
be fed back into SA?


Just as a note:

I'm currently working on a modular plugin for extracting text and adding 
it to SA message parts.


The plugin can use either external tools or its own simple plugin 
modules. How to extract text from parts is configurable, and based on 
MIME types and file names, so new formats can be added by simply 
configuring new external tools or creating a new plugin module.


My *far* from finished module currently manages to extract text from 
Word documents (using antiword), OpenXML text documents (using a simple 
plugin) and RTF (using unrtf).


I haven't tested where and how the extracted text is available to 
SpamAssassin yet (as noted, it's *far* from finished), but I am using 
the "set_rendered" method as in the example, so it should work. ;-)


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: List headers and footers [Re: Unsubscribe]

2009-06-16 Thread Jonas Eckerman

McDonald, Dan wrote:


List servers like mailman resend the message with a different envelope
header.


Which doesn't invalidate a DKIM, PGP or S/MIME signature.


The MTA receiving this message looks for policy statements about
spamassassin.apache.org, not for policy statements from fantomas.sk.


For SPF yes. For DKIM it should look for policy statements from 
"fantomas.sk" since that is the domain of the address used in the From 
header.


If the message had contained a DKIM signature, it should of course look 
for a DKIM key for the domain specified in the DKIM-Signature header.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: List headers and footers [Re: Unsubscribe]

2009-06-16 Thread Jonas Eckerman

David Gibbs wrote:


Since Mailman adds it's own headers to the messages it processes, any existing 
signatures in the message are invalidated.


But... They aren't. Some may be, but not all. As an example, the post 
from mouss which you replied to was verified with DKIM by our MX to 
have passed through a system correctly signing for 
"mo...@ml.netoyen.net".


DKIM specifies which headers it includes in the signature, and ignores 
headers that are prepended after the signature. As long as mailman 
leaves the specified headers below the signature alone, adding its own 
headers won't invalidate DKIM signatures.


Also, some signatures simply don't care about the *message* headers at 
all, only about the body or the signed MIME part(s).



Thus, Mailman has to remove any existing signatures and let the MTA resign the 
message after it's been processed.


If mailman has been set up to change the body (adding a footer for 
example) or change headers that can reasonably be expected to appear in 
signatures (like From or Subject), it should remove certain 
signatures (like DKIM) and (preferably) replace them with the 
authentication results at the current point (of course, it should (when 
applicable) include any prepended results header(s) in its own 
signature if it then resigns the message).


Otherwise I see no reason for it to remove signatures. Which is an 
obvious reason *not* to add a footer or a subject tag, as well as a 
reason not to rewrite From and Reply-To. Whether or not that reason is 
important is a personal opinion, but it is valid.


If signatures are left in place and important data isn't changed, our 
regular verification methods can verify whether a post purporting to be 
from mouss (for example) came from a system that should send mail from mouss.


If mailman removes existing signatures or changes important data, we 
cannot verify that the mail really was sent through a system supposed to 
send mail from mouss.


If mailman (or its MTA) adds authentication results, we have to trust 
the system (and its administrator(s)) in order to be reasonably sure 
whether the mail was sent from an authorized system or not. This may not 
be reasonable for all list hosts.


Note: Important data for the mail from mouss that you replied to is the 
body, and the following headers:

Date:From:Reply-To:MIME-Version:To:Subject:References:In-Reply-To:Content-Type:Content-Transfer-Encoding;

As long as mailman (or anything else) doesn't change that data, the DKIM 
signature will still be valid and verifiable, which it is here.
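The point can be sketched as a check against the signed-header list above; this is an illustration of which modifications matter, not a DKIM implementation:

```python
# A DKIM signature covers only the headers named in its h= list (plus the
# body), so changing anything else leaves it verifiable. The signed set
# below is the h= list quoted in this message.

SIGNED = {h.lower() for h in
          "Date:From:Reply-To:MIME-Version:To:Subject:References:"
          "In-Reply-To:Content-Type:Content-Transfer-Encoding".split(":")}

def breaks_signature(modified_header):
    """True if changing this header would invalidate the example signature."""
    return modified_header.lower() in SIGNED

print(breaks_signature("List-Id"))  # False: mailman's added list headers are safe
print(breaks_signature("Subject"))  # True: a subject tag would break it
```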


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin configuration

2009-06-15 Thread Jonas Eckerman

Martin Gregorie wrote:


Now I'd like to configure the database configuration details from a .cf
file, preferably the one containing the associated SA rule, so is there
a recommended way of doing this?


The "parse_config" plugin method?


Pointers to documentation or examples would be much appreciated.


Documentation:
perldoc Mail::SpamAssassin::Plugin
<http://search.cpan.org/~jmason/Mail-SpamAssassin-3.2.5/lib/Mail/SpamAssassin/Plugin.pm>

Examples 1, stock plugins that came with SpamAssassin at:
[...@inc]/Mail/SpamAssassin/Plugin/*
<http://search.cpan.org/~jmason/Mail-SpamAssassin-3.2.5/>

Examples 2, third party plugins at:
<http://wiki.apache.org/spamassassin/CustomPlugins>

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Plugin for URL shorteners / redirects

2009-05-26 Thread Jonas Eckerman

Benny Pedersen wrote:


http://wiki.apache.org/spamassassin/WebRedirectPlugin
know this plugin ?


Yes. Though I had forgotten its name.


what is the diff in the testing ?


Reading the descriptions of the two plugins would have given you some 
good hints. Reading the documentation (both have PODs) would have given 
you the answer. They are very different.


The WebRedirect plugin fetches pages.
My plugin only fetches headers.

The WebRedirect plugin adds the content of pages as pseudoheaders.
My plugin adds the "location" for a redirect to the existing 
canonicalized list of URIs (so that existing URI checkers see them).


The WebRedirect plugin provides an eval test to check the status code 
for queried links.

My plugin doesn't.

/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: New image spam

2009-05-26 Thread Jonas Eckerman

Matus UHLAR - fantomas wrote:

You need to check the files contents to catch that, and the ImageInfo  
plugin isn't meant to understand just any kind of content.


Well, first issue was only to compare file extension to provided mime type,
so it would hit .gif file of type image/jpeg


Ah, yes. That could be done in a much more lightweight way than what my 
MimeMagic plugin does.


It should be pretty easy to make a plugin doing that.

compares the content-type with the content 
(using File::MimeInfo::Magic, which uses the freedesktop file database).



that's more complicated but apparently good to have. I wonder if the real
filetype will match the extension or the mime type (or neither one)


I made it to check for Windows executables sent with the MIME type and 
extension of an MS Office document.


(I did this after discovering that a couple of machines here gladly ran 
a Win32 executable if it had a .doc or .xls extension when the user 
double-clicked it.)
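A minimal illustration of that kind of check (not the MimeMagic plugin itself, which uses File::MimeInfo::Magic): sniff a few well-known magic numbers and compare with the declared MIME type.

```python
# Compare a part's declared MIME type against what its first bytes say.

MAGIC = [
    (b"%PDF", "application/pdf"),
    (b"GIF8", "image/gif"),
    (b"MZ", "application/x-dosexec"),  # Windows executable
]

def sniff(data):
    for prefix, mime in MAGIC:
        if data.startswith(prefix):
            return mime
    return "application/octet-stream"  # unknown

def mismatch(declared, data):
    actual = sniff(data)
    return actual != "application/octet-stream" and actual != declared

# A Win32 executable declared as a Word document, as in the anecdote above:
print(mismatch("application/msword", b"MZ\x90\x00"))  # True
```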


For the current image spam it's overkill, but it did hit when I ran 
checked OPs example message here.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Plugin for URL shorteners / redirects

2009-05-26 Thread Jonas Eckerman

Hi!

I just threw together a plugin that can check URLs for redirections, and 
add whatever they redirect to to the message meta-data so that the true 
destinations are checked by URIBLs etc.


It doesn't do this for all URLs in a message. It will only follow those 
URLs it is specifically told to follow. Also, it only asks for HEAD 
rather than pages in order to keep the traffic down.


I'm not sure whether this is really worthwhile or if it is just a waste 
of time and resources, but the idea is to use it for URL shorteners that 
are being abused by spammers.


To be really useful it needs a list of abused URL shorteners. I don't 
know which shorteners are most abused, so I don't know what the list 
should contain.


(The three example shorteners are in the POD because I knew about them, 
not because they are being used by spammers.)


If anyone thinks this is a good idea you can check the plugin at
<http://whatever.frukt.org/spamassassin.text.shtml?accessibler#URLRedirect.pm>.


Suggestions and criticism are very welcome.

URL shortener addresses (with formats) even more welcome.


Notes:

This is not extensively tested. It may well contain bugs. It's not a 
finished thing.


If this plugin is a good idea, making it do its HEAD requests in 
parallel would be a good idea, but I don't know what the best way to do 
that in perl for SA would be. (Currently it has a hardcoded timeout of 
10 seconds around its requesting stage, but no other time saving stuff.)


Using a cache should also be implemented so that repeatedly seen URLs 
aren't followed over and over again. This should be pretty simple.
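The HEAD-only following plus the cache idea can be sketched like this; the `head` callback is injected so no network is involved, and all URLs here are made up:

```python
# Follow redirects via HEAD requests only, caching the final destination
# so a URL seen repeatedly is not re-fetched. `head(url)` returns the
# Location header for a redirect, or None.

def resolve(url, head, cache, max_hops=5):
    if url in cache:
        return cache[url]
    seen, current = [url], url
    for _ in range(max_hops):
        location = head(current)
        if not location:
            break
        current = location
        seen.append(current)
    for u in seen:
        cache[u] = current  # every hop resolves to the final destination
    return current

hops = {"http://sho.rt/abc": "http://spam.example/"}
cache = {}
final = resolve("http://sho.rt/abc", hops.get, cache)
print(final)                                # http://spam.example/
print(cache["http://sho.rt/abc"] == final)  # True: next time is a cache hit
```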


Since it needs URL meta-data to be checked before it runs, and needs to 
add its own meta-data before the rest of the scan runs, it can't really 
work asynchronously AFAICS. Currently it uses a parsed_metadata at 
priority -1 in order to add its own meta-data. Maybe this isn't the 
right way to do this.



Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: New image spam

2009-05-25 Thread Jonas Eckerman

mouss wrote:


is there a way to generalize this to other MIME types? I mean a file
claiming to be a .pdf when it is a .wmv...?


You need to check the files contents to catch that, and the ImageInfo 
plugin isn't meant to understand just any kind of content.



or do we need a FileType plugin?


I guess you missed my post where I said that I've got an experimental 
plugin that is just that. It compares the content-type with the content 
(using File::MimeInfo::Magic, which uses the freedesktop file database).


The plugin was called TypeMismatch for a couple of days, which is closer 
to FileType than the current name: MimeMagic.


Anyway, it can be found at
<http://whatever.frukt.org/spamassassin.text.shtml>

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: New image spam

2009-05-25 Thread Jonas Eckerman

Bob Proulx wrote:


I like the idea of tagging mismatched types where the actual content
doesn't match the stated type.  That would be a good idea for a plugin
enhancement.  Perhaps something based upon libmagic?


I've got a plugin that does this. It's the MimeMagic plugin at 
<http://whatever.frukt.org/spamassassin.text.shtml#MimeMagic.pm>.


FWIW the spam put up by the OP got hit by a mismatch rule when I ran it 
through spamassassin here.


The plugin uses File::MimeInfo::Magic, which in turn uses the freedesktop 
MIME database.


Please note that while the plugin isn't new I still consider it 
experimental since I haven't done enough evaluation of its results.


Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: over-representing non-English spam?

2009-05-20 Thread Jonas Eckerman

Karsten Bräckelmann wrote:


This is not about OpenProtect or their decisions. Actually, there are
more than this one sa-update mirror for the SARE rules.


I think you missed my point. The OpenProtect channel adds a bunch of 
SARE rulesets in a single channel. This means that when you use that 
channel, you delegate the decision on which SARE rulesets to include to 
OpenProtect.

This is fine as long as their decisions fit your mail flow and policy (I 
use OpenProtect's channel myself). If their decisions doesn't fit your 
mail flow and policy, it's better to manually add the rulesets you want 
(for example using Daryl's SARE channels).



OpenProtect just happens to be one of the mirrors to provide that
service to the >= 3.1.1 SA users out there. :)



They didn't write the rules, and they are not responsible for FP hits
*long* after the rules have been validated and updated last time.


They didn't write the rules, but they do decide which rulesets to put in 
their combined channel.

And of course they are not responsible for FPs. The person who 
configured a system to use their channel is responsible for any 
resulting FPs in that system. Which fits what I said to the OP as well.


Regards
/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: over-representing non-English spam?

2009-05-19 Thread Jonas Eckerman

Jason Haar wrote:


As you can see, MIME_CHARSET_FARAWAY, CHARSET_FARAWAY_HEADER, and
SARE_SUB_ENC_GB2312 (from openprotect rules) all triggered - total of
8.0 points. Sounds good - but of course that's very bad! Doesn't that
mean an actual legitimate Chinese email would *default to a score of
8.0*!?!?!?!


About MIME_CHARSET_FARAWAY, CHARSET_FARAWAY_HEADER:

Setting ok_locales to something not including Chinese charsets implies 
that you want Chinese email to get a rather high score.


If you don't want to punish Chinese mail, don't tell SA to do so.

Hint: The default setting is to allow all charsets. It's you (or your 
admin) that has decided to punish Chinese mail.
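
For reference, the relevant knob is `ok_locales` in local.cf. A hedged sketch (the exact locale codes you want depend entirely on your own mail flow):

```
# SpamAssassin default: accept any charset/locale
ok_locales all

# Example: only Western and Chinese mail is expected, so the
# CHARSET_FARAWAY-style rules won't punish legitimate Chinese mail
ok_locales en zh
```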


About SARE_SUB_ENC_GB2312:

This is not a standard SA rule. Adding that rule to your SA ruleset 
implies that you wish to use it.


If you don't want SA to use a specific custom rule or ruleset, don't 
add it to your configuration.


Hint:
Using the OpenProtect channel means that you (or your admin) have 
decided to trust OpenProtect to decide for you which rules to add to your 
ruleset. If you find that you don't agree with OpenProtect's decisions, 
simply stop using their channel and make the decisions yourself.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: should the spam score increase

2009-05-19 Thread Jonas Eckerman

Jari Fredriksson wrote:


As the mail contains no text, there probably is not much to learn.


Why not? Bayes learns from headers as well, and headers can be just as 
useful as body text for classifying mail.


(Note: I haven't seen a single one of these PNG-only spams, so I don't 
know how telling their headers are in practice.)


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: should the spam score increase

2009-05-19 Thread Jonas Eckerman

Lists wrote:

question is should the system now be 'learning' these and thus changing 
the bayes_00 to bayes_50 etc


It's actually quite hard for us to know if you have autolearn turned on 
or off.



If not, what is the best method to go about 'learning' these spam.


If you have shell access:
man sa-learn
man spamc

Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: mcafee sees drop in spam?

2009-05-08 Thread Jonas Eckerman

Chris Hoogendyk wrote:

The first quarter ended just over a week ago.


Actually, it ended over a month ago.

Michael Scheidell wrote:
> looks like mcafee sees a 20% drop in spam?
> wonder what that is about.  I'm not seeing a drop in ATTEMPTED spam

I see a recent (late April or early May) increase in the amount of 
botnet connections, but that's in the second quarter of 2009.


McAfee is comparing the first quarter of 2009 with the first quarter 
of 2008.


McAfee's belief that the lower amount of spam is thanks to the takedown 
of McColo seems reasonable. Similar figures were reported by others as 
well in January or February, IIRC.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: The weirdest problem I have ever met

2009-05-08 Thread Jonas Eckerman

John Hardin wrote:


 spamassassin --remove-addr-from-whitelist=problemacco...@clientdomain.com


An additional note (since, IIRC, the OP said he did this already):

Make sure to run this as the same user as the one which scans the mail 
when it gets the ridiculously high score.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Odd behaviour under load.

2009-05-08 Thread Jonas Eckerman
s, network problems, 
or for some other temporary reason takes a long time to respond. This is 
not a failure to follow RFC2821. This seems to be what happened in this 
case. It is the reason part two is needed.


Part Two: Fix the sending systems so that they do not use an 
inappropriately low timeout after data end (.). There's a 
reason why it SHOULD be 10 minutes.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: The weirdest problem I have ever met

2009-05-07 Thread Jonas Eckerman

Jodizzz wrote:


Result: Email was labelled as very very high spam. Mail headers as below


Unfortunately those headers do not include the actual rules that hit. 
Without knowing this, we can only give you educated guesses.


Please include the lists of hits for the message. It should be possible 
to get your software to either put this in a log or include the hits or 
report in the mail (as headers or a MIME part).



In conclusion the email is only treated as major spam if it is from that
particular user problemacco...@clientdomain.com and via the LAN/ISP
connection.


This *really* sounds like a high scoring AWL entry. Are you sure that 
you have removed the relevant entries from the AWL for the right user?



What seems to be the problem? This is so weird!


The problem *seems* to be that the AWL contains a very high score for 
that address+relay combo.


The fact that the last score is so much lower than the previous score, 
while still being very high, could be an indication of this.


Of course, since we don't know what rules actually hit the message, this 
is just a guess.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Personal SPF

2009-05-05 Thread Jonas Eckerman

Charles Gregory wrote:


Please, stop the PSPF discussions and go implement something that will
work without changing the whole internet



LOL! Please stop discussing ideas?


To be fair, this is the SpamAssassin users list. The purpose of this 
list isn't to discuss the validity of ideas about possible future 
extensions to SPF, DKIM, or whatever, except as to how those ideas 
might have a direct impact on the usage or development of SpamAssassin.


I can't speak for others, but this is one reason why I haven't given my 
opinions about your proposed PSPF.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/



Re: Personal SPF

2009-05-05 Thread Jonas Eckerman
Matus UHLAR - fantomas  5.5.'09,  8:55:

> > Strictly speaking, getting them to use it consistently and properly will  
> > be MORE difficult,

> more difficult than what?

I parsed it as him stating that getting users to use his proposed PSPF will be 
more difficult than getting them to use authenticated SMTP to his servers.

/Jonas



Re: Personal SPF

2009-05-05 Thread Jonas Eckerman

On 04.05.09 10:31, Charles Gregory wrote:
>> OUR mail server *requires* that a user be connected via our dialups.

[...]

Matus UHLAR - fantomas wrote:


Configuring the mail account in their MUA independently on their internet
connection is much easier than changing SMTP server every time they connect
to other network.



This really is an important point. Your current system makes things 
unnecessarily difficult for roadwarriors.


Being able to use authenticated SMTP to port 587 at *one* address is 
much easier than having to set up different outgoing servers for 
different connections, which can become quite tedious if you tend to use 
the connections provided by hotels, for example.


FWIW, this was actually the main justification here for setting up 
authenticated SMTP using a custom SMTP proxy which authenticated against 
different (local) POP mailboxes depending on user name and server IP. 
Our users (me included) understandably wanted mail on laptops to be easier.


The possibility of using SPF and DKIM were just bonuses.

/Jonas

--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: Personal SPF

2009-05-04 Thread Jonas Eckerman

Charles Gregory wrote:


Proposal: "Personal SPF" - A DNS-based lookup system to allow individual 
senders of e-mail to publish a *personal* SPF record within the context 
of their domain's SPF records, that would identify an IP or range of 
IPs which they would be 'stating' are the only possible sources of 
their mail.


The only other possible work-around for this is to enforce a 'hard' SPF 
and establish 'pop before SMTP' or 'SMTP auth' protocols, then spam our 
membership informing them that use of our server is mandatory. But that 
would cause problems, because we don't really know *who* is using third 
party servers, and too many of them wouldn't read the notice... :(


Why do you think it would be easier to get those of your users that send 
through other servers to publish a personal SPF record with correct 
information about the external IP address of the outgoing relay they use 
than it would be to get them to use SMTP auth with your servers?


How many users have any idea at all about the external IPs of their 
ISPs' mail relays?


How many of the users who do have a good idea about the external IPs of 
their ISPs' mail relays have no idea how to tell their mail client to 
send using authenticated SMTP with your servers?


I might just be confused, but to me it seems that your solution requires 
more from your users, not less.


And, even if (big, big if) the big mail receivers (Yahoo, Google, big 
ISPs, etc.) do eventually support your personal SPF, it'll take years 
until it becomes effective.


Regards
/Jonas



But if we had a 'personal' system, then for as many members as we reach 
(who pay attention to notices), we could let them 'opt in' to a voluntary "I 
only send my mail from here" type of system, and then that would at 
least provide *some* address protection/confirmation.


Do they all have static IP addresses, or do you plan to allow users on 
dynamic addresses to send mail directly?


As noted above, we can control our (dynamic) dialups, but not third 
party usage. So effectively, anyone, anywhere, can use an hwcn.org 
return address. This is something I'd really like to limit to legitimate 
users
without enforcing use of our mail server only (though I realize this may 
be the best long term solution for us).


Of course, my suggestion also hinges on whether there are a sufficient 
number of other systems out there in a similar 'position' as us, who 
would also benefit from this 'next level' of SPF verification...


- Charles



--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: 'anti' AWL

2009-04-29 Thread Jonas Eckerman

RW wrote:


By your chronological definition of first and last (which is the same as
mine), that's the FIRST non-private address.


Or the address in the fake Received header the spambot put in the mail?

I hope this is not how it works...


It makes sense to me, if I send you an email, the AWL entry should use
my IP address not a random gmail server.


Considering that lots of people have dynamic routable addresses, this 
seems like a bad idea for a big group of people not using WebMail.


Regards
/Jonas
--
Jonas Eckerman
Fruktträdet & Förbundet Sveriges Dövblinda
http://www.fsdb.org/
http://www.frukt.org/
http://whatever.frukt.org/


Re: user-db size, excess growth...limits ignored

2009-04-02 Thread Jonas Eckerman

Linda Walsh skrev:


Yeah -- then this refers back to the bug about there being no way to  prune
that file -- it just slowly grows and needs to be read in when spamd 
starts(?)


No.

The AWL is stored in a database, and spamd does not read the whole 
database into memory. It just looks up and updates the address pairs as 
needed.


The same principle is true for the bayes database.


So the only real harm is the increased read-initialization and the run-time
AWL length?


I don't know what you mean by "run-time AWL length", but I don't think 
the time to open a Berkeley DB grows much as the file grows.


What will become slower as the file grows are the database updates and, 
to a lesser degree, the lookups.


If the AWL or bayes database grows enough for this to actually do harm, 
I'd suggest moving to a SQL database (where expiration of old address 
pairs is pretty easy to implement).
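
As a sketch of why expiration is easy in SQL: with a hypothetical AWL-style table (SpamAssassin's real SQL schema differs; see the sql/ directory in the SA distribution), pruning stale address pairs is a single DELETE. Python with sqlite3, purely for illustration:

```python
import sqlite3
import time

# Hypothetical AWL-style table with a "last seen" timestamp per
# address/IP pair; the real SA schema differs but the idea is the same.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE awl (
    username TEXT, email TEXT, ip TEXT,
    msgcount INTEGER, totscore REAL, lastseen INTEGER)""")

now = int(time.time())
old = now - 200 * 86400  # pair last seen 200 days ago
con.execute("INSERT INTO awl VALUES ('jonas', 'a@example.org', '10.0.0.1', 3, 1.2, ?)", (old,))
con.execute("INSERT INTO awl VALUES ('jonas', 'b@example.org', '10.0.0.2', 9, -4.0, ?)", (now,))

# Expire pairs untouched for 90 days -- trivial in SQL, awkward to do
# in-place on a Berkeley DB file:
cutoff = now - 90 * 86400
con.execute("DELETE FROM awl WHERE lastseen < ?", (cutoff,))
print(con.execute("SELECT email FROM awl").fetchall())  # [('b@example.org',)]
```

Run such a DELETE from cron and the database stops growing without bound.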



Regards
/Jonas


Re: How to disable DNSWL?

2009-03-02 Thread Jonas Eckerman

Matthias Leisi wrote:


Speaking of which, it may actually make sense to use all of dnswl.org's
entries as trusted_networks-entries...


That seems like a way to get false positives when someone with a 
listed dynamic IP sends through the smarthost of their ISP or ESP.


By extending trust to the ESP/ISP smarthost, SA will do RBL 
checks on the system that sent the mail to the smarthost. That 
system may well be a SOHO or private user with a dynamic IP 
address. Possibly even a dynamic IP address that has previously 
been used by someone else to send spam.


(Please note that I've currently got a fever and therefore may be 
tricked by a non-optimally working brain into writing things that 
simply aren't correct...)


Regards
/Jonas

--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Re: netlawyers: why is this patentable?

2009-02-20 Thread Jonas Eckerman

Michael Scheidell wrote:


wonder why this is patentable?


Loads of things are patentable in the sense that someone 
manages to get a patent. That doesn't mean the patent can 
withstand a challenge.


You never know for sure whether a patent (or a trademark) is fully 
valid until it is disputed (in court) and survives.


> sounds like pre-queue filtering available in every mta since the early 90's...


It sounds more like the bastard child of a packet-sniffing 
traffic-analyzing firewall and a spam-scanning SMTP proxy.



looks for 'helo/mailfrom/recpt to' then drops or accepts connection.


From the abstract it's possible that it does so at a firewall 
level rather than as an MTA, though it might also describe a more 
common SMTP proxy.


/J

--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Re: country in africa

2009-02-01 Thread Jonas Eckerman

RobertH wrote:


looking hard?



of course i did.


You did say you didn't see Nigeria anywhere. I took this to mean 
that you didn't see it anywhere in the SA default rules, which 
you would have found with a quick grep.


Now I don't know what you meant when you said you didn't see it 
anywhere. Perhaps that it wasn't mentioned, which it obviously was.


how many legitimate emails a day do you people get with the word Nigeria in
it?


I get one every now and then. Those usually have to do with spam, 
but not always.


Sometimes we get quite a few from TT (a Swedish news agency). At 
those times it's likely to also be mentioned in our own 
specialized newspaper (made for deafblind people) as well as in 
several newsletters people subscribe to.


We have had correspondence with non-profits in Nigeria as well, 
but I've no idea how common that is.


In contrast, I can't even remember the last time a 419-type mail 
mentioning Nigeria slipped through our filter.


As an aside:

We once got a legitimate mail from a Nigerian NGO seeking 
financial help for their work with disabled people. We're a Swedish 
NGO for deafblind people with a few projects in Africa, so it's 
not a spammy thing for them to do. It got stuck in our quarantine 
(which is reviewed most workdays), so we actually received it.


I do feel sorry for them since it was most likely stopped almost 
everywhere. Their mail mentioned money, transfers of money, the 
government of Nigeria, and banks, and was sent from Nigeria.



yeah, that is what i thought.   :-)


It was?


when i get a nigerian scam email that hits squat, well you get the
idea.


Yeah. You get mail that I don't.

I don't get Nigerian scam email myself, and our users don't 
report any to me. We reject and quarantine at 9 points, and 
reject without quarantine at 18 points. So Nigerian scams get at 
least 9 points here.


So most Nigerian scam mail is either stopped by our greylist or gets 
18 points or more, and virtually none gets lower than 9 points here.


Regards
/Jonas

--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Re: country in africa

2009-01-30 Thread Jonas Eckerman

RobertH wrote:

how is it that the country in africa so often mentioned in email scams is
not worth a point in SA default config


You mean the rules?


nor do i see it anywhere


You must not be looking very hard. It's there, both in the 
default ruleset and in the updated ruleset, but not as a 
single-word rule:


jo...@chip:~$ grep -i nigeria /var/db/spamassassin/3.002005/updates_spamassassin_org/*
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:# SpamAssassin rules file: advance fee fraud rules (Nigerian 419 scams)
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:body __FRAUD_NEB /(?:government|bank) of nigeria/i
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:body __FRAUD_BEP /\b(?:bank of nigeria|central bank of|trust bank|apex bank|amalgamated bank)\b/i
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:body __FRAUD_YQV /nigerian? (?:national|government)/i
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:describe ADVANCE_FEE_2 Appears to be advance fee fraud (Nigerian 419)
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:describe ADVANCE_FEE_3 Appears to be advance fee fraud (Nigerian 419)
/var/db/spamassassin/3.002005/updates_spamassassin_org/20_advance_fee.cf:describe ADVANCE_FEE_4 Appears to be advance fee fraud (Nigerian 419)

jo...@chip:~$ grep -i nigeria /usr/local/share/spamassassin/*
/usr/local/share/spamassassin/20_advance_fee.cf:# SpamAssassin rules file: advance fee fraud rules (Nigerian 419 scams)
/usr/local/share/spamassassin/20_advance_fee.cf:body __FRAUD_NEB /(?:government|bank) of nigeria/i
/usr/local/share/spamassassin/20_advance_fee.cf:body __FRAUD_BEP /\b(?:bank of nigeria|central bank of|trust bank|apex bank|amalgamated bank)\b/i
/usr/local/share/spamassassin/20_advance_fee.cf:body __FRAUD_YQV /nigerian? (?:national|government)/i
/usr/local/share/spamassassin/20_advance_fee.cf:describe ADVANCE_FEE_2 Appears to be advance fee fraud (Nigerian 419)
/usr/local/share/spamassassin/20_advance_fee.cf:describe ADVANCE_FEE_3 Appears to be advance fee fraud (Nigerian 419)
/usr/local/share/spamassassin/20_advance_fee.cf:describe ADVANCE_FEE_4 Appears to be advance fee fraud (Nigerian 419)
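
To see those body patterns fire, they can be transcribed as ordinary regexes. Python here purely for illustration; SA applies them with its own body-rule engine:

```python
import re

# __FRAUD_NEB and __FRAUD_YQV from 20_advance_fee.cf, transcribed
# verbatim as case-insensitive Python regexes:
FRAUD_NEB = re.compile(r"(?:government|bank) of nigeria", re.I)
FRAUD_YQV = re.compile(r"nigerian? (?:national|government)", re.I)

sample = "I am a director at the Central Bank of Nigeria..."
print(bool(FRAUD_NEB.search(sample)))                                # True
print(bool(FRAUD_YQV.search("the Nigerian government has approved")))  # True
```

Note that the full ADVANCE_FEE_* rules are meta-rules combining many such sub-patterns, so a single hit doesn't score by itself.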


/Jonas
--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Re: excessive scan time

2009-01-23 Thread Jonas Eckerman

Brian J. Murrell wrote:


I'd also suggest using SQL for user preferences.


The user interface (i.e. editing a file) for user preferences is a 
different story.  Now users need to know how to edit SQL records, or I 
need to install a web interface for that.


Or you use a small script that reads the user's preferences from a 
file (when the file has been modified) and updates the SQL database.


Regards
/Jonas
--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Re: excessive scan time

2009-01-22 Thread Jonas Eckerman

Brian J. Murrell wrote:

One thing worth noting is that I have spamassassin using ~/.spamassassin 
here and people's home dirs can be (i.e. NFS) mounted from remote 
machines (i.e. their primary workstations), which do occasionally get 
shut down.


If you're not already using a SQL database for bayes and AWL, I'd 
suggest you do that.


I'd also suggest using SQL for user preferences.

I wonder what happens in the MTA->SA->local delivery process 
chain when ~/.spamassassin is unavailable, or worse, on a stale mount.


With bayes, AWL and user prefs in a SQL database that problem 
ought to be avoided. (Maybe there's more than those that should 
be moved from ~/.spamassassin though).


/Jonas
--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Re: Botnet plugin

2009-01-16 Thread Jonas Eckerman

Henrik K wrote:


Less info only if you are running a sad MTA, that doesn't properly resolve.
I guess the SOHO rule is exception,


That was what I meant. :-)


Check for IP in hostname? Does anyone have actual stats, that it's somehow
better than a generic \d+-\d+ regex or whatever? Sometimes it's just better
to KISS.


I don't have any stats now, but I use a similar check in our 
selective greylisting and once checked stats for that.

There was a clear difference (catching more fqdns with fewer FPs) 
when I changed from a simple check to a more complex one.


(Comparing the fqdn with the IP address allows you to match 
patterns that might otherwise lead to FPs.)
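
A sketch of the idea (hypothetical helper, not the actual greylisting or Botnet code): instead of a generic digit-run regex, require the host name to embed the connecting IP's own octets, in forward or reversed order:

```python
import re

def ip_in_hostname(fqdn: str, ip: str) -> bool:
    """True if the host name embeds its own IPv4 address.

    Comparing against the actual IP (rather than matching any digit
    runs) avoids flagging names that merely contain numbers.
    """
    octets = ip.split(".")
    for order in (octets, octets[::-1]):  # forward and reversed rDNS styles
        # Octets in sequence, separated by 1-3 non-digit chars, with
        # lookarounds so "2" can't match inside "21":
        pattern = (r"(?<!\d)"
                   + r"[^\d]{1,3}?".join(map(re.escape, order))
                   + r"(?!\d)")
        if re.search(pattern, fqdn):
            return True
    return False

print(ip_in_hostname("cpc1-88-110-5-2.example.net", "88.110.5.2"))  # True
print(ip_in_hostname("mail.example.com", "88.110.5.2"))             # False
```

The lookarounds are what buy the FP reduction: a host like adsl-88-110-5-21.net does not match for 88.110.5.2.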


Regards
/Jonas
--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Re: Botnet plugin

2009-01-16 Thread Jonas Eckerman

Mark Martinec wrote:


In a while I'll send a patch to the author.



That is noble, but apparently it doesn't have any effect.


When Botnet was known as RelayChecker I made a suggestion to the 
author. That suggestion was incorporated in the code.


For some reason I take that as an indicator that my suggestion 
did have an effect at that time, and that there is a possibility 
that my new suggestion also has an effect (depending on, among 
other things, what the author thinks about it).


I also seem to recall that the author gives credit (in some file 
included in the Botnet tar) to a whole bunch of people for 
suggestions and/or changes. Presumably at least some of those 
suggestions and/or changes did have some kind of effect on the 
plugin.


/Jonas

--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/




Re: Botnet plugin

2009-01-16 Thread Jonas Eckerman

Benny Pedersen wrote:


i have changed to use BadRelay from



http://sa.hege.li/BadRelay.pm
http://sa.hege.li/BadRelay.cf


After reading BadRelay.pm I see that it does not really replace 
Botnet.


Some of the differences in what is checked are due to Botnet 
doing DNS-lookups while BadRelay avoids that. That's fair enough 
since one of the points of BadRelay is to avoid those lookups. It 
does mean that BadRelay has less info to base decisions on than 
Botnet though.


One difference is simply due to the fact that all BadRelay does 
is simple regexp matches. BadRelay doesn't have Botnet's 
check for IP in host name, which it could do without DNS lookups.


Also, it should be a small and simple change to Botnet in order 
to use some of its functions without making it do its own DNS 
lookups, AFAICT. The eval checks "botnet_ipinhostname", 
"botnet_clientwords" and "botnet_serverwords" should be able to work 
without any DNS lookups with this small change. I might do a 
patch for this (if there is any interest).


What would be nice though would be a plugin that:

1: Has a simple (for the user) cf option to decide whether 
*any* additional DNS lookups should *ever* be done or not.


2: If told to do lookups, do as many of those as possible 
asynchronously, the way SA's DNSBL checks are done.


This would require a redesign of the plugin's structure though. I 
*might* do this (in that case I'd do a completely new plugin 
based on Botnet) if I get time for it, but I currently have no 
way of knowing when or if that might be.


Regards
/Jonas
--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Botnet plugin patch - avoid FPs from DNS timeouts

2009-01-15 Thread Jonas Eckerman

Hello!

Here's a small patch for the Botnet plugin.

The difference from the original is that it doesn't treat a 
timeout or DNS error the same as a not-found answer. This should 
avoid FPs due to overloaded or slow DNS servers.


This patch is against a version that has already been patched 
in order to get short timeouts from the resolver. When using the 
original version, DNS timeouts will probably not occur as often 
and this patch will not make as big a difference.


Please note that I've only been testing this since earlier today. If 
you see or notice a mistake or problem in it, please tell me and 
the list about it.


Regards
/Jonas

--
Jonas Eckerman, FSDB
http://www.fsdb.org/
--- Botnet.pm   Thu Jan 15 21:35:42 2009
+++ Botnet.pm.new   Thu Jan 15 21:36:25 2009
@@ -721,8 +721,16 @@
dnsrch=>0,
defnames=>0,
);
-  if ($query = $resolver->search($name, $type)) {
- # found matches
+  if ($query = $resolver->send($name, $type)) {
+ if ($query->header->rcode eq 'SERVFAIL') {
+# avoid FP due to timeout or other error
+return (-1);
+}
+ if ($query->header->rcode eq 'NXDOMAIN') {
+# found no matches
+return (0);
+}
+ # check for matches
  $i = 0;
  foreach $rr ($query->answer()) {
 $i++;
@@ -744,12 +752,12 @@
   }
}
 }
- # $ip isn't in the A records for $name at all
+ # found no matches
  return(0);
  }
   else {
- # the sender leads to a host that doesn't have an A record
- return (0);
+ # avoid FP due to timeout or other error
+ return (-1);
  }
   }
# can't resolve an empty name nor ip that doesn't look like an address
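
The decision logic the patch introduces can be restated in a small helper (hypothetical names, Python instead of Perl, kept deliberately simple): a SERVFAIL or other transport error must not be scored like an authoritative "no such record" answer.

```python
def classify_lookup(rcode: str, answers: list, ip: str) -> int:
    """Mirror the patch's return values: -1 = temporary DNS failure
    (don't score, avoid an FP), 0 = authoritative no-match, 1 = match."""
    if rcode == "SERVFAIL":
        return -1   # timeout or server error: not evidence of anything
    if rcode == "NXDOMAIN":
        return 0    # the name really doesn't exist
    if ip in answers:
        return 1    # the IP is among the A records for the name
    return 0        # name exists, but doesn't point at the IP

print(classify_lookup("SERVFAIL", [], "192.0.2.1"))            # -1
print(classify_lookup("NOERROR", ["192.0.2.1"], "192.0.2.1"))  # 1
```

The original code collapsed the -1 and 0 cases into one, which is exactly how a slow resolver turned into false positives.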


Botnet plugin (was: Temporary 'Replacements' for SaneSecurity)

2009-01-15 Thread Jonas Eckerman

Daniel J McDonald wrote:


I too found botnet to be a great source of FP.  By combining it with p0f
it's moderately useful.


I just found one reason for FPs in the Botnet plugin. It doesn't 
distinguish between timeouts (and other DNS errors) and negative 
answers. So if your DNS server/proxy is overloaded (or slow for 
some other reason), you'll get FPs.


Since 15 minutes ago, I'm running a slightly modified version of 
the plugin that tries to avoid this. In a while I'll send a patch 
to the author.


Apart from this the plugin seems to work fine here with a score 
of +2 (with an extra +1 if p0f says it's a Windows system).


Regards
/Jonas

--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/



Re: sa-update does not pick up newest German spam wave

2008-12-04 Thread Jonas Eckerman

Richard Hartmann wrote:

While I agree in general, the text is very static and antivirus eats CPU,
SA does not (so much).


What AV application do you use? Is it daemonized or does it have 
to load its database for every call?


Here SA uses lots more CPU than clamd and fprotd do.

/Jonas

--
Jonas Eckerman, FSDB & Fruktträdet
http://whatever.frukt.org/
http://www.fsdb.org/
http://www.frukt.org/


