Re: writing rules howto?

2014-05-31 Thread Andreas Schulze
Karsten Bräckelmann:
 Since SA 3.4, there are template tags which already might be all you
 need. The template tags _URIHOSTS_ and _URIDOMAINS_ list all extracted
 (and to be looked up) URIs, including full hostname and domain only
 respectively. No path information.
 
   add_header all UriHosts _URIHOSTS_
 
 will add an X-Spam-UriHosts header. Since this actually is provided by
 the URIDNSBL plugin, skiplist and max number apply as outlined.

Kasten,

thanks for these comprehensive answers. I think they are valuable pointers.

Andreas


Re: writing rules howto?

2014-05-31 Thread Andreas Schulze


Andreas Schulze:


Kasten,

sorry - Karsten

Works wonderfully. I now have a list of the hostnames SA finds in the message
body as a new header! Thanks. Much simpler than I thought...

Andreas




writing rules howto?

2014-05-30 Thread Andreas Schulze
Hello,

I have to get an overview of the http links in a specific mail stream.
My plan is to use SpamAssassin, as it can parse a message body much better
than I can :-)
There is a plugin, URIDNSBL, that can fire DNS queries for every URL found.
That's fine for me, as the URL then shows up in my DNS server log.

But I would like to combine it with other properties of a message.
Is it possible to do something like this:

if (subject =~ foo) {
  uridnsbl  URIBL_FOO   foo.myzone. A
  body  URIBL_FOO   eval:check_uridnsbl('URIBL_FOO')
}
if (subject =~ bar) {
  uridnsbl  URIBL_BAR   bar.myzone. A
  body  URIBL_BAR   eval:check_uridnsbl('URIBL_BAR')
}

Thanks for any hints
Andreas


Re: writing rules howto?

2014-05-30 Thread Karsten Bräckelmann
On Fri, 2014-05-30 at 22:33 +0200, Andreas Schulze wrote:
 I have to get an overview of the http links in a specific mail stream. My
 plan is to use SpamAssassin, as it can parse a message body much better
 than I can :-)
 There is a plugin, URIDNSBL, that can fire DNS queries for every URL
 found. That's fine for me, as the URL then shows up in my DNS server log.

This does not necessarily get you all URIs. There are two limiting
factors:

(a) To lower the load on DNSBL operators and prevent unnecessary DNS
queries, there is a list of domains frequently found in mail, which will
never be blacklisted anyway. These are skipped.

The option clear_uridnsbl_skip_domain can be used to clear the default
skip list.

(b) To prevent excessive queries, the number of domains to look up is
limited. You can set a higher value for uridnsbl_max_domains, if the
default of 20 is not sufficient in your case.

Both these options are documented here:

  http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_URIDNSBL.html


Depending on what you actually want to extract from the messages, the
resulting DNS queries of the URIDNSBL plugin might not be sufficient.
URIDNSBL does NOT operate on actual, full URIs, but on their domains only.
No path information, and no hostname level.

If you need more information and detail, you'll have to write a custom
plugin, which has access to the complete, internal URI list.


 But I would like to combine it with other properties of a message.
 Is it possible to do something like this:
 
 if (subject =~ foo) {
   uridnsbl  URIBL_FOO   foo.myzone. A
   body  URIBL_FOO   eval:check_uridnsbl('URIBL_FOO')
 }

No, that is not possible.

However, you can achieve such logic with a custom plugin. In addition to
the internal URI list, a plugin can access which rules already matched.
For that, the rules used as a conditional must have been completed
already (lower priority, and not asynchronous).

The bulk of the regex based rules are run at default priority 0, which
also holds for custom header rules. By running your plugin at a higher
priority level, its action can depend on conditions encoded as plain
rules.


Depending on your environment and needs, a plugin might be overkill and
require too much effort. If the corpus is sufficiently small, and you
don't plan on running the analysis frequently, you might get quick
results out of a hack, harvesting -D debug output.

  uri     __DUMP_URIS  m~https?://.+~
  tflags  __DUMP_URIS  multiple

That is a sub-rule, matching any http or https URI. Due to tflags
multiple, the debug output will list the matching part along with the
rule's name to grep for. (Note though that this does include various
internal versions, with path info stripped, etc. These duplicates need
to be filtered out.)

If you extract the URIs on a per-message basis, you can easily include
more custom rules and have your data harvesting script use them as
conditionals.
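
A rough sketch of such a harvesting script, in Python. It assumes the -D
output has been captured once per message and that hit lines look like
'rules: ran uri rule __DUMP_URIS ======> got hit: "..."'. That line format,
the function name harvest and the sub-rule __SUBJ_FOO (imagined here as a
header rule matching "foo" in the Subject) are assumptions for illustration,
so check them against the debug output of your own version.

```python
import re

# Assumed shape of a "got hit" debug line; verify against your own -D output.
HIT_RE = re.compile(r'rules: ran (\w+) rule (\S+) =+> got hit: "(.*)"')


def harvest(debug_lines, dump_rule="__DUMP_URIS"):
    """Return (unique URIs in first-seen order, names of other rules hit)."""
    uris = []
    seen = set()
    rules = set()
    for line in debug_lines:
        m = HIT_RE.search(line)
        if not m:
            continue
        _rule_type, name, match = m.groups()
        if name == dump_rule:
            # Only exact duplicates are dropped here. The internal variants
            # (path stripped etc.) still need filtering of your own.
            if match not in seen:
                seen.add(match)
                uris.append(match)
        else:
            rules.add(name)
    return uris, rules


# Made-up sample lines for one message; __SUBJ_FOO is a hypothetical rule.
SAMPLE = [
    'May 30 22:33:01.000 [4242] dbg: rules: ran header rule __SUBJ_FOO ======> got hit: "foo"',
    'May 30 22:33:01.100 [4242] dbg: rules: ran uri rule __DUMP_URIS ======> got hit: "http://www.example.com/a?b=1"',
    'May 30 22:33:01.101 [4242] dbg: rules: ran uri rule __DUMP_URIS ======> got hit: "http://www.example.com/a?b=1"',
    'May 30 22:33:01.102 [4242] dbg: rules: ran uri rule __DUMP_URIS ======> got hit: "https://example.org/"',
    'May 30 22:33:01.200 [4242] dbg: check: some unrelated debug line',
]

if __name__ == "__main__":
    uris, rules = harvest(SAMPLE)
    # Use an ordinary rule as the conditional, per message.
    if "__SUBJ_FOO" in rules:
        for uri in uris:
            print("foo", uri)
```

Keeping the conditional in the script rather than in the SA configuration
means the rules themselves stay plain header or body rules.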


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: writing rules howto?

2014-05-30 Thread Karsten Bräckelmann
On Sat, 2014-05-31 at 00:44 +0200, Karsten Bräckelmann wrote:
 Depending on what you actually want to extract from the messages, the
 resulting DNS queries of the URIDNSBL plugin might not be sufficient.
 URIDNSBL does NOT operate on actual, full URIs, but on their domains only.
 No path information, and no hostname level.

Since SA 3.4, there are template tags which already might be all you
need. The template tags _URIHOSTS_ and _URIDOMAINS_ list all extracted
(and to be looked up) URIs, including full hostname and domain only
respectively. No path information.

  add_header all UriHosts _URIHOSTS_

will add an X-Spam-UriHosts header. Since this actually is provided by
the URIDNSBL plugin, skiplist and max number apply as outlined.
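
To collect the hostnames afterwards, something along these lines might do.
This is a minimal Python sketch, assuming the header is named
X-Spam-UriHosts as above and that its entries are separated by commas and/or
whitespace. The separator and the function name uri_hosts are assumptions,
so check a real header first.

```python
import re
from email import message_from_string


def uri_hosts(raw_message, header="X-Spam-UriHosts"):
    """Return the hostnames listed in the given header of one raw message."""
    msg = message_from_string(raw_message)
    hosts = []
    for value in msg.get_all(header, []):
        # Assumption: entries are separated by commas and/or whitespace;
        # splitting on both also copes with folded header lines.
        hosts.extend(h for h in re.split(r"[,\s]+", value) if h)
    return hosts


# Made-up message with a folded header, for illustration only.
RAW = (
    "Subject: foo\n"
    "X-Spam-UriHosts: www.example.com, example.org,\n"
    " cdn.example.net\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    for host in uri_hosts(RAW):
        print(host)
```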

