Re: [SURBL-Discuss] MIT Spam conference

2004-12-18 Thread Ryan Thompson
Yarg. I hate it when this happens. Maybe it's free, but it's still ~$600
to get me there and back, and I can't write it off or cover it
personally just now.
Ummm... Hey! Anybody want to pay me to program some stuff or write some
rules or something? :-) I'll take good notes. :-)
- Ryan
William Stearns wrote to ML-spamassassin-talk and ml-surbl-discuss on Fri,...:
Good day, all,
	I'll be attending the MIT spam conference this year, Jan 21st, 9-5. 
Details at http://www.spamconference.org/ .  The registration is free, but 
they suggest an early registration before the conference fills up.
	I'd love a chance to meet other people working on spamassassin and 
surbl.  Is anyone else planning on attending?
	Cheers,
	- Bill

---
God grant me the senility to accept the things I cannot change,
The frustration to try to change things I cannot affect, and the wisdom
to tell the difference.
(Courtesy of Mike Ricketts [EMAIL PROTECTED])
--
William Stearns ([EMAIL PROTECTED]).  Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at:   http://www.stearns.org
--
  Ryan Thompson [EMAIL PROTECTED]
  SaskNow Technologies - http://www.sasknow.com
  901-1st Avenue North - Saskatoon, SK - S7K 1Y4
Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
  Toll-Free: 877-727-5669 (877-SASKNOW) North America


Re: A simple way to...

2004-10-09 Thread Ryan Thompson
Robin Lynn Frank wrote to users@spamassassin.apache.org:
We use SA 3.0.0 with MySQL so we can extract certain AWL data and use
it at the MTA level.  However, since SA doesn't have an auto-blacklist
feature,
Hi Robin,
Actually, AutoWhiteList (AWL) is a bit of a misnomer. AWL maintains
average message scores for sender/class-B tuples, so, in effect, it is
also an auto blacklist, because repeat spam senders will have high
average scores in the AWL database.
I'd like to find a relatively simple way to extract IP addresses from
emails that contain spam.  If it is of any importance, we invoke SA
via amavisd-new.
See, for instance, the check_whitelist script in the tools/ directory of
the distribution. I get output like this (average score, then
total/count, then the sender and IP prefix):
-4.5   (-35.6/8)  --  [EMAIL PROTECTED]|ip=64.59
 9.3   (27.9/3)   --  [EMAIL PROTECTED]|ip=65.39
The first line is for a user that sends ham, so his/her score on future
messages would be pushed closer to -4.5.
The second line is for a user that sends spam, so, if they sent a more
hammy message later, the AWL would likely *add* points to the message,
while decreasing the average slightly.
It works both ways. If you want to use this at the MTA level, I could
envision you wanting to grab, say, every entry over a certain average
score and potentially greylist based on that or something.
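A rough Python sketch of that idea follows. The line layout is inferred only from the two sample lines above, and the names parse_awl and over_threshold, along with the 5.0 / 2 cutoffs, are made up for illustration; none of this ships with SA.

```python
import re

# Layout assumed from the sample check_whitelist output:
#   mean (total/count)  --  sender|ip=prefix
ENTRY_RE = re.compile(
    r"^\s*(?P<mean>-?\d+(?:\.\d+)?)\s*"
    r"\(\s*(?P<total>-?\d+(?:\.\d+)?)\s*/\s*(?P<count>\d+)\s*\)"
    r"\s*--\s*(?P<sender>.+?)\|ip=(?P<ip>\S+)\s*$"
)


def parse_awl(lines):
    """Turn check_whitelist-style lines into dicts; skip lines that don't match."""
    entries = []
    for line in lines:
        m = ENTRY_RE.match(line)
        if m is None:
            continue
        entries.append({
            "mean": float(m.group("mean")),
            "total": float(m.group("total")),
            "count": int(m.group("count")),
            "sender": m.group("sender"),
            "ip": m.group("ip"),
        })
    return entries


def over_threshold(entries, min_mean=5.0, min_count=2):
    """Keep entries whose average score is high and based on enough messages."""
    return [e for e in entries
            if e["mean"] >= min_mean and e["count"] >= min_count]


if __name__ == "__main__":
    sample = [
        "-4.5   (-35.6/8)  --  ham@example.com|ip=64.59",
        " 9.3   (27.9/3)   --  spam@example.org|ip=65.39",
    ]
    for entry in over_threshold(parse_awl(sample)):
        print(entry["sender"], entry["ip"], entry["mean"])
```

Requiring a minimum message count keeps a single high-scoring message from putting a sender on the list; the right cutoffs depend on your own mail.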
Hope this helps,
- Ryan


Announce: GetURI 1.6 Released

2004-10-01 Thread Ryan Thompson
2004-09-30: GetURI 1.6 Released
I'm very pleased to announce the release of GetURI 1.6. Many new
features have been added to this quickly growing program, along with a
few important bug fixes. Everyone already using GetURI is strongly
encouraged to upgrade as soon as possible. If you haven't yet tried
GetURI, now is a great time to start!
What is GetURI?
GetURI is a program using the SpamAssassin libraries, designed to
extract URIs from ham and spam messages, mbox files, or lists of
domains, and present them in a format designed to help classify domains
for anti-spam efforts such as SURBL, although it has other uses, too.
The included 'uricat' utility provides a simple way to extract URIs from
virtually any text file, regardless of how they are encoded. With the
help of the SpamAssassin libraries, GetURI attempts to ignore
unclickable domains (i.e., poisoning attempts), follow redirects, and
otherwise simulate the action of mail user agents (MUAs) as closely as
possible.
Sample output: http://ry.ca/geturi/results.html
What's new?
Here are just a few of the most notable additions to GetURI 1.6:
- Support for SpamAssassin 2.6x has been re-introduced. Both 3.0 and 2.6x are
  now officially supported.
- By popular demand, support for processing mbox files has been added.
- GetURI now does several forward lookup checks on domains, including SBL/XBL,
  IADB2/WADB, as well as checks on nameservers, to aid classification.
- More documentation is now included in the output, and the output format has
  been improved visually, to hopefully be somewhat more intuitive.
- It is now possible to choose which SURBL host to query, instead of always
  using the previous default of multi.surbl.org.
- A potentially large memory leak was discovered in the handling of SA3.0
  objects.  Consequently, SA3.0 users should upgrade immediately to enjoy
  drastically reduced memory consumption.
Many more changes have been implemented; please see
http://ry.ca/geturi/CHANGELOG for details
To fetch the new version of GetURI, please visit http://ry.ca/geturi/
As always, your feedback will help improve GetURI!
Additional testers are always welcome.
- Ryan Thompson [EMAIL PROTECTED]


Re: scan times up!

2004-10-01 Thread Ryan Thompson
Chris Santerre wrote to Spamassassin-Talk (E-mail):
Well...
ver    avg scan time
2.4x   2.7 seconds
3.0    30.4 seconds
OH MY! Network test :)
Any longer and I might just be doing greylisting by accident. ;)
:-)
Others have pointed out some possible causes. I did fairly extensive
testing between 2.6x and 3.0 before upgrading, which included
performance benchmarks, and, for certain configurations, I found 3.0 to
be marginally faster than 2.6x. In all cases *with equivalent
configurations*, performance was about the same.
- Ryan


Re: MIMEDefang, SpamAssassin and URIDNSBLs

2004-09-26 Thread Ryan Thompson
Tim Boyer wrote to users@spamassassin.apache.org:
3.  Do I have DNS lookup enabled?  Yup:
# Enable or disable network checks
dns_available yes
skip_rbl_checks 0
rbl_timeout 15
Can't think of anything else to try.
Do you have
# If boolean true, skip SA network tests
$SALocalTestsOnly = 1;
in your mimedefang-filter? Make sure you set $SALocalTestsOnly to zero.
For whatever reason, MIMEDefang decided they would override this *one*
SA option within mimedefang-filter. ;-)
If that doesn't help, get a bigger hammer, or maybe ask on the
MIMEDefang list.
If I knew how to make MIMEDefang call SpamAssassin with the debug
switch, that might point me in the right direction.
MIMEDefang uses the SA libs directly... which means, so can you, in
mimedefang-filter. :-) I've never tried it, but you should be able to
enable debugging output before calling the SA check in filter_end().
- Ryan


Re: stripping SA headers for reporting? (spamcop, etc.)

2004-09-18 Thread Ryan Thompson
Andre Nicholson wrote to users@spamassassin.apache.org:
John Owens wrote:
I'd like to send as original a message as I can to SpamCop and other
places since they don't like munged reports. Currently I'm doing this
manually, which is annoying. I note that sa-learn knows how to remove
all SA-specific annotations from a message (unwraps MIME, removes
headers, etc.). Is that functionality available in any other way?
spamassassin -d < MESSAGEFILE > NEWFILE
Or to also report it afterward:
spamassassin -d < MESSAGEFILE > NEWFILE && spamassassin -r < NEWFILE
RTFM, folks. :-)
SPAMASSASSIN(1):
   -r, --report
   Report this message as manually-verified spam.  This will submit
   the mail message read from STDIN to various spam-blocker databases.
   [...]
   If the message contains SpamAssassin markup, the markup will be
   stripped out automatically before submission.
This does the same thing as -d before submission. If it doesn't do what
you want, then your upstream probably isn't adding SA markup (i.e.,
they're wrapping it themselves using MIMEDefang or something).
- Ryan


Re: URI obfuscation check

2004-09-17 Thread Ryan Thompson
Jeff Chan wrote to SpamAssassin Users:
Update on the previous, interestingly the HTML renderer in The Bat!
1.62q did not make the link clickable, but the plaintext message
renderer did.
That's because the HTML did not actually contain a link (anchor); just
the plaintext URI. Many plaintext renderers will, however, link anything
that looks like a URI.
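To illustrate the difference, here is a small Python sketch using only the standard library. The sample body, the example.com URI, and the helper names html_links and autolinked_uris are invented for the illustration, and the regex is a deliberately loose stand-in for whatever a given mail client really uses to auto-link text.

```python
import re
from html.parser import HTMLParser

# Loose approximation of "anything that looks like a URI" (an assumption,
# not the rule any particular mail client applies).
URI_RE = re.compile(r"\bhttps?://[^\s<>\"']+", re.IGNORECASE)


class AnchorCollector(HTMLParser):
    """Collect href values from <a> tags, i.e. real HTML links."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def html_links(html):
    """Links an HTML renderer would make clickable: anchors only."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


def autolinked_uris(text):
    """URIs a plaintext renderer might auto-link by pattern matching."""
    return URI_RE.findall(text)


if __name__ == "__main__":
    body = "<p>Visit http://www.example.com/offer today</p>"
    print("anchors:", html_links(body))
    print("auto-linked:", autolinked_uris(body))
```

For a body like the sample, the anchor list is empty while the pattern match still finds the bare URI, which is the behaviour described above.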
- Ryan


Re: [SURBL-Discuss] Start an IP list to block?

2004-09-09 Thread Ryan Thompson
Chris Santerre wrote to SURBL Discussion list (E-mail):
OK, this isn't the first time we've had this discussion, but Raymond
and I felt this should be made public again. He ran thru some tests of
1500+ domains and found the following data. Looks like they may be
sending from zombies, and never from their hosts. IPs are similar
across the board.
So is there a way to use the IP info in a good way? Could SA or SURBL
do a quick ping of the URL and match against a URL? This would allow
us to simply list 1 IP instead of all these domains.
(I'm well aware of virtual hosts! So only the filthiest of spammers
would be put on this IP list. Then their IP better boot them or anyone
hosted on that box would feel the wrath of SURBL.)
I talked to Raymond about this, too... and, basically, here are my
big thoughts:
We need to find the correlation of IP addresses to hostnames. See
http://whois.sc/ ; I can, with some help, duplicate what they're doing
in a way that will help us fight spam.
Then, for 219.254.32.111, we could see that there are, say, 200 sites
hosted at that IP, and, after some hand checking, identify that all of
them belong to spammers.
However, for all we know *so far*, 219.254.32.111 could be an HA cluster
of a few dozen machines, and, while there may be 200 pill spammers on
that cluster, there may be 20,000 other legit sites.
With our current data, we can't make either determination. But, using
forward zone data, we can do forward lookups, and track them in a database.
Then, do forward lookups on SURBL data to get the IPs of spammers, and
(algorithmically!) find correlations.
The programming effort to implement this would not be trivial, not to
mention processing power and bandwidth, to do the initial run. The
datasets (.com!) are huge. After that, we just have to periodically
sample for new, removed, and changed domains, at which point the
processing will be reduced.
Still, there's no way I have time or money to do this alone, given my
current commitments. I *wish* I could spend my whole day fighting spam.
I'd need a fair amount of real help. It'd be good to make happen,
though, considering we could then *proactively* list domains (or IPs)
with a high degree of confidence and little or no collateral damage.
(Because we can *measure* collateral damage if we know which other
domains are hosted on a particular IP). And there would be many many
other statistical benefits we could gain.
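The core of that correlation can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the real system: the function names, the .example domains, and the 192.0.2.x addresses are all invented, and a real run over zone data would need a database and a far faster resolver than the socket call shown here.

```python
import socket
from collections import defaultdict


def dns_resolve(domain):
    """Forward lookup via the system resolver; returns a list of IPv4 strings."""
    try:
        return socket.gethostbyname_ex(domain)[2]
    except OSError:
        return []


def group_by_ip(domains, resolve=dns_resolve):
    """Map each IP to the set of domains that resolve to it."""
    by_ip = defaultdict(set)
    for domain in domains:
        for ip in resolve(domain):
            by_ip[ip].add(domain)
    return by_ip


def collateral_report(by_ip, listed):
    """Per IP: how many hosted domains are already listed, and how many are not."""
    report = {}
    for ip, hosted in by_ip.items():
        bad = hosted & listed
        report[ip] = {
            "hosted": len(hosted),
            "listed": len(bad),
            "unlisted": len(hosted) - len(bad),
            "listed_fraction": len(bad) / len(hosted),
        }
    return report


if __name__ == "__main__":
    # Fake lookup table standing in for zone-file data (hypothetical values).
    table = {
        "pills-a.example": ["192.0.2.10"],
        "pills-b.example": ["192.0.2.10"],
        "shop.example": ["192.0.2.10", "192.0.2.20"],
    }
    by_ip = group_by_ip(table, resolve=lambda d: table.get(d, []))
    listed = {"pills-a.example", "pills-b.example"}
    for ip, stats in sorted(collateral_report(by_ip, listed).items()):
        print(ip, stats)
```

The "unlisted" count is the measurable collateral damage: an IP would only be a candidate for listing when that number is zero or close to it after hand checking.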
- Ryan


Re: [SURBL-Discuss] Ham corpora needed

2004-09-05 Thread Ryan Thompson
Jeff Chan wrote to SURBL Discuss and SpamAssassin Users:
In order to reduce false positives in the SURBL data, we would
like to have access to ham corpora.  Does anyone know of any
public ham corpora, including just the URI domain names from the
hams?  Or is there anyone who would be willing to run our URI
domain lists against their ham?
Does anyone know if messages from the Enron corpus have been
categorized for ham and spam?
 http://www-2.cs.cmu.edu/~enron/
Thanks in advance for any suggestions, comments, thoughts
FWIW, the mass-check I did on that 75K corpus took about 1.75h, on a
beefy machine with rbldnsd running on localhost, with 20 concurrent
jobs. (mass-check is slower than molasses for anything that blocks if
you don't let it run concurrent jobs :-)
Now, I know not everybody runs SpamAssassin, but it *does* have a really
easy log format and hit-frequencies program. It's possible to
concatenate ham and spam logs from different sources to effectively get
statistics on a larger corpus... and only the test hits are stored in
the log, so the results are effectively anonymous.
There's ham.log for ham, and spam.log for spam, and the entries look
like this, one line per message:
Y  7 /spamdir/11710. URIBL_OB_SURBL,URIBL_WS_SURBL time=1089946124
Rather than re-invent the wheel, you can have your checkers output
simplified mass-check logs. The only column that matters is the tests
column. Something like this should work well enough for hit-frequencies:
N  0 any_string URIBL_TESTS_HIT,COMMA_DELIMITED time=any_integer
Then, grab hit-frequencies from the SA distribution and you can
reproduce the output that others have been posting.
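For a checker that isn't SA, emitting such lines could look roughly like the Python below. The column layout is copied from the two example lines above and nothing else; the function names are invented, and the tally is only a quick sanity check on your own logs, not a replacement for hit-frequencies.

```python
import time
from collections import Counter


def format_log_line(flagged, score, ident, tests, timestamp=None):
    """Build one simplified mass-check-style line (layout assumed from the examples)."""
    if timestamp is None:
        timestamp = int(time.time())
    flag = "Y" if flagged else "N"
    return f"{flag}  {score} {ident} {','.join(tests)} time={timestamp}"


def tally(lines):
    """Count messages and, per test name, how many messages hit it."""
    total = 0
    hits = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a log line
        total += 1
        # With no tests hit, the fourth field is already the time= column.
        if not fields[3].startswith("time="):
            hits.update(t for t in fields[3].split(",") if t)
    return total, hits


if __name__ == "__main__":
    log = [
        format_log_line(True, 7, "/spamdir/11710.",
                        ["URIBL_OB_SURBL", "URIBL_WS_SURBL"], 1089946124),
        format_log_line(False, 0, "any_string", [], 1089946125),
    ]
    total, hits = tally(log)
    for name, count in hits.most_common():
        print(f"{name}: {count}/{total}")
```

Write the spam results to spam.log and the ham results to ham.log, one line per message, so that logs from different sources can simply be concatenated.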
If you *do* have SA installed (even if you don't filter your mail with
it), it's even easier. Just set up a simple .cf file with the URIBL
rules (I'll provide one on request), and invoke mass-check in the tools
directory like so:
./mass-check -p=../rules -c=../rules --net -j=20 --progress \
spam:dir:${SPAMDIR} ham:dir:${HAMDIR}
Then run:
./hit-frequencies -s 3 -p
It's almost worth extracting Mail-SpamAssassin from CPAN just to gain
that functionality. You don't even have to *use* SA. :-)
- Ryan