spamassassin -report says Wide character in syswrite at /usr/lib/i386-linux-gnu/perl/5.22/IO/Handle.pm line 220.

2019-03-02 Thread darxus
I'm trying to use spamassassin's ability to report an email as spam to
various folks who collect that kind of data:

https://wiki.apache.org/spamassassin/ReportingSpam

I'm piping the email to "spamassassin --report", and the result I get is:
Wide character in syswrite at /usr/lib/i386-linux-gnu/perl/5.22/IO/Handle.pm line 220.

This is on Ubuntu 16.04.6 LTS.  A newer LTS release came out almost a year
ago, and maybe upgrading would fix that.

But it kind of looks like this is a bug within spamassassin, and unicode
should be getting handled differently?  

https://www.perlmonks.org/bare/?node_id=329994

I see I have "normalize_charset 1" in my local.cf:
https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#normalize_charset-0-1-default:-0

This problem may be specific to email that declares "Content-Transfer-Encoding:
7bit", but then includes Unicode characters anyway.  For example:
http://www.chaosreigns.com/sa/wide.txt
(Search for "So you".)
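
One quick way to check whether a given message fits that pattern is to look for a 7bit declaration alongside bytes above 0x7F. This is only a rough sketch: the function name is made up for illustration, and it scans the whole file rather than parsing MIME parts, so a multipart message with mixed encodings could be misreported.

```shell
# Hypothetical helper (not part of SpamAssassin): succeed if FILE declares
# "Content-Transfer-Encoding: 7bit" anywhere but also contains bytes >= 0x80.
declares_7bit_but_has_8bit() {
  file=$1
  # Case-insensitive header match; -q keeps grep quiet.
  grep -qi '^Content-Transfer-Encoding:[[:space:]]*7bit' "$file" || return 1
  # Delete every 7-bit byte and count what is left over.
  high=$(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c | tr -d ' ')
  [ "$high" -gt 0 ]
}
```

Something like `declares_7bit_but_has_8bit wide.txt && echo mismatch` would then flag a saved copy of the message.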


Subscription confirmation flood

2019-02-27 Thread darxus
I've gotten many subscription confirmation requests today.  These rules are
getting most of them.  I don't claim they're particularly good rules.  I'm
interested in better options.

http://www.chaosreigns.com/sa/subscriptionflood.txt


Re: UTF-8 rule generator script Re: UTF-8 rules, what am I missing?

2014-10-10 Thread darxus
On 09/29, Jay Sekora wrote:
 Seems like it would be a huge convenience if either (1) turning on
 normalize_charset forced interpretation of rule files as UTF-8, (2)
 there were a similar setting to specify the encoding of rule files, or
 (3) there were a way on a file-by-file basis to say what charset the
 rules in the file were in (which is probably best since it would
 facilitate custom rule sharing across sites).  That's off the top of my
 head with no thought so it may be dumb. :-)

I think it's worth opening a bug.  If I can copy and paste UTF8, I feel
like I really should be able to paste it into a spamassassin rule.


UTF-8 rules, what am I missing?

2014-09-26 Thread darxus
I created some rules to match Polish text:
http://www.chaosreigns.com/sa/polish.txt

The rules with only ascii characters work, the ones with utf8 characters
don't.  According to hexedit, they're identical in my maildir and in my
/etc/spamassassin/local.cf.


SA can handle UTF-8 strings in rules at least since SA 3.2 on Perl
5.8.x. 
- http://spamassassin.1065346.n5.nabble.com/UTF-8-Spam-rules-td106485.html

$ spamassassin --version
SpamAssassin version 3.4.0-rsvnunknown

$ perl --version
This is perl, v5.10.1 (*) built for i486-linux-gnu-thread-multi

spamassassin --lint has nothing to say.

This properly prints a euro sign:
$ perl -Mcharnames=:full -CS -wle 'print "\N{EURO SIGN}"'
€

But spamassassin -t says the rules with non-ascii utf8 characters aren't
hitting.  What am I missing?


If anyone happens upon this email trying to get utf8 stuff straightened
out, to get gnome-terminal to work I needed to add:

$ cat .gnomerc 
export LANG=en_US.utf8

To get apache to work I needed:  AddDefaultCharset utf-8

The rest is covered here:
http://perlgeek.de/en/article/set-up-a-clean-utf8-environment


UTF-8 rule generator script Re: UTF-8 rules, what am I missing?

2014-09-26 Thread darxus
I wrote a script that takes a list of words with UTF-8 characters, and
generates rules matching them:

http://chaosreigns.com/code/dl/sawordrule.pl

For example:

$ echo análisis | perl ./sawordrule.pl SPANISH_
body SPANISH_ANALISIS /\ban[\x{C1}\x{E1}]lisis\b/i # análisis

(The two characters per UTF8 character are the upper and lower case
characters, because /i apparently doesn't apply to these.)

For a bigger example:
cat spanish.txt | tr -d ',;.:()-' | tr ' ' '\n' | sort -f | uniq -i | ./sawordrule.pl SPANISH_ > spanish.cf

A couple untested results:
http://www.chaosreigns.com/sa/spanish.cf
http://www.chaosreigns.com/sa/polish.cf

To be clear, these files will likely flag ALL Polish or Spanish emails as
spam.

By default, rules have a score of 1, so without a corresponding score
line, each of these has a score of 1.
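
If a score of 1 per word is too aggressive, each generated rule needs its own score line. A minimal sketch, assuming the generated file has one "body NAME /regex/" rule per line as in the sawordrule.pl output shown earlier; emit_scores is a made-up name, not something shipped with spamassassin:

```shell
# Hypothetical helper: print "score NAME VALUE" for every body rule in a .cf file.
emit_scores() {
  cf_file=$1
  value=$2
  # Field 1 is the rule type, field 2 is the rule name.
  awk -v v="$value" '$1 == "body" { print "score", $2, v }' "$cf_file"
}
```

For example, `emit_scores spanish.cf 0.1 > spanish_scores.cf` writes the score lines to a separate file (the file name is a placeholder) that can be included alongside the rules.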

The output is going to include some garbage rules you're going to need to
manually delete.  It's also probably going to include occasional rules
which will match English words.  I'm sure I missed a couple of these in the
.cf files I provided.

To use the .cf files, add something like this to your local.cf:

include /etc/spamassassin/spanish.cf
include /etc/spamassassin/polish.cf

On 09/26, John Hardin wrote:
 On Fri, 26 Sep 2014, dar...@chaosreigns.com wrote:
 
 I created some rules to match Polish text:
 http://www.chaosreigns.com/sa/polish.txt
 
 The rules with only ascii characters work, the ones with utf8 characters
 don't.  According to hexedit, they're identical in my maildir and in my
 /etc/spamassassin/local.cf.
 
 Put the hex strings for the accented characters into the RE.
 
 I've had the best reliability from placing each byte in its own
 character class:  [\xd0][\x80]

Thanks.  
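
The byte-per-class suggestion above can be mechanized. The following is only a sketch: utf8_to_byte_classes is a name invented here, it assumes the word arrives as UTF-8 bytes, and it does no regex escaping of ASCII punctuation, so it is only suitable for plain words.

```shell
# Hypothetical helper: rewrite WORD so that every byte >= 0x80 becomes its own
# [\xHH] character class, leaving ASCII bytes as they are.
utf8_to_byte_classes() {
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' |
    while read -r hex; do
      [ -n "$hex" ] || continue          # skip empty fields from od's padding
      if [ "$((0x$hex))" -ge 128 ]; then
        printf '[\\x%s]' "$hex"
      else
        # ASCII byte: print the character itself via an octal escape.
        printf "\\$(printf '%03o' "$((0x$hex))")"
      fi
    done
  echo
}
```

With the UTF-8 word análisis as input, this prints an[\xc3][\xa1]lisis, which could then be wrapped in a body rule by hand.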


Re: UTF-8 rule generator script Re: UTF-8 rules, what am I missing?

2014-09-26 Thread darxus
On 09/26, Adi wrote:
 are part of some SPAM messages but normal messages too.
 You should consider use long phrase to eliminate wrong matching.
 Many Polish words have many meanings depending on the context.

Certainly proper rules that hit only spam would be preferable, but to
make any decent attempt at that would require access to a bunch of Polish
non-spam for testing, which I do not have.

If you (or anybody) are regularly receiving non-spam in a language other
than English (and willing to sort it into spam vs. non-spam folders), it
would be valuable to the spamassassin project to run the testing script
(masscheck) to report how many of your spams and non-spams each of the
rules hit.  You don't have to give anybody a copy of your emails, just
the report of the hit counts.  More info here:

https://wiki.apache.org/spamassassin/NightlyMassCheck


There's also stuff about automatic rule generation here that might be fun:
https://wiki.apache.org/spamassassin/WritingRules#Automatic_rule_generation


On 09/26, John Hardin wrote:
 How do you get a one byte match for two-byte-long UTF-8-encoded
 accented characters? Shouldn't it generate this:

I believe it was putting 'export PERL_UNICODE=' in my ~/.bashrc.
Documentation is here:
http://perldoc.perl.org/perlrun.html#*-C-[_number/list_]*

Before I set that environment variable, as you said, I was getting two
output characters per two byte long UTF-8 character.
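
To see the difference that setting makes, compare how many characters Perl counts for a two-byte UTF-8 character read from STDIN. A small sketch; count_chars is a throwaway name, and it sets PERL_UNICODE explicitly to S (UTF-8 on the standard handles) or 0 (off) rather than to the empty value, because per perlrun the empty value also makes the behaviour depend on the locale:

```shell
# Hypothetical helper: count the characters Perl sees in WORD under a given
# PERL_UNICODE setting ("S" = treat STDIN/STDOUT/STDERR as UTF-8, "0" = off).
count_chars() {
  word=$1
  setting=$2
  printf '%s' "$word" |
    PERL_UNICODE=$setting perl -e 'my $w = <STDIN>; print length($w), "\n"'
}
```

For the single character á (bytes C3 A1), `count_chars á S` reports 1 and `count_chars á 0` reports 2, which matches the one-versus-two output characters described here.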

 Your rule doesn't hit in my test environment (though I just pasted
 that word into an existing message to test...)

Weird.


Non-English spam

2014-09-25 Thread darxus
I had TextCat set up to detect non-English emails as spam:
https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Plugin_TextCat.html
But I apparently didn't have the score turned up high enough.  The default
score for its UNWANTED_LANGUAGE_BODY is 2.800.

I just added this to my /etc/spamassassin/local.cf:
score UNWANTED_LANGUAGE_BODY 5

I expect that to be helpful, since 129 of the 193 spams spamassassin
has missed this month hit that rule (and none of my non-spams have).  That's
67%.  39% of them contained the Polish word for district.
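
That kind of percentage can be recomputed from a folder of missed spam. A rough sketch, assuming one message per file (as in a maildir) and that each message still carries the X-Spam-Status header listing the rules that hit. The function name is made up, and it greps the whole file, so a message that merely mentions the rule name in its body would be counted too.

```shell
# Hypothetical helper: percentage of files under DIR that mention RULE.
rule_hit_pct() {
  dir=$1
  rule=$2
  total=$(find "$dir" -type f | wc -l)
  # -l lists each matching file once; -w avoids matching longer rule names.
  hits=$(grep -rlw -- "$rule" "$dir" | wc -l)
  awk -v h="$hits" -v t="$total" \
    'BEGIN { if (t > 0) printf "%.0f\n", 100 * h / t; else print 0 }'
}
```

With 129 matching files out of 193, this prints 67.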


To enable TextCat to flag everything that's not English, in local.pre
I have:
loadplugin Mail::SpamAssassin::Plugin::TextCat

And in local.cf I have:
ok_languages en


This post was originally going to be asking if anybody wanted to
collaborate on some non-English spam rules.  I guess I'll re-consider that
after October.


Re: SPF failure very low score

2013-08-08 Thread darxus
On 08/08, Quanah Gibson-Mount wrote:
 For SA 3.4.0, it says in 50_scores.cf:
 
 # SPF
 # Note that the benefit for a valid SPF record is deliberately minimal; it's
 # likely that more spammers would quickly move to setting valid SPF records
 # otherwise.  The penalties for an *incorrect* record, however, are
 large. ;)
 
 However, .001 does not seem LARGE to me at all.  I would expect at
 least a 1.  Right now there is tons of facebook spam out there
 that clearly fails SPF, such as the following:
 
 
 X-Spam-Status: No, score=2.407 tagged_above=-10 required=3
   tests=[BAYES_50=0.8, DKIM_ADSP_ALL=0.8, HTML_FONT_LOW_CONTRAST=0.001,
   HTML_MESSAGE=0.001, KHOP_BIG_TO_CC=0.001, RDNS_NONE=0.793,
   SPF_FAIL=0.001, T_HEADER_FROM_DIFFERENT_DOMAINS=0.01] autolearn=no
 
 How is .001 in any way considered a large penalty?

As has been said, SPF is kind of a terrible spam indicator:
http://ruleqa.spamassassin.org/?daterev=20130808-r1511618-n&rule=SPF_FAIL

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME      WHO/AGE
      0   0.1057   1.4410   0.068    0.40    0.00  SPF_FAIL

That says it hits over 10x as large a portion of non-spam as spam.  
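
The arithmetic behind that can be checked directly. This sketch assumes S/O means spam hits over overall hits, computed from the two percentage columns; that definition is an assumption here, though it does reproduce the 0.068 in the row above. The function names are just for illustration.

```shell
# Assumed definition: S/O = SPAM% / (SPAM% + HAM%).
so_ratio() {
  awk -v s="$1" -v h="$2" 'BEGIN { printf "%.3f\n", s / (s + h) }'
}

# How many times larger the ham hit rate is than the spam hit rate.
ham_to_spam() {
  awk -v s="$1" -v h="$2" 'BEGIN { printf "%.1f\n", h / s }'
}
```

`so_ratio 0.1057 1.4410` gives 0.068 and `ham_to_spam 0.1057 1.4410` gives 13.6, so the rule hits roughly 13.6 times as large a share of ham as of spam.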


The explanation for the quote is, quite simply, that it is out of date, and
you should fix it.

-- 
As humans, we are taught to forget that we are animals.
- forward to Johnny The Homicidal Maniac
http://www.ChaosReigns.com


Re: ok_languages

2013-07-12 Thread darxus
Sounds like you didn't load the plugin (in the right place).  There's some
related stuff on http://wiki.apache.org/spamassassin/ImproveAccuracy

On 07/12, Timothy Murphy wrote:
 When I run spamassassin --lint I get the response
 -
 [tim@alfred ~]$ sudo spamassassin --lint
 Jul 12 21:59:15.538 [19228] warn: config: failed to parse, now a plugin,
 skipping, in /etc/mail/spamassassin/local.cf: ok_languages en it fr de ga
 -
 So where do I say now which languages I like?
 
 -- 
 Timothy Murphy  
 e-mail: gayleard /at/ eircom.net
 tel: +353-86-2336090, +353-1-2842366
 School of Mathematics, Trinity College, Dublin 2, Ireland
 

-- 
Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'. - The Color of Magic
http://www.ChaosReigns.com


Re: 2 Seems To Be My Sweet Spot

2013-06-03 Thread darxus
The default rule scores are generated with an assumed threshold of 5
and a target of 1 false positive in 2,500 non-spams.  It sounds like you
may be substantially increasing the false positive rate, which you are
certainly entitled to do, but which I would not recommend.

http://wiki.apache.org/spamassassin/ImproveAccuracy

On 06/03, Bill Polhemus wrote:
 Hello. 
 
 I am not a major admin. I have used a Linux box w/ Sendmail + Spamassassin 
 off and on for years, just for personal and small-biz email. I have only two 
 dozen or so accounts allocated among three domains. 
 
 Using third-party email service for many years, which supposedly includes 
 Spam filtering, I noticed that gradually, of  ~500 or so mails per account 
 per day,  about 40% are spam. And in fact I noticed perhaps half again as 
 many spam were getting through as were caught in my email service provider's 
 Spam trap (I have no idea what they use).
 
 Decided to take things in hand again. 
 
 After about 3 months of fiddling I've got it to the point where I'm down to 
 maybe two Spam per account per day getting through. 
 
 Typical SA Bayes files sizes are about 650K Bayes_seen/AWL  and 1.2G 
 Bayes_toks
 
 Thing is, in order to get this performance I've had to set the threshold for 
 Spam/Ham at a SA score of 2, after all hand-feeding and tweaking I know to 
 do. I lowered it gradually over time by 0.5 every two weeks or so, to this 
 point.
 
 So far I've found maybe 1 or 2 false positives per account per week at this 
 scoring. 
 
 I'm fine with it as is, but thought some folks here might find it interesting 
 to note.
 
 William L. Polhemus, Jr. P.E.
 Sent from my iPhone 5

-- 
Believe nothing, no matter where you read it or who has said it, even
if I have said it, unless it agrees with your own reason and your own
common sense. - Buddha, 563-483 B.C.
http://www.ChaosReigns.com


Re: Sare anda OpenProject Updates

2013-05-27 Thread darxus
https://wiki.apache.org/spamassassin/SoughtRules

On 05/27, Rejaine Monteiro wrote:
 Hello guys,
 
 There are still some active rules update channel? Sare and Open
 looks that are no longer available...
 
 The SARE rules are broken to the point of being harmful (see in
 http://wiki.apache.org/spamassassin/SareChannels)
 OpenProtect' SpamAssassin sa-update channel is obsolete since SARE
 stopped updating their rulesets. Please stop using this channel
 (see in http://saupdates.openprotect.com/)
 

-- 
Whole problem with the world is that fools and fanatics are always
so sure of themselves, and wiser people are full of doubts.
- George Bernard Shaw
http://www.ChaosReigns.com


With similar rules, rspamd is about ten times faster than SpamAssassin.

2013-03-06 Thread darxus
http://freecode.com/projects/rspamd

Somebody asked about it in IRC today.  I don't know anything about it.  

-- 
You will need: a big heavy rock, something with a bit of a swing to it...
perhaps Mars - How to destroy the Earth
http://www.ChaosReigns.com


Re: RCVD_IN_DNSWL_HI false negatives (my solution)

2013-02-07 Thread darxus
On 02/07, Lutz Petersen wrote:
  If you use mobile.de as a forwarder, it may make sense to add their IPs to
  your trusted_networks configuration. If you do this, the DNSxL tests are
  applied to the IP _before_ the mobile.de hop.
 
 That is no problem special to us or our customers. The whitelist level for
 the four mobile.de IPs in the dnswl simply is wrong. Instead of HI a level
 of NONE would be right.

FYI, the guy you were replying to there runs dnswl.  

It sounds like one of your customers has created a mobile.de account, and
requested that email to that account be forwarded to an address for which
you are hosting mail.  If that is the case, this is what spamassassin would
call a trusted relay, and you should add mobile.de's IPs as trusted relays,
like:

trusted_networks 194.50.69.1

This will cause spamassassin to use the IP from the relay before mobile.de
for blacklist and whitelist (dnswl) lookups.


It's kind of an awkward, inconvenient situation.  But if your customer has
requested these emails be relayed, it's kind of unreasonable for you to
expect dnswl to delist them.  

Does that all make sense?

On the other hand, if nobody ever requested that these emails be relayed,
and you can firmly establish that, I (and a couple other people in this
thread) would be happy to drop their score in dnswl.  It just doesn't sound
like that's what's happening.  

As Niamh mentioned, dnswl.org has no record of abuse reports, or blacklists
listing this IP, which is further evidence that something else is going on
in your situation.

(I'm also an (inactive) dnswl admin.)

-- 
The whole aim of practical politics is to keep the populace alarmed --
and hence clamorous to be led to safety -- by menacing it with an endless
series of hobgoblins, all of them imaginary. - H. L. Mencken
http://www.ChaosReigns.com


Do you have your trusted networks configured correctly?

2013-02-05 Thread darxus
I feel like this comes up often enough, people not having trusted_networks
or internal_networks set.  

Probably for most people it's unnecessary.  But if you have some server
relaying / forwarding mail to your server, and you don't have one of these
set, spamassassin is using the IP address of that relaying server for
blacklist lookups, which is not useful.  

And all you have to do is add a line to your local.cf containing:

trusted_networks IP

Where IP is the IP address of the relaying machine.  You can have
multiple, separated by a space.

Often, it seems, people are getting email relayed and have forgotten about
it.  So to look for that, you can add to your local.cf:

add_header all RelaysUntrusted _RELAYSUNTRUSTED_

Then wait till you get a bunch of email, then run something like:

cat ~/Maildir/cur/* ~/Maildir/new/* | grep ^X-Spam-RelaysUntrusted | cut -d' ' -f3 | sort | uniq -c | sort -nr | less

This will list the untrusted IPs you most commonly get email from.  You
should make sure the ones near the top aren't actually trusted relays you
should add to trusted_networks.
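
A slightly more defensive variant of that pipeline pulls out the ip= value by name instead of by field position. Sketch only: it assumes the header value starts with a "[ ip=... " block, which is how _RELAYSUNTRUSTED_ is normally rendered, and it only looks at the first relay on the first header line. The function name is invented here.

```shell
# Hypothetical helper: count the first untrusted relay IP across message files,
# most frequent first.
top_untrusted_relays() {
  # -h suppresses file names so only header lines reach sed.
  grep -h '^X-Spam-RelaysUntrusted:' "$@" |
    sed -n 's/^X-Spam-RelaysUntrusted: *\[ ip=\([^ ]*\).*/\1/p' |
    sort | uniq -c | sort -nr
}
```

Usage would be along the lines of `top_untrusted_relays ~/Maildir/cur/* ~/Maildir/new/* | less`.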

These are the related wiki pages:
http://wiki.apache.org/spamassassin/TrustPath
http://wiki.apache.org/spamassassin/TrustedRelays

I should probably add this testing stuff somewhere.

-- 
I'd rather be happy than right any day.
- Slartibartfast, The Hitchhiker's Guide to the Galaxy
http://www.ChaosReigns.com


Re: ANNOUNCEMENT: update to ivmURI regarding surge in rarely-blacklisted domains spammers use from legit site that are compromised

2013-01-07 Thread darxus
What spamassassin rules is this related to?

On 01/07, Rob McEwen wrote:
 ANNOUNCEMENT: update to ivmURI regarding surge in rarely-blacklisted domains 
 spammers use from legit site that are compromised
 
 There has been a surge during the past couple of days in rarely-blacklisted 
 domains (as in, you see few of these blacklisted on SURBL/URIBL/DBL) ...where 
 the spammers used compromised sites which are normally legit sites. (maybe 
 the FTP password was cracked? or some other security hole exploited?) 
 Likewise, ivmURI was missing many of these because our 
 FP-prevention-filters... which normally prevent decoy domains or innocent 
 domains from getting blacklisted... were also causing many of these to be 
 overlooked. (I suspect that the same was happening with the other URI 
 blacklists, since [it seems?] even fewer of these were getting blacklisted on 
 those other URI/domain blacklists?)
 
 This isn't new. For months, it has been on my mind to make some adjustments 
 to surgically target listing these types of domains... where our 
 FP-prevention-filters would then back off just a tad... yet in a very 
 surgically targeted way... so that these would start blacklisting, yet 
 without those changes to the filters suddenly causing many FPs, and where 
 these domains would also expire off of ivmURI faster--with the idea that the 
 site owners would probably find and fix their problem somewhat quickly. (we 
 don't want these to remain blacklisted weeks after the spam has ceased and 
 the security problem fixed)
 
 Yes, this WILL cause a tiny bit of collateral damage... but my estimation 
 is that the ratio is off-the-chart GOOD! These are relatively minor sites. 
 This could potentially cause hundreds of thousands of spams blocked for every 
 one legit mail blocked. And if someone STILL has a problem with that ratio... 
 then my message to them is... the site owner should be somewhat held 
 accountable for their poor security--which is partly at fault for so much 
 elusive spam making it into inboxes! (and, again, these listings will expire 
 MUCH faster than regular ivmURI listings)
 
 Many of these spams are especially elusive because the spammers then combine 
 the use of a somewhat legit domain... with sending from freemail servers, 
 or other legit mail servers which would cause far too much collateral damage 
 if blocked by IP. At best, this puts a HUGE burden on content filters. At 
 worst, many of these are slipping past many spam filters.
 
 This major milestone improvement for ivmURI was implemented mere hours ago. 
 Here are some results... where these were added to the ivmURI list today:
 
 http://dnsbl.invaluement.com/uri_surge.txt
 
 NOTE: These are all domains impacted by this change. Unfortunately, many in 
 that list would have been blacklisted on ivmURI anyways, without the changes... 
 but many domains in that list required this change to get listed on ivmURI. 
 Also, across the board, you'll also find very few in that list which are on 
 ANY other URI blacklists!
 
 Questions/Feedback are welcome!
 
 -- 
 Rob McEwen
 http://dnsbl.invaluement.com/
 r...@invaluement.com
 +1 (478) 475-9032
 

-- 
And I got these stunning rushes of pure timeless joy, when my
consciousness seemed to expand outwards from the limits of my skin to fill
the universe and I could no longer tell whether I was playing the music or
the music was playing me. - http://www.catb.org/esr/writings/dancing.html
http://www.ChaosReigns.com


Re: Is the SpamAssassin wiki dead?

2013-01-07 Thread darxus
You need to create an account on the wiki, then post to the dev list
requesting write access, mentioning the user name of the account you
created.  As it says at the bottom of http://wiki.apache.org/spamassassin/

On 01/07, Jeremy Morton wrote:
 Sorry, I'm not sure what you mean by added me.  I don't think I
 already had an account with username jez so I was expecting to
 be sent a password too.  What should I do?
 
 -- 
 Best regards,
 Jeremy Morton (Jez)
 
 On 06/01/2013 14:25, Jeremy McSpadden wrote:
 Kevin added you back on the 31st.
 
 Should be done.
 
 Happy new year,
 KAM
 
 On 12/28/2012 7:53 AM, Jeremy Morton wrote:
 Hi,
 
 Please add me to the Contributors Group with the wiki username jez.
 
 
 --
 Jeremy McSpadden
 Flux Labs | Endless Solutions
 Cell : 850-890-2543 | Fax : 850-254-2955
 
 On Jan 6, 2013, at 6:50 AM, Jeremy Morton ad...@game-point.net
 mailto:ad...@game-point.net wrote:
 
 I've been trying to get edit access to the SpamAssassin wiki now for
 weeks, and have gotten nowhere. Is the wiki just dead now? Should
 someone else start a documentation project for SpamAssassin? It's
 pretty ludicrous that nobody even seems to care about letting people
 improve the documentation when they are willing to do so.
 
 --
 Best regards,
 Jeremy Morton (Jez)
 
 

-- 
All that is necessary for evil to triumph is for good men to do nothing
- War and Peace (film series)
http://www.ChaosReigns.com


Re: the sa-rules tarball http://spamassassin.apache.org/ is ancient

2012-12-17 Thread darxus
On 12/08, Per Jessen wrote:
 FYI, see $SUBJ.

Just noticed I opened a bug about this nearly a year and a half ago:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6632

-- 
Anarchy is based on the observation that since few are fit to rule
themselves, even fewer are fit to rule others. -Edward Abbey
http://www.ChaosReigns.com


Re: sa-update generates errors

2012-12-17 Thread darxus
Probably this known problem, bug open for over a year:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6649#c19

The initial comments make it sound like a simple problem of not correctly
escaping rules containing binary data, but it is actually a much more
complicated problem related to the same thing.

On 12/17, Eric Krona wrote:
 From time to time when sa-update is running, I get errors in the output.
 
 Like today I got:
 Illegal octal digit '8' ignored at
 /usr/share/perl5/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
 line 1083, $fh line 1097.
 re2c: error: line 170, column 2: unterminated string constant (missing )
 command 're2c -i -b -o scanner2.c scanner2.re' failed: exit 1
 
 What is the reason for it, are some rules poorly written, or do I
 miss some library or what could be the problem?
 
 /eric
 

-- 
You shall know the truth, and it shall make you odd.
-- Flannery O'Connor
http://www.ChaosReigns.com


Re: sa-update generates errors

2012-12-17 Thread darxus
Can this error at least be improved to state which input file the error is
associated with?

On 12/17, Eric Krona wrote:
 From time to time when sa-update is running, I get errors in the output.
 
 Like today I got:
 Illegal octal digit '8' ignored at
 /usr/share/perl5/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
 line 1083, $fh line 1097.
 re2c: error: line 170, column 2: unterminated string constant (missing )
 command 're2c -i -b -o scanner2.c scanner2.re' failed: exit 1
 
 What is the reason for it, are some rules poorly written, or do I
 miss some library or what could be the problem?
 
 /eric
 

-- 
Hermes will help you get your wagon unstuck, but only if you push on it.
- Greek Alphabet Oracle
http://www.ChaosReigns.com


Re: the sa-rules tarball http://spamassassin.apache.org/ is ancient

2012-12-08 Thread darxus
On 12/08, Per Jessen wrote:
 FYI, see $SUBJ.

Much like the 3.2.5 release which that page still unfortunately implies is
reasonable to use.

I'd love an explanation of a situation where somebody is running
spamassassin but can't run sa-update, even once.  I hear that exists.

-- 
We will be dead soon. Is this how we want to live?
http://www.ChaosReigns.com


Re: Report your webmail usage

2012-12-04 Thread darxus
On 12/04, David F. Skoll wrote:
 http://sourceforge.net/projects/aper/
 
 Their phishing_links file did have the URL you reported in it:

But did it contain that url at the time he received the email?  That seems
to be a very important question with these things.

 So all some kind soul needs to do is write a SpamAssassin plugin that
 gets the link list from the project and looks for URLs in message bodies
 (or even just the Google formkey values which are pretty likely to be
 unique.)

Or a script, similar to their 
https://aper.svn.sourceforge.net/svnroot/aper/addresses2spamassassin.pl
which grabs https://aper.svn.sourceforge.net/svnroot/aper/phishing_links
and converts it to SA rules.  Since something (other than an SA plugin) is
going to need to download the file anyway, might as well convert it to
rules in the process.  Shouldn't be too hard, right?  Maybe use \Q\E to
avoid needing to escape everything?
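
A sketch of that conversion in shell, escaping explicitly rather than relying on \Q\E (whether \Q\E behaves as hoped inside a spamassassin rule isn't confirmed here). It assumes phishing_links is one URL per line. The rule name prefix and the function name are made up, and nothing here downloads the file.

```shell
# Hypothetical helper: read URLs on stdin, print one "uri" rule per URL.
# Every character other than A-Z, a-z, 0-9 and _ gets a backslash in front,
# which also takes care of "/" (the regex delimiter) and "#" (config comment).
links_to_rules() {
  prefix=${1:-PHISH_LINK}
  LC_ALL=C sed -e '/^$/d' -e 's/[^A-Za-z0-9_]/\\&/g' |
    awk -v p="$prefix" '{ printf "uri %s_%d /%s/\n", p, NR, $0 }'
}
```

Something like `links_to_rules PHISH < phishing_links > phishing_links.cf` would produce the rules file, which should go through spamassassin --lint before use.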

 Oh, somewhat off-topic but in case anyone with clout at Google is
 reading this:  More than a year ago, I recommended to Google that all
 of their user-created forms should display this text:
 
 This is a user-created form hosted at Google.  Do not enter sensitive
  information such as credit card numbers or passwords.  If you are asked
  to enter such information, please report this form as abusive.
 
 but Google never got back to me.  It seems to me they're complicit in
 helping phishers...

You think people who will enter sensitive information into a random web
form will even read that warning?  Or be prevented from entering that
information even if they do read it?

Also, it seems like it would be pretty obnoxious for people who constantly
use that stuff legitimately (which I don't).


On 12/04, Eric Krona wrote:
  -0.5 BAYES_05   BODY: Bayes spam probability is 1 to 5%

Is your bayes data poisoned?  
( http://wiki.apache.org/spamassassin/ImproveAccuracy )

-- 
I don't want to die... just yet... not while there's... women.
- J. Matthew Root, 8/23/02 (http://www.jmrart.com/)
http://www.ChaosReigns.com


Can somebody unsubscribe me...@leigh.ssllock.com from this list?

2012-12-04 Thread darxus
I'm guessing they're sending this garbage to everybody who posts.

- Forwarded message from MDaemon at leigh.ssllock.com mdae...@leigh.ssllock.com -

Date: Tue, 04 Dec 2012 17:19:58 -0600
From: MDaemon at leigh.ssllock.com mdae...@leigh.ssllock.com
Reply-To: nore...@leigh.ssllock.com
To: dar...@chaosreigns.com
Subject: Transient Delivery Failure
X-DNSWL: No

--
MDaemon Delivery Status Notification - http://www.altn.com/dsn
--

The attached message had TEMPORARY non-fatal delivery errors.

--
THIS IS A WARNING MESSAGE ONLY - YOU DO NOT NEED TO RESEND YOUR MESSAGE
--

MDaemon is configured to automatically retry delivery at configured
intervals.  Subsequent attempts to deliver this message are pending.

Failed address: ol2...@company.mail

--- Session Transcript ---
 Tue 2012-12-04 17:19:33: [54:1] Session 54; child 1
 Tue 2012-12-04 17:19:33: [54:1] Parsing message \pd5003000.msg
 Tue 2012-12-04 17:19:33: [54:1] *  From: dar...@chaosreigns.com
 Tue 2012-12-04 17:19:33: [54:1] *  To: ol2...@company.mail
 Tue 2012-12-04 17:19:33: [54:1] *  Subject: Re: Report your webmail usage
 Tue 2012-12-04 17:19:33: [54:1] *  Size (bytes): 6325
 Tue 2012-12-04 17:19:33: [54:1] *  Message-ID: 20121204224257.gj12...@chaosreigns.com
 Tue 2012-12-04 17:19:33: [54:1] Attempting SMTP connection to [company.mail]
 Tue 2012-12-04 17:19:33: [54:1] Resolving MX records for [company.mail] (DNS Server: 10.20.20.105)...
 Tue 2012-12-04 17:19:33: [54:1] Match to MXCACHE.DAT file:
 Tue 2012-12-04 17:19:33: [54:1] *  P=010 D=company.mail TTL=(0) MX=[company.mail] {10.10.42.34}
 Tue 2012-12-04 17:19:33: [54:1] Attempting SMTP connection to [10.10.42.34:25]
 Tue 2012-12-04 17:19:33: [54:1] Waiting for socket connection...
 Tue 2012-12-04 17:19:54: [54:1] *  Winsock Error 10060
 Tue 2012-12-04 17:19:54: [54:1] *  10.10.42.34 added to connection failure cache for 5 minutes
 Tue 2012-12-04 17:19:54: [54:1] This message is 36 minutes old; it has 0 minutes left in this queue
 Tue 2012-12-04 17:19:54: [54:1] Remote queue lifetime exceeded; message placed in retry queue
--- End Transcript ---


--
This is a test server. Please do not submit support requests via this channel.

X-MDAV-Result: clean
X-MDAV-Processed: leigh.ssllock.com, Tue, 04 Dec 2012 16:43:26 -0600
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
by leigh.ssllock.com (leigh.ssllock.com)
(MDaemon PRO v13.0.3)
with ESMTP id md5008389.msg
for me...@leigh.ssllock.com; Tue, 04 Dec 2012 16:43:26 -0600
Authentication-Results: leigh.ssllock.com
spf=pass 
smtp.mail=users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org;
x-ip-ptr=pass dns.ptr=hermes.apache.org (ip=140.211.11.3);
x-ip-helo=pass smtp.helo=mail.apache.org (ip=140.211.11.3);
x-ip-mail=hardfail 
smtp.mail=users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org 
(does not match 140.211.11.3);
dkim=pass header.d=chaosreigns.com (b=X4pc00xgJL; 1:0:good);
Received-SPF: pass (leigh.ssllock.com: domain of 
users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org
designates 140.211.11.3 as permitted sender)
x-spf-client=MDaemon.PRO.v13.0.3
receiver=leigh.ssllock.com
client-ip=140.211.11.3

envelope-from=users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org
helo=mail.apache.org
X-Spam-Processed: leigh.ssllock.com, Tue, 04 Dec 2012 16:43:26 -0600
(not processed: message spf and/or cryptographically verified and 
approved)
X-MDPtrLookup-Result: pass dns.ptr=hermes.apache.org (ip=140.211.11.3) 
(leigh.ssllock.com)
X-MDHeloLookup-Result: pass smtp.helo=mail.apache.org (ip=140.211.11.3) 
(leigh.ssllock.com)
X-MDMailLookup-Result: hardfail 
smtp.mail=users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org 
(does not match 140.211.11.3) (leigh.ssllock.com)
X-MDDKIM-Result: unapproved (leigh.ssllock.com)
X-MDSPF-Result: pass (leigh.ssllock.com)
X-Rcpt-To: me...@leigh.ssllock.com
X-MDRcpt-To: me...@leigh.ssllock.com
X-MDRemoteIP: 140.211.11.3
X-Envelope-From: 
users-return-99057-Meche=leigh.ssllock@spamassassin.apache.org
X-CAV-Result: clean
Received: (qmail 24505 invoked by uid 500); 4 Dec 2012 22:43:22 -
Mailing-List: contact users-h...@spamassassin.apache.org; run by ezmlm
Precedence: bulk
list-help: mailto:users-h...@spamassassin.apache.org
list-unsubscribe: mailto:users-unsubscr...@spamassassin.apache.org
List-Post: mailto:users@spamassassin.apache.org
List-Id: users.spamassassin.apache.org
Delivered-To: mailing list users@spamassassin.apache.org
Received: (qmail 24496 invoked by uid 99); 4 Dec 2012 

Re: Spamassassin test files: sample-nonspam.txt and sample-spam.txt are missing?

2012-11-26 Thread darxus
They're in the Debian package I have installed, and the subversion source
tree.  Sounds like a FreeBSD packaging problem.

In the source:

http://svn.apache.org/repos/asf/spamassassin/trunk/sample-nonspam.txt
http://svn.apache.org/repos/asf/spamassassin/trunk/sample-spam.txt

On 11/26, Ed Flecko wrote:
 Hi folks,
 I'm running SpamAssassin version 3.3.2 (running on Perl version
 5.14.2) on FreeBSD 9.0.
 
 I've installed Spamassassin from the FBSD ports collection by:
 
 # cd /usr/ports/mail/p5-Mail-SpamAssassin
 # make config ; make -D WITH_DCC install clean
 
 I'm trying to test spamassassin using the sample-nonspam.txt and
 sample-spam.txt files...but I can't find them anywhere!
 
 Is it possible that when I installed spamassassin using the install
 clean method that I wiped out my sample files?
 
 If so...how do I test spamassassin?
 
 Thank you!
 
 Ed
 

-- 
For every complex problem, there is a solution that is simple, neat,
and wrong. - H. L. Mencken
http://www.ChaosReigns.com


Re: Provide sa-learn with a CSV file of spam and ham?

2012-11-26 Thread darxus
 --mbox   Input sources are in mbox format
 --mbx    Input sources are in mbx format

--folders=filename, -f filename

sa-learn will read in the list of folders from the specified file, one 
folder per line in the file. If the folder is prefixed with ham:type: or 
spam:type:, sa-learn will learn that folder appropriately, otherwise the 
folders will be assumed to be of the type specified by --ham or --spam.

type above is optional, but is the same as the standard for 
ArchiveIterator: mbox, mbx, dir, file, or detect (the default if not specified).

 - http://spamassassin.apache.org/full/3.3.x/doc/sa-learn.html

So you can specify an input format of mbox, mbx, dir (maildir), file, or
detect.   Looks like no csv.
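
As a rough sketch of what a --folders file could contain, the Python below builds lines in the ham:type: / spam:type: form quoted above. The helper name and the paths are hypothetical and only for illustration; nothing here is part of sa-learn itself.

```python
# Sketch: build the contents of a file for `sa-learn --folders=filename`.
# Each line is "<ham|spam>:<type>:<path>", following the sa-learn docs
# quoted above. The helper name and example paths are made up.
VALID_LABELS = ("ham", "spam")
VALID_TYPES = ("mbox", "mbx", "dir", "file", "detect")


def folders_file_lines(folders):
    """Turn (label, type, path) tuples into --folders file lines."""
    lines = []
    for label, folder_type, path in folders:
        if label not in VALID_LABELS:
            raise ValueError(f"unknown label: {label!r}")
        if folder_type not in VALID_TYPES:
            raise ValueError(f"unknown folder type: {folder_type!r}")
        lines.append(f"{label}:{folder_type}:{path}")
    return lines


# Hypothetical paths, for illustration only.
example = [
    ("spam", "mbox", "/home/ed/export/spam.mbox"),
    ("ham", "dir", "/home/ed/export/ham-maildir"),
]
print("\n".join(folders_file_lines(example)))
```

A CSV export would still have to be converted into one of those formats before sa-learn could read it.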


I'd guess a lot of people use spamassassin without bayes.

On 11/26, Ed Flecko wrote:
 Hi folks,
 I'm running SpamAssassin version 3.3.2 (running on Perl version
 5.14.2) on FreeBSD 9.0.
 
 I've exported a bunch of spam and ham messages from my Barracuda 400.
 
 I have an Excel .csv file of about 2500 spam messages and 2500 ham
 messages, and I'm wondering if I can supply those as a parameter to
 sa-learn? I've looked at the documentation
 (http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html) and I
 see that you can pass the file as a parameter, but I'm not clear how
 you'd do that and in what format the file needs to be? CAN it be a
 .csv or should it be something else?
 
 I'm new to spamassassin, but (for those of you more familiar with the
 product), teaching spamassassin is TYPICALLY the first thing one
 would do before deploying it in a production environment, wouldn't
 you?
 
 Thank you,
 
 Ed
 

-- 
Hermes will help you get your wagon unstuck, but only if you push on it.
- Greek Alphabet Oracle
http://www.ChaosReigns.com


Re: wrong RCVD_IN_PBL?

2012-11-20 Thread darxus
This is quite different.  The IP delivering the email to your server is
what's hitting RCVD_IN_PBL.  Providing that part of the spamassassin -t
output so I didn't need to do it myself would've been helpful.

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.6 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
                            [82.165.159.34 listed in zen.spamhaus.org]

On 11/20, Andreas Schulze wrote:
 I have a similiar issue with a web.de (german webmail) user. He uses his 
 iPhone
 to submit mail via web.de submission service. (TLS + Authentication)
 
 The message triggers RCVD_IN_PBL and others. Any hint to make those message 
 pass sa?
 
 here are the headers:
 --- snip
 X-Spam-Status: Yes, score=7.14 tag=-999 tag2=5 kill=5 tests=[BAYES_00=-1.9,
 FREEMAIL_FROM=0.001, HTML_IMAGE_ONLY_12=2.059,
 HTML_MESSAGE=0.001, MTX_NONE=0.001, RCVD_IN_PBL=3.335,
 RCVD_IN_PSBL=2.7, RCVD_IN_RP_RNBL=1.31, RP_MATCHES_RCVD=-0.369,
 TVD_SPACE_RATIO=0.001] autolearn=no
 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on 
 idvamavis03.datev.de
 X-Spam-ASN: AS8560 82.165.0.0/16
 Received: from mout-xforward.web.de (mout-xforward.web.de [82.165.159.34])
 by idvmailin03.datev.de (Postfix) with ESMTP id 3Y5btV2sQ8z690G;
 Tue, 20 Nov 2012 20:04:02 +0100 (CET)
 Received: from [192.168.178.43] ([93.205.254.85]) by smtp.web.de (mrweb102)
  with ESMTPSA (Nemesis) id 0MA5v3-1TPekj36PR-00BSUp; Tue, 20 Nov 2012 19:59:01
  +0100
 Subject: test
 References: a0323c6a-fb02-42df-aa94-c97672816...@web.de
 From: foo...@web.de foo...@web.de
 Mime-Version: 1.0 (1.0)
 Content-Type: multipart/alternative;
 boundary=Apple-Mail-87E5DAF2-18C6-4FCD-BF0D-CD6386E473CE
 X-Mailer: iPhone Mail (10A523)
 Message-Id: e41b88ea-b9cf-4ab1-a033-c2c7c0a13...@web.de
 Date: Tue, 20 Nov 2012 19:58:57 +0100
 Cc: foo...@datev.de
 Content-Transfer-Encoding: 7bit
 To: foo...@datev.de
 X-Provags-ID: V02:K0:EvqK/RN09UfFRommwYltjAXMl2r5JXh5KWYmQ/XvFE7
  v78RzfvGZ2i90sbUnAmle0j16h4tGzLgsFuwPaanb1zpyriAC1
  wbvb4NZuBy1wZDi2uIhlRUmtyTNNXdYa4InULTNS7wG4t+vqOm
  ugaM5p60njVb35BTzZd8ONV2nh4sL0Mke/7RawEhWRPZkuXKs8
  LiB5mlVf7ikRcHdur53ew==
 
 
 --Apple-Mail-87E5DAF2-18C6-4FCD-BF0D-CD6386E473CE
 
 --- snap
 
 

-- 
My definition of a free society is a society where it is safe to be
unpopular. - Adlai E. Stevenson Jr.
http://www.ChaosReigns.com


Re: wrong RCVD_IN_PBL?

2012-11-18 Thread darxus
On 11/17, umeca74 wrote:
 Received: from hppro (ppp-94-68-74-194.home.otenet.gr [94.68.74.194]) 
   by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis) 
 
  I believe if that said ESMTPA instead of ESMTP, 
  you would not have that problem
 
 are you sure? I will report it to my ISP

No, I'm not sure, which is why I said "I believe" and "But I haven't
actually looked into those details lately."  We need better documentation
of this.  But I am very confident something along these lines is your
problem, and that it's appropriate to complain to your ISP that they're not
properly indicating authentication in the received header they're adding.

-- 
If you would be a real seeker after truth, it is necessary that at
least once in your life you doubt, as far as possible, all things.
- Rene Descartes
http://www.ChaosReigns.com


Re: wrong RCVD_IN_PBL?

2012-11-18 Thread darxus
On 11/18, RW wrote:
 Whilst that won't hurt, it's not the real cause of the problem here which
 rests entirely with UnifiedeMail.net.
 
 Whilst it would have prevented this FP, authentication is intended to
 solve a different problem. It shouldn't be necessary to have a
 workaround for the internal network being needlessly allowed to bleed
 into a remote private network.  
 
 I wouldn't worry too much about this, it's not a general problem.

I disagree.  I think indicating the authentication is a better option than
chopping off the early received header(s).  

-- 
I'd rather be happy than right any day.
- Slartibartfast, The Hitchhiker's Guide to the Galaxy
http://www.ChaosReigns.com


Re: wrong RCVD_IN_PBL?

2012-11-17 Thread darxus
On 11/17, Frederic De Mees wrote:
 From: umeca74 umec...@hotmail.com
 
 3.3 RCVD_IN_PBL
 RBL: Received via a relay in Spamhaus PBL
 [94.68.74.194 listed in zen.spamhaus.org]
 
 
 Your IP (ppp-94-68-74-194.home.otenet.gr is: 94.68.74.194) looks
 like a dynamic home user subscriber line (adsl, cable, dialup).
 
 PBL contains ranges of IP addresses that should never send e-mail
 directly to other domains.
 You should use Otenet's SMTP service offered with your subscription
 as a relay host (smart host), or rent a dedicated server/VPS in a
 colo as an alternative.

No, all this should be completely unnecessary, and handled by spamassassin
detecting an indication of authentication in the received header.  That
indication of authentication is missing.  I'd suggest complaining to the
mail server provider about it.  

Received: from hppro (ppp-94-68-74-194.home.otenet.gr [94.68.74.194])
by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis)
id 0LhkkD-1Svsfh1rOL-00mkUj; Sat, 17 Nov 2012 04:20:25 +0100

I believe if that said ESMTPA instead of ESMTP, you would not have that
problem.  But I haven't actually looked into those details lately.  We need
better documentation of this.
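
A minimal sketch of the distinction being discussed, assuming the RFC 3848 "with" keywords (ESMTPA and ESMTPSA indicate SMTP AUTH; plain ESMTP and ESMTPS do not). This is not SpamAssassin's own Received-header parser, which is more involved; the function names are made up for illustration.

```python
import re

# RFC 3848 "with" protocol keywords that indicate SMTP AUTH was used.
AUTHENTICATED_PROTOCOLS = {"ESMTPA", "ESMTPSA"}


def received_with_protocol(received_header):
    """Return the keyword following 'with' in a Received header, or None."""
    match = re.search(r"\bwith\s+([A-Za-z0-9]+)", received_header)
    return match.group(1).upper() if match else None


def indicates_authentication(received_header):
    """True if the 'with' keyword says the sender authenticated."""
    return received_with_protocol(received_header) in AUTHENTICATED_PROTOCOLS


unauthenticated = (
    "from hppro (ppp-94-68-74-194.home.otenet.gr [94.68.74.194]) "
    "by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis) "
    "id 0LhkkD-1Svsfh1rOL-00mkUj; Sat, 17 Nov 2012 04:20:25 +0100"
)
print(received_with_protocol(unauthenticated))    # ESMTP
print(indicates_authentication(unauthenticated))  # False
```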

-- 
The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself.  Therefore all progress
depends on the unreasonable man. - George Bernard Shaw
http://www.ChaosReigns.com


Re: wrong RCVD_IN_PBL?

2012-11-17 Thread darxus
I don't think that should cause triggering RCVD_IN_PBL.

On 11/17, Frederic De Mees wrote:
 There is one line missing in the following path:
 =
 Received: from mx.mg2.unifiedemail.net ([10.251.10.236]) by
 corpserv1.corp.unifiedemail.net with Microsoft SMTPSVC(6.0.3790.4675);
 Fri, 16 Nov 2012 22:20:32 -0500
 Received: from ([127.0.0.1]) with MailEnable ESMTP; Fri, 16 Nov 2012
 22:20:28 -0500
 Received: from hppro (ppp-94-68-74-194.home.otenet.gr [94.68.74.194])
 by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis)
 id 0LhkkD-1Svsfh1rOL-00mkUj; Sat, 17 Nov 2012 04:20:25 +0100
 =
 At no time the message shows that Unifiedmail has received it from
 kundenserver.
 
 Have you submitted your sample to Unifiedemail via the webform, or
 via e-mail ?
 
 Frédéric
 
 
 - Original Message - From: umeca74 umec...@hotmail.com
 To: users@spamassassin.apache.org
 Sent: Saturday, November 17, 2012 5:00 PM
 Subject: Re: wrong RCVD_IN_PBL?
 
 
 Your IP (ppp-94-68-74-194.home.otenet.gr is: 94.68.74.194) looks like
 a dynamic home user subscriber line (adsl, cable, dialup).
 
 that's correct
 
 PBL contains ranges of IP addresses that should never send e-mail
 directly to other domains.
 
 that's what I'm saying, I am NOT sending emails directly from this IP, the
 SMTP server is located in germany (1and1.co.uk) and I am connecting to it
 using an encrypted authorized connection. That's why I think there is a
 problem with spam assassin's RCVD_IN_PBL report!
 
 
 
 --
 View this message in context: 
 http://spamassassin.1065346.n5.nabble.com/wrong-RCVD-IN-PBL-tp102334p102340.html
 Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
 
 

-- 
No human thing is of serious importance. - Plato
http://www.ChaosReigns.com


Re: wrong RCVD_IN_PBL?

2012-11-16 Thread darxus
On 11/16, umeca74 wrote:
 Hello
 
 I am doing some tests sending my emails to contentanaly...@unifiedemail.net
 to assess their spamminess
 
 when I send an email through e.g. hotmail, then it is low scored by
 spamassassin
 
 if I use MS Outlook to go through my SMTP server I immediately see a hefty
 spam score on account of a blocked IP address:
 
 3.3   RCVD_IN_PBL 
 RBL: Received via a relay in Spamhaus PBL
 [94.68.74.194 listed in zen.spamhaus.org]
 
 The explanation given there is that I am not using authenticated SMTP,
 whereas I *am* using an authenticated SMTP connection through port 587
 
 is there something wrong with spam assassin here or is it my fault?

Your MTA isn't mentioning the authentication in the relevant received
header in a way that spamassassin recognizes.

-- 
The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself.  Therefore all progress
depends on the unreasonable man. - George Bernard Shaw
http://www.ChaosReigns.com


Re: wrong RCVD_IN_PBL?

2012-11-16 Thread darxus
On 11/16, umeca74 wrote:
 thanks for your reply. By MTA you mean my email program, Microsoft Outlook?
 I didn't change any of its settings, is there anything I could try?

No, your mail server software.  If your mail client (outlook) could add it,
then any client could forge that information.

Providing full headers would probably make it easier to help you.

-- 
You shall know the truth, and it shall make you odd.
-- Flannery O'Connor
http://www.ChaosReigns.com


Re: Regex Help

2012-11-10 Thread darxus
On 11/10, Marc Perkel wrote:
 Need a rule to catch this:
 
 HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm

body GOOGLEMIXED /HtTp:\/\/goOGleplAcESSEOopTimiZaTIonx\.cOm/

Untested, because I kind of expect that's not actually what you want.  If
you want something to match things that look similar to this, you need to
provide multiple examples.
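
A small sketch, in Python rather than SpamAssassin rule syntax, of two things that matter for a pattern like this: an unescaped "." matches any character, and an exact-case pattern will not match other capitalizations. The case-insensitive variant is only an assumption about what "similar" might mean here.

```python
import re

SAMPLE = "HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm"

# Unescaped "." matches any character, so this is looser than it looks.
loose = re.compile(r"HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm")

# Escaped "." matches only a literal dot.
exact = re.compile(r"HtTp://goOGleplAcESSEOopTimiZaTIonx\.cOm")

# Same literal text, any capitalization (an assumed requirement).
any_case = re.compile(re.escape(SAMPLE), re.IGNORECASE)

for text in (SAMPLE, SAMPLE.lower(), SAMPLE.replace(".", "X")):
    print(text, bool(loose.search(text)), bool(exact.search(text)),
          bool(any_case.search(text)))
```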

-- 
it's not how good you are, it's how bad you want it - no fear
http://www.ChaosReigns.com


Re: Claims manager / LOTTO_AGENT

2012-11-07 Thread darxus
Just in case nobody has pointed you toward it before:
https://wiki.apache.org/spamassassin/NightlyMassCheck

Stats we currently have on that rule:
http://ruleqa.spamassassin.org/?daterev=20121103&rule=LOTTO_AGENT

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
      0   0.5022   0.0011   0.998    0.74    3.50  LOTTO_AGENT

It hits 2 of the 180,272 non-spams we have for use in optimal score
generation.  
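
The arithmetic behind that HAM% figure can be checked with a couple of lines of Python (the function name is just for illustration):

```python
def hit_percentage(hits, total):
    """Percentage of messages in a corpus that hit a rule."""
    return 100.0 * hits / total


# 2 ham hits out of the 180,272 non-spams mentioned above.
ham_pct = hit_percentage(2, 180_272)
print(f"{ham_pct:.4f}")  # 0.0011
```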


On 11/07, Michael Orlitzky wrote:
 So, LOTTO_AGENT will hit the string Claims Manager for 3.5 points.
 This is bad news for,
 
   Barbara R. Krieg, Claims...

When you put a string in an email that hits a spamassassin rule... your
email then hits that spamassassin rule.  You should generally try to avoid
that.

-- 
It's never too late to panic.
http://www.ChaosReigns.com


Re: Claims manager / LOTTO_AGENT

2012-11-07 Thread darxus
On 11/07, Michael Orlitzky wrote:
 Yeah, well it's her job title, so...? You misunderstand statistics. The
 data aren't wrong.

Do I?  I think it's more likely that you misunderstand what is expected of
spamassassin rules.

Somebody really should put up a page in the wiki explaining that rules all
have false positives, and that's the entire reason we don't flag an email
as spam for any one rule, etc..


But if you provide us with more masscheck data, we can do a better job of
automatically calculating ideal scores.

-- 
Of course there's strength in numbers. But there's strength in sharp
weaponry too. Ironically, this lead to what we call 'civilization'.
- spore
http://www.ChaosReigns.com


Re: Claims manager / LOTTO_AGENT

2012-11-07 Thread darxus
On 11/07, Michael Orlitzky wrote:
 On 11/07/2012 09:49 PM, dar...@chaosreigns.com wrote:
  On 11/07, Michael Orlitzky wrote:
  So, LOTTO_AGENT will hit the string Claims Manager for 3.5 points.
  This is bad news for,
 
Barbara R. Krieg, Claims...
  
  When you put a string in an email that hits a spamassassin rule... your
  email then hits that spamassassin rule.  You should generally try to avoid
  that.
 
 Yeah, well it's her job title, so...? You misunderstand statistics. The
 data aren't wrong.

After re-reading, I think you may have misunderstood my suggestion to avoid
putting stuff in emails that is known to hit spam rules.  I wasn't
suggesting that Barbara R. Krieg change her signature, I was suggesting
that you not include it intact when posting to this mailing list about it.

-- 
You shall know the truth, and it shall make you odd.
-- Flannery O'Connor
http://www.ChaosReigns.com


Re: Claims manager / LOTTO_AGENT

2012-11-07 Thread darxus
On 11/07, Michael Orlitzky wrote:
 Sorry, I was a little rude. But saying that she shouldn't put her job
 title anywhere in an email, ever, is ridiculous. 

Certainly.

 The inputs (spam, ham)
 to the classifier are assumed god-given; and the classification needs to
 reflect the data, not the other way around.

If "the classifier" is spamassassin, and "The inputs" are the spam
and ham data provided via masscheck, then... the scores provided via
sa-update *do* reflect the data.  So I'm not sure what you mean.

The ideal rule scores are chosen to cause one false positive (ham flagged
as spam) in every 2,500 hams, while maximizing the number of spams
correctly flagged as spams.  With so few hams hitting this rule in the
masscheck corpora, we're way below that threshold based on the data we
have.
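
To put a rough number on that threshold: a sketch assuming the ham corpus size of 180,272 quoted earlier in this thread. Note that a ham merely hitting one rule is not itself a false positive, since the message still has to reach the 5.0 threshold overall.

```python
def false_positive_budget(ham_count, hams_per_false_positive=2500):
    """How many ham messages may be misflagged at a 1-in-N target rate."""
    return ham_count / hams_per_false_positive


budget = false_positive_budget(180_272)
print(f"{budget:.1f}")  # 72.1

# The rule under discussion hit 2 hams, well under that overall budget.
print(2 < budget)  # True
```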

 This is my fault, of course, but I'm not allowed to mass-check this
 stuff. It's ongoing legal correspondence.

Er, what?  You're not allowed to provide a list of which rules hit each
of your emails?  Or you're not allowed to run a program on your emails
that isn't spamassassin?  Or did I just not put "This does not require
sending us your email" in bold enough times on the masscheck page?

-- 
It's never too late to panic.
http://www.ChaosReigns.com


Re: HK_LOTTO hitting ham from the UK national lottery

2012-11-01 Thread darxus
On 11/01, Niamh Holding wrote:
 
 Hello Darxus,
 
 Wednesday, October 31, 2012, 10:34:42 PM, you wrote:
 
 dcc They're talking about automated score generation.  Currently, apparently,
 dcc the scores for this rule are fixed, and not included in the calculation 
 of
 dcc ideal scores.
 
 So currently submitting the ham to the corpus won't actually help
 change anything?

Yes.  But two of the developers have agreed that's worth changing, so it
could happen today.

And that could change the scores in either direction.

-- 
If you believe everything you read, better not read. - Japanese Proverb
http://www.ChaosReigns.com


Re: HK_LOTTO hitting ham from the UK national lottery

2012-10-31 Thread darxus
On 10/31, Niamh Holding wrote:
 A if you provide a few dozen samples of these hammy msgs , they can be 
 A included in the SA ham corpus
 
 That can be supplied, an mbox of a good supply do?
 
 A you can directly contribute to rescoring by running a masscheck instance
 A as per:
 A http://wiki.apache.org/spamassassin/NightlyMassCheck
 
 Currently not so easy as-
 
 a) all high scoring spam is dumped by procmail
 
 b) I'd need to get back from all the users details of misclassified
 messages so they could be moved to the correct corpora.

You could just provide a few dozen samples of these hammy msgs via
masscheck.  The more you can provide, and the more representative it is,
the better.

Not including high scoring spam isn't a big problem.  Things spamassassin
gets wrong are most useful.

The automated score generation used for the sa-updates comes from email
from about fourteen people, so anything you can provide would probably
be beneficial.

At the bottom of that page is an UploadedCorpora link which you can use
to upload the emails themselves without even needing to run masscheck
yourself.

-- 
You only truly own what you can carry at a dead run.
- 14th & 15th century Landsknechts
http://www.ChaosReigns.com


Re: HK_LOTTO hitting ham from the UK national lottery

2012-10-31 Thread darxus
On 10/31, jdow wrote:
 On 2012/10/31 14:05, John Hardin wrote:
 On Wed, 31 Oct 2012, Kevin A. McGrail wrote:
 
  Shouldn't it be set via GA in 72_scores.cf ?
 
 Doesn't sound like a bad idea to comment it in 50_scores.cf and let it 
 float.
 
 +1. That's what threw me when I did my quickie analysis early on.
 
 RaaallY? Would it not be better to put in a line like this?
 score HK_LOTTO 0
 
 50_scores.cf would be continually getting overwritten by updates, would
 it not?

They're talking about automated score generation.  Currently, apparently,
the scores for this rule are fixed, and not included in the calculation of
ideal scores.  They're talking about including it in the calculation of
ideal scores.  Which you download the results of from sa-update.

They're not talking about local score modification.

-- 
Eh, wisdom's overrated. I prefer beatings and snacks.
- Unity, Skin Horse
http://www.ChaosReigns.com


Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'

2012-10-28 Thread darxus
On 10/28, Alexandre Boyer wrote:
I understood that. I however need to rescore my ruleset because the setup
I inherited was 1) not updated with sa-update and 2) manually maintained
(with , for example, lot's of perso rules that essentially do the same as
the SA rules added over time).

I don't understand why re-scoring seems like a necessary step to you.

One thing I really want you to understand is that the automated main SA
re-scoring does not happen unless we have 150,000 spams, and 150,000
hams (non-spams).  Because we do not trust the results to be sufficiently
accurate / reliable with fewer.

If you can get that many hand classified hams and spams together, that's
awesome, I envy you, and I think that would be a great idea for your
accuracy.  However, I doubt it.

If you do get re-scoring to work at all, I strongly encourage you to update
the wiki.  I'm sure that section is particularly in need of love because
nobody ever does that.  Just create an account on the wiki, and email the
dev mailing list to request write access.


The age thresholds for re-scoring are:
Ham: 6 years (crazy, right?  another reason we need more data)
Spam: 2 months

As a brutal reset is out of question, I need to do things step by step,
rescoring being one of them prior to have my threshold back to 5 and
sa-update enabled.

Taking things step by step sounds reasonable enough.  Re-scoring doesn't.  

All this being my own private problem, nothing to do with our off topic
exchange :-)

Eh, it's some obscure usage, but I still think it's entirely appropriate to
discuss here.

Arround 10 corpora. Are those corpora used tu run the SA mass-check on SA
servers or do it also include what I will send one day (my mc logs)?

I'll assume you'll find my email which said more on this subject, instead
of replying to some of this again.

-- 
Life is either a daring adventure or it is nothing at all.
- Helen Keller
http://www.ChaosReigns.com


Masscheck Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'

2012-10-26 Thread darxus
On 10/26, Alexandre Boyer wrote:
 Well, discouraged was implicit (as is the fact that every admin is

I don't think there's anything implicit about it being discouraged to use a
threshold below 5.  There are lots of local changes which are far less
likely to cause problems, and encouraged.

 The SA rules scores are computed based on the mass-checks, from the
 project and, to some extend, from contributors. A good question is: how
 many contributors really give a feedback on the mass-checks?

This is public information, although not very explicit.
On http://ruleqa.spamassassin.org/ look in the green box, it lists all the
corpora included:

  axb-coi-bulk
  axb-fraud
  axb-generic
  axb-ham-misc
  axb-sa-users
  axb-woas
  bb-guenther_fraud
  bb-jhardin
  bb-jhardin_fraud
  bb-jm
  bb-kmcgrail
  bb-zmi
  bpoliakoff
  danmcdonald
  darxus
  grenier
  jarif
  kpg-gah
  mas
  zmi

The ones starting with bb- are uploaded emails; instead of the contributor
running masscheck locally, it's run centrally.  Other than that, each prefix
is a different contributor.  So:

axb, guenther, jhardin, jm, kmcgrail, zmi, bpoliakoff, danmcdonald,
darxus, grenier, kpg-gah, kpg, mas, zmi.

14 masscheck contributors.  We'd probably benefit a lot by significantly
increasing that, which is why I mention it somewhat often.

 This is something I do not know, but the fewer they are, the greater the
 bias is. Bias in spam and ham samples. Emails reaching my servers are
 different from yours and from each and every SA users.

Absolutely.

 Unless everybody on earth run a nightly mass-check and report results to
 SA project for it to compute a world wide scoring, there is a bias. At
 least this is my understanding, may be I'm wrong, please correct me if so.

No, you're totally right.  We do what we can with what we have, and I think
we do pretty darn good.  But we could do better with more data.  

 For example, I'm in the process of learning to use mass-check to
 contribute back to SA (which implies a lot of hard work, simply to build
 and maintain valid ham/spam corpora, use mass-check, then hit-freq, then
 fp-fn-stat, I'm not even close to understand how to compute a re-score.

I don't know what fp-fn-stat is.  You don't need to compute a re-score -
that's part of what is done with your masscheck data after you upload it.

There's a relatively recently created mailing list specifically for helping
people with this stuff, to which I believe you automatically get subscribed
when you get a masscheck account:
http://wiki.apache.org/spamassassin/MailingLists#RuleQA

If you're having difficulty with it, the docs probably need improvement, so
do let us know.


Your mention of fp-fn-stat makes me think you may have veered a little too
far from https://wiki.apache.org/spamassassin/NightlyMassCheck

 with this, I'm not sure my contribution would be sufficient to make SA
 scores to be closer to my email traffic reality.

I think it would.  For example, I'm sure, from what you've posted, that you
have enough examples of hams that hit DEAR_SOMETHING that the score of it
would drop significantly.

 Do you have any stat about how many contributors are giving a feedback
 on the masscheck? and about their geographical location? I'm just asking
 because I was not able to find this kind of information anywhere.

I believe they're almost all in the US, primarily English speakers.  That's
bad.

-- 
You only truly own what you can carry at a dead run.
- 14th & 15th century Landsknechts
http://www.ChaosReigns.com


Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'

2012-10-25 Thread darxus
On 10/25, Bowie Bailey wrote:
 On 10/25/2012 10:47 AM, Simon Loewenthal wrote:
 *  2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'
 
 Does anyone know the rational behind this, or is our user base simply 
 communicating on a higher level?  :)  I imagine the rational is sound, but I 
 do not know what it is.
 
 The rationale is simple.  The masscheck finds that this rule hits
 more spam than ham, so it gets a higher score.

It's slightly more complicated than that.  It's that this score results in
the maximum spams flagged as spam without exceeding 1 false positive in
2,500 non-spams.

A fun example is SUBJ_YOUR_DEBT, which was getting a score of 3.0 while
hitting more non-spam than spam.  I guess it got disabled somehow.


But more importantly, it's because we do not have the rule
hit statistics from your email to include them in optimal score
generation because you're not submitting those stats via masscheck:
https://wiki.apache.org/spamassassin/NightlyMassCheck


RuleQA results for that rule are here:
ruleqa.spamassassin.org/?daterev=20121020&rule=DEAR_SOMETHING

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
      0   0.6160   0.2324   0.726    0.63    2.00  DEAR_SOMETHING

It hits 0.6% of spam, and 0.2% of non-spam (ham).
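
The S/O column appears to be the spam share of all hits, SPAM% / (SPAM% + HAM%). That formula is an assumption here, but it reproduces the published figures for both rules quoted in these threads:

```python
def spam_over_overall(spam_pct, ham_pct):
    """Fraction of a rule's hits that are spam (assumed S/O formula)."""
    return spam_pct / (spam_pct + ham_pct)


print(f"{spam_over_overall(0.6160, 0.2324):.3f}")  # 0.726 (DEAR_SOMETHING)
print(f"{spam_over_overall(0.5022, 0.0011):.3f}")  # 0.998 (LOTTO_AGENT)
```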


On 10/25, Alexandre Boyer wrote:
 Simon, I had some FPs because of this rule and because my threshold is
 lower than 5.

If you could just append "and I know this is highly discouraged"
any time you say that, you might reduce my need to point it out to
avoid you causing other people to think that might be a good idea.
Scores are generated with a threshold of 5.  It's often recommended to
use a threshold above 5 for an extra safety measure.  Do you even have a
guess what rate of false positives you're causing with a lower threshold?
I don't.

 I just had a score override to lower it but this rule still hits a lot
 of spam (419 scams essentially).

Yup, nothing wrong with customizing your rules to suit the email you get
better.  At least in the direction of reducing false positives.  

-- 
I finally figured out the only reason to be alive is to enjoy it.
- Rita Mae Brown
http://www.ChaosReigns.com


Re: SA wiki

2012-10-24 Thread darxus
On 10/23, Joseph Acquisto wrote:
 at
 http://wiki.apache.org/spamassassin/SiteWideBayesFeedback
 
 the link  a cookbook to setup site wide ham/spam forwarding for postfix 
 http://gtmp.org/publications/sa-postfix-en;,  links to topic does not exist 
 yet.

It apparently got deleted.  The page is available in archive.org, a very
useful tool.

Anybody can edit the wiki, just create an account and email the dev list
asking for write access.  This is mentioned at the bottom of the front page
of the SA wiki, but I know it's not very obvious, I missed it myself.

You could also try contacting the owner of gtmp.org.

-- 
Just because you're offended, doesn't mean you're right. - Ricky Gervais
http://www.ChaosReigns.com


Re: sa-update different rulesets

2012-10-24 Thread darxus
To do sa-update with the default channel and the sought channel, I have a
cron job that does:  

/usr/bin/sa-update --gpgkey 6C6191E3 --channel sought.rules.yerp.org --channel updates.spamassassin.org

No, just grabbing a channel once will not cause sa-update to keep it up to
date on its own afterward.

On 10/25, Jonathan Nichols wrote:
 Evening,
   This might be particular to the Ubuntu spamassassin package, but I'm a 
 little confused about sa-update and the channel files. 
 
 I added sought & dostech rulesets and updated them with sa-update. Will 
 sa-update remember them and continue to update them daily? 
 
 Does sa-update need to be told which rulesets to download? Debian/Ubuntu have 
 a spamassassin script in /etc/cron.daily but I didn't see anything in it 
 that was specific to the update channels. 
 
 Cheers,
 --
 jonathan
 
 

-- 
I don't want to die... just yet... not while there's... women.
- J. Matthew Root, 8/23/02 (http://www.jmrart.com/)
http://www.ChaosReigns.com


Re: BAYES_99 score

2012-10-22 Thread darxus
On 10/22, JP Kelly wrote:
 Should I set the BAYES_99 score high enough to trigger as spam?
 I get plenty of spam getting through which does not get caught because 
 BAYES_99 is the only rule which fires and it is not set to score at or above 
 the threshold.

You could.  Some people only use bayesian filtering, which would be
similar.  The important question is, how many false positives (non-spams
flagged as spams) would that cause?  SpamAssassin's automated scoring
attempts to achieve 1 false positive in 2,500 non-spams, with a score
threshold of 5.0.  So if you don't have an absolute minimum of 2,500
representative non-spams to check for having hit BAYES_99, you risk
increasing your false positives.  But it's your risk to take.

Huh, ruleqa doesn't track hits to BAYES_99?

-- 
Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'. - The Color of Magic
http://www.ChaosReigns.com


Re: BAYES_99 score

2012-10-22 Thread darxus
On 10/23, Jari Fredriksson wrote:
 22.10.2012 21:15, dar...@chaosreigns.com kirjoitti:
  Huh, ruleqa doesn't track hits to BAYES_99?
 If it did, against which database it would do that?

It would show the hit rates in the corpora of the masscheck submitters,
like everything else.  So, the databases of the submitters (who are using
bayes).

-- 
I don't want people who want to dance, I want people who have to dance.
--George Balanchine
http://www.ChaosReigns.com


Re: autolearn

2012-10-21 Thread darxus
I believe that means the score was low enough that it was automatically fed
to sa-learn as ham (non-spam).  

That's scary, I don't use it (bayes_auto_learn 0).

On 10/21, Joseph Acquisto wrote:
 Today I found a missed SPAM that contained this in the header:
 
 X-Spam-Status: No, score=0.0 required=5.0 tests=FREEMAIL_FROM,MISSING_SUBJECT,
 T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.2
 
 The subject was empty with a link starting with ftp:
 
 I guess it's the autolearn that  is most puzzling me.
 
 joe a.
 

-- 
I refuse to tip toe through life only to arrive safely at death.
http://www.ChaosReigns.com


Re: Sender domain in IP space 5.0.0.0/8 triggers RCVD_ILLEGAL_IP

2012-10-16 Thread darxus
On 10/16, Frederic De Mees wrote:
 I have found 2 instances of the file 20_head_tests.cf on my server.
 The first stays in /usr/share/spamassassin and contains the following

That's used when you have never run sa-update.

 The second in /var/lib/spamassassin/3.003001/updates_spamassassin_org and

That was downloaded by sa-update.  What is the date on the files in that
directory?  It should be in the last couple days (because you should be
running sa-update daily from cron).
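
One way to check those dates without eyeballing a directory listing: a minimal sketch that lists files older than a given age. The two-day default is an assumption taken from "the last couple days" above, and the function name is made up.

```python
import os
import time


def stale_files(directory, max_age_days=2.0, now=None):
    """Return names of regular files in directory older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            stale.append(entry.name)
    return sorted(stale)


# Example call, using the update directory mentioned above:
# stale_files("/var/lib/spamassassin/3.003001/updates_spamassassin_org")
```

An empty result would suggest sa-update has run recently; a long list would suggest the cron job is not running.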

 contains:
 /
 (?:by|ip)=(?=\d+\.\d+\.\d+\.\d+ 
 )(?:(?:0|2(?:2[4-9]|[3-5]\d)|192\.0\.2|198\.51\.100|203\.0\.113)\.|(?:\d+\.){0,3}(?!(?:2(?:[0-4]\d|5[0-5])|[01]?\d\d?)\b))/

Yup, that looks like the current one:

header RCVD_ILLEGAL_IP X-Spam-Relays-Untrusted =~ / (?:by|ip)=(?=\d+\.\d+\.\d+\.\d+ )(?:(?:0|2(?:2[4-9]|[3-5]\d)|192\.0\.2|198\.51\.100|203\.0\.113)\.|(?:\d+\.){0,3}(?!(?:2(?:[0-4]\d|5[0-5])|[01]?\d\d?)\b))/
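
A quick way to see what that pattern does and does not match is to try it in Python, whose regex syntax is compatible for this expression. The relay strings below are simplified, hypothetical stand-ins for the X-Spam-Relays-Untrusted metadata, not real SpamAssassin output.

```python
import re

# The current RCVD_ILLEGAL_IP pattern, split across lines for readability.
RCVD_ILLEGAL_IP = re.compile(
    r" (?:by|ip)=(?=\d+\.\d+\.\d+\.\d+ )"
    r"(?:(?:0|2(?:2[4-9]|[3-5]\d)|192\.0\.2|198\.51\.100|203\.0\.113)\."
    r"|(?:\d+\.){0,3}(?!(?:2(?:[0-4]\d|5[0-5])|[01]?\d\d?)\b))"
)


def hits_rule(relay_metadata):
    """True if the simplified relay metadata string matches the rule."""
    return RCVD_ILLEGAL_IP.search(relay_metadata) is not None


for ip in ("5.1.2.3", "94.68.74.194", "0.1.2.3", "224.0.0.1",
           "203.0.113.9", "300.1.1.1"):
    print(ip, hits_rule(f"[ ip={ip} rdns=example.net ]"))
```

With this version of the pattern, an address in 5.0.0.0/8 does not match, while 0.x, multicast and above, the documentation ranges, and out-of-range octets do.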

 So, maybe SA uses the wrong files.

Could be, but I'd guess that's not it.  The strace command can be useful
for that.

 The other possibility stays with the spampd policy daemon. With a server
 uptime of several months I cannot remember the last time I stopped and
 restarted the daemon.

That sounds like your problem.  

When I was using spampd, I had a /etc/init.d/spampd restart after my
sa-update in cron.  As is suggested on:
http://wiki.apache.org/spamassassin/IntegratePostfixViaSpampd
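
A sketch of that update-then-restart step, assuming sa-update's documented exit codes (0 means an update was downloaded and installed, 1 means no fresh updates, higher values mean errors). The function names are made up, and the restart command is the one mentioned above; adjust both commands for your system.

```python
import subprocess


def should_restart(sa_update_exit_code):
    """Restart the daemon only when sa-update actually installed new rules."""
    return sa_update_exit_code == 0


def update_and_restart(update_cmd=("sa-update",),
                       restart_cmd=("/etc/init.d/spampd", "restart"),
                       run=subprocess.call):
    """Run sa-update, then restart spampd if new rules were installed.

    `run` is injectable so the logic can be exercised without touching
    the real system.
    """
    exit_code = run(list(update_cmd))
    if should_restart(exit_code):
        run(list(restart_cmd))
        return True
    return False


# From cron, something like: python3 this_script.py
# with a final line calling update_and_restart().
```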

-- 
I'd rather be happy than right any day.
- Slartibartfast, The Hitchhiker's Guide to the Galaxy
http://www.ChaosReigns.com


Re: Testing new install - was Updating 3.2.4 on SUSE sles10

2012-10-15 Thread darxus
Try sending it from the server you're testing?

On 10/15, Joseph Acquisto wrote:
 Still can't get GTUBE messages.   Am I being dense?   Messages sent with
 the GTUBE signature from external sites don't seem to arrive.   I don't see
 them trapped in my day job's outgoing queue, etc.
 
 ??
 
 joe a.
 
  Joseph Acquisto j...@j4computers.com 10/14/12 5:50 PM 
 Upgrade effort abandoned.  Installed OpenSuse 12.2 which, oddly enough, came
 with the new version of SA.   All seems to be working well.   OS seems a
 bit leisurely though.
 
 However, I can't seem to test with GTUBE.   Gmail seems to eat these, they 
 never get to me.  Normal? Other gmail tests do.
 
 Also, cannot test using the spamassassin -D  
 /usr/share/spamassassin/doc/spamassassin-3.0.3/sample-spam.txt
 test, as, in this install, there is no doc directory.   At least I have not 
 found it yet.
 
 Words of guidance?
 
 joe a.
 
 
 
 

-- 
For every complex problem, there is a solution that is simple, neat,
and wrong. - H. L. Mencken
http://www.ChaosReigns.com


Re: Updating 3.2.4 on SUSE sles10

2012-10-10 Thread darxus
Would this not be far easier and more appropriate?

http://www.rpmfind.net/linux/rpm2html/search.php?query=spamassassin&submit=Search+...&system=opensuse&arch=

Doesn't your distro provide an easy way to search for / upgrade these
things?  (Why would you use a distro that doesn't?)


With ubuntu I'd do:

  apt-get update && apt-get dist-upgrade

And have the latest versions of all the packages in the release I'm using.
If the current release doesn't have new enough packages, I'd run
do-release-upgrade, and it would upgrade everything to the next release.

Ubuntu had a spamassassin v3.3.2 package in May 2011.  It's in the archives
for the Oneiric, Precise, and Quantal releases.  And I created an ubuntu PPA
providing daily spamassassin builds:
https://launchpad.net/~spamassassin/+archive/spamassassin-daily


And no, installing from source does not mean the distro CDs.  
(You could web search for:  installing from source.)

On 10/10, Joseph Acquisto wrote:
  On 10/10/2012 at 2:34 AM, Per Jessen p...@computer.org wrote:
  Joseph Acquisto wrote:
  
  On 10/9/2012 at 3:02 PM, Per Jessen p...@computer.org wrote:
  Joseph Acquisto wrote:
  
  Won't make, anyway.  Module Net-addr::IP missing.  Finding this
 for
  SuSe seems to be an adventure in itself.
  
  Just install from source.
  
  
  
  --
  Per Jessen, Zürich (14.6°C)
  
  You mean perl-net-addr-ip from source?  If you mean from the Distro
  package (CD's ?), I don't find it there.
  
  Yep, I meant perl-net-addr-ip.  Whether you get it from SUSE or from
  source won't matter. 
  
  
  
  -- 
  Per Jessen, Zürich (13.4°C)
 
 Compiled from stuff at your link to cpan.
 
 So far, so good.   Got some noise about UTF-8, but forged ahead.
 
 perl Makefile.PL (in spamassassin extract folder) gives this:
 
 Checking if your kit is complete...
 Looks good
 Warning: prerequisite Mail::DKIM 0.31 not found.
 Writing Makefile for Mail::SpamAssassin
 
 Problem?
 
 Also, I hesitate to do the final steps as I fear hosing the working
 install.   Yes, I should have built another, but . . .
 
 joe a.
 

-- 
I'd rather be happy than right any day.
- Slartiblartfast, The Hitchhiker's Guide to the Galaxy
http://www.ChaosReigns.com


Re: How can I get SA to tell me what CLAMAV found?

2012-10-05 Thread darxus
On 10/05, Steven W. Orr wrote:
 but I'd like to know which CLAMAV virus was the trigger. Is there a
 way to get output somewhere that tells me which signature(s) fired?

Ask the clamav people?

-- 
If you want to make an apple pie from scratch, you must first create
the universe. - Carl Sagan
http://www.ChaosReigns.com


Re: Try to run sa-learn

2012-10-04 Thread darxus
On 10/04, troxlinux wrote:
 Hi list, I'm trying to run sa-learn on CentOS 6.3 but it doesn't work
 
  sa-learn --spam --showdots /dir/dir/domain.com.ni/spam/.spam/cur/

Try:

sa-learn --spam --showdots /dir/dir/domain.com.ni/spam/.spam/

(cur/ is inside the mailbox, not part of the path to the mailbox)

-- 
Blessed are the cracked, for they shall let in the light.
http://www.ChaosReigns.com


Re: SA rules matching of ipv6 addresses

2012-10-02 Thread darxus
Run the email through spamassassin -D received-header.  That'll tell you
how and if the headers got parsed.  SA has certainly had bugs where it
failed to parse received headers before, and IPv6 hasn't had a whole lot of
use.

There has also been a fair amount of work on IPv6 since the last release,
so it's possible there was a bug, it got fixed, and you don't have the
fix yet.

On 10/02, Mabry Tyson wrote:
 One user complained about a false positive.  When I examined the mail,
 there appeared to be at least two rules that didn't work as I thought they
 should because of a Received line in which IPv6 Link Local addresses were
 used.   It appears that a patch was previously put in that was thought to
 fix these kinds of things.
 The sender was apparently using AA.BB.CC.DD (a Comcast address, presumably
 his home address).
 He logged into the mail system of SRI.COM (independent of our mail system)
 and sent his mail from within it (which is why CCC.SRI.COM is the oldest
 Received line).

That should result in a received header clearly indicating that the
connection from comcast was authenticated, and SA should notice that
and use it to skip the tests on that comcast IP.

It mostly sounds like this is what's missing: SRI.com is not indicating
the authentication in their received header in the standard way.

 1.  I believe that RDNS_NONE should not have fired.  At the time of
 processing, the internal networks included 130.107/16 and 128.18/16, and
 cover the top 3 Receiveds.

So it said RDNS_NONE for the comcast IP?  Did it have a reverse DNS entry?
(Also seems like it should be solved by a received header indicating
authentication.)

 The earliest received shows a Link Local IPv6 address, which should match
 IP_PRIVATE in Constants.pm.
 All of the IPv4 addresses have reverse DNS, including the
 x-originating-ip.

I'm not too familiar with these, but my guess is, private IPs should be
skipped, and IPs before those should still be parsed / tested.

 2.  I believe that ALL_TRUSTED should have fired.  The trusted networks
 included 130.107/16 and 128.18/16.   The Link Local IPv6 address should not
 have affected that.

x-originating-ip: [AA.BB.CC.DD] appears to be treated effectively the
same as a received header.  So that seems like a good reason for
ALL_TRUSTED to not have fired.

 4.  http://spamassassin.apache.org/tests_3_3_x.html has
 RCVD_IN_PBL = 3.6  (Spamhaus Policy black list)
 RCVD_IN_SBL = 2.6  (Spamhaus Spam black list)
 RCVD_IN_XBL = 0.7  (Spamhaus Botnet black list)
 which seems backward to me.  The 3.2 tests scoring seems more reasonable.

Do not attempt to comprehend the depths of the mind of the re-scorer :P

No, seriously: it has no concept of "this rule means the email is worse
than another rule, therefore it should have a higher score."  Only "this
score results in a better approximation of the goal of 1 false positive in
2,500 non-spams."  Which often results in unexpected things.  It comes
up a lot.

I very recently found a case where a rule that hit more non-spam than spam
got a score of something like 3.  Which may have been suboptimal.

 The Policy Black List applies to anyone using Comcast (this /14, and
 similarly for the /12 that includes my home IP address) as their ISP,
 unless they opt out
 
   http://www.spamhaus.org/pbl/query/PBL1523209
 
 To hit all of the users that use that mail system with a 3.6 score is
 surely going to cause a number of false positives.

Should be handled by headers indicating authentication.

-- 
Immorality: The morality of those who are having a better time
- Henry Louis Mencken
http://www.ChaosReigns.com


Re: HTML link regex

2012-09-27 Thread darxus
On 09/25, John Hardin wrote:
 This topic comes up regularly enough that it should be a FAQ.

Yeah.  I haven't read this thread enough to know if it's been said, but
here's a previous thread on the subject:

http://spamassassin.1065346.n5.nabble.com/antiphishing-td52027i20.html

And the existing rules:  ruleqa.spamassassin.org/?rule=%2Fspoofed_url

  MSECS   SPAM%    HAM%    S/O   RANK  SCORE  NAME                WHO/AGE
      0  1.9104  0.4468  0.810   0.55   0.01  T_SPOOFED_URL_HOST
      0  1.9456  0.5844  0.769   0.53   0.01  T_SPOOFED_URL
      0  2.0437  3.6954  0.356   0.37  (n/a)  __SPOOFED_URL_HOST
      0  2.0917  4.0246  0.342   0.36  (n/a)  __SPOOFED_URL


Although, as John mentioned, this wasn't targeting specific domains.  If
rules that you come up with do actually work for you, please submit them
for inclusion in spamassassin QA, to see if they work well enough to
include in future sa-updates.

-- 
Blessed are the cracked, for they shall let in the light.
http://www.ChaosReigns.com


Re: HTML link regex

2012-09-27 Thread darxus
On 09/27, Alexandre Boyer wrote:
 I met you earlier on the IRC channel, remember?

Yup.

 Anyway, I would be glad to submit my rules (corrected by Bowie Bailey).
 I indeed asked how one could do that.

Open a bug:  https://issues.apache.org/SpamAssassin/

Include the rule(s) and request that they be added to ruleqa.


Just came across an old related bug:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4372

-- 
Life is either a daring adventure or it is nothing at all.
- Helen Keller
http://www.ChaosReigns.com


Re: X-Spam-Status: No, but still marked with [SPAM]

2012-09-21 Thread darxus
This is pretty common - enough that I'd appreciate it if you could provide
more information on the cause of your problem, and how you fix it, once you
do.

Yesterday in IRC:

09:40PM <ke6i> X-Spam-Status: No, score=0.0 required=2.0
        tests=FROM_MISSP_REPLYTO,FROM_MISSP_URI,TO_NO_BRKTS_FROM_MSSP
        autolearn=ham version=3.3.2
        I'm getting mail like this marked as spam.  But score = 0?  Why would
        it mark this as spam if score is 0 and required is 2?
09:48PM <Darxus> Sounds like that header, and your [SPAM] subject
        modification(?) are coming from two different runs of spamassassin.
09:49PM <ke6i> interesting.  Let me study this message some more.
11:51PM <ke6i> yeah something odd is going on here.  I'm seeing 'spamd:
        processing message ' in maillog twice for each email.


There have been a bunch of times I've heard people say spamassassin is
simultaneously marking emails as both spam and not spam.  Many times the
result has been that somehow they were running SA twice on the emails.
Never has it come up that SA was actually doing this in a single run.
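
A quick way to pick such messages out of a mailbox is to look for the contradiction directly.  This is only a sketch: it assumes the subject tag is the literal string [SPAM] (whatever your rewrite_header Subject setting uses), and the function name is made up.

```python
from email import message_from_string


def looks_double_processed(raw_message: str, tag: str = "[SPAM]") -> bool:
    """True if the Subject carries the spam tag but X-Spam-Status says No."""
    msg = message_from_string(raw_message)
    status = (msg.get("X-Spam-Status") or "").strip().lower()
    subject = (msg.get("Subject") or "").strip()
    # The tag and the status disagree, which suggests they were written
    # by two different SpamAssassin runs.
    return subject.startswith(tag) and status.startswith("no")
```

A message that trips this check is a good candidate for comparing against the mail log to see whether it went through spamd more than once.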

On 09/21, Cathryn Mataga wrote:
 I'm getting these messages, some of them real emails, that get marked
 with [SPAM] even though X-Spam-Status: comes up as No.  I updated to the
 latest build on Fedora though I think this has been going on awhile.  It
 happens with some email accounts but not others.
 
 
 From me...@ecuador.junglevision.com  Thu Sep 20 17:42:50 2012
 Return-Path: me...@ecuador.junglevision.com
 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on
 ecuador.junglevision.com
 X-Spam-Level:
 X-Spam-Status: No, score=0.0 required=2.0 tests=FROM_MISSP_REPLYTO,
 FROM_MISSP_URI,TO_NO_BRKTS_FROM_MSSP autolearn=ham version=3.3.2
 Received: from ecuador.junglevision.com (localhost [127.0.0.1])
 by ecuador.junglevision.com (8.14.5/8.14.5) with ESMTP id
 q8L0go5j02679
 for megans...@junglevision.com; Thu, 20 Sep 2012 17:42:50 -0700
 Received: (from megan@localhost)
 by ecuador.junglevision.com (8.14.5/8.14.5/Submit) id
 q8L0goLd026789
 for megans...@junglevision.com; Thu, 20 Sep 2012 17:42:50 -0700
 Received: from server.cgskies.com (www.cgskies.com [85.17.169.165])
 by ecuador.junglevision.com (8.14.5/8.14.5) with ESMTP id
 q8L0gmKk02678
 for me...@junglevision.com; Thu, 20 Sep 2012 17:42:49 -0700
 Received: from www.cgtextures.com (www.cgtextures.com [95.211.74.173])
 by server.cgskies.com (8.14.4/8.14.4) with ESMTP id q8L0XDp8032570
 for me...@junglevision.com; Fri, 21 Sep 2012 02:33:13 +0200
 Received: by www.cgtextures.com (Postfix, from userid 101)
 id 81BF513200F0; Fri, 21 Sep 2012 03:55:56 +0200 (CEST)
 To: me...@junglevision.com
 Subject: [SPAM] Action Required to Activate Membership for CGTextures
 From: CGTextures supportsupp...@cgtextures.com
 To: me...@junglevision.com
 Reply-To: CGTextures supportsupp...@cgtextures.com
 Date: Fri, 21 Sep 2012 03:55:56 +0200
 Message-Id: 20120921015556.81bf51320...@www.cgtextures.com
 X-Spam-Prev-Subject: Action Required to Activate Membership for CGTextures
 X-UID: 170756
 Status: O
 X-Keywords: NonJunk
 

-- 
I always wonder why birds stay in the same place when they can
fly anywhere on the earth.  Then I ask myself the same question.
- Harun Yahya
http://www.ChaosReigns.com


Re: Exclude from RCVD_IN_DNSWL_MED

2012-09-17 Thread darxus
On 09/17, Noel Butler wrote:
 I'm sure every network running a mail server would like to assume they are
 100% whitehat too. I see no reason to treat them special, just like gmail
 who think they are above it all, I won't include hotmail in that, as they

I suppose you think you're capable of achieving a higher ratio of outgoing
non-spam to spam than gmail, with anything near their number of users?

-- 
I'd rather be happy than right any day.
- Slartiblartfast, The Hitchhiker's Guide to the Galaxy
http://www.ChaosReigns.com


Optimizing scoring Re: Exclude from RCVD_IN_DNSWL_MED

2012-09-17 Thread darxus
On 09/17, Kris Deugau wrote:
 As an ISP mail admin, I **CANNOT** afford to block legitimate mail
 from any source, and if I see a report that a legitimate mail was
 blocked by any local rules or DNSBL data, I change the local rule or
 delete the offending local DNSBL entry ASAP.

Sometimes I envy the data available to those of you with users.  If you
can get 100,000 spams, and 100,000 non-spams together, you could run the
SA results through the re-scorer used to generate sa-updates, and have
scores fully optimized for your own users.  

And then you could give that data to the SA project to make it more
accurate for everybody else.  

I still feel like there's some good opportunity along these lines for
shared bayes.  

-- 
Democracy is the theory that the common people know what they want,
and deserve to get it good and hard. - H. L. Mencken
http://www.ChaosReigns.com


Re: Anyone from ReturnPath want to deal with this

2012-09-12 Thread darxus
On 09/08, Greg Troxel wrote:
 Some rules seem to have the description include the IP address that
 was looked up in the whitelist/blacklist.  Others don't, and it makes it
 a bit hard to guess (since trusted/etc. processing is slightly tricky).
 So I think it would be good if all dnsbl rules listed the IP address
 that hit.

I agree.  What rules do not list the IP?  I think this is something worth
opening a bug for, if you can specify the rules.

-- 
When you think of the long and gloomy history of man, you will find
more hideous crimes have been committed in the name of obedience than
have ever been committed in the name of rebellion. - C. P. Snow
http://www.ChaosReigns.com


Re: Exclude from RCVD_IN_DNSWL_MED

2012-09-12 Thread darxus
On 09/10, Helmut Schneider wrote:
   If I understood you correctly I'd need to add all relays of
   MessageLabs to trusted_networks and also track any IP address
   changes...
  
  In theory, you need to do this for all DNSxL lookups.
 
 In practise they all resolve fine to *.messagelabs.com.

I believe Matthias was trying to point out that not having your
trusted_networks set correctly will mess up your use of not only DNSWL, but
any other DNS based IP white *and* blacklists, which significantly
contribute to the effectiveness of spamassassin.  

-- 
But do you have any idea how many SuperBalls you could buy if you
actually applied yourself in the world? Probably eleven, but you should
still try. - http://hyperboleandahalf.blogspot.com/
http://www.ChaosReigns.com


Re: Install a new SpamAssassin server

2012-09-09 Thread darxus
On 09/09, Olivier CALVANO wrote:
 I want to replace my old SpamAssassin server. Does anyone know a web site
 that recommends the rules, modules, and RBLs needed to reach the maximum
 detection rate?

This may be about what you're looking for:
https://wiki.apache.org/spamassassin/ImproveAccuracy

 Currently I use the commercial Spamhaus service. Are there other lists
 of better quality?

You're paying for spamhaus because you have a high rate of traffic?  I
think one of the things we're really missing is documentation of what rates
of traffic are allowed by which of the services enabled in spamassassin by
default.  Warren did a nice job of documenting them on his site a while ago:
http://www.spamtips.org/2011/01/usage-limits-of-spamassassin-network.html

The spamhaus lists are good.  You can see the effectiveness of rules here:
http://ruleqa.spamassassin.org/

Or filter to rules that include rcvd_in_, which includes most of those
kinds of tests:
http://ruleqa.spamassassin.org/?rule=%2Frcvd_in_

-- 
For every complex problem, there is a solution that is simple, neat,
and wrong. - H. L. Mencken
http://www.ChaosReigns.com


Re: High CPU utilization and performance decrease after recent sa-update.

2012-09-06 Thread darxus
On 09/06, Piotr Kapiszewski wrote:
$sa_local_tests_only = 1 (amavis hook)

SpamAssassin is wrong about three times as often without network tests.
But if you're crippling the network tests as much as you mentioned, might
as well use the score set which is optimized for having the network tests
disabled (which this should do).

-- 
It is the first responsibility of every citizen to question authority.
- Benjamin Franklin
http://www.ChaosReigns.com


Re: spam in foreign characters

2012-08-21 Thread darxus
SpamAssassin has an ok_locales setting that lets you specify, roughly, which
languages you want to accept.  But it has problems:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078

I don't believe anybody has created rules to match these kinds of spams.
A big part of the problem is lacking examples of non-English non-spam
to verify the rules don't hit them.

So, you should probably try using ok_locales, and if it doesn't work,
create your own rules to match these spams, if you can find good common
patterns that don't seem likely to match non-spams (or match all Chinese
email if that's what you want).  And please share what works.

ok_locales is defined in the Mail::SpamAssassin::Conf man page which can
also be found here:
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html

Hmm, ok_locales may actually work on Chinese, I don't see examples of
problems with that language.

On 08/21, Adam Moffett wrote:
 I have a user who seems to get 4-5 messages per day with Chinese
 characters for the subject and body.  They come from a variety of
 domains and IP's so I guess she somehow got onto a list used to spam
 Chinese speaking people.
 
 If I paste them into Google Translate they seem to be roughly the
 same kind of junk as our English spam: work from home, buy our
 drugs, etc.  The handful that I looked at closely had scores of
 2.0-3.0.
 
 Are there existing SpamAssassin rules that work on non english
 characters?  Is there maybe something extra I should enable or
 install that would score these higher?
 
 I'm sorry if it's an ignorant question, but the issue hasn't really
 come up here before.
 
 Thanks.
 

-- 
There never has been an answer. There never will be an answer.
That's the answer. - Gertrude Stein
http://www.ChaosReigns.com


Re: Bogus authorize.net statements

2012-08-15 Thread darxus
On 08/15, Jim Schueler wrote:
 the attached.  All share a common marker of embedding a text url within an
 HTML <a> tag containing a different URL.  This seems like an obvious
 marker for spam, I wonder why there isn't a rule for it.

There is a rule.  It hits 10x as much non-spam as spam:

ruleqa.spamassassin.org/?rule=%2Fspoofed_url

There was some work on improving it:
http://osdir.com/ml/users-spamassassin/2011-10/msg00237.html

It didn't work out:
http://osdir.com/ml/users-spamassassin/2011-10/msg00304.html

Feel free to try to do better.

-- 
Just because you're offended, doesn't mean you're right. - Ricky Gervais
http://www.ChaosReigns.com


Re: Received header syntax

2012-08-15 Thread darxus
On 08/15, Ori Bani wrote:
 I tried to intentionally make a terribly wrong Received to see if SA
 would give me a rule hit but it did not. Is there a rule for this? If
 so, how can I turn it on and off?

I don't think there is actually a rule for unparsable headers.  I think it
effectively just ignores received headers it can't parse.  So just run one
of your outgoing emails through spamassassin -D and look for lines like:

Aug 15 15:17:33.625 [23043] dbg: received-header: parsed as [ ip=140.211.11.3 
rdns=hermes.apache.org helo=mail.apache.org by=panic.chaosreigns.com ident= 
envfrom= intl=0 id=C6F0CCD227 auth= msa=0 ]

To make sure it has parsed successfully.
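
If you have a lot of debug output to wade through, the fields can be pulled out mechanically.  A minimal Python sketch, assuming the debug line keeps the "parsed as [ key=value ... ]" shape shown above and that values contain no spaces; the function name is made up.

```python
import re


def parse_received_debug(line: str) -> dict:
    """Extract key=value pairs from a 'received-header: parsed as [ ... ]' line."""
    match = re.search(r"received-header: parsed as \[(.*)\]", line)
    if match is None:
        return {}  # not a "parsed as" line
    fields = {}
    for token in match.group(1).split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value  # empty values such as "auth=" become ""
    return fields


example = (
    "Aug 15 15:17:33.625 [23043] dbg: received-header: parsed as "
    "[ ip=140.211.11.3 rdns=hermes.apache.org helo=mail.apache.org "
    "by=panic.chaosreigns.com ident= envfrom= intl=0 id=C6F0CCD227 "
    "auth= msa=0 ]"
)
fields = parse_received_debug(example)
print(fields["ip"], fields["rdns"], repr(fields["auth"]))
```

An empty result for a Received header you expected to see means SA did not produce a "parsed as" line for it, which is the thing to look for.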

 Is there a place I can test only this rule?

No.

-- 
I always wonder why birds stay in the same place when they can
fly anywhere on the earth.  Then I ask myself the same question.
- Harun Yahya
http://www.ChaosReigns.com


Re: RDNS_NONE

2012-08-15 Thread darxus
On 08/15, Matt wrote:
 I have messages marked as such:
 
 RDNS_NONE Delivered to internal network by a host with no rDNS
 
 Problem is they very clearly have reverse and matching forward DNS
 that Exim even agrees on.  Why is SA tagging them as such?

I wonder how much this is related to the other post I just made.  Exim is
notorious for allowing people to modify their Received headers in a way
that doesn't comply with anything.  Are they in headers SA is failing to
parse?  Run it through spamassassin -D.

-- 
Safe is anywhere a hungry person can't walk in three days. - John Titor
http://www.ChaosReigns.com


Re: RCVD_IN_DNSWL_BLOCKED

2012-08-14 Thread darxus
On 08/13, JP Kelly wrote:
 How can I disable the DNSWL rule/plugin or whatever. Not just give it a 
 low/zero score but disable it completely.
 I am tired of seeing RCVD_IN_DNSWL_BLOCKED in my headers.

The description for RCVD_IN_DNSWL_BLOCKED is "The query to DNSWL was
blocked.  See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
for more information."

Have you looked at that link?  Are you running a local non-forwarding,
caching DNS server?

Immediately below the question linked to on that page is how to disable
these rules, as you asked.  However, unless you are, in fact, running a
site with quite a lot of email (over 100,000 queries per day), there is
probably a better solution.


I have some association with dnswl.org.  


On 08/14, Bowie Bailey wrote:
 If you want to disable the DNSWL lookup completely, you should zero
 out the main rules and the sub-rule:
 
score RCVD_IN_DNSWL_BLOCKED 0
score RCVD_IN_DNSWL_HI 0
score RCVD_IN_DNSWL_LOW 0
score RCVD_IN_DNSWL_MED 0
score RCVD_IN_DNSWL_NONE 0

I believe all of the above are unnecessary.

score __RCVD_IN_DNSWL 0

And this alone is adequate.  

I attempted to add it to
http://wiki.apache.org/spamassassin/DnsBlocklists but the site has become
unresponsive.

-- 
Hermes will help you get your wagon unstuck, but only if you push on it.
- Greek Alphabet Oracle
http://www.ChaosReigns.com


Re: RCVD_IN_DNSWL_BLOCKED

2012-08-14 Thread darxus
On 08/14, Jon-Paul Kelly wrote:
  Are you running a local non-forwarding,
  caching DNS server?
 
 I have a Plesk installation and am using the DNS server as provided by
 Plesk. The nameservers are ns1.smallgod.net, ns2.smallgod.net

If the smallgod.net name servers are provided by plesk, and not your own,
then you are using forwarders, which would be a problem, as the number
of people querying DNSWL would be counted for everybody using those DNS
servers, not just your own.

As Bowie mentioned, this is explained here:
http://wiki.apache.org/spamassassin/CachingNameserver
Which is linked from 
http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
(the link in the RCVD_IN_DNSWL_BLOCKED description).

 I am not sure if I have 100,000+ queries per day. I guess it is possible.
 The server has 270 domains and they all use the same name server.
 Is there a way to check with dnswl.org the number of queries and where
 they are coming from?

Google searching for: dnswl contact
has a useful first hit :)

But it sounds like your problem is using forwarders.

-- 
We will be dead soon. Is this how we want to live?
http://www.ChaosReigns.com


Re: HEADS UP: DBSL.org is returning positive replies

2012-08-10 Thread darxus
On 08/10, Brent Gardner wrote:
 As of today, dsbl.org is returning positive replies

 Is this enough to keep it from being used?
 
 meta RCVD_IN_DSBL (0)

Not necessary; this blacklist is not used in spamassassin because it has
been dead for years.

I believe the warning was posted primarily for people who were using this
BL at their MTA (mail server software).  Or possibly ancient versions of SA
(before 3.3.x) which haven't been getting updates for years and you
shouldn't be running anyway.

-- 
It is better to die on your feet than to live on your knees.
 - Emiliano Zapata, Mexican Revolution Leader
http://www.ChaosReigns.com


Re: HEADS UP: DBSL.org is returning positive replies

2012-08-10 Thread darxus
For completeness:
http://wiki.apache.org/spamassassin/Rules/RCVD_IN_DSBL
For the last three years this page has mentioned this rule is gone because
dsbl.org is gone.

The bug where it was removed from SA, four years ago:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5988


The thing to look for and remove is stuff in your MTA configuration, which
I had years ago, in my postfix main.cf:

  reject_rbl_client list.dsbl.org
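
If you are not sure whether an old config still has it, a scan along these lines would find it.  This is a rough Python sketch that only treats lines whose first non-blank character is # as comments; the function name is hypothetical and the sample config in the usage note is invented.

```python
import re


def find_dnsbl_lines(config_text: str, zone: str = "list.dsbl.org") -> list:
    """Return (line number, text) pairs for active lines that use the zone."""
    pattern = re.compile(r"\breject_rbl_client\s+" + re.escape(zone) + r"\b")
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # commented-out line
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits
```

Feeding it the contents of main.cf and getting an empty list back means there is nothing left to remove for that zone.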

-- 
If you believe everything you read, better not read. - Japanese Proverb
http://www.ChaosReigns.com


Re: Spamassassin and SPF records with +all

2012-07-11 Thread darxus
On 07/11, Josef Karliak wrote:
   In the last few days we've received spam from domains that have +all in
 the TXT SPF record. I was thinking of writing a plugin that checks these
 records and adds some points to such email, but I do not know

Your best chance may be to open a spamassassin bug requesting it.  I'd
guess it wouldn't be too hard to add to the existing SPF plugin.  The more
information you can provide showing this happens with spam, and does not
tend to happen with non-spam, the better.  It would get run through the
re-scoring process with a testing flag to determine if it's actually
useful, and what the optimal score is, before being published via
sa-update.  Well, you'd also need to update your SPF plugin to be able to
use it.

  v=spf1 +all
   The domain owner thinks that SPF is useless and/or doesn't care.
- http://www.openspf.org/SPF_Record_Syntax

That's a *really* unprofessional way to say "Everything in this domain
passes SPF."


Huh, the spamassassin SPF plugin uses Mail::SPF, and... I'm not sure it's
possible to get a copy of the SPF record to check it for containing +all.
Anybody else see how?
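
Assuming the raw TXT record could be obtained somehow (which is exactly the open question above), the check itself would be simple.  A minimal Python sketch, not SpamAssassin plugin code; the function name is made up, and it does not follow include: or redirect= to other records.

```python
def spf_allows_everyone(txt_record: str) -> bool:
    """True if an SPF record has an 'all' mechanism with a pass qualifier."""
    terms = txt_record.strip().split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF version 1 record
    for term in terms[1:]:
        # A mechanism with no qualifier defaults to '+', so a bare
        # 'all' means the same thing as '+all'.
        if term.lower() in ("+all", "all"):
            return True
    return False


for record in ("v=spf1 +all", "v=spf1 mx -all", "v=spf1 a mx ~all"):
    print(record, "->", spf_allows_everyone(record))
```

Whether a hit on this is worth any points is still something the re-scoring process would have to decide.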


On 07/11, Martin Gregorie wrote:
 All SPF can do is check that the sender has a valid IP for that domain,
 i.e. that the sender's domain wasn't forged. SPF cannot and should not
 be used to flag mail as spam if the sender is a legitimate member of the

Yeah, but there are lots of perfectly valid things that show up in emails
that correlate usefully to spam which, in combination, are useful in
determining which emails are spam and which are not.  If adding 0.2 points
to all emails from a domain with +all in an SPF record increases the spam
caught without increasing false positives significantly, it could be worth
doing.

-- 
You will need: a big heavy rock, something with a bit of a swing to it...
perhaps Mars - How to destroy the Earth
http://www.ChaosReigns.com


Re: Suddenly getting lots of false positives.

2012-05-24 Thread darxus
On 05/24, corpus.defero wrote:
 I'm not 100% but isn't http://www.dnswl.org/ a 'DIY' whitelisting site
 that anyone can kind of abuse?

No.

I'm a (basically inactive) dnswl.org admin.  

Anybody can request to be added to the list, but all changes get looked
over pretty thoroughly by a human, using lots of available data.  

 The rule is tucked away in 72_active.cf, along with the other 'pay to
 spam' whitelists from the likes of Return Path. I suggest you add this

Listing on dnswl.org does not involve payment, it is not a 'pay to spam'
whitelist.

-- 
You will need: a big heavy rock, something with a bit of a swing to it...
perhaps Mars - How to destroy the Earth
http://www.ChaosReigns.com


Re: Suddenly getting lots of false positives.

2012-05-24 Thread darxus
On 05/24, Jeremy Morton wrote:
   -4.0 RCVD_IN_DNSWL_MED  RBL: Sender listed at
 http://www.dnswl.org/, medium
   trust
   [59.94.13.26 listed in list.dnswl.org]

I don't think this was ever actually listed by dnswl.org.  I have
archives back to last June, which don't show it, and in the dnswl.org
admin interface when a listing is removed it is generally deactivated, not
deleted - and there is nothing there.

That leaves interesting possibilities.  I'd start by running this email
through spamassassin again to see if it repeatably says this IP is listed
by dnswl.  SpamAssassin could be doing something wrong, or a DNS server
somewhere could be doing something wrong.

And it might be useful to provide more examples.  Just IPs might be best.
And generally we prefer you provide spams via pastebin instead of including
them in emails to this list.

-- 
For gasoline vapor, the explosive range is from 1.3 to 6.0% vapor
to air...useful against soft targets such as...armored vehicles...and
bunkers. - http://www.fas.org/man/dod-101/sys/dumb/fae.htm
http://www.ChaosReigns.com


Re: Suddenly getting lots of false positives.

2012-05-24 Thread darxus
On 05/24, Benny Pedersen wrote:
 reject spf_softfail in mta, or report to http://www.dnswl.org/ 

SPF_SOFTFAIL kind of sucks:
http://ruleqa.spamassassin.org/?daterev=20120519-r1340375-n&rule=%2Fspf

  MSECS   SPAM%     HAM%    S/O   RANK  SCORE  NAME          WHO/AGE
      0  3.2640  27.9430  0.105   0.67   0.00  SPF_PASS
      0  6.3320   0.6518  0.907   0.58   0.00  SPF_SOFTFAIL
      0  4.0263   1.1272  0.781   0.50   0.00  SPF_NEUTRAL
      0       0        0  0.500   0.50   0.00  SPF_NONE
      0  1.7415   1.6254  0.517   0.39   0.00  SPF_FAIL

SPF_SOFTFAIL hits 6.3% of spam and 0.7% of ham, which is a pretty terrible
ratio, which gives it a rank of 0.58, where 1 is best (RCVD_IN_DNSWL_HI, in
fact), and 0 is worst.  A rank of 0.58 sucks.
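
For reference, the S/O column can be reproduced from the SPAM% and HAM% figures in the table above.  The formula in this Python sketch is inferred from those numbers (it matches every row), not taken from the ruleqa source, and it says nothing about how RANK is derived.

```python
def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """S/O: the share of a rule's hits that are spam."""
    total = spam_pct + ham_pct
    if total == 0:
        return 0.5  # no hits at all; the table shows 0.500 for SPF_NONE
    return spam_pct / total


# SPAM% and HAM% as listed in the ruleqa table above.
rows = {
    "SPF_PASS": (3.2640, 27.9430),
    "SPF_SOFTFAIL": (6.3320, 0.6518),
    "SPF_NEUTRAL": (4.0263, 1.1272),
    "SPF_NONE": (0.0, 0.0),
    "SPF_FAIL": (1.7415, 1.6254),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{name:13s} {spam_over_overall(spam_pct, ham_pct):.3f}")
```

So an S/O of 0.907 for SPF_SOFTFAIL means roughly one hit in ten is on ham, which is why rejecting on it outright costs real mail.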

Therefore rejecting on it at your MTA is a bad idea.  But it's your MTA.
I've done lots of things with my MTA on purpose that were a bad idea.

 (why did they list a dynamic ip ?)

I don't think they did.

 if sender is legit why is it softfailing ?

Generally because people configure their SPF records badly.  SOFTFAIL
*means* the sending domain isn't certain they have all their legit sending
IPs listed.  So based on the protocol it's also inappropriate to use for
absolute blocking.  (In addition to the real world statistics above.)  It's
unfortunate.

-- 
Wash daily from nose-tip to tail-tip; drink deeply, but never too deep;
And remember the night is for hunting, and forget not the day is for sleep.
- The Law of the Jungle, Rudyard Kipling
http://www.ChaosReigns.com


Re: Suddenly getting lots of false positives.

2012-05-24 Thread darxus
On 05/24, Kevin A. McGrail wrote:
 Normally, I blame a DNS server.  See pages like this for more information:
 
 http://www.surbl.org/faqs#dnsproxy

Yup, that could do it.  Icky.  

Jeremy: You could manually check if you're getting the wrong DNS results by
running:

$ host 26.13.94.59.list.dnswl.org
Host 26.13.94.59.list.dnswl.org not found: 3(NXDOMAIN)

(IP address reversed, then .list.dnswl.org.)

If an IP address is listed (as that one should not be), you'll see
something like:

$ host 40.152.71.64.list.dnswl.org
40.152.71.64.list.dnswl.org has address 127.0.6.3
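
The name being looked up can be built mechanically.  A minimal Python sketch for IPv4 only; the function name is made up, and the zone defaults to the one used in the examples above.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str = "list.dnswl.org") -> str:
    """Build the DNS name to look up for an IPv4 address in a DNS-based list."""
    # IPv4Address() rejects anything that is not a valid dotted quad.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


print(dnsbl_query_name("59.94.13.26"))
print(dnsbl_query_name("64.71.152.40"))
```

Resolving the resulting name (with host, as above, or any resolver library) and comparing the answer from your usual DNS server against one from a different server is a reasonable way to spot a resolver that is inventing results.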

 Darxus, you wrote a good wiki about using other DNS servers, etc. somewhere I 
 thought about but I can't find it.

I did?  Are you thinking of
https://wiki.apache.org/spamassassin/CachingNameserver ?  I didn't write
it.

 In general, I recommend running your own caching nameserver.

Yup. 

-- 
Safe is anywhere a hungry person can't walk in three days. - John Titor
http://www.ChaosReigns.com


Re: __DRUG_MUSCLE1 false-positives

2012-05-17 Thread darxus
On 05/18, Jason Haar wrote:
 A bit OT, but is it because your perl is running under C locale
 instead of se? i.e. would the word boundary definition change under
 different localization contexts? Doesn't help solve the problem for you,
 but it certainly flags a potential issue with a tonne of the rules in SA...

Locale handling is a known problem in SA:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3062

-- 
Life is either a daring adventure or it is nothing at all.
- Helen Keller
http://www.ChaosReigns.com


Re: use_bayes=0 completly disables report function

2012-04-20 Thread darxus
On 04/20, Marcin Mirosław wrote:
 Hello,
 I've noticed that when I set use_bayes 0, spamc -C report stops working.
 I've got in log:  spamd: Can't call method "learn" on an undefined value

bayes_learn_during_report 0

-- 
Safe is anywhere a hungry person can't walk in three days. - John Titor
http://www.ChaosReigns.com


Re: updates

2012-04-12 Thread darxus
On 04/12, John Hardin wrote:
 Can you remind me how far below the threshold we are for corpora?  If I hand
 qualify another couple of thousand hams or so would that be significant? Or
 is our deficit significantly larger than that?

 The current corpora are ham=50658, spam=245341.
 
 I don't remember what the thresholds currently are, but the numbers
 used in the past have been a multiple of 50k, so 100k, 150k, 200k or
 250k. Darxus, you're more in tune with this than I am, what are the
 current thresholds?

Thresholds for both are 150k.  Graph here, updated weekly:
http://www.chaosreigns.com/dnswl/tot.svg
According to that, we're at 29003 spams.  That matches the latest net run,
which it's based on: http://ruleqa.spamassassin.org/20120407-r1310705-n
So as of Saturday, we're at 19.3% of the spam corpora we need.
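
The percentage is just the current count divided by the threshold.  A quick
Python sketch of that arithmetic, assuming a threshold of 150,000 messages
(that is the value the 19.3% figure implies for 29003 spams); the function
name corpus_progress is made up for this example:

```python
def corpus_progress(have: int, need: int) -> float:
    """Return how much of the required corpus is present, as a percentage."""
    return 100.0 * have / need


# 29003 spams in the latest net run; 150_000 is the assumed threshold.
print(f"{corpus_progress(29003, 150_000):.1f}%")  # 19.3%
```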

Spam age limit is 2 months.

The dev list gets an alert every day (from me) if updates haven't been
generated.  It says:

SpamAssassin version 3.3.2 has not had a rule update since 2012-02-25.

It's pretty obnoxious, but I think it's a big enough problem to justify it
being posted once a day (and I'm apparently not the only one).


New contributors aren't currently allowed due to
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6694
which has restricted visibility due to being a security bug.  For the past
69 days, it has been waiting for a reply from Warren Togami to okay
declaring it not actually a security problem (which I am in favor of).
It seems this requires another member of the PMC (project management
committee) to step in and declare this not a security bug.  Or for someone
with sufficient access to otherwise fix it, which I suspect is a very
small set of people.  

Once that's cleared up, new people would be able to contribute data
(just logs of rule hits, not actual email) via
https://wiki.apache.org/spamassassin/NightlyMassCheck

-- 
Go forth, and be excellent to one another. - http://www.jhuger.com/fredski.php
http://www.ChaosReigns.com


Re: updates

2012-04-12 Thread darxus
On 04/12, joea wrote:
 SpamAssassin version 3.3.2 has not had a rule update since 2012-02-25.
 
 From this, should I conclude there will be no updates to earlier versions 
 (3.2.x for instance) ?   Must I upgrade in order to update?

No.  I shortened it because I thought quoting the whole thing was overly
verbose.  It actually says:

SpamAssassin version 3.3.0 has not had a rule update since 2012-02-25.
SpamAssassin version 3.3.1 has not had a rule update since 2012-02-25.
SpamAssassin version 3.3.2 has not had a rule update since 2012-02-25.


All 3.3.x versions use the same rules.

-- 
But do you have any idea how many SuperBalls you could buy if you
actually applied yourself in the world? Probably eleven, but you should
still try. - http://hyperboleandahalf.blogspot.com/
http://www.ChaosReigns.com


Re: sought is failing with sa-compile

2012-03-27 Thread darxus
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6649

-- 
Will I ever learn? I hope not, I'm having too much fun.
- Brent Minime Avis, motorcycle.com
http://www.ChaosReigns.com


Re: OT how to bypass public nameservers as bind forwarders?

2012-03-21 Thread darxus
On 03/21, Jari Fredriksson wrote:
 0.0 RCVD_IN_DNSWL_BLOCKED  RBL: ADMINISTRATOR NOTICE: The query to DNSWL
 was blocked.  See
 
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
  for more information.

This is plenty on topic.

I tried to update the contents of that wiki link with the useful answers
from this thread.

Everybody should feel free to further improve the wiki, just create an
account, and email d...@spamassassin.apache.org to request write access.

-- 
He who dies with the most toys... still dies. - No Fear
http://www.ChaosReigns.com


Re: Blocking frequent botnet pattern

2012-03-13 Thread darxus
On 03/13, Alex wrote:
 http://pastebin.com/raw.php?i=iquXBnH0

 While I could create a rule to block this specific domain, or submit
 it to a RBL, I'd appreciate any ideas how to more generally block
 them, rather than by one characteristic in the message.

We need more examples.

 Maybe this is addressed in v3.4? 

Unlikely.

-- 
As humans, we are taught to forget that we are animals.
- forward to Johnny The Homicidal Maniac
http://www.ChaosReigns.com


Automatic rule generation Re: Better phish detection

2012-03-10 Thread darxus
The software used to generate the sought rules, or perhaps an old version
of it, is in the spamassassin source tree.  You can feed it a folder of
known non-spams, and a folder of known spams, and it'll auto-generate rules
that hit the spams but not the non-spams.  

Ah, I documented it some here:
https://wiki.apache.org/spamassassin/WritingRules

svn checkout http://svn.apache.org/repos/asf/spamassassin/trunk
cd trunk/masses/rule-dev
./seek-phrases-in-corpus ham:dir:~/Maildir/ \
    spam:dir:~/Maildir/.bad.spam-missed/

On 03/10, sporkman wrote:
 
 Hello,
 
 We are getting a fair amount of very targetted phish attempts to our
 userbase.  Since we are relatively small, I don't think any of the URIBLs
 really help (or phishtank or other lists) since we're not a large bank or
 paypal or anything like that.
 
 I did see some gentleman make a rather valiant attempt at listing all the
 common phrases here:
 
 http://old.nabble.com/introducing-body-J_MAILBOX_FULL-tc33207944.html#a33213220
 
 It has a number of errors, and obviously that's not very efficient (I suck
 at regexs, but I know enough to know that list can be collapsed).
 
 Any pointers to a good starting point to take a list like that and make it
 usable?  The phrasing on these is always very similar - stuff about
 upgrading your webmail account, etc.
 
 We're running qmail/vpopmail, and our upgrade to postfix to at least
 front-end qmail is still a ways off.  I think with postfix we could probably
 catch a bunch of this garbage at the front door.  So for now, our only real
 tool to fight this is SA.  
 
 I assume we're not the only ones seeing this mess, what are others doing to
 counteract this?
 
 Thanks,
 
 Charles
 -- 
 View this message in context: 
 http://old.nabble.com/Better-phish-detection-tp33478328p33478328.html
 Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
 

-- 
If you are not paranoid... you may not be paying attention.
 - j...@creative-net.net, on an IDPA mailing list
http://www.ChaosReigns.com


Re: Sought rules alive?

2012-03-07 Thread darxus
On 03/07, Andrea gabellini - SC wrote:
 I noticed that sought rules are not updated from many weeks?
 
 Is the project alive?

There was no mention of intentionally killing it off, so my guess is it
accidentally broke and wasn't noticed.

It hasn't been updated since 2012-01-02, and is supposed to update multiple
times a day.

This came up on the dev list two months ago, unfortunately it was in the
form of "Can somebody verify this isn't just broken for me?" not "Hey,
sought is broken.":
http://old.nabble.com/sought-sa-update-channel---SA-3.4.-trunk-td33164814.html

-- 
You will need: a big heavy rock, something with a bit of a swing to it...
perhaps Mars - How to destroy the Earth
http://www.ChaosReigns.com


Re: White text on white background

2012-02-18 Thread darxus
Bug with patches to fix this:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6760

On 02/17, dar...@chaosreigns.com wrote:
 Looks like this fixes it:
 
 $ diff ./lib/Mail/SpamAssassin/HTML.pm \
     /usr/share/perl5/Mail/SpamAssassin/HTML.pm
 952a953,956
 >     # Handle 3 character color shorthand.
 >     if (length($color) == 3) {
 >       $color =~ s/(.)(.)(.)/$1$1$2$2$3$3/;
 >     }
 
 Opening a bug to apply it.
 
 On 02/17, dar...@chaosreigns.com wrote:
  Confirmed.  #999 is getting converted to #090909, when it should be
  getting converted to #999999.  (Threw a print statement into the top
  of html_font_invisible().)
  
  On 02/17, dar...@chaosreigns.com wrote:
   You should open a bug.  SpamAssassin attempts to catch these via
   html_font_invisible() in HTML.pm (should hit rule HTML_FONT_LOW_CONTRAST).
   My guess is that it's failing to handle the short form of color values
   (FFF instead of FFFFFF).  Looks like they should be converted like 123
   -> 112233.
   
   Report bugs here: https://issues.apache.org/SpamAssassin/
   
   On 02/16, JP Kelly wrote:
I am getting a bunch of spam with white text on a white background. Any 
ideas how to catch it?
Here is an example:
   
<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0"
marginheight="0">
   
<p style="color:#FFF; font-size:1px; width:600px;">
   
   -- 
   Life is either a daring adventure or it is nothing at all.
   - Helen Keller
   http://www.ChaosReigns.com
   
  
  -- 
  It is the first responsibility of every citizen to question authority.
  - Benjamin Franklin
  http://www.ChaosReigns.com
  
 
 -- 
 I don't want to die... just yet... not while there's... women.
 - J. Matthew Root, 8/23/02 (http://www.jmrart.com/)
 http://www.ChaosReigns.com
 

-- 
Force, my friends, is violence; the supreme authority
from which all other authority is derived.
- Michael Ironside, Starship Troopers
http://www.ChaosReigns.com


Re: White text on white background

2012-02-17 Thread darxus
You should open a bug.  SpamAssassin attempts to catch these via
html_font_invisible() in HTML.pm (should hit rule HTML_FONT_LOW_CONTRAST).
My guess is that it's failing to handle the short form of color values
(FFF instead of FFFFFF).  Looks like they should be converted like 123
-> 112233.

Report bugs here: https://issues.apache.org/SpamAssassin/

On 02/16, JP Kelly wrote:
 I am getting a bunch of spam with white text on a white background. Any ideas 
 how to catch it?
 Here is an example:

 <body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0"
 marginheight="0">

 <p style="color:#FFF; font-size:1px; width:600px;">

-- 
Life is either a daring adventure or it is nothing at all.
- Helen Keller
http://www.ChaosReigns.com


Re: White text on white background

2012-02-17 Thread darxus
Confirmed.  #999 is getting converted to #090909, when it should be
getting converted to #999999.  (Threw a print statement into the top
of html_font_invisible().)

On 02/17, dar...@chaosreigns.com wrote:
 You should open a bug.  SpamAssassin attempts to catch these via
 html_font_invisible() in HTML.pm (should hit rule HTML_FONT_LOW_CONTRAST).
 My guess is that it's failing to handle the short form of color values
 (FFF instead of FFFFFF).  Looks like they should be converted like 123
 -> 112233.
 
 Report bugs here: https://issues.apache.org/SpamAssassin/
 
 On 02/16, JP Kelly wrote:
  I am getting a bunch of spam with white text on a white background. Any 
  ideas how to catch it?
  Here is an example:
 
  <body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0"
  marginheight="0">
 
  <p style="color:#FFF; font-size:1px; width:600px;">
 
 -- 
 Life is either a daring adventure or it is nothing at all.
 - Helen Keller
 http://www.ChaosReigns.com
 

-- 
It is the first responsibility of every citizen to question authority.
- Benjamin Franklin
http://www.ChaosReigns.com


Re: White text on white background

2012-02-17 Thread darxus
Looks like this fixes it:

$ diff ./lib/Mail/SpamAssassin/HTML.pm \
    /usr/share/perl5/Mail/SpamAssassin/HTML.pm
952a953,956
>     # Handle 3 character color shorthand.
>     if (length($color) == 3) {
>       $color =~ s/(.)(.)(.)/$1$1$2$2$3$3/;
>     }

Opening a bug to apply it.
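
For anyone who doesn't read Perl, here is roughly what that substitution
does, restated as a small Python sketch.  The function name expand_short_hex
is made up for illustration, and I'm assuming the color value has already had
its leading '#' removed by the time it gets here:

```python
import re


def expand_short_hex(color: str) -> str:
    """Expand 3-digit hex color shorthand to 6 digits by doubling each digit.

    Hypothetical helper mirroring the Perl substitution in the patch;
    assumes the leading '#' has already been stripped.
    """
    if len(color) == 3:
        color = re.sub(r"(.)(.)(.)", r"\1\1\2\2\3\3", color)
    return color


print(expand_short_hex("999"))     # 999999
print(expand_short_hex("123"))     # 112233
print(expand_short_hex("FFFFFF"))  # FFFFFF (already long form, unchanged)
```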

On 02/17, dar...@chaosreigns.com wrote:
 Confirmed.  #999 is getting converted to #090909, when it should be
 getting converted to #999999.  (Threw a print statement into the top
 of html_font_invisible().)
 
 On 02/17, dar...@chaosreigns.com wrote:
  You should open a bug.  SpamAssassin attempts to catch these via
  html_font_invisible() in HTML.pm (should hit rule HTML_FONT_LOW_CONTRAST).
  My guess is that it's failing to handle the short form of color values
  (FFF instead of FFFFFF).  Looks like they should be converted like 123
  -> 112233.
  
  Report bugs here: https://issues.apache.org/SpamAssassin/
  
  On 02/16, JP Kelly wrote:
   I am getting a bunch of spam with white text on a white background. Any 
   ideas how to catch it?
   Here is an example:
  
   <body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0"
   marginheight="0">
  
   <p style="color:#FFF; font-size:1px; width:600px;">
  
  -- 
  Life is either a daring adventure or it is nothing at all.
  - Helen Keller
  http://www.ChaosReigns.com
  
 
 -- 
 It is the first responsibility of every citizen to question authority.
 - Benjamin Franklin
 http://www.ChaosReigns.com
 

-- 
I don't want to die... just yet... not while there's... women.
- J. Matthew Root, 8/23/02 (http://www.jmrart.com/)
http://www.ChaosReigns.com


Re: SPF and DKIM tests by default?

2012-02-12 Thread darxus
On 02/10, email builder wrote:
  I believe for SPF you *should* be doing the detecting at your MTA
  (mail server software) and inserting a header for spamassassin to use:
  Received-SPF.  (Because SPF is supposed to use the envelope from,
  which is not necessarily included in a header.)
 
 I see. That makes sense. Is there a wiki page suggesting solutions for this? 
 Anyone know of tips for doing this in postfix? Or during amavis processing?

I use postfix-policyd-spf-perl, which currently appears to be officially
hosted at:
https://launchpad.net/postfix-policyd-spf-perl/

-- 
For gasoline vapor, the explosive range is from 1.3 to 6.0% vapor
to air...useful against soft targets such as...armored vehicles...and
bunkers. - http://www.fas.org/man/dod-101/sys/dumb/fae.htm
http://www.ChaosReigns.com


Re: SPF and DKIM tests by default?

2012-02-09 Thread darxus
On 02/08, email builder wrote:
 Hello,
 
 I have a server where I never customized any of the SA
 rules/tests (SA v.3.3.1).  The server does run sa-update
 every day.  Is this the right place to look to know what
 tests the server should be running?
 
 https://spamassassin.apache.org/tests_3_0_x.html

At the top of that page, it says "Tests Performed: v3.0.x", which is not the
version you are running.  https://spamassassin.apache.org/tests_3_3_x.html
contains tests for 3.3.  I don't know when they get updated, maybe only
when 3.3.0 was released.  I wouldn't trust it much.

Run: sa-update -D 2>&1 | grep DIR

That will output something like:

Feb  9 12:08:49.609 [20855] dbg: generic: Perl 5.010001, PREFIX=/usr, 
DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/spamassassin, 
LOCAL_STATE_DIR=/var/lib/spamassassin

On this system, sa-update downloads rules to /var/lib/spamassassin, so I
guess you're looking for the LOCAL_STATE_DIR.

That directory will contain a directory related to your SA version,
something like 3.003001, which will contain updates_spamassassin_org, which
will contain the files defining all the rules.  

Although that doesn't necessarily tell you which are enabled by default.
Some require configuration changes.
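
If you'd rather not pick LOCAL_STATE_DIR out of that debug line by eye, the
KEY=VALUE pairs can be pulled apart with something like the following.  This
is only a Python sketch: parse_dir_settings is a made-up name, the sample
line is the one quoted above joined back onto a single line, and I'm assuming
none of the paths contain commas or spaces.

```python
import re


def parse_dir_settings(debug_line: str) -> dict:
    """Extract KEY=VALUE pairs (PREFIX, *_DIR) from one sa-update debug line.

    Hypothetical helper; assumes values contain no commas or whitespace.
    """
    return dict(re.findall(r"([A-Z_]+)=([^,\s]+)", debug_line))


sample = (
    "Feb  9 12:08:49.609 [20855] dbg: generic: Perl 5.010001, PREFIX=/usr, "
    "DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/spamassassin, "
    "LOCAL_STATE_DIR=/var/lib/spamassassin"
)
settings = parse_dir_settings(sample)
print(settings["LOCAL_STATE_DIR"])  # /var/lib/spamassassin
```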

I believe for SPF you *should* be doing the detecting at your MTA
(mail server software) and inserting a header for spamassassin to use:
Received-SPF.  (Because SPF is supposed to use the envelope from,
which is not necessarily included in a header.)

 From that page, it seems that SPF checks are normal
 but DKIM is not. Is this right?
 
 Contrary to that, this page suggests that DKIM test are
 enabled by default in version 3.3:
 
 https://wiki.apache.org/spamassassin/Plugin/DKIM

I don't have anything in my /etc/spamassassin/local.cf related to DKIM, and
I'm getting DKIM rule hits, so I agree that DKIM is enabled by default
(although I'm running trunk / v3.4.0 which is unreleased).

I believe SPF tests are also enabled by default, but won't do quite the
right thing unless you're inserting the Received-SPF header at your MTA.

 Also, where can I look to verify the tests/rules currently
 in place on the server?  (per-user rules are not implemented)
 
 I looked in /usr/share/spamassassin and there are a few
 files with spf and dkim in their names.  Does that
 mean those tests are active?

Using the official Debian / Ubuntu packages, that directory contains the
rules installed by the spamassassin package, which are only used if you do
not run sa-update.  Which would obviously be sub-optimal.

 ls *spf*
 -rw-r--r-- 1 root root 3100 Mar 15  2010 25_spf.cf
 -rw-r--r-- 1 root root 3584 Mar 15  2010 60_whitelist_spf.cf
 
 ls *dkim*
 -rw-r--r-- 1 root root 4407 Mar 15  2010 25_dkim.cf
 -rw-r--r-- 1 root root 9288 Mar 15  2010 60_adsp_override_dkim.cf
 -rw-r--r-- 1 root root 6455 Mar 15  2010 60_whitelist_dkim.cf

Those are related, although their presence doesn't indicate anything about
defaults.  

None of the SPF or DKIM rules are particularly highly ranked in
spamassassin rule QA, so I wouldn't actually expect significant
improvements in accuracy from it:
http://ruleqa.spamassassin.org/?daterev=20120204
They both have some substantial flaws.  

-- 
Every man, woman and child on the face of this earth is at the mercy
of chaos. - a maxwell smart movie
http://www.ChaosReigns.com


Re: ham marked as spam: bogus IP in report

2012-01-23 Thread darxus
On 01/23, Toni Mueller wrote:
 On Mon, Jan 23, 2012 at 11:59:43AM -0500, Kevin A. McGrail wrote:
   Am I looking at a bug in SA? And/Or, how do I debug this, please?
  Baffling.  Checking your maillogs, you don't see that IP anywhere?
 
 I do see this IP number several times, but it tried to send a completely
 different email to someone else on my server.

I was just about to ask if it might be showing up in other emails, afraid
it might be related to:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6617
"FreeMail rule description shows emails from previous messages".

Ick.

-- 
Begin at the beginning and go on till you come to the end; then stop.
- Lewis Carroll, Alice in Wonderland
http://www.ChaosReigns.com


Re: update channel list

2012-01-18 Thread darxus
On 01/18, Micah Anderson wrote:
 updates.spamassassin.org
 sought.rules.yerp.org
 khop-bl.sa.khopesh.com
 khop-blessed.sa.khopesh.com
 khop-general.sa.khopesh.com
 khop-sc-neighbors.sa.khopesh.com
 
 but I suspect that some of these are no longer good. I was hoping folks
 out there might be able to make some suggestions for improvements?

All of those are currently listed by Adam Katz on
http://khopesh.com/wiki/Anti-spam
I expect that list to be up to date.  
He's an active spamassassin developer.  

That page also lists 90_2tld.cf.sare.sa-update.dostech.net.  I doubt there
are any others worth using.  If there are, they should probably get added
to http://wiki.apache.org/spamassassin/CustomRulesets
If there were more sa-update channels that were useful, I'd recommend
breaking that page up a little more to put the rule sets with update
channels at the top.  

If you're looking to improve SA accuracy in general, I've tried to make a
thorough checklist here:
http://wiki.apache.org/spamassassin/ImproveAccuracy

-- 
You only truly own what you can carry at a dead run. 
- 14th & 15th century Landsknechts
http://www.ChaosReigns.com


Re: sa-update channel list

2012-01-11 Thread darxus
On 01/12, jida...@jidanni.org wrote:
  MS == Michael Scheidell michael.scheid...@secnap.com writes:
 MS #1 priority:  keep your version of sa updated
 Hmmm, taking a look at it, I find the last update was about 2011/10/24.
 Too bad sa-update -D doesn't spit out the date.

I don't remember what that update was for, but versions prior to 3.3.0
stopped getting regular updates in 2008.  

-- 
Every normal man must be tempted at times to spit upon his hands,
hoist the black flag, and begin slitting throats.
 - Henry Louis Mencken (1880-1956)
http://www.ChaosReigns.com


Re: installation problem

2012-01-01 Thread darxus
I have little faith in installing spamassassin from cpan.  I'd recommend
uninstalling it if you can, and installing from whatever packaging system
your OS uses, which I believe is ports.  

But if there is a related bug in installation from cpan, it would be nice
to track it down and fix it.

From your debug output:

Jan  1 17:06:23.374 [20281] dbg: generic: Perl 5.01,
PREFIX=/usr/pkg, DEF_RULES_DIR=/usr/pkg/share/spamassassin,
LOCAL_RULES_DIR=/usr/pkg/etc/mail/spamassassin,
LOCAL_STATE_DIR=/usr/pkg/var/spamassassin

What exactly is the directory sa-update is downloading to?  Is it one of
those?  Does it actually contain rules?

$ sa-update -D 2>&1 | grep LOCAL_STATE
Jan  1 14:08:54.614 [9446] dbg: generic: Perl 5.010001, PREFIX=/usr, 
DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/spamassassin, 
LOCAL_STATE_DIR=/var/lib/spamassassin 

$ spamassassin -D /dev/null 2>&1 | grep LOCAL_STATE
Jan  1 14:09:38.542 [9459] dbg: generic: Perl 5.010001, PREFIX=/usr, 
DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/spamassassin, 
LOCAL_STATE_DIR=/var/lib/spamassassin

So on my machine, sa-update is downloading to, and spamassassin is loading
rules from, the LOCAL_STATE_DIR, and the rule definitions are all in
/var/lib/spamassassin/3.004000/updates_spamassassin_org/ .

On 01/01, Steve Blinkhorn wrote:
 Thank you for your various responses.
 
 spamassassin --lint -D output is at  http://pastebin.com/Hjmt8CbE
 
 There is only one sa-update on the system.
 
 I installed from CPAN
 
 --
 Steve Blinkhorn st...@prd.co.uk
 
  
  Check you've only got one saupdate etc installed and you are calling the
  saupdate associated with the spamassassin you are running. Ie check there's
  not one installed from ports or as the base install if you hand installed a
  version and vice versa
  
  Martin
  
  On Saturday, 31 December 2011, Steve Blinkhorn st...@prd.co.uk wrote:
   Hi,
   I just tried to install spamassassin: everything proceeded normally,
   AFAIK, but the basic spamassassin -t' on the provided sample fails
   because no rules are found (line 400, which looks to my untutored eye
   like an all-purpose error-spitter). sa-update appears to run, and
   exits silently.   There is a rules directory under the the directory
   where I ran the installation, and also under usr/pkg/share, and they
   are both populated with files which look relevant.
  
   I tweaked the script so as not to require rules, and it ran and
   produced output.
  
   NetBSD 4.01, working as root.   What is amiss?
  
   --
   Steve Blinkhorn st...@prd.co.uk
  
  
  
   This email is for the addressee only.   If you are not the addressee
   you should immediately delete this email from your system(s) and
   inform us.   It may contain information that is confidential or
   otherwise privileged, and should not be copied or redistributed to
   recipients not originally specified as addressees without permission.
  
   Psychometric Research & Development Ltd.
   PO Box 1143, St Albans, Herts, AL1 9UT, UK
   Registered in England No. 1909571
   Registered Office: 47 Holywell Hill, St Albans, Herts, AL1 1HD
   Phone: +44 (0)1727 841455
   www.prd.co.uk
  
  
  
  
  -- 
  -- 
  Martin Hepworth
  Oxford, UK
  

Re: installation problem

2012-01-01 Thread darxus
On 01/01, Steve Blinkhorn wrote:
 files like init.pre, sa-update-keys, v312.pre, v330.pre
 local.cf, v310.pre, v320.pre?   I don't know exactly what I'm looking
 for - is there a standard extension for rule files?

No, those are installed with spamassassin.  The files you're looking for
end in .cf.  A good example is the file 50_scores.cf.

 I'm afraid you'll have to tell me... http://pastebin.com/xaWNQ0GS

Your LOCAL_STATE_DIR matches in the output of both -
/usr/pkg/var/spamassassin.  You should have rules there.  This file should
exist:  

/usr/pkg/var/spamassassin/3.004000/updates_spamassassin_org/50_scores.cf

Does it?  

If it doesn't exist, sa-update isn't writing files successfully.  If it does
exist, spamassassin isn't reading them.  Could be weird file permissions I
guess.
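
That check boils down to "does the file exist, and can the user running
spamassassin read it".  A minimal Python sketch of it, where
diagnose_rules_file is a made-up helper name and the messages are just my
guesses at the likely cause.  Run it as the same user spamassassin runs as,
since root can read almost anything:

```python
import os


def diagnose_rules_file(path: str) -> str:
    """Report whether a rules file is missing, unreadable, or fine.

    Hypothetical helper; the readability check applies to the current user.
    """
    if not os.path.exists(path):
        return "missing: sa-update is probably not writing its files"
    if not os.access(path, os.R_OK):
        return "unreadable: check file permissions"
    return "ok: the file exists and is readable"


# Substitute the 50_scores.cf path under your own LOCAL_STATE_DIR here.
print(diagnose_rules_file("/nonexistent/dir/50_scores.cf"))
```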

-- 
But do you have any idea how many SuperBalls you could buy if you
actually applied yourself in the world? Probably eleven, but you should
still try. - http://hyperboleandahalf.blogspot.com/
http://www.ChaosReigns.com


Re: installation problem

2012-01-01 Thread darxus
On 01/01, wolfgang wrote:
  /usr/pkg/var/spamassassin/3.004000/updates_spamassassin_org/50_scores.cf

 I would rather suspect that file to be located in
 Jan  1 19:55:45.157 [6360] dbg: channel: update directory 
 /usr/pkg/var/spamassassin/3.003002/updates_spamassassin_org

You're right, thanks.  I hadn't figured out till now where exactly that
version number comes from:

    3.003002 = v3.3.2
    ^   ^  ^

So Steve, you should have a file
/usr/pkg/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf
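
Spelled out, the mapping appears to be: major version, a dot, then minor and
patch each zero-padded to three digits.  A small Python sketch of that,
inferred only from the two examples in this thread (3.3.2 and 3.4.0), with
made-up function names:

```python
import posixpath


def sa_version_dir(version: str) -> str:
    """Turn '3.3.2' into '3.003002' (pattern inferred from this thread)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor:03d}{patch:03d}"


def scores_file(state_dir: str, version: str) -> str:
    """Build the expected 50_scores.cf path under LOCAL_STATE_DIR."""
    return posixpath.join(
        state_dir, sa_version_dir(version),
        "updates_spamassassin_org", "50_scores.cf",
    )


print(sa_version_dir("3.4.0"))  # 3.004000
print(scores_file("/usr/pkg/var/spamassassin", "3.3.2"))
# /usr/pkg/var/spamassassin/3.003002/updates_spamassassin_org/50_scores.cf
```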

-- 
If everything seems under control, you're not going fast enough
- Mario Andretti
http://www.ChaosReigns.com


Re: Help tagging URL spam

2012-01-01 Thread darxus
body PILSPHARMNEW /pilspharmnew/
score PILSPHARMNEW 5
describe PILSPHARMNEW Body contains /pilspharmnew/.

Untested, let me know if it works, but that should do it.

On 01/01, Alex wrote:
 Hi,
 
 I'm having difficulty catching a series of spams with just a text
 component and a URL and hoped someone could help. I've included a few
 samples on pastebin here:
 
 http://pastebin.com/raw.php?i=1Y5QCkfh
 http://pastebin.com/raw.php?i=KdmZXM0d
 
 It only hits BAYES_50 usually, despite learning a few dozen of these
 over the last week. It also appears to originate from yahoo.com.
 
 Any ideas greatly appreciated.
 Thanks,
 Alex
 

-- 
Blessed are they who, in the face of death, think only about the
front sight.
http://www.ChaosReigns.com


Re: Help tagging URL spam

2012-01-01 Thread darxus
On 01/02, Alex wrote:
 What I haven't been able to figure out is a more generalized pattern
 from these, such as something in the header that is inconsistent with
 non-spam or contains some type of invalid header data, such as the
 mismatch between having originated at yahoo but being sent as
 sbcglobal?

Then you should provide a better variety of examples.

 Shouldn't have bayes picked this up after learning a dozen or more of these?

They're probably carefully crafted to avoid being caught by bayes.  And I
don't think SA's bayes even looks at headers.  And bayes definitely doesn't
do stuff like mismatching domains in headers.

-- 
Begin at the beginning and go on till you come to the end; then stop.
- Lewis Carroll, Alice in Wonderland
http://www.ChaosReigns.com


Re: Upgrade FuzzyOcr Plugin to 3.6.0

2011-12-27 Thread darxus
Is it wise to use FuzzyOCR at this point?  Its home page appears to be
http://fuzzyocr.own-hero.net/
That says:

  This project is UNMAINTAINED as of 2009-06-01. Use it at your own risk.
  If you want to fork this project, drop me a note
  (decoder[at]own-hero.net).


Also, it is highly recommended that you upgrade spamassassin to version
3.3.0 or newer.

On 12/21, eliasml wrote:
 
 Hello folks!,
 
 I have a box running freebsd with SpamAssassin and FuzzyOcr plugin, I have
 noted that it doen't work fine, the body/description of rules of FuzzyOcr is
 empty ever, I have googling and I have found that the SpamAssassin version
 3.2.4 not is compatible with the FuzzyOcr 3.4 and I must upgrade this
 version 3.6.0. I'd like know how can do it, somebody can I help me, please?
 
 Thanks in advance!!
 -- 
 View this message in context: 
 http://old.nabble.com/Upgrade-FuzzyOcr-Plugin-to-3.6.0-tp33015472p33015472.html
 Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
 

-- 
Rebellion to Tyrants is Obedience to God.  - Benjamin Franklin,
first version of the reverse side of the Great Seal of the United States.
http://www.ChaosReigns.com


Re: DNSWL was re-enabled

2011-12-26 Thread darxus
On 12/26, Karsten Bräckelmann wrote:
  score __RCVD_IN_DNSWL 0
 
 It is a non-scoring double underscore sub-rule. It does not have a
 score. It cannot have a score. Setting its score to zero does nothing,
 and certainly not prevent the DNS query.
 
 Instead, you need to meta out the rule, overwriting the rule definition.
 
 And frankly, disabling a rule by logically making it never hit is the
 better approach anyway. Just re-define rules to disable them:
 
   meta FOO  0

I asked about this on the dev list a week ago.  I guess I should've
cc'd you.  http://wiki.apache.org/spamassassin/DnsBlocklists says to
use the score method.  I went with that.

  That last one is really important, because without it, you'll still stop
  getting hits on the dnswl rules, but you'll still be sending queries to
  dnswl.  I'm hoping that'll get fixed.
 
 There is nothing to be fixed. There is no problem.

The problem is the potential for large sites to disable the rules but not
disable the queries, continuing to send millions of unused queries per day.  

-- 
Life is either a daring adventure or it is nothing at all.
- Helen Keller
http://www.ChaosReigns.com

