Re: help lowering score on a specific email list situation

2009-03-29 Thread LuKreme

On 28-Mar-2009, at 17:52, Martin Gregorie wrote:

On Sat, 2009-03-28 at 17:28 -0600, LuKreme wrote:

On 28-Mar-2009, at 15:32, RobertH wrote:

i have problems with the cabletv.org email list.


Why are you running SA over known list messages?


I'm a member of four lists that are not moderated and do not restrict
access to paid-up members of a parent organisation. Of the four, this
list is the only one that doesn't carry spam.


Hmmm... I subscribe to dozens of lists, not one of which is moderated  
or restricts access to paid-up members of anything and none of them  
get spam. Or if they do, they don't send it on to the list. I don't  
subscribe to any Yahoo groups though.



Is that a good enough reason for you?


Spam filtering of mailing lists is the job of the list owner. If a  
list is frequently spammed find another list with a competent admin  
running on decent software.



--
The real American folksong is a rag -- a mental jag
A rhythmic tone for the chronic blues



Re: Release information in email header and source

2009-03-29 Thread mouss
Karsten Bräckelmann wrote:
 On Sat, 2009-03-28 at 17:20 -0700, jdpnh wrote:
 For a long time I have been reviewing the header/source of spam that I
 received in my inbox.  The version/release of SpamAssassin was old - at
 least 10 releases.  I pointed this out to the customer service people and
 tech support folks at my ISP - NO RESPONSE.  I followed thru with a letter
 to the prez of the company.

 All of a sudden the header/source reflected the current release of
 SpamAssassin.

 The service people said that they have always been up to date with the
 current release/version - but they had forgotten to do something that it
 would be reflected in the header/source.  I'm a tech and really question
 this response.
 
 While it does sound strange and questionable -- there's pretty much no
 way to confirm its legitimacy or your suspicion. Even less so, without
 the actual headers, which at least give some hint about how SA is
 integrated in the mail flow.
 
 No matter how dumb, of course a later hop can rewrite the SA version
 header, if inserted (by SA rather than a glue) in the first place at
 all. Also, the headers could at least give a hint about the legitimacy
 of the claim of a second hop.
 

and in any case, if the results are satisfactory (high spam hit rate,
extremely low FP rate), then there is nothing more to ask for!

 [snip]


RE: help lowering score on a specific email list situation

2009-03-29 Thread RobertH
 


 From: LuKreme
 
 Why are you running SA over known list messages?
 

LuKreme,

good question.

we do it because i haven't yet decided on, developed, and implemented a way
to filter out the things i don't want to run through SA on inbound SMTP port
25.

it is easier for me to know everything is treated the same

others have mentioned whitelisting via spf etc which we do in some cases,
yet this one is unique in that it is hosted on a situation where SPAM flags
abound and such emails are generally rejected 100%

 - rh



RE: help lowering score on a specific email list situation

2009-03-29 Thread RobertH
 

 From: Evan Platt
 
 Isn't that a tad overkill?
 
 http://wiki.apache.org/spamassassin/RuleUpdates
 
 How often should I run sa-update?
 
 As often as you like. It typically depends on what time-frame 
 is comfortable for you, and how quickly channels are going to 
 be publishing updates. Generally speaking, once a day is a 
 good starting point.
 
 

Evan,

naw, hourly is just fine.

we update sought ruleset at the same time.

i spose i could change it, yet spam is not a once a day thing.

spam is all day every day, so hourly is the least i want to see things
updated.

sometimes i think it would be nice if we had a *come get* or push trigger
on the serving side for some types of update flags...

:-)

 - rh





update overkill (was: help lowering score on a specific email list situation)

2009-03-29 Thread Karsten Bräckelmann
  Isn't that a tad overkill?

It is. :)

  http://wiki.apache.org/spamassassin/RuleUpdates
  
  How often should I run sa-update?
  
  As often as you like. It typically depends on what time-frame 
  is comfortable for you, and how quickly channels are going to 
  be publishing updates. Generally speaking, once a day is a 
  good starting point.

 naw, hourly is just fine.
 we update sought ruleset at the same time.
 
 i spose i could change it, yet spam is not a once a day thing.

With the notable exception of the SOUGHT rule-set, there is no single
rule-set out there that's being updated even on a daily basis, let alone
hourly. Spam isn't once a day, and spam evolves quite fast -- though
still not even close to daily...

Even SOUGHT, which is being updated a few times a day, does *not*
require hourly updates. It isn't generated that often.


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: help lowering score on a specific email list situation

2009-03-29 Thread Karsten Bräckelmann
On Sat, 2009-03-28 at 23:27 +0100, Matus UHLAR - fantomas wrote:
   when did you sa-update for last time? afaik FH_HOST_EQ_* 
   rules were removed some time ago. Not that current rules 
   don't have some issues...
   
   And, of course, you have some rules unknown to me and clean 
   SA, are you sure those problems aren't caused by them?

They are the problem. See my earlier post dissecting the rules' hit. A
recent stock SA wouldn't have flagged it spam.

 On 28.03.09 15:14, RobertH wrote:
  we SA update hourly.
  
  unless i have something messed up from old directory or something
 
 I'd check for that.

Indeed. Either Robert is running some really old SA version, or updating
is plain broken on his machine.

Well, or he deliberately put those rules back in locally...





RE: help lowering score on a specific email list situation

2009-03-29 Thread RobertH

 
 Nope, you don't. You got a problem with your custom rules.
 
 
  here is what it is tripping on...
  
    0.7 FH_HOST_EQ_D_D_D_D   Host starts with d-d-d-d
    1.2 HOST_EQ_STATIC       HOST_EQ_STATIC
    0.7 FH_HOST_EQ_D_D_D_DB  Host is d-d-d-d
    1.3 HOST_EQ_CHARTER      HOST_EQ_CHARTER
 
 Neither of these is in stock SA 3.2.5, nor pulled by 
 sa-update for any 3.2.x version. Sorry, too lazy to check all 
 old and not-updated versions. Minus 3.9...
 
    1.9 TVD_RCVD_IP          TVD_RCVD_IP
    0.5 FROM_NOT_REPLYTO     From: does not match Reply-To:
 
 Not stock SA, and *does* happen frequently on lists. Local rule.
 
   -2.6 BAYES_00             BODY: Bayesian spam probability is 0 to 1%
                             [score: 0.]
    1.5 SAGREY               Adds 1.0 to spam from first-time senders
 
 Custom, third-party plugin. Use at your own risk. Explicitly 
 mentions in the description, to add 1.0 points -- raised 
 arbitrarily by you. Local rule, local problem.
 
 
  can someone help me formulate a good rule to reduce scoring.
 
 You do not need a good negative scoring rule (besides 
 proposals for rules already posted), you seriously need to 
 review your custom rules.
 
 According to your rules hit, stock SA merely would score 1.9 
 for the single TVD_RCVD_IP hit. Plus Bayes (which affects 
 this rule's score) and even subtracts significantly for you.
 
 
 1.9 -- this is a local problem with your custom rules.
 

Karsten,

thank you for your analysis...  :-)

i had forgotten about (not in a bad way) the use of some FVGT sets etc...

those rules help catch spam.

00_FVGT_File001.cf:
  Rule Name             Score   Ham     Spam     %of Ham   %of Spam
  -----------------------------------------------------------------
  FH_HOST_EQ_D_D_D_D    0.67    1505    9458     1.14%     7.31%
  FH_HOST_EQ_D_D_D_DB   0.69    663     6756     0.50%     5.22%

88_FVGT_headers.cf:
  Rule Name             Score   Ham     Spam     %of Ham   %of Spam
  -----------------------------------------------------------------
  HOST_EQ_CHARTER       1.29    42      61       0.03%     0.05%
  HOST_EQ_STATIC        1.17    157     2224     0.12%     1.72%

sagrey.cf:
  Rule Name             Score   Ham     Spam     %of Ham   %of Spam
  -----------------------------------------------------------------
  SAGREY                1.50    0       111668   0.00%     86.33%

SAGREY on a daily basis is more like 90 to 93 percent. i ran the simple
analysis script against a longer period of time and there have been some
minor changes in between.

yet... thanks for pointing this all out. i just grepped the rules against
that directory and gained some extra enlightenment.

regardless, the original question stands, and i thank all of you for your
advice.

i have applied the necessary fix and things are just fine.

everyone's help has been fantastic. thank you!

:-)

 - rh 



RE: help lowering score on a specific email list situation

2009-03-29 Thread RobertH

 
 Indeed. Either Robert is running some really old SA version, 
 or updating is plain broken on his machine.
 
 Well, or he deliberately put those rules back in locally...

i believe i have checked all the rules.

we run 3.2.5

most of the rules were addons.

here is

[r...@ac updates_spamassassin_org]# pwd

/var/lib/spamassassin/3.002005/updates_spamassassin_org

[r...@ac updates_spamassassin_org]# grep TVD_RCVD_IP *

50_scores.cf:score TVD_RCVD_IP 0.502 1.617 2.270 1.931 # n=2
50_scores.cf:score TVD_RCVD_IP4 4.099 3.344 2.901 3.183 # n=2
72_active.cf:##{ TVD_RCVD_IP
72_active.cf:header TVD_RCVD_IP  Received =~ /^from\s+(?:\d+[^0-9a-zA-Z\s]){3}\d+[.\s]/
72_active.cf:##} TVD_RCVD_IP
72_active.cf:##{ TVD_RCVD_IP4
72_active.cf:header TVD_RCVD_IP4 Received =~ /^from\s+(?:\d+\.){3}\d+\s/
72_active.cf:##} TVD_RCVD_IP4

do you see something broke below here?

[9511] dbg: gpg: adding key id 6C6191E3
[9511] dbg: gpg: Searching for 'gpg'
[9511] dbg: util: current PATH is: /usr/kerberos/sbin:/usr/kerberos/bin:/usr/lib/ccache/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/root/bin:/root/logging
[9511] dbg: util: executable for gpg was found at /usr/bin/gpg
[9511] dbg: gpg: found /usr/bin/gpg
[9511] dbg: gpg: release trusted key id list: 5E541DC959CB8BAC7C78DFDC4056A61A5244EC45 26C900A46DD40CD5AD24F6D7DEE01987265FA05B 0C2B1D7175B852C64B3CDC716C55397824F434CE 6C6191E3
[9511] dbg: channel: reading in channelfile /etc/mail/sa-update.conf
[9511] dbg: channel: adding updates.spamassassin.org
[9511] dbg: channel: attempting channel updates.spamassassin.org
[9511] dbg: channel: update directory /var/lib/spamassassin/3.002005/updates_spamassassin_org
[9511] dbg: channel: channel cf file /var/lib/spamassassin/3.002005/updates_spamassassin_org.cf
[9511] dbg: channel: channel pre file /var/lib/spamassassin/3.002005/updates_spamassassin_org.pre
[9511] dbg: channel: metadata version = 752903
[9511] dbg: dns: 5.2.3.updates.spamassassin.org = 752903, parsed as 752903
[9511] dbg: channel: current version is 752903, new version is 752903, skipping channel
[9511] dbg: diag: updates complete, exiting with code 1

TIA

 - rh
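
A note on the last debug line in the message above ("exiting with code 1"): as far as I know, that is not a failure. The sa-update documentation describes exit code 0 as "update installed", 1 as "no fresh updates", and higher codes as errors. A minimal wrapper sketch along those lines follows; the function name `run_update` and the idea of reloading spamd only on 0 are assumptions for illustration, not something the posters describe.

```shell
#!/bin/sh
# Sketch: classify the exit status of an sa-update run.
# Usage: run_update sa-update
# (any other command can be passed instead, which makes this easy to try out)
run_update() {
    status=0
    "$@" || status=$?
    case "$status" in
        0) echo "updated" ;;        # new rules installed; a spamd reload would go here
        1) echo "no-update" ;;      # channel already current, nothing to do
        *) echo "error:$status" ;;  # anything else is treated as a failure
    esac
}
```

With this, a cron job would only need to act when the function prints "updated", and could stay silent on "no-update".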



RE: help lowering score on a specific email list situation

2009-03-29 Thread Karsten Bräckelmann
On Sun, 2009-03-29 at 09:20 -0700, RobertH wrote:
  
  Indeed. Either Robert is running some really old SA version, 
  or updating is plain broken on his machine.
  
  Well, or he deliberately put those rules back in locally...

The latter -- according to the other sub-thread all these rules are
clearly in custom or third-party cf files. There is no evidence of stale
files or a broken update.

 i believe i have checked all the rules.
 
 we run 3.2.5
 most of the rules were addons.

Yup. All of them but TVD_RCVD_IP. :)





Re: update overkill (was: help lowering score on a specific email list situation)

2009-03-29 Thread Justin Mason
on the other hand, the sa-update architecture can cope with it just fine. ;)

2009/3/29 Karsten Bräckelmann guent...@rudersport.de:
  Isn't that a tad overkill?

 It is. :)

  http://wiki.apache.org/spamassassin/RuleUpdates
 
  How often should I run sa-update?
 
  As often as you like. It typically depends on what time-frame
  is comfortable for you, and how quickly channels are going to
  be publishing updates. Generally speaking, once a day is a
  good starting point.

 naw, hourly is just fine.
 we update sought ruleset at the same time.

 i spose i could change it, yet spam is not a once a day thing.

 With the notable exception of the SOUGHT rule-set, there is no single
 rule-set out there that's being updated even on a daily basis, let alone
 hourly. Spam isn't once a day, and spam evolves quite fast -- though
 still not even close to daily...

 Even SOUGHT, which is being updated a few times a day, does *not*
 require hourly updates. It isn't generated that often.






Re: update overkill (was: help lowering score on a specific email list situation)

2009-03-29 Thread Karsten Bräckelmann
On Sun, 2009-03-29 at 18:14 +0100, Justin Mason wrote:
 on the other hand, the sa-update architecture can cope with it just fine. ;)

Heh, true. And he could run sa-update even more frequently. After all,
the DNS answer is cached for an hour... ;)

The real impact isn't the DNS query, but whenever an update has been
pushed. If everyone would check once an hour, the full load would have
to be shouldered in 60 minutes, as opposed to evenly distributed about,
say, a day...

It's the same classic problem with uninspired admins, running such cron
jobs strictly at a full hour.


  With the notable exception of the SOUGHT rule-set, there is no single
  rule-set out there that's being updated even on a daily basis, let alone
  hourly. Spam isn't once a day, and spam evolves quite fast -- though
  still not even close to daily...
 
  Even SOUGHT, which is being updated a few times a day, does *not*
  require hourly updates. It isn't generated that often.




Re: update overkill

2009-03-29 Thread mouss
Karsten Bräckelmann wrote:
 On Sun, 2009-03-29 at 18:14 +0100, Justin Mason wrote:
 on the other hand, the sa-update architecture can cope with it just fine. ;)
 
 Heh, true. And he could run sa-update even more frequently. After all,
 the DNS answer is cached for an hour... ;)
 
 The real impact isn't the DNS query, but whenever an update has been
 pushed. If everyone would check once an hour, the full load would have
 to be shouldered in 60 minutes, as opposed to evenly distributed about,
 say, a day...
 
 It's the same classic problem with uninspired admins, running such cron
 jobs strictly at a full hour.
 

In most cases, it's not the admin's fault. Many systems allow adding cron
jobs by simply putting a file in a /some/path/hourly and so on instead
of editing /etc/crontab (or running the crontab command). This is nice
(especially for packages when editing files is problematic), but on
the other hand it doesn't provide flexibility for tasks such as downloading
data from a (more or less) central place.

I don't know what the problem is, but P2P may be the answer ;-p



Re: update overkill

2009-03-29 Thread Karsten Bräckelmann
On Sun, 2009-03-29 at 20:44 +0200, mouss wrote:
 Karsten Bräckelmann wrote:

  It's the same classic problem with uninspired admins, running such cron
  jobs strictly at a full hour.
 
 In most cases, it's not the admin's fault. Many systems allow adding cron
 jobs by simply putting a file in a /some/path/hourly and so on instead
 of editing /etc/crontab (or running the crontab command). This is nice
 (especially for packages when editing files is problematic), but on

Don't they also provide an /etc/cron.d/ to dump a single cron job per
file including the ability to specify the full run time? Mine do. ;)

 the other hand it doesn't provide flexibility for tasks such as downloading
 data from a (more or less) central place.

Randomization isn't particularly hard.
  sleep $(( $RANDOM % 3600 ))

The debian daily spamassassin update cron job does exactly this, FWIW.
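
One caveat with the one-liner above: `$RANDOM` is a bash feature, and cron normally runs jobs with /bin/sh, which is not bash on every system. A minimal sketch of the same idea that avoids `$RANDOM` by reading two bytes from /dev/urandom is below; the function name `random_delay` and the script path in the comment are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: print a random delay in seconds, 0 <= delay < max (default 3600),
# without relying on the bash-only $RANDOM variable.
random_delay() {
    max=${1:-3600}
    # Read two random bytes as one unsigned 16-bit number (0..65535).
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $(( n % max ))
}

# Hypothetical use inside a script dropped into /etc/cron.d or cron.hourly:
#   sleep "$(random_delay 3600)"
#   sa-update
```

The modulo introduces a slight bias toward small values, which should not matter for spreading load.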





RE: update overkill (was: help lowering score on a specific email list situation)

2009-03-29 Thread RobertH
 

 From: Karsten Bräckelmann
 Heh, true. And he could run sa-update even more frequently. 
 After all, the DNS answer is cached for an hour... ;)
 
 The real impact isn't the DNS query, but whenever an update 
 has been pushed. If everyone would check once an hour, the 
 full load would have to be shouldered in 60 minutes, as 
 opposed to evenly distributed about, say, a day...
 
 It's the same classic problem with uninspired admins, running 
 such cron jobs strictly at a full hour.
 

karsten,

why not change the wiki to be far less ambiguous on this issue, and put out
a specific decree to *all* admins in these regards ?

if there is a more specific better way to deal with it, change it and make
it better and let us all know how it must be done.

anyone can find fault with anything given enough opportunity.

thank you for humbling my day.

 - rh



RE: help lowering score on a specific email list situation

2009-03-29 Thread Evan Platt

At 08:19 AM 3/29/2009, you wrote:


Evan,

naw, hourly is just fine.

we update sought ruleset at the same time.

i spose i could change it, yet spam is not a once a day thing.

spam is all day every day, so hourly is the least i want to see things
updated.


But you're not seeing things updated hourly.

Seriously - look at the times you update, and see if over the past 
say month, do you REALLY see anything updated 24 times in a day? Likely not.


Look at when you see updates, and go with that.

Think of it like this - if I know the mailman generally comes at 
2:30, but sometimes (once every few weeks) comes at 11:00 AM, am I 
going to go out to the mailbox every day at 11:00, or 2:30? I'll go 
at 2:30. And if say I go once at 11:00, and then he's not there, do I 
go again at 12? Then 1? No.



sometimes i think it would be nice if we had a *come get* or push trigger
on the serving side for some types of update flags...



Probably not going to happen for the reason of overloading servers - 
which is also why they ask you to check once a day - so the servers 
aren't overloaded.


Checking once an hour is obscene.



RE: update overkill

2009-03-29 Thread RobertH
 


 Mouss wrote:
 In most cases, it's not the admin's fault. Many systems allow 
 adding cron jobs by simply putting a file in a 
 /some/path/hourly and so on instead of editing /etc/crontab 
 (or running the crontab command). This is nice (especially 
 for packages when editing files is problematic), but on the 
 other hand it doesn't provide flexibility for tasks such as 
 downloading data from a (more or less) central place.
 
 I don't know what the problem is, but P2P may be the answer ;-p
 
 

mouss

the P2P solution part is FUNNY!   ;-) (you have excellent sense of humor)

i chose it that way for specific reasons.

see previous email to K

;-)

 - rh



Re: update overkill

2009-03-29 Thread mouss
Karsten Bräckelmann wrote:
 On Sun, 2009-03-29 at 20:44 +0200, mouss wrote:
 Karsten Bräckelmann wrote:
 
 It's the same classic problem with uninspired admins, running such cron
 jobs strictly at a full hour.
 In most cases, it's not the admin's fault. Many systems allow adding cron
 jobs by simply putting a file in a /some/path/hourly and so on instead
 of editing /etc/crontab (or running the crontab command). This is nice
 (especially for packages when editing files is problematic), but on
 
 Don't they also provide an /etc/cron.d/ to dump a single cron job per
 file including the ability to specify the full run time? Mine do. ;)
 

debian/* and redhat/* do. Other linux distros probably have this cron.d
too.  but even then, people are naturally tempted to use the easy way
(no cron format to care about).

 the other hand it doesn't provide flexibility for tasks such as downloading
 data from a (more or less) central place.
 
 Randomization isn't particularly hard.
   sleep $(( $RANDOM % 3600 ))
 
 The debian daily spamassassin update cron job does exactly this, FWIW.
 

does it allow for some control? I mean, I want it random, but during a
specific interval (I would prefer not to sa-compile while the box is
under load...).
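
One way to get "random, but only within a quiet window" is to let cron start the job at the beginning of the window and then sleep for a random offset no longer than the window itself. The sketch below assumes whole-hour boundaries; the function name `window_delay` and the 02:00 to 05:00 window are made up for illustration, and this is not a description of what the Debian job does.

```shell
#!/bin/sh
# Sketch: pick a random delay that keeps the start of a job inside a window.
# Usage: window_delay START_HOUR END_HOUR   (whole hours, END > START)
window_delay() {
    span=$(( ($2 - $1) * 3600 ))
    # Four random bytes as one unsigned number, so long windows are covered.
    n=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
    echo $(( n % span ))
}

# Hypothetical script started by cron at 02:00 for a 02:00-05:00 window:
#   sleep "$(window_delay 2 5)"
#   sa-update && sa-compile    # compile only when new rules were installed
```

The delay only bounds when the job starts; leaving some margin at the end of the window for the run itself is probably wise.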



RE: help lowering score on a specific email list situation

2009-03-29 Thread RobertH
 

 
 Checking once an hour is obscene.
 
 

Evan,

dude, shut up and mind your own business. (and i mean that in the most
constructive manner)

you dont know me, you do not admin this business, and we are not stupid and
have been doing this for longer than many on this list have been alive.

if you cannot be constructive, get off the list

there was a reason it was done this way and things have changed since that
time and can be modified easily.

if i come to the list with my hat in my hand asking for help, please know
that i am willing to make changes or i wouldnt ask questions in the first
place.

 - rh



RE: update overkill (was: help lowering score on a specific email list situation)

2009-03-29 Thread Karsten Bräckelmann
On Sun, 2009-03-29 at 13:05 -0700, RobertH wrote:
  From: Karsten Bräckelmann

  The real impact isn't the DNS query, but whenever an update 
  has been pushed. If everyone would check once an hour, the 
  full load would have to be shouldered in 60 minutes, as 
  opposed to evenly distributed about, say, a day...
  
  It's the same classic problem with uninspired admins, running 
  such cron jobs strictly at a full hour.
 
 why not change the wiki to be far less ambiguous on this issue, and put out
 a specific decree to *all* admins in these regards ?
 
 if there is a more specific better way to deal with it, change it and make
 it better and let us all know how it must be done.
 
 anyone can find fault with anything given enough opportunity.
 
 thank you for humbling my day.

Oh, come on, Robert -- I didn't say your way is abusive, just overkill.

The most part of this discussion isn't specific to you, nor SA. It's a
well-known, general problem when running update services. It isn't meant
to be a decree either, it's partly my opinion, partly best-practices.

You shouldn't take it personally.





Re: help lowering score on a specific email list situation

2009-03-29 Thread mouss
Evan Platt wrote:
 At 08:19 AM 3/29/2009, you wrote:
 
 Evan,

 naw, hourly is just fine.

 we update sought ruleset at the same time.

 i spose i could change it, yet spam is not a once a day thing.

 spam is all day every day, so hourly is the least i want to see things
 updated.
 
 But you're not seeing things updated hourly.
 
 Seriously - look at the times you update, and see if over the past say
 month, do you REALLY see anything updated 24 times in a day? Likely not.
 
 Look at when you see updates, and go with that.
 
 Think of it like this - if I know the mailman generally comes at 2:30,
 but sometimes (once every few weeks) comes at 11:00 AM, am I going to go
 out to the mailbox every day at 11:00, or 2:30? I'll go at 2:30. And if
 say I go once at 11:00, and then he's not there, do I go again at 12?
 Then 1? No.
 
 sometimes i think it would be nice if we had a *come get* or push
 trigger
 on the serving side for some types of update flags...
 
 
 Probably not going to happen for the reason of overloading servers -
 which is also why they ask you to check once a day - so the servers
 aren't overloaded.
 
 Checking once an hour is obscene.
 

This is exaggerated.

if a channel isn't updated often, it is enough to set a high TTL. This
way, probes will not go beyond the local cache.


RE: update overkill (was: help lowering score on a specific email list situation)

2009-03-29 Thread RobertH

 
 Oh, come on, Robert -- I didn't say your way is abusive, just 
 overkill.
 
 The most part of this discussion isn't specific to you, nor 
 SA. It's a well-known, general problem when running update 
 services. It isn't meant to be a decree either, it's partly 
 my opinion, partly best-practices.
 
 You shouldn't take it personally.
 
 

not taking it personally.  :-)  everyone please read. maybe you can shed
light on what i am experiencing that is detailed towards the bottom.



I really believe the wiki should be modified and have more info in regards
to making decisions. like not less than such and such hours and typically
not more than such and such day.

ALSO :-) 

i was *genuinely* thanking you for humbling my day.

short long story.

what i have been going through is, many people like us that work in telecom,
networks, computers, etc... well we all think we know how to setup things
just so, and it has to be *perfect* and well, i know i dont know it all.

YET, communicating that is hard to people that are *not* in our respective
fields.

fortunately, i have been having a really hard time explaining things to lay
people lately and it is just **kicking** my rear.

you know, kinda like if you have to do tech support and are responsible for
millions of people and say they are all family you know and love dearly.

i wont bring the associated Bible items into it on list, yet sincerely, i
appreciate a respectful humbling.

again, thank you all for helping me!

yes, even you Evan. (apologies)

  :-)

 - rh



RE: help lowering score on a specific email list situation

2009-03-29 Thread Evan Platt

At 01:22 PM 3/29/2009, you wrote:

dude, shut up and mind your own business. (and i mean that in the most
constructive manner)


You come to a list asking for help, you make whatever you state your 
own business.


But with your attitude you aren't going to get anymore help from me.


you dont know me, you do not admin this business, and we are not stupid and
have been doing this for longer than many on this list have been alive.


Yeah yeah, you've been running anti-spam servers from before the 
internet was around. Sure.



if you cannot be constructive, get off the list


I've tried to be constructive. If you cannot accept constructive, get 
off the list.




there was a reason it was done this way and things have changed since that
time and can be modified easily.

if i come to the list with my hat in my hand asking for help, please know
that i am willing to make changes or i wouldnt ask questions in the first
place.


Apparently you aren't willing to make changes.

No more help from me for you. Best of luck.



sa-update: determining last run

2009-03-29 Thread Dennis G German
 sa-update

mkdir /etc/mail: Permission denied at /usr/bin/sa-update line 1226

 

There is no /etc/mail directory available. (I believe the /etc directory I
can view is artificial)

I cannot make a mail directory. 

I suspect this is another cPanel (shared host) problem.

 

Is there a way I can determine when sa-update was last run?

Thanks

 

sa-update -D
[19204] dbg: logger: adding facilities: all
[19204] dbg: logger: logging level is DBG
[19204] dbg: generic: SpamAssassin version 3.2.4
[19204] dbg: config: score set 0 chosen.
[19204] dbg: dns: is Net::DNS::Resolver available? yes
[19204] dbg: dns: Net::DNS version: 0.65
[19204] dbg: generic: sa-update version svn607589
. 
[19204] dbg: gpg: Searching for 'gpg'
[19204] dbg: util: current PATH is: /home/realger1/.bin:/usr/kerberos/bin:/usr/lib/courier-imap/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/usr/libexec
[19204] dbg: util: executable for gpg was found at /usr/bin/gpg
[19204] dbg: gpg: found /usr/bin/gpg
[19204] dbg: gpg: importing default keyring to /etc/mail/spamassassin/sa-update-keys
mkdir /etc/mail: Permission denied at /usr/bin/sa-update line 1226
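
As far as I know, sa-update does not record when it was last run. The closest proxy is the modification time of the channel files in the update directory, which changes only when an update was actually installed. A minimal sketch is below; the function name is hypothetical, and the example directory is taken from the 3.2.5 debug log earlier in this digest, so the location on a shared host may well differ.

```shell
#!/bin/sh
# Sketch: show when the rule files of each update channel last changed.
# Usage: last_rule_update UPDATE_DIR
last_rule_update() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no update directory at $dir" >&2
        return 1
    fi
    # Newest first. The timestamp is the last *installed* update,
    # not the last time sa-update was run.
    ls -lt "$dir"
}

# Example, using the layout from the 3.2.5 debug log (an assumption for other hosts):
#   last_rule_update /var/lib/spamassassin/3.002005
```

For the permission error itself, sa-update has `--gpghomedir` and `--updatedir` options that can point at directories the user may write to; whether a shared host would then pick up rules from there is a separate question.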



Re: Still getting spam from yahoo/google groups

2009-03-29 Thread Arvid Ephraim Picciani

Where can i paste the raw header? pastebin triggers it as spam



there is more than one pastebin.  just like there is more than one OS.
try:
http://rafb.net/paste/
http://codepad.org/
http://paste.nn-d.de/
http://www.copypaste.at/
http://paste.uni.cc/

etc etc




__ Information from ESET NOD32 Antivirus, version of virus 
signature database 3968 (20090327) __


The message was checked by ESET NOD32 Antivirus.

http://www.eset.com

--
This message has been scanned for viruses and
dangerous content by *MailScanner* http://www.mailscanner.info/, and is
believed to be clean.



can you please remove those?  no one cares about your _outgoing_ AV.






Re: How long does it take to install SA?

2009-03-29 Thread Arvid Ephraim Picciani



Single-user, vanilla install with two exceptions: the install will check our
two whitelists and give a pass (-100) to any of our clients so we don't
bounce their mail.


I hope you're not actually considering bouncing spam. That statement 
sounds like it.

Either reject them at SMTP time or silently delete them.

Other than that, a whitelist sounds very simple to me:
http://wiki.apache.org/spamassassin/ManualWhitelist
unless you want to query a database, of course.

i'm using a check in my exim router to let some customers' customers 
bypass SA completely.





spamassassin: Determining last sa-update

2009-03-29 Thread Dennis German

I believe this is another cPanel issue.
Attempting to run sa-update displays:
   mkdir /etc/mail: Permission denied at /usr/bin/sa-update line 1226

How can I determine the last time sa-update was run?



sa-update when was last run?

2009-03-29 Thread Dennis German

I believe this is another cPanel issue.
Attempting to run sa-update displays:
   mkdir /etc/mail: Permission denied at /usr/bin/sa-update line 1226

How can I determine the last time sa-update was run?



[no subject]

2009-03-29 Thread jcputter
Can spamassassin miss hits or rules if it is running on a slow machine?


Windows Live Spam

2009-03-29 Thread jcputter
Hi i am getting spam from windows live accounts, spamassassin shows no hits

sometimes it comes from live spaces; i have a rule to stop that, but others 
pass.. please help


RE: lookup user_prefs in SQL database (not using spamc)

2009-03-29 Thread Guido Leisker
 Have you restarted amavisd-new since you added the @lookup_sql_dsn?
Yes, I did.
What I tried for example is to add a score -111 to GTUBE manually in the
local.cf. That does work. 
I think that makes sure that amavis(using SA) uses the latest local.cf,
right?

BTW: spamassassin should not try to search for user specific
settings in user's home directories. Not at all. How can I do
that?
   Amavisd-new will not look at the user's home directories.
  No, but SA does.
 
 But amavisd-new doesn't call SpamAssassin as an external.  
 It opens the
 perl libraries and runs the same scoring code-base.  It behaves
 differently than the spamc client...

What does that mean regarding my problem?
(Okay, I know that amavisd-new uses the libs directly -- rather than
the spamassassin command or spamc. I just mentioned spamc to prove that
db settings etc. should be correct.)
I was quite sure that I have seen SA browsing the home folder (debug
log). But I'll doublecheck that.


Thank you

Guido


Re: New kind of spam

2009-03-29 Thread Arvid Ephraim Picciani

http://codepad.org/W53onqK9

I gave up on this kind of spam. It's impossible to train Bayes on it, and it 
changes too fast to make custom rules. Matching senders doesn't work either, 
because those are sent using live.com, gmail, sourceforge, etc.




Re: sa-update when was last run?

2009-03-29 Thread Justin Mason
oops.  Sorry for the multiple mails, folks -- my list moderation
mistake.

--j.

On Sun, Mar 29, 2009 at 01:31, Dennis German
dger...@real-world-systems.com wrote:
 I believe this is another cPanel issue.
 Attempting to run sa-update displays:
    mkdir /etc/mail: Permission denied at /usr/bin/sa-update line 1226
 How can I determine that last time sa-update was run?



google group spam

2009-03-29 Thread JC Putter
Hi, I am using this rule to catch spam with a Google Groups link:

uri  __GOOGLEGROUPS_15  m'http://[^.]{15}\.googlegroups\.com'i
meta NN_GOOGLEGROUPS_15 __GOOGLEGROUPS_15  __GOOGLEGROUPS_NUM
describe NN_GOOGLEGROUPS_15  Contains a suspicious googlegroups URI.
score    NN_GOOGLEGROUPS_15 2

But now I am getting a new type that the rule doesn't catch: 
http://groups.google.com/group/

can someone please help me write a rule for this link?





Re: Windows Live Spam

2009-03-29 Thread Sahil Tandon
On Wed, 25 Mar 2009, jcput...@mail.centreweb.co.za wrote:

 Hi i am getting spam from windows live accounts, spamassassin shows no hits
 
 something it comes from live spaces, i have a rule to stop that but other 
 pass.. please help

Sorry, the crystal ball is out of order.  Would you be so kind as to post an
unmodified copy of the spammy message with full headers?  Don't paste here --
put it on a pastebin.

-- 
Sahil Tandon sa...@tandon.net


user-db size, content confusions (how many toks?)

2009-03-29 Thread Linda Walsh


I see 3 DB's in my user directory (.spamassassin).

auto-whitelist  (~80MB)
bayes_seen  (~40MB)
bayes_toks  (~20MB)

I was trying to find the relation of 'bayes_expiry_max_db_size' to the physical
size of the above files.  While I'm finding some answers, I've also run into
some seeming contradictions.  I had db_size set to 500,000, then reduced it to
250,000 and to the 'default' (150,000) during testing.

In trying to lower 'db_size' and see how that affected physical sizes,
I ran sa-learn --force-expire and saw these debug messages of note:

[30905] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[30905] dbg: bayes: token count: 0, final goal reduction size: -112500
[30905] dbg: bayes: reduction goal of -112500 is under 1,000 tokens, skipping expire
[30905] dbg: bayes: expiry completed
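
The arithmetic behind these debug lines can be reproduced with a small Python sketch. It is only a reading of the log output above, not SpamAssassin's actual code: the 0.75 keep fraction and the 1,000-token threshold are taken from the messages themselves, the function name is hypothetical, and 491743 is the ntokens value that sa-learn --dump magic reports later in this message.

```python
def expiry_goal(token_count, max_db_size, keep_fraction=0.75, min_reduction=1000):
    """Mirror the numbers in the debug log: how many tokens to keep,
    how many to remove, and whether an expire run would proceed."""
    keep_size = int(max_db_size * keep_fraction)
    goal = token_count - keep_size
    # The log reports "skipping expire" when the goal is under 1,000 tokens.
    return keep_size, goal, goal >= min_reduction


# What the debug run saw: a token count of 0 with the default limit of 150,000.
print(expiry_goal(0, 150000))

# What the same limit would give if the token count were 491743.
print(expiry_goal(491743, 150000))
```

With a token count of 0 the goal is negative, so the expire is skipped no matter how large the database file is on disk.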

---
First problem (contradiction): the debug output above says token count: 0.
(This is with a combined bayes db size of 60MB (_seen, _toks).)

It seems to think I have no bayes data.  I saw another dbg msg that indicated
the bayes classifier was untrained (~150? entries) and disabled.

I don't know how it got zeroed, but I tried adding 'ham' by running sa-learn
over a despam'ed mailbox of mine.  The first run showed:

Learned tokens from 55 message(s) (55 message(s) examined)

But subsequent runs of sa-learn with dbg+expire still show token count: 0.

sa-learn --dump magic shows something different:
0.000          0          3          0  non-token data: bayes db version
0.000          0     556414          0  non-token data: nspam
0.000          0     574441          0  non-token data: nham
0.000          0     491743          0  non-token data: ntokens
0.000          0 1216456288          0  non-token data: oldest atime
0.000          0 1237796146          0  non-token data: newest atime
0.000          0 1220476831          0  non-token data: last journal sync atime
0.000          0 1217838535          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0      70612          0  non-token data: last expire reduction count
-
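
To make the dump easier to read, here is a minimal Python sketch that pulls the third column out of each "non-token data" line and converts the atime fields to dates. It assumes the atime values are Unix epoch seconds and the delta is in seconds; the function name and the trimmed sample text are illustrative only.

```python
from datetime import datetime, timezone

SAMPLE = """\
0.000  0 491743  0  non-token data: ntokens
0.000  0 1216456288  0  non-token data: oldest atime
0.000  0 1237796146  0  non-token data: newest atime
0.000  0 1382400  0  non-token data: last expire atime delta
"""

PREFIX = "non-token data: "


def parse_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith(PREFIX):
            values[fields[4][len(PREFIX):].strip()] = int(fields[2])
    return values


magic = parse_magic(SAMPLE)
print("tokens:", magic["ntokens"])
for label in ("oldest atime", "newest atime"):
    stamp = datetime.fromtimestamp(magic[label], tz=timezone.utc)
    print(label, "=", stamp.isoformat())
print("expire delta in days:", magic["last expire atime delta"] / 86400)
```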

Does the above indicate 0 tokens?  I.e., doesn't 'ntokens' = 491743 mean
slightly under 500K tokens (my original limit before trying to run
'sa-learn --force-expire' with debug manually)?


It's as if 'sa-learn --dump magic' shows a db corresponding to my old limit
(which I think is still being auto-expired, so it might not have been pruned
to the new figure yet, as auto-expire runs about once per 24 hours, if I
understand normal spamd workings).

So is the --dump magic output maybe what is seen and being size-controlled by
auto-expire (it was ~500K before the recent test changes)?

Why isn't 'sa-learn --force-expire' seeing the tokens indicated in
sa-learn --dump magic?  Debug messages are pointing at the same file
for both operations, so how can --dump magic indicate 500K while the
debug output of sa-learn --force-expire somehow sees 0 tokens?

Am I misinterpreting the debug output?

Thanks,
Linda





Re: google group spam

2009-03-29 Thread LuKreme

On 29-Mar-2009, at 16:42, JC Putter wrote:

uri  __GOOGLEGROUPS_15  m'http://[^.]{15}\.googlegroups\.com'i
meta NN_GOOGLEGROUPS_15 __GOOGLEGROUPS_15  __GOOGLEGROUPS_NUM
describe NN_GOOGLEGROUPS_15  Contains a suspicious googlegroups URI.
score    NN_GOOGLEGROUPS_15 2

but now i am getting a new type of one which the rules doesnt catch 
http://groups.google.com/group/

can someone please help me write a rule for this link?


uri  __GOOGLEGROUPS_15  m'http://groups\.google\.com\/group\/'i

I dunno what the {15} was meant to accomplish (why 15 characters  
specifically?  14 is not suspicious? 37 is not suspicious either?),  
but that will match any google groups link in the form you posted.
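
To see what each pattern actually matches, the two expressions can be tried outside SpamAssassin. This is a rough Python sketch, assuming Python's re module behaves like Perl for these simple patterns and that uri rules match anywhere in the URI; the sample URIs and the function name are made up for illustration.

```python
import re

# The original rule: exactly 15 non-dot characters before .googlegroups.com
OLD_RULE = re.compile(r"http://[^.]{15}\.googlegroups\.com", re.IGNORECASE)
# The suggested rule: any groups.google.com/group/ link
NEW_RULE = re.compile(r"http://groups\.google\.com/group/", re.IGNORECASE)


def matching_rules(uri):
    """Return the names of the patterns that match somewhere in the URI."""
    hits = []
    if OLD_RULE.search(uri):
        hits.append("old")
    if NEW_RULE.search(uri):
        hits.append("new")
    return hits


# Hypothetical sample URIs, not taken from real spam.
for uri in (
    "http://abcdefghijklmno.googlegroups.com/web/page.html",  # 15-char name
    "http://shortname.googlegroups.com/web/page.html",        # 9-char name
    "http://groups.google.com/group/example-group",
):
    print(uri, matching_rules(uri))
```

Note that the second pattern matches every link of that form, legitimate or not, so a low score is probably the safer choice.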



--
The Piper's calling you to join him



Re: your mail

2009-03-29 Thread John Hardin

On Wed, 25 Mar 2009, jcput...@mail.centreweb.co.za wrote:


Can spamassassin miss hits or rules if it is running on a slow machine?


No, but a message may skip SA if SA is overloaded due to running on a slow 
machine. You need to take into account your email volume.


The most important thing to look at is whether you are hitting swap. When 
you start hitting swap performance goes _way_ down, per-message scan times 
go up, and the likelihood of the delivery system timing out when 
attempting to pass a message to SA for scanning increases, leading to 
unscanned emails.
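
A quick way to check for swap use on a Linux host is to read /proc/meminfo. The sketch below is a minimal Python example under that assumption (other systems expose this differently); the function name and the sample text are illustrative. Swap being in use does not prove the machine is actively swapping during scans, so treat the number as a hint only.

```python
def swap_used_kib(meminfo_text):
    """Return SwapTotal minus SwapFree (in kB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


# Made-up sample in the /proc/meminfo layout; on a real Linux host you would
# pass in the contents of /proc/meminfo instead.
SAMPLE = """\
MemTotal:        2048000 kB
MemFree:          102400 kB
SwapTotal:       1048576 kB
SwapFree:         786432 kB
"""

print("swap in use (kB):", swap_used_kib(SAMPLE))
```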


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  It's easy to be noble with other people's money.
   -- John McKay, _The Welfare State:
  No Mercy for the Middle Class_
---
 3 days until April Fools' day


Re: Windows Live Spam

2009-03-29 Thread LuKreme

On 24-Mar-2009, at 18:47, jcput...@mail.centreweb.co.za wrote:
Hi i am getting spam from windows live accounts, spamassassin shows  
no hits



We just discussed this and someone posted:
#spaces.live.com spam
uri  URI_LIVEDOTCOM /\bspaces\.live\.com\b/i
score    URI_LIVEDOTCOM #
describe URI_LIVEDOTCOM Contains link to spaces.live.com

I set # to 3.5 on my own mail, and 1.0 in local.cf, iirc.
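
For anyone unsure what the \b anchors in that pattern do, here is a small Python sketch. It assumes Python's re treats \b the same way Perl does for this expression, and the sample URIs are invented for illustration.

```python
import re

LIVE_SPACES = re.compile(r"\bspaces\.live\.com\b", re.IGNORECASE)


def hits_live_spaces(uri):
    """True if the pattern from the URI_LIVEDOTCOM rule matches the URI."""
    return LIVE_SPACES.search(uri) is not None


# Hypothetical URIs showing where the word boundaries matter.
for uri in (
    "http://someone.spaces.live.com/blog/",   # subdomain: matches
    "http://myspaces.live.com/",              # no boundary before "spaces"
    "http://spaces.live.community/",          # no boundary after "com"
):
    print(uri, hits_live_spaces(uri))
```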

--
I wrote this song two hours before we met.  I didn't know your
name, or what you looked like yet




Re: Windows Live Spam

2009-03-29 Thread John Hardin

On Wed, 25 Mar 2009, jcput...@mail.centreweb.co.za wrote:

Hi i am getting spam from windows live accounts, spamassassin shows no 
hits


something it comes from live spaces, i have a rule to stop that but 
other pass.. please help


Search the SA list archive for "uri spaces live com". There have been 
several rules posted.




Re: user-db size, content confusions (how many toks?)

2009-03-29 Thread Matt Kettler
Linda Walsh wrote:

 I see 3 DB's in my user directory (.spamassassin).

 auto-whitelist(~80MB)
 bayes_seen(~40MB)
 bayes_toks(~20MB)

 Was trying to find relation of 'bayes_expiry_max_db_size' to the physical
 size of the above files.
Expiry will only affect bayes_toks. Currently neither auto-whitelist nor
bayes_seen has any expiry mechanism at all.

bayes_seen can safely be deleted if you need to. It keeps track of what
messages have already been learned to prevent relearning them. However,
unless you're likely to re-feed messages to SA, bayes_seen isn't strictly
necessary.


   I'm finding some answers, I've run into some
 seeming contradictions.  Had db_size set to 500,000, reduced to 250,000
 and to 'default' (150,000) during testing.

 In trying to lower 'db_size' and see how that affected physical sizes,
 I ran sa-learn --force expires and saw these debug messages of 'Note':

 [30905] dbg: bayes: expiry check keep size, 0.75 * max: 112500
 [30905] dbg: bayes: token count: 0, final goal reduction size: -112500
 [30905] dbg: bayes: reduction goal of -112500 is under 1,000 tokens,
 skipping expire
 [30905] dbg: bayes: expiry completed

 ---
 First prob(contradiction).  dbg above says token count: 0.  (This is
 with
 a combined bayes db size of 60MB (_seen, _toks).
Are you sure your sa-learn was using the same DB path?

From the sounds of it, sa-learn is using a directory with an empty DB.


 Seems to think I have no bayes data.  Saw another dbg msg that
 indicated the
 bayes classifier was untrained (~150? entries)  disabled.

 Dunno how it got zeroed, but tried adding 'ham' by running sa-learn over
 my a despam'ed mailbox.  First run showed:

 Learned tokens from 55 message(s) (55 message(s) examined)

 But subsequent runs of 'sa-learn with dbg+expire still show token
 count: 0.

 sa-learn --dump magic shows something different:
 0.000          0          3          0  non-token data: bayes db version
 0.000          0     556414          0  non-token data: nspam
 0.000          0     574441          0  non-token data: nham
 0.000          0     491743          0  non-token data: ntokens
 0.000          0 1216456288          0  non-token data: oldest atime
 0.000          0 1237796146          0  non-token data: newest atime
 0.000          0 1220476831          0  non-token data: last journal sync atime
 0.000          0 1217838535          0  non-token data: last expiry atime
 0.000          0    1382400          0  non-token data: last expire atime delta
 0.000          0      70612          0  non-token data: last expire reduction count
 -

 Does the above indicate 0 tokens?  I.e. isn't 'ntokens' = 491743 mean
 slightly under 500K tokens (my original limit before trying to run
 'sa-learn -expires + dbg' manually).
Yep, looks like you have 491,743 tokens to me.

 It's like the sa-learn magic shows a 'db' corresponding to my old limit
 (that I think is still being 'auto-expired', so might not have pruned
 figure as it runs about once per 24 hours, if I understand normal spamd
 workings).
Approximately. Also, be aware that in order for spamd to use new
settings it needs to be restarted.

 So is the --magic output, maybe what is seen and being
 'size-controlled' by
 auto-expire (was ~500K before recent test changes).
Yes, at least, it should be.

 Why isn't 'sa-learn --force expire' seeing the TOKENs indicated in
 sa-learn --dump magic?  
That is particularly strange to me, and it sounds like there's some
problems there.

Can you give a bit of detail, ie: what paths are you looking at for the
files, what version of SA,
 Debug messages are pointing at the same file
 for both operations, so how can dump-magic indicated 500K, but the
 debug of sa-learn --force-expire, is somehow seeing 0 TOKENs?

 Am I misinterpreting the debug output?
No, you don't seem to be.