Re: Custom Bayes score

2009-09-17 Thread LuKreme
On Sep 17, 2009, at 13:35, Dan Schaefer wrote:

> In a general consensus for those who have customized your BAYES
> scores, what are they?

I run from 4.5 (I thought it was 5.0) to -3.

> What I'm NOT looking for is a lecture on how everybody's systems are
> different.

But that is essentially true. What works for me is unlikely to work
exactly the same for you.




Re: Can I auto-delete emails scoring 10 and above, yet mark as spam those 5 and above?

2009-09-17 Thread 牛粥
drkwc  writes:

> New Spamassassin nb qs:
>
> On the configuration panel, I have SpamAssassin set to mark as spam any
> email scoring 5 or above.
>
> I have a rule set in Outlook Express to route those to a SpamAssassin SPAM
> folder.
>
> Now, I'm wondering, can I ALSO set the auto-delete function to delete -- at
> server level -- any emails scoring 10 or higher. That would be really
> convenient and would only deliver to my Outlook Express spam folder those
> scoring lower than 10.
>
> It's not clear to me that I can use both functions simultaneously. The
> language on the Spam Assassin control panel says, for both functions, "Set
> the number of hits required before a mail is considered spam."
>
> Am wondering if I can have two different settings. One for auto delete and
> another for marking and delivering emails?

How about searching for "SPAMASSASSIN shell-based filter"?
You can look it up on Google. That was actually useful for me.
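
To illustrate the two-threshold idea from the question: SpamAssassin itself only adds a score, so the "tag at 5, discard at 10" decision has to live in whatever filter handles delivery. The following is a minimal Python sketch of that decision, not a SpamAssassin feature. The thresholds come from the question; the function name, the action labels, and the assumption that the message carries an X-Spam-Status value containing `score=` (or the older `hits=`) are illustrative.

```python
import re

# Thresholds taken from the question: tag at 5, discard at 10.
TAG_AT = 5.0
DELETE_AT = 10.0

# Matches "score=12.3" (or the older "hits=12.3") in an X-Spam-Status value.
_SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def decide(x_spam_status):
    """Return 'deliver', 'spam-folder' or 'delete' for one header value."""
    match = _SCORE_RE.search(x_spam_status)
    if match is None:
        # No score found: fail open and deliver normally.
        return "deliver"
    score = float(match.group(1))
    if score >= DELETE_AT:
        return "delete"
    if score >= TAG_AT:
        return "spam-folder"
    return "deliver"
```

A shell-based or procmail-style filter would apply the same two comparisons; only the higher threshold leads to deletion, and everything between the two thresholds is still delivered to the spam folder.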

Sincerely,

-- 
"You cannot say 'no' to the people you love, not often. That's the secret.
And then when you do, it has to sound like a 'yes'. Or you have to make them
say 'no'. You have to take time and trouble. But I'm old-fashioned, you're
the new modern generation, don't listen to me."
-- Vito Corleone, "Chapter 28", page 401


Re: Move SPAM into SPAM folder

2009-09-17 Thread John Hardin

On Thu, 17 Sep 2009, Jose Luis Marin Perez wrote:

> I need to know how can I automatically move all emails that are
> considered as SPAM to a specific directory called SPAM.
>
> The server has installed Dovecot + Qmail + Vpopmail + Simscan +
> Spamassassin 3.2.5 + ClamAV.
>
> At present all emails are considered SPAM go to the quarantine folder on
> the server.

Sorry, but we can't help you. Spamassassin only generates a spamminess
score. Some other tool uses that score to make delivery decisions. You'll
have to ask on that tool's support list.

Given that simscan has an --enable-quarantinedir option, the simscan list
would probably be the place to ask next.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Today: the 222nd anniversary of the signing of the U.S. Constitution


Re: Move SPAM into SPAM folder

2009-09-17 Thread Karsten Bräckelmann
On Thu, 2009-09-17 at 17:07 -0500, Jose Luis Marin Perez wrote:
> I need to know how can I automatically move all emails that are
> considered as SPAM to a specific directory called SPAM.

This is not a SA question. SA scores mail. It does not deliver, move or
reject mail, nor anything else but scoring and classifying.

> At present all emails are considered SPAM go to the quarantine folder
> on the server.

So what delivers them there? Simscan?


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



[no subject]

2009-09-17 Thread Jose Luis Marin Perez

Dear Sirs,

I need to know how can I automatically move all emails that are considered as 
SPAM to a specific directory called SPAM.

The server has installed Dovecot + Qmail + Vpopmail + Simscan + Spamassassin 
3.2.5 + ClamAV.

Simscan is configured with:

./configure --enable-clamav=y --enable-clamdscan=/usr/local/bin/clamdscan \
  --enable-dropmsg=y --enable-custom-smtp-reject=n --enable-per-domain=y \
  --enable-attach=y --enable-spam=y --enable-ripmime=/usr/local/bin/ripmime \
  --enable-received=y --enable-spam-hits=5.0 --enable-spamc=/usr/bin/spamc \
  --enable-spamc-args="-s 20 -t 60 -U /tmp/spamd.sock" --enable-spamc-user=y \
  --enable-regex=y --with-pcre-include=/usr/local/include --enable-quarantinedir

At present, all emails that are considered SPAM go to the quarantine folder
on the server.

Thanks

Jose Luis
  

Move SPAM into SPAM folder

2009-09-17 Thread Jose Luis Marin Perez

Dear Sirs,

I need to know how can I automatically move all emails that are considered as 
SPAM to a specific directory called SPAM.

The server has installed Dovecot + Qmail + Vpopmail + Simscan + Spamassassin 
3.2.5 + ClamAV.

Simscan is configured with:

./configure --enable-clamav=y --enable-clamdscan=/usr/local/bin/clamdscan \
  --enable-dropmsg=y --enable-custom-smtp-reject=n --enable-per-domain=y \
  --enable-attach=y --enable-spam=y --enable-ripmime=/usr/local/bin/ripmime \
  --enable-received=y --enable-spam-hits=5.0 --enable-spamc=/usr/bin/spamc \
  --enable-spamc-args="-s 20 -t 60 -U /tmp/spamd.sock" --enable-spamc-user=y \
  --enable-regex=y --with-pcre-include=/usr/local/include --enable-quarantinedir

At present, all emails that are considered SPAM go to the quarantine folder
on the server.

Thanks

Jose Luis 

Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-17 Thread Austin
On Thu, Sep 17, 2009 at 11:39 AM, John Hardin  wrote:
> On Thu, 17 Sep 2009, LuKreme wrote:
>
>> On Sep 16, 2009, at 22:13, Austin  wrote:
>>
>>> It had one header: Subject.  Then a body.  Should
>>> I leave stuff like this in?  I mean, it is ham, but...
>>
>> My feeling would be if it is local only then don't include it.
>
> Agreed.

Thanks for the guidance, all.  I'll toss the absurd things that never
left our network.  There aren't all that many of them, but I wouldn't
want to pollute the pool.

Austin.


Re: Custom Bayes score

2009-09-17 Thread Jari Fredriksson
> In a general consensus for those who have customized your
> BAYES scores, what are they? I have been experimenting
> with them, but I have not been successful with a
> "perfect" score. What I'm NOT looking for is a lecture on
> how everybody's systems are different. 
> 

I have set BAYES_99 to 5, but not changed the other values.

It seems that my bayes training is good, and 99% of my spam generates BAYES_99.

So it's a poison pill, but there have been no false positives with that rule
yet. It is mostly a personal system, though I also filter some mail for a
small company with the same setup (an info@ address).




Re: Skip DNSBL checks for a specific IP/Net

2009-09-17 Thread Matus UHLAR - fantomas
On 17.09.09 23:07, Karsten Bräckelmann wrote:
> Microsoft Office Outlook 11.  Wow, I didn't know it sucks *that* badly.
> No proper threading headers (In-Reply-To and References),

I think the newest Outlook does that, luckily. However, it still can't
thread on those and uses its stupid Thread-Index and Thread-Topic headers.

> produces fugly
> Kammquoting,

There is software called outlook-quotefix. An even better version exists
for Outlook Express.

> and even injects an empty line after each and every line of
> text.

I really wonder if it does this always or only when set to "Use microsoft
word for composing messages". Gotta check when I get around this damn
fscking "install updates at shutdown" thingy (when I'll be able to install
updates without having to turn computer off from windows).

> And *then* converts that monstrosity into HTML, preserving the
> utterly broken format.

Now THIS has been well known for years, I'd say :) When was M$ software ever
able to produce correct and simple HTML?
(No, in Notepad it's not the software but the user...)

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"To Boot or not to Boot, that's the question." [WD1270 Caviar]


Re: Skip DNSBL checks for a specific IP/Net

2009-09-17 Thread Karsten Bräckelmann
On Thu, 2009-09-17 at 23:07 +0200, Karsten Bräckelmann wrote:
> > Can you help me writing these ?
> 
> Well, here are two UNTESTED and ad-hoc written rules, to be used in a
> meta as I previously outlined. The first IP variant in this case matches
> an entire /24 network, the RDNS variant matches any Hotmail blu0 host.
> Both are pretty much examples and likely need to be adjusted.
> 
> header HOTMAIL_IP_TO_MX    [...]
> 
> header HOTMAIL_RDNS_TO_MX  [...]

Whoops!  Since these are meant to be used in metas and not score
anything on their own, both these example rules should use rule names
starting with a double underscore -- to make them non-scoring sub-rules.

header __NON_SCORING_EXAMPLE  [...]


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Custom Bayes score

2009-09-17 Thread Matus UHLAR - fantomas
On 17.09.09 15:35, Dan Schaefer wrote:
> In a general consensus for those who have customized your BAYES scores,  
> what are they? I have been experimenting with them, but I have not been  
> successful with a "perfect" score. What I'm NOT looking for is a lecture  
> on how everybody's systems are different.

I tweaked them only a little, +4 to -3. BAYES_99 ham and BAYES_00 spam may
still appear (I think it happened last year, not sure), so I'd rather not
use BAYES as a poison pill...

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Microsoft dick is soft to do no harm


Re: strange entries in log

2009-09-17 Thread Matus UHLAR - fantomas
> >> Per Jessen wrote:
> >>> A domain name component cannot be more than 63 characters.

> > On Sep 17, 2009, at 11:25, Jason Bertoch  wrote:
> >> I am under the impression that host names are limited to 255 characters.
> >>  Maybe I missed something in the RFC?

> 2009/9/17 LuKreme :
> > No, 255 TOTAL characters, but no one component can exceed 63.

On 17.09.09 16:11, François Rousseau wrote:
> Thanks for the information.
> 
> If I understand correctly, an email is scanned by SpamAssassin and, in
> this email, a link or a header points to an "invalid" domain.

Yes. I wonder whether that was an invalid hostname/URI in the mail or just
something that was misinterpreted as one and checked (and since spammers try
to hide those, I don't object to testing anything that could be a
URI/hostname).

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux is like a teepee: no Windows, no Gates and an apache inside...


Re: Skip DNSBL checks for a specific IP/Net

2009-09-17 Thread Karsten Bräckelmann
Microsoft Office Outlook 11.  Wow, I didn't know it sucks *that* badly.
No proper threading headers (In-Reply-To and References), produces fugly
Kammquoting, and even injects an empty line after each and every line of
text. And *then* converts that monstrosity into HTML, preserving the
utterly broken format.

Next time, please do as you did with your previous reply. Thanks.

On Thu, 2009-09-17 at 16:04 -0400, Philippe Ratté wrote:
> > Ok, slow down. What rules *exactly* are hitting on these messages?
> 
> See below

No answer to that, but see below. :)

> > A SORBS listing does NOT explain why your customer doesn't get his mail.

> > You wouldn't happen to run RBL checks at SMTP stage, prior to SA, that 
> > outright block based on a single BL hit? 
> 
> This is true, I forgot to mention a very important detail. Mail was
> getting blocked by another program named rblsmtpd at SMTP stage. 
> 
> I found the way to skip DNSBL checks for a particular IP in rblsmtpd,
> but not into SpamAssassin. The reason why I wanted to do the same
> thing into SA was to ensure that it would not be blocked at this stage
> and tell my customer that Hotmail is white listed.

Not blacklisting is not the same as whitelisting.

> Found the reason (rblsmtpd). I did not know how SA handled DNSBL so
> maybe simply removing it from rblsmtpd would be enough.

Yes, indeed. As I explained previously, the SA score for any SORBS hit
is not sufficient to push it over a sane spam score threshold, let alone
a typically higher threshold to reject based on the SA score. *IFF* you
reject based on SA at all.

FWIW, SA handles DNSBLs like every other rule -- it scores it. SA is a
scoring system, and by default and design no one rule hit is sufficient
to push it over the spam threshold single-handedly.
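
The scoring model described here can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SpamAssassin's implementation: the rule names and scores below are hypothetical, the threshold of 5.0 is the commonly used default mentioned in this thread, and the two conventions modelled (a rule without an explicit score counts as 1, and rules whose names start with a double underscore do not score) are the ones stated in this thread.

```python
REQUIRED = 5.0  # spam threshold used as an example in this thread

# Hypothetical rule names and scores, for illustration only.
HYPOTHETICAL_SCORES = {
    "EXAMPLE_DNSBL_HIT": 1.0,
    "EXAMPLE_BODY_RULE": 2.5,
    "EXAMPLE_HEADER_RULE": 1.75,
}


def rule_score(name, scores):
    """Score of a single rule hit."""
    if name.startswith("__"):
        return 0.0  # non-scoring sub-rule, only useful inside metas
    return scores.get(name, 1.0)  # no explicit score: default of 1


def classify(hits, scores=HYPOTHETICAL_SCORES, required=REQUIRED):
    """Sum the scores of all rule hits and compare with the threshold."""
    total = sum(rule_score(name, scores) for name in hits)
    return total, total >= required
```

With these numbers a lone DNSBL hit stays at 1.0, far below 5.0; the message is only flagged once other rules contribute the remaining points.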


So again, yes -- not having rblsmtpd block mail on SORBS should already
be enough. Even if SA still scores it, it should be fine.

If, however, there *might* be potential for SA flagging these mail as
spam with a SORBS hit -- which requires other rules to also hit on these
mail and account for 80+ % of the score -- we won't know until you show
us the SA rules hit. As I requested before.


> I like John's idea :
> 
> meta  NO_RBL_HOTMAIL  RBL_SORBS && FROM_HOTMAIL
> score NO_RBL_HOTMAIL  -2

The idea was actually mentioned by me, but oh well. ;)

> Can you help me writing these ?

Well, here are two UNTESTED and ad-hoc written rules, to be used in a
meta as I previously outlined. The first IP variant in this case matches
an entire /24 network, the RDNS variant matches any Hotmail blu0 host.
Both are pretty much examples and likely need to be adjusted.

header HOTMAIL_IP_TO_MX    X-Spam-Relay-Untrusted =~ /^\[ ip=65\.55\.111\./

header HOTMAIL_RDNS_TO_MX  X-Spam-Relay-Untrusted =~ /^\[ [^\]]+ rdns=[^ ]+\.blu0\.hotmail\.com /

Also, see the docs.
  http://wiki.apache.org/spamassassin/WritingRules
  http://wiki.apache.org/spamassassin/TrustedRelays
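
Since the two rules are explicitly untested, one cheap way to sanity-check the patterns is to try them against a sample relay string outside SpamAssassin. The Python sketch below does that with the same regular expressions. The sample string is an assumption: it imitates the general shape of the X-Spam-Relay-Untrusted pseudo-header using the IP and host name quoted earlier in this thread, and `mx.example.org` is a made-up receiving host. Real values on your system may differ in detail.

```python
import re

# Same patterns as the two header rules, written as Python regexes.
IP_RULE = re.compile(r"^\[ ip=65\.55\.111\.")
RDNS_RULE = re.compile(r"^\[ [^\]]+ rdns=[^ ]+\.blu0\.hotmail\.com ")

# Assumed shape of an X-Spam-Relay-Untrusted value (illustrative only).
SAMPLE = (
    "[ ip=65.55.111.100 rdns=blu0-omc2-s25.blu0.hotmail.com "
    "helo=blu0-omc2-s25.blu0.hotmail.com by=mx.example.org "
    "ident= envfrom= intl=0 id= auth= msa=0 ]"
)


def rule_hits(relay_line):
    """Report which of the two patterns match the first listed relay."""
    return {
        "ip": IP_RULE.search(relay_line) is not None,
        "rdns": RDNS_RULE.search(relay_line) is not None,
    }
```

Both patterns are anchored at the start of the string, so they only look at the first relay listed, which is the host that handed the mail to your MX.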


However, I still do not believe this to be necessary.

Moreover, I don't think assigning any negative score to hotmail, solely
based on that fact is a good idea. You mentioned it yourself -- you *do*
get spam from hotmail. Thus, if you really feel like you need white-
listing, I'd recommend whitelist_from_rcvd or friends instead.

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: strange entries in log

2009-09-17 Thread François Rousseau
Thanks for the information.

If I understand correctly, an email is scanned by SpamAssassin and, in
this email, a link or a header points to an "invalid" domain.

Thanks,
François

2009/9/17 LuKreme :
> On Sep 17, 2009, at 11:25, Jason Bertoch  wrote:
>
>> Per Jessen wrote:
>>>
>>> A domain name component cannot be more than 63 characters.
>>
>> I am under the impression that host names are limited to 255 characters.
>>  Maybe I missed something in the RFC?
>
> No, 255 TOTAL characters, but no one component can exceed 63.
>
>


Re: Skip DNSBL checks for a specific IP/Net

2009-09-17 Thread Philippe Ratté
> From: Karsten Bräckelmann [mailto:guent...@rudersport.de]
>
> On Wed, 2009-09-16 at 15:38 -0400, Philippe Ratté wrote:
> > > If it is anything else, we might be much better able to help you,
> > > if we know about the issue -- rather than what you think would be
> > > the best solution. ;)
> >
> > The situation is about Hotmail. Yesterday a customer told me he was
> > having problems between his corporate account and Hotmail, the
> > customers of my customer were unable to contact him.
> >
> > I noticed at that time 65.55.111.100 was part of SORBS BL.
> > http://www.us.sorbs.net/lookup.shtml?65.55.111.100 indicates :
> > Address:  65.55.111.100
> > Record Created: Wed Oct 29 19:00:03 2008 GMT
> > Record Updated: Mon Sep 14 08:56:51 2009 GMT
> > Additional Information:  [ Updated via: Report 'o Matic ] Received:
> > from blu0-omc2-s25.blu0.hotmail.com (blu0-omc2-s25.blu0.hotmail.com
> > [65.55.111.100]) by anaconda.sorbs.net (Postfix) with ESMTP id
> > E0D9B2E05D for <[email]>; Mon, 14 Sep 2009 14:31:01 +1000 (EST)
> > Currently active and flagged to be published in DNS
>
> Ok, slow down. What rules *exactly* are hitting on these messages?

See below

> 'grep SORBS 50_scores.cf'. All SORBS listings score below 1. Oddly,
> SORBS SPAM is missing there, but that just means it is a default score
> of 1 for the hit.
>
> A score of <= 1 cannot be the reason for blocked mail! There's at
> least another 4 points to be added by other rule hits. Well, as far as
> a sane SA configuration is concerned.
>
> A SORBS listing does NOT explain why your customer doesn't get his mail.
>
> Also, SA merely scores. It doesn't reject, but lets all mail through.
> Any action whatsoever is duty of some other tool in your mail
> processing chain. Which one is the culprit responsible for "your
> customer not getting his mail"? Regardless if that tool ended up
> rejecting the mail or delivered it to some kind of dedicated or
> quarantine folder -- I'd check back there.
>
> You wouldn't happen to run RBL checks at SMTP stage, prior to SA, that
> outright block based on a single BL hit?

This is true, I forgot to mention a very important detail. Mail was getting
blocked by another program named rblsmtpd at SMTP stage.

I found the way to skip DNSBL checks for a particular IP in rblsmtpd, but
not in SpamAssassin. The reason why I wanted to do the same thing in SA
was to ensure that it would not be blocked at this stage and tell my
customer that Hotmail is white listed.

> Oddly enough, my own checks are inconsistent. :-/  While the sorbs.net
> lookup indeed does claim exactly what you posted, my own 'host' check
> returns NXDOMAIN. Two additional, independent BL lookup forms don't
> agree with each other either.

I also see this actually (NXDOMAIN), maybe the web interface of SORBS is not
up-to-date.

> > Customer asked "can you white-list them temporarily ?"
> >
> > We have a firewall with a network setup which allows me to bypass RBL
> > + SpamAssassin easily. We did this with most of Hotmail's IPs until
> > we started receiving spam from valid Hotmail accounts.
> >
> > I do not want to leave Hotmail completely white listed, my idea was to
> > skip RBL checks and keep other checks in place.
>
> First of all, you want to skip a single BL. Not all of them. And
> second, as mentioned above, there is *much* more to your problem than
> what you provided in your post.
>
> Mail is not being delivered, so go check the reason. If it is a high
> SA score, you'll find lots more evil than this in the rules triggered.

Found the reason (rblsmtpd). I did not know how SA handled DNSBL so maybe
simply removing it from rblsmtpd would be enough.

I like John's idea :

meta  NO_RBL_HOTMAIL  RBL_SORBS && FROM_HOTMAIL
score NO_RBL_HOTMAIL  -2

Can you help me writing these ?

Thanks and have a nice day

> --
> char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Custom Bayes score

2009-09-17 Thread Dan Schaefer
In a general consensus for those who have customized your BAYES scores, 
what are they? I have been experimenting with them, but I have not been 
successful with a "perfect" score. What I'm NOT looking for is a lecture 
on how everybody's systems are different.


Thanks,
Dan Schaefer
Web Developer/Systems Analyst
Performance Administration Corp.



Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-17 Thread John Hardin

On Thu, 17 Sep 2009, LuKreme wrote:

> On Sep 16, 2009, at 22:13, Austin  wrote:
>
>> It had one header: Subject.  Then a body.  Should
>> I leave stuff like this in?  I mean, it is ham, but...
>
> My feeling would be if it is local only then don't include it.

Agreed.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Perfect Security and Absolute Safety are unattainable; beware
  those who would try to sell them to you, regardless of the cost,
  for they are trying to sell you your own slavery.
---
 Today: the 222nd anniversary of the signing of the U.S. Constitution


Re: strange entries in log

2009-09-17 Thread LuKreme

On Sep 17, 2009, at 11:25, Jason Bertoch  wrote:

> Per Jessen wrote:
>> A domain name component cannot be more than 63 characters.
>
> I am under the impression that host names are limited to 255
> characters.  Maybe I missed something in the RFC?


No, 255 TOTAL characters, but no one component can exceed 63.
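
The two limits can be checked mechanically. Below is a small Python sketch, assuming plain ASCII host names; the function names are made up for illustration. The 63-octet limit applies to each dot-separated label, and the 255-octet limit is applied here to the encoded name, where every label carries one length octet and the name ends with a zero octet.

```python
MAX_LABEL = 63   # octets per label (RFC 1035, section 2.3.1)
MAX_NAME = 255   # octets for the whole encoded name


def oversized_labels(hostname):
    """Return the labels of hostname that exceed 63 octets."""
    labels = hostname.rstrip(".").split(".")
    return [label for label in labels if len(label.encode("utf-8")) > MAX_LABEL]


def fits_dns_limits(hostname):
    """True if every label and the encoded name respect both limits."""
    labels = hostname.rstrip(".").split(".")
    # Encoded form: one length octet per label plus a final zero octet.
    encoded_length = sum(len(label.encode("utf-8")) + 1 for label in labels) + 1
    return not oversized_labels(hostname) and encoded_length <= MAX_NAME
```

A name such as the long hyphenated string from the log would show up in `oversized_labels`, which matches the "truncated to 63 octets" warning that Net::DNS printed.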



Re: strange entries in log

2009-09-17 Thread Jason Bertoch

Jason Bertoch wrote:
> Per Jessen wrote:
>> A domain name component cannot be more than 63 characters.
>
> I am under the impression that host names are limited to 255
> characters.  Maybe I missed something in the RFC?

Hmm, I suppose I did get it wrong... I misunderstood the definition of label.


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-17 Thread LuKreme
On Sep 16, 2009, at 22:13, Austin wrote:

> It had one header: Subject.  Then a body.  Should
> I leave stuff like this in?  I mean, it is ham, but...

My feeling would be if it is local only then don't include it.

--
Sent from my iPhone



Re: strange entries in log

2009-09-17 Thread Jason Bertoch

Per Jessen wrote:
> A domain name component cannot be more than 63 characters.

I am under the impression that host names are limited to 255
characters.  Maybe I missed something in the RFC?


Re: strange entries in log

2009-09-17 Thread Per Jessen
François Rousseau wrote:

> Hello,
> 
> I just notice that I have many time the same strange line in my spamd
> log.  Anyone have an idea about this?
> 
> SpamAssassin Server version 3.2.5
> 
> log entries:
> Sep 16 18:08:06 myhostname spamd[2008]:
> s-female-silhouettes-beverages-film-hearts-music-notes-and-thea...
> Sep 16 18:08:06 myhostname spamd[2008]: truncated to 63 octets
> (RFC1035 2.3.1) at /usr/lib/perl5/Net/DNS/Question.pm line 233

A domain name component cannot be more than 63 characters. 


/Per Jessen, Zürich



Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-17 Thread Justin Mason
On Thu, Sep 17, 2009 at 04:01, Warren Togami  wrote:
> On 09/16/2009 11:25 PM, Justin Mason wrote:
>>
>> excellent.  That's 2 people who could do with an extension, then!
>
> Could we state with clarity the new deadline?  I might have other people
> with data depending on the extended deadline.

Let's push it out until Monday.

regarding corpus cleaning, RTFM:
http://wiki.apache.org/spamassassin/CorpusCleaning (linked from the
RescoreDetails page)

-- 
--j.


strange entries in log

2009-09-17 Thread François Rousseau
Hello,

I just noticed that the same strange line appears many times in my spamd
log.  Does anyone have an idea about this?

SpamAssassin Server version 3.2.5

log entries:
Sep 16 18:08:06 myhostname spamd[2008]:
s-female-silhouettes-beverages-film-hearts-music-notes-and-thea...
Sep 16 18:08:06 myhostname spamd[2008]: truncated to 63 octets
(RFC1035 2.3.1) at /usr/lib/perl5/Net/DNS/Question.pm line 233


Thanks,
François


Re: Experimental Plugin: MetaSVM

2009-09-17 Thread Marc Perkel

So - what ever happened to this project? Was it finished?

decoder wrote:
> LuKreme wrote:
>> I don't see any need for the model to be dynamic.  Periodic
>> recalculation of it should be just fine.  I bet even daily
>> reprocessing will prove to be over zealous. Weekly, perhaps even
>> monthly.
>
> This is what I think as well :)
>
>> I'm thinking that FPs and FNs are bayes problem anyway.  This tool
>> need to concentrate on seeing just what rules hit and building off
>> that. I'd go so far to say that as far as SVM is concerned, there is
>> no such thing as a false positive or negative.
>
> What do you mean by that? Of course FPs and FNs might also be a
> problem for the SVM, every wrong classified point is certainly a
> problem for a machine learning algorithm. However, I think that the
> SVM is quite robust to a certain amount of FPs/FNs if the majority of
> the training points is correct.
>
> So, if you feel like trying out the plugin, let me know how well it
> works =) I'm especially interested in those cases where it increases
> the spam detection rate (reducing false negatives). Might be easy to
> extract this information from logs.
>
> Thanks and regards,
>
> Chris


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-17 Thread Warren Togami

On 09/17/2009 08:34 AM, Mark Martinec wrote:
> Austin,
>
>>> now hope to do this Thursday/Friday.  I should be able to scan my
>>> million or so messages in a day on my cluster.
>>
>> Wow, that makes me feel inadequate :)  I'm struggling to clean up my
>> little ham sample of 3600 messages, and looking at another couple
>> thousand that I'll do if I've got time...
>
> Thanks, that will be nice to have. As the rulesqa site can distinguish
> results based on a corpus submitter, even a small but carefully checked
> collection is worth having.
>
> I found it valuable to double check ham samples which fire rules
> URIBL_JP_SURBL, URIBL_WS_SURBL, URIBL_OB_SURBL,
> RCVD_IN_PBL, RCVD_IN_XBL, RCVD_IN_PSBL, RCVD_IN_SSBL

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6156
Be aware that gmail, yahoo.co.jp and rr.com were whitelisted from new 
inclusion only 5 days ago.  IP's from prior could still be listed before 
the 2 week timeout.  Auto-whitelisting of yahoo.com is not yet 
implemented.  riel is working on DKIM checking in order to whitelist 
yahoo.com.


FP's of PSBL are already rare, but they should become rarer.

Please let us know if you see FP's from a legitimate ISP MTA server. 
That MTA can be whitelisted from PSBL by either listing itself in DNSWL, 
or letting us know to check it by SPF or DKIM.


Warren Togami
wtog...@redhat.com


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-17 Thread Henrik K
On Thu, Sep 17, 2009 at 02:34:24PM +0200, Mark Martinec wrote:
> Austin,
> 
> > > now hope to do this Thursday/Friday.  I should be able to scan my
> > > million or so messages in a day on my cluster.
> > 
> > Wow, that makes me feel inadequate :)  I'm struggling to clean up my
> > little ham sample of 3600 messages, and looking at another couple
> > thousand that I'll do if I've got time...
> 
> Thanks, that will be nice to have. As the rulesqa site can distinguish
> results based on a corpus submitter, even a small but carefully checked
> collection is worth having.
> 
> I found it valuable to double check ham samples which fire rules
> URIBL_JP_SURBL, URIBL_WS_SURBL, URIBL_OB_SURBL,
> RCVD_IN_PBL, RCVD_IN_XBL, RCVD_IN_PSBL, RCVD_IN_SSBL

There's lots that one can do..

- analyze corpuses through dspam_train, spots misfiles quite nicely (might
  also use crm114, haven't tried)

- clamscan hams with sanesecurity etc

- grep ham/spam.log for rules with S/O >= ~0.98 (most likely includes all
  that Mark said and more)

- grep Subjects from spams and grep all those from ham (and vice versa)

- fuzzily hash duplicate mails away, so miscategorized mails have a smaller
  effect on the totals (or does it make good rules seem worse? heh..), you
  can also spot similar mails that are in both ham+spam for double checking

Sadly I don't have a cleanly defined process yet, it's all scripts and
memorized one-liners. Finding FPs from spam-corpus is more important but
harder..
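
As one concrete take on the "grep Subjects from spams and grep all those from ham" item above, here is a minimal Python sketch that lists normalized subjects occurring in both corpora, which are candidates for a manual double check. It is not part of any SpamAssassin tooling; the function names are made up, and it assumes each message is available as a raw RFC 822 string.

```python
from email import message_from_string


def subject_of(raw_message):
    """Lower-cased Subject with runs of whitespace collapsed ('' if absent)."""
    subject = message_from_string(raw_message)["Subject"] or ""
    return " ".join(subject.lower().split())


def shared_subjects(ham_messages, spam_messages):
    """Subjects that appear in both the ham and the spam corpus."""
    ham = {subject_of(message) for message in ham_messages}
    spam = {subject_of(message) for message in spam_messages}
    # Messages without a Subject are ignored rather than reported as a match.
    return sorted((ham & spam) - {""})
```

A shared subject does not prove a misfile, since legitimate mail and spam can share common subjects; it only narrows down what to look at by hand.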



Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-17 Thread Mark Martinec
Austin,

> > now hope to do this Thursday/Friday.  I should be able to scan my
> > million or so messages in a day on my cluster.
> 
> Wow, that makes me feel inadequate :)  I'm struggling to clean up my
> little ham sample of 3600 messages, and looking at another couple
> thousand that I'll do if I've got time...

Thanks, that will be nice to have. As the rulesqa site can distinguish
results based on a corpus submitter, even a small but carefully checked
collection is worth having.

I found it valuable to double check ham samples which fire rules
URIBL_JP_SURBL, URIBL_WS_SURBL, URIBL_OB_SURBL,
RCVD_IN_PBL, RCVD_IN_XBL, RCVD_IN_PSBL, RCVD_IN_SSBL

> Also, I need some advice, if someone can provide it.  I'm looking at a
> message (and I have several like this in my corpus at present) which
> generates the following log line
> 
> .  1 /home/gems/ham//cur/n8500ejj019591:2,S
> MISSING_DATE,MISSING_HEADERS,MISSING_MID,T_FSL_HELO_NON_FQDN_2,__DKIM_DEPEN
> DABLE,__DNS_FROM_RFC_ABUSE,__DOS_DIRECT_TO_MX,__DOS_HAS_ANY_URI,__DOS_RCVD_
> FRI,__DOS_SINGLE_EXT_RELAY,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_RCVD,__HAS_S
> UBJECT,__HAVE_BOUNCE_RELAYS,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_
> RELAY_NO_AUTH,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__NONEMPTY_BOD
> Y,__NUMBERS_IN_SUBJ,__RCVD_IN_2WEEKS,__RFC_IGNORANT_ENVFROM,__TO_NO_ARROWS_
> R,__TVD_BODY learn=ham,time=1252108840,scantime=1,format=f,reuse=no,set=1
> 
> It's clearly a poorly constructed message, but it's also clearly ham
> (it originated from an application that someone somewhere in my
> organization runs).  It had one header: Subject.  Then a body.  Should
> I leave stuff like this in?  I mean, it is ham, but...

I can't offer a definite answer (other comments are welcome), but I'd say
keep a few samples in your ham collection, but not in many copies.

  Mark


Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-17 Thread Warren Togami

On 09/16/2009 11:25 PM, Justin Mason wrote:
> excellent.  That's 2 people who could do with an extension, then!

Could we state with clarity the new deadline?  I might have other people
with data depending on the extended deadline.





Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-17 Thread Mark Martinec
On Wednesday September 16 2009 22:03:17 Justin Mason wrote:
> Who is running a mass-check that's still in progress?  (fwiw, I am ;)
> It'll be at least 5 users (with myself and John), but that's not a
> great population of training data.

I spent a couple of afternoons cleaning up my corpus of 60,000 messages
(of which 39,000 are ham, checked and rechecked). I have already uploaded
my results, although I will probably do another iteration of hand-weeding
based on nightly ruleqa results - it will be there by the end of the day.

  Mark