no tokens ? How can that be ?

2006-09-28 Thread Michael Grey








I came across a situation that seems non-intuitive;

 

Two emails this am were spam, but hit BAYES_00.  So they
were (presumably) learned as Ham somewhere along the way.

So far so good…

 

Doing 'sa-learn --forget ./message.txt'
gets me : Forgot tokens from 0 message(s) (1 message(s) examined)

 

 

What kind of situation can cause this ? I was under the
impression that BAYES_00 meant it was explicitly learned as ham, so there must
be related tokens.

 

 

Thanks

 

Michael Grey

 

 








Bayes lost info when 'upgrading' ?

2006-09-13 Thread Michael Grey








We went through the process of changing from a V2 BDB to a
V3 BDB and then to MySQL.

 

When running the tests side by side, old system with new we
see some substantial inconsistencies between the bayes scoring…

 

Any ideas why ?  There are obviously fewer tokens now
than before the sync, but no mention in the docs of data being lost…

 

Thanks

 

Michael Grey

 

 

 

 

Old System’s bdb info :

 

0.000          0          2          0  non-token data: bayes db version
0.000          0    3541311          0  non-token data: nspam
0.000          0    1707362          0  non-token data: nham
0.000          0     343897          0  non-token data: ntokens
0.000          0 1157674321          0  non-token data: oldest atime
0.000          0 1157749228          0  non-token data: newest atime
0.000          0 1157749183          0  non-token data: last journal sync atime
0.000          0 1157717569          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     145970          0  non-token data: last expire reduction count

 

New system's bdb info after --sync :

0.000          0          3          0  non-token data: bayes db version
0.000          0    3541670          0  non-token data: nspam
0.000          0    1707603          0  non-token data: nham
0.000          0     263855          0  non-token data: ntokens
0.000          0 1157707085          0  non-token data: oldest atime
0.000          0 1157755124          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
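
To put numbers on the difference between the two dumps above, here is a minimal sketch that compares the three counters. The values are copied by hand from the dumps; the function name compare is only illustrative and is not part of SpamAssassin.

```python
# Counters copied from the two dumps above
# (old v2 BDB vs. new v3 DB after --sync).
OLD = {"nspam": 3541311, "nham": 1707362, "ntokens": 343897}
NEW = {"nspam": 3541670, "nham": 1707603, "ntokens": 263855}


def compare(old, new):
    """Return {name: (absolute change, percent change)} for each counter."""
    result = {}
    for name, before in old.items():
        delta = new[name] - before
        result[name] = (delta, round(100.0 * delta / before, 2))
    return result


if __name__ == "__main__":
    for name, (delta, pct) in compare(OLD, NEW).items():
        print(f"{name:8s} {delta:+9d} ({pct:+.2f}%)")
```

The message counters grew slightly while ntokens dropped by 80042, roughly 23%. The oldest atime also moved forward, which would be consistent with older tokens having been expired rather than lost in the conversion, though the dumps alone cannot confirm that.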








BAYES_00

2006-09-08 Thread Michael Grey








Forgive what may be a newbie question;

 

If you hit on BAYES_00, does that mean explicitly that the
email has been learned as NOT SPAM ? 

 

If this is not the case ( or ONLY the case,) what other
conditions may cause this ? ( Presuming the DB is available / healthy etc. )
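
As a rough illustration of what the BAYES_* rule names encode: each rule corresponds to a range of the computed Bayes probability, so BAYES_00 only says the probability came out very low, not that this particular message was explicitly learned as ham. The sketch below maps a probability to a rule name. Only the 40 to 60% and 95 to 99% ranges are confirmed by reports quoted elsewhere in this archive; the other boundaries are an assumption from memory of the stock 3.1 rules and should be checked against your own rule files.

```python
import bisect

# Upper bound of each bucket (assumed, see note above) and the rule
# taken to fire for probabilities below that bound.
_BOUNDS = [0.01, 0.05, 0.20, 0.40, 0.60, 0.80, 0.95, 0.99]
_RULES = ["BAYES_00", "BAYES_05", "BAYES_20", "BAYES_40", "BAYES_50",
          "BAYES_60", "BAYES_80", "BAYES_95", "BAYES_99"]


def bayes_rule(probability):
    """Map a Bayes probability in [0, 1] to the assumed rule name."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return _RULES[bisect.bisect_right(_BOUNDS, probability)]


if __name__ == "__main__":
    for p in (0.001, 0.5000, 0.9694):
        print(p, bayes_rule(p))
```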

 


Thanks…

 

Michael Grey

 

 

 








RE: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Michael Grey

You will have to ask the cell company about the first issue ...

In regards to the second, many large companies have outside companies do work
for them in the areas of marketing and other aspects. So this also will
happen regardless.

Let me clarify; this is an OUTSIDE relay to INSIDE...

A FuzzyOCR White List with (very privately held) keywords would help. 

Any other ideas ?



Michael Grey




-Original Message-
From: John D. Hardin [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 08, 2006 10:10 AM
To: Michael Grey
Cc: users@spamassassin.apache.org
Subject: Re: Fuzzy OCR false positives from Screenshots...

On Fri, 8 Sep 2006, Michael Grey wrote:

> However, there have been two occasions in the last 24 hrs where screenshots
> embedded into the emails caused false positives.
> 
> One was an 'account summary' from a cell company, the other was some internal
> marketing info.
> 
> Are there other approaches to getting certain images white listed if they
> contain, say, our specific company name ?

Don't run SA against internal email.

And what the heck is a cell-phone company doing sending you
screenshots?

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  If someone has a gun and is trying to kill you, it would be
  reasonable to shoot back with your own gun.
  -- the Dalai Lama, May 15, 2001
---
 9 days until The 219th anniversary of the signing of the U.S. Constitution



Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Michael Grey








We are testing a new configuration using FuzzyOCR, and found
it to work very well overall… 

 

However, there have been two occasions in the last 24 hrs
where screenshots embedded into the emails caused false positives.

 

One was an ‘account summary’ from a cell
company, the other was some internal marketing info.

 

Are there other approaches to getting certain images white listed
if they contain, say, our specific company name ?

 

Any other ideas on how to deal with this ?

 

 

Many thanks !

 

 

Michael Grey

 

 








RE: source SENDER authentication ? (as opposed to SPF HOST authentication)

2006-08-30 Thread Michael Grey
Yes, I tend to agree with this... it is the reason why many SMTP servers reply
to VRFY with 'You can try...' instead of a yes or no.

Unfortunately I am not the one driving this requirement ;) 

I like Michel Vaillancourt's idea - if it has to be done.

I appreciate everyone's feedback to this question. 


Michael Grey

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 30, 2006 10:44 AM
To: Gino Cerullo
Cc: users@spamassassin.apache.org
Subject: Re: source SENDER authentication ? (as opposed to SPF HOST
authentication) 


Gino Cerullo writes:
> On 30-Aug-06, at 1:10 PM, Michael Grey wrote:
> 
> > Are there any SA methods that allow verification of the 'sender' of  
> > an email ?
> >
> > I am aware of SPF which can confirm that a host at ip address  
> > x.x.x.x is authorized to send mail as from domain "A", but how  
> > about a means to confirm that '[EMAIL PROTECTED]' actually is a  
> > real user before accepting mail from him ?
> >
> I don't believe SA can do that as it's a content filter. Some MTAs  
> can do this and this is where you want those kinds of verifications to  
> happen, before DATA. The problem is that if you do it for every  
> address you will get false positives, especially from sources like  
> mailing lists, news & info subscriptions, etc., and you'll find  
> yourself whitelisting a lot.
> 
> I actually do this using Postfix but I use a table of 'frequently  
> forged domains' whose addresses are verified before they are allowed  
> to pass on to the content filters.

It's also worth noting that doing this is counterproductive in an overall
strategy sense, since it drives the spammers to simply use known-valid
third-party addresses -- such as random addrs from their target address
list -- as the forged source of the spam.  The end result for us end
users, is a massive increase in "spam blowback", which is what we've
seen since those MTAs implemented it. :(

--j.


source SENDER authentication ? (as opposed to SPF HOST authentication)

2006-08-30 Thread Michael Grey








Are there any SA methods that allow verification of the ‘sender’
of an email ? 

 

I am aware of SPF which can confirm that a host at ip
address x.x.x.x is authorized to send mail as from domain “A”, but
how about a means to confirm that ‘[EMAIL PROTECTED]’ actually
is a real user before accepting mail from him ? 

 

Thanks

 

 

Michael Grey

 

 








RE: FuzzyOCR Install - Issues processing ONLY Gif images.

2006-08-30 Thread Michael Grey
I did have libungif installed, but the rpm doesn't add some of the needed
support that libungif-progs provides.  That did the trick.

Thanks !

Michael Grey

-Original Message-
From: Tim Litwiller [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 29, 2006 8:29 PM
To: users@spamassassin.apache.org
Subject: Re: FuzzyOCR Install - Issues processing ONLY Gif images.

try changing your time out from 10 seconds to 15 or 20 and verify that 
giffix is installed and working correctly.
libungif-utils rpm on fedora
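
A quick way to check for the helpers mentioned here is to look them up on the PATH of the user that runs SpamAssassin. This is only a sketch: the list holds the two programs named in this thread (giftopnm and giffix), and a complete FuzzyOcr install may need others.

```python
import shutil

# Helper programs named in this thread; extend the tuple as needed.
HELPERS = ("giftopnm", "giffix")


def missing_helpers(names=HELPERS):
    """Return the helper names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_helpers()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all listed helpers found")
```

Run it as the same user, and with the same environment, that the mail filter uses, since a helper that is present for an interactive shell may still be missing from the filter's PATH.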

Michael Grey wrote:
>
> Installed FuzzyOCR and believe all the dependencies.
>
> Using the sample images I get a Pipe Error ONLY on gif images; 
> resulting in no hits on FUZZY_OCR.
>
> Pipe Command "/usr/bin/giftopnm -"
>
> Giftopnm exists in that path.
>
> Running giftopnm on the command line seems to work with no errors, 
> spitting out a binary file to stdout as expected.
>
> Any ideas of what might be missing ? ( Fedora Core 4 ).
>
> Thanks...
>
>
> Michael Grey
>
> - log / reports -
>
> Corrupted-gif.eml
>
> pts rule name description
>
>  -- 
> --
>
> 0.1 HTML_MESSAGE BODY: HTML included in message
>
> 3.0 BAYES_95 BODY: Bayesian spam probability is 95 to 99%
>
> [score: 0.9694]
>
> 1.5 FUZZY_OCR_WRONG_CTYPE BODY: Mail contains an image with wrong
>
> content-type set
>
> Image has format "GIF" but content-type is
>
> "image/jpeg"
>
> [2006-08-29 19:20:00] Debug mode: Image has format "GIF" but 
> content-type is "image/jpeg"
>
> [2006-08-29 19:20:01] Debug mode: Image is single non-interlaced...
>
> [2006-08-29 19:20:01] Unexpected error in pipe to external programs.
>
> Please check that all helper programs are installed and in the correct 
> path.
>
> (Pipe Command "/usr/bin/giftopnm -", Pipe exit code 1 (""), Temporary 
> file: "/tmp/.spamassassin23614sXR9Dltmp")
>
> [2006-08-29 19:20:01] Debug mode: FuzzyOcr ending successfully...
>
> bash-3.00$
>
> animated-gif.eml
>
> pts rule name description
>
>  -- 
> --
>
> 0.7 DATE_IN_PAST_06_12 Date: is 6 to 12 hours before Received: date
>
> 0.1 HTML_MESSAGE BODY: HTML included in message
>
> 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
>
> [score: 0.5000]
>
> [2006-08-29 19:22:12] Debug mode: Analyzing file with content-type 
> "image/gif"
>
> [2006-08-29 19:22:12] Debug mode: Image is single non-interlaced...
>
> [2006-08-29 19:22:12] Unexpected error in pipe to external programs.
>
> Please check that all helper programs are installed and in the correct 
> path.
>
> (Pipe Command "/usr/bin/giftopnm -", Pipe exit code 1 (""), Temporary 
> file: "/tmp/.spamassassin23644bPPq3jtmp")
>
> [2006-08-29 19:22:12] Debug mode: FuzzyOcr ending successfully...
>



FuzzyOCR Install - Issues processing ONLY Gif images.

2006-08-29 Thread Michael Grey








Installed FuzzyOCR and believe all the dependencies.

 

Using the sample images I get a Pipe Error ONLY on gif
images; resulting in no hits on FUZZY_OCR. 

Pipe Command "/usr/bin/giftopnm -" 

 

Giftopnm exists in that path.

 

Running giftopnm on the command line seems to work
with no errors, spitting out a binary file to stdout as expected.

 

Any ideas of what might be missing ? ( Fedora Core 4 ).

 

Thanks…


Michael Grey

 

 

 

 

 

- log / reports -

 

Corrupted-gif.eml

 

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.1 HTML_MESSAGE           BODY: HTML included in message
 3.0 BAYES_95               BODY: Bayesian spam probability is 95 to 99%
                            [score: 0.9694]
 1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Mail contains an image with wrong
                            content-type set
                            Image has format "GIF" but content-type is
                            "image/jpeg"

[2006-08-29 19:20:00] Debug mode: Image has format "GIF" but content-type is "image/jpeg"
[2006-08-29 19:20:01] Debug mode: Image is single non-interlaced...
[2006-08-29 19:20:01] Unexpected error in pipe to external programs.
  Please check that all helper programs are installed and in the correct path.
  (Pipe Command "/usr/bin/giftopnm -", Pipe exit code 1 (""), Temporary file: "/tmp/.spamassassin23614sXR9Dltmp")
[2006-08-29 19:20:01] Debug mode: FuzzyOcr ending successfully...

bash-3.00$ 

 

 

 

 

animated-gif.eml

 

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.7 DATE_IN_PAST_06_12     Date: is 6 to 12 hours before Received: date
 0.1 HTML_MESSAGE           BODY: HTML included in message
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5000]

[2006-08-29 19:22:12] Debug mode: Analyzing file with content-type "image/gif"
[2006-08-29 19:22:12] Debug mode: Image is single non-interlaced...
[2006-08-29 19:22:12] Unexpected error in pipe to external programs.
  Please check that all helper programs are installed and in the correct path.
  (Pipe Command "/usr/bin/giftopnm -", Pipe exit code 1 (""), Temporary file: "/tmp/.spamassassin23644bPPq3jtmp")
[2006-08-29 19:22:12] Debug mode: FuzzyOcr ending successfully...

 

 

 

 








RE: Adding 'SA scores' to all incoming mails

2006-08-24 Thread Michael Grey

Why is it 'better' ? I didn't say it was... 

Simply one of the possible approaches to getting the full headers.

Michael Grey


-Original Message-
From: John D. Hardin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 24, 2006 3:14 PM
To: Michael Grey
Cc: users@spamassassin.apache.org
Subject: RE: Adding 'SA scores' to all incoming mails

On Thu, 24 Aug 2006, Michael Grey wrote:

> In this example, all emails get an additional header :
> 
> X-Spam-score-breakdown calvin score 6.77/4.5  
> 
> "add_header all score-breakdown calvin score _HITS_/_REQD_ "

And that's better than this:

X-Spam-Status: No, score=3.5 required=5.0 tests=BAYES_50,FROM_EXCESS_QP,
FROM_SUBDOMAIN,HTML_COMMENTS,HTML_EMBED_IMG_04,HTML_MESSAGE,
HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,SARE_UNSUB38D,SPF_PASS,
SUBJECT_EXCESS_QP autolearn=disabled version=3.1.3

how?

I'm not saying it shouldn't be done, but that the scores and rule hits
are *already there* so why paste them in yet again?

> On Thu, 24 Aug 2006, list wrote:
> 
> > I'd like SA to make a extra line/section under all my mails where it 
> > tells what score the mail got (or maybe even which rules scored on the 
> > mail)  is there such a setting?
> > 
> > it would help me to finetune my SA.
> 
> You mean, actually paste the score into the body or attach it as
> another MIME body part?
> 
> Are the X-Spam-* headers not sufficient?

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Taking my gun away because I *might* shoot someone is like cutting
  my tongue out because I *might* yell "Fire!" in a crowded theater.
  -- Peter Venetoklis
---
 26 days until Talk Like a Pirate day



RE: Adding 'SA scores' to all incoming mails

2006-08-24 Thread Michael Grey
Check the docs for 'add_header', which goes in local.cf or user_prefs.
The key words here are 'add_header all', followed by the text and template
variables you want to have displayed; the rule scoring is another variable
that can be used.

In this example, all emails get an additional header :

X-Spam-score-breakdown: calvin score 6.77/4.5

"add_header all score-breakdown calvin score _HITS_/_REQD_ "

Good luck...
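
To make the add_header example concrete, here is a small sketch that mimics how the line turns into a header: the header name gets an X-Spam- prefix and each _TAG_ placeholder is replaced by a value. The function render_header and the tag dictionary are hypothetical stand-ins for illustration, not SpamAssassin's own code.

```python
import re


def render_header(name, template, tags):
    """Expand _TAG_ placeholders and prefix the header name with X-Spam-."""
    def expand(match):
        # Unknown tags are left untouched in this sketch.
        return str(tags.get(match.group(1), match.group(0)))

    value = re.sub(r"_([A-Z]+)_", expand, template).strip()
    return f"X-Spam-{name}: {value}"


if __name__ == "__main__":
    print(render_header("score-breakdown",
                        "calvin score _HITS_/_REQD_ ",
                        {"HITS": "6.77", "REQD": "4.5"}))
```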

Michael Grey


-Original Message-
From: John D. Hardin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 24, 2006 2:32 PM
To: list
Cc: users@spamassassin.apache.org
Subject: Re: Adding 'SA scores' to all incoming mails

On Thu, 24 Aug 2006, list wrote:

> I'd like SA to make a extra line/section under all my mails where it 
> tells what score the mail got (or maybe even which rules scored on the 
> mail)  is there such a setting?
> 
> it would help me to finetune my SA.

You mean, actually paste the score into the body or attach it as
another MIME body part?

Are the X-Spam-* headers not sufficient?

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Taking my gun away because I *might* shoot someone is like cutting
  my tongue out because I *might* yell "Fire!" in a crowded theater.
  -- Peter Venetoklis
---
 26 days until Talk Like a Pirate day



RE: SPF Scoring... SPF_NEUTRAL

2006-08-23 Thread Michael Grey
Sorry, I was too philosophical in my question... to rephrase;

In the standard SA config, should I expect to see an SPF_* rule hit returned
when the SPF return value is 'none' ?

Thanks

Mike

-Original Message-
From: Gino Cerullo [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 23, 2006 9:54 AM
To: users@spamassassin.apache.org
Subject: Re: SPF Scoring... SPF_NEUTRAL

On 23-Aug-06, at 12:45 PM, Michael Grey wrote:

> Since this is not a production system, we have had to do some MX  
> magic on a
> remote domain to push mail through this new system... that domain  
> doesn't
> have SPF enabled (curse you Network Solutions !)
>
> So the big question is really this : Should "NONE" get an SPF score ?

That is a matter of internal policy on your part. If you want to  
penalize domains for not having an SPF record you could give it a  
positive score. On the other hand, if you wish to reward them for not  
having an SPF record give them a negative score.

I believe the general consensus is to leave it alone. Especially  
since SPF is still quite new and still technically in an experimental  
stage.


--
Gino Cerullo

Pixel Point Studios
21 Chesham Drive
Toronto, ON  M3M 1W6

416-247-7740





RE: SPF Scoring... SPF_NEUTRAL

2006-08-23 Thread Michael Grey

Since this is not a production system, we have had to do some MX magic on a
remote domain to push mail through this new system... that domain doesn't
have SPF enabled (curse you Network Solutions !) 

So the big question is really this : Should "NONE" get an SPF score ?

Thanks

Mike
-Original Message-
From: Noel Jones [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 23, 2006 9:17 AM
To: Michael Grey
Cc: users@spamassassin.apache.org
Subject: Re: SPF Scoring... SPF_NEUTRAL

On 8/23/06, Michael Grey <[EMAIL PROTECTED]> wrote:
>
>
>
>
>
>
> Has anyone experienced SPF_* rules not actually being scored ?
>
> In the debug I see that it comes back as 'result: none' - shouldn't this
> come back as SPF_NEUTRAL ?
>
>

When the domain does not publish SPF records you get "result: none".
Test with a domain that does publish SPF records.
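
The behaviour described here can be pictured as a lookup from SPF result to rule name, where "none" simply has no entry. The sketch below uses only rule names that appear in the local.cf quoted in this thread; treating "none" as hitting no rule is an assumption drawn from this discussion and the debug output, not a statement about every SpamAssassin version.

```python
# SPF result -> rule name, for the envelope sender and for HELO.
ENVELOPE_RULES = {
    "pass": "SPF_PASS",
    "fail": "SPF_FAIL",
    "softfail": "SPF_SOFTFAIL",
    "neutral": "SPF_NEUTRAL",
}
HELO_RULES = {
    "pass": "SPF_HELO_PASS",
    "fail": "SPF_HELO_FAIL",
    "softfail": "SPF_HELO_SOFTFAIL",
    "neutral": "SPF_HELO_NEUTRAL",
}


def spf_rules(envelope_result, helo_result):
    """Return the rule names expected to hit; 'none' yields no rule."""
    hits = []
    for result, table in ((helo_result, HELO_RULES),
                          (envelope_result, ENVELOPE_RULES)):
        rule = table.get(result)
        if rule is not None:
            hits.append(rule)
    return hits


if __name__ == "__main__":
    print(spf_rules("none", "none"))
    print(spf_rules("neutral", "pass"))
```

With both checks returning "none", as in the debug output, the list is empty, so no score from the SPF_* lines in local.cf can apply however high they are set.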

-- 
Noel Jones


SPF Scoring... SPF_NEUTRAL

2006-08-23 Thread Michael Grey








 

Has anyone experienced SPF_* rules not actually being scored ?

In the debug I see that it comes back as ‘result: none’
– shouldn’t this come back as SPF_NEUTRAL ?

 

 

We are setting up SA with amavisd, and when running amavis
in debug mode

(amavisd -u amavis -g amavis debug-sa) I
can see it hit the spf checks; it comes back with

 

--- debug output ---

[2456] dbg: spf: checking HELO (helo=mail.yuki.com, ip=22.110.92.38)
[2456] dbg: spf: query for /22.110.92.38/mail.yuki.com: result: none, comment: SPF: domain of sender mail.yuki.com does not designate mailers
[2456] dbg: spf: checking EnvelopeFrom (helo=mail.yuki.com, ip=22.110.92.38, [EMAIL PROTECTED])
[2456] dbg: spf: query for [EMAIL PROTECTED]/22.110.92.38/mail.yuki.com: result: none, comment: SPF: domain of sender [EMAIL PROTECTED] does not designate mailers

 

 

In SA local.cf I have tweaked the scores arbitrarily way up
to try to ensure that the scoring is substantial enough to guarantee notice…

 

--- local.cf ---

score SPF_PASS 10

score SPF_HELO_PASS 10

score SPF_FAIL 12

score SPF_HELO_FAIL 13

score SPF_HELO_NEUTRAL 13

score SPF_HELO_SOFTFAIL 12 

score SPF_NEUTRAL 12

score SPF_SOFTFAIL 12

 

However, the header result in the email is :

 

--- email header ---

X-Spam-Status: No, score=2.047 tagged_above=-999 required=4.5
 tests=[BAYES_50=0.001, RCVD_IN_SORBS_DUL=2.046]

X-Spam-Score: 2.047

X-Spam-Level: **

 

Still no hits… Other score changes in local.cf are
effective; so if I modify RCVD_IN_SORBS_DUL= that change will be apparent in
the email header.

 

Any ideas ???

 


Many thanks.

 

Michael Grey

 

 

 

 








FW: Bayes SQL Errors

2006-08-21 Thread Michael Grey


Ryan,

I just did this myself the first time in the last week;

Be sure that all your operations on the DB are done as the user who is going
to be accessing it; ie: Spamassassin spamuser etc.

Not knowing the history of your install; 

In your SpamAssassin local.cf file you should have these lines, COMMENTED OUT
for now... You want SpamAssassin to use the Berkeley DB for the moment.
#   bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
#   bayes_sql_dsn   DBI:mysql:bayes:localhost:3306
#   bayes_sql_username  spamassassin
#   bayes_sql_password  spampassword
#   bayes_sql_override_username spamassassin

First be sure that your Berkeley DB is actually in version 3.x format by doing
'sa-learn --sync'
This will ensure that the DB format is 3.x compatible.

Next, do a sa-learn --backup >backup.txt.

Create the bayes DB in mysql, and then apply the tables using the templates (
that you obviously have ).

In mysql (as root) be sure to do :
    grant all privileges on bayes.* to [EMAIL PROTECTED] identified by 'spampassword';

Uncomment the bayes_* lines in SpamAssassin's local.cf, then su back to
spamassassin ( or whatever user is going to be accessing the db )

run 'sa-learn --restore ./backup.txt'  this places all the entries from
backup.txt into the mysql db. This should take a few minutes.
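
The order of the sa-learn steps above matters, so here is a dry-run sketch that only lists them alongside the state local.cf should be in for each one. Nothing is executed; the function name migration_plan and the phase labels are illustrative, and the database creation and grant steps are left out because they happen inside mysql.

```python
import shlex


def migration_plan(backup_path="backup.txt"):
    """Return (local.cf state, shell command) pairs; nothing is executed."""
    quoted = shlex.quote(backup_path)
    return [
        ("bayes_* SQL lines commented out", "sa-learn --sync"),
        ("bayes_* SQL lines commented out", f"sa-learn --backup > {quoted}"),
        ("bayes_* SQL lines uncommented", f"sa-learn --restore {quoted}"),
    ]


if __name__ == "__main__":
    for state, command in migration_plan("./backup.txt"):
        print(f"[{state}] {command}")
```

Each command should be run as the user that will access the database, as noted at the start of this message.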

From your errors, it looks like the import process into your db got messed
up. As root, go into mysql> and drop the bayes db and start again.

Good luck...

Michael Grey


-Original Message-
From: Ryan Kather [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 21, 2006 1:29 PM
To: