Re: Image Composition Analysis

2004-12-06 Thread Nix
On Thu, 02 Dec 2004, Matt Kettler stated:
> Actually, In my experience, DCC contains very little solicited
> bulk. It also contains much less solicited bulk mail than razor
> does. This is of course completely contrary to Razor's goal of not
> containing solicited email, and DCC's claim of not caring.

Bear in mind that DCC's default bulk threshold is so high (in the
millions if I read the code aright) that it's vanishingly unlikely that
anything which isn't explicitly reported as spam will get categorized as
spam by simple growth of counts.

This means that (given a default DCC and SpamAssassin configuration)
only spam explicitly reported as such to DCC will land up firing
DCC_CHECK. So it acts much like Razor.

(e.g. your mail had a DCC count of 11 when I received it, *far* below
the DCC bulk threshold.)
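
To make the count-versus-threshold point concrete, here is a minimal sketch that pulls the checksum counts out of an X-Spam-DCC style header value and compares them with a bulk threshold. The threshold value and both function names are assumptions for illustration; the real default is whatever the DCC and SpamAssassin configuration in use says.

```python
import re

# Hypothetical threshold, for illustration only. The real default is set in
# the DCC / SpamAssassin configuration and is not stated in this thread.
BULK_THRESHOLD = 1_000_000

def dcc_counts(header_value):
    """Return {checksum_name: count} parsed from an X-Spam-DCC style value."""
    counts = {}
    for name, value in re.findall(r"(\w+)=(\d+|many)", header_value):
        # DCC can report the word "many" instead of a number for a
        # saturated count; treat that as unbounded.
        counts[name] = float("inf") if value == "many" else int(value)
    return counts

def looks_bulk(header_value, threshold=BULK_THRESHOLD):
    """True if any reported checksum count reaches the threshold."""
    return any(c >= threshold for c in dcc_counts(header_value).values())

# Header value as it appears on the spam sample posted in this thread.
sample = "dalton 1182; Body=1 Fuz1=1 Fuz2=1"
print(dcc_counts(sample))
print(looks_bulk(sample))
print(looks_bulk("dalton 1182; Body=11 Fuz1=11 Fuz2=many"))
```

With counts of 1 or 11, nothing comes near such a threshold, so only a message whose count has been pushed to "many" would trip the check.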

> I'd treat the DCC and Razor design goals with a huge grain of salt
> compared to their real-world behaviors. Both have some FPs, but then
> again, so does every rule. Most of my FPs on either Razor or DCC are
> solicited bulk mail.

The DCC design goal seems to be working: more popular lists have higher
DCC counts. I don't think its major goal (determine actual bulkiness via
DCC counts) will be successful unless DCC achieves far greater
penetration than it has now, and unless people actually report *all*
their email (that traversed the net) to DCC as the DCC FAQ suggests
(generally by reporting everything and whitelisting local addresses).
Particularly legit mailing lists. :)

(I've used the DCC counts before to identify personally-addressed email
and split away otherwise-unrecognisable mailing list mail into separate
mail folders, without needing to hardwire info on return-paths into
procmailrc at all. `If we've received bulky mail from this return-path
before, treat it as a mailing list' sort of thing. You can't do *that*
with Razor.)

-- 
`The sword we forged has turned upon us
 Only now, at the end of all things do we see
 The lamp-bearer dies; only the lamp burns on.'


RE: Image Composition Analysis

2004-12-03 Thread Smart,Dan
I forget to be paranoid and suspicious sometimes.  :(



RE: Image Composition Analysis

2004-12-03 Thread Chris Santerre


>-Original Message-
>From: Smart,Dan [mailto:[EMAIL PROTECTED]
>Sent: Friday, December 03, 2004 9:59 AM
>To: users@spamassassin.apache.org
>Subject: RE: Image Composition Analysis
>
>
>Agree on DCC, it only tells if bulk and doesn't discriminate 
>on Spam or not.
>I have whitelists, and some home made rules which fix most 
>real newsletters.
>
*SNIP*

Oh dear! It's usually not a good idea to post negative-scoring rules to the
list. Guess what we are going to see in the next week of spam? :)

--Chris



RE: Image Composition Analysis

2004-12-03 Thread Smart,Dan
Agreed on DCC: it only tells you whether a message is bulk, and doesn't
discriminate between spam and non-spam.  I have whitelists, and some
home-made rules which rescue most real newsletters.

header   VMC_S_HAS_DATE  Subject =~ /[01]?\d[-\/][0-3]?\d[-\/](20)?0[3-9]/
describe VMC_S_HAS_DATE  VMC-Subject contains a date
score    VMC_S_HAS_DATE  -3.0

header   VMC_F_NEWS_LIST From =~ /([EMAIL PROTECTED]|[EMAIL PROTECTED])/i
describe VMC_F_NEWS_LIST VMC-From: news or list hostname in FQDN
score    VMC_F_NEWS_LIST -2.0

header   VMC_S_IS_NEWS   Subject =~ /\b(in review|news|list)/i
describe VMC_S_IS_NEWS   VMC-Sub news newsletter list
score    VMC_S_IS_NEWS   -2.0

header   VMC_S_FREQ      Subject =~ /\b(monday|daily|week|monthly)\b/i
describe VMC_S_FREQ      VMC-Sub monday daily week or monthly
score    VMC_S_FREQ      -2.5

header   VMC_S_MONTH     Subject =~ /\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/
describe VMC_S_MONTH     VMC-Sub has month probable newsletter
score    VMC_S_MONTH     -2.5
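
For anyone who wants to see what the Subject rules above would do before loading them, here is a rough Python sketch that applies the same patterns to a few subjects and adds up the scores. It is only an approximation of how SpamAssassin evaluates header rules, the helper names are made up for the example, and VMC_F_NEWS_LIST is left out because the archive redacted its pattern.

```python
import re

# (name, pattern, score) transcribed from the Subject rules above.
# "[-\/]" in the originals is written "[-/]" here; the meaning is the same.
SUBJECT_RULES = [
    ("VMC_S_HAS_DATE", re.compile(r"[01]?\d[-/][0-3]?\d[-/](20)?0[3-9]"), -3.0),
    ("VMC_S_IS_NEWS", re.compile(r"\b(in review|news|list)", re.I), -2.0),
    ("VMC_S_FREQ", re.compile(r"\b(monday|daily|week|monthly)\b", re.I), -2.5),
    ("VMC_S_MONTH",
     re.compile(r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"), -2.5),
]

def subject_hits(subject):
    """Names of the rules whose pattern matches somewhere in the subject."""
    return [name for name, rx, _ in SUBJECT_RULES if rx.search(subject)]

def subject_adjustment(subject):
    """Total score contributed by the matching rules."""
    return sum(score for _, rx, score in SUBJECT_RULES if rx.search(subject))

for subject in ("Daily news for Dec 3",
                "Weekly Newsletter 12/03/04",
                "re: our appreciation",
                "Monday: cheap pills"):
    print(subject_adjustment(subject), subject_hits(subject), subject)
```

Two things show up when trying it: "Weekly" does not hit VMC_S_FREQ because of the trailing \b after "week", and a spammer only needs a word like "Monday" in the subject to collect a negative score.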



Re: Image Composition Analysis

2004-12-02 Thread Bob Proulx
Matt Kettler wrote:
> Actually, In my experience, DCC contains very little solicited bulk. It 
> also contains much less solicited bulk mail than razor does. This is of 
> course completely contrary to Razor's goal of not containing solicited 
> email, and DCC's claim of not caring.

Agreed.  That is consistent with my other statement.

> > [Although it turns out that most don't because most people follow the
> > rules and whitelist subscribed mailing lists thereby avoiding logging
> > legitimate mailing list messages to the database.  And since well
> > behaved mailing lists are not the problem there is no reason to log
> > them there.]

Most non-spam sources are not logged in DCC because there is no reason
to do so.

Bob


Re: Image Composition Analysis

2004-12-02 Thread Matt Kettler
At 11:29 AM 12/2/2004, Bob Proulx wrote:
> > DCC seems to have a large number of _solicited_ bulk email in its
> > database, and my users get very upset when they sign up for junk email
> > and it gets marked anywhere near spam.
>
> Of course DCC will contain solicited bulk email in the database!  You
> *completely* misunderstand the entire purpose of DCC.  Please read
> along with me the first few paragraphs of the documentation.

Actually, in my experience, DCC contains very little solicited bulk. It 
also contains much less solicited bulk mail than razor does. This is of 
course completely contrary to Razor's goal of not containing solicited 
email, and DCC's claim of not caring.

This experience is also consistent with the mass-check results in 
STATISTICS-set3.txt for SA 3.0. DCC has a noticeably higher S/O ratio than 
Razor does.

OVERALL%   SPAM%     HAM%    S/O   RANK  SCORE  NAME
   4.936  10.1125  0.0301  0.997   0.79   2.17  DCC_CHECK
  35.260  71.0900  1.2980  0.982   0.38   1.51  RAZOR2_CHECK

When DCC fired in this test, 99.7% of the matches were really spam. For 
razor, 98.2% of its matches were really spam. Razor's total spam hit rate 
is MUCH higher, but its accuracy is worse.
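
The S/O figures can be reproduced from the spam and ham hit rates in the two rows above. This small sketch assumes S/O is simply the spam hit rate divided by the sum of the spam and ham hit rates, which does match the quoted 0.997 and 0.982; the function name is made up for the example.

```python
def s_over_o(spam_pct, ham_pct):
    """Share of a rule's hits that are spam, from per-corpus hit rates in %."""
    return spam_pct / (spam_pct + ham_pct)

# (SPAM%, HAM%) taken from the two mass-check rows quoted above.
rows = {
    "DCC_CHECK": (10.1125, 0.0301),
    "RAZOR2_CHECK": (71.0900, 1.2980),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{name:13s} S/O = {s_over_o(spam_pct, ham_pct):.3f}")
```

Because the inputs are percentages of each corpus, this treats the spam and ham corpora as equally sized, so it is a property of the rule in that test rather than a prediction for any particular mail stream.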

I'd treat the DCC and Razor design goals with a huge grain of salt compared 
to their real-world behaviors. Both have some FPs, but then again, so does 
every rule. Most of my FPs on either Razor or DCC are solicited bulk mail.

Also if most of your DCC problems are based on a particular sender, or only 
a few senders, you can configure DCC to not match that sender's mail using 
the whiteclnt file. I've not needed to do this, but it's easy to set up.




Re: Image Composition Analysis

2004-12-02 Thread Bob Proulx
Michael Barnes wrote:
> Matt Kettler wrote:
> > Yes, but DCC is still more reliable and faster. (I use both)
> 
> I had to score DCC with 0.1 because it has way too many false positives.
> [...]
> DCC seems to have a large number of _solicited_ bulk email in its
> database, and my users get very upset when they sign up for junk email
> and it gets marked anywhere near spam.

Of course DCC will contain solicited bulk email in the database!  You
*completely* misunderstand the entire purpose of DCC.  Please read
along with me the first few paragraphs of the documentation.

  man dcc

  DESCRIPTION

 The Distributed Checksum Clearinghouse or DCC is a cooperative,
 distributed system intended to detect "bulk" mail or mail sent to
 many people.  It allows individuals receiving a single mail
 message to determine that many other people have received
 essentially identical copies of the message and so reject or
 discard the message.

  How the DCC Is Used

 The DCC can be viewed as a tool for end users to enforce their
 right to "opt-in" to streams of bulk mail by refusing bulk mail
 except from sources in a "whitelist."  Whitelists are the
 responsibility of DCC clients, since only they know which bulk
 mail they solicited.

DCC is not about spam.  DCC is about bulk email.  Those are two
completely different things.  DCC is a tool to determine that other
people have received the same message that you just received.  If you
have subscribed to a mailing list then of course the mailing list
messages will be in DCC.

[Although it turns out that most don't because most people follow the
rules and whitelist subscribed mailing lists thereby avoiding logging
legitimate mailing list messages to the database.  And since well
behaved mailing lists are not the problem there is no reason to log
them there.]

Bob


Re: Image Composition Analysis

2004-12-02 Thread Michael Barnes
On Tue, Nov 30, 2004 at 07:25:45PM -0500, Matt Kettler wrote:
> Yes, but DCC is still more reliable and faster. (I use both)

I had to score DCC with 0.1 because it has way too many false positives.
My local.cf section dealing with this:


# too many false positives with this guy, meta corrected below

score   DCC_CHECK   0.1

meta    DCC_PYZOR   (DCC_CHECK && PYZOR_CHECK)
score   DCC_PYZOR   2.9


DCC seems to have a large number of _solicited_ bulk email in its
database, and my users get very upset when they sign up for junk email
and it gets marked anywhere near spam.

Just in my experience, I've noticed that there is a high correlation
between DCC positive hits & Pyzor positive hits & real spam, so I scored
it that way.
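
As a rough illustration of what that scoring does to a message's total, here is a small sketch. The DCC_CHECK and DCC_PYZOR scores are the ones from the local.cf fragment above; the PYZOR_CHECK score is a placeholder, since its value is not given here, and the function is made up for the example.

```python
# Scores from the local.cf fragment above. The PYZOR_CHECK value is a
# placeholder: the thread does not say what it was set to.
SCORES = {"DCC_CHECK": 0.1, "DCC_PYZOR": 2.9, "PYZOR_CHECK": 1.0}

def total_score(base_hits):
    """Sum the scores of the rules that fired, adding the meta rule when due."""
    hits = set(base_hits)
    if {"DCC_CHECK", "PYZOR_CHECK"} <= hits:
        hits.add("DCC_PYZOR")  # meta DCC_PYZOR (DCC_CHECK && PYZOR_CHECK)
    return round(sum(SCORES[name] for name in hits), 2)

print(total_score(["DCC_CHECK"]))                 # DCC alone barely counts
print(total_score(["DCC_CHECK", "PYZOR_CHECK"]))  # agreement adds the meta score
```

So a DCC hit by itself contributes almost nothing, and the 2.9 only arrives when Pyzor agrees.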

Mike

--
/-\
| Michael Barnes <[EMAIL PROTECTED]> |
| UNIX Systems Administrator  |
| College of William and Mary |
| Phone: (757) 879-3930   |
\-/


Re: Image Composition Analysis

2004-12-02 Thread Michael Barnes
On Tue, Nov 30, 2004 at 04:27:14PM -0600, Smart,Dan wrote:
> Catching image only E-mail with pornographic images is really
> difficult.  My users are offended when they get one, and wonder how
> I could not catch it.  Explaining that the document was text, filled
> with bayes poison, and the one porn image with no porn words in the
> document doesn't seem to have much of an impression on them.


Tell your users not to view HTML email, or set that up for them; at the
very least, have them not view images by default.

Most GUI-based email clients unnecessarily enable both by default, and
both are a privacy and security issue for the user.

BTW, I get bayes poisoned, image only mails and they score above 10 on
my system all the time.

Mike

-- 
/-\
| Michael Barnes <[EMAIL PROTECTED]> |
| UNIX Systems Administrator  |
| College of William and Mary |
| Phone: (757) 879-3930   |
\-/


Re: Image Composition Analysis

2004-12-02 Thread Jeff Chan
On Wednesday, December 1, 2004, 3:25:42 PM, John Hardin wrote:
> On Wed, 2004-12-01 at 14:35, Chris Santerre wrote:
>> We are seeing an increase in throw away domains being used to reroute
>> to other domains that will NEVER show up directly in a spam. All in
>> attempts to get past SURBL.

> I'm going to bring up this idea again, in a slightly different context
> this time:

> Perhaps it would be useful to have a SURBL list that is automatically
> generated daily from the registrars' notifications of domains that have
> been recently created. This information is available for free download -
> I'm pretty sure I posted the location here a while ago.

> The definition of "recently" might require some testing to set properly,
> perhaps a starting point would be one week.

> Granted this SURBL would be more subject to FPs than a hand-maintained
> list, so it should have a correspondingly lower default score. And it
> wouldn't help too much if spammers don't start using their throwaway
> domains immediately after registering them.

We still want SURBLs to be lists of domains (and a few IPs)
that have actually occurred in spams.  A list of all new
registrations could perhaps be used as an internal data
source, but I think it would have way too many false
positives to use alone.

The Outblaze data in ob.surbl.org somewhat fulfills your
suggestion since it contains only domains that have been
registered within the last 90 days *and* which have appeared
in a lot of spams lately.  It tends to work well.
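
The combination described here, recently registered and also seen in spam, can be sketched as a simple filter. Everything below is hypothetical: the domain names, the sighting counts, the min_sightings cut-off standing in for "a lot", and the function itself. It is not how the Outblaze data is actually built.

```python
from datetime import date, timedelta

def listable(domains, today, max_age_days=90, min_sightings=50):
    """Domains registered within max_age_days that were also seen in spam.

    domains maps name -> (registration_date, spam_sightings).
    min_sightings is an invented stand-in for "a lot of spams".
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name
        for name, (registered, sightings) in domains.items()
        if registered >= cutoff and sightings >= min_sightings
    )

# Made-up sample data for the illustration.
sample = {
    "new-and-spammy.example": (date(2004, 11, 20), 400),
    "new-but-quiet.example": (date(2004, 11, 25), 0),
    "old-and-spammy.example": (date(2003, 1, 10), 900),
}
print(listable(sample, today=date(2004, 12, 2)))
```

A list built from registration age alone would include the quiet new domain as well, which is where the false positives would come from.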

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



RE: Image Composition Analysis

2004-12-01 Thread John Hardin
On Wed, 2004-12-01 at 14:35, Chris Santerre wrote:
> We are seeing an increase in throw away domains being used to reroute
> to other domains that will NEVER show up directly in a spam. All in
> attempts to get past SURBL.

I'm going to bring up this idea again, in a slightly different context
this time:

Perhaps it would be useful to have a SURBL list that is automatically
generated daily from the registrars' notifications of domains that have
been recently created. This information is available for free download -
I'm pretty sure I posted the location here a while ago.

The definition of "recently" might require some testing to set properly,
perhaps a starting point would be one week.

Granted this SURBL would be more subject to FPs than a hand-maintained
list, so it should have a correspondingly lower default score. And it
wouldn't help too much if spammers don't start using their throwaway
domains immediately after registering them.

--
John Hardin
Internal Systems Administrator (Seattle)
CRS Retail Systems, Inc.
3400 188th Street SW, Suite 185
Lynnwood, WA 98037
voice: (425) 672-1304
  fax: (425) 672-0192
email: [EMAIL PROTECTED]
  web: http://www.crsretail.com
---
 If you smash a computer to bits with a mallet, that appears to count
 as encryption in the state of Nevada.
   - CRYPTO-GRAM 12/2001
---



RE: Image Composition Analysis

2004-12-01 Thread Chris Santerre
Yeah this is a definite candidate for SURBL. This is the
Huntsville-consulting spam gang:
http://www.spamhaus.org/SBL/sbl.lasso?query=SBL20528

353+ domains directly linked. This is going to be the next trend. The final
destination of this pron spam was throatstuffers . com, but it used a throw
away domain of marlacell . com as a forwarder. Not directly either. That
domain simply hosted a mirrored page of throatstuffers . com. 

We are seeing an increase in throw away domains being used to reroute to
other domains that will NEVER show up directly in a spam. All in attempts to
get past SURBL. No biggy, the more people that submit and manage SURBL the
faster they get added. 

However there has been discussion on blocking the final destinations via web
proxies and hosts files. I think we will begin to see an increase in
companies blocking these IPs or domains at the firewall or proxy server. 

It's actually helping some antispammers. We are able to tie more spammers
together thru looking at who is trying to get past SURBL thru throw away
domains. Some of the small guys are only rogues of the bigger ones. We got
people watching spammers six ways from Sunday. Funny how much they don't
realise we know ;)

--Chris 



Re: Image Composition Analysis

2004-12-01 Thread jdow
BAYES_00

Your Bayes filter thought there was a VERY strong indication that this
message was ham. I'd suggest the filter is in serious need of training
or else the message was extraordinarily well constructed.

{^_^}




RE: Image Composition Analysis

2004-12-01 Thread Smart,Dan
Attached is the spam that got through.  I changed the porn URL to not
offend.  It's a little mangled as it was forwarded by the user via Outlook,
and tags got mangled by my Sanitizer.

I capture the headers of all files, and here is what they look like.  The
bayes = 0 is what got this through.



>From filter  Wed Nov  3 01:29:14 2004
Return-Path: <[EMAIL PROTECTED]>
Received: from great.amberalist.com (great.amberalist.com [209.200.9.222])
by dalton.vul.com (Vulcan E-mail Relay) with SMTP id 56BD89BB2C
for <[EMAIL PROTECTED]>; Wed,  3 Nov 2004 01:29:14 -0600 (CST)
Received: from mail pickup service by kmanus.com with Microsoft SMTPSVC;
 Wed, 3 Nov 2004 14:17:54 -0800
Received: from 194.3.74.35 by by7fd.bay7.kmanus.com with HTTP;
Wed, 3 Nov 2004 14:17:54 GMT
X-Originating-IP: [194.3.74.35]
X-Originating-Email: [EMAIL PROTECTED]
X-Sender: [EMAIL PROTECTED]
From: Bebe <[EMAIL PROTECTED]>
To: X <[EMAIL PROTECTED]>
Subject: re: our appreciation
Date: 3 Nov 2004 14:17:54 -0500
Mime-Version: 1.0
Content-type: text/html
Message-ID: <[EMAIL PROTECTED]>
X-Spam-Checker-Version: SpamAssassin 2.64 (2004-01-11) on dalton.vul.com
X-Spam-DCC: : dalton 1182; Body=1 Fuz1=1 Fuz2=1
X-Spam-AWL: Auto_Whitelist=
X-Spam-Status: No, hits=1.7 required=6.5 tests=BAYES_00,CP_RANDOMWORD_10,
HTML_MESSAGE,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,OB_URI_RBL,
RCVD_IN_SBL,SARE_HTML_FSIZE_1ALL,WS_URI_RBL autolearn=no
version=2.64
X-Spam-Level: *
Status: RO
X-Status:
X-Keywords:
X-UID: 1219

==



FW our appreciation.htm
Description: Binary data


RE: Image Composition Analysis

2004-12-01 Thread Chris Santerre


>-Original Message-
>From: Martin Hepworth [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, December 01, 2004 4:39 AM
>To: Smart,Dan
>Cc: users@spamassassin.apache.org
>Subject: Re: Image Composition Analysis
>
>
>Dan
>
>I find the surbl.org URIRBL list provides very good protection against 
>this kind of message, along with other rules in www.rulesemporium.com. I 
>don't recall seeing one slip through for ages.
>

Forget that I'm partial to these both, but Martin is right. SURBL stops
them. I haven't seen one reported by my users in a very long time. I have
analysed many pron images. I will continue to do so over the coming years.
My hope is to find some sort of pattern. Maybe I can get a Gov't grant to
help me dig deeper into pron images full time. Yeah that would be cool. Long
hours, but I feel it would help everyone. 

:-) 

--Chris (I read them for the bayes poison... honest!) 


Re: Image Composition Analysis

2004-12-01 Thread Martin Hepworth
Dan

I find the surbl.org URIRBL list provides very good protection against 
this kind of message, along with other rules in www.rulesemporium.com. I 
don't recall seeing one slip through for ages.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300

Smart,Dan wrote:
> Messagelabs made a big deal of their option of using First 4 Internet's 
> Image Composition Analysis tool to detect pornographic images.  Is 
> anyone in the open source world working on something similar.
>
> Catching image only E-mail with pornographic images is really 
> difficult.  My users are offended when they get one, and wonder how I 
> could not catch it.  Explaining that the document was text, filled with 
> bayes poison, and the one porn image with no porn words in the document 
> doesn't seem to have much of an impression on them.
 



Re: Image Composition Analysis

2004-12-01 Thread John Andersen
On Tuesday 30 November 2004 01:27 pm, Smart,Dan wrote:
 
> Catching image only E-mail with pornographic images is really difficult. 
> My users are offended when they get one, and wonder how I could not catch
> it. Explaining that the document was text, filled with bayes poison, and
> the one porn image with no porn words in the document doesn't seem to have
> much of an impression on them.

Open the image with a text editor and challenge them to determine
if it is spam or not.  

Really, people this dumb should not be turned loose on the internet.

-- 
_
John Andersen




Re: Image Composition Analysis

2004-12-01 Thread jdow
I wonder what kind of a load it is on the filtering machine.
{^_-}
- Original Message - 
From: "Smart,Dan" <[EMAIL PROTECTED]>


> Messagelabs made a big deal of their option of using First 4 Internet's
> Image Composition Analysis tool to detect pornographic images.  Is anyone
> in the open source world working on something similar.
>
> Catching image only E-mail with pornographic images is really difficult.
> My users are offended when they get one, and wonder how I could not catch
> it.  Explaining that the document was text, filled with bayes poison, and
> the one porn image with no porn words in the document doesn't seem to have
> much of an impression on them.




RE: Image Composition Analysis

2004-12-01 Thread Matt Kettler
At 07:15 PM 11/30/2004, Smart,Dan wrote:
> So Razor differs from DCC in that respect.

Razor and DCC differ quite a bit when you get into the details. 
Particularly now that razor has the e8 algorithm, which is more like SURBL 
than it is like DCC.

> I gave up on Razor long ago due to delays due to slow Razor response, and
> repeated Razor outages.  Is it more reliable today?

Yes, but DCC is still more reliable and faster. (I use both)
Late last week one of the servers dropped off, but before that it was 
several months since the last outage that was long enough for me to notice.

I'd recommend keeping your razor_timeout low if delays are painful to your 
server, but razor is a worthwhile tool.

I'd also consider shortening the rediscovery_wait a bit from the two-day 
default to one-day or so. Usually outages are corrected if you run a 
discover, since the cloudmark crew tends to update the server list when one 
drops off.




RE: Image Composition Analysis

2004-12-01 Thread Smart,Dan
So Razor differs from DCC in that respect.

I gave up on Razor long ago due to delays due to slow Razor response, and
repeated Razor outages.  Is it more reliable today?



Re: Image Composition Analysis

2004-11-30 Thread Matt Kettler
At 05:27 PM 11/30/2004, Smart,Dan wrote:
> Messagelabs made a big deal of their option of using First 4 Internet's 
> Image Composition Analysis tool to detect pornographic images.  Is anyone 
> in the open source world working on something similar.

Not that I'm aware of. Nor am I particularly impressed with the First 4 
tool. It seems to operate mostly by detecting what percentage of an image 
is "skintone", leading to FPs on things like pictures of babies.

http://www.computerworld.com/securitytopics/security/story/0,10801,80431p2,00.html

The reviewer felt that 9 FPs out of 100 hits was acceptable. In SA 
terms that's an S/O of 0.91. While that's not bad, it's not exactly 
impressive either, particularly for something that's likely to be CPU 
intensive. The article doesn't describe in detail what the FN rate is; it 
only uses vague terms, but it doesn't sound very good either.

They also excused FNs on messages containing images made out of several 
small images. So right out of the box there's an evasion technique that 
spammers can use to avoid this tool with ease.

Really, if you're not using razor, you should. It's a better 
general-purpose solution for this problem, and likely to run at about the 
same speed.

Razor is able to spam-classify individual mime sections of messages based 
on reported SHA hashes. This way if a spam with that image is reported any 
other spam with that same mime section will hit.
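
The per-section idea can be sketched roughly like this: hash each decoded MIME part, and flag a message if any part's hash is already in a set of reported hashes. This is only an illustration using a plain SHA-1 over the decoded payload and Python's standard email package. It is not Razor's actual signature scheme, and the function names and sample messages are made up.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage

def part_digests(raw_message):
    """(content_type, sha1_hex) for every non-container MIME part."""
    msg = message_from_bytes(raw_message)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        payload = part.get_payload(decode=True) or b""
        digests.append((part.get_content_type(), hashlib.sha1(payload).hexdigest()))
    return digests

def matches_reported(raw_message, reported):
    """True if any part of the message has a digest in the reported set."""
    return any(digest in reported for _, digest in part_digests(raw_message))

def build(text, image_bytes):
    """Helper for the demo: a text body plus one attached image."""
    m = EmailMessage()
    m["Subject"] = "demo"
    m.set_content(text)
    m.add_attachment(image_bytes, maintype="image", subtype="gif", filename="x.gif")
    return m.as_bytes()

image = b"GIF89a-not-a-real-image"
first = build("random filler words, variant one", image)
second = build("completely different filler text", image)

# Pretend the image part of the first message was reported as spam.
reported = {digest for ctype, digest in part_digests(first) if ctype == "image/gif"}
print(matches_reported(second, reported))
```

The second message matches even though its text differs, because the image section is byte-for-byte the same. A spammer who alters the image slightly for each run would defeat an exact hash like this one.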

This will also help with the image-based pill spams too, not just the porn 
ones.






RE: Image Composition Analysis

2004-11-30 Thread Smart,Dan
The ones that get through have bayes poison at the bottom.  It did hit a
couple of the SARE rules that look for bayes poison, but didn't score enough
to kill it.  

Very well crafted.



Re: Image Composition Analysis

2004-11-30 Thread Evan Platt
Smart,Dan said:
> Messagelabs made a big deal of their option of using First 4 Internet's
> Image Composition Analysis tool to detect pornographic images.  Is anyone
> in the open source world working on something similar.
>
> Catching image only E-mail with pornographic images is really difficult.
> My users are offended when they get one, and wonder how I could not catch
> it.  Explaining that the document was text, filled with bayes poison, and
> the one porn image with no porn words in the document doesn't seem to have
> much of an impression on them.

Well, I'm only an interested end user, not an admin, nor could I set up SA
if my job depended on it; however, I did assist in the configuration a year
ago. Isn't there a rule for something like "Image only" e-mails with no
text in them? Modify that rule for additional points, so if an e-mail
consists of an image only it will score higher. I mean granted, there will
be the occasional message from their friend with "Here's a picture of my
new son" that's possibly an FP, but that should be few and far between.
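
For what it's worth, the "image only" test I have in mind could look something like the sketch below, written with Python's standard email package rather than as an SA rule. The max_text_chars cut-off and the function names are invented for the example, and it only looks at attached images, not at images linked from HTML.

```python
from email import message_from_bytes
from email.message import EmailMessage

def is_image_only(raw_message, max_text_chars=20):
    """True if the message carries an image and almost no text.

    max_text_chars is an arbitrary cut-off chosen for this illustration.
    """
    msg = message_from_bytes(raw_message)
    has_image = False
    text_chars = 0
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        if part.get_content_maintype() == "image":
            has_image = True
        elif part.get_content_maintype() == "text":
            # Count non-whitespace bytes only.
            text_chars += len(b"".join(payload.split()))
    return has_image and text_chars <= max_text_chars

def make_message(text, image_bytes=None):
    """Helper for the demo: a text body with an optional attached image."""
    m = EmailMessage()
    m["Subject"] = "test"
    m.set_content(text)
    if image_bytes is not None:
        m.add_attachment(image_bytes, maintype="image", subtype="jpeg",
                         filename="p.jpg")
    return m.as_bytes()

fake_jpeg = b"\xff\xd8not-a-real-jpeg"
print(is_image_only(make_message("hi", fake_jpeg)))
print(is_image_only(make_message(
    "Here's a picture of my new son, taken last weekend.", fake_jpeg)))
```

A friend's photo with a sentence of real text gets through, but so would a spam padded with filler words, so something like this could only ever add a few points.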

Just my .02.

Evan