Re: Image Composition Analysis
On Thu, 02 Dec 2004, Matt Kettler stated: > Actually, In my experience, DCC contains very little solicited > bulk. It also contains much less solicited bulk mail than razor > does. This is of course completely contrary to Razor's goal of not > containing solicited email, and DCC's claim of not caring. Bear in mind that DCC's default bulk threshold is so high (in the millions if I read the code aright) that it's vanishingly unlikely that anything which isn't explicitly reported as spam will get categorized as spam by simple growth of counts. This means that (given a default DCC and SpamAssassin configuration) only spam explicitly reported as such to DCC will land up firing DCC_CHECK. So it acts much like Razor. (e.g. your mail had a DCC count of 11 when I received it, *far* below the DCC bulk threshold.) > I'd treat the DCC and Razor design goals with a huge grain of salt > compared to their real-world behaviors. Both have some FPs, but then > again, so does every rule. Most of my FPs on either Razor or DCC are > solicited bulk mail. The DCC design goal seems to be working: more popular lists have higher DCC counts. I don't think its major goal (determine actual bulkiness via DCC counts) will be successful unless DCC achieves far greater penetration than it has now, and unless people actually report *all* their email (that traversed the net) to DCC as the DCC FAQ suggests (generally by reporting everything and whitelisting local addresses). Particularly legit mailing lists. :) (I've used the DCC counts before to identify personally-addressed email and split away otherwise-unrecognisable mailing list mail into separate mail folders, without needing to hardwire info on return-paths into procmailrc at all. `If we've received bulky mail from this return-path before, treat it as a mailing list' sort of thing. You can't do *that* with Razor.) 
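The procmail trick described above (filing mail as list mail once its return-path has previously sent high-DCC-count mail) can be sketched in Python. This is a minimal sketch, not the poster's actual setup: it assumes the counts are read from an X-Spam-DCC header in the format shown in Dan's headers elsewhere in this thread, and the local threshold of 10 (far below DCC's own bulk threshold) and the in-memory sender set are hypothetical choices; a real filter would persist the set between runs.

```python
import re

# Finds the Body count in a DCC result header such as the one in Dan's
# headers: "X-Spam-DCC: : dalton 1182; Body=1 Fuz1=1 Fuz2=1".
# Assumption: a count may also be reported as the word "many".
BODY_COUNT = re.compile(r"\bBody=(\d+|many)\b")


def body_count(dcc_header: str) -> float:
    """Return the DCC Body count, treating "many" as unbounded and a
    missing count as 0."""
    m = BODY_COUNT.search(dcc_header)
    if m is None:
        return 0
    return float("inf") if m.group(1) == "many" else int(m.group(1))


class ListSorter:
    """Hypothetical sorter: once a return-path has sent a message whose
    Body count reached the threshold, its mail is filed as list mail."""

    def __init__(self, threshold: int = 10):
        self.threshold = threshold
        self.bulky_senders: set[str] = set()

    def folder_for(self, return_path: str, dcc_header: str) -> str:
        sender = return_path.strip("<>").lower()
        if body_count(dcc_header) >= self.threshold:
            self.bulky_senders.add(sender)
        return "lists" if sender in self.bulky_senders else "inbox"


sorter = ListSorter(threshold=10)
print(sorter.folder_for("<list-owner@example.net>", "Body=11 Fuz1=11"))  # lists
print(sorter.folder_for("<list-owner@example.net>", "Body=2 Fuz1=2"))    # lists (seen before)
print(sorter.folder_for("<friend@example.org>", "Body=1 Fuz1=1"))        # inbox
```

The second call shows the point of the trick: once a return-path is known to be bulky, later low-count messages from it still go to the list folder without any hardwired procmail rules.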
-- `The sword we forged has turned upon us Only now, at the end of all things do we see The lamp-bearer dies; only the lamp burns on.'
RE: Image Composition Analysis
I forget to be paranoid and suspicious sometimes. :( > -Original Message- > From: Chris Santerre [mailto:[EMAIL PROTECTED] > Sent: Friday, December 03, 2004 9:12 AM > To: Smart,Dan; users@spamassassin.apache.org > Subject: RE: Image Composition Analysis > > *SNIP* > > Oh dear! It's usually not a good idea to post negative scoring rules to the list. Guess what we are going to see in the next week of spam? :) > > --Chris
RE: Image Composition Analysis
>-Original Message- >From: Smart,Dan [mailto:[EMAIL PROTECTED] >Sent: Friday, December 03, 2004 9:59 AM >To: users@spamassassin.apache.org >Subject: RE: Image Composition Analysis > > >Agree on DCC, it only tells if bulk and doesn't discriminate >on Spam or not. >I have whitelists, and some home made rules which fix most >real newsletters. > *SNIP* Oh dear! It's usually not a good idea to post negative scoring rules to the list. Guess what we are going to see in the next week of spam? :) --Chris
RE: Image Composition Analysis
Agree on DCC: it only tells whether mail is bulk and doesn't discriminate between spam and non-spam. I have whitelists, and some homemade rules which fix most real newsletters:

    header   VMC_S_HAS_DATE   Subject =~ /[01]?\d[-\/][0-3]?\d[-\/](20)?0[3-9]/
    describe VMC_S_HAS_DATE   VMC-Subject contains a date
    score    VMC_S_HAS_DATE   -3.0

    header   VMC_F_NEWS_LIST  From =~ /([EMAIL PROTECTED]|[EMAIL PROTECTED])/i
    describe VMC_F_NEWS_LIST  VMC-From: news or list hostname in FQDN
    score    VMC_F_NEWS_LIST  -2.0

    header   VMC_S_IS_NEWS    Subject =~ /\b(in review|news|list)/i
    describe VMC_S_IS_NEWS    VMC-Sub news newsletter list
    score    VMC_S_IS_NEWS    -2.0

    header   VMC_S_FREQ       Subject =~ /\b(monday|daily|week|monthly)\b/i
    describe VMC_S_FREQ       VMC-Sub monday daily week or monthly
    score    VMC_S_FREQ       -2.5

    header   VMC_S_MONTH      Subject =~ /\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/
    describe VMC_S_MONTH      VMC-Sub has month probable newsletter
    score    VMC_S_MONTH      -2.5

> -Original Message- > From: Michael Barnes [mailto:[EMAIL PROTECTED] > Sent: Thursday, December 02, 2004 9:18 AM > To: Matt Kettler > Cc: Smart,Dan; users@spamassassin.apache.org > Subject: Re: Image Composition Analysis > [...] > I had to score DCC with 0.1 because it has way too many false positives. [...]
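To see why Chris worries about publishing negative-scoring rules, here is a small Python sketch that transcribes Dan's four Subject rules and totals their scores for a made-up spam subject. The From rule is left out because its pattern was redacted in the archive. The patterns carry over from Perl to Python unchanged except that `\/` becomes `/`; the sample subject is invented for illustration.

```python
import re

# Dan's Subject rules: (pattern, flags, score). VMC_F_NEWS_LIST is omitted
# because its pattern was redacted in the archive.
SUBJECT_RULES = {
    "VMC_S_HAS_DATE": (r"[01]?\d[-/][0-3]?\d[-/](20)?0[3-9]", 0, -3.0),
    "VMC_S_IS_NEWS": (r"\b(in review|news|list)", re.IGNORECASE, -2.0),
    "VMC_S_FREQ": (r"\b(monday|daily|week|monthly)\b", re.IGNORECASE, -2.5),
    "VMC_S_MONTH": (r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)", 0, -2.5),
}


def score_subject(subject: str) -> tuple[list[str], float]:
    """Return the names of the rules that fire and their summed score."""
    hits = [name for name, (pattern, flags, _) in SUBJECT_RULES.items()
            if re.search(pattern, subject, flags)]
    total = sum(SUBJECT_RULES[name][2] for name in hits)
    return hits, total


# A made-up spam subject that trips all four rules; the total is -10.0.
hits, total = score_subject("Daily News for Dec 12/03/04")
print(hits, total)
```

A spammer who copies these patterns into a subject line knocks 10 points off the message, more than Dan's required=6.5 threshold. As a side effect of the `\b...\b` anchors, VMC_S_FREQ does not match "Weekly".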
Re: Image Composition Analysis
Matt Kettler wrote: > Actually, In my experience, DCC contains very little solicited bulk. It > also contains much less solicited bulk mail than razor does. This is of > course completely contrary to Razor's goal of not containing solicited > email, and DCC's claim of not caring. Agreed. That is consistent with my other statement. > > [Although it turns out that most don't because most people follow the > > rules and whitelist subscribed mailing lists thereby avoiding logging > > legitimate mailing list messages to the database. And since well > > behaved mailing lists are not the problem there is no reason to log > > them there.] Most non-spam sources are not logged in DCC because there is no reason to do so. Bob
Re: Image Composition Analysis
At 11:29 AM 12/2/2004, Bob Proulx wrote: > > DCC seems to have a large number of _solicited_ bulk email in its database, and my users get very upset when they sign up for junk email and it gets marked anywhere near spam. > Of course DCC will contain solicited bulk email in the database! You *completely* misunderstand the entire purpose of DCC. Please read along with me the first few paragraphs of the documentation.

Actually, in my experience, DCC contains very little solicited bulk. It also contains much less solicited bulk mail than Razor does. This is, of course, completely contrary to Razor's goal of not containing solicited email, and DCC's claim of not caring.

This experience is also consistent with the mass-check results in STATISTICS-set3.txt for SA 3.0. DCC has a noticeably higher S/O ratio than Razor does:

    OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
      4.936   10.1125   0.0301   0.997   0.79   2.17  DCC_CHECK
     35.260   71.0900   1.2980   0.982   0.38   1.51  RAZOR2_CHECK

When DCC fired in this test, 99.7% of the matches were really spam. For Razor, 98.2% of its matches were really spam. Razor's total spam hit rate is MUCH higher, but its accuracy is worse.

I'd treat the DCC and Razor design goals with a huge grain of salt compared to their real-world behaviors. Both have some FPs, but then again, so does every rule. Most of my FPs on either Razor or DCC are solicited bulk mail. Also, if most of your DCC problems are based on a particular sender, or only a few senders, you can configure DCC not to match that sender's mail using the whiteclnt file. I've not needed to do this, but it's easy to set up.
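The S/O column in the rows Matt quotes can be reproduced from the SPAM% and HAM% columns. This Python sketch assumes S/O = SPAM% / (SPAM% + HAM%), which matches both rows. Because it is computed from per-corpus percentages, it equals "the fraction of matches that were really spam" only when the spam and ham corpora are the same size.

```python
def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """S/O: the share of a rule's hit rate that comes from the spam corpus,
    computed from per-corpus hit percentages."""
    return spam_pct / (spam_pct + ham_pct)


for name, spam_pct, ham_pct in [("DCC_CHECK", 10.1125, 0.0301),
                                ("RAZOR2_CHECK", 71.0900, 1.2980)]:
    print(f"{name}: S/O = {spam_over_overall(spam_pct, ham_pct):.3f}")
# DCC_CHECK: S/O = 0.997
# RAZOR2_CHECK: S/O = 0.982
```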
Re: Image Composition Analysis
Michael Barnes wrote: > Matt Kettler wrote: > > Yes, but DCC is still more reliable and faster. (I use both) > > I had to score DCC with 0.1 because it has way too many false positives. > [...] > DCC seems to have a large number of _solicited_ bulk email in its > database, and my users get very upset when they sign up for junk email > and it gets marked anywhere near spam. Of course DCC will contain solicited bulk email in the database! You *completely* misunderstand the entire purpose of DCC. Please read along with me the first few paragraphs of the documentation. man dcc DESCRIPTION The Distributed Checksum Clearinghouse or DCC is a cooperative, distributed system intended to detect "bulk" mail or mail sent to many people. It allows individuals receiving a single mail message to determine that many other people have received essentially identical copies of the message and so reject or discard the message. How the DCC Is Used The DCC can be viewed as a tool for end users to enforce their right to "opt-in" to streams of bulk mail by refusing bulk mail except from sources in a "whitelist." Whitelists are the responsibility of DCC clients, since only they know which bulk mail they solicited. DCC is not about spam. DCC is about bulk email. Those are two completely different things. DCC is a tool to determine that other people have received the same message that you just received. If you have subscribed to a mailing list then of course the mailing list messages will be in DCC. [Although it turns out that most don't because most people follow the rules and whitelist subscribed mailing lists thereby avoiding logging legitimate mailing list messages to the database. And since well behaved mailing lists are not the problem there is no reason to log them there.] Bob
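The division of labour Bob quotes from the man page can be shown with a toy Python model. A shared server only counts checksums. Each client decides what is bulk and keeps its own whitelist of solicited senders, which is why well-behaved list mail never needs to be reported. SHA-1 over a whitespace- and case-normalized body and the `bulk_threshold` parameter are illustrative assumptions; DCC's real checksums (including its fuzzy ones) and thresholds are different.

```python
import hashlib
from collections import Counter


class Clearinghouse:
    """Toy stand-in for a DCC server: counts how many times each body
    checksum has been reported, whoever reported it."""

    def __init__(self) -> None:
        self.counts = Counter()

    def report(self, checksum: str) -> int:
        """Record one more recipient of this checksum; return the new count."""
        self.counts[checksum] += 1
        return self.counts[checksum]


def body_checksum(body: str) -> str:
    """Checksum of the body with whitespace and case normalized, so
    trivially reformatted copies collide. Not DCC's real checksums."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def classify(body: str, sender: str, server: Clearinghouse,
             whitelist: set[str], bulk_threshold: int) -> str:
    """Client-side decision. Whitelisted (solicited) senders are accepted
    without being reported; everything else is reported and judged bulk
    once enough copies have been seen."""
    if sender in whitelist:
        return "accept (whitelisted)"
    count = server.report(body_checksum(body))
    return "bulk" if count >= bulk_threshold else "accept"


server = Clearinghouse()
subscribed = {"announce@lists.example.org"}
for spam_sender in ("a@spam.example", "b@spam.example", "c@spam.example"):
    print(classify("Buy  NOW!", spam_sender, server, subscribed, bulk_threshold=3))
# accept, accept, bulk
print(classify("Issue 42", "announce@lists.example.org", server, subscribed, 3))
# accept (whitelisted), and nothing is reported for it
```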
Re: Image Composition Analysis
On Tue, Nov 30, 2004 at 07:25:45PM -0500, Matt Kettler wrote: > Yes, but DCC is still more reliable and faster. (I use both) I had to score DCC with 0.1 because it has way too many false positives. My local.cf section dealing with this:

    # too many false positives with this guy, meta corrected below
    score DCC_CHECK 0.1

    meta  DCC_PYZOR (DCC_CHECK && PYZOR_CHECK)
    score DCC_PYZOR 2.9

DCC seems to have a large number of _solicited_ bulk email in its database, and my users get very upset when they sign up for junk email and it gets marked anywhere near spam. Just in my experience, I've noticed that there is a high correlation between DCC positive hits, Pyzor positive hits and real spam, so I scored it that way. Mike -- /-\ | Michael Barnes <[EMAIL PROTECTED]> | | UNIX Systems Administrator | | College of William and Mary | | Phone: (757) 879-3930 | \-/
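The effect of Mike's three lines can be sketched as plain arithmetic. In SpamAssassin a meta rule's score is added on top of the scores of the rules it combines. A DCC-only hit therefore adds 0.1, while a DCC-plus-Pyzor hit adds 0.1, Pyzor's own score and 2.9. This is a minimal Python sketch; PYZOR_CHECK's score is not given in the post, so it is left as a parameter, and 1.0 below is only a placeholder.

```python
DCC_CHECK_SCORE = 0.1   # score DCC_CHECK 0.1
DCC_PYZOR_SCORE = 2.9   # score DCC_PYZOR 2.9


def dcc_pyzor_points(dcc_hit: bool, pyzor_hit: bool, pyzor_score: float) -> float:
    """Points contributed by DCC_CHECK, PYZOR_CHECK and the DCC_PYZOR meta.
    The meta rule's score is added on top of its component rules' scores."""
    total = 0.0
    if dcc_hit:
        total += DCC_CHECK_SCORE
    if pyzor_hit:
        total += pyzor_score  # PYZOR_CHECK keeps whatever score it already has
    if dcc_hit and pyzor_hit:  # meta DCC_PYZOR (DCC_CHECK && PYZOR_CHECK)
        total += DCC_PYZOR_SCORE
    return total


# pyzor_score=1.0 is a placeholder, not a value from the post.
for dcc, pyzor in [(True, False), (False, True), (True, True)]:
    print(dcc, pyzor, round(dcc_pyzor_points(dcc, pyzor, pyzor_score=1.0), 2))
```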
Re: Image Composition Analysis
On Tue, Nov 30, 2004 at 04:27:14PM -0600, Smart,Dan wrote: > Catching image only E-mail with pornographic images is really > difficult. My users are offended when they get one, and wonder how > I could not catch it. Explaining that the document was text, filled > with bayes poison, and the one porn image with no porn words in the > document doesn't seem to have much of an impression on them. Tell your users, or set it up for them, not to view HTML email, or at the very least not to view images by default. Both of these are unnecessarily enabled by default in most GUI-based email clients, and they are a privacy and security issue for the user. BTW, I get Bayes-poisoned, image-only mails and they score above 10 on my system all the time. Mike -- /-\ | Michael Barnes <[EMAIL PROTECTED]> | | UNIX Systems Administrator | | College of William and Mary | | Phone: (757) 879-3930 | \-/
Re: Image Composition Analysis
On Wednesday, December 1, 2004, 3:25:42 PM, John Hardin wrote: > On Wed, 2004-12-01 at 14:35, Chris Santerre wrote: >> We are seeing an increase in throw away domains being used to reroute >> to other domains that will NEVER show up directly in a spam. All in >> attempts to get passed SURBL. > I'm going to bring up this idea again, in a slightly different context > this time: > Perhaps it would be useful to have a SURBL list that is automatically > generated daily from the registrars' notifications of domains that have > been recently created. This information is available for free download - > I'm pretty sure I posted the location here a while ago. > The definition of "recently" might require some testing to set properly, > perhaps a starting point would be one week. > Granted this SURBL would be more subject to FPs than a hand-maintained > list, so it should have a correspondingly lower default score. And it > wouldn't help too much if spammers don't start using their throwaway > domains immediately after registering them. We still want SURBLs to be lists of domains (and a few IPs) that have actually occurred in spams. A list of all new registrations could perhaps be used as an internal data source, but I think it would have way too many false positives to use alone. The Outblaze data in ob.surbl.org somewhat fulfills your suggestion, since it contains only domains that have been registered within the last 90 days *and* which have appeared in a lot of spams lately. It tends to work well. Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
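Jeff's description of ob.surbl.org (domains registered within the last 90 days *and* seen in a lot of spam lately) amounts to the intersection of two conditions. This is a minimal Python sketch, assuming the registration dates and recent spam counts are already available as dictionaries. The thread doesn't say how Outblaze gathers that data or what counts as "a lot", so `min_spam_hits` is hypothetical.

```python
from datetime import date, timedelta


def ob_style_candidates(registered: dict[str, date],
                        recent_spam_hits: dict[str, int],
                        today: date,
                        max_age_days: int = 90,
                        min_spam_hits: int = 5) -> list[str]:
    """Domains that are both recently registered and seen in spam lately.
    min_spam_hits is a hypothetical stand-in for 'a lot of spams'."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        domain
        for domain, created in registered.items()
        if created >= cutoff and recent_spam_hits.get(domain, 0) >= min_spam_hits
    )


registered = {
    "new-and-spammy.example": date(2004, 11, 20),
    "new-and-quiet.example": date(2004, 11, 25),
    "old-and-spammy.example": date(2003, 1, 1),
}
spam_hits = {"new-and-spammy.example": 40, "old-and-spammy.example": 100}
print(ob_style_candidates(registered, spam_hits, today=date(2004, 12, 1)))
# ['new-and-spammy.example']
```

Dropping the spam condition gives a registration-only list, which is the version Jeff expects to have too many false positives on its own.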
RE: Image Composition Analysis
On Wed, 2004-12-01 at 14:35, Chris Santerre wrote: > We are seeing an increase in throw away domains being used to reroute > to other domains that will NEVER show up directly in a spam. All in > attempts to get passed SURBL. I'm going to bring up this idea again, in a slightly different context this time: Perhaps it would be useful to have a SURBL list that is automatically generated daily from the registrars' notifications of domains that have been recently created. This information is available for free download - I'm pretty sure I posted the location here a while ago. The definition of "recently" might require some testing to set properly, perhaps a starting point would be one week. Granted this SURBL would be more subject to FPs than a hand-maintained list, so it should have a correspondingly lower default score. And it wouldn't help too much if spammers don't start using their throwaway domains immediately after registering them. -- John Hardin Internal Systems Administrator (Seattle) CRS Retail Systems, Inc. 3400 188th Street SW, Suite 185 Lynnwood, WA 98037 voice: (425) 672-1304 fax: (425) 672-0192 email: [EMAIL PROTECTED] web: http://www.crsretail.com --- If you smash a computer to bits with a mallet, that appears to count as encryption in the state of Nevada. - CRYPTO-GRAM 12/2001 ---
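John's proposal, a list rebuilt daily from the registrars' new-domain notifications, reduces to a date filter over that feed. This is a minimal Python sketch. The feed format (one hypothetical `domain YYYY-MM-DD` pair per line) is an assumption, since the post doesn't describe the downloadable data. The one-week default follows his suggested starting point.

```python
from datetime import date, timedelta


def recent_registrations(feed_lines, today: date, window_days: int = 7) -> list[str]:
    """Domains whose creation date is within window_days of today.
    Assumed line format (hypothetical): 'example.com 2004-11-28'."""
    cutoff = today - timedelta(days=window_days)
    recent = set()
    for line in feed_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        domain, created = parts
        try:
            created_on = date.fromisoformat(created)
        except ValueError:
            continue  # skip unparseable dates
        if created_on >= cutoff:
            recent.add(domain.lower())
    return sorted(recent)


feed = [
    "BrandNew.example 2004-11-30",
    "last-month.example 2004-11-01",
    "",
    "not a valid line",
]
print(recent_registrations(feed, today=date(2004, 12, 1)))  # ['brandnew.example']
```

Any URI matching such a list would be weak evidence on its own, which is why John suggests a correspondingly lower default score.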
RE: Image Composition Analysis
Yeah, this is a definite candidate for SURBL. This is the Huntsville-consulting spam gang: http://www.spamhaus.org/SBL/sbl.lasso?query=SBL20528 353+ domains directly linked. This is going to be the next trend. The final destination of this pron spam was throatstuffers . com, but it used a throwaway domain, marlacell . com, as a forwarder. Not directly, either: that domain simply hosted a mirrored page of throatstuffers . com. We are seeing an increase in throwaway domains being used to reroute to other domains that will NEVER show up directly in a spam, all in attempts to get past SURBL. No biggie: the more people who submit to and manage SURBL, the faster they get added. However, there has been discussion of blocking the final destinations via web proxies and hosts files. I think we will begin to see an increase in companies blocking these IPs or domains at the firewall or proxy server. It's actually helping some antispammers. We are able to tie more spammers together by looking at who is trying to get past SURBL through throwaway domains. Some of the small guys are only rogues of the bigger ones. We've got people watching spammers six ways from Sunday. Funny how much they don't realise we know ;) --Chris >-Original Message- >From: Smart,Dan [mailto:[EMAIL PROTECTED] >Sent: Wednesday, December 01, 2004 4:57 PM >To: [EMAIL PROTECTED] >Subject: RE: Image Composition Analysis > > >Attached is the spam that got through. I changed the porn URL to not >offend. [...]
Re: Image Composition Analysis
BAYES_00: your Bayes filter thought there was a VERY strong indication that this message was ham. I'd suggest the filter is in serious need of training, or else the message was extraordinarily well constructed. {^_^} - Original Message - From: "Smart,Dan" <[EMAIL PROTECTED]> > Attached is the spam that got through. I changed the porn URL to not > offend. [...] > I capture the headers of all files, and here is what they look like. The > bayes = 0 is what got this through. [...] > X-Spam-Status: No, hits=1.7 required=6.5 tests=BAYES_00,CP_RANDOMWORD_10, > HTML_MESSAGE,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,OB_URI_RBL, > RCVD_IN_SBL,SARE_HTML_FSIZE_1ALL,WS_URI_RBL autolearn=no > version=2.64
RE: Image Composition Analysis
Attached is the spam that got through. I changed the porn URL so it wouldn't offend. It's a little mangled, as it was forwarded by the user via Outlook, and the tags got mangled by my Sanitizer. I capture the headers of all files, and here is what they look like. The bayes = 0 is what got this through.

    From filter Wed Nov 3 01:29:14 2004
    Return-Path: <[EMAIL PROTECTED]>
    Received: from great.amberalist.com (great.amberalist.com [209.200.9.222])
        by dalton.vul.com (Vulcan E-mail Relay) with SMTP id 56BD89BB2C
        for <[EMAIL PROTECTED]>; Wed, 3 Nov 2004 01:29:14 -0600 (CST)
    Received: from mail pickup service by kmanus.com with Microsoft SMTPSVC;
        Wed, 3 Nov 2004 14:17:54 -0800
    Received: from 194.3.74.35 by by7fd.bay7.kmanus.com with HTTP;
        Wed, 3 Nov 2004 14:17:54 GMT
    X-Originating-IP: [194.3.74.35]
    X-Originating-Email: [EMAIL PROTECTED]
    X-Sender: [EMAIL PROTECTED]
    From: Bebe <[EMAIL PROTECTED]>
    To: X <[EMAIL PROTECTED]>
    Subject: re: our appreciation
    Date: 3 Nov 2004 14:17:54 -0500
    Mime-Version: 1.0
    Content-type: text/html
    Message-ID: <[EMAIL PROTECTED]>
    X-Spam-Checker-Version: SpamAssassin 2.64 (2004-01-11) on dalton.vul.com
    X-Spam-DCC: : dalton 1182; Body=1 Fuz1=1 Fuz2=1
    X-Spam-AWL: Auto_Whitelist=
    X-Spam-Status: No, hits=1.7 required=6.5 tests=BAYES_00,CP_RANDOMWORD_10,
        HTML_MESSAGE,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,OB_URI_RBL,
        RCVD_IN_SBL,SARE_HTML_FSIZE_1ALL,WS_URI_RBL autolearn=no
        version=2.64
    X-Spam-Level: *
    Status: RO
    X-Status:
    X-Keywords:
    X-UID: 1219
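The X-Spam-Status line above carries the whole verdict: 1.7 points against a 6.5 threshold, with BAYES_00 among the hits. When headers are collected like this, a small parser shows how far each miss fell short and which rules fired. This is a minimal Python sketch, assuming the field layout of this SpamAssassin 2.64 header. Accepting `score=` in place of `hits=` is an assumption about other SpamAssassin versions.

```python
import re

NUMBER = r"(-?\d+(?:\.\d+)?)"


def parse_spam_status(value: str) -> dict:
    """Parse the value of an X-Spam-Status header like the one above.
    Folded continuation lines are joined first."""
    # Join folded lines, then remove the space after each comma so the
    # comma-separated test list becomes a single token.
    flat = re.sub(r",\s+", ",", " ".join(value.split()))
    verdict = flat.split(",", 1)[0]
    # SA 2.64 writes "hits="; "score=" is also accepted (assumed other format).
    hits = float(re.search(rf"\b(?:hits|score)={NUMBER}", flat).group(1))
    required = float(re.search(rf"\brequired={NUMBER}", flat).group(1))
    tests_match = re.search(r"\btests=(\S*)", flat)
    tests = tests_match.group(1) if tests_match else ""
    return {
        "is_spam": verdict == "Yes",
        "hits": hits,
        "required": required,
        "margin": round(required - hits, 2),
        "tests": [t for t in tests.split(",") if t],
    }


status = """No, hits=1.7 required=6.5 tests=BAYES_00,CP_RANDOMWORD_10,
    HTML_MESSAGE,MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,OB_URI_RBL,
    RCVD_IN_SBL,SARE_HTML_FSIZE_1ALL,WS_URI_RBL autolearn=no
    version=2.64"""
result = parse_spam_status(status)
print(result["margin"], result["tests"][0])  # 4.8 BAYES_00
```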
RE: Image Composition Analysis
>-Original Message- >From: Martin Hepworth [mailto:[EMAIL PROTECTED] >Sent: Wednesday, December 01, 2004 4:39 AM >To: Smart,Dan >Cc: users@spamassassin.apache.org >Subject: Re: Image Composition Analysis > > >Dan > >I find the surbl.org URIRBL list provides very good protection against >this kind of message, along with othe rules in www.rulesemporium.com I >don't recall seeing one slip through for ages.. > Forget that I'm partial to both of these, but Martin is right. SURBL stops them. I haven't seen one reported by my users in a very long time. I have analysed many pron images. I will continue to do so over the coming years. My hope is to find some sort of pattern. Maybe I can get a Gov't grant to help me dig deeper into pron images full time. Yeah, that would be cool. Long hours, but I feel it would help everyone. :-) --Chris (I read them for the bayes poison... honest!)
Re: Image Composition Analysis
Dan, I find the surbl.org URIRBL list provides very good protection against this kind of message, along with other rules in www.rulesemporium.com. I don't recall seeing one slip through for ages. -- Martin Hepworth Snr Systems Administrator Solid State Logic Tel: +44 (0)1865 842300 Smart,Dan wrote: Messagelabs made a big deal of their option of using First 4 Internet's Image Composition Analysis tool to detect pornographic images. Is anyone in the open source world working on something similar. Catching image only E-mail with pornographic images is really difficult. My users are offended when they get one, and wonder how I could not catch it. Explaining that the document was text, filled with bayes poison, and the one porn image with no porn words in the document doesn't seem to have much of an impression on them. ** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This footnote confirms that this email message has been swept for the presence of computer viruses and is believed to be clean. **
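Martin's SURBL recommendation works by looking up the domains of URLs in the message body in DNS. That is why it catches image spam whose only telling feature is a link. This is a minimal Python sketch of that lookup, assuming the combined zone multi.surbl.org and a naive last-two-labels domain reduction. SpamAssassin's own URI checks handle two-level TLDs and IP-address URLs more carefully. `is_listed` performs a live DNS query.

```python
import re
import socket

URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def uri_domains(html: str) -> set[str]:
    """Hostnames from http(s) URLs, cut to their last two labels.
    Naive: real SURBL clients handle two-level TLDs such as co.uk and
    look up IP-address URLs differently; IP literals are skipped here."""
    domains = set()
    for host in URL_HOST.findall(html):
        host = host.lower().strip(".")
        if re.fullmatch(r"[\d.]+", host):
            continue  # IP-address URL: not handled in this sketch
        labels = host.split(".")
        if len(labels) >= 2:
            domains.add(".".join(labels[-2:]))
    return domains


def surbl_query_name(domain: str, zone: str = "multi.surbl.org") -> str:
    """DNS name to look up: the domain prepended to the list's zone."""
    return f"{domain}.{zone}"


def is_listed(domain: str, zone: str = "multi.surbl.org") -> bool:
    """Listed domains resolve to a 127.0.0.x address; no answer means not listed."""
    try:
        return socket.gethostbyname(surbl_query_name(domain, zone)).startswith("127.")
    except OSError:
        return False


body = '<a href="http://www.Example.com/offer">click</a> <img src="http://img.example.net/a.gif">'
for d in sorted(uri_domains(body)):
    print(surbl_query_name(d))
# example.com.multi.surbl.org
# example.net.multi.surbl.org
```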
Re: Image Composition Analysis
On Tuesday 30 November 2004 01:27 pm, Smart,Dan wrote: > Catching image only E-mail with pornographic images is really difficult. > My users are offended when they get one, and wonder how I could not catch > it. Explaining that the document was text, filled with bayes poison, and > the one porn image with no porn words in the document doesn't seem to have > much of an impression on them. Open the image with a text editor and challenge them to determine if it is spam or not. Really, people this dumb should not be turned loose on the internet. -- _ John Andersen
Re: Image Composition Analysis
I wonder what kind of a load it is on the filtering machine. {^_-} - Original Message - From: "Smart,Dan" <[EMAIL PROTECTED]> > Messagelabs made a big deal of their option of using First 4 Internet's > Image Composition Analysis tool to detect pornographic images. Is anyone in > the open source world working on something similar. > > Catching image only E-mail with pornographic images is really difficult. My > users are offended when they get one, and wonder how I could not catch it. > Explaining that the document was text, filled with bayes poison, and the one > porn image with no porn words in the document doesn't seem to have much of > an impression on them. > > > <>
RE: Image Composition Analysis
At 07:15 PM 11/30/2004, Smart,Dan wrote: > So Razor differs from DCC in that respect. Razor and DCC differ quite a bit when you get into the details, particularly now that Razor has the e8 algorithm, which is more like SURBL than it is like DCC. > I gave up on Razor long ago due to delays due to slow Razor response, and repeated Razor outages. Is it more reliable today? Yes, but DCC is still more reliable and faster. (I use both) Late last week one of the servers dropped off, but before that it had been several months since the last outage long enough for me to notice. I'd recommend keeping your razor_timeout low if delays are painful to your server, but Razor is a worthwhile tool. I'd also consider shortening the rediscovery_wait a bit, from the two-day default to one day or so. Usually outages are corrected if you run a discover, since the Cloudmark crew tends to update the server list when one drops off.
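Matt's suggestion to shorten rediscovery_wait is a one-line change to Razor's agent configuration. This is a minimal Python sketch, assuming razor-agent.conf uses `key = value` lines and that rediscovery_wait is given in seconds, so that the two-day default would be 172800. Both are assumptions to check against your razor-agents documentation.

```python
import re

ONE_DAY = 24 * 60 * 60  # 86400 seconds; a two-day default would be 172800


def set_conf_option(conf_text: str, key: str, value) -> str:
    """Set 'key = value' in razor-agent.conf-style text, replacing the
    first existing line for that key or appending one if there is none."""
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key} = {value}"
    if pattern.search(conf_text):
        # A function replacement avoids backslash processing in new_line.
        return pattern.sub(lambda _m: new_line, conf_text, count=1)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


conf = "debuglevel = 3\nrediscovery_wait = 172800\n"
print(set_conf_option(conf, "rediscovery_wait", ONE_DAY), end="")
```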
RE: Image Composition Analysis
So Razor differs from DCC in that respect. I gave up on Razor long ago because of delays from slow Razor responses and repeated Razor outages. Is it more reliable today? > -Original Message- > From: Matt Kettler [mailto:[EMAIL PROTECTED] > Sent: Tuesday, November 30, 2004 5:12 PM > To: Smart,Dan; users@spamassassin.apache.org > Subject: Re: Image Composition Analysis > [...] > Razor is able to spam-classify individual mime sections of > messages based on reported SHA hashes. This way if a spam > with that image is reported any other spam with that same > mime section will hit. > > This will also help with the image-based pill spams too, not > just the porn ones.
Re: Image Composition Analysis
At 05:27 PM 11/30/2004, Smart,Dan wrote: > Messagelabs made a big deal of their option of using First 4 Internet's Image Composition Analysis tool to detect pornographic images. Is anyone in the open source world working on something similar. Not that I'm aware of. Nor am I particularly impressed with the First 4 tool. It seems to operate mostly by detecting what percentage of an image is "skintone", leading to FPs on things like pictures of babies. http://www.computerworld.com/securitytopics/security/story/0,10801,80431p2,00.html The reviewer felt that 9 FPs out of 100 hits was acceptable. In SA terms that's an S/O of 0.91. While that's not bad, it's not exactly impressive either, particularly for something that's likely to be CPU intensive. The article doesn't describe the FN rate in detail, using only vague terms, but it doesn't sound very good either. They also excused FNs on messages containing images made out of several small images. So right out of the box there's an evasion technique that spammers can use to avoid this tool with ease. Really, if you're not using Razor, you should. It's a better general-purpose solution for this problem, and likely to run at about the same speed. Razor is able to spam-classify individual MIME sections of messages based on reported SHA hashes. This way, if a spam with that image is reported, any other spam with that same MIME section will hit. This will also help with the image-based pill spams, not just the porn ones.
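Matt's point about Razor hashing individual MIME sections explains why Bayes poison doesn't help the spammer there: the poison text changes from copy to copy, but the image part stays byte-identical. This is a minimal Python sketch of per-part matching using the standard library's email package. SHA-1 over each decoded part is a simplification; Razor's actual signature schemes are different. `build_message` is a hypothetical demo helper.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage


def part_hashes(raw: bytes) -> set[str]:
    """SHA-1 of each decoded leaf MIME part of a raw message."""
    msg = message_from_bytes(raw)
    hashes = set()
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        payload = part.get_payload(decode=True) or b""
        hashes.add(hashlib.sha1(payload).hexdigest())
    return hashes


def matches_reported(raw: bytes, reported: set[str]) -> bool:
    """True if any MIME part of the message has a reported hash."""
    return not part_hashes(raw).isdisjoint(reported)


def build_message(text: str, image: bytes) -> bytes:
    """Demo helper: a text part (e.g. Bayes poison) plus one GIF attachment."""
    msg = EmailMessage()
    msg["Subject"] = "demo"
    msg.set_content(text)
    msg.add_attachment(image, maintype="image", subtype="gif", filename="x.gif")
    return msg.as_bytes()


image = b"GIF89a...same image bytes in every copy..."
reported = part_hashes(build_message("poison words alpha beta", image))
new_copy = build_message("entirely different poison gamma delta", image)
print(matches_reported(new_copy, reported))  # True: the image part matches
```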
RE: Image Composition Analysis
The ones that get through have bayes poison at the bottom. It did hit a couple of the SARE rules that look for bayes poison, but didn't score enough to kill it. Very well crafted. > -Original Message- > From: Evan Platt [mailto:[EMAIL PROTECTED] > Sent: Tuesday, November 30, 2004 4:53 PM > To: users@spamassassin.apache.org > Subject: Re: Image Composition Analysis > [...] > Isn't there a rule for > something like "Image only" e-mails with no text in them? > Modify that rule for additional points, so if a e-mail > consists of a image only it will score higher.
Re: Image Composition Analysis
Smart,Dan said: > Messagelabs made a big deal of their option of using First 4 Internet's > Image Composition Analysis tool to detect pornographic images. Is anyone > in > the open source world working on something similar. > > Catching image only E-mail with pornographic images is really difficult. > My > users are offended when they get one, and wonder how I could not catch it. > Explaining that the document was text, filled with bayes poison, and the > one > porn image with no porn words in the document doesn't seem to have much of > an impression on them. Well, I'm only an interested end user, not an admin, nor could I set up SA if my job depended on it; however, I did assist in the configuration a year ago... Isn't there a rule for something like "image only" e-mails with no text in them? Modify that rule for additional points, so that if an e-mail consists of an image only, it will score higher. Granted, there will be the occasional message from their friend with "Here's a picture of my new son" that's possibly an FP, but that should be few and far between. Just my .02. Evan
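Evan's "image only" idea can be checked directly on the HTML body: at least one image, and little or no text. This is a minimal Python sketch using the standard library's HTMLParser. The 20-word cutoff is a hypothetical choice. Words hidden by styling still count as text here, so this check would not catch messages padded with Bayes poison like the ones Dan describes.

```python
from html.parser import HTMLParser


class ImageTextCounter(HTMLParser):
    """Counts <img> tags and words of text outside <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.images = 0
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; self-closing <img/> also lands here.
        if tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def looks_image_only(html: str, max_words: int = 20) -> bool:
    """True if the HTML has at least one image and at most max_words words."""
    counter = ImageTextCounter()
    counter.feed(html)
    counter.close()
    return counter.images > 0 and counter.words <= max_words


print(looks_image_only('<html><body><img src="http://img.example/x.jpg"></body></html>'))  # True
print(looks_image_only("<p>" + "poison " * 50 + '</p><img src="x.jpg">'))  # False
```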
Image Composition Analysis
Messagelabs made a big deal of their option of using First 4 Internet's Image Composition Analysis tool to detect pornographic images. Is anyone in the open source world working on something similar? Catching image-only e-mail with pornographic images is really difficult. My users are offended when they get one, and wonder how I could not catch it. Explaining that the message was text filled with Bayes poison plus one porn image, with no porn words anywhere in the document, doesn't seem to make much of an impression on them.