Re: pill image spam learns to walk

2010-01-12 Thread Matus UHLAR - fantomas
 Ted Mittelstaedt wrote on Mon, 11 Jan 2010 15:27:07 -0800:
 It simply means that sites WITHOUT a PTR are still fully compliant mailers.

 Kai Schaetzl wrote:
 This has nothing to do with RFC-compliance, but with policy, well 
 accepted policy. 

On 11.01.10 20:42, Ted Mittelstaedt wrote:
 Policy that should be handled in SA and not the MTA, which I've said  
 twice now.

It would not be a policy then. There are sites/admins who enforce this
policy at SMTP level. And it's their decision.

If you don't have a PTR record, complain to your ISP rather than to the
admins who set that policy.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Windows 2000: 640 MB ought to be enough for anybody


Re: pill image spam learns to walk

2010-01-12 Thread Mike Cardwell

On 12/01/2010 06:28, Chip M. wrote:


Presently it renders them as plain text. I'm fully aware of the
potential problems with it. Ideally I'd like to be able to render
those parts as HTML, but I need to be 100% sure that I've stripped
out anything dangerous (including embedded remote content by
default) first. It's on the ToDo List page.


Nice job Mike! :)

I wrestled with that same issue when I added direct viewing of HTML
content to my offline analysis/FP-pipeline/MassChecks tool.

Originally, I was using an ActiveX wrapper around IE, which (of
course) made me nervous.  I added some VERY simple, crude tag
stripping (script, iframe, style), but was never happy with it.
I ended up switching to an open source HTML rendering component
which :) lacked support for all the scary stuff.

Whatever you decide to do, please do post more about it, and q'pla!


I shall. There are a multitude of modules on CPAN for fixing up HTML and 
stripping out tags. I just need to find time to test them. I've got to 
figure out how to cleanse the CSS as well. E.g., you can execute 
JavaScript from CSS with stuff like: 
background:url(javascript:someFunction();)
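
A minimal sketch of one blunt way to handle the CSS case, assuming it is acceptable to throw away a whole style value rather than try to repair it. The function name and the token list are illustrative assumptions, not something Spamalyser is known to use, and a deny-list like this is not a complete sanitiser.

```python
import re

# Tokens that can trigger script execution or remote fetches from CSS.
# Backslashes and comments are rejected too, because CSS escapes and
# comments can be used to disguise the other tokens.
_SUSPICIOUS = re.compile(
    r"url\s*\(|expression\s*\(|@import|javascript:|\\|/\*",
    re.IGNORECASE,
)

def safe_style(style: str) -> str:
    """Return the style unchanged if it looks inert, else an empty string."""
    return "" if _SUSPICIOUS.search(style) else style
```

The example from this message, background:url(javascript:someFunction();), comes back as an empty string, while a plain "color: red" passes through untouched. Dropping the whole value avoids the problem of a partial rewrite leaving something executable behind.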



I'm also aware of the issues surrounding people potentially
uploading images and then linking to them from spam websites or
spam. That's why I've put http referer restrictions in place.


Perhaps redirecting to an image saying something like
this is spam? :)


Then people couldn't share direct links to email parts such as images. 
For example, if I went to http://spamalyser.com/v/6xnb26gp/ and clicked 
on the image, it would give me a direct link to the image. I might then 
IM that link to somebody. When they click on the URL, the referer won't 
be valid and I don't want it to display a "This is spam" image. So what 
it does is redirect you back to http://spamalyser.com/v/6xnb26gp/ and 
jump to the point on the page where the image is displayed. It's a 
little difficult to explain.
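
The redirect behaviour described above can be sketched roughly as follows. This is only an illustration: the function name, the return shape and the "part_2" style anchor are assumptions, and the Referer header is trivially forgeable, so this discourages hotlinking rather than preventing it.

```python
from urllib.parse import urlparse

SITE_HOST = "spamalyser.com"  # host taken from the URLs in this thread

def image_response(referer, paste_id, part_anchor):
    """Decide how to answer a direct request for a decoded image part.

    Returns ("serve", None) when the request came from one of the site's
    own pages, otherwise ("redirect", url) pointing at the paste page,
    positioned at the part via a fragment identifier.
    """
    host = urlparse(referer).hostname if referer else None
    if host == SITE_HOST:
        return ("serve", None)
    return ("redirect", f"http://{SITE_HOST}/v/{paste_id}/#{part_anchor}")
```

A request with no Referer, or one from another site, is sent to the page that embeds the image instead of getting the raw image or a placeholder.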



What about requiring registration?  Yes, it's not enough to
stop the most determined, but will whittle it down to the least
stupid.


Requiring registration in order to paste emails won't get rid of the 
problem. Requiring registration in order to read the pasted emails would 
completely solve the problem, however I think that would also stop most 
people from using the service. I'm trying to keep it simple.


Anywho, this is probably getting off topic now.

--
Mike Cardwell: UK based IT Consultant, LAMP developer, Linux admin
Cardwell IT Ltd. : UK Company - http://cardwellit.com/   #06920226
Technical Blog   : Tech Blog  - https://secure.grepular.com/blog/
Spamalyser   : Spam Tool  - http://spamalyser.com/


Re: pill image spam learns to walk

2010-01-12 Thread Henrik K
On Tue, Jan 12, 2010 at 10:15:32AM +, Mike Cardwell wrote:
 On 12/01/2010 06:28, Chip M. wrote:

 Presently it renders them as plain text. I'm fully aware of the
 potential problems with it. Ideally I'd like to be able to render
 those parts as HTML, but I need to be 100% sure that I've stripped
 out anything dangerous (including embedded remote content by
 default) first. It's on the ToDo List page.

 Nice job Mike! :)

 I wrestled with that same issue when I added direct viewing of HTML
 content to my offline analysis/FP-pipeline/MassChecks tool.

 Originally, I was using an ActiveX wrapper around IE, which (of
 course) made me nervous.  I added some VERY simple, crude tag
 stripping (script, iframe, style), but was never happy with it.
 I ended up switching to an open source HTML rendering component
 which :) lacked support for all the scary stuff.

 Whatever you decide to do, please do post more about it, and q'pla!

 I shall. There are a multitude of modules on cpan for fixing up html and  
 stripping out tags. I just need to find time to test them. I've got to  
 figure out how to cleanse the CSS as well. Eg, you can execute  
 javascript from CSS with stuff like:  
 background:url(javascript:someFunction();)

IMO whatever you do, there will always be some hole to be found. Your only
safe option is to render the HTML into an image and display that. It will also
always be consistent and not depend on the browser version.



Re: pill image spam learns to walk

2010-01-12 Thread Kai Schaetzl
Ted, sorry, but your case is lost (since long, look around) and I won't 
bite in such an off-topic discussion here. Please stop telling others that 
refusing to accept mail from non-rDNS machines is incorrect. If you 
*prefer* to handle this at SA level, that's your choice and you can tell 
that. But stop saying in this authoritative way that it is the only 
reputable (=correct) way. It is definitely not.

My last bits on this topic.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





[OT] spamalyser, was Re: pill image spam learns to walk

2010-01-12 Thread Mike Cardwell
On 12/01/2010 10:24, Henrik K wrote:

 Presently it renders them as plain text. I'm fully aware of the
 potential problems with it. Ideally I'd like to be able to render
 those parts as HTML, but I need to be 100% sure that I've stripped
 out anything dangerous (including embedded remote content by
 default) first. It's on the ToDo List page.

 Nice job Mike! :)

 I wrestled with that same issue when I added direct viewing of HTML
 content to my offline analysis/FP-pipeline/MassChecks tool.

 Originally, I was using an ActiveX wrapper around IE, which (of
 course) made me nervous.  I added some VERY simple, crude tag
 stripping (script, iframe, style), but was never happy with it.
 I ended up switching to an open source HTML rendering component
 which :) lacked support for all the scary stuff.

 Whatever you decide to do, please do post more about it, and q'pla!

 I shall. There are a multitude of modules on cpan for fixing up html and  
 stripping out tags. I just need to find time to test them. I've got to  
 figure out how to cleanse the CSS as well. Eg, you can execute  
 javascript from CSS with stuff like:  
 background:url(javascript:someFunction();)
 
 IMO whatever you do, there will always be some hole to be found. Your only
 safe option is to render the HTML into image and display that. It will also
 be always consistent and not depend on browser version.

That was a good suggestion and something I hadn't considered. I've
updated Spamalyser to generate PDFs from HTML parts using the WebKit
rendering engine and Qt. So the HTML should look the same as on any
WebKit-based user agent. From my tests so far, it's an accurate
representation of what you see in your email client. It handles remote
content like images and CSS fine, and also content attached to the email
with Content-ID headers referenced by cid: URIs. Here's a prime example:
http://spamalyser.com/v/jfv3iz0l/mime#part_1.2
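
For readers unfamiliar with cid: references, here is a small sketch of how attached parts can be looked up by Content-ID before rendering. It uses Python's standard email package purely for illustration (Spamalyser itself is not written this way as far as this thread says), and the function name is an assumption.

```python
from email import message_from_string

def cid_map(raw_message: str) -> dict:
    """Map each Content-ID (without angle brackets) to its decoded payload bytes."""
    msg = message_from_string(raw_message)
    parts = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid and not part.is_multipart():
            # get_payload(decode=True) undoes base64/quoted-printable encoding
            parts[cid.strip().strip("<>")] = part.get_payload(decode=True)
    return parts
```

An <img src="cid:logo@example"> in the HTML part would then be resolved by looking up "logo@example" in the returned dictionary instead of fetching anything over the network.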

PDF is better than an image because it allows you to maintain the links
in the document. A PNG thumbnail generated from the PDF is displayed
alongside text/html parts. Clicking that preview image takes you to the
PDF.

I've also tweaked some of the styling so the headers are easier to read.

I've also set up a mailman based mailing list which is linked to from
http://spamalyser.com/ so if anyone wants to discuss anything further to
do with Spamalyser the discussion should probably move there. Any
further announcements will happen there, not here.

-- 
Mike Cardwell: UK based IT Consultant, LAMP developer, Linux admin
Cardwell IT Ltd. : UK Company - http://cardwellit.com/   #06920226
Technical Blog   : Tech Blog  - https://secure.grepular.com/blog/
Spamalyser   : Spam Tool  - http://spamalyser.com/


Re: [OT] spamalyser, was pill image spam learns to walk

2010-01-12 Thread Kai Schaetzl
Mike Cardwell wrote on Tue, 12 Jan 2010 20:22:44 +:

 It handles remote
 content like images and CSS fine

Tip: I would not fetch remote content at all, as doing so can confirm to the
spammer that the address is live (account verification).

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





pill image spam learns to walk

2010-01-11 Thread Jason Haar
Hi there

We've been getting a few of these leaking through in the past couple of
weeks.

http://pastebin.com/m574da717

They aren't triggering (enough) network rule matches, contain a
bayes-killer, and even FuzzyOCR can't manage the swirly image trick they
pull. Has anyone come up with a way to fight these? (I've actually added
all the phrases that occur in this image to FuzzyOCR - didn't help)


Thanks

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



Re: pill image spam learns to walk

2010-01-11 Thread --[ UxBoD ]--
- Mike Cardwell spamassassin-us...@lists.grepular.com wrote:

| On 11/01/2010 10:22, Jason Haar wrote:
|  Hi there
| 
|  We've been getting a few of these leaking through in the past couple
| of
|  weeks.
| 
|  http://pastebin.com/m574da717
| 
|  They aren't triggering (enough) network rule matches, contain a
|  bayes-killer, and even FuzzyOCR can't manage the swirly image trick
| they
|  pull. Has anyone come up with a way to fight these? (I've actually
| added
|  all the phrases that occur in this image to FuzzyOCR - didn't help)
| 
| I just copied and pasted that out of pastebin into a little project
| I've 
| been working on. Here's the result:
| 
| http://spamalyser.com/v/6xnb26gp/mime
| 
| Unlike with pastebin, it mime decodes emails and you can see the
| decoded 
| image at the bottom of that page.
| 

That is awesome, Mike! really helps to visualise.

--
Thanks - Phil


Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
scores these new tests on 3.3.0

*  1.0 FORGED_TBIRD_IMG_SIZE Likely forged Thunderbird image spam
*  1.0 FORGED_TBIRD_IMG_ARROW Likely forged Thunderbird image spam

and you could add, say 4.0, for each mail coming thru your SF.net alias 
and not coming from SF.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Charles Gregory
On Mon, 11 Jan 2010, Mike Cardwell wrote:
: I just copied and pasted that out of pastebin into a little project I've 
: been working on. Here's the result:
: http://spamalyser.com/v/6xnb26gp/mime

Question: What does Spamalyser do with an HTML message part?
It is of concern (naturally) that implanted malicious scripts not be 
rendered whole and complete 

- C


Re: pill image spam learns to walk

2010-01-11 Thread Mike Cardwell

On 11/01/2010 14:55, Charles Gregory wrote:

On Mon, 11 Jan 2010, Mike Cardwell wrote:
: I just copied and pasted that out of pastebin into a little project I've
: been working on. Here's the result:
: http://spamalyser.com/v/6xnb26gp/mime

Question: What does spamalyzer do with an HTML message part?
It is of concern (naturally) that implanted malicious scripts not be
rendered whole and complete


Presently it renders them as plain text. I'm fully aware of the 
potential problems with it. Ideally I'd like to be able to render those 
parts as HTML, but I need to be 100% sure that I've stripped out 
anything dangerous (including embedded remote content by default) first. 
It's on the ToDo List page.


I'm also aware of the issues surrounding people potentially uploading 
images and then linking to them from spam websites or spam. That's why 
I've put http referer restrictions in place.


--
Mike Cardwell: UK based IT Consultant, LAMP developer, Linux admin
Cardwell IT Ltd. : UK Company - http://cardwellit.com/   #06920226
Technical Blog   : Tech Blog  - https://secure.grepular.com/blog/
Spamalyser   : Spam Tool  - http://spamalyser.com/


Re: pill image spam learns to walk

2010-01-11 Thread Terry Carmen

On 01/11/2010 05:22 AM, Jason Haar wrote:

Hi there

We've been getting a few of these leaking through in the past couple of
weeks.

http://pastebin.com/m574da717

They aren't triggering (enough) network rule matches, contain a
bayes-killer, and even FuzzyOCR can't manage the swirly image trick they
pull. Has anyone come up with a way to fight these? (I've actually added
all the phrases that occur in this image to FuzzyOCR - didn't help)

Unless you changed the headers, it looks like it came from an IP with no 
reverse DNS entry.


This is easy enough to stop dead in its tracks at your MTA. If there 
isn't any reverse DNS, the chances of it being a legitimate mail server 
are pretty slim.
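
As a rough illustration of the check being described (not any particular MTA's implementation), the sketch below builds the reverse-DNS name for an address and asks whether a reverse lookup returns a hostname. The function names are assumptions, and the lookup is injectable so the logic can be exercised without touching the network.

```python
import ipaddress
import socket

def ptr_query_name(ip: str) -> str:
    """Return the reverse-DNS name that a PTR lookup for this address queries."""
    return ipaddress.ip_address(ip).reverse_pointer

def has_ptr(ip: str, lookup=socket.gethostbyaddr) -> bool:
    """True if a reverse lookup for the address returns a hostname."""
    try:
        hostname, _aliases, _addrs = lookup(ip)
    except (socket.herror, socket.gaierror):
        return False
    return bool(hostname)
```

Note that socket.gethostbyaddr goes through the system resolver, so a local hosts file entry can also satisfy it. A production check would normally query DNS directly and treat temporary failures differently from "no such record".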


Terry



Re: pill image spam learns to walk

2010-01-11 Thread Alex
Hi,

 Unless you changed the headers, it looks like it came from an IP with no
 reverse DNS entry.

 This is easy enough to stop dead in it's tracks at your MTA. If there isn't
 any reverse DNS, the chances of it being a legitimate mail server are pretty
 slim.

Yes, but not enough to categorically block all incoming mail based on
that, though. At least in my environment, all it would take is one
customer to call and complain, and force me to have to do even more
work to make them an exception and exclude them from this filter.

Thanks,
Alex


Re: pill image spam learns to walk

2010-01-11 Thread Alex
Hi,

        *  1.0 FORGED_TBIRD_IMG_SIZE Likely forged Thunderbird image spam
        *  1.0 FORGED_TBIRD_IMG_ARROW Likely forged Thunderbird image spam

 and you could add, say 4.0, for each mail coming thru your SF.net alias
 and not coming from SF.

Just to clarify, you're referring to this, right:

Received: from mx.sourceforge.net by mailsrv1.trimble.co.nz
(envelope-from f...@ef-

How would I add the rule you are suggesting? It would be specific to
sourceforge.net, and have a table where its authoritative IP and MX
are stored, right?

Thanks,
Alex


Re: pill image spam learns to walk

2010-01-11 Thread Ted Mittelstaedt

Terry Carmen wrote:

On 01/11/2010 05:22 AM, Jason Haar wrote:

Hi there

We've been getting a few of these leaking through in the past couple of
weeks.

http://pastebin.com/m574da717

They aren't triggering (enough) network rule matches, contain a
bayes-killer, and even FuzzyOCR can't manage the swirly image trick they
pull. Has anyone come up with a way to fight these? (I've actually added
all the phrases that occur in this image to FuzzyOCR - didn't help)

Unless you changed the headers, it looks like it came from an IP with no 
reverse DNS entry.


This is easy enough to stop dead in it's tracks at your MTA. If there 
isn't any reverse DNS, the chances of it being a legitimate mail server 
are pretty slim.




This is the WRONG way to do this - it amazes me that in 2010, on an
anti-spam mailing list, we have people making such statements.

The SMTP RFC 2821 does NOT mandate the existence of a PTR record for an 
SMTP sender.  The DNS RFC 1912 also does not mandate a corresponding PTR
for a mailserver hostname.  Implies, yes, but there's no requirement. 
There is a very good reason for this.*  Blocking at the MTA based on the 
lack of a PTR record is incorrect.  The correct way is to assign a spam 
score in SA to hosts lacking a PTR, the same way you do to mail that 
contains HTML, etc.
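
To make the scoring approach concrete, here is a toy model of additive scoring: each rule that fires contributes its score, and the total is compared with a threshold. The rule names and scores below are made up for illustration. Only the 5.0 threshold reflects SpamAssassin's stock required_score.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's stock required_score

# Hypothetical rule names and scores, for illustration only.
SCORES = {"NO_PTR": 1.5, "HTML_ONLY": 0.5, "ON_BLOCKLIST": 3.5}

def classify(hits, scores=SCORES, required=REQUIRED_SCORE):
    """Sum the scores of the rules that fired and compare with the threshold."""
    total = sum(scores[name] for name in hits)
    return total, total >= required
```

Under these made-up numbers a missing PTR alone does not tag a message, but a missing PTR combined with a blocklist hit does, which is the difference between scoring and outright rejection.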


Ted

* The reason this is NOT mandated anywhere is because if it was then
sites running multiple mailing domains on a single server could easily
overflow the DNS UDP packet space with a list of PTRs for the server - 
causing the resolver to exceed 512 bytes on the DNS UDP response, or
causing a switch to TCP - either of which can break some firewalls.
For example the Cisco PIX came standard out-of-the-box with a DNS
filter that blocked DNS UDP packets larger than 512 bytes.
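
The 512-byte concern can be put into rough numbers. The sketch below estimates the size of a DNS response carrying several PTR records for one address, assuming the classic wire layout (12-byte header, a question section, and answer records whose owner name is a 2-byte compression pointer). It deliberately ignores compression of the target names and EDNS0, so it is an upper-end estimate rather than an exact figure, and the function names are assumptions.

```python
def encoded_name_len(name: str) -> int:
    """Bytes a domain name occupies on the wire: a length byte per label plus a root byte."""
    labels = name.rstrip(".").split(".")
    return sum(1 + len(label) for label in labels) + 1

def ptr_response_size(query_name: str, targets) -> int:
    """Estimate the size of a DNS response carrying one PTR record per target."""
    header = 12
    question = encoded_name_len(query_name) + 4   # QTYPE + QCLASS
    per_record_fixed = 2 + 2 + 2 + 4 + 2          # name pointer, TYPE, CLASS, TTL, RDLENGTH
    answers = sum(per_record_fixed + encoded_name_len(t) for t in targets)
    return header + question + answers
```

With a query for 25.2.0.192.in-addr.arpa and hypothetical targets of the form mail.customerNN.example, twelve records come to 485 bytes and the thirteenth pushes the estimate to 522, past the 512-byte limit. A single PTR record stays far below it.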




Re: pill image spam learns to walk

2010-01-11 Thread Terry Carmen

On 01/11/2010 12:42 PM, Ted Mittelstaedt wrote:

Terry Carmen wrote:

On 01/11/2010 05:22 AM, Jason Haar wrote:

Hi there

We've been getting a few of these leaking through in the past couple of
weeks.

http://pastebin.com/m574da717

They aren't triggering (enough) network rule matches, contain a
bayes-killer, and even FuzzyOCR can't manage the swirly image trick they
pull. Has anyone come up with a way to fight these? (I've actually added
all the phrases that occur in this image to FuzzyOCR - didn't help)

Unless you changed the headers, it looks like it came from an IP with 
no reverse DNS entry.


This is easy enough to stop dead in it's tracks at your MTA. If there 
isn't any reverse DNS, the chances of it being a legitimate mail 
server are pretty slim.




This is the WRONG way to do this - it amazes me that in 2010 on an
anti-spam mailing list that we have people making such statements.

The SMTP RFC 2821 does NOT mandate the existence of a PTR record for 
an SMTP sender.  The DNS RFC 1912 also does not mandate a 
corresponding PTR
for a mailserver hostname.  Implies, yes, but there's no requirement. 
There is a very good reason for this.*  Blocking at the MTA based on 
the lack of a PTR record is incorrect.  The correct way is to assign a 
spam score in SA to hosts lacking a PTR, the same way you do to mail 
that contains HTML, etc.


SA is great software, but scanning is not a lightweight process. If I 
can ditch millions of spams before they ever hit SA, and need to 
manually whitelist a couple of IPs, that's a great deal as far as I'm 
concerned.


Every reasonable ISP I've seen has managed to assign a PTR record for 
their mail server. I don't care if it exactly every (or any) domain they 
transport mail for, as long as it exists. Sure, it's possible to break 
things if you work at it hard enough, but generally speaking, I don't care.


Terry













Re: pill image spam learns to walk

2010-01-11 Thread Terry Carmen

On 01/11/2010 12:57 PM, Terry Carmen wrote:
exactly every (or any) domain 


Should be exactly *matches* every (or any) domain

--
Terry Carmen
CNY Support, LLC

315.382.3939
http://cnysupport.com



Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
Terry Carmen wrote on Mon, 11 Jan 2010 12:08:16 -0500:

 Unless you changed the headers, it looks like it came from an IP with no 
 reverse DNS entry.

Yeah, his own delivery chain. Not really a candidate for blocking ;-)

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
Ted Mittelstaedt wrote on Mon, 11 Jan 2010 09:42:25 -0800:

 This is the WRONG way to do this

It's the right way. The FP rate is almost zero and it encourages the few 
offending ones to quickly add rDNS, really quick.

 * The reason this is NOT mandated anywhere is because if it was then
 sites running multiple mailing domains on a single server could easily
 overflow the DNS UDP packet space with a list of PTR's for the server -

We are not talking about adding PTR for all domains, just for exactly 
*one*. And that doesn't even need to resolve back and forth.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
Alex wrote on Mon, 11 Jan 2010 12:38:29 -0500:

 Just to clarify, you're referring to this, right:
 
 Received: from mx.sourceforge.net by mailsrv1.trimble.co.nz
 (envelope-from f...@ef-
 
 How would add the rule you are suggesting? It would be specific to
 sourceforge.net, and have a table where its authoritative IP and MX
 are stored, right?

I would rather look for To: *...@users.sourceforge.net and score if the From 
is not from sourceforge.net (meta-rule). This indicates that it is an 
external mail that was sent to an SF users alias. I've personally not ever 
gotten a legitimate mail to this alias from outside of SF (and I think SF 
admin/dev/news mail uses the target address directly, anyway, and not the 
alias). So, depending on what you get over this route you may either score 
or drop completely.
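
The logic of that meta-rule, sketched outside SpamAssassin for clarity. This is not SA rule syntax, just the same two header tests combined. The function name is an assumption, and since the From header is trivially forged, this is better suited to scoring than to hard rejection.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

def external_mail_to_sf_alias(raw_message: str) -> bool:
    """True if the mail is addressed to a users.sourceforge.net alias
    but the From address is not at sourceforge.net."""
    msg = message_from_string(raw_message)
    recipients = [addr.lower() for _name, addr in getaddresses(msg.get_all("To", []))]
    to_alias = any(addr.endswith("@users.sourceforge.net") for addr in recipients)
    sender_domain = parseaddr(msg.get("From", ""))[1].lower().rpartition("@")[2]
    from_sf = sender_domain == "sourceforge.net" or sender_domain.endswith(".sourceforge.net")
    return to_alias and not from_sf
```

In SpamAssassin itself the equivalent would be two header sub-rules joined by a meta rule, with the score chosen according to what actually arrives over that route.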

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Ted Mittelstaedt

Kai Schaetzl wrote:

Ted Mittelstaedt wrote on Mon, 11 Jan 2010 09:42:25 -0800:


This is the WRONG way to do this


It's the right way. The FP rate is almost zero and it encourages the few 
offending ones to quickly add rDNS, really quick.



* The reason this is NOT mandated anywhere is because if it was then
sites running multiple mailing domains on a single server could easily
overflow the DNS UDP packet space with a list of PTR's for the server -


We are not talking about adding PTR for all domains, just for exactly 
*one*. And that doesn't even need to resolve back and forth.




Clearly you fail to understand anything, here.

PTR's are not mandated because the standard has to apply to all sites,
both sites with multiple domains and sites without.  It does not mean
that because it's not mandated that it's a bad idea to add a PTR record.
It simply means that sites WITHOUT a PTR are still fully compliant mailers.

The entire point of SA is to filter based on fuzzy logic, meaning
that the sender's mail is only wrong based on an arbitrary standard that
the person running SA pulls out of their ass.  A no PTR rule is
EXACTLY the kind of fuzzy decision that SA is designed to make decisions
on.  That is where that kind of rule belongs.

Your advice is kind of like the guy who puts a spoiler on a sports
car that is never driven faster than 100mph.  The spoiler, Spamassassin
in this case, is an expensive, gas-mileage sucking dunsel that is only 
there because of the bragging rights the guy gets by having it there,
it does absolutely nothing to help the car.  In fact, anyone who knows
anything about fast cars, looks at the thing and thinks how gay is
that? and what a moron the idiot driving it is.

If you want to build a mailserver WITHOUT SA, then sure, go ahead and
add in rules like no PTR to the MTA - because you cannot do it any
other way.

But don't spend the money and CPU cycles putting SA on a mailserver
and then have it sit there doing nothing, like that spoiler on the
ass-end of a trans-am.

In other words, be a professional not a bozo!

Ted


Re: pill image spam learns to walk - best way to block it - hostkarma

2010-01-11 Thread Marc Perkel
For what it's worth, my Junk Email Filter service blocks 100% of 
virus-generated spam such as this pill image spam. But anyone can tap 
into this for free by doing 2 things.


First - add tarbaby.junkemailfilter.com as your highest numbered MX record.

Second - use the hostkarma.junkemailfilter.com black list.

To be really effective you need to do both. Bot spam tends to spam all 
MX records and focuses on the highest MX. So using us as the highest MX 
lets us harvest your spam bot info. Then when you use our black list - 
it's tuned to the spambots that are spamming you. So it becomes even 
more effective.
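
For anyone who has not queried a DNS black list by hand: the lookup name is conventionally the IPv4 octets in reverse order followed by the list's zone. A small sketch of that construction follows. The function name is an assumption, and the meaning of the returned addresses is documented on the list's own wiki page, not here.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str = "hostkarma.junkemailfilter.com") -> str:
    """Build the hostname a DNSBL lookup resolves: reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

Resolving that name as an A record and getting an answer means the address is listed in some category. No answer (NXDOMAIN) means it is not listed.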


And every spam attempt that hits our tarbaby server is spam that you're not 
getting and don't need to use your resources to block. So a significant amount 
of your spam will just go away.


We catch spam bots on the first attempt and within 2 minutes they are 
listed in our black list.


Here's the info on these lists:

http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists

Feel free to use it.



Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
Ted Mittelstaedt wrote on Mon, 11 Jan 2010 15:27:07 -0800:

 It simply means that sites WITHOUT a PTR are still fully compliant mailers.

This has nothing to do with RFC-compliance, but with policy, well accepted 
policy. If you can't understand that I can't help. No need to shoot this out.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Ted Mittelstaedt

Kai Schaetzl wrote:

Ted Mittelstaedt wrote on Mon, 11 Jan 2010 15:27:07 -0800:


It simply means that sites WITHOUT a PTR are still fully compliant mailers.


This has nothing to do with RFC-compliance, but with policy, well accepted 
policy. 


Policy that should be handled in SA and not the MTA, which I've said 
twice now.



If you can't understand that I can't help.


You cannot help someone when you have no real grasp of the topic
under discussion.


No need to shoot this out.



Well, let's see.  I say it is wrong to tell people to make PTR
checks in the MTA when they have SA running, and to make them in
SA.  Then I explain why you shouldn't do them in the MTA and cite
facts to back up my statements.

You know you can't argue against facts, and you know you're wrong,
and rather than just man up and admit it, you try to cover it
up by making the false claim that I am advising to not make PTR
checks at all.  You repeat this false claim multiple times to make 
yourself believe it, and maybe to attempt to get me to forget what I 
said, and adopt your false claim and start arguing for it.


No wonder you don't want to get into a shooting match.  You know
you were caught, and you're outgunned.

Ted


Kai





Re: pill image spam learns to walk

2010-01-11 Thread Chip M.
Jason Haar wrote:
They aren't triggering (enough) network rule matches, contain a
bayes-killer, and even FuzzyOCR can't manage the swirly image trick
they pull. Has anyone come up with a way to fight these?

Jason, thanks for the cheerful Subject.  I needed that today. :)

I'm catching all of these, with decent scores (15+).

Here's a few easy things you might score on (up to about 2.5 each):

1. non-huge image which does _NOT_ have an HTML part
   (this will also help with the lonely girl spams; it's highly
   unusual for images to be attached to pure text emails; usually
   only Nerds send pure text, and our most typical image attachment
   is a GIF/PNG screenshot, or a somewhat large JPEG)
2. metas for images that have hit any reliable blocklist
   (I have found Barracuda very helpful - it definitely has a high
   FP rate, so score low if you don't have a decent false positives
   pipeline)
3. botnet test
4. metas for images sent from/thru unusual nations

These may not be as easy, however may be of :) interest to our
resident developers:

5. all of these have a real name in the From header, with most being
   a single word, which is very unusual
   (note also that _NONE_ have a real name in the To header, which I
   do score, but that has a high FP rate so I can not recommend it
   unless you have a solid FP pipeline)
6. size of the JPEG header (this may be easy to add to ImageInfo)
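
Item 1 in the first list is the easiest to prototype. Here is a minimal sketch of the test, using Python's standard email package rather than SpamAssassin. The function name and the 50,000-byte cut-off for "non-huge" are assumptions picked for illustration, not values from the list above.

```python
from email import message_from_string

def image_without_html(raw_message: str, max_image_bytes: int = 50_000) -> bool:
    """True if the mail carries a smallish image but no text/html part."""
    msg = message_from_string(raw_message)
    has_html = False
    has_small_image = False
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_type() == "text/html":
            has_html = True
        elif part.get_content_maintype() == "image":
            payload = part.get_payload(decode=True) or b""
            if len(payload) <= max_image_bytes:
                has_small_image = True
    return has_small_image and not has_html
```

As with the other items, this is a signal to score on, not a verdict: plain-text mail with a screenshot attached is legitimate often enough that the score should stay modest.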

I just noticed #6 now, after dumping some image properties for wavy vs
non-wavy spam images, and was surprised by it.  It never occurred to me
to export file hdr size - by now, I :) should have KNOWN better, and
should have added export of ALL properties to my image properties
test last time this sort of thing happened.  I'll fix that next
version. :)

Here's the properties of my last few days of wavy images:
  1 MP#2(jpeg): Area=100804 Density=9.85 bytes=10854(hdr:623,dat:10231) (319x316)
  1 MP#2(jpeg): Area=103152 Density=9.32 bytes=11688(hdr:623,dat:11065) (336x307)
  1 MP#2(jpeg): Area=103206 Density=5.05 bytes=21045(hdr:623,dat:20422) (309x334)
  1 MP#2(jpeg): Area=104304 Density=5.33 bytes=20176(hdr:623,dat:19553) (318x328)
  1 MP#2(jpeg): Area=107584 Density=5.58 bytes=19896(hdr:623,dat:19273) (328x328)
  1 MP#2(jpeg): Area=108072 Density=9.51 bytes=11982(hdr:623,dat:11359) (342x316)
  1 MP#2(jpeg): Area=109472 Density=5.24 bytes=21501(hdr:623,dat:20878) (352x311)
  1 MP#2(jpeg): Area= 81104 Density=4.40 bytes=19067(hdr:623,dat:18444) (296x274)
  1 MP#2(jpeg): Area= 87809 Density=5.69 bytes=16064(hdr:623,dat:15441) (317x277)
  1 MP#2(jpeg): Area= 95142 Density=5.41 bytes=18223(hdr:623,dat:17600) (303x314)
  1 MP#2(jpeg): Area= 97148 Density=4.96 bytes=20208(hdr:623,dat:19585) (326x298)
The interesting column is hdr:623.
If you're using ImageInfo, the other numbers are useful for limiting
your metas to the total size range typical of these.
The first column is the number of occurrences.
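
For anyone wanting to reproduce the hdr figure, here is one plausible way to measure it. In the rows above, bytes always equals hdr plus dat, so the sketch assumes "hdr" means everything before the entropy-coded scan data, i.e. the SOI marker plus all marker segments up to and including the first SOS segment header. That definition is a guess at what the tool reports, and the function name is an assumption.

```python
import struct

def jpeg_header_size(data: bytes) -> int:
    """Return the offset at which entropy-coded scan data begins."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        # Segment length is big-endian and includes its own two bytes.
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + seg_len
        if marker == 0xDA:  # SOS: scan data starts right after this segment
            return pos
    raise ValueError("no SOS marker found")
```

Subtracting the result from the total file size gives the counterpart of the dat figure under the same assumption.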

Here's the properties of all NON-wavy spam images from the same period:
  3 MP#2(jpeg): Area=115062 Density= 4.36 bytes=27110(hdr:735,dat:26375) (254x453)
  1 MP#2(jpeg): Area=120300 Density= 6.40 bytes=19185(hdr:387,dat:18798) (300x401)
  2 MP#2(jpeg): Area=166410 Density=11.62 bytes=14700(hdr:383,dat:14317) (430x387)
  1 MP#2(jpeg): Area=166704 Density= 8.55 bytes=19891(hdr:398,dat:19493) (453x368)
  1 MP#2(jpeg): Area=197735 Density=13.10 bytes=15476(hdr:380,dat:15096) (355x557)
  1 MP#2(jpeg): Area=240800 Density=14.59 bytes=16901(hdr:392,dat:16509) (700x344)
  1 MP#3(jpeg): Area=197735 Density=13.10 bytes=15476(hdr:380,dat:15096) (355x557)
 17 MP#3(jpeg): Area=239500 Density= 5.53 bytes=43685(hdr:406,dat:43279) (479x500)

I dumped the last month's worth of ham image properties from my most
diverse domain, and did find a handful which had that same hdr size
(623), however they all had vastly different areas and/or occurred
with multiple images.

I'll check a few more domains and months' worth, before using that
for real.  I expect to score this in the 2 to 3 range.


Mike Cardwell wrote: 
Presently it renders them as plain text. I'm fully aware of the
potential problems with it. Ideally I'd like to be able to render
those parts as HTML, but I need to be 100% sure that I've stripped
out anything dangerous (including embedded remote content by
default) first. It's on the ToDo List page.

Nice job Mike! :)

I wrestled with that same issue when I added direct viewing of HTML
content to my offline analysis/FP-pipeline/MassChecks tool.

Originally, I was using an ActiveX wrapper around IE, which (of
course) made me nervous.  I added some VERY simple, crude tag
stripping (script, iframe, style), but was never happy with it.
I ended up switching to an open source HTML rendering component
which :) lacked support for all the scary stuff.
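
For reference, the kind of crude tag stripping described above looks roughly like this when done with a real parser instead of regexes. It is a sketch using Python's standard html.parser, with class and function names invented for the example. It removes script, iframe and style elements together with their contents, and it is "crude" in exactly the sense meant above: event-handler attributes, javascript: links and remote images all survive.

```python
from html.parser import HTMLParser

class CrudeStripper(HTMLParser):
    """Drop <script>, <iframe> and <style> elements, contents included; keep the rest."""
    BLOCKED = {"script", "iframe", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # greater than zero while inside a blocked element

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKED:
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through once.
        if tag not in self.BLOCKED and self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.BLOCKED:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")

def strip_tags(html: str) -> str:
    parser = CrudeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Comments and declarations are dropped as a side effect, because the parser's default handlers for them do nothing.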

Whatever you decide to do, please do post more about it, and q'pla!

I'm also aware of the issues surrounding people potentially
uploading images and then linking to them from