Re: Is fuzzyocr i.e. Image scanning

2018-10-17 Thread Henrik K
On Wed, Oct 17, 2018 at 09:21:33AM +0700, Olivier wrote:
>
> That is the way I meant it, it's an AND, not an OR. I see FuzzyOCR as
> just one more tool that can be added to SA.

The problem is that it's so inefficient. I've never seen image spam as a
problem; mostly it hits other rules and MTA blocks, if you know what you are
doing. My current spam corpus contains only 7% images. For ham it's over
60%, so that's a horrible amount of executing image transformation tools and
analyzers for nothing, not to mention how many vulnerabilities ImageMagick
and similar image tools have had. At a minimum, FuzzyOCR and the like should
maintain a hash database of good images to skip. All these 10-year-old
plugins are pretty horrid code.
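
A minimal sketch of the "hash database of good images" idea, purely for illustration. It assumes an in-memory set of SHA-256 digests; KnownGoodImages, scan_image and the run_ocr callback are hypothetical names and are not part of FuzzyOCR or SpamAssassin.

```python
import hashlib


class KnownGoodImages:
    """In-memory set of SHA-256 digests of images already judged harmless."""

    def __init__(self):
        self._digests = set()

    @staticmethod
    def digest(image_bytes: bytes) -> str:
        # Exact-content hash: any change to the image bytes changes the digest.
        return hashlib.sha256(image_bytes).hexdigest()

    def mark_good(self, image_bytes: bytes) -> None:
        self._digests.add(self.digest(image_bytes))

    def is_known_good(self, image_bytes: bytes) -> bool:
        return self.digest(image_bytes) in self._digests


def scan_image(image_bytes: bytes, cache: KnownGoodImages, run_ocr):
    """Return OCR text, or None when the image is skipped as known-good.

    run_ocr is a caller-supplied (hypothetical) function: bytes -> str.
    """
    if cache.is_known_good(image_bytes):
        return None  # skip the expensive image tools entirely
    return run_ocr(image_bytes)
```

An exact hash only helps with images that recur byte-for-byte (logos, signature banners); re-encoded copies would still be scanned.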



Re: Is fuzzyocr i.e. Image scanning

2018-10-17 Thread Matus UHLAR - fantomas

>On Tue, 16 Oct 2018 11:49:54 +0700 Olivier wrote:
>> One of my reservations about FuzzyOCR is that you have to provide an
>> independent word list, while we have a very good tool to analyze
>> text contents: SpamAssassin itself. So I would much prefer
>> FuzzyOCR to feed the OCR'ed text back to SA for further analysis
>> (the way pdfAssassin is working).

On 16.10.18 13:34, RW wrote:
>That works as long as the OCR remains very accurate. What happened
>before was that the deployment of OCR led spammers to make their
>text much less readable.



>On Tue, 16 Oct 2018 15:48:34 +0200 Matus UHLAR - fantomas wrote:
>> I think that the original reason was that available OCR programs were
>> not reliable enough.
>>
>> I tested gocr, ocrad and tesseract more than 10 years ago, with not
>> very satisfying results, gocr being the best at that time.
>>
>> Since then, Google took tesseract and made it much better.
>>
>> I believe that currently it would be viable to push OCR output to
>> SpamAssassin for processing with Bayes and other rules.

On 16.10.18 18:42, RW wrote:
>Bayes might work, but I wouldn't like to see it added to body text
>because corrupted text could look like obfuscation.

It should be pushed back to body text just for filters like Bayes.
The same could/should be done for attached .doc, .pdf files etc.
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
42.7 percent of all statistics are made up on the spot. 


Re: Is fuzzyocr i.e. Image scanning

2018-10-17 Thread John Hardin

On Wed, 17 Oct 2018, Matus UHLAR - fantomas wrote:
>On 16.10.18 18:42, RW wrote:
>> Bayes might work, but I wouldn't like to see it added to body text
>> because corrupted text could look like obfuscation.
>
>it should be pushed back to body text just for filters like bayes.
>The same could/should be done for attached .doc, .pdf files etc.

...which would be much more reliable than OCR.

If it was a resource-allocation decision for pulling text from doc/pdf vs.
updating OCR, I'd push for the former.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The problem is when people look at Yahoo, slashdot, or groklaw and
  jump from obvious and correct observations like "Oh my God, this
  place is teeming with utter morons" to incorrect conclusions like
  "there's nothing of value here".    -- Al Petrofsky, in Y! SCOX
---
 566 days since the first commercial re-flight of an orbital booster (SpaceX)


Re: Is fuzzyocr i.e. Image scanning

2018-10-17 Thread David B Funk

On Wed, 17 Oct 2018, Rupert Gallagher wrote:

>IC is an effort to dig a hole in the water, because the problem of image spam
>with obfuscated text cannot be solved by OCR.
>
>My approach is a "better safe than sorry" best practice that anyone can
>implement with existing software:
>
>1. do not display inline the content of attachments and linked resources;
>2. give a high spam score (>=5) to any email with a very low text-to-image ratio.

Your system, your rules, but it won't work for everybody.

We routinely receive messages from users needing help which contain 1~2 lines of
text describing the issue (like: 'my computer crashed') and then a screenshot
taken with a cellphone camera (10~20 megapixel) which is 4~8 MB in size.
Sometimes the text is only in the subject and the screenshot is the only thing
in the body.

I agree about not displaying inline attachments by default, but that is a client
configuration issue and we cannot control our users' clients.
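
To make the objection concrete, here is a rough sketch of a text-to-image ratio check run against the kind of support request described above. It uses only Python's standard email package; the function names and the THRESHOLD value are hypothetical and are not taken from any SpamAssassin rule.

```python
from email.message import EmailMessage

# Hypothetical cut-off: messages below this text/image byte ratio would be penalized.
THRESHOLD = 0.001


def text_to_image_ratio(msg) -> float:
    """Ratio of decoded text bytes to decoded image bytes in a message."""
    text_bytes = 0
    image_bytes = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        maintype = part.get_content_maintype()
        if maintype == "text":
            text_bytes += len(payload)
        elif maintype == "image":
            image_bytes += len(payload)
    if image_bytes == 0:
        return float("inf")  # no images at all, so the rule cannot fire
    return text_bytes / image_bytes


def build_support_request(image_size: int = 4_000_000) -> EmailMessage:
    """One line of text plus a large screenshot, like a genuine help request."""
    msg = EmailMessage()
    msg["Subject"] = "my computer crashed"
    msg.set_content("my computer crashed\n")
    msg.add_attachment(b"\xff" * image_size, maintype="image",
                       subtype="jpeg", filename="screenshot.jpg")
    return msg


if __name__ == "__main__":
    ratio = text_to_image_ratio(build_support_request())
    print("would be penalized:", ratio < THRESHOLD)
```

With a single short line of text and a 4 MB image, the legitimate request lands far below any plausible ratio threshold, which is the false positive described above.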



--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{

Re: Is fuzzyocr i.e. Image scanning

2018-10-17 Thread Matus UHLAR - fantomas

>>On 16.10.18 18:42, RW wrote:
>>> Bayes might work, but I wouldn't like to see it added to body text
>>> because corrupted text could look like obfuscation.

>On Wed, 17 Oct 2018, Matus UHLAR - fantomas wrote:
>> it should be pushed back to body text just for filters like bayes.
>> The same could/should be done for attached .doc, .pdf files etc.

On 17.10.18 07:56, John Hardin wrote:
>...which would be much more reliable than OCR.
>
>If it was a resource-allocation decision for pulling text from doc/pdf
>vs. updating OCR, I'd push for the former.

This could be easily configured by installing the modules or loading them.

BTW, both PDF and Word documents can contain images too...


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
99 percent of lawyers give the rest a bad name. 


Status Authenticated Received Chain (ARC) Support

2018-10-17 Thread Markus Kolb

Hi,

what is the status of ARC Support 
(https://tools.ietf.org/html/draft-ietf-dmarc-arc-protocol-16)?


The perl Mail-DKIM module has ARC support since version 0.50 
(https://metacpan.org/pod/release/MBRADSHAW/Mail-DKIM-0.50/lib/Mail/DKIM.pm)


Does SpamAssassin use this feature from Mail-DKIM if this version or 
newer is available?


Thanks
Markus


HashBL

2018-10-17 Thread Christian Heutger
Hi,

 

Please be so kind as to reply to my mail address in CC, as I subscribed with
nomail for this question only. SA 3.4.2 introduced HashBL.
However, I’m unsure how to enable it, as the documentation mentions extra lines
and I can’t find a score at all. As with other plugins, I expected a hashbl.cf
that would be activated, and a score in score.cf, but I can find
neither on my system nor in the Subversion tree. Or is the plugin just
experimental and should not be enabled yet? I would be happy if anyone could
assist.

 

Regards,

Christian





RE: HashBL

2018-10-17 Thread Kevin Miller
Here’s the hashbl.cf on my server:


loadplugin Mail::SpamAssassin::Plugin::HashBL   HashBL.pm

ifplugin Mail::SpamAssassin::Plugin::HashBL
header   HASHBL_EMAIL   eval:check_hashbl_emails('ebl.msbl.org')
describe HASHBL_EMAIL   Message contains email address found on the EBL
score    HASHBL_EMAIL   1.0
endif
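
For illustration only: the rule above passes a DNS zone to check_hashbl_emails. Hash-based lists of this kind are generally queried by hashing the address and looking up the digest as a hostname under that zone. The sketch below only builds such a query name and performs no DNS lookup. SHA-1 over the lowercased address is an assumption here, and hashbl_query_name is a hypothetical helper, not part of SpamAssassin; the plugin and list documentation are the authority on the exact normalization.

```python
import hashlib


def hashbl_query_name(address: str, zone: str = "ebl.msbl.org") -> str:
    """Build the DNS name to look up for one email address.

    Assumption: the list is keyed on the hex SHA-1 of the trimmed,
    lowercased address. Real lists may normalize differently.
    """
    normalized = address.strip().lower()
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    return f"{digest}.{zone}"


if __name__ == "__main__":
    print(hashbl_query_name("Someone@Example.com"))
```

Because only the digest leaves the server, the addresses found in a message are not disclosed in clear text to the list operator.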



HTH…

...Kevin
--
Kevin Miller
Network/email Administrator, CBJ MIS Dept.
155 South Seward Street
Juneau, Alaska 99801
Phone: (907) 586-0242, Fax: (907) 586-4588 Registered Linux User No: 307357

From: Christian Heutger [mailto:christ...@heutger.net]
Sent: Wednesday, October 17, 2018 11:03 AM
To: users@spamassassin.apache.org
Subject: HashBL

Hi,

Please be so kind as to reply to my mail address in CC, as I subscribed with
nomail for this question only. SA 3.4.2 introduced HashBL.
However, I’m unsure how to enable it, as the documentation mentions extra lines
and I can’t find a score at all. As with other plugins, I expected a hashbl.cf
that would be activated, and a score in score.cf, but I can find
neither on my system nor in the Subversion tree. Or is the plugin just
experimental and should not be enabled yet? I would be happy if anyone could
assist.

Regards,
Christian


Re: Status Authenticated Received Chain (ARC) Support

2018-10-17 Thread Bill Cole

On 17 Oct 2018, at 14:27, Markus Kolb wrote:

>Hi,
>
>what is the status of ARC Support
>(https://tools.ietf.org/html/draft-ietf-dmarc-arc-protocol-16)?

It is not supported in any way in SA as of 3.4.2 and I am unaware of
anyone proposing an operational model for supporting it. There is no
supporting code in the current 'trunk' codebase. If someone were to
provide a reasonable model for supporting ARC in SA and a sound
implementation, I would expect that it *could* make the 4.0.0 release.
This would be made somewhat more likely by the draft progressing to a
final RFC, but the critical component is really a well-designed
implementation that provides some utility in determining whether or not
mail is spam. It is worth noting that the utility of DKIM and hence
DMARC to that end has been marginal.

Also note that while there will be a 3.4.3 release, there is no chance
of this or any other completely new feature being added for it, as 3.4.3
is intended to be the final bug fix release for the 3.x lineage.

>The perl Mail-DKIM module has ARC support since version 0.50
>(https://metacpan.org/pod/release/MBRADSHAW/Mail-DKIM-0.50/lib/Mail/DKIM.pm)

Notably, that support is documented as being 10 draft revisions behind
the current one, so it might not be wise to actually use it...

>Does SpamAssassin use this feature from Mail-DKIM if this version or
>newer is available?

No.