Re: Is fuzzyocr i.e. Image scanning
On Wed, Oct 17, 2018 at 09:21:33AM +0700, Olivier wrote:
> That is the way I meant it, it's an AND, not an OR. I see FuzzyOCR as
> just one more tool that can be added to SA.

The problem is that it's so inefficient. I've never seen image spam as a problem; mostly it hits other rules and MTA blocks, if you know what you are doing.

Only 7% of my current spam corpus contains images. For ham it's over 60%, so that's a horrible amount of image transformation tools and analyzers being run for nothing. Also consider how many vulnerabilities ImageMagick and similar image tools have had.

At minimum, FuzzyOCR and the like should maintain a hash database of known-good images to skip. All of these 10-year-old plugins are pretty horrid code.
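The hash-database idea above can be sketched roughly as follows. This is only an illustration and not FuzzyOCR code: the class `KnownGoodImageCache` and its method names are hypothetical, and hashing the raw attachment bytes with SHA-256 is an assumption (it only catches byte-identical images, such as recurring logos and signature graphics).

```python
import hashlib


class KnownGoodImageCache:
    """Hypothetical skip-list of images already seen in legitimate mail."""

    def __init__(self):
        # Hex digests of images considered safe to skip.
        self._good_digests = set()

    @staticmethod
    def digest(image_bytes: bytes) -> str:
        # Assumption: an exact-match hash of the raw bytes is good enough.
        return hashlib.sha256(image_bytes).hexdigest()

    def mark_good(self, image_bytes: bytes) -> None:
        # Record an image (e.g. one found in confirmed ham) as known-good.
        self._good_digests.add(self.digest(image_bytes))

    def should_scan(self, image_bytes: bytes) -> bool:
        # Only unknown images are handed to the expensive OCR pipeline.
        return self.digest(image_bytes) not in self._good_digests


cache = KnownGoodImageCache()
logo = b"\x89PNG...company-logo-bytes"
cache.mark_good(logo)
print(cache.should_scan(logo))           # known-good image, OCR skipped
print(cache.should_scan(b"never seen"))  # unknown image, OCR would run
```

With ham containing images far more often than spam, a lookup like this would run before any image conversion tool is started.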
Re: Is fuzzyocr i.e. Image scanning
> On Tue, 16 Oct 2018 11:49:54 +0700 Olivier wrote:
>> One of my holdbacks with FuzzyOCR is that you have to provide an
>> independent word list, while we have a very good tool to analyze
>> text contents: SpamAssassin itself. So I would much prefer
>> FuzzyOCR to feed the OCR'ed text back to SA for further analysis
>> (the way pdfAssassin is working).

On 16.10.18 13:34, RW wrote:
> That works as long as the OCR remains very accurate. What happened
> before was that the deployment of OCR led spammers to make their
> text much less readable.

On Tue, 16 Oct 2018 15:48:34 +0200 Matus UHLAR - fantomas wrote:
> I think the original reason was that the available OCR programs were not reliable enough. I tested gocr, ocrad and tesseract more than 10 years ago, with not very satisfying results, gocr being the best at that time.
>
> Since then, Google took tesseract and made it much better. I believe that currently it would be viable to push OCR output to SpamAssassin for processing with Bayes and other rules.

On 16.10.18 18:42, RW wrote:
> Bayes might work, but I wouldn't like to see it added to body text because corrupted text could look like obfuscation.

It should be pushed back to body text just for filters like Bayes. The same could/should be done for attached .doc, .pdf files etc.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
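One way to picture the split discussed above (extracted text visible to Bayes-like filters, but not to body rules that look for obfuscation) is the sketch below. It does not reflect SpamAssassin's plugin API; `ScanInput` and its methods are hypothetical names used only to illustrate the routing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScanInput:
    """Hypothetical container that keeps OCR/attachment text separate."""

    body_text: str
    # Text recovered by OCR or pulled from .doc/.pdf attachments.
    extracted_text: List[str] = field(default_factory=list)

    def text_for_body_rules(self) -> str:
        # Body rules only see what the sender actually typed, so OCR
        # errors cannot be mistaken for deliberate obfuscation.
        return self.body_text

    def text_for_bayes(self) -> str:
        # Statistical filters get both sources; noisy tokens simply
        # become additional tokens to learn from.
        return "\n".join([self.body_text, *self.extracted_text])


msg = ScanInput(body_text="See attached.", extracted_text=["ch3ap pi1ls n0w"])
print(msg.text_for_body_rules())
print(msg.text_for_bayes())
```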
Re: Is fuzzyocr i.e. Image scanning
On Wed, 17 Oct 2018, Matus UHLAR - fantomas wrote:
>> On 16.10.18 18:42, RW wrote:
>> Bayes might work, but I wouldn't like to see it added to body text because corrupted text could look like obfuscation.
>
> It should be pushed back to body text just for filters like Bayes. The same could/should be done for attached .doc, .pdf files etc.

...which would be much more reliable than OCR. If it were a resource-allocation decision between pulling text from doc/pdf and updating OCR, I'd push for the former.

--
John Hardin KA7OHZ  http://www.impsec.org/~jhardin/
jhar...@impsec.org  FALaholic #11174  pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: Is fuzzyocr i.e. Image scanning
On Wed, 17 Oct 2018, Rupert Gallagher wrote:
> IC is an effort to dig a hole in the water, because the problem of image spam with obfuscated text cannot be solved by ocr. My approach is a "better safe than sorry" best practice that anyone can implement with existing software:
>
> 1. do not display inline the content of attachments and linked resources;
> 2. give high spam score (>=5) to any email with very low text to image ratio.

Your system, your rules, but it won't work for everybody.

We routinely receive messages from users needing help which contain 1~2 lines of text describing the issue (like: 'my computer crashed') and then a screen-shot taken with a cellphone camera (10~20 megapixel) which is 4~8 MB in size. Sometimes the text is only in the subject and the screen-shot is the only thing in the body.

I agree about not displaying inline attachments by default, but that is a client configuration issue and we cannot control our users' clients.

--
Dave Funk, University of Iowa College of Engineering
Sys_admin/Postmaster/cell_admin
319/335-5751  FAX: 319/384-0549
1256 Seamans Center, Iowa City, IA 52242-1527
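To make the disagreement above concrete, here is a rough sketch of a text-to-image ratio check using only Python's standard `email` package. It is not an existing SpamAssassin rule: the function names, the 0.01 threshold and the byte-count definition of "ratio" are all assumptions chosen for illustration; only the score of 5 comes from the quoted proposal.

```python
from email.message import EmailMessage


def text_to_image_ratio(msg: EmailMessage) -> float:
    """Decoded text bytes divided by decoded image bytes (assumed metric)."""
    text_bytes = 0
    image_bytes = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        maintype = part.get_content_maintype()
        if maintype == "text":
            text_bytes += len(payload)
        elif maintype == "image":
            image_bytes += len(payload)
    if image_bytes == 0:
        return float("inf")  # no images at all, the rule never fires
    return text_bytes / image_bytes


def image_heavy_score(msg: EmailMessage, min_ratio: float = 0.01,
                      score: float = 5.0) -> float:
    # Hypothetical rule: add `score` points when the ratio is very low.
    return score if text_to_image_ratio(msg) < min_ratio else 0.0


# The support-request case: one line of text plus a large screenshot.
help_request = EmailMessage()
help_request["Subject"] = "my computer crashed"
help_request.set_content("my computer crashed")
help_request.add_attachment(b"\x00" * 100_000, maintype="image",
                            subtype="jpeg", filename="screenshot.jpg")
print(image_heavy_score(help_request))
```

Under these assumptions the legitimate support request is scored exactly like image spam, which is the false-positive case described in the reply.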
Re: Is fuzzyocr i.e. Image scanning
On 16.10.18 18:42, RW wrote:
>>> Bayes might work, but I wouldn't like to see it added to body text because corrupted text could look like obfuscation.

On Wed, 17 Oct 2018, Matus UHLAR - fantomas wrote:
>> It should be pushed back to body text just for filters like Bayes. The same could/should be done for attached .doc, .pdf files etc.

On 17.10.18 07:56, John Hardin wrote:
> ...which would be much more reliable than OCR. If it was a resource-allocation decision for pulling text from doc/pdf vs. updating OCR, I'd push for the former.

This could easily be configured by installing or loading the relevant modules.

BTW, both PDF and Word documents can contain images too...

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Status Authenticated Received Chain (ARC) Support
Hi,

what is the status of ARC support (https://tools.ietf.org/html/draft-ietf-dmarc-arc-protocol-16)?

The Perl Mail-DKIM module has had ARC support since version 0.50 (https://metacpan.org/pod/release/MBRADSHAW/Mail-DKIM-0.50/lib/Mail/DKIM.pm).

Does SpamAssassin use this feature from Mail-DKIM if this version or newer is available?

Thanks
Markus
HashBL
Hi,

please be so kind as to reply to my mail address in CC, as I subscribed with nomail just for this one question.

With SA 3.4.2 you introduced HashBL. However, I'm unsure how to enable it, as the documentation states extra lines and I can't find a score at all. As with other plugins, I expected a hashbl.cf that would be activated as well, plus a score in the score.cf, but I can see neither on my system nor in the Subversion tree. Or is the plugin just experimental and should not be enabled yet?

I would be happy if anyone could assist.

Regards,
Christian
RE: HashBL
Here's the hashbl.cf on my server:

    loadplugin Mail::SpamAssassin::Plugin::HashBL HashBL.pm

    ifplugin Mail::SpamAssassin::Plugin::HashBL
      header   HASHBL_EMAIL eval:check_hashbl_emails('ebl.msbl.org')
      describe HASHBL_EMAIL Message contains email address found on the EBL
      score    HASHBL_EMAIL 1.0
    endif

HTH...

...Kevin
--
Kevin Miller
Network/email Administrator, CBJ MIS Dept.
155 South Seward Street
Juneau, Alaska 99801
Phone: (907) 586-0242, Fax: (907) 586-4588
Registered Linux User No: 307357
Re: Status Authenticated Received Chain (ARC) Support
On 17 Oct 2018, at 14:27, Markus Kolb wrote:
> Hi, what is the status of ARC Support (https://tools.ietf.org/html/draft-ietf-dmarc-arc-protocol-16)?

It is not supported in any way in SA as of 3.4.2 and I am unaware of anyone proposing an operational model for supporting it. There is no supporting code in the current 'trunk' codebase.

If someone were to provide a reasonable model for supporting ARC in SA and a sound implementation, I would expect that it *could* make the 4.0.0 release. This would be made somewhat more likely by the draft progressing to a final RFC, but the critical component is really a well-designed implementation that provides some utility in determining whether or not mail is spam. It is worth noting that the utility of DKIM and hence DMARC to that end has been marginal.

Also note that while there will be a 3.4.3 release, there is no chance of this or any other completely new feature being added for it, as 3.4.3 is intended to be the final bug fix release for the 3.x lineage.

> The perl Mail-DKIM module has ARC support since version 0.50 (https://metacpan.org/pod/release/MBRADSHAW/Mail-DKIM-0.50/lib/Mail/DKIM.pm)

Notably, that support is documented as being 10 draft revisions behind the current one, so it might not be wise to actually use it...

> Does SpamAssassin use this feature from Mail-DKIM if this version or newer is available?

No.