https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8107
Kevin A. McGrail <kmcgr...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgr...@apache.org

--- Comment #1 from Kevin A. McGrail <kmcgr...@apache.org> ---
Certainly the image detection failure is a good thing to work on. Is there a good module for PDF parsing as you describe?

Re: The additional features, here are my thoughts:

1. Mask images
KAM: Not sure this will be an indicator of spam/ham.

2. Scaling
KAM: Not sure this will be an indicator of spam/ham.

3. Images used multiple times
KAM: Not sure this will be an indicator of spam/ham.

4. We could prioritize content on page 1 (or simply ignore content on all other pages). Spammers usually put the payload on page 1, and if there are other pages, they're only there to confuse the filters.
KAM: This sounds like an interesting efficiency trade-off that could be very useful.

5. Access images and URIs located in binary data.
KAM: Are there PDFs avoiding scanning by using this technique?

Re: I've already started working on this and I think it's doable, but I don't want to duplicate work if someone else is already working on it.

I'm not aware of anything in progress, and we love new blood.

Re: I would also like feedback on whether this should be a drop-in replacement or a totally new plugin.

My main question in answering that would be: how would it affect the stock ruleset? What changes would people need to make? For example, are there any affected rules in the KAM Ruleset?
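To illustrate item 5 (URIs located in binary data): a link can sit inside a Flate-compressed PDF stream, where a scan of the raw file bytes will not see it. Below is a minimal Python sketch, not SpamAssassin code (a real plugin would be Perl). The function name `uris_in_pdf_streams` and both regexes are hypothetical, and the sketch ignores each stream's /Filter and /Length entries, simply attempting zlib decompression on every stream and falling back to the raw bytes.

```python
import re
import zlib

# Hypothetical pattern for a PDF stream body: "stream", an EOL, then bytes up to "endstream".
# The trailing EOL before "endstream" stays in the capture; decompressobj tolerates it.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Loose URI pattern for illustration only; stops at whitespace, quotes and parentheses.
URI_RE = re.compile(rb"https?://[^\s<>()\"']+")


def uris_in_pdf_streams(pdf_bytes: bytes) -> list[str]:
    """Return URIs found inside PDF streams, decompressing Flate (zlib) data when possible."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # A decompressobj puts bytes after the end of the zlib stream (e.g. the EOL)
            # into unused_data instead of raising an error.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not zlib data (uncompressed or another filter): scan the bytes as-is.
            data = raw
        for uri in URI_RE.findall(data):
            found.append(uri.decode("ascii", errors="replace"))
    return found
```

A real implementation would need to honour /Filter and /Length, resolve indirect objects, and handle other filters such as ASCIIHexDecode, which is where a proper PDF parsing module would help.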