Re: Looking for a script to extract readable text from emails
If that problem ever gets solved, blind internet users would gain two things: they could read messages faster, and they could keep large numbers of images from eating into their quota. Blind users who can hear would not want the audio in video or audio files filtered out, though.

On Tue, 29 Dec 2015, Bill Cole wrote:

> Date: Tue, 29 Dec 2015 01:07:55
> From: Bill Cole
> Reply-To: users@spamassassin.apache.org
> To: users@spamassassin.apache.org
> Subject: Re: Looking for a script to extract readable text from emails
>
> On 28 Dec 2015, at 23:16, Marc Perkel wrote:
>
>> I'm looking for a script to extract readable text from emails. I want it demimed, ignoring html, images, etc. What I'm looking for is just the readable text (real words). Mostly I just need to extract about the first 200 characters of real text. Can someone point me in the right direction?
>
> You might be able to adapt or wrap the mimeprint script from the examples included in the Perl MIME-Tools package. It can disassemble and decode all parts of a message for you.
>
> Of course, there's no guarantee that a message *has* a meaningful text body, or that the text part of a multipart/alternative message resembles what a common MUA will show a user by rendering the HTML part.
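For anyone who would rather not wrap Perl, the same idea can be sketched with Python's standard `email` package. This is only a sketch under stated assumptions, not the mimeprint approach: the function name `first_text`, the 200-character default, and the fallback to tag-stripped HTML when no text/plain part exists are all choices made here for illustration. As Bill notes, the plain part may not resemble what an MUA would actually display.

```python
import re
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def first_text(raw, limit=200):
    """Return up to `limit` characters of readable body text (hypothetical helper)."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer text/plain; fall back to text/html (an assumption of this sketch).
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""  # no text body at all, e.g. an image-only message
    try:
        text = part.get_content()
    except LookupError:
        return ""  # the part declares a charset Python does not know
    if part.get_content_subtype() == "html":
        parser = _TextOnly()
        parser.feed(text)
        parser.close()
        text = " ".join(parser.parts)
    # Collapse runs of whitespace so the limit counts visible text.
    return re.sub(r"\s+", " ", text).strip()[:limit]
```

Called as `first_text(open(path, "rb").read())`, it returns an empty string for messages with no text part, so the caller can decide what to do with image-only mail.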
Re: Is BAYES filtering working? Having doubts.
With SpamAssassin, is it possible to have the filter report counts that a user can check: the number of messages classified as spam, the number classified as ham, and the total number of messages processed?

On Mon, 28 Dec 2015, Bill Cole wrote:

> Date: Mon, 28 Dec 2015 23:42:03
> From: Bill Cole
> Reply-To: users@spamassassin.apache.org
> To: users@spamassassin.apache.org
> Subject: Re: Is BAYES filtering working? Having doubts.
>
> On 28 Dec 2015, at 17:54, Peter L. Berghold wrote:
>
>> The script that I use to pull the messages out of a spam bucket invoking sa-learn runs as root, which has permissions to read from anywhere. The complication is that amavis does not have permission to read the Maildir files of ordinary users the way root does. That said, I have some thoughts as to how to solve that.
>
> In case your ideas don't work out...
>
> Useful facts: sa-learn reads stdin if you don't give it any file arguments, and it can take mbox format as input.
>
> Using these facts, my learning script that runs as root and reads from multiple real users' Maildirs does this to learn ham:
>
>     for AFILE in $HAMS ; do formail < "$AFILE" ; done | sudo -H -u "$SAUSER" sa-learn --ham --mbox
>
> Where $HAMS is the list of ham message files and $SAUSER is the user handling the system-wide BayesDB. I use formail there just to give each message a leading 'From ' line (i.e. mbox format) so that the whole bunch can be piped into a single sa-learn invocation. The alternative without formail would be to pipe each raw message into its own sa-learn.
>
> If you don't have sudo installed or don't like letting root use it, you can replicate the same effect with su in an uglier command line.
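To make the role of formail in that pipeline concrete, here is a rough Python sketch of the same step: joining raw message files into one mbox-format stream by giving each message a leading 'From ' line. It is an illustration only, not a replacement for formail. The function name `to_mbox` and the `MAILER-DAEMON` placeholder sender are assumptions made here, and the sketch does not invoke sa-learn itself.

```python
import time
from pathlib import Path


def to_mbox(paths, sender="MAILER-DAEMON"):
    """Join raw message files into one mbox-format byte string (hypothetical helper)."""
    # mbox separator lines look like: "From <sender> <asctime date>"
    stamp = time.asctime(time.gmtime()).encode("ascii")
    chunks = []
    for path in paths:
        lines = Path(path).read_bytes().splitlines()
        # Add a separator only if the file does not already start with one.
        if not (lines and lines[0].startswith(b"From ")):
            chunks.append(b"From " + sender.encode("ascii") + b" " + stamp + b"\n")
        for index, line in enumerate(lines):
            # Escape later lines that would otherwise look like a new separator.
            if index > 0 and line.startswith(b"From "):
                line = b">" + line
            chunks.append(line + b"\n")
        chunks.append(b"\n")  # blank line between messages
    return b"".join(chunks)
```

The resulting bytes could be written to stdout and piped into a single `sa-learn --ham --mbox` invocation, in the same way as the shell loop quoted above.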
Re: Large spam
I don't know whether anyone can help me with a question about what a message component is called, but if you can, I think I know how to defeat this large spam. Before a message is opened there is something you read, which I'll call a tag, such as "make money fast", and it is not the Subject: line. I filtered on those tags and managed to send a lot of this spam to /dev/null. None of these filters would or could learn from it, and eventually those fields started showing foreign characters too. I never did find out the name of that field; otherwise I could have written procmail filters for all of it. I hope this helps someone.

On Wed, 15 Jul 2015, Ian Zimmerman wrote:

> Date: Wed, 15 Jul 2015 16:42:28
> From: Ian Zimmerman i...@buug.org
> To: users@spamassassin.apache.org
> Subject: Re: Large spam
>
> On 2015-07-15 20:12 +, Zinski, Steve wrote:
>
>> We're starting to see a lot of spam in the 800KB to 1.2MB size range. I'm running MIMEdefang and it's configured to skip messages larger than 100KB (and I hesitate to increase the limit due to performance issues). I read somewhere that there's a way to have MIMEdefang (or spamassassin) strip out the non-text portions of the e-mail and scan. Can anyone help me set this up or point me in the right direction? Thanks!
>
> Yes, I see the same thing. I have no doubt at all that it is intentional, to defeat the spamc size limit in particular.
>
> Moreover, mimedefang won't help because at least some of them are disguised as plain text messages. That is, the outermost message body is an entire MIME message, headers and all.
Re: dangers of email forgery
A little more background on all of this: both Verizon and Microsoft had earlier blacklisted shellworld.net on a domain basis because of the high volume of spam forged with several addresses on that domain. Mine wasn't the only shellworld.net address that was targeted. I know this because the spammers did not use the BCC: field for their other addresses, and several of the addresses I read were shellworld.net addresses.

--
Twitter: JudeDaShiell

On Mon, 30 Mar 2015, Reindl Harald wrote:

> On 30.03.2015 at 21:07, RW wrote:
>
>> On Mon, 30 Mar 2015 13:55:52 -0400 (EDT) Jude DaShiell wrote:
>>
>>> One of them is that spammers forge your address so much you get your account blacklisted and end up having to have it shut down. That happened to me and the jdash...@shellworld.net account.
>>
>> AFAIK there is no blacklist that lists individual sender email addresses
>
> the only thing i can imagine from the OP is a URIBL listing the domain, and i would be really interested which one would make such major mistakes - more realistic is a local sender blacklist like we do for all the newly registered domains used for the recent Apple phishings
Re: dangers of email forgery
Hi, I wasn't and am not the admin of shellworld.net, and I don't know whether the domain was set up with an SPF record or not. I know one thing for sure: before I try setting up my own domain, I'll be back here to ask a few questions. For screen reader accessibility I've heard good things about freedns.eu but haven't had any dealings with them yet. The godaddy.com website is inaccessible to screen reader users, so they won't even be in the running.

--
Twitter: JudeDaShiell

On Mon, 30 Mar 2015, Reindl Harald wrote:

> On 30.03.2015 at 19:55, Jude DaShiell wrote:
>
>> One of them is that spammers forge your address so much you get your account blacklisted and end up having to have it shut down. That happened to me and the jdash...@shellworld.net account. Anyone doing a google search on shellworld.net blacklisted will find my former shellworld.net address in the first document google returns
>
> did you have SPF at that time (now you have)?
>
> if yes, and blacklists listed you because of forged spam from foreign servers, you should blame the blacklists and make them public so anybody can stop using those idiots causing collateral damage
dangers of email forgery
One of them is that spammers forge your address so much that you get your account blacklisted and end up having to have it shut down. That happened to me and the jdash...@shellworld.net account. Anyone doing a google search on shellworld.net blacklisted will find my former shellworld.net address in the first document google returns.

Because of spammers and blacklisting, it's probably a good idea to minimize your use of space on internet providers' machines, since sooner rather than later your account is going to get blown away.

What would really be useful for any spam-fighting package is the ability to check message headers automatically and forward the servers found to be forging to a kill list, so those servers could be blacklisted in turn. So far I know of no software that provides this service.

--
Twitter: JudeDaShiell
Re: Handling very large messages (was Re: Which milter do you prefer?)
I have been getting large spam messages for several years on one of my accounts. Since SpamAssassin cannot handle them, my only recourse is procmail recipes.

--
Twitter: JudeDaShiell

On Sun, 15 Mar 2015, Robert Schetterer wrote:

> On 15.03.2015 at 12:05, Reindl Harald wrote:
>
>> On 14.03.2015 at 20:17, Robert Schetterer wrote:
>>
>>> On 14.03.2015 at 18:11, Reindl Harald wrote:
>>>
>>>> but nobody talks about cutting content; we talk about how to pass only a part to spamassassin instead of skipping large messages entirely, which in many cases would be enough to detect a message as spam because the oversized parts are just binary parts
>>>
>>> Ok, but big spam mails are extremely rare, i wouldn't invest time in that
>>
>> you are so terribly wrong
>
> my intention was never to agree with you
>
>> more and more spam messages are coming with a very large image because spammers know the default 256 KB limit, which also affects commercial products like those from Barracuda Networks; that is not a new trend. there is a reason for -s 5242880 in our setup while i started with -s 786432 a few months ago
>
> as i wrote, this may happen at your site; you should not set your experience as the ultimate one. everyone has his own spam, i don't see any rise in large mail spam here
>
> back to topic: i would recommend a two-stage spam filtering if you got in trouble with big spam mail, i.e. spamass-milter in the front line, then perhaps combine sieve filters with size/spam matches etc.
>
> Best Regards
> Robert Schetterer
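The idea raised in the quoted exchange, handing the scanner only the first part of an oversized message rather than skipping it entirely, can be sketched in a few lines of Python. This is a bare illustration under assumptions: the function name `head_for_scan` is made up here, the 256 KB default simply echoes the limit mentioned in the thread, and the sketch neither calls spamc nor reflects how any milter actually implements truncation.

```python
def head_for_scan(raw, limit=256 * 1024):
    """Return at most `limit` bytes of a raw message, cut at a line boundary.

    Hypothetical helper: the headers and the start of the body are kept,
    and whatever lies beyond the limit (often a large binary part) is dropped.
    """
    if len(raw) <= limit:
        return raw
    # Cut after the last complete line that fits within the limit.
    cut = raw.rfind(b"\n", 0, limit)
    if cut == -1:
        return raw[:limit]  # no newline at all: fall back to a hard cut
    return raw[: cut + 1]
```

A truncated copy like this may end in the middle of a MIME part, so whether the scanner still scores it sensibly is something to test on real mail before relying on it.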
whitelist formats?
Does a whitelist format exist for whitelisting an email list? What is the format for whitelisting individuals? I have people and lists improperly showing up in my probably-spam folder, so I need to keep them in my inbox and stop SpamAssassin from tossing any more of those messages out of my inbox inappropriately.
what can be done about deep sea nutrition spam?
The garbage they send is 6MB in length. Their unsubscribe link also doesn't work.
Re: what can be done about deep sea nutrition spam?
That message will probably arrive again by tomorrow. Because of its size, I'll put it in my web space with full headers and, once that is done, send a follow-up URL to this list. Anyone interested can then get all the details.

On Wed, 29 Oct 2014, David Jones wrote:

>> From: Jude DaShiell jdash...@panix.com
>> Sent: Wednesday, October 29, 2014 3:54 PM
>> To: users@spamassassin.apache.org
>> Subject: what can be done about deep sea nutrition spam?
>>
>> The garbage they send is 6MB in length. Their unsubscribe link also doesn't work.
>
> Use RBLs that have this server listed (you didn't provide any details for us to check) or set up your own RBL with rbldnsd so you can block easily at the MTA level before it gets to SA.
large spam messages
Since SpamAssassin cannot handle large spam over 2MB in size, what can be used to handle that class of junk? Maybe some of you have received messages from 3 Bureau Monitoring. I get those probably twice daily and, much as I dislike it, I will probably terminate that other internet account when its time runs out. Another account I have uses the web version of SpamAssassin, so when I have to start using that I'll find out what it can do.
punctuation in subjects
Messages whose subjects consist of question marks and spaces have been showing up in my inbox on another account. To blacklist these, the pattern [? ] would match those characters in a Subject: line. Would such a regular expression effectively blacklist any message having just those two kinds of characters in its Subject: line, in any combination? The ultimate blacklist entry for such messages would include all punctuation and the space character.

These messages are written in character sets that are not translated to US-ASCII or Unicode, so I'd be open to blacklisting based on the national origin of messages as well, but I think both kinds of blacklist entries will need to be used in order to shut this traffic off permanently. The other internet service provider runs his system wide open, and users have to use SpamAssassin to deal with the consequences.
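On the regular expression itself: an unanchored character class such as [? ] matches any subject that contains at least one question mark or space, which is nearly every subject. To match only subjects made up entirely of those characters, the class has to be repeated and anchored at both ends (for example ^[? ]+$ in most regex dialects). The Python sketch below shows the difference; the function names are made up for illustration, and the syntax a particular filter expects for its rules may differ from Python's.

```python
import re
import string


def matches_loose(subject):
    """True if the subject contains at least one '?' or space anywhere."""
    return re.search(r"[? ]", subject) is not None


def only_chars(subject, chars="? "):
    """True if the subject is non-empty and made up solely of `chars`."""
    pattern = "[" + re.escape(chars) + "]+"
    # fullmatch anchors the pattern at both ends of the string.
    return re.fullmatch(pattern, subject) is not None


# The broader "all punctuation plus space" entry from the question.
PUNCT_AND_SPACE = string.punctuation + " "
```

With these, `matches_loose("Hello world")` is true because of the space, while `only_chars("Hello world")` is false and `only_chars("??? ?? ????")` is true. Passing `PUNCT_AND_SPACE` as the second argument gives the broader punctuation-only check.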