Re: Looking for a script to extract readable text from emails

2015-12-29 Thread Jude DaShiell
If that problem ever gets solved, blind users of the internet could do 
two useful things: read things faster, and prevent lots of images 
from taking up user quota space.  Blind users who can hear would not 
want the audio content of video or audio files filtered out, though.


On Tue, 29 Dec 2015, Bill Cole wrote:


Date: Tue, 29 Dec 2015 01:07:55
From: Bill Cole 
Reply-To: users@spamassassin.apache.org
To: users@spamassassin.apache.org
Subject: Re: Looking for a script to extract readable text from emails

On 28 Dec 2015, at 23:16, Marc Perkel wrote:

I'm looking for a script to extract readable text from emails. I want it 
demimed, ignore html, images, etc. What I'm looking for is just the 
readable text (real words). Mostly just need to extract about the first 200 
characters of real text.


Can someone point me in the right direction?


You might be able to adapt or wrap the mimeprint script from the examples 
included in the Perl MIME-Tools package. It can disassemble and decode all 
parts of a message for you.


Of course, there's no guarantee that a message *has* a meaningful text body, 
or that the text part of a multipart/alternative message resembles what a 
common MUA will show a user by rendering the HTML part.
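
A minimal sketch of the same idea, using Python's standard email package rather than the MIME-Tools mimeprint script suggested above. The function name first_readable_text and its limit parameter are made up for illustration. It picks the text/plain body, ignores HTML and attachments, and returns an empty string when there is no plain-text body, which is the case warned about above.

```python
import email
import re
from email import policy


def first_readable_text(raw_bytes, limit=200):
    """Return up to `limit` characters of the text/plain body, or ''."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # get_body() walks multipart structures; restricting the preference
    # list to 'plain' skips HTML parts and attachments.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""  # the message has no plain-text body at all
    text = part.get_content()  # decodes the transfer encoding and charset
    # Collapse runs of whitespace so the excerpt reads as one line.
    return re.sub(r"\s+", " ", text).strip()[:limit]
```

Called with the raw bytes of one message, e.g. first_readable_text(open(path, "rb").read()), it yields at most the first 200 characters of decoded text. It makes no attempt to judge whether that text matches what the HTML part would show.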




--



Re: Is BAYES filtering working? Having doubts.

2015-12-29 Thread Jude DaShiell
With spamassassin, is it possible to have the filter show counts of the 
number of messages sent to spam, the number of messages sent to ham, and 
the total number of messages processed, so that a user can check them?

On Mon, 28 Dec 2015, Bill Cole wrote:



Date: Mon, 28 Dec 2015 23:42:03
From: Bill Cole 
Reply-To: users@spamassassin.apache.org
To: users@spamassassin.apache.org
Subject: Re: Is BAYES filtering working? Having doubts.

On 28 Dec 2015, at 17:54, Peter L. Berghold wrote:


The script that I use to pull the messages out of a
spam bucket and invoke sa-learn runs as root, which has permission to read
from anywhere.  The complication is that amavis does not have permission
to read ordinary users' Maildir files the way root does.

That said, I have some thoughts as to how to solve that.


In case your ideas don't work out...

Useful facts: sa-learn reads stdin if you don't give it any file arguments 
and it can take mbox format as input.


Using these facts, my learning script that runs as root and reads from 
multiple real users' Maildirs does this to learn ham:


 for AFILE in $HAMS ; do formail < $AFILE ; done | sudo -H -u $SAUSER sa-learn --ham --mbox


Where $HAMS is the list of ham message files and $SAUSER is the user handling 
the system-wide BayesDB. I use formail there just to give each message a 
leading 'From ' line (i.e. mbox format) so that the whole bunch can be piped 
into a single sa-learn invocation. The alternative without formail would be 
to pipe each raw message into its own sa-learn.  If you don't have sudo 
installed or don't like letting root use it, you can replicate the same 
effect with su in an uglier command line.
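
For anyone who would rather script this than use a shell loop, here is a rough Python sketch of the same approach. It is not what the learning script above does: to_mbox and learn_ham are hypothetical helper names, and the separator line is a placeholder rather than what formail would generate. It only assumes, as stated above, that sa-learn accepts mbox input on stdin.

```python
import subprocess


def to_mbox(raw_messages):
    """Join raw messages (bytes) into one mbox-style byte stream."""
    chunks = []
    for raw in raw_messages:
        # Placeholder separator line marking the start of each message.
        chunks.append(b"From MAILER-DAEMON Thu Jan  1 00:00:00 1970\n")
        for line in raw.splitlines():
            # Escape lines that would otherwise look like a separator.
            if line.startswith(b"From "):
                line = b">" + line
            chunks.append(line + b"\n")
        chunks.append(b"\n")  # blank line ends the message
    return b"".join(chunks)


def learn_ham(paths, sa_user):
    """Pipe all files in `paths` into a single sa-learn run as `sa_user`."""
    messages = []
    for path in paths:
        with open(path, "rb") as handle:
            messages.append(handle.read())
    subprocess.run(
        ["sudo", "-H", "-u", sa_user, "sa-learn", "--ham", "--mbox"],
        input=to_mbox(messages),
        check=True,  # raise if sa-learn exits with an error
    )
```

As with the shell version, the point of building one mbox stream is that sa-learn starts once for the whole batch instead of once per message.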




--



Re: Large spam

2015-07-16 Thread Jude DaShiell
I don't know if someone can help me with a question about the naming of 
message components, but if you can, I think I know how to defeat this large 
spam.  Before a message gets opened, there is what I'll call a tag that 
you'll read, like "make money fast", and it is not on the Subject: line 
either.  It was those tags I filtered on, and I managed to send lots of it 
to /dev/null.  None of these filters would or could learn from it, and 
eventually those fields started showing foreign characters too.  I never 
did find out the name of that field; otherwise I could have written 
procmail filters for all of it.  I hope this helps someone.


On Wed, 15 Jul 2015, Ian Zimmerman wrote:


Date: Wed, 15 Jul 2015 16:42:28
From: Ian Zimmerman i...@buug.org
To: users@spamassassin.apache.org
Subject: Re: Large spam

On 2015-07-15 20:12 +, Zinski, Steve wrote:


We're starting to see a lot of spam in the 800KB to 1.2MB size
range. I'm running MIMEdefang and it's configured to skip messages
larger than 100KB (and I hesitate to increase the limit due to
performance issues). I read somewhere that there's a way to have
MIMEdefang (or spamassassin) strip out the non-text portions of the
e-mail and scan. Can anyone help me set this up or point me in the
right direction? Thanks!


Yes, I see the same thing.  I have no doubt at all that it is
intentional, to defeat the spamc size limit in particular.

Moreover, mimedefang won't help because at least some of them are
disguised as plain text messages.  That is, the outermost message body
is an entire MIME message, headers and all.




--



Re: dangers of email forgery

2015-03-31 Thread Jude DaShiell
A little more background on all of this: both verizon and microsoft 
had earlier blacklisted shellworld.net on a domain basis as a result of 
the high volume of spam being forged from several addresses on that 
domain.  Mine wasn't the only address that was targeted on shellworld.net, 
and I know this since the spammers did not use the BCC: field for their 
other addresses, and several of those I read were shellworld.net addresses.




-- Twitter: JudeDaShiell


On Mon, 30 Mar 2015, Reindl Harald wrote:



Am 30.03.2015 um 21:07 schrieb RW:

On Mon, 30 Mar 2015 13:55:52 -0400 (EDT)
Jude DaShiell wrote:


One of them is that spammers forge your address so much you get your
account blacklisted and end up having to have it shut down.  That
happened to me and the jdash...@shellworld.net account.


AFAIK there is no blacklist that lists individual sender email
addresses


the only thing i can imagine from the OP is a URIBL listing the domain, and i 
would be really interested in which one would make such major mistakes - more 
realistic is a local sender blacklist like the one we keep for all the newly 
registered domains used for the recent Apple phishing





Re: dangers of email forgery

2015-03-31 Thread Jude DaShiell
Hi, I wasn't and am not the admin of shellworld.net and don't know if the 
domain got set up with an spf record or not.  I know one thing for sure, 
before I try setting up my own domain, I'll be back here and ask a few 
questions.  For screen reader accessibility I've heard good things about 
freedns.eu but haven't had any dealings with them yet.  The godaddy.com 
website for screen reader users is inaccessible so they'll not even be in 
the running.




-- Twitter: JudeDaShiell


On Mon, 30 Mar 2015, Reindl Harald wrote:




Am 30.03.2015 um 19:55 schrieb Jude DaShiell:

One of them is that spammers forge your address so much you get your
account blacklisted and end up having to have it shut down.  That
happened to me and the jdash...@shellworld.net account.  Anyone doing a
google search on shellworld.net blacklisted will find my former
shellworld.net address in the first document google returns


did you have SPF at that time (now you have)

if yes and blacklists are listing you because of forged spam from foreign 
servers, you should blame the blacklists and make them public so anybody can 
stop using those idiots causing collateral damage





dangers of email forgery

2015-03-30 Thread Jude DaShiell
One of them is that spammers forge your address so much you get your 
account blacklisted and end up having to have it shut down.  That happened 
to me and the jdash...@shellworld.net account.  Anyone doing a google 
search on shellworld.net blacklisted will find my former shellworld.net 
address in the first document google returns.  As a result of spammers and 
blacklisting it's probably a good idea to minimize use of space on 
internet providers machines since sooner rather than later your account is 
going to get blown away.


What would really be useful for any spam fighting package to acquire is 
the ability to automatically check headers on messages and forward servers 
found to be forging to a kill list so those servers could be blacklisted 
in turn.  So far I know of no such software that will do this service.




-- Twitter: JudeDaShiell



Re: Handling very large messages (was Re: Which milter do you prefer?)

2015-03-15 Thread Jude DaShiell
I have been getting large spam messages for several years on one of my 
accounts.  Since spamassassin cannot handle them, my only recourse is 
procmail recipes.



-- Twitter: JudeDaShiell


On Sun, 15 Mar 2015, Robert Schetterer wrote:


Am 15.03.2015 um 12:05 schrieb Reindl Harald:


Am 14.03.2015 um 20:17 schrieb Robert Schetterer:

Am 14.03.2015 um 18:11 schrieb Reindl Harald:

but nobody talks about cutting content

we talk about how to pass only a part to spamassassin instead of skipping
large messages entirely, which in many cases would be enough to detect a
message as spam because the oversized parts are just binary
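
To make the "pass only a part" idea concrete, here is a small Python sketch. The function name head_for_scan and the 256 KB default are illustrative choices, not anything from spamc or spamass-milter. It keeps only the leading bytes of an oversized message, cut at a line boundary. The truncated result is no longer well-formed MIME, so how well a scanner copes with it is an open assumption.

```python
def head_for_scan(raw_message, max_bytes=256 * 1024):
    """Return small messages unchanged, else only the first `max_bytes` bytes."""
    if len(raw_message) <= max_bytes:
        return raw_message
    head = raw_message[:max_bytes]
    # Cut back to the last complete line so no line is torn in half.
    last_newline = head.rfind(b"\n")
    return head if last_newline == -1 else head[: last_newline + 1]
```

The headers and the first text part normally sit at the front of a message, which is why scanning only the head can still catch spam whose bulk is a large binary part.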


Ok, but big spam mails are extremely rare, i wouldn't invest time in that


you are so terribly wrong


my intention was never to agree with you



more and more spam messages are coming with a very large image because
spammers know the default 256 KB limit which also affects commercial
products like from Barracuda Networks, that is not a new trend

there is a reason for -s 5242880 in our setup while i started with -s
786432 a few months ago



as i wrote, this may happen at your site, you should not treat your
experience as universal
everyone has his own spam, i don't see any rise in large mail spam here

back to topic, i would recommend two-stage spam filtering if you get
in trouble with big spam mail, e.g. spamass-milter in the front line, then
perhaps combine sieve filters with size/spam matches etc

Best Regards
MfG Robert Schetterer




whitelist formats?

2014-11-12 Thread Jude DaShiell
Does a whitelist format exist to whitelist an email list?  What is the 
format to whitelist individuals?  I have people and lists improperly 
showing up in my probably-spam folder so need to keep them in my inbox and 
not allow spamassassin to toss any more of those messages out of my inbox 
inappropriately.




--



what can be done about deep sea nutrition spam?

2014-10-29 Thread Jude DaShiell
The garbage they send is 6MB in length.  Their unsubscribe link also 
doesn't work.




--



Re: what can be done about deep sea nutrition spam?

2014-10-29 Thread Jude DaShiell
That message will arrive again probably by tomorrow.  Due to the size of 
the message, I'll put it in my web space with full headers and once done 
send a follow up url to this list.  Any interested can then get all the 
details.




--


On Wed, 29 Oct 2014, David Jones wrote:


From: Jude DaShiell jdash...@panix.com
Sent: Wednesday, October 29, 2014 3:54 PM
To: users@spamassassin.apache.org
Subject: what can be done about deep sea nutrition spam?



The garbage they send is 6MB in length.  Their unsubscribe link also
doesn't work.


Use RBLs that have this server listed (didn't provide any details for
us to check) or setup your own RBL with rbldnsd so you can block
easily at the MTA level before it gets to SA.


large spam messages

2014-09-04 Thread Jude DaShiell
Since spamassassin cannot handle large spam over 2MB in size, what can be 
used to handle that class of junk?  Maybe some of you have got messages 
from 3 Bureau Monitoring.  I get those probably twice daily and much as I 
dislike it, I will probably terminate that other internet account when 
time for it runs out.
Another account I have uses the web version of spamassassin so when I have 
to start using that I'll find out what it can do.





punctuation in subjects

2014-09-01 Thread Jude DaShiell
Messages with question marks and spaces have been showing up in my inbox 
on another account.  To blacklist these [? ] would take care of those 
characters in a Subject: line.  Would such a regular expression 
effectively blacklist any message having just those two kinds of 
characters in its Subject: line in any combination?  The ultimate 
blacklist entry for such messages would include all punctuation and the 
space character.  These messages are written in fonts not translated by 
us-ascii or unicode, so I'd be open to blacklisting based on national 
origin of messages as well but think both kinds of blacklist entries will 
need to be used in order to shut this traffic off permanently.  The other 
internet service provider runs his system wide open and users have to use 
spamassassin to deal with the consequences.
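
On the regular expression question: a bare [? ] matches any Subject that contains a single question mark or space anywhere, which would catch a great deal of legitimate mail. To match only subjects made up entirely of those characters, the class has to be repeated and anchored to the whole string. A small Python sketch of both variants follows; the function names are made up for illustration, and "all punctuation" is taken here to mean ASCII punctuation only.

```python
import re
import string

# Subject consists of nothing but question marks and spaces.
ONLY_QMARKS = re.compile(r"[? ]+")
# Subject consists of nothing but ASCII punctuation and spaces.
ONLY_PUNCT = re.compile("[" + re.escape(string.punctuation) + " ]+")


def only_qmarks_and_spaces(subject):
    # fullmatch() anchors the pattern to the entire string.
    return ONLY_QMARKS.fullmatch(subject) is not None


def only_punctuation_and_spaces(subject):
    return ONLY_PUNCT.fullmatch(subject) is not None
```

Written as an anchored pattern for a filter, the first variant is ^[? ]+$. Subjects that show up as runs of question marks are often non-ASCII text that failed to decode, so a pattern like this is at best a rough proxy for the language of the message.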