On Wed, Feb 17, 2016 at 11:39:22AM -0500, Michael Tiernan wrote:
> I'm sure that I'm not the first who tried to find an easy way to
> filter a piece of email so that only the plain text comes out.
>
> I can find lots of things about going plain to HTML but I've not
> seen anything that allows you to just extract the "Content-Type:
> text/plain" section of an email.
>
> Any pointers available? I don't want to try and reinvent the
> reinvented wheel.

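For the narrower question of extracting only the text/plain part, a minimal Perl sketch might look like the following. It assumes the Email::MIME module from CPAN is installed; the function name plain_text_parts and the script name plaintext.pl are hypothetical, and a part with no Content-Type header is treated as text/plain, which is the MIME default.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Email::MIME;    # CPAN module, assumed to be installed

# Hypothetical helper: return the decoded body of every text/plain
# leaf part found in a raw message.
sub plain_text_parts {
    my ($raw) = @_;
    my @bodies;
    Email::MIME->new($raw)->walk_parts(sub {
        my ($part) = @_;
        return if $part->subparts;    # multipart container, not a leaf
        # A part with no Content-Type header defaults to text/plain.
        my $type = $part->content_type // 'text/plain';
        push @bodies, $part->body_str if $type =~ m{^\s*text/plain\b}i;
    });
    return @bodies;
}

# Usage: plaintext.pl message.eml   (or "-" to read standard input)
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    local $/;                         # slurp mode
    my $raw = <ARGV>;
    print "$_\n" for plain_text_parts($raw);
}
```

Run it as "plaintext.pl message.eml", or pass "-" as the argument to filter standard input. Attachments declared as text/plain are returned as well, so a stricter version might also look at Content-Disposition.
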
Here is what I use with Mutt to get lightly-formatted text and
unobfuscated links. It isn't perfect, but it works acceptably 90% of
the time, and it avoids fetching anything from remote links, which was
my primary goal.

$ grep mailcap .muttrc
set mailcap_path = ~/.muttmailcap
set mailcap_sanitize

$ cat .muttmailcap
text/html; /home/cra/bin/striphtml.pl; copiousoutput
text/calendar; /home/cra/bin/vcalendar-filter; copiousoutput

$ cat ~/bin/striphtml.pl
#!/usr/bin/perl -w
use strict;
use HTML::Strip;
use HTML::LinkExtor;
use HTML::Entities qw/decode_entities/;
use URI::Escape qw/uri_unescape/;
use Encode qw/from_to/;

# Slurp the whole input (a file argument, or stdin when Mutt pipes it in).
undef $/;
my $html_text = <ARGV>;

# Default to UTF-8 unless a Content-Type header names another charset.
my $charset = 'UTF-8';
if ($html_text =~ /\ncontent-type:\s+text\/html;\s+charset=(.*)/i) {
    $charset = $1;
    $charset =~ s/["\s]//g;    # drop quotes and stray whitespace (e.g. a trailing CR)
} else {
    print "no char set\n";
    #print $html_text;
}

# Turn <br> and <p> tags (including <br/> and <p attr=...>) into newlines
# before the markup is stripped.
$html_text =~ s/<br\s*\/?>/\n/gi;
$html_text =~ s/<p(?:\s[^>]*)?>/\n/gi;

my $hs = HTML::Strip->new();
my $stripped_text = $hs->parse($html_text);
my $decoded_text = decode_entities($stripped_text);

# Collapse runs of blank lines, replace non-breaking spaces, drop CRs.
$decoded_text =~ s/\n\s*\n/\n\n/g;
$decoded_text =~ s/\n\n+/\n\n/g;
$decoded_text =~ s/\240/ /g;
$decoded_text =~ s/\r//g;

#$decoded_text = decode($charset, $decoded_text);
###from_to($decoded_text, $charset, 'UTF-8');

# Collect the links from the original HTML; nothing is fetched.
my $hl = HTML::LinkExtor->new();
$hl->parse($html_text);
my @links = $hl->links;

print "Charset: $charset\n";
print "Message:\n\n";
print $decoded_text;
print "\nLinks:\n\n";

# Each entry is [tag, attribute name, URL, ...].
foreach my $link (@links) {
    printf "%-7s %-15s %s\n", $link->[0], $link->[1], uri_unescape($link->[2]);
}

_______________________________________________
Discuss mailing list
Discuss@blu.org
http://lists.blu.org/mailman/listinfo/discuss