Re: question for the part location of multipart message

2022-01-02 Thread raf
On Sun, Jan 02, 2022 at 01:23:38PM +0100, Jaroslaw Rafa wrote:

> On 1.01.2022 at 11:40:37 Frank Hwa wrote:
> > 
> > For a multipart message, is text/plain part always in the first
> > location?
> > I just want to extract the plain text body of a message. I use the
> > code below (python), but was not very sure.
> 
> I have a perl script that extracts all text/plain parts from multipart
> messages, up to 5 levels nesting of multipart messages one inside another
> (that level is configurable via a parameter in the script).
> 
> If you want to look at it, it's here: http://rafa.eu.org/media/textconv.pl
> -- 
> Regards,
>    Jaroslaw Rafa
>    r...@rafa.eu.org
> --
> "In a million years, when kids go to school, they're gonna know: once there
> was a Hushpuppy, and she lived with her daddy in the Bathtub."

Another thing that might help is my "textmail" program
which is a mail filter that converts non-text
attachments into text attachments where possible (using
external translation programs), and deletes attachments
that can't be translated to text (like images).

It replaces multipart/alternative parts with the
text/plain part unless it looks vestigial, in which
case it replaces them with the other alternative part
converted to text. This is often much better than just
grabbing the text/plain attachment, since it might just
say something like "Your email client does not support
HTML email". There are a few builtin tests to identify
vestigial text/plain parts, and you can add new ones if
necessary.
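
To illustrate the idea in Python (this is not how textmail itself is
implemented; the hint strings and helper names below are made up for
the example), a rough sketch of preferring text/plain unless it looks
vestigial might be the following. It assumes an EmailMessage object
(modern email policy), and the HTML-to-text step is deliberately crude:

```python
from email.message import EmailMessage
from html.parser import HTMLParser

# Hypothetical markers of a placeholder text/plain part. These are
# assumptions for illustration, not textmail's built-in tests.
VESTIGIAL_HINTS = ("does not support html", "html-capable mail client")


class _TextExtractor(HTMLParser):
    """Collect the character data of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Crude conversion: keep the text nodes and normalise whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def looks_vestigial(text):
    """True if the text is empty or contains one of the hint strings."""
    lowered = text.strip().lower()
    return not lowered or any(hint in lowered for hint in VESTIGIAL_HINTS)


def best_text(alternative):
    """Pick readable text from a multipart/alternative message or part."""
    plain = alternative.get_body(preferencelist=("plain",))
    if plain is not None and not looks_vestigial(plain.get_content()):
        return plain.get_content()
    html = alternative.get_body(preferencelist=("html",))
    if html is not None:
        return html_to_text(html.get_content())
    return plain.get_content() if plain is not None else ""


# Usage: a message whose text/plain alternative is only a placeholder.
demo = EmailMessage()
demo.set_content("Your email client does not support HTML email")
demo.add_alternative("<html><body><p>Hello <b>world</b></p></body></html>",
                     subtype="html")
print(best_text(demo))
```

A real filter would need a better HTML converter and a longer list of
tests; the point is only the order of preference.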

It can also save attachments with particular mimetypes.

A command like this does something like what you want:

  cat msg | textmail | textmail -F text/plain -G /path/for/attachments >/dev/null

That performs the default transformations, then saves
all resulting text/plain attachments to a directory,
and discards the resulting mail message.

  https://raf.org/textmail
  https://github.com/raforg/textmail

However, it requires multiple external processes
(textmail/perl itself and the translators), and so
probably only works on UNIX-like systems.

If you need it to be pure Python, and aren't expecting
any vestigial text/plain parts, you could modify your
existing script to recursively examine all parts
looking for text/plain. Something like this:

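import email

def get_text_parts(msg):
    """Recursively collect the payloads of all text/plain parts."""
    parts = []
    if msg.is_multipart():
        # For a multipart message, get_payload() returns the list of subparts.
        for part in msg.get_payload():
            parts.extend(get_text_parts(part))
    elif msg.get_content_type() == 'text/plain':
        parts.append(msg.get_payload())
    return parts

# x is the raw message text, as in your script
text_parts = get_text_parts(email.message_from_string(x))
print('%r' % text_parts)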
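
One caveat with the function above: get_payload() without arguments
returns each body still in its transfer encoding (base64 or
quoted-printable). A variant that also decodes each part could look
like the sketch below. The function name is made up for the example,
and falling back to UTF-8 when no usable charset is declared is an
assumption, not something the standard prescribes:

```python
import email


def get_decoded_text_parts(msg):
    """Return every text/plain leaf part of msg as a decoded str."""
    texts = []
    # walk() visits the message itself and all nested subparts.
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        raw = part.get_payload(decode=True)
        if raw is None:
            continue
        # Assumption: fall back to UTF-8 if no charset is declared.
        charset = part.get_content_charset() or 'utf-8'
        try:
            texts.append(raw.decode(charset, errors='replace'))
        except LookupError:
            # The declared charset name is unknown to Python.
            texts.append(raw.decode('utf-8', errors='replace'))
    return texts
```

It is called the same way, e.g.
get_decoded_text_parts(email.message_from_string(x)).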

cheers,
raf



Re: question for the part location of multipart message

2022-01-02 Thread Jaroslaw Rafa
On 1.01.2022 at 11:40:37 Frank Hwa wrote:
> 
> For a multipart message, is text/plain part always in the first
> location?
> I just want to extract the plain text body of a message. I use the
> code below (python), but was not very sure.

I have a perl script that extracts all text/plain parts from multipart
messages, up to 5 levels nesting of multipart messages one inside another
(that level is configurable via a parameter in the script).

If you want to look at it, it's here: http://rafa.eu.org/media/textconv.pl
-- 
Regards,
   Jaroslaw Rafa
   r...@rafa.eu.org
--
"In a million years, when kids go to school, they're gonna know: once there
was a Hushpuppy, and she lived with her daddy in the Bathtub."


Re: question for the part location of multipart message

2021-12-31 Thread Bill Cole

On 2021-12-31 at 22:40:37 UTC-0500 (Sat, 01 Jan 2022 11:40:37 +0800)
Frank Hwa is rumored to have said:


> Hello
>
> For a multipart message, is text/plain part always in the first
> location?


No.

Arguably it SHOULD be first in a multipart/alternative, but it isn't 
always. Also, there are other subtypes of multipart that can include 
text/plain subparts elsewhere.
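
Since position can't be relied on, it is safer to search by content
type. A minimal sketch, assuming Python 3.6 or later and a message
parsed with policy.default (the function name plain_body is just for
illustration):

```python
import email
from email import policy


def plain_body(raw_message):
    """Return the text/plain body of a raw message string, or None."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    # get_body() searches for the best body candidate of the requested
    # kind, wherever it sits, and skips parts marked as attachments.
    part = msg.get_body(preferencelist=('plain',))
    return part.get_content() if part is not None else None
```

This returns None when the message has no text/plain body at all, so
the caller still has to decide what to do with HTML-only mail.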


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


question for the part location of multipart message

2021-12-31 Thread Frank Hwa

Hello

For a multipart message, is text/plain part always in the first 
location?
I just want to extract the plain text body of a message. I use the code 
below (python), but was not very sure.


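import email

# x holds the raw message text
b = email.message_from_string(x)

if b.is_multipart():
    # Prints only the first subpart, whatever its content type is.
    for part in b.get_payload():
        print(part.get_payload())
        break
else:
    print(b.get_payload())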


Thank you.