Re: The lib email parse problem...
叮叮当当 wrote:

> I have used a temporary method to overcome it.
>
> I still think the email lib should give the boundary border to parse
> mail.

the email lib you're using is a PARSER, and it's already PARSING the mail
for you.

(if you have trouble structuring your program when someone else is doing
the parsing for you, what makes you think it would be easier if you had
to do the parsing yourself as well?)

wrt. your temp method, I think you'll find that a recursive solution would
be a lot easier to get right without having to resort to code duplication
like in your example; I think John Machin posted an example earlier in
this thread.

--
http://mail.python.org/mailman/listinfo/python-list
Re: The lib email parse problem...
Thanks.

I have used a temporary method to overcome it.

I still think the email lib should give the boundary border to parse
mail.

The code is as follows:

    def parse_mail_content(self, mail):
        content = ''
        alter = False
        subty = ''
        html = ''
        plain = ''
        for part in mail.walk():
            if part.is_multipart():
                if part.get_content_subtype() == 'alternative':
                    alter = True
                else:
                    alter = False
                continue
            if part.get_content_maintype() == 'text':
                if part.get_filename():
                    continue
                ty = part.get_content_subtype()
                ch = part.get_content_charset()
                if alter and ty == 'plain':
                    subty = 'plain'
                    if ch:
                        plain = unicode(part.get_payload(decode=True), ch).encode('utf-8')
                    else:
                        plain = part.get_payload(decode=True).decode('gb2312').encode('utf-8')
                elif alter and ty == 'html':
                    subty = 'html'
                    if ch:
                        html = unicode(part.get_payload(decode=True), ch).encode('utf-8')
                    else:
                        html = part.get_payload(decode=True).decode('gb2312').encode('utf-8')
                elif not alter:
                    if subty == 'html':
                        content += html
                    elif subty == 'plain':
                        content += plain
                    alter = False
                    subty = ''
                    if ch:
                        content += unicode(part.get_payload(decode=True), ch).encode('utf-8')
                    else:
                        content += part.get_payload(decode=True).decode('gb2312').encode('utf-8')
            elif alter:
                if subty == 'html':
                    content += html
                elif subty == 'plain':
                    content += plain
                alter = False
                subty = ''
        if alter:
            if subty == 'html':
                content += html
            elif subty == 'plain':
                content += plain
        return content

Thanks very much.

John Machin wrote:
> So have you tried to use the example I posted yesterday? Do you still
> have any problems? Note: it is generally a good idea to post a message
> when you have overcome a problem -- that lets would-be helpers know that
> they are "off the case" :-)
Re: The lib email parse problem...
I wrote a multipart parser myself in Java (I customised it because I
needed "upload progress" information), and I think it would also be easy
to implement in Python. I have not had time to do it, or I would post it.

But if you have no other special needs, just use the email lib. It is
quick to program with, and if you really do not need some part, just drop
it. Is there anything wrong with the email lib?

叮叮当当 wrote:
> Yes, what is special is that I must choose exactly one section to
> deconstruct, instead of processing all subparts.
Re: The lib email parse problem...
On 30/08/2006 4:44 PM, 叮叮当当 wrote:
> Yes, what is special is that I must choose exactly one section to
> deconstruct, instead of processing all subparts.

So have you tried to use the example I posted yesterday? Do you still
have any problems?

Note: it is generally a good idea to post a message when you have
overcome a problem -- that lets would-be helpers know that they are
"off the case" :-)

Cheers,
John
Re: The lib email parse problem...
Yes, what is special is that I must choose exactly one section to
deconstruct, instead of processing all subparts.

John Machin wrote:
> My guess is that the OP meant special in the sense that the reader
> needs to choose one subpart, instead of processing all subparts.
Re: The lib email parse problem...
Tim Roberts wrote:
> 叮叮当当 <[EMAIL PROTECTED]> wrote:
>
> >I know email can handle multipart, but the alternative subpart is
> >special.
>
> No, it's not. A multipart/alternative section is constructed exactly the
> same as any other multipart section. It just so happens that it will have
> exactly two subsections, one text/plain and one text/html.

I was under the impression that it was a little more general than that
... see e.g. http://www.freesoft.org/CIE/RFC/1521/18.htm

My guess is that the OP meant special in the sense that the reader
needs to choose one subpart, instead of processing all subparts.

Cheers,
John
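
To illustrate the point that multipart/alternative is more general than "one plain, one HTML": RFC 1521 lists the alternatives in order of increasing faithfulness and says the best choice is generally the last part of a type the recipient supports. The following is a minimal Python 3 sketch of that rule, not code from the thread. The three-part sample message, the name `pick_alternative`, and the `supported` sets are all hypothetical.

```python
import email

# Hypothetical sample: three alternatives, from least to most faithful.
RAW = "\n".join([
    'Content-Type: multipart/alternative; boundary="ALT"',
    "MIME-Version: 1.0",
    "",
    "--ALT",
    "Content-Type: text/plain",
    "",
    "abcd.",
    "--ALT",
    "Content-Type: text/enriched",
    "",
    "<bold>abcd.</bold>",
    "--ALT",
    "Content-Type: text/html",
    "",
    "<p><b>abcd.</b></p>",
    "--ALT--",
    "",
])


def pick_alternative(part, supported):
    """Return the last subpart whose content type the reader supports."""
    chosen = None
    for subpart in part.get_payload():
        if subpart.get_content_type() in supported:
            chosen = subpart  # later subparts win over earlier ones
    return chosen  # None when nothing is supported


msg = email.message_from_string(RAW)
best = pick_alternative(msg, {"text/plain", "text/html"})
print(best.get_content_type())
```

A reader that cannot handle HTML would pass a different `supported` set and get one of the earlier subparts instead. The caller has to handle the `None` case.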
Re: The lib email parse problem...
叮叮当当 <[EMAIL PROTECTED]> wrote:

>I know how to use the email module.
>
>The question is about how to handle the RFC 1521 MIME
>multipart/alternative part.
>
>I know email can handle multipart, but the alternative subpart is
>special.

No, it's not. A multipart/alternative section is constructed exactly the
same as any other multipart section. It just so happens that it will have
exactly two subsections, one text/plain and one text/html.
--
- Tim Roberts, [EMAIL PROTECTED]
  Providenza & Boekelheide, Inc.
Re: The lib email parse problem...
叮叮当当 wrote:
> This is not enough.
>
> When a part is multipart/alternative, I must find out which subpart I
> need, not all the subparts, so I must know where the alternative ends.

So you'll have to write your own tree-walker. It would seem that
is_multipart(), get_content_type() and get_payload() are the important
methods.

Here's a quickly lashed-up example:

    def choose_one(part, html_ok=False):
        last = None
        for subpart in part.get_payload():
            if html_ok or "html" not in subpart.get_content_type():
                last = subpart
        return last

    def traverse(part, html_ok=False):
        mp = part.is_multipart()
        ty = part.get_content_type()
        print "multi:%r type:%r file:%r" % (mp, ty, part.get_filename("<>"))
        if mp:
            if ty == "multipart/alternative":
                chosen = choose_one(part, html_ok=html_ok)
                traverse(chosen, html_ok=html_ok)
            else:
                for subpart in part.get_payload():
                    traverse(subpart, html_ok=html_ok)

    import email
    pmsg = email.message_from_string(msg_text)
    for toggle in (True, False):
        print "--- html_ok is %r ---" % toggle
        traverse(pmsg, html_ok=toggle)

With a suitable message, this produced:

    --- html_ok is True ---
    multi:True type:'multipart/alternative' file:'<>'
    multi:False type:'text/html' file:'<>'
    --- html_ok is False ---
    multi:True type:'multipart/alternative' file:'<>'
    multi:False type:'text/plain' file:'<>'
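
Building on the tree-walker idea, the same recursion can return the decoded text instead of printing the structure. What follows is a Python 3 sketch under stated assumptions, not the code posted in the thread. The names `choose_preferred` and `collect_text`, the preference tuple, the nested sample message, and the `gb2312` fallback (borrowed from the original poster's code) are all illustrative.

```python
import email

# Hypothetical sample: an intro part followed by a multipart/alternative.
RAW = "\n".join([
    'Content-Type: multipart/mixed; boundary="OUTER"',
    "MIME-Version: 1.0",
    "",
    "--OUTER",
    'Content-Type: text/plain; charset="us-ascii"',
    "",
    "intro",
    "--OUTER",
    'Content-Type: multipart/alternative; boundary="INNER"',
    "",
    "--INNER",
    'Content-Type: text/plain; charset="gb2312"',
    "",
    "abcd.",
    "--INNER",
    'Content-Type: text/html; charset="gb2312"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "<p>abcd=2E</p>",
    "--INNER--",
    "--OUTER--",
    "",
])


def choose_preferred(part, preferred):
    """Pick the first subpart matching the preference order, else the last one."""
    subparts = part.get_payload()
    for wanted in preferred:
        for sub in subparts:
            if sub.get_content_type() == wanted:
                return sub
    return subparts[-1] if subparts else None


def collect_text(part, preferred=("text/html", "text/plain")):
    """Recursively gather text, descending into one branch per alternative."""
    if part.is_multipart():
        if part.get_content_type() == "multipart/alternative":
            chosen = choose_preferred(part, preferred)
            return collect_text(chosen, preferred) if chosen is not None else ""
        return "".join(collect_text(sub, preferred) for sub in part.get_payload())
    if part.get_content_maintype() != "text" or part.get_filename():
        return ""  # skip attachments and non-text leaves
    raw = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "gb2312"  # assumed fallback
    return raw.decode(charset, errors="replace")


msg = email.message_from_string(RAW)
print(collect_text(msg))
print(collect_text(msg, preferred=("text/plain",)))
```

Because each multipart/alternative is handled in a single place, there is no flag to reset when the alternative "ends"; the recursion simply returns. An unknown charset name would still raise `LookupError` here, which a real program would need to catch.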
Re: The lib email parse problem...
叮叮当当 wrote:

> btw, I know how to use walk(), and that is not the question.

so what is the question?
Re: The lib email parse problem...
I know how to use the email module.

The question is about how to handle the RFC 1521 MIME
multipart/alternative part.

I know email can handle multipart, but the alternative subpart is
special.

Steve Holden wrote:
> In other words, you *haven't* tried the email module.
>
> Read the documentation and look for sample code, then get back to the
> list with questions about how to make email do what you want it to.
Re: The lib email parse problem...
This is just a temporary solution for the simplest email format, like my
example, and I cannot always show only the HTML part. In fact, there are
many more difficult mail formats.

Btw, I know how to use walk(), and that is not the question.

My code is as follows:

    def mail_content(mail):
        content = ''
        for part in mail.walk():
            if part.is_multipart():
                continue
            ch = part.get_content_charset()
            if ch:
                content += unicode(part.get_payload(decode=True), ch).encode('utf-8')
            else:
                content += part.get_payload(decode=True).decode('gb2312').encode('utf-8')
        return content

Fredrik Lundh wrote:
> so use the email module:
>
> [snip]
>
> (the message instances either contains a payload or sequence of
> submessages; use message.is_multipart() to see if it's a sequence or
> not. the walk() method used in this example loops over all submessages,
> in message order).
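
For readers on Python 3, the decoding step in `mail_content` can be sketched as below. There, `get_payload(decode=True)` returns bytes with the Content-Transfer-Encoding already undone, and the charset decode is a separate step. The helper name `part_text`, the sample message, and the choice to retry with the fallback on an unknown charset are assumptions. The `gb2312` default mirrors the code above.

```python
import base64
import email


def part_text(part, fallback="gb2312"):
    """Undo the transfer encoding, then decode with the declared charset."""
    raw = part.get_payload(decode=True)  # bytes in Python 3
    charset = part.get_content_charset() or fallback
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # charset name unknown to Python
        return raw.decode(fallback, errors="replace")


def mail_content(mail):
    """Concatenate the decoded text of every non-multipart part."""
    return "".join(part_text(p) for p in mail.walk() if not p.is_multipart())


# Hypothetical sample: one ASCII part, one base64 part with no charset given.
encoded = base64.b64encode("你好".encode("gb2312")).decode("ascii")
RAW = "\n".join([
    'Content-Type: multipart/mixed; boundary="B"',
    "",
    "--B",
    'Content-Type: text/plain; charset="us-ascii"',
    "",
    "abcd.",
    "--B",
    "Content-Type: text/plain",
    "Content-Transfer-Encoding: base64",
    "",
    encoded,
    "--B--",
    "",
])

msg = email.message_from_string(RAW)
print(mail_content(msg))
```

Like the original, this flat walk still concatenates every leaf, so it does not by itself solve the problem of choosing one branch of a multipart/alternative.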
Re: The lib email parse problem...
叮叮当当 wrote:
> Suppose an email part like this:
>
> Content-Type: Multipart/Alternative;
>   boundary="Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm"
>
> [snip]
>
> The plain text is abcd, and the alternative content type is text/html.
> I would prefer to parse the HTML content, and I must not parse both
> parts, so I want to get the boundary end.
>
> Thanks, all.

In other words, you *haven't* tried the email module.

email.Parser can cope with arbitrarily complex message structures,
including oddities like attachments which are themselves email messages
containing their own attachments.

Read the documentation and look for sample code, then get back to the
list with questions about how to make email do what you want it to.

Please don't ask us to re-invent existing libraries. That's why the
libraries are there.

regards
 Steve
--
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb       http://holdenweb.blogspot.com
Recent Ramblings     http://del.icio.us/steve.holden
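
As a rough illustration of the parser doing the boundary handling, the sketch below feeds the sample message from this thread to the Python 3 `email.parser.Parser` and inspects the result. The HTML body is a hypothetical stand-in, since the original one was elided, and the variable names are illustrative.

```python
from email.parser import Parser

BOUNDARY = "Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm"
RAW = "\n".join([
    "Content-Type: Multipart/Alternative;",
    '  boundary="%s"' % BOUNDARY,
    "",
    "--" + BOUNDARY,
    'Content-Type: text/plain; charset="gb2312"',
    "Content-Transfer-Encoding: 7bit",
    "",
    "abcd.",
    "--" + BOUNDARY,
    'Content-Type: text/html; charset="gb2312"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "<p>abcd=2E</p>",  # hypothetical body; the original was elided
    "--" + BOUNDARY + "--",
    "",
])

msg = Parser().parsestr(RAW)
subparts = msg.get_payload()  # a list, because the message is multipart

print(msg.get_content_type())
print(msg.get_boundary())
print([sub.get_content_type() for sub in subparts])
```

The list returned by `get_payload()` holds exactly the subparts of this one alternative, so where the alternative ends is already encoded in the structure and never has to be detected by the caller.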
Re: The lib email parse problem...
叮叮当当 wrote:

> The plain text is abcd, and the alternative content type is text/html.
> I would prefer to parse the HTML content, and I must not parse both
> parts, so I want to get the boundary end.

so use the email module:

    import email

    message_text = "..."

    message = email.message_from_string(message_text)

    for part in message.walk():
        if part.get_content_type() == "text/html":
            print "html is", repr(part.get_payload())

(the message instances either contains a payload or sequence of
submessages; use message.is_multipart() to see if it's a sequence or not.
the walk() method used in this example loops over all submessages, in
message order).
Re: The lib email parse problem...
I am just using the email module.

Max M wrote:
> Have you tried the email module at all?
Re: The lib email parse problem...
Suppose an email part like this:

Content-Type: Multipart/Alternative;
  boundary="Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm"


--Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: 7bit

abcd.
--Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm
Content-Type: text/html; charset="gb2312"
Content-Transfer-Encoding: quoted-printable

.
--Boundary-=_iTIraXJMjfQvFKkvZlqprUgHZWDm--

The plain text is abcd, and the alternative content type is text/html.
I would prefer to parse the HTML content, and I must not parse both
parts, so I want to get the boundary end.

Thanks, all.
Re: The lib email parse problem...
叮叮当当 wrote:
> This is not enough.
>
> When a part is multipart/alternative, I must find out which subpart I
> need, not all the subparts, so I must know where the alternative ends.

Have you tried the email module at all?

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science

Phone: +45 66 11 84 94
Mobile: +45 29 93 42 96
Re: The lib email parse problem...
This is not enough.

When a part is multipart/alternative, I must find out which subpart I
need, not all the subparts, so I must know where the alternative ends.

John Machin wrote:
> You don't need to concern yourself with boundaries -- a high-level
> parser is provided.
>
> [snip]
>
> For a more comprehensive example, see
> http://docs.python.org/lib/node597.html
Re: The lib email parse problem...
叮叮当当 wrote:
> Hi all,
>
> When an email body consists of multipart/alternative, I must know where
> the boundary ends in order to parse it,
>
> but the email lib does not provide a function to indicate the boundary
> end. How can I solve this?

By reading the manual.
http://docs.python.org/lib/module-email.Message.html

You don't need to concern yourself with boundaries -- a high-level
parser is provided.

Here's a simple example:

This script:

    msg_text = """
    [snip -- message is some plain text plus an attached file]
    """
    import email
    pmsg = email.message_from_string(msg_text)
    for part in pmsg.walk():
        print part.get_content_type(), part.get_filename("<>")

produced this output:

    multipart/mixed <>
    text/plain <>
    application/octet-stream Extract.py

For a more comprehensive example, see
http://docs.python.org/lib/node597.html

HTH,
John
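
A self-contained Python 3 variant of that walk is sketched below for readers who want to run it. The original message text was snipped, so the sample here is built with `email.message.EmailMessage`. The names `build_sample` and `list_parts` and the attachment contents are hypothetical.

```python
import email
from email.message import EmailMessage


def build_sample():
    """Build some plain text plus one attached file (illustrative data)."""
    msg = EmailMessage()
    msg["Subject"] = "sample"
    msg.set_content("some plain text\n")
    msg.add_attachment(
        b"print('hello')\n",
        maintype="application",
        subtype="octet-stream",
        filename="Extract.py",
    )
    return msg.as_string()


def list_parts(msg_text):
    """Return (content type, filename) for every part, in message order."""
    pmsg = email.message_from_string(msg_text)
    return [(part.get_content_type(), part.get_filename()) for part in pmsg.walk()]


parts = list_parts(build_sample())
for content_type, filename in parts:
    print(content_type, filename)
```

Parts without a filename report `None` here, because no default is passed to `get_filename()`.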
Re: The lib email parse problem...
叮叮当当 wrote:

> When an email body consists of multipart/alternative, I must know where
> the boundary ends in order to parse it,

or use a library that understands multipart messages.

> but the email lib does not provide a function to indicate the boundary
> end. How can I solve this?

http://docs.python.org/lib/module-email.Parser.html
The lib email parse problem...
Hi all,

When an email body consists of multipart/alternative, I must know where
the boundary ends in order to parse it,

but the email lib does not provide a function to indicate the boundary
end. How can I solve this?

Thanks.