Bug#320185: rss2email: non-ASCII long header encoding patch

2006-05-02 Thread Tatsuya Kinoshita
To fix this bug, I've redone the patch against rss2email_2.57-1.
I hope the attached patch will be applied.

--
Tatsuya Kinoshita
--- rss2email-2.57-1/rss2email.py
+++ rss2email-2.57/rss2email.py
@@ -146,6 +146,8 @@
 for e in ['error', 'gaierror']:
     if hasattr(socket, e): socket_errors.append(getattr(socket, e))
 import mimify; from StringIO import StringIO as SIO; mimify.CHARSET = 'utf-8'
+from email.Header import Header
+import re
 
 import feedparser
 feedparser.USER_AGENT = "rss2email/"+__version__+ " +http://www.aaronsw.com/2002/rss2email/"
@@ -172,13 +174,27 @@
     """Quote names in email according to RFC822."""
     return '"' + unu(s).replace("\\", "\\\\").replace('"', '\\"') + '"'
 
+nonascii = re.compile('[^\000-\177]')
+nonatom = re.compile('[^a-zA-Z0-9\011\012\015\040\!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]') # ref. RFC2822, atom.  comment is not supported
+
 def header7bit(s):
     """QP_CORRUPT headers."""
-    #return mimify.mime_encode_header(s + ' ')[:-1]
-    # XXX due to mime_encode_header bug
-    import re
-    p = re.compile('=\n([^ \t])');
-    return p.sub(r'\1', mimify.mime_encode_header(s + ' ')[:-1])
+    charset = 'us-ascii'
+    if nonascii.search(s):
+        charset = 'utf-8'
+    h = Header(s, charset, 50)
+    return h.encode()
+
+def header7bit_phrase(s):
+    """QP_CORRUPT headers for phrase."""
+    if nonascii.search(s):
+        charset = 'utf-8'
+    else:
+        charset = 'us-ascii'
+    if nonatom.search(s):
+        s = quote822(s)
+    h = Header(s, charset, 50)
+    return h.encode()
 
 ### Parsing Utilities ###
 
@@ -447,12 +463,13 @@
             from_addr = unu(getEmail(r.feed, entry))
 
             message = (
-            "From: " + quote822(header7bit(getName(r, entry))) + " <"+from_addr+">" +
-            "\nTo: " + header7bit(unu(f.to or default_to)) + # set a default email!
+            "From: " + header7bit_phrase(unu(getName(r, entry))) + " <"+from_addr+">" +
+            "\nTo: " + unu(f.to or default_to) + # set a default email!
             "\nSubject: " + unu(html2text(header7bit(title))).strip() +
             "\nDate: " + time.strftime("%a, %d %b %Y %H:%M:%S -0000", datetime) +
             "\nUser-Agent: rss2email" + # really should be X-Mailer
             BONUS_HEADER +
+            "\nMIME-Version: 1.0" +
             "\nContent-Type: ")         # but backwards-compatibility
 
             if ishtml(content):
@@ -467,7 +484,11 @@
                 message += "text/plain"
                 content = unu(content).strip() + "\n\nURL: "+link
 
-            message += '; charset=utf-8\n\n' + content + "\n"
+            if nonascii.search(content):
+                message += '; charset=utf-8\nContent-Transfer-Encoding: 8bit'
+            else:
+                message += '; charset=us-ascii\nContent-Transfer-Encoding: 7bit'
+            message += "\n\n" + content + "\n"
 
             if QP_REQUIRED:
                 ins, outs = SIO(message), SIO()
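
For readers who want to try the phrase handling outside rss2email, here is a minimal standalone sketch of what header7bit_phrase in the patch appears to do. It assumes Python 3, where the module is spelled email.header rather than the Python 2 email.Header used in the patch, and it drops the unu() conversion because Python 3 strings are already Unicode. The constant names NONASCII and NONATOM are renamed for this sketch; it is an illustration, not the code shipped in the package.

```python
import re
from email.header import Header

# Any character outside 7-bit ASCII.
NONASCII = re.compile(r'[^\x00-\x7f]')
# Any character that is neither RFC 2822 atext nor plain whitespace.
NONATOM = re.compile(r"[^a-zA-Z0-9\t\n\r !#$%&'*+\-/=?^_`{|}~]")


def quote822(s):
    """Wrap s in double quotes, escaping backslashes and double quotes."""
    return '"' + s.replace('\\', '\\\\').replace('"', '\\"') + '"'


def header7bit_phrase(s):
    """Return a display name that is safe to place before <address>."""
    charset = 'utf-8' if NONASCII.search(s) else 'us-ascii'
    if NONATOM.search(s):
        # Characters such as ',' or '"' are not allowed in a bare phrase,
        # so the whole name becomes a quoted-string first.
        s = quote822(s)
    return Header(s, charset, 50).encode()


if __name__ == '__main__':
    for name in ('Aaron Swartz', 'Swartz, Aaron', 'José'):
        print('From: ' + header7bit_phrase(name) + ' <user@example.org>')
```

As in the patch, a name containing non-ASCII characters is quoted before it is encoded, because the nonatom pattern also matches those characters.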




Bug#320185: rss2email: non-ASCII long header encoding patch

2005-11-09 Thread Tatsuya Kinoshita
On November 8, 2005 at 3:42PM -0500,
me (at aaronsw.com) wrote:

> The latest patch seems to always QP_CORRUPT the message. Why?

In a header, a non-ASCII word must be converted to a MIME encoded-word.

The patch encodes the header (field body), not the message body.  Even
with the patch applied, the message body is still sent as UTF-8/8bit (or
US-ASCII/7bit).
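
The distinction can be illustrated with a small standalone sketch. It assumes Python 3 (email.header instead of the Python 2 email.Header in the patch), and build_message is a hypothetical helper written for this example, not a function in rss2email. Only the Subject field body goes through Header; the message body is passed through unchanged and merely labelled with a matching charset and Content-Transfer-Encoding.

```python
import re
from email.header import Header

NONASCII = re.compile(r'[^\x00-\x7f]')


def build_message(subject, body):
    """Hypothetical helper: encode the header, leave the body as it is."""
    subject_charset = 'utf-8' if NONASCII.search(subject) else 'us-ascii'
    if NONASCII.search(body):
        params = '; charset=utf-8\nContent-Transfer-Encoding: 8bit'
    else:
        params = '; charset=us-ascii\nContent-Transfer-Encoding: 7bit'
    return ('Subject: ' + Header(subject, subject_charset, 50).encode() +
            '\nMIME-Version: 1.0' +
            '\nContent-Type: text/plain' + params +
            '\n\n' + body + '\n')


if __name__ == '__main__':
    print(build_message('ascii and non-áscii', 'The body stays naïve 8bit text.'))
```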

*

A possible improvement is to encode the field body only partially.
The current patch converts the whole field body to a MIME
encoded-word if any non-ASCII character is found.

For example, the current patch converts

  Subject: ascii and non-áscii

to

  Subject: =?utf-8?q?ascii_and_non-=C3=A1scii?=

However,

  Subject: ascii and =?utf-8?q?non-=C3=A1scii?=

is more readable.
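
One way to get the more readable form is sketched below. This is not part of the patch: it assumes Python 3 (email.header), and encode_partially and NONASCII_RUN are hypothetical names for this example. Consecutive words that contain non-ASCII characters are grouped into a single encoded-word, since whitespace between two adjacent encoded-words is dropped by decoders.

```python
import re
from email.header import Header

# A run of one or more whitespace-separated words, each of which contains
# at least one non-ASCII character.
NONASCII_RUN = re.compile(
    r'\S*[^\x00-\x7f]\S*(?:[ \t]+\S*[^\x00-\x7f]\S*)*')


def encode_partially(s):
    """Encode only the non-ASCII runs of s; keep ASCII words readable."""
    def encode_run(match):
        return Header(match.group(0), 'utf-8').encode()
    return NONASCII_RUN.sub(encode_run, s)


if __name__ == '__main__':
    print('Subject: ' + encode_partially('ascii and non-áscii'))
```

A long non-ASCII run may still be folded by Header.encode(), and this sketch makes no attempt to keep the overall line length under a limit.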

-- 
Tatsuya Kinoshita




Bug#320185: rss2email: non-ASCII long header encoding patch

2005-11-09 Thread Tatsuya Kinoshita
On November 9, 2005 at 4:46PM +0900,
tats (at vega.ocn.ne.jp) wrote:

> > The latest patch seems to always QP_CORRUPT the message. Why?

> In a header, a non-ASCII word must be converted to a MIME encoded-word.

| +    charset = 'us-ascii'
| +    if nonascii.search(s):
| +        charset = 'utf-8'
| +    h = Header(s, charset, 50)
| +    return h.encode()

Note that h.encode() doesn't generate an encoded-word if the charset is 'us-ascii'.
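
This behaviour can be checked with a short standalone sketch, assuming Python 3, where the class lives in email.header (the patch targets Python 2's email.Header). The function mirrors header7bit from the patch, minus the rest of rss2email.

```python
import re
from email.header import Header

NONASCII = re.compile(r'[^\x00-\x7f]')


def header7bit(s):
    """Pick the charset from the content, as the patch does."""
    charset = 'utf-8' if NONASCII.search(s) else 'us-ascii'
    return Header(s, charset, 50).encode()


if __name__ == '__main__':
    # A pure-ASCII value comes back unchanged.
    print(header7bit('plain ascii subject'))
    # A value with non-ASCII characters comes back as an encoded-word.
    print(header7bit('ascii and non-áscii'))
```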

--
Tatsuya Kinoshita




Bug#320185: rss2email: non-ASCII long header encoding patch

2005-11-09 Thread Aaron Swartz
> The patch encodes the header (field body), not the message body.

Oh whoops, you're right, I was misreading the last clause. That looks
pretty reasonable then.



Bug#320185: rss2email: non-ASCII long header encoding patch

2005-11-08 Thread Aaron Swartz
The latest patch seems to always QP_CORRUPT the message. Why?

See http://cr.yp.to/smtp/8bitmime.html for why this is a bad idea.