[PATCH] dim: decode email message content charset to unicode

Jani Nikula Wed, 16 Sep 2020 02:58:47 -0700

Email messages need two levels of decoding: First, content transfer
encoding, such as base64 or quoted-printable. Second, charset decoding.


We've done the first (with part.get_payload(decode=True)), but we've
ignored the charset. Mostly, it has not mattered, since most email is
ASCII or UTF-8 anyway, and python2 has been relaxed about it. However,
in python3 part.get_payload(decode=True) returns bytes instead of a
unicode string, so we also need to do the charset decoding to get the
result we want.
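
As a rough illustration of the python3 behaviour described above, the
sketch below parses a small made-up message (the sample text and the RAW
name are assumptions, not part of dim) and shows that only the content
transfer encoding is undone:

```python
import email

# Hypothetical sample: ISO-8859-1 text, quoted-printable encoded.
RAW = (
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=E9\n"
)

msg = email.message_from_string(RAW)

# decode=True undoes quoted-printable/base64 only; the charset is untouched.
payload = msg.get_payload(decode=True)

print(type(payload).__name__)  # bytes under python3
print(payload)                 # prints a b'...' literal, not readable text
```

Printing the bytes object directly gives its repr, which is why the
charset step is still needed.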

The problem has likely been observed only now that the 'python' command
either no longer exists or points at python3 instead of python2.

Use part.get_content_charset() for charset decoding, defaulting to
'us-ascii' source charset if nothing is specified.
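
A minimal standalone sketch of the same two-step decoding, for
illustration only: the text_parts helper and both sample messages are
hypothetical and not part of dim; only get_payload(decode=True) and
get_content_charset(failobj=...) come from the patch.

```python
import base64
import email


def text_parts(msg):
    """Yield each text/plain part of msg as a decoded str."""
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            # Step 1: undo the content transfer encoding (gives bytes).
            raw = part.get_payload(decode=True)
            # Step 2: charset decoding, falling back to us-ascii.
            charset = part.get_content_charset(failobj='us-ascii')
            yield raw.decode(charset)


# Hypothetical message: UTF-8 text carried as base64.
body = base64.b64encode("na\u00efve\n".encode("utf-8")).decode("ascii")
with_charset = email.message_from_string(
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n" + body + "\n"
)

# Hypothetical message with no charset parameter: the fallback applies.
without_charset = email.message_from_string(
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
)

decoded_utf8 = list(text_parts(with_charset))
decoded_default = list(text_parts(without_charset))
print(decoded_utf8, decoded_default)
```

Note that an unknown or mislabelled charset would still raise in
decode(); handling that case is outside what this patch sets out to do.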

Cc: Rodrigo Vivi <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
---
 dim | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dim b/dim
index c3a048db8956..3f489976c6bc 100755
--- a/dim
+++ b/dim
@@ -447,7 +447,7 @@ def print_msg(file):
     msg = email.message_from_file(file)
     for part in msg.walk():
         if part.get_content_type() == 'text/plain':
-            print(part.get_payload(decode=True))
+            print(part.get_payload(decode=True).decode(part.get_content_charset(failobj='us-ascii')))
 
 print_msg(open('$1', 'r'))
 EOF
-- 
2.20.1

