Re: [PATCH] parsemail: Fallback to common charsets when charset is None or x-unknown

2014-07-13 Thread Siddhesh Poyarekar
On Mon, Jul 14, 2014 at 10:21:32AM +0800, Jeremy Kerr wrote:
>  if not isinstance(payload, unicode):
> -payload = unicode(payload, charset)
> +charset = part.get_content_charset()
> +
> +# Check that we have a charset that we understand. Otherwise,
> +# ignore it and fallback to our standard set.
> +if charset is not None:
> +try:
> +codec = codecs.lookup(charset)
> +except LookupError:
> +charset = None

Thanks, that looks like a better idea to me too.

Thanks,
Siddhesh


pgpmYTJ0RDaLC.pgp
Description: PGP signature
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


[PATCH] parsemail: Fallback to common charsets when charset is None or x-unknown

2014-07-13 Thread Jeremy Kerr
From: Siddhesh Poyarekar 

We recently encountered a case in our glibc patchwork instance on
sourceware, where a patch was dropped because it had x-unknown
charset.

This change adds a fallback on a set of encodings (instead of just
utf-8) when the charset is not mentioned or if it is set as x-unknown.

Minor changes and testcase by Jeremy Kerr 

Signed-off-by: Siddhesh Poyarekar 
Signed-off-by: Jeremy Kerr 

---
 apps/patchwork/bin/parsemail.py |   40 -
 apps/patchwork/tests/mail/0010-invalid-charset.mbox |   91 
 apps/patchwork/tests/test_patchparser.py|   11 +
 3 files changed, 136 insertions(+), 6 deletions(-)

diff --git a/apps/patchwork/bin/parsemail.py b/apps/patchwork/bin/parsemail.py
index b6eb97a..2a4866f 100755
--- a/apps/patchwork/bin/parsemail.py
+++ b/apps/patchwork/bin/parsemail.py
@@ -24,6 +24,7 @@ import re
 import datetime
 import time
 import operator
+import codecs
 from email import message_from_file
 try:
 from email.header import Header, decode_header
@@ -147,6 +148,13 @@ def find_pull_request(content):
 return match.group(1)
 return None
 
+def try_decode(payload, charset):
+try:
+payload = unicode(payload, charset)
+except UnicodeDecodeError:
+return None
+return payload
+
 def find_content(project, mail):
 patchbuf = None
 commentbuf = ''
@@ -157,15 +165,35 @@ def find_content(project, mail):
 continue
 
 payload = part.get_payload(decode=True)
-charset = part.get_content_charset()
 subtype = part.get_content_subtype()
 
-# if we don't have a charset, assume utf-8
-if charset is None:
-charset = 'utf-8'
-
 if not isinstance(payload, unicode):
-payload = unicode(payload, charset)
+charset = part.get_content_charset()
+
+# Check that we have a charset that we understand. Otherwise,
+# ignore it and fallback to our standard set.
+if charset is not None:
+try:
+codec = codecs.lookup(charset)
+except LookupError:
+charset = None
+
+# If there is no charset or if it is unknown, then try some common
+# charsets before we fail.
+if charset is None:
+try_charsets = ['utf-8', 'windows-1252', 'iso-8859-1']
+else:
+try_charsets = [charset]
+
+for cset in try_charsets:
+decoded_payload = try_decode(payload, cset)
+if decoded_payload is not None:
+break
+payload = decoded_payload
+
+# Could not find a valid decoded payload.  Fail.
+if payload is None:
+return (None, None)
 
 if subtype in ['x-patch', 'x-diff']:
 patchbuf = payload
diff --git a/apps/patchwork/tests/mail/0010-invalid-charset.mbox 
b/apps/patchwork/tests/mail/0010-invalid-charset.mbox
new file mode 100644
index 000..a8614ef
--- /dev/null
+++ b/apps/patchwork/tests/mail/0010-invalid-charset.mbox
@@ -0,0 +1,91 @@
+From libc-alpha-return-50517-siddhesh=redhat@sourceware.org Thu Jun  5 
10:36:33 2014
+Received: (qmail 11948 invoked by alias); 4 Jun 2014 17:51:01 -
+Mailing-List: contact libc-alpha-h...@sourceware.org; run by ezmlm
+List-Id: 
+Sender: libc-alpha-ow...@sourceware.org
+Date: Wed, 4 Jun 2014 17:50:46 +
+From: "Joseph S. Myers" 
+To: 
+Subject: Fix pow overflow in non-default rounding modes (bug 16315)
+Message-ID: 
+MIME-Version: 1.0
+Content-Type: multipart/mixed;
+   boundary="-1152306461-1522705971-1401904246=:3719"
+Content-Length: 24171
+
+---1152306461-1522705971-1401904246=:3719
+Content-Type: text/plain; charset="none"
+Content-Transfer-Encoding: QUOTED-PRINTABLE
+
+This patch, relative to a tree with
+ applied,
+fixes bug 16315, bad pow handling of overflow/underflow in non-default
+rounding modes.  Tests of pow are duly converted to ALL_RM_TEST to run
+all tests in all rounding modes.
+
+There are two main issues here.  First, various implementations
+compute a negative result by negating a positive result, but this
+yields inappropriate overflow / underflow values for directed
+rounding, so either overflow / underflow results need recomputing in
+the correct sign, or the relevant overflowing / underflowing operation
+needs to be made to have a result of the correct sign.  Second, the
+dbl-64 implementation sets FE_TONEAREST internally; in the overflow /
+underflow case, the result needs recomputing in the original rounding
+mode.
+
+Tested x86_64 and x86 and ulps updated accordingly.
+
+(auto-libm-test-out diffs omitted below.)
+
+2014-06-04  Joseph Myers  
+
+=09[BZ #16315]
+=09* sysdeps/i386/fpu/e_pow.S (__ieee754_pow): Ensure possibly
+=09overflowing or underflowing operations take place with sign of
+=09result.
+=09* s