Re: [PATCH] parsemail: Fallback to common charsets when charset is None or x-unknown

2014-07-13 Thread Siddhesh Poyarekar
On Mon, Jul 14, 2014 at 10:21:32AM +0800, Jeremy Kerr wrote:
>  if not isinstance(payload, unicode):
> -payload = unicode(payload, charset)
> +charset = part.get_content_charset()
> +
> +# Check that we have a charset that we understand. Otherwise,
> +# ignore it and fallback to our standard set.
> +if charset is not None:
> +try:
> +codec = codecs.lookup(charset)
> +except LookupError:
> +charset = None

Thanks, that looks like a better idea to me too.

Thanks,
Siddhesh


pgpmYTJ0RDaLC.pgp
Description: PGP signature
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] post-receive: exclude commits from the patch update step

2014-07-13 Thread Brian Norris
Hi Jeremy,

On Mon, Jul 14, 2014 at 10:57:38AM +0800, Jeremy Kerr wrote:
> > Actually, I think the rev-parse serves as a little extra parameter
> > checking. Unless someone can pin a good reason one way or the other,
> > I'll stick with my original patch as I sent it. And I've been testing
> > that version for a few weeks now.
> 
> OK, just did some testing with that: it looks like the rev-parse will
> barf to stderr if EXCLUDE is empty. Should we do this conditionally?

Really? What version of git? If I run either of these, I get no output,
with a 'success' return status:

  git rev-parse
  git rev-parse --not

  $ git --version
  git version 1.9.1

> Or, given that we're essentially doing an exclude on ${1} there,
> how about we just add any excludes to the rev-list instead? Something
> like:
> 
>   EXCLUDES=(refs/heads/upstream)
> 
>   [...]
> 
>   update_patches()
>   {
> local cnt; cnt=0
> include=${2}
> excludes=(${1} ${EXCLUDES[@]})
> revs="$include ${excludes[@]/#/^}"

Wow, so we're really going full-on bash! I suppose that works. Or you
could more simply write:

revs="${2} --not ${1} ${EXCLUDES}"

> for rev in $(git rev-list --no-merges --reverse $revs); do
>   [...]

But whatever works for you; yours (or my modification of it) looks
better than my original, I think. I just thought I'd toss my patch out
there, since I needed a solution personally.

Regards,
Brian
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH] post-receive: exclude commits from the patch update step

2014-07-13 Thread Jeremy Kerr
Hi Brian,

> Actually, I think the rev-parse serves as a little extra parameter
> checking. Unless someone can pin a good reason one way or the other,
> I'll stick with my original patch as I sent it. And I've been testing
> that version for a few weeks now.

OK, just did some testing with that: it looks like the rev-parse will
barf to stderr if EXCLUDE is empty. Should we do this conditionally?

Or, given that we're essentially doing an exclude on ${1} there,
how about we just add any excludes to the rev-list instead? Something
like:

  EXCLUDES=(refs/heads/upstream)

  [...]

  update_patches()
  {
local cnt; cnt=0
include=${2}
excludes=(${1} ${EXCLUDES[@]})
revs="$include ${excludes[@]/#/^}"
for rev in $(git rev-list --no-merges --reverse $revs); do
  [...]


Cheers,


Jeremy

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


[PATCH] parsemail: Fallback to common charsets when charset is None or x-unknown

2014-07-13 Thread Jeremy Kerr
From: Siddhesh Poyarekar 

We recently encountered a case in our glibc patchwork instance on
sourceware, where a patch was dropped because it had x-unknown
charset.

This change adds a fallback on a set of encodings (instead of just
utf-8) when the charset is not mentioned or if it is set as x-unknown.

Minor changes and testcase by Jeremy Kerr 

Signed-off-by: Siddhesh Poyarekar 
Signed-off-by: Jeremy Kerr 

---
 apps/patchwork/bin/parsemail.py |   40 -
 apps/patchwork/tests/mail/0010-invalid-charset.mbox |   91 
 apps/patchwork/tests/test_patchparser.py|   11 +
 3 files changed, 136 insertions(+), 6 deletions(-)

diff --git a/apps/patchwork/bin/parsemail.py b/apps/patchwork/bin/parsemail.py
index b6eb97a..2a4866f 100755
--- a/apps/patchwork/bin/parsemail.py
+++ b/apps/patchwork/bin/parsemail.py
@@ -24,6 +24,7 @@ import re
 import datetime
 import time
 import operator
+import codecs
 from email import message_from_file
 try:
 from email.header import Header, decode_header
@@ -147,6 +148,13 @@ def find_pull_request(content):
 return match.group(1)
 return None
 
+def try_decode(payload, charset):
+try:
+payload = unicode(payload, charset)
+except UnicodeDecodeError:
+return None
+return payload
+
 def find_content(project, mail):
 patchbuf = None
 commentbuf = ''
@@ -157,15 +165,35 @@ def find_content(project, mail):
 continue
 
 payload = part.get_payload(decode=True)
-charset = part.get_content_charset()
 subtype = part.get_content_subtype()
 
-# if we don't have a charset, assume utf-8
-if charset is None:
-charset = 'utf-8'
-
 if not isinstance(payload, unicode):
-payload = unicode(payload, charset)
+charset = part.get_content_charset()
+
+# Check that we have a charset that we understand. Otherwise,
+# ignore it and fallback to our standard set.
+if charset is not None:
+try:
+codec = codecs.lookup(charset)
+except LookupError:
+charset = None
+
+# If there is no charset or if it is unknown, then try some common
+# charsets before we fail.
+if charset is None:
+try_charsets = ['utf-8', 'windows-1252', 'iso-8859-1']
+else:
+try_charsets = [charset]
+
+for cset in try_charsets:
+decoded_payload = try_decode(payload, cset)
+if decoded_payload is not None:
+break
+payload = decoded_payload
+
+# Could not find a valid decoded payload.  Fail.
+if payload is None:
+return (None, None)
 
 if subtype in ['x-patch', 'x-diff']:
 patchbuf = payload
diff --git a/apps/patchwork/tests/mail/0010-invalid-charset.mbox 
b/apps/patchwork/tests/mail/0010-invalid-charset.mbox
new file mode 100644
index 000..a8614ef
--- /dev/null
+++ b/apps/patchwork/tests/mail/0010-invalid-charset.mbox
@@ -0,0 +1,91 @@
+From libc-alpha-return-50517-siddhesh=redhat@sourceware.org Thu Jun  5 
10:36:33 2014
+Received: (qmail 11948 invoked by alias); 4 Jun 2014 17:51:01 -
+Mailing-List: contact libc-alpha-h...@sourceware.org; run by ezmlm
+List-Id: 
+Sender: libc-alpha-ow...@sourceware.org
+Date: Wed, 4 Jun 2014 17:50:46 +
+From: "Joseph S. Myers" 
+To: 
+Subject: Fix pow overflow in non-default rounding modes (bug 16315)
+Message-ID: 
+MIME-Version: 1.0
+Content-Type: multipart/mixed;
+   boundary="-1152306461-1522705971-1401904246=:3719"
+Content-Length: 24171
+
+---1152306461-1522705971-1401904246=:3719
+Content-Type: text/plain; charset="none"
+Content-Transfer-Encoding: QUOTED-PRINTABLE
+
+This patch, relative to a tree with
+ applied,
+fixes bug 16315, bad pow handling of overflow/underflow in non-default
+rounding modes.  Tests of pow are duly converted to ALL_RM_TEST to run
+all tests in all rounding modes.
+
+There are two main issues here.  First, various implementations
+compute a negative result by negating a positive result, but this
+yields inappropriate overflow / underflow values for directed
+rounding, so either overflow / underflow results need recomputing in
+the correct sign, or the relevant overflowing / underflowing operation
+needs to be made to have a result of the correct sign.  Second, the
+dbl-64 implementation sets FE_TONEAREST internally; in the overflow /
+underflow case, the result needs recomputing in the original rounding
+mode.
+
+Tested x86_64 and x86 and ulps updated accordingly.
+
+(auto-libm-test-out diffs omitted below.)
+
+2014-06-04  Joseph Myers  
+
+=09[BZ #16315]
+=09* sysdeps/i386/fpu/e_pow.S (__ieee754_pow): Ensure possibly
+=09overflowing or underflowing operations take place with sign of
+=09result.
+=09* s

Re: [PATCH] Fallback to common charsets when charset is None or x-unknown

2014-07-13 Thread Jeremy Kerr
Hi Siddesh,

> We recently encountered a case in our glibc patchwork instance on
> sourceware, where a patch was dropped because it had x-unknown
> charset.  I used the following patch to fix this in our instance.  The
> fix I used was to fall back on a set of encodings (instead of just
> utf-8) when the charset is not mentioned or if it is set as x-unknown.

I tried this patch with a testcase from the failing patch you forwarded
me, and get the following:

ERROR: testPatch (patchwork.tests.test_patchparser.CharsetFallbackPatchTest)
--
Traceback (most recent call last):
  File
"/home/jk/devel/patchwork/apps/patchwork/tests/test_patchparser.py",
line 432, in testPatch
(patch, comment) = find_content(self.project, self.mail)
  File "/home/jk/devel/patchwork/apps/patchwork/bin/parsemail.py", line
180, in find_content
decoded_payload = try_decode(payload, cset)
  File "/home/jk/devel/patchwork/apps/patchwork/bin/parsemail.py", line
152, in try_decode
payload = unicode(payload, charset)
LookupError: unknown encoding: none

In this case, it looks like the encoding is actually ascii, but the
"none" charset is causing trouble:

  Content-Type: text/plain; charset="none"
  Content-Transfer-Encoding: QUOTED-PRINTABLE

So the failure there is due to the "none", and not the charset of the
patch content. I've put together an updated change (will send it
shortly), could you let me know if it looks OK to you?

Regards,


Jeremy
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork