Re: [PATCH v2 0/9] Tools and fixes for parallel parsing

2018-02-25 Thread Daniel Axtens
Stephen Finucane  writes:

> On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
>> Thomas Petazzoni reported that Patchwork would occasionally lose
>> Buildroot email. Andrew - having talked to jk and sfr - suggested that
>> this may be race-condition related.
>> 
>> I investigated and found some bugs. I first had to develop some tools.
>> Along the way I found other unrelated bugs too.
>> 
>> Patches 1-4 are tooling - ways to do parallel parsing of messages and
>> get and compare the output. (Patch 1 fixes an issue I found when
>> running the tool from patch 2)
>> 
>> Patch 5 is an unrelated fix that came up along the way and
>> demonstrates that humans remain the best fuzzers, and that Python's
>> email module is still adorably* quirky.
>> 
>> Patch 6 is a bug that came up very quickly in testing but is unlikely
>> to be the actual bug Buildroot is hitting, as it can only occur the
>> first time an email address is seen.
>> 
>> Patch 7 is a related tidy-up/optimisation.
>> 
>> Patch 8 fixes up a MySQL-only bug, but also adds some robustness.
>> 
>> I think patch 9 closes the most likely issue for Buildroot patches.
>> 
>> Pending review, patches 5, 6, 8 and 9 should go to stable.
>> 
>> V2: Address review comments from Andrew, see patches.
>> Added the new python script to the automatic pep8 testing.
>> Minor improvement to parallel parse script to allow different
>>  pythons to be used.
>
> Well, this is all rather unfortunate :( When I was developing this, I
> was testing using 'parsearchive' with a single archive, which naturally
> meant everything was running in series. It seems the use of an MTA
> means the 'parsemail' script, which 'parsearchive' harnesses, gets
> called multiple times here and highlights all these races.

Likewise, and yes, I think that's how we get here.

> I've reviewed most of the patches (bar one, which I'm still working on)
> at this point, and think most of these should be merged. However, I've
> noted that some of the patches leave FIXMEs in place. I'm pretty sure
> all of these occur where we apply a "lookup now and save later _if_
> these preconditions are met" pattern, which we do for things like
> series references, authors, etc. This isn't going to be easy to fix,
> IMO.
>
> That brings me to the real question here: what do we need to do to
> ultimately fix this? Far as I can see, there are two options:
>
>  * Save stuff unconditionally. If we're able to find an email, save
>that even if it's not a patch/cover letter/comment. If we find an
>author that we haven't seen before, save that too even if they
>haven't touched a patch, cover letter or comment. This greatly
>simplifies things, but would result in a more congested database and
>API, and would require some background work (saving non-patch/cover
>letter emails).

I don't think we need to do this - and I'm not sure that the problem is
the lookup-now-save-later pattern, I think it's just our implementation.
The big thing is series references, and I think that (apart from
unthreaded series and possibly deeply threaded series?) we can solve
that pretty easily by holding the 'head email' (patch 0 or patch 1) in
the SeriesReference. But I will think about it more; I don't have
certain answers just yet.

>  * Use locks. I recall a patch from Damien Lespiau some years ago
>around making this a parallel process [1]. Couldn't we just do this
>on the critical sections of code (anything that involves DB
>accesses, I guess)? This would require minimal changes but could
>have scalability issues.

So I have Feelings about introducing locking mechanisms.

In particular, looking at [1], I worry we're going to land in the
territory of bizarre and irritating issues that robust futexes / pthread
mutexes deal with.

In short, a futex is a fast userspace mutex device facilitated by the
kernel. It's used to implement higher-level locks/mutexes/semaphores.
It uses shared memory - basically it twiddles bits in memory to use as a
lock.

Now, consider two processes, A and B, that are using shared memory and futexes.
A takes the futex, and B sits waiting for the futex so it can proceed.
While holding the futex, A segfaults. However, A is still holding the
lock, and cannot clear it, so B is now deadlocked.

It's easy to go "oh, we should catch segfaults", and with the right
handler functions you can. So let's say A does that. Now if it
segfaults, the handler unlocks and B continues. This is all well and
good, but now consider the situation where an impatient sysadmin (or the
kernel) SIGKILLs A. In this case no handler can help you: B will always
deadlock.
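
To make that concrete, here's a toy repro (mine, not from the thread)
using Python's multiprocessing.Lock, which sits on top of exactly this
kind of primitive and has no robustness guarantees:

    import multiprocessing
    import os
    import signal
    import time

    lock = multiprocessing.Lock()

    def worker():
        lock.acquire()    # take the lock; release() is never reached
        time.sleep(60)

    if __name__ == '__main__':
        p = multiprocessing.Process(target=worker)   # assumes fork (Linux)
        p.start()
        time.sleep(1)                    # let the child grab the lock
        os.kill(p.pid, signal.SIGKILL)   # no handler gets a chance to run
        p.join()
        lock.acquire()                   # deadlocks: the lock is still held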

Robust futexes solve this problem with kernel magic - that's what makes
them "robust". (Basically, the kernel deals with the case of the process
going away with the lock held.)

So, in Damien's case:
 - we could deadlock if there is a Python exception
 - so we catch this with a finally: and unlock
 - but we're still stuck if the process is SIGKILLed or OOM-killed while
   holding the lock: no finally: runs, and we deadlock just like the
   futex case.

Re: [PATCH v2 9/9] parser: don't fail on multiple SeriesReferences

2018-02-25 Thread Daniel Axtens
Stephen Finucane  writes:

> On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
>> Parallel parsing would occasionally fail with:
>> 
>> patchwork.models.MultipleObjectsReturned: get() returned more than one 
>> SeriesReference -- it returned 2!
>> 
>> I think these are happening if you have different processes parsing
>> e.g. 1/3 and 2/3 simultaneously: both will have a reference to 1/3,
>> in the case of 1 it will be the msgid, in the case of 2 it will be
>> in References. So when we come to parse 3/3, .get() finds 2 and
>> throws the exception.
>> 
>> This does not fix the creation of multiple series references; it
>> just causes them to be ignored. We still have serious race conditions
>> with series creation, but I don't yet have clear answers for them.
>> With this patch, they will at least not stop patches from being
>> processed - they'll just lead to wonky series, which we already have.
>> 
>> Reviewed-by: Andrew Donnellan 
>> Signed-off-by: Daniel Axtens 
>
> Correct me if I'm wrong, but this seems to highlight two issues:
>
>  * The race condition between searching for a series reference and
>    using it
>  * The fact we expect a series to be unique for a given msgid and
>project, yet the 'unique_together' is for a given msgid and
>*series*.
>
> The first of these is caught here (yet not fixed, as it's a non-trivial 
> thing to resolve). The second should still be fixed though. Should we
> do this here, as part of this series?

My thinking is no, for these reasons:

 - I want a patch I can backport to stable without requiring a database
   migration, because:

* that seems like a very unusual thing for a stable update

* I don't trust my ability to get it right for a stable release

* relatedly, it will require a lot of testing for me to trust it,
  and I want something sooner rather than later so that the
  Buildroot people can get on with their lives.

* The different numbering of migrations will make the different
  upgrade paths a nightmare to describe, let alone test

 - I have a long-term ongoing effort to move 'project' up into the
   various tables that use it so as to avoid JOINs and this would be a
   good thing to do in that effort. (It's the next big push on my list,
   it's just a nightmare to get the compatibility and migration code
   going.)

Regards,
Daniel

>
> Stephen
>
>> ---
>>  patchwork/parser.py | 23 +--
>>  1 file changed, 21 insertions(+), 2 deletions(-)
>> 
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 56dc7006c811..5a7344cee93c 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -240,6 +240,13 @@ def _find_series_by_references(project, mail):
>>              msgid=ref[:255], series__project=project).series
>>          except SeriesReference.DoesNotExist:
>>              continue
>> +        except SeriesReference.MultipleObjectsReturned:
>> +            # FIXME: Open bug: this can happen when we're processing
>> +            # messages in parallel. Pick the first and log.
>> +            logger.error("Multiple SeriesReferences for %s in project %s!" %
>> +                         (ref[:255], project.name))
>> +            return SeriesReference.objects.filter(
>> +                msgid=ref[:255], series__project=project).first().series
>>  
>>  
>>  def _find_series_by_markers(project, mail, author):
>> @@ -1037,6 +1044,9 @@ def parse_mail(mail, list_id=None):
>>                      series__project=project)
>>              except SeriesReference.DoesNotExist:
>>                  SeriesReference.objects.create(series=series, msgid=ref)
>> +            except SeriesReference.MultipleObjectsReturned:
>> +                logger.error("Multiple SeriesReferences for %s"
>> +                             " in project %s!" % (ref, project.name))
>>  
>>      # add to a series if we have found one, and we have a numbered
>>      # patch. Don't add unnumbered patches (for example diffs sent
>> @@ -1075,6 +1085,11 @@ def parse_mail(mail, list_id=None):
>>                  msgid=msgid, series__project=project).series
>>          except SeriesReference.DoesNotExist:
>>              series = None
>> +        except SeriesReference.MultipleObjectsReturned:
>> +            logger.error("Multiple SeriesReferences for %s"
>> +                         " in project %s!" % (msgid, project.name))
>> +            series = SeriesReference.objects.filter(
>> +                msgid=msgid, series__project=project).first().series
>>  
>>          if not series:
>>              series = Series(project=project,
>> @@ -1087,8 +1102,12 @@ def parse_mail(mail, list_id=None):
>>              # we don't save the in-reply-to or references fields
>>              # for a cover letter, as they can't refer to the same
>>              # series
>> -            SeriesReference.objects.get_or_create(series=series,
>> -                                                  msgid=msgid)
>> +            try:
>> +                SeriesReference.objects.get_or_create(series=series,
>> +                                                      msgid=msgid)
>> +            except SeriesReference.MultipleObjectsReturned:
>> +                logger.error("Multiple SeriesReferences for %s"
>> +                             " in project %s!" % (msgid, project.name))

Re: [PATCH v2 3/9] tools/scripts: split a mbox N ways

2018-02-25 Thread Daniel Axtens
Stephen Finucane  writes:

> On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
>> To test parallel loading of mail, it's handy to be able to split
>> an existing mbox file into N mbox files in an alternating pattern
>> (e.g. 1 2 1 2 or 1 2 3 4 1 2 3 4 etc)
>> 
>> Introduce tools/scripts as a place to put things like this.
>> 
>> Reviewed-by: Andrew Donnellan 
>> Signed-off-by: Daniel Axtens 
>> 
>> --
>> 
>> v2: address Andrew's review comments
>> for full pep8 compliance, add to tox.ini testing
>> ---
>>  tools/scripts/split_mail.py | 80 +
>>  tox.ini |  2 +-
>>  2 files changed, 81 insertions(+), 1 deletion(-)
>>  create mode 100755 tools/scripts/split_mail.py
>> 
>> diff --git a/tools/scripts/split_mail.py b/tools/scripts/split_mail.py
>> new file mode 100755
>> index ..d1e3b06fdf85
>> --- /dev/null
>> +++ b/tools/scripts/split_mail.py
>> @@ -0,0 +1,80 @@
>> +#!/usr/bin/python3
>> +# Patchwork - automated patch tracking system
>> +# Copyright (C) 2018 Daniel Axtens 
>> +#
>> +# This file is part of the Patchwork package.
>> +#
>> +# Patchwork is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 2 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# Patchwork is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +
>> +import sys
>> +import os
>> +import mailbox
>> +
>> +usage = """Split a maildir or mbox into N mboxes
>> +in an alternating pattern
>> +
>> +Usage: ./split_mail.py <in> <out> <N>
>> +
>> +<in> : input mbox file or Maildir
>> +<out> : output mbox
>> +<out>-1...<N> must not exist
>> +<N>  N-way split"""
>> +
>> +
>> +if len(sys.argv) != 4:
>> +    print(usage)
>> +    exit(1)
>> +
>> +in_name = sys.argv[1]
>> +out_name = sys.argv[2]
>> +
>> +try:
>> +    n = int(sys.argv[3])
>> +except ValueError:
>> +    print("N must be an integer.")
>> +    print(" ")
>> +    print(usage)
>> +    exit(1)
>> +
>> +if n < 2:
>> +    print("N must be at least 2")
>> +    print(" ")
>> +    print(usage)
>> +    exit(1)
>> +
>> +if not os.path.exists(in_name):
>> +    print("No input at ", in_name)
>> +    print(" ")
>> +    print(usage)
>> +    exit(1)
>> +
>
> Can we just use argparse for this, please? It handles all these kinds
> of checks for us.

I really want to get the core of the series merged and backported and I
don't want to get too caught up in these otherwise perfectly valid
review comments.

How about I split the series in half: 1-4 and 5-9, and then we can
prioritise merging 5-9 while working out these and related quirks in
1-4?

>
>> +print("Opening", in_name)
>> +if os.path.isdir(in_name):
>> +    inmail = mailbox.Maildir(in_name)
>> +else:
>> +    inmail = mailbox.mbox(in_name)
>> +
>
> This needs to be closed once opened. You'll see warnings in Python 3.4
> (?) otherwise.

Oddly I haven't seen them, but I will fix this in the respin of this half.

Regards,
Daniel

>
>> +out = []
>> +for i in range(n):
>> +    if os.path.exists(out_name + "-" + str(i + 1)):
>> +        print("mbox already exists at ", out_name + "-" + str(i + 1))
>> +        print(" ")
>> +        print(usage)
>> +        exit(1)
>> +
>> +    out += [mailbox.mbox(out_name + '-' + str(i + 1))]
>> +
>> +print("Copying messages")
>> +
>> +for (i, msg) in enumerate(inmail):
>> +    out[i % n].add(msg)
>> +
>> +print("Done")
>> diff --git a/tox.ini b/tox.ini
>> index 09505f78e157..345f7fe2e15a 100644
>> --- a/tox.ini
>> +++ b/tox.ini
>> @@ -37,7 +37,7 @@ commands =
>>  [testenv:pep8]
>>  basepython = python2.7
>>  deps = flake8
>> -commands = flake8 {posargs} patchwork patchwork/bin/pwclient
>> +commands = flake8 {posargs} patchwork patchwork/bin/pwclient tools/scripts/split_mail.py
>>  
>>  [flake8]
>>  ignore = E129, F405


Re: [PATCH] [RFC] tools: drop vagrant

2018-02-25 Thread Andrew Donnellan

On 24/02/18 12:22, Daniel Axtens wrote:

It served us well, but it's now outdated (Trusty, Python 3.4, etc.).
There is no indication that anyone uses it or keeps it up to date.

Signed-off-by: Daniel Axtens 


Vagrant is dead, long live Docker!

Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH v2 0/9] Tools and fixes for parallel parsing

2018-02-25 Thread Stephen Finucane
On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
> Thomas Petazzoni reported that Patchwork would occasionally lose
> Buildroot email. Andrew - having talked to jk and sfr - suggested that
> this may be race-condition related.
> 
> I investigated and found some bugs. I first had to develop some tools.
> Along the way I found other unrelated bugs too.
> 
> Patches 1-4 are tooling - ways to do parallel parsing of messages and
> get and compare the output. (Patch 1 fixes an issue I found when
> running the tool from patch 2)
> 
> Patch 5 is an unrelated fix that came up along the way and
> demonstrates that humans remain the best fuzzers, and that Python's
> email module is still adorably* quirky.
> 
> Patch 6 is a bug that came up very quickly in testing but is unlikely
> to be the actual bug Buildroot is hitting, as it can only occur the
> first time an email address is seen.
> 
> Patch 7 is a related tidy-up/optimisation.
> 
> Patch 8 fixes up a MySQL-only bug, but also adds some robustness.
> 
> I think patch 9 closes the most likely issue for Buildroot patches.
> 
> Pending review, patches 5, 6, 8 and 9 should go to stable.
> 
> V2: Address review comments from Andrew, see patches.
> Added the new python script to the automatic pep8 testing.
> Minor improvement to parallel parse script to allow different
>  pythons to be used.

Well, this is all rather unfortunate :( When I was developing this, I
was testing using 'parsearchive' with a single archive, which naturally
meant everything was running in series. It seems the use of an MTA
means the 'parsemail' script, which 'parsearchive' harnesses, gets
called multiple times here and highlights all these races.

I've reviewed most of the patches (bar one, which I'm still working on)
at this point, and think most of these should be merged. However, I've
noted that some of the patches leave FIXMEs in place. I'm pretty sure
all of these occur where we apply a "lookup now and save later _if_
these preconditions are met" pattern, which we do for things like
series references, authors, etc. This isn't going to be easy to fix,
IMO.

That brings me to the real question here: what do we need to do to
ultimately fix this? Far as I can see, there are two options:

 * Save stuff unconditionally. If we're able to find an email, save
   that even if it's not a patch/cover letter/comment. If we find an
   author that we haven't seen before, save that too even if they
   haven't touched a patch, cover letter or comment. This greatly
   simplifies things, but would result in a more congested database and
   API, and would require some background work (saving non-patch/cover
   letter emails).
 * Use locks. I recall a patch from Damien Lespiau some years ago
   around making this a parallel process [1]. Couldn't we just do this
   on the critical sections of code (anything that involves DB
   accesses, I guess)? This would require minimal changes but could
   have scalability issues.

Thoughts?

[1] http://patchwork.ozlabs.org/patch/532930/


Re: [PATCH v2 9/9] parser: don't fail on multiple SeriesReferences

2018-02-25 Thread Stephen Finucane
On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
> Parallel parsing would occasionally fail with:
> 
> patchwork.models.MultipleObjectsReturned: get() returned more than one 
> SeriesReference -- it returned 2!
> 
> I think these are happening if you have different processes parsing
> e.g. 1/3 and 2/3 simultaneously: both will have a reference to 1/3,
> in the case of 1 it will be the msgid, in the case of 2 it will be
> in References. So when we come to parse 3/3, .get() finds 2 and
> throws the exception.
> 
> This does not fix the creation of multiple series references; it
> just causes them to be ignored. We still have serious race conditions
> with series creation, but I don't yet have clear answers for them.
> With this patch, they will at least not stop patches from being
> processed - they'll just lead to wonky series, which we already have.
> 
> Reviewed-by: Andrew Donnellan 
> Signed-off-by: Daniel Axtens 

Correct me if I'm wrong, but this seems to highlight two issues:

 * The race condition between searching for a series reference and
   using it
 * The fact we expect a series to be unique for a given msgid and
   project, yet the 'unique_together' is for a given msgid and
   *series*.

The first of these is caught here (yet not fixed, as it's a non-trivial 
thing to resolve). The second should still be fixed though. Should we
do this here, as part of this series?

Stephen

> ---
>  patchwork/parser.py | 23 +--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 56dc7006c811..5a7344cee93c 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -240,6 +240,13 @@ def _find_series_by_references(project, mail):
>              msgid=ref[:255], series__project=project).series
>          except SeriesReference.DoesNotExist:
>              continue
> +        except SeriesReference.MultipleObjectsReturned:
> +            # FIXME: Open bug: this can happen when we're processing
> +            # messages in parallel. Pick the first and log.
> +            logger.error("Multiple SeriesReferences for %s in project %s!" %
> +                         (ref[:255], project.name))
> +            return SeriesReference.objects.filter(
> +                msgid=ref[:255], series__project=project).first().series
>  
>  
>  def _find_series_by_markers(project, mail, author):
> @@ -1037,6 +1044,9 @@ def parse_mail(mail, list_id=None):
>                      series__project=project)
>              except SeriesReference.DoesNotExist:
>                  SeriesReference.objects.create(series=series, msgid=ref)
> +            except SeriesReference.MultipleObjectsReturned:
> +                logger.error("Multiple SeriesReferences for %s"
> +                             " in project %s!" % (ref, project.name))
>  
>      # add to a series if we have found one, and we have a numbered
>      # patch. Don't add unnumbered patches (for example diffs sent
> @@ -1075,6 +1085,11 @@ def parse_mail(mail, list_id=None):
>                  msgid=msgid, series__project=project).series
>          except SeriesReference.DoesNotExist:
>              series = None
> +        except SeriesReference.MultipleObjectsReturned:
> +            logger.error("Multiple SeriesReferences for %s"
> +                         " in project %s!" % (msgid, project.name))
> +            series = SeriesReference.objects.filter(
> +                msgid=msgid, series__project=project).first().series
>  
>          if not series:
>              series = Series(project=project,
> @@ -1087,8 +1102,12 @@ def parse_mail(mail, list_id=None):
>              # we don't save the in-reply-to or references fields
>              # for a cover letter, as they can't refer to the same
>              # series
> -            SeriesReference.objects.get_or_create(series=series,
> -                                                  msgid=msgid)
> +            try:
> +                SeriesReference.objects.get_or_create(series=series,
> +                                                      msgid=msgid)
> +            except SeriesReference.MultipleObjectsReturned:
> +                logger.error("Multiple SeriesReferences for %s"
> +                             " in project %s!" % (msgid, project.name))
>  
>              cover_letter = CoverLetter(
>                  msgid=msgid,



Re: [PATCH v2 7/9] parser: avoid an unnecessary UPDATE of Person

2018-02-25 Thread Stephen Finucane
On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
> Analysis of SQL statements showed that when parsing an email, the row
> for the Person who sent the email was always getting updated. This is
> because the test for updating it only checks if the incoming mail has
> *a* name attached to the email address, and not if it has a new name.
> Django is not smart enough to figure that out, and so unconditionally
> UPDATEs the model when asked to save.
> 
> Give it a hand - only update the model and save it if the new name is
> in fact different.

Reviewed-by: Stephen Finucane 
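
The pattern being described boils down to something like this (an
illustrative sketch, not the literal hunk; names are approximate):

    person = Person.objects.get(email__iexact=email)
    if name and person.name != name:
        # only dirty the model when the name actually changed,
        # so Django doesn't emit a redundant UPDATE on save()
        person.name = name
        person.save()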


Re: [PATCH v2 5/9] parser: Handle even more exotically broken headers

2018-02-25 Thread Stephen Finucane
On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
> An archive of the Ubuntu kernel team mailing list contains a
> fascinating email that causes the following parse error:
> 
> email.errors.HeaderParseError: header value appears to contain an embedded 
> header:
>   '4Mf^tnii7k\\_EnR5aobBm6Di[DZ9@AX1wJ"okBdX-UoJ>:SRn]c6DDU"qUIwfs98vF>...
> 
> The broken bit seems related to a UTF-8 quoted-printable encoded
> section and to be from an internal attempt to break it over multiple
> lines: here's a snippet from the error message:
> '\n\t=?utf-8?q?Tnf?=\n'
> but interestingly the header itself does not contain the newlines, so
> clearly something quite weird is happening behind the scenes!
> 
> This only throws on header.encode(): it actually makes it through
> sanitise_header and into find_headers before throwing the assertion.
> 
> So, try to encode in sanitize_header as a final step.
> 
> Also, fix a hilarious* python bug that this exposes: whitespace-only
> headers cause an index error!

Looks good to me.

Reviewed-by: Stephen Finucane 

> Reviewed-by: Andrew Donnellan 
> Signed-off-by: Daniel Axtens 
> ---
>  patchwork/parser.py   |  8 +
>  patchwork/tests/fuzztests/x-face.mbox | 58 +++
>  patchwork/tests/test_parser.py|  7 +++--
>  3 files changed, 71 insertions(+), 2 deletions(-)
>  create mode 100644 patchwork/tests/fuzztests/x-face.mbox
> 
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 2cabb3cbc299..cbf88fe4e464 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -129,6 +129,14 @@ def sanitise_header(header_contents, header_name=None):
>                                   header_name=header_name,
>                                   continuation_ws='\t')
>  
> +    try:
> +        header.encode()
> +    except (HeaderParseError, IndexError):
> +        # despite our best efforts, the header is stuffed
> +        # HeaderParseError: some very weird multi-line headers
> +        # IndexError: bug, thrown by make_header(decode_header(' ')).encode()
> +        return None
> +
>      return header
>  
>  
> diff --git a/patchwork/tests/fuzztests/x-face.mbox 
> b/patchwork/tests/fuzztests/x-face.mbox
> new file mode 100644
> index ..98019cff8250
> --- /dev/null
> +++ b/patchwork/tests/fuzztests/x-face.mbox
> @@ -0,0 +1,58 @@
> +From laurent.pinch...@skynet.be Thu Nov 13 15:54:10 2008
> +Received: from mailrelay005.isp.belgacom.be ([195.238.6.171])
> + by chlorine.canonical.com with esmtp (Exim 4.60)
> + (envelope-from ) id 1L0eWI-0007oB-7K
> + for kernel-t...@lists.ubuntu.com; Thu, 13 Nov 2008 15:54:10 +
> +X-IronPort-Anti-Spam-Filtered: true
> +X-IronPort-Anti-Spam-Result: ApsEAP/aG0nCTsYx/2dsb2JhbACBds9Hg1c
> +Received: from 49.198-78-194.adsl-static.isp.belgacom.be (HELO
> + laptop-laurent.belgium.cse-semaphore.com) ([194.78.198.49])
> + by relay.skynet.be with ESMTP; 13 Nov 2008 16:54:09 +0100
> +From: Laurent Pinchart 
> +To: kernel-t...@lists.ubuntu.com
> +Subject: uvcvideo (webcam) support for COMPAL JHL90 based laptops
> +Date: Thu, 13 Nov 2008 16:54:22 +0100
> +User-Agent: KMail/1.9.9
> +X-Face: 
> 4Mf^tnii7k\_EnR5aobBm6Di[DZ9@AX1wJ"okBdX-UoJ>:SRn]c6DDU"qUIwfs98vF>=?utf-8?q?Tnf=0A=09SacR=7B?=(0Du"N%_.#X]"TXx)A'gKB1i7SK$CTLuy{h})c=g:'w3
> +MIME-Version: 1.0
> +Content-Type: text/plain;
> +  charset="us-ascii"
> +Content-Transfer-Encoding: 7bit
> +Content-Disposition: inline
> +Message-Id: <200811131654.22389.laurent.pinch...@skynet.be>
> +X-Mailman-Approved-At: Fri, 14 Nov 2008 14:54:06 +
> +Cc: a...@ubuntu.com
> +X-BeenThere: kernel-t...@lists.ubuntu.com
> +X-Mailman-Version: 2.1.8
> +Precedence: list
> +List-Id: Kernel team discussions 
> +List-Unsubscribe: ,
> + 
> +List-Archive: 
> +List-Post: 
> +List-Help: 
> +List-Subscribe: ,
> + 
> +X-List-Received-Date: Thu, 13 Nov 2008 15:54:10 -
> +
> +Hi Amit,
> +
> +I've noticed by sheer luck that the Ubuntu 8.10 linux-image-2.6.27-7 
> packages 
> +include a patch to the uvcvideo driver to support webcam modules integrated 
> +into Compal JHL90 laptops.
> +
> +Is there a reason why the patch hasn't been pushed upstream ? Being the 
> +uvcvideo author and maintainer, I'd appreciate if you could forward patches 
> +related to the driver in the future.
> +
> +On a pure technical note, the patch might not be required with the current 
> +uvcvideo driver version. There is no way to confirm this without testing the 
> 

Re: [PATCH v2 4/9] tools/scripts: parallel_parsearchive - load archives in parallel

2018-02-25 Thread Stephen Finucane
On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
> If you have multiple archives, you quickly tire of typing stuff like
> python3 manage.py parsearchive --list-id=patchwork.ozlabs.org foo-1 &
> python3 manage.py parsearchive --list-id=patchwork.ozlabs.org foo-2 &
> python3 manage.py parsearchive --list-id=patchwork.ozlabs.org foo-3 &
> python3 manage.py parsearchive --list-id=patchwork.ozlabs.org foo-4 &
> and having to copy and paste it - or retype it! - each time you reset
> the database.
> 
> Instead, this patch allows you to do
> tools/scripts/parallel_parsearchive.sh --list-id=patchwork.ozlabs.org 
> -- foo-*
> 
> Much easier, especially when you are doing it a dozen times.
> 
> Reviewed-by: Andrew Donnellan 
> Signed-off-by: Daniel Axtens 
> 
> --
> v2: Include example, thanks Andrew
> Allow python to be overridden by the PW_PYTHON variable,
> defaulting to Python 3.
> ---

This works, but it does seem less obvious than I'd like. I realise we
can't use threads, thanks to the GIL. However, I have used
multiprocessing here in the past to solve similar problems and there is
prior art here for management commands [1]. Any reason we can't do the
same here?

Stephen

[1] 
https://brobin.me/blog/2017/05/mutiprocessing-in-python-django-management-commands/
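
Roughly, the shape of that approach would be (a sketch of the idea in
[1], not working Patchwork code - each worker would also need its own
database connection):

    import multiprocessing
    import sys

    def parse_archive(path):
        # stand-in for the per-mbox work that parsearchive does
        print('parsing %s' % path)

    if __name__ == '__main__':
        with multiprocessing.Pool() as pool:
            pool.map(parse_archive, sys.argv[1:])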

>  tools/scripts/parallel_parsearchive.sh | 61 ++
>  1 file changed, 61 insertions(+)
>  create mode 100755 tools/scripts/parallel_parsearchive.sh
> 
> diff --git a/tools/scripts/parallel_parsearchive.sh b/tools/scripts/parallel_parsearchive.sh
> new file mode 100755
> index ..f03875b85d6a
> --- /dev/null
> +++ b/tools/scripts/parallel_parsearchive.sh
> @@ -0,0 +1,61 @@
> +#!/bin/bash
> +# Patchwork - automated patch tracking system
> +# Copyright (C) 2018 Daniel Axtens 
> +#
> +# This file is part of the Patchwork package.
> +#
> +# Patchwork is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# Patchwork is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +set -euo pipefail
> +
> +usage() {
> +    cat <<EOF
> +parallel_parsearchive.sh - load archives in parallel
> +Usage:
> +  parallel_parsearchive.sh [parsearchive options] -- <archives>
> +  The -- is mandatory.
> +  As many processes as there are archives will be spun up.
> +
> +Example:
> +  tools/scripts/parallel_parsearchive.sh --list-id=patchwork.ozlabs.org -- foo-*
> +EOF
> +    exit 1
> +}
> +
> +if [ $# -eq 0 ] || [[ $1 == "-h" ]]; then
> +    usage;
> +fi
> +
> +PARSEARCHIVE_OPTIONS=""
> +while [[ $1 != "--" ]]; do
> +    PARSEARCHIVE_OPTIONS="$PARSEARCHIVE_OPTIONS $1"
> +    shift
> +    if [ $# -eq 0 ]; then
> +        usage;
> +    fi
> +done
> +shift
> +
> +if [ $# -eq 0 ]; then
> +    usage;
> +fi
> +
> +set +u
> +if [ -z "$PW_PYTHON" ]; then
> +    PW_PYTHON=python3
> +fi
> +set -u
> +
> +for x in "$@"; do
> +    echo "Starting $x"
> +    "$PW_PYTHON" manage.py parsearchive $PARSEARCHIVE_OPTIONS "$x" &
> +done
> +echo "Processes started in the background."



Re: [PATCH v2 3/9] tools/scripts: split a mbox N ways

2018-02-25 Thread Stephen Finucane
On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
> To test parallel loading of mail, it's handy to be able to split
> an existing mbox file into N mbox files in an alternating pattern
> (e.g. 1 2 1 2 or 1 2 3 4 1 2 3 4 etc)
> 
> Introduce tools/scripts as a place to put things like this.
> 
> Reviewed-by: Andrew Donnellan 
> Signed-off-by: Daniel Axtens 
> 
> --
> 
> v2: address Andrew's review comments
> for full pep8 compliance, add to tox.ini testing
> ---
>  tools/scripts/split_mail.py | 80 +
>  tox.ini |  2 +-
>  2 files changed, 81 insertions(+), 1 deletion(-)
>  create mode 100755 tools/scripts/split_mail.py
> 
> diff --git a/tools/scripts/split_mail.py b/tools/scripts/split_mail.py
> new file mode 100755
> index ..d1e3b06fdf85
> --- /dev/null
> +++ b/tools/scripts/split_mail.py
> @@ -0,0 +1,80 @@
> +#!/usr/bin/python3
> +# Patchwork - automated patch tracking system
> +# Copyright (C) 2018 Daniel Axtens 
> +#
> +# This file is part of the Patchwork package.
> +#
> +# Patchwork is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# Patchwork is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +import sys
> +import os
> +import mailbox
> +
> +usage = """Split a maildir or mbox into N mboxes
> +in an alternating pattern
> +
> +Usage: ./split_mail.py <in> <out> <N>
> +
> +<in> : input mbox file or Maildir
> +<out> : output mbox
> +<out>-1...<N> must not exist
> +<N>  N-way split"""
> +
> +
> +if len(sys.argv) != 4:
> +    print(usage)
> +    exit(1)
> +
> +in_name = sys.argv[1]
> +out_name = sys.argv[2]
> +
> +try:
> +    n = int(sys.argv[3])
> +except ValueError:
> +    print("N must be an integer.")
> +    print(" ")
> +    print(usage)
> +    exit(1)
> +
> +if n < 2:
> +    print("N must be at least 2")
> +    print(" ")
> +    print(usage)
> +    exit(1)
> +
> +if not os.path.exists(in_name):
> +    print("No input at ", in_name)
> +    print(" ")
> +    print(usage)
> +    exit(1)
> +

Can we just use argparse for this, please? It handles all these kinds
of checks for us.
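
For illustration, the argparse equivalent of these checks might look
something like this (a sketch, not part of the patch):

    import argparse

    def at_least_two(value):
        n = int(value)
        if n < 2:
            raise argparse.ArgumentTypeError('N must be at least 2')
        return n

    parser = argparse.ArgumentParser(
        description='Split a maildir or mbox into N mboxes '
                    'in an alternating pattern')
    parser.add_argument('infile', help='input mbox file or Maildir')
    parser.add_argument('outfile', help='output mbox prefix')
    parser.add_argument('n', type=at_least_two, help='N-way split')
    args = parser.parse_args()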

> +print("Opening", in_name)
> +if os.path.isdir(in_name):
> +    inmail = mailbox.Maildir(in_name)
> +else:
> +    inmail = mailbox.mbox(in_name)
> +

This needs to be closed once opened. You'll see warnings in Python 3.4
(?) otherwise.

> +out = []
> +for i in range(n):
> +    if os.path.exists(out_name + "-" + str(i + 1)):
> +        print("mbox already exists at ", out_name + "-" + str(i + 1))
> +        print(" ")
> +        print(usage)
> +        exit(1)
> +
> +    out += [mailbox.mbox(out_name + '-' + str(i + 1))]
> +
> +print("Copying messages")
> +
> +for (i, msg) in enumerate(inmail):
> +    out[i % n].add(msg)
> +
> +print("Done")
> diff --git a/tox.ini b/tox.ini
> index 09505f78e157..345f7fe2e15a 100644
> --- a/tox.ini
> +++ b/tox.ini
> @@ -37,7 +37,7 @@ commands =
>  [testenv:pep8]
>  basepython = python2.7
>  deps = flake8
> -commands = flake8 {posargs} patchwork patchwork/bin/pwclient
> +commands = flake8 {posargs} patchwork patchwork/bin/pwclient tools/scripts/split_mail.py
>  
>  [flake8]
>  ignore = E129, F405



Re: [PATCH v2 2/9] debugging: add command to dump patches and series

2018-02-25 Thread Stephen Finucane
On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
> I don't want a full dump, just enough to know if the same patches
> and series have been created with roughly the same properties. This
> seemed like the easiest way to do it.
> 
> Usage:
>   python3 manage.py debug_dump > file
>   ... make changes, reset db, reload files ...
>   python3 manage.py debug_dump > file2
>   diff -u file file2
> 
> Reviewed-by: Andrew Donnellan 
> Signed-off-by: Daniel Axtens 

Same concerns as Andrew RE: naming, with the same conclusion (it's not
big enough to matter). In addition, I'd like to see this documented (I
think the rest of them are documented?). With that...

Reviewed-by: Stephen Finucane 


Re: [PATCH v2 1/9] tools/docker: assume terminal supports utf-8

2018-02-25 Thread Stephen Finucane
On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
> Set PYTHONIOENCODING to UTF-8, which allows Python3 to print UTF-8
> directly to the terminal (or to a pipe or shell-redirected file) rather
> than throwing an error.
> 
> Reviewed-by: Andrew Donnellan 
> Signed-off-by: Daniel Axtens 

Fine by me.

Reviewed-by: Stephen Finucane 

> ---
>  tools/docker/Dockerfile | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/docker/Dockerfile b/tools/docker/Dockerfile
> index eb6d35d82528..2154ca7e67bf 100644
> --- a/tools/docker/Dockerfile
> +++ b/tools/docker/Dockerfile
> @@ -10,6 +10,7 @@ ENV db_pass password
>  ENV DJANGO_SETTINGS_MODULE patchwork.settings.dev
>  ENV DEBIAN_FRONTEND noninteractive
>  ENV PYTHONUNBUFFERED 1
> +ENV PYTHONIOENCODING UTF-8
>  
>  # System
>  # trusty and findutils is for python3.4
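
For context, the failure this avoids looks roughly like the following
under a C/POSIX locale (illustrative; exact messages vary by Python
version):

    # demo.py
    print('\u2603')   # a snowman

    # $ python3 demo.py | cat
    # UnicodeEncodeError: 'ascii' codec can't encode character '\u2603' ...
    # $ PYTHONIOENCODING=UTF-8 python3 demo.py | cat
    # (prints the snowman)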



Re: [PATCH] [RFC] tools: drop vagrant

2018-02-25 Thread Stephen Finucane
On Sat, 2018-02-24 at 12:22 +1100, Daniel Axtens wrote:
> It served us well, but it's now outdated (Trusty, Python 3.4, etc.).
> There is no indication that anyone uses it or keeps it up to date.
> 
> Signed-off-by: Daniel Axtens 

I'm good to drop this. I've been using Docker exclusively for a while
now too.

Acked-by: Stephen Finucane 