On Mon, 2018-02-26 at 15:55 +1100, Daniel Axtens wrote: > Stephen Finucane <step...@that.guru> writes: > > > On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote: > > > Parallel parsing would occasonally fail with: > > > > > > patchwork.models.MultipleObjectsReturned: get() returned more than one > > > SeriesReference -- it returned 2! > > > > > > I think these are happening if you have different processes parsing > > > e.g. 1/3 and 2/3 simultaneously: both will have a reference to 1/3, > > > in the case of 1 it will be the msgid, in the case of 2 it will be > > > in References. So when we come to parse 3/3, .get() finds 2 and > > > throws the exception. > > > > > > This does not fix the creation of multiple series references; it > > > just causes them to be ignored. We still have serious race conditions > > > with series creation, but I don't yet have clear answers for them. > > > With this patch, they will at least not stop patches from being > > > processed - they'll just lead to wonky series, which we already have. > > > > > > Reviewed-by: Andrew Donnellan <andrew.donnel...@au1.ibm.com> > > > Signed-off-by: Daniel Axtens <d...@axtens.net> > > > > Correct me if I'm wrong, but this seems to highlight two issues: > > > > * The race condition between searching for a series reference and > > using it > > * The fact we expect a series to be unique for a given msgid and > > project, yet the 'unique_together' is for a given msgid and > > *series*. > > > > The first of these is caught here (yet not fixed, as it's a non-trivial > > thing to resolve). The second should still be fixed though. Should we > > do this here, as part of this series? > > My thinking is no, for these reasons: > > - I want a patch I can backport to stable without requiring a database > migration, because: > > * that seems like a very unusual thing for a stable update > > * I don't trust my ability to get it right for a stable release > > * relatedly, it will require a lot of testing for me to trust it, > and I want something sooner rather than later so that the > Buildroot people can get on with their lives. > > * The different numbering of migrations will make the different > upgrade paths a nightmare to describe, yet alone test
Yup, I had no considered the backportability aspects of this. That's a very valid point. Let's do this separately. > - I have a long-term ongoing effort to move 'project' up into the > various tables that use it so as to avoid JOINs and this would be a > good thing to do in that effort. (It's the next big push on my list, > it's just a nightmare to get the compatibility and migration code > going.) We should probably discuss this separately, but is this a significant part of the issue? Far as I could see, there were two different issues with the DB causing our pain: * The number of JOINs on Event, which I have an RFC waiting on reviews for. * The need to JOIN for Patch and CoverLetter, which could probably be resolved by flattening this back down into Submission. Have I missed something? Stephen > Regards, > Daniel > > > > > Stephen > > > > > --- > > > patchwork/parser.py | 23 +++++++++++++++++++++-- > > > 1 file changed, 21 insertions(+), 2 deletions(-) > > > > > > diff --git a/patchwork/parser.py b/patchwork/parser.py > > > index 56dc7006c811..5a7344cee93c 100644 > > > --- a/patchwork/parser.py > > > +++ b/patchwork/parser.py > > > @@ -240,6 +240,13 @@ def _find_series_by_references(project, mail): > > > msgid=ref[:255], series__project=project).series > > > except SeriesReference.DoesNotExist: > > > continue > > > + except SeriesReference.MultipleObjectsReturned: > > > + # FIXME: Open bug: this can happen when we're processing > > > + # messages in parallel. Pick the first and log. > > > + logger.error("Multiple SeriesReferences for %s in project > > > %s!" % > > > + (ref[:255], project.name)) > > > + return SeriesReference.objects.filter( > > > + msgid=ref[:255], series__project=project).first().series > > > > > > > > > def _find_series_by_markers(project, mail, author): > > > @@ -1037,6 +1044,9 @@ def parse_mail(mail, list_id=None): > > > series__project=project) > > > except SeriesReference.DoesNotExist: > > > SeriesReference.objects.create(series=series, > > > msgid=ref) > > > + except SeriesReference.MultipleObjectsReturned: > > > + logger.error("Multiple SeriesReferences for %s" > > > + " in project %s!" % (ref, project.name)) > > > > > > # add to a series if we have found one, and we have a numbered > > > # patch. Don't add unnumbered patches (for example diffs sent > > > @@ -1075,6 +1085,11 @@ def parse_mail(mail, list_id=None): > > > msgid=msgid, series__project=project).series > > > except SeriesReference.DoesNotExist: > > > series = None > > > + except SeriesReference.MultipleObjectsReturned: > > > + logger.error("Multiple SeriesReferences for %s" > > > + " in project %s!" % (msgid, project.name)) > > > + series = SeriesReference.objects.filter( > > > + msgid=msgid, series__project=project).first().series > > > > > > if not series: > > > series = Series(project=project, > > > @@ -1087,8 +1102,12 @@ def parse_mail(mail, list_id=None): > > > # we don't save the in-reply-to or references fields > > > # for a cover letter, as they can't refer to the same > > > # series > > > - SeriesReference.objects.get_or_create(series=series, > > > - msgid=msgid) > > > + try: > > > + SeriesReference.objects.get_or_create(series=series, > > > + msgid=msgid) > > > + except SeriesReference.MultipleObjectsReturned: > > > + logger.error("Multiple SeriesReferences for %s" > > > + " in project %s!" % (msgid, > > > project.name)) > > > > > > cover_letter = CoverLetter( > > > msgid=msgid, _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork