Re: [PATCH v2] models: Use database constraints to prevent split Series

2019-10-29 Thread Daniel Axtens
Daniel Axtens  writes:

>> Closes: #241
>
> I'm not sure this is quite correct, sadly. Using the parallel parser I
> posted, I see a difference in the number of series created by parsing in
> parallel and parsing serially.
>
> # serial (-j1)
 Series.objects.count()
> 28016
>
> # parallel (-j6)
 Series.objects.count()
> 28065
>
>
> I investigated just the first Untitled series from the parallel case,
> where I see that one 6-patch series has been split up a lot:
>
 pprint.pprint([(p.name, p.series) for p in 
 Patch.objects.filter(submitter__id=42)[0:6]])
> [('[1/6] x86: implement support to synchronize RDTSC through MFENCE on AMD '
>   'CPUs',
>on AMD CPUs>),
>  ('[2/6] x86: Implement support to synchronize RDTSC with LFENCE on Intel 
> CPUs',
>   ),
>  ('[3/6] x86: move nop declarations into separate include file',
>   ),
>  ('[4/6] x86: introduce rdtsc_barrier()', ),
>  ('[5/6] x86: remove get_cycles_sync', ),
>  ('[6/6] x86: read_tsc sync', )]
>
> If I do serial parsing, this does not get split up:
>
 pprint.pprint([(p.name, p.series) for p in 
 Patch.objects.filter(submitter__id=74)[0:6]])
> [('[1/6] x86: implement support to synchronize RDTSC through MFENCE on AMD '
>   'CPUs',
>   ),
>  ('[2/6] x86: Implement support to synchronize RDTSC with LFENCE on Intel 
> CPUs',
>   ),
>  ('[3/6] x86: move nop declarations into separate include file',
>   ),
>  ('[4/6] x86: introduce rdtsc_barrier()',
>   ),
>  ('[5/6] x86: remove get_cycles_sync',
>   ),
>  ('[6/6] x86: read_tsc sync', )]
>
> I haven't looked at the series in any detail beyond this - it might
> still be worth including, but it just doesn't quite squash the
> bug, sadly.

I've just implemented an alternative approach (put a field in Series
that represents the 'head' message-id), and also saw this split up - it
turns out that it doesn't contain References or In-Reply-To. I'll have a
look to see if any are split that do contain those headers.

Regards,
Daniel

>
> Regards,
> Daniel
>
>> Cc: Daniel Axtens 
>> Cc: Petr Vorel 
>> ---
>>  patchwork/migrations/0037_python_3.py | 298 ++
>>  .../0038_unique_series_references.py  | 121 +++
>>  patchwork/models.py   |   6 +-
>>  patchwork/parser.py   | 122 +++
>>  patchwork/tests/test_parser.py|   9 +-
>>  patchwork/tests/utils.py  |   6 +-
>>  6 files changed, 496 insertions(+), 66 deletions(-)
>>  create mode 100644 patchwork/migrations/0037_python_3.py
>>  create mode 100644 patchwork/migrations/0038_unique_series_references.py
>>
>> diff --git patchwork/migrations/0037_python_3.py 
>> patchwork/migrations/0037_python_3.py
>> new file mode 100644
>> index ..416a7045
>> --- /dev/null
>> +++ patchwork/migrations/0037_python_3.py
>> @@ -0,0 +1,298 @@
>> +import datetime
>> +
>> +from django.conf import settings
>> +from django.db import migrations, models
>> +import django.db.models.deletion
>> +
>> +import patchwork.models
>> +
>> +
>> +class Migration(migrations.Migration):
>> +
>> +dependencies = [('patchwork', '0036_project_commit_url_format')]
>> +
>> +operations = [
>> +migrations.AlterField(
>> +model_name='check',
>> +name='context',
>> +field=models.SlugField(
>> +default='default',
>> +help_text='A label to discern check from checks of other 
>> testing systems.',  # noqa
>> +max_length=255,
>> +),
>> +),
>> +migrations.AlterField(
>> +model_name='check',
>> +name='description',
>> +field=models.TextField(
>> +blank=True,
>> +help_text='A brief description of the check.',
>> +null=True,
>> +),
>> +),
>> +migrations.AlterField(
>> +model_name='check',
>> +name='state',
>> +field=models.SmallIntegerField(
>> +choices=[
>> +(0, 'pending'),
>> +(1, 'success'),
>> +(2, 'warning'),
>> +(3, 'fail'),
>> +],
>> +default=0,
>> +help_text='The state of the check.',
>> +),
>> +),
>> +migrations.AlterField(
>> +model_name='check',
>> +name='target_url',
>> +field=models.URLField(
>> +blank=True,
>> +help_text='The target URL to associate with this check. 
>> This should be specific to the patch.',  # noqa
>> +null=True,
>> +),
>> +),
>> +migrations.AlterField(
>> +model_name='comment',
>> +name='submission',
>> +field=models.ForeignKey(
>> +on_delete=django.db.models.deletion.CASCADE,
>> +related_name='comments',
>> +

Re: [PATCH v2] models: Use database constraints to prevent split Series

2019-10-21 Thread Stephen Finucane
On Sat, 2019-10-19 at 00:57 +1100, Daniel Axtens wrote:
> > Closes: #241
> 
> I'm not sure this is quite correct, sadly. Using the parallel parser I
> posted, I see a difference in the number of series created by parsing in
> parallel and parsing serially.
> 
> # serial (-j1)
> > > > Series.objects.count()
> 28016
> 
> # parallel (-j6)
> > > > Series.objects.count()
> 28065
> 
> 
> I investigated just the first Untitled series from the parallel case,
> where I see that one 6-patch series has been split up a lot:
> 
> > > > pprint.pprint([(p.name, p.series) for p in 
> > > > Patch.objects.filter(submitter__id=42)[0:6]])
> [('[1/6] x86: implement support to synchronize RDTSC through MFENCE on AMD '
>   'CPUs',
>on AMD CPUs>),
>  ('[2/6] x86: Implement support to synchronize RDTSC with LFENCE on Intel 
> CPUs',
>   ),
>  ('[3/6] x86: move nop declarations into separate include file',
>   ),
>  ('[4/6] x86: introduce rdtsc_barrier()', ),
>  ('[5/6] x86: remove get_cycles_sync', ),
>  ('[6/6] x86: read_tsc sync', )]
> 
> If I do serial parsing, this does not get split up:
> 
> > > > pprint.pprint([(p.name, p.series) for p in 
> > > > Patch.objects.filter(submitter__id=74)[0:6]])
> [('[1/6] x86: implement support to synchronize RDTSC through MFENCE on AMD '
>   'CPUs',
>   ),
>  ('[2/6] x86: Implement support to synchronize RDTSC with LFENCE on Intel 
> CPUs',
>   ),
>  ('[3/6] x86: move nop declarations into separate include file',
>   ),
>  ('[4/6] x86: introduce rdtsc_barrier()',
>   ),
>  ('[5/6] x86: remove get_cycles_sync',
>   ),
>  ('[6/6] x86: read_tsc sync', )]
> 
> I haven't looked at the series in any detail beyond this - it might
> still be worth including, but it just doesn't quite squash the
> bug, sadly.

So my only theory about why this might be happening is that this series
is shallow threaded (so all patches contain a reference to the cover
letter but not their preceding patches) and there is still a race in
creating the SeriesReference corresponding to the cover letter which
some patches are losing. However, what's confusing is that they should
be trying and hitting an IntegrityError in the losing case, resulting
in the rollback/retry of the entire transaction. Clearly that is not
the case though so it would be good to figure out what's missing.

Would you be able to dump the rows for these Patches (minus the diff,
headers and content fields), the Series, and the associated
SeriesReferences so I can try figure out how we ended up doing this? I
probably won't be able to work on it for the next two weeks (I'm on
PTO) but I'll try crack it then. This has to be possible.

Stephen

> Regards,
> Daniel
> 
> > Cc: Daniel Axtens 
> > Cc: Petr Vorel 
> > ---
> >  patchwork/migrations/0037_python_3.py | 298 ++
> >  .../0038_unique_series_references.py  | 121 +++
> >  patchwork/models.py   |   6 +-
> >  patchwork/parser.py   | 122 +++
> >  patchwork/tests/test_parser.py|   9 +-
> >  patchwork/tests/utils.py  |   6 +-
> >  6 files changed, 496 insertions(+), 66 deletions(-)
> >  create mode 100644 patchwork/migrations/0037_python_3.py
> >  create mode 100644 patchwork/migrations/0038_unique_series_references.py
> > 
> > diff --git patchwork/migrations/0037_python_3.py 
> > patchwork/migrations/0037_python_3.py
> > new file mode 100644
> > index ..416a7045
> > --- /dev/null
> > +++ patchwork/migrations/0037_python_3.py
> > @@ -0,0 +1,298 @@
> > +import datetime
> > +
> > +from django.conf import settings
> > +from django.db import migrations, models
> > +import django.db.models.deletion
> > +
> > +import patchwork.models
> > +
> > +
> > +class Migration(migrations.Migration):
> > +
> > +dependencies = [('patchwork', '0036_project_commit_url_format')]
> > +
> > +operations = [
> > +migrations.AlterField(
> > +model_name='check',
> > +name='context',
> > +field=models.SlugField(
> > +default='default',
> > +help_text='A label to discern check from checks of other 
> > testing systems.',  # noqa
> > +max_length=255,
> > +),
> > +),
> > +migrations.AlterField(
> > +model_name='check',
> > +name='description',
> > +field=models.TextField(
> > +blank=True,
> > +help_text='A brief description of the check.',
> > +null=True,
> > +),
> > +),
> > +migrations.AlterField(
> > +model_name='check',
> > +name='state',
> > +field=models.SmallIntegerField(
> > +choices=[
> > +(0, 'pending'),
> > +(1, 'success'),
> > +(2, 'warning'),
> > +(3, 'fail'),
> > +],
> > +default=0,
> > + 

Re: [PATCH v2] models: Use database constraints to prevent split Series

2019-10-17 Thread Stephen Finucane
On Fri, 2019-10-18 at 09:48 +1100, Daniel Axtens wrote:
> Hi Stephen,
> 
> Were you aiming this for 2.2 or for 3?
> 
> I'm thinking 3 but happy to be convinced otherwise.

I'd personally like this to be a blocker for v2.2, though the Python 3
part of the migration can possibly be dropped (I can respin if
necessary). This is a pretty dumb issue and the fix is pretty trivial,
all in all, but it's not backportable (model changes). I don't really
want to have broken series keep popping up for the next 12 months (or
however long it takes for us to wrap up a v3.0).

Let me know if there's anything I can do to help move it along.

> I'll also open a topic branch for PW3 and start posting some of my
> Python 2 removal patches.

I've a question/comment on this but I'll reply on the other thread.

Stephen

> Regards,
> Daniel
> 
> > Currently, the 'SeriesReference' object has a unique constraint on the
> > two fields it has, 'series', which is a foreign key to 'Series', and
> > 'msgid'. This is the wrong constraint. What we actually want to enforce
> > is that a patch, cover letter or comment is referenced by a single
> > series, or rather a single series per project the submission appears on.
> > As such, we should be enforcing uniqueness on the msgid and the project
> > that the patch, cover letter or comment belongs to.
> > 
> > This requires adding a new field to the object, 'project', since it's
> > not possible to do something like the following:
> > 
> >   unique_together = [('msgid', 'series__project')]
> > 
> > This is detailed here [1]. In addition, the migration needs a precursor
> > migration step to merge any broken series.
> > 
> > [1] https://stackoverflow.com/a/4440189/613428
> > 
> > Signed-off-by: Stephen Finucane 
> > Closes: #241
> > Cc: Daniel Axtens 
> > Cc: Petr Vorel 
> > ---
> >  patchwork/migrations/0037_python_3.py | 298 ++
> >  .../0038_unique_series_references.py  | 121 +++
> >  patchwork/models.py   |   6 +-
> >  patchwork/parser.py   | 122 +++
> >  patchwork/tests/test_parser.py|   9 +-
> >  patchwork/tests/utils.py  |   6 +-
> >  6 files changed, 496 insertions(+), 66 deletions(-)
> >  create mode 100644 patchwork/migrations/0037_python_3.py
> >  create mode 100644 patchwork/migrations/0038_unique_series_references.py
> > 
> > diff --git patchwork/migrations/0037_python_3.py 
> > patchwork/migrations/0037_python_3.py
> > new file mode 100644
> > index ..416a7045
> > --- /dev/null
> > +++ patchwork/migrations/0037_python_3.py
> > @@ -0,0 +1,298 @@
> > +import datetime
> > +
> > +from django.conf import settings
> > +from django.db import migrations, models
> > +import django.db.models.deletion
> > +
> > +import patchwork.models
> > +
> > +
> > +class Migration(migrations.Migration):
> > +
> > +dependencies = [('patchwork', '0036_project_commit_url_format')]
> > +
> > +operations = [
> > +migrations.AlterField(
> > +model_name='check',
> > +name='context',
> > +field=models.SlugField(
> > +default='default',
> > +help_text='A label to discern check from checks of other 
> > testing systems.',  # noqa
> > +max_length=255,
> > +),
> > +),
> > +migrations.AlterField(
> > +model_name='check',
> > +name='description',
> > +field=models.TextField(
> > +blank=True,
> > +help_text='A brief description of the check.',
> > +null=True,
> > +),
> > +),
> > +migrations.AlterField(
> > +model_name='check',
> > +name='state',
> > +field=models.SmallIntegerField(
> > +choices=[
> > +(0, 'pending'),
> > +(1, 'success'),
> > +(2, 'warning'),
> > +(3, 'fail'),
> > +],
> > +default=0,
> > +help_text='The state of the check.',
> > +),
> > +),
> > +migrations.AlterField(
> > +model_name='check',
> > +name='target_url',
> > +field=models.URLField(
> > +blank=True,
> > +help_text='The target URL to associate with this check. 
> > This should be specific to the patch.',  # noqa
> > +null=True,
> > +),
> > +),
> > +migrations.AlterField(
> > +model_name='comment',
> > +name='submission',
> > +field=models.ForeignKey(
> > +on_delete=django.db.models.deletion.CASCADE,
> > +related_name='comments',
> > +related_query_name='comment',
> > +to='patchwork.Submission',
> > +),
> > +),
> > +migrations.AlterField(
> 

Re: [PATCH v2] models: Use database constraints to prevent split Series

2019-10-17 Thread Petr Vorel
Hi Stephen,

> Currently, the 'SeriesReference' object has a unique constraint on the
> two fields it has, 'series', which is a foreign key to 'Series', and
> 'msgid'. This is the wrong constraint. What we actually want to enforce
> is that a patch, cover letter or comment is referenced by a single
> series, or rather a single series per project the submission appears on.
> As such, we should be enforcing uniqueness on the msgid and the project
> that the patch, cover letter or comment belongs to.
+1.

> This requires adding a new field to the object, 'project', since it's
> not possible to do something like the following:

>   unique_together = [('msgid', 'series__project')]

> This is detailed here [1]. In addition, the migration needs a precursor
> migration step to merge any broken series.

> [1] https://stackoverflow.com/a/4440189/613428

> Signed-off-by: Stephen Finucane 
> Closes: #241
> Cc: Daniel Axtens 
> Cc: Petr Vorel 

I cannot review your patch, but thanks a lot for implementing fix for #241.

Something related (I should create GH issue): it'd be great if cover letter was
always displayed (no matter how, but with all replies). ATM it's displayed as a
separate patch, if there is diff included: series [1], separated cover letter
[2] (there is no series).

Kind regards,
Petr

[1] https://patchwork.ozlabs.org/project/ltp/list/?series=135548=*
[2] https://patchwork.ozlabs.org/patch/1175082/
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH v2] models: Use database constraints to prevent split Series

2019-10-17 Thread Daniel Axtens
Hi Stephen,

Were you aiming this for 2.2 or for 3?

I'm thinking 3 but happy to be convinced otherwise.

I'll also open a topic branch for PW3 and start posting some of my
Python 2 removal patches.

Regards,
Daniel

> Currently, the 'SeriesReference' object has a unique constraint on the
> two fields it has, 'series', which is a foreign key to 'Series', and
> 'msgid'. This is the wrong constraint. What we actually want to enforce
> is that a patch, cover letter or comment is referenced by a single
> series, or rather a single series per project the submission appears on.
> As such, we should be enforcing uniqueness on the msgid and the project
> that the patch, cover letter or comment belongs to.
>
> This requires adding a new field to the object, 'project', since it's
> not possible to do something like the following:
>
>   unique_together = [('msgid', 'series__project')]
>
> This is detailed here [1]. In addition, the migration needs a precursor
> migration step to merge any broken series.
>
> [1] https://stackoverflow.com/a/4440189/613428
>
> Signed-off-by: Stephen Finucane 
> Closes: #241
> Cc: Daniel Axtens 
> Cc: Petr Vorel 
> ---
>  patchwork/migrations/0037_python_3.py | 298 ++
>  .../0038_unique_series_references.py  | 121 +++
>  patchwork/models.py   |   6 +-
>  patchwork/parser.py   | 122 +++
>  patchwork/tests/test_parser.py|   9 +-
>  patchwork/tests/utils.py  |   6 +-
>  6 files changed, 496 insertions(+), 66 deletions(-)
>  create mode 100644 patchwork/migrations/0037_python_3.py
>  create mode 100644 patchwork/migrations/0038_unique_series_references.py
>
> diff --git patchwork/migrations/0037_python_3.py 
> patchwork/migrations/0037_python_3.py
> new file mode 100644
> index ..416a7045
> --- /dev/null
> +++ patchwork/migrations/0037_python_3.py
> @@ -0,0 +1,298 @@
> +import datetime
> +
> +from django.conf import settings
> +from django.db import migrations, models
> +import django.db.models.deletion
> +
> +import patchwork.models
> +
> +
> +class Migration(migrations.Migration):
> +
> +dependencies = [('patchwork', '0036_project_commit_url_format')]
> +
> +operations = [
> +migrations.AlterField(
> +model_name='check',
> +name='context',
> +field=models.SlugField(
> +default='default',
> +help_text='A label to discern check from checks of other 
> testing systems.',  # noqa
> +max_length=255,
> +),
> +),
> +migrations.AlterField(
> +model_name='check',
> +name='description',
> +field=models.TextField(
> +blank=True,
> +help_text='A brief description of the check.',
> +null=True,
> +),
> +),
> +migrations.AlterField(
> +model_name='check',
> +name='state',
> +field=models.SmallIntegerField(
> +choices=[
> +(0, 'pending'),
> +(1, 'success'),
> +(2, 'warning'),
> +(3, 'fail'),
> +],
> +default=0,
> +help_text='The state of the check.',
> +),
> +),
> +migrations.AlterField(
> +model_name='check',
> +name='target_url',
> +field=models.URLField(
> +blank=True,
> +help_text='The target URL to associate with this check. This 
> should be specific to the patch.',  # noqa
> +null=True,
> +),
> +),
> +migrations.AlterField(
> +model_name='comment',
> +name='submission',
> +field=models.ForeignKey(
> +on_delete=django.db.models.deletion.CASCADE,
> +related_name='comments',
> +related_query_name='comment',
> +to='patchwork.Submission',
> +),
> +),
> +migrations.AlterField(
> +model_name='delegationrule',
> +name='path',
> +field=models.CharField(
> +help_text='An fnmatch-style pattern to match filenames 
> against.',  # noqa
> +max_length=255,
> +),
> +),
> +migrations.AlterField(
> +model_name='delegationrule',
> +name='priority',
> +field=models.IntegerField(
> +default=0,
> +help_text='The priority of the rule. Rules with a higher 
> priority will override rules with lower priorities',  # noqa
> +),
> +),
> +migrations.AlterField(
> +model_name='delegationrule',
> +name='user',
> +field=models.ForeignKey(
> +help_text='A user to delegate the patch to.',