[PATCH] Fetch all series for patch/cover viewing

2018-08-08 Thread Stewart Smith
e.g. a 10-comment patch goes from 26 queries in 17-20ms down to 20
queries in 12ms.

A 67-comment cover letter goes from 14 queries in 16ms down to 8 queries
in 8ms.

So, effectively, a near 2x perf improvement.

Previously, at several points we asked for the latest series and then
asked for all the series. Since there usually aren't *that* many series,
fetch them all and take the first one when we need it.
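The "fetch them all, take the first" pattern can be sketched in plain Python with a stand-in for the ORM (illustrative only, not Patchwork's actual code): once the full, date-ordered list is fetched, the latest series is simply its first element, so the separate latest-series query disappears.

```python
class FakeSeriesTable:
    """Stand-in for the ORM: counts how many queries hit the database."""
    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def all_ordered_by_date_desc(self):
        self.queries += 1
        return sorted(self.rows, key=lambda r: r["date"], reverse=True)

db = FakeSeriesTable([{"id": 1, "date": "2018-08-01"},
                      {"id": 2, "date": "2018-08-08"}])

# Old approach: one query for the latest series, another for all of them.
latest = db.all_ordered_by_date_desc()[0]
everything = db.all_ordered_by_date_desc()
old_queries = db.queries  # 2 queries

# New approach: fetch all once; "latest" is just the first element.
db.queries = 0
all_series = db.all_ordered_by_date_desc()
latest = all_series[0] if all_series else None
new_queries = db.queries  # 1 query
```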

Signed-off-by: Stewart Smith 
---
 patchwork/templates/patchwork/submission.html | 10 +-
 patchwork/views/cover.py  |  2 +-
 patchwork/views/patch.py  |  1 +
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index 2f69735d6925..3b6f9fbe909e 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -64,15 +64,15 @@ function toggle_div(link_id, headers_id)

   
  
-{% if submission.latest_series %}
+{% if submission.all_series %}
  
   Series
   

 
- {% for series in submission.series.all %}
+ {% for series in all_series %}
  
-  {% if series == submission.latest_series %}
+  {% if forloop.first %}
{{ series }}
   {% else %}

@@ -93,7 +93,7 @@ function toggle_div(link_id, headers_id)
>show

 
-{% with submission.latest_series.cover_letter as cover %}
+{% with all_series.cover_letter as cover %}
  
  {% if cover %}
   {% if cover == submission %}
@@ -106,7 +106,7 @@ function toggle_div(link_id, headers_id)
  {% endif %}
  
 {% endwith %}
-{% for sibling in submission.latest_series.patches.all %}
+{% for sibling in all_series.patches.all %}
  
   {% if sibling == submission %}
{{ sibling.name|default:"[no subject]"|truncatechars:100 }}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index edad90bc694d..1ee2b3f988fa 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -49,7 +49,7 @@ def cover_detail(request, cover_id):
 comments = comments.select_related('submitter')
 comments = comments.only('submitter','date','id','content','submission')
 context['comments'] = comments
-
+context['all_series'] = cover.series.all().order_by('-date')
 return render_to_response('patchwork/submission.html', context)
 
 
diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index f43fbecd9a4d..e1d0cdcfcf39 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -118,6 +118,7 @@ def patch_detail(request, patch_id):
 comments = comments.select_related('submitter')
 comments = comments.only('submitter','date','id','content','submission')
 
+context['all_series'] = patch.series.all().order_by('-date')
 context['comments'] = comments
 context['submission'] = patch
 context['patchform'] = form
-- 
2.17.1

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


[PATCH v2] 4x performance improvement for viewing patch with many comments

2018-08-08 Thread Stewart Smith
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com:
in my test dataset (a chunk of a variety of mailing lists), this cover
letter has 67 comments from a variety of people. Thus, it's on the
larger side of things.

Originally, displaying the /patch/550/ for this (redirected to /cover)
would take 81 SQL queries in ~60ms on my laptop.

After this optimisation, it's down to 14 queries in 14ms.

When the cache is cold, it's down to 32ms from 83ms.

The effect of this patch is to execute a join in the database to
get the submitter information for each comment at the same time as
getting all the comments rather than doing a one-by-one lookup after
the fact.
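The N+1 pattern being removed can be shown with sqlite3 in plain Python (a minimal two-table model, not Patchwork's real schema): without the join, every comment costs an extra round trip for its submitter; with a join (which is what Django's select_related() emits), one query returns everything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, submitter_id INTEGER,
                          content TEXT);
    INSERT INTO person VALUES (1, 'stewart'), (2, 'daniel');
    INSERT INTO comment VALUES (10, 1, 'nice'), (11, 2, 'ship it'),
                               (12, 1, 'ack');
""")

# N+1: one query for the comments, then one per comment for the submitter.
queries = 1
comments = conn.execute("SELECT id, submitter_id FROM comment").fetchall()
for _cid, submitter_id in comments:
    conn.execute("SELECT name FROM person WHERE id=?", (submitter_id,))
    queries += 1
n_plus_1 = queries  # 4 queries for 3 comments

# The select_related() equivalent: a single joined query.
rows = conn.execute("""
    SELECT comment.id, comment.content, person.name
    FROM comment JOIN person ON person.id = comment.submitter_id
""").fetchall()
joined = 1  # one query total
```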

Signed-off-by: Stewart Smith 
---
 patchwork/templates/patchwork/submission.html | 2 +-
 patchwork/views/cover.py  | 5 +
 patchwork/views/patch.py  | 5 +
 3 files changed, 11 insertions(+), 1 deletion(-)

---
Changes since v1: don't fat finger patch viewing (s/cover/patch/)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index e817713f7079..2f69735d6925 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id)
 
 
 
-{% for item in submission.comments.all %}
+{% for item in comments %}
 {% if forloop.first %}
 Comments
 {% endif %}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index 73f83cb99b99..edad90bc694d 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -45,6 +45,11 @@ def cover_detail(request, cover_id):
 'project': cover.project,
 }
 
+comments = cover.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+context['comments'] = comments
+
 return render_to_response('patchwork/submission.html', context)
 
 
diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index cbd4ec395d99..f43fbecd9a4d 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -114,6 +114,11 @@ def patch_detail(request, patch_id):
 if is_authenticated(request.user):
 context['bundles'] = request.user.bundles.all()
 
+comments = patch.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+
+context['comments'] = comments
 context['submission'] = patch
 context['patchform'] = form
 context['createbundleform'] = createbundleform
-- 
2.17.1



[PATCH] Add index for patchwork_comment (submission_id,date)

2018-08-08 Thread Stewart Smith
This (at least theoretically) should speed up displaying comments
on patches/cover letters. It's an index that will return rows
in-order for the query that we always do ("give me the comments
on this submission in date order"), rather than having to have
the database server do a sort for us.

I haven't been able to benchmark an actual improvement locally, but I
don't have as large a data set as various production instances. The
query plan does look a bit nicer though, although whether the benefit
outweighs the cost of index maintenance versus just doing the sort is a
good question.
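The claim can be checked cheaply with sqlite3's planner (an illustration only; production instances run MySQL/PostgreSQL, whose planners differ): without a (submission, date) index the ORDER BY requires a sort step, and with it the rows come back already in index order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY,"
             " submission_id INTEGER, date TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry a human-readable detail column last.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM comment WHERE submission_id=1 ORDER BY date"

before = plan(query)  # planner must sort: mentions a temp b-tree for ORDER BY
conn.execute("CREATE INDEX submission_date_idx"
             " ON comment (submission_id, date)")
after = plan(query)   # index satisfies both the filter and the ordering
```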

Signed-off-by: Stewart Smith 
---
 .../migrations/0027_add_comment_date_index.py | 23 +++
 patchwork/models.py   |  4 
 2 files changed, 27 insertions(+)
 create mode 100644 patchwork/migrations/0027_add_comment_date_index.py

diff --git a/patchwork/migrations/0027_add_comment_date_index.py b/patchwork/migrations/0027_add_comment_date_index.py
new file mode 100644
index ..0a57a9c3b212
--- /dev/null
+++ b/patchwork/migrations/0027_add_comment_date_index.py
@@ -0,0 +1,23 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-09 14:03
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+dependencies = [
+('patchwork', '0026_add_user_bundles_backref'),
+]
+
+operations = [
+migrations.AlterModelOptions(
+name='series',
+options={'verbose_name_plural': 'Series'},
+),
+migrations.AddIndex(
+model_name='comment',
+index=models.Index(fields=['submission', 'date'], name='submission_date_idx'),
+),
+]
diff --git a/patchwork/models.py b/patchwork/models.py
index d2b88fc48c91..d2389cfdad29 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -625,6 +625,10 @@ class Comment(EmailMixin, models.Model):
 class Meta:
 ordering = ['date']
 unique_together = [('msgid', 'submission')]
+indexes = [
+models.Index(name='submission_date_idx',
+ fields=['submission','date'])
+]
 
 
 @python_2_unicode_compatible
-- 
2.17.1



[PATCH] 4x performance improvement for viewing patch with many comments

2018-08-08 Thread Stewart Smith
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com:
in my test dataset (a chunk of a variety of mailing lists), this cover
letter has 67 comments from a variety of people. Thus, it's on the
larger side of things.

Originally, displaying the /patch/550/ for this (redirected to /cover)
would take 81 SQL queries in ~60ms on my laptop.

After this optimisation, it's down to 14 queries in 14ms.

When the cache is cold, it's down to 32ms from 83ms.

The effect of this patch is to execute a join in the database to
get the submitter information for each comment at the same time as
getting all the comments rather than doing a one-by-one lookup after
the fact.

Signed-off-by: Stewart Smith 
---
 patchwork/templates/patchwork/submission.html | 2 +-
 patchwork/views/cover.py  | 5 +
 patchwork/views/patch.py  | 5 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index e817713f7079..2f69735d6925 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id)
 
 
 
-{% for item in submission.comments.all %}
+{% for item in comments %}
 {% if forloop.first %}
 Comments
 {% endif %}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index 73f83cb99b99..edad90bc694d 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -45,6 +45,11 @@ def cover_detail(request, cover_id):
 'project': cover.project,
 }
 
+comments = cover.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+context['comments'] = comments
+
 return render_to_response('patchwork/submission.html', context)
 
 
diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index cbd4ec395d99..bbd84954726e 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -114,6 +114,11 @@ def patch_detail(request, patch_id):
 if is_authenticated(request.user):
 context['bundles'] = request.user.bundles.all()
 
+comments = cover.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+
+context['comments'] = comments
 context['submission'] = patch
 context['patchform'] = form
 context['createbundleform'] = createbundleform
-- 
2.17.1



Re: [RFC PATCH] 3x improvement in patch listing

2018-08-08 Thread Stewart Smith
Daniel Axtens  writes:
> Daniel Axtens  writes:
>
>> Konstantin Ryabitsev  writes:
>>
>>> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
There's two main bits that are really expensive when composing the list
of patches for a project: the query getting the list, and the query
finding the series for each patch.
>>>
>>> Stewart:
>>>
>>> Thanks for working on this! Do you think this would help with pagination 
>>> as well? I find that in semi-abandoned projects like 
>>> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
>>> seconds to load the list view due to 800+ pages of unprocessed patches.  
>>> I am currently considering an automated script that would auto-archive 
>>> patches older than 6 months, but if simply improving pagination times 
>>> would fix the issue, then I wouldn't need to bother.
>>
>> It should do. From a brief look at his patch, it fixes a hot path in
>> __init__.py that goes through the paginator (and which I was *sure* I
>> had looked at, but apparently I had not!) I'll do a quick sanity check
>
> Ah, I have figured out what I had thought I had fixed!
>
> So you're hitting the issue where the db has to fetch the 'diff',
> 'content' and 'headers' out of Submission just to list the patches,
> which is clearly utterly bonkers. This is *despite* the defer() we
> attempt to do - jk reported a Django bug for it: [1].

Yeah, that'd explain it. I was kind of going "wtf" a lot at that, and
then doing something kind of crazy to try and get things down to
something even *remotely* sane.

I was almost tempted to rewrite the page just as raw SQL rather than
using Django at all, at least then it won't randomly regress in the
future. I may be a cranky database guy with a healthy disdain for the
SQL that comes out of ORMs though :)

> I suppose it's worth taking the patch - the Django fix certainly isn't
> going to hit the 1.11 stable tree and we're not supporting 2.0 just
> yet. I do still want to check it properly first, so ...
>
>> on it and hopefully apply it to master and stable/2.1 within the next
>> few days. (but no promises!)
>
> ... this lack-of-a-firm-timeline still applies.

Any random testing / whatever I can do locally to try and help?

This is literally my first time poking at anything Django.

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [RFC PATCH] 3x improvement in patch listing

2018-08-08 Thread Stewart Smith
Konstantin Ryabitsev  writes:
> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>There's two main bits that are really expensive when composing the list
>>of patches for a project: the query getting the list, and the query
>>finding the series for each patch.
>
> Stewart:
>
> Thanks for working on this! Do you think this would help with pagination 
> as well? I find that in semi-abandoned projects like 
> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
> seconds to load the list view due to 800+ pages of unprocessed patches.  
> I am currently considering an automated script that would auto-archive 
> patches older than 6 months, but if simply improving pagination times 
> would fix the issue, then I wouldn't need to bother.

I may be accidentally letting people know I know something about
databases again :)

Out of interest, does the kernel.org instance back onto PostgreSQL or
MySQL? I have a bunch of ideas that would quite dramatically improve
performance on MySQL (properly using primary keys to force disk layout
to be sane, rather than the current screw-locality method that Django
enforces). On PostgreSQL, things are a bit different, but it's possible
an occasional ALTER TABLE CLUSTER BY (or whatever the syntax is) could
help a *lot*.

I'm not sure that archiving the patches would help query performance at
all, but I'd have to check it out a bit. The query that Django generates
is certainly "interesting".

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [RFC PATCH] 3x improvement in patch listing

2018-08-08 Thread Daniel Axtens
Daniel Axtens  writes:

> Konstantin Ryabitsev  writes:
>
>> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>>There's two main bits that are really expensive when composing the list
>>>of patches for a project: the query getting the list, and the query
>>>finding the series for each patch.
>>
>> Stewart:
>>
>> Thanks for working on this! Do you think this would help with pagination 
>> as well? I find that in semi-abandoned projects like 
>> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
>> seconds to load the list view due to 800+ pages of unprocessed patches.  
>> I am currently considering an automated script that would auto-archive 
>> patches older than 6 months, but if simply improving pagination times 
>> would fix the issue, then I wouldn't need to bother.
>
> It should do. From a brief look at his patch, it fixes a hot path in
> __init__.py that goes through the paginator (and which I was *sure* I
> had looked at, but apparently I had not!) I'll do a quick sanity check

Ah, I have figured out what I had thought I had fixed!

So you're hitting the issue where the db has to fetch the 'diff',
'content' and 'headers' out of Submission just to list the patches,
which is clearly utterly bonkers. This is *despite* the defer() we
attempt to do - jk reported a Django bug for it: [1].

It looks like Debian picked up the patch for their release, but it
didn't actually make it into Django 1.11.6 ([1], comment 8)

I attempted a monkey-patch inside Patchwork [2] - the patch was NAKed,
but OzLabs is/was carrying a local fix.

I suppose it's worth taking the patch - the Django fix certainly isn't
going to hit the 1.11 stable tree and we're not supporting 2.0 just
yet. I do still want to check it properly first, so ...

> on it and hopefully apply it to master and stable/2.1 within the next
> few days. (but no promises!)

... this lack-of-a-firm-timeline still applies.

Regards,
Daniel

[1] https://code.djangoproject.com/ticket/28549#ticket
[2] https://patchwork.ozlabs.org/patch/809294/


Re: Database consistency issues

2018-08-08 Thread Konstantin Ryabitsev

On Thu, Aug 09, 2018 at 02:01:11AM +1000, Daniel Axtens wrote:

In the case of Patchwork 1.1.3, I am more confused - the Comment model
has a foreign key to Patch here. So if you try to delete Patch you
should have got a referential integrity error like I did with
e.g. Events. (Patchwork pre 2.0 is a pain to test as it uses Vagrant
VMs, which use VirtualBox by default, which breaks SecureBoot, so I'm
not going to try to get it going tonight.)

Was there anything you deleted after or during the migration?


Daniel:

Hmm... Well, the migration was a bit hairy, so it's possible something
did get missed. We migrated from 1.0, though, not from 1.1.3, so that
may be where the discrepancy crept in. If it helps, I can send the
database schema that we currently have.


Best,
-K




Re: [RFC PATCH] 3x improvement in patch listing

2018-08-08 Thread Daniel Axtens
Konstantin Ryabitsev  writes:

> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>There's two main bits that are really expensive when composing the list
>>of patches for a project: the query getting the list, and the query
>>finding the series for each patch.
>
> Stewart:
>
> Thanks for working on this! Do you think this would help with pagination 
> as well? I find that in semi-abandoned projects like 
> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
> seconds to load the list view due to 800+ pages of unprocessed patches.  
> I am currently considering an automated script that would auto-archive 
> patches older than 6 months, but if simply improving pagination times 
> would fix the issue, then I wouldn't need to bother.

It should do. From a brief look at his patch, it fixes a hot path in
__init__.py that goes through the paginator (and which I was *sure* I
had looked at, but apparently I had not!) I'll do a quick sanity check
on it and hopefully apply it to master and stable/2.1 within the next
few days. (but no promises!)

Regards,
Daniel

>
> Best,
> -K


Re: [PATCH v2 3/3] travis: add mariadb-10.3 as test

2018-08-08 Thread Daniel Axtens
Hi Daniel,

All look good to me (sparse commit messages aside) - I'll do some
testing and hopefully push them this week. Thanks for your contributions
to Patchwork!

Regards,
Daniel

> Signed-off-by: Daniel Black 
> ---
>  .travis.yml | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/.travis.yml b/.travis.yml
> index 0c6e79d..70b42c1 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -26,6 +26,11 @@ env:
>  
>  matrix:
>include:
> +- addons:
> +mariadb: '10.3'
> +  env:
> +- PW_TEST_DB_TYPE=mysql
> +- PW_TEST_DB_USER=root
>  - addons:
>  postgresql: "10"
>  apt:
> -- 
> 2.17.1
>


Re: Database consistency issues

2018-08-08 Thread Daniel Axtens
>
> I will prepare a patch that adds the foreign keys.

Further to this:

Patchwork 2.1 has that Comment has a foreign key relationship to
submission, not patch.

Patch *inherits* from Submission. This was, in hindsight, one of the
cases where an ORM and object-orientation do not sit well
together. Regardless, there's a migration
(0009_add_submission_model.py) that renames the old Patch to
Submission. It creates a new Patch table, which has Submission as a
foreign key.

Now Submission doesn't have a foreign key to Patch, because a
Submission can be either a Patch or a CoverLetter or the body of a
comment. So deleting a Patch in 2.1 won't delete the Submission, so
it won't trigger the Comment foreign key, and you'll be left with
dangling submissions and comments.
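What "dangling" means here can be demonstrated with sqlite3 (a toy two-table model, not the real Patchwork schema): with no foreign key enforced, deleting the parent row leaves its comments orphaned; with an enforced FK and ON DELETE CASCADE, the children are removed along with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE submission (id INTEGER PRIMARY KEY);
    -- No foreign key: nothing stops orphans from forming.
    CREATE TABLE comment_nofk (id INTEGER PRIMARY KEY, submission_id INTEGER);
    -- Enforced FK with cascade: children follow the parent.
    CREATE TABLE comment_fk (
        id INTEGER PRIMARY KEY,
        submission_id INTEGER REFERENCES submission(id) ON DELETE CASCADE
    );
    INSERT INTO submission VALUES (1);
    INSERT INTO comment_nofk VALUES (10, 1);
    INSERT INTO comment_fk VALUES (10, 1);
""")

conn.execute("DELETE FROM submission WHERE id = 1")
orphans = conn.execute("SELECT COUNT(*) FROM comment_nofk").fetchone()[0]
cascaded = conn.execute("SELECT COUNT(*) FROM comment_fk").fetchone()[0]
```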

I'm not immediately sure how best to address this except to continue to
unpick the general existence of Submission which is part of my long term
plan.

[Stephen, do we allow comments on cover letters? Can we just add a
foreign key to Comment referencing Patch? Could we add nullable foreign
keys to Submission referencing Patch/CoverLetter/Comment so as to
enforce some more referential integrity? (a la Event)]

In the case of Patchwork 1.1.3, I am more confused - the Comment model
has a foreign key to Patch here. So if you try to delete Patch you
should have got a referential integrity error like I did with
e.g. Events. (Patchwork pre 2.0 is a pain to test as it uses Vagrant
VMs, which use VirtualBox by default, which breaks SecureBoot, so I'm
not going to try to get it going tonight.)

Was there anything you deleted after or during the migration?


Regards,
Daniel

>
> Thanks for the report!
>
> Regards,
> Daniel


Re: Database consistency issues

2018-08-08 Thread Konstantin Ryabitsev

On Thu, Aug 09, 2018 at 01:16:25AM +1000, Daniel Axtens wrote:

SELECT id, submission_id FROM patchwork_comment WHERE submission_id=9896319;
+----------+---------------+
| id       | submission_id |
+----------+---------------+
| 20810589 |       9896319 |
+----------+---------------+
1 row in set (0.00 sec)

MariaDB [KORG_PATCHWORK]> SELECT * FROM patchwork_submission WHERE
id=9896319;
Empty set (0.00 sec)


Right, well this is definitely the most terrifying start to any email I
have received since starting as one of the patchwork maintainers!


Haha, sorry, I should have prefixed this with "I'm not seeing any data 
loss." :)



I will prepare a patch that adds the foreign keys.


Thanks! I know they are in place for the pgsql side of things, since the 
lore.kernel.org/patchwork instance is using postgresql. So, it was 
either a difference between db implementations, or a side-effect of 
migration.


Best,
-K




Re: [RFC PATCH] 3x improvement in patch listing

2018-08-08 Thread Konstantin Ryabitsev

On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:

There's two main bits that are really expensive when composing the list
of patches for a project: the query getting the list, and the query
finding the series for each patch.


Stewart:

Thanks for working on this! Do you think this would help with pagination 
as well? I find that in semi-abandoned projects like 
https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
seconds to load the list view due to 800+ pages of unprocessed patches.  
I am currently considering an automated script that would auto-archive 
patches older than 6 months, but if simply improving pagination times 
would fix the issue, then I wouldn't need to bother.


Best,
-K




[RFC PATCH] 3x improvement in patch listing

2018-08-08 Thread Stewart Smith
There's two main bits that are really expensive when composing the list
of patches for a project: the query getting the list, and the query
finding the series for each patch.

If we look at the query getting the list, it gets a lot of unnecessary
fields such as 'headers' and 'content', even though we tell Django not
to. It turns out that Django seems to ignore the Submission relationship
and I have no idea how to force it to ignore that thing (defer() doesn't
work), but if we use only() instead, it works okay.

From my import of ~8000 messages for a few projects, my laptop query
time (MySQL, as setup by whatever the docker-compose things do) goes
from:

http://localhost:8000/project/linux-kernel/list/
FROM:
342ms SQL queries cold cache, 268ms warm cache
TO:
118ms SQL queries cold cache, 88ms warm cache

Which is... non trivial to say the least.

The big jump is the patches.only change; the removal of ordering on the
patchseries takes a further 10ms off. For some strange reason, it seems
rather hard to tell Django that you don't care what order the results
come back in for that query (if we do care, the db server has to do a
sort rather than just return each row).
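The cost of those unnecessary large columns is easy to see outside Django; a plain sqlite3 illustration (hypothetical table, modelled loosely on patchwork_submission) of why only() selecting the rendered columns matters for a listing page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submission (id INTEGER PRIMARY KEY, name TEXT,"
             " headers TEXT, content TEXT, diff TEXT)")
big = "x" * 100_000  # stand-in for a large patch body / header blob
conn.execute("INSERT INTO submission VALUES (1, 'patch', ?, ?, ?)",
             (big, big, big))

# "SELECT *" drags every large column back just to list the patches...
full = conn.execute("SELECT * FROM submission").fetchone()
full_bytes = sum(len(str(col)) for col in full)

# ...while selecting only what the list view renders stays tiny.
slim = conn.execute("SELECT id, name FROM submission").fetchone()
slim_bytes = sum(len(str(col)) for col in slim)
```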

Signed-off-by: Stewart Smith 
---
 patchwork/models.py | 1 -
 patchwork/views/__init__.py | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/patchwork/models.py b/patchwork/models.py
index 6268f5b72e3c..d2b88fc48c91 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -747,7 +747,6 @@ class Series(FilenameMixin, models.Model):
 return self.name if self.name else 'Untitled series #%d' % self.id
 
 class Meta:
-ordering = ('date',)
 verbose_name_plural = 'Series'
 
 
diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
index f8d23a388ac7..96fd0798af5a 100644
--- a/patchwork/views/__init__.py
+++ b/patchwork/views/__init__.py
@@ -287,6 +287,8 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
 # rendering the list template
 patches = patches.select_related('state', 'submitter', 'delegate')
 
+patches = patches.only('state','submitter','delegate','project','name','date')
+
 # we also need checks and series
 patches = patches.prefetch_related('check_set', 'series')
 
-- 
2.17.1



Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-08 Thread Stewart Smith
Daniel Axtens  writes:
> FWIW we now have this applied on patchwork.ozlabs.org and it appears to
> be working. Would like some more input as to what an appropriate default
> limit is.
 
 My completely fact-free feeling/opinion is that if it takes more than
 ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not
 fussed.
>>>
>>> OzLabs.org is not a fast machine. 1 second round trip time == 100 per 
>>> page. 500 per page == ~2.3 seconds round trip.
>>>
>>> A single 500 item load is still under half the time of doing 5 * 100 
>>> item loads...
>
> I wonder if we could let authenticated users do slower queries. Maybe
> something for later. Let's split the difference at 250 then, I'd be
> happy to merge that.
>
>> I bet MySQL would execute the query quicker :)
>
> In general (and I have tested this) mysql 5.7 executes patchwork queries
> slower than postgres 9.6 - this is on my laptop on an SSD with a couple
> 100ks of patches across 2 or 3 projects.

Huh, somewhat surprising. I wonder what the query plan and cache layout
look like.

>> Anyway, that sounds really quite slow and we should look at why on earth
>> it's that slow - is there some kind of horrific hidden join or poor
>> indexing or query plan or something?
>
> Yes.
>
> So the 1.0 -> 2.0 transition was not good for performance because it
> turns out databases do not map well to object-oriented structures.* I
> didn't think to check for performance at the time, so we're
> progressively improving that by denormalising stuff. (In fact that was a
> major driver of the 2.1 release!) There is still further to go.

So, looking at the schema, if we were going for the state of a bunch of
patches in a project (which I imagine is  fairly common), the current
patch table is:

CREATE TABLE `patchwork_patch` (
  `submission_ptr_id` int(11) NOT NULL,
  `diff` longtext,
  `commit_ref` varchar(255) DEFAULT NULL,
  `pull_url` varchar(255) DEFAULT NULL,
  `delegate_id` int(11) DEFAULT NULL,
  `state_id` int(11) DEFAULT NULL,
  `archived` tinyint(1) NOT NULL,
  `hash` char(40) DEFAULT NULL,
  `patch_project_id` int(11) NOT NULL,
  PRIMARY KEY (`submission_ptr_id`),
  KEY `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` (`delegate_id`),
  KEY `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` (`state_id`),
  KEY `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` (`patch_project_id`),
  CONSTRAINT `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` FOREIGN KEY (`delegate_id`) REFERENCES `auth_user` (`id`),
  CONSTRAINT `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` FOREIGN KEY (`patch_project_id`) REFERENCES `patchwork_project` (`id`),
  CONSTRAINT `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` FOREIGN KEY (`state_id`) REFERENCES `patchwork_state` (`id`),
  CONSTRAINT `patchwork_patch_submission_ptr_id_03216801_fk_patchwork` FOREIGN KEY (`submission_ptr_id`) REFERENCES `patchwork_submission` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

which puts the submission_ptr_id as the key for the clustered index
(i.e. on disk layout will group by this ID).

What I suspect is occurring is that the on-disk layout of
patchwork_patch (and patchwork_submission) ends up ordered by the time
the patch comes in across *all* projects of the patchwork instance,
which is rarely (if ever) how it's actually queried.

So when I do queries over, say, 'skiboot', I'm pulling into cache
all the submissions/patches that occurred in any project around that
time, rather than only things related to skiboot.

If instead, the primary key was ('patch_project_id','submission_ptr_id')
and there was a UNIQUE KEY ('submission_ptr_id') then this would mean
that we'd be pulling in a database page full of the patches for the
project we're searching on, which I think would be more likely to be
queried in the near future.

The patch_project_id index could be dropped, as all queries on it would
be satisfied by the primary key.
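As a concrete DDL sketch of that suggestion (hypothetical and untested; the MySQL/InnoDB syntax and the live constraint names would need checking, and the FK referencing submission_ptr_id still needs an index, which the UNIQUE KEY provides):

```sql
-- InnoDB clusters rows on the primary key, so leading with the project id
-- groups each project's patches together on disk.
ALTER TABLE patchwork_patch
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (patch_project_id, submission_ptr_id),
    ADD UNIQUE KEY (submission_ptr_id);

-- The separate patch_project_id index then becomes redundant: the primary
-- key already satisfies lookups on its leading column.
ALTER TABLE patchwork_patch
    DROP INDEX patchwork_patch_patch_project_id_aff1ad51_fk_patchwork;
```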

The same goes for patchwork_submission:

CREATE TABLE `patchwork_submission` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `msgid` varchar(255) NOT NULL,
  `name` varchar(255) NOT NULL,
  `date` datetime(6) NOT NULL,
  `headers` longtext NOT NULL,
  `project_id` int(11) NOT NULL,
  `submitter_id` int(11) NOT NULL,
  `content` longtext,
  PRIMARY KEY (`id`),
  UNIQUE KEY `patchwork_patch_msgid_project_id_2e1fe709_uniq` (`msgid`,`project_id`),
  KEY `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` (`project_id`),
  KEY `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` (`submitter_id`),
  CONSTRAINT `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` FOREIGN KEY (`project_id`) REFERENCES `patchwork_project` (`id`),
  CONSTRAINT `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` FOREIGN KEY (`submitter_id`) REFERENCES `patchwork_person` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

simply getting a list of patches for a project here is 

Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-08 Thread Stewart Smith
Andrew Donnellan  writes:
> On 26/07/18 23:24, Daniel Axtens wrote:
>> Andrew Donnellan  writes:
>> 
>>> On 24/07/18 15:10, Andrew Donnellan wrote:
 In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
 size to the default page size in the settings.

 This turns out to be rather restrictive, as we usually want to keep the
 default page size low, but an administrator may want to allow API clients
 to fetch more than that per request.

 Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.

 Closes: #202 ("Separate max API page size and default API page size into 
 different settings")
 Suggested-by: Stewart Smith 
 Suggested-by: Joel Stanley 
 Signed-off-by: Andrew Donnellan 
>>>
>>> FWIW we now have this applied on patchwork.ozlabs.org and it appears to
>>> be working. Would like some more input as to what an appropriate default
>>> limit is.
>> 
>> My completely fact-free feeling/opinion is that if it takes more than
>> ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not
>> fussed.
>
> OzLabs.org is not a fast machine. 1 second round trip time == 100 per 
> page. 500 per page == ~2.3 seconds round trip.
>
> A single 500 item load is still under half the time of doing 5 * 100 
> item loads...

I bet MySQL would execute the query quicker :)

Anyway, that sounds really quite slow and we should look at why on earth
it's that slow - is there some kind of horrific hidden join or poor
indexing or query plan or something?

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-08 Thread Stewart Smith
Andrew Donnellan  writes:
> On 24/07/18 15:10, Andrew Donnellan wrote:
>> In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
>> size to the default page size in the settings.
>> 
>> This turns out to be rather restrictive, as we usually want to keep the
>> default page size low, but an administrator may want to allow API clients
>> to fetch more than that per request.
>> 
>> Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.
>> 
>> Closes: #202 ("Separate max API page size and default API page size into 
>> different settings")
>> Suggested-by: Stewart Smith 
>> Suggested-by: Joel Stanley 
>> Signed-off-by: Andrew Donnellan 
>
> FWIW we now have this applied on patchwork.ozlabs.org and it appears to 
> be working. Would like some more input as to what an appropriate default 
> limit is.

From a *consumer* of the API PoV, 500 at a time rather than 30 is at
least a 6x speedup of my use of it, so that was extremely welcome. I
haven't looked at the size of the responses I'm getting back so have no
idea if 500 is a good one or not (I suspect I'd have to start optimizing
my code around 700-1000 responses/call).

My *guess* is that a fresh SQL query is run for each page retrieved, so
maybe 500 is "good enough" while there isn't some way to just stream
everything and not run the query multiple times.
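That guess matches how LIMIT/OFFSET pagination behaves; a plain-sqlite3 sketch (illustrative, not the actual REST backend) where each page is indeed a fresh query against the database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patch (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO patch VALUES (?)", [(i,) for i in range(1500)])

PER_PAGE = 500

def fetch_page(page_number):
    """Each page is a separate query; OFFSET skips past earlier rows."""
    return conn.execute(
        "SELECT id FROM patch ORDER BY id LIMIT ? OFFSET ?",
        (PER_PAGE, page_number * PER_PAGE),
    ).fetchall()

pages = [fetch_page(n) for n in range(3)]  # three round trips, three queries
ids = [row[0] for page in pages for row in page]
```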

-- 
Stewart Smith
OPAL Architect, IBM.
