[PATCH] Fetch all series for patch/cover viewing
e.g. a 10 comment patch goes from 26 queries in 17-20ms down to 20 queries
in 12ms. A 67 comment cover letter goes from 14 queries in 16ms down to
8 queries in 8ms. So, effectively, a near 2x perf improvement.

Previously, at several points we were asking for the latest series and then
asking for all the series. Since there just usually aren't *that* many
series, fetch them all and take the first one if we need to.

Signed-off-by: Stewart Smith
---
 patchwork/templates/patchwork/submission.html | 10 +-
 patchwork/views/cover.py                      |  2 +-
 patchwork/views/patch.py                      |  1 +
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index 2f69735d6925..3b6f9fbe909e 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -64,15 +64,15 @@ function toggle_div(link_id, headers_id)
-{% if submission.latest_series %}
+{% if submission.all_series %}
 Series
-{% for series in submission.series.all %}
+{% for series in all_series %}
-{% if series == submission.latest_series %}
+{% if forloop.first %}
 {{ series }}
 {% else %}
@@ -93,7 +93,7 @@ function toggle_div(link_id, headers_id)
 >show
-{% with submission.latest_series.cover_letter as cover %}
+{% with all_series.cover_letter as cover %}
 {% if cover %}
 {% if cover == submission %}
@@ -106,7 +106,7 @@ function toggle_div(link_id, headers_id)
 {% endif %}
 {% endwith %}
-{% for sibling in submission.latest_series.patches.all %}
+{% for sibling in all_series.patches.all %}
 {% if sibling == submission %}
 {{ sibling.name|default:"[no subject]"|truncatechars:100 }}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index edad90bc694d..1ee2b3f988fa 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -49,7 +49,7 @@ def cover_detail(request, cover_id):
     comments = comments.select_related('submitter')
     comments = comments.only('submitter','date','id','content','submission')
     context['comments'] = comments
-
+    context['all_series'] = cover.series.all().order_by('-date')
     return render_to_response('patchwork/submission.html', context)

diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index f43fbecd9a4d..e1d0cdcfcf39 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -118,6 +118,7 @@ def patch_detail(request, patch_id):
     comments = comments.select_related('submitter')
     comments = comments.only('submitter','date','id','content','submission')
+    context['all_series'] = patch.series.all().order_by('-date')
     context['comments'] = comments
     context['submission'] = patch
     context['patchform'] = form
-- 
2.17.1

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork
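A minimal sketch (plain Python, no Django required) of the idea in the patch above: instead of asking the database once for the latest series and again for all of them, fetch the ordered list a single time and treat its first element as the latest. The `Store` class is a hypothetical stand-in for the ORM, there purely to count round trips.

```python
class Store:
    """Pretend ORM backend that counts how many queries it runs."""

    def __init__(self, series):
        self._series = series   # newest-first, like order_by('-date')
        self.queries = 0

    def all_series(self):
        self.queries += 1
        return list(self._series)

    def latest_series(self):
        self.queries += 1
        return self._series[0]


def render_before(store):
    # Old pattern: one query for the latest series, another for the list.
    latest = store.latest_series()
    return latest, store.all_series()


def render_after(store):
    # New pattern: one query; the first row *is* the latest.
    all_series = store.all_series()
    return all_series[0], all_series


before = Store(['v3', 'v2', 'v1'])
render_before(before)
after = Store(['v3', 'v2', 'v1'])
render_after(after)
print(before.queries, after.queries)  # 2 vs 1 per page view
```

With several such call sites on one page, collapsing each pair of queries into one is where the reported "near 2x" reduction comes from.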
[PATCH v2] 4x performance improvement for viewing patch with many comments
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com
with my test dataset (a chunk of a variety of mailing lists), this cover
letter has 67 comments from a variety of people. Thus, it's on the larger
side of things.

Originally, displaying the /patch/550/ for this (redirected to /cover)
would take 81 SQL queries in ~60ms on my laptop. After this optimisation,
it's down to 14 queries in 14ms. When the cache is cold, it's down to 32ms
from 83ms.

The effect of this patch is to execute a join in the database to get the
submitter information for each comment at the same time as getting all the
comments, rather than doing a one-by-one lookup after the fact.

Signed-off-by: Stewart Smith
---
 patchwork/templates/patchwork/submission.html | 2 +-
 patchwork/views/cover.py                      | 5 +++++
 patchwork/views/patch.py                      | 5 +++++
 3 files changed, 11 insertions(+), 1 deletion(-)

---
Changes since v1: don't fat finger patch viewing (s/cover/patch/)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index e817713f7079..2f69735d6925 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id)
-{% for item in submission.comments.all %}
+{% for item in comments %}
 {% if forloop.first %}
 Comments
 {% endif %}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index 73f83cb99b99..edad90bc694d 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -45,6 +45,11 @@ def cover_detail(request, cover_id):
         'project': cover.project,
     }
 
+    comments = cover.comments.all()
+    comments = comments.select_related('submitter')
+    comments = comments.only('submitter','date','id','content','submission')
+    context['comments'] = comments
+
     return render_to_response('patchwork/submission.html', context)

diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index cbd4ec395d99..f43fbecd9a4d 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -114,6 +114,11 @@ def patch_detail(request, patch_id):
     if is_authenticated(request.user):
         context['bundles'] = request.user.bundles.all()
 
+    comments = patch.comments.all()
+    comments = comments.select_related('submitter')
+    comments = comments.only('submitter','date','id','content','submission')
+
+    context['comments'] = comments
     context['submission'] = patch
     context['patchform'] = form
     context['createbundleform'] = createbundleform
-- 
2.17.1
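The join-versus-N+1 effect the patch above describes can be demonstrated with nothing but stdlib sqlite3 (the table and column names here are invented for illustration, not Patchwork's real schema): fetching each comment's submitter one at a time costs one query per comment, while a JOIN — which is what Django's `select_related()` produces — brings the submitter along in the single comments query.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, submitter_id INTEGER,
                          content TEXT);
    INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comment VALUES (1, 1, 'LGTM'), (2, 2, 'NAK'), (3, 1, 'ping');
""")


def n_plus_one(db):
    # One query for the comments, then one more per comment for its submitter.
    queries = 1
    rows = db.execute(
        'SELECT id, submitter_id, content FROM comment ORDER BY id').fetchall()
    out = []
    for cid, sid, content in rows:
        queries += 1
        (name,) = db.execute(
            'SELECT name FROM person WHERE id = ?', (sid,)).fetchone()
        out.append((cid, name, content))
    return out, queries


def joined(db):
    # One query total: the join fetches the submitter with each comment.
    rows = db.execute("""
        SELECT c.id, p.name, c.content
        FROM comment c JOIN person p ON p.id = c.submitter_id
        ORDER BY c.id
    """).fetchall()
    return rows, 1


slow, q_slow = n_plus_one(db)
fast, q_fast = joined(db)
print(q_slow, q_fast)  # 4 vs 1 for three comments
```

With 67 comments the one-by-one pattern costs 68 queries where the join costs one, which matches the shape of the 81-to-14 query drop reported in the commit message.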
[PATCH] Add index for patchwork_comment (submission_id,date)
This (at least theoretically) should speed up displaying comments on
patches/cover letters. It's an index that will return rows in-order for
the query that we always do ("give me the comments on this submission in
date order"), rather than having to have the database server do a sort
for us.

I haven't been able to benchmark something locally that shows this is an
actual improvement, but I don't have as large a data set as various
production instances. The query plan does look a bit nicer though.
Although the benefit of index maintenance versus how long it takes to
sort things is a good question.

Signed-off-by: Stewart Smith
---
 .../migrations/0027_add_comment_date_index.py | 23 +++++++++++++++++++
 patchwork/models.py                           |  4 ++++
 2 files changed, 27 insertions(+)
 create mode 100644 patchwork/migrations/0027_add_comment_date_index.py

diff --git a/patchwork/migrations/0027_add_comment_date_index.py b/patchwork/migrations/0027_add_comment_date_index.py
new file mode 100644
index ..0a57a9c3b212
--- /dev/null
+++ b/patchwork/migrations/0027_add_comment_date_index.py
@@ -0,0 +1,23 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-09 14:03
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('patchwork', '0026_add_user_bundles_backref'),
+    ]
+
+    operations = [
+        migrations.AlterModelOptions(
+            name='series',
+            options={'verbose_name_plural': 'Series'},
+        ),
+        migrations.AddIndex(
+            model_name='comment',
+            index=models.Index(fields=['submission', 'date'], name='submission_date_idx'),
+        ),
+    ]
diff --git a/patchwork/models.py b/patchwork/models.py
index d2b88fc48c91..d2389cfdad29 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -625,6 +625,10 @@ class Comment(EmailMixin, models.Model):
     class Meta:
         ordering = ['date']
         unique_together = [('msgid', 'submission')]
+        indexes = [
+            models.Index(name='submission_date_idx',
+                         fields=['submission', 'date'])
+        ]
 
 @python_2_unicode_compatible
-- 
2.17.1
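The "index returns rows in-order, so no sort step" claim above can be checked with stdlib sqlite3. SQLite's plan text differs from MySQL/PostgreSQL, but the principle is the same: without the composite index, an ORDER BY needs a temporary sort ("USE TEMP B-TREE FOR ORDER BY"); with `(submission_id, date)`, rows for one submission come back already in date order. The schema below is a simplified stand-in, not Patchwork's real one.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE comment (id INTEGER PRIMARY KEY, '
           'submission_id INTEGER, date TEXT)')

QUERY = 'SELECT id FROM comment WHERE submission_id = 1 ORDER BY date'


def plan(db):
    # EXPLAIN QUERY PLAN rows carry the human-readable detail last.
    rows = db.execute('EXPLAIN QUERY PLAN ' + QUERY).fetchall()
    return ' '.join(r[-1] for r in rows)


without_index = plan(db)
db.execute('CREATE INDEX submission_date_idx '
           'ON comment (submission_id, date)')
with_index = plan(db)

print(without_index)
print(with_index)
```

Whether maintaining the index on every insert is worth skipping the sort on every read is exactly the trade-off the commit message flags; for a read-heavy archive it usually is.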
[PATCH] 4x performance improvement for viewing patch with many comments
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com
with my test dataset (a chunk of a variety of mailing lists), this cover
letter has 67 comments from a variety of people. Thus, it's on the larger
side of things.

Originally, displaying the /patch/550/ for this (redirected to /cover)
would take 81 SQL queries in ~60ms on my laptop. After this optimisation,
it's down to 14 queries in 14ms. When the cache is cold, it's down to 32ms
from 83ms.

The effect of this patch is to execute a join in the database to get the
submitter information for each comment at the same time as getting all the
comments, rather than doing a one-by-one lookup after the fact.

Signed-off-by: Stewart Smith
---
 patchwork/templates/patchwork/submission.html | 2 +-
 patchwork/views/cover.py                      | 5 +++++
 patchwork/views/patch.py                      | 5 +++++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index e817713f7079..2f69735d6925 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id)
-{% for item in submission.comments.all %}
+{% for item in comments %}
 {% if forloop.first %}
 Comments
 {% endif %}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index 73f83cb99b99..edad90bc694d 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -45,6 +45,11 @@ def cover_detail(request, cover_id):
         'project': cover.project,
     }
 
+    comments = cover.comments.all()
+    comments = comments.select_related('submitter')
+    comments = comments.only('submitter','date','id','content','submission')
+    context['comments'] = comments
+
     return render_to_response('patchwork/submission.html', context)

diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index cbd4ec395d99..bbd84954726e 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -114,6 +114,11 @@ def patch_detail(request, patch_id):
     if is_authenticated(request.user):
         context['bundles'] = request.user.bundles.all()
 
+    comments = cover.comments.all()
+    comments = comments.select_related('submitter')
+    comments = comments.only('submitter','date','id','content','submission')
+
+    context['comments'] = comments
     context['submission'] = patch
     context['patchform'] = form
     context['createbundleform'] = createbundleform
-- 
2.17.1
Re: [RFC PATCH] 3x improvement in patch listing
Daniel Axtens writes:
> Daniel Axtens writes:
>
>> Konstantin Ryabitsev writes:
>>
>>> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>>> There's two main bits that are really expensive when composing the
>>>> list of patches for a project: the query getting the list, and the
>>>> query finding the series for each patch.
>>>
>>> Stewart:
>>>
>>> Thanks for working on this! Do you think this would help with
>>> pagination as well? I find that in semi-abandoned projects like
>>> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few
>>> seconds to load the list view due to 800+ pages of unprocessed
>>> patches. I am currently considering an automated script that would
>>> auto-archive patches older than 6 months, but if simply improving
>>> pagination times would fix the issue, then I wouldn't need to bother.
>>
>> It should do. From a brief look at his patch, it fixes a hot path in
>> __init__.py that goes through the paginator (and which I was *sure* I
>> had looked at, but apparently I had not!) I'll do a quick sanity check
>
> Ah, I have figured out what I had thought I had fixed!
>
> So you're hitting the issue where the db has to fetch the 'diff',
> 'content' and 'headers' out of Submission just to list the patches,
> which is clearly utterly bonkers. This is *despite* the defer() we
> attempt to do - jk reported a Django bug for it: [1].

Yeah, that'd explain it. I was kind of going "wtf" a lot at that, and
then doing something kind of crazy to try and get things down to
something even *remotely* sane.

I was almost tempted to rewrite the page just as raw SQL rather than
using Django at all, at least then it won't randomly regress in the
future. I may be a cranky database guy with a healthy disdain for the
SQL that comes out of ORMs though :)

> I suppose it's worth taking the patch - the Django fix certainly isn't
> going to hit the 1.11 stable tree and we're not supporting 2.0 just
> yet. I do still want to check it properly first, so ...
>
>> on it and hopefully apply it to master and stable/2.1 within the next
>> few days. (but no promises!)
>
> ... this lack-of-a-firm-timeline still applies.

Any random testing / whatever I can do locally to try and help? This is
literally my first time poking at anything Django.

-- 
Stewart Smith
OPAL Architect, IBM.
Re: [RFC PATCH] 3x improvement in patch listing
Konstantin Ryabitsev writes:
> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>> There's two main bits that are really expensive when composing the
>> list of patches for a project: the query getting the list, and the
>> query finding the series for each patch.
>
> Stewart:
>
> Thanks for working on this! Do you think this would help with
> pagination as well? I find that in semi-abandoned projects like
> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few
> seconds to load the list view due to 800+ pages of unprocessed patches.
> I am currently considering an automated script that would auto-archive
> patches older than 6 months, but if simply improving pagination times
> would fix the issue, then I wouldn't need to bother.

I may be accidentally letting people know I know something about
databases again :)

Out of interest, does the kernel.org instance back onto PostgreSQL or
MySQL? I have a bunch of ideas that would quite dramatically improve
performance on MySQL (properly using primary keys to force disk layout
to be sane, rather than the current screw-locality method that Django
enforces). On PostgreSQL, things are a bit different, but it's possible
an occasional ALTER TABLE CLUSTER BY (or whatever the syntax is) could
help a *lot*.

I'm not sure that archiving the patches would help query performance at
all, but I'd have to check it out a bit. The query that Django generates
is certainly "interesting".

-- 
Stewart Smith
OPAL Architect, IBM.
Re: [RFC PATCH] 3x improvement in patch listing
Daniel Axtens writes:
> Konstantin Ryabitsev writes:
>
>> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>> There's two main bits that are really expensive when composing the
>>> list of patches for a project: the query getting the list, and the
>>> query finding the series for each patch.
>>
>> Stewart:
>>
>> Thanks for working on this! Do you think this would help with
>> pagination as well? I find that in semi-abandoned projects like
>> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few
>> seconds to load the list view due to 800+ pages of unprocessed
>> patches. I am currently considering an automated script that would
>> auto-archive patches older than 6 months, but if simply improving
>> pagination times would fix the issue, then I wouldn't need to bother.
>
> It should do. From a brief look at his patch, it fixes a hot path in
> __init__.py that goes through the paginator (and which I was *sure* I
> had looked at, but apparently I had not!) I'll do a quick sanity check

Ah, I have figured out what I had thought I had fixed!

So you're hitting the issue where the db has to fetch the 'diff',
'content' and 'headers' out of Submission just to list the patches,
which is clearly utterly bonkers. This is *despite* the defer() we
attempt to do - jk reported a Django bug for it: [1].

It looks like Debian picked up the patch for their release, but it
didn't actually make it into Django 1.11.6 ([1], comment 8)

I attempted a monkey-patch inside Patchwork [2] - the patch was NAKed,
but OzLabs is/was carrying a local fix.

I suppose it's worth taking the patch - the Django fix certainly isn't
going to hit the 1.11 stable tree and we're not supporting 2.0 just yet.
I do still want to check it properly first, so ...

> on it and hopefully apply it to master and stable/2.1 within the next
> few days. (but no promises!)

... this lack-of-a-firm-timeline still applies.

Regards,
Daniel

[1] https://code.djangoproject.com/ticket/28549#ticket
[2] https://patchwork.ozlabs.org/patch/809294/
Re: Database consistency issues
On Thu, Aug 09, 2018 at 02:01:11AM +1000, Daniel Axtens wrote:
> In the case of Patchwork 1.1.3, I am more confused - the Comment model
> has a foreign key to Patch here. So if you try to delete Patch you
> should have got a referential integrity error like I did with e.g.
> Events. (Patchwork pre 2.0 is a pain to test as it uses Vagrant VMs,
> which use VirtualBox by default, which breaks SecureBoot, so I'm not
> going to try to get it going tonight.)
>
> Was there anything you deleted after or during the migration?

Daniel:

Hmm... Well, the migration was a bit hairy, so it's possible something
did get missed. We migrated from 1.0, though, not from 1.1.3, so it's
possible something got missed?

If it helps, I can send the database schema that we currently have.

Best,
-K
Re: [RFC PATCH] 3x improvement in patch listing
Konstantin Ryabitsev writes:
> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>> There's two main bits that are really expensive when composing the
>> list of patches for a project: the query getting the list, and the
>> query finding the series for each patch.
>
> Stewart:
>
> Thanks for working on this! Do you think this would help with
> pagination as well? I find that in semi-abandoned projects like
> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few
> seconds to load the list view due to 800+ pages of unprocessed patches.
> I am currently considering an automated script that would auto-archive
> patches older than 6 months, but if simply improving pagination times
> would fix the issue, then I wouldn't need to bother.

It should do. From a brief look at his patch, it fixes a hot path in
__init__.py that goes through the paginator (and which I was *sure* I
had looked at, but apparently I had not!) I'll do a quick sanity check
on it and hopefully apply it to master and stable/2.1 within the next
few days. (but no promises!)

Regards,
Daniel

> Best,
> -K
Re: [PATCH v2 3/3] travis: add mariadb-10.3 as test
Hi Daniel,

All look good to me (sparse commit messages aside) - I'll do some
testing and hopefully push them this week.

Thanks for your contributions to Patchwork!

Regards,
Daniel

> Signed-off-by: Daniel Black
> ---
>  .travis.yml | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/.travis.yml b/.travis.yml
> index 0c6e79d..70b42c1 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -26,6 +26,11 @@ env:
>
>  matrix:
>    include:
> +    - addons:
> +        mariadb: '10.3'
> +      env:
> +        - PW_TEST_DB_TYPE=mysql
> +        - PW_TEST_DB_USER=root
>      - addons:
>          postgresql: "10"
>          apt:
> -- 
> 2.17.1
Re: Database consistency issues
>> I will prepare a patch that adds the foreign keys.

Further to this:

In Patchwork 2.1, Comment has a foreign key relationship to Submission,
not Patch. Patch *inherits* from Submission. This was, in hindsight, one
of the cases where an ORM and object-orientation do not sit well
together.

Regardless, there's a migration (0009_add_submission_model.py) that
renames the old Patch to Submission. It creates a new Patch table, which
has Submission as a foreign key.

Now Submission doesn't have a foreign key to Patch, because a Submission
can be either a Patch or a CoverLetter or the body of a comment. So
deleting a Patch in 2.1 won't delete the Submission, so it won't trigger
the Comment foreign key, and you'll be left with dangling submissions
and comments.

I'm not immediately sure how best to address this except to continue to
unpick the general existence of Submission, which is part of my long
term plan.

[Stephen, do we allow comments on cover letters? Can we just add a
foreign key to Comment referencing Patch? Could we add nullable foreign
keys to Submission referencing Patch/CoverLetter/Comment so as to
enforce some more referential integrity? (a la Event)]

In the case of Patchwork 1.1.3, I am more confused - the Comment model
has a foreign key to Patch here. So if you try to delete Patch you
should have got a referential integrity error like I did with e.g.
Events. (Patchwork pre 2.0 is a pain to test as it uses Vagrant VMs,
which use VirtualBox by default, which breaks SecureBoot, so I'm not
going to try to get it going tonight.)

Was there anything you deleted after or during the migration?

Regards,
Daniel

> Thanks for the report!
>
> Regards,
> Daniel
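The dangling-row failure mode discussed in this thread can be reproduced in miniature with stdlib sqlite3 (the schema below is a toy, not Patchwork's): without a foreign key, deleting the parent submission silently strands its comment; with the constraint enforced, the delete is rejected with an integrity error.

```python
import sqlite3


def make_db(with_fk):
    """Build a tiny submission/comment schema, with or without the FK."""
    db = sqlite3.connect(':memory:')
    db.execute('PRAGMA foreign_keys = ON')   # SQLite needs this per-connection
    fk = 'REFERENCES submission(id)' if with_fk else ''
    db.execute('CREATE TABLE submission (id INTEGER PRIMARY KEY)')
    db.execute('CREATE TABLE comment (id INTEGER PRIMARY KEY, '
               f'submission_id INTEGER {fk})')
    # The ids from the report in this thread, purely as sample data.
    db.execute('INSERT INTO submission VALUES (9896319)')
    db.execute('INSERT INTO comment VALUES (20810589, 9896319)')
    return db


# Without the constraint: the delete succeeds and the comment now dangles.
db = make_db(with_fk=False)
db.execute('DELETE FROM submission WHERE id = 9896319')
dangling = db.execute('SELECT COUNT(*) FROM comment').fetchone()[0]

# With the constraint: the delete raises instead of leaving orphans.
db = make_db(with_fk=True)
try:
    db.execute('DELETE FROM submission WHERE id = 9896319')
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

print(dangling, blocked)  # 1 True
```

An ON DELETE CASCADE clause on the foreign key is the other common choice: the delete then succeeds but removes the comments too, which is usually what an archive wants.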
Re: Database consistency issues
On Thu, Aug 09, 2018 at 01:16:25AM +1000, Daniel Axtens wrote:
>> SELECT id, submission_id FROM patchwork_comment WHERE submission_id=9896319;
>> +----------+---------------+
>> | id       | submission_id |
>> +----------+---------------+
>> | 20810589 |       9896319 |
>> +----------+---------------+
>> 1 row in set (0.00 sec)
>>
>> MariaDB [KORG_PATCHWORK]> SELECT * FROM patchwork_submission WHERE id=9896319;
>> Empty set (0.00 sec)
>
> Right, well this is definitely the most terrifying start to any email I
> have received since starting as one of the patchwork maintainers!

Haha, sorry, I should have prefixed this with "I'm not seeing any data
loss." :)

> I will prepare a patch that adds the foreign keys.

Thanks! I know they are in place for the pgsql side of things, since the
lore.kernel.org/patchwork instance is using postgresql. So, it was
either a difference between db implementations, or a side-effect of
migration.

Best,
-K
Re: [RFC PATCH] 3x improvement in patch listing
On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
> There's two main bits that are really expensive when composing the list
> of patches for a project: the query getting the list, and the query
> finding the series for each patch.

Stewart:

Thanks for working on this! Do you think this would help with pagination
as well? I find that in semi-abandoned projects like
https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few
seconds to load the list view due to 800+ pages of unprocessed patches.
I am currently considering an automated script that would auto-archive
patches older than 6 months, but if simply improving pagination times
would fix the issue, then I wouldn't need to bother.

Best,
-K
[RFC PATCH] 3x improvement in patch listing
There's two main bits that are really expensive when composing the list
of patches for a project: the query getting the list, and the query
finding the series for each patch.

If we look at the query getting the list, it gets a lot of unnecessary
fields such as 'headers' and 'content', even though we tell Django not
to. It turns out that Django seems to ignore the Submission relationship
and I have no idea how to force it to ignore that thing (defer doesn't
work), but if we go only, then it works okay.

From my import of ~8000 messages for a few projects, my laptop query
time (MySQL, as set up by whatever the docker-compose things do) goes
from:

http://localhost:8000/project/linux-kernel/list/
FROM: 342ms SQL queries cold cache, 268ms warm cache
TO:   118ms SQL queries cold cache, 88ms warm cache

Which is... non trivial to say the least.

The big jump is the patches.only change, and the removal of ordering on
the patchseries takes a further 10ms off. For some strange reason, it
seems rather hard to tell Django that you don't care what order the
results come back in for that query (if we do, then the db server has to
do a sort rather than just return each row).

Signed-off-by: Stewart Smith
---
 patchwork/models.py         | 1 -
 patchwork/views/__init__.py | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/patchwork/models.py b/patchwork/models.py
index 6268f5b72e3c..d2b88fc48c91 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -747,7 +747,6 @@ class Series(FilenameMixin, models.Model):
         return self.name if self.name else 'Untitled series #%d' % self.id
 
     class Meta:
-        ordering = ('date',)
         verbose_name_plural = 'Series'
 
diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
index f8d23a388ac7..96fd0798af5a 100644
--- a/patchwork/views/__init__.py
+++ b/patchwork/views/__init__.py
@@ -287,6 +287,8 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
     # rendering the list template
     patches = patches.select_related('state', 'submitter', 'delegate')
 
+    patches = patches.only('state','submitter','delegate','project','name','date')
+
     # we also need checks and series
     patches = patches.prefetch_related('check_set', 'series')
-- 
2.17.1
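What the `patches.only(...)` change amounts to, sketched with stdlib sqlite3 (column names borrowed from the discussion, not the real schema): the list page never renders the wide 'diff'/'content'/'headers' fields, so naming just the needed columns keeps those large values out of every row the database has to read and transfer.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE submission (id INTEGER PRIMARY KEY, name TEXT, '
           'date TEXT, headers TEXT, content TEXT, diff TEXT)')
db.execute("INSERT INTO submission VALUES "
           "(1, 'p1', '2018-08-08', 'Subject: p1', 'long body...', 'big diff...')")

# What the pre-patch query effectively did: drag every column along.
wide_cur = db.execute('SELECT * FROM submission')
wide = [d[0] for d in wide_cur.description]

# What only('name', 'date', ...) generates: just the columns the page uses.
narrow_cur = db.execute('SELECT id, name, date FROM submission')
narrow = [d[0] for d in narrow_cur.description]

print(len(wide), len(narrow))  # 6 vs 3 columns per row
```

Multiplied over hundreds of rows per list page, skipping the three longtext columns is where most of the 342ms-to-118ms drop plausibly comes from.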
Re: [RFC PATCH] REST: Add new setting for maximum API page size
Daniel Axtens writes:
> FWIW we now have this applied on patchwork.ozlabs.org and it appears to
> be working. Would like some more input as to what an appropriate
> default limit is.

My completely fact-free feeling/opinion is that if it takes more than
~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not
fussed.

>>> OzLabs.org is not a fast machine. 1 second round trip time == 100 per
>>> page. 500 per page == ~2.3 seconds round trip.
>>>
>>> A single 500 item load is still under half the time of doing 5 * 100
>>> item loads...
>
> I wonder if we could let authenticated users do slower queries. Maybe
> something for later. Let's split the difference at 250 then, I'd be
> happy to merge that.
>
>> I bet MySQL would execute the query quicker :)
>
> In general (and I have tested this) mysql 5.7 executes patchwork
> queries slower than postgres 9.6 - this is on my laptop on an SSD with
> a couple 100ks of patches across 2 or 3 projects.

Huh, somewhat surprising. I wonder what the query plan and cache layout
looks like.

>> Anyway, that sounds really quite slow and we should look at why on
>> earth it's that slow - is there some kind of horrific hidden join or
>> poor indexing or query plan or something?
>
> Yes.
>
> So the 1.0 -> 2.0 transition was not good for performance because it
> turns out databases do not map well to object-oriented structures.* I
> didn't think to check for performance at the time, so we're
> progressively improving that by denormalising stuff. (In fact that was
> a major driver of the 2.1 release!) There is still further to go.

So, looking at the schema, if we were going for the state of a bunch of
patches in a project (which I imagine is fairly common), the current
patch table is:

CREATE TABLE `patchwork_patch` (
  `submission_ptr_id` int(11) NOT NULL,
  `diff` longtext,
  `commit_ref` varchar(255) DEFAULT NULL,
  `pull_url` varchar(255) DEFAULT NULL,
  `delegate_id` int(11) DEFAULT NULL,
  `state_id` int(11) DEFAULT NULL,
  `archived` tinyint(1) NOT NULL,
  `hash` char(40) DEFAULT NULL,
  `patch_project_id` int(11) NOT NULL,
  PRIMARY KEY (`submission_ptr_id`),
  KEY `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` (`delegate_id`),
  KEY `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` (`state_id`),
  KEY `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` (`patch_project_id`),
  CONSTRAINT `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` FOREIGN KEY (`delegate_id`) REFERENCES `auth_user` (`id`),
  CONSTRAINT `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` FOREIGN KEY (`patch_project_id`) REFERENCES `patchwork_project` (`id`),
  CONSTRAINT `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` FOREIGN KEY (`state_id`) REFERENCES `patchwork_state` (`id`),
  CONSTRAINT `patchwork_patch_submission_ptr_id_03216801_fk_patchwork` FOREIGN KEY (`submission_ptr_id`) REFERENCES `patchwork_submission` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

which puts the submission_ptr_id as the key for the clustered index
(i.e. on disk layout will group by this ID). What I suspect is occurring
is that we end up having the on disk layout of patchwork_patch (and
patchwork_submission) be ordered by the time the patch comes in across
*all* projects of the patchwork instance, which is typically (never?)
what is actually queried. So that when I do queries over, say,
'skiboot', I'm pulling into cache all the submissions/patches that
occurred to any project around that time, rather than only things
related to skiboot.

If instead, the primary key was ('patch_project_id', 'submission_ptr_id')
and there was a UNIQUE KEY ('submission_ptr_id'), then this would mean
that we'd be pulling in a database page full of the patches for the
project we're searching on, which I think would be more likely to be
queried in the near future. The patch_project_id index could be dropped,
as all queries on it would be satisfied by the primary key.

The same goes for patchwork_submission:

CREATE TABLE `patchwork_submission` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `msgid` varchar(255) NOT NULL,
  `name` varchar(255) NOT NULL,
  `date` datetime(6) NOT NULL,
  `headers` longtext NOT NULL,
  `project_id` int(11) NOT NULL,
  `submitter_id` int(11) NOT NULL,
  `content` longtext,
  PRIMARY KEY (`id`),
  UNIQUE KEY `patchwork_patch_msgid_project_id_2e1fe709_uniq` (`msgid`,`project_id`),
  KEY `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` (`project_id`),
  KEY `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` (`submitter_id`),
  CONSTRAINT `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` FOREIGN KEY (`project_id`) REFERENCES `patchwork_project` (`id`),
  CONSTRAINT `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` FOREIGN KEY (`submitter_id`) REFERENCES `patchwork_person` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

simply getting a list of patches for a project here is
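The clustering idea in the message above can be sketched with stdlib sqlite3, whose WITHOUT ROWID tables are clustered on their primary key much like InnoDB tables are: with PRIMARY KEY (project_id, id), one project's patches sit in one contiguous key range, so a per-project listing is a single range search rather than a walk over rows interleaved by arrival time. The schema here is a toy analogue of the MySQL one quoted above.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute("""
    CREATE TABLE patch (
        project_id INTEGER NOT NULL,
        id INTEGER NOT NULL,
        name TEXT,
        PRIMARY KEY (project_id, id)   -- table is clustered on this key
    ) WITHOUT ROWID
""")

# Patches from two projects arrive interleaved in time...
for i in range(100):
    db.execute('INSERT INTO patch VALUES (?, ?, ?)',
               (i % 2, i, 'patch %d' % i))

# ...but one project's listing resolves via the primary key: a single
# contiguous region of the table, not a full scan plus filtering.
plan_rows = db.execute(
    'EXPLAIN QUERY PLAN SELECT id, name FROM patch WHERE project_id = 1'
).fetchall()
plan = plan_rows[0][-1]
print(plan)
```

The same trade-off Stewart notes still applies: the composite key helps locality for per-project reads, while a plain auto-increment key favours insert locality across the whole instance.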
Re: [RFC PATCH] REST: Add new setting for maximum API page size
Andrew Donnellan writes:
> On 26/07/18 23:24, Daniel Axtens wrote:
>> Andrew Donnellan writes:
>>
>>> On 24/07/18 15:10, Andrew Donnellan wrote:
>>>> In 41790caf59ad ("REST: Limit max page size") we limited the maximum
>>>> page size to the default page size in the settings.
>>>>
>>>> This turns out to be rather restrictive, as we usually want to keep
>>>> the default page size low, but an administrator may want to allow
>>>> API clients to fetch more than that per request.
>>>>
>>>> Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum
>>>> page size.
>>>>
>>>> Closes: #202 ("Separate max API page size and default API page size
>>>> into different settings")
>>>> Suggested-by: Stewart Smith
>>>> Suggested-by: Joel Stanley
>>>> Signed-off-by: Andrew Donnellan
>>>
>>> FWIW we now have this applied on patchwork.ozlabs.org and it appears
>>> to be working. Would like some more input as to what an appropriate
>>> default limit is.
>>
>> My completely fact-free feeling/opinion is that if it takes more than
>> ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm
>> not fussed.
>
> OzLabs.org is not a fast machine. 1 second round trip time == 100 per
> page. 500 per page == ~2.3 seconds round trip.
>
> A single 500 item load is still under half the time of doing 5 * 100
> item loads...

I bet MySQL would execute the query quicker :)

Anyway, that sounds really quite slow and we should look at why on earth
it's that slow - is there some kind of horrific hidden join or poor
indexing or query plan or something?

-- 
Stewart Smith
OPAL Architect, IBM.
Re: [RFC PATCH] REST: Add new setting for maximum API page size
Andrew Donnellan writes: > On 24/07/18 15:10, Andrew Donnellan wrote: >> In 41790caf59ad ("REST: Limit max page size") we limited the maximum page >> size to the default page size in the settings. >> >> This turns out to be rather restrictive, as we usually want to keep the >> default page size low, but an administrator may want to allow API clients >> to fetch more than that per request. >> >> Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size. >> >> Closes: #202 ("Separate max API page size and default API page size into >> different settings") >> Suggested-by: Stewart Smith >> Suggested-by: Joel Stanley >> Signed-off-by: Andrew Donnellan > > FWIW we now have this applied on patchwork.ozlabs.org and it appears to > be working. Would like some more input as to what an appropriate default > limit is. >From a *consumer* of the API PoV, 500 at a time rather than 30 is at least a 6x speedup of my use of it, so that was extremely welcome. I haven't looked at the size of the responses I'm getting back so have no idea if 500 is a good one or not (I suspect I'd have to start optimizing my code around 700-1000 responses/call). My *guess* is that a fresh SQL query is run for each page retrieved, so maybe 500 is "good enough" while there isn't some way to just stream everything and not run the query multiple times. -- Stewart Smith OPAL Architect, IBM. ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
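A quick back-of-envelope check (plain Python) of the page-size arithmetic in this thread: the number of API round trips to drain a result set is ceil(total / page_size), so raising the page size from 30 to 500 cuts the request count by a factor approaching 500/30 ≈ 16.7 as the result set grows, comfortably covering the "at least 6x" Stewart reports.

```python
from math import ceil


def round_trips(total_items, page_size):
    # One fresh query / HTTP round trip per page retrieved.
    return ceil(total_items / page_size)


for total in (100, 1000, 10000):
    small = round_trips(total, 30)    # old default page size
    big = round_trips(total, 500)     # raised maximum
    print(total, small, big, round(small / big, 1))
```

The per-request cost also grows with page size (Andrew's ~2.3s for 500 items vs ~1s for 100), which is why the thread lands on a middle value like 250 rather than simply raising the limit as high as it will go.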