Re: [RFC PATCH] REST: Add new setting for maximum API page size
Andrew Donnellan writes: > On 07/08/18 17:20, Daniel Axtens wrote: >> I wonder if we could let authenticated users do slower queries. Maybe >> something for later. Let's split the difference at 250 then, I'd be >> happy to merge that. > > If you want to take this patch and s/500/250/g during merge then I'd be > happy with that. Applied based on my trusting that OzLabs testing has been sufficent. Regards, Daniel > > -- > Andrew Donnellan OzLabs, ADL Canberra > andrew.donnel...@au1.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [RFC PATCH] REST: Add new setting for maximum API page size
Daniel Axtens writes: > FWIW we now have this applied on patchwork.ozlabs.org and it appears to > be working. Would like some more input as to what an appropriate default > limit is. My completely fact-free feeling/opinion is that if it takes more than ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not fussed. >>> >>> OzLabs.org is not a fast machine. 1 second round trip time == 100 per >>> page. 500 per page == ~2.3 seconds round trip. >>> >>> A single 500 item load is still under half the time of doing 5 * 100 >>> item loads... > > I wonder if we could let authenticated users do slower queries. Maybe > something for later. Let's split the difference at 250 then, I'd be > happy to merge that. > >> I bet MySQL would execute the query quicker :) > > In general (and I have tested this) mysql 5.7 executes patchwork queries > slower than postgres 9.6 - this is on my laptop on an SSD with a couple > 100ks of patches across 2 or 3 projects. Huh, somewhat surprising. I wonder what the query plan and cache layout looks like >> Anyway, that sounds really quite slow and we should look at why on earth >> it's that slow - is there some kind of horrific hidden join or poor >> indexing or query plan or something? > > Yes. > > So the 1.0 -> 2.0 transition was not good for performance because it > turns out databases do not map well to object-oriented structures.* I > didn't think to check for performance at the time, so we're > progressively improving that by denormalising stuff. (In fact that was a > major driver of the 2.1 release!) There is still further to go. So, looking at the schema, if we were going for the state of a bunch of patches in a project (which I imagine is fairly common), the current patch table is: CREATE TABLE `patchwork_patch` ( `submission_ptr_id` int(11) NOT NULL, `diff` longtext, `commit_ref` varchar(255) DEFAULT NULL, `pull_url` varchar(255) DEFAULT NULL, `delegate_id` int(11) DEFAULT NULL, `state_id` int(11) DEFAULT NULL, `archived` tinyint(1) NOT NULL, `hash` char(40) DEFAULT NULL, `patch_project_id` int(11) NOT NULL, PRIMARY KEY (`submission_ptr_id`), KEY `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` (`delegate_id`), KEY `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` (`state_id`), KEY `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` (`patch_project_id`), CONSTRAINT `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` FOREIGN KEY (`delegate_id`) REFERENCES `auth_user` (`id`), CONSTRAINT `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` FOREIGN KEY (`patch_project_id`) REFERENCES `patchwork_project` (`id`), CONSTRAINT `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` FOREIGN KEY (`state_id`) REFERENCES `patchwork_state` (`id`), CONSTRAINT `patchwork_patch_submission_ptr_id_03216801_fk_patchwork` FOREIGN KEY (`submission_ptr_id`) REFERENCES `patchwork_submission` (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 which puts the submission_ptr_id as the key for the clustered index (i.e. on disk layout will group by this ID). What I suspect is occuring is that we end up having teh on disk layout of patchwork_patch (and patchwork_submission) be ordered by time the patch comes in across *all* projects of the patchwork instance, which is typically (never?) what is actually queried. So that when I do queries over, say 'skiboot', I'm pulling into cache all the submissions/patches that occured to any project around that time, rather than only things related to skiboot. If instead, the primary key was ('patch_project_id','submission_ptr_id') and there was a UNIQUE KEY ('submission_ptr_id') then this would mean that we'd be pulling in a database page full of the patches for the project we're searching on, which I think would be more likely to be queried in the near future. The patch_project_id index colud be dropped, as all queries on it would be satisfied by the primary key. The same goes for patchwork_submission: CREATE TABLE `patchwork_submission` ( `id` int(11) NOT NULL AUTO_INCREMENT, `msgid` varchar(255) NOT NULL, `name` varchar(255) NOT NULL, `date` datetime(6) NOT NULL, `headers` longtext NOT NULL, `project_id` int(11) NOT NULL, `submitter_id` int(11) NOT NULL, `content` longtext, PRIMARY KEY (`id`), UNIQUE KEY `patchwork_patch_msgid_project_id_2e1fe709_uniq` (`msgid`,`project_id`), KEY `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` (`project_id`), KEY `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` (`submitter_id`), CONSTRAINT `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` FOREIGN KEY (`project_id`) REFERENCES `patchwork_project` (`id`), CONSTRAINT `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` FOREIGN KEY (`submitter_id`) REFERENCES `patchwork_person` (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 simply getting a list of patches for a project here is
Re: [RFC PATCH] REST: Add new setting for maximum API page size
Andrew Donnellan writes: > On 26/07/18 23:24, Daniel Axtens wrote: >> Andrew Donnellan writes: >> >>> On 24/07/18 15:10, Andrew Donnellan wrote: In 41790caf59ad ("REST: Limit max page size") we limited the maximum page size to the default page size in the settings. This turns out to be rather restrictive, as we usually want to keep the default page size low, but an administrator may want to allow API clients to fetch more than that per request. Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size. Closes: #202 ("Separate max API page size and default API page size into different settings") Suggested-by: Stewart Smith Suggested-by: Joel Stanley Signed-off-by: Andrew Donnellan >>> >>> FWIW we now have this applied on patchwork.ozlabs.org and it appears to >>> be working. Would like some more input as to what an appropriate default >>> limit is. >> >> My completely fact-free feeling/opinion is that if it takes more than >> ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not >> fussed. > > OzLabs.org is not a fast machine. 1 second round trip time == 100 per > page. 500 per page == ~2.3 seconds round trip. > > A single 500 item load is still under half the time of doing 5 * 100 > item loads... I bet MySQL would execute the query quicker :) Anyway, that sounds really quite slow and we should look at why on earth it's that slow - is there some kind of horrific hidden join or poor indexing or query plan or something? -- Stewart Smith OPAL Architect, IBM. ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [RFC PATCH] REST: Add new setting for maximum API page size
Andrew Donnellan writes: > On 24/07/18 15:10, Andrew Donnellan wrote: >> In 41790caf59ad ("REST: Limit max page size") we limited the maximum page >> size to the default page size in the settings. >> >> This turns out to be rather restrictive, as we usually want to keep the >> default page size low, but an administrator may want to allow API clients >> to fetch more than that per request. >> >> Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size. >> >> Closes: #202 ("Separate max API page size and default API page size into >> different settings") >> Suggested-by: Stewart Smith >> Suggested-by: Joel Stanley >> Signed-off-by: Andrew Donnellan > > FWIW we now have this applied on patchwork.ozlabs.org and it appears to > be working. Would like some more input as to what an appropriate default > limit is. >From a *consumer* of the API PoV, 500 at a time rather than 30 is at least a 6x speedup of my use of it, so that was extremely welcome. I haven't looked at the size of the responses I'm getting back so have no idea if 500 is a good one or not (I suspect I'd have to start optimizing my code around 700-1000 responses/call). My *guess* is that a fresh SQL query is run for each page retrieved, so maybe 500 is "good enough" while there isn't some way to just stream everything and not run the query multiple times. -- Stewart Smith OPAL Architect, IBM. ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [RFC PATCH] REST: Add new setting for maximum API page size
On 07/08/18 17:20, Daniel Axtens wrote: I wonder if we could let authenticated users do slower queries. Maybe something for later. Let's split the difference at 250 then, I'd be happy to merge that. If you want to take this patch and s/500/250/g during merge then I'd be happy with that. -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [RFC PATCH] REST: Add new setting for maximum API page size
FWIW we now have this applied on patchwork.ozlabs.org and it appears to be working. Would like some more input as to what an appropriate default limit is. >>> >>> My completely fact-free feeling/opinion is that if it takes more than >>> ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not >>> fussed. >> >> OzLabs.org is not a fast machine. 1 second round trip time == 100 per >> page. 500 per page == ~2.3 seconds round trip. >> >> A single 500 item load is still under half the time of doing 5 * 100 >> item loads... I wonder if we could let authenticated users do slower queries. Maybe something for later. Let's split the difference at 250 then, I'd be happy to merge that. > I bet MySQL would execute the query quicker :) In general (and I have tested this) mysql 5.7 executes patchwork queries slower than postgres 9.6 - this is on my laptop on an SSD with a couple 100ks of patches across 2 or 3 projects. > Anyway, that sounds really quite slow and we should look at why on earth > it's that slow - is there some kind of horrific hidden join or poor > indexing or query plan or something? Yes. So the 1.0 -> 2.0 transition was not good for performance because it turns out databases do not map well to object-oriented structures.* I didn't think to check for performance at the time, so we're progressively improving that by denormalising stuff. (In fact that was a major driver of the 2.1 release!) There is still further to go. Regards, Daniel * Yes, I probably should have realised this at the time. something something unpaid volunteers something something patch review something something. > > -- > Stewart Smith > OPAL Architect, IBM. ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [RFC PATCH] REST: Add new setting for maximum API page size
On 26/07/18 23:24, Daniel Axtens wrote: Andrew Donnellan writes: On 24/07/18 15:10, Andrew Donnellan wrote: In 41790caf59ad ("REST: Limit max page size") we limited the maximum page size to the default page size in the settings. This turns out to be rather restrictive, as we usually want to keep the default page size low, but an administrator may want to allow API clients to fetch more than that per request. Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size. Closes: #202 ("Separate max API page size and default API page size into different settings") Suggested-by: Stewart Smith Suggested-by: Joel Stanley Signed-off-by: Andrew Donnellan FWIW we now have this applied on patchwork.ozlabs.org and it appears to be working. Would like some more input as to what an appropriate default limit is. My completely fact-free feeling/opinion is that if it takes more than ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not fussed. OzLabs.org is not a fast machine. 1 second round trip time == 100 per page. 500 per page == ~2.3 seconds round trip. A single 500 item load is still under half the time of doing 5 * 100 item loads... -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [RFC PATCH] REST: Add new setting for maximum API page size
On 24/07/18 15:10, Andrew Donnellan wrote: In 41790caf59ad ("REST: Limit max page size") we limited the maximum page size to the default page size in the settings. This turns out to be rather restrictive, as we usually want to keep the default page size low, but an administrator may want to allow API clients to fetch more than that per request. Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size. Closes: #202 ("Separate max API page size and default API page size into different settings") Suggested-by: Stewart Smith Suggested-by: Joel Stanley Signed-off-by: Andrew Donnellan FWIW we now have this applied on patchwork.ozlabs.org and it appears to be working. Would like some more input as to what an appropriate default limit is. -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [RFC PATCH] REST: Add new setting for maximum API page size
On 24/07/18 15:10, Andrew Donnellan wrote: diff --git a/docs/deployment/configuration.rst b/docs/deployment/configuration.rst index 347485636d47..e599522a412b 100644 --- a/docs/deployment/configuration.rst +++ b/docs/deployment/configuration.rst @@ -88,7 +88,13 @@ Enable the :doc:`REST API <../api/rest>`. The number of items to include in REST API responses by default. This can be overridden by the ``per_page`` parameter for some endpoints. -.. versionadded:: 2.0 This obviously shouldn't have been there, will fix when I send non RFC +``MAX_REST_RESULTS_PER_PAGE`` +~ + +The maximum number of items that can be requested in a REST API request using +the ``per_page`` parameter. + +.. versionadded:: 2.2 -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork