Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-26 Thread Daniel Axtens
Andrew Donnellan  writes:

> On 07/08/18 17:20, Daniel Axtens wrote:
>> I wonder if we could let authenticated users do slower queries. Maybe
>> something for later. Let's split the difference at 250 then, I'd be
>> happy to merge that.
>
> If you want to take this patch and s/500/250/g during merge then I'd be 
> happy with that.

Applied based on my trusting that OzLabs testing has been sufficent.

Regards,
Daniel

>
> -- 
> Andrew Donnellan  OzLabs, ADL Canberra
> andrew.donnel...@au1.ibm.com  IBM Australia Limited
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-08 Thread Stewart Smith
Daniel Axtens  writes:
> FWIW we now have this applied on patchwork.ozlabs.org and it appears to
> be working. Would like some more input as to what an appropriate default
> limit is.
 
 My completely fact-free feeling/opinion is that if it takes more than
 ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not
 fussed.
>>>
>>> OzLabs.org is not a fast machine. 1 second round trip time == 100 per 
>>> page. 500 per page == ~2.3 seconds round trip.
>>>
>>> A single 500 item load is still under half the time of doing 5 * 100 
>>> item loads...
>
> I wonder if we could let authenticated users do slower queries. Maybe
> something for later. Let's split the difference at 250 then, I'd be
> happy to merge that.
>
>> I bet MySQL would execute the query quicker :)
>
> In general (and I have tested this) mysql 5.7 executes patchwork queries
> slower than postgres 9.6 - this is on my laptop on an SSD with a couple
> 100ks of patches across 2 or 3 projects.

Huh, somewhat surprising. I wonder what the query plan and cache layout
looks like

>> Anyway, that sounds really quite slow and we should look at why on earth
>> it's that slow - is there some kind of horrific hidden join or poor
>> indexing or query plan or something?
>
> Yes.
>
> So the 1.0 -> 2.0 transition was not good for performance because it
> turns out databases do not map well to object-oriented structures.* I
> didn't think to check for performance at the time, so we're
> progressively improving that by denormalising stuff. (In fact that was a
> major driver of the 2.1 release!) There is still further to go.

So, looking at the schema, if we were going for the state of a bunch of
patches in a project (which I imagine is  fairly common), the current
patch table is:

CREATE TABLE `patchwork_patch` (
  `submission_ptr_id` int(11) NOT NULL,
  `diff` longtext,
  `commit_ref` varchar(255) DEFAULT NULL,
  `pull_url` varchar(255) DEFAULT NULL,
  `delegate_id` int(11) DEFAULT NULL,
  `state_id` int(11) DEFAULT NULL,
  `archived` tinyint(1) NOT NULL,
  `hash` char(40) DEFAULT NULL,
  `patch_project_id` int(11) NOT NULL,
  PRIMARY KEY (`submission_ptr_id`),
  KEY `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` (`delegate_id`),
  KEY `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` (`state_id`),
  KEY `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` 
(`patch_project_id`),
  CONSTRAINT `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` FOREIGN KEY 
(`delegate_id`) REFERENCES `auth_user` (`id`),
  CONSTRAINT `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` FOREIGN 
KEY (`patch_project_id`) REFERENCES `patchwork_project` (`id`),
  CONSTRAINT `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` FOREIGN 
KEY (`state_id`) REFERENCES `patchwork_state` (`id`),
  CONSTRAINT `patchwork_patch_submission_ptr_id_03216801_fk_patchwork` FOREIGN 
KEY (`submission_ptr_id`) REFERENCES `patchwork_submission` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

which puts the submission_ptr_id as the key for the clustered index
(i.e. on disk layout will group by this ID).

What I suspect is occuring is that we end up having teh on disk layout
of patchwork_patch (and patchwork_submission) be ordered by time the
patch comes in across *all* projects of the patchwork instance, which is
typically (never?) what is actually queried.

So that when I do queries over, say 'skiboot', I'm pulling into cache
all the submissions/patches that occured to any project around that
time, rather than only things related to skiboot.

If instead, the primary key was ('patch_project_id','submission_ptr_id')
and there was a UNIQUE KEY ('submission_ptr_id') then this would mean
that we'd be pulling in a database page full of the patches for the
project we're searching on, which I think would be more likely to be
queried in the near future.

The patch_project_id index colud be dropped, as all queries on it would
be satisfied by the primary key.

The same goes for patchwork_submission:

CREATE TABLE `patchwork_submission` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `msgid` varchar(255) NOT NULL,
  `name` varchar(255) NOT NULL,
  `date` datetime(6) NOT NULL,
  `headers` longtext NOT NULL,
  `project_id` int(11) NOT NULL,
  `submitter_id` int(11) NOT NULL,
  `content` longtext,
  PRIMARY KEY (`id`),
  UNIQUE KEY `patchwork_patch_msgid_project_id_2e1fe709_uniq` 
(`msgid`,`project_id`),
  KEY `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` 
(`project_id`),
  KEY `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` 
(`submitter_id`),
  CONSTRAINT `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` 
FOREIGN KEY (`project_id`) REFERENCES `patchwork_project` (`id`),
  CONSTRAINT `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` 
FOREIGN KEY (`submitter_id`) REFERENCES `patchwork_person` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

simply getting a list of patches for a project here is 

Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-08 Thread Stewart Smith
Andrew Donnellan  writes:
> On 26/07/18 23:24, Daniel Axtens wrote:
>> Andrew Donnellan  writes:
>> 
>>> On 24/07/18 15:10, Andrew Donnellan wrote:
 In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
 size to the default page size in the settings.

 This turns out to be rather restrictive, as we usually want to keep the
 default page size low, but an administrator may want to allow API clients
 to fetch more than that per request.

 Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.

 Closes: #202 ("Separate max API page size and default API page size into 
 different settings")
 Suggested-by: Stewart Smith 
 Suggested-by: Joel Stanley 
 Signed-off-by: Andrew Donnellan 
>>>
>>> FWIW we now have this applied on patchwork.ozlabs.org and it appears to
>>> be working. Would like some more input as to what an appropriate default
>>> limit is.
>> 
>> My completely fact-free feeling/opinion is that if it takes more than
>> ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not
>> fussed.
>
> OzLabs.org is not a fast machine. 1 second round trip time == 100 per 
> page. 500 per page == ~2.3 seconds round trip.
>
> A single 500 item load is still under half the time of doing 5 * 100 
> item loads...

I bet MySQL would execute the query quicker :)

Anyway, that sounds really quite slow and we should look at why on earth
it's that slow - is there some kind of horrific hidden join or poor
indexing or query plan or something?

-- 
Stewart Smith
OPAL Architect, IBM.

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-08 Thread Stewart Smith
Andrew Donnellan  writes:
> On 24/07/18 15:10, Andrew Donnellan wrote:
>> In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
>> size to the default page size in the settings.
>> 
>> This turns out to be rather restrictive, as we usually want to keep the
>> default page size low, but an administrator may want to allow API clients
>> to fetch more than that per request.
>> 
>> Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.
>> 
>> Closes: #202 ("Separate max API page size and default API page size into 
>> different settings")
>> Suggested-by: Stewart Smith 
>> Suggested-by: Joel Stanley 
>> Signed-off-by: Andrew Donnellan 
>
> FWIW we now have this applied on patchwork.ozlabs.org and it appears to 
> be working. Would like some more input as to what an appropriate default 
> limit is.

>From a *consumer* of the API PoV, 500 at a time rather than 30 is at
least a 6x speedup of my use of it, so that was extremely welcome. I
haven't looked at the size of the responses I'm getting back so have no
idea if 500 is a good one or not (I suspect I'd have to start optimizing
my code around 700-1000 responses/call).

My *guess* is that a fresh SQL query is run for each page retrieved, so
maybe 500 is "good enough" while there isn't some way to just stream
everything and not run the query multiple times.

-- 
Stewart Smith
OPAL Architect, IBM.

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-07 Thread Andrew Donnellan

On 07/08/18 17:20, Daniel Axtens wrote:

I wonder if we could let authenticated users do slower queries. Maybe
something for later. Let's split the difference at 250 then, I'd be
happy to merge that.


If you want to take this patch and s/500/250/g during merge then I'd be 
happy with that.


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-07 Thread Daniel Axtens
 FWIW we now have this applied on patchwork.ozlabs.org and it appears to
 be working. Would like some more input as to what an appropriate default
 limit is.
>>> 
>>> My completely fact-free feeling/opinion is that if it takes more than
>>> ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not
>>> fussed.
>>
>> OzLabs.org is not a fast machine. 1 second round trip time == 100 per 
>> page. 500 per page == ~2.3 seconds round trip.
>>
>> A single 500 item load is still under half the time of doing 5 * 100 
>> item loads...

I wonder if we could let authenticated users do slower queries. Maybe
something for later. Let's split the difference at 250 then, I'd be
happy to merge that.

> I bet MySQL would execute the query quicker :)

In general (and I have tested this) mysql 5.7 executes patchwork queries
slower than postgres 9.6 - this is on my laptop on an SSD with a couple
100ks of patches across 2 or 3 projects.

> Anyway, that sounds really quite slow and we should look at why on earth
> it's that slow - is there some kind of horrific hidden join or poor
> indexing or query plan or something?

Yes.

So the 1.0 -> 2.0 transition was not good for performance because it
turns out databases do not map well to object-oriented structures.* I
didn't think to check for performance at the time, so we're
progressively improving that by denormalising stuff. (In fact that was a
major driver of the 2.1 release!) There is still further to go.

Regards,
Daniel

* Yes, I probably should have realised this at the time. something
  something unpaid volunteers something something patch review something
  something.

>
> -- 
> Stewart Smith
> OPAL Architect, IBM.
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-06 Thread Andrew Donnellan

On 26/07/18 23:24, Daniel Axtens wrote:

Andrew Donnellan  writes:


On 24/07/18 15:10, Andrew Donnellan wrote:

In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
size to the default page size in the settings.

This turns out to be rather restrictive, as we usually want to keep the
default page size low, but an administrator may want to allow API clients
to fetch more than that per request.

Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.

Closes: #202 ("Separate max API page size and default API page size into different 
settings")
Suggested-by: Stewart Smith 
Suggested-by: Joel Stanley 
Signed-off-by: Andrew Donnellan 


FWIW we now have this applied on patchwork.ozlabs.org and it appears to
be working. Would like some more input as to what an appropriate default
limit is.


My completely fact-free feeling/opinion is that if it takes more than
~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not
fussed.


OzLabs.org is not a fast machine. 1 second round trip time == 100 per 
page. 500 per page == ~2.3 seconds round trip.


A single 500 item load is still under half the time of doing 5 * 100 
item loads...


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-07-25 Thread Andrew Donnellan

On 24/07/18 15:10, Andrew Donnellan wrote:

In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
size to the default page size in the settings.

This turns out to be rather restrictive, as we usually want to keep the
default page size low, but an administrator may want to allow API clients
to fetch more than that per request.

Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.

Closes: #202 ("Separate max API page size and default API page size into different 
settings")
Suggested-by: Stewart Smith 
Suggested-by: Joel Stanley 
Signed-off-by: Andrew Donnellan 


FWIW we now have this applied on patchwork.ozlabs.org and it appears to 
be working. Would like some more input as to what an appropriate default 
limit is.


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-07-23 Thread Andrew Donnellan

On 24/07/18 15:10, Andrew Donnellan wrote:

diff --git a/docs/deployment/configuration.rst 
b/docs/deployment/configuration.rst
index 347485636d47..e599522a412b 100644
--- a/docs/deployment/configuration.rst
+++ b/docs/deployment/configuration.rst
@@ -88,7 +88,13 @@ Enable the :doc:`REST API <../api/rest>`.
  The number of items to include in REST API responses by default. This can be
  overridden by the ``per_page`` parameter for some endpoints.
  
-.. versionadded:: 2.0


This obviously shouldn't have been there, will fix when I send non RFC


+``MAX_REST_RESULTS_PER_PAGE``
+~
+
+The maximum number of items that can be requested in a REST API request using
+the ``per_page`` parameter.
+
+.. versionadded:: 2.2



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


[RFC PATCH] REST: Add new setting for maximum API page size

2018-07-23 Thread Andrew Donnellan
In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
size to the default page size in the settings.

This turns out to be rather restrictive, as we usually want to keep the
default page size low, but an administrator may want to allow API clients
to fetch more than that per request.

Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.

Closes: #202 ("Separate max API page size and default API page size into 
different settings")
Suggested-by: Stewart Smith 
Suggested-by: Joel Stanley 
Signed-off-by: Andrew Donnellan 

---

This is completely untested but should work, sending this as an RFC because
I have no idea what the default should be, thoughts?
---
 docs/deployment/configuration.rst | 8 +++-
 patchwork/api/base.py | 3 ++-
 patchwork/settings/base.py| 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/docs/deployment/configuration.rst 
b/docs/deployment/configuration.rst
index 347485636d47..e599522a412b 100644
--- a/docs/deployment/configuration.rst
+++ b/docs/deployment/configuration.rst
@@ -88,7 +88,13 @@ Enable the :doc:`REST API <../api/rest>`.
 The number of items to include in REST API responses by default. This can be
 overridden by the ``per_page`` parameter for some endpoints.
 
-.. versionadded:: 2.0
+``MAX_REST_RESULTS_PER_PAGE``
+~
+
+The maximum number of items that can be requested in a REST API request using
+the ``per_page`` parameter.
+
+.. versionadded:: 2.2
 
 ``COMPAT_REDIR``
 
diff --git a/patchwork/api/base.py b/patchwork/api/base.py
index 8c38d5a1d5f4..bf452f78b390 100644
--- a/patchwork/api/base.py
+++ b/patchwork/api/base.py
@@ -36,7 +36,8 @@ class LinkHeaderPagination(PageNumberPagination):
https://tools.ietf.org/html/rfc5988#section-5
https://developer.github.com/guides/traversing-with-pagination
 """
-page_size = max_page_size = settings.REST_RESULTS_PER_PAGE
+page_size = settings.REST_RESULTS_PER_PAGE
+max_page_size = settings.MAX_REST_RESULTS_PER_PAGE
 page_size_query_param = 'per_page'
 
 def get_paginated_response(self, data):
diff --git a/patchwork/settings/base.py b/patchwork/settings/base.py
index 4b0d5513895a..7dc20041ec42 100644
--- a/patchwork/settings/base.py
+++ b/patchwork/settings/base.py
@@ -220,6 +220,7 @@ ENABLE_XMLRPC = False
 ENABLE_REST_API = True
 
 REST_RESULTS_PER_PAGE = 30
+MAX_REST_RESULTS_PER_PAGE = 500
 
 # Set to True to enable redirections or URLs from previous versions
 # of patchwork
-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork