Daniel Axtens <d...@axtens.net> writes: >>>>> FWIW we now have this applied on patchwork.ozlabs.org and it appears to >>>>> be working. Would like some more input as to what an appropriate default >>>>> limit is. >>>> >>>> My completely fact-free feeling/opinion is that if it takes more than >>>> ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not >>>> fussed. >>> >>> OzLabs.org is not a fast machine. 1 second round trip time == 100 per >>> page. 500 per page == ~2.3 seconds round trip. >>> >>> A single 500 item load is still under half the time of doing 5 * 100 >>> item loads... > > I wonder if we could let authenticated users do slower queries. Maybe > something for later. Let's split the difference at 250 then, I'd be > happy to merge that. > >> I bet MySQL would execute the query quicker :) > > In general (and I have tested this) mysql 5.7 executes patchwork queries > slower than postgres 9.6 - this is on my laptop on an SSD with a couple > 100ks of patches across 2 or 3 projects.
Huh, somewhat surprising. I wonder what the query plan and cache layout looks like.... >> Anyway, that sounds really quite slow and we should look at why on earth >> it's that slow - is there some kind of horrific hidden join or poor >> indexing or query plan or something? > > Yes. > > So the 1.0 -> 2.0 transition was not good for performance because it > turns out databases do not map well to object-oriented structures.* I > didn't think to check for performance at the time, so we're > progressively improving that by denormalising stuff. (In fact that was a > major driver of the 2.1 release!) There is still further to go. So, looking at the schema, if we were going for the state of a bunch of patches in a project (which I imagine is fairly common), the current patch table is: CREATE TABLE `patchwork_patch` ( `submission_ptr_id` int(11) NOT NULL, `diff` longtext, `commit_ref` varchar(255) DEFAULT NULL, `pull_url` varchar(255) DEFAULT NULL, `delegate_id` int(11) DEFAULT NULL, `state_id` int(11) DEFAULT NULL, `archived` tinyint(1) NOT NULL, `hash` char(40) DEFAULT NULL, `patch_project_id` int(11) NOT NULL, PRIMARY KEY (`submission_ptr_id`), KEY `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` (`delegate_id`), KEY `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` (`state_id`), KEY `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` (`patch_project_id`), CONSTRAINT `patchwork_patch_delegate_id_8f7ce805_fk_auth_user_id` FOREIGN KEY (`delegate_id`) REFERENCES `auth_user` (`id`), CONSTRAINT `patchwork_patch_patch_project_id_aff1ad51_fk_patchwork` FOREIGN KEY (`patch_project_id`) REFERENCES `patchwork_project` (`id`), CONSTRAINT `patchwork_patch_state_id_3d9fb500_fk_patchwork_state_id` FOREIGN KEY (`state_id`) REFERENCES `patchwork_state` (`id`), CONSTRAINT `patchwork_patch_submission_ptr_id_03216801_fk_patchwork` FOREIGN KEY (`submission_ptr_id`) REFERENCES `patchwork_submission` (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 which puts the submission_ptr_id as the key for the clustered index (i.e. on disk layout will group by this ID). What I suspect is occuring is that we end up having teh on disk layout of patchwork_patch (and patchwork_submission) be ordered by time the patch comes in across *all* projects of the patchwork instance, which is typically (never?) what is actually queried. So that when I do queries over, say 'skiboot', I'm pulling into cache all the submissions/patches that occured to any project around that time, rather than only things related to skiboot. If instead, the primary key was ('patch_project_id','submission_ptr_id') and there was a UNIQUE KEY ('submission_ptr_id') then this would mean that we'd be pulling in a database page full of the patches for the project we're searching on, which I think would be more likely to be queried in the near future. The patch_project_id index colud be dropped, as all queries on it would be satisfied by the primary key. The same goes for patchwork_submission: CREATE TABLE `patchwork_submission` ( `id` int(11) NOT NULL AUTO_INCREMENT, `msgid` varchar(255) NOT NULL, `name` varchar(255) NOT NULL, `date` datetime(6) NOT NULL, `headers` longtext NOT NULL, `project_id` int(11) NOT NULL, `submitter_id` int(11) NOT NULL, `content` longtext, PRIMARY KEY (`id`), UNIQUE KEY `patchwork_patch_msgid_project_id_2e1fe709_uniq` (`msgid`,`project_id`), KEY `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` (`project_id`), KEY `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` (`submitter_id`), CONSTRAINT `patchwork_patch_project_id_162a7a7f_fk_patchwork_project_id` FOREIGN KEY (`project_id`) REFERENCES `patchwork_project` (`id`), CONSTRAINT `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` FOREIGN KEY (`submitter_id`) REFERENCES `patchwork_person` (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 simply getting a list of patches for a project here is going to be a very expensive query, pulling in a lot of database pages containing data for patches around that time rather than ones relevant to the query. Again, I'd say go: PRIMARY KEY ('project_id', 'id') UNIQUE KEY ('id') (the project_id index can then be dropped, as the primary key could be used there). Of course, things get better if you query everything looking up project_id=X and id=Y rather than just id=Y. For PostgreSQL it gets a bit interesting as I'm not as familiar with how it does data layout. There *appears* to be a CLUSTER *command* that will re-organise an existing table but *not* make any future inserts/updates obey it (which is annoying). This seems to be a bit of a theme throughout the schema too... Actually... do we have a good test dataset I could load up locally and experiment with? > * Yes, I probably should have realised this at the time. something > something unpaid volunteers something something patch review something > something. :) pretty much neglecting some piece of software that ends up being vital open source infrastructure is a grand tradition, welcome! -- Stewart Smith OPAL Architect, IBM. _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork