jenkins-bot has submitted this change and it was merged. Change subject: Make RefreshLinksJob de-duplication more robust ......................................................................
Make RefreshLinksJob de-duplication more robust * Do not de-duplicate jobs with "masterPos". It either does not catch anything or is not correct. Previously, it was the later, by making getDuplicationInfo() ignore the position. That made the oldest DB position win among "duplicate" jobs, which is unsafe. * From graphite, deduplication only applies .5-2% of the time for "refreshLinks", so there should not be much more duplicated effort. Dynamic and Prioritized refreshLinks jobs remain de-duplicated on push() and root job de-duplication still applies as it did before. Also, getLinksTimestamp() is still checked to avoid excess work. * Document the class constants. Change-Id: Ie9a10aa58f14fa76917501065dfe65083afb985c --- M includes/jobqueue/jobs/RefreshLinksJob.php 1 file changed, 11 insertions(+), 20 deletions(-) Approvals: Krinkle: Looks good to me, approved Umherirrender: Looks good to me, approved jenkins-bot: Verified diff --git a/includes/jobqueue/jobs/RefreshLinksJob.php b/includes/jobqueue/jobs/RefreshLinksJob.php index fa3278d..fa614d5 100644 --- a/includes/jobqueue/jobs/RefreshLinksJob.php +++ b/includes/jobqueue/jobs/RefreshLinksJob.php @@ -35,29 +35,22 @@ * @ingroup JobQueue */ class RefreshLinksJob extends Job { + /** @var float Cache parser output when it takes this long to render */ const PARSE_THRESHOLD_SEC = 1.0; - + /** @var integer Lag safety margin when comparing root job times to last-refresh times */ const CLOCK_FUDGE = 10; function __construct( Title $title, array $params ) { parent::__construct( 'refreshLinks', $title, $params ); - // Base backlink update jobs and per-title update jobs can be de-duplicated. - // If template A changes twice before any jobs run, a clean queue will have: - // (A base, A base) - // The second job is ignored by the queue on insertion. - // Suppose, many pages use template A, and that template itself uses template B. - // An edit to both will first create two base jobs. A clean FIFO queue will have: - // (A base, B base) - // When these jobs run, the queue will have per-title and remnant partition jobs: - // (titleX,titleY,titleZ,...,A remnant,titleM,titleN,titleO,...,B remnant) - // Some these jobs will be the same, and will automatically be ignored by - // the queue upon insertion. Some title jobs will run before the duplicate is - // inserted, so the work will still be done twice in those cases. More titles - // can be de-duplicated as the remnant jobs continue to be broken down. This - // works best when $wgUpdateRowsPerJob, and either the pages have few backlinks - // and/or the backlink sets for pages A and B are almost identical. - $this->removeDuplicates = !isset( $params['range'] ) - && ( !isset( $params['pages'] ) || count( $params['pages'] ) == 1 ); + // Avoid the overhead of de-duplication when it would be pointless + $this->removeDuplicates = ( + // Master positions won't match + !isset( $params['masterPos'] ) && + // Ranges rarely will line up + !isset( $params['range'] ) && + // Multiple pages per job make matches unlikely + !( isset( $params['pages'] ) && count( $params['pages'] ) != 1 ) + ); } /** @@ -273,8 +266,6 @@ public function getDeduplicationInfo() { $info = parent::getDeduplicationInfo(); if ( is_array( $info['params'] ) ) { - // Don't let highly unique "masterPos" values ruin duplicate detection - unset( $info['params']['masterPos'] ); // For per-pages jobs, the job title is that of the template that changed // (or similar), so remove that since it ruins duplicate detection if ( isset( $info['pages'] ) ) { -- To view, visit https://gerrit.wikimedia.org/r/256361 To unsubscribe, visit https://gerrit.wikimedia.org/r/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie9a10aa58f14fa76917501065dfe65083afb985c Gerrit-PatchSet: 2 Gerrit-Project: mediawiki/core Gerrit-Branch: master Gerrit-Owner: Aaron Schulz <asch...@wikimedia.org> Gerrit-Reviewer: Aaron Schulz <asch...@wikimedia.org> Gerrit-Reviewer: Krinkle <krinklem...@gmail.com> Gerrit-Reviewer: Umherirrender <umherirrender_de...@web.de> Gerrit-Reviewer: jenkins-bot <> _______________________________________________ MediaWiki-commits mailing list MediaWiki-commits@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits