jenkins-bot has submitted this change and it was merged.

Change subject: Make RefreshLinksJob de-duplication more robust
......................................................................


Make RefreshLinksJob de-duplication more robust

* Do not de-duplicate jobs with "masterPos". It either does not
  catch anything or is not correct. Previously, it was the later,
  by making getDuplicationInfo() ignore the position. That made the
  oldest DB position win among "duplicate" jobs, which is unsafe.
* From graphite, deduplication only applies .5-2% of the time for
  "refreshLinks", so there should not be much more duplicated
  effort. Dynamic and Prioritized refreshLinks jobs remain
  de-duplicated on push() and root job de-duplication still applies
  as it did before. Also, getLinksTimestamp() is still checked to
  avoid excess work.
* Document the class constants.

Change-Id: Ie9a10aa58f14fa76917501065dfe65083afb985c
---
M includes/jobqueue/jobs/RefreshLinksJob.php
1 file changed, 11 insertions(+), 20 deletions(-)

Approvals:
  Krinkle: Looks good to me, approved
  Umherirrender: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/includes/jobqueue/jobs/RefreshLinksJob.php 
b/includes/jobqueue/jobs/RefreshLinksJob.php
index fa3278d..fa614d5 100644
--- a/includes/jobqueue/jobs/RefreshLinksJob.php
+++ b/includes/jobqueue/jobs/RefreshLinksJob.php
@@ -35,29 +35,22 @@
  * @ingroup JobQueue
  */
 class RefreshLinksJob extends Job {
+       /** @var float Cache parser output when it takes this long to render */
        const PARSE_THRESHOLD_SEC = 1.0;
-
+       /** @var integer Lag safety margin when comparing root job times to 
last-refresh times */
        const CLOCK_FUDGE = 10;
 
        function __construct( Title $title, array $params ) {
                parent::__construct( 'refreshLinks', $title, $params );
-               // Base backlink update jobs and per-title update jobs can be 
de-duplicated.
-               // If template A changes twice before any jobs run, a clean 
queue will have:
-               //              (A base, A base)
-               // The second job is ignored by the queue on insertion.
-               // Suppose, many pages use template A, and that template itself 
uses template B.
-               // An edit to both will first create two base jobs. A clean 
FIFO queue will have:
-               //              (A base, B base)
-               // When these jobs run, the queue will have per-title and 
remnant partition jobs:
-               //              (titleX,titleY,titleZ,...,A 
remnant,titleM,titleN,titleO,...,B remnant)
-               // Some these jobs will be the same, and will automatically be 
ignored by
-               // the queue upon insertion. Some title jobs will run before 
the duplicate is
-               // inserted, so the work will still be done twice in those 
cases. More titles
-               // can be de-duplicated as the remnant jobs continue to be 
broken down. This
-               // works best when $wgUpdateRowsPerJob, and either the pages 
have few backlinks
-               // and/or the backlink sets for pages A and B are almost 
identical.
-               $this->removeDuplicates = !isset( $params['range'] )
-                       && ( !isset( $params['pages'] ) || count( 
$params['pages'] ) == 1 );
+               // Avoid the overhead of de-duplication when it would be 
pointless
+               $this->removeDuplicates = (
+                       // Master positions won't match
+                       !isset( $params['masterPos'] ) &&
+                       // Ranges rarely will line up
+                       !isset( $params['range'] ) &&
+                       // Multiple pages per job make matches unlikely
+                       !( isset( $params['pages'] ) && count( $params['pages'] 
) != 1 )
+               );
        }
 
        /**
@@ -273,8 +266,6 @@
        public function getDeduplicationInfo() {
                $info = parent::getDeduplicationInfo();
                if ( is_array( $info['params'] ) ) {
-                       // Don't let highly unique "masterPos" values ruin 
duplicate detection
-                       unset( $info['params']['masterPos'] );
                        // For per-pages jobs, the job title is that of the 
template that changed
                        // (or similar), so remove that since it ruins 
duplicate detection
                        if ( isset( $info['pages'] ) ) {

-- 
To view, visit https://gerrit.wikimedia.org/r/256361
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie9a10aa58f14fa76917501065dfe65083afb985c
Gerrit-PatchSet: 2
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: Aaron Schulz <asch...@wikimedia.org>
Gerrit-Reviewer: Aaron Schulz <asch...@wikimedia.org>
Gerrit-Reviewer: Krinkle <krinklem...@gmail.com>
Gerrit-Reviewer: Umherirrender <umherirrender_de...@web.de>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to