jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/378707 )
Change subject: Document change dispatching and change handling in change-propagation.wiki
......................................................................

Document change dispatching and change handling in change-propagation.wiki

Change-Id: Ic10f3067dca522197db639e2cfdf18591a40a9d7
---
M docs/change-propagation.wiki
1 file changed, 24 insertions(+), 2 deletions(-)

Approvals:
  Daniel Kinzler: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/docs/change-propagation.wiki b/docs/change-propagation.wiki
index b7b45dd..d072f24 100644
--- a/docs/change-propagation.wiki
+++ b/docs/change-propagation.wiki
@@ -8,11 +8,14 @@
 * Subscription management, so the repository knows which client wiki is interested in changes to which entities.
 * Dispatch state, so the repository knows which changes have already been dispatched to which client.
 * A buffer of the changes themselves.
-* Access to each client's job queue.
+* Access to each client's job queue, to push ChangeNotificationJobs to.
 
 On each client, there needs to be:
 * Usage tracking.
 * Access to sitelinks stored in the repository.
+* ChangeHandler for processing changes on the repo, triggered by ChangeNotificationJobs being executed.
+* AffectedPagesFinder, a mechanism to determine which pages are affected by which change, based on usage tracking information (see usagetracking.wiki).
+* WikiPageUpdater, for updating the client wiki's state.
 
 The basic operation of change dispatching involves running two scripts regularly, typically as cron jobs: dispatchChanges.php and pruneChanges.php, both located in the repo/maintenance/ directory. A typical cron setup could look like this:
 * Every minute, run dispatchChanges.php --max-time 120
@@ -62,5 +65,24 @@
 
 Per default, global MySQL locks are used to ensure that only one process can dispatch to any given client wiki at a time.
 
+== dispatchChanges.php script ==
+The dispatchChanges script notifies client wikis of changes on the repository. It reads information from the wb_changes and wb_changes_dispatch tables, and posts ChangeNotificationJobs to the clients' job queues.
+
+The basic scheduling algorithm is as follows: for each client wiki, define how many changes they have not yet seen according to wb_changes_dispatch (we refer to that number as "dispatch lag"). Find the n client wikis that have the most lag (and have not been touched for some minimal delay). Pick one of these wikis at random. For the selected target wiki, find changes it has not yet seen to entities it is subscribed to, up to some maximum number of m changes. Construct a ChangeNotificationJob event containing the IDs of these changes, and push it to the target wiki's JobQueue. In wb_changes_dispatch, record all changes touched in this process as seen by the target wiki.
+
+The dispatchChanges script is designed to be safe against concurrent execution. It can be scaled easily by simply running more instances in parallel. The locking mechanism used to prevent race conditions can be configured using the dispatchingLockManager setting. Per default, named locks on the repo database are used. Redis based locks are supported as an alternative.
+
 == SiteLinkLookup ==
-Each client wiki can access the repo's sitelink information via a SiteLinkLookup service returned by ClientStore::getSiteLinkLookup. This information is stored in the wb_items_per_site table in the repo's database.
+A SiteLinkLookup allows the client wiki to determine which local pages are "connected" to a given Item on the repository. Each client wiki can access the repo's sitelink information via a SiteLinkLookup service returned by ClientStore::getSiteLinkLookup. This information is stored in the wb_items_per_site table in the repo's database.
+
+== ChangeHandler ==
+The handleChanges() method of the ChangeHandler class gets called with a list of changes loaded by a ChangeNotificationJob. A ChangeRunCoalescer is then used to merge consecutive changes by the same user to the same entity, reducing the number of logical events to be processed on the client, and to be presented to the user.
+
+ChangeHandler will then for each change determine the affected pages using the AffectedPagesFinder, which uses information from the wbc_entity_usage table (see usagetracking.wiki). It then uses a WikiPageUpdater to update the client wiki's state: rows are injected into the <code>recentchanges</code> database table, pages using the affected entity's data are re-parsed, and the web cache for these pages is purged.
+
+== WikiPageUpdater ==
+The WikiPageUpdater class defines three methods for updating the client wiki's state according to a given change on the repository:
+* scheduleRefreshLinks() will re-parse each affected page, allowing the link tables to be updated appropriately. This is done asynchronously using RefreshLinksJobs. No batching is applied, since RefreshLinksJobs are slow and these benefit more from deduplication than from batching.
+* purgeWebCache() will update the web-cache for each affected page. This is done asynchronously in batches, using HTMLCacheUpdateJob. The batch size is controlled by the purgeCacheBatchSize setting.
+* injectRCRecords() will create a RecentChange entry for each affected page. This is done asynchronously in batches, using InjectRCRecordsJobs. The batch size is controlled by the recentChangesBatchSize setting.
+

-- 
To view, visit https://gerrit.wikimedia.org/r/378707
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic10f3067dca522197db639e2cfdf18591a40a9d7
Gerrit-PatchSet: 7
Gerrit-Project: mediawiki/extensions/Wikibase
Gerrit-Branch: master
Gerrit-Owner: Daniel Kinzler
Gerrit-Reviewer: Aude
Gerrit-Reviewer: Daniel Kinzler
Gerrit-Reviewer: Thiemo Mättig (WMDE)
Gerrit-Reviewer: WMDE-leszek
Gerrit-Reviewer: jenkins-bot

_______________________________________________
MediaWiki-commits mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
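
For readers of this change, the scheduling algorithm described in the new "dispatchChanges.php script" section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Wikibase implementation: the names ClientState, select_client and dispatch_to are hypothetical, dispatch lag is approximated as the distance between the newest change ID and the last ID a client has seen, and the returned dict merely stands in for a ChangeNotificationJob pushed to the client's job queue. Locking is left out entirely.

```python
import random
from dataclasses import dataclass


@dataclass
class Change:
    change_id: int
    entity_id: str


@dataclass
class ClientState:
    """Hypothetical stand-in for one client's row in wb_changes_dispatch."""
    wiki: str
    subscriptions: set
    last_seen_id: int = 0      # highest change ID already marked as seen
    last_touched: float = 0.0  # time of the previous dispatch pass


def select_client(clients, changes, now, n, min_delay, rng=random):
    """Pick at random among the n most lagged clients that are due for a pass."""
    newest = max((c.change_id for c in changes), default=0)
    eligible = [
        c for c in clients
        if c.last_seen_id < newest and now - c.last_touched >= min_delay
    ]
    # Assumption: lag is approximated by the ID distance to the newest change.
    eligible.sort(key=lambda c: newest - c.last_seen_id, reverse=True)
    candidates = eligible[:n]
    return rng.choice(candidates) if candidates else None


def dispatch_to(client, changes, now, m):
    """Collect up to m relevant change IDs and mark all touched changes as seen."""
    pending = sorted(
        (c for c in changes if c.change_id > client.last_seen_id),
        key=lambda c: c.change_id,
    )
    batch = []
    seen_up_to = client.last_seen_id
    for change in pending:
        relevant = change.entity_id in client.subscriptions
        if relevant and len(batch) == m:
            break  # batch is full; the rest waits for a later pass
        if relevant:
            batch.append(change.change_id)
        seen_up_to = change.change_id  # skipped changes also count as seen
    client.last_seen_id = seen_up_to
    client.last_touched = now
    # Stand-in for constructing and pushing a ChangeNotificationJob.
    return {"wiki": client.wiki, "change_ids": batch} if batch else None
```

Running several such passes in parallel would additionally need the per-client lock that the dispatchingLockManager setting configures; the sketch assumes a single process.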
