jenkins-bot has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/378707)

Change subject: Document change dispatching and change handling in 
change-propagation.wiki
......................................................................


Document change dispatching and change handling in change-propagation.wiki

Change-Id: Ic10f3067dca522197db639e2cfdf18591a40a9d7
---
M docs/change-propagation.wiki
1 file changed, 24 insertions(+), 2 deletions(-)

Approvals:
  Daniel Kinzler: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/docs/change-propagation.wiki b/docs/change-propagation.wiki
index b7b45dd..d072f24 100644
--- a/docs/change-propagation.wiki
+++ b/docs/change-propagation.wiki
@@ -8,11 +8,14 @@
 * Subscription management, so the repository knows which client wiki is 
interested in changes to which entities.
 * Dispatch state, so the repository knows which changes have already been 
dispatched to which client.
 * A buffer of the changes themselves.
-* Access to each client's job queue.
+* Access to each client's job queue, to push ChangeNotificationJobs to.
 
 On each client, there needs to be:
 * Usage tracking.
 * Access to sitelinks stored in the repository.
+* ChangeHandler for processing changes on the repo, triggered by 
ChangeNotificationJobs being executed.
+* AffectedPagesFinder, a mechanism to determine which pages are affected by 
which change, based on usage tracking information (see usagetracking.wiki).
+* WikiPageUpdater, for updating the client wiki's state.
 
 The basic operation of change dispatching involves running two scripts 
regularly, typically as cron jobs: dispatchChanges.php and pruneChanges.php, 
both located in the repo/maintenance/ directory. A typical cron setup could 
look like this:
 * Every minute, run dispatchChanges.php --max-time 120
@@ -62,5 +65,24 @@
 
 Per default, global MySQL locks are used to ensure that only one process can 
dispatch to any given client wiki at a time.
 
+== dispatchChanges.php script ==
+The dispatchChanges script notifies client wikis of changes on the repository. 
It reads information from the wb_changes and wb_changes_dispatch tables, and 
posts ChangeNotificationJobs to the clients' job queues.
+
+The basic scheduling algorithm is as follows: for each client wiki, determine how 
many changes it has not yet seen according to wb_changes_dispatch (we refer 
to that number as "dispatch lag"). Find the n client wikis that have the most 
lag (and have not been touched for some minimal delay). Pick one of these wikis 
at random. For the selected target wiki, find changes it has not yet seen to 
entities it is subscribed to, up to some maximum number of m changes. Construct 
a ChangeNotificationJob event containing the IDs of these changes, and push it 
to the target wiki's JobQueue. In wb_changes_dispatch, record all changes 
touched in this process as seen by the target wiki.
+
+The dispatchChanges script is designed to be safe against concurrent execution. It 
can be scaled easily by simply running more instances in parallel. The locking 
mechanism used to prevent race conditions can be configured using the 
dispatchingLockManager setting. Per default, named locks on the repo database 
are used. Redis based locks are supported as an alternative.
+
 == SiteLinkLookup ==
-Each client wiki can access the repo's sitelink information via a 
SiteLinkLookup service returned by ClientStore::getSiteLinkLookup. This 
information is stored in the wb_items_per_site table in the repo's database.
+A SiteLinkLookup allows the client wiki to determine which local pages are 
"connected" to a given Item on the repository. Each client wiki can access the 
repo's sitelink information via a SiteLinkLookup service returned by 
ClientStore::getSiteLinkLookup. This information is stored in the 
wb_items_per_site table in the repo's database.
+
+== ChangeHandler ==
+The handleChanges() method of the ChangeHandler class gets called with a list 
of changes loaded by a ChangeNotificationJob. A ChangeRunCoalescer is then used 
to merge consecutive changes by the same user to the same entity, reducing the 
number of logical events to be processed on the client, and to be presented to 
the user.
+
+For each change, ChangeHandler then determines the affected pages using the 
AffectedPagesFinder, which uses information from the wbc_entity_usage table 
(see usagetracking.wiki). It then uses a WikiPageUpdater to update the client 
wiki's state: rows are injected into the <code>recentchanges</code> database 
table, pages using the affected entity's data are re-parsed, and the web cache 
for these pages is purged.
+
+== WikiPageUpdater ==
+The WikiPageUpdater class defines three methods for updating the client wiki's 
state according to a given change on the repository:
+* scheduleRefreshLinks() will re-parse each affected page, allowing the link 
tables to be updated appropriately. This is done asynchronously using 
RefreshLinksJobs. No batching is applied, since RefreshLinksJobs are slow and 
thus benefit more from deduplication than from batching.
+* purgeWebCache() will update the web-cache for each affected page. This is 
done asynchronously in batches, using HTMLCacheUpdateJob. The batch size is 
controlled by the purgeCacheBatchSize setting.
+* injectRCRecords() will create a RecentChange entry for each affected page. 
This is done asynchronously in batches, using InjectRCRecordsJobs. The batch 
size is controlled by the recentChangesBatchSize setting.
+
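
The scheduling algorithm described in the "dispatchChanges.php script" section 
can be illustrated with a small in-memory sketch. This is not the Wikibase 
implementation (which is PHP and reads wb_changes and wb_changes_dispatch); the 
function names, the dict-based state, and the rule that wikis with zero lag are 
skipped are all assumptions made for illustration. It shows the two steps: pick 
one of the n most lagged wikis at random, then collect up to m unseen changes 
to subscribed entities while recording how far the scan got.

```python
import random


def pick_target_wiki(lag_by_wiki, last_touched, now, n, min_delay, rng=random):
    """Pick at random one of the n most lagged wikis not touched recently.

    lag_by_wiki:  hypothetical mapping wiki -> number of unseen changes
    last_touched: hypothetical mapping wiki -> time of last dispatch attempt
    """
    eligible = [
        wiki
        for wiki, lag in lag_by_wiki.items()
        # Assumption: a wiki with no lag has nothing to dispatch.
        if lag > 0 and now - last_touched.get(wiki, float("-inf")) >= min_delay
    ]
    # Most lagged first, then keep only the top n candidates.
    candidates = sorted(eligible, key=lambda w: lag_by_wiki[w], reverse=True)[:n]
    return rng.choice(candidates) if candidates else None


def select_changes(changes, seen_up_to, subscribed, m):
    """Collect up to m unseen change IDs for subscribed entities.

    changes: list of (change_id, entity_id), ordered by ascending change_id.
    Returns (batch, position), where position is the last change ID examined.
    Skipped changes (entities the wiki is not subscribed to) still advance
    the position, so they count as seen.
    """
    batch = []
    position = seen_up_to
    for change_id, entity_id in changes:
        if change_id <= seen_up_to:
            continue  # already dispatched to this wiki
        if len(batch) == m:
            break  # batch is full; later changes wait for the next pass
        position = change_id
        if entity_id in subscribed:
            batch.append(change_id)
    return batch, position


if __name__ == "__main__":
    lag = {"enwiki": 10, "dewiki": 50, "frwiki": 30, "eswiki": 0}
    touched = {"enwiki": 0, "dewiki": 95, "frwiki": 0}
    print(pick_target_wiki(lag, touched, now=100, n=2, min_delay=10))

    log = [(1, "Q1"), (2, "Q2"), (3, "Q1"), (4, "Q3"), (5, "Q1")]
    print(select_changes(log, seen_up_to=1, subscribed={"Q1", "Q3"}, m=2))
```

In this sketch the returned batch would become the payload of a 
ChangeNotificationJob, and the returned position would be what gets recorded 
as seen for the target wiki.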

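The coalescing step described in the "ChangeHandler" section (merging 
consecutive changes by the same user to the same entity) can be sketched as 
follows. The dict layout of a change and the name coalesce_runs are 
hypothetical, and the real ChangeRunCoalescer may apply further rules that the 
documentation above does not mention; this only demonstrates the run-merging 
idea on an ordered list.

```python
from itertools import groupby


def coalesce_runs(changes):
    """Merge consecutive changes that share the same entity and user.

    changes: list of dicts with hypothetical keys "id", "entity_id", "user",
    in the order they occurred. Only adjacent changes are merged; a change by
    another user (or to another entity) in between starts a new run.
    """
    coalesced = []
    # groupby only groups adjacent items with equal keys, which is exactly
    # the "consecutive run" behaviour wanted here.
    for (entity_id, user), run in groupby(
        changes, key=lambda c: (c["entity_id"], c["user"])
    ):
        coalesced.append(
            {
                "entity_id": entity_id,
                "user": user,
                "change_ids": [c["id"] for c in run],
            }
        )
    return coalesced


if __name__ == "__main__":
    changes = [
        {"id": 1, "entity_id": "Q1", "user": "Alice"},
        {"id": 2, "entity_id": "Q1", "user": "Alice"},
        {"id": 3, "entity_id": "Q1", "user": "Bob"},
        {"id": 4, "entity_id": "Q1", "user": "Alice"},
        {"id": 5, "entity_id": "Q2", "user": "Alice"},
    ]
    for event in coalesce_runs(changes):
        print(event)
```

Five raw changes collapse into four logical events here, because only the 
first two are adjacent edits by the same user to the same entity.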
-- 
To view, visit https://gerrit.wikimedia.org/r/378707
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic10f3067dca522197db639e2cfdf18591a40a9d7
Gerrit-PatchSet: 7
Gerrit-Project: mediawiki/extensions/Wikibase
Gerrit-Branch: master
Gerrit-Owner: Daniel Kinzler <[email protected]>
Gerrit-Reviewer: Aude <[email protected]>
Gerrit-Reviewer: Daniel Kinzler <[email protected]>
Gerrit-Reviewer: Thiemo Mättig (WMDE) <[email protected]>
Gerrit-Reviewer: WMDE-leszek <[email protected]>
Gerrit-Reviewer: jenkins-bot <>
