Ladsgroup added a comment.
Since I'm pinged: yes, this is sorta expected, and due to the CAP theorem <https://en.wikipedia.org/wiki/CAP_theorem>. The more distributed we become and the more we scale, the bigger the discrepancies get: jobs that trigger the refresh might fail or might not get queued (it used to be 1% of the time; now it's around 0.006%, still non-zero), and, most importantly, there will be some latency between a change and it being reflected in the Wikipedias. The most important thing is that we keep only one source of truth (the SSOT principle) to avoid corruption.

All of that being said, we can definitely do better. In my last project before departing WMDE, I worked on the dispatching of changes from Wikidata to the Wikipedias (and co) and noticed that the time before a change gets reflected in the wikis is quite small, but only for changes that don't require re-parsing a page (e.g. injecting rc entries). Reparsing goes into the same queue as reparses caused by, for example, template changes (to be a bit more technical: the `htmlCacheUpdate` queue on the job runners), and that queue is quite busy (to put it mildly). Depending on the load, or if a widely used template has been changed, it can take days to get to that change.

So, to mitigate and reduce the latency:

- Have a dedicated job for sitelink changes (and for the other sitelinks of that item) and put them in a dedicated lane.
  - Having dedicated queues for higher-priority work is more aligned with distributed-computing principles (look at the Tanenbaum book).
  - This is not as straightforward as it looks, due to the internals of Wikibase.
  - I'm not at WMDE to help move it forward.
- Reduce the load on `htmlCacheUpdate`. The amount of load on it is unbelievable.
  - There are some tickets already; I remember T250205 <https://phabricator.wikimedia.org/T250205> off the top of my head.
  - I honestly don't know if anyone has really looked into why it's so massive.
  - A low-hanging fruit would be to avoid re-parsing transclusions when only the content inside `<noinclude>` has changed in the template. Updating a template's documentation would then not cause a massive cascade of several million reparses.
  - Fixes in this direction improve the health and reliability of all of our systems, from ParserCache to DB load, appservers, job runners, and edge caches. It's even greener.
  - But it's a big endeavor and will take years at least.
- As a terrible band-aid, you can write a bot that listens to the rc entries of your wiki with source wikidata and forces a reparse (action=purge) if the injected change is a sitelink change.
  - A complicating factor is that an article can subscribe to the sitelinks of another item and get notified of that item's sitelink changes (it's quite common on Commons, pun intended). Make sure to avoid reparsing because of that.

HTH

TASK DETAIL
https://phabricator.wikimedia.org/T297238
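The `<noinclude>` idea above can be sketched in a few lines: strip the `<noinclude>` sections from both revisions and only schedule a reparse when the transcluded remainder actually changed. This is a toy illustration, not MediaWiki's actual code — real wikitext handling also has `<includeonly>`, `<onlyinclude>`, and nesting to worry about:

```python
import re

# Matches <noinclude>...</noinclude> blocks (simplified; real wikitext
# parsing also has to handle nesting and <includeonly>/<onlyinclude>).
NOINCLUDE_RE = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL | re.IGNORECASE)

def transcluded_part(wikitext: str) -> str:
    """Return the wikitext as seen by pages transcluding this template."""
    return NOINCLUDE_RE.sub("", wikitext)

def needs_reparse(old_text: str, new_text: str) -> bool:
    """True only if the edit changed content outside <noinclude>."""
    return transcluded_part(old_text) != transcluded_part(new_text)

# Example: only the documentation inside <noinclude> changed,
# so no cascade of reparses would be needed.
old = "{{{1}}}<noinclude>Docs v1</noinclude>"
new = "{{{1}}}<noinclude>Docs v2</noinclude>"
print(needs_reparse(old, new))  # False
```

The check itself is cheap; the hard part is wiring it into the template-edit path before the `htmlCacheUpdate` jobs are enqueued.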
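The band-aid bot described above could look roughly like this. It's a sketch under assumptions: `rctype=external` is used to select the entries injected from Wikidata, sitelink edits are assumed to carry a `wbsetsitelink*` summary, and the wiki URL is a placeholder. A production bot would rather use pywikibot or EventStreams, and would need the guard against other items' sitelink subscriptions mentioned above:

```python
import json
import urllib.parse
import urllib.request

API = "https://xx.wikipedia.org/w/api.php"  # placeholder: your wiki here

def is_sitelink_change(comment: str) -> bool:
    """Heuristic: Wikidata sitelink edits carry summaries like
    '/* wbsetsitelink-add:1|enwiki */ Page title'. Fragile by design."""
    return "wbsetsitelink" in comment

def api_get(params: dict) -> dict:
    url = API + "?" + urllib.parse.urlencode({**params, "format": "json"})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def fetch_wikidata_changes() -> list:
    """Recent changes injected from Wikidata (rctype=external)."""
    data = api_get({
        "action": "query", "list": "recentchanges",
        "rctype": "external", "rcprop": "title|comment", "rclimit": "50",
    })
    return data["query"]["recentchanges"]

def purge(title: str) -> None:
    """Force a reparse of the page via action=purge."""
    body = urllib.parse.urlencode(
        {"action": "purge", "titles": title, "format": "json"}).encode()
    urllib.request.urlopen(urllib.request.Request(API, data=body))

if __name__ == "__main__":
    for rc in fetch_wikidata_changes():
        if is_sitelink_change(rc.get("comment", "")):
            purge(rc["title"])
```

Filtering on the summary text is brittle; a more robust bot would inspect the actual change via the Wikidata API, and would skip purges triggered by sitelink changes of items the page merely subscribes to.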
_______________________________________________
pywikibot-bugs mailing list -- pywikibot-bugs@lists.wikimedia.org