Xqt added a comment.
This is the same problem as noted in T226157 <https://phabricator.wikimedia.org/T226157>. To parse links at Wikidata, siteinfo content is required. It is cached in the apicache-py3 folder and usually expires after 30 days. See the following examples:

(A) Loading siteinfo via API for a clean apicache-py3 folder
------------------------------------------------------------

    >>> import pywikibot
    >>> from scripts.newitem import NewItemRobot
    >>> bot = NewItemRobot([])
    >>> def f(bot):
            from datetime import datetime
            site = pywikibot.Site('cs')
            start = datetime.now()
            temp = bot.get_skipping_templates(site)
            print('Time used:', datetime.now() - start)

    >>> f(bot)
    Retrieving skipping templates for site wikipedia:cs...
    WARNING: C:\pwb\GIT\core\pywikibot\tools\__init__.py:1479: UserWarning: Site wikipedia:be-tarask instantiated using different code "be-x-old"
      return obj(*__args, **__kw)
    Time used: 0:06:01.608467

This means loading the siteinfo content takes 6 minutes to complete.

(B) Try a second call of this function
--------------------------------------

    >>> f(bot)
    Time used: 0:00:00

As expected, all templates are held by the bot instance.

(C) Delete the instance cache and try again
-------------------------------------------

    >>> bot._skipping_templates = {}
    >>> f(bot)
    Retrieving skipping templates for site wikipedia:cs...
    Time used: 0:00:03.309496

The content is fetched from the apicache-py3 folder in only 3 seconds.

(D) Use preload_sites for an empty apicache-py3
-----------------------------------------------

    C:\pwb\GIT\core>pwb preload_sites
    Preloading sites of wikibooks family...
    Preloading sites of wikinews family...
    Preloading sites of wikipedia family...
    Preloading sites of wikiquote family...
    Preloading sites of wikisource family...
    Preloading sites of wikiversity family...
    Preloading sites of wikivoyage family...
    Preloading sites of wiktionary family...
    Preloading sites of wikiversity family completed.
    Preloading sites of wikivoyage family completed.
    Preloading sites of wikinews family completed.
    Preloading sites of wikisource family completed.
    Preloading sites of wikiquote family completed.
    Preloading sites of wikibooks family completed.
    Preloading sites of wiktionary family completed.
    Preloading sites of wikipedia family completed.
    Loading time used: 0:02:13.395826

Preloading needs only 2.2 minutes vs. 6 minutes via the script. Now check the script loading time:

    >>> bot._skipping_templates = {}
    >>> f(bot)
    Retrieving skipping templates for site wikipedia:cs...
    Time used: 0:00:04.476267

It again takes only a few seconds because the content is already in apicache-py3.

Conclusion
----------

I propose using the `preload_sites.py` maintenance script to preload siteinfo content until we have a better solution for parsing links. You can run it as a batch job, e.g. monthly (because the expiry time is 30 days) or more often. To force preloading you can use the global option `-API_config_expiry`.

`preload_sites.py` is a maintenance script added to current master 6.0.0.dev0. It uses threads to load several siteinfo contents simultaneously. The number of parallel workers can be set with the `-worker` option, but this is normally not necessary. The default setting depends on the number of processors of the machine.

Usage:

    > python pwb.py [-API_config_expiry:{<num>}] preload_sites [{<family>}]* [-worker{<num>}]

TASK DETAIL
  https://phabricator.wikimedia.org/T273386
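For illustration only, the lookup order observed in (A) to (C) can be sketched as a small tiered cache: an in-memory dict first, then an expiring file on disk, then the slow loader. This is not Pywikibot's implementation; `TieredCache`, `slow_loader` and `demo` are hypothetical names, and the JSON files merely stand in for the apicache-py3 folder.

```python
import json
import tempfile
import time
from pathlib import Path


class TieredCache:
    """Hypothetical sketch: memory dict in front of expiring files on disk."""

    def __init__(self, cache_dir, expiry_seconds, loader):
        self.cache_dir = Path(cache_dir)
        self.expiry_seconds = expiry_seconds
        self.loader = loader  # slow call, standing in for an API request
        self.memory = {}      # plays the role of bot._skipping_templates

    def get(self, key):
        """Return (value, source) where source names the tier that answered."""
        # Tier 1: per-instance memory, as in example (B).
        if key in self.memory:
            return self.memory[key], 'memory'
        # Tier 2: a file on disk that has not expired yet, as in example (C).
        path = self.cache_dir / f'{key}.json'
        fresh = (path.exists()
                 and time.time() - path.stat().st_mtime < self.expiry_seconds)
        if fresh:
            value = json.loads(path.read_text(encoding='utf-8'))
            source = 'disk'
        else:
            # Tier 3: the slow loader, as in example (A); refresh the disk copy.
            value = self.loader(key)
            path.write_text(json.dumps(value), encoding='utf-8')
            source = 'loader'
        self.memory[key] = value
        return value, source


def demo():
    """Replay (A), (B) and (C) against a temporary cache folder."""
    calls = []

    def slow_loader(key):
        calls.append(key)  # record every expensive load
        return {'site': key}

    with tempfile.TemporaryDirectory() as tmp:
        cache = TieredCache(tmp, expiry_seconds=30 * 24 * 3600,
                            loader=slow_loader)
        sources = [cache.get('cs')[1]]      # (A) both caches empty
        sources.append(cache.get('cs')[1])  # (B) second call
        cache.memory = {}                   # (C) drop the instance cache
        sources.append(cache.get('cs')[1])
    return sources, calls


if __name__ == '__main__':
    print(demo())
```

Filling the disk tier ahead of time is what preloading amounts to in this sketch: afterwards, even a fresh instance never reaches the slow loader until the files expire.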