Xqt added a comment.

  This is the same problem as noted in T226157 
  <https://phabricator.wikimedia.org/T226157>. To parse links at Wikidata, 
  siteinfo content is required. It is cached inside the apicache-py3 folder 
  and usually expires after 30 days. See the following examples:
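
  To illustrate where that cache comes in, here is a minimal sketch of a 
  siteinfo lookup that triggers such a cached API request (the 'general' 
  property and the printed key are just examples):

    import pywikibot

    site = pywikibot.Site('cs', 'wikipedia')
    # The first access sends a meta=siteinfo API request; the response is
    # written to the apicache-py3 folder and reused until it expires.
    general = site.siteinfo['general']
    print(general['sitename'])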
  
  (A) Loading siteinfo via API for a clean apicache-py3 folder
  -------------------------------------------------------------
  
    >>> import pywikibot
    >>> from scripts.newitem import NewItemRobot
    >>> bot = NewItemRobot([])
    >>> def f(bot):
        from datetime import datetime
        site = pywikibot.Site('cs')
        start = datetime.now()
        temp = bot.get_skipping_templates(site)
        print('Time used:', datetime.now() - start)
    
        
    >>> f(bot)
    Retrieving skipping templates for site wikipedia:cs...
    WARNING: C:\pwb\GIT\core\pywikibot\tools\__init__.py:1479: UserWarning: 
Site wikipedia:be-tarask instantiated using different code "be-x-old"
      return obj(*__args, **__kw)
    
    Time used: 0:06:01.608467
  
  This means that loading the siteinfo content takes 6 minutes to complete.
  
  (B) Try a second call of this function:
  ----------------------------------------
  
    >>> f(bot)
    Time used: 0:00:00
  
  As expected, all templates are now held by the bot instance.
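
  This is just per-site memoization inside the bot. A simplified sketch of the 
  pattern (not the actual implementation; `_fetch_skipping_templates` is a 
  hypothetical helper standing in for the real lookup):

    def get_skipping_templates(self, site):
        """Return the skipping templates for site, cached per bot instance."""
        if site not in self._skipping_templates:
            # Only the first call per site is expensive; later calls are
            # answered from the instance dict.
            self._skipping_templates[site] = self._fetch_skipping_templates(site)
        return self._skipping_templates[site]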
  
  (C) Delete the instance cache and try again
  --------------------------------------------
  
    >>> bot._skipping_templates = {}
    >>> f(bot)
    Retrieving skipping templates for site wikipedia:cs...
    Time used: 0:00:03.309496
  
  The content is fetched from the apicache-py3 folder in only 3 seconds.
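
  If you want to inspect that disk cache, something like the following should 
  do (that apicache-py3 sits next to user-config.py, i.e. in 
  `config.base_dir`, is an assumption):

    import os
    from pywikibot import config

    # Cached API responses are stored as files inside the apicache-py3 folder.
    cache_dir = os.path.join(config.base_dir, 'apicache-py3')
    print(len(os.listdir(cache_dir)), 'cached API responses in', cache_dir)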
  
  (D) Use preload_sites for an empty apicache-py3
  ------------------------------------------------
  
    C:\pwb\GIT\core>pwb preload_sites
    Preloading sites of wikibooks family...
    Preloading sites of wikinews family...
    Preloading sites of wikipedia family...
    Preloading sites of wikiquote family...
    Preloading sites of wikisource family...
    Preloading sites of wikiversity family...
    Preloading sites of wikivoyage family...
    Preloading sites of wiktionary family...
    Preloading sites of wikiversity family completed.
    Preloading sites of wikivoyage family completed.
    Preloading sites of wikinews family completed.
    Preloading sites of wikisource family completed.
    Preloading sites of wikiquote family completed.
    Preloading sites of wikibooks family completed.
    Preloading sites of wiktionary family completed.
    Preloading sites of wikipedia family completed.
    Loading time used: 0:02:13.395826
  
  Preloading needs only 2.2 minutes vs. 6 minutes via the script. Now check 
  the script loading time:
  
    >>> bot._skipping_templates = {}
    >>> f(bot)
    Retrieving skipping templates for site wikipedia:cs...
    Time used: 0:00:04.476267
  
  Again it takes only a few seconds because the content is already in 
  apicache-py3.
  
  Conclusion
  ----------
  
  I propose using the `preload_sites.py` maintenance script to preload the 
  siteinfo content until we have a better solution for parsing links. You may 
  run it as a batch job, e.g. monthly (because the expiry time is 30 days) or 
  more often. To force preloading you can use the global option 
  `-API_config_expiry`.
  
  `preload_sites.py` is a maintenance script added to the current master 
  6.0.0.dev0. It uses threads to load several siteinfo contents 
  simultaneously. The number of parallel workers can be given with the 
  `-worker` option, but this is normally not necessary; the default depends 
  on the number of processors of the machine.
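
  Conceptually it does something like the following (a simplified sketch, not 
  the actual implementation; looking up the codes via `Family.load` and 
  `languages_by_size` is an assumption):

    from concurrent.futures import ThreadPoolExecutor

    import pywikibot
    from pywikibot.family import Family

    def preload(family, code):
        """Touch the siteinfo of one site so it ends up in apicache-py3."""
        site = pywikibot.Site(code, family)
        site.siteinfo.get('general')

    codes = Family.load('wikipedia').languages_by_size
    # Without max_workers the pool size is derived from the CPU count.
    with ThreadPoolExecutor() as executor:
        list(executor.map(lambda code: preload('wikipedia', code), codes))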
  
  Usage:
  
  > python pwb.py [-API_config_expiry:{<num>}] preload_sites [{<family>}]* [-worker:{<num>}]

TASK DETAIL
  https://phabricator.wikimedia.org/T273386
