On Mar 28, 2023, at 9:09 PM, Kunal Mehta <lego...@debian.org> wrote:
> I suppose it's also worth asking what you're using expand_text() for in the 
> first place, to see if there's a better way to do whatever it is you want to 
> :)


That's a fair question.

What I'm doing is looking at DYK nominations to evaluate if they've been 
approved.  Like so many wiki things, there's no formal definition, but the 
simple version is that I'm looking for "File:Symbol confirmed.svg".  The 
problem is that it may not appear in the raw wikitext.  An example is Bismarck 
Kuyon 
<https://en.wikipedia.org/wiki/Template:Did_you_know_nominations/Bismarck_Kuyon>.
  Looking at the page, it's easy to see the green checkmark indicating 
approval.  But looking at the wikitext source, there's no such thing.  What 
there is, is a {{DYK checklist}} template which invokes some Lua code that 
generates the checkmark based on the values in the other fields.  The 
expand_text() forces that to get run on the server side.
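
A minimal sketch of that check, for illustration only. The names `looks_approved` and `fake_expand` are made up here, and the `status=y` parameter is just a placeholder. In real use the `expand` argument would be a server-side expansion call (something like pywikibot's `site.expand_text`); it is passed in as a callable so the logic can be tried without touching the network. This covers only the "simple version" of approval described above.

```python
from typing import Callable

# The marker named above; its presence is the "simple version" of approval.
APPROVAL_MARKER = "File:Symbol confirmed.svg"


def looks_approved(wikitext: str, expand: Callable[[str], str]) -> bool:
    """Return True if the approval marker appears once templates are expanded.

    `expand` is a stand-in for whatever performs server-side expansion.
    """
    return APPROVAL_MARKER in expand(wikitext)


def fake_expand(text: str) -> str:
    """Hypothetical stand-in: pretend the checklist template renders a checkmark."""
    return text.replace(
        "{{DYK checklist|status=y}}",
        "[[File:Symbol confirmed.svg|16px]]",
    )


raw = "{{DYK checklist|status=y}}"
print(APPROVAL_MARKER in raw)            # the raw wikitext lacks the marker
print(looks_approved(raw, fake_expand))  # the expanded text has it
```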

From a machine-parsability point of view, it's insane.  But I gotta work with 
what I've been given.

Ultimately, this is going to run as a bot.  The fact that it takes a couple of 
minutes to evaluate all the nominations of interest isn't critical.   I was 
doing an interactive web-based version for review purposes, and for that, 
waiting 2 minutes for the page to load sucked.  But, I don't really need to do 
that, so I'll probably just go back to the serialized version and leave it at 
that.

One optimization I can see is that I only really need to do the expand_text() 
on the subset of nominations which use {{DYK checklist}}, and not even all of 
those (sometimes it's possible to determine the approval state entirely from 
the text following the {{DYK checklist}}).  That will add a bit more 
complexity, which I was trying to avoid.
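
Roughly, that short-circuit could look like the sketch below. The function name, the regular expression, and the injected `expand` callable are assumptions for illustration, not code from dyk-tools. It only skips the expensive call when the marker is already in the raw text or when no {{DYK checklist}} is present; it does not try to interpret the text following the checklist.

```python
import re
from typing import Callable

APPROVAL_MARKER = "File:Symbol confirmed.svg"

# Assumed pattern: "{{DYK checklist" followed by "|" or "}", allowing
# whitespace, an underscore instead of the space, and any letter case.
CHECKLIST_RE = re.compile(r"\{\{\s*DYK[ _]checklist\s*[|}]", re.IGNORECASE)


def looks_approved_lazy(wikitext: str, expand: Callable[[str], str]) -> bool:
    """Check for the approval marker, expanding templates only when needed."""
    if APPROVAL_MARKER in wikitext:
        # Marker is already in the raw wikitext; no server round trip needed.
        return True
    if not CHECKLIST_RE.search(wikitext):
        # No checklist template, so expansion cannot produce the marker
        # via that route; skip the expensive call.
        return False
    return APPROVAL_MARKER in expand(wikitext)
```

Passing `expand` in as an argument keeps the decision logic separate from the API call, so the expensive part is easy to count or stub out.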

Even deeper down the complexity rathole, I could re-implement the Lua logic on 
the client side and avoid the expand_text() completely.  I believe that's what 
some existing bots, such as WugBot, do.  But I really didn't want to go there.

I did a little reading about your mwbot-rs project.  At one point, I was 
actually kind of excited about Rust and might have joined you just for the 
excuse to learn it.  Maybe some day.  I am totally on board with your goal of 
"sustainable development of bots and tools".  We've got so many tools (some of 
which important processes like DYK are totally dependent on) which are, 
frankly, a mess of single-purpose code which can't be easily reused for 
anything else.  What I've been trying to do with dyk-tools is create a toolkit 
of reusable components which other people can build upon.  But I seem to be 
spending most of my time working around silly things like the {{DYK checklist}} 
stuff.

Anyway, I hope that answers your question :-)

BTW, I've mentioned this before, but I really can't recommend viztracer 
<https://github.com/gaogaotiantian/viztracer> highly enough as a performance 
analysis tool.  At one level, it's just cProfile on steroids, but with a snazzy 
graphical front end.  It's what let me figure out that it was expand(), not 
get(), which was the most expensive.  I uploaded a screenshot to Commons 
<https://commons.wikimedia.org/wiki/File:Screen_Shot_of_viztracer_output.png>.
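
For anybody who wants to try the same kind of comparison without installing anything, here is a rough sketch using plain cProfile from the standard library (this is not viztracer, and it has no graphical view). The functions `fake_get` and `fake_expand` are hypothetical stand-ins that just sleep; in real code you would profile the actual calls.

```python
import cProfile
import pstats
import time


def fake_get() -> None:
    """Hypothetical stand-in for fetching a page."""
    time.sleep(0.01)


def fake_expand() -> None:
    """Hypothetical stand-in for a slower server-side expansion."""
    time.sleep(0.05)


def cumulative_times(names: set[str], repeat: int = 3) -> dict[str, float]:
    """Profile a few get/expand rounds and return cumulative seconds by name."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        fake_get()
        fake_expand()
    profiler.disable()

    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, function name) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    return {
        key[2]: value[3]
        for key, value in stats.stats.items()
        if key[2] in names
    }


times = cumulative_times({"fake_get", "fake_expand"})
for name, seconds in sorted(times.items(), key=lambda item: -item[1]):
    print(f"{name}: {seconds:.3f}s cumulative")
```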


_______________________________________________
pywikibot mailing list -- pywikibot@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/pywikibot@lists.wikimedia.org/message/2NPSDY46GVVRBD5BFKV4GEIYBPDCL4SL/
To unsubscribe send an email to pywikibot-le...@lists.wikimedia.org