If you are looking to check for image/file usage, it’s better to query the
API for just the images used on the page instead of trying to parse wikitext.
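
A rough sketch of what I mean, assuming the page object exposes an
imagelinks() generator whose items have a title() method, as pywikibot's
Page and FilePage do. The helper name page_uses_file is mine, not part of
any library:

```python
def page_uses_file(page, file_title):
    """Return True if the rendered page embeds the named file.

    `page` is assumed to offer imagelinks(), yielding objects whose
    title() includes the "File:" namespace prefix (pywikibot-style).
    The server reports files added by templates and Lua modules too,
    so no wikitext parsing or expansion is needed on the client.
    """
    # Titles come back with spaces, so normalise underscores first.
    wanted = file_title.replace("_", " ")
    return any(f.title() == wanted for f in page.imagelinks())
```

With pywikibot this would be called roughly as
page_uses_file(pywikibot.Page(site, nomination_title), "File:Symbol confirmed.svg").
It only tells you the file is used somewhere on the page, not where, so it
won't distinguish an approval from an earlier tick that was later overridden.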

On Tue, Mar 28, 2023 at 10:29 PM Roy Smith <r...@panix.com> wrote:

> On Mar 28, 2023, at 9:09 PM, Kunal Mehta <lego...@debian.org> wrote:
>
> I suppose it's also worth asking what you're using expand_text() for in
> the first place, to see if there's a better way to do whatever it is you
> want to :)
>
>
> That's a fair question.
>
> What I'm doing is looking at DYK nominations to evaluate if they've been
> approved.  Like so many wiki things, there's no formal definition, but the
> simple version is that I'm looking for "File:Symbol confirmed.svg".  The
> problem is that it may not appear in the raw wikitext.  An example is Bismarck
> Kuyon
> <https://en.wikipedia.org/wiki/Template:Did_you_know_nominations/Bismarck_Kuyon>.
> Looking at the page, it's easy to see the green checkmark indicating
> approval.  But looking at the wikitext source, there's no such thing.  What
> there is, is a {{DYK checklist}} template which invokes some Lua code that
> generates the checkmark based on the values in the other fields.  The
> expand_text() forces that to get run on the server side.
>
> From a machine-parsability point of view, it's insane.  But I gotta work
> with what I've been given.
>
> Ultimately, this is going to run as a bot.  The fact that it takes a
> couple of minutes to evaluate all the nominations of interest isn't
> critical.   I was doing an interactive web-based version for review
> purposes, and for that, waiting 2 minutes for the page to load sucked.
> But, I don't really need to do that, so I'll probably just go back to the
> serialized version and leave it at that.
>
> One optimization I can see is that I only really need to do the
> expand_text() on the subset of nominations which use {{DYK checklist}}, and
> not even all of those (sometimes it's possible to determine the approval
> state entirely from the text following the {{DYK checklist}}).  That will
> add a bit more complexity, which I was trying to avoid.
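
The optimization described above could be sketched like this. It is only an
illustration of the "simple version" of the approval test; the names
is_approved, APPROVAL_ICON and CHECKLIST_RE are hypothetical, and `expand`
stands in for whatever does the server-side expansion (for example
pywikibot's site.expand_text):

```python
import re

# The icon the "simple version" of the approval check looks for.
APPROVAL_ICON = "File:Symbol confirmed.svg"
# Matches the opening of a {{DYK checklist}} transclusion.
CHECKLIST_RE = re.compile(r"\{\{\s*DYK checklist\b", re.IGNORECASE)


def is_approved(wikitext, expand):
    """Check for the approval icon, expanding templates only if needed.

    `expand` is any callable mapping raw wikitext to expanded wikitext.
    It is the expensive step, so it is called only when the raw text
    has no icon but does contain a {{DYK checklist}} template.
    """
    if APPROVAL_ICON in wikitext:
        return True  # icon is already in the raw text
    if not CHECKLIST_RE.search(wikitext):
        return False  # nothing that could generate the icon
    return APPROVAL_ICON in expand(wikitext)
```

This ignores the case where a later icon overrides an earlier approval, so
it is a starting point rather than a full evaluation of the nomination.
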
>
> Even deeper down the complexity rathole, I could re-implement the Lua
> logic on the client side and avoid the expand_text() completely.  I believe
> that's what some existing bots, such as WugBot, do.  But I really didn't
> want to go there.
>
> I did a little reading about your mwbot-rs project.  At one point, I was
> actually kind of excited about Rust and might have joined you just for the
> excuse to learn it.  Maybe some day.  I am totally about your goal of
> "sustainable development of bots and tools".  We've got so many tools (some
> of which important processes like DYK are totally dependent on) which are,
> frankly, a mess of single-purpose code which can't be easily reused for
> anything else.  What I've been trying to do with dyk-tools is create a
> toolkit of reusable components which other people can build upon.  But I
> seem to be spending most of my time working around silly things like the
> {{DYK checklist}} stuff.
>
> Anyway, I hope that answers your question :-)
>
> BTW, I've mentioned this before, but I really can't recommend viztracer
> <https://github.com/gaogaotiantian/viztracer> highly enough as a
> performance analysis tool.  At one level, it's just cProfile on steroids,
> but with a snazzy graphical front end.  It's what let me figure out that it
> was expand(), not get(), which was the most expensive.  I uploaded a
> screenshot to commons.
> <https://commons.wikimedia.org/wiki/File:Screen_Shot_of_viztracer_output.png>
>
>
> _______________________________________________
> pywikibot mailing list -- pywikibot@lists.wikimedia.org
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/pywikibot@lists.wikimedia.org/message/2NPSDY46GVVRBD5BFKV4GEIYBPDCL4SL/
> To unsubscribe send an email to pywikibot-le...@lists.wikimedia.org
>
