If you want to check for image/file usage, it's better to query the API for just the images used on the page instead of trying to parse the wikitext.
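
Something along these lines might work. It is a minimal sketch, assuming the MediaWiki API's `action=query&prop=images` module and its `imimages` filter, which restricts the result to the named files. The helper names (`build_image_query`, `page_uses_file`, `fetch`) and the User-Agent string are made up for illustration, and the code is untested against live nomination pages. pywikibot's `Page.imagelinks()` should expose the same data if you would rather stay inside the framework.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia's API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_image_query(page_title, file_title):
    """Build query parameters asking whether page_title uses file_title."""
    return {
        "action": "query",
        "prop": "images",
        "titles": page_title,
        # imimages limits the listing to the files named here.
        "imimages": file_title,
        "format": "json",
        "formatversion": "2",
    }


def page_uses_file(response, file_title):
    """Return True if any page in a formatversion=2 response lists file_title."""
    pages = response.get("query", {}).get("pages", [])
    return any(
        image.get("title") == file_title
        for page in pages
        for image in page.get("images", [])
    )


def fetch(params):
    """Send the query to the API and decode the JSON reply (needs network)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; replace it with one identifying your bot.
    request = urllib.request.Request(
        url, headers={"User-Agent": "dyk-example/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

Usage would be roughly `page_uses_file(fetch(build_image_query(title, "File:Symbol confirmed.svg")), "File:Symbol confirmed.svg")`. Since `titles` accepts several pages separated by `|`, a batch of nominations could probably be checked in one request, although `page_uses_file` as written would then need to report per page rather than return a single boolean.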

On Tue, Mar 28, 2023 at 10:29 PM Roy Smith <r...@panix.com> wrote:

> On Mar 28, 2023, at 9:09 PM, Kunal Mehta <lego...@debian.org> wrote:
>
> > I suppose it's also worth asking what you're using expand_text() for in
> > the first place, to see if there's a better way to do whatever it is you
> > want to :)
>
> That's a fair question.
>
> What I'm doing is looking at DYK nominations to evaluate if they've been
> approved. Like so many wiki things, there's no formal definition, but the
> simple version is that I'm looking for "File:Symbol confirmed.svg". The
> problem is that it may not appear in the raw wikitext. An example is Bismarck
> Kuyon
> <https://en.wikipedia.org/wiki/Template:Did_you_know_nominations/Bismarck_Kuyon>.
> Looking at the page, it's easy to see the green checkmark indicating
> approval. But looking at the wikitext source, there's no such thing. What
> there is, is a {{DYK checklist}} template which invokes some Lua code that
> generates the checkmark based on the values in the other fields. The
> expand_text() forces that to get run on the server side.
>
> From a machine-parsability point of view, it's insane. But I gotta work
> with what I've been given.
>
> Ultimately, this is going to run as a bot. That fact that it takes a
> couple of minutes to evaluate all the nominations of interest isn't
> critical. I was doing an interactive web-based version for review
> purposes, and for that, waiting 2 minutes for the page to load sucked.
> But, I don't really need to do that, so I'll probably just go back to the
> serialized version and leave it at that.
>
> One optimization I can see is that I only really need to do the
> expand_text() on the subset of nominations which use {{DYK checklist}}, and
> not even all of those (sometimes it's possible to determine the approval
> state entirely from the text following the {{DYK checklist}}). That will
> add a bit more complexity, which I was trying to avoid.
>
> Even deeper down the complexity rathole, I could re-implement the Lua
> logic on the client side and avoid the expand_text() completely. I believe
> that's what some existing bots, such as WugBot do. But I really didn't
> want to go there.
>
> I did a little reading about your mwbot-rs project. At one point, I was
> actually kind of excited about Rust and might have joined you just for the
> excuse to learn it. Maybe some day. I am totally about your goal of
> "sustainable development of bots and tools". We've got so many tools (some
> of which important processes like DYK are totally dependent on) which are,
> frankly, a mess of single-purpose code which can't be easily reused for
> anything else. What I've been trying to do with dyk-tools is create a
> toolkit of reusable components which other people can build upon. But I
> seem to be spending most of my time working around silly things like the
> {{DYK checklist}} stuff.
>
> Anyway, I hope that answers your question :-)
>
> BTW, I've mentioned this before, but I really can't recommend viztracer
> <https://github.com/gaogaotiantian/viztracer> highly enough as a
> performance analysis tool. At one level, it's just cProfile on steroids,
> but with a snazzy graphical front end. It's what let me figure out that it
> was expand(), not get(), which was the most expensive. I uploaded a
> screenshot to commons.
> <https://commons.wikimedia.org/wiki/File:Screen_Shot_of_viztracer_output.png>
>
> _______________________________________________
> pywikibot mailing list -- pywikibot@lists.wikimedia.org
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/pywikibot@lists.wikimedia.org/message/2NPSDY46GVVRBD5BFKV4GEIYBPDCL4SL/
> To unsubscribe send an email to pywikibot-le...@lists.wikimedia.org
>
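
If you do stay with expand_text(), the optimization described in the quoted message (only expanding nominations that use {{DYK checklist}}) could be sketched roughly as below. This is an illustration under assumptions, not how dyk-tools does it: `needs_expansion` is a hypothetical helper, the template-name pattern is a guess at how the template is usually written, and it applies only the "simple version" of the approval test quoted above.

```python
import re

# The icon the quoted message treats as the sign of approval.
APPROVAL_ICON = "File:Symbol confirmed.svg"

# Assumption: the template is written as {{DYK checklist ...}}, possibly
# with an underscore, extra whitespace, or different capitalisation.
CHECKLIST_RE = re.compile(r"\{\{\s*DYK[ _]checklist\b", re.IGNORECASE)


def needs_expansion(wikitext: str) -> bool:
    """Return True only when the raw wikitext cannot settle approval itself.

    If the approval icon is already present in the raw text, no server-side
    expansion is needed. Otherwise expansion is only worthwhile when a
    {{DYK checklist}} template could generate the icon.
    """
    if APPROVAL_ICON in wikitext:
        return False
    return bool(CHECKLIST_RE.search(wikitext))
```

Only the pages for which `needs_expansion` returns True would then go through expand_text(); the rest can be evaluated from the raw wikitext alone.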
_______________________________________________
pywikibot mailing list -- pywikibot@lists.wikimedia.org
Public archives at https://lists.wikimedia.org/hyperkitty/list/pywikibot@lists.wikimedia.org/message/HAHWDSVDSEO435G5UUMAUKU2M5OF3BXJ/
To unsubscribe send an email to pywikibot-le...@lists.wikimedia.org