I'm trying to parse DYK prep area templates, for example Template:Did you 
know/Preparation area 3 
<https://en.wikipedia.org/wiki/Template:Did_you_know/Preparation_area_3>.  
Unfortunately, these are more like flat text files than any kind of nicely 
structured data.  The stuff of interest is everything between two HTML comments:

> <!--Hooks-->
> {{main page image/DYK|image=Melissa Ong.webp|caption=Selfie of Ong, commonly 
> replicated by the Step Chickens<!--the caption length is intentional, it 
> highlights that this image is there for a specific purpose and isn't just any 
> image of Ong – please don't shorten it! Same for the ''(shown)'' –leek -->}}
> * ... that "Step Chickens" on TikTok replace their profile pictures with an 
> image ''(shown)'' of '''[[Melissa Ong]]''', whom they call "Mother Hen"?
> * ... that '''[[interfaith greetings in Indonesia]]''' include phrases from 
> Islam, Christianity, Hinduism, Buddhism, and Confucianism?
> * ... that '''[[Kimmo Leinonen]]''' helped establish both the [[Finnish 
> Hockey Hall of Fame]] and the [[IIHF Hall of Fame]]?
> * ... that the [[Pulitzer Prize for Fiction|Pulitzer Prize]]-winning novel 
> '''''[[All the Light We Cannot See]]''''' contains a sympathetic 
> [[Nazism|Nazi]]?
> * ... that a {{Convert|10|ft|m|adj=mid|-tall|0}} '''[[Lady Rainier|statue of 
> a woman]]''' in [[Seattle]] was commissioned by a local brewery in 1903?
> * ... that ...
> * ... that prior to entering politics, '''[[Herbert Salvatierra]]''' led a 
> troupe of [[carnival]] ''[[comparsa]]s''?
> * ... that [[Winston Churchill]] published '''[[Are There Men on the Moon?|an 
> essay on extraterrestrial life]]''' during the Second World War?
> <!--HooksEnd-->


I can find the comments with mwparserfromhell's Wikicode.filter_comments().  
But once I've found the two delimiting comments, how do I grab the text 
between them?  Or is the parser the wrong tool?  Would I do better to treat 
the content of the page as flat text and just iterate over it line by line, 
teasing it apart with regexes?
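
To make the flat-text alternative concrete, here is a rough sketch of what I 
have in mind, using only the standard re module.  The names HOOKS_RE and 
extract_hooks and the sample text are made up for illustration; nothing here 
comes from pywikibot or mwparserfromhell.  It also assumes each hook sits on a 
single line in the real wikitext (the wrapping above is from my mail client), 
which I haven't verified for every prep area.

```python
import re

# Non-greedy match of everything between the two delimiting comments.
# DOTALL lets "." span newlines.
HOOKS_RE = re.compile(r"<!--Hooks-->(.*?)<!--HooksEnd-->", re.DOTALL)


def extract_hooks(text):
    """Return the lines starting with '* ... that' found between
    <!--Hooks--> and <!--HooksEnd-->, or [] if the comments are missing.

    Hypothetical helper; assumes one hook per line.
    """
    match = HOOKS_RE.search(text)
    if match is None:
        return []
    hooks = []
    for line in match.group(1).splitlines():
        line = line.strip()
        if line.startswith("* ... that"):
            hooks.append(line)
    return hooks


# Made-up sample, shaped like the prep area excerpt above.
sample = """<!--Hooks-->
{{main page image/DYK|image=Example.jpg|caption=Example caption}}
* ... that '''[[Kimmo Leinonen]]''' helped establish both halls of fame?
* ... that ...
<!--HooksEnd-->"""

for hook in extract_hooks(sample):
    print(hook)
```

Note that this keeps the empty "* ... that ..." placeholder slot and ignores 
the image template, so a real version would need to handle both.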


_______________________________________________
pywikibot mailing list -- pywikibot@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/pywikibot@lists.wikimedia.org/message/XA2Y2ZFSFSLRG5TWHIV5G3QRMAK27H56/
To unsubscribe send an email to pywikibot-le...@lists.wikimedia.org
