On Sat, Aug 24, 2019 at 7:02 AM Bertel Teilfeldt Hansen <[email protected]>
wrote:

> Hi Mediawiki-api mailing listers!
>
> I'm trying to get the intro to a list of Wikipedia pages using the
> "extracts" property with "exintro=True". This works fine for most pages,
> but for a few of them the API returns an empty extract field. See for
> example:
>
> https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Anthem&exintro=True
>
> When looking at the page "https://en.wikipedia.org/wiki/Anthem" there
> definitely seems to be text before the first section, so I think I should
> be getting something. Indeed without the "exintro" parameter, I get the
> expected return.
>
> Any idea why this occurs?
>

"exintro" assumes that the first heading tag (<h1> to <h6>) marks the end
of the intro. In the HTML of that page, {{TOC_Right}} places the table of
contents before the visible text, and the table of contents contains an
<h2>, so the extract is cut off there, before any of the intro text.
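
To make the effect concrete, here is a rough Python sketch of the truncation
rule described above. It is not the actual extension code: the helper name
intro_before_first_heading, the regular expressions, and the two HTML
fragments are made up for illustration, and the real page HTML is more
involved.

```python
import re

# Matches the first opening heading tag, <h1> through <h6> (assumed rule).
HEADING_RE = re.compile(r"<h[1-6][\s>]", re.IGNORECASE)
# Crude tag stripper, good enough for these toy fragments only.
TAG_RE = re.compile(r"<[^>]+>")


def intro_before_first_heading(html: str) -> str:
    """Hypothetical helper: keep the text that precedes the first heading."""
    match = HEADING_RE.search(html)
    head = html if match is None else html[:match.start()]
    return TAG_RE.sub("", head).strip()


# Made-up fragments, not the real HTML of any Wikipedia page.
normal_page = "<p>Intro text.</p><h2>History</h2><p>More.</p>"
toc_first_page = (
    '<div class="toc"><h2>Contents</h2></div>'
    "<p>Intro text.</p><h2>History</h2>"
)

print(repr(intro_before_first_heading(normal_page)))
print(repr(intro_before_first_heading(toc_first_page)))
```

In the second fragment the first heading sits inside the table of contents,
so nothing but an opening <div> precedes it and the result is empty. Since
the query without "exintro" returns the expected text, one option is to
request the full extract and trim it on the client side.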


-- 
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
