On Wed, Oct 10, 2012 at 2:15 PM, Brian Keegan <bkee...@gmail.com> wrote:
> Hi all,
>
> I'm trying to scrape some data from en.wiki about the outlinks from the body
> of articles. However, the API returns article outlinks contained within
> templates. While I can write a routine to get a list of all the templates
> and identify the article links inside these templates to remove from the
> outlinks, this is problematic if a link appears in both the body and a
> template. Thus if article X has a link to Y in the body as well as links to
> Y and Z in templates, I want to capture Y but not Y & Z.
>
> Ideally, I'd like to either (1) be able to count the number of times an
> article links out to another article (if X links to Y twice) and then
> iterate this count down for each appearance in a template or (2) count only
> the links occurring in the body and not parse the links in templates.
>
> Thank you in advance for your suggestions!
>

Neither of these things is supported by the API, because the underlying
functionality in MediaWiki (the links tables and the ParserOutput metadata)
doesn't provide or store this information. You would have to do some kind of
processing of your own to get this information.
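
As a rough illustration of what that processing could look like, here is a
minimal sketch, assuming you fetch the raw wikitext of the page yourself (for
example with action=query&prop=revisions&rvprop=content). The function names
strip_templates and count_body_links are made up for this example and are not
part of MediaWiki or its API. The sketch drops every {{...}} span, including
nested ones, and then counts the [[...]] links that remain:

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
# Only the target part (before any '#' or '|') is captured.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def strip_templates(wikitext):
    """Remove {{...}} spans, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep characters only while outside every template call.
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def count_body_links(wikitext):
    """Count link targets that appear outside template calls."""
    counts = Counter()
    for match in LINK_RE.finditer(strip_templates(wikitext)):
        title = match.group(1).strip().replace("_", " ")
        if title:
            # Page titles on en.wiki are case-insensitive in the first letter.
            counts[title[0].upper() + title[1:]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "X links to [[Y]] and [[Y|again]]. "
        "{{Navbox|list=[[Y]] [[Z]]}} "
        "{{Infobox|a={{nested|[[Z]]}}}}"
    )
    print(count_body_links(sample))
```

With the sample text, Y is counted twice (both body links) and Z is not
counted at all, which matches option (2) in the question. This is only an
approximation: it does not handle <nowiki>, HTML comments, redirects, or
File:/Category: links, and a real wikitext parser would be more robust.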

Roan
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api