On Wed, Oct 10, 2012 at 2:15 PM, Brian Keegan <bkee...@gmail.com> wrote:
> Hi all,
>
> I'm trying to scrape some data from en.wiki about the outlinks from the body
> of articles. However, the API returns article outlinks contained within
> templates. While I can write a routine to get a list of all the templates
> and identify the article links inside these templates to remove from the
> outlinks, this is problematic if a link appears in both the body and a
> template. Thus if article X has a link to Y in the body as well as links to
> Y and Z in templates, I want to capture Y but not Y & Z.
>
> Ideally, I'd like to either (1) be able to count the number of times an
> article links out to another article (if X links to Y twice) and then
> iterate this count down for each appearance in a template or (2) count only
> the links occurring in the body and not parsing the links in templates.
>
> Thank you in advance for your suggestions!
>
Neither of these things is supported by the API, because the
underlying functionality in MediaWiki (the links tables and the
ParserOutput metadata) doesn't provide or store this information. You
would have to do some kind of processing of your own to get this
information.

Roan
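
As a rough illustration of the "processing of your own" mentioned above, here
is a minimal Python sketch of option (2): it takes an article's raw wikitext
(which you could fetch with prop=revisions&rvprop=content), drops everything
inside {{...}} template calls, and counts the remaining [[...]] links. The
function names strip_templates and count_body_links are made up for this
example, and the parsing is deliberately crude: it does not expand templates,
resolve redirects, or handle <nowiki>, comments, or leading-colon links such
as [[:Category:Foo]], so treat it as a starting point under those assumptions
rather than a faithful reimplementation of the MediaWiki parser.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the target page title without the section or label.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Assumption: links with these prefixes are not ordinary article links.
SKIPPED_PREFIXES = {"file", "image", "category"}


def strip_templates(wikitext):
    """Remove {{...}} spans, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def count_body_links(wikitext):
    """Count links to each target that appear outside any template call."""
    body = strip_templates(wikitext)
    counts = Counter()
    for match in LINK_RE.finditer(body):
        target = match.group(1).strip().replace("_", " ")
        if not target:
            continue
        prefix = target.split(":", 1)[0].lower()
        if ":" in target and prefix in SKIPPED_PREFIXES:
            continue
        # MediaWiki treats the first letter of a title as case-insensitive.
        counts[target[0].upper() + target[1:]] += 1
    return counts
```

For the example in the original question, wikitext such as
"See [[Y]] and [[Y|again]]. {{Navbox|list=[[Y]] [[Z]]}}" yields a count of 2
for Y and nothing for Z. Links that a template generates from its own
definition never appear in the article's wikitext at all, so they are
excluded automatically.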

_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
