An alternative is Aaron Halfaker's mediawiki-utilities (
https://pypi.python.org/pypi/mediawiki-utilities) together with mwparserfromhell (
https://github.com/earwig/mwparserfromhell) to parse the wikitext and
extract the links. The latter is already a part of pywikibot, though.
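
To give a rough idea of what the task involves, here is a minimal sketch
that uses only the Python standard library. It is not the API of
mediawiki-utilities, mwparserfromhell or pywikibot: the regular expressions
are a crude stand-in for a real wikitext parser (they ignore templates,
nested links and bare URLs), and the names iter_pages, classify_links,
resolve_redirect, scan and SAMPLE are made up for illustration. It assumes
the usual dump layout of <page> elements containing <title> and <text>.

```python
import io
import re
import xml.etree.ElementTree as ET

# Crude patterns; a real wikitext parser handles nesting and templates.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
EXTLINK = re.compile(r"(?<!\[)\[(https?://[^\s\]]+)")
REDIRECT = re.compile(r"^\s*#REDIRECT\s*\[\[([^\[\]|#]+)", re.IGNORECASE)


def iter_pages(stream):
    """Yield (title, wikitext) pairs without loading the whole dump."""
    title, text = None, ""
    for _, elem in ET.iterparse(stream, events=("end",)):
        name = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()  # release the content of pages already handled


def classify_links(text):
    """Split link targets into internal, category and external."""
    links = {"internal": [], "category": [], "external": EXTLINK.findall(text)}
    for target in WIKILINK.findall(text):
        target = target.strip()
        is_category = target.lower().startswith("category:")
        links["category" if is_category else "internal"].append(target)
    return links


def resolve_redirect(title, redirects, max_hops=10):
    """Follow a redirect chain; stop on a cycle or after max_hops."""
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def scan(stream):
    """Collect a redirect map and per-page links in one pass."""
    redirects, links = {}, {}
    for title, text in iter_pages(stream):
        match = REDIRECT.match(text)
        if match:
            redirects[title] = match.group(1).strip()
        else:
            links[title] = classify_links(text)
    return redirects, links


# Tiny made-up dump, only to exercise the functions above.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Roma</title><revision><text>#REDIRECT [[Rome]]</text></revision></page>
  <page><title>Rome</title><revision><text>Capital of [[Italy|the country]].
See [https://example.org a site]. [[Category:Cities]]</text></revision></page>
</mediawiki>"""

redirects, links = scan(io.BytesIO(SAMPLE.encode("utf-8")))
print(redirects)
print(links["Rome"])
print(resolve_redirect("Roma", redirects))
```

For a real dump you would pass an open file (for example from bz2.open)
instead of the in-memory sample, and most likely swap the regular
expressions for mwparserfromhell.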


Cheers,
Morten


On 18 January 2016 at 10:45, Amir Ladsgroup <[email protected]> wrote:

> Hey,
> There is a really good module implemented in pywikibot called xmlreader.py
> <https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/xmlreader.py>.
> Documentation generated from the source code is also available
> <https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html#module-pywikibot.xmlreader>.
> You can read the source code and write your own script. Some scripts also
> support xmlreader; read their manuals on mediawiki.org.
>
> Best
>
> On Mon, Jan 18, 2016 at 10:00 PM Luigi Assom <[email protected]>
> wrote:
>
>> hello hello!
>> About the use of pywikibot:
>> is it possible to use it to parse the XML dump?
>>
>> I am interested in extracting links from pages (internal and external,
>> distinguishing the ones that belong to a category).
>> I would also like to handle transitive redirects.
>> I would like to process the dump without accessing the wiki, or else access
>> the wiki in batches with proper limits.
>>
>> Is there maybe something in the package that already takes care of this?
>> I've seen that https://www.mediawiki.org/wiki/Manual:Pywikibot/Scripts
>> lists a "ghost" extracting_links.py script.
>> I wanted to ask before reinventing the wheel, and to check whether
>> pywikibot is a suitable tool for the purpose.
>>
>> Thank you,
>> L.
>> _______________________________________________
>> pywikibot mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/pywikibot
>>
>