An alternative is to combine Aaron Halfaker's mediawiki-utilities (https://pypi.python.org/pypi/mediawiki-utilities) with mwparserfromhell (https://github.com/earwig/mwparserfromhell) to parse the wikitext and extract the links. Note that the latter is already part of pywikibot.
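
To give a rough idea of the dump-processing steps being discussed (streaming pages, separating internal, category and external links, and following chained redirects), here is a minimal sketch that uses only the Python standard library. It is not how pywikibot, mediawiki-utilities or mwparserfromhell do it: the regular expressions are a simplified stand-in for a real wikitext parser and assume no nested brackets or templates. The helper names (iter_pages, extract_links, resolve_redirect) and the tiny SAMPLE_DUMP are made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Simplified regex stand-ins for a real wikitext parser
# (assumption: no nested brackets, no templates generating links).
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
EXTLINK_RE = re.compile(r"(?<!\[)\[(https?://[^\s\]]+)[^\]]*\]")
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\[\]|#]+)", re.IGNORECASE)


def iter_pages(stream):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    title = None
    for _, elem in ET.iterparse(stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "title":
            title = elem.text
        elif tag == "text":
            yield title, elem.text or ""
        elif tag == "page":
            elem.clear()  # release the contents of pages already processed


def extract_links(wikitext):
    """Return (internal, categories, external) link lists for one page."""
    internal, categories = [], []
    for target in WIKILINK_RE.findall(wikitext):
        target = target.strip()
        if target.lower().startswith("category:"):
            categories.append(target)
        else:
            internal.append(target)
    return internal, categories, EXTLINK_RE.findall(wikitext)


def resolve_redirect(title, redirects):
    """Follow a redirect chain until a non-redirect page or a cycle."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


# Hypothetical three-page dump, just large enough to exercise the helpers.
SAMPLE_DUMP = """<mediawiki>
  <page><title>Cat</title><revision><text>A [[Felidae|felid]] in [[Category:Pets]], see [https://example.org site].</text></revision></page>
  <page><title>Kitty</title><revision><text>#REDIRECT [[Kitten]]</text></revision></page>
  <page><title>Kitten</title><revision><text>#REDIRECT [[Cat]]</text></revision></page>
</mediawiki>"""

redirects, links = {}, {}
for title, text in iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))):
    match = REDIRECT_RE.match(text)
    if match:
        redirects[title] = match.group(1).strip()
    else:
        links[title] = extract_links(text)

print(links["Cat"])
print(resolve_redirect("Kitty", redirects))
```

For a real dump, the stream passed to iter_pages could come from bz2.open(path, "rb") instead of the in-memory sample, where path stands for wherever the dump file was saved. A proper parser such as mwparserfromhell would be the safer choice for the link extraction itself, since the regexes above will miss or mangle links in more complex wikitext.
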

Cheers,
Morten

On 18 January 2016 at 10:45, Amir Ladsgroup <[email protected]> wrote:

> Hey,
> There is a really good module implemented in pywikibot called xmlreader.py
> <https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/xmlreader.py>.
> There is also documentation generated from the source code
> <https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html#module-pywikibot.xmlreader>
> You can read the source code and write your own script. Some scripts also
> support xmlreader; read the manual for them on mediawiki.org
>
> Best
>
> On Mon, Jan 18, 2016 at 10:00 PM Luigi Assom <[email protected]>
> wrote:
>
>> hello hello!
>> about the use of pywikibot:
>> is it possible to use it to parse the xml dump?
>>
>> I am interested in extracting links from pages (internal, external, with
>> distinction from the ones belonging to a category).
>> I would also like to handle transitive redirects.
>> I would like to process the dump without accessing the wiki, or else
>> access the wiki with proper limits, in batches.
>>
>> Is there maybe something in the package already taking care of this?
>> I've seen that in https://www.mediawiki.org/wiki/Manual:Pywikibot/Scripts
>> there is a "ghost" extracting_links.py script.
>> I wanted to ask before re-inventing the wheel, and whether pywikibot is
>> a suitable tool for the purpose.
>>
>> Thank you,
>> L.
>> _______________________________________________
>> pywikibot mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/pywikibot
>>
