As masti says, if you're interested in the content of all pages, using a dump is much more efficient. There are some very useful Python 3 libraries for processing dumps here: http://pythonhosted.org/mediawiki-utilities/
Also, there's a bunch of researchers familiar with this sort of problem over on wiki-research-l: https://lists.wikimedia.org/mailman/listinfo/wiki-research-l (I'm one of them).

Cheers,
Morten

On 16 March 2017 at 11:23, masti <[email protected]> wrote:

> If you set content to False, page.text will be None and the content
> will be fetched live once you use it.
>
> Working on the content of all pages will be easier from dumps, as you
> download them compressed and work on them offline. Otherwise you
> generate even more traffic on the live wiki.
>
> masti
>
> On 16.03.2017 18:04, Haifeng Zhang wrote:
>
> > Hi, folks,
> >
> > Recently, I've been working on a research project that requires
> > extracting article information from Wikipedia.
> >
> > I managed to get pywikibot working on my computer and was able to
> > pull out a few simple results.
> >
> > One question is about the method
> > pywikibot.pagegenerators.AllpagesPageGenerator.
> >
> > By setting the argument "content" to True, it returns a page
> > generator with the current version. But which version is returned
> > if the argument is set to False?
> >
> > Also, is there a way in pywikibot to get a page generator that
> > contains articles/pages up to a certain date?
> >
> > Maybe pywikibot is not the right tool for this.
> >
> > I was thinking of using wiki dump data instead of the wiki API.
> >
> > But the files seem to be huge. I'd appreciate any ideas on how to
> > deal with this.
> >
> > Thanks a lot!
> >
> > hz.cmu
> >
> > _______________________________________________
> > pywikibot mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/pywikibot
