As masti says, if you're interested in the content of all pages then using
a dump is much more efficient. There are some very useful Python 3
libraries for processing them here:
http://pythonhosted.org/mediawiki-utilities/
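For a sense of how dump processing looks in practice, here's a minimal sketch using only the standard library. The namespace URI and element layout follow the MediaWiki XML export schema, but the exact schema version varies between dump generations, so check it against the dump you actually download (the mediawiki-utilities libraries wrap this kind of streaming for you; the tiny inline sample here just stands in for a real, bz2-compressed dump file):

```python
# Minimal sketch: stream pages out of a MediaWiki XML dump with the
# standard library. Real dumps are bz2-compressed; wrap the file with
# bz2.open(path, "rt") instead of the StringIO used here for illustration.
import io
import xml.etree.ElementTree as ET

# Namespace used by the MediaWiki export format (the version number in the
# URI may differ depending on when the dump was generated).
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

# A stand-in for a real dump file, for demonstration only.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision>
      <text>Example article text.</text>
    </revision>
  </page>
</mediawiki>"""

def iter_pages(fileobj):
    """Yield (title, text) pairs, clearing each <page> element as we go
    so memory stays flat even on multi-gigabyte dumps."""
    for event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text")
            yield title, text
            elem.clear()  # release the parsed subtree

for title, text in iter_pages(io.StringIO(SAMPLE_DUMP)):
    print(title, "->", text)
```

Because `iterparse` streams the XML and each `<page>` subtree is cleared after use, this approach works on full-history dumps without loading the whole file into memory.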

Also, there's a bunch of researchers who are familiar with this sort of
problem to be found over on wiki-research-l:
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l (I'm one of
them)


Cheers,
Morten


On 16 March 2017 at 11:23, masti <[email protected]> wrote:

> If you set content to False, page.text will be None and the content
> will be fetched live once you use it.
>
> Working on the content of all pages will be easier from dumps, since you
> download them compressed for offline use; otherwise you generate even
> more traffic against the live wiki.
>
> masti
>
>
>
> On 16.03.2017 18:04, Haifeng Zhang wrote:
>
> Hi, folks,
>
> Recently I've been working on a research project that needs to extract
> article information from Wikipedia.
>
> I managed to get pywikibot working on my computer and was able to pull
> out a few simple results.
>
> One question is regarding a method called
> pywikibot.pagegenerators.AllpagesPageGenerator.
>
> By setting the argument "content" to True, it returns a page generator
> whose pages carry the text of the current version. But which version is
> returned if the argument is set to False?
>
> Also, is there a way in pywikibot to get a page generator that contains
> articles/pages up to a certain date?
>
> Maybe pywikibot is not the right tool for this.
>
> I was thinking of using wiki dump data instead of using a wiki API.
>
> But it seems the files are huge. I'd appreciate it if you have any ideas
> for dealing with this.
>
>
> Thanks a lot!
>
> hz.cmu
>
_______________________________________________
pywikibot mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikibot
