Eric,

We don't produce dumps of the revision table in SQL format because some of
those revisions may be hidden from public view, and even metadata about
them should not be released. We do, however, publish so-called Adds/Changes
dumps once a day for each wiki, providing stub and content files in XML of
just the new pages and revisions since the previous such dump. They lag
about 12 hours behind to allow vandalism and the like to be filtered out by
wiki admins, but hopefully that's good enough for your needs. You can find
them here:
https://dumps.wikimedia.org/other/incr/
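
For what it's worth, here is a rough Python sketch of consuming one of
those stub files. The date and the exact file name
(enwiki-YYYYMMDD-stubs-meta-hist-incr.xml.gz) are assumptions based on the
directory layout, so check the listing for the day you want; it also needs
Python 3.8+ for the {*} namespace wildcard:

    import gzip
    import urllib.request
    import xml.etree.ElementTree as ET

    DAY = "20230116"  # hypothetical; pick a real date from the listing
    URL = ("https://dumps.wikimedia.org/other/incr/enwiki/%s/"
           "enwiki-%s-stubs-meta-hist-incr.xml.gz" % (DAY, DAY))

    with urllib.request.urlopen(URL) as resp:
        # Stream-decompress and parse incrementally so the whole file
        # never sits in memory at once.
        with gzip.GzipFile(fileobj=resp) as xml_stream:
            for _, page in ET.iterparse(xml_stream):
                if page.tag.split("}")[-1] != "page":
                    continue
                # A revision with no <parentid> is the first revision
                # of its page, so a page whose batch of new revisions
                # includes one was created since the previous dump.
                for rev in page.iterfind("{*}revision"):
                    if rev.find("{*}parentid") is None:
                        print(page.findtext("{*}title"))
                        break
                page.clear()  # drop elements we are done with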

Ariel Glenn
ar...@wikimedia.org

On Tue, Jan 17, 2023 at 6:22 AM Eric Andrew Lewis <
eric.andrew.le...@gmail.com> wrote:

> Hi,
>
> I am interested in performing analysis on recently created pages on
> English Wikipedia.
>
> One way to find recently created pages is to download a meta-history
> file for English Wikipedia and filter through the XML, looking for pages
> whose oldest revision falls within the desired timespan.
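>
> Roughly, I imagine something like this (an untested sketch; the file
> name is a placeholder for one of the meta-history parts, and it assumes
> the usual <page>/<revision>/<timestamp> layout of the export schema):
>
>     import bz2
>     import xml.etree.ElementTree as ET
>
>     CUTOFF = "2023-01-01T00:00:00Z"  # ISO timestamps compare lexically
>
>     with bz2.open("enwiki-latest-pages-meta-history1.xml.bz2") as f:
>         for _, page in ET.iterparse(f):
>             if page.tag.split("}")[-1] != "page":
>                 continue
>             stamps = [ts.text for ts in page.iterfind(
>                 "{*}revision/{*}timestamp")]
>             # A page is "recently created" if its oldest revision
>             # falls after the cutoff.
>             if stamps and min(stamps) >= CUTOFF:
>                 print(page.findtext("{*}title"))
>             page.clear()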
>
> Since this requires a library to parse XML string data, I would imagine
> this is much slower than a database query. Is page revision data
> available in one of the SQL dumps that I could query for this use case?
> Looking at the exported tables list
> <https://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download#Database_tables>,
> it does not look like it is. Maybe this is intentional?
>
> Thanks,
> Eric Andrew Lewis
> ericandrewlewis.com
> +1 610 715 8560
