Hi Joseph,

Thanks for this announcement.

I am looking for license information regarding the dumps, and I'm not finding it in the pages that you linked at [1] or [2]. The license that applies to text on Wikimedia sites is often CC-BY-SA 3.0, and the WMF Terms of Use at https://foundation.wikimedia.org/wiki/Terms_of_Use do not appear to provide any exception for metadata. In the absence of a specific license, I think that the CC-BY-SA or other relevant licenses would apply to the metadata, and that the licensing information should be prominently included on relevant pages and in the dumps themselves.

What do you think?

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Mon, Feb 10, 2020 at 4:28 PM Joseph Allemandou <jalleman...@wikimedia.org> wrote:
>
> Hi Analytics People,
>
> The Wikimedia Analytics Team is pleased to announce the release of the most
> complete dataset we have to date to analyze content and contributors
> metadata: Mediawiki History [1] [2].
>
> Data is in TSV format, released monthly around the 3rd of the month usually,
> and every new release contains the full history of metadata.
>
> The dataset contains an enhanced [3] and historified [4] version of user,
> page and revision metadata and serves as a base to Wikistats API on edits,
> users and pages [5] [6].
>
> We hope you will have as much fun playing with the data as we have building
> it, and we're eager to hear from you [7], whether for issues, ideas or usage
> of the data.
>
> Analytically yours,
>
> --
> Joseph Allemandou (joal) (he / him)
> Sr Data Engineer
> Wikimedia Foundation
>
> [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
> [2]
> https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
> [3] Many pre-computed fields are present in the dataset, from edit-counts by
> user and page to reverts and reverted information, as well as time between
> events.
> [4] As accurate as possible historical usernames and page-titles (as well as
> user-groups and blocks) are available in addition to current values, and are
> provided in a denormalized way to every event of the dataset.
> [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
> [6] https://wikimedia.org/api/rest_v1/
> [7]
> https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics