Hi Joseph,

Thanks for this announcement.

I am looking for license information regarding the dumps, and I'm not
finding it in the pages that you linked at [1] or [2]. The license
that applies to text on Wikimedia sites is often CC-BY-SA 3.0, and the
WMF Terms of Use at https://foundation.wikimedia.org/wiki/Terms_of_Use
do not appear to provide any exception for metadata. In the absence of
a specific license, I think that the CC-BY-SA or other relevant
licenses would apply to the metadata, and that the licensing
information should be prominently included on relevant pages and in
the dumps themselves.

What do you think?

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Mon, Feb 10, 2020 at 4:28 PM Joseph Allemandou
<jalleman...@wikimedia.org> wrote:
>
> Hi Analytics People,
>
> The Wikimedia Analytics Team is pleased to announce the release of the most
> complete dataset we have to date for analyzing content and contributor
> metadata: Mediawiki History [1] [2].
>
> Data is in TSV format, usually released monthly around the 3rd of the month,
> and every new release contains the full history of metadata.
>
> The dataset contains an enhanced [3] and historified [4] version of user,
> page and revision metadata and serves as the base for the Wikistats API on
> edits, users and pages [5] [6].
>
> We hope you will have as much fun playing with the data as we have building 
> it, and we're eager to hear from you [7], whether for issues, ideas or usage 
> of the data.
>
> Analytically yours,
>
> --
> Joseph Allemandou (joal) (he / him)
> Sr Data Engineer
> Wikimedia Foundation
>
> [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
> [2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
> [3] Many pre-computed fields are present in the dataset, from edit-counts by 
> user and page to reverts and reverted information, as well as time between 
> events.
> [4] Historical usernames and page titles (as well as user groups and
> blocks), as accurate as possible, are available in addition to current
> values, and are provided in a denormalized way for every event of the dataset.
> [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
> [6] https://wikimedia.org/api/rest_v1/
> [7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
