Re: [Analytics] Announcement - Mediawiki History Dumps

2020-02-10 Thread Pine W
I was thinking about the licensing issue some more. Apparently there
was a relevant United States court case regarding metadata several
years ago, but it's unclear to me from my brief web search whether
this holding would apply to metadata from every nation. Also, I don't
know if the underlying statutes have changed since the time of that
ruling. I think that WMF Legal should be consulted regarding the
copyright status of the metadata. Also, I think that the licensing of
metadata should be explicitly addressed in the Terms of Use or a
similar document which is easily accessible to all contributors to
Wikimedia sites.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Announcement - Mediawiki History Dumps

2020-02-10 Thread Pine W
Hi Joseph,

Thanks for this announcement.

I am looking for license information regarding the dumps, and I'm not
finding it in the pages that you linked at [1] or [2]. The license
that applies to text on Wikimedia sites is often CC-BY-SA 3.0, and the
WMF Terms of Use at https://foundation.wikimedia.org/wiki/Terms_of_Use
do not appear to provide any exception for metadata. In the absence of
a specific license, I think that the CC-BY-SA or other relevant
licenses would apply to the metadata, and that the licensing
information should be prominently included on relevant pages and in
the dumps themselves.

What do you think?

Pine
( https://meta.wikimedia.org/wiki/User:Pine )



Re: [Analytics] [Wiki-research-l] Announcement - Mediawiki History Dumps

2020-02-10 Thread Neil Shah-Quinn
I want to echo what Nate said. We've been using this for more than a year
within the Wikimedia Foundation, and it has made analyses of editing
behavior much, much easier and faster, not to mention a lot less annoying.

This is the product of years of expert work by the Analytics team, and they
deserve plenty of congratulations for it.

On Mon, 10 Feb 2020 at 10:42, Nate E TeBlunthuis wrote:

> Thank you so much Joal! I've been happily using this data for some time
> and I'm optimistic that it can make doing thorough analyses of Wikimedia
> projects much more accessible to the community, students, and researchers.
>
> -- Nate
> --
> *From:* Wiki-research-l on behalf of Joseph Allemandou
> *Sent:* Monday, February 10, 2020 8:27 AM
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics; Research into Wikimedia content
> and communities <wiki-researc...@lists.wikimedia.org>; Product Analytics
> <product-analyt...@wikimedia.org>
> *Subject:* [Wiki-research-l] Announcement - Mediawiki History Dumps
> ___
> Wiki-research-l mailing list
> wiki-researc...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>


[Analytics] Announcement - Mediawiki History Dumps

2020-02-10 Thread Joseph Allemandou
Hi Analytics People,

The Wikimedia Analytics Team is pleased to announce the release of the most
complete dataset we have to date for analyzing content and contributor
metadata: Mediawiki History [1] [2].

Data is in TSV format and is released monthly, usually around the 3rd of
the month. Every new release contains the full history of metadata.
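
As a minimal sketch of working with tab-separated data of this kind, the
Python below counts rows by the value in one column. The function name, the
column index and the sample rows are all hypothetical and only illustrate
the approach; the actual field list and file layout are documented in the
readme [1].

```python
import csv
import io
from collections import Counter

# Hypothetical column position, chosen for this illustration only.
# Check the readme [1] for the real schema before relying on it.
ENTITY_COL = 1


def count_by_column(tsv_lines, col):
    """Count rows by the value found in column `col` of a tab-separated stream."""
    # QUOTE_NONE treats quote characters as ordinary text, so only tabs split fields.
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        # Skip rows too short to contain the requested column.
        if len(row) > col:
            counts[row[col]] += 1
    return counts


# Made-up sample rows standing in for the contents of a dump file.
SAMPLE = (
    "examplewiki\trevision\tcreate\n"
    "examplewiki\tpage\tcreate\n"
    "examplewiki\trevision\tcreate\n"
)

counts = count_by_column(io.StringIO(SAMPLE), ENTITY_COL)
for value, n in sorted(counts.items()):
    print(value, n)
```

Because the reader consumes the stream row by row, the same function can be
passed an open file handle instead of the in-memory sample, which avoids
loading a full-history file into memory at once.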

The dataset contains an enhanced [3] and historified [4] version of user,
page and revision metadata and serves as the basis for the Wikistats API on
edits, users and pages [5] [6].

We hope you will have as much fun playing with the data as we had building
it, and we're eager to hear from you [7], whether about issues, ideas or
usage of the data.

Analytically yours,

-- 
Joseph Allemandou (joal) (he / him)
Sr Data Engineer
Wikimedia Foundation

[1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
[2]
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
[3] Many pre-computed fields are present in the dataset, from edit counts
by user and page to revert and reverted information, as well as time
between events.
[4] Historical usernames and page titles (as well as user groups and
blocks), as accurate as possible, are available in addition to current
values, and are provided in a denormalized way for every event of the dataset.
[5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
[6] https://wikimedia.org/api/rest_v1/
[7]
https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps=Analytics-Wikistats,Analytics