Hi Kate,

Thank you very much for your feedback.

I think I forgot to mention that the 2020 wiki comparison dataset is so
great that, as of today, I will start using it in my R programming classes
to help students learn about hypothesis testing and join operations across
data frames :)

Thank you for all the hard work!

> We'll keep an eye toward consistency, but we have not made the data
> extraction into a fully automated process.
I have seen the code, and I know the pain all too well... After all the
work, there is always one more detail at the end that was not considered at
the beginning. Similar things made me cry in the past in my work on
Wikidata... I sympathise with you and the team, and I wish you all the best
in your future work!

And by the way, the differences in column names are really not a big deal:
the variable semantics are so obvious that the columns match easily. Good work!

With best wishes,
Goran

Goran S. Milovanović, PhD
Data Scientist, Software Department
Wikimedia Deutschland

------------------------------------------------
"It's not the size of the dog in the fight,
it's the size of the fight in the dog."
- Mark Twain
------------------------------------------------


On Tue, Feb 23, 2021 at 11:38 PM Kate Zimmerman <kzimmer...@wikimedia.org>
wrote:

> Hi Goran,
>
> We'll keep an eye toward consistency, but we have not made the data
> extraction into a fully automated process.
>
> We identified 3 columns that had slightly different names and we'll fix
> them:
> - overall SIZE rank (2020) vs. overall size rank (2018, 2019)
> - second month editor retention (2020) vs. second-month new editor
>   retention (2018, 2019)
> - monthly structured discussions messages (2020) vs. monthly structured
>   discussions (Flow) messages (2018, 2019)
>
> The "project code" column was duplicated in 2020; the duplicate has now
> been removed.
>
> Finally, in 2019 we added 3 new columns that we hadn't tracked in
> 2018: content pages, cumulative content edits, and edits per content page.
> Please be aware that we may add or change columns in the future as needs
> evolve.
>
> Warm regards,
> Kate
>
> On Tue, Feb 23, 2021 at 12:37 PM Goran Milovanovic <
> goran.milovanovic_...@wikimedia.de> wrote:
>
>> Well, it would be desirable to maintain consistent column names across
>> the years...
>>
>> Best,
>> Goran
>>
>> On Tue, Feb 23, 2021 at 2:42 AM Jennifer Wang <jw...@wikimedia.org>
>> wrote:
>>
>>> Hi all,
>>>
>>> For your reference, we have updated the wiki comparison dataset
>>> <https://www.mediawiki.org/wiki/Product_Analytics/Comparison_datasets>
>>> with 2020 data
>>> <https://docs.google.com/spreadsheets/d/1a-UBqsYtJl6gpauJyanx0nyxuPqRvhzJRN817XpkuS8/edit?usp=sharing>.
>>> If you have any feedback or suggestions, please let us know via
>>> product-analyt...@wikimedia.org.
>>>
>>> Regards,
>>> Jennifer & Product Analytics
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>