Hey Simon,

Is it possible that there is an issue in how you are aggregating the data? I downloaded the dump for 2018-10 (https://dumps.wikimedia.org/other/clickstream/2018-10/clickstream-enwiki-2018-10.tsv.gz) and found the following lines:

Global_warming Climate_change link 3048
Climate_change Global_warming link 6155

Best,
Isaac

On Mon, May 13, 2019 at 8:38 AM Simon Munzert <simon.munz...@googlemail.com> wrote:
> Hi all,
>
> I've got a question on the completeness of the clickstream dataset. I
> downloaded the dumps for 2018 from
> https://dumps.wikimedia.org/other/clickstream/ (English Wikipedia only).
> When I filter for the article pair "Climate change" and "Global warming"
> (either one being either prev or curr) for all of 2018, this is what I get:
>
>   prev           curr           type      n month
>   <chr>          <chr>          <chr> <dbl> <chr>
> 1 Global_warming Climate_change link    755 2018-04
> 2 Global_warming Climate_change link    810 2018-05
> 3 Climate_change Global_warming link   3730 2018-05
> 4 Climate_change Global_warming link   3962 2018-09
> 5 Climate_change Global_warming link   5865 2018-11
> 6 Climate_change Global_warming link   5491 2018-12
> 7 Global_warming Climate_change link   2227 2018-12
>
> The visit numbers seem plausible. But why is there no data on, e.g.,
> January to March? And why is there data for both directions in May and
> December, but not for the others? This seems implausible given the
> popularity of the articles.
>
> Here's another example:
>
>   prev          curr          type      n month
>   <chr>         <chr>         <chr> <dbl> <chr>
> 1 Smog          Air_pollution link    140 2018-01
> 2 Air_pollution Smog          link     82 2018-02
> 3 Air_pollution Smog          link    295 2018-04
> 4 Air_pollution Smog          link    215 2018-05
> 5 Smog          Air_pollution link     85 2018-06
> 6 Air_pollution Smog          link    233 2018-07
> 7 Air_pollution Smog          link     45 2018-09
> 8 Smog          Air_pollution link     96 2018-10
> 9 Smog          Air_pollution link     90 2018-12
>
> Am I missing something here?
>
> Thanks in advance,
> Simon
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics

--
Isaac Johnson -- Research Scientist -- Wikimedia Foundation
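
For anyone who wants to re-check a pair on their own copy of the dumps, here is a minimal Python sketch of the lookup Simon describes: keep every row where the two titles appear as prev/curr in either order. It rests on a few assumptions. Each monthly dump is a gzipped, tab-separated file with the four columns prev, curr, type, n and no header row. The local filenames simply mirror the names on dumps.wikimedia.org. The helper names pair_rows and pair_rows_from_dump are hypothetical and used only for illustration.

```python
import gzip
import os
from typing import Iterable, Iterator, List, Tuple

# Assumed row layout of a clickstream dump: prev, curr, type, n
# (tab-separated, one pair per line, no header row).
Row = Tuple[str, str, str, int]


def pair_rows(lines: Iterable[str], a: str, b: str) -> Iterator[Row]:
    """Yield every row whose (prev, curr) is (a, b) or (b, a)."""
    wanted = {(a, b), (b, a)}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip blank or malformed lines rather than failing
        prev, curr, link_type, n = parts
        if (prev, curr) in wanted:
            yield prev, curr, link_type, int(n)


def pair_rows_from_dump(path: str, a: str, b: str) -> List[Row]:
    """Stream one gzipped monthly dump and return the matching rows."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return list(pair_rows(fh, a, b))


if __name__ == "__main__":
    # Hypothetical local copies, named like the files under
    # https://dumps.wikimedia.org/other/clickstream/
    for month in range(1, 13):
        path = f"clickstream-enwiki-2018-{month:02d}.tsv.gz"
        if not os.path.exists(path):
            print(f"2018-{month:02d}: file not found, skipped")
            continue
        for row in pair_rows_from_dump(path, "Climate_change", "Global_warming"):
            print(f"2018-{month:02d}", *row, sep="\t")
```

This checks each month's file on its own, before any combining step. If a row such as Climate_change -> Global_warming shows up here for 2018-10 but not in an aggregated table, the gap comes from the aggregation rather than from the dump.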