Bearloga has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/378185 )
Change subject: Remove Cirrus API forecasting
......................................................................

Remove Cirrus API forecasting

Bug: T112170
Change-Id: I8a2107f7c494c4567c41d1f8331902687fb26112
---
M docs/README.md
M modules/forecasts/forecast.R
D modules/forecasts/search/api_cirrus_arima
D modules/forecasts/search/api_cirrus_bsts
D modules/forecasts/search/api_cirrus_prophet
M modules/forecasts/search/config.yaml
6 files changed, 36 insertions(+), 52 deletions(-)

  git pull ssh://gerrit.wikimedia.org:29418/wikimedia/discovery/golden refs/changes/85/378185/1

diff --git a/docs/README.md b/docs/README.md
index 1b2abe6..afc5111 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -8,7 +8,7 @@
 infrastructure. These datasets provide the metrics that are used by
 [Discovery's Dashboards](https://discovery.wmflabs.org/)
 
-Last updated on 30 August 2017
+Last updated on 14 September 2017
 
 Daily Metrics
 -------------
@@ -57,14 +57,24 @@
   top 10 countries
 - **all\_country\_data.tsv**: Sampled traffic to Wikipedia.org Portal,
   broken down by country
+- **all\_country\_data\_history.tsv**: Sampled traffic to
+  Wikipedia.org Portal, broken down by country. Historical data store.
 - **app\_link\_clicks.tsv**: Clicks to Wikipedia mobile apps and list
   of apps
 - **last\_action\_country.tsv**: Last action performed on
   Wikipedia.org Portal per user session
+- **last\_action\_country\_history.tsv**: Last action performed on
+  Wikipedia.org Portal per user session. Historical data store.
 - **most\_common\_country.tsv**: Most common action performed on
   Wikipedia.org Portal per user session, broken down by country
+- **most\_common\_country\_history.tsv**: Most common action performed
+  on Wikipedia.org Portal per user session, broken down by country.
+  Historical data store.
 - **first\_visits\_country.tsv**: Action performed on Wikipedia.org
   Portal on each user's initial visit, broken down by country
+- **first\_visits\_country\_history.tsv**: Action performed on
+  Wikipedia.org Portal on each user's initial visit, broken down by
+  country. Historical data store.
 - **clickthrough\_rate.tsv**: Last action (no action vs clickthrough)
   by Wikipedia.org Portal visitors
 - **clickthrough\_sisterprojects.tsv**: Clicks to Wikimedia projects
@@ -85,6 +95,9 @@
 - **app\_event\_counts\_langproj\_breakdown.tsv**: Clicks and other
   events by users searching on Android and iOS apps broken down by
   language
+- **app\_event\_counts\_langproj\_breakdown\_history.tsv**: Clicks and
+  other events by users searching on Android and iOS apps broken down
+  by language. Historical data store.
 - **app\_load\_times.tsv**: User-perceived load times when searching
   on Android and iOS apps
 - **invoke\_source\_counts.tsv**: How the user initiated their search
@@ -96,6 +109,9 @@
 - **mobile\_event\_counts\_langproj\_breakdown.tsv**: Clicks and other
   events by users searching on mobile web broken down by
   language-project pairs
+- **mobile\_event\_counts\_langproj\_breakdown\_history.tsv**: Clicks
+  and other events by users searching on mobile web broken down by
+  language-project pairs. Historical data store.
 - **mobile\_load\_times.tsv**: User-perceived load times when
   searching on mobile web
 - **desktop\_event\_counts.tsv**: Clicks and other events by users
@@ -103,6 +119,9 @@
 - **desktop\_event\_counts\_langproj\_breakdown.tsv**: Clicks and
   other events by users searching on desktop broken down by
   language-project pairs
+- **desktop\_event\_counts\_langproj\_breakdown\_history.tsv**: Clicks
+  and other events by users searching on desktop broken down by
+  language-project pairs. Historical data store.
 - **desktop\_load\_times.tsv**: User-perceived load times when
   searching on desktop
 - **paulscore\_approximations.tsv**: Relevancy of our desktop search
@@ -112,6 +131,10 @@
   Relevancy of our fulltext desktop search as measured by
   [PaulScore](https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Glossary#PaulScore)
   broken down by language-project pairs
+- **paulscore\_approximations\_fulltext\_langproj\_breakdown\_history.tsv**:
+  Relevancy of our fulltext desktop search as measured by
+  [PaulScore](https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Glossary#PaulScore)
+  broken down by language-project pairs. Historical data store.
 - **sample\_page\_visit\_ld.tsv**: How long users last on pages they
   arrived at from the search results page, computed like [median
   lethal dose in
@@ -122,6 +145,10 @@
 - **search\_threshold\_pass\_rate\_langproj\_breakdown.tsv**:
   Proportion of users having search sessions longer than a
   predetermined threshold (10s) broken down by language-project pairs
+- **search\_threshold\_pass\_rate\_langproj\_breakdown\_history.tsv**:
+  Proportion of users having search sessions longer than a
+  predetermined threshold (10s) broken down by language-project pairs.
+  Historical data store.
 - **cirrus\_query\_aggregates\_no\_automata.tsv**: Zero results rate
   (ZRR), excluding known bots/tools
 - **cirrus\_query\_aggregates\_with\_automata.tsv**: Overall zero
@@ -138,9 +165,17 @@
 - **cirrus\_langproj\_breakdown\_no\_automata.tsv**: Zero results and
   total searches broken down by language-project pairs (e.g. German
   Wikiquote ZRR vs. French Wikibooks ZRR), excluding known bots/tools
+- **cirrus\_langproj\_breakdown\_no\_automata\_history.tsv**: Zero
+  results and total searches broken down by language-project pairs
+  (e.g. German Wikiquote ZRR vs. French Wikibooks ZRR), excluding
+  known bots/tools. Historical data store.
 - **cirrus\_langproj\_breakdown\_with\_automata.tsv**: Zero results
   and total searches broken down by language-project pairs (e.g.
   German Wikiquote ZRR vs. French Wikibooks ZRR)
+- **cirrus\_langproj\_breakdown\_with\_automata\_history.tsv**: Zero
+  results and total searches broken down by language-project pairs
+  (e.g. German Wikiquote ZRR vs. French Wikibooks ZRR). Historical
+  data store.
 - **sister\_search\_traffic.tsv**: Traffic to various wikis from
   Wikipedia search results pages; broken up by language, destination
   type (SERP vs not), and access method (desktop vs mobile web);
@@ -162,12 +197,6 @@
 search/
 -------
 
-- **api\_cirrus\_arima.tsv**: ARIMA-modelled forecasts of Cirrus API
-  usage by non-automata users
-- **api\_cirrus\_bsts.tsv**: BSTS-modelled forecasts of Cirrus API
-  usage by non-automata users
-- **api\_cirrus\_prophet.tsv**: Prophet-modelled forecasts of Cirrus
-  API usage by non-automata users
 - **zrr\_overall\_arima.tsv**: ARIMA-modelled forecasts of zero
   results rate, excluding known bots/tools
 - **zrr\_overall\_bsts.tsv**: BSTS-modelled forecasts of zero results
diff --git a/modules/forecasts/forecast.R b/modules/forecasts/forecast.R
index eae88a1..eb3cb0d 100755
--- a/modules/forecasts/forecast.R
+++ b/modules/forecasts/forecast.R
@@ -8,7 +8,6 @@
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
   make_option("--metric", default = NA, action = "store", type = "character",
               help = "Available:
-                     * search_api_cirrus
                      * search_zrr_overall
                      * wdqs_homepage
                      * wdqs_sparql"),
@@ -80,26 +79,6 @@
 
 output <- switch(
   opt$metric,
-
-  "search_api_cirrus" = {
-    api_usage <- read_data("discovery/metrics/search/search_api_usage.tsv", col_types = "Dci") %>%
-      dplyr::filter(date <= as.Date(opt$date)) %>%
-      dplyr::arrange(date, api) %>%
-      dplyr::distinct(date, api, .keep_all = TRUE) %>%
-      dplyr::filter(!is.na(api)) %>%
-      tidyr::spread(api, calls) %>%
-      { xts::xts(.[, -1], order.by = .$date) } %>%
-      check_dataset
-    if (opt$model == "ARIMA") {
-      try(
-        ceiling(forecast_arima(api_usage[, "cirrus"], arima_params = list(order = c(0, 1, 2), seasonal = list(order = c(2, 1, 1), period = 7))))
-      )
-    } else if (opt$model == "BSTS") {
-      ceiling(forecast_bsts(api_usage[, "cirrus"], transformation = "log", ar_lags = 1, n_iter = opt$iters, burn_in = opt$burnin))
-    } else if (opt$model == "Prophet") {
-      ceiling(forecast_prophet(api_usage[, "cirrus"], transformation = "log", n_iter = opt$iters))
-    }
-  },
 
   "search_zrr_overall" = {
     zrr_overall <- read_data("discovery/metrics/search/cirrus_query_aggregates_no_automata.tsv", col_types = "Dd") %>%
diff --git a/modules/forecasts/search/api_cirrus_arima b/modules/forecasts/search/api_cirrus_arima
deleted file mode 100755
index be8ca1b..0000000
--- a/modules/forecasts/search/api_cirrus_arima
+++ /dev/null
@@ -1,3 +0,0 @@
-#!/bin/bash
-
-Rscript modules/forecasts/forecast.R --date=$1 --metric=search_api_cirrus --model=ARIMA
diff --git a/modules/forecasts/search/api_cirrus_bsts b/modules/forecasts/search/api_cirrus_bsts
deleted file mode 100755
index 7768915..0000000
--- a/modules/forecasts/search/api_cirrus_bsts
+++ /dev/null
@@ -1,3 +0,0 @@
-#!/bin/bash
-
-Rscript modules/forecasts/forecast.R --date=$1 --metric=search_api_cirrus --model=BSTS
diff --git a/modules/forecasts/search/api_cirrus_prophet b/modules/forecasts/search/api_cirrus_prophet
deleted file mode 100755
index 26a4a06..0000000
--- a/modules/forecasts/search/api_cirrus_prophet
+++ /dev/null
@@ -1,3 +0,0 @@
-#!/bin/bash
-
-Rscript modules/forecasts/forecast.R --date=$1 --metric=search_api_cirrus --model=Prophet
diff --git a/modules/forecasts/search/config.yaml b/modules/forecasts/search/config.yaml
index a46deaa..9cced40 100644
--- a/modules/forecasts/search/config.yaml
+++ b/modules/forecasts/search/config.yaml
@@ -1,19 +1,4 @@
 reports:
-  api_cirrus_arima:
-    description: ARIMA-modelled forecasts of Cirrus API usage by non-automata users
-    granularity: days
-    starts: 2017-02-01
-    type: script
-  api_cirrus_bsts:
-    description: BSTS-modelled forecasts of Cirrus API usage by non-automata users
-    granularity: days
-    starts: 2017-02-01
-    type: script
-  api_cirrus_prophet:
-    description: Prophet-modelled forecasts of Cirrus API usage by non-automata users
-    granularity: days
-    starts: 2017-02-01
-    type: script
   zrr_overall_arima:
     description: ARIMA-modelled forecasts of zero results rate, excluding known bots/tools
     granularity: days

--
To view, visit https://gerrit.wikimedia.org/r/378185
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8a2107f7c494c4567c41d1f8331902687fb26112
Gerrit-PatchSet: 1
Gerrit-Project: wikimedia/discovery/golden
Gerrit-Branch: master
Gerrit-Owner: Bearloga <mpo...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits