Bearloga has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/378185 )

Change subject: Remove Cirrus API forecasting
......................................................................

Remove Cirrus API forecasting

Bug: T112170
Change-Id: I8a2107f7c494c4567c41d1f8331902687fb26112
---
M docs/README.md
M modules/forecasts/forecast.R
D modules/forecasts/search/api_cirrus_arima
D modules/forecasts/search/api_cirrus_bsts
D modules/forecasts/search/api_cirrus_prophet
M modules/forecasts/search/config.yaml
6 files changed, 36 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/wikimedia/discovery/golden refs/changes/85/378185/1

diff --git a/docs/README.md b/docs/README.md
index 1b2abe6..afc5111 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -8,7 +8,7 @@
 infrastructure. These datasets provide the metrics that are used by
 [Discovery's Dashboards](https://discovery.wmflabs.org/)
 
-Last updated on 30 August 2017
+Last updated on 14 September 2017
 
 Daily Metrics
 -------------
@@ -57,14 +57,24 @@
     top 10 countries
 -   **all\_country\_data.tsv**: Sampled traffic to Wikipedia.org Portal,
     broken down by country
+-   **all\_country\_data\_history.tsv**: Sampled traffic to
+    Wikipedia.org Portal, broken down by country. Historical data store.
 -   **app\_link\_clicks.tsv**: Clicks to Wikipedia mobile apps and list
     of apps
 -   **last\_action\_country.tsv**: Last action performed on
     Wikipedia.org Portal per user session
+-   **last\_action\_country\_history.tsv**: Last action performed on
+    Wikipedia.org Portal per user session. Historical data store.
 -   **most\_common\_country.tsv**: Most common action performed on
     Wikipedia.org Portal per user session, broken down by country
+-   **most\_common\_country\_history.tsv**: Most common action performed
+    on Wikipedia.org Portal per user session, broken down by country.
+    Historical data store.
 -   **first\_visits\_country.tsv**: Action performed on Wikipedia.org
     Portal on each user's initial visit, broken down by country
+-   **first\_visits\_country\_history.tsv**: Action performed on
+    Wikipedia.org Portal on each user's initial visit, broken down by
+    country. Historical data store.
 -   **clickthrough\_rate.tsv**: Last action (no action vs clickthrough)
     by Wikipedia.org Portal visitors
 -   **clickthrough\_sisterprojects.tsv**: Clicks to Wikimedia projects
@@ -85,6 +95,9 @@
 -   **app\_event\_counts\_langproj\_breakdown.tsv**: Clicks and other
     events by users searching on Android and iOS apps broken down by
     language
+-   **app\_event\_counts\_langproj\_breakdown\_history.tsv**: Clicks and
+    other events by users searching on Android and iOS apps broken down
+    by language. Historical data store.
 -   **app\_load\_times.tsv**: User-perceived load times when searching
     on Android and iOS apps
 -   **invoke\_source\_counts.tsv**: How the user initiated their search
@@ -96,6 +109,9 @@
 -   **mobile\_event\_counts\_langproj\_breakdown.tsv**: Clicks and other
     events by users searching on mobile web broken down by
     language-project pairs
+-   **mobile\_event\_counts\_langproj\_breakdown\_history.tsv**: Clicks
+    and other events by users searching on mobile web broken down by
+    language-project pairs. Historical data store.
 -   **mobile\_load\_times.tsv**: User-perceived load times when
     searching on mobile web
 -   **desktop\_event\_counts.tsv**: Clicks and other events by users
@@ -103,6 +119,9 @@
 -   **desktop\_event\_counts\_langproj\_breakdown.tsv**: Clicks and
     other events by users searching on desktop broken down by
     language-project pairs
+-   **desktop\_event\_counts\_langproj\_breakdown\_history.tsv**: Clicks
+    and other events by users searching on desktop broken down by
+    language-project pairs. Historical data store.
 -   **desktop\_load\_times.tsv**: User-perceived load times when
     searching on desktop
 -   **paulscore\_approximations.tsv**: Relevancy of our desktop search
@@ -112,6 +131,10 @@
     Relevancy of our fulltext desktop search as measured by
     [PaulScore](https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Glossary#PaulScore)
     broken down by language-project pairs
+-   **paulscore\_approximations\_fulltext\_langproj\_breakdown\_history.tsv**:
+    Relevancy of our fulltext desktop search as measured by
+    [PaulScore](https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search/Glossary#PaulScore)
+    broken down by language-project pairs. Historical data store.
 -   **sample\_page\_visit\_ld.tsv**: How long users last on pages they
     arrived at from the search results page, computed like [median
     lethal dose in
@@ -122,6 +145,10 @@
 -   **search\_threshold\_pass\_rate\_langproj\_breakdown.tsv**:
     Proportion of users having search sessions longer than a
     predetermined threshold (10s) broken down by language-project pairs
+-   **search\_threshold\_pass\_rate\_langproj\_breakdown\_history.tsv**:
+    Proportion of users having search sessions longer than a
+    predetermined threshold (10s) broken down by language-project pairs.
+    Historical data store.
 -   **cirrus\_query\_aggregates\_no\_automata.tsv**: Zero results rate
     (ZRR), excluding known bots/tools
 -   **cirrus\_query\_aggregates\_with\_automata.tsv**: Overall zero
@@ -138,9 +165,17 @@
 -   **cirrus\_langproj\_breakdown\_no\_automata.tsv**: Zero results and
     total searches broken down by language-project pairs (e.g. German
     Wikiquote ZRR vs. French Wikibooks ZRR), excluding known bots/tools
+-   **cirrus\_langproj\_breakdown\_no\_automata\_history.tsv**: Zero
+    results and total searches broken down by language-project pairs
+    (e.g. German Wikiquote ZRR vs. French Wikibooks ZRR), excluding
+    known bots/tools. Historical data store.
 -   **cirrus\_langproj\_breakdown\_with\_automata.tsv**: Zero results
     and total searches broken down by language-project pairs (e.g.
     German Wikiquote ZRR vs. French Wikibooks ZRR)
+-   **cirrus\_langproj\_breakdown\_with\_automata\_history.tsv**: Zero
+    results and total searches broken down by language-project pairs
+    (e.g. German Wikiquote ZRR vs. French Wikibooks ZRR). Historical
+    data store.
 -   **sister\_search\_traffic.tsv**: Traffic to various wikis from
     Wikipedia search results pages; broken up by language, destination
     type (SERP vs not), and access method (desktop vs mobile web);
@@ -162,12 +197,6 @@
 search/
 -------
 
--   **api\_cirrus\_arima.tsv**: ARIMA-modelled forecasts of Cirrus API
-    usage by non-automata users
--   **api\_cirrus\_bsts.tsv**: BSTS-modelled forecasts of Cirrus API
-    usage by non-automata users
--   **api\_cirrus\_prophet.tsv**: Prophet-modelled forecasts of Cirrus
-    API usage by non-automata users
 -   **zrr\_overall\_arima.tsv**: ARIMA-modelled forecasts of zero
     results rate, excluding known bots/tools
 -   **zrr\_overall\_bsts.tsv**: BSTS-modelled forecasts of zero results
diff --git a/modules/forecasts/forecast.R b/modules/forecasts/forecast.R
index eae88a1..eb3cb0d 100755
--- a/modules/forecasts/forecast.R
+++ b/modules/forecasts/forecast.R
@@ -8,7 +8,6 @@
  make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
   make_option("--metric", default = NA, action = "store", type = "character",
               help = "Available:
-                  * search_api_cirrus
                   * search_zrr_overall
                   * wdqs_homepage
                   * wdqs_sparql"),
@@ -80,26 +79,6 @@
 
 output <- switch(
   opt$metric,
-
-  "search_api_cirrus" = {
-    api_usage <- read_data("discovery/metrics/search/search_api_usage.tsv", col_types = "Dci") %>%
-      dplyr::filter(date <= as.Date(opt$date)) %>%
-      dplyr::arrange(date, api) %>%
-      dplyr::distinct(date, api, .keep_all = TRUE) %>%
-      dplyr::filter(!is.na(api)) %>%
-      tidyr::spread(api, calls) %>%
-      { xts::xts(.[, -1], order.by = .$date) } %>%
-      check_dataset
-    if (opt$model == "ARIMA") {
-      try(
-        ceiling(forecast_arima(api_usage[, "cirrus"], arima_params = list(order = c(0, 1, 2), seasonal = list(order = c(2, 1, 1), period = 7))))
-      )
-    } else if (opt$model == "BSTS") {
-      ceiling(forecast_bsts(api_usage[, "cirrus"], transformation = "log", ar_lags = 1, n_iter = opt$iters, burn_in = opt$burnin))
-    } else if (opt$model == "Prophet") {
-      ceiling(forecast_prophet(api_usage[, "cirrus"], transformation = "log", n_iter = opt$iters))
-    }
-  },
 
   "search_zrr_overall" = {
     zrr_overall <- read_data("discovery/metrics/search/cirrus_query_aggregates_no_automata.tsv", col_types = "Dd") %>%
diff --git a/modules/forecasts/search/api_cirrus_arima b/modules/forecasts/search/api_cirrus_arima
deleted file mode 100755
index be8ca1b..0000000
--- a/modules/forecasts/search/api_cirrus_arima
+++ /dev/null
@@ -1,3 +0,0 @@
-#!/bin/bash
-
-Rscript modules/forecasts/forecast.R --date=$1 --metric=search_api_cirrus --model=ARIMA
diff --git a/modules/forecasts/search/api_cirrus_bsts b/modules/forecasts/search/api_cirrus_bsts
deleted file mode 100755
index 7768915..0000000
--- a/modules/forecasts/search/api_cirrus_bsts
+++ /dev/null
@@ -1,3 +0,0 @@
-#!/bin/bash
-
-Rscript modules/forecasts/forecast.R --date=$1 --metric=search_api_cirrus --model=BSTS
diff --git a/modules/forecasts/search/api_cirrus_prophet b/modules/forecasts/search/api_cirrus_prophet
deleted file mode 100755
index 26a4a06..0000000
--- a/modules/forecasts/search/api_cirrus_prophet
+++ /dev/null
@@ -1,3 +0,0 @@
-#!/bin/bash
-
-Rscript modules/forecasts/forecast.R --date=$1 --metric=search_api_cirrus --model=Prophet
diff --git a/modules/forecasts/search/config.yaml b/modules/forecasts/search/config.yaml
index a46deaa..9cced40 100644
--- a/modules/forecasts/search/config.yaml
+++ b/modules/forecasts/search/config.yaml
@@ -1,19 +1,4 @@
 reports:
-    api_cirrus_arima:
-        description: ARIMA-modelled forecasts of Cirrus API usage by non-automata users
-        granularity: days
-        starts: 2017-02-01
-        type: script
-    api_cirrus_bsts:
-        description: BSTS-modelled forecasts of Cirrus API usage by non-automata users
-        granularity: days
-        starts: 2017-02-01
-        type: script
-    api_cirrus_prophet:
-        description: Prophet-modelled forecasts of Cirrus API usage by non-automata users
-        granularity: days
-        starts: 2017-02-01
-        type: script
     zrr_overall_arima:
         description: ARIMA-modelled forecasts of zero results rate, excluding known bots/tools
         granularity: days

-- 
To view, visit https://gerrit.wikimedia.org/r/378185
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8a2107f7c494c4567c41d1f8331902687fb26112
Gerrit-PatchSet: 1
Gerrit-Project: wikimedia/discovery/golden
Gerrit-Branch: master
Gerrit-Owner: Bearloga <mpo...@wikimedia.org>

