Bearloga has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/388117 )

Change subject: Disable forecasting
......................................................................

Disable forecasting

Bug: T112170
Change-Id: Ie985c774b83e961b526bd86d1ec17754a0f03c66
---
M CHANGELOG.md
M README.md
M docs/README.Rmd
M docs/README.md
M main.sh
M test.R
6 files changed, 30 insertions(+), 80 deletions(-)

  git pull ssh://gerrit.wikimedia.org:29418/wikimedia/discovery/golden refs/changes/17/388117/1

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9767a68..3c37dae 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,9 @@
 # Change Log (Patch Notes)
 All notable changes to this project will be documented in this file.
 
+## 2017/11/02
+- Disabled forecasting (per [T112170#3724472](https://phabricator.wikimedia.org/T112170#3724472))
+
 ## 2017/10/05
 - Changed which hostname the SQL queries are run on ([T176639](https://phabricator.wikimedia.org/T176639))
 
diff --git a/README.md b/README.md
index e93ed81..f8dfca6 100644
--- a/README.md
+++ b/README.md
@@ -183,7 +183,7 @@
 - KPIs (planned)
 - [x] External Traffic ([configuration](modules/metrics/external_traffic/config.yaml))
 - [x] [Referer data](modules/metrics/external_traffic/referer_data) ([T116295](https://phabricator.wikimedia.org/T116295), [Change 247601](https://gerrit.wikimedia.org/r/#/c/247601/))
-- [x] **Forecasts** ([modules/forecasts/forecast.R](modules/forecasts/forecast.R), see [T112170](https://phabricator.wikimedia.org/T112170) for more details)
+- [x] **Forecasts** ([modules/forecasts/forecast.R](modules/forecasts/forecast.R), see [T112170](https://phabricator.wikimedia.org/T112170) for more details) (DISABLED)
 - [x] Search ([configuration](modules/forecasts/search/config.yaml))
 - [x] Cirrus API usage
 - [x] [ARIMA-modelled forecasts](modules/forecasts/search/api_cirrus_arima)
diff --git a/docs/README.Rmd b/docs/README.Rmd
index c7d27c0..ff37748 100644
--- a/docs/README.Rmd
+++ b/docs/README.Rmd
@@ -39,20 +39,3 @@
 ```{r results='asis'}
 print_reports(metrics)
 ```
-
-## Daily Forecasts
-
-```{r yamls_forecasts} -config_yamls <- list.files(path = "../modules/forecasts", pattern = "^config\\.yaml$", recursive = TRUE, full.names = TRUE) -names(config_yamls) <- sub("../modules/forecasts/", "", dirname(config_yamls), fixed = TRUE) -forecasts <- dplyr::bind_rows(lapply(config_yamls, function(path) { - config_yaml <- suppressMessages(suppressWarnings(data.tree::as.Node(yaml::yaml.load_file(path)))) - reports <- data.tree::ToDataFrameTable(config_yaml[["reports"]], "report" = "name", "description") - reports$path = paste0(file.path(dirname(path), reports$report), ifelse(reports$type == "sql", ".sql", "")) - return(reports) -}), .id = "module") -``` - -```{r results='asis'} -print_reports(forecasts) -``` diff --git a/docs/README.md b/docs/README.md index 2053bcf..ef4ef60 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,14 +1,14 @@ Discovery Datasets ================== -These files are generated by Discovery's +These files are generated by Discovery’s [Golden](https://github.com/wikimedia/wikimedia-discovery-golden/) data retrieval codebase that executes daily and uses [Reportupdater](https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater) infrastructure. These datasets provide the metrics that are used by -[Discovery's Dashboards](https://discovery.wmflabs.org/) +[Discovery’s Dashboards](https://discovery.wmflabs.org/) -Last updated on 27 September 2017 +Last updated on 02 November 2017 Daily Metrics ------------- @@ -16,18 +16,18 @@ external\_traffic/ ------------------ -- **referer\_data.tsv**: Pageviews broken down by referrer class (e.g. - internal vs external) and search engine +- **referer\_data.tsv**: Pageviews broken down by referrer class + (e.g. internal vs external) and search engine - **referer\_nonbot\_data.tsv**: User-made pageviews broken down by - referrer class (e.g. internal vs external) and search engine + referrer class (e.g. 
internal vs external) and search engine maps/ ----- -- **actions\_per\_tool.tsv**: Actions broken down by feature (e.g. - GeoHack) +- **actions\_per\_tool.tsv**: Actions broken down by feature + (e.g. GeoHack) - **users\_per\_feature.tsv**: Counts of users broken down by feature - (e.g. GeoHack) + (e.g. GeoHack) - **users\_by\_country.tsv**: Counts of users broken down by top 10 countries - **tile\_aggregates\_with\_automata.tsv**: Tile counts by style, zoom @@ -43,11 +43,11 @@ ------- - **pageviews.tsv**: Wikipedia.org Portal pageviews, broken down by - high-volume clients vs. low-volume clients -- **referer\_data.tsv**: Pageviews broken down by referrer class (e.g. - internal vs external) -- **user\_agent\_data.tsv**: Wikipedia.org Portal visitors' browsers -- **dwell\_metrics.tsv**: Wikipedia.org Portal visitors' dwell-time + high-volume clients vs. low-volume clients +- **referer\_data.tsv**: Pageviews broken down by referrer class + (e.g. internal vs external) +- **user\_agent\_data.tsv**: Wikipedia.org Portal visitors’ browsers +- **dwell\_metrics.tsv**: Wikipedia.org Portal visitors’ dwell-time metrics - **language\_destination.tsv**: The language of the Wikipedia that the Portal visitors went to @@ -71,16 +71,16 @@ on Wikipedia.org Portal per user session, broken down by country. Historical data store. - **first\_visits\_country.tsv**: Action performed on Wikipedia.org - Portal on each user's initial visit, broken down by country + Portal on each user’s initial visit, broken down by country - **first\_visits\_country\_history.tsv**: Action performed on - Wikipedia.org Portal on each user's initial visit, broken down by + Wikipedia.org Portal on each user’s initial visit, broken down by country. Historical data store. 
- **clickthrough\_rate.tsv**: Last action (no action vs clickthrough) by Wikipedia.org Portal visitors - **clickthrough\_sisterprojects.tsv**: Clicks to Wikimedia projects from Wikipedia.org Portal - **clickthrough\_firstvisit.tsv**: Action performed on Wikipedia.org - Portal on each user's initial visit + Portal on each user’s initial visit - **clickthrough\_breakdown.tsv**: Last action (no action vs clickthrough) by Wikipedia.org Portal visitors, broken down by section @@ -160,27 +160,27 @@ - **cirrus\_query\_aggregates\_with\_automata.tsv**: Overall zero results rate (ZRR) - **cirrus\_query\_breakdowns\_no\_automata.tsv**: Zero results rate - (ZRR) broken down by full-text vs. prefix searches, excluding known + (ZRR) broken down by full-text vs. prefix searches, excluding known bots/tools - **cirrus\_query\_breakdowns\_with\_automata.tsv**: Zero results rate - (ZRR) broken down by full-text vs. prefix searches + (ZRR) broken down by full-text vs. prefix searches - **cirrus\_suggestion\_breakdown\_no\_automata.tsv**: Zero results rate (ZRR) of searches with suggestions, excluding known bots/tools - **cirrus\_suggestion\_breakdown\_with\_automata.tsv**: Zero results rate (ZRR) of searches with suggestions - **cirrus\_langproj\_breakdown\_no\_automata.tsv**: Zero results and - total searches broken down by language-project pairs (e.g. German - Wikiquote ZRR vs. French Wikibooks ZRR), excluding known bots/tools + total searches broken down by language-project pairs (e.g. German + Wikiquote ZRR vs. French Wikibooks ZRR), excluding known bots/tools - **cirrus\_langproj\_breakdown\_no\_automata\_history.tsv**: Zero results and total searches broken down by language-project pairs - (e.g. German Wikiquote ZRR vs. French Wikibooks ZRR), excluding + (e.g. German Wikiquote ZRR vs. French Wikibooks ZRR), excluding known bots/tools. Historical data store. 
- **cirrus\_langproj\_breakdown\_with\_automata.tsv**: Zero results - and total searches broken down by language-project pairs (e.g. - German Wikiquote ZRR vs. French Wikibooks ZRR) + and total searches broken down by language-project pairs + (e.g. German Wikiquote ZRR vs. French Wikibooks ZRR) - **cirrus\_langproj\_breakdown\_with\_automata\_history.tsv**: Zero results and total searches broken down by language-project pairs - (e.g. German Wikiquote ZRR vs. French Wikibooks ZRR). Historical + (e.g. German Wikiquote ZRR vs. French Wikibooks ZRR). Historical data store. - **sister\_search\_traffic.tsv**: Traffic to various wikis from Wikipedia search results pages; broken up by language, destination @@ -212,32 +212,3 @@ - **basic\_usage.tsv**: Homepage visits, SPARQL & LDF endpoint requests - -Daily Forecasts ---------------- - -search/ -------- - -- **zrr\_overall\_arima.tsv**: ARIMA-modelled forecasts of zero - results rate, excluding known bots/tools -- **zrr\_overall\_bsts.tsv**: BSTS-modelled forecasts of zero results - rate, excluding known bots/tools -- **zrr\_overall\_prophet.tsv**: Prophet-modelled forecasts of zero - results rate, excluding known bots/tools - -wdqs/ ------ - -- **homepage\_traffic\_arima.tsv**: ARIMA-modelled forecasts of WDQS - homepage traffic by non-automata users -- **homepage\_traffic\_bsts.tsv**: BSTS-modelled forecasts of WDQS - homepage traffic by non-automata users -- **homepage\_traffic\_prophet.tsv**: Prophet-modelled forecasts of - WDQS homepage traffic by non-automata users -- **sparql\_usage\_arima.tsv**: ARIMA-modelled forecasts of WDQS - SPARQL endpoint usage by non-automata -- **sparql\_usage\_bsts.tsv**: BSTS-modelled forecasts of WDQS SPARQL - endpoint usage by non-automata -- **sparql\_usage\_prophet.tsv**: Prophet-modelled forecasts of WDQS - SPARQL endpoint usage by non-automata diff --git a/main.sh b/main.sh index 52da628..c378c15 100644 --- a/main.sh +++ b/main.sh @@ -12,10 +12,3 @@ echo "Running Reportupdater on 
${module} metrics..."
   nice ionice reportupdater/update_reports.py -l info "modules/metrics/${module}" "/srv/published-datasets/discovery/metrics/${module}"
 done
-
-# Forecasts (dependent on latest metrics)
-for module in "search" "wdqs"
-do
-  echo "Running Reportupdater on ${module} forecasts..."
-  nice -n 17 ionice -c 2 -n 6 reportupdater/update_reports.py -l info "modules/forecasts/${module}" "/srv/published-datasets/discovery/forecasts/${module}"
-done
diff --git a/test.R b/test.R
index 05267b5..0a8e5aa 100644
--- a/test.R
+++ b/test.R
@@ -36,7 +36,7 @@
     help = "Whether to print head & tail of existing datasets"),
   make_option("--disable_metrics", default = FALSE, action = "store_true",
     help = "Skip metrics modules to make the test run shorter"),
-  make_option("--disable_forecasts", default = FALSE, action = "store_true",
+  make_option("--disable_forecasts", default = TRUE, action = "store_true",
     help = "Skip forecasting modules to make the test run shorter"),
   make_option("--forecast_iters", default = 100, action = "store", type = "numeric",
     help = "Overrides number of MCMC iterations used in BSTS models [default %default]"),

-- 
To view, visit https://gerrit.wikimedia.org/r/388117

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie985c774b83e961b526bd86d1ec17754a0f03c66
Gerrit-PatchSet: 1
Gerrit-Project: wikimedia/discovery/golden
Gerrit-Branch: master
Gerrit-Owner: Bearloga
