Bearloga has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/367931 )
Change subject: Compatibility with Puppetized runs
......................................................................

Compatibility with Puppetized runs

Bug: T170494
Depends-On: I6c5996c7ea0c616560ae77dc797f9435828a2c5c
Change-Id: I25573e2d552ef7388c83fbbefca6ceab94adacc8
---
M CHANGELOG.md
M README.md
A config.R
M main.sh
M modules/forecasts/forecast.R
M modules/metrics/maps/config.yaml
M modules/metrics/maps/tile_aggregates.R
M modules/metrics/portal/config.yaml
M modules/metrics/portal/dwell_metrics.R
M modules/metrics/portal/engagement.R
M modules/metrics/portal/geographic_breakdown.R
M modules/metrics/portal/languages.R
M modules/metrics/portal/pageviews.R
M modules/metrics/portal/user_agents.R
M modules/metrics/search/app_event_counts.R
M modules/metrics/search/cirrus_aggregates.R
M modules/metrics/search/config.yaml
M modules/metrics/search/desktop_event_counts.R
M modules/metrics/search/load_times.R
M modules/metrics/search/mobile_event_counts.R
M modules/metrics/search/paulscore_approximations.R
M modules/metrics/search/sample_page_visit_ld.R
M modules/metrics/search/search_threshold_pass_rate.R
M test.R
24 files changed, 69 insertions(+), 98 deletions(-)

Approvals:
  Bearloga: Verified; Looks good to me, approved

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9de2244..555e9b3 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,9 @@
 # Change Log (Patch Notes)
 All notable changes to this project will be documented in this file.
 
+## 2017/07/27
+- Prepared for Puppetized runs ([T170494](https://phabricator.wikimedia.org/T170494))
+
 ## 2017/07/05
 - Switched TSS2 from Revision 16270835 to 16909631 (due to [change 360851](https://gerrit.wikimedia.org/r/#/c/360851/))
 
@@ -8,7 +11,7 @@
 - Changed where datasets are located
 - Updated public README
 
-## 2016/12/??
+## 2016/12/??-2017/02/??
 - Migrated codebase to Analytics' [Reportupdater infrastructure](https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater)
 - Rewrote certain scripts to be pure SQL
 - Rewrote certain R+Hive scripts to be shell+Hive scripts
diff --git a/README.md b/README.md
index 89ad0b0..a2584ad 100644
--- a/README.md
+++ b/README.md
@@ -7,15 +7,12 @@
 - [Mikhail Popov](https://meta.wikimedia.org/wiki/User:MPopov_(WMF)) (Data Analyst)
 - [Chelsy Xie](https://meta.wikimedia.org/wiki/User:CXie_(WMF)) (Data Analyst)
 
-For questions and comments, contact [Deb](mailto:[email protected]?subject=Discovery Analysis data retriever codebase), [Mikhail](mailto:[email protected]?subject=Golden repo), or [Chelsy](mailto:[email protected]?subject=Golden repo).
+For questions and comments, contact [Deb](mailto:[email protected]), [Mikhail](mailto:[email protected]), or [Chelsy](mailto:[email protected]).
 
 ## Table of Contents
 
-- [Setup](#setup)
-  - [Dependencies](#dependencies)
-- [Usage](#usage)
-  - [Production](#production)
-  - [Testing](#testing)
+- [Setup](#setup-and-usage)
+- [Dependencies](#dependencies)
 - [Modules](#modules)
   - [Adding New Metrics Modules](#adding-new-metrics-modules)
     - [MySQL](#mysql)
@@ -26,23 +23,11 @@
   - [Adding New Forecasting Modules](#adding-new-forecasting-modules)
 - [Additional Information](#additional-information)
 
-## Setup
+## Setup and Usage
 
-On [stat1002](https://wikitech.wikimedia.org/wiki/Stat1002):
+As of [T170494](https://phabricator.wikimedia.org/T170494), the setup and daily runs are Puppetized on [stat1005](https://wikitech.wikimedia.org/wiki/Stat1005) via the [statistics::discovery](https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/statistics/manifests/discovery.pp) module (also mirrored on [GitHub](https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/discovery.pp)).
 
-```bash
-cd /a/discovery/
-git clone --recursive https://gerrit.wikimedia.org/r/wikimedia/discovery/golden
-cd golden
-
-# If already cloned without --recursive:
-git submodule update --init --recursive
-
-# Add execution permission to scripts:
-chmod -R +x modules/
-```
-
-### Dependencies
+## Dependencies
 
 ```bash
 pip install -r reportupdater/requirements.txt
@@ -56,7 +41,8 @@
 Sys.setenv("https_proxy" = "http://webproxy.eqiad.wmnet:8080")
 
 # Set path for packages:
-.libPaths("/a/discovery/r-library")
+lib_path <- "/srv/discovery/r-library"
+.libPaths(lib_path)
 
 # Essentials:
 install.packages(
@@ -74,11 +60,11 @@
     "bsts", "forecast"
     # ^ see note below
   ),
-  repos = "https://cran.rstudio.com/",
-  lib = "/a/discovery/r-library"
+  repos = c(CRAN = "https://cran.rstudio.com/"),
+  lib = lib_path
 )
 
-# 'uaparser' requires C++11, and libyaml-cpp 0.3, boost-system, boost-regex C++ libraries
+# 'uaparser' requires C++11, and libyaml-cpp, boost-system, boost-regex C++ libraries
 devtools::install_github("ua-parser/uap-r", configure.args = "-I/usr/include/yaml-cpp -I/usr/include/boost")
 
 # 'ortiz' is needed for Search team's user engagement calculation | https://phabricator.wikimedia.org/diffusion/WDOZ/
@@ -93,37 +79,14 @@
 
 Don't forget to add packages to [test.R](test.R) because that script checks that all packages are installed before performing a test run of the reports.
 
-**Note**: we have had problems installing R package [bsts](https://cran.r-project.org/package=bsts) and its dependencies [Boom](https://cran.r-project.org/package=Boom) and [BoomSpikeSlab](https://cran.r-project.org/package=BoomSpikeSlab) on stat1002 (but not stat1003). Fortunately, [Andrew Otto](https://meta.wikimedia.org/wiki/User:Ottomata) has figured out what to put in [~/.R/Makevars](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Using-Makevars) to make those packages compile. From [T147682#2837271](https://phabricator.wikimedia.org/T147682#2837271):
+To update packages, use [update-library.R](https://github.com/wikimedia/puppet/blob/production/modules/r/files/update-library.R):
 
-```
-CXX=g++-4.8
-CXX1X=g++-4.8
-CXX1XFLAGS=-std=c++11 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g
-CXX1XPICFLAGS=-fPIC
-SHLIB_CXX1XLD=g++-4.8
-SHLIB_CXX1XLDFLAGS=-std=c++11 -shared
-LDFLAGS=-L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro
+```bash
+Rscript /etc/R/update-library.R -l /srv/discovery/r-library
+Rscript /etc/R/update-library.R -l /srv/discovery/r-library -p polloi
 ```
 
-To **update packages**, run `Rscript test.R --update_packages` which will update all the dependencies listed in **test.R**
-
-## Usage
-
-**Note**: You don't need to use the `--config-path` argument if your config file is inside the query folder and is named **config.yaml**, that is the default.
-
-### Production
-
-To use in production, add **main.sh** to `crontab`:
-
-```
-$ crontab -e
-
-12 20 * * * cd /a/discovery/golden/ && sh main.sh
-```
-
-**main.sh** executes **reportupdater/update_reports.py** on each module and writes data to the respective files in **/a/aggregate-datasets/discovery/**
-
-### Testing
+## Testing
 
 If you wish to run all the modules without writing data to files or checking for missingness, use:
 
diff --git a/config.R b/config.R
new file mode 100644
index 0000000..42f902b
--- /dev/null
+++ b/config.R
@@ -0,0 +1,2 @@
+r_library <- "/srv/discovery/r-library"
+published_datasets <- "/srv/published-datasets"
diff --git a/main.sh b/main.sh
index ca5d4e8..c350a55 100644
--- a/main.sh
+++ b/main.sh
@@ -1,24 +1,18 @@
 #!/bin/bash
 
-# Check if Reportupdater git submodule is set up
-if [ ! -f reportupdater/update_reports.py ]; then
-  echo "Warning: Reportupdater needs to be initialized and updated..."
-  git submodule init && git submodule update
-fi
-
 # Sync README
-rsync -c docs/README.md /a/published-datasets/discovery/README.md
+rsync -c docs/README.md /srv/published-datasets/discovery/README.md
 
 # Metrics
 for module in "external_traffic" "wdqs" "maps" "search" "portal"
 do
   echo "Running Reportupdater on ${module} metrics..."
-  nice ionice reportupdater/update_reports.py "modules/metrics/${module}" "/a/published-datasets/discovery/metrics/${module}"
+  nice ionice reportupdater/update_reports.py "modules/metrics/${module}" "/srv/published-datasets/discovery/metrics/${module}"
 done
 
 # Forecasts (dependent on latest metrics)
 for module in "search" "wdqs"
 do
   echo "Running Reportupdater on ${module} forecasts..."
-  nice -n 17 ionice -c 2 -n 6 reportupdater/update_reports.py "modules/forecasts/${module}" "/a/published-datasets/discovery/forecasts/${module}"
+  nice -n 17 ionice -c 2 -n 6 reportupdater/update_reports.py "modules/forecasts/${module}" "/srv/published-datasets/discovery/forecasts/${module}"
 done
diff --git a/modules/forecasts/forecast.R b/modules/forecasts/forecast.R
index 292def9..eae88a1 100755
--- a/modules/forecasts/forecast.R
+++ b/modules/forecasts/forecast.R
@@ -1,6 +1,8 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+source("config.R")
+.libPaths(r_library)
+suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
@@ -34,8 +36,8 @@
 
 read_data <- function(path, ...) {
   if (grepl("^stat[0-9]{4}$", Sys.info()["nodename"])) {
-    # Use local datasets if run on stat1002
-    return(readr::read_tsv(file.path("/a/published-datasets", path), ...))
+    # Use local datasets if run on stat1005
+    return(readr::read_tsv(file.path(published_datasets, path), ...))
   } else {
     # Download from datasets.wikimedia.org otherwise
     return(polloi::read_dataset(path, ...))
diff --git a/modules/metrics/maps/config.yaml b/modules/metrics/maps/config.yaml
index a7c4618..d0dae6e 100644
--- a/modules/metrics/maps/config.yaml
+++ b/modules/metrics/maps/config.yaml
@@ -2,7 +2,7 @@
   el:
     host: analytics-store.eqiad.wmnet
     port: 3306
-    creds_file: /etc/mysql/conf.d/statistics-private-client.cnf
+    creds_file: /etc/mysql/conf.d/discovery-stats-client.cnf
     db: log
 defaults:
   db: el
diff --git a/modules/metrics/maps/tile_aggregates.R b/modules/metrics/maps/tile_aggregates.R
index 67297d6..74ead22 100644
--- a/modules/metrics/maps/tile_aggregates.R
+++ b/modules/metrics/maps/tile_aggregates.R
@@ -1,6 +1,8 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+source("config.R")
+.libPaths(r_library)
+suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/modules/metrics/portal/config.yaml b/modules/metrics/portal/config.yaml
index ae3817c..b980a64 100644
--- a/modules/metrics/portal/config.yaml
+++ b/modules/metrics/portal/config.yaml
@@ -2,7 +2,7 @@
   el:
     host: analytics-store.eqiad.wmnet
     port: 3306
-    creds_file: /etc/mysql/conf.d/statistics-private-client.cnf
+    creds_file: /etc/mysql/conf.d/discovery-stats-client.cnf
     db: log
 defaults:
   db: el
diff --git a/modules/metrics/portal/dwell_metrics.R b/modules/metrics/portal/dwell_metrics.R
index ea8a6c0..e788144 100644
--- a/modules/metrics/portal/dwell_metrics.R
+++ b/modules/metrics/portal/dwell_metrics.R
@@ -1,6 +1,6 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+.libPaths("/srv/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character")
diff --git a/modules/metrics/portal/engagement.R b/modules/metrics/portal/engagement.R
index 74314dd..5f734b9 100644
--- a/modules/metrics/portal/engagement.R
+++ b/modules/metrics/portal/engagement.R
@@ -1,6 +1,6 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+.libPaths("/srv/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/modules/metrics/portal/geographic_breakdown.R b/modules/metrics/portal/geographic_breakdown.R
index 3456a45..5a290b2 100644
--- a/modules/metrics/portal/geographic_breakdown.R
+++ b/modules/metrics/portal/geographic_breakdown.R
@@ -1,6 +1,6 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+.libPaths("/srv/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/modules/metrics/portal/languages.R b/modules/metrics/portal/languages.R
index 62fe084..875b2e4 100644
--- a/modules/metrics/portal/languages.R
+++ b/modules/metrics/portal/languages.R
@@ -1,6 +1,6 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+.libPaths("/srv/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/modules/metrics/portal/pageviews.R b/modules/metrics/portal/pageviews.R
index 92cf7bb..952fe1a 100644
--- a/modules/metrics/portal/pageviews.R
+++ b/modules/metrics/portal/pageviews.R
@@ -1,6 +1,6 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+.libPaths("/srv/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character")
diff --git a/modules/metrics/portal/user_agents.R b/modules/metrics/portal/user_agents.R
index e7b36c0..baa4885 100644
--- a/modules/metrics/portal/user_agents.R
+++ b/modules/metrics/portal/user_agents.R
@@ -1,6 +1,6 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+.libPaths("/srv/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character")
diff --git a/modules/metrics/search/app_event_counts.R b/modules/metrics/search/app_event_counts.R
index f850410..e3200d7 100644
--- a/modules/metrics/search/app_event_counts.R
+++ b/modules/metrics/search/app_event_counts.R
@@ -1,6 +1,8 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+source("config.R")
+.libPaths(r_library)
+suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/modules/metrics/search/cirrus_aggregates.R b/modules/metrics/search/cirrus_aggregates.R
index 654b33e..ced83e0 100644
--- a/modules/metrics/search/cirrus_aggregates.R
+++ b/modules/metrics/search/cirrus_aggregates.R
@@ -1,6 +1,8 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+source("config.R")
+.libPaths(r_library)
+suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/modules/metrics/search/config.yaml b/modules/metrics/search/config.yaml
index 46d9768..82f1c3f 100644
--- a/modules/metrics/search/config.yaml
+++ b/modules/metrics/search/config.yaml
@@ -2,7 +2,7 @@
   el:
     host: analytics-store.eqiad.wmnet
     port: 3306
-    creds_file: /etc/mysql/conf.d/statistics-private-client.cnf
+    creds_file: /etc/mysql/conf.d/discovery-stats-client.cnf
     db: log
 defaults:
   db: el
diff --git a/modules/metrics/search/desktop_event_counts.R b/modules/metrics/search/desktop_event_counts.R
index 699b231..f184d17 100644
--- a/modules/metrics/search/desktop_event_counts.R
+++ b/modules/metrics/search/desktop_event_counts.R
@@ -1,6 +1,8 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+source("config.R")
+.libPaths(r_library)
+suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/modules/metrics/search/load_times.R b/modules/metrics/search/load_times.R
index d6fd547..d65ed24 100644
--- a/modules/metrics/search/load_times.R
+++ b/modules/metrics/search/load_times.R
@@ -1,6 +1,7 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library")
+source("config.R")
+.libPaths(r_library)
 suppressPackageStartupMessages({
   library("methods")
   library("optparse")
diff --git a/modules/metrics/search/mobile_event_counts.R b/modules/metrics/search/mobile_event_counts.R
index 73825c4..6f2afbc 100644
--- a/modules/metrics/search/mobile_event_counts.R
+++ b/modules/metrics/search/mobile_event_counts.R
@@ -1,6 +1,8 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+source("config.R")
+.libPaths(r_library)
+suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/modules/metrics/search/paulscore_approximations.R b/modules/metrics/search/paulscore_approximations.R
index b01f799..1f7fe9f 100644
--- a/modules/metrics/search/paulscore_approximations.R
+++ b/modules/metrics/search/paulscore_approximations.R
@@ -1,6 +1,8 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+source("config.R")
+.libPaths(r_library)
+suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/modules/metrics/search/sample_page_visit_ld.R b/modules/metrics/search/sample_page_visit_ld.R
index e044830..ee425f6 100644
--- a/modules/metrics/search/sample_page_visit_ld.R
+++ b/modules/metrics/search/sample_page_visit_ld.R
@@ -1,6 +1,8 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+source("config.R")
+.libPaths(r_library)
+suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character")
diff --git a/modules/metrics/search/search_threshold_pass_rate.R b/modules/metrics/search/search_threshold_pass_rate.R
index c7ecb2d..aed07f3 100644
--- a/modules/metrics/search/search_threshold_pass_rate.R
+++ b/modules/metrics/search/search_threshold_pass_rate.R
@@ -1,6 +1,8 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library"); suppressPackageStartupMessages(library("optparse"))
+source("config.R")
+.libPaths(r_library)
+suppressPackageStartupMessages(library("optparse"))
 
 option_list <- list(
   make_option(c("-d", "--date"), default = NA, action = "store", type = "character"),
diff --git a/test.R b/test.R
index c689fb6..adc8312 100644
--- a/test.R
+++ b/test.R
@@ -1,6 +1,7 @@
 #!/usr/bin/env Rscript
 
-.libPaths("/a/discovery/r-library")
+source("config.R")
+.libPaths(r_library)
 
 # Check dependencies:
 dependencies <- c(
@@ -9,10 +10,8 @@
   "tidyverse", "data.table", "plyr",
   "optparse", "yaml", "data.tree",
   "knitr",
-  # For forecasting modules:
   "bsts", "forecast", "prophet",
-  # For querying, etc.:
   "ISOcodes", "uaparser", "ortiz", "wmf", "polloi"
 )
 
@@ -28,7 +27,7 @@
 })
 
 option_list <- list(
-  make_option("--start_date", default = as.character(Sys.Date()-1, "%Y-%m-%d"), action = "store", type = "character"),
+  make_option("--start_date", default = as.character(Sys.Date() - 1, "%Y-%m-%d"), action = "store", type = "character"),
   make_option("--end_date", default = as.character(Sys.Date(), "%Y-%m-%d"), action = "store", type = "character",
               help = "This is required for proper Reportupdater emulation; should be 'start_date' + 1"),
   make_option("--omit_times", default = FALSE, action = "store_true",
@@ -42,21 +41,12 @@
   make_option("--forecast_iters", default = 100, action = "store", type = "numeric",
               help = "Overrides number of MCMC iterations used in BSTS models [default %default]"),
   make_option("--forecast_burnin", default = 50, action = "store", type = "numeric",
-              help = "Overrides number of MCMC iterations discarded in BSTS models [default %default]"),
-  make_option("--update_packages", default = FALSE, action = "store_true",
-              help = paste("Update dependencies in", .libPaths()[1]))
+              help = "Overrides number of MCMC iterations discarded in BSTS models [default %default]")
 )
 
 # Get command line options, if help option encountered print help and exit,
 # otherwise if options not found on command line then set defaults:
 opt <- parse_args(OptionParser(option_list = option_list))
-
-if (opt$update_packages) {
-  withr::with_libpaths("/a/discovery/r-library", devtools::update_packages(dependencies))
-  library("uaparser", lib.loc = "/a/discovery/r-library")
-  update_regexes()
-  q(save = "no")
-}
 
 if (opt$disable_metrics && opt$disable_forecasts) {
   stop("Cannot run test utility with metrics AND forecasting modules disabled.")

--
To view, visit https://gerrit.wikimedia.org/r/367931
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I25573e2d552ef7388c83fbbefca6ceab94adacc8
Gerrit-PatchSet: 6
Gerrit-Project: wikimedia/discovery/golden
Gerrit-Branch: master
Gerrit-Owner: Bearloga <[email protected]>
Gerrit-Reviewer: Bearloga <[email protected]>
Gerrit-Reviewer: Chelsyx <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
