Krinkle has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/339802 )

Change subject: [WIP] mediawiki: Add cache-warmup to maintenance
......................................................................

[WIP] mediawiki: Add cache-warmup to maintenance

* Ensure nodejs is installed on the maintenance host (terbium, wasat).
* Ensure warmup script is installed.

FIXME:

* Verify that changing the request host in Node.js works with
  our setup, or figure out a different way (e.g. text-lb has an invalid cert).
  With curl we use --resolve 'en.wikipedia.org:443:<ip of lb>', which
  works. Is there a Node equivalent?

* Decide how to fetch list of servers (conftool?)
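If confctl ends up being the source, its `get` output is one JSON object per line, keyed by server name; flattening that into a host list is a few lines (the mw* names below are invented samples):

```javascript
// Parse `confctl select ... get` output into a list of server hostnames.
// Assumes one JSON object per line, keyed by server name, e.g.:
//   {"mw1261.eqiad.wmnet": {"pooled": "yes", "weight": 10}}
function parseConfctl( output ) {
	return output.split( '\n' ).reduce( function ( servers, line ) {
		line = line.trim();
		if ( line ) {
			servers.push.apply( servers, Object.keys( JSON.parse( line ) ) );
		}
		return servers;
	}, [] );
}
```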

* Decide concurrency.
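Whatever limit we settle on, the mechanism is a plain promise pool (the same shape as util.worker() in this patch); a minimal sketch, with an arbitrary limit passed by the caller:

```javascript
// Run `handler` over `tasks` with at most `limit` requests in flight.
// Resolves once the queue is drained and all handlers have settled.
function pool( tasks, limit, handler ) {
	var queue = tasks.slice(),
		active = 0;
	return new Promise( function ( resolve, reject ) {
		function next() {
			if ( !queue.length && !active ) {
				resolve();
				return;
			}
			while ( queue.length && active < limit ) {
				active++;
				Promise.resolve( handler( queue.shift() ) )
					.then( function () {
						active--;
						next();
					}, reject );
			}
		}
		next();
	} );
}
```

Usage would be along the lines of `pool( urls, 50, fetchUrl )`.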

Bug: T156922
Change-Id: I95ba0ace6daefb135af43019e0ebce741875b4ea
---
A modules/mediawiki/files/maintenance/mediawiki-cache-warmup/README.md
A modules/mediawiki/files/maintenance/mediawiki-cache-warmup/urls-cluster.txt
A modules/mediawiki/files/maintenance/mediawiki-cache-warmup/urls-server.txt
A modules/mediawiki/files/maintenance/mediawiki-cache-warmup/util.js
A modules/mediawiki/files/maintenance/mediawiki-cache-warmup/warmup.js
A modules/mediawiki/manifests/maintenance/cache_warmup.pp
6 files changed, 468 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/02/339802/1

diff --git a/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/README.md b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/README.md
new file mode 100644
index 0000000..67961eb
--- /dev/null
+++ b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/README.md
@@ -0,0 +1,28 @@
+### Usage
+
+```sh
+node warmup.js urls-cluster.txt spread
+
+node warmup.js urls-server.txt clone-debug
+```
+
+
+### Output
+```
+$ node warmup.js
+
+Usage: node warmup.js [targets] [mode]
+
+ - targets     Path to a text file containing newline-separated list of urls, may contain %server or %mobileServer.
+ - mode        One of "spread" (via load-balancer) or "clone" (send each url to all servers)
+
+$ node warmup.js urls-server.txt clone-debug
+...
+[2017-02-02T01:05:29.414Z] Request https://jbo.wiktionary.org/w/load.php?debug=false&modules=jquery%2Cmediawiki&only=scripts
+[2017-02-02T01:05:29.422Z] Request https://ne.wikibooks.org/w/load.php?debug=false&modules=jquery%2Cmediawiki&only=scripts
+Statistics:
+- timing: min = 0.134s | max = 28.894s | avg = 1.040s | total = 46s
+- concurrency: min = 0 | max = 49 | avg = 48
+
+Done!
+```
diff --git a/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/urls-cluster.txt b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/urls-cluster.txt
new file mode 100644
index 0000000..4274a3b
--- /dev/null
+++ b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/urls-cluster.txt
@@ -0,0 +1,10 @@
+# Purpose: Root redirect
+https://%server/
+# Purpose: Main Page, Skin cache, Sidebar cache, Localisation cache
+https://%server/wiki/Main_Page
+# Purpose: MobileFrontend, Main Page
+https://%mobileServer/wiki/Main_Page
+# Purpose: Login page
+https://%server/wiki/Special:UserLogin
+# Purpose: API, Recent changes
+https://%server/w/api.php?format=json&action=query&list=recentchanges
diff --git a/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/urls-server.txt b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/urls-server.txt
new file mode 100644
index 0000000..790fa05
--- /dev/null
+++ b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/urls-server.txt
@@ -0,0 +1,4 @@
+# Purpose: APC for ResourceLoader
+https://%server/w/load.php?debug=false&modules=startup&only=scripts
+https://%server/w/load.php?debug=false&modules=jquery%2Cmediawiki&only=scripts
+https://%server/w/load.php?debug=false&modules=site%7Csite.styles
diff --git a/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/util.js b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/util.js
new file mode 100644
index 0000000..82862c9
--- /dev/null
+++ b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/util.js
@@ -0,0 +1,268 @@
+var http = require( 'http' ),
+       https = require( 'https' ),
+       url = require( 'url' );
+
+/**
+ * @param {string} url
+ * @return {Promise}
+ */
+function fetchHttpsUrl( url ) {
+       return new Promise( function ( resolve, reject ) {
+               var req = https.get( url, function ( resp ) {
+                       var data = '';
+                       resp.on( 'data', function ( chunk ) {
+                               data += chunk;
+                       } );
+                       resp.on( 'end', function() {
+                               resolve( data );
+                       } );
+               } );
+               req.on( 'error', reject );
+       } );
+}
+
+/**
+ * @param {string|Object} options
+ * @return {Promise}
+ */
+function fetchUrl( options ) {
+       var request;
+       if ( typeof options === 'string' ) {
+               options = url.parse( options );
+       }
+       request = options.protocol === 'https:' ? https : http;
+       return new Promise( function ( resolve, reject ) {
+               var req = request.get( options, function ( resp ) {
+                       // Discard data
+                       resp.resume();
+                       resp.on( 'end', function() {
+                               resolve();
+                       } );
+               } );
+               req.on( 'error', function ( err ) {
+                       reject( new Error( err + ' [for url ' + url.format( options ) + ']' ) );
+               } );
+       } );
+}
+
+/**
+ * @return {Promise}
+ */
+function getSiteMatrix() {
+       return fetchHttpsUrl( 'https://meta.wikimedia.org/w/api.php?format=json&action=sitematrix&smlangprop=site&smsiteprop=url|dbname' )
+               .then( JSON.parse )
+               .then( function ( data ) {
+                       var map, key, group, i, wiki;
+                       map = Object.create( null );
+                       for ( key in data.sitematrix ) {
+                               if ( key === 'count' ) {
+                                       continue;
+                               }
+                               group = key === 'specials' ? data.sitematrix[ key ] : data.sitematrix[ key ].site;
+                               if ( group && group.length ) {
+                                       for ( i = 0; i < group.length; i++ ) {
+                                               if ( group[ i ].private === undefined &&
+                                                       group[ i ].closed === undefined &&
+                                                       // Exclude labswiki (wikitech) and labtestwiki
+                                                       group[ i ].nonglobal === undefined &&
+                                                       group[ i ].fishbowl === undefined
+                                               ) {
+                                                       wiki = group[ i ];
+                                                       map[ wiki.dbname ] = {
+                                                               dbname: wiki.dbname,
+                                                               url: wiki.url,
+                                                               host: url.parse( wiki.url ).host
+                                                       };
+                                               }
+                                       }
+                               }
+                       }
+                       return map;
+               } );
+}
+
+function makeMobileHost( wiki ) {
+       var pattern, parts,
+               wgMobileUrlTemplate = {
+                       default: '%h0.m.%h1.%h2',
+                       foundationwiki: 'm.%h0.%h1',
+                       mediawikiwiki: 'm.%h1.%h2',
+                       sourceswiki: 'm.%h0.%h1',
+                       wikidatawiki: 'm.%h1.%h2',
+                       labswiki: false,
+                       labtestwiki: false,
+                       loginwiki: false
+               };
+       pattern = wgMobileUrlTemplate[ wiki.dbname ] !== undefined ?
+               wgMobileUrlTemplate[ wiki.dbname ] :
+               wgMobileUrlTemplate.default;
+       if ( !pattern ) {
+               return false;
+       }
+       parts = wiki.host.split( '.' );
+       return pattern.replace( /%h([0-9])/g, function ( mAll, m1 ) {
+               return parts[ m1 ] || '';
+       } );
+}
+
+/**
+ * @param {string[]} lines
+ * @return {string[]}
+ */
+function reduceTxtLines( lines ) {
+       return lines.reduce( function ( out, line ) {
+               var text = line.trim();
+               if ( text && text[ 0 ] !== '#' ) {
+                       // Line with placeholders like %server or %mobileServer
+                       out.push( text );
+               }
+               return out;
+       }, [] );
+}
+
+/**
+ * @param {string[]} urls
+ * @return {Promise} List of strings
+ */
+function expandUrlList( urls ) {
+       return getSiteMatrix().then( function ( wikis ) {
+               return urls.reduce( function ( out, url ) {
+                       var dbname, mhost;
+                       // Ensure HTTP instead of HTTPS
+                       // url = url.replace( /^https:/, 'http:' );
+                       if ( url.indexOf( '%server' ) !== -1 ) {
+                               // If %server, insert one for each wiki
+                               for ( dbname in wikis ) {
+                                       out.push( url.replace( /%server/g, wikis[ dbname ].host ) );
+                               }
+                       } else if ( url.indexOf( '%mobileServer' ) !== -1 ) {
+                               // If %mobileServer, insert one for each wiki, converted to mobile
+                               for ( dbname in wikis ) {
+                                       mhost = makeMobileHost( wikis[ dbname ] );
+                                       if ( mhost ) {
+                                               out.push( url.replace( /%mobileServer/g, mhost ) );
+                                       }
+                               }
+                       } else {
+                               out.push( url );
+                       }
+                       return out;
+               }, [] );
+       } );
+}
+
+/**
+ * @param {Object} options
+ * @param {Object} options.globalConcurrency
+ * @param {Object} options.groupConcurrency
+ * @param {Array|Object} dataset Items to be passed to the handler. May be fragmented
+ *  by a group by passing an object containing arrays instead.
+ * @param {Function} handler
+ * @return {Promise}
+ */
+function worker( options, dataset, handler ) {
+       var tasks,
+               concurrency = {
+                       global: 0
+               },
+               workStart = process.hrtime(),
+               stats = {
+                       timing: {
+                               min: Infinity,
+                               max: -Infinity,
+                               avg: 0
+                       },
+                       count: {
+                               min: Infinity,
+                               max: -Infinity,
+                               avg: 0,
+                               total: 0
+                       }
+               };
+       if ( !Array.isArray( dataset ) ) {
+               return Promise.reject( new Error( 'Groups not yet supported' ) );
+       }
+       function expandDiff( diff ) {
+               return diff[ 0 ] * 1e9 + diff[ 1 ]; // in nanoseconds
+       }
+       function writeStats( diff ) {
+               var duration, oldTotal;
+               duration = expandDiff( diff );
+               oldTotal = stats.count.total;
+               stats.timing.min = Math.min( stats.timing.min, duration );
+               stats.timing.max = Math.max( stats.timing.max, duration );
+               stats.timing.avg = ( ( stats.timing.avg * oldTotal ) + duration ) / ( oldTotal + 1 );
+               stats.count.total++;
+               stats.count.min = Math.min( stats.count.min, concurrency.global );
+               stats.count.max = Math.max( stats.count.max, concurrency.global );
+               stats.count.avg = ( ( stats.count.avg * oldTotal ) + concurrency.global ) / ( oldTotal + 1 );
+       }
+       tasks = dataset.slice();
+       return new Promise( function ( resolve, reject ) {
+               function startTask( task ) {
+                       var ret, start;
+                       concurrency.global++;
+                       start = process.hrtime();
+                       ret = handler( task );
+                       Promise.resolve( ret )
+                               .then( function () {
+                                       writeStats( process.hrtime( start ) );
+                                       concurrency.global--;
+                                       handlePending();
+                               } )
+                               .catch( reject );
+               }
+               function handlePending() {
+                       if ( !tasks.length && !concurrency.global ) {
+                               stats.timing.total = expandDiff( process.hrtime( workStart ) );
+                               resolve( stats );
+                               return;
+                       }
+                       while ( tasks.length && concurrency.global < options.globalConcurrency ) {
+                               startTask( tasks.pop() );
+                       }
+               }
+               handlePending();
+       } );
+}
+
+// See https://bost.ocks.org/mike/shuffle/
+function shuffle( array ) {
+       var tmp, random,
+               i = array.length;
+       while ( i !== 0 ) {
+               // Take one of the remaining elements
+               random = Math.floor( Math.random() * i );
+               i--;
+               // And swap it with the current one
+               tmp = array[ i ];
+               array[ i ] = array[ random ];
+               array[ random ] = tmp;
+       }
+       return array;
+}
+
+/**
+ * @param {Object} reqOptions Options for http.get() via fetchUrl().
+ *  The reqOptions should already contain the 'host' and 'path' options at
+ *  this point. Merged from url.parse().
+ * @param {string} dest Hostname of destination server.
+ */
+function setHostDestination( reqOptions, dest ) {
+       if ( dest && reqOptions.host ) {
+               // Move canonical wiki hostname to 'Host' header
+               reqOptions.headers[ 'Host' ] = reqOptions.host;
+               // Change host for http request to the intended server
+               reqOptions.host = dest;
+       }
+}
+
+module.exports = {
+       fetchUrl,
+       getSiteMatrix,
+       makeMobileHost,
+       reduceTxtLines,
+       expandUrlList,
+       worker,
+       shuffle
+};
diff --git a/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/warmup.js b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/warmup.js
new file mode 100644
index 0000000..f6ef5ab
--- /dev/null
+++ b/modules/mediawiki/files/maintenance/mediawiki-cache-warmup/warmup.js
@@ -0,0 +1,107 @@
+var fs = require( 'fs' ),
+       http = require( 'http' ),
+       path = require( 'path' ),
+       url = require( 'url' ),
+
+       util = require( './util' ),
+
+       mode, urlList;
+
+function usage() {
+       console.log( '\nUsage: node ' + path.basename( process.argv[ 1 ] ) + ' [file] [mode]\n' );
+       console.log( ' - file \tPath to a text file containing newline-separated list of urls, may contain %server or %mobileServer.' );
+       console.log( ' - mode \tOne of:\n   \t\t"spread": distribute urls via load-balancer\n   \t\t"clone": send each url to each server\n   \t\t"clone-debug": send urls to debug server' );
+}
+
+if ( !process.argv[ 2 ] || !process.argv[ 3 ] ) {
+       usage();
+       process.exit( 1 );
+}
+
+// Process cli arguments
+mode = process.argv[ 3 ];
+if ( mode !== 'spread' && mode !== 'clone' && mode !== 'clone-debug' ) {
+       console.error( 'Error: Invalid mode' );
+       usage();
+       process.exit( 1 );
+}
+urlList = util.reduceTxtLines( fs.readFileSync( process.argv[ 2 ] ).toString().split( '\n' ) );
+
+if ( mode === 'clone' ) {
+       // Mode: clone
+       // This mode runs each of the listed urls on each of the servers.
+       // This is meant for warming up APC caches on each app server.
+
+       // TODO:
+       // - Fetch list of servers
+       //   Either from puppet, appserver from conftool-data/nodes/eqiad.yaml
+       //   or, from confd, using:
+       //   `sudo -i confctl --quiet select 'dc=codfw,cluster=appserver,service=apache2,pooled=yes' get`
+       //   Return format:
+       //     {"mw0000.codfw.wmnet": {"pooled": "yes", ..}, ..}
+       //     {"mw0001.codfw.wmnet": {"pooled": "yes", ..}, ..}
+       // - Clone url list once for each server, swap hostname,
+       //   and add Host header. In format for util.worker().
+       // - Configure groupConcurrency for util.worker().
+       console.error( 'Mode "clone" not yet implemented.' );
+       process.exit( 1 );
+}
+
+// Mode: spread or clone-debug
+// This mode takes a list of urls and sends it to a cluster.
+// This is meant for warming up a shared service like Memcached or SQL.
+
+// TODO: In mode 'spread', override host destination from the production
+// wiki hostname, to e.g. text-lb.eqiad.wikimedia.org or text-lb.codfw.wikimedia.org.
+util.expandUrlList( urlList ).then( function ( urls ) {
+       var baseOptions, workerConfig;
+       baseOptions = {
+               agent: new http.Agent( { keepAlive: true } ),
+               headers: {
+                       'User-Agent': 'node-wikimedia-warmup; Contact: Krinkle'
+               }
+       };
+       workerConfig = {
+               globalConcurrency: 500
+       };
+
+       if ( mode === 'clone-debug' ) {
+               baseOptions.headers[ 'X-Wikimedia-Debug' ] = '1';
+               workerConfig.globalConcurrency = 50;
+       }
+
+       if ( mode === 'spread' ) {
+               console.error( 'Mode "spread" not yet implemented.' );
+               process.exit( 1 );
+       }
+
+       return util.worker(
+               workerConfig,
+               // Randomize order
+               util.shuffle( urls ),
+               function ( uri ) {
+                       var options = Object.assign(
+                               Object.create( baseOptions ),
+                               url.parse( uri )
+                       );
+                       console.log( `[${new Date().toISOString()}] Request ${uri}` );
+                       if ( mode === 'spread' ) {
+                               // FIXME: Doesn't work. Will route through codfw varnishes
+                               // but still goes to eqiad app servers.
+                               // setHostDestination( options, 'text-lb.codfw.wikimedia.org' );
+                       }
+                       return util.fetchUrl( options );
+               }
+       );
+} ).then( function ( stats ) {
+       console.log(
+               `Statistics:
+- timing: min = ${stats.timing.min / 1e9}s | max = ${stats.timing.max / 1e9}s | avg = ${stats.timing.avg / 1e9}s | total = ${Math.round( stats.timing.total / 1e9 )}s
+- concurrency: min = ${stats.count.min} | max = ${stats.count.max} | avg = ${Math.round( stats.count.avg )}
+`
+       );
+       console.log( 'Done!' );
+} ).catch( function ( err ) {
+       console.log( err );
+       process.exit( 1 );
+} );
diff --git a/modules/mediawiki/manifests/maintenance/cache_warmup.pp b/modules/mediawiki/manifests/maintenance/cache_warmup.pp
new file mode 100644
index 0000000..26a5a93
--- /dev/null
+++ b/modules/mediawiki/manifests/maintenance/cache_warmup.pp
@@ -0,0 +1,51 @@
+class mediawiki::maintenance::cache_warmup( $ensure = present ) {
+    # Include this on a maintenance host to run APC/Memcached warmup
+    # after resetting caches (e.g. during a dc switchover)
+    # https://phabricator.wikimedia.org/T156922
+    # Hopefully this will be obsolete soon enough when we run active-active.
+
+    require_package('nodejs')
+
+    file { '/var/lib/mediawiki-cache-warmup':
+        ensure => ensure_directory($ensure),
+        owner  => $::mediawiki::users::web,
+        group  => 'wikidev',
+        mode   => '0775',
+    }
+
+    file { '/var/lib/mediawiki-cache-warmup/util.js':
+        ensure => $ensure,
+        owner  => $::mediawiki::users::web,
+        group  => 'wikidev',
+        mode   => '0664',
+        source => 'puppet:///modules/mediawiki/maintenance/mediawiki-cache-warmup/util.js',
+    }
+    file { '/var/lib/mediawiki-cache-warmup/warmup.js':
+        ensure => $ensure,
+        owner  => $::mediawiki::users::web,
+        group  => 'wikidev',
+        mode   => '0664',
+        source => 'puppet:///modules/mediawiki/maintenance/mediawiki-cache-warmup/warmup.js',
+    }
+    file { '/var/lib/mediawiki-cache-warmup/urls-cluster.txt':
+        ensure => $ensure,
+        owner  => $::mediawiki::users::web,
+        group  => 'wikidev',
+        mode   => '0664',
+        source => 'puppet:///modules/mediawiki/maintenance/mediawiki-cache-warmup/urls-cluster.txt',
+    }
+    file { '/var/lib/mediawiki-cache-warmup/urls-server.txt':
+        ensure => $ensure,
+        owner  => $::mediawiki::users::web,
+        group  => 'wikidev',
+        mode   => '0664',
+        source => 'puppet:///modules/mediawiki/maintenance/mediawiki-cache-warmup/urls-server.txt',
+    }
+    file { '/var/lib/mediawiki-cache-warmup/README.md':
+        ensure => $ensure,
+        owner  => $::mediawiki::users::web,
+        group  => 'wikidev',
+        mode   => '0664',
+        source => 'puppet:///modules/mediawiki/maintenance/mediawiki-cache-warmup/README.md',
+    }
+}

-- 
To view, visit https://gerrit.wikimedia.org/r/339802
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I95ba0ace6daefb135af43019e0ebce741875b4ea
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Krinkle <krinklem...@gmail.com>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
