[MediaWiki-commits] [Gerrit] operations/puppet[production]: clean up old misc dump output files from cron jobs on dump h...

2017-12-02 Thread ArielGlenn (Code Review)
ArielGlenn has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/393245 )

Change subject: clean up old misc dump output files from cron jobs on dump hosts
..


clean up old misc dump output files from cron jobs on dump hosts

This cleans up on both the hosts where the dumps are generated,
where we will keep fewer files, and on the hosts which serve them via nfs
or the web, where we will keep more.

Bug: T179942
Change-Id: Ie8e0a2a27a09009cf7be5184d8e32cab3579f8fc
---
M hieradata/hosts/dataset1001.yaml
A hieradata/hosts/dumpsdata1001.yaml
M hieradata/hosts/dumpsdata1002.yaml
M hieradata/hosts/labstore1006.yaml
M hieradata/hosts/ms1001.yaml
A modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
A modules/dumps/manifests/web/cleanup.pp
A modules/dumps/manifests/web/cleanups/miscdumps.pp
M modules/dumps/manifests/web/cleanups/xml_cleanup.pp
M modules/dumps/manifests/web/xmldumps_active.pp
A modules/profile/manifests/dumps/web/cleanup.pp
M modules/profile/manifests/dumps/web/xmldumps_active.pp
M modules/role/manifests/dumps/generation/server/fallback.pp
M modules/role/manifests/dumps/generation/server/primary.pp
M modules/role/manifests/dumps/public/server.pp
M modules/role/manifests/dumps/web/xmldumps_active.pp
M modules/role/manifests/dumps/web/xmldumps_fallback.pp
17 files changed, 200 insertions(+), 11 deletions(-)

Approvals:
  ArielGlenn: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/hieradata/hosts/dataset1001.yaml b/hieradata/hosts/dataset1001.yaml
index d32db3e..967f0c7 100644
--- a/hieradata/hosts/dataset1001.yaml
+++ b/hieradata/hosts/dataset1001.yaml
@@ -1,3 +1,6 @@
+profile::dumps::cleanup::isreplica: true
+profile::dumps::miscdumpsdir: '/data/xmldatadumps/public/other'
+
 profile::dumps::rsyncer:
   dumps_user: 'datasets'
   dumps_group: 'datasets'
diff --git a/hieradata/hosts/dumpsdata1001.yaml b/hieradata/hosts/dumpsdata1001.yaml
new file mode 100644
index 0000000..12f28b6
--- /dev/null
+++ b/hieradata/hosts/dumpsdata1001.yaml
@@ -0,0 +1,2 @@
+profile::dumps::miscdumpsdir: '/data/otherdumps'
+profile::dumps::cleanup::isreplica: false
diff --git a/hieradata/hosts/dumpsdata1002.yaml b/hieradata/hosts/dumpsdata1002.yaml
index 867c2a3..6cba27b 100644
--- a/hieradata/hosts/dumpsdata1002.yaml
+++ b/hieradata/hosts/dumpsdata1002.yaml
@@ -1,3 +1,10 @@
+profile::dumps::miscdumpsdir: '/data/otherdumps'
+
+# this is currently a dumps generation fallback host,
+# we configure cleanups of old files there the
+# same way we do the active generating host
+profile::dumps::cleanup::isreplica: false
+
 profile::dumps::rsyncer:
   dumps_user: 'dumpsgen'
   dumps_group: 'dumpsgen'
diff --git a/hieradata/hosts/labstore1006.yaml b/hieradata/hosts/labstore1006.yaml
index c74bbdb..549439b 100644
--- a/hieradata/hosts/labstore1006.yaml
+++ b/hieradata/hosts/labstore1006.yaml
@@ -1,3 +1,6 @@
+profile::dumps::miscdumpsdir: '/srv/dumps/xmldatadumps/public/other'
+profile::dumps::cleanup::isreplica: true
+
 profile::dumps::rsyncer:
   dumps_user: 'dumpsgen'
   dumps_group: 'dumpsgen'
diff --git a/hieradata/hosts/ms1001.yaml b/hieradata/hosts/ms1001.yaml
index 8d56948..1f7dd46 100644
--- a/hieradata/hosts/ms1001.yaml
+++ b/hieradata/hosts/ms1001.yaml
@@ -1,3 +1,6 @@
+profile::dumps::cleanup::isreplica: true
+profile::dumps::miscdumpsdir: '/data/xmldatadumps/public/other'
+
 profile::dumps::rsyncer:
   dumps_user: 'datasets'
   dumps_group: 'datasets'
diff --git a/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh b/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
new file mode 100644
index 0000000..54bf80c
--- /dev/null
+++ b/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
@@ -0,0 +1,101 @@
+#!/bin/bash
+
+##
+# This file is managed by puppet!
+# puppet:///modules/dumps/web/cleanup_old_miscdumps.sh
+##
+
+# This script removes old files produced by misc dump
+# cron jobs, on hosts where they are rsynced over from
+# the generating host.
+
+# We clean them up here rather than rsync --delete,
+# because we keep and serve more of these on web
+# and other servers than on the generating host.
+
+usage() {
+cat<<EOF

+  --miscdumpsdir  path to root of misc dumps tree
+  --configfile    path to config file describing dirs and cleanup info
+  --dryrun        don't remove anything, print what would be done

+Example:  $0 --miscdumpsdir /data/xmldatadumps/other --configfile /etc/dumps/confs/cleanup_misc.conf
+EOF
+exit 1
+}
+
+miscdumpsdir=""
+configfile=""
+dryrun=""
+
+while [ $# -gt 0 ]; do
+if [ $1 == "--miscdumpsdir" ]; then
+miscdumpsdir="$2"
+shift; shift
+elif [ $1 == "--configfile" ]; then
+configfile="$2"
+shift; shift
+elif [ $1 == "--dryrun" ]; then
+   dryrun="yes"
+shift
+else
+echo "$0: Unknown option $1" >& 2
+usage
+fi
+done
+
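
The diff above is cut off before the part of the script that actually removes files. As a rough illustration of the idea in the commit message (keep fewer files on the generating hosts, more on the replicas), a keep-the-newest-N cleanup with a dry-run switch might look like the sketch below. The function name cleanup_keep_newest, its arguments, and the use of modification time to decide what is "old" are all assumptions made for this example; they are not taken from the commit, and the config file format the real script reads is not shown here. The sketch also assumes GNU find and file names without newlines.

```shell
#!/bin/bash

# Hypothetical helper, not taken from the commit: keep the newest
# $2 regular files directly under directory $1 and remove the rest.
# A non-empty $3 turns on dry-run mode, which only prints.
cleanup_keep_newest() {
    local dir="$1"
    local keep="$2"
    local dryrun="$3"

    if [ ! -d "$dir" ]; then
        echo "no such directory $dir" >&2
        return 1
    fi

    # List "mtime path" pairs, sort newest first, strip the mtime,
    # then skip the first $keep entries and act on the remainder.
    find "$dir" -maxdepth 1 -type f -printf '%T@ %p\n' \
        | sort -rn \
        | cut -d' ' -f2- \
        | {
            count=0
            while IFS= read -r path; do
                count=$((count + 1))
                if [ "$count" -le "$keep" ]; then
                    continue
                fi
                if [ -n "$dryrun" ]; then
                    echo "would remove $path"
                else
                    rm -f -- "$path"
                fi
            done
        }
}
```

Under these assumptions, a generating host could call the helper with a small keep value and a replica with a larger one, and passing any non-empty third argument would list the candidates without deleting them.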

[MediaWiki-commits] [Gerrit] operations/puppet[production]: clean up old misc dump output files from cron jobs on dump h...

2017-11-24 Thread ArielGlenn (Code Review)
ArielGlenn has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/393245 )

Change subject: clean up old misc dump output files from cron jobs on dump hosts
..

clean up old misc dump output files from cron jobs on dump hosts

[WIP] draft, untested, needs proper keep values in the manifest for
replicas, etc

This cleans up on both the hosts where the dumps are generated,
where we will keep less, and on the hosts which serve them via nfs
or the web, where we will keep more.

Bug: T179942
Change-Id: Ie8e0a2a27a09009cf7be5184d8e32cab3579f8fc
---
M hieradata/hosts/dataset1001.yaml
A hieradata/hosts/dumpsdata1001.yaml
M hieradata/hosts/dumpsdata1002.yaml
M hieradata/hosts/labstore1006.yaml
M hieradata/hosts/ms1001.yaml
A modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
A modules/dumps/manifests/web/cleanup.pp
A modules/dumps/manifests/web/cleanups/miscdumps.pp
A modules/profile/manifests/dumps/web/cleanup.pp
M modules/role/manifests/dumps/generation/server/fallback.pp
M modules/role/manifests/dumps/generation/server/primary.pp
M modules/role/manifests/dumps/public/server.pp
M modules/role/manifests/dumps/web/xmldumps_active.pp
M modules/role/manifests/dumps/web/xmldumps_fallback.pp
14 files changed, 190 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/45/393245/1

diff --git a/hieradata/hosts/dataset1001.yaml b/hieradata/hosts/dataset1001.yaml
index d32db3e..18c0610 100644
--- a/hieradata/hosts/dataset1001.yaml
+++ b/hieradata/hosts/dataset1001.yaml
@@ -3,6 +3,7 @@
   dumps_group: 'datasets'
   dumps_deploygroup: 'wikidev'
   dumps_mntpoint: '/data'
+profile::dumps::cleanup::isreplica: true
 
 admin::groups:
   - dataset-admins
diff --git a/hieradata/hosts/dumpsdata1001.yaml b/hieradata/hosts/dumpsdata1001.yaml
new file mode 100644
index 0000000..345d04e
--- /dev/null
+++ b/hieradata/hosts/dumpsdata1001.yaml
@@ -0,0 +1 @@
+profile::dumps::cleanup::isreplica: false
diff --git a/hieradata/hosts/dumpsdata1002.yaml b/hieradata/hosts/dumpsdata1002.yaml
index 867c2a3..3fbb8a2 100644
--- a/hieradata/hosts/dumpsdata1002.yaml
+++ b/hieradata/hosts/dumpsdata1002.yaml
@@ -1,3 +1,8 @@
+# this is currently a dumps generation fallback host,
+# we configure cleanups of old files there the
+# same way we do the active generating host
+profile::dumps::cleanup::isreplica: false
+
 profile::dumps::rsyncer:
   dumps_user: 'dumpsgen'
   dumps_group: 'dumpsgen'
diff --git a/hieradata/hosts/labstore1006.yaml b/hieradata/hosts/labstore1006.yaml
index c74bbdb..842f7c5 100644
--- a/hieradata/hosts/labstore1006.yaml
+++ b/hieradata/hosts/labstore1006.yaml
@@ -1,3 +1,5 @@
+profile::dumps::cleanup::isreplica: true
+
 profile::dumps::rsyncer:
   dumps_user: 'dumpsgen'
   dumps_group: 'dumpsgen'
diff --git a/hieradata/hosts/ms1001.yaml b/hieradata/hosts/ms1001.yaml
index 8d56948..816c561 100644
--- a/hieradata/hosts/ms1001.yaml
+++ b/hieradata/hosts/ms1001.yaml
@@ -1,3 +1,5 @@
+profile::dumps::cleanup::isreplica: true
+
 profile::dumps::rsyncer:
   dumps_user: 'datasets'
   dumps_group: 'datasets'
diff --git a/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh b/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
new file mode 100644
index 0000000..d628db5
--- /dev/null
+++ b/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
@@ -0,0 +1,104 @@
+#!/bin/bash
+
+##
+# This file is managed by puppet!
+# puppet:///modules/dumps/web/cleanup_old_miscdumps.sh
+##
+
+# This script removes old files produced by misc dump
+# cron jobs, on hosts where they are rsynced over from
+# the generating host.
+
+# We clean them up here rather than rsync --delete,
+# because we keep and serve more of these on web
+# and other servers than on the generating host.
+
+usage() {
+cat<<EOF

+  --miscdumpsdir  path to root of misc dumps tree
+  --configfile    path to config file describing dirs and cleanup info
+  --dryrun        don't remove anything, print what would be done

+Example:  $0 --miscdumpsdir /data/xmldatadumps/other --configfile /etc/dumps/confs/cleanup_misc.conf
+EOF
+exit 1
+}
+
+miscdumpsdir=""
+configfile=""
+dryrun=""
+
+while [ $# -gt 0 ]; do
+if [ $1 == "--miscdumpsdir" ]; then
+miscdumpsdir="$2"
+shift; shift
+elif [ $1 == "--configfile" ]; then
+configfile="$2"
+shift; shift
+elif [ $1 == "--dryrun" ]; then
+   dryrun="yes"
+shift
+else
+echo "$0: Unknown option $1" >& 2
+usage
+fi
+done
+
+if [ -z "$miscdumpsdir" ]; then
+echo "$0: missing argument --miscdumpsdir"
+usage && exit 1
+elif [ -z "$configfile" ]; then
+echo "$0: missing argument --configfile"
+usage && exit 1
+fi
+
+if [ ! -d "$miscdumpsdir" ]; then
+echo "no such directory $miscdumpsdir"
+exit 1
+fi
+
+cd "$miscdumpsdir" || exit 1