[MediaWiki-commits] [Gerrit] operations/puppet[production]: clean up old misc dump output files from cron jobs on dump h...
ArielGlenn has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/393245 )

Change subject: clean up old misc dump output files from cron jobs on dump hosts
..

clean up old misc dump output files from cron jobs on dump hosts

This cleans up on both the hosts where the dumps are generated,
where we will keep fewer files, and on the hosts which serve them
via nfs or the web, where we will keep more.

Bug: T179942
Change-Id: Ie8e0a2a27a09009cf7be5184d8e32cab3579f8fc
---
M hieradata/hosts/dataset1001.yaml
A hieradata/hosts/dumpsdata1001.yaml
M hieradata/hosts/dumpsdata1002.yaml
M hieradata/hosts/labstore1006.yaml
M hieradata/hosts/ms1001.yaml
A modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
A modules/dumps/manifests/web/cleanup.pp
A modules/dumps/manifests/web/cleanups/miscdumps.pp
M modules/dumps/manifests/web/cleanups/xml_cleanup.pp
M modules/dumps/manifests/web/xmldumps_active.pp
A modules/profile/manifests/dumps/web/cleanup.pp
M modules/profile/manifests/dumps/web/xmldumps_active.pp
M modules/role/manifests/dumps/generation/server/fallback.pp
M modules/role/manifests/dumps/generation/server/primary.pp
M modules/role/manifests/dumps/public/server.pp
M modules/role/manifests/dumps/web/xmldumps_active.pp
M modules/role/manifests/dumps/web/xmldumps_fallback.pp
17 files changed, 200 insertions(+), 11 deletions(-)

Approvals:
  ArielGlenn: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/hieradata/hosts/dataset1001.yaml b/hieradata/hosts/dataset1001.yaml
index d32db3e..967f0c7 100644
--- a/hieradata/hosts/dataset1001.yaml
+++ b/hieradata/hosts/dataset1001.yaml
@@ -1,3 +1,6 @@
+profile::dumps::cleanup::isreplica: true
+profile::dumps::miscdumpsdir: '/data/xmldatadumps/public/other'
+
 profile::dumps::rsyncer:
   dumps_user: 'datasets'
   dumps_group: 'datasets'
diff --git a/hieradata/hosts/dumpsdata1001.yaml b/hieradata/hosts/dumpsdata1001.yaml
new file mode 100644
index 000..12f28b6
--- /dev/null
+++ b/hieradata/hosts/dumpsdata1001.yaml
@@ -0,0 +1,2 @@
+profile::dumps::miscdumpsdir: '/data/otherdumps'
+profile::dumps::cleanup::isreplica: false
diff --git a/hieradata/hosts/dumpsdata1002.yaml b/hieradata/hosts/dumpsdata1002.yaml
index 867c2a3..6cba27b 100644
--- a/hieradata/hosts/dumpsdata1002.yaml
+++ b/hieradata/hosts/dumpsdata1002.yaml
@@ -1,3 +1,10 @@
+profile::dumps::miscdumpsdir: '/data/otherdumps'
+
+# this is currently a dumps generation fallback host,
+# we configure cleanups of old files there the
+# same way we do the active generating host
+profile::dumps::cleanup::isreplica: false
+
 profile::dumps::rsyncer:
   dumps_user: 'dumpsgen'
   dumps_group: 'dumpsgen'
diff --git a/hieradata/hosts/labstore1006.yaml b/hieradata/hosts/labstore1006.yaml
index c74bbdb..549439b 100644
--- a/hieradata/hosts/labstore1006.yaml
+++ b/hieradata/hosts/labstore1006.yaml
@@ -1,3 +1,6 @@
+profile::dumps::miscdumpsdir: '/srv/dumps/xmldatadumps/public/other'
+profile::dumps::cleanup::isreplica: true
+
 profile::dumps::rsyncer:
   dumps_user: 'dumpsgen'
   dumps_group: 'dumpsgen'
diff --git a/hieradata/hosts/ms1001.yaml b/hieradata/hosts/ms1001.yaml
index 8d56948..1f7dd46 100644
--- a/hieradata/hosts/ms1001.yaml
+++ b/hieradata/hosts/ms1001.yaml
@@ -1,3 +1,6 @@
+profile::dumps::cleanup::isreplica: true
+profile::dumps::miscdumpsdir: '/data/xmldatadumps/public/other'
+
 profile::dumps::rsyncer:
   dumps_user: 'datasets'
   dumps_group: 'datasets'
diff --git a/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh b/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
new file mode 100644
index 000..54bf80c
--- /dev/null
+++ b/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
@@ -0,0 +1,101 @@
+#!/bin/bash
+
+##
+# This file is managed by puppet!
+# puppet:///modules/dumps/web/cleanup_old_miscdumps.sh
+##
+
+# This script removes old files produced by misc dump
+# cron jobs, on hosts where they are rsynced over from
+# the generating host.
+
+# We clean them up here rather than rsync --delete,
+# because we keep and serve more of these on web
+# and other servers than on the generating host.
+
+usage() {
+    cat<
+
+  --miscdumpsdir   path to root of misc dumps tree
+  --configfile     path to config file describing dirs and cleanup info
+  --dryrun         don't remove anything, print what would be done
+
+Example: $0 --miscdumpsdir /data/xmldatadumps/other --configfile /etc/dumps/confs/cleanup_misc.conf
+EOF
+    exit 1
+}
+
+miscdumpsdir=""
+configfile=""
+dryrun=""
+
+while [ $# -gt 0 ]; do
+    if [ $1 == "--miscdumpsdir" ]; then
+        miscdumpsdir="$2"
+        shift; shift
+    elif [ $1 == "--configfile" ]; then
+        configfile="$2"
+        shift; shift
+    elif [ $1 == "--dryrun" ]; then
+        dryrun="yes"
+        shift
+    else
+        echo "$0: Unknown option $1" >& 2
+        usage
+    fi
+done
+
[MediaWiki-commits] [Gerrit] operations/puppet[production]: clean up old misc dump output files from cron jobs on dump h...
ArielGlenn has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/393245 )

Change subject: clean up old misc dump output files from cron jobs on dump hosts
..

clean up old misc dump output files from cron jobs on dump hosts

[WIP] draft, untested, needs proper keep values in the manifest
for replicas, etc

This cleans up on both the hosts where the dumps are generated,
where we will keep less, and on the hosts which serve them
via nfs or the web, where we will keep more.

Bug: T179942
Change-Id: Ie8e0a2a27a09009cf7be5184d8e32cab3579f8fc
---
M hieradata/hosts/dataset1001.yaml
A hieradata/hosts/dumpsdata1001.yaml
M hieradata/hosts/dumpsdata1002.yaml
M hieradata/hosts/labstore1006.yaml
M hieradata/hosts/ms1001.yaml
A modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
A modules/dumps/manifests/web/cleanup.pp
A modules/dumps/manifests/web/cleanups/miscdumps.pp
A modules/profile/manifests/dumps/web/cleanup.pp
M modules/role/manifests/dumps/generation/server/fallback.pp
M modules/role/manifests/dumps/generation/server/primary.pp
M modules/role/manifests/dumps/public/server.pp
M modules/role/manifests/dumps/web/xmldumps_active.pp
M modules/role/manifests/dumps/web/xmldumps_fallback.pp
14 files changed, 190 insertions(+), 0 deletions(-)

git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/45/393245/1

diff --git a/hieradata/hosts/dataset1001.yaml b/hieradata/hosts/dataset1001.yaml
index d32db3e..18c0610 100644
--- a/hieradata/hosts/dataset1001.yaml
+++ b/hieradata/hosts/dataset1001.yaml
@@ -3,6 +3,7 @@
   dumps_group: 'datasets'
   dumps_deploygroup: 'wikidev'
   dumps_mntpoint: '/data'
+profile::dumps::cleanup::isreplica: true

 admin::groups:
   - dataset-admins
diff --git a/hieradata/hosts/dumpsdata1001.yaml b/hieradata/hosts/dumpsdata1001.yaml
new file mode 100644
index 000..345d04e
--- /dev/null
+++ b/hieradata/hosts/dumpsdata1001.yaml
@@ -0,0 +1 @@
+profile::dumps::cleanup::isreplica: false
diff --git a/hieradata/hosts/dumpsdata1002.yaml b/hieradata/hosts/dumpsdata1002.yaml
index 867c2a3..3fbb8a2 100644
--- a/hieradata/hosts/dumpsdata1002.yaml
+++ b/hieradata/hosts/dumpsdata1002.yaml
@@ -1,3 +1,8 @@
+# this is currently a dumps generation fallback host,
+# we configure cleanups of old files there the
+# same way we do the active generating host
+profile::dumps::cleanup::isreplica: false
+
 profile::dumps::rsyncer:
   dumps_user: 'dumpsgen'
   dumps_group: 'dumpsgen'
diff --git a/hieradata/hosts/labstore1006.yaml b/hieradata/hosts/labstore1006.yaml
index c74bbdb..842f7c5 100644
--- a/hieradata/hosts/labstore1006.yaml
+++ b/hieradata/hosts/labstore1006.yaml
@@ -1,3 +1,5 @@
+profile::dumps::cleanup::isreplica: true
+
 profile::dumps::rsyncer:
   dumps_user: 'dumpsgen'
   dumps_group: 'dumpsgen'
diff --git a/hieradata/hosts/ms1001.yaml b/hieradata/hosts/ms1001.yaml
index 8d56948..816c561 100644
--- a/hieradata/hosts/ms1001.yaml
+++ b/hieradata/hosts/ms1001.yaml
@@ -1,3 +1,5 @@
+profile::dumps::cleanup::isreplica: true
+
 profile::dumps::rsyncer:
   dumps_user: 'datasets'
   dumps_group: 'datasets'
diff --git a/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh b/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
new file mode 100644
index 000..d628db5
--- /dev/null
+++ b/modules/dumps/files/web/cleanups/cleanup_old_miscdumps.sh
@@ -0,0 +1,104 @@
+#!/bin/bash
+
+##
+# This file is managed by puppet!
+# puppet:///modules/dumps/web/cleanup_old_miscdumps.sh
+##
+
+# This script removes old files produced by misc dump
+# cron jobs, on hosts where they are rsynced over from
+# the generating host.
+
+# We clean them up here rather than rsync --delete,
+# because we keep and serve more of these on web
+# and other servers than on the generating host.
+
+usage() {
+    cat<
+
+  --miscdumpsdir   path to root of misc dumps tree
+  --configfile     path to config file describing dirs and cleanup info
+  --dryrun         don't remove anything, print what would be done
+
+Example: $0 --miscdumpsdir /data/xmldatadumps/other --configfile /etc/dumps/confs/cleanup_misc.conf
+EOF
+    exit 1
+}
+
+miscdumpsdir=""
+configfile=""
+dryrun=""
+
+while [ $# -gt 0 ]; do
+    if [ $1 == "--miscdumpsdir" ]; then
+        miscdumpsdir="$2"
+        shift; shift
+    elif [ $1 == "--configfile" ]; then
+        configfile="$2"
+        shift; shift
+    elif [ $1 == "--dryrun" ]; then
+        dryrun="yes"
+        shift
+    else
+        echo "$0: Unknown option $1" >& 2
+        usage
+    fi
+done
+
+if [ -z "$miscdumpsdir" ]; then
+    echo "$0: missing argument --miscdumpsdir"
+    usage && exit 1
+elif [ -z "$configfile" ]; then
+    echo "$0: missing argument --configfile"
+    usage && exit 1
+fi
+
+if [ ! -d "$miscdumpsdir" ]; then
+    echo "no such directory $miscdumpsdir"
+    exit 1
+fi
+
+cd "$miscdumpsdir" || exit 1
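
The remainder of cleanup_old_miscdumps.sh is not included in this message. As a rough illustration of the kind of keep-newest cleanup that the script comments and the --dryrun option describe, here is a minimal sketch. The function name cleanup_keep_newest, its three arguments, and the assumption that entries are named by date (so that a reverse sort lists the newest first) are all hypothetical and do not come from the change itself; the real script reads its settings from a config file whose format is not shown here.

```shell
#!/bin/bash

# Hypothetical helper, not taken from the change above.
# Keeps the newest $2 entries in directory $1 and removes the rest.
# If $3 is non-empty, nothing is removed and the candidates are printed.
# Assumes entry names sort chronologically (for example YYYYMMDD)
# and contain no newlines.
cleanup_keep_newest() {
    local dir="$1" keep="$2" dryrun="$3"
    local entry

    # Sort newest first, skip the first $keep names, handle the remainder.
    ls -1 "$dir" | sort -r | tail -n "+$((keep + 1))" | while read -r entry; do
        if [ -n "$dryrun" ]; then
            echo "would remove $dir/$entry"
        else
            # ${dir:?} aborts the removal if dir is unexpectedly empty
            rm -rf -- "${dir:?}/$entry"
        fi
    done
}
```

Called as cleanup_keep_newest some/job/dir 2 yes (the path is a placeholder), this would only print the entries older than the two newest; with an empty third argument it would remove them. A count-based keep value is one plausible reading of "keep fewer files" in the commit message, but an age-based rule would fit the description equally well.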