ArielGlenn has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/386161 )
Change subject: use separate path for public/other datasets
......................................................................

use separate path for public/other datasets

[WIP]

These are typically generated by special cron jobs or even rsynced
over from other locations. Don't assume they are on the same
filesystem, so that we can move them to the dumpsdata hosts
separately from the xml/sql dumps.

This also means removing all references to the 'public' dir
(currently /mnt/data/xmldumps/public) on snapshot hosts, from
files for these cron jobs.

Some jobs reference a directory for temp files; this also needs
to be a separate path, so we don't write on one NFS filesystem
and move to the other.

Bug: T178888
Change-Id: I51f995d38dc7e04582b61381a39f186c61c52159
---
M modules/snapshot/files/cron/create-media-per-project-lists.sh
M modules/snapshot/files/cron/dump-global-blocks.sh
M modules/snapshot/files/cron/dumpcategoriesrdf.sh
M modules/snapshot/files/cron/dumpcirrussearch.sh
M modules/snapshot/files/cron/dumpcontentxlation.sh
M modules/snapshot/files/cron/dumpwikidatajson.sh
M modules/snapshot/files/cron/wikidatadumps-shared.sh
M modules/snapshot/manifests/cron/contentxlation.pp
M modules/snapshot/manifests/cron/dump_global_blocks.pp
M modules/snapshot/manifests/cron/pagetitles.pp
M modules/snapshot/manifests/dumps/dirs.pp
M modules/snapshot/templates/addschanges.conf.erb
M modules/snapshot/templates/set_dump_dirs.sh.erb
13 files changed, 32 insertions(+), 22 deletions(-)

Approvals:
  ArielGlenn: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/modules/snapshot/files/cron/create-media-per-project-lists.sh b/modules/snapshot/files/cron/create-media-per-project-lists.sh
index 9e8f7fe..d5cbe30 100755
--- a/modules/snapshot/files/cron/create-media-per-project-lists.sh
+++ b/modules/snapshot/files/cron/create-media-per-project-lists.sh
@@ -8,7 +8,7 @@
 source /usr/local/etc/set_dump_dirs.sh
 DATE=`/bin/date '+%Y%m%d'`
-outputdir="${datadir}/public/other/imageinfo/$DATE"
+outputdir="${otherdir}/imageinfo/$DATE"
 configfile="${confsdir}/wikidump.conf:media"
 errors=0
diff --git a/modules/snapshot/files/cron/dump-global-blocks.sh b/modules/snapshot/files/cron/dump-global-blocks.sh
index 9cd3b81..e7e4825 100644
--- a/modules/snapshot/files/cron/dump-global-blocks.sh
+++ b/modules/snapshot/files/cron/dump-global-blocks.sh
@@ -108,7 +108,7 @@
     checkval "$settingname" "${!settingname}"
 done
-outputdir="${datadir}/public/other/globalblocks"
+outputdir="${otherdir}/globalblocks"
 host=`get_db_host "$apachedir"` || exit 1
 db_user=`get_db_user "$apachedir"` || exit 1
diff --git a/modules/snapshot/files/cron/dumpcategoriesrdf.sh b/modules/snapshot/files/cron/dumpcategoriesrdf.sh
index d0b810d..dd1ebed 100755
--- a/modules/snapshot/files/cron/dumpcategoriesrdf.sh
+++ b/modules/snapshot/files/cron/dumpcategoriesrdf.sh
@@ -52,20 +52,19 @@
     exit 1
 fi
-args="wiki:dir,privatelist;tools:gzip;output:public"
+args="wiki:dir,privatelist;tools:gzip"
 results=`python "${repodir}/getconfigvals.py" --configfile "$configFile" --args "$args"`
 deployDir=`getsetting "$results" "wiki" "dir"` || exit 1
 privateList=`getsetting "$results" "wiki" "privatelist"` || exit 1
 gzip=`getsetting "$results" "tools" "gzip"` || exit 1
-publicDir=`getsetting "$results" "output" "public"` || exit 1
-for settingname in "deployDir" "gzip" "privateList" "publicDir"; do
+for settingname in "deployDir" "gzip" "privateList"; do
     checkval "$settingname" "${!settingname}"
 done
 today=$(date +'%Y%m%d')
-targetDirBase="${publicDir}/other/categoriesrdf"
+targetDirBase="${otherdir}/categoriesrdf"
 targetDir="${targetDirBase}/${today}"
 timestampsDir="${targetDirBase}/lastdump"
 multiVersionScript="${deployDir}/multiversion/MWScript.php"
diff --git a/modules/snapshot/files/cron/dumpcirrussearch.sh b/modules/snapshot/files/cron/dumpcirrussearch.sh
index 0431825..34d7a0d 100644
--- a/modules/snapshot/files/cron/dumpcirrussearch.sh
+++ b/modules/snapshot/files/cron/dumpcirrussearch.sh
@@ -40,21 +40,20 @@
     exit 1
 fi
-args="wiki:dir,dblist,privatelist;tools:gzip;output:public"
+args="wiki:dir,dblist,privatelist;tools:gzip"
 results=`python "${repodir}/getconfigvals.py" --configfile "$configFile" --args "$args"`
 deployDir=`getsetting "$results" "wiki" "dir"` || exit 1
 allList=`getsetting "$results" "wiki" "dblist"` || exit 1
 privateList=`getsetting "$results" "wiki" "privatelist"` || exit 1
 gzip=`getsetting "$results" "tools" "gzip"` || exit 1
-publicDir=`getsetting "$results" "output" "public"` || exit 1
-for settingname in "deployDir" "allList" "privateList" "gzip" "publicDir"; do
+for settingname in "deployDir" "allList" "privateList" "gzip"; do
     checkval "$settingname" "${!settingname}"
 done
 today=$(date +'%Y%m%d')
-targetDirBase="$publicDir/other/cirrussearch"
+targetDirBase="${otherdir}/cirrussearch"
 targetDir="$targetDirBase/$today"
 multiVersionScript="$deployDir/multiversion/MWScript.php"
diff --git a/modules/snapshot/files/cron/dumpcontentxlation.sh b/modules/snapshot/files/cron/dumpcontentxlation.sh
index 881a46e..a9a0235 100644
--- a/modules/snapshot/files/cron/dumpcontentxlation.sh
+++ b/modules/snapshot/files/cron/dumpcontentxlation.sh
@@ -36,7 +36,6 @@
 #####################
 configfile="${confsdir}/wikidump.conf"
-otherdir="${datadir}/public/other"
 dryrun="false"
 #####################
diff --git a/modules/snapshot/files/cron/dumpwikidatajson.sh b/modules/snapshot/files/cron/dumpwikidatajson.sh
index 773f588..a53ac8d 100644
--- a/modules/snapshot/files/cron/dumpwikidatajson.sh
+++ b/modules/snapshot/files/cron/dumpwikidatajson.sh
@@ -95,7 +95,7 @@
 mv $tempDir/wikidataJson.gz $targetFileGzip
 # Legacy directory (with legacy naming scheme)
-legacyDirectory=$publicDir/other/wikidata
+legacyDirectory=${otherdir}/wikidata
 ln -s "../wikibase/wikidatawiki/$today/$filename.json.gz" "$legacyDirectory/$today.json.gz"
 find $legacyDirectory -name '*.json.gz' -mtime +`expr $daysToKeep + 1` -delete
diff --git a/modules/snapshot/files/cron/wikidatadumps-shared.sh b/modules/snapshot/files/cron/wikidatadumps-shared.sh
index 9714f08..460ed02 100644
--- a/modules/snapshot/files/cron/wikidatadumps-shared.sh
+++ b/modules/snapshot/files/cron/wikidatadumps-shared.sh
@@ -14,18 +14,20 @@
 today=`date +'%Y%m%d'`
 daysToKeep=70
-args="wiki:dir;output:public,temp"
+args="wiki:dir;output:temp"
 results=`python "${repodir}/getconfigvals.py" --configfile "$configfile" --args "$args"`
 apacheDir=`getsetting "$results" "wiki" "dir"` || exit 1
-publicDir=`getsetting "$results" "output" "public"` || exit 1
-tempDir=`getsetting "$results" "output" "temp"` || exit 1
+#tempDir=`getsetting "$results" "output" "temp"` || exit 1
+# while jobs are split between dumpsdata and dataset hosts, fix this path
+# for those jobs remaining on dataset1001 for now
+tempDir="/mnt/data/xmldatadumps/temp"
-for settingname in "apacheDir" "publicDir" "tempDir"; do
+for settingname in "apacheDir" "tempDir"; do
     checkval "$settingname" "${!settingname}"
 done
-targetDirBase=$publicDir/other/wikibase/wikidatawiki
+targetDirBase=${otherdir}/wikibase/wikidatawiki
 targetDir=$targetDirBase/$today
 multiversionscript="${apacheDir}/multiversion/MWScript.php"
diff --git a/modules/snapshot/manifests/cron/contentxlation.pp b/modules/snapshot/manifests/cron/contentxlation.pp
index 0fc177f..91a05d5 100644
--- a/modules/snapshot/manifests/cron/contentxlation.pp
+++ b/modules/snapshot/manifests/cron/contentxlation.pp
@@ -3,7 +3,7 @@
 ) {
     include ::snapshot::dumps::dirs
-    $otherdir = "${snapshot::dumps::dirs::datadir}/public/other"
+    $otherdir = $snapshot::dumps::dirs::otherdir
     $repodir = $snapshot::dumps::dirs::repodir
     $confsdir = $snapshot::dumps::dirs::confsdir
     $xlationdir = "${otherdir}/contenttranslation"
diff --git a/modules/snapshot/manifests/cron/dump_global_blocks.pp b/modules/snapshot/manifests/cron/dump_global_blocks.pp
index d278cf2..61dabdf 100644
--- a/modules/snapshot/manifests/cron/dump_global_blocks.pp
+++ b/modules/snapshot/manifests/cron/dump_global_blocks.pp
@@ -3,7 +3,7 @@
 ) {
     include ::snapshot::dumps::dirs
     $confsdir = $snapshot::dumps::dirs::confsdir
-    $otherdir = "${snapshot::dumps::dirs::datadir}/public/other"
+    $otherdir = $snapshot::dumps::dirs::otherdir
     $globalblocksdir = "${otherdir}/globalblocks"
     file { '/usr/local/bin/dump-global-blocks.sh':
diff --git a/modules/snapshot/manifests/cron/pagetitles.pp b/modules/snapshot/manifests/cron/pagetitles.pp
index 0df5588..53c0d55 100644
--- a/modules/snapshot/manifests/cron/pagetitles.pp
+++ b/modules/snapshot/manifests/cron/pagetitles.pp
@@ -3,7 +3,7 @@
 ) {
     include ::snapshot::dumps::dirs
-    $otherdir = "${snapshot::dumps::dirs::datadir}/public/other"
+    $otherdir = $snapshot::dumps::dirs::otherdir
     $repodir = $snapshot::dumps::dirs::repodir
     $confsdir = $snapshot::dumps::dirs::confsdir
diff --git a/modules/snapshot/manifests/dumps/dirs.pp b/modules/snapshot/manifests/dumps/dirs.pp
index e1c3f3e..971787f 100644
--- a/modules/snapshot/manifests/dumps/dirs.pp
+++ b/modules/snapshot/manifests/dumps/dirs.pp
@@ -56,6 +56,13 @@
         group => 'root',
     }
+    # maintained on the NFS fileserver, not here
+    # but we need to know the path, used for
+    # dumps and datasets other than the main
+    # xml/sql dumps
+    # make this explicit for now
+    $otherdir = '/mnt/data/xmldatadumps/public/other'
+
     $repodir = '/srv/deployment/dumps/dumps/xmldumps-backup'
     file { '/usr/local/etc/set_dump_dirs.sh':
diff --git a/modules/snapshot/templates/addschanges.conf.erb b/modules/snapshot/templates/addschanges.conf.erb
index a6bace0..ecdfd84 100644
--- a/modules/snapshot/templates/addschanges.conf.erb
+++ b/modules/snapshot/templates/addschanges.conf.erb
@@ -14,10 +14,13 @@
 adminsettings=private/PrivateSettings.php
 [output]
-dumpdir=<%= scope.lookupvar('snapshot::dumps::dirs::datadir') -%>/public/other/incr
+dumpdir=<%= scope.lookupvar('snapshot::dumps::dirs::otherdir') -%>/incr
 templatedir=<%= scope.lookupvar('snapshot::dumps::dirs::templsdir') %>
 indextmpl=<%= scope.lookupvar('snapshot::dumps::dirs::templsdir') -%>/incrs-index.html
-temp=<%= scope.lookupvar('snapshot::dumps::dirs::datadir') -%>/temp
+#temp=<%= scope.lookupvar('snapshot::dumps::dirs::datadir') -%>/temp
+# hardcode this during the period where some jobs will be moved to dumpsdata
+# hosts but this one will not
+temp=/mnt/data/xmldatadumps/temp
 webroot=http://download.wikimedia.org
 fileperms=0644
 # revisions must be at least this much older than time of current run
diff --git a/modules/snapshot/templates/set_dump_dirs.sh.erb b/modules/snapshot/templates/set_dump_dirs.sh.erb
index 90818cd..03ac0e6 100644
--- a/modules/snapshot/templates/set_dump_dirs.sh.erb
+++ b/modules/snapshot/templates/set_dump_dirs.sh.erb
@@ -4,5 +4,6 @@
 confsdir="<%= scope.lookupvar('snapshot::dumps::dirs::confsdir') -%>"
 repodir="<%= scope.lookupvar('snapshot::dumps::dirs::repodir') -%>"
 datadir="<%= scope.lookupvar('snapshot::dumps::dirs::datadir') -%>"
+otherdir="<%= scope.lookupvar('snapshot::dumps::dirs::otherdir') -%>"
 dumpsdir="<%= scope.lookupvar('snapshot::dumps::dirs::dumpsdir') -%>"
 dblistsdir="<%= scope.lookupvar('snapshot::dumps::dirs::dblistsdir') -%>"

--
To view, visit https://gerrit.wikimedia.org/r/386161
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I51f995d38dc7e04582b61381a39f186c61c52159
Gerrit-PatchSet: 8
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: Hoo man <h...@online.de>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
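
Illustrative note, not part of the merged change: the commit message asks that the temp directory and the output directory stay on the same filesystem, so that a job never writes on one NFS mount and then moves the result to another. A minimal sketch of how a cron script might build its output path from otherdir and check that condition is shown here. The helper names other_output_dir and same_filesystem are hypothetical, and the device comparison assumes GNU coreutils stat; only the otherdir variable comes from the change itself.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; they do not exist in
# operations/puppet.

# Build an output path below otherdir, the way the cron scripts in this
# change do (for example "${otherdir}/imageinfo/$DATE").
# Fails with a message if otherdir has not been set, e.g. because
# /usr/local/etc/set_dump_dirs.sh was not sourced.
other_output_dir() {
    printf '%s/%s/%s\n' "${otherdir:?otherdir is not set}" "$1" "$2"
}

# Return 0 if both existing paths are on the same filesystem, 1 if they
# are not, and 2 if either path cannot be inspected.
# Assumption: GNU stat, where -c '%d' prints the device number.
same_filesystem() {
    local dev_a dev_b
    dev_a=$(stat -c '%d' "$1" 2>/dev/null) || return 2
    dev_b=$(stat -c '%d' "$2" 2>/dev/null) || return 2
    [ "$dev_a" = "$dev_b" ]
}
```

A script could call same_filesystem "$tempDir" "$targetDir" before its final mv and stop with an error when the two differ, rather than silently copying across NFS mounts.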