ArielGlenn has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/386161 )

Change subject: use separate path for public/other datasets
......................................................................


use separate path for public/other datasets

[WIP]

These datasets are typically generated by special cron jobs or even
rsynced over from other locations.  Don't assume they are on the same
filesystem as the xml/sql dumps, so that we can move them to the
dumpsdata hosts separately.
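
As a rough illustration of the idea above (not code from this change), a
cron script can build its output path from a single otherdir value instead
of deriving it from datadir. The helper name dataset_outputdir and the
sample date are made up for this sketch; otherdir is the variable this
change adds to set_dump_dirs.sh.

```shell
#!/bin/bash
# Sketch only: dataset_outputdir is a hypothetical helper, not part of this change.
# It builds a dated output path under ${otherdir} and refuses to guess when
# ${otherdir} is unset, rather than falling back to ${datadir}/public/other.
dataset_outputdir() {
    local name="$1" date="$2"
    if [ -z "${otherdir:-}" ]; then
        echo "otherdir is not set" >&2
        return 1
    fi
    echo "${otherdir}/${name}/${date}"
}

# In the real scripts otherdir comes from /usr/local/etc/set_dump_dirs.sh;
# it is set inline here only so the sketch runs on its own.
otherdir="/mnt/data/xmldatadumps/public/other"
dataset_outputdir "imageinfo" "20171024"
```

Failing when otherdir is missing keeps a misconfigured host from silently
writing under the xml/sql dumps tree.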

This also means removing all references to the 'public' dir (currently
/mnt/data/xmldatadumps/public) on snapshot hosts from the files for
these cron jobs.

Some jobs reference a directory for temp files; this also needs to be
a separate path, so we don't write on one NFS filesystem and move to
the other.
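
A minimal sketch of why the temp directory matters, assuming GNU coreutils
stat: when the temp directory and the target directory sit on different
filesystems, mv has to copy the data across instead of renaming it. The
helper same_filesystem and the scratch directories are hypothetical, not
part of this change.

```shell
#!/bin/bash
# Sketch only: same_filesystem is a hypothetical helper, not part of this change.
# It compares device numbers as reported by GNU coreutils stat (-c '%d').
# Returns 0 if both paths are on the same device, 1 if not, 2 on stat failure.
same_filesystem() {
    local dev_a dev_b
    dev_a=$(stat -c '%d' "$1") || return 2
    dev_b=$(stat -c '%d' "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}

# Scratch directories standing in for the temp and output paths.
workdir=$(mktemp -d)
tempDir="${workdir}/temp"
targetDir="${workdir}/other/wikibase"
mkdir -p "$tempDir" "$targetDir"

if same_filesystem "$tempDir" "$targetDir"; then
    echo "same filesystem: mv is a rename"
else
    echo "different filesystems: mv would copy the data across"
fi
rm -rf "$workdir"
```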

Bug: T178888
Change-Id: I51f995d38dc7e04582b61381a39f186c61c52159
---
M modules/snapshot/files/cron/create-media-per-project-lists.sh
M modules/snapshot/files/cron/dump-global-blocks.sh
M modules/snapshot/files/cron/dumpcategoriesrdf.sh
M modules/snapshot/files/cron/dumpcirrussearch.sh
M modules/snapshot/files/cron/dumpcontentxlation.sh
M modules/snapshot/files/cron/dumpwikidatajson.sh
M modules/snapshot/files/cron/wikidatadumps-shared.sh
M modules/snapshot/manifests/cron/contentxlation.pp
M modules/snapshot/manifests/cron/dump_global_blocks.pp
M modules/snapshot/manifests/cron/pagetitles.pp
M modules/snapshot/manifests/dumps/dirs.pp
M modules/snapshot/templates/addschanges.conf.erb
M modules/snapshot/templates/set_dump_dirs.sh.erb
13 files changed, 32 insertions(+), 22 deletions(-)

Approvals:
  ArielGlenn: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/modules/snapshot/files/cron/create-media-per-project-lists.sh b/modules/snapshot/files/cron/create-media-per-project-lists.sh
index 9e8f7fe..d5cbe30 100755
--- a/modules/snapshot/files/cron/create-media-per-project-lists.sh
+++ b/modules/snapshot/files/cron/create-media-per-project-lists.sh
@@ -8,7 +8,7 @@
 source /usr/local/etc/set_dump_dirs.sh
 
 DATE=`/bin/date '+%Y%m%d'`
-outputdir="${datadir}/public/other/imageinfo/$DATE"
+outputdir="${otherdir}/imageinfo/$DATE"
 configfile="${confsdir}/wikidump.conf:media"
 errors=0
 
diff --git a/modules/snapshot/files/cron/dump-global-blocks.sh b/modules/snapshot/files/cron/dump-global-blocks.sh
index 9cd3b81..e7e4825 100644
--- a/modules/snapshot/files/cron/dump-global-blocks.sh
+++ b/modules/snapshot/files/cron/dump-global-blocks.sh
@@ -108,7 +108,7 @@
     checkval "$settingname" "${!settingname}"
 done
 
-outputdir="${datadir}/public/other/globalblocks"
+outputdir="${otherdir}/globalblocks"
 
 host=`get_db_host "$apachedir"` || exit 1
 db_user=`get_db_user "$apachedir"` || exit 1
diff --git a/modules/snapshot/files/cron/dumpcategoriesrdf.sh b/modules/snapshot/files/cron/dumpcategoriesrdf.sh
index d0b810d..dd1ebed 100755
--- a/modules/snapshot/files/cron/dumpcategoriesrdf.sh
+++ b/modules/snapshot/files/cron/dumpcategoriesrdf.sh
@@ -52,20 +52,19 @@
        exit 1
 fi
 
-args="wiki:dir,privatelist;tools:gzip;output:public"
+args="wiki:dir,privatelist;tools:gzip"
 results=`python "${repodir}/getconfigvals.py" --configfile "$configFile" --args "$args"`
 
 deployDir=`getsetting "$results" "wiki" "dir"` || exit 1
 privateList=`getsetting "$results" "wiki" "privatelist"` || exit 1
 gzip=`getsetting "$results" "tools" "gzip"` || exit 1
-publicDir=`getsetting "$results" "output" "public"` || exit 1
 
-for settingname in "deployDir" "gzip" "privateList" "publicDir"; do
+for settingname in "deployDir" "gzip" "privateList"; do
     checkval "$settingname" "${!settingname}"
 done
 
 today=$(date +'%Y%m%d')
-targetDirBase="${publicDir}/other/categoriesrdf"
+targetDirBase="${otherdir}/categoriesrdf"
 targetDir="${targetDirBase}/${today}"
 timestampsDir="${targetDirBase}/lastdump"
 multiVersionScript="${deployDir}/multiversion/MWScript.php"
diff --git a/modules/snapshot/files/cron/dumpcirrussearch.sh b/modules/snapshot/files/cron/dumpcirrussearch.sh
index 0431825..34d7a0d 100644
--- a/modules/snapshot/files/cron/dumpcirrussearch.sh
+++ b/modules/snapshot/files/cron/dumpcirrussearch.sh
@@ -40,21 +40,20 @@
        exit 1
 fi
 
-args="wiki:dir,dblist,privatelist;tools:gzip;output:public"
+args="wiki:dir,dblist,privatelist;tools:gzip"
 results=`python "${repodir}/getconfigvals.py" --configfile "$configFile" --args "$args"`
 
 deployDir=`getsetting "$results" "wiki" "dir"` || exit 1
 allList=`getsetting "$results" "wiki" "dblist"` || exit 1
 privateList=`getsetting "$results" "wiki" "privatelist"` || exit 1
 gzip=`getsetting "$results" "tools" "gzip"` || exit 1
-publicDir=`getsetting "$results" "output" "public"` || exit 1
 
-for settingname in "deployDir" "allList" "privateList" "gzip" "publicDir"; do
+for settingname in "deployDir" "allList" "privateList" "gzip"; do
     checkval "$settingname" "${!settingname}"
 done
 
 today=$(date +'%Y%m%d')
-targetDirBase="$publicDir/other/cirrussearch"
+targetDirBase="${otherdir}/cirrussearch"
 targetDir="$targetDirBase/$today"
 multiVersionScript="$deployDir/multiversion/MWScript.php"
 
diff --git a/modules/snapshot/files/cron/dumpcontentxlation.sh b/modules/snapshot/files/cron/dumpcontentxlation.sh
index 881a46e..a9a0235 100644
--- a/modules/snapshot/files/cron/dumpcontentxlation.sh
+++ b/modules/snapshot/files/cron/dumpcontentxlation.sh
@@ -36,7 +36,6 @@
 #####################
 
 configfile="${confsdir}/wikidump.conf"
-otherdir="${datadir}/public/other"
 dryrun="false"
 
 #####################
diff --git a/modules/snapshot/files/cron/dumpwikidatajson.sh b/modules/snapshot/files/cron/dumpwikidatajson.sh
index 773f588..a53ac8d 100644
--- a/modules/snapshot/files/cron/dumpwikidatajson.sh
+++ b/modules/snapshot/files/cron/dumpwikidatajson.sh
@@ -95,7 +95,7 @@
 mv $tempDir/wikidataJson.gz $targetFileGzip
 
 # Legacy directory (with legacy naming scheme)
-legacyDirectory=$publicDir/other/wikidata
+legacyDirectory=${otherdir}/wikidata
 ln -s "../wikibase/wikidatawiki/$today/$filename.json.gz" "$legacyDirectory/$today.json.gz"
 find $legacyDirectory -name '*.json.gz' -mtime +`expr $daysToKeep + 1` -delete
 
diff --git a/modules/snapshot/files/cron/wikidatadumps-shared.sh b/modules/snapshot/files/cron/wikidatadumps-shared.sh
index 9714f08..460ed02 100644
--- a/modules/snapshot/files/cron/wikidatadumps-shared.sh
+++ b/modules/snapshot/files/cron/wikidatadumps-shared.sh
@@ -14,18 +14,20 @@
 today=`date +'%Y%m%d'`
 daysToKeep=70
 
-args="wiki:dir;output:public,temp"
+args="wiki:dir;output:temp"
 results=`python "${repodir}/getconfigvals.py" --configfile "$configfile" --args "$args"`
 
 apacheDir=`getsetting "$results" "wiki" "dir"` || exit 1
-publicDir=`getsetting "$results" "output" "public"` || exit 1
-tempDir=`getsetting "$results" "output" "temp"` || exit 1
+#tempDir=`getsetting "$results" "output" "temp"` || exit 1
+# while jobs are split between dumpsdata and dataset hosts, fix this path
+# for those jobs remaining on dataset1001 for now
+tempDir="/mnt/data/xmldatadumps/temp"
 
-for settingname in "apacheDir" "publicDir" "tempDir"; do
+for settingname in "apacheDir" "tempDir"; do
     checkval "$settingname" "${!settingname}"
 done
 
-targetDirBase=$publicDir/other/wikibase/wikidatawiki
+targetDirBase=${otherdir}/wikibase/wikidatawiki
 targetDir=$targetDirBase/$today
 
 multiversionscript="${apacheDir}/multiversion/MWScript.php"
diff --git a/modules/snapshot/manifests/cron/contentxlation.pp b/modules/snapshot/manifests/cron/contentxlation.pp
index 0fc177f..91a05d5 100644
--- a/modules/snapshot/manifests/cron/contentxlation.pp
+++ b/modules/snapshot/manifests/cron/contentxlation.pp
@@ -3,7 +3,7 @@
 ) {
     include ::snapshot::dumps::dirs
 
-    $otherdir = "${snapshot::dumps::dirs::datadir}/public/other"
+    $otherdir = $snapshot::dumps::dirs::otherdir
     $repodir = $snapshot::dumps::dirs::repodir
     $confsdir = $snapshot::dumps::dirs::confsdir
     $xlationdir = "${otherdir}/contenttranslation"
diff --git a/modules/snapshot/manifests/cron/dump_global_blocks.pp b/modules/snapshot/manifests/cron/dump_global_blocks.pp
index d278cf2..61dabdf 100644
--- a/modules/snapshot/manifests/cron/dump_global_blocks.pp
+++ b/modules/snapshot/manifests/cron/dump_global_blocks.pp
@@ -3,7 +3,7 @@
 ) {
     include ::snapshot::dumps::dirs
     $confsdir = $snapshot::dumps::dirs::confsdir
-    $otherdir = "${snapshot::dumps::dirs::datadir}/public/other"
+    $otherdir = $snapshot::dumps::dirs::otherdir
     $globalblocksdir = "${otherdir}/globalblocks"
 
     file { '/usr/local/bin/dump-global-blocks.sh':
diff --git a/modules/snapshot/manifests/cron/pagetitles.pp b/modules/snapshot/manifests/cron/pagetitles.pp
index 0df5588..53c0d55 100644
--- a/modules/snapshot/manifests/cron/pagetitles.pp
+++ b/modules/snapshot/manifests/cron/pagetitles.pp
@@ -3,7 +3,7 @@
 ) {
     include ::snapshot::dumps::dirs
 
-    $otherdir = "${snapshot::dumps::dirs::datadir}/public/other"
+    $otherdir = $snapshot::dumps::dirs::otherdir
     $repodir = $snapshot::dumps::dirs::repodir
     $confsdir = $snapshot::dumps::dirs::confsdir
 
diff --git a/modules/snapshot/manifests/dumps/dirs.pp b/modules/snapshot/manifests/dumps/dirs.pp
index e1c3f3e..971787f 100644
--- a/modules/snapshot/manifests/dumps/dirs.pp
+++ b/modules/snapshot/manifests/dumps/dirs.pp
@@ -56,6 +56,13 @@
       group  => 'root',
     }
 
+    # maintained on the NFS fileserver, not here
+    # but we need to know the path, used for
+    # dumps and datasets other than the main
+    # xml/sql dumps
+    # make this explicit for now
+    $otherdir = '/mnt/data/xmldatadumps/public/other'
+
     $repodir = '/srv/deployment/dumps/dumps/xmldumps-backup'
 
     file { '/usr/local/etc/set_dump_dirs.sh':
diff --git a/modules/snapshot/templates/addschanges.conf.erb b/modules/snapshot/templates/addschanges.conf.erb
index a6bace0..ecdfd84 100644
--- a/modules/snapshot/templates/addschanges.conf.erb
+++ b/modules/snapshot/templates/addschanges.conf.erb
@@ -14,10 +14,13 @@
 adminsettings=private/PrivateSettings.php
 
 [output]
-dumpdir=<%= scope.lookupvar('snapshot::dumps::dirs::datadir') -%>/public/other/incr
+dumpdir=<%= scope.lookupvar('snapshot::dumps::dirs::otherdir') -%>/incr
 templatedir=<%= scope.lookupvar('snapshot::dumps::dirs::templsdir') %>
 indextmpl=<%= scope.lookupvar('snapshot::dumps::dirs::templsdir') -%>/incrs-index.html
-temp=<%= scope.lookupvar('snapshot::dumps::dirs::datadir') -%>/temp
+#temp=<%= scope.lookupvar('snapshot::dumps::dirs::datadir') -%>/temp
+# hardcode this during the period where some jobs will be moved to dumpsdata
+# hosts but this one will not
+temp=/mnt/data/xmldatadumps/temp
 webroot=http://download.wikimedia.org
 fileperms=0644
 # revisions must be at least this much older than time of current run
diff --git a/modules/snapshot/templates/set_dump_dirs.sh.erb b/modules/snapshot/templates/set_dump_dirs.sh.erb
index 90818cd..03ac0e6 100644
--- a/modules/snapshot/templates/set_dump_dirs.sh.erb
+++ b/modules/snapshot/templates/set_dump_dirs.sh.erb
@@ -4,5 +4,6 @@
 confsdir="<%= scope.lookupvar('snapshot::dumps::dirs::confsdir') -%>"
 repodir="<%= scope.lookupvar('snapshot::dumps::dirs::repodir') -%>"
 datadir="<%= scope.lookupvar('snapshot::dumps::dirs::datadir') -%>"
+otherdir="<%= scope.lookupvar('snapshot::dumps::dirs::otherdir') -%>"
 dumpsdir="<%= scope.lookupvar('snapshot::dumps::dirs::dumpsdir') -%>"
 dblistsdir="<%= scope.lookupvar('snapshot::dumps::dirs::dblistsdir') -%>"

-- 
To view, visit https://gerrit.wikimedia.org/r/386161
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I51f995d38dc7e04582b61381a39f186c61c52159
Gerrit-PatchSet: 8
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: ArielGlenn <ar...@wikimedia.org>
Gerrit-Reviewer: Hoo man <h...@online.de>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits