DCausse has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/387442 )
Change subject: Update resource usage docs
......................................................................
Update resource usage docs
Add some information learned when evaluating different sizes of
executors used to train models.
Bug: T170009
Change-Id: I2cf916f999f053d1ce8545ac8279815e9b09a0f9
---
M docs/running-in-analytics.rst
M mjolnir/test/fixtures/load_config/example_train.expect
M mjolnir/test/utilities/test_spark.py
3 files changed, 89 insertions(+), 60 deletions(-)
Approvals:
DCausse: Verified; Looks good to me, approved
diff --git a/docs/running-in-analytics.rst b/docs/running-in-analytics.rst
index f728052..93a6a5e 100644
--- a/docs/running-in-analytics.rst
+++ b/docs/running-in-analytics.rst
@@ -295,10 +295,37 @@
Same as before the final argument is the wiki to limit data collection and
training to.
-Resource usage in the hadoop cluster
-====================================
-TODO
+Resource usage in the hadoop cluster when training
+==================================================
+If the training data all fits on a single executor, that is the most efficient
+use of cluster resources. This may not be the fastest way to train individual
+models, but if we are doing hyperparameter tuning we are generally training
+many models in parallel, and the lowest total CPU time per model comes
+from using a single executor.
+
+Training speed vs core count stays relatively flat up to about 6 cores. Less
+parallelism is again more efficient in terms of total cluster resource usage,
+but up to 6 cores the decrease is minimal. Beyond 6 cores the efficiency
+loss grows at a greater rate.
+
+Overall suggestions:
+
+* Train models with 4 or 6 cores per executor.
+* Aim for a single executor if reasonable.
+* Limitation: Cluster has ~2GB of memory per core, so training data (with
+  duplicates, due to spark storage, task data, and the xgboost DMatrix copy
+  in C++) needs to fit in 4*2 or 6*2 GB of memory. This is actually quite
+  reasonable with our current feature size, but may need to be revisited if
+  we dramatically increase the number of features used.
+
+Other:
+
+* Minimum amounts of memory that work fine for training a single model will
+  regularly overrun their memory allocation when used to train in mjolnir
+  with hyperparameter optimization. We need to over-provision memory vs what
+  it takes to spin up a spark instance and train a single model. Perhaps
+  this is some sort of leak, or late de-allocation, in xgboost? Unsure.
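+
+For example, a hypothetical training profile following these suggestions
+(values are illustrative only, taken from the example profiles in the test
+fixtures) might pass spark arguments like::
+
+    spark_args:
+      executor-cores: '4'
+      executor-memory: 2G
+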
Help! There are exceptions everywhere!
======================================
diff --git a/mjolnir/test/fixtures/load_config/example_train.expect b/mjolnir/test/fixtures/load_config/example_train.expect
index a423aa4..23e536f 100644
--- a/mjolnir/test/fixtures/load_config/example_train.expect
+++ b/mjolnir/test/fixtures/load_config/example_train.expect
@@ -14,7 +14,7 @@
SPARK_HOME: /home/pytest/spark-2.1.0-bin-hadoop2.6
USER: pytest
mjolnir_utility: data_pipeline
- mjolnir_utility_path: /vagrant/venv/bin/mjolnir-utilities.py
+ mjolnir_utility_path: /srv/mjolnir/venv/bin/mjolnir-utilities.py
paths:
dir_exist: !!set
/etc/spark/conf: null
@@ -22,11 +22,11 @@
/mnt/hdfs/user/pytest/mjolnir/marker: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '1'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -51,11 +51,11 @@
/etc/spark/conf: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '1'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -80,11 +80,11 @@
/etc/spark/conf: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '4'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -108,18 +108,18 @@
SPARK_HOME: /home/pytest/spark-2.1.0-bin-hadoop2.6
USER: pytest
mjolnir_utility: training_pipeline
- mjolnir_utility_path: /vagrant/venv/bin/mjolnir-utilities.py
+ mjolnir_utility_path: /srv/mjolnir/venv/bin/mjolnir-utilities.py
paths:
dir_exist: !!set
/etc/spark/conf: null
/home/pytest/training_size: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
driver-memory: 3G
executor-cores: '1'
executor-memory: 2G
@@ -153,7 +153,7 @@
SPARK_HOME: /home/pytest/spark-2.1.0-bin-hadoop2.6
USER: pytest
mjolnir_utility: data_pipeline
- mjolnir_utility_path: /vagrant/venv/bin/mjolnir-utilities.py
+ mjolnir_utility_path: /srv/mjolnir/venv/bin/mjolnir-utilities.py
paths:
dir_exist: !!set
/etc/spark/conf: null
@@ -161,11 +161,11 @@
/mnt/hdfs/user/pytest/mjolnir/marker: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '1'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -190,11 +190,11 @@
/etc/spark/conf: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '1'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -219,11 +219,11 @@
/etc/spark/conf: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '4'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -251,18 +251,18 @@
SPARK_HOME: /home/pytest/spark-2.1.0-bin-hadoop2.6
USER: pytest
mjolnir_utility: training_pipeline
- mjolnir_utility_path: /vagrant/venv/bin/mjolnir-utilities.py
+ mjolnir_utility_path: /srv/mjolnir/venv/bin/mjolnir-utilities.py
paths:
dir_exist: !!set
/etc/spark/conf: null
/home/pytest/training_size: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
driver-memory: 3G
executor-cores: '6'
executor-memory: 3G
@@ -298,7 +298,7 @@
SPARK_HOME: /home/pytest/spark-2.1.0-bin-hadoop2.6
USER: pytest
mjolnir_utility: data_pipeline
- mjolnir_utility_path: /vagrant/venv/bin/mjolnir-utilities.py
+ mjolnir_utility_path: /srv/mjolnir/venv/bin/mjolnir-utilities.py
paths:
dir_exist: !!set
/etc/spark/conf: null
@@ -306,11 +306,11 @@
/mnt/hdfs/user/pytest/mjolnir/marker: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '1'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -335,11 +335,11 @@
/etc/spark/conf: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '1'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -364,11 +364,11 @@
/etc/spark/conf: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '4'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -396,18 +396,18 @@
SPARK_HOME: /home/pytest/spark-2.1.0-bin-hadoop2.6
USER: pytest
mjolnir_utility: training_pipeline
- mjolnir_utility_path: /vagrant/venv/bin/mjolnir-utilities.py
+ mjolnir_utility_path: /srv/mjolnir/venv/bin/mjolnir-utilities.py
paths:
dir_exist: !!set
/etc/spark/conf: null
/home/pytest/training_size: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
driver-memory: 3G
executor-cores: '6'
executor-memory: 3G
@@ -445,7 +445,7 @@
SPARK_HOME: /home/pytest/spark-2.1.0-bin-hadoop2.6
USER: pytest
mjolnir_utility: data_pipeline
- mjolnir_utility_path: /vagrant/venv/bin/mjolnir-utilities.py
+ mjolnir_utility_path: /srv/mjolnir/venv/bin/mjolnir-utilities.py
paths:
dir_exist: !!set
/etc/spark/conf: null
@@ -453,11 +453,11 @@
/mnt/hdfs/user/pytest/mjolnir/marker: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '1'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -482,11 +482,11 @@
/etc/spark/conf: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '1'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -511,11 +511,11 @@
/etc/spark/conf: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
executor-cores: '4'
executor-memory: 2G
files: /usr/lib/libhdfs.so.0.0.0
@@ -543,18 +543,18 @@
SPARK_HOME: /home/pytest/spark-2.1.0-bin-hadoop2.6
USER: pytest
mjolnir_utility: training_pipeline
- mjolnir_utility_path: /vagrant/venv/bin/mjolnir-utilities.py
+ mjolnir_utility_path: /srv/mjolnir/venv/bin/mjolnir-utilities.py
paths:
dir_exist: !!set
/etc/spark/conf: null
/home/pytest/training_size: null
file_exist: !!set
/home/pytest/spark-2.1.0-bin-hadoop2.6/bin/spark-submit: null
- /vagrant/mjolnir_venv.zip: null
- /vagrant/venv/bin/mjolnir-utilities.py: null
+ /srv/mjolnir/mjolnir_venv.zip: null
+ /srv/mjolnir/venv/bin/mjolnir-utilities.py: null
venv/bin/python: null
spark_args:
- archives: /vagrant/mjolnir_venv.zip#venv
+ archives: /srv/mjolnir/mjolnir_venv.zip#venv
driver-memory: 3G
executor-cores: '4'
executor-memory: 2G
diff --git a/mjolnir/test/utilities/test_spark.py b/mjolnir/test/utilities/test_spark.py
index ccea823..07e0b0c 100644
--- a/mjolnir/test/utilities/test_spark.py
+++ b/mjolnir/test/utilities/test_spark.py
@@ -70,7 +70,9 @@
monkeypatch.setenv('USER', 'pytest')
with open(test_file, 'r') as f:
-        global_config, profiles = mjolnir.utilities.spark.load_config(f, 'marker', {})
+        global_config, profiles = mjolnir.utilities.spark.load_config(f, 'marker', {
+            'mjolnir_dir': '/srv/mjolnir',
+        })
compare_fixture(expect_file, {
'global_config': global_config,
'profiles': profiles
--
To view, visit https://gerrit.wikimedia.org/r/387442
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I2cf916f999f053d1ce8545ac8279815e9b09a0f9
Gerrit-PatchSet: 5
Gerrit-Project: search/MjoLniR
Gerrit-Branch: master
Gerrit-Owner: EBernhardson <[email protected]>
Gerrit-Reviewer: DCausse <[email protected]>
_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits