Filippo Giunchedi has submitted this change and it was merged.

Change subject: es-tool: try harder to enable replication
......................................................................


es-tool: try harder to enable replication

We've seen occasional timeouts when re-enabling replication, so retry
several times before giving up.

Bug: T99005
Change-Id: Ib66e7219b881c7d69abac566665cd96f1550c841
---
M modules/elasticsearch/files/es-tool
1 file changed, 20 insertions(+), 3 deletions(-)

Approvals:
  Filippo Giunchedi: Verified; Looks good to me, approved



diff --git a/modules/elasticsearch/files/es-tool b/modules/elasticsearch/files/es-tool
index c585fd9..23c3d8a 100755
--- a/modules/elasticsearch/files/es-tool
+++ b/modules/elasticsearch/files/es-tool
@@ -8,7 +8,12 @@
 import time
 
 from elasticsearch import Elasticsearch, TransportError
+from elasticsearch.exceptions import ConnectionError
 from subprocess import CalledProcessError
+
+
+# How many times to try re-enabling allocation
+REPLICATION_ENABLE_ATTEMPTS = 10
 
 
 # Helper functions go here
@@ -152,12 +157,24 @@
         printu(".")
         time.sleep(1)
 
-    # Wait a sec
-    time.sleep(1)
+    # Let things settle a bit
+    time.sleep(3)
 
     # Turn replication back on so things will recover fully
     printu("Enabling all replication...")
-    if not set_allocation_state("all"):
+    for attempt in range(REPLICATION_ENABLE_ATTEMPTS):
+        try:
+            if not set_allocation_state("all"):
+                print "failed! -- You will still need to enable replication",
+                print "again with `es-tool start-replication`"
+                return os.EX_UNAVAILABLE
+            else:
+                break
+        except ConnectionError:
+            print "failed! -- retrying (%d/%d)" % (attempt,
+                                                   REPLICATION_ENABLE_ATTEMPTS)
+            time.sleep(3)
+    else:
         print "failed! -- You will still need to enable replication again",
         print "with `es-tool start-replication`"
         return os.EX_UNAVAILABLE
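
The retry logic in the hunk above relies on Python's `for ... else`: the `else` branch runs only when the loop finishes without hitting `break`, i.e. when every attempt raised `ConnectionError`. Below is a minimal, standalone sketch of that pattern, not the code that was merged. It is written for Python 3 (the patch itself is Python 2), the helper name `enable_with_retries` and its `action`, `delay` and `sleep` parameters are hypothetical, and the built-in `ConnectionError` stands in for `elasticsearch.exceptions.ConnectionError`. The sketch counts attempts from 1, whereas the merged message formats the zero-based loop index, so its first retry would read `(0/10)`.

```python
import time

ATTEMPTS = 10  # mirrors REPLICATION_ENABLE_ATTEMPTS in the patch


def enable_with_retries(action, attempts=ATTEMPTS, delay=3, sleep=time.sleep):
    """Call action() until it stops raising ConnectionError.

    Returns True if action() reported success, and False if it reported
    failure or if every attempt raised ConnectionError.
    """
    for attempt in range(1, attempts + 1):
        try:
            ok = action()
            # The call completed, so its result is final: stop retrying.
            break
        except ConnectionError:
            # Treated as transient: report progress, wait, then try again.
            print("failed! -- retrying (%d/%d)" % (attempt, attempts))
            sleep(delay)
    else:
        # Reached only when the loop never hit `break`.
        return False
    return bool(ok)


# Usage: a stand-in action that times out twice before succeeding.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("timed out")
    return True


# Pass a no-op sleep so the example does not actually wait.
result = enable_with_retries(flaky, sleep=lambda seconds: None)
```

Passing `sleep` as a parameter is only a convenience for trying the loop without real delays; in the patch the wait is a fixed `time.sleep(3)`.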

-- 
To view, visit https://gerrit.wikimedia.org/r/211672
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib66e7219b881c7d69abac566665cd96f1550c841
Gerrit-PatchSet: 4
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Filippo Giunchedi <fgiunch...@wikimedia.org>
Gerrit-Reviewer: Chad <ch...@wikimedia.org>
Gerrit-Reviewer: Filippo Giunchedi <fgiunch...@wikimedia.org>
Gerrit-Reviewer: Manybubbles <never...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits