aljoscha commented on a change in pull request #6965: [FLINK-10368][e2e] Hardened kerberized yarn e2e test
URL: https://github.com/apache/flink/pull/6965#discussion_r230008139
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
 ##########
 @@ -60,19 +64,41 @@ function cluster_shutdown {
 trap cluster_shutdown INT
 trap cluster_shutdown EXIT
 
-until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL master:/home/hadoop-user/; do
-    # we're retrying this one because we don't know yet if the container is ready
-    echo "Uploading Flink tarball to docker master failed, retrying ..."
-    sleep 5
+# wait for kerberos to be set up
+start_time=$(date +%s)
+until docker logs master 2>&1 | grep -q "Finished master initialization"; do
+    current_time=$(date +%s)
+    time_diff=$((current_time - start_time))
+
+    if [ $time_diff -ge $MAX_RETRY_SECONDS ]; then
+        echo "ERROR: Could not start hadoop cluster. Aborting..."
+        exit 0
+    else
+        echo "Waiting for hadoop cluster to come up. We have been trying for $time_diff seconds, retrying ..."
+        sleep 10
+    fi
 done
 
+# perform health checks
+if ! { [ $(docker inspect -f '{{.State.Running}}' master 2>&1) = 'true' ] &&
+       [ $(docker inspect -f '{{.State.Running}}' slave1 2>&1) = 'true' ] &&
+       [ $(docker inspect -f '{{.State.Running}}' slave2 2>&1) = 'true' ] &&
+       [ $(docker inspect -f '{{.State.Running}}' kdc 2>&1) = 'true' ]; };
+then
+    echo "ERROR: Could not start hadoop cluster. At least one of the containers failed. Aborting..."
+    exit 0
 
 Review comment:
   Isn't exit code `1` the exit code for failure?
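  To illustrate the concern, here is a minimal sketch (not part of the PR; the function names are hypothetical stand-ins for the abort branches above). It shows that an abort path ending in `exit 0` is indistinguishable from a successful run for any caller that checks the exit status, whereas `exit 1` is reported as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the "Aborting..." branches in the test script.

abort_with_zero() {
    echo "ERROR: Could not start hadoop cluster. Aborting..." >&2
    exit 0   # reports success to the caller despite the error
}

abort_with_one() {
    echo "ERROR: Could not start hadoop cluster. Aborting..." >&2
    exit 1   # non-zero status: reports failure to the caller
}

# Run each in a subshell so `exit` only terminates the subshell,
# then capture the status the caller would observe.
( abort_with_zero ); zero_status=$?
( abort_with_one ); one_status=$?

echo "exit 0 branch -> status $zero_status"
echo "exit 1 branch -> status $one_status"

# A caller branching on the status treats the exit-0 abort as a pass.
if ( abort_with_zero ); then
    echo "caller sees the aborted run as successful"
fi
```

  Any wrapper that runs the script and inspects `$?` (or uses `set -e`) would therefore count a failed cluster start as a passing test unless the abort paths use a non-zero code.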

----------------------------------------------------------------
This is an automated message from the Apache Git Service.