Repository: kafka
Updated Branches:
  refs/heads/0.11.0 6414b628a -> 57181bb77


MINOR: Add serialized vagrant rsync until upstream fixes broken parallelism

See https://github.com/mitchellh/vagrant/issues/7531. The core of the issue is 
that vagrant rsync uses a fixed set of 1000 possible temp file entries for SSH 
ControlMaster files to cache SSH connections for rsyncing. A few notes:

* We can't break down the steps further and maintain performance due to various
limitations in vagrant/vagrant-aws: rsync is only executed on `vagrant up`,
`vagrant reload`, and `vagrant rsync`; the rsync shared folder can't be
enabled/disabled for only some of those stages; and provisioning only runs in
parallel with vagrant-aws during `vagrant up`.
* We need to isolate each of the serialized rsync calls, because even separate
calls could randomly choose the same temporary file. (If we assumed `parallel`
was available, we could actually get the parallelism back.)
* If multiple instances run on the same server at the same or nearly the same
time, they can conflict, since the same temp file entries are used globally.
This means anything running on shared CI servers might end up syncing data
between different CI jobs (!!), which could lead to some very strange results,
especially if the jobs aren't even of the same type.
* The provisioning error check needs to be removed: it catches rsync errors, but
those can still happen in the initial `vagrant up` rsync step, before the
`vagrant up` provisioning step runs. It seems likely this bug was the cause of
the missing files anyway, so the check may no longer be as valuable.

Author: Ewen Cheslack-Postava <[email protected]>

Reviewers: Ismael Juma <[email protected]>

Closes #3380 from ewencp/deparallelize-rsync

(cherry picked from commit ffa8100457bbde24eaba27a0fadb1bc5212bfc4e)
Signed-off-by: Ewen Cheslack-Postava <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/57181bb7
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/57181bb7
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/57181bb7

Branch: refs/heads/0.11.0
Commit: 57181bb7786ef7c9ee735d02abcd3b0c6fe94c4c
Parents: 6414b62
Author: Ewen Cheslack-Postava <[email protected]>
Authored: Tue Jun 20 12:21:43 2017 -0700
Committer: Ewen Cheslack-Postava <[email protected]>
Committed: Tue Jun 20 12:21:55 2017 -0700

----------------------------------------------------------------------
 vagrant/base.sh       |  9 ---------
 vagrant/vagrant-up.sh | 19 ++++++++++++++++++-
 2 files changed, 18 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/57181bb7/vagrant/base.sh
----------------------------------------------------------------------
diff --git a/vagrant/base.sh b/vagrant/base.sh
index b302dd1..4c0add5 100755
--- a/vagrant/base.sh
+++ b/vagrant/base.sh
@@ -57,15 +57,6 @@ if [ -h /opt/kafka-dev ]; then
 fi
 ln -s /vagrant /opt/kafka-dev
 
-# Verification to catch provisioning errors.
-if [[ ! -x /opt/kafka-dev/bin/kafka-run-class.sh ]]; then
-    echo "ERROR: kafka-run-class.sh not found/executable in /opt/kafka-dev/bin"
-    find /opt/kafka-dev
-    ls -la /opt/kafka-dev/bin/kafka-run-class.sh || true
-    exit 1
-fi
-
-
 
 get_kafka() {
     version=$1

http://git-wip-us.apache.org/repos/asf/kafka/blob/57181bb7/vagrant/vagrant-up.sh
----------------------------------------------------------------------
diff --git a/vagrant/vagrant-up.sh b/vagrant/vagrant-up.sh
index b01c10d..5b88144 100755
--- a/vagrant/vagrant-up.sh
+++ b/vagrant/vagrant-up.sh
@@ -226,8 +226,25 @@ function bring_up_aws {
 
         if [[ ! -z "$worker_machines" ]]; then
             echo "Bringing up test worker machines in parallel"
-            vagrant_batch_command "vagrant up $debug --provider=aws" "$worker_machines" "$max_parallel"
+            # Currently it seems that the AWS provider will always run
+            # rsync as part of vagrant up. However,
+            # https://github.com/mitchellh/vagrant/issues/7531 means
+            # it is not safe to do so. Since the bug doesn't seem to
+            # cause any direct errors, just missing data on some
+            # nodes, follow up with serial rsyncing to ensure we're in
+            # a clean state. Use custom TMPDIR values to ensure we're
+            # isolated from any other instances of this script that
+            # are running/ran recently and may cause different
+            # instances to sync to the wrong nodes
+            local vagrant_rsync_temp_dir=$(mktemp -d);
+            TMPDIR=$vagrant_rsync_temp_dir vagrant_batch_command "vagrant up $debug --provider=aws" "$worker_machines" "$max_parallel"
+            rm -rf $vagrant_rsync_temp_dir
             vagrant hostmanager
+            for worker in $worker_machines; do
+                local vagrant_rsync_temp_dir=$(mktemp -d);
+                TMPDIR=$vagrant_rsync_temp_dir vagrant rsync $worker;
+                rm -rf $vagrant_rsync_temp_dir
+            done
         fi
     else
         vagrant up --provider=aws --no-parallel --no-provision $debug
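The serialized loop above gives every `vagrant rsync` call its own TMPDIR. As the commit message notes, that per-call isolation is what makes the calls safe, so the parallelism could in principle come back. A minimal sketch of that idea follows, assuming (as the use of TMPDIR in this change implies) that Vagrant creates its SSH ControlMaster temp files under `$TMPDIR`. It uses plain bash background jobs and `wait` instead of the `parallel` tool mentioned in the commit message, and `rsync_all_isolated` is a hypothetical helper, not part of this commit. The per-machine command is passed as arguments so it can be replaced by a stub for testing. Each job also removes its directory when the command fails, which a plain `cmd; rm -rf dir` sequence would not do if the script runs with `set -o errexit`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each machine in the space-separated list $1, run
# the remaining arguments plus the machine name (e.g. `vagrant rsync worker1`)
# as a background job with its own private TMPDIR, so concurrent jobs cannot
# pick the same temp file. Returns non-zero if any job failed.
rsync_all_isolated() {
    local machines=$1 machine pids="" pid status=0
    shift
    for machine in $machines; do
        (
            dir=$(mktemp -d) || exit 1
            rc=0
            TMPDIR="$dir" "$@" "$machine" || rc=$?
            # Clean up the private directory even if the command failed.
            rm -rf -- "$dir"
            exit "$rc"
        ) &
        pids="$pids $!"
    done
    # Wait on each job separately so that any single failure is reported.
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example (machine names are placeholders):
# rsync_all_isolated "worker1 worker2 worker3" vagrant rsync
```

Dropping the `&` and the `wait` loop reduces this to the serialized behavior of the diff; keeping them is only sound if the TMPDIR assumption above holds.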