GitHub user roshu10 opened a pull request: https://github.com/apache/mesos/pull/292
    Adding one more master to the cluster (quorum 1 to 2)

    Hey,

    Currently we are using 2 Mesos masters in our infrastructure, i.e. the quorum is 1. We are planning to add one more master to the cluster to bring the quorum to 2.

    Does this require any downtime in production? What strategies should we follow to avoid downtime?

    Thanks for any help!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/mesos master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mesos/pull/292.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #292

----
commit 081c3114fefa18c6acd1e884e6d6583232e30d5c
Author: Harold Dost <h.dost@...>
Date:   2018-05-07T15:39:29Z

    Documented the `--xfs-kill-containers` flag.

    Added a description of the `--xfs-kill-containers` flag to the `disk/xfs` isolator page and listed it in the upgrade documentation.

    Review: https://reviews.apache.org/r/66975/

commit 01b618c416507b7d43bf435b870dc26b2361e9f4
Author: James Peach <jpeach@...>
Date:   2018-05-07T16:03:42Z

    Fixed a disk space check in the XFS tests.

    The XFS tests were requiring that at least 2MB of data was written, but the test can still correctly pass after only 1MB had been written.

    Review: https://reviews.apache.org/r/66986/

commit aaf51e007044ad295949ad719776ef49d62dc3bf
Author: Qian Zhang <zhq527725@...>
Date:   2018-05-07T20:48:39Z

    Increased the timeout for waiting for `reaped` to be invoked.

    Previously after the container process is reaped by the Docker executor, we will wait 3 seconds for `reaped` to be invoked. However in some cases (e.g., launch a Docker container to use an external rexray volume), there will be more than 3 seconds after the container process exits and before the `docker run` command returns (i.e., `reaped` invoked). So in this patch, the timeout is increased to 60 seconds.
    Review: https://reviews.apache.org/r/66947/

commit b5b970381bc8e616720bb9ff7917920f0ab0974c
Author: Gilbert Song <songzihao1990@...>
Date:   2018-05-07T21:11:30Z

    Added MESOS-8876 to 1.6.0 CHANGELOG.

commit bad10a88f4a7c291add62dfb91d7c2c077582c44
Author: Gilbert Song <songzihao1990@...>
Date:   2018-05-07T21:12:45Z

    Added MESOS-8876 to 1.5.1 CHANGELOG.

commit 8b5b35b9f3a792d18c6ddb5dadb0df0657eb4e25
Author: Gilbert Song <songzihao1990@...>
Date:   2018-05-07T21:13:42Z

    Added MESOS-8876 to 1.4.2 CHANGELOG.

commit 36522a29adcffebcbaaee66d67ee66b40dcdee8f
Author: Gilbert Song <songzihao1990@...>
Date:   2018-05-07T21:14:00Z

    Added MESOS-8876 to 1.3.3 CHANGELOG.

commit 0262b41f8e3b40c63c1de42d556241f889320e7d
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T01:09:02Z

    Re-enable epoll support for libevent.

    Epoll support was disabled due to some undocumented "issues". Since the original author is not responsive and a lot of libevent / SSL issues have been fixed, we can try re-enabling epoll support. Should this be an issue, epoll can be disabled once again using the EVENT_NOEPOLL environment variable.

    Review: https://reviews.apache.org/r/66977

commit 44c1321827e25a2ee2210954b7d180bca8cf5232
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T02:03:07Z

    Disabled debug mode for libevent.

    Debug mode enables additional tracking in libevent to check for common errors. It is recommended to "only enable debug mode when actually debugging your program" because "tracking which events are initialized requires that Libevent use extra memory and CPU".

    http://www.wangafu.net/~nickm/libevent-book/Ref1_libsetup.html

    We could consider introducing libevent flags in order to be able to toggle this behavior with an environment variable since it appears that libevent does not provide one. However, since I don't believe these assertions have been of value, we can just remove the debug mode for now.
    Review: https://reviews.apache.org/r/66978

commit c692354d0f374de0c94cd04e8e1f7f00a1a1ba36
Author: Greg Mann <gregorywmann@...>
Date:   2018-05-07T21:24:22Z

    Updated Mesos version to 1.7.0.

commit f9aadd03011b0dfcbba4a60f0e40ac79e889e954
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:20:26Z

    Added MESOS-8881 to the 1.6.0 CHANGELOG.

commit 55ef28564c077470729a5bf04ca1674a52c7c5d7
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:25:42Z

    Added MESOS-8885 to the 1.6.0 CHANGELOG.

commit 17454a62bbe5c8b4cfabcbd0b64f22acc0cf8704
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:29:16Z

    Added MESOS-8881 to the 1.5.1 CHANGELOG.

commit 91ae3eb7f722cec75479404364c2735ac7156507
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:29:34Z

    Added MESOS-8885 to the 1.5.1 CHANGELOG.

commit 82ffb94650c3b059c0862e4017cd2240544a1c52
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:38:01Z

    Added MESOS-8881 to the 1.4.2 CHANGELOG.

commit e4919c44b1ef180461b587892ea1d644b66a5112
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:38:16Z

    Added MESOS-8885 to the 1.4.2 CHANGELOG.

commit e313487c04f30587c1a42d56fbb1cc15cc708b3d
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:46:18Z

    Added MESOS-8881 to the 1.3.3 CHANGELOG.

commit b11f7aefe28d8e221d976ec9a73417661f5e4629
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:46:26Z

    Added MESOS-8885 to the 1.3.3 CHANGELOG.

commit 5b42b52f5c932ad0d32f9718d544f75b604cb508
Author: Xudong Ni <xudong_ni@...>
Date:   2018-05-07T21:39:46Z

    Failure to update registry should abort the master process.

    When the registrar fails to update the registry it would abort the actor and fail all future operations. However when the registrar update is requested by an operator API such as a maintenance update, the master process doesn't shut down (a 500 error is returned to the client instead) and all subsequent operations will fail.
    This patch fixes the specific maintenance API case but we can follow up with other call sites or put a fix in for the registrar itself.

    Review: https://reviews.apache.org/r/66919/

commit e6298aef83039dacc80b8e2a8778efacbaa63efc
Author: Jiang Yan Xu <xujyan@...>
Date:   2018-05-08T00:05:06Z

    Minor style fix.

commit 39b27e1bb90aab3f10c1203d8f4f65de4f32e774
Author: Greg Mann <greg@...>
Date:   2018-05-08T00:31:55Z

    Made the 'SchedulerDriver' abort when operation's 'id' field is set.

    Since the 'SchedulerDriver' does not support operation status updates, this patch adds a check to the driver which will abort the scheduler if the 'id' field is set in an offer operation.

    Review: https://reviews.apache.org/r/66938/

commit b4c541b4d9677e2b84d8538f319a3dfe7987e327
Author: Gaston Kleiman <gaston@...>
Date:   2018-05-08T00:32:15Z

    Made the master drop operations with an ID on agent default resources.

    Review: https://reviews.apache.org/r/66992/

commit a570f9436b816d40ba3d01455211f5d61f77d66d
Author: Gaston Kleiman <gaston@...>
Date:   2018-05-08T00:32:56Z

    Made the master include the operation ID in OPERATION_DROPPED updates.

    Review: https://reviews.apache.org/r/66924/

commit 9d897259a39dc9f90e8fad191732a3fe45d63458
Author: Gaston Kleiman <gaston@...>
Date:   2018-05-08T00:33:32Z

    Prevented master from sending operation updates to v0 frameworks.

    Review: https://reviews.apache.org/r/66995/

commit 52ae7f0e6dd6952d243c37e8b8aa98ce7752a17d
Author: Gaston Kleiman <gaston@...>
Date:   2018-05-08T00:33:56Z

    Improved validation messages for some operations.

    Review: https://reviews.apache.org/r/66939/

commit 9c54841cbdb77a5c8f5fba0089b70330eed2e80b
Author: Greg Mann <gregorywmann@...>
Date:   2018-05-08T01:20:08Z

    Added MESOS-8784 to the CHANGELOG.

commit 25176ed1b30a9f7fb82a71bca16a423343ba6d5c
Author: Benjamin Bannier <benjamin.bannier@...>
Date:   2018-05-08T15:58:12Z

    Fixed flakiness in a `MasterSlaveReconciliationTest`.
    The test `ReconcileDroppedOperation` uses detection of a `ReconcileOperationsMessage` to confirm correct agent reregistration behavior. For that it drops an operation on its way to the agent, and then tries to observe the `ReconcileOperationsMessage` when the agent reregisters after a simulated master failover.

    Since `ReconcileOperationsMessage` is sent whenever the master detects a discrepancy between its own operation state of the agent and the information sent by the agent in an `UpdateSlaveMessage`, we need to make sure to only drop the operation once the agent has sent the update which is part of its initial registration sequence.

    Review: https://reviews.apache.org/r/67003/

commit 6d97f68e5a4bbd22a0b72cf7c2c1826e45142de4
Author: Benno Evers <bevers@...>
Date:   2018-05-08T17:00:53Z

    Added an option to keep downloaded patches to apply-reviews.py.

    By default, the apply-reviews.py script will delete all patch files it downloads. However, when a patch fails to apply, it might be desired to edit and apply it manually. This change will make it easier to do so.

    Review: https://reviews.apache.org/r/67004/

commit 86523d3157d36bdaf4f7ce8fe001ae241e690a5f
Author: Benno Evers <bevers@...>
Date:   2018-05-08T17:02:51Z

    Fixed flakyness in 'MasterAPITest.MasterFailover'.

    This test used to sporadically segfault as described in MESOS-8687. The suspected cause is that in a master actor, the `httpSequence` field was lazily initialized in `ProcessBase::consume()` and afterwards a call to `ProcessBase::_consume()` was dispatched, where it was assumed that `httpSequence` is already initialized. However, during this test the master actor would be destroyed and a new actor would be spawned with the same PID. The dispatched method would be called on this new actor and find `httpSequence` to be not initialized, leading to a crash.
    This patch introduces a call to `Clock::settle()` after the master is shut down to ensure the outstanding `_consume()` gets discarded before starting the new master actor.

    Review: https://reviews.apache.org/r/66799/

commit 50f561a29a897004f5865aa8ee38ba5cf1e49410
Author: Jan Schlicht <jan@...>
Date:   2018-05-09T10:38:56Z

    Added token-based authentication for resource providers.

    If a token is provided, it will be used in HTTP requests to the resource provider manager. This allows JWT-based authentication and authorization for resource providers. The (unimplemented) credential support in `resource_provider::Driver` has been removed in favor of the token-based approach.

    Review: https://reviews.apache.org/r/66932/

----

---