GitHub user roshu10 opened a pull request:

    https://github.com/apache/mesos/pull/292

    Adding one more master to the cluster (quorum 1 to 2)

    Hey,
    
    Currently we are running 2 Mesos masters in our infrastructure, i.e. the quorum is 1.
    We are planning to add one more master to the cluster to raise the quorum to 2.
    
    Does this require any downtime in production?
    What strategy should we follow to avoid downtime?
    
    
    Thanks for any help!
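
For reference, the quorum arithmetic behind this question can be sketched as follows. This is a minimal illustration, not Mesos code: the helper names `majority_quorum` and `tolerated_failures` are hypothetical, and the sketch assumes the usual majority rule for a replicated log (the quorum must be greater than half the number of masters). Under that assumption, 3 masters give a quorum of 2, while a quorum of 1 with 2 masters is below a strict majority.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest quorum that is strictly greater than half the masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """Number of masters that can be lost while a majority quorum remains."""
    return num_masters - majority_quorum(num_masters)


if __name__ == "__main__":
    # Compare the current size (2) with the planned size (3).
    for n in (2, 3):
        print(n, majority_quorum(n), tolerated_failures(n))
```

The sketch only covers the arithmetic; it does not say anything about how to roll out the change without downtime.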
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/mesos master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mesos/pull/292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #292
    
----
commit 081c3114fefa18c6acd1e884e6d6583232e30d5c
Author: Harold Dost <h.dost@...>
Date:   2018-05-07T15:39:29Z

    Documented the `--xfs-kill-containers` flag.
    
    Added a description of the `--xfs-kill-containers` flag to the
    `disk/xfs` isolator page and listed it in the upgrade documentation.
    
    Review: https://reviews.apache.org/r/66975/

commit 01b618c416507b7d43bf435b870dc26b2361e9f4
Author: James Peach <jpeach@...>
Date:   2018-05-07T16:03:42Z

    Fixed a disk space check in the XFS tests.
    
    The XFS tests were requiring that at least 2MB of data was written,
    but the test can still correctly pass after only 1MB had been written.
    
    Review: https://reviews.apache.org/r/66986/

commit aaf51e007044ad295949ad719776ef49d62dc3bf
Author: Qian Zhang <zhq527725@...>
Date:   2018-05-07T20:48:39Z

    Increased the timeout for waiting for `reaped` to be invoked.
    
    Previously, after the container process was reaped by the Docker
    executor, we would wait 3 seconds for `reaped` to be invoked.
    However, in some cases (e.g., launching a Docker container that uses
    an external rexray volume), more than 3 seconds can elapse between
    the container process exiting and the `docker run` command
    returning (i.e., `reaped` being invoked). So in this patch,
    the timeout is increased to 60 seconds.
    
    Review: https://reviews.apache.org/r/66947/

commit b5b970381bc8e616720bb9ff7917920f0ab0974c
Author: Gilbert Song <songzihao1990@...>
Date:   2018-05-07T21:11:30Z

    Added MESOS-8876 to 1.6.0 CHANGELOG.

commit bad10a88f4a7c291add62dfb91d7c2c077582c44
Author: Gilbert Song <songzihao1990@...>
Date:   2018-05-07T21:12:45Z

    Added MESOS-8876 to 1.5.1 CHANGELOG.

commit 8b5b35b9f3a792d18c6ddb5dadb0df0657eb4e25
Author: Gilbert Song <songzihao1990@...>
Date:   2018-05-07T21:13:42Z

    Added MESOS-8876 to 1.4.2 CHANGELOG.

commit 36522a29adcffebcbaaee66d67ee66b40dcdee8f
Author: Gilbert Song <songzihao1990@...>
Date:   2018-05-07T21:14:00Z

    Added MESOS-8876 to 1.3.3 CHANGELOG.

commit 0262b41f8e3b40c63c1de42d556241f889320e7d
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T01:09:02Z

    Re-enable epoll support for libevent.
    
    Epoll support was disabled due to some undocumented "issues". Since
    the original author is not responsive and a lot of libevent / SSL
    issues have been fixed, we can try re-enabling epoll support.
    
    Should this be an issue, epoll can be disabled once again using the
    EVENT_NOEPOLL environment variable.
    
    Review: https://reviews.apache.org/r/66977

commit 44c1321827e25a2ee2210954b7d180bca8cf5232
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T02:03:07Z

    Disabled debug mode for libevent.
    
    Debug mode enables additional tracking in libevent to check for common
    errors. It is recommended to "only enable debug mode when actually
    debugging your program" because "tracking which events are initialized
    requires that Libevent use extra memory and CPU".
    
    http://www.wangafu.net/~nickm/libevent-book/Ref1_libsetup.html
    
    We could consider introducing libevent flags in order to be able to
    toggle this behavior with an environment variable since it appears
    that libevent does not provide one. However, since I don't believe
    these assertions have been of value, we can just remove the debug mode
    for now.
    
    Review: https://reviews.apache.org/r/66978

commit c692354d0f374de0c94cd04e8e1f7f00a1a1ba36
Author: Greg Mann <gregorywmann@...>
Date:   2018-05-07T21:24:22Z

    Updated Mesos version to 1.7.0.

commit f9aadd03011b0dfcbba4a60f0e40ac79e889e954
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:20:26Z

    Added MESOS-8881 to the 1.6.0 CHANGELOG.

commit 55ef28564c077470729a5bf04ca1674a52c7c5d7
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:25:42Z

    Added MESOS-8885 to the 1.6.0 CHANGELOG.

commit 17454a62bbe5c8b4cfabcbd0b64f22acc0cf8704
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:29:16Z

    Added MESOS-8881 to the 1.5.1 CHANGELOG.

commit 91ae3eb7f722cec75479404364c2735ac7156507
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:29:34Z

    Added MESOS-8885 to the 1.5.1 CHANGELOG.

commit 82ffb94650c3b059c0862e4017cd2240544a1c52
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:38:01Z

    Added MESOS-8881 to the 1.4.2 CHANGELOG.

commit e4919c44b1ef180461b587892ea1d644b66a5112
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:38:16Z

    Added MESOS-8885 to the 1.4.2 CHANGELOG.

commit e313487c04f30587c1a42d56fbb1cc15cc708b3d
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:46:18Z

    Added MESOS-8881 to the 1.3.3 CHANGELOG.

commit b11f7aefe28d8e221d976ec9a73417661f5e4629
Author: Benjamin Mahler <bmahler@...>
Date:   2018-05-07T21:46:26Z

    Added MESOS-8885 to the 1.3.3 CHANGELOG.

commit 5b42b52f5c932ad0d32f9718d544f75b604cb508
Author: Xudong Ni <xudong_ni@...>
Date:   2018-05-07T21:39:46Z

    Failure to update registry should abort the master process.
    
    When the registrar fails to update the registry it would abort the
    actor and fail all future operations. However when the registrar
    update is requested by an operator API such as a maintenance update,
    the master process doesn't shut down (a 500 error is returned to the
    client instead) and all subsequent operations will fail.
    
    This patch fixes the specific maintenance API case but we can follow
    up with other call sites or put a fix in for the registrar itself.
    
    Review: https://reviews.apache.org/r/66919/

commit e6298aef83039dacc80b8e2a8778efacbaa63efc
Author: Jiang Yan Xu <xujyan@...>
Date:   2018-05-08T00:05:06Z

    Minor style fix.

commit 39b27e1bb90aab3f10c1203d8f4f65de4f32e774
Author: Greg Mann <greg@...>
Date:   2018-05-08T00:31:55Z

    Made the 'SchedulerDriver' abort when operation's 'id' field is set.
    
    Since the 'SchedulerDriver' does not support operation status updates,
    this patch adds a check to the driver which will abort the scheduler
    if the 'id' field is set in an offer operation.
    
    Review: https://reviews.apache.org/r/66938/

commit b4c541b4d9677e2b84d8538f319a3dfe7987e327
Author: Gaston Kleiman <gaston@...>
Date:   2018-05-08T00:32:15Z

    Made the master drop operations with an ID on agent default resources.
    
    Review: https://reviews.apache.org/r/66992/

commit a570f9436b816d40ba3d01455211f5d61f77d66d
Author: Gaston Kleiman <gaston@...>
Date:   2018-05-08T00:32:56Z

    Made the master include the operation ID in OPERATION_DROPPED updates.
    
    Review: https://reviews.apache.org/r/66924/

commit 9d897259a39dc9f90e8fad191732a3fe45d63458
Author: Gaston Kleiman <gaston@...>
Date:   2018-05-08T00:33:32Z

    Prevented master from sending operation updates to v0 frameworks.
    
    Review: https://reviews.apache.org/r/66995/

commit 52ae7f0e6dd6952d243c37e8b8aa98ce7752a17d
Author: Gaston Kleiman <gaston@...>
Date:   2018-05-08T00:33:56Z

    Improved validation messages for some operations.
    
    Review: https://reviews.apache.org/r/66939/

commit 9c54841cbdb77a5c8f5fba0089b70330eed2e80b
Author: Greg Mann <gregorywmann@...>
Date:   2018-05-08T01:20:08Z

    Added MESOS-8784 to the CHANGELOG.

commit 25176ed1b30a9f7fb82a71bca16a423343ba6d5c
Author: Benjamin Bannier <benjamin.bannier@...>
Date:   2018-05-08T15:58:12Z

    Fixed flakiness in a `MasterSlaveReconciliationTest`.
    
    The test `ReconcileDroppedOperation` uses detection of a
    `ReconcileOperationsMessage` to confirm correct agent reregistration
    behavior. For that it drops an operation on its way to the agent, and
    then tries to observe the `ReconcileOperationsMessage` when the agent
    reregisters after a simulated master failover.
    
    Since `ReconcileOperationsMessage` is sent whenever the master detects
    a discrepancy between its own operation state of the agent and the
    information sent by the agent in an `UpdateSlaveMessage`, we need to
    make sure to only drop the operation once the agent has sent the
    update which is part of its initial registration sequence.
    
    Review: https://reviews.apache.org/r/67003/

commit 6d97f68e5a4bbd22a0b72cf7c2c1826e45142de4
Author: Benno Evers <bevers@...>
Date:   2018-05-08T17:00:53Z

    Added an option to keep downloaded patches to apply-reviews.py.
    
    By default, the apply-reviews.py script will delete all patch files
    it downloads. However, when a patch fails to apply, it might be
    desired to edit and apply it manually. This change will make it easier
    to do so.
    
    Review: https://reviews.apache.org/r/67004/

commit 86523d3157d36bdaf4f7ce8fe001ae241e690a5f
Author: Benno Evers <bevers@...>
Date:   2018-05-08T17:02:51Z

    Fixed flakiness in 'MasterAPITest.MasterFailover'.
    
    This test used to sporadically segfault as described in MESOS-8687.
    The suspected cause is that in a master actor, the `httpSequence`
    field was lazily initialized in `ProcessBase::consume()` and afterwards
    a call to `ProcessBase::_consume()` was dispatched, where it was
    assumed that `httpSequence` is already initialized.
    
    However, during this test the master actor would be destroyed and a
    new actor would be spawned with the same PID. The dispatched method
    would be called on this new actor and find `httpSequence` to be not
    initialized, leading to a crash.
    
    This patch introduces a call to `Clock::settle()` after the master
    is shut down to ensure the outstanding `_consume()` gets discarded
    before starting the new master actor.
    
    Review: https://reviews.apache.org/r/66799/

commit 50f561a29a897004f5865aa8ee38ba5cf1e49410
Author: Jan Schlicht <jan@...>
Date:   2018-05-09T10:38:56Z

    Added token-based authentication for resource providers.
    
    If a token is provided, it will be used in HTTP requests to the resource
    provider manager. This allows JWT-based authentication and authorization
    for resource providers.
    
    The (unimplemented) credential support in `resource_provider::Driver`
    has been removed in favor of the token-based approach.
    
    Review: https://reviews.apache.org/r/66932/
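
The token-based approach described in this commit can be illustrated with a short sketch. This is not the Mesos implementation (which is C++): the function `build_request`, the constant `EXAMPLE_URL`, and the use of the `Bearer` scheme in the `Authorization` header are assumptions made for illustration. The sketch only shows the idea that the token, when provided, is attached to each HTTP request.

```python
from typing import Optional
import urllib.request

# Hypothetical endpoint, used for illustration only.
EXAMPLE_URL = "http://localhost:5050/api/v1/resource_provider"


def build_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request, attaching the token only if one is given."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: the token is sent using the Bearer scheme.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(url, data=b"{}", headers=headers, method="POST")


if __name__ == "__main__":
    # The request is only constructed here, never sent.
    request = build_request(EXAMPLE_URL, token="example-token")
    print(request.get_header("Authorization"))
```

Without a token the request is built with no `Authorization` header, which mirrors the "if a token is provided" wording above.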

----

