On Mon, Feb 26, 2018 at 6:14 PM, Dave Sherohman <d...@sherohman.org> wrote:
> On Mon, Feb 26, 2018 at 05:45:27PM +0530, Karthik Subrahmanya wrote:
> > > "In a replica 2 volume... If we set the client-quorum option to
> > > auto, then the first brick must always be up, irrespective of the
> > > status of the second brick. If only the second brick is up, the
> > > subvolume becomes read-only."
> > >
> > By default client-quorum is "none" in replica 2 volume.
>
> I'm not sure where I saw the directions saying to set it, but I do have
> "cluster.quorum-type: auto" in my volume configuration. (And I think
> that's client quorum, but feel free to correct me if I've misunderstood
> the docs.)
>
If it is "auto" then I think it has been reconfigured at some point; in a
replica 2 volume the default is "none".

> > It applies to all the replica 2 volumes, even if it has just 2 bricks or
> > more. Total brick count in the volume doesn't matter for the quorum; what
> > matters is the number of bricks which are up in the particular replica
> > subvol.
>
> Thanks for confirming that.
>
> > If I understood your configuration correctly it should look something like
> > this:
> > (Please correct me if I am wrong)
> > replica-1: bricks 1 & 2
> > replica-2: bricks 3 & 4
> > replica-3: bricks 5 & 6
>
> Yes, that's correct.
>
> > Since quorum is per replica, if it is set to auto then it needs the first
> > brick of the particular replica subvol to be up to perform the fop.
> >
> > In replica 2 volumes you can end up in split-brains.
>
> How would that happen if bricks which are not in (cluster-wide) quorum
> refuse to accept writes? I'm not seeing the reason for using individual
> subvolume quorums instead of full-volume quorum.
>
Split-brains happen within a replica pair. I will try to explain how you can
end up in split-brain even with cluster-wide quorum:

Let's say you have a 6-brick (replica 2) volume and you always have at least
the quorum number of bricks up & running.
Bricks 1 & 2 are part of replica subvol-1
Bricks 3 & 4 are part of replica subvol-2
Bricks 5 & 6 are part of replica subvol-3

- Brick 1 goes down and a write comes on a file which is part of replica subvol-1
- Quorum is met, since 5 out of 6 bricks are running
- Brick 2 says brick 1 is bad
- Brick 2 goes down and brick 1 comes up. Heal did not happen
- A write comes on the same file; quorum is met, and now brick 1 says brick 2 is bad
- When both bricks 1 & 2 are up, both of them blame the other brick - *split-brain*

> > It would be great if you can consider configuring an arbiter or
> > replica 3 volume.
>
> I can. My bricks are 2x850G and 4x11T, so I can repurpose the small
> bricks as arbiters with minimal effect on capacity. What would be the
> sequence of commands needed to:
>
> 1) Move all data off of bricks 1 & 2
> 2) Remove that replica from the cluster
> 3) Re-add those two bricks as arbiters
>
> (And did I miss any additional steps?)
>
> Unfortunately, I've been running a few months already with the current
> configuration and there are several virtual machines running off the
> existing volume, so I'll need to reconfigure it online if possible.
>
Without knowing the volume configuration it is difficult to suggest the exact
configuration change, and since it is a live system you may end up with data
unavailability or data loss. Can you give the output of "gluster volume info
<volname>" and tell us which brick is of what size?

Note: the arbiter bricks need not be as big as the data bricks; [1] gives
information about how you can size and provision the arbiter bricks.
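Just to give a very rough idea of the kind of commands involved (this is only
a sketch with placeholder host names and brick paths, not something to run
before the volume info has been checked), removing the replica pair and
re-adding the freed disks as arbiters would look something like:

    # migrate the data off bricks 1 & 2 and remove that replica pair
    gluster volume remove-brick <volname> server1:/bricks/brick1 server2:/bricks/brick2 start
    gluster volume remove-brick <volname> server1:/bricks/brick1 server2:/bricks/brick2 status
    gluster volume remove-brick <volname> server1:/bricks/brick1 server2:/bricks/brick2 commit

    # once the volume is a 2 x 2 distribute-replicate, add one empty arbiter
    # brick per remaining replica subvol
    gluster volume add-brick <volname> replica 3 arbiter 1 server1:/bricks/arbiter1 server2:/bricks/arbiter2

    # watch the arbiter bricks being populated by self-heal
    gluster volume heal <volname> info

The arbiter brick paths have to be empty directories (so the old brick
directories would need to be cleaned or re-created first), and the
remove-brick commit must only be issued after the status shows the data
migration as completed, otherwise you can lose data. But again, please share
the volume info before attempting any of this on the live system.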
[1] http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#arbiter-bricks-sizing

Regards,
Karthik

> --
> Dave Sherohman
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users