Please run 'gluster v get all cluster.max-op-version' and use whatever value it returns to bump up the cluster.op-version ('gluster v set all cluster.op-version <value>'). If you then restart glusterd on the rejected peer, I believe the problem should go away. If it doesn't, I'll need to investigate further; in that case please send the glusterd and cmd_history log files and the contents of /var/lib/glusterd from all the nodes.
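As a rough sketch of the two commands above strung together (the function names option_value and bump_op_version are just illustrative, and I'm assuming the two-column "Option / Value" output format shown in the quoted message below):

```shell
#!/bin/sh
# Print the value column for one option from 'gluster v get' style output
# read on stdin. The header and separator lines are skipped because their
# first column never equals the option name.
option_value() {
    awk -v opt="$1" '$1 == opt { print $2 }'
}

# Hypothetical helper: raise cluster.op-version to the maximum the cluster
# reports. Nothing runs until you call it by hand on one node.
bump_op_version() {
    max=$(gluster volume get all cluster.max-op-version \
          | option_value cluster.max-op-version)
    if [ -z "$max" ]; then
        echo "could not read cluster.max-op-version" >&2
        return 1
    fi
    gluster volume set all cluster.op-version "$max"
}
```

After calling bump_op_version, restart glusterd on the rejected peer (for example 'systemctl restart glusterd', assuming a systemd host) and check 'gluster peer status' again.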

On Wed, Mar 7, 2018 at 4:13 AM, Jamie Lawrence <jlawre...@squaretrade.com> wrote:

> > On Mar 5, 2018, at 6:41 PM, Atin Mukherjee <amukh...@redhat.com> wrote:
> >
> > > I'm tempted to repeat - down things, copy the checksum the "good" ones
> > > agree on, start things; but given that this has turned into a
> > > balloon-squeezing exercise, I want to make sure I'm not doing this the
> > > wrong way.
> >
> > Yes, that's the way. Copy /var/lib/glusterd/vols/<volname>/ from the
> > good node to the rejected one and restart glusterd service on the
> > rejected peer.
>
> My apologies for the multiple messages - I'm having to work on this
> episodically.
>
> I've tried again to reset state on the bad peer, to no avail. This time I
> downed all of the peers, copied things over, ensuring that the tier-enabled
> line was absent and started back up; the cksum immediately changed to a
> bad value, the two good nodes added that line in, and the bad node didn't
> have it.
>
> Just to have a clear view of this, I did it yet again, this time ensuring
> the tier-enabled line was present everywhere. Same result, except that it
> didn't add the tier-enabled line, which I suppose makes some sense.
>
> One oddity - I see:
>
> # gluster v get all cluster.op-version
> Option                                  Value
> ------                                  -----
> cluster.op-version                      30800
>
> but from one of the `info` files:
>
> op-version=30712
> client-op-version=30712
>
> I don't know what it means that the cluster is at one version but
> apparently the volume is set for another - I thought that was a
> cluster-level setting. (Client.op-version theoretically makes more sense -
> I can see Ovirt wanting an older version.)
>
> I'm at a loss to fix this - copying /var/lib/glusterd/vol/<vol> over
> doesn't fix the problem. I'd be somewhat OK with trashing the volume and
> starting over, if it weren't for two things: (1) Ovirt was also a massive
> pain to set up, and it is configured on this volume. But perhaps more
> importantly, I'm concerned with this happening again once this is in
> production, which would be Bad, especially if I don't have a fix.
>
> So at this point, I'm unclear on how to move forward or even where else to
> look for potential problems.
>
> -j
>
> - - - -
>
> [2018-03-06 22:30:32.421530] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 77cdfbba-348c-43fe-ab3d-00621904ea9c
> [2018-03-06 22:30:32.422582] E [MSGID: 106010] [glusterd-utils.c:3374:glusterd_compare_friend_volume] 0-management: Version of Cksums sc5-ovirt_engine differ. local cksum = 3949237931, remote cksum = 2068896937 on peer sc5-gluster-10g-1.squaretrade.com
> [2018-03-06 22:30:32.422774] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to sc5-gluster-10g-1.squaretrade.com (0), ret: 0, op_ret: -1
> [2018-03-06 22:30:32.424621] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 77cdfbba-348c-43fe-ab3d-00621904ea9c, host: sc5-gluster-10g-1.squaretrade.com, port: 0
> [2018-03-06 22:30:32.425563] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: c1877e0d-ccb2-401e-83a6-e4a680af683a, host: sc5-gluster-2.squaretrade.com, port: 0
> [2018-03-06 22:30:32.426706] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
> [2018-03-06 22:30:32.428075] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: c1877e0d-ccb2-401e-83a6-e4a680af683a
> [2018-03-06 22:30:32.428325] E [MSGID: 106010] [glusterd-utils.c:3374:glusterd_compare_friend_volume] 0-management: Version of Cksums sc5-ovirt_engine differ. local cksum = 3949237931, remote cksum = 2068896937 on peer sc5-gluster-2.squaretrade.com
> [2018-03-06 22:30:32.428468] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to sc5-gluster-2.squaretrade.com (0), ret: 0, op_ret: -1
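
If it helps while re-checking after the op-version bump, the "Version of Cksums ... differ" entries in the quoted log can be reduced to one line per mismatch. This is only a sketch: cksum_mismatches is a made-up helper name, and it assumes each log entry sits on a single line, as it does in the log file itself (not as wrapped in the mail above).

```shell
#!/bin/sh
# Read glusterd log text on stdin and print, for every checksum mismatch:
#   <volume> <local cksum> <remote cksum> <peer>
# Lines that are not "Version of Cksums ... differ" entries are dropped.
cksum_mismatches() {
    sed -n 's/.*Version of Cksums \([^ ]*\) differ\. local cksum = \([0-9]*\), remote cksum = \([0-9]*\) on peer \([^ ]*\).*/\1 \2 \3 \4/p'
}
```

Usage would be something like 'cksum_mismatches < /path/to/glusterd.log' on the rejected peer, where the log path is whatever your installation uses. Once the peer is accepted again, no new lines should show up for sc5-ovirt_engine.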
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users