Re: [Gluster-users] Gluster 3.6.4 peer rejected while doing probe

2015-09-14 Thread Davy Croonen
Atin,

I toggled performance.flush-behind off and then on again with gluster volume set
on both volumes, and after that the probe was successful.

So many thanks for your support.

Some additional info: in our lab I ran some tests starting with gluster version 
3.6.4 and was not able to reproduce the problem. I then looked for 
differences with our production cluster and found that we started 
there with version 3.5.x, which we later upgraded to version 3.6.4. So maybe the 
bug/incompatibility is introduced somewhere after an upgrade procedure?

Greetings
Davy
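
For readers who want to script the workaround described above, here is a minimal sketch in Python. It only assembles the two gluster commands mentioned in this thread (gluster volume set and gluster peer probe); the helper names, the dry-run switch and the volume list are illustrative assumptions, not part of Gluster. It also assumes performance.flush-behind was "on" beforehand, since the toggle ends with "on".

```python
import subprocess


def workaround_commands(volumes, new_peer, option="performance.flush-behind"):
    """Build the gluster CLI calls for the workaround, in the order they should run."""
    commands = []
    for volume in volumes:
        # An explicit "volume set" on each existing volume is what the thread
        # reports as the fix; toggling off and on leaves the option enabled.
        commands.append(["gluster", "volume", "set", volume, option, "off"])
        commands.append(["gluster", "volume", "set", volume, option, "on"])
    # Probe the new node only after every volume has been touched.
    commands.append(["gluster", "peer", "probe", new_peer])
    return commands


def run_workaround(volumes, new_peer, dry_run=True):
    """Print the commands (dry run) or execute them on a node of the existing cluster."""
    for command in workaround_commands(volumes, new_peer):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # "public" is the one volume named in this thread; add the others as needed.
    run_workaround(["public"], "gfs02a-dcg.intnet.be")
```

With dry_run left at its default the script only prints the commands, which makes it easy to review them before running anything on a production cluster.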

On 14 Sep 2015, at 07:43, Atin Mukherjee wrote:

Davy,

This seems to be an issue which we also faced a couple of months back
during upgrade testing, and a bugzilla [1] was raised for it. At
the time we didn't have a workaround to make peer probe work, but
I managed to find one today.

Could you do an explicit volume set on the existing cluster and then do
a peer probe? Let me know if that works.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1248895

Thanks,
Atin


Re: [Gluster-users] Gluster 3.6.4 peer rejected while doing probe

2015-09-14 Thread Atin Mukherjee


On 09/14/2015 02:33 PM, Davy Croonen wrote:
> Atin,
> 
> I performed a gluster volume set performance.flush-behind off/on
> toggle on both volumes and after that
> the probe was successful.
> 
> So many thanks for your support.
> 
> Some additional info, in our lab I did some tests starting with gluster
> version 3.6.4 and was not able to reproduce the problem. After that I
> went looking for some differences with our production cluster and found
> out that we started there with version 3.5.x which we upgraded to
> version 3.6.4. So maybe the bug/incompatibility is introduced somewhere
> after an upgrade procedure?
You will *only* hit this issue if you have upgraded from 3.5 to 3.6, so
your observation is correct. However, the problem surfaces while bumping
up the cluster's op-version. Ideally, we are expected to write all the
new default information into the info file, and that seems to be missing. I
will be working on a patch for this soon and will look to backport it to the
3.x series.

Thanks,
Atin
> 
> Greetings
> Davy
> 

Re: [Gluster-users] Gluster 3.6.4 peer rejected while doing probe

2015-09-13 Thread Atin Mukherjee
Davy,

This seems to be an issue which we also faced a couple of months back
during upgrade testing, and a bugzilla [1] was raised for it. At
the time we didn't have a workaround to make peer probe work, but
I managed to find one today.

Could you do an explicit volume set on the existing cluster and then do
a peer probe? Let me know if that works.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1248895

Thanks,
Atin

On 09/11/2015 05:41 PM, Davy Croonen wrote:
> Atin
> 
> Please see the requested attachments.
> 
> KR
> Davy
> 

[Gluster-users] Gluster 3.6.4 peer rejected while doing probe

2015-09-11 Thread Davy Croonen
Hi all

We have a production cluster with 2 nodes (gfs01a and gfs01b) in a distributed 
replicate setup with glusterfs 3.6.4. We want to expand the volume with 2 extra 
nodes (gfs02a and gfs02b) because we are running out of disk space. Therefore we 
deployed 2 extra nodes with glusterfs 3.6.4.

Now, while probing the 2 new nodes from a node in the existing cluster we got 
the following error:

root@gfs01a-dcg:~# gluster peer probe gfs02a-dcg.intnet.be
peer probe: success.
root@gfs01a-dcg:~# gluster peer status
Number of Peers: 2

Hostname: gfs01b-dcg.intnet.be
Uuid: cfc83cf2-b719-40c7-afea-b23accc714c3
State: Peer in Cluster (Connected)

Hostname: gfs02a-dcg.intnet.be
Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
State: Peer Rejected (Connected)

In the log file /var/log/glusterfs/etc-glusterfs-glusterd.vol.log the following 
entries are written:

[2015-09-11 09:37:49.405906] I [glusterd-handler.c:1031:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req gfs02a-dcg.intnet.be 24007
[2015-09-11 09:37:49.428630] I [glusterd-handler.c:3198:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: gfs02a-dcg.intnet.be (24007)
[2015-09-11 09:37:49.438636] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-09-11 09:37:49.440513] I [glusterd-handler.c:3131:glusterd_friend_add] 0-management: connect returned 0
[2015-09-11 09:37:49.474316] I [glusterd-rpc-ops.c:245:__glusterd_probe_cbk] 0-management: Received probe resp from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f, host: gfs02a-dcg.intnet.be
[2015-09-11 09:37:49.481801] I [glusterd-rpc-ops.c:387:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
[2015-09-11 09:37:51.650265] I [glusterd-rpc-ops.c:437:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f, host: gfs02a-dcg.intnet.be, port: 0
[2015-09-11 09:37:51.665861] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30603
[2015-09-11 09:37:51.690170] I [glusterd-handler.c:2543:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
[2015-09-11 09:37:51.692652] I [glusterd-handler.c:2595:__glusterd_handle_probe_query] 0-glusterd: Responded to gfs02a-dcg.intnet.be, op_ret: 0, op_errno: 0, ret: 0
[2015-09-11 09:37:51.706203] I [glusterd-handler.c:2232:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
[2015-09-11 09:37:51.708909] E [MSGID: 106010] [glusterd-utils.c:3297:glusterd_compare_friend_volume] 0-management: Version of Cksums public differ. local cksum = 1932535021, remote cksum = 2474653383 on peer gfs02a-dcg.intnet.be
[2015-09-11 09:37:51.709026] I [glusterd-handler.c:3367:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to gfs02a-dcg.intnet.be (0), ret: 0
[2015-09-11 09:37:55.537231] I [glusterd-handler.c:1241:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

The exact same error appears while probing the second node (gfs02b).

Does anyone have an idea how to solve this?

Thanks in advance.

Kind regards
Davy
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
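
The one error-level entry in the log above (MSGID 106010) is the line that explains the rejection. As a small sketch, assuming the message wording shown in this thread, the following Python snippet pulls the volume name, both checksums and the peer out of a glusterd log. The function name and the returned dictionary layout are illustrative choices; the pattern tolerates the line wrapping introduced by mail clients.

```python
import re

# Matches the "Cksums ... differ" message as it appears in this thread.
# \s+ is used between words so the pattern still matches wrapped log text.
CKSUM_MISMATCH = re.compile(
    r"Version\s+of\s+Cksums\s+(?P<volume>\S+)\s+differ\.\s+"
    r"local\s+cksum\s*=\s*(?P<local>\d+),\s*"
    r"remote\s+cksum\s*=\s*(?P<remote>\d+)\s+on\s+peer\s+(?P<peer>\S+)"
)


def find_cksum_mismatches(log_text):
    """Return one dict per checksum-mismatch message found in the log text."""
    return [
        {
            "volume": match["volume"],
            "local": int(match["local"]),
            "remote": int(match["remote"]),
            "peer": match["peer"],
        }
        for match in CKSUM_MISMATCH.finditer(log_text)
    ]


if __name__ == "__main__":
    # Path taken from the original post; adjust it for your installation.
    with open("/var/log/glusterfs/etc-glusterfs-glusterd.vol.log") as handle:
        for mismatch in find_cksum_mismatches(handle.read()):
            print(mismatch)
```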

Re: [Gluster-users] Gluster 3.6.4 peer rejected while doing probe

2015-09-11 Thread Davy Croonen
Thanks for your quick response.

As reported in the log, the checksums are indeed not the same. On gfs01a-dcg it 
is 'info=1266454712' and on gfs02a-dcg it is 'info=2613085848'. Of course, my 
next question is: how can I fix this?

I already tried stopping the gluster daemon on gfs02a-dcg, deleting the 
entire vols directory and starting the gluster daemon again. On the gfs01a-dcg 
host, gluster peer status now shows:

Hostname: gfs02a-dcg.intnet.be
Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
State: Peer in Cluster (Connected)

However, the checksum of the public volume is still not the same on gfs01a-dcg and 
gfs02a-dcg, and running gluster peer status on gfs01b-dcg (the replica of 
gfs01a-dcg) gives me:

Hostname: gfs02a-dcg.intnet.be
Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
State: Peer Rejected (Connected)

So my question remains: is there any way to fix this?

Kind regards

Davy
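
Because the peer state differs depending on which node is asked, it can help to run gluster peer status on every node and compare. The sketch below is an assumption-laden helper, not a Gluster tool: it parses the Hostname/Uuid/State blocks exactly as they are printed in this thread and lists the peers a node reports as rejected.

```python
def parse_peer_status(output):
    """Parse `gluster peer status` text into a list of dicts, one per peer."""
    peers = []
    current = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank lines or anything without "key: value"
        key = key.strip().lower()
        value = value.strip()
        if key == "hostname":
            # A Hostname line starts a new peer block.
            current = {"hostname": value}
            peers.append(current)
        elif key in ("uuid", "state") and current:
            current[key] = value
    return peers


def rejected_peers(output):
    """Return the hostnames whose state contains 'Peer Rejected'."""
    return [
        peer["hostname"]
        for peer in parse_peer_status(output)
        if "Peer Rejected" in peer.get("state", "")
    ]
```

Feeding it the output captured on gfs01a-dcg and on gfs01b-dcg makes the disagreement between the two existing nodes explicit.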

On 11 Sep 2015, at 12:39, Mohammed Rafi K C wrote:

Can you check the checksum of the volume "public" on both of the current nodes? 
The checksums are located in /var/lib/glusterd/vols/public/cksum.

Regards
Rafi KC


Re: [Gluster-users] Gluster 3.6.4 peer rejected while doing probe

2015-09-11 Thread Davy Croonen
Atin

Please see the requested attachments.

KR
Davy

> On 11 Sep 2015, at 14:03, Atin Mukherjee wrote:
> 
> Could you attach the contents of the /var/lib/glusterd/vols/<volname>/info
> file from both the nodes?
> 
> ~Atin
> 

Re: [Gluster-users] Gluster 3.6.4 peer rejected while doing probe

2015-09-11 Thread Mohammed Rafi K C
Can you check the checksum of the volume "public" on both of the current
nodes? The checksums are located in /var/lib/glusterd/vols/public/cksum.

Regards
Rafi KC
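
To compare the two cksum files without eyeballing them, something like the following Python sketch could be used. It assumes the file holds key=value lines such as info=1266454712 (the format quoted later in this thread) and that the contents of both files have already been copied to one machine; the function names are illustrative.

```python
def parse_cksum(text):
    """Parse cksum file contents (key=value lines) into a dict of integers."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        value = value.strip()
        if sep and value.isdigit():
            values[key.strip()] = int(value)
    return values


def differing_keys(local_text, remote_text):
    """Return the sorted keys whose checksums differ or exist on one side only."""
    local = parse_cksum(local_text)
    remote = parse_cksum(remote_text)
    return sorted(
        key for key in local.keys() | remote.keys()
        if local.get(key) != remote.get(key)
    )
```

An empty result means the two nodes agree on the volume's checksums; any key that is returned points at the value glusterd will complain about during the peer handshake.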


Re: [Gluster-users] Gluster 3.6.4 peer rejected while doing probe

2015-09-11 Thread Atin Mukherjee
Could you attach the contents of the /var/lib/glusterd/vols/<volname>/info
file from both the nodes?

~Atin

On 09/11/2015 04:50 PM, Davy Croonen wrote:
> Thanks for your quick response.
> 
> As reported in the log, the checksums are indeed not the same. On
> gfs01a-dcg it is 'info=1266454712' and on gfs02a-dcg it is
> 'info=2613085848'. Of course, my next question is: how can I fix this?
> 
> I already tried stopping the gluster daemon on gfs02a-dcg, deleting
> the entire vols directory and starting the gluster daemon again. On the
> gfs01a-dcg host, gluster peer status now shows:
> 
> Hostname: gfs02a-dcg.intnet.be 
> Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
> State: Peer in Cluster (Connected)
> 
> However, the checksum of the public volume is still not the same on
> gfs01a-dcg and gfs02a-dcg, and running gluster peer status on
> gfs01b-dcg (the replica of gfs01a-dcg) gives me:
> 
> Hostname: gfs02a-dcg.intnet.be 
> Uuid: 29592d5b-242b-43b5-afc5-5f9a1496d59f
> State: Peer Rejected (Connected)
> 
> So my question remains: is there any way to fix this?
> 
> Kind regards
> 
> Davy
> 