Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12

2017-12-19 Thread Hari Gowtham
Yes, Atin. I'll take a look.

On Wed, Dec 20, 2017 at 11:28 AM, Atin Mukherjee  wrote:
> Looks like a bug: I see that tier-enabled = 0 is an additional entry in the
> info file on shchhv01. As per the code, this field should be written into
> the glusterd store if the op-version is >= 30706. My guess is that because
> 3.8.4 didn't have commit 33f8703a1 "glusterd: regenerate volfiles on
> op-version bump up", the info file and volfiles were not regenerated when
> the op-version was bumped, which left the tier-enabled entry missing from
> the info file.
>
> For now, you can copy the info file for the volumes where the mismatch
> happened from shchhv01 to shchhv02 and restart the glusterd service on
> shchhv02. That should fix this up temporarily. Unfortunately, this step
> might need to be repeated on other nodes as well.
>
> @Hari - Could you help debug this further?
>
>
>
> On Wed, Dec 20, 2017 at 10:44 AM, Gustave Dahl 
> wrote:
>>
>> I was attempting the same on a local sandbox and also have the same
>> problem.
>>
>>
>> Current: 3.8.4
>>
>> Volume Name: shchst01
>> Type: Distributed-Replicate
>> Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 4 x 3 = 12
>> Transport-type: tcp
>> Bricks:
>> Brick1: shchhv01-sto:/data/brick3/shchst01
>> Brick2: shchhv02-sto:/data/brick3/shchst01
>> Brick3: shchhv03-sto:/data/brick3/shchst01
>> Brick4: shchhv01-sto:/data/brick1/shchst01
>> Brick5: shchhv02-sto:/data/brick1/shchst01
>> Brick6: shchhv03-sto:/data/brick1/shchst01
>> Brick7: shchhv02-sto:/data/brick2/shchst01
>> Brick8: shchhv03-sto:/data/brick2/shchst01
>> Brick9: shchhv04-sto:/data/brick2/shchst01
>> Brick10: shchhv02-sto:/data/brick4/shchst01
>> Brick11: shchhv03-sto:/data/brick4/shchst01
>> Brick12: shchhv04-sto:/data/brick4/shchst01
>> Options Reconfigured:
>> cluster.data-self-heal-algorithm: full
>> features.shard-block-size: 512MB
>> features.shard: enable
>> performance.readdir-ahead: on
>> storage.owner-uid: 9869
>> storage.owner-gid: 9869
>> server.allow-insecure: on
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.self-heal-daemon: on
>> nfs.disable: on
>> performance.io-thread-count: 64
>> performance.cache-size: 1GB
>>
>> Upgraded shchhv01-sto to 3.12.3, others remain at 3.8.4
>>
>> RESULT
>> =
>> Hostname: shchhv01-sto
>> Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
>> State: Peer Rejected (Connected)
>>
>> Upgraded Server:  shchhv01-sto
>> ==
>> [2017-12-20 05:02:44.747313] I [MSGID: 101190]
>> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with
>> index 1
>> [2017-12-20 05:02:44.747387] I [MSGID: 101190]
>> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with
>> index 2
>> [2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
>> 0-management: RPC_CLNT_PING notify failed
>> [2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
>> 0-management: RPC_CLNT_PING notify failed
>> [2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
>> 0-management: RPC_CLNT_PING notify failed
>> [2017-12-20 05:02:54.676324] I [MSGID: 106493]
>> [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received
>> RJT
>> from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto, port:
>> 0
>> [2017-12-20 05:02:54.690237] I [MSGID: 106163]
>> [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack]
>> 0-management:
>> using the op-version 30800
>> [2017-12-20 05:02:54.695823] I [MSGID: 106490]
>> [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req]
>> 0-glusterd:
>> Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
>> [2017-12-20 05:02:54.696956] E [MSGID: 106010]
>> [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management:
>> Version
>> of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
>> 2747317484 on peer shchhv02-sto
>> [2017-12-20 05:02:54.697796] I [MSGID: 106493]
>> [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
>> Responded to shchhv02-sto (0), ret: 0, op_ret: -1
>> [2017-12-20 05:02:55.033822] I [MSGID: 106493]
>> [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received
>> RJT
>> from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto, port:
>> 0
>> [2017-12-20 05:02:55.038460] I [MSGID: 106163]
>> [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack]
>> 0-management:
>> using the op-version 30800
>> [2017-12-20 05:02:55.040032] I [MSGID: 106490]
>> [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req]
>> 0-glusterd:
>> Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
>> 

Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12

2017-12-19 Thread Atin Mukherjee
Looks like a bug: I see that tier-enabled = 0 is an additional entry in the
info file on shchhv01. As per the code, this field should be written into
the glusterd store if the op-version is >= 30706. My guess is that because
3.8.4 didn't have commit 33f8703a1 "glusterd: regenerate volfiles on
op-version bump up", the info file and volfiles were not regenerated when
the op-version was bumped, which left the tier-enabled entry missing from
the info file.

For now, you can copy the info file for the volumes where the mismatch
happened from shchhv01 to shchhv02 and restart the glusterd service on
shchhv02. That should fix this up temporarily. Unfortunately, this step
might need to be repeated on other nodes as well.
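
In shell terms, the workaround might look like this (a minimal sketch; the
hostnames and volume name are taken from Gustave's output below, and the
standard /var/lib/glusterd layout is assumed, so adjust for your environment):

  # On the rejected peer (shchhv02), stop glusterd first
  systemctl stop glusterd

  # Pull the info file for the mismatched volume from the good peer
  scp root@shchhv01:/var/lib/glusterd/vols/shchst01/info \
      /var/lib/glusterd/vols/shchst01/info

  # Start glusterd again and re-check the peer state
  systemctl start glusterd
  gluster peer status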

@Hari - Could you help debug this further?



On Wed, Dec 20, 2017 at 10:44 AM, Gustave Dahl 
wrote:

> I was attempting the same on a local sandbox and also have the same
> problem.
>
>
> Current: 3.8.4
>
> Volume Name: shchst01
> Type: Distributed-Replicate
> Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 4 x 3 = 12
> Transport-type: tcp
> Bricks:
> Brick1: shchhv01-sto:/data/brick3/shchst01
> Brick2: shchhv02-sto:/data/brick3/shchst01
> Brick3: shchhv03-sto:/data/brick3/shchst01
> Brick4: shchhv01-sto:/data/brick1/shchst01
> Brick5: shchhv02-sto:/data/brick1/shchst01
> Brick6: shchhv03-sto:/data/brick1/shchst01
> Brick7: shchhv02-sto:/data/brick2/shchst01
> Brick8: shchhv03-sto:/data/brick2/shchst01
> Brick9: shchhv04-sto:/data/brick2/shchst01
> Brick10: shchhv02-sto:/data/brick4/shchst01
> Brick11: shchhv03-sto:/data/brick4/shchst01
> Brick12: shchhv04-sto:/data/brick4/shchst01
> Options Reconfigured:
> cluster.data-self-heal-algorithm: full
> features.shard-block-size: 512MB
> features.shard: enable
> performance.readdir-ahead: on
> storage.owner-uid: 9869
> storage.owner-gid: 9869
> server.allow-insecure: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.self-heal-daemon: on
> nfs.disable: on
> performance.io-thread-count: 64
> performance.cache-size: 1GB
>
> Upgraded shchhv01-sto to 3.12.3, others remain at 3.8.4
>
> RESULT
> =
> Hostname: shchhv01-sto
> Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
> State: Peer Rejected (Connected)
>
> Upgraded Server:  shchhv01-sto
> ==
> [2017-12-20 05:02:44.747313] I [MSGID: 101190]
> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
> with
> index 1
> [2017-12-20 05:02:44.747387] I [MSGID: 101190]
> [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread
> with
> index 2
> [2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
> 0-management: RPC_CLNT_PING notify failed
> [2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
> 0-management: RPC_CLNT_PING notify failed
> [2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
> 0-management: RPC_CLNT_PING notify failed
> [2017-12-20 05:02:54.676324] I [MSGID: 106493]
> [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received
> RJT
> from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto,
> port: 0
> [2017-12-20 05:02:54.690237] I [MSGID: 106163]
> [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack]
> 0-management:
> using the op-version 30800
> [2017-12-20 05:02:54.695823] I [MSGID: 106490]
> [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req]
> 0-glusterd:
> Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
> [2017-12-20 05:02:54.696956] E [MSGID: 106010]
> [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management:
> Version
> of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
> 2747317484 on peer shchhv02-sto
> [2017-12-20 05:02:54.697796] I [MSGID: 106493]
> [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to shchhv02-sto (0), ret: 0, op_ret: -1
> [2017-12-20 05:02:55.033822] I [MSGID: 106493]
> [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received
> RJT
> from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto,
> port: 0
> [2017-12-20 05:02:55.038460] I [MSGID: 106163]
> [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack]
> 0-management:
> using the op-version 30800
> [2017-12-20 05:02:55.040032] I [MSGID: 106490]
> [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req]
> 0-glusterd:
> Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
> [2017-12-20 05:02:55.040266] E [MSGID: 106010]
> [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management:
> Version
> of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
> 2747317484 on peer shchhv03-sto
> [2017-12-20 

Re: [Gluster-users] How to make sure self-heal backlog is empty ?

2017-12-19 Thread Karthik Subrahmanya
Hi,

To debug the issue, can you provide the following for the volumes that are
showing pending entries?
- volume info
- shd (self-heal daemon) log
- mount log
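
For reference, these are usually collected like this (a minimal sketch
assuming default log locations; the volume name "thedude" and the mount
point are only examples taken from the output below):

  gluster volume info thedude > volume-info.txt

  # Self-heal daemon log
  cp /var/log/glusterfs/glustershd.log .

  # FUSE mount log, named after the mount point, e.g. /mnt/thedude
  cp /var/log/glusterfs/mnt-thedude.log .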

Thanks & Regards,
Karthik

On Wed, Dec 20, 2017 at 3:11 AM, Matt Waymack  wrote:

> Mine also has a list of files that seemingly never heal.  They are usually
> isolated on my arbiter bricks, but not always.  I would also like to find
> an answer for this behavior.
>
> -Original Message-
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-bounces@
> gluster.org] On Behalf Of Hoggins!
> Sent: Tuesday, December 19, 2017 12:26 PM
> To: gluster-users 
> Subject: [Gluster-users] How to make sure self-heal backlog is empty ?
>
> Hello list,
>
> I'm not sure what to look for here. I'm not sure whether what I'm seeing is
> the actual "backlog" (the one we need to make sure is empty during a rolling
> upgrade before moving on to the next node). While reading this, how can I
> tell whether it's okay to reboot / upgrade my next node in the pool?
> Here is what I do for checking:
>
> for i in `gluster volume list`; do gluster volume heal $i info; done
>
> And here is what I get :
>
> Brick ngluster-1.network.hoggins.fr:/export/brick/clem
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-2.network.hoggins.fr:/export/brick/clem
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-3.network.hoggins.fr:/export/brick/clem
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-1.network.hoggins.fr:/export/brick/mailer
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-2.network.hoggins.fr:/export/brick/mailer
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-3.network.hoggins.fr:/export/brick/mailer
> 
> Status: Connected
> Number of entries: 1
>
> Brick ngluster-1.network.hoggins.fr:/export/brick/rom
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-2.network.hoggins.fr:/export/brick/rom
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-3.network.hoggins.fr:/export/brick/rom
> 
> Status: Connected
> Number of entries: 1
>
> Brick ngluster-1.network.hoggins.fr:/export/brick/thedude
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-2.network.hoggins.fr:/export/brick/thedude
> 
> Status: Connected
> Number of entries: 1
>
> Brick ngluster-3.network.hoggins.fr:/export/brick/thedude
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-1.network.hoggins.fr:/export/brick/web
> Status: Connected
> Number of entries: 0
>
> Brick ngluster-2.network.hoggins.fr:/export/brick/web
> 
> 
> 
> Status: Connected
> Number of entries: 3
>
> Brick ngluster-3.network.hoggins.fr:/export/brick/web
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Status: Connected
> Number of entries: 11
>
>
> Should I be worried that this never ends?
>
> Thank you,
>
> Hoggins!
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12

2017-12-19 Thread Gustave Dahl
I was attempting the same on a local sandbox and also have the same problem.


Current: 3.8.4

Volume Name: shchst01
Type: Distributed-Replicate
Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: shchhv01-sto:/data/brick3/shchst01
Brick2: shchhv02-sto:/data/brick3/shchst01
Brick3: shchhv03-sto:/data/brick3/shchst01
Brick4: shchhv01-sto:/data/brick1/shchst01
Brick5: shchhv02-sto:/data/brick1/shchst01
Brick6: shchhv03-sto:/data/brick1/shchst01
Brick7: shchhv02-sto:/data/brick2/shchst01
Brick8: shchhv03-sto:/data/brick2/shchst01
Brick9: shchhv04-sto:/data/brick2/shchst01
Brick10: shchhv02-sto:/data/brick4/shchst01
Brick11: shchhv03-sto:/data/brick4/shchst01
Brick12: shchhv04-sto:/data/brick4/shchst01
Options Reconfigured:
cluster.data-self-heal-algorithm: full
features.shard-block-size: 512MB
features.shard: enable
performance.readdir-ahead: on
storage.owner-uid: 9869
storage.owner-gid: 9869
server.allow-insecure: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.self-heal-daemon: on
nfs.disable: on
performance.io-thread-count: 64
performance.cache-size: 1GB

Upgraded shchhv01-sto to 3.12.3, others remain at 3.8.4

RESULT
=
Hostname: shchhv01-sto
Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
State: Peer Rejected (Connected)

Upgraded Server:  shchhv01-sto
==
[2017-12-20 05:02:44.747313] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
[2017-12-20 05:02:44.747387] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 2
[2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:54.676324] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto, port: 0
[2017-12-20 05:02:54.690237] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:54.695823] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
[2017-12-20 05:02:54.696956] E [MSGID: 106010]
[glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
2747317484 on peer shchhv02-sto
[2017-12-20 05:02:54.697796] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv02-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.033822] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto, port: 0
[2017-12-20 05:02:55.038460] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:55.040032] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
[2017-12-20 05:02:55.040266] E [MSGID: 106010]
[glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
2747317484 on peer shchhv03-sto
[2017-12-20 05:02:55.040405] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv03-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.584854] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5, host: shchhv04-sto, port: 0
[2017-12-20 05:02:55.595125] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:55.600804] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5
[2017-12-20 05:02:55.601288] E [MSGID: 106010]
[glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
2747317484 on peer shchhv04-sto
[2017-12-20 05:02:55.601497] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv04-sto (0), ret: 0, op_ret: -1

Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12

2017-12-19 Thread Ziemowit Pierzycki
I have not done the upgrade yet.  Since this is a production cluster I
need to make sure it stays up, or schedule some downtime if it doesn't.
Thanks.

On Tue, Dec 19, 2017 at 10:11 AM, Atin Mukherjee  wrote:
>
>
> On Tue, Dec 19, 2017 at 1:10 AM, Ziemowit Pierzycki 
> wrote:
>>
>> Hi,
>>
>> I have a cluster of 10 servers all running Fedora 24 along with
>> Gluster 3.8.  I'm planning on doing rolling upgrades to Fedora 27 with
>> Gluster 3.12.  I saw the documentation and did some testing but I
>> would like to run my plan through some (more?) educated minds.
>>
>> The current setup is:
>>
>> Volume Name: vol0
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt01:/vol/vol0
>> Brick2: glt02:/vol/vol0
>> Brick3: glt05:/vol/vol0 (arbiter)
>> Brick4: glt03:/vol/vol0
>> Brick5: glt04:/vol/vol0
>> Brick6: glt06:/vol/vol0 (arbiter)
>>
>> Volume Name: vol1
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt07:/vol/vol1
>> Brick2: glt08:/vol/vol1
>> Brick3: glt05:/vol/vol1 (arbiter)
>> Brick4: glt09:/vol/vol1
>> Brick5: glt10:/vol/vol1
>> Brick6: glt06:/vol/vol1 (arbiter)
>>
>> After performing the upgrade because of differences in checksums, the
>> upgraded nodes will become:
>>
>> State: Peer Rejected (Connected)
>
>
> Have you upgraded all the nodes? If yes, have you bumped up the
> cluster.op-version after upgrading all the nodes? Please follow
> http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more details
> on how to bump up the cluster.op-version. In case you have done all of these
> and you're still seeing a checksum issue, then I'm afraid you have hit a bug.
> I'd need further details like the checksum mismatch error from the
> glusterd.log file along with the exact volume's info file from
> /var/lib/glusterd/vols//info on both of the peers to debug this
> further.
>
>>
>> If I start doing the upgrades one at a time, with nodes glt10 to glt01
>> except for the arbiters glt05 and glt06, and then upgrading the
>> arbiters last, everything should remain online at all times through
>> the process.  Correct?
>>
>> Thanks.
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] How to make sure self-heal backlog is empty ?

2017-12-19 Thread Matt Waymack
Mine also has a list of files that seemingly never heal.  They are usually 
isolated on my arbiter bricks, but not always.  I would also like to find an 
answer for this behavior.
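
In the meantime, a small variation on the loop from the original mail can
narrow the output down to just the bricks that still have a backlog (a
minimal sketch, assuming the standard "gluster volume heal ... info" output
format):

  for v in $(gluster volume list); do
      gluster volume heal "$v" info | awk -v vol="$v" '
          /^Brick / { brick = $2 }
          /^Number of entries:/ && $4 > 0 { print vol, brick, "entries:", $4 }'
  done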

-Original Message-
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Hoggins!
Sent: Tuesday, December 19, 2017 12:26 PM
To: gluster-users 
Subject: [Gluster-users] How to make sure self-heal backlog is empty ?

Hello list,

I'm not sure what to look for here. I'm not sure whether what I'm seeing is the
actual "backlog" (the one we need to make sure is empty during a rolling
upgrade before moving on to the next node). While reading this, how can I tell
whether it's okay to reboot / upgrade my next node in the pool?
Here is what I do for checking:

for i in `gluster volume list`; do gluster volume heal $i info; done

And here is what I get :

Brick ngluster-1.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/mailer

Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/rom

Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/thedude

Status: Connected
Number of entries: 1

Brick ngluster-3.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/web



Status: Connected
Number of entries: 3

Brick ngluster-3.network.hoggins.fr:/export/brick/web











Status: Connected
Number of entries: 11


Should I be worried that this never ends?

    Thank you,

        Hoggins!

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Wrong volume size with df

2017-12-19 Thread Teknologeek Teknologeek
I have a glusterfs setup with a distributed-disperse volume, 5 x (4 + 2).

After a server crash, "gluster peer status" reports all peers as connected.

"gluster volume status detail" shows that all bricks are up and running
with the right size, but when I use df from a client mount point, the size
displayed is about 1/6 of the total size.

When browsing the data, it seems to be okay though.

I need some help understanding what's going on, as I can't delete the volume
and recreate it from scratch.
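
A few checks that usually narrow this down (a minimal diagnostic sketch; the
volume name, brick path and mount point are placeholders, and a FUSE client
with default log locations is assumed):

  # On a server: confirm every brick is online and reports the expected size
  gluster volume status VOLNAME detail

  # Confirm each brick directory still sits on its own filesystem; after a
  # crash a brick path can end up on the (much smaller) root filesystem if
  # the data disk was never remounted
  df -h /path/to/brick

  # On the client: look for bricks the mount never reconnected to, then
  # remount once everything is reachable again
  grep -i disconnect /var/log/glusterfs/mnt-VOLNAME.log | tail -20
  umount /mnt/VOLNAME && mount -t glusterfs server1:/VOLNAME /mnt/VOLNAME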
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Testing sharding on tiered volume

2017-12-19 Thread Viktor Nosov
Ben,

For this set of tests we are using bricks provisioned on RAID storage. We are
not trying to test the performance of the tiered volume right now. The goal is
to find a solution for handling large files that do not fit into the hot tier.
You are correct that there are a lot of promotions and demotions of shards when
we are reading or writing a large file, but at least sharding lets us do what a
pure tiered volume rejects.
In our experiments, the tiered volume puts into the hot tier only the file's
shards that are actually accessed; all other shards of the file stay on the
cold tier. Access to the shards is managed by the shard translator.
After hitting the problem while deleting files, we stopped testing. We hope
that somebody with knowledge of the sharded and tiered volume implementations
can tell us how difficult it would be to fix this issue.

Best regards,
Viktor Nosov 

-Original Message-
From: Ben Turner [mailto:btur...@redhat.com] 
Sent: Sunday, December 17, 2017 5:22 PM
To: Viktor Nosov
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Testing sharding on tiered volume

- Original Message -
> From: "Viktor Nosov" 
> To: gluster-users@gluster.org
> Cc: vno...@stonefly.com
> Sent: Friday, December 8, 2017 5:45:25 PM
> Subject: [Gluster-users] Testing sharding on tiered volume
> 
> Hi,
> 
> I'm looking to use sharding on a tiered volume. This is a very attractive
> feature that could benefit tiered volumes by letting them handle larger
> files without hitting the "out of (hot) space" problem.
> I set up a test configuration on GlusterFS 3.12.3 where the tiered volume
> has a 2TB cold segment and a 1GB hot segment. The shard size is set to 16MB.
> 100GB files are used for testing. Writes and reads seem to be going well,
> but I hit a problem trying to delete files from the volume: one of the
> GlusterFS processes hits a segmentation fault.
> The problem is reproducible every time. It was submitted to the Red Hat
> Bugzilla bug list as ID 1521119.
> You can find details in the attachments to the bug.
> 
> I'm wondering whether there are other users who are interested in applying
> sharding to tiered volumes and have experienced similar problems?
> How can this problem be resolved, or could it be avoided?

This isn't a config I have tried before. From the BZ it mentions (a rough
sketch of a comparable setup follows the list):

- The VOL is shared out over SMB to a Windows client
- You have a 1GB hot tier and a 2099GB cold tier
- You have features.shard-block-size: 16MB and cluster.tier-demote-frequency: 150
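
For anyone wanting to reproduce a similar layout, the options above would be
set roughly like this (a sketch only; the volume name, hosts and brick paths
are placeholders, and the tiering syntax is the 3.12-era CLI, so please
verify it against your installed version):

  gluster volume create tiervol replica 2 h1:/bricks/cold1 h2:/bricks/cold2
  gluster volume start tiervol
  gluster volume set tiervol features.shard on
  gluster volume set tiervol features.shard-block-size 16MB
  gluster volume set tiervol cluster.tier-demote-frequency 150

  # Attach a small hot tier (e.g. an SSD-backed brick pair)
  gluster volume tier tiervol attach replica 2 h1:/bricks/hot1 h2:/bricks/hot2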

What are you using for the hot tier that has only 1GB? Some sort of RAM disk
or battery-backed flash?

With a hot tier that small you may run into some strange performance
characteristics. AFAIK the current tiering implementation uses rebalance to
move files between tiers when the tier-demote frequency times out. You may end
up spending a lot of time waiting for your hot files to rebalance to the cold
tier since the hot tier is out of space, and you will probably also have other
files being written to the cold tier while the hot tier is full, further using
up your IOPS.

I don't know how tiering would treat sharded files: would it only promote the
shards of the file that are in use, or would it try to put the whole file /
all the shards on the hot tier?

If you get a free minute, update me on what you are trying to do; happy to
help however I can.

-b


> 
> Best regards,
> 
>  Viktor Nosov
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] How to make sure self-heal backlog is empty ?

2017-12-19 Thread Hoggins!
Hello list,

I'm not sure what to look for here. I'm not sure whether what I'm seeing is
the actual "backlog" (the one we need to make sure is empty during a rolling
upgrade before moving on to the next node). While reading this, how can I
tell whether it's okay to reboot / upgrade my next node in the pool?
Here is what I do for checking:

for i in `gluster volume list`; do gluster volume heal $i info; done

And here is what I get :

Brick ngluster-1.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/mailer

Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/rom

Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/thedude

Status: Connected
Number of entries: 1

Brick ngluster-3.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/web



Status: Connected
Number of entries: 3

Brick ngluster-3.network.hoggins.fr:/export/brick/web











Status: Connected
Number of entries: 11


Should I be worried that this never ends?

    Thank you,

        Hoggins!



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12

2017-12-19 Thread Atin Mukherjee
On Tue, Dec 19, 2017 at 1:10 AM, Ziemowit Pierzycki 
wrote:

> Hi,
>
> I have a cluster of 10 servers all running Fedora 24 along with
> Gluster 3.8.  I'm planning on doing rolling upgrades to Fedora 27 with
> Gluster 3.12.  I saw the documentation and did some testing but I
> would like to run my plan through some (more?) educated minds.
>
> The current setup is:
>
> Volume Name: vol0
> Distributed-Replicate
> Number of Bricks: 2 x (2 + 1) = 6
> Bricks:
> Brick1: glt01:/vol/vol0
> Brick2: glt02:/vol/vol0
> Brick3: glt05:/vol/vol0 (arbiter)
> Brick4: glt03:/vol/vol0
> Brick5: glt04:/vol/vol0
> Brick6: glt06:/vol/vol0 (arbiter)
>
> Volume Name: vol1
> Distributed-Replicate
> Number of Bricks: 2 x (2 + 1) = 6
> Bricks:
> Brick1: glt07:/vol/vol1
> Brick2: glt08:/vol/vol1
> Brick3: glt05:/vol/vol1 (arbiter)
> Brick4: glt09:/vol/vol1
> Brick5: glt10:/vol/vol1
> Brick6: glt06:/vol/vol1 (arbiter)
>
> After performing the upgrade because of differences in checksums, the
> upgraded nodes will become:
>
> State: Peer Rejected (Connected)
>

Have you upgraded all the nodes? If yes, have you bumped up the
cluster.op-version after upgrading all the nodes? Please follow
http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more
details on how to bump up the cluster.op-version. In case you have done all
of these and you're still seeing a checksum issue, then I'm afraid you have
hit a bug. I'd need further details like the checksum mismatch error from the
glusterd.log file along with the exact volume's info file from
/var/lib/glusterd/vols//info on both of the peers to debug this
further.
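
As a concrete starting point, the op-version check/bump and the info-file
comparison might look like this (a minimal sketch; the op-version number and
volume name are placeholders, and "volume get all" for these globals needs a
3.10-or-newer CLI):

  # Current and maximum supported cluster op-version
  gluster volume get all cluster.op-version
  gluster volume get all cluster.max-op-version

  # Bump it once every node runs the new bits (use the value reported above)
  gluster volume set all cluster.op-version 31202

  # Compare the stored info file of a mismatching volume between two peers
  diff /var/lib/glusterd/vols/VOLNAME/info \
       <(ssh peer2 cat /var/lib/glusterd/vols/VOLNAME/info)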


> If I start doing the upgrades one at a time, with nodes glt10 to glt01
> except for the arbiters glt05 and glt06, and then upgrading the
> arbiters last, everything should remain online at all times through
> the process.  Correct?
>
> Thanks.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [heketi-devel] Heketi v5.0.1 security release available for download

2017-12-19 Thread Niels de Vos
On Mon, Dec 18, 2017 at 06:10:29PM +0100, Michael Adam wrote:
> 
> Heketi v5.0.1 is now available.

Packages for the CentOS Storage SIG are now becoming available in the
testing repository. Packages can be obtained (soon) with the following
command:

  # yum --enablerepo=centos-gluster*-test update heketi

The update will show up for systems that have the repository files from
the centos-release-gluster{310,312,313} packages. Other repositories
will not receive any updates anymore.

I'd appreciate it if someone could do basic testing of the update. When
some feedback is provided, the package can be marked for release to the
CentOS mirrors.
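
For anyone picking this up, a possible quick smoke test after updating (the
port, service name and endpoint reflect a default install and are assumptions
on my part; add --user/--secret to heketi-cli if authentication is enabled):

  # confirm the new package and restart the service
  rpm -q heketi heketi-client
  systemctl restart heketi

  # the server should answer its hello endpoint and list existing clusters
  curl -s http://localhost:8080/hello
  heketi-cli --server http://localhost:8080 cluster list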

Niels


> This release[1] fixes a flaw that was found in the heketi API that
> permits issuing OS commands through specially crafted requests,
> possibly leading to escalation of privileges. More details can be
> obtained from CVE-2017-15103. [2]
> 
> If authentication is turned "on" in the heketi configuration, the
> flaw can be exploited only by those who possess the authentication
> key. In case you have a deployment with authentication not set to
> true, we recommend that you turn it on and also upgrade to the
> version with the fix.
> 
> 
> We thank Markus Krell of NTT Security for identifying
> the vulnerability and notifying us about it.
> 
> The fix was provided by Raghavendra Talur of Red Hat.
> 
> 
> Note that previous versions of Heketi are discontinued
> and users are strongly recommended to upgrade to Heketi 5.0.1.
> 
> 
> Michael Adam on behalf of the Heketi team
> 
> 
> [1] https://github.com/heketi/heketi/releases/tag/v5.0.1
> [2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=2017-15103



> ___
> heketi-devel mailing list
> heketi-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/heketi-devel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users