Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12
Yes Atin. I'll take a look.

On Wed, Dec 20, 2017 at 11:28 AM, Atin Mukherjee wrote:
> Looks like a bug, as I see tier-enabled = 0 is an additional entry in the
> info file on shchhv01. As per the code, this field should be written into
> the glusterd store if the op-version is >= 30706. My guess is that since
> we didn't have commit 33f8703a1 "glusterd: regenerate volfiles on
> op-version bump up" in 3.8.4, the info and volfiles were not regenerated
> while bumping up the op-version, which caused the tier-enabled entry to
> be missing in the info file.
>
> For now, you can copy the info file for the volumes where the mismatch
> happened from shchhv01 to shchhv02 and restart the glusterd service on
> shchhv02. That should fix this up temporarily. Unfortunately, this step
> might need to be repeated for other nodes as well.
>
> @Hari - Could you help in debugging this further?
>
> [Quoted message from Gustave Dahl trimmed; his full message appears later
> in this digest.]
Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12
Looks like a bug, as I see tier-enabled = 0 is an additional entry in the info file on shchhv01. As per the code, this field should be written into the glusterd store if the op-version is >= 30706. My guess is that since we didn't have commit 33f8703a1 "glusterd: regenerate volfiles on op-version bump up" in 3.8.4, the info and volfiles were not regenerated while bumping up the op-version, which caused the tier-enabled entry to be missing in the info file.

For now, you can copy the info file for the volumes where the mismatch happened from shchhv01 to shchhv02 and restart the glusterd service on shchhv02. That should fix this up temporarily. Unfortunately, this step might need to be repeated for other nodes as well.

@Hari - Could you help in debugging this further?

On Wed, Dec 20, 2017 at 10:44 AM, Gustave Dahl wrote:
> I was attempting the same on a local sandbox and also have the same
> problem.
>
> [Quoted volume info and checksum-mismatch logs trimmed; Gustave's full
> message appears later in this digest.]
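Before applying Atin's workaround (copying the info file and restarting glusterd), the mismatch can be confirmed by diffing the two peers' copies of the volume's info file. A minimal sketch with invented file contents for illustration; real info files live under /var/lib/glusterd/vols/ on each peer and contain many more fields:

```shell
# Hypothetical sketch: two peers' copies of a volume's "info" file, one
# carrying the extra tier-enabled entry described above (sample content,
# not a real glusterd store dump).
cat > info.shchhv01 <<'EOF'
type=7
count=12
status=1
tier-enabled=0
EOF

cat > info.shchhv02 <<'EOF'
type=7
count=12
status=1
EOF

# Any line printed here is a field one peer has and the other lacks,
# which is enough to make the volume checksums disagree.
diff info.shchhv01 info.shchhv02 || true
```

If the only difference is the tier-enabled line, copying the file from the node that has it and restarting glusterd on the other node, as suggested above, should bring the checksums back in sync.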
Re: [Gluster-users] How to make sure self-heal backlog is empty ?
Hi,

To debug the issue, can you provide the following for the volumes that are showing pending entries?
- volume info
- the shd log
- the mount log

Thanks & Regards,
Karthik

On Wed, Dec 20, 2017 at 3:11 AM, Matt Waymack wrote:
> Mine also has a list of files that seemingly never heal. They are usually
> isolated on my arbiter bricks, but not always. I would also like to find
> an answer for this behavior.
>
> [Quoted message from Hoggins! trimmed; his full message appears later in
> this digest.]

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12
I was attempting the same on a local sandbox and also have the same problem.

Current: 3.8.4

Volume Name: shchst01
Type: Distributed-Replicate
Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: shchhv01-sto:/data/brick3/shchst01
Brick2: shchhv02-sto:/data/brick3/shchst01
Brick3: shchhv03-sto:/data/brick3/shchst01
Brick4: shchhv01-sto:/data/brick1/shchst01
Brick5: shchhv02-sto:/data/brick1/shchst01
Brick6: shchhv03-sto:/data/brick1/shchst01
Brick7: shchhv02-sto:/data/brick2/shchst01
Brick8: shchhv03-sto:/data/brick2/shchst01
Brick9: shchhv04-sto:/data/brick2/shchst01
Brick10: shchhv02-sto:/data/brick4/shchst01
Brick11: shchhv03-sto:/data/brick4/shchst01
Brick12: shchhv04-sto:/data/brick4/shchst01
Options Reconfigured:
cluster.data-self-heal-algorithm: full
features.shard-block-size: 512MB
features.shard: enable
performance.readdir-ahead: on
storage.owner-uid: 9869
storage.owner-gid: 9869
server.allow-insecure: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.self-heal-daemon: on
nfs.disable: on
performance.io-thread-count: 64
performance.cache-size: 1GB

Upgraded shchhv01-sto to 3.12.3; others remain at 3.8.4.

RESULT
======
Hostname: shchhv01-sto
Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
State: Peer Rejected (Connected)

Upgraded Server: shchhv01-sto
=============================
[2017-12-20 05:02:44.747313] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-12-20 05:02:44.747387] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:54.676324] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto, port: 0
[2017-12-20 05:02:54.690237] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:54.695823] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
[2017-12-20 05:02:54.696956] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum = 2747317484 on peer shchhv02-sto
[2017-12-20 05:02:54.697796] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv02-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.033822] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto, port: 0
[2017-12-20 05:02:55.038460] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:55.040032] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
[2017-12-20 05:02:55.040266] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum = 2747317484 on peer shchhv03-sto
[2017-12-20 05:02:55.040405] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv03-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.584854] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5, host: shchhv04-sto, port: 0
[2017-12-20 05:02:55.595125] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:55.600804] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5
[2017-12-20 05:02:55.601288] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum = 2747317484 on peer shchhv04-sto
[2017-12-20 05:02:55.601497] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv04-sto (0), ret: 0, op_ret: -1
Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12
I have not done the upgrade yet. Since this is a production cluster, I need to make sure it stays up, or schedule some downtime if it doesn't.

Thanks.

On Tue, Dec 19, 2017 at 10:11 AM, Atin Mukherjee wrote:
> Have you upgraded all the nodes? If yes, have you bumped up the
> cluster.op-version after upgrading all the nodes? Please follow
> http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more
> details on how to bump up the cluster.op-version. In case you have done
> all of these and you're seeing a checksum issue, then I'm afraid you have
> hit a bug. I'd need further details, like the checksum mismatch error
> from the glusterd.log file along with the exact volume's info file from
> /var/lib/glusterd/vols//info on both of the peers, to debug this further.
>
> [Quoted message from Ziemowit Pierzycki trimmed; the full exchange
> appears later in this digest.]
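As background for the op-version bump Atin mentions: GlusterFS encodes release X.Y.Z as the integer X*10000 + Y*100 + Z, which matches the 30800 and 30706 values quoted in this thread. A small sketch of the conversion; the actual gluster commands are shown as comments since they must run against a live cluster:

```shell
# Convert a GlusterFS release string to its op-version integer,
# e.g. 3.12.0 -> 31200 (same encoding as the 30800 / 30706 values
# seen in the logs above).
to_opversion() {
  IFS=. read -r major minor patch <<EOF
$1
EOF
  echo $((major * 10000 + minor * 100 + patch))
}

to_opversion 3.12.0    # prints 31200

# After ALL peers run the upgraded binaries, bump the cluster-wide
# value (run once, on any node):
#   gluster volume get all cluster.op-version
#   gluster volume set all cluster.op-version 31200
```

Bumping the op-version before every node is upgraded, or not bumping it at all, are both known ways to end up with mismatched volfiles like the checksum errors discussed here.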
Re: [Gluster-users] How to make sure self-heal backlog is empty ?
Mine also has a list of files that seemingly never heal. They are usually isolated on my arbiter bricks, but not always. I would also like to find an answer for this behavior.

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hoggins!
Sent: Tuesday, December 19, 2017 12:26 PM
To: gluster-users
Subject: [Gluster-users] How to make sure self-heal backlog is empty ?

[Quoted message trimmed; Hoggins!'s full message appears later in this digest.]
[Gluster-users] Wrong volume size with df
I have a glusterfs setup with distributed disperse volumes, 5 x (4 + 2). After a server crash, "gluster peer status" reports all peers as connected. "gluster volume status detail" shows that all bricks are up and running with the right size, but when I use df from a client mount point, the size displayed is about 1/6 of the total size. When browsing the data, it seems to be OK, though. I need some help to understand what's going on, as I can't delete the volume and recreate it from scratch.
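As a sanity check on what df ought to report: in a distributed-disperse 5 x (4 + 2) layout only 4 of every 6 bricks in a subvolume hold data, so the expected client-visible size is 5 x 4 x brick-size. A back-of-envelope sketch; the 2 TB brick size is an assumed example, not taken from the report above:

```shell
# Assumed example brick size; substitute the real brick capacity.
BRICK_TB=2
SUBVOLS=5        # 5 distribute subvolumes
DATA=4           # 4 data bricks per (4 + 2) disperse set
TOTAL_BRICKS=$((SUBVOLS * (DATA + 2)))      # raw brick count
EXPECTED=$((SUBVOLS * DATA * BRICK_TB))     # client-visible capacity

echo "raw: $((TOTAL_BRICKS * BRICK_TB)) TB, expected via df: ${EXPECTED} TB"
```

If df shows only around one subvolume's worth of space (4 bricks instead of 20), one guess, not a confirmed diagnosis, is that the client graph is not counting all subvolumes after the crash; comparing the client volfile against `gluster volume status` output would be one way to narrow that down.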
Re: [Gluster-users] Testing sharding on tiered volume
Ben,

For this set of tests we are using bricks provisioned on RAID storage. We are not trying to test performance of the tiered volume right now. The goal is to find a solution for handling large files that do not fit into the hot tier. You are correct that there are a lot of promotions and demotions of shards when we are reading or writing a large file. But at least sharding lets us do what a pure tiered volume rejects. In our experiments, the tiered volume puts into the hot tier only the file shards that are accessed; all other shards of the file stay on the cold tier. Access to shards is managed by the shard translator. After hitting the problem while deleting files, we stopped testing. We hope that somebody with knowledge of the sharded and tiered volume implementations can tell us how difficult it would be to fix this issue.

Best regards,

Viktor Nosov

-----Original Message-----
From: Ben Turner [mailto:btur...@redhat.com]
Sent: Sunday, December 17, 2017 5:22 PM
To: Viktor Nosov
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Testing sharding on tiered volume

----- Original Message -----
> From: "Viktor Nosov"
> To: gluster-users@gluster.org
> Cc: vno...@stonefly.com
> Sent: Friday, December 8, 2017 5:45:25 PM
> Subject: [Gluster-users] Testing sharding on tiered volume
>
> Hi,
>
> I'm looking to use sharding on a tiered volume. This is a very attractive
> feature that could benefit a tiered volume by letting it handle larger
> files without hitting the "out of (hot) space" problem.
> I decided to set up a test configuration on GlusterFS 3.12.3 where the
> tiered volume has 2TB cold and 1GB hot segments. The shard size is set
> to 16MB. For testing, 100GB files are used. It seems writes and reads
> are going well. But I hit a problem trying to delete files from the
> volume: one of the GlusterFS processes hits a segmentation fault.
> The problem is reproducible each time. It was submitted to the Red Hat
> Bugzilla bug list and has ID 1521119.
> You can find details in the attachments to the bug.
>
> I'm wondering, are there other users who are interested in applying
> sharding to tiered volumes and have experienced similar problems?
> How can this problem be resolved, or could it be avoided?

This isn't a config I have tried before. From the BZ it mentions:

- The VOL is shared out over SMB to a Windows client
- You have a 1GB hot tier, 2099GB cold tier
- You have features.shard-block-size: 16MB and cluster.tier-demote-frequency: 150

What are you using for the hot tier that has only 1GB, some sort of RAM disk or battery-backed flash? With that small a hot tier you may run into some strange performance characteristics. AFAIK the current tiering implementation uses rebalance to move files between tiers when the tier demote frequency times out. You may end up spending a lot of time waiting for your hot files to rebalance to the cold tier since it's out of space, and you will also probably have other files being written to the cold tier with the hot tier full, further using up your IOPs. I don't know how tiering would treat sharded files: would it only promote the shards of the file that are in use, or would it try to put the whole file / all the shards on the hot tier? If you get a free min, update me on what you are trying to do; happy to help however I can.

-b
[Gluster-users] How to make sure self-heal backlog is empty ?
Hello list,

I'm not sure what to look for here. I'm not sure if what I'm seeing is the actual "backlog" (which we need to make sure is empty while performing a rolling upgrade, before going to the next node). How can I tell, from reading this, whether it's okay to reboot / upgrade the next node in the pool? Here is what I do for checking:

for i in `gluster volume list`; do gluster volume heal $i info; done

And here is what I get:

Brick ngluster-1.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 1

Brick ngluster-3.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 3

Brick ngluster-3.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 11

Should I be worried that this never ends?

Thank you,

Hoggins!
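To make the loop above answer the question directly, the heal-info output can be filtered so that an empty result means an empty backlog. A sketch, run here against a sample of the output quoted above; on a live node you would pipe the real command output through the same awk filter:

```shell
# Sample of `gluster volume heal <vol> info` output, taken from the
# listing above (on a real cluster, pipe the command itself into awk).
sample='Brick ngluster-2.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 3
Brick ngluster-3.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0'

# Print only bricks whose pending-entry count is non-zero; no output
# at all means the backlog is empty for that volume.
printf '%s\n' "$sample" | awk '
  /^Brick /             { brick = $2 }
  /^Number of entries:/ { if ($4 > 0) print brick ": " $4 " pending" }
'
```

With this, the per-upgrade check becomes: run the loop for every volume and wait until the filter prints nothing before moving to the next node.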
Re: [Gluster-users] Upgrading from Gluster 3.8 to 3.12
On Tue, Dec 19, 2017 at 1:10 AM, Ziemowit Pierzycki wrote:
> Hi,
>
> I have a cluster of 10 servers all running Fedora 24 along with
> Gluster 3.8. I'm planning on doing rolling upgrades to Fedora 27 with
> Gluster 3.12. I saw the documentation and did some testing, but I
> would like to run my plan through some (more?) educated minds.
>
> The current setup is:
>
> Volume Name: vol0
> Distributed-Replicate
> Number of Bricks: 2 x (2 + 1) = 6
> Bricks:
> Brick1: glt01:/vol/vol0
> Brick2: glt02:/vol/vol0
> Brick3: glt05:/vol/vol0 (arbiter)
> Brick4: glt03:/vol/vol0
> Brick5: glt04:/vol/vol0
> Brick6: glt06:/vol/vol0 (arbiter)
>
> Volume Name: vol1
> Distributed-Replicate
> Number of Bricks: 2 x (2 + 1) = 6
> Bricks:
> Brick1: glt07:/vol/vol1
> Brick2: glt08:/vol/vol1
> Brick3: glt05:/vol/vol1 (arbiter)
> Brick4: glt09:/vol/vol1
> Brick5: glt10:/vol/vol1
> Brick6: glt06:/vol/vol1 (arbiter)
>
> After performing the upgrade, because of differences in checksums, the
> upgraded nodes will become:
>
> State: Peer Rejected (Connected)

Have you upgraded all the nodes? If yes, have you bumped up the cluster.op-version after upgrading all the nodes? Please follow http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more details on how to bump up the cluster.op-version. In case you have done all of these and you're seeing a checksum issue, then I'm afraid you have hit a bug. I'd need further details, like the checksum mismatch error from the glusterd.log file along with the exact volume's info file from /var/lib/glusterd/vols//info on both of the peers, to debug this further.

> If I start doing the upgrades one at a time, with nodes glt10 to glt01
> except for the arbiters glt05 and glt06, and then upgrading the
> arbiters last, everything should remain online at all times through
> the process. Correct?
>
> Thanks.
Re: [Gluster-users] [heketi-devel] Heketi v5.0.1 security release available for download
On Mon, Dec 18, 2017 at 06:10:29PM +0100, Michael Adam wrote:
> Heketi v5.0.1 is now available.

Packages for the CentOS Storage SIG are now becoming available in the testing repository. Packages can be obtained (soon) with the following steps:

# yum --enablerepo=centos-gluster*-test update heketi

The update will show up for systems that have the repository files from the centos-release-gluster{310,312,313} packages. Other repositories will not receive any updates anymore.

I'd appreciate it if someone could do basic testing of the update. When some feedback is provided, the package can be marked for release to the CentOS mirrors.

Niels

> This release [1] fixes a flaw that was found in the heketi API that
> permits issuing OS commands through specially crafted requests, possibly
> leading to escalation of privileges. More details can be obtained at
> CVE-2017-15103. [2]
>
> If authentication is turned "on" in the heketi configuration, the flaw
> can be exploited only by those who possess the authentication key. In
> case you have a deployment without authentication set to true, we
> recommend that you turn it on and also upgrade to the version with the
> fix.
>
> We thank Markus Krell of NTT Security for identifying the vulnerability
> and notifying us about it.
>
> The fix was provided by Raghavendra Talur of Red Hat.
>
> Note that previous versions of Heketi are discontinued and users are
> strongly recommended to upgrade to Heketi 5.0.1.
>
> Michael Adam on behalf of the Heketi team
>
> [1] https://github.com/heketi/heketi/releases/tag/v5.0.1
> [2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=2017-15103
>
> ___
> heketi-devel mailing list
> heketi-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/heketi-devel