Hi Boris,

Thank you for providing the logs.

The problem here is caused by the "auth.allow: 127.0.0.1" setting on the volume. When you add a new brick, the replication module internally creates a temporary mount of the volume and uses it to set some metadata on the existing bricks, marking the pending heal towards the new brick. Because of the auth.allow setting, that temporary mount gets permission errors, as seen in the logs below, and the add-brick fails.
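(As a rough illustration of the metadata involved: the command below is only a sketch, and the exact trusted.afr.* attribute names depend on the volume and client index, so your output may differ. You can inspect the replication changelog attributes on an existing brick root like this:)

    # Run on a host that carries an existing brick of the volume; shows the
    # AFR changelog xattrs that the temporary mount updates to mark pending
    # heal for the newly added brick. Illustrative only.
    sudo getfattr -d -m trusted.afr -e hex /data/gluster/dockervols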
From data-gluster-dockervols.log-webserver9:

[2019-04-15 14:00:34.226838] I [addr.c:55:compare_addr_and_update] 0-/data/gluster/dockervols: allowed = "127.0.0.1", received addr = "192.168.200.147"
[2019-04-15 14:00:34.226895] E [MSGID: 115004] [authenticate.c:224:gf_authenticate] 0-auth: no authentication module is interested in accepting remote-client (null)
[2019-04-15 14:00:34.227129] E [MSGID: 115001] [server-handshake.c:848:server_setvolume] 0-dockervols-server: Cannot authenticate client from webserver8.cast.org-55674-2019/04/15-14:00:20:495333-dockervols-client-2-0-0 3.12.2 [Permission denied]

From dockervols-add-brick-mount.log:

[2019-04-15 14:00:20.672033] W [MSGID: 114043] [client-handshake.c:1109:client_setvolume_cbk] 0-dockervols-client-2: failed to set the volume [Permission denied]
[2019-04-15 14:00:20.672102] W [MSGID: 114007] [client-handshake.c:1138:client_setvolume_cbk] 0-dockervols-client-2: failed to get 'process-uuid' from reply dict [Invalid argument]
[2019-04-15 14:00:20.672129] E [MSGID: 114044] [client-handshake.c:1144:client_setvolume_cbk] 0-dockervols-client-2: SETVOLUME on remote-host failed: Authentication failed [Permission denied]
[2019-04-15 14:00:20.672151] I [MSGID: 114049] [client-handshake.c:1258:client_setvolume_cbk] 0-dockervols-client-2: sending AUTH_FAILED event

This is a known issue and we are planning to fix it. For the time being there is a workaround (a concrete command sketch follows below):

- Before adding the brick, set the auth.allow option back to its default, i.e. "*". You can do this by running "gluster v reset <volname> auth.allow".
- Add the brick.
- After it succeeds, set the auth.allow option back to the previous value.
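For example, with the volume and brick from this thread the workaround would look something like this (only a sketch; please double check the current auth.allow value on your setup before restoring it):

    # 1. Temporarily reset auth.allow to its default ("*")
    sudo gluster volume reset dockervols auth.allow

    # 2. Add the new brick, increasing the replica count
    sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force

    # 3. Once the add-brick succeeds, restore the earlier restriction
    sudo gluster volume set dockervols auth.allow 127.0.0.1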
Regards,
Karthik

On Tue, Apr 16, 2019 at 5:20 PM Boris Goldowsky <bgoldow...@cast.org> wrote:

> OK, log files attached.
>
> Boris
>
> *From: *Karthik Subrahmanya <ksubr...@redhat.com>
> *Date: *Tuesday, April 16, 2019 at 2:52 AM
> *To: *Atin Mukherjee <atin.mukherje...@gmail.com>, Boris Goldowsky <bgoldow...@cast.org>
> *Cc: *Gluster-users <gluster-users@gluster.org>
> *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick
>
> On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee <atin.mukherje...@gmail.com> wrote:
>
> +Karthik Subrahmanya <ksubr...@redhat.com>
>
> Didn't we fix this problem recently? "Failed to set extended attribute" indicates that the temp mount is failing and we don't have a quorum number of bricks up.
>
> We had two fixes which handle two kinds of add-brick scenarios:
>
> [1] Fails add-brick when increasing the replica count if any of the bricks is down, to avoid data loss. This can be overridden by using the force option.
>
> [2] Allows add-brick to set the extended attributes through the temp mount if the volume is already mounted (has clients).
>
> They are on version 3.12.2, so patch [1] is present there. But since they are using the force option it should not be a problem even if a brick is down. The error message they are getting is also different, so I guess it is not because of any brick being down.
>
> Patch [2] is not present in 3.12.2, but this is not a conversion from a plain distribute to a replicate volume, so that scenario is different here.
>
> It seems like they are hitting some other issue.
>
> @Boris,
>
> Can you attach the add-brick's temp mount log? The file name should look something like "dockervols-add-brick-mount.log". Can you also provide all the brick logs of that volume from around that time?
>
> [1] https://review.gluster.org/#/c/glusterfs/+/16330/
> [2] https://review.gluster.org/#/c/glusterfs/+/21791/
>
> Regards,
> Karthik
>
> Boris - what gluster version are you using?
>
> On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky <bgoldow...@cast.org> wrote:
>
> Atin, thank you for the reply. Here are all of those pieces of information:
>
> [bgoldowsky@webserver9 ~]$ gluster --version
> glusterfs 3.12.2
> (same on all nodes)
>
> [bgoldowsky@webserver9 ~]$ sudo gluster peer status
> Number of Peers: 3
>
> Hostname: webserver11.cast.org
> Uuid: c2b147fd-cab4-4859-9922-db5730f8549d
> State: Peer in Cluster (Connected)
>
> Hostname: webserver1.cast.org
> Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c
> State: Peer in Cluster (Connected)
> Other names:
> 192.168.200.131
> webserver1
>
> Hostname: webserver8.cast.org
> Uuid: be2f568b-61c5-4016-9264-083e4e6453a2
> State: Peer in Cluster (Connected)
> Other names:
> webserver8
>
> [bgoldowsky@webserver1 ~]$ sudo gluster v info
>
> Volume Name: dockervols
> Type: Replicate
> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/dockervols
> Brick2: webserver11:/data/gluster/dockervols
> Brick3: webserver9:/data/gluster/dockervols
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> auth.allow: 127.0.0.1
>
> Volume Name: testvol
> Type: Replicate
> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/testvol
> Brick2: webserver9:/data/gluster/testvol
> Brick3: webserver11:/data/gluster/testvol
> Brick4: webserver8:/data/gluster/testvol
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
>
> [bgoldowsky@webserver8 ~]$ sudo gluster v info
>
> Volume Name: dockervols
> Type: Replicate
> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/dockervols
> Brick2: webserver11:/data/gluster/dockervols
> Brick3: webserver9:/data/gluster/dockervols
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> auth.allow: 127.0.0.1
>
> Volume Name: testvol
> Type: Replicate
> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/testvol
> Brick2: webserver9:/data/gluster/testvol
> Brick3: webserver11:/data/gluster/testvol
> Brick4: webserver8:/data/gluster/testvol
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
>
> [bgoldowsky@webserver9 ~]$ sudo gluster v info
>
> Volume Name: dockervols
> Type: Replicate
> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/dockervols
> Brick2: webserver11:/data/gluster/dockervols
> Brick3: webserver9:/data/gluster/dockervols
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> auth.allow: 127.0.0.1
>
> Volume Name: testvol
> Type: Replicate
> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/testvol
> Brick2: webserver9:/data/gluster/testvol
> Brick3: webserver11:/data/gluster/testvol
> Brick4: webserver8:/data/gluster/testvol
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
>
> [bgoldowsky@webserver11 ~]$ sudo gluster v info
>
> Volume Name: dockervols
> Type: Replicate
> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/dockervols
> Brick2: webserver11:/data/gluster/dockervols
> Brick3: webserver9:/data/gluster/dockervols
> Options Reconfigured:
> auth.allow: 127.0.0.1
> transport.address-family: inet
> nfs.disable: on
>
> Volume Name: testvol
> Type: Replicate
> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: webserver1:/data/gluster/testvol
> Brick2: webserver9:/data/gluster/testvol
> Brick3: webserver11:/data/gluster/testvol
> Brick4: webserver8:/data/gluster/testvol
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
>
> [bgoldowsky@webserver9 ~]$ sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force
> volume add-brick: failed: Commit failed on webserver8.cast.org. Please check log file for details.
>
> Webserver8 glusterd.log:
>
> [2019-04-15 13:55:42.338197] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
> The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] and [2019-04-15 13:55:42.341618]
> [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7fe697764215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe6a2d16ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
> [2019-04-15 14:00:20.445148] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4
> [2019-04-15 14:00:20.445184] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
> [2019-04-15 14:00:20.672347] E [MSGID: 106054] [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
> [2019-04-15 14:00:20.693491] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq [Transport endpoint is not connected]
> [2019-04-15 14:00:20.693597] E [MSGID: 106074] [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
> [2019-04-15 14:00:20.693637] E [MSGID: 106123] [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
> [2019-04-15 14:00:20.693667] E [MSGID: 106123] [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick
>
> Webserver11 log file:
>
> [2019-04-15 13:56:29.563270] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
> The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] and [2019-04-15 13:56:29.566209]
> [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7f36de924215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f36e9ed6ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
> [2019-04-15 14:00:33.996979] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4
> [2019-04-15 14:00:33.997004] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
> [2019-04-15 14:00:34.013789] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already stopped
> [2019-04-15 14:00:34.013849] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped
> [2019-04-15 14:00:34.017535] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 6087
> [2019-04-15 14:00:35.018783] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd service is stopped
> [2019-04-15 14:00:35.018952] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting glustershd service
> [2019-04-15 14:00:35.028306] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped
> [2019-04-15 14:00:35.028408] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is stopped
> [2019-04-15 14:00:35.028601] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already stopped
> [2019-04-15 14:00:35.028645] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is stopped
>
> Thank you for taking a look!
>
> Boris
>
> *From: *Atin Mukherjee <atin.mukherje...@gmail.com>
> *Date: *Friday, April 12, 2019 at 1:10 PM
> *To: *Boris Goldowsky <bgoldow...@cast.org>
> *Cc: *Gluster-users <gluster-users@gluster.org>
> *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick
>
> On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky <bgoldow...@cast.org> wrote:
>
> I’ve got a replicated volume with three bricks (“1x3=3”), the idea is to have a common set of files that are locally available on all the machines (Scientific Linux 7, which is essentially CentOS 7) in a cluster.
> I tried to add on a fourth machine, so used a command like this:
>
> sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force
>
> but the result is:
>
> volume add-brick: failed: Commit failed on webserver1. Please check log file for details.
> Commit failed on webserver8. Please check log file for details.
> Commit failed on webserver11. Please check log file for details.
>
> Tried: removing the new brick (this also fails) and trying again.
> Tried: checking the logs. The log files are not enlightening to me – I don’t know what’s normal and what’s not.
>
> From webserver8 & webserver11 could you attach glusterd log files?
>
> Also please share the following:
> - gluster version? (gluster --version)
> - Output of ‘gluster peer status’
> - Output of ‘gluster v info’ from all 4 nodes.
>
> Tried: deleting the brick directory from previous attempt, so that it’s not in the way.
> Tried: restarting gluster services
> Tried: rebooting
> Tried: setting up a new volume, replicated to all four machines. This works, so I’m assuming it’s not a networking issue. But still fails with this existing volume that has the critical data in it.
>
> Running out of ideas. Any suggestions? Thank you!
>
> Boris
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> --Atin
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users