Hi Ravi,

I'd already done exactly that before, with step 3 being a simple 'rm -rf /nodirectwritedata/gluster/gvol0'. Do you have another suggestion for what the cleanup or reformat should be?
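
Just so I understand what a fuller cleanup would look like: is something like the following what you mean? This is only a sketch on my part, assuming the goal is to keep the brick directory itself but strip all leftover gluster metadata (the xattr names are the ones my gfs3 brick actually shows in the getfattr output below):

# setfattr -x trusted.glusterfs.volume-id /nodirectwritedata/gluster/gvol0
# setfattr -x trusted.afr.dirty /nodirectwritedata/gluster/gvol0
# rm -rf /nodirectwritedata/gluster/gvol0/.glusterfs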
Thank you.

On Wed, 22 May 2019 at 13:56, Ravishankar N <ravishan...@redhat.com> wrote:

> Hmm, so the volume info seems to indicate that the add-brick was successful, but the gfid xattr is missing on the new brick (as are the actual files, barring the .glusterfs folder, according to your previous mail).
>
> Do you want to try removing and adding it again?
>
> 1. `gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1
>
> 2. Check that gluster volume info is now back to a 1x2 volume on all nodes and `gluster peer status` is connected on all nodes.
>
> 3. Clean up or reformat '/nodirectwritedata/gluster/gvol0' on gfs3.
>
> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.
>
> 5. Check that the files are getting healed onto the new brick.
>
> Thanks,
> Ravi
>
> On 22/05/19 6:50 AM, David Cunningham wrote:
>
> Hi Ravi,
>
> Certainly. On the existing two nodes:
>
> gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
> getfattr: Removing leading '/' from absolute path names
> # file: nodirectwritedata/gluster/gvol0
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.gvol0-client-2=0x000000000000000000000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>
> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
> getfattr: Removing leading '/' from absolute path names
> # file: nodirectwritedata/gluster/gvol0
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.gvol0-client-0=0x000000000000000000000000
> trusted.afr.gvol0-client-2=0x000000000000000000000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>
> On the new node:
>
> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
> getfattr: Removing leading '/' from absolute path names
> # file: nodirectwritedata/gluster/gvol0
> trusted.afr.dirty=0x000000000000000000000001
> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>
> Output of "gluster volume info" is the same on all 3 nodes and is:
>
> # gluster volume info
>
> Volume Name: gvol0
> Type: Replicate
> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gfs1:/nodirectwritedata/gluster/gvol0
> Brick2: gfs2:/nodirectwritedata/gluster/gvol0
> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
> Options Reconfigured:
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
>
> On Wed, 22 May 2019 at 12:43, Ravishankar N <ravishan...@redhat.com> wrote:
>
>> Hi David,
>> Could you provide the `getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0` output of all bricks and the output of `gluster volume info`?
>>
>> Thanks,
>> Ravi
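
One note on verification from my side: once the add-brick succeeds, I assume the gfs3 brick should pick up the same trusted.gfid and trusted.glusterfs.dht xattrs that gfs1 and gfs2 show above. So after step 4 my plan (a sketch of my own checks, not part of Ravi's steps) would be to re-run the getfattr command and watch the heal for step 5 until the entry counts drop to zero:

# getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
# gluster volume heal gvol0 info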
>> On 22/05/19 4:57 AM, David Cunningham wrote:
>>
>> Hi Sanju,
>>
>> Here's what glusterd.log says on the new arbiter server when trying to add the node:
>>
>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) [0x7fe4ca9102cd] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe4d5ecc955] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=gvol0 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: replica-count is set 3
>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: arbiter-count is set 1
>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick
>>
>> As before, gvol0-add-brick-mount.log said:
>>
>> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
>> [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
>> [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
>> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
>> [2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
>> [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
>> [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntYGNbj9
>> [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: received signum (15), shutting down
>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntYGNbj9'.
>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'.
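
Both logs above point at the same thing: the temporary FUSE mount that glusterd creates (here /tmp/mntYGNbj9) in order to set the trusted.add-brick xattr cannot reach the volume ("Transport endpoint is not connected"). Before retrying, I assume it's worth confirming from gfs3 that the existing bricks are reachable. A quick sketch, assuming the default glusterd management port 24007 and that nc is installed; the per-brick ports are listed by volume status:

# gluster volume status gvol0
# nc -zv gfs1 24007
# nc -zv gfs2 24007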
>>
>> Here are the processes running on the new arbiter server:
>>
>> # ps -ef | grep gluster
>> root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
>> root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>> root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs
>>
>> Here are the files created on the new arbiter server:
>>
>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
>> drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0
>> drw------- 2 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0/.glusterfs
>>
>> Thank you for your help!
>>
>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde <srako...@redhat.com> wrote:
>>
>>> David,
>>>
>>> Can you please attach the glusterd logs? As the error message says, the commit failed on the arbiter node, so we might be able to find some issue on that node.
>>>
>>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran <nbala...@redhat.com> wrote:
>>>
>>>> On Fri, 17 May 2019 at 06:01, David Cunningham <dcunning...@voisonics.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below.
>>>>>
>>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance!
>>>>>
>>>>> On existing node gfs1, trying to add new arbiter node gfs3:
>>>>>
>>>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
>>>>> volume add-brick: failed: Commit failed on gfs3. Please check log file for details.
>>>>
>>>> This looks like a glusterd issue. Please check the glusterd logs for more info.
>>>> Adding the glusterd dev to this thread. Sanju, can you take a look?
>>>>
>>>> Regards,
>>>> Nithya
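
For reference, the excerpts earlier in this thread came from the default log locations on gfs3. I assume these are the files Sanju wants attached (a sketch, assuming a stock install logging to /var/log/glusterfs):

# tail -n 100 /var/log/glusterfs/glusterd.log
# tail -n 100 /var/log/glusterfs/gvol0-add-brick-mount.log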
>>>>>
>>>>> On new node gfs3 in gvol0-add-brick-mount.log:
>>>>>
>>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
>>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
>>>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
>>>>> [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
>>>>> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
>>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f
>>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: received signum (15), shutting down
>>>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntQAtu3f'.
>>>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntQAtu3f'.
>>>>>
>>>>> Processes running on new node gfs3:
>>>>>
>>>>> # ps -ef | grep gluster
>>>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>>>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
>>>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster
>>>>>
>>>>> --
>>>>> David Cunningham, Voisonics Limited
>>>>> http://voisonics.com/
>>>>> USA: +1 213 221 1092
>>>>> New Zealand: +64 (0)28 2558 3782
>>>
>>> --
>>> Thanks,
>>> Sanju
>>
>> --
>> David Cunningham, Voisonics Limited
>> http://voisonics.com/
>> USA: +1 213 221 1092
>> New Zealand: +64 (0)28 2558 3782
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users