Hi David,

On 23/05/19 3:54 AM, David Cunningham wrote:
Hi Ravi,

Please see the log attached.

When I grep -E "Connected to |disconnected from" gvol0-add-brick-mount.log, I don't see a "Connected to gvol0-client-1" line. It looks like this temporary mount is not able to connect to the 2nd brick, which is why the lookup is failing due to lack of quorum.
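
For comparison, a successful connection to that brick would show up in the same grep roughly like the line below (the exact message text can vary between releases, so treat this as an approximation):

[...] 0-gvol0-client-1: Connected to gvol0-client-1, attached to remote volume '/nodirectwritedata/gluster/gvol0'.
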
The output of "gluster volume status" is as follows. Should there be something listening on gfs3? I'm not sure whether TCP Port and Pid showing as N/A is a symptom or a cause. Thank you.

# gluster volume status
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7624
Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A       N/A        N       N/A

Can you see if the following steps help?

1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on *both* gfs1 and gfs2.

2. Run `gluster volume start gvol0 force`.

3. Check if Brick-3 now comes online with a valid TCP port and PID. If it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 to see why.
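
If it helps, here is the same sequence as commands (a sketch; run the setfattr on both gfs1 and gfs2, and note that the brick log file name in the last step is my assumption based on the usual convention of the brick path with '/' replaced by '-'):

# setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0
# gluster volume start gvol0 force
# gluster volume status gvol0
# less /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log

As I understand the AFR xattr layout, that value sets the metadata and entry pending counters to 1 for gvol0-client-2, which tells the self-heal daemon that the arbiter brick needs to be healed once it comes online.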

Thanks,

Ravi


Self-heal Daemon on localhost               N/A       N/A        Y       19853
Self-heal Daemon on gfs1                    N/A       N/A        Y       28600
Self-heal Daemon on gfs2                    N/A       N/A        Y       17614

Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks


On Wed, 22 May 2019 at 18:06, Ravishankar N <ravishan...@redhat.com> wrote:

    If you are trying this again, please run `gluster volume set $volname
    client-log-level DEBUG` before attempting the add-brick, and attach
    the gvol0-add-brick-mount.log here. After that, you can change the
    client-log-level back to INFO.
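
    Spelled out with this volume's names, that would be something like the
    following (a sketch; the add-brick command is the one from earlier in
    the thread, and I'm assuming the temporary mount log lands in the
    default /var/log/glusterfs directory on gfs3):

    # gluster volume set gvol0 client-log-level DEBUG
    # gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
    # gluster volume set gvol0 client-log-level INFO

    Collect /var/log/glusterfs/gvol0-add-brick-mount.log from gfs3 after
    the add-brick attempt.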

    -Ravi

    On 22/05/19 11:32 AM, Ravishankar N wrote:


    On 22/05/19 11:23 AM, David Cunningham wrote:
    Hi Ravi,

    I'd already done exactly that before, where step 3 was a simple
    'rm -rf /nodirectwritedata/gluster/gvol0'. Do you have another
    suggestion for what the cleanup or reformat should be?

    `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me,
    David. Basically, '/nodirectwritedata/gluster/gvol0' must be
    empty and must not have any extended attributes set on it. Why
    fuse_first_lookup() is failing is a bit of a mystery to me at
    this point. :-(
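
    A quick way to confirm both conditions on gfs3 before re-adding the
    brick (a sketch):

    # ls -la /nodirectwritedata/gluster/gvol0
    # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0

    The listing should show nothing besides '.' and '..', and getfattr
    should print no trusted.* attributes for the directory.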
    Regards,
    Ravi

    Thank you.


    On Wed, 22 May 2019 at 13:56, Ravishankar N <ravishan...@redhat.com> wrote:

        Hmm, so the volume info seems to indicate that the add-brick
        was successful but the gfid xattr is missing on the new
        brick (as are the actual files, barring the .glusterfs
        folder, according to your previous mail).

        Do you want to try removing and adding it again?

        1. `gluster volume remove-brick gvol0 replica 2
        gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1

        2. Check that `gluster volume info` is now back to a 1x2
        volume on all nodes and that `gluster peer status` shows all
        peers connected.

        3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on
        gfs3.

        4. `gluster volume add-brick gvol0 replica 3 arbiter 1
        gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.

        5. Check that the files are getting healed on to the new brick.
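
        If it is easier to copy-paste, the same sequence as commands (a
        sketch; host names and brick paths are the ones from this thread,
        and step 3 assumes the plain `rm -rf` cleanup discussed earlier,
        recreating the empty directory so add-brick gets a clean,
        xattr-free path):

        On gfs1:
        # gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force
        # gluster volume info gvol0
        # gluster peer status

        On gfs3:
        # rm -rf /nodirectwritedata/gluster/gvol0
        # mkdir -p /nodirectwritedata/gluster/gvol0

        On gfs1:
        # gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
        # gluster volume heal gvol0 info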

        Thanks,
        Ravi
        On 22/05/19 6:50 AM, David Cunningham wrote:
        Hi Ravi,

        Certainly. On the existing two nodes:

        gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
        getfattr: Removing leading '/' from absolute path names
        # file: nodirectwritedata/gluster/gvol0
        trusted.afr.dirty=0x000000000000000000000000
        trusted.afr.gvol0-client-2=0x000000000000000000000000
        trusted.gfid=0x00000000000000000000000000000001
        trusted.glusterfs.dht=0x000000010000000000000000ffffffff
        trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

        gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
        getfattr: Removing leading '/' from absolute path names
        # file: nodirectwritedata/gluster/gvol0
        trusted.afr.dirty=0x000000000000000000000000
        trusted.afr.gvol0-client-0=0x000000000000000000000000
        trusted.afr.gvol0-client-2=0x000000000000000000000000
        trusted.gfid=0x00000000000000000000000000000001
        trusted.glusterfs.dht=0x000000010000000000000000ffffffff
        trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

        On the new node:

        gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
        getfattr: Removing leading '/' from absolute path names
        # file: nodirectwritedata/gluster/gvol0
        trusted.afr.dirty=0x000000000000000000000001
        trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6

        Output of "gluster volume info" is the same on all 3 nodes
        and is:

        # gluster volume info

        Volume Name: gvol0
        Type: Replicate
        Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
        Status: Started
        Snapshot Count: 0
        Number of Bricks: 1 x (2 + 1) = 3
        Transport-type: tcp
        Bricks:
        Brick1: gfs1:/nodirectwritedata/gluster/gvol0
        Brick2: gfs2:/nodirectwritedata/gluster/gvol0
        Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
        Options Reconfigured:
        performance.client-io-threads: off
        nfs.disable: on
        transport.address-family: inet


        On Wed, 22 May 2019 at 12:43, Ravishankar N <ravishan...@redhat.com> wrote:

            Hi David,
            Could you provide the `getfattr -d -m. -e hex
            /nodirectwritedata/gluster/gvol0` output of all bricks
            and the output of `gluster volume info`?

            Thanks,
            Ravi
            On 22/05/19 4:57 AM, David Cunningham wrote:
            Hi Sanju,

            Here's what glusterd.log says on the new arbiter
            server when trying to add the node:

            [2019-05-22 00:15:05.963059] I [run.c:242:runner_log]
            (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd)
            [0x7fe4ca9102cd]
            -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85)
            [0x7fe4ca9bbb85]
            -->/lib64/libglusterfs.so.0(runner_log+0x115)
            [0x7fe4d5ecc955] ) 0-management: Ran script:
            /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh
            --volname=gvol0 --version=1 --volume-op=add-brick
            --gd-workdir=/var/lib/glusterd
            [2019-05-22 00:15:05.963177] I [MSGID: 106578]
            [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks]
            0-management: replica-count is set 3
            [2019-05-22 00:15:05.963228] I [MSGID: 106578]
            [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks]
            0-management: arbiter-count is set 1
            [2019-05-22 00:15:05.963257] I [MSGID: 106578]
            [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks]
            0-management: type is set 0, need to change it
            [2019-05-22 00:15:17.015268] E [MSGID: 106053]
            [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops]
            0-management: Failed to set extended attribute
            trusted.add-brick : Transport endpoint is not
            connected [Transport endpoint is not connected]
            [2019-05-22 00:15:17.036479] E [MSGID: 106073]
            [glusterd-brick-ops.c:2595:glusterd_op_add_brick]
            0-glusterd: Unable to add bricks
            [2019-05-22 00:15:17.036595] E [MSGID: 106122]
            [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn]
            0-management: Add-brick commit failed.
            [2019-05-22 00:15:17.036710] E [MSGID: 106122]
            [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn]
            0-management: commit failed on operation Add brick

            As before gvol0-add-brick-mount.log said:

            [2019-05-22 00:15:17.005695] I
            [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE
            inited with protocol versions: glusterfs 7.24 kernel 7.22
            [2019-05-22 00:15:17.005749] I
            [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched
            to graph 0
            [2019-05-22 00:15:17.010101] E
            [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first
            lookup on root failed (Transport endpoint is not
            connected)
            [2019-05-22 00:15:17.014217] W
            [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2:
            LOOKUP() / => -1 (Transport endpoint is not connected)
            [2019-05-22 00:15:17.015097] W
            [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse:
            00000000-0000-0000-0000-000000000001: failed to
            resolve (Transport endpoint is not connected)
            [2019-05-22 00:15:17.015158] W
            [fuse-bridge.c:3294:fuse_setxattr_resume]
            0-glusterfs-fuse: 3: SETXATTR
            00000000-0000-0000-0000-000000000001/1
            (trusted.add-brick) resolution failed
            [2019-05-22 00:15:17.035636] I
            [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse:
            initating unmount of /tmp/mntYGNbj9
            [2019-05-22 00:15:17.035854] W
            [glusterfsd.c:1500:cleanup_and_exit]
            (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5]
            -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
            [0x55c81b63de75]
            -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
            [0x55c81b63dceb] ) 0-: received signum (15), shutting down
            [2019-05-22 00:15:17.035942] I
            [fuse-bridge.c:5914:fini] 0-fuse: Unmounting
            '/tmp/mntYGNbj9'.
            [2019-05-22 00:15:17.035966] I
            [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse
            connection to '/tmp/mntYGNbj9'.

            Here are the processes running on the new arbiter server:
            # ps -ef | grep gluster
            root      3466     1  0 20:13 ?        00:00:00
            /usr/sbin/glusterfs -s localhost --volfile-id
            gluster/glustershd -p
            /var/run/gluster/glustershd/glustershd.pid -l
            /var/log/glusterfs/glustershd.log -S
            /var/run/gluster/24c12b09f93eec8e.socket
            --xlator-option
            *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
            --process-name glustershd
            root      6832     1  0 May16 ?        00:02:10
            /usr/sbin/glusterd -p /var/run/glusterd.pid
            --log-level INFO
            root     17841     1  0 May16 ?        00:00:58
            /usr/sbin/glusterfs --process-name fuse
            --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs

            Here are the files created on the new arbiter server:
            # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
            drwxr-xr-x 3 root root 4096 May 21 20:15
            /nodirectwritedata/gluster/gvol0
            drw------- 2 root root 4096 May 21 20:15
            /nodirectwritedata/gluster/gvol0/.glusterfs

            Thank you for your help!


            On Tue, 21 May 2019 at 00:10, Sanju Rakonde <srako...@redhat.com> wrote:

                David,

                can you please attach the glusterd logs? As the error
                message says, the commit failed on the arbiter node, so
                we might be able to find some issue on that node.

                On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran <nbala...@redhat.com> wrote:



                    On Fri, 17 May 2019 at 06:01, David Cunningham <dcunning...@voisonics.com> wrote:

                        Hello,

                        We're adding an arbiter node to an
                        existing volume and having an issue. Can
                        anyone help? The root cause error appears
                        to be
                        "00000000-0000-0000-0000-000000000001:
                        failed to resolve (Transport endpoint is
                        not connected)", as below.

                        We are running glusterfs 5.6.1. Thanks in
                        advance for any assistance!

                        On existing node gfs1, trying to add new
                        arbiter node gfs3:

                        # gluster volume add-brick gvol0 replica 3
                        arbiter 1
                        gfs3:/nodirectwritedata/gluster/gvol0
                        volume add-brick: failed: Commit failed on
                        gfs3. Please check log file for details.


                    This looks like a glusterd issue. Please check
                    the glusterd logs for more info.
                    Adding the glusterd dev to this thread. Sanju,
                    can you take a look?
                    Regards,
                    Nithya


                        On new node gfs3 in gvol0-add-brick-mount.log:

                        [2019-05-17 01:20:22.689721] I
                        [fuse-bridge.c:4267:fuse_init]
                        0-glusterfs-fuse: FUSE inited with
                        protocol versions: glusterfs 7.24 kernel 7.22
                        [2019-05-17 01:20:22.689778] I
                        [fuse-bridge.c:4878:fuse_graph_sync]
                        0-fuse: switched to graph 0
                        [2019-05-17 01:20:22.694897] E
                        [fuse-bridge.c:4336:fuse_first_lookup]
                        0-fuse: first lookup on root failed
                        (Transport endpoint is not connected)
                        [2019-05-17 01:20:22.699770] W
                        [fuse-resolve.c:127:fuse_resolve_gfid_cbk]
                        0-fuse:
                        00000000-0000-0000-0000-000000000001:
                        failed to resolve (Transport endpoint is
                        not connected)
                        [2019-05-17 01:20:22.699834] W
                        [fuse-bridge.c:3294:fuse_setxattr_resume]
                        0-glusterfs-fuse: 2: SETXATTR
                        00000000-0000-0000-0000-000000000001/1
                        (trusted.add-brick) resolution failed
                        [2019-05-17 01:20:22.715656] I
                        [fuse-bridge.c:5144:fuse_thread_proc]
                        0-fuse: initating unmount of /tmp/mntQAtu3f
                        [2019-05-17 01:20:22.715865] W
                        [glusterfsd.c:1500:cleanup_and_exit]
                        (-->/lib64/libpthread.so.0(+0x7dd5)
                        [0x7fb223bf6dd5]
                        -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
                        [0x560886581e75]
                        -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
                        [0x560886581ceb] ) 0-: received signum
                        (15), shutting down
                        [2019-05-17 01:20:22.715926] I
                        [fuse-bridge.c:5914:fini] 0-fuse:
                        Unmounting '/tmp/mntQAtu3f'.
                        [2019-05-17 01:20:22.715953] I
                        [fuse-bridge.c:5919:fini] 0-fuse: Closing
                        fuse connection to '/tmp/mntQAtu3f'.

                        Processes running on new node gfs3:

                        # ps -ef | grep gluster
                        root 6832     1  0 20:17 ? 00:00:00
                        /usr/sbin/glusterd -p
                        /var/run/glusterd.pid --log-level INFO
                        root 15799     1  0 20:17 ? 00:00:00
                        /usr/sbin/glusterfs -s localhost
                        --volfile-id gluster/glustershd -p
                        /var/run/gluster/glustershd/glustershd.pid
                        -l /var/log/glusterfs/glustershd.log -S
                        /var/run/gluster/24c12b09f93eec8e.socket
                        --xlator-option
                        *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
                        --process-name glustershd
                        root     16856 16735  0 21:21 pts/0
                        00:00:00 grep --color=auto gluster

                        --
                        David Cunningham, Voisonics Limited
                        http://voisonics.com/
                        USA: +1 213 221 1092
                        New Zealand: +64 (0)28 2558 3782



                --
                Thanks,
                Sanju



            --
            David Cunningham, Voisonics Limited
            http://voisonics.com/
            USA: +1 213 221 1092
            New Zealand: +64 (0)28 2558 3782




        --
        David Cunningham, Voisonics Limited
        http://voisonics.com/
        USA: +1 213 221 1092
        New Zealand: +64 (0)28 2558 3782



    --
    David Cunningham, Voisonics Limited
    http://voisonics.com/
    USA: +1 213 221 1092
    New Zealand: +64 (0)28 2558 3782



--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
