Hi Ravi,

I'd already done exactly that before, with step 3 being a simple 'rm -rf /nodirectwritedata/gluster/gvol0'. Do you have another suggestion for what the cleanup or reformat should be?
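
Just so I understand what a fuller cleanup would look like: is something like the following what you mean? This is only a sketch on my part, assuming the goal is to keep the brick directory itself but strip all leftover gluster metadata (the xattr names are the ones my gfs3 brick actually shows in the getfattr output below):

# setfattr -x trusted.glusterfs.volume-id /nodirectwritedata/gluster/gvol0
# setfattr -x trusted.afr.dirty /nodirectwritedata/gluster/gvol0
# rm -rf /nodirectwritedata/gluster/gvol0/.glusterfs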
Thank you.

On Wed, 22 May 2019 at 13:56, Ravishankar N <ravishan...@redhat.com> wrote:

> Hmm, so the volume info seems to indicate that the add-brick was successful, but the gfid xattr is missing on the new brick (as are the actual files, barring the .glusterfs folder, according to your previous mail).
>
> Do you want to try removing and adding it again?
>
> 1. `gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1
>
> 2. Check that gluster volume info is now back to a 1x2 volume on all nodes and `gluster peer status` is connected on all nodes.
>
> 3. Clean up or reformat '/nodirectwritedata/gluster/gvol0' on gfs3.
>
> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.
>
> 5. Check that the files are getting healed onto the new brick.
>
> Thanks,
> Ravi
>
> On 22/05/19 6:50 AM, David Cunningham wrote:
>
> Hi Ravi,
>
> Certainly. On the existing two nodes:
>
> gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
> getfattr: Removing leading '/' from absolute path names
> # file: nodirectwritedata/gluster/gvol0
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.gvol0-client-2=0x000000000000000000000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>
> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
> getfattr: Removing leading '/' from absolute path names
> # file: nodirectwritedata/gluster/gvol0
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.gvol0-client-0=0x000000000000000000000000
> trusted.afr.gvol0-client-2=0x000000000000000000000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>
> On the new node:
>
> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
> getfattr: Removing leading '/' from absolute path names
> # file: nodirectwritedata/gluster/gvol0
> trusted.afr.dirty=0x000000000000000000000001
> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>
> Output of "gluster volume info" is the same on all 3 nodes and is:
>
> # gluster volume info
>
> Volume Name: gvol0
> Type: Replicate
> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gfs1:/nodirectwritedata/gluster/gvol0
> Brick2: gfs2:/nodirectwritedata/gluster/gvol0
> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
> Options Reconfigured:
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
>
> On Wed, 22 May 2019 at 12:43, Ravishankar N <ravishan...@redhat.com> wrote:
>
>> Hi David,
>> Could you provide the `getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0` output of all bricks and the output of `gluster volume info`?
>>
>> Thanks,
>> Ravi
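
One note on verification from my side: once the add-brick succeeds, I assume the gfs3 brick should pick up the same trusted.gfid and trusted.glusterfs.dht xattrs that gfs1 and gfs2 show above. So after step 4 my plan (a sketch of my own checks, not part of Ravi's steps) would be to re-run the getfattr command and watch the heal for step 5 until the entry counts drop to zero:

# getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
# gluster volume heal gvol0 info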
>> On 22/05/19 4:57 AM, David Cunningham wrote:
>>
>> Hi Sanju,
>>
>> Here's what glusterd.log says on the new arbiter server when trying to add the node:
>>
>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) [0x7fe4ca9102cd] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe4d5ecc955] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=gvol0 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: replica-count is set 3
>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: arbiter-count is set 1
>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick
>>
>> As before, gvol0-add-brick-mount.log said:
>>
>> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
>> [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
>> [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
>> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
>> [2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
>> [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
>> [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntYGNbj9
>> [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: received signum (15), shutting down
>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntYGNbj9'.
>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'.
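
Both logs above point at the same thing: the temporary FUSE mount that glusterd creates (here /tmp/mntYGNbj9) in order to set the trusted.add-brick xattr cannot reach the volume ("Transport endpoint is not connected"). Before retrying, I assume it's worth confirming from gfs3 that the existing bricks are reachable. A quick sketch, assuming the default glusterd management port 24007 and that nc is installed; the per-brick ports are listed by volume status:

# gluster volume status gvol0
# nc -zv gfs1 24007
# nc -zv gfs2 24007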
>>
>> Here are the processes running on the new arbiter server:
>>
>> # ps -ef | grep gluster
>> root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
>> root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>> root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs
>>
>> Here are the files created on the new arbiter server:
>>
>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
>> drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0
>> drw------- 2 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0/.glusterfs
>>
>> Thank you for your help!
>>
>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde <srako...@redhat.com> wrote:
>>
>>> David,
>>>
>>> Can you please attach the glusterd logs? As the error message says, the commit failed on the arbiter node, so we might be able to find some issue on that node.
>>>
>>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran <nbala...@redhat.com> wrote:
>>>
>>>> On Fri, 17 May 2019 at 06:01, David Cunningham <dcunning...@voisonics.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below.
>>>>>
>>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance!
>>>>>
>>>>> On existing node gfs1, trying to add new arbiter node gfs3:
>>>>>
>>>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
>>>>> volume add-brick: failed: Commit failed on gfs3. Please check log file for details.
>>>>
>>>> This looks like a glusterd issue. Please check the glusterd logs for more info.
>>>> Adding the glusterd dev to this thread. Sanju, can you take a look?
>>>>
>>>> Regards,
>>>> Nithya
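
For reference, the excerpts earlier in this thread came from the default log locations on gfs3. I assume these are the files Sanju wants attached (a sketch, assuming a stock install logging to /var/log/glusterfs):

# tail -n 100 /var/log/glusterfs/glusterd.log
# tail -n 100 /var/log/glusterfs/gvol0-add-brick-mount.log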
>>>>>
>>>>> On new node gfs3 in gvol0-add-brick-mount.log:
>>>>>
>>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
>>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0
>>>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
>>>>> [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
>>>>> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed
>>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f
>>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: received signum (15), shutting down
>>>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntQAtu3f'.
>>>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntQAtu3f'.
>>>>>
>>>>> Processes running on new node gfs3:
>>>>>
>>>>> # ps -ef | grep gluster
>>>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>>>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd
>>>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster
>>>>>
>>>>> --
>>>>> David Cunningham, Voisonics Limited
>>>>> http://voisonics.com/
>>>>> USA: +1 213 221 1092
>>>>> New Zealand: +64 (0)28 2558 3782
>>>
>>> --
>>> Thanks,
>>> Sanju
>>
>> --
>> David Cunningham, Voisonics Limited
>> http://voisonics.com/
>> USA: +1 213 221 1092
>> New Zealand: +64 (0)28 2558 3782
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users