Actually, never mind. I had another background process running that was checking the state of the Gluster cluster. I turned that off, and when I ran "gluster volume heal appian full" on server-ip-1, only server-ip-1's tailed logs showed anything. The other two servers' logs didn't output anything.
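As an aside, since the setup script quoted below relies on fixed sleeps around the gluster commands, a generic "poll until ready" helper is what we're leaning toward instead. This is only a sketch: the `wait_for` helper and its arguments are made-up names, and the peer-status predicate in the comments is an assumption about what "ready" should mean, not a tested recipe.

```shell
#!/bin/sh
# Sketch of a generic "poll until ready" helper to replace fixed sleeps.
# wait_for and all_peers_in_cluster are hypothetical names for illustration.

# wait_for <timeout-seconds> <command...>
# Retries the command once per second until it succeeds or the timeout
# expires. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout=$1; shift
    elapsed=0
    until "$@"; do
        elapsed=$((elapsed + 1))
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
    done
    return 0
}

# Example predicate for the real script (not run here): on a 3-node
# cluster, the probing node should see 2 connected peers.
# all_peers_in_cluster() {
#     [ "$(gluster peer status | grep -c 'Peer in Cluster (Connected)')" -eq 2 ]
# }
# wait_for 60 all_peers_in_cluster || exit 1

# Trivial demo with a predicate that succeeds immediately:
wait_for 5 true && echo "ready"
```

The same helper could gate the `volume create` and the client-side `mountpoint -q` polling, so each step waits on an observable condition rather than a guessed delay.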
On Sun, Jun 26, 2016 at 2:02 PM, Danny Lee <dan...@vt.edu> wrote:
> Originally, I ran "sudo gluster volume heal appian full" on server-ip-1
> and then tailed the logs for all of the servers. The only logs that
> showed anything were server-ip-1's, so I thought it wasn't even
> connecting to the other boxes. But after about 15 seconds, logs showed up
> in server-ip-2 and server-ip-3. Thanks for pointing that out, Joe.
>
> Here are the tailed logs. I also noticed that there were some locking
> errors that popped up once in a while in
> the etc-glusterfs-glusterd.vol.log. I have also added these logs below.
>
> ==== server-ip-1 ====
>
> ==> etc-glusterfs-glusterd.vol.log <==
> [2016-06-26 16:48:31.405513] I [MSGID: 106533] [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume volname
> [2016-06-26 16:48:31.409903] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on server-ip-2. Please check log file for details.
>
> ==> cli.log <==
> [2016-06-26 16:48:51.208828] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
> [2016-06-26 16:48:51.213391] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
> [2016-06-26 16:48:51.213674] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> [2016-06-26 16:48:51.213733] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
> [2016-06-26 16:48:51.219674] I [cli-rpc-ops.c:8417:gf_cli_heal_volume_cbk] 0-cli: Received resp to heal volume
> [2016-06-26 16:48:51.219768] I [input.c:36:cli_batch] 0-: Exiting with: -1
>
> ==> cmd_history.log <==
> [2016-06-26 16:48:51.219596] : volume heal volname full : FAILED : Commit failed on server-ip-2. Please check log file for details.
>
> ==> etc-glusterfs-glusterd.vol.log <==
> [2016-06-26 16:48:51.214185] I [MSGID: 106533] [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume volname
> [2016-06-26 16:48:51.219031] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on server-ip-2. Please check log file for details.
>
>
> ==== server-ip-2 ====
>
> ==> etc-glusterfs-glusterd.vol.log <==
> [2016-06-26 16:48:30.087365] I [MSGID: 106533] [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume volname
> [2016-06-26 16:48:30.092829] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on server-ip-2. Please check log file for details.
>
> ==> cli.log <==
> [2016-06-26 16:49:30.099446] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
> [2016-06-26 16:49:30.104599] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
> [2016-06-26 16:49:30.104853] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> [2016-06-26 16:49:30.104896] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
> [2016-06-26 16:49:30.177924] I [input.c:36:cli_batch] 0-: Exiting with: 0
>
>
> ==== server-ip-3 ====
>
> ==> cli.log <==
> [2016-06-26 16:48:49.177859] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
> [2016-06-26 16:48:49.182887] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
> [2016-06-26 16:48:49.183146] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> [2016-06-26 16:48:49.183188] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
> [2016-06-26 16:48:49.189005] I [cli-rpc-ops.c:8417:gf_cli_heal_volume_cbk] 0-cli: Received resp to heal volume
> [2016-06-26 16:48:49.189058] I [input.c:36:cli_batch] 0-: Exiting with: -1
>
>
> ==== All servers ====
>
> ==> glfsheal-volname.log <==
> [2016-06-26 16:51:47.493809] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 6766732d-332d-6272-6963-6b2d656d7074 (0) coming up
> [2016-06-26 16:51:47.493841] I [MSGID: 114020] [client.c:2106:notify] 0-volname-client-0: parent translators are ready, attempting connect on transport
> [2016-06-26 16:51:47.496465] I [MSGID: 114020] [client.c:2106:notify] 0-volname-client-1: parent translators are ready, attempting connect on transport
> [2016-06-26 16:51:47.496729] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-volname-client-0: changing port to 49152 (from 0)
> [2016-06-26 16:51:47.498945] I [MSGID: 114020] [client.c:2106:notify] 0-volname-client-2: parent translators are ready, attempting connect on transport
> [2016-06-26 16:51:47.501600] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-volname-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-06-26 16:51:47.502008] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volname-client-0: Connected to volname-client-0, attached to remote volume '/usr/local/volname/local-data/mirrored-data'.
> [2016-06-26 16:51:47.502031] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volname-client-0: Server and Client lk-version numbers are not same, reopening the fds
> [2016-06-26 16:51:47.502101] I [MSGID: 108005] [afr-common.c:4007:afr_notify] 0-volname-replicate-0: Subvolume 'volname-client-0' came back up; going online.
> [2016-06-26 16:51:47.502146] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-volname-client-1: changing port to 49152 (from 0)
> [2016-06-26 16:51:47.502201] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volname-client-0: Server lk version = 1
> [2016-06-26 16:51:47.504858] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-volname-client-2: changing port to 49152 (from 0)
> [2016-06-26 16:51:47.507128] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-volname-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-06-26 16:51:47.507674] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volname-client-1: Connected to volname-client-1, attached to remote volume '/usr/local/volname/local-data/mirrored-data'.
> [2016-06-26 16:51:47.507692] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volname-client-1: Server and Client lk-version numbers are not same, reopening the fds
> [2016-06-26 16:51:47.507994] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volname-client-1: Server lk version = 1
> [2016-06-26 16:51:47.509373] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-volname-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-06-26 16:51:47.509815] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-volname-client-2: Connected to volname-client-2, attached to remote volume '/usr/local/volname/local-data/mirrored-data'.
> [2016-06-26 16:51:47.509825] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-volname-client-2: Server and Client lk-version numbers are not same, reopening the fds
> [2016-06-26 16:51:47.526799] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-volname-client-2: Server lk version = 1
> [2016-06-26 16:51:47.528396] I [MSGID: 104041] [glfs-resolve.c:869:__glfs_active_subvol] 0-volname: switched to graph 6766732d-332d-6272-6963-6b2d656d7074 (0)
>
>
>
> ==== Possibly Related? ====
> ==== server-ip-1 ====
>
> ==> Some logs of note in etc-glusterfs-glusterd.vol.log <==
> [2016-06-26 11:35:13.583338] W [glusterd-locks.c:577:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x30) [0x7fad0f286520] -->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x980) [0x7fad0f286450] -->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x5f7) [0x7fad0f28ac47] ) 0-management: Lock for volname held by e24bbfa8-5e0f-4ff2-846d-743bf47d2022
> [2016-06-26 11:35:13.583355] E [MSGID: 106119] [glusterd-syncop.c:1830:gd_sync_task_begin] 0-management: Unable to acquire lock for volname
> [2016-06-26 11:35:13.601006] I [MSGID: 106494] [glusterd-handler.c:3070:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume volname
> [2016-06-26 11:35:13.868117] W [glusterd-locks.c:577:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(glusterd_op_sm+0x29f) [0x7fad0f1fb86f] -->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(+0x5c675) [0x7fad0f1ed675] -->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x5f7) [0x7fad0f28ac47] ) 0-management: Lock for volname held by 9c486029-ec26-4222-b302-b609216fd68f
> [2016-06-26 11:35:13.868135] E [MSGID: 106119] [glusterd-op-sm.c:3711:glusterd_op_ac_lock] 0-management: Unable to acquire lock for volname
> [2016-06-26 11:35:13.868175] E [MSGID: 106376] [glusterd-op-sm.c:7591:glusterd_op_sm] 0-management: handler returned: -1
> [2016-06-26 11:35:13.869150] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Locking failed on server-ip-2. Please check log file for details.
> [2016-06-26 11:35:13.869515] W [glusterd-locks.c:692:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(glusterd_op_sm+0x29f) [0x7fad0f1fb86f] -->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(+0x5c362) [0x7fad0f1ed362] -->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x37a) [0x7fad0f28afea] ) 0-management: Lock owner mismatch. Lock for vol volname held by 9c486029-ec26-4222-b302-b609216fd68f
> [2016-06-26 11:35:13.869533] E [MSGID: 106118] [glusterd-op-sm.c:3774:glusterd_op_ac_unlock] 0-management: Unable to release lock for volname
> [2016-06-26 11:35:13.869573] E [MSGID: 106376] [glusterd-op-sm.c:7591:glusterd_op_sm] 0-management: handler returned: 1
> [2016-06-26 11:35:13.869590] E [MSGID: 106151] [glusterd-syncop.c:1868:gd_sync_task_begin] 0-management: Locking Peers Failed.
> [2016-06-26 11:35:13.872140] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on server-ip-2. Please check log file for details.
> [2016-06-26 11:35:13.872187] E [MSGID: 106152] [glusterd-syncop.c:1569:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
> [2016-06-26 11:35:13.959124] W [glusterd-locks.c:577:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x30) [0x7fad0f286520] -->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x980) [0x7fad0f286450] -->/usr/lib64/glusterfs/3.7.11/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x5f7) [0x7fad0f28ac47] ) 0-management: Lock for volname held by e24bbfa8-5e0f-4ff2-846d-743bf47d2022
> The message "I [MSGID: 106494] [glusterd-handler.c:3070:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume volname" repeated 7 times between [2016-06-26 11:35:13.601006] and [2016-06-26 11:35:14.822349]
> [2016-06-26 11:35:40.905084] I [MSGID: 106533] [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume volname
> [2016-06-26 11:35:40.909790] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on server-ip-2. Please check log file for details.
> The message "I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 2 times between [2016-06-26 11:35:13.538007] and [2016-06-26 11:35:13.735654]
> [2016-06-26 11:35:13.959143] E [MSGID: 106119] [glusterd-syncop.c:1830:gd_sync_task_begin] 0-management: Unable to acquire lock for volname
> The message "I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume volname" repeated 8 times between [2016-06-26 11:35:13.583212] and [2016-06-26 11:35:14.358853]
>
> On Sat, Jun 25, 2016 at 11:17 AM, Joe Julian <j...@julianfamily.org> wrote:
>
>> Notice it actually tells you to look in the logs on server-ip-2 but you
>> did not include any logs from that server.
>>
>> On June 21, 2016 10:22:14 AM PDT, Danny Lee <dan...@vt.edu> wrote:
>>
>>> Hello,
>>>
>>> We are currently figuring out how to add GlusterFS to our system to make
>>> our systems highly available using scripts. We are using Gluster 3.7.11.
>>>
>>> Problem:
>>> Trying to migrate from a non-clustered system to a 3-node GlusterFS
>>> replicated cluster using scripts. Tried various things to make
>>> this work, but it sometimes leaves us in an undesirable state where,
>>> if you call "gluster volume heal <volname> full", we would get the error
>>> message, "Launching heal operation to perform full self heal on volume
>>> <volname> has been unsuccessful on bricks that are down. Please check if
>>> all brick processes are running." All the brick processes are running
>>> according to "gluster volume status volname".
>>>
>>> Things we have tried (in order of preference):
>>> 1. Create the volume with 3 filesystems holding the same data
>>> 2. Create the volume with 2 empty filesystems and one with the data
>>> 3. Create the volume with only one filesystem with data, then use the
>>> "add-brick" command to add the other two empty filesystems
>>> 4. Create the volume with one empty filesystem, mount it, copy the
>>> data over to it, and then finally use the "add-brick" command to add
>>> the other two empty filesystems
>>> 5. Create the volume with 3 empty filesystems, mount it, and then copy
>>> the data over
>>>
>>> Other things to note:
>>> A few minutes after the volume is created and started successfully,
>>> our application server starts up against it, so reads and writes may happen
>>> pretty quickly after the volume has started. But there is only about 50MB
>>> of data.
>>>
>>> Steps to reproduce (all in a script):
>>> # This is run by the primary node with the IP address, <server-ip-1>,
>>> that has data
>>> systemctl restart glusterd
>>> gluster peer probe <server-ip-2>
>>> gluster peer probe <server-ip-3>
>>> Wait for "gluster peer status" to all be in "Peer in Cluster" state
>>> gluster volume create <volname> replica 3 transport tcp ${BRICKS[0]}
>>> ${BRICKS[1]} ${BRICKS[2]} force
>>> gluster volume set <volname> nfs.disable true
>>> gluster volume start <volname>
>>> mkdir -p $MOUNT_POINT
>>> mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT
>>> find $MOUNT_POINT | xargs stat
>>>
>>> Note that, when we added sleeps around the gluster commands, there was a
>>> higher probability of success, but not 100%.
>>>
>>> # Once the volume is started, all the clients/servers will mount the
>>> gluster filesystem by polling "mountpoint -q $MOUNT_POINT":
>>> mkdir -p $MOUNT_POINT
>>> mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT
>>>
>>> Logs:
>>> *etc-glusterfs-glusterd.vol.log* in *server-ip-1*
>>>
>>> [2016-06-21 14:10:38.285234] I [MSGID: 106533] [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume volname
>>> [2016-06-21 14:10:38.296801] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on <server-ip-2>. Please check log file for details.
>>>
>>> *usr-local-volname-data-mirrored-data.log* in *server-ip-1*
>>>
>>> [2016-06-21 14:14:39.233366] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
>>> *I think this is caused by the self heal daemon*
>>>
>>> *cmd_history.log* in *server-ip-1*
>>>
>>> [2016-06-21 14:10:38.298800] : volume heal volname full : FAILED : Commit failed on <server-ip-2>. Please check log file for details.
>>>
>>> ------------------------------
>>>
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
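Since the quoted heal error blames "bricks that are down", one thing worth scripting before calling heal is a check over `gluster volume status` output. Below is only a sketch: the helper name `offline_bricks` is made up, the mocked status table is illustrative, and the column positions are an assumption based on the 3.7-era tabular output (Online column holding Y or N), so verify them against your own CLI before relying on this.

```shell
#!/bin/sh
# Sketch: flag bricks whose Online column is not "Y" in a
# `gluster volume status`-style table. offline_bricks is a
# hypothetical helper; the table below is mocked, not real output.

# offline_bricks <status-output-file>
# Prints every Brick line whose second-to-last field (Online) is not "Y".
offline_bricks() {
    awk '/^Brick/ { if ($(NF - 1) != "Y") print $0 }' "$1"
}

# In the real script this would be fed from the CLI, e.g.:
#   gluster volume status volname > /tmp/status.txt

# Demo against a mocked status table:
cat > /tmp/status.txt <<'EOF'
Status of volume: volname
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick server-ip-1:/usr/local/volname/local-data/mirrored-data 49152 0 Y 1234
Brick server-ip-2:/usr/local/volname/local-data/mirrored-data N/A   0 N N/A
Brick server-ip-3:/usr/local/volname/local-data/mirrored-data 49152 0 Y 5678
EOF
offline_bricks /tmp/status.txt
```

With something like this in place, the script could refuse to issue `volume heal ... full` (or retry) while any brick reports offline, instead of discovering it via the commit failure.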
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users