Re: [Gluster-users] NFS over gluster stops responding under write load
No replies? Surely this can't be expected behavior?

On Tue, Sep 18, 2012 at 2:59 PM, Lonni J Friedman netll...@gmail.com wrote:
> [original report quoted in full; see the original post in this thread]
Re: [Gluster-users] NFS over gluster stops responding under write load
Not expected behavior, but one that has been noted before: Joe Julian posted this only a few days ago; it sounds similar to your problem?
https://bugzilla.redhat.com/show_bug.cgi?id=849526

On Thursday, September 20, 2012 11:58:50 AM Lonni J Friedman wrote:
> No replies? Surely this can't be expected behavior?
> [original report quoted in full; see the original post in this thread]
Re: [Gluster-users] NFS over gluster stops responding under write load
Yup, that does look quite similar.

On Thu, Sep 20, 2012 at 12:25 PM, harry mangalam harry.manga...@uci.edu wrote:
> Not expected behavior, but one that has been noted before: Joe Julian posted this only a few days ago; it sounds similar to your problem?
> https://bugzilla.redhat.com/show_bug.cgi?id=849526
> [original report quoted in full; see the original post in this thread]
[Gluster-users] NFS over gluster stops responding under write load
Greetings,
I'm running version 3.3.0 on Fedora16-x86_64. I have two bricks set up with a volume doing basic replication on an XFS-formatted filesystem. I've NFS-mounted the volume on a 3rd system and invoked bonnie++ to write to the NFS mount point. After a few minutes, I noticed that bonnie++ didn't seem to be generating any more progress output, at which point I started checking assorted logs to see if anything was wrong. At that point, I saw that the client system was no longer able to write to the NFS mount point, and dmesg (and /var/log/messages) was spewing these warnings like crazy (dozens per second):

nfs: server 10.31.99.166 not responding, still trying

Those warnings started at 14:40:58 on the client system, but oddly stopped a few seconds later at 14:41:04. Here's the full bonnie++ output (/mnt/gv0 is where the gluster filesystem is mounted as an NFS client):

[root@cuda-ljf0 ~]# bonnie++ -d /mnt/gv0 -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...
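For reference, the two-brick replicated setup described above corresponds to commands along these lines (brick paths and IPs taken from this post; the `vers=3,tcp` mount options reflect that Gluster's built-in NFS server speaks NFSv3 over TCP). This is a sketch that only prints the commands, so it can be reviewed before being run on real peers:

```shell
#!/bin/sh
# Hypothetical reconstruction of the setup in this post -- prints the
# commands instead of executing them.
VOL="gv0"
BRICK1="10.31.99.165:/mnt/sdb1"
BRICK2="10.31.99.166:/mnt/sdb1"
SERVER="10.31.99.166"

cmds=$(cat <<EOF
gluster volume create $VOL replica 2 $BRICK1 $BRICK2
gluster volume start $VOL
mount -t nfs -o vers=3,tcp $SERVER:/$VOL /mnt/gv0
EOF
)
printf '%s\n' "$cmds"
```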
Here's what's in the glusterfs logs at the moment:

##
# tail etc-glusterfs-glusterd.vol.log
[2012-09-18 14:54:39.026557] I [glusterd-handler.c:542:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
[2012-09-18 14:54:39.029463] I [glusterd-handler.c:1417:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-09-18 14:54:46.993426] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume gv0
[2012-09-18 14:54:46.993503] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
[2012-09-18 14:54:46.993520] E [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1
[2012-09-18 14:55:47.175521] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2012-09-18 14:55:47.181048] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2012-09-18 14:55:49.306776] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume gv0
[2012-09-18 14:55:49.306834] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
[2012-09-18 14:55:49.306844] E [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1

# tail -f cli.log
[2012-09-18 14:55:47.176824] I [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2012-09-18 14:55:47.180959] I [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
[2012-09-18 14:55:47.181128] I [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2012-09-18 14:55:47.181167] I [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
[2012-09-18 14:55:47.181214] I [input.c:46:cli_batch] 0-: Exiting with: 0
[2012-09-18 14:55:49.244795] W [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to socket
[2012-09-18 14:55:49.307054] I [cli-rpc-ops.c:5905:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to heal volume
[2012-09-18 14:55:49.307274] W [dict.c:2339:dict_unserialize] (--/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xa5) [0x328ca10365] (--/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x328ca0f965] (--gluster(gf_cli3_1_heal_volume_cbk+0x1d4) [0x4225e4]))) 0-dict: buf is null!
[2012-09-18 14:55:49.307289] E [cli-rpc-ops.c:5930:gf_cli3_1_heal_volume_cbk] 0-: Unable to allocate memory
[2012-09-18 14:55:49.307314] I [input.c:46:cli_batch] 0-: Exiting with: -1
##

I'd be happy to provide more if someone requests something specific. Not sure what other information to provide at this point, but here's the basics of the gluster setup:

##
# gluster volume info all
Volume Name: gv0
Type: Replicate
Volume ID: 200046fc-1b5f-460c-b54b-96932e31ed3c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.31.99.165:/mnt/sdb1
Brick2: 10.31.99.166:/mnt/sdb1

# gluster volume heal gv0 info
operation failed
##

I just noticed that glusterfs seems to be rapidly heading towards OOM territory. The glusterfs daemon is currently consuming 90% of MEM according to top.

thanks
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
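The "operation failed" from `gluster volume heal gv0 info` lines up with the "Unable to get lock ... lock held by:" errors in the glusterd log above. A sketch for pulling the holder UUID out of that log, to compare against `gluster peer status` on each node (the sample line below is copied from this thread; the sed pattern is an assumption based on that message format, not an official tool):

```shell
#!/bin/sh
# Extract the UUID of whichever peer holds the cluster-wide lock from
# glusterd log lines. A sample line stands in for the real
# etc-glusterfs-glusterd.vol.log here.
LOG=$(cat <<'EOF'
[2012-09-18 14:55:49.306834] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
EOF
)

# Pull out the "lock held by" UUIDs and deduplicate them.
holder=$(printf '%s\n' "$LOG" | sed -n 's/.*lock held by: \([0-9a-f-]*\).*/\1/p' | sort -u)
echo "lock held by: $holder"
```

If the same holder UUID shows up repeatedly over a long stretch of log, that peer is the one to inspect (or its glusterd the one to restart) before heal commands will go through.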
Re: [Gluster-users] NFS over gluster stops responding under write load
On Tue, Sep 18, 2012 at 2:59 PM, Lonni J Friedman netll...@gmail.com wrote:
> [original report quoted in full; see the original post in this thread]
> I just noticed that glusterfs seems to be rapidly heading towards OOM territory. The glusterfs daemon is currently consuming 90% of MEM according to top.

I just attempted to shut down the glusterd service, and it ran off a cliff: the OOM killer kicked in and killed it. From dmesg:

#
[ 4151.733182] glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[ 4151.733186] glusterfsd cpuset=/ mems_allowed=0
[ 4151.733189] Pid: 2567, comm: glusterfsd
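Until the leak itself is fixed, the runaway memory growth described above can at least be caught before the OOM killer fires. A hedged sketch of a watcher (the 80% threshold and the `glusterfs` process name are assumptions; `ps -C` is the Linux/procps form):

```shell
#!/bin/sh
# Warn before the glusterfs daemon reaches OOM territory.
THRESHOLD=80

# Succeeds when memory use (percent) exceeds the threshold; factored out
# so the numeric comparison can be checked on its own.
mem_exceeds() {
    awk -v m="$1" -v t="$2" 'BEGIN { exit !(m > t) }'
}

# %mem of every glusterfs process (empty if none are running).
for pct in $(ps -C glusterfs -o %mem= 2>/dev/null); do
    if mem_exceeds "$pct" "$THRESHOLD"; then
        echo "WARNING: glusterfs at ${pct}% MEM (threshold ${THRESHOLD}%)"
    fi
done
```

Run from cron every minute or so, this would have flagged the daemon at 90% MEM well before the kernel killed it.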