Hi,

Open a ticket so Red Hat technical staff can look into this. I think that is the fastest way to get this issue resolved and fixed.
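If you do open a case, a glock dump from debugfs is usually one of the first things support will ask for, and it also tells you whether holders are really queuing up on locks. As a rough sketch only (the mycluster:myfs directory name is a placeholder for your own <clustername>:<fsname>, and the flag parsing is my assumption about the dump format, so double-check it against your kernel's output): "G:" lines are glocks, the indented "H:" lines are holders, and a holder whose f: flags include W is still waiting to be granted — many of those would point at real glock contention, which matches your seeing only "G" and "H" but no "W".

```shell
# Hypothetical helper: count holders vs. waiters in a GFS2 glock dump.
# An "H:" line is a holder; an f: flag string containing W marks a
# holder that is still waiting for the lock to be granted.
glock_summary() {
  awk '
    /^[[:space:]]*H:/ {
      holders++
      if ($0 ~ /f:[A-Za-z]*W/) waiters++
    }
    END { printf "holders=%d waiters=%d\n", holders + 0, waiters + 0 }
  ' "$@"
}

# Usage (assumes debugfs is mounted at /sys/kernel/debug; replace
# mycluster:myfs with your actual <clustername>:<fsname> directory):
#   glock_summary /sys/kernel/debug/gfs2/mycluster:myfs/glocks
```

A high and growing waiter count on the same glocks over repeated dumps is what you would attach to the support case alongside the hung-task traces.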
Regards.

On Tue, Jun 28, 2011 at 8:55 AM, anderson souza <[email protected]> wrote:

> Hi everyone,
>
> I have an Active/Passive RHCS 6.1 cluster running 8TB of GFS2 with NFS on
> top, exporting 26 mount points to 250 NFS clients. The GFS2 mount points
> are mounted with the noatime, nodiratime, data=writeback and localflocks
> options, and the SAN and servers are fast (4Gbps and 8Gb, dual
> controllers working in LB, H.A... QuadCore, 48GB of memory...). The cluster
> has been doing its work (failover working fine...), however,
> unfortunately, I have seen high I/O wait rates, sometimes around 60-70%
> (which is very bad), and a couple of glock_workqueue jobs, so I get a
> bunch of gfs2_quotad and nfsd errors and qdisk latency. The debugfs output
> didn't show me "W", only "G" and "H".
>
> Have you guys seen this before?
> Does it look like glock contention?
> How can I get it fixed, and what does it mean?
>
> Thank you very much
>
>
> Jun 27 18:48:05 kernel: INFO: task gfs2_quotad:19066 blocked for more than 120 seconds.
> Jun 27 18:48:05 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 27 18:48:05 kernel: gfs2_quotad D 0000000000000004 0 19066 2 0x00000080
> Jun 27 18:48:05 kernel: ffff880bb01e1c20 0000000000000046 0000000000000000 ffffffffa045ec6d
> Jun 27 18:48:05 kernel: 0000000000000000 ffff880be6e2b000 ffff880bb01e1c50 00000001051d8b46
> Jun 27 18:48:05 kernel: ffff880be4865af8 ffff880bb01e1fd8 000000000000f598 ffff880be4865af8
> Jun 27 18:48:05 kernel: Call Trace:
> Jun 27 18:48:05 kernel: [<ffffffffa045ec6d>] ? dlm_put_lockspace+0x1d/0x40 [dlm]
> Jun 27 18:48:05 kernel: [<ffffffffa0525c50>] ? gfs2_glock_holder_wait+0x0/0x20 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffffa0525c5e>] gfs2_glock_holder_wait+0xe/0x20 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffff814db87f>] __wait_on_bit+0x5f/0x90
> Jun 27 18:48:05 kernel: [<ffffffffa0525c50>] ? gfs2_glock_holder_wait+0x0/0x20 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffff814db928>] out_of_line_wait_on_bit+0x78/0x90
> Jun 27 18:48:05 kernel: [<ffffffff8108e140>] ? wake_bit_function+0x0/0x50
> Jun 27 18:48:05 kernel: [<ffffffffa0526816>] gfs2_glock_wait+0x36/0x40 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffffa0529011>] gfs2_glock_nq+0x191/0x370 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffff8107a11b>] ? try_to_del_timer_sync+0x7b/0xe0
> Jun 27 18:48:05 kernel: [<ffffffffa05427f8>] gfs2_statfs_sync+0x58/0x1b0 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffff814db52a>] ? schedule_timeout+0x19a/0x2e0
> Jun 27 18:48:05 kernel: [<ffffffffa05427f0>] ? gfs2_statfs_sync+0x50/0x1b0 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffffa053a787>] quotad_check_timeo+0x57/0xb0 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffffa053aa14>] gfs2_quotad+0x234/0x2b0 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
> Jun 27 18:48:05 kernel: [<ffffffffa053a7e0>] ? gfs2_quotad+0x0/0x2b0 [gfs2]
> Jun 27 18:48:05 kernel: [<ffffffff8108dd96>] kthread+0x96/0xa0
> Jun 27 18:48:05 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
> Jun 27 18:48:05 kernel: [<ffffffff8108dd00>] ? kthread+0x0/0xa0
> Jun 27 18:48:05 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
>
> Jun 27 19:49:07 kernel: __ratelimit: 57 callbacks suppressed
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
> Jun 27 19:49:07 kernel: nfsd: peername failed (err 107)!
>
> Jun 27 20:00:58 kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
> Jun 27 20:00:58 kernel: __ratelimit: 40 callbacks suppressed
> qdiskd[10078]: qdisk cycle took more than 1 second to complete (1.170000)
> qdisk cycle took more than 1 second to complete (1.120000)
>
> Thanks
> James S.
>
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
