Hi Dominic,

Yes, the errors belong only to the passive paths.
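Since the question turns on whether every failing sd device really sits in a passive path group, it can help to check the `multipath -ll` output mechanically instead of by eye. Below is a minimal sketch, assuming the RHEL 5 style output layout; the sample output, WWID, SCSI IDs, and device names are invented for illustration (on EMC CLARiiON-class arrays the passive paths typically show as [active][ghost] inside the [enabled] group):

```python
import re

# Illustrative `multipath -ll` excerpt (RHEL 5 layout, EMC-style array).
# The WWID, SCSI IDs and device names are made up for this example.
SAMPLE = """\
mpath1 (36006016000000000000000000000001) dm-2 DGC,RAID 5
[size=100G][features=1 queue_if_no_path][hwhandler=1 emc]
\\_ round-robin 0 [prio=2][active]
 \\_ 3:0:1:1 sdf 8:80  [active][ready]
\\_ round-robin 0 [prio=0][enabled]
 \\_ 3:0:0:1 sdc 8:32  [active][ghost]
"""

def paths_by_group(output):
    """Map each sd device to the state of the path group it belongs to."""
    group = None
    result = {}
    for line in output.splitlines():
        # A path-group header line, e.g. "\_ round-robin 0 [prio=0][enabled]"
        m = re.search(r"round-robin.*\[(active|enabled)\]", line)
        if m:
            group = m.group(1)
            continue
        # A path line, e.g. " \_ 3:0:0:1 sdc 8:32  [active][ghost]"
        m = re.search(r"\d+:\d+:\d+:\d+\s+(sd[a-z]+)", line)
        if m and group:
            result[m.group(1)] = group
    return result

states = paths_by_group(SAMPLE)
print(states)  # {'sdf': 'active', 'sdc': 'enabled'}
print([dev for dev, grp in states.items() if grp != "active"])  # ['sdc']
```

This is only a convenience for eyeballing boxes with many LUNs; the authoritative check is still the `multipath -ll` output itself. Devices whose I/O errors all come from the [enabled]/ghost group are the harmless passive-path noise dOminic describes.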
> ------------------------------
>
> Message: 3
> Date: Tue, 21 Jun 2011 18:22:49 +0530
> From: dOminic <[email protected]>
> To: linux clustering <[email protected]>
> Subject: Re: [Linux-cluster] Cluster Failover Failed
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
>
> Btw, how many HBAs are present in your box? Is the problem with scsi3 only?
>
> Refer to https://access.redhat.com/kb/docs/DOC-2991 , then set the filter.
> Also, I would suggest you open a ticket with your Linux vendor if the IO
> errors belong to active paths.
>
> Do the reported IO errors belong to disks in the passive path group? You
> can verify that in the multipath -ll output.
>
> regards,
>
> On Sun, Jun 19, 2011 at 10:03 PM, dOminic <[email protected]> wrote:
>
> > Hi Balaji,
> >
> > Yes, the reported message is harmless ... However, you can try the
> > following:
> >
> > 1) I would suggest you set the filter in lvm.conf so that it properly
> >    scans your mpath* devices and local disks.
> > 2) Enable the blacklist section in multipath.conf, e.g.:
> >
> > blacklist {
> >         devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
> >         devnode "^hd[a-z]"
> > }
> >
> > # multipath -v2
> >
> > Observe the box and check whether that helps ...
> >
> > Regards,
> >
> > On Wed, Jun 15, 2011 at 12:16 AM, Balaji S <[email protected]> wrote:
> >
> >> Hi,
> >> In my setup I have implemented ten two-node clusters, each running
> >> mysql as a cluster service with an IPMI card as the fencing device.
> >>
> >> In /var/log/messages I keep getting errors like the ones below:
> >>
> >> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdm, sector 0
> >> Jun 14 12:50:48 hostname kernel: sd 3:0:2:2: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:50:48 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >> Jun 14 12:50:48 hostname kernel:
> >> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdn, sector 0
> >> Jun 14 12:50:48 hostname kernel: sd 3:0:2:4: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:50:48 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >> Jun 14 12:50:48 hostname kernel:
> >> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdp, sector 0
> >> Jun 14 12:51:10 hostname kernel: sd 3:0:0:1: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:51:10 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >> Jun 14 12:51:10 hostname kernel:
> >> Jun 14 12:51:10 hostname kernel: end_request: I/O error, dev sdc, sector 0
> >> Jun 14 12:51:10 hostname kernel: printk: 3 messages suppressed.
> >> Jun 14 12:51:10 hostname kernel: Buffer I/O error on device sdc, logical block 0
> >> Jun 14 12:51:10 hostname kernel: sd 3:0:0:2: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:51:10 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >> Jun 14 12:51:10 hostname kernel:
> >> Jun 14 12:51:10 hostname kernel: end_request: I/O error, dev sdd, sector 0
> >> Jun 14 12:51:10 hostname kernel: Buffer I/O error on device sdd, logical block 0
> >> Jun 14 12:51:10 hostname kernel: sd 3:0:0:4: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:51:10 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >>
> >> When I check multipath -ll, all of these devices are in the passive
> >> path group.
> >>
> >> Environment:
> >>
> >> RHEL 5.4 & EMC SAN
> >>
> >> Please suggest how to overcome this issue. Support will be highly helpful.
> >> Thanks in advance.
> >>
> >> --
> >> Thanks,
> >> BSK
> >>
> >> --
> >> Linux-cluster mailing list
> >> [email protected]
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> ------------------------------
>
> Message: 4
> Date: Tue, 21 Jun 2011 15:31:13 +0200
> From: Miha Valencic <[email protected]>
> To: linux clustering <[email protected]>
> Subject: Re: [Linux-cluster] Troubleshooting service relocation
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Michael, I've configured the logging on RM and am now waiting for it to
> switch nodes. Hopefully, I can see a reason why it is relocating.
>
> Thanks,
> Miha.
>
> On Sat, Jun 18, 2011 at 11:24 AM, Michael Pye <[email protected]> wrote:
>
> > On 17/06/2011 09:13, Miha Valencic wrote:
> > > How can I turn on logging or what else can I check?
> >
> > Take a look at this knowledgebase article:
> > https://access.redhat.com/kb/docs/DOC-53500
>
> ------------------------------
>
> Message: 5
> Date: Tue, 21 Jun 2011 09:57:38 -0400
> From: "Nicolas Ross" <[email protected]>
> To: "linux clustering" <[email protected]>
> Subject: [Linux-cluster] GFS2 fatal: filesystem consistency error
> Message-ID: <AD364AF1E9D94C50B96231FB0320B1DE@versa>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
>
> 8-node cluster, Fibre Channel HBAs, and disks accessed through a QLogic fabric.
>
> I've been hit 3 times by this error, on different nodes:
>
> GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency error
> GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267
> GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352
> GFS2: fsid=CyberCluster:GizServer.1: about to withdraw this file system
> GFS2: fsid=CyberCluster:GizServer.1: telling LM to unmount
> GFS2: fsid=CyberCluster:GizServer.1: withdrawn
> Pid: 2659, comm: delete_workqueu Tainted: G W ---------------- T 2.6.32-131.2.1.el6.x86_64 #1
> Call Trace:
> [<ffffffffa044ffd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
> [<ffffffffa0425209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
> [<ffffffffa04501dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
> [<ffffffffa0435584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
> [<ffffffffa044e1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
> [<ffffffffa044e0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
> [<ffffffffa044e020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
> [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
> [<ffffffffa0432940>] ? delete_work_func+0x0/0x80 [gfs2]
> [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
> [<ffffffffa044cc4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
> [<ffffffff8118bf82>] ? iput+0x62/0x70
> [<ffffffffa0432994>] ? delete_work_func+0x54/0x80 [gfs2]
> [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
> [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
> [<ffffffff81088660>] ? worker_thread+0x0/0x2a0
> [<ffffffff8108dd96>] ? kthread+0x96/0xa0
> [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
> [<ffffffff8108dd00>] ? kthread+0x0/0xa0
> [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
> no_formal_ino = 9582
> no_addr = 6698267
> i_disksize = 6838
> blocks = 0
> i_goal = 6698304
> i_diskflags = 0x00000000
> i_height = 1
> i_depth = 0
> i_entries = 0
> i_eattr = 0
> GFS2: fsid=CyberCluster:GizServer.1: gfs2_delete_inode: -5
> gdlm_unlock 5,66351b err=-22
>
> Only with different inodes each time.
>
> After that event, services running on that filesystem are marked failed
> and are not moved over to another node. Any access to that fs yields an
> I/O error. The server needed to be rebooted to work properly again.
>
> I ran an fsck last night on that filesystem, and it did find some errors,
> but nothing serious. Lots (really lots) of these:
>
> Ondisk and fsck bitmaps differ at block 5771602 (0x581152)
> Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
> Metadata type is 0 (free)
> Fix bitmap for block 5771602 (0x581152) ? (y/n)
>
> And after completing the fsck, I started some services back up and got
> the same error on another filesystem that is practically empty and used
> for small utilities used throughout the cluster...
>
> What should I do to find the source of this problem?
>
> ------------------------------
>
> Message: 6
> Date: Tue, 21 Jun 2011 10:42:40 -0400 (EDT)
> From: Bob Peterson <[email protected]>
> To: linux clustering <[email protected]>
> Subject: Re: [Linux-cluster] GFS2 fatal: filesystem consistency error
> Message-ID: <1036238479.689034.1308667360488.javamail.r...@zmail06.collab.prod.int.phx2.redhat.com>
> Content-Type: text/plain; charset=utf-8
>
> ----- Original Message -----
> | 8-node cluster, Fibre Channel HBAs, and disks accessed through a QLogic
> | fabric.
> |
> | I've been hit 3 times by this error, on different nodes:
> |
> | GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency error
> | [...]
> | What should I do to find the source of this problem?
>
> Hi,
>
> I believe this is a GFS2 bug we've already solved.
> Please contact Red Hat Support.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>
> ------------------------------
>
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> End of Linux-cluster Digest, Vol 86, Issue 19
> *********************************************

--
Thanks,
Balaji S
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
