Hi Dominic,

Yes, the errors belong only to the passive paths.
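Since the question turns on whether every failing sd device really sits in a passive path group, it can help to check the `multipath -ll` output mechanically instead of by eye. Below is a minimal sketch, assuming the RHEL 5 style output layout; the sample output, WWID, SCSI IDs, and device names are invented for illustration (on EMC CLARiiON-class arrays the passive paths typically show as [active][ghost] inside the [enabled] group):

```python
import re

# Illustrative `multipath -ll` excerpt (RHEL 5 layout, EMC-style array).
# The WWID, SCSI IDs and device names are made up for this example.
SAMPLE = """\
mpath1 (36006016000000000000000000000001) dm-2 DGC,RAID 5
[size=100G][features=1 queue_if_no_path][hwhandler=1 emc]
\\_ round-robin 0 [prio=2][active]
 \\_ 3:0:1:1 sdf 8:80  [active][ready]
\\_ round-robin 0 [prio=0][enabled]
 \\_ 3:0:0:1 sdc 8:32  [active][ghost]
"""

def paths_by_group(output):
    """Map each sd device to the state of the path group it belongs to."""
    group = None
    result = {}
    for line in output.splitlines():
        # A path-group header line, e.g. "\_ round-robin 0 [prio=0][enabled]"
        m = re.search(r"round-robin.*\[(active|enabled)\]", line)
        if m:
            group = m.group(1)
            continue
        # A path line, e.g. " \_ 3:0:0:1 sdc 8:32  [active][ghost]"
        m = re.search(r"\d+:\d+:\d+:\d+\s+(sd[a-z]+)", line)
        if m and group:
            result[m.group(1)] = group
    return result

states = paths_by_group(SAMPLE)
print(states)  # {'sdf': 'active', 'sdc': 'enabled'}
print([dev for dev, grp in states.items() if grp != "active"])  # ['sdc']
```

This is only a convenience for eyeballing boxes with many LUNs; the authoritative check is still the `multipath -ll` output itself. Devices whose I/O errors all come from the [enabled]/ghost group are the harmless passive-path noise dOminic describes.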
> ------------------------------
>
> Message: 3
> Date: Tue, 21 Jun 2011 18:22:49 +0530
> From: dOminic <[email protected]>
> To: linux clustering <[email protected]>
> Subject: Re: [Linux-cluster] Cluster Failover Failed
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
>
> Btw, how many HBAs are present in your box? Is the problem with scsi3 only?
>
> Refer to https://access.redhat.com/kb/docs/DOC-2991 , then set the filter.
> Also, I would suggest you open a ticket with your Linux vendor if the IO
> errors belong to active paths.
>
> Do the reported IO errors belong to disks in the passive path group? You
> can verify that in the multipath -ll output.
>
> regards,
>
> On Sun, Jun 19, 2011 at 10:03 PM, dOminic <[email protected]> wrote:
>
> > Hi Balaji,
> >
> > Yes, the reported message is harmless ... However, you can try the
> > following:
> >
> > 1) I would suggest you set the filter in lvm.conf so that it properly
> >    scans your mpath* devices and local disks.
> > 2) Enable the blacklist section in multipath.conf, e.g.:
> >
> > blacklist {
> >         devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
> >         devnode "^hd[a-z]"
> > }
> >
> > # multipath -v2
> >
> > Observe the box and check whether that helps ...
> >
> > Regards,
> >
> > On Wed, Jun 15, 2011 at 12:16 AM, Balaji S <[email protected]> wrote:
> >
> >> Hi,
> >> In my setup I have implemented ten two-node clusters, each running
> >> mysql as a cluster service with an IPMI card as the fencing device.
> >>
> >> In /var/log/messages I keep getting errors like the ones below:
> >>
> >> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdm, sector 0
> >> Jun 14 12:50:48 hostname kernel: sd 3:0:2:2: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:50:48 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >> Jun 14 12:50:48 hostname kernel:
> >> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdn, sector 0
> >> Jun 14 12:50:48 hostname kernel: sd 3:0:2:4: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:50:48 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >> Jun 14 12:50:48 hostname kernel:
> >> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdp, sector 0
> >> Jun 14 12:51:10 hostname kernel: sd 3:0:0:1: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:51:10 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >> Jun 14 12:51:10 hostname kernel:
> >> Jun 14 12:51:10 hostname kernel: end_request: I/O error, dev sdc, sector 0
> >> Jun 14 12:51:10 hostname kernel: printk: 3 messages suppressed.
> >> Jun 14 12:51:10 hostname kernel: Buffer I/O error on device sdc, logical block 0
> >> Jun 14 12:51:10 hostname kernel: sd 3:0:0:2: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:51:10 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >> Jun 14 12:51:10 hostname kernel:
> >> Jun 14 12:51:10 hostname kernel: end_request: I/O error, dev sdd, sector 0
> >> Jun 14 12:51:10 hostname kernel: Buffer I/O error on device sdd, logical block 0
> >> Jun 14 12:51:10 hostname kernel: sd 3:0:0:4: Device not ready: <6>: Current: sense key: Not Ready
> >> Jun 14 12:51:10 hostname kernel: Add. Sense: Logical unit not ready, manual intervention required
> >>
> >> When I check multipath -ll, all of these devices are in the passive
> >> path group.
> >>
> >> Environment:
> >>
> >> RHEL 5.4 & EMC SAN
> >>
> >> Please suggest how to overcome this issue. Support will be highly helpful.
> >> Thanks in advance.
> >>
> >> --
> >> Thanks,
> >> BSK
> >>
> >> --
> >> Linux-cluster mailing list
> >> [email protected]
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> ------------------------------
>
> Message: 4
> Date: Tue, 21 Jun 2011 15:31:13 +0200
> From: Miha Valencic <[email protected]>
> To: linux clustering <[email protected]>
> Subject: Re: [Linux-cluster] Troubleshooting service relocation
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Michael, I've configured the logging on RM and am now waiting for it to
> switch nodes. Hopefully, I can see a reason why it is relocating.
>
> Thanks,
> Miha.
>
> On Sat, Jun 18, 2011 at 11:24 AM, Michael Pye <[email protected]> wrote:
>
> > On 17/06/2011 09:13, Miha Valencic wrote:
> > > How can I turn on logging or what else can I check?
> >
> > Take a look at this knowledgebase article:
> > https://access.redhat.com/kb/docs/DOC-53500
>
> ------------------------------
>
> Message: 5
> Date: Tue, 21 Jun 2011 09:57:38 -0400
> From: "Nicolas Ross" <[email protected]>
> To: "linux clustering" <[email protected]>
> Subject: [Linux-cluster] GFS2 fatal: filesystem consistency error
> Message-ID: <AD364AF1E9D94C50B96231FB0320B1DE@versa>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
>
> 8-node cluster, Fibre Channel HBAs, and disks accessed through a QLogic fabric.
>
> I've been hit 3 times by this error, on different nodes:
>
> GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency error
> GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267
> GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352
> GFS2: fsid=CyberCluster:GizServer.1: about to withdraw this file system
> GFS2: fsid=CyberCluster:GizServer.1: telling LM to unmount
> GFS2: fsid=CyberCluster:GizServer.1: withdrawn
> Pid: 2659, comm: delete_workqueu Tainted: G W ---------------- T 2.6.32-131.2.1.el6.x86_64 #1
> Call Trace:
> [<ffffffffa044ffd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
> [<ffffffffa0425209>] ? trunc_dealloc+0xa9/0x130 [gfs2]
> [<ffffffffa04501dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2]
> [<ffffffffa0435584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2]
> [<ffffffffa044e1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2]
> [<ffffffffa044e0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2]
> [<ffffffffa044e020>] ? gfs2_delete_inode+0x0/0x280 [gfs2]
> [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0
> [<ffffffffa0432940>] ? delete_work_func+0x0/0x80 [gfs2]
> [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80
> [<ffffffffa044cc4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2]
> [<ffffffff8118bf82>] ? iput+0x62/0x70
> [<ffffffffa0432994>] ? delete_work_func+0x54/0x80 [gfs2]
> [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0
> [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
> [<ffffffff81088660>] ? worker_thread+0x0/0x2a0
> [<ffffffff8108dd96>] ? kthread+0x96/0xa0
> [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
> [<ffffffff8108dd00>] ? kthread+0x0/0xa0
> [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
> no_formal_ino = 9582
> no_addr = 6698267
> i_disksize = 6838
> blocks = 0
> i_goal = 6698304
> i_diskflags = 0x00000000
> i_height = 1
> i_depth = 0
> i_entries = 0
> i_eattr = 0
> GFS2: fsid=CyberCluster:GizServer.1: gfs2_delete_inode: -5
> gdlm_unlock 5,66351b err=-22
>
> Only with different inodes each time.
>
> After that event, services running on that filesystem are marked failed
> and are not moved over to another node. Any access to that fs yields an
> I/O error. The server needed to be rebooted to work properly again.
>
> I ran an fsck last night on that filesystem, and it did find some errors,
> but nothing serious. Lots (really lots) of these:
>
> Ondisk and fsck bitmaps differ at block 5771602 (0x581152)
> Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free)
> Metadata type is 0 (free)
> Fix bitmap for block 5771602 (0x581152) ? (y/n)
>
> And after completing the fsck, I started some services back up and got
> the same error on another filesystem that is practically empty and used
> for small utilities used throughout the cluster...
>
> What should I do to find the source of this problem?
>
> ------------------------------
>
> Message: 6
> Date: Tue, 21 Jun 2011 10:42:40 -0400 (EDT)
> From: Bob Peterson <[email protected]>
> To: linux clustering <[email protected]>
> Subject: Re: [Linux-cluster] GFS2 fatal: filesystem consistency error
> Message-ID: <1036238479.689034.1308667360488.javamail.r...@zmail06.collab.prod.int.phx2.redhat.com>
> Content-Type: text/plain; charset=utf-8
>
> ----- Original Message -----
> | 8-node cluster, Fibre Channel HBAs, and disks accessed through a QLogic
> | fabric.
> |
> | I've been hit 3 times by this error, on different nodes:
> |
> | GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency error
> | [...]
> | What should I do to find the source of this problem?
>
> Hi,
>
> I believe this is a GFS2 bug we've already solved.
> Please contact Red Hat Support.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>
> ------------------------------
>
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> End of Linux-cluster Digest, Vol 86, Issue 19
> *********************************************

--
Thanks,
Balaji S
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
