RE: Raid Arrays and Power Loss

MacGregor, Ian A. Tue, 16 Sep 2003 11:10:01 -0700

The RAID array is a Sun A1000.  I'm not sure of the vintage, but the disks are 18 GB. 
The RAID array did not lose its configuration.  The storage is still there.  Neither 
affected file system was ever empty, but a couple of files were lost, one on each 
file system.


The box is located at one of our interaction regions (IRs).  Some additional 
information [results truncated]

[EMAIL PROTECTED] $ last reboot  

reboot    system boot                   Fri Sep 12 15:32
reboot    system boot                   Mon Aug 25 14:24

When the following error occurred, the RAID box was off:

  Fri Sep 12 13:32:01 2003
 ORA-00204: error in reading (block 1, # blocks 1) of controlfile
 ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
 ORA-27091: skgfqio: unable to queue I/O
 SVR4 Error: 6: No such device or address
 Additional information: 1

I had thought that the Unix box had already been rebooted by then, but that turns out 
to be false.
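
The timestamps bear this out: the ORA-00204 entry predates the boot that `last reboot` 
reports for the same day.  A minimal sketch of that comparison follows.  The year 2003 
for the `last` line is an assumption (`last` does not print a year), and the helper 
names are made up for illustration.

```python
from datetime import datetime


def parse_alert_stamp(text):
    """Parse an alert-log timestamp such as 'Fri Sep 12 13:32:01 2003'."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def parse_last_stamp(text, year):
    """Parse a `last reboot` timestamp such as 'Fri Sep 12 15:32'.

    `last` prints no year, so the caller has to supply one.
    """
    return datetime.strptime(f"{text} {year}", "%a %b %d %H:%M %Y")


# Values copied from the alert log and the `last reboot` output above.
error_time = parse_alert_stamp("Fri Sep 12 13:32:01 2003")
boot_time = parse_last_stamp("Fri Sep 12 15:32", 2003)  # year is assumed

# A positive gap means the error was logged before the system boot.
gap = boot_time - error_time
print("error logged before reboot:", error_time < boot_time)
print("gap:", gap)
```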

After the box was rebooted with the RAID array on:

Fri Sep 12 15:33:08 2003
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3
> Fri Sep 12 15:33:11 2003

The other files on /u1 were fine.

Also, concerning the other error:

Fri Sep 12 16:18:58 2003
> Thread recovery: start rolling forward thread 1
> Fri Sep 12 16:18:58 2003
> Errors in file /opt/oracle/admin/BBRO/udump/bbro_ora_1804.trc:
> ORA-00313: open failed for members of log group 3 of thread 1
> ORA-00312: online log 3 thread 1: '/u2/oradata/BBRO/redo0301.log'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3

The other files on /u2 were fine.  The files in question just disappeared.  I know 
this is not normal and RAID boxes do not normally lose files, but it's hard to argue 
against the empirical evidence here that they can.  It may be that either I or the 
folks down at IR-2 induced the problems.  But files were indeed lost on two different 
LUNs.
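
For anyone who wants to scan a longer alert log for the same pattern, here is a 
minimal sketch that lists files reported as nonexistent (an ORA-00202 or ORA-00312 
line followed by "SVR4 Error: 2").  The function name and regular expressions are my 
own and only cover the message layout seen in the excerpts above; it is not an Oracle 
tool.

```python
import re

# Timestamp lines in the excerpts look like "Fri Sep 12 15:33:08 2003".
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)
# ORA-00202 (controlfile) and ORA-00312 (online log) lines quote the file name.
QUOTED_PATH = re.compile(r"^ORA-(?:00202|00312):.*'([^']+)'")
MISSING = "SVR4 Error: 2: No such file or directory"


def missing_files(log_text):
    """Return (timestamp, path) pairs for files the log reports as nonexistent."""
    results = []
    stamp = None
    path = None
    for raw in log_text.splitlines():
        # Drop mail quote markers ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if TIMESTAMP.match(line):
            stamp, path = line, None
            continue
        match = QUOTED_PATH.match(line)
        if match:
            path = match.group(1)
        elif line == MISSING and path is not None:
            results.append((stamp, path))
            path = None
    return results


# Excerpts copied from the messages in this thread.
SAMPLE = """\
Fri Sep 12 13:32:01 2003
ORA-00204: error in reading (block 1, # blocks 1) of controlfile
ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
ORA-27091: skgfqio: unable to queue I/O
SVR4 Error: 6: No such device or address
Additional information: 1
Fri Sep 12 15:33:08 2003
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3
Fri Sep 12 16:18:58 2003
> Thread recovery: start rolling forward thread 1
> Fri Sep 12 16:18:58 2003
> Errors in file /opt/oracle/admin/BBRO/udump/bbro_ora_1804.trc:
> ORA-00313: open failed for members of log group 3 of thread 1
> ORA-00312: online log 3 thread 1: '/u2/oradata/BBRO/redo0301.log'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3
"""

for when, name in missing_files(SAMPLE):
    print(when, name)
```

The 13:32:01 entry is skipped on purpose: it carries "SVR4 Error: 6" (no such device), 
which points at the array being off rather than at a missing file.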

My current thinking is that the two files were being written when the power was turned 
off on the RAID array, or there was not enough power to keep the disks spinning because 
the UPS had been drained.  The battery for the cache was reporting low, but that report 
is based on the number of hours it has been in operation.  Should it not have maintained 
the cache?

Ian MacGregor
Stanford Linear Accelerator Center
[EMAIL PROTECTED]

-----Original Message-----
Sent: Tuesday, September 16, 2003 10:55 AM
To: Multiple recipients of list ORACLE-L



Okay, core questions:

-As someone asked, what's the make/model of storage?
-Has your RAID array lost its config?  In other words, is the storage there, just with 
an empty vtoc/volume table/partition table (insert your particular OS nomenclature)?
-Is the filesystem good, just empty?  When you say the file is gone, is the /u1 
directory empty, or is the filesystem structure there and just that file is gone?

Okay, I just saw your message that shows it's Solaris 8 + Veritas.  Here's what 
probably happened.  The box was powered on without the RAID array powered on, and 
consequently Veritas doesn't see the disk groups/volumes that are on the RAID array.  
Have you tried doing (as root):

vxconfigd -km enable

This will cause a rescan of the existing volume groups.  Afterwards, what does a 
vxprint -hrt look like?

In general, power loss to a RAID array will not produce the results you describe - I 
think it's far more likely that a system->array interaction is preventing proper access 
to your storage.

Thanks,
Matt

--
Matthew Zito
GridApp Systems
Email: [EMAIL PROTECTED]
Cell: 646-220-3551
Phone: 212-358-8211 x 359
http://www.gridapp.com

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of MacGregor, Ian A.
> Sent: Tuesday, September 16, 2003 12:34 AM
> To: Multiple recipients of list ORACLE-L
> Subject: Raid Arrays and Power Loss
> 
> 
> Last Friday was hot here, and rumor has it our 230 kV power
> line sagged and touched some tree branches.  The local power 
> company shut it off, leaving our systems to depend on UPS.  
> About 30 minutes afterwards one system produced these 
> errors.  This was just before the system went dead.
> 
> Fri Sep 12 12:58:40 2003
> Errors in file /opt/oracle/admin/BBRO/bdump/bbro_ckpt_1420.trc:
> ORA-00206: error in writing (block 3, # blocks 1) of controlfile
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27063: skgfospo: number of bytes read/written is incorrect
> SVR4 Error: 5: I/O error
> Additional information: -1
> Additional information: 8192
> Fri Sep 12 12:58:42 2003
> Errors in file /opt/oracle/admin/BBRO/bdump/bbro_ckpt_1420.trc:
> ORA-00221: error on write to controlfile
> ORA-00206: error in writing (block 3, # blocks 1) of controlfile
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27063: skgfospo: number of bytes read/written is incorrect
> SVR4 Error: 5: I/O error
> Additional information: -1
> Additional information: 8192
> Fri Sep 12 12:58:42 2003
> CKPT: terminating instance due to error 221
> Instance terminated by CKPT, pid = 1420
> --------------------------------------------------------------
> -----------------------------------------------
> Things look pretty shaky here.  When things were restarted 
> the following error was produced. 
> Fri Sep 12 13:32:01 2003
> ORA-00204: error in reading (block 1, # blocks 1) of controlfile
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27091: skgfqio: unable to queue I/O
> SVR4 Error: 6: No such device or address
> Additional information: 1
> 
> The raid array had not been powered on
> --------------------------------------------------------------
> -----------------------------------------
> However
> Fri Sep 12 15:33:08 2003
> ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3
> Fri Sep 12 15:33:11 2003
> ORA-205 signalled during: alter database  mount...
> 
> Now the file system is available, but the file itself has
> disappeared. It was not corrupted, just disappeared.  We 
> duplex a copy to an internal disk.  So recovery was easy.
> 
> However once this was fixed
> 
> Fri Sep 12 16:18:58 2003
> Thread recovery: start rolling forward thread 1
> Fri Sep 12 16:18:58 2003
> Errors in file /opt/oracle/admin/BBRO/udump/bbro_ora_1804.trc:
> ORA-00313: open failed for members of log group 3 of thread 1
> ORA-00312: online log 3 thread 1: '/u2/oradata/BBRO/redo0301.log'
> ORA-27037: unable to obtain file status
> SVR4 Error: 2: No such file or directory
> Additional information: 3
> ORA-313 signalled during: ALTER DATABASE OPEN...
> --------------------------------------------------------------
> -----------------------------------------------
> These files are on a RAID  1 LUN.  Both copies of the file
> are gone.  Again not corrupted but gone.  I don't know if 
> using duplexing rather than RAID 1 would have mattered here, 
> but I am changing things so that one group of redo logs is on 
> internal disk and written via the duplexing method.
> 
> 
> 
> 
> Ian MacGregor
> Stanford Linear Accelerator Center
> [EMAIL PROTECTED]
> 
>  
> 
> --
> Please see the official ORACLE-L FAQ: http://www.orafaq.net
> -- 
> Author: MacGregor, Ian A.
>   INET: [EMAIL PROTECTED]
> 
> Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
> San Diego, California        -- Mailing list and web hosting services
> ---------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru')
> and in the message BODY, include a line containing: UNSUB 
> ORACLE-L (or the name of mailing list you want to be removed 
> from).  You may also send the HELP command for other 
> information (like subscribing).
> 

