On 04/10/2015 9:17 PM, Jim Klimov wrote:

5 October 2015, 0:27:38 CEST, Rainer Heilke
<rhei...@dragonhearth.com> wrote:
Greetings. I've recently had three hard drives fail in my server.
One was the OS disk, so I just reinstalled. The other two, however,
were each one half of a zpool mirror. They are the problem disks.

Both have been replaced, but now I cannot seem to work with them.
In format -e, they give errors; specifically:

       1. c3d1 <drive type unknown>
          /pci@0,0/pci-ide@11/ide@0/cmdk@1,0
       7. c7d1 <drive type unknown>
          /pci@0,0/pci-ide@14,1/ide@0/cmdk@1,0

There is also a third disk erroring out:

       3. c5t9d1 <SS330055-99JJXXK-0001 cyl 60797 alt 2 hd 255 sec 63>
          /pci@0,0/pci1002,5a17@3/pci1000,9240@0/sd@9,1

I suspect c3d1 is an old OS mirror half, due to the low controller
number.

When I select 1 or 7, I get a Segmentation fault and get booted out
of the format utility. (If I select 3, the format utility never comes
back; it freezes.) A zpool status shows:

  pool: Pool1
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not
        support the features. See zpool-features(5) for details.
  scan: resilvered 2.78M in 0h0m with 0 errors on Tue Sep 16 14:11:00 2014
config:

        NAME        STATE     READ WRITE CKSUM
        Pool1       ONLINE       0     0     0
          c5t8d1    ONLINE       0     0     0

errors: No known data errors

  pool: data
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 36.1M in 0h17m with 738 errors on Thu Oct  1 18:17:43 2015
config:

        NAME                     STATE     READ WRITE CKSUM
        data                     DEGRADED 20.6K     0     0
          mirror-0               DEGRADED 81.8K     0     0
            7152018192933189428  FAULTED      0     0     0  was /dev/dsk/c11t8d1s0
            c6d0                 ONLINE       0     0 81.8K

errors: 737 data errors, use '-v' for a list

(Doing a zpool status -v freezes the terminal.)

The system has three disks connected to an LSI MegaRAID SAS
9240-8i controller.

I suspect that disk 3 (c5t9d1) might be the detached mirror of Pool1
(c5t8d1), but being unable to work with it, I cannot verify this. I
have no idea how to deal with the data mirror. Should I just detach
/dev/dsk/c11t8d1s0 (7152018192933189428) and hope that c6d0 will be
clean enough for a decent scrub? Or is /dev/dsk/c11t8d1s0
(7152018192933189428) the disk with the less corrupted data? Not
being able to even get a listing (ls) of the data pool leaves me very
hesitant.
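(For reference, a detach by GUID would look something like the two
commands below. This is only a sketch, since which half holds the
good data is exactly what's in question; the GUID is the one zpool
status printed for the faulted device:

   # zpool detach data 7152018192933189428   # drop the faulted half; mirror-0 collapses to plain c6d0
   # zpool scrub data                        # then let a scrub count what is still readable

Once detached, that half can no longer be re-joined with its old
mirror state, so it is a one-way step.)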

Does anyone have any ideas on how to clean this up?

Thanks in advance, Rainer

As already noted, part of the problem may be IDE access mode: e.g.,
are your disks modern and large (over 2TB, IIRC)?
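(One non-destructive way to see what the kernel thinks each disk is,
using the device names from the format listing above, is the error
and inquiry report:

   # iostat -En c3d1 c7d1   # prints model, size, and soft/hard/transport error counts per device

If the reported size looks wrong or the transport error counter is
climbing, that points below ZFS.)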

Did you rescan OS devices (devfsadm -Cv)?
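(For reference, that's the cleanup form:

   # devfsadm -Cv   # -C removes dangling /dev links, -v reports each link added or removed
)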

Did you try other partitioning programs (parted, fdisk) to see if you
can access the new disks at all, and in particular to verify that zfs
managed to write its partitioning? In the worst case you might have
to define an MBR/EFI 'solaris' partition yourself and use it (as
cXtYdZpN) directly as a pool vdev, or use format afterwards to define
slices inside that partition and use cXtYdZs0 on the disk. I wrote
some howtos about 'advanced setup' on the OI wiki that can help you
get started.
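(A minimal sketch of that manual route, using c3d1 from the listing
above as a stand-in target; check the man pages before running
anything, since the right device and partition numbers depend on what
fdisk actually creates:

   # fdisk -B /dev/rdsk/c3d1p0                       # write a default table: one Solaris2 partition spanning the disk
   # zpool replace data 7152018192933189428 c3d1p1   # offer the new partition to the degraded mirror
)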

But first I'd verify that all the components work, including the
hardware. Maybe all the cabling needs to be re-plugged, the box may
need a vacuum cleaner (or rather a blow-out), or the power supply has
nearly died (aged capacitors, etc.).

Jim -- Typos courtesy of K-9 Mail on my Samsung Android


I rescanned using devfsadm (again; I had done it before, but there's no harm in running it again). It seems to have found nothing.

parted and fdisk either say the device doesn't exist, or they can't open it. ls -al /dev/rdsk | grep <target> does show full entries for all of the disks.
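(One way to separate "the node exists" from "the device answers",
assuming c3d1 as the target: read a single sector through the raw
whole-disk node and see whether it returns, errors, or hangs:

   # dd if=/dev/rdsk/c3d1p0 of=/dev/null bs=512 count=1   # one-sector read straight through the driver

If that hangs or errors, the problem is below the partitioning
tools.)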

All of the drives are spinning, the cabling has been checked (yet again), the box is clean inside (it got a good blowout when the CMOS battery issue was found), and the power supply was load-tested with 9 drives as part of the troubleshooting that finally found the battery problem. Those load tests ran overnight. Finally, each HDD power lead has both functional and problematic drives on it (though I may want to triple-check that).

I'm at a bit of a loss. Bad SATA cables? Why does the listing (ls -al) see the drives when parted, fdisk, and format have trouble? (I'm just trying to work out the logic of that in my head; it wasn't a real question to the list.)
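(On that logic question: the entries under /dev/rdsk are just
symlinks into the /devices tree that devfsadm builds, so ls only
proves the node was created, not that the disk answers I/O. Following
one link shows this, with the target matching the /pci@0,0/... path
from the format listing:

   $ ls -l /dev/rdsk/c3d1s0   # the link target is a /devices/...cmdk@1,0 node, not the disk itself
)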

Rainer

--
Put your makeup on and fix your hair up pretty,
And meet me tonight in Atlantic City
                        Bruce Springsteen

