Re: [OpenIndiana-discuss] Replacing an unavailable hd...

2014-06-16 Thread John Ryan

On 16/06/2014 3:18 PM, John Doe via openindiana-discuss wrote:

I went onsite to replace the disk.
  zpool offline VOLUME c5t2d0 + unplug old, plug new
And nothing happens...
There was this in messages:
   Jun 14 02:30:12 data-4 ahci: [ID 517647 kern.warning] WARNING: ahci0:
 watchdog port 2 satapkt 0xff06cd8cbc68 timed out
   Jun 14 02:30:23 data-4 ahci: [ID 860969 kern.warning] WARNING: ahci0:
 ahci_port_reset port 2 the device hardware has been initialized and the
 power-up diagnostics failed
   Jun 14 02:30:24 data-4 sata: [ID 801845 kern.info] /pci@0,0/pci15d9,62f@1f,2:
   Jun 14 02:30:24 data-4  SATA port 2 error
   Jun 14 02:30:24 data-4 scsi: [ID 107833 kern.warning] WARNING: 
/pci@0,0/pci15d9,62f@1f,2/disk@2,0 (sd3):
   Jun 14 02:30:24 data-4  SYNCHRONIZE CACHE command failed (5)
   Jun 14 02:30:25 data-4 sata: [ID 801845 kern.info] /pci@0,0/pci15d9,62f@1f,2:
   Jun 14 02:30:25 data-4  SATA port 2 error
   ...
I tried:
   cfgadm -cconnect xxx and get:
   cfgadm: Insufficient condition
Nothing new in /var/adm/messages.
cfgadm still says
   sata0/2  sata-port  disconnected unconfigured failed
I guess "sata-port"
So I guess this sata port is bad (or just disabled?)...
And of course, that's one of those servers that must stay on 24/7...
Is there a way to make the kernel try to reinitialize this sata port apart 
from a reboot?

Try `fmadm faulty`
Then `fmadm repaired` with the FMRI or UUID that `fmadm faulty` reports
Then cfgadm commands
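
Spelled out, that sequence might look like the sketch below. It is a dry run
that only prints the commands. The attachment point sata0/2 is taken from the
cfgadm output quoted above; FAULT_ID is a placeholder for whatever FMRI or UUID
`fmadm faulty` actually reports. Whether clearing the fault is enough for the
port to reconnect is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the suggested commands instead of running them.
# AP_ID is taken from the cfgadm output in this thread; FAULT_ID is a
# placeholder for the FMRI or UUID shown by `fmadm faulty`.
AP_ID="${AP_ID:-sata0/2}"
FAULT_ID="${FAULT_ID:-FAULT-ID-FROM-FMADM-FAULTY}"

clear_fault_plan() {
    echo "fmadm faulty"
    echo "fmadm repaired $FAULT_ID"
    echo "cfgadm -c connect $AP_ID"
    echo "cfgadm -c configure $AP_ID"
    echo "cfgadm -al"
}

clear_fault_plan
```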

John


___
openindiana-discuss mailing list
openindiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss





Re: [OpenIndiana-discuss] Replacing an unavailable hd...

2014-06-16 Thread Reginald Beardsley via openindiana-discuss

Thermal effects will cause connector issues.  The most extreme case in my 
experience was a move of our group in Dallas during July.  The systems were 
moved over the weekend.  None would boot on Monday until I had reseated all the 
disk drive cables.  They did not like sitting in a hot truck.  I've seen lots 
of systems that had cable issues after a shutdown let the hardware cool 
down, but never as consistently as in that move, where every system in my 
group was affected.

Reg



Re: [OpenIndiana-discuss] Replacing an unavailable hd...

2014-06-16 Thread John Doe via openindiana-discuss
I went onsite to replace the disk.
 zpool offline VOLUME c5t2d0 + unplug old, plug new
And nothing happens...
There was this in messages:
  Jun 14 02:30:12 data-4 ahci: [ID 517647 kern.warning] WARNING: ahci0: 
    watchdog port 2 satapkt 0xff06cd8cbc68 timed out
  Jun 14 02:30:23 data-4 ahci: [ID 860969 kern.warning] WARNING: ahci0:
    ahci_port_reset port 2 the device hardware has been initialized and the
    power-up diagnostics failed
  Jun 14 02:30:24 data-4 sata: [ID 801845 kern.info] /pci@0,0/pci15d9,62f@1f,2:
  Jun 14 02:30:24 data-4  SATA port 2 error
  Jun 14 02:30:24 data-4 scsi: [ID 107833 kern.warning] WARNING: 
/pci@0,0/pci15d9,62f@1f,2/disk@2,0 (sd3):
  Jun 14 02:30:24 data-4  SYNCHRONIZE CACHE command failed (5)
  Jun 14 02:30:25 data-4 sata: [ID 801845 kern.info] /pci@0,0/pci15d9,62f@1f,2:
  Jun 14 02:30:25 data-4  SATA port 2 error
  ...
I tried:
  cfgadm -cconnect xxx and get:
  cfgadm: Insufficient condition
Nothing new in /var/adm/messages.
cfgadm still says
  sata0/2  sata-port  disconnected unconfigured failed
I guess "sata-port"
So I guess this sata port is bad (or just disabled?)...
And of course, that's one of those servers that must stay on 24/7...
Is there a way to make the kernel try to reinitialize this sata port apart 
from a reboot?
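
To spot ports in this state without reading the listing by eye, a small filter
over `cfgadm -al` output can help. The sketch below assumes the condition is
the last column, as in the line quoted above; `failed_ports` is a hypothetical
helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: read `cfgadm -al` style output on stdin and print the
# attachment points whose last column (the condition) is "failed".
failed_ports() {
    awk '$NF == "failed" { print $1 }'
}

# Sample line copied from the message above; on a live system one would
# run `cfgadm -al | failed_ports` instead.
printf '%s\n' 'sata0/2  sata-port  disconnected unconfigured failed' | failed_ports
```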



Re: [OpenIndiana-discuss] SMF service that should not shutdown with the OS

2014-06-16 Thread Gary Mills
On Sun, Jun 15, 2014 at 10:12:14PM +0200, Jim Klimov wrote:
> On 2014-06-15 15:01, Gary Mills wrote:
> >
> >What you need here is a remote console.
> 
> That's for sure, in general.
> 
> But on this box it does not have telnet (or rather, it has either
> telnet or ssh - but not both at the same time), so I can not "telnet"
> from our Cisco router to the firewall/vpn/infrastructure server's
> RSC, and the Cisco SSH command (as of that firmware at least) does
> not support connections into a non-default VRF (routing table).
> I can however "telnet" anywhere, and I've put up a private VLAN where
> this server can listen for "insecure" telnet accesses from this Cisco.
> So it is all complicated :)

Where I used to work, we had a physically separate console network.
All of the RSC and ILOM ports connected to this network.  We had one
server that connected to both the console network and our computer
room backbone, but did not route between the two.  Once you logged
into that server, you could telnet or SSH to any of the console cards.
We ran conserver there to automate that part.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-



Re: [OpenIndiana-discuss] Replacing an unavailable hd...

2014-06-16 Thread Gary Gendel

On 06/16/2014 05:05 AM, John Doe via openindiana-discuss wrote:

From: Jim Klimov 

If you are uncertain if the HDD device has really failed, you can also
try to take apart the computer and remove-replug the power and signal
cables, perhaps a few times. This may cleanse them of oxidation and
repair the storage - happened to me dozens of times on both home-made
rigs and brand servers (though rarely on the latter).

Also, while you're near the box taken apart, you can listen to the
disk if it "squeaks" and vibrates when powered on, or no longer works
mechanically indeed.

In fact, recently we've had a power outage that rebooted a couple of
old servers who had a dead and unresponsive HDD each (the poor boxes
waited for replacements to be purchased and received), and now the
disks are back online - several scrubs found no problems (that is,
after the initial resilver/scrub which complained a lot due to lots
of stale data). So there was even no mechanical replugging, just a
power cycle.

Hum... the server is a bit less than 2 years old, and all disks are
plugged on the same back-plane, so I would be surprised if it was a
cabling issue.  And, since I have spares, I prefer to replace the
suspect disk to be safe and test it later...  If it is indeed a
cabling issue, the new disk will also look failed I guess.

If it helps, cfgadm -alv says:
sata0/2  disconnected unconfigured failed

I have not yet gone onsite to see whether any red LED is lit.

After moving the server to temporary quarters, I had a similar 
situation. I was able to bring the disk back up remotely (several hours 
drive away) with the following commands:


cfgadm -cdisconnect 
cfgadm -cconnect 
cfgadm -cconfigure 
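
As a sketch, that software "re-plug" could be wrapped as below. Assumptions:
the attachment point argument is missing from the commands above, so AP_ID is
a placeholder (sata0/2 is borrowed from elsewhere in this thread), and
`disconnect` is used as the first step because cfgadm's state-change functions
are connect/disconnect and configure/unconfigure. With DRY_RUN=1, the default,
the commands are only printed.

```shell
#!/bin/sh
# Sketch of a software "re-plug" of one SATA attachment point.
# AP_ID is a placeholder; the first step assumes "disconnect" was meant.
# DRY_RUN=1 (the default) prints the commands instead of running them.
AP_ID="${AP_ID:-sata0/2}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

replug_port() {
    # Stop at the first step that fails.
    run cfgadm -c disconnect "$1" &&
    run cfgadm -c connect "$1" &&
    run cfgadm -c configure "$1"
}

replug_port "$AP_ID"
```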

This situation kept happening (failing between a day and a week later), 
even when I finally moved it to new quarters.  I then re-seated the 
controller card and cables.  It's been running for several months 
without a complaint.  I did see that sometimes when the disk failed it 
would even show up as not connected.


I tried rebooting for a while, which sometimes worked, but I was much more 
successful with the above commands and it didn't require bringing the 
machine offline.


Even without the move, there is enough vibration to cause problems with 
marginal connections so I concur with Jim that it could occur.


Gary




Re: [OpenIndiana-discuss] Replacing an unavailable hd...

2014-06-16 Thread Jim Klimov

On 2014-06-16 11:05, John Doe via openindiana-discuss wrote:

Hum... the server is a bit less than 2 years old, and all disks are
plugged on the same back-plane, so I would be surprised if it was a
cabling issue.  And, since I have spares, I prefer to replace the
suspect disk to be safe and test it later...  If it is indeed a
cabling issue, the new disk will also look failed I guess.


Not necessarily. In case of oxidation, the problem is intermittent
and will probably be fixed by the act of re-plugging, which scratches
off the oxide films on contacts and brings the metal parts back into
better contact. If this is the case, however, then it is likely that
the situation will recur roughly each year, and can be "fixed" in the
same way.

Thanks to ZFS, at least, if there are any botched data packets between
the HDD and HBA due to weak signal/interference/noise when the film
is growing but does not yet preclude communications completely, then
you'd see checksum errors on the device (if there are mismatches
indeed) and the pool will try to amend that automatically.
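
A quick way to watch for such errors is to filter the per-device counters out
of `zpool status`. The sketch below is only illustrative: `devices_with_errors`
is a hypothetical helper, and it assumes the usual NAME STATE READ WRITE CKSUM
column order with plain integer counters.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print every
# config row whose READ, WRITE or CKSUM counter is non-zero.
# Assumes the first counter is a plain integer (abbreviated values such as
# 1.2K in the READ column would be skipped).
devices_with_errors() {
    awk '$2 ~ /^[A-Z]+$/ && $3 ~ /^[0-9]+$/ && ($3 + $4 + $5) > 0 {
        printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
    }'
}

# Sample rows in the style of the pool listing from this thread; on a live
# system one would run `zpool status VOLUME | devices_with_errors`.
printf '%s\n' \
    '    c5t1d0  ONLINE   0 0 0' \
    '    c5t2d0  UNAVAIL 12    20 0  cannot open' | devices_with_errors
```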

It is also possible that the SAS/SCSI protocols include some error
correction and automatic retries for such in-flight mistakes, though
there is likely nothing like that for SATA/IDE protocols.

HTH,
Jim Klimov




Re: [OpenIndiana-discuss] Replacing an unavailable hd...

2014-06-16 Thread John Doe via openindiana-discuss
From: Jim Klimov 
> If you are uncertain if the HDD device has really failed, you can also
> try to take apart the computer and remove-replug the power and signal
> cables, perhaps a few times. This may cleanse them of oxidation and
> repair the storage - happened to me dozens of times on both home-made
> rigs and brand servers (though rarely on the latter).
>
> Also, while you're near the box taken apart, you can listen to the
> disk if it "squeaks" and vibrates when powered on, or no longer works
> mechanically indeed.
>
> In fact, recently we've had a power outage that rebooted a couple of
> old servers who had a dead and unresponsive HDD each (the poor boxes
> waited for replacements to be purchased and received), and now the
> disks are back online - several scrubs found no problems (that is,
> after the initial resilver/scrub which complained a lot due to lots
> of stale data). So there was even no mechanical replugging, just a
> power cycle.

Hum... the server is a bit less than 2 years old, and all disks are
plugged on the same back-plane, so I would be surprised if it was a
cabling issue.  And, since I have spares, I prefer to replace the
suspect disk to be safe and test it later...  If it is indeed a
cabling issue, the new disk will also look failed I guess.

If it helps, cfgadm -alv says:
sata0/2  disconnected unconfigured failed

I have not yet gone onsite to see whether any red LED is lit.

Thx,
JD



Re: [OpenIndiana-discuss] Replacing an unavailable hd...

2014-06-16 Thread Jim Klimov

On 2014-06-16 10:18, John Doe via openindiana-discuss wrote:

Hi,

I just got my first failed HD under OI and wanted to be sure I don't crash 
everything when replacing it...


If you are uncertain if the HDD device has really failed, you can also 
try to take apart the computer and remove-replug the power and signal
cables, perhaps a few times. This may cleanse them of oxidation and
repair the storage - happened to me dozens of times on both home-made
rigs and brand servers (though rarely on the latter).

Also, while you're near the box taken apart, you can listen to the
disk if it "squeaks" and vibrates when powered on, or no longer works
mechanically indeed.

In fact, recently we've had a power outage that rebooted a couple of
old servers who had a dead and unresponsive HDD each (the poor boxes
waited for replacements to be purchased and received), and now the
disks are back online - several scrubs found no problems (that is,
after the initial resilver/scrub which complained a lot due to lots
of stale data). So there was even no mechanical replugging, just a
power cycle.

HTH,
//Jim




[OpenIndiana-discuss] Replacing an unavailable hd...

2014-06-16 Thread John Doe via openindiana-discuss
Hi,

I just got my first failed HD under OI and wanted to be sure I don't crash 
everything when replacing it...

# zpool status -x
  pool: VOLUME
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Jan 25 14:09:17 2013
config:

    NAME        STATE     READ WRITE CKSUM
    VOLUME      DEGRADED     0     0     0
      raidz2-0  DEGRADED     0     0     0
        c5t0d0  ONLINE       0     0     0
        c5t1d0  ONLINE       0     0     0
        c5t2d0  UNAVAIL     12    20     0  cannot open
        c5t3d0  ONLINE       0     0     0
        c5t4d0  ONLINE       0     0     0
        c5t5d0  ONLINE       0     0     0

Do I just need to physically replace the HD and then do a:
  zpool replace VOLUME c5t2d0
or do I need to encapsulate that with some zpool offline/online, "cfgadm -c 
unconfigure/configure", followed by a "zpool clear"...?
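
For reference, the fuller of those two variants could be laid out as in the
sketch below. It is a dry run that only prints the commands, and the ordering
is an assumption rather than something confirmed in this thread. POOL and DISK
come from the zpool status output above; AP_ID is a guess that has to be
checked against `cfgadm -al` before anything is run for real.

```shell
#!/bin/sh
# Dry-run sketch of a hot-swap sequence; it only prints the commands.
# POOL and DISK come from the zpool status output above. AP_ID is an
# assumption: look up the real attachment point with `cfgadm -al`.
POOL="${POOL:-VOLUME}"
DISK="${DISK:-c5t2d0}"
AP_ID="${AP_ID:-sata0/2}"

replace_plan() {
    echo "zpool offline $POOL $DISK"
    echo "cfgadm -c unconfigure $AP_ID"
    echo "# physically swap the drive here"
    echo "cfgadm -c configure $AP_ID"
    echo "zpool replace $POOL $DISK"
    echo "zpool status $POOL"
}

replace_plan
```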

Thx,
JD
