Hi Michelle

For user home files you will need a backup anyway. For system consistency, you can use `pkg fix` to restore the system image to a known state in a new boot environment.

Greetings
Till

On 05.08.21 05:14, Toomas Soome via openindiana-discuss wrote:


On 5. Aug 2021, at 11:11, Michelle <miche...@msknight.com> wrote:

I removed the drive in order to do a backup before I start messing around
with things, which is why it isn't in the iostat output. The backup will
probably take until early evening.

This is what the messages file shows around that time. It almost looks as
if, whatever happened, it rebooted.


From those, I'd say you need to replace that disk.

rgds,
toomas


Aug  5 01:55:01 jaguar smbd[601]: [ID 617204 daemon.error] Can't get SID for ID=0 type=1, status=-9977
Aug  5 01:58:00 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 01:58:00 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery
Aug  5 01:58:00 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Aug  5 01:58:00 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Aug  5 01:58:09 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 01:58:09 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery
Aug  5 01:58:09 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Aug  5 01:58:09 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Aug  5 02:00:15 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 02:00:15 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery
Aug  5 02:00:15 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Aug  5 02:00:16 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Aug  5 02:00:20 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 02:00:20 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery
Aug  5 02:00:20 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Aug  5 02:00:20 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Aug  5 02:00:24 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 02:00:24 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery
Aug  5 02:00:24 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Aug  5 02:00:24 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Aug  5 02:00:24 jaguar ahci: [ID 811322 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 3 reset device
Aug  5 02:00:29 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 02:00:29 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery
Aug  5 02:00:29 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Aug  5 02:00:29 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Aug  5 02:00:34 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 02:00:34 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery
Aug  5 02:00:34 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Aug  5 02:00:34 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Aug  5 02:00:38 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 02:00:38 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 3 is trying to do error recovery
Aug  5 02:00:38 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Aug  5 02:00:38 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 3 succeed
Aug  5 02:00:53 jaguar fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Aug  5 02:00:53 jaguar EVENT-TIME: Thu Aug  5 02:00:53 UTC 2021
Aug  5 02:00:53 jaguar PLATFORM: ProLiant-MicroServer, CSN: 5C7351P4L9, HOSTNAME: jaguar
Aug  5 02:00:53 jaguar SOURCE: zfs-diagnosis, REV: 1.0
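In the excerpt above, port 3 logs eight task file errors between 01:58:00 and 02:00:38, each with task_file_status = 0x4041, plus a device reset at 02:00:24, before fmd faults the vdev at 02:00:53. The Python sketch below counts those errors per port and decodes the status value. It assumes the printed value is the raw AHCI PxTFD register, with the ATA status register in bits 7:0 and the ATA error register in bits 15:8; that layout comes from the AHCI/ATA specifications, not from this thread. Under that assumption, 0x4041 decodes to DRDY+ERR with UNC (uncorrectable data error) set, which would be consistent with a media problem on the drive. The helper names are hypothetical.

```python
import re
from collections import Counter

# A few unwrapped lines from the excerpt above; in practice the text would be
# read from /var/adm/messages.
LOG = """\
Aug  5 01:58:00 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 01:58:00 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
Aug  5 01:58:09 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 3 has task file error
Aug  5 01:58:09 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 3 task_file_status = 0x4041
"""

TFE_RE = re.compile(r"ahci port (\d+) has task file error")
TFS_RE = re.compile(r"ahci port (\d+) task_file_status = (0x[0-9a-fA-F]+)")

# Assumed layout: low byte = ATA status register, high byte = ATA error register.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def count_task_file_errors(text):
    """Count 'has task file error' messages per AHCI port number."""
    return Counter(int(m.group(1)) for m in TFE_RE.finditer(text))


def decode_tfd(value):
    """Return (status bit names, error bit names) for a task file value."""
    status, error = value & 0xFF, (value >> 8) & 0xFF
    return (
        [name for bit, name in STATUS_BITS.items() if status & bit],
        [name for bit, name in ERROR_BITS.items() if error & bit],
    )


if __name__ == "__main__":
    print(dict(count_task_file_errors(LOG)))
    for port, raw in TFS_RE.findall(LOG):
        print("port", port, raw, decode_tfd(int(raw, 16)))
```

Only a handful of bits are named here; any other bits set in the value are simply ignored by the sketch.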


On Thu, 2021-08-05 at 11:03 +0300, Toomas Soome via openindiana-discuss wrote:
On 5. Aug 2021, at 10:52, Michelle <miche...@msknight.com> wrote:

Thanks for this. So I'm possibly better off rolling back the OS
snapshot after my backup has finished?

Maybe, maybe not. First of all, I have no idea to what point the
rollback would go.

Secondly, the system has seen some errors. At this point the fault
report does not tell us whether those were checksum errors or something
else, and it seems to me it is something else.

And this is why: if you look at your zpool output, you see a report
about c6t3d0, but the iostat -En output below does not include c6t3d0.
It seems to be missing.

What do you get from 'iostat -En c6t3d0'?

Also, it would be a good idea to check /var/adm/messages: are there any
SATA or I/O related messages around 02:00 on August 5?

FMA has definitely recorded an issue with the pool, so something must
be going on.

rgds,
toomas

I have removed the drive for the moment and am running a backup. Just
in case :-)

mich@jaguar:~$ iostat -En
c5d1             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: INTEL SSDSA2M04 Revision:  Serial No: CVGB949301PC040
Size: 40.02GB <40019116032 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c6t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD40EZRZ-00G Revision: 0A80 Serial No: WD-WCC7K5UK24LJ
Size: 4000.79GB <4000787030016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c6t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD60EFRX-68L Revision: 0A82 Serial No: WD-WX21DA84EH0F
Size: 6001.18GB <6001175126016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c6t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD60EFRX-68L Revision: 0A82 Serial No: WD-WX51DB880RJ4
Size: 6001.18GB <6001175126016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
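Toomas's observation that c6t3d0 appears in the zpool output but not in iostat -En can be checked mechanically with a set difference. A minimal Python sketch, assuming both outputs have been captured as text; the helper names are hypothetical, and the pattern only handles the cXtYdZ and cXdY disk names seen in this thread.

```python
import re

# Excerpts of the two outputs quoted in this thread.
IOSTAT = """\
c5d1             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c6t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c6t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c6t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
"""

ZPOOL_CONFIG = """\
        NAME        STATE     READ WRITE CKSUM
        jaguar      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t3d0  FAULTED      0     0     0  too many errors  (repairing)
"""

# Disk names such as c6t3d0 (target form) or c5d1 (legacy ATA form) at line start.
DISK_RE = re.compile(r"^[ \t]*(c\d+(?:t\d+)?d\d+)\b", re.MULTILINE)


def disks(text):
    """Return the set of disk names that begin a line in the given output."""
    return set(DISK_RE.findall(text))


def missing_from_iostat(zpool_text, iostat_text):
    """Disks the pool config references that iostat no longer reports."""
    return sorted(disks(zpool_text) - disks(iostat_text))


if __name__ == "__main__":
    print(missing_from_iostat(ZPOOL_CONFIG, IOSTAT))
```

Here the difference is just c6t3d0, the disk that was pulled for the backup.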


--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 05 02:00:53 c5934fd6-5f4b-409e-b0f8-8f44ea8f99c4  ZFS-8000-FD    Major

Host        : jaguar
Platform    : ProLiant-MicroServer      Chassis_id  : 5C7351P4L9
Product_sn  :

Fault class : fault.fs.zfs.vdev.io
Affects     : zfs://pool=jaguar/vdev=740c01ae0d3c3109
                 faulted and taken out of service
Problem in  : zfs://pool=jaguar/vdev=740c01ae0d3c3109
                 faulted and taken out of service

Description : The number of I/O errors associated with a ZFS device exceeded
                    acceptable levels.  Refer to
              http://illumos.org/msg/ZFS-8000-FD for more information.

Response    : The device has been offlined and marked as faulted.  An attempt
                    will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.



On Thu, 2021-08-05 at 10:22 +0300, Toomas Soome via openindiana-discuss
wrote:
On 5. Aug 2021, at 09:35, Michelle <miche...@msknight.com> wrote:

Hi Folks,

About a month ago I updated my Hipster...
SunOS jaguar 5.11 illumos-ca706442e6 i86pc i386 i86pc

This morning it was absolutely crawling. Couldn't even connect via SSH
and had to bounce the box.

It was reporting a drive as faulted, but didn't give any numbers...
everything was 0. I'm now not sure what happened and whether the drive
is good, or whether I should roll back the OS.

(And the drive, a WD Red 6TB (not shingled), went out of warranty a
week ago. How about that, eh?)

Grateful for any opinions please.

Thu  5 Aug 04:00:01 UTC 2021
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
lion  5.45T  5.28T   176G        -         -     4%    96%  1.00x  DEGRADED  -
pool: jaguar
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
scan: scrub in progress since Thu Aug  5 00:00:00 2021
        6.00T scanned at 428M/s, 5.02T issued at 358M/s, 7.90T total
        1M repaired, 63.59% done, 0 days 02:20:17 to go
config:
        NAME        STATE     READ WRITE CKSUM
        jaguar      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t3d0  FAULTED      0     0     0  too many errors  (repairing)


Can you post the output from:
iostat -En
fmadm faulty

In any case, there is definitely a bug in the error reporting: the
counters are zero while "too many errors" is reported.
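The reporting inconsistency described here (a leaf vdev marked FAULTED with "too many errors" while READ, WRITE and CKSUM are all 0) could be flagged automatically by a monitoring script. A minimal Python sketch, assuming the five-column config layout of the zpool status output quoted in this thread; the function name is hypothetical and the check is limited to the FAULTED state.

```python
# Config section of the 'zpool status' output quoted in this thread.
CONFIG = """\
        NAME        STATE     READ WRITE CKSUM
        jaguar      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t3d0  FAULTED      0     0     0  too many errors  (repairing)
"""


def zero_counter_faults(config_text):
    """Return (name, state) for FAULTED vdevs whose error counters are all zero."""
    suspicious = []
    for line in config_text.splitlines():
        fields = line.split()
        # Skip the header row and anything shorter than NAME STATE READ WRITE CKSUM.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, state, counters = fields[0], fields[1], fields[2:5]
        if state == "FAULTED" and all(c == "0" for c in counters):
            suspicious.append((name, state))
    return suspicious


if __name__ == "__main__":
    print(zero_counter_faults(CONFIG))
```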

rgds,
toomas
_______________________________________________
openindiana-discuss mailing list
openindiana-discuss@openindiana.org
https://openindiana.org/mailman/listinfo/openindiana-discuss