Hi Michelle,
For user home files you will need a backup anyway. For system
consistency you can use `pkg fix` to restore the system image to a known
state in a new boot environment.
Greetings
Till
On 05.08.21 05:14, Toomas Soome via openindiana-discuss wrote:
On 5. Aug 2021, at 11:11, Michelle <miche...@msknight.com> wrote:
I removed the drive in order to do a backup before I start messing around
with things, which is why it isn't in the iostat. The backup will take
probably until early evening.
This is what happened from messages around that time. Almost looks like
whatever happened, it rebooted.
From those, I’d say, you need to replace that disk.
rgds,
toomas
Aug 5 01:55:01 jaguar smbd[601]: [ID 617204 daemon.error] Can't get
SID for ID=0 type=1, status=-9977
Aug 5 01:58:00 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 01:58:00 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 01:58:00 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 01:58:00 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 01:58:09 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 01:58:09 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 01:58:09 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 01:58:09 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:15 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:15 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:15 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:16 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:20 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:20 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:20 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:20 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:24 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:24 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:24 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:24 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:24 jaguar ahci: [ID 811322 kern.info] NOTICE: ahci0:
ahci_tran_reset_dport port 3 reset device
Aug 5 02:00:29 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:29 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:29 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:29 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:34 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:34 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:34 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:34 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:38 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
ahci port 3 has task file error
Aug 5 02:00:38 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
ahci port 3 is trying to do error recovery
Aug 5 02:00:38 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
ahci port 3 task_file_status = 0x4041
Aug 5 02:00:38 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
error recovery for port 3 succeed
Aug 5 02:00:53 jaguar fmd: [ID 377184 daemon.error] SUNW-MSG-ID:
ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 5 02:00:53 jaguar EVENT-TIME: Thu Aug 5 02:00:53 UTC 2021
Aug 5 02:00:53 jaguar PLATFORM: ProLiant-MicroServer, CSN: 5C7351P4L9,
HOSTNAME: jaguar
Aug 5 02:00:53 jaguar SOURCE: zfs-diagnosis, REV: 1.0
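For reference, the task_file_status value in the ahci warnings can be split into its two bytes. This is a minimal sketch in Python, assuming the value is the AHCI port task file data register (low byte = ATA status, next byte = ATA error) and using the usual ATA bit names. The function name decode_tfd is made up for illustration and is not part of any illumos tool.

```python
# Bit names follow the ATA status and error registers. Treat the mapping as
# an assumption: some bits are command-specific on real devices.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_tfd(value):
    """Split a task file data value into (status flags, error flags)."""
    status = value & 0xFF          # bits 7:0, ATA status register
    error = (value >> 8) & 0xFF    # bits 15:8, ATA error register
    status_flags = [name for bit, name in STATUS_BITS.items() if status & bit]
    error_flags = [name for bit, name in ERROR_BITS.items() if error & bit]
    return status_flags, error_flags


print(decode_tfd(0x4041))
```

Under those assumptions 0x4041 reads as status DRDY and ERR with error UNC, i.e. the drive itself reporting an uncorrectable data error, which fits the advice to replace the disk.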
On Thu, 2021-08-05 at 11:03 +0300, Toomas Soome via openindiana-discuss
wrote:
On 5. Aug 2021, at 10:52, Michelle <miche...@msknight.com> wrote:
Thanks for this. So I'm possibly better off rolling back the OS
snapshot after my backup has finished?
Maybe, maybe not. First of all, I have no idea to what point the
rollback would be.
Secondly, the system has seen some errors. At this time the problem
is that it does not tell us whether those were checksum errors or
something else, and it seems to me it is something else.
And this is why: if you look at your zpool output, you see a report
about c6t3d0, but the iostat -En output below does not include c6t3d0.
It seems to be missing.
What do you get from 'iostat -En c6t3d0'?
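The comparison described above can be scripted. A rough sketch in Python, assuming cXtYdZ-style device names at the start of a line; the function names are hypothetical and the sample text is trimmed from the two outputs posted in this thread.

```python
import re

# Matches illumos-style disk names such as c5d1 or c6t3d0 at the start of a line.
DEVICE = re.compile(r"^\s*(c\d+(?:t\d+)?d\d+)\b")


def devices_in(text):
    """Collect device names that begin a line of command output."""
    found = set()
    for line in text.splitlines():
        match = DEVICE.match(line)
        if match:
            found.add(match.group(1))
    return found


def missing_from_iostat(zpool_status, iostat_en):
    """Devices the pool refers to that iostat -En does not list."""
    return sorted(devices_in(zpool_status) - devices_in(iostat_en))


# Trimmed samples of the zpool status and iostat -En output from this thread.
ZPOOL_STATUS = """\
config:
        NAME        STATE     READ WRITE CKSUM
        jaguar      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t3d0  FAULTED      0     0     0  too many errors
"""
IOSTAT_EN = """\
c5d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c6t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c6t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c6t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
"""

print(missing_from_iostat(ZPOOL_STATUS, IOSTAT_EN))
```

A device that the pool still references but the OS no longer enumerates points at a transport or device problem rather than at checksum errors alone.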
Also, it would be a good idea to check /var/adm/messages: are there any
SATA or I/O related messages around August 5, 02:00?
FMA definitely has recorded an issue about the pool, so there must be
something going on.
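One way to do that check is to count ahci task file errors per port for a given day and set of hours. A small Python sketch, assuming every syslog entry sits on a single line in /var/adm/messages (mail wrapping aside); the function name and the sample list are illustrative, with the sample entries taken from the messages quoted in this thread.

```python
import re
from collections import Counter

# One syslog entry per line is assumed; \s+ tolerates "Aug 5" and "Aug  5".
LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+) (?P<hour>\d\d):\d\d:\d\d \S+ ahci: "
    r".*ahci port (?P<port>\d+) has task file error"
)


def task_file_errors(lines, month, day, hours):
    """Count 'has task file error' warnings per AHCI port in a time window."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m["month"] == month and int(m["day"]) == day \
                and int(m["hour"]) in hours:
            counts[int(m["port"])] += 1
    return counts


SAMPLE = [
    "Aug 5 01:55:01 jaguar smbd[601]: [ID 617204 daemon.error] "
    "Can't get SID for ID=0 type=1, status=-9977",
    "Aug 5 01:58:00 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: "
    "ahci port 3 has task file error",
    "Aug 5 01:58:00 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0: "
    "ahci port 3 is trying to do error recovery",
    "Aug 5 02:00:15 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0: "
    "ahci port 3 has task file error",
]

print(task_file_errors(SAMPLE, "Aug", 5, {1, 2}))
```

To run it against the real file, pass an open file object for /var/adm/messages instead of SAMPLE.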
rgds,
toomas
I have removed the drive for the moment, and am running a backup.
Just in case :-)
mich@jaguar:~$ iostat -En
c5d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: INTEL SSDSA2M04 Revision: Serial No: CVGB949301PC040
Size: 40.02GB <40019116032 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c6t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD40EZRZ-00G Revision: 0A80
Serial No: WD-WCC7K5UK24LJ
Size: 4000.79GB <4000787030016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c6t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD60EFRX-68L Revision: 0A82
Serial No: WD-WX21DA84EH0F
Size: 6001.18GB <6001175126016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c6t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD60EFRX-68L Revision: 0A82
Serial No: WD-WX51DB880RJ4
Size: 6001.18GB <6001175126016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 05 02:00:53 c5934fd6-5f4b-409e-b0f8-8f44ea8f99c4  ZFS-8000-FD    Major
Host : jaguar
Platform : ProLiant-MicroServer Chassis_id : 5C7351P4L9
Product_sn :
Fault class : fault.fs.zfs.vdev.io
Affects : zfs://pool=jaguar/vdev=740c01ae0d3c3109
faulted and taken out of service
Problem in : zfs://pool=jaguar/vdev=740c01ae0d3c3109
faulted and taken out of service
Description : The number of I/O errors associated with a ZFS device
exceeded acceptable levels. Refer to
http://illumos.org/msg/ZFS-8000-FD for more information.
Response : The device has been offlined and marked as faulted. An
attempt will be made to activate a hot spare if available.
Impact : Fault tolerance of the pool may be compromised.
Action : Run 'zpool status -x' and replace the bad device.
On Thu, 2021-08-05 at 10:22 +0300, Toomas Soome via openindiana-discuss
wrote:
On 5. Aug 2021, at 09:35, Michelle <miche...@msknight.com> wrote:
Hi Folks,
About a month ago I updated my Hipster...
SunOS jaguar 5.11 illumos-ca706442e6 i86pc i386 i86pc
This morning it was absolutely crawling. Couldn't even connect via SSH
and had to bounce the box.
It was reporting a drive as faulted, but didn't give any numbers...
everything was 0. I'm now not sure what happened and whether the drive
is good, or whether I should roll back the OS.
(and the drive, a WD Red 6TB (not shingled), went out of warranty a
week ago. How about that, eh?)
Grateful for any opinions please.
Thu 5 Aug 04:00:01 UTC 2021
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
lion  5.45T  5.28T   176G        -         -     4%    96%  1.00x  DEGRADED  -
pool: jaguar
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
scan: scrub in progress since Thu Aug 5 00:00:00 2021
        6.00T scanned at 428M/s, 5.02T issued at 358M/s, 7.90T total
        1M repaired, 63.59% done, 0 days 02:20:17 to go
config:
        NAME        STATE     READ WRITE CKSUM
        jaguar      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t3d0  FAULTED      0     0     0  too many errors  (repairing)
Can you post output from:
iostat -En
fmadm faulty
In any case, there definitely is a bug in error reporting: the
counters are zero while “too many errors” is reported.
rgds,
toomas
_______________________________________________
openindiana-discuss mailing list
openindiana-discuss@openindiana.org
https://openindiana.org/mailman/listinfo/openindiana-discuss