On Fri, Mar 6, 2015 at 10:48 AM, Robert Mustacchi <[email protected]> wrote:
> On 3/6/15 8:43 , Schweiss, Chip via illumos-discuss wrote:
> > I have two fairly new Haswell based servers running OmniOS. I have several
> > faults from both systems that I don't know what they are or what to do
> > about them.
> >
> > I am not seeing any issues related to these faults.
> >
> > Can anyone clarify what they are and what to do about them?
>
> We've received error reports that the system doesn't understand how to
> diagnose. Here, getting the actual ereports that were generated on the
> system and looking at them will shed more light on the problem and will
> allow us to better understand what's happening on the systems.
>
>
I'm not familiar with ereports. After some googling, I'm assuming you mean
the output from 'fmdump -eV'.

Here are the reports that correspond to the first event. If this is what you
were asking for, I'll dig out the rest of them.

Feb 27 2015 18:11:17.068478684 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0xe97c1b9f5a501401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,2f06@2,2
(end detector)
bdf = 0x12
device_id = 0x2f06
vendor_id = 0x8086
rev_id = 0x2
dev_type = 0x40
pcie_off = 0x90
pcix_off = 0x0
aer_off = 0x148
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x47
pci_bdg_sec_status = 0x2000
pci_bdg_ctrl = 0x3
pcie_status = 0x0
pcie_command = 0x27
pcie_dev_cap = 0x8001
pcie_adv_ctl = 0x0
pcie_ue_status = 0x0
pcie_ue_mask = 0x100000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x0
pcie_ce_mask = 0x0
pcie_rp_status = 0x0
pcie_rp_control = 0x0
pcie_adv_rp_status = 0x1
pcie_adv_rp_command = 0x7
pcie_adv_rp_ce_src_id = 0x600
pcie_adv_rp_ue_src_id = 0x0
remainder = 0x3
severity = 0x1
__ttl = 0x1
__tod = 0x54f107a5 0x414e6dc
Feb 27 2015 18:11:17.068509897 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0xe97c1ba6ebb01401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0
(end detector)
bdf = 0x400
device_id = 0x8724
vendor_id = 0x10b5
rev_id = 0xca
dev_type = 0x50
pcie_off = 0x68
pcix_off = 0x0
aer_off = 0xfb4
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x147
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x9
pcie_command = 0x37
pcie_dev_cap = 0x8004
pcie_adv_ctl = 0xbf
pcie_ue_status = 0x100000
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x2000
pcie_ce_mask = 0x0
remainder = 0x2
severity = 0x3
__ttl = 0x1
__tod = 0x54f107a5 0x41560c9
Feb 27 2015 18:11:17.068526093 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0xe97c1baaee901401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0/pci10b5,8724@1
(end detector)
bdf = 0x508
device_id = 0x8724
vendor_id = 0x10b5
rev_id = 0xca
dev_type = 0x60
pcie_off = 0x68
pcix_off = 0x0
aer_off = 0xfb4
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x147
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x0
pcie_command = 0x37
pcie_dev_cap = 0x8004
pcie_adv_ctl = 0xbf
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x462030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x0
pcie_ce_mask = 0x0
remainder = 0x1
severity = 0x1
__ttl = 0x1
__tod = 0x54f107a5 0x415a00d
Feb 27 2015 18:11:17.068541905 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0xe97c1baedbc01401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0/pci10b5,8724@1/pci1000,3070@0
(end detector)
bdf = 0x600
device_id = 0x87
vendor_id = 0x1000
rev_id = 0x5
dev_type = 0x0
pcie_off = 0x68
pcix_off = 0x0
aer_off = 0x100
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x146
pcie_status = 0x1
pcie_command = 0x2037
pcie_dev_cap = 0x10008025
pcie_adv_ctl = 0x0
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x462031
pcie_ue_hdr0 = 0x4000001
pcie_ue_hdr1 = 0x122003
pcie_ue_hdr2 = 0x6010000
pcie_ue_hdr3 = 0xb70d8120
pcie_ce_status = 0x1
pcie_ce_mask = 0x0
remainder = 0x0
severity = 0x3
__ttl = 0x1
__tod = 0x54f107a5 0x415ddd1
Feb 27 2015 18:11:17.068478684 ereport.io.pciex.rc.ce-msg
nvlist version: 0
ena = 0xe97c1b9f5a501401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,2f06@2,2
(end detector)
class = ereport.io.pciex.rc.ce-msg
rc-status = 0x1
source-id = 0x600
source-valid = 1
__ttl = 0x1
__tod = 0x54f107a5 0x414e6dc
Feb 27 2015 18:11:17.068509897 ereport.io.pciex.a-nonfatal
nvlist version: 0
ena = 0xe97c1ba6ebb01401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0
(end detector)
class = ereport.io.pciex.a-nonfatal
dev-status = 0x9
ce-status = 0x2000
__ttl = 0x1
__tod = 0x54f107a5 0x41560c9
Feb 27 2015 18:11:17.068509897 ereport.io.pciex.rc.ce-msg
nvlist version: 0
ena = 0xe97c1ba6ebb01401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0
(end detector)
class = ereport.io.pciex.rc.ce-msg
rc-status = 0x1
source-id = 0x400
source-valid = 1
__ttl = 0x1
__tod = 0x54f107a5 0x41560c9
Feb 27 2015 18:11:17.068541905 ereport.io.pciex.pl.re
nvlist version: 0
ena = 0xe97c1baedbc01401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0/pci10b5,8724@1/pci1000,3070@0
(end detector)
class = ereport.io.pciex.pl.re
dev-status = 0x1
ce-status = 0x1
__ttl = 0x1
__tod = 0x54f107a5 0x415ddd1
Feb 27 2015 18:11:17.068541905 ereport.io.pciex.rc.ce-msg
nvlist version: 0
ena = 0xe97c1baedbc01401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0
(end detector)
class = ereport.io.pciex.rc.ce-msg
rc-status = 0x1
source-id = 0x600
source-valid = 1
__ttl = 0x1
__tod = 0x54f107a5 0x415ddd1
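
In case it helps anyone else reading these, here is a rough Python sketch of
how I am interpreting the bdf / source-id and ce-status values above. It
assumes the standard PCIe encodings (a 16-bit bus/device/function ID and the
AER Correctable Error Status bit layout), not anything taken from the illumos
source; the helper names decode_bdf and decode_ce_status are made up for
illustration.

```python
# Sketch only: decodes a few fields from the ereports above, assuming the
# standard PCIe encodings (not taken from the illumos source).

# AER Correctable Error Status bits as defined by the PCIe base specification.
CE_STATUS_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode_bdf(value):
    """Split a 16-bit bus/device/function ID into (bus, device, function)."""
    return (value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x7


def decode_ce_status(value):
    """Return the names of the known correctable-error bits set in value."""
    return [name for bit, name in sorted(CE_STATUS_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    # bdf values from the four ereport.io.pci.fabric reports
    for bdf in (0x12, 0x400, 0x508, 0x600):
        bus, dev, func = decode_bdf(bdf)
        print(f"bdf {bdf:#06x} -> bus {bus}, device {dev}, function {func}")
    # ce-status values from the a-nonfatal and pl.re reports
    for status in (0x2000, 0x1):
        print(f"ce-status {status:#x} -> {decode_ce_status(status)}")
```

If I am reading this right, source-id 0x600 is bus 6, device 0, function 0,
which matches the bdf reported for pci1000,3070@0, and its ce-status of 0x1
would be a Receiver Error. Likewise 0x400 matches pci10b5,8724@0, whose
ce-status of 0x2000 would be an Advisory Non-Fatal Error.
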
> Robert
>
> > From host #1:
> >
> > --------------- ------------------------------------ -------------- ---------
> > TIME            EVENT-ID                             MSG-ID         SEVERITY
> > --------------- ------------------------------------ -------------- ---------
> > Feb 27 18:11:19 3951b062-71f1-cccc-9fea-bbdc354f2603 SUNOS-8000-J0  Major
> >
> > Host : mir-zfs01
> > Platform : SYS-6028U-TR4+ Chassis_id : S16512424A07095
> > Product_sn :
> >
> > Fault class : defect.sunos.eft.unexpected_telemetry 50%
> > fault.sunos.eft.unexpected_telemetry 50%
> > Problem in : dev:////pci@0,0
> > faulted and taken out of service
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> > devices for which it was unable to perform a diagnosis -
> > Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> > information. Refer to http://illumos.org/msg/SUNOS-8000-J0
> > for more information.
> >
> > Response : Error reports have been logged for examination by Sun.
> >
> > Impact : Automated diagnosis and response for these events will not
> > occur.
> >
> > Action : Ensure that the latest Solaris Kernel and Predictive
> > Self-Healing (PSH) patches are installed.
> >
> > --------------- ------------------------------------ -------------- ---------
> > TIME            EVENT-ID                             MSG-ID         SEVERITY
> > --------------- ------------------------------------ -------------- ---------
> > Jan 15 21:53:07 2cb9f0e0-dd7f-c912-dd22-bbaa7a4ebf6c SUNOS-8000-J0  Major
> >
> > Host : mir-zfs01
> > Platform : SYS-6028U-TR4+ Chassis_id : S16512424A07095
> > Product_sn :
> >
> > Fault class : defect.sunos.eft.unexpected_telemetry max 25%
> > fault.sunos.eft.unexpected_telemetry max 25%
> > Affects : cpu:///cpuid=6
> > cpu:///cpuid=16
> > faulted but still in service
> > FRU :
> > hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs01:chassis-id=S16512424A07095/motherboard=0/chip=0 25%
> > hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs01:chassis-id=S16512424A07095/motherboard=0/chip=1 25%
> > faulty
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> > devices for which it was unable to perform a diagnosis -
> > Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> > information. Refer to http://illumos.org/msg/SUNOS-8000-J0
> > for more information.
> >
> > Response : Error reports have been logged for examination by Sun.
> >
> > Impact : Automated diagnosis and response for these events will not
> > occur.
> >
> > Action : Ensure that the latest Solaris Kernel and Predictive
> > Self-Healing (PSH) patches are installed.
> >
> >
> > From host #2:
> >
> > --------------- ------------------------------------ -------------- ---------
> > TIME            EVENT-ID                             MSG-ID         SEVERITY
> > --------------- ------------------------------------ -------------- ---------
> > Jan 31 12:45:54 0efc914b-7cc5-c4df-fd11-9be172d4931a SUNOS-8000-J0  Major
> >
> > Host : mir-zfs02
> > Platform : SYS-6028U-TR4+ Chassis_id : S16512424A07109
> > Product_sn :
> >
> > Fault class : defect.sunos.eft.unexpected_telemetry 50%
> > fault.sunos.eft.unexpected_telemetry 50%
> > Problem in : dev:////pci@74,0
> > faulted and taken out of service
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> > devices for which it was unable to perform a diagnosis -
> > Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> > information. Refer to http://illumos.org/msg/SUNOS-8000-J0
> > for more information.
> >
> > Response : Error reports have been logged for examination by Sun.
> >
> > Impact : Automated diagnosis and response for these events will not
> > occur.
> >
> > Action : Ensure that the latest Solaris Kernel and Predictive
> > Self-Healing (PSH) patches are installed.
> >
> > --------------- ------------------------------------ -------------- ---------
> > TIME            EVENT-ID                             MSG-ID         SEVERITY
> > --------------- ------------------------------------ -------------- ---------
> > Dec 04 15:22:09 6020baed-5ab6-cdb0-95c0-ed3f9fde1172 SUNOS-8000-J0  Major
> >
> > Host : mir-zfs02
> > Platform : SYS-6028U-TR4+ Chassis_id : S16512424A07109
> > Product_sn :
> >
> > Fault class : fault.sunos.eft.unexpected_telemetry max 25%
> > defect.sunos.eft.unexpected_telemetry max 25%
> > Affects : cpu:///cpuid=41
> > ok and in service
> > cpu:///cpuid=26
> > faulted but still in service
> > FRU :
> > hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs02:chassis-id=S16512424A07109/motherboard=0/chip=1 25%
> > acquitted
> > hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs02:chassis-id=S16512424A07109/motherboard=0/chip=0 25%
> > faulty
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> > devices for which it was unable to perform a diagnosis -
> > Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> > information. Refer to http://illumos.org/msg/SUNOS-8000-J0
> > for more information.
> >
> > Response : Error reports have been logged for examination by Sun.
> >
> > Impact : Automated diagnosis and response for these events will not
> > occur.
> >
> > Action : Ensure that the latest Solaris Kernel and Predictive
> > Self-Healing (PSH) patches are installed.
> >
> > --------------- ------------------------------------ -------------- ---------
> > TIME            EVENT-ID                             MSG-ID         SEVERITY
> > --------------- ------------------------------------ -------------- ---------
> > Dec 04 18:55:38 eadd4984-7c7a-490b-f6e1-b0f936b09ab7 SUNOS-8000-J0  Major
> >
> > Host : mir-zfs02
> > Platform : SYS-6028U-TR4+ Chassis_id : S16512424A07109
> > Product_sn :
> >
> > Fault class : fault.sunos.eft.unexpected_telemetry max 25%
> > defect.sunos.eft.unexpected_telemetry max 25%
> > Affects : cpu:///cpuid=6
> > cpu:///cpuid=18
> > faulted but still in service
> > FRU :
> > hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs02:chassis-id=S16512424A07109/motherboard=0/chip=0 25%
> > hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs02:chassis-id=S16512424A07109/motherboard=0/chip=1 25%
> > faulty
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> > devices for which it was unable to perform a diagnosis -
> > Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> > information. Refer to http://illumos.org/msg/SUNOS-8000-J0
> > for more information.
> >
> > Response : Error reports have been logged for examination by Sun.
> >
> > Impact : Automated diagnosis and response for these events will not
> > occur.
> >
> > Action : Ensure that the latest Solaris Kernel and Predictive
> > Self-Healing (PSH) patches are installed.
> >
> >
> >
> > -------------------------------------------
> > illumos-discuss
> > Archives: https://www.listbox.com/member/archive/182180/=now
> > RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175748-6cf9d6b5
> > Modify Your Subscription: https://www.listbox.com/member/?&
> > Powered by Listbox: http://www.listbox.com
> >
>
>