Looks like PCIe fabric errors to me.  I don’t have a PCIe manual in front of me 
to decode the actual errors, but my suspicion is that you have a flaky bus.  
I’d power everything off, then check and reseat all the PCIe cards.  That 
said, the reports seem to implicate a PLX PCIe switch with an LSI SAS 
controller on the far end.  (Is it an add-in card?  If so, reseat it.)  If the 
switch itself is connected via cabling, a mezzanine card, or some such, check 
and reseat those too.
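
One way to see which device the root port is pointing at is to split the bdf and source-id values from the quoted reports into bus, device and function.  The following is a minimal Python sketch, assuming the usual PCI requester ID layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0).  The name decode_bdf and the vendor labels are illustrative only and are not part of any illumos tool.

```python
def decode_bdf(bdf):
    """Split a 16-bit PCI requester ID into (bus, device, function)."""
    bus = (bdf >> 8) & 0xFF   # bits 15:8
    dev = (bdf >> 3) & 0x1F   # bits 7:3
    func = bdf & 0x07         # bits 2:0
    return bus, dev, func

# bdf values copied from the four ereport.io.pci.fabric reports quoted below.
# Labels are a reading of the vendor IDs (0x8086 Intel, 0x10b5 PLX, 0x1000 LSI).
REPORTS = [
    (0x12,  "pci8086,2f06@2,2  Intel root port"),
    (0x400, "pci10b5,8724@0    PLX switch, upstream port"),
    (0x508, "pci10b5,8724@1    PLX switch, downstream port"),
    (0x600, "pci1000,3070@0    LSI SAS controller"),
]

for bdf, label in REPORTS:
    bus, dev, func = decode_bdf(bdf)
    print("0x{:03x} -> bus {} dev {} func {}   {}".format(bdf, bus, dev, func, label))
```

Read this way, the source-id of 0x600 in the rc.ce-msg reports is bus 6, device 0, function 0, which is the LSI controller behind the switch.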

Failing that, I’d contact the system vendor.  

If someone here has access to the PCIe specifications, they can try to decode 
the various ue (uncorrectable) and ce (correctable) error register values for 
you.
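
As a rough illustration of that decoding, the sketch below maps the pcie_ce_status and pcie_ue_status values from the quoted reports to error names.  It assumes the bit assignments of the PCIe Advanced Error Reporting (AER) status registers as commonly documented.  The tables and the decode helper are hypothetical names written for this example, and the bit names should be checked against the specification before drawing conclusions.

```python
# Assumed AER Correctable Error Status bit names (verify against the PCIe spec).
CE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

# Assumed AER Uncorrectable Error Status bit names (verify against the PCIe spec).
UE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}

def decode(value, names):
    """Return the names of all bits set in a 32-bit status register value."""
    found = []
    for bit in range(32):
        if value & (1 << bit):
            found.append(names.get(bit, "bit %d (not in table)" % bit))
    return found

# Non-zero status values taken from the quoted ereports.
print("switch ce_status 0x2000   :", decode(0x2000, CE_BITS))
print("switch ue_status 0x100000 :", decode(0x100000, UE_BITS))
print("LSI    ce_status 0x1      :", decode(0x1, CE_BITS))
```

Under those assumptions the switch's upstream port logged an Advisory Non-Fatal Error together with an Unsupported Request, and the LSI controller logged a Receiver Error.  That lines up with the ereport.io.pciex.a-nonfatal and ereport.io.pciex.pl.re classes in the quoted reports.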

        - Garrett


> On Mar 6, 2015, at 11:45 AM, Schweiss, Chip via illumos-discuss 
> <[email protected]> wrote:
> 
> 
> 
> On Fri, Mar 6, 2015 at 10:48 AM, Robert Mustacchi <[email protected]> wrote:
> On 3/6/15 8:43 , Schweiss, Chip via illumos-discuss wrote:
> > I have two fairly new Haswell based servers running OmniOS.  I have several
> > faults from both systems that I don't know what they are or what to do
> > about them.
> >
> > I am not seeing any issues related to these faults.
> >
> > Can anyone clarify what they are and what to do about them?
> 
> We've received error reports that the system doesn't understand how to
> diagnose. Here, getting the actual ereports that were generated on the
> system and looking at them will shed more light on the problem and will
> allow us to better understand what's happening on the systems.
> 
> 
> I'm not familiar with ereports.  After some googling, I'm assuming you mean 
> the output from 'fmdump -eV'.
> 
> Here are the reports that correspond to the first event.  If this is what you 
> were asking for, I'll dig out the rest of them.
> 
> Feb 27 2015 18:11:17.068478684 ereport.io.pci.fabric
> nvlist version: 0
>         class = ereport.io.pci.fabric
>         ena = 0xe97c1b9f5a501401
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = /pci@0,0/pci8086,2f06@2,2
>         (end detector)
> 
>         bdf = 0x12
>         device_id = 0x2f06
>         vendor_id = 0x8086
>         rev_id = 0x2
>         dev_type = 0x40
>         pcie_off = 0x90
>         pcix_off = 0x0
>         aer_off = 0x148
>         ecc_ver = 0x0
>         pci_status = 0x10
>         pci_command = 0x47
>         pci_bdg_sec_status = 0x2000
>         pci_bdg_ctrl = 0x3
>         pcie_status = 0x0
>         pcie_command = 0x27
>         pcie_dev_cap = 0x8001
>         pcie_adv_ctl = 0x0
>         pcie_ue_status = 0x0
>         pcie_ue_mask = 0x100000
>         pcie_ue_sev = 0x62030
>         pcie_ue_hdr0 = 0x0
>         pcie_ue_hdr1 = 0x0
>         pcie_ue_hdr2 = 0x0
>         pcie_ue_hdr3 = 0x0
>         pcie_ce_status = 0x0
>         pcie_ce_mask = 0x0
>         pcie_rp_status = 0x0
>         pcie_rp_control = 0x0
>         pcie_adv_rp_status = 0x1
>         pcie_adv_rp_command = 0x7
>         pcie_adv_rp_ce_src_id = 0x600
>         pcie_adv_rp_ue_src_id = 0x0
>         remainder = 0x3
>         severity = 0x1
>         __ttl = 0x1
>         __tod = 0x54f107a5 0x414e6dc
> 
> Feb 27 2015 18:11:17.068509897 ereport.io.pci.fabric
> nvlist version: 0
>         class = ereport.io.pci.fabric
>         ena = 0xe97c1ba6ebb01401
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0
>         (end detector)
> 
>         bdf = 0x400
>         device_id = 0x8724
>         vendor_id = 0x10b5
>         rev_id = 0xca
>         dev_type = 0x50
>         pcie_off = 0x68
>         pcix_off = 0x0
>         aer_off = 0xfb4
>         ecc_ver = 0x0
>         pci_status = 0x10
>         pci_command = 0x147
>         pci_bdg_sec_status = 0x0
>         pci_bdg_ctrl = 0x3
>         pcie_status = 0x9
>         pcie_command = 0x37
>         pcie_dev_cap = 0x8004
>         pcie_adv_ctl = 0xbf
>         pcie_ue_status = 0x100000
>         pcie_ue_mask = 0x180000
>         pcie_ue_sev = 0x62030
>         pcie_ue_hdr0 = 0x0
>         pcie_ue_hdr1 = 0x0
>         pcie_ue_hdr2 = 0x0
>         pcie_ue_hdr3 = 0x0
>         pcie_ce_status = 0x2000
>         pcie_ce_mask = 0x0
>         remainder = 0x2
>         severity = 0x3
>         __ttl = 0x1
>         __tod = 0x54f107a5 0x41560c9
> 
> Feb 27 2015 18:11:17.068526093 ereport.io.pci.fabric
> nvlist version: 0
>         class = ereport.io.pci.fabric
>         ena = 0xe97c1baaee901401
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0/pci10b5,8724@1
>         (end detector)
> 
>         bdf = 0x508
>         device_id = 0x8724
>         vendor_id = 0x10b5
>         rev_id = 0xca
>         dev_type = 0x60
>         pcie_off = 0x68
>         pcix_off = 0x0
>         aer_off = 0xfb4
>         ecc_ver = 0x0
>         pci_status = 0x10
>         pci_command = 0x147
>         pci_bdg_sec_status = 0x0
>         pci_bdg_ctrl = 0x3
>         pcie_status = 0x0
>         pcie_command = 0x37
>         pcie_dev_cap = 0x8004
>         pcie_adv_ctl = 0xbf
>         pcie_ue_status = 0x0
>         pcie_ue_mask = 0x180000
>         pcie_ue_sev = 0x462030
>         pcie_ue_hdr0 = 0x0
>         pcie_ue_hdr1 = 0x0
>         pcie_ue_hdr2 = 0x0
>         pcie_ue_hdr3 = 0x0
>         pcie_ce_status = 0x0
>         pcie_ce_mask = 0x0
>         remainder = 0x1
>         severity = 0x1
>         __ttl = 0x1
>         __tod = 0x54f107a5 0x415a00d
> 
> Feb 27 2015 18:11:17.068541905 ereport.io.pci.fabric
> nvlist version: 0
>         class = ereport.io.pci.fabric
>         ena = 0xe97c1baedbc01401
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0/pci10b5,8724@1/pci1000,3070@0
>         (end detector)
> 
>         bdf = 0x600
>         device_id = 0x87
>         vendor_id = 0x1000
>         rev_id = 0x5
>         dev_type = 0x0
>         pcie_off = 0x68
>         pcix_off = 0x0
>         aer_off = 0x100
>         ecc_ver = 0x0
>         pci_status = 0x10
>         pci_command = 0x146
>         pcie_status = 0x1
>         pcie_command = 0x2037
>         pcie_dev_cap = 0x10008025
>         pcie_adv_ctl = 0x0
>         pcie_ue_status = 0x0
>         pcie_ue_mask = 0x180000
>         pcie_ue_sev = 0x462031
>         pcie_ue_hdr0 = 0x4000001
>         pcie_ue_hdr1 = 0x122003
>         pcie_ue_hdr2 = 0x6010000
>         pcie_ue_hdr3 = 0xb70d8120
>         pcie_ce_status = 0x1
>         pcie_ce_mask = 0x0
>         remainder = 0x0
>         severity = 0x3
>         __ttl = 0x1
>         __tod = 0x54f107a5 0x415ddd1
> 
> Feb 27 2015 18:11:17.068478684 ereport.io.pciex.rc.ce-msg
> nvlist version: 0
>         ena = 0xe97c1b9f5a501401
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = /pci@0,0/pci8086,2f06@2,2
>         (end detector)
> 
>         class = ereport.io.pciex.rc.ce-msg
>         rc-status = 0x1
>         source-id = 0x600
>         source-valid = 1
>         __ttl = 0x1
>         __tod = 0x54f107a5 0x414e6dc
> 
> Feb 27 2015 18:11:17.068509897 ereport.io.pciex.a-nonfatal
> nvlist version: 0
>         ena = 0xe97c1ba6ebb01401
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0
>         (end detector)
> 
>         class = ereport.io.pciex.a-nonfatal
>         dev-status = 0x9
>         ce-status = 0x2000
>         __ttl = 0x1
>         __tod = 0x54f107a5 0x41560c9
> 
> Feb 27 2015 18:11:17.068509897 ereport.io.pciex.rc.ce-msg
> nvlist version: 0
>         ena = 0xe97c1ba6ebb01401
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = /pci@0,0
>         (end detector)
> 
>         class = ereport.io.pciex.rc.ce-msg
>         rc-status = 0x1
>         source-id = 0x400
>         source-valid = 1
>         __ttl = 0x1
>         __tod = 0x54f107a5 0x41560c9
> 
> Feb 27 2015 18:11:17.068541905 ereport.io.pciex.pl.re
> nvlist version: 0
>         ena = 0xe97c1baedbc01401
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = /pci@0,0/pci8086,2f06@2,2/pci10b5,8724@0/pci10b5,8724@1/pci1000,3070@0
>         (end detector)
> 
>         class = ereport.io.pciex.pl.re
>         dev-status = 0x1
>         ce-status = 0x1
>         __ttl = 0x1
>         __tod = 0x54f107a5 0x415ddd1
> 
> Feb 27 2015 18:11:17.068541905 ereport.io.pciex.rc.ce-msg
> nvlist version: 0
>         ena = 0xe97c1baedbc01401
>         detector = (embedded nvlist)
>         nvlist version: 0
>                 version = 0x0
>                 scheme = dev
>                 device-path = /pci@0,0
>         (end detector)
> 
>         class = ereport.io.pciex.rc.ce-msg
>         rc-status = 0x1
>         source-id = 0x600
>         source-valid = 1
>         __ttl = 0x1
>         __tod = 0x54f107a5 0x415ddd1
> 
> 
> 
> 
>  
> Robert
> 
> > From host #1:
> >
> > --------------- ------------------------------------  -------------- ---------
> > TIME            EVENT-ID                              MSG-ID         SEVERITY
> > --------------- ------------------------------------  -------------- ---------
> > Feb 27 18:11:19 3951b062-71f1-cccc-9fea-bbdc354f2603  SUNOS-8000-J0  Major
> >
> > Host        : mir-zfs01
> > Platform    : SYS-6028U-TR4+    Chassis_id  : S16512424A07095
> > Product_sn  :
> >
> > Fault class : defect.sunos.eft.unexpected_telemetry 50%
> >               fault.sunos.eft.unexpected_telemetry 50%
> > Problem in  : dev:////pci@0,0
> >                   faulted and taken out of service
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> >               devices for which it was unable to perform a diagnosis -
> >               Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> >               information.  Refer to http://illumos.org/msg/SUNOS-8000-J0 for
> >               more information.
> >
> > Response    : Error reports have been logged for examination by Sun.
> >
> > Impact      : Automated diagnosis and response for these events will not occur.
> >
> > Action      : Ensure that the latest Solaris Kernel and Predictive Self-Healing
> >               (PSH) patches are installed.
> >
> > --------------- ------------------------------------  -------------- ---------
> > TIME            EVENT-ID                              MSG-ID         SEVERITY
> > --------------- ------------------------------------  -------------- ---------
> > Jan 15 21:53:07 2cb9f0e0-dd7f-c912-dd22-bbaa7a4ebf6c  SUNOS-8000-J0  Major
> >
> > Host        : mir-zfs01
> > Platform    : SYS-6028U-TR4+    Chassis_id  : S16512424A07095
> > Product_sn  :
> >
> > Fault class : defect.sunos.eft.unexpected_telemetry max 25%
> >               fault.sunos.eft.unexpected_telemetry max 25%
> > Affects     : cpu:///cpuid=6
> >               cpu:///cpuid=16
> >                   faulted but still in service
> > FRU         : hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs01:chassis-id=S16512424A07095/motherboard=0/chip=0 25%
> >               hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs01:chassis-id=S16512424A07095/motherboard=0/chip=1 25%
> >                   faulty
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> >               devices for which it was unable to perform a diagnosis -
> >               Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> >               information.  Refer to http://illumos.org/msg/SUNOS-8000-J0 for
> >               more information.
> >
> > Response    : Error reports have been logged for examination by Sun.
> >
> > Impact      : Automated diagnosis and response for these events will not occur.
> >
> > Action      : Ensure that the latest Solaris Kernel and Predictive Self-Healing
> >               (PSH) patches are installed.
> >
> >
> > From host #2:
> >
> > --------------- ------------------------------------  -------------- ---------
> > TIME            EVENT-ID                              MSG-ID         SEVERITY
> > --------------- ------------------------------------  -------------- ---------
> > Jan 31 12:45:54 0efc914b-7cc5-c4df-fd11-9be172d4931a  SUNOS-8000-J0  Major
> >
> > Host        : mir-zfs02
> > Platform    : SYS-6028U-TR4+    Chassis_id  : S16512424A07109
> > Product_sn  :
> >
> > Fault class : defect.sunos.eft.unexpected_telemetry 50%
> >               fault.sunos.eft.unexpected_telemetry 50%
> > Problem in  : dev:////pci@74,0
> >                   faulted and taken out of service
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> >               devices for which it was unable to perform a diagnosis -
> >               Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> >               information.  Refer to http://illumos.org/msg/SUNOS-8000-J0 for
> >               more information.
> >
> > Response    : Error reports have been logged for examination by Sun.
> >
> > Impact      : Automated diagnosis and response for these events will not occur.
> >
> > Action      : Ensure that the latest Solaris Kernel and Predictive Self-Healing
> >               (PSH) patches are installed.
> > --------------- ------------------------------------  -------------- ---------
> > TIME            EVENT-ID                              MSG-ID         SEVERITY
> > --------------- ------------------------------------  -------------- ---------
> > Dec 04 15:22:09 6020baed-5ab6-cdb0-95c0-ed3f9fde1172  SUNOS-8000-J0  Major
> >
> > Host        : mir-zfs02
> > Platform    : SYS-6028U-TR4+    Chassis_id  : S16512424A07109
> > Product_sn  :
> >
> > Fault class : fault.sunos.eft.unexpected_telemetry max 25%
> >               defect.sunos.eft.unexpected_telemetry max 25%
> > Affects     : cpu:///cpuid=41
> >                   ok and in service
> >               cpu:///cpuid=26
> >                   faulted but still in service
> > FRU         : hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs02:chassis-id=S16512424A07109/motherboard=0/chip=1 25%
> >                   acquitted
> >               hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs02:chassis-id=S16512424A07109/motherboard=0/chip=0 25%
> >                   faulty
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> >               devices for which it was unable to perform a diagnosis -
> >               Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> >               information.  Refer to http://illumos.org/msg/SUNOS-8000-J0 for
> >               more information.
> >
> > Response    : Error reports have been logged for examination by Sun.
> >
> > Impact      : Automated diagnosis and response for these events will not occur.
> >
> > Action      : Ensure that the latest Solaris Kernel and Predictive Self-Healing
> >               (PSH) patches are installed.
> >
> > --------------- ------------------------------------  -------------- ---------
> > TIME            EVENT-ID                              MSG-ID         SEVERITY
> > --------------- ------------------------------------  -------------- ---------
> > Dec 04 18:55:38 eadd4984-7c7a-490b-f6e1-b0f936b09ab7  SUNOS-8000-J0  Major
> >
> > Host        : mir-zfs02
> > Platform    : SYS-6028U-TR4+    Chassis_id  : S16512424A07109
> > Product_sn  :
> >
> > Fault class : fault.sunos.eft.unexpected_telemetry max 25%
> >               defect.sunos.eft.unexpected_telemetry max 25%
> > Affects     : cpu:///cpuid=6
> >               cpu:///cpuid=18
> >                   faulted but still in service
> > FRU         : hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs02:chassis-id=S16512424A07109/motherboard=0/chip=0 25%
> >               hc://:product-id=SYS-6028U-TR4+:server-id=mir-zfs02:chassis-id=S16512424A07109/motherboard=0/chip=1 25%
> >                   faulty
> >
> > Description : The diagnosis engine encountered telemetry from the listed
> >               devices for which it was unable to perform a diagnosis -
> >               Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
> >               information.  Refer to http://illumos.org/msg/SUNOS-8000-J0 for
> >               more information.
> >
> > Response    : Error reports have been logged for examination by Sun.
> >
> > Impact      : Automated diagnosis and response for these events will not occur.
> >
> > Action      : Ensure that the latest Solaris Kernel and Predictive Self-Healing
> >               (PSH) patches are installed.
> >
> >
> >
> > -------------------------------------------
> > illumos-discuss
> > Archives: https://www.listbox.com/member/archive/182180/=now
> > RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175748-6cf9d6b5
> > Modify Your Subscription: https://www.listbox.com/member/?&;
> > Powered by Listbox: http://www.listbox.com
> >
> 
> 



