Thanks Peng! I believe it is zfs that tries to get/set the cache status in this case.
I have filed the bugs: CR 6941996 (and also CR 6942004).

You don't happen to have any more information on the PCI bridge error
(ereport.io.pci.fabric)? After my tests, with two different SUN-STK-INT
cards in two different slots, I believe it is actually related to the
SUN-STK-INT card.

/ragge

On 8 apr 2010, at 08.44, Peng Liu wrote:

> On 2010/4/8 0:07, Ragnar Sundblad wrote:
>> On 6 apr 2010, at 18.51, Ragnar Sundblad wrote:
>>
>>> On 5 apr 2010, at 11.55, Ragnar Sundblad wrote:
>>>
>>>> On 5 apr 2010, at 06.41, pavan chandrashekar wrote:
>>>>
>>>>> Ragnar Sundblad wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I wonder if anyone could help me with a pci-e problem.
>>>>>>
>>>>>> I have an X4150 running snv_134. It was shipped with a "STK RAID INT"
>>>>>> adaptec/intel/storagetek/sun SAS HBA. The machine also has an
>>>>>> LSI SAS card in another slot, though I don't know if that is
>>>>>> significant in any way.
>>>>>
>>>>> It might help troubleshooting.
>>>>>
>>>>> You can try putting the disks behind the LSI SAS HBA and see if you
>>>>> still get errors. That will at least tell you whether the two errors
>>>>> are manifestations of the same problem or separate issues.
>>>>>
>>>>> You might still have issues with the fabric. You can then remove the
>>>>> HBA that is throwing errors (STK RAID) and put the LSI SAS HBA in the
>>>>> slot in which the STK RAID sat earlier and check the behaviour.
>>>>> Maybe this will point at the culprit. If the fabric errors continue
>>>>> with whatever card is in the currently faulty slot (if it is faulty
>>>>> at all), it is more probable that the issue is with the fabric.
>>>>
>>>> Thanks! The only problem right now, and for the last few days, is that
>>>> the machine is at my workplace, some 10 kilometers away, and we have
>>>> the Easter holiday right now. I was hoping to use those days off to
>>>> have it run tests all by itself, but I have instead been chasing
>>>> hidden Easter eggs inside an intel design.
>>>>
>>>> I have now discovered that the ereport.io.pci.fabric started when
>>>> I upgraded from snv_128 to 134; I totally missed that relation before.
>>>> There have been some changes in the PCI code around that time that may
>>>> or may not be related, for example:
>>>> <http://src.opensolaris.org/source/history/onnv/onnv-gate/usr/src/cmd/fm/modules/common/fabric-xlate/fabric-xlate.c>
>>>> Whether that means this is a driver glitch or a hardware problem that
>>>> has now become visible, and whether it can be ignored or not,
>>>> is still far beyond my knowledge.
>>>>
>>>> But I will follow your advice and move the cards around and see what
>>>> happens!
>>>>
>>> I have now swapped the cards. The problem seems to remain almost
>>> identical to before, but if I understand this correctly it is now on
>>> another PCI bridge (I suppose by this: pci8086,2...@2; maybe I should
>>> check out the chipset documentation).
>>>
>>> Can someone please tell me how I can decode the ereport information so
>>> that I can understand what the PCI bridge is complaining about?
>>>
>> I have now also tried with another SUN_STK_INT controller (with older
>> firmware, as shipped from Sun), including the riser board from another
>> X4150, and it gets the same ereports.
>>
>> I have tried removing the LSI board, and it still behaves the same.
>>
>> Is there anyone else out there with a Sun X4xxx running snv_134 with
>> a SUN_STK_INT raid controller who sees or doesn't see this?
>>
>> For the record, the ereport.io.pci.fabric-s appear every
>> 4 minutes 4 seconds, give or take half a second or so.
>>
>> Thanks!
>>
>> /ragge
>>
> Hi Ragnar,
>
> The fma message about "sd_get_write_cache_enabled: Mode Sense caching
> page code mismatch 0" appears because the aac driver does not support
> the MODE SENSE command with the Caching mode page. Some userland program
> wanted to know a disk's write-cache status via the sd driver, so sd
> requested the Caching mode page from aac. When it failed, sd reported it
> via fma, and that was logged.
> Please file an aac driver bug and I'll fix it.
>
> Thanks,
> Peng
>
>>
>>> Thanks!
>>>
>>> /ragge
>>>
>>> Apr 06 2010 18:40:34.965687100 ereport.io.pci.fabric
>>> nvlist version: 0
>>>         class = ereport.io.pci.fabric
>>>         ena = 0x28d9c49528201801
>>>         detector = (embedded nvlist)
>>>                 nvlist version: 0
>>>                 version = 0x0
>>>                 scheme = dev
>>>                 device-path = /p...@0,0/pci8086,2...@2
>>>         (end detector)
>>>
>>>         bdf = 0x10
>>>         device_id = 0x25e2
>>>         vendor_id = 0x8086
>>>         rev_id = 0xb1
>>>         dev_type = 0x40
>>>         pcie_off = 0x6c
>>>         pcix_off = 0x0
>>>         aer_off = 0x100
>>>         ecc_ver = 0x0
>>>         pci_status = 0x10
>>>         pci_command = 0x147
>>>         pci_bdg_sec_status = 0x0
>>>         pci_bdg_ctrl = 0x3
>>>         pcie_status = 0x0
>>>         pcie_command = 0x2027
>>>         pcie_dev_cap = 0xfc1
>>>         pcie_adv_ctl = 0x0
>>>         pcie_ue_status = 0x0
>>>         pcie_ue_mask = 0x100000
>>>         pcie_ue_sev = 0x62031
>>>         pcie_ue_hdr0 = 0x0
>>>         pcie_ue_hdr1 = 0x0
>>>         pcie_ue_hdr2 = 0x0
>>>         pcie_ue_hdr3 = 0x0
>>>         pcie_ce_status = 0x0
>>>         pcie_ce_mask = 0x0
>>>         pcie_rp_status = 0x0
>>>         pcie_rp_control = 0x7
>>>         pcie_adv_rp_status = 0x0
>>>         pcie_adv_rp_command = 0x7
>>>         pcie_adv_rp_ce_src_id = 0x0
>>>         pcie_adv_rp_ue_src_id = 0x0
>>>         remainder = 0x0
>>>         severity = 0x1
>>>         __ttl = 0x1
>>>         __tod = 0x4bbb6402 0x398f373c
>>>
>>>> /ragge
>>>>
>>>>> Pavan
>>>>>
>>>>>> It logs some errors, as shown with "fmdump -e(V)".
>>>>>> It is most often a pci bridge error (I think), about five to ten
>>>>>> times an hour, and occasionally a problem with accessing a
>>>>>> mode page on the disks behind the STK raid controller for
>>>>>> enabling/disabling the disks' write caches, one error for each disk,
>>>>>> about every three hours. I don't believe the two have to be related.
>>>>>>
>>>>>> I am especially interested in understanding the
>>>>>> ereport.io.pci.fabric report.
>>>>>>
>>>>>> I haven't seen this problem on other more or less identical
>>>>>> machines running sol10.
>>>>>>
>>>>>> Is this a known software problem, or do I have faulty hardware?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> /ragge
>>>>>>
>>>>>> --------------
>>>>>> % fmdump -e
>>>>>> ...
>>>>>> Apr 04 01:21:53.2244 ereport.io.pci.fabric
>>>>>> Apr 04 01:30:00.6999 ereport.io.pci.fabric
>>>>>> Apr 04 01:30:23.4647 ereport.io.scsi.cmd.disk.dev.uderr
>>>>>> Apr 04 01:30:23.4651 ereport.io.scsi.cmd.disk.dev.uderr
>>>>>> ...
>>>>>> % fmdump -eV
>>>>>> Apr 04 2010 01:21:53.224492765 ereport.io.pci.fabric
>>>>>> nvlist version: 0
>>>>>>         class = ereport.io.pci.fabric
>>>>>>         ena = 0xd6a00a43be800c01
>>>>>>         detector = (embedded nvlist)
>>>>>>                 nvlist version: 0
>>>>>>                 version = 0x0
>>>>>>                 scheme = dev
>>>>>>                 device-path = /p...@0,0/pci8086,2...@4
>>>>>>         (end detector)
>>>>>>
>>>>>>         bdf = 0x20
>>>>>>         device_id = 0x25f8
>>>>>>         vendor_id = 0x8086
>>>>>>         rev_id = 0xb1
>>>>>>         dev_type = 0x40
>>>>>>         pcie_off = 0x6c
>>>>>>         pcix_off = 0x0
>>>>>>         aer_off = 0x100
>>>>>>         ecc_ver = 0x0
>>>>>>         pci_status = 0x10
>>>>>>         pci_command = 0x147
>>>>>>         pci_bdg_sec_status = 0x0
>>>>>>         pci_bdg_ctrl = 0x3
>>>>>>         pcie_status = 0x0
>>>>>>         pcie_command = 0x2027
>>>>>>         pcie_dev_cap = 0xfc1
>>>>>>         pcie_adv_ctl = 0x0
>>>>>>         pcie_ue_status = 0x0
>>>>>>         pcie_ue_mask = 0x100000
>>>>>>         pcie_ue_sev = 0x62031
>>>>>>         pcie_ue_hdr0 = 0x0
>>>>>>         pcie_ue_hdr1 = 0x0
>>>>>>         pcie_ue_hdr2 = 0x0
>>>>>>         pcie_ue_hdr3 = 0x0
>>>>>>         pcie_ce_status = 0x0
>>>>>>         pcie_ce_mask = 0x0
>>>>>>         pcie_rp_status = 0x0
>>>>>>         pcie_rp_control = 0x7
>>>>>>         pcie_adv_rp_status = 0x0
>>>>>>         pcie_adv_rp_command = 0x7
>>>>>>         pcie_adv_rp_ce_src_id = 0x0
>>>>>>         pcie_adv_rp_ue_src_id = 0x0
>>>>>>         remainder = 0x0
>>>>>>         severity = 0x1
>>>>>>         __ttl = 0x1
>>>>>>         __tod = 0x4bb7cd91 0xd617cdd
>>>>>> ...
>>>>>> Apr 04 2010 01:30:23.464768275 ereport.io.scsi.cmd.disk.dev.uderr
>>>>>> nvlist version: 0
>>>>>>         class = ereport.io.scsi.cmd.disk.dev.uderr
>>>>>>         ena = 0xde0cd54f84201c01
>>>>>>         detector = (embedded nvlist)
>>>>>>                 nvlist version: 0
>>>>>>                 version = 0x0
>>>>>>                 scheme = dev
>>>>>>                 device-path = /p...@0,0/pci8086,2...@4/pci108e,2...@0/d...@5,0
>>>>>>                 devid = id1,s...@tsun_____stk_raid_int____ea4b6f24
>>>>>>         (end detector)
>>>>>>
>>>>>>         driver-assessment = fail
>>>>>>         op-code = 0x1a
>>>>>>         cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
>>>>>>         pkt-reason = 0x0
>>>>>>         pkt-state = 0x1f
>>>>>>         pkt-stats = 0x0
>>>>>>         stat-code = 0x0
>>>>>>         un-decode-info = sd_get_write_cache_enabled: Mode Sense caching page code mismatch 0
>>>>>>         un-decode-value =
>>>>>>         __ttl = 0x1
>>>>>>         __tod = 0x4bb7cf8f 0x1bb3cd13
>>>>>> ...
>>>>>> _______________________________________________
>>>>>> driver-discuss mailing list
>>>>>> [email protected]
>>>>>> http://mail.opensolaris.org/mailman/listinfo/driver-discuss
