> I made some progress understanding the behavior, but I
> am nowhere close to a solution. Any suggestions would
> be welcome.
>
> First of all, I think the fix in snv_127 for the PCIe
> cards does not address the real issue. It simply
> slows down transmission to the point where the bug
> doesn't show up. I fixed the card detection as masa
> suggested, but commented out the trigger commands in
> the send() function. When I tested it, the driver
> worked fine. I then reduced the counter iterations
> from 10 to 4, and the bug appeared. This strongly
> indicates that the fix works by changing the timing of
> events rather than by the extra trigger commands
> restarting transmissions.
Could you explain your changes in detail? I'd like to test them on my box.

> Something else that I noticed is that the interface
> does come back after 5 minutes or so, once the
> watchdog expires. I tried reducing the value of the
> watchdog from 64K to 256, but it didn't change how
> fast recovery happens. I suspect that the watchdog
> will not trigger until we run out of transmit buffers.

I also wonder why the watchdog takes 5 minutes.

> Finally, the other weird thing is that when the card
> is stuck, no packets seem to be received. Here is the
> output of kstat -m rge from two samples taken a few
> seconds apart while the driver is stuck. Look at
> rbytes. Also, as time went by and I took more samples,
> I started seeing norcvbuf go up.

There are many errors in both directions, including collisions.
Are there any error messages in /var/adm/messages?
Did you make sure the gigabit switch is in full-duplex mode?

-masa

> Any suggestions would be welcome.
>
> First sample:
>
> module: rge instance: 0
> name: mac class: net
> adv_cap_1000fdx 1
> adv_cap_1000hdx 0
> adv_cap_100fdx 1
> adv_cap_100hdx 1
> adv_cap_100T4 0
> adv_cap_10fdx 1
> adv_cap_10gfdx 0
> adv_cap_10hdx 1
> adv_cap_asmpause 1
> adv_cap_autoneg 1
> adv_cap_pause 1
> adv_rem_fault 0
> align_errors 62207
> brdcstrcv 4308
> brdcstxmt 0
> cap_1000fdx 1
> cap_1000hdx 0
> cap_100fdx 1
> cap_100hdx 1
> cap_100T4 0
> cap_10fdx 1
> cap_10gfdx 0
> cap_10hdx 1
> cap_asmpause 1
> cap_autoneg 1
> cap_pause 1
> cap_rem_fault 0
> carrier_errors 0
> collisions 7452
> crtime 42351.518238685
> defer_xmts 0
> ex_collisions 0
> fcs_errors 0
> first_collisions 770
> ierrors 116109
> ifspeed 1000000000
> ipackets 59187601
> ipackets64 59187601
> jabber_errors 0
> link_asmpause 0
> link_autoneg 0
> link_duplex 2
> link_pause 0
> link_state 1
> link_up 1
> lp_cap_1000fdx 0
> lp_cap_1000hdx 0
> lp_cap_100fdx 0
> lp_cap_100hdx 0
> lp_cap_100T4 0
> lp_cap_10fdx 0
> lp_cap_10gfdx 0
> lp_cap_10hdx 0
> lp_cap_asmpause 0
> lp_cap_autoneg 0
> lp_cap_pause 0
> lp_rem_fault 0
> macrcv_errors 0
> macxmt_errors 0
> multi_collisions 6682
> multircv 324
> multixmt 0
> norcvbuf 0
> noxmtbuf 0
> obytes 1824883714
> obytes64 19004752898
> oerrors 120177
> oflo 0
> opackets 143451801
> opackets64 143451801
> promisc 0
> rbytes 151565154
> rbytes64 151565154
> runt_errors 0
> snaptime 42701.466558939
> sqe_errors 0
> toolong_errors 0
> tx_late_collisions 0
> uflo 0
> unknowns 0
> xcvr_addr 1
> xcvr_id 1886482
> xcvr_inuse 7
>
> Second sample:
>
> module: rge instance: 0
> name: mac class: net
> adv_cap_1000fdx 1
> adv_cap_1000hdx 0
> adv_cap_100fdx 1
> adv_cap_100hdx 1
> adv_cap_100T4 0
> adv_cap_10fdx 1
> adv_cap_10gfdx 0
> adv_cap_10hdx 1
> adv_cap_asmpause 1
> adv_cap_autoneg 1
> adv_cap_pause 1
> adv_rem_fault 0
> align_errors 62207
> brdcstrcv 4317
> brdcstxmt 0
> cap_1000fdx 1
> cap_1000hdx 0
> cap_100fdx 1
> cap_100hdx 1
> cap_100T4 0
> cap_10fdx 1
> cap_10gfdx 0
> cap_10hdx 1
> cap_asmpause 1
> cap_autoneg 1
> cap_pause 1
> cap_rem_fault 0
> carrier_errors 0
> collisions 7452
> crtime 42351.518238685
> defer_xmts 0
> ex_collisions 0
> fcs_errors 0
> first_collisions 770
> ierrors 116109
> ifspeed 1000000000
> ipackets 59187622
> ipackets64 59187622
> jabber_errors 0
> link_asmpause 0
> link_autoneg 0
> link_duplex 2
> link_pause 0
> link_state 1
> link_up 1
> lp_cap_1000fdx 0
> lp_cap_1000hdx 0
> lp_cap_100fdx 0
> lp_cap_100hdx 0
> lp_cap_100T4 0
> lp_cap_10fdx 0
> lp_cap_10gfdx 0
> lp_cap_10hdx 0
> lp_cap_asmpause 0
> lp_cap_autoneg 0
> lp_cap_pause 0
> lp_rem_fault 0
> macrcv_errors 0
> macxmt_errors 0
> multi_collisions 6682
> multircv 324
> multixmt 0
> norcvbuf 0
> noxmtbuf 0
> obytes 1824884552
> obytes64 19004753736
> oerrors 120177
> oflo 0
> opackets 143451801
> opackets64 143451801
> promisc 0
> rbytes 151565154
> rbytes64 151565154
> runt_errors 0
> snaptime 42727.678195414
> sqe_errors 0
> toolong_errors 0
> tx_late_collisions 0
> uflo 0
> unknowns 0
> xcvr_addr 1
> xcvr_id 1886482
> xcvr_inuse 7
>
> Third sample:
>
> module: rge instance: 0
> name: mac class: net
> adv_cap_1000fdx 1
> adv_cap_1000hdx 0
> adv_cap_100fdx 1
> adv_cap_100hdx 1
> adv_cap_100T4 0
> adv_cap_10fdx 1
> adv_cap_10gfdx 0
> adv_cap_10hdx 1
> adv_cap_asmpause 1
> adv_cap_autoneg 1
> adv_cap_pause 1
> adv_rem_fault 0
> align_errors 62207
> brdcstrcv 4317
> brdcstxmt 0
> cap_1000fdx 1
> cap_1000hdx 0
> cap_100fdx 1
> cap_100hdx 1
> cap_100T4 0
> cap_10fdx 1
> cap_10gfdx 0
> cap_10hdx 1
> cap_asmpause 1
> cap_autoneg 1
> cap_pause 1
> cap_rem_fault 0
> carrier_errors 0
> collisions 7452
> crtime 42351.518238685
> defer_xmts 0
> ex_collisions 0
> fcs_errors 0
> first_collisions 770
> ierrors 116109
> ifspeed 1000000000
> ipackets 59187622
> ipackets64 59187622
> jabber_errors 0
> link_asmpause 0
> link_autoneg 0
> link_duplex 2
> link_pause 0
> link_state 1
> link_up 1
> lp_cap_1000fdx 0
> lp_cap_1000hdx 0
> lp_cap_100fdx 0
> lp_cap_100hdx 0
> lp_cap_100T4 0
> lp_cap_10fdx 0
> lp_cap_10gfdx 0
> lp_cap_10hdx 0
> lp_cap_asmpause 0
> lp_cap_autoneg 0
> lp_cap_pause 0
> lp_rem_fault 0
> macrcv_errors 0
> macxmt_errors 0
> multi_collisions 6682
> multircv 324
> multixmt 0
> norcvbuf 13
> noxmtbuf 0
> obytes 1824886004
> obytes64 19004755188
> oerrors 120177
> oflo 0
> opackets 143451801
> opackets64 143451801
> promisc 0
> rbytes 151565154
> rbytes64 151565154
> runt_errors 0
> snaptime 42747.590540636
> sqe_errors 0
> toolong_errors 0
> tx_late_collisions 0
> uflo 0
> unknowns 0
> xcvr_addr 1
> xcvr_id 1886482
> xcvr_inuse 7

--
This message posted from opensolaris.org
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
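Comparing three ~100-line kstat dumps by eye makes it easy to miss which counters actually moved. The following is a minimal sketch, not an OpenSolaris tool: the helper names `parse_sample`, `diff`, and `report` are hypothetical, and `SAMPLES` holds only a subset of the counters copied from the three samples quoted above. It parses the `statistic value` lines of `kstat -m rge` output and prints which counters changed between consecutive samples, using the `snaptime` difference (seconds) as the elapsed time.

```python
"""Compare counters between successive `kstat -m rge` samples.

Hypothetical helper, not an OpenSolaris tool: it reads the
"statistic value" lines of kstat's human-readable output and reports
which counters changed between consecutive samples.
"""

# Subset of the counters from the three samples quoted in this thread.
SAMPLES = [
    """
    ipackets 59187601
    rbytes 151565154
    opackets 143451801
    obytes 1824883714
    ierrors 116109
    oerrors 120177
    norcvbuf 0
    brdcstrcv 4308
    snaptime 42701.466558939
    """,
    """
    ipackets 59187622
    rbytes 151565154
    opackets 143451801
    obytes 1824884552
    ierrors 116109
    oerrors 120177
    norcvbuf 0
    brdcstrcv 4317
    snaptime 42727.678195414
    """,
    """
    ipackets 59187622
    rbytes 151565154
    opackets 143451801
    obytes 1824886004
    ierrors 116109
    oerrors 120177
    norcvbuf 13
    brdcstrcv 4317
    snaptime 42747.590540636
    """,
]

# Statistics that are timestamps rather than event counts.
TIMESTAMPS = ("crtime", "snaptime")


def parse_sample(text):
    """Return {statistic: number} for every numeric 'name value' line."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Header lines such as "module: rge instance: 0" or
        # "name: mac class: net" are not plain name/value pairs.
        if len(parts) != 2 or parts[0].endswith(":"):
            continue
        name, value = parts
        try:
            stats[name] = float(value) if "." in value else int(value)
        except ValueError:
            continue  # skip non-numeric values
    return stats


def diff(before, after):
    """Return {statistic: delta} for event counters whose value changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before
        and name not in TIMESTAMPS
        and after[name] != before[name]
    }


def report(samples):
    """Print changed and unchanged counters for each pair of samples."""
    parsed = [parse_sample(s) for s in samples]
    for i in range(1, len(parsed)):
        before, after = parsed[i - 1], parsed[i]
        changed = diff(before, after)
        unchanged = [
            name for name in after
            if name in before and name not in TIMESTAMPS and name not in changed
        ]
        elapsed = after["snaptime"] - before["snaptime"]
        print(f"sample {i} -> {i + 1} ({elapsed:.1f} s apart):")
        for name, delta in changed.items():
            print(f"  {name}: {delta:+}")
        print("  unchanged: " + ", ".join(unchanged))


if __name__ == "__main__":
    report(SAMPLES)
```

On these excerpts, it shows that between the first and second samples ipackets rose by 21 and obytes by 838 while rbytes and opackets stayed flat, and that between the second and third samples norcvbuf went from 0 to 13. Full dumps pasted from the terminal can be fed in as-is, since header lines such as `module: rge instance: 0` are skipped.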