Re: svn commit: r205090 - head/sys/dev/bge
On Mon, Mar 15, 2010 at 09:31:52PM -0600, Scott Long wrote:
> On Mar 12, 2010, at 11:18 AM, Pyun YongHyeon wrote:
> > Author: yongari
> > Date: Fri Mar 12 18:18:04 2010
> > New Revision: 205090
> > URL: http://svn.freebsd.org/changeset/base/205090
> >
> > Log:
> >   Reorder interrupt handler a bit such that producer/consumer
> >   index of status block is read first before acknowledging the
> >   interrupts. Otherwise bge(4) may get stale status block as
> >   acknowledging an interrupt may yield another status block update.
> >
>
> I'm starting a new sub-thread because it quickly became impossible to keep
> context straight in the conversation between you and Bruce.
>
> The previous rev did this:
>
> 1. Write an ACK word to the hardware
> 2. Perform the memory coherency protocol
> 3. Cache the status descriptors
> 4. Execute the interrupt handlers for the descriptors
>
> I think that your concern was that after performing step 1, the BGE hardware
> would be free to assert a new interrupt and/or update memory in a way that
> would interfere with steps 2-4, yes?  I don't believe that this is a valid
> concern.  By performing the ACK first, the driver is guaranteeing that any
> new updates done by the BGE hardware will generate a follow-on interrupt that
> will be seen and trigger a new run through the interrupt handler.  No matter
> where an unexpected update happens from the hardware, a new interrupt will be
> generated and will be guaranteed to be serviced, ensuring that the update is
> seen.  Also, the status descriptors are designed to be immune to interference
> of this nature; they can go stale, but they can't be corrupted.  Again, going
> stale is not bad.
>
> The previous version affirms that a race exists, but guarantees that it won't
> be forgotten.  There's nothing wrong with this, in my opinion.  Whether
> you're using MSI or INTx (obviously assuming that there are no hardware bugs
> here), the race will be caught.
>
> I don't like your change because it leaves the ACK step incoherent.  By
> deferring that write to be after the read, there's no guarantee of when that
> write will actually get flushed to the hardware.  It will eventually, but
> maybe not as soon as we'd like.
>

This is a valid concern and I seem to have missed it. I still think
tagged status would be a better way to handle interrupts, but it does
not solve the issue you mentioned. I can also see a couple of complex
code paths in Linux which indicate the need for a forced flush of the
mailbox register. The old code was safe in this regard. I'll back out
the change.

___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
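The flush concern raised above can be sketched with a toy model. This is not driver code: `posted_bus`, `bus_write()`, `bus_read()` and `ack_reached_device()` are hypothetical names invented for illustration, and the model assumes only the general PCI rule that a read from a device cannot pass an earlier posted write to it. It ignores timing entirely, so "not yet flushed" here just means "still buffered when the handler moves on".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one device register behind a bridge that posts writes. */
struct posted_bus {
	uint32_t dev_reg;	/* value the device has actually received */
	uint32_t posted_val;	/* write still sitting in the bridge */
	bool posted;		/* true while a write is buffered */
};

/* A posted write returns at once; the device has not seen it yet. */
static void
bus_write(struct posted_bus *bus, uint32_t val)
{
	bus->posted_val = val;
	bus->posted = true;
}

/* A read cannot pass an earlier posted write, so it drains the buffer. */
static uint32_t
bus_read(struct posted_bus *bus)
{
	if (bus->posted) {
		bus->dev_reg = bus->posted_val;
		bus->posted = false;
	}
	return (bus->dev_reg);
}

/*
 * Write an ACK value and report whether the device has seen it by the
 * time the handler moves on to other work.
 */
bool
ack_reached_device(bool read_back)
{
	struct posted_bus bus = { .dev_reg = 1, .posted_val = 0, .posted = false };

	bus_write(&bus, 0);		/* the ACK */
	if (read_back)
		(void)bus_read(&bus);	/* any register read flushes it */
	return (!bus.posted && bus.dev_reg == 0);
}
```

In this model `ack_reached_device(true)` corresponds to the old order, where the mailbox write is followed by a register read, and `ack_reached_device(false)` to an ACK that is the last bus access in the handler.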
Re: svn commit: r205090 - head/sys/dev/bge
On Mar 12, 2010, at 11:18 AM, Pyun YongHyeon wrote:
> Author: yongari
> Date: Fri Mar 12 18:18:04 2010
> New Revision: 205090
> URL: http://svn.freebsd.org/changeset/base/205090
>
> Log:
>   Reorder interrupt handler a bit such that producer/consumer
>   index of status block is read first before acknowledging the
>   interrupts. Otherwise bge(4) may get stale status block as
>   acknowledging an interrupt may yield another status block update.
>

I'm starting a new sub-thread because it quickly became impossible to keep
context straight in the conversation between you and Bruce.

The previous rev did this:

1. Write an ACK word to the hardware
2. Perform the memory coherency protocol
3. Cache the status descriptors
4. Execute the interrupt handlers for the descriptors

I think that your concern was that after performing step 1, the BGE hardware
would be free to assert a new interrupt and/or update memory in a way that
would interfere with steps 2-4, yes?  I don't believe that this is a valid
concern.  By performing the ACK first, the driver is guaranteeing that any
new updates done by the BGE hardware will generate a follow-on interrupt that
will be seen and trigger a new run through the interrupt handler.  No matter
where an unexpected update happens from the hardware, a new interrupt will be
generated and will be guaranteed to be serviced, ensuring that the update is
seen.  Also, the status descriptors are designed to be immune to interference
of this nature; they can go stale, but they can't be corrupted.  Again, going
stale is not bad.

The previous version affirms that a race exists, but guarantees that it won't
be forgotten.  There's nothing wrong with this, in my opinion.  Whether
you're using MSI or INTx (obviously assuming that there are no hardware bugs
here), the race will be caught.

I don't like your change because it leaves the ACK step incoherent.  By
deferring that write to be after the read, there's no guarantee of when that
write will actually get flushed to the hardware.  It will eventually, but
maybe not as soon as we'd like.

Scott
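The argument for ACK-first can be sketched as a small simulation. This is a toy model, not the bge(4) hardware: `mock_nic`, `nic_status_update()`, `nic_ack()` and `update_lost()` are hypothetical names, and the model assumes that a status block update always raises the interrupt and that an ACK clears whatever interrupt is pending at that moment. An update counts as lost when the handler has not seen it and no interrupt is left pending to bring the handler back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; not the real bge(4) register layout. */
struct mock_nic {
	unsigned int prod;	/* producer index as the device last wrote it */
	bool irq_asserted;	/* true while an interrupt is pending */
};

/* Device side: a new status block update raises the interrupt. */
static void
nic_status_update(struct mock_nic *nic)
{
	nic->prod++;
	nic->irq_asserted = true;
}

/* Driver side: the ACK clears whatever interrupt is pending. */
static void
nic_ack(struct mock_nic *nic)
{
	nic->irq_asserted = false;
}

/*
 * Run one handler pass made of two steps (ACK and read of the index),
 * in the order selected by ack_first, and inject one device update
 * after step inject_at (0 or 1).  Returns true if that update was
 * neither seen by the read nor left pending as an interrupt.
 */
bool
update_lost(bool ack_first, int inject_at)
{
	struct mock_nic nic = { .prod = 1, .irq_asserted = true };
	unsigned int seen = 0;
	int step;

	assert(inject_at == 0 || inject_at == 1);
	for (step = 0; step < 2; step++) {
		bool do_ack = ack_first ? (step == 0) : (step == 1);

		if (do_ack)
			nic_ack(&nic);
		else
			seen = nic.prod;
		if (step == inject_at)
			nic_status_update(&nic);
	}
	return (seen != nic.prod && !nic.irq_asserted);
}
```

Under these assumptions, ACK-first never loses the update wherever it lands; it only costs an extra interrupt. Read-first loses it when the update lands between the read and the ACK.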
Re: svn commit: r205090 - head/sys/dev/bge
On Tue, Mar 16, 2010 at 03:47:29AM +1100, Bruce Evans wrote:
> On Sun, 14 Mar 2010, Pyun YongHyeon wrote:
>
> >On Sat, Mar 13, 2010 at 11:05:11PM +1100, Bruce Evans wrote:
> >>On Fri, 12 Mar 2010, Pyun YongHyeon wrote:
> >>
> >>>Log:
> >>>Reorder interrupt handler a bit such that producer/consumer
> >>>index of status block is read first before acknowledging the
> >>>interrupts. Otherwise bge(4) may get stale status block as
> >>>acknowledging an interrupt may yield another status block update.
> >>>
> >>>Reviewed by: marius
> >>
> >>Er, doesn't this give a race instead?  It undoes a critical part of
> >>rev.1.169 but not the comment part which still says that the ack is
> >
> >You're probably right it may increase race window for interrupts.
>
> Rev.1.169 was supposed to fix all races involving the interrupt ack.
> An increase from 0 to epsilon is large :-).
>

It may fix the race, but it ignored status block coherency.

> >But it ensures coherent accesses to the status block which is more
> >important than losing interrupts.
>
> Interrupts weren't lost after 1.169 AFAIK.  Rather the reverse.  We
> take a few extra interrupts in order not to miss any.  I don't see how
> your change improves coherency but don't really understand the coherency.
> But^2 I now remember discussing with scottl that there is no locking
> for the status block and it is not clear whether there should be or
> how things mostly work when there isn't (mutex-type locking seems to
> be useless).
>

The old code (before r196370) had packet drop issues due to the status
block coherency problems.

> >I think old code fought not to
> >lose interrupts by acknowledging interrupts first and tried to get
> >next interrupts if status block was updated in interrupt handler.
> >Old code also had similar race window because it blindly
> >acknowledged the interrupt, status block could be updated after the
> >acknowledgment of the interrupt.
>
> Why is this fighting?  Acking the interrupt should have no effect on
> the contents of the status block, and any status block update after
> acking should cause a new interrupt.  Of course the hardware might

If you're talking about the interrupt only, yes, you're right. Acking
the interrupt tells the controller that it can update the status block
at any time, which in turn can cause packet drops. Tagged status mode
tells the controller the acked sequence number, so the controller will
interrupt again if there is a sequence number difference between the
driver and the status block. I think this is a more reliable way to
catch interrupts as well as to fix status block coherency.

> not be as nice as that, but something like this seems to be needed
> for a status block to work efficiently.
>

I still think the bge(4) hardware, except BCM5700, was correctly
designed. bge(4) just didn't take advantage of it.

> Anyway, the old race window (if it is a race that matters) isn't similar,
> but is the opposite race.
>
> >
> >>done first, and why (to ensure getting another interrupt if the status
> >>block changes after we have looked at it).
> >>
> >>%	/*
> >>%	 * Do the mandatory PCI flush as well as get the link status.
> >>%	 */
> >>%	statusword = CSR_READ_4(sc, BGE_MAC_STS) & BGE_MACSTAT_LINK_CHANGED;
> >>%
> >>%	/* Make sure the descriptor ring indexes are coherent. */
> >>%	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
> >>%	    sc->bge_cdata.bge_status_map,
> >>%	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
> >>%	rx_prod = sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx;
> >>%	tx_cons = sc->bge_ldata.bge_status_block->bge_idx[0].bge_tx_cons_idx;
> >>%	sc->bge_ldata.bge_status_block->bge_status = 0;
> >>%	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
> >>%	    sc->bge_cdata.bge_status_map,
> >>%	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
> >>
> >>The above presumably gives sufficiently coherent accesses to the status
> >>block, but what happens if a status update occurs now (before the ack).
> >
> >Theoretically this may happen if interrupt is shared with other
> >devices. Since bge(4) does not check whether the interrupt is ours
> >it may blindly process the interrupt.
>
> Well, my version checks, but you said before that some hardware cannot
> be trusted to get this right (it works with my 5701 UP no-MSI).
>

Yes, in the non-MSI case the interrupt could be delivered before the
status block is updated. This is the reason why bge(4) still has
CSR_READ_4(sc, BGE_MAC_STS) to ensure the status block update. This is
a different issue and I think it can't be easily avoided.

> I see that you are saying that your change doesn't help if the interrupt
> isn't ours.  Then the ack is still done first (long ago, on the last
> bge_intr()) so the above may run partly before the interrupt is ours
> and partly after, giving the same problem (the one that I don't

As long as status block coherency is guaranteed there is no negative
effect except doing some unnecessary work. This is the same behavior as
before.
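The tagged status idea described in this message can be sketched as a toy model. Everything here is an assumption made for illustration: `tagged_nic`, `tagged_nic_update()`, `tagged_nic_ack()` and `irq_pending_after_tagged_ack()` are hypothetical names, and the real tag width and mailbox encoding are not modeled. The only behavior taken from the thread is that the driver tells the controller which sequence number it has acked, and the controller interrupts again if that differs from the status block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model of tagged status; not the real hardware. */
struct tagged_nic {
	uint32_t status_tag;	/* bumped by the device on every update */
	bool irq_asserted;	/* true while an interrupt is pending */
};

/* Device side: each status block update gets a new tag. */
static void
tagged_nic_update(struct tagged_nic *nic)
{
	nic->status_tag++;
	nic->irq_asserted = true;
}

/*
 * Driver side: the ACK carries the last tag the driver processed.
 * The device keeps the interrupt pending if it has moved past it.
 */
static void
tagged_nic_ack(struct tagged_nic *nic, uint32_t acked_tag)
{
	nic->irq_asserted = (acked_tag != nic->status_tag);
}

/*
 * Read the status block first, optionally let one update slip in,
 * then ACK with the tag that was read.  Returns whether an interrupt
 * is still pending afterwards.
 */
bool
irq_pending_after_tagged_ack(bool update_before_ack)
{
	struct tagged_nic nic = { .status_tag = 1, .irq_asserted = true };
	uint32_t seen;

	seen = nic.status_tag;
	if (update_before_ack)
		tagged_nic_update(&nic);
	tagged_nic_ack(&nic, seen);
	return (nic.irq_asserted);
}
```

In this model the read-then-ACK order no longer loses an update that lands before the ACK, because the stale tag in the ACK keeps the interrupt pending. It says nothing about when the ACK write itself reaches the device.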
Re: svn commit: r205090 - head/sys/dev/bge
On Sun, 14 Mar 2010, Pyun YongHyeon wrote:

>On Sat, Mar 13, 2010 at 11:05:11PM +1100, Bruce Evans wrote:
>>On Fri, 12 Mar 2010, Pyun YongHyeon wrote:
>>
>>>Log:
>>>Reorder interrupt handler a bit such that producer/consumer
>>>index of status block is read first before acknowledging the
>>>interrupts. Otherwise bge(4) may get stale status block as
>>>acknowledging an interrupt may yield another status block update.
>>>
>>>Reviewed by:	marius
>>
>>Er, doesn't this give a race instead?  It undoes a critical part of
>>rev.1.169 but not the comment part which still says that the ack is
>
>You're probably right it may increase race window for interrupts.

Rev.1.169 was supposed to fix all races involving the interrupt ack.
An increase from 0 to epsilon is large :-).

>But it ensures coherent accesses to the status block which is more
>important than losing interrupts.

Interrupts weren't lost after 1.169 AFAIK.  Rather the reverse.  We
take a few extra interrupts in order not to miss any.  I don't see how
your change improves coherency but don't really understand the coherency.
But^2 I now remember discussing with scottl that there is no locking
for the status block and it is not clear whether there should be or
how things mostly work when there isn't (mutex-type locking seems to
be useless).

>I think old code fought not to
>lose interrupts by acknowledging interrupts first and tried to get
>next interrupts if status block was updated in interrupt handler.
>Old code also had similar race window because it blindly
>acknowledged the interrupt, status block could be updated after the
>acknowledgment of the interrupt.

Why is this fighting?  Acking the interrupt should have no effect on
the contents of the status block, and any status block update after
acking should cause a new interrupt.  Of course the hardware might
not be as nice as that, but something like this seems to be needed
for a status block to work efficiently.

Anyway, the old race window (if it is a race that matters) isn't similar,
but is the opposite race.

>
>>done first, and why (to ensure getting another interrupt if the status
>>block changes after we have looked at it).
>>
>>%	/*
>>%	 * Do the mandatory PCI flush as well as get the link status.
>>%	 */
>>%	statusword = CSR_READ_4(sc, BGE_MAC_STS) & BGE_MACSTAT_LINK_CHANGED;
>>%
>>%	/* Make sure the descriptor ring indexes are coherent. */
>>%	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
>>%	    sc->bge_cdata.bge_status_map,
>>%	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
>>%	rx_prod = sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx;
>>%	tx_cons = sc->bge_ldata.bge_status_block->bge_idx[0].bge_tx_cons_idx;
>>%	sc->bge_ldata.bge_status_block->bge_status = 0;
>>%	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
>>%	    sc->bge_cdata.bge_status_map,
>>%	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
>>
>>The above presumably gives sufficiently coherent accesses to the status
>>block, but what happens if a status update occurs now (before the ack).
>
>Theoretically this may happen if interrupt is shared with other
>devices. Since bge(4) does not check whether the interrupt is ours
>it may blindly process the interrupt.

Well, my version checks, but you said before that some hardware cannot
be trusted to get this right (it works with my 5701 UP no-MSI).

I see that you are saying that your change doesn't help if the interrupt
isn't ours.  Then the ack is still done first (long ago, on the last
bge_intr()) so the above may run partly before the interrupt is ours
and partly after, giving the same problem (the one that I don't
understand :-) caused by running the above after an ack.  Is the status
block supposed to be frozen once an interrupt really for us occurs until
we ack the interrupt?

>>Doesn't the ack prevent an interrupt for this status update?  I think
>
>If interrupt is shared with other devices it wouldn't.
>
>>tx_prod and tx cons (read above) don't become stale since they are only
>>advanced by software, and we may process tx and rx descriptors beyond
>>the ones reported by status updates before or after the ack, but
>>statusword (read above) does become stale.
>
>I think this was a long standing bug of bge(4) and it kept bge(4)
>from running on PAE environments. In order not to lose interrupts
>I believe bge(4) should use tagged status mode which will enable
>interrupt tracking via status block. Relying on some timing in
>interrupt handler can't solve the root cause. All controllers
>except BCM5700 supports tagged status mode and this commit is
>the first step to the tagged status mode which requires coherent
>accesses to the status block. Because driver should tell which
>interrupts were handled in the interrupt handler coherent access
>to status block is critical one.

...

>>%	 *
>>%	 * We do the ack first to ensure another interrupt if there is a
>>%	 * status update after the ack.  We don't check for the status
>>
>>But we don't do the ack first any more.
Re: svn commit: r205090 - head/sys/dev/bge
On Sat, Mar 13, 2010 at 11:05:11PM +1100, Bruce Evans wrote:
> On Fri, 12 Mar 2010, Pyun YongHyeon wrote:
>
> >Log:
> > Reorder interrupt handler a bit such that producer/consumer
> > index of status block is read first before acknowledging the
> > interrupts. Otherwise bge(4) may get stale status block as
> > acknowledging an interrupt may yield another status block update.
> >
> > Reviewed by:	marius
>
> Er, doesn't this give a race instead?  It undoes a critical part of
> rev.1.169 but not the comment part which still says that the ack is

You're probably right that it may increase the race window for
interrupts. But it ensures coherent accesses to the status block, which
is more important than losing interrupts. I think the old code fought
not to lose interrupts by acknowledging interrupts first and tried to
get the next interrupt if the status block was updated in the interrupt
handler. The old code also had a similar race window: because it
blindly acknowledged the interrupt, the status block could be updated
after the acknowledgment of the interrupt.

> done first, and why (to ensure getting another interrupt if the status
> block changes after we have looked at it).
>
> %	/*
> %	 * Do the mandatory PCI flush as well as get the link status.
> %	 */
> %	statusword = CSR_READ_4(sc, BGE_MAC_STS) & BGE_MACSTAT_LINK_CHANGED;
> %
> %	/* Make sure the descriptor ring indexes are coherent. */
> %	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
> %	    sc->bge_cdata.bge_status_map,
> %	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
> %	rx_prod = sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx;
> %	tx_cons = sc->bge_ldata.bge_status_block->bge_idx[0].bge_tx_cons_idx;
> %	sc->bge_ldata.bge_status_block->bge_status = 0;
> %	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
> %	    sc->bge_cdata.bge_status_map,
> %	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
>
> The above presumably gives sufficiently coherent accesses to the status
> block, but what happens if a status update occurs now (before the ack).

Theoretically this may happen if the interrupt is shared with other
devices. Since bge(4) does not check whether the interrupt is ours, it
may blindly process the interrupt.

> Doesn't the ack prevent an interrupt for this status update?  I think

If the interrupt is shared with other devices it wouldn't.

> tx_prod and tx cons (read above) don't become stale since they are only
> advanced by software, and we may process tx and rx descriptors beyond
> the ones reported by status updates before or after the ack, but
> statusword (read above) does become stale.
>

I think this was a long-standing bug of bge(4) and it kept bge(4) from
running on PAE environments. In order not to lose interrupts I believe
bge(4) should use tagged status mode, which enables interrupt tracking
via the status block. Relying on timing in the interrupt handler can't
solve the root cause. All controllers except BCM5700 support tagged
status mode, and this commit is the first step toward tagged status
mode, which requires coherent accesses to the status block. Because the
driver must tell the controller which interrupts were handled in the
interrupt handler, coherent access to the status block is critical.

> %
> %	/*
> %	 * Ack the interrupt by writing something to BGE_MBX_IRQ0_LO.  Don't
> %	 * disable interrupts by writing nonzero like we used to, since with
> %	 * our current organization this just gives complications and
> %	 * pessimizations for re-enabling interrupts.  We used to have races
> %	 * instead of the necessary complications.  Disabling interrupts
>
> I don't remember seeing races with the current order, but I seem to
> remember seeing them when the ack was the last hardware thing in the
> function.  As described in detail below, the latter gives quite a
> large race window so it is easy to miss an interrupt.
>
> %	 * would just reduce the chance of a status update while we are
> %	 * running (by switching to the interrupt-mode coalescence
> %	 * parameters), but this chance is already very low so it is more
> %	 * efficient to get another interrupt than prevent it.
>
> This describes why it doesn't matter if we get an extra interrupt due to
> the status block being updated after the ack, even in rev.1.168 when the
> race window was much larger (it was the entire runtime of bge_intr(),
> which can be several hundred uS; now it is several hundred nS).
>

Correct. Having an extra interrupt wouldn't hurt, so bge(4) should
implement a reliable way to track the last acked interrupt instead of
relying on timing.

> %	 *
> %	 * We do the ack first to ensure another interrupt if there is a
> %	 * status update after the ack.  We don't check for the status
>
> But we don't do the ack first any more.
>

I agree the current comment does not match the code. I'll fix that
after implementing tagged status mode.

> %	 * changing later because it is more effici
Re: svn commit: r205090 - head/sys/dev/bge
On Fri, 12 Mar 2010, Pyun YongHyeon wrote:

> Log:
>  Reorder interrupt handler a bit such that producer/consumer
>  index of status block is read first before acknowledging the
>  interrupts. Otherwise bge(4) may get stale status block as
>  acknowledging an interrupt may yield another status block update.
>
>  Reviewed by:	marius

Er, doesn't this give a race instead?  It undoes a critical part of
rev.1.169 but not the comment part which still says that the ack is
done first, and why (to ensure getting another interrupt if the status
block changes after we have looked at it).

%	/*
%	 * Do the mandatory PCI flush as well as get the link status.
%	 */
%	statusword = CSR_READ_4(sc, BGE_MAC_STS) & BGE_MACSTAT_LINK_CHANGED;
%
%	/* Make sure the descriptor ring indexes are coherent. */
%	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
%	    sc->bge_cdata.bge_status_map,
%	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
%	rx_prod = sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx;
%	tx_cons = sc->bge_ldata.bge_status_block->bge_idx[0].bge_tx_cons_idx;
%	sc->bge_ldata.bge_status_block->bge_status = 0;
%	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
%	    sc->bge_cdata.bge_status_map,
%	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);

The above presumably gives sufficiently coherent accesses to the status
block, but what happens if a status update occurs now (before the ack)?
Doesn't the ack prevent an interrupt for this status update?  I think
tx_prod and tx cons (read above) don't become stale since they are only
advanced by software, and we may process tx and rx descriptors beyond
the ones reported by status updates before or after the ack, but
statusword (read above) does become stale.

%
%	/*
%	 * Ack the interrupt by writing something to BGE_MBX_IRQ0_LO.  Don't
%	 * disable interrupts by writing nonzero like we used to, since with
%	 * our current organization this just gives complications and
%	 * pessimizations for re-enabling interrupts.  We used to have races
%	 * instead of the necessary complications.  Disabling interrupts

I don't remember seeing races with the current order, but I seem to
remember seeing them when the ack was the last hardware thing in the
function.  As described in detail below, the latter gives quite a
large race window so it is easy to miss an interrupt.

%	 * would just reduce the chance of a status update while we are
%	 * running (by switching to the interrupt-mode coalescence
%	 * parameters), but this chance is already very low so it is more
%	 * efficient to get another interrupt than prevent it.

This describes why it doesn't matter if we get an extra interrupt due to
the status block being updated after the ack, even in rev.1.168 when the
race window was much larger (it was the entire runtime of bge_intr(),
which can be several hundred uS; now it is several hundred nS).

%	 *
%	 * We do the ack first to ensure another interrupt if there is a
%	 * status update after the ack.  We don't check for the status

But we don't do the ack first any more.

%	 * changing later because it is more efficient to get another
%	 * interrupt than prevent it, not quite as above (not checking is
%	 * a smaller optimization than not toggling the interrupt enable,
%	 * since checking doesn't involve PCI accesses and toggling require
%	 * the status check).  So toggling would probably be a pessimization
%	 * even with MSI.  It would only be needed for using a task queue.
%	 */
%	bge_writembx(sc, BGE_MBX_IRQ0_LO, 0);

Bruce
svn commit: r205090 - head/sys/dev/bge
Author: yongari
Date: Fri Mar 12 18:18:04 2010
New Revision: 205090
URL: http://svn.freebsd.org/changeset/base/205090

Log:
  Reorder interrupt handler a bit such that producer/consumer
  index of status block is read first before acknowledging the
  interrupts. Otherwise bge(4) may get stale status block as
  acknowledging an interrupt may yield another status block update.

  Reviewed by:	marius

Modified:
  head/sys/dev/bge/if_bge.c

Modified: head/sys/dev/bge/if_bge.c
==============================================================================
--- head/sys/dev/bge/if_bge.c	Fri Mar 12 17:55:29 2010	(r205089)
+++ head/sys/dev/bge/if_bge.c	Fri Mar 12 18:18:04 2010	(r205090)
@@ -3654,6 +3654,22 @@ bge_intr(void *xsc)
 #endif
 
 	/*
+	 * Do the mandatory PCI flush as well as get the link status.
+	 */
+	statusword = CSR_READ_4(sc, BGE_MAC_STS) & BGE_MACSTAT_LINK_CHANGED;
+
+	/* Make sure the descriptor ring indexes are coherent. */
+	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
+	    sc->bge_cdata.bge_status_map,
+	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
+	rx_prod = sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx;
+	tx_cons = sc->bge_ldata.bge_status_block->bge_idx[0].bge_tx_cons_idx;
+	sc->bge_ldata.bge_status_block->bge_status = 0;
+	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
+	    sc->bge_cdata.bge_status_map,
+	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
+
+	/*
 	 * Ack the interrupt by writing something to BGE_MBX_IRQ0_LO.  Don't
 	 * disable interrupts by writing nonzero like we used to, since with
 	 * our current organization this just gives complications and
@@ -3675,22 +3691,6 @@ bge_intr(void *xsc)
 	 */
 	bge_writembx(sc, BGE_MBX_IRQ0_LO, 0);
 
-	/*
-	 * Do the mandatory PCI flush as well as get the link status.
-	 */
-	statusword = CSR_READ_4(sc, BGE_MAC_STS) & BGE_MACSTAT_LINK_CHANGED;
-
-	/* Make sure the descriptor ring indexes are coherent. */
-	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
-	    sc->bge_cdata.bge_status_map,
-	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
-	rx_prod = sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx;
-	tx_cons = sc->bge_ldata.bge_status_block->bge_idx[0].bge_tx_cons_idx;
-	sc->bge_ldata.bge_status_block->bge_status = 0;
-	bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
-	    sc->bge_cdata.bge_status_map,
-	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
-
 	if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
 	    sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
 	    statusword || sc->bge_link_evt)