Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
Also on a related note, have you checked the driver for the needed PCI posting flushes? + + /* Disable IRQs by clearing the interrupt mask */ + writel(1, c2dev-regs + C2_IDIS); + writel(0, c2dev-regs + C2_NIMR0); like here... This code is followed by a call to c2_reset(), which interacts with the firmware on the adapter to quiesce the hardware. So I don't think we need to wait here for the posted writes to flush... + + elem = tx_ring-to_use; + elem-skb = skb; + elem-mapaddr = mapaddr; + elem-maplen = maplen; + + /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS); or here No need here. This logic submits the packet for transmission. We don't assume it is transmitted until we (after a completion interrupt usually) read back the HTXD entry and see the TXP_HTXD_DONE bit set (see c2_tx_interrupt()). Steve. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
+ /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS); or here No need here. This logic submits the packet for transmission. We don't assume it is transmitted until we (after a completion interrupt usually) read back the HTXD entry and see the TXP_HTXD_DONE bit set (see c2_tx_interrupt()). ... but will that interrupt happen at all if these 3 writes never hit the hardware? ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
On Fri, 2006-06-23 at 15:48 +0200, Arjan van de Ven wrote: + /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS); or here No need here. This logic submits the packet for transmission. We don't assume it is transmitted until we (after a completion interrupt usually) read back the HTXD entry and see the TXP_HTXD_DONE bit set (see c2_tx_interrupt()). ... but will that interrupt happen at all if these 3 writes never hit the hardware? I thought the posted write WILL eventually get to adapter memory. Not stall forever cached in a bridge. I'm wrong? My point is for a given HTXD entry, we write it to post a packet for transmission, then only free the packet memory and reuse this entry _after_ reading the HTXD and seeing the DONE bit set. So I still don't see a problem. But I've been wrong before ;-) Steve. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
On Fri, 2006-06-23 at 08:56 -0500, Steve Wise wrote: On Fri, 2006-06-23 at 15:48 +0200, Arjan van de Ven wrote: + /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS); or here No need here. This logic submits the packet for transmission. We don't assume it is transmitted until we (after a completion interrupt usually) read back the HTXD entry and see the TXP_HTXD_DONE bit set (see c2_tx_interrupt()). ... but will that interrupt happen at all if these 3 writes never hit the hardware? I thought the posted write WILL eventually get to adapter memory. Not stall forever cached in a bridge. I'm wrong? I'm not sure there is a theoretical upper bound (and if it's several msec per bridge, then you have a lot of latency anyway) ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
On Fri, Jun 23, 2006 at 04:04:31PM +0200, Arjan van de Ven wrote: I thought the posted write WILL eventually get to adapter memory. Not stall forever cached in a bridge. I'm wrong? I'm not sure there is a theoretical upper bound I'm not aware of one either since MMIO writes can travel across many other chips that are not constrained by PCI ordering rules (I'm thinking of SGI Altix...) (and if it's several msec per bridge, then you have a lot of latency anyway) That's what my original concern was when I saw you point this out. But MMIO reads here would be expensive and many drivers tolerate this latency in exchange for avoiding the MMIO read in the performance path. grant ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
At 10:14 AM 6/23/2006, Grant Grundler wrote: On Fri, Jun 23, 2006 at 04:04:31PM +0200, Arjan van de Ven wrote: I thought the posted write WILL eventually get to adapter memory. Not stall forever cached in a bridge. I'm wrong? I'm not sure there is a theoretical upper bound I'm not aware of one either since MMIO writes can travel across many other chips that are not constrained by PCI ordering rules (I'm thinking of SGI Altix...) It is processor / coherency backplane technology specific as to the number of outstanding writes. There is also no guarantee that such writes will hit the top of the PCI hierarchy in the order they were posted in a multi-core / processor system. Hence, it is up to software to guarantee that ordering is preserved and to not assume anything about ordering from a hardware perspective. Once a transaction hits the PCI hierarchy, then the PCI ordering rules apply and depending upon the transaction type and other rules, what is guaranteed is deterministic in nature. (and if it's several msec per bridge, then you have a lot of latency anyway) That's what my original concern was when I saw you point this out. But MMIO reads here would be expensive and many drivers tolerate this latency in exchange for avoiding the MMIO read in the performance path. As the saying goes, MMIO Reads are pure evil and should be avoided at all costs if performance is the goal. Even in a relatively flat I/O hierarchy, the additional latency is non-trivial and can lead to a significant loss in performance for the system. Mike ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
On Tue, 2006-06-20 at 22:43 +0200, Arjan van de Ven wrote: On Tue, 2006-06-20 at 15:30 -0500, Steve Wise wrote: +/* + * Allocate TX ring elements and chain them together. + * One-to-one association of adapter descriptors with ring elements. + */ +static int c2_tx_ring_alloc(struct c2_ring *tx_ring, void *vaddr, + dma_addr_t base, void __iomem * mmio_txp_ring) +{ + struct c2_tx_desc *tx_desc; + struct c2_txp_desc __iomem *txp_desc; + struct c2_element *elem; + int i; + + tx_ring-start = kmalloc(sizeof(*elem) * tx_ring-count, GFP_KERNEL); I would think this needs a dma_alloc_coherent() rather than a kmalloc... No, this memory is used to describe the tx ring from the host's perspective. The HW never touches this memory. The HW's TX descriptor ring is in adapter memory and is mapped into host memory (see c2dev-mmio_txp_ring). + +/* Free all buffers in RX ring, assumes receiver stopped */ +static void c2_rx_clean(struct c2_port *c2_port) +{ + struct c2_dev *c2dev = c2_port-c2dev; + struct c2_ring *rx_ring = c2_port-rx_ring; + struct c2_element *elem; + struct c2_rx_desc *rx_desc; + + elem = rx_ring-start; + do { + rx_desc = elem-ht_desc; + rx_desc-len = 0; + + __raw_writew(0, elem-hw_desc + C2_RXP_STATUS); + __raw_writew(0, elem-hw_desc + C2_RXP_COUNT); + __raw_writew(0, elem-hw_desc + C2_RXP_LEN); you seem to be a fan of the __raw_write() functions... any reason why? __raw_ is not a magic go faster prefix In this particular case, I believe this is done to avoid a swap of '0' since its not necessary. In other places, __raw is used because the adapter needs the data in BE and we want to explicitly swap it using cpu_to_be* then raw_write it to the adapter memory... Also on a related note, have you checked the driver for the needed PCI posting flushes? Um, what's a 'PCI posting flush'? Can you point me where its described/used so I can see if we need it? Thanx. + + /* Disable IRQs by clearing the interrupt mask */ + writel(1, c2dev-regs + C2_IDIS); + writel(0, c2dev-regs + C2_NIMR0); like here... + + elem = tx_ring-to_use; + elem-skb = skb; + elem-mapaddr = mapaddr; + elem-maplen = maplen; + + /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS); or here +static int c2_change_mtu(struct net_device *netdev, int new_mtu) +{ + int ret = 0; + + if (new_mtu ETH_ZLEN || new_mtu ETH_JUMBO_MTU) + return -EINVAL; + + netdev-mtu = new_mtu; + + if (netif_running(netdev)) { + c2_down(netdev); + + c2_up(netdev); + } this looks odd... The 1100 hardware caches the dma address of the next skb that will be used to place data. When the MTU changes, we want to free the SKBs in the RX descriptor ring and get new ones that sufficient for the new MTU. To effectively flush that cached address of the old skb, we must quiesce the HW and firmware (via c2_down()), then reinitialize everything with skb's big enough for the new mtu. Steve. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
On Wed, Jun 21, 2006 at 11:32:51AM -0500, Steve Wise wrote: Um, what's a 'PCI posting flush'? Can you point me where its described/used so I can see if we need it? Thanx. I've written this up before: http://iou.parisc-linux.org/ols_2002/4Posted_vs_Non_Posted.html grant ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
ok pci posting... basically, if you use writel() and co, the PCI bridges in the middle are allowed (and the more fancy ones do) cache the write, to see if more writes follow, so that the bridge can do the writes as a single burst to the device, rather than as individual writes. This is of course great... ... except when you really want the write to hit the device before the driver continues with other actions. Now the PCI spec is set up such that any traffic in the other direction (basically readl() and co) will first flush the write through the system before the read is actually sent to the device, so doing a dummy readl() is a good way to flush any pending posted writes. Where does this matter? it matters most at places such as irq enabling/disabling, IO submission and possibly IRQ acking, but also often in eeprom-like read/write logic (where you do manual clocking and need to do delays between the write()'s). But in general... any place where you do writel() without doing any readl() before doing nothing to the card for a long time, or where you are waiting for the card to do something (or want it done NOW, such as IRQ disabling) you need to issue a (dummy) readl() to flush pending writes out to the hardware. does this explanation make any sense? if not please feel free to ask any questions, I know I'm not always very good at explaining things. Yep. I get it. I believe we're ok in this respect, but I'll review the code again with an eye for this issue... Steve. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.
On Tue, 2006-06-20 at 15:30 -0500, Steve Wise wrote: +/* + * Allocate TX ring elements and chain them together. + * One-to-one association of adapter descriptors with ring elements. + */ +static int c2_tx_ring_alloc(struct c2_ring *tx_ring, void *vaddr, + dma_addr_t base, void __iomem * mmio_txp_ring) +{ + struct c2_tx_desc *tx_desc; + struct c2_txp_desc __iomem *txp_desc; + struct c2_element *elem; + int i; + + tx_ring-start = kmalloc(sizeof(*elem) * tx_ring-count, GFP_KERNEL); I would think this needs a dma_alloc_coherent() rather than a kmalloc... + +/* Free all buffers in RX ring, assumes receiver stopped */ +static void c2_rx_clean(struct c2_port *c2_port) +{ + struct c2_dev *c2dev = c2_port-c2dev; + struct c2_ring *rx_ring = c2_port-rx_ring; + struct c2_element *elem; + struct c2_rx_desc *rx_desc; + + elem = rx_ring-start; + do { + rx_desc = elem-ht_desc; + rx_desc-len = 0; + + __raw_writew(0, elem-hw_desc + C2_RXP_STATUS); + __raw_writew(0, elem-hw_desc + C2_RXP_COUNT); + __raw_writew(0, elem-hw_desc + C2_RXP_LEN); you seem to be a fan of the __raw_write() functions... any reason why? __raw_ is not a magic go faster prefix Also on a related note, have you checked the driver for the needed PCI posting flushes? + + /* Disable IRQs by clearing the interrupt mask */ + writel(1, c2dev-regs + C2_IDIS); + writel(0, c2dev-regs + C2_NIMR0); like here... + + elem = tx_ring-to_use; + elem-skb = skb; + elem-mapaddr = mapaddr; + elem-maplen = maplen; + + /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS); or here +static int c2_change_mtu(struct net_device *netdev, int new_mtu) +{ + int ret = 0; + + if (new_mtu ETH_ZLEN || new_mtu ETH_JUMBO_MTU) + return -EINVAL; + + netdev-mtu = new_mtu; + + if (netif_running(netdev)) { + c2_down(netdev); + + c2_up(netdev); + } this looks odd... ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general