Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-23 Thread Steve Wise

 
 Also on a related note, have you checked the driver for the needed PCI
 posting flushes?
 
  +
  +   /* Disable IRQs by clearing the interrupt mask */
  +   writel(1, c2dev-regs + C2_IDIS);
  +   writel(0, c2dev-regs + C2_NIMR0);
 
 like here...

This code is followed by a call to c2_reset(), which interacts with the
firmware on the adapter to quiesce the hardware.  So I don't think we
need to wait here for the posted writes to flush...

  +
  +   elem = tx_ring-to_use;
  +   elem-skb = skb;
  +   elem-mapaddr = mapaddr;
  +   elem-maplen = maplen;
  +
  +   /* Tell HW to xmit */
  +   __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR);
  +   __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN);
  +   __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS);
 
 or here
 

No need here.  This logic submits the packet for transmission.  We don't
assume it is transmitted until we (after a completion interrupt usually)
read back the HTXD entry and see the TXP_HTXD_DONE bit set (see
c2_tx_interrupt()). 


Steve.



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-23 Thread Arjan van de Ven

   + /* Tell HW to xmit */
   + __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR);
   + __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN);
   + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS);
  
  or here
  
 
 No need here.  This logic submits the packet for transmission.  We don't
 assume it is transmitted until we (after a completion interrupt usually)
 read back the HTXD entry and see the TXP_HTXD_DONE bit set (see
 c2_tx_interrupt()). 

... but will that interrupt happen at all if these 3 writes never hit
the hardware?



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-23 Thread Steve Wise
On Fri, 2006-06-23 at 15:48 +0200, Arjan van de Ven wrote:
+   /* Tell HW to xmit */
+   __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR);
+   __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN);
+   __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + 
C2_TXP_FLAGS);
   
   or here
   
  
  No need here.  This logic submits the packet for transmission.  We don't
  assume it is transmitted until we (after a completion interrupt usually)
  read back the HTXD entry and see the TXP_HTXD_DONE bit set (see
  c2_tx_interrupt()). 
 
 ... but will that interrupt happen at all if these 3 writes never hit
 the hardware?
 

I thought the posted write WILL eventually get to adapter memory.  Not
stall forever cached in a bridge.  I'm wrong?

My point is for a given HTXD entry, we write it to post a packet for
transmission, then only free the packet memory and reuse this entry
_after_ reading the HTXD and seeing the DONE bit set.  So I still don't
see a problem.  But I've been wrong before ;-)

Steve.





___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-23 Thread Arjan van de Ven
On Fri, 2006-06-23 at 08:56 -0500, Steve Wise wrote:
 On Fri, 2006-06-23 at 15:48 +0200, Arjan van de Ven wrote:
 + /* Tell HW to xmit */
 + __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR);
 + __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN);
 + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + 
 C2_TXP_FLAGS);

or here

   
   No need here.  This logic submits the packet for transmission.  We don't
   assume it is transmitted until we (after a completion interrupt usually)
   read back the HTXD entry and see the TXP_HTXD_DONE bit set (see
   c2_tx_interrupt()). 
  
  ... but will that interrupt happen at all if these 3 writes never hit
  the hardware?
  
 
 I thought the posted write WILL eventually get to adapter memory.  Not
 stall forever cached in a bridge.  I'm wrong?

I'm not sure there is a theoretical upper bound 

(and if it's several msec per bridge, then you have a lot of latency
anyway)


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-23 Thread Grant Grundler
On Fri, Jun 23, 2006 at 04:04:31PM +0200, Arjan van de Ven wrote:
  I thought the posted write WILL eventually get to adapter memory.  Not
  stall forever cached in a bridge.  I'm wrong?
 
 I'm not sure there is a theoretical upper bound 

I'm not aware of one either since MMIO writes can travel
across many other chips that are not constrained by
PCI ordering rules (I'm thinking of SGI Altix...)

 (and if it's several msec per bridge, then you have a lot of latency
 anyway)

That's what my original concern was when I saw you point this out.
But MMIO reads here would be expensive and many drivers tolerate
this latency in exchange for avoiding the MMIO read in the
performance path.

grant

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-23 Thread Michael Krause


At 10:14 AM 6/23/2006, Grant Grundler wrote:
On Fri, Jun 23, 2006 at
04:04:31PM +0200, Arjan van de Ven wrote:
  I thought the posted write WILL eventually get to adapter
memory. Not
  stall forever cached in a bridge. I'm wrong?
 
 I'm not sure there is a theoretical upper bound 
I'm not aware of one either since MMIO writes can travel
across many other chips that are not constrained by
PCI ordering rules (I'm thinking of SGI
Altix...)
It is processor / coherency backplane technology specific as to the
number of outstanding writes. There is also no guarantee that such
writes will hit the top of the PCI hierarchy in the order they were
posted in a multi-core / processor system. Hence, it is up to
software to guarantee that ordering is preserved and to not assume
anything about ordering from a hardware perspective. Once a
transaction hits the PCI hierarchy, then the PCI ordering rules apply and
depending upon the transaction type and other rules, what is guaranteed
is deterministic in nature.

 (and if it's
several msec per bridge, then you have a lot of latency
 anyway)
That's what my original concern was when I saw you point this out.
But MMIO reads here would be expensive and many drivers tolerate
this latency in exchange for avoiding the MMIO read in the
performance path.
As the saying goes, MMIO Reads are pure evil and should be
avoided at all costs if performance is the goal. Even in a
relatively flat I/O hierarchy, the additional latency is non-trivial and
can lead to a significant loss in performance for the system.

Mike

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-21 Thread Steve Wise
On Tue, 2006-06-20 at 22:43 +0200, Arjan van de Ven wrote:
 On Tue, 2006-06-20 at 15:30 -0500, Steve Wise wrote:
 
  +/*
  + * Allocate TX ring elements and chain them together.
  + * One-to-one association of adapter descriptors with ring elements.
  + */
  +static int c2_tx_ring_alloc(struct c2_ring *tx_ring, void *vaddr,
  +   dma_addr_t base, void __iomem * mmio_txp_ring)
  +{
  +   struct c2_tx_desc *tx_desc;
  +   struct c2_txp_desc __iomem *txp_desc;
  +   struct c2_element *elem;
  +   int i;
  +
  +   tx_ring-start = kmalloc(sizeof(*elem) * tx_ring-count, GFP_KERNEL);
 
 I would think this needs a dma_alloc_coherent() rather than a kmalloc...
 

No, this memory is used to describe the tx ring from the host's
perspective.  The HW never touches this memory.  The HW's TX descriptor
ring is in adapter memory and is mapped into host memory (see
c2dev-mmio_txp_ring).

 
  +
  +/* Free all buffers in RX ring, assumes receiver stopped */
  +static void c2_rx_clean(struct c2_port *c2_port)
  +{
  +   struct c2_dev *c2dev = c2_port-c2dev;
  +   struct c2_ring *rx_ring = c2_port-rx_ring;
  +   struct c2_element *elem;
  +   struct c2_rx_desc *rx_desc;
  +
  +   elem = rx_ring-start;
  +   do {
  +   rx_desc = elem-ht_desc;
  +   rx_desc-len = 0;
  +
  +   __raw_writew(0, elem-hw_desc + C2_RXP_STATUS);
  +   __raw_writew(0, elem-hw_desc + C2_RXP_COUNT);
  +   __raw_writew(0, elem-hw_desc + C2_RXP_LEN);
 
 you seem to be a fan of the __raw_write() functions... any reason why?
 __raw_ is not a magic go faster prefix
 

In this particular case, I believe this is done to avoid a swap of '0'
since its not necessary.  In other places, __raw is used because the
adapter needs the data in BE and we want to explicitly swap it using
cpu_to_be* then raw_write it to the adapter memory...


 Also on a related note, have you checked the driver for the needed PCI
 posting flushes?
 

Um, what's a 'PCI posting flush'?  Can you point me where its
described/used so I can see if we need it?  Thanx.


  +
  +   /* Disable IRQs by clearing the interrupt mask */
  +   writel(1, c2dev-regs + C2_IDIS);
  +   writel(0, c2dev-regs + C2_NIMR0);
 
 like here...
  +
  +   elem = tx_ring-to_use;
  +   elem-skb = skb;
  +   elem-mapaddr = mapaddr;
  +   elem-maplen = maplen;
  +
  +   /* Tell HW to xmit */
  +   __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR);
  +   __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN);
  +   __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS);
 
 or here
 
  +static int c2_change_mtu(struct net_device *netdev, int new_mtu)
  +{
  +   int ret = 0;
  +
  +   if (new_mtu  ETH_ZLEN || new_mtu  ETH_JUMBO_MTU)
  +   return -EINVAL;
  +
  +   netdev-mtu = new_mtu;
  +
  +   if (netif_running(netdev)) {
  +   c2_down(netdev);
  +
  +   c2_up(netdev);
  +   }
 
 this looks odd...
 

The 1100 hardware caches the dma address of the next skb that will be
used to place data.  When the MTU changes, we want to free the SKBs in
the RX descriptor ring and get new ones that sufficient for the new MTU.
To effectively flush that cached address of the old skb, we must quiesce
the HW and firmware (via c2_down()), then reinitialize everything with
skb's big enough for the new mtu. 

Steve.



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-21 Thread Grant Grundler
On Wed, Jun 21, 2006 at 11:32:51AM -0500, Steve Wise wrote:
 Um, what's a 'PCI posting flush'?  Can you point me where its
 described/used so I can see if we need it?  Thanx.

I've written this up before:
http://iou.parisc-linux.org/ols_2002/4Posted_vs_Non_Posted.html

grant

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-21 Thread Steve Wise

 ok pci posting...
 
 basically, if you use writel() and co, the PCI bridges in the middle are
 allowed (and the more fancy ones do) cache the write, to see if more
 writes follow, so that the bridge can do the writes as a single burst to
 the device, rather than as individual writes. This is of course great...
 ... except when you really want the write to hit the device before the
 driver continues with other actions. 
 
 Now the PCI spec is set up such that any traffic in the other direction
 (basically readl() and co) will first flush the write through the system
 before the read is actually sent to the device, so doing a dummy readl()
 is a good way to flush any pending posted writes.
 
 Where does this matter? 
 it matters most at places such as irq enabling/disabling, IO submission
 and possibly IRQ acking, but also often in eeprom-like read/write logic
 (where you do manual clocking and need to do delays between the
 write()'s). But in general... any place where you do writel() without
 doing any readl() before doing nothing to the card for a long time, or
 where you are waiting for the card to do something (or want it done NOW,
 such as IRQ disabling) you need to issue a (dummy) readl() to flush
 pending writes out to the hardware.
 
 
 does this explanation make any sense? if not please feel free to ask any
 questions, I know I'm not always very good at explaining things.

Yep.  I get it.  I believe we're ok in this respect, but I'll review the
code again with an eye for this issue...

Steve.




___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH v3 1/7] AMSO1100 Low Level Driver.

2006-06-20 Thread Arjan van de Ven
On Tue, 2006-06-20 at 15:30 -0500, Steve Wise wrote:

 +/*
 + * Allocate TX ring elements and chain them together.
 + * One-to-one association of adapter descriptors with ring elements.
 + */
 +static int c2_tx_ring_alloc(struct c2_ring *tx_ring, void *vaddr,
 + dma_addr_t base, void __iomem * mmio_txp_ring)
 +{
 + struct c2_tx_desc *tx_desc;
 + struct c2_txp_desc __iomem *txp_desc;
 + struct c2_element *elem;
 + int i;
 +
 + tx_ring-start = kmalloc(sizeof(*elem) * tx_ring-count, GFP_KERNEL);

I would think this needs a dma_alloc_coherent() rather than a kmalloc...


 +
 +/* Free all buffers in RX ring, assumes receiver stopped */
 +static void c2_rx_clean(struct c2_port *c2_port)
 +{
 + struct c2_dev *c2dev = c2_port-c2dev;
 + struct c2_ring *rx_ring = c2_port-rx_ring;
 + struct c2_element *elem;
 + struct c2_rx_desc *rx_desc;
 +
 + elem = rx_ring-start;
 + do {
 + rx_desc = elem-ht_desc;
 + rx_desc-len = 0;
 +
 + __raw_writew(0, elem-hw_desc + C2_RXP_STATUS);
 + __raw_writew(0, elem-hw_desc + C2_RXP_COUNT);
 + __raw_writew(0, elem-hw_desc + C2_RXP_LEN);

you seem to be a fan of the __raw_write() functions... any reason why?
__raw_ is not a magic go faster prefix

Also on a related note, have you checked the driver for the needed PCI
posting flushes?

 +
 + /* Disable IRQs by clearing the interrupt mask */
 + writel(1, c2dev-regs + C2_IDIS);
 + writel(0, c2dev-regs + C2_NIMR0);

like here...
 +
 + elem = tx_ring-to_use;
 + elem-skb = skb;
 + elem-mapaddr = mapaddr;
 + elem-maplen = maplen;
 +
 + /* Tell HW to xmit */
 + __raw_writeq(cpu_to_be64(mapaddr), elem-hw_desc + C2_TXP_ADDR);
 + __raw_writew(cpu_to_be16(maplen), elem-hw_desc + C2_TXP_LEN);
 + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem-hw_desc + C2_TXP_FLAGS);

or here

 +static int c2_change_mtu(struct net_device *netdev, int new_mtu)
 +{
 + int ret = 0;
 +
 + if (new_mtu  ETH_ZLEN || new_mtu  ETH_JUMBO_MTU)
 + return -EINVAL;
 +
 + netdev-mtu = new_mtu;
 +
 + if (netif_running(netdev)) {
 + c2_down(netdev);
 +
 + c2_up(netdev);
 + }

this looks odd...




___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general