Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-26 Thread Roland Dreier
    Shirley> This patch reduces one extra ring between dev xmit queue
    Shirley> and device send queue and removes tx_lock in completion
    Shirley> handler.

I don't think replacing the ring with a linked list is an
improvement.  At best it works out to be the same.

But I hadn't noticed before that you removed the tx_lock from the
send completion handler.  I suspect that this is where all the
improvement is coming from.  Unfortunately it introduces a couple of
race conditions.  For example, if CPU1 is in ipoib_send() and CPU2 is
running the send completion handler, you can have:

  CPU1                                  CPU2

  err = post_send(priv, wr_id, address->ah, qpn, addr, skb->len);
  // err is non-zero
  if (!err) {
  } else {
      if (!netif_queue_stopped(dev)) {
                                        // send completion handler
                                        // runs many times and drains
                                        // completion queue
                                        // queue is never stopped so
                                        // it never gets woken again
                                        if (netif_queue_stopped(dev))
                                            netif_wake_queue(dev);

          netif_stop_queue(dev);
                                        // no more sends are posted so
                                        // another completion never
                                        // occurs and the queue stays
                                        // stopped forever.

There are analogous races that cause spurious wakeups too.

However I think this is a good idea, and I think we can get around the
races by working around them in a way that doesn't hurt performance
(e.g. by having the netdev tx watchdog restart the queue in the scenario
above).  I cooked up a quick patch (below) but I didn't see any
performance improvement from this in my quick tests -- NPtcp peaks at
~3400 Mbit/sec between my test systems with 2.6.17-rc5 both with and
without this patch.

Then I tried to apply your patches 1/7 and 3/7 to see if they were any
different on my setup, but the patches didn't apply.  3/7 had no
attachment, and it's hopeless to try and apply the patches you send
inline.  So can you resend up-to-date versions of the patches that
give you a 10% improvement?  (BTW it would be nice if you could figure
out a way to fix your mail client to post patches inline without
mangling them, or at least attach them with a mime type of text/plain
or something)

Also, if you're interested, you could try the patch below and see how
it does on your tests.

 - R.

Here's my tx_lock removal test patch (it would need better comments
about all the races with tx_tail and some more careful review before
we actually apply it):

--- infiniband/ulp/ipoib/ipoib_main.c   (revision 7507)
+++ infiniband/ulp/ipoib/ipoib_main.c   (working copy)
@@ -634,6 +634,14 @@ static int ipoib_start_xmit(struct sk_bu
return NETDEV_TX_BUSY;
}
 
+   /*
+* Because tx_lock is not held when updating tx_tail in the
+* send completion handler, we may receive a spurious wakeup
+* that starts our queue when there really isn't space yet.
+*/
+   if (unlikely(priv->tx_head - priv->tx_tail == ipoib_sendq_size))
+   return NETDEV_TX_BUSY;
+
if (skb->dst && skb->dst->neighbour) {
if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) {
ipoib_path_lookup(skb, dev);
@@ -703,6 +711,21 @@ static struct net_device_stats *ipoib_ge
 static void ipoib_timeout(struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
+   unsigned long flags;
+   int lost_wakeup = 0;
+
+   spin_lock_irqsave(&priv->tx_lock, flags);
+   if (netif_queue_stopped(dev) &&
+       priv->tx_head - priv->tx_tail < ipoib_sendq_size) {
+   ipoib_dbg(priv, "lost wakeup, head %u, tail %u\n",
+     priv->tx_head, priv->tx_tail);
+   lost_wakeup = 1;
+   netif_wake_queue(dev);
+   }
+   spin_unlock_irqrestore(&priv->tx_lock, flags);
+
+   if (lost_wakeup)
+   return;
 
ipoib_warn(priv, "transmit timeout: latency %d msecs\n",
   jiffies_to_msecs(jiffies - dev->trans_start));
--- infiniband/ulp/ipoib/ipoib_ib.c (revision 7511)
+++ infiniband/ulp/ipoib/ipoib_ib.c (working copy)
@@ -244,7 +244,6 @@ static void ipoib_ib_handle_wc(struct ne
 
} else {
struct ipoib_tx_buf *tx_req;
-   unsigned long flags;
 
if (wr_id >= ipoib_sendq_size) {
ipoib_warn(priv, "completion event with wrid %d (>= %d)\n",
@@ -266,12 +265,17 @@ static void ipoib_ib_handle_wc(struct ne
 
dev_kfree_skb_any(tx_req-skb);
 
-   spin_lock_irqsave(&priv->tx_lock, flags);
++priv->tx_tail;
+
+   /*
+* Since 

Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-26 Thread Shirley Ma

Roland,

Thanks for the review comments. I will update these patches and test
results.

> BTW it would be nice if you could figure
> out a way to fix your mail client to post patches inline without
> mangling them, or at least attach them with a mime type of text/plain
> or something.

I will use my unix account to send out patches. 

> Also, if you're interested, you could try the patch below and see how
> it does on your tests.

Sure. I will test it after this weekend.
Did you see a send queue overrun with the default tx_ring size of 128?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-26 Thread Roland Dreier
    Shirley> Sure. I will test it after this weekend. Did you see send
    Shirley> queue overrun with tx_ring default size 128?

No, but my patch would still prevent send queue overruns.  It might
stop the netdevice queue because the send queue is full, but I didn't
turn on debugging to see that.

(Also the default TX ring size is 64, not 128, isn't it?)

 - R.


Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-26 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 05/26/2006 04:20:02 PM:

> (Also the default TX ring size is 64, not 128, isn't it?)
>
>  - R.

Yes.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638





Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-25 Thread Roland Dreier
This also looks like a step backwards to me.  You are replacing a
cache-friendly array with a cache-unfriendly linked list, which also
requires two more lock/unlock operations in the fast path.

On a more general note, you are definitely generating some creative
ideas about IPoIB but you need to work on presentation and
communication if you want your patches integrated.  For example, all
you said about this patch is:

  Here is the tx_ring removal patch for you to review. 

If you want this to be merged then you need to provide a changelog
entry that explains _what_ the patch does, _how_ it does it, and most
importantly _why_ the patch is an improvement.  This is especially
important for these patches, which naively at least look like they are
making things worse.

Also, improving the quality of your patches would be helpful.  For
example, in this patch you delete the comments:

  -   /*
  -* We put the skb into the tx_ring _before_ we call post_send()
  -* because it's entirely possible that the completion handler will
  -* run before we execute anything after the post_send().  That
  -* means we have to make sure everything is properly recorded and
  -* our state is consistent before we call post_send().
  -*/

and then a few lines further down you create exactly the bug it described:

  +   err = post_send(priv, wr_id, address->ah, qpn, addr, skb->len);
  +   if (!err) {
  +   dev->trans_start = jiffies;
  +   IPOIB_SKB_PRV_ADDR(skb) = addr;
  +   IPOIB_SKB_PRV_AH(skb) = address;
  +   IPOIB_SKB_PRV_SKB(skb) = skb;
  +   spin_lock(&priv->slist_lock);
  +   list_add_tail(&IPOIB_SKB_PRV_LIST(skb), &priv->send_list);
  +   spin_unlock(&priv->slist_lock);
  +   return;

This means I have to read your patches very carefully, because they
often introduce races or other bugs.

Finally, keeping strictly to the "one idea per patch" rule and holding
back from the temptation to fiddle with unrelated things would be
nice.  In this case you have:

  -   spinlock_t   tx_lock ____cacheline_aligned_in_smp;

  +   spinlock_t   tx_lock;

That ____cacheline_aligned_in_smp annotation is something that you
added earlier in the patch series (with no explanation), and now
you're removing it (again with no explanation).  This sort of churn
just makes the patches bigger and harder to review and merge.

As I said you are generating very creative ideas for IPoIB.  Perhaps
you can find someone to help you polish the presentation so that we
can use them?

Thanks,
  Roland


Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-25 Thread Shirley Ma

Roland,

I made some mistakes when splitting these patches. Thanks for pointing
that out. The reason I removed the cacheline annotation is that I
tested it and found it didn't help. Even though I introduced some
locks in places, the overall effect of all 7 patches I am trying to
post here improves IPoIB by 20% to 80% unidirectional and doubles
bidirectional throughput. As you mentioned, I need help polishing
these patches. I am glad you gave all this valuable input on my
patches. Thanks a lot.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-25 Thread Roland Dreier
    Shirley> didn't help somehow.  Even in some code, I induced some
    Shirley> locks, the overall performance of all these 7 patches I
    Shirley> am trying to post here could improve IPoIB from 20% - 80%
    Shirley> unidirectional and doubled bidirectional.

I'm guessing that all of that gain comes from letting the send and
receive completion handlers run simultaneously.  Is that right?

For example how much of an improvement do you see if you just apply
the patches you've posted -- that is, only 1/7, 2/7 and 3/7?

 - R.


Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-25 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 05/25/2006 04:28:27 PM:

>     Shirley> didn't help somehow. Even in some code, I induced some
>     Shirley> locks, the overall performance of all these 7 patches I
>     Shirley> am trying to post here could improve IPoIB from 20% - 80%
>     Shirley> unidirectional and doubled bidirectional.
>
> I'm guessing that all of that gain comes from letting the send and
> receive completion handlers run simultaneously. Is that right?
>
> For example how much of an improvement do you see if you just apply
> the patches you've posted -- that is, only 1/7, 2/7 and 3/7?
>
>  - R.

That's not true. I tested performance with 1/7 and 3/7 a couple of
weeks ago and saw more than a 10% improvement. I never saw a send
queue overrun with the tx_ring before, on one CPU with one TCP stream.
After removing the tx_ring the send path is much faster, and the
default of 128 is not big enough to handle it. That's the reason I
have another patch to handle send queue overrun: requeue the packet at
the head of the dev xmit queue instead of the current implementation,
which silently drops packets when the device driver's send queue is
full and relies on TCP retransmission. The current behavior causes TCP
fast retransmit, slow start, and out-of-order packets.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638



Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-25 Thread Shirley Ma

Roland,

Roland Dreier [EMAIL PROTECTED] wrote on 05/25/2006 09:24:01 AM:

> This also looks like a step backwards to me.  You are replacing a
> cache-friendly array with a cache-unfriendly linked list, which also
> requires two more lock/unlock operations in the fast path.

This patch removes one extra ring between the dev xmit queue and the
device send queue and removes tx_lock in the completion handler.

The whole purpose of having the send_list and slist_lock is cleanup
during shutdown; otherwise we wouldn't need to maintain this list at
all. And most likely, by the time we shut down and wait for 5*HZ, the
list is empty.

I could implement it differently, for example with a cache-friendly
RCU list. I didn't think it was worth it before, since I didn't see
the gain.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


[openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-24 Thread Shirley Ma

Roland, 

Here is the tx_ring removal patch for
you to review.


diff -urpN infiniband-ah/ulp/ipoib/ipoib.h infiniband-tx/ulp/ipoib/ipoib.h
--- infiniband-ah/ulp/ipoib/ipoib.h	2006-05-23 10:09:05.000000000 -0700
+++ infiniband-tx/ulp/ipoib/ipoib.h	2006-05-24 11:45:52.000000000 -0700
@@ -114,11 +114,19 @@ struct ipoib_rx_buf {
 	dma_addr_t		mapping;
 };
 
-struct ipoib_tx_buf {
-	struct sk_buff *skb;
-	DECLARE_PCI_UNMAP_ADDR(mapping)
+struct ipoib_skb_prv {
+	dma_addr_t		addr;
+	struct ipoib_ah	       *ah;
+	struct sk_buff	       *skb;
+	struct list_head	list;
 };
 
+#define IPOIB_SKB_PRV_ADDR(skb)	(((struct ipoib_skb_prv *)(skb)->cb)->addr)
+#define IPOIB_SKB_PRV_AH(skb)	(((struct ipoib_skb_prv *)(skb)->cb)->ah)
+#define IPOIB_SKB_PRV_SKB(skb)	(((struct ipoib_skb_prv *)(skb)->cb)->skb)
+#define IPOIB_SKB_PRV_LIST(skb)	(((struct ipoib_skb_prv *)(skb)->cb)->list)
+
+
 /*
  * Device private locking: tx_lock protects members used in TX fast
  * path (and we use LLTX so upper layers don't do extra locking).
@@ -166,12 +174,11 @@ struct ipoib_dev_priv {
 
 	struct ipoib_rx_buf *rx_ring;
 
-	spinlock_t	     tx_lock ____cacheline_aligned_in_smp;
-	struct ipoib_tx_buf *tx_ring;
-	unsigned	     tx_head;
-	unsigned	     tx_tail;
+	spinlock_t	     tx_lock;
 	struct ib_sge	     tx_sge;
 	struct ib_send_wr    tx_wr;
+	spinlock_t	     slist_lock;
+	struct list_head     send_list;
 
 	struct list_head dead_ahs;
 
diff -urpN infiniband-ah/ulp/ipoib/ipoib_ib.c infiniband-tx/ulp/ipoib/ipoib_ib.c
--- infiniband-ah/ulp/ipoib/ipoib_ib.c	2006-05-23 10:14:08.000000000 -0700
+++ infiniband-tx/ulp/ipoib/ipoib_ib.c	2006-05-24 11:58:46.000000000 -0700
@@ -243,45 +243,36 @@ static void ipoib_ib_handle_send_wc(struct net_device *dev,
 				    struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id;
-	struct ipoib_tx_buf *tx_req;
-	unsigned long flags;
-
-	ipoib_dbg_data(priv, "called: id %d, op %d, status: %d\n",
-		       wr_id, wc->opcode, wc->status);
-
-	if (wr_id >= ipoib_sendq_size) {
-		ipoib_warn(priv, "completion event with wrid %d (>= %d)\n",
-			   wr_id, ipoib_sendq_size);
-		return;
-	}
-
-	ipoib_dbg_data(priv, "send complete, wrid %d\n", wr_id);
+	struct sk_buff *skb;
+	unsigned long wr_id = wc->wr_id;
 
-	tx_req = &priv->tx_ring[wr_id];
-
-	dma_unmap_single(priv->ca->dma_device,
-			 pci_unmap_addr(tx_req, mapping),
-			 tx_req->skb->len,
-			 DMA_TO_DEVICE);
+	skb = (struct sk_buff *)wr_id;
+	kref_put(&IPOIB_SKB_PRV_AH(skb)->ref, ipoib_free_ah);
 
+	if (IS_ERR(skb) || skb != IPOIB_SKB_PRV_SKB(skb)) {
+		ipoib_warn(priv, "send completion event with corrupted wrid\n");
+		return;
+	}
+	list_del(&IPOIB_SKB_PRV_LIST(skb));
+
+	ipoib_dbg_data(priv, "send complete, wrid %lu\n", wr_id);
+
+	dma_unmap_single(priv->ca->dma_device,
+			 IPOIB_SKB_PRV_ADDR(skb),
+			 skb->len,
+			 DMA_TO_DEVICE);
+
 	++priv->stats.tx_packets;
-	priv->stats.tx_bytes += tx_req->skb->len;
-
-	dev_kfree_skb_any(tx_req->skb);
-
-	spin_lock_irqsave(&priv->tx_lock, flags);
-	++priv->tx_tail;
-	if (netif_queue_stopped(dev) &&
-	    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1)
-		netif_wake_queue(dev);
-	spin_unlock_irqrestore(&priv->tx_lock, flags);
-
-	if (wc->status != IB_WC_SUCCESS &&
-	    wc->status != IB_WC_WR_FLUSH_ERR)
-		ipoib_warn(priv, "failed send event "
-			   "(status=%d, wrid=%d vend_err %x)\n",
-			   wc->status, wr_id, wc->vendor_err);
+	priv->stats.tx_bytes += skb->len;
+	dev_kfree_skb_any(skb);
+
+	if (netif_queue_stopped(dev))
+		netif_wake_queue(dev);
+	if (wc->status != IB_WC_SUCCESS &&
+	    wc->status != IB_WC_WR_FLUSH_ERR)
+		ipoib_warn(priv, "failed send event "
+			   "(status=%d, wrid=%lu vend_err %x)\n",
+			   wc->status, wr_id, wc->vendor_err);
 }
 
 void ipoib_ib_send_completion(struct ib_cq *cq, void *dev_ptr)
@@ -313,7 +304,7 @@ void ipoib_ib_recv_completion(struct ib_cq *cq, void *dev_ptr)
 }
 
 static inline int post_send(struct ipoib_dev_priv *priv,
-			    unsigned int wr_id,
+			    unsigned long wr_id,
 			    struct ib_ah *address, u32 qpn,
 			    dma_addr_t addr, int len)
 {
@@ -333,8 +324,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 		struct ipoib_ah *address, u32 qpn)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ipoib_tx_buf *tx_req;
 	dma_addr_t addr;
+	unsigned long wr_id;
+	unsigned long flags;
 	int err;
 
 	kref_get(&address->ref);
@@ -350,38 +342,31 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 
 	ipoib_dbg_data(priv, "sending packet, length=%d address=%p qpn=0x%06x\n",
 		       skb->len, address, qpn);
-
-	/*
-	 * We put the skb into the tx_ring _before_ we call post_send()
-	 * because it's entirely possible that the completion 

Re: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring

2006-05-24 Thread Shirley Ma

Oops, I missed one pair of spin_lock_irqsave()/spin_unlock_irqrestore()
to protect send_list in ipoib_ib_handle_send_wc().

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638