Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-16 Thread Pradeep Satyanarayana
[EMAIL PROTECTED] wrote on 11/14/2006 03:18:23 PM:

 Shirley The rotting packet situation consistently happens for
 Shirley the ehca driver. The napi could poll forever with your
 Shirley original patch. That's the reason I defer the rotting
 Shirley packet process to the next napi poll.
 
 Hmm, I don't see it.  In my latest patch, the poll routine does:
 
 repoll:
        done  = 0;
        empty = 0;

        while (max) {
                t = min(IPOIB_NUM_WC, max);
                n = ib_poll_cq(priv->cq, t, priv->ibwc);

                for (i = 0; i < n; ++i) {
                        if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
                                ++done;
                                --max;
                                ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
                        } else
                                ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
                }

                if (n != t) {
                        empty = 1;
                        break;
                }
        }

        dev->quota -= done;
        *budget    -= done;

        if (empty) {
                netif_rx_complete(dev);
                if (unlikely(ib_req_notify_cq(priv->cq,
                                              IB_CQ_NEXT_COMP |
                                              IB_CQ_REPORT_MISSED_EVENTS)) &&
                    netif_rx_reschedule(dev, 0))
                        goto repoll;

                return 0;
        }

        return 1;
 
 so every receive completion will count against the limit set by the
 variable max.  The only way I could see the driver staying in the poll
 routine for a long time would be if it was only processing send
 completions, but even that doesn't actually seem bad: the driver is
 making progress handling completions.
 

Is it possible that when one gets into the rotting packet case, the quota
is at or close to 0 (on ehca)? If in that case it is 0 and the
netif_rx_reschedule() case wins (over netif_rx_schedule()), then it keeps
spinning, unable to process any packets, since the undo parameter for
netif_rx_reschedule() is 0.

If netif_rx_reschedule() keeps winning for a few iterations, then the
receive queues fill up and start dropping packets, causing a loss in
performance.

If this is indeed the case, then one option to try may be to change the
undo parameter of netif_rx_reschedule() to either IB_WC or even
dev->weight, as sketched below.
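Concretely, the tweak would look something like this against Roland's
poll routine quoted above (a sketch only, not a tested patch;
IPOIB_NUM_WC stands in for the "IB_WC" value mentioned):

	if (empty) {
		netif_rx_complete(dev);
		if (unlikely(ib_req_notify_cq(priv->cq,
					      IB_CQ_NEXT_COMP |
					      IB_CQ_REPORT_MISSED_EVENTS)) &&
		    /* give the repoll some quota back instead of undoing 0 */
		    netif_rx_reschedule(dev, IPOIB_NUM_WC))
			goto repoll;

		return 0;
	}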



 Shirley It does help the performance from 1XXMb/s to 7XXMb/s, but
 Shirley not the expected 3XXXMb/s.
 
 Is that 3xxx Mb/sec the performance you see without the NAPI patch?
 
 Shirley With the defer rotting packet process patch, I can see
 Shirley a packets-out-of-order problem in the TCP layer.  Is it
 Shirley possible there is a race somewhere causing two napi polls
 Shirley at the same time? mthca seems to use irq auto affinity,
 Shirley but ehca uses round-robin interrupt.
 
 I don't see how two NAPI polls could run at once, and I would expect
 worse effects from them stepping on each other than just out-of-order
 packets.  However, the fact that ehca does round-robin interrupt
 handling might lead to out-of-order packets just because different
 CPUs are all feeding packets into the network stack.
 
  - R.
 

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-16 Thread Michael S. Tsirkin
Quoting Roland Dreier [EMAIL PROTECTED]:

 I would really like to understand why ehca does worse with NAPI.  In
 my tests both mthca and ipath exhibit various degrees of improvement
 depending on the test -- but I've never seen performance get worse.
 This is the main thing holding back merging NAPI.

Documentation/networking/NAPI_HOWTO.txt says:

APPENDIX 3: Scheduling issues
As seen NAPI moves processing to softirq level. Linux uses the ksoftirqd as the
general solution to schedule softirq's to run before next interrupt and by
putting them under scheduler control. Also this prevents consecutive softirq's
from monopolize the CPU. This also have the effect that the priority of ksoftirq
needs to be considered when running very CPU-intensive applications and
networking to get the proper balance of softirq/user balance. Increasing
ksoftirq priority to 0 (eventually more) is reported cure problems with low
network performance at high CPU load.
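For reference, a minimal userspace sketch of the priority bump the
HOWTO describes (the PID must be looked up by hand, e.g. from ps; the
target nice value of 0 follows the HOWTO text, since ksoftirqd
historically ran at nice 19):

	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/resource.h>

	/* usage: ./bump <pid of ksoftirqd/N> */
	int main(int argc, char **argv)
	{
		if (argc != 2) {
			fprintf(stderr, "usage: %s <pid>\n", argv[0]);
			return 1;
		}
		/* raise the ksoftirqd thread to nice 0 */
		if (setpriority(PRIO_PROCESS, atoi(argv[1]), 0)) {
			perror("setpriority");
			return 1;
		}
		return 0;
	}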

So I wonder
1. Was this tried? It's clear that we have high CPU load.
2. Could this be the reason that e.g. e1000 disables NAPI by default?

The issue seems sufficiently tricky that we may yet find ourselves debugging
NAPI performance problems in the field.
Maybe we still need a module option ...

-- 
MST




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-16 Thread Roland Dreier
Pradeep Is it possible that when one gets into the rotting
Pradeep packet case, the quota is at or close to 0 (on ehca)? If
Pradeep in that case it is 0 and the netif_rx_reschedule() case wins
Pradeep (over netif_rx_schedule()), then it keeps spinning, unable
Pradeep to process any packets, since the undo parameter for
Pradeep netif_rx_reschedule() is 0.

It is possible that the quota is close to 0, but I don't see how the
poll routine could spin with quota (the variable max) equal to 0.  If
max is 0, then the while (max) loop will never be entered, empty
will remain 0, and the poll routine will simply fall through and
return 1.  Do you agree with that summary?

We don't want the undo parameter of netif_rx_reschedule() to be
non-zero because when we go back to repoll, done is reset to 0.  So
there's no reason to increase the quota again.

I guess you could instrument how many iterations there are with a
small value of max, but I would assume it's self-limiting, since the
last few completions should appear fairly quickly.

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-16 Thread Roland Dreier
  What I have found in the ehca driver is that n != t doesn't mean it's empty. If I
  poll again, there are still some packets in the CQ. IB_CQ_REPORT_MISSED_EVENTS most
  of the time reports 1. It relies on netif_rx_reschedule() returning 0 to exit the
  napi poll. That might be the reason it stays in the poll routine for a long time? I will
  rerun my test to use n != 0 to see any difference here.

Maybe there's an ehca bug in poll CQ?  If n != t then it should mean
that the CQ was indeed drained.  I would expect a missed event to be
rare, because it means a completion occurred between the last poll CQ
and the request notify, and that shouldn't be that common...

My rough estimate is that even at a higher throughput than what you're
seeing, IPoIB should only generate ~ 500K completions/sec, which means
the average delay between completions is 2 microseconds.  So I
wouldn't expect completions to hit the window between poll and request
notify that often.
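(Spelling out that arithmetic, with both figures being the rough
estimates above rather than measurements:

	1 / 500,000 completions/sec = 2 us average gap,

which should dwarf the much shorter window between ib_poll_cq() coming
back empty and ib_req_notify_cq() re-arming the CQ.)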

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-16 Thread Shirley Ma





Roland Dreier [EMAIL PROTECTED] wrote on 11/16/2006 11:26:31 AM:

   What I have found in the ehca driver is that n != t doesn't mean it's
 empty. If I poll again, there are still some packets in the CQ.
 IB_CQ_REPORT_MISSED_EVENTS most of the time reports 1. It relies on
 netif_rx_reschedule() returning 0 to exit the napi poll. That might be
 the reason it stays in the poll routine for a long time? I will rerun
 my test to use n != 0 to see any difference here.

 Maybe there's an ehca bug in poll CQ?  If n != t then it should mean
 that the CQ was indeed drained.  I would expect a missed event to be
 rare, because it means a completion occurred between the last poll CQ
 and the request notify, and that shouldn't be that common...

 My rough estimate is that even at a higher throughput than what you're
 seeing, IPoIB should only generate ~ 500K completions/sec, which means
 the average delay between completions is 2 microseconds.  So I
 wouldn't expect completions to hit the window between poll and request
 notify that often.

  - R.

I have tried setting low_latency to 1 to disable the TCP prequeue; the throughput
increased from 1XXMb/s to 4XXMb/s. If I delayed netif_receive_skb() a little
bit, I could get around 1700Mb/s. If I totally disable
netif_rx_reschedule(), so there is no repoll and it just returns 0, I could get
around 2900Mb/s throughput without seeing the packets-out-of-order issue. I
have also tried adding a spin lock in ipoib_poll(), and I still see packets out
of order.

disable prequeue: 2XXMb/s to 4XXMb/s (packets out of order)
slow down netif_receive_skb: 17XXMb/s (packets out of order)
don't handle missed event: 28XXMb/s (no packets out of order)
handle missed event later: 7XXMb/s to 11XXMb/s (packets out of order)

Maybe the ehca driver delivers packets much faster?  Which makes me think
the user process's TCP backlog queue and prequeue might be out of order?

Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-15 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 11/14/2006 03:18:23 PM:

   Shirley The rotting packet situation consistently happens for
   Shirley the ehca driver. The napi could poll forever with your
   Shirley original patch. That's the reason I defer the rotting
   Shirley packet process to the next napi poll.
 
 Hmm, I don't see it. In my latest patch, the poll routine does:
 
 repoll:
        done  = 0;
        empty = 0;

        while (max) {
                t = min(IPOIB_NUM_WC, max);
                n = ib_poll_cq(priv->cq, t, priv->ibwc);

                for (i = 0; i < n; ++i) {
                        if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
                                ++done;
                                --max;
                                ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
                        } else
                                ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
                }

                if (n != t) {
                        empty = 1;
                        break;
                }
        }

        dev->quota -= done;
        *budget    -= done;

        if (empty) {
                netif_rx_complete(dev);
                if (unlikely(ib_req_notify_cq(priv->cq,
                                              IB_CQ_NEXT_COMP |
                                              IB_CQ_REPORT_MISSED_EVENTS)) &&
                    netif_rx_reschedule(dev, 0))
                        goto repoll;

                return 0;
        }

        return 1;
 
 so every receive completion will count against the limit set by the
 variable max. The only way I could see the driver staying in the poll
 routine for a long time would be if it was only processing send
 completions, but even that doesn't actually seem bad: the driver is
 making progress handling completions.
What I have found in the ehca driver is that n != t doesn't mean it's empty. If I poll again, there are still some packets in the CQ. IB_CQ_REPORT_MISSED_EVENTS most of the time reports 1. It relies on netif_rx_reschedule() returning 0 to exit the napi poll. That might be the reason it stays in the poll routine for a long time? I will rerun my test to use n != 0 to see any difference here.

 
   Shirley It does help the performance from 1XXMb/s to 7XXMb/s, but
   Shirley not the expected 3XXXMb/s.
 
 Is that 3xxx Mb/sec the performance you see without the NAPI patch?

Without the NAPI patch, in my test environment ehca can get around 2800Mb/s to 3000Mb/s throughput.

   Shirley With the defer rotting packet process patch, I can see
   Shirley a packets-out-of-order problem in the TCP layer. Is it
   Shirley possible there is a race somewhere causing two napi polls
   Shirley at the same time? mthca seems to use irq auto affinity,
   Shirley but ehca uses round-robin interrupt.
 
 I don't see how two NAPI polls could run at once, and I would expect
 worse effects from them stepping on each other than just out-of-order
 packets. However, the fact that ehca does round-robin interrupt
 handling might lead to out-of-order packets just because different
 CPUs are all feeding packets into the network stack.
 
 - R.
Normally for NAPI there should be only one poll running at a time. And NAPI processes packets all the way to the TCP layer one by one (netif_receive_skb()). So it shouldn't lead to out-of-order packets even with round-robin interrupt handling in NAPI. I am still investigating this.

Thanks
Shirley

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-15 Thread Shirley Ma

I will rerun my test to use n != 0 to see any difference here.
It should be n == 0 to indicate empty.
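Against the quoted poll routine, that change would read (a sketch, not
a tested diff):

	/* only declare the CQ drained when ib_poll_cq() returns no
	 * entries at all, not merely fewer than requested */
	if (n == 0) {
		empty = 1;
		break;
	}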

Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-14 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 11/13/2006 08:45:52 AM:

  Sorry, I did not intend to send the previous email; I accidentally sent it
  out. What I thought was that there would be a problem if the missed_event
  always returns 1. Then this napi poll would keep going forever.
 
 Well, it's limited by the quota that the net stack gives it, so
 there's no possibility of looping forever. However

  How about deferring the rotting packet processing until later? Like this:
 
 that seems like it is still correct.
 
  With this patch, I could get NAPI + non-scaling code throughput performance
  from 1XXMb/s to 7XXMb/s; anyway, there are some other problems I am still
  investigating now.
 
 But I wonder why it gives you a factor of 4 in performance?? Why does
 it make a difference? I would have thought that the rotting packet
 situation would be rare enough that it doesn't really matter for
 performance exactly how we handle it.
 
 What are the other problems you're investigating?
 
 - R.

The rotting packet situation consistently happens for the ehca driver. The napi could poll forever with your original patch. That's the reason I defer the rotting packet process to the next napi poll. It does help the performance from 1XXMb/s to 7XXMb/s, but not the expected 3XXXMb/s. With the defer rotting packet process patch, I can see a packets-out-of-order problem in the TCP layer. Is it possible there is a race somewhere causing two napi polls at the same time? mthca seems to use irq auto affinity, but ehca uses round-robin interrupt.

Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-14 Thread Shirley Ma

Roland,

I think a memory barrier might be needed when checking the LINK SCHED state, like smp_mb__before_clear_bit() and smp_mb__after_clear_bit(); otherwise the netif_rx_reschedule() for the rotting packet and the next interrupt's netif_rx_schedule() could be running at the same time. If the interrupts are round-robin, then packets are going to be out of order in the TCP layer. I will test it out once I have the resources.

What do you think?

Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-14 Thread Shirley Ma

Roland,

	Ignore my previous email; test_and_set_bit() is an atomic operation and already has the memory barrier.
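For reference, the helper in question from that era's
include/linux/netdevice.h looks roughly like this (quoted from memory,
so treat it as a sketch):

	/* only one caller can win the test_and_set_bit(), and the atomic
	 * op implies the needed barrier, so no smp_mb__*_clear_bit() */
	static inline int netif_rx_schedule_prep(struct net_device *dev)
	{
		return netif_running(dev) &&
		       !test_and_set_bit(__LINK_STATE_RX_SCHED, &dev->state);
	}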

Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-14 Thread Shirley Ma

From the code walk-through: if we defer the rotting packet process by return (missed_event && netif_rx_reschedule(dev, 0)); then the same dev->poll can be added to the per-cpu poll list twice: once from netif_rx_reschedule, and once from the napi poll returning 1. That might explain the packets out of order: one poll finishes and resets the LINK SCHED bit, and the next interrupt runs on another cpu.

Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-14 Thread Roland Dreier
Shirley The rotting packet situation consistently happens for
Shirley the ehca driver. The napi could poll forever with your
Shirley original patch. That's the reason I defer the rotting
Shirley packet process to the next napi poll.

Hmm, I don't see it.  In my latest patch, the poll routine does:

repoll:
        done  = 0;
        empty = 0;

        while (max) {
                t = min(IPOIB_NUM_WC, max);
                n = ib_poll_cq(priv->cq, t, priv->ibwc);

                for (i = 0; i < n; ++i) {
                        if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
                                ++done;
                                --max;
                                ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
                        } else
                                ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
                }

                if (n != t) {
                        empty = 1;
                        break;
                }
        }

        dev->quota -= done;
        *budget    -= done;

        if (empty) {
                netif_rx_complete(dev);
                if (unlikely(ib_req_notify_cq(priv->cq,
                                              IB_CQ_NEXT_COMP |
                                              IB_CQ_REPORT_MISSED_EVENTS)) &&
                    netif_rx_reschedule(dev, 0))
                        goto repoll;

                return 0;
        }

        return 1;

so every receive completion will count against the limit set by the
variable max.  The only way I could see the driver staying in the poll
routine for a long time would be if it was only processing send
completions, but even that doesn't actually seem bad: the driver is
making progress handling completions.

Shirley It does help the performance from 1XXMb/s to 7XXMb/s, but
Shirley not the expected 3XXXMb/s.

Is that 3xxx Mb/sec the performance you see without the NAPI patch?

Shirley With the defer rotting packet process patch, I can see
Shirley a packets-out-of-order problem in the TCP layer.  Is it
Shirley possible there is a race somewhere causing two napi polls
Shirley at the same time? mthca seems to use irq auto affinity,
Shirley but ehca uses round-robin interrupt.

I don't see how two NAPI polls could run at once, and I would expect
worse effects from them stepping on each other than just out-of-order
packets.  However, the fact that ehca does round-robin interrupt
handling might lead to out-of-order packets just because different
CPUs are all feeding packets into the network stack.

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-14 Thread Roland Dreier
Shirley From the code walk-through: if we defer the rotting
Shirley packet process by return (missed_event &&
Shirley netif_rx_reschedule(dev, 0)); then the same dev->poll can
Shirley be added to the per-cpu poll list twice: once from
Shirley netif_rx_reschedule, and once from the napi poll returning
Shirley 1. That might explain the packets out of order: one poll
Shirley finishes and resets the LINK SCHED bit, and the next
Shirley interrupt runs on another cpu.

I don't think so.  It's completely normal for dev->poll() to return 1
when there's more work to be done, so the networking core will just
move the device to the tail of the poll list.  So I don't see why it
would make a difference if we actually do any work after
netif_rx_reschedule() or not.
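A sketch of what the net core does with that return value (my
paraphrase of the 2.6.x net_rx_action(), not the literal code):

	/* inside net_rx_action(), roughly: */
	if (dev->quota <= 0 || dev->poll(dev, &budget)) {
		/* more work: leave the device scheduled and move it to
		 * the tail of this CPU's poll list */
		list_move_tail(&dev->poll_list, &queue->poll_list);
		dev->quota = dev->weight;	/* refill for the next pass */
	}
	/* returning 0 means the driver already called netif_rx_complete() */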

On the other hand I still don't see why it helps to drop out of the
poll routine immediately even though we know there is more work to be
done, and the networking stack has told us it could handle more packets.

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-13 Thread Roland Dreier
  Sorry, I did not intend to send the previous email; I accidentally sent it
  out. What I thought was that there would be a problem if the missed_event
  always returns 1. Then this napi poll would keep going forever.

Well, it's limited by the quota that the net stack gives it, so
there's no possibility of looping forever.  However

  How about deferring the rotting packet processing until later? Like this:

that seems like it is still correct.

  With this patch, I could get NAPI + non-scaling code throughput performance
  from 1XXMb/s to 7XXMb/s; anyway, there are some other problems I am still
  investigating now.

But I wonder why it gives you a factor of 4 in performance??  Why does
it make a difference?  I would have thought that the rotting packet
situation would be rare enough that it doesn't really matter for
performance exactly how we handle it.

What are the other problems you're investigating?

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-10 Thread Roland Dreier
I think it has to stay the way I wrote it.  Your version:

+		if (empty) {
+			ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, &missed_event);
+			if (unlikely(missed_event) && netif_rx_reschedule(dev, 0))
+				goto repoll;
+			netif_rx_complete(dev);
+
+			return 0;
+		}

has a race: suppose missed_event is 0 but an event _is_ generated
right before the call to netif_rx_complete().  Then the interrupt
handler might run before the call to netif_rx_complete(), try to
schedule the NAPI poll, but end up doing nothing because the poll
routine is still running.  Then the poll routine will call
netif_rx_complete() and return 0, so it won't get called again ever
(because the CQ event has already fired).  And so the interface will
hang and never make any more progress.
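As a timeline (my sketch of the scenario; CPU0 runs the poll routine
with the ordering above, CPU1 takes the CQ interrupt):

	/*
	 * CPU0: ib_poll_cq()          -> CQ empty
	 * CPU0: ib_req_notify_cq()    -> missed_event == 0
	 * CPU1: completion arrives, the armed CQ event fires
	 * CPU1: netif_rx_schedule()   -> no-op, poll bit still set
	 * CPU0: netif_rx_complete()   -> clears the poll bit
	 * CPU0: return 0              -> poll is never scheduled again
	 *
	 * The one CQ event has been consumed, so no further interrupt
	 * comes.  Completing first and then rescheduling on a reported
	 * missed event closes the window.
	 */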

I would really like to understand why ehca does worse with NAPI.  In
my tests both mthca and ipath exhibit various degrees of improvement
depending on the test -- but I've never seen performance get worse.
This is the main thing holding back merging NAPI.

Does the NAPI patch help mthca on pSeries?  I wonder if it's not ehca,
but rather that there's some ppc64 quirk that makes NAPI a lot more
expensive.

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-10 Thread Shirley Ma

 I would really like to understand why ehca does worse with NAPI. In
 my tests both mthca and ipath exhibit various degrees of improvement
 depending on the test -- but I've never seen performance get worse.
 This is the main thing holding back merging NAPI.
 
 Does the NAPI patch help mthca on pSeries? I wonder if it's not ehca,
 but rather that there's some ppc64 quirk that makes NAPI a lot more
 expensive.
 
 - R.

Got your point. Sorry I haven't made any big progress yet. What I have found so far in the non-scaling code: if I always set missed_event = 0 without peeking for a rotting packet, then NAPI increases the performance and reduces the cpu utilization. That's the reason I suggested the change above.
I haven't found the reason for the scaling code dropping 2/3 of the performance yet.
The NAPI touch test for mthca on POWER performance is good.
So I don't think it's a ppc64 issue.

Thanks
Shirley

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-10 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 11/10/2006 07:00:46 AM:

 I think it has to stay the way I wrote it. Your version:
 
 +	if (empty)
 +		return (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS) &&
 +			netif_rx_reschedule(dev, 0));


Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-10 Thread Shirley Ma

Roland,

Sorry, I did not intend to send the previous email; I accidentally sent it out. What I thought was that there would be a problem if the missed_event always returns 1. Then this napi poll would keep going forever. How about deferring the rotting packet processing until later? Like this:

 
 +	if (empty)
 +		return (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS) &&
 +			netif_rx_reschedule(dev, 0));

With this patch, I could get NAPI + non-scaling code throughput performance from 1XXMb/s to 7XXMb/s; anyway, there are some other problems I am still investigating now.

Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-11-09 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 10/19/2006 09:10:35 PM:
Roland,

 I looked over my code again, and I don't see anything obviously wrong,
 but it's quite possible I made a mistake that I just can't see right
 now (like reversing a truth value somewhere). Someone who knows how
 ehca works might be able to spot the error.
 
 - R.

Your code is OK. I just found the problem here.
+		if (empty) {
+			netif_rx_complete(dev);
+			ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, &missed_event);
+			if (unlikely(missed_event) && netif_rx_reschedule(dev, 0))
+				goto repoll;
+
+			return 0;
+		}

netif_rx_complete() should be called right before the return. It does improve non-scaling performance with this patch, but reduces scaling performance.

+		if (empty) {
+			ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, &missed_event);
+			if (unlikely(missed_event) && netif_rx_reschedule(dev, 0))
+				goto repoll;
+			netif_rx_complete(dev);
+
+			return 0;
+		}
Is there any other reason against calling netif_rx_complete() while possibly still within the napi poll?

Thanks
Shirley Ma
IBM Linux Technology Center

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-21 Thread Michael S. Tsirkin
Quoting Shirley Ma [EMAIL PROTECTED]:
 Subject: Re: [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from 
 ib_req_notify_cq()
 
 Michael S. Tsirkin [EMAIL PROTECTED] wrote on 10/19/2006 01:21:45 PM:
 
  Please also note that due to factors such as TCP window limits, TX on a 
  single
  socket is often stalled.  To really stress a connection and see benefit from
  NAPI you should be running multiple socket streams in parallel:
  either just run multiple instances of netperf/netserver, or use iperf with 
  -P flag.
 
 I used to get 7600Mb/s IPoIB single-socket duplex throughput with my other IPoIB
 patches on the 2.6.5 kernel under a certain configuration, which makes me believe
 we could get close to link throughput with one UD QP. Now I can't get it
 anymore on the new kernel. I was struggling with TCP window limits on the new
 kernel. Do you have any hint?

Could be the stretch ACK fix - newer kernels are sending many more ACKs
than 2.6.5. Without NAPI, this means we have more interrupts - lower
throughput.

-- 
MST




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-20 Thread Shirley Ma

Retested several times: this hack patch only fixed the non-scaling code. I thought I tested both scaling and non-scaling; it seems I made a mistake, and I might have configured and tested the non-scaling configuration twice in the previous run.

thanks
Shirley Ma
IBM Linux Technology Center

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-20 Thread Shirley Ma

Michael S. Tsirkin [EMAIL PROTECTED] wrote on 10/19/2006 01:21:45 PM:

 Please also note that due to factors such as TCP window limits, TX on a single
 socket is often stalled. To really stress a connection and see benefit from
 NAPI you should be running multiple socket streams in parallel: 
 either just run multiple instances of netperf/netserver, or use iperf with -P flag.

I used to get 7600Mb/s IPoIB single-socket duplex throughput with my other IPoIB patches on the 2.6.5 kernel under a certain configuration, which makes me believe we could get close to link throughput with one UD QP. Now I can't get it anymore on the new kernel. I was struggling with TCP window limits on the new kernel. Do you have any hint?

thanks
Shirley Ma
IBM Linux Technology Center

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-19 Thread Michael S. Tsirkin
Quoting Shirley Ma [EMAIL PROTECTED]:
 Subject: Re: [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from 
 ib_req_notify_cq()
 
 Roland Dreier [EMAIL PROTECTED] wrote on 10/18/2006 01:55:13 PM:
  I would like to understand why there's a throughput difference with
  scaling turned off, since the NAPI code doesn't change the interrupt
  handling all that much, and should lower the CPU usage if anything.

 That's what I am trying to understand now.
 Yes, the send side rate dropped significantly; cpu usage is lower as well.

I think it's a TCP configuration issue in your setup.
With NAPI, we seem to be getting stable high results as reported previously by
Eli. Hope to complete testing and report next week.

Shirley, can you please post test setup and results? Some ideas:

Please note that you need to apply the NAPI patch on both send and recv side
in stream benchmark, otherwise one side will be a bottleneck.

Please also note that due to factors such as TCP window limits, TX on a single
socket is often stalled.  To really stress a connection and see benefit from
NAPI you should be running multiple socket streams in parallel: either just run
multiple instances of netperf/netserver, or use iperf with -P flag.

You also should look at the effect of increasing the send/recv socket buffer
size.

Finally, tuning RX/TX ring size should also be done differently:
you might be over-running your queues, so make them bigger for NAPI.

-- 
MST




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-19 Thread Shirley Ma

Thanks, Michael, for all these tips. I have tried several of the suggestions you proposed here, but I couldn't see any better performance. TCP_RR dropped to 472 trans/s from about 18,000 trans/s, and TCP_STREAM bandwidth dropped to 1/3 of what it was before (ehca + scaling code) with the same TCP configuration, send queue size = recv queue size = 1K.

Thanks
Shirley Ma
IBM Linux Technology Center

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-19 Thread Roland Dreier
OK, as promised I redid the request notify patches according to
Michael's suggestion to add a new flag.  I think I like this a lot
better -- I'll send out the new patches as replies to this email for
comments.

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-19 Thread Shirley Ma

Roland,

I have applied this patch and updated patch 2/2. You will send out an updated patch 2/2, I think.
I did some extra modification in the ipoib code (which does more extra repolls). I do see around 10% or more performance improvement now with this change on both the scaling and non-scaling code. I will run oprofile tomorrow to see the difference. I think with these extra repolls, the cpu utilization would be much higher.

Thanks
Shirley Ma
IBM Linux Technology Center

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-19 Thread Roland Dreier
  I have applied this patch and updated patch 2/2. You will send out an
  updated patch 2/2, I think.

Sorry, messed that up.  I just sent out the patch.

  I did some extra modification in the ipoib code (which does more extra
  repolls). I do see around 10% or more performance improvement now with this
  change on both the scaling and non-scaling code. I will run oprofile tomorrow
  to see the difference. I think with these extra repolls, the cpu
  utilization would be much higher.

You mean you add more calls to ib_poll_cq()?  Where do you add them?
Why does it help?

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-19 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 10/19/2006 07:39:25 PM:

  I have applied this patch and updated patch 2/2. You will send out an
  updated patch 2/2, I think.
 
 Sorry, messed that up. I just sent out the patch.
No problem, I did the same change.

 You mean you add more calls to ib_poll_cq()? Where do you add them?
 Why does it help?
 
 - R.
I ran out of ideas about why we lose 2/3 of the throughput and get 476 trans/s. So I assumed there was always a missed event; then ipoib would stay in its napi poll within its scheduled time. That's why it helps. This is really a hack and doesn't address the problem: it sacrifices cpu utilization to gain the performance back. I need to understand how ehca reports missed events; there might be some delay there?
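In poll-routine terms the hack amounts to this (a sketch of what
"assume a missed event" means here, not the actual diff that was run):

	if (empty) {
		netif_rx_complete(dev);
		ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP);
		/* pretend the notify always raced with a completion, so the
		 * poll keeps rescheduling itself for its whole quota */
		if (netif_rx_reschedule(dev, 0))
			goto repoll;
		return 0;
	}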

Thanks
Shirley Ma
IBM Linux Technology Center

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-19 Thread Roland Dreier
  I ran out of ideas about why we lose 2/3 of the throughput and get 476 trans/s. So
  I assumed there was always a missed event; then ipoib would stay in its
  napi poll within its scheduled time. That's why it helps. This is really a
  hack and doesn't address the problem: it sacrifices cpu utilization to gain
  the performance back. I need to understand how ehca reports missed events;
  there might be some delay there?

It's entirely possible that my implementation of the missing event
hint in ehca is wrong.  I just guessed based on how poll CQ is
implemented -- if the consumer requests a hint about missing events,
then I lock the CQ and check if it's empty after requesting
notification.
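In outline the pattern is (a sketch with illustrative names, not the
actual ehca internals):

	/* request notification first, then check under the CQ lock
	 * whether a completion slipped in before the re-arm */
	static int req_notify_with_hint(struct ehca_like_cq *cq)
	{
		unsigned long flags;
		int missed;

		spin_lock_irqsave(&cq->lock, flags);
		arm_cq_hardware(cq);		/* ask for the next event */
		missed = !cq_is_empty(cq);	/* anything already queued? */
		spin_unlock_irqrestore(&cq->lock, flags);

		return missed;			/* nonzero: caller should repoll */
	}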

I looked over my code again, and I don't see anything obviously wrong,
but it's quite possible I made a mistake that I just can't see right
now (like reversing a truth value somewhere).  Someone who knows how
ehca works might be able to spot the error.

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-19 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 10/19/2006 09:10:35 PM:
 It's entirely possible that my implementation of the missing event
 hint in ehca is wrong. I just guessed based on how poll CQ is
 implemented -- if the consumer requests a hint about missing events,
 then I lock the CQ and check if it's empty after requesting
 notification.
 
 I looked over my code again, and I don't see anything obviously wrong,
 but it's quite possible I made a mistake that I just can't see right
 now (like reversing a truth value somewhere). Someone who knows how
 ehca works might be able to spot the error.
 
 - R.

The oprofile data (with your napi + this hack patch) looks good; it reduced cpu utilization significantly. (I was wrong about the cpu utilization.) I will talk with the ehca team about this missed event hint patch for ehca.

thanks
Shirley Ma
IBM Linux Technology Center

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-18 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 10/17/2006 08:41:59 PM:
 Anyway, I'm eagerly awaiting your NAPI results with ehca.
 
 Thanks,
  Roland

Thanks. The touch test results are not good. This NAPI patch induces huge latency for the ehca driver scaling code, and the throughput performance is not good. (I am not fully convinced the huge latency is because of raising NAPI in thread context.) Then I tried the ehca non-scaling driver; the latency looks good, but the throughput is still a problem. We are working on these issues. Hopefully we can get the answer soon.

Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-18 Thread Roland Dreier
  Thanks. The touch test results are not good. This NAPI patch induces huge
  latency for the ehca driver scaling code, and the throughput performance is not
  good. (I am not fully convinced the huge latency is because of raising NAPI
  in thread context.) Then I tried the ehca non-scaling driver; the latency looks
  good, but the throughput is still a problem. We are working on these
  issues. Hopefully we can get the answer soon.

Hmm, the results with scaling on are not that unexpected, since the
idea of scheduling a thread round-robin (to kill all cache locality)
is pretty dubious anyway.

I would like to understand why there's a throughput difference with
scaling turned off, since the NAPI code doesn't change the interrupt
handling all that much, and should lower the CPU usage if anything.
Does changing the netdev weight value affect anything?

 - R.




Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-18 Thread Shirley Ma

Roland Dreier [EMAIL PROTECTED] wrote on 10/18/2006 01:55:13 PM:
 I would like to understand why there's a throughput difference with
 scaling turned off, since the NAPI code doesn't change the interrupt
 handling all that much, and should lower the CPU usage if anything.
That's what I am trying to understand now.
Yes, the send side rate dropped significantly; cpu usage is lower as well.

 Does changing the netdev weight value affect anything?
 
 - R.
No, it doesn't.

Thanks
Shirley Ma
IBM Linux Technology Center

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-17 Thread Shirley Ma

Hi, Roland,

There were a couple of errors and a warning when I applied this patch to OFED-1.1-rc7:
1. ehca_req_notify_cq() in ehca_iverbs.h is not updated.
2. *maybe_missed_event = ipz_qeit_is_valid(my_cq->ipz_queue) should be = ipz_qeit_is_valid(&my_cq->ipz_queue)
3. a compile warning on this line: return cqe_flags >> 7 == queue->toggle_state & 1;

Thanks
Shirley Ma

Re: [openib-general] [PATCH/RFC 1/2] IB: Return maybe_missed_event hint from ib_req_notify_cq()

2006-10-17 Thread Roland Dreier
Sorry, I just noticed my cross-compilation test setup was messed up,
so I never actually built the modified ehca, even though I thought I
did.  Anyway, the patch below on top of what I sent out should fix
everything up.

I've also merged this into my ipoib-napi branch, so what's there
should be OK for ehca now.

Anyway, I'm eagerly awaiting your NAPI results with ehca.

Thanks,
  Roland
