Re: [openib-general] IPOIB NAPI

2007-02-27 Thread Shirley Ma




Roland Dreier [EMAIL PROTECTED] wrote on 02/26/2007 02:36:26 PM:
 No way, it's way too late at this point to change the kernel-user ABI,
 let alone change all ULPs.

  - R.

Hello Roland,

So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
generate the patch for all ULPs to use this for review. Do you need me to
do that?

Thanks
Shirley Ma
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] IPOIB NAPI

2007-02-27 Thread Roland Dreier
  So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
  generate the patch for all ULPs to use this for review. Do you need me to
  do that?

No, it's not in OFED 1.2 or the upstream kernel.  And no one has
implemented it for userspace (and I'm somewhat reluctant to break the
ABI at this point without some performance numbers to motivate making
this API change).

Have the NAPI performance problems with ehca been resolved?  We could
probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
kernel changes at least.

 - R.




Re: [openib-general] IPOIB NAPI

2007-02-27 Thread Shirley Ma




Roland Dreier [EMAIL PROTECTED] wrote on 02/27/2007 02:41:44 PM:

   So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
   generate the patch for all ULPs to use this for review. Do you need me to
   do that?

 No, it's not in OFED 1.2 or the upstream kernel.  And no one has
 implemented it for userspace (and I'm somewhat reluctant to break the
 ABI at this point without some performance numbers to motivate making
 this API change).

 Have the NAPI performance problems with ehca been resolved?  We could
 probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
 kernel changes at least.

  - R.
We have addressed the NAPI performance issues with the ehca driver. I believe
the patches have gone upstream. However, the test results show that it is
better to defer the second poll to the next NAPI interval, something like this:

poll-cq
notify-cq, if missed_event && netif_rx_reschedule()
return 1

vs.

poll-cq
notify-cq, if missed_event && netif_rx_reschedule()
poll again
return 0

It seems ehca delivers packets much faster than other HCAs, so "poll again"
would stay in the loop many times. Since the above change doesn't
impact other HCAs, I would recommend it. I have seen the same approach in
other Ethernet drivers.
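
The two variants above can be compared with a small user-space simulation. This is only a sketch under stated assumptions, not driver code: `fake_cq`, `fake_poll_cq`, `fake_notify_cq`, `poll_defer` and `poll_again` are hypothetical names, and the "missed event" is modelled as new completions racing in just before the re-arm. It also assumes that the "return 1" variant returns 0 when no event was missed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a completion queue (not a kernel structure). */
struct fake_cq {
    int pending;     /* completions waiting to be polled */
    int bursts;      /* how many more times new work races in before a re-arm */
    int burst_size;  /* completions added by each such burst */
};

/* Drain up to `budget` completions; returns how many were handled. */
static int fake_poll_cq(struct fake_cq *cq, int budget)
{
    int n = cq->pending < budget ? cq->pending : budget;

    cq->pending -= n;
    return n;
}

/* Re-arm the CQ; returns true if completions are already waiting (missed event). */
static bool fake_notify_cq(struct fake_cq *cq)
{
    if (cq->bursts > 0) {
        cq->bursts--;
        cq->pending += cq->burst_size;
    }
    return cq->pending > 0;
}

/* "return 1" variant: on a missed event, stay scheduled and poll on the next pass. */
int poll_defer(struct fake_cq *cq, int budget, int *done)
{
    *done += fake_poll_cq(cq, budget);
    return fake_notify_cq(cq) ? 1 : 0;
}

/* "poll again" variant: keep looping inside one call until nothing was missed. */
int poll_again(struct fake_cq *cq, int budget, int *done, int *loops)
{
    do {
        *done += fake_poll_cq(cq, budget);
        ++*loops;
    } while (fake_notify_cq(cq));
    return 0;
}

/* Both variants do the same work; they differ in who drives the repetition. */
void napi_variants_example(void)
{
    struct fake_cq a = { 5, 3, 2 }, b = { 5, 3, 2 };
    int done_a = 0, calls = 0, done_b = 0, loops = 0;

    do {
        ++calls;
    } while (poll_defer(&a, 16, &done_a));
    poll_again(&b, 16, &done_b, &loops);

    assert(done_a == done_b);
    assert(calls == loops);
}
```

With a fast producer, `poll_again` keeps one call busy for every burst, while `poll_defer` hands control back to the scheduler between bursts, which is the behaviour the message recommends.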

Thanks
Shirley Ma

Re: [openib-general] IPOIB NAPI

2007-02-27 Thread Michael S. Tsirkin
 Quoting Shirley Ma [EMAIL PROTECTED]:
 Subject: Re: [openib-general] IPOIB NAPI
 
 Roland Dreier [EMAIL PROTECTED] wrote on 02/27/2007 02:41:44 PM:
 
   So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
   generate the patch for all ULPs to use this for review. Do you need me to
   do that?
  
  No, it's not in OFED 1.2 or the upstream kernel.  And no one has
  implemented it for userspace (and I'm somewhat reluctant to break the
  ABI at this point without some performance numbers to motivate making
  this API change).
  
  Have the NAPI performance problems with ehca been resolved?  We could
  probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
  kernel changes at least.
  
   - R.
 We have addressed the NAPI performance issues with the ehca driver. I believe
 the patches have gone upstream. However, the test results show that it is
 better to defer the second poll to the next NAPI interval, something like this:
 
 poll-cq
 notify-cq, if missed_event && netif_rx_reschedule()
 return 1
 
 vs.
 
 poll-cq
 notify-cq, if missed_event && netif_rx_reschedule()
 poll again
 return 0
 
 It seems ehca delivers packets much faster than other HCAs, so "poll again"
 would stay in the loop many times. Since the above change doesn't
 impact other HCAs, I would recommend it. I have seen the same approach in
 other Ethernet drivers.

I'm confused. Which one is faster?

-- 
MST




[OFA General] Re: [openib-general] IPOIB NAPI

2007-02-27 Thread Shirley Ma




 I'm confused. Which one is faster?

Sorry for the confusion, Michael. The one with return 1 has better
throughput.

Thanks
Shirley Ma
___
general mailing list
[EMAIL PROTECTED]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] ipoib the partial pkey

2007-02-26 Thread Hal Rosenstock
On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote:
 Sean Hefty wrote:
  I looked into this more...
  RFC 4391 states (middle of page 5):
  For a node to join a partition, one of its ports must be assigned the relevant
  P_Key by the SM [RFC4392].
 
  Jumping to RFC 4392 (top of page 4):
 
 Just to have us agree on the quote, it is from section 4 of rfc 4392 
 (page 14) eg in http://www.ietf.org/rfc/rfc4392.txt
 
  at the time of creating an IB multicast group, multiple values such as the
  P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
  specified.  These values should be such that all potential members of the IB
  multicast group are able to communicate with one another when using them.
 
 OK, I suggest to remove this spec limitation,

IMO you would need to get the IB spec changed first in order to do this.

 as it does not allow the 
 use case of a server using a partition for which inter-client 
 communication is not allowed.

 Actually since it does not let people use partial membership 
 partitioning with IPoIB as every ipoib device needs to join the 
 broadcast group, it is probably a spec bug and not a limitation done on 
 purpose.

I'm pretty sure this was done on purpose (a conscious choice) as it is
based on what the IBA spec requires.

The flip side of this approach is the partial connectivity issues that
Sean mentioned, which will be reported as SM failures (e.g. more
support issues).

 A simple real-life example is I/O target, the system admin wants IB 
 block and/or file storage traffic to use a partition, but he does not 
 want initiators to communicate among themselves on this partition.
 
 To achieve that the SM is configured to assign the partial pkey to the 
 initiator nodes and the full pkey to the target ports.
 
 The current implementation of IPoIB and core perfectly (and 
 transparently...) supports that.

and is currently non compliant in its behavior.

-- Hal

 Or.
 
 
 





Re: [openib-general] ipoib the partial pkey

2007-02-26 Thread Hal Rosenstock
On Mon, 2007-02-26 at 10:37, Or Gerlitz wrote:
 Hal Rosenstock wrote:
  On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote:
 
  Just to have us agree on the quote, it is from section 4 of rfc 4392 
  (page 14) eg in http://www.ietf.org/rfc/rfc4392.txt
 
  at the time of creating an IB multicast group, multiple values such as the
  P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
  specified.  These values should be such that all potential members of the IB
  multicast group are able to communicate with one another when using them.
 
  OK, I suggest to remove this spec limitation,
 
  IMO you would need to get the IB spec changed first in order to do this.
 
 do you refer to this?
 
  What about the description of P_Key in MCMemberRecord (table 210 on p.
  908 which is compliance) which states:
  
  All members of the multicast group shall have full membership in the
  partition indicated by the partition key.
 
 if yes, indeed, this also has to be changed.

Yes, for one. There may be others; I didn't look exhaustively at the
spec for this.

-- Hal

 Or.
 





Re: [openib-general] IPOIB NAPI

2007-02-26 Thread Shirley Ma




Roland,

Yes. It would be good to reduce the number of interrupts by changing all upper
layer protocols to use:

poll CQ
notify CQ, rotting packet notification
poll again

instead of

notify CQ
poll CQ

If possible, can this be in OFED-1.2?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

Re: [openib-general] IPOIB NAPI

2007-02-26 Thread Roland Dreier
  Yes. It would be good to reduce number of interrupts by changing all upper
  layer protocols to use:
  
  poll CQ
  notify CQ, rotting packet notification
  poll again
  
  instead of
  notify CQ
  poll CQ
  
  If possible this can be in OFED-1.2?

No way, it's way too late at this point to change the kernel-user ABI,
let alone change all ULPs.

 - R.




Re: [openib-general] ipoib the partial pkey

2007-02-25 Thread Or Gerlitz
Sean Hefty wrote:
 I looked into this more...
 RFC 4391 states (middle of page 5):
 For a node to join a partition, one of its ports must be assigned the relevant
 P_Key by the SM [RFC4392].

 Jumping to RFC 4392 (top of page 4):

Just to have us agree on the quote, it is from section 4 of rfc 4392 
(page 14) eg in http://www.ietf.org/rfc/rfc4392.txt

 at the time of creating an IB multicast group, multiple values such as the
 P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
 specified.  These values should be such that all potential members of the IB
 multicast group are able to communicate with one another when using them.

OK, I suggest removing this spec limitation, as it does not allow the
use case of a server using a partition for which inter-client
communication is not allowed.

Actually since it does not let people use partial membership 
partitioning with IPoIB as every ipoib device needs to join the 
broadcast group, it is probably a spec bug and not a limitation done on 
purpose.

A simple real-life example is an I/O target: the system admin wants IB
block and/or file storage traffic to use a partition, but he does not
want initiators to communicate among themselves on this partition.

To achieve that, the SM is configured to assign the partial pkey to the
initiator nodes and the full pkey to the target ports.

The current implementation of IPoIB and the core perfectly (and
transparently...) supports that.
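
The initiator/target setup above relies on the InfiniBand partition rule: two ports can exchange packets only if the low 15 bits of their P_Keys match and at least one of them is a full member (top bit set). A minimal sketch of that check follows; the function and macro names are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_MEMBERSHIP_BIT 0x8000u  /* set = full member, clear = partial (limited) member */
#define PKEY_BASE_MASK      0x7fffu  /* the partition number itself */

/* Hypothetical helper: may two ports holding these P_Keys talk to each other? */
bool pkeys_can_communicate(uint16_t a, uint16_t b)
{
    if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
        return false;                       /* different partitions */
    return ((a | b) & PKEY_MEMBERSHIP_BIT) != 0;  /* at least one full member */
}

/* The storage example: targets hold the full key, initiators the partial key. */
void storage_partition_example(void)
{
    const uint16_t target = 0x8001, initiator = 0x0001;

    assert(pkeys_can_communicate(initiator, target));      /* initiator <-> target works */
    assert(!pkeys_can_communicate(initiator, initiator));  /* initiators are isolated */
}
```

The same rule is why a broadcast group joined by partial members only cannot carry traffic between those members.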

Or.





Re: [openib-general] ipoib the partial pkey, was: librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-23 Thread Hal Rosenstock
On Thu, 2007-02-22 at 18:35, Sean Hefty wrote:
 Doesn't this allow ipoib to join a multicast group for which it may not be able
 to communicate with all members?  For the broadcast group, this seems like an
 error to me.  Can ipoib work in such a configuration?  If all nodes were
 assigned a partial membership PKey, none of them could communicate, but no
 errors would be generated anywhere.
 
 I looked into this more...
 
 RFC 4391 states (middle of page 5):
 
 For a node to join a partition, one of its ports must be assigned the relevant
 P_Key by the SM [RFC4392].
 
 Jumping to RFC 4392 (top of page 4):
 
 at the time of creating an IB multicast group, multiple values such as the
 P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
 specified.  These values should be such that all potential members of the IB
 multicast group are able to communicate with one another when using them.

Seems to me that for P_Key this would mean full membership.

 and page 14:
 
 Note that this IB_join to the broadcast group is a FullMember join.

FullMember here is referring to MCMemberRecord:JoinState rather than
partition membership.

-- Hal

 If any of the ports or the switches linking the port to the rest of the
 IPoIB subnet cannot support the parameters (e.g., path MTU or P_Key)
 associated with the broadcast group, then the IB_join request will fail and
 the requesting port will not become part of the IPoIB subnet
 
 My initial interpretation of these statements led me to believe that the pkey
 check in ib_find_cached_pkey should not mask out the upper bit, which would
 prevent ipoib from joining a multicast group until it has been configured with
 the full membership pkey for the broadcast group.  Does this seem reasonable?
 
 - Sean





Re: [openib-general] IPOIB NAPI

2007-02-22 Thread Michael S. Tsirkin
   An API idea:
   how about instead testing missed_events, we add a flag:
   
   IB_CQ_TEST (or a longer name IB_CQ_REPORT_MISSED_EVENTS?)
   and change ib_req_notify_cq to return int which will keep
   the missed_events value, only if this flag is set?
   
   This has the following advantages:
   - Less churn updating all users to the new API - they just ignore the return
     value - and still almost no overhead for them as they don't set IB_CQ_TEST
   - For all users we have to push fewer values on the stack - note the compiler
     can't get rid of them as we are calling the function through a pointer
   - For users that do
     missed_events = ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP | IB_CQ_TEST)
     we get the result in a register.
 
 Yes, I like this.  So ib_req_notify_cq() gets a return value that is
 negative if an error occurred, 0 if everything is fine, or positive if
 a missed event might have happened.
 
 I think I prefer the longer name IB_CQ_REPORT_MISSED_EVENTS -- at
 least there's a chance at guessing what it means even if you don't
 read the documentation.

By the way, how about extending the userspace API in a similar
fashion?

missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP |
                                  IBV_CQ_REPORT_MISSED_EVENTS)
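
The return convention discussed above (negative on error, 0 when all is fine, positive when an event may have been missed, and only if the caller asked for the report) can be sketched with a mock. Everything here is hypothetical: `fake_req_notify_cq`, the `FAKE_CQ_*` flag values and `struct fake_cq` are illustrations, not the verbs API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
enum fake_cq_flags {
    FAKE_CQ_NEXT_COMP            = 1 << 0,
    FAKE_CQ_REPORT_MISSED_EVENTS = 1 << 1,
};

struct fake_cq {
    int pending;  /* completions already sitting in the queue */
    int armed;    /* non-zero once a notification was requested */
};

/*
 * Mock of the proposed convention:
 *   < 0  error
 *   = 0  armed, nothing to report
 *   > 0  armed, but completions may have been missed
 *        (reported only when FAKE_CQ_REPORT_MISSED_EVENTS is set)
 */
int fake_req_notify_cq(struct fake_cq *cq, unsigned int flags)
{
    if (cq == NULL || !(flags & FAKE_CQ_NEXT_COMP))
        return -EINVAL;

    cq->armed = 1;
    if ((flags & FAKE_CQ_REPORT_MISSED_EVENTS) && cq->pending > 0)
        return 1;
    return 0;
}

void notify_example(void)
{
    struct fake_cq cq = { 3, 0 };

    /* Existing callers keep working: they never set the flag and see 0. */
    assert(fake_req_notify_cq(&cq, FAKE_CQ_NEXT_COMP) == 0);
    /* A NAPI-style caller opts in and learns that it has to poll again. */
    assert(fake_req_notify_cq(&cq, FAKE_CQ_NEXT_COMP |
                                   FAKE_CQ_REPORT_MISSED_EVENTS) > 0);
}
```

Making the report opt-in is what keeps the churn low: callers that ignore the return value need no change.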


-- 
MST




Re: [openib-general] IPOIB NAPI

2007-02-22 Thread Roland Dreier
  By the way, how about extending the userspace API in a similar
  fashion?
  
  missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP |
                                    IBV_CQ_REPORT_MISSED_EVENTS)

It would require a kernel-user ABI bump.  Is it worth it?

 - R.




[openib-general] ipoib the partial pkey, was: librdmacm: fix bug causing failure to work with partial membership pkey

2007-02-22 Thread Sean Hefty
Doesn't this allow ipoib to join a multicast group for which it may not be able
to communicate with all members?  For the broadcast group, this seems like an
error to me.  Can ipoib work in such a configuration?  If all nodes were
assigned a partial membership PKey, none of them could communicate, but no
errors would be generated anywhere.

I looked into this more...

RFC 4391 states (middle of page 5):

For a node to join a partition, one of its ports must be assigned the relevant
P_Key by the SM [RFC4392].

Jumping to RFC 4392 (top of page 4):

at the time of creating an IB multicast group, multiple values such as the
P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
specified.  These values should be such that all potential members of the IB
multicast group are able to communicate with one another when using them.

and page 14:

Note that this IB_join to the broadcast group is a FullMember join.  If any of
the ports or the switches linking the port to the rest of the IPoIB subnet
cannot support the parameters (e.g., path MTU or P_Key) associated with the
broadcast group, then the IB_join request will fail and the requesting port will
not become part of the IPoIB subnet

My initial interpretation of these statements led me to believe that the pkey
check in ib_find_cached_pkey should not mask out the upper bit, which would
prevent ipoib from joining a multicast group until it has been configured with
the full membership pkey for the broadcast group.  Does this seem reasonable?
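
To make the question concrete, here is a sketch of a P_Key table lookup with and without masking the membership (upper) bit. It is an illustration under assumptions, not the actual ib_find_cached_pkey code: `find_pkey`, its `ignore_membership` parameter and the table contents are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_BASE_MASK 0x7fffu  /* low 15 bits: partition number */

/*
 * Hypothetical lookup: returns the index of `pkey` in `table`, or -1.
 * With ignore_membership set, only the low 15 bits are compared, so a
 * partial-member entry satisfies a request for the full-member key.
 */
int find_pkey(const uint16_t *table, size_t len, uint16_t pkey,
              int ignore_membership)
{
    uint16_t mask = ignore_membership ? PKEY_BASE_MASK : 0xffffu;

    assert(table != NULL || len == 0);
    for (size_t i = 0; i < len; ++i) {
        if ((table[i] & mask) == (pkey & mask))
            return (int)i;
    }
    return -1;
}

/* A port that was only given the partial key 0x0001 by the SM. */
void pkey_lookup_example(void)
{
    const uint16_t table[] = { 0x7fff, 0x0001 };

    /* Masked compare: the full key 0x8001 is "found", so a join can proceed. */
    assert(find_pkey(table, 2, 0x8001, 1) == 1);
    /* Exact compare: no full-member entry, so the join would be refused. */
    assert(find_pkey(table, 2, 0x8001, 0) == -1);
}
```

The exact compare corresponds to the stricter reading of the RFC text proposed in this message; the masked compare corresponds to the behaviour being questioned.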

- Sean




Re: [openib-general] IPOIB NAPI

2007-02-22 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: IPOIB NAPI
 
   By the way, how about extending the userspace API in a similar
   fashion?
   
   missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP |
                                     IBV_CQ_REPORT_MISSED_EVENTS)
 
 It would require a kernel-user ABI bump. Is it worth it?

I hear some people asking for it. I imagine the reasons are the same as for
NAPI: a race-free, clean API to switch from polling to event mode,
rather than a minor optimization.

-- 
MST




Re: [openib-general] IPoIB CM for merge?

2007-02-04 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [openib-general] IPoIB CM for merge?
 
 On Fri, 2007-02-02 at 13:15 +0200, Michael S. Tsirkin wrote:
   Quoting Roland Dreier [EMAIL PROTECTED]:
   Subject: Re: IPoIB CM for merge?
   
 Could you please spend some time reviewing IPoIB CM code?
 I am concerned about missing the 2.6.21 merge window.
   
   Thanks for the reminder.
   
   Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?
  
  OK.
  I am not sure I have the last version posted so I am going to go by what
  is there in OFED git tree.
  
  And I also only looked under drivers/infiniband/.
  
  So, here are some questions: I looked in the archives and have not seen
  these addressed. Maybe these can be answered and then I'll go from there?
  Does this sound OK?
  
  Files with names like
  ./core/cxio_hal.c
  ./core/cxio_hal.h
  normally generate a fair bit of discussion which wasn't present here,
  I did not guess everyone was just busy.
  For example, why is there both struct iwch_cq and struct t3_cq?
  
 
 The cxgb3/core code defines a low level interface to the RDMA bits of
 the T3 device. 
 
 This code was originally a separate module (named cxio) that allowed
 other RDMA middleware layers to sit on top of this core RDMA module.
 At the time, there was RNIC-PI and OFA being developed.  So that is the
 history of this.  As per the first openib review (about a year ago) of
 this code I merged this core module into the cxgb3 module.  I left the
 file structure and names as-is because it was low priority IMO.
 
 The t3_cq struct is the low level CQ structure used to manage both a HW
 accessed CQ and a SW CQ (needed to handle error cases and out of order
 completions). The iwch_cq struct contains the stuff needed to integrate
 with the OFA core and uverbs code. It contains a t3_cq inline.

So now that there's a common module, there's no technical reason for
the two-level structure to exist? I would say you want to at least
move the files into a common directory.

I think you will also find that for datapath operations such as poll cq,
converting a completion from hardware format to struct t3_cqe, and from
that to ib_wc, adds a nontrivial amount of overhead.


  File tcb.h comment says:
  /* This file is automatically generated --- do not edit */
  This looks like a GPL violation, does it not?
  
 
 I can add the license if that's what you mean.

I mean that this file does not seem to be the source, in the GPL sense.
The following comes from COPYING under linux source directory:

The source code for a work means the preferred form of the work for
making modifications to it.  For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable.

So I think you must make the actual source available under the terms of GPL.

  What's the deal with the naming convention?
  Is there a reason in cxgb3, some files start with iwch and some with cxio?
  How about using cxgb3 prefix all over?
 
 The cxio_ prefix is used for the low-level functions/types that talk
 directly with the HW.  iwch_ is the provider driver functions that
 interface with the OFA stack.  I'd rather not change the names.
 Especially since this has already gone through several review cycles.
 I'm hoping we can get this in and improve it with subsequent
 submissions.  Is that reasonable?



-- 
MST




Re: [openib-general] IPoIB connected mode review comments

2007-02-04 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: IPoIB connected mode review comments
 
 On Thu, 2007-02-01 at 20:48 -0800, Roland Dreier wrote:
Have you had a chance to review this?
  
  Still on my list.
  
  Can we trade?  Can you look at the IPoIB connected mode stuff in the
  ipoib-cm branch in
  
  git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
  
  and let me know if you see anything you don't like?
  
   - R.
 
 Here are my comments.  I'm not an ib cm expert though.  These are mostly
 questions:

Steve, thanks for looking at the code!
I hope the following answers your questions.

 
 Since IPoIB is using IP addresses already, wouldn't it be simpler to use
 the rdma cm to setup connections?  

IPoIB is not using IP addresses. It uses hardware addresses as any network
device would. So using rdma cm does not make sense.

 Could you optimize this design and only signal some of the tx wrs?

This optimization would apply to UD mode too.
No one so far came up with a way to do this cleanly.

 In ipoib_cm_send() you call ipoib_cm_skb_too_long() if the packet is too
 large for the interface mtu.  And you print a warning.  But
 ipoib_cm_skb_too_long() actually queues the packet for the cm case.  For
 ud it just drops the packet.  The skb task for cm then will send a
 ICMP_DEST_UNREACH for these packets.  Why the difference?

For UD I just kept the current behaviour - I think
this can actually only happen in case of a race, when the packet was queued
before the MTU was changed, so the originator was already notified of
the MTU change by the stack above us.

For CM the local MTU may exceed the size of a buffer that was posted on
the remote QP. So we need to send ICMP_DEST_UNREACH to reduce the
originator's dest MTU to whatever this QP actually can support.
Since this needs the original skb and must be done from task or bh context,
we queue the skb and handle it in task context.

 Also if this
 packet came from the local stack via a local application, you don't want
 to send  DEST_UNREACH, right?  (I'm probably just confused about the
 purpose of this).

Yes, sending DEST_UNREACH does not seem to affect the local interface.  That's
why I call update_pmtu too.  It is also good to update the MTU ASAP to reduce
the number of packets that are dropped - and update_pmtu can be called from
atomic context. I do not know how to tell whether the packet is from the local
stack, and it does not seem to do any harm to handle all packets in a uniform
manner.

net/ipv4/ip_gre.c and net/ipv4/ipip.c are examples of code that do something
similar.

 In ipoib_cm_tx_completion() you rearm, then drain the cq.  I thought
 there was some reason that it was better to do drain/rearm/drain?
 Something about if you rearm and there's a cq entry mthca does another
 immediate interrupt?  

Again, this comment applies to UD mode as well.
AFAIK so far this worked best.

 In ipoib_cm_handle_tx_wc():
 
 When can a tx completion happen with a wr_id that isn't within the
 ipoib_sendq_size range?  This looks like it is really a bug condition
 that should never happen.

Because of this:
post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1))
so wr_id is always within range.
Again, this is exactly the same logic as in the UD case.
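
The masking trick works because the send queue size is a power of two, so `head & (size - 1)` equals `head % size` and can never leave the ring. A small sketch, with a hypothetical constant `SENDQ_SIZE` standing in for ipoib_sendq_size:

```c
#include <assert.h>

#define SENDQ_SIZE 64u  /* hypothetical ring size; must be a power of two */

/* Map a free-running head counter onto a ring slot (used as the wr_id). */
unsigned int tx_slot(unsigned int tx_head)
{
    /* The mask is only equivalent to "% SENDQ_SIZE" for powers of two. */
    assert((SENDQ_SIZE & (SENDQ_SIZE - 1)) == 0);
    return tx_head & (SENDQ_SIZE - 1);
}

/* The counter may wrap around freely; the slot stays inside the ring. */
void tx_slot_example(void)
{
    assert(tx_slot(63) == 63);
    assert(tx_slot(64) == 0);           /* wraps back to the first slot */
    assert(tx_slot(0xffffffffu) == 63); /* still in range at counter overflow */
}
```

Because every posted wr_id goes through this mapping, a completion carrying an out-of-range wr_id would indicate a bug elsewhere rather than a normal condition.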

 I see the same code in the rx completion path too.  

It's even simpler there:
+   for (i = 0; i < ipoib_recvq_size; ++i) {

...

+   if (ipoib_cm_post_receive(dev, i)) {

...

+   }
+   }

So i is always within RX size range.

 Also, what's up with the /* FIXME */ comment?

Since I have QPs which I never post send WRs on, I should be able to set
.cap.max_send_wr to 0 and .cap.max_send_sge should not matter.

However, low level drivers do not seem to support this at the moment, so
I set these to 1 for now - this is also correct but has a small memory cost. 

 You lock the priv->lock inside of the priv->tx_lock.  Is this ordering
 correct and consistent across all the code?

Yes, that's the nesting rule.

 ipoib_cm_handle_rx_wc() - what's up with the XXX comment?

We have the same comment in UD code - that's where this comes from.
Basically we don't have an easy way to know the correct packet type,
and always setting it to PACKET_HOST seems to work.

 What's the algorithm to keep enough buffers posted in the SRQ?

Same as with UD really - if I can't allocate a new skb I repost
the old one and increment the dropped packet counter.
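
The replenish policy described above (allocate a replacement first; if that fails, repost the old buffer and count a drop) can be sketched in user space as follows. All names are hypothetical; plain heap buffers stand in for skbs, and the allocator is passed in so the failure path can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rx_slot  { void *buf; };                             /* one posted receive buffer */
struct rx_stats { unsigned long rx_packets, rx_dropped; };

/* Illustration helper: an allocator that always fails. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Handle one receive completion for `slot`.
 * Returns the filled buffer to hand up the stack, or NULL if the packet
 * was dropped because no replacement buffer could be allocated.
 * Either way the slot ends up holding a buffer, so the queue never shrinks.
 */
void *rx_complete(struct rx_slot *slot, struct rx_stats *stats,
                  void *(*alloc)(size_t), size_t size)
{
    void *fresh = alloc(size);
    void *filled;

    if (fresh == NULL) {
        /* Keep (repost) the old buffer; its contents are overwritten later. */
        stats->rx_dropped++;
        return NULL;
    }
    filled = slot->buf;
    slot->buf = fresh;       /* post the replacement */
    stats->rx_packets++;
    return filled;
}

void rx_replenish_example(void)
{
    struct rx_stats st = { 0, 0 };
    struct rx_slot slot = { malloc(32) };
    void *old = slot.buf;

    assert(rx_complete(&slot, &st, failing_alloc, 32) == NULL);
    assert(slot.buf == old && st.rx_dropped == 1);  /* old buffer stays posted */
    free(slot.buf);
}
```

The point of allocating before detaching the old buffer is that the number of posted buffers stays constant even under memory pressure.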


-- 
MST




Re: [openib-general] IPoIB CM for merge?

2007-02-04 Thread Michael S. Tsirkin
 Quoting Pradeep Satyanarayana [EMAIL PROTECTED]:
 Subject: Re: [openib-general] IPoIB CM for merge?
 
 
 Hello Michael, 
 
 Here are a few more observations : 

Pradeep, I think you are posting in the wrong thread: it seems you are not
talking about my code, but rather about the project you mentioned of
implementing IPoIB CM without SRQ.

IPoIB CM currently falls back on UD mode for HCAs that do not support SRQ,
so there should be no problem for the ehca - as new code won't be activated.
As I said already, I do not see a clean way to address this limitation,
so I would rather have current IPoIB CM code merged upstream first, and think
about enhancements later.

 
 1. For the SRQ case, the skbs and receive buffers are posted during init and
 even before the rx_qp is created. This causes a problem (at least for non
 SRQs) for the ehca. We need to call the ipoib_cm_alloc_skb() and
 ipoib_cm_post_receive() after the rx_qp
 is in the RTR state.
 
 2. Also found that in ipoib_cm_create_rx_qp() one needs to initialize 
 .cap.max_recv_wr and .cap.max_recv_sge. Otherwise this leads to some problems 
 like rq overflows and causing communication failures. 

Yes, I think these are some of the things that would need to be done to make
IPoIB CM work without SRQ. It is clearly not something we want to do for the
SRQ case, however: for example, posting WRs to SRQ during connection setup
would race against completion events for other connections. And assigning
.cap.max_recv_wr > 0 for a QP not connected to SRQ does not make sense, and
might conceivably confuse low level drivers.

-- 
MST




Re: [openib-general] IPoIB CM for merge?

2007-02-02 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: IPoIB CM for merge?
 
   Could you please spend some time reviewing IPoIB CM code?
   I am concerned about missing the 2.6.21 merge window.
 
 Thanks for the reminder.
 
 Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?

OK.
I am not sure I have the last version posted so I am going to go by what
is there in OFED git tree.

And I also only looked under drivers/infiniband/.

So, here are some questions: I looked in the archives and have not seen
these addressed. Maybe these can be answered and then I'll go from there?
Does this sound OK?

Files with names like
./core/cxio_hal.c
./core/cxio_hal.h
normally generate a fair bit of discussion which wasn't present here,
I did not guess everyone was just busy.
For example, why is there both struct iwch_cq and struct t3_cq?

File tcb.h comment says:
/* This file is automatically generated --- do not edit */
This looks like a GPL violation, does it not?

What's the deal with the naming convention?
Is there a reason in cxgb3, some files start with iwch and some with cxio?
How about using cxgb3 prefix all over?

-- 
MST




Re: [openib-general] IPoIB CM for merge?

2007-02-02 Thread Steve Wise
On Fri, 2007-02-02 at 13:15 +0200, Michael S. Tsirkin wrote:
  Quoting Roland Dreier [EMAIL PROTECTED]:
  Subject: Re: IPoIB CM for merge?
  
Could you please spend some time reviewing IPoIB CM code?
I am concerned about missing the 2.6.21 merge window.
  
  Thanks for the reminder.
  
  Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?
 
 OK.
 I am not sure I have the last version posted so I am going to go by what
 is there in OFED git tree.
 
 And I also only looked under drivers/infiniband/.
 
 So, here are some questions: I looked in the archives and have not seen
 these addressed. Maybe these can be answered and then I'll go from there?
 Does this sound OK?
 
 Files with names like
 ./core/cxio_hal.c
 ./core/cxio_hal.h
 normally generate a fair bit of discussion which wasn't present here,
 I did not guess everyone was just busy.
 For example, why is there both struct iwch_cq and struct t3_cq?
 

The cxgb3/core code defines a low level interface to the RDMA bits of
the T3 device. 

This code was originally a separate module (named cxio) that allowed
other RDMA middleware layers to sit on top of this core RDMA module.
At the time, there was RNIC-PI and OFA being developed.  So that is the
history of this.  As per the first openib review (about a year ago) of
this code I merged this core module into the cxgb3 module.  I left the
file structure and names as-is because it was low priority IMO.

The t3_cq struct is the low level CQ structure used to manage both a HW
accessed CQ and a SW CQ (needed to handle error cases and out of order
completions). The iwch_cq struct contains the stuff needed to integrate
with the OFA core and uverbs code. It contains a t3_cq inline.

 File tcb.h comment says:
 /* This file is automatically generated --- do not edit */
 This looks like a GPL violation, does it not?
 

I can add the license if that's what you mean.

 What's the deal with the naming convention?
 Is there a reason in cxgb3, some files start with iwch and some with cxio?
 How about using cxgb3 prefix all over?

The cxio_ prefix is used for the low-level functions/types that talk
directly with the HW.  iwch_ is the provider driver functions that
interface with the OFA stack.  I'd rather not change the names.
Especially since this has already gone through several review cycles.
I'm hoping we can get this in and improve it with subsequent
submissions.  Is that reasonable?

Steve.
 







[openib-general] IPoIB connected mode review comments

2007-02-02 Thread Steve Wise
On Thu, 2007-02-01 at 20:48 -0800, Roland Dreier wrote:
   Have you had a chance to review this?
 
 Still on my list.
 
 Can we trade?  Can you look at the IPoIB connected mode stuff in the
 ipoib-cm branch in
 
 git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
 
 and let me know if you see anything you don't like?
 
  - R.

Here are my comments.  I'm not an ib cm expert though.  These are mostly
questions:


Since IPoIB is using IP addresses already, wouldn't it be simpler to use
the rdma cm to set up connections?  

Could you optimize this design and only signal some of the tx wrs?

In ipoib_cm_send() you call ipoib_cm_skb_too_long() if the packet is too
large for the interface mtu.  And you print a warning.  But
ipoib_cm_skb_too_long() actually queues the packet for the cm case.  For
ud it just drops the packet.  The skb task for cm then will send a
ICMP_DEST_UNREACH for these packets.  Why the difference?  Also if this
packet came from the local stack via a local application, you don't want
to send  DEST_UNREACH, right?  (I'm probably just confused about the
purpose of this).

In ipoib_cm_tx_completion() you rearm, then drain the cq.  I thought
there was some reason that it was better to do drain/rearm/drain?
Something about if you rearm and there's a cq entry mthca does another
immediate interrupt?  
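The drain/rearm/drain ordering asked about above can be sketched with a toy completion queue. Everything here (struct toy_cq, toy_poll, toy_drain, toy_rearm, toy_handle_event) is a hypothetical stand-in and not the verbs API or the IPoIB code; the sketch only shows that a completion landing between the first drain and the rearm is still picked up by the second drain.

```c
#include <stdbool.h>

/* Toy completion queue: a count of pending completions plus an armed flag.
 * Hypothetical stand-in for a real CQ; this is not the ib_verbs API. */
struct toy_cq {
    int pending;   /* completions waiting to be polled */
    bool armed;    /* true once the consumer asked for the next event */
};

/* Poll one completion; returns 1 if one was consumed, 0 if the CQ is empty. */
static int toy_poll(struct toy_cq *cq)
{
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* Drain everything currently queued; returns the number handled. */
static int toy_drain(struct toy_cq *cq)
{
    int handled = 0;

    while (toy_poll(cq))
        handled++;
    return handled;
}

/* Request a notification for the next completion. */
static void toy_rearm(struct toy_cq *cq)
{
    cq->armed = true;
}

/* drain / rearm / drain. `late` models completions that arrive after the
 * first drain but before the rearm takes effect. */
static int toy_handle_event(struct toy_cq *cq, int late)
{
    int handled = toy_drain(cq);

    cq->pending += late;        /* race window */
    toy_rearm(cq);
    handled += toy_drain(cq);   /* second drain catches the late entries */
    return handled;
}
```

With three queued completions and two late ones, toy_handle_event() handles all five and leaves the queue armed and empty. Whether the second drain is needed on real hardware depends on how the provider reports missed events, which this sketch does not model.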

In ipoib_cm_handle_tx_wc():

When can a tx completion happen with a wr_id that isn't within the
ipoib_sendq_size range?  This looks like it is really a bug condition
that should never happen.  I see the same code in the rx completion path
too.  

Also, what's up with the /* FIXME */ comment?

You lock the priv->lock inside of the priv->tx_lock.  Is this ordering
correct and consistent across all the code?


ipoib_cm_handle_rx_wc() - what's up with the XXX comment?

What's the algorithm to keep enough buffers posted in the SRQ?









Re: [openib-general] IPoIB CM for merge?

2007-02-02 Thread Pradeep Satyanarayana
Hello Michael,

Here are a few more observations:

1. For the SRQ case, the skbs and receive buffers are posted during init 
and even before the rx_qp is created. This causes a problem (at least for 
non SRQs) for the ehca. We need to call the ipoib_cm_alloc_skb() and 
ipoib_cm_post_receive() after the rx_qp is in the RTR state.

2. Also found that in ipoib_cm_create_rx_qp() one needs to initialize 
.cap.max_recv_wr and .cap.max_recv_sge. Otherwise this leads to some 
problems like rq overflows and causing communication failures.

Pradeep
[EMAIL PROTECTED]

[openib-general] IPoIB CM for merge?

2007-02-01 Thread Michael S. Tsirkin
Roland, 2.6.20 is nearly done.
Could you please spend some time reviewing IPoIB CM code?
I am concerned about missing the 2.6.21 merge window.

-- 
MST




Re: [openib-general] IPoIB CM for merge?

2007-02-01 Thread Roland Dreier
  Could you please spend some time reviewing IPoIB CM code?
  I am concerned about missing the 2.6.21 merge window.

Thanks for the reminder.

Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?

 - R.




Re: [openib-general] IPoIB CM for merge?

2007-02-01 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: IPoIB CM for merge?
 
   Could you please spend some time reviewing IPoIB CM code?
   I am concerned about missing the 2.6.21 merge window.
 
 Thanks for the reminder.
 
 Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?

I haven't yet, sorry. OK.
I am not sure I have the last version posted so I am going to go by what
is there in OFED git tree.

And I also only looked under drivers/infiniband/.

So, here are some questions: I looked in the archives and have not seen
these addressed. Maybe these can be answered and then I'll go from there?
Does this sound OK?

Files with names like
./core/cxio_hal.c
./core/cxio_hal.h
normally generate a fair bit of discussion which wasn't present here,
I did not guess everyone was just busy.
For example, why is there both struct iwch_cq and struct t3_cq?

File tcb.h comment says:
/* This file is automatically generated --- do not edit */
This looks like a GPL violation, does it not?

What's the deal with the naming convention?
Is there a reason in cxgb3, some files start with iwch and some with cxio?
How about using cxgb3 prefix all over?

-- 
MST




Re: [openib-general] IPOIB CM with Non SRQ support

2007-01-30 Thread Michael S. Tsirkin

 -One artifact of the current send side implementation is that for every
 message we create a new set of tx qps.

I do not believe this describes the implementation correctly - ipoib_cm_tx is
cached in ipoib_neigh structure so that once a connection is set up, it is reused
for all messages to the same neighbour.
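The caching described above can be sketched roughly as follows. The names (struct toy_cm_tx, struct toy_neigh, toy_get_tx, toy_connect_count) are hypothetical stand-ins for illustration and are not the IPoIB CM structures; the sketch only shows the "create on first use, reuse afterwards" pattern.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a per-connection transmit context. */
struct toy_cm_tx {
    int qp_num;
};

/* Hypothetical stand-in for a neighbour entry that caches its connection. */
struct toy_neigh {
    struct toy_cm_tx *cm;   /* NULL until the first message is sent */
};

/* Counts how many times a connection had to be established. */
static int toy_connect_count;

/* Return the cached connection for this neighbour, creating it only once. */
static struct toy_cm_tx *toy_get_tx(struct toy_neigh *n)
{
    if (!n->cm) {
        n->cm = calloc(1, sizeof(*n->cm));
        if (!n->cm)
            return NULL;        /* allocation failed; caller must cope */
        toy_connect_count++;    /* connection setup happens here only */
    }
    return n->cm;
}
```

Calling toy_get_tx() twice on the same neighbour returns the same pointer and bumps toy_connect_count once, which is the behaviour the reply describes for repeated messages to one neighbour.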

-- 
MST




[openib-general] ipoib, ipv6 and multicast groups

2007-01-29 Thread chas williams - CONTRACTOR
recently our sm started throwing the following errors:

Jan 29 18:10:49 706710 [42003940] - __get_new_mlid: ERR 1B23: All available:32 
mlids are taken
Jan 29 18:10:49 706721 [42003940] - osm_mcmr_rcv_create_new_mgrp: ERR 1B19: 
__get_new_mlid failed
Jan 29 18:10:51 345113 [42804940] - __get_new_mlid: ERR 1B23: All available:32 
mlids are taken
Jan 29 18:10:51 345132 [42804940] - osm_mcmr_rcv_create_new_mgrp: ERR 1B19: 
__get_new_mlid failed
Jan 29 18:10:51 514312 [41802940] - __get_new_mlid: ERR 1B23: All available:32 
mlids are taken
Jan 29 18:10:51 514320 [41802940] - osm_mcmr_rcv_create_new_mgrp: ERR 1B19: 
__get_new_mlid failed
Jan 29 18:10:51 735732 [42804940] - __get_new_mlid: ERR 1B23: All available:32 
mlids are taken

we tracked this down to a problem with ipoib interaction
with ipv6.  ipv6 joins two multicast groups, instead of 
just one like ipv4.

# netstat -A inet6 -g  -n
...
IPv6/IPv4 Group Memberships
Interface   RefCnt Group
--- -- -
lo  1  ff02::1
ib0 1  ff02::1:ff00:77a2
ib0 1  ff02::1


# netstat -A inet6 -g  -n
...
IPv6/IPv4 Group Memberships
Interface   RefCnt Group
--- -- -
lo  1  224.0.0.1
ib0 1  224.0.0.1


# cat /sys/kernel/debug/ipoib/ib0_mcg
GID: ff12:401b::0:0:0:0:1
  created: 4298482097
  queuelen: 0
  complete:   yes
  send_only:   no

GID: ff12:401b::0:0:0::
  created: 4298482097
  queuelen: 0
  complete:   yes
  send_only:   no

GID: ff12:601b::0:0:0:0:1
  created: 4298482097
  queuelen: 0
  complete:   yes
  send_only:   no

GID: ff12:601b::0:0:1:ff00:77a2
  created: 4298482097
  queuelen: 0
  complete:   yes
  send_only:   no


the ff02::1:ff00:77a2 group is specific to the interface (link local),
so each of our ib hosts running ipv6 registers its own unique multicast
group.  since our network is bigger than 32 hosts, it appears that we
have exceeded the multicast tables in our local switches and this is
making opensm generate the above error.
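To see why every host ends up with its own group, here is a minimal sketch of how the IPv6 solicited-node multicast address is formed: the fixed prefix ff02::1:ff00:0/104 followed by the low 24 bits of the interface's unicast address. The function name solicited_node_addr is made up for this illustration, and the unicast address used in any test is an assumption; only the low 24 bits (00:77:a2) are implied by the group shown above.

```c
#include <stdint.h>
#include <string.h>

/* Build the IPv6 solicited-node multicast address ff02::1:ffXX:XXXX from
 * the low 24 bits of a 16-byte unicast address. Hypothetical helper name. */
static void solicited_node_addr(const uint8_t unicast[16], uint8_t out[16])
{
    memset(out, 0, 16);
    out[0]  = 0xff;          /* multicast */
    out[1]  = 0x02;          /* flags 0, link-local scope */
    out[11] = 0x01;          /* ...:0001:... */
    out[12] = 0xff;          /* ...:ffXX:XXXX */
    out[13] = unicast[13];   /* low 24 bits copied from the unicast address */
    out[14] = unicast[14];
    out[15] = unicast[15];
}
```

Since the low 24 bits differ from host to host in practice, each host joins a different solicited-node group, and each such group needs its own MLID on the fabric.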

besides not running ipv6, are there any thoughts about this?




Re: [openib-general] ipoib, ipv6 and multicast groups

2007-01-29 Thread Hal Rosenstock
On Mon, 2007-01-29 at 13:17, chas williams - CONTRACTOR wrote:
 recently our sm started throwing the following errors:
 
 Jan 29 18:10:49 706710 [42003940] - __get_new_mlid: ERR 1B23: All 
 available:32 mlids are taken
 Jan 29 18:10:49 706721 [42003940] - osm_mcmr_rcv_create_new_mgrp: ERR 1B19: 
 __get_new_mlid failed
 Jan 29 18:10:51 345113 [42804940] - __get_new_mlid: ERR 1B23: All 
 available:32 mlids are taken
 Jan 29 18:10:51 345132 [42804940] - osm_mcmr_rcv_create_new_mgrp: ERR 1B19: 
 __get_new_mlid failed
 Jan 29 18:10:51 514312 [41802940] - __get_new_mlid: ERR 1B23: All 
 available:32 mlids are taken
 Jan 29 18:10:51 514320 [41802940] - osm_mcmr_rcv_create_new_mgrp: ERR 1B19: 
 __get_new_mlid failed
 Jan 29 18:10:51 735732 [42804940] - __get_new_mlid: ERR 1B23: All 
 available:32 mlids are taken

32 is too low for MLID space support IMO.

 we tracked this down to a problem with ipoib interaction
 with ipv6.  ipv6 joins two multicast groups, instead of 
 just one like ipv4.
 
   # netstat -A inet6 -g  -n
   ...
   IPv6/IPv4 Group Memberships
   Interface   RefCnt Group
   --- -- -
   lo  1  ff02::1
   ib0 1  ff02::1:ff00:77a2
   ib0 1  ff02::1
 
 
   # netstat -A inet6 -g  -n
   ...
   IPv6/IPv4 Group Memberships
   Interface   RefCnt Group
   --- -- -
   lo  1  224.0.0.1
   ib0 1  224.0.0.1
 
 
   # cat /sys/kernel/debug/ipoib/ib0_mcg
   GID: ff12:401b::0:0:0:0:1
 created: 4298482097
 queuelen: 0
 complete:   yes
 send_only:   no
 
   GID: ff12:401b::0:0:0::
 created: 4298482097
 queuelen: 0
 complete:   yes
 send_only:   no
 
   GID: ff12:601b::0:0:0:0:1
 created: 4298482097
 queuelen: 0
 complete:   yes
 send_only:   no
 
   GID: ff12:601b::0:0:1:ff00:77a2
 created: 4298482097
 queuelen: 0
 complete:   yes
 send_only:   no
 
 
 the ff02::1:ff00:77a2 group is specific to the interface (link local),
 so each of our ib hosts running ipv6 registers its own unique multicast
 group.  since our network is bigger than 32 hosts, it appears that we
 have exceeded the multicast tables in our local switches and this is
 making opensm generate the above error.
 
 besides not running ipv6, are there any thoughts about this?

This has been discussed on the list before. Last time was a thread on
IPv6 and IPoIB scalability issue back in late November (11/30) to
early December (12/2). There are some options presented. None have been
pursued to the best of my knowledge.

-- Hal

 
 





Re: [openib-general] IPOIB CM with Non SRQ support

2007-01-29 Thread Pradeep Satyanarayana
Hello Michael,

Yes, the code seems to get complex with lots of small changes spread 
all over the receive side. Plus, 
special-casing them with #ifdef makes it look a little messy. It is 
unlikely I can get this out by Feb 1st.

As I was working through this I noticed a few things and here are my 
observations:

-ipoib_cm_modify_rx_rts() does not actually transition the passive side qp 
to RTS state and remains in the
RTR state. However, the active side qp does transition to RTS.

-One artifact of the current send side implementation is that for every 
message we create a new set of tx qps.
So, if one were to use IB for the cluster heartbeat mechanism as an 
example, then for every heartbeat we
end up creating an ipoib_cm_tx structure and initiating a set of CM 
exchanges.  This might consume a lot of
 resources (even on an idle system). Changing this has a potential 
performance upside.

Pradeep
[EMAIL PROTECTED] 

Michael S. Tsirkin [EMAIL PROTECTED] wrote on 01/25/2007 11:41:28 PM:

  Quoting Pradeep Satyanarayana [EMAIL PROTECTED]:
  Subject: IPOIB CM with Non SRQ support
  
  
  Michael, 
  
  I am working on a prototype based on your IPOIB CM patch to 
 incorporate support for Non SRQ  as well. IPOIB CM was planned to be
 in OFED 1.2 if I remember correctly. If I were to submit a patch for
 non SRQ support, what would be the cut off date to make it
  into OFED 1.2? 
 
 I think it must be ready for merge by feature freeze on Feb 1st, but at 
this
 stage it really needs to be a small patch. I can't commit to merging it
 before I see it.
 
 I have to warn you that I thought about this problem, and unfortunately
 I do not see a way to implement it in a robust fashion without 
complicating
 the code significantly. In this case, you'll just might have to maintain 
it
 as a separate patch until the code lands upstream, and propose as a 
separate
 improvement later.
 
 -- 
 MST

Re: [openib-general] IPOIB CM with Non SRQ support

2007-01-25 Thread Michael S. Tsirkin
 Quoting Pradeep Satyanarayana [EMAIL PROTECTED]:
 Subject: IPOIB CM with Non SRQ support
 
 
 Michael, 
 
 I am working on a prototype based on your IPOIB CM patch to incorporate 
 support for Non SRQ  as well. IPOIB CM was planned to be in OFED 1.2 if I 
 remember correctly. If I were to submit a patch for non SRQ support, what 
 would be the cut off date to make it
 into OFED 1.2? 

I think it must be ready for merge by feature freeze on Feb 1st, but at this
stage it really needs to be a small patch. I can't commit to merging it
before I see it.

I have to warn you that I thought about this problem, and unfortunately
I do not see a way to implement it in a robust fashion without complicating
the code significantly. In this case, you might just have to maintain it
as a separate patch until the code lands upstream, and propose as a separate
improvement later.

-- 
MST




Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status

2007-01-15 Thread Sean Hefty
 Can you explain how this relates to your multicast changes? the IPoIB 
 send-only-full-member-join hack was there before your patch and stayed 
 there after your patch... and how come a change in the multicast code 
 can cause the error stream to be finite... have you moved the retry 
 mechanism from the ib_sa consumer to the ib_sa mcast engine?

There was a bug in the ib_sa multicast engine handling failed joins, which had 
it retry forever.  (Basically, the response was not being matched with the 
request.  So the response was discarded, and the request was retried.)  I had 
fixed this in svn, but lost the patch moving over to git.

- Sean




Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status

2007-01-15 Thread Or Gerlitz
On 1/15/07, Sean Hefty [EMAIL PROTECTED] wrote:
  Can you explain how this relates to your multicast changes? the IPoIB
  send-only-full-member-join hack was there before your patch and stayed
  there after your patch... and how come a change in the multicast code
  can cause the error stream to be finite... have you moved the retry
  mechanism from the ib_sa consumer to the ib_sa mcast engine?

 There was a bug in the ib_sa multicast engine handling failed joins, which had
 it retry forever.  (Basically, the response was not being matched with the
 request.  So the response was discarded, and the request was retried.)  I had
 fixed this in svn, but lost the patch moving over to git.

sure, got you.

Or.




Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status

2007-01-14 Thread Or Gerlitz
Sean Hefty wrote:
 So, this looks like a work-around for some broken SM, does it not?
 
 Yes - I mentioned it because the resulting error message (wrong 
 component mask) is what was filling up the opensm log file.
 
 Jan 11 14:21:36 083844 [40583BB0] - osm_mcmr_rcv_join_mgrp: ERR 1B11: 
 method =
 SubnAdmSet, scope_state = 0x1, component mask = 0x00010083, 
 expected com
 p mask = 0x000130c7, MGID: 0x : 
 0x201400020404 from
 port 0x0002c9010ad258f1
 
 I've applied a missing patch to my rdma-dev git tree that should avoid 
 filling up the opensm log file.  But the error in the opensm log file is 
 a result of this work-around.

Sean,

Can you explain how this relates to your multicast changes? the IPoIB 
send-only-full-member-join hack was there before your patch and stayed 
there after your patch... and how come a change in the multicast code 
can cause the error stream to be finite... have you moved the retry 
mechanism from the ib_sa consumer to the ib_sa mcast engine?

Or.





Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status

2007-01-12 Thread Hal Rosenstock
On Thu, 2007-01-11 at 19:11, Sean Hefty wrote: 
 Hal Rosenstock wrote:
 (*) there are some more issues here which need to be addressed, see
 for example the Some SMs don't support send-only yet weird comment
 at ipoib_mcast_sendonly_join()
  
  
  It's more likely an SA issue but I'm only guessing... It may also be
  historical...
 
 Based on observation, it looks like ipoib joins a couple of IPv6 multicast 
 groups with send only membership.

Yes.

 However it changes the join_state from 4 to 1 
 (send-only to full member).

Yes, that is the workaround Roland had put in (likely for a non
compliant SM which didn't support send only joins).

 This results in the SA trying to create the 
 multicast group, only the required MCMemberRecord components have not been 
 set.

Right, the group either needs to be previously precreated or a receiver
started first which would create the group.

 I'm not sure if this indicates a serious problem, but I'm guessing not.

I don't believe it's a serious problem (at least now). In any case, it
is no worse than it was before your change for this (it is not a problem
of your making...).

 The join request simply fails and returns an error back to ipoib.  (Which 
 would have 
 happened for a send-only join if the group hadn't already been created.)

Right.

-- Hal

 - Sean





[openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status

2007-01-11 Thread Sean Hefty
Hal Rosenstock wrote:
(*) there are some more issues here which need to be addressed, see
for example the Some SMs don't support send-only yet weird comment
at ipoib_mcast_sendonly_join()
 
 
 It's more likely an SA issue but I'm only guessing... It may also be
 historical...

Based on observation, it looks like ipoib joins a couple of IPv6 multicast 
groups with send only membership.  However it changes the join_state from 4 to 
1 
(send-only to full member).  This results in the SA trying to create the 
multicast group, only the required MCMemberRecord components have not been set.

I'm not sure if this indicates a serious problem, but I'm guessing not.  The 
join request simply fails and returns an error back to ipoib.  (Which would 
have 
happened for a send-only join if the group hadn't already been created.)
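For readers unfamiliar with the values 4 and 1: in the MCMemberRecord JoinState field, bit 0 means full member, bit 1 non-member, and bit 2 send-only non-member. The sketch below only illustrates the substitution described above; the names are invented for this example, and it is not the IPoIB code.

```c
/* JoinState bits of an MCMemberRecord (names invented for this sketch). */
enum toy_join_state {
    TOY_JOIN_FULL_MEMBER         = 1 << 0,  /* value 1 */
    TOY_JOIN_NON_MEMBER          = 1 << 1,  /* value 2 */
    TOY_JOIN_SENDONLY_NON_MEMBER = 1 << 2   /* value 4 */
};

/* Model of the observed behaviour: a send-only request goes out as a
 * full-member request; any other join state is passed through unchanged. */
static unsigned int toy_sendonly_workaround(unsigned int join_state)
{
    if (join_state == TOY_JOIN_SENDONLY_NON_MEMBER)
        return TOY_JOIN_FULL_MEMBER;
    return join_state;
}
```

Because a full-member join of a group that does not exist yet is treated as a create request, the SA then expects the creation components to be present, which is the failure discussed in this thread.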

- Sean




Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status

2007-01-11 Thread Michael S. Tsirkin
 Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: ipoib ipv6 multicast joins, was: multicast code/merge status
 
 Hal Rosenstock wrote:
 (*) there are some more issues here which need to be addressed, see
 for example the Some SMs don't support send-only yet weird comment
 at ipoib_mcast_sendonly_join()
  
  
  It's more likely an SA issue but I'm only guessing... It may also be
  historical...
 
 Based on observation, it looks like ipoib joins a couple of IPv6 multicast 
 groups with send only membership.  However it changes the join_state from 4 
 to 1 
 (send-only to full member).  This results in the SA trying to create the 
 multicast group, only the required MCMemberRecord components have not been 
 set.
 
 I'm not sure if this indicates a serious problem, but I'm guessing not.  The 
 join request simply fails and returns an error back to ipoib.  (Which would 
 have 
 happened for a send-only join if the group hadn't already been created.)

So, this looks like a work-around for some broken SM, does it not?

-- 
MST




Re: [openib-general] ipoib ipv6 multicast joins, was: multicast code/merge status

2007-01-11 Thread Sean Hefty
 So, this looks like a work-around for some broken SM, does it not?

Yes - I mentioned it because the resulting error message (wrong component mask) 
is what was filling up the opensm log file.

Jan 11 14:21:36 083844 [40583BB0] - osm_mcmr_rcv_join_mgrp: ERR 1B11: method =
SubnAdmSet, scope_state = 0x1, component mask = 0x00010083, expected
comp mask = 0x000130c7, MGID: 0x : 0x201400020404 from
port 0x0002c9010ad258f1

I've applied a missing patch to my rdma-dev git tree that should avoid filling 
up the opensm log file.  But the error in the opensm log file is a result of 
this work-around.
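The two masks in that log line can be compared bit by bit. The sketch below assumes the MCMemberRecord component mask bit order as I understand it from the kernel's ib_sa.h (MGID at bit 0 through JOIN_STATE at bit 16); the TOY_ names and the helper are invented here, so check the bit positions against the header before relying on them.

```c
#include <stdint.h>

/* MCMemberRecord component mask bits (assumed layout, invented names). */
enum {
    TOY_MC_MGID          = 1 << 0,
    TOY_MC_PORT_GID      = 1 << 1,
    TOY_MC_QKEY          = 1 << 2,
    TOY_MC_MLID          = 1 << 3,
    TOY_MC_MTU_SELECTOR  = 1 << 4,
    TOY_MC_MTU           = 1 << 5,
    TOY_MC_TRAFFIC_CLASS = 1 << 6,
    TOY_MC_PKEY          = 1 << 7,
    TOY_MC_SL            = 1 << 12,
    TOY_MC_FLOW_LABEL    = 1 << 13,
    TOY_MC_JOIN_STATE    = 1 << 16
};

/* Components the SA expected but the request did not set. */
static uint64_t toy_missing_components(uint64_t got, uint64_t expected)
{
    return expected & ~got;
}
```

Under that assumed layout, toy_missing_components(0x00010083, 0x000130c7) yields 0x3044, that is QKEY, TRAFFIC_CLASS, SL and FLOW_LABEL: the request carried only MGID, PORT_GID, PKEY and JOIN_STATE, which is enough to join an existing group but not to create one.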

- Sean




Re: [openib-general] IPoIB new multicast API patches oops

2006-11-14 Thread Michael S. Tsirkin
Quoting Sean Hefty [EMAIL PROTECTED]:
 Subject: RE: IPoIB new multicast API patches oops
 
 I have not been able to reproduce this crash on my systems, and even
 instrumenting the code isn't helping me to locate the issue.  Can you
 apply the following patch on top of the previous patches, and let me
 know if you get any additional output?

OK, I hope to get back to testing this next-week-ish.

-- 
MST




Re: [openib-general] IPoIB new multicast API patches oops

2006-11-13 Thread Sean Hefty
I have not been able to reproduce this crash on my systems, and even
instrumenting the code isn't helping me to locate the issue.  Can you
apply the following patch on top of the previous patches, and let me
know if you get any additional output?

- Sean
---
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index 88a9edf..b3bc4c6 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -81,6 +81,12 @@ enum mcast_state {
 	MCAST_ERROR
 };
 
+enum mcast_debug {
+	MCAST_DEBUG_IDLE,
+	MCAST_DEBUG_JOINING,
+	MCAST_DEBUG_LEAVING,
+};
+
 struct mcast_member;
 
 struct mcast_group {
@@ -97,6 +103,7 @@ struct mcast_group {
 	enum mcast_state	state;
 	struct ib_sa_query	*query;
 	int			query_id;
+	enum mcast_debug	debug_state;
 };
 
 struct mcast_member {
@@ -179,6 +186,7 @@ static void release_group(struct mcast_g
 	if (atomic_dec_and_test(&group->refcount)) {
 		rb_erase(&group->node, &port->table);
 		spin_unlock_irqrestore(&port->lock, flags);
+		BUG_ON(group->debug_state != MCAST_DEBUG_IDLE);
 		kfree(group);
 		deref_port(port);
 	} else
@@ -319,6 +327,8 @@ static int send_join(struct mcast_group 
 	struct mcast_port *port = group->port;
 	int ret;
 
+	BUG_ON(group->debug_state != MCAST_DEBUG_IDLE);
+	group->debug_state = MCAST_DEBUG_JOINING;
 	ret = ib_sa_mcmember_rec_query(&sa_client, port->dev->device,
 				       port->port_num, IB_MGMT_METHOD_SET,
 				       &member->multicast.rec,
@@ -341,6 +351,8 @@ static int send_leave(struct mcast_group
 	rec = group->rec;
 	rec.join_state = leave_state;
 
+	BUG_ON(group->debug_state != MCAST_DEBUG_IDLE);
+	group->debug_state = MCAST_DEBUG_LEAVING;
 	ret = ib_sa_mcmember_rec_query(&sa_client, port->dev->device,
 				       port->port_num, IB_SA_METHOD_DELETE, &rec,
 				       IB_SA_MCMEMBER_REC_MGID |
@@ -493,6 +505,8 @@ static void join_handler(int status, str
 {
 	struct mcast_group *group = context;
 
+	BUG_ON(group->debug_state != MCAST_DEBUG_JOINING);
+	group->debug_state = MCAST_DEBUG_IDLE;
 	if (status)
 		process_join_error(group, status);
 	else {
@@ -510,6 +524,10 @@ static void join_handler(int status, str
 static void leave_handler(int status, struct ib_sa_mcmember_rec *rec,
 			  void *context)
 {
+	struct mcast_group *group = context;
+
+	BUG_ON(group->debug_state != MCAST_DEBUG_LEAVING);
+	group->debug_state = MCAST_DEBUG_IDLE;
 	mcast_work_handler(context);
 }
 





Re: [openib-general] ipoib mtu problem with UDP

2006-11-07 Thread Moni Shoua
Michael S. Tsirkin wrote:

I tried using ifconfig to limit the ipoib mtu.
Once I do this on *either* both server and client, or only on the client side,
UDP seems to stop working:

#ifconfig ib0 mtu 512
#netperf -c -C -H 11.4.3.68 -f M -t UDP_STREAM
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68
(11.4.3.68) port 0 AF_INET : demo
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   MBytes/sec % SS     us/KB

118784   65507   10.00       27582      0      172.2     26.33    inf
118784           10.00           0               0.0     23.40    inf

Things work fine if the mtu on the client side is 2044:
# ifconfig ib0 mtu 2044
# netperf -c -C -H 11.4.3.68 -f M -t UDP_STREAM
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
11.4.3.68 (11.4.3.68) port 0 AF_INET : demo
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   MBytes/sec % SS     us/KB

118784   65507   10.00       78488      0      490.1     25.31    2.310
118784           10.00       68534             428.0     24.55    2.241

Tested with kernel 2.6.19-rc4 and netperf 2.4.2.

  

I get the same  results with iperf.
However they succeed with smaller datagrams (netperf uses 65507 by default)
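As a rough illustration of why datagram size matters here, the sketch below counts how many IPv4 fragments one UDP datagram needs at a given MTU. It assumes a 20-byte IP header without options and an 8-byte UDP header; the function name udp_fragments is made up for this example.

```c
/* Number of IPv4 fragments needed to carry one UDP datagram.
 * Assumes a 20-byte IP header (no options) and an 8-byte UDP header.
 * Every fragment except the last must carry a multiple of 8 bytes. */
static int udp_fragments(int udp_payload, int mtu)
{
    int ip_payload = udp_payload + 8;        /* UDP header + data */
    int per_frag = ((mtu - 20) / 8) * 8;     /* usable bytes per fragment */

    return (ip_payload + per_frag - 1) / per_frag;   /* round up */
}
```

Under these assumptions a 65507-byte datagram needs 135 fragments at MTU 512 but only 33 at MTU 2044, while a 1470-byte datagram needs 4 fragments at MTU 512. The 135 figure is larger than the txqueuelen of 128 visible in the ifconfig output below; that may or may not be related to the failure and is offered only as an observation.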

dodly5:/home/shared/testing-tools/x86_64/netperf/netperf-2.4.1 # 
ifconfig ib0
ib0   Link encap:UNSPEC  HWaddr 
00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
  inet addr:192.168.11.235  Bcast:192.168.11.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:512  Metric:1
  RX packets:42 errors:0 dropped:0 overruns:0 frame:0
  TX packets:14077513 errors:0 dropped:5 overruns:0 carrier:0
  collisions:0 txqueuelen:128
  RX bytes:5776 (5.6 Kb)  TX bytes:6717604780 (6406.4 Mb)

dodly5:/home/shared/testing-tools/x86_64/netperf/netperf-2.4.1 # 
./netperf   -H 192.168.11.233  -t UDP_STREAM -- -m 3
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.11.233 (192.168.11.233) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

262144       3   10.00       52533      0    1260.59
262144           10.00       22956            550.86


dodly5:/home/shared/testing-tools/x86_64/iperf-2.0.2 # ./iperf -uc 
192.168.11.233 -l 65000

Client connecting to 192.168.11.233, UDP port 5001
Sending 65000 byte datagrams
UDP buffer size:   256 KByte (default)

[  3] local 192.168.11.235 port 32769 connected with 192.168.11.233 port 
5001
[  3]  0.0-10.9 sec  1.36 MBytes  1.05 Mbits/sec
[  3] Sent 22 datagrams
[  3] WARNING: did not receive ack of last datagram after 10 tries.
dodly5:/home/shared/testing-tools/x86_64/iperf-2.0.2 # ./iperf -uc 
192.168.11.233

Client connecting to 192.168.11.233, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:   256 KByte (default)

[  3] local 192.168.11.235 port 32769 connected with 192.168.11.233 port 
5001
[  3]  0.0-10.0 sec  1.25 MBytes  1.05 Mbits/sec
[  3] Sent 893 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec  1.25 MBytes  1.05 Mbits/sec  0.002 ms    0/  893 (0%)







[openib-general] ipoib mtu problem with UDP

2006-11-06 Thread Michael S. Tsirkin
I tried using ifconfig to limit the ipoib mtu.
Once I do this on *either* both server and client, or only on the client side,
UDP seems to stop working:

#ifconfig ib0 mtu 512
#netperf -c -C -H 11.4.3.68 -f M -t UDP_STREAM
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68
(11.4.3.68) port 0 AF_INET : demo
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   MBytes/sec % SS     us/KB

118784   65507   10.00       27582      0      172.2     26.33    inf
118784           10.00           0               0.0     23.40    inf

Things work fine if the mtu on the client side is 2044:
# ifconfig ib0 mtu 2044
# netperf -c -C -H 11.4.3.68 -f M -t UDP_STREAM
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 
(11.4.3.68) port 0 AF_INET : demo
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   MBytes/sec % SS     us/KB

118784   65507   10.00       78488      0      490.1     25.31    2.310
118784           10.00       68534             428.0     24.55    2.241

Tested with kernel 2.6.19-rc4 and netperf 2.4.2.

-- 
MST




Re: [openib-general] IPoIB odd loopback packet from arp

2006-10-24 Thread Eli Cohen
Todd,
This does not look like an error. The first arp is a broadcast
(qpn=ff), so it is also received at the sending interface and is
dropped. The second one is a unicast arp (qpn=0x000404), so it is not
received at the local interface.
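
To make the qpn values in this explanation easier to relate to the tcpdump output quoted below, here is a small sketch that splits a 20-byte IPoIB link-layer address into its parts. It assumes the layout from the IPoIB encapsulation spec (RFC 4391): one reserved/flags byte, a 24-bit QPN, then a 16-byte GID. The function name is hypothetical.

```python
import ipaddress


def parse_ipoib_hwaddr(text: str):
    """Split a 20-byte IPoIB link-layer address into (flags, qpn, gid)."""
    raw = bytes(int(part, 16) for part in text.split(":"))
    if len(raw) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(raw))
    flags = raw[0]                          # reserved/flags byte
    qpn = int.from_bytes(raw[1:4], "big")   # 24-bit queue pair number
    gid = ipaddress.IPv6Address(raw[4:])    # 128-bit GID, shown in IPv6 notation
    return flags, qpn, gid


if __name__ == "__main__":
    # Address taken from the "arp reply ib110 is-at" line in the trace.
    flags, qpn, gid = parse_ipoib_hwaddr(
        "00:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:11:59")
    print("qpn=0x%06x gid=%s" % (qpn, gid))
```

The QPN decoded from ib110's arp reply is the same 0x000404 that the ib_ipoib debug log reports for the unicast sends.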


On Mon, 2006-10-23 at 13:48 -0600, Todd Bowman wrote:
 Using the OFED 1.0 and OFED 1.1 stack I have notice some rcvswrelay
 errors.  I have tracked it down to the arp request.  I can reproduce
 the problem with the following steps:
 
 ( I have used both 2.6.14.14 and 2.6.18.1 kernels) 
 
 ib109 arp -d ib110
 ib109 ping ib110 -c 2
 
 # ib_ipoib module debug
 13:15:46 ib109 kernel: ib0: sending packet, length=60 address=f6187200
 qpn=0xff 
 13:15:46 ib109 kernel: ib0: called: id 34, op 0, status: 0
 13:15:46 ib109 kernel: ib0: send complete, wrid 34
 13:15:46 ib109 kernel: ib0: called: id -2147483623, op 128, status: 0
 13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x0369 
 13:15:46 ib109 kernel: ib0: dropping loopback packet
 13:15:46 ib109 kernel: ib0: called: id -2147483622, op 128, status: 0
 13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x016d
 13:15:46 ib109 kernel: ib0: sending packet, length=88 address=f6e57520
 qpn=0x000404 
 13:15:46 ib109 kernel: ib0: called: id 35, op 0, status: 0
 13:15:46 ib109 kernel: ib0: send complete, wrid 35
 13:15:46 ib109 kernel: ib0: called: id -2147483621, op 128, status: 0
 13:15:46 ib109 kernel: ib0: received 128 bytes, SLID 0x016d 
 13:15:47 ib109 kernel: ib0: sending packet, length=88 address=f6e57520
 qpn=0x000404
 13:15:47 ib109 kernel: ib0: called: id 36, op 0, status: 0
 13:15:47 ib109 kernel: ib0: send complete, wrid 36
 13:15:47 ib109 kernel: ib0: called: id -2147483620, op 128, status: 0 
 13:15:47 ib109 kernel: ib0: received 128 bytes, SLID 0x016d
 13:15:51 ib109 kernel: ib0: called: id -2147483619, op 128, status: 0
 13:15:51 ib109 kernel: ib0: received 100 bytes, SLID 0x016d
 13:15:51 ib109 kernel: ib0: sending packet, length=60 address=f6e57520
 qpn=0x000404 
 13:15:51 ib109 kernel: ib0: called: id 37, op 0, status: 0
 13:15:51 ib109 kernel: ib0: send complete, wrid 37
 
 # tcpdump -i ib0
 13:15:46.977578 arp who-has ib110 tell ib109 hardware #32
 13:15:46.977682 arp reply ib110 is-at
 00:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:11:59 hardware
 #32 
 13:15:46.977710 IP ib109 > ib110: icmp 64: echo request seq 0
 13:15:46.977790 IP ib110 > ib109: icmp 64: echo reply seq 0
 13:15:47.92 IP ib109 > ib110: icmp 64: echo request seq 1
 13:15:47.977892 IP ib110 > ib109: icmp 64: echo reply seq 1
 13:15:51.977076 arp who-has ib109 tell ib110 hardware #32
 13:15:51.977094 arp reply ib109 is-at
 00:02:00:14:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:3b:31 hardware
 #32 
 
 # error dump
 rcvswrelayerrors:1 MT47396 Infiniscale-III 0x2c9010b022090[1]
  ib109 HCA-1 0x2c9023b30[1]   
 
 1) The ping is successful and the arp table is populated so Is this
 really a problem or a false positive?  
 2) The second arp does not generate an error (the error dump reports
 all new errors in switches). Why?
 
 Any ideas?
 
 Thanks in advance.
 
 Todd
 





Re: [openib-general] IPoIB Question

2006-10-24 Thread Sean Hubbell
Greg Lindahl wrote:
 On Mon, Oct 23, 2006 at 07:53:06AM -0500, Hubbell, Sean C Contractor/Decibel 
 wrote:

   
   I currently have several applications that uses a legacy IPv4 protocol
 and I use IPoIB to utilize my infiniband network which works great. I
 have completed some timing and throughput analysis and noticed that I do
 not get very much more if I use an infiniband network interface than
 using my GigE network interface.
 

 You might want to note that different InfiniBand implementations have
 quite different performance of IPoIB, especially for UDP.

 Another issue is that IPoIB has quite different performance with
 different Linux kernels. This is especially evident for TCP, although
 you can use SDP to accelerate TCP sockets and avoid this issue.

   
We are currently looking at the new tickless kernel. Do you have one 
that you recommend?

Sean




Re: [openib-general] IPoIB odd loopback packet from arp

2006-10-24 Thread Todd Bowman
Thanks, Eli. So the switch is incrementing the rcvswrelay counter when it
sends the broadcast back through the original port. This doesn't seem to be
correct behavior; it makes that counter unreliable.

On 10/24/06, Eli Cohen [EMAIL PROTECTED] wrote:
 Todd,
 This does not look like an error. The first arp is a broadcast
 (qpn=ff), so it is also received at the sending interface and is
 dropped. The second one is a unicast arp (qpn=0x000404), so it is not
 received at the local interface.

Re: [openib-general] IPoIB Question

2006-10-24 Thread Michael Krause
At 10:00 PM 10/23/2006, Greg Lindahl wrote:
 On Mon, Oct 23, 2006 at 07:53:06AM -0500, Hubbell, Sean C
 Contractor/Decibel wrote:

  I currently have several applications that uses a legacy IPv4 protocol
  and I use IPoIB to utilize my infiniband network which works great. I
  have completed some timing and throughput analysis and noticed that I do
  not get very much more if I use an infiniband network interface than
  using my GigE network interface.

 You might want to note that different InfiniBand implementations have
 quite different performance of IPoIB, especially for UDP.

 Another issue is that IPoIB has quite different performance with
 different Linux kernels. This is especially evident for TCP, although
 you can use SDP to accelerate TCP sockets and avoid this issue.

  My question is, am I using IPoIB correctly or are these the typical
  numbers that everyone is seeing?

 It is certainly the case that there are some message patterns and
 situations for which InfiniBand is not much of an improvement over
 gigE.

Unfortunately, comparisons of IB to GbE are often apples-to-oranges
comparisons, even for IP over IB.  Until an HCA supplies the same level of
functional off-load enabled by the IP network stack that is used with
Ethernet, it really isn't a fair comparison.  The same is also true for
many of the marketroids and their comparisons of IB to Ethernet-based
solutions.  Fortunately, most customers are getting a bit smarter and not
falling for the marketing drivel these days; certainly the OEMs don't fall
for it, though the marketroids continue to come in and try to convince
people it isn't an apples-to-oranges comparison.  The fact is that both
technologies have their pros and cons, and it is really the workload or
production environment that determines which is the best fit instead of the
force fit.

In any case, not really a development issue so will drop further discussion.

Mike 






Re: [openib-general] IPoIB Question

2006-10-24 Thread Greg Lindahl
On Tue, Oct 24, 2006 at 08:35:18AM -0500, Sean Hubbell wrote:

 We are currently looking at the new tickless kernel. Do you have one 
 that you recommend?

The main ones to less-recommend are 2.6.9-based kernels; those are the
slowest at TCP. Modern kernels, like the ones you see in Fedora 4 and
up and SLES 10, seem to all be good and about equal in this area.

I don't think we've tried a tickless kernel. We do most of our testing
on the various kernels that ship with distros, plus the tip-of-tree
kernel.org kernel.

-- greg





Re: [openib-general] IPoIB Question

2006-10-24 Thread Scott Weitzenkamp (sweitzen)
We see 3.6 Gb/sec with IPoIB using RHEL4U4 2.6.9-42 x86_64 kernel on
Dell PE1950 Woodcrest systems.

In my testing, faster hardware is more important than newer kernels, but
I don't try newer kernels much.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Greg Lindahl
 Sent: Tuesday, October 24, 2006 1:16 PM
 To: Sean Hubbell
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] IPoIB Question
 
 On Tue, Oct 24, 2006 at 08:35:18AM -0500, Sean Hubbell wrote:
 
  We are currently looking at the new tickless kernel. Do you 
 have one 
  that you recommend?
 
 The main one to less-recommend is 2.6.9-based kernels, those are the
 slowest at TCP. Modern kernels, like the ones you see in Fedora 4 and
 up and SLES 10, seem to all be good and about equal in this area.
 
 I don't think we've tried a tickless kernel. We do most of our testing
 on the various kernels that ship with distros, plus the tip-of-tree
 kernel.org kernel.
 
 -- greg
 
 
 




Re: [openib-general] IPoIB Question

2006-10-24 Thread Sean Hubbell
Is this with a combination of TCP and UDP or just TCP?

Sean

Scott Weitzenkamp (sweitzen) wrote:
 We see 3.6 Gb/sec with IPoIB using RHEL4U4 2.6.9-42 x86_64 kernel on
 Dell PE1950 Woodcrest systems.

 In my testing, faster hardware is more important than newer kernels, but
 I don't try newer kernels much.

   




[openib-general] IPoIB Question

2006-10-23 Thread Hubbell, Sean C Contractor/Decibel
Hello,

I currently have several applications that use a legacy IPv4 protocol,
and I use IPoIB to utilize my InfiniBand network, which works great. I
have completed some timing and throughput analysis and noticed that I do
not get very much more from an InfiniBand network interface than from my
GigE network interface. My question is: am I using IPoIB correctly, or
are these the typical numbers that everyone is seeing? Is there a
standard application that I may use to test my current configuration?

Thanks in advance,


Sean




Re: [openib-general] IPoIB Question

2006-10-23 Thread Scott Weitzenkamp (sweitzen)
IPoIB performance will vary quite a bit depending on what motherboard,
CPU speed, and HCA type you have. What are the specs on the systems
you are using?

Netperf (www.netperf.org) is a good tool to measure IPoIB performance.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Hubbell, Sean C Contractor/Decibel
 Sent: Monday, October 23, 2006 5:53 AM
 To: openib-general@openib.org
 Cc: Sean Hubbell
 Subject: [openib-general] IPoIB Question

 Hello,

 I currently have several applications that uses a legacy IPv4 protocol
 and I use IPoIB to utilize my infiniband network which works great. I
 have completed some timing and throughput analysis and noticed that I do
 not get very much more if I use an infiniband network interface than
 using my GigE network interface. My question is, am I using IPoIB
 correctly or are these the typical numbers that everyone is seeing? Is
 there a standard application that I may use to test my current
 configuration?

 Thanks in advance,

 Sean

Re: [openib-general] IPoIB Question

2006-10-23 Thread Michael S. Tsirkin
Quoting r. Scott Weitzenkamp (sweitzen) [EMAIL PROTECTED]:
 Netperf (www.netperf.org) is a good tool to measure IPoIB performance.

Of special note is the -T flag which often lets you get more consistent results
by pinning the test to a single CPU.

Another useful tool is iperf, which has a -P option for running multiple
socket tests in parallel. In TCP, multi-socket performance often exceeds
that of a single socket.
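
As a rough illustration of the two suggestions above, the sketch below only builds the command lines and does not run anything. The helper names, the placeholder host 192.0.2.10, the CPU number, and the stream count are all assumptions made for the example. The flags used are netperf's -H, -t and -T and iperf's -c, -P and -u.

```python
def netperf_cmd(host, test="TCP_STREAM", cpu=0):
    """Build a netperf command that pins both ends of the test to one CPU."""
    # -T takes "local,remote" CPU numbers for netperf and netserver.
    return ["netperf", "-H", host, "-t", test, "-T", "%d,%d" % (cpu, cpu)]


def iperf_cmd(host, streams=4, udp=False):
    """Build an iperf client command that runs several sockets in parallel."""
    cmd = ["iperf", "-c", host, "-P", str(streams)]
    if udp:
        cmd.append("-u")  # switch from the default TCP test to UDP
    return cmd


if __name__ == "__main__":
    host = "192.0.2.10"  # placeholder address, replace with the peer's IPoIB address
    print(" ".join(netperf_cmd(host)))
    print(" ".join(iperf_cmd(host, streams=4)))
```

Comparing a single-stream run against a multi-stream run on the same pair of hosts is one way to see whether one socket is the limiting factor.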

-- 
MST




Re: [openib-general] IPoIB Question

2006-10-23 Thread Sean Hubbell
We currently have a non-homogeneous cluster, which could possibly
explain a few of the differences that I have seen in some of my
tests. I will look at netperf.org and see what they have to offer.

On another note, are there plans to have IPoIB support the full
throughput that InfiniBand 4x or 12x has? Specifically, can I keep my
legacy apps and just upgrade the network to take advantage of the bandwidth?

Sean

Scott Weitzenkamp (sweitzen) wrote:
 IPoIB performance will vary quite a bit depending on what motherboard, 
 CPU speed, and HCA type you have.  What are the specs on the systems 
 you are using?
  
 Netperf (www.netperf.org http://www.netperf.org) is a good tool to 
 measure IPoIB performance.
  
 Scott Weitzenkamp
 SQA and Release Manager
 Server Virtualization Business Unit
 Cisco Systems
  

 
 *From:* [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] *On Behalf Of *Hubbell,
 Sean C Contractor/Decibel
 *Sent:* Monday, October 23, 2006 5:53 AM
 *To:* openib-general@openib.org
 *Cc:* Sean Hubbell
 *Subject:* [openib-general] IPoIB Question

 Hello,

   I currently have several applications that uses a legacy IPv4
 protocol and I use IPoIB to utilize my infiniband network which
 works great. I have completed some timing and throughput analysis
 and noticed that I do not get very much more if I use an
 infiniband network interface than using my GigE network interface.
 My question is, am I using IPoIB correctly or are these the
 typical numbers that everyone is seeing? Is there a standard
 application that I may use to test my current configuration?

 Thanks in advance,

 Sean






Re: [openib-general] IPoIB Question

2006-10-23 Thread Scott Weitzenkamp (sweitzen)
If you are using TCP, you can use SDP transparently via libsdp to get
improved latency and throughput.

Scott 

 -Original Message-
 From: Sean Hubbell [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 23, 2006 8:56 AM
 To: Scott Weitzenkamp (sweitzen)
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] IPoIB Question
 
 We currently have a non-homogeneous cluster so that seems that would 
 possible explain a few of the differences that I have seen on 
 some of my 
 tests. I will look at netperf.org and see what they have to offer.
 
   On another note, is there plans to have IPoIB support the full 
 throughput that infiniband 4x or 12x has? Specifically, can I keep my 
 legacy apps and just upgrade the network to take advantage of 
 the bandwidth?
 
 Sean
 



Re: [openib-general] IPoIB Question

2006-10-23 Thread Sean Hubbell
Scott,

Thanks for the reply again. The third-party API that we use leverages a
combination of UDP and TCP socket connections for speed. Is there
something for UDP as well?

Sean

Scott Weitzenkamp (sweitzen) wrote:
 If you are using TCP, you can use SDP transparently via libsdp to get
 improved latency and throughput.

 Scott 

   



Re: [openib-general] IPoIB Question

2006-10-23 Thread Michael S. Tsirkin
Quoting r. Sean Hubbell [EMAIL PROTECTED]:
 Subject: Re: IPoIB Question
 
 Scott,
 
 Thanks for the reply again. The third-party API that we use leverages a
 combination of UDP and TCP socket connections for speed. Is there
 something for UDP as well?

iperf supports UDP as well. Again, check out the -P flag.

-- 
MST




Re: [openib-general] IPoIB Question

2006-10-23 Thread Scott Weitzenkamp (sweitzen)
Nothing today in OF to accelerate UDP sockets.

Scott

 Thanks for the reply again. The third-party API that we use leverages a
 combination of UDP and TCP socket connections for speed. Is there
 something for UDP as well?
 
 Sean
 
 Scott Weitzenkamp (sweitzen) wrote:
  If you are using TCP, you can use SDP transparently via 
 libsdp to get
  improved latency and throughput.
 
  Scott 




Re: [openib-general] IPoIB Question

2006-10-23 Thread Sean Hubbell
Thanks, Michael. I looked at iperf and it looks like a very nice tool. I
will be using it when I evaluate and check the performance of my
applications. I am also interested in getting more bandwidth out of my
applications by leveraging a current or planned capability for IPoIB. This
way, I will not have to modify my source code and can simply change the
interfaces that my applications send and receive on. So, I am looking at
libsdp for the TCP functionality and wanted to know whether libsdp
supports UDP as well, or whether there is another library that I can use
to maximize the bandwidth when sending and receiving over InfiniBand.

Sean

Michael S. Tsirkin wrote:
 Quoting r. Sean Hubbell [EMAIL PROTECTED]:
   
 Subject: Re: IPoIB Question

 Scott,

 Thanks for the reply again. The third party api that we use leverages a 
 combination of UDP and TCP socket conntections for speed. Is there 
 something for UCP as well?
 

 iperf supports UDP as well. Again, check out the -P flag.

   





Re: [openib-general] IPoIB Question

2006-10-23 Thread Michael S. Tsirkin
Quoting r. Sean Hubbell [EMAIL PROTECTED]:
 I am looking at libsdp for the TCP functionality and wanted to know if
 libsdp supports UDP as well

AFAIK, SDP can only emulate TCP sockets.

-- 
MST




Re: [openib-general] IPoIB Question

2006-10-23 Thread Michael Krause
At 10:19 AM 10/23/2006, Michael S. Tsirkin wrote:
 Quoting r. Sean Hubbell [EMAIL PROTECTED]:
  I am looking at libsdp for the TCP functionality and wanted to know if
  libsdp supports UDP as well

 AFAIK, SDP can only emulate TCP sockets.

SDP is defined to work with AF_INET applications.  If using a shared
library approach / pre-load, one can transparently enable any AF_INET
application to utilize SDP without a recompile, etc.  The SDP Port Mapper
specification for iWARP / service ID for IB enables the connection
management, or whatever service it is implemented within, to discover the
real target listen port transparently to the application and establish
an SDP session, nominally during connection establishment.  Implementations
may vary in the robustness or policies used to determine what to off-load,
number of off-load sessions, etc.  In other words, a lot of opportunity
and flexibility is provided to use SDP.
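
The pre-load approach described above can be sketched as follows. The sketch assumes the SDP shim is installed as libsdp.so somewhere the dynamic loader can find it, and that the unmodified program is a hypothetical ./legacy_tcp_app. Both names are illustrative, and the helper function is not part of any library.

```python
import os


def sdp_preload_env(base_env=None, library="libsdp.so"):
    """Return a copy of the environment with the SDP shim added to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The shim goes first; anything already preloaded is kept after it.
    env["LD_PRELOAD"] = (library + " " + existing).strip()
    return env


if __name__ == "__main__":
    env = sdp_preload_env()
    print("LD_PRELOAD=" + env["LD_PRELOAD"])
    # The application would then be launched without recompiling it, e.g.:
    #   subprocess.run(["./legacy_tcp_app"], env=env)
```

Only the environment of the launched process changes, which is what lets an existing AF_INET binary be tried over SDP without touching its source.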

Note: WinSocks Direct on Windows provides an equivalent service though uses 
a proprietary protocol.  Vista will have SDP as defined in the specifications.

There are currently no plans to develop an equivalent for datagram 
applications.   Any datagram application (user or kernel) can already 
access the hardware directly and given RDMA is not defined for datagram, it 
was felt such a specification would provide minimal value.

Mike  






Re: [openib-general] IPoIB Question

2006-10-23 Thread Sean Hubbell
Perfect, I'll check with my vendor to see if this is possible. If so, this 
rocks!

Thanks!

Sean




[openib-general] IPoIB odd loopback packet from arp

2006-10-23 Thread Todd Bowman
Using the OFED 1.0 and OFED 1.1 stack I have noticed some rcvswrelay
errors. I have tracked it down to the arp request. I can reproduce
the problem with the following steps:

(I have used both 2.6.14.14 and 2.6.18.1 kernels)

ib109 arp -d ib110
ib109 ping ib110 -c 2

# ib_ipoib module debug
13:15:46 ib109 kernel: ib0: sending packet, length=60 address=f6187200 qpn=0xff
13:15:46 ib109 kernel: ib0: called: id 34, op 0, status: 0
13:15:46 ib109 kernel: ib0: send complete, wrid 34
13:15:46 ib109 kernel: ib0: called: id -2147483623, op 128, status: 0
13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x0369
13:15:46 ib109 kernel: ib0: dropping loopback packet
13:15:46 ib109 kernel: ib0: called: id -2147483622, op 128, status: 0
13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x016d
13:15:46 ib109 kernel: ib0: sending packet, length=88 address=f6e57520 qpn=0x000404
13:15:46 ib109 kernel: ib0: called: id 35, op 0, status: 0
13:15:46 ib109 kernel: ib0: send complete, wrid 35
13:15:46 ib109 kernel: ib0: called: id -2147483621, op 128, status: 0
13:15:46 ib109 kernel: ib0: received 128 bytes, SLID 0x016d
13:15:47 ib109 kernel: ib0: sending packet, length=88 address=f6e57520 qpn=0x000404
13:15:47 ib109 kernel: ib0: called: id 36, op 0, status: 0
13:15:47 ib109 kernel: ib0: send complete, wrid 36
13:15:47 ib109 kernel: ib0: called: id -2147483620, op 128, status: 0
13:15:47 ib109 kernel: ib0: received 128 bytes, SLID 0x016d
13:15:51 ib109 kernel: ib0: called: id -2147483619, op 128, status: 0
13:15:51 ib109 kernel: ib0: received 100 bytes, SLID 0x016d
13:15:51 ib109 kernel: ib0: sending packet, length=60 address=f6e57520 qpn=0x000404
13:15:51 ib109 kernel: ib0: called: id 37, op 0, status: 0
13:15:51 ib109 kernel: ib0: send complete, wrid 37

# tcpdump -i ib0
13:15:46.977578 arp who-has ib110 tell ib109 hardware #32
13:15:46.977682 arp reply ib110 is-at 00:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:11:59 hardware #32
13:15:46.977710 IP ib109 > ib110: icmp 64: echo request seq 0
13:15:46.977790 IP ib110 > ib109: icmp 64: echo reply seq 0
13:15:47.92 IP ib109 > ib110: icmp 64: echo request seq 1
13:15:47.977892 IP ib110 > ib109: icmp 64: echo reply seq 1
13:15:51.977076 arp who-has ib109 tell ib110 hardware #32
13:15:51.977094 arp reply ib109 is-at 00:02:00:14:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:3b:31 hardware #32

# error dump
rcvswrelayerrors:1 MT47396 Infiniscale-III 0x2c9010b022090[1]
  ib109 HCA-1 0x2c9023b30[1]

1) The ping is successful and the arp table is populated, so is this
really a problem or a false positive?
2) The second arp does not generate an error (the error dump reports
all new errors in switches). Why?

Any ideas?

Thanks in advance.

Todd

Re: [openib-general] IPoIB Question

2006-10-23 Thread Parks Fields
At 10:59 AM 10/23/2006, Sean Hubbell wrote:
 Thanks Michael, I looked at iperf and that looks like a very nice tool.


Something else about iperf is that it supports multiple streams,
which may be closer to the way some apps operate.








Re: [openib-general] IPoIB Question

2006-10-23 Thread Greg Lindahl
On Mon, Oct 23, 2006 at 07:53:06AM -0500, Hubbell, Sean C Contractor/Decibel 
wrote:

   I currently have several applications that uses a legacy IPv4 protocol
 and I use IPoIB to utilize my infiniband network which works great. I
 have completed some timing and throughput analysis and noticed that I do
 not get very much more if I use an infiniband network interface than
 using my GigE network interface.

You might want to note that different InfiniBand implementations have
quite different IPoIB performance, especially for UDP.

Another issue is that IPoIB has quite different performance with
different Linux kernels. This is especially evident for TCP, although
you can use SDP to accelerate TCP sockets and avoid this issue.

 My question is, am I using IPoIB correctly or are these the typical
 numbers that everyone is seeing?

It is certainly the case that there are some message patterns and
situations for which InfiniBand is not much of an improvement over
gigE.

-- greg






[openib-general] IPoIB multicast neighbour?!

2006-10-18 Thread Or Gerlitz
While debugging something, I enabled the ipoib debug messages and saw:

ib0: neigh_destructor for ff ff12:601b::::::0002

Do you have any idea what the source of this neighbour is? Why is it created, and
is there a way to eliminate it somehow? (My guess is that removing IPv6 support
from the kernel would do that.)

It's an RH4 U3 system with OFED 1.1 rc7.

More info below, thanks.

Or.

# ip a s ib0
9: ib0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 128
link/[32] 00:02:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:97:08:c5
  brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 192.169.3.235/24 brd 192.169.3.255 scope global ib0
inet6 fe80::208:f104:397:8c5/64 scope link
   valid_lft forever preferred_lft forever

# ip m s ib0
9:  ib0
link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:97:08:c5
link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
inet  224.0.0.1
inet6 ff02::1:ff97:8c5
inet6 ff02::1





Re: [openib-general] IPOIB NAPI

2006-10-16 Thread Eli Cohen
On Sun, 2006-10-15 at 09:39 -0700, Roland Dreier wrote:
 I've been meaning to mention this... I have a preliminary version in
 
 git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
 ipoib-napi
 
 There are further changes I would like to add on top of that, but
 comments on the two patches there would be appreciated.  And also
 benchmarks would be good.

Please diff to see my comments. Generally, it looks like the condition on
netif_rx_reschedule() should be inverted. Also, you need to set max to
some large value, since you don't know how many completions you missed
and you want to make sure you get all the ones that sneaked in between the
last poll and the request notify.

int ipoib_poll(struct net_device *dev, int *budget)
{
	struct ipoib_dev_priv *priv = netdev_priv(dev);
	int max = min(*budget, dev->quota);
	int done;
	int t;
	int empty;
	int missed_event;
	int n, i;

repoll:
	done  = 0;
	empty = 0;

	while (max) {
		t = min(IPOIB_NUM_WC, max);
		n = ib_poll_cq(priv->cq, t, priv->ibwc);

		for (i = 0; i < n; ++i) {
			if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
				++done;
				--max;
				ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
			} else
				ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
		}

		if (n != t) {
			empty = 1;
			break;
		}
	}

	dev->quota -= done;
	*budget    -= done;

	if (empty) {
		netif_rx_complete(dev);
		ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, &missed_event);
		if (missed_event && !netif_rx_reschedule(dev, 0)) {
			max = 1000;
			goto repoll;
		}

		return 0;
	}

	return 1;
}






Re: [openib-general] IPOIB NAPI

2006-10-16 Thread Roland Dreier
    Eli> Please diff to see my comments. Generaly it looks like the
    Eli> condition on netif_rx_reschedule() should be inverted.

Why?  A return value of 0 means that the reschedule failed (probably
because the poll routine is already running somewhere else) and the
poll routine should just return.  I think the code is correct as it stands.

    Eli> Also ou need to set max to some large value since you don't
    Eli> know if how many completions you missed and you want to make
    Eli> sure you get all the ones the sneaked from the last poll to
    Eli> the request notify.

Why?  max is there to limit us from doing more work than the quota
passed in from the networking stack.  If we fail to drain the CQ
because we exhaust max, then the poll routine will return 1 and will
remain scheduled, so the networking stack will call the poll routine
again to continue grabbing completions.
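
The contract described here can be modelled in plain user-space C. This is only a sketch under stated assumptions: `fake_cq`, `fake_poll` and `run_softirq` are hypothetical names, the "completion queue" is just a counter, and the return convention (1 = stay scheduled, 0 = done) mimics the `poll(dev, budget)` interface discussed in this thread rather than reproducing any kernel code.

```c
#include <assert.h>

/* Hypothetical stand-in for a completion queue: a count of pending entries. */
struct fake_cq {
	int pending;
};

/*
 * Budgeted poll: handles at most *budget entries and charges them
 * against the budget. Returns 1 if the budget ran out (the queue may
 * still hold work, so the caller must poll again) and 0 if the queue
 * ran dry with budget to spare.
 */
static int fake_poll(struct fake_cq *cq, int *budget)
{
	int max = *budget;
	int done = 0;

	while (max > 0 && cq->pending > 0) {
		--cq->pending;
		++done;
		--max;
	}
	*budget -= done;

	return max > 0 ? 0 : 1;
}

/*
 * Model of the networking stack: keep calling the poll routine with a
 * fresh quota for as long as it reports 1. Returns the number of calls.
 */
static int run_softirq(struct fake_cq *cq, int quota)
{
	int calls = 0;
	int more;

	assert(quota > 0);
	do {
		int budget = quota;

		more = fake_poll(cq, &budget);
		++calls;
	} while (more);

	return calls;
}
```

With 250 pending entries and a quota of 100, the model needs three calls: two that exhaust the budget and one that drains the rest. Nothing is lost by capping `max`; the remaining work is simply picked up on the next call.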

 - R.




Re: [openib-general] IPOIB NAPI

2006-10-16 Thread Eli Cohen
On Mon, 2006-10-16 at 09:48 -0700, Roland Dreier wrote:
 Eli> Please diff to see my comments. Generaly it looks like the
 Eli> condition on netif_rx_reschedule() should be inverted.
 
 Why?  A return value of 0 means that the reschedule failed (probably
 because the poll routine is already running somewhere else) and the
 poll routine should just return.  I think the code is correct as it stands.
 
 Eli> Also ou need to set max to some large value since you don't
 Eli> know if how many completions you missed and you want to make
 Eli> sure you get all the ones the sneaked from the last poll to
 Eli> the request notify.
 
 Why?  max is there to limit us from doing more work than the quota
 passed in from the networking stack.  If we fail to drain the CQ
 because we exhaust max, then the poll routine will return 1 and will
 remain scheduled, so the networking stack will call the poll routine
 again to continue grabbing completions.
 
  - R.

OK I see what you mean. So I guess it's OK then.





Re: [openib-general] IPOIB NAPI

2006-10-16 Thread Shirley Ma

Roland,

I don't know why, but I am having trouble getting this patch from your git tree. Would you mind posting it here so I can test the performance over ehca?

Thanks
Shirley Ma

Re: [openib-general] IPOIB NAPI

2006-10-16 Thread Michael S. Tsirkin
Quoting r. Roland Dreier [EMAIL PROTECTED]:
 There are further changes I would like to add on top of that, but
 comments on the two patches there would be appreciated.

A small optimization:

if (missed_event && netif_rx_reschedule(dev, 0))

should be, I think

if (unlikely(missed_event) && netif_rx_reschedule(dev, 0))

since we are talking about an unlikely race where CQ became non-empty
just as we were calling req_notify_cq.
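
As a small illustration of what the annotation does and does not change, here is a user-space sketch. The `unlikely()` definition matches the one in the kernel's compiler headers for GCC-compatible compilers; `fake_reschedule` and `should_repoll` are hypothetical stand-ins, not IPoIB functions.

```c
#include <assert.h>

/* Branch hint as defined for GCC-compatible compilers. */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int reschedule_calls;

/* Hypothetical stand-in for netif_rx_reschedule(): counts calls, always succeeds. */
static int fake_reschedule(void)
{
	++reschedule_calls;
	return 1;
}

/* Returns 1 if the caller should poll again. */
static int should_repoll(int missed_event)
{
	/*
	 * The hint only affects how the compiler lays out the branch;
	 * && still short-circuits, so fake_reschedule() is not called
	 * when missed_event is 0.
	 */
	return unlikely(missed_event) && fake_reschedule();
}
```

The result is the same with or without the hint; only the generated code layout for the common (no missed event) path differs.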

An API idea:
how about, instead of testing missed_events, we add a flag

IB_CQ_TEST (or a longer name, IB_CQ_REPORT_MISSED_EVENTS?)

and change ib_req_notify_cq to return an int that carries
the missed_events value, but only if this flag is set?

This has a few advantages:
- Less churn updating all users to the new API: they just ignore the return
  value, and there is still almost no overhead for them as they don't set
  IB_CQ_TEST.
- For all users we have to push fewer values on the stack. Note that the
  compiler can't get rid of them, as we are calling the function through a
  pointer.
- For users that do
  missed_events = ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP | IB_CQ_TEST)
  we get the result in a register.
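
A rough user-space sketch of the proposed shape, purely for illustration: the flag values, `struct fake_cq` and `fake_req_notify_cq` are all hypothetical, and the thread does not say how a real provider would detect missed events. The point is only that callers who do not set the flag can ignore the return value.

```c
#include <assert.h>

/* Hypothetical flag values; the thread only proposes the names. */
enum fake_cq_notify {
	FAKE_CQ_NEXT_COMP            = 1 << 0,
	FAKE_CQ_REPORT_MISSED_EVENTS = 1 << 1,
};

struct fake_cq {
	int pending;	/* completions already sitting in the queue */
	int armed;	/* a notification has been requested */
};

/*
 * Arms the CQ. Returns 1 only if the caller asked for the missed-events
 * report and completions were already pending; returns 0 otherwise.
 */
static int fake_req_notify_cq(struct fake_cq *cq, int flags)
{
	cq->armed = 1;

	if (!(flags & FAKE_CQ_REPORT_MISSED_EVENTS))
		return 0;

	return cq->pending > 0;
}
```

An existing caller would keep writing `fake_req_notify_cq(&cq, FAKE_CQ_NEXT_COMP);` unchanged, while a NAPI-style caller would OR in the reporting flag and test the result.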

I agree it's a minor optimization, but I think quite a similar change went
into the Linux irq code. Waste not, want not.

Want to see how a patch like this would look?

-- 
MST




[openib-general] IPOIB NAPI

2006-10-15 Thread Eli Cohen
Hi Roland,

Can you tell me when you are going to push your NAPI patch to ipoib? Is
there anything I can do to help make this happen?





Re: [openib-general] IPOIB NAPI

2006-10-15 Thread Roland Dreier
    Eli> Hi Roland, can you tell when you are going to push your NAPI
    Eli> patch to ipoib? Is there anything I can do to help making
    Eli> this happen?

I've been meaning to mention this... I have a preliminary version in

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git 
ipoib-napi

There are further changes I would like to add on top of that, but
comments on the two patches there would be appreciated.  And also
benchmarks would be good.

 - R.




Re: [openib-general] ipoib: ignores dma mapping errors on TX?

2006-10-10 Thread Michael S. Tsirkin
Quoting r. Roland Dreier [EMAIL PROTECTED]:
 + if (unlikely(dma_mapping_error(addr))) {
 + ++priv->stats.tx_errors;
 + dev_kfree_skb_any(skb);
 + return;
 + }

Do we want a warning there?

-- 
MST




Re: [openib-general] ipoib: ignores dma mapping errors on TX?

2006-10-10 Thread Tom Tucker

Does anyone know what might happen if a device tries to bus master
bad_dma_address? Does it get a pci-abort, an NMI, a bus err interrupt, or all
of the above?


On 10/9/06 1:01 PM, Roland Dreier [EMAIL PROTECTED] wrote:

 Michael> It seems that IPoIB ignores the possibility that
 Michael> dma_map_single with DMA_TO_DEVICE direction might return
 Michael> dma_mapping_error.
 
 Michael> Is there some reason that such mappings can't fail?
 
 No, it's just an oversight.  Most network device drivers don't check
 for DMA mapping errors but it's probably better to do so anyway.  I
 added this to my queue:
 
 commit 8edaf479946022d67350d6c344952fb65064e51b
 Author: Roland Dreier [EMAIL PROTECTED]
 Date:   Mon Oct 9 10:54:20 2006 -0700
 
 IPoIB: Check for DMA mapping error for TX packets
 
 Signed-off-by: Roland Dreier [EMAIL PROTECTED]
 
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 index f426a69..8bf5e9e 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 @@ -355,6 +355,11 @@ void ipoib_send(struct net_device *dev,
 tx_req->skb = skb;
 addr = dma_map_single(priv->ca->dma_device, skb->data, skb->len,
  DMA_TO_DEVICE);
 + if (unlikely(dma_mapping_error(addr))) {
 +  ++priv->stats.tx_errors;
 +  dev_kfree_skb_any(skb);
 +  return;
 + }
 pci_unmap_addr_set(tx_req, mapping, addr);
  
 if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1),
 



Re: [openib-general] ipoib: ignores dma mapping errors on TX?

2006-10-10 Thread Michael Krause
At 10:24 AM 10/10/2006, Tom Tucker wrote:

Does anyone know what might happen if a device tries to bus master
bad_dma_address. Does it get a pci-abort, an NMI, a bus err interrupt, all
of the above?

It depends upon the platform. Some will enter a containment mode and, for
example, shut down the PCI bus or the PCIe Root Port. Others may trigger a
system error and shut down the system. These responses are, in part, a
policy of the implementation and of how the system is implemented. In future
chipsets that contain an IOMMU / Address Translation Protection Table (ATPT)
/ pick your favorite name, the error can be contained to a single device
and the appropriate error recovery triggered without requiring the system
to go down. Again, it is all policy at the end of the day as to what action is
triggered. For most, the potential for silent data corruption is too high
to risk letting that bus or Root Port continue to operate without a reset /
flush, so containment is used at a minimum.

Mike



On 10/9/06 1:01 PM, Roland Dreier [EMAIL PROTECTED] wrote:

  Michael> It seems that IPoIB ignores the possibility that
  Michael> dma_map_single with DMA_TO_DEVICE direction might return
  Michael> dma_mapping_error.
 
  Michael> Is there some reason that such mappings can't fail?
 
  No, it's just an oversight.  Most network device drivers don't check
  for DMA mapping errors but it's probably better to do so anyway.  I
  added this to my queue:
 
  commit 8edaf479946022d67350d6c344952fb65064e51b
  Author: Roland Dreier [EMAIL PROTECTED]
  Date:   Mon Oct 9 10:54:20 2006 -0700
 
  IPoIB: Check for DMA mapping error for TX packets
 
  Signed-off-by: Roland Dreier [EMAIL PROTECTED]
 
  diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
  b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
  index f426a69..8bf5e9e 100644
  --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
  +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
  @@ -355,6 +355,11 @@ void ipoib_send(struct net_device *dev,
  tx_req->skb = skb;
  addr = dma_map_single(priv->ca->dma_device, skb->data, skb->len,
   DMA_TO_DEVICE);
  + if (unlikely(dma_mapping_error(addr))) {
  +  ++priv->stats.tx_errors;
  +  dev_kfree_skb_any(skb);
  +  return;
  + }
  pci_unmap_addr_set(tx_req, mapping, addr);
 
  if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1),
 



[openib-general] ipoib: ignores dma mapping errors on TX?

2006-10-09 Thread Michael S. Tsirkin
It seems that IPoIB ignores the possibility that
dma_map_single with DMA_TO_DEVICE direction might return
dma_mapping_error.

Is there some reason that such mappings can't fail?

-- 
MST




Re: [openib-general] ipoib: ignores dma mapping errors on TX?

2006-10-09 Thread Roland Dreier
    Michael> It seems that IPoIB ignores the possibility that
    Michael> dma_map_single with DMA_TO_DEVICE direction might return
    Michael> dma_mapping_error.

    Michael> Is there some reason that such mappings can't fail?

No, it's just an oversight.  Most network device drivers don't check
for DMA mapping errors but it's probably better to do so anyway.  I
added this to my queue:

commit 8edaf479946022d67350d6c344952fb65064e51b
Author: Roland Dreier [EMAIL PROTECTED]
Date:   Mon Oct 9 10:54:20 2006 -0700

IPoIB: Check for DMA mapping error for TX packets

Signed-off-by: Roland Dreier [EMAIL PROTECTED]

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f426a69..8bf5e9e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -355,6 +355,11 @@ void ipoib_send(struct net_device *dev,
 	tx_req->skb = skb;
 	addr = dma_map_single(priv->ca->dma_device, skb->data, skb->len,
 			      DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(addr))) {
+		++priv->stats.tx_errors;
+		dev_kfree_skb_any(skb);
+		return;
+	}
 	pci_unmap_addr_set(tx_req, mapping, addr);
 
 	if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1),
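
The check-count-drop pattern in the patch can be tried out in user space. The sketch below is an assumption-laden model, not driver code: `fake_map_single`, `fake_send`, `FAKE_DMA_ERROR` and the `fail_map` switch are hypothetical, and a heap buffer stands in for the skb.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel meaning "mapping failed". */
#define FAKE_DMA_ERROR UINT64_MAX

struct fake_stats {
	unsigned long tx_errors;
	unsigned long tx_posted;
};

/* Test switch: when non-zero, every mapping attempt fails. */
static int fail_map;

/* Hypothetical mapper: returns the sentinel on failure, else the buffer address. */
static uint64_t fake_map_single(const void *buf)
{
	return fail_map ? FAKE_DMA_ERROR : (uint64_t)(uintptr_t)buf;
}

/*
 * Takes ownership of buf. On a mapping error the packet is counted as a
 * TX error and freed without being posted. Returns 0 if posted, -1 if dropped.
 */
static int fake_send(struct fake_stats *stats, void *buf)
{
	uint64_t addr = fake_map_single(buf);

	if (addr == FAKE_DMA_ERROR) {
		++stats->tx_errors;
		free(buf);
		return -1;
	}

	++stats->tx_posted;
	free(buf);	/* a real driver would free on send completion instead */
	return 0;
}
```

The important property is that the error path releases the buffer and bumps a counter before returning, so a failed mapping neither leaks memory nor reaches the hardware.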




Re: [openib-general] ipoib mcast questions...

2006-10-06 Thread Arthur Jones
hi roland, ...

On Thu, Oct 05, 2006 at 09:18:36PM -0700, Roland Dreier wrote:
   1) the set_multicast_list net device callback
   seems to just kick off another thread to do
   the work of registering the multicast group.
   the mc_list net_device field is only valid
   under the netif_tx_lock, but this lock is not
   grabbed by the restart_task.  what happens
   if the mc_list is modified while in the
   restart_task?
 
 Just looking quickly, I see that ipoib_mcast_restart_task() does
 netif_tx_lock() (right near the top).  Isn't this sufficient?

doh!  i just missed it -- i predicted it would
be missing, so i made it missing...

   2) there seem to be 2 threads, the restart_task
   which creates queries and the join_task which sends
   off the mad requests.  why?  is there some performance
   advantage?  it would seem easier to do the registrations
   serially in the restart task...
 
 I guess it's really that way mainly for historical reasons.  I'd be
 glad to see patches that simplify things (of course making sure that
 everything still works ;)

i'm imagining that all the proprietary eth
interfaces + ipoib need to do about the same
thing when it comes to registering with mcast
groups.  would you (all) be averse to pulling some
of the mcast group registration code out into
the core ib driver for all to use?

arthur




Re: [openib-general] ipoib mcast questions...

2006-10-06 Thread Hal Rosenstock
On Fri, 2006-10-06 at 11:17, Arthur Jones wrote:
 hi roland, ...
 
 On Thu, Oct 05, 2006 at 09:18:36PM -0700, Roland Dreier wrote:
1) the set_multicast_list net device callback
seems to just kick off another thread to do
the work of registering the multicast group.
the mc_list net_device field is only valid
under the netif_tx_lock, but this lock is not
grabbed by the restart_task.  what happens
if the mc_list is modified while in the
restart_task?
  
  Just looking quickly, I see that ipoib_mcast_restart_task() does
  netif_tx_lock() (right near the top).  Isn't this sufficient?
 
 doh!  i just missed it -- i predicted it would
 be missing, so i made it missing...
 
2) there seem to be 2 threads, the restart_task
which creates queries and the join_task which sends
off the mad requests.  why?  is there some performance
advantage?  it would seem easier to do the registrations
serially in the restart task...
  
  I guess it's really that way mainly for historical reasons.  I'd be
  glad to see patches that simplify things (of course making sure that
  everything still works ;)
 
 i'm imagining that all the proprietary eth
 interfaces + ipoib need to do about the same
 thing when it comes to registering with mcast
 groups.  would you (all) be averse to pulling some
 of the mcast group registration code out into
 the core ib driver for all to use?

Isn't this already done with Sean's multicast work ?

-- Hal

 
 arthur
 



Re: [openib-general] ipoib mcast questions...

2006-10-06 Thread Arthur Jones
hi hal, ...

On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote:
  [...]
  i'm imagining that all the proprietary eth
  interfaces + ipoib need to do about the same
  thing when it comes to registering with mcast
  groups.  would you (all) be averse to pulling some
  of the mcast group registration code out into
  the core ib driver for all to use?
 
 Isn't this already done with Sean's multicast work ?

i didn't know about this work.  do you know where i can
find it?

arthur




Re: [openib-general] ipoib mcast questions...

2006-10-06 Thread Hal Rosenstock
On Fri, 2006-10-06 at 11:44, Arthur Jones wrote:
 hi hal, ...
 
 On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote:
   [...]
   i'm imagining that all the proprietary eth
   interfaces + ipoib need to do about the same
   thing when it comes to registering with mcast
   groups.  would you (all) be averse to pulling some
   of the mcast group registration code out into
   the core ib driver for all to use?
  
  Isn't this already done with Sean's multicast work ?
 
 i didn't know about this work.  do you know where i can
 find it?

I think it is in svn trunk.

-- Hal

 
 arthur





Re: [openib-general] ipoib mcast questions...

2006-10-06 Thread Sean Hefty
Hal Rosenstock wrote:
i didn't know about this work.  do you know where i can
find it?
 
 
 I think it is in svn trunk.

It's in svn.  I've created patches against for-2.6.19, and will post them as
part of a request to merge some of the features upstream.

- Sean




Re: [openib-general] ipoib mcast questions...

2006-10-06 Thread Arthur Jones
thanks all!  i'll have a look...

arthur

On Fri, Oct 06, 2006 at 09:37:39AM -0700, Sean Hefty wrote:
 Hal Rosenstock wrote:
 i didn't know about this work.  do you know where i can
 find it?
 
 
 I think it is in svn trunk.
 
 It's in svn.  I've create patches against for-2.6.19, and will post that as 
 part of a request to merge some on the features upstream.
 
 - Sean




Re: [openib-general] ipoib mcast questions...

2006-10-06 Thread Arthur Jones
hi hal, ...

On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote:
  [...]
  i'm imagining that all the proprietary eth
  interfaces + ipoib need to do about the same
  thing when it comes to registering with mcast
  groups.  would you (all) be averse to pulling some
  of the mcast group registration code out into
  the core ib driver for all to use?
 
 Isn't this already done with Sean's multicast work ?

after reading the code, iiuc, sean's work provides
nice infrastructure for ib_multicast group join/leave.
i was thinking about one more level up, i.e. generic
_net_ multicast join/leave infrastructure.  i'm not
sure exactly how it would go -- but i think all the ib
net_devices are going to need a way to associate a
multicast hw addr w/ a live mgid.  if that could be
broken out, we could all share it...

arthur




Re: [openib-general] ipoib mcast questions...

2006-10-06 Thread Hal Rosenstock
On Fri, 2006-10-06 at 15:47, Arthur Jones wrote:
 hi hal, ...
 
 On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote:
   [...]
   i'm imagining that all the proprietary eth
   interfaces + ipoib need to do about the same
   thing when it comes to registering with mcast
   groups.  would you (all) be averse to pulling some
   of the mcast group registration code out into
   the core ib driver for all to use?
  
  Isn't this already done with Sean's multicast work ?
 
 after reading the code, iiuc, sean's work provides
 nice infrastructure for ib_multicast group join/leave.
 i was thinking about one more level up, i.e. generic
 _net_ multicast join/leave infrastructure.  i'm not
 sure exactly how it would go -- but i think all the ib
 net_devices are going to need a way to associate a
 multicast hw addr w/ a live mgid.

Don't IPmc addresses translate to MGIDs per the RFC ?
MGIDs are not hardware addresses (MLIDs are).

-- Hal

   if that could be broken out, we could all share it...

 arthur





Re: [openib-general] ipoib mcast questions...

2006-10-06 Thread Arthur Jones
hi hal, ...

On Fri, Oct 06, 2006 at 04:09:05PM -0400, Hal Rosenstock wrote:
 On Fri, 2006-10-06 at 15:47, Arthur Jones wrote:
  hi hal, ...
  
  On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote:
[...]
i'm imagining that all the proprietary eth
interfaces + ipoib need to do about the same
thing when it comes to registering with mcast
groups.  would you (all) be averse to pulling some
of the mcast group registration code out into
the core ib driver for all to use?
   
   Isn't this already done with Sean's multicast work ?
  
  after reading the code, iiuc, sean's work provides
  nice infrastructure for ib_multicast group join/leave.
  i was thinking about one more level up, i.e. generic
  _net_ multicast join/leave infrastructure.  i'm not
  sure exactly how it would go -- but i think all the ib
  net_devices are going to need a way to associate a
  multicast hw addr w/ a live mgid.
 
 Don't IPmc addresses translate to MGIDs per the RFC ?

that's a different problem than the one i'm
trying to address.  i think you're talking
about mapping ip mcast addresses to hardware
addresses.  rfc4391 tells ipoib how to do that,
for the virtual ethernet devices, we'll need
to come up w/ something different...
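
For reference, the IPv4 case of the RFC 4391 mapping can be sketched as follows. The function name `ipoib_ipv4_mgid` and its parameters are made up for this example; the byte layout (0xff prefix, transient flag plus scope, the 0x401b IPv4 signature, the P_Key, and the low 28 bits of the group address) follows my reading of RFC 4391, section 4, and should be checked against the RFC before being relied on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Builds the 16-byte MGID for an IPv4 multicast group.
 * ipv4 is the group address in host byte order, pkey the partition key,
 * scope the 4-bit multicast scope (2 = link-local).
 */
static void ipoib_ipv4_mgid(uint32_t ipv4, uint16_t pkey, uint8_t scope,
			    uint8_t mgid[16])
{
	assert((ipv4 >> 28) == 0xe);	/* must lie in 224.0.0.0/4 */

	memset(mgid, 0, 16);
	mgid[0] = 0xff;				/* multicast GID prefix */
	mgid[1] = 0x10 | (scope & 0x0f);	/* flags = 1 (transient), then scope */
	mgid[2] = 0x40;				/* IPv4 signature 0x401b */
	mgid[3] = 0x1b;
	mgid[4] = pkey >> 8;
	mgid[5] = pkey & 0xff;

	/* The low 28 bits of the group address fill the tail of the MGID. */
	mgid[12] = (ipv4 >> 24) & 0x0f;
	mgid[13] = (ipv4 >> 16) & 0xff;
	mgid[14] = (ipv4 >> 8) & 0xff;
	mgid[15] = ipv4 & 0xff;
}
```

For 224.0.0.1 with P_Key 0xffff and link-local scope this yields ff12:401b:ffff:0000:0000:0000:0000:0001. The 20-byte IPoIB link-layer address then prepends four bytes (a reserved byte and the 24-bit multicast QPN, 0xffffff) to this MGID.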

 MGIDs are not hardware addresses (MLIDs are).

mgids are generated from the mc_list->dmi_addr.
this is a hardware address to the linux net
code.  i'm looking for commonality to reduce
duplicated code.  we all (ipoib + virtual eth)
need to associate mgids, however we got them,
with the mlids (i think).  i'm guessing we'll
do it in a very similar way...

arthur




[openib-general] ipoib mcast questions...

2006-10-05 Thread Arthur Jones
hi all, i'm looking over the ipoib multicast
code, and i have a couple questions:

1) the set_multicast_list net device callback
seems to just kick off another thread to do
the work of registering the multicast group.
the mc_list net_device field is only valid
under the netif_tx_lock, but this lock is not
grabbed by the restart_task.  what happens
if the mc_list is modified while in the
restart_task?

2) there seem to be 2 threads, the restart_task
which creates queries and the join_task which sends
off the mad requests.  why?  is there some performance
advantage?  it would seem easier to do the registrations
serially in the restart task...

arthur




Re: [openib-general] ipoib mcast questions...

2006-10-05 Thread Roland Dreier
  1) the set_multicast_list net device callback
  seems to just kick off another thread to do
  the work of registering the multicast group.
  the mc_list net_device field is only valid
  under the netif_tx_lock, but this lock is not
  grabbed by the restart_task.  what happens
  if the mc_list is modified while in the
  restart_task?

Just looking quickly, I see that ipoib_mcast_restart_task() does
netif_tx_lock() (right near the top).  Isn't this sufficient?

  2) there seem to be 2 threads, the restart_task
  which creates queries and the join_task which sends
  off the mad requests.  why?  is there some performance
  advantage?  it would seem easier to do the registrations
  serially in the restart task...

I guess it's really that way mainly for historical reasons.  I'd be
glad to see patches that simplify things (of course making sure that
everything still works ;)

 - R.




[openib-general] ipoib question when running on the same node as opensm

2006-10-04 Thread Ira Weiny
We just brought another cluster up and had an issue with our management node
(node running opensm) not coming up on ipoib.  Here is what happened and how I
got it working and I had some questions.

1) We had both opensm running and a switch based Voltaire SM running.  This
   caused problems.

2) We stopped the Voltaire SM and restarted all the nodes.  This got all of the
   nodes except the one with opensm running to work.

3) I had to unload all the modules, load only those needed by opensm, start
   opensm, and then bring up the ipoib interface.  At this point the node
   seemed to be in the multicast group and ipoib worked fine.

Does this seem like proper behavior?  I would think that on boot if ipoib does
not find a SM running it will delay setting up a connection until the SM comes
on-line?  (ie when the opensm init script gets run.)

It seems like the card saves some information (from the Voltaire SM) across a
soft reboot?  I know that it was not coming up in the multicast group with the
opensm.  Is this by design?

At this point ipoib seems to work fine after a reboot even though the interface
is brought up before opensm.  Do I need to ensure that opensm is up before all
ipoib requests in the future?

Thanks,
Ira Weiny
[EMAIL PROTECTED]





Re: [openib-general] ipoib question when running on the same node as opensm

2006-10-04 Thread Michael S. Tsirkin
Quoting r. Ira Weiny [EMAIL PROTECTED]:
> Do I need to ensure that opensm is up before all
> ipoib requests in the future?

Shouldn't be required, things work well for me, anyway.

-- 
MST




[openib-general] [ipoib] [PATCH] - Removed unused include of vmalloc.h

2006-09-21 Thread Dotan Barak
IPoIB: Removed unused include of vmalloc.h.

Signed-off-by: Dotan Barak [EMAIL PROTECTED]
---
Index: last_stable/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- last_stable.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c  2006-08-07 17:45:02.0 +0300
+++ last_stable/drivers/infiniband/ulp/ipoib/ipoib_main.c   2006-08-08 09:36:45.0 +0300
@@ -40,7 +40,6 @@

 #include <linux/init.h>
 #include <linux/slab.h>
-#include <linux/vmalloc.h>
 #include <linux/kernel.h>

 #include <linux/if_arp.h>  /* For ARPHRD_xxx */





[openib-general] ipoib multicast problem

2006-09-19 Thread Eli cohen
Hi,
I have seen the following problem with ipoib:

1. An application registers to a multicast group as a full member. As a
result all the groups are listed in dev->mclist.
2. The InfiniBand link goes down momentarily, opensm is restarted, etc.
3. All multicast memberships are flushed.
4. The net device will not rejoin until something later
causes ipoib_set_mcast_list() to be called.
 






[openib-general] ipoib multicast problems on RHEL4.0 u4

2006-09-19 Thread Eli cohen
Hi,

while testing ipoib multicast on RHEL4.0 u4, I noticed that setsockopt()
succeeds in adding a multicast group to an interface, but the
multicast group is not actually added to the net_device. This means that an
application cannot join a multicast group as a full member. When I
examined the differences between the kernel sources for u3 and u4 I
noticed that essential code was removed:

diff -ru net/ipv4/arp.c ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c
--- net/ipv4/arp.c  2006-09-18 15:35:03.0 +0300
+++ ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c  2006-09-19 10:08:06.0 +0300
@@ -213,9 +213,6 @@
         case ARPHRD_IEEE802_TR:
                 ip_tr_mc_map(addr, haddr);
                 return 0;
-        case ARPHRD_INFINIBAND:
-                ip_ib_mc_map(addr, haddr);
-                return 0;
         default:
                 if (dir) {
                         memcpy(haddr, dev->broadcast, dev->addr_len);


Can anyone suggest a workaround to this issue?

Thanks
Eli
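
For readers unfamiliar with what the removed ARPHRD_INFINIBAND case does: it maps an IPv4 multicast group to a 20-byte IPoIB hardware address. The following is only a user-space sketch of that kind of mapping, modeled on the MGID layout commonly used for IPoIB (for example ff12:401b:ffff::1 for 224.0.0.1). The function name, the parameters, and the choice to pass scope and P_Key explicitly are assumptions made for illustration; this is not the kernel's ip_ib_mc_map().

```c
#include <stdint.h>
#include <string.h>

#define IPOIB_HWADDR_LEN 20  /* 4 bytes reserved/QPN + 16 bytes MGID */

/*
 * Hypothetical helper (not kernel code).
 * group: IPv4 multicast address in host byte order
 * scope: 4-bit multicast scope (0x2 = link local)
 * pkey:  partition key of the IPoIB interface
 */
void ipv4_mcast_to_ipoib_hwaddr(uint32_t group, uint8_t scope,
                                uint16_t pkey,
                                uint8_t hw[IPOIB_HWADDR_LEN])
{
    memset(hw, 0, IPOIB_HWADDR_LEN);

    hw[0] = 0x00;                    /* reserved */
    hw[1] = 0xff;                    /* multicast QPN 0xffffff */
    hw[2] = 0xff;
    hw[3] = 0xff;

    hw[4] = 0xff;                    /* MGID starts with the multicast prefix */
    hw[5] = 0x10 | (scope & 0x0f);   /* flags nibble 0x1, then scope */
    hw[6] = 0x40;                    /* IPv4 signature 0x401b */
    hw[7] = 0x1b;
    hw[8] = (pkey >> 8) & 0xff;      /* P_Key, high byte first */
    hw[9] = pkey & 0xff;
    /* hw[10..15] remain zero */

    /* Only the low 28 bits of the IPv4 group address are carried. */
    hw[16] = (group >> 24) & 0x0f;
    hw[17] = (group >> 16) & 0xff;
    hw[18] = (group >> 8) & 0xff;
    hw[19] = group & 0xff;
}
```

Without such a mapping the stack falls through to the default case in the diff above, so the device is never handed a usable multicast hardware address for the group.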





Re: [openib-general] ipoib multicast problem

2006-09-19 Thread Roland Dreier
    Eli> 1. An application registers to a multicast group as a full
    Eli> member. As a result all the groups are listed in dev->mclist.
    Eli> 2. The InfiniBand link goes down momentarily, opensm is
    Eli> restarted, etc.  3. All multicast memberships are flushed.
    Eli> 4. The net device will not rejoin until something later
    Eli> causes ipoib_set_mcast_list() to be called.
 
I don't understand.  How could ipoib rejoin the broadcast group and
then not rejoin the rest of the full member groups it has?

 - R.




Re: [openib-general] ipoib multicast problems on RHEL4.0 u4

2006-09-19 Thread Doug Ledford
On Tue, 2006-09-19 at 14:44 +0300, Eli cohen wrote:
> Hi,
>
> while testing ipoib multicast on RHEL4.0 u4, I noticed that setsockopt()
> succeeds in adding a multicast group to an interface, but the
> multicast group is not actually added to the net_device. This means that an
> application cannot join a multicast group as a full member. When I
> examined the differences between the kernel sources for u3 and u4 I
> noticed that essential code was removed:
>
> diff -ru net/ipv4/arp.c ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c
> --- net/ipv4/arp.c  2006-09-18 15:35:03.0 +0300
> +++ ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c  2006-09-19 10:08:06.0 +0300
> @@ -213,9 +213,6 @@
>          case ARPHRD_IEEE802_TR:
>                  ip_tr_mc_map(addr, haddr);
>                  return 0;
> -        case ARPHRD_INFINIBAND:
> -                ip_ib_mc_map(addr, haddr);
> -                return 0;
>          default:
>                  if (dir) {
>                          memcpy(haddr, dev->broadcast, dev->addr_len);
>
>
> Can anyone suggest a workaround to this issue?

Short of spinning a kernel, it's going to be hard to work around.
Thanks for finding this, I'll track down how this got left out of the U4
kernel when it was in the U3 kernel :-/

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband



Re: [openib-general] ipoib multicast problem

2006-09-19 Thread eli
 
> I don't understand.  How could ipoib rejoin the broadcast group and
> then not rejoin the rest of the full member groups it has?


That is because the broadcast group is not part of the multicast groups
maintained by the kernel but rather is part of ipoib and is joined from a
different function. The other full-member groups are maintained by the kernel
for the net device and come from dev->mclist.





Re: [openib-general] ipoib multicast problem

2006-09-19 Thread Roland Dreier
    eli> That is because the broadcast group is not part of the
    eli> multicast groups maintained by the kernel but rather is part
    eli> of ipoib and is joined from a different function. The other
    eli> full members are maintained by the kernel for the net device
    eli> and come from dev->mclist.

Oh I see, when we flush the multicast groups we actually delete all of
them instead of just removing the attached flag.  OK I guess your fix
makes sense then.

 - R.
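
The difference between deleting the groups on flush and only clearing the attached flag can be illustrated with a small stand-alone model. This is a sketch under assumptions: the struct and function names below are hypothetical and are not taken from the ipoib driver. It only shows why a flush that forgets the entries leaves nothing to rejoin once the link is back.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_GROUPS 8

/* Hypothetical model of one multicast group entry (not the driver's struct). */
struct mcast_group {
    unsigned int id;
    bool in_use;     /* entry is remembered by the device */
    bool attached;   /* device is currently joined to the group */
};

struct mcast_table {
    struct mcast_group groups[MAX_GROUPS];
};

/* Remember a new group and mark it attached; false if the table is full. */
bool add_group(struct mcast_table *t, unsigned int id)
{
    for (size_t i = 0; i < MAX_GROUPS; i++) {
        if (!t->groups[i].in_use) {
            t->groups[i].id = id;
            t->groups[i].in_use = true;
            t->groups[i].attached = true;
            return true;
        }
    }
    return false;
}

/* Flush variant A: forget every group entirely. */
void flush_delete_all(struct mcast_table *t)
{
    for (size_t i = 0; i < MAX_GROUPS; i++) {
        t->groups[i].in_use = false;
        t->groups[i].attached = false;
    }
}

/* Flush variant B: keep the entries and only clear the attached flag. */
void flush_detach_only(struct mcast_table *t)
{
    for (size_t i = 0; i < MAX_GROUPS; i++)
        t->groups[i].attached = false;
}

/* After the link returns: re-attach every remembered group.
 * Returns how many groups were rejoined. */
size_t rejoin_all(struct mcast_table *t)
{
    size_t rejoined = 0;

    for (size_t i = 0; i < MAX_GROUPS; i++) {
        if (t->groups[i].in_use && !t->groups[i].attached) {
            t->groups[i].attached = true;
            rejoined++;
        }
    }
    return rejoined;
}
```

In this model, rejoin_all() after flush_delete_all() finds no entries, so the groups come back only when something repopulates the table (in the thread, a later call to ipoib_set_mcast_list()). After flush_detach_only() every remembered group is rejoined.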




[openib-general] ipoib send only failure

2006-09-14 Thread Eli cohen
Hi,
when running a test I encountered the following scenario:
1. The test sends to a multicast address.
2. ipoib issues a send-only join, which fails.
3. Subsequent joins to this group are not attempted, since the query
field of the mcast object still holds the old pointer.
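
The failure mode described above can be modeled outside the driver. The sketch below is hypothetical: the struct layout, the function names, and the assumption that a join is skipped whenever the query pointer is non-NULL are illustrative only, and the fix shown (clearing the pointer in the completion path on failure as well as on success) is one plausible approach, not necessarily the one applied to ipoib.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an in-flight SA query. */
struct sa_query {
    int id;
};

/* Hypothetical model of a send-only multicast entry. */
struct mcast_entry {
    struct sa_query *query;  /* non-NULL while a join is considered pending */
    bool joined;
};

/* Start a send-only join unless the group is already joined or a join
 * appears to be pending. Returns true if a new join was started. */
bool start_sendonly_join(struct mcast_entry *m, struct sa_query *q)
{
    if (m->joined || m->query != NULL)
        return false;
    m->query = q;
    return true;
}

/* Completion handler for the join; status 0 means success. */
void sendonly_join_complete(struct mcast_entry *m, int status)
{
    if (status == 0)
        m->joined = true;

    /* Clear the pointer on failure too. If it were left set, every
     * later start_sendonly_join() would see a "pending" join and
     * never retry. */
    m->query = NULL;
}
```

With the pointer cleared in both paths, a send that arrives after a failed join can start a fresh join attempt instead of being blocked by the stale query.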







Re: [openib-general] IPOIB failover ?

2006-09-13 Thread Or Gerlitz
Richard Frank wrote:
> Does IPOIB in this stack support transparent fail over between ports and
> across redundant HCAs using a virtual IP ?

I am working on a patch to the linux bonding driver which will allow it
to also enslave IPoIB devices in active-backup mode. I will send an
RFC to netdev for review next week. Does this meet your needs?

By virtual IP, do you mean an ***alias address*** assigned at one point
in time to one ipoib device and at another point in time (e.g. during
fail-over) to a second ipoib device?  Does this approach have any
advantage over the bonding approach?

Or.





Re: [openib-general] IPOIB failover ?

2006-09-13 Thread Richard Frank
Supporting IPOIB fail over with the Bonding driver will work - we
currently use this for GE, etc. 


On Wed, 2006-09-13 at 14:27 +0300, Or Gerlitz wrote:
> Richard Frank wrote:
> > Does IPOIB in this stack support transparent fail over between ports and
> > across redundant HCAs using a virtual IP ?
>
> I am working on a patch to the linux bonding driver which will allow it
> to also enslave IPoIB devices in active-backup mode. I will send an
> RFC to netdev for review next week. Does this meet your needs?
>
> By virtual IP, do you mean an ***alias address*** assigned at one point
> in time to one ipoib device and at another point in time (e.g. during
> fail-over) to a second ipoib device?  Does this approach have any
> advantage over the bonding approach?
>
> Or.
 
 
 





Re: [openib-general] IPOIB failover ?

2006-09-13 Thread Cain, Brian (GE Healthcare)
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Richard Frank
> Sent: Wednesday, September 13, 2006 7:12 AM
> To: Or Gerlitz
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] IPOIB failover ?
>
> Supporting IPOIB fail over with the Bonding driver will work - we
> currently use this for GE, etc.

You can also get failover with IPoIB if you're willing to use SCTP as
the transport.

-Brian



