Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-10-05 Thread Roland Dreier
  I tested this by simulating a slow passive side responder, and it worked as
  expected for those tests.  Using an MRA does add another MAD to the CM
  exchange, which is why it is sent only after seeing a duplicate request.
  Alternatively, we can take the OFED module parameter patch.

What the heck, I added this for 2.6.24.  If it doesn't work out we can
back it out.

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-10-03 Thread Shirley Ma
Roland Dreier [EMAIL PROTECTED] wrote on 09/17/2007 02:47:42 PM:

IPoIB CM handles this properly by gathering together single pages in
skbs' fragment lists.
 
   Then can we reuse IPoIB CM code here?
 
 Yes, if possible, refactoring things so that the rx skb allocation
 code becomes common between CM and non-CM would definitely make sense.

The IPoIB-CM rx skb allocation is not generic enough to be used by UD: it
allocates more buffers than needed if the MTU is not 64K, and it doesn't
query the real max_num_sg from the device. I am thinking of adding a generic
skb allocation to IPoIB based on a matrix of (ipoib-mtu-size, page-size,
max_num_sg, head-size).
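
A rough sketch of how those four parameters could drive the allocation. The
function name rx_frag_layout and the policy (fill the linear head first, then
use whole single pages, and count the head as one scatter/gather entry) are
assumptions made for illustration, not the IPoIB code.

```python
def rx_frag_layout(mtu, page_size, max_num_sg, head_size):
    """Return (head_bytes, num_pages) needed to receive one mtu-byte packet.

    Hypothetical policy: the first head_size bytes land in the skb's linear
    head and the rest is spread over single pages. The head uses one s/g
    entry, so at most max_num_sg - 1 pages are available.
    """
    head_bytes = min(mtu, head_size)
    remaining = mtu - head_bytes
    num_pages = (remaining + page_size - 1) // page_size  # ceiling division
    if 1 + num_pages > max_num_sg:
        raise ValueError("MTU does not fit in the device's s/g list")
    return head_bytes, num_pages
```

With these assumptions a 64K MTU on 4K pages needs 16 pages, while a 4K MTU
needs only one, which is the over-allocation the paragraph above refers to.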

Thanks
Shirley 


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-10-02 Thread Roland Dreier
  OK -- just to make sure I'm understanding what you're saying: have you
  confirmed that your proposed [CM MRA] patches actually fix the issue?
  
  Not directly.  I cannot easily test kernel patches on our larger, production
  clusters.  We've seen the issue with specific applications on 512 and 1024
  cores, but I've only been able to test the patch on a 48-core cluster.  I have
  verified that it successfully increases the timeout to where it *should* work,
  but cannot absolutely confirm that it will fix the problem.  I'm unlikely to
  know that until the production clusters move to an OFED release (1.3?)
  containing this patch.

Umm... this is a difficult situation for me to merge the changes then.
We're changing the CM retry behavior blind here.  How do we know that
the MRA changes don't make the scalability issue worse?

 - R.


RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-10-02 Thread Sean Hefty
Umm... this is a difficult situation for me to merge the changes then.
We're changing the CM retry behavior blind here.  How do we know that
the MRA changes don't make the scalability issue worse?

What's currently upstream doesn't work for Intel MPI on our larger clusters.
The connection requests time out on the active side before the passive side can
respond.

The OFED release works because it provides a kernel patch to make the timeout a
module parameter.  I'm trying to avoid adding a module parameter, and the MRA is
designed for this situation.

I tested this by simulating a slow passive side responder, and it worked as
expected for those tests.  Using an MRA does add another MAD to the CM exchange,
which is why it is sent only after seeing a duplicate request.  Alternatively,
we can take the OFED module parameter patch.
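
The "MRA only after a duplicate request" behavior can be modeled roughly as
below. This is a toy model, not the ib_cm API: the class and method names are
hypothetical, and a real MRA also carries a service timeout that is left out
here.

```python
class PassiveSide:
    """Toy model of a CM listener that is slow to answer requests."""

    def __init__(self):
        self.pending = set()   # connection requests not yet answered
        self.sent = []         # (message, request id) pairs put on the wire

    def on_req(self, req_id):
        if req_id not in self.pending:
            # First copy: hand it to the (possibly slow) application.
            # No extra MAD is sent in the common case.
            self.pending.add(req_id)
            return
        # Duplicate: the active side has retried, so it may be close to
        # timing out. Acknowledge receipt and ask it to keep waiting.
        self.sent.append(("MRA", req_id))

    def on_app_reply(self, req_id):
        # The application finally answers; send the normal reply.
        self.pending.discard(req_id)
        self.sent.append(("REP", req_id))
```

A fast responder never sends an MRA in this model, so the extra MAD is paid
only when the passive side is actually slow.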

- Sean


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-19 Thread Michael S. Tsirkin
 Missing from this list (IMPORTANT patch!):
 [ofa-general] [PATCH 2 of 2] IB/mlx4: Handle new FW requirement for send
 request prefetching, for WQE sg lists
 (Posted by me to list on Sept 4)
 {patch header:
 This is an addendum to Roland's commit 0e6e74162164d908edf7889ac66dca09e7505745
 (June 18). This addendum adds prefetch headroom marking processing for s/g
 segments.
 
 We write s/g segments in reverse order into the WQE, in order to guarantee
 that the first dword of all cachelines containing s/g segments is written last
 (overwriting the headroom invalidation pattern). The entire cacheline will thus
 contain valid data when the invalidation pattern is overwritten.
 }

This actually looks like a bugfix that might even have been appropriate
for 2.6.23. Roland, do you have this patch? Can you comment on it please?

-- 
MST


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-18 Thread Jack Morgenstein
On Thursday 13 September 2007 20:57, Roland Dreier wrote:
 HW specific:
 
  - I already merged patches to enable MSI-X by default for mthca and
    mlx4.  I hope there aren't too many systems that get hosed if a
    MSI-X interrupt is generated.
 
  - Jack and Michael's mlx4 FMR support.  Will merge I guess, although
    I do hope to have time to address the DMA API abuse that is being
    copied from mthca, so that mlx4 and mthca work in Xen domU.
 
  - ehca patch queue.  Will merge, pending fixes for the few minor
    issues I commented on.
 
  - Steve's mthca router mode support.  Would be nice to see a review
    from someone at Mellanox.
 
  - Arthur's mthca doorbell alignment fixes.  I will experiment with a
    few different approaches and post what I like (and fix mlx4 as
    well).  I hope Arthur can review.
 
  - Michael's mlx4 WQE shrinking patch.  Not sure yet; I'll reply to
    the latest patch directly.
 
Missing from this list (IMPORTANT patch!):
[ofa-general] [PATCH 2 of 2] IB/mlx4: Handle new FW requirement for send 
request prefetching, for WQE sg lists
(Posted by me to list on Sept 4)
{patch header: 
This is an addendum to Roland's commit 0e6e74162164d908edf7889ac66dca09e7505745
(June 18). This addendum adds prefetch headroom marking processing for s/g 
segments.

We write s/g segments in reverse order into the WQE, in order to guarantee
that the first dword of all cachelines containing s/g segments is written last
(overwriting the headroom invalidation pattern). The entire cacheline will thus
contain valid data when the invalidation pattern is overwritten.
}
This is a two-patch series (1 of 2 is for libmlx4, which has the same issue).
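
The ordering argument in the patch header can be checked with a small model.
The sizes (64-byte cachelines, 16-byte s/g segments) and the rule that the
first dword of each segment is written after the rest of that segment are
assumptions for illustration; the function names are hypothetical.

```python
CACHELINE_DWORDS = 16   # assumed 64-byte cacheline
SEG_DWORDS = 4          # assumed 16-byte s/g segment

def write_order(first_dword, num_segs):
    """Return dword indices in the order they are written.

    Segments are visited last-to-first; inside a segment the first dword
    is written after the others.
    """
    order = []
    for seg in reversed(range(num_segs)):
        base = first_dword + seg * SEG_DWORDS
        order.extend(range(base + 1, base + SEG_DWORDS))  # tail dwords
        order.append(base)                                # first dword last
    return order

def stamp_overwritten_last(order):
    """True if, in every cacheline whose first dword is written at all,
    that first dword is the last write to land in the cacheline."""
    written = set(order)
    last_in_line = {}
    for dword in order:
        last_in_line[dword // CACHELINE_DWORDS] = dword
    return all(
        last == line * CACHELINE_DWORDS
        for line, last in last_in_line.items()
        if line * CACHELINE_DWORDS in written
    )
```

In this model, writing the same segments first-to-last leaves the first dword
of the second cacheline overwritten while later dwords in that line are still
stale, which is the window the reverse order closes.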


Also, I'm now posting (in a separate post) the following patch to mlx4, which 
is important:
  display the following device information via sysfs:
  board_id, fw_ver, hw_rev, hca_type.

  The info is displayed under directory /sys/class/infiniband/mlx4_x, where x is
  the pci bus sequence number (starting from zero).

  This patch makes information available to ibstat and ibv_devinfo under the
  same directory as is used for tavor/arbel/sinai -- thus requiring no userspace
  modifications.
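
  For illustration, a userspace reader of these attributes might look like
  the sketch below. The four attribute names and the directory come from the
  description above; the helper name read_device_info and the choice to
  return None for a missing attribute are assumptions.

```python
from pathlib import Path

ATTRS = ("board_id", "fw_ver", "hw_rev", "hca_type")

def read_device_info(device, sysfs_root="/sys/class/infiniband"):
    """Read the per-device attributes, e.g. for device "mlx4_0"."""
    info = {}
    for name in ATTRS:
        path = Path(sysfs_root) / device / name
        try:
            info[name] = path.read_text().strip()
        except OSError:
            # Attribute not exported by this driver or kernel version.
            info[name] = None
    return info
```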

- Jack




Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-18 Thread Tziporet Koren

Hal Rosenstock wrote:

  Has anyone tested these with QoS actually being used?  I suppose this
  requires ConnectX.

You can test it with a switch, without ConnectX.
If you also want the HCA to react to the QoS settings, then you need
ConnectX.

Tziporet


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-18 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: InfiniBand/RDMA merge plans for 2.6.24
 
 With 2.6.24 probably opening in the not-too-distant future, it's
 probably a good time to review what my plans are for when the merge
 window opens.

Roland, could you merge the common TX CQ patch please?
It actually fixes a real problem.


-- 
MST


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-18 Thread Roland Dreier
  Roland, could you merge the common TX CQ patch please?
  It actually fixes a real problem.

Yes, I will, but it collides with the net-2.6.24 NAPI rework I think,
so it may not go in until a few days after the merge window.

Have you verified that the patch cures the interrupt overload issues?


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-18 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: InfiniBand/RDMA merge plans for 2.6.24
 
   Roland, could you merge the common TX CQ patch please?
   It actually fixes a real problem.
 
 Yes, I will, but it collides with the net-2.6.24 NAPI rework I think,
 so it may not go in until a few days after the merge window.
 
 Have you verified that the patch cures the interrupt overload issues?

Yes.

-- 
MST


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-17 Thread Roland Dreier
   IPoIB CM handles this properly by gathering together single pages in
   skbs' fragment lists.

  Then can we reuse IPoIB CM code here?

Yes, if possible, refactoring things so that the rx skb allocation
code becomes common between CM and non-CM would definitely make sense.


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-17 Thread Roland Dreier
  The IGMP enabling patch posted by me on September 2nd isn't on your list
  http://lists.openfabrics.org/pipermail/general/2007-September/040250.html
  can you add it?

Yes, I lost that somehow.  I will add it to my list of things to take
a look at (no opinion yet).

 - R.


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-16 Thread Or Gerlitz

Roland Dreier wrote:

With 2.6.24 probably opening in the not-too-distant future, it's
probably a good time to review what my plans are for when the merge
window opens.



Core:
 - Sean's QoS changes.  These look fine at first glance, and I just
   plan to understand the backwards compatibility story (ie how this
   works with an old SM) and merge.  Anyone who objects let me know.


Hi Roland,

I have reviewed the QoS patches and provided comments, which were
incorporated into v2 of the series. I also tested it (ipoib and iser, which
is rdma-cm based) against the Voltaire SM/SA to check that nothing was
broken. I will send you a Reviewed-by: signature.



ULPs:



[ofa-general] [PATCH RFC] IB/ipoib: enable IGMP for userpsace multicast IB apps

The IGMP enabling patch posted by me on September 2nd isn't on your list
http://lists.openfabrics.org/pipermail/general/2007-September/040250.html
can you add it?



 - Moni's IPoIB bonding support.  This seems mostly an issue of
   getting the core bonding maintainer's attention.  However getting a
   Reviewed-by: for the IPoIB changes wouldn't hurt either.


Jay Vosburgh, the bonding driver maintainer, just sent an ack for the whole
patch series. As for the IPoIB changes, there are three patches, two of
which, namely

[PATCH 02/11] IB/ipoib: Notify the world before doing unregister
[PATCH 04/11] IB/ipoib: Verify address handle validity on send

handle corner-case problems pointed out by Michael Tsirkin.
Michael, will you be able to look at them and provide a Reviewed-by:
signature? The third patch

[PATCH 03/11] IB/ipoib: Bound the net device to the ipoib_neigh structue

is much simpler; I don't think it needs more review.


 - Eli and Michael's IPoIB stateless offload (checksum offload, LSO,
   LRO, etc).  It's a big series that makes quite a few core changes.
   I think it needs some careful review and is probably at risk of
   missing this merge window.  Sorting in order of invasiveness so we
   can merge at least some of it (if splitting it makes sense) might
   be a good idea.


Just for the record, the 'etc' above refers to the interrupt moderation
support (mlx4, core, ipoib {config through ethtool, usage}). Among
other things, what is not clear to me here is if/how this goes
hand-in-hand with NAPI.


As you saw the patch adding checksum offload support had a long thread, 
and I think the discussion has reached the point where Michael is 
waiting for your take on it.


As for the LSO and LRO patches, I did not see any review comments.

I will see what I can review from the series; to begin with, I will send
Eli some comments and questions.



HW specific:
 - Jack and Michael's mlx4 FMR support.  Will merge I guess, although
   I do hope to have time to address the DMA API abuse that is being
   copied from mthca, so that mlx4 and mthca work in Xen domU.


This patch series is rather important, since without it iser is useless
over ConnectX. It would be nice if you could merge it now and, at most, fix
the abuse later.


Or.



Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-15 Thread Steve Wise



Roland Dreier wrote:

  I was about to post v2 of my patch to avoid port space collisions with
  the native stack.  Can we get that into 2.6.24?  It is high priority
  IMO. I've tried to solicit review on it, but I think folks are
  reluctant... ;-)

I would like to get this in, but I'm still at least a little
reluctant, since we would be committing to a user interface that seems
a little awkward at best, so I'd like to try and find something
better.  Just to summarize my understanding:

 - your patch requires the administrator to configure an ethX:iwY
   alias address to use iwarp.  (By the way, is there anything other
   than "don't do that" that avoids assigning the same address to the
   iwarp alias and a non-iwarp interface?)



Nope.  It's totally up to the admin to create the ethX:iwY interface
-and- to segment his services so host TCP runs on the ethX subnet(s) and
the iwarp rdma ones run on ethX:iwY subnet(s).  Without changing the
core network services, I don't see any way around this.
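
Since nothing enforces this segmentation, an admin could at least check it
from userspace. A minimal sketch follows; the helper name and the idea of
passing addresses in CIDR form are assumptions, and this check is not part of
the patch.

```python
import ipaddress

def subnets_are_segmented(host_ifaces, iwarp_ifaces):
    """True if no host (ethX) subnet overlaps any iwarp (ethX:iwY) subnet.

    Each interface is given as an address with prefix length,
    e.g. "192.168.1.10/24".
    """
    host_nets = [ipaddress.ip_interface(a).network for a in host_ifaces]
    iwarp_nets = [ipaddress.ip_interface(a).network for a in iwarp_ifaces]
    return not any(h.overlaps(i) for h in host_nets for i in iwarp_nets)
```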



 - it would be nicer to create the alias automatically, but an alias
   without an address doesn't make sense.  Creating a whole separate
   net device causes problems because the iwarp stuff still needs to
   use the main net device to do ARP etc.



I do log a warning if an iwarp application binds to address 0.0.0.0 and
there are no ethX:iwY addresses available.



 - so I'm out of better ideas but I still want to push back a little
   before we commit to something ugly.



Me 2. :-(


I've been meaning to track down the bnx2 iscsi offload patch to look
and see if this issue is addressed, since the same problem seems to
exist: it seems an iscsi connection and a main stack tcp connection
might share the same 4-tuple unless something is done to avoid that
happening.

Also, I think it behooves us to get some agreement on this approach
with NetEffect and Kanoj (NetXen?) at least, since their iwarp drivers
seem to be imminent.

 - R.



Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-14 Thread Evgeniy Polyakov
On Thu, Sep 13, 2007 at 01:59:21PM -0500, Steve Wise ([EMAIL PROTECTED]) wrote:
 Well, if it involves /sharing/ port space with the native stack, i.e. 
 where port 1234 is IB but 1235 is Linux, pretty much all the networking 
 devs have NAK'd that approach AFAICS.
 
 Jeff, I posted a fix that doesn't do this.  No port sharing.  The iwarp 
 device will use its own ip address and subnet to avoid collisions.  You 
 should review the patch when I post v2.

Could you please resend it, since I missed it in [EMAIL PROTECTED]

-- 
Evgeniy Polyakov


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-14 Thread Roland Dreier
  The patch is just needed to pick up the broadcast MTU size instead of hard
  coding 2K right now. SKB allocation shouldn't be any different from Ethernet
  jumbo frames or from IPoIB-CM with its 64K MTU. I don't understand why it's
  different. Could you please explain this?

It's exactly the same problem as ethernet jumbo frames.  A web search
for 'order 1 failure e1000' might be interesting.

IPoIB CM handles this properly by gathering together single pages in
skbs' fragment lists.
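
To make the "order 1" point concrete, here is a small sketch comparing one
physically contiguous receive buffer with a list of single pages. The 4K page
size and the 44-byte per-packet overhead used in the usage note are assumed
figures for illustration.

```python
def alloc_order(mtu, overhead=0, page_size=4096):
    """Smallest order such that 2**order contiguous pages hold the buffer."""
    nbytes = mtu + overhead
    pages = -(-nbytes // page_size)      # ceiling division
    return (pages - 1).bit_length()

def single_page_frags(mtu, overhead=0, page_size=4096):
    """Number of independent order-0 pages needed for the same buffer."""
    return -(-(mtu + overhead) // page_size)
```

With a 2K MTU the buffer fits in one page (order 0). With a 4K MTU plus 44
bytes of overhead, a contiguous buffer becomes an order-1 allocation, which can
fail under memory fragmentation, whereas the fragment-list approach needs two
separate order-0 pages.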

 - R.


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-14 Thread Roland Dreier
   I've been meaning to track down the bnx2 iscsi offload patch to look
   and see if this issue is addressed, since the same problem seems to
   exist: it seems an iscsi connection and a main stack tcp connection
   might share the same 4-tuple unless something is done to avoid that
   happening.

  iSCSI does not do passive listens, only active connections to the
  target.  But you're right, the port space is still shared between iSCSI
  and the main stack.  We currently rely on user apps binding to the main
  stack to reserve certain ephemeral ports, and telling the iSCSI driver
  which ports to use.

Got it... I wasn't thinking that clearly, but it is clear that a full
4-tuple collision with only active connections is quite unlikely.  I
guess you would have to make both an offloaded and a non-offloaded
iSCSI connection to the same target and get really unlucky with
ephemeral port allocation.  So in practice I guess it's not an issue
at all with your driver yet.
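
Put as a sketch: a collision needs the whole 4-tuple to match, so with only
active connections both stacks would have to pick the same ephemeral port
towards the same target. The function name and the sample addresses are
made up for illustration.

```python
ISCSI_PORT = 3260  # registered iSCSI target port

def colliding_tuples(host_conns, offload_conns):
    """Return the 4-tuples used by both the host stack and the offload path.

    Each connection is (local_ip, local_port, remote_ip, remote_port).
    """
    return set(host_conns) & set(offload_conns)

host = [("10.0.0.1", 40000, "10.0.0.9", ISCSI_PORT)]
offload_ok = [("10.0.0.1", 40001, "10.0.0.9", ISCSI_PORT)]   # other port
offload_bad = [("10.0.0.1", 40000, "10.0.0.9", ISCSI_PORT)]  # same port
```

Here colliding_tuples(host, offload_ok) is empty, while
colliding_tuples(host, offload_bad) contains the shared tuple.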

However, do you have any plans to support iSCSI offload for targets?
Also, looking at the first CNIC patch, I can't help but notice that
you seem to have at least some support for iWARP there.  How does the
CNIC look?  Does it share the same interface/addresses as the
non-offload NIC, or does it create a completely separate netdevice?

I want to make sure that whatever solution we come up with for cxgb3
doesn't cause problems for you.  And of course if you have a better
idea than what Steve has come up with, that would be great :)

 - R.


RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-14 Thread Sean Hefty
OK -- just to make sure I'm understanding what you're saying: have you
confirmed that your proposed patches actually fix the issue?

Not directly.  I cannot easily test kernel patches on our larger, production
clusters.  We've seen the issue with specific applications on 512 and 1024
cores, but I've only been able to test the patch on a 48-core cluster.  I have
verified that it successfully increases the timeout to where it *should* work,
but cannot absolutely confirm that it will fix the problem.  I'm unlikely to
know that until the production clusters move to an OFED release (1.3?)
containing this patch.

- Sean


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-14 Thread Shirley Ma
 IPoIB CM handles this properly by gathering together single pages in
 skbs' fragment lists.
 
  - R.

Then can we reuse IPoIB CM code here?

Thanks
Shirley 



Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-14 Thread Michael Chan
On Fri, 2007-09-14 at 09:18 -0700, Roland Dreier wrote:

 However, do you have any plans to support iSCSI offload for targets?
 Also, looking at the first CNIC patch, I can't help but notice that
 you seem to have at least some support for iWARP there.  How does the
 CNIC look?  Does it share the same interface/addresses as the
 non-offload NIC, or does it create a completely separate netdevice?

We will support iWARP in the future and it should be similar to the way
we do iSCSI - using the same interface/addresses as the bnx2 NIC.

 
 I want to make sure that whatever solution we come up with for cxgb3
 doesn't cause problems for you.  And of course if you have a better
 idea than what Steve has come up with, that would be great :)
 

We are looking at these discussions with great interest.  If we have any
new ideas, we'll definitely let everyone know.  Thanks.



InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier
With 2.6.24 probably opening in the not-too-distant future, it's
probably a good time to review what my plans are for when the merge
window opens.

At the kernel summit, we discussed patch review (doing a web search
for "kernel summit reviewed-by:" should turn up lots of info on
this).  Due to an unfortunate combination of vacation and conference
travel, summer colds, and other inconveniences, I am very backed up on
reviewing.  And in any case, I've allowed too much code review to be
dumped on me -- when there are dozens of people working on IB and RDMA
stuff, it obviously doesn't work to expect me to do all the reviewing.

Unfortunately, due to the length of the backlog and the fact that
2.6.23 seems fairly close, some of the things listed below are going
to miss the 2.6.24 merge window.  So, although the plan is to phase in
requiring Reviewed-by: gently, for this merge, if you can get
someone other than me to review your work, then the chances of it
being merged increase dramatically.  I'm talking about a real review--
ideally, someone independent (from another company would be good) who
is willing to provide a Reviewed-by: line that means the reviewer
has really looked at and thought about the patch.  There should be a
mailing list thread you can point me at where the reviewer comments on
the patch and a new version of that patch addressing all comments is
posted (or in exceptional cases, where the patch is perfect to start
with, where the reviewer says the patch is great).

For example, given the number of IPoIB changes pending, it might be a
good idea for the people submitting them to get together and trade
reviews (i.e. "If you review my patch, I'll review your patch").  There
are a few cases where getting a review may not be necessary.  First of
all, trivial and obvious patches don't need a review.  It's a
judgement call what is trivial or obvious, and it's always a good idea
to provide a changelog that makes it clear why a patch is trivial and
obviously correct.  Second, hardware driver patches may not make sense
to anyone outside of the company whose hardware the driver is for.
Still, in this case, an internal Reviewed-by: would be nice, and also
a changelog that explains the reason for the change always helps
(don't just tell me what your patch does, but also explain what the
patch fixes and what the impact of the current situation is).

Anyway, here are all the pending things that I'm aware of.  As usual,
if something isn't already in my tree and isn't listed below, I
probably missed it or dropped it by mistake.  Please remind me again
in that case.

Core:

 - My user_mad P_Key index support patch.  I'll test the ioctl to
   change to the new mode and merge this I guess, since Hal and Sean
   have tested this out.

 - A fix to the user_mad 32-bit big-endian userspace 64/32 problem
   with the method_mask when registering agents.  I'll write a patch
   to handle this in a way that doesn't change the ABI for anything
   other than the broken case and hope to get someone to review this
   so it can be merged.

 - Sean's QoS changes.  These look fine at first glance, and I just
   plan to understand the backwards compatibility story (ie how this
   works with an old SM) and merge.  Anyone who objects let me know.

 - Sean's IB CM MRA interface changes.  Don't know at this point.  It
   seems OK but I'm not clear on what if any real-world improvement
   this gives us.
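
One way to picture the method_mask item in the list above is the toy model
below. It assumes the mask is filled in by userspace as four 32-bit words but
interpreted on the other side as a bitmap of 64-bit words, both big-endian;
the function names are hypothetical and this is not the user_mad code.

```python
import struct

def set_bit_in_u32_words(bit, nwords=4):
    """Set one bit the way 32-bit big-endian userspace would lay it out."""
    words = [0] * nwords
    words[bit // 32] |= 1 << (bit % 32)
    return struct.pack(">%dI" % nwords, *words)

def bits_seen_as_u64_words(raw):
    """List the bit numbers a 64-bit big-endian reader finds in the bytes."""
    words = struct.unpack(">%dQ" % (len(raw) // 8), raw)
    return [i * 64 + b
            for i, w in enumerate(words)
            for b in range(64)
            if (w >> b) & 1]
```

Under these assumptions the two halves of every 64-bit word are swapped: bit 0
set by userspace is read back as bit 32, and bit 33 as bit 1. On a
little-endian layout the two views would agree, which fits the item's
description of the problem as specific to big-endian.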

ULPs:

 - Pradeep's IPoIB CM support for devices that don't have SRQs.  I
   think the basic approach makes sense (I don't think faking SRQs at
   some other layer is really feasible) and I need to find time to
   look at the details to see if the current patch looks workable.  I'm
   likely to merge this; getting an independent Reviewed-by: would
   certainly be appreciated too.

 - Moni's IPoIB bonding support.  This seems mostly an issue of
   getting the core bonding maintainer's attention.  However getting a
   Reviewed-by: for the IPoIB changes wouldn't hurt either.

 - Rolf's IPoIB MGID scope changes.  Certainly we want to fix this
   issue but the specific changes need review.

 - Eli and Michael's IPoIB stateless offload (checksum offload, LSO,
   LRO, etc).  It's a big series that makes quite a few core changes.
   I think it needs some careful review and is probably at risk of
   missing this merge window.  Sorting in order of invasiveness so we
   can merge at least some of it (if splitting it makes sense) might
   be a good idea.

HW specific:

 - I already merged patches to enable MSI-X by default for mthca and
   mlx4.  I hope there aren't too many systems that get hosed if a
   MSI-X interrupt is generated.

 - Jack and Michael's mlx4 FMR support.  Will merge I guess, although
   I do hope to have time to address the DMA API abuse that is being
   copied from mthca, so that mlx4 and mthca work in Xen domU.

 - ehca patch queue.  Will merge, pending fixes for the few minor
   issues I commented on.

 - Steve's mthca router mode 

Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Steve Wise

Hey Roland,

I was about to post v2 of my patch to avoid port space collisions with
the native stack.  Can we get that into 2.6.24?  It is high priority IMO.
I've tried to solicit review on it, but I think folks are reluctant... ;-)


Steve.



Roland Dreier wrote:

With 2.6.24 probably opening in the not-too-distant future, it's
probably a good time to review what my plans are for when the merge
window opens.

At the kernel summit, we discussed patch review (doing a web search
for kernel summit reviewed-by: should turn up lots of info on
this).  Due to an unfortunate combination of vacation and conference
travel, summer colds, and other inconveniences, I am very backed up on
reviewing.  And in any case, I've allowed too much code review to be
dumped on me -- when there are dozens of people working on IB and RDMA
stuff, it obviously doesn't work to expect me to do all the reviewing.

Unfortunately, due to the length of the backlog and the fact that
2.6.23 seems fairly close, some of the things listed below are going
to miss the 2.6.24 merge window.  So, although the plan is to phase in
requiring Reviewed-by: gently, for this merge, if you can get
someone other than me to review your work, then the chances of it
being merged increase dramatically.  I'm talking about a real review--
ideally, someone independent (from another company would be good) who
is willing to provide a Reviewed-by: line that means the reviewer
has really looked at and thought about the patch.  There should be a
mailing list thread you can point me at where the reviewer comments on
the patch and a new version of that patch addressing all comments is
posted (or in exceptional cases, where the patch is perfect to start
with, where the reviewer says the patch is great).

For example, given the number of IPoIB changes pending, it might be a
good idea for the people submitting them to get together and trade
reviews (ie If you review my patch, I'll review your patch).  There
are a few cases where getting a review may not be necessary.  First of
all, trivial and obvious patches don't need a review.  It's a
judgement call what is trivial or obvious, and it's always a good idea
to provide a changelog that makes it clear why a patch is trivial and
obviously correct.  Second, hardware driver patches may not make sense
to anyone outside of the company whose hardware the driver is for.
Still, in this case, an internal Reviewed-by: would be nice, and also
a changelog that explains the reason for the change always helps
(don't just tell me what your patch does, but also explain what the
patch fixes and what the impact of the current situation is).

Anyway, here are all the pending things that I'm aware of.  As usual,
if something isn't already in my tree and isn't listed below, I
probably missed it or dropped it by mistake.  Please remind me again
in that case.

Core:

 - My user_mad P_Key index support patch.  I'll test the ioctl to
   change to the new mode and merge this I guess, since Hal and Sean
   have tested this out.

 - A fix to the user_mad 32-bit big-endian userspace 64/32 problem
   with the method_mask when registering agents.  I'll write a patch
   to handle this in a way that doesn't change the ABI for anything
   other than the broken case and hope to get someone to review this
   so it can be merged.

 - Sean's QoS changes.  These look fine at first glance, and I just
   plan to understand the backwards compatibility story (ie how this
   works with an old SM) and merge.  Anyone who objects let me know.

 - Sean's IB CM MRA interface changes.  Don't know at this point.  It
   seems OK but I'm not clear on what if any real-world improvement
   this gives us.

ULPs:

 - Pradeep's IPoIB CM support for devices that don't have SRQs.  I
   think the basic approach makes sense (I don't think faking SRQs at
   some other layer is really feasible) and I need to find time to
   look at the details to see if the current patch looks workable.  I'm
   likely to merge this; getting an independent Reviewed-by: would
   certainly be appreciated too.

 - Moni's IPoIB bonding support.  This seems mostly an issue of
   getting the core bonding maintainer's attention.  However getting a
   Reviewed-by: for the IPoIB changes wouldn't hurt too.

 - Rolf's IPoIB MGID scope changes.  Certainly we want to fix this
   issue but the specific changes need review.

 - Eli and Michael's IPoIB stateless offload (checksum offload, LSO,
   LRO, etc).  It's a big series that makes quite a few core changes.
   I think it needs some careful review and is probably at risk of
   missing this merge window.  Sorting in order of invasiveness so we
   can merge at least some of it (if splitting it makes sense) might
   be a good idea.

HW specific:

 - I already merged patches to enable MSI-X by default for mthca and
   mlx4.  I hope there aren't too many systems that get hosed if an
   MSI-X interrupt is generated.

 - Jack and Michael's mlx4 FMR support.  Will merge I guess, 

RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Sean Hefty
 - My user_mad P_Key index support patch.  I'll test the ioctl to
   change to the new mode and merge this I guess, since Hal and Sean
   have tested this out.

I can give this patch a reviewed-by: too, and I will also try to review a couple
of the pending ipoib patches.

 - Sean's QoS changes.  These look fine at first glance, and I just
   plan to understand the backwards compatibility story (ie how this
   works with an old SM) and merge.  Anyone who objects let me know.

The new QoS fields fall into fields that are currently reserved, which should be
ignored by an older SM.  I've only tested this against openSM however.

 - Sean's IB CM MRA interface changes.  Don't know at this point.  It
   seems OK but I'm not clear on what if any real-world improvement
   this gives us.

This patch was generated in response to an Intel MPI issue.  We've seen MPI take
several minutes to respond to a connection request during the middle of large
application runs.  When this happens, the active side times out the connection.
In OFED, we added module parameters to adjust the rdma_cm connection timeout on
the active side, but I believe that sending an MRA from the passive side is a
better solution.
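
For a sense of scale: IB CM timeouts, including the service timeout that an
MRA carries, are encoded as a 5-bit exponent n meaning 4.096 microseconds
times 2^n.  The sketch below converts exponents to seconds and finds the
smallest exponent that covers a given delay.  The function names and the
180-second figure standing in for "several minutes" are assumptions for
illustration only; nothing here is taken from the patch.

```python
CM_TIMEOUT_UNIT_SEC = 4.096e-6  # base unit of the IB CM timeout encoding
MAX_EXPONENT = 31               # largest value of a 5-bit field


def cm_timeout_seconds(exponent):
    """Seconds represented by an encoded CM timeout exponent."""
    if not 0 <= exponent <= MAX_EXPONENT:
        raise ValueError("exponent must fit in 5 bits")
    return CM_TIMEOUT_UNIT_SEC * (1 << exponent)


def min_exponent_for(seconds):
    """Smallest exponent whose timeout is at least the given delay."""
    for exponent in range(MAX_EXPONENT + 1):
        if cm_timeout_seconds(exponent) >= seconds:
            return exponent
    raise ValueError("delay too long for a 5-bit exponent")


# A passive side that may need about three minutes before it can reply.
needed = min_exponent_for(180.0)
print(needed, cm_timeout_seconds(needed))
```

With these assumptions a three-minute delay needs exponent 26 (about 275
seconds), since exponent 25 covers only about 137 seconds.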

- Sean
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Shirley Ma
Hello Roland,
 
Since ehca can support a 4K MTU, we would like to see a patch in
IPoIB to allow the link MTU to be up to 4K instead of the current 2K for
the 2.6.24 kernel. The idea is that the IPoIB link MTU would pick up its
value from the SM's default broadcast MTU. This should be a small patch;
I hope you are OK with this.

Thanks
Shirley


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Jeff Garzik

Steve Wise wrote:
  I was about to post v2 of my patch to avoid port space collisions with
  the native stack.  Can we get that into 2.6.24?  It is high priority IMO.
  I've tried to solicit review on it, but I think folks are reluctant... ;-)


Well, if it involves /sharing/ port space with the native stack, i.e. 
where port 1234 is IB but 1235 is Linux, pretty much all the networking 
devs have NAK'd that approach AFAICS.


Jeff





Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Steve Wise



Jeff Garzik wrote:
  Steve Wise wrote:
    I was about to post v2 of my patch to avoid port space collisions with
    the native stack.  Can we get that into 2.6.24?  It is high priority
    IMO.  I've tried to solicit review on it, but I think folks are
    reluctant... ;-)

  Well, if it involves /sharing/ port space with the native stack, i.e.
  where port 1234 is IB but 1235 is Linux, pretty much all the networking
  devs have NAK'd that approach AFAICS.

Jeff, I posted a fix that doesn't do this.  No port sharing.  The iwarp 
device will use its own ip address and subnet to avoid collisions.  You 
should review the patch when I post v2.


Thanks,

Steve.


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Jeff Garzik

Steve Wise wrote:
  Jeff Garzik wrote:
    Steve Wise wrote:
      I was about to post v2 of my patch to avoid port space collisions
      with the native stack.  Can we get that into 2.6.24?  It is high
      priority IMO.  I've tried to solicit review on it, but I think
      folks are reluctant... ;-)

    Well, if it involves /sharing/ port space with the native stack, i.e.
    where port 1234 is IB but 1235 is Linux, pretty much all the
    networking devs have NAK'd that approach AFAICS.

  Jeff, I posted a fix that doesn't do this.  No port sharing.  The iwarp
  device will use its own ip address and subnet to avoid collisions.  You
  should review the patch when I post v2.


Sounds promising, then!  :)

Jeff





Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier
  Since ehca can support a 4K MTU, we would like to see a patch in
  IPoIB to allow the link MTU to be up to 4K instead of the current 2K for
  the 2.6.24 kernel. The idea is that the IPoIB link MTU would pick up its
  value from the SM's default broadcast MTU. This should be a small patch;
  I hope you are OK with this.

It's actually not small, since it turns the skb allocation into a
4100-byte buffer, which ends up being more than 1 page usually, which
means it fails if memory is fragmented.
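
The page arithmetic behind this can be sketched as follows.  The 4100-byte
figure comes from the message above; the 4096-byte default page size and
the helper names pages_needed and alloc_order are assumptions, and the
order calculation only mirrors the usual power-of-two rounding of a page
allocator rather than any actual IPoIB code.

```python
def pages_needed(size, page_size=4096):
    """Number of whole pages required to hold size bytes."""
    return -(-size // page_size)  # ceiling division


def alloc_order(size, page_size=4096):
    """Smallest order such that 2**order contiguous pages hold size bytes."""
    return (pages_needed(size, page_size) - 1).bit_length()


for size in (2048, 4096, 4100):
    print(size, pages_needed(size), alloc_order(size))
```

With 4 KB pages a 4100-byte buffer needs an order-1 allocation, i.e. two
physically contiguous pages, which is the kind of request that can fail
when memory is fragmented.  On a 64 KB page configuration the same buffer
would still fit in a single page.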

Anyway given the backlog anything substantial that hasn't been posted
already is almost surely going to have to wait until 2.6.25.


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier
   - My user_mad P_Key index support patch.  I'll test the ioctl to
     change to the new mode and merge this I guess, since Hal and Sean
     have tested this out.

  I can give this patch a reviewed-by: too, and I will also try to review a
  couple of the pending ipoib patches.

Thanks!

   - Sean's QoS changes.  These look fine at first glance, and I just
     plan to understand the backwards compatibility story (ie how this
     works with an old SM) and merge.  Anyone who objects let me know.

  The new QoS fields fall into fields that are currently reserved, which
  should be ignored by an older SM.  I've only tested this against openSM
  however.

That seems OK -- I'm OK with breaking things if an SM is clearly buggy
(and not ignoring fields that are defined to be ignored in the spec
would certainly be a clear bug to me).

  This patch was generated in response to an Intel MPI issue.  We've seen MPI
  take several minutes to respond to a connection request during the middle of
  large application runs.  When this happens, the active side times out the
  connection.  In OFED, we added module parameters to adjust the rdma_cm
  connection timeout on the active side, but I believe that sending an MRA
  from the passive side is a better solution.

OK -- just to make sure I'm understanding what you're saying: have you
confirmed that your proposed patches actually fix the issue?

 - R.


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier
  I was about to post v2 of my patch to avoid port space collisions with
  the native stack.  Can we get that into 2.6.24?  It is high priority
  IMO. I've tried to solicit review on it, but I think folks are
  reluctant... ;-)

I would like to get this in, but I'm still at least a little
reluctant, since we would be committing to a user interface that seems
a little awkward at best, so I'd like to try and find something
better.  Just to summarize my understanding:

 - your patch requires the administrator to configure an ethX:iwY
   alias address to use iwarp.  (By the way, is there anything other
   than "don't do that" that avoids assigning the same address to the
   iwarp alias and a non-iwarp interface?)

 - it would be nicer to create the alias automatically, but an alias
   without an address doesn't make sense.  Creating a whole separate
   net device causes problems because the iwarp stuff still needs to
   use the main net device to do ARP etc.

 - so I'm out of better ideas but I still want to push back a little
   before we commit to something ugly.

I've been meaning to track down the bnx2 iscsi offload patch to look
and see if this issue is addressed, since the same problem seems to
exist: it seems an iscsi connection and a main stack tcp connection
might share the same 4-tuple unless something is done to avoid that
happening.
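
The collision being discussed can be modelled in a few lines.  This is a
toy sketch, assuming two endpoint owners (the host stack and an offload
engine) that each keep a private connection table and do not consult each
other; the class and function names and the documentation-range addresses
are all made up for the example.

```python
from typing import NamedTuple


class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Stack:
    """An endpoint owner with its own, unshared connection table."""

    def __init__(self, name, local_ip):
        self.name = name
        self.local_ip = local_ip
        self.connections = set()

    def connect(self, src_port, dst_ip, dst_port):
        conn = FourTuple(self.local_ip, src_port, dst_ip, dst_port)
        self.connections.add(conn)
        return conn


def collisions(a, b):
    """4-tuples that both stacks believe they own."""
    return a.connections & b.connections


# Same local address on both sides: nothing stops both from picking port 40000.
host = Stack("host", "192.0.2.10")
offload = Stack("offload", "192.0.2.10")
host.connect(40000, "192.0.2.20", 5000)
offload.connect(40000, "192.0.2.20", 5000)
print(len(collisions(host, offload)))

# Separate local address for the offload engine: the 4-tuples cannot match.
offload_own_ip = Stack("offload", "198.51.100.10")
offload_own_ip.connect(40000, "192.0.2.20", 5000)
print(len(collisions(host, offload_own_ip)))
```

In this model giving the offload engine its own address removes the
overlap without any shared port bookkeeping, which is the idea behind the
separate iwarp address described above.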

Also, I think it behooves us to get some agreement on this approach
with NetEffect and Kanoj (NetXen?) at least, since their iwarp drivers
seem to be imminent.

 - R.


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Roland Dreier
  Well, if it involves /sharing/ port space with the native stack,
  i.e. where port 1234 is IB but 1235 is Linux, pretty much all the
  networking devs have NAK'd that approach AFAICS.

Just to be clear, InfiniBand has no problem; the issue is port
collisions involving iWARP connections.

 - R.


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-13 Thread Michael Chan
On Thu, 2007-09-13 at 14:11 -0700, Roland Dreier wrote:

 
 I've been meaning to track down the bnx2 iscsi offload patch to look
 and see if this issue is addressed, since the same problem seems to
 exist: it seems an iscsi connection and a main stack tcp connection
 might share the same 4-tuple unless something is done to avoid that
 happening.
 

iSCSI does not do passive listens, only active connections to the
target.  But you're right, the port space is still shared between iSCSI
and the main stack.  We currently rely on user apps binding to the main
stack to reserve certain ephemeral ports, and telling the iSCSI driver
which ports to use.
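
A minimal sketch of the reservation scheme described above, assuming that
an ordinary TCP socket bound on the main stack is what holds the port.
reserve_ephemeral_port is a hypothetical helper, and how the chosen port is
then handed to the iSCSI driver is not shown because the thread does not
describe that interface.

```python
import socket


def reserve_ephemeral_port(host="127.0.0.1"):
    """Bind a socket to port 0 so the main stack picks and holds a port.

    Returns the bound socket and the port number.  The port stays reserved
    on the main stack only while the socket remains open.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the stack to choose a free port
    return sock, sock.getsockname()[1]


holder, port = reserve_ephemeral_port()
print("reserved port", port)
holder.close()  # closing the socket releases the reservation
```

The reservation lasts only as long as the holding socket stays open, so a
user app following this scheme would need to keep it open for the lifetime
of the offloaded connection.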
