Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> I tested this by simulating a slow passive side responder, and it worked as expected for those tests. Using an MRA does add another MAD to the CM exchange, which is why it is sent only after seeing a duplicate request. Alternatively, we can take the OFED module parameter patch.

What the heck, I added this for 2.6.24. If it doesn't work out we can back it out.

- R.

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: InfiniBand/RDMA merge plans for 2.6.24
Roland Dreier [EMAIL PROTECTED] wrote on 09/17/2007 02:47:42 PM:

> > > IPoIB CM handles this properly by gathering together single pages in skbs' fragment lists.
> >
> > Then can we reuse IPoIB CM code here?
>
> Yes, if possible, refactoring things so that the rx skb allocation code becomes common between CM and non-CM would definitely make sense.

The IPoIB-CM rx skb allocation is not generic enough to be used by UD: it allocates more buffers than needed if the MTU is not 64K, and it doesn't query the real max_num_sg from the device. I am thinking of a generic skb allocation in IPoIB based on a matrix of (ipoib-mtu-size, page-size, max_num_sg, head-size).

Thanks
Shirley
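As a rough illustration of the sizing idea in the message above, here is a minimal sketch of how the four inputs could determine the layout of one receive buffer. The function name, the "linear head plus single pages" layout, and the rule that the head consumes one scatter/gather entry are assumptions made for this example, not the IPoIB code.

```python
def plan_rx_buffer(buf_len, page_size, max_num_sg, head_size):
    """Split one receive buffer into a linear head plus single-page fragments.

    Hypothetical helper: returns (head_bytes, n_page_frags) and raises
    ValueError if the device cannot scatter into that many entries.
    """
    head = min(buf_len, head_size)
    remaining = buf_len - head
    # Round up to whole pages; only order-0 (single page) allocations are used.
    n_pages = (remaining + page_size - 1) // page_size
    # Assumption: the linear head, if present, uses one scatter/gather entry.
    n_sg = n_pages + (1 if head > 0 else 0)
    if n_sg > max_num_sg:
        raise ValueError("buffer needs %d sg entries, device allows %d"
                         % (n_sg, max_num_sg))
    return head, n_pages


if __name__ == "__main__":
    # A 2K buffer fits in a small head plus one page.
    print(plan_rx_buffer(2048, 4096, 16, 128))
    # A 64K buffer with no linear head needs sixteen 4K pages.
    print(plan_rx_buffer(65536, 4096, 32, 0))
```

The point of computing the fragment count from the actual buffer length is that a 2K or 4K UD buffer would not be sized as if it were a 64K connected-mode buffer.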
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> > OK -- just to make sure I'm understanding what you're saying: have you confirmed that your proposed [CM MRA] patches actually fix the issue?
>
> Not directly. I cannot easily test kernel patches on our larger, production clusters. We've seen the issue with specific applications on 512 and 1024 cores, but I've only been able to test the patch on a 48-core cluster. I have verified that it successfully increases the timeout to where it *should* work, but cannot absolutely confirm that it will fix the problem. I'm unlikely to know that until the production clusters move to an OFED release (1.3?) containing this patch.

Umm... this is a difficult situation for me to merge the changes then. We're changing the CM retry behavior blind here. How do we know that the MRA changes don't make the scalability issue worse?

- R.
RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> Umm... this is a difficult situation for me to merge the changes then. We're changing the CM retry behavior blind here. How do we know that the MRA changes don't make the scalability issue worse?

What's currently upstream doesn't work for Intel MPI on our larger clusters. The connection requests time out on the active side before the passive side can respond. The OFED release works because it provides a kernel patch to make the timeout a module parameter. I'm trying to avoid adding a module parameter, and the MRA is designed for this situation.

I tested this by simulating a slow passive side responder, and it worked as expected for those tests. Using an MRA does add another MAD to the CM exchange, which is why it is sent only after seeing a duplicate request. Alternatively, we can take the OFED module parameter patch.

- Sean
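The timing argument above can be sketched with a toy model. This is not the IB CM state machine: the function, its parameters, and the simplification that a single MRA extends the active side's wait to "first timeout plus service timeout" are all assumptions chosen only to show why an MRA sent on a duplicate REQ can rescue a slow responder.

```python
def connect_succeeds(response_delay, req_timeout, max_retries,
                     mra_service_timeout=None):
    """Toy model of an active-side connection attempt (all times in seconds).

    Without an MRA the active side gives up after the initial REQ plus
    max_retries retransmissions have each timed out. With an MRA (an
    assumption of this sketch) the passive side answers the first duplicate
    REQ, which arrives at t = req_timeout, and the active side then waits
    a further mra_service_timeout.
    """
    deadline = req_timeout * (max_retries + 1)
    sees_duplicate = max_retries >= 1 and response_delay > req_timeout
    if mra_service_timeout is not None and sees_duplicate:
        deadline = max(deadline, req_timeout + mra_service_timeout)
    return response_delay <= deadline


if __name__ == "__main__":
    # Passive side needs two minutes; plain retries give up long before that.
    print(connect_succeeds(120, req_timeout=4, max_retries=3))
    # Same delay, but the duplicate REQ triggers an MRA asking for 300 s.
    print(connect_succeeds(120, 4, 3, mra_service_timeout=300))
```

In this model a fast responder never triggers the MRA at all, which matches the stated reason for sending it only after a duplicate request: the extra MAD is paid only on the slow path.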
Re: InfiniBand/RDMA merge plans for 2.6.24
> Missing from this list (IMPORTANT patch!):
> [ofa-general] [PATCH 2 of 2] IB/mlx4: Handle new FW requirement for send request prefetching, for WQE sg lists
> (Posted by me to list on Sept 4)
> {patch header:
> This is an addendum to Roland's commit 0e6e74162164d908edf7889ac66dca09e7505745 (June 18). This addendum adds prefetch headroom marking processing for s/g segments. We write s/g segments in reverse order into the WQE, in order to guarantee that the first dword of all cachelines containing s/g segments is written last (overwriting the headroom invalidation pattern). The entire cacheline will thus contain valid data when the invalidation pattern is overwritten.
> }

This actually looks like a bugfix that might even have been appropriate for 2.6.23. Roland, do you have this patch? Can you comment on it please?

-- MST
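The ordering property described in the patch header can be checked with a small model. The 64-byte cacheline and 16-byte segment size are assumptions of this sketch, and the helper names are hypothetical; the model only tracks the order in which segments are written, not real WQE contents.

```python
CACHELINE = 64   # assumed cacheline size in bytes
SEG_SIZE = 16    # assumed size of one s/g segment in bytes


def reversed_write_order(num_segs):
    """Segment indices in the order they are written: last segment first."""
    return list(range(num_segs - 1, -1, -1))


def first_dword_written_last(order, seg_base=0):
    """Check that, in every cacheline, the segment covering the first dword
    (where the invalidation pattern sits) is written after all other
    segments in that cacheline."""
    step_of = {seg: step for step, seg in enumerate(order)}
    by_line = {}
    for seg in order:
        line = (seg_base + seg * SEG_SIZE) // CACHELINE
        by_line.setdefault(line, []).append(seg)
    for segs in by_line.values():
        head = min(segs)  # lowest-addressed segment in this cacheline
        if (seg_base + head * SEG_SIZE) % CACHELINE != 0:
            continue      # this line's first dword is not an s/g segment
        if any(step_of[s] > step_of[head] for s in segs):
            return False
    return True


if __name__ == "__main__":
    print(first_dword_written_last(reversed_write_order(6)))  # reverse order
    print(first_dword_written_last(list(range(6))))           # forward order
```

With forward writes, the first dword of a cacheline becomes valid while later segments in the same line are still stale, which is the window the reverse ordering is meant to close.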
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
On Thursday 13 September 2007 20:57, Roland Dreier wrote:

> HW specific:
> - I already merged patches to enable MSI-X by default for mthca and mlx4. I hope there aren't too many systems that get hosed if a MSI-X interrupt is generated.
> - Jack and Michael's mlx4 FMR support. Will merge I guess, although I do hope to have time to address the DMA API abuse that is being copied from mthca, so that mlx4 and mthca work in Xen domU.
> - ehca patch queue. Will merge, pending fixes for the few minor issues I commented on.
> - Steve's mthca router mode support. Would be nice to see a review from someone at Mellanox.
> - Arthur's mthca doorbell alignment fixes. I will experiment with a few different approaches and post what I like (and fix mlx4 as well). I hope Arthur can review.
> - Michael's mlx4 WQE shrinking patch. Not sure yet; I'll reply to the latest patch directly.

Missing from this list (IMPORTANT patch!):
[ofa-general] [PATCH 2 of 2] IB/mlx4: Handle new FW requirement for send request prefetching, for WQE sg lists
(Posted by me to list on Sept 4)
{patch header:
This is an addendum to Roland's commit 0e6e74162164d908edf7889ac66dca09e7505745 (June 18). This addendum adds prefetch headroom marking processing for s/g segments. We write s/g segments in reverse order into the WQE, in order to guarantee that the first dword of all cachelines containing s/g segments is written last (overwriting the headroom invalidation pattern). The entire cacheline will thus contain valid data when the invalidation pattern is overwritten.
}

This patch series (1 of 2 is for libmlx4, the same issue).

Also, I'm now posting (in a separate post) the following patch to mlx4, which is important: display the following device information via sysfs: board_id, fw_ver, hw_rev, hca_type. The info is displayed under directory /sys/class/infiniband/mlx4_x, where x is the pci bus sequence number (starting from zero). This patch makes information available to ibstat and ibv_devinfo under the same directory as is used for tavor/arbel/sinai -- thus requiring no userspace modifications.

- Jack
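For readers who want to see what consuming those attributes looks like, here is a minimal sketch that reads the four files named above from the directory Jack describes. The function name and the choice to return None for a missing attribute are assumptions of this example; it is not how ibstat or ibv_devinfo are implemented.

```python
from pathlib import Path

# Attribute names and directory layout as given in the message above.
HCA_ATTRS = ("board_id", "fw_ver", "hw_rev", "hca_type")


def read_hca_info(device, sysfs_root="/sys/class/infiniband"):
    """Return a dict of the four device attributes for e.g. 'mlx4_0'.

    Hypothetical helper: an attribute that is absent or unreadable maps
    to None instead of raising.
    """
    info = {}
    for attr in HCA_ATTRS:
        path = Path(sysfs_root) / device / attr
        try:
            info[attr] = path.read_text().strip()
        except OSError:
            info[attr] = None
    return info


if __name__ == "__main__":
    # On a machine without the device every value is simply None.
    print(read_hca_info("mlx4_0"))
```

The sysfs_root parameter exists only so the function can be pointed at a scratch directory for testing.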
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
Hal Rosenstock wrote:

> Has anyone tested these with QoS actually being used? I suppose this requires Connect-X.

You can test it with a switch, without ConnectX. If you want the HCA to react to the QoS setting too, then you need ConnectX.

Tziporet
Re: InfiniBand/RDMA merge plans for 2.6.24
Quoting Roland Dreier [EMAIL PROTECTED]:
Subject: InfiniBand/RDMA merge plans for 2.6.24

> With 2.6.24 probably opening in the not-too-distant future, it's probably a good time to review what my plans are for when the merge window opens.

Roland, could you merge the common TX CQ patch please? It actually fixes a real problem.

-- MST
Re: InfiniBand/RDMA merge plans for 2.6.24
> Roland, could you merge the common TX CQ patch please? It actually fixes a real problem.

Yes, I will, but it collides with the net-2.6.24 NAPI rework I think, so it may not go in until a few days after the merge window.

Have you verified that the patch cures the interrupt overload issues?
Re: InfiniBand/RDMA merge plans for 2.6.24
Quoting Roland Dreier [EMAIL PROTECTED]:
Subject: Re: InfiniBand/RDMA merge plans for 2.6.24

> > Roland, could you merge the common TX CQ patch please? It actually fixes a real problem.
>
> Yes, I will, but it collides with the net-2.6.24 NAPI rework I think, so it may not go in until a few days after the merge window.
>
> Have you verified that the patch cures the interrupt overload issues?

Yes.

-- MST
Re: InfiniBand/RDMA merge plans for 2.6.24
> > IPoIB CM handles this properly by gathering together single pages in skbs' fragment lists.
>
> Then can we reuse IPoIB CM code here?

Yes, if possible, refactoring things so that the rx skb allocation code becomes common between CM and non-CM would definitely make sense.
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> The IGMP enabling patch posted by me on September 2nd isn't on your list
> http://lists.openfabrics.org/pipermail/general/2007-September/040250.html
> can you add it?

Yes, I lost that somehow. I will add it to my list of things to take a look at (no opinion yet).

- R.
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
Roland Dreier wrote:

> With 2.6.24 probably opening in the not-too-distant future, it's probably a good time to review what my plans are for when the merge window opens.

> Core:
> - Sean's QoS changes. These look fine at first glance, and I just plan to understand the backwards compatibility story (ie how this works with an old SM) and merge. Anyone who objects let me know.

Hi Roland,

I have reviewed the qos patches and provided comments which were incorporated in v2 of the series. I also tested it (ipoib and iser, which is rdma-cm based) against the Voltaire SM/SA to see that nothing was broken. I will send you a reviewed by: signature.

> ULPs:

[ofa-general] [PATCH RFC] IB/ipoib: enable IGMP for userpsace multicast IB apps

The IGMP enabling patch posted by me on September 2nd isn't on your list
http://lists.openfabrics.org/pipermail/general/2007-September/040250.html
can you add it?

> - Moni's IPoIB bonding support. This seems mostly an issue of getting the core bonding maintainer's attention. However getting a Reviewed-by: for the IPoIB changes wouldn't hurt too.

Jay Vosburgh, the bonding driver maintainer, just sent an ack on the whole patch series. As for the IPoIB changes, there are three patches, two of which, namely

[PATCH 02/11] IB/ipoib: Notify the world before doing unregister
[PATCH 04/11] IB/ipoib: Verify address handle validity on send

handle corner-case problems pointed out by Michael Tsirkin. Michael, will you be able to look at them and provide a reviewed-by signature? The third patch,

[PATCH 03/11] IB/ipoib: Bound the net device to the ipoib_neigh structue

is much simpler; I don't think more review is needed for it.

> - Eli and Michael's IPoIB stateless offload (checksum offload, LSO, LRO, etc). It's a big series that makes quite a few core changes. I think it needs some careful review and is probably at risk of missing this merge window. Sorting in order of invasiveness so we can merge at least some of it (if splitting it makes sense) might be a good idea.

Just for the record, the 'etc' above relates to the interrupt moderation support (mlx4, core, ipoib {config through ethtool, usage}). Among other things, what is not clear to me here is if/how this goes hand-in-hand with NAPI.

As you saw, the patch adding checksum offload support had a long thread, and I think the discussion has reached the point where Michael is waiting for your take on it.

As for the LSO, LRO patches, I did not see any review comment. I will see what I can review from the series; to begin with, I will send Eli some comments and questions.

> HW specific:
> - Jack and Michael's mlx4 FMR support. Will merge I guess, although I do hope to have time to address the DMA API abuse that is being copied from mthca, so that mlx4 and mthca work in Xen domU.

This patch series is rather important, as without it iser is useless over connectx. It would be nice if you merged this and, at most, fixed the abuse later.

Or.
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
Roland Dreier wrote:

> > I was about to post v2 of my patch to avoid port space collisions with the native stack. Can we get that 2.6.24? It is high priority IMO. I've tried to solicit review on it, but I think folks are reluctant... ;-)
>
> I would like to get this in, but I'm still at least a little reluctant, since we would be committing to a user interface that seems a little awkward at best, so I'd like to try and find something better. Just to summarize my understanding:
>
> - your patch requires the administration to configure an ethX:iwY alias address to use iwarp. (By the way is there anything other than don't do that that avoids assigning the same address to the iwarp alias and a non-iwarp interface?)

Nope. It's totally up to the admin to create the ethX:iwY interface -and- to segment his services so host TCP runs on the ethX subnet(s) and the iwarp rdma ones run on ethX:iwY subnet(s). Without changing the core network services, I don't see any way around this.

> - it would be nicer to create the alias automatically, but an alias without an address doesn't make sense. Creating a whole separate net device causes problems because the iwarp stuff still needs to use the main net device to do ARP etc.

I do log a warning if an iwarp application binds to address 0.0.0.0 and there are no ethX:iwY addresses available.

> - so I'm out of better ideas but I still want to push back a little before we commit to something ugly.

Me 2. :-(

> I've been meaning to track down the bnx2 iscsi offload patch to look and see if this issue is addressed, since the same problem seems to exist: it seems an iscsi connection and a main stack tcp connection might share the same 4-tuple unless something is done to avoid that happening.
>
> Also, I think it behooves us to get some agreement on this approach with NetEffect and Kanoj (NetXen?) at least, since their iwarp drivers seem to be imminent.
>
> - R.
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
On Thu, Sep 13, 2007 at 01:59:21PM -0500, Steve Wise ([EMAIL PROTECTED]) wrote:

> > Well, if it involves /sharing/ port space with the native stack, i.e. where port 1234 is IB but 1235 is Linux, pretty much all the networking devs have NAK'd that approach AFAICS.
>
> Jeff, I posted a fix that doesn't do this. No port sharing. The iwarp device will use its own ip address and subnet to avoid collisions. You should review the patch when I post v2.

Could you please resend it, since I missed it in [EMAIL PROTECTED]

--
Evgeniy Polyakov
Re: InfiniBand/RDMA merge plans for 2.6.24
> The patch is just needed to pick up the broadcast MTU size instead of hard coding 2K as is done right now. SKB allocation shouldn't be different from Ethernet jumbo frames or IPoIB-CM, which has a 64K MTU. I don't understand why it's different. Could you please explain this?

It's exactly the same problem as ethernet jumbo frames. A web search for 'order 1 failure e1000' might be interesting. IPoIB CM handles this properly by gathering together single pages in skbs' fragment lists.

- R.
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> > I've been meaning to track down the bnx2 iscsi offload patch to look and see if this issue is addressed, since the same problem seems to exist: it seems an iscsi connection and a main stack tcp connection might share the same 4-tuple unless something is done to avoid that happening.
>
> iSCSI does not do passive listens, only active connections to the target. But you're right, the port space is still shared between iSCSI and the main stack. We currently rely on user apps binding to the main stack to reserve certain ephemeral ports, and telling the iSCSI driver which ports to use.

Got it... I wasn't thinking that clearly, but it is clear that a full 4-tuple collision with only active connections is quite unlikely. I guess you would have to make both an offloaded and a non-offloaded iSCSI connection to the same target and get really unlucky with ephemeral port allocation. So in practice I guess it's not an issue at all with your driver yet.

However, do you have any plans to support iSCSI offload for targets?

Also, looking at the first CNIC patch, I can't help but notice that you seem to have at least some support for iWARP there. How does the CNIC look? Does it share the same interface/addresses as the non-offload NIC, or does it create a completely separate netdevice?

I want to make sure that whatever solution we come up with for cxgb3 doesn't cause problems for you. And of course if you have a better idea than what Steve has come up with, that would be great :)

- R.
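To make the collision being discussed concrete, here is a minimal sketch that treats a connection as a (source address, source port, destination address, destination port) tuple and looks for tuples claimed by both the host stack and an offload engine. The names are hypothetical and the addresses are made-up examples; nothing here reflects how cxgb3 or bnx2 track connections.

```python
from collections import namedtuple

# A TCP connection is identified by its 4-tuple.
FourTuple = namedtuple("FourTuple", "src_ip src_port dst_ip dst_port")


def find_collisions(host_conns, offload_conns):
    """Return the 4-tuples claimed by both the host stack and the offload
    engine. Hypothetical helper for illustration only."""
    return sorted(set(host_conns) & set(offload_conns))


if __name__ == "__main__":
    host = [FourTuple("10.0.0.1", 40000, "10.0.0.9", 3260)]
    # Offload engine sharing the NIC's address: same ephemeral port collides.
    shared_addr = [FourTuple("10.0.0.1", 40000, "10.0.0.9", 3260)]
    # Offload engine on its own address and subnet: same port, no collision.
    own_addr = [FourTuple("10.0.1.1", 40000, "10.0.0.9", 3260)]
    print(find_collisions(host, shared_addr))
    print(find_collisions(host, own_addr))
```

The second case is the reason a separate address for the offload device sidesteps the problem: the source address differs, so no choice of ports can produce an identical 4-tuple.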
RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> OK -- just to make sure I'm understanding what you're saying: have you confirmed that your proposed patches actually fix the issue?

Not directly. I cannot easily test kernel patches on our larger, production clusters. We've seen the issue with specific applications on 512 and 1024 cores, but I've only been able to test the patch on a 48-core cluster. I have verified that it successfully increases the timeout to where it *should* work, but cannot absolutely confirm that it will fix the problem. I'm unlikely to know that until the production clusters move to an OFED release (1.3?) containing this patch.

- Sean
Re: InfiniBand/RDMA merge plans for 2.6.24
> IPoIB CM handles this properly by gathering together single pages in skbs' fragment lists.
>
> - R.

Then can we reuse IPoIB CM code here?

Thanks
Shirley
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
On Fri, 2007-09-14 at 09:18 -0700, Roland Dreier wrote:

> However, do you have any plans to support iSCSI offload for targets?
>
> Also, looking at the first CNIC patch, I can't help but notice that you seem to have at least some support for iWARP there. How does the CNIC look? Does it share the same interface/addresses as the non-offload NIC, or does it create a completely separate netdevice?

We will support iWARP in the future and it should be similar to the way we do iSCSI - using the same interface/addresses as the bnx2 NIC.

> I want to make sure that whatever solution we come up with for cxgb3 doesn't cause problems for you. And of course if you have a better idea than what Steve has come up with, that would be great :)

We are looking at these discussions with great interest. If we have any new ideas, we'll definitely let everyone know. Thanks.
InfiniBand/RDMA merge plans for 2.6.24
With 2.6.24 probably opening in the not-too-distant future, it's probably a good time to review what my plans are for when the merge window opens.

At the kernel summit, we discussed patch review (doing a web search for kernel summit reviewed-by: should turn up lots of info on this). Due to an unfortunate combination of vacation and conference travel, summer colds, and other inconveniences, I am very backed up on reviewing. And in any case, I've allowed too much code review to be dumped on me -- when there are dozens of people working on IB and RDMA stuff, it obviously doesn't work to expect me to do all the reviewing.

Unfortunately, due to the length of the backlog and the fact that 2.6.23 seems fairly close, some of the things listed below are going to miss the 2.6.24 merge window. So, although the plan is to phase in requiring Reviewed-by: gently, for this merge, if you can get someone other than me to review your work, then the chances of it being merged increase dramatically. I'm talking about a real review-- ideally, someone independent (from another company would be good) who is willing to provide a Reviewed-by: line that means the reviewer has really looked at and thought about the patch. There should be a mailing list thread you can point me at where the reviewer comments on the patch and a new version of that patch addressing all comments is posted (or in exceptional cases, where the patch is perfect to start with, where the reviewer says the patch is great).

For example, given the number of IPoIB changes pending, it might be a good idea for the people submitting them to get together and trade reviews (ie If you review my patch, I'll review your patch).

There are a few cases where getting a review may not be necessary. First of all, trivial and obvious patches don't need a review. It's a judgement call what is trivial or obvious, and it's always a good idea to provide a changelog that makes it clear why a patch is trivial and obviously correct. Second, hardware driver patches may not make sense to anyone outside of the company whose hardware the driver is for. Still, in this case, an internal Reviewed-by: would be nice, and also a changelog that explains the reason for the change always helps (don't just tell me what your patch does, but also explain what the patch fixes and what the impact of the current situation is).

Anyway, here are all the pending things that I'm aware of. As usual, if something isn't already in my tree and isn't listed below, I probably missed it or dropped it by mistake. Please remind me again in that case.

Core:

 - My user_mad P_Key index support patch. I'll test the ioctl to change to the new mode and merge this I guess, since Hal and Sean have tested this out.

 - A fix to the user_mad 32-bit big-endian userspace 64/32 problem with the method_mask when registering agents. I'll write a patch to handle this in a way that doesn't change the ABI for anything other than the broken case and hope to get someone to review this so it can be merged.

 - Sean's QoS changes. These look fine at first glance, and I just plan to understand the backwards compatibility story (ie how this works with an old SM) and merge. Anyone who objects let me know.

 - Sean's IB CM MRA interface changes. Don't know at this point. It seems OK but I'm not clear on what if any real-world improvement this gives us.

ULPs:

 - Pradeep's IPoIB CM support for devices that don't have SRQs. I think the basic approach makes sense (I don't think faking SRQs at some other layer is really feasible) and I need to find time to look at the details to see if the current patch looks workable. I'm likely to merge this; getting an independent Reviewed-by: would certainly be appreciated too.

 - Moni's IPoIB bonding support. This seems mostly an issue of getting the core bonding maintainer's attention. However getting a Reviewed-by: for the IPoIB changes wouldn't hurt too.

 - Rolf's IPoIB MGID scope changes. Certainly we want to fix this issue but the specific changes need review.

 - Eli and Michael's IPoIB stateless offload (checksum offload, LSO, LRO, etc). It's a big series that makes quite a few core changes. I think it needs some careful review and is probably at risk of missing this merge window. Sorting in order of invasiveness so we can merge at least some of it (if splitting it makes sense) might be a good idea.

HW specific:

 - I already merged patches to enable MSI-X by default for mthca and mlx4. I hope there aren't too many systems that get hosed if a MSI-X interrupt is generated.

 - Jack and Michael's mlx4 FMR support. Will merge I guess, although I do hope to have time to address the DMA API abuse that is being copied from mthca, so that mlx4 and mthca work in Xen domU.

 - ehca patch queue. Will merge, pending fixes for the few minor issues I commented on.

 - Steve's mthca router mode
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
Hey Roland, I was about to post v2 of my patch to avoid port space collisions with the native stack. Can we get that 2.6.24? It is high priority IMO. I've tried to solicit review on it, but I think folks are reluctant... ;-) Steve. Roland Dreier wrote: With 2.6.24 probably opening in the not-too-distant future, it's probably a good time to review what my plans are for when the merge window opens. At the kernel summit, we discussed patch review (doing a web search for kernel summit reviewed-by: should turn up lots of info on this). Due to an unfortunate combination of vacation and conference travel, summer colds, and other inconveniences, I am very backed up on reviewing. And in any case, I've allowed too much code review to be dumped on me -- when there are dozens of people working on IB and RDMA stuff, it obviously doesn't work to expect me to do all the reviewing. Unfortunately, due to the length of the backlog and the fact that 2.6.23 seems fairly close, some of the things listed below are going to miss the 2.6.24 merge window. So, although the plan is to phase in requiring Reviewed-by: gently, for this merge, if you can get someone other than me to review your work, then the chances of it being merged increase dramatically. I'm talking about a real review-- ideally, someone independent (from another company would be good) who is willing to provide a Reviewed-by: line that means the reviewer has really looked at and thought about the patch. There should be a mailing list thread you can point me at where the reviewer comments on the patch and a new version of that patch addressing all comments is posted (or in exceptional cases, where the patch is perfect to start with, where the reviewer says the patch is great). For example, given the number of IPoIB changes pending, it might be a good idea for the people submitting them to get together and trade reviews (ie If you review my patch, I'll review your patch). 
RE: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> - My user_mad P_Key index support patch. I'll test the ioctl to change to the new mode and merge this I guess, since Hal and Sean have tested this out.

I can give this patch a reviewed-by: too, and I will also try to review a couple of the pending ipoib patches.

> - Sean's QoS changes. These look fine at first glance, and I just plan to understand the backwards compatibility story (ie how this works with an old SM) and merge. Anyone who objects let me know.

The new QoS fields fall into fields that are currently reserved, which should be ignored by an older SM. I've only tested this against openSM however.

> - Sean's IB CM MRA interface changes. Don't know at this point. It seems OK but I'm not clear on what if any real-world improvement this gives us.

This patch was generated in response to an Intel MPI issue. We've seen MPI take several minutes to respond to a connection request during the middle of large application runs. When this happens, the active side times out the connection. In OFED, we added module parameters to adjust the rdma_cm connection timeout on the active side, but I believe that sending an MRA from the passive side is a better solution.

- Sean
Re: InfiniBand/RDMA merge plans for 2.6.24
Hello Roland,

Since ehca can support a 4K MTU, we would like to see a patch in IPoIB for the 2.6.24 kernel that allows the link MTU to be up to 4K instead of the current 2K. The idea is that the IPoIB link MTU will pick up the value returned for the SM's default broadcast MTU. This should be a small patch; I hope you are OK with this.

Thanks
Shirley
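
To make the proposal above concrete, here is a minimal sketch of how a link MTU could be derived from the broadcast group's MTU. The IB MTU codes and the 4-byte IPoIB encapsulation header are standard; capping by the HCA's own maximum and the function name are assumptions for illustration, not a description of Shirley's patch.

```python
# IB MTU enumeration codes mapped to payload sizes in bytes.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}
IPOIB_ENCAP_LEN = 4  # IPoIB encapsulation header, in bytes


def ipoib_link_mtu(broadcast_mtu_code, hca_max_mtu_code):
    """Hypothetical helper: link MTU from the SM's broadcast-group MTU.

    Assumption: the result is capped by what the local HCA supports.
    """
    code = min(broadcast_mtu_code, hca_max_mtu_code)
    return IB_MTU_BYTES[code] - IPOIB_ENCAP_LEN


print(ipoib_link_mtu(4, 5))  # 2K broadcast group on a 4K-capable HCA
print(ipoib_link_mtu(5, 5))  # 4K broadcast group on a 4K-capable HCA
```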
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
Steve Wise wrote:
> I was about to post v2 of my patch to avoid port space collisions with the native stack. Can we get that into 2.6.24? It is high priority IMO. I've tried to solicit review on it, but I think folks are reluctant... ;-)

Well, if it involves /sharing/ port space with the native stack, i.e. where port 1234 is IB but 1235 is Linux, pretty much all the networking devs have NAK'd that approach AFAICS.

Jeff
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
Jeff Garzik wrote:
> Steve Wise wrote:
> > I was about to post v2 of my patch to avoid port space collisions with the native stack. Can we get that into 2.6.24? It is high priority IMO. I've tried to solicit review on it, but I think folks are reluctant... ;-)
>
> Well, if it involves /sharing/ port space with the native stack, i.e. where port 1234 is IB but 1235 is Linux, pretty much all the networking devs have NAK'd that approach AFAICS.

Jeff, I posted a fix that doesn't do this. No port sharing. The iWARP device will use its own IP address and subnet to avoid collisions. You should review the patch when I post v2.

Thanks,
Steve.
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
Steve Wise wrote:
> Jeff Garzik wrote:
> > Steve Wise wrote:
> > > I was about to post v2 of my patch to avoid port space collisions with the native stack. Can we get that into 2.6.24? It is high priority IMO. I've tried to solicit review on it, but I think folks are reluctant... ;-)
> >
> > Well, if it involves /sharing/ port space with the native stack, i.e. where port 1234 is IB but 1235 is Linux, pretty much all the networking devs have NAK'd that approach AFAICS.
>
> Jeff, I posted a fix that doesn't do this. No port sharing. The iWARP device will use its own IP address and subnet to avoid collisions. You should review the patch when I post v2.

Sounds promising, then! :)

Jeff
Re: InfiniBand/RDMA merge plans for 2.6.24
> Since ehca can support a 4K MTU, we would like to see a patch in IPoIB for the 2.6.24 kernel that allows the link MTU to be up to 4K instead of the current 2K. The idea is that the IPoIB link MTU will pick up the value returned for the SM's default broadcast MTU. This should be a small patch; I hope you are OK with this.

It's actually not small, since it turns the skb allocation into a 4100-byte buffer, which usually ends up being more than one page, which means it fails if memory is fragmented.

Anyway, given the backlog, anything substantial that hasn't been posted already is almost surely going to have to wait until 2.6.25.
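
The page arithmetic behind that objection can be sketched as follows. The 4100-byte figure is the one quoted above; the 4096-byte default page size is an assumption (it is what "usually" refers to on most platforms), and the helper names are made up for illustration.

```python
def pages_needed(buf_bytes, page_size=4096):
    """Number of whole pages required to hold buf_bytes (ceiling division)."""
    return -(-buf_bytes // page_size)


def alloc_order(buf_bytes, page_size=4096):
    """Smallest order such that 2**order contiguous pages hold buf_bytes.

    Any order above 0 needs physically contiguous pages, which is the kind
    of request that can fail once memory is fragmented.
    """
    pages = pages_needed(buf_bytes, page_size)
    return (pages - 1).bit_length()


print(alloc_order(2048))         # a 2K-MTU-sized buffer with 4K pages
print(alloc_order(4100))         # the 4100-byte buffer with 4K pages
print(alloc_order(4100, 65536))  # the same buffer with 64K pages (assumed config)
```

With 4K pages the 4100-byte buffer spills just past one page and forces an order-1 allocation, while with 64K pages it still fits in a single page.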
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> > - My user_mad P_Key index support patch. I'll test the ioctl to change to the new mode and merge this I guess, since Hal and Sean have tested this out.
>
> I can give this patch a Reviewed-by: too, and I will also try to review a couple of the pending IPoIB patches.

Thanks!

> > - Sean's QoS changes. These look fine at first glance, and I just plan to understand the backwards compatibility story (i.e. how this works with an old SM) and merge. Anyone who objects let me know.
>
> The new QoS fields fall into fields that are currently reserved, which should be ignored by an older SM. I've only tested this against OpenSM, however.

That seems OK -- I'm OK with breaking things if an SM is clearly buggy (and not ignoring fields that are defined to be ignored in the spec would certainly be a clear bug to me).

> This patch was generated in response to an Intel MPI issue. We've seen MPI take several minutes to respond to a connection request during the middle of large application runs. When this happens, the active side times out the connection. In OFED, we added module parameters to adjust the rdma_cm connection timeout on the active side, but I believe that sending an MRA from the passive side is a better solution.

OK -- just to make sure I'm understanding what you're saying: have you confirmed that your proposed patches actually fix the issue?

- R.
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> I was about to post v2 of my patch to avoid port space collisions with the native stack. Can we get that into 2.6.24? It is high priority IMO. I've tried to solicit review on it, but I think folks are reluctant... ;-)

I would like to get this in, but I'm still at least a little reluctant, since we would be committing to a user interface that seems a little awkward at best, so I'd like to try and find something better. Just to summarize my understanding:

- your patch requires the administrator to configure an ethX:iwY alias address to use iWARP. (By the way, is there anything other than "don't do that" that avoids assigning the same address to the iWARP alias and a non-iWARP interface?)
- it would be nicer to create the alias automatically, but an alias without an address doesn't make sense. Creating a whole separate net device causes problems because the iWARP stuff still needs to use the main net device to do ARP etc.
- so I'm out of better ideas but I still want to push back a little before we commit to something ugly.

I've been meaning to track down the bnx2 iSCSI offload patch to look and see if this issue is addressed, since the same problem seems to exist: it seems an iSCSI connection and a main stack TCP connection might share the same 4-tuple unless something is done to avoid that happening.

Also, I think it behooves us to get some agreement on this approach with NetEffect and Kanoj (NetXen?) at least, since their iWARP drivers seem to be imminent.

- R.
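
For readers following the collision argument, a small model of the problem: the host stack and the offload device each track connections on their own, so nothing stops both from using the same 4-tuple unless their local addresses differ. All addresses and ports below are made-up documentation values, and the helper is illustrative rather than anything from Steve's patch.

```python
from collections import namedtuple

FourTuple = namedtuple("FourTuple", "local_ip local_port remote_ip remote_port")


def colliding(host_conns, offload_conns):
    """Return the 4-tuples claimed by both the host stack and the offload device."""
    return set(host_conns) & set(offload_conns)


host = [FourTuple("192.0.2.10", 40000, "192.0.2.20", 3260)]

# Offload device sharing the host's address: the same tuple can be picked twice.
shared_addr = [FourTuple("192.0.2.10", 40000, "192.0.2.20", 3260)]

# Offload device on its own alias address and subnet: tuples can never match.
alias_addr = [FourTuple("198.51.100.10", 40000, "192.0.2.20", 3260)]

print(colliding(host, shared_addr))
print(colliding(host, alias_addr))
```

The alias approach works only as long as the alias address is not also assigned to a non-iWARP interface, which is the question raised in the parenthetical above.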
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
> Well, if it involves /sharing/ port space with the native stack, i.e. where port 1234 is IB but 1235 is Linux, pretty much all the networking devs have NAK'd that approach AFAICS.

Just to be clear, InfiniBand has no problem; the issue is port collisions involving iWARP connections.

- R.
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
On Thu, 2007-09-13 at 14:11 -0700, Roland Dreier wrote:
> I've been meaning to track down the bnx2 iSCSI offload patch to look and see if this issue is addressed, since the same problem seems to exist: it seems an iSCSI connection and a main stack TCP connection might share the same 4-tuple unless something is done to avoid that happening.

iSCSI does not do passive listens, only active connections to the target. But you're right, the port space is still shared between iSCSI and the main stack. We currently rely on user apps binding to the main stack to reserve certain ephemeral ports, and telling the iSCSI driver which ports to use.
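
The reservation trick described above can be sketched from user space roughly like this. It is only an illustration of the general idea (bind a socket on the main stack and keep it open so the stack will not hand that port to anyone else); the loopback address is an assumption for the demo, and how the port number is then passed to the iSCSI driver is not shown because the thread does not describe that interface.

```python
import socket


def reserve_ephemeral_port(addr="127.0.0.1"):
    """Bind to port 0 so the main stack picks a free ephemeral port.

    The caller must keep the returned socket open; closing it releases
    the port back to the stack.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((addr, 0))
    return s, s.getsockname()[1]


holder, port = reserve_ephemeral_port()
# 'port' is what would be handed to the offload driver (mechanism not shown).
print(port)
```

The weakness of this scheme is that it depends on cooperation: nothing in the stack itself knows the port is in use by an offload engine once the holder socket goes away.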