Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
On 2/7/2010 6:39 PM, Steve Wise wrote:
> If ofed-1.5.1 is based on 2.6.33 then it will get this patch automatically (assuming it goes upstream and makes 2.6.33). Or we can pull it in as a kernel_patches/fixes/ patch.

OFED 1.5.1 is not based on 2.6.33, but on 2.6.30, so we need the patch under fixes. Steve - can you prepare such a patch?

Tziporet
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
Tziporet Koren wrote:
> On 2/7/2010 6:39 PM, Steve Wise wrote:
>> If ofed-1.5.1 is based on 2.6.33 then it will get this patch automatically (assuming it goes upstream and makes 2.6.33). Or we can pull it in as a kernel_patches/fixes/ patch.
>
> OFED 1.5.1 is not based on 2.6.33, but on 2.6.30, so we need the patch under fixes. Steve - can you prepare such a patch?
>
> Tziporet

The reason I thought it was based on 2.6.33 is that I see 2.6.33 git tags in the ofed kernel tree. I misinterpreted what that meant. I can develop a patch, but it will disable _all_ 127.0.0.1 binds. Otherwise openmpi is still broken on IB.

Steve.
Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
On 2/7/2010 3:22 AM, Steve Wise wrote:
> Good catch, I'll update the patch and submit for 2.6.33 on Monday. NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.

If this patch will be accepted to the kernel 2.6.33 we can take it too

Tziporet
Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
Tziporet Koren wrote:
> On 2/7/2010 3:22 AM, Steve Wise wrote:
>> Good catch, I'll update the patch and submit for 2.6.33 on Monday. NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.
>
> If this patch will be accepted to the kernel 2.6.33 we can take it too

If ofed-1.5.1 is based on 2.6.33 then it will get this patch automatically (assuming it goes upstream and makes 2.6.33). Or we can pull it in as a kernel_patches/fixes/ patch.

My point, though, is that even with this patch in ofed-1.5.1, we still have an openmpi/IB/rdmacm regression. The only way to avoid this regression without changing openmpi is to disallow _all_ rdma binds to 127.0.0.1.

Steve.

> Tziporet
Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
> My point, though, is that even with this patch in ofed-1.5.1, we still have an openmpi/IB/rdmacm regression. The only way to avoid this regression without changing openmpi is to disallow _all_ rdma binds to 127.0.0.1.

Can you identify the source of the regression? ie what was the change that broke things?

I'm most concerned that there is another regression in 2.6.33, and if so I would like to try and avoid letting that get into the final release.
--
Roland Dreier rola...@cisco.com
Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
Roland Dreier wrote:
>> My point, though, is that even with this patch in ofed-1.5.1, we still have an openmpi/IB/rdmacm regression. The only way to avoid this regression without changing openmpi is to disallow _all_ rdma binds to 127.0.0.1.
>
> Can you identify the source of the regression? ie what was the change that broke things?

It is the same commit you cited earlier. It enables binding rdma cm_ids to 127.0.0.1. Sean's proposed patch on top of that disables this only for iwarp devices.

> I'm most concerned that there is another regression in 2.6.33, and if so I would like to try and avoid letting that get into the final release.
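The two options discussed in this thread can be contrasted with a small sketch. This is illustrative Python, not the kernel code: the function names and the transport strings "ib" and "iwarp" are assumptions made for the example, and "loopback" is taken to mean the whole 127.0.0.0/8 range (plus ::1) rather than only 127.0.0.1.

```python
import ipaddress


def allow_bind_iwarp_only_block(transport, address):
    """Hypothetical policy: reject loopback binds only on iWARP devices.

    Models the proposed patch as it is described in the thread.
    """
    is_loopback = ipaddress.ip_address(address).is_loopback
    return not (is_loopback and transport == "iwarp")


def allow_bind_block_all_loopback(transport, address):
    """Hypothetical policy: reject every loopback bind, whatever the device.

    Models the "disallow _all_ rdma binds to 127.0.0.1" option.
    """
    return not ipaddress.ip_address(address).is_loopback
```

Under the first policy a bind to 127.0.0.1 on an IB device still succeeds, which is the case that leaves openmpi broken on IB; only the second policy makes that bind fail again.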
RE: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
> Can you identify the source of the regression? ie what was the change that broke things?

My understanding is that support for loopback addresses exposes an existing bug in openmpi. It tries to bind to 127.0.0.1, which now succeeds. Openmpi passes that address to a remote node for use in connections.

> I'm most concerned that there is another regression in 2.6.33, and if so I would like to try and avoid letting that get into the final release.

Unless we never support loopback addresses, openmpi will see a regression.

The only other problem that I'm aware of for 2.6.33 is that the bind to a loopback address will succeed, even though the RDMA device may not support loopback. This is true for the Chelsio and Ammasso drivers. Connections should still fail, but the bind is basically useless in this case. I will try to get a patch for that tomorrow.

- Sean
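To illustrate the application-side problem described above, here is a minimal sketch in Python. It is not openmpi's actual code; the helper name advertisable_addresses is hypothetical. The idea is that an address which bound successfully is not necessarily one that should be sent to a remote node: a loopback address would make the remote node connect to itself.

```python
import ipaddress


def advertisable_addresses(bound_addresses):
    """Return the bound addresses that are safe to send to a remote peer.

    Hypothetical helper: loopback addresses (127.0.0.0/8 and ::1) are
    dropped because they only identify the local host.
    """
    return [
        addr for addr in bound_addresses
        if not ipaddress.ip_address(addr).is_loopback
    ]
```

For example, given the list ["127.0.0.1", "192.168.1.10"], only "192.168.1.10" would be passed on to the remote node.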