Re: [PATCH 02/13] drivers/infiniband: Remove unnecessary casts of private_data
On Tue, 7 Sep 2010, Ralph Campbell wrote:

> Acked-by: Ralph Campbell <ralph.campb...@qlogic.com>

Applied, thanks.

> > Signed-off-by: Joe Perches <j...@perches.com>
> > ---
> >  drivers/infiniband/hw/qib/qib_file_ops.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
> > index 6b11645..cef5d67 100644
> > --- a/drivers/infiniband/hw/qib/qib_file_ops.c
> > +++ b/drivers/infiniband/hw/qib/qib_file_ops.c
> > @@ -1722,7 +1722,7 @@ static int qib_close(struct inode *in, struct file *fp)
> >  	mutex_lock(&qib_mutex);
> > -	fd = (struct qib_filedata *) fp->private_data;
> > +	fd = fp->private_data;
> >  	fp->private_data = NULL;
> >  	rcd = fd->rcd;
> >  	if (!rcd) {
> > @@ -1808,7 +1808,7 @@ static int qib_ctxt_info(struct file *fp, struct qib_ctxt_info __user *uinfo)
> >  	struct qib_ctxtdata *rcd = ctxt_fp(fp);
> >  	struct qib_filedata *fd;
> > -	fd = (struct qib_filedata *) fp->private_data;
> > +	fd = fp->private_data;
> >  	info.num_active = qib_count_active_units();
> >  	info.unit = rcd->dd->unit;

-- 
Jiri Kosina
SUSE Labs, Novell Inc.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
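For readers wondering why the casts in this patch are unnecessary: `private_data` in `struct file` is a `void *`, and C converts `void *` to any object pointer type implicitly. A minimal userspace sketch follows; `struct file_like`, `struct filedata` and `get_ctxt()` are hypothetical stand-ins for illustration, not the kernel or qib definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a driver's per-open data (e.g. qib_filedata). */
struct filedata {
	int ctxt;
};

/* Hypothetical stand-in for struct file; only the opaque pointer matters. */
struct file_like {
	void *private_data;
};

/* No cast needed: void * converts implicitly to struct filedata * in C. */
static int get_ctxt(struct file_like *fp)
{
	struct filedata *fd = fp->private_data;

	return fd ? fd->ctxt : -1;
}
```

Note that this holds for C only; a C++ compiler would reject the assignment without a cast.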
Re: idr_get_new_exact ?
On Mon, Sep 20, 2010 at 11:26:47PM +0200, Tejun Heo wrote:
> Hello,
>
> On 09/20/2010 10:35 PM, Roland Dreier wrote:
> > Looks fine to me as an improvement over the status quo, but I wonder
> > how many of these places could use the radix_tree stuff instead? If
> > you're not using the ability of the idr code to assign an id for you,
> > then it seems the radix_tree API is a better fit.
>
> I agree. Wouldn't those users be better off simply using radix tree?

It could go either way. I was about to write the same function when
playing with it for IRQ mapping, the idea being to populate the initial
tree with sparse static vectors and then switch over to dynamic IDs for
virtual IRQ creation. I ended up going with a radix tree for other
reasons, though.
Re: idr_get_new_exact ?
Hello,

On 09/23/2010 01:42 PM, Paul Mundt wrote:
> On Mon, Sep 20, 2010 at 11:26:47PM +0200, Tejun Heo wrote:
> > Hello,
> >
> > On 09/20/2010 10:35 PM, Roland Dreier wrote:
> > > Looks fine to me as an improvement over the status quo, but I
> > > wonder how many of these places could use the radix_tree stuff
> > > instead? If you're not using the ability of the idr code to assign
> > > an id for you, then it seems the radix_tree API is a better fit.
> >
> > I agree. Wouldn't those users be better off simply using radix tree?
>
> It could go either way. I was about to write the same function when
> playing with it for IRQ mapping, the idea being to populate the
> initial tree with sparse static vectors and then switch over to
> dynamic IDs for virtual IRQ creation. I ended up going with a radix
> tree for other reasons, though.

I see. If there are use cases where fixed and dynamic IDs need to be
mixed, no objection from me.

Thanks.

-- 
tejun
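To make the "fixed and dynamic IDs mixed" use case concrete, here is a small userspace sketch. It is not the idr or radix_tree API; `struct id_table`, `id_insert_exact()` and `id_alloc_above()` are hypothetical names, and a flat array stands in for the real tree. It only illustrates the two operations being discussed: claiming a caller-chosen ID, and letting the allocator pick one.

```c
#include <stddef.h>

#define ID_MAX 64

/* Hypothetical fixed-size ID table; a NULL slot is free, so stored
 * pointers must be non-NULL. */
struct id_table {
	void *slot[ID_MAX];
};

/* Claim exactly `id`; returns 0 on success, -1 if out of range or taken. */
static int id_insert_exact(struct id_table *t, int id, void *ptr)
{
	if (id < 0 || id >= ID_MAX || t->slot[id] != NULL)
		return -1;
	t->slot[id] = ptr;
	return 0;
}

/* Hand out the lowest free ID at or above `start`; returns -1 if full. */
static int id_alloc_above(struct id_table *t, int start, void *ptr)
{
	for (int id = start < 0 ? 0 : start; id < ID_MAX; id++) {
		if (t->slot[id] == NULL) {
			t->slot[id] = ptr;
			return id;
		}
	}
	return -1;
}
```

A caller that only ever uses the first function never needs the allocator half, which is roughly the argument made above for the radix tree in that case.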
Re: software iwarp stack update
linux-rdma-ow...@vger.kernel.org wrote on 09/22/2010 10:42:18 PM:

> On 09/22/2010 03:35 PM, Nicholas A. Bellinger wrote:
> > On Wed, 2010-09-22 at 10:19 +0200, Bernard Metzler wrote:
> > > Earlier this year, we announced the availability of an open
> > > source, full software implementation of the iWARP RDMA protocol
> > > stack - see my email "software iwarp stack" from March 14th at the
> > > linux-rdma list
> > > (http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg02940.html)
> > >
> > > While since then working on performance and stability, we provided
> > > some source code updates. Current user and kernel code is
> > > available at gitorious.org/softiwarp. Please see the CHANGES file
> > > in the kernel/ directory for a summary of the most recent changes.
> > > For more convenient testing, the latest update now allows for a
> > > stand-alone build of the kernel module without full kernel source
> > > code access. We tested the code with kernel version 2.6.34. If you
> > > are interested in a full software RDMA stack on Ethernet, please
> > > try it out.
> > >
> > > In the hope of providing useful information, I put
> > > net...@vger.kernel.org on copy. Subscribers of this list, please
> > > put me on private cc in case you reply or comment, since I am not
> > > subscribed to the list. We would be more than happy if you netdev
> > > folks would consider a hardware independent RDMA kernel service as
> > > something useful and potentially to be integrated into the
> > > mainline network stack.
> > >
> > > Why might it be useful? A software RDMA stack makes the semantic
> > > advantages of asynchronous and one-sided communication available
> > > while obsoleting the need to deploy dedicated RDMA hardware or any
> > > protocol offloading (while not matching the lowest delay numbers
> > > of real RDMA hardware). Implementing the IETF's iWARP protocol
> > > stack on top of TCP kernel sockets, softiwarp integrates with the
> > > open fabrics environment and thus exports the RDMA kernel and user
> > > verbs interface. The efficiency of the Linux TCP/IP network stack
> > > together with intrinsic advantages of the RDMA communication model
> > > (async. posting of work and reaping of work completions, transfer
> > > of send buffer ownership to the kernel which enables zero copy
> > > transmit, peer data placement without application scheduling,
> > > one-sided remote read operations etc.) can result in improved
> > > application-to-application performance and less CPU load, while
> > > using the unchanged kernel TCP stack. A software RDMA stack might
> > > promote wider RDMA deployment, since when using the host TCP
> > > stack, it enables RDMA semantic independent of dedicated hardware.
> > > softiwarp peers with real RNICs (tested with Chelsio's T3
> > > adapter).
> > >
> > > softiwarp is still work in progress and we are very thankful for
> > > any suggestions/comments/bug reports. Please advise how we should
> > > proceed to bring the stack further to your attention. Would it be
> > > useful to provide patches against the current stable kernel
> > > version or the next release candidate?
> >
> > Hi Bernard,
> >
> > So what I would recommend doing here to make things more appealing
> > to DaveM and other interested NetDev folks would be to clone a
> > separate tree from the net-2.6.git or net-next-2.6.git repositories
> > and include the softiwarp/kernel.git code into a fresh 'in-kernel'
> > clone tracking the latest netdev code, and then keep git rebase'ing
> > against DaveM's last changes and update your local tree to the
> > latest netdev code.
> >
> > Of course you will want to remove all of the 'out of tree'
> > LINUX_VERSION build macros and any other legacy bits to follow
> > mainline kernel convention for your 'in-kernel' softiwarp tree.
>
> And then post a patch series for review.

All,

Yes, ok, that's what I will do now.
Many thanks for the helpful and encouraging replies.

Bernard.
Re: [PATCH 1/2] siw: Fix ib_register_device() for v2.6.34 kernels
Thanks, Nicholas. Just applied your patch.

Bernard.

linux-rdma-ow...@vger.kernel.org wrote on 09/22/2010 10:30:17 PM:

> From: Nicholas Bellinger <n...@linux-iscsi.org>
>
> This patch adds a LINUX_VERSION_CODE v2.6.34 check inside of
> siw_main.c:siw_register_device() around the use of
> ib_register_device(). In post v2.6.34 kernels this function accepts a
> second parameter used as a sysfs port callback described here:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9a6edb60ec10d86b1025a0cdad68fd89f1ddaf02
>
> This patch currently sets this second parameter to NULL.
>
> Signed-off-by: Nicholas A. Bellinger <n...@linux-iscsi.org>
> ---
>  softiwarp/siw_main.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/softiwarp/siw_main.c b/softiwarp/siw_main.c
> index cacedea..c97adee 100644
> --- a/softiwarp/siw_main.c
> +++ b/softiwarp/siw_main.c
> @@ -233,8 +233,11 @@ int siw_register_device(struct siw_dev *dev)
>  	ibdev->iwcm->add_ref = siw_qp_get_ref;
>  	ibdev->iwcm->rem_ref = siw_qp_put_ref;
>  	ibdev->iwcm->get_qp = siw_get_ofaqp;
> -
> +#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 34)
> +	rv = ib_register_device(ibdev, NULL);
> +#else
>  	rv = ib_register_device(ibdev);
> +#endif
>  	if (rv) {
>  		dprint(DBG_DM|DBG_ON, "(dev=%s): ib_register_device failed: rv=%d\n",
>  			ibdev->name, rv);
> --
> 1.5.6.5
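As a side note on how such a version check evaluates: the kernel's `KERNEL_VERSION()` macro of that era packs the three version components into one integer, so versions compare as plain numbers. The sketch below re-creates the packing under a hypothetical name (`SKETCH_KERNEL_VERSION`) so it builds in userspace, and assumes the check is a strict greater-than, as the patch description ("post v2.6.34 kernels") suggests. `ib_register_device_nargs()` is a made-up helper for illustration only.

```c
/* Same packing as the kernel's KERNEL_VERSION() macro of that era:
 * major in the high bits, then one byte each for minor and patch level. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper: how many arguments ib_register_device() would be
 * called with for a given version code, assuming a strict "> 2.6.34"
 * check. */
static int ib_register_device_nargs(int version_code)
{
	return version_code > SKETCH_KERNEL_VERSION(2, 6, 34) ? 2 : 1;
}
```

With a strict comparison, 2.6.34 itself still takes the one-argument branch and 2.6.35 is the first version to take the two-argument one.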
Re: [PATCH] opensm/osm_helper.c: use ARR_SIZE macro instead of hardcoded values
On 11:56 Sun 12 Sep     , Yevgeny Kliteynik wrote:
> Signed-off-by: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>

Applied. Thanks.

Sasha
Re: [PATCH 2/2] siw: Add support for CRC32C offload instruction using libcrypto crc32c-intel
linux-rdma-ow...@vger.kernel.org wrote on 09/23/2010 12:36:29 AM:

> On Wed, 2010-09-22 at 16:06 -0600, Jason Gunthorpe wrote:
> > On Wed, Sep 22, 2010 at 02:38:31PM -0700, Nicholas A. Bellinger wrote:
> > > So I think the main bit here is the ability to request
> > > crc32c-intel.ko first, and then fall back to crc32c.ko when the
> > > former is not available on CONFIG_X86.
> >
> > Well, it is what Andi said, everything is working fine but there is
> > no mechanism to autoload the accelerated crypto module. If you did
> > modprobe crc32c_intel prior to loading your driver it would
> > automatically get crc32c-intel when it asks for crc32c since it is
> > loaded and a higher priority.
>
> Ah, OK. I see what you mean now here wrt to libcrypto priorities and
> crc32c + crc32c_intel modules. My apologies for the inexperience with
> libcrypto here..
>
> > So, the drivers are correct to just request crc32c ..
> >
> > The work around to limited autoprobing is so trivial (modprobe
> > crc32c_intel) I'm not sure including extra autoprobing code in the
> > drivers is worthwhile?
>
> Indeed, I am happy to drop this patch if Bernard would be nice enough
> to add a 'modprobe crc32c_intel' into the SIW scripts.

Ok, thanks for the CRC comments, quite instructive.
To sum up, now I'll add a minimum siw bringup script to the kernel part.

Bernard
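The priority behaviour described in this thread can be modelled in a few lines. The sketch below is a simplified userspace model, not the kernel crypto API: `struct alg_impl` and `alg_lookup()` are hypothetical, and the priority numbers used with it are illustrative. It shows why a driver that asks for the generic name "crc32c" gets the accelerated implementation only once that module has been loaded.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one registered implementation of an algorithm. */
struct alg_impl {
	const char *alg_name;    /* generic name callers ask for */
	const char *driver_name; /* specific implementation name */
	int priority;            /* higher wins */
	int loaded;              /* nonzero once its module is loaded */
};

/* Pick the loaded implementation of `name` with the highest priority;
 * returns NULL if nothing loaded provides `name`. */
static const struct alg_impl *alg_lookup(const struct alg_impl *tbl, size_t n,
					 const char *name)
{
	const struct alg_impl *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].loaded || strcmp(tbl[i].alg_name, name) != 0)
			continue;
		if (best == NULL || tbl[i].priority > best->priority)
			best = &tbl[i];
	}
	return best;
}
```

In this model, running `modprobe crc32c_intel` before loading the driver corresponds to setting `loaded` on the higher-priority entry before the lookup happens.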
Re: igmp: Staggered igmp report intervals for unsolicited igmp reports
On Wed, 22 Sep 2010, Jason Gunthorpe wrote:

> > The device is ready. It's just the multicast group that has not been
> > established yet.
>
> In IB when the SA replies to a group join the group should be ready,
> prior to that the device can't send into the group because it has no
> MLID for the group.. If you have a MLID then the group is working.

When the SA replies it has created the MLID but not reconfigured the
fabric yet. So the initial IGMP messages get lost.

> Is the issue you are dropping IGMP packets because the 224.0.0.2 join
> hasn't finished? Ideally you'd wait for the SA to reply before sending
> a IGMP, but a simpler solution might just be to use the broadcast MLID
> for packets addressed to a MGID that has not yet got a MLID. This
> would be similar to the ethernet behaviour of flooding.

IGMP reports are sent on the multicast group not on 224.0.0.2. 224.0.0.2
is only used when leaving a multicast group.

I thought also about solutions along the same lines.

We could modify the IB layer to send to 224.0.0.2 until the SA has
confirmed the creation of the MC group. For that to work we first would
need to modify the SA logic to ensure that it only sends confirmation
*after* the fabric has been reconfigured.

Then we need to switch the MLIDs of the MC group when the notification
is received.

If the IB layer has not joined 224.0.0.2 yet (and it will take a while)
then we could even fall back to broadcast until it's ready.
Re: igmp: Staggered igmp report intervals for unsolicited igmp reports
On Wed, 22 Sep 2010, David Stevens wrote:

> > Also increment the frequency so that we get 10 reports sent over a
> > few seconds.
>
> Except you want to conform and not conform at the same time. :-)
> IGMPv2 should be: default count 2, interval 10secs
> IGMPv3 should be: default count 2, interval 1sec

This is during the period of unsolicited igmp reports. We do not know
if this group is managed using V3 or V2 since no igmp query/report has
been received yet.

> ...and no way is it a good idea to send 10 unsolicited reports on an
> Ethernet.

Why would that be an issue? The IGMPv2 RFC has no strict limit and
RFC3376 mentions that the retransmission occurs "Robustness Variable"
times minus one. Choosing 10 for the Robustness Variable is certainly
ok.

If we do not increase the number of reports but just limit the interval
then the chance of outages of a second or so during mc group creation
causing routers to miss igmp reports is significantly increased.
Re: igmp: Staggered igmp report intervals for unsolicited igmp reports
On Thu, Sep 23, 2010 at 10:32:17AM -0500, Christoph Lameter wrote:

> > Is the issue you are dropping IGMP packets because the 224.0.0.2
> > join hasn't finished? Ideally you'd wait for the SA to reply before
> > sending a IGMP, but a simpler solution might just be to use the
> > broadcast MLID for packets addressed to a MGID that has not yet got
> > a MLID. This would be similar to the ethernet behaviour of flooding.
>
> IGMP reports are sent on the multicast group not on 224.0.0.2.
> 224.0.0.2 is only used when leaving a multicast group.

Hm, that is quite different than in IGMPv3..

How does this work at all in IB? A message to the multicast group isn't
going to make it to any routers unless the routers use some other means
to join the IB MGID.

Jason
Re: igmp: Staggered igmp report intervals for unsolicited igmp reports
On Thu, 23 Sep 2010, Jason Gunthorpe wrote:

> On Thu, Sep 23, 2010 at 10:32:17AM -0500, Christoph Lameter wrote:
>
> > > Is the issue you are dropping IGMP packets because the 224.0.0.2
> > > join hasn't finished? Ideally you'd wait for the SA to reply
> > > before sending a IGMP, but a simpler solution might just be to use
> > > the broadcast MLID for packets addressed to a MGID that has not
> > > yet got a MLID. This would be similar to the ethernet behaviour of
> > > flooding.
> >
> > IGMP reports are sent on the multicast group not on 224.0.0.2.
> > 224.0.0.2 is only used when leaving a multicast group.
>
> Hm, that is quite different than in IGMPv3..
>
> How does this work at all in IB? A message to the multicast group
> isn't going to make it to any routers unless the routers use some
> other means to join the IB MGID.

IPoIB creates an infiniband multicast group via the IB calls for an IP
multicast group. Then IGMP comes into play and the kernel sends the IP
based igmp report. This igmp report must be received by an outside
router (on an IP network) in order for traffic to get forwarded into the
IB fabric.

You can end up with an IB multicast configuration that is all fine but
with loss of the unsolicited packets due to fabric reconfiguration not
being complete yet. The larger the fabric the worse the situation.

If all unsolicited igmp reports are lost then the router will only start
forwarding the mc group after the reporting intervals (which could be in
the range of minutes) when it triggers igmp reports through a general
igmp query. Until that time the MC group looks dead. And people and
software may conclude that the network is broken.

This is a general issue for any network where configuration for MC
forwarding is needed and where initial igmp reports may get lost.

A staggering of time intervals would be a general solution to that
issue.
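One way to picture the staggering proposed above is a schedule whose gaps grow with each report, so that early reports go out quickly and later ones still land after a slow fabric reconfiguration. The exact schedule is not spelled out in this thread; the doubling-with-cap scheme below, and the name `stagger_schedule()`, are assumptions made purely for illustration.

```c
/* Hypothetical staggered schedule: fill `out` with the delay (in ms)
 * before each of `count` unsolicited reports. Delays start at
 * `first_ms`, double each time, and are capped at `max_ms`.
 * Returns the total time spanned by the schedule. */
static unsigned long stagger_schedule(unsigned long first_ms,
				      unsigned long max_ms,
				      unsigned int count,
				      unsigned long *out)
{
	unsigned long delay = first_ms, total = 0;

	for (unsigned int i = 0; i < count; i++) {
		if (delay > max_ms)
			delay = max_ms;
		out[i] = delay;
		total += delay;
		delay *= 2;
	}
	return total;
}
```

For example, six reports starting at 100 ms with a 1000 ms cap would be spaced 100, 200, 400, 800, 1000 and 1000 ms apart, covering 3.5 seconds in total.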
Re: igmp: Staggered igmp report intervals for unsolicited igmp reports
On Thu, Sep 23, 2010 at 12:37:28PM -0500, Christoph Lameter wrote:
> On Thu, 23 Sep 2010, Jason Gunthorpe wrote:
>
> > On Thu, Sep 23, 2010 at 10:32:17AM -0500, Christoph Lameter wrote:
> >
> > > > Is the issue you are dropping IGMP packets because the 224.0.0.2
> > > > join hasn't finished? Ideally you'd wait for the SA to reply
> > > > before sending a IGMP, but a simpler solution might just be to
> > > > use the broadcast MLID for packets addressed to a MGID that has
> > > > not yet got a MLID. This would be similar to the ethernet
> > > > behaviour of flooding.
> > >
> > > IGMP reports are sent on the multicast group not on 224.0.0.2.
> > > 224.0.0.2 is only used when leaving a multicast group.
> >
> > Hm, that is quite different than in IGMPv3..
> >
> > How does this work at all in IB? A message to the multicast group
> > isn't going to make it to any routers unless the routers use some
> > other means to join the IB MGID.
>
> IPoIB creates an infiniband multicast group via the IB calls for an IP
> multicast group. Then IGMP comes into play and the kernel sends the IP
> based igmp report. This igmp report must be received by an outside
> router (on an IP network) in order for traffic to get forwarded into
> the IB fabric.
>
> You can end up with an IB multicast configuration that is all fine but
> with loss of the unsolicited packets due to fabric reconfiguration not
> being complete yet. The larger the fabric the worse the situation.

But my point is that IB has very limited multicast, if I create an IB
group and then send IGMP into that group *it will not reach a router*.
I have to send something to the all routers group or the all IGMPv3
group to get it to reach a router with any reliability.

The only way this kind of scheme could work is if an IGMPv2 IPoIB router
listens for IB MGID Create notices from the SA and automatically joins
all groups that are created, so it can get IGMPv2 membership reports.
Which obviously adds more delay, lag, and risk.

I'm *guessing* that the change in IGMPv3 to send reports to 224.0.0.22
(all IGMPv3 multicast address) is related to this sort of problem, and
it seems like on IB IGMPv2 is not a good fit and should not be used if
v3 is available..

Jason
Re: igmp: Staggered igmp report intervals for unsolicited igmp reports
On Thu, 23 Sep 2010, Jason Gunthorpe wrote:

> > IPoIB creates an infiniband multicast group via the IB calls for an
> > IP multicast group. Then IGMP comes into play and the kernel sends
> > the IP based igmp report. This igmp report must be received by an
> > outside router (on an IP network) in order for traffic to get
> > forwarded into the IB fabric.
> >
> > You can end up with an IB multicast configuration that is all fine
> > but with loss of the unsolicited packets due to fabric
> > reconfiguration not being complete yet. The larger the fabric the
> > worse the situation.
>
> But my point is that IB has very limited multicast, if I create an IB
> group and then send IGMP into that group *it will not reach a router*.

The IPoIB routers automatically join all IP MC groups created.

> The only way this kind of scheme could work is if an IGMPv2 IPoIB
> router listens for IB MGID Create notices from the SA and
> automatically joins all groups that are created, so it can get IGMPv2
> membership reports. Which obviously adds more delay, lag, and risk.

Right that is how it works now.

> I'm *guessing* that the change in IGMPv3 to send reports to 224.0.0.22
> (all IGMPv3 multicast address) is related to this sort of problem, and
> it seems like on IB IGMPv2 is not a good fit and should not be used if
> v3 is available..

Existing routers do not support IGMPv3.
Re: igmp: Allow mininum interval specification for igmp timers.
From: Christoph Lameter <c...@linux.com>
Date: Wed, 22 Sep 2010 13:59:30 -0500 (CDT)

> IGMP timers sometimes fire too rapidly due to randomization of the
> intervals from 0 to max_delay in igmp_start_timer().
 ...
> Signed-off-by: Christoph Lameter <c...@linux.com>

This change seems reasonable to me, what do you think David?
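The patch body is elided in this message, so the following is only a guess at the general idea, not the actual change. If a delay is drawn uniformly from 0 to `max_delay`, it can come out arbitrarily close to zero; drawing it from a range with a lower bound avoids that. `pick_delay()` and its parameters are hypothetical names, and `rand()` stands in for the kernel's random source.

```c
#include <stdlib.h>

/* Hypothetical sketch: pick a random delay in [min_delay, max_delay)
 * rather than [0, max_delay). Falls back to min_delay when the range
 * is empty. */
static unsigned long pick_delay(unsigned long min_delay,
				unsigned long max_delay)
{
	if (max_delay <= min_delay)
		return min_delay;
	return min_delay + (unsigned long)rand() % (max_delay - min_delay);
}
```

The trade-off is that a larger minimum narrows the window over which hosts spread their reports, which is the reason the randomization exists in the first place.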