Re: [PATCH 02/13] drivers/infiniband: Remove unnecessary casts of private_data

2010-09-23 Thread Jiri Kosina
On Tue, 7 Sep 2010, Ralph Campbell wrote:

 Acked-by: Ralph Campbell ralph.campb...@qlogic.com

Applied, thanks.

  Signed-off-by: Joe Perches j...@perches.com
  ---
   drivers/infiniband/hw/qib/qib_file_ops.c |4 ++--
   1 files changed, 2 insertions(+), 2 deletions(-)
  
  diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
  index 6b11645..cef5d67 100644
  --- a/drivers/infiniband/hw/qib/qib_file_ops.c
  +++ b/drivers/infiniband/hw/qib/qib_file_ops.c
  @@ -1722,7 +1722,7 @@ static int qib_close(struct inode *in, struct file *fp)
   
  mutex_lock(&qib_mutex);
   
  -   fd = (struct qib_filedata *) fp->private_data;
  +   fd = fp->private_data;
  fp->private_data = NULL;
  rcd = fd->rcd;
  if (!rcd) {
  @@ -1808,7 +1808,7 @@ static int qib_ctxt_info(struct file *fp, struct qib_ctxt_info __user *uinfo)
  struct qib_ctxtdata *rcd = ctxt_fp(fp);
  struct qib_filedata *fd;
   
  -   fd = (struct qib_filedata *) fp->private_data;
  +   fd = fp->private_data;
   
  info.num_active = qib_count_active_units();
  info.unit = rcd->dd->unit;

-- 
Jiri Kosina
SUSE Labs, Novell Inc.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
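For readers less familiar with the C rule behind this cleanup: `private_data` is a `void *`, and C converts `void *` to any object pointer type implicitly, so the `(struct qib_filedata *)` cast adds nothing. A minimal userspace sketch of the same pattern follows; `struct fake_file`, `struct filedata` and `take_private()` are hypothetical stand-ins, not kernel definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct qib_filedata and struct file. */
struct filedata {
	int ctxt;
};

struct fake_file {
	void *private_data; /* opaque per-open state, like struct file */
};

/* Detach and return the per-open state, mirroring the start of qib_close(). */
static struct filedata *take_private(struct fake_file *fp)
{
	struct filedata *fd;

	fd = fp->private_data; /* void * converts implicitly: no cast needed */
	fp->private_data = NULL;
	return fd;
}
```

Dropping the cast also means the compiler will warn if `private_data` is ever changed to a non-`void` pointer type, a mismatch an explicit cast would hide.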


Re: idr_get_new_exact ?

2010-09-23 Thread Paul Mundt
On Mon, Sep 20, 2010 at 11:26:47PM +0200, Tejun Heo wrote:
 Hello,
 
 On 09/20/2010 10:35 PM, Roland Dreier wrote:
  Looks fine to me as an improvement over the status quo, but I wonder how
  many of these places could use the radix_tree stuff instead?  If you're
  not using the ability of the idr code to assign an id for you, then it
  seems the radix_tree API is a better fit.
 
 I agree.  Wouldn't those users be better off simply using radix tree?
 
It could go either way. I was about to write the same function when
playing with it for IRQ mapping, the idea being to propagate the initial
tree with sparse static vectors and then switch over to dynamic IDs for
virtual IRQ creation. I ended up going with a radix tree for other
reasons, though.


Re: idr_get_new_exact ?

2010-09-23 Thread Tejun Heo
Hello,

On 09/23/2010 01:42 PM, Paul Mundt wrote:
 On Mon, Sep 20, 2010 at 11:26:47PM +0200, Tejun Heo wrote:
 Hello,

 On 09/20/2010 10:35 PM, Roland Dreier wrote:
 Looks fine to me as an improvement over the status quo, but I wonder how
 many of these places could use the radix_tree stuff instead?  If you're
 not using the ability of the idr code to assign an id for you, then it
 seems the radix_tree API is a better fit.

 I agree.  Wouldn't those users be better off simply using radix tree?

 It could go either way. I was about to write the same function when
 playing with it for IRQ mapping, the idea being to propagate the initial
 tree with sparse static vectors and then switch over to dynamic IDs for
 virtual IRQ creation. I ended up going with a radix tree for other
 reasons, though.

I see.  If there are use cases where fixed and dynamic IDs need to be
mixed, no objection from me.

Thanks.

-- 
tejun
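The mixed case Paul describes (a few fixed IDs claimed up front, then dynamic IDs for the rest) can be illustrated with a toy bitmap allocator. This is only a sketch of the semantics under discussion: `struct id_table`, `id_alloc_exact()` and `id_alloc_above()` are hypothetical, the capacity is fixed, and the real idr is a dynamically sized tree with its own API and locking rules.

```c
#include <stdbool.h>

#define MAX_IDS 64 /* toy capacity; the kernel idr grows on demand */

struct id_table {
	bool used[MAX_IDS];
};

/* Claim one specific ID (e.g. a static IRQ vector).
 * Returns the ID, or -1 if it is out of range or already taken. */
static int id_alloc_exact(struct id_table *t, int id)
{
	if (id < 0 || id >= MAX_IDS || t->used[id])
		return -1;
	t->used[id] = true;
	return id;
}

/* Dynamic allocation: hand out the lowest free ID at or above 'start'.
 * Returns -1 when no free ID remains. */
static int id_alloc_above(struct id_table *t, int start)
{
	for (int id = start < 0 ? 0 : start; id < MAX_IDS; id++) {
		if (!t->used[id]) {
			t->used[id] = true;
			return id;
		}
	}
	return -1;
}
```

Claiming the static vectors with `id_alloc_exact()` first and then calling `id_alloc_above()` for dynamically created IDs keeps the two from colliding.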


Re: software iwarp stack update

2010-09-23 Thread Bernard Metzler


linux-rdma-ow...@vger.kernel.org wrote on 09/22/2010 10:42:18 PM:

 On 09/22/2010 03:35 PM, Nicholas A. Bellinger wrote:
  On Wed, 2010-09-22 at 10:19 +0200, Bernard Metzler wrote:
 
  Earlier this year, we announced the availability of an open source,
  full software implementation of the iWARP RDMA protocol stack - see
  my email "software iwarp stack" from March 14th at the linux-rdma list
  (http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg02940.html).
  Since then, while working on performance and stability, we have provided
  some source code updates. Current user and kernel code is available at
  gitorious.org/softiwarp. Please see the CHANGES file in the
  kernel/ directory for a summary of the most recent changes.
 
  For more convenient testing, the latest update now allows for a
  stand-alone build of the kernel module without full kernel source
  code access. We tested the code with kernel version 2.6.34. If
  you are interested in a full software RDMA stack on Ethernet,
  please try it out.
 
  In the hope of providing useful information, I put
  net...@vger.kernel.org on copy. Subscribers of this list,
  please put me on private cc in case you reply or comment, since
  I am not subscribed to the list.
  We would be more than happy if you netdev folks would consider
  a hardware independent RDMA kernel service as something useful and
  potentially to be integrated into the mainline network stack.
 
  Why might it be useful?
  A software RDMA stack makes the semantic advantages of
  asynchronous and one-sided communication available while obsoleting
  the need to deploy dedicated RDMA hardware or any protocol offloading
  (while not matching the lowest delay numbers of real RDMA hardware).
  Implementing the IETF's iWARP protocol stack on top of TCP kernel
  sockets, softiwarp integrates with the open fabrics environment
  and thus exports the RDMA kernel and user verbs interface.
 
  The efficiency of the Linux TCP/IP network stack together with intrinsic
  advantages of the RDMA communication model (async. posting of work
  and reaping of work completions, transfer of send buffer ownership
  to the kernel which enables zero copy transmit, peer data placement
  without application scheduling, one-sided remote read operations etc.)
  can result in improved application-to-application performance and
  less CPU load, while using the unchanged kernel TCP stack.
 
  A software RDMA stack might promote wider RDMA deployment,
  since, by using the host TCP stack, it enables RDMA semantics
  independent of dedicated hardware. softiwarp peers with real
  RNICs (tested with Chelsio's T3 adapter).
 
  softiwarp is still work in progress and we are very thankful for any
  suggestions/comments/bug reports. Please advise how we should proceed
  to bring the stack further to your attention. Would it be useful to
  provide patches against the current stable kernel version or the next
  release candidate?
 
 
  Hi Bernard,
 
  So what I would recommend doing here to make things more appealing to
  DaveM and other interested NetDev folks would be to clone a separate
  tree from the net-2.6.git or net-next-2.6.git repositories and include
  the softiwarp/kernel.git code into a fresh 'in-kernel' clone tracking
  the latest netdev code, and then keep git rebase'ing against DaveM's
  last changes and update your local tree to the latest netdev code.
 
  Of course you will want to remove all of the 'out of tree'
  LINUX_VERSION build macros and any other legacy bits to follow mainline kernel
  convention for your 'in-kernel' softiwarp tree.
 
 

 And then post a patch series for review.


All,

Yes, ok, that's what I will do now.
Many thanks for the helpful and encouraging replies.

Bernard.



Re: [PATCH 1/2] siw: Fix ib_register_device() for v2.6.34 kernels

2010-09-23 Thread Bernard Metzler
Thanks, Nicholas. Just applied your patch.

Bernard.

linux-rdma-ow...@vger.kernel.org wrote on 09/22/2010 10:30:17 PM:

 From: Nicholas Bellinger n...@linux-iscsi.org

 This patch adds a LINUX_VERSION_CODE > v2.6.34 check inside of
 siw_main.c:siw_register_device() around the use of ib_register_device().
 In post v2.6.34 kernels this function accepts a second parameter used
 for a sysfs port callback described here:

 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9a6edb60ec10d86b1025a0cdad68fd89f1ddaf02

 This patch currently sets this second parameter to NULL.

 Signed-off-by: Nicholas A. Bellinger n...@linux-iscsi.org
 ---
  softiwarp/siw_main.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

 diff --git a/softiwarp/siw_main.c b/softiwarp/siw_main.c
 index cacedea..c97adee 100644
 --- a/softiwarp/siw_main.c
 +++ b/softiwarp/siw_main.c
 @@ -233,8 +233,11 @@ int siw_register_device(struct siw_dev *dev)
 ibdev->iwcm->add_ref = siw_qp_get_ref;
 ibdev->iwcm->rem_ref = siw_qp_put_ref;
 ibdev->iwcm->get_qp = siw_get_ofaqp;
 -
 +#if LINUX_VERSION_CODE > KERNEL_VERSION(2, 6, 34)
 +   rv = ib_register_device(ibdev, NULL);
 +#else
 rv = ib_register_device(ibdev);
 +#endif
 if (rv) {
dprint(DBG_DM|DBG_ON, "(dev=%s): "
   "ib_register_device failed: rv=%d\n", ibdev->name, rv);
 --
 1.5.6.5


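`LINUX_VERSION_CODE` and `KERNEL_VERSION()` pack major, minor and patch numbers into one integer so versions compare numerically. The sketch below reproduces the packing from `<linux/version.h>` of this era (one byte each for minor and patch) and checks which branch the patch's `#if` selects, assuming, per the patch description, that kernels newer than v2.6.34 take the second argument. `uses_two_arg_register()` is a hypothetical helper for illustration only.

```c
/* Same packing as <linux/version.h> of the 2.6 era. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper mirroring the #if in siw_register_device():
 * returns 1 when ib_register_device() would be called with the extra
 * port-callback argument (passed as NULL in the patch). */
static int uses_two_arg_register(int version_code)
{
	return version_code > KERNEL_VERSION(2, 6, 34);
}
```

Because the comparison happens in the preprocessor, only one of the two `ib_register_device()` calls is ever compiled, which is what lets the out-of-tree module build against both old and new headers.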


Re: [PATCH] opensm/osm_helper.c: use ARR_SIZE macro instead of hardcoded values

2010-09-23 Thread Sasha Khapyorsky
On 11:56 Sun 12 Sep , Yevgeny Kliteynik wrote:
 
 Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il

Applied. Thanks.

Sasha
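As a reminder of what such a macro buys: it derives an array's element count from `sizeof`, so loops and bounds checks stay correct when the array grows. A minimal sketch; the exact opensm spelling of `ARR_SIZE` is an assumption here, and the port-state table is only an illustrative example (IB logical port states 0 to 4), not code from `osm_helper.c`.

```c
/* Element count of a true array (assumed definition; opensm's may differ). */
#define ARR_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative lookup table indexed by a small numeric code. */
static const char *const port_state_str[] = {
	"NO_CHANGE", "DOWN", "INIT", "ARMED", "ACTIVE"
};

/* Bounds check driven by ARR_SIZE instead of a hardcoded 5. */
static const char *port_state_name(unsigned int state)
{
	if (state < ARR_SIZE(port_state_str))
		return port_state_str[state];
	return "UNKNOWN";
}
```

The macro only works on true arrays; applied to a pointer parameter it silently yields `sizeof(pointer) / sizeof(element)`.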


Re: [PATCH 2/2] siw: Add support for CRC32C offload instruction using libcrypto crc32c-intel

2010-09-23 Thread Bernard Metzler


linux-rdma-ow...@vger.kernel.org wrote on 09/23/2010 12:36:29 AM:

 On Wed, 2010-09-22 at 16:06 -0600, Jason Gunthorpe wrote:
  On Wed, Sep 22, 2010 at 02:38:31PM -0700, Nicholas A. Bellinger wrote:
 
   So I think the main bit here is the ability to request
   crc32c-intel.ko first, and then fall back to crc32c.ko when the
   former is not available on CONFIG_X86.
 
  Well, it is what Andi said, everything is working fine but there is no
  mechanism to autoload the accelerated crypto module. If you did
  modprobe crc32c_intel prior to loading your driver it would
  automatically get crc32c-intel when it asks for crc32c since it is
  loaded and a higher priority.
 

 Ah, OK.  I see what you mean now wrt libcrypto priorities and
 crc32c + crc32c_intel modules.   My apologies for the inexperience with
 libcrypto here..

  So, the drivers are correct to just request crc32c. The workaround
  for limited autoprobing is so trivial (modprobe crc32c_intel) that I'm not
  sure including extra autoprobing code in the drivers is worthwhile?
 

 Indeed, I am happy to drop this patch if Bernard would be nice enough to
 add a 'modprobe crc32c_intel' into the SIW scripts.


Ok, thanks for the CRC comments, quite instructive. To sum up, I'll now
add a minimal siw bring-up script to the kernel part.

Bernard

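For reference, `crc32c.ko` and `crc32c-intel.ko` compute the same function, CRC32C (Castagnoli), which iWARP's MPA layer uses for its CRC; the Intel module uses the SSE4.2 `crc32` instruction to do it faster. A minimal bitwise sketch of the checksum itself follows. `crc32c_sw()` is a hypothetical name and this is not the kernel crypto API, which drivers reach by requesting `"crc32c"` and letting the highest-priority loaded implementation answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C: reflected polynomial 0x82F63B78, initial value and
 * final XOR 0xFFFFFFFF. Slow but simple; table-driven and hardware
 * versions produce identical results. */
static uint32_t crc32c_sw(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			/* Shift right; XOR in the polynomial if the low bit was set. */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}
```

The standard CRC-32C check value over the ASCII string "123456789" is 0xE3069283, which makes a convenient self-test.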


Re: igmp: Staggered igmp report intervals for unsolicited igmp reports

2010-09-23 Thread Christoph Lameter
On Wed, 22 Sep 2010, Jason Gunthorpe wrote:

  The device is ready. It's just the multicast group that has not been
  established yet.

 In IB when the SA replies to a group join the group should be ready,
 prior to that the device can't send into the group because it has no
 MLID for the group. If you have an MLID then the group is working.

When the SA replies it has created the MLID but not reconfigured the
fabric yet. So the initial IGMP messages get lost.

 Is the issue you are dropping IGMP packets because the 224.0.0.2 join
 hasn't finished? Ideally you'd wait for the SA to reply before sending
 an IGMP, but a simpler solution might just be to use the broadcast MLID
 for packets addressed to an MGID that has not yet got an MLID. This
 would be similar to the ethernet behaviour of flooding.

IGMP reports are sent on the multicast group not on 224.0.0.2. 224.0.0.2
is only used when leaving a multicast group.

I thought also about solutions along the same lines. We could modify the
IB layer to send to 224.0.0.2 until the SA has confirmed the
creation of the MC group. For that to work we first would need to modify
the SA logic to ensure that it only sends confirmation *after* the fabric
has been reconfigured. Then we need to switch the MLIDs of the MC group
when the notification is received.

If the IB layer has not joined 224.0.0.2 yet (and it will take a while)
then we could even fall back to broadcast until it's ready.

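Jason's fallback suggestion (use the broadcast group's MLID until a group has its own) amounts to a small selection rule on the send path. A sketch under stated assumptions: `struct mcast_group`, its fields and `pick_mlid()` are hypothetical, not IPoIB's actual data structures. As Christoph notes, this only helps if the SA's reply really means the fabric has been reconfigured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group state; field names are illustrative only. */
struct mcast_group {
	uint16_t mlid;      /* 0 until the SA join completes */
	int      join_done; /* set when the SA has answered the join */
};

/* Choose the destination MLID for a packet addressed to a multicast group:
 * the group's own MLID once the join has completed, otherwise the
 * broadcast group's MLID, similar to flooding on Ethernet. */
static uint16_t pick_mlid(const struct mcast_group *grp, uint16_t bcast_mlid)
{
	if (grp != NULL && grp->join_done && grp->mlid != 0)
		return grp->mlid;
	return bcast_mlid;
}
```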


Re: igmp: Staggered igmp report intervals for unsolicited igmp reports

2010-09-23 Thread Christoph Lameter
On Wed, 22 Sep 2010, David Stevens wrote:

 
  Also increase the frequency so that we get 10 reports sent over a
  few seconds.

 Except you want to conform and not conform at the same time. :-)
 IGMPv2 should be: default count 2, interval 10secs
 IGMPv3 should be: default count 2, interval 1sec

This is during the period of unsolicited igmp reports. We do not know if
this group is managed using V3 or V2 since no igmp query/report has been
received yet.

 ...and no way is it a good idea to send 10 unsolicited reports on an
 Ethernet.

Why would that be an issue?

The IGMPv2 RFC has no strict limit, and RFC 3376
says that the report is retransmitted Robustness Variable
minus one times. Choosing 10 for the Robustness Variable is certainly ok.

If we do not increase the number of reports but just limit the interval
then the chance that an outage of a second or so during mc group creation
causes routers to miss igmp reports is significantly increased.

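Christoph's argument can be checked with a simple model. The sketch assumes (a simplification, not the kernel's timer code) that `count` unsolicited reports go out exactly `interval_ms` apart starting at t = 0, and that the fabric drops everything sent before the outage ends. `some_report_survives()` is a hypothetical helper.

```c
#include <stdbool.h>

/* Simplified model: reports at t = 0, interval_ms, 2 * interval_ms, ...
 * and every packet sent before outage_ms is lost. Returns true if at
 * least one report is sent once the fabric is ready. */
static bool some_report_survives(unsigned int count, unsigned int interval_ms,
				 unsigned int outage_ms)
{
	if (count == 0)
		return false;
	/* The last report goes out at (count - 1) * interval_ms. */
	return (unsigned long)(count - 1) * interval_ms >= outage_ms;
}
```

With two reports one second apart, a 1.5 s outage swallows both; ten reports at the same spacing, or two reports ten seconds apart, get at least one through. Real IGMP randomizes each delay up to the configured maximum, so the fixed spacing here is a best case.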


Re: igmp: Staggered igmp report intervals for unsolicited igmp reports

2010-09-23 Thread Jason Gunthorpe
On Thu, Sep 23, 2010 at 10:32:17AM -0500, Christoph Lameter wrote:

  Is the issue you are dropping IGMP packets because the 224.0.0.2 join
  hasn't finished? Ideally you'd wait for the SA to reply before sending
  an IGMP, but a simpler solution might just be to use the broadcast MLID
  for packets addressed to an MGID that has not yet got an MLID. This
  would be similar to the ethernet behaviour of flooding.
 
 IGMP reports are sent on the multicast group not on 224.0.0.2. 224.0.0.2
 is only used when leaving a multicast group.

Hm, that is quite different than in IGMPv3.. How does this work at all
in IB? A message to the multicast group isn't going to make it to any
routers unless the routers use some other means to join the IB MGID.

Jason


Re: igmp: Staggered igmp report intervals for unsolicited igmp reports

2010-09-23 Thread Christoph Lameter
On Thu, 23 Sep 2010, Jason Gunthorpe wrote:

 On Thu, Sep 23, 2010 at 10:32:17AM -0500, Christoph Lameter wrote:

   Is the issue you are dropping IGMP packets because the 224.0.0.2 join
   hasn't finished? Ideally you'd wait for the SA to reply before sending
   an IGMP, but a simpler solution might just be to use the broadcast MLID
   for packets addressed to an MGID that has not yet got an MLID. This
   would be similar to the ethernet behaviour of flooding.
 
  IGMP reports are sent on the multicast group not on 224.0.0.2. 224.0.0.2
  is only used when leaving a multicast group.

 Hm, that is quite different than in IGMPv3.. How does this work at all
 in IB? A message to the multicast group isn't going to make it to any
 routers unless the routers use some other means to join the IB MGID.

IPoIB creates an InfiniBand multicast group via the IB calls for an IP
multicast group. Then IGMP comes into play and the kernel sends the IP
based igmp report. This igmp report must be received by an outside router
(on an IP network) in order for traffic to get forwarded into the IB
fabric. You can end up with an IB multicast configuration that is all fine
but with loss of the unsolicited packets due to fabric reconfiguration not
being complete yet. The larger the fabric the worse the situation.

If all unsolicited igmp reports are lost then the router will only
start forwarding the mc group once it triggers new igmp reports through
a general igmp query, which could be minutes later. Until that time the
MC group looks dead, and people and software may conclude that the
network is broken.

This is a general issue for any network where MC forwarding has to be
configured and where initial igmp reports may get lost. Staggering the
report intervals would be a general solution to that issue.

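The thread does not spell out a particular staggering scheme. One possibility, shown here purely as an assumption, is to double the gap after each unsolicited report: the first reports go out quickly, and later ones reach far enough out to outlast a slow fabric reconfiguration without sending many packets. `staggered_schedule()` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical staggering: the gap after report i is base_ms << i.
 * Fills out[0..count-1] with send times in ms, starting at 0.
 * Keep count small so the shift cannot overflow. */
static void staggered_schedule(unsigned long *out, size_t count,
			       unsigned long base_ms)
{
	unsigned long t = 0;

	for (size_t i = 0; i < count; i++) {
		out[i] = t;
		t += base_ms << i;
	}
}
```

With a 100 ms base, five reports land at 0, 100, 300, 700 and 1500 ms; the last report goes out at base_ms * (2^(count-1) - 1), so a handful of reports covers several seconds.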


Re: igmp: Staggered igmp report intervals for unsolicited igmp reports

2010-09-23 Thread Jason Gunthorpe
On Thu, Sep 23, 2010 at 12:37:28PM -0500, Christoph Lameter wrote:
 On Thu, 23 Sep 2010, Jason Gunthorpe wrote:
 
  On Thu, Sep 23, 2010 at 10:32:17AM -0500, Christoph Lameter wrote:
 
Is the issue you are dropping IGMP packets because the 224.0.0.2 join
hasn't finished? Ideally you'd wait for the SA to reply before sending
an IGMP, but a simpler solution might just be to use the broadcast MLID
for packets addressed to an MGID that has not yet got an MLID. This
would be similar to the ethernet behaviour of flooding.
  
   IGMP reports are sent on the multicast group not on 224.0.0.2. 224.0.0.2
   is only used when leaving a multicast group.
 
  Hm, that is quite different than in IGMPv3.. How does this work at all
  in IB? A message to the multicast group isn't going to make it to any
  routers unless the routers use some other means to join the IB MGID.
 
 IPoIB creates an InfiniBand multicast group via the IB calls for an IP
 multicast group. Then IGMP comes into play and the kernel sends the IP
 based igmp report. This igmp report must be received by an outside router
 (on an IP network) in order for traffic to get forwarded into the IB
 fabric. You can end up with an IB multicast configuration that is all fine
 but with loss of the unsolicited packets due to fabric reconfiguration not
 being complete yet. The larger the fabric the worse the situation.

But my point is that IB has very limited multicast: if I create an IB
group and then send IGMP into that group *it will not reach a router*.

I have to send something to the all routers group or the all IGMPv3
group to get it to reach a router with any reliability.

The only way this kind of scheme could work is if an IGMPv2 IPoIB
router listens for IB MGID Create notices from the SA and
automatically joins all groups that are created, so it can get IGMPv2
membership reports. Which obviously adds more delay, lag, and risk.

I'm *guessing* that the change in IGMPv3 to send reports to 224.0.0.22
(the all-IGMPv3-routers multicast address) is related to this sort of
problem, and it seems like on IB IGMPv2 is not a good fit and should not
be used if v3 is available..

Jason
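The addressing difference Jason points to can be summarized in a few lines. Per RFC 2236, IGMPv2 membership reports go to the group address itself and Leave Group messages go to 224.0.0.2; per RFC 3376, all IGMPv3 reports go to 224.0.0.22. The constants are defined locally for this sketch, and `igmp_dest()` is a hypothetical helper.

```c
#include <stdint.h>

/* Host-order IPv4 addresses, defined locally for this sketch. */
#define ALL_ROUTERS_ADDR 0xE0000002u /* 224.0.0.2 */
#define ALL_IGMPV3_RTRS  0xE0000016u /* 224.0.0.22 */

enum igmp_msg { IGMP_MSG_REPORT, IGMP_MSG_LEAVE };

/* Destination address of an IGMP message for a given protocol version.
 * (IGMPv1 has no Leave message.) */
static uint32_t igmp_dest(int version, enum igmp_msg msg, uint32_t group)
{
	if (version >= 3)
		return ALL_IGMPV3_RTRS;  /* RFC 3376: every v3 report */
	if (version == 2 && msg == IGMP_MSG_LEAVE)
		return ALL_ROUTERS_ADDR; /* RFC 2236: Leave Group */
	return group;                    /* v1/v2 membership reports */
}
```

On IPoIB this is why a v2 report only reaches routers that have joined that group's MGID, whereas v3 reports travel on one well-known group that a router can join once.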


Re: igmp: Staggered igmp report intervals for unsolicited igmp reports

2010-09-23 Thread Christoph Lameter
On Thu, 23 Sep 2010, Jason Gunthorpe wrote:

  IPoIB creates an InfiniBand multicast group via the IB calls for an IP
  multicast group. Then IGMP comes into play and the kernel sends the IP
  based igmp report. This igmp report must be received by an outside router
  (on an IP network) in order for traffic to get forwarded into the IB
  fabric. You can end up with an IB multicast configuration that is all fine
  but with loss of the unsolicited packets due to fabric reconfiguration not
  being complete yet. The larger the fabric the worse the situation.

 But my point is that IB has very limited multicast: if I create an IB
 group and then send IGMP into that group *it will not reach a router*.

The IPoIB routers automatically join all IP MC groups created.

 The only way this kind of scheme could work is if an IGMPv2 IPoIB
 router listens for IB MGID Create notices from the SA and
 automatically joins all groups that are created, so it can get IGMPv2
 membership reports. Which obviously adds more delay, lag, and risk.

Right, that is how it works now.

 I'm *guessing* that the change in IGMPv3 to send reports to 224.0.0.22
 (the all-IGMPv3-routers multicast address) is related to this sort of problem, and
 it seems like on IB IGMPv2 is not a good fit and should not be used if
 v3 is available..

Existing routers do not support IGMPv3.



Re: igmp: Allow mininum interval specification for igmp timers.

2010-09-23 Thread David Miller
From: Christoph Lameter c...@linux.com
Date: Wed, 22 Sep 2010 13:59:30 -0500 (CDT)

 IGMP timers sometimes fire too rapidly due to randomization of the
 intervals from 0 to max_delay in igmp_start_timer().
 ...
 Signed-off-by: Christoph Lameter c...@linux.com


This change seems reasonable to me, what do you think David?
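The patch description above says the delay is randomized from 0 to `max_delay`, which lets a timer fire almost immediately. A sketch of the general idea of putting a floor under that range, not the patch itself: `igmp_delay()` is hypothetical, and `rnd` stands in for the kernel's random source so the mapping can be tested deterministically.

```c
/* Map a random value onto [min_delay, max_delay) instead of [0, max_delay).
 * If the range is empty, fall back to min_delay. Units are whatever the
 * caller uses (jiffies in the kernel). */
static unsigned long igmp_delay(unsigned long rnd, unsigned long min_delay,
				unsigned long max_delay)
{
	if (max_delay <= min_delay)
		return min_delay;
	return min_delay + rnd % (max_delay - min_delay);
}
```

Keeping the randomization but raising its lower bound preserves the point of the random delay (hosts on the same link do not report in lockstep) while guaranteeing some minimum spacing between reports.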