[PATCH] opensm: Clean up event subscriptions if a port goes away

2013-09-05 Thread Line Holen
Event subscriptions needs to be cleaned up if a port goes away.
If the port comes online again later it may no longer want to
receive the events on the same QPN. If the old QPN is used for
something else the SM forwarding events may cause QKey violations.

Signed-off-by: Line Holen line.ho...@oracle.com

---

diff --git a/include/opensm/osm_inform.h b/include/opensm/osm_inform.h
index f737441..8cefc20 100644
--- a/include/opensm/osm_inform.h
+++ b/include/opensm/osm_inform.h
@@ -2,6 +2,7 @@
  * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
+ * Copyright (c) 2013 Oracle and/or its affiliates. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -200,6 +201,35 @@ void osm_infr_insert_to_db(IN osm_subn_t * p_subn, IN 
osm_log_t * p_log,
 void osm_infr_remove_from_db(IN osm_subn_t * p_subn, IN osm_log_t * p_log,
 IN osm_infr_t * p_infr);
 
+/f* OpenSM: Inform Record/osm_infr_remove_subscriptions
+* NAME
+*  osm_infr_remove_subscriptions
+*
+* DESCRIPTION
+*  Remove all event subscriptions of a port
+*
+* SYNOPSIS
+*/
+ib_api_status_t
+osm_infr_remove_subscriptions(IN osm_subn_t * p_subn, IN osm_log_t * p_log,
+ IN ib_net64_t port_guid);
+/*
+* PARAMETERS
+*  p_subn
+*  [in] Pointer to the subnet object
+*
+*  p_log
+*  [in] Pointer to the log object
+*
+*  port_guid
+*  [in] PortGUID of the subscriber that should be removed
+*
+* RETURN
+*  CL_SUCCESS if port_guid had any subscriptions being removed
+*  CL_NOT_FOUND if port_guid did not have any active subscriptions
+* SEE ALSO
+*/
+
 /f* OpenSM: Inform Record/osm_report_notice
 * NAME
 *  osm_report_notice
diff --git a/opensm/osm_drop_mgr.c b/opensm/osm_drop_mgr.c
index b309273..5f9181a 100644
--- a/opensm/osm_drop_mgr.c
+++ b/opensm/osm_drop_mgr.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2002-2012 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  * Copyright (c) 2008 Xsigo Systems Inc.  All rights reserved.
+ * Copyright (c) 2013 Oracle and/or its affiliates. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -276,6 +277,13 @@ static void drop_mgr_remove_port(osm_sm_t * sm, IN 
osm_port_t * p_port)
 
drop_mgr_clean_physp(sm, p_port-p_physp);
 
+   /* Delete event forwarding subscriptions */
+   if (osm_infr_remove_subscriptions(sm-p_subn, sm-p_log, port_guid)
+   == CL_SUCCESS)
+   OSM_LOG(sm-p_log, OSM_LOG_DEBUG,
+   Removed event subscriptions for port 0x%016 PRIx64 \n,
+   cl_ntoh64(port_guid));
+
/* initialize the p_node - may need to get node_desc later */
p_node = p_port-p_node;
 
diff --git a/opensm/osm_inform.c b/opensm/osm_inform.c
index 804c414..6318700 100644
--- a/opensm/osm_inform.c
+++ b/opensm/osm_inform.c
@@ -282,6 +282,41 @@ void osm_infr_remove_from_db(IN osm_subn_t * p_subn, IN 
osm_log_t * p_log,
OSM_LOG_EXIT(p_log);
 }
 
+ib_api_status_t osm_infr_remove_subscriptions(IN osm_subn_t * p_subn,
+ IN osm_log_t * p_log,
+ IN ib_net64_t port_guid)
+{
+   ib_gid_t gid;
+   cl_list_item_t *p_list_item;
+   osm_infr_t *p_infr;
+   ib_api_status_t status = CL_NOT_FOUND;
+
+   OSM_LOG_ENTER(p_log);
+
+   gid.unicast.interface_id = port_guid;
+   gid.unicast.prefix = p_subn-opt.subnet_prefix;
+
+   /* go over all inform info available at the subnet */
+   /* match to the given GID and delete subscriptions if match */
+   p_list_item = cl_qlist_head(p_subn-sa_infr_list);
+   while (p_list_item != cl_qlist_end(p_subn-sa_infr_list)) {
+
+   p_infr = (osm_infr_t *)p_list_item;
+   p_list_item = cl_qlist_next(p_list_item);
+
+   if (memcmp((p_infr-inform_record.subscriber_gid), gid, 
sizeof(ib_gid_t)))
+   continue;
+
+   /* Remove this event subscription */
+   osm_infr_remove_from_db(p_subn, p_log, p_infr);
+
+   status = CL_SUCCESS;
+   }
+
+   OSM_LOG_EXIT(p_log);
+   return (status);
+}
+
 /**
  * Send a report:
  * Given a target address to send to and the notice.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][MINOR] osm_sa_informinfo.c Add attribute info to log messages

2013-09-05 Thread Hal Rosenstock
On 9/3/2013 7:31 AM, Line Holen wrote:
 Signed-off-by: Line Holen line.ho...@oracle.com

Thanks. Applied with minor change noted below.

 
 ---
 
 diff --git a/opensm/osm_sa_informinfo.c b/opensm/osm_sa_informinfo.c
 index f32b88b..bf1fe97 100644
 --- a/opensm/osm_sa_informinfo.c
 +++ b/opensm/osm_sa_informinfo.c
 @@ -598,7 +598,8 @@ void osm_infr_rcv_process(IN void *context, IN void *data)
   CL_ASSERT(p_sa_mad-attr_id == IB_MAD_ATTR_INFORM_INFO);
  
   if (p_sa_mad-method != IB_MAD_METHOD_SET) {
 - OSM_LOG(sa-p_log, OSM_LOG_DEBUG, Unsupported Method (%s)\n,
 + OSM_LOG(sa-p_log, OSM_LOG_DEBUG,
 + Unsupported Method (%s) for InformInfo\n,
   ib_get_sa_method_str(p_sa_mad-method));
   osm_sa_send_error(sa, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR);
   goto Exit;
 @@ -626,7 +627,8 @@ void osm_infir_rcv_process(IN void *context, IN void 
 *data)
  
   if (p_sa_mad-method != IB_MAD_METHOD_GET 
   p_sa_mad-method != IB_MAD_METHOD_GETTABLE) {
 - OSM_LOG(sa-p_log, OSM_LOG_DEBUG, Unsupported Method (%s)\n,
 + OSM_LOG(sa-p_log, OSM_LOG_DEBUG,
 + Unsupported Method (%s) for InformInfo\n,

InformInfoRecord r.t. InformInfo

-- Hal

   ib_get_sa_method_str(p_sa_mad-method));
   osm_sa_send_error(sa, p_madw, IB_MAD_STATUS_UNSUP_METHOD_ATTR);
   goto Exit;
 

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] osm_sa_guidinfo_record.c False duplicate GUID error messages

2013-09-05 Thread Hal Rosenstock
On 9/3/2013 7:30 AM, Line Holen wrote:
 If the same request is received twice then the second one will cause 
 error messages indicating duplicate alias GUIDs. In this case this is 
 a false warning. The second request should be treated as a void and 
 return success to the requester.
 
 Signed-off-by: Line Holen line.ho...@oracle.com
 
 ---
 
 diff --git a/opensm/osm_sa_guidinfo_record.c b/opensm/osm_sa_guidinfo_record.c
 index 8323b38..cfaf6f3 100644
 --- a/opensm/osm_sa_guidinfo_record.c
 +++ b/opensm/osm_sa_guidinfo_record.c
 @@ -638,6 +638,14 @@ static void set_guidinfo(IN osm_sa_t *sa, IN osm_madw_t 
 *p_madw,
   }
  
  add_alias_guid:
 + /* Check whether the port already contain this alias guid
 +at this index. If so we're done, continue to next */
 + if (set_alias_guid == (*p_port-p_physp-p_guids)[i]) {
 + OSM_LOG(sa-p_log, OSM_LOG_DEBUG,
 + The alias GUID is already correctly assigned, 
 continue\n);
 + continue;
 + }
 +

I don't think this handles all the cases properly. Please see modified
patch to follow this.

-- Hal

   /* allocate alias guid and add to alias guid table */
   p_alias_guid = osm_alias_guid_new(set_alias_guid, p_port);
   if (!p_alias_guid) {
 @@ -656,9 +664,12 @@ add_alias_guid:
   /* alias GUID is a duplicate */
   OSM_LOG(sa-p_log, OSM_LOG_ERROR, ERR 5108: 
   Duplicate alias port GUID 0x% PRIx64
 -  index %d base port GUID 0x% PRIx64 \n,
 +  index %d base port GUID 0x% PRIx64
 + , alias GUID already assigned to 
 + base port GUID 0x% PRIx64 \n,
   cl_ntoh64(p_alias_guid-alias_guid), i,
 - cl_ntoh64(p_alias_guid-p_base_port-guid));
 + cl_ntoh64(p_alias_guid-p_base_port-guid),
 + 
 cl_ntoh64(p_alias_guid_check-p_base_port-guid));
   osm_alias_guid_delete(p_alias_guid);
   /* clear response guid at index to indicate duplicate */
   p_rcvd_rec-guid_info.guid[i % 8] = 0;
 

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH opensm] osm_sa_guidinfo_record.c: False duplicate GUID error messages

2013-09-05 Thread Hal Rosenstock

If the same request is received twice then the second one will cause
error messages indicating duplicate alias GUIDs. In this case this is
a false warning. The second request should be treated as a void and
return success to the requester.

Signed-off-by: Line Holen line.ho...@oracle.com
Signed-off-by: Hal Rosenstock h...@mellanox.com
---
diff --git a/opensm/osm_sa_guidinfo_record.c b/opensm/osm_sa_guidinfo_record.c
index 8323b38..28f6c0f 100644
--- a/opensm/osm_sa_guidinfo_record.c
+++ b/opensm/osm_sa_guidinfo_record.c
@@ -653,15 +653,21 @@ add_alias_guid:

p_alias_guid-alias_guid,

p_alias_guid-map_item);
if (p_alias_guid_check != p_alias_guid) {
-   /* alias GUID is a duplicate */
-   OSM_LOG(sa-p_log, OSM_LOG_ERROR, ERR 5108: 
-   Duplicate alias port GUID 0x% PRIx64
-index %d base port GUID 0x% PRIx64 \n,
-   cl_ntoh64(p_alias_guid-alias_guid), i,
-   cl_ntoh64(p_alias_guid-p_base_port-guid));
-   osm_alias_guid_delete(p_alias_guid);
-   /* clear response guid at index to indicate duplicate */
-   p_rcvd_rec-guid_info.guid[i % 8] = 0;
+   /* alias GUID is a duplicate if it exists on another 
port or on the same port but at another index */
+   if (p_alias_guid_check-p_base_port != p_port ||
+   (*p_port-p_physp-p_guids)[i] != set_alias_guid) {
+   OSM_LOG(sa-p_log, OSM_LOG_ERROR, ERR 5108: 
+   Duplicate alias port GUID 0x% PRIx64
+index %d base port GUID 0x% PRIx64
+   , alias GUID already assigned to 
+   base port GUID 0x% PRIx64 \n,
+   cl_ntoh64(p_alias_guid-alias_guid), i,
+   
cl_ntoh64(p_alias_guid-p_base_port-guid),
+   
cl_ntoh64(p_alias_guid_check-p_base_port-guid));
+   osm_alias_guid_delete(p_alias_guid);
+   /* clear response guid at index to indicate 
duplicate */
+   p_rcvd_rec-guid_info.guid[i % 8] = 0;
+   }
} else {
del_alias_guid = (*p_port-p_physp-p_guids)[i];
if (del_alias_guid) {
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rtnl_lock deadlock on 3.10

2013-09-05 Thread Steve Wise

On 9/5/2013 5:02 AM, Bart Van Assche wrote:

On 07/30/13 14:54, Steve Wise wrote:

On 7/29/2013 6:02 PM, Shawn Bohrer wrote:

On Mon, Jul 15, 2013 at 09:38:19AM -0500, Shawn Bohrer wrote:

On Wed, Jul 03, 2013 at 08:26:11PM +0300, Or Gerlitz wrote:

On 03/07/2013 20:22, Shawn Bohrer wrote:
On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa 
wrote:
On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa 
wrote:

On Tue, Jul 02, 2013 at 01:38:26PM +, Cong Wang wrote:

On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa
han...@stressinduktion.org wrote:

On Mon, Jul 01, 2013 at 09:54:56AM -0500, Shawn Bohrer wrote:

I've managed to hit a deadlock at boot a couple times while
testing
the 3.10 rc kernels.  It seems to always happen when my network
devices are initializing.  This morning I updated to v3.10 and
made a
few config tweaks and so far I've hit it 4 out of 5 reboots.
It looks
like most processes are getting stuck on rtnl_lock.  Below is
a boot
log with the soft lockup prints.  Please let know if there 
is any

other information I can provide:

Could you try a build with CONFIG_LOCKDEP enabled?


The problem is clear: ib_register_device() is called with
rtnl_lock,
but itself needs device_mutex, however, ib_register_client() 
first

acquires device_mutex, then indirectly calls register_netdev()
which
takes rtnl_lock. Deadlock!

One possible fix is always taking rtnl_lock before taking
device_mutex, something like below:

diff --git a/drivers/infiniband/core/device.c
b/drivers/infiniband/core/device.c
index 18c1ece..890870b 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -381,6 +381,7 @@ int ib_register_client(struct ib_client
*client)
  {
  struct ib_device *device;
+rtnl_lock();
  mutex_lock(device_mutex);
  list_add_tail(client-list, client_list);
@@ -389,6 +390,7 @@ int ib_register_client(struct ib_client
*client)
  client-add(device);
  mutex_unlock(device_mutex);
+rtnl_unlock();
  return 0;
  }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b6e049a..5a7a048 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1609,7 +1609,7 @@ static struct net_device
*ipoib_add_port(const char *format,
  goto event_failed;
  }
-result = register_netdev(priv-dev);
+result = register_netdevice(priv-dev);
  if (result) {
  printk(KERN_WARNING %s: couldn't register ipoib port
%d; error %d\n,
 hca-name, port, result);

Looks good to me. Shawn, could you test this patch?
ib_unregister_device/ib_unregister_client would need the same 
change,

too. I have not checked the other -add() and -remove()
functions. Also
cc'ed linux-rdma@vger.kernel.org, Roland Dreier.
Cong's patch is missing the #include linux/rtnetlink.h but 
otherwise
I've had 34 successful reboots with no deadlocks which is a good 
sign.

It sounds like there are more paths that need to be audited and a
proper patch submitted.  I can do more testing later if needed.

Thanks,
Shawn


Guys, I was a bit busy today looking into that, but I don't think we
want the IB core layer  (core/device.c) to
use rtnl locking which is something that belongs to the network 
stack.

Has anymore thought been put into a proper fix for this issue?

I'm no expert in this area but I'm having a hard time seeing a
different solution than the one Cong suggested.  Just to be clear the
deadlock I hit was between cxgb3 and the ipoib module, so I've Cc'd
Steve Wise in case he has a better solution from the Chelsio side.


I don't know of another way to resolve this.   The rtnl lock is used in
ipoib and mlx4 already.  I think we should go forward with the proposed
patch.


(replying to an e-mail of one month ago)

Hello,

It would be appreciated if anyone could report what the current status 
of this issue is. I think a deadlock I ran into with kernels 3.10 and 
3.11 and PCI pass-through is related to this issue. See also 
http://bugzilla.kernel.org/show_bug.cgi?id=60856 for the lockdep report.


Thanks,

Bart.



Roland, what do you think?

As I've said, I think we should go ahead with using the rtnl lock in the 
core.  Is there a complete patch available for review?  looks like the 
original was a partial fix.


Steve.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rtnl_lock deadlock on 3.10

2013-09-05 Thread Shawn Bohrer
On Thu, Sep 05, 2013 at 10:14:51AM -0500, Steve Wise wrote:
 On 9/5/2013 5:02 AM, Bart Van Assche wrote:
 On 07/30/13 14:54, Steve Wise wrote:
 On 7/29/2013 6:02 PM, Shawn Bohrer wrote:
 On Mon, Jul 15, 2013 at 09:38:19AM -0500, Shawn Bohrer wrote:
 On Wed, Jul 03, 2013 at 08:26:11PM +0300, Or Gerlitz wrote:
 On 03/07/2013 20:22, Shawn Bohrer wrote:
 On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes
 Frederic Sowa wrote:
 On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes
 Frederic Sowa wrote:
 On Tue, Jul 02, 2013 at 01:38:26PM +, Cong Wang wrote:
 On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa
 han...@stressinduktion.org wrote:
 On Mon, Jul 01, 2013 at 09:54:56AM -0500, Shawn Bohrer wrote:
 I've managed to hit a deadlock at boot a couple times while
 testing
 the 3.10 rc kernels.  It seems to always happen when my network
 devices are initializing.  This morning I updated to v3.10 and
 made a
 few config tweaks and so far I've hit it 4 out of 5 reboots.
 It looks
 like most processes are getting stuck on rtnl_lock.  Below is
 a boot
 log with the soft lockup prints.  Please let
 know if there is any
 other information I can provide:
 Could you try a build with CONFIG_LOCKDEP enabled?
 
 The problem is clear: ib_register_device() is called with
 rtnl_lock,
 but itself needs device_mutex, however,
 ib_register_client() first
 acquires device_mutex, then indirectly calls register_netdev()
 which
 takes rtnl_lock. Deadlock!
 
 One possible fix is always taking rtnl_lock before taking
 device_mutex, something like below:
 
 diff --git a/drivers/infiniband/core/device.c
 b/drivers/infiniband/core/device.c
 index 18c1ece..890870b 100644
 --- a/drivers/infiniband/core/device.c
 +++ b/drivers/infiniband/core/device.c
 @@ -381,6 +381,7 @@ int ib_register_client(struct ib_client
 *client)
   {
   struct ib_device *device;
 +rtnl_lock();
   mutex_lock(device_mutex);
   list_add_tail(client-list, client_list);
 @@ -389,6 +390,7 @@ int ib_register_client(struct ib_client
 *client)
   client-add(device);
   mutex_unlock(device_mutex);
 +rtnl_unlock();
   return 0;
   }
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
 b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 index b6e049a..5a7a048 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 @@ -1609,7 +1609,7 @@ static struct net_device
 *ipoib_add_port(const char *format,
   goto event_failed;
   }
 -result = register_netdev(priv-dev);
 +result = register_netdevice(priv-dev);
   if (result) {
   printk(KERN_WARNING %s: couldn't register ipoib port
 %d; error %d\n,
  hca-name, port, result);
 Looks good to me. Shawn, could you test this patch?
 ib_unregister_device/ib_unregister_client would need
 the same change,
 too. I have not checked the other -add() and -remove()
 functions. Also
 cc'ed linux-rdma@vger.kernel.org, Roland Dreier.
 Cong's patch is missing the #include linux/rtnetlink.h
 but otherwise
 I've had 34 successful reboots with no deadlocks which
 is a good sign.
 It sounds like there are more paths that need to be audited and a
 proper patch submitted.  I can do more testing later if needed.
 
 Thanks,
 Shawn
 
 Guys, I was a bit busy today looking into that, but I don't think we
 want the IB core layer  (core/device.c) to
 use rtnl locking which is something that belongs to the
 network stack.
 Has anymore thought been put into a proper fix for this issue?
 I'm no expert in this area but I'm having a hard time seeing a
 different solution than the one Cong suggested.  Just to be clear the
 deadlock I hit was between cxgb3 and the ipoib module, so I've Cc'd
 Steve Wise in case he has a better solution from the Chelsio side.
 
 I don't know of another way to resolve this.   The rtnl lock is used in
 ipoib and mlx4 already.  I think we should go forward with the proposed
 patch.
 
 (replying to an e-mail of one month ago)
 
 Hello,
 
 It would be appreciated if anyone could report what the current
 status of this issue is. I think a deadlock I ran into with
 kernels 3.10 and 3.11 and PCI pass-through is related to this
 issue. See also http://bugzilla.kernel.org/show_bug.cgi?id=60856
 for the lockdep report.
 
 Thanks,
 
 Bart.
 
 
 Roland, what do you think?
 
 As I've said, I think we should go ahead with using the rtnl lock in
 the core.  Is there a complete patch available for review?  looks
 like the original was a partial fix.

I've been running with Cong's partial fix for the past couple of
months, and I'm pretty sure no complete patch has been posted.
I may be able to look at the missing pieces tomorrow and see if I can
put together a patch but if someone else wants to run with this feel
free.

--
Shawn
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/4] RDMA/nes and RDMA/cxgb4: iWARP Port Mapper Overview

2013-09-05 Thread Nikolova, Tatyana E

Hello Steve,

Thank you for the feedback.

 I think there are lots of #defines and enums replicated in both 
 nes_netlink.h and c4iw_netlink.h.  For example, the IWPM_* 
 defines and enums.  It seems like we could put all this in a common header 
 file to be included by all iwarp providers?  
 Say include/rdma/iw_portmap.h or something.

Having a common rdma/iw_portmap.h is a good idea. 

 (and maybe even a common core module if there is enough common 
 functions)

At this early stage of the project, I think that having a separate core module 
for the iwarp port mapper could be overhead. That is the reason we initially 
decided to keep the changes locally in the drivers using the service.

Tatyana

-Original Message-
From: Steve Wise [mailto:sw...@opengridcomputing.com] 
Sent: Tuesday, September 03, 2013 10:34 AM
To: Nikolova, Tatyana E
Cc: Roland Dreier; Sharp, Robert O; Lacombe, John S; vi...@chelsio.com; 
linux-rdma@vger.kernel.org
Subject: Re: [PATCH 0/4] RDMA/nes and RDMA/cxgb4: iWARP Port Mapper Overview

On 8/31/2013 2:41 PM, Tatyana Nikolova wrote:
 Hello All,

 This patch series adds iWARP Port Mapper (IWPM) service support in 
 RDMA/core, RDMA/nes driver and RDMA/cxgb4 driver. The iWARP Port 
 Mapper implementation is based on the port mapper specification 
 section in the Sockets Direct Protocol paper - 
 http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf


Hey Tatyana,

I'm replying to 0/4 because these comments are for both iwarp providers.

I think there are lots of #defines and enums replicated in both nes_netlink.h 
and c4iw_netlink.h.  For example, the IWPM_* defines and enums.  It seems like 
we could put all this in a common header file to be included by all iwarp 
providers?  Say include/rdma/iw_portmap.h or something.  Basically the 
interface to the user mode daemon is the same for all providers, no?  Those 
bits should be in a common header.  (and maybe even a common core module if 
there is enough common functions).

Thoughts?

Steve.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Strange NFS client ACK behaviour

2013-09-05 Thread Wendy Cheng
CC linux-nfs .. maybe this is obvious to someone there ... Two
comments inlined below.

On Tue, Sep 3, 2013 at 11:28 AM, Markus Stockhausen
stockhau...@collogia.de wrote:
 Hello,

 we observed a performance drop in our IPoIB NFS backup
 infrastructure since we switched to machines with newer
 kernels. As I do not know where to start I hope someone
 on this list can give me hint where to dig for more details.

In case of no other reply, I would start w/ a socket program (or a
network performance measuring tool) on the interface that does similar
logic as dd you described below; that is, send a 256K message in a
fixed number of loops (so total transfer size somewhere close to your
file size) between client and server, followed by comparing the
interrupt counters (cat /proc/interrtups) on both kernels. If the
interrupt count differs as you described, the problem is most likely
with the IB driver, not NFS layer.


 To make a long story short. We use ConnectX cards with the
 standard kernel drivers on version 2.6.32 (Ubuntu 10.04), 3.5
 (Ubuntu 12.04) and 3.10 (Fedora 19). The very simple and not
 scientific test consists of mounting a NFS share using IPoIB UD
 network interfaces at MTU of 2044. Afterwards read a large file
 on the client side with dd if=file of=/dev/null bs=256K.
 During the transfer we run a tcpdump on the ibX interface on
 the NFS server side. No special settings for kernel parameters
 until now.

I don't know much about ConnectX. Not sure what IPoIB UD means ?
Datagram vs. CM or TCP vs. UDP ?


 When doing the test with a 2.6.32 kernel based client we see the
 following packet sequence. More or less a lot of transferd blocks
 from the NFS server to the client with sometimes an ACK package
 from the client to the server:

 16:16:45.050930 IP server.nfs  cli_2_6_32.896:
   Flags [.], seq 8909853:8913837, ack 1154149,
   win 604, options [nop,nop,TS val 1640401415
   ecr 3881919089], length 3984
 16:16:45.050936 IP server.nfs  cli_2_6_32.896:
   Flags [.], seq 8913837:8917821, ack 1154149,
   win 604, options [nop,nop,TS val 1640401415
   ecr 3881919089], length 3984

 ... 8 more ...

 16:16:45.050976 IP cli_2_6_32.896  server.nfs:
   Flags [.], ack 8909853, win 24574, options
   [nop,nop,TS val 3881919089 ecr 1640401415],
   length 0
 ...

 After switchng to a client with a newer kernel (3.5 or 3.10) the
 sequence all of a sudden gives just the opposite behaviour.
 One should note that this is the same server as in the test
 above. The server sends bigger packets (I guess TSO is doing
 the rest of the work). After each packet the client sends
 several ACK packages back.

 16:15:21.038782 IP server.nfs  cli_3_5_0.928:
   Flags [.], seq 9612429:9652269, ack 372776,
   win 5815, options [nop,nop,TS val 1640380412
   ecr 560111379], length 39840
 16:15:21.038806 IP cli_3_5_0.928  server.nfs:
   Flags [.], ack 9542205, win 16384, options
   [nop,nop,TS val 560111379 ecr 1640380412],
   length 0
 16:15:21.038812 IP cli_3_5_0.928  server.nfs:
   Flags [.], ack 9546077, win 16384, options
   [nop,nop,TS val 560111379 ecr 1640380412],
 length 0

 ... 6-8 more ...

 The visible side effects of this changed processing include:
 - NIC interrupts on the NFS servers raise by a factor of 8.
 - Transfer speed lowers by 50% (400-200 MB/sec)

 Best regards.

 Markus
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/4] RDMA/nes and RDMA/cxgb4: iWARP Port Mapper Overview

2013-09-05 Thread Nikolova, Tatyana E
Hello Or,

 Does this series follows the netconf 2011session e.g from this link 
 http://vger.kernel.org/netconf2011_slides/pj_netconf2011.ppt?
Yes

Thanks,
Tatyana

-Original Message-
From: linux-rdma-ow...@vger.kernel.org 
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Or Gerlitz
Sent: Wednesday, September 04, 2013 5:19 AM
To: Nikolova, Tatyana E
Cc: Roland Dreier; Sharp, Robert O; Lacombe, John S; Vipul Pandya; Steve Wise; 
linux-rdma
Subject: Re: [PATCH 0/4] RDMA/nes and RDMA/cxgb4: iWARP Port Mapper Overview

On Sat, Aug 31 2013, Tatyana Nikolova tatyana.e.nikol...@intel.com wrote:
[...]
 The patches are built against Roland's infiniband tree for-next branch.
 We would like this series to get in the linux-3.12 merge window.

So why submitting it only 48 hours before the merge window opens up?

Does this series follows the netconf 2011session e.g from this link 
http://vger.kernel.org/netconf2011_slides/pj_netconf2011.ppt ?
Personally, I will not be able to review the concept and core patches before 
next week, but if others here can and are OK for this to 3.12, let it be.

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH opensm] osm_sa_guidinfo_record.c: False duplicate GUID error messages

2013-09-05 Thread Line Holen

Looks good to me. Thanks,
Line

On 09/05/13 16:18, Hal Rosenstock wrote:

If the same request is received twice then the second one will cause
error messages indicating duplicate alias GUIDs. In this case this is
a false warning. The second request should be treated as a void and
return success to the requester.

Signed-off-by: Line Holenline.ho...@oracle.com
Signed-off-by: Hal Rosenstockh...@mellanox.com
---
diff --git a/opensm/osm_sa_guidinfo_record.c b/opensm/osm_sa_guidinfo_record.c
index 8323b38..28f6c0f 100644
--- a/opensm/osm_sa_guidinfo_record.c
+++ b/opensm/osm_sa_guidinfo_record.c
@@ -653,15 +653,21 @@ add_alias_guid:

p_alias_guid-alias_guid,

p_alias_guid-map_item);
if (p_alias_guid_check != p_alias_guid) {
-   /* alias GUID is a duplicate */
-   OSM_LOG(sa-p_log, OSM_LOG_ERROR, ERR 5108: 
-   Duplicate alias port GUID 0x% PRIx64
-index %d base port GUID 0x% PRIx64 \n,
-   cl_ntoh64(p_alias_guid-alias_guid), i,
-   cl_ntoh64(p_alias_guid-p_base_port-guid));
-   osm_alias_guid_delete(p_alias_guid);
-   /* clear response guid at index to indicate duplicate */
-   p_rcvd_rec-guid_info.guid[i % 8] = 0;
+   /* alias GUID is a duplicate if it exists on another 
port or on the same port but at another index */
+   if (p_alias_guid_check-p_base_port != p_port ||
+   (*p_port-p_physp-p_guids)[i] != set_alias_guid) {
+   OSM_LOG(sa-p_log, OSM_LOG_ERROR, ERR 5108: 
+   Duplicate alias port GUID 0x% PRIx64
+index %d base port GUID 0x% PRIx64
+   , alias GUID already assigned to 
+   base port GUID 0x% PRIx64 \n,
+   cl_ntoh64(p_alias_guid-alias_guid), i,
+   
cl_ntoh64(p_alias_guid-p_base_port-guid),
+   
cl_ntoh64(p_alias_guid_check-p_base_port-guid));
+   osm_alias_guid_delete(p_alias_guid);
+   /* clear response guid at index to indicate 
duplicate */
+   p_rcvd_rec-guid_info.guid[i % 8] = 0;
+   }
} else {
del_alias_guid = (*p_port-p_physp-p_guids)[i];
if (del_alias_guid) {
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/4] RDMA/nes and RDMA/cxgb4: iWARP Port Mapper Overview

2013-09-05 Thread Nikolova, Tatyana E
Hello Sean,

+   RDMA_NL_NES,
+   RDMA_NL_C4IW

 Why is the communication with the daemon device specific?  

These enums are necessary. The communication with the IWPM daemon is device 
specific, because each device is a different netlink client and needs to 
register with netlink a set of callback functions. The drivers nes and cxgb4 
are consuming two netlink clients, using the following 

ibnl_add_client(RDMA_NL_NES, RDMA_NL_IWPM_NUM_OPS, nes_nl_cb_table)
ibnl_add_client(RDMA_NL_C4IW, RDMA_NL_IWPM_NUM_OPS, c4iw_nl_cb_table)

The current implementation will work with two iwarp providers (nes and cxgb4) 
on the same system and the user space iwarp port mapper will be servicing two 
clients at the same time.

 Can the iwarp cm fully encapsulate these changes?
This needs more consideration, because it may not be a trivial change.

Thank you,
Tatyana

-Original Message-
From: Hefty, Sean 
Sent: Tuesday, September 03, 2013 12:22 PM
To: Steve Wise; Nikolova, Tatyana E
Cc: Roland Dreier; Sharp, Robert O; Lacombe, John S; vi...@chelsio.com; 
linux-rdma@vger.kernel.org
Subject: RE: [PATCH 0/4] RDMA/nes and RDMA/cxgb4: iWARP Port Mapper Overview

 I think there are lots of #defines and enums replicated in both 
 nes_netlink.h and c4iw_netlink.h.  For example, the IWPM_* defines and 
 enums.  It seems like we could put all this in a common header file to 
 be included by all iwarp providers?  Say include/rdma/iw_portmap.h or 
 something.  Basically the interface to the user mode daemon is the 
 same for all providers, no?  Those bits should be in a common header.  
 (and maybe even a common core module if there is enough common functions).
 
 Thoughts?

I agree.  This enum change looks off:

+   RDMA_NL_NES,
+   RDMA_NL_C4IW

Why is the communication with the daemon device specific?  Can the iwarp cm 
fully encapsulate these changes?

- Sean
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html