[PATCH v2][TRIVIAL] osm_sm_state_mgr.c Trivial log changes

2013-06-20 Thread Line Holen
Signed-off-by: Line Holen line.ho...@oracle.com

---

diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
index 11defdd..5d4b651 100644
--- a/opensm/osm_sm_state_mgr.c
+++ b/opensm/osm_sm_state_mgr.c
@@ -157,7 +157,7 @@ static void sm_state_mgr_start_polling(osm_sm_t * sm)
cl_status = cl_timer_start(sm-polling_timer, timeout);
if (cl_status != CL_SUCCESS)
OSM_LOG(sm-p_log, OSM_LOG_ERROR, ERR 3210: 
-   Failed to start timer\n);
+   Failed to start polling timer\n);
 
OSM_LOG_EXIT(sm-p_log);
 }
@@ -201,7 +201,8 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
 * osm_sm_state_mgr_process with signal OSM_SM_SIGNAL_POLLING_TIMEOUT
 */
sm-retry_number++;
-   OSM_LOG(sm-p_log, OSM_LOG_VERBOSE, Retry number:%d\n,
+   OSM_LOG(sm-p_log, OSM_LOG_VERBOSE, SM State %d (%s), Retry 
number:%d\n,
+   sm-p_subn-sm_state,  
osm_get_sm_mgr_state_str(sm-p_subn-sm_state),
sm-retry_number);
 
if (sm-retry_number = sm-p_subn-opt.polling_retry_number) {
@@ -219,7 +220,7 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
cl_status = cl_timer_start(sm-polling_timer, timeout);
if (cl_status != CL_SUCCESS)
OSM_LOG(sm-p_log, OSM_LOG_ERROR, ERR 3211: 
-   Failed to restart timer\n);
+   Failed to restart polling timer\n);
 
 Exit:
OSM_LOG_EXIT(sm-p_log);
@@ -414,8 +415,8 @@ ib_api_status_t osm_sm_state_mgr_process(osm_sm_t * sm,
 * handover from it.
 */
OSM_LOG(sm-p_log, OSM_LOG_VERBOSE,
-   Forcing heavy sweep. 
-   Received OSM_SM_SIGNAL_HANDOVER or 
OSM_SM_SIGNAL_POLLING_TIMEOUT\n);
+   Forcing heavy sweep. Received signal %s\n,
+   osm_get_sm_mgr_signal_str(signal));
/* Force set_client_rereg_on_sweep, we don't know what 
the other
 * SM may have configure/done on the fabric.
 */
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mm: Revert pinned_vm braindamage

2013-06-20 Thread Ingo Molnar

* Thomas Gleixner t...@linutronix.de wrote:

 On Thu, 13 Jun 2013, Andrew Morton wrote:
  Let's try to get this wrapped up?
  
  On Thu, 6 Jun 2013 14:43:51 +0200 Peter Zijlstra pet...@infradead.org 
  wrote:
  
   
   Patch bc3e53f682 (mm: distinguish between mlocked and pinned pages)
   broke RLIMIT_MEMLOCK.
  
  I rather like what bc3e53f682 did, actually.  RLIMIT_MEMLOCK limits the
  amount of memory you can mlock().  Nice and simple.
  
  This pinning thing which infiniband/perf are doing is conceptually
  different and if we care at all, perhaps we should be looking at adding
  RLIMIT_PINNED.
 
 Actually PINNED is just a stronger version of MEMLOCK. PINNED and 
 MEMLOCK are both preventing the page from being paged out. PINNED adds 
 the constraint of preventing minor faults as well.
 
 So I think the really important tuning knob is the limitation of pages 
 which cannot be paged out. And this is what RLIMIT_MEMLOCK is about.
 
 Now if you want to add RLIMIT_PINNED as well, then it only limits the 
 number of pages which cannot create minor faults, but that does not 
 affect the limitation of total pages which cannot be paged out.

Agreed.

( Furthermore, the RLIMIT_MEMLOCK semantics change actively broke code so
  this is not academic and it would be nice to progress with it. )

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mm: Revert pinned_vm braindamage

2013-06-20 Thread Ingo Molnar

* Christoph Lameter c...@gentwo.org wrote:

 On Mon, 17 Jun 2013, Peter Zijlstra wrote:
 
  They did no such thing; being one of those who wrote such code. I
  expressly used RLIMIT_MEMLOCK for its the one limit userspace has to
  limit pages that are exempt from paging.
 
 Dont remember reviewing that. Assumptions were wrong in that patch then.
 
   Pinned pages are exempted by the kernel. A device driver or some other
   kernel process (reclaim, page migration, io etc) increase the page count.
   There is currently no consistent accounting for pinned pages. The
   vm_pinned counter was introduced to allow the largest pinners to track
   what they did.
 
  No, not the largest, user space controlled pinnners. The thing that
  makes all the difference is the _USER_ control.
 
 The pinning *cannot* be done from user space. Here it is the IB subsystem
 that is doing it.

Peter clearly pointed it out that in the perf case it's user-space that 
initiates the pinned memory mapping which is resource-controlled via 
RLIMIT_MEMLOCK - and this was implemented that way before your commit 
broke the code.

You seem to be hell bent on defining 'memory pinning' only as the thing 
done via the mlock*() system calls, but that is a nonsensical distinction 
that actively and incorrectly ignores other system calls that can and do 
pin memory legitimately.

If some other system call results in mapping pinned memory that is at 
least as restrictively pinned as an mlock()-ed vma (the perf syscall is 
such) then it's entirely proper design to be resource controlled under 
RLIMIT_MEMLOCK as well. In fact this worked so before your commit broke 
it.

   mlockall does not require CAP_IPC_LOCK. Never had an issue.
 
  MCL_FUTURE does absolutely require CAP_IPC_LOCK, MCL_CURRENT requires 
  a huge (as opposed to the default 64k) RLIMIT or CAP_IPC_LOCK.
 
  There's no argument there, look at the code.
 
 I am sorry but we have been mlockall() for years now without the issues 
 that you are bringing up. AFAICT mlockall does not require MCL_FUTURE.

You only have to read the mlockall() code to see that Peter's claim is 
correct:

mm/mlock.c:

SYSCALL_DEFINE1(mlockall, int, flags)
{
unsigned long lock_limit;
int ret = -EINVAL;

if (!flags || (flags  ~(MCL_CURRENT | MCL_FUTURE)))
goto out;

ret = -EPERM;
if (!can_do_mlock())
goto out;
...


int can_do_mlock(void)
{
if (capable(CAP_IPC_LOCK))
return 1;
if (rlimit(RLIMIT_MEMLOCK) != 0)
return 1;
return 0;
}
EXPORT_SYMBOL(can_do_mlock);

Q.E.D.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 0/4] Add IPv6 support for iWARP

2013-06-20 Thread Steve Wise

On 6/19/2013 11:08 PM, David Miller wrote:

From: Steve Wise sw...@opengridcomputing.com
Date: Wed, 19 Jun 2013 21:19:13 -0500


On 6/19/2013 8:01 PM, David Miller wrote:

From: Vipul Pandya vi...@chelsio.com
Date: Wed, 12 Jun 2013 17:11:38 +0530


We have included all the maintainers of respective drivers. Kindly
review the change and let us know in case of any review comments.

I have not seen anyone review v2 of this patch series.


Reviewed-by: Steve Wise sw...@opengridcomputing.com

You wrote the first patch, and I bet you didn't even read the code in
the cxgb4 driver.  So your review is sort of pointless... UNLESS you
spotted the obvious bugs in these changes, that would have been
interesting.

Because NOBODY, and I mean NOBODY, even looked at the build of the
cxgb4 changes.

Tell me what this does:

struct tid_info *t = dev-rdev.lldi.tids;
int status = GET_AOPEN_STATUS(ntohl(rpl-atid_status));
+   struct sockaddr_in *la = (struct sockaddr_in *)ep-com.local_addr;
+   struct sockaddr_in *ra = (struct sockaddr_in *)ep-com.remote_addr;
+   struct sockaddr_in6 *la6 = (struct sockaddr_in6 *)ep-com.local_addr;
+   struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)ep-com.remote_addr;
+
+
  
  	ep = lookup_atid(t, atid);


Dereferencing 'ep' before initializing it.

The compiler complains loudly about this, therefore nobody even looked at
the build logs from these changes before submitting them to me.

That translates to don't care, and if the people submitting this
code don't care why should I?

Sorry, not impressed.  I'm seriously going to take my time reviewing
any future submissions of these changes, because it's obvious that
even the people writing and submitting this code DO NOT CARE.



We do care.  We screwed up.


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 0/4] Add IPv6 support for iWARP

2013-06-20 Thread Vipul Pandya


On 20-06-2013 09:38, David Miller wrote:
 From: Steve Wise sw...@opengridcomputing.com
 Date: Wed, 19 Jun 2013 21:19:13 -0500
 
 On 6/19/2013 8:01 PM, David Miller wrote:
 From: Vipul Pandya vi...@chelsio.com
 Date: Wed, 12 Jun 2013 17:11:38 +0530

 We have included all the maintainers of respective drivers. Kindly
 review the change and let us know in case of any review comments.
 I have not seen anyone review v2 of this patch series.


 Reviewed-by: Steve Wise sw...@opengridcomputing.com
 
 You wrote the first patch, and I bet you didn't even read the code in
 the cxgb4 driver.  So your review is sort of pointless... UNLESS you
 spotted the obvious bugs in these changes, that would have been
 interesting.
 
 Because NOBODY, and I mean NOBODY, even looked at the build of the
 cxgb4 changes.
 
 Tell me what this does:
 
   struct tid_info *t = dev-rdev.lldi.tids;
   int status = GET_AOPEN_STATUS(ntohl(rpl-atid_status));
 + struct sockaddr_in *la = (struct sockaddr_in *)ep-com.local_addr;
 + struct sockaddr_in *ra = (struct sockaddr_in *)ep-com.remote_addr;
 + struct sockaddr_in6 *la6 = (struct sockaddr_in6 *)ep-com.local_addr;
 + struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)ep-com.remote_addr;
 +
 +
  
   ep = lookup_atid(t, atid);
 
 Dereferencing 'ep' before initializing it.
 
 The compiler complains loudly about this, therefore nobody even looked at
 the build logs from these changes before submitting them to me.
 
 That translates to don't care, and if the people submitting this
 code don't care why should I?
 
 Sorry, not impressed.  I'm seriously going to take my time reviewing
 any future submissions of these changes, because it's obvious that
 even the people writing and submitting this code DO NOT CARE.
 

I am really very sorry for this. Somehow my compiler is not giving me
any warnings for this. My compiler is gcc 4.4.6 20120305 (Red Hat
4.4.6-4). Previously also once it has happened that my compiler did not
give any warning but your build environment caught. Is there any special
gcc option I have to pass with make command for this? I am following the
checklist mentioned in Documentation/SubmitChecklist file.

We always make sure all our drivers are building cleanly before
submitting the drivers. We also have unit tested this code. However the
problematic code gets executed only in error path hence could not catch
during unit testing.

I will resubmitt the series with the changes. Your review comments are
very valuable for us.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Allow arbitrary int values for MTU

2013-06-20 Thread Jeff Squyres (jsquyres)
On Jun 18, 2013, at 2:49 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 +int num_to_ibv_mtu(int num);
 
 Probably should be ibv_num_to_mtu() to keep with the naming pattern..


New patch coming momentarily, but I wanted to comment on this one: 

I used the name num_to_ibv_mtu because it is in the spirit of the other 
enum-to-int/int-to-enum function pair naming conventions:

int ibv_rate_to_mult(enum ibv_rate rate);
enum ibv_rate mult_to_ibv_rate(int mult);

int ibv_rate_to_mbps(enum ibv_rate rate);
enum ibv_rate mbps_to_ibv_rate(int mbps);

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-20 Thread Jeff Squyres
Keep IBV_MTU_* enums values as they are, but pass MTU values around as
int's.  This is an ABI-compatible change; legacy applications will use
the enum values, but newer applications can use an int for values that
do not currently exist in the enum set (e.g., 1500, 9000).

(if people like the idea of this patch, I will send the corresponding
kernel patch)

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 Makefile.am|  3 ++-
 examples/devinfo.c | 20 +++---
 examples/pingpong.c| 12 -
 examples/pingpong.h|  1 -
 examples/rc_pingpong.c |  8 +++---
 examples/srq_pingpong.c|  8 +++---
 examples/uc_pingpong.c |  8 +++---
 examples/ud_pingpong.c |  2 +-
 include/infiniband/verbs.h | 55 ++---
 man/ibv_modify_qp.3|  2 +-
 man/ibv_mtu_to_num.3   | 67 ++
 man/ibv_query_port.3   |  4 +--
 man/ibv_query_qp.3 |  2 +-
 13 files changed, 142 insertions(+), 50 deletions(-)
 create mode 100644 man/ibv_mtu_to_num.3

diff --git a/Makefile.am b/Makefile.am
index 40e83be..1159e55 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 
man/ibv_devinfo.1   \
 man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \
 man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3   \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3
\
-man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
+man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3  \
+man/ibv_mtu_to_num.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
diff --git a/examples/devinfo.c b/examples/devinfo.c
index ff078e4..9f51dcb 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -111,18 +111,6 @@ static const char *atomic_cap_str(enum ibv_atomic_cap 
atom_cap)
}
 }
 
-static const char *mtu_str(enum ibv_mtu max_mtu)
-{
-   switch (max_mtu) {
-   case IBV_MTU_256:  return 256;
-   case IBV_MTU_512:  return 512;
-   case IBV_MTU_1024: return 1024;
-   case IBV_MTU_2048: return 2048;
-   case IBV_MTU_4096: return 4096;
-   default:   return invalid MTU;
-   }
-}
-
 static const char *width_str(uint8_t width)
 {
switch (width) {
@@ -301,10 +289,10 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
uint8_t ib_port)
printf(\t\tport:\t%d\n, port);
printf(\t\t\tstate:\t\t\t%s (%d)\n,
   port_state_str(port_attr.state), port_attr.state);
-   printf(\t\t\tmax_mtu:\t\t%s (%d)\n,
-  mtu_str(port_attr.max_mtu), port_attr.max_mtu);
-   printf(\t\t\tactive_mtu:\t\t%s (%d)\n,
-  mtu_str(port_attr.active_mtu), port_attr.active_mtu);
+   printf(\t\t\tmax_mtu:\t\t%d (%d)\n,
+  ibv_mtu_to_num(port_attr.max_mtu), port_attr.max_mtu);
+   printf(\t\t\tactive_mtu:\t\t%d (%d)\n,
+  ibv_mtu_to_num(port_attr.active_mtu), 
port_attr.active_mtu);
printf(\t\t\tsm_lid:\t\t\t%d\n, port_attr.sm_lid);
printf(\t\t\tport_lid:\t\t%d\n, port_attr.lid);
printf(\t\t\tport_lmc:\t\t0x%02x\n, port_attr.lmc);
diff --git a/examples/pingpong.c b/examples/pingpong.c
index 90732ef..d1c22c9 100644
--- a/examples/pingpong.c
+++ b/examples/pingpong.c
@@ -36,18 +36,6 @@
 #include stdio.h
 #include string.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu)
-{
-   switch (mtu) {
-   case 256:  return IBV_MTU_256;
-   case 512:  return IBV_MTU_512;
-   case 1024: return IBV_MTU_1024;
-   case 2048: return IBV_MTU_2048;
-   case 4096: return IBV_MTU_4096;
-   default:   return -1;
-   }
-}
-
 uint16_t pp_get_local_lid(struct ibv_context *context, int port)
 {
struct ibv_port_attr attr;
diff --git a/examples/pingpong.h b/examples/pingpong.h
index 9cdc03e..91d217b 100644
--- a/examples/pingpong.h
+++ b/examples/pingpong.h
@@ -35,7 +35,6 @@
 
 #include infiniband/verbs.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu);
 uint16_t pp_get_local_lid(struct ibv_context *context, int port);
 int pp_get_port_info(struct ibv_context *context, int port,
 struct ibv_port_attr *attr);
diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index 15494a1..2d6d30e 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -78,7 +78,7 @@ struct pingpong_dest {
 };
 
 static int pp_connect_ctx(struct pingpong_context *ctx, int port, int my_psn,
- enum ibv_mtu mtu, int sl,
+ ibv_mtu_t mtu, int sl,
  struct pingpong_dest *dest, int sgid_idx)
 {
struct ibv_qp_attr attr = {
@@ -209,7 +209,7 @@ out:
 }
 
 static struct 

Re: [PATCH for-next 0/8] Add Mellanox mlx5 driver for Connect-IB devices

2013-06-20 Thread Or Gerlitz

On 16/06/2013 15:02, Eli Cohen wrote:

From: Eli Cohen e...@mellanox.com

The patches that follow constitute the driver for Mellanox's 5th generation
of HCAs named Connect-IB.

The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This
partitioning resembles what we have for mlx4 with the substantial difference
that mlx5_ib is the pci device driver and not mlx5_core.

mlx5_core provides general functionality that is intended to be used by
other Mellanox devices that will be introduced in the future. In this sense,
it can be perceived as a library. mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.


Hi Dave,

So we skipped netdev in V0, in an attempt to reduce cross postings... 
anyway, the mlx5_core driver is similar story as of mlx4_core. So, if 
looking forward, for the initial merge to be simpler, are you OK for 
both the core and IB driver to go through Roland's tree?


Or.



The patches are partitioned to avoid exceeding the 100KB vger.kernel.org
limitation. Only the last patch adds the Makefiles and Kconfigs, to make
things robust for future bisections.

PPC is not yet supported but support will be included in the near future.

Eli

Eli Cohen (8):
   mlx5: Mellanox Connect-IB driver part 1/8
   mlx5: Mellanox Connect-IB driver part 2/8
   mlx5: Mellanox Connect-IB driver part 3/8
   mlx5: Mellanox Connect-IB driver part 4/8
   mlx5: Mellanox Connect-IB driver part 5/8
   mlx5: Mellanox Connect-IB driver part 6/8
   mlx5: Mellanox Connect-IB driver part 7/8
   mlx5: Mellanox Connect-IB driver part 8/8

  MAINTAINERS|   22 +
  drivers/infiniband/Kconfig |1 +
  drivers/infiniband/Makefile|1 +
  drivers/infiniband/hw/mlx5/Kconfig |   10 +
  drivers/infiniband/hw/mlx5/Makefile|4 +
  drivers/infiniband/hw/mlx5/ah.c|   95 +
  drivers/infiniband/hw/mlx5/cq.c|  851 +++
  drivers/infiniband/hw/mlx5/doorbell.c  |  100 +
  drivers/infiniband/hw/mlx5/mad.c   |  143 ++
  drivers/infiniband/hw/mlx5/main.c  | 1512 
  drivers/infiniband/hw/mlx5/mem.c   |  194 ++
  drivers/infiniband/hw/mlx5/mlx5_ib.h   |  593 +
  drivers/infiniband/hw/mlx5/mr.c| 1025 
  drivers/infiniband/hw/mlx5/qp.c| 2549 
  drivers/infiniband/hw/mlx5/srq.c   |  481 
  drivers/infiniband/hw/mlx5/user.h  |  123 +
  drivers/net/ethernet/mellanox/Kconfig  |1 +
  drivers/net/ethernet/mellanox/Makefile |1 +
  drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   18 +
  drivers/net/ethernet/mellanox/mlx5/core/Makefile   |6 +
  drivers/net/ethernet/mellanox/mlx5/core/alloc.c|  244 ++
  drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 1497 
  drivers/net/ethernet/mellanox/mlx5/core/cq.c   |  226 ++
  drivers/net/ethernet/mellanox/mlx5/core/debugfs.c  |  600 +
  drivers/net/ethernet/mellanox/mlx5/core/eq.c   |  523 
  drivers/net/ethernet/mellanox/mlx5/core/fw.c   |  187 ++
  drivers/net/ethernet/mellanox/mlx5/core/health.c   |  216 ++
  drivers/net/ethernet/mellanox/mlx5/core/mad.c  |   80 +
  drivers/net/ethernet/mellanox/mlx5/core/main.c |  483 
  drivers/net/ethernet/mellanox/mlx5/core/mcg.c  |  108 +
  .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   96 +
  drivers/net/ethernet/mellanox/mlx5/core/mr.c   |  138 ++
  .../net/ethernet/mellanox/mlx5/core/pagealloc.c|  438 
  drivers/net/ethernet/mellanox/mlx5/core/pd.c   |  103 +
  drivers/net/ethernet/mellanox/mlx5/core/port.c |  106 +
  drivers/net/ethernet/mellanox/mlx5/core/qp.c   |  303 +++
  drivers/net/ethernet/mellanox/mlx5/core/srq.c  |  225 ++
  drivers/net/ethernet/mellanox/mlx5/core/uar.c  |  225 ++
  include/linux/mlx5/cmd.h   |   51 +
  include/linux/mlx5/cq.h|  166 ++
  include/linux/mlx5/device.h|  886 +++
  include/linux/mlx5/doorbell.h  |   81 +
  include/linux/mlx5/driver.h|  763 ++
  include/linux/mlx5/qp.h|  467 
  include/linux/mlx5/srq.h   |   41 +
  45 files changed, 15983 insertions(+)
  create mode 100644 drivers/infiniband/hw/mlx5/Kconfig
  create mode 100644 drivers/infiniband/hw/mlx5/Makefile
  create mode 100644 drivers/infiniband/hw/mlx5/ah.c
  create mode 100644 drivers/infiniband/hw/mlx5/cq.c
  create mode 100644 drivers/infiniband/hw/mlx5/doorbell.c
  create mode 100644 drivers/infiniband/hw/mlx5/mad.c
  create mode 100644 drivers/infiniband/hw/mlx5/main.c
  create mode 100644 drivers/infiniband/hw/mlx5/mem.c
  create mode 100644 

Re: [PATCH] mm: Revert pinned_vm braindamage

2013-06-20 Thread Christoph Lameter
On Thu, 20 Jun 2013, Ingo Molnar wrote:

 Peter clearly pointed it out that in the perf case it's user-space that
 initiates the pinned memory mapping which is resource-controlled via
 RLIMIT_MEMLOCK - and this was implemented that way before your commit
 broke the code.

There is no way that user space can initiate a page pin right now. Perf is
pinning the page from the kernel. Similarly the IB subsystem pins memory
meeded for device I/O.

 You seem to be hell bent on defining 'memory pinning' only as the thing
 done via the mlock*() system calls, but that is a nonsensical distinction
 that actively and incorrectly ignores other system calls that can and do
 pin memory legitimately.

Nope. I have said that Memory pinning is done by increasing the refcount
which is different from mlock which sets a page flag.

I have consistently argued that these are two different things. And I am
a
bit surprised that this point has not been understood after all these
repetitions.

Memory pinning these days is done as a side effect of kernel / driver
needs. I.e. the memory registration done through the IB subsystem and
elsewhere.

 int can_do_mlock(void)
 {
 if (capable(CAP_IPC_LOCK))
 return 1;
 if (rlimit(RLIMIT_MEMLOCK) != 0)
 return 1;
 return 0;
 }
 EXPORT_SYMBOL(can_do_mlock);

 Q.E.D.

Argh. Just checked the apps. True. They did set the rlimit to 0 at some
point in order to make this work. Then they monitor the number of locked
pages and create alerts so that action can be taking if a system uses too
many mlocked pages.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NFS over RDMA benchmark

2013-06-20 Thread Or Gerlitz

On 19/06/2013 18:47, Wendy Cheng wrote:

what kind of HW I would need to run it ?


The mlx4 driver supports memory windows as of kernel 3.9

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2][TRIVIAL] osm_sm_state_mgr.c Trivial log changes

2013-06-20 Thread Hal Rosenstock
On 6/20/2013 3:44 AM, Line Holen wrote:
 Signed-off-by: Line Holen line.ho...@oracle.com

Thanks. Applied.

-- Hal
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-20 Thread Doug Ledford
On 06/20/2013 10:21 AM, Jeff Squyres wrote:
 Keep IBV_MTU_* enums values as they are, but pass MTU values around as
 int's.  This is an ABI-compatible change; legacy applications will use
 the enum values,

I'm not really concerned with what the legacy apps use so much as what
they are presented with.  In other words, I'm sure they won't request
anything other than an MTU represented by one of the enum values.  The
problem though, is what if they are run on a link with a non-IB MTU and
they are presented with it?  Let's look at one of the ports you did in
this patch as an example:

 diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c
 index 21c551d..5a0656f 100644
 --- a/examples/ud_pingpong.c
 +++ b/examples/ud_pingpong.c
 @@ -332,7 +332,7 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   fprintf(stderr, Unable to query port info for port 
 %d\n, port);
   goto clean_device;
   }
 - mtu = 1  (port_info.active_mtu + 7);
 + mtu = ibv_mtu_to_num(port_info.active_mtu);
   if (size  mtu) {
   fprintf(stderr, Requested size larger than port MTU 
 (%d)\n, mtu);
   goto clean_device;

That used to be a valid mathematical construct.  Now it isn't.  It will
work for all of the IBV_MTU_* values, but if you run this program,
unmodified, on an MTU 9000 link, you get 1  9007 ;-)

Saying that this is backwards compatible is therefore incorrect.  If it
were entirely backwards compatible, then apps would not need to be
recompiled in order to avoid this error.

One possible solution to this problem is to use ld.linux's symbolic
symbol versions to solve this problem for us.  Fortunately, Roland has
been excellent in the past about keeping all of libibverbs symbols
versioned.  That can save us here.

We would need to redefine the active_mtu and max_mtu in ibv_device_attr
and path_mtu in ibv_qp_attr to all be ints.  No need to maintain back
compatibility.

Then we define ibv_device_attr_1.1 and ibv_qp_attr_1.1 structs that use
the old enum.

Then we define version IBVERBS_1.2 version of ibv_get_device_list plus a
version 1.2 of any other symbols that pass around ibv_device_attr struct
and ibv_qp_attr struct.  The new default will be to use the new structs
that have MTUs defined as ints, but the old 1.1 version of things will
use the compat structs to pass things around.  When we query the kernel
about a device/qp, if we are linked against an old app we will be using
the compat struct and we can do the conversion from int MTU to ibv_enum
based MTU and vice versa in the IBVERBS_1.1 wrapper functions that need
to be called to convert ints to enums and vice versa.

It's a lot more work, but it's the right way to do this.

So, sorry Jeff, but I'm going to Nack this patch as it stands on design
and back compatibility.  This really needs an API update, and it can be
done in a back compatible way by using the shared library symbol version
mapping and compat wrapper functions.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-20 Thread Doug Ledford
On 06/20/2013 12:34 PM, Doug Ledford wrote:
 On 06/20/2013 10:21 AM, Jeff Squyres wrote:
 Keep IBV_MTU_* enums values as they are, but pass MTU values around as
 int's.  This is an ABI-compatible change; legacy applications will use
 the enum values,
 
 I'm not really concerned with what the legacy apps use so much as what
 they are presented with.  In other words, I'm sure they won't request
 anything other than an MTU represented by one of the enum values.  The
 problem though, is what if they are run on a link with a non-IB MTU and
 they are presented with it?  Let's look at one of the ports you did in
 this patch as an example:
 
 diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c
 index 21c551d..5a0656f 100644
 --- a/examples/ud_pingpong.c
 +++ b/examples/ud_pingpong.c
 @@ -332,7 +332,7 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
  fprintf(stderr, Unable to query port info for port 
 %d\n, port);
  goto clean_device;
  }
 -mtu = 1  (port_info.active_mtu + 7);
 +mtu = ibv_mtu_to_num(port_info.active_mtu);
  if (size  mtu) {
  fprintf(stderr, Requested size larger than port MTU 
 (%d)\n, mtu);
  goto clean_device;
 
 That used to be a valid mathematical construct.  Now it isn't.  It will
 work for all of the IBV_MTU_* values, but if you run this program,
 unmodified, on an MTU 9000 link, you get 1  9007 ;-)
 
 Saying that this is backwards compatible is therefore incorrect.  If it
 were entirely backwards compatible, then apps would not need to be
 recompiled in order to avoid this error.
 
 One possible solution to this problem is to use ld.linux's symbolic
 symbol versions to solve this problem for us.  Fortunately, Roland has
 been excellent in the past about keeping all of libibverbs symbols
 versioned.  That can save us here.
 
 We would need to redefine the active_mtu and max_mtu in ibv_device_attr
 and path_mtu in ibv_qp_attr to all be ints.  No need to maintain back
 compatibility.

I should point out here that I would also change their name slightly in
order to force current apps to fail to compile.  This *is* an API
change, and apps need to have to make the very minor touchups necessary
in order to work again.  You don't want someone to recompile their app
without making this change and then be surprised.  The other option is
to add the int based MTUs as new elements and leave these, and also
leave these elements as an enum, but in that case I would warn people
that use these items in some way (a compiler warning about touching an
item that's deprecated might be good if that's even possible to do).

 Then we define ibv_device_attr_1.1 and ibv_qp_attr_1.1 structs that use
 the old enum.
 
 Then we define version IBVERBS_1.2 version of ibv_get_device_list plus a
 version 1.2 of any other symbols that pass around ibv_device_attr struct
 and ibv_qp_attr struct.  The new default will be to use the new structs
 that have MTUs defined as ints, but the old 1.1 version of things will
 use the compat structs to pass things around.  When we query the kernel
 about a device/qp, if we are linked against an old app we will be using
 the compat struct and we can do the conversion from int MTU to ibv_enum
 based MTU and vice versa in the IBVERBS_1.1 wrapper functions that need
 to be called to convert ints to enums and vice versa.
 
 It's a lot more work, but it's the right way to do this.
 
 So, sorry Jeff, but I'm going to Nack this patch as it stands on design
 and back compatibility.  This really needs an API update, and it can be
 done in a back compatible way by using the shared library symbol version
 mapping and compat wrapper functions.
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-20 Thread Jason Gunthorpe
On Thu, Jun 20, 2013 at 12:34:04PM -0400, Doug Ledford wrote:
 On 06/20/2013 10:21 AM, Jeff Squyres wrote:
  Keep IBV_MTU_* enums values as they are, but pass MTU values around as
  int's.  This is an ABI-compatible change; legacy applications will use
  the enum values,
 
 I'm not really concerned with what the legacy apps use so much as what
 they are presented with.  In other words, I'm sure they won't request
 anything other than an MTU represented by one of the enum values.  The
 problem though, is what if they are run on a link with a non-IB MTU and
 they are presented with it?  Let's look at one of the ports you did in
 this patch as an example:

Remember, apps will only see a wonky value if they are being used on
one of Jeff's new not-IB, not-ROCE, not-iWARP transports. Who knows if
they will even work on this new transport unmodified anyhow??

An app update to suport future transports is not unreasonable, it
happened for iwarp, rocee, etc.

  diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c
  index 21c551d..5a0656f 100644
  +++ b/examples/ud_pingpong.c
  @@ -332,7 +332,7 @@ static struct pingpong_context *pp_init_ctx(struct 
  ibv_device *ib_dev, int size,
  fprintf(stderr, Unable to query port info for port 
  %d\n, port);
  goto clean_device;
  }
  -   mtu = 1  (port_info.active_mtu + 7);

.. and this is sketchy anyhow, the above maths are not defined to work
anywhere, it just happens to work with the constants that have been
defined so far. This would break equally if we added any new constant
to the enum. So no, these maths are not important.

 One possible solution to this problem is to use ld.linux's symbolic
 symbol versions to solve this problem for us.  Fortunately, Roland has
 been excellent in the past about keeping all of libibverbs symbols
 versioned.  That can save us here.

There is a huge resistance to reving the symbol versions in
ibverbs. See the whole extension mess.

Further, the symbol versions don't work well in verbs, the internal
structures are too exposed. The existing support is already broken and
only works in very limited cases.

What you propose breaks in fairly common use cases, eg if
librdmacm/etc and the app link to different ibverbs versions then
things go wrong. rdmacm and the app pass pointers to verbs structures
across their boundary but they are unware they are versioned
differently, and will pass them back to the wrong verbs entry
point. This has already been seen to fail with the existing symbol
version support.

Basically: the verbs ABI was not designed to work with symbol
versions.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] libibverbs: Allow arbitrary int values for MTU

2013-06-20 Thread Hefty, Sean
 I used the name num_to_ibv_mtu because it is in the spirit of the other 
 enum-
 to-int/int-to-enum function pair naming conventions:
 
 int ibv_rate_to_mult(enum ibv_rate rate);
 enum ibv_rate mult_to_ibv_rate(int mult);
 
 int ibv_rate_to_mbps(enum ibv_rate rate);
 enum ibv_rate mbps_to_ibv_rate(int mbps);

libibverbs uses the ibv_ prefix for pretty much everything.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Allow arbitrary int values for MTU

2013-06-20 Thread Jeff Squyres (jsquyres)
On Jun 20, 2013, at 1:09 PM, Hefty, Sean sean.he...@intel.com wrote:

 int ibv_rate_to_mult(enum ibv_rate rate);
 enum ibv_rate mult_to_ibv_rate(int mult);
 
 int ibv_rate_to_mbps(enum ibv_rate rate);
 enum ibv_rate mbps_to_ibv_rate(int mbps);
 
 libibverbs uses the ibv_ prefix for pretty much everything.

...except for those 2 functions above (mbps_to_ibv_rate and mult_to_ibv_rate).  
See:

https://git.kernel.org/cgit/libs/infiniband/libibverbs.git/tree/include/infiniband/verbs.h#n392

and

https://git.kernel.org/cgit/libs/infiniband/libibverbs.git/tree/include/infiniband/verbs.h#n379

respectively.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-20 Thread Doug Ledford
On 06/20/2013 12:53 PM, Jason Gunthorpe wrote:
 On Thu, Jun 20, 2013 at 12:34:04PM -0400, Doug Ledford wrote:
 On 06/20/2013 10:21 AM, Jeff Squyres wrote:
 Keep IBV_MTU_* enums values as they are, but pass MTU values around as
 int's.  This is an ABI-compatible change; legacy applications will use
 the enum values,

 I'm not really concerned with what the legacy apps use so much as what
 they are presented with.  In other words, I'm sure they won't request
 anything other than an MTU represented by one of the enum values.  The
 problem though, is what if they are run on a link with a non-IB MTU and
 they are presented with it?  Let's look at one of the ports you did in
 this patch as an example:
 
 Remember, apps will only see a wonky value if they are being used on
 one of Jeff's new not-IB, not-ROCE, not-iWARP transports.

So?  That's just today.  The only reason RoCE/IBoE maps to IB MTUs is
that they didn't bother to make this ABI break for it, but it could
benefit from having a more flexible MTU that followed the underlying
Ethernet MTU.  So who's to say that isn't next?

 Who knows if
 they will even work on this new transport unmodified anyhow??

Either we should be trying to keep back compatibility or we shouldn't.
If we are, then it should work.  If we aren't, then there is no sense
doing the magic hocus-pocus tricks with the MTU where in some cases it
is the old enum value and other cases the real MTU value.

 An app update to suport future transports is not unreasonable,

I disagree.

 it
 happened for iwarp, rocee, etc.

If it happened once, then I would agree with you above.  That it *keeps*
happening is the issue.  To me, that's a clear indication that instead
of fixing the shortcomings of the current API properly, band-aids just
keep getting applied.

 diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c
 index 21c551d..5a0656f 100644
 +++ b/examples/ud_pingpong.c
 @@ -332,7 +332,7 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
 fprintf(stderr, Unable to query port info for port 
 %d\n, port);
 goto clean_device;
 }
 -   mtu = 1  (port_info.active_mtu + 7);
 
 .. and this is sketchy anyhow, the above maths are not defined to work
 anywhere, it just happens to work with the constants that have been
 defined so far. This would break equally if we added any new constant
 to the enum. So no, these maths are not important.

No, but I also skipped a number of patches where code did switch
statements to convert from enum to byte value, or enum to string
representation.  All of those would break too.

 One possible solution to this problem is to use ld.linux's symbolic
 symbol versions to solve this problem for us.  Fortunately, Roland has
 been excellent in the past about keeping all of libibverbs symbols
 versioned.  That can save us here.
 
 There is a huge resistance to reving the symbol versions in
 ibverbs. See the whole extension mess.

I thought the resistance was to revving the libibverbs soname, not just
the internal symbol versions.

 Further, the symbol versions don't work well in verbs, the internal
 structures are too exposed. The existing support is already broken and
 only works in very limited cases.
 
 What you propose breaks in fairly common use cases, eg if
 librdmacm/etc and the app link to different ibverbs versions then
 things go wrong.

At the time the app is compiled, it will be compiled against a librdmacm
that needs a specific version of the libibverbs symbols because
librdmacm has already been compiled.  That means that if you want things
to just work for the end user, when you rev the internal libibverbs
symbols, then you make a corresponding change in librdmacm and when you
install libibverbs-devel, you make it have a Conflict: with librdmacm 
new-version.  Likewise, you make librdmacm have a BuildRequires:
libibverbs-devel = new-version, and make librdmacm-devel have a
Requires: libibverbs-devel = new-version.  In this way, librdmacm-devel
will automatically require that the installed libibverbs-devel be of the
right version or it won't install itself.  Likewise, updating
libibverbs-devel without also updating librdmacm-devel will cause the
entire transaction to get kicked out or, depending on options, cause
librdmacm to be removed from the system to be updated later.

Now, these are package install time checks that can be implemented in
either rpm or, I assume, apt.  If you want compile time checks, that
could be done too with header file magic.

So, this isn't broken, it's just that no one is taking the time to
properly identify incompatible versions and force compatible versions to
be installed before things are allowed to link up.

 rdmacm and the app pass pointers to verbs structures
 across their boundary but they are unware they are versioned
 differently, and will pass them back to the wrong verbs entry
 point. This has already been seen to fail with the 

Re: [PATCH] libibverbs: Allow arbitrary int values for MTU

2013-06-20 Thread Doug Ledford
On 06/18/2013 02:49 PM, Jason Gunthorpe wrote:
 This is simpler:
 
 {
   static char str[16];
   snprintf(str, sizeof(str), %d, ibv_mtu_to_num(max_mtu));
 return str;
 }

That is not, however, multi-thread safe nor advisable unless you clearly
indicate in the man page to the function that subsequent calls to the
function wipe out the result of previous calls.  It's not even single
thread safe if you have more than one interface and don't know that
later calls wipe this buffer out.  Best to avoid library routines such
as this.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Allow arbitrary int values for MTU

2013-06-20 Thread Jeff Squyres (jsquyres)
On Jun 20, 2013, at 4:40 PM, Doug Ledford dledf...@redhat.com wrote:

 {
  static char str[16];
  snprintf(str, sizeof(str), %d, ibv_mtu_to_num(max_mtu));
return str;
 }
 
 That is not, however, multi-thread safe nor advisable unless you clearly
 indicate in the man page to the function that subsequent calls to the
 function wipe out the result of previous calls.  It's not even single
 thread safe if you have more than one interface and don't know that
 later calls wipe this buffer out.  Best to avoid library routines such
 as this.

This is in the devinfo.c program (which is single-threaded), not in the library 
itself.

But regardless, this whole function went away in V2 of the patch.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-20 Thread Jason Gunthorpe
On Thu, Jun 20, 2013 at 04:31:14PM -0400, Doug Ledford wrote:
  happened for iwarp, rocee, etc.
 
 If it happened once, then I would agree with you above.  That it *keeps*
 happening is the issue.  To me, that's a clear indication that instead
 of fixing the shortcomings of the current API properly, band-aids just
 keep getting applied.

The new transports have new requirements, and the apps have new
required behaviors - the API simply can't hide all this in every
case. The changes before had nothing to do with MTU, FWIW.

Jeff: Does your new transport support 100% of ibverbs and MTU is the
only change an app would need?

  .. and this is sketchy anyhow, the above maths are not defined to work
  anywhere, it just happens to work with the constants that have been
  defined so far. This would break equally if we added any new constant
  to the enum. So no, these maths are not important.
 
 No, but I also skipped a number of patches where code did switch
 statements to convert from enum to byte value, or enum to string
 representation.  All of those would break too.

Yes, but often either doesn't matter (they are just print strings) or
there are default fall throughs.

UD apps are ones that are going to have a problem, but we already have
very poor transport agnostic support for UD, so it is unlikely an
existing UD app will run on a new transport.
 
  There is a huge resistance to reving the symbol versions in
  ibverbs. See the whole extension mess.
 
 I thought the resistance was to revving the libibverbs soname, not just
 the internal symbol versions.

Nope, people want new apps (using extensions/etc) to run on old verbs
versions. I don't really like that, mind you, but it has been strongly
asked for.

 At the time the app is compiled, it will be compiled against a librdmacm
 that needs a specific version of the libibverbs symbols because
 librdmacm has already been compiled.  That means that if you want
 things to just work for the end user, when you rev the internal libibverbs
 symbols, then you make a corresponding change in librdmacm and when
 you

Both the app and librdmacm have a DT_NEEDED on libibverbs, and both
call into libibverbs.

The issue is not sorting out the install of the core libraries via
package management tricks, but what happens when an app/middleware
outside the package management dynamically links to this mess.

We've already seen this fail in the field with apps that link to the
v1.0 verbs ABI that call into other libraries that were linked to the
v1.1 API.

It explodes. The fundamental problem with the v1.0/v1.1 switch is the
v1.0 functions are returning pointers that cannot be passed into a
v1.1 function, eg iv_close_device@1.1(ibv_open_device@1.0(..))
crashes. 

Your idea to change the MTU causes the same problem with structure
versioning. If I use a rdmacm/etc API to get a MTU containing
structure then I still get the new meaning because rdmacm is linked to
the v1.2 verbs symbols, but my app is linked to the v1.1 symbols and
can't support it.

.. and of course rdmacm is just an example, there are other middleware
libraries (uDAPL, MPI, etc) that may be affected.

Symbol versioning *doesn't* solve the problem, it just creates a new
class of subtle failure modes. It appears to work in simple cases so
people think it is a silver bullet, but it is not. It is very complex,
the failures cases are screwy and subtle, and verbs tends to hit them
head on because of how exposed all the internal structures are.

 So, this isn't broken, it's just that no one is taking the time to
 properly identify incompatible versions and force compatible versions to
 be installed before things are allowed to link up.

You can't enforce things on binary-only proprietary apps being
installed from outside package management.

The verbs extension mechanism can safely deal with this kind of
change, it effectively adds structure versioning to the ABI, but it is
not mainlined yet and is also pretty complex.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


linux-next: just chaeking correctness of the infiniband tree

2013-06-20 Thread Stephen Rothwell
Hi all,

I noticed that the infiniband tree is now based on the net-next tree.  I
assume that is deliberate?  I do have to question how much testing that
tree has had since it is now based on a tree that Dave only released in
the last 24 hours ...

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


pgpQkTfEjcmq5.pgp
Description: PGP signature


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-20 Thread Doug Ledford
On 06/20/2013 05:14 PM, Jason Gunthorpe wrote:
 On Thu, Jun 20, 2013 at 04:31:14PM -0400, Doug Ledford wrote:
 happened for iwarp, rocee, etc.

 If it happened once, then I would agree with you above.  That it *keeps*
 happening is the issue.  To me, that's a clear indication that instead
 of fixing the shortcomings of the current API properly, band-aids just
 keep getting applied.
 
 The new transports have new requirements, and the apps have new
 required behaviors - the API simply can't hide all this in every
 case. The changes before had nothing to do with MTU, FWIW.

It demonstrates what I would call a leakage between layer 2 and higher
layer APIs though.

 Nope, people want new apps (using extensions/etc) to run on old verbs
 versions. I don't really like that, mind you, but it has been strongly
 asked for.

At some point people just need to suck it up and deal with a new
version.  Once you reach a certain level of maturity you can maintain
long term back compatibility and forward compatibility.  But that
requires that the API be sufficiently well thought out that each change
is more evolutionary than revolutionary.  The entire IB stack still
likes to do major, revolutionary changes.  It has not reached the level
of maturity where it can truly maintain a long term forward/back
compatibility IMO.  And the layer level leakage I mention earlier just
makes this more problematic.

 Both the app and librdmacm have a DT_NEEDED on libibverbs, and both
 call into libibverbs.
 
 The issue is not sorting out the install of the core libraries via
 package management tricks, but what happens when an app/middleware
 outside the package management dynamically links to this mess.

If a user chooses not to use packaging, that's their prerogative.
However, they can also collect the pieces when things break.  If a ISV
chooses to do the same, then that ISV is just being flat lazy and
sloppy.  The package management stacks are there for a reason and serve
a valuable purpose.  Ignoring them is akin to just thumbing your nose at
the libibverbs version as well.

 We've already seen this fail in the field with apps that link to the
 v1.0 verbs ABI that call into other libraries that were linked to the
 v1.1 API.

So this exposes an issue, I agree.

 It explodes. The fundamental problem with the v1.0/v1.1 switch is the
 v1.0 functions are returning pointers that cannot be passed into a
 v1.1 function, eg iv_close_device@1.1(ibv_open_device@1.0(..))
 crashes. 

This isn't a problem if library A doesn't call into library B and try to
use the same struct as the app itself when the app calls into library B.

 Your idea to change the MTU causes the same problem with structure
 versioning. If I use a rdmacm/etc API to get a MTU containing
 structure then I still get the new meaning because rdmacm is linked to
 the v1.2 verbs symbols, but my app is linked to the v1.1 symbols and
 can't support it.
 
 .. and of course rdmacm is just an example, there are other middleware
 libraries (uDAPL, MPI, etc) that may be affected.
 
 Symbol versioning *doesn't* solve the problem, it just creates a new
 class of subtle failure modes. It appears to work in simple cases so
 people think it is a silver bullet, but it is not. It is very complex,
 the failures cases are screwy and subtle, and verbs tends to hit them
 head on because of how exposed all the internal structures are.

I would argue that this is because the libraries are so disjoint (that
librdmacm needs the deep internal knowledge it needs of libibverbs
indicates that maybe these two shouldn't be separate from each other for
example, or that maybe libibverbs should provide a unified connection
API to the user and internally use both librdmacm and libibcm on the
back end to work IP v. GUID connection setup).  So, I think there is
significant room to improve the layout of the overall RDMA APIs and
doing that would address this particular issue and is probably the right
way to go.

However, aside from that, my current objection to all of this is that
this solution, while meeting the needs of the we don't want to have to
change anything unless the app wants to run on this new fabric results
in what I would call a gross hack (some enum, some int, same variable).
 I'm not so much complaining about Jeff's solution, more the requirement
that we come up with such an ugly construct.  We are headed down a
course of putting in gross hacks in order to preserve an outdated
design, one which has much more elegant solutions today than what we are
currently using.  At *some* point, this becomes a miserable,
unmaintainable mess.

So I hear you that people object to breaking the API for a new library
version.  My objection (which I'm sure I'll be overruled on) is that
people are taking the easy way out instead of fixing things up the right
way.

 So, this isn't broken, it's just that no one is taking the time to
 properly identify incompatible versions and force compatible versions to
 be installed 

RE: [PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-20 Thread Hefty, Sean
 I would argue that this is because the libraries are so disjoint (that
 librdmacm needs the deep internal knowledge it needs of libibverbs
 indicates that maybe these two shouldn't be separate from each other for
 example, or that maybe libibverbs should provide a unified connection
 API to the user and internally use both librdmacm and libibcm on the
 back end to work IP v. GUID connection setup).  So, I think there is
 significant room to improve the layout of the overall RDMA APIs and
 doing that would address this particular issue and is probably the right
 way to go.
... 
 That would address structures, but I think the API itself could use some
 love and care, and that wouldn't be addressed by just the verbs
 extension mechanism (and in fact if you rethink some of the exposed API,
 it might drastically change how you might want to handle
 extensions...who knows).

I agree with Doug.  A merged library that can evolve the RDMA APIs with fewer 
compatibility constraints could be beneficial.  I just think such an approach 
would require some thought and a lot of discussion.

- Sean
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] IB/core: Fix error return code in add_port()

2013-06-20 Thread Wei Yongjun
From: Wei Yongjun yongjun_...@trendmicro.com.cn

Fix to return -ENOMEM in the alloc_group_attrs() error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 drivers/infiniband/core/sysfs.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 246fdc1..d9b78c4 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -545,8 +545,10 @@ static int add_port(struct ib_device *device, int port_num,
 
p-gid_group.name  = gids;
p-gid_group.attrs = alloc_group_attrs(show_port_gid, attr.gid_tbl_len);
-   if (!p-gid_group.attrs)
+   if (!p-gid_group.attrs) {
+   ret = -ENOMEM;
goto err_remove_pma;
+   }
 
ret = sysfs_create_group(p-kobj, p-gid_group);
if (ret)
@@ -555,8 +557,10 @@ static int add_port(struct ib_device *device, int port_num,
p-pkey_group.name  = pkeys;
p-pkey_group.attrs = alloc_group_attrs(show_port_pkey,
attr.pkey_tbl_len);
-   if (!p-pkey_group.attrs)
+   if (!p-pkey_group.attrs) {
+   ret = -ENOMEM;
goto err_remove_gid;
+   }
 
ret = sysfs_create_group(p-kobj, p-pkey_group);
if (ret)

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] IB/core: Fix error return code in add_port()

2013-06-20 Thread Hefty, Sean
 Subject: [PATCH] IB/core: Fix error return code in add_port()
 
 From: Wei Yongjun yongjun_...@trendmicro.com.cn
 
 Fix to return -ENOMEM in the alloc_group_attrs() error handling

Patch itself looks fine, but please change alloc_group_attrs() - add_port() in 
the description.

- Sean
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] IB/core: Fix error return code in add_port()

2013-06-20 Thread Wei Yongjun
Fix to return -ENOMEM in the add_port() error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
v1 - v2: change patch description
---
 drivers/infiniband/core/sysfs.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index 246fdc1..d9b78c4 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -545,8 +545,10 @@ static int add_port(struct ib_device *device, int port_num,
 
p-gid_group.name  = gids;
p-gid_group.attrs = alloc_group_attrs(show_port_gid, attr.gid_tbl_len);
-   if (!p-gid_group.attrs)
+   if (!p-gid_group.attrs) {
+   ret = -ENOMEM;
goto err_remove_pma;
+   }
 
ret = sysfs_create_group(p-kobj, p-gid_group);
if (ret)
@@ -555,8 +557,10 @@ static int add_port(struct ib_device *device, int port_num,
p-pkey_group.name  = pkeys;
p-pkey_group.attrs = alloc_group_attrs(show_port_pkey,
attr.pkey_tbl_len);
-   if (!p-pkey_group.attrs)
+   if (!p-pkey_group.attrs) {
+   ret = -ENOMEM;
goto err_remove_gid;
+   }
 
ret = sysfs_create_group(p-kobj, p-pkey_group);
if (ret)


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux-next: just chaeking correctness of the infiniband tree

2013-06-20 Thread Roland Dreier
On Thu, Jun 20, 2013 at 5:09 PM, Stephen Rothwell s...@canb.auug.org.au wrote:
 I noticed that the infiniband tree is now based on the net-next tree.  I
 assume that is deliberate?  I do have to question how much testing that
 tree has had since it is now based on a tree that Dave only released in
 the last 24 hours ...

That is intentional since there is work coming that relies on net-next
prerequisites.

The tree hasn't had much testing, but pushing it out a few weeks
before the merge window is the way it gets testing.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html