Re: [PATCH libibverbs] init.c: increase sysfs read buffer size to 16

2015-12-09 Thread Jeff Squyres (jsquyres)
Any further comments on this?

Doug -- does it look ok to you?


> On Dec 7, 2015, at 5:27 AM, Haggai Eran <hagg...@mellanox.com> wrote:
> 
> On 12/04/2015 01:09 AM, Jeff Squyres wrote:
>> The default value of 8 is too small to read
>> /sys/class/infiniband/usnic_x/node_type, which contains "6: usNIC
>> UDP".  Per a7a73a8c1b39362f1701256bc772d82847832f9c, the too-small
>> buffer causes a stderr warning to be emitted from ibv_devinfo when
>> reading usNIC devices.
>> 
>> This commit therefore increases the buffer size to 16, which is long
>> enough to read the usNIC node_type value.
> 
> Reviewed-by: Haggai Eran <hagg...@mellanox.com>


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] usnic: add missing clauses to BSD license

2015-09-30 Thread Jeff Squyres
The usnic_verbs kernel module was clearly marked with the following in
its code:

  MODULE_LICENSE("Dual BSD/GPL");

However, we accidentally left a few clauses of the BSD text out of the
license header in all the source files.  This commit fixes that: all
the files are properly dual BSD/GPL-licensed.

Signed-off-by: Jeff Squyres <jsquy...@cisco.com>
---
 drivers/infiniband/hw/usnic/usnic.h | 21 ++---
 drivers/infiniband/hw/usnic/usnic_abi.h | 21 ++---
 drivers/infiniband/hw/usnic/usnic_common_pkt_hdr.h  | 21 ++---
 drivers/infiniband/hw/usnic/usnic_common_util.h | 21 ++---
 drivers/infiniband/hw/usnic/usnic_debugfs.c | 21 ++---
 drivers/infiniband/hw/usnic/usnic_debugfs.h | 21 ++---
 drivers/infiniband/hw/usnic/usnic_fwd.c | 21 ++---
 drivers/infiniband/hw/usnic/usnic_fwd.h | 21 ++---
 drivers/infiniband/hw/usnic/usnic_ib.h  | 21 ++---
 drivers/infiniband/hw/usnic/usnic_ib_main.c | 21 ++---
 drivers/infiniband/hw/usnic/usnic_ib_qp_grp.c   | 21 ++---
 drivers/infiniband/hw/usnic/usnic_ib_qp_grp.h   | 21 ++---
 drivers/infiniband/hw/usnic/usnic_ib_sysfs.c| 21 ++---
 drivers/infiniband/hw/usnic/usnic_ib_sysfs.h| 21 ++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c| 21 ++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h| 21 ++---
 drivers/infiniband/hw/usnic/usnic_log.h | 21 ++---
 drivers/infiniband/hw/usnic/usnic_transport.c   | 21 ++---
 drivers/infiniband/hw/usnic/usnic_transport.h   | 21 ++---
 drivers/infiniband/hw/usnic/usnic_uiom.c|  2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.h| 21 ++---
 .../infiniband/hw/usnic/usnic_uiom_interval_tree.c  | 21 ++---
 .../infiniband/hw/usnic/usnic_uiom_interval_tree.h  | 21 ++---
 drivers/infiniband/hw/usnic/usnic_vnic.c| 21 ++---
 drivers/infiniband/hw/usnic/usnic_vnic.h| 21 ++---
 25 files changed, 433 insertions(+), 73 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic.h 
b/drivers/infiniband/hw/usnic/usnic.h
index 5be13d8..f903502 100644
--- a/drivers/infiniband/hw/usnic/usnic.h
+++ b/drivers/infiniband/hw/usnic/usnic.h
@@ -1,9 +1,24 @@
 /*
  * Copyright (c) 2013, Cisco Systems, Inc. All rights reserved.
  *
- * This program is free software; you may redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; version 2 of the License.
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
  *
  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
diff --git a/drivers/infiniband/hw/usnic/usnic_abi.h 
b/drivers/infiniband/hw/usnic/usnic_abi.h
index 04a6622..7fe9502 100644
--- a/drivers/infiniband/hw/usnic/usnic_abi.h
+++ b/drivers/infiniband/hw/usnic/usnic_abi.h
@@ -1,9 +1,24 @@
 /*
  * Copyright (c) 2013, Cisco Systems, Inc. All rights reserved.
  *
- * This program is free software; you may redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; version 2 of the License.
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *c

Re: [PATCH v3] libibverbs init.c: conditionally emit warning if no userspace driver found

2015-07-06 Thread Jeff Squyres (jsquyres)
On Jun 17, 2015, at 10:25 AM, Doug Ledford dledf...@redhat.com wrote:
 
 The patch is accepted, I just haven’t pushed it out yet.

Is there a timeline for when this patch will be available in the upstream git 
repo and released in a new version of libibverbs?

I ask because we'd like to see this patch get into upstream distro libibverbs 
releases.  Once that happens, we can start planning the end of the horrible 
hackarounds we had to put into place (e.g., in Open MPI) to suppress the 
misleading libibverbs output.

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

N�r��yb�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

Re: [PATCH v3] libibverbs init.c: conditionally emit warning if no userspace driver found

2015-06-16 Thread Jeff Squyres (jsquyres)
Ping.

This is just a periodic query to see if there has been any progress on 
accepting this patch into libibverbs.



 On Jun 3, 2015, at 12:50 PM, Doug Ledford dledf...@redhat.com wrote:
 
 On Mon, 2015-06-01 at 22:02 +, Jeff Squyres (jsquyres) wrote:
 On May 22, 2015, at 9:44 AM, Doug Ledford dledf...@redhat.com wrote:
 
 Did that happen yet?
 
 I don't think so.  I didn't file a specific ticket for it at k.o yet
 (the k.o tickets take a while to process, so I didn't want to file it
 until after the comment period here on list).
 
 Ping.
 
 This is just a periodic query to see if there has been any progress on 
 accepting this patch into libibverbs.
 
 
 I have a ticket with kernel.org helpdesk to change the permissions on
 the libibverbs.git repo, and they are waiting on Roland to ACK the
 change.  Until then, I can't do much.
 
 -- 
 Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD
 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] libibverbs init.c: conditionally emit warning if no userspace driver found

2015-06-01 Thread Jeff Squyres (jsquyres)
On May 22, 2015, at 9:44 AM, Doug Ledford dledf...@redhat.com wrote:
 
 Did that happen yet?
 
 I don't think so.  I didn't file a specific ticket for it at k.o yet
 (the k.o tickets take a while to process, so I didn't want to file it
 until after the comment period here on list).

Ping.

This is just a periodic query to see if there has been any progress on 
accepting this patch into libibverbs.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] libibverbs init.c: conditionally emit warning if no userspace driver found

2015-05-22 Thread Jeff Squyres (jsquyres)
On May 20, 2015, at 1:11 PM, Doug Ledford dledf...@redhat.com wrote:
 
 The location of the upstream sources and tarballs would not change.
 Neither the git repo nor the tarball repo were like the kernel.  The
 upstream kernel.org git repo Roland had, had his name in the repo.  So
 it had to change.  But the libibverbs repo is in a generic location.
 There is no need to change it, only to change the permissions on the git
 repo to allow the new maintainer to push directly into it.  

Did that happen yet?

 Ditto with
 the upload/download space on openfabrics.org/downloads/verbs.

It looks like someone did part of this on flatbed -- you own the download 
directory but none of the files, and they are all 644.  So I took the liberty 
of chown'ing them all to you:


$ hostname; pwd; ls -la
flatbed.openfabrics.org
/var/www/html/downloads/verbs
total 6096
drwxr-xr-x.  2 dledford ofed   4096 May  7  2014 .
drwxrwxr-x. 55 apache   ofed   4096 Feb 13 07:31 ..
-rw-r--r--.  1 dledford ofed 347508 Mar 14  2006 libibverbs-1.0.2.tar.gz
-rw-r--r--.  1 dledford ofed 349439 May  2  2006 libibverbs-1.0.3.tar.gz
-rw-r--r--.  1 dledford ofed 360410 Oct 31  2006 libibverbs-1.0.4.tar.gz
-rw-r--r--.  1 dledford ofed 359902 Jun 18  2007 libibverbs-1.0.5.tar.gz
-rw-r--r--.  1 dledford ofed 321835 Aug 29  2005 libibverbs-1.0-rc1.tar.gz
-rw-r--r--.  1 dledford ofed 338537 Oct  2  2005 libibverbs-1.0-rc3.tar.gz
-rw-r--r--.  1 dledford ofed 341792 Oct 28  2005 libibverbs-1.0-rc4.tar.gz
-rw-r--r--.  1 dledford ofed 347699 Feb 17  2006 libibverbs-1.0-rc7.tar.gz
-rw-r--r--.  1 dledford ofed 384743 Jun 18  2007 libibverbs-1.1.1.tar.gz
-rw-r--r--.  1 dledford ofed 394618 Apr 18  2008 libibverbs-1.1.2.tar.gz
-rw-r--r--.  1 dledford ofed 359331 Oct 29  2009 libibverbs-1.1.3.tar.gz
-rw-r--r--.  1 dledford ofed 362475 Jun  3  2010 libibverbs-1.1.4.tar.gz
-rw-r--r--.  1 dledford ofed 364219 Jun 28  2011 libibverbs-1.1.5.tar.gz
-rw-r--r--.  1 dledford ofed 387794 Dec 21  2011 libibverbs-1.1.6.tar.gz
-rw-r--r--.  1 dledford ofed 391812 May 28  2013 libibverbs-1.1.7.tar.gz
-rw-r--r--   1 dledford ofed 406548 May  5  2014 libibverbs-1.1.8.tar.gz
-rw-r--r--.  1 dledford ofed 384656 Apr 24  2007 libibverbs-1.1.tar.gz
-rw-r--r--   1 dledford ofed   3957 May  7  2014 README.html
-rw-r--r--.  1 dledford ofed 60 Mar 12  2008 WEB_README
-

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs init.c: remove stderr warnings if no userspace driver found

2015-05-11 Thread Jeff Squyres (jsquyres)
On May 9, 2015, at 8:04 AM, Yann Droneaud ydrone...@opteya.com wrote:
 
 Le vendredi 08 mai 2015 à 11:21 -0700, Jeff Squyres a écrit :
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 
 This is a little short for an explanation: what was the issue with the
 error messages ?

Cisco has stopped shipping its libibverbs usnic driver, although we are still 
using the kernel driver in the /sys/class/infiniband space (since it's the only 
way to be upstream).  Specifically: instead of using libibverbs for userspace 
access, we are now using libfabric.

That is: it's not a warning or an error if libibverbs cannot find a userspace 
driver for kernel devices.  Indeed, returning a num_devices of 0 is sufficient 
-- the middleware shouldn't be unconditionally printing out stderr message; let 
the upper layer application do that (if it wants to).

FWIW, Sean just removed a similar set of stderr warnings from librdmacm:

   
http://git.openfabrics.org/?p=~shefty/librdmacm.git;a=commitdiff;h=2b2aad809afc56fa3157f5cf99036f92b9c90f16

 -free(sysfs_dev);
 
 I believe this free() was necessary to not leak some memory.

Ah -- I mis-read the loop.  I'll re-submit with the loop still there, but just 
removing the fprintf block.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

N�r��yb�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

[PATCH] libibverbs init.c: remove stderr warnings if no userspace driver found

2015-05-08 Thread Jeff Squyres
Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 src/init.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/src/init.c b/src/init.c
index d0e4b1c..9c21768 100644
--- a/src/init.c
+++ b/src/init.c
@@ -557,19 +557,5 @@ HIDDEN int ibverbs_init(struct ibv_device ***list)
}
 
 out:
-   for (sysfs_dev = sysfs_dev_list,
-next_dev = sysfs_dev ? sysfs_dev-next : NULL;
-sysfs_dev;
-sysfs_dev = next_dev, next_dev = sysfs_dev ? sysfs_dev-next : 
NULL) {
-   if (!sysfs_dev-have_driver) {
-   fprintf(stderr, PFX Warning: no userspace 
device-specific 
-   driver found for %s\n, sysfs_dev-sysfs_path);
-   if (statically_linked)
-   fprintf(stderr,When linking libibverbs 
statically, 
-   driver must be statically linked 
too.\n);
-   }
-   free(sysfs_dev);
-   }
-
return num_devices;
 }
-- 
2.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH libibverbs V2] Add new verb: uv_query_port_max_datagram()

2013-08-27 Thread Jeff Squyres (jsquyres)
Bump.  This is V2 of the patch, which removes the ABI issue: libibverbs 
directly calls the command in the kernel (without going through the provider 
plugin).

On Aug 21, 2013, at 5:22 PM, Jeff Squyres jsquy...@cisco.com wrote:

 Per lengthy discussion on the linux-rdma list, add a new verb to get
 max datagram size (in bytes) since the methods for retrieving MTU
 values are limited to a finite enum set, and are difficult to change
 for backwards compatibility reasons.  
 
 Also add corresponding command: uv_cmd_query_port_max_datagram().
 Since this is a new verb, there was no need to add a _V2 enum for the
 command macro, which required adding a UB_INIT_CMD_RESP() macro.
 
 If the kernel does not support the new QUERY_PORT_MAX_DATAGRAM
 command, fall back to returning the int-ized MTU enum from
 ibv_cmd_query_port().
 
 Note that the name for this verb was chosen with the following
 rationale:
 
 * After discussion with Roland, use the prefix uv instead of ibv,
  since this verb is generic to both Ethernet, InfiniBand, and
  whatever other transports are underneath.
 * query was used (vs. get) because it invokes a command (vs. a
  struct lookup)
 
 If the community likes this approach, I'll send the corresponding
 kernel patch.
 
 Difference from V1 
 ==
 Do not add this verb to the devops struct (because that would break ABI).
 Instead, just have uv_query_port_max_datagram() directly invoke 
 uv_cmd_query_port_max_datagram().
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 Makefile.am  |  3 +-
 examples/devinfo.c   |  7 +
 include/infiniband/driver.h  |  4 +++
 include/infiniband/kern-abi.h| 17 +++-
 include/infiniband/verbs.h   |  6 
 man/uv_query_port_max_datagram.3 | 59 
 src/cmd.c| 54 
 src/ibverbs.h|  8 ++
 src/libibverbs.map   |  2 ++
 src/verbs.c  | 13 +
 10 files changed, 171 insertions(+), 2 deletions(-)
 create mode 100644 man/uv_query_port_max_datagram.3
 
 diff --git a/Makefile.am b/Makefile.am
 index 40e83be..51fe5d5 100644
 --- a/Makefile.am
 +++ b/Makefile.am
 @@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 
 man/ibv_devinfo.1 \
 man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3
 \
 man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3  \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3   
 \
 -man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
 +man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3   
 \
 +man/uv_query_port_max_datagram.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
 diff --git a/examples/devinfo.c b/examples/devinfo.c
 index ff078e4..f51620b 100644
 --- a/examples/devinfo.c
 +++ b/examples/devinfo.c
 @@ -209,6 +209,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
 uint8_t ib_port)
   struct ibv_port_attr port_attr;
   int rc = 0;
   uint8_t port;
 + uint32_t max_datagram;
   char buf[256];
 
   ctx = ibv_open_device(ib_dev);
 @@ -298,6 +299,11 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
 uint8_t ib_port)
   fprintf(stderr, Failed to query port %u props\n, 
 port);
   goto cleanup;
   }
 + rc = uv_query_port_max_datagram(ctx, port, max_datagram);
 + if (rc) {
 + fprintf(stderr, Failed to query port %u max datagram 
 size\n, port);
 + goto cleanup;
 + }
   printf(\t\tport:\t%d\n, port);
   printf(\t\t\tstate:\t\t\t%s (%d)\n,
  port_state_str(port_attr.state), port_attr.state);
 @@ -305,6 +311,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
 uint8_t ib_port)
  mtu_str(port_attr.max_mtu), port_attr.max_mtu);
   printf(\t\t\tactive_mtu:\t\t%s (%d)\n,
  mtu_str(port_attr.active_mtu), port_attr.active_mtu);
 + printf(\t\t\tmax_datagram_size:\t%u\n, max_datagram);
   printf(\t\t\tsm_lid:\t\t\t%d\n, port_attr.sm_lid);
   printf(\t\t\tport_lid:\t\t%d\n, port_attr.lid);
   printf(\t\t\tport_lmc:\t\t0x%02x\n, port_attr.lmc);
 diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
 index 9a81416..6e1236c 100644
 --- a/include/infiniband/driver.h
 +++ b/include/infiniband/driver.h
 @@ -67,6 +67,10 @@ int ibv_cmd_query_device(struct ibv_context *context,
 int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num,
  struct ibv_port_attr *port_attr,
  struct ibv_query_port *cmd, size_t cmd_size);
 +int

[PATCH libibverbs V2] Add new verb: uv_query_port_max_datagram()

2013-08-21 Thread Jeff Squyres
Per lengthy discussion on the linux-rdma list, add a new verb to get
max datagram size (in bytes) since the methods for retrieving MTU
values are limited to a finite enum set, and are difficult to change
for backwards compatibility reasons.  

Also add corresponding command: uv_cmd_query_port_max_datagram().
Since this is a new verb, there was no need to add a _V2 enum for the
command macro, which required adding a UB_INIT_CMD_RESP() macro.

If the kernel does not support the new QUERY_PORT_MAX_DATAGRAM
command, fall back to returning the int-ized MTU enum from
ibv_cmd_query_port().

Note that the name for this verb was chosen with the following
rationale:

* After discussion with Roland, use the prefix uv instead of ibv,
  since this verb is generic to both Ethernet, InfiniBand, and
  whatever other transports are underneath.
* query was used (vs. get) because it invokes a command (vs. a
  struct lookup)

If the community likes this approach, I'll send the corresponding
kernel patch.

Difference from V1 
==
Do not add this verb to the devops struct (because that would break ABI).
Instead, just have uv_query_port_max_datagram() directly invoke 
uv_cmd_query_port_max_datagram().

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 Makefile.am  |  3 +-
 examples/devinfo.c   |  7 +
 include/infiniband/driver.h  |  4 +++
 include/infiniband/kern-abi.h| 17 +++-
 include/infiniband/verbs.h   |  6 
 man/uv_query_port_max_datagram.3 | 59 
 src/cmd.c| 54 
 src/ibverbs.h|  8 ++
 src/libibverbs.map   |  2 ++
 src/verbs.c  | 13 +
 10 files changed, 171 insertions(+), 2 deletions(-)
 create mode 100644 man/uv_query_port_max_datagram.3

diff --git a/Makefile.am b/Makefile.am
index 40e83be..51fe5d5 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 
man/ibv_devinfo.1   \
 man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \
 man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3   \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3
\
-man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
+man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3 \
+man/uv_query_port_max_datagram.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
diff --git a/examples/devinfo.c b/examples/devinfo.c
index ff078e4..f51620b 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -209,6 +209,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t 
ib_port)
struct ibv_port_attr port_attr;
int rc = 0;
uint8_t port;
+   uint32_t max_datagram;
char buf[256];
 
ctx = ibv_open_device(ib_dev);
@@ -298,6 +299,11 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
uint8_t ib_port)
fprintf(stderr, Failed to query port %u props\n, 
port);
goto cleanup;
}
+   rc = uv_query_port_max_datagram(ctx, port, max_datagram);
+   if (rc) {
+   fprintf(stderr, Failed to query port %u max datagram 
size\n, port);
+   goto cleanup;
+   }
printf(\t\tport:\t%d\n, port);
printf(\t\t\tstate:\t\t\t%s (%d)\n,
   port_state_str(port_attr.state), port_attr.state);
@@ -305,6 +311,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t 
ib_port)
   mtu_str(port_attr.max_mtu), port_attr.max_mtu);
printf(\t\t\tactive_mtu:\t\t%s (%d)\n,
   mtu_str(port_attr.active_mtu), port_attr.active_mtu);
+   printf(\t\t\tmax_datagram_size:\t%u\n, max_datagram);
printf(\t\t\tsm_lid:\t\t\t%d\n, port_attr.sm_lid);
printf(\t\t\tport_lid:\t\t%d\n, port_attr.lid);
printf(\t\t\tport_lmc:\t\t0x%02x\n, port_attr.lmc);
diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index 9a81416..6e1236c 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -67,6 +67,10 @@ int ibv_cmd_query_device(struct ibv_context *context,
 int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num,
   struct ibv_port_attr *port_attr,
   struct ibv_query_port *cmd, size_t cmd_size);
+int uv_cmd_query_port_max_datagram(struct ibv_context *context, uint8_t 
port_num,
+  uint32_t *max_datagram,
+  struct uv_query_port_max_datagram *cmd,
+  size_t cmd_size);
 int ibv_cmd_query_gid(struct

[PATCH] Add new verb: uv_query_port_max_datagram()

2013-08-19 Thread Jeff Squyres
Per lengthy discussion on the linux-rdma list, add a new verb to get
max datagram size (in bytes) since the methods for retrieving MTU
values are limited to a finite enum set, and are difficult to change
for backwards compatibility reasons.

Also add corresponding command: uv_cmd_query_port_max_datagram().
Since this is a new verb, there was no need to add a _V2 enum for the
command macro, which required adding a UB_INIT_CMD_RESP() macro.

Bumped the ABI version to 7 (the new verb will return -ENOSYS if
abi_verb is  7).

Note that the name for this verb was chosen with the following
rationale:

* After discussion with Roland, use the prefix uv instead of ibv,
  since this verb is generic to both Ethernet, InfiniBand, and
  whatever other transports are underneath.
* query was used (vs. get) because it invokes a command (vs. a
  struct lookup)

If the community likes this approach, I'll send the corresponding
kernel patch.

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 Makefile.am  |  3 +-
 examples/devinfo.c   |  7 +
 include/infiniband/driver.h  |  4 +++
 include/infiniband/kern-abi.h| 19 +++--
 include/infiniband/verbs.h   |  7 +
 man/uv_query_port_max_datagram.3 | 60 
 src/cmd.c| 25 +
 src/ibverbs.h|  8 ++
 src/libibverbs.map   |  2 ++
 src/verbs.c  | 10 +++
 10 files changed, 142 insertions(+), 3 deletions(-)
 create mode 100644 man/uv_query_port_max_datagram.3

diff --git a/Makefile.am b/Makefile.am
index 40e83be..51fe5d5 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 
man/ibv_devinfo.1   \
 man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \
 man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3   \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3
\
-man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
+man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3 \
+man/uv_query_port_max_datagram.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
diff --git a/examples/devinfo.c b/examples/devinfo.c
index ff078e4..f51620b 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -209,6 +209,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t 
ib_port)
struct ibv_port_attr port_attr;
int rc = 0;
uint8_t port;
+   uint32_t max_datagram;
char buf[256];
 
ctx = ibv_open_device(ib_dev);
@@ -298,6 +299,11 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
uint8_t ib_port)
fprintf(stderr, Failed to query port %u props\n, 
port);
goto cleanup;
}
+   rc = uv_query_port_max_datagram(ctx, port, max_datagram);
+   if (rc) {
+   fprintf(stderr, Failed to query port %u max datagram 
size\n, port);
+   goto cleanup;
+   }
printf(\t\tport:\t%d\n, port);
printf(\t\t\tstate:\t\t\t%s (%d)\n,
   port_state_str(port_attr.state), port_attr.state);
@@ -305,6 +311,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t 
ib_port)
   mtu_str(port_attr.max_mtu), port_attr.max_mtu);
printf(\t\t\tactive_mtu:\t\t%s (%d)\n,
   mtu_str(port_attr.active_mtu), port_attr.active_mtu);
+   printf(\t\t\tmax_datagram_size:\t%u\n, max_datagram);
printf(\t\t\tsm_lid:\t\t\t%d\n, port_attr.sm_lid);
printf(\t\t\tport_lid:\t\t%d\n, port_attr.lid);
printf(\t\t\tport_lmc:\t\t0x%02x\n, port_attr.lmc);
diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index 9a81416..6e1236c 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -67,6 +67,10 @@ int ibv_cmd_query_device(struct ibv_context *context,
 int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num,
   struct ibv_port_attr *port_attr,
   struct ibv_query_port *cmd, size_t cmd_size);
+int uv_cmd_query_port_max_datagram(struct ibv_context *context, uint8_t 
port_num,
+  uint32_t *max_datagram,
+  struct uv_query_port_max_datagram *cmd,
+  size_t cmd_size);
 int ibv_cmd_query_gid(struct ibv_context *context, uint8_t port_num,
  int index, union ibv_gid *gid);
 int ibv_cmd_query_pkey(struct ibv_context *context, uint8_t port_num,
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h
index 619ea7e..951108e 100644
--- a/include/infiniband/kern

[PATCH libibverbs] Add new verb: uv_query_port_max_datagram()

2013-08-19 Thread Jeff Squyres
(re-sending because I forgot to include libibverbs in the subject)

Per lengthy discussion on the linux-rdma list, add a new verb to get
max datagram size (in bytes) since the methods for retrieving MTU
values are limited to a finite enum set, and are difficult to change
for backwards compatibility reasons.

Also add corresponding command: uv_cmd_query_port_max_datagram().
Since this is a new verb, there was no need to add a _V2 enum for the
command macro, which required adding a UB_INIT_CMD_RESP() macro.

Bumped the ABI version to 7 (the new verb will return -ENOSYS if
abi_verb is  7).

Note that the name for this verb was chosen with the following
rationale:

* After discussion with Roland, use the prefix uv instead of ibv,
  since this verb is generic to both Ethernet, InfiniBand, and
  whatever other transports are underneath.
* query was used (vs. get) because it invokes a command (vs. a
  struct lookup)

If the community likes this approach, I'll send the corresponding
kernel patch.

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 Makefile.am  |  3 +-
 examples/devinfo.c   |  7 +
 include/infiniband/driver.h  |  4 +++
 include/infiniband/kern-abi.h| 19 +++--
 include/infiniband/verbs.h   |  7 +
 man/uv_query_port_max_datagram.3 | 60 
 src/cmd.c| 25 +
 src/ibverbs.h|  8 ++
 src/libibverbs.map   |  2 ++
 src/verbs.c  | 10 +++
 10 files changed, 142 insertions(+), 3 deletions(-)
 create mode 100644 man/uv_query_port_max_datagram.3

diff --git a/Makefile.am b/Makefile.am
index 40e83be..51fe5d5 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 
man/ibv_devinfo.1   \
 man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \
 man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3   \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3
\
-man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
+man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3 \
+man/uv_query_port_max_datagram.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
diff --git a/examples/devinfo.c b/examples/devinfo.c
index ff078e4..f51620b 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -209,6 +209,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t 
ib_port)
struct ibv_port_attr port_attr;
int rc = 0;
uint8_t port;
+   uint32_t max_datagram;
char buf[256];
 
ctx = ibv_open_device(ib_dev);
@@ -298,6 +299,11 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
uint8_t ib_port)
fprintf(stderr, Failed to query port %u props\n, 
port);
goto cleanup;
}
+   rc = uv_query_port_max_datagram(ctx, port, max_datagram);
+   if (rc) {
+   fprintf(stderr, Failed to query port %u max datagram 
size\n, port);
+   goto cleanup;
+   }
printf(\t\tport:\t%d\n, port);
printf(\t\t\tstate:\t\t\t%s (%d)\n,
   port_state_str(port_attr.state), port_attr.state);
@@ -305,6 +311,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t 
ib_port)
   mtu_str(port_attr.max_mtu), port_attr.max_mtu);
printf(\t\t\tactive_mtu:\t\t%s (%d)\n,
   mtu_str(port_attr.active_mtu), port_attr.active_mtu);
+   printf(\t\t\tmax_datagram_size:\t%u\n, max_datagram);
printf(\t\t\tsm_lid:\t\t\t%d\n, port_attr.sm_lid);
printf(\t\t\tport_lid:\t\t%d\n, port_attr.lid);
printf(\t\t\tport_lmc:\t\t0x%02x\n, port_attr.lmc);
diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
index 9a81416..6e1236c 100644
--- a/include/infiniband/driver.h
+++ b/include/infiniband/driver.h
@@ -67,6 +67,10 @@ int ibv_cmd_query_device(struct ibv_context *context,
 int ibv_cmd_query_port(struct ibv_context *context, uint8_t port_num,
   struct ibv_port_attr *port_attr,
   struct ibv_query_port *cmd, size_t cmd_size);
+int uv_cmd_query_port_max_datagram(struct ibv_context *context, uint8_t 
port_num,
+  uint32_t *max_datagram,
+  struct uv_query_port_max_datagram *cmd,
+  size_t cmd_size);
 int ibv_cmd_query_gid(struct ibv_context *context, uint8_t port_num,
  int index, union ibv_gid *gid);
 int ibv_cmd_query_pkey(struct ibv_context *context, uint8_t port_num,
diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern

Re: [PATCH libibverbs] Add new verb: uv_query_port_max_datagram()

2013-08-19 Thread Jeff Squyres (jsquyres)
On Aug 19, 2013, at 4:19 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 What about doing query port in this case and returning that value,
 decoded to an enum? Otherwise apps have to include that logic anyhow.
 
 I'm assuming the kernel will do basically the same?
 
 Bascially, the only failure for this call should be due to a bad port
 number..


Sure, can do.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add new verb: uv_query_port_max_datagram()

2013-08-19 Thread Jeff Squyres (jsquyres)
On Aug 19, 2013, at 5:18 PM, Hefty, Sean sean.he...@intel.com wrote:

 Bumped the ABI version to 7 (the new verb will return -ENOSYS if
 abi_verb is  7).
 
 How does this break the ABI?


It doesn't *break* the ABI, but it does add a new downcall into the kernel.  
That requires bumping the ABI version to 7, no?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add new verb: uv_query_port_max_datagram()

2013-08-19 Thread Jeff Squyres (jsquyres)
On Aug 19, 2013, at 6:07 PM, Hefty, Sean sean.he...@intel.com wrote:

 It doesn't *break* the ABI, but it does add a new downcall into the kernel.
 That requires bumping the ABI version to 7, no?
 
 No - adding a new command is fine.  Older kernels will return ENOSYS if that 
 command is not supported.  In that case, you can handle things like Jason 
 suggested.


Gotcha.  I'll adjust the patch.

Any other feedback?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Add the use of IBV_SEND_INLINE to example pingpong programs

2013-07-30 Thread Jeff Squyres (jsquyres)
4th bump...

On Jul 10, 2013, at 4:32 PM, Jeff Squyres jsquy...@cisco.com wrote:

 If the send size is less than the cap.max_inline_data reported by the
 qp, use the IBV_SEND_INLINE flag.  This now only shows the example of
 using ibv_query_qp(), it also reduces the latency time shown by the
 pingpong programs when the sends can be inlined.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 examples/rc_pingpong.c  | 18 +-
 examples/srq_pingpong.c | 19 +--
 examples/uc_pingpong.c  | 17 -
 examples/ud_pingpong.c  | 18 +-
 4 files changed, 51 insertions(+), 21 deletions(-)
 
 diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
 index 15494a1..a8637a5 100644
 --- a/examples/rc_pingpong.c
 +++ b/examples/rc_pingpong.c
 @@ -65,6 +65,7 @@ struct pingpong_context {
   struct ibv_qp   *qp;
   void*buf;
   int  size;
 + int  send_flags;
   int  rx_depth;
   int  pending;
   struct ibv_port_attr portinfo;
 @@ -319,8 +320,9 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx-size = size;
 - ctx-rx_depth = rx_depth;
 + ctx-size   = size;
 + ctx-send_flags = IBV_SEND_SIGNALED;
 + ctx-rx_depth   = rx_depth;
 
   ctx-buf = memalign(page_size, size);
   if (!ctx-buf) {
 @@ -367,7 +369,8 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   }
 
   {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx-cq,
   .recv_cq = ctx-cq,
   .cap = {
 @@ -379,11 +382,16 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx-qp = ibv_create_qp(ctx-pd, attr);
 + ctx-qp = ibv_create_qp(ctx-pd, init_attr);
   if (!ctx-qp)  {
   fprintf(stderr, Couldn't create QP\n);
   goto clean_cq;
   }
 +
 + ibv_query_qp(ctx-qp, attr, IBV_QP_CAP, init_attr);
 + if (init_attr.cap.max_inline_data = size) {
 + ctx-send_flags |= IBV_SEND_INLINE;
 + }
   }
 
   {
 @@ -508,7 +516,7 @@ static int pp_post_send(struct pingpong_context *ctx)
   .sg_list= list,
   .num_sge= 1,
   .opcode = IBV_WR_SEND,
 - .send_flags = IBV_SEND_SIGNALED,
 + .send_flags = ctx-send_flags,
   };
   struct ibv_send_wr *bad_wr;
 
 diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
 index 6e00f8c..552a144 100644
 --- a/examples/srq_pingpong.c
 +++ b/examples/srq_pingpong.c
 @@ -68,6 +68,7 @@ struct pingpong_context {
   struct ibv_qp   *qp[MAX_QP];
   void*buf;
   int  size;
 + int  send_flags;
   int  num_qp;
   int  rx_depth;
   int  pending[MAX_QP];
 @@ -350,9 +351,10 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx-size = size;
 - ctx-num_qp   = num_qp;
 - ctx-rx_depth = rx_depth;
 + ctx-size   = size;
 + ctx-send_flags = IBV_SEND_SIGNALED;
 + ctx-num_qp = num_qp;
 + ctx-rx_depth   = rx_depth;
 
   ctx-buf = memalign(page_size, size);
   if (!ctx-buf) {
 @@ -413,7 +415,8 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   }
 
   for (i = 0; i  num_qp; ++i) {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx-cq,
   .recv_cq = ctx-cq,
   .srq = ctx-srq,
 @@ -424,11 +427,15 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx-qp[i] = ibv_create_qp(ctx-pd, attr);
 + ctx-qp[i] = ibv_create_qp(ctx-pd, init_attr);
   if (!ctx-qp[i])  {
   fprintf(stderr, Couldn't create QP[%d]\n, i);
   goto clean_qps;
   }
 + ibv_query_qp(ctx-qp[i], attr, IBV_QP_CAP, init_attr);
 + if (init_attr.cap.max_inline_data = size) {
 + ctx-send_flags |= IBV_SEND_INLINE;
 + }
   }
 
   for (i = 0; i  num_qp; ++i) {
 @@ -568,7 +575,7 @@ static int pp_post_send(struct

Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-30 Thread Jeff Squyres (jsquyres)
On Jul 23, 2013, at 9:26 AM, Jeff Squyres (jsquyres) jsquy...@cisco.com wrote:

 .. and UD is the least abstracted transport, so existing apps won't
 support Jeff's new NIC anyhow, MTU is the least of their problems.
 
 Existing apps with existing transports see the same old values.


Bump.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-30 Thread Jeff Squyres (jsquyres)
On Jul 30, 2013, at 12:44 PM, Christoph Lameter c...@gentwo.org wrote:

 What in the world does that mean? I am an oldtimer I guess. Seems that
 this is something that can be done in the newfangled forum? How does this
 affect mailing lists?


I'm not sure what you're asking me; please see the prior posts on this thread 
that describes the MTU issue and why we still need a solution.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Add the use of IBV_SEND_INLINE to example pingpong programs

2013-07-23 Thread Jeff Squyres (jsquyres)
Bump bump bump.

I know this isn't a huge / important patch, but it is a small thing that does 
decrease the latency reported by these example programs.


On Jul 10, 2013, at 4:32 PM, Jeff Squyres jsquy...@cisco.com wrote:

 If the send size is less than the cap.max_inline_data reported by the
 qp, use the IBV_SEND_INLINE flag.  This not only shows the example of
 using ibv_query_qp(), it also reduces the latency time shown by the
 pingpong programs when the sends can be inlined.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 examples/rc_pingpong.c  | 18 +-
 examples/srq_pingpong.c | 19 +--
 examples/uc_pingpong.c  | 17 -
 examples/ud_pingpong.c  | 18 +-
 4 files changed, 51 insertions(+), 21 deletions(-)
 
 diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
 index 15494a1..a8637a5 100644
 --- a/examples/rc_pingpong.c
 +++ b/examples/rc_pingpong.c
 @@ -65,6 +65,7 @@ struct pingpong_context {
   struct ibv_qp   *qp;
   void*buf;
   int  size;
 + int  send_flags;
   int  rx_depth;
   int  pending;
   struct ibv_port_attr portinfo;
 @@ -319,8 +320,9 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx-size = size;
 - ctx-rx_depth = rx_depth;
 + ctx-size   = size;
 + ctx-send_flags = IBV_SEND_SIGNALED;
 + ctx-rx_depth   = rx_depth;
 
   ctx-buf = memalign(page_size, size);
   if (!ctx-buf) {
 @@ -367,7 +369,8 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   }
 
   {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx-cq,
   .recv_cq = ctx-cq,
   .cap = {
 @@ -379,11 +382,16 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx-qp = ibv_create_qp(ctx-pd, attr);
 + ctx-qp = ibv_create_qp(ctx-pd, init_attr);
   if (!ctx-qp)  {
   fprintf(stderr, Couldn't create QP\n);
   goto clean_cq;
   }
 +
 + ibv_query_qp(ctx-qp, attr, IBV_QP_CAP, init_attr);
 + if (init_attr.cap.max_inline_data = size) {
 + ctx-send_flags |= IBV_SEND_INLINE;
 + }
   }
 
   {
 @@ -508,7 +516,7 @@ static int pp_post_send(struct pingpong_context *ctx)
   .sg_list= list,
   .num_sge= 1,
   .opcode = IBV_WR_SEND,
 - .send_flags = IBV_SEND_SIGNALED,
 + .send_flags = ctx-send_flags,
   };
   struct ibv_send_wr *bad_wr;
 
 diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
 index 6e00f8c..552a144 100644
 --- a/examples/srq_pingpong.c
 +++ b/examples/srq_pingpong.c
 @@ -68,6 +68,7 @@ struct pingpong_context {
   struct ibv_qp   *qp[MAX_QP];
   void*buf;
   int  size;
 + int  send_flags;
   int  num_qp;
   int  rx_depth;
   int  pending[MAX_QP];
 @@ -350,9 +351,10 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx-size = size;
 - ctx-num_qp   = num_qp;
 - ctx-rx_depth = rx_depth;
 + ctx-size   = size;
 + ctx-send_flags = IBV_SEND_SIGNALED;
 + ctx-num_qp = num_qp;
 + ctx-rx_depth   = rx_depth;
 
   ctx-buf = memalign(page_size, size);
   if (!ctx-buf) {
 @@ -413,7 +415,8 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   }
 
   for (i = 0; i  num_qp; ++i) {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx-cq,
   .recv_cq = ctx-cq,
   .srq = ctx-srq,
 @@ -424,11 +427,15 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx-qp[i] = ibv_create_qp(ctx-pd, attr);
 + ctx-qp[i] = ibv_create_qp(ctx-pd, init_attr);
   if (!ctx-qp[i])  {
   fprintf(stderr, Couldn't create QP[%d]\n, i);
   goto clean_qps;
   }
 + ibv_query_qp(ctx-qp[i], attr, IBV_QP_CAP, init_attr);
 + if (init_attr.cap.max_inline_data = size) {
 + ctx-send_flags

Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-23 Thread Jeff Squyres (jsquyres)
On Jul 18, 2013, at 12:50 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 We need it for UD for our upcoming device, however, because the MTU
 is the only way to get the max message size.
 
 .. and UD is the least abstracted transport, so existing apps won't
 support Jeff's new NIC anyhow, MTU is the least of their problems.
 
 Existing apps with existing transports see the same old values.


...so how do we move forward?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Add the use of IBV_SEND_INLINE to example pingpong programs

2013-07-19 Thread Jeff Squyres (jsquyres)
Bump bump.

On Jul 10, 2013, at 4:32 PM, Jeff Squyres jsquy...@cisco.com wrote:

 If the send size is less than the cap.max_inline_data reported by the
 qp, use the IBV_SEND_INLINE flag.  This now only shows the example of
 using ibv_query_qp(), it also reduces the latency time shown by the
 pingpong programs when the sends can be inlined.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 examples/rc_pingpong.c  | 18 +-
 examples/srq_pingpong.c | 19 +--
 examples/uc_pingpong.c  | 17 -
 examples/ud_pingpong.c  | 18 +-
 4 files changed, 51 insertions(+), 21 deletions(-)
 
 diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
 index 15494a1..a8637a5 100644
 --- a/examples/rc_pingpong.c
 +++ b/examples/rc_pingpong.c
 @@ -65,6 +65,7 @@ struct pingpong_context {
   struct ibv_qp   *qp;
   void*buf;
   int  size;
 + int  send_flags;
   int  rx_depth;
   int  pending;
   struct ibv_port_attr portinfo;
 @@ -319,8 +320,9 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx-size = size;
 - ctx-rx_depth = rx_depth;
 + ctx-size   = size;
 + ctx-send_flags = IBV_SEND_SIGNALED;
 + ctx-rx_depth   = rx_depth;
 
   ctx-buf = memalign(page_size, size);
   if (!ctx-buf) {
 @@ -367,7 +369,8 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   }
 
   {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx-cq,
   .recv_cq = ctx-cq,
   .cap = {
 @@ -379,11 +382,16 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx-qp = ibv_create_qp(ctx-pd, attr);
 + ctx-qp = ibv_create_qp(ctx-pd, init_attr);
   if (!ctx-qp)  {
   fprintf(stderr, Couldn't create QP\n);
   goto clean_cq;
   }
 +
 + ibv_query_qp(ctx-qp, attr, IBV_QP_CAP, init_attr);
 + if (init_attr.cap.max_inline_data = size) {
 + ctx-send_flags |= IBV_SEND_INLINE;
 + }
   }
 
   {
 @@ -508,7 +516,7 @@ static int pp_post_send(struct pingpong_context *ctx)
   .sg_list= list,
   .num_sge= 1,
   .opcode = IBV_WR_SEND,
 - .send_flags = IBV_SEND_SIGNALED,
 + .send_flags = ctx-send_flags,
   };
   struct ibv_send_wr *bad_wr;
 
 diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
 index 6e00f8c..552a144 100644
 --- a/examples/srq_pingpong.c
 +++ b/examples/srq_pingpong.c
 @@ -68,6 +68,7 @@ struct pingpong_context {
   struct ibv_qp   *qp[MAX_QP];
   void*buf;
   int  size;
 + int  send_flags;
   int  num_qp;
   int  rx_depth;
   int  pending[MAX_QP];
 @@ -350,9 +351,10 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx-size = size;
 - ctx-num_qp   = num_qp;
 - ctx-rx_depth = rx_depth;
 + ctx-size   = size;
 + ctx-send_flags = IBV_SEND_SIGNALED;
 + ctx-num_qp = num_qp;
 + ctx-rx_depth   = rx_depth;
 
   ctx-buf = memalign(page_size, size);
   if (!ctx-buf) {
 @@ -413,7 +415,8 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   }
 
   for (i = 0; i  num_qp; ++i) {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx-cq,
   .recv_cq = ctx-cq,
   .srq = ctx-srq,
 @@ -424,11 +427,15 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx-qp[i] = ibv_create_qp(ctx-pd, attr);
 + ctx-qp[i] = ibv_create_qp(ctx-pd, init_attr);
   if (!ctx-qp[i])  {
   fprintf(stderr, Couldn't create QP[%d]\n, i);
   goto clean_qps;
   }
 + ibv_query_qp(ctx-qp[i], attr, IBV_QP_CAP, init_attr);
 + if (init_attr.cap.max_inline_data = size) {
 + ctx-send_flags |= IBV_SEND_INLINE;
 + }
   }
 
   for (i = 0; i  num_qp; ++i) {
 @@ -568,7 +575,7 @@ static int pp_post_send(struct

Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-17 Thread Jeff Squyres (jsquyres)
On Jul 17, 2013, at 5:44 PM, Steve Wise sw...@opengridcomputing.com wrote:

 The iwarp drivers just report the nearest mtu enum.  Apps don't need it for 
 iwarp like they do for ib.


For RC, it doesn't matter much.  So the fact that RoCE and iWARP lie about 
their MTU isn't a huge deal.  It's wrong, but it doesn't matter much.

We need it for UD for our upcoming device, however, because the MTU is the only 
way to get the max message size.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-16 Thread Jeff Squyres (jsquyres)
On Jul 16, 2013, at 10:47 AM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 A source change is completely unvaoidable. Supporting the new MTU
 values requires updated source.


I don't really care one way or the other; I'll submit whatever patch people 
want.  :-)

But FWIW, I tend to believe the Doug/Jason position:

- MTU really needs to be a plain integer (not an enum)
- forcing application source change/adaptation is the safest way to move forward
- doing it this way preserves ABI, so existing binaries are safe

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-15 Thread Jeff Squyres (jsquyres)
Bump.

On Jul 10, 2013, at 8:14 AM, Jeff Squyres (jsquyres) jsquy...@cisco.com wrote:

 On Jul 8, 2013, at 1:26 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
 wrote:
 
 Jeff's patch doesn't break old binaries, old binaries, running with
 normal IB MTUs work fine. The structure layouts all stay the same,
 etc.
 
 
 FWIW, I did a simple test to confirm this.  I installed a stock git HEAD 
 libibverbs into $HOME/libibverbs-HEAD and a libibverbs with the MTU patch in 
 $HOME/libibverbs-mtu-patch.  The mlx4 driver was installed into both trees (I 
 used some fairly old Mellanox HCAs+Dell servers for this test).
 
 This is the base case:
 
 -
 [5:06] dell012:~ ❯❯❯ cd libibverbs-HEAD
 [5:07] dell012:~/libibverbs-HEAD ❯❯❯ ldd bin/ibv_rc_pingpong
   linux-vdso.so.1 =  (0x2aacb000)
   libibverbs.so.1 = /home/jsquyres/libibverbs-HEAD/lib/libibverbs.so.1 
 (0x2accd000)
   libpthread.so.0 = /lib64/libpthread.so.0 (0x2aeec000)
   libdl.so.2 = /lib64/libdl.so.2 (0x2b109000)
   libc.so.6 = /lib64/libc.so.6 (0x2b30e000)
   /lib64/ld-linux-x86-64.so.2 (0x2aaab000)
 [5:07] dell012:~/libibverbs-HEAD ❯❯❯ ./bin/ibv_rc_pingpong dell011
  local address:  LID 0x0004, QPN 0x04004a, PSN 0xc08742, GID ::
  remote address: LID 0x0019, QPN 0x20004a, PSN 0x44c48e, GID ::
 8192000 bytes in 0.02 seconds = 4170.28 Mbit/sec
 1000 iters in 0.02 seconds = 15.72 usec/iter
 -
 
 Works fine.  Now let's use the same libibverbs-HEAD rc pingpong binary, but 
 with the MTU-patched libibverbs.so:
 
 -
 [5:07] dell012:~/libibverbs-HEAD ❯❯❯ mv lib/libibverbs.so.1 
 lib/libibverbs.so.1-bogus
 [5:07] dell012:~/libibverbs-HEAD ❯❯❯ export 
 LD_LIBRARY_PATH=$HOME/libibverbs-mtu-patch/lib
 [5:08] dell012:~/libibverbs-HEAD ❯❯❯ ldd bin/ibv_rc_pingpong
   linux-vdso.so.1 =  (0x2aacb000)
   libibverbs.so.1 = 
 /home/jsquyres/libibverbs-mtu-patch/lib/libibverbs.so.1 (0x2accd000)
   libpthread.so.0 = /lib64/libpthread.so.0 (0x2aeed000)
   libdl.so.2 = /lib64/libdl.so.2 (0x2b10a000)
   libc.so.6 = /lib64/libc.so.6 (0x2b30e000)
   /lib64/ld-linux-x86-64.so.2 (0x2aaab000)
 [5:08] dell012:~/libibverbs-HEAD ❯❯❯ ./bin/ibv_rc_pingpong dell011
  local address:  LID 0x0004, QPN 0x08004a, PSN 0x65391c, GID ::
  remote address: LID 0x0019, QPN 0x24004a, PSN 0x7d137e, GID ::
 8192000 bytes in 0.02 seconds = 4163.39 Mbit/sec
 1000 iters in 0.02 seconds = 15.74 usec/iter
 -
 
 Still works fine.


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [PATCH] libibverbs: Add the use of IBV_SEND_INLINE to example pingpong programs

2013-07-15 Thread Jeff Squyres (jsquyres)
Bump.

On Jul 10, 2013, at 4:32 PM, Jeff Squyres jsquy...@cisco.com wrote:

 If the send size is less than the cap.max_inline_data reported by the
 qp, use the IBV_SEND_INLINE flag.  This now only shows the example of
 using ibv_query_qp(), it also reduces the latency time shown by the
 pingpong programs when the sends can be inlined.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 examples/rc_pingpong.c  | 18 +-
 examples/srq_pingpong.c | 19 +--
 examples/uc_pingpong.c  | 17 -
 examples/ud_pingpong.c  | 18 +-
 4 files changed, 51 insertions(+), 21 deletions(-)
 
 diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
 index 15494a1..a8637a5 100644
 --- a/examples/rc_pingpong.c
 +++ b/examples/rc_pingpong.c
 @@ -65,6 +65,7 @@ struct pingpong_context {
   struct ibv_qp   *qp;
   void*buf;
   int  size;
 + int  send_flags;
   int  rx_depth;
   int  pending;
   struct ibv_port_attr portinfo;
 @@ -319,8 +320,9 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx-size = size;
 - ctx-rx_depth = rx_depth;
 + ctx-size   = size;
 + ctx-send_flags = IBV_SEND_SIGNALED;
 + ctx-rx_depth   = rx_depth;
 
   ctx-buf = memalign(page_size, size);
   if (!ctx-buf) {
 @@ -367,7 +369,8 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   }
 
   {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx-cq,
   .recv_cq = ctx-cq,
   .cap = {
 @@ -379,11 +382,16 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx-qp = ibv_create_qp(ctx-pd, attr);
 + ctx-qp = ibv_create_qp(ctx-pd, init_attr);
   if (!ctx-qp)  {
   fprintf(stderr, Couldn't create QP\n);
   goto clean_cq;
   }
 +
 + ibv_query_qp(ctx-qp, attr, IBV_QP_CAP, init_attr);
 + if (init_attr.cap.max_inline_data = size) {
 + ctx-send_flags |= IBV_SEND_INLINE;
 + }
   }
 
   {
 @@ -508,7 +516,7 @@ static int pp_post_send(struct pingpong_context *ctx)
   .sg_list= list,
   .num_sge= 1,
   .opcode = IBV_WR_SEND,
 - .send_flags = IBV_SEND_SIGNALED,
 + .send_flags = ctx-send_flags,
   };
   struct ibv_send_wr *bad_wr;
 
 diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
 index 6e00f8c..552a144 100644
 --- a/examples/srq_pingpong.c
 +++ b/examples/srq_pingpong.c
 @@ -68,6 +68,7 @@ struct pingpong_context {
   struct ibv_qp   *qp[MAX_QP];
   void*buf;
   int  size;
 + int  send_flags;
   int  num_qp;
   int  rx_depth;
   int  pending[MAX_QP];
 @@ -350,9 +351,10 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   if (!ctx)
   return NULL;
 
 - ctx-size = size;
 - ctx-num_qp   = num_qp;
 - ctx-rx_depth = rx_depth;
 + ctx-size   = size;
 + ctx-send_flags = IBV_SEND_SIGNALED;
 + ctx-num_qp = num_qp;
 + ctx-rx_depth   = rx_depth;
 
   ctx-buf = memalign(page_size, size);
   if (!ctx-buf) {
 @@ -413,7 +415,8 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   }
 
   for (i = 0; i  num_qp; ++i) {
 - struct ibv_qp_init_attr attr = {
 + struct ibv_qp_attr attr;
 + struct ibv_qp_init_attr init_attr = {
   .send_cq = ctx-cq,
   .recv_cq = ctx-cq,
   .srq = ctx-srq,
 @@ -424,11 +427,15 @@ static struct pingpong_context *pp_init_ctx(struct 
 ibv_device *ib_dev, int size,
   .qp_type = IBV_QPT_RC
   };
 
 - ctx-qp[i] = ibv_create_qp(ctx-pd, attr);
 + ctx-qp[i] = ibv_create_qp(ctx-pd, init_attr);
   if (!ctx-qp[i])  {
   fprintf(stderr, Couldn't create QP[%d]\n, i);
   goto clean_qps;
   }
 + ibv_query_qp(ctx-qp[i], attr, IBV_QP_CAP, init_attr);
 + if (init_attr.cap.max_inline_data = size) {
 + ctx-send_flags |= IBV_SEND_INLINE;
 + }
   }
 
   for (i = 0; i  num_qp; ++i) {
 @@ -568,7 +575,7 @@ static int pp_post_send(struct

Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-10 Thread Jeff Squyres (jsquyres)
On Jul 8, 2013, at 1:26 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 Jeff's patch doesn't break old binaries, old binaries, running with
 normal IB MTUs work fine. The structure layouts all stay the same,
 etc.


FWIW, I did a simple test to confirm this.  I installed a stock git HEAD 
libibverbs into $HOME/libibverbs-HEAD and a libibverbs with the MTU patch in 
$HOME/libibverbs-mtu-patch.  The mlx4 driver was installed into both trees (I 
used some fairly old Mellanox HCAs+Dell servers for this test).

This is the base case:

-
[5:06] dell012:~ ❯❯❯ cd libibverbs-HEAD
[5:07] dell012:~/libibverbs-HEAD ❯❯❯ ldd bin/ibv_rc_pingpong
linux-vdso.so.1 =  (0x2aacb000)
libibverbs.so.1 = /home/jsquyres/libibverbs-HEAD/lib/libibverbs.so.1 
(0x2accd000)
libpthread.so.0 = /lib64/libpthread.so.0 (0x2aeec000)
libdl.so.2 = /lib64/libdl.so.2 (0x2b109000)
libc.so.6 = /lib64/libc.so.6 (0x2b30e000)
/lib64/ld-linux-x86-64.so.2 (0x2aaab000)
[5:07] dell012:~/libibverbs-HEAD ❯❯❯ ./bin/ibv_rc_pingpong dell011
  local address:  LID 0x0004, QPN 0x04004a, PSN 0xc08742, GID ::
  remote address: LID 0x0019, QPN 0x20004a, PSN 0x44c48e, GID ::
8192000 bytes in 0.02 seconds = 4170.28 Mbit/sec
1000 iters in 0.02 seconds = 15.72 usec/iter
-

Works fine.  Now let's use the same libibverbs-HEAD rc pingpong binary, but 
with the MTU-patched libibverbs.so:

-
[5:07] dell012:~/libibverbs-HEAD ❯❯❯ mv lib/libibverbs.so.1 
lib/libibverbs.so.1-bogus
[5:07] dell012:~/libibverbs-HEAD ❯❯❯ export 
LD_LIBRARY_PATH=$HOME/libibverbs-mtu-patch/lib
[5:08] dell012:~/libibverbs-HEAD ❯❯❯ ldd bin/ibv_rc_pingpong
linux-vdso.so.1 =  (0x2aacb000)
libibverbs.so.1 = 
/home/jsquyres/libibverbs-mtu-patch/lib/libibverbs.so.1 (0x2accd000)
libpthread.so.0 = /lib64/libpthread.so.0 (0x2aeed000)
libdl.so.2 = /lib64/libdl.so.2 (0x2b10a000)
libc.so.6 = /lib64/libc.so.6 (0x2b30e000)
/lib64/ld-linux-x86-64.so.2 (0x2aaab000)
[5:08] dell012:~/libibverbs-HEAD ❯❯❯ ./bin/ibv_rc_pingpong dell011
  local address:  LID 0x0004, QPN 0x08004a, PSN 0x65391c, GID ::
  remote address: LID 0x0019, QPN 0x24004a, PSN 0x7d137e, GID ::
8192000 bytes in 0.02 seconds = 4163.39 Mbit/sec
1000 iters in 0.02 seconds = 15.74 usec/iter
-

Still works fine.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

N�r��yb�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

[PATCH] libibverbs: Add the use of IBV_SEND_INLINE to example pingpong programs

2013-07-10 Thread Jeff Squyres
If the send size is less than the cap.max_inline_data reported by the
qp, use the IBV_SEND_INLINE flag.  This now only shows the example of
using ibv_query_qp(), it also reduces the latency time shown by the
pingpong programs when the sends can be inlined.

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 examples/rc_pingpong.c  | 18 +-
 examples/srq_pingpong.c | 19 +--
 examples/uc_pingpong.c  | 17 -
 examples/ud_pingpong.c  | 18 +-
 4 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index 15494a1..a8637a5 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -65,6 +65,7 @@ struct pingpong_context {
struct ibv_qp   *qp;
void*buf;
int  size;
+   int  send_flags;
int  rx_depth;
int  pending;
struct ibv_port_attr portinfo;
@@ -319,8 +320,9 @@ static struct pingpong_context *pp_init_ctx(struct 
ibv_device *ib_dev, int size,
if (!ctx)
return NULL;
 
-   ctx-size = size;
-   ctx-rx_depth = rx_depth;
+   ctx-size   = size;
+   ctx-send_flags = IBV_SEND_SIGNALED;
+   ctx-rx_depth   = rx_depth;
 
ctx-buf = memalign(page_size, size);
if (!ctx-buf) {
@@ -367,7 +369,8 @@ static struct pingpong_context *pp_init_ctx(struct 
ibv_device *ib_dev, int size,
}
 
{
-   struct ibv_qp_init_attr attr = {
+   struct ibv_qp_attr attr;
+   struct ibv_qp_init_attr init_attr = {
.send_cq = ctx-cq,
.recv_cq = ctx-cq,
.cap = {
@@ -379,11 +382,16 @@ static struct pingpong_context *pp_init_ctx(struct 
ibv_device *ib_dev, int size,
.qp_type = IBV_QPT_RC
};
 
-   ctx-qp = ibv_create_qp(ctx-pd, attr);
+   ctx-qp = ibv_create_qp(ctx-pd, init_attr);
if (!ctx-qp)  {
fprintf(stderr, Couldn't create QP\n);
goto clean_cq;
}
+
+   ibv_query_qp(ctx-qp, attr, IBV_QP_CAP, init_attr);
+   if (init_attr.cap.max_inline_data = size) {
+   ctx-send_flags |= IBV_SEND_INLINE;
+   }
}
 
{
@@ -508,7 +516,7 @@ static int pp_post_send(struct pingpong_context *ctx)
.sg_list= list,
.num_sge= 1,
.opcode = IBV_WR_SEND,
-   .send_flags = IBV_SEND_SIGNALED,
+   .send_flags = ctx-send_flags,
};
struct ibv_send_wr *bad_wr;
 
diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
index 6e00f8c..552a144 100644
--- a/examples/srq_pingpong.c
+++ b/examples/srq_pingpong.c
@@ -68,6 +68,7 @@ struct pingpong_context {
struct ibv_qp   *qp[MAX_QP];
void*buf;
int  size;
+   int  send_flags;
int  num_qp;
int  rx_depth;
int  pending[MAX_QP];
@@ -350,9 +351,10 @@ static struct pingpong_context *pp_init_ctx(struct 
ibv_device *ib_dev, int size,
if (!ctx)
return NULL;
 
-   ctx-size = size;
-   ctx-num_qp   = num_qp;
-   ctx-rx_depth = rx_depth;
+   ctx-size   = size;
+   ctx-send_flags = IBV_SEND_SIGNALED;
+   ctx-num_qp = num_qp;
+   ctx-rx_depth   = rx_depth;
 
ctx-buf = memalign(page_size, size);
if (!ctx-buf) {
@@ -413,7 +415,8 @@ static struct pingpong_context *pp_init_ctx(struct 
ibv_device *ib_dev, int size,
}
 
for (i = 0; i  num_qp; ++i) {
-   struct ibv_qp_init_attr attr = {
+   struct ibv_qp_attr attr;
+   struct ibv_qp_init_attr init_attr = {
.send_cq = ctx-cq,
.recv_cq = ctx-cq,
.srq = ctx-srq,
@@ -424,11 +427,15 @@ static struct pingpong_context *pp_init_ctx(struct 
ibv_device *ib_dev, int size,
.qp_type = IBV_QPT_RC
};
 
-   ctx-qp[i] = ibv_create_qp(ctx-pd, attr);
+   ctx-qp[i] = ibv_create_qp(ctx-pd, init_attr);
if (!ctx-qp[i])  {
fprintf(stderr, Couldn't create QP[%d]\n, i);
goto clean_qps;
}
+   ibv_query_qp(ctx-qp[i], attr, IBV_QP_CAP, init_attr);
+   if (init_attr.cap.max_inline_data = size) {
+   ctx-send_flags |= IBV_SEND_INLINE;
+   }
}
 
for (i = 0; i  num_qp; ++i) {
@@ -568,7 +575,7 @@ static int pp_post_send(struct pingpong_context *ctx, int 
qp_index

Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-08 Thread Jeff Squyres (jsquyres)
On Jul 5, 2013, at 3:11 PM, Roland Dreier rol...@purestorage.com wrote:

 So what happens if I have an old application binary, and I run against
 a new libibverbs without recompiling?
 
 Also it seems that I'm forced to change my source code to be able to
 compile against new libibverbs?


I previously sent an ABI-preserving version of this patch, but it was hated by 
Doug Ledford and (eventually) Jason Gunthorpe.  

After long discussion (see thread starting here: 
http://www.spinics.net/lists/linux-rdma/msg15951.html), they decided that they 
wanted a clean break that forces both source code and ABI changes, which 
resulted in this patch.

I personally don't care which way this goes; I just want the ability to have 
non-enum MTU values.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-03 Thread Jeff Squyres (jsquyres)
Bump.

On Jul 2, 2013, at 8:31 AM, Jeff Squyres jsquy...@cisco.com wrote:

 (Previous patch did not include updates for the man pages)
 
 Keep IBV_MTU_* enums values as they are, but pass MTU values around as
 a struct containing a single int.  
 
 Per lengthy discusson on the linux-rdma list, this patch introdces a
 source code incompatibility.  Although legacy applications can
 continue to use the enum values, they will need to be updated to use
 the struct.  Newer applications are encouraged to use arbitrary int
 values, not the MTU enums (e.g., 1024, 1500, 9000).
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 Makefile.am|  3 +-
 examples/devinfo.c | 20 +++--
 examples/pingpong.c| 12 
 examples/pingpong.h|  1 -
 examples/rc_pingpong.c | 10 +++
 examples/srq_pingpong.c| 10 +++
 examples/uc_pingpong.c | 10 +++
 examples/ud_pingpong.c |  2 +-
 include/infiniband/verbs.h | 61 +--
 man/ibv_modify_qp.3|  2 +-
 man/ibv_mtu_to_num.3   | 71 ++
 man/ibv_query_port.3   |  4 +--
 man/ibv_query_qp.3 |  2 +-
 src/cmd.c  |  8 +++---
 src/marshall.c |  2 +-
 15 files changed, 160 insertions(+), 58 deletions(-)
 create mode 100644 man/ibv_mtu_to_num.3
 
 diff --git a/Makefile.am b/Makefile.am
 index 40e83be..1159e55 100644
 --- a/Makefile.am
 +++ b/Makefile.am
 @@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 
 man/ibv_devinfo.1 \
 man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3
 \
 man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3  \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3   
 \
 -man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
 +man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3  \
 +man/ibv_mtu_to_num.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
 diff --git a/examples/devinfo.c b/examples/devinfo.c
 index ff078e4..e8fb27e 100644
 --- a/examples/devinfo.c
 +++ b/examples/devinfo.c
 @@ -111,18 +111,6 @@ static const char *atomic_cap_str(enum ibv_atomic_cap 
 atom_cap)
   }
 }
 
 -static const char *mtu_str(enum ibv_mtu max_mtu)
 -{
 - switch (max_mtu) {
 - case IBV_MTU_256:  return 256;
 - case IBV_MTU_512:  return 512;
 - case IBV_MTU_1024: return 1024;
 - case IBV_MTU_2048: return 2048;
 - case IBV_MTU_4096: return 4096;
 - default:   return invalid MTU;
 - }
 -}
 -
 static const char *width_str(uint8_t width)
 {
   switch (width) {
 @@ -301,10 +289,10 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
 uint8_t ib_port)
   printf(\t\tport:\t%d\n, port);
   printf(\t\t\tstate:\t\t\t%s (%d)\n,
  port_state_str(port_attr.state), port_attr.state);
 - printf(\t\t\tmax_mtu:\t\t%s (%d)\n,
 -mtu_str(port_attr.max_mtu), port_attr.max_mtu);
 - printf(\t\t\tactive_mtu:\t\t%s (%d)\n,
 -mtu_str(port_attr.active_mtu), port_attr.active_mtu);
 + printf(\t\t\tmax_mtu:\t\t%d (%d)\n,
 +ibv_mtu_to_num(port_attr.max_mtu), 
 port_attr.max_mtu.mtu);
 + printf(\t\t\tactive_mtu:\t\t%d (%d)\n,
 + ibv_mtu_to_num(port_attr.active_mtu), 
 port_attr.active_mtu.mtu);
   printf(\t\t\tsm_lid:\t\t\t%d\n, port_attr.sm_lid);
   printf(\t\t\tport_lid:\t\t%d\n, port_attr.lid);
   printf(\t\t\tport_lmc:\t\t0x%02x\n, port_attr.lmc);
 diff --git a/examples/pingpong.c b/examples/pingpong.c
 index 90732ef..d1c22c9 100644
 --- a/examples/pingpong.c
 +++ b/examples/pingpong.c
 @@ -36,18 +36,6 @@
 #include stdio.h
 #include string.h
 
 -enum ibv_mtu pp_mtu_to_enum(int mtu)
 -{
 - switch (mtu) {
 - case 256:  return IBV_MTU_256;
 - case 512:  return IBV_MTU_512;
 - case 1024: return IBV_MTU_1024;
 - case 2048: return IBV_MTU_2048;
 - case 4096: return IBV_MTU_4096;
 - default:   return -1;
 - }
 -}
 -
 uint16_t pp_get_local_lid(struct ibv_context *context, int port)
 {
   struct ibv_port_attr attr;
 diff --git a/examples/pingpong.h b/examples/pingpong.h
 index 9cdc03e..91d217b 100644
 --- a/examples/pingpong.h
 +++ b/examples/pingpong.h
 @@ -35,7 +35,6 @@
 
 #include infiniband/verbs.h
 
 -enum ibv_mtu pp_mtu_to_enum(int mtu);
 uint16_t pp_get_local_lid(struct ibv_context *context, int port);
 int pp_get_port_info(struct ibv_context *context, int port,
struct ibv_port_attr *attr);
 diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
 index 15494a1..a7e1836 100644
 --- a/examples/rc_pingpong.c
 +++ b/examples/rc_pingpong.c
 @@ -78,7 +78,7 @@ struct pingpong_dest {
 };
 
 static

[PATCH] libibverbs: Allow arbitrary int values for MTU

2013-07-01 Thread Jeff Squyres
Keep IBV_MTU_* enums values as they are, but pass MTU values around as
a struct containing a single int.  Per lengthy discusson on the
linux-rdma list, this patch introdces a source code incompatibility.
Although legacy applications can continue to use the enum values, they
will need to be updated to use the struct.  Newer applications are
encouraged to use arbitrary int values, not the MTU enums (e.g., 1024,
1500, 9000).

(if people like the idea of this patch, I will send the corresponding
kernel patch)

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 Makefile.am|  3 ++-
 examples/devinfo.c | 20 +++
 examples/pingpong.c| 12 -
 examples/pingpong.h|  1 -
 examples/rc_pingpong.c | 10 
 examples/srq_pingpong.c| 10 
 examples/uc_pingpong.c | 10 
 examples/ud_pingpong.c |  2 +-
 include/infiniband/verbs.h | 61 +++---
 man/ibv_modify_qp.3|  2 +-
 man/ibv_query_port.3   |  4 +--
 man/ibv_query_qp.3 |  2 +-
 src/cmd.c  |  8 +++---
 src/marshall.c |  2 +-
 14 files changed, 89 insertions(+), 58 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 40e83be..1159e55 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 
man/ibv_devinfo.1   \
 man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \
 man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3   \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3
\
-man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
+man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3  \
+man/ibv_mtu_to_num.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
diff --git a/examples/devinfo.c b/examples/devinfo.c
index ff078e4..e8fb27e 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -111,18 +111,6 @@ static const char *atomic_cap_str(enum ibv_atomic_cap 
atom_cap)
}
 }
 
-static const char *mtu_str(enum ibv_mtu max_mtu)
-{
-   switch (max_mtu) {
-   case IBV_MTU_256:  return 256;
-   case IBV_MTU_512:  return 512;
-   case IBV_MTU_1024: return 1024;
-   case IBV_MTU_2048: return 2048;
-   case IBV_MTU_4096: return 4096;
-   default:   return invalid MTU;
-   }
-}
-
 static const char *width_str(uint8_t width)
 {
switch (width) {
@@ -301,10 +289,10 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
uint8_t ib_port)
printf(\t\tport:\t%d\n, port);
printf(\t\t\tstate:\t\t\t%s (%d)\n,
   port_state_str(port_attr.state), port_attr.state);
-   printf(\t\t\tmax_mtu:\t\t%s (%d)\n,
-  mtu_str(port_attr.max_mtu), port_attr.max_mtu);
-   printf(\t\t\tactive_mtu:\t\t%s (%d)\n,
-  mtu_str(port_attr.active_mtu), port_attr.active_mtu);
+   printf(\t\t\tmax_mtu:\t\t%d (%d)\n,
+  ibv_mtu_to_num(port_attr.max_mtu), 
port_attr.max_mtu.mtu);
+   printf(\t\t\tactive_mtu:\t\t%d (%d)\n,
+   ibv_mtu_to_num(port_attr.active_mtu), 
port_attr.active_mtu.mtu);
printf(\t\t\tsm_lid:\t\t\t%d\n, port_attr.sm_lid);
printf(\t\t\tport_lid:\t\t%d\n, port_attr.lid);
printf(\t\t\tport_lmc:\t\t0x%02x\n, port_attr.lmc);
diff --git a/examples/pingpong.c b/examples/pingpong.c
index 90732ef..d1c22c9 100644
--- a/examples/pingpong.c
+++ b/examples/pingpong.c
@@ -36,18 +36,6 @@
 #include stdio.h
 #include string.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu)
-{
-   switch (mtu) {
-   case 256:  return IBV_MTU_256;
-   case 512:  return IBV_MTU_512;
-   case 1024: return IBV_MTU_1024;
-   case 2048: return IBV_MTU_2048;
-   case 4096: return IBV_MTU_4096;
-   default:   return -1;
-   }
-}
-
 uint16_t pp_get_local_lid(struct ibv_context *context, int port)
 {
struct ibv_port_attr attr;
diff --git a/examples/pingpong.h b/examples/pingpong.h
index 9cdc03e..91d217b 100644
--- a/examples/pingpong.h
+++ b/examples/pingpong.h
@@ -35,7 +35,6 @@
 
 #include infiniband/verbs.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu);
 uint16_t pp_get_local_lid(struct ibv_context *context, int port);
 int pp_get_port_info(struct ibv_context *context, int port,
 struct ibv_port_attr *attr);
diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index 15494a1..a7e1836 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -78,7 +78,7 @@ struct pingpong_dest {
 };
 
 static int pp_connect_ctx(struct pingpong_context *ctx, int port, int my_psn,
- enum ibv_mtu mtu, int sl,
+ struct ibv_mtu_t mtu, int sl

Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-22 Thread Jeff Squyres (jsquyres)
On Jun 21, 2013, at 5:20 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 Jeff: If you are still reading -

I am still reading, just didn't have much to contribute until now.  :-)

 one concrete suggestion, I think, is
 to ensure compile-time failure when the new-format MTU variable is
 touched.  This is trivially done by wrapping it in a struct:
 
 struct ibv_mtu_t {int __mtu;};

Sure, I can work up a patch that does this.

Do others agree?  Roland?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Allow arbitrary int values for MTU

2013-06-20 Thread Jeff Squyres (jsquyres)
On Jun 18, 2013, at 2:49 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 +int num_to_ibv_mtu(int num);
 
 Probably should be ibv_num_to_mtu() to keep with the naming pattern..


New patch coming momentarily, but I wanted to comment on this one: 

I used the name num_to_ibv_mtu because it is in the spirit of the other 
enum-to-int/int-to-enum function pair naming conventions:

int ibv_rate_to_mult(enum ibv_rate rate);
enum ibv_rate mult_to_ibv_rate(int mult);

int ibv_rate_to_mbps(enum ibv_rate rate);
enum ibv_rate mbps_to_ibv_rate(int mbps);

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2] libibverbs: Allow arbitrary int values for MTU.

2013-06-20 Thread Jeff Squyres
Keep IBV_MTU_* enums values as they are, but pass MTU values around as
int's.  This is an ABI-compatible change; legacy applications will use
the enum values, but newer applications can use an int for values that
do not currently exist in the enum set (e.g., 1500, 9000).

(if people like the idea of this patch, I will send the corresponding
kernel patch)

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 Makefile.am|  3 ++-
 examples/devinfo.c | 20 +++---
 examples/pingpong.c| 12 -
 examples/pingpong.h|  1 -
 examples/rc_pingpong.c |  8 +++---
 examples/srq_pingpong.c|  8 +++---
 examples/uc_pingpong.c |  8 +++---
 examples/ud_pingpong.c |  2 +-
 include/infiniband/verbs.h | 55 ++---
 man/ibv_modify_qp.3|  2 +-
 man/ibv_mtu_to_num.3   | 67 ++
 man/ibv_query_port.3   |  4 +--
 man/ibv_query_qp.3 |  2 +-
 13 files changed, 142 insertions(+), 50 deletions(-)
 create mode 100644 man/ibv_mtu_to_num.3

diff --git a/Makefile.am b/Makefile.am
index 40e83be..1159e55 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -54,7 +54,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 
man/ibv_devinfo.1   \
 man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \
 man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3   \
 man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3
\
-man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3
+man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 man/ibv_rate_to_mbps.3  \
+man/ibv_mtu_to_num.3
 
 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \
 debian/ibverbs-utils.install debian/libibverbs1.install \
diff --git a/examples/devinfo.c b/examples/devinfo.c
index ff078e4..9f51dcb 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -111,18 +111,6 @@ static const char *atomic_cap_str(enum ibv_atomic_cap 
atom_cap)
}
 }
 
-static const char *mtu_str(enum ibv_mtu max_mtu)
-{
-   switch (max_mtu) {
-   case IBV_MTU_256:  return 256;
-   case IBV_MTU_512:  return 512;
-   case IBV_MTU_1024: return 1024;
-   case IBV_MTU_2048: return 2048;
-   case IBV_MTU_4096: return 4096;
-   default:   return invalid MTU;
-   }
-}
-
 static const char *width_str(uint8_t width)
 {
switch (width) {
@@ -301,10 +289,10 @@ static int print_hca_cap(struct ibv_device *ib_dev, 
uint8_t ib_port)
printf(\t\tport:\t%d\n, port);
printf(\t\t\tstate:\t\t\t%s (%d)\n,
   port_state_str(port_attr.state), port_attr.state);
-   printf(\t\t\tmax_mtu:\t\t%s (%d)\n,
-  mtu_str(port_attr.max_mtu), port_attr.max_mtu);
-   printf(\t\t\tactive_mtu:\t\t%s (%d)\n,
-  mtu_str(port_attr.active_mtu), port_attr.active_mtu);
+   printf(\t\t\tmax_mtu:\t\t%d (%d)\n,
+  ibv_mtu_to_num(port_attr.max_mtu), port_attr.max_mtu);
+   printf(\t\t\tactive_mtu:\t\t%d (%d)\n,
+  ibv_mtu_to_num(port_attr.active_mtu), 
port_attr.active_mtu);
printf(\t\t\tsm_lid:\t\t\t%d\n, port_attr.sm_lid);
printf(\t\t\tport_lid:\t\t%d\n, port_attr.lid);
printf(\t\t\tport_lmc:\t\t0x%02x\n, port_attr.lmc);
diff --git a/examples/pingpong.c b/examples/pingpong.c
index 90732ef..d1c22c9 100644
--- a/examples/pingpong.c
+++ b/examples/pingpong.c
@@ -36,18 +36,6 @@
 #include stdio.h
 #include string.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu)
-{
-   switch (mtu) {
-   case 256:  return IBV_MTU_256;
-   case 512:  return IBV_MTU_512;
-   case 1024: return IBV_MTU_1024;
-   case 2048: return IBV_MTU_2048;
-   case 4096: return IBV_MTU_4096;
-   default:   return -1;
-   }
-}
-
 uint16_t pp_get_local_lid(struct ibv_context *context, int port)
 {
struct ibv_port_attr attr;
diff --git a/examples/pingpong.h b/examples/pingpong.h
index 9cdc03e..91d217b 100644
--- a/examples/pingpong.h
+++ b/examples/pingpong.h
@@ -35,7 +35,6 @@
 
 #include infiniband/verbs.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu);
 uint16_t pp_get_local_lid(struct ibv_context *context, int port);
 int pp_get_port_info(struct ibv_context *context, int port,
 struct ibv_port_attr *attr);
diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index 15494a1..2d6d30e 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -78,7 +78,7 @@ struct pingpong_dest {
 };
 
 static int pp_connect_ctx(struct pingpong_context *ctx, int port, int my_psn,
- enum ibv_mtu mtu, int sl,
+ ibv_mtu_t mtu, int sl,
  struct pingpong_dest *dest, int sgid_idx)
 {
struct ibv_qp_attr attr = {
@@ -209,7 +209,7 @@ out:
 }
 
 static struct

Re: [PATCH] libibverbs: Allow arbitrary int values for MTU

2013-06-20 Thread Jeff Squyres (jsquyres)
On Jun 20, 2013, at 1:09 PM, Hefty, Sean sean.he...@intel.com wrote:

 int ibv_rate_to_mult(enum ibv_rate rate);
 enum ibv_rate mult_to_ibv_rate(int mult);
 
 int ibv_rate_to_mbps(enum ibv_rate rate);
 enum ibv_rate mbps_to_ibv_rate(int mbps);
 
 libibverbs uses the ibv_ prefix for pretty much everything.

...except for those 2 functions above (mbps_to_ibv_rate and mult_to_ibv_rate).  
See:

https://git.kernel.org/cgit/libs/infiniband/libibverbs.git/tree/include/infiniband/verbs.h#n392

and

https://git.kernel.org/cgit/libs/infiniband/libibverbs.git/tree/include/infiniband/verbs.h#n379

respectively.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: Allow arbitrary int values for MTU

2013-06-20 Thread Jeff Squyres (jsquyres)
On Jun 20, 2013, at 4:40 PM, Doug Ledford dledf...@redhat.com wrote:

 {
  static char str[16];
  snprintf(str, sizeof(str), %d, ibv_mtu_to_num(max_mtu));
return str;
 }
 
 That is not, however, multi-thread safe nor advisable unless you clearly
 indicate in the man page to the function that subsequent calls to the
 function wipe out the result of previous calls.  It's not even single
 thread safe if you have more than one interface and don't know that
 later calls wipe this buffer out.  Best to avoid library routines such
 as this.

This is in the devinfo.c program (which is single-threaded), not in the library 
itself.

But regardless, this whole function went away in V2 of the patch.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] libibverbs: Allow arbitrary int values for MTU

2013-06-17 Thread Jeff Squyres
Keep IBV_MTU_* enums values as they are, but pass MTU values around as
int's.  This is an ABI-compatible change; legacy applications will use
the enum values, but newer applications can use an int for values that
do not currently exist in the enum set (e.g., 1500, 9000).

(if people like the idea of this patch, I will send the corresponding
kernel patch)

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 examples/devinfo.c | 11 +--
 examples/pingpong.c| 12 
 examples/pingpong.h|  1 -
 examples/rc_pingpong.c |  8 
 examples/srq_pingpong.c|  8 
 examples/uc_pingpong.c |  8 
 examples/ud_pingpong.c |  2 +-
 include/infiniband/verbs.h | 20 +---
 man/ibv_modify_qp.3|  2 +-
 man/ibv_query_port.3   |  4 ++--
 man/ibv_query_qp.3 |  2 +-
 src/libibverbs.map |  3 +++
 src/verbs.c| 24 
 13 files changed, 70 insertions(+), 35 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index ff078e4..b856c82 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -111,15 +111,22 @@ static const char *atomic_cap_str(enum ibv_atomic_cap 
atom_cap)
}
 }
 
-static const char *mtu_str(enum ibv_mtu max_mtu)
+static const char *mtu_str(int max_mtu)
 {
+   static char str[16];
+
switch (max_mtu) {
case IBV_MTU_256:  return 256;
case IBV_MTU_512:  return 512;
case IBV_MTU_1024: return 1024;
case IBV_MTU_2048: return 2048;
case IBV_MTU_4096: return 4096;
-   default:   return invalid MTU;
+   default:
+   if (max_mtu  0) {
+   snprintf(str, sizeof(str), %d, max_mtu);
+   return str;
+   }
+   return invalid MTU;
}
 }
 
diff --git a/examples/pingpong.c b/examples/pingpong.c
index 90732ef..d1c22c9 100644
--- a/examples/pingpong.c
+++ b/examples/pingpong.c
@@ -36,18 +36,6 @@
 #include stdio.h
 #include string.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu)
-{
-   switch (mtu) {
-   case 256:  return IBV_MTU_256;
-   case 512:  return IBV_MTU_512;
-   case 1024: return IBV_MTU_1024;
-   case 2048: return IBV_MTU_2048;
-   case 4096: return IBV_MTU_4096;
-   default:   return -1;
-   }
-}
-
 uint16_t pp_get_local_lid(struct ibv_context *context, int port)
 {
struct ibv_port_attr attr;
diff --git a/examples/pingpong.h b/examples/pingpong.h
index 9cdc03e..91d217b 100644
--- a/examples/pingpong.h
+++ b/examples/pingpong.h
@@ -35,7 +35,6 @@
 
 #include infiniband/verbs.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu);
 uint16_t pp_get_local_lid(struct ibv_context *context, int port);
 int pp_get_port_info(struct ibv_context *context, int port,
 struct ibv_port_attr *attr);
diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index 15494a1..8a5318b 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -78,7 +78,7 @@ struct pingpong_dest {
 };
 
 static int pp_connect_ctx(struct pingpong_context *ctx, int port, int my_psn,
- enum ibv_mtu mtu, int sl,
+ int mtu, int sl,
  struct pingpong_dest *dest, int sgid_idx)
 {
struct ibv_qp_attr attr = {
@@ -209,7 +209,7 @@ out:
 }
 
 static struct pingpong_dest *pp_server_exch_dest(struct pingpong_context *ctx,
-int ib_port, enum ibv_mtu mtu,
+int ib_port, int mtu,
 int port, int sl,
 const struct pingpong_dest 
*my_dest,
 int sgid_idx)
@@ -547,7 +547,7 @@ int main(int argc, char *argv[])
int  port = 18515;
int  ib_port = 1;
int  size = 4096;
-   enum ibv_mtu mtu = IBV_MTU_1024;
+   int  mtu = 1024;
int  rx_depth = 500;
int  iters = 1000;
int  use_event = 0;
@@ -608,7 +608,7 @@ int main(int argc, char *argv[])
break;
 
case 'm':
-   mtu = pp_mtu_to_enum(strtol(optarg, NULL, 0));
+   mtu = strtol(optarg, NULL, 0);
if (mtu  0) {
usage(argv[0]);
return 1;
diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
index 6e00f8c..f1eb879 100644
--- a/examples/srq_pingpong.c
+++ b/examples/srq_pingpong.c
@@ -81,7 +81,7 @@ struct pingpong_dest {
union ibv_gid gid;
 };
 
-static int pp_connect_ctx(struct pingpong_context *ctx, int port, enum ibv_mtu 
mtu,
+static int pp_connect_ctx(struct pingpong_context *ctx

Re: Status of ummunot branch?

2013-06-14 Thread Jeff Squyres (jsquyres)
On Jun 12, 2013, at 5:17 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 Yes, it can, via MAP_FIXED. There are lots of fun tricks you can play
 using that.


You're missing the point.

Normal users (i.e., MPI users) don't do that.  They call malloc() and they get 
what they get.

The whole point of upper-layer APIs is that they hide all the network stuff 
from the application programmer.  Verbs is *hard* for the mere mortal to 
program.  MPI can do a great deal to hide the complexities of verbs from app 
developers, but one major concession that MPI (intentionally) made is that the 
*application provides the buffer*, not MPI.

Hence, we're stuck with what buffers the user passes in.

This is the root of the whole MPI has a registration cache issue.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-12 Thread Jeff Squyres (jsquyres)
On Jun 10, 2013, at 11:56 AM, Liran Liss lir...@mellanox.com wrote:

 Register all address space is the moral equivalent of not having userspace
 registration, so let's talk about it in those terms.  Specifically, there's 
 a subtle
 difference between:
 
 a) telling verbs to register (0...2^64)
   -- Which is weird because it tells verbs to register memory that isn't in 
 my
 address space
 
 Another way to look at it is specify IO access permissions for address 
 space ranges.
 This could be useful to implement a buffer pool to be used for a specific MR 
 only, yet still map/unmap memory within this pool on the fly to optimize 
 physical memory utilization.
 In this case, you would provide smaller ranges than 2^64...


Hmm; I'm not sure I understand.

Userspace doesn't control what virtual addresses it gets back from mmap/etc.  
So how is what you're talking about different than regular/reactive memory 
registration? (vs. pre-emptively registering a whole pile of memory that 
doesn't exist yet)

Specifically: I'm confused because you said you could (preemptively) register 
some small regions (that assumedly don't yet exist in your virtual memory 
address space) and use them as memory pools.  But given that userspace doesn't 
control its virtual address ranges, I'm not sure how that's useful.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-12 Thread Jeff Squyres (jsquyres)
On Jun 10, 2013, at 1:26 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 I agree that pushing all registration issues out of the application
 and (somewhere) into the verbs stack would be a nice solution.
 
 Well, it creates a mess in another sense, because now you've lost
 context. When your MPI goes to do a 1byte send the kernel may well
 prefetch a few megabytes of page tables, whereas an implementation in
 userspace still has the context and can say, no I don't need that..

It seems like there are Big Problems on either side of this problem (userspace 
and kernel).

I thought that ummunotify was a good balance between the two -- MPI kept its 
registration caches (which are annoying, but we have long-since understood that 
*someone* has to maintain them), but it gets a bulletproof way to keep them 
coherent.  That is what is missing in today's solutions: bulletproofness (plus 
we have to use the horrid glibc malloc hooks, which are deprecated and are 
going away).

 That being said, everyone I've talked to about ODP finds it very,
 very strange that the kernel would keep memory registrations around
 for memory that is no longer part of a process.  Not only does it
 
 MRs are badly named. They are not 'memory registrations'. They are
 'address registrations'. Don't conflat address === memory in your
 head, then it seems weird :)
 
 The memory the address space points to is flexible.
 
 The address space is tied to the lifetime of the process.
 
 It doesn't matter if there is no memory mapped to the address space,
 the address space is still there.
 
 Liran had a good example. You can register address space and then use
 mmap/munmap/MAP_FIXED to mess around with where it points to

...but this is not how people write applications.  Real apps use malloc (and 
some direct mmap, and perhaps even some shared memory).  They don't pay 
attention to the contiguiousness (is that a word?) of memory/addresses in the 
large scale.  To be clear: the most tightly bound codes *do* actually care 
about cache hits and locality, but that's in the small scale -- not in the 
large scale.  I would find it hard to believe that a real code would pay 
attention to where in its address range a given malloc() returns, for example.

*That's* what makes this whole concept weird.

It seems like this is a perfect kernel space concept, but is quite foreign to 
userspace developers.

 A practical example of using this would be to avoid the need to send
 scatter buffer pointers to the remote. The remote writes into a memory
 ring and the ring is made 'endless' by clever use of remapping.

I don't understand -- please explain your example a bit more...?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-10 Thread Jeff Squyres (jsquyres)
On Jun 7, 2013, at 4:57 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 We talked about this at the MPI Forum this week; it doesn't seem
 like ODP fixes any MPI problems.
 
 ODP without 'register all address space' changes the nature of the
 problem, and fixes only one problem.

I agree that pushing all registration issues out of the application and 
(somewhere) into the verbs stack would be a nice solution.

 You do need to cache registrations, and all the tuning parameters (how
 much do I cache, how long do I hold it for, etc, etc) all still apply.
 
 What goes away (is fixed) is the need for intercepts and the need to
 purge address space from the cache because the backing registration
 has become non-coherent/invalid. Registrations are always
 coherent/valid with ODP.

 This cache, and the associated optimization problem, can never go
 away. With a 'register all of memory' semantic the cache can move into
 the kernel, but the performance implication and overheads are all
 still present, just migrated.

Good summary; and you corrected some of my mistakes -- thanks.

That being said, everyone I've talked to about ODP finds it very, very strange 
that the kernel would keep memory registrations around for memory that is no 
longer part of a process.  Not only does it lead to the new memory is 
magically already registered semantic that I find weird, it's just plain *odd* 
for the kernel to maintain state for something that doesn't exist any more.  It 
feels dirty.

Sidenote: I was just informed today that the current way MPI implementations 
implement registration cache coherence (glibc malloc hooks) has been deprecated 
and will be removed from glibc 
(http://sourceware.org/ml/libc-alpha/2011-05/msg00103.html).  This really puts 
on the pressure to find a new / proper solution.

 What MPI wants is:
 
 1. verbs for ummunotify-like functionality
 2. non-blocking memory registration verbs; poll the cq to know when it has 
 completed
 
 To me, ODP with an additional 'register all address space' semantic, plus
 an asynchronous prefetch does both of these for you.
 
 1. ummunotify functionality and caching is now in the kernel, under
   ODP. RDMA access to an 'all of memory' registration always does the
   right thing.

Register all address space is the moral equivalent of not having userspace 
registration, so let's talk about it in those terms.  Specifically, there's a 
subtle difference between:

a) telling verbs to register (0...2^64) 
   -- Which is weird because it tells verbs to register memory that isn't in 
my address space
b) telling verbs that the app doesn't want to handle registration
   -- How that gets implemented is not important (from userspace's point of 
view) -- if the kernel chooses to implement that by registering non-existent 
memory, that's the kernel's problem

I guess I'm arguing that registering non-existent memory is not the Right Thing.

Regardless of what solution is devised for registered memory management 
(ummunotify, ODP, or something else), a non-blocking verb for registering 
memory would still be a Very Useful Thing.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-07 Thread Jeff Squyres (jsquyres)
On Jun 6, 2013, at 4:33 PM, Jeff Squyres (jsquyres) jsquy...@cisco.com wrote:

 I don't think this covers other memory regions, like those added via mmap, 
 right?


We talked about this at the MPI Forum this week; it doesn't seem like ODP fixes 
any MPI problems.

1. MPI still has to have a memory registration cache, because 
ibv_reg_mr(0...sbrk()) doesn't cover the stack or mmap'ed memory, etc.

2. MPI still has to intercept (at least) munmap().

3. Having mmap/malloc/etc. return new memory that may already be registered 
because of a prior memory registration and subsequent munmap/free/etc. is just 
plain weird.  Worse, if we re-register it, ref counts could go such that the 
actual registration will never actually expire until the process dies (which 
could lead to processes with abnormally large memory footprints, because they 
never actually let go of memory because it's still registered).

4. Even if MPI checks the value of sbrk() and re-registers (0...sbrk()) when 
sbrk() increases, this would seem to create a lot of work for the kernel -- 
which is both slow and synchronous.  Example:

a = malloc(5GB);
MPI_Send(a, 1, MPI_CHAR, ...); // MPI sends 1 byte

Then the MPI_Send of 1 byte will have to pay the cost of registering 5GB of new 
memory.

-

Unless we understand this wrong (and there's definitely a chance that we do!), 
it doesn't sound like ODP solves anything for MPI.  Especially since HPC 
applications almost never swap (in fact, swap is usually disabled in HPC 
environments).

What MPI wants is:

1. verbs for ummunotify-like functionality
2. non-blocking memory registration verbs; poll the cq to know when it has 
completed

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-06 Thread Jeff Squyres (jsquyres)
On Jun 5, 2013, at 10:52 PM, Haggai Eran hagg...@mellanox.com wrote:

 Haggai: A verb to resize a registration would probably be a helpful
 step. MPI could maintain one registration that covers the sbrk
 region and one registration that covers the heap, much easier than
 searching tables and things.
 
 That's a nice idea. Even without this verb, I think it is possible to
 develop a registration cache that covers those regions though. When you
 find out you have some part of your region not registered, you can
 register a new, larger region that covers everything you need. For new
 operations you only use the newer region. Once the previous, smaller
 region is not used, you de-register it.


I'm not sure what you mean.  Are you saying I should do something like this:

MPI_Init() {
// the first MPI function invoked
  mpi_sbrk_save = sbrk();
  ibv_reg_mr(..., 0, mpi_sbrk_save, ...);
  ...
}

MPI_Send(buffer, ...) {
  if (mpi_sbrk_save != sbrk())
  mpi_sbrk_save = sbrk();
  ibv_rereg_mr(..., 0, mpi_sbrk_save, ...);
  ...
}

I don't think this covers other memory regions, like those added via mmap, 
right?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-05 Thread Jeff Squyres (jsquyres)
On Jun 5, 2013, at 12:14 AM, Haggai Eran hagg...@mellanox.com wrote:

 Hmm; I'm confused.  How does this fix the 
 MPI-needs-to-intercept-freed-memory problem?
 Well, there is no problem if an application frees registered memory (in
 an on-demand paging memory region) and that memory is returned to the
 OS. The OS will invalidate these pages, and the HCA will no longer be
 able to use them. This means that the registration cache doesn't have to
 de-register memory immediately when it is freed.


(must... resist... urge... to... throw... furniture...)

This is why features should not be introduced to solve MPI problems without an 
understanding of what the MPI problems are.  :-)  Please go talk to the 
Mellanox MPI team.

Forgive me for being frustrated; memory registration and all the pain that it 
entails was highlighted as ***the #1 problem*** by *5 major MPI 
implementations* at the Sonoma 2009 workshop (see 
https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/301-mpi-update-and-requirements-panel-all-presentations.html,
 starting at slide 7 in the openmpi slide deck).  

Why don't we have something like ummunotify yet?
Why don't we have non-blocking memory registration yet?
...etc.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] libibverbs: A possible solution for allowing arbitrary MTU values.

2013-06-05 Thread Jeff Squyres
Set the IBV_MTU_* enums equal to their values (e.g., IBV_MTU_1024 =
1024), and then pass MTU values around as int's.  Legacy applications
will use the enum values, but newer applications can use any int for
values that do not currently exist in the enum set (e.g., 1500, 9000).

The obvious drawback is that this will break ABI; applications will
need to be recompiled.

(if this approach/patch is acceptable, I will submit a corresponding
patch for the kernel side)

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 examples/devinfo.c | 18 +-
 examples/pingpong.c| 12 
 examples/pingpong.h|  1 -
 examples/rc_pingpong.c |  8 
 examples/srq_pingpong.c|  8 
 examples/uc_pingpong.c |  8 
 include/infiniband/verbs.h | 16 
 man/ibv_modify_qp.3|  2 +-
 man/ibv_query_port.3   |  4 ++--
 man/ibv_query_qp.3 |  2 +-
 10 files changed, 33 insertions(+), 46 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index ff078e4..f46deca 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -111,16 +111,16 @@ static const char *atomic_cap_str(enum ibv_atomic_cap 
atom_cap)
}
 }
 
-static const char *mtu_str(enum ibv_mtu max_mtu)
+static const char *mtu_str(int max_mtu)
 {
-   switch (max_mtu) {
-   case IBV_MTU_256:  return 256;
-   case IBV_MTU_512:  return 512;
-   case IBV_MTU_1024: return 1024;
-   case IBV_MTU_2048: return 2048;
-   case IBV_MTU_4096: return 4096;
-   default:   return invalid MTU;
-   }
+   static char str[16];
+
+   if (max_mtu  0)
+   snprintf(str, sizeof(str), %d, max_mtu);
+   else
+   strncpy(str, invalid MTU, sizeof(str));
+
+   return str;
 }
 
 static const char *width_str(uint8_t width)
diff --git a/examples/pingpong.c b/examples/pingpong.c
index 90732ef..d1c22c9 100644
--- a/examples/pingpong.c
+++ b/examples/pingpong.c
@@ -36,18 +36,6 @@
 #include stdio.h
 #include string.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu)
-{
-   switch (mtu) {
-   case 256:  return IBV_MTU_256;
-   case 512:  return IBV_MTU_512;
-   case 1024: return IBV_MTU_1024;
-   case 2048: return IBV_MTU_2048;
-   case 4096: return IBV_MTU_4096;
-   default:   return -1;
-   }
-}
-
 uint16_t pp_get_local_lid(struct ibv_context *context, int port)
 {
struct ibv_port_attr attr;
diff --git a/examples/pingpong.h b/examples/pingpong.h
index 9cdc03e..91d217b 100644
--- a/examples/pingpong.h
+++ b/examples/pingpong.h
@@ -35,7 +35,6 @@
 
 #include infiniband/verbs.h
 
-enum ibv_mtu pp_mtu_to_enum(int mtu);
 uint16_t pp_get_local_lid(struct ibv_context *context, int port);
 int pp_get_port_info(struct ibv_context *context, int port,
 struct ibv_port_attr *attr);
diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
index 15494a1..8a5318b 100644
--- a/examples/rc_pingpong.c
+++ b/examples/rc_pingpong.c
@@ -78,7 +78,7 @@ struct pingpong_dest {
 };
 
 static int pp_connect_ctx(struct pingpong_context *ctx, int port, int my_psn,
- enum ibv_mtu mtu, int sl,
+ int mtu, int sl,
  struct pingpong_dest *dest, int sgid_idx)
 {
struct ibv_qp_attr attr = {
@@ -209,7 +209,7 @@ out:
 }
 
 static struct pingpong_dest *pp_server_exch_dest(struct pingpong_context *ctx,
-int ib_port, enum ibv_mtu mtu,
+int ib_port, int mtu,
 int port, int sl,
 const struct pingpong_dest 
*my_dest,
 int sgid_idx)
@@ -547,7 +547,7 @@ int main(int argc, char *argv[])
int  port = 18515;
int  ib_port = 1;
int  size = 4096;
-   enum ibv_mtu mtu = IBV_MTU_1024;
+   int  mtu = 1024;
int  rx_depth = 500;
int  iters = 1000;
int  use_event = 0;
@@ -608,7 +608,7 @@ int main(int argc, char *argv[])
break;
 
case 'm':
-   mtu = pp_mtu_to_enum(strtol(optarg, NULL, 0));
+   mtu = strtol(optarg, NULL, 0);
if (mtu  0) {
usage(argv[0]);
return 1;
diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
index 6e00f8c..f1eb879 100644
--- a/examples/srq_pingpong.c
+++ b/examples/srq_pingpong.c
@@ -81,7 +81,7 @@ struct pingpong_dest {
union ibv_gid gid;
 };
 
-static int pp_connect_ctx(struct pingpong_context *ctx, int port, enum ibv_mtu 
mtu,
+static int pp_connect_ctx(struct pingpong_context *ctx, int

Re: Status of ummunot branch?

2013-06-05 Thread Jeff Squyres (jsquyres)
On Jun 5, 2013, at 6:39 AM, Haggai Eran hagg...@mellanox.com wrote:

 Perhaps I'm missing something, but I believe ODP deals with the first
 two problems in the list (slide 8), even if it doesn't solve them
 completely.

Unfortunately, it does not.  If we could register(0 ... 2^64) and never have to 
worry about registered memory, that might be cool (depending on how that 
actually works) -- more below.

See this blog post that describes the freed registered memory issue:


http://blogs.cisco.com/performance/registered-memory-rma-rdma-and-mpi-implementations/

and consider the following valid user code:

a = malloc(x);// a gets (va=0x100, pa=0x12345) back from malloc
MPI_Send(a, ...); // MPI registers 0x100 for len=x, and saves (0x100,x) in reg 
cache
free(a);
a = malloc(x);// a gets (va=0x100, pa=0x98765) back from malloc
MPI_Send(a, ...); // MPI sees a=0x100 and things that it is already registered
// ...kaboom

In short, MPI has to intercept free/sbrk/whatever so that it can update its 
registration cache.

 In the future we want to implement an implicit memory region covering
 the entire process address space, thus eliminating the need for memory
 registration almost completely (you might still want memory
 registration, or memory windows, in order to control permissions of
 remote operations).

This would be great, as long as it's fast, transparent, and has no subtle 
implementation effects (like causing additional RNR NAKs for pages that are 
still in memory, which, according to your descriptions, it sounds like it 
won't).

 We can also allow fork to work with our implementation. Copy-on-write
 will work with ODP regions by invalidating the HCA's page tables before
 modifying the pages to be read-only. A page fault from the HCA can then
 refill the pages, or even break COW in case of a write.

That would be cool, too.  fork() has been a continuing problem -- solving that 
problem would be wonderful.

If this ODP stuff becomes a new verb, it would be good:

- if these fork-fixing / register-infinite capabilities can be queried at run 
time (maybe on ibv_device_cap_flags?) so that ULPs can know to use this 
functionality
- if driver owners can get a heads up so that they can know to implement it

 Why don't we have something like ummunotify yet?
 I think that the problem we are trying to solve is better handled inside
 the kernel. If you are going to change the HCA's memory mappings, you'd
 have to go through the kernel anyway.

If/when you allow registering all memory, then I think you're right -- the 
MPI-must-intercept-free/sbrk-whatever issue may go away (that's why I started 
this thread asking about register(0 .. 2^64)).  But without that, unless I'm 
missing something, I don't think it solves the MPI-must-catch-free-sbrk-etc. 
issues...?  And therefore, having some kind of ummunotify-like functionality as 
a verb would be a Very Good Thing.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: A possible solution for allowing arbitrary MTU values.

2013-06-05 Thread Jeff Squyres (jsquyres)
On Jun 5, 2013, at 9:46 AM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 No, this too big of an ABI break, and silent at that..
 
 The IBA values have to continue to be accepted and exported in all
 cases so the ABI stays the same, which is what I thought was agreed
 on??


Can this go to a libibverbs 2.0, where it would be palatable to have an ABI 
break?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: A possible solution for allowing arbitrary MTU values.

2013-06-05 Thread Jeff Squyres (jsquyres)
On Jun 5, 2013, at 10:19 AM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 The concept of a libibverbs 2.0 has been NAK's by pretty much everyone
 involved. This is why we are suffering with the complex extension
 mechanism.

Are you saying that libibverbs must always always always be backwards 
compatible, and there will never be an ABI break at any version in the future?

 The mixed approach that was brought up, where values like 1500 were
 passed as 1500, and values like 1024 were passed as 3 seemed doable to
 me. Did you see a problem with it for your use?

It just seems overly complex in terms of implementation.

 Thoughts:
 - 1024 and 3 both mean 1024, the library must accept both values,
   it should only ever return 3 though.

Why?  If the caller can pass in 1024, it seems like 1024 should be able to be 
passed out, too.

 - 1500/etc means 1500, the libray can return that.
 - Make a ibv_from/to_mtu inline function to translate from bytes to
   the encoded MTU value.
 - Switch ibv_mtu from a enum to a typedef int ibv_mtu

That also breaks ABI, doesn't it?

 Jason


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-05 Thread Jeff Squyres (jsquyres)
On Jun 5, 2013, at 10:14 AM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 a = malloc(x);// a gets (va=0x100, pa=0x12345) back from malloc
 MPI_Send(a, ...); // MPI registers 0x100 for len=x, and saves (0x100,x) in 
 reg cache
 free(a);
 a = malloc(x);// a gets (va=0x100, pa=0x98765) back from malloc
 MPI_Send(a, ...); // MPI sees a=0x100 and things that it is already 
 registered
 // ...kaboom
 
 ODP is supposed to completely solve this problem. The HCA's view and
 Kernels view of virtual to physical mapping becomes 100% synchronized,
 and there is no 'kaboom'. The kernel updates the HCA after the free,
 and after the 2nd malloc to 100% match the current virtual memory map
 in the process.

Are you saying that the 2nd malloc will magically be registered (with the new 
physical address)?

 AFAIK the ummunotify user space API was nak'd by the core kernel
 guys.

It was NAK'ed by Linus, saying fix your own network stack; this is not needed 
in the general purpose part of the kernel (remember that Roland initially 
developed this as a standalone, non-IB-related kernel module).  

 I got the impression people thought it would be acceptable as a
 rdma API, not a general API. So it is waiting on someone to recast the
 function within verbs to make progress...

'zactly.  Roland has this ummunot branch in his git tree, where he is in the 
middle of incorporating this functionality from the original ummunotify 
standalone kernel module into libibverbs and ibcore.

I started this thread asking the status of that branch.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libibverbs: A possible solution for allowing arbitrary MTU values.

2013-06-05 Thread Jeff Squyres (jsquyres)
On Jun 5, 2013, at 11:11 AM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 I won't say never, but this is what people want. Bumping the soname is
 seen as too difficult now.

Gotcha.  

Ok, so my patch is a non-starter.

 Thoughts:
 - 1024 and 3 both mean 1024, the library must accept both values,
  it should only ever return 3 though.
 
 Why?  If the caller can pass in 1024, it seems like 1024 should be
 able to be passed out, too.
 
 If the caller passes in 1024 then it is probably OK to return 1024,
 but you have to keep track of that specially. That seems more complex
 than just always returning 3. 3 is guarenteed compatible with all
 users.
 
 Old users will test directly against 3.
 New users will call ibv_from_mtu which tests against 3 as well.


Ok.

I'll take a to-do to work up a new patch -- probably not until next week.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-05 Thread Jeff Squyres (jsquyres)
On Jun 5, 2013, at 11:18 AM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 Are you saying that the 2nd malloc will magically be registered
 (with the new physical address)?
 
 Yes, that is the whole point.

Interesting.

 ODP fundamentally fixes the *bug* where the HCA's view of process
 memory can become inconsistent with the kernel's view.

Hum.  I was under the impression that with today's code (i.e., not ODP), if you

a = malloc(N);
ibv_reg_mr(..., a, N, ...);
free(a);

(assuming that the memory actually left the process at free)

Then the relevant kernel verbs driver was notified, and would unregister that 
device.  ...but I'm an MPI guy, not a kernel guy -- it seems like you're saying 
that my impression was wrong (which doesn't currently matter because we 
intercept free/sbrk and unregister such memory, anyway).

 'magically be registered' is the wrong way to think about it - the
 registration of VA=0x100 is simply kept, and any change to the
 underlying physical mapping of the VA is synchronized with the HCA.

What happens if you:

a = malloc(N * page_size);
ibv_reg_mr(..., a, N * page_size, ...);
free(a);
// incoming RDMA arrives targeted at buffer a

Or if you:

a = malloc(N * page_size);
ibv_reg_mr(..., a, N * page_size, ...);
free(a);
a = malloc(N / 2 * page_size);
// incoming RDMA arrives targeted at buffer a that is of length (N*page_size)

It does seem quite odd, abstractly speaking, that a registration would survive 
a free/re-malloc (which is arguably a different buffer).

That being said, it still seems like MPI needs a registration cache.  It is 
several good steps forward if we don't need to intercept free/sbrk/whatever, 
but when MPI_Send(buf, ...) is invoked, we still have to check that the entire 
buf is registered.  If ibv_reg_mr(..., 0, 2^64, ...) was supported, that would 
obviate the entire need for registration caches.  That would be wonderful.

 Right, this was discussed at the Enterprise Summit a few weeks
 ago. I'm sure Roland would welcome patches...


That's why I asked at the beginning of this thread.  He didn't provide any 
details about what still needs to be done, though.  :-)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-05 Thread Jeff Squyres (jsquyres)
On Jun 5, 2013, at 12:05 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
wrote:

 It does seem quite odd, abstractly speaking, that a registration
 would survive a free/re-malloc (which is arguably a different
 buffer).
 
 Not at all: the purpose of the registration is to allow access via
 RDMA to a portion of the process's address space. The address space
 doesn't change, but what it is mapped to can vary.

I still think it's really weird.  When I do this:

a = malloc(N);
ibv_reg_mr(..., a, N, ...);
free(a);
b = malloc(M);

If b just happens to be partially or wholly registered by some quirk of the 
malloc() system (i.e., some/all of the virtual address space in b happens to 
have been covered by a prior malloc/ibv_reg_mr)... that's just weird.

 If ibv_reg_mr(...,
 0, 2^64, ...) was supported, that would obviate the entire need for
 registration caches.  That would be wonderful.
 
 Yes, except that this shifts around where the registration overhead
 ends up. Basically the HCA driver now has the registration cache you
 had in MPI, and all the same overheads still exist.

There's fewer verbs drivers than applications, right?

 Haggai: A verb to resize a registration would probably be a helpful
 step. MPI could maintain one registration that covers the sbrk
 region and one registration that covers the heap, much easier than
 searching tables and things.

If we still have to register buffers piecemeal, a non-blocking registration 
verb would be quite helpful.

 Also bear in mind that all RDMA access protections will be disabled if
 you register the entire process VM, the remote(s) can scribble/read
 everything..


No problem for MPI/HPC...  :-)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-04 Thread Jeff Squyres (jsquyres)
On Jun 4, 2013, at 2:54 AM, Haggai Eran hagg...@mellanox.com wrote:

 We wish to get there eventually. In our current implementation you still
 have to register an on-demand memory region explicitly. The difference
 between a regular memory region is that the pages in the region aren't
 pinned.

Does this mean that an MPI implementation still has to register memory upon 
usage, and maintain its own registered memory cache?

 We chose to support only 2 concurrent page faults per QP since this
 allows us to maintain order between the QP's operations and the
 user-space code using it.


I talked to someone who was at the OpenFabrics workshop and saw the ODP 
presentation in person; he tells me that a fault will be incurred when a page 
is not in the HCA's TLB cache (vs. when a registered page is not in memory and 
must be swapped back in), and that this will trigger an RNR NAK.

Is this correct?

He was very concerned about what the size of the TLB on the HCA, and therefore 
what the actual run-time behavior would be for sending around large messages 
via MPI -- i.e., would RDMA'ing 1GB messages now incur this 
HCA-must-reload-its-TLB-and-therefore-incur-RNR-NAKs behavior?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-04 Thread Jeff Squyres (jsquyres)
On Jun 4, 2013, at 4:50 AM, Haggai Eran hagg...@mellanox.com wrote:

 Does this mean that an MPI implementation still has to register memory upon 
 usage, and maintain its own registered memory cache?
 Yes. However, since registration doesn't pin memory, you can leave
 registered memory regions in the cache for longer periods, and you can
 register larger memory regions without needing to back them with
 physical memory.

Hmm; I'm confused.  How does this fix the MPI-needs-to-intercept-freed-memory 
problem?

 
 We chose to support only 2 concurrent page faults per QP since this
 allows us to maintain order between the QP's operations and the
 user-space code using it.
 
 
 I talked to someone who was at the OpenFabrics workshop and saw the ODP 
 presentation in person; he tells me that a fault will be incurred when a 
 page is not in the HCA's TLB cache (vs. when a registered page is not in 
 memory and must be swapped back in), and that this will trigger an RNR NAK.
 
 Is this correct?
 
 Our HCAs use their own page tables, in addition to a TLB cache. A miss
 in the TLB cache that can be filled from the HCA's page tables will not
 cause an RNR NAK, since the HCA can fill it relatively fast without the
 help of the operating system. If the page is missing from the HCA's page
 table though it will trigger a page fault and ask the OS to bring that
 page. Since this might take longer, in these cases we send an RNR NAK.

Ok.

But the primary use case I care about is fixing the 
MPI-needs-to-intercept-freed-memory problem, and it doesn't sounds like ODP 
fixes this.

 He was very concerned about what the size of the TLB on the HCA, and 
 therefore what the actual run-time behavior would be for sending around 
 large messages via MPI -- i.e., would RDMA'ing 1GB messages now incur this 
 HCA-must-reload-its-TLB-and-therefore-incur-RNR-NAKs behavior?
 
 We have a mechanism to prefetch the pages needed for a large message
 upon the first page fault, which can also help amortizing the cost of
 the page fault for larger messages.


Ok, thanks.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-06-03 Thread Jeff Squyres (jsquyres)
On May 29, 2013, at 1:53 AM, Or Gerlitz or.gerl...@gmail.com wrote:

 Have you looked on ODP? see
 https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/568-on-demand-paging-for-user-space-networking.html


Is the idea behind ODP that, at the beginning of time, you register the entire 
memory space (i.e., NULL to 2^64) and then never worry about registered memory?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-05-30 Thread Jeff Squyres (jsquyres)
On May 30, 2013, at 1:09 AM, Or Gerlitz ogerl...@mellanox.com wrote:

 Has this been run by the MPI implementor community?
 
 The team that works on this here isn't ready for submission, so community 
 runs were not made yet

If this is a solution to an MPI problem, it would seem like a good idea to run 
the specifics of this proposal to the MPI *implementor* community first (not 
*users*).

I say this because Mellanox also proposed the concept of a shared send queue 
as a solution to MPI RC scalability problems a while ago (around about the time 
XRC first debuted, IIRC?), and the MPI community universally hated it.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-05-29 Thread Jeff Squyres (jsquyres)
On May 29, 2013, at 4:53 AM, Or Gerlitz or.gerl...@gmail.com wrote:

 Have you looked on ODP? see
 https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/568-on-demand-paging-for-user-space-networking.html


Is this upstream?

Has this been run by the MPI implementor community?

The limitation of a max of 2 concurrent page faults seems fairly significant.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of ummunot branch?

2013-05-28 Thread Jeff Squyres (jsquyres)
On May 28, 2013, at 1:52 PM, Roland Dreier rol...@purestorage.com wrote:

 Haven't touched it in quite a while except to keep it building.  Needs
 work to finish up.

What kinds of things still need to be done?  (I don't know if we could work on 
this or not; just asking to scope out what would need to be done at this point)

Has anything been done on the userspace side?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] libibverbs: Use autoreconf in autogen.sh

2013-05-02 Thread Jeff Squyres (jsquyres)
On May 1, 2013, at 11:30 AM, Doug Ledford dledf...@redhat.com wrote:

 This is fine with me, however, I think you also need to bump the
 autotools version to the latest upstream.  The automated checkers in our
 build environment is spitting out errors about a number of upstream
 packages where the autotools used to configure the package does not
 include proper arm support.  The latest autotools bring in all of the
 forthcoming arm variants.  So I would like to see both of these things done.

Are you referring to the version of Autotools that Roland uses to create his 
tarballs?

Because I have no control over that.  :-)


 On Apr 25, 2013, at 11:38 AM, Jeff Squyres (jsquyres) jsquy...@cisco.com 
 wrote:
 
 Bump.
 
 On Apr 22, 2013, at 1:41 PM, Jeff Squyres jsquy...@cisco.com wrote:
 
 The old sequence of Autotools commands listed in autogen.sh is no
 longer correct.  Instead, just use the single autoreconf command,
 which will invoke all the Right Autotools commands in the correct
 order.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 autogen.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)
 
 diff --git a/autogen.sh b/autogen.sh
 index fd47839..6c9233e 100755
 --- a/autogen.sh
 +++ b/autogen.sh
 @@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
 -aclocal -I config
 -libtoolize --force --copy
 -autoheader
 -automake --foreign --add-missing --copy
 -autoconf
 +autoreconf -ifv -I config
 -- 
 1.8.1.1
 
 
 
 -- 
 Jeff Squyres
 jsquy...@cisco.com
 For corporate legal information go to: 
 http://www.cisco.com/web/about/doing_business/legal/cri/
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.

2013-05-02 Thread Jeff Squyres (jsquyres)
On Apr 22, 2013, at 4:00 PM, Doug Ledford dledf...@redhat.com wrote:

 2. Change all instances of ib_mtu/ibv_mtu to an int.  Code such as 
 switch(mtu) case IBV_MTU_1024: ... will need to be updated to 
 switch(mtu) case 1024: 
 
 I was actually thinking that an ibverbs API version 2.0 might be an
 interesting way to go.  The proliferation of non-IB link layers
 providing the verbs API make some of the original assumptions of IB link
 layer in the original API obsolete.  But, if we were to do that, I'd
 take some time to really think the issue over and try to catch all of
 the needed updates in one go.


In addition to the MTU, another obvious issue is the active_speed attribute on 
the ibv_port_attr.  On the kernel side, it's an enum (IB_SPEED_SDR through 
IB_SPEED_EDR), but there's no corresponding enum names in libibverbs.  

It would be good to make this value a non-enum-int, too.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] libibverbs: Use autoreconf in autogen.sh

2013-04-30 Thread Jeff Squyres (jsquyres)
Bump bump.  :-)

On Apr 25, 2013, at 11:38 AM, Jeff Squyres (jsquyres) jsquy...@cisco.com 
wrote:

 Bump.
 
 On Apr 22, 2013, at 1:41 PM, Jeff Squyres jsquy...@cisco.com wrote:
 
 The old sequence of Autotools commands listed in autogen.sh is no
 longer correct.  Instead, just use the single autoreconf command,
 which will invoke all the Right Autotools commands in the correct
 order.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 autogen.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)
 
 diff --git a/autogen.sh b/autogen.sh
 index fd47839..6c9233e 100755
 --- a/autogen.sh
 +++ b/autogen.sh
 @@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
 -aclocal -I config
 -libtoolize --force --copy
 -autoheader
 -automake --foreign --add-missing --copy
 -autoconf
 +autoreconf -ifv -I config
 -- 
 1.8.1.1
 
 
 
 -- 
 Jeff Squyres
 jsquy...@cisco.com
 For corporate legal information go to: 
 http://www.cisco.com/web/about/doing_business/legal/cri/
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] libibverbs: Use autoreconf in autogen.sh

2013-04-25 Thread Jeff Squyres (jsquyres)
Bump.

On Apr 22, 2013, at 1:41 PM, Jeff Squyres jsquy...@cisco.com wrote:

 The old sequence of Autotools commands listed in autogen.sh is no
 longer correct.  Instead, just use the single autoreconf command,
 which will invoke all the Right Autotools commands in the correct
 order.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 autogen.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)
 
 diff --git a/autogen.sh b/autogen.sh
 index fd47839..6c9233e 100755
 --- a/autogen.sh
 +++ b/autogen.sh
 @@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
 -aclocal -I config
 -libtoolize --force --copy
 -autoheader
 -automake --foreign --add-missing --copy
 -autoconf
 +autoreconf -ifv -I config
 -- 
 1.8.1.1
 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Use autoreconf in autogen.sh

2013-04-22 Thread Jeff Squyres (jsquyres)
On Apr 19, 2013, at 8:19 PM, Hefty, Sean sean.he...@intel.com wrote:

 It may help if you identify the library this patch is against.  :)

3rd time sending will be the charm...  :-)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] libiberbs: .gitignore updates and rename configure.in-.ac

2013-04-22 Thread Jeff Squyres
Added some entries to config/.gitignore for newer versions of the GNU
Autotools.  Also renamed configure.in - configure.ac to accomodate
newer GNU Autotools

(http://lists.gnu.org/archive/html/autotools-announce/2012-11/msg0.html
announced the intent to drop support for configure.in in future
versions of Autoconf).

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 .gitignore   |  6 +
 configure.ac | 74 
 configure.in | 74 
 3 files changed, 80 insertions(+), 74 deletions(-)
 create mode 100644 configure.ac
 delete mode 100644 configure.in

diff --git a/.gitignore b/.gitignore
index 78effef..d198dd1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,6 +6,7 @@ autom4te.cache
 aclocal.m4
 stamp-h.in
 config.h.in
+config.h.in~
 config.log
 config.h
 .libs
@@ -15,3 +16,8 @@ Makefile
 config.status
 stamp-h1
 libtool
+config/libtool.m4
+config/ltoptions.m4
+config/ltsugar.m4
+config/ltversion.m4
+config/lt~obsolete.m4
diff --git a/configure.ac b/configure.ac
new file mode 100644
index 000..efdc5ac
--- /dev/null
+++ b/configure.ac
@@ -0,0 +1,74 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(libibverbs, 1.1.6, linux-rdma@vger.kernel.org)
+AC_CONFIG_SRCDIR([src/ibverbs.h])
+AC_CONFIG_AUX_DIR(config)
+AC_CONFIG_MACRO_DIR(config)
+AC_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE([foreign])
+m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
+
+dnl Checks for programs
+AC_PROG_CC
+AC_GNU_SOURCE
+AC_PROG_LN_S
+AC_PROG_LIBTOOL
+
+LT_INIT
+
+AC_ARG_WITH([valgrind],
+AC_HELP_STRING([--with-valgrind],
+[Enable Valgrind annotations (small runtime overhead, default NO)]))
+if test x$with_valgrind = x || test x$with_valgrind = xno; then
+want_valgrind=no
+AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.])
+else
+want_valgrind=yes
+if test -d $with_valgrind; then
+CPPFLAGS=$CPPFLAGS -I$with_valgrind/include
+fi
+fi
+
+dnl Checks for libraries
+AC_CHECK_LIB(dl, dlsym, [],
+AC_MSG_ERROR([dlsym() not found.  libibverbs requires libdl.]))
+AC_CHECK_LIB(pthread, pthread_mutex_init, [],
+AC_MSG_ERROR([pthread_mutex_init() not found.  libibverbs requires 
libpthread.]))
+
+dnl Checks for header files.
+AC_HEADER_STDC
+AC_CHECK_HEADER(valgrind/memcheck.h,
+[AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1,
+[Define to 1 if you have the valgrind/memcheck.h header file.])],
+[if test $want_valgrind = yes; then
+AC_MSG_ERROR([Valgrind memcheck support requested, but 
valgrind/memcheck.h not found.])
+fi])
+
+dnl Checks for typedefs, structures, and compiler characteristics.
+AC_C_CONST
+
+AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
+[if test -n `$LD --help  /dev/null 2/dev/null | grep version-script`; 
then
+   ac_cv_version_script=yes
+else
+   ac_cv_version_script=no
+fi])
+
+if test $ac_cv_version_script = yes; then
+
LIBIBVERBS_VERSION_SCRIPT='-Wl,--version-script=$(srcdir)/src/libibverbs.map'
+else
+LIBIBVERBS_VERSION_SCRIPT=
+fi
+AC_SUBST(LIBIBVERBS_VERSION_SCRIPT)
+
+AC_CACHE_CHECK(for .symver assembler support, ac_cv_asm_symver_support,
+[AC_TRY_COMPILE(, [asm(symbol:\n.symver symbol, api@ABI\n);],
+ac_cv_asm_symver_support=yes,
+ac_cv_asm_symver_support=no)])
+if test $ac_cv_asm_symver_support = yes; then
+AC_DEFINE([HAVE_SYMVER_SUPPORT], 1, [assembler has .symver support])
+fi
+
+AC_CONFIG_FILES([Makefile libibverbs.spec])
+AC_OUTPUT
diff --git a/configure.in b/configure.in
deleted file mode 100644
index efdc5ac..000
--- a/configure.in
+++ /dev/null
@@ -1,74 +0,0 @@
-dnl Process this file with autoconf to produce a configure script.
-
-AC_PREREQ(2.57)
-AC_INIT(libibverbs, 1.1.6, linux-rdma@vger.kernel.org)
-AC_CONFIG_SRCDIR([src/ibverbs.h])
-AC_CONFIG_AUX_DIR(config)
-AC_CONFIG_MACRO_DIR(config)
-AC_CONFIG_HEADER(config.h)
-AM_INIT_AUTOMAKE([foreign])
-m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
-
-dnl Checks for programs
-AC_PROG_CC
-AC_GNU_SOURCE
-AC_PROG_LN_S
-AC_PROG_LIBTOOL
-
-LT_INIT
-
-AC_ARG_WITH([valgrind],
-AC_HELP_STRING([--with-valgrind],
-[Enable Valgrind annotations (small runtime overhead, default NO)]))
-if test x$with_valgrind = x || test x$with_valgrind = xno; then
-want_valgrind=no
-AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.])
-else
-want_valgrind=yes
-if test -d $with_valgrind; then
-CPPFLAGS=$CPPFLAGS -I$with_valgrind/include
-fi
-fi
-
-dnl Checks for libraries
-AC_CHECK_LIB(dl, dlsym, [],
-AC_MSG_ERROR([dlsym() not found.  libibverbs requires libdl.]))
-AC_CHECK_LIB(pthread, pthread_mutex_init, [],
-AC_MSG_ERROR([pthread_mutex_init() not found.  libibverbs requires 
libpthread.]))
-
-dnl Checks for header files.
-AC_HEADER_STDC
-AC_CHECK_HEADER(valgrind/memcheck.h

[PATCH 1/2] libibverbs: Use autoreconf in autogen.sh

2013-04-22 Thread Jeff Squyres
The old sequence of Autotools commands listed in autogen.sh is no
longer correct.  Instead, just use the single autoreconf command,
which will invoke all the Right Autotools commands in the correct
order.

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 autogen.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/autogen.sh b/autogen.sh
index fd47839..6c9233e 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
-aclocal -I config
-libtoolize --force --copy
-autoheader
-automake --foreign --add-missing --copy
-autoconf
+autoreconf -ifv -I config
-- 
1.8.1.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.

2013-04-22 Thread Jeff Squyres (jsquyres)
On Apr 22, 2013, at 1:30 PM, Doug Ledford dledf...@redhat.com wrote:

 However, for some reason I had it
 in my mind when I was reading the patch that it was against libibverbs.
 That's what I get for staying up late and reviewing when I'm tired :-/


There were other patches against libibverbs that were submitted at the same 
time.

That being said, I see two obvious ways to go forward, both of which have 
pros/cons:

1. Extend the enum ib_mtu to include new enum values for 1500 and 9000 -- 
probably with a different prefix to indicate that they're not IBTA-sanctioned 
values (note that this will also require corresponding changes in libibverbs, 
since MTU values get passed up from kernel to userspace).

PRO: fixes the immediate problem
PRO: probably the lowest impact solution; just adding some more enum values
CON: weird naming (IB_ and RDMA_ prefixes in the same ib_mtu enum; probably 
something similar in userspace)
CON: doesn't do anything to address other MTU values (e.g., what if someone has 
an MTU of 1498?)

2. Change all instances of ib_mtu/ibv_mtu to an int.  Code such as switch(mtu) 
case IBV_MTU_1024: ... will need to be updated to switch(mtu) case 1024: 

PRO: solves the problem for all MTU values
PRO: eliminates the enum-to-int translation functions
CON: much driver code will need to be updated per above, and also update logic 
checking for out-of-bounds MTU calues
CON: similarly, userspace apps will need to be updated; it might be worthwhile 
to bump libibverbs to 2.x, and then intentionally change the MTU field names in 
ibv_port_attr and ibv_qp_attr so that apps using those fields will fail to 
compile with libibverbs 2.x (and therefore forcibly realize they need to adapt 
to the new int MTU values)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.

2013-04-19 Thread Jeff Squyres (jsquyres)
On Apr 12, 2013, at 11:40 AM, Jeff Squyres (jsquyres) jsquy...@cisco.com 
wrote:

 As an aside I like the use of RDMA_MTU_* for these values.  Again to 
 distinguish them from the IBTA values.  But I know that is poor form.
 
 So what's the right way to move forward on this?  Is it this:
 
 enum ib_mtu {
   IB_MTU_256  = 1,
   IB_MTU_512  = 2,
   IB_MTU_1024 = 3,
   IB_MTU_2048 = 4,
   IB_MTU_4096 = 5,
   RDMA_MTU_1500 = 1500,
   RDMA_MTU_9000 = 9000
 };


Bump.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Use autoreconf in autogen.sh

2013-04-19 Thread Jeff Squyres (jsquyres)
Bump.

Any thoughts on these two patches?  They're pretty trivial, enable use with 
modern versions of Autotools, and now feature the proper Signed-off-by line.


On Apr 13, 2013, at 8:15 AM, Jeff Squyres jsquy...@cisco.com wrote:

 The old sequence of Autotools commands listed in autogen.sh is no
 longer correct.  Instead, just use the single autoreconf command,
 which will invoke all the Right Autotools commands in the correct
 order.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 autogen.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)
 
 diff --git a/autogen.sh b/autogen.sh
 index fd47839..6c9233e 100755
 --- a/autogen.sh
 +++ b/autogen.sh
 @@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
 -aclocal -I config
 -libtoolize --force --copy
 -autoheader
 -automake --foreign --add-missing --copy
 -autoconf
 +autoreconf -ifv -I config
 -- 
 1.8.1.1
 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Use autoreconf in autogen.sh

2013-04-13 Thread Jeff Squyres
The old sequence of Autotools commands listed in autogen.sh is no
longer correct.  Instead, just use the single autoreconf command,
which will invoke all the Right Autotools commands in the correct
order.

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 autogen.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/autogen.sh b/autogen.sh
index fd47839..6c9233e 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
-aclocal -I config
-libtoolize --force --copy
-autoheader
-automake --foreign --add-missing --copy
-autoconf
+autoreconf -ifv -I config
-- 
1.8.1.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] .gitignore updates and rename configure.in-.ac

2013-04-13 Thread Jeff Squyres
Added some entries to config/.gitignore for newer versions of the GNU
Autotools.  Also renamed configure.in - configure.ac to accomodate
newer GNU Autotools

(http://lists.gnu.org/archive/html/autotools-announce/2012-11/msg0.html
announced the intent to drop support for configure.in in future
versions of Autoconf).

Signed-off-by: Jeff Squyres jsquy...@cisco.com
---
 .gitignore   |  6 +
 configure.ac | 74 
 configure.in | 74 
 3 files changed, 80 insertions(+), 74 deletions(-)
 create mode 100644 configure.ac
 delete mode 100644 configure.in

diff --git a/.gitignore b/.gitignore
index 78effef..d198dd1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,6 +6,7 @@ autom4te.cache
 aclocal.m4
 stamp-h.in
 config.h.in
+config.h.in~
 config.log
 config.h
 .libs
@@ -15,3 +16,8 @@ Makefile
 config.status
 stamp-h1
 libtool
+config/libtool.m4
+config/ltoptions.m4
+config/ltsugar.m4
+config/ltversion.m4
+config/lt~obsolete.m4
diff --git a/configure.ac b/configure.ac
new file mode 100644
index 000..efdc5ac
--- /dev/null
+++ b/configure.ac
@@ -0,0 +1,74 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(libibverbs, 1.1.6, linux-rdma@vger.kernel.org)
+AC_CONFIG_SRCDIR([src/ibverbs.h])
+AC_CONFIG_AUX_DIR(config)
+AC_CONFIG_MACRO_DIR(config)
+AC_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE([foreign])
+m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
+
+dnl Checks for programs
+AC_PROG_CC
+AC_GNU_SOURCE
+AC_PROG_LN_S
+AC_PROG_LIBTOOL
+
+LT_INIT
+
+AC_ARG_WITH([valgrind],
+AC_HELP_STRING([--with-valgrind],
+[Enable Valgrind annotations (small runtime overhead, default NO)]))
+if test x$with_valgrind = x || test x$with_valgrind = xno; then
+want_valgrind=no
+AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.])
+else
+want_valgrind=yes
+if test -d $with_valgrind; then
+CPPFLAGS=$CPPFLAGS -I$with_valgrind/include
+fi
+fi
+
+dnl Checks for libraries
+AC_CHECK_LIB(dl, dlsym, [],
+AC_MSG_ERROR([dlsym() not found.  libibverbs requires libdl.]))
+AC_CHECK_LIB(pthread, pthread_mutex_init, [],
+AC_MSG_ERROR([pthread_mutex_init() not found.  libibverbs requires 
libpthread.]))
+
+dnl Checks for header files.
+AC_HEADER_STDC
+AC_CHECK_HEADER(valgrind/memcheck.h,
+[AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1,
+[Define to 1 if you have the valgrind/memcheck.h header file.])],
+[if test $want_valgrind = yes; then
+AC_MSG_ERROR([Valgrind memcheck support requested, but 
valgrind/memcheck.h not found.])
+fi])
+
+dnl Checks for typedefs, structures, and compiler characteristics.
+AC_C_CONST
+
+AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
+[if test -n `$LD --help  /dev/null 2/dev/null | grep version-script`; 
then
+   ac_cv_version_script=yes
+else
+   ac_cv_version_script=no
+fi])
+
+if test $ac_cv_version_script = yes; then
+
LIBIBVERBS_VERSION_SCRIPT='-Wl,--version-script=$(srcdir)/src/libibverbs.map'
+else
+LIBIBVERBS_VERSION_SCRIPT=
+fi
+AC_SUBST(LIBIBVERBS_VERSION_SCRIPT)
+
+AC_CACHE_CHECK(for .symver assembler support, ac_cv_asm_symver_support,
+[AC_TRY_COMPILE(, [asm(symbol:\n.symver symbol, api@ABI\n);],
+ac_cv_asm_symver_support=yes,
+ac_cv_asm_symver_support=no)])
+if test $ac_cv_asm_symver_support = yes; then
+AC_DEFINE([HAVE_SYMVER_SUPPORT], 1, [assembler has .symver support])
+fi
+
+AC_CONFIG_FILES([Makefile libibverbs.spec])
+AC_OUTPUT
diff --git a/configure.in b/configure.in
deleted file mode 100644
index efdc5ac..000
--- a/configure.in
+++ /dev/null
@@ -1,74 +0,0 @@
-dnl Process this file with autoconf to produce a configure script.
-
-AC_PREREQ(2.57)
-AC_INIT(libibverbs, 1.1.6, linux-rdma@vger.kernel.org)
-AC_CONFIG_SRCDIR([src/ibverbs.h])
-AC_CONFIG_AUX_DIR(config)
-AC_CONFIG_MACRO_DIR(config)
-AC_CONFIG_HEADER(config.h)
-AM_INIT_AUTOMAKE([foreign])
-m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
-
-dnl Checks for programs
-AC_PROG_CC
-AC_GNU_SOURCE
-AC_PROG_LN_S
-AC_PROG_LIBTOOL
-
-LT_INIT
-
-AC_ARG_WITH([valgrind],
-AC_HELP_STRING([--with-valgrind],
-[Enable Valgrind annotations (small runtime overhead, default NO)]))
-if test x$with_valgrind = x || test x$with_valgrind = xno; then
-want_valgrind=no
-AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.])
-else
-want_valgrind=yes
-if test -d $with_valgrind; then
-CPPFLAGS=$CPPFLAGS -I$with_valgrind/include
-fi
-fi
-
-dnl Checks for libraries
-AC_CHECK_LIB(dl, dlsym, [],
-AC_MSG_ERROR([dlsym() not found.  libibverbs requires libdl.]))
-AC_CHECK_LIB(pthread, pthread_mutex_init, [],
-AC_MSG_ERROR([pthread_mutex_init() not found.  libibverbs requires 
libpthread.]))
-
-dnl Checks for header files.
-AC_HEADER_STDC
-AC_CHECK_HEADER(valgrind/memcheck.h

Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.

2013-04-12 Thread Jeff Squyres (jsquyres)
On Apr 9, 2013, at 10:44 PM, Weiny, Ira ira.we...@intel.com wrote:

 As an aside I like the use of RDMA_MTU_* for these values.  Again to 
 distinguish them from the IBTA values.  But I know that is poor form.


So what's the right way to move forward on this?  Is it this:

enum ib_mtu {
   IB_MTU_256  = 1,
   IB_MTU_512  = 2,
   IB_MTU_1024 = 3,
   IB_MTU_2048 = 4,
   IB_MTU_4096 = 5,
   RDMA_MTU_1500 = 1500,
   RDMA_MTU_9000 =  9000
};

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.

2013-04-09 Thread Jeff Squyres (jsquyres)
On Apr 8, 2013, at 6:16 PM, Hefty, Sean sean.he...@intel.com wrote:

 Why can't IB_MTU_1500 = 1500?


It certainly could.  Additionally, since Roland was a little concerned about 
the IB prefix (since 1500 and 9000 are not IBTA-sanctioned MTUs), they could 
have a different prefix -- perhaps RDMA_MTU_1500.  

Although I admit that it would be weird to have an enum that contains values 
with different prefixes:

enum ib_mtu {
IB_MTU_256  = 1,
IB_MTU_512  = 2,
IB_MTU_1024 = 3,
IB_MTU_2048 = 4,
IB_MTU_4096 = 5,
RDMA_MTU_1500 = 1500,
RDMA_MTU_9000 = 9000
};

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] Add IBV_*_USNIC enums for the Cisco Ethernet Virtual NIC.

2013-04-08 Thread Jeff Squyres (jsquyres)
On Apr 5, 2013, at 4:40 PM, Roland Dreier rol...@purestorage.com wrote:

 I think the idea is that without context, it's hard to know if adding
 these enums makes sense or not.  And I'm sorry but I'm not that
 sympathetic to my code isn't ready but you have to take this
 out-of-context patch so I can meet Red Hat's arbitrary schedule.


Ok, fair enough.  It'll be a few weeks before we can submit usnic.ko, so I'll 
re-bring up the IBV_NODE_VENDOR/related patches then.

I think the MTU discussion is still relevant, however -- there seems to be a 
larger design issue there.  I'll go reply separately on that thread.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.

2013-04-08 Thread Jeff Squyres (jsquyres)
On Apr 4, 2013, at 1:57 PM, Weiny, Ira ira.we...@intel.com wrote:

 In hindsight, the user space API never should have exposed the mtu as an
 enum...
 
 Since an enum is an int, and we're never going to have anything with an mtu
 = 5 bytes, couldn't we just store all new mtu values directly as their byte
 value?
 
 That seems like a pretty good idea.


Agreed, but changing to an int would seem to have some fairly serious backwards 
compatibility issues.

What is the right way to move forward here?

Just to re-state: our issue is that there does not seem to be any other way to 
get the max UD message size without knowing the actual MTU (are we incorrect 
about that?).  Hence, using the IB-defined values is not really sufficient.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] Use autoreconf in autogen.sh

2013-04-08 Thread Jeff Squyres (jsquyres)
Roland --

If there are no objections, can this patch (and patch 4 of this set: 
https://patchwork.kernel.org/patch/2387321/) be committed?  Neither should not 
have any real impact other than the modernization of the libibverbs build 
system.


On Apr 3, 2013, at 9:06 AM, Jeff Squyres jsquy...@cisco.com wrote:

 The old sequence of Autotools commands listed in autogen.sh is no
 longer correct.  Instead, just use the single autoreconf command,
 which will invoke all the Right Autotools commands in the correct
 order.
 
 ---
 autogen.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)
 
 diff --git a/autogen.sh b/autogen.sh
 index fd47839..6c9233e 100755
 --- a/autogen.sh
 +++ b/autogen.sh
 @@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
 -aclocal -I config
 -libtoolize --force --copy
 -autoheader
 -automake --foreign --add-missing --copy
 -autoconf
 +autoreconf -ifv -I config
 -- 
 1.8.1.1
 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] Add IBV_*_USNIC enums for the Cisco Ethernet Virtual NIC.

2013-04-05 Thread Jeff Squyres (jsquyres)
Forgive the top reply; I'm actually on vacation this week and currently only 
have email access on my phone...

I'm not sure what you're asking me to do. Are you asking us to submit our 
known-buggy-and-not-yet-complete driver just to get two enums approved?

Sent from my phone. No type good. 

On Apr 4, 2013, at 5:27 PM, Or Gerlitz or.gerl...@gmail.com wrote:

 Jeff Squyres (jsquyres) jsquy...@cisco.com wrote:
 
 Sure.  For a little background, the 2nd-generation Cisco VIC has been 
 available
 since last year (IIRC): http://www.cisco.com/en/US/products/ps10277
 /prod_module_series_home.html.  It's a converged 10G Ethernet adapter 
 available  in a variety of form factors (e.g., 2x10G on PCIe and Mezz).
 
 After some off-list discussion with Roland, we chose to create new 
 IBV_*_USNIC
 enums because none of the current enums were accurate for our device.  It's 
 an
 Ethernet NIC, but it's not an RNIC.  It's an Ethernet-based transport, but 
 it's not
 iWARP.
 
 
 The reason we're asking for these IBV_*_USNIC enums now -- before we've 
 submitted the driver -- is because we're targeting getting our driver 
 included in RHEL 6.5.  There's a bit of a chicken-and-egg issue here: 
 they'll accept our patches for a new hardware driver while that driver is 
 being worked upstream.  But they (rightfully) won't accept patches to IB 
 core and libibverbs until they've been vetted by the community.  Hence, even 
 though our driver is slowly working its way through QA and not available 
 yet, we wanted to submit these new enums upstream for community approval so 
 that they can be included in RHEL 6.5.
 
 Does that help?
 
 yes it does, but I still think we need to see the driver code in order
 to conduct proper /better review and maybe even accept the proposed
 changes to the IB core. You can submit it as RFC which means you can
 look on it, and give me comments, but don't pick it up yet
 
 Or.
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] Add IBV_*_USNIC enums for the Cisco Ethernet Virtual NIC.

2013-04-05 Thread Jeff Squyres (jsquyres)
Per my previous email, forgive my top reply...

RDMA_NODE_VENDOR would be great, actually. Should I work up a patch for that?

Sent from my phone. No type good. 

On Apr 4, 2013, at 10:32 AM, Hefty, Sean sean.he...@intel.com wrote:

 The reason we're asking for these IBV_*_USNIC enums now -- before we've
 submitted the driver -- is because we're targeting getting our driver 
 included
 in RHEL 6.5.  There's a bit of a chicken-and-egg issue here: they'll accept 
 our
 patches for a new hardware driver while that driver is being worked upstream.
 But they (rightfully) won't accept patches to IB core and libibverbs until
 they've been vetted by the community.  Hence, even though our driver is 
 slowly
 working its way through QA and not available yet, we wanted to submit these 
 new
 enums upstream for community approval so that they can be included in RHEL 
 6.5.
 
 I understand the issue.
 
 In the end, these are kernel changes with no actual users of those changes... 
  But then they are also just small changes to a framework...
 
 Just thinking aloud here, but what if we added 'RDMA_NODE_VENDOR' instead?  
 Then other fields, such as transport, become vendor specific.
 
 - Sean
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] Add IBV_*_USNIC enums for the Cisco Ethernet Virtual NIC.

2013-04-04 Thread Jeff Squyres (jsquyres)
On Apr 3, 2013, at 2:45 PM, Or Gerlitz or.gerl...@gmail.com wrote:

 Jeff, I agree with Sean, there's not much point to review/discuss
 these general/pre-step patches without seeing some actual device
 specific kernel (if there are such or user space code if there aren't
 any kernel ones) code. e.g you can submit the two kernel pre-step
 patches as the two first pieces in a series that has the driver code.


Unfortunately, not yet.

I just sent another mail that explained our rationale: our kernel driver and 
libibverbs plugin code are working their way through QA.  It'll take a little 
time before we can submit good patches for these.  The main driving factor for 
submitting these new enums is so that they can be included in RHEL 6.5.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


New patches

2013-04-03 Thread Jeff Squyres (jsquyres)
I'm about to send some patches for libibverbs and Roland's infiniband kernel 
git tree.  The patches fit into two general categories:

1. Add enums for Cisco's Ethernet Virtual NIC (it's not an RNIC and therefore 
doesn't fit the RNIC/IWARP enums).  Also add enums for 1500 and 9000 MTUs.

2. Minor modernization of the GNU Autotools usage in libibverbs.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] Use autoreconf in autogen.sh

2013-04-03 Thread Jeff Squyres
The old sequence of Autotools commands listed in autogen.sh is no
longer correct.  Instead, just use the single autoreconf command,
which will invoke all the Right Autotools commands in the correct
order.

---
 autogen.sh | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/autogen.sh b/autogen.sh
index fd47839..6c9233e 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -1,8 +1,4 @@
 #! /bin/sh
 
 set -x
-aclocal -I config
-libtoolize --force --copy
-autoheader
-automake --foreign --add-missing --copy
-autoconf
+autoreconf -ifv -I config
-- 
1.8.1.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] Add IBV_*_USNIC enums for the Cisco Ethernet Virtual NIC.

2013-04-03 Thread Jeff Squyres
Per off-list conversation with Roland, add some new enums for the
Cisco Ethernet Virtual NIC (it's not an RNIC/iWARP device, so it
doesn't fit in the same category as RDMA_NODE_RNIC / RDMA_TRANSPORT_IWARP).

USNIC = Userspace NIC.

---
 examples/devinfo.c | 1 +
 include/infiniband/verbs.h | 6 --
 src/enum_strs.c| 5 +++--
 src/init.c | 5 -
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index 7dc0463..98a6b4b 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -72,6 +72,7 @@ static const char *transport_str(enum ibv_transport_type 
transport)
switch (transport) {
case IBV_TRANSPORT_IB:return InfiniBand;
case IBV_TRANSPORT_IWARP: return iWARP;
+   case IBV_TRANSPORT_USNIC: return USNIC;
default:  return invalid transport;
}
 }
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 6acfc81..6a6944c 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -68,13 +68,15 @@ enum ibv_node_type {
IBV_NODE_CA = 1,
IBV_NODE_SWITCH,
IBV_NODE_ROUTER,
-   IBV_NODE_RNIC
+   IBV_NODE_RNIC,
+   IBV_NODE_USNIC
 };
 
 enum ibv_transport_type {
IBV_TRANSPORT_UNKNOWN   = -1,
IBV_TRANSPORT_IB= 0,
-   IBV_TRANSPORT_IWARP
+   IBV_TRANSPORT_IWARP,
+   IBV_TRANSPORT_USNIC
 };
 
 enum ibv_device_cap_flags {
diff --git a/src/enum_strs.c b/src/enum_strs.c
index 54d71a6..0d68c75 100644
--- a/src/enum_strs.c
+++ b/src/enum_strs.c
@@ -38,10 +38,11 @@ const char *ibv_node_type_str(enum ibv_node_type node_type)
[IBV_NODE_CA]   = InfiniBand channel adapter,
[IBV_NODE_SWITCH]   = InfiniBand switch,
[IBV_NODE_ROUTER]   = InfiniBand router,
-   [IBV_NODE_RNIC] = iWARP NIC
+   [IBV_NODE_RNIC] = iWARP NIC,
+   [IBV_NODE_USNIC]= Ethernet USNIC
};
 
-   if (node_type  IBV_NODE_CA || node_type  IBV_NODE_RNIC)
+   if (node_type  IBV_NODE_CA || node_type  IBV_NODE_USNIC)
return unknown;
 
return node_type_str[node_type];
diff --git a/src/init.c b/src/init.c
index 8d6786e..e4ef001 100644
--- a/src/init.c
+++ b/src/init.c
@@ -346,7 +346,7 @@ static struct ibv_device *try_driver(struct ibv_driver 
*driver,
dev-node_type = IBV_NODE_UNKNOWN;
} else {
dev-node_type = strtol(value, NULL, 10);
-   if (dev-node_type  IBV_NODE_CA || dev-node_type  
IBV_NODE_RNIC)
+   if (dev-node_type  IBV_NODE_CA || dev-node_type  
IBV_NODE_USNIC)
dev-node_type = IBV_NODE_UNKNOWN;
}
 
@@ -359,6 +359,9 @@ static struct ibv_device *try_driver(struct ibv_driver 
*driver,
case IBV_NODE_RNIC:
dev-transport_type = IBV_TRANSPORT_IWARP;
break;
+   case IBV_NODE_USNIC:
+   dev-transport_type = IBV_TRANSPORT_USNIC;
+   break;
default:
dev-transport_type = IBV_TRANSPORT_UNKNOWN;
break;
-- 
1.8.1.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] Add IBV_MTU_1500|9000 enums.

2013-04-03 Thread Jeff Squyres
Allow specification of common Ethernet MTUs.

---
 examples/devinfo.c | 2 ++
 examples/pingpong.c| 2 ++
 include/infiniband/verbs.h | 6 --
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/examples/devinfo.c b/examples/devinfo.c
index 98a6b4b..6700882 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -118,8 +118,10 @@ static const char *mtu_str(enum ibv_mtu max_mtu)
case IBV_MTU_256:  return 256;
case IBV_MTU_512:  return 512;
case IBV_MTU_1024: return 1024;
+   case IBV_MTU_1500: return 1500;
case IBV_MTU_2048: return 2048;
case IBV_MTU_4096: return 4096;
+   case IBV_MTU_9000: return 9000;
default:   return invalid MTU;
}
 }
diff --git a/examples/pingpong.c b/examples/pingpong.c
index 90732ef..d7443a8 100644
--- a/examples/pingpong.c
+++ b/examples/pingpong.c
@@ -42,8 +42,10 @@ enum ibv_mtu pp_mtu_to_enum(int mtu)
case 256:  return IBV_MTU_256;
case 512:  return IBV_MTU_512;
case 1024: return IBV_MTU_1024;
+   case 1500: return IBV_MTU_1500;
case 2048: return IBV_MTU_2048;
case 4096: return IBV_MTU_4096;
+   case 9000: return IBV_MTU_9000;
default:   return -1;
}
 }
diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 6a6944c..1583c34 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -150,8 +150,10 @@ enum ibv_mtu {
IBV_MTU_256  = 1,
IBV_MTU_512  = 2,
IBV_MTU_1024 = 3,
-   IBV_MTU_2048 = 4,
-   IBV_MTU_4096 = 5
+   IBV_MTU_1500 = 4,
+   IBV_MTU_2048 = 5,
+   IBV_MTU_4096 = 6,
+   IBV_MTU_9000 = 7
 };
 
 enum ibv_port_state {
-- 
1.8.1.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] .gitignore updates and renameconfigure.in-.ac

2013-04-03 Thread Jeff Squyres
Added some entries to config/.gitignore for newer versions of the GNU
Autotools.  Also renamed configure.in - configure.ac to accomodate
newer GNU Autotools

(http://lists.gnu.org/archive/html/autotools-announce/2012-11/msg0.html
announced the intent to drop support for configure.in in future
versions of Autoconf).

---
 .gitignore   |  6 +
 configure.ac | 74 
 configure.in | 74 
 3 files changed, 80 insertions(+), 74 deletions(-)
 create mode 100644 configure.ac
 delete mode 100644 configure.in

diff --git a/.gitignore b/.gitignore
index 78effef..d198dd1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,6 +6,7 @@ autom4te.cache
 aclocal.m4
 stamp-h.in
 config.h.in
+config.h.in~
 config.log
 config.h
 .libs
@@ -15,3 +16,8 @@ Makefile
 config.status
 stamp-h1
 libtool
+config/libtool.m4
+config/ltoptions.m4
+config/ltsugar.m4
+config/ltversion.m4
+config/lt~obsolete.m4
diff --git a/configure.ac b/configure.ac
new file mode 100644
index 000..efdc5ac
--- /dev/null
+++ b/configure.ac
@@ -0,0 +1,74 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(libibverbs, 1.1.6, linux-rdma@vger.kernel.org)
+AC_CONFIG_SRCDIR([src/ibverbs.h])
+AC_CONFIG_AUX_DIR(config)
+AC_CONFIG_MACRO_DIR(config)
+AC_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE([foreign])
+m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
+
+dnl Checks for programs
+AC_PROG_CC
+AC_GNU_SOURCE
+AC_PROG_LN_S
+AC_PROG_LIBTOOL
+
+LT_INIT
+
+AC_ARG_WITH([valgrind],
+AC_HELP_STRING([--with-valgrind],
+[Enable Valgrind annotations (small runtime overhead, default NO)]))
+if test x$with_valgrind = x || test x$with_valgrind = xno; then
+want_valgrind=no
+AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.])
+else
+want_valgrind=yes
+if test -d $with_valgrind; then
+CPPFLAGS=$CPPFLAGS -I$with_valgrind/include
+fi
+fi
+
+dnl Checks for libraries
+AC_CHECK_LIB(dl, dlsym, [],
+AC_MSG_ERROR([dlsym() not found.  libibverbs requires libdl.]))
+AC_CHECK_LIB(pthread, pthread_mutex_init, [],
+AC_MSG_ERROR([pthread_mutex_init() not found.  libibverbs requires 
libpthread.]))
+
+dnl Checks for header files.
+AC_HEADER_STDC
+AC_CHECK_HEADER(valgrind/memcheck.h,
+[AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1,
+[Define to 1 if you have the valgrind/memcheck.h header file.])],
+[if test $want_valgrind = yes; then
+AC_MSG_ERROR([Valgrind memcheck support requested, but 
valgrind/memcheck.h not found.])
+fi])
+
+dnl Checks for typedefs, structures, and compiler characteristics.
+AC_C_CONST
+
+AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
+[if test -n `$LD --help  /dev/null 2/dev/null | grep version-script`; 
then
+   ac_cv_version_script=yes
+else
+   ac_cv_version_script=no
+fi])
+
+if test $ac_cv_version_script = yes; then
+
LIBIBVERBS_VERSION_SCRIPT='-Wl,--version-script=$(srcdir)/src/libibverbs.map'
+else
+LIBIBVERBS_VERSION_SCRIPT=
+fi
+AC_SUBST(LIBIBVERBS_VERSION_SCRIPT)
+
+AC_CACHE_CHECK(for .symver assembler support, ac_cv_asm_symver_support,
+[AC_TRY_COMPILE(, [asm(symbol:\n.symver symbol, api@ABI\n);],
+ac_cv_asm_symver_support=yes,
+ac_cv_asm_symver_support=no)])
+if test $ac_cv_asm_symver_support = yes; then
+AC_DEFINE([HAVE_SYMVER_SUPPORT], 1, [assembler has .symver support])
+fi
+
+AC_CONFIG_FILES([Makefile libibverbs.spec])
+AC_OUTPUT
diff --git a/configure.in b/configure.in
deleted file mode 100644
index efdc5ac..000
--- a/configure.in
+++ /dev/null
@@ -1,74 +0,0 @@
-dnl Process this file with autoconf to produce a configure script.
-
-AC_PREREQ(2.57)
-AC_INIT(libibverbs, 1.1.6, linux-rdma@vger.kernel.org)
-AC_CONFIG_SRCDIR([src/ibverbs.h])
-AC_CONFIG_AUX_DIR(config)
-AC_CONFIG_MACRO_DIR(config)
-AC_CONFIG_HEADER(config.h)
-AM_INIT_AUTOMAKE([foreign])
-m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
-
-dnl Checks for programs
-AC_PROG_CC
-AC_GNU_SOURCE
-AC_PROG_LN_S
-AC_PROG_LIBTOOL
-
-LT_INIT
-
-AC_ARG_WITH([valgrind],
-AC_HELP_STRING([--with-valgrind],
-[Enable Valgrind annotations (small runtime overhead, default NO)]))
-if test x$with_valgrind = x || test x$with_valgrind = xno; then
-want_valgrind=no
-AC_DEFINE([NVALGRIND], 1, [Define to 1 to disable Valgrind annotations.])
-else
-want_valgrind=yes
-if test -d $with_valgrind; then
-CPPFLAGS=$CPPFLAGS -I$with_valgrind/include
-fi
-fi
-
-dnl Checks for libraries
-AC_CHECK_LIB(dl, dlsym, [],
-AC_MSG_ERROR([dlsym() not found.  libibverbs requires libdl.]))
-AC_CHECK_LIB(pthread, pthread_mutex_init, [],
-AC_MSG_ERROR([pthread_mutex_init() not found.  libibverbs requires 
libpthread.]))
-
-dnl Checks for header files.
-AC_HEADER_STDC
-AC_CHECK_HEADER(valgrind/memcheck.h,
-[AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1,
-   

[PATCH 2/2] Ad IB_MTU_1500|9000 enums.

2013-04-03 Thread Jeff Squyres
Allow specification of common Ethernet MTUs.

---
 include/rdma/ib_addr.h  | 6 +-
 include/rdma/ib_verbs.h | 8 ++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index 9996539..1f6fbbc 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -200,10 +200,14 @@ static inline enum ib_mtu iboe_get_mtu(int mtu)
 */
mtu = mtu - IB_GRH_BYTES - IB_BTH_BYTES - 28;
 
-   if (mtu = ib_mtu_enum_to_int(IB_MTU_4096))
+   if (mtu = ib_mtu_enum_to_int(IB_MTU_9000))
+   return IB_MTU_9000;
+   else if (mtu = ib_mtu_enum_to_int(IB_MTU_4096))
return IB_MTU_4096;
else if (mtu = ib_mtu_enum_to_int(IB_MTU_2048))
return IB_MTU_2048;
+   else if (mtu = ib_mtu_enum_to_int(IB_MTU_1500))
+   return IB_MTU_1500;
else if (mtu = ib_mtu_enum_to_int(IB_MTU_1024))
return IB_MTU_1024;
else if (mtu = ib_mtu_enum_to_int(IB_MTU_512))
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8a66758..4670f6f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -174,8 +174,10 @@ enum ib_mtu {
IB_MTU_256  = 1,
IB_MTU_512  = 2,
IB_MTU_1024 = 3,
-   IB_MTU_2048 = 4,
-   IB_MTU_4096 = 5
+   IB_MTU_1500 = 4,
+   IB_MTU_2048 = 5,
+   IB_MTU_4096 = 6,
+   IB_MTU_9000 = 7
 };
 
 static inline int ib_mtu_enum_to_int(enum ib_mtu mtu)
@@ -184,8 +186,10 @@ static inline int ib_mtu_enum_to_int(enum ib_mtu mtu)
case IB_MTU_256:  return  256;
case IB_MTU_512:  return  512;
case IB_MTU_1024: return 1024;
+   case IB_MTU_1500: return 1500;
case IB_MTU_2048: return 2048;
case IB_MTU_4096: return 4096;
+   case IB_MTU_9000: return 9000;
default:  return -1;
}
 }
-- 
1.8.1.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Add RDMA_*_USNIC enums for the Cisco Ethernet Virtual NIC.

2013-04-03 Thread Jeff Squyres
Per off-list conversation with Roland, add some new enums for the
Cisco Ethernet Virtual NIC (it's not an RNIC/iWARP device, so it
doesn't fit in the same category as RDMA_NODE_RNIC / RDMA_TRANSPORT_IWARP).

USNIC = Userspace NIC.

---
 drivers/infiniband/core/verbs.c | 3 +++
 include/rdma/ib_verbs.h | 6 --
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index a8fdd33..2a35518 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -114,6 +114,8 @@ rdma_node_get_transport(enum rdma_node_type node_type)
return RDMA_TRANSPORT_IB;
case RDMA_NODE_RNIC:
return RDMA_TRANSPORT_IWARP;
+   case RDMA_NODE_USNIC:
+   return RDMA_TRANSPORT_USNIC;
default:
BUG();
return 0;
@@ -130,6 +132,7 @@ enum rdma_link_layer rdma_port_get_link_layer(struct 
ib_device *device, u8 port_
case RDMA_TRANSPORT_IB:
return IB_LINK_LAYER_INFINIBAND;
case RDMA_TRANSPORT_IWARP:
+   case RDMA_TRANSPORT_USNIC:
return IB_LINK_LAYER_ETHERNET;
default:
return IB_LINK_LAYER_UNSPECIFIED;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..8a66758 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -67,12 +67,14 @@ enum rdma_node_type {
RDMA_NODE_IB_CA = 1,
RDMA_NODE_IB_SWITCH,
RDMA_NODE_IB_ROUTER,
-   RDMA_NODE_RNIC
+   RDMA_NODE_RNIC,
+   RDMA_NODE_USNIC
 };
 
 enum rdma_transport_type {
RDMA_TRANSPORT_IB,
-   RDMA_TRANSPORT_IWARP
+   RDMA_TRANSPORT_IWARP,
+   RDMA_TRANSPORT_USNIC
 };
 
 enum rdma_transport_type
-- 
1.8.1.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] XRC upstream merge reboot

2011-07-21 Thread Jeff Squyres
On Jul 21, 2011, at 3:38 AM, Jack Morgenstein wrote:

 If MPI can use a different XRC domain per job (and deallocate the domain
 at the job's end), this would solve the tgt qp lifetime problem (-- by
 destroying all the tgt qp's when the xrc domain is deallocated).

What happens if the MPI job crashes and does not properly deallocate the XRC 
domain / tgt qp?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] XRC upstream merge reboot

2011-07-21 Thread Jeff Squyres
On Jul 21, 2011, at 8:47 AM, Jack Morgenstein wrote:

 [snip]
 When the last user of an XRC domain exits cleanly (or crashes), the domain 
 should be destroyed.
 In this case, with Sean's design, the tgt qp's for the XRC domain should also 
 be destroyed.

Sounds perfect.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2] [OFED] libibverbs: Support both OFED verbs and ibverbs

2011-07-14 Thread Jeff Squyres
Sean pinged me last night about XRC in Open MPI last night (note that I am no 
longer on the linux-rdma list).

Open MPI uses XRC, but in a non-default manner -- the user has to specifically 
ask for it at run time.


On Jul 14, 2011, at 9:13 AM, Jack Morgenstein wrote:

 Hi Sean,
 
 I am pleased that you are putting in the effort to enable the existing OFED 
 user base to continue using its
 code without changes to the XRC calls.
 
 Regarding XRC and MPI, see below.
 
 On Wednesday 13 July 2011 20:21, Hefty, Sean wrote:
 I was able to build and run mvapich2 successfully against libibverbs with
 this patch applied on top of the current XRC patches.  (The XRC patches are
 still undergoing work.)  I built mvapich2 using the following configure 
 options:
 
 --with-rdma=gen2 CFLAGS=-DOFED_VERBS
 and
 --with-rdma=gen2 CFLAGS='-DOFED_VERBS -D_ENABLE_XRC_
 
 It didn't appear that mvapichs ever used XRC
 
 You are correct, mvapich does not use XRC.
 openMPI uses XRC, so hopefully you can use openMPI to test out your XRC stuff.
 
 You can contact Jeff Squyres for details/help.
 
 In the meantime, I include the following from the ewg list:
 =
 On 11/08/2010 08:06 PM, Jeff Squyres wrote:
 Steve pinged me on IM this morning and told me that you want OMPI v1.4.3 for 
 the next OFED release.  I just logged into www.openfabrics.org
 and apparently the server has changed -- my entire $HOME is empty. 
 
 Where do you want me to put the new OMPI SRPM?  Alternatively, anyone can 
 grab the SRPM from the URL below
 -- there's nothing special about the SRPM for OpenFabrics that's not already 
 in our community SRPM: 
 
 http://www.open-mpi.org/software/ompi/v1.4/
 
 
 Hi Jeff,
 The place for the Open MPI on the new server is under:
 /var/www/openfabrics.org/downloads/openmpi/   
 (http://www.openfabrics.org/downloads/openmpi/)
 
 I updated Open MPI version there to v1.4.3.
 
 Regards,
 Vladimir
 ==
 
 -Jack


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OpenMPI over RoCEE

2010-07-13 Thread Jeff Squyres
Does it work with Open MPI v1.4.2?


On Jul 12, 2010, at 4:21 PM, Steve Wise wrote:

 I'm running OFED-1.5.1 with the RoCEE mlx4 drivers.  I can run low level
 verbs programs ok, but when running open mpi, I'm getting this error. 
 Anybody seen this?
 
 -
 
 [o...@escher ~]$ mpirun -np 2 -host 10.192.176.111,10.192.176.112 --mca
 btl openib,sm,self /usr/mpi/gcc/openmpi-1.4.1/tests/IMB-3.2/IMB-MPI1
 -msglen msglen.txt -iter 100 pingpong
 [escher][[36356,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all]
 error modifing QP to RTR errno says Invalid argument
 [escher][[36356,1],1][connect/btl_openib_connect_oob.c:809:rml_recv_cb]
 error in endpoint reply start connect
 --
 mpirun has exited due to process rank 1 with PID 4894 on
 node escher exiting without calling finalize. This may
 have caused other processes in the application to be
 terminated by signals sent by mpirun (as reported here).
 --
 
 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ummunotify: Userspace support for MMU notifications V2

2010-05-07 Thread Jeff Squyres
On Apr 22, 2010, at 9:38 AM, Eric B Munson wrote:

 From: Roland Dreier rola...@cisco.com
 
 As discussed in http://article.gmane.org/gmane.linux.drivers.openib/61925
 and follow-up messages, libraries using RDMA would like to track
 precisely when application code changes memory mapping via free(),
 munmap(), etc.  Current pure-userspace solutions using malloc hooks
 and other tricks are not robust, and the feeling among experts is that
 the issue is unfixable without kernel help.

Sorry for not replying earlier -- just to throw in my $0.02 here: the MPI 
community is *very interested* in having this stuff in upstream kernels.  It 
solves a fairly major problem for us. 

Open MPI (www.open-mpi.org) is ready to pretty much immediately take advantage 
of these capabilities.  The code to use ummunotify is in a Mercurial branch; 
we're only waiting for ummunotify to go upstream before committing our support 
for it to our main SVN development trunk.

 We solve this not by implementing the full API proposed in the email
 linked above but rather with a simpler and more generic interface,
 which may be useful in other contexts.  Specifically, we implement a
 new character device driver, ummunotify, that creates a /dev/ummunotify
 node.  A userspace process can open this node read-only and use the fd
 as follows:
 
  1. ioctl() to register/unregister an address range to watch in the
 kernel (cf struct ummunotify_register_ioctl in linux/ummunotify.h).
 
  2. read() to retrieve events generated when a mapping in a watched
 address range is invalidated (cf struct ummunotify_event in
 linux/ummunotify.h).  select()/poll()/epoll() and SIGIO are
 handled for this IO.
 
  3. mmap() one page at offset 0 to map a kernel page that contains a
 generation counter that is incremented each time an event is
 generated.  This allows userspace to have a fast path that checks
 that no events have occurred without a system call.
 
 Thanks to Jason Gunthorpe jgunthorpe at obsidianresearch.com for
 suggestions on the interface design.  Also thanks to Jeff Squyres
 jsquyres at cisco.com for prototyping support for this in Open MPI, which
 helped find several bugs during development.
 
 Signed-off-by: Roland Dreier rola...@cisco.com
 Signed-off-by: Eric B Munson ebmun...@us.ibm.com

Acked-by: Jeff Squyers jsquy...@cisco.com

 ---
 
 Changes from V1:
 - Update Kbuild to handle test program build properly
 - Update documentation to cover questions not addressed in previous
   thread
 ---
  Documentation/Makefile  |3 +-
  Documentation/ummunotify/Makefile   |7 +
  Documentation/ummunotify/ummunotify.txt |  162 +
  Documentation/ummunotify/umn-test.c |  200 +++
  drivers/char/Kconfig|   12 +
  drivers/char/Makefile   |1 +
  drivers/char/ummunotify.c   |  567 
 +++
  include/linux/Kbuild|1 +
  include/linux/ummunotify.h  |  121 +++
  9 files changed, 1073 insertions(+), 1 deletions(-)
  create mode 100644 Documentation/ummunotify/Makefile
  create mode 100644 Documentation/ummunotify/ummunotify.txt
  create mode 100644 Documentation/ummunotify/umn-test.c
  create mode 100644 drivers/char/ummunotify.c
  create mode 100644 include/linux/ummunotify.h
 
 diff --git a/Documentation/Makefile b/Documentation/Makefile
 index 6fc7ea1..27ba76a 100644
 --- a/Documentation/Makefile
 +++ b/Documentation/Makefile
 @@ -1,3 +1,4 @@
  obj-m := DocBook/ accounting/ auxdisplay/ connector/ \
 filesystems/ filesystems/configfs/ ia64/ laptops/ networking/ \
 -   pcmcia/ spi/ timers/ video4linux/ vm/ watchdog/src/
 +   pcmcia/ spi/ timers/ video4linux/ vm/ ummunotify/ \
 +   watchdog/src/
 diff --git a/Documentation/ummunotify/Makefile 
 b/Documentation/ummunotify/Makefile
 new file mode 100644
 index 000..89f31a0
 --- /dev/null
 +++ b/Documentation/ummunotify/Makefile
 @@ -0,0 +1,7 @@
 +# List of programs to build
 +hostprogs-y := umn-test
 +
 +# Tell kbuild to always build the programs
 +always := $(hostprogs-y)
 +
 +HOSTCFLAGS_umn-test.o += -I$(objtree)/usr/include
 diff --git a/Documentation/ummunotify/ummunotify.txt 
 b/Documentation/ummunotify/ummunotify.txt
 new file mode 100644
 index 000..d6c2ccc
 --- /dev/null
 +++ b/Documentation/ummunotify/ummunotify.txt
 @@ -0,0 +1,162 @@
 +UMMUNOTIFY
 +
 +  Ummunotify relays MMU notifier events to userspace.  This is useful
 +  for libraries that need to track the memory mapping of applications;
 +  for example, MPI implementations using RDMA want to cache memory
 +  registrations for performance, but tracking all possible crazy cases
 +  such as when, say, the FORTRAN runtime frees memory is impossible
 +  without kernel help.
 +
 +Basic Model
 +
 +  A userspace process uses it by opening /dev/ummunotify, which
 +  returns a file descriptor.  Interest in address ranges is registered
 +  using

Re: [PATCH] ummunotify: Userspace support for MMU notifications

2010-04-14 Thread Jeff Squyres
On Apr 14, 2010, at 5:06 AM, Gleb Natapov wrote:

  The Open MPI developers have spent a lot of effort trying to handle this
  purely in userspace and still do not believe that a truly robust
  solution is possible without kernel help.  Perhaps they can expand on
  what the obstacles are.

By truly robust we mean that some other user-level code can't override the 
hooks installed by the MPI (user level) middleware.  All current glibc hooks 
are overridable by other user-level code -- and sometimes real applications do 
this (for their own good reasons).  Most of the time, apps blithely override 
our hooks because they either don't know or can't know that our hooks are 
installed.  It can be dicey to know what you can and cannot override 
pre-main(), for example (e.g., via the __malloc_initialize_hook).

Opening up a direct channel to the kernel and saying hey, tell me when 
something changes is robust because no other entity can hijack your 
notifications.  It also allows us to avoid using pre-main hooks, and makes it 
so that we don't have to hook into the memory subsystem (usually replacing it 
with our own).  Both of these things are extremely distasteful -- fixing these 
two things alone make doing something like ummunotify worthwhile, IMHO.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jeff Squyres
On Mar 23, 2010, at 12:59 PM, Jason Gunthorpe wrote:

 The main reason for the new FD is so it can be polled on..

What do you poll on the fd for?

With ummunotify, you only read() from the fd when (counter != last_counter).  
Were you thinking that the poll() would be for something else?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jeff Squyres
On Mar 23, 2010, at 1:29 PM, Jason Gunthorpe wrote:

  What do you poll on the fd for?
 
 poll() is for apps that want to get the notifications without
 spinning on the counter.

Ah, ok.

I think even with the ummunotify interface, that would work, too.  Meaning: 
since you have to read() from the fd to get event details, poll() would *also* 
tell you if there was something to read (in addition to checking if 
(last_counter != counter)).  The counter is a fast way of checking -- e.g., if 
you need to check in your fast path (which MPI's likely will).  poll() could be 
used if you don't care if the check is slow.

 If you don't think that is worth doing it
 does simplify things alot, just add two new verbs calls:
 
 ibv_set_mmu_counter(verbs, my_counter);
 ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));

I have no real opinion on whether the mmap/read should be hidden by the above 
ibv calls or not.  Either is fine with me.  I would *assume* that 
ibv_get_mmu_notifications() is non-blocking, right?  E.g., if you ask for N and 
only M are available (where M  N), then the call returns with only M items 
filled (and M could be 0).  Perhaps you need another parameter to indicate how 
many items in my_list were actually filled?  Or is that the return value?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ummunotify: progress at last!

2010-03-23 Thread Jeff Squyres
On Mar 23, 2010, at 3:52 PM, Jason Gunthorpe wrote:

   ibv_set_mmu_counter(verbs, my_counter);
   ibv_get_mmu_notifications(verbs, my_list, sizeof(my_list));
 
 These are not hiding mmap/read, they are new uverbs 'syscalls' that
 get the kernel to perform that operation.

Oh -- so there's 2 mechanisms to get the counter info (for example):

1. the above uverb
2. mmap

Right?

I don't really have an opinion here -- I'm not really an owner of the ibv 
API.  As long as there is a fast/mmap way for me to get the counter without an 
extra function call, I'm happy.  :-)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ewg] rdma/cm: revert associating an RDMA device when binding to loopback

2010-02-09 Thread Jeff Squyres
Open MPI also now checks for 127.0.0.1/8 and skips them.  This behavior will be 
included in the upcoming Open MPI v1.4.2 (possibly within a few weeks?) and 
Open MPI v1.5.0.

Two followup questions:

1. Is this now the recommended way to find all the IP interfaces that support 
RDMA:

- loop over all local IP addresses
- if 127.0.0.1/8, skip
- try to rdma_bind_addr()
- if it succeeds and verbs ptr is != NULL, it's an RDMA device

(I believe Steve Wise proposed adding an API function to just return a list of 
IP addresses of RDMA devices a while back; it was rejected, which is why either 
we use the try-to-rdma_bind_addr() approach)

2. Before Sean backed out the localhost behavior, when you 
rdma_addr_bind(127.0.0.1), what did the id-verbs pointer correspond to?




On Feb 9, 2010, at 11:15 AM, Pradeep Satyanarayana wrote:

 Steve Wise wrote:
  This patch works.  It also backports cleanly to ofed-1.5.1/RH5.3.
 
  Acked-by: Steve Wise sw...@opengridcomputing.com
 
  Steve.
 Steve, Was this tested against both iWARP and IB?
 
 Thanks
 Pradeep
 
 ___
 ewg mailing list
 e...@lists.openfabrics.org
 http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
 


-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
Sorry -- I missed many of these mails today due to mail filtering (don't ask).

FWIW:

- I'm not opposed to adding LOOPBACK checks into OMPI to avoid this problem 
(I'm waiting for a patch, actually).  I'm just saying that we're not going to 
get a release out immediately with this fix.  Our next release was scheduled to 
be 1.4.2, and it is still at least several weeks away.  So allowing this in 
2.6.33 would be Bad because a) we know it breaks OMPI, and b) OMPI can't get a 
release out immediately to fix the issue.

- There are customers who are using RDMA CM with IB (e.g., Sandia with their 
Mesh/IB routing stuff).

- I see the following in rdma_bind_addr(3):

-
DESCRIPTION
   Associates a source address with an rdma_cm_id.  The  address  may  be
   wildcarded.   If  binding  to a specific local address, the rdma_cm_id
   will also be bound to a local RDMA device.
-

What RDMA device is bound to when you use 127.0.0.1?  I'm not 100% sure, but I 
think that this might be where we got the rationale that we didn't need 
additional LOOPBACK tests in OMPI...  (if anyone else agrees with this 
interpretation, then it's at least one argument that allowing binding to 
LOOPBACK devices *is* a change in semantics, and therefore should be treated 
extremely carefully)


On Feb 8, 2010, at 4:16 PM, Steve Wise wrote:

 
 Sean Hefty wrote:
  IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback.
 
 
  I disagree, but what does it matter?  So, we add a 'software' loopback that 
  uses
  127.0.0.1.  Openmpi still wouldn't work.
 
   
 
 I guess that's true.
 
  I will commit to get the fix in openmpi asap.
 
 
  If we don't care if the fix is in the kernel or user space, then we could 
  add an
  a 'disable-loopback-support' build option to librdmacm, which can fail any
  attempt to bind to a loopback address.
 
   
 
 I'd rather see it removed from 2.6.33 kernel before it shipts, and then
 we fix openmpi, and then re-submit 127.0.0.1 support once openmpi
 publishes a release with its fix.  See my other email that submits a
 potential commit to remove 127.0.0.1 support for 2.6.33.
 
 Steve.
 


-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 5:09 PM, Jason Gunthorpe wrote:

 DESCRIPTION
   Associates a source address with an rdma_cm_id.  The  address  may  be
   wildcarded.   If  binding  to a specific local address, the rdma_cm_id
   will also be bound to a local RDMA device.
 This statement is trying to say that if a source address is given then
 the rdma_cm_id will be bound to a device.

Which device is bound to if you specify 127.0.0.1 as the source address?  
(which is what OMPI is doing)  Is it possible to assign 127.0.0.1 to an RDMA 
device?

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 5:13 PM, Sean Hefty wrote:

 Are you certain that rdma_bind_addr does NOT work with 127.0.0.1, and that 
 this
 is now the problem?
 
 It does appear to work on OFED 1.4 and on 2.6.26 based on ucmatose.  Is the
 problem really with rdma_bind_addr succeeding, or with rdma_connect, which now
 works, or rdma_bind_addr now assigning a device?

On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to 
bind to 127.0.0.1 per the email I sent Friday:

http://www.spinics.net/lists/linux-rdma/msg02568.html

I have not checked any other combinations; Steve was saying that he saw it 
rdma_bind_addr() succeeding on his machines with OFED 1.5.1rcwhatever (I don't 
recall the OS he said he was using).

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 6:48 PM, Sean Hefty wrote:

   rc = rdma_bind_addr(cm_id, ipaddr);
   if (rc || !cm_id-verbs) {
   rc = OMPI_SUCCESS;
   goto out3;
   }

Ah, yes!  Per the OMPI code you cited, I amended my printf's and see:

   [svbu-mpi.cisco.com:19315] FAILED to bind to 127.0.0.1: rc=0, verbs=(nil)

So the rc from from rdma_bind_addr was 0, but you're right that the verbs 
pointer was NULL, and we therefore rule that it was no good.

 The other is btl_openib_connect_rdmacm.c, but that deals with listening.  I
 can't quickly determine if btl_openib_iwarp.c is usually used for IB or not.

It is.

 So, to fully keep the behavior of 2.6.32, rdma_bind_addr for 127.0.0.1 should
 succeed, but not assign a device.  I think this was the change from commit
 ..c55e657 that changed the behavior:
 
 @@ -2089,7 +2096,9 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct 
 sockaddr
 *addr)
 if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND))
 return -EINVAL;
 
 -   if (!cma_any_addr(addr)) {
 +   if (cma_loopback_addr(addr)) {
 +   ret = cma_bind_loopback(id_priv);
 +   } else if (!cma_zero_addr(addr)) {
 ret = rdma_translate_ip(addr, id-route.addr.dev_addr);
 if (ret)
 goto err1;
 
 I'll see if reverting this gives the desired(?) behavior.

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ewg] rdma/cm: disallow loopback address for iwarp devices

2010-02-08 Thread Jeff Squyres
On Feb 8, 2010, at 7:50 PM, Pradeep Satyanarayana wrote:

 No, there is none. I got this command from one of the mails in the thread. 
 What should I use instead?

You need to compile and run an MPI program.  ring is a typical test program 
that sends a message around in a ring.  I think that OFED installs those test 
apps somewhere, but I don't recall where offhand.

ring_c.c is attached.  Compile it with:

mpicc ring_c.c -o ring

(you might need the full path to mpicc if it's not in your path?)

A better mpirun command line would be:

/usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --host HOSTNAME1,HOSTNAME2 \
--mca btl openib,sm,self --mca btl_openib_cpc_include rdmacm ring

Put in your own HOSTNAME1 and HOSTNAME2 values.  You'll also need to ensure 
that both Open MPI and ring are available on both names (preferably in the 
same filesystem locations on both nodes, for simplicity) and that you can ssh 
to from one node to the other without being prompted for a password or 
passphrase.

This will run a 2-process MPI job across the two nodes, passing a message 
between the two processes a few times before quitting.

The various --mca parameters on this mpirun command line ensure that you are 
definitely using the OpenFabrics verbs support and forcing the use of RDMA CM.

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


ring_c.c
Description: Binary data


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres
On Feb 5, 2010, at 11:16 AM, Steve Wise wrote:

  Note that it is highly unlikely that we will release open mpi 1.4.2 in
  time for ofed 1.5.1.
 
 Jeff, there is no way to handle high priority bug fixes in the current
 released stream?

We have 1.4.2 cooking, but it's not ready yet.  

I'll take it back to the OMPI community to see if they want to do a 
high-priority release, but I'm not excited about it (see below).

  Also note that trying to bind rdma cm to all interface ip addresses
  was the way that we were advised by openfabrics to figure out which
  devices are rdma-capable.
 
  As such, it is highly desirable to get the fix transparently in rdmacm
  and preserve the old semantic. More specifically, it seems undesirable
  to change this semantic in a minor ofed point release.
 
 I agree that we should probably not allow 127.0.0.1 binds in ofed-1.5.1
 at all because it regresses OpenMPI.  Even with IB systems, if the bind
 to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is bound to that
 rdma interface and advertises this address to its peer as an address
 to-which that peer can rdma connect!  This will break IB clusters too,
 not just T3/iWARP cluster.   While I think OpenMPI needs to skip
 127.0.0.1 in its logic, I think we should probably defer allowing
 127.0.0.1 binds until ofed-1.6.

I agree that Open MPI should not advertise 127.0.0.1 to peers.  However, the 
logic that we were advised to use was to try to RDMA CM bind to each IP 
address.  If the bind succeeds, then it's an RDMA-capable device and therefore 
it's advertisable.  The rationale was that 127.0.0.1 (really, any loopback 
address) is *not* an RDMA device and therefore the RDMA CM bind should *never* 
succeed on it.  Hence, it wasn't necessary to add a is this a loopback 
address? check in the logic.

I guess I don't understand why that rationale is now incorrect -- 127.0.0.1 is 
still not an RDMA-capable device, right?

 But Jeff, note that if someone uses the upstream kernel and OpenMPI, its
 busted...
 
 So I recommend:
 
 1) Don't allow 127.0.0.1 binds in ofed-1.5.1
 
 2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm
 connect address (get it in ofed-1.5.2 or ofed-1.6).

We can add this logic (because I understand that some upstream kernels now 
allow binding to loopback addresses), but I'm still confused (in principle) as 
to why it should be necessary.

Can you clarify what kernel versions allow binding LOOPBACK addresses with RDMA 
CM?

-- 
Jeff Squyres jsquy...@cisco.com
Cisco.com - http://www.cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >