RE: OFA Management maintainership
Hi,

I think we all owe a debt of gratitude for Alex's excellent 2+ years of OpenSM, libibumad, and ibsim maintainership. I hope I can live up to the high standard Alex set.

Thanks for all you've done, Alex!

-- Hal

-----Original Message-----
From: Alex Netes [mailto:ale...@dev.mellanox.co.il] On Behalf Of Alex Netes
Sent: Thursday, February 07, 2013 1:37 AM
To: linux-rdma@vger.kernel.org; Hal Rosenstock
Subject: OFA Management maintainership

Hi,

I want to announce that starting from today Hal Rosenstock, whom you are familiar with, is going to maintain OpenSM, libibumad and ibsim development. So starting from today his trees should be considered the master development trees:

git://git.openfabrics.org/~halr/libibumad
git://git.openfabrics.org/~halr/opensm
git://git.openfabrics.org/~halr/ibsim

I would like to wish Hal a lot of success with the new role. Additionally, I would like to thank the whole community for a good time working together. I will continue to work on OpenSM and to contribute to the community in the future.

--Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
OFA Management maintainership
Hi,

I want to announce that starting from today Hal Rosenstock, whom you are familiar with, is going to maintain OpenSM, libibumad and ibsim development. So starting from today his trees should be considered the master development trees:

git://git.openfabrics.org/~halr/libibumad
git://git.openfabrics.org/~halr/opensm
git://git.openfabrics.org/~halr/ibsim

I would like to wish Hal a lot of success with the new role. Additionally, I would like to thank the whole community for a good time working together. I will continue to work on OpenSM and to contribute to the community in the future.

--Alex
[GIT PULL] please pull infiniband.git
Hi Linus,

Please pull from

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus

IB regression fixes for 3.8:
- Fix mlx4 VFs not working on old guests because of 64B CQE changes
- Fix ill-considered sparse fix for qib
- Fix IPoIB crash due to skb double destruct introduced in 3.8-rc1

Mike Marciniszyn (1):
      IB/qib: Fix for broken sparse warning fix

Or Gerlitz (1):
      mlx4_core: Fix advertisement of wrong PF context behaviour

Roland Dreier (1):
      Merge branches 'ipoib', 'mlx4' and 'qib' into for-next

Shlomo Pongratz (1):
      IPoIB: Fix crash due to skb double destruct

 drivers/infiniband/hw/qib/qib_qp.c        | 11 +++--------
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   |  6 +++---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   |  6 +++---
 drivers/net/ethernet/mellanox/mlx4/main.c |  2 +-
 4 files changed, 10 insertions(+), 15 deletions(-)
[PATCH 3/3] infiniband-diags: deprecate dump_[m|l]fts.sh scripts
Signed-off-by: Ira Weiny --- Makefile.am|6 +- configure.in |4 +- doc/man/dump_lfts.8.in | 177 doc/man/dump_mfts.8.in | 170 -- doc/rst/dump_lfts.8.in.rst | 73 -- doc/rst/dump_mfts.8.in.rst | 64 man/dump_lfts.8|2 + man/dump_mfts.8|2 + scripts/dump_lfts.sh | 72 -- scripts/dump_lfts.sh.in| 12 +++ scripts/dump_mfts.sh | 72 -- scripts/dump_mfts.sh.in| 12 +++ 12 files changed, 33 insertions(+), 633 deletions(-) delete mode 100644 doc/man/dump_lfts.8.in delete mode 100644 doc/man/dump_mfts.8.in delete mode 100644 doc/rst/dump_lfts.8.in.rst delete mode 100644 doc/rst/dump_mfts.8.in.rst create mode 100644 man/dump_lfts.8 create mode 100644 man/dump_mfts.8 delete mode 100755 scripts/dump_lfts.sh create mode 100755 scripts/dump_lfts.sh.in delete mode 100755 scripts/dump_mfts.sh create mode 100755 scripts/dump_mfts.sh.in diff --git a/Makefile.am b/Makefile.am index 42c2c75..f44b4d6 100644 --- a/Makefile.am +++ b/Makefile.am @@ -47,8 +47,6 @@ man_MANS = doc/man/ibaddr.8 \ doc/man/ibccconfig.8 \ doc/man/ibccquery.8 \ doc/man/dump_fts.8 \ - doc/man/dump_lfts.8 \ - doc/man/dump_mfts.8 \ doc/man/iblinkinfo.8 \ doc/man/ibfindnodesusing.8 \ doc/man/ibhosts.8 \ @@ -71,7 +69,9 @@ man_MANS = doc/man/ibaddr.8 \ doc/man/smpdump.8 \ doc/man/smpquery.8 \ doc/man/vendstat.8 \ - doc/man/infiniband-diags.8 + doc/man/infiniband-diags.8 \ + man/dump_lfts.8 \ + man/dump_mfts.8 # define this for the dist target compat_man_pages = man/ibdiscover.8 man/ibcheckerrors.8 man/ibcheckerrs.8 \ diff --git a/configure.in b/configure.in index b54222b..edac1e3 100644 --- a/configure.in +++ b/configure.in @@ -216,14 +216,14 @@ AC_CONFIG_FILES([\ scripts/ibrouters \ scripts/iblinkinfo.pl \ scripts/ibqueryerrors.pl \ + scripts/dump_lfts.sh \ + scripts/dump_mfts.sh \ doc/man/ibaddr.8 \ doc/man/check_lft_balance.8 \ doc/man/ibcacheedit.8 \ doc/man/ibccconfig.8 \ doc/man/ibccquery.8 \ doc/man/dump_fts.8 \ - doc/man/dump_lfts.8 \ - doc/man/dump_mfts.8 \ doc/man/ibhosts.8 \ doc/man/ibidsverify.8 \ doc/man/iblinkinfo.8 
\ diff --git a/doc/man/dump_lfts.8.in b/doc/man/dump_lfts.8.in deleted file mode 100644 index a75a425..000 --- a/doc/man/dump_lfts.8.in +++ /dev/null @@ -1,177 +0,0 @@ -.\" Man page generated from reStructeredText. -. -.TH DUMP_LFTS.SH 8 "@BUILD_DATE@" "" "OpenIB Diagnostics" -.SH NAME -DUMP_LFTS.SH \- dump InfiniBand unicast forwarding tables -. -.nr rst2man-indent-level 0 -. -.de1 rstReportMargin -\\$1 \\n[an-margin] -level \\n[rst2man-indent-level] -level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] -- -\\n[rst2man-indent0] -\\n[rst2man-indent1] -\\n[rst2man-indent2] -.. -.de1 INDENT -.\" .rstReportMargin pre: -. RS \\$1 -. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] -. nr rst2man-indent-level +1 -.\" .rstReportMargin post: -.. -.de UNINDENT -. RE -.\" indent \\n[an-margin] -.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] -.nr rst2man-indent-level -1 -.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] -.in \\n[rst2man-indent\\n[rst2man-indent-level]]u -.. -.SH SYNOPSIS -.sp -dump_lfts.sh [\-h] [\-D] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [>/path/to/dump\-file] -.SH DESCRIPTION -.sp -dump_lfts.sh is a script which dumps the InfiniBand unciast forwarding -tables (MFTs) in the switch nodes in the subnet. -.sp -The dump file format is compatible with loading into OpenSM using -the \-R file \-U /path/to/dump\-file syntax. -.SH OPTIONS -.sp -\fB\-D\fP -dump forwarding tables using direct routed rather than LID routed SMPs -.sp -\fB\-h\fP -show help -.SS Port Selection flags -.\" Define the common option -C -. -.sp -\fB\-C, \-\-Ca \fPuse the specified ca_name. -.\" Define the common option -P -. -.sp -\fB\-P, \-\-Port \fPuse the specified ca_port. -.\" Explanation of local port selection -. 
-.SS Local port Selection -.sp -Multiple port/Multiple CA support: when no IB device or port is specified -(see the "local umad parameters" below), the libibumad library -selects the port to use by the following criteria: -.INDENT 0.0 -.INDENT 3.5 -.INDENT 0.0 -.IP 1. 3 -. -the first port that is ACTIVE. -.IP 2. 3 -. -if not found, the first port that is UP (physical link up). -.UNINDENT -.sp -If a port and/or CA name is specified, the libibumad library attempts -to fulfill the user request, and will fail if it is not possible. -.sp -For example: -.sp -.nf -.ft C -ibaddr
[PATCH V3 2/3] infiniband-diags: add dump_fts tool
dump_fts adds a faster version of the functionality of dump_[l|m]fts.sh.

This code is based on the ibroute code and simply uses libibnetdisc to scan the fabric, instead of using ibnetdiscover and letting ibroute requery all that data over again.

This improves things in 3 ways:

1) Performance improves by nearly 2 orders of magnitude.
2) This version greatly reduces the MADs required and thus reduces the impact on the fabric.
3) Everything is queried with DR paths, which ensures that if the routing tables are bad on the cluster the query will still complete and give you the information you were looking for. (To be fair, dump_lfts.sh has the DR option but it is currently buggy.)

Example runs on the ~1400 nodes of the Hyperion test cluster show:

13:45:46 > time ./dump_lfts.sh > /dev/null

real	4m58.175s
user	0m6.407s
sys	0m17.983s

13:53:12 > time ./dump_fts > /dev/null
dump tables: linear forwarding table get failed

real	0m8.121s
user	0m3.032s
sys	0m3.342s

Changes from V1:
	Add status and query information to error messages.
	Add man page files which were missed in the first patch.

Changes from V2:
	Additional status and address information on error messages.
	Clean up error handling when fabric scan fails.

Signed-off-by: Ira Weiny --- Makefile.am |7 +- configure.in |1 + doc/man/dump_fts.8.in | 236 ++ doc/rst/dump_fts.8.in.rst | 85 infiniband-diags.spec.in |2 + src/dump_fts.c| 489 + 6 files changed, 819 insertions(+), 1 deletions(-) create mode 100644 doc/man/dump_fts.8.in create mode 100644 doc/rst/dump_fts.8.in.rst create mode 100644 src/dump_fts.c diff --git a/Makefile.am b/Makefile.am index a35a432..42c2c75 100644 --- a/Makefile.am +++ b/Makefile.am @@ -15,7 +15,8 @@ sbin_PROGRAMS = src/ibaddr src/ibnetdiscover src/ibping src/ibportstate \ src/perfquery src/sminfo src/smpdump src/smpquery \ src/saquery src/vendstat src/iblinkinfo \ src/ibqueryerrors src/ibcacheedit src/ibccquery \ - src/ibccconfig + src/ibccconfig \ + src/dump_fts if ENABLE_TEST_UTILS sbin_PROGRAMS += src/ibsendtrap src/mcm_rereg_test @@ -45,6 +46,7 @@ man_MANS = doc/man/ibaddr.8 \ doc/man/ibcacheedit.8 \ doc/man/ibccconfig.8 \ doc/man/ibccquery.8 \ + doc/man/dump_fts.8 \ doc/man/dump_lfts.8 \ doc/man/dump_mfts.8 \ doc/man/iblinkinfo.8 \ @@ -118,6 +120,9 @@ src_ibqueryerrors_LDFLAGS = -L$(top_builddir)/libibnetdisc -libnetdisc src_ibcacheedit_SOURCES = src/ibcacheedit.c src_ibcacheedit_LDFLAGS = -L$(top_builddir)/libibnetdisc -libnetdisc +src_dump_fts_SOURCES = src/dump_fts.c +src_dump_fts_LDFLAGS = -L$(top_builddir)/libibnetdisc -libnetdisc + BUILT_SOURCES = ibdiag_version ibdiag_version: if [ -x $(top_srcdir)/gen_ver.sh ] ; then \ diff --git a/configure.in b/configure.in index ca62d5b..b54222b 100644 --- a/configure.in +++ b/configure.in @@ -221,6 +221,7 @@ AC_CONFIG_FILES([\ doc/man/ibcacheedit.8 \ doc/man/ibccconfig.8 \ doc/man/ibccquery.8 \ + doc/man/dump_fts.8 \ doc/man/dump_lfts.8 \ doc/man/dump_mfts.8 \ doc/man/ibhosts.8 \ diff --git a/doc/man/dump_fts.8.in b/doc/man/dump_fts.8.in new file mode 100644 index 000..a64c6da --- /dev/null +++ b/doc/man/dump_fts.8.in @@ -0,0 +1,236 @@ +.\" Man page generated from reStructeredText. +. 
+.TH DUMP_FTS 8 "@BUILD_DATE@" "" "OpenIB Diagnostics" +.SH NAME +DUMP_FTS \- dump InfiniBand forwarding tables +. +.nr rst2man-indent-level 0 +. +.de1 rstReportMargin +\\$1 \\n[an-margin] +level \\n[rst2man-indent-level] +level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] +- +\\n[rst2man-indent0] +\\n[rst2man-indent1] +\\n[rst2man-indent2] +.. +.de1 INDENT +.\" .rstReportMargin pre: +. RS \\$1 +. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] +. nr rst2man-indent-level +1 +.\" .rstReportMargin post: +.. +.de UNINDENT +. RE +.\" indent \\n[an-margin] +.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] +.nr rst2man-indent-level -1 +.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] +.in \\n[rst2man-indent\\n[rst2man-indent-level]]u +.. +.SH SYNOPSIS +.sp +dump_fts [options] [ []] +.SH DESCRIPTION +.sp +dump_fts is similar to ibroute but dumps tables for every switch found in an +ibnetdiscover scan of the subnet. +.sp +The dump file format is compatible with loading into OpenSM using +the \-R file \-U /path/to/dump\-file syntax. +.SH OPTIONS +.INDENT 0.0 +.TP +.B \fB\-a, \-\-all\fP +.sp +show all lids in range, even invalid entries +.TP +.B \fB\-n, \-\-no_dests\fP +.sp +do not try to resolve destinations +.TP +.B \fB\-M, \-\-Multicast\fP +.sp +show multicast forwarding tables +In this case, the range par
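
As a quick check of the timing figures quoted in the dump_fts patch description, the sketch below converts the two "real" values to seconds and divides them. The helper names time_to_seconds() and speedup() are made up for illustration and are not part of infiniband-diags. For this particular run the ratio comes out at roughly 37x; the dump_fts run also printed a "linear forwarding table get failed" message, so it may not be fully representative.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a shell `time` value such as "4m58.175s" to seconds.
 * Returns a negative value if the string does not match that shape.
 * (Hypothetical helper, for illustration only.) */
static double time_to_seconds(const char *s)
{
	int minutes;
	double seconds;

	if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2)
		return -1.0;
	return minutes * 60.0 + seconds;
}

/* Ratio of the old wall-clock time to the new one. */
static double speedup(const char *old_real, const char *new_real)
{
	double t_old = time_to_seconds(old_real);
	double t_new = time_to_seconds(new_real);

	assert(t_old > 0.0 && t_new > 0.0);
	return t_old / t_new;
}
```

Calling speedup("4m58.175s", "0m8.121s") divides 298.175 s by 8.121 s, which gives a ratio of about 36.7.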
[PATCH V3 1/3] infiniband-diags: libibnetdisc add find node by lid
NOTE: this change adds a glib requirement to the package. Changes since V1: Use GINT_TO_POINTER rather than allocating keys Changes since V2: Use internal object everywhere rather than just hacked into discover_fabric Generate lid2port hash when reading cached fabrics Signed-off-by: Ira Weiny --- configure.in|7 ++ infiniband-diags.spec.in|4 +- libibnetdisc/Makefile.am|4 +- libibnetdisc/include/infiniband/ibnetdisc.h |3 + libibnetdisc/libibnetdisc.ver |2 +- libibnetdisc/src/ibnetdisc.c| 126 ++ libibnetdisc/src/ibnetdisc_cache.c | 32 libibnetdisc/src/internal.h | 15 +++- libibnetdisc/src/libibnetdisc.map |1 + 9 files changed, 132 insertions(+), 62 deletions(-) diff --git a/configure.in b/configure.in index 2dc60a0..ca62d5b 100644 --- a/configure.in +++ b/configure.in @@ -161,6 +161,13 @@ IBSCRIPTPATH_TMP2="`echo $IBSCRIPTPATH_TMP1 | sed 's/^NONE/$ac_default_prefix/'` IBSCRIPTPATH="${with_ibpath_override:-`eval echo $IBSCRIPTPATH_TMP2`}" AC_SUBST(IBSCRIPTPATH) +dnl check for glib +PKG_CHECK_MODULES([GLIB], [glib-2.0], ac_glib=yes, ac_glib=no) +AM_CONDITIONAL([HAVE_GLIB], test "$ac_glib" = "yes") +if test "$ac_glib" = "yes"; then + AC_DEFINE([HAVE_GLIB], 1, [Define to 1 to indicate GLIB support]) +fi + dnl Begin libibnetdisc stuff ibnetdisc_api_version=`grep LIBVERSION $srcdir/libibnetdisc/libibnetdisc.ver | sed 's/LIBVERSION=//'` if test -z $ibnetdisc_api_version; then diff --git a/infiniband-diags.spec.in b/infiniband-diags.spec.in index d3fcd13..9cd195b 100644 --- a/infiniband-diags.spec.in +++ b/infiniband-diags.spec.in @@ -11,8 +11,8 @@ Group: System Environment/Libraries BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) Source: http://www.openfabrics.org/downloads/management/@TARBALL@ Url: http://openfabrics.org/ -BuildRequires: libibmad-devel, opensm-devel, libibumad-devel -Requires: libibmad, opensm-libs, libibumad +BuildRequires: libibmad-devel, opensm-devel, libibumad-devel, glib-devel +Requires: libibmad, opensm-libs, libibumad, glib 
Provides: perl(IBswcountlimits) Obsoletes: openib-diags diff --git a/libibnetdisc/Makefile.am b/libibnetdisc/Makefile.am index fbf0e60..d05604f 100644 --- a/libibnetdisc/Makefile.am +++ b/libibnetdisc/Makefile.am @@ -24,10 +24,10 @@ endif libibnetdisc_la_SOURCES = src/ibnetdisc.c src/ibnetdisc_cache.c src/chassis.c \ src/chassis.h src/internal.h src/query_smp.c -libibnetdisc_la_CFLAGS = -Wall $(DBGFLAGS) +libibnetdisc_la_CFLAGS = -Wall $(DBGFLAGS) $(GLIB_CFLAGS) libibnetdisc_la_LDFLAGS = -version-info $(ibnetdisc_api_version) \ -export-dynamic $(libibnetdisc_version_script) \ - -libmad + -libmad $(GLIB_LIBS) libibnetdisc_la_DEPENDENCIES = $(srcdir)/src/libibnetdisc.map libibnetdiscincludedir = $(includedir)/infiniband diff --git a/libibnetdisc/include/infiniband/ibnetdisc.h b/libibnetdisc/include/infiniband/ibnetdisc.h index e41c92c..acde1dc 100644 --- a/libibnetdisc/include/infiniband/ibnetdisc.h +++ b/libibnetdisc/include/infiniband/ibnetdisc.h @@ -231,6 +231,9 @@ IBND_EXPORT ibnd_port_t *ibnd_find_port_guid(ibnd_fabric_t * fabric, uint64_t guid); IBND_EXPORT ibnd_port_t *ibnd_find_port_dr(ibnd_fabric_t * fabric, char *dr_str); +IBND_EXPORT ibnd_port_t *ibnd_find_port_lid(ibnd_fabric_t * fabric, + uint16_t lid); + typedef void (*ibnd_iter_port_func_t) (ibnd_port_t * port, void *user_data); IBND_EXPORT void ibnd_iter_ports(ibnd_fabric_t * fabric, ibnd_iter_port_func_t func, void *user_data); diff --git a/libibnetdisc/libibnetdisc.ver b/libibnetdisc/libibnetdisc.ver index c513f2a..59fca19 100644 --- a/libibnetdisc/libibnetdisc.ver +++ b/libibnetdisc/libibnetdisc.ver @@ -6,4 +6,4 @@ # API_REV - advance on any added API # RUNNING_REV - advance any change to the vendor files # AGE - number of backward versions the API still supports -LIBVERSION=7:0:2 +LIBVERSION=8:0:3 diff --git a/libibnetdisc/src/ibnetdisc.c b/libibnetdisc/src/ibnetdisc.c index 3a7dd8f..9d120dd 100644 --- a/libibnetdisc/src/ibnetdisc.c +++ b/libibnetdisc/src/ibnetdisc.c @@ -98,10 +98,10 @@ static int 
add_port_to_dpath(ib_dr_path_t * path, int nextport) static int retract_dpath(smp_engine_t * engine, ib_portid_t * portid) { ibnd_scan_t *scan = engine->user_data; - ibnd_fabric_t *fabric = scan->fabric; + f_internal_t *f_int = scan->f_int; if (scan->cfg->max_hops && - fabric->maxhops_discovered > scan->cfg->max_hops) + f_int->fabric.maxhops_discovered > scan->cfg->max_hops) return 0; /* this may seem wrong but the only time we would r
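
The patch description above mentions a lid2port hash and using GINT_TO_POINTER rather than allocating keys. The idea behind that macro is to store the integer in the pointer value itself, so that no per-key allocation is needed. The following is only a minimal sketch of that idea in plain C, without glib: struct port, struct lid_map and the lid_map_* functions are hypothetical stand-ins, not the libibnetdisc types or the code added by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a port object; not ibnd_port_t. */
struct port {
	uint16_t base_lid;
	const char *name;
};

#define NBUCKETS 256

struct entry {
	void *key;		/* LID packed into a pointer value */
	struct port *port;
	struct entry *next;
};

struct lid_map {
	struct entry *bucket[NBUCKETS];
};

/* Pack the LID into the pointer value itself, which is the same idea
 * as glib's GINT_TO_POINTER: the key is never dereferenced, so nothing
 * has to be allocated for it. */
static void *lid_to_key(uint16_t lid)
{
	return (void *)(uintptr_t)lid;
}

static unsigned hash_key(void *key)
{
	return (unsigned)((uintptr_t)key % NBUCKETS);
}

/* Insert a port under its base LID. Returns 0 on success, -1 on OOM. */
static int lid_map_add(struct lid_map *m, struct port *p)
{
	struct entry *e;
	unsigned h;

	assert(m != NULL && p != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->key = lid_to_key(p->base_lid);
	e->port = p;
	h = hash_key(e->key);
	e->next = m->bucket[h];
	m->bucket[h] = e;
	return 0;
}

/* Look up a port by LID; returns NULL if the LID is unknown. */
static struct port *lid_map_find(const struct lid_map *m, uint16_t lid)
{
	void *key = lid_to_key(lid);
	struct entry *e;

	for (e = m->bucket[hash_key(key)]; e; e = e->next)
		if (e->key == key)
			return e->port;
	return NULL;
}

/* Free all entries (the ports themselves are owned by the caller). */
static void lid_map_clear(struct lid_map *m)
{
	for (int i = 0; i < NBUCKETS; i++) {
		while (m->bucket[i]) {
			struct entry *e = m->bucket[i];

			m->bucket[i] = e->next;
			free(e);
		}
	}
}
```

A lookup in the style of ibnd_find_port_lid(fabric, lid) would then reduce to a single lid_map_find() call instead of a walk over every port in the fabric.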
Re: NFS over RDMA crashing
On 2/6/2013 4:24 PM, J. Bruce Fields wrote: On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: When killing mount command that got stuck: --- BUG: unable to handle kernel paging request at 880324dc7ff8 IP: [] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161 Oops: 0003 [#1] PREEMPT SMP Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6 Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH RIP: 0010:[] [] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP: 0018:880324c3dbf8 EFLAGS: 00010297 RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428 RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618 RBP: 880324c3dd78 R08: 60f9c860 R09: 0001 R10: 880324dd8000 R11: 0001 R12: 8806299dcb10 R13: 0003 R14: 0001 R15: 0010 FS: () GS:88063fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process nfsd (pid: 4744, threadinfo 880324c3c000, task 88033055) Stack: 880324c3dc78 880324c3dcd8 0282 880631cec000 880324dd8000 88062ed33040 000124c3dc48 880324dd8000 88062ed33058 880630ce2b90 8806299e8000 0003 Call Trace: [] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma] [] ? try_to_wake_up+0x2f0/0x2f0 [] svc_recv+0x3ef/0x4b0 [sunrpc] [] ? nfsd_svc+0x740/0x740 [nfsd] [] nfsd+0xad/0x130 [nfsd] [] ? 
nfsd_svc+0x740/0x740 [nfsd] [] kthread+0xd6/0xe0 [] ? __init_kthread_worker+0x70/0x70 [] ret_from_fork+0x7c/0xb0 [] ? __init_kthread_worker+0x70/0x70 Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP [] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP CR2: 880324dc7ff8 ---[ end trace 06d0384754e9609a ]--- It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer" is responsible for the crash (it seems to be crashing in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527) It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet. When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the reset of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2). OK, so this part's clearly my fault--I'll work on a patch, but the rdma's use of the ->rq_pages array is pretty confusing. Maybe Tom can shed some light? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NFS over RDMA crashing
On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote: > When killing mount command that got stuck: > --- > > BUG: unable to handle kernel paging request at 880324dc7ff8 > IP: [] rdma_read_xdr+0x8bb/0xd40 [svcrdma] > PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161 > Oops: 0003 [#1] PREEMPT SMP > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock > target_core_file target_core_pscsi target_core_mod configfs 8021q > bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net > macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel > kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core > ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core > mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod > CPU 6 > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro > X8DTH-i/6/iF/6F/X8DTH > RIP: 0010:[] [] > rdma_read_xdr+0x8bb/0xd40 [svcrdma] > RSP: 0018:880324c3dbf8 EFLAGS: 00010297 > RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428 > RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618 > RBP: 880324c3dd78 R08: 60f9c860 R09: 0001 > R10: 880324dd8000 R11: 0001 R12: 8806299dcb10 > R13: 0003 R14: 0001 R15: 0010 > FS: () GS:88063fc0() knlGS: > CS: 0010 DS: ES: CR0: 8005003b > CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0 > DR0: DR1: DR2: > DR3: DR6: 0ff0 DR7: 0400 > Process nfsd (pid: 4744, threadinfo 880324c3c000, task 88033055) > Stack: > 880324c3dc78 880324c3dcd8 0282 880631cec000 > 880324dd8000 88062ed33040 000124c3dc48 880324dd8000 > 88062ed33058 880630ce2b90 8806299e8000 0003 > Call Trace: > [] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma] > [] ? try_to_wake_up+0x2f0/0x2f0 > [] svc_recv+0x3ef/0x4b0 [sunrpc] > [] ? 
nfsd_svc+0x740/0x740 [nfsd] > [] nfsd+0xad/0x130 [nfsd] > [] ? nfsd_svc+0x740/0x740 [nfsd] > [] kthread+0xd6/0xe0 > [] ? __init_kthread_worker+0x70/0x70 > [] ret_from_fork+0x7c/0xb0 > [] ? __init_kthread_worker+0x70/0x70 > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 > <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 > RIP [] rdma_read_xdr+0x8bb/0xd40 [svcrdma] > RSP > CR2: 880324dc7ff8 > ---[ end trace 06d0384754e9609a ]--- > > > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer" > is responsible for the crash (it seems to be crashing in > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527) > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet. > > When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I > was no longer getting the server crashes, > so the reset of my tests were done using that point (it is somewhere > in the middle of 3.7.0-rc2). OK, so this part's clearly my fault--I'll work on a patch, but the rdma's use of the ->rq_pages array is pretty confusing. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH for 3.8 v3, resend 0/3] IB/SRP patches for kernel 3.8
Bart Van Assche wrote:
On 02/05/13 21:54, Or Gerlitz wrote:
On Tue, Feb 5, 2013 at 6:25 PM, Bart Van Assche wrote:
On 02/04/13 22:11, Or Gerlitz wrote:

Bart, I'd like to sharpen the point: could you please clarify whether the series posted to linux-rdma stands on its own, in the sense that SRP HA scheme X (please state it) now works, or works better, when the patches are applied on top of the latest 3.8-rc cut? Or, for X to work or do better, does one need this series AND the one you posted to linux-scsi?

Hello Or,

A huge number of patches have been taken upstream between 3.8-rc1 and 3.8-rc6. I have retested these three patches with 3.8-rc6 and would appreciate it if you would also repeat your tests.

Thanks,
Bart.

Hello Bart,

I tested your 3.8 v3 patchset. I did the following:
- cloned and checked out the for-next branch of Roland's ib tree
- applied Bart's 3.8 v3 patchset
- applied the "save & restore host_scribble during error handling" patch: http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg17809.html

I have two paths to the target, through ports 1 and 2 (scsi_host host9 and host10).

- ran I/Os
- disabled port 1 @ 19:11:30
- error recovery for host9 kicked in @ 19:12:04
- multipath removed the path, I/Os failed over @ 19:12:51
- error recovery was still going on with host9 (sysfs entry for host9 still intact)
- enabled port 1 @ 19:15:00
- host9 reconnected to the target through error recovery, and the multipathd module reinstated the path in the kernel; then host9 was REMOVED, and usermode "multipath -l" did not show the reinstated path through host9

Feb 6 19:15:04 vsa30 kernel: scsi host9: SRP abort called
Feb 6 19:15:05 vsa30 multipathd: overflow in attribute '/sys/devices/pci:00/:00:02.0/:02:00.0/host9/target9:0:0/9:0:0:2/state'
Feb 6 19:15:14 vsa30 kernel: scsi host9: SRP abort called
Feb 6 19:15:14 vsa30 kernel: scsi host9: SRP reset_device called
Feb 6 19:15:14 vsa30 kernel: scsi host9: ib_srp: SRP reset_host called
Feb 6 19:15:14 vsa30 kernel: scsi host9: ib_srp: reconnect succeeded
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c440050a522180003: sdd - tur checker reports path is up
Feb 6 19:15:26 vsa30 multipathd: 8:48: reinstated
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c440050a522180003: remaining active paths: 2
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c440050a522180002: sdc - tur checker reports path is up
Feb 6 19:15:26 vsa30 multipathd: 8:32: reinstated
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c440050a522180002: remaining active paths: 2
Feb 6 19:15:26 vsa30 multipathd: sdc: remove path (uevent)
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c440050a522180002: load table [0 409600 multipath 0 0 1 1 round-robin 0 1 1 8:80 1]
Feb 6 19:15:26 vsa30 multipathd: sdc: path removed from map 3600144f0665c440050a522180002
Feb 6 19:15:26 vsa30 kernel: sd 9:0:0:1: [sdc] Synchronizing SCSI cache
Feb 6 19:15:26 vsa30 multipathd: sdd: remove path (uevent)
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c440050a522180003: load table [0 409600 multipath 0 0 1 1 round-robin 0 1 1 8:96 1]
Feb 6 19:15:26 vsa30 multipathd: sdd: path removed from map 3600144f0665c440050a522180003
Feb 6 19:15:26 vsa30 kernel: sd 9:0:0:2: [sdd] Synchronizing SCSI cache

- disabled port 2 @ 19:22:50
- error recovery kicked in on host10 @ 19:23:40
- I/Os failed with NO path to target @ 19:24:27
- without port 2 being re-enabled, error recovery kept running on host10 until 19:57:52 and then stopped
- host10 was still in sysfs (/sys/class/scsi_host/host10) and still holding a reference on the ib_srp module
- enabled port 2: nothing happened

Conclusion:
1. If the port/path is disabled long enough (>35 minutes), we are left with a dangling scsi host.
2. If the port is enabled within 30 minutes, the scsi host re-establishes the connection and the path is reinstated, but then the scsi_host is removed (no entry in sysfs).

I attached a log here to show what happened above.

thanks,
-vu

Attachment: messages.bz2 (binary data)
[PATCH 38/77] IB/core: convert to idr_alloc()
Convert to the much saner new idr interface. Only compile tested. v2: Mike triggered WARN_ON() in idr_preload() because send_mad(), which may be used from non-process context, was calling idr_preload() unconditionally. Preload iff @gfp_mask has __GFP_WAIT. Signed-off-by: Tejun Heo Reviewed-by: Sean Hefty Reported-by: "Marciniszyn, Mike" Cc: Roland Dreier Cc: Sean Hefty Cc: Hal Rosenstock Cc: linux-rdma@vger.kernel.org --- drivers/infiniband/core/cm.c | 22 +++--- drivers/infiniband/core/cma.c| 24 +++- drivers/infiniband/core/sa_query.c | 18 ++ drivers/infiniband/core/ucm.c| 16 drivers/infiniband/core/ucma.c | 32 drivers/infiniband/core/uverbs_cmd.c | 17 - 6 files changed, 48 insertions(+), 81 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 394fea2..98281fe 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -382,20 +382,21 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av) static int cm_alloc_id(struct cm_id_private *cm_id_priv) { unsigned long flags; - int ret, id; + int id; static int next_id; - do { - spin_lock_irqsave(&cm.lock, flags); - ret = idr_get_new_above(&cm.local_id_table, cm_id_priv, - next_id, &id); - if (!ret) - next_id = ((unsigned) id + 1) & MAX_IDR_MASK; - spin_unlock_irqrestore(&cm.lock, flags); - } while( (ret == -EAGAIN) && idr_pre_get(&cm.local_id_table, GFP_KERNEL) ); + idr_preload(GFP_KERNEL); + spin_lock_irqsave(&cm.lock, flags); + + id = idr_alloc(&cm.local_id_table, cm_id_priv, next_id, 0, GFP_NOWAIT); + if (id >= 0) + next_id = ((unsigned) id + 1) & MAX_IDR_MASK; + + spin_unlock_irqrestore(&cm.lock, flags); + idr_preload_end(); cm_id_priv->id.local_id = (__force __be32)id ^ cm.random_id_operand; - return ret; + return id < 0 ? 
id : 0; } static void cm_free_id(__be32 local_id) @@ -3844,7 +3845,6 @@ static int __init ib_cm_init(void) cm.remote_sidr_table = RB_ROOT; idr_init(&cm.local_id_table); get_random_bytes(&cm.random_id_operand, sizeof cm.random_id_operand); - idr_pre_get(&cm.local_id_table, GFP_KERNEL); INIT_LIST_HEAD(&cm.timewait_list); ret = class_register(&cm_class); diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d789eea..c32eeaa 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2143,33 +2143,23 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv, unsigned short snum) { struct rdma_bind_list *bind_list; - int port, ret; + int ret; bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL); if (!bind_list) return -ENOMEM; - do { - ret = idr_get_new_above(ps, bind_list, snum, &port); - } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL)); - - if (ret) - goto err1; - - if (port != snum) { - ret = -EADDRNOTAVAIL; - goto err2; - } + ret = idr_alloc(ps, bind_list, snum, snum + 1, GFP_KERNEL); + if (ret < 0) + goto err; bind_list->ps = ps; - bind_list->port = (unsigned short) port; + bind_list->port = (unsigned short)ret; cma_bind_port(bind_list, id_priv); return 0; -err2: - idr_remove(ps, port); -err1: +err: kfree(bind_list); - return ret; + return ret == -ENOSPC ? 
-EADDRNOTAVAIL : ret; } static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv) diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index a8905ab..934f45e 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -611,19 +611,21 @@ static void init_mad(struct ib_sa_mad *mad, struct ib_mad_agent *agent) static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask) { + bool preload = gfp_mask & __GFP_WAIT; unsigned long flags; int ret, id; -retry: - if (!idr_pre_get(&query_idr, gfp_mask)) - return -ENOMEM; + if (preload) + idr_preload(gfp_mask); spin_lock_irqsave(&idr_lock, flags); - ret = idr_get_new(&query_idr, query, &id); + + id = idr_alloc(&query_idr, query, 0, 0, GFP_NOWAIT); + spin_unlock_irqrestore(&idr_lock, flags); - if (ret == -EAGAIN) - goto retry; - if (ret) - return ret; + if (preload) + idr_preload_end(); + if (id < 0) + re
[PATCH 40/77] IB/cxgb3: convert to idr_alloc()
Convert to the much saner new idr interface.

Only compile tested.

Signed-off-by: Tejun Heo
Reviewed-by: Steve Wise
Cc: linux-rdma@vger.kernel.org
---
 drivers/infiniband/hw/cxgb3/iwch.h | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index a1c4457..8378622 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -153,19 +153,17 @@ static inline int insert_handle(struct iwch_dev *rhp, struct idr *idr,
 				void *handle, u32 id)
 {
 	int ret;
-	int newid;
-
-	do {
-		if (!idr_pre_get(idr, GFP_KERNEL)) {
-			return -ENOMEM;
-		}
-		spin_lock_irq(&rhp->lock);
-		ret = idr_get_new_above(idr, handle, id, &newid);
-		BUG_ON(newid != id);
-		spin_unlock_irq(&rhp->lock);
-	} while (ret == -EAGAIN);
-
-	return ret;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock_irq(&rhp->lock);
+
+	ret = idr_alloc(idr, handle, id, id + 1, GFP_NOWAIT);
+
+	spin_unlock_irq(&rhp->lock);
+	idr_preload_end();
+
+	BUG_ON(ret == -ENOSPC);
+	return ret < 0 ? ret : 0;
 }
 
 static inline void remove_handle(struct iwch_dev *rhp, struct idr *idr, u32 id)
-- 
1.8.1
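
The conversions in this series rely on the range arguments of idr_alloc(): an ID is taken from [start, end), and an end of 0 or less means no upper limit, so passing (id, id + 1) asks for exactly one specific ID and fails with -ENOSPC if it is taken. The sketch below is a userspace mock of that return convention only, written under the assumption that a small fixed table is enough for illustration. mock_idr, mock_idr_alloc() and mock_insert_handle() are hypothetical names; there is no locking, preloading or gfp handling here, and this is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_MAX_IDS 64

/* Hypothetical fixed-size stand-in for a struct idr. */
struct mock_idr {
	void *slot[MOCK_MAX_IDS];
};

/* Mimic the return convention of idr_alloc(): the smallest free ID in
 * [start, end) on success, -ENOSPC if the range has no free ID.
 * end <= 0 means "no upper bound" (here: the table size). */
static int mock_idr_alloc(struct mock_idr *idr, void *ptr, int start, int end)
{
	assert(ptr != NULL && start >= 0);
	if (end <= 0 || end > MOCK_MAX_IDS)
		end = MOCK_MAX_IDS;
	for (int id = start; id < end; id++) {
		if (!idr->slot[id]) {
			idr->slot[id] = ptr;
			return id;
		}
	}
	return -ENOSPC;
}

/* Request exactly `id` by passing the one-element range [id, id + 1),
 * then fold the result into the usual 0 / negative-errno convention. */
static int mock_insert_handle(struct mock_idr *idr, void *handle, int id)
{
	int ret = mock_idr_alloc(idr, handle, id, id + 1);

	return ret < 0 ? ret : 0;
}
```

With this shape, a second insert at an ID that is already in use returns -ENOSPC. The converted insert_handle() treats that case as a bug via BUG_ON(), while cma_alloc_port() in the IB/core patch translates it to -EADDRNOTAVAIL.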
[PATCH 39/77] IB/amso1100: convert to idr_alloc()
Convert to the much saner new idr interface.

Only compile tested.

Signed-off-by: Tejun Heo
Reviewed-by: Steve Wise
Cc: Tom Tucker
Cc: linux-rdma@vger.kernel.org
---
 drivers/infiniband/hw/amso1100/c2_qp.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c
index 28cd5cb..0ab826b 100644
--- a/drivers/infiniband/hw/amso1100/c2_qp.c
+++ b/drivers/infiniband/hw/amso1100/c2_qp.c
@@ -382,14 +382,17 @@ static int c2_alloc_qpn(struct c2_dev *c2dev, struct c2_qp *qp)
 {
 	int ret;
 
-	do {
-		spin_lock_irq(&c2dev->qp_table.lock);
-		ret = idr_get_new_above(&c2dev->qp_table.idr, qp,
-					c2dev->qp_table.last++, &qp->qpn);
-		spin_unlock_irq(&c2dev->qp_table.lock);
-	} while ((ret == -EAGAIN) &&
-		 idr_pre_get(&c2dev->qp_table.idr, GFP_KERNEL));
-	return ret;
+	idr_preload(GFP_KERNEL);
+	spin_lock_irq(&c2dev->qp_table.lock);
+
+	ret = idr_alloc(&c2dev->qp_table.idr, qp, c2dev->qp_table.last++, 0,
+			GFP_NOWAIT);
+	if (ret >= 0)
+		qp->qpn = ret;
+
+	spin_unlock_irq(&c2dev->qp_table.lock);
+	idr_preload_end();
+	return ret < 0 ? ret : 0;
 }
 
 static void c2_free_qpn(struct c2_dev *c2dev, int qpn)
-- 
1.8.1
[PATCH 43/77] IB/ipath: convert to idr_alloc()
Convert to the much saner new idr interface.

Only compile tested.

Signed-off-by: Tejun Heo
Cc: Mike Marciniszyn
Cc: linux-rdma@vger.kernel.org
---
 drivers/infiniband/hw/ipath/ipath_driver.c | 16
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index 7b371f5..fcdaeea 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -194,11 +194,6 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev)
 	struct ipath_devdata *dd;
 	int ret;
 
-	if (!idr_pre_get(&unit_table, GFP_KERNEL)) {
-		dd = ERR_PTR(-ENOMEM);
-		goto bail;
-	}
-
 	dd = vzalloc(sizeof(*dd));
 	if (!dd) {
 		dd = ERR_PTR(-ENOMEM);
@@ -206,9 +201,10 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev)
 	}
 	dd->ipath_unit = -1;
 
+	idr_preload(GFP_KERNEL);
 	spin_lock_irqsave(&ipath_devs_lock, flags);
 
-	ret = idr_get_new(&unit_table, dd, &dd->ipath_unit);
+	ret = idr_alloc(&unit_table, dd, 0, 0, GFP_NOWAIT);
 	if (ret < 0) {
 		printk(KERN_ERR IPATH_DRV_NAME
 		       ": Could not allocate unit ID: error %d\n", -ret);
@@ -216,6 +212,7 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev)
 		dd = ERR_PTR(ret);
 		goto bail_unlock;
 	}
+	dd->ipath_unit = ret;
 
 	dd->pcidev = pdev;
 	pci_set_drvdata(pdev, dd);
@@ -224,7 +221,7 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev)
 
 bail_unlock:
 	spin_unlock_irqrestore(&ipath_devs_lock, flags);
-
+	idr_preload_end();
 bail:
 	return dd;
 }
@@ -2503,11 +2500,6 @@ static int __init infinipath_init(void)
 	 * the PCI subsystem.
 	 */
 	idr_init(&unit_table);
-	if (!idr_pre_get(&unit_table, GFP_KERNEL)) {
-		printk(KERN_ERR IPATH_DRV_NAME ": idr_pre_get() failed\n");
-		ret = -ENOMEM;
-		goto bail;
-	}
 
 	ret = pci_register_driver(&ipath_driver);
 	if (ret < 0) {
-- 
1.8.1
[PATCH 46/77] IB/qib: convert to idr_alloc()
Convert to the much saner new idr interface. Only compile tested. Signed-off-by: Tejun Heo Cc: Mike Marciniszyn Cc: linux-rdma@vger.kernel.org --- drivers/infiniband/hw/qib/qib_init.c | 21 - 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c index ddf066d..50e33aa 100644 --- a/drivers/infiniband/hw/qib/qib_init.c +++ b/drivers/infiniband/hw/qib/qib_init.c @@ -1060,22 +1060,23 @@ struct qib_devdata *qib_alloc_devdata(struct pci_dev *pdev, size_t extra) struct qib_devdata *dd; int ret; - if (!idr_pre_get(&qib_unit_table, GFP_KERNEL)) { - dd = ERR_PTR(-ENOMEM); - goto bail; - } - dd = (struct qib_devdata *) ib_alloc_device(sizeof(*dd) + extra); if (!dd) { dd = ERR_PTR(-ENOMEM); goto bail; } + idr_preload(GFP_KERNEL); spin_lock_irqsave(&qib_devs_lock, flags); - ret = idr_get_new(&qib_unit_table, dd, &dd->unit); - if (ret >= 0) + + ret = idr_alloc(&qib_unit_table, dd, 0, 0, GFP_NOWAIT); + if (ret >= 0) { + dd->unit = ret; list_add(&dd->list, &qib_dev_list); + } + spin_unlock_irqrestore(&qib_devs_lock, flags); + idr_preload_end(); if (ret < 0) { qib_early_err(&pdev->dev, @@ -1180,11 +1181,6 @@ static int __init qlogic_ib_init(void) * the PCI subsystem. */ idr_init(&qib_unit_table); - if (!idr_pre_get(&qib_unit_table, GFP_KERNEL)) { - pr_err("idr_pre_get() failed\n"); - ret = -ENOMEM; - goto bail_cq_wq; - } ret = pci_register_driver(&qib_driver); if (ret < 0) { @@ -1199,7 +1195,6 @@ static int __init qlogic_ib_init(void) bail_unit: idr_destroy(&qib_unit_table); -bail_cq_wq: destroy_workqueue(qib_cq_wq); bail_dev: qib_dev_cleanup(); -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 44/77] IB/mlx4: convert to idr_alloc()
Convert to the much saner new idr interface. Only compile tested. Signed-off-by: Tejun Heo Cc: Jack Morgenstein Cc: Or Gerlitz Cc: Roland Dreier Cc: linux-rdma@vger.kernel.org --- drivers/infiniband/hw/mlx4/cm.c | 32 +++- 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c index dbc99d4..80e59ed 100644 --- a/drivers/infiniband/hw/mlx4/cm.c +++ b/drivers/infiniband/hw/mlx4/cm.c @@ -203,7 +203,7 @@ static void sl_id_map_add(struct ib_device *ibdev, struct id_map_entry *new) static struct id_map_entry * id_map_alloc(struct ib_device *ibdev, int slave_id, u32 sl_cm_id) { - int ret, id; + int ret; static int next_id; struct id_map_entry *ent; struct mlx4_ib_sriov *sriov = &to_mdev(ibdev)->sriov; @@ -220,25 +220,23 @@ id_map_alloc(struct ib_device *ibdev, int slave_id, u32 sl_cm_id) ent->dev = to_mdev(ibdev); INIT_DELAYED_WORK(&ent->timeout, id_map_ent_timeout); - do { - spin_lock(&to_mdev(ibdev)->sriov.id_map_lock); - ret = idr_get_new_above(&sriov->pv_id_table, ent, - next_id, &id); - if (!ret) { - next_id = ((unsigned) id + 1) & MAX_IDR_MASK; - ent->pv_cm_id = (u32)id; - sl_id_map_add(ibdev, ent); - } + idr_preload(GFP_KERNEL); + spin_lock(&to_mdev(ibdev)->sriov.id_map_lock); - spin_unlock(&sriov->id_map_lock); - } while (ret == -EAGAIN && idr_pre_get(&sriov->pv_id_table, GFP_KERNEL)); - /*the function idr_get_new_above can return -ENOSPC, so don't insert in that case.*/ - if (!ret) { - spin_lock(&sriov->id_map_lock); + ret = idr_alloc(&sriov->pv_id_table, ent, next_id, 0, GFP_NOWAIT); + if (ret >= 0) { + next_id = ((unsigned)ret + 1) & MAX_IDR_MASK; + ent->pv_cm_id = (u32)ret; + sl_id_map_add(ibdev, ent); list_add_tail(&ent->list, &sriov->cm_list); - spin_unlock(&sriov->id_map_lock); - return ent; } + + spin_unlock(&sriov->id_map_lock); + idr_preload_end(); + + if (ret >= 0) + return ent; + /*error flow*/ kfree(ent); mlx4_ib_warn(ibdev, "No more space in the idr (err:0x%x)\n", ret); -- 1.8.1 
[PATCH 45/77] IB/ocrdma: convert to idr_alloc()
Convert to the much saner new idr interface.

Only compile tested.

Signed-off-by: Tejun Heo
Cc: Roland Dreier
Cc: linux-rdma@vger.kernel.org
---
 drivers/infiniband/hw/ocrdma/ocrdma_main.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index c4e0131..48928c8 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -51,18 +51,6 @@ static DEFINE_IDR(ocrdma_dev_id);
 
 static union ib_gid ocrdma_zero_sgid;
 
-static int ocrdma_get_instance(void)
-{
-	int instance = 0;
-
-	/* Assign an unused number */
-	if (!idr_pre_get(&ocrdma_dev_id, GFP_KERNEL))
-		return -1;
-	if (idr_get_new(&ocrdma_dev_id, NULL, &instance))
-		return -1;
-	return instance;
-}
-
 void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
 {
 	u8 mac_addr[6];
@@ -416,7 +404,7 @@ static struct ocrdma_dev *ocrdma_add(struct be_dev_info *dev_info)
 		goto idr_err;
 
 	memcpy(&dev->nic_info, dev_info, sizeof(*dev_info));
-	dev->id = ocrdma_get_instance();
+	dev->id = idr_alloc(&ocrdma_dev_id, NULL, 0, 0, GFP_KERNEL);
 	if (dev->id < 0)
 		goto idr_err;
 
-- 
1.8.1
[PATCH 41/77] IB/cxgb4: convert to idr_alloc()
Convert to the much saner new idr interface.

Only compile tested.

Signed-off-by: Tejun Heo
Reviewed-by: Steve Wise
Cc: linux-rdma@vger.kernel.org
---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 9c1644f..7f862da 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -260,20 +260,21 @@ static inline int _insert_handle(struct c4iw_dev *rhp, struct idr *idr,
 				 void *handle, u32 id, int lock)
 {
 	int ret;
-	int newid;
 
-	do {
-		if (!idr_pre_get(idr, lock ? GFP_KERNEL : GFP_ATOMIC))
-			return -ENOMEM;
-		if (lock)
-			spin_lock_irq(&rhp->lock);
-		ret = idr_get_new_above(idr, handle, id, &newid);
-		BUG_ON(!ret && newid != id);
-		if (lock)
-			spin_unlock_irq(&rhp->lock);
-	} while (ret == -EAGAIN);
-
-	return ret;
+	if (lock) {
+		idr_preload(GFP_KERNEL);
+		spin_lock_irq(&rhp->lock);
+	}
+
+	ret = idr_alloc(idr, handle, id, id + 1, GFP_ATOMIC);
+
+	if (lock) {
+		spin_unlock_irq(&rhp->lock);
+		idr_preload_end();
+	}
+
+	BUG_ON(ret == -ENOSPC);
+	return ret < 0 ? ret : 0;
 }
 
 static inline int insert_handle(struct c4iw_dev *rhp, struct idr *idr,
-- 
1.8.1
[PATCH 42/77] IB/ehca: convert to idr_alloc()
Convert to the much saner new idr interface. Only compile tested. Signed-off-by: Tejun Heo Cc: Hoang-Nam Nguyen Cc: Christoph Raisch Cc: linux-rdma@vger.kernel.org --- drivers/infiniband/hw/ehca/ehca_cq.c | 27 +++ drivers/infiniband/hw/ehca/ehca_qp.c | 34 +++--- 2 files changed, 22 insertions(+), 39 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index 8f52901..212150c 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -128,7 +128,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, void *vpage; u32 counter; u64 rpage, cqx_fec, h_ret; - int ipz_rc, ret, i; + int ipz_rc, i; unsigned long flags; if (cqe >= 0x - 64 - additional_cqe) @@ -163,32 +163,19 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, adapter_handle = shca->ipz_hca_handle; param.eq_handle = shca->eq.ipz_eq_handle; - do { - if (!idr_pre_get(&ehca_cq_idr, GFP_KERNEL)) { - cq = ERR_PTR(-ENOMEM); - ehca_err(device, "Can't reserve idr nr. device=%p", -device); - goto create_cq_exit1; - } - - write_lock_irqsave(&ehca_cq_idr_lock, flags); - ret = idr_get_new(&ehca_cq_idr, my_cq, &my_cq->token); - write_unlock_irqrestore(&ehca_cq_idr_lock, flags); - } while (ret == -EAGAIN); + idr_preload(GFP_KERNEL); + write_lock_irqsave(&ehca_cq_idr_lock, flags); + my_cq->token = idr_alloc(&ehca_cq_idr, my_cq, 0, 0x200, GFP_NOWAIT); + write_unlock_irqrestore(&ehca_cq_idr_lock, flags); + idr_preload_end(); - if (ret) { + if (my_cq->token < 0) { cq = ERR_PTR(-ENOMEM); ehca_err(device, "Can't allocate new idr entry. device=%p", device); goto create_cq_exit1; } - if (my_cq->token > 0x1FF) { - cq = ERR_PTR(-ENOMEM); - ehca_err(device, "Invalid number of cq. device=%p", device); - goto create_cq_exit2; - } - /* * CQs maximum depth is 4GB-64, but we need additional 20 as buffer * for receiving errors CQEs. 
diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index 1493939..00d6861 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -636,30 +636,26 @@ static struct ehca_qp *internal_create_qp( my_qp->send_cq = container_of(init_attr->send_cq, struct ehca_cq, ib_cq); - do { - if (!idr_pre_get(&ehca_qp_idr, GFP_KERNEL)) { - ret = -ENOMEM; - ehca_err(pd->device, "Can't reserve idr resources."); - goto create_qp_exit0; - } + idr_preload(GFP_KERNEL); + write_lock_irqsave(&ehca_qp_idr_lock, flags); - write_lock_irqsave(&ehca_qp_idr_lock, flags); - ret = idr_get_new(&ehca_qp_idr, my_qp, &my_qp->token); - write_unlock_irqrestore(&ehca_qp_idr_lock, flags); - } while (ret == -EAGAIN); + ret = idr_alloc(&ehca_qp_idr, my_qp, 0, 0x200, GFP_NOWAIT); + if (ret >= 0) + my_qp->token = ret; - if (ret) { - ret = -ENOMEM; - ehca_err(pd->device, "Can't allocate new idr entry."); + write_unlock_irqrestore(&ehca_qp_idr_lock, flags); + idr_preload_end(); + if (ret < 0) { + if (ret == -ENOSPC) { + ret = -EINVAL; + ehca_err(pd->device, "Invalid number of qp"); + } else { + ret = -ENOMEM; + ehca_err(pd->device, "Can't allocate new idr entry."); + } goto create_qp_exit0; } - if (my_qp->token > 0x1FF) { - ret = -EINVAL; - ehca_err(pd->device, "Invalid number of qp"); - goto create_qp_exit1; - } - if (has_srq) parms.srq_token = my_qp->token; -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NFS over RDMA crashing
Hi. In case you're interested, I did the NFS/RDMA backports for OFED. I tested that NFS/RDMA in OFED 3.5 works on kernel 3.5, and also on the RHEL 6.3 kernel. However, I did not test it with SRIOV. If you test it (OFED-3.5-rc6 was released last week), I'd like to know how it goes. Thanks.

Jeff Becker

On 02/06/2013 07:58 AM, Steve Wise wrote:
> On 2/6/2013 9:48 AM, Yan Burman wrote:
>> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2)
>
> +tom tucker
>
> I'd try going back a few kernels, like to 3.5.x and see if things are more stable. If you find a point that works, then git bisect might help identify the regression.
[PATCH for-next 06/10] IB/core: Enhance memory windows support
From: Shani Michaeli

This patch enhances the IB core support for Memory Windows (MWs).

MWs allow an application to have better and more flexible control over remote access to memory.

Two types of MWs are supported, the second in two variants:

  Type 1  - associated with PD only
  Type 2A - associated with QPN only
  Type 2B - associated with PD and QPN

Applications can allocate a MW once, and then repeatedly bind the MW to different ranges in MRs that are associated to the same PD. Type 1 windows are bound through a verb, while type 2 windows are bound by posting a work request.

The 32-bit memory key is composed of a 24-bit index and an 8-bit key. The key is changed with each bind, thus allowing more control over the peer's use of the memory key.

The changes introduced are the following:

* add memory window type enum and a corresponding parameter to ib_alloc_mw.
* type 2 memory window bind work request support.
* create a single struct that holds the part common to the bind verb struct ibv_mw_bind and the bind work request.
* add the ib_inc_rkey helper function to advance the tag part of an rkey.

Consumer interface details:

* new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support for these features.

  A device can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or IB_DEVICE_MEM_WINDOW_TYPE_2B if it supports type 2A or type 2B memory windows, respectively. It can set neither to indicate that it doesn't support type 2 windows at all.
* modify existing provides and consumers code to the new param of ib_alloc_mw and the ib_mw_bind_info structure Signed-off-by: Haggai Eran Signed-off-by: Shani Michaeli Signed-off-by: Or Gerlitz --- drivers/infiniband/core/verbs.c |5 +- drivers/infiniband/hw/cxgb3/iwch_provider.c |5 ++- drivers/infiniband/hw/cxgb3/iwch_qp.c | 15 +++--- drivers/infiniband/hw/cxgb4/iw_cxgb4.h |2 +- drivers/infiniband/hw/cxgb4/mem.c |5 ++- drivers/infiniband/hw/ehca/ehca_iverbs.h|2 +- drivers/infiniband/hw/ehca/ehca_mrmw.c |5 ++- drivers/infiniband/hw/nes/nes_verbs.c | 19 --- include/rdma/ib_verbs.h | 73 +++--- net/sunrpc/xprtrdma/verbs.c | 20 10 files changed, 110 insertions(+), 41 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 30f199e..a8fdd33 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1099,18 +1099,19 @@ EXPORT_SYMBOL(ib_free_fast_reg_page_list); /* Memory windows */ -struct ib_mw *ib_alloc_mw(struct ib_pd *pd) +struct ib_mw *ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type) { struct ib_mw *mw; if (!pd->device->alloc_mw) return ERR_PTR(-ENOSYS); - mw = pd->device->alloc_mw(pd); + mw = pd->device->alloc_mw(pd, type); if (!IS_ERR(mw)) { mw->device = pd->device; mw->pd = pd; mw->uobject = NULL; + mw->type= type; atomic_inc(&pd->usecnt); } diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 0bdf09a..074d5c2 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -738,7 +738,7 @@ static struct ib_mr *iwch_get_dma_mr(struct ib_pd *pd, int acc) return ibmr; } -static struct ib_mw *iwch_alloc_mw(struct ib_pd *pd) +static struct ib_mw *iwch_alloc_mw(struct ib_pd *pd, enum ib_mw_type type) { struct iwch_dev *rhp; struct iwch_pd *php; @@ -747,6 +747,9 @@ static struct ib_mw *iwch_alloc_mw(struct ib_pd *pd) u32 stag = 0; int ret; + if (type != IB_MW_TYPE_1) + return ERR_PTR(-EINVAL); + php 
= to_iwch_pd(pd); rhp = php->rhp; mhp = kzalloc(sizeof(*mhp), GFP_KERNEL); diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index 6de8463..e5649e8 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -567,18 +567,19 @@ int iwch_bind_mw(struct ib_qp *qp, if (mw_bind->send_flags & IB_SEND_SIGNALED) t3_wr_flags = T3_COMPLETION_FLAG; - sgl.addr = mw_bind->addr; - sgl.lkey = mw_bind->mr->lkey; - sgl.length = mw_bind->length; + sgl.addr = mw_bind->bind_info.addr; + sgl.lkey = mw_bind->bind_info.mr->lkey; + sgl.length = mw_bind->bind_info.length; wqe->bind.reserved = 0; wqe->bind.type = TPT_VATO; /* TBD: check perms */ - wqe->bind.perms = iwch_ib_to_tpt_bind_access(mw_bind->mw_access_flags); - wqe->bind.mr_stag = cpu_to_be32(mw_bind->mr->lkey); + wqe->bind.perms = iwch_ib_to_tpt_bind_access( + mw_bind->bind_info.mw_access_flags); + wqe->bind.mr_stag = c
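The commit message above describes the rkey as a 24-bit index plus an 8-bit key that changes on every bind, and names ib_inc_rkey as the helper that advances the tag. Below is a userspace sketch of that arithmetic, assuming the tag lives in the low 8 bits of the 32-bit key. toy_inc_rkey() is an illustrative name, not the kernel symbol.

```c
#include <stdint.h>

/*
 * Advance the 8-bit tag of an rkey while leaving the 24-bit index untouched.
 * Assumption for this sketch: the tag is the low byte of the 32-bit key.
 * The tag wraps from 0xff back to 0x00 without carrying into the index.
 */
static uint32_t toy_inc_rkey(uint32_t rkey)
{
	const uint32_t mask = 0x000000ff;

	return ((rkey + 1) & mask) | (rkey & ~mask);
}
```

Because only the tag changes, a peer holding the previous rkey refers to the same window index but presents a stale tag, which is what lets each rebind limit the peer's use of an old key.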
[PATCH for-next 05/10] net/mlx4_core: Enable memory windows in {INIT,QUERY}_HCA
From: Shani Michaeli Add memory windows related code to INIT_HCA and QUERY_HCA Signed-off-by: Haggai Eran Signed-off-by: Shani Michaeli Signed-off-by: Or Gerlitz --- drivers/net/ethernet/mellanox/mlx4/fw.c |3 +++ drivers/net/ethernet/mellanox/mlx4/fw.h |1 + drivers/net/ethernet/mellanox/mlx4/main.c |4 drivers/net/ethernet/mellanox/mlx4/mlx4.h |2 ++ 4 files changed, 10 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c index a389612..d136b36 100644 --- a/drivers/net/ethernet/mellanox/mlx4/fw.c +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c @@ -1207,6 +1207,7 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param) #define INIT_HCA_FS_IB_NUM_ADDRS_OFFSET (INIT_HCA_FS_PARAM_OFFSET + 0x26) #define INIT_HCA_TPT_OFFSET 0x0f0 #define INIT_HCA_DMPT_BASE_OFFSET (INIT_HCA_TPT_OFFSET + 0x00) +#define INIT_HCA_TPT_MW_OFFSET (INIT_HCA_TPT_OFFSET + 0x08) #define INIT_HCA_LOG_MPT_SZ_OFFSET (INIT_HCA_TPT_OFFSET + 0x0b) #define INIT_HCA_MTT_BASE_OFFSET(INIT_HCA_TPT_OFFSET + 0x10) #define INIT_HCA_CMPT_BASE_OFFSET (INIT_HCA_TPT_OFFSET + 0x18) @@ -1323,6 +1324,7 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param) /* TPT attributes */ MLX4_PUT(inbox, param->dmpt_base, INIT_HCA_DMPT_BASE_OFFSET); + MLX4_PUT(inbox, param->mw_enabled, INIT_HCA_TPT_MW_OFFSET); MLX4_PUT(inbox, param->log_mpt_sz, INIT_HCA_LOG_MPT_SZ_OFFSET); MLX4_PUT(inbox, param->mtt_base, INIT_HCA_MTT_BASE_OFFSET); MLX4_PUT(inbox, param->cmpt_base, INIT_HCA_CMPT_BASE_OFFSET); @@ -1419,6 +1421,7 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev, /* TPT attributes */ MLX4_GET(param->dmpt_base, outbox, INIT_HCA_DMPT_BASE_OFFSET); + MLX4_GET(param->mw_enabled, outbox, INIT_HCA_TPT_MW_OFFSET); MLX4_GET(param->log_mpt_sz, outbox, INIT_HCA_LOG_MPT_SZ_OFFSET); MLX4_GET(param->mtt_base, outbox, INIT_HCA_MTT_BASE_OFFSET); MLX4_GET(param->cmpt_base, outbox, INIT_HCA_CMPT_BASE_OFFSET); diff --git 
a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h index dbf2f69..9f1a25c 100644 --- a/drivers/net/ethernet/mellanox/mlx4/fw.h +++ b/drivers/net/ethernet/mellanox/mlx4/fw.h @@ -170,6 +170,7 @@ struct mlx4_init_hca_param { u8 log_mc_table_sz; u8 log_mpt_sz; u8 log_uar_sz; + u8 mw_enabled; /* Enable memory windows */ u8 uar_page_sz; /* log pg sz in 4k chunks */ u8 fs_hash_enable_bits; u8 steering_mode; /* for QUERY_HCA */ diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index 9a84c75..2a4dda0 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -1447,6 +1447,10 @@ static int mlx4_init_hca(struct mlx4_dev *dev) init_hca.log_uar_sz = ilog2(dev->caps.num_uars); init_hca.uar_page_sz = PAGE_SHIFT - 12; + init_hca.mw_enabled = 0; + if (dev->caps.flags & MLX4_DEV_CAP_FLAG_MEM_WINDOW || + dev->caps.bmme_flags & MLX4_BMME_FLAG_TYPE_2_WIN) + init_hca.mw_enabled = INIT_HCA_TPT_MW_ENABLE; err = mlx4_init_icm(dev, &dev_cap, &init_hca, icm_size); if (err) diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h index 539212b..8b75d5e 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h @@ -60,6 +60,8 @@ #define MLX4_FS_MGM_LOG_ENTRY_SIZE 7 #define MLX4_FS_NUM_MCG(1 << 17) +#define INIT_HCA_TPT_MW_ENABLE (1 << 7) + enum { MLX4_FS_L2_HASH = 0, MLX4_FS_L2_L3_L4_HASH, -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
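As a rough illustration of what the patch above does to the INIT_HCA mailbox: a single byte at the TPT offset + 0x08 carries the memory-window enable bit (1 << 7), and it is set when the device reports either type 1 or type 2 window support. The offsets and the enable value are taken from the diff. The mailbox size, toy_put_u8() and toy_mw_enabled() are assumptions for this userspace sketch and do not reproduce the driver's MLX4_PUT macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Values taken from the patch. */
#define INIT_HCA_TPT_OFFSET	0x0f0
#define INIT_HCA_TPT_MW_OFFSET	(INIT_HCA_TPT_OFFSET + 0x08)
#define INIT_HCA_TPT_MW_ENABLE	(1 << 7)

/* Assumed mailbox size, for the sketch only. */
#define TOY_MAILBOX_SIZE	4096

/* Store one byte at the given mailbox offset (stand-in for a u8 MLX4_PUT). */
static void toy_put_u8(uint8_t *mailbox, uint8_t val, size_t offset)
{
	mailbox[offset] = val;
}

/* Decide the mw_enabled byte: set if type 1 or type 2 windows are supported. */
static uint8_t toy_mw_enabled(int has_type1_win, int has_type2_win)
{
	return (has_type1_win || has_type2_win) ? INIT_HCA_TPT_MW_ENABLE : 0;
}
```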
[PATCH for-next 02/10] net/mlx4_core: Rename MPT related service routines to have mpt_ prefix
From: Shani Michaeli The MPT - Memory Protection Table - is used by both memory windows and memory regions. Hence, all MPT references are relevant for both types of memory objects. Rename the relevant functions to start with mpt_ instead of the current mr_ prefix. Signed-off-by: Haggai Eran Signed-off-by: Shani Michaeli Signed-off-by: Or Gerlitz --- drivers/net/ethernet/mellanox/mlx4/mlx4.h | 16 +++--- drivers/net/ethernet/mellanox/mlx4/mr.c| 48 ++-- .../net/ethernet/mellanox/mlx4/resource_tracker.c | 14 +++--- 3 files changed, 39 insertions(+), 39 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h index 116c5c2..5075236 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h @@ -118,10 +118,10 @@ enum { MLX4_NUM_CMPTS = MLX4_CMPT_NUM_TYPE << MLX4_CMPT_SHIFT }; -enum mlx4_mr_state { - MLX4_MR_DISABLED = 0, - MLX4_MR_EN_HW, - MLX4_MR_EN_SW +enum mlx4_mpt_state { + MLX4_MPT_DISABLED = 0, + MLX4_MPT_EN_HW, + MLX4_MPT_EN_SW }; #define MLX4_COMM_TIME 1 @@ -871,10 +871,10 @@ int __mlx4_cq_alloc_icm(struct mlx4_dev *dev, int *cqn); void __mlx4_cq_free_icm(struct mlx4_dev *dev, int cqn); int __mlx4_srq_alloc_icm(struct mlx4_dev *dev, int *srqn); void __mlx4_srq_free_icm(struct mlx4_dev *dev, int srqn); -int __mlx4_mr_reserve(struct mlx4_dev *dev); -void __mlx4_mr_release(struct mlx4_dev *dev, u32 index); -int __mlx4_mr_alloc_icm(struct mlx4_dev *dev, u32 index); -void __mlx4_mr_free_icm(struct mlx4_dev *dev, u32 index); +int __mlx4_mpt_reserve(struct mlx4_dev *dev); +void __mlx4_mpt_release(struct mlx4_dev *dev, u32 index); +int __mlx4_mpt_alloc_icm(struct mlx4_dev *dev, u32 index); +void __mlx4_mpt_free_icm(struct mlx4_dev *dev, u32 index); u32 __mlx4_alloc_mtt_range(struct mlx4_dev *dev, int order); void __mlx4_free_mtt_range(struct mlx4_dev *dev, u32 first_seg, int order); diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c 
b/drivers/net/ethernet/mellanox/mlx4/mr.c index c202d3a..49705cf 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mr.c +++ b/drivers/net/ethernet/mellanox/mlx4/mr.c @@ -321,7 +321,7 @@ static int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, mr->size = size; mr->pd = pd; mr->access = access; - mr->enabled= MLX4_MR_DISABLED; + mr->enabled= MLX4_MPT_DISABLED; mr->key= hw_index_to_key(mridx); return mlx4_mtt_init(dev, npages, page_shift, &mr->mtt); @@ -335,14 +335,14 @@ static int mlx4_WRITE_MTT(struct mlx4_dev *dev, MLX4_CMD_TIME_CLASS_A, MLX4_CMD_WRAPPED); } -int __mlx4_mr_reserve(struct mlx4_dev *dev) +int __mlx4_mpt_reserve(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); return mlx4_bitmap_alloc(&priv->mr_table.mpt_bitmap); } -static int mlx4_mr_reserve(struct mlx4_dev *dev) +static int mlx4_mpt_reserve(struct mlx4_dev *dev) { u64 out_param; @@ -353,17 +353,17 @@ static int mlx4_mr_reserve(struct mlx4_dev *dev) return -1; return get_param_l(&out_param); } - return __mlx4_mr_reserve(dev); + return __mlx4_mpt_reserve(dev); } -void __mlx4_mr_release(struct mlx4_dev *dev, u32 index) +void __mlx4_mpt_release(struct mlx4_dev *dev, u32 index) { struct mlx4_priv *priv = mlx4_priv(dev); mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, index); } -static void mlx4_mr_release(struct mlx4_dev *dev, u32 index) +static void mlx4_mpt_release(struct mlx4_dev *dev, u32 index) { u64 in_param; @@ -376,17 +376,17 @@ static void mlx4_mr_release(struct mlx4_dev *dev, u32 index) index); return; } - __mlx4_mr_release(dev, index); + __mlx4_mpt_release(dev, index); } -int __mlx4_mr_alloc_icm(struct mlx4_dev *dev, u32 index) +int __mlx4_mpt_alloc_icm(struct mlx4_dev *dev, u32 index) { struct mlx4_mr_table *mr_table = &mlx4_priv(dev)->mr_table; return mlx4_table_get(dev, &mr_table->dmpt_table, index); } -static int mlx4_mr_alloc_icm(struct mlx4_dev *dev, u32 index) +static int mlx4_mpt_alloc_icm(struct mlx4_dev *dev, u32 index) { u64 param; @@ -397,17 +397,17 @@ 
static int mlx4_mr_alloc_icm(struct mlx4_dev *dev, u32 index) MLX4_CMD_TIME_CLASS_A, MLX4_CMD_WRAPPED); } - return __mlx4_mr_alloc_icm(dev, index); + return __mlx4_mpt_alloc_icm(dev, index); } -void __mlx4_mr_free_icm(struct mlx4_dev *dev, u32 index) +void __mlx4_mpt_free_icm(struct mlx4_dev *dev, u32 index) { struct mlx4_mr_table *mr_table = &mlx4_priv(dev)->mr_table; mlx4_table_put(dev,
[PATCH for-next 10/10] IB/mlx4_ib: Advertize MW support
From: Shani Michaeli

Indicate memory windows support through device capabilities, kernel verb entries and the relevant uverbs command mask entries.

Signed-off-by: Haggai Eran
Signed-off-by: Shani Michaeli
Signed-off-by: Or Gerlitz
---
 drivers/infiniband/hw/mlx4/main.c | 19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index e7d81c0..f77ff4f 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -137,6 +137,14 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
 		props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
 	if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_XRC)
 		props->device_cap_flags |= IB_DEVICE_XRC;
+	if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_MEM_WINDOW)
+		props->device_cap_flags |= IB_DEVICE_MEM_WINDOW;
+	if (dev->dev->caps.bmme_flags & MLX4_BMME_FLAG_TYPE_2_WIN) {
+		if (dev->dev->caps.bmme_flags & MLX4_BMME_FLAG_WIN_TYPE_2B)
+			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2B;
+		else
+			props->device_cap_flags |= IB_DEVICE_MEM_WINDOW_TYPE_2A;
+	}
 
 	props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) &
 		0xff;
@@ -1434,6 +1442,17 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 		ibdev->ib_dev.dealloc_fmr = mlx4_ib_fmr_dealloc;
 	}
 
+	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_MEM_WINDOW ||
+	    dev->caps.bmme_flags & MLX4_BMME_FLAG_TYPE_2_WIN) {
+		ibdev->ib_dev.alloc_mw = mlx4_ib_alloc_mw;
+		ibdev->ib_dev.bind_mw = mlx4_ib_bind_mw;
+		ibdev->ib_dev.dealloc_mw = mlx4_ib_dealloc_mw;
+
+		ibdev->ib_dev.uverbs_cmd_mask |=
+			(1ull << IB_USER_VERBS_CMD_ALLOC_MW) |
+			(1ull << IB_USER_VERBS_CMD_DEALLOC_MW);
+	}
+
 	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_XRC) {
 		ibdev->ib_dev.alloc_xrcd = mlx4_ib_alloc_xrcd;
 		ibdev->ib_dev.dealloc_xrcd = mlx4_ib_dealloc_xrcd;
-- 
1.7.1
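The capability mapping in mlx4_ib_query_device can be read as a small decision function: type 1 support is reported independently, while type 2 support yields exactly one of the 2A or 2B flags. Here is a userspace sketch of that logic. All TOY_* bit values are hypothetical and chosen only for this example; they are not the driver's or the verbs layer's real constants.

```c
#include <stdint.h>

/* Hypothetical firmware-side capability bits (not the real mlx4 values). */
#define TOY_CAP_MEM_WINDOW	(1u << 0)
#define TOY_BMME_TYPE_2_WIN	(1u << 1)
#define TOY_BMME_WIN_TYPE_2B	(1u << 2)

/* Hypothetical verbs-side capability bits (not the real IB_DEVICE_* values). */
#define TOY_IB_MEM_WINDOW		(1u << 0)
#define TOY_IB_MEM_WINDOW_TYPE_2A	(1u << 1)
#define TOY_IB_MEM_WINDOW_TYPE_2B	(1u << 2)

/*
 * Translate device capabilities into advertised memory-window flags.
 * Type 2A and type 2B are mutually exclusive: 2B wins when the device
 * reports it, otherwise type 2 support is advertised as 2A.
 */
static uint32_t toy_mw_cap_flags(uint32_t caps, uint32_t bmme)
{
	uint32_t out = 0;

	if (caps & TOY_CAP_MEM_WINDOW)
		out |= TOY_IB_MEM_WINDOW;
	if (bmme & TOY_BMME_TYPE_2_WIN) {
		if (bmme & TOY_BMME_WIN_TYPE_2B)
			out |= TOY_IB_MEM_WINDOW_TYPE_2B;
		else
			out |= TOY_IB_MEM_WINDOW_TYPE_2A;
	}
	return out;
}
```

Note that a 2B bit without the general type 2 bit advertises nothing in this sketch, matching the nesting of the checks in the patch.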
[PATCH for-next 09/10] IB/mlx4_ib: Support memory window binding
From: Shani Michaeli

* Implement memory windows binding in mlx4_ib_post_send.
* Implement mlx4_ib_bind_mw by deferring to mlx4_ib_post_send.
* Rename MLX4_WQE_FMR_PERM_* flags to MLX4_WQE_FMR_AND_BIND_PERM_*,
  indicating that they are used both for fast registration work
  requests, and for memory window bind work requests.

Signed-off-by: Haggai Eran
Signed-off-by: Shani Michaeli
Signed-off-by: Or Gerlitz
---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |    2 +
 drivers/infiniband/hw/mlx4/mr.c      |   22 +
 drivers/infiniband/hw/mlx4/qp.c      |   35 +++--
 include/linux/mlx4/qp.h              |   11 +++--
 4 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 6d28491..5a21783 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -592,6 +592,8 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  struct ib_udata *udata);
 int mlx4_ib_dereg_mr(struct ib_mr *mr);
 struct ib_mw *mlx4_ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
+int mlx4_ib_bind_mw(struct ib_qp *qp, struct ib_mw *mw,
+		    struct ib_mw_bind *mw_bind);
 int mlx4_ib_dealloc_mw(struct ib_mw *mw);
 struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 					int max_page_list_len);
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 5adf4c4..e471f08 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -231,6 +231,28 @@ err_free:
 	return ERR_PTR(err);
 }
 
+int mlx4_ib_bind_mw(struct ib_qp *qp, struct ib_mw *mw,
+		    struct ib_mw_bind *mw_bind)
+{
+	struct ib_send_wr wr;
+	struct ib_send_wr *bad_wr;
+	int ret;
+
+	memset(&wr, 0, sizeof(wr));
+	wr.opcode = IB_WR_BIND_MW;
+	wr.wr_id = mw_bind->wr_id;
+	wr.send_flags = mw_bind->send_flags;
+	wr.wr.bind_mw.mw = mw;
+	wr.wr.bind_mw.bind_info = mw_bind->bind_info;
+	wr.wr.bind_mw.rkey = ib_inc_rkey(mw->rkey);
+
+	ret = mlx4_ib_post_send(qp, &wr, &bad_wr);
+	if (!ret)
+		mw->rkey = wr.wr.bind_mw.rkey;
+
+	return ret;
+}
+
 int mlx4_ib_dealloc_mw(struct ib_mw *ibmw)
 {
 	struct mlx4_ib_mw *mw = to_mmw(ibmw);
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index c6dde71..93bdae5 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -104,6 +104,7 @@ static const __be32 mlx4_ib_opcode[] = {
 	[IB_WR_FAST_REG_MR] = cpu_to_be32(MLX4_OPCODE_FMR),
 	[IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_CS),
 	[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_FA),
+	[IB_WR_BIND_MW] = cpu_to_be32(MLX4_OPCODE_BIND_MW),
 };
 
 static struct mlx4_ib_sqp *to_msqp(struct mlx4_ib_qp *mqp)
@@ -1953,9 +1954,12 @@ static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq
 
 static __be32 convert_access(int acc)
 {
-	return (acc & IB_ACCESS_REMOTE_ATOMIC ? cpu_to_be32(MLX4_WQE_FMR_PERM_ATOMIC) : 0) |
-	       (acc & IB_ACCESS_REMOTE_WRITE ? cpu_to_be32(MLX4_WQE_FMR_PERM_REMOTE_WRITE) : 0) |
-	       (acc & IB_ACCESS_REMOTE_READ ? cpu_to_be32(MLX4_WQE_FMR_PERM_REMOTE_READ) : 0) |
+	return (acc & IB_ACCESS_REMOTE_ATOMIC ?
+		cpu_to_be32(MLX4_WQE_FMR_AND_BIND_PERM_ATOMIC) : 0) |
+	       (acc & IB_ACCESS_REMOTE_WRITE ?
+		cpu_to_be32(MLX4_WQE_FMR_AND_BIND_PERM_REMOTE_WRITE) : 0) |
+	       (acc & IB_ACCESS_REMOTE_READ ?
+		cpu_to_be32(MLX4_WQE_FMR_AND_BIND_PERM_REMOTE_READ) : 0) |
 	       (acc & IB_ACCESS_LOCAL_WRITE ? cpu_to_be32(MLX4_WQE_FMR_PERM_LOCAL_WRITE) : 0) |
 		cpu_to_be32(MLX4_WQE_FMR_PERM_LOCAL_READ);
 }
@@ -1981,6 +1985,24 @@ static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg, struct ib_send_wr *wr)
 	fseg->reserved[1] = 0;
 }
 
+static void set_bind_seg(struct mlx4_wqe_bind_seg *bseg, struct ib_send_wr *wr)
+{
+	bseg->flags1 =
+		convert_access(wr->wr.bind_mw.bind_info.mw_access_flags) &
+		cpu_to_be32(MLX4_WQE_FMR_AND_BIND_PERM_REMOTE_READ |
+			    MLX4_WQE_FMR_AND_BIND_PERM_REMOTE_WRITE |
+			    MLX4_WQE_FMR_AND_BIND_PERM_ATOMIC);
+	bseg->flags2 = 0;
+	if (wr->wr.bind_mw.mw->type == IB_MW_TYPE_2)
+		bseg->flags2 |= cpu_to_be32(MLX4_WQE_BIND_TYPE_2);
+	if (wr->wr.bind_mw.bind_info.mw_access_flags & IB_ZERO_BASED)
+		bseg->flags2 |= cpu_to_be32(MLX4_WQE_BIND_ZERO_BASED);
+	bseg->new_rkey = cpu_to_be32(wr->wr.b
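
The bind path in this patch computes a new rkey before posting and only stores it in the MW when the post succeeds. Below is a minimal user-space sketch of that pattern. It assumes that the low 8 bits of an rkey are the consumer-owned key tag and that incrementing wraps within those 8 bits; all names (fake_mw, inc_rkey_tag, bind_mw_sketch, post_ok, post_fail) are hypothetical stand-ins and not the driver's API.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory window: only the rkey matters here. */
struct fake_mw {
	uint32_t rkey;
};

/*
 * Bump the low 8 bits (assumed to be the key tag) and leave the upper
 * 24 bits untouched. The tag wraps from 0xff back to 0x00.
 */
static uint32_t inc_rkey_tag(uint32_t rkey)
{
	const uint32_t mask = 0x000000ff;

	return ((rkey + 1) & mask) | (rkey & ~mask);
}

/* Hypothetical post routine type: returns 0 on success. */
typedef int (*post_fn)(uint32_t new_rkey);

/* Two stub post routines, for illustration only. */
static int post_ok(uint32_t new_rkey)
{
	(void)new_rkey;
	return 0;
}

static int post_fail(uint32_t new_rkey)
{
	(void)new_rkey;
	return -1;
}

/* Post with the incremented rkey; commit it only if the post succeeded. */
static int bind_mw_sketch(struct fake_mw *mw, post_fn post)
{
	uint32_t new_rkey = inc_rkey_tag(mw->rkey);
	int ret = post(new_rkey);

	if (!ret)
		mw->rkey = new_rkey;
	return ret;
}
```

With this shape, a failed post leaves the caller's view of the rkey unchanged, so a retry starts from the same key.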
[PATCH for-next 08/10] mlx4: Implement memory windows allocation and deallocation
From: Shani Michaeli

Implement MW allocation and deallocation in mlx4_core and mlx4_ib.
Pass down the enable bind flag when registering memory regions.

Signed-off-by: Haggai Eran
Signed-off-by: Shani Michaeli
Signed-off-by: Or Gerlitz
---
 drivers/infiniband/hw/mlx4/mlx4_ib.h    |   12
 drivers/infiniband/hw/mlx4/mr.c         |   52 +
 drivers/net/ethernet/mellanox/mlx4/mr.c |   95 +++
 include/linux/mlx4/device.h             |   20 ++-
 4 files changed, 178 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index dcd845b..6d28491 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -116,6 +116,11 @@ struct mlx4_ib_mr {
 	struct ib_umem *umem;
 };
 
+struct mlx4_ib_mw {
+	struct ib_mw ibmw;
+	struct mlx4_mw mmw;
+};
+
 struct mlx4_ib_fast_reg_page_list {
 	struct ib_fast_reg_page_list ibfrpl;
 	__be64 *mapped_page_list;
@@ -533,6 +538,11 @@ static inline struct mlx4_ib_mr *to_mmr(struct ib_mr *ibmr)
 	return container_of(ibmr, struct mlx4_ib_mr, ibmr);
 }
 
+static inline struct mlx4_ib_mw *to_mmw(struct ib_mw *ibmw)
+{
+	return container_of(ibmw, struct mlx4_ib_mw, ibmw);
+}
+
 static inline struct mlx4_ib_fast_reg_page_list *to_mfrpl(struct ib_fast_reg_page_list *ibfrpl)
 {
 	return container_of(ibfrpl, struct mlx4_ib_fast_reg_page_list, ibfrpl);
@@ -581,6 +591,8 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
 				  struct ib_udata *udata);
 int mlx4_ib_dereg_mr(struct ib_mr *mr);
+struct ib_mw *mlx4_ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
+int mlx4_ib_dealloc_mw(struct ib_mw *mw);
 struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 					int max_page_list_len);
 struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 254e1cf..5adf4c4 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -41,9 +41,19 @@ static u32 convert_access(int acc)
 	       (acc & IB_ACCESS_REMOTE_WRITE ? MLX4_PERM_REMOTE_WRITE : 0) |
 	       (acc & IB_ACCESS_REMOTE_READ ? MLX4_PERM_REMOTE_READ : 0) |
 	       (acc & IB_ACCESS_LOCAL_WRITE ? MLX4_PERM_LOCAL_WRITE : 0) |
+	       (acc & IB_ACCESS_MW_BIND ? MLX4_PERM_BIND_MW : 0) |
 	       MLX4_PERM_LOCAL_READ;
 }
 
+static enum mlx4_mw_type to_mlx4_type(enum ib_mw_type type)
+{
+	switch (type) {
+	case IB_MW_TYPE_1: return MLX4_MW_TYPE_1;
+	case IB_MW_TYPE_2: return MLX4_MW_TYPE_2;
+	default: return -1;
+	}
+}
+
 struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc)
 {
 	struct mlx4_ib_mr *mr;
@@ -189,6 +199,48 @@ int mlx4_ib_dereg_mr(struct ib_mr *ibmr)
 	return 0;
 }
 
+struct ib_mw *mlx4_ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type)
+{
+	struct mlx4_ib_dev *dev = to_mdev(pd->device);
+	struct mlx4_ib_mw *mw;
+	int err;
+
+	mw = kmalloc(sizeof(*mw), GFP_KERNEL);
+	if (!mw)
+		return ERR_PTR(-ENOMEM);
+
+	err = mlx4_mw_alloc(dev->dev, to_mpd(pd)->pdn,
+			    to_mlx4_type(type), &mw->mmw);
+	if (err)
+		goto err_free;
+
+	err = mlx4_mw_enable(dev->dev, &mw->mmw);
+	if (err)
+		goto err_mw;
+
+	mw->ibmw.rkey = mw->mmw.key;
+
+	return &mw->ibmw;
+
+err_mw:
+	mlx4_mw_free(dev->dev, &mw->mmw);
+
+err_free:
+	kfree(mw);
+
+	return ERR_PTR(err);
+}
+
+int mlx4_ib_dealloc_mw(struct ib_mw *ibmw)
+{
+	struct mlx4_ib_mw *mw = to_mmw(ibmw);
+
+	mlx4_mw_free(to_mdev(ibmw->device)->dev, &mw->mmw);
+	kfree(mw);
+
+	return 0;
+}
+
 struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 					int max_page_list_len)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 5e785bd..602ca9b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -654,6 +654,101 @@ int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
 }
 EXPORT_SYMBOL_GPL(mlx4_buf_write_mtt);
 
+int mlx4_mw_alloc(struct mlx4_dev *dev, u32 pd, enum mlx4_mw_type type,
+		  struct mlx4_mw *mw)
+{
+	u32 index;
+
+	if ((type == MLX4_MW_TYPE_1 &&
+	     !(dev->caps.flags & MLX4_DEV_CAP_FLAG_MEM_WINDOW)) ||
+	    (type == MLX4_MW_TYPE_2 &&
+	     !(dev->caps.bmme_flags & MLX4_BMME_FLAG_TYPE_2_WIN)))
+		return -ENOTSUPP;
+
+	index = mlx4_mpt_reserve(dev)
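
The to_mmw() helper added in this patch relies on the driver structure embedding the generic one, so that a pointer to the generic part can be converted back to the enclosing driver object. The following is a minimal user-space sketch of that conversion. The container_of macro here is a simplified equivalent of the kernel's, and the *_sketch structures are trimmed-down hypothetical stand-ins for the ones in the patch.

```c
#include <stddef.h>

/*
 * Simplified user-space equivalent of the kernel's container_of():
 * step back from a member pointer to the structure that embeds it.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down stand-ins for the structures in the patch. */
struct ib_mw_sketch {
	unsigned int rkey;
};

struct mlx4_mw_sketch {
	unsigned int key;
};

struct mlx4_ib_mw_sketch {
	struct ib_mw_sketch ibmw;	/* generic part handed to the core */
	struct mlx4_mw_sketch mmw;	/* driver-private part */
};

/* Recover the driver object from a pointer to its embedded generic part. */
static struct mlx4_ib_mw_sketch *to_mmw_sketch(struct ib_mw_sketch *ibmw)
{
	return container_of(ibmw, struct mlx4_ib_mw_sketch, ibmw);
}
```

Because the generic part is embedded rather than pointed to, one allocation covers both and the conversion is plain pointer arithmetic.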
[PATCH for-next 07/10] IB/uverbs: Implement memory windows support in uverbs
From: Shani Michaeli

The existing user/kernel uverbs API has IB_USER_VERBS_CMD_ALLOC/DEALLOC_MW,
implement these calls, along with destroying user memory windows during
process cleanup.

Signed-off-by: Haggai Eran
Signed-off-by: Shani Michaeli
Signed-off-by: Or Gerlitz
---
 drivers/infiniband/core/uverbs.h      |    2 +
 drivers/infiniband/core/uverbs_cmd.c  |  121 +
 drivers/infiniband/core/uverbs_main.c |   13 +++-
 include/uapi/rdma/ib_user_verbs.h     |   16 +
 4 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 5bcb2af..0fcd7aa 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -188,6 +188,8 @@ IB_UVERBS_DECLARE_CMD(alloc_pd);
 IB_UVERBS_DECLARE_CMD(dealloc_pd);
 IB_UVERBS_DECLARE_CMD(reg_mr);
 IB_UVERBS_DECLARE_CMD(dereg_mr);
+IB_UVERBS_DECLARE_CMD(alloc_mw);
+IB_UVERBS_DECLARE_CMD(dealloc_mw);
 IB_UVERBS_DECLARE_CMD(create_comp_channel);
 IB_UVERBS_DECLARE_CMD(create_cq);
 IB_UVERBS_DECLARE_CMD(resize_cq);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 0cb0007..3983a05 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -48,6 +48,7 @@ struct uverbs_lock_class {
 
 static struct uverbs_lock_class pd_lock_class = { .name = "PD-uobj" };
 static struct uverbs_lock_class mr_lock_class = { .name = "MR-uobj" };
+static struct uverbs_lock_class mw_lock_class = { .name = "MW-uobj" };
 static struct uverbs_lock_class cq_lock_class = { .name = "CQ-uobj" };
 static struct uverbs_lock_class qp_lock_class = { .name = "QP-uobj" };
 static struct uverbs_lock_class ah_lock_class = { .name = "AH-uobj" };
@@ -1049,6 +1050,126 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uverbs_file *file,
 	return in_len;
 }
 
+ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
+			 const char __user *buf, int in_len,
+			 int out_len)
+{
+	struct ib_uverbs_alloc_mw cmd;
+	struct ib_uverbs_alloc_mw_resp resp;
+	struct ib_uobject *uobj;
+	struct ib_pd *pd;
+	struct ib_mw *mw;
+	int ret;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	uobj = kmalloc(sizeof(*uobj), GFP_KERNEL);
+	if (!uobj)
+		return -ENOMEM;
+
+	init_uobj(uobj, 0, file->ucontext, &mw_lock_class);
+	down_write(&uobj->mutex);
+
+	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
+	if (!pd) {
+		ret = -EINVAL;
+		goto err_free;
+	}
+
+	mw = pd->device->alloc_mw(pd, cmd.mw_type);
+	if (IS_ERR(mw)) {
+		ret = PTR_ERR(mw);
+		goto err_put;
+	}
+
+	mw->device = pd->device;
+	mw->pd = pd;
+	mw->uobject = uobj;
+	atomic_inc(&pd->usecnt);
+
+	uobj->object = mw;
+	ret = idr_add_uobj(&ib_uverbs_mw_idr, uobj);
+	if (ret)
+		goto err_unalloc;
+
+	memset(&resp, 0, sizeof(resp));
+	resp.rkey = mw->rkey;
+	resp.mw_handle = uobj->id;
+
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp))) {
+		ret = -EFAULT;
+		goto err_copy;
+	}
+
+	put_pd_read(pd);
+
+	mutex_lock(&file->mutex);
+	list_add_tail(&uobj->list, &file->ucontext->mw_list);
+	mutex_unlock(&file->mutex);
+
+	uobj->live = 1;
+
+	up_write(&uobj->mutex);
+
+	return in_len;
+
+err_copy:
+	idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+
+err_unalloc:
+	ib_dealloc_mw(mw);
+
+err_put:
+	put_pd_read(pd);
+
+err_free:
+	put_uobj_write(uobj);
+	return ret;
+}
+
+ssize_t ib_uverbs_dealloc_mw(struct ib_uverbs_file *file,
+			   const char __user *buf, int in_len,
+			   int out_len)
+{
+	struct ib_uverbs_dealloc_mw cmd;
+	struct ib_mw *mw;
+	struct ib_uobject *uobj;
+	int ret = -EINVAL;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	uobj = idr_write_uobj(&ib_uverbs_mw_idr, cmd.mw_handle, file->ucontext);
+	if (!uobj)
+		return -EINVAL;
+
+	mw = uobj->object;
+
+	ret = ib_dealloc_mw(mw);
+	if (!ret)
+		uobj->live = 0;
+
+	put_uobj_write(uobj);
+
+	if (ret)
+		return ret;
+
+	idr_remove_uobj(&ib_uverbs_mw_idr, uobj);
+
+	mutex_lock(&file->mutex);
+	list_del(&uobj->list);
+	mutex_unlock(&file->mutex);
+
+	put_uobj(uobj);
+
+	return in_len;
+}
+
 ssize_t ib_uverbs_create_comp_channel(struc
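
ib_uverbs_alloc_mw() in this patch acquires its resources in order and releases them in reverse through a chain of error labels (err_copy, err_unalloc, err_put, err_free). The sketch below shows the same unwinding idea in user space with two steps only. Everything in it (unwind_trace, alloc_sketch, the fail_at parameter, the chosen error codes) is hypothetical and exists only to make the control flow observable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical record of which cleanup steps ran, for illustration only. */
struct unwind_trace {
	int freed_obj;
	int dropped_ref;
};

/*
 * fail_at: 0 = no failure, 1 = fail while taking the reference,
 * 2 = fail while publishing the object.
 */
static int alloc_sketch(int fail_at, struct unwind_trace *t, int **out)
{
	int *obj;
	int ret;

	obj = malloc(sizeof(*obj));
	if (!obj)
		return -ENOMEM;

	if (fail_at == 1) {		/* step 1: take a reference */
		ret = -EINVAL;
		goto err_free;
	}

	if (fail_at == 2) {		/* step 2: publish the object */
		ret = -EFAULT;
		goto err_put;
	}

	*obj = 42;
	*out = obj;
	return 0;

err_put:
	t->dropped_ref = 1;		/* undo step 1 */
err_free:
	free(obj);			/* undo the allocation */
	t->freed_obj = 1;
	return ret;
}
```

Each label undoes exactly one step and falls through to the labels for the earlier steps, so a failure at step N cleans up steps N-1 down to the first.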
[PATCH for-next 04/10] net/mlx4_core: Disable memory windows for VFs
From: Shani Michaeli

Do not enable memory windows allocation for virtual functions. In
addition, add a few safety checks, such as:

* Verifying the PD of a new MPT matches the VF.
* Making sure binding memory window isn't enabled for FMRs, and that
  new memory windows are not FMR themselves.

Signed-off-by: Haggai Eran
Signed-off-by: Shani Michaeli
Signed-off-by: Or Gerlitz
---
 drivers/net/ethernet/mellanox/mlx4/fw.c            |   11 -
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   16 ++
 drivers/net/ethernet/mellanox/mlx4/mr.c            |   14 --
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   49
 4 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 8b3d051..a389612 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -757,15 +757,19 @@ int mlx4_QUERY_DEV_CAP_wrapper(struct mlx4_dev *dev, int slave,
 	u64 flags;
 	int err = 0;
 	u8 field;
+	u32 bmme_flags;
 
 	err = mlx4_cmd_box(dev, 0, outbox->dma, 0, 0, MLX4_CMD_QUERY_DEV_CAP,
 			   MLX4_CMD_TIME_CLASS_A, MLX4_CMD_NATIVE);
 	if (err)
 		return err;
 
-	/* add port mng change event capability unconditionally to slaves */
+	/* add port mng change event capability and disable mw type 1
+	 * unconditionally to slaves
+	 */
 	MLX4_GET(flags, outbox->buf, QUERY_DEV_CAP_EXT_FLAGS_OFFSET);
 	flags |= MLX4_DEV_CAP_FLAG_PORT_MNG_CHG_EV;
+	flags &= ~MLX4_DEV_CAP_FLAG_MEM_WINDOW;
 	MLX4_PUT(outbox->buf, flags, QUERY_DEV_CAP_EXT_FLAGS_OFFSET);
 
 	/* For guests, report Blueflame disabled */
@@ -773,6 +777,11 @@ int mlx4_QUERY_DEV_CAP_wrapper(struct mlx4_dev *dev, int slave,
 	field &= 0x7f;
 	MLX4_PUT(outbox->buf, field, QUERY_DEV_CAP_BF_OFFSET);
 
+	/* For guests, disable mw type 2 */
+	MLX4_GET(bmme_flags, outbox, QUERY_DEV_CAP_BMME_FLAGS_OFFSET);
+	bmme_flags &= ~MLX4_BMME_FLAG_TYPE_2_WIN;
+	MLX4_PUT(outbox->buf, bmme_flags, QUERY_DEV_CAP_BMME_FLAGS_OFFSET);
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 5075236..539212b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -268,6 +268,22 @@ struct mlx4_icm_table {
 	struct mlx4_icm **icm;
 };
 
+#define MLX4_MPT_FLAG_SW_OWNS	    (0xfUL << 28)
+#define MLX4_MPT_FLAG_FREE	    (0x3UL << 28)
+#define MLX4_MPT_FLAG_MIO	    (1 << 17)
+#define MLX4_MPT_FLAG_BIND_ENABLE   (1 << 15)
+#define MLX4_MPT_FLAG_PHYSICAL	    (1 << 9)
+#define MLX4_MPT_FLAG_REGION	    (1 << 8)
+
+#define MLX4_MPT_PD_FLAG_FAST_REG   (1 << 27)
+#define MLX4_MPT_PD_FLAG_RAE	    (1 << 28)
+#define MLX4_MPT_PD_FLAG_EN_INV	    (3 << 24)
+
+#define MLX4_MPT_QP_FLAG_BOUND_QP   (1 << 7)
+
+#define MLX4_MPT_STATUS_SW	    0xF0
+#define MLX4_MPT_STATUS_HW	    0x00
+
 /*
  * Must be packed because mtt_seg is 64 bits but only aligned to 32 bits.
  */
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 06b16e4..5e785bd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -44,20 +44,6 @@
 #include "mlx4.h"
 #include "icm.h"
 
-#define MLX4_MPT_FLAG_SW_OWNS	    (0xfUL << 28)
-#define MLX4_MPT_FLAG_FREE	    (0x3UL << 28)
-#define MLX4_MPT_FLAG_MIO	    (1 << 17)
-#define MLX4_MPT_FLAG_BIND_ENABLE   (1 << 15)
-#define MLX4_MPT_FLAG_PHYSICAL	    (1 << 9)
-#define MLX4_MPT_FLAG_REGION	    (1 << 8)
-
-#define MLX4_MPT_PD_FLAG_FAST_REG   (1 << 27)
-#define MLX4_MPT_PD_FLAG_RAE	    (1 << 28)
-#define MLX4_MPT_PD_FLAG_EN_INV	    (3 << 24)
-
-#define MLX4_MPT_STATUS_SW	    0xF0
-#define MLX4_MPT_STATUS_HW	    0x00
-
 static u32 mlx4_buddy_alloc(struct mlx4_buddy *buddy, int order)
 {
 	int o;
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 2287dfd..9185e2e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -1796,6 +1796,26 @@ static int mr_get_mtt_size(struct mlx4_mpt_entry *mpt)
 	return be32_to_cpu(mpt->mtt_sz);
 }
 
+static u32 mr_get_pd(struct mlx4_mpt_entry *mpt)
+{
+	return be32_to_cpu(mpt->pd_flags) & 0x00ff;
+}
+
+static int mr_is_fmr(struct mlx4_mpt_entry *mpt)
+{
+	return be32_to_cpu(mpt->pd_flags) & MLX4_MPT_PD_FLAG_FAST_REG;
+}
+
+static int mr_is_bind_enabled(struct mlx4_mpt_entry *mpt)
+{
+	return be32_to_cpu(mpt->flags) & MLX4_MPT_FLAG_BIND_ENABLE;
+}
+
+static int mr_is_region(struct mlx4_mpt_entry *mpt)
+{
+	return be32_to_cpu(mpt->flags) & MLX4_MPT_FLAG_RE
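
The commit message above lists two FMR-related safety checks: bind must not be enabled on an FMR, and a new memory window must not itself be an FMR. The part of the patch that applies them is cut off in this archive, so the following is only a sketch of how the helper predicates could combine. The bit positions are copied from the defines in the patch; the host-endian mpt_sketch structure, the check_mpt_sketch function and the -EPERM return value are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Bit positions copied from the defines moved into mlx4.h by the patch. */
#define MLX4_MPT_FLAG_BIND_ENABLE	(1u << 15)
#define MLX4_MPT_FLAG_REGION		(1u << 8)
#define MLX4_MPT_PD_FLAG_FAST_REG	(1u << 27)

/* Hypothetical host-endian view of the two MPT words the checks read. */
struct mpt_sketch {
	uint32_t flags;
	uint32_t pd_flags;
};

/* Returns 0 if the entry passes both checks, -EPERM (assumed) otherwise. */
static int check_mpt_sketch(const struct mpt_sketch *mpt)
{
	int is_region = !!(mpt->flags & MLX4_MPT_FLAG_REGION);
	int is_fmr = !!(mpt->pd_flags & MLX4_MPT_PD_FLAG_FAST_REG);
	int bind_enabled = !!(mpt->flags & MLX4_MPT_FLAG_BIND_ENABLE);

	/* Binding memory windows must not be enabled for FMRs. */
	if (is_fmr && bind_enabled)
		return -EPERM;

	/* A memory window (non-region entry) must not be an FMR. */
	if (is_fmr && !is_region)
		return -EPERM;

	return 0;
}
```

The real wrapper reads big-endian fields from the command mailbox and also compares the PD against the slave; neither is modelled here.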
[PATCH for-next 00/10] mlx4: Add Memory Windows support
Hi Roland,

Here's a series from Shani Michaeli and Haggai Eran that adds mlx4 driver support for Memory Windows.

The first entries in this set are "pre patches" preparing the ground for the actual implementation of MWs. Next come two core patches, one to ib_verbs.h adding support for type 2 MWs and another one to uverbs that exposes MW commands to user space. Finally come the actual mlx4 driver MW patches.

Or.

Shani Michaeli (10):
  IB/mlx4_ib: Remove local invalidate segment unused fields
  net/mlx4_core: Rename MPT related service routines to have mpt_ prefix
  net/mlx4_core: Propogate MR deregistration failure
  net/mlx4_core: Disable memory windows for VFs
  net/mlx4_core: Enable memory windows in {INIT,QUERY}_HCA
  IB/core: Enhance memory windows support
  IB/uverbs: Implement memory windows support in uverbs
  mlx4: Implement memory windows allocation and deallocation
  IB/mlx4_ib: Support memory window binding
  IB/mlx4_ib: Advertize MW support

 drivers/infiniband/core/uverbs.h                   |    2 +
 drivers/infiniband/core/uverbs_cmd.c               |  121 +
 drivers/infiniband/core/uverbs_main.c              |   13 ++-
 drivers/infiniband/core/verbs.c                    |    5 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c        |    5 +-
 drivers/infiniband/hw/cxgb3/iwch_qp.c              |   15 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h             |    2 +-
 drivers/infiniband/hw/cxgb4/mem.c                  |    5 +-
 drivers/infiniband/hw/ehca/ehca_iverbs.h           |    2 +-
 drivers/infiniband/hw/ehca/ehca_mrmw.c             |    5 +-
 drivers/infiniband/hw/mlx4/main.c                  |   19 ++
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |   14 ++
 drivers/infiniband/hw/mlx4/mr.c                    |   87 +-
 drivers/infiniband/hw/mlx4/qp.c                    |   41 -
 drivers/infiniband/hw/nes/nes_verbs.c              |   19 ++-
 drivers/net/ethernet/mellanox/mlx4/en_main.c       |    4 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c            |   14 ++-
 drivers/net/ethernet/mellanox/mlx4/fw.h            |    1 +
 drivers/net/ethernet/mellanox/mlx4/main.c          |    4 +
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   34 +++-
 drivers/net/ethernet/mellanox/mlx4/mr.c            |  186 +++-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   63 ++-
 include/linux/mlx4/device.h                        |   22 ++-
 include/linux/mlx4/qp.h                            |   19 ++-
 include/rdma/ib_verbs.h                            |   73 +++-
 include/uapi/rdma/ib_user_verbs.h                  |   16 ++
 net/sunrpc/xprtrdma/verbs.c                        |   20 +-
 27 files changed, 683 insertions(+), 128 deletions(-)
[PATCH for-next 03/10] net/mlx4_core: Propogate MR deregistration failure
From: Shani Michaeli

MR deregistration fails when memory windows are bound to it. Handle such
failures by propagating it to the caller ULP.

Signed-off-by: Haggai Eran
Signed-off-by: Shani Michaeli
Signed-off-by: Or Gerlitz
---
 drivers/infiniband/hw/mlx4/mr.c              |   13 +++
 drivers/net/ethernet/mellanox/mlx4/en_main.c |    4 +-
 drivers/net/ethernet/mellanox/mlx4/mr.c      |   29 +++--
 include/linux/mlx4/device.h                  |    2 +-
 4 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index bbaf617..254e1cf 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -68,7 +68,7 @@ struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc)
 	return &mr->ibmr;
 
 err_mr:
-	mlx4_mr_free(to_mdev(pd->device)->dev, &mr->mmr);
+	(void) mlx4_mr_free(to_mdev(pd->device)->dev, &mr->mmr);
 
 err_free:
 	kfree(mr);
@@ -163,7 +163,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	return &mr->ibmr;
 
 err_mr:
-	mlx4_mr_free(to_mdev(pd->device)->dev, &mr->mmr);
+	(void) mlx4_mr_free(to_mdev(pd->device)->dev, &mr->mmr);
 
 err_umem:
 	ib_umem_release(mr->umem);
@@ -177,8 +177,11 @@ err_free:
 int mlx4_ib_dereg_mr(struct ib_mr *ibmr)
 {
 	struct mlx4_ib_mr *mr = to_mmr(ibmr);
+	int ret;
 
-	mlx4_mr_free(to_mdev(ibmr->device)->dev, &mr->mmr);
+	ret = mlx4_mr_free(to_mdev(ibmr->device)->dev, &mr->mmr);
+	if (ret)
+		return ret;
 	if (mr->umem)
 		ib_umem_release(mr->umem);
 	kfree(mr);
@@ -212,7 +215,7 @@ struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 	return &mr->ibmr;
 
 err_mr:
-	mlx4_mr_free(dev->dev, &mr->mmr);
+	(void) mlx4_mr_free(dev->dev, &mr->mmr);
 
 err_free:
 	kfree(mr);
@@ -291,7 +294,7 @@ struct ib_fmr *mlx4_ib_fmr_alloc(struct ib_pd *pd, int acc,
 	return &fmr->ibfmr;
 
 err_mr:
-	mlx4_mr_free(to_mdev(pd->device)->dev, &fmr->mfmr.mr);
+	(void) mlx4_mr_free(to_mdev(pd->device)->dev, &fmr->mfmr.mr);
 
 err_free:
 	kfree(fmr);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 3a2b8c6..a298714 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -176,7 +176,7 @@ static void mlx4_en_remove(struct mlx4_dev *dev, void *endev_ptr)
 	flush_workqueue(mdev->workqueue);
 	destroy_workqueue(mdev->workqueue);
 
-	mlx4_mr_free(dev, &mdev->mr);
+	(void) mlx4_mr_free(dev, &mdev->mr);
 	iounmap(mdev->uar_map);
 	mlx4_uar_free(dev, &mdev->priv_uar);
 	mlx4_pd_free(dev, mdev->priv_pdn);
@@ -283,7 +283,7 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
 	return mdev;
 
 err_mr:
-	mlx4_mr_free(dev, &mdev->mr);
+	(void) mlx4_mr_free(dev, &mdev->mr);
 err_map:
 	if (!mdev->uar_map)
 		iounmap(mdev->uar_map);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c
index 49705cf..06b16e4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mr.c
@@ -442,7 +442,7 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access,
 }
 EXPORT_SYMBOL_GPL(mlx4_mr_alloc);
 
-static void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr)
+static int mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr)
 {
 	int err;
 
@@ -450,20 +450,31 @@ static void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr)
 		err = mlx4_HW2SW_MPT(dev, NULL,
 				     key_to_hw_index(mr->key) &
 				     (dev->caps.num_mpts - 1));
-		if (err)
-			mlx4_warn(dev, "xxx HW2SW_MPT failed (%d)\n", err);
+		if (err) {
+			mlx4_warn(dev, "HW2SW_MPT failed (%d),", err);
+			mlx4_warn(dev, "MR has MWs bound to it.\n");
+			return err;
+		}
 
 		mr->enabled = MLX4_MPT_EN_SW;
 	}
 	mlx4_mtt_cleanup(dev, &mr->mtt);
+
+	return 0;
 }
 
-void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr)
+int mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr)
 {
-	mlx4_mr_free_reserved(dev, mr);
+	int ret;
+
+	ret = mlx4_mr_free_reserved(dev, mr);
+	if (ret)
+		return ret;
 	if (mr->enabled)
 		mlx4_mpt_free_icm(dev, key_to_hw_index(mr->key));
 	mlx4_mpt_release(dev, key_to_hw_index(mr->key));
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(mlx4_mr_free);
 
@@ -831,7 +842,7 @@ int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages,
 	return 0;
 
 err_free:
-	mlx4_mr_free(dev, &fmr->mr);
+
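
The change to mlx4_ib_dereg_mr() in this patch makes the wrapper return early when the low-level free fails, so the MR object stays allocated and the caller can retry after unbinding its windows. Here is a minimal user-space sketch of that contract. The structures and functions are hypothetical, and the use of -EBUSY as the failure code is an assumption; the patch simply forwards whatever the HW2SW_MPT command returns.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical low-level MR: it cannot be freed while windows are bound. */
struct mr_sketch {
	int bound_windows;
};

/* Hypothetical wrapper object that embeds the low-level MR. */
struct ib_mr_sketch {
	struct mr_sketch mmr;
};

/* Low-level free: fails (error code assumed) while windows are bound. */
static int mr_free_sketch(struct mr_sketch *mr)
{
	if (mr->bound_windows > 0)
		return -EBUSY;
	return 0;
}

/*
 * Deregister: propagate the failure and leave the wrapper allocated,
 * so the caller still owns a valid object and may retry later.
 */
static int dereg_mr_sketch(struct ib_mr_sketch *mr)
{
	int ret = mr_free_sketch(&mr->mmr);

	if (ret)
		return ret;
	free(mr);
	return 0;
}
```

The error paths in the patch that cast the result to (void) are the opposite case: they run during setup failure, where no window can be bound yet and there is nobody to report to.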
[PATCH for-next 01/10] IB/mlx4_ib: Remove local invalidate segment unused fields
From: Shani Michaeli

Remove unused fields from the local invalidate WQE segment structure.

Signed-off-by: Haggai Eran
Signed-off-by: Shani Michaeli
Signed-off-by: Or Gerlitz
---
 drivers/infiniband/hw/mlx4/qp.c |    6 ++
 include/linux/mlx4/qp.h         |    8 +++-
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 19e0637..c6dde71 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1983,10 +1983,8 @@ static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg, struct ib_send_wr *wr)
 
 static void set_local_inv_seg(struct mlx4_wqe_local_inval_seg *iseg, u32 rkey)
 {
-	iseg->flags = 0;
-	iseg->mem_key = cpu_to_be32(rkey);
-	iseg->guest_id = 0;
-	iseg->pa = 0;
+	memset(iseg, 0, sizeof(*iseg));
+	iseg->mem_key = cpu_to_be32(rkey);
 }
 
 static __always_inline void set_raddr_seg(struct mlx4_wqe_raddr_seg *rseg,
diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h
index 4b4ad6f..6c8a68c 100644
--- a/include/linux/mlx4/qp.h
+++ b/include/linux/mlx4/qp.h
@@ -304,12 +304,10 @@ struct mlx4_wqe_fmr_ext_seg {
 };
 
 struct mlx4_wqe_local_inval_seg {
-	__be32 flags;
-	u32 reserved1;
+	u64 reserved1;
 	__be32 mem_key;
-	u32 reserved2[2];
-	__be32 guest_id;
-	__be64 pa;
+	u32 reserved2;
+	u64 reserved3[2];
 };
 
 struct mlx4_wqe_raddr_seg {
-- 
1.7.1
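
The structure change in this patch only works if the new layout keeps the segment size and the position of mem_key unchanged. The sketch below checks that in user space with fixed-width stand-ins for the kernel types (uint32_t for u32/__be32, uint64_t for u64/__be64). The struct and function names are hypothetical, and byte-order conversion is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout before the patch, with fixed-width stand-ins for kernel types. */
struct inval_seg_old {
	uint32_t flags;
	uint32_t reserved1;
	uint32_t mem_key;
	uint32_t reserved2[2];
	uint32_t guest_id;
	uint64_t pa;
};

/* Layout after the patch. */
struct inval_seg_new {
	uint64_t reserved1;
	uint32_t mem_key;
	uint32_t reserved2;
	uint64_t reserved3[2];
};

/* The hardware-visible size and the offset of mem_key must not move. */
_Static_assert(sizeof(struct inval_seg_old) == sizeof(struct inval_seg_new),
	       "segment size changed");
_Static_assert(offsetof(struct inval_seg_old, mem_key) ==
	       offsetof(struct inval_seg_new, mem_key),
	       "mem_key moved");

/*
 * Zero the whole segment, then set the only field still in use.
 * key_be is assumed to be already in the byte order the device expects.
 */
static void set_local_inv_seg_sketch(struct inval_seg_new *iseg,
				     uint32_t key_be)
{
	memset(iseg, 0, sizeof(*iseg));
	iseg->mem_key = key_be;
}
```

Clearing with memset also covers the reserved words, which the per-field assignments in the old code left untouched.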
Re: NFS over RDMA crashing
On 2/6/2013 9:48 AM, Yan Burman wrote:
> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no longer getting the server crashes, so the rest of my tests were done using that point (it is somewhere in the middle of 3.7.0-rc2).

+tom tucker

I'd try going back a few kernels, like to 3.5.x, and see if things are more stable. If you find a point that works, then git bisect might help identify the regression.
NFS over RDMA crashing
Hi.

I have been trying to create a setup with NFS/RDMA, but I am getting crashes.

I am using a Mellanox ConnectX 3 HCA with SRIOV enabled, with two KVM VMs running RHEL 6.3, each getting one VF. My test case is trying to use one VM's storage from the other using NFS over RDMA (192.168.20.210 server, 192.168.20.211 client).

I started with two physical hosts, but because of crashes moved to VMs, which are easier to debug. I have a functional ipoib connection between the two VMs, and rping is working between them also.

My /etc/exports has the following entry:
/mnt/tmp *(fsid=1,rw,async,insecure,all_squash)

while /mnt/tmp has tmpfs mounted on it.

My mount command is:
mount -t nfs -o rdma,port=2050 192.168.20.210:/mnt/tmp /mnt/tmp

I tried the latest net-next kernel first, but I was getting the following errors:

=============================================
[ INFO: possible recursive locking detected ]
3.8.0-rc5+ #4 Not tainted
---------------------------------------------
kworker/6:0/49 is trying to acquire lock:
 (&id_priv->handler_mutex){+.+.+.}, at: [] rdma_destroy_id+0x33/0x250 [rdma_cm]

but task is already holding lock:
 (&id_priv->handler_mutex){+.+.+.}, at: [] cma_disable_callback+0x2b/0x60 [rdma_cm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
  lock(&id_priv->handler_mutex);
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/6:0/49:
 #0: (ib_cm){.+.+.+}, at: [] process_one_work+0x160/0x720
 #1: ((&(&work->work)->work)){+.+.+.}, at: [] process_one_work+0x160/0x720
 #2: (&id_priv->handler_mutex){+.+.+.}, at: [] cma_disable_callback+0x2b/0x60 [rdma_cm]

stack backtrace:
Pid: 49, comm: kworker/6:0 Not tainted 3.8.0-rc5+ #4
Call Trace:
 [] validate_chain+0xdcc/0x11f0
 [] ? save_trace+0x3f/0xc0
 [] __lock_acquire+0x440/0xc30
 [] ? __lock_acquire+0x440/0xc30
 [] lock_acquire+0x95/0x1e0
 [] ? rdma_destroy_id+0x33/0x250 [rdma_cm]
 [] ? rdma_destroy_id+0x33/0x250 [rdma_cm]
 [] mutex_lock_nested+0x5f/0x3b0
 [] ? rdma_destroy_id+0x33/0x250 [rdma_cm]
 [] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [] ? trace_hardirqs_on+0xd/0x10
 [] ? _raw_spin_unlock_irqrestore+0x3d/0x80
 [] rdma_destroy_id+0x33/0x250 [rdma_cm]
 [] cma_req_handler+0x719/0x730 [rdma_cm]
 [] ? _raw_spin_unlock_irqrestore+0x4/0x80
 [] cm_process_work+0x22/0x170 [ib_cm]
 [] cm_req_handler+0x67d/0xa70 [ib_cm]
 [] cm_work_handler+0x12d/0x1218 [ib_cm]
 [] process_one_work+0x1d2/0x720
 [] ? process_one_work+0x160/0x720
 [] ? cm_req_handler+0xa70/0xa70 [ib_cm]
 [] worker_thread+0x120/0x460
 [] ? preempt_schedule+0x44/0x60
 [] ? manage_workers+0x300/0x300
 [] kthread+0xd6/0xe0
 [] ? __init_kthread_worker+0x70/0x70
 [] ret_from_fork+0x7c/0xb0
 [] ? __init_kthread_worker+0x70/0x70

When killing mount command that got stuck:
---
BUG: unable to handle kernel paging request at 880324dc7ff8
IP: [] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 800324dc7161
Oops: 0003 [#1] PREEMPT SMP
Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod
CPU 6
Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro X8DTH-i/6/iF/6F/X8DTH
RIP: 0010:[] [] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
RSP: 0018:880324c3dbf8 EFLAGS: 00010297
RAX: 880324dc8000 RBX: 0001 RCX: 880324dd8428
RDX: 880324dc7ff8 RSI: 880324dd8428 RDI: 81149618
RBP: 880324c3dd78 R08: 60f9c860 R09: 0001
R10: 880324dd8000 R11: 0001 R12: 8806299dcb10
R13: 0003 R14: 0001 R15: 0010
FS: () GS:88063fc0() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 880324dc7ff8 CR3: 01a0b000 CR4: 07e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process nfsd (pid: 4744, threadinfo 880324c3c000, task 88033055)
Stack:
 880324c3dc78 880324c3dcd8 0282 880631cec000
 880324dd8000 88062ed33040 000124c3dc48 880324dd8000
 88062ed33058 880630ce2b90 8806299e8000 0003
Call Trace:
 [] svc_rdma_recv
Re: "Virtual" ibnetdiscover command fails
On 06/02/2013 12:40, Sebastian Riemer wrote:
> So if I don't use the unmaintained srptools to get the SRP connection strings but instead send them directly to the initiator to connect to the SRP target, then SRP should also be possible with the virtual GUID. Am I right?

Basically YES, you can use the initiator VM vGID as the source GID for the connection.

Or.
Re: "Virtual" ibnetdiscover command fails
On 06.02.2013 11:20, Or Gerlitz wrote:
> On 06/02/2013 12:04, Mathis GAVILLON wrote:
>> Just a last question: is it possible for the VFs' lid to be different
>> from the PF's one?
>
> NO, we've implemented a "shared port" model, so all functions on the
> same IB port use the same lid, each function has its own
> virtual GUID though.

So if I don't use the unmaintained srptools to get the SRP connection strings but instead send them directly to the initiator to connect to the SRP target, then SRP should also be possible with the virtual GUID. Am I right?

Cheers,
Sebastian
Re: "Virtual" ibnetdiscover command fails
On 06.02.2013 10:22, Or Gerlitz wrote:
> On 06/02/2013 11:17, Mathis GAVILLON wrote:
>> Ok. But what is it possible to do with Infiniband VFs if QP0 is not
>> available ?
>
> EVERYTHING, e.g run IPoIB, iSER, RDS, MPI, etc, etc - except for what
> requires QP0, such as running SM or issuing SMPs for
> discovery/diagnostics purposes

But I've heard that SRP isn't provided with SR-IOV. Is it just a matter of software, or is it a matter of firmware/hardware?
Re: "Virtual" ibnetdiscover command fails
On 06/02/2013 12:04, Mathis GAVILLON wrote:
> Just a last question: is it possible for the VFs' lid to be different from the PF's one?

NO, we've implemented a "shared port" model, so all functions on the same IB port use the same lid, each function has its own virtual GUID though.

Or.

> Thanks
>
> 2013/2/6 Mathis GAVILLON:
>>> EVERYTHING, e.g run IPoIB, iSER, RDS, MPI, etc, etc - except for what requires QP0, such as running SM or issuing SMPs for discovery/diagnostics purposes
>>
>> Ok. I just begin with Infiniband technology so I don't know everything about this yet.
>>
>> Thanks
>> Mathis
[ANNOUNCE] OpenSM tarball release
Hi,

There is a new release of OpenSM. Tarball available in:
http://www.openfabrics.org/downloads/management/
(listed in http://www.openfabrics.org/downloads/management/latest.txt)

md5sum:
32b16efbaba69d478f8c05df42ce0462 opensm-3.3.16.tar.gz

All component versions are from recent master branch. Full list of changes is below.

Albert Chu (5):
  opensm: Manage ports that do not support congestion control
  opensm: Fix signed vs unsigned int comparison
  opensm: Protect against spurious wakeups when calling cl_event_wait_on
  opensm/osm_perfmgr_db.c: Fix output error due to possible 32bit int overflow
  opensm: Add better error output when parsing node name maps

Alex Netes (15):
  opensm: fix crash in DFSSSP routing engine on reroute
  opensm: fix default cc_max_outstanding_mads assignment
  opensm/osm_subnet.c: Only parameters that marked with can_update flag should be updated during conf file rescan
  opensm: Changed #if to #ifdef when using ENABLE_OSM_PERF_MGR_PROFILE
  opensm/osm_link_mgr.c: Set AM SMSupportExtendedSpeeds bit if port supports ExtPortInfo
  opensm/osm_link_mgr.c: Fix sending PortInfo(Set) with AM SMSupportExtendedSpeeds bit set for switch base port 0
  opensm: Revert "opensm/osm_ucast_ftree: When roots are not connected, update hop count but not lft"
  opensm/osm_sm_mad_ctrl.c: Upon receiving trap repress we should decrease qp0_mads_outstanding_on_wire
  opensm: Add physp_p discovery count support
  opensm/osm_sm_state_mgr.c: Start sweep immedeately when recieving HANDOVER in DISCOVERING state
  opensm/configure.in: Remove Default-Start from opensmd init script
  opensm/osm_req.c: fix first sweep m_key search algorithm
  opensm: update shared library versions
  opensm_release_notes-3.3: update
  opensm: packages versions update

Bart Van Assche (7):
  opensm: osm_pkey: Remove unused variables
  opensm: Add .gitignore
  Correct option names in opensm man page
  Add command-line option --pidfile
  Make it possible to enable opensm with chkconfig
  opensm.spec.in: Improve portability
  /etc/init.d/opensmd: Improve systemd integration

Dan Ben Yosef (3):
  opensm/osm_ucast_dfsssp.c : Fix resource leak
  opensm/osm_ucast_dfsssp.c : fix dereference null return value
  opensm/osm_ucast_dfsssp.c : fix dereference before null check

Daniel Klein (1):
  opensm: improve search common pkeys.

Garrett Cooper (3):
  Fix linker error with clang with -O < 2
  Fix -Wtautological-compare warnings with clang
  Fix -Wformat-security warnings with clang

Hal Rosenstock (51):
  opensm/osm_vendor_ibumad.c: Add management class to error log message
  opensm/osm_sw_info_rcv.c: Fixed locking issue on osm_get_node_by_guid error
  OpenSM: Add new Mellanox OUI
  osmtest/osmt_multicast.c: Fix 02BF error
  opensm/osm_torus.c: Add error code to error log message
  opensm/complib/cl_spinlock.h: Remove some unimplemented routines
  opensm/ib_types.h: Commentary and cosmetic formatting change
  opensm/osm_sa.h: Cosmetic commentary change
  opensm/osm_ucast_updn.c: Add error codes to a couple of log messages
  opensm/osm_helper.c: Add some missing new lines to log message output
  opensm/osm_torus.c: Cosmetic formatting change
  opensm: Track minimum value in the fabric for data VLs supported on switch external ports
  opensm/osm_torus.c: Check fabric minimum data VLs on switch external ports
  opensm/complib/cl_atomic_osd.h: Fix long standing bug in cl_atomic_sub
  opensm/osm_trap_rcv.c: Eliminate unneeded trap_rcv_process_response routine
  opensm/osm_vl15intf.c: Fix commentary typo
  opensm/include/complib/cl_packon.h: Fix some commentary typos
  opensm/osm_sa_mcmember_record.c: Return proper scope for query with valid SA key
  Add Per Module Logging support for Congestion Manager
  opensm/include/osm_opensm.h: Fix commentary typo
  opensm: Add routing specific update_vlarb hook routine
  opensm/osm_torus.c: Require only 2 data VLs supported (PortInfo.VLCap) and use VLs 0-1 on CA links
  opensm: Update doc for changes to torus routing for CA, support
  opensm/osm_torus.c: Improve QoS configuration
  opensm/osm_torus.c: Add copyright
  opensm: Update doc for changes 
to torus routing for, endport support opensm/osm_torus.c: Minor simplification to check_qos_config opensm/osm_torus.c: Improve some misconfiguration error messages opensm/osm_req.c: In req_determine_mkey, add more info when ERR 1107 occurs opensm/osm_subnet.c: Improve error messages in subn_validate_neighbor opensm/osm_ucast_ftree.c: Remove duplicate free in fabric_create_leaf_switch_array opensm/osm_ucast_ftree.c: Eliminate unneeded NULL pointer checks prior to calls to free opensm/osm_torus.c: Fix crash in torus_update_osm_vlarb opensm/osm_port_info_rcv.c
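
For anyone who wants to check a downloaded tarball against the md5sum published in the announcement above, here is a minimal Python sketch. The checksum and file name are taken from the announcement; the assumption that the tarball sits in the current directory, and the helper names `md5_of_file` and `matches_announcement`, are mine and purely illustrative.

```python
import hashlib
import os

# Checksum and file name as published in the release announcement.
EXPECTED_MD5 = "32b16efbaba69d478f8c05df42ce0462"
# Assumption: the tarball was downloaded into the current directory.
TARBALL = "opensm-3.3.16.tar.gz"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_announcement(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals `expected` (compared case-insensitively)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    if os.path.exists(TARBALL):
        print("OK" if matches_announcement(TARBALL) else "MISMATCH")
    else:
        print(TARBALL + " not found; download it first")
```

Running `md5sum opensm-3.3.16.tar.gz` and comparing by eye does the same job; the script is only convenient when the check has to be automated. Note that MD5 only guards against a corrupted download, not against deliberate tampering.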
Re: "Virtual" ibnetdiscover command fails
> EVERYTHING, e.g. run IPoIB, iSER, RDS, MPI, etc, etc - except for what
> requires QP0, such as running SM or issuing SMPs for discovery/diagnostics
> purposes

Ok. I am just beginning with InfiniBand technology, so I don't know everything about it yet.

Thanks,
Mathis
Re: "Virtual" ibnetdiscover command fails
On 06/02/2013 11:17, Mathis GAVILLON wrote:
> Ok. But what is it possible to do with Infiniband VFs if QP0 is not available ?

EVERYTHING, e.g. run IPoIB, iSER, RDS, MPI, etc, etc - except for what requires QP0, such as running SM or issuing SMPs for discovery/diagnostics purposes

> 2013/2/5 Jack Morgenstein:
>> Mathis,
>> You cannot use SMP packets on a virtual host (this is a security issue, VFs are not trusted). Since QP0 (SMP) is not available on VFs, any tool which attempts to use QP0 (SMPs) will fail. Thus, OpenSM will not run over a VF, nor will ibnetdiscover, nor will sminfo (which uses SMP).
>> -Jack
Re: [PATCH for 3.8 v3, resend 0/3] IB/SRP patches for kernel 3.8
On 06/02/2013 09:59, Bart Van Assche wrote:
> On 02/06/13 08:44, Or Gerlitz wrote:
>> On 06/02/2013 09:22, Bart Van Assche wrote:
>>> A huge number of patches have been taken upstream between 3.8-rc1 and 3.8-rc6. I have retested these three patches with 3.8-rc6 and would appreciate if you would also repeat your tests.
>>
>> not really... this is what I see on Linus tree for the relevant directories, anywhere else I need to look
>>
>> linux-2.6]# git log --oneline v3.8-rc1..v3.8-rc6 drivers/scsi/ drivers/block/ block/ drivers/infiniband/ulp/srp
>> bdb0ae6 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> 83e6818 efi: Make 'efi_enabled' a function to query EFI facilities
>> 2263647 Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
>> 8d85fce Drivers: block: remove __dev* attributes.
>> 6f03979 Drivers: scsi: remove __dev* attributes.
>> f4953fe virtio-blk: Don't free ida when disk is in use
>
> Nobody outside Mellanox has ever been able to reproduce the behavior reported by you.

I have asked for 2nd opinion so we can get a quorum either way.

> Something in your tests might have been specific to the Mellanox environment. Have you perhaps been running your tests with a firmware version that is not available to the general public ?

NO

> I would appreciate it if you could check your test environment and repeat your tests.

We will repeat the tests, indeed.

Or.