[PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period

2018-06-10 Thread Roman Pen
The q->tag_set_list list entry must not be reinitialized before the RCU
grace period has completed; otherwise the following soft lockup in
blk_mq_sched_restart() happens:

[ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
[ 1064.254445] task: 99b912e8b900 task.stack: a6d54c758000
[ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
[ 1064.256510] Call Trace:
[ 1064.256664]  <IRQ>
[ 1064.256824]  blk_mq_free_request+0xea/0x100
[ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
[ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
[ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
[ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
[ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
[ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
[ 1064.258007]  irq_poll_softirq+0xb7/0xe0
[ 1064.258165]  __do_softirq+0x106/0x2a2
[ 1064.258328]  irq_exit+0x92/0xa0
[ 1064.258509]  do_IRQ+0x4a/0xd0
[ 1064.258660]  common_interrupt+0x7a/0x7a
[ 1064.258818]  </IRQ>

Meanwhile, another context is freeing a different queue that uses the
same set of shared tags:

[ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
[ 1288.201833] bash            D    0  5910   5820 0x
[ 1288.202016] Call Trace:
[ 1288.202315]  schedule+0x32/0x80
[ 1288.202462]  schedule_timeout+0x1e5/0x380
[ 1288.203838]  wait_for_completion+0xb0/0x120
[ 1288.204137]  __wait_rcu_gp+0x125/0x160
[ 1288.204287]  synchronize_sched+0x6e/0x80
[ 1288.204770]  blk_mq_free_queue+0x74/0xe0
[ 1288.204922]  blk_cleanup_queue+0xc7/0x110
[ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
[ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
[ 1288.205548]  kernfs_fop_write+0x109/0x180
[ 1288.206328]  vfs_write+0xb3/0x1a0
[ 1288.206476]  SyS_write+0x52/0xc0
[ 1288.206624]  do_syscall_64+0x68/0x1d0
[ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

What happened is the following:

1. There are several MQ queues with shared tags.
2. One queue is about to be freed and its task is now in
   blk_mq_del_queue_tag_set().
3. Another CPU is in blk_mq_sched_restart() and loops over all queues in
   the tag list in order to find an hctx to restart.

Because the linked list entry was modified in blk_mq_del_queue_tag_set()
without waiting for a grace period, blk_mq_sched_restart() never
terminates: it keeps spinning in list_for_each_entry_rcu_rr(), hence the
soft lockup.

The fix is simple: reinit the list entry only after an RCU grace period
has elapsed.
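
To illustrate the failure mode, here is a minimal userspace sketch.  It is
not the kernel code: struct list_head is re-declared locally, the reader
is a plain forward walk to the list head rather than the round-robin
list_for_each_entry_rcu_rr() macro, and simulate() with its three queues
is a hypothetical driver.  It only shows that a reader standing on a
removed entry can still leave it as long as ->next is intact, and that it
circles forever once the entry has been reinitialized to point at itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *e)
{
	e->next = e;
	e->prev = e;
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Like list_del_rcu() in one respect: unlink, but leave e->next alone. */
static void list_del_keep_next(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

/*
 * A reader standing on 'pos' walks forward until it reaches 'head'.
 * Returns true if it gets there within 'max_steps', false if it is
 * still circling (the soft-lockup case).
 */
static bool reader_reaches_head(const struct list_head *pos,
				const struct list_head *head,
				int max_steps)
{
	while (max_steps-- > 0) {
		pos = pos->next;
		if (pos == head)
			return true;
	}
	return false;
}

/* Hypothetical scenario: three queues on one tag list, q2 is removed. */
static bool simulate(bool reinit_before_readers_leave)
{
	struct list_head head, q1, q2, q3;

	init_list_head(&head);
	list_add_tail(&q1, &head);
	list_add_tail(&q2, &head);
	list_add_tail(&q3, &head);

	/* A reader is positioned on q2 at the moment q2 is removed. */
	list_del_keep_next(&q2);
	if (reinit_before_readers_leave)
		init_list_head(&q2);	/* q2.next now points back at q2 */

	return reader_reaches_head(&q2, &head, 16);
}
```

With simulate(false) the reader steps from q2 to q3 and on to the head;
with simulate(true) it never leaves q2, which corresponds to the lockup
described above.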

Signed-off-by: Roman Pen 
Cc: Jens Axboe 
Cc: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Sagi Grimberg 
Cc: Ming Lei 
Cc: linux-block@vger.kernel.org
---
 block/blk-mq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0dc9e341c2a7..2a40d60950f4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2422,7 +2422,6 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 
 	mutex_lock(&set->tag_list_lock);
 	list_del_rcu(&q->tag_set_list);
-	INIT_LIST_HEAD(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_SHARED;
@@ -2430,8 +2429,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		blk_mq_update_tag_set_depth(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
-
 	synchronize_rcu();
+	INIT_LIST_HEAD(&q->tag_set_list);
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
-- 
2.13.1



[PATCH v3 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-06-06 Thread Roman Pen
 jobs    IBNBD NVMEoRDMA Change
   x1   932951    975425  +4.6%
   x8  1543074   1504416  -2.5%
  x16  1531282   1432937  -6.4%
  x24  1396153   1244858 -10.8%
  x32  1215334   1066607 -12.2%
  x40  1255781   1076841 -14.2%
  x48  1240931   1066453 -14.1%
  x56  1250333   1065879 -14.8%
  x64  1229389   1064199 -13.4%

 rw=randwrite, bandwidth in Kbytes:
 jobs    IBNBD NVMEoRDMA Change
   x1   1416413  1181102 -16.6%
   x8   2438615  1977051 -18.9%
  x16   2436924  1854223 -23.9%
  x24   2430527  1714580 -29.5%
  x32   2425552  1641288 -32.3%
  x40   2378784  1592788 -33.0%
  x48   2202260  1511895 -31.3%
  x56   2207013  1493400 -32.3%
  x64   2098949  1432951 -31.7%


  - on ConnectX-3 (MT4099)
    x40 CPUs Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz

 rw=randread, bandwidth in Kbytes:
 jobs    IBNBD NVMEoRDMA Change
   x1  1961216   2046572  +4.4%
   x8  4012912   4059410  +1.2%
  x16  4033837   3968410  -1.6%
  x24  3939186   3770729  -4.3%
  x32  3843434   3623869  -5.7%
  x40  3696896   3448772  -6.7%
  x48  4106259   3729201  -9.2%
  x56  4141374   3732954  -9.9%
  x64  4207317   3805638  -9.5%

 rw=randwrite, bandwidth in Kbytes:
 jobs    IBNBD NVMEoRDMA Change
   x1  3195637   2479068 -22.4%
   x8  4576924   4541743  -0.8%
  x16  4581528   4555459  -0.6%
  x24  4692540   4595963  -2.1%
  x32  4686968   4540456  -3.1%
  x40  4583814   4404859  -3.9%
  x48  4969587   4710902  -5.2%
  x56  4996101   4701814  -5.9%
  x64  5083460   4759663  -6.4%

  The interesting observation is that on the machine with Intel CPUs and
  a ConnectX-3 card the difference between IBNBD and NVME bandwidth is
  significantly smaller than on AMD with ConnectX-2.  I did not
  thoroughly investigate that behaviour, but suspect that the devil
  is in the Intel vs AMD architecture and probably in how the NUMA nodes
  are organized, i.e. Intel has 2 NUMA nodes against 8 on AMD.  If someone
  is interested in those results and can point out where to dig on the
  NVME side, I can investigate more deeply why exactly NVME bandwidth
  drops significantly on the AMD machine with ConnectX-2.
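
  For reference, the Change column in the tables above appears to be the
  relative difference of the NVMEoRDMA figure against the IBNBD figure.
  This is inferred from the numbers, not stated in the cover letter, and
  the helper name change_percent() is made up for this sketch:

```c
#include <assert.h>

/*
 * Relative change of the NVMEoRDMA bandwidth against the IBNBD
 * bandwidth, in percent.  Negative means NVMEoRDMA is slower.
 * Assumed formula, inferred from the table values.
 */
static double change_percent(double ibnbd_kb, double nvme_kb)
{
	return (nvme_kb - ibnbd_kb) / ibnbd_kb * 100.0;
}
```

  For the ConnectX-3 randread x1 row, change_percent(1961216, 2046572)
  gives roughly 4.35, which the table rounds to +4.4%.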

  Latest shiny graphs are here:
  
https://docs.google.com/spreadsheets/d/1vxSoIvfjPbOWD61XMeN2_gPGxsxrbIUOZADk1UX5lj0

Roman Pen (25):
  sysfs: export sysfs_remove_file_self()
  ibtrs: public interface header to establish RDMA connections
  ibtrs: private headers with IBTRS protocol structs and helpers
  ibtrs: core: lib functions shared between client and server modules
  ibtrs: client: private header with client structs and functions
  ibtrs: client: main functionality
  ibtrs: client: statistics functions
  ibtrs: client: sysfs interface functions
  ibtrs: server: private header with server structs and functions
  ibtrs: server: main functionality
  ibtrs: server: statistics functions
  ibtrs: server: sysfs interface functions
  ibtrs: include client and server modules into kernel compilation
  ibtrs: a bit of documentation
  ibnbd: private headers with IBNBD protocol structs and helpers
  ibnbd: client: private header with client structs and functions
  ibnbd: client: main functionality
  ibnbd: client: sysfs interface functions
  ibnbd: server: private header with server structs and functions
  ibnbd: server: main functionality
  ibnbd: server: functionality for IO submission to file or block dev
  ibnbd: server: sysfs interface functions
  ibnbd: include client and server modules into kernel compilation
  ibnbd: a bit of documentation
  MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

 MAINTAINERS                            |   14 +
 drivers/block/Kconfig                  |    2 +
 drivers/block/Makefile                 |    1 +
 drivers/block/ibnbd/Kconfig            |   22 +
 drivers/block/ibnbd/Makefile           |   11 +
 drivers/block/ibnbd/README             |  299 +++
 drivers/block/ibnbd/ibnbd-clt-sysfs.c  |  685 ++
 drivers/block/ibnbd/ibnbd-clt.c        | 1817 +++
 drivers/block/ibnbd/ibnbd-clt.h        |  172 ++
 drivers/block/ibnbd/ibnbd-log.h        |   71 +
 drivers/block/ibnbd/ibnbd-proto.h      |  364 +++
 drivers/block/ibnbd/ibnbd-srv-dev.c    |  413 
 drivers/block/ibnbd/ibnbd-srv-dev.h    |  149 ++
 drivers/block/ibnbd/ibnbd-srv-sysfs.c  |  242 ++
 drivers/block/ibnbd/ibnbd-srv.c 

[PATCH v3 02/25] ibtrs: public interface header to establish RDMA connections

2018-06-06 Thread Roman Pen
Introduce a public header which provides a set of API functions to
establish RDMA connections from a client to a server machine using the
IBTRS protocol, which manages RDMA connections for each session and
does multipathing and load balancing.

Main functions for client (active) side:

 ibtrs_clt_open() - Creates a set of RDMA connections encapsulated
                    in an IBTRS session and returns a pointer to the
                    IBTRS session object.
 ibtrs_clt_close() - Closes RDMA connections associated with the IBTRS
                     session.
 ibtrs_clt_request() - Requests zero-copy RDMA transfer to/from the
                       server.

Main functions for server (passive) side:

 ibtrs_srv_open() - Starts listening for IBTRS clients on the specified
                    port and invokes IBTRS callbacks for incoming
                    RDMA requests or link events.
 ibtrs_srv_close() - Closes the IBTRS server context.
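
The kernel-doc in the header says ibtrs_clt_open() returns a valid pointer
on success and an error pointer otherwise.  As a rough illustration of
that convention, here is a userspace sketch.  err_ptr(), ptr_err() and
is_err() are local re-implementations modelled on the kernel's ERR_PTR(),
PTR_ERR() and IS_ERR(); struct fake_sess and fake_open() are hypothetical
stand-ins and not part of IBTRS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers live in the topmost MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical session object, only for this sketch. */
struct fake_sess {
	int path_cnt;
};

/* Hypothetical open function: fails with -EINVAL when given no paths. */
static struct fake_sess *fake_open(struct fake_sess *storage, int path_cnt)
{
	if (path_cnt <= 0)
		return err_ptr(-EINVAL);
	storage->path_cnt = path_cnt;
	return storage;
}
```

A caller would test the result with is_err() before using it and recover
the error code with ptr_err(), rather than comparing against NULL.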

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs.h | 325 +++
 1 file changed, 325 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.h b/drivers/infiniband/ulp/ibtrs/ibtrs.h
new file mode 100644
index ..24a1e18816d7
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.h
@@ -0,0 +1,325 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_H
+#define IBTRS_H
+
+#include 
+#include 
+
+struct ibtrs_tag;
+struct ibtrs_clt;
+struct ibtrs_srv_ctx;
+struct ibtrs_srv;
+struct ibtrs_srv_op;
+
+/*
+ * Here goes IBTRS client API
+ */
+
+/**
+ * enum ibtrs_clt_link_ev - Events about connectivity state of a client
+ * @IBTRS_CLT_LINK_EV_RECONNECTED  Client was reconnected.
+ * @IBTRS_CLT_LINK_EV_DISCONNECTED Client was disconnected.
+ */
+enum ibtrs_clt_link_ev {
+   IBTRS_CLT_LINK_EV_RECONNECTED,
+   IBTRS_CLT_LINK_EV_DISCONNECTED,
+};
+
+/**
+ * Source and destination address of a path to be established
+ */
+struct ibtrs_addr {
+   struct sockaddr_storage *src;
+   struct sockaddr_storage *dst;
+};
+
+typedef void (link_clt_ev_fn)(void *priv, enum ibtrs_clt_link_ev ev);
+/**
+ * ibtrs_clt_open() - Open a session to a IBTRS client
+ * @priv:  User supplied private data.
+ * @link_ev:   Event notification for connection state changes
+ * @priv:  user supplied data that was passed to
+ * ibtrs_clt_open()
+ * @ev:Occurred event
+ * @sessname: name of the session
+ * @paths: Paths to be established defined by their src and dst addresses
+ * @path_cnt: Number of elements in the @paths array
+ * @port: port to be used by the IBTRS session
+ * @pdu_sz: Size of extra payload which can be accessed after tag allocation.
+ * @max_inflight_msg: Max. number of parallel inflight messages for the session
+ * @max_segments: Max. number of segments per IO request
+ * @reconnect_delay_sec: time between reconnect tries
+ * @max_reconnect_attempts: Number of times to reconnect on error before giving
+ * up, 0 for disabled, -1 for forever
+ *
+ * Starts session establishment with the ibtrs_server. The function can block
+ * up to ~2000ms until it returns.
+ *
+ * Return a valid pointer on success otherwise PTR_ERR.
+ */
+struct ibtrs_clt *ibtrs_clt_open(void *priv, link_clt_ev_fn *link_ev,
+const char *sessname,
+const struct ibtrs_addr *paths,
+size_t path_cnt, short port,
+size_t pdu_sz, u8 reconnect_delay_sec,
+u16 max_segments,
+s16 max_reconnect_attempts);
+
+/**
+ * ibtrs_clt_close() - Close a session
+ * @sess: Session handler, is freed on return
+ */
+void ibtrs_clt_close(struct ibtrs_clt

[PATCH v3 23/25] ibnbd: include client and server modules into kernel compilation

2018-06-06 Thread Roman Pen
Add IBNBD Makefile, Kconfig and also corresponding lines into upper
block layer files.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/Kconfig|  2 ++
 drivers/block/Makefile   |  1 +
 drivers/block/ibnbd/Kconfig  | 22 ++
 drivers/block/ibnbd/Makefile | 11 +++
 4 files changed, 36 insertions(+)
 create mode 100644 drivers/block/ibnbd/Kconfig
 create mode 100644 drivers/block/ibnbd/Makefile

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index ad9b687a236a..d8c1590411c8 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -481,4 +481,6 @@ config BLK_DEV_RSXX
  To compile this driver as a module, choose M here: the
  module will be called rsxx.
 
+source "drivers/block/ibnbd/Kconfig"
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index dc061158b403..65346a1d0b1a 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)+= mtip32xx/
 obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/
 obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk.o
 obj-$(CONFIG_ZRAM) += zram/
+obj-$(CONFIG_BLK_DEV_IBNBD)+= ibnbd/
 
 skd-y  := skd_main.o
 swim_mod-y := swim.o swim_asm.o
diff --git a/drivers/block/ibnbd/Kconfig b/drivers/block/ibnbd/Kconfig
new file mode 100644
index ..b381c6c084d2
--- /dev/null
+++ b/drivers/block/ibnbd/Kconfig
@@ -0,0 +1,22 @@
+config BLK_DEV_IBNBD
+   bool
+
+config BLK_DEV_IBNBD_CLIENT
+   tristate "Network block device driver on top of IBTRS transport"
+   depends on INFINIBAND_IBTRS_CLIENT
+   select BLK_DEV_IBNBD
+   help
+ IBNBD client allows for mapping of remote block devices over
+ IBTRS protocol from a target system where IBNBD server is running.
+
+ If unsure, say N.
+
+config BLK_DEV_IBNBD_SERVER
+   tristate "Network block device over RDMA Infiniband server support"
+   depends on INFINIBAND_IBTRS_SERVER
+   select BLK_DEV_IBNBD
+   help
+ IBNBD server allows for exporting local block devices to a remote client
+ over IBTRS protocol.
+
+ If unsure, say N.
diff --git a/drivers/block/ibnbd/Makefile b/drivers/block/ibnbd/Makefile
new file mode 100644
index ..ac906036310e
--- /dev/null
+++ b/drivers/block/ibnbd/Makefile
@@ -0,0 +1,11 @@
+ccflags-y := -Idrivers/infiniband/ulp/ibtrs
+
+ibnbd-client-y := ibnbd-clt.o \
+ ibnbd-clt-sysfs.o
+
+ibnbd-server-y := ibnbd-srv.o \
+ ibnbd-srv-dev.o \
+ ibnbd-srv-sysfs.o
+
+obj-$(CONFIG_BLK_DEV_IBNBD_CLIENT) += ibnbd-client.o
+obj-$(CONFIG_BLK_DEV_IBNBD_SERVER) += ibnbd-server.o
-- 
2.13.1



[PATCH v3 18/25] ibnbd: client: sysfs interface functions

2018-06-06 Thread Roman Pen
This is the sysfs interface to IBNBD block devices on client side:

  /sys/devices/virtual/ibnbd-client/ctl/
    |- map_device
    |  *** maps remote device
    |
    |- devices/
       *** all mapped devices

  /sys/block/ibnbd/ibnbd_client/
    |- unmap_device
    |  *** unmaps device
    |
    |- state
    |  *** device state
    |
    |- session
    |  *** session name
    |
    |- mapping_path
       *** path of the dev that was mapped on server
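
For illustration, a userspace program could assemble the string that is
written to map_device roughly as follows.  This is only a sketch: the
three options are the ones the patch marks as mandatory (sessname, path,
device_path), while the helper names build_map_cmd() and write_attr() are
made up here and are not part of IBNBD.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the mandatory map_device options into buf.
 * Returns 0 on success, -1 on a missing argument or truncation.
 */
int build_map_cmd(char *buf, size_t len, const char *sessname,
		  const char *path, const char *device_path)
{
	int n;

	if (!sessname || !path || !device_path)
		return -1;

	n = snprintf(buf, len, "sessname=%s path=%s device_path=%s",
		     sessname, path, device_path);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one command string to a sysfs attribute file. */
int write_attr(const char *attr_path, const char *cmd)
{
	FILE *f = fopen(attr_path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(cmd, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

The resulting string would then be passed to write_attr() together with
/sys/devices/virtual/ibnbd-client/ctl/map_device, which is the same as
the echo command shown in the README of this series.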

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-clt-sysfs.c | 685 ++
 1 file changed, 685 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt-sysfs.c

diff --git a/drivers/block/ibnbd/ibnbd-clt-sysfs.c b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
new file mode 100644
index ..3d3659a74e94
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
@@ -0,0 +1,685 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-clt.h"
+
+static struct device *ibnbd_dev;
+static struct class *ibnbd_dev_class;
+static struct kobject *ibnbd_devs_kobj;
+
+enum {
+   IBNBD_OPT_ERR   = 0,
+   IBNBD_OPT_PATH  = 1 << 0,
+   IBNBD_OPT_DEV_PATH  = 1 << 1,
+   IBNBD_OPT_ACCESS_MODE   = 1 << 3,
+   IBNBD_OPT_IO_MODE   = 1 << 5,
+   IBNBD_OPT_SESSNAME  = 1 << 6,
+};
+
+static unsigned int ibnbd_opt_mandatory[] = {
+   IBNBD_OPT_PATH,
+   IBNBD_OPT_DEV_PATH,
+   IBNBD_OPT_SESSNAME,
+};
+
+static const match_table_t ibnbd_opt_tokens = {
+   {   IBNBD_OPT_PATH, "path=%s"   },
+   {   IBNBD_OPT_DEV_PATH, "device_path=%s"},
+   {   IBNBD_OPT_ACCESS_MODE,  "access_mode=%s"},
+   {   IBNBD_OPT_IO_MODE,  "io_mode=%s"},
+   {   IBNBD_OPT_SESSNAME, "sessname=%s"   },
+   {   IBNBD_OPT_ERR,  NULL},
+};
+
+/* remove new line from string */
+static void strip(char *s)
+{
+   char *p = s;
+
+   while (*s != '\0') {
+   if (*s != '\n')
+   *p++ = *s++;
+   else
+   ++s;
+   }
+   *p = '\0';
+}
+
+static int ibnbd_clt_parse_map_options(const char *buf,
+  char *sessname,
+  struct ibtrs_addr *paths,
+  size_t *path_cnt,
+  size_t max_path_cnt,
+  char *pathname,
+  enum ibnbd_access_mode *access_mode,
+  enum ibnbd_io_mode *io_mode)
+{
+   char *options, *sep_opt;
+   char *p;
+   substring_t args[MAX_OPT_ARGS];
+   int opt_mask = 0;
+   int token;
+   int ret = -EINVAL;
+   int i;
+   int p_cnt = 0;
+
+   options = kstrdup(buf, GFP_KERNEL);
+   if (!options)
+   return -ENOMEM;
+
+   sep_opt = strstrip(options);
+   strip(sep_opt);
+   while ((p = strsep(&sep_opt, " ")) != NULL) {
+   if (!*p)
+   continue;
+
+   token = match_token(p, ibnbd_opt_tokens, args);
+   opt_mask |= token;
+
+   switch (token) {
+   case IBNBD_OPT_SESSNAME:
+   p = match_strdup(args);
+   if (!p) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   if (strlen(p

[PATCH v3 20/25] ibnbd: server: main functionality

2018-06-06 Thread Roman Pen
This is the main functionality of the ibnbd-server module, which handles
IBTRS events and IBNBD protocol requests, like map (open) or unmap (close)
device.  The server side is also responsible for processing incoming IBTRS
IO requests and forwarding them to locally mapped devices.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-srv.c | 946 
 1 file changed, 946 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv.c

diff --git a/drivers/block/ibnbd/ibnbd-srv.c b/drivers/block/ibnbd/ibnbd-srv.c
new file mode 100644
index ..b045f8071ab0
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.c
@@ -0,0 +1,946 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibnbd-srv.h"
+#include "ibnbd-srv-dev.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_VERSION(IBNBD_VER_STRING);
+MODULE_DESCRIPTION("InfiniBand Network Block Device Server");
+MODULE_LICENSE("GPL");
+
+#define DEFAULT_DEV_SEARCH_PATH "/"
+
+static char dev_search_path[PATH_MAX] = DEFAULT_DEV_SEARCH_PATH;
+
+static int dev_search_path_set(const char *val, const struct kernel_param *kp)
+{
+   char *dup;
+
+   if (strlen(val) >= sizeof(dev_search_path))
+   return -EINVAL;
+
+   dup = kstrdup(val, GFP_KERNEL);
+
+   if (dup[strlen(dup) - 1] == '\n')
+   dup[strlen(dup) - 1] = '\0';
+
+   strlcpy(dev_search_path, dup, sizeof(dev_search_path));
+
+   kfree(dup);
+   pr_info("dev_search_path changed to '%s'\n", dev_search_path);
+
+   return 0;
+}
+
+static struct kparam_string dev_search_path_kparam_str = {
+   .maxlen = sizeof(dev_search_path),
+   .string = dev_search_path
+};
+
+static const struct kernel_param_ops dev_search_path_ops = {
+   .set= dev_search_path_set,
+   .get= param_get_string,
+};
+
+module_param_cb(dev_search_path, &dev_search_path_ops,
+   &dev_search_path_kparam_str, 0444);
+MODULE_PARM_DESC(dev_search_path, "Sets the dev_search_path."
+" When a device is mapped this path is prepended to the"
+" device path from the map device operation.  If %SESSNAME%"
+" is specified in a path, then device will be searched in a"
+" session namespace."
+" (default: " DEFAULT_DEV_SEARCH_PATH ")");
+
+static int def_io_mode = IBNBD_BLOCKIO;
+
+static int def_io_mode_set(const char *val, const struct kernel_param *kp)
+{
+   int io_mode, rc;
+
+   rc = kstrtoint(val, 0, &io_mode);
+   if (unlikely(rc))
+   return rc;
+
+   switch (io_mode) {
+   case IBNBD_FILEIO:
+   case IBNBD_BLOCKIO:
+   def_io_mode = io_mode;
+   return 0;
+   default:
+   return -EINVAL;
+   }
+}
+
+static const struct kernel_param_ops def_io_mode_ops = {
+   .set= def_io_mode_set,
+   .get= param_get_int,
+};
+module_param_cb(def_io_mode, &def_io_mode_ops, &def_io_mode, 0444);
+MODULE_PARM_DESC(def_io_mode, "By default, export devices in"
+" blockio(" __stringify(_IBNBD_BLOCKIO) ") or"
+" fileio(" __stringify(_IBNBD_FILEIO) ") mode."
+" (default: " __stringify(_IBNBD_BLOCKIO) " (blockio))");
+
+static DEFINE_MUTEX(sess_lock);
+static DEFINE_SPINLOCK(dev_lock);
+
+static LIST_HEAD(sess_list);
+static LIST_HEAD(dev_list);
+
+struct ibnbd_io_private {
+   struct ibtrs_srv_op *id;
+   struct ibnbd_srv_sess_dev   *sess_dev;
+};
+
+static void ibnbd_sess_dev_release(struct kref *kref)
+{
+

[PATCH v3 24/25] ibnbd: a bit of documentation

2018-06-06 Thread Roman Pen
README with description of major sysfs entries.
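
The README describes how the server builds the device path from the
'dev_search_path' module parameter, an optional %SESSNAME% placeholder and
the device_path sent by the client.  A minimal userspace sketch of that
composition is shown below.  It is not the server's implementation, and
the helper name build_dev_path() is an assumption made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SESSNAME_TOKEN "%SESSNAME%"

/*
 * Compose "<search_path>/<device_path>", replacing the first
 * %SESSNAME% occurrence in search_path with sessname.
 * Returns 0 on success, -1 if the result does not fit into out.
 */
int build_dev_path(char *out, size_t len, const char *search_path,
		   const char *sessname, const char *device_path)
{
	const char *tok = strstr(search_path, SESSNAME_TOKEN);
	int n;

	if (tok)
		n = snprintf(out, len, "%.*s%s%s/%s",
			     (int)(tok - search_path), search_path,
			     sessname, tok + strlen(SESSNAME_TOKEN),
			     device_path);
	else
		n = snprintf(out, len, "%s/%s", search_path, device_path);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With dev_search_path=/run/ibnbd-devs/%SESSNAME%, sessname=blya and
device_path=sda this yields /run/ibnbd-devs/blya/sda, matching the example
given in the README.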

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/README | 299 +
 1 file changed, 299 insertions(+)
 create mode 100644 drivers/block/ibnbd/README

diff --git a/drivers/block/ibnbd/README b/drivers/block/ibnbd/README
new file mode 100644
index ..bbaddd02c1c5
--- /dev/null
+++ b/drivers/block/ibnbd/README
@@ -0,0 +1,299 @@
+***************************************
+Infiniband Network Block Device (IBNBD)
+***************************************
+
+Introduction
+------------
+
+IBNBD (InfiniBand Network Block Device) is a pair of kernel modules
+(client and server) that allow for remote access of a block device on
+the server over IBTRS protocol using the RDMA (InfiniBand, RoCE, iWarp)
+transport. After being mapped, the remote block devices can be accessed
+on the client side as local block devices.
+
+I/O is transferred between client and server by the IBTRS transport
+modules. The administration of IBNBD and IBTRS modules is done via
+sysfs entries.
+
+Requirements
+------------
+
+  IBTRS kernel modules
+
+Quick Start
+-----------
+
+Server side:
+  # modprobe ibnbd_server
+
+Client side:
+  # modprobe ibnbd_client
+  # echo "sessname=blya path=ip:10.50.100.66 device_path=/dev/ram0" > \
+/sys/devices/virtual/ibnbd-client/ctl/map_device
+
+  Where "sessname=" is a session name, a string to identify the session
+  on client and on server sides; "path=" is a destination IP address or
+  a pair of a source and a destination IPs, separated by comma.  Multiple
+  "path=" options can be specified in order to use multipath  (see IBTRS
+  description for details); "device_path=" is the block device to be
+  mapped from the server side. After the session to the server machine is
+  established, the mapped device will appear on the client side under
+  /dev/ibnbd.
+
+
+======================
+Client Sysfs Interface
+======================
+
+All sysfs files that are not read-only provide the usage information on read:
+
+Example:
+  # cat /sys/devices/virtual/ibnbd-client/ctl/map_device
+
+  > Usage: echo "sessname= path=<[srcaddr,]dstaddr>
+  > [path=<[srcaddr,]dstaddr>] device_path=
+  > [access_mode=]
+  > [io_mode=]" > map_device
+  >
+  > addr ::= [ ip: | ip: | gid: ]
+
+Entries under /sys/devices/virtual/ibnbd-client/ctl/
+====================================================
+
+map_device (RW)
+---------------
+
+Expected format is the following:
+
+sessname=
+path=<[srcaddr,]dstaddr> [path=<[srcaddr,]dstaddr> ...]
+device_path=
+[access_mode=]
+[io_mode=]
+
+Where:
+
+sessname: accepts a string not bigger than 256 chars, which identifies
+  a given session on the client and on the server.
+  I.e. "clt_hostname-srv_hostname" could be a natural choice.
+
+path: describes a connection between the client and the server by
+  specifying destination and, when required, the source address.
+  The addresses are to be provided in the following format:
+
+ip:
+ip:
+gid:
+
+  for example:
+
+  path=ip:10.0.0.66
+ The single addr is treated as the destination.
+ The connection will be established to this
+ server from any client IP address.
+
+  path=ip:10.0.0.66,ip:10.0.1.66
+ First addr is the source address and the second
+ is the destination.
+
+  If multiple "path=" options are specified multiple connection
+  will be established and data will be sent according to
+  the selected multipath policy (see IBTRS mp_policy sysfs entry
+  description).
+
+device_path: Path to the block device on the server side. Path is specified
+ relative to the directory on server side configured in the
+ 'dev_search_path' module parameter of the ibnbd_server.
+ The ibnbd_server prepends the  received from client
+ with  and tries to open the
+ / block device.  On success,
+ a /dev/ibnbd device file, a /sys/block/ibnbd_client/ibnbd/
+ directory and an entry in /sys/devices/virtual/ibnbd-client/ctl/devices
+ will be created.
+
+ If 'dev_search_path' contains '%SESSNAME%', then each session can
+ have different devices namespace, e.g. server was configured with
+ the following parameter "dev_search_path=/run/ibnbd-devs/%SESSNAME%",
+ client has this string "sessname=blya device_path=sda", then server
+ will try to open: /run/ibnbd-devs/blya/sda.
+
+access_mode: the access_mode parameter specifies if the device is to be
+ mapped as "ro" read-only or "rw" rea

[PATCH v3 19/25] ibnbd: server: private header with server structs and functions

2018-06-06 Thread Roman Pen
This header describes the main structs and functions used by the
ibnbd-server module, namely structs for managing sessions from different
clients and mapped (opened) devices.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-srv.h | 100 
 1 file changed, 100 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv.h

diff --git a/drivers/block/ibnbd/ibnbd-srv.h b/drivers/block/ibnbd/ibnbd-srv.h
new file mode 100644
index ..191a1650bc1d
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.h
@@ -0,0 +1,100 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_SRV_H
+#define IBNBD_SRV_H
+
+#include 
+#include 
+#include 
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+struct ibnbd_srv_session {
+	/* Entry inside global sess_list */
+	struct list_head	list;
+	struct ibtrs_srv	*ibtrs;
+	char			sessname[NAME_MAX];
+	int			queue_depth;
+	struct bio_set		*sess_bio_set;
+
+	rwlock_t		index_lock ____cacheline_aligned;
+	struct idr		index_idr;
+	/* List of struct ibnbd_srv_sess_dev */
+	struct list_head	sess_dev_list;
+	struct mutex		lock;
+	u8			ver;
+};
+
+struct ibnbd_srv_dev {
+	/* Entry inside global dev_list */
+	struct list_head	list;
+	struct kobject		dev_kobj;
+	struct kobject		dev_sessions_kobj;
+	struct kref		kref;
+	char			id[NAME_MAX];
+	/* List of ibnbd_srv_sess_dev structs */
+	struct list_head	sess_dev_list;
+	struct mutex		lock;
+	int			open_write_cnt;
+	enum ibnbd_io_mode	mode;
+};
+
+/* Structure which binds N devices and N sessions */
+struct ibnbd_srv_sess_dev {
+	/* Entry inside ibnbd_srv_dev struct */
+	struct list_head	dev_list;
+	/* Entry inside ibnbd_srv_session struct */
+	struct list_head	sess_list;
+	struct ibnbd_dev	*ibnbd_dev;
+	struct ibnbd_srv_session	*sess;
+	struct ibnbd_srv_dev	*dev;
+	struct kobject		kobj;
+	struct completion	*sysfs_release_compl;
+	u32			device_id;
+	fmode_t			open_flags;
+	struct kref		kref;
+	struct completion	*destroy_comp;
+	char			pathname[NAME_MAX];
+};
+
+/* ibnbd-srv-sysfs.c */
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+  struct block_device *bdev,
+  const char *dir_name);
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev);
+int ibnbd_srv_create_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+void ibnbd_srv_destroy_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+int ibnbd_srv_create_sysfs_files(void);
+void ibnbd_srv_destroy_sysfs_files(void);
+
+#endif /* IBNBD_SRV_H */
-- 
2.13.1



[PATCH v3 03/25] ibtrs: private headers with IBTRS protocol structs and helpers

2018-06-06 Thread Roman Pen
These are common private headers with IBTRS protocol structures,
logging, sysfs and other helper functions, which are used on
both client and server sides.
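
The logging helpers in ibtrs-log.h pick the session name based on the
static type of the object passed to the log macro, using
__builtin_choose_expr() and __builtin_types_compatible_p().  The sketch
below shows the same compiler technique in a simplified form.  It is not
the macro set from this patch: the demo_* names are hypothetical, and
unknown types fall back to a string here instead of the link-time error
the patch triggers through unknown_type().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for session and connection objects. */
struct demo_sess {
	char sessname[32];
};

struct demo_con {
	struct demo_sess *sess;
};

/*
 * Resolve the session name at compile time from the type of obj.
 * Only the branch whose type matches is evaluated; the others merely
 * have to be valid C (GCC and Clang builtins).
 */
#define demo_sessname(obj)						\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(obj),		\
					     struct demo_con *),	\
		((struct demo_con *)(obj))->sess->sessname,		\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(obj),		\
					     struct demo_sess *),	\
		((struct demo_sess *)(obj))->sessname,			\
		"unknown"))
```

A log macro can then take either a connection or a session pointer and
prefix the message with the right session name without any runtime type
information.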

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-log.h |  91 ++
 drivers/infiniband/ulp/ibtrs/ibtrs-pri.h | 470 +++
 2 files changed, 561 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-log.h
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-pri.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-log.h b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h
new file mode 100644
index ..f56257eabdee
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h
@@ -0,0 +1,91 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_LOG_H
+#define IBTRS_LOG_H
+
+#define P1 )
+#define P2 ))
+#define P3 )))
+#define P4 ))))
+#define P(N) P ## N
+
+#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
+#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__
+
+#define LIST(...)  \
+   __VA_ARGS__,\
+   ({ unknown_type(); NULL; }) \
+   CAT(P, COUNT_ARGS(__VA_ARGS__)) \
+
+#define EMPTY()
+#define DEFER(id) id EMPTY()
+
+#define _CASE(obj, type, member)   \
+   __builtin_choose_expr(  \
+   __builtin_types_compatible_p(   \
+   typeof(obj), type), \
+   ((type)obj)->member
+#define CASE(o, t, m) DEFER(_CASE)(o,t,m)
+
+/*
+ * Below we define retrieving of sessname from common IBTRS types.
+ * Client or server related types have to be defined by special
+ * TYPES_TO_SESSNAME macro.
+ */
+
+void unknown_type(void);
+
+#ifndef TYPES_TO_SESSNAME
+#define TYPES_TO_SESSNAME(...) ({ unknown_type(); NULL; })
+#endif
+
+#define ibtrs_prefix(obj)  \
+   _CASE(obj, struct ibtrs_con *,  sess->sessname),\
+   _CASE(obj, struct ibtrs_sess *, sessname),  \
+   TYPES_TO_SESSNAME(obj)  \
+   ))
+
+#define ibtrs_log(fn, obj, fmt, ...)   \
+   fn("<%s>: " fmt, ibtrs_prefix(obj), ##__VA_ARGS__)
+
+#define ibtrs_err(obj, fmt, ...)   \
+   ibtrs_log(pr_err, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_err_rl(obj, fmt, ...)\
+   ibtrs_log(pr_err_ratelimited, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_wrn(obj, fmt, ...)   \
+   ibtrs_log(pr_warn, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_wrn_rl(obj, fmt, ...) \
+   ibtrs_log(pr_warn_ratelimited, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_info(obj, fmt, ...) \
+   ibtrs_log(pr_info, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_info_rl(obj, fmt, ...) \
+   ibtrs_log(pr_info_ratelimited, obj, fmt, ##__VA_ARGS__)
+
+#endif /* IBTRS_LOG_H */
diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h
new file mode 100644
index ..f56652a46a8d
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h
@@ -0,0 +1,470 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ *
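
The ibtrs_prefix() macro in ibtrs-log.h above selects the session name at
compile time from the static type of its argument. A minimal userspace
sketch of that technique follows, assuming the GCC/Clang builtins
__builtin_choose_expr and __builtin_types_compatible_p. The names
struct demo_sess, struct demo_con, demo_prefix() and demo_selfcheck() are
hypothetical stand-ins for illustration and are not part of IBTRS.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for struct ibtrs_sess and struct ibtrs_con. */
struct demo_sess { char sessname[32]; };
struct demo_con  { struct demo_sess *sess; };

/*
 * Resolve the session name from the static type of obj.
 * The branch that is not chosen is never evaluated, but it still has to
 * compile, which is why each branch casts obj to the type it expects.
 */
#define demo_prefix(obj)                                                    \
    __builtin_choose_expr(                                                  \
        __builtin_types_compatible_p(__typeof__(obj), struct demo_con *),   \
        ((struct demo_con *)(obj))->sess->sessname,                         \
    __builtin_choose_expr(                                                  \
        __builtin_types_compatible_p(__typeof__(obj), struct demo_sess *),  \
        ((struct demo_sess *)(obj))->sessname,                              \
        (const char *)0))

/* Both object types resolve to the same session name. */
static void demo_selfcheck(void)
{
    struct demo_sess sess = { .sessname = "sess0" };
    struct demo_con con = { .sess = &sess };

    assert(strcmp(demo_prefix(&sess), "sess0") == 0);
    assert(strcmp(demo_prefix(&con), "sess0") == 0);
}
```

The patch uses a call to an undefined unknown_type() function as the final
fallback, so that an unsupported type fails at link time; the sketch falls
back to a null pointer instead to stay linkable.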

[PATCH v3 21/25] ibnbd: server: functionality for IO submission to file or block dev

2018-06-06 Thread Roman Pen
This provides helper functions for IO submission to file or block dev.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-srv-dev.c | 413 
 drivers/block/ibnbd/ibnbd-srv-dev.h | 149 +
 2 files changed, 562 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.c
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.h

diff --git a/drivers/block/ibnbd/ibnbd-srv-dev.c b/drivers/block/ibnbd/ibnbd-srv-dev.c
new file mode 100644
index ..aefa10fcafc3
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-dev.c
@@ -0,0 +1,413 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibnbd-srv-dev.h"
+#include "ibnbd-log.h"
+
+#define IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS 0
+
+struct ibnbd_dev_file_io_work {
+   struct ibnbd_dev    *dev;
+   void                *priv;
+
+   sector_t            sector;
+   void                *data;
+   size_t  len;
+   size_t  bi_size;
+   enum ibnbd_io_flags flags;
+
+   struct work_struct  work;
+};
+
+struct ibnbd_dev_blk_io {
+   struct ibnbd_dev *dev;
+   void *priv;
+};
+
+static struct workqueue_struct *fileio_wq;
+
+int ibnbd_dev_init(void)
+{
+   fileio_wq = alloc_workqueue("%s", WQ_UNBOUND,
+   IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS,
+   "ibnbd_server_fileio_wq");
+   if (!fileio_wq)
+   return -ENOMEM;
+
+   return 0;
+}
+
+void ibnbd_dev_destroy(void)
+{
+   destroy_workqueue(fileio_wq);
+}
+
+static inline struct block_device *ibnbd_dev_open_bdev(const char *path,
+  fmode_t flags)
+{
+   return blkdev_get_by_path(path, flags, THIS_MODULE);
+}
+
+static int ibnbd_dev_blk_open(struct ibnbd_dev *dev, const char *path,
+ fmode_t flags)
+{
+   dev->bdev = ibnbd_dev_open_bdev(path, flags);
+   return PTR_ERR_OR_ZERO(dev->bdev);
+}
+
+static int ibnbd_dev_vfs_open(struct ibnbd_dev *dev, const char *path,
+ fmode_t flags)
+{
+   int oflags = O_DSYNC; /* enable write-through */
+
+   if (flags & FMODE_WRITE)
+   oflags |= O_RDWR;
+   else if (flags & FMODE_READ)
+   oflags |= O_RDONLY;
+   else
+   return -EINVAL;
+
+   dev->file = filp_open(path, oflags, 0);
+   return PTR_ERR_OR_ZERO(dev->file);
+}
+
+struct ibnbd_dev *ibnbd_dev_open(const char *path, fmode_t flags,
+enum ibnbd_io_mode mode, struct bio_set *bs,
+ibnbd_dev_io_fn io_cb)
+{
+   struct ibnbd_dev *dev;
+   int ret;
+
+   dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+   if (!dev)
+   return ERR_PTR(-ENOMEM);
+
+   if (mode == IBNBD_BLOCKIO) {
+   dev->blk_open_flags = flags;
+   ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+   if (ret)
+   goto err;
+   } else if (mode == IBNBD_FILEIO) {
+   dev->blk_open_flags = FMODE_READ;
+   ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+   if (ret)
+   goto err;
+
+   ret = ibnbd_dev_vfs_open(dev, path, flags);
+   if (ret)
+   goto blk_put;
+   } else {
+   ret = -EINVAL;
+   goto err;
+   }
+
+   dev->blk_open_flags = flags;
+   dev->mode   = mode;
+   dev->io_cb  = io_cb;
+   bdevname(dev->bdev, dev->name);
+   dev->

[PATCH v3 22/25] ibnbd: server: sysfs interface functions

2018-06-06 Thread Roman Pen
This is the sysfs interface to IBNBD mapped devices on server side:

  /sys/devices/virtual/ibnbd-server/ctl/devices//
    |- block_dev
    |  *** link pointing to the corresponding block device sysfs entry
    |
    |- sessions//
    |  *** sessions directory
       |
       |- read_only
       |  *** whether the device is mapped read-only
       |
       |- mapping_path
          *** relative device path provided by the client during mapping

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-srv-sysfs.c | 242 ++
 1 file changed, 242 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-sysfs.c

diff --git a/drivers/block/ibnbd/ibnbd-srv-sysfs.c b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
new file mode 100644
index ..5bf77cdb09c8
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
@@ -0,0 +1,242 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-srv.h"
+
+static struct device *ibnbd_dev;
+static struct class *ibnbd_dev_class;
+static struct kobject *ibnbd_devs_kobj;
+
+static struct attribute *ibnbd_srv_default_dev_attrs[] = {
+   NULL,
+};
+
+static struct attribute_group ibnbd_srv_default_dev_attr_group = {
+   .attrs = ibnbd_srv_default_dev_attrs,
+};
+
+static struct kobj_type ktype = {
+   .sysfs_ops  = &kobj_sysfs_ops,
+};
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+  struct block_device *bdev,
+  const char *dir_name)
+{
+   struct kobject *bdev_kobj;
+   int ret;
+
+   ret = kobject_init_and_add(&dev->dev_kobj, &ktype,
+  ibnbd_devs_kobj, dir_name);
+   if (ret)
+   return ret;
+
+   ret = kobject_init_and_add(&dev->dev_sessions_kobj,
+  &ktype,
+  &dev->dev_kobj, "sessions");
+   if (ret)
+   goto err;
+
+   ret = sysfs_create_group(&dev->dev_kobj,
+  &ibnbd_srv_default_dev_attr_group);
+   if (ret)
+   goto err2;
+
+   bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj;
+   ret = sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev");
+   if (ret)
+   goto err3;
+
+   return 0;
+
+err3:
+   sysfs_remove_group(&dev->dev_kobj,
+  &ibnbd_srv_default_dev_attr_group);
+err2:
+   kobject_del(&dev->dev_sessions_kobj);
+   kobject_put(&dev->dev_sessions_kobj);
+err:
+   kobject_del(&dev->dev_kobj);
+   kobject_put(&dev->dev_kobj);
+   return ret;
+}
+
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev)
+{
+   sysfs_remove_link(&dev->dev_kobj, "block_dev");
+   sysfs_remove_group(&dev->dev_kobj, &ibnbd_srv_default_dev_attr_group);
+   kobject_del(&dev->dev_sessions_kobj);
+   kobject_put(&dev->dev_sessions_kobj);
+   kobject_del(&dev->dev_kobj);
+   kobject_put(&dev->dev_kobj);
+}
+
+static ssize_t ibnbd_srv_dev_session_ro_show(struct kobject *kobj,
+struct kobj_attribute *attr,
+char *page)
+{
+   struct ibnbd_srv_sess_dev *sess_dev;
+
+   sess_dev = container_of(kobj, struct ibnbd_srv_sess_dev, kobj);
+
+   return scnprintf(page, PAGE_SIZE, "%s\n",
+(sess_dev->open_flags & FMODE_WRITE) ? "0" : "1");
+}
+
+static struct kobj_attribute ibnbd_srv_dev_session_ro_attr =
+   __ATTR(read_only, 0444,
+  ibnbd_srv_dev_session_ro_show,
+  NULL);
+
+static ssize_t
+ibnbd_srv_dev_session_mapping_path_show(struct kobject *kobj,
+ 
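
ibnbd_srv_create_dev_sysfs() above uses the usual kernel goto ladder: each
failed step jumps to a label that undoes only the steps that already
succeeded, in reverse order. A minimal userspace sketch of that pattern
follows. struct demo_ctx, demo_step(), demo_undo() and demo_create() are
hypothetical names, and the fail_at field exists only to inject a failure
for illustration.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_STEPS 4

/* Tracks which setup steps are currently live. */
struct demo_ctx {
    int fail_at;          /* index of the step that should fail, or -1 */
    int live[DEMO_STEPS]; /* 1 while the step is set up */
};

static int demo_step(struct demo_ctx *c, int idx)
{
    if (c->fail_at == idx)
        return -ENOMEM;
    c->live[idx] = 1;
    return 0;
}

static void demo_undo(struct demo_ctx *c, int idx)
{
    assert(c->live[idx]); /* only undo what was actually set up */
    c->live[idx] = 0;
}

/* Four setup steps with reverse-order unwinding on failure. */
static int demo_create(struct demo_ctx *c)
{
    int ret;

    ret = demo_step(c, 0);
    if (ret)
        return ret;
    ret = demo_step(c, 1);
    if (ret)
        goto err;
    ret = demo_step(c, 2);
    if (ret)
        goto err2;
    ret = demo_step(c, 3);
    if (ret)
        goto err3;
    return 0;

err3:
    demo_undo(c, 2);
err2:
    demo_undo(c, 1);
err:
    demo_undo(c, 0);
    return ret;
}
```

Whichever step fails, nothing stays live afterwards, which is the property
the err, err2 and err3 labels in the patch are meant to provide.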

[PATCH v3 15/25] ibnbd: private headers with IBNBD protocol structs and helpers

2018-06-06 Thread Roman Pen
These are common private headers with IBNBD protocol structures,
logging, sysfs and other helper functions, which are used on
both client and server sides.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-log.h   |  71 
 drivers/block/ibnbd/ibnbd-proto.h | 364 ++
 2 files changed, 435 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-log.h
 create mode 100644 drivers/block/ibnbd/ibnbd-proto.h

diff --git a/drivers/block/ibnbd/ibnbd-log.h b/drivers/block/ibnbd/ibnbd-log.h
new file mode 100644
index ..489343a61171
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-log.h
@@ -0,0 +1,71 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_LOG_H
+#define IBNBD_LOG_H
+
+#include "ibnbd-clt.h"
+#include "ibnbd-srv.h"
+
+#define ibnbd_diskname(dev) ({ \
+   struct gendisk *gd = ((struct ibnbd_clt_dev *)dev)->gd; \
+   gd ? gd->disk_name : "";\
+})
+
+void unknown_type(void);
+
+#define ibnbd_log(fn, dev, fmt, ...) ({ \
+   __builtin_choose_expr(  \
+   __builtin_types_compatible_p(   \
+   typeof(dev), struct ibnbd_clt_dev *),   \
+   fn("<%s@%s> %s: " fmt, (dev)->pathname, \
+  (dev)->sess->sessname, ibnbd_diskname(dev),  \
+  ##__VA_ARGS__),  \
+   __builtin_choose_expr(  \
+   __builtin_types_compatible_p(typeof(dev),   \
+   struct ibnbd_srv_sess_dev *),   \
+   fn("<%s@%s>: " fmt, (dev)->pathname,\
+  (dev)->sess->sessname, ##__VA_ARGS__),   \
+   unknown_type()));   \
+})
+
+#define ibnbd_err(dev, fmt, ...)   \
+   ibnbd_log(pr_err, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_err_rl(dev, fmt, ...)\
+   ibnbd_log(pr_err_ratelimited, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_wrn(dev, fmt, ...)   \
+   ibnbd_log(pr_warn, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_wrn_rl(dev, fmt, ...) \
+   ibnbd_log(pr_warn_ratelimited, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_info(dev, fmt, ...) \
+   ibnbd_log(pr_info, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_info_rl(dev, fmt, ...) \
+   ibnbd_log(pr_info_ratelimited, dev, fmt, ##__VA_ARGS__)
+
+#endif /* IBNBD_LOG_H */
diff --git a/drivers/block/ibnbd/ibnbd-proto.h b/drivers/block/ibnbd/ibnbd-proto.h
new file mode 100644
index ..050d3fa4c1bf
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-proto.h
@@ -0,0 +1,364 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU 

[PATCH v3 25/25] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

2018-06-06 Thread Roman Pen
Signed-off-by: Roman Pen 
Cc: Danil Kipnis 
Cc: Jack Wang 
---
 MAINTAINERS | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ca4afd68530c..201c6c8e039e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6782,6 +6782,20 @@ IBM ServeRAID RAID DRIVER
 S: Orphan
 F: drivers/scsi/ips.*
 
+IBNBD BLOCK DRIVERS
+M: IBNBD/IBTRS Storage Team 
+L: linux-block@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/block/ibnbd/
+
+IBTRS TRANSPORT DRIVERS
+M: IBNBD/IBTRS Storage Team 
+L: linux-r...@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/infiniband/ulp/ibtrs/
+
 ICH LPC AND GPIO DRIVER
 M: Peter Tyser 
 S: Maintained
-- 
2.13.1



[PATCH v3 10/25] ibtrs: server: main functionality

2018-06-06 Thread Roman Pen
This is the main functionality of the ibtrs-server module, which accepts
a set of RDMA connections (a so-called IBTRS session), creates/destroys
sysfs entries associated with the IBTRS session, and notifies the upper
layer (the user of the IBTRS API) about RDMA requests or link events.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c | 2003 ++
 1 file changed, 2003 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
new file mode 100644
index ..22c965cd5c8b
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
@@ -0,0 +1,2003 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-srv.h"
+#include "ibtrs-log.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Server");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+/* Must be power of 2, see mask from mr->page_size in ib_sg_to_pages() */
+#define DEFAULT_MAX_CHUNK_SIZE (128 << 10)
+#define DEFAULT_SESS_QUEUE_DEPTH 512
+#define MAX_HDR_SIZE PAGE_SIZE
+#define MAX_SG_COUNT ((MAX_HDR_SIZE - sizeof(struct ibtrs_msg_rdma_read)) \
+ / sizeof(struct ibtrs_sg_desc))
+
+/* We guarantee to serve 10 paths at least */
+#define CHUNK_POOL_SZ 10
+
+static struct ibtrs_ib_dev_pool dev_pool;
+static mempool_t *chunk_pool;
+struct class *ibtrs_dev_class;
+
+static int retry_count = 7;
+static int __read_mostly max_chunk_size = DEFAULT_MAX_CHUNK_SIZE;
+static int __read_mostly sess_queue_depth = DEFAULT_SESS_QUEUE_DEPTH;
+
+module_param_named(max_chunk_size, max_chunk_size, int, 0444);
+MODULE_PARM_DESC(max_chunk_size,
+"Max size for each IO request; when changed, the unit is bytes"
+" (default: " __stringify(DEFAULT_MAX_CHUNK_SIZE_KB) "KB)");
+
+module_param_named(sess_queue_depth, sess_queue_depth, int, 0444);
+MODULE_PARM_DESC(sess_queue_depth,
+"Number of buffers for pending I/O requests to allocate"
+" per session. Maximum: " __stringify(MAX_SESS_QUEUE_DEPTH)
+" (default: " __stringify(DEFAULT_SESS_QUEUE_DEPTH) ")");
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+   int err, ival;
+
+   err = kstrtoint(val, 0, &ival);
+   if (err)
+   return err;
+
+   if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT) {
+   pr_err("Invalid retry count value %d, has to be"
+  " >= %d, <= %d\n", ival, MIN_RTR_CNT, MAX_RTR_CNT);
+   return -EINVAL;
+   }
+
+   retry_count = ival;
+   pr_info("QP retry count changed to %d\n", ival);
+
+   return 0;
+}
+
+static const struct kernel_param_ops retry_count_ops = {
+   .set = retry_count_set,
+   .get = param_get_int,
+};
+module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644);
+
+MODULE_PARM_DESC(retry_count, "Number of times to send the message if the"
+" remote side didn't respond with Ack or Nack (default: 7,"
+" min: " __stringify(MIN_RTR_CNT) ", max: "
+__stringify(MAX_RTR_CNT) ")");
+
+static char cq_affinity_list[256] = "";
+static cpumask_t cq_affinity_mask = { CPU_BITS_ALL };
+
+static void init_cq_affinity(void)
+{
+   sprintf(cq_affinity_list, "0-%d", nr_cpu_ids - 1);
+}
+
+static int cq_affinity_list_set(const char *val, const struct kernel_param *kp)
+{
+   int
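
retry_count_set() above parses the new value, rejects anything outside the
allowed range, and only then updates the variable. A minimal userspace
sketch of that parse-validate-commit order follows, using strtol() in place
of kstrtoint(). DEMO_MIN_RTR_CNT and DEMO_MAX_RTR_CNT are assumed bounds
chosen for illustration (the real MIN_RTR_CNT and MAX_RTR_CNT are not shown
in this excerpt), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Assumed bounds for illustration only. */
#define DEMO_MIN_RTR_CNT 1
#define DEMO_MAX_RTR_CNT 7

static int demo_retry_count = 7;

/* Parse a whole string as int; one trailing newline is tolerated. */
static int demo_parse_int(const char *val, int *out)
{
    char *end;
    long v;

    assert(val && out);
    errno = 0;
    v = strtol(val, &end, 0);
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -ERANGE;
    if (end == val)
        return -EINVAL; /* no digits at all */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL; /* trailing garbage */
    *out = (int)v;
    return 0;
}

/* Update demo_retry_count only if the value parses and is in range. */
static int demo_retry_count_set(const char *val)
{
    int err, ival;

    err = demo_parse_int(val, &ival);
    if (err)
        return err;
    if (ival < DEMO_MIN_RTR_CNT || ival > DEMO_MAX_RTR_CNT)
        return -EINVAL;

    demo_retry_count = ival;
    return 0;
}
```

Because the assignment happens last, a rejected write leaves the previous
value in place.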

[PATCH v3 09/25] ibtrs: server: private header with server structs and functions

2018-06-06 Thread Roman Pen
This header describes the main structs and functions used by the
ibtrs-server module, mainly for accepting IBTRS sessions,
creating/destroying sysfs entries, and accounting statistics on the
server side.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h | 177 +++
 1 file changed, 177 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
new file mode 100644
index ..b1e32136f352
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
@@ -0,0 +1,177 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_SRV_H
+#define IBTRS_SRV_H
+
+#include 
+#include 
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_srv_state - Server states.
+ */
+enum ibtrs_srv_state {
+   IBTRS_SRV_CONNECTING,
+   IBTRS_SRV_CONNECTED,
+   IBTRS_SRV_CLOSING,
+   IBTRS_SRV_CLOSED,
+};
+
+static inline const char *ibtrs_srv_state_str(enum ibtrs_srv_state state)
+{
+   switch (state) {
+   case IBTRS_SRV_CONNECTING:
+   return "IBTRS_SRV_CONNECTING";
+   case IBTRS_SRV_CONNECTED:
+   return "IBTRS_SRV_CONNECTED";
+   case IBTRS_SRV_CLOSING:
+   return "IBTRS_SRV_CLOSING";
+   case IBTRS_SRV_CLOSED:
+   return "IBTRS_SRV_CLOSED";
+   default:
+   return "UNKNOWN";
+   }
+}
+
+struct ibtrs_stats_wc_comp {
+   atomic64_t  calls;
+   atomic64_t  total_wc_cnt;
+};
+
+struct ibtrs_srv_stats_rdma_stats {
+   struct {
+   atomic64_t  cnt;
+   atomic64_t  size_total;
+   } dir[2];
+};
+
+struct ibtrs_srv_stats {
+   struct ibtrs_srv_stats_rdma_stats   rdma_stats;
+   atomic_t                            apm_cnt;
+   struct ibtrs_stats_wc_comp          wc_comp;
+};
+
+struct ibtrs_srv_con {
+   struct ibtrs_con    c;
+   atomic_t            wr_cnt;
+};
+
+struct ibtrs_srv_op {
+   struct ibtrs_srv_con*con;
+   u32 msg_id;
+   u8  dir;
+   struct ibtrs_msg_rdma_read  *rd_msg;
+   struct ib_rdma_wr   *tx_wr;
+   struct ib_sge   *tx_sg;
+};
+
+struct ibtrs_srv_mr {
+   struct ib_mr*mr;
+   struct sg_table sgt;
+};
+
+struct ibtrs_srv_sess {
+   struct ibtrs_sess       s;
+   struct ibtrs_srv        *srv;
+   struct work_struct      close_work;
+   enum ibtrs_srv_state    state;
+   spinlock_t              state_lock;
+   int                     cur_cq_vector;
+   struct ibtrs_srv_op     **ops_ids;
+   atomic_t                ids_inflight;
+   wait_queue_head_t       ids_waitq;
+   struct ibtrs_srv_mr     *mrs;
+   unsigned int            mrs_num;
+   dma_addr_t              *dma_addr;
+   bool                    established;
+   unsigned int            mem_bits;
+   struct kobject          kobj;
+   struct kobject          kobj_stats;
+   struct ibtrs_srv_stats  stats;
+};
+
+struct ibtrs_srv {
+   struct list_head        paths_list;
+   int                     paths_up;
+   struct mutex            paths_ev_mutex;
+   size_t                  paths_num;
+   struct mutex            paths_mutex;
+   uuid_t                  paths_uuid;
+   refcount_t              refcount;
+   struct ibtrs_srv_ctx    *ctx;
+   struct list_head        ctx_list;
+   void                    *priv;
+   size_t                  queue_depth;
+   struct page             **chunks;
+   struct device           dev;
+   unsigned                dev_ref;
+   struct kobject          kobj_paths;
+};
+
+struct ibtrs_
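
struct ibtrs_srv_stats_rdma_stats above keeps a request count and a byte
total per direction in dir[2]. A minimal userspace sketch of how such
per-direction counters might be updated follows, using C11 atomics in place
of atomic64_t. The mapping of index 0 to reads and index 1 to writes is an
assumption, and enum demo_dir, struct demo_rdma_stats and
demo_update_rdma_stats() are hypothetical names not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed direction indices; the excerpt does not define them. */
enum demo_dir { DEMO_READ = 0, DEMO_WRITE = 1 };

struct demo_rdma_stats {
    struct {
        atomic_int_fast64_t cnt;        /* number of requests */
        atomic_int_fast64_t size_total; /* total bytes transferred */
    } dir[2];
};

/* Account one request of the given size in the given direction. */
static void demo_update_rdma_stats(struct demo_rdma_stats *s,
                                   size_t size, enum demo_dir d)
{
    assert(d == DEMO_READ || d == DEMO_WRITE);
    atomic_fetch_add(&s->dir[d].cnt, 1);
    atomic_fetch_add(&s->dir[d].size_total, (int_fast64_t)size);
}
```

Each counter is updated atomically on its own, so a concurrent reader can
briefly see a count that is one request ahead of the byte total; that is
usually acceptable for statistics.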

[PATCH v3 16/25] ibnbd: client: private header with client structs and functions

2018-06-06 Thread Roman Pen
This header describes the main structs and functions used by the
ibnbd-client module, mainly for managing IBNBD sessions and mapped block
devices, and for creating and destroying sysfs entries.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-clt.h | 172 
 1 file changed, 172 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt.h

diff --git a/drivers/block/ibnbd/ibnbd-clt.h b/drivers/block/ibnbd/ibnbd-clt.h
new file mode 100644
index ..c5f6f08ec338
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt.h
@@ -0,0 +1,172 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_CLT_H
+#define IBNBD_CLT_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+#define BMAX_SEGMENTS 31
+#define RECONNECT_DELAY 30
+#define MAX_RECONNECTS -1
+
+enum ibnbd_clt_dev_state {
+   DEV_STATE_INIT,
+   DEV_STATE_MAPPED,
+   DEV_STATE_MAPPED_DISCONNECTED,
+   DEV_STATE_UNMAPPED,
+};
+
+struct ibnbd_iu_comp {
+   wait_queue_head_t wait;
+   int errno;
+};
+
+struct ibnbd_iu {
+   union {
+   struct request *rq; /* for block io */
+   void *buf; /* for user messages */
+   };
+   struct ibtrs_tag*tag;
+   union {
+   /* use to send msg associated with a dev */
+   struct ibnbd_clt_dev *dev;
+   /* use to send msg associated with a sess */
+   struct ibnbd_clt_session *sess;
+   };
+   blk_status_t            status;
+   struct scatterlist      sglist[BMAX_SEGMENTS];
+   struct work_struct      work;
+   int                     errno;
+   struct ibnbd_iu_comp    *comp;
+};
+
+struct ibnbd_cpu_qlist {
+   struct list_head        requeue_list;
+   spinlock_t              requeue_lock;
+   unsigned int            cpu;
+};
+
+struct ibnbd_clt_session {
+   struct list_head        list;
+   struct ibtrs_clt        *ibtrs;
+   wait_queue_head_t       ibtrs_waitq;
+   bool                    ibtrs_ready;
+   struct ibnbd_cpu_qlist  __percpu
+                           *cpu_queues;
+   DECLARE_BITMAP(cpu_queues_bm, NR_CPUS);
+   int __percpu            *cpu_rr; /* per-cpu var for CPU round-robin */
+   atomic_t                busy;
+   int                     queue_depth;
+   u32                     max_io_size;
+   struct blk_mq_tag_set   tag_set;
+   struct mutex            lock; /* protects state and devs_list */
+   struct list_head        devs_list; /* list of struct ibnbd_clt_dev */
+   refcount_t              refcount;
+   char                    sessname[NAME_MAX];
+   u8                      ver; /* protocol version */
+};
+
+/**
+ * Submission queues.
+ */
+struct ibnbd_queue {
+   struct list_head        requeue_list;
+   unsigned long           in_list;
+   struct ibnbd_clt_dev    *dev;
+   struct blk_mq_hw_ctx    *hctx;
+};
+
+struct ibnbd_clt_dev {
+   struct ibnbd_clt_session *sess;
+   struct request_queue     *queue;
+   struct ibnbd_queue       *hw_queues;
+   u32                      device_id;
+   /* local Idr index - used to track minor number allocations. */
+   u32                      clt_device_id;
+   struct mutex             lock;
+   enum ibnbd_clt_dev_state dev_state;
+   enum ibnbd_io_mode       io_mode; /* user requested */
+   enum ibnbd_io_mode       remote_io_mode; /* server really used */
+   char                     pathname[NAME_MAX];
+   enum ibnbd_access_mode   access_mode;
+   bool                     read_only;
+   bool                     rotational;
+   u32                      max_hw_sectors;
+   u32
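
struct ibnbd_clt_session above carries a refcount_t refcount field. Such a
field is typically paired with a "get unless already zero" helper and a
"put and free on the last reference" helper. A minimal userspace sketch of
that lifetime pattern follows, using C11 atomics in place of refcount_t.
struct demo_obj, demo_get() and demo_put() are hypothetical names, and the
released flag stands in for the real free routine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_obj {
    atomic_int refcount;
    bool released; /* stands in for freeing the object */
};

/* Take a reference unless the count has already dropped to zero. */
static bool demo_get(struct demo_obj *o)
{
    int old = atomic_load(&o->refcount);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; the caller that drops the last one releases. */
static void demo_put(struct demo_obj *o)
{
    int old = atomic_fetch_sub(&o->refcount, 1);

    assert(old > 0); /* catch unbalanced puts */
    if (old == 1)
        o->released = true;
}
```

Refusing to increment from zero is what prevents a lookup from reviving an
object whose last reference is already being dropped.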

[PATCH v3 17/25] ibnbd: client: main functionality

2018-06-06 Thread Roman Pen
This is the main functionality of the ibnbd-client module, which provides
an interface to map a remote device as a local block device /dev/ibnbd
and feeds IBTRS with IO requests.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/block/ibnbd/ibnbd-clt.c | 1817 +++
 1 file changed, 1817 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt.c

diff --git a/drivers/block/ibnbd/ibnbd-clt.c b/drivers/block/ibnbd/ibnbd-clt.c
new file mode 100644
index ..d665e144a253
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt.c
@@ -0,0 +1,1817 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-clt.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("InfiniBand Network Block Device Client");
+MODULE_VERSION(IBNBD_VER_STRING);
+MODULE_LICENSE("GPL");
+
+/*
+ * This is for closing devices when unloading the module:
+ * we might be closing a lot (>256) of devices in parallel
+ * and it is better not to use the system_wq.
+ */
+static struct workqueue_struct *unload_wq;
+static int ibnbd_client_major;
+static DEFINE_IDA(index_ida);
+static DEFINE_MUTEX(ida_lock);
+static DEFINE_MUTEX(sess_lock);
+static LIST_HEAD(sess_list);
+
+static bool softirq_enable;
+module_param(softirq_enable, bool, 0444);
+MODULE_PARM_DESC(softirq_enable, "finish request in softirq_fn."
+" (default: 0)");
+/*
+ * Maximum number of partitions an instance can have.
+ * 6 bits = 64 minors = 63 partitions (one minor is used for the device itself)
+ */
+#define IBNBD_PART_BITS 6
+#define KERNEL_SECTOR_SIZE  512
+
+static inline bool ibnbd_clt_get_sess(struct ibnbd_clt_session *sess)
+{
+   return refcount_inc_not_zero(>refcount);
+}
+
+static void free_sess(struct ibnbd_clt_session *sess);
+
+static void ibnbd_clt_put_sess(struct ibnbd_clt_session *sess)
+{
+   might_sleep();
+
+   if (refcount_dec_and_test(>refcount))
+   free_sess(sess);
+}
+
+static inline bool ibnbd_clt_dev_is_mapped(struct ibnbd_clt_dev *dev)
+{
+   return dev->dev_state == DEV_STATE_MAPPED;
+}
+
+static void ibnbd_clt_put_dev(struct ibnbd_clt_dev *dev)
+{
+   might_sleep();
+
+   if (refcount_dec_and_test(>refcount)) {
+   mutex_lock(_lock);
+   ida_simple_remove(_ida, dev->clt_device_id);
+   mutex_unlock(_lock);
+   kfree(dev->hw_queues);
+   ibnbd_clt_put_sess(dev->sess);
+   kfree(dev);
+   }
+}
+
+static inline bool ibnbd_clt_get_dev(struct ibnbd_clt_dev *dev)
+{
+   return refcount_inc_not_zero(>refcount);
+}
+
+static int ibnbd_clt_set_dev_attr(struct ibnbd_clt_dev *dev,
+ const struct ibnbd_msg_open_rsp *rsp)
+{
+   struct ibnbd_clt_session *sess = dev->sess;
+
+   if (unlikely(!rsp->logical_block_size))
+   return -EINVAL;
+
+   dev->device_id              = le32_to_cpu(rsp->device_id);
+   dev->nsectors               = le64_to_cpu(rsp->nsectors);
+   dev->logical_block_size     = le16_to_cpu(rsp->logical_block_size);
+   dev->physical_block_size    = le16_to_cpu(rsp->physical_block_size);
+   dev->max_write_same_sectors = le32_to_cpu(rsp->max_write_same_sectors);
+   dev->max_discard_sectors    = le32_to_cpu(rsp->max_discard_sectors);
+   dev->discard_granularity    = le32_to_cpu(rsp->discard_granularity);
+   dev->discard_alignment      = le32_to_cpu(rsp->discard_alignment);
+   dev->secure_discard         = le16_to_cpu(rsp->secure_discard);
+   dev->rotational  

[PATCH v3 13/25] ibtrs: include client and server modules into kernel compilation

2018-06-06 Thread Roman Pen
Add the IBTRS Makefile and Kconfig, and the corresponding lines to the
upper-layer infiniband and infiniband/ulp files.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/Kconfig|  1 +
 drivers/infiniband/ulp/Makefile   |  1 +
 drivers/infiniband/ulp/ibtrs/Kconfig  | 20 
 drivers/infiniband/ulp/ibtrs/Makefile | 13 +
 4 files changed, 35 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/Kconfig
 create mode 100644 drivers/infiniband/ulp/ibtrs/Makefile

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 2a972ed6851b..10df5d2bb8fe 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -97,6 +97,7 @@ source "drivers/infiniband/ulp/srpt/Kconfig"
 
 source "drivers/infiniband/ulp/iser/Kconfig"
 source "drivers/infiniband/ulp/isert/Kconfig"
+source "drivers/infiniband/ulp/ibtrs/Kconfig"
 
 source "drivers/infiniband/ulp/opa_vnic/Kconfig"
 source "drivers/infiniband/sw/rdmavt/Kconfig"
diff --git a/drivers/infiniband/ulp/Makefile b/drivers/infiniband/ulp/Makefile
index 437813c7b481..1c4f10dc8d49 100644
--- a/drivers/infiniband/ulp/Makefile
+++ b/drivers/infiniband/ulp/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_INFINIBAND_SRPT)   += srpt/
 obj-$(CONFIG_INFINIBAND_ISER)  += iser/
 obj-$(CONFIG_INFINIBAND_ISERT) += isert/
 obj-$(CONFIG_INFINIBAND_OPA_VNIC)  += opa_vnic/
+obj-$(CONFIG_INFINIBAND_IBTRS) += ibtrs/
diff --git a/drivers/infiniband/ulp/ibtrs/Kconfig b/drivers/infiniband/ulp/ibtrs/Kconfig
new file mode 100644
index ..eaeb8f3f6b4e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Kconfig
@@ -0,0 +1,20 @@
+config INFINIBAND_IBTRS
+   tristate
+   depends on INFINIBAND_ADDR_TRANS
+
+config INFINIBAND_IBTRS_CLIENT
+   tristate "IBTRS client module"
+   depends on INFINIBAND_ADDR_TRANS
+   select INFINIBAND_IBTRS
+   help
+     IBTRS client allows for simplified data transfer and connection
+     establishment over RDMA (InfiniBand, RoCE, iWarp). Uses BIO-like
+     READ/WRITE semantics and provides multipath capabilities.
+
+config INFINIBAND_IBTRS_SERVER
+   tristate "IBTRS server module"
+   depends on INFINIBAND_ADDR_TRANS
+   select INFINIBAND_IBTRS
+   help
+     IBTRS server module processing connection and IO requests received
+     from the IBTRS client module.
diff --git a/drivers/infiniband/ulp/ibtrs/Makefile b/drivers/infiniband/ulp/ibtrs/Makefile
new file mode 100644
index ..2a145f8d252a
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Makefile
@@ -0,0 +1,13 @@
+ibtrs-client-y := ibtrs-clt.o \
+ ibtrs-clt-stats.o \
+ ibtrs-clt-sysfs.o
+
+ibtrs-server-y := ibtrs-srv.o \
+ ibtrs-srv-stats.o \
+ ibtrs-srv-sysfs.o
+
+ibtrs-core-y := ibtrs.o
+
+obj-$(CONFIG_INFINIBAND_IBTRS)        += ibtrs-core.o
+obj-$(CONFIG_INFINIBAND_IBTRS_CLIENT) += ibtrs-client.o
+obj-$(CONFIG_INFINIBAND_IBTRS_SERVER) += ibtrs-server.o
-- 
2.13.1



[PATCH v3 05/25] ibtrs: client: private header with client structs and functions

2018-06-06 Thread Roman Pen
This header describes the main structs and functions used by the
ibtrs-client module, mainly for managing IBTRS sessions, creating and
destroying sysfs entries, and accounting statistics on the client side.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h | 315 +++
 1 file changed, 315 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
new file mode 100644
index ..3212a33a0bf5
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
@@ -0,0 +1,315 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_CLT_H
+#define IBTRS_CLT_H
+
+#include 
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_clt_state - Client states.
+ */
+enum ibtrs_clt_state {
+   IBTRS_CLT_CONNECTING,
+   IBTRS_CLT_CONNECTING_ERR,
+   IBTRS_CLT_RECONNECTING,
+   IBTRS_CLT_CONNECTED,
+   IBTRS_CLT_CLOSING,
+   IBTRS_CLT_CLOSED,
+   IBTRS_CLT_DEAD,
+};
+
+static inline const char *ibtrs_clt_state_str(enum ibtrs_clt_state state)
+{
+   switch (state) {
+   case IBTRS_CLT_CONNECTING:
+       return "IBTRS_CLT_CONNECTING";
+   case IBTRS_CLT_CONNECTING_ERR:
+       return "IBTRS_CLT_CONNECTING_ERR";
+   case IBTRS_CLT_RECONNECTING:
+       return "IBTRS_CLT_RECONNECTING";
+   case IBTRS_CLT_CONNECTED:
+       return "IBTRS_CLT_CONNECTED";
+   case IBTRS_CLT_CLOSING:
+       return "IBTRS_CLT_CLOSING";
+   case IBTRS_CLT_CLOSED:
+       return "IBTRS_CLT_CLOSED";
+   case IBTRS_CLT_DEAD:
+       return "IBTRS_CLT_DEAD";
+   default:
+       return "UNKNOWN";
+   }
+}
+
+enum ibtrs_mp_policy {
+   MP_POLICY_RR,
+   MP_POLICY_MIN_INFLIGHT,
+};
+
+struct ibtrs_clt_stats_reconnects {
+   int successful_cnt;
+   int fail_cnt;
+};
+
+struct ibtrs_clt_stats_wc_comp {
+   u32 cnt;
+   u64 total_cnt;
+};
+
+struct ibtrs_clt_stats_cpu_migr {
+   atomic_t from;
+   int to;
+};
+
+struct ibtrs_clt_stats_rdma {
+   struct {
+   u64 cnt;
+   u64 size_total;
+   } dir[2];
+
+   u64 failover_cnt;
+};
+
+struct ibtrs_clt_stats_rdma_lat {
+   u64 read;
+   u64 write;
+};
+
+#define MIN_LOG_SG 2
+#define MAX_LOG_SG 5
+#define MAX_LIN_SG BIT(MIN_LOG_SG)
+#define SG_DISTR_SZ (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 2)
+
+#define MAX_LOG_LAT 16
+#define MIN_LOG_LAT 0
+#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2)
+
+struct ibtrs_clt_stats_pcpu {
+   struct ibtrs_clt_stats_cpu_migr cpu_migr;
+   struct ibtrs_clt_stats_rdma rdma;
+   u64 sg_list_total;
+   u64 sg_list_distr[SG_DISTR_SZ];
+   struct ibtrs_clt_stats_rdma_lat rdma_lat_distr[LOG_LAT_SZ];
+   struct ibtrs_clt_stats_rdma_lat rdma_lat_max;
+   struct ibtrs_clt_stats_wc_comp  wc_comp;
+};
+
+struct ibtrs_clt_stats {
+   bool enable_rdma_lat;
+   struct ibtrs_clt_stats_pcpu __percpu *pcpu_stats;
+   struct ibtrs_clt_stats_reconnects reconnects;
+   atomic_t inflight;
+};
+
+struct ibtrs_clt_con {
+   struct ibtrs_con c;
+   unsigned cpu;
+   atomic_t io_cnt;
+   int cm_err;
+};
+
+/**
+ * ibtrs_tag - tags the memory allocation for future RDMA operation
+ */
+struct ibtrs_tag {
+   enum ibtrs_clt_con_type con_type;
+   unsigned int cpu_id;
+   unsigned int mem_id;
+   unsigned int mem_off;
+};
+
+struct ibtrs_clt_io_req {
+   struct lis

[PATCH v3 11/25] ibtrs: server: statistics functions

2018-06-06 Thread Roman Pen
This introduces a set of functions used on the server side to account
statistics of RDMA data sent and received.
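
As a rough illustration of the accounting described above, here is a
userspace sketch that keeps one transfer counter and one byte counter per
direction and updates them atomically. It is not the patch's code: C11
atomics stand in for the kernel's atomic64_t, and the names rdma_stats,
DIR_READ, DIR_WRITE, update_rdma_stats and reset_rdma_stats are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's READ/WRITE direction indices. */
enum { DIR_READ = 0, DIR_WRITE = 1 };

/* One pair of counters per direction, updated without a lock. */
struct rdma_stats {
    struct {
        atomic_int_fast64_t cnt;        /* number of transfers */
        atomic_int_fast64_t size_total; /* bytes transferred */
    } dir[2];
};

/* Account one transfer of 'size' bytes in direction 'd'. */
static void update_rdma_stats(struct rdma_stats *s, size_t size, int d)
{
    assert(d == DIR_READ || d == DIR_WRITE);
    atomic_fetch_add(&s->dir[d].cnt, 1);
    atomic_fetch_add(&s->dir[d].size_total, (int_fast64_t)size);
}

/* Clear all counters; concurrent updates may land before or after. */
static void reset_rdma_stats(struct rdma_stats *s)
{
    for (int d = 0; d < 2; d++) {
        atomic_store(&s->dir[d].cnt, 0);
        atomic_store(&s->dir[d].size_total, 0);
    }
}
```

Each counter is updated independently, so a reader may briefly see a
count that is one transfer ahead of the matching byte total. For
statistics this is usually an acceptable trade for lock-free updates.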

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c | 110 +
 1 file changed, 110 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
new file mode 100644
index ..5933cfc03f95
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
@@ -0,0 +1,110 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-srv.h"
+
+void ibtrs_srv_update_rdma_stats(struct ibtrs_srv_stats *s,
+size_t size, int d)
+{
+   atomic64_inc(>rdma_stats.dir[d].cnt);
+   atomic64_add(size, >rdma_stats.dir[d].size_total);
+}
+
+void ibtrs_srv_update_wc_stats(struct ibtrs_srv_stats *s)
+{
+   atomic64_inc(>wc_comp.calls);
+   atomic64_inc(>wc_comp.total_wc_cnt);
+}
+
+int ibtrs_srv_reset_rdma_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+   if (enable) {
+   struct ibtrs_srv_stats_rdma_stats *r = >rdma_stats;
+
+   memset(r, 0, sizeof(*r));
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
+ssize_t ibtrs_srv_stats_rdma_to_str(struct ibtrs_srv_stats *stats,
+   char *page, size_t len)
+{
+   struct ibtrs_srv_stats_rdma_stats *r = >rdma_stats;
+   struct ibtrs_srv_sess *sess;
+
+   sess = container_of(stats, typeof(*sess), stats);
+
+   return scnprintf(page, len, "%lld %lld %lld %lld %u\n",
+(s64)atomic64_read(>dir[READ].cnt),
+(s64)atomic64_read(>dir[READ].size_total),
+(s64)atomic64_read(>dir[WRITE].cnt),
+(s64)atomic64_read(>dir[WRITE].size_total),
+atomic_read(>ids_inflight));
+}
+
+int ibtrs_srv_reset_wc_completion_stats(struct ibtrs_srv_stats *stats,
+   bool enable)
+{
+   if (enable) {
+   memset(>wc_comp, 0, sizeof(stats->wc_comp));
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
+int ibtrs_srv_stats_wc_completion_to_str(struct ibtrs_srv_stats *stats,
+char *buf, size_t len)
+{
+   return scnprintf(buf, len, "%lld %lld\n",
+   (s64)atomic64_read(>wc_comp.total_wc_cnt),
+   (s64)atomic64_read(>wc_comp.calls));
+}
+
+ssize_t ibtrs_srv_reset_all_help(struct ibtrs_srv_stats *stats,
+char *page, size_t len)
+{
+   return scnprintf(page, PAGE_SIZE, "echo 1 to reset all statistics\n");
+}
+
+int ibtrs_srv_reset_all_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+   if (enable) {
+       ibtrs_srv_reset_wc_completion_stats(stats, enable);
+       ibtrs_srv_reset_rdma_stats(stats, enable);
+       return 0;
+   }
+
+   return -EINVAL;
+}
-- 
2.13.1



[PATCH v3 12/25] ibtrs: server: sysfs interface functions

2018-06-06 Thread Roman Pen
This is the sysfs interface to IBTRS sessions on the server side:

  /sys/devices/virtual/ibtrs-server//
    *** IBTRS session accepted from a client peer
    |
    |- paths//
       *** established paths from a client in a session
       |
       |- disconnect
       |  *** disconnect path
       |
       |- hca_name
       |  *** HCA name
       |
       |- hca_port
       |  *** HCA port
       |
       |- stats/
          *** current path statistics
          |
          |- rdma
          |- reset_all
          |- wc_completions

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c | 307 +
 1 file changed, 307 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
new file mode 100644
index ..91f664b7eb66
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
@@ -0,0 +1,307 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-pri.h"
+#include "ibtrs-srv.h"
+#include "ibtrs-log.h"
+
+static struct kobj_type ktype = {
+   .sysfs_ops  = _sysfs_ops,
+};
+
+static ssize_t ibtrs_srv_disconnect_show(struct kobject *kobj,
+struct kobj_attribute *attr,
+char *page)
+{
+   return scnprintf(page, PAGE_SIZE, "Usage: echo 1 > %s\n",
+attr->attr.name);
+}
+
+static ssize_t ibtrs_srv_disconnect_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct ibtrs_srv_sess *sess;
+   char str[MAXHOSTNAMELEN];
+
+   sess = container_of(kobj, struct ibtrs_srv_sess, kobj);
+   if (!sysfs_streq(buf, "1")) {
+   ibtrs_err(sess, "%s: invalid value: '%s'\n",
+ attr->attr.name, buf);
+   return -EINVAL;
+   }
+
+   sockaddr_to_str((struct sockaddr *)>s.dst_addr, str, sizeof(str));
+
+   ibtrs_info(sess, "disconnect for path %s requested\n", str);
+   ibtrs_srv_queue_close(sess);
+
+   return count;
+}
+
+static struct kobj_attribute ibtrs_srv_disconnect_attr =
+   __ATTR(disconnect, 0644,
+  ibtrs_srv_disconnect_show, ibtrs_srv_disconnect_store);
+
+static ssize_t ibtrs_srv_hca_port_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *page)
+{
+   struct ibtrs_srv_sess *sess;
+   struct ibtrs_con *usr_con;
+
+   sess = container_of(kobj, typeof(*sess), kobj);
+   usr_con = sess->s.con[0];
+
+   return scnprintf(page, PAGE_SIZE, "%u\n",
+usr_con->cm_id->port_num);
+}
+
+static struct kobj_attribute ibtrs_srv_hca_port_attr =
+   __ATTR(hca_port, 0444, ibtrs_srv_hca_port_show, NULL);
+
+static ssize_t ibtrs_srv_hca_name_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *page)
+{
+   struct ibtrs_srv_sess *sess;
+
+   sess = container_of(kobj, struct ibtrs_srv_sess, kobj);
+
+   return scnprintf(page, PAGE_SIZE, "%s\n",
+sess->s.dev->ib_dev->name);
+}
+
+static struct kobj_attribute ibtrs_srv_hca_name_attr =
+   __ATTR(hca_name, 0444, ibtrs_srv_hca_name_show, NULL);
+
+static ssize_t ibtrs_srv_src_addr_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  

[PATCH v3 14/25] ibtrs: a bit of documentation

2018-06-06 Thread Roman Pen
README with description of major sysfs entries.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/README | 390 
 1 file changed, 390 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/README

diff --git a/drivers/infiniband/ulp/ibtrs/README b/drivers/infiniband/ulp/ibtrs/README
new file mode 100644
index ..d9d8cd69d44f
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/README
@@ -0,0 +1,390 @@
+
+InfiniBand Transport (IBTRS)
+
+
+IBTRS (InfiniBand Transport) is a reliable high speed transport library
+which provides support to establish an optimal number of connections
+between client and server machines using RDMA (InfiniBand, RoCE, iWarp)
+transport. It is optimized to transfer (read/write) IO blocks.
+
+In its core interface it follows the BIO semantics of providing the
+possibility to either write data from an sg list to the remote side
+or to request ("read") data transfer from the remote side into a given
+sg list.
+
+IBTRS provides I/O fail-over and load-balancing capabilities by using
+multipath I/O (see "add_path" and "mp_policy" configuration entries).
+
+IBTRS is used by the IBNBD (InfiniBand Network Block Device) modules.
+
+======================
+Client Sysfs Interface
+======================
+
+This chapter describes only the most important files of the sysfs
+interface on the client side.
+
+Entries under /sys/devices/virtual/ibtrs-client/
+================================================
+
+When a user of the IBTRS API creates a new session, a directory entry
+with the name of that session is created.
+
+Entries under /sys/devices/virtual/ibtrs-client//
+=================================================
+
+add_path (RW)
+-------------
+
+Adds a new path (connection) to an existing session. Expected format is the
+following:
+
+  <[source addr,]destination addr>
+
+  *addr ::= [ ip: | gid: ]
+
+max_reconnect_attempts (RW)
+---------------------------
+
+Maximum number of reconnect attempts the client should make before
+giving up after the connection breaks unexpectedly.
+
+mp_policy (RW)
+--------------
+
+Multipath policy specifies which path should be selected on each IO:
+
+   round-robin (0):
+   select path in per CPU round-robin manner.
+
+   min-inflight (1):
+   select path with minimum inflights.
+
+Entries under /sys/devices/virtual/ibtrs-client//paths/
+=======================================================
+
+Each path belonging to a given session is listed here by its source and
+destination address. When a new path is added to a session by writing to
+the "add_path" entry, a directory  is created.
+
+Entries under /sys/devices/virtual/ibtrs-client//paths//
+========================================================
+
+state (R)
+---------
+
+Contains "connected" if the session is connected to the peer and fully
+functional.  Otherwise the file contains "disconnected".
+
+reconnect (RW)
+--------------
+
+Write "1" to the file in order to reconnect the path.
+The operation is blocking and returns 0 if the reconnect was successful.
+
+disconnect (RW)
+---------------
+
+Write "1" to the file in order to disconnect the path.
+The operation blocks until the IBTRS path is disconnected.
+
+remove_path (RW)
+----------------
+
+Write "1" to the file in order to disconnect and remove the path
+from the session.  The operation blocks until the path is disconnected
+and removed from the session.
+
+hca_name (R)
+------------
+
+Contains the name of the HCA the connection is established on.
+
+hca_port (R)
+------------
+
+Contains the port number of the active port the traffic is going through.
+
+src_addr (R)
+------------
+
+Contains the source address of the path.
+
+dst_addr (R)
+------------
+
+Contains the destination address of the path.
+
+
+Entries under /sys/devices/virtual/ibtrs-client//paths//stats/
+==============================================================
+
+Write "0" to any file in that directory to reset corresponding statistics.
+
+reset_all (RW)
+--------------
+
+Read will return usage help, write 0 will clear all the statistics.
+
+sg_entries (RW)
+---------------
+
+Data to be transferred via RDMA is passed to IBTRS as a scatter-gather
+list. A scatter-gather list can contain multiple entries.
+Scatter-gather lists with fewer entries require less processing power
+and can therefore be transferred faster. The file sg_entries outputs a
+per-CPU distribution table for the number of entries in the
+scatter-gather lists that were passed to the IBTRS API function
+ibtrs_clt_request (READ or WRITE).
+
+cpu_migration (RW)
+------------------
+
+IBTRS expects that each HCA IRQ is pinned to a separate CPU. If it's
+not the case, the processing of an I/O response could be p

[PATCH v3 07/25] ibtrs: client: statistics functions

2018-06-06 Thread Roman Pen
This introduces a set of functions used on the client side to account
statistics of RDMA data sent and received, the number of IOs in flight,
latency, CPU migrations, etc.  Almost all statistics are collected
using per-CPU variables.
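
To make the latency accounting easier to follow, here is a userspace
sketch of how a latency in milliseconds maps to a histogram bucket. It
mirrors ibtrs_clt_ms_to_id() from the diff below and reuses the
MIN_LOG_LAT, MAX_LOG_LAT and LOG_LAT_SZ values from ibtrs-clt.h. The
helper ilog2_ul() is a hypothetical stand-in for the kernel's ilog2(),
and the explicit range checks stand in for clamp().

```c
#include <assert.h>

#define MAX_LOG_LAT 16
#define MIN_LOG_LAT 0
#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2)

/* Userspace stand-in for ilog2(): floor(log2(v)), defined for v > 0. */
static int ilog2_ul(unsigned long v)
{
    int r = -1;

    assert(v != 0);
    while (v) {
        v >>= 1;
        r++;
    }
    return r;
}

/*
 * Bucket 0 holds 0 ms, bucket k (k >= 1) holds [2^(k-1), 2^k) ms,
 * and the last bucket collects everything at or above 2^MAX_LOG_LAT ms.
 */
static int ms_to_id(unsigned long ms)
{
    int id = ms ? ilog2_ul(ms) - MIN_LOG_LAT + 1 : 0;

    if (id < 0)
        id = 0;
    if (id > LOG_LAT_SZ - 1)
        id = LOG_LAT_SZ - 1;
    return id;
}
```

With these constants the table has 18 buckets, so a 3 ms request lands
in bucket 2 and anything of 65536 ms or longer lands in bucket 17.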

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c | 455 +
 1 file changed, 455 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
new file mode 100644
index ..af2ed05d2900
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
@@ -0,0 +1,455 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-clt.h"
+
+static inline int ibtrs_clt_ms_to_id(unsigned long ms)
+{
+   int id = ms ? ilog2(ms) - MIN_LOG_LAT + 1 : 0;
+
+   return clamp(id, 0, LOG_LAT_SZ - 1);
+}
+
+void ibtrs_clt_update_rdma_lat(struct ibtrs_clt_stats *stats, bool read,
+                               unsigned long ms)
+{
+   struct ibtrs_clt_stats_pcpu *s;
+   int id;
+
+   id = ibtrs_clt_ms_to_id(ms);
+   s = this_cpu_ptr(stats->pcpu_stats);
+   if (read) {
+       s->rdma_lat_distr[id].read++;
+       if (s->rdma_lat_max.read < ms)
+           s->rdma_lat_max.read = ms;
+   } else {
+       s->rdma_lat_distr[id].write++;
+       if (s->rdma_lat_max.write < ms)
+           s->rdma_lat_max.write = ms;
+   }
+}
+
+void ibtrs_clt_decrease_inflight(struct ibtrs_clt_stats *stats)
+{
+   atomic_dec(>inflight);
+}
+
+void ibtrs_clt_update_wc_stats(struct ibtrs_clt_con *con)
+{
+   struct ibtrs_clt_sess *sess = to_clt_sess(con->c.sess);
+   struct ibtrs_clt_stats *stats = >stats;
+   struct ibtrs_clt_stats_pcpu *s;
+   int cpu;
+
+   cpu = raw_smp_processor_id();
+   s = this_cpu_ptr(stats->pcpu_stats);
+   s->wc_comp.cnt++;
+   s->wc_comp.total_cnt++;
+   if (unlikely(con->cpu != cpu)) {
+   s->cpu_migr.to++;
+
+   /* Careful here, override s pointer */
+   s = per_cpu_ptr(stats->pcpu_stats, con->cpu);
+   atomic_inc(>cpu_migr.from);
+   }
+}
+
+void ibtrs_clt_inc_failover_cnt(struct ibtrs_clt_stats *stats)
+{
+   struct ibtrs_clt_stats_pcpu *s;
+
+   s = this_cpu_ptr(stats->pcpu_stats);
+   s->rdma.failover_cnt++;
+}
+
+static inline u32 ibtrs_clt_stats_get_avg_wc_cnt(struct ibtrs_clt_stats *stats)
+{
+   u32 cnt = 0;
+   u64 sum = 0;
+   int cpu;
+
+   for_each_possible_cpu(cpu) {
+       struct ibtrs_clt_stats_pcpu *s;
+
+       s = per_cpu_ptr(stats->pcpu_stats, cpu);
+       sum += s->wc_comp.total_cnt;
+       cnt += s->wc_comp.cnt;
+   }
+
+   return cnt ? sum / cnt : 0;
+}
+
+int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_clt_stats *stats,
+char *buf, size_t len)
+{
+   return scnprintf(buf, len, "%u\n",
+ibtrs_clt_stats_get_avg_wc_cnt(stats));
+}
+
+ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_clt_stats *stats,
+ char *page, size_t len)
+{
+   struct ibtrs_clt_stats_rdma_lat res[LOG_LAT_SZ];
+   struct ibtrs_clt_stats_rdma_lat max;
+   struct ibtrs_clt_stats_pcpu *s;
+
+   ssize_t cnt = 0;
+   int i, cpu;
+
+   max.write = 0;
+   max.read = 0;
+   for_each_possible_cpu(cpu) {
+   s = per_cpu_ptr(stats->pcpu_stats, cpu);
+
+   if (max.write < s->rdma_lat_max.write)
+   max.write = s

[PATCH v3 06/25] ibtrs: client: main functionality

2018-06-06 Thread Roman Pen
This is the main functionality of the ibtrs-client module, which manages
a set of RDMA connections for each IBTRS session and does multipathing,
load balancing and failover of RDMA requests.
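
The two multipath policies named in ibtrs-clt.h (MP_POLICY_RR and
MP_POLICY_MIN_INFLIGHT) can be illustrated with a simplified userspace
sketch. It is an assumption-laden model, not the module's code: struct
path, pick_min_inflight() and pick_round_robin() are hypothetical, the
paths live in a plain array rather than an RCU list, and the round-robin
cursor is a single variable rather than per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical path descriptor, not the patch's struct ibtrs_clt_sess. */
struct path {
    bool connected; /* only connected paths may carry IO */
    int inflight;   /* requests currently outstanding on this path */
};

/* min-inflight: index of the connected path with the fewest inflights. */
static int pick_min_inflight(const struct path *paths, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!paths[i].connected)
            continue;
        if (best < 0 || paths[i].inflight < paths[best].inflight)
            best = (int)i;
    }
    return best; /* -1 if no path is connected */
}

/* round-robin: next connected path at or after *cursor, wrapping around. */
static int pick_round_robin(const struct path *paths, size_t n,
                            size_t *cursor)
{
    for (size_t step = 0; step < n; step++) {
        size_t i = (*cursor + step) % n;

        if (paths[i].connected) {
            *cursor = (i + 1) % n; /* start after this path next time */
            return (int)i;
        }
    }
    return -1; /* no connected path, the caller has to fail the request */
}
```

Skipping disconnected paths in both selectors is also what gives the
failover behaviour in this model: a request simply goes to the next
usable path.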

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c | 2844 ++
 1 file changed, 2844 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
new file mode 100644
index ..dc0327a95ef6
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
@@ -0,0 +1,2844 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *  Swapnil Ingle 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-clt.h"
+#include "ibtrs-log.h"
+
+#define MAX_SEGMENTS 31
+#define IBTRS_CONNECT_TIMEOUT_MS 5000
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Client");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+static ushort nr_cons_per_session;
+module_param(nr_cons_per_session, ushort, 0444);
+MODULE_PARM_DESC(nr_cons_per_session, "Number of connections per session."
+" (default: nr_cpu_ids)");
+
+static int retry_cnt = 7;
+module_param_named(retry_cnt, retry_cnt, int, 0644);
+MODULE_PARM_DESC(retry_cnt, "Number of times to send the message if the"
+" remote side didn't respond with Ack or Nack (default: 7,"
+" min: " __stringify(MIN_RTR_CNT) ", max: "
+__stringify(MAX_RTR_CNT) ")");
+
+static int __read_mostly noreg_cnt = 0;
+module_param_named(noreg_cnt, noreg_cnt, int, 0444);
+MODULE_PARM_DESC(noreg_cnt, "Max number of SG entries when MR registration "
+"does not happen (default: 0)");
+
+static const struct ibtrs_ib_dev_pool_ops dev_pool_ops;
+static struct ibtrs_ib_dev_pool dev_pool = {
+   .ops = _pool_ops
+};
+static struct workqueue_struct *ibtrs_wq;
+static struct class *ibtrs_dev_class;
+
+static void ibtrs_rdma_error_recovery(struct ibtrs_clt_con *con);
+static int ibtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id,
+struct rdma_cm_event *ev);
+static void ibtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc);
+static void complete_rdma_req(struct ibtrs_clt_io_req *req, int errno,
+ bool notify, bool can_wait);
+static int ibtrs_clt_write_req(struct ibtrs_clt_io_req *req);
+static int ibtrs_clt_read_req(struct ibtrs_clt_io_req *req);
+
+bool ibtrs_clt_sess_is_connected(const struct ibtrs_clt_sess *sess)
+{
+   return sess->state == IBTRS_CLT_CONNECTED;
+}
+
+static inline bool ibtrs_clt_is_connected(const struct ibtrs_clt *clt)
+{
+   struct ibtrs_clt_sess *sess;
+   bool connected = false;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(sess, >paths_list, s.entry)
+   connected |= ibtrs_clt_sess_is_connected(sess);
+   rcu_read_unlock();
+
+   return connected;
+}
+
+static inline struct ibtrs_tag *
+__ibtrs_get_tag(struct ibtrs_clt *clt, enum ibtrs_clt_con_type con_type)
+{
+   size_t max_depth = clt->queue_depth;
+   struct ibtrs_tag *tag;
+   int cpu, bit;
+
+   cpu = get_cpu();
+   do {
+       bit = find_first_zero_bit(clt->tags_map, max_depth);
+       if (unlikely(bit >= max_depth)) {
+           put_cpu();
+           return NULL;
+       }
+
+   } while (unlikely(test_and_set_bit_lock(bit, clt->tags_map)));
+   put_cpu();
+
+   tag = GET_TAG(clt, bit);
+   WARN_ON(tag->mem_id != bit);
+   tag->cpu_id = cpu;
+   tag->con_type = con_t

[PATCH v3 08/25] ibtrs: client: sysfs interface functions

2018-06-06 Thread Roman Pen
This is the sysfs interface to IBTRS sessions on the client side:

  /sys/devices/virtual/ibtrs-client//
    *** IBTRS session created by ibtrs_clt_open() API call
    |
    |- max_reconnect_attempts
    |  *** number of reconnect attempts for session
    |
    |- add_path
    |  *** adds another connection path into IBTRS session
    |
    |- paths//
       *** established paths to server in a session
       |
       |- disconnect
       |  *** disconnect path
       |
       |- reconnect
       |  *** reconnect path
       |
       |- remove_path
       |  *** remove current path
       |
       |- state
       |  *** retrieve current path state
       |
       |- hca_port
       |  *** HCA port number
       |
       |- hca_name
       |  *** HCA name
       |
       |- stats/
          *** current path statistics
          |
          |- cpu_migration
          |- rdma
          |- rdma_lat
          |- reconnects
          |- reset_all
          |- sg_entries
          |- wc_completions

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c | 520 +
 1 file changed, 520 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
new file mode 100644
index ..a25763a29a17
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
@@ -0,0 +1,520 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-pri.h"
+#include "ibtrs-clt.h"
+#include "ibtrs-log.h"
+
+#define MIN_MAX_RECONN_ATT -1
+#define MAX_MAX_RECONN_ATT 
+
+static struct kobj_type ktype = {
+   .sysfs_ops = &kobj_sysfs_ops,
+};
+
+static ssize_t max_reconnect_attempts_show(struct device *dev,
+  struct device_attribute *attr,
+  char *page)
+{
+   struct ibtrs_clt *clt;
+
+   clt = container_of(dev, struct ibtrs_clt, dev);
+
+   return sprintf(page, "%d\n", ibtrs_clt_get_max_reconnect_attempts(clt));
+}
+
+static ssize_t max_reconnect_attempts_store(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf,
+   size_t count)
+{
+   struct ibtrs_clt *clt;
+   int value;
+   int ret;
+
+   clt = container_of(dev, struct ibtrs_clt, dev);
+
+   ret = kstrtoint(buf, 10, &value);
+   if (unlikely(ret)) {
+   ibtrs_err(clt, "%s: failed to convert string '%s' to int\n",
+ attr->attr.name, buf);
+   return ret;
+   }
+   if (unlikely(value > MAX_MAX_RECONN_ATT ||
+value < MIN_MAX_RECONN_ATT)) {
+   ibtrs_err(clt, "%s: invalid range"
+ " (provided: '%s', accepted: min: %d, max: %d)\n",
+ attr->attr.name, buf, MIN_MAX_RECONN_ATT,
+ MAX_MAX_RECONN_ATT);
+   return -EINVAL;
+   }
+   ibtrs_clt_set_max_reconnect_attempts(clt, value);
+
+   return count;
+}
+
+static DEVICE_ATTR_RW(max_reconnect_attempts);
+
+static ssize_t mpath_policy_show(struct device *dev,
+struct device_attribute *attr,
+char *page)
+{
+   struct ibtrs_clt *clt;
+
+   clt = container_of(dev, struct ibtrs_clt, dev);
+
+   switch (clt->mp_policy) {
+   case MP_POLICY_RR:
+   return sprintf(page, "round-robin (RR: %d)\n", clt->mp_policy);
+   case MP_POLICY_MIN_INFLIGHT:
+   return sprintf(p

[PATCH v3 04/25] ibtrs: core: lib functions shared between client and server modules

2018-06-06 Thread Roman Pen
This is a set of library functions provided as the ibtrs-core module,
used by the client and server modules.

These functions mainly wrap IB and RDMA calls and provide a slightly
higher-level abstraction for implementing the IBTRS protocol on the
client or server side.

Signed-off-by: Roman Pen 
Signed-off-by: Danil Kipnis 
Cc: Jack Wang 
---
 drivers/infiniband/ulp/ibtrs/ibtrs.c | 611 +++
 1 file changed, 611 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.c b/drivers/infiniband/ulp/ibtrs/ibtrs.c
new file mode 100644
index ..11302408b13c
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.c
@@ -0,0 +1,611 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler 
+ *  Jack Wang 
+ *  Kleber Souza 
+ *  Danil Kipnis 
+ *  Roman Penyaev 
+ *  Milind Dumbare 
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis 
+ *  Roman Penyaev 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-pri.h"
+#include "ibtrs-log.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Core");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+struct ibtrs_iu *ibtrs_iu_alloc(u32 tag, size_t size, gfp_t gfp_mask,
+   struct ib_device *dma_dev,
+   enum dma_data_direction direction,
+   void (*done)(struct ib_cq *cq,
+struct ib_wc *wc))
+{
+   struct ibtrs_iu *iu;
+
+   iu = kmalloc(sizeof(*iu), gfp_mask);
+   if (unlikely(!iu))
+   return NULL;
+
+   iu->buf = kzalloc(size, gfp_mask);
+   if (unlikely(!iu->buf))
+   goto err1;
+
+   iu->dma_addr = ib_dma_map_single(dma_dev, iu->buf, size, direction);
+   if (unlikely(ib_dma_mapping_error(dma_dev, iu->dma_addr)))
+   goto err2;
+
+   iu->cqe.done  = done;
+   iu->size  = size;
+   iu->direction = direction;
+   iu->tag   = tag;
+
+   return iu;
+
+err2:
+   kfree(iu->buf);
+err1:
+   kfree(iu);
+
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_alloc);
+
+void ibtrs_iu_free(struct ibtrs_iu *iu, enum dma_data_direction dir,
+  struct ib_device *ibdev)
+{
+   if (!iu)
+   return;
+
+   ib_dma_unmap_single(ibdev, iu->dma_addr, iu->size, dir);
+   kfree(iu->buf);
+   kfree(iu);
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_free);
+
+int ibtrs_iu_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu)
+{
+   struct ibtrs_sess *sess = con->sess;
+   struct ib_recv_wr wr, *bad_wr;
+   struct ib_sge list;
+
+   list.addr   = iu->dma_addr;
+   list.length = iu->size;
+   list.lkey   = sess->dev->ib_pd->local_dma_lkey;
+
+   if (WARN_ON(list.length == 0)) {
+   ibtrs_wrn(con, "Posting receive work request failed,"
+ " sg list is empty\n");
+   return -EINVAL;
+   }
+
+   wr.next    = NULL;
+   wr.wr_cqe  = &iu->cqe;
+   wr.sg_list = &list;
+   wr.num_sge = 1;
+
+   return ib_post_recv(con->qp, &wr, &bad_wr);
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_post_recv);
+
+int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe)
+{
+   struct ib_recv_wr wr, *bad_wr;
+
+   wr.next= NULL;
+   wr.wr_cqe  = cqe;
+   wr.sg_list = NULL;
+   wr.num_sge = 0;
+
+   return ib_post_recv(con->qp, &wr, &bad_wr);
+}
+EXPORT_SYMBOL_GPL(ibtrs_post_recv_empty);
+
+int ibtrs_post_recv_empty_x2(struct ibtrs_con *con, struct ib_cqe *cqe)
+{
+   struct ib_recv_wr wr_arr[2], *wr, *bad_wr;
+   int i;
+
+   memset(wr_arr, 0, sizeof(wr_arr));
+   for (i = 0; i < ARRAY_SIZE(wr_arr); i++) {
+   wr = &wr_arr[i];
+   wr->wr_cqe  = cqe;
+   if (i)
+   /* Chain backwards */
+   wr->next

[PATCH v2 01/26] rculist: introduce list_next_or_null_rr_rcu()

2018-05-18 Thread Roman Pen
Function is going to be used in transport over RDMA module
in subsequent patches.

Function returns next element in round-robin fashion,
i.e. head will be skipped.  NULL will be returned if list
is observed as empty.
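
The round-robin behaviour described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: there is no RCU or READ_ONCE() here, and the names (struct list_node, struct path, next_or_null, next_or_null_rr) are invented for illustration and are not the kernel primitives themselves.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical element type embedding a list node. */
struct path {
	int id;
	struct list_node entry;
};

#define node_to_path(n) \
	((struct path *)((char *)(n) - offsetof(struct path, entry)))

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Element following @pos, or NULL if @pos is the last node before @head. */
static struct path *next_or_null(struct list_node *head, struct list_node *pos)
{
	return pos->next != head ? node_to_path(pos->next) : NULL;
}

/*
 * Round-robin variant: if the plain lookup hits @head, skip the head and
 * take the element after it.  NULL is returned only for an empty list.
 */
static struct path *next_or_null_rr(struct list_node *head,
				    struct list_node *pos)
{
	struct path *p = next_or_null(head, pos);

	return p ? p : next_or_null(head, pos->next);
}
```

Calling next_or_null_rr() on the last element wraps around to the first one, which is the property a round-robin path selector relies on.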

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Cc: Paul E. McKenney <paul...@linux.vnet.ibm.com>
Cc: linux-ker...@vger.kernel.org
---
 include/linux/rculist.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 127f534fec94..b0840d5ab25a 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -339,6 +339,25 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
 })
 
 /**
+ * list_next_or_null_rr_rcu - get next list element in round-robin fashion.
+ * @head:  the head for the list.
+ * @ptr:    the list head to take the next element from.
+ * @type:   the type of the struct this is embedded in.
+ * @memb:   the name of the list_head within the struct.
+ *
+ * Next element returned in round-robin fashion, i.e. head will be skipped,
+ * but if list is observed as empty, NULL will be returned.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ */
+#define list_next_or_null_rr_rcu(head, ptr, type, memb) \
+({ \
+   list_next_or_null_rcu(head, ptr, type, memb) ?: \
+   list_next_or_null_rcu(head, READ_ONCE((ptr)->next), type, memb); \
+})
+
+/**
  * list_for_each_entry_rcu -   iterate over rcu list of given type
  * @pos:   the type * to use as a loop cursor.
  * @head:  the head for your list.
-- 
2.13.1



[PATCH v2 00/26] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-05-18 Thread Roman Pen
 -10.8%
  x32  1215334   1066607 -12.2%
  x40  1255781   1076841 -14.2%
  x48  1240931   1066453 -14.1%
  x56  1250333   1065879 -14.8%
  x64  1229389   1064199 -13.4%

 rw=randwrite, bandwidth in Kbytes:
 jobs    IBNBD NVMEoRDMA Change
   x1  1416413   1181102 -16.6%
   x8  2438615   1977051 -18.9%
  x16  2436924   1854223 -23.9%
  x24  2430527   1714580 -29.5%
  x32  2425552   1641288 -32.3%
  x40  2378784   1592788 -33.0%
  x48  2202260   1511895 -31.3%
  x56  2207013   1493400 -32.3%
  x64  2098949   1432951 -31.7%


  - on ConnectX-3 (MT4099)
x40 CPUs Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz

 rw=randread, bandwidth in Kbytes:
 jobs    IBNBD NVMEoRDMA Change
   x1  1961216   2046572  +4.4%
   x8  4012912   4059410  +1.2%
  x16  4033837   3968410  -1.6%
  x24  3939186   3770729  -4.3%
  x32  3843434   3623869  -5.7%
  x40  3696896   3448772  -6.7%
  x48  4106259   3729201  -9.2%
  x56  4141374   3732954  -9.9%
  x64  4207317   3805638  -9.5%

 rw=randwrite, bandwidth in Kbytes:
 jobs    IBNBD NVMEoRDMA Change
   x1  3195637   2479068 -22.4%
   x8  4576924   4541743  -0.8%
  x16  4581528   4555459  -0.6%
  x24  4692540   4595963  -2.1%
  x32  4686968   4540456  -3.1%
  x40  4583814   4404859  -3.9%
  x48  4969587   4710902  -5.2%
  x56  4996101   4701814  -5.9%
  x64  5083460   4759663  -6.4%
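
The Change column in these tables appears to be the NVMEoRDMA bandwidth relative to the IBNBD bandwidth, i.e. (NVMEoRDMA - IBNBD) / IBNBD. The small sketch below recomputes it; the function names are hypothetical, and the formula is inferred from the numbers rather than stated in the cover letter.

```c
#include <stdio.h>

/* Relative difference of NVMEoRDMA against IBNBD bandwidth, in percent. */
static double change_percent(double ibnbd_kb, double nvmeordma_kb)
{
	return (nvmeordma_kb - ibnbd_kb) / ibnbd_kb * 100.0;
}

/* Print one table row in the same layout as the tables above. */
static void print_row(const char *jobs, double ibnbd_kb, double nvmeordma_kb)
{
	printf("%5s  %7.0f   %7.0f %+5.1f%%\n", jobs, ibnbd_kb, nvmeordma_kb,
	       change_percent(ibnbd_kb, nvmeordma_kb));
}
```

For the ConnectX-3 randread x1 row, change_percent(1961216, 2046572) comes out at roughly 4.35, which matches the +4.4% in the table after rounding.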

  The interesting observation is that on the machine with Intel CPUs and
  a ConnectX-3 card the difference between IBNBD and NVME bandwidth is
  significantly smaller compared to AMD and ConnectX-2.  I did not
  thoroughly investigate that behaviour, but suspect that the devil
  is in the Intel vs AMD architecture and probably in how NUMA nodes are
  organized, i.e. Intel has 2 NUMA nodes against 8 on AMD.  If someone is
  interested in those results and can point out where to dig on the NVME
  side, I can investigate in depth why exactly NVME bandwidth drops so
  significantly on the AMD machine with ConnectX-2.

  Shiny graphs are here:
  
https://docs.google.com/spreadsheets/d/1vxSoIvfjPbOWD61XMeN2_gPGxsxrbIUOZADk1UX5lj0

Roman Pen (26):
  rculist: introduce list_next_or_null_rr_rcu()
  sysfs: export sysfs_remove_file_self()
  ibtrs: public interface header to establish RDMA connections
  ibtrs: private headers with IBTRS protocol structs and helpers
  ibtrs: core: lib functions shared between client and server modules
  ibtrs: client: private header with client structs and functions
  ibtrs: client: main functionality
  ibtrs: client: statistics functions
  ibtrs: client: sysfs interface functions
  ibtrs: server: private header with server structs and functions
  ibtrs: server: main functionality
  ibtrs: server: statistics functions
  ibtrs: server: sysfs interface functions
  ibtrs: include client and server modules into kernel compilation
  ibtrs: a bit of documentation
  ibnbd: private headers with IBNBD protocol structs and helpers
  ibnbd: client: private header with client structs and functions
  ibnbd: client: main functionality
  ibnbd: client: sysfs interface functions
  ibnbd: server: private header with server structs and functions
  ibnbd: server: main functionality
  ibnbd: server: functionality for IO submission to file or block dev
  ibnbd: server: sysfs interface functions
  ibnbd: include client and server modules into kernel compilation
  ibnbd: a bit of documentation
  MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

 MAINTAINERS|   14 +
 drivers/block/Kconfig  |2 +
 drivers/block/Makefile |1 +
 drivers/block/ibnbd/Kconfig|   22 +
 drivers/block/ibnbd/Makefile   |   13 +
 drivers/block/ibnbd/README |  299 +++
 drivers/block/ibnbd/ibnbd-clt-sysfs.c  |  669 ++
 drivers/block/ibnbd/ibnbd-clt.c| 1818 +++
 drivers/block/ibnbd/ibnbd-clt.h|  171 ++
 drivers/block/ibnbd/ibnbd-log.h|   71 +
 drivers/block/ibnbd/ibnbd-proto.h  |  364 +++
 drivers/block/ibnbd/ibnbd-srv-dev.c|  410 
 drivers/block/ibnbd/ibnbd-srv-dev.h|  149 ++
 drivers/block/ibnbd/ibnbd-srv-sysfs.c  |  242 ++
 drivers/block/ibnbd/ibnbd-srv.c|  922 
 drivers/block/ibnbd/ibnbd-srv.h|  100 +
 drivers/infiniband/Kconfig |1 +
 drivers/infiniband/ulp/Makefile

[PATCH v2 02/26] sysfs: export sysfs_remove_file_self()

2018-05-18 Thread Roman Pen
Function is going to be used in transport over RDMA module
in subsequent patches.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Cc: Tejun Heo <t...@kernel.org>
Cc: linux-ker...@vger.kernel.org
---
 fs/sysfs/file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 5c13f29bfcdb..ff7443ac2aa7 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -444,6 +444,7 @@ bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr)
kernfs_put(kn);
return ret;
 }
+EXPORT_SYMBOL_GPL(sysfs_remove_file_self);
 
 void sysfs_remove_files(struct kobject *kobj, const struct attribute **ptr)
 {
-- 
2.13.1



[PATCH v2 06/26] ibtrs: client: private header with client structs and functions

2018-05-18 Thread Roman Pen
This header describes main structs and functions used by ibtrs-client
module, mainly for managing IBTRS sessions, creating/destroying sysfs
entries, accounting statistics on client side.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h | 315 +++
 1 file changed, 315 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
new file mode 100644
index ..0323da91ca01
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
@@ -0,0 +1,315 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Swapnil Ingle <swapnil.in...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_CLT_H
+#define IBTRS_CLT_H
+
+#include 
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_clt_state - Client states.
+ */
+enum ibtrs_clt_state {
+   IBTRS_CLT_CONNECTING,
+   IBTRS_CLT_CONNECTING_ERR,
+   IBTRS_CLT_RECONNECTING,
+   IBTRS_CLT_CONNECTED,
+   IBTRS_CLT_CLOSING,
+   IBTRS_CLT_CLOSED,
+   IBTRS_CLT_DEAD,
+};
+
+static inline const char *ibtrs_clt_state_str(enum ibtrs_clt_state state)
+{
+   switch (state) {
+   case IBTRS_CLT_CONNECTING:
+   return "IBTRS_CLT_CONNECTING";
+   case IBTRS_CLT_CONNECTING_ERR:
+   return "IBTRS_CLT_CONNECTING_ERR";
+   case IBTRS_CLT_RECONNECTING:
+   return "IBTRS_CLT_RECONNECTING";
+   case IBTRS_CLT_CONNECTED:
+   return "IBTRS_CLT_CONNECTED";
+   case IBTRS_CLT_CLOSING:
+   return "IBTRS_CLT_CLOSING";
+   case IBTRS_CLT_CLOSED:
+   return "IBTRS_CLT_CLOSED";
+   case IBTRS_CLT_DEAD:
+   return "IBTRS_CLT_DEAD";
+   default:
+   return "UNKNOWN";
+   }
+}
+
+enum ibtrs_mp_policy {
+   MP_POLICY_RR,
+   MP_POLICY_MIN_INFLIGHT,
+};
+
+struct ibtrs_clt_stats_reconnects {
+   int successful_cnt;
+   int fail_cnt;
+};
+
+struct ibtrs_clt_stats_wc_comp {
+   u32 cnt;
+   u64 total_cnt;
+};
+
+struct ibtrs_clt_stats_cpu_migr {
+   atomic_t from;
+   int to;
+};
+
+struct ibtrs_clt_stats_rdma {
+   struct {
+   u64 cnt;
+   u64 size_total;
+   } dir[2];
+
+   u64 failover_cnt;
+};
+
+struct ibtrs_clt_stats_rdma_lat {
+   u64 read;
+   u64 write;
+};
+
+#define MIN_LOG_SG 2
+#define MAX_LOG_SG 5
+#define MAX_LIN_SG BIT(MIN_LOG_SG)
+#define SG_DISTR_SZ (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 2)
+
+#define MAX_LOG_LAT 16
+#define MIN_LOG_LAT 0
+#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2)
+
+struct ibtrs_clt_stats_pcpu {
+   struct ibtrs_clt_stats_cpu_migr cpu_migr;
+   struct ibtrs_clt_stats_rdma rdma;
+   u64 sg_list_total;
+   u64 sg_list_distr[SG_DISTR_SZ];
+   struct ibtrs_clt_stats_rdma_lat rdma_lat_distr[LOG_LAT_SZ];
+   struct ibtrs_clt_stats_rdma_lat rdma_lat_max;
+   struct ibtrs_clt_stats_wc_comp  wc_comp;
+};
+
+struct ibtrs_clt_stats {
+   boolenable_rdma_lat;
+   struct ibtrs_clt_stats_pcpu__percpu *pcpu_stats;
+   struct ibtrs_clt_stats_reconnects   reconnects;
+   atomic_tinflight;
+};
+
+struct ibtrs_clt_con {
+   struct 

[PATCH v2 05/26] ibtrs: core: lib functions shared between client and server modules

2018-05-18 Thread Roman Pen
This is a set of library functions provided as the ibtrs-core module,
used by the client and server modules.

These functions mainly wrap IB and RDMA calls and provide a slightly
higher-level abstraction for implementing the IBTRS protocol on the
client or server side.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs.c | 609 +++
 1 file changed, 609 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.c b/drivers/infiniband/ulp/ibtrs/ibtrs.c
new file mode 100644
index ..39a933fe528e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.c
@@ -0,0 +1,609 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-pri.h"
+#include "ibtrs-log.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Core");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+struct ibtrs_iu *ibtrs_iu_alloc(u32 tag, size_t size, gfp_t gfp_mask,
+   struct ib_device *dma_dev,
+   enum dma_data_direction direction,
+   void (*done)(struct ib_cq *cq,
+struct ib_wc *wc))
+{
+   struct ibtrs_iu *iu;
+
+   iu = kmalloc(sizeof(*iu), gfp_mask);
+   if (unlikely(!iu))
+   return NULL;
+
+   iu->buf = kzalloc(size, gfp_mask);
+   if (unlikely(!iu->buf))
+   goto err1;
+
+   iu->dma_addr = ib_dma_map_single(dma_dev, iu->buf, size, direction);
+   if (unlikely(ib_dma_mapping_error(dma_dev, iu->dma_addr)))
+   goto err2;
+
+   iu->cqe.done  = done;
+   iu->size  = size;
+   iu->direction = direction;
+   iu->tag   = tag;
+
+   return iu;
+
+err2:
+   kfree(iu->buf);
+err1:
+   kfree(iu);
+
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_alloc);
+
+void ibtrs_iu_free(struct ibtrs_iu *iu, enum dma_data_direction dir,
+  struct ib_device *ibdev)
+{
+   if (!iu)
+   return;
+
+   ib_dma_unmap_single(ibdev, iu->dma_addr, iu->size, dir);
+   kfree(iu->buf);
+   kfree(iu);
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_free);
+
+int ibtrs_iu_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu)
+{
+   struct ibtrs_sess *sess = con->sess;
+   struct ib_recv_wr wr, *bad_wr;
+   struct ib_sge list;
+
+   list.addr   = iu->dma_addr;
+   list.length = iu->size;
+   list.lkey   = sess->dev->ib_pd->local_dma_lkey;
+
+   if (WARN_ON(list.length == 0)) {
+   ibtrs_wrn(con, "Posting receive work request failed,"
+ " sg list is empty\n");
+   return -EINVAL;
+   }
+
+   wr.next    = NULL;
+   wr.wr_cqe  = &iu->cqe;
+   wr.sg_list = &list;
+   wr.num_sge = 1;
+
+   return ib_post_recv(con->qp, &wr, &bad_wr);
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_post_recv);
+
+int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe)
+{
+   struct ib_recv_wr wr, *bad_wr;
+
+   wr.next= NULL;
+   wr.wr_cqe  = cqe;
+   wr.sg_list = NULL;
+   wr.num_sge = 0;
+
+   return ib_post_recv(con->qp, &wr, &bad_wr);
+}
+EXPORT_SYMBOL_GPL(ibtrs_post_recv_empty);
+
+int ibtrs_post_recv_empty_x

[PATCH v2 04/26] ibtrs: private headers with IBTRS protocol structs and helpers

2018-05-18 Thread Roman Pen
These are common private headers with IBTRS protocol structures,
logging, sysfs and other helper functions, which are used on
both client and server sides.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-log.h |  91 ++
 drivers/infiniband/ulp/ibtrs/ibtrs-pri.h | 459 +++
 2 files changed, 550 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-log.h
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-pri.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-log.h b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h
new file mode 100644
index ..f56257eabdee
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h
@@ -0,0 +1,91 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_LOG_H
+#define IBTRS_LOG_H
+
+#define P1 )
+#define P2 ))
+#define P3 )))
+#define P4 ))))
+#define P(N) P ## N
+
+#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
+#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__
+
+#define LIST(...)  \
+   __VA_ARGS__,\
+   ({ unknown_type(); NULL; }) \
+   CAT(P, COUNT_ARGS(__VA_ARGS__)) \
+
+#define EMPTY()
+#define DEFER(id) id EMPTY()
+
+#define _CASE(obj, type, member)   \
+   __builtin_choose_expr(  \
+   __builtin_types_compatible_p(   \
+   typeof(obj), type), \
+   ((type)obj)->member
+#define CASE(o, t, m) DEFER(_CASE)(o,t,m)
+
+/*
+ * Below we define retrieving of sessname from common IBTRS types.
+ * Client or server related types have to be defined by special
+ * TYPES_TO_SESSNAME macro.
+ */
+
+void unknown_type(void);
+
+#ifndef TYPES_TO_SESSNAME
+#define TYPES_TO_SESSNAME(...) ({ unknown_type(); NULL; })
+#endif
+
+#define ibtrs_prefix(obj)  \
+   _CASE(obj, struct ibtrs_con *,  sess->sessname),\
+   _CASE(obj, struct ibtrs_sess *, sessname),  \
+   TYPES_TO_SESSNAME(obj)  \
+   ))
+
+#define ibtrs_log(fn, obj, fmt, ...)   \
+   fn("<%s>: " fmt, ibtrs_prefix(obj), ##__VA_ARGS__)
+
+#define ibtrs_err(obj, fmt, ...)   \
+   ibtrs_log(pr_err, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_err_rl(obj, fmt, ...)\
+   ibtrs_log(pr_err_ratelimited, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_wrn(obj, fmt, ...)   \
+   ibtrs_log(pr_warn, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_wrn_rl(obj, fmt, ...) \
+   ibtrs_log(pr_warn_ratelimited, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_info(obj, fmt, ...) \
+   ibtrs_log(pr_info, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_info_rl(obj, fmt, ...) \
+   ibtrs_log(pr_info_ratelimited, obj, fmt, ##__VA_ARGS__)
+
+#endif /* IBTRS_LOG_H */
diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h
new file mode 100644
index ..40647f066840
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h
@@ -0,0 +1,459 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <dan

[PATCH v2 03/26] ibtrs: public interface header to establish RDMA connections

2018-05-18 Thread Roman Pen
Introduce a public header which provides a set of API functions to
establish RDMA connections from a client to a server machine using the
IBTRS protocol, which manages RDMA connections for each session and
does multipathing and load balancing.

Main functions for client (active) side:

 ibtrs_clt_open() - Creates a set of RDMA connections encapsulated
                    in an IBTRS session and returns a pointer to the
                    IBTRS session object.
 ibtrs_clt_close() - Closes RDMA connections associated with the IBTRS
                     session.
 ibtrs_clt_request() - Requests zero-copy RDMA transfer to/from the
                       server.

Main functions for server (passive) side:

 ibtrs_srv_open() - Starts listening for IBTRS clients on the specified
                    port and invokes IBTRS callbacks for incoming
                    RDMA requests or link events.
 ibtrs_srv_close() - Closes the IBTRS server context.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs.h | 324 +++
 1 file changed, 324 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.h b/drivers/infiniband/ulp/ibtrs/ibtrs.h
new file mode 100644
index ..08325e39a41e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.h
@@ -0,0 +1,324 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_H
+#define IBTRS_H
+
+#include 
+#include 
+
+struct ibtrs_tag;
+struct ibtrs_clt;
+struct ibtrs_srv_ctx;
+struct ibtrs_srv;
+struct ibtrs_srv_op;
+
+/*
+ * Here goes IBTRS client API
+ */
+
+/**
+ * enum ibtrs_clt_link_ev - Events about connectivity state of a client
+ * @IBTRS_CLT_LINK_EV_RECONNECTED  Client was reconnected.
+ * @IBTRS_CLT_LINK_EV_DISCONNECTED Client was disconnected.
+ */
+enum ibtrs_clt_link_ev {
+   IBTRS_CLT_LINK_EV_RECONNECTED,
+   IBTRS_CLT_LINK_EV_DISCONNECTED,
+};
+
+/**
+ * Source and destination address of a path to be established
+ */
+struct ibtrs_addr {
+   struct sockaddr_storage *src;
+   struct sockaddr_storage *dst;
+};
+
+typedef void (link_clt_ev_fn)(void *priv, enum ibtrs_clt_link_ev ev);
+/**
+ * ibtrs_clt_open() - Open a session to a IBTRS client
+ * @priv:  User supplied private data.
+ * @link_ev:   Event notification for connection state changes
+ * @priv:  user supplied data that was passed to
+ * ibtrs_clt_open()
+ * @ev:Occurred event
+ * @sessname: name of the session
+ * @paths: Paths to be established defined by their src and dst addresses
+ * @path_cnt: Number of elements in the @paths array
+ * @port: port to be used by the IBTRS session
+ * @pdu_sz: Size of extra payload which can be accessed after tag allocation.
+ * @max_inflight_msg: Max. number of parallel inflight messages for the session
+ * @max_segments: Max. number of segments per IO request
+ * @reconnect_delay_sec: time between reconnect tries
+ * @max_reconnect_attempts: Number of times to reconnect on error before giving
+ * up, 0 for disabled, -1 for forever
+ *
+ * Starts session establishment with the ibtrs_server. The function can block
+ * up to ~2000ms until it returns.
+ *
+ * Return a valid pointer on success otherwise PTR_ERR.
+ */
+struct ibtrs_clt *ibtrs_clt_open(void *priv, link_clt_ev_fn *link_ev,
+const char *sessname,
+const struct ibtrs_a

[PATCH v2 23/26] ibnbd: server: sysfs interface functions

2018-05-18 Thread Roman Pen
This is the sysfs interface to IBNBD mapped devices on server side:

  /sys/devices/virtual/ibnbd-server/ctl/devices//
    |- block_dev
    |  *** link pointing to the corresponding block device sysfs entry
    |
    |- sessions//
       *** sessions directory
       |
       |- read_only
       |  *** is the device mapped as read-only
       |
       |- mapping_path
          *** relative device path provided by the client during mapping

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-srv-sysfs.c | 242 ++
 1 file changed, 242 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-sysfs.c

diff --git a/drivers/block/ibnbd/ibnbd-srv-sysfs.c b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
new file mode 100644
index ..5bf77cdb09c8
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
@@ -0,0 +1,242 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-srv.h"
+
+static struct device *ibnbd_dev;
+static struct class *ibnbd_dev_class;
+static struct kobject *ibnbd_devs_kobj;
+
+static struct attribute *ibnbd_srv_default_dev_attrs[] = {
+   NULL,
+};
+
+static struct attribute_group ibnbd_srv_default_dev_attr_group = {
+   .attrs = ibnbd_srv_default_dev_attrs,
+};
+
+static struct kobj_type ktype = {
+	.sysfs_ops	= &kobj_sysfs_ops,
+};
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+			       struct block_device *bdev,
+			       const char *dir_name)
+{
+	struct kobject *bdev_kobj;
+	int ret;
+
+	ret = kobject_init_and_add(&dev->dev_kobj, &ktype,
+				   ibnbd_devs_kobj, dir_name);
+	if (ret)
+		return ret;
+
+	ret = kobject_init_and_add(&dev->dev_sessions_kobj,
+				   &ktype,
+				   &dev->dev_kobj, "sessions");
+	if (ret)
+		goto err;
+
+	ret = sysfs_create_group(&dev->dev_kobj,
+				 &ibnbd_srv_default_dev_attr_group);
+	if (ret)
+		goto err2;
+
+	bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj;
+	ret = sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev");
+	if (ret)
+		goto err3;
+
+	return 0;
+
+err3:
+	sysfs_remove_group(&dev->dev_kobj,
+			   &ibnbd_srv_default_dev_attr_group);
+err2:
+	kobject_del(&dev->dev_sessions_kobj);
+	kobject_put(&dev->dev_sessions_kobj);
+err:
+	kobject_del(&dev->dev_kobj);
+	kobject_put(&dev->dev_kobj);
+	return ret;
+}
+
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev)
+{
+	sysfs_remove_link(&dev->dev_kobj, "block_dev");
+	sysfs_remove_group(&dev->dev_kobj, &ibnbd_srv_default_dev_attr_group);
+	kobject_del(&dev->dev_sessions_kobj);
+	kobject_put(&dev->dev_sessions_kobj);
+	kobject_del(&dev->dev_kobj);
+	kobject_put(&dev->dev_kobj);
+}
+
+static ssize_t ibnbd_srv_dev_session_ro_show(struct kobject *kobj,
+struct kobj_attribute *attr,
+char *page)
+{
+   struct ibnbd_srv_sess_dev *sess_dev;
+
+   sess_dev = container_of(kobj, struct ibnbd_srv_sess_dev, kobj);
+
+   return scnprintf(page, PAGE_S

[PATCH v2 20/26] ibnbd: server: private header with server structs and functions

2018-05-18 Thread Roman Pen
This header describes the main structs and functions used by the
ibnbd-server module, namely structs for managing sessions from different
clients and mapped (opened) devices.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-srv.h | 100 
 1 file changed, 100 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv.h

diff --git a/drivers/block/ibnbd/ibnbd-srv.h b/drivers/block/ibnbd/ibnbd-srv.h
new file mode 100644
index ..191a1650bc1d
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.h
@@ -0,0 +1,100 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_SRV_H
+#define IBNBD_SRV_H
+
+#include 
+#include 
+#include 
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+struct ibnbd_srv_session {
+	/* Entry inside global sess_list */
+	struct list_head	list;
+	struct ibtrs_srv	*ibtrs;
+	char			sessname[NAME_MAX];
+	int			queue_depth;
+	struct bio_set		*sess_bio_set;
+
+	rwlock_t		index_lock ____cacheline_aligned;
+	struct idr		index_idr;
+	/* List of struct ibnbd_srv_sess_dev */
+	struct list_head	sess_dev_list;
+	struct mutex		lock;
+	u8			ver;
+};
+
+struct ibnbd_srv_dev {
+	/* Entry inside global dev_list */
+	struct list_head	list;
+	struct kobject		dev_kobj;
+	struct kobject		dev_sessions_kobj;
+	struct kref		kref;
+	char			id[NAME_MAX];
+	/* List of ibnbd_srv_sess_dev structs */
+	struct list_head	sess_dev_list;
+	struct mutex		lock;
+	int			open_write_cnt;
+	enum ibnbd_io_mode	mode;
+};
+
+/* Structure which binds N devices and N sessions */
+struct ibnbd_srv_sess_dev {
+	/* Entry inside ibnbd_srv_dev struct */
+	struct list_head		dev_list;
+	/* Entry inside ibnbd_srv_session struct */
+	struct list_head		sess_list;
+	struct ibnbd_dev		*ibnbd_dev;
+	struct ibnbd_srv_session	*sess;
+	struct ibnbd_srv_dev		*dev;
+	struct kobject			kobj;
+	struct completion		*sysfs_release_compl;
+	u32				device_id;
+	fmode_t				open_flags;
+	struct kref			kref;
+	struct completion		*destroy_comp;
+	char				pathname[NAME_MAX];
+};
+
+/* ibnbd-srv-sysfs.c */
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+  struct block_device *bdev,
+  const char *dir_name);
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev);
+int ibnbd_srv_create_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+void ibnbd_srv_destroy_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+int ibnbd_srv_create_sysfs_files(void);
+void ibnbd_srv_destroy_sysfs_files(void);
+
+#endif /* IBNBD_SRV_H */
-- 
2.13.1



[PATCH v2 21/26] ibnbd: server: main functionality

2018-05-18 Thread Roman Pen
This is the main functionality of the ibnbd-server module, which handles
IBTRS events and IBNBD protocol requests, like map (open) or unmap (close)
device.  The server side is also responsible for processing incoming IBTRS
IO requests and forwarding them to locally mapped devices.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-srv.c | 922 
 1 file changed, 922 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv.c

diff --git a/drivers/block/ibnbd/ibnbd-srv.c b/drivers/block/ibnbd/ibnbd-srv.c
new file mode 100644
index ..a42a9191dad9
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.c
@@ -0,0 +1,922 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibnbd-srv.h"
+#include "ibnbd-srv-dev.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_VERSION(IBNBD_VER_STRING);
+MODULE_DESCRIPTION("InfiniBand Network Block Device Server");
+MODULE_LICENSE("GPL");
+
+#define DEFAULT_DEV_SEARCH_PATH "/"
+
+static char dev_search_path[PATH_MAX] = DEFAULT_DEV_SEARCH_PATH;
+
+static int dev_search_path_set(const char *val, const struct kernel_param *kp)
+{
+	char *dup;
+	size_t len;
+
+	if (strlen(val) >= sizeof(dev_search_path))
+		return -EINVAL;
+
+	dup = kstrdup(val, GFP_KERNEL);
+	if (!dup)
+		return -ENOMEM;
+
+	/* Strip a single trailing newline; guard against an empty string */
+	len = strlen(dup);
+	if (len && dup[len - 1] == '\n')
+		dup[len - 1] = '\0';
+
+	strlcpy(dev_search_path, dup, sizeof(dev_search_path));
+
+	kfree(dup);
+	pr_info("dev_search_path changed to '%s'\n", dev_search_path);
+
+	return 0;
+}
+
+static struct kparam_string dev_search_path_kparam_str = {
+   .maxlen = sizeof(dev_search_path),
+   .string = dev_search_path
+};
+
+static const struct kernel_param_ops dev_search_path_ops = {
+   .set= dev_search_path_set,
+   .get= param_get_string,
+};
+
+module_param_cb(dev_search_path, &dev_search_path_ops,
+		&dev_search_path_kparam_str, 0444);
+MODULE_PARM_DESC(dev_search_path, "Sets the dev_search_path."
+" When a device is mapped this path is prepended to the"
+" device path from the map device operation.  If %SESSNAME%"
+" is specified in a path, then device will be searched in a"
+" session namespace."
+" (default: " DEFAULT_DEV_SEARCH_PATH ")");
+
+static int def_io_mode = IBNBD_BLOCKIO;
+module_param(def_io_mode, int, 0444);
+MODULE_PARM_DESC(def_io_mode, "By default, export devices in"
+" blockio(" __stringify(_IBNBD_BLOCKIO) ") or"
+" fileio(" __stringify(_IBNBD_FILEIO) ") mode."
+" (default: " __stringify(_IBNBD_BLOCKIO) " (blockio))");
+
+static DEFINE_MUTEX(sess_lock);
+static DEFINE_SPINLOCK(dev_lock);
+
+static LIST_HEAD(sess_list);
+static LIST_HEAD(dev_list);
+
+struct ibnbd_io_private {
+   struct ibtrs_srv_op *id;
+   struct ibnbd_srv_sess_dev   *sess_dev;
+};
+
+static void ibnbd_sess_dev_release(struct kref *kref)
+{
+   struct ibnbd_srv_sess_dev *sess_dev;
+
+   sess_dev = container_of(kref, struct ibnbd_srv_sess_dev, kref);
+   complete(sess_dev->destroy_comp);
+}
+
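
The release callback above only signals a completion instead of freeing the object, which suggests that the destroying context waits until the last reference is dropped. The single-threaded userspace sketch below models that idea with a plain counter and a boolean flag; struct completion_flag, struct sess_dev and the sess_dev_*() helpers are hypothetical stand-ins for kref and struct completion, not the kernel API, and real code needs atomic reference counting and a blocking wait.

```c
#include <stdbool.h>

/* Stand-in for struct completion: set once the last reference is gone */
struct completion_flag {
	bool done;
};

/* Stand-in for ibnbd_srv_sess_dev, reduced to the refcount and completion */
struct sess_dev {
	int refcount;
	struct completion_flag *destroy_comp;
};

/* Called when the last reference is dropped: only signals, does not free */
static void sess_dev_release(struct sess_dev *sd)
{
	sd->destroy_comp->done = true;
}

static void sess_dev_get(struct sess_dev *sd)
{
	sd->refcount++;
}

/* Returns true if this call dropped the last reference */
static bool sess_dev_put(struct sess_dev *sd)
{
	if (--sd->refcount == 0) {
		sess_dev_release(sd);
		return true;
	}
	return false;
}
```

In this model the destroying side would drop its own reference and then wait for destroy_comp before freeing the object, so in-flight users holding a reference can finish first.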

[PATCH v2 26/26] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

2018-05-18 Thread Roman Pen
Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Cc: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 MAINTAINERS | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 92be777d060a..e5a001bd0f05 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6786,6 +6786,20 @@ IBM ServeRAID RAID DRIVER
 S: Orphan
 F: drivers/scsi/ips.*
 
+IBNBD BLOCK DRIVERS
+M: IBNBD/IBTRS Storage Team <ib...@profitbricks.com>
+L: linux-block@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/block/ibnbd/
+
+IBTRS TRANSPORT DRIVERS
+M: IBNBD/IBTRS Storage Team <ib...@profitbricks.com>
+L: linux-r...@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/infiniband/ulp/ibtrs/
+
 ICH LPC AND GPIO DRIVER
 M: Peter Tyser <pty...@xes-inc.com>
 S: Maintained
-- 
2.13.1



[PATCH v2 24/26] ibnbd: include client and server modules into kernel compilation

2018-05-18 Thread Roman Pen
Add IBNBD Makefile, Kconfig and also corresponding lines into upper
block layer files.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/Kconfig|  2 ++
 drivers/block/Makefile   |  1 +
 drivers/block/ibnbd/Kconfig  | 22 ++
 drivers/block/ibnbd/Makefile | 13 +
 4 files changed, 38 insertions(+)
 create mode 100644 drivers/block/ibnbd/Kconfig
 create mode 100644 drivers/block/ibnbd/Makefile

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index ad9b687a236a..d8c1590411c8 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -481,4 +481,6 @@ config BLK_DEV_RSXX
  To compile this driver as a module, choose M here: the
  module will be called rsxx.
 
+source "drivers/block/ibnbd/Kconfig"
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index dc061158b403..65346a1d0b1a 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)+= mtip32xx/
 obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/
 obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk.o
 obj-$(CONFIG_ZRAM) += zram/
+obj-$(CONFIG_BLK_DEV_IBNBD)+= ibnbd/
 
 skd-y  := skd_main.o
 swim_mod-y := swim.o swim_asm.o
diff --git a/drivers/block/ibnbd/Kconfig b/drivers/block/ibnbd/Kconfig
new file mode 100644
index ..b381c6c084d2
--- /dev/null
+++ b/drivers/block/ibnbd/Kconfig
@@ -0,0 +1,22 @@
+config BLK_DEV_IBNBD
+   bool
+
+config BLK_DEV_IBNBD_CLIENT
+   tristate "Network block device driver on top of IBTRS transport"
+   depends on INFINIBAND_IBTRS_CLIENT
+   select BLK_DEV_IBNBD
+   help
+ IBNBD client allows for mapping of remote block devices over
+ IBTRS protocol from a target system where IBNBD server is running.
+
+ If unsure, say N.
+
+config BLK_DEV_IBNBD_SERVER
+   tristate "Network block device over RDMA Infiniband server support"
+   depends on INFINIBAND_IBTRS_SERVER
+   select BLK_DEV_IBNBD
+   help
+ IBNBD server allows for exporting local block devices to a remote
+ client over IBTRS protocol.
+
+ If unsure, say N.
diff --git a/drivers/block/ibnbd/Makefile b/drivers/block/ibnbd/Makefile
new file mode 100644
index ..5f20e72e0633
--- /dev/null
+++ b/drivers/block/ibnbd/Makefile
@@ -0,0 +1,13 @@
+ccflags-y := -Idrivers/infiniband/ulp/ibtrs
+
+ibnbd-client-y := ibnbd-clt.o \
+ ibnbd-clt-sysfs.o
+
+ibnbd-server-y := ibnbd-srv.o \
+ ibnbd-srv-dev.o \
+ ibnbd-srv-sysfs.o
+
+obj-$(CONFIG_BLK_DEV_IBNBD_CLIENT) += ibnbd-client.o
+obj-$(CONFIG_BLK_DEV_IBNBD_SERVER) += ibnbd-server.o
+
+-include $(src)/compat/compat.mk
-- 
2.13.1



[PATCH v2 25/26] ibnbd: a bit of documentation

2018-05-18 Thread Roman Pen
README with description of major sysfs entries.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/README | 299 +
 1 file changed, 299 insertions(+)
 create mode 100644 drivers/block/ibnbd/README

diff --git a/drivers/block/ibnbd/README b/drivers/block/ibnbd/README
new file mode 100644
index ..bbaddd02c1c5
--- /dev/null
+++ b/drivers/block/ibnbd/README
@@ -0,0 +1,299 @@
+***************************************
+InfiniBand Network Block Device (IBNBD)
+***************************************
+
+Introduction
+------------
+
+IBNBD (InfiniBand Network Block Device) is a pair of kernel modules
+(client and server) that allow for remote access of a block device on
+the server over IBTRS protocol using the RDMA (InfiniBand, RoCE, iWarp)
+transport. After being mapped, the remote block devices can be accessed
+on the client side as local block devices.
+
+I/O is transferred between client and server by the IBTRS transport
+modules. The administration of IBNBD and IBTRS modules is done via
+sysfs entries.
+
+Requirements
+------------
+
+  IBTRS kernel modules
+
+Quick Start
+-----------
+
+Server side:
+  # modprobe ibnbd_server
+
+Client side:
+  # modprobe ibnbd_client
+  # echo "sessname=blya path=ip:10.50.100.66 device_path=/dev/ram0" > \
+/sys/devices/virtual/ibnbd-client/ctl/map_device
+
+  Where "sessname=" is a session name, a string to identify the session
+  on client and on server sides; "path=" is a destination IP address or
+  a pair of a source and a destination IPs, separated by comma.  Multiple
+  "path=" options can be specified in order to use multipath (see IBTRS
+  description for details); "device_path=" is the block device to be
+  mapped from the server side. After the session to the server machine is
+  established, the mapped device will appear on the client side under
+  /dev/ibnbd.
+
+
+======================
+Client Sysfs Interface
+======================
+
+All sysfs files that are not read-only provide the usage information on read:
+
+Example:
+  # cat /sys/devices/virtual/ibnbd-client/ctl/map_device
+
+  > Usage: echo "sessname= path=<[srcaddr,]dstaddr>
+  > [path=<[srcaddr,]dstaddr>] device_path=
+  > [access_mode=<ro|rw|migration>]
+  > [io_mode=<fileio|blockio>]" > map_device
+  >
+  > addr ::= [ ip: | ip: | gid: ]
+
+Entries under /sys/devices/virtual/ibnbd-client/ctl/
+====================================================
+
+map_device (RW)
+---------------
+
+Expected format is the following:
+
+sessname=
+path=<[srcaddr,]dstaddr> [path=<[srcaddr,]dstaddr> ...]
+device_path=
+[access_mode=<ro|rw|migration>]
+[io_mode=<fileio|blockio>]
+
+Where:
+
+sessname: accepts a string no longer than 256 chars, which identifies
+  a given session on the client and on the server.
+  E.g. "clt_hostname-srv_hostname" could be a natural choice.
+
+path: describes a connection between the client and the server by
+  specifying destination and, when required, the source address.
+  The addresses are to be provided in the following format:
+
+ip:
+ip:
+gid:
+
+  for example:
+
+  path=ip:10.0.0.66
+ The single addr is treated as the destination.
+ The connection will be established to this
+ server from any client IP address.
+
+  path=ip:10.0.0.66,ip:10.0.1.66
+ First addr is the source address and the second
+ is the destination.
+
+  If multiple "path=" options are specified, multiple connections
+  will be established and data will be sent according to
+  the selected multipath policy (see IBTRS mp_policy sysfs entry
+  description).
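
The "[srcaddr,]dstaddr" rule described above (a single address is the destination; with two addresses the first is the source) can be sketched in userspace C as below. This is only an illustration of the documented format, not the parser from the client module: struct path_addrs and parse_path_value() are hypothetical names, the 64-byte buffers are an arbitrary assumption, and the input is the value after the "path=" prefix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one parsed "path=" value */
struct path_addrs {
	char src[64];	/* empty string when no source address was given */
	char dst[64];
};

/*
 * Split "[srcaddr,]dstaddr" at the comma.
 * Returns 0 on success, -1 on a malformed or over-long value.
 */
static int parse_path_value(const char *val, struct path_addrs *out)
{
	const char *comma = strchr(val, ',');
	size_t src_len = comma ? (size_t)(comma - val) : 0;
	const char *dst = comma ? comma + 1 : val;

	/* Reject empty parts and more than one comma */
	if (*dst == '\0' || (comma && src_len == 0) || strchr(dst, ','))
		return -1;
	if (src_len >= sizeof(out->src) || strlen(dst) >= sizeof(out->dst))
		return -1;

	memcpy(out->src, val, src_len);
	out->src[src_len] = '\0';
	strcpy(out->dst, dst);
	return 0;
}
```

The sketch keeps the address-type prefix ("ip:" or "gid:") as part of each string; validating the address itself is left out.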
+
+device_path: Path to the block device on the server side. Path is specified
+ relative to the directory on server side configured in the
+ 'dev_search_path' module parameter of the ibnbd_server.
+ The ibnbd_server prepends the  received from client
+ with  and tries to open the
+ / block device.  On success,
+ a /dev/ibnbd device file, a /sys/block/ibnbd_client/ibnbd/
+ directory and an entry in /sys/devices/virtual/ibnbd-client/ctl/devices
+ will be created.
+
+ If 'dev_search_path' contains '%SESSNAME%', then each session can
+ have a different device namespace, e.g. server was configured with
+ the following parameter "dev_search_path=/run/ibnbd-devs/%SESSNAME%",
+ client has this string "sessname=blya device_path=sda", then ser

[PATCH v2 17/26] ibnbd: client: private header with client structs and functions

2018-05-18 Thread Roman Pen
This header describes the main structs and functions used by the
ibnbd-client module, mainly for managing IBNBD sessions and mapped block
devices, and for creating and destroying sysfs entries.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-clt.h | 172 
 1 file changed, 172 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt.h

diff --git a/drivers/block/ibnbd/ibnbd-clt.h b/drivers/block/ibnbd/ibnbd-clt.h
new file mode 100644
index ..c5f6f08ec338
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt.h
@@ -0,0 +1,172 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Swapnil Ingle <swapnil.in...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_CLT_H
+#define IBNBD_CLT_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+#define BMAX_SEGMENTS 31
+#define RECONNECT_DELAY 30
+#define MAX_RECONNECTS -1
+
+enum ibnbd_clt_dev_state {
+   DEV_STATE_INIT,
+   DEV_STATE_MAPPED,
+   DEV_STATE_MAPPED_DISCONNECTED,
+   DEV_STATE_UNMAPPED,
+};
+
+struct ibnbd_iu_comp {
+   wait_queue_head_t wait;
+   int errno;
+};
+
+struct ibnbd_iu {
+	union {
+		struct request *rq; /* for block io */
+		void *buf; /* for user messages */
+	};
+	struct ibtrs_tag	*tag;
+	union {
+		/* use to send msg associated with a dev */
+		struct ibnbd_clt_dev *dev;
+		/* use to send msg associated with a sess */
+		struct ibnbd_clt_session *sess;
+	};
+	blk_status_t		status;
+	struct scatterlist	sglist[BMAX_SEGMENTS];
+	struct work_struct	work;
+	int			errno;
+	struct ibnbd_iu_comp	*comp;
+};
+
+struct ibnbd_cpu_qlist {
+	struct list_head	requeue_list;
+	spinlock_t		requeue_lock;
+	unsigned int		cpu;
+};
+
+struct ibnbd_clt_session {
+	struct list_head	list;
+	struct ibtrs_clt	*ibtrs;
+	wait_queue_head_t	ibtrs_waitq;
+	bool			ibtrs_ready;
+	struct ibnbd_cpu_qlist	__percpu
+				*cpu_queues;
+	DECLARE_BITMAP(cpu_queues_bm, NR_CPUS);
+	int __percpu		*cpu_rr; /* per-cpu var for CPU round-robin */
+	atomic_t		busy;
+	int			queue_depth;
+	u32			max_io_size;
+	struct blk_mq_tag_set	tag_set;
+	struct mutex		lock; /* protects state and devs_list */
+	struct list_head	devs_list; /* list of struct ibnbd_clt_dev */
+	refcount_t		refcount;
+	char			sessname[NAME_MAX];
+	u8			ver; /* protocol version */
+};
+
+/**
+ * Submission queues.
+ */
+struct ibnbd_queue {
+	struct list_head	requeue_list;
+	unsigned long		in_list;
+	struct ibnbd_clt_dev	*dev;
+	struct blk_mq_hw_ctx	*hctx;
+};
+
+struct ibnbd_clt_dev {
+	struct ibnbd_clt_session	*sess;
+	struct request_queue		*queue;
+	struct ibnbd_queue		*hw_queues;
+	u32				device_id;
+	/* local Idr index - used to track minor number allocations. */
+	u32				clt_device_id;
+	struct mutex			lock;
+   enum ibnbd_clt_dev_stat

[PATCH v2 14/26] ibtrs: include client and server modules into kernel compilation

2018-05-18 Thread Roman Pen
Add IBTRS Makefile, Kconfig and also corresponding lines into upper
layer infiniband/ulp files.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/Kconfig|  1 +
 drivers/infiniband/ulp/Makefile   |  1 +
 drivers/infiniband/ulp/ibtrs/Kconfig  | 20 
 drivers/infiniband/ulp/ibtrs/Makefile | 15 +++
 4 files changed, 37 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/Kconfig
 create mode 100644 drivers/infiniband/ulp/ibtrs/Makefile

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index ee270e065ba9..787bd286fb08 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -94,6 +94,7 @@ source "drivers/infiniband/ulp/srpt/Kconfig"
 
 source "drivers/infiniband/ulp/iser/Kconfig"
 source "drivers/infiniband/ulp/isert/Kconfig"
+source "drivers/infiniband/ulp/ibtrs/Kconfig"
 
 source "drivers/infiniband/ulp/opa_vnic/Kconfig"
 source "drivers/infiniband/sw/rdmavt/Kconfig"
diff --git a/drivers/infiniband/ulp/Makefile b/drivers/infiniband/ulp/Makefile
index 437813c7b481..1c4f10dc8d49 100644
--- a/drivers/infiniband/ulp/Makefile
+++ b/drivers/infiniband/ulp/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_INFINIBAND_SRPT)   += srpt/
 obj-$(CONFIG_INFINIBAND_ISER)  += iser/
 obj-$(CONFIG_INFINIBAND_ISERT) += isert/
 obj-$(CONFIG_INFINIBAND_OPA_VNIC)  += opa_vnic/
+obj-$(CONFIG_INFINIBAND_IBTRS) += ibtrs/
diff --git a/drivers/infiniband/ulp/ibtrs/Kconfig b/drivers/infiniband/ulp/ibtrs/Kconfig
new file mode 100644
index ..eaeb8f3f6b4e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Kconfig
@@ -0,0 +1,20 @@
+config INFINIBAND_IBTRS
+   tristate
+   depends on INFINIBAND_ADDR_TRANS
+
+config INFINIBAND_IBTRS_CLIENT
+   tristate "IBTRS client module"
+   depends on INFINIBAND_ADDR_TRANS
+   select INFINIBAND_IBTRS
+   help
+ IBTRS client allows for simplified data transfer and connection
+ establishment over RDMA (InfiniBand, RoCE, iWarp). Uses BIO-like
+ READ/WRITE semantics and provides multipath capabilities.
+
+config INFINIBAND_IBTRS_SERVER
+   tristate "IBTRS server module"
+   depends on INFINIBAND_ADDR_TRANS
+   select INFINIBAND_IBTRS
+   help
+ IBTRS server module processing connection and IO requests received
+ from the IBTRS client module.
diff --git a/drivers/infiniband/ulp/ibtrs/Makefile b/drivers/infiniband/ulp/ibtrs/Makefile
new file mode 100644
index ..e6ea858745ad
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Makefile
@@ -0,0 +1,15 @@
+ibtrs-client-y := ibtrs-clt.o \
+ ibtrs-clt-stats.o \
+ ibtrs-clt-sysfs.o
+
+ibtrs-server-y := ibtrs-srv.o \
+ ibtrs-srv-stats.o \
+ ibtrs-srv-sysfs.o
+
+ibtrs-core-y := ibtrs.o
+
+obj-$(CONFIG_INFINIBAND_IBTRS)+= ibtrs-core.o
+obj-$(CONFIG_INFINIBAND_IBTRS_CLIENT) += ibtrs-client.o
+obj-$(CONFIG_INFINIBAND_IBTRS_SERVER) += ibtrs-server.o
+
+-include $(src)/compat/compat.mk
-- 
2.13.1



[PATCH v2 22/26] ibnbd: server: functionality for IO submission to file or block dev

2018-05-18 Thread Roman Pen
This provides helper functions for IO submission to file or block dev.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-srv-dev.c | 410 
 drivers/block/ibnbd/ibnbd-srv-dev.h | 149 +
 2 files changed, 559 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.c
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.h

diff --git a/drivers/block/ibnbd/ibnbd-srv-dev.c b/drivers/block/ibnbd/ibnbd-srv-dev.c
new file mode 100644
index ..a5894849b9d5
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-dev.c
@@ -0,0 +1,410 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibnbd-srv-dev.h"
+#include "ibnbd-log.h"
+
+#define IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS 0
+
+struct ibnbd_dev_file_io_work {
+	struct ibnbd_dev	*dev;
+	void			*priv;
+
+	sector_t		sector;
+	void			*data;
+	size_t			len;
+	size_t			bi_size;
+	enum ibnbd_io_flags	flags;
+
+	struct work_struct	work;
+};
+
+struct ibnbd_dev_blk_io {
+   struct ibnbd_dev *dev;
+   void *priv;
+};
+
+static struct workqueue_struct *fileio_wq;
+
+int ibnbd_dev_init(void)
+{
+	fileio_wq = alloc_workqueue("%s", WQ_UNBOUND,
+				    IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS,
+				    "ibnbd_server_fileio_wq");
+	if (!fileio_wq)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void ibnbd_dev_destroy(void)
+{
+   destroy_workqueue(fileio_wq);
+}
+
+static inline struct block_device *ibnbd_dev_open_bdev(const char *path,
+  fmode_t flags)
+{
+   return blkdev_get_by_path(path, flags, THIS_MODULE);
+}
+
+static int ibnbd_dev_blk_open(struct ibnbd_dev *dev, const char *path,
+ fmode_t flags)
+{
+   dev->bdev = ibnbd_dev_open_bdev(path, flags);
+   return PTR_ERR_OR_ZERO(dev->bdev);
+}
+
+static int ibnbd_dev_vfs_open(struct ibnbd_dev *dev, const char *path,
+			      fmode_t flags)
+{
+	int oflags = O_DSYNC; /* enable write-through */
+
+	if (flags & FMODE_WRITE)
+		oflags |= O_RDWR;
+	else if (flags & FMODE_READ)
+		oflags |= O_RDONLY;
+	else
+		return -EINVAL;
+
+	dev->file = filp_open(path, oflags, 0);
+	return PTR_ERR_OR_ZERO(dev->file);
+}
+
+struct ibnbd_dev *ibnbd_dev_open(const char *path, fmode_t flags,
+enum ibnbd_io_mode mode, struct bio_set *bs,
+ibnbd_dev_io_fn io_cb)
+{
+   struct ibnbd_dev *dev;
+   int ret;
+
+   dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+   if (!dev)
+   return ERR_PTR(-ENOMEM);
+
+   if (mode == IBNBD_BLOCKIO) {
+   dev->blk_open_flags = flags;
+   ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+   if (ret)
+   goto err;
+   } else if (mode == IBNBD_FILEIO) {
+   dev->blk_open_flags = FMODE_READ;
+   ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+   if (ret)
+   goto err;
+
+   ret = i

[PATCH v2 09/26] ibtrs: client: sysfs interface functions

2018-05-18 Thread Roman Pen
This is the sysfs interface to IBTRS sessions on client side:

  /sys/devices/virtual/ibtrs-client//
*** IBTRS session created by ibtrs_clt_open() API call
|
|- max_reconnect_attempts
|  *** number of reconnect attempts for session
|
|- add_path
|  *** adds another connection path into IBTRS session
|
|- paths//
   *** established paths to server in a session
   |
   |- disconnect
   |  *** disconnect path
   |
   |- reconnect
   |  *** reconnect path
   |
   |- remove_path
   |  *** remove current path
   |
   |- state
   |  *** retrieve current path state
   |
   |- hca_port
   |  *** HCA port number
   |
   |- hca_name
   |  *** HCA name
   |
   |- stats/
  *** current path statistics
  |
  |- cpu_migration
  |- rdma
  |- rdma_lat
  |- reconnects
  |- reset_all
  |- sg_entries
  |- wc_completions
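
The max_reconnect_attempts entry above accepts a decimal integer and rejects
values outside a fixed range. A minimal userspace sketch of that kind of
check follows. It is an illustration only, not the kernel handler: the
function name parse_reconnect_attempts() is hypothetical, and the limits are
passed in as parameters because the upper limit is not visible in this
posting.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal integer as it would be written to
 * a sysfs attribute (an optional trailing newline is tolerated) and check
 * it against an inclusive [min, max] range supplied by the caller.
 *
 * Returns 0 and stores the value in *out on success, a negative errno
 * value otherwise.
 */
int parse_reconnect_attempts(const char *buf, int min, int max, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(buf, &end, 10);
    if (end == buf)
        return -EINVAL;     /* no digits at all */
    if (*end == '\n')
        end++;              /* "echo 5 > file" appends a newline */
    if (*end != '\0')
        return -EINVAL;     /* trailing garbage */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -ERANGE;     /* does not fit into an int */
    if (v < min || v > max)
        return -EINVAL;     /* outside the accepted range */

    *out = (int)v;
    return 0;
}
```

A caller would pass the lower limit of -1 used by the patch and whatever
upper limit the driver defines.
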

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c | 482 +
 1 file changed, 482 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
new file mode 100644
index ..c185bbc4fd5c
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
@@ -0,0 +1,482 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-pri.h"
+#include "ibtrs-clt.h"
+#include "ibtrs-log.h"
+
+#define MIN_MAX_RECONN_ATT -1
+#define MAX_MAX_RECONN_ATT 
+
+static struct kobj_type ktype = {
+   .sysfs_ops = &kobj_sysfs_ops,
+};
+
+static ssize_t max_reconnect_attempts_show(struct device *dev,
+  struct device_attribute *attr,
+  char *page)
+{
+   struct ibtrs_clt *clt;
+
+   clt = container_of(dev, struct ibtrs_clt, dev);
+
+   return sprintf(page, "%d\n", ibtrs_clt_get_max_reconnect_attempts(clt));
+}
+
+static ssize_t max_reconnect_attempts_store(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf,
+   size_t count)
+{
+   struct ibtrs_clt *clt;
+   int value;
+   int ret;
+
+   clt = container_of(dev, struct ibtrs_clt, dev);
+
+   ret = kstrtoint(buf, 10, &value);
+   if (unlikely(ret)) {
+       ibtrs_err(clt, "%s: failed to convert string '%s' to int\n",
+             attr->attr.name, buf);
+       return ret;
+   }
+   if (unlikely(value > MAX_MAX_RECONN_ATT ||
+            value < MIN_MAX_RECONN_ATT)) {
+       ibtrs_err(clt, "%s: invalid range"
+             " (provided: '%s', accepted: min: %d, max: %d)\n",
+             attr->attr.name, buf, MIN_MAX_RECONN_ATT,
+             MAX_MAX_RECONN_ATT);
+       return -EINVAL;
+   }
+   ibtrs_clt_set_max_reconnect_attempts(clt, value);
+
+   return count;
+}
+
+static DEVICE_ATTR_RW(max_reconnect_attempts);
+
+static ssize_t mpath_policy_show(struct device *dev,
+struct device_a

[PATCH v2 19/26] ibnbd: client: sysfs interface functions

2018-05-18 Thread Roman Pen
This is the sysfs interface to IBNBD block devices on client side:

  /sys/devices/virtual/ibnbd-client/ctl/
|- map_device
|  *** maps remote device
|
|- devices/
   *** all mapped devices

  /sys/block/ibnbd/ibnbd_client/
|- unmap_device
|  *** unmaps device
|
|- state
|  *** device state
|
|- session
|  *** session name
|
|- mapping_path
   *** path of the dev that was mapped on server
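
The map_device entry takes a single line of space-separated "key=value"
options; the option names (sessname=, path=, device_path=, access_mode=,
io_mode=) come from the token table in this patch. As a rough userspace
sketch, such a line could be assembled and written as shown below. The
helper names build_map_options() and write_ctl_file() are hypothetical, only
the three mandatory options are emitted, and the concrete values a server
accepts are an assumption of the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The option parser splits on spaces, so a value must not contain one. */
static int has_space(const char *s)
{
    return strchr(s, ' ') != NULL;
}

/*
 * Hypothetical helper: build an option line with the three mandatory
 * options. Returns 0 on success, -1 on a bad argument or if the buffer
 * is too small.
 */
int build_map_options(char *buf, size_t len, const char *sessname,
                      const char *path, const char *device_path)
{
    int n;

    if (!buf || !sessname || !path || !device_path)
        return -1;
    if (has_space(sessname) || has_space(path) || has_space(device_path))
        return -1;

    n = snprintf(buf, len, "sessname=%s path=%s device_path=%s",
                 sessname, path, device_path);
    if (n < 0 || (size_t)n >= len)
        return -1;  /* output error or truncated */
    return 0;
}

/* Hypothetical helper: write the option line to a control file. */
int write_ctl_file(const char *ctl_path, const char *options)
{
    FILE *f = fopen(ctl_path, "w");
    int ret = 0;

    if (!f)
        return -1;
    if (fputs(options, f) == EOF)
        ret = -1;
    if (fclose(f) == EOF)
        ret = -1;
    return ret;
}
```

With the modules loaded, the resulting line would be written to
/sys/devices/virtual/ibnbd-client/ctl/map_device.
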

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-clt-sysfs.c | 675 ++
 1 file changed, 675 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt-sysfs.c

diff --git a/drivers/block/ibnbd/ibnbd-clt-sysfs.c b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
new file mode 100644
index ..ca3e59b28c54
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
@@ -0,0 +1,675 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Swapnil Ingle <swapnil.in...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-clt.h"
+
+static struct device *ibnbd_dev;
+static struct class *ibnbd_dev_class;
+static struct kobject *ibnbd_devs_kobj;
+
+enum {
+   IBNBD_OPT_ERR   = 0,
+   IBNBD_OPT_PATH  = 1 << 0,
+   IBNBD_OPT_DEV_PATH  = 1 << 1,
+   IBNBD_OPT_ACCESS_MODE   = 1 << 3,
+   IBNBD_OPT_IO_MODE   = 1 << 5,
+   IBNBD_OPT_SESSNAME  = 1 << 6,
+};
+
+static unsigned int ibnbd_opt_mandatory[] = {
+   IBNBD_OPT_PATH,
+   IBNBD_OPT_DEV_PATH,
+   IBNBD_OPT_SESSNAME,
+};
+
+static const match_table_t ibnbd_opt_tokens = {
+   {   IBNBD_OPT_PATH, "path=%s"   },
+   {   IBNBD_OPT_DEV_PATH, "device_path=%s"},
+   {   IBNBD_OPT_ACCESS_MODE,  "access_mode=%s"},
+   {   IBNBD_OPT_IO_MODE,  "io_mode=%s"},
+   {   IBNBD_OPT_SESSNAME, "sessname=%s"   },
+   {   IBNBD_OPT_ERR,  NULL},
+};
+
+/* remove new line from string */
+static void strip(char *s)
+{
+   char *p = s;
+
+   while (*s != '\0') {
+       if (*s != '\n')
+           *p++ = *s++;
+       else
+           ++s;
+   }
+   *p = '\0';
+}
+
+static int ibnbd_clt_parse_map_options(const char *buf,
+  char *sessname,
+  struct ibtrs_addr *paths,
+  size_t *path_cnt,
+  size_t max_path_cnt,
+  char *pathname,
+  enum ibnbd_access_mode *access_mode,
+  enum ibnbd_io_mode *io_mode)
+{
+   char *options, *sep_opt;
+   char *p;
+   substring_t args[MAX_OPT_ARGS];
+   int opt_mask = 0;
+   int token;
+   int ret = -EINVAL;
+   int i;
+   int p_cnt = 0;
+
+   options = kstrdup(buf, GFP_KERNEL);
+   if (!options)
+       return -ENOMEM;
+
+   sep_opt = strstrip(options);
+   strip(sep_opt);
+   while ((p = strsep(&sep_opt, " ")) != NULL) {
+   if (!*p)
+  

[PATCH v2 15/26] ibtrs: a bit of documentation

2018-05-18 Thread Roman Pen
README with description of major sysfs entries.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/README | 358 
 1 file changed, 358 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/README

diff --git a/drivers/infiniband/ulp/ibtrs/README b/drivers/infiniband/ulp/ibtrs/README
new file mode 100644
index ..010a93b02d9c
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/README
@@ -0,0 +1,358 @@
+
+InfiniBand Transport (IBTRS)
+
+
+IBTRS (InfiniBand Transport) is a reliable high speed transport library
+which provides support to establish optimal number of connections
+between client and server machines using RDMA (InfiniBand, RoCE, iWarp)
+transport. It is optimized to transfer (read/write) IO blocks.
+
+In its core interface it follows the BIO semantics of providing the
+possibility to either write data from an sg list to the remote side
+or to request ("read") data transfer from the remote side into a given
+sg list.
+
+IBTRS provides I/O fail-over and load-balancing capabilities by using
+multipath I/O (see "add_path" and "mp_policy" configuration entries).
+
+IBTRS is used by the IBNBD (Infiniband Network Block Device) modules.
+
+==
+Client Sysfs Interface
+==
+
+This chapter describes only the most important files of sysfs interface
+on client side.
+
+Entries under /sys/devices/virtual/ibtrs-client/
+
+
+When a user of IBTRS API creates a new session, a directory entry with
+the name of that session is created.
+
+Entries under /sys/devices/virtual/ibtrs-client//
+===
+
+add_path (RW)
+-
+
+Adds a new path (connection) to an existing session. Expected format is the
+following:
+
+  <[source addr,]destination addr>
+
+  *addr ::= [ ip:<ipv4|ipv6> | gid: ]
+
+max_reconnect_attempts (RW)
+---
+
+Maximum number of reconnect attempts the client should make before giving
+up after a connection breaks unexpectedly.
+
+mp_policy (RW)
+--
+
+Multipath policy specifies which path should be selected on each IO:
+
+   round-robin (0):
+   select path in per CPU round-robin manner.
+
+   min-inflight (1):
+   select path with minimum inflights.
+
+Entries under /sys/devices/virtual/ibtrs-client//paths/
+=
+
+
+Each path belonging to a given session is listed here by its destination
+address. When a new path is added to a session by writing to the "add_path"
+entry, a directory with the corresponding destination address is created.
+
+Entries under /sys/devices/virtual/ibtrs-client//paths//
+=
+
+state (R)
+-
+
+Contains "connected" if the session is connected to the peer and fully
+functional.  Otherwise the file contains "disconnected".
+
+reconnect (RW)
+--
+
+Write "1" to the file in order to reconnect the path.
+Operation is blocking and returns 0 if reconnect was successful.
+
+disconnect (RW)
+---
+
+Write "1" to the file in order to disconnect the path.
+Operation blocks until IBTRS path is disconnected.
+
+remove_path (RW)
+
+
+Write "1" to the file in order to disconnect and remove the path
+from the session.  Operation blocks until the path is disconnected
+and removed from the session.
+
+Entries under /sys/devices/virtual/ibtrs-client//paths//stats/
+===
+
+Write "0" to any file in that directory to reset corresponding statistics.
+
+reset_all (RW)
+--
+
+Read will return usage help, write 0 will clear all the statistics.
+
+sg_entries (RW)
+---
+
+Data to be transferred via RDMA is passed to IBTRS as a scatter-gather
+list. A scatter-gather list can contain multiple entries.
+Scatter-gather lists with fewer entries require less processing power
+and can therefore be transferred faster. The file sg_entries outputs a
+per-CPU distribution table for the number of entries in the
+scatter-gather lists that were passed to the IBTRS API function
+ibtrs_clt_request (READ or WRITE).
+
+cpu_migration (RW)
+--
+
+IBTRS expects that each HCA IRQ is pinned to a separate CPU. If that is
+not the case, an I/O response could be processed on a different CPU
+than the one on which it was originally submitted.  This file shows
+how many interrupts were generated on an unexpected CPU.
+"from:" is the

[PATCH v2 12/26] ibtrs: server: statistics functions

2018-05-18 Thread Roman Pen
This introduces a set of functions used on the server side to account
statistics of RDMA data sent/received.
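
The accounting scheme in this patch keeps one request counter and one byte
counter per transfer direction and prints them as a single line. A minimal
userspace sketch of the same idea follows, using C11 atomics in place of the
kernel's atomic64_t. All names here (rdma_stats, stats_update() and so on)
are hypothetical, and the reset is done with per-field atomic stores rather
than the memset() used in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum { DIR_READ = 0, DIR_WRITE = 1, DIR_COUNT = 2 };

/* One pair of counters per transfer direction. */
struct rdma_dir_stats {
    atomic_llong cnt;           /* number of requests */
    atomic_llong size_total;    /* total bytes transferred */
};

struct rdma_stats {
    struct rdma_dir_stats dir[DIR_COUNT];
};

/* Initialize all counters to zero before first use. */
void stats_init(struct rdma_stats *s)
{
    for (int d = 0; d < DIR_COUNT; d++) {
        atomic_init(&s->dir[d].cnt, 0);
        atomic_init(&s->dir[d].size_total, 0);
    }
}

/* Account one request of 'size' bytes in direction 'd'. */
void stats_update(struct rdma_stats *s, size_t size, int d)
{
    atomic_fetch_add(&s->dir[d].cnt, 1);
    atomic_fetch_add(&s->dir[d].size_total, (long long)size);
}

/* Clear all counters while other threads may still be updating them. */
void stats_reset(struct rdma_stats *s)
{
    for (int d = 0; d < DIR_COUNT; d++) {
        atomic_store(&s->dir[d].cnt, 0);
        atomic_store(&s->dir[d].size_total, 0);
    }
}

/* Format as "<read cnt> <read bytes> <write cnt> <write bytes>\n". */
int stats_to_str(struct rdma_stats *s, char *page, size_t len)
{
    return snprintf(page, len, "%lld %lld %lld %lld\n",
                    atomic_load(&s->dir[DIR_READ].cnt),
                    atomic_load(&s->dir[DIR_READ].size_total),
                    atomic_load(&s->dir[DIR_WRITE].cnt),
                    atomic_load(&s->dir[DIR_WRITE].size_total));
}
```

Each counter is updated atomically on its own, so a reader may observe a
request count and a byte total that belong to slightly different moments;
for statistics output that is usually acceptable.
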

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c | 110 +
 1 file changed, 110 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
new file mode 100644
index ..5933cfc03f95
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
@@ -0,0 +1,110 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-srv.h"
+
+void ibtrs_srv_update_rdma_stats(struct ibtrs_srv_stats *s,
+size_t size, int d)
+{
+   atomic64_inc(&s->rdma_stats.dir[d].cnt);
+   atomic64_add(size, &s->rdma_stats.dir[d].size_total);
+}
+
+void ibtrs_srv_update_wc_stats(struct ibtrs_srv_stats *s)
+{
+   atomic64_inc(&s->wc_comp.calls);
+   atomic64_inc(&s->wc_comp.total_wc_cnt);
+}
+
+int ibtrs_srv_reset_rdma_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+   if (enable) {
+       struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+
+       memset(r, 0, sizeof(*r));
+       return 0;
+   }
+
+   return -EINVAL;
+}
+
+ssize_t ibtrs_srv_stats_rdma_to_str(struct ibtrs_srv_stats *stats,
+   char *page, size_t len)
+{
+   struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+   struct ibtrs_srv_sess *sess;
+
+   sess = container_of(stats, typeof(*sess), stats);
+
+   return scnprintf(page, len, "%lld %lld %lld %lld %u\n",
+            (s64)atomic64_read(&r->dir[READ].cnt),
+            (s64)atomic64_read(&r->dir[READ].size_total),
+            (s64)atomic64_read(&r->dir[WRITE].cnt),
+            (s64)atomic64_read(&r->dir[WRITE].size_total),
+            atomic_read(&sess->ids_inflight));
+}
+
+int ibtrs_srv_reset_wc_completion_stats(struct ibtrs_srv_stats *stats,
+   bool enable)
+{
+   if (enable) {
+       memset(&stats->wc_comp, 0, sizeof(stats->wc_comp));
+       return 0;
+   }
+
+   return -EINVAL;
+}
+
+int ibtrs_srv_stats_wc_completion_to_str(struct ibtrs_srv_stats *stats,
+char *buf, size_t len)
+{
+   return snprintf(buf, len, "%lld %lld\n",
+           (s64)atomic64_read(&stats->wc_comp.total_wc_cnt),
+           (s64)atomic64_read(&stats->wc_comp.calls));
+}
+
+ssize_t ibtrs_srv_reset_all_help(struct ibtrs_srv_stats *stats,
+char *page, size_t len)
+{
+   return scnprintf(page, PAGE_SIZE, "echo 1 to reset all statistics\n");
+}
+
+int ibtrs_srv_reset_all_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+   if (enable) {
+       ibtrs_srv_reset_wc_completion_stats(stats, enable);
+       ibtrs_srv_reset_rdma_stats(stats, enable);
+       return 0;
+   }
+
+   return -EINVAL;
+}
-- 
2.13.1



[PATCH v2 16/26] ibnbd: private headers with IBNBD protocol structs and helpers

2018-05-18 Thread Roman Pen
These are common private headers with IBNBD protocol structures,
logging, sysfs and other helper functions, which are used on
both client and server sides.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-log.h   |  71 
 drivers/block/ibnbd/ibnbd-proto.h | 364 ++
 2 files changed, 435 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-log.h
 create mode 100644 drivers/block/ibnbd/ibnbd-proto.h

diff --git a/drivers/block/ibnbd/ibnbd-log.h b/drivers/block/ibnbd/ibnbd-log.h
new file mode 100644
index ..489343a61171
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-log.h
@@ -0,0 +1,71 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_LOG_H
+#define IBNBD_LOG_H
+
+#include "ibnbd-clt.h"
+#include "ibnbd-srv.h"
+
+#define ibnbd_diskname(dev) ({ \
+   struct gendisk *gd = ((struct ibnbd_clt_dev *)dev)->gd; \
+   gd ? gd->disk_name : "";\
+})
+
+void unknown_type(void);
+
+#define ibnbd_log(fn, dev, fmt, ...) ({ \
+   __builtin_choose_expr( \
+       __builtin_types_compatible_p( \
+           typeof(dev), struct ibnbd_clt_dev *), \
+       fn("<%s@%s> %s: " fmt, (dev)->pathname, \
+          (dev)->sess->sessname, ibnbd_diskname(dev), \
+          ##__VA_ARGS__), \
+       __builtin_choose_expr( \
+           __builtin_types_compatible_p(typeof(dev), \
+               struct ibnbd_srv_sess_dev *), \
+           fn("<%s@%s>: " fmt, (dev)->pathname, \
+              (dev)->sess->sessname, ##__VA_ARGS__), \
+           unknown_type())); \
+})
+
+#define ibnbd_err(dev, fmt, ...)   \
+   ibnbd_log(pr_err, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_err_rl(dev, fmt, ...)\
+   ibnbd_log(pr_err_ratelimited, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_wrn(dev, fmt, ...)   \
+   ibnbd_log(pr_warn, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_wrn_rl(dev, fmt, ...) \
+   ibnbd_log(pr_warn_ratelimited, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_info(dev, fmt, ...) \
+   ibnbd_log(pr_info, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_info_rl(dev, fmt, ...) \
+   ibnbd_log(pr_info_ratelimited, dev, fmt, ##__VA_ARGS__)
+
+#endif /* IBNBD_LOG_H */
diff --git a/drivers/block/ibnbd/ibnbd-proto.h b/drivers/block/ibnbd/ibnbd-proto.h
new file mode 100644
index ..050d3fa4c1bf
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-proto.h
@@ -0,0 +1,364 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...

[PATCH v2 11/26] ibtrs: server: main functionality

2018-05-18 Thread Roman Pen
This is the main functionality of the ibtrs-server module, which accepts
a set of RDMA connections (a so-called IBTRS session), creates/destroys
sysfs entries associated with the IBTRS session and notifies the upper layer
(user of the IBTRS API) about RDMA requests or link events.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c | 1981 ++
 1 file changed, 1981 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
new file mode 100644
index ..d57fa6af5a5c
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
@@ -0,0 +1,1981 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Swapnil Ingle <swapnil.in...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-srv.h"
+#include "ibtrs-log.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Server");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+/* Must be power of 2, see mask from mr->page_size in ib_sg_to_pages() */
+#define DEFAULT_MAX_CHUNK_SIZE (128 << 10)
+#define DEFAULT_SESS_QUEUE_DEPTH 512
+#define MAX_HDR_SIZE PAGE_SIZE
+#define MAX_SG_COUNT ((MAX_HDR_SIZE - sizeof(struct ibtrs_msg_rdma_read)) \
+ / sizeof(struct ibtrs_sg_desc))
+
+/* We guarantee to serve 10 paths at least */
+#define CHUNK_POOL_SZ 10
+
+static struct ibtrs_ib_dev_pool dev_pool;
+static mempool_t *chunk_pool;
+struct class *ibtrs_dev_class;
+
+static int retry_count = 7;
+static int __read_mostly max_chunk_size = DEFAULT_MAX_CHUNK_SIZE;
+static int __read_mostly sess_queue_depth = DEFAULT_SESS_QUEUE_DEPTH;
+
+module_param_named(max_chunk_size, max_chunk_size, int, 0444);
+MODULE_PARM_DESC(max_chunk_size,
+"Max size for each IO request, when change the unit is in byte"
+" (default: " __stringify(DEFAULT_MAX_CHUNK_SIZE_KB) "KB)");
+
+module_param_named(sess_queue_depth, sess_queue_depth, int, 0444);
+MODULE_PARM_DESC(sess_queue_depth,
+"Number of buffers for pending I/O requests to allocate"
+" per session. Maximum: " __stringify(MAX_SESS_QUEUE_DEPTH)
+" (default: " __stringify(DEFAULT_SESS_QUEUE_DEPTH) ")");
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+   int err, ival;
+
+   err = kstrtoint(val, 0, &ival);
+   if (err)
+       return err;
+
+   if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT) {
+       pr_err("Invalid retry count value %d, has to be"
+              " > %d, < %d\n", ival, MIN_RTR_CNT, MAX_RTR_CNT);
+       return -EINVAL;
+   }
+
+   retry_count = ival;
+   pr_info("QP retry count changed to %d\n", ival);
+
+   return 0;
+}
+
+static const struct kernel_param_ops retry_count_ops = {
+   .set= retry_count_set,
+   .get= param_get_int,
+};
+module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644);
+
+MODULE_PARM_DESC(retry_count, "Number of times to send the message if the"
+" remote side didn't respond with Ack or Nack (default: 7,"
+" min: " __stringify(MIN_RTR

[PATCH v2 13/26] ibtrs: server: sysfs interface functions

2018-05-18 Thread Roman Pen
This is the sysfs interface to IBTRS sessions on server side:

  /sys/devices/virtual/ibtrs-server//
*** IBTRS session accepted from a client peer
|
|- paths//
   *** established paths from a client in a session
   |
   |- disconnect
   |  *** disconnect path
   |
   |- hca_name
   |  *** HCA name
   |
   |- hca_port
   |  *** HCA port
   |
   |- stats/
  *** current path statistics
  |
  |- rdma
  |- reset_all
  |- wc_completions

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c | 271 +
 1 file changed, 271 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
new file mode 100644
index ..96d9d9f08e0e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
@@ -0,0 +1,271 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-pri.h"
+#include "ibtrs-srv.h"
+#include "ibtrs-log.h"
+
+extern struct class *ibtrs_dev_class;
+
+static struct kobj_type ktype = {
+   .sysfs_ops  = &kobj_sysfs_ops,
+};
+
+static ssize_t ibtrs_srv_disconnect_show(struct kobject *kobj,
+struct kobj_attribute *attr,
+char *page)
+{
+   return scnprintf(page, PAGE_SIZE, "Usage: echo 1 > %s\n",
+attr->attr.name);
+}
+
+static ssize_t ibtrs_srv_disconnect_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct ibtrs_srv_sess *sess;
+   char str[MAXHOSTNAMELEN];
+
+   sess = container_of(kobj, struct ibtrs_srv_sess, kobj);
+   if (!sysfs_streq(buf, "1")) {
+       ibtrs_err(sess, "%s: invalid value: '%s'\n",
+             attr->attr.name, buf);
+       return -EINVAL;
+   }
+
+   sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr, str, sizeof(str));
+
+   ibtrs_info(sess, "disconnect for path %s requested\n", str);
+   ibtrs_srv_queue_close(sess);
+
+   return count;
+}
+
+static struct kobj_attribute ibtrs_srv_disconnect_attr =
+   __ATTR(disconnect, 0644,
+  ibtrs_srv_disconnect_show, ibtrs_srv_disconnect_store);
+
+static ssize_t ibtrs_srv_hca_port_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *page)
+{
+   struct ibtrs_srv_sess *sess;
+   struct ibtrs_con *usr_con;
+
+   sess = container_of(kobj, typeof(*sess), kobj);
+   usr_con = sess->s.con[0];
+
+   return scnprintf(page, PAGE_SIZE, "%u\n",
+usr_con->cm_id->port_num);
+}
+
+static struct kobj_attribute ibtrs_srv_hca_port_attr =
+   __ATTR(hca_port, 0444, ibtrs_srv_hca_port_show, NULL);
+
+static ssize_t ibtrs_srv_hca_name_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *page)
+{
+   struct ibtrs_srv_sess *sess;
+
+   sess = container_of(kobj, struct ibtrs_srv_sess, kobj);
+

[PATCH v2 07/26] ibtrs: client: main functionality

2018-05-18 Thread Roman Pen
This is the main functionality of the ibtrs-client module, which manages
a set of RDMA connections for each IBTRS session and does multipathing,
load balancing and failover of RDMA requests.
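
The IBTRS README in this series names two multipath policies: round-robin
and min-inflight (pick the path with the fewest requests in flight). A
minimal userspace sketch of both selection rules is given below. The struct
and function names are hypothetical, and the round-robin cursor is a single
caller-owned variable here, whereas the README describes the real policy as
per-CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-path state used only for this illustration. */
struct path {
    bool connected;     /* only connected paths may carry I/O */
    int inflight;       /* requests currently in flight */
};

/*
 * Round-robin: return the index of the next connected path after
 * *cursor and advance the cursor, or -1 if no path is connected.
 */
int pick_round_robin(const struct path *paths, size_t n, size_t *cursor)
{
    for (size_t i = 0; i < n; i++) {
        size_t idx = (*cursor + 1 + i) % n;

        if (paths[idx].connected) {
            *cursor = idx;
            return (int)idx;
        }
    }
    return -1;
}

/*
 * Min-inflight: return the index of the connected path with the fewest
 * requests in flight, or -1 if no path is connected. Ties go to the
 * lowest index.
 */
int pick_min_inflight(const struct path *paths, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!paths[i].connected)
            continue;
        if (best < 0 || paths[i].inflight < paths[best].inflight)
            best = (int)i;
    }
    return best;
}
```

Skipping disconnected paths in both functions is what gives failover in this
sketch: a request is simply steered to whichever connected path remains.
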

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c | 2818 ++
 1 file changed, 2818 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
new file mode 100644
index ..0983f0939b19
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
@@ -0,0 +1,2818 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Swapnil Ingle <swapnil.in...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-clt.h"
+#include "ibtrs-log.h"
+
+#define MAX_SEGMENTS 31
+#define IBTRS_CONNECT_TIMEOUT_MS 5000
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Client");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+static ushort nr_cons_per_session;
+module_param(nr_cons_per_session, ushort, 0444);
+MODULE_PARM_DESC(nr_cons_per_session, "Number of connections per session."
+" (default: nr_cpu_ids)");
+
+static int retry_cnt = 7;
+module_param_named(retry_cnt, retry_cnt, int, 0644);
+MODULE_PARM_DESC(retry_cnt, "Number of times to send the message if the"
+" remote side didn't respond with Ack or Nack (default: 7,"
+" min: " __stringify(MIN_RTR_CNT) ", max: "
+__stringify(MAX_RTR_CNT) ")");
+
+static int __read_mostly noreg_cnt = 0;
+module_param_named(noreg_cnt, noreg_cnt, int, 0444);
+MODULE_PARM_DESC(noreg_cnt, "Max number of SG entries when MR registration "
+"does not happen (default: 0)");
+
+static const struct ibtrs_ib_dev_pool_ops dev_pool_ops;
+static struct ibtrs_ib_dev_pool dev_pool = {
+   .ops = &dev_pool_ops
+};
+static struct workqueue_struct *ibtrs_wq;
+static struct class *ibtrs_dev_class;
+
+static void ibtrs_rdma_error_recovery(struct ibtrs_clt_con *con);
+static int ibtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id,
+struct rdma_cm_event *ev);
+static void ibtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc);
+static void complete_rdma_req(struct ibtrs_clt_io_req *req, int errno,
+ bool notify, bool can_wait);
+static int ibtrs_clt_write_req(struct ibtrs_clt_io_req *req);
+static int ibtrs_clt_read_req(struct ibtrs_clt_io_req *req);
+
+bool ibtrs_clt_sess_is_connected(const struct ibtrs_clt_sess *sess)
+{
+   return sess->state == IBTRS_CLT_CONNECTED;
+}
+
+static inline bool ibtrs_clt_is_connected(const struct ibtrs_clt *clt)
+{
+   struct ibtrs_clt_sess *sess;
+   bool connected = false;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(sess, &clt->paths_list, s.entry)
+       connected |= ibtrs_clt_sess_is_connected(sess);
+   rcu_read_unlock();
+
+   return connected;
+}
+
+static inline struct ibtrs_tag *
+__ibtrs_get_tag(struct ibtrs_clt *clt, enum ibtrs_clt_con_type con_type)
+{
+   size_t max_depth = clt->queue_depth;
+   struct ibtrs_tag *tag;
+   int cpu, bit;
+
+   cpu = get_cpu();
+   do {
+   bit 

[PATCH v2 10/26] ibtrs: server: private header with server structs and functions

2018-05-18 Thread Roman Pen
This header describes the main structs and functions used by the
ibtrs-server module, mainly for accepting IBTRS sessions, creating and
destroying sysfs entries, and accounting statistics on the server side.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h | 175 +++
 1 file changed, 175 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
new file mode 100644
index ..8193d568e67e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
@@ -0,0 +1,175 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Swapnil Ingle <swapnil.in...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_SRV_H
+#define IBTRS_SRV_H
+
+#include 
+#include 
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_srv_state - Server states.
+ */
+enum ibtrs_srv_state {
+   IBTRS_SRV_CONNECTING,
+   IBTRS_SRV_CONNECTED,
+   IBTRS_SRV_CLOSING,
+   IBTRS_SRV_CLOSED,
+};
+
+static inline const char *ibtrs_srv_state_str(enum ibtrs_srv_state state)
+{
+   switch (state) {
+   case IBTRS_SRV_CONNECTING:
+       return "IBTRS_SRV_CONNECTING";
+   case IBTRS_SRV_CONNECTED:
+       return "IBTRS_SRV_CONNECTED";
+   case IBTRS_SRV_CLOSING:
+       return "IBTRS_SRV_CLOSING";
+   case IBTRS_SRV_CLOSED:
+       return "IBTRS_SRV_CLOSED";
+   default:
+       return "UNKNOWN";
+   }
+}
+
+struct ibtrs_stats_wc_comp {
+   atomic64_t  calls;
+   atomic64_t  total_wc_cnt;
+};
+
+struct ibtrs_srv_stats_rdma_stats {
+   struct {
+       atomic64_t  cnt;
+       atomic64_t  size_total;
+   } dir[2];
+};
+
+struct ibtrs_srv_stats {
+   struct ibtrs_srv_stats_rdma_stats   rdma_stats;
+   atomic_t                            apm_cnt;
+   struct ibtrs_stats_wc_comp          wc_comp;
+};
+
+struct ibtrs_srv_con {
+   struct ibtrs_con    c;
+   atomic_t            wr_cnt;
+};
+
+struct ibtrs_srv_op {
+   struct ibtrs_srv_con*con;
+   u32 msg_id;
+   u8  dir;
+   struct ibtrs_msg_rdma_read  *rd_msg;
+   struct ib_rdma_wr   *tx_wr;
+   struct ib_sge   *tx_sg;
+};
+
+struct ibtrs_srv_mr {
+   struct ib_mr*mr;
+   struct sg_table sgt;
+};
+
+struct ibtrs_srv_sess {
+   struct ibtrs_sess       s;
+   struct ibtrs_srv        *srv;
+   struct work_struct      close_work;
+   enum ibtrs_srv_state    state;
+   spinlock_t              state_lock;
+   int                     cur_cq_vector;
+   struct ibtrs_srv_op     **ops_ids;
+   atomic_t                ids_inflight;
+   wait_queue_head_t       ids_waitq;
+   struct ibtrs_srv_mr     *mrs;
+   unsigned int            mrs_num;
+   dma_addr_t              *dma_addr;
+   bool                    established;
+   unsigned int            mem_bits;
+   struct kobject          kobj;
+   struct kobject          kobj_stats;
+   struct ibtrs_srv_stats  stats;
+};
+
+struct ibtrs_srv {
+   struct list_head        paths_list;
+   int                     paths_up;
+   struct mutex            paths_ev_mutex;
+   size_t                  paths_num;
+   struct mutex            paths_mutex;
+   uu

[PATCH v2 08/26] ibtrs: client: statistics functions

2018-05-18 Thread Roman Pen
This introduces a set of functions used on the client side to account
statistics on RDMA data sent and received, the number of IOs in flight,
latency, CPU migrations, etc.  Almost all statistics are collected
using per-CPU variables.
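
As a rough userspace illustration of the accounting scheme described
above (not the driver code itself), the sketch below maps a latency in
milliseconds to a log2 bucket with the same arithmetic as
ibtrs_clt_ms_to_id() in this patch, keeps one counter array per CPU,
and sums the arrays only when a reader asks for them. The values of
MIN_LOG_LAT and LOG_LAT_SZ, the NR_FAKE_CPUS constant and all helper
names are assumptions made for this example; the real constants are
defined in ibtrs-clt.h, which is not shown here.

```c
#include <stddef.h>

/* Assumed values for illustration; the real ones live in ibtrs-clt.h. */
#define MIN_LOG_LAT  0
#define LOG_LAT_SZ   10
#define NR_FAKE_CPUS 4   /* hypothetical stand-in for the set of possible CPUs */

/* Userspace stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static int ilog2_ul(unsigned long v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Same arithmetic as ibtrs_clt_ms_to_id(): 0 ms -> bucket 0, then log2 steps. */
static int ms_to_id(unsigned long ms)
{
	int id = ms ? ilog2_ul(ms) - MIN_LOG_LAT + 1 : 0;

	return clamp_int(id, 0, LOG_LAT_SZ - 1);
}

/* One private counter array per CPU, so the hot path needs no locking. */
struct lat_pcpu {
	unsigned long read[LOG_LAT_SZ];
};

static struct lat_pcpu pcpu_lat[NR_FAKE_CPUS];

/* Hot path: touch only the counters of the CPU we are running on. */
static void account_read_lat(int cpu, unsigned long ms)
{
	pcpu_lat[cpu].read[ms_to_id(ms)]++;
}

/* Slow path (e.g. a sysfs read): fold all per-CPU counters together. */
static unsigned long sum_read_lat(int id)
{
	unsigned long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
		sum += pcpu_lat[cpu].read[id];
	return sum;
}
```

With these assumed constants, calling account_read_lat(0, 3) and
account_read_lat(2, 2) lands both samples in the same bucket, and
sum_read_lat() for that bucket folds the two CPUs together. The point
of the split is that updates stay CPU-local and cheap, while the cost
of aggregation is paid only by the rare reader.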

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c | 455 +
 1 file changed, 455 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
new file mode 100644
index ..af2ed05d2900
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
@@ -0,0 +1,455 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-clt.h"
+
+static inline int ibtrs_clt_ms_to_id(unsigned long ms)
+{
+   int id = ms ? ilog2(ms) - MIN_LOG_LAT + 1 : 0;
+
+   return clamp(id, 0, LOG_LAT_SZ - 1);
+}
+
+void ibtrs_clt_update_rdma_lat(struct ibtrs_clt_stats *stats, bool read,
+  unsigned long ms)
+{
+   struct ibtrs_clt_stats_pcpu *s;
+   int id;
+
+   id = ibtrs_clt_ms_to_id(ms);
+   s = this_cpu_ptr(stats->pcpu_stats);
+   if (read) {
+       s->rdma_lat_distr[id].read++;
+       if (s->rdma_lat_max.read < ms)
+           s->rdma_lat_max.read = ms;
+   } else {
+       s->rdma_lat_distr[id].write++;
+       if (s->rdma_lat_max.write < ms)
+           s->rdma_lat_max.write = ms;
+   }
+}
+
+void ibtrs_clt_decrease_inflight(struct ibtrs_clt_stats *stats)
+{
+   atomic_dec(&stats->inflight);
+}
+
+void ibtrs_clt_update_wc_stats(struct ibtrs_clt_con *con)
+{
+   struct ibtrs_clt_sess *sess = to_clt_sess(con->c.sess);
+   struct ibtrs_clt_stats *stats = &sess->stats;
+   struct ibtrs_clt_stats_pcpu *s;
+   int cpu;
+
+   cpu = raw_smp_processor_id();
+   s = this_cpu_ptr(stats->pcpu_stats);
+   s->wc_comp.cnt++;
+   s->wc_comp.total_cnt++;
+   if (unlikely(con->cpu != cpu)) {
+       s->cpu_migr.to++;
+
+       /* Careful here, override s pointer */
+       s = per_cpu_ptr(stats->pcpu_stats, con->cpu);
+       atomic_inc(&s->cpu_migr.from);
+   }
+}
+
+void ibtrs_clt_inc_failover_cnt(struct ibtrs_clt_stats *stats)
+{
+   struct ibtrs_clt_stats_pcpu *s;
+
+   s = this_cpu_ptr(stats->pcpu_stats);
+   s->rdma.failover_cnt++;
+}
+
+static inline u32 ibtrs_clt_stats_get_avg_wc_cnt(struct ibtrs_clt_stats *stats)
+{
+   u32 cnt = 0;
+   u64 sum = 0;
+   int cpu;
+
+   for_each_possible_cpu(cpu) {
+       struct ibtrs_clt_stats_pcpu *s;
+
+       s = per_cpu_ptr(stats->pcpu_stats, cpu);
+       sum += s->wc_comp.total_cnt;
+       cnt += s->wc_comp.cnt;
+   }
+
+   return cnt ? sum / cnt : 0;
+}
+
+int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_clt_stats *stats,
+char *buf, size_t len)
+{
+   return scnprintf(buf, len, "%u\n",
+ibtrs_clt_stats_get_avg_wc_cnt(stats));
+}
+
+ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_clt_stats *stats,
+ char *page, size_t len)
+{
+   struct ibtrs_clt_stats_rdma_lat res[LOG

[PATCH 15/24] ibnbd: client: private header with client structs and functions

2018-02-02 Thread Roman Pen
This header describes the main structs and functions used by the
ibnbd-client module, mainly for managing IBNBD sessions and mapped
block devices, and for creating and destroying sysfs entries.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-clt.h | 193 
 1 file changed, 193 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-clt.h b/drivers/block/ibnbd/ibnbd-clt.h
new file mode 100644
index ..b3d72b2962dd
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt.h
@@ -0,0 +1,193 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_CLT_H
+#define IBNBD_CLT_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+#define BMAX_SEGMENTS 31
+#define RECONNECT_DELAY 30
+#define MAX_RECONNECTS -1
+
+enum ibnbd_clt_dev_state {
+   DEV_STATE_INIT,
+   DEV_STATE_MAPPED,
+   DEV_STATE_MAPPED_DISCONNECTED,
+   DEV_STATE_UNMAPPED,
+};
+
+enum ibnbd_queue_mode {
+   BLK_MQ,
+   BLK_RQ
+};
+
+struct ibnbd_iu_comp {
+   wait_queue_head_t wait;
+   int errno;
+};
+
+struct ibnbd_iu {
+   union {
+       struct request *rq; /* for block io */
+       void *buf; /* for user messages */
+   };
+   struct ibtrs_tag        *tag;
+   union {
+       /* use to send msg associated with a dev */
+       struct ibnbd_clt_dev *dev;
+       /* use to send msg associated with a sess */
+       struct ibnbd_clt_session *sess;
+   };
+   blk_status_t            status;
+   struct scatterlist      sglist[BMAX_SEGMENTS];
+   struct work_struct      work;
+   int                     errno;
+   struct ibnbd_iu_comp    *comp;
+};
+
+struct ibnbd_cpu_qlist {
+   struct list_head        requeue_list;
+   spinlock_t              requeue_lock;
+   unsigned int            cpu;
+};
+
+struct ibnbd_clt_session {
+   struct list_head        list;
+   struct ibtrs_clt        *ibtrs;
+   wait_queue_head_t       ibtrs_waitq;
+   bool                    ibtrs_ready;
+   struct ibnbd_cpu_qlist  __percpu
+                           *cpu_queues;
+   DECLARE_BITMAP(cpu_queues_bm, NR_CPUS);
+   int __percpu            *cpu_rr; /* per-cpu var for CPU round-robin */
+   atomic_t                busy;
+   int                     queue_depth;
+   u32                     max_io_size;
+   struct blk_mq_tag_set   tag_set;
+   struct mutex            lock; /* protects state and devs_list */
+   struct list_head        devs_list; /* list of struct ibnbd_clt_dev */
+   refcount_t              refcount;
+   char                    sessname[NAME_MAX];
+   u8                      ver; /* protocol version */
+};
+
+/**
+ * Submission queues.
+ */
+struct ibnbd_queue {
+   struct list_head        requeue_list;
+   unsigned long           in_list;
+   struct ibnbd_clt_dev    *dev;
+   struct blk_mq_hw_ctx    *hctx;
+};
+
+struct ibnbd_clt_dev {
+   struct ibnbd_clt_session    *sess;
+   struct request_queue        *queue;
+   struct ibnbd_queue          *hw_queues;
+   struct delayed_work         rq_delay_work;
+   u32                         device_id;
+   /* local Idr index - used to track minor number allocations. */
+   u32                         clt_device_id;
+   struct mutex                lock;
+   enum ibnbd_clt_dev_state    dev_state;

[PATCH 17/24] ibnbd: client: sysfs interface functions

2018-02-02 Thread Roman Pen
This is the sysfs interface to IBNBD block devices on the client side:

  /sys/kernel/ibnbd_client/
|- map_device
|  *** maps remote device
|
|- devices/
   *** all mapped devices

  /sys/block/ibnbd/ibnbd_client/
|- unmap_device
|  *** unmaps device
|
|- state
|  *** device state
|
|- session
|  *** session name
|
|- mapping_path
   *** path of the dev that was mapped on server
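
To make the map_device entry above more concrete, here is a minimal
userspace sketch that builds an option string and writes it to a sysfs
attribute. The option keys (sessname=, path=, device_path=) are the
mandatory ones from ibnbd_opt_tokens in this patch, and space-separated
options are assumed because that is what the option parser appears to
split on. The helper names and the example values (in particular the
format of the path= value) are hypothetical and not taken from the
driver.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format the three mandatory map_device options into buf.
 * Returns the string length, or -1 if buf is too small.
 */
static int ibnbd_format_map_opts(char *buf, size_t len,
				 const char *sessname,
				 const char *path,
				 const char *device_path)
{
	int n = snprintf(buf, len, "sessname=%s path=%s device_path=%s",
			 sessname, path, device_path);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/* Write a string to a sysfs attribute file; returns 0 on success. */
static int write_sysfs_attr(const char *attr, const char *val)
{
	FILE *f = fopen(attr, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would format the options into a buffer and then pass it to
write_sysfs_attr("/sys/kernel/ibnbd_client/map_device", buf). Doing
the formatting in a separate step lets the string be checked for
truncation before anything is written to the kernel.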

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-clt-sysfs.c | 723 ++
 1 file changed, 723 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-clt-sysfs.c b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
new file mode 100644
index ..2770b5c81c23
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
@@ -0,0 +1,723 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-clt.h"
+
+static struct kobject *ibnbd_kobject;
+static struct kobject *ibnbd_devices_kobject;
+
+enum {
+   IBNBD_OPT_ERR           = 0,
+   IBNBD_OPT_PATH          = 1 << 0,
+   IBNBD_OPT_DEV_PATH      = 1 << 1,
+   IBNBD_OPT_ACCESS_MODE   = 1 << 3,
+   IBNBD_OPT_INPUT_MODE    = 1 << 4,
+   IBNBD_OPT_IO_MODE       = 1 << 5,
+   IBNBD_OPT_SESSNAME      = 1 << 6,
+};
+
+static unsigned int ibnbd_opt_mandatory[] = {
+   IBNBD_OPT_PATH,
+   IBNBD_OPT_DEV_PATH,
+   IBNBD_OPT_SESSNAME,
+};
+
+static const match_table_t ibnbd_opt_tokens = {
+   {   IBNBD_OPT_PATH,         "path=%s"           },
+   {   IBNBD_OPT_DEV_PATH,     "device_path=%s"    },
+   {   IBNBD_OPT_ACCESS_MODE,  "access_mode=%s"    },
+   {   IBNBD_OPT_INPUT_MODE,   "input_mode=%s"     },
+   {   IBNBD_OPT_IO_MODE,      "io_mode=%s"        },
+   {   IBNBD_OPT_SESSNAME,     "sessname=%s"       },
+   {   IBNBD_OPT_ERR,          NULL                },
+};
+
+/* remove new line from string */
+static void strip(char *s)
+{
+   char *p = s;
+
+   while (*s != '\0') {
+       if (*s != '\n')
+           *p++ = *s++;
+       else
+           ++s;
+   }
+   *p = '\0';
+}
+
+static int ibnbd_clt_parse_map_options(const char *buf,
+  char *sessname,
+  struct ibtrs_addr *paths,
+  size_t *path_cnt,
+  size_t max_path_cnt,
+  char *pathname,
+  enum ibnbd_access_mode *access_mode,
+  enum ibnbd_queue_mode *queue_mode,
+  enum ibnbd_io_mode *io_mode)
+{
+   char *options, *sep_opt;
+   char *p;
+   substring_t args[MAX_OPT_ARGS];
+   int opt_mask = 0;
+   int token;
+   int ret = -EINVAL;
+   int i;
+   int p_cnt = 0;
+
+   options = kstrdup(buf, GFP_KERNEL);
+   if (!options)
+       return -ENOMEM;
+
+   options = strstrip(options);
+   strip(options);
+   sep_opt = options;
+   while ((p = strsep(_opt, " &qu

[PATCH 24/24] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

2018-02-02 Thread Roman Pen
Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Cc: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 MAINTAINERS | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 18994806e441..fad9c2529f8a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6714,6 +6714,20 @@ IBM ServeRAID RAID DRIVER
 S: Orphan
 F: drivers/scsi/ips.*
 
+IBNBD BLOCK DRIVERS
+M: IBNBD/IBTRS Storage Team <ib...@profitbricks.com>
+L: linux-block@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/block/ibnbd/
+
+IBTRS TRANSPORT DRIVERS
+M: IBNBD/IBTRS Storage Team <ib...@profitbricks.com>
+L: linux-r...@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/infiniband/ulp/ibtrs/
+
 ICH LPC AND GPIO DRIVER
 M: Peter Tyser <pty...@xes-inc.com>
 S: Maintained
-- 
2.13.1



[PATCH 14/24] ibnbd: private headers with IBNBD protocol structs and helpers

2018-02-02 Thread Roman Pen
These are common private headers with IBNBD protocol structures,
logging, sysfs and other helper functions, which are used on
both client and server sides.
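
The logging helper in ibnbd-log.h picks a message format at compile
time from the static type of its dev argument. A standalone sketch of
that technique, using the GCC/Clang builtins __builtin_choose_expr and
__builtin_types_compatible_p, could look as follows. The struct names,
the dev_prefix() and dev_log() macros and the prefixes are invented
for this example and are not part of the driver; only the
unknown_type() trick is modelled on the header in this patch.

```c
#include <stdio.h>

/* Two unrelated device types that happen to share a 'name' member. */
struct clt_dev { const char *name; };
struct srv_dev { const char *name; };

/*
 * Declared but intentionally never defined: if neither type matches,
 * the call below is selected and the build fails instead of logging
 * with a wrong format.
 */
void unknown_type(void);

/* Compile-time selection of a prefix based on the pointer type of dev. */
#define dev_prefix(dev)							\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(dev),		\
					     struct clt_dev *),		\
		"client",						\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(dev),		\
					     struct srv_dev *),		\
		"server",						\
		unknown_type()))

/* Format "<prefix:name> msg" into buf; the prefix depends on dev's type. */
#define dev_log(buf, len, dev, msg)					\
	snprintf(buf, len, "<%s:%s> %s", dev_prefix(dev), (dev)->name, msg)
```

Calling dev_log() with a struct clt_dev pointer selects the "client"
prefix, and a struct srv_dev pointer selects "server". The branch that
is not selected is still parsed, so it has to be valid for every
argument type; that is why both structs in the sketch have a member
with the same name.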

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-log.h   |  71 
 drivers/block/ibnbd/ibnbd-proto.h | 360 ++
 2 files changed, 431 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-log.h b/drivers/block/ibnbd/ibnbd-log.h
new file mode 100644
index ..489343a61171
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-log.h
@@ -0,0 +1,71 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_LOG_H
+#define IBNBD_LOG_H
+
+#include "ibnbd-clt.h"
+#include "ibnbd-srv.h"
+
+#define ibnbd_diskname(dev) ({ \
+   struct gendisk *gd = ((struct ibnbd_clt_dev *)dev)->gd; \
+   gd ? gd->disk_name : ""; \
+})
+
+void unknown_type(void);
+
+#define ibnbd_log(fn, dev, fmt, ...) ({ \
+   __builtin_choose_expr( \
+       __builtin_types_compatible_p( \
+           typeof(dev), struct ibnbd_clt_dev *), \
+       fn("<%s@%s> %s: " fmt, (dev)->pathname, \
+          (dev)->sess->sessname, ibnbd_diskname(dev), \
+          ##__VA_ARGS__), \
+       __builtin_choose_expr( \
+           __builtin_types_compatible_p(typeof(dev), \
+               struct ibnbd_srv_sess_dev *), \
+           fn("<%s@%s>: " fmt, (dev)->pathname, \
+              (dev)->sess->sessname, ##__VA_ARGS__), \
+           unknown_type())); \
+})
+
+#define ibnbd_err(dev, fmt, ...)   \
+   ibnbd_log(pr_err, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_err_rl(dev, fmt, ...)\
+   ibnbd_log(pr_err_ratelimited, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_wrn(dev, fmt, ...)   \
+   ibnbd_log(pr_warn, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_wrn_rl(dev, fmt, ...) \
+   ibnbd_log(pr_warn_ratelimited, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_info(dev, fmt, ...) \
+   ibnbd_log(pr_info, dev, fmt, ##__VA_ARGS__)
+#define ibnbd_info_rl(dev, fmt, ...) \
+   ibnbd_log(pr_info_ratelimited, dev, fmt, ##__VA_ARGS__)
+
+#endif /* IBNBD_LOG_H */
diff --git a/drivers/block/ibnbd/ibnbd-proto.h b/drivers/block/ibnbd/ibnbd-proto.h
new file mode 100644
index ..c809705a2322
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-proto.h
@@ -0,0 +1,360 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This 

[PATCH 18/24] ibnbd: server: private header with server structs and functions

2018-02-02 Thread Roman Pen
This header describes the main structs and functions used by the
ibnbd-server module, namely structs for managing sessions from
different clients and mapped (opened) devices.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-srv.h | 100 
 1 file changed, 100 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-srv.h b/drivers/block/ibnbd/ibnbd-srv.h
new file mode 100644
index ..191a1650bc1d
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.h
@@ -0,0 +1,100 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_SRV_H
+#define IBNBD_SRV_H
+
+#include 
+#include 
+#include 
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+struct ibnbd_srv_session {
+   /* Entry inside global sess_list */
+   struct list_head        list;
+   struct ibtrs_srv        *ibtrs;
+   char                    sessname[NAME_MAX];
+   int                     queue_depth;
+   struct bio_set          *sess_bio_set;
+
+   rwlock_t                index_lock cacheline_aligned;
+   struct idr              index_idr;
+   /* List of struct ibnbd_srv_sess_dev */
+   struct list_head        sess_dev_list;
+   struct mutex            lock;
+   u8                      ver;
+};
+
+struct ibnbd_srv_dev {
+   /* Entry inside global dev_list */
+   struct list_head        list;
+   struct kobject          dev_kobj;
+   struct kobject          dev_sessions_kobj;
+   struct kref             kref;
+   char                    id[NAME_MAX];
+   /* List of ibnbd_srv_sess_dev structs */
+   struct list_head        sess_dev_list;
+   struct mutex            lock;
+   int                     open_write_cnt;
+   enum ibnbd_io_mode      mode;
+};
+
+/* Structure which binds N devices and N sessions */
+struct ibnbd_srv_sess_dev {
+   /* Entry inside ibnbd_srv_dev struct */
+   struct list_head            dev_list;
+   /* Entry inside ibnbd_srv_session struct */
+   struct list_head            sess_list;
+   struct ibnbd_dev            *ibnbd_dev;
+   struct ibnbd_srv_session    *sess;
+   struct ibnbd_srv_dev        *dev;
+   struct kobject              kobj;
+   struct completion           *sysfs_release_compl;
+   u32                         device_id;
+   fmode_t                     open_flags;
+   struct kref                 kref;
+   struct completion           *destroy_comp;
+   char                        pathname[NAME_MAX];
+};
+
+/* ibnbd-srv-sysfs.c */
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+  struct block_device *bdev,
+  const char *dir_name);
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev);
+int ibnbd_srv_create_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+void ibnbd_srv_destroy_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+int ibnbd_srv_create_sysfs_files(void);
+void ibnbd_srv_destroy_sysfs_files(void);
+
+#endif /* IBNBD_SRV_H */
-- 
2.13.1



[PATCH 19/24] ibnbd: server: main functionality

2018-02-02 Thread Roman Pen
This is the main functionality of the ibnbd-server module, which
handles IBTRS events and IBNBD protocol requests, such as map (open) or
unmap (close) device.  The server side is also responsible for
processing incoming IBTRS IO requests and forwarding them to locally
mapped devices.
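
The dev_search_path module parameter in this patch is described as a
prefix that is prepended to the device_path sent by the client. The
following userspace sketch shows one way such a setter and the path
join could be written with explicit bounds checks. It is an
illustration only: the function names, the SEARCH_PATH_MAX constant,
the choice to reject an empty value and the "/" used to join the two
parts are assumptions of this example, not the driver's behaviour.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define SEARCH_PATH_MAX 4096	/* assumed limit, stands in for PATH_MAX */

static char search_path[SEARCH_PATH_MAX] = "/";

/*
 * Store val as the new search path, dropping one trailing newline
 * (as written by "echo value > param"). Returns 0 or -EINVAL.
 */
static int search_path_set(const char *val)
{
	size_t len = strlen(val);

	if (len > 0 && val[len - 1] == '\n')
		len--;
	/* Rejecting an empty path is a choice made for this sketch. */
	if (len == 0 || len >= sizeof(search_path))
		return -EINVAL;

	memcpy(search_path, val, len);
	search_path[len] = '\0';
	return 0;
}

/* Build "<search_path>/<device_path>" into out; 0 or -ENAMETOOLONG. */
static int full_dev_path(char *out, size_t len, const char *device_path)
{
	int n = snprintf(out, len, "%s/%s", search_path, device_path);

	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}
```

Checking the length before looking at the last character keeps an
empty input from indexing before the start of the string, and trimming
by length avoids the need for a temporary copy of the value.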

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-srv.c | 901 
 1 file changed, 901 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-srv.c b/drivers/block/ibnbd/ibnbd-srv.c
new file mode 100644
index ..a32d22ab67a3
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.c
@@ -0,0 +1,901 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibnbd-srv.h"
+#include "ibnbd-srv-dev.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_VERSION(IBNBD_VER_STRING);
+MODULE_DESCRIPTION("InfiniBand Network Block Device Server");
+MODULE_LICENSE("GPL");
+
+#define DEFAULT_DEV_SEARCH_PATH "/"
+
+static char dev_search_path[PATH_MAX] = DEFAULT_DEV_SEARCH_PATH;
+
+static int dev_search_path_set(const char *val, const struct kernel_param *kp)
+{
+   char *dup;
+
+   if (strlen(val) >= sizeof(dev_search_path))
+       return -EINVAL;
+
+   dup = kstrdup(val, GFP_KERNEL);
+   if (!dup)
+       return -ENOMEM;
+
+   if (dup[strlen(dup) - 1] == '\n')
+       dup[strlen(dup) - 1] = '\0';
+
+   strlcpy(dev_search_path, dup, sizeof(dev_search_path));
+
+   kfree(dup);
+   pr_info("dev_search_path changed to '%s'\n", dev_search_path);
+
+   return 0;
+}
+
+static struct kparam_string dev_search_path_kparam_str = {
+   .maxlen = sizeof(dev_search_path),
+   .string = dev_search_path
+};
+
+static const struct kernel_param_ops dev_search_path_ops = {
+   .set= dev_search_path_set,
+   .get= param_get_string,
+};
+
+module_param_cb(dev_search_path, &dev_search_path_ops,
+   &dev_search_path_kparam_str, 0444);
+MODULE_PARM_DESC(dev_search_path, "Sets the device_search_path."
+" When a device is mapped this path is prepended to the"
+" device_path from the map_device operation."
+" (default: " DEFAULT_DEV_SEARCH_PATH ")");
+
+static int def_io_mode = IBNBD_BLOCKIO;
+module_param(def_io_mode, int, 0444);
+MODULE_PARM_DESC(def_io_mode, "By default, export devices in"
+" blockio(" __stringify(_IBNBD_BLOCKIO) ") or"
+" fileio(" __stringify(_IBNBD_FILEIO) ") mode."
+" (default: " __stringify(_IBNBD_BLOCKIO) " (blockio))");
+
+static DEFINE_MUTEX(sess_lock);
+static DEFINE_SPINLOCK(dev_lock);
+
+static LIST_HEAD(sess_list);
+static LIST_HEAD(dev_list);
+
+struct ibnbd_io_private {
+   struct ibtrs_srv_op *id;
+   struct ibnbd_srv_sess_dev   *sess_dev;
+};
+
+static void ibnbd_sess_dev_release(struct kref *kref)
+{
+   struct ibnbd_srv_sess_dev *sess_dev;
+
+   sess_dev = container_of(kref, struct ibnbd_srv_sess_dev, kref);
+   complete(sess_dev->destroy_comp);
+}
+
+static inline void ibnbd_put_sess_dev(struct ibnbd_srv_sess_dev *sess_dev)
+{
+   kref_put(&sess_dev->kref, ibnbd_sess_dev_release);
+}
+
+static void ibnbd_endio(void *priv, int error)
+{
+   struct ibnbd_io_p

[PATCH 16/24] ibnbd: client: main functionality

2018-02-02 Thread Roman Pen
This is the main functionality of the ibnbd-client module, which
provides an interface to map a remote device as a local block device
/dev/ibnbd and feeds IBTRS with IO requests.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-clt.c | 1959 +++
 1 file changed, 1959 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-clt.c b/drivers/block/ibnbd/ibnbd-clt.c
new file mode 100644
index ..b5bc71414778
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt.c
@@ -0,0 +1,1959 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-clt.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("InfiniBand Network Block Device Client");
+MODULE_VERSION(IBNBD_VER_STRING);
+MODULE_LICENSE("GPL");
+
+static int ibnbd_client_major;
+static DEFINE_IDA(index_ida);
+static DEFINE_MUTEX(ida_lock);
+static DEFINE_MUTEX(sess_lock);
+static LIST_HEAD(sess_list);
+
+static bool softirq_enable;
+module_param(softirq_enable, bool, 0444);
+MODULE_PARM_DESC(softirq_enable, "finish request in softirq_fn."
+" (default: 0)");
+/*
+ * Maximum number of partitions an instance can have.
+ * 6 bits = 64 minors = 63 partitions (one minor is used for the device itself)
+ */
+#define IBNBD_PART_BITS 6
+#define KERNEL_SECTOR_SIZE  512
+
+static inline bool ibnbd_clt_get_sess(struct ibnbd_clt_session *sess)
+{
+   return refcount_inc_not_zero(&sess->refcount);
+}
+
+static void free_sess(struct ibnbd_clt_session *sess);
+
+static void ibnbd_clt_put_sess(struct ibnbd_clt_session *sess)
+{
+   might_sleep();
+
+   if (refcount_dec_and_test(&sess->refcount))
+   free_sess(sess);
+}
+
+static inline bool ibnbd_clt_dev_is_mapped(struct ibnbd_clt_dev *dev)
+{
+   return dev->dev_state == DEV_STATE_MAPPED;
+}
+
+static void ibnbd_clt_put_dev(struct ibnbd_clt_dev *dev)
+{
+   might_sleep();
+
+   if (refcount_dec_and_test(&dev->refcount)) {
+   mutex_lock(&ida_lock);
+   ida_simple_remove(&index_ida, dev->clt_device_id);
+   mutex_unlock(&ida_lock);
+   kfree(dev->hw_queues);
+   ibnbd_clt_put_sess(dev->sess);
+   kfree(dev);
+   }
+}
+
+static inline bool ibnbd_clt_get_dev(struct ibnbd_clt_dev *dev)
+{
+   return refcount_inc_not_zero(&dev->refcount);
+}
+
+static void ibnbd_clt_set_dev_attr(struct ibnbd_clt_dev *dev,
+  const struct ibnbd_msg_open_rsp *rsp)
+{
+   struct ibnbd_clt_session *sess = dev->sess;
+
+   dev->device_id  = le32_to_cpu(rsp->device_id);
+   dev->nsectors   = le64_to_cpu(rsp->nsectors);
+   dev->logical_block_size = le16_to_cpu(rsp->logical_block_size);
+   dev->physical_block_size= le16_to_cpu(rsp->physical_block_size);
+   dev->max_write_same_sectors = le32_to_cpu(rsp->max_write_same_sectors);
+   dev->max_discard_sectors= le32_to_cpu(rsp->max_discard_sectors);
+   dev->discard_granularity= le32_to_cpu(rsp->discard_granularity);
+   dev->discard_alignment  = le32_to_cpu(rsp->discard_alignment);
+   dev->secure_discard = le16_to_cpu(rsp->secure_discard);
+   dev->rotational 
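
The get/put helpers above follow a common pattern: a reference may only be taken while the count is still non-zero, and whoever drops the last reference frees the object. A userspace sketch of that pattern with C11 atomics follows. The names obj, obj_get and obj_put are hypothetical, and the freed flag only stands in for the real cleanup.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int refcount;
	bool freed;		/* stands in for the real cleanup in this sketch */
};

/* Take a reference only if the object is still alive (count > 0). */
static bool obj_get(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the caller that drops the last one cleans up. */
static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcount, 1) == 1)
		o->freed = true;
}
```

Once the count has reached zero, obj_get() keeps failing, so a lookup that races with teardown cannot resurrect an object that is already being freed.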

[PATCH 20/24] ibnbd: server: functionality for IO submission to file or block dev

2018-02-02 Thread Roman Pen
This provides helper functions for IO submission to file or block dev.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-srv-dev.c | 410 
 drivers/block/ibnbd/ibnbd-srv-dev.h | 149 +
 2 files changed, 559 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-srv-dev.c b/drivers/block/ibnbd/ibnbd-srv-dev.c
new file mode 100644
index ..a5894849b9d5
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-dev.c
@@ -0,0 +1,410 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibnbd-srv-dev.h"
+#include "ibnbd-log.h"
+
+#define IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS 0
+
+struct ibnbd_dev_file_io_work {
+   struct ibnbd_dev    *dev;
+   void                *priv;
+
+   sector_t            sector;
+   void                *data;
+   size_t              len;
+   size_t              bi_size;
+   enum ibnbd_io_flags flags;
+
+   struct work_struct  work;
+};
+
+struct ibnbd_dev_blk_io {
+   struct ibnbd_dev *dev;
+   void *priv;
+};
+
+static struct workqueue_struct *fileio_wq;
+
+int ibnbd_dev_init(void)
+{
+   fileio_wq = alloc_workqueue("%s", WQ_UNBOUND,
+   IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS,
+   "ibnbd_server_fileio_wq");
+   if (!fileio_wq)
+   return -ENOMEM;
+
+   return 0;
+}
+
+void ibnbd_dev_destroy(void)
+{
+   destroy_workqueue(fileio_wq);
+}
+
+static inline struct block_device *ibnbd_dev_open_bdev(const char *path,
+  fmode_t flags)
+{
+   return blkdev_get_by_path(path, flags, THIS_MODULE);
+}
+
+static int ibnbd_dev_blk_open(struct ibnbd_dev *dev, const char *path,
+ fmode_t flags)
+{
+   dev->bdev = ibnbd_dev_open_bdev(path, flags);
+   return PTR_ERR_OR_ZERO(dev->bdev);
+}
+
+static int ibnbd_dev_vfs_open(struct ibnbd_dev *dev, const char *path,
+ fmode_t flags)
+{
+   int oflags = O_DSYNC; /* enable write-through */
+
+   if (flags & FMODE_WRITE)
+   oflags |= O_RDWR;
+   else if (flags & FMODE_READ)
+   oflags |= O_RDONLY;
+   else
+   return -EINVAL;
+
+   dev->file = filp_open(path, oflags, 0);
+   return PTR_ERR_OR_ZERO(dev->file);
+}
+
+struct ibnbd_dev *ibnbd_dev_open(const char *path, fmode_t flags,
+enum ibnbd_io_mode mode, struct bio_set *bs,
+ibnbd_dev_io_fn io_cb)
+{
+   struct ibnbd_dev *dev;
+   int ret;
+
+   dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+   if (!dev)
+   return ERR_PTR(-ENOMEM);
+
+   if (mode == IBNBD_BLOCKIO) {
+   dev->blk_open_flags = flags;
+   ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+   if (ret)
+   goto err;
+   } else if (mode == IBNBD_FILEIO) {
+   dev->blk_open_flags = FMODE_READ;
+   ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags);
+   if (ret)
+   goto err;
+
+   ret = ibnbd_dev_vfs_open(dev, path, flags);
+   if (ret)
+   goto blk_put;
+  
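
ibnbd_dev_vfs_open() above translates the requested access mode into open flags: write access implies O_RDWR, read-only access implies O_RDONLY, anything else is rejected, and O_DSYNC is always set for write-through. A userspace sketch of the same mapping follows. SK_FMODE_READ and SK_FMODE_WRITE are hypothetical stand-ins for the kernel's FMODE_* bits, and fmode_to_oflags is an illustrative name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>

/* Hypothetical stand-ins for the kernel's FMODE_READ / FMODE_WRITE bits. */
#define SK_FMODE_READ	0x1u
#define SK_FMODE_WRITE	0x2u

/* Returns open(2) flags on success or -EINVAL if no access was requested. */
static int fmode_to_oflags(unsigned int fmode)
{
	int oflags = O_DSYNC;	/* write-through, as in the patch */

	if (fmode & SK_FMODE_WRITE)
		oflags |= O_RDWR;
	else if (fmode & SK_FMODE_READ)
		oflags |= O_RDONLY;
	else
		return -EINVAL;

	return oflags;
}
```

Write access wins when both bits are set, which matches the order of the checks in the function above.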

[PATCH 22/24] ibnbd: include client and server modules into kernel compilation

2018-02-02 Thread Roman Pen
Add IBNBD Makefile, Kconfig and also corresponding lines into upper
block layer files.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/Kconfig|  2 ++
 drivers/block/Makefile   |  1 +
 drivers/block/ibnbd/Kconfig  | 22 ++
 drivers/block/ibnbd/Makefile | 13 +
 4 files changed, 38 insertions(+)

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 40579d0cb3d1..483aae5d391e 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -477,4 +477,6 @@ config BLK_DEV_RSXX
  To compile this driver as a module, choose M here: the
  module will be called rsxx.
 
+source "drivers/block/ibnbd/Kconfig"
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index dc061158b403..65346a1d0b1a 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)+= mtip32xx/
 obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/
 obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk.o
 obj-$(CONFIG_ZRAM) += zram/
+obj-$(CONFIG_BLK_DEV_IBNBD)+= ibnbd/
 
 skd-y  := skd_main.o
 swim_mod-y := swim.o swim_asm.o
diff --git a/drivers/block/ibnbd/Kconfig b/drivers/block/ibnbd/Kconfig
new file mode 100644
index ..c5cc7d111c7a
--- /dev/null
+++ b/drivers/block/ibnbd/Kconfig
@@ -0,0 +1,22 @@
+config BLK_DEV_IBNBD
+   bool
+
+config BLK_DEV_IBNBD_CLIENT
+   tristate "Network block device driver on top of IBTRS transport"
+   depends on INFINIBAND_IBTRS_CLIENT
+   select BLK_DEV_IBNBD
+   help
+ IBNBD client allows for mapping of remote block devices over
+ IBTRS protocol from a target system where IBNBD server is running.
+
+ If unsure, say N.
+
+config BLK_DEV_IBNBD_SERVER
+   tristate "Network block device over RDMA Infiniband server support"
+   depends on INFINIBAND_IBTRS_SERVER
+   select BLK_DEV_IBNBD
+   help
+ IBNBD server allows for exporting local block devices to a remote client
+ over IBTRS protocol.
+
+ If unsure, say N.
diff --git a/drivers/block/ibnbd/Makefile b/drivers/block/ibnbd/Makefile
new file mode 100644
index ..5f20e72e0633
--- /dev/null
+++ b/drivers/block/ibnbd/Makefile
@@ -0,0 +1,13 @@
+ccflags-y := -Idrivers/infiniband/ulp/ibtrs
+
+ibnbd-client-y := ibnbd-clt.o \
+ ibnbd-clt-sysfs.o
+
+ibnbd-server-y := ibnbd-srv.o \
+ ibnbd-srv-dev.o \
+ ibnbd-srv-sysfs.o
+
+obj-$(CONFIG_BLK_DEV_IBNBD_CLIENT) += ibnbd-client.o
+obj-$(CONFIG_BLK_DEV_IBNBD_SERVER) += ibnbd-server.o
+
+-include $(src)/compat/compat.mk
-- 
2.13.1



[PATCH 23/24] ibnbd: a bit of documentation

2018-02-02 Thread Roman Pen
README with description of major sysfs entries.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/README | 272 +
 1 file changed, 272 insertions(+)

diff --git a/drivers/block/ibnbd/README b/drivers/block/ibnbd/README
new file mode 100644
index ..e0feb39fad14
--- /dev/null
+++ b/drivers/block/ibnbd/README
@@ -0,0 +1,272 @@
+***************************************
+Infiniband Network Block Device (IBNBD)
+***************************************
+
+Introduction
+
+
+IBNBD (InfiniBand Network Block Device) is a pair of kernel modules
+(client and server) that allow for remote access of a block device on
+the server over IBTRS protocol using the RDMA (InfiniBand, RoCE, iWarp)
+transport. After being mapped, the remote block devices can be accessed
+on the client side as local block devices.
+
+I/O is transferred between client and server by the IBTRS transport
+modules. The administration of IBNBD and IBTRS modules is done via
+sysfs entries.
+
+Requirements
+
+
+  IBTRS kernel modules
+
+Quick Start
+---
+
+Server side:
+  # modprobe ibnbd_server
+
+Client side:
+  # modprobe ibnbd_client
+  # echo "sessname=blya path=ip:10.50.100.66 device_path=/dev/ram0" > \
+/sys/kernel/ibnbd_client/map_device
+
+  Where "sessname=" is a session name, a string to identify the session
+  on client and on server sides; "path=" is a destination IP address or
+  a pair of a source and a destination IPs, separated by comma.  Multiple
+  "path=" options can be specified in order to use multipath  (see IBTRS
+  description for details); "device_path=" is the block device to be
+  mapped from the server side. After the session to the server machine is
+  established, the mapped device will appear on the client side under
+  /dev/ibnbd.
+
+
+==
+Client Sysfs Interface
+==
+
+All sysfs files that are not read-only provide the usage information on read:
+
+Example:
+  # cat /sys/kernel/ibnbd_client/map_device
+
+  > Usage: echo "sessname= path=<[srcaddr,]dstaddr>
+  > [path=<[srcaddr,]dstaddr>] device_path=
+  > [access_mode=<ro|rw|migration>] [input_mode=<mq|rq>]
+  > [io_mode=<fileio|blockio>]" > map_device
+  >
+  > addr ::= [ ip: | ip: | gid: ]
+
+Entries under /sys/kernel/ibnbd_client/
+===
+
+map_device (RW)
+---
+
+Expected format is the following:
+
+sessname=
+path=<[srcaddr,]dstaddr> [path=<[srcaddr,]dstaddr> ...]
+device_path=
+[access_mode=<ro|rw|migration>]
+[input_mode=<mq|rq>]
+[io_mode=<fileio|blockio>]
+
+Where:
+
+sessname: accepts a string not bigger than 256 chars, which identifies
+  a given session on the client and on the server.
+ I.e. "clt_hostname-srv_hostname" could be a natural choice.
+
+path: describes a connection between the client and the server by
+ specifying destination and, when required, the source address.
+ The addresses are to be provided in the following format:
+
+ip:
+ip:
+gid:
+
+  for example:
+
+  path=ip:10.0.0.66
+ The single addr is treated as the destination.
+ The connection will be established to this
+ server from any client IP address.
+
+  path=ip:10.0.0.66,ip:10.0.1.66
+ First addr is the source address and the second
+ is the destination.
+
+  If multiple "path=" options are specified, multiple connections
+  will be established and data will be sent according to
+  the selected multipath policy (see IBTRS mp_policy sysfs entry
+  description).
+
+device_path: Path to the block device on the server side. Path is specified
+             relative to the directory on server side configured in the
+             'dev_search_path' module parameter of the ibnbd_server.
+             The ibnbd_server prepends the <device_path> received from client
+             with <dev_search_path> and tries to open the
+             <dev_search_path>/<device_path> block device.  On success,
+             a /dev/ibnbd device file, a /sys/block/ibnbd_client/ibnbd/
+             directory and an entry in /sys/kernel/ibnbd_client/devices will be
+             created.
+
+access_mode: the access_mode parameter specifies if the device is to be
+ mapped as "ro" read-only or "rw" read-write. The server allows
+a device to be exported in rw mode only once. The "migration"
+ access mode has to be specified if a second mapping in read-writ
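
The "path=" rules described above (a single address is the destination, two comma-separated addresses are source then destination) can be sketched in userspace as below. This is only an illustration: struct path_addrs, parse_path and the ADDR_MAX limit are hypothetical and are not the parser used by the module.

```c
#include <stddef.h>
#include <string.h>

#define ADDR_MAX 64	/* assumed limit for this sketch */

struct path_addrs {
	char src[ADDR_MAX];	/* empty string when no source was given */
	char dst[ADDR_MAX];
};

/* Returns 0 on success, -1 on an empty or over-long address. */
static int parse_path(const char *val, struct path_addrs *p)
{
	const char *comma = strchr(val, ',');
	size_t src_len = comma ? (size_t)(comma - val) : 0;
	const char *dst = comma ? comma + 1 : val;

	/* "src,dst" needs a non-empty source before the comma. */
	if (comma && src_len == 0)
		return -1;
	if (src_len >= ADDR_MAX || !dst[0] || strlen(dst) >= ADDR_MAX)
		return -1;

	memcpy(p->src, val, src_len);
	p->src[src_len] = '\0';
	strcpy(p->dst, dst);	/* length was checked above */
	return 0;
}
```

For "ip:10.0.0.66" only dst is filled in; for "ip:10.0.0.66,ip:10.0.1.66" the first address ends up in src and the second in dst.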

[PATCH 21/24] ibnbd: server: sysfs interface functions

2018-02-02 Thread Roman Pen
This is the sysfs interface to IBNBD mapped devices on server side:

  /sys/kernel/ibnbd_server/devices//
|- block_dev
|  *** link pointing to the corresponding block device sysfs entry
|
|- sessions//
|  *** sessions directory
   |
   |- read_only
   |  *** is devices mapped as read only
   |
   |- mapping_path
  *** relative device path provided by the client during mapping

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/block/ibnbd/ibnbd-srv-sysfs.c | 264 ++
 1 file changed, 264 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-srv-sysfs.c b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
new file mode 100644
index ..a0efd6a2accb
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
@@ -0,0 +1,264 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ibnbd-srv.h"
+
+static struct kobject *ibnbd_srv_kobj;
+static struct kobject *ibnbd_srv_devices_kobj;
+
+static struct attribute *ibnbd_srv_default_dev_attrs[] = {
+   NULL,
+};
+
+static struct attribute_group ibnbd_srv_default_dev_attr_group = {
+   .attrs = ibnbd_srv_default_dev_attrs,
+};
+
+static ssize_t ibnbd_srv_attr_show(struct kobject *kobj, struct attribute *attr,
+  char *page)
+{
+   struct kobj_attribute *kattr;
+   int ret = -EIO;
+
+   kattr = container_of(attr, struct kobj_attribute, attr);
+   if (kattr->show)
+   ret = kattr->show(kobj, kattr, page);
+   return ret;
+}
+
+static ssize_t ibnbd_srv_attr_store(struct kobject *kobj,
+   struct attribute *attr,
+   const char *page, size_t length)
+{
+   struct kobj_attribute *kattr;
+   int ret = -EIO;
+
+   kattr = container_of(attr, struct kobj_attribute, attr);
+   if (kattr->store)
+   ret = kattr->store(kobj, kattr, page, length);
+   return ret;
+}
+
+static const struct sysfs_ops ibnbd_srv_sysfs_ops = {
+   .show   = ibnbd_srv_attr_show,
+   .store  = ibnbd_srv_attr_store,
+};
+
+static struct kobj_type ibnbd_srv_dev_ktype = {
+   .sysfs_ops  = &ibnbd_srv_sysfs_ops,
+};
+
+static struct kobj_type ibnbd_srv_dev_sessions_ktype = {
+   .sysfs_ops  = &ibnbd_srv_sysfs_ops,
+};
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+  struct block_device *bdev,
+  const char *dir_name)
+{
+   struct kobject *bdev_kobj;
+   int ret;
+
+   ret = kobject_init_and_add(&dev->dev_kobj, &ibnbd_srv_dev_ktype,
+  ibnbd_srv_devices_kobj, dir_name);
+   if (ret)
+   return ret;
+
+   ret = kobject_init_and_add(&dev->dev_sessions_kobj,
+  &ibnbd_srv_dev_sessions_ktype,
+  &dev->dev_kobj, "sessions");
+   if (ret)
+   goto err;
+
+   ret = sysfs_create_group(&dev->dev_kobj,
+&ibnbd_srv_default_dev_attr_group);
+   if (ret)
+   goto err2;
+
+   bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj;
+   ret = sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev");
+   if (ret)
+   goto err3;
+
+   return 0;
+
+err3:
+  
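
The show/store wrappers above recover the enclosing kobj_attribute from the embedded attribute with container_of() and return -EIO when no handler is set. A userspace sketch of that dispatch follows. All sk_* names are hypothetical, and sk_container_of is a simplified equivalent of the kernel macro without its type checking.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified userspace equivalent of the kernel's container_of(). */
#define sk_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sk_attribute {
	const char *name;
};

struct sk_kobj_attribute {
	struct sk_attribute attr;	/* embedded, as in kobj_attribute */
	long (*show)(struct sk_kobj_attribute *kattr, char *page);
};

/* Example handler: writes "ok" and returns the number of bytes. */
static long sk_ok_show(struct sk_kobj_attribute *kattr, char *page)
{
	(void)kattr;
	page[0] = 'o';
	page[1] = 'k';
	page[2] = '\0';
	return 2;
}

/* Generic entry point: recover the wrapper, then dispatch or fail. */
static long sk_attr_show(struct sk_attribute *attr, char *page)
{
	struct sk_kobj_attribute *kattr;

	kattr = sk_container_of(attr, struct sk_kobj_attribute, attr);
	if (!kattr->show)
		return -EIO;
	return kattr->show(kattr, page);
}
```

The generic layer only ever sees the embedded attribute, so one pair of wrappers can serve every attribute that embeds it.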

[PATCH 11/24] ibtrs: server: sysfs interface functions

2018-02-02 Thread Roman Pen
This is the sysfs interface to IBTRS sessions on server side:

  /sys/kernel/ibtrs_server//
*** IBTRS session accepted from a client peer
|
|- paths//
   *** established paths from a client in a session
   |
   |- disconnect
   |  *** disconnect path
   |
   |- hca_name
   |  *** HCA name
   |
   |- hca_port
   |  *** HCA port
   |
   |- stats/
  *** current path statistics
  |
  |- rdma
  |- reset_all
  |- wc_completions

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c | 278 +
 1 file changed, 278 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
new file mode 100644
index ..ec2c86fe4181
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
@@ -0,0 +1,278 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-pri.h"
+#include "ibtrs-srv.h"
+#include "ibtrs-log.h"
+
+static struct kobject *ibtrs_kobj;
+
+static struct kobj_type ktype = {
+   .sysfs_ops  = &kobj_sysfs_ops,
+};
+
+static ssize_t ibtrs_srv_disconnect_show(struct kobject *kobj,
+struct kobj_attribute *attr,
+char *page)
+{
+   return scnprintf(page, PAGE_SIZE, "Usage: echo 1 > %s\n",
+attr->attr.name);
+}
+
+static ssize_t ibtrs_srv_disconnect_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct ibtrs_srv_sess *sess;
+   char str[MAXHOSTNAMELEN];
+
+   sess = container_of(kobj, struct ibtrs_srv_sess, kobj);
+   if (!sysfs_streq(buf, "1")) {
+   ibtrs_err(sess, "%s: invalid value: '%s'\n",
+ attr->attr.name, buf);
+   return -EINVAL;
+   }
+
+   sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr, str, sizeof(str));
+
+   ibtrs_info(sess, "disconnect for path %s requested\n", str);
+   ibtrs_srv_queue_close(sess);
+
+   return count;
+}
+
+static struct kobj_attribute ibtrs_srv_disconnect_attr =
+   __ATTR(disconnect, 0644,
+  ibtrs_srv_disconnect_show, ibtrs_srv_disconnect_store);
+
+static ssize_t ibtrs_srv_hca_port_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *page)
+{
+   struct ibtrs_srv_sess *sess;
+   struct ibtrs_con *usr_con;
+
+   sess = container_of(kobj, typeof(*sess), kobj);
+   usr_con = sess->s.con[0];
+
+   return scnprintf(page, PAGE_SIZE, "%u\n",
+usr_con->cm_id->port_num);
+}
+
+static struct kobj_attribute ibtrs_srv_hca_port_attr =
+   __ATTR(hca_port, 0444, ibtrs_srv_hca_port_show, NULL);
+
+static ssize_t ibtrs_srv_hca_name_show(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  char *page)
+{
+   struct ibtrs_srv_sess *sess;
+
+   sess = container_of(kobj, struct ibtrs_srv_sess, kobj);
+
+   return scnprintf(page, PAGE_SIZE, "%s\n",

[PATCH 12/24] ibtrs: include client and server modules into kernel compilation

2018-02-02 Thread Roman Pen
Add IBTRS Makefile, Kconfig and also corresponding lines into upper
layer infiniband/ulp files.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/Kconfig|  1 +
 drivers/infiniband/ulp/Makefile   |  1 +
 drivers/infiniband/ulp/ibtrs/Kconfig  | 20 
 drivers/infiniband/ulp/ibtrs/Makefile | 15 +++
 4 files changed, 37 insertions(+)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index cbf186522016..7adbd0e272c4 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -93,6 +93,7 @@ source "drivers/infiniband/ulp/srpt/Kconfig"
 
 source "drivers/infiniband/ulp/iser/Kconfig"
 source "drivers/infiniband/ulp/isert/Kconfig"
+source "drivers/infiniband/ulp/ibtrs/Kconfig"
 
 source "drivers/infiniband/ulp/opa_vnic/Kconfig"
 source "drivers/infiniband/sw/rdmavt/Kconfig"
diff --git a/drivers/infiniband/ulp/Makefile b/drivers/infiniband/ulp/Makefile
index 437813c7b481..1c4f10dc8d49 100644
--- a/drivers/infiniband/ulp/Makefile
+++ b/drivers/infiniband/ulp/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_INFINIBAND_SRPT)   += srpt/
 obj-$(CONFIG_INFINIBAND_ISER)  += iser/
 obj-$(CONFIG_INFINIBAND_ISERT) += isert/
 obj-$(CONFIG_INFINIBAND_OPA_VNIC)  += opa_vnic/
+obj-$(CONFIG_INFINIBAND_IBTRS) += ibtrs/
diff --git a/drivers/infiniband/ulp/ibtrs/Kconfig b/drivers/infiniband/ulp/ibtrs/Kconfig
new file mode 100644
index ..eaeb8f3f6b4e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Kconfig
@@ -0,0 +1,20 @@
+config INFINIBAND_IBTRS
+   tristate
+   depends on INFINIBAND_ADDR_TRANS
+
+config INFINIBAND_IBTRS_CLIENT
+   tristate "IBTRS client module"
+   depends on INFINIBAND_ADDR_TRANS
+   select INFINIBAND_IBTRS
+   help
+ IBTRS client allows for simplified data transfer and connection
+ establishment over RDMA (InfiniBand, RoCE, iWarp). Uses BIO-like
+ READ/WRITE semantics and provides multipath capabilities.
+
+config INFINIBAND_IBTRS_SERVER
+   tristate "IBTRS server module"
+   depends on INFINIBAND_ADDR_TRANS
+   select INFINIBAND_IBTRS
+   help
+ IBTRS server module processing connection and IO requests received
+ from the IBTRS client module.
diff --git a/drivers/infiniband/ulp/ibtrs/Makefile b/drivers/infiniband/ulp/ibtrs/Makefile
new file mode 100644
index ..e6ea858745ad
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Makefile
@@ -0,0 +1,15 @@
+ibtrs-client-y := ibtrs-clt.o \
+ ibtrs-clt-stats.o \
+ ibtrs-clt-sysfs.o
+
+ibtrs-server-y := ibtrs-srv.o \
+ ibtrs-srv-stats.o \
+ ibtrs-srv-sysfs.o
+
+ibtrs-core-y := ibtrs.o
+
+obj-$(CONFIG_INFINIBAND_IBTRS)+= ibtrs-core.o
+obj-$(CONFIG_INFINIBAND_IBTRS_CLIENT) += ibtrs-client.o
+obj-$(CONFIG_INFINIBAND_IBTRS_SERVER) += ibtrs-server.o
+
+-include $(src)/compat/compat.mk
-- 
2.13.1



[PATCH 13/24] ibtrs: a bit of documentation

2018-02-02 Thread Roman Pen
README with description of major sysfs entries.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/README | 238 
 1 file changed, 238 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/README b/drivers/infiniband/ulp/ibtrs/README
new file mode 100644
index ..ed506c7e202d
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/README
@@ -0,0 +1,238 @@
+
+InfiniBand Transport (IBTRS)
+
+
+IBTRS (InfiniBand Transport) is a reliable high speed transport library
+which provides support to establish an optimal number of connections
+between client and server machines using RDMA (InfiniBand, RoCE, iWarp)
+transport. It is optimized to transfer (read/write) IO blocks.
+
+In its core interface it follows the BIO semantics of providing the
+possibility to either write data from an sg list to the remote side
+or to request ("read") data transfer from the remote side into a given
+sg list.
+
+IBTRS provides I/O fail-over and load-balancing capabilities by using
+multipath I/O (see "add_path" and "mp_policy" configuration entries).
+
+IBTRS is used by the IBNBD (Infiniband Network Block Device) modules.
+
+==
+Client Sysfs Interface
+==
+
+This chapter describes only the most important files of sysfs interface
+on client side.
+
+Entries under /sys/kernel/ibtrs_client/
+===
+
+When a user of IBTRS API creates a new session, a directory entry with
+the name of that session is created.
+
+Entries under /sys/kernel/ibtrs_client//
+==
+
+add_path (RW)
+-
+
+Adds a new path (connection) to an existing session. Expected format is the
+following:
+
+  <[source addr,]destination addr>
+
+  *addr ::= [ ip:<ipv4|ipv6> | gid: ]
+
+max_reconnect_attempts (RW)
+---------------------------
+
+Maximum number of reconnect attempts the client should make before giving up
+after the connection breaks unexpectedly.
+
+mp_policy (RW)
+--
+
+Multipath policy specifies which path should be selected on each IO:
+
+   round-robin (0):
+   select path in per CPU round-robin manner.
+
+   min-inflight (1):
+   select path with minimum inflights.
+
+Entries under /sys/kernel/ibtrs_client//paths/
+
+
+
+Each path belonging to a given session is listed here by its destination
+address. When a new path is added to a session by writing to the "add_path"
+entry, a directory with the corresponding destination address is created.
+
+Entries under /sys/kernel/ibtrs_client//paths//
+
+
+state (R)
+---------
+
+Contains "connected" if the session is connected to the peer and fully
+functional.  Otherwise the file contains "disconnected".
+
+reconnect (RW)
+--------------
+
+Write "1" to the file in order to reconnect the path.
+Operation is blocking and returns 0 if reconnect was successful.
+
+disconnect (RW)
+---
+
+Write "1" to the file in order to disconnect the path.
+Operation blocks until IBTRS path is disconnected.
+
+remove_path (RW)
+----------------
+
+Write "1" to the file in order to disconnect and remove the path
+from the session.  Operation blocks until the path is disconnected
+and removed from the session.
+
+Entries under /sys/kernel/ibtrs_client//paths//stats/
+==
+
+Write "0" to any file in that directory to reset corresponding statistics.
+
+reset_all (RW)
+--
+
+Read will return usage help, write 0 will clear all the statistics.
+
+sg_entries (RW)
+---------------
+
+Data to be transferred via RDMA is passed to IBTRS as a scatter-gather
+list. A scatter-gather list can contain multiple entries.
+Scatter-gather lists with fewer entries require less processing power
+and can therefore be transferred faster. The file sg_entries outputs a
+per-CPU distribution table for the number of entries in the
+scatter-gather lists that were passed to the IBTRS API function
+ibtrs_clt_request (READ or WRITE).
+
+cpu_migration (RW)
+------------------
+
+IBTRS expects that each HCA IRQ is pinned to a separate CPU. If that is
+not the case, an I/O response could be processed on a different CPU
+than the one where the request was originally submitted.  This file shows
+how many interrupts were generated on an unexpected CPU.
+"from:" is the CPU on which the IRQ was expected, but not generated.
+"to:" is the CPU on which the IRQ was generated, but not expected.
+
+reconnects (RW)
+---
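
The two mp_policy values described above, round-robin and min-inflight, can be sketched in userspace as below. This is an illustration only: struct sk_path and both selector names are hypothetical, the sketch uses a single cursor where the README describes per-CPU round-robin, and npaths is assumed to be greater than zero.

```c
#include <stddef.h>

struct sk_path {
	int inflight;	/* requests submitted but not yet completed */
};

/* round-robin: return the path at the cursor, then advance and wrap. */
static size_t select_round_robin(size_t *cursor, size_t npaths)
{
	size_t idx = *cursor % npaths;

	*cursor = (idx + 1) % npaths;
	return idx;
}

/* min-inflight: pick the path with the fewest inflight requests. */
static size_t select_min_inflight(const struct sk_path *paths, size_t npaths)
{
	size_t best = 0;

	for (size_t i = 1; i < npaths; i++)
		if (paths[i].inflight < paths[best].inflight)
			best = i;
	return best;
}
```

A per-CPU variant would keep one cursor per CPU, so that CPUs do not contend on shared state when choosing a path.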

[PATCH 10/24] ibtrs: server: statistics functions

2018-02-02 Thread Roman Pen
This introduces a set of functions used on the server side to account
statistics of RDMA data sent/received.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c | 110 +
 1 file changed, 110 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
new file mode 100644
index ..441b07fdf44a
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
@@ -0,0 +1,110 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-srv.h"
+
+void ibtrs_srv_update_rdma_stats(struct ibtrs_srv_stats *s,
+size_t size, int d)
+{
+   atomic64_inc(&s->rdma_stats.dir[d].cnt);
+   atomic64_add(size, &s->rdma_stats.dir[d].size_total);
+}
+
+void ibtrs_srv_update_wc_stats(struct ibtrs_srv_stats *s)
+{
+   atomic64_inc(&s->wc_comp.calls);
+   atomic64_inc(&s->wc_comp.total_wc_cnt);
+}
+
+int ibtrs_srv_reset_rdma_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+   if (enable) {
+           struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+
+   memset(r, 0, sizeof(*r));
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
+ssize_t ibtrs_srv_stats_rdma_to_str(struct ibtrs_srv_stats *stats,
+   char *page, size_t len)
+{
+   struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+   struct ibtrs_srv_sess *sess;
+
+   sess = container_of(stats, typeof(*sess), stats);
+
+   return scnprintf(page, len, "%ld %ld %ld %ld %u\n",
+                    atomic64_read(&r->dir[READ].cnt),
+                    atomic64_read(&r->dir[READ].size_total),
+                    atomic64_read(&r->dir[WRITE].cnt),
+                    atomic64_read(&r->dir[WRITE].size_total),
+                    atomic_read(&sess->ids_inflight));
+}
+
+int ibtrs_srv_reset_wc_completion_stats(struct ibtrs_srv_stats *stats,
+   bool enable)
+{
+   if (enable) {
+           memset(&stats->wc_comp, 0, sizeof(stats->wc_comp));
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
+int ibtrs_srv_stats_wc_completion_to_str(struct ibtrs_srv_stats *stats,
+char *buf, size_t len)
+{
+   return snprintf(buf, len, "%ld %ld\n",
+                   atomic64_read(&stats->wc_comp.total_wc_cnt),
+                   atomic64_read(&stats->wc_comp.calls));
+}
+
+ssize_t ibtrs_srv_reset_all_help(struct ibtrs_srv_stats *stats,
+char *page, size_t len)
+{
+   return scnprintf(page, PAGE_SIZE, "echo 1 to reset all statistics\n");
+}
+
+int ibtrs_srv_reset_all_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+   if (enable) {
+   ibtrs_srv_reset_wc_completion_stats(stats, enable);
+   ibtrs_srv_reset_rdma_stats(stats, enable);
+   return 0;
+   }
+
+   return -EINVAL;
+}
-- 
2.13.1
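
The counting pattern used by the server statistics helpers above (increment on every transfer, reset on request, format on read) can be tried outside the kernel. The following is a minimal user-space sketch, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic64_t; the names srv_rdma_stats, update_rdma_stats, reset_rdma_stats and rdma_stats_to_str are hypothetical and are not part of the patch. Unlike the patch, which clears the whole structure with memset(), the sketch resets each counter individually.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's READ/WRITE direction indices. */
enum { DIR_READ = 0, DIR_WRITE = 1 };

/* User-space analogue of the per-direction counters in the patch. */
struct srv_rdma_stats {
	struct {
		atomic_llong cnt;        /* number of transfers */
		atomic_llong size_total; /* total bytes transferred */
	} dir[2];
};

/* Account one transfer of 'size' bytes in direction 'd'. */
static void update_rdma_stats(struct srv_rdma_stats *s, size_t size, int d)
{
	assert(d == DIR_READ || d == DIR_WRITE);
	atomic_fetch_add(&s->dir[d].cnt, 1);
	atomic_fetch_add(&s->dir[d].size_total, (long long)size);
}

/* Reset every counter with an atomic store. */
static void reset_rdma_stats(struct srv_rdma_stats *s)
{
	for (int d = 0; d < 2; d++) {
		atomic_store(&s->dir[d].cnt, 0);
		atomic_store(&s->dir[d].size_total, 0);
	}
}

/* Format "read_cnt read_bytes write_cnt write_bytes" into buf. */
static int rdma_stats_to_str(struct srv_rdma_stats *s, char *buf, size_t len)
{
	return snprintf(buf, len, "%lld %lld %lld %lld\n",
			atomic_load(&s->dir[DIR_READ].cnt),
			atomic_load(&s->dir[DIR_READ].size_total),
			atomic_load(&s->dir[DIR_WRITE].cnt),
			atomic_load(&s->dir[DIR_WRITE].size_total));
}
```

Each counter is updated independently, so a reader may observe a count that is one transfer ahead of the matching byte total. For statistics that is usually acceptable, and it avoids taking a lock in the I/O path.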



[PATCH 02/24] ibtrs: private headers with IBTRS protocol structs and helpers

2018-02-02 Thread Roman Pen
These are common private headers with IBTRS protocol structures,
logging, sysfs and other helper functions, which are used on
both client and server sides.
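
The logging helpers in ibtrs-log.h select the session name at compile time, depending on the type of the object passed to the log macro. A minimal stand-alone sketch of that dispatch follows. It assumes GCC or Clang, since __builtin_choose_expr, __builtin_types_compatible_p, __typeof__ and ##__VA_ARGS__ are GNU extensions. The demo_* names are hypothetical, and the sketch falls back to a NULL name for unknown types, whereas the header references an undefined unknown_type() function instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct ibtrs_sess and struct ibtrs_con. */
struct demo_sess { const char *sessname; };
struct demo_con  { struct demo_sess *sess; };

/*
 * Pick the session name based on the static type of 'obj'.
 * The branch that is not chosen is never evaluated, but it still has to
 * compile, hence the explicit casts.
 */
#define demo_prefix(obj)						\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(obj),		\
					     struct demo_con *),	\
		((struct demo_con *)(obj))->sess->sessname,		\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(obj),		\
					     struct demo_sess *),	\
		((struct demo_sess *)(obj))->sessname,			\
		(const char *)0))

/* Prefix every message with "<sessname>: ". */
#define demo_log(obj, fmt, ...)						\
	printf("<%s>: " fmt, demo_prefix(obj), ##__VA_ARGS__)

/* The same macro accepts a connection or a session pointer. */
static void demo_report(struct demo_con *con, int err)
{
	assert(con && con->sess);
	demo_log(con, "request failed, err: %d\n", err);
	demo_log(con->sess, "session-level message\n");
}
```

Because the choice is made by the compiler, there is no run-time type tag and no cost beyond the member access itself.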

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-log.h |  94 ++
 drivers/infiniband/ulp/ibtrs/ibtrs-pri.h | 494 +++
 2 files changed, 588 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-log.h b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h
new file mode 100644
index ..308593785c64
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h
@@ -0,0 +1,94 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_LOG_H
+#define IBTRS_LOG_H
+
+#define P1 )
+#define P2 ))
+#define P3 )))
+#define P4 ))))
+#define P(N) P ## N
+
+#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
+#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__
+
+#define COUNT_ARGS(...) COUNT_ARGS_(,##__VA_ARGS__,6,5,4,3,2,1,0)
+#define COUNT_ARGS_(z,a,b,c,d,e,f,cnt,...) cnt
+
+#define LIST(...)  \
+   __VA_ARGS__,\
+   ({ unknown_type(); NULL; }) \
+   CAT(P, COUNT_ARGS(__VA_ARGS__)) \
+
+#define EMPTY()
+#define DEFER(id) id EMPTY()
+
+#define _CASE(obj, type, member)   \
+   __builtin_choose_expr(  \
+   __builtin_types_compatible_p(   \
+   typeof(obj), type), \
+   ((type)obj)->member
+#define CASE(o, t, m) DEFER(_CASE)(o,t,m)
+
+/*
+ * Below we define retrieving of sessname from common IBTRS types.
+ * Client or server related types have to be defined by special
+ * TYPES_TO_SESSNAME macro.
+ */
+
+void unknown_type(void);
+
+#ifndef TYPES_TO_SESSNAME
+#define TYPES_TO_SESSNAME(...) ({ unknown_type(); NULL; })
+#endif
+
+#define ibtrs_prefix(obj)  \
+   _CASE(obj, struct ibtrs_con *,  sess->sessname),\
+   _CASE(obj, struct ibtrs_sess *, sessname),  \
+   TYPES_TO_SESSNAME(obj)  \
+   ))
+
+#define ibtrs_log(fn, obj, fmt, ...)   \
+   fn("<%s>: " fmt, ibtrs_prefix(obj), ##__VA_ARGS__)
+
+#define ibtrs_err(obj, fmt, ...)   \
+   ibtrs_log(pr_err, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_err_rl(obj, fmt, ...)\
+   ibtrs_log(pr_err_ratelimited, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_wrn(obj, fmt, ...)   \
+   ibtrs_log(pr_warn, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_wrn_rl(obj, fmt, ...) \
+   ibtrs_log(pr_warn_ratelimited, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_info(obj, fmt, ...) \
+   ibtrs_log(pr_info, obj, fmt, ##__VA_ARGS__)
+#define ibtrs_info_rl(obj, fmt, ...) \
+   ibtrs_log(pr_info_ratelimited, obj, fmt, ##__VA_ARGS__)
+
+#endif /* IBTRS_LOG_H */
diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h
new file mode 100644
index ..b3b51af8607e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h
@@ -0,0 +1,494 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <dan

[PATCH 04/24] ibtrs: client: private header with client structs and functions

2018-02-02 Thread Roman Pen
This header describes the main structs and functions used by the
ibtrs-client module, mainly for managing IBTRS sessions,
creating/destroying sysfs entries and accounting statistics on the
client side.
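
The header sizes the scatter-gather histogram with MIN_LOG_SG, MAX_LOG_SG, MAX_LIN_SG and SG_DISTR_SZ, which evaluates to 9 buckets. The sketch below shows one bucketing that fits those constants: linear buckets for 0 to 4 entries, then one bucket per power of two, with everything larger clamped into the last bucket. The mapping itself is an assumption made for illustration and is not taken from this header; sg_bucket and ilog2_u32 are hypothetical names.

```c
#include <assert.h>

/* Same values as in the header; BIT(n) is written out as a shift. */
#define MIN_LOG_SG 2
#define MAX_LOG_SG 5
#define MAX_LIN_SG (1U << MIN_LOG_SG)
#define SG_DISTR_SZ (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 2)

/* Integer log2 for v > 0 (user-space stand-in for the kernel's ilog2). */
static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;
	return r;
}

/*
 * Map a scatter-gather entry count to a histogram bucket (assumed scheme):
 *   0..4   -> buckets 0..4 (linear part)
 *   5..7   -> bucket 5
 *   8..15  -> bucket 6
 *   16..31 -> bucket 7
 *   32 and above -> bucket 8 (last bucket, clamped)
 */
static unsigned int sg_bucket(unsigned int cnt)
{
	unsigned int i;

	if (cnt <= MAX_LIN_SG)
		i = cnt;
	else
		i = ilog2_u32(cnt) + MAX_LIN_SG - MIN_LOG_SG + 1;

	return i < SG_DISTR_SZ ? i : SG_DISTR_SZ - 1;
}
```

A mixed linear and logarithmic layout keeps exact counts for the short lists that dominate small I/O while bounding the table size for large requests.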

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h | 338 +++
 1 file changed, 338 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
new file mode 100644
index ..b57af19ac833
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
@@ -0,0 +1,338 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_CLT_H
+#define IBTRS_CLT_H
+
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_clt_state - Client states.
+ */
+enum ibtrs_clt_state {
+   IBTRS_CLT_CONNECTING,
+   IBTRS_CLT_CONNECTING_ERR,
+   IBTRS_CLT_RECONNECTING,
+   IBTRS_CLT_CONNECTED,
+   IBTRS_CLT_CLOSING,
+   IBTRS_CLT_CLOSED,
+   IBTRS_CLT_DEAD,
+};
+
+static inline const char *ibtrs_clt_state_str(enum ibtrs_clt_state state)
+{
+   switch (state) {
+   case IBTRS_CLT_CONNECTING:
+   return "IBTRS_CLT_CONNECTING";
+   case IBTRS_CLT_CONNECTING_ERR:
+   return "IBTRS_CLT_CONNECTING_ERR";
+   case IBTRS_CLT_RECONNECTING:
+   return "IBTRS_CLT_RECONNECTING";
+   case IBTRS_CLT_CONNECTED:
+   return "IBTRS_CLT_CONNECTED";
+   case IBTRS_CLT_CLOSING:
+   return "IBTRS_CLT_CLOSING";
+   case IBTRS_CLT_CLOSED:
+   return "IBTRS_CLT_CLOSED";
+   case IBTRS_CLT_DEAD:
+   return "IBTRS_CLT_DEAD";
+   default:
+   return "UNKNOWN";
+   }
+}
+
+enum ibtrs_fast_reg {
+   IBTRS_FAST_MEM_NONE,
+   IBTRS_FAST_MEM_FR,
+   IBTRS_FAST_MEM_FMR
+};
+
+enum ibtrs_mp_policy {
+   MP_POLICY_RR,
+   MP_POLICY_MIN_INFLIGHT,
+};
+
+struct ibtrs_clt_stats_reconnects {
+   int successful_cnt;
+   int fail_cnt;
+};
+
+struct ibtrs_clt_stats_wc_comp {
+   u32 cnt;
+   u64 total_cnt;
+};
+
+struct ibtrs_clt_stats_cpu_migr {
+   atomic_t from;
+   int to;
+};
+
+struct ibtrs_clt_stats_rdma {
+   struct {
+   u64 cnt;
+   u64 size_total;
+   } dir[2];
+
+   u64 failover_cnt;
+};
+
+struct ibtrs_clt_stats_rdma_lat {
+   u64 read;
+   u64 write;
+};
+
+#define MIN_LOG_SG 2
+#define MAX_LOG_SG 5
+#define MAX_LIN_SG BIT(MIN_LOG_SG)
+#define SG_DISTR_SZ (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 2)
+
+#define MAX_LOG_LAT 16
+#define MIN_LOG_LAT 0
+#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2)
+
+struct ibtrs_clt_stats_pcpu {
+   struct ibtrs_clt_stats_cpu_migr cpu_migr;
+   struct ibtrs_clt_stats_rdma rdma;
+   u64 sg_list_total;
+   u64 sg_list_distr[SG_DISTR_SZ];
+   struct ibtrs_clt_stats_rdma_lat rdma_lat_distr[LOG_LAT_SZ];
+   struct ibtrs_clt_stats_rdma_lat rdma_lat_max;
+   struct ibtrs_clt_stats_wc_comp  wc_comp;
+};
+
+struct ibtrs_clt_stats {
+   bool                                  enable_rdma_lat;
+   struct ibtrs_clt_stats_pcpu __percpu  *pcpu_stats;
+   struct ibtrs_clt_stats_reconnects     reconnects;
+   atomic_t                              inflight;
+};
+
+struct ibtrs_clt_con {
+   struct ibtrs_con  

[PATCH 01/24] ibtrs: public interface header to establish RDMA connections

2018-02-02 Thread Roman Pen
Introduce a public header which provides a set of API functions to
establish RDMA connections from a client to a server machine using the
IBTRS protocol, which manages the RDMA connections for each session and
does multipathing and load balancing.

Main functions for client (active) side:

 ibtrs_clt_open()    - Creates a set of RDMA connections encapsulated
                       in an IBTRS session and returns a pointer to
                       the IBTRS session object.
 ibtrs_clt_close()   - Closes the RDMA connections associated with the
                       IBTRS session.
 ibtrs_clt_request() - Requests a zero-copy RDMA transfer to/from
                       the server.

Main functions for server (passive) side:

 ibtrs_srv_open()  - Starts listening for IBTRS clients on the specified
                     port and invokes IBTRS callbacks for incoming
                     RDMA requests or link events.
 ibtrs_srv_close() - Closes the IBTRS server context.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs.h | 331 +++
 1 file changed, 331 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.h b/drivers/infiniband/ulp/ibtrs/ibtrs.h
new file mode 100644
index ..747cdde3d9cf
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.h
@@ -0,0 +1,331 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_H
+#define IBTRS_H
+
+#include 
+#include 
+
+struct ibtrs_clt;
+struct ibtrs_srv_ctx;
+struct ibtrs_srv;
+struct ibtrs_srv_op;
+
+/*
+ * Here goes IBTRS client API
+ */
+
+/**
+ * enum ibtrs_clt_link_ev - Events about connectivity state of a client
+ * @IBTRS_CLT_LINK_EV_RECONNECTED  Client was reconnected.
+ * @IBTRS_CLT_LINK_EV_DISCONNECTED Client was disconnected.
+ */
+enum ibtrs_clt_link_ev {
+   IBTRS_CLT_LINK_EV_RECONNECTED,
+   IBTRS_CLT_LINK_EV_DISCONNECTED,
+};
+
+/**
+ * Source and destination address of a path to be established
+ */
+struct ibtrs_addr {
+   struct sockaddr *src;
+   struct sockaddr *dst;
+};
+
+typedef void (link_clt_ev_fn)(void *priv, enum ibtrs_clt_link_ev ev);
+/**
+ * ibtrs_clt_open() - Open a session to an IBTRS server
+ * @priv:      User supplied private data.
+ * @link_ev:   Event notification for connection state changes
+ *     @priv:  User supplied data that was passed to
+ *             ibtrs_clt_open()
+ *     @ev:    Occurred event
+ * @sessname:  Name of the session
+ * @paths:     Paths to be established, defined by their src and dst addresses
+ * @path_cnt:  Number of elements in the @paths array
+ * @port:      Port to be used by the IBTRS session
+ * @pdu_sz:    Size of extra payload which can be accessed after tag allocation.
+ * @max_inflight_msg: Max. number of parallel inflight messages for the session
+ * @max_segments: Max. number of segments per IO request
+ * @reconnect_delay_sec: Time between reconnect tries
+ * @max_reconnect_attempts: Number of times to reconnect on error before giving
+ *             up, 0 for disabled, -1 for forever
+ *
+ * Starts session establishment with the ibtrs_server. The function can block
+ * up to ~2000ms until it returns.
+ *
+ * Return a valid pointer on success otherwise PTR_ERR.
+ */
+struct ibtrs_clt *ibtrs_clt_open(void *priv, link_clt_ev_fn *link_ev,
+const char *sessname,
+const struct ibtrs_addr *paths,
+size_t path_cnt, short port,
+size_t pdu_sz, u

[PATCH 08/24] ibtrs: server: private header with server structs and functions

2018-02-02 Thread Roman Pen
This header describes the main structs and functions used by the
ibtrs-server module, mainly for accepting IBTRS sessions,
creating/destroying sysfs entries and accounting statistics on the
server side.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h | 169 +++
 1 file changed, 169 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
new file mode 100644
index ..f54e159eaf2a
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
@@ -0,0 +1,169 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_SRV_H
+#define IBTRS_SRV_H
+
+#include 
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_srv_state - Server states.
+ */
+enum ibtrs_srv_state {
+   IBTRS_SRV_CONNECTING,
+   IBTRS_SRV_CONNECTED,
+   IBTRS_SRV_CLOSING,
+   IBTRS_SRV_CLOSED,
+};
+
+static inline const char *ibtrs_srv_state_str(enum ibtrs_srv_state state)
+{
+   switch (state) {
+   case IBTRS_SRV_CONNECTING:
+   return "IBTRS_SRV_CONNECTING";
+   case IBTRS_SRV_CONNECTED:
+   return "IBTRS_SRV_CONNECTED";
+   case IBTRS_SRV_CLOSING:
+   return "IBTRS_SRV_CLOSING";
+   case IBTRS_SRV_CLOSED:
+   return "IBTRS_SRV_CLOSED";
+   default:
+   return "UNKNOWN";
+   }
+}
+
+struct ibtrs_stats_wc_comp {
+   atomic64_t  calls;
+   atomic64_t  total_wc_cnt;
+};
+
+struct ibtrs_srv_stats_rdma_stats {
+   struct {
+   atomic64_t  cnt;
+   atomic64_t  size_total;
+   } dir[2];
+};
+
+struct ibtrs_srv_stats {
+   struct ibtrs_srv_stats_rdma_stats   rdma_stats;
+   atomic_t                            apm_cnt;
+   struct ibtrs_stats_wc_comp          wc_comp;
+};
+
+struct ibtrs_srv_con {
+   struct ibtrs_con    c;
+   atomic_t            wr_cnt;
+};
+
+struct ibtrs_srv_op {
+   struct ibtrs_srv_con        *con;
+   u32                         msg_id;
+   u8                          dir;
+   u64                         data_dma_addr;
+   struct ibtrs_msg_rdma_read  *msg;
+   struct ib_rdma_wr           *tx_wr;
+   struct ib_sge               *tx_sg;
+};
+
+struct ibtrs_srv_sess {
+   struct ibtrs_sess       s;
+   struct ibtrs_srv        *srv;
+   struct work_struct      close_work;
+   enum ibtrs_srv_state    state;
+   spinlock_t              state_lock;
+   int                     cur_cq_vector;
+   struct ibtrs_srv_op     **ops_ids;
+   atomic_t                ids_inflight;
+   wait_queue_head_t       ids_waitq;
+   dma_addr_t              *rdma_addr;
+   bool                    established;
+   unsigned int            mem_bits;
+   struct kobject          kobj;
+   struct kobject          kobj_stats;
+   struct ibtrs_srv_stats  stats;
+};
+
+struct ibtrs_srv {
+   struct list_head        paths_list;
+   int                     paths_up;
+   struct mutex            paths_ev_mutex;
+   size_t                  paths_num;
+   struct mutex            paths_mutex;
+   uuid_t                  paths_uuid;
+   refcount_t              refcount;
+   struct ibtrs_srv_ctx    *ctx;
+   struct list_head        ctx_list;
+   void                    *priv;
+   size_t                  queue_depth;
+

[PATCH 09/24] ibtrs: server: main functionality

2018-02-02 Thread Roman Pen
This is the main functionality of the ibtrs-server module, which accepts
a set of RDMA connections (a so-called IBTRS session), creates/destroys
the sysfs entries associated with the IBTRS session and notifies the
upper layer (the user of the IBTRS API) about RDMA requests or link
events.
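
The max_io_size module parameter in this patch is validated before it is accepted: the value must be at least 4096, the value plus MAX_REQ_SIZE must not exceed 4 MiB, and that sum must be a multiple of 512. A user-space sketch of the same checks follows. It assumes 4 KiB pages for the MAX_REQ_SIZE stand-in and uses strtol() in place of kstrtoint(); parse_max_io_size and the macro names are hypothetical. Note that the check as written accepts exactly 4096, although the error message in the patch reads "> 4096".

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumption: stands in for MAX_REQ_SIZE (PAGE_SIZE) with 4 KiB pages. */
#define REQ_SIZE     4096L
#define MIN_IO_SIZE  4096L
#define MAX_BUF_SIZE (4096L * 1024)

/*
 * Parse 'val' and apply the same three range checks as the patch.
 * Returns 0 and stores the value in *out on success, -EINVAL otherwise.
 */
static int parse_max_io_size(const char *val, long *out)
{
	char *end;
	long ival;

	assert(val && out);

	errno = 0;
	ival = strtol(val, &end, 0);
	if (errno || end == val || *end != '\0')
		return -EINVAL;

	/* The upper bound is compared before adding, to avoid overflow. */
	if (ival < MIN_IO_SIZE ||
	    ival > MAX_BUF_SIZE - REQ_SIZE ||
	    (ival + REQ_SIZE) % 512 != 0)
		return -EINVAL;

	*out = ival;
	return 0;
}
```

With these assumptions the default of 128 KiB (131072) passes, while 5000 is rejected because 5000 + 4096 is not a multiple of 512.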

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c | 1811 ++
 1 file changed, 1811 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
new file mode 100644
index ..0d1fc08bd821
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
@@ -0,0 +1,1811 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-srv.h"
+#include "ibtrs-log.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Server");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+#define DEFAULT_MAX_IO_SIZE_KB 128
+#define DEFAULT_MAX_IO_SIZE (DEFAULT_MAX_IO_SIZE_KB * 1024)
+#define MAX_REQ_SIZE PAGE_SIZE
+#define MAX_SG_COUNT ((MAX_REQ_SIZE - sizeof(struct ibtrs_msg_rdma_read)) \
+ / sizeof(struct ibtrs_sg_desc))
+
+static int max_io_size = DEFAULT_MAX_IO_SIZE;
+static int rcv_buf_size = DEFAULT_MAX_IO_SIZE + MAX_REQ_SIZE;
+
+static int max_io_size_set(const char *val, const struct kernel_param *kp)
+{
+   int err, ival;
+
+   err = kstrtoint(val, 0, &ival);
+   if (err)
+   return err;
+
+   if (ival < 4096 || ival + MAX_REQ_SIZE > (4096 * 1024) ||
+   (ival + MAX_REQ_SIZE) % 512 != 0) {
+   pr_err("Invalid max io size value %d, has to be"
+  " > %d, < %d\n", ival, 4096, 4194304);
+   return -EINVAL;
+   }
+
+   max_io_size = ival;
+   rcv_buf_size = max_io_size + MAX_REQ_SIZE;
+   pr_info("max io size changed to %d\n", ival);
+
+   return 0;
+}
+
+static const struct kernel_param_ops max_io_size_ops = {
+   .set= max_io_size_set,
+   .get= param_get_int,
+};
+module_param_cb(max_io_size, &max_io_size_ops, &max_io_size, 0444);
+MODULE_PARM_DESC(max_io_size,
+"Max size for each IO request, when change the unit is in byte"
+" (default: " __stringify(DEFAULT_MAX_IO_SIZE_KB) "KB)");
+
+#define DEFAULT_SESS_QUEUE_DEPTH 512
+static int sess_queue_depth = DEFAULT_SESS_QUEUE_DEPTH;
+module_param_named(sess_queue_depth, sess_queue_depth, int, 0444);
+MODULE_PARM_DESC(sess_queue_depth,
+"Number of buffers for pending I/O requests to allocate"
+" per session. Maximum: " __stringify(MAX_SESS_QUEUE_DEPTH)
+" (default: " __stringify(DEFAULT_SESS_QUEUE_DEPTH) ")");
+
+/* We guarantee to serve 10 paths at least */
+#define CHUNK_POOL_SIZE (DEFAULT_SESS_QUEUE_DEPTH * 10)
+static mempool_t *chunk_pool;
+
+static int retry_count = 7;
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+   int err, ival;
+
+   err = kstrtoint(val, 0, &ival);
+   if (err)
+   return err;
+
+   if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT) {
+   pr_err("Invalid retry count value %d, has to be"
+  &quo

[PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-02-02 Thread Roman Pen
This series introduces IBNBD/IBTRS modules.

IBTRS (InfiniBand Transport) is a reliable high-speed transport library
which allows establishing connections between client and server
machines via RDMA. It is optimized to transfer (read/write) IO blocks
in the sense that it follows the BIO semantics of providing the
possibility to either write data from a scatter-gather list to the
remote side or to request ("read") data transfer from the remote side
into a given set of buffers.

IBTRS is multipath capable and provides I/O fail-over and load-balancing
functionality.
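
To make the load-balancing part concrete, here is a minimal user-space sketch of two path selection policies, round-robin and minimum inflight, which mirror the MP_POLICY_RR and MP_POLICY_MIN_INFLIGHT names in ibtrs-clt.h from this series. Everything else (struct path, struct session, pick_path) is hypothetical and only illustrates the idea; it is not the selection code of the client module. Paths that are down are skipped, which is the fail-over part.

```c
#include <assert.h>
#include <stddef.h>

enum mp_policy { POLICY_RR, POLICY_MIN_INFLIGHT };

struct path {
	int connected;	/* 0 = path is down and must be skipped */
	int inflight;	/* requests currently outstanding on this path */
};

struct session {
	struct path *paths;
	size_t path_cnt;
	size_t rr_next;		/* where the next round-robin scan starts */
	enum mp_policy policy;
};

/* Return the index of the chosen path, or -1 if no path is connected. */
static int pick_path(struct session *s)
{
	int best = -1;

	assert(s && s->paths && s->path_cnt > 0);

	if (s->policy == POLICY_RR) {
		/* Scan at most path_cnt entries, starting at rr_next. */
		for (size_t i = 0; i < s->path_cnt; i++) {
			size_t idx = (s->rr_next + i) % s->path_cnt;

			if (s->paths[idx].connected) {
				s->rr_next = (idx + 1) % s->path_cnt;
				return (int)idx;
			}
		}
		return -1;
	}

	/* POLICY_MIN_INFLIGHT: least loaded connected path wins. */
	for (size_t i = 0; i < s->path_cnt; i++) {
		if (!s->paths[i].connected)
			continue;
		if (best < 0 || s->paths[i].inflight < s->paths[best].inflight)
			best = (int)i;
	}
	return best;
}
```

Round-robin spreads requests evenly regardless of load, while the minimum-inflight policy favours the path that is draining fastest, which tends to help when paths differ in latency.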

IBNBD (InfiniBand Network Block Device) is a pair of kernel modules
(client and server) that allow for remote access of a block device on
the server over IBTRS protocol. After being mapped, the remote block
devices can be accessed on the client side as local block devices.
Internally IBNBD uses IBTRS as an RDMA transport library.

Why?

   - IBNBD/IBTRS is developed in order to map thin provisioned volumes,
     thus the internal protocol is simple and consists of only several
     request types, without awareness of the underlying hardware devices.
   - IBTRS was developed as an independent RDMA transport library, which
     supports fail-over and load-balancing policies using multipath, thus
     it can be used for other IO needs and not only for block devices.
   - IBNBD/IBTRS is faster than NVME over RDMA.  Old comparison results:
     https://www.spinics.net/lists/linux-rdma/msg48799.html
     (I retested on the latest 4.14 kernel and there is no significant
      difference, thus I post the old link).

Key features of IBTRS transport library and IBNBD block device:

o High throughput and low latency due to:
   - Only two RDMA messages per IO.
   - IMM InfiniBand messages on responses to reduce round trip latency.
   - Simplified memory management: memory allocation happens once on
 server side when IBTRS session is established.

o IO fail-over and load-balancing by using multipath.

o Simple configuration of IBNBD:
   - Server side is completely passive: volumes do not need to be
 explicitly exported.
   - Only IB port GID and device path needed on client side to map
 a block device.
   - A device is remapped automatically, e.g. after a storage reboot.

This series is the second attempt; the first version was published [1]
and presented at Vault in 2017 [2].

Since the first version the following was changed:

   - Load-balancing and IO fail-over using multipath features were added.
   - Major parts of the code were rewritten, simplified and overall code
 size was reduced by a quarter.

Commits for kernel can be found here:
   https://github.com/profitbricks/ibnbd/commits/linux-4.15-rc8

The out-of-tree modules are here:
   https://github.com/profitbricks/ibnbd/

[1] https://lwn.net/Articles/718181/
[2] http://events.linuxfoundation.org/sites/events/files/slides/IBNBD-Vault-2017.pdf

Roman Pen (24):
  ibtrs: public interface header to establish RDMA connections
  ibtrs: private headers with IBTRS protocol structs and helpers
  ibtrs: core: lib functions shared between client and server modules
  ibtrs: client: private header with client structs and functions
  ibtrs: client: main functionality
  ibtrs: client: statistics functions
  ibtrs: client: sysfs interface functions
  ibtrs: server: private header with server structs and functions
  ibtrs: server: main functionality
  ibtrs: server: statistics functions
  ibtrs: server: sysfs interface functions
  ibtrs: include client and server modules into kernel compilation
  ibtrs: a bit of documentation
  ibnbd: private headers with IBNBD protocol structs and helpers
  ibnbd: client: private header with client structs and functions
  ibnbd: client: main functionality
  ibnbd: client: sysfs interface functions
  ibnbd: server: private header with server structs and functions
  ibnbd: server: main functionality
  ibnbd: server: functionality for IO submission to file or block dev
  ibnbd: server: sysfs interface functions
  ibnbd: include client and server modules into kernel compilation
  ibnbd: a bit of documentation
  MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

 MAINTAINERS|   14 +
 drivers/block/Kconfig  |2 +
 drivers/block/Makefile |1 +
 drivers/block/ibnbd/Kconfig|   22 +
 drivers/block/ibnbd/Makefile   |   13 +
 drivers/block/ibnbd/README |  272 ++
 drivers/block/ibnbd/ibnbd-clt-sysfs.c  |  723 +
 drivers/block/ibnbd/ibnbd-clt.c| 1959 +
 drivers/block/ibnbd/ibnbd-clt.h|  193 ++
 drivers/block/ibnbd/ibnbd-log.h|   71 +
 drivers/block/ibnbd/ibnbd-proto.h  |  360 +++
 drivers/block/ibnbd/ibnbd-srv-dev.c|  410 +++
 drivers/block/ibnbd/ibnbd-srv-dev.h|  149 +
 drivers/block/ibnbd/ibnbd-s

[PATCH 03/24] ibtrs: core: lib functions shared between client and server modules

2018-02-02 Thread Roman Pen
This is a set of library functions provided as the ibtrs-core module,
used by the client and server modules.

Mainly these functions wrap IB and RDMA calls and provide a slightly
higher level of abstraction for implementing the IBTRS protocol on the
client or server side.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs.c | 582 +++
 1 file changed, 582 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.c b/drivers/infiniband/ulp/ibtrs/ibtrs.c
new file mode 100644
index ..007380506959
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.c
@@ -0,0 +1,582 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-pri.h"
+#include "ibtrs-log.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Core");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+static LIST_HEAD(device_list);
+static DEFINE_MUTEX(device_list_mutex);
+
+struct ibtrs_iu *ibtrs_iu_alloc(u32 tag, size_t size, gfp_t gfp_mask,
+   struct ib_device *dma_dev,
+   enum dma_data_direction direction,
+   void (*done)(struct ib_cq *cq,
+struct ib_wc *wc))
+{
+   struct ibtrs_iu *iu;
+
+   iu = kmalloc(sizeof(*iu), gfp_mask);
+   if (unlikely(!iu))
+   return NULL;
+
+   iu->buf = kzalloc(size, gfp_mask);
+   if (unlikely(!iu->buf))
+   goto err1;
+
+   iu->dma_addr = ib_dma_map_single(dma_dev, iu->buf, size, direction);
+   if (unlikely(ib_dma_mapping_error(dma_dev, iu->dma_addr)))
+   goto err2;
+
+   iu->cqe.done  = done;
+   iu->size  = size;
+   iu->direction = direction;
+   iu->tag   = tag;
+
+   return iu;
+
+err2:
+   kfree(iu->buf);
+err1:
+   kfree(iu);
+
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_alloc);
+
+void ibtrs_iu_free(struct ibtrs_iu *iu, enum dma_data_direction dir,
+  struct ib_device *ibdev)
+{
+   if (!iu)
+   return;
+
+   ib_dma_unmap_single(ibdev, iu->dma_addr, iu->size, dir);
+   kfree(iu->buf);
+   kfree(iu);
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_free);
+
+int ibtrs_iu_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu)
+{
+   struct ibtrs_sess *sess = con->sess;
+   struct ib_recv_wr wr, *bad_wr;
+   struct ib_sge list;
+
+   list.addr   = iu->dma_addr;
+   list.length = iu->size;
+   list.lkey   = sess->ib_dev->lkey;
+
+   if (WARN_ON(list.length == 0)) {
+   ibtrs_wrn(con, "Posting receive work request failed,"
+ " sg list is empty\n");
+   return -EINVAL;
+   }
+
+   wr.next= NULL;
+   wr.wr_cqe  = &iu->cqe;
+   wr.sg_list = &list;
+   wr.num_sge = 1;
+
+   return ib_post_recv(con->qp, &wr, &bad_wr);
+}
+EXPORT_SYMBOL_GPL(ibtrs_iu_post_recv);
+
+int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe)
+{
+   struct ib_recv_wr wr, *bad_wr;
+
+   wr.next= NULL;
+   wr.wr_cqe  = cqe;
+   wr.sg_list = NULL;
+   wr.num_sge = 0;
+
+   return ib_post_recv(con->qp, &wr, &bad_wr);
+}
+EXPORT_SYMBOL_GPL(ibtrs_post_recv_empty);
+
+in

[PATCH 06/24] ibtrs: client: statistics functions

2018-02-02 Thread Roman Pen
This introduces a set of functions used on the client side to account
statistics of RDMA data sent/received, number of IOs in flight,
latency, CPU migrations, etc.  Almost all statistics are collected
using percpu variables.
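
A minimal userspace sketch of the percpu accounting idea, assuming a
fixed CPU count and plain arrays instead of the kernel percpu allocator.
All names here (NR_CPUS_SKETCH, stats_account, stats_total_bytes) are
hypothetical and are not part of the IBTRS code: each CPU updates only
its own slot on the hot path, and a reader sums the slots on demand.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for this model */

/* One slot per CPU; a CPU only ever writes to its own slot. */
struct stats_pcpu {
	uint64_t rdma_bytes;
	uint64_t wc_cnt;
};

struct stats {
	struct stats_pcpu pcpu[NR_CPUS_SKETCH];
};

/* Hot path: account a completion on the given CPU, no atomics needed. */
static void stats_account(struct stats *st, int cpu, uint64_t bytes)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	st->pcpu[cpu].rdma_bytes += bytes;
	st->pcpu[cpu].wc_cnt++;
}

/* Slow path (for example a sysfs read): fold all slots into one total. */
static uint64_t stats_total_bytes(const struct stats *st)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += st->pcpu[cpu].rdma_bytes;

	return sum;
}
```

The trade-off is the usual one for percpu counters: updates are cheap
and contention-free, while a read has to walk every CPU slot and may
observe a slightly stale total.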

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c | 455 +
 1 file changed, 455 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
new file mode 100644
index ..af2ed05d2900
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
@@ -0,0 +1,455 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-clt.h"
+
+static inline int ibtrs_clt_ms_to_id(unsigned long ms)
+{
+   int id = ms ? ilog2(ms) - MIN_LOG_LAT + 1 : 0;
+
+   return clamp(id, 0, LOG_LAT_SZ - 1);
+}
+
+void ibtrs_clt_update_rdma_lat(struct ibtrs_clt_stats *stats, bool read,
+  unsigned long ms)
+{
+   struct ibtrs_clt_stats_pcpu *s;
+   int id;
+
+   id = ibtrs_clt_ms_to_id(ms);
+   s = this_cpu_ptr(stats->pcpu_stats);
+   if (read) {
+   s->rdma_lat_distr[id].read++;
+   if (s->rdma_lat_max.read < ms)
+   s->rdma_lat_max.read = ms;
+   } else {
+   s->rdma_lat_distr[id].write++;
+   if (s->rdma_lat_max.write < ms)
+   s->rdma_lat_max.write = ms;
+   }
+}
+
+void ibtrs_clt_decrease_inflight(struct ibtrs_clt_stats *stats)
+{
+   atomic_dec(&stats->inflight);
+}
+
+void ibtrs_clt_update_wc_stats(struct ibtrs_clt_con *con)
+{
+   struct ibtrs_clt_sess *sess = to_clt_sess(con->c.sess);
+   struct ibtrs_clt_stats *stats = &sess->stats;
+   struct ibtrs_clt_stats_pcpu *s;
+   int cpu;
+
+   cpu = raw_smp_processor_id();
+   s = this_cpu_ptr(stats->pcpu_stats);
+   s->wc_comp.cnt++;
+   s->wc_comp.total_cnt++;
+   if (unlikely(con->cpu != cpu)) {
+   s->cpu_migr.to++;
+
+   /* Careful here, override s pointer */
+   s = per_cpu_ptr(stats->pcpu_stats, con->cpu);
+   atomic_inc(&s->cpu_migr.from);
+   }
+}
+
+void ibtrs_clt_inc_failover_cnt(struct ibtrs_clt_stats *stats)
+{
+   struct ibtrs_clt_stats_pcpu *s;
+
+   s = this_cpu_ptr(stats->pcpu_stats);
+   s->rdma.failover_cnt++;
+}
+
+static inline u32 ibtrs_clt_stats_get_avg_wc_cnt(struct ibtrs_clt_stats *stats)
+{
+   u32 cnt = 0;
+   u64 sum = 0;
+   int cpu;
+
+   for_each_possible_cpu(cpu) {
+   struct ibtrs_clt_stats_pcpu *s;
+
+   s = per_cpu_ptr(stats->pcpu_stats, cpu);
+   sum += s->wc_comp.total_cnt;
+   cnt += s->wc_comp.cnt;
+   }
+
+   return cnt ? sum / cnt : 0;
+}
+
+int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_clt_stats *stats,
+char *buf, size_t len)
+{
+   return scnprintf(buf, len, "%u\n",
+ibtrs_clt_stats_get_avg_wc_cnt(stats));
+}
+
+ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_clt_stats *stats,
+ char *page, size_t len)
+{
+   struct ibtrs_clt_stats_rdma_lat res[LOG_LAT_SZ];
+   struct ibtrs_clt_stats_rdma_lat max;
+   struct 

[PATCH 05/24] ibtrs: client: main functionality

2018-02-02 Thread Roman Pen
This is the main functionality of the ibtrs-client module, which manages
a set of RDMA connections for each IBTRS session and does multipathing,
load balancing and failover of RDMA requests.

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c | 3496 ++
 1 file changed, 3496 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
new file mode 100644
index ..aa0a17f2a78c
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
@@ -0,0 +1,3496 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include 
+#include 
+
+#include "ibtrs-clt.h"
+#include "ibtrs-log.h"
+
+#define RECONNECT_SEED 8
+#define MAX_SEGMENTS 31
+
+#define IBTRS_CONNECT_TIMEOUT_MS 5000
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Client");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+static bool use_fr;
+module_param(use_fr, bool, 0444);
+MODULE_PARM_DESC(use_fr, "use FRWR mode for memory registration if possible."
+" (default: 0)");
+
+static ushort nr_cons_per_session;
+module_param(nr_cons_per_session, ushort, 0444);
+MODULE_PARM_DESC(nr_cons_per_session, "Number of connections per session."
+" (default: nr_cpu_ids)");
+
+static int retry_count = 7;
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+   int err, ival;
+
+   err = kstrtoint(val, 0, &ival);
+   if (err)
+   return err;
+
+   if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT)
+   return -EINVAL;
+
+   retry_count = ival;
+
+   return 0;
+}
+
+static const struct kernel_param_ops retry_count_ops = {
+   .set= retry_count_set,
+   .get= param_get_int,
+};
+module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644);
+
+MODULE_PARM_DESC(retry_count, "Number of times to send the message if the"
+" remote side didn't respond with Ack or Nack (default: 7,"
+" min: " __stringify(MIN_RTR_CNT) ", max: "
+__stringify(MAX_RTR_CNT) ")");
+
+static int fmr_sg_cnt = 4;
+module_param_named(fmr_sg_cnt, fmr_sg_cnt, int, 0644);
+MODULE_PARM_DESC(fmr_sg_cnt, "when sg_cnt is bigger than fmr_sg_cnt, enable"
+" FMR (default: 4)");
+
+static struct workqueue_struct *ibtrs_wq;
+
+static void ibtrs_rdma_error_recovery(struct ibtrs_clt_con *con);
+static void ibtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc);
+
+static inline void ibtrs_clt_state_lock(void)
+{
+   rcu_read_lock();
+}
+
+static inline void ibtrs_clt_state_unlock(void)
+{
+   rcu_read_unlock();
+}
+
+#define cmpxchg_min(var, new) ({   \
+   typeof(var) old;\
+   \
+   do {\
+   old = var;  \
+   new = (!old ? new : min_t(typeof(var), old, new));  \
+   } while (cmpxchg(&var, old, new) != old);   \
+})
+
+static void ibtrs_clt_set_min_queue_depth(struct ibtrs_clt *clt, size_t 

[PATCH 07/24] ibtrs: client: sysfs interface functions

2018-02-02 Thread Roman Pen
This is the sysfs interface to IBTRS sessions on client side:

  /sys/kernel/ibtrs_client//
    *** IBTRS session created by ibtrs_clt_open() API call
    |
    |- max_reconnect_attempts
    |  *** number of reconnect attempts for session
    |
    |- add_path
    |  *** adds another connection path into IBTRS session
    |
    |- paths//
       *** established paths to server in a session
       |
       |- disconnect
       |  *** disconnect path
       |
       |- reconnect
       |  *** reconnect path
       |
       |- remove_path
       |  *** remove current path
       |
       |- state
       |  *** retrieve current path state
       |
       |- stats/
          *** current path statistics
          |
          |- cpu_migration
          |- rdma
          |- rdma_lat
          |- reconnects
          |- reset_all
          |- sg_entries
          |- wc_completions

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Signed-off-by: Danil Kipnis <danil.kip...@profitbricks.com>
Cc: Jack Wang <jinpu.w...@profitbricks.com>
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c | 519 +
 1 file changed, 519 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
new file mode 100644
index ..04949d6d796b
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
@@ -0,0 +1,519 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler <m...@fholler.de>
+ *  Jack Wang <jinpu.w...@profitbricks.com>
+ *  Kleber Souza <kleber.so...@profitbricks.com>
+ *  Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *  Milind Dumbare <milind.dumb...@gmail.com>
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis <danil.kip...@profitbricks.com>
+ *  Roman Penyaev <roman.peny...@profitbricks.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-pri.h"
+#include "ibtrs-clt.h"
+#include "ibtrs-log.h"
+
+static struct kobject *ibtrs_kobj;
+
+#define MIN_MAX_RECONN_ATT -1
+#define MAX_MAX_RECONN_ATT 
+
+static struct kobj_type ktype = {
+   .sysfs_ops = &kobj_sysfs_ops,
+};
+
+static ssize_t ibtrs_clt_max_reconn_attempts_show(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ char *page)
+{
+   struct ibtrs_clt *clt;
+
+   clt = container_of(kobj, struct ibtrs_clt, kobj);
+
+   return sprintf(page, "%d\n", ibtrs_clt_get_max_reconnect_attempts(clt));
+}
+
+static ssize_t ibtrs_clt_max_reconn_attempts_store(struct kobject *kobj,
+  struct kobj_attribute *attr,
+  const char *buf,
+  size_t count)
+{
+   struct ibtrs_clt *clt;
+   int value;
+   int ret;
+
+   clt = container_of(kobj, struct ibtrs_clt, kobj);
+
+   ret = kstrtoint(buf, 10, &value);
+   if (unlikely(ret)) {
+   ibtrs_err(clt, "%s: failed to convert string '%s' to int\n",
+ attr->attr.name, buf);
+   return ret;
+   }
+   if (unlikely(value > MAX_MAX_RECONN_ATT ||
+value < MIN_MAX_RECONN_ATT)) {
+   ibtrs_err(clt, "%s: invalid range"
+ " (provided: '%s', accepted: min: %d, max: %d)\n",
+ attr->attr.name, buf, MIN_MAX_RECONN_ATT,
+ MAX_MAX_RECONN_ATT);
+   return -EINVAL;
+   }
+   ibtrs_clt_set_max_reconnect_attempts(clt, value);
+
+   return count;
+}
+
+static struct kobj_attribute ibtrs_clt_max_reconnect_attempts_attr =
+   __ATTR(max_reconnect_attempts, 0644,
+  ibtrs_clt_max_reconn_attempts_show,
+  ibtrs_clt_max_reconn_attempts_store);
+
+static ssize_t ibtrs_clt_m

[PATCH 1/1] [RFC] blk-mq: fix queue stalling on shared hctx restart

2017-10-18 Thread Roman Pen
Hi all,

the patch below fixes queue stalling when a shared hctx is marked for
restart (BLK_MQ_S_SCHED_RESTART bit) but q->shared_hctx_restart stays zero.
The root cause is that hctxs are shared between queues, but
'shared_hctx_restart' belongs to a particular queue, which in fact may not
need to be restarted, thus we return from blk_mq_sched_restart() and leave
the shared hctx of another queue never restarted.

The fix is to make the shared_hctx_restart counter belong not to the queue
but to the tags, so that the counter reflects the real number of shared
hctxs that need to be restarted.

During tests 1 hctx (set->nr_hw_queues) was used and all stalled requests
were found in dd->fifo_list of the mq-deadline scheduler.

A possible sequence of events:

1. Request A of queue A is inserted into dd->fifo_list of the scheduler.

2. Request B of queue A bypasses scheduler and goes directly to
   hctx->dispatch.

3. Request C of queue B is inserted.

4. blk_mq_sched_dispatch_requests() is invoked; since hctx->dispatch is not
   empty (request B is in the list), the hctx is only marked for the next
   restart and request A is left in the list (see the comment "So it's best
   to leave them there for as long as we can. Mark the hw queue as needing a
   restart in that case." in blk-mq-sched.c).

5. Eventually request B is completed/freed and blk_mq_sched_restart() is
   called, but by chance hctx from queue B is chosen for restart and request C
   gets a chance to be dispatched.

6. Eventually request C is completed/freed and blk_mq_sched_restart() is
   called, but shared_hctx_restart for queue B is zero and we return without
   attempting to restart the hctx from queue A, thus request A is stuck forever.
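
The sequence above can be replayed with a small userspace model. This is
only a sketch under stated assumptions, not the blk-mq code: all toy_*
names are hypothetical, each queue is reduced to one restart flag, the
round-robin scan starts at the queue after the completing one, and the
hctxs of both queue A and queue B are assumed to be marked for restart
before the first completion. The per_tagset argument selects which
counter gates the early return.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_QUEUES 2	/* queue A is index 0, queue B is index 1 */

struct toy_queue {
	bool restart_marked;		/* models BLK_MQ_S_SCHED_RESTART */
	int shared_hctx_restart;	/* per-queue counter (stalls) */
};

struct toy_tagset {
	struct toy_queue q[NR_QUEUES];
	int shared_hctx_restart;	/* per-tag-set counter (the fix) */
};

/* Mark a queue's hctx as needing a restart and bump both counters. */
static void toy_mark_restart(struct toy_tagset *set, int qi)
{
	if (set->q[qi].restart_marked)
		return;
	set->q[qi].restart_marked = true;
	set->q[qi].shared_hctx_restart++;
	set->shared_hctx_restart++;
}

/*
 * Called when a request of queue 'completing' is freed.
 * Returns the index of the queue that got restarted, or -1 if none.
 */
static int toy_sched_restart(struct toy_tagset *set, int completing,
			     bool per_tagset)
{
	int pending = per_tagset ? set->shared_hctx_restart
				 : set->q[completing].shared_hctx_restart;

	/* Early return: "no hardware queues have RESTART marked". */
	if (!pending)
		return -1;

	/* Round-robin over the queues sharing the tag set. */
	for (int n = 1; n <= NR_QUEUES; n++) {
		int qi = (completing + n) % NR_QUEUES;

		if (!set->q[qi].restart_marked)
			continue;
		set->q[qi].restart_marked = false;
		set->q[qi].shared_hctx_restart--;
		set->shared_hctx_restart--;
		return qi;
	}

	return -1;
}
```

With the per-queue counter, a completion on queue A restarts queue B,
and the following completion on queue B returns early because the
counter of queue B is already zero, so queue A stays marked. With the
per-tag-set counter the second completion still sees one pending
restart and reaches queue A.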

But a stalling queue is not the only problem with blk_mq_sched_restart().
My tests show that those loops through all queues and hctxs can be very
costly, even with the shared_hctx_restart counter, which aims to fix the
performance issue.  For my tests I create 128 devices with 64 hctxs each,
which share the same tag set.

The following is the fio and ftrace output for v4.14-rc4 kernel:

 READ: io=5630.3MB, aggrb=573208KB/s, minb=573208KB/s, maxb=573208KB/s, mint=10058msec, maxt=10058msec
WRITE: io=5650.9MB, aggrb=575312KB/s, minb=575312KB/s, maxb=575312KB/s, mint=10058msec, maxt=10058msec

root@pserver16:~/roman# cat /sys/kernel/debug/tracing/trace_stat/* | grep blk_mq
  Function                 Hit    Time           Avg            s^2
  --------                 ---    ----           ---            ---
  blk_mq_sched_restart   16347    9540759 us     583.639 us     8804801 us
  blk_mq_sched_restart    7884    6073471 us     770.354 us     8780054 us
  blk_mq_sched_restart   14176    7586794 us     535.185 us     2822731 us
  blk_mq_sched_restart    7843    6205435 us     791.206 us     12424960 us
  blk_mq_sched_restart    1490    4786107 us     3212.153 us    1949753 us
  blk_mq_sched_restart    7892    6039311 us     765.244 us     2994627 us
  blk_mq_sched_restart   15382    7511126 us     488.306 us     3090912 us
  [cut]

And here are results with two patches reverted:
   8e8320c9315c ("blk-mq: fix performance regression with shared tags")
   6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are shared")

 READ: io=12884MB, aggrb=1284.3MB/s, minb=1284.3MB/s, maxb=1284.3MB/s, mint=10032msec, maxt=10032msec
WRITE: io=12987MB, aggrb=1294.6MB/s, minb=1294.6MB/s, maxb=1294.6MB/s, mint=10032msec, maxt=10032msec

root@pserver16:~/roman# cat /sys/kernel/debug/tracing/trace_stat/* | grep blk_mq
  Function                 Hit    Time           Avg            s^2
  --------                 ---    ----           ---            ---
  blk_mq_sched_restart   50699    8802.349 us    0.173 us       121.771 us
  blk_mq_sched_restart   50362    8740.470 us    0.173 us       161.494 us
  blk_mq_sched_restart   50402    9066.337 us    0.179 us       113.009 us
  blk_mq_sched_restart   50104    9366.197 us    0.186 us       188.645 us
  blk_mq_sched_restart   50375    9317.727 us    0.184 us       54.218 us
  blk_mq_sched_restart   50136    9311.657 us    0.185 us       446.790 us
  blk_mq_sched_restart   50103    9179.625 us    0.183 us       114.472 us
  [cut]

Without the reverts, timings and stdevs are terrible, which leads to a
significant difference in throughput: 570MB/s vs 1280MB/s.

This is an RFC since the current patch fixes queue stalling but the
performance issue still remains, and it is not clear to me whether it is
better to improve commit
6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are shared")
by making percpu restart lists (to avoid looping and to dequeue the hctx
immediately) or to revert it (frankly, I did not notice any difference on a
small number of devices and hctxs, where the looping issue does not have
much impact).

--
Roman

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Cc: linux-ker...@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: Bart 

[PATCH v2 1/1] blk-mq: fix hang caused by freeze/unfreeze sequence

2016-08-08 Thread Roman Pen
A long time ago a similar fix was proposed by Akinobu Mita[1], but it
seems that at that time everyone decided to fix this subtle race in
percpu-refcount, and Tejun Heo[2] made an attempt (as far as I can see,
that patchset was not applied).

The following is a description of a hang in blk_mq_freeze_queue_wait():
the same fix, but the bug is seen from another angle.

The hang happens on an attempt to freeze a queue while another task
unfreezes it.

The root cause is an incorrect sequence of percpu_ref_reinit() and
percpu_ref_kill(); as a result those two can be swapped:

 CPU#0                         CPU#1
 ----------------              -----------------
 percpu_ref_kill()

                               percpu_ref_kill() << atomic reference does
 percpu_ref_reinit()                             << not guarantee the order

                               blk_mq_freeze_queue_wait() << HANG HERE

                               percpu_ref_reinit()

First, this wrong sequence raises two kernel warnings:

  1st. WARNING at lib/percpu-refcount.c:309
       percpu_ref_kill_and_confirm called more than once

  2nd. WARNING at lib/percpu-refcount.c:331

But the most unpleasant effect is a hang in blk_mq_freeze_queue_wait(),
which waits for q_usage_counter to drop to zero.  That never happens,
because the percpu-ref was reinited (instead of being killed) and stays
in PERCPU state forever.
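
The interleaving can be replayed deterministically in userspace. This is
a sketch under assumptions, not kernel code: every toy_* name is
hypothetical, the percpu-ref is reduced to a single "killed" flag, and
the two CPUs are modelled as steps executed in a fixed order. It
compares the case where the depth counter and the ref operation are
separate steps with the case where both happen as one step under a lock.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_queue_ref {
	int  freeze_depth;	/* models q->mq_freeze_depth */
	bool killed;		/* true after kill, false after reinit */
	int  double_kills;	/* kill called on an already killed ref */
};

static void toy_ref_kill(struct toy_queue_ref *q)
{
	if (q->killed)
		q->double_kills++;
	q->killed = true;
}

static void toy_ref_reinit(struct toy_queue_ref *q)
{
	q->killed = false;
}

/* In this model a freeze waiter can only finish on a killed ref. */
static bool toy_freeze_wait_can_finish(const struct toy_queue_ref *q)
{
	return q->freeze_depth > 0 && q->killed;
}

/* Counter update and ref operation are separate steps (racy order). */
static struct toy_queue_ref toy_replay_racy(void)
{
	struct toy_queue_ref q = {0};
	bool cpu0_must_reinit, cpu1_must_kill;

	/* CPU#0: freeze */
	if (++q.freeze_depth == 1)
		toy_ref_kill(&q);
	/* CPU#0: unfreeze, first half (counter only) */
	cpu0_must_reinit = (--q.freeze_depth == 0);
	/* CPU#1: freeze slips in between the two halves */
	cpu1_must_kill = (++q.freeze_depth == 1);
	if (cpu1_must_kill)
		toy_ref_kill(&q);
	/* CPU#0: unfreeze, second half (ref operation) */
	if (cpu0_must_reinit)
		toy_ref_reinit(&q);

	return q;
}

/* Counter update and ref operation form one step, as under a mutex. */
static struct toy_queue_ref toy_replay_locked(void)
{
	struct toy_queue_ref q = {0};

	if (++q.freeze_depth == 1)	/* CPU#0: freeze */
		toy_ref_kill(&q);
	if (--q.freeze_depth == 0)	/* CPU#0: unfreeze */
		toy_ref_reinit(&q);
	if (++q.freeze_depth == 1)	/* CPU#1: freeze */
		toy_ref_kill(&q);

	return q;
}
```

In the racy replay the queue ends up with a freeze depth of one, a
double kill recorded and a live ref, so the waiter on CPU#1 cannot
finish. In the locked replay the same three operations leave the ref
killed exactly once.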

The simplified sequence above can be reproduced on shared tags, when
queue A is going to die while another queue B is in the init state and
is trying to freeze queue A, which shares the same tag set:

 CPU#0                                 CPU#1
 -----------------------------------   -----------------------------------
 q1 = blk_mq_init_queue(shared_tags)

                                       q2 = blk_mq_init_queue(shared_tags):
                                         blk_mq_add_queue_tag_set(shared_tags):
                                           blk_mq_update_tag_set_depth(shared_tags):
                                             blk_mq_freeze_queue(q1)
 blk_cleanup_queue(q1)                       ...
   blk_mq_freeze_queue(q1)   <<<->>>         blk_mq_unfreeze_queue(q1)

[1] Message id: 1443287365-4244-7-git-send-email-akinobu.m...@gmail.com
[2] Message id: 1443563240-29306-6-git-send-email...@kernel.org

Signed-off-by: Roman Pen <roman.peny...@profitbricks.com>
Cc: Akinobu Mita <akinobu.m...@gmail.com>
Cc: Tejun Heo <t...@kernel.org>
Cc: Jens Axboe <ax...@kernel.dk>
Cc: Christoph Hellwig <h...@lst.de>
Cc: linux-block@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 v2:
   - forgotten hunk from local repo
   - minor tweaks in the commit message

 block/blk-core.c   |  3 ++-
 block/blk-mq.c | 22 +++---
 include/linux/blkdev.h |  7 ++-
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ef78848..4fd27e9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -658,7 +658,7 @@ int blk_queue_enter(struct request_queue *q, gfp_t gfp)
return -EBUSY;
 
ret = wait_event_interruptible(q->mq_freeze_wq,
-   !atomic_read(&q->mq_freeze_depth) ||
+   !q->mq_freeze_depth ||
blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
@@ -740,6 +740,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
 
init_waitqueue_head(&q->mq_freeze_wq);
+   mutex_init(&q->mq_freeze_lock);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6d6f8fe..1f3e81b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -80,13 +80,13 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
 
 void blk_mq_freeze_queue_start(struct request_queue *q)
 {
-   int freeze_depth;
-
-   freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
-   if (freeze_depth == 1) {
+   mutex_lock(&q->mq_freeze_lock);
+   if (++q->mq_freeze_depth == 1) {
percpu_ref_kill(&q->q_usage_counter);
+   mutex_unlock(&q->mq_freeze_lock);
blk_mq_run_hw_queues(q, false);
-   }
+   } else
+   mutex_unlock(&q->mq_freeze_lock);
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
 
@@ -124,14 +124,14 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
 void blk_mq_unfreeze_queue(struct request_queue *q)
 {
-   int freeze_depth;
-
-   freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
-   WARN_ON_ONCE(freeze_depth < 0);
-   if (!freeze_depth) {
+   mutex_lock(&q->mq_freeze_lock);
+   q->mq_freeze_depth--;
+   WARN_ON_ONCE(q->mq_freeze_depth < 0);
+   if (!q->mq_freeze_depth) {
percpu_ref_reinit(&q->q_usage_counter);
wake_up_all(&q->mq_freeze_wq);
}
+