[PATCH 1/1] blk-mq: fix hang caused by freeze/unfreeze sequence
Long time ago there was a similar fix proposed by Akinobu Mita[1],
but it seems that at that time everyone decided to fix this subtle race
in percpu-refcount, and Tejun Heo[2] made an attempt (as far as I can see,
that patchset was not applied).

The following is a description of a queue hang - same fix but a bug
from another angle.

The hang happens on queue freeze because of simultaneous calls of
blk_mq_freeze_queue() and blk_mq_unfreeze_queue() from different threads:
because of the race, percpu_ref_reinit() and percpu_ref_kill() can be
swapped.

 CPU#0                         CPU#1
 ----------------              -----------------
 percpu_ref_kill()

                               percpu_ref_kill() << atomic reference does not
 percpu_ref_reinit()                             << guarantee the order

 blk_mq_freeze_queue_wait()                      << HANG HERE

                               percpu_ref_reinit()

Firstly this wrong sequence raises two kernel warnings:

  1st. WARNING at lib/percpu-refcount.c:309
       percpu_ref_kill_and_confirm called more than once

  2nd. WARNING at lib/percpu-refcount.c:331

But the most unpleasant effect is a hang of blk_mq_freeze_queue_wait(),
which waits for q_usage_counter to drop to zero, which never happens
because the percpu-ref was not reinited and stays in PERCPU state forever.

The simplified sequence above is reproduced on shared tags, when one
queue is going to die while another one is being initialized:

 CPU#0                                CPU#1
 -------------------------------      ------------------------------------
 q1 = blk_mq_init_queue(shared_tags)

                                      q2 = blk_mq_init_queue(shared_tags):
                                        blk_mq_add_queue_tag_set(shared_tags):
                                          blk_mq_update_tag_set_depth(shared_tags):
                                            blk_mq_freeze_queue(q1)
 blk_cleanup_queue(q1)                      ...
   blk_mq_freeze_queue(q1)   <<<->>>        blk_mq_unfreeze_queue(q1)

[1] Message id: 1443287365-4244-7-git-send-email-akinobu.m...@gmail.com
[2] Message id: 1443563240-29306-6-git-send-email...@kernel.org

Signed-off-by: Roman Pen
Cc: Akinobu Mita
Cc: Tejun Heo
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: linux-block@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 block/blk-core.c       |  1 +
 block/blk-mq.c         | 22 +++---
 include/linux/blkdev.h |  7 ++-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ef78848..01dcb02 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -740,6 +740,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);

 	init_waitqueue_head(&q->mq_freeze_wq);
+	mutex_init(&q->mq_freeze_lock);

 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6d6f8fe..1f3e81b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -80,13 +80,13 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,

 void blk_mq_freeze_queue_start(struct request_queue *q)
 {
-	int freeze_depth;
-
-	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
-	if (freeze_depth == 1) {
+	mutex_lock(&q->mq_freeze_lock);
+	if (++q->mq_freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
+		mutex_unlock(&q->mq_freeze_lock);
 		blk_mq_run_hw_queues(q, false);
-	}
+	} else
+		mutex_unlock(&q->mq_freeze_lock);
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);

@@ -124,14 +124,14 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);

 void blk_mq_unfreeze_queue(struct request_queue *q)
 {
-	int freeze_depth;
-
-	freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
-	WARN_ON_ONCE(freeze_depth < 0);
-	if (!freeze_depth) {
+	mutex_lock(&q->mq_freeze_lock);
+	q->mq_freeze_depth--;
+	WARN_ON_ONCE(q->mq_freeze_depth < 0);
+	if (!q->mq_freeze_depth) {
 		percpu_ref_reinit(&q->q_usage_counter);
 		wake_up_all(&q->mq_freeze_wq);
 	}
+	mutex_unlock(&q->mq_freeze_lock);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);

@@ -2105,7 +2105,7 @@ void blk_mq_free_queue(struct request_queue *q)
 static void blk_mq_queue_reinit(struct request_queue *q,
 				const struct cpumask *online_mask)
 {
-	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
+	WARN_ON_ONCE(!q->mq_freeze_depth);

 	blk_mq_sysfs_unregister(q);

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f6ff9d1..d692c16 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -445,7 +445,7 @@ struct request_queue {
 	struct mutex		sysfs_lock;
 	int
[PATCH v2 1/1] blk-mq: fix hang caused by freeze/unfreeze sequence
Long time ago there was a similar fix proposed by Akinobu Mita[1],
but it seems that at that time everyone decided to fix this subtle race
in percpu-refcount, and Tejun Heo[2] made an attempt (as far as I can see,
that patchset was not applied).

The following is a description of a hang in blk_mq_freeze_queue_wait() -
same fix but a bug from another angle.

The hang happens on an attempt to freeze a queue while another task does
queue unfreeze.

The root cause is an incorrect sequence of percpu_ref_reinit() and
percpu_ref_kill(), and as a result those two can be swapped:

 CPU#0                         CPU#1
 ----------------              -----------------
 percpu_ref_kill()

                               percpu_ref_kill() << atomic reference does
 percpu_ref_reinit()                             << not guarantee the order

 blk_mq_freeze_queue_wait()                      << HANG HERE

                               percpu_ref_reinit()

Firstly this wrong sequence raises two kernel warnings:

  1st. WARNING at lib/percpu-refcount.c:309
       percpu_ref_kill_and_confirm called more than once

  2nd. WARNING at lib/percpu-refcount.c:331

But the most unpleasant effect is a hang of blk_mq_freeze_queue_wait(),
which waits for q_usage_counter to drop to zero, which never happens
because the percpu-ref was reinited (instead of being killed) and stays in
PERCPU state forever.

The simplified sequence above can be reproduced on shared tags, when
queue A is going to die while another queue B is in init state and is
trying to freeze queue A, which shares the same tags set:

 CPU#0                                CPU#1
 -------------------------------      ------------------------------------
 q1 = blk_mq_init_queue(shared_tags)

                                      q2 = blk_mq_init_queue(shared_tags):
                                        blk_mq_add_queue_tag_set(shared_tags):
                                          blk_mq_update_tag_set_depth(shared_tags):
                                            blk_mq_freeze_queue(q1)
 blk_cleanup_queue(q1)                      ...
   blk_mq_freeze_queue(q1)   <<<->>>        blk_mq_unfreeze_queue(q1)

[1] Message id: 1443287365-4244-7-git-send-email-akinobu.m...@gmail.com
[2] Message id: 1443563240-29306-6-git-send-email...@kernel.org

Signed-off-by: Roman Pen
Cc: Akinobu Mita
Cc: Tejun Heo
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: linux-block@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
v2:
  - forgotten hunk from local repo
  - minor tweaks in the commit message

 block/blk-core.c       |  3 ++-
 block/blk-mq.c         | 22 +++---
 include/linux/blkdev.h |  7 ++-
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ef78848..4fd27e9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -658,7 +658,7 @@ int blk_queue_enter(struct request_queue *q, gfp_t gfp)
 			return -EBUSY;

 		ret = wait_event_interruptible(q->mq_freeze_wq,
-				!atomic_read(&q->mq_freeze_depth) ||
+				!q->mq_freeze_depth ||
 				blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
@@ -740,6 +740,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);

 	init_waitqueue_head(&q->mq_freeze_wq);
+	mutex_init(&q->mq_freeze_lock);

 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6d6f8fe..1f3e81b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -80,13 +80,13 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,

 void blk_mq_freeze_queue_start(struct request_queue *q)
 {
-	int freeze_depth;
-
-	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
-	if (freeze_depth == 1) {
+	mutex_lock(&q->mq_freeze_lock);
+	if (++q->mq_freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
+		mutex_unlock(&q->mq_freeze_lock);
 		blk_mq_run_hw_queues(q, false);
-	}
+	} else
+		mutex_unlock(&q->mq_freeze_lock);
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);

@@ -124,14 +124,14 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);

 void blk_mq_unfreeze_queue(struct request_queue *q)
 {
-	int freeze_depth;
-
-	freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
-	WARN_ON_ONCE(freeze_depth < 0);
-	if (!freeze_depth) {
+	mutex_lock(&q->mq_freeze_lock);
+	q->mq_freeze_depth--;
+	WARN_ON_ONCE(q->mq_freeze_depth < 0);
+	if (!q->mq_freeze_depth) {
 		percpu_ref_reinit(&q->q_usage_counter);
 		wake_up_all(&q->mq_freeze_wq);
 	}
+	mutex_unlock(&q->mq_f
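
For readers who want to see the idea of the fix outside the kernel, here is a minimal user-space sketch. It assumes a toy structure (struct toy_queue and the toy_* helpers are hypothetical names, not kernel API) and uses a pthread mutex in place of mq_freeze_lock. The point it illustrates is only that the depth update and the kill/reinit transition sit in the same critical section, so the two can no longer be observed in swapped order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request_queue; not the kernel type. */
struct toy_queue {
	pthread_mutex_t freeze_lock;	/* plays the role of mq_freeze_lock */
	int freeze_depth;		/* plays the role of mq_freeze_depth */
	bool ref_killed;		/* models the percpu_ref kill/reinit state */
	int kill_calls;
	int reinit_calls;
};

static void toy_queue_init(struct toy_queue *q)
{
	pthread_mutex_init(&q->freeze_lock, NULL);
	q->freeze_depth = 0;
	q->ref_killed = false;
	q->kill_calls = 0;
	q->reinit_calls = 0;
}

/* Models blk_mq_freeze_queue_start(): only the 0 -> 1 transition kills. */
static void toy_freeze_start(struct toy_queue *q)
{
	pthread_mutex_lock(&q->freeze_lock);
	if (++q->freeze_depth == 1) {
		/* Depth update and kill are done under the same lock. */
		assert(!q->ref_killed);
		q->ref_killed = true;
		q->kill_calls++;
	}
	pthread_mutex_unlock(&q->freeze_lock);
}

/* Models blk_mq_unfreeze_queue(): only the 1 -> 0 transition reinits. */
static void toy_unfreeze(struct toy_queue *q)
{
	pthread_mutex_lock(&q->freeze_lock);
	q->freeze_depth--;
	assert(q->freeze_depth >= 0);
	if (q->freeze_depth == 0) {
		/* A reinit can only follow a completed kill. */
		assert(q->ref_killed);
		q->ref_killed = false;
		q->reinit_calls++;
	}
	pthread_mutex_unlock(&q->freeze_lock);
}
```

Nested freezes only bump the depth; the kill happens once on the first freeze and the reinit once on the last unfreeze. With the earlier atomic counter, the counter update and the percpu_ref call were two separate steps, which is where the swap described in the commit message could slip in.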
[PATCH 1/1] [RFC] blk-mq: fix queue stalling on shared hctx restart
Hi all,

the patch below fixes queue stalling when a shared hctx is marked for
restart (BLK_MQ_S_SCHED_RESTART bit) but q->shared_hctx_restart stays
zero. The root cause is that hctxs are shared between queues, but
'shared_hctx_restart' belongs to a particular queue, which in fact may
not need to be restarted; thus we return from blk_mq_sched_restart() and
leave the shared hctx of another queue never restarted.

The fix is to make the shared_hctx_restart counter belong not to the
queue, but to the tags, so that the counter reflects the real number of
shared hctxs that need to be restarted.

During tests 1 hctx (set->nr_hw_queues) was used and all stalled requests
were noticed in dd->fifo_list of the mq-deadline scheduler.

A seemingly possible sequence of events:

1. Request A of queue A is inserted into dd->fifo_list of the scheduler.

2. Request B of queue A bypasses the scheduler and goes directly to
   hctx->dispatch.

3. Request C of queue B is inserted.

4. blk_mq_sched_dispatch_requests() is invoked; since hctx->dispatch is
   not empty (request B is in the list) the hctx is only marked for the
   next restart and request A is left in the list (see the comment "So
   it's best to leave them there for as long as we can. Mark the hw queue
   as needing a restart in that case." in blk-mq-sched.c).

5. Eventually request B is completed/freed and blk_mq_sched_restart() is
   called, but by chance the hctx from queue B is chosen for restart and
   request C gets a chance to be dispatched.

6. Eventually request C is completed/freed and blk_mq_sched_restart() is
   called, but shared_hctx_restart for queue B is zero and we return
   without an attempt to restart the hctx from queue A, thus request A is
   stuck forever.

But a stalling queue is not the only problem with blk_mq_sched_restart().
My tests show that those loops through all queues and hctxs can be very
costly, even with the shared_hctx_restart counter, which aims to fix the
performance issue. For my tests I create 128 devices with 64 hctxs each,
which share the same tags set.

The following is the fio and ftrace output for v4.14-rc4 kernel:

  READ: io=5630.3MB, aggrb=573208KB/s, minb=573208KB/s, maxb=573208KB/s, mint=10058msec, maxt=10058msec
 WRITE: io=5650.9MB, aggrb=575312KB/s, minb=575312KB/s, maxb=575312KB/s, mint=10058msec, maxt=10058msec

root@pserver16:~/roman# cat /sys/kernel/debug/tracing/trace_stat/* | grep blk_mq
  Function                  Hit     Time            Avg             s^2
  --------                  ---     ----            ---             ---
  blk_mq_sched_restart      16347   9540759 us      583.639 us      8804801 us
  blk_mq_sched_restart       7884   6073471 us      770.354 us      8780054 us
  blk_mq_sched_restart      14176   7586794 us      535.185 us      2822731 us
  blk_mq_sched_restart       7843   6205435 us      791.206 us      12424960 us
  blk_mq_sched_restart       1490   4786107 us      3212.153 us     1949753 us
  blk_mq_sched_restart       7892   6039311 us      765.244 us      2994627 us
  blk_mq_sched_restart      15382   7511126 us      488.306 us      3090912 us
  [cut]

And here are results with two patches reverted:

  8e8320c9315c ("blk-mq: fix performance regression with shared tags")
  6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are shared")

  READ: io=12884MB, aggrb=1284.3MB/s, minb=1284.3MB/s, maxb=1284.3MB/s, mint=10032msec, maxt=10032msec
 WRITE: io=12987MB, aggrb=1294.6MB/s, minb=1294.6MB/s, maxb=1294.6MB/s, mint=10032msec, maxt=10032msec

root@pserver16:~/roman# cat /sys/kernel/debug/tracing/trace_stat/* | grep blk_mq
  Function                  Hit     Time            Avg             s^2
  --------                  ---     ----            ---             ---
  blk_mq_sched_restart      50699   8802.349 us     0.173 us        121.771 us
  blk_mq_sched_restart      50362   8740.470 us     0.173 us        161.494 us
  blk_mq_sched_restart      50402   9066.337 us     0.179 us        113.009 us
  blk_mq_sched_restart      50104   9366.197 us     0.186 us        188.645 us
  blk_mq_sched_restart      50375   9317.727 us     0.184 us        54.218 us
  blk_mq_sched_restart      50136   9311.657 us     0.185 us        446.790 us
  blk_mq_sched_restart      50103   9179.625 us     0.183 us        114.472 us
  [cut]

Timings and stdevs are terrible, which leads to significant difference:
570MB/s vs 1280MB/s.

This is an RFC since the current patch fixes queue stalling but the
performance issue still remains, and it is not clear to me whether it is
better to improve commit 6d8c6c0f97ad ("blk-mq: Restart a single queue if
tag sets are shared") by making percpu restart lists (to avoid looping and
to dequeue the hctx immediately) or to revert it (frankly I did not notice
any difference on a small number of devices and hctxs, when the looping
issue does not impact much).

--
Roman

Signed-off-by: Roman Pen
Cc: linux-ker...@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: Bart Van Assche
Cc: Christop
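
To make step 6 of the sequence in this RFC easier to follow, here is a toy user-space model. It is a sketch under strong assumptions: struct toy_queue, struct toy_tag_set and all toy_* functions are hypothetical, each queue is reduced to a single restart flag, and the real round-robin selection in blk_mq_sched_restart() is replaced by "restart the first marked queue". It only contrasts the early-return check on a per-queue counter with the same check on a counter kept in the tag set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these are not the kernel's structures. */
struct toy_queue {
	bool hctx_needs_restart;	/* stands in for BLK_MQ_S_SCHED_RESTART */
	int restart_count;		/* per-queue, like q->shared_hctx_restart */
};

struct toy_tag_set {
	struct toy_queue *queues;
	size_t nr_queues;
	int restart_count;		/* proposed: one counter per tag set */
};

/* Mark the hctx of queue qi as needing a restart; bump both counters. */
static void toy_mark_restart(struct toy_tag_set *set, size_t qi)
{
	struct toy_queue *q = &set->queues[qi];

	if (!q->hctx_needs_restart) {
		q->hctx_needs_restart = true;
		q->restart_count++;
		set->restart_count++;
	}
}

/* Restart the first marked queue; return its index, or -1 if none. */
static int toy_restart_any(struct toy_tag_set *set)
{
	for (size_t i = 0; i < set->nr_queues; i++) {
		struct toy_queue *q = &set->queues[i];

		if (q->hctx_needs_restart) {
			q->hctx_needs_restart = false;
			q->restart_count--;
			set->restart_count--;
			return (int)i;
		}
	}
	return -1;
}

/* Early return based on the completing queue's own counter. */
static int toy_restart_per_queue(struct toy_tag_set *set, size_t completing)
{
	if (set->queues[completing].restart_count == 0)
		return -1;	/* nothing restarted, even if others wait */
	return toy_restart_any(set);
}

/* Early return based on the counter shared by the whole tag set. */
static int toy_restart_per_set(struct toy_tag_set *set)
{
	if (set->restart_count == 0)
		return -1;
	return toy_restart_any(set);
}
```

In this model, if queue A is marked and a completion arrives on queue B, toy_restart_per_queue() returns -1 and A stays marked, which corresponds to the stall described above; toy_restart_per_set() finds and restarts A.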
[PATCH 06/24] ibtrs: client: statistics functions
This introduces a set of functions used on the client side to account
statistics of RDMA data sent/received, amount of IOs inflight, latency,
cpu migrations, etc. Almost all statistics are collected using percpu
variables.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c | 455 +
 1 file changed, 455 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
new file mode 100644
index ..af2ed05d2900
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
@@ -0,0 +1,455 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-clt.h"
+
+static inline int ibtrs_clt_ms_to_id(unsigned long ms)
+{
+	int id = ms ? ilog2(ms) - MIN_LOG_LAT + 1 : 0;
+
+	return clamp(id, 0, LOG_LAT_SZ - 1);
+}
+
+void ibtrs_clt_update_rdma_lat(struct ibtrs_clt_stats *stats, bool read,
+			       unsigned long ms)
+{
+	struct ibtrs_clt_stats_pcpu *s;
+	int id;
+
+	id = ibtrs_clt_ms_to_id(ms);
+	s = this_cpu_ptr(stats->pcpu_stats);
+	if (read) {
+		s->rdma_lat_distr[id].read++;
+		if (s->rdma_lat_max.read < ms)
+			s->rdma_lat_max.read = ms;
+	} else {
+		s->rdma_lat_distr[id].write++;
+		if (s->rdma_lat_max.write < ms)
+			s->rdma_lat_max.write = ms;
+	}
+}
+
+void ibtrs_clt_decrease_inflight(struct ibtrs_clt_stats *stats)
+{
+	atomic_dec(&stats->inflight);
+}
+
+void ibtrs_clt_update_wc_stats(struct ibtrs_clt_con *con)
+{
+	struct ibtrs_clt_sess *sess = to_clt_sess(con->c.sess);
+	struct ibtrs_clt_stats *stats = &sess->stats;
+	struct ibtrs_clt_stats_pcpu *s;
+	int cpu;
+
+	cpu = raw_smp_processor_id();
+	s = this_cpu_ptr(stats->pcpu_stats);
+	s->wc_comp.cnt++;
+	s->wc_comp.total_cnt++;
+	if (unlikely(con->cpu != cpu)) {
+		s->cpu_migr.to++;
+
+		/* Careful here, override s pointer */
+		s = per_cpu_ptr(stats->pcpu_stats, con->cpu);
+		atomic_inc(&s->cpu_migr.from);
+	}
+}
+
+void ibtrs_clt_inc_failover_cnt(struct ibtrs_clt_stats *stats)
+{
+	struct ibtrs_clt_stats_pcpu *s;
+
+	s = this_cpu_ptr(stats->pcpu_stats);
+	s->rdma.failover_cnt++;
+}
+
+static inline u32 ibtrs_clt_stats_get_avg_wc_cnt(struct ibtrs_clt_stats *stats)
+{
+	u32 cnt = 0;
+	u64 sum = 0;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct ibtrs_clt_stats_pcpu *s;
+
+		s = per_cpu_ptr(stats->pcpu_stats, cpu);
+		sum += s->wc_comp.total_cnt;
+		cnt += s->wc_comp.cnt;
+	}
+
+	return cnt ? sum / cnt : 0;
+}
+
+int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_clt_stats *stats,
+					 char *buf, size_t len)
+{
+	return scnprintf(buf, len, "%u\n",
+			 ibtrs_clt_stats_get_avg_wc_cnt(stats));
+}
+
+ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_clt_stats *stats,
+					      char *page, size_t len)
+{
+	struct ibtrs_clt_stats_rdma_lat res[LOG_LAT_SZ];
+	struct ibtrs_clt_stats_rdma_lat max;
+	struct ibtrs_clt_stats_pcpu *s;
+
+	ssize_t cnt = 0;
+	int i, cpu;
+
+	max.write = 0;
+	max.read = 0;
+	for_each_possible_cpu(cpu) {
+		s = per_cpu_ptr(stats->pcpu_stats, cpu);
+
+		if (max.write < s->rdma_lat_max.write)
+			max.write = s->rdma_lat_max.write;
+		i
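
The latency histogram in ibtrs_clt_ms_to_id() maps a latency in milliseconds to a logarithmic bucket index. The following user-space sketch shows how that mapping behaves. The values of TOY_MIN_LOG_LAT and TOY_LOG_LAT_SZ are assumptions picked for illustration (the real MIN_LOG_LAT and LOG_LAT_SZ are defined in a header that is not part of this patch), and toy_ilog2() is a simple stand-in for the kernel's ilog2().

```c
#include <assert.h>

/* Assumed values for illustration; the real constants live in ibtrs-clt.h. */
#define TOY_MIN_LOG_LAT	3
#define TOY_LOG_LAT_SZ	10

/* Floor of log2(v) for v > 0; stand-in for the kernel's ilog2(). */
static int toy_ilog2(unsigned long v)
{
	int log = 0;

	assert(v > 0);
	while (v >>= 1)
		log++;
	return log;
}

/* Same arithmetic as ibtrs_clt_ms_to_id(), with clamp() written out. */
static int toy_ms_to_id(unsigned long ms)
{
	int id = ms ? toy_ilog2(ms) - TOY_MIN_LOG_LAT + 1 : 0;

	if (id < 0)
		id = 0;
	if (id > TOY_LOG_LAT_SZ - 1)
		id = TOY_LOG_LAT_SZ - 1;
	return id;
}
```

With these assumed constants, everything below 8 ms lands in bucket 0, 8..15 ms in bucket 1, 16..31 ms in bucket 2 and so on, and very large latencies are clamped into the last bucket.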
[PATCH 05/24] ibtrs: client: main functionality
This is the main functionality of the ibtrs-client module, which manages
a set of RDMA connections for each IBTRS session, and does multipathing,
load balancing and failover of RDMA requests.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c | 3496 ++
 1 file changed, 3496 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
new file mode 100644
index ..aa0a17f2a78c
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
@@ -0,0 +1,3496 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include
+#include
+
+#include "ibtrs-clt.h"
+#include "ibtrs-log.h"
+
+#define RECONNECT_SEED 8
+#define MAX_SEGMENTS 31
+
+#define IBTRS_CONNECT_TIMEOUT_MS 5000
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Client");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+static bool use_fr;
+module_param(use_fr, bool, 0444);
+MODULE_PARM_DESC(use_fr, "use FRWR mode for memory registration if possible."
+		 " (default: 0)");
+
+static ushort nr_cons_per_session;
+module_param(nr_cons_per_session, ushort, 0444);
+MODULE_PARM_DESC(nr_cons_per_session, "Number of connections per session."
+		 " (default: nr_cpu_ids)");
+
+static int retry_count = 7;
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+	int err, ival;
+
+	err = kstrtoint(val, 0, &ival);
+	if (err)
+		return err;
+
+	if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT)
+		return -EINVAL;
+
+	retry_count = ival;
+
+	return 0;
+}
+
+static const struct kernel_param_ops retry_count_ops = {
+	.set	= retry_count_set,
+	.get	= param_get_int,
+};
+module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644);
+
+MODULE_PARM_DESC(retry_count, "Number of times to send the message if the"
+		 " remote side didn't respond with Ack or Nack (default: 3,"
+		 " min: " __stringify(MIN_RTR_CNT) ", max: "
+		 __stringify(MAX_RTR_CNT) ")");
+
+static int fmr_sg_cnt = 4;
+module_param_named(fmr_sg_cnt, fmr_sg_cnt, int, 0644);
+MODULE_PARM_DESC(fmr_sg_cnt, "when sg_cnt is bigger than fmr_sg_cnt, enable"
+		 " FMR (default: 4)");
+
+static struct workqueue_struct *ibtrs_wq;
+
+static void ibtrs_rdma_error_recovery(struct ibtrs_clt_con *con);
+static void ibtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc);
+
+static inline void ibtrs_clt_state_lock(void)
+{
+	rcu_read_lock();
+}
+
+static inline void ibtrs_clt_state_unlock(void)
+{
+	rcu_read_unlock();
+}
+
+#define cmpxchg_min(var, new) ({					\
+	typeof(var) old;						\
+									\
+	do {								\
+		old = var;						\
+		new = (!old ? new : min_t(typeof(var), old, new));	\
+	} while (cmpxchg(&var, old, new) != old);			\
+})
+
+static void ibtrs_clt_set_min_queue_depth(struct ibtrs_clt *clt, size_t new)
+{
+	/* Can be updated from different sessions (paths), so cmpxchg */
+
+	cmpxchg_min(clt->queue_depth, new);
+}
+
+static void ibtrs_clt_set_min_io_size(struct ibtrs_clt *clt, size_t new)
+{
+	/* Can be updated from different sessions (paths), so cmpxchg */
+
+	cmpxchg_min(clt->max_io_size, new);
+}
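
The cmpxchg_min() macro above keeps the smallest non-zero value seen across concurrent updaters, treating zero as "not set yet". A user-space sketch of the same compare-and-swap loop with C11 atomics could look like this; atomic_set_min() is a hypothetical helper name, and C11 <stdatomic.h> is swapped in for the kernel's cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Store candidate into *var if *var is still unset (0) or larger than
 * candidate. Retries when another thread changed *var in between.
 */
static void atomic_set_min(_Atomic size_t *var, size_t candidate)
{
	size_t old = atomic_load(var);
	size_t desired;

	do {
		/* 0 means "not set yet"; otherwise keep the smaller value. */
		desired = (old == 0 || candidate < old) ? candidate : old;
		/* On failure, old is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(var, &old, desired));
}
```

Each path (session) can call this with its own negotiated value, and the shared variable ends up holding the minimum regardless of the order in which the calls arrive.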
[PATCH 07/24] ibtrs: client: sysfs interface functions
This is the sysfs interface to IBTRS sessions on client side: /sys/kernel/ibtrs_client// *** IBTRS session created by ibtrs_clt_open() API call | |- max_reconnect_attempts | *** number of reconnect attempts for session | |- add_path | *** adds another connection path into IBTRS session | |- paths// *** established paths to server in a session | |- disconnect | *** disconnect path | |- reconnect | *** reconnect path | |- remove_path | *** remove current path | |- state | *** retrieve current path state | |- stats/ *** current path statistics | |- cpu_migration |- rdma |- rdma_lat |- reconnects |- reset_all |- sg_entries |- wc_completions Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c | 519 + 1 file changed, 519 insertions(+) diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c new file mode 100644 index ..04949d6d796b --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c @@ -0,0 +1,519 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibtrs-pri.h" +#include "ibtrs-clt.h" +#include "ibtrs-log.h" + +static struct kobject *ibtrs_kobj; + +#define MIN_MAX_RECONN_ATT -1 +#define MAX_MAX_RECONN_ATT + +static struct kobj_type ktype = { + .sysfs_ops = &kobj_sysfs_ops, +}; + +static ssize_t ibtrs_clt_max_reconn_attempts_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *page) +{ + struct ibtrs_clt *clt; + + clt = container_of(kobj, struct ibtrs_clt, kobj); + + return sprintf(page, "%d\n", ibtrs_clt_get_max_reconnect_attempts(clt)); +} + +static ssize_t ibtrs_clt_max_reconn_attempts_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, + size_t count) +{ + struct ibtrs_clt *clt; + int value; + int ret; + + clt = container_of(kobj, struct ibtrs_clt, kobj); + + ret = kstrtoint(buf, 10, &value); + if (unlikely(ret)) { + ibtrs_err(clt, "%s: failed to convert string '%s' to int\n", + attr->attr.name, buf); + return ret; + } + if (unlikely(value > MAX_MAX_RECONN_ATT || +value < MIN_MAX_RECONN_ATT)) { + ibtrs_err(clt, "%s: invalid range" + " (provided: '%s', accepted: min: %d, max: %d)\n", + attr->attr.name, buf, MIN_MAX_RECONN_ATT, + MAX_MAX_RECONN_ATT); + return -EINVAL; + } + ibtrs_clt_set_max_reconnect_attempts(clt, value); + + return count; +} + +static struct kobj_attribute ibtrs_clt_max_reconnect_attempts_attr = + __ATTR(max_reconnect_attempts, 0644, + ibtrs_clt_max_reconn_attempts_show, + ibtrs_clt_max_reconn_attempts_store); + +static ssize_t ibtrs_clt_mp_policy_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *page) +{ + struct ibtrs_clt *clt; + + clt = container_of(kobj, struct ibtrs_clt, kobj); + + switch (clt->mp_policy) { + case MP_POLICY_RR: + return sprintf(p
[PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
This series introduces IBNBD/IBTRS modules.

IBTRS (InfiniBand Transport) is a reliable high speed transport library
which allows for establishing connection between client and server
machines via RDMA. It is optimized to transfer (read/write) IO blocks
in the sense that it follows the BIO semantics of providing the
possibility to either write data from a scatter-gather list to the
remote side or to request ("read") data transfer from the remote side
into a given set of buffers.

IBTRS is multipath capable and provides I/O fail-over and load-balancing
functionality.

IBNBD (InfiniBand Network Block Device) is a pair of kernel modules
(client and server) that allow for remote access of a block device on
the server over IBTRS protocol. After being mapped, the remote block
devices can be accessed on the client side as local block devices.
Internally IBNBD uses IBTRS as an RDMA transport library.

Why?

   - IBNBD/IBTRS is developed in order to map thin provisioned volumes,
     thus the internal protocol is simple and consists of several request
     types only, without awareness of underlying hardware devices.
   - IBTRS was developed as an independent RDMA transport library, which
     supports fail-over and load-balancing policies using multipath, thus
     it can be used for any other IO needs rather than only for block
     device.
   - IBNBD/IBTRS is faster than NVME over RDMA. Old comparison results:
     https://www.spinics.net/lists/linux-rdma/msg48799.html
     (I retested on the latest 4.14 kernel - there is no significant
     difference, thus I post the old link).

Key features of IBTRS transport library and IBNBD block device:

o High throughput and low latency due to:
   - Only two RDMA messages per IO.
   - IMM InfiniBand messages on responses to reduce round trip latency.
   - Simplified memory management: memory allocation happens once on
     server side when IBTRS session is established.

o IO fail-over and load-balancing by using multipath.

o Simple configuration of IBNBD:
   - Server side is completely passive: volumes do not need to be
     explicitly exported.
   - Only IB port GID and device path needed on client side to map
     a block device.
   - A device is remapped automatically i.e. after storage reboot.

This series is a second try, first variant was published [1] and presented
on Vault in 2017 [2]. Since the first version the following was changed:

   - Load-balancing and IO fail-over using multipath features were added.
   - Major parts of the code were rewritten, simplified and overall code
     size was reduced by a quarter.

Commits for kernel can be found here:
   https://github.com/profitbricks/ibnbd/commits/linux-4.15-rc8

The out-of-tree modules are here:
   https://github.com/profitbricks/ibnbd/

[1] https://lwn.net/Articles/718181/
[2] http://events.linuxfoundation.org/sites/events/files/slides/IBNBD-Vault-2017.pdf

Roman Pen (24):
  ibtrs: public interface header to establish RDMA connections
  ibtrs: private headers with IBTRS protocol structs and helpers
  ibtrs: core: lib functions shared between client and server modules
  ibtrs: client: private header with client structs and functions
  ibtrs: client: main functionality
  ibtrs: client: statistics functions
  ibtrs: client: sysfs interface functions
  ibtrs: server: private header with server structs and functions
  ibtrs: server: main functionality
  ibtrs: server: statistics functions
  ibtrs: server: sysfs interface functions
  ibtrs: include client and server modules into kernel compilation
  ibtrs: a bit of documentation
  ibnbd: private headers with IBNBD protocol structs and helpers
  ibnbd: client: private header with client structs and functions
  ibnbd: client: main functionality
  ibnbd: client: sysfs interface functions
  ibnbd: server: private header with server structs and functions
  ibnbd: server: main functionality
  ibnbd: server: functionality for IO submission to file or block dev
  ibnbd: server: sysfs interface functions
  ibnbd: include client and server modules into kernel compilation
  ibnbd: a bit of documentation
  MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

 MAINTAINERS                            |   14 +
 drivers/block/Kconfig                  |    2 +
 drivers/block/Makefile                 |    1 +
 drivers/block/ibnbd/Kconfig            |   22 +
 drivers/block/ibnbd/Makefile           |   13 +
 drivers/block/ibnbd/README             |  272 ++
 drivers/block/ibnbd/ibnbd-clt-sysfs.c  |  723 +
 drivers/block/ibnbd/ibnbd-clt.c        | 1959 +
 drivers/block/ibnbd/ibnbd-clt.h        |  193 ++
 drivers/block/ibnbd/ibnbd-log.h        |   71 +
 drivers/block/ibnbd/ibnbd-proto.h      |  360 +++
 drivers/block/ibnbd/ibnbd-srv-dev.c    |  410 +++
 drivers/block/ibnbd/ibnbd-srv-dev.h    |  149 +
 drivers/block/ibnbd/ibnbd-s
[PATCH 03/24] ibtrs: core: lib functions shared between client and server modules
This is a set of library functions existing as a ibtrs-core module, used by client and server modules. Mainly these functions wrap IB and RDMA calls and provide a bit higher abstraction for implementing of IBTRS protocol on client or server sides. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs.c | 582 +++ 1 file changed, 582 insertions(+) diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.c b/drivers/infiniband/ulp/ibtrs/ibtrs.c new file mode 100644 index ..007380506959 --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs.c @@ -0,0 +1,582 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include + +#include "ibtrs-pri.h" +#include "ibtrs-log.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("IBTRS Core"); +MODULE_VERSION(IBTRS_VER_STRING); +MODULE_LICENSE("GPL"); + +static LIST_HEAD(device_list); +static DEFINE_MUTEX(device_list_mutex); + +struct ibtrs_iu *ibtrs_iu_alloc(u32 tag, size_t size, gfp_t gfp_mask, + struct ib_device *dma_dev, + enum dma_data_direction direction, + void (*done)(struct ib_cq *cq, +struct ib_wc *wc)) +{ + struct ibtrs_iu *iu; + + iu = kmalloc(sizeof(*iu), gfp_mask); + if (unlikely(!iu)) + return NULL; + + iu->buf = kzalloc(size, gfp_mask); + if (unlikely(!iu->buf)) + goto err1; + + iu->dma_addr = ib_dma_map_single(dma_dev, iu->buf, size, direction); + if (unlikely(ib_dma_mapping_error(dma_dev, iu->dma_addr))) + goto err2; + + iu->cqe.done = done; + iu->size = size; + iu->direction = direction; + iu->tag = tag; + + return iu; + +err2: + kfree(iu->buf); +err1: + kfree(iu); + + return NULL; +} +EXPORT_SYMBOL_GPL(ibtrs_iu_alloc); + +void ibtrs_iu_free(struct ibtrs_iu *iu, enum dma_data_direction dir, + struct ib_device *ibdev) +{ + if (!iu) + return; + + ib_dma_unmap_single(ibdev, iu->dma_addr, iu->size, dir); + kfree(iu->buf); + kfree(iu); +} +EXPORT_SYMBOL_GPL(ibtrs_iu_free); + +int ibtrs_iu_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu) +{ + struct ibtrs_sess *sess = con->sess; + struct ib_recv_wr wr, *bad_wr; + struct ib_sge list; + + list.addr = iu->dma_addr; + list.length = iu->size; + list.lkey = sess->ib_dev->lkey; + + if (WARN_ON(list.length == 0)) { + ibtrs_wrn(con, "Posting receive work request failed," + " sg list is empty\n"); + return -EINVAL; + } + + wr.next= NULL; + wr.wr_cqe = &iu->cqe; + wr.sg_list = &list; + wr.num_sge = 1; + + return ib_post_recv(con->qp, &wr, &bad_wr); +} +EXPORT_SYMBOL_GPL(ibtrs_iu_post_recv); + +int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe 
*cqe) +{ + struct ib_recv_wr wr, *bad_wr; + + wr.next= NULL; + wr.wr_cqe = cqe; + wr.sg_list = NULL; + wr.num_sge = 0; + + return ib_post_recv(con->qp, &wr, &bad_wr); +} +EXPORT_SYMBOL_GPL(ibtrs_post_recv_empty); + +int ibtrs_iu_post_send(struct ibtrs_con *con, struct ibtrs_iu *iu, size_t size) +{ + struct ibtrs_sess *sess = con->sess; + struct ib_send_wr wr, *bad_wr; + struct ib_sge list; + + if ((WARN_ON(size == 0))) + return -EINVAL; + + list.addr = iu->dma_addr; + list.length = size; + list.lkey = sess->
[PATCH 09/24] ibtrs: server: main functionality
This is the main functionality of the ibtrs-server module, which accepts a
set of RDMA connections (a so-called IBTRS session), creates/destroys sysfs
entries associated with the IBTRS session, and notifies the upper layer (the
user of the IBTRS API) about RDMA requests or link events.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c | 1811 ++
 1 file changed, 1811 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
new file mode 100644
index ..0d1fc08bd821
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
@@ -0,0 +1,1811 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include + +#include "ibtrs-srv.h" +#include "ibtrs-log.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("IBTRS Server"); +MODULE_VERSION(IBTRS_VER_STRING); +MODULE_LICENSE("GPL"); + +#define DEFAULT_MAX_IO_SIZE_KB 128 +#define DEFAULT_MAX_IO_SIZE (DEFAULT_MAX_IO_SIZE_KB * 1024) +#define MAX_REQ_SIZE PAGE_SIZE +#define MAX_SG_COUNT ((MAX_REQ_SIZE - sizeof(struct ibtrs_msg_rdma_read)) \ + / sizeof(struct ibtrs_sg_desc)) + +static int max_io_size = DEFAULT_MAX_IO_SIZE; +static int rcv_buf_size = DEFAULT_MAX_IO_SIZE + MAX_REQ_SIZE; + +static int max_io_size_set(const char *val, const struct kernel_param *kp) +{ + int err, ival; + + err = kstrtoint(val, 0, &ival); + if (err) + return err; + + if (ival < 4096 || ival + MAX_REQ_SIZE > (4096 * 1024) || + (ival + MAX_REQ_SIZE) % 512 != 0) { + pr_err("Invalid max io size value %d, has to be" + " > %d, < %d\n", ival, 4096, 4194304); + return -EINVAL; + } + + max_io_size = ival; + rcv_buf_size = max_io_size + MAX_REQ_SIZE; + pr_info("max io size changed to %d\n", ival); + + return 0; +} + +static const struct kernel_param_ops max_io_size_ops = { + .set= max_io_size_set, + .get= param_get_int, +}; +module_param_cb(max_io_size, &max_io_size_ops, &max_io_size, 0444); +MODULE_PARM_DESC(max_io_size, +"Max size for each IO request, when change the unit is in byte" +" (default: " __stringify(DEFAULT_MAX_IO_SIZE_KB) "KB)"); + +#define DEFAULT_SESS_QUEUE_DEPTH 512 +static int sess_queue_depth = DEFAULT_SESS_QUEUE_DEPTH; +module_param_named(sess_queue_depth, sess_queue_depth, int, 0444); +MODULE_PARM_DESC(sess_queue_depth, +"Number of buffers for pending I/O requests to allocate" +" per session. 
Maximum: " __stringify(MAX_SESS_QUEUE_DEPTH) +" (default: " __stringify(DEFAULT_SESS_QUEUE_DEPTH) ")"); + +/* We guarantee to serve 10 paths at least */ +#define CHUNK_POOL_SIZE (DEFAULT_SESS_QUEUE_DEPTH * 10) +static mempool_t *chunk_pool; + +static int retry_count = 7; + +static int retry_count_set(const char *val, const struct kernel_param *kp) +{ + int err, ival; + + err = kstrtoint(val, 0, &ival); + if (err) + return err; + + if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT) { + pr_err("Invalid retry count value %d, has to be" + " > %d, < %d\n", ival, MIN_RTR_CNT, MAX_RTR_CNT); + return -EINVAL; + } + + retry_count = ival; + pr_info("QP retry count changed to %d\n", ival); + + return 0; +} + +static const struct kernel_param_ops retry_count_ops = { + .set= retry_count_set, + .get= param_get_int, +};
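The range check performed by max_io_size_set() in the hunk above can be tried out in user space. The following is only a sketch under stated assumptions, not the module's code: MAX_REQ_SIZE is PAGE_SIZE in the patch and is assumed to be 4096 here, and sketch_parse_int() is a hypothetical stand-in for kstrtoint() built on strtol(). All sketch_* names are made up for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Assumption: PAGE_SIZE (and therefore MAX_REQ_SIZE) is 4096 bytes. */
#define SKETCH_MAX_REQ_SIZE	4096L
#define SKETCH_MIN_IO_SIZE	4096L
#define SKETCH_MAX_BUF_SIZE	(4096L * 1024L)

/*
 * Hypothetical stand-in for kstrtoint(): the base is auto-detected and
 * a single trailing newline is accepted, anything else is rejected.
 */
static int sketch_parse_int(const char *val, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(val, &end, 0);
	if (end == val)
		return -EINVAL;
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -ERANGE;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = (int)v;
	return 0;
}

/*
 * Same three conditions as in the patch: the value must be at least
 * 4096, value + MAX_REQ_SIZE must not exceed 4 MiB and must be a
 * multiple of 512.
 */
static int sketch_check_max_io_size(const char *val, int *out)
{
	int err, ival;
	long buf_size;

	err = sketch_parse_int(val, &ival);
	if (err)
		return err;

	buf_size = (long)ival + SKETCH_MAX_REQ_SIZE;
	if (ival < SKETCH_MIN_IO_SIZE || buf_size > SKETCH_MAX_BUF_SIZE ||
	    buf_size % 512 != 0)
		return -EINVAL;

	*out = ival;
	return 0;
}
```

Under these assumptions a value of exactly 4096 passes the check, so the lower bound is inclusive even though the pr_err() text in the patch reads "> 4096".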
[PATCH 04/24] ibtrs: client: private header with client structs and functions
This header describes the main structs and functions used by the
ibtrs-client module, mainly for managing IBTRS sessions, creating/destroying
sysfs entries, and accounting statistics on the client side.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h | 338 +++
 1 file changed, 338 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
new file mode 100644
index ..b57af19ac833
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h
@@ -0,0 +1,338 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_CLT_H
+#define IBTRS_CLT_H
+
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_clt_state - Client states.
+ */ +enum ibtrs_clt_state { + IBTRS_CLT_CONNECTING, + IBTRS_CLT_CONNECTING_ERR, + IBTRS_CLT_RECONNECTING, + IBTRS_CLT_CONNECTED, + IBTRS_CLT_CLOSING, + IBTRS_CLT_CLOSED, + IBTRS_CLT_DEAD, +}; + +static inline const char *ibtrs_clt_state_str(enum ibtrs_clt_state state) +{ + switch (state) { + case IBTRS_CLT_CONNECTING: + return "IBTRS_CLT_CONNECTING"; + case IBTRS_CLT_CONNECTING_ERR: + return "IBTRS_CLT_CONNECTING_ERR"; + case IBTRS_CLT_RECONNECTING: + return "IBTRS_CLT_RECONNECTING"; + case IBTRS_CLT_CONNECTED: + return "IBTRS_CLT_CONNECTED"; + case IBTRS_CLT_CLOSING: + return "IBTRS_CLT_CLOSING"; + case IBTRS_CLT_CLOSED: + return "IBTRS_CLT_CLOSED"; + case IBTRS_CLT_DEAD: + return "IBTRS_CLT_DEAD"; + default: + return "UNKNOWN"; + } +} + +enum ibtrs_fast_reg { + IBTRS_FAST_MEM_NONE, + IBTRS_FAST_MEM_FR, + IBTRS_FAST_MEM_FMR +}; + +enum ibtrs_mp_policy { + MP_POLICY_RR, + MP_POLICY_MIN_INFLIGHT, +}; + +struct ibtrs_clt_stats_reconnects { + int successful_cnt; + int fail_cnt; +}; + +struct ibtrs_clt_stats_wc_comp { + u32 cnt; + u64 total_cnt; +}; + +struct ibtrs_clt_stats_cpu_migr { + atomic_t from; + int to; +}; + +struct ibtrs_clt_stats_rdma { + struct { + u64 cnt; + u64 size_total; + } dir[2]; + + u64 failover_cnt; +}; + +struct ibtrs_clt_stats_rdma_lat { + u64 read; + u64 write; +}; + +#define MIN_LOG_SG 2 +#define MAX_LOG_SG 5 +#define MAX_LIN_SG BIT(MIN_LOG_SG) +#define SG_DISTR_SZ (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 2) + +#define MAX_LOG_LAT 16 +#define MIN_LOG_LAT 0 +#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2) + +struct ibtrs_clt_stats_pcpu { + struct ibtrs_clt_stats_cpu_migr cpu_migr; + struct ibtrs_clt_stats_rdma rdma; + u64 sg_list_total; + u64 sg_list_distr[SG_DISTR_SZ]; + struct ibtrs_clt_stats_rdma_lat rdma_lat_distr[LOG_LAT_SZ]; + struct ibtrs_clt_stats_rdma_lat rdma_lat_max; + struct ibtrs_clt_stats_wc_comp wc_comp; +}; + +struct ibtrs_clt_stats { + boolenable_rdma_lat; + struct ibtrs_clt_stats_pcpu__percpu *pcpu_stats; + struct 
ibtrs_clt_stats_reconnects reconnects; + atomic_tinflight; +}; + +struct ibtrs_clt_con { + struct ibtrs_conc; + unsignedcpu; + atomic_tio_cnt; + struct ibtrs_fr_pool*fr_pool; + int cm_err; +}; + +struct ibtrs_clt_io_req { + struct list_headlist; + struct ibtrs_iu *iu; + struct scatterlist *sglist; /* list holding user data */ + unsigned intsg_cnt; +
[PATCH 01/24] ibtrs: public interface header to establish RDMA connections
Introduce a public header which provides a set of API functions to
establish RDMA connections from a client to a server machine using the
IBTRS protocol, which manages RDMA connections for each session and does
multipathing and load balancing.

Main functions for client (active) side:

 ibtrs_clt_open() - Creates a set of RDMA connections encapsulated
                    in an IBTRS session and returns a pointer to the
                    IBTRS session object.
 ibtrs_clt_close() - Closes RDMA connections associated with the IBTRS
                     session.
 ibtrs_clt_request() - Requests a zero-copy RDMA transfer to/from
                       the server.

Main functions for server (passive) side:

 ibtrs_srv_open() - Starts listening for IBTRS clients on the specified
                    port and invokes IBTRS callbacks for incoming
                    RDMA requests or link events.
 ibtrs_srv_close() - Closes the IBTRS server context.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs.h | 331 +++
 1 file changed, 331 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.h b/drivers/infiniband/ulp/ibtrs/ibtrs.h
new file mode 100644
index ..747cdde3d9cf
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.h
@@ -0,0 +1,331 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBTRS_H +#define IBTRS_H + +#include +#include + +struct ibtrs_clt; +struct ibtrs_srv_ctx; +struct ibtrs_srv; +struct ibtrs_srv_op; + +/* + * Here goes IBTRS client API + */ + +/** + * enum ibtrs_clt_link_ev - Events about connectivity state of a client + * @IBTRS_CLT_LINK_EV_RECONNECTED Client was reconnected. + * @IBTRS_CLT_LINK_EV_DISCONNECTED Client was disconnected. + */ +enum ibtrs_clt_link_ev { + IBTRS_CLT_LINK_EV_RECONNECTED, + IBTRS_CLT_LINK_EV_DISCONNECTED, +}; + +/** + * Source and destination address of a path to be established + */ +struct ibtrs_addr { + struct sockaddr *src; + struct sockaddr *dst; +}; + +typedef void (link_clt_ev_fn)(void *priv, enum ibtrs_clt_link_ev ev); +/** + * ibtrs_clt_open() - Open a session to a IBTRS client + * @priv: User supplied private data. + * @link_ev: Event notification for connection state changes + * @priv: user supplied data that was passed to + * ibtrs_clt_open() + * @ev:Occurred event + * @sessname: name of the session + * @paths: Paths to be established defined by their src and dst addresses + * @path_cnt: Number of elemnts in the @paths array + * @port: port to be used by the IBTRS session + * @pdu_sz: Size of extra payload which can be accessed after tag allocation. + * @max_inflight_msg: Max. number of parallel inflight messages for the session + * @max_segments: Max. number of segments per IO request + * @reconnect_delay_sec: time between reconnect tries + * @max_reconnect_attempts: Number of times to reconnect on error before giving + * up, 0 for * disabled, -1 for forever + * + * Starts session establishment with the ibtrs_server. The function can block + * up to ~2000ms until it returns. + * + * Return a valid pointer on success otherwise PTR_ERR. 
+ */ +struct ibtrs_clt *ibtrs_clt_open(void *priv, link_clt_ev_fn *link_ev, +const char *sessname, +const struct ibtrs_addr *paths, +size_t path_cnt, short port, +size_t pdu_sz, u8 reconnect_delay_sec, +u16 max_segments, +s16 max_reconnect_attempts); + +/** + * ibtrs_clt_close() - Close a session + * @sess: Session handler, is freed on return + */ +void ibtrs_clt_close(struct ibtrs_clt *sess); + +enum { + IBTRS_TAG_NOWAIT = 0, + IBTRS_TAG_WAIT = 1, +}; + +/** + * enum ibtrs_clt_con_type() t
[PATCH 08/24] ibtrs: server: private header with server structs and functions
This header describes the main structs and functions used by the
ibtrs-server module, mainly for accepting IBTRS sessions, creating/destroying
sysfs entries, and accounting statistics on the server side.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h | 169 +++
 1 file changed, 169 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
new file mode 100644
index ..f54e159eaf2a
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h
@@ -0,0 +1,169 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_SRV_H
+#define IBTRS_SRV_H
+
+#include
+#include "ibtrs-pri.h"
+
+/**
+ * enum ibtrs_srv_state - Server states.
+ */ +enum ibtrs_srv_state { + IBTRS_SRV_CONNECTING, + IBTRS_SRV_CONNECTED, + IBTRS_SRV_CLOSING, + IBTRS_SRV_CLOSED, +}; + +static inline const char *ibtrs_srv_state_str(enum ibtrs_srv_state state) +{ + switch (state) { + case IBTRS_SRV_CONNECTING: + return "IBTRS_SRV_CONNECTING"; + case IBTRS_SRV_CONNECTED: + return "IBTRS_SRV_CONNECTED"; + case IBTRS_SRV_CLOSING: + return "IBTRS_SRV_CLOSING"; + case IBTRS_SRV_CLOSED: + return "IBTRS_SRV_CLOSED"; + default: + return "UNKNOWN"; + } +} + +struct ibtrs_stats_wc_comp { + atomic64_t calls; + atomic64_t total_wc_cnt; +}; + +struct ibtrs_srv_stats_rdma_stats { + struct { + atomic64_t cnt; + atomic64_t size_total; + } dir[2]; +}; + +struct ibtrs_srv_stats { + struct ibtrs_srv_stats_rdma_stats rdma_stats; + atomic_tapm_cnt; + struct ibtrs_stats_wc_comp wc_comp; +}; + +struct ibtrs_srv_con { + struct ibtrs_conc; + atomic_twr_cnt; +}; + +struct ibtrs_srv_op { + struct ibtrs_srv_con*con; + u32 msg_id; + u8 dir; + u64 data_dma_addr; + struct ibtrs_msg_rdma_read *msg; + struct ib_rdma_wr *tx_wr; + struct ib_sge *tx_sg; +}; + +struct ibtrs_srv_sess { + struct ibtrs_sess s; + struct ibtrs_srv*srv; + struct work_struct close_work; + enum ibtrs_srv_statestate; + spinlock_t state_lock; + int cur_cq_vector; + struct ibtrs_srv_op **ops_ids; + atomic_tids_inflight; + wait_queue_head_t ids_waitq; + dma_addr_t *rdma_addr; + boolestablished; + unsigned intmem_bits; + struct kobject kobj; + struct kobject kobj_stats; + struct ibtrs_srv_stats stats; +}; + +struct ibtrs_srv { + struct list_headpaths_list; + int paths_up; + struct mutexpaths_ev_mutex; + size_t paths_num; + struct mutexpaths_mutex; + uuid_t paths_uuid; + refcount_t refcount; + struct ibtrs_srv_ctx*ctx; + struct list_headctx_list; + void*priv; + size_t queue_depth; + struct page **chunks; + struct kobject kobj; + struct kobject kobj_paths; +}; + +struct ibtrs_srv_ctx { + rdma_ev_fn *rdma_ev; + link_ev_fn *link_ev; + struct rdma_cm_id *cm_id_ip; + struct rdma_cm_id *cm_id_ib; + 
struct mutex srv_mutex; + struct list_head srv_list; +}; + +/* See ibtrs-log.h */ +#define TYPES_TO_SE
[PATCH 10/24] ibtrs: server: statistics functions
This introduces a set of functions used on the server side to account
statistics of RDMA data sent/received.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c | 110 +
 1 file changed, 110 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
new file mode 100644
index ..441b07fdf44a
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
@@ -0,0 +1,110 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-srv.h"
+
+void ibtrs_srv_update_rdma_stats(struct ibtrs_srv_stats *s,
+				 size_t size, int d)
+{
+	atomic64_inc(&s->rdma_stats.dir[d].cnt);
+	atomic64_add(size, &s->rdma_stats.dir[d].size_total);
+}
+
+void ibtrs_srv_update_wc_stats(struct ibtrs_srv_stats *s)
+{
+	atomic64_inc(&s->wc_comp.calls);
+	atomic64_inc(&s->wc_comp.total_wc_cnt);
+}
+
+int ibtrs_srv_reset_rdma_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+	if (enable) {
+		struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+
+		memset(r, 0, sizeof(*r));
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+ssize_t ibtrs_srv_stats_rdma_to_str(struct ibtrs_srv_stats *stats,
+				    char *page, size_t len)
+{
+	struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+	struct ibtrs_srv_sess *sess;
+
+	sess = container_of(stats, typeof(*sess), stats);
+
+	return scnprintf(page, len, "%ld %ld %ld %ld %u\n",
+			 atomic64_read(&r->dir[READ].cnt),
+			 atomic64_read(&r->dir[READ].size_total),
+			 atomic64_read(&r->dir[WRITE].cnt),
+			 atomic64_read(&r->dir[WRITE].size_total),
+			 atomic_read(&sess->ids_inflight));
+}
+
+int ibtrs_srv_reset_wc_completion_stats(struct ibtrs_srv_stats *stats,
+					bool enable)
+{
+	if (enable) {
+		memset(&stats->wc_comp, 0, sizeof(stats->wc_comp));
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+int ibtrs_srv_stats_wc_completion_to_str(struct ibtrs_srv_stats *stats,
+					 char *buf, size_t len)
+{
+	return scnprintf(buf, len, "%ld %ld\n",
+			 atomic64_read(&stats->wc_comp.total_wc_cnt),
+			 atomic64_read(&stats->wc_comp.calls));
+}
+
+ssize_t ibtrs_srv_reset_all_help(struct ibtrs_srv_stats *stats,
+				 char *page, size_t len)
+{
+	return scnprintf(page, len, "echo 1 to reset all statistics\n");
+}
+
+int ibtrs_srv_reset_all_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+	if (enable) {
+		ibtrs_srv_reset_wc_completion_stats(stats, enable);
+		ibtrs_srv_reset_rdma_stats(stats, enable);
+		return 0;
+	}
+
+	return -EINVAL;
+}
--
2.13.1
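The per-direction accounting in this patch (one request counter and one byte counter for READ and for WRITE, printed on a single line) can be sketched in user space with C11 atomics. This is an illustration only, under stated assumptions: the sketch_* names are hypothetical, atomic_llong stands in for the kernel's atomic64_t, and the inflight counter printed by ibtrs_srv_stats_rdma_to_str() is left out.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Direction indices, standing in for the kernel's READ/WRITE (0/1). */
enum { SKETCH_READ = 0, SKETCH_WRITE = 1 };

/* Hypothetical user-space counterpart of ibtrs_srv_stats_rdma_stats. */
struct sketch_rdma_stats {
	struct {
		atomic_llong cnt;		/* number of requests */
		atomic_llong size_total;	/* bytes transferred */
	} dir[2];
};

/* Reset all counters to zero, one atomic store per counter. */
static void sketch_stats_reset(struct sketch_rdma_stats *s)
{
	for (int d = 0; d < 2; d++) {
		atomic_store(&s->dir[d].cnt, 0);
		atomic_store(&s->dir[d].size_total, 0);
	}
}

/* Account one request of 'size' bytes in direction 'd'. */
static void sketch_stats_update(struct sketch_rdma_stats *s,
				size_t size, int d)
{
	atomic_fetch_add(&s->dir[d].cnt, 1);
	atomic_fetch_add(&s->dir[d].size_total, (long long)size);
}

/* Print "read_cnt read_bytes write_cnt write_bytes\n" into buf. */
static int sketch_stats_to_str(struct sketch_rdma_stats *s,
			       char *buf, size_t len)
{
	return snprintf(buf, len, "%lld %lld %lld %lld\n",
			atomic_load(&s->dir[SKETCH_READ].cnt),
			atomic_load(&s->dir[SKETCH_READ].size_total),
			atomic_load(&s->dir[SKETCH_WRITE].cnt),
			atomic_load(&s->dir[SKETCH_WRITE].size_total));
}
```

Each counter is read independently, so a line printed while updates are in flight is not a consistent snapshot across the four values. The sketch resets counters with individual atomic stores, whereas the patch resets with memset().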
[PATCH 02/24] ibtrs: private headers with IBTRS protocol structs and helpers
These are common private headers with IBTRS protocol structures, logging, sysfs and other helper functions, which are used on both client and server sides. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-log.h | 94 ++ drivers/infiniband/ulp/ibtrs/ibtrs-pri.h | 494 +++ 2 files changed, 588 insertions(+) diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-log.h b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h new file mode 100644 index ..308593785c64 --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h @@ -0,0 +1,94 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBTRS_LOG_H +#define IBTRS_LOG_H + +#define P1 ) +#define P2 )) +#define P3 ))) +#define P4 +#define P(N) P ## N + +#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__) +#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__ + +#define COUNT_ARGS(...) COUNT_ARGS_(,##__VA_ARGS__,6,5,4,3,2,1,0) +#define COUNT_ARGS_(z,a,b,c,d,e,f,cnt,...) cnt + +#define LIST(...) 
\ + __VA_ARGS__,\ + ({ unknown_type(); NULL; }) \ + CAT(P, COUNT_ARGS(__VA_ARGS__)) \ + +#define EMPTY() +#define DEFER(id) id EMPTY() + +#define _CASE(obj, type, member) \ + __builtin_choose_expr( \ + __builtin_types_compatible_p( \ + typeof(obj), type), \ + ((type)obj)->member +#define CASE(o, t, m) DEFER(_CASE)(o,t,m) + +/* + * Below we define retrieving of sessname from common IBTRS types. + * Client or server related types have to be defined by special + * TYPES_TO_SESSNAME macro. + */ + +void unknown_type(void); + +#ifndef TYPES_TO_SESSNAME +#define TYPES_TO_SESSNAME(...) ({ unknown_type(); NULL; }) +#endif + +#define ibtrs_prefix(obj) \ + _CASE(obj, struct ibtrs_con *, sess->sessname),\ + _CASE(obj, struct ibtrs_sess *, sessname), \ + TYPES_TO_SESSNAME(obj) \ + )) + +#define ibtrs_log(fn, obj, fmt, ...) \ + fn("<%s>: " fmt, ibtrs_prefix(obj), ##__VA_ARGS__) + +#define ibtrs_err(obj, fmt, ...) \ + ibtrs_log(pr_err, obj, fmt, ##__VA_ARGS__) +#define ibtrs_err_rl(obj, fmt, ...)\ + ibtrs_log(pr_err_ratelimited, obj, fmt, ##__VA_ARGS__) +#define ibtrs_wrn(obj, fmt, ...) \ + ibtrs_log(pr_warn, obj, fmt, ##__VA_ARGS__) +#define ibtrs_wrn_rl(obj, fmt, ...) \ + ibtrs_log(pr_warn_ratelimited, obj, fmt, ##__VA_ARGS__) +#define ibtrs_info(obj, fmt, ...) \ + ibtrs_log(pr_info, obj, fmt, ##__VA_ARGS__) +#define ibtrs_info_rl(obj, fmt, ...) \ + ibtrs_log(pr_info_ratelimited, obj, fmt, ##__VA_ARGS__) + +#endif /* IBTRS_LOG_H */ diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h new file mode 100644 index ..b3b51af8607e --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h @@ -0,0 +1,494 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. 
+ * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope tha
[PATCH 13/24] ibtrs: a bit of documentation
README with a description of the major sysfs entries.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/README | 238
 1 file changed, 238 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/README b/drivers/infiniband/ulp/ibtrs/README
new file mode 100644
index ..ed506c7e202d
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/README
@@ -0,0 +1,238 @@
+
+InfiniBand Transport (IBTRS)
+
+
+IBTRS (InfiniBand Transport) is a reliable high speed transport library
+which provides support to establish an optimal number of connections
+between client and server machines using RDMA (InfiniBand, RoCE, iWarp)
+transport. It is optimized to transfer (read/write) IO blocks.
+
+In its core interface it follows the BIO semantics of providing the
+possibility to either write data from an sg list to the remote side
+or to request ("read") data transfer from the remote side into a given
+sg list.
+
+IBTRS provides I/O fail-over and load-balancing capabilities by using
+multipath I/O (see "add_path" and "mp_policy" configuration entries).
+
+IBTRS is used by the IBNBD (InfiniBand Network Block Device) modules.
+
+======================
+Client Sysfs Interface
+======================
+
+This chapter describes only the most important files of the sysfs interface
+on the client side.
+
+Entries under /sys/kernel/ibtrs_client/
+=======================================
+
+When a user of the IBTRS API creates a new session, a directory entry with
+the name of that session is created.
+
+Entries under /sys/kernel/ibtrs_client//
+========================================
+
+add_path (RW)
+-------------
+
+Adds a new path (connection) to an existing session. Expected format is the
+following:
+
+    <[source addr,]destination addr>
+
+    *addr ::= [ ip: | gid: ]
+
+max_reconnect_attempts (RW)
+---------------------------
+
+Maximum number of reconnect attempts the client should make before giving up
+after the connection breaks unexpectedly.
+
+mp_policy (RW)
+--------------
+
+Multipath policy specifies which path should be selected on each IO:
+
+    round-robin (0):
+        select path in per CPU round-robin manner.
+
+    min-inflight (1):
+        select path with minimum inflights.
+
+Entries under /sys/kernel/ibtrs_client//paths/
+==============================================
+
+Each path belonging to a given session is listed here by its destination
+address. When a new path is added to a session by writing to the "add_path"
+entry, a directory with the corresponding destination address is created.
+
+Entries under /sys/kernel/ibtrs_client//paths//
+===============================================
+
+state (R)
+---------
+
+Contains "connected" if the session is connected to the peer and fully
+functional. Otherwise the file contains "disconnected".
+
+reconnect (RW)
+--------------
+
+Write "1" to the file in order to reconnect the path.
+Operation is blocking and returns 0 if reconnect was successful.
+
+disconnect (RW)
+---------------
+
+Write "1" to the file in order to disconnect the path.
+Operation blocks until the IBTRS path is disconnected.
+
+remove_path (RW)
+----------------
+
+Write "1" to the file in order to disconnect and remove the path
+from the session. Operation blocks until the path is disconnected
+and removed from the session.
+
+Entries under /sys/kernel/ibtrs_client//paths//stats/
+=====================================================
+
+Write "0" to any file in that directory to reset corresponding statistics.
+
+reset_all (RW)
+--------------
+
+Read will return usage help, write 0 will clear all the statistics.
+
+sg_entries (RW)
+---------------
+
+Data to be transferred via RDMA is passed to IBTRS as a scatter-gather
+list. A scatter-gather list can contain multiple entries.
+Scatter-gather lists with fewer entries require less processing power
+and can therefore be transferred faster. The file sg_entries outputs a
+per-CPU distribution table for the number of entries in the
+scatter-gather lists that were passed to the IBTRS API function
+ibtrs_clt_request (READ or WRITE).
+
+cpu_migration (RW)
+------------------
+
+IBTRS expects that each HCA IRQ is pinned to a separate CPU. If it's
+not the case, the processing of an I/O response could be processed on a
+different CPU than where it was originally submitted. This file shows
+how many interrupts were generated on an unexpected CPU.
+"from:" is the CPU on which the IRQ was expected, but not generated.
+"to:" is the CPU on which the IRQ was generated, but not expected.
+
+reconnects (RW)
+---------------
+
+Contains 2 unsigned int values, the first one records number of successful
+reconnects in the path lifetime, t
[PATCH 12/24] ibtrs: include client and server modules into kernel compilation
Add the IBTRS Makefile and Kconfig, and hook them into the upper-level
drivers/infiniband/Kconfig and drivers/infiniband/ulp/Makefile.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/Kconfig            |  1 +
 drivers/infiniband/ulp/Makefile       |  1 +
 drivers/infiniband/ulp/ibtrs/Kconfig  | 20
 drivers/infiniband/ulp/ibtrs/Makefile | 15 +++
 4 files changed, 37 insertions(+)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index cbf186522016..7adbd0e272c4 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -93,6 +93,7 @@ source "drivers/infiniband/ulp/srpt/Kconfig"
 
 source "drivers/infiniband/ulp/iser/Kconfig"
 source "drivers/infiniband/ulp/isert/Kconfig"
+source "drivers/infiniband/ulp/ibtrs/Kconfig"
 
 source "drivers/infiniband/ulp/opa_vnic/Kconfig"
 source "drivers/infiniband/sw/rdmavt/Kconfig"
diff --git a/drivers/infiniband/ulp/Makefile b/drivers/infiniband/ulp/Makefile
index 437813c7b481..1c4f10dc8d49 100644
--- a/drivers/infiniband/ulp/Makefile
+++ b/drivers/infiniband/ulp/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_INFINIBAND_SRPT)		+= srpt/
 obj-$(CONFIG_INFINIBAND_ISER)		+= iser/
 obj-$(CONFIG_INFINIBAND_ISERT)		+= isert/
 obj-$(CONFIG_INFINIBAND_OPA_VNIC)	+= opa_vnic/
+obj-$(CONFIG_INFINIBAND_IBTRS)		+= ibtrs/
diff --git a/drivers/infiniband/ulp/ibtrs/Kconfig b/drivers/infiniband/ulp/ibtrs/Kconfig
new file mode 100644
index ..eaeb8f3f6b4e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Kconfig
@@ -0,0 +1,20 @@
+config INFINIBAND_IBTRS
+	tristate
+	depends on INFINIBAND_ADDR_TRANS
+
+config INFINIBAND_IBTRS_CLIENT
+	tristate "IBTRS client module"
+	depends on INFINIBAND_ADDR_TRANS
+	select INFINIBAND_IBTRS
+	help
+	  IBTRS client allows for simplified data transfer and connection
+	  establishment over RDMA (InfiniBand, RoCE, iWarp). Uses BIO-like
+	  READ/WRITE semantics and provides multipath capabilities.
+
+config INFINIBAND_IBTRS_SERVER
+	tristate "IBTRS server module"
+	depends on INFINIBAND_ADDR_TRANS
+	select INFINIBAND_IBTRS
+	help
+	  IBTRS server module processing connection and IO requests received
+	  from the IBTRS client module.
diff --git a/drivers/infiniband/ulp/ibtrs/Makefile b/drivers/infiniband/ulp/ibtrs/Makefile
new file mode 100644
index ..e6ea858745ad
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Makefile
@@ -0,0 +1,15 @@
+ibtrs-client-y := ibtrs-clt.o \
+		  ibtrs-clt-stats.o \
+		  ibtrs-clt-sysfs.o
+
+ibtrs-server-y := ibtrs-srv.o \
+		  ibtrs-srv-stats.o \
+		  ibtrs-srv-sysfs.o
+
+ibtrs-core-y := ibtrs.o
+
+obj-$(CONFIG_INFINIBAND_IBTRS)		+= ibtrs-core.o
+obj-$(CONFIG_INFINIBAND_IBTRS_CLIENT)	+= ibtrs-client.o
+obj-$(CONFIG_INFINIBAND_IBTRS_SERVER)	+= ibtrs-server.o
+
+-include $(src)/compat/compat.mk
--
2.13.1
[PATCH 11/24] ibtrs: server: sysfs interface functions
This is the sysfs interface to IBTRS sessions on server side:

  /sys/kernel/ibtrs_server//
    *** IBTRS session accepted from a client peer
    |
    |- paths//
       *** established paths from a client in a session
       |
       |- disconnect
       |  *** disconnect path
       |
       |- hca_name
       |  *** HCA name
       |
       |- hca_port
       |  *** HCA port
       |
       |- stats/
          *** current path statistics
          |
          |- rdma
          |- reset_all
          |- wc_completions

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c | 278 +
 1 file changed, 278 insertions(+)

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
new file mode 100644
index 000000000000..ec2c86fe4181
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
@@ -0,0 +1,278 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibtrs-pri.h" +#include "ibtrs-srv.h" +#include "ibtrs-log.h" + +static struct kobject *ibtrs_kobj; + +static struct kobj_type ktype = { + .sysfs_ops = &kobj_sysfs_ops, +}; + +static ssize_t ibtrs_srv_disconnect_show(struct kobject *kobj, +struct kobj_attribute *attr, +char *page) +{ + return scnprintf(page, PAGE_SIZE, "Usage: echo 1 > %s\n", +attr->attr.name); +} + +static ssize_t ibtrs_srv_disconnect_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct ibtrs_srv_sess *sess; + char str[MAXHOSTNAMELEN]; + + sess = container_of(kobj, struct ibtrs_srv_sess, kobj); + if (!sysfs_streq(buf, "1")) { + ibtrs_err(sess, "%s: invalid value: '%s'\n", + attr->attr.name, buf); + return -EINVAL; + } + + sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr, str, sizeof(str)); + + ibtrs_info(sess, "disconnect for path %s requested\n", str); + ibtrs_srv_queue_close(sess); + + return count; +} + +static struct kobj_attribute ibtrs_srv_disconnect_attr = + __ATTR(disconnect, 0644, + ibtrs_srv_disconnect_show, ibtrs_srv_disconnect_store); + +static ssize_t ibtrs_srv_hca_port_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *page) +{ + struct ibtrs_srv_sess *sess; + struct ibtrs_con *usr_con; + + sess = container_of(kobj, typeof(*sess), kobj); + usr_con = sess->s.con[0]; + + return scnprintf(page, PAGE_SIZE, "%u\n", +usr_con->cm_id->port_num); +} + +static struct kobj_attribute ibtrs_srv_hca_port_attr = + __ATTR(hca_port, 0444, ibtrs_srv_hca_port_show, NULL); + +static ssize_t ibtrs_srv_hca_name_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *page) +{ + struct ibtrs_srv_sess *sess; + + sess = container_of(kobj, struct ibtrs_srv_sess, kobj); + + return scnprintf(page, PAGE_SIZE, "%s\n", +sess->s.ib_dev->dev->name); +} + +static struct kobj_attribute ibtrs_srv_hca_name_attr = + __ATTR(hca_name, 0444, 
ibtrs_srv_hca_name_show, NULL); + +static struct attribute *ibtrs_srv_sess_attrs[] = { + &ibtrs_srv_hca_name_attr.attr, + &ibtrs_srv_hca_port_attr.attr, + &ibtrs_srv_disconnect
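The disconnect store handler above accepts only the value "1" and compares it with sysfs_streq(), which is why a plain `echo 1 > disconnect` works even though echo appends a newline. Below is a minimal userspace sketch of that check, assuming the usual sysfs_streq() semantics (strings are equal if they match exactly or differ only by one trailing newline). The names streq_ignoring_newline() and disconnect_store_check() are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's sysfs_streq(): two strings are
 * treated as equal if they match exactly or differ only by a single
 * trailing '\n' on one side.
 */
bool streq_ignoring_newline(const char *s1, const char *s2)
{
	assert(s1 && s2);

	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}

/*
 * Hypothetical model of the validation in the store handler: returns
 * the number of consumed bytes on success, -EINVAL otherwise.
 */
long disconnect_store_check(const char *buf, size_t count)
{
	if (!streq_ignoring_newline(buf, "1"))
		return -EINVAL;
	return (long)count;
}
```

With this model, "1" and "1\n" are accepted, while "0\n", "11" and "1\n\n" are rejected with -EINVAL.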
[PATCH 21/24] ibnbd: server: sysfs interface functions
This is the sysfs interface to IBNBD mapped devices on server side:

  /sys/kernel/ibnbd_server/devices//
    |- block_dev
    |  *** link pointing to the corresponding block device sysfs entry
    |
    |- sessions//
    |  *** sessions directory
       |
       |- read_only
       |  *** whether the device is mapped read-only
       |
       |- mapping_path
          *** relative device path provided by the client during mapping

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/ibnbd-srv-sysfs.c | 264 ++
 1 file changed, 264 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-srv-sysfs.c b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
new file mode 100644
index 000000000000..a0efd6a2accb
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
@@ -0,0 +1,264 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include +#include +#include +#include +#include +#include + +#include "ibnbd-srv.h" + +static struct kobject *ibnbd_srv_kobj; +static struct kobject *ibnbd_srv_devices_kobj; + +static struct attribute *ibnbd_srv_default_dev_attrs[] = { + NULL, +}; + +static struct attribute_group ibnbd_srv_default_dev_attr_group = { + .attrs = ibnbd_srv_default_dev_attrs, +}; + +static ssize_t ibnbd_srv_attr_show(struct kobject *kobj, struct attribute *attr, + char *page) +{ + struct kobj_attribute *kattr; + int ret = -EIO; + + kattr = container_of(attr, struct kobj_attribute, attr); + if (kattr->show) + ret = kattr->show(kobj, kattr, page); + return ret; +} + +static ssize_t ibnbd_srv_attr_store(struct kobject *kobj, + struct attribute *attr, + const char *page, size_t length) +{ + struct kobj_attribute *kattr; + int ret = -EIO; + + kattr = container_of(attr, struct kobj_attribute, attr); + if (kattr->store) + ret = kattr->store(kobj, kattr, page, length); + return ret; +} + +static const struct sysfs_ops ibnbd_srv_sysfs_ops = { + .show = ibnbd_srv_attr_show, + .store = ibnbd_srv_attr_store, +}; + +static struct kobj_type ibnbd_srv_dev_ktype = { + .sysfs_ops = &ibnbd_srv_sysfs_ops, +}; + +static struct kobj_type ibnbd_srv_dev_sessions_ktype = { + .sysfs_ops = &ibnbd_srv_sysfs_ops, +}; + +int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev, + struct block_device *bdev, + const char *dir_name) +{ + struct kobject *bdev_kobj; + int ret; + + ret = kobject_init_and_add(&dev->dev_kobj, &ibnbd_srv_dev_ktype, + ibnbd_srv_devices_kobj, dir_name); + if (ret) + return ret; + + ret = kobject_init_and_add(&dev->dev_sessions_kobj, + &ibnbd_srv_dev_sessions_ktype, + &dev->dev_kobj, "sessions"); + if (ret) + goto err; + + ret = sysfs_create_group(&dev->dev_kobj, +&ibnbd_srv_default_dev_attr_group); + if (ret) + goto err2; + + bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj; + ret = 
sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev"); + if (ret) + goto err3; + + return 0; + +err3: + sysfs_remove_group(&dev->dev_kobj, + &ibnbd_srv_default_dev_attr_group); +err2: + kobject_del(&dev->dev_sessions_kobj); + kobject_put(&dev->dev_sessions_kobj); +err: + kobject_del(&dev->dev_kobj); + kobje
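The ibnbd_srv_attr_show()/ibnbd_srv_attr_store() helpers in this patch recover the enclosing struct kobj_attribute from a plain struct attribute pointer with container_of() and then call the per-attribute handler, returning -EIO when none is set. The following is a simplified userspace sketch of that dispatch. The structures here are reduced stand-ins (no kobject argument), and read_only_show() is a hypothetical handler used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace version of the container_of() idea used by the kernel. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-ins for the kernel structures of the same name. */
struct attribute {
	const char *name;
};

struct kobj_attribute {
	struct attribute attr;
	long (*show)(struct kobj_attribute *kattr, char *page, size_t len);
};

/* Generic entry point: recover the wrapper, then call its handler. */
long attr_show(struct attribute *attr, char *page, size_t len)
{
	struct kobj_attribute *kattr;

	assert(attr && page);
	kattr = container_of(attr, struct kobj_attribute, attr);
	if (!kattr->show)
		return -EIO;	/* no handler registered, as in the patch */
	return kattr->show(kattr, page, len);
}

/* Hypothetical handler: always reports "1". */
long read_only_show(struct kobj_attribute *kattr, char *page, size_t len)
{
	(void)kattr;
	return snprintf(page, len, "%d\n", 1);
}
```

The generic function only ever sees the embedded struct attribute, so one sysfs_ops table can serve every attribute that is wrapped this way.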
[PATCH 22/24] ibnbd: include client and server modules into kernel compilation
Add IBNBD Makefile, Kconfig and also corresponding lines into upper
block layer files.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/Kconfig        |  2 ++
 drivers/block/Makefile       |  1 +
 drivers/block/ibnbd/Kconfig  | 22 ++++++++++++++++++++++
 drivers/block/ibnbd/Makefile | 13 +++++++++++++
 4 files changed, 38 insertions(+)

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 40579d0cb3d1..483aae5d391e 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -477,4 +477,6 @@ config BLK_DEV_RSXX
 	  To compile this driver as a module, choose M here: the
 	  module will be called rsxx.
 
+source "drivers/block/ibnbd/Kconfig"
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index dc061158b403..65346a1d0b1a 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)	+= mtip32xx/
 obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/
 obj-$(CONFIG_BLK_DEV_NULL_BLK)	+= null_blk.o
 obj-$(CONFIG_ZRAM) += zram/
+obj-$(CONFIG_BLK_DEV_IBNBD)	+= ibnbd/
 
 skd-y		:= skd_main.o
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/ibnbd/Kconfig b/drivers/block/ibnbd/Kconfig
new file mode 100644
index 000000000000..c5cc7d111c7a
--- /dev/null
+++ b/drivers/block/ibnbd/Kconfig
@@ -0,0 +1,22 @@
+config BLK_DEV_IBNBD
+	boolean
+
+config BLK_DEV_IBNBD_CLIENT
+	tristate "Network block device driver on top of IBTRS transport"
+	depends on INFINIBAND_IBTRS_CLIENT
+	select BLK_DEV_IBNBD
+	help
+	  IBNBD client allows for mapping of a remote block devices over
+	  IBTRS protocol from a target system where IBNBD server is running.
+
+	  If unsure, say N.
+
+config BLK_DEV_IBNBD_SERVER
+	tristate "Network block device over RDMA Infiniband server support"
+	depends on INFINIBAND_IBTRS_SERVER
+	select BLK_DEV_IBNBD
+	help
+	  IBNBD server allows for exporting local block devices to a remote client
+	  over IBTRS protocol.
+
+	  If unsure, say N.
diff --git a/drivers/block/ibnbd/Makefile b/drivers/block/ibnbd/Makefile
new file mode 100644
index 000000000000..5f20e72e0633
--- /dev/null
+++ b/drivers/block/ibnbd/Makefile
@@ -0,0 +1,13 @@
+ccflags-y := -Idrivers/infiniband/ulp/ibtrs
+
+ibnbd-client-y := ibnbd-clt.o \
+		  ibnbd-clt-sysfs.o
+
+ibnbd-server-y := ibnbd-srv.o \
+		  ibnbd-srv-dev.o \
+		  ibnbd-srv-sysfs.o
+
+obj-$(CONFIG_BLK_DEV_IBNBD_CLIENT) += ibnbd-client.o
+obj-$(CONFIG_BLK_DEV_IBNBD_SERVER) += ibnbd-server.o
+
+-include $(src)/compat/compat.mk
-- 
2.13.1
[PATCH 23/24] ibnbd: a bit of documentation
README with description of major sysfs entries.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/README | 272 +
 1 file changed, 272 insertions(+)

diff --git a/drivers/block/ibnbd/README b/drivers/block/ibnbd/README
new file mode 100644
index 000000000000..e0feb39fad14
--- /dev/null
+++ b/drivers/block/ibnbd/README
@@ -0,0 +1,272 @@
+***************************************
+InfiniBand Network Block Device (IBNBD)
+***************************************
+
+Introduction
+------------
+
+IBNBD (InfiniBand Network Block Device) is a pair of kernel modules
+(client and server) that allow for remote access of a block device on
+the server over IBTRS protocol using the RDMA (InfiniBand, RoCE, iWarp)
+transport. After being mapped, the remote block devices can be accessed
+on the client side as local block devices.
+
+I/O is transferred between client and server by the IBTRS transport
+modules. The administration of IBNBD and IBTRS modules is done via
+sysfs entries.
+
+Requirements
+------------
+
+  IBTRS kernel modules
+
+Quick Start
+-----------
+
+Server side:
+  # modprobe ibnbd_server
+
+Client side:
+  # modprobe ibnbd_client
+  # echo "sessname=blya path=ip:10.50.100.66 device_path=/dev/ram0" > \
+            /sys/kernel/ibnbd_client/map_device
+
+  Where "sessname=" is a session name, a string to identify the session
+  on client and on server sides; "path=" is a destination IP address or
+  a pair of a source and a destination IPs, separated by comma. Multiple
+  "path=" options can be specified in order to use multipath (see IBTRS
+  description for details); "device_path=" is the block device to be
+  mapped from the server side. After the session to the server machine is
+  established, the mapped device will appear on the client side under
+  /dev/ibnbd.
+
+
+======================
+Client Sysfs Interface
+======================
+
+All sysfs files that are not read-only provide the usage information on read:
+
+Example:
+  # cat /sys/kernel/ibnbd_client/map_device
+
+  > Usage: echo "sessname= path=<[srcaddr,]dstaddr>
+  > [path=<[srcaddr,]dstaddr>] device_path=
+  > [access_mode=] [input_mode=]
+  > [io_mode=]" > map_device
+  >
+  > addr ::= [ ip: | ip: | gid: ]
+
+Entries under /sys/kernel/ibnbd_client/
+=======================================
+
+map_device (RW)
+---------------
+
+Expected format is the following:
+
+sessname=
+path=<[srcaddr,]dstaddr> [path=<[srcaddr,]dstaddr> ...]
+device_path=
+[access_mode=]
+[input_mode=]
+[io_mode=]
+
+Where:
+
+sessname: accepts a string not bigger than 256 chars, which identifies
+          a given session on the client and on the server.
+          E.g. "clt_hostname-srv_hostname" could be a natural choice.
+
+path:     describes a connection between the client and the server by
+          specifying destination and, when required, the source address.
+          The addresses are to be provided in the following format:
+
+            ip:
+            ip:
+            gid:
+
+          for example:
+
+          path=ip:10.0.0.66
+                         The single addr is treated as the destination.
+                         The connection will be established to this
+                         server from any client IP address.
+
+          path=ip:10.0.0.66,ip:10.0.1.66
+                         First addr is the source address and the second
+                         is the destination.
+
+          If multiple "path=" options are specified multiple connections
+          will be established and data will be sent according to
+          the selected multipath policy (see IBTRS mp_policy sysfs entry
+          description).
+
+device_path: Path to the block device on the server side. Path is specified
+             relative to the directory on server side configured in the
+             'dev_search_path' module parameter of the ibnbd_server.
+             The ibnbd_server prepends the received from client
+             with and tries to open the
+             / block device. On success,
+             a /dev/ibnbd device file, a /sys/block/ibnbd_client/ibnbd/
+             directory and an entry in /sys/kernel/ibnbd_client/devices will be
+             created.
+
+access_mode: the access_mode parameter specifies if the device is to be
+             mapped as "ro" read-only or "rw" read-write. The server allows
+             a device to be exported in rw mode only once. The "migration"
+             access mode has to be specified if a second mapping in read-write
+             mode is desired.
+
+             By default "rw" is used.
+
+input_mode: the input_mode parameter specifies the internal I/O
+            processing mode of the block device on the client. Acce
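The "path=<[srcaddr,]dstaddr>" rule from the README excerpt (a single address is the destination; with two comma-separated addresses the first is the source and the second the destination) can be illustrated with a small userspace sketch. This is only an illustration of the documented format and not the driver's parser; struct path_addrs and split_path() are hypothetical names, and the 64-byte buffers are an arbitrary assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type; src is empty when no source was given. */
struct path_addrs {
	char src[64];
	char dst[64];
};

/*
 * Split a "[srcaddr,]dstaddr" value as described in the README.
 * Returns 0 on success and -1 on empty or oversized components.
 */
int split_path(const char *value, struct path_addrs *out)
{
	const char *comma;
	size_t src_len;

	assert(value && out);
	out->src[0] = '\0';
	out->dst[0] = '\0';

	comma = strchr(value, ',');
	if (!comma) {
		/* Single address: treated as the destination. */
		if (!*value || strlen(value) >= sizeof(out->dst))
			return -1;
		strcpy(out->dst, value);
		return 0;
	}

	/* Two addresses: source first, destination second. */
	src_len = (size_t)(comma - value);
	if (src_len == 0 || src_len >= sizeof(out->src))
		return -1;
	if (!comma[1] || strlen(comma + 1) >= sizeof(out->dst))
		return -1;
	memcpy(out->src, value, src_len);
	out->src[src_len] = '\0';
	strcpy(out->dst, comma + 1);
	return 0;
}
```

For "ip:10.0.0.66" the sketch leaves src empty; for "ip:10.0.0.66,ip:10.0.1.66" it yields src "ip:10.0.0.66" and dst "ip:10.0.1.66", matching the two README examples.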
[PATCH 16/24] ibnbd: client: main functionality
This is main functionality of ibnbd-client module, which provides interface to map remote device as local block device /dev/ibnbd and feeds IBTRS with IO requests. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-clt.c | 1959 +++ 1 file changed, 1959 insertions(+) diff --git a/drivers/block/ibnbd/ibnbd-clt.c b/drivers/block/ibnbd/ibnbd-clt.c new file mode 100644 index ..b5bc71414778 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-clt.c @@ -0,0 +1,1959 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include +#include +#include +#include + +#include "ibnbd-clt.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("InfiniBand Network Block Device Client"); +MODULE_VERSION(IBNBD_VER_STRING); +MODULE_LICENSE("GPL"); + +static int ibnbd_client_major; +static DEFINE_IDA(index_ida); +static DEFINE_MUTEX(ida_lock); +static DEFINE_MUTEX(sess_lock); +static LIST_HEAD(sess_list); + +static bool softirq_enable; +module_param(softirq_enable, bool, 0444); +MODULE_PARM_DESC(softirq_enable, "finish request in softirq_fn." +" (default: 0)"); +/* + * Maximum number of partitions an instance can have. + * 6 bits = 64 minors = 63 partitions (one minor is used for the device itself) + */ +#define IBNBD_PART_BITS6 +#define KERNEL_SECTOR_SIZE 512 + +static inline bool ibnbd_clt_get_sess(struct ibnbd_clt_session *sess) +{ + return refcount_inc_not_zero(&sess->refcount); +} + +static void free_sess(struct ibnbd_clt_session *sess); + +static void ibnbd_clt_put_sess(struct ibnbd_clt_session *sess) +{ + might_sleep(); + + if (refcount_dec_and_test(&sess->refcount)) + free_sess(sess); +} + +static inline bool ibnbd_clt_dev_is_mapped(struct ibnbd_clt_dev *dev) +{ + return dev->dev_state == DEV_STATE_MAPPED; +} + +static void ibnbd_clt_put_dev(struct ibnbd_clt_dev *dev) +{ + might_sleep(); + + if (refcount_dec_and_test(&dev->refcount)) { + mutex_lock(&ida_lock); + ida_simple_remove(&index_ida, dev->clt_device_id); + mutex_unlock(&ida_lock); + kfree(dev->hw_queues); + ibnbd_clt_put_sess(dev->sess); + kfree(dev); + } +} + +static inline bool ibnbd_clt_get_dev(struct ibnbd_clt_dev *dev) +{ + return refcount_inc_not_zero(&dev->refcount); +} + +static void ibnbd_clt_set_dev_attr(struct ibnbd_clt_dev *dev, + const struct ibnbd_msg_open_rsp *rsp) +{ + struct ibnbd_clt_session *sess = dev->sess; + + dev->device_id = le32_to_cpu(rsp->device_id); + dev->nsectors = 
le64_to_cpu(rsp->nsectors); + dev->logical_block_size = le16_to_cpu(rsp->logical_block_size); + dev->physical_block_size= le16_to_cpu(rsp->physical_block_size); + dev->max_write_same_sectors = le32_to_cpu(rsp->max_write_same_sectors); + dev->max_discard_sectors= le32_to_cpu(rsp->max_discard_sectors); + dev->discard_granularity= le32_to_cpu(rsp->discard_granularity); + dev->discard_alignment = le32_to_cpu(rsp->discard_alignment); + dev->secure_discard = le16_to_cpu(rsp->secure_discard); + dev->rotational = rsp->rotational; + dev->remote_io_mode = rsp->io_mode; + + dev->max_hw_sectors = sess->max_io_size / dev->logical_block_size; + dev->max_segments = BMAX_SEGMENTS; + + if (dev->remote_io_mode == IBNBD_BLOCKIO) { + dev->max_hw_sectors = min_t(u32
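The session and device helpers in this file follow a common get/put pattern: a "get" succeeds only while the count is still non-zero (refcount_inc_not_zero()), and the final "put" frees the object (refcount_dec_and_test()). A rough userspace sketch of that pattern with C11 atomics follows. It does not model the saturation behaviour of the kernel's refcount_t, and struct obj, obj_get() and obj_put() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_uint refcount;
	bool released;	/* set by the stand-in release step */
};

/* Take a reference only if the object is still alive (count > 0). */
bool obj_get(struct obj *o)
{
	unsigned int old;

	assert(o);
	old = atomic_load(&o->refcount);
	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last put runs the release step. */
void obj_put(struct obj *o)
{
	assert(o);
	if (atomic_fetch_sub(&o->refcount, 1) == 1)
		o->released = true;	/* real code would free here */
}
```

The point of the "not zero" check is that a lookup racing with the final put cannot resurrect an object whose release has already started.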
[PATCH 20/24] ibnbd: server: functionality for IO submission to file or block dev
This provides helper functions for IO submission to file or block dev. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-srv-dev.c | 410 drivers/block/ibnbd/ibnbd-srv-dev.h | 149 + 2 files changed, 559 insertions(+) diff --git a/drivers/block/ibnbd/ibnbd-srv-dev.c b/drivers/block/ibnbd/ibnbd-srv-dev.c new file mode 100644 index ..a5894849b9d5 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-srv-dev.c @@ -0,0 +1,410 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibnbd-srv-dev.h" +#include "ibnbd-log.h" + +#define IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS 0 + +struct ibnbd_dev_file_io_work { + struct ibnbd_dev*dev; + void*priv; + + sector_tsector; + void*data; + size_t len; + size_t bi_size; + enum ibnbd_io_flags flags; + + struct work_struct work; +}; + +struct ibnbd_dev_blk_io { + struct ibnbd_dev *dev; + void *priv; +}; + +static struct workqueue_struct *fileio_wq; + +int ibnbd_dev_init(void) +{ + fileio_wq = alloc_workqueue("%s", WQ_UNBOUND, + IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS, + "ibnbd_server_fileio_wq"); + if (!fileio_wq) + return -ENOMEM; + + return 0; +} + +void ibnbd_dev_destroy(void) +{ + destroy_workqueue(fileio_wq); +} + +static inline struct block_device *ibnbd_dev_open_bdev(const char *path, + fmode_t flags) +{ + return blkdev_get_by_path(path, flags, THIS_MODULE); +} + +static int ibnbd_dev_blk_open(struct ibnbd_dev *dev, const char *path, + fmode_t flags) +{ + dev->bdev = ibnbd_dev_open_bdev(path, flags); + return PTR_ERR_OR_ZERO(dev->bdev); +} + +static int ibnbd_dev_vfs_open(struct ibnbd_dev *dev, const char *path, + fmode_t flags) +{ + int oflags = O_DSYNC; /* enable write-through */ + + if (flags & FMODE_WRITE) + oflags |= O_RDWR; + else if (flags & FMODE_READ) + oflags |= O_RDONLY; + else + return -EINVAL; + + dev->file = filp_open(path, oflags, 0); + return PTR_ERR_OR_ZERO(dev->file); +} + +struct ibnbd_dev *ibnbd_dev_open(const char *path, fmode_t flags, +enum ibnbd_io_mode mode, struct bio_set *bs, +ibnbd_dev_io_fn io_cb) +{ + struct ibnbd_dev *dev; + int ret; + + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + if (!dev) + return ERR_PTR(-ENOMEM); + + if (mode == IBNBD_BLOCKIO) { + dev->blk_open_flags = flags; + ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags); + if (ret) + goto err; + } else if (mode == IBNBD_FILEIO) { + dev->blk_open_flags = FMODE_READ; + ret = ibnbd_dev_blk_open(dev, path, 
dev->blk_open_flags); + if (ret) + goto err; + + ret = ibnbd_dev_vfs_open(dev, path, flags); + if (ret) + goto blk_put; + } + + dev->blk_open_flags = flags; + dev->mode = mode; + dev->io_cb = io_cb; + bdevname(dev->bdev, dev->name); + dev->ibd_bio_set= bs; + + return dev; + +blk_put: + blkdev_put(dev->bdev, dev->blk_open_flags); +err: + kfree(dev); + return ERR_PTR(ret); +} + +void ibnbd
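The open helpers above return PTR_ERR_OR_ZERO(...), and ibnbd_dev_open() reports failure through ERR_PTR(). Both rely on the kernel convention of encoding a small negative errno in the top of the pointer range. The sketch below imitates that convention in userspace, assuming the same 4095-value error window; the lower-case helper names are hypothetical stand-ins for ERR_PTR(), IS_ERR(), PTR_ERR() and PTR_ERR_OR_ZERO().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True if ptr lies in the last MAX_ERRNO values of the address range. */
bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value from an error pointer. */
long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error code for error pointers, 0 for every other pointer. */
long ptr_err_or_zero(const void *ptr)
{
	return is_err(ptr) ? ptr_err(ptr) : 0;
}
```

This lets a single return value carry either a valid object or an error code, which is why callers of ibnbd_dev_open() have to test the result with an IS_ERR-style check rather than against NULL.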
[PATCH 18/24] ibnbd: server: private header with server structs and functions
This header describes main structs and functions used by ibnbd-server module, namely structs for managing sessions from different clients and mapped (opened) devices. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-srv.h | 100 1 file changed, 100 insertions(+) diff --git a/drivers/block/ibnbd/ibnbd-srv.h b/drivers/block/ibnbd/ibnbd-srv.h new file mode 100644 index ..191a1650bc1d --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-srv.h @@ -0,0 +1,100 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#ifndef IBNBD_SRV_H +#define IBNBD_SRV_H + +#include +#include +#include + +#include "ibtrs.h" +#include "ibnbd-proto.h" +#include "ibnbd-log.h" + +struct ibnbd_srv_session { + /* Entry inside global sess_list */ + struct list_headlist; + struct ibtrs_srv*ibtrs; + charsessname[NAME_MAX]; + int queue_depth; + struct bio_set *sess_bio_set; + + rwlock_tindex_lock cacheline_aligned; + struct idr index_idr; + /* List of struct ibnbd_srv_sess_dev */ + struct list_headsess_dev_list; + struct mutexlock; + u8 ver; +}; + +struct ibnbd_srv_dev { + /* Entry inside global dev_list */ + struct list_headlist; + struct kobject dev_kobj; + struct kobject dev_sessions_kobj; + struct kref kref; + charid[NAME_MAX]; + /* List of ibnbd_srv_sess_dev structs */ + struct list_headsess_dev_list; + struct mutexlock; + int open_write_cnt; + enum ibnbd_io_mode mode; +}; + +/* Structure which binds N devices and N sessions */ +struct ibnbd_srv_sess_dev { + /* Entry inside ibnbd_srv_dev struct */ + struct list_headdev_list; + /* Entry inside ibnbd_srv_session struct */ + struct list_headsess_list; + struct ibnbd_dev*ibnbd_dev; + struct ibnbd_srv_session*sess; + struct ibnbd_srv_dev*dev; + struct kobject kobj; + struct completion *sysfs_release_compl; + u32 device_id; + fmode_t open_flags; + struct kref kref; + struct completion *destroy_comp; + charpathname[NAME_MAX]; +}; + +/* ibnbd-srv-sysfs.c */ + +int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev, + struct block_device *bdev, + const char *dir_name); +void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev); +int ibnbd_srv_create_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev); +void ibnbd_srv_destroy_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev); +int ibnbd_srv_create_sysfs_files(void); +void ibnbd_srv_destroy_sysfs_files(void); + +#endif /* IBNBD_SRV_H */ -- 2.13.1
[PATCH 15/24] ibnbd: client: private header with client structs and functions
This header describes main structs and functions used by ibnbd-client module, mainly for managing IBNBD sessions and mapped block devices, creating and destroying sysfs entries. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-clt.h | 193 1 file changed, 193 insertions(+) diff --git a/drivers/block/ibnbd/ibnbd-clt.h b/drivers/block/ibnbd/ibnbd-clt.h new file mode 100644 index ..b3d72b2962dd --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-clt.h @@ -0,0 +1,193 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#ifndef IBNBD_CLT_H +#define IBNBD_CLT_H + +#include +#include +#include +#include +#include + +#include "ibtrs.h" +#include "ibnbd-proto.h" +#include "ibnbd-log.h" + +#define BMAX_SEGMENTS 31 +#define RECONNECT_DELAY 30 +#define MAX_RECONNECTS -1 + +enum ibnbd_clt_dev_state { + DEV_STATE_INIT, + DEV_STATE_MAPPED, + DEV_STATE_MAPPED_DISCONNECTED, + DEV_STATE_UNMAPPED, +}; + +enum ibnbd_queue_mode { + BLK_MQ, + BLK_RQ +}; + +struct ibnbd_iu_comp { + wait_queue_head_t wait; + int errno; +}; + +struct ibnbd_iu { + union { + struct request *rq; /* for block io */ + void *buf; /* for user messages */ + }; + struct ibtrs_tag*tag; + union { + /* use to send msg associated with a dev */ + struct ibnbd_clt_dev *dev; + /* use to send msg associated with a sess */ + struct ibnbd_clt_session *sess; + }; + blk_status_tstatus; + struct scatterlist sglist[BMAX_SEGMENTS]; + struct work_struct work; + int errno; + struct ibnbd_iu_comp*comp; +}; + +struct ibnbd_cpu_qlist { + struct list_headrequeue_list; + spinlock_t requeue_lock; + unsigned intcpu; +}; + +struct ibnbd_clt_session { + struct list_headlist; + struct ibtrs_clt*ibtrs; + wait_queue_head_t ibtrs_waitq; + boolibtrs_ready; + struct ibnbd_cpu_qlist __percpu + *cpu_queues; + DECLARE_BITMAP(cpu_queues_bm, NR_CPUS); + int __percpu*cpu_rr; /* per-cpu var for CPU round-robin */ + atomic_tbusy; + int queue_depth; + u32 max_io_size; + struct blk_mq_tag_set tag_set; + struct mutexlock; /* protects state and devs_list */ + struct list_headdevs_list; /* list of struct ibnbd_clt_dev */ + refcount_t refcount; + charsessname[NAME_MAX]; + u8 ver; /* protocol version */ +}; + +/** + * Submission queues. 
+ */ +struct ibnbd_queue { + struct list_headrequeue_list; + unsigned long in_list; + struct ibnbd_clt_dev*dev; + struct blk_mq_hw_ctx*hctx; +}; + +struct ibnbd_clt_dev { + struct ibnbd_clt_session*sess; + struct request_queue*queue; + struct ibnbd_queue *hw_queues; + struct delayed_work rq_delay_work; + u32 device_id; + /* local Idr index - used to track minor number allocations. */ + u32 clt_device_id; + struct mutexlock; + enum ibnbd_clt_dev_statedev_state; + enum ibnbd_queue_mode queue_mode; + enum ibnbd_io_mode io_mode; /* user requested */ + enum ibnbd_io_mode remote_io_mode; /* server really used */ + charpathname[NAME_MAX]; + enum ibnbd_access_mode access_mode; + boolread_only; + boolrotational; + u32
[PATCH 17/24] ibnbd: client: sysfs interface functions
This is the sysfs interface to IBNBD block devices on client side:

  /sys/kernel/ibnbd_client/
    |- map_device
    |  *** maps remote device
    |
    |- devices/
       *** all mapped devices

  /sys/block/ibnbd/ibnbd_client/
    |- unmap_device
    |  *** unmaps device
    |
    |- state
    |  *** device state
    |
    |- session
    |  *** session name
    |
    |- mapping_path
       *** path of the dev that was mapped on server

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/ibnbd-clt-sysfs.c | 723 ++
 1 file changed, 723 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-clt-sysfs.c b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
new file mode 100644
index 000000000000..2770b5c81c23
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
@@ -0,0 +1,723 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ibnbd-clt.h" + +static struct kobject *ibnbd_kobject; +static struct kobject *ibnbd_devices_kobject; + +enum { + IBNBD_OPT_ERR = 0, + IBNBD_OPT_PATH = 1 << 0, + IBNBD_OPT_DEV_PATH = 1 << 1, + IBNBD_OPT_ACCESS_MODE = 1 << 3, + IBNBD_OPT_INPUT_MODE= 1 << 4, + IBNBD_OPT_IO_MODE = 1 << 5, + IBNBD_OPT_SESSNAME = 1 << 6, +}; + +static unsigned int ibnbd_opt_mandatory[] = { + IBNBD_OPT_PATH, + IBNBD_OPT_DEV_PATH, + IBNBD_OPT_SESSNAME, +}; + +static const match_table_t ibnbd_opt_tokens = { + { IBNBD_OPT_PATH, "path=%s" }, + { IBNBD_OPT_DEV_PATH, "device_path=%s"}, + { IBNBD_OPT_ACCESS_MODE, "access_mode=%s"}, + { IBNBD_OPT_INPUT_MODE, "input_mode=%s" }, + { IBNBD_OPT_IO_MODE, "io_mode=%s"}, + { IBNBD_OPT_SESSNAME, "sessname=%s" }, + { IBNBD_OPT_ERR, NULL}, +}; + +/* remove new line from string */ +static void strip(char *s) +{ + char *p = s; + + while (*s != '\0') { + if (*s != '\n') + *p++ = *s++; + else + ++s; + } + *p = '\0'; +} + +static int ibnbd_clt_parse_map_options(const char *buf, + char *sessname, + struct ibtrs_addr *paths, + size_t *path_cnt, + size_t max_path_cnt, + char *pathname, + enum ibnbd_access_mode *access_mode, + enum ibnbd_queue_mode *queue_mode, + enum ibnbd_io_mode *io_mode) +{ + char *options, *sep_opt; + char *p; + substring_t args[MAX_OPT_ARGS]; + int opt_mask = 0; + int token; + int ret = -EINVAL; + int i; + int p_cnt = 0; + + options = kstrdup(buf, GFP_KERNEL); + if (!options) + return -ENOMEM; + + options = strstrip(options); + strip(options); + sep_opt = options; + while ((p = strsep(&sep_opt, " ")) != NULL) { + if (!*p) + continue; + + token = match_token(p, ibnbd_opt_tokens, args); + opt_mask |= token; + + switch (token) { + case IBNBD_OPT_SESSNAME: + p = match_strdup(args); +
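The parsing loop in ibnbd_clt_parse_map_options() follows a common pattern: split the string written to map_device on spaces, match every token against a table, OR the matched flag into a mask, and finally compare the mask with the set of mandatory options. Below is a minimal userspace sketch of that pattern, not the driver code itself. The names parse_map_options(), match_opt(), struct map_opts and the OPT_* flags are hypothetical, and plain strspn()/strcspn() stand in for the kernel's strsep() and match_token().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flags; the bit positions mirror IBNBD_OPT_* in the patch. */
enum {
	OPT_PATH     = 1 << 0,
	OPT_DEV_PATH = 1 << 1,
	OPT_SESSNAME = 1 << 6,
};

struct map_opts {
	char path[64];
	char device_path[64];
	char sessname[64];
};

/* Copy the value after "key=" into dst if tok starts with that key. */
static int match_opt(const char *tok, const char *key, char *dst, size_t len)
{
	size_t klen = strlen(key);

	if (strncmp(tok, key, klen) != 0 || tok[klen] != '=')
		return 0;
	snprintf(dst, len, "%s", tok + klen + 1);
	return 1;
}

/* Returns 0 on success, -1 on an unknown token or a missing mandatory option. */
static int parse_map_options(const char *buf, struct map_opts *o)
{
	const int mandatory = OPT_PATH | OPT_DEV_PATH | OPT_SESSNAME;
	char copy[256];
	char *p = copy;
	int mask = 0;

	assert(buf && o);
	if (strlen(buf) >= sizeof(copy))
		return -1;
	strcpy(copy, buf);
	memset(o, 0, sizeof(*o));

	while (*p) {
		char *tok;

		p += strspn(p, " \n");		/* skip separators and newlines */
		if (!*p)
			break;
		tok = p;
		p += strcspn(p, " \n");		/* find the end of this token */
		if (*p)
			*p++ = '\0';

		if (match_opt(tok, "path", o->path, sizeof(o->path)))
			mask |= OPT_PATH;
		else if (match_opt(tok, "device_path", o->device_path,
				   sizeof(o->device_path)))
			mask |= OPT_DEV_PATH;
		else if (match_opt(tok, "sessname", o->sessname,
				   sizeof(o->sessname)))
			mask |= OPT_SESSNAME;
		else
			return -1;		/* unknown option */
	}

	return (mask & mandatory) == mandatory ? 0 : -1;
}
```

Collecting a bitmask first and validating it once at the end keeps the per-token code free of ordering assumptions, so the options may appear in any order.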
[PATCH 19/24] ibnbd: server: main functionality
This is the main functionality of the ibnbd-server module, which handles
IBTRS events and IBNBD protocol requests, like map (open) or unmap (close)
device. The server side is also responsible for processing incoming IBTRS
IO requests and forwarding them to locally mapped devices.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/ibnbd-srv.c | 901
 1 file changed, 901 insertions(+)

diff --git a/drivers/block/ibnbd/ibnbd-srv.c b/drivers/block/ibnbd/ibnbd-srv.c
new file mode 100644
index ..a32d22ab67a3
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.c
@@ -0,0 +1,901 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include + +#include "ibnbd-srv.h" +#include "ibnbd-srv-dev.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_VERSION(IBNBD_VER_STRING); +MODULE_DESCRIPTION("InfiniBand Network Block Device Server"); +MODULE_LICENSE("GPL"); + +#define DEFAULT_DEV_SEARCH_PATH "/" + +static char dev_search_path[PATH_MAX] = DEFAULT_DEV_SEARCH_PATH; + +static int dev_search_path_set(const char *val, const struct kernel_param *kp) +{ + char *dup; + + if (strlen(val) >= sizeof(dev_search_path)) + return -EINVAL; + + dup = kstrdup(val, GFP_KERNEL); + + if (dup[strlen(dup) - 1] == '\n') + dup[strlen(dup) - 1] = '\0'; + + strlcpy(dev_search_path, dup, sizeof(dev_search_path)); + + kfree(dup); + pr_info("dev_search_path changed to '%s'\n", dev_search_path); + + return 0; +} + +static struct kparam_string dev_search_path_kparam_str = { + .maxlen = sizeof(dev_search_path), + .string = dev_search_path +}; + +static const struct kernel_param_ops dev_search_path_ops = { + .set= dev_search_path_set, + .get= param_get_string, +}; + +module_param_cb(dev_search_path, &dev_search_path_ops, + &dev_search_path_kparam_str, 0444); +MODULE_PARM_DESC(dev_search_path, "Sets the device_search_path." +" When a device is mapped this path is prepended to the" +" device_path from the map_device operation." +" (default: " DEFAULT_DEV_SEARCH_PATH ")"); + +static int def_io_mode = IBNBD_BLOCKIO; +module_param(def_io_mode, int, 0444); +MODULE_PARM_DESC(def_io_mode, "By default, export devices in" +" blockio(" __stringify(_IBNBD_BLOCKIO) ") or" +" fileio(" __stringify(_IBNBD_FILEIO) ") mode." 
+" (default: " __stringify(_IBNBD_BLOCKIO) " (blockio))"); + +static DEFINE_MUTEX(sess_lock); +static DEFINE_SPINLOCK(dev_lock); + +static LIST_HEAD(sess_list); +static LIST_HEAD(dev_list); + +struct ibnbd_io_private { + struct ibtrs_srv_op *id; + struct ibnbd_srv_sess_dev *sess_dev; +}; + +static void ibnbd_sess_dev_release(struct kref *kref) +{ + struct ibnbd_srv_sess_dev *sess_dev; + + sess_dev = container_of(kref, struct ibnbd_srv_sess_dev, kref); + complete(sess_dev->destroy_comp); +} + +static inline void ibnbd_put_sess_dev(struct ibnbd_srv_sess_dev *sess_dev) +{ + kref_put(&sess_dev->kref, ibnbd_sess_dev_release); +} + +static void ibnbd_endio(void *priv, int error) +{ + struct ibnbd_io_private *ibnbd_priv = priv; + struct ibnbd_srv_sess_dev *sess_dev = ibnbd_priv->sess_dev; + + ibnbd_put_sess_dev(sess_dev); + + ibtrs_srv_resp_rdma(ibnbd_priv->id, error); + + kfree(priv); +} + +static struct ibnbd_srv_sess_dev * +ibnbd_get_sess_dev(int dev_id, struct ibnbd_srv_session *srv_sess) +{ +
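A note on dev_search_path_set() as posted: the result of kstrdup() is dereferenced without a NULL check, and dup[strlen(dup) - 1] indexes before the buffer when the written value is empty. The following is a userspace sketch of the same "strip one trailing newline, then store" logic with both cases handled. It is an illustration only: set_search_path(), search_path and SEARCH_PATH_MAX are hypothetical stand-ins, and accepting an empty value is an assumption of this sketch rather than something the patch specifies.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SEARCH_PATH_MAX 4096	/* stand-in for PATH_MAX in this sketch */

static char search_path[SEARCH_PATH_MAX] = "/";

/*
 * Store val as the new search path, dropping one trailing newline.
 * No temporary copy is needed, so there is no allocation that can fail.
 */
static int set_search_path(const char *val)
{
	size_t len;

	assert(val);
	len = strlen(val);

	/* Guard len > 0 before looking at the last character. */
	if (len > 0 && val[len - 1] == '\n')
		len--;
	if (len >= sizeof(search_path))
		return -EINVAL;

	memcpy(search_path, val, len);
	search_path[len] = '\0';
	return 0;
}
```

Working on the length instead of a duplicated string removes both failure modes at once, because the empty-string case is covered by the len > 0 guard and nothing is allocated.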
[PATCH 24/24] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules
Signed-off-by: Roman Pen
Cc: Danil Kipnis
Cc: Jack Wang
---
 MAINTAINERS | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 18994806e441..fad9c2529f8a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6714,6 +6714,20 @@ IBM ServeRAID RAID DRIVER
 S:	Orphan
 F:	drivers/scsi/ips.*
 
+IBNBD BLOCK DRIVERS
+M:	IBNBD/IBTRS Storage Team
+L:	linux-block@vger.kernel.org
+S:	Maintained
+T:	git git://github.com/profitbricks/ibnbd.git
+F:	drivers/block/ibnbd/
+
+IBTRS TRANSPORT DRIVERS
+M:	IBNBD/IBTRS Storage Team
+L:	linux-r...@vger.kernel.org
+S:	Maintained
+T:	git git://github.com/profitbricks/ibnbd.git
+F:	drivers/infiniband/ulp/ibtrs/
+
 ICH LPC AND GPIO DRIVER
 M:	Peter Tyser
 S:	Maintained
-- 
2.13.1
[PATCH 14/24] ibnbd: private headers with IBNBD protocol structs and helpers
These are common private headers with IBNBD protocol structures, logging, sysfs and other helper functions, which are used on both client and server sides. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-log.h | 71 drivers/block/ibnbd/ibnbd-proto.h | 360 ++ 2 files changed, 431 insertions(+) diff --git a/drivers/block/ibnbd/ibnbd-log.h b/drivers/block/ibnbd/ibnbd-log.h new file mode 100644 index ..489343a61171 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-log.h @@ -0,0 +1,71 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBNBD_LOG_H +#define IBNBD_LOG_H + +#include "ibnbd-clt.h" +#include "ibnbd-srv.h" + +#define ibnbd_diskname(dev) ({ \ + struct gendisk *gd = ((struct ibnbd_clt_dev *)dev)->gd; \ + gd ? gd->disk_name : "";\ +}) + +void unknown_type(void); + +#define ibnbd_log(fn, dev, fmt, ...) 
({ \ + __builtin_choose_expr( \ + __builtin_types_compatible_p( \ + typeof(dev), struct ibnbd_clt_dev *), \ + fn("<%s@%s> %s: " fmt, (dev)->pathname, \ + (dev)->sess->sessname, ibnbd_diskname(dev), \ + ##__VA_ARGS__), \ + __builtin_choose_expr( \ + __builtin_types_compatible_p(typeof(dev), \ + struct ibnbd_srv_sess_dev *), \ + fn("<%s@%s>: " fmt, (dev)->pathname,\ + (dev)->sess->sessname, ##__VA_ARGS__), \ + unknown_type())); \ +}) + +#define ibnbd_err(dev, fmt, ...) \ + ibnbd_log(pr_err, dev, fmt, ##__VA_ARGS__) +#define ibnbd_err_rl(dev, fmt, ...)\ + ibnbd_log(pr_err_ratelimited, dev, fmt, ##__VA_ARGS__) +#define ibnbd_wrn(dev, fmt, ...) \ + ibnbd_log(pr_warn, dev, fmt, ##__VA_ARGS__) +#define ibnbd_wrn_rl(dev, fmt, ...) \ + ibnbd_log(pr_warn_ratelimited, dev, fmt, ##__VA_ARGS__) +#define ibnbd_info(dev, fmt, ...) \ + ibnbd_log(pr_info, dev, fmt, ##__VA_ARGS__) +#define ibnbd_info_rl(dev, fmt, ...) \ + ibnbd_log(pr_info_ratelimited, dev, fmt, ##__VA_ARGS__) + +#endif /* IBNBD_LOG_H */ diff --git a/drivers/block/ibnbd/ibnbd-proto.h b/drivers/block/ibnbd/ibnbd-proto.h new file mode 100644 index ..c809705a2322 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-proto.h @@ -0,0 +1,360 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBNBD
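The ibnbd_log() macro in ibnbd-log.h selects a log prefix at compile time from the static type of its dev argument, using the GCC/Clang builtins __builtin_choose_expr() and __builtin_types_compatible_p(). A reduced userspace sketch of that dispatch is shown here. The struct layouts and the dev_prefix() macro are hypothetical simplifications, snprintf() stands in for the pr_*() helpers, and the final unknown_type() fallback from the patch is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced versions of the client and server device structs. */
struct sess {
	const char *sessname;
};

struct clt_dev {
	const char *pathname;
	struct sess *sess;
	const char *disk_name;
};

struct srv_dev {
	const char *pathname;
	struct sess *sess;
};

/*
 * Pick the prefix format by the static type of dev. Only the chosen
 * branch is evaluated, but both must still compile, which is why the
 * client branch casts dev before touching disk_name.
 */
#define dev_prefix(buf, len, dev)					\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(dev),		\
					     struct clt_dev *),		\
		snprintf(buf, len, "<%s@%s> %s: ", (dev)->pathname,	\
			 (dev)->sess->sessname,				\
			 ((struct clt_dev *)(dev))->disk_name),		\
		snprintf(buf, len, "<%s@%s>: ", (dev)->pathname,	\
			 (dev)->sess->sessname))
```

Because the selection happens at compile time, there is no runtime type tag to maintain. In the patch an argument of any other type falls through to the undefined function unknown_type(), which presumably turns misuse into a build failure.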
[PATCH v2 08/26] ibtrs: client: statistics functions
This introduces set of functions used on client side to account statistics of RDMA data sent/received, amount of IOs inflight, latency, cpu migrations, etc. Almost all statistics is collected using percpu variables. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c | 455 + 1 file changed, 455 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c new file mode 100644 index ..af2ed05d2900 --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c @@ -0,0 +1,455 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibtrs-clt.h" + +static inline int ibtrs_clt_ms_to_id(unsigned long ms) +{ + int id = ms ? 
ilog2(ms) - MIN_LOG_LAT + 1 : 0; + + return clamp(id, 0, LOG_LAT_SZ - 1); +} + +void ibtrs_clt_update_rdma_lat(struct ibtrs_clt_stats *stats, bool read, + unsigned long ms) +{ + struct ibtrs_clt_stats_pcpu *s; + int id; + + id = ibtrs_clt_ms_to_id(ms); + s = this_cpu_ptr(stats->pcpu_stats); + if (read) { + s->rdma_lat_distr[id].read++; + if (s->rdma_lat_max.read < ms) + s->rdma_lat_max.read = ms; + } else { + s->rdma_lat_distr[id].write++; + if (s->rdma_lat_max.write < ms) + s->rdma_lat_max.write = ms; + } +} + +void ibtrs_clt_decrease_inflight(struct ibtrs_clt_stats *stats) +{ + atomic_dec(&stats->inflight); +} + +void ibtrs_clt_update_wc_stats(struct ibtrs_clt_con *con) +{ + struct ibtrs_clt_sess *sess = to_clt_sess(con->c.sess); + struct ibtrs_clt_stats *stats = &sess->stats; + struct ibtrs_clt_stats_pcpu *s; + int cpu; + + cpu = raw_smp_processor_id(); + s = this_cpu_ptr(stats->pcpu_stats); + s->wc_comp.cnt++; + s->wc_comp.total_cnt++; + if (unlikely(con->cpu != cpu)) { + s->cpu_migr.to++; + + /* Careful here, override s pointer */ + s = per_cpu_ptr(stats->pcpu_stats, con->cpu); + atomic_inc(&s->cpu_migr.from); + } +} + +void ibtrs_clt_inc_failover_cnt(struct ibtrs_clt_stats *stats) +{ + struct ibtrs_clt_stats_pcpu *s; + + s = this_cpu_ptr(stats->pcpu_stats); + s->rdma.failover_cnt++; +} + +static inline u32 ibtrs_clt_stats_get_avg_wc_cnt(struct ibtrs_clt_stats *stats) +{ + u32 cnt = 0; + u64 sum = 0; + int cpu; + + for_each_possible_cpu(cpu) { + struct ibtrs_clt_stats_pcpu *s; + + s = per_cpu_ptr(stats->pcpu_stats, cpu); + sum += s->wc_comp.total_cnt; + cnt += s->wc_comp.cnt; + } + + return cnt ? 
sum / cnt : 0; +} + +int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_clt_stats *stats, +char *buf, size_t len) +{ + return scnprintf(buf, len, "%u\n", +ibtrs_clt_stats_get_avg_wc_cnt(stats)); +} + +ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_clt_stats *stats, + char *page, size_t len) +{ + struct ibtrs_clt_stats_rdma_lat res[LOG_LAT_SZ]; + struct ibtrs_clt_stats_rdma_lat max; + struct ibtrs_clt_stats_pcpu *s; + + ssize_t cnt = 0; + int i, cpu; + + max.write = 0; + max.read = 0; + for_each_possible_cpu(cpu) { + s = per_cpu_ptr(stats->pcpu_stats, cpu); + + if (max.write < s->rdma_lat_max.write) +
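ibtrs_clt_ms_to_id() maps a latency in milliseconds to a histogram bucket by taking the base-2 logarithm and clamping the result to the size of the table. A small userspace sketch of that mapping follows. The values chosen for MIN_LOG_LAT and LOG_LAT_SZ are assumptions for illustration (the real ones live in a header that is not part of this mail), and ilog2_ul() is a hypothetical replacement for the kernel's ilog2().

```c
#include <assert.h>

#define MIN_LOG_LAT	3	/* assumed value, for illustration only */
#define LOG_LAT_SZ	10	/* assumed value, for illustration only */

/* Index of the highest set bit; -1 for v == 0. */
static int ilog2_ul(unsigned long v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Bucket 0 collects everything below 2^MIN_LOG_LAT ms, the last bucket the tail. */
static int ms_to_id(unsigned long ms)
{
	int id = ms ? ilog2_ul(ms) - MIN_LOG_LAT + 1 : 0;

	if (id < 0)
		return 0;
	if (id > LOG_LAT_SZ - 1)
		return LOG_LAT_SZ - 1;
	return id;
}
```

With these assumed constants, latencies of 0 to 7 ms land in bucket 0, 8 to 15 ms in bucket 1, 16 to 31 ms in bucket 2, and anything from 2048 ms upwards in the last bucket.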
[PATCH v2 07/26] ibtrs: client: main functionality
This is main functionality of ibtrs-client module, which manages set of RDMA connections for each IBTRS session, does multipathing, load balancing and failover of RDMA requests. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-clt.c | 2818 ++ 1 file changed, 2818 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c new file mode 100644 index ..0983f0939b19 --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c @@ -0,0 +1,2818 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include + +#include "ibtrs-clt.h" +#include "ibtrs-log.h" + +#define MAX_SEGMENTS 31 +#define IBTRS_CONNECT_TIMEOUT_MS 5000 + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("IBTRS Client"); +MODULE_VERSION(IBTRS_VER_STRING); +MODULE_LICENSE("GPL"); + +static ushort nr_cons_per_session; +module_param(nr_cons_per_session, ushort, 0444); +MODULE_PARM_DESC(nr_cons_per_session, "Number of connections per session." +" (default: nr_cpu_ids)"); + +static int retry_cnt = 7; +module_param_named(retry_cnt, retry_cnt, int, 0644); +MODULE_PARM_DESC(retry_cnt, "Number of times to send the message if the" +" remote side didn't respond with Ack or Nack (default: 7," +" min: " __stringify(MIN_RTR_CNT) ", max: " +__stringify(MAX_RTR_CNT) ")"); + +static int __read_mostly noreg_cnt = 0; +module_param_named(noreg_cnt, noreg_cnt, int, 0444); +MODULE_PARM_DESC(noreg_cnt, "Max number of SG entries when MR registration " +"does not happen (default: 0)"); + +static const struct ibtrs_ib_dev_pool_ops dev_pool_ops; +static struct ibtrs_ib_dev_pool dev_pool = { + .ops = &dev_pool_ops +}; +static struct workqueue_struct *ibtrs_wq; +static struct class *ibtrs_dev_class; + +static void ibtrs_rdma_error_recovery(struct ibtrs_clt_con *con); +static int ibtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id, +struct rdma_cm_event *ev); +static void ibtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc); +static void complete_rdma_req(struct ibtrs_clt_io_req *req, int errno, + bool notify, bool can_wait); +static int ibtrs_clt_write_req(struct ibtrs_clt_io_req *req); +static int ibtrs_clt_read_req(struct ibtrs_clt_io_req *req); + +bool ibtrs_clt_sess_is_connected(const struct ibtrs_clt_sess *sess) +{ + return sess->state == IBTRS_CLT_CONNECTED; +} + +static inline bool ibtrs_clt_is_connected(const struct ibtrs_clt *clt) +{ + struct ibtrs_clt_sess *sess; + bool connected = false; + 
+ rcu_read_lock(); + list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) + connected |= ibtrs_clt_sess_is_connected(sess); + rcu_read_unlock(); + + return connected; +} + +static inline struct ibtrs_tag * +__ibtrs_get_tag(struct ibtrs_clt *clt, enum ibtrs_clt_con_type con_type) +{ + size_t max_depth = clt->queue_depth; + struct ibtrs_tag *tag; + int cpu, bit; + + cpu = get_cpu(); + do { + bit = find_first_zero_bit(clt->tags_map, max_depth); + if (unlikely(bit >= max_depth)) { + put_cpu(); + return NULL; + } + + } while (unlikely(test_and_set_bit_lock(bit, clt->tags_map))); + put_cpu(); + + tag = GET_TAG(clt, bit); + WARN_ON(tag->mem_id != bit); + tag->cpu_id = cpu; +
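__ibtrs_get_tag() allocates a tag by searching the bitmap for a zero bit and then claiming it with test_and_set_bit_lock(), retrying if another CPU won the race. The same search-then-claim loop can be sketched in userspace with C11 atomics. This is a simplified illustration limited to 64 tags in a single word. struct tag_map, tag_map_init(), tag_get() and tag_put() are hypothetical names, and the per-CPU bookkeeping done with get_cpu()/put_cpu() in the patch is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct tag_map {
	_Atomic uint64_t bits;	/* bit i set means tag i is in use */
	unsigned int depth;	/* number of usable tags, at most 64 */
};

static void tag_map_init(struct tag_map *m, unsigned int depth)
{
	assert(depth <= 64);
	atomic_init(&m->bits, 0);
	m->depth = depth;
}

/* Returns a free tag index, or -1 if all tags are taken. */
static int tag_get(struct tag_map *m)
{
	for (;;) {
		uint64_t cur = atomic_load(&m->bits);
		unsigned int bit = 0;
		uint64_t old;

		/* Find the first zero bit in the snapshot. */
		while (bit < m->depth && (cur & (UINT64_C(1) << bit)))
			bit++;
		if (bit >= m->depth)
			return -1;

		/* Claim it; if it was already set, somebody raced us: retry. */
		old = atomic_fetch_or_explicit(&m->bits, UINT64_C(1) << bit,
					       memory_order_acquire);
		if (!(old & (UINT64_C(1) << bit)))
			return (int)bit;
	}
}

static void tag_put(struct tag_map *m, int bit)
{
	atomic_fetch_and_explicit(&m->bits, ~(UINT64_C(1) << bit),
				  memory_order_release);
}
```

The search works on a possibly stale snapshot, which is fine because the atomic fetch-or is what decides ownership; acquire on get and release on put order the accesses to whatever the tag protects.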
[PATCH v2 10/26] ibtrs: server: private header with server structs and functions
This header describes main structs and functions used by ibtrs-server module, mainly for accepting IBTRS sessions, creating/destroying sysfs entries, accounting statistics on server side. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-srv.h | 175 +++ 1 file changed, 175 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h new file mode 100644 index ..8193d568e67e --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h @@ -0,0 +1,175 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBTRS_SRV_H +#define IBTRS_SRV_H + +#include +#include +#include "ibtrs-pri.h" + +/** + * enum ibtrs_srv_state - Server states. 
+ */ +enum ibtrs_srv_state { + IBTRS_SRV_CONNECTING, + IBTRS_SRV_CONNECTED, + IBTRS_SRV_CLOSING, + IBTRS_SRV_CLOSED, +}; + +static inline const char *ibtrs_srv_state_str(enum ibtrs_srv_state state) +{ + switch (state) { + case IBTRS_SRV_CONNECTING: + return "IBTRS_SRV_CONNECTING"; + case IBTRS_SRV_CONNECTED: + return "IBTRS_SRV_CONNECTED"; + case IBTRS_SRV_CLOSING: + return "IBTRS_SRV_CLOSING"; + case IBTRS_SRV_CLOSED: + return "IBTRS_SRV_CLOSED"; + default: + return "UNKNOWN"; + } +} + +struct ibtrs_stats_wc_comp { + atomic64_t calls; + atomic64_t total_wc_cnt; +}; + +struct ibtrs_srv_stats_rdma_stats { + struct { + atomic64_t cnt; + atomic64_t size_total; + } dir[2]; +}; + +struct ibtrs_srv_stats { + struct ibtrs_srv_stats_rdma_stats rdma_stats; + atomic_tapm_cnt; + struct ibtrs_stats_wc_comp wc_comp; +}; + +struct ibtrs_srv_con { + struct ibtrs_conc; + atomic_twr_cnt; +}; + +struct ibtrs_srv_op { + struct ibtrs_srv_con*con; + u32 msg_id; + u8 dir; + struct ibtrs_msg_rdma_read *rd_msg; + struct ib_rdma_wr *tx_wr; + struct ib_sge *tx_sg; +}; + +struct ibtrs_srv_mr { + struct ib_mr*mr; + struct sg_table sgt; +}; + +struct ibtrs_srv_sess { + struct ibtrs_sess s; + struct ibtrs_srv*srv; + struct work_struct close_work; + enum ibtrs_srv_statestate; + spinlock_t state_lock; + int cur_cq_vector; + struct ibtrs_srv_op **ops_ids; + atomic_tids_inflight; + wait_queue_head_t ids_waitq; + struct ibtrs_srv_mr *mrs; + unsigned intmrs_num; + dma_addr_t *dma_addr; + boolestablished; + unsigned intmem_bits; + struct kobject kobj; + struct kobject kobj_stats; + struct ibtrs_srv_stats stats; +}; + +struct ibtrs_srv { + struct list_headpaths_list; + int paths_up; + struct mutexpaths_ev_mutex; + size_t paths_num; + struct mutexpaths_mutex; + uuid_t paths_uuid; + refcount_t refcount; + struct ibtrs_srv_ctx*ctx; + struct list_headctx_list; + void*priv; + size_t queue_depth; + struct page **chunks; + struct device dev; + unsigneddev_ref; + struct kobject kobj_paths; +}; + +struct ibtrs_
[PATCH v2 13/26] ibtrs: server: sysfs interface functions
This is the sysfs interface to IBTRS sessions on server side:

  /sys/devices/virtual/ibtrs-server//
    *** IBTRS session accepted from a client peer
    |
    |- paths//
       *** established paths from a client in a session
       |
       |- disconnect
       |  *** disconnect path
       |
       |- hca_name
       |  *** HCA name
       |
       |- hca_port
       |  *** HCA port
       |
       |- stats/
          *** current path statistics
          |
          |- rdma
          |- reset_all
          |- wc_completions

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c | 271 +
 1 file changed, 271 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
new file mode 100644
index ..96d9d9f08e0e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
@@ -0,0 +1,271 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibtrs-pri.h" +#include "ibtrs-srv.h" +#include "ibtrs-log.h" + +extern struct class *ibtrs_dev_class; + +static struct kobj_type ktype = { + .sysfs_ops = &kobj_sysfs_ops, +}; + +static ssize_t ibtrs_srv_disconnect_show(struct kobject *kobj, +struct kobj_attribute *attr, +char *page) +{ + return scnprintf(page, PAGE_SIZE, "Usage: echo 1 > %s\n", +attr->attr.name); +} + +static ssize_t ibtrs_srv_disconnect_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct ibtrs_srv_sess *sess; + char str[MAXHOSTNAMELEN]; + + sess = container_of(kobj, struct ibtrs_srv_sess, kobj); + if (!sysfs_streq(buf, "1")) { + ibtrs_err(sess, "%s: invalid value: '%s'\n", + attr->attr.name, buf); + return -EINVAL; + } + + sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr, str, sizeof(str)); + + ibtrs_info(sess, "disconnect for path %s requested\n", str); + ibtrs_srv_queue_close(sess); + + return count; +} + +static struct kobj_attribute ibtrs_srv_disconnect_attr = + __ATTR(disconnect, 0644, + ibtrs_srv_disconnect_show, ibtrs_srv_disconnect_store); + +static ssize_t ibtrs_srv_hca_port_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *page) +{ + struct ibtrs_srv_sess *sess; + struct ibtrs_con *usr_con; + + sess = container_of(kobj, typeof(*sess), kobj); + usr_con = sess->s.con[0]; + + return scnprintf(page, PAGE_SIZE, "%u\n", +usr_con->cm_id->port_num); +} + +static struct kobj_attribute ibtrs_srv_hca_port_attr = + __ATTR(hca_port, 0444, ibtrs_srv_hca_port_show, NULL); + +static ssize_t ibtrs_srv_hca_name_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *page) +{ + struct ibtrs_srv_sess *sess; + + sess = container_of(kobj, struct ibtrs_srv_sess, kobj); + + return scnprintf(page, PAGE_SIZE, "%s\n", +sess->s.dev->ib_dev->name); +} + +static struct kobj_attribute ibtrs_srv_hca_name_attr = + __ATTR(hca_name, 
0444, ibtrs_srv_hca_name_show, NULL); + +static struct attribute *ibtrs_srv_sess_attrs[] = { + &ibtrs_srv_hca_name_attr.attr, +
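ibtrs_srv_disconnect_store() accepts only the value "1" and relies on sysfs_streq() so that the newline appended by echo does not cause a mismatch. A userspace sketch of that comparison and of the store handler's accept/reject logic is given below. streq_nl() is modelled on the kernel helper's behaviour (equal, or equal apart from one trailing newline on either side), while disconnect_store() is a hypothetical reduction of the handler without the session lookup and the close request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* True if s1 and s2 are equal, ignoring one trailing '\n' on either side. */
static bool streq_nl(const char *s1, const char *s2)
{
	assert(s1 && s2);

	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}

	if (*s1 == *s2)
		return true;
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}

/*
 * Reduced store handler: returns the number of bytes consumed when the
 * input is "1" (with or without a newline), -EINVAL otherwise.
 */
static long disconnect_store(const char *buf, size_t count)
{
	if (!streq_nl(buf, "1"))
		return -EINVAL;

	/* The real handler would queue the path close here. */
	return (long)count;
}
```

Returning count on success tells the sysfs core that the whole write was consumed, so the writer does not retry with the remainder.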
[PATCH v2 12/26] ibtrs: server: statistics functions
This introduces a set of functions used on the server side to account
statistics of RDMA data sent/received.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c | 110 +
 1 file changed, 110 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
new file mode 100644
index ..5933cfc03f95
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
@@ -0,0 +1,110 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-srv.h"
+
+void ibtrs_srv_update_rdma_stats(struct ibtrs_srv_stats *s,
+				 size_t size, int d)
+{
+	atomic64_inc(&s->rdma_stats.dir[d].cnt);
+	atomic64_add(size, &s->rdma_stats.dir[d].size_total);
+}
+
+void ibtrs_srv_update_wc_stats(struct ibtrs_srv_stats *s)
+{
+	atomic64_inc(&s->wc_comp.calls);
+	atomic64_inc(&s->wc_comp.total_wc_cnt);
+}
+
+int ibtrs_srv_reset_rdma_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+	if (enable) {
+		struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+
+		memset(r, 0, sizeof(*r));
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+ssize_t ibtrs_srv_stats_rdma_to_str(struct ibtrs_srv_stats *stats,
+				    char *page, size_t len)
+{
+	struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats;
+	struct ibtrs_srv_sess *sess;
+
+	sess = container_of(stats, typeof(*sess), stats);
+
+	return scnprintf(page, len, "%lld %lld %lld %lld %u\n",
+			 (s64)atomic64_read(&r->dir[READ].cnt),
+			 (s64)atomic64_read(&r->dir[READ].size_total),
+			 (s64)atomic64_read(&r->dir[WRITE].cnt),
+			 (s64)atomic64_read(&r->dir[WRITE].size_total),
+			 atomic_read(&sess->ids_inflight));
+}
+
+int ibtrs_srv_reset_wc_completion_stats(struct ibtrs_srv_stats *stats,
+					bool enable)
+{
+	if (enable) {
+		memset(&stats->wc_comp, 0, sizeof(stats->wc_comp));
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+int ibtrs_srv_stats_wc_completion_to_str(struct ibtrs_srv_stats *stats,
+					 char *buf, size_t len)
+{
+	return snprintf(buf, len, "%lld %lld\n",
+			(s64)atomic64_read(&stats->wc_comp.total_wc_cnt),
+			(s64)atomic64_read(&stats->wc_comp.calls));
+}
+
+ssize_t ibtrs_srv_reset_all_help(struct ibtrs_srv_stats *stats,
+				 char *page, size_t len)
+{
+	return scnprintf(page, PAGE_SIZE, "echo 1 to reset all statistics\n");
+}
+
+int ibtrs_srv_reset_all_stats(struct ibtrs_srv_stats *stats, bool enable)
+{
+	if (enable) {
+		ibtrs_srv_reset_wc_completion_stats(stats, enable);
+		ibtrs_srv_reset_rdma_stats(stats, enable);
+		return 0;
+	}
+
+	return -EINVAL;
+}
-- 
2.13.1
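A review note on the *_to_str() helpers in this file: ibtrs_srv_stats_wc_completion_to_str() uses snprintf() while its siblings use scnprintf(), and ibtrs_srv_reset_all_help() passes PAGE_SIZE instead of its len argument. The difference between the two return conventions is easy to miss, so here is a userspace sketch of it. scnprintf_like() is a hypothetical stand-in built on vsnprintf(), not the kernel function itself.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Like snprintf(), but returns the number of characters actually stored
 * in buf (excluding the terminating NUL) rather than the number that
 * would have been written to an unbounded buffer.
 */
static int scnprintf_like(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf && fmt);
	if (size == 0)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;
	return (size_t)n >= size ? (int)(size - 1) : n;
}
```

A show-style handler reports its return value as the length of the data in the buffer, so returning the would-be length from snprintf() can overstate what was stored when the output is truncated. With the short "%lld %lld\n" format here truncation is unlikely in practice, but using one convention throughout keeps the helpers interchangeable.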
[PATCH v2 16/26] ibnbd: private headers with IBNBD protocol structs and helpers
These are common private headers with IBNBD protocol structures, logging, sysfs and other helper functions, which are used on both client and server sides. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-log.h | 71 drivers/block/ibnbd/ibnbd-proto.h | 364 ++ 2 files changed, 435 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-log.h create mode 100644 drivers/block/ibnbd/ibnbd-proto.h diff --git a/drivers/block/ibnbd/ibnbd-log.h b/drivers/block/ibnbd/ibnbd-log.h new file mode 100644 index ..489343a61171 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-log.h @@ -0,0 +1,71 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBNBD_LOG_H +#define IBNBD_LOG_H + +#include "ibnbd-clt.h" +#include "ibnbd-srv.h" + +#define ibnbd_diskname(dev) ({ \ + struct gendisk *gd = ((struct ibnbd_clt_dev *)dev)->gd; \ + gd ? gd->disk_name : "";\ +}) + +void unknown_type(void); + +#define ibnbd_log(fn, dev, fmt, ...) 
({ \ + __builtin_choose_expr( \ + __builtin_types_compatible_p( \ + typeof(dev), struct ibnbd_clt_dev *), \ + fn("<%s@%s> %s: " fmt, (dev)->pathname, \ + (dev)->sess->sessname, ibnbd_diskname(dev), \ + ##__VA_ARGS__), \ + __builtin_choose_expr( \ + __builtin_types_compatible_p(typeof(dev), \ + struct ibnbd_srv_sess_dev *), \ + fn("<%s@%s>: " fmt, (dev)->pathname,\ + (dev)->sess->sessname, ##__VA_ARGS__), \ + unknown_type())); \ +}) + +#define ibnbd_err(dev, fmt, ...) \ + ibnbd_log(pr_err, dev, fmt, ##__VA_ARGS__) +#define ibnbd_err_rl(dev, fmt, ...)\ + ibnbd_log(pr_err_ratelimited, dev, fmt, ##__VA_ARGS__) +#define ibnbd_wrn(dev, fmt, ...) \ + ibnbd_log(pr_warn, dev, fmt, ##__VA_ARGS__) +#define ibnbd_wrn_rl(dev, fmt, ...) \ + ibnbd_log(pr_warn_ratelimited, dev, fmt, ##__VA_ARGS__) +#define ibnbd_info(dev, fmt, ...) \ + ibnbd_log(pr_info, dev, fmt, ##__VA_ARGS__) +#define ibnbd_info_rl(dev, fmt, ...) \ + ibnbd_log(pr_info_ratelimited, dev, fmt, ##__VA_ARGS__) + +#endif /* IBNBD_LOG_H */ diff --git a/drivers/block/ibnbd/ibnbd-proto.h b/drivers/block/ibnbd/ibnbd-proto.h new file mode 100644 index ..050d3fa4c1bf --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-proto.h @@ -0,0 +1,364 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU
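A note on the ibnbd_log() macro in ibnbd-log.h above: it picks the format string at compile time from the static type of "dev", and falls through to a call of the never-defined unknown_type() for any other type, so a misuse fails at link time. A minimal user-space sketch of the same trick follows, assuming GCC or Clang. The struct names and the describe_*() helpers are hypothetical stand-ins and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for struct ibnbd_clt_dev / struct ibnbd_srv_sess_dev. */
struct clt_dev { const char *pathname, *sessname, *diskname; };
struct srv_dev { const char *pathname, *sessname; };

/* Declared but never defined: selecting this branch breaks the link step. */
void unknown_type(void);

static int describe_clt(char *buf, size_t len, const void *p)
{
	const struct clt_dev *d = p;

	assert(buf && d);
	return snprintf(buf, len, "<%s@%s> %s",
			d->pathname, d->sessname, d->diskname);
}

static int describe_srv(char *buf, size_t len, const void *p)
{
	const struct srv_dev *d = p;

	assert(buf && d);
	return snprintf(buf, len, "<%s@%s>", d->pathname, d->sessname);
}

/*
 * Both branches must still compile for every argument type, which is why
 * the helpers take a const void *. Only the selected branch is evaluated.
 */
#define describe(buf, len, dev)						\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(dev),		\
					     struct clt_dev *),		\
		describe_clt((buf), (len), (dev)),			\
		__builtin_choose_expr(					\
			__builtin_types_compatible_p(__typeof__(dev),	\
						     struct srv_dev *),	\
			describe_srv((buf), (len), (dev)),		\
			unknown_type()))
```

Calling describe() with a pointer to either struct expands to exactly one helper call. A pointer of any other type selects unknown_type() and the build stops at link time, which is presumably the intent of the original macro.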
[PATCH v2 11/26] ibtrs: server: main functionality
This is the main functionality of the ibtrs-server module, which accepts a set of RDMA connections (a so-called IBTRS session), creates/destroys sysfs entries associated with the IBTRS session and notifies the upper layer (the user of the IBTRS API) about RDMA requests or link events. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-srv.c | 1981 ++ 1 file changed, 1981 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c new file mode 100644 index ..d57fa6af5a5c --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c @@ -0,0 +1,1981 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include + +#include "ibtrs-srv.h" +#include "ibtrs-log.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("IBTRS Server"); +MODULE_VERSION(IBTRS_VER_STRING); +MODULE_LICENSE("GPL"); + +/* Must be power of 2, see mask from mr->page_size in ib_sg_to_pages() */ +#define DEFAULT_MAX_CHUNK_SIZE (128 << 10) +#define DEFAULT_SESS_QUEUE_DEPTH 512 +#define MAX_HDR_SIZE PAGE_SIZE +#define MAX_SG_COUNT ((MAX_HDR_SIZE - sizeof(struct ibtrs_msg_rdma_read)) \ + / sizeof(struct ibtrs_sg_desc)) + +/* We guarantee to serve 10 paths at least */ +#define CHUNK_POOL_SZ 10 + +static struct ibtrs_ib_dev_pool dev_pool; +static mempool_t *chunk_pool; +struct class *ibtrs_dev_class; + +static int retry_count = 7; +static int __read_mostly max_chunk_size = DEFAULT_MAX_CHUNK_SIZE; +static int __read_mostly sess_queue_depth = DEFAULT_SESS_QUEUE_DEPTH; + +module_param_named(max_chunk_size, max_chunk_size, int, 0444); +MODULE_PARM_DESC(max_chunk_size, +"Max size for each IO request, when change the unit is in byte" +" (default: " __stringify(DEFAULT_MAX_CHUNK_SIZE_KB) "KB)"); + +module_param_named(sess_queue_depth, sess_queue_depth, int, 0444); +MODULE_PARM_DESC(sess_queue_depth, +"Number of buffers for pending I/O requests to allocate" +" per session. 
Maximum: " __stringify(MAX_SESS_QUEUE_DEPTH) +" (default: " __stringify(DEFAULT_SESS_QUEUE_DEPTH) ")"); + +static int retry_count_set(const char *val, const struct kernel_param *kp) +{ + int err, ival; + + err = kstrtoint(val, 0, &ival); + if (err) + return err; + + if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT) { + pr_err("Invalid retry count value %d, has to be" + " >= %d, <= %d\n", ival, MIN_RTR_CNT, MAX_RTR_CNT); + return -EINVAL; + } + + retry_count = ival; + pr_info("QP retry count changed to %d\n", ival); + + return 0; +} + +static const struct kernel_param_ops retry_count_ops = { + .set = retry_count_set, + .get = param_get_int, +}; +module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644); + +MODULE_PARM_DESC(retry_count, "Number of times to send the message if the" +" remote side didn't respond with Ack or Nack (default: 7," +" min: " __stringify(MIN_RTR_CNT) ", max: " +__stringify(MAX_RTR_CNT) ")"); + +static char cq_affinity_list[256] = ""; +static cpumask_t cq_affinity_mask = { CPU_BITS_ALL }; + +static void init_cq_affinity(void) +{ + sprintf(cq_affinity_list, "0-%d", nr_cpu_ids - 1); +} + +static int cq_affinity_list_set(const char *val, const struct kernel_param *kp) +
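For readers who want to play with the retry_count validation outside the kernel, here is a rough user-space sketch of the same parse-then-range-check pattern. It uses strtol() in place of kstrtoint(), and the bounds 1 and 7 are assumptions: MIN_RTR_CNT and MAX_RTR_CNT are defined in a header that is not part of this mail.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed bounds; the real MIN_RTR_CNT/MAX_RTR_CNT are not shown in the patch. */
#define MIN_RTR_CNT 1
#define MAX_RTR_CNT 7

static int retry_count = 7;

/* Returns 0 on success and -EINVAL on malformed or out-of-range input. */
static int retry_count_set(const char *val)
{
	char *end;
	long v;

	assert(val != NULL);

	errno = 0;
	v = strtol(val, &end, 0);	/* base 0: decimal, octal or hex */
	if (errno || end == val)
		return -EINVAL;
	if (*end == '\n')		/* "echo 5 > param" appends a newline */
		end++;
	if (*end != '\0')
		return -EINVAL;

	/* Both bounds are inclusive, as in the patch. */
	if (v < MIN_RTR_CNT || v > MAX_RTR_CNT)
		return -EINVAL;

	retry_count = (int)v;
	return 0;
}
```

A rejected value leaves the previous retry_count untouched, which matches the behaviour of the kernel callback.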
[PATCH v2 15/26] ibtrs: a bit of documentation
README with description of major sysfs entries. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/README | 358 1 file changed, 358 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/README diff --git a/drivers/infiniband/ulp/ibtrs/README b/drivers/infiniband/ulp/ibtrs/README new file mode 100644 index ..010a93b02d9c --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/README @@ -0,0 +1,358 @@ + +InfiniBand Transport (IBTRS) + + +IBTRS (InfiniBand Transport) is a reliable high speed transport library +which provides support to establish an optimal number of connections +between client and server machines using RDMA (InfiniBand, RoCE, iWarp) +transport. It is optimized to transfer (read/write) IO blocks. + +In its core interface it follows the BIO semantics of providing the +possibility to either write data from an sg list to the remote side +or to request ("read") data transfer from the remote side into a given +sg list. + +IBTRS provides I/O fail-over and load-balancing capabilities by using +multipath I/O (see "add_path" and "mp_policy" configuration entries). + +IBTRS is used by the IBNBD (InfiniBand Network Block Device) modules. + +== +Client Sysfs Interface +== + +This chapter describes only the most important files of the sysfs interface +on the client side. + +Entries under /sys/devices/virtual/ibtrs-client/ + + +When a user of the IBTRS API creates a new session, a directory entry with +the name of that session is created. + +Entries under /sys/devices/virtual/ibtrs-client// +=== + +add_path (RW) +- + +Adds a new path (connection) to an existing session. Expected format is the +following: + + <[source addr,]destination addr> + + *addr ::= [ ip: | gid: ] + +max_reconnect_attempts (RW) +--- + +Maximum number of reconnect attempts the client should make before giving up +after a connection breaks unexpectedly. 
+ +mp_policy (RW) +-- + +Multipath policy specifies which path should be selected on each IO: + + round-robin (0): + select path in per CPU round-robin manner. + + min-inflight (1): + select path with minimum inflights. + +Entries under /sys/devices/virtual/ibtrs-client//paths/ += + + +Each path belonging to a given session is listed here by its destination +address. When a new path is added to a session by writing to the "add_path" +entry, a directory with the corresponding destination address is created. + +Entries under /sys/devices/virtual/ibtrs-client//paths// += + +state (R) +- + +Contains "connected" if the session is connected to the peer and fully +functional. Otherwise the file contains "disconnected". + +reconnect (RW) +-- + +Write "1" to the file in order to reconnect the path. +Operation is blocking and returns 0 if reconnect was successful. + +disconnect (RW) +--- + +Write "1" to the file in order to disconnect the path. +Operation blocks until IBTRS path is disconnected. + +remove_path (RW) + + +Write "1" to the file in order to disconnect and remove the path +from the session. Operation blocks until the path is disconnected +and removed from the session. + +Entries under /sys/devices/virtual/ibtrs-client//paths//stats/ +=== + +Write "0" to any file in that directory to reset corresponding statistics. + +reset_all (RW) +-- + +Read will return usage help, write 0 will clear all the statistics. + +sg_entries (RW) +--- + +Data to be transferred via RDMA is passed to IBTRS as a scatter-gather +list. A scatter-gather list can contain multiple entries. +Scatter-gather lists with fewer entries require less processing power +and can therefore be transferred faster. The file sg_entries outputs a +per-CPU distribution table for the number of entries in the +scatter-gather lists that were passed to the IBTRS API function +ibtrs_clt_request (READ or WRITE). + +cpu_migration (RW) +-- + +IBTRS expects that each HCA IRQ is pinned to a separate CPU. 
If that is +not the case, an I/O response could be processed on a +different CPU than the one where it was originally submitted. This file shows +how many interrupts were generated on an unexpected CPU. +"from:" is the CPU on which the IRQ was expected, but not generated. +"to:" is the CPU on which the IRQ was generated, but not expected. +
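To make the two mp_policy values from the README above more concrete, here is a small user-space sketch of how a path could be picked under each policy. This is an illustration only and not the ibtrs-client code: struct path, struct session and select_path() are hypothetical, and the round-robin cursor is a single field here while the README says the real one is per CPU.

```c
#include <assert.h>
#include <stddef.h>

enum mp_policy {
	MP_POLICY_RR		= 0,	/* "round-robin (0)" */
	MP_POLICY_MIN_INFLIGHT	= 1,	/* "min-inflight (1)" */
};

/* Hypothetical, simplified view of a path and a session. */
struct path {
	int		connected;
	unsigned int	inflight;	/* requests sent but not yet completed */
};

struct session {
	struct path	*paths;
	size_t		npaths;
	size_t		rr_next;	/* round-robin cursor */
	enum mp_policy	policy;
};

/* Returns the index of the chosen path, or -1 if no path is connected. */
static int select_path(struct session *s)
{
	size_t i;
	int best = -1;

	assert(s != NULL);

	if (s->policy == MP_POLICY_RR) {
		/* Walk at most npaths entries starting at the cursor. */
		for (i = 0; i < s->npaths; i++) {
			size_t idx = (s->rr_next + i) % s->npaths;

			if (s->paths[idx].connected) {
				s->rr_next = idx + 1;
				return (int)idx;
			}
		}
		return -1;
	}

	/* min-inflight: pick the connected path with the fewest inflights. */
	for (i = 0; i < s->npaths; i++) {
		if (!s->paths[i].connected)
			continue;
		if (best < 0 || s->paths[i].inflight < s->paths[best].inflight)
			best = (int)i;
	}
	return best;
}
```

Both branches skip disconnected paths, which is where the fail-over behaviour mentioned in the README would come from in this simplified model.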
[PATCH v2 25/26] ibnbd: a bit of documentation
README with description of major sysfs entries. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/README | 299 + 1 file changed, 299 insertions(+) create mode 100644 drivers/block/ibnbd/README diff --git a/drivers/block/ibnbd/README b/drivers/block/ibnbd/README new file mode 100644 index ..bbaddd02c1c5 --- /dev/null +++ b/drivers/block/ibnbd/README @@ -0,0 +1,299 @@ +*** +InfiniBand Network Block Device (IBNBD) +*** + +Introduction + + +IBNBD (InfiniBand Network Block Device) is a pair of kernel modules +(client and server) that allow for remote access to a block device on +the server over the IBTRS protocol using the RDMA (InfiniBand, RoCE, iWarp) +transport. After being mapped, the remote block devices can be accessed +on the client side as local block devices. + +I/O is transferred between client and server by the IBTRS transport +modules. The administration of IBNBD and IBTRS modules is done via +sysfs entries. + +Requirements + + + IBTRS kernel modules + +Quick Start +--- + +Server side: + # modprobe ibnbd_server + +Client side: + # modprobe ibnbd_client + # echo "sessname=blya path=ip:10.50.100.66 device_path=/dev/ram0" > \ +/sys/devices/virtual/ibnbd-client/ctl/map_device + + Where "sessname=" is a session name, a string to identify the session + on client and on server sides; "path=" is a destination IP address or + a pair of source and destination IPs, separated by a comma. Multiple + "path=" options can be specified in order to use multipath (see IBTRS + description for details); "device_path=" is the block device to be + mapped from the server side. After the session to the server machine is + established, the mapped device will appear on the client side under + /dev/ibnbd. 
+ + +== +Client Sysfs Interface +== + +All sysfs files that are not read-only provide the usage information on read: + +Example: + # cat /sys/devices/virtual/ibnbd-client/ctl/map_device + + > Usage: echo "sessname= path=<[srcaddr,]dstaddr> + > [path=<[srcaddr,]dstaddr>] device_path= + > [access_mode=] + > [io_mode=]" > map_device + > + > addr ::= [ ip: | ip: | gid: ] + +Entries under /sys/devices/virtual/ibnbd-client/ctl/ +=== + +map_device (RW) +--- + +Expected format is the following: + +sessname= +path=<[srcaddr,]dstaddr> [path=<[srcaddr,]dstaddr> ...] +device_path= +[access_mode=] +[io_mode=] + +Where: + +sessname: accepts a string not bigger than 256 chars, which identifies + a given session on the client and on the server. + E.g. "clt_hostname-srv_hostname" could be a natural choice. + +path: describes a connection between the client and the server by + specifying destination and, when required, the source address. + The addresses are to be provided in the following format: + +ip: +ip: +gid: + + for example: + + path=ip:10.0.0.66 + The single addr is treated as the destination. + The connection will be established to this + server from any client IP address. + + path=ip:10.0.0.66,ip:10.0.1.66 + First addr is the source address and the second + is the destination. + + If multiple "path=" options are specified, multiple connections + will be established and data will be sent according to + the selected multipath policy (see IBTRS mp_policy sysfs entry + description). + +device_path: Path to the block device on the server side. Path is specified + relative to the directory on server side configured in the + 'dev_search_path' module parameter of the ibnbd_server. + The ibnbd_server prepends the received from client + with and tries to open the + / block device. On success, + a /dev/ibnbd device file, a /sys/block/ibnbd_client/ibnbd/ + directory and an entry in /sys/devices/virtual/ibnbd-client/ctl/devices + will be created. 
+ + If 'dev_search_path' contains '%SESSNAME%', then each session can + have different devices namespace, e.g. server was configured with + the following parameter "dev_search_path=/run/ibnbd-devs/%SESSNAME%", + client has this string "sessname=blya device_path=sda", then server + will try to open: /run/ibnbd-devs/blya/sda. + +access_mode: the access_mode parameter specifies if the device is to be + mapped as "ro" r
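The 'dev_search_path' and '%SESSNAME%' rule described in the README above can be tried out with a few lines of user-space C. This is only a sketch of the documented behaviour, not the server code: build_dev_path() is a hypothetical helper, and it replaces only the first occurrence of the token.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SESSNAME_TOKEN "%SESSNAME%"

/*
 * Builds "<search_path>/<dev_path>" into out, substituting the first
 * SESSNAME_TOKEN in search_path with sessname.
 * Returns 0 on success, -1 if the result does not fit into out.
 */
static int build_dev_path(char *out, size_t len, const char *search_path,
			  const char *sessname, const char *dev_path)
{
	const char *tok;
	int n;

	assert(out && search_path && sessname && dev_path);

	tok = strstr(search_path, SESSNAME_TOKEN);
	if (tok)
		/* prefix before the token, session name, rest, device */
		n = snprintf(out, len, "%.*s%s%s/%s",
			     (int)(tok - search_path), search_path,
			     sessname, tok + strlen(SESSNAME_TOKEN),
			     dev_path);
	else
		n = snprintf(out, len, "%s/%s", search_path, dev_path);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the README's own example, a search path of "/run/ibnbd-devs/%SESSNAME%", the session name "blya" and the device path "sda" give "/run/ibnbd-devs/blya/sda". With a search path of "/" the sketch produces a doubled slash, which is harmless on POSIX systems.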
[PATCH v2 19/26] ibnbd: client: sysfs interface functions
This is the sysfs interface to IBNBD block devices on client side: /sys/devices/virtual/ibnbd-client/ctl/ |- map_device | *** maps remote device | |- devices/ *** all mapped devices /sys/block/ibnbd/ibnbd_client/ |- unmap_device | *** unmaps device | |- state | *** device state | |- session | *** session name | |- mapping_path *** path of the dev that was mapped on server Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-clt-sysfs.c | 675 ++ 1 file changed, 675 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-clt-sysfs.c diff --git a/drivers/block/ibnbd/ibnbd-clt-sysfs.c b/drivers/block/ibnbd/ibnbd-clt-sysfs.c new file mode 100644 index ..ca3e59b28c54 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-clt-sysfs.c @@ -0,0 +1,675 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ibnbd-clt.h" + +static struct device *ibnbd_dev; +static struct class *ibnbd_dev_class; +static struct kobject *ibnbd_devs_kobj; + +enum { + IBNBD_OPT_ERR = 0, + IBNBD_OPT_PATH = 1 << 0, + IBNBD_OPT_DEV_PATH = 1 << 1, + IBNBD_OPT_ACCESS_MODE = 1 << 3, + IBNBD_OPT_IO_MODE = 1 << 5, + IBNBD_OPT_SESSNAME = 1 << 6, +}; + +static unsigned int ibnbd_opt_mandatory[] = { + IBNBD_OPT_PATH, + IBNBD_OPT_DEV_PATH, + IBNBD_OPT_SESSNAME, +}; + +static const match_table_t ibnbd_opt_tokens = { + { IBNBD_OPT_PATH, "path=%s" }, + { IBNBD_OPT_DEV_PATH, "device_path=%s"}, + { IBNBD_OPT_ACCESS_MODE, "access_mode=%s"}, + { IBNBD_OPT_IO_MODE, "io_mode=%s"}, + { IBNBD_OPT_SESSNAME, "sessname=%s" }, + { IBNBD_OPT_ERR, NULL}, +}; + +/* remove new line from string */ +static void strip(char *s) +{ + char *p = s; + + while (*s != '\0') { + if (*s != '\n') + *p++ = *s++; + else + ++s; + } + *p = '\0'; +} + +static int ibnbd_clt_parse_map_options(const char *buf, + char *sessname, + struct ibtrs_addr *paths, + size_t *path_cnt, + size_t max_path_cnt, + char *pathname, + enum ibnbd_access_mode *access_mode, + enum ibnbd_io_mode *io_mode) +{ + char *options, *sep_opt; + char *p; + substring_t args[MAX_OPT_ARGS]; + int opt_mask = 0; + int token; + int ret = -EINVAL; + int i; + int p_cnt = 0; + + options = kstrdup(buf, GFP_KERNEL); + if (!options) + return -ENOMEM; + + sep_opt = strstrip(options); + strip(sep_opt); + while ((p = strsep(&sep_opt, " ")) != NULL) { + if (!*p) + continue; + + token = match_token(p, ibnbd_opt_tokens, args); + opt_mask |= token; + + switch (token) { + case IBNBD_OPT_SESSNAME: + p = match_strdup(args); + if (!p) { + ret = -ENOMEM; +
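The option parsing in ibnbd_clt_parse_map_options() above boils down to: split the string on spaces, match each word against a token table, OR the token into a mask and finally check that all mandatory bits are set. A stripped-down user-space sketch of that flow follows. It does not use match_token(); the OPT_* names and the helpers are hypothetical, and newlines are treated as separators here, while the patch's strip() drops them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same bit layout as the IBNBD_OPT_* enum in the patch. */
enum {
	OPT_ERR		= 0,
	OPT_PATH	= 1 << 0,
	OPT_DEV_PATH	= 1 << 1,
	OPT_ACCESS_MODE	= 1 << 3,
	OPT_IO_MODE	= 1 << 5,
	OPT_SESSNAME	= 1 << 6,
};

#define OPT_MANDATORY (OPT_PATH | OPT_DEV_PATH | OPT_SESSNAME)

static const struct {
	int		token;
	const char	*key;
} opt_tokens[] = {
	{ OPT_PATH,		"path=" },
	{ OPT_DEV_PATH,		"device_path=" },
	{ OPT_ACCESS_MODE,	"access_mode=" },
	{ OPT_IO_MODE,		"io_mode=" },
	{ OPT_SESSNAME,		"sessname=" },
};

/* Matches "key=value" with a non-empty value; returns OPT_ERR otherwise. */
static int match_opt(const char *word, size_t wlen)
{
	size_t i;

	for (i = 0; i < sizeof(opt_tokens) / sizeof(opt_tokens[0]); i++) {
		size_t klen = strlen(opt_tokens[i].key);

		if (wlen > klen && !strncmp(word, opt_tokens[i].key, klen))
			return opt_tokens[i].token;
	}
	return OPT_ERR;
}

/* Returns the mask of options seen, or -1 on an unknown or missing option. */
static int parse_map_options(const char *buf)
{
	int mask = 0;

	assert(buf != NULL);

	for (;;) {
		size_t wlen;
		int token;

		buf += strspn(buf, " \n");	/* skip separators */
		wlen = strcspn(buf, " \n");	/* length of next word */
		if (!wlen)
			break;

		token = match_opt(buf, wlen);
		if (token == OPT_ERR)
			return -1;
		mask |= token;
		buf += wlen;
	}

	if ((mask & OPT_MANDATORY) != OPT_MANDATORY)
		return -1;

	return mask;
}
```

Feeding it the Quick Start string from the README, "sessname=blya path=ip:10.50.100.66 device_path=/dev/ram0", yields a mask with the three mandatory bits set. Leaving out device_path= makes it fail.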
[PATCH v2 17/26] ibnbd: client: private header with client structs and functions
This header describes main structs and functions used by ibnbd-client module, mainly for managing IBNBD sessions and mapped block devices, creating and destroying sysfs entries. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-clt.h | 172 1 file changed, 172 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-clt.h diff --git a/drivers/block/ibnbd/ibnbd-clt.h b/drivers/block/ibnbd/ibnbd-clt.h new file mode 100644 index ..c5f6f08ec338 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-clt.h @@ -0,0 +1,172 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#ifndef IBNBD_CLT_H +#define IBNBD_CLT_H + +#include +#include +#include +#include +#include + +#include "ibtrs.h" +#include "ibnbd-proto.h" +#include "ibnbd-log.h" + +#define BMAX_SEGMENTS 31 +#define RECONNECT_DELAY 30 +#define MAX_RECONNECTS -1 + +enum ibnbd_clt_dev_state { + DEV_STATE_INIT, + DEV_STATE_MAPPED, + DEV_STATE_MAPPED_DISCONNECTED, + DEV_STATE_UNMAPPED, +}; + +struct ibnbd_iu_comp { + wait_queue_head_t wait; + int errno; +}; + +struct ibnbd_iu { + union { + struct request *rq; /* for block io */ + void *buf; /* for user messages */ + }; + struct ibtrs_tag*tag; + union { + /* use to send msg associated with a dev */ + struct ibnbd_clt_dev *dev; + /* use to send msg associated with a sess */ + struct ibnbd_clt_session *sess; + }; + blk_status_tstatus; + struct scatterlist sglist[BMAX_SEGMENTS]; + struct work_struct work; + int errno; + struct ibnbd_iu_comp*comp; +}; + +struct ibnbd_cpu_qlist { + struct list_headrequeue_list; + spinlock_t requeue_lock; + unsigned intcpu; +}; + +struct ibnbd_clt_session { + struct list_headlist; + struct ibtrs_clt*ibtrs; + wait_queue_head_t ibtrs_waitq; + boolibtrs_ready; + struct ibnbd_cpu_qlist __percpu + *cpu_queues; + DECLARE_BITMAP(cpu_queues_bm, NR_CPUS); + int __percpu*cpu_rr; /* per-cpu var for CPU round-robin */ + atomic_tbusy; + int queue_depth; + u32 max_io_size; + struct blk_mq_tag_set tag_set; + struct mutexlock; /* protects state and devs_list */ + struct list_headdevs_list; /* list of struct ibnbd_clt_dev */ + refcount_t refcount; + charsessname[NAME_MAX]; + u8 ver; /* protocol version */ +}; + +/** + * Submission queues. + */ +struct ibnbd_queue { + struct list_headrequeue_list; + unsigned long in_list; + struct ibnbd_clt_dev*dev; + struct blk_mq_hw_ctx*hctx; +}; + +struct ibnbd_clt_dev { + struct ibnbd_clt_session*sess; + struct request_queue*queue; + struct ibnbd_queue *hw_queues; + u32 device_id; + /* local Idr index - used to track minor number allocations. 
*/ + u32 clt_device_id; + struct mutexlock; + enum ibnbd_clt_dev_statedev_state; + enum ibnbd_io_mode io_mode; /* user requested */ + enum ibnbd_io_mode remote_io_mode; /* server really used */ + charpathname[NAME_MAX]; + enum ibnbd_access_mode access_mode; + boolread_only; + boolrotational; + u32 max_hw_sectors; + u32
[PATCH v2 14/26] ibtrs: include client and server modules into kernel compilation
Add IBTRS Makefile, Kconfig and also corresponding lines into upper layer infiniband/ulp files. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/Kconfig| 1 + drivers/infiniband/ulp/Makefile | 1 + drivers/infiniband/ulp/ibtrs/Kconfig | 20 drivers/infiniband/ulp/ibtrs/Makefile | 15 +++ 4 files changed, 37 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/Kconfig create mode 100644 drivers/infiniband/ulp/ibtrs/Makefile diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index ee270e065ba9..787bd286fb08 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -94,6 +94,7 @@ source "drivers/infiniband/ulp/srpt/Kconfig" source "drivers/infiniband/ulp/iser/Kconfig" source "drivers/infiniband/ulp/isert/Kconfig" +source "drivers/infiniband/ulp/ibtrs/Kconfig" source "drivers/infiniband/ulp/opa_vnic/Kconfig" source "drivers/infiniband/sw/rdmavt/Kconfig" diff --git a/drivers/infiniband/ulp/Makefile b/drivers/infiniband/ulp/Makefile index 437813c7b481..1c4f10dc8d49 100644 --- a/drivers/infiniband/ulp/Makefile +++ b/drivers/infiniband/ulp/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_INFINIBAND_SRPT) += srpt/ obj-$(CONFIG_INFINIBAND_ISER) += iser/ obj-$(CONFIG_INFINIBAND_ISERT) += isert/ obj-$(CONFIG_INFINIBAND_OPA_VNIC) += opa_vnic/ +obj-$(CONFIG_INFINIBAND_IBTRS) += ibtrs/ diff --git a/drivers/infiniband/ulp/ibtrs/Kconfig b/drivers/infiniband/ulp/ibtrs/Kconfig new file mode 100644 index ..eaeb8f3f6b4e --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/Kconfig @@ -0,0 +1,20 @@ +config INFINIBAND_IBTRS + tristate + depends on INFINIBAND_ADDR_TRANS + +config INFINIBAND_IBTRS_CLIENT + tristate "IBTRS client module" + depends on INFINIBAND_ADDR_TRANS + select INFINIBAND_IBTRS + help + IBTRS client allows for simplified data transfer and connection + establishment over RDMA (InfiniBand, RoCE, iWarp). Uses BIO-like + READ/WRITE semantics and provides multipath capabilities. 
+ +config INFINIBAND_IBTRS_SERVER + tristate "IBTRS server module" + depends on INFINIBAND_ADDR_TRANS + select INFINIBAND_IBTRS + help + IBTRS server module processing connection and IO requests received + from the IBTRS client module. diff --git a/drivers/infiniband/ulp/ibtrs/Makefile b/drivers/infiniband/ulp/ibtrs/Makefile new file mode 100644 index ..e6ea858745ad --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/Makefile @@ -0,0 +1,15 @@ +ibtrs-client-y := ibtrs-clt.o \ + ibtrs-clt-stats.o \ + ibtrs-clt-sysfs.o + +ibtrs-server-y := ibtrs-srv.o \ + ibtrs-srv-stats.o \ + ibtrs-srv-sysfs.o + +ibtrs-core-y := ibtrs.o + +obj-$(CONFIG_INFINIBAND_IBTRS)+= ibtrs-core.o +obj-$(CONFIG_INFINIBAND_IBTRS_CLIENT) += ibtrs-client.o +obj-$(CONFIG_INFINIBAND_IBTRS_SERVER) += ibtrs-server.o + +-include $(src)/compat/compat.mk -- 2.13.1
[PATCH v2 22/26] ibnbd: server: functionality for IO submission to file or block dev
This provides helper functions for IO submission to file or block dev. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-srv-dev.c | 410 drivers/block/ibnbd/ibnbd-srv-dev.h | 149 + 2 files changed, 559 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.c create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.h diff --git a/drivers/block/ibnbd/ibnbd-srv-dev.c b/drivers/block/ibnbd/ibnbd-srv-dev.c new file mode 100644 index ..a5894849b9d5 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-srv-dev.c @@ -0,0 +1,410 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibnbd-srv-dev.h" +#include "ibnbd-log.h" + +#define IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS 0 + +struct ibnbd_dev_file_io_work { + struct ibnbd_dev*dev; + void*priv; + + sector_tsector; + void*data; + size_t len; + size_t bi_size; + enum ibnbd_io_flags flags; + + struct work_struct work; +}; + +struct ibnbd_dev_blk_io { + struct ibnbd_dev *dev; + void *priv; +}; + +static struct workqueue_struct *fileio_wq; + +int ibnbd_dev_init(void) +{ + fileio_wq = alloc_workqueue("%s", WQ_UNBOUND, + IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS, + "ibnbd_server_fileio_wq"); + if (!fileio_wq) + return -ENOMEM; + + return 0; +} + +void ibnbd_dev_destroy(void) +{ + destroy_workqueue(fileio_wq); +} + +static inline struct block_device *ibnbd_dev_open_bdev(const char *path, + fmode_t flags) +{ + return blkdev_get_by_path(path, flags, THIS_MODULE); +} + +static int ibnbd_dev_blk_open(struct ibnbd_dev *dev, const char *path, + fmode_t flags) +{ + dev->bdev = ibnbd_dev_open_bdev(path, flags); + return PTR_ERR_OR_ZERO(dev->bdev); +} + +static int ibnbd_dev_vfs_open(struct ibnbd_dev *dev, const char *path, + fmode_t flags) +{ + int oflags = O_DSYNC; /* enable write-through */ + + if (flags & FMODE_WRITE) + oflags |= O_RDWR; + else if (flags & FMODE_READ) + oflags |= O_RDONLY; + else + return -EINVAL; + + dev->file = filp_open(path, oflags, 0); + return PTR_ERR_OR_ZERO(dev->file); +} + +struct ibnbd_dev *ibnbd_dev_open(const char *path, fmode_t flags, +enum ibnbd_io_mode mode, struct bio_set *bs, +ibnbd_dev_io_fn io_cb) +{ + struct ibnbd_dev *dev; + int ret; + + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + if (!dev) + return ERR_PTR(-ENOMEM); + + if (mode == IBNBD_BLOCKIO) { + dev->blk_open_flags = flags; + ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags); + if (ret) + goto err; + } else if (mode == IBNBD_FILEIO) { + dev->blk_open_flags = FMODE_READ; + ret = ibnbd_dev_blk_open(dev, path, 
dev->blk_open_flags); + if (ret) + goto err; + + ret = ibnbd_dev_vfs_open(dev, path, flags); + if (ret) + goto blk_put; + } + + dev->blk_open_flags = flags; + dev->mode = mode; + dev->io_cb = io_cb; + bdevname(dev->bdev, dev->name); + dev->ibd_bio_set= bs; + + return dev; + +blk_put: + blkdev_pu
[PATCH v2 09/26] ibtrs: client: sysfs interface functions
This is the sysfs interface to IBTRS sessions on client side: /sys/devices/virtual/ibtrs-client// *** IBTRS session created by ibtrs_clt_open() API call | |- max_reconnect_attempts | *** number of reconnect attempts for session | |- add_path | *** adds another connection path into IBTRS session | |- paths// *** established paths to server in a session | |- disconnect | *** disconnect path | |- reconnect | *** reconnect path | |- remove_path | *** remove current path | |- state | *** retrieve current path state | |- hca_port | *** HCA port number | |- hca_name | *** HCA name | |- stats/ *** current path statistics | |- cpu_migration |- rdma |- rdma_lat |- reconnects |- reset_all |- sg_entries |- wc_completions Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c | 482 + 1 file changed, 482 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c new file mode 100644 index ..c185bbc4fd5c --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c @@ -0,0 +1,482 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibtrs-pri.h" +#include "ibtrs-clt.h" +#include "ibtrs-log.h" + +#define MIN_MAX_RECONN_ATT -1 +#define MAX_MAX_RECONN_ATT + +static struct kobj_type ktype = { + .sysfs_ops = &kobj_sysfs_ops, +}; + +static ssize_t max_reconnect_attempts_show(struct device *dev, + struct device_attribute *attr, + char *page) +{ + struct ibtrs_clt *clt; + + clt = container_of(dev, struct ibtrs_clt, dev); + + return sprintf(page, "%d\n", ibtrs_clt_get_max_reconnect_attempts(clt)); +} + +static ssize_t max_reconnect_attempts_store(struct device *dev, + struct device_attribute *attr, + const char *buf, + size_t count) +{ + struct ibtrs_clt *clt; + int value; + int ret; + + clt = container_of(dev, struct ibtrs_clt, dev); + + ret = kstrtoint(buf, 10, &value); + if (unlikely(ret)) { + ibtrs_err(clt, "%s: failed to convert string '%s' to int\n", + attr->attr.name, buf); + return ret; + } + if (unlikely(value > MAX_MAX_RECONN_ATT || +value < MIN_MAX_RECONN_ATT)) { + ibtrs_err(clt, "%s: invalid range" + " (provided: '%s', accepted: min: %d, max: %d)\n", + attr->attr.name, buf, MIN_MAX_RECONN_ATT, + MAX_MAX_RECONN_ATT); + return -EINVAL; + } + ibtrs_clt_set_max_reconnect_attempts(clt, value); + + return count; +} + +static DEVICE_ATTR_RW(max_reconnect_attempts); + +static ssize_t mpath_policy_show(struct device *dev, +struct device_attribute *attr, +char *page) +{ + struct ibtrs_clt *clt; + + clt = container_of(dev, struct ibtrs_clt, dev); + + switch (clt->mp_policy) { + case MP_POLICY_RR: + return sprintf(page, "round-robin (RR: %d)\n", clt->mp_policy); + case MP_POLICY_MIN_INFLIGHT
[PATCH v2 21/26] ibnbd: server: main functionality
This is the main functionality of the ibnbd-server module, which handles
IBTRS events and IBNBD protocol requests, such as mapping (opening) or
unmapping (closing) a device. The server side is also responsible for
processing incoming IBTRS IO requests and forwarding them to locally
mapped devices.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/ibnbd-srv.c | 922
 1 file changed, 922 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv.c

diff --git a/drivers/block/ibnbd/ibnbd-srv.c b/drivers/block/ibnbd/ibnbd-srv.c
new file mode 100644
index ..a42a9191dad9
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.c
@@ -0,0 +1,922 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include + +#include "ibnbd-srv.h" +#include "ibnbd-srv-dev.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_VERSION(IBNBD_VER_STRING); +MODULE_DESCRIPTION("InfiniBand Network Block Device Server"); +MODULE_LICENSE("GPL"); + +#define DEFAULT_DEV_SEARCH_PATH "/" + +static char dev_search_path[PATH_MAX] = DEFAULT_DEV_SEARCH_PATH; + +static int dev_search_path_set(const char *val, const struct kernel_param *kp) +{ + char *dup; + + if (strlen(val) >= sizeof(dev_search_path)) + return -EINVAL; + + dup = kstrdup(val, GFP_KERNEL); + + if (dup[strlen(dup) - 1] == '\n') + dup[strlen(dup) - 1] = '\0'; + + strlcpy(dev_search_path, dup, sizeof(dev_search_path)); + + kfree(dup); + pr_info("dev_search_path changed to '%s'\n", dev_search_path); + + return 0; +} + +static struct kparam_string dev_search_path_kparam_str = { + .maxlen = sizeof(dev_search_path), + .string = dev_search_path +}; + +static const struct kernel_param_ops dev_search_path_ops = { + .set= dev_search_path_set, + .get= param_get_string, +}; + +module_param_cb(dev_search_path, &dev_search_path_ops, + &dev_search_path_kparam_str, 0444); +MODULE_PARM_DESC(dev_search_path, "Sets the dev_search_path." +" When a device is mapped this path is prepended to the" +" device path from the map device operation. If %SESSNAME%" +" is specified in a path, then device will be searched in a" +" session namespace." +" (default: " DEFAULT_DEV_SEARCH_PATH ")"); + +static int def_io_mode = IBNBD_BLOCKIO; +module_param(def_io_mode, int, 0444); +MODULE_PARM_DESC(def_io_mode, "By default, export devices in" +" blockio(" __stringify(_IBNBD_BLOCKIO) ") or" +" fileio(" __stringify(_IBNBD_FILEIO) ") mode." 
+" (default: " __stringify(_IBNBD_BLOCKIO) " (blockio))"); + +static DEFINE_MUTEX(sess_lock); +static DEFINE_SPINLOCK(dev_lock); + +static LIST_HEAD(sess_list); +static LIST_HEAD(dev_list); + +struct ibnbd_io_private { + struct ibtrs_srv_op *id; + struct ibnbd_srv_sess_dev *sess_dev; +}; + +static void ibnbd_sess_dev_release(struct kref *kref) +{ + struct ibnbd_srv_sess_dev *sess_dev; + + sess_dev = container_of(kref, struct ibnbd_srv_sess_dev, kref); + complete(sess_dev->destroy_comp); +} + +static inline void ibnbd_put_sess_dev(struct ibnbd_srv_sess_dev *sess_dev) +{ + kref_put(&sess_dev->kref, ibnbd_sess_dev_release); +} + +static void ibnbd_endio(void *priv, int error) +{ + struct ibnbd_io_private *ibnbd_priv = priv; + struct ibnbd_srv_sess_dev *sess_dev = ibnbd_priv->sess_dev; + + ibnbd_put_sess_dev(sess_
[PATCH v2 26/26] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules
Signed-off-by: Roman Pen
Cc: Danil Kipnis
Cc: Jack Wang
---
 MAINTAINERS | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 92be777d060a..e5a001bd0f05 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6786,6 +6786,20 @@ IBM ServeRAID RAID DRIVER
 S:	Orphan
 F:	drivers/scsi/ips.*
 
+IBNBD BLOCK DRIVERS
+M:	IBNBD/IBTRS Storage Team
+L:	linux-block@vger.kernel.org
+S:	Maintained
+T:	git git://github.com/profitbricks/ibnbd.git
+F:	drivers/block/ibnbd/
+
+IBTRS TRANSPORT DRIVERS
+M:	IBNBD/IBTRS Storage Team
+L:	linux-r...@vger.kernel.org
+S:	Maintained
+T:	git git://github.com/profitbricks/ibnbd.git
+F:	drivers/infiniband/ulp/ibtrs/
+
 ICH LPC AND GPIO DRIVER
 M:	Peter Tyser
 S:	Maintained
-- 
2.13.1
[PATCH v2 24/26] ibnbd: include client and server modules into kernel compilation
Add the IBNBD Makefile and Kconfig, and hook them into the Kconfig and
Makefile of the parent block layer directory.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/Kconfig        |  2 ++
 drivers/block/Makefile       |  1 +
 drivers/block/ibnbd/Kconfig  | 22 ++
 drivers/block/ibnbd/Makefile | 13 +
 4 files changed, 38 insertions(+)
 create mode 100644 drivers/block/ibnbd/Kconfig
 create mode 100644 drivers/block/ibnbd/Makefile

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index ad9b687a236a..d8c1590411c8 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -481,4 +481,6 @@ config BLK_DEV_RSXX
 	  To compile this driver as a module, choose M here: the
 	  module will be called rsxx.
 
+source "drivers/block/ibnbd/Kconfig"
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index dc061158b403..65346a1d0b1a 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)+= mtip32xx/
 obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/
 obj-$(CONFIG_BLK_DEV_NULL_BLK)	+= null_blk.o
 obj-$(CONFIG_ZRAM) += zram/
+obj-$(CONFIG_BLK_DEV_IBNBD)	+= ibnbd/
 
 skd-y		:= skd_main.o
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/ibnbd/Kconfig b/drivers/block/ibnbd/Kconfig
new file mode 100644
index ..b381c6c084d2
--- /dev/null
+++ b/drivers/block/ibnbd/Kconfig
@@ -0,0 +1,22 @@
+config BLK_DEV_IBNBD
+	bool
+
+config BLK_DEV_IBNBD_CLIENT
+	tristate "Network block device driver on top of IBTRS transport"
+	depends on INFINIBAND_IBTRS_CLIENT
+	select BLK_DEV_IBNBD
+	help
+	  IBNBD client allows mapping of remote block devices over the IBTRS
+	  protocol from a target system where the IBNBD server is running.
+
+	  If unsure, say N.
+
+config BLK_DEV_IBNBD_SERVER
+	tristate "Network block device over RDMA Infiniband server support"
+	depends on INFINIBAND_IBTRS_SERVER
+	select BLK_DEV_IBNBD
+	help
+	  IBNBD server allows exporting local block devices to a remote client
+	  over the IBTRS protocol.
+
+	  If unsure, say N.
diff --git a/drivers/block/ibnbd/Makefile b/drivers/block/ibnbd/Makefile
new file mode 100644
index ..5f20e72e0633
--- /dev/null
+++ b/drivers/block/ibnbd/Makefile
@@ -0,0 +1,13 @@
+ccflags-y := -Idrivers/infiniband/ulp/ibtrs
+
+ibnbd-client-y := ibnbd-clt.o \
+		  ibnbd-clt-sysfs.o
+
+ibnbd-server-y := ibnbd-srv.o \
+		  ibnbd-srv-dev.o \
+		  ibnbd-srv-sysfs.o
+
+obj-$(CONFIG_BLK_DEV_IBNBD_CLIENT) += ibnbd-client.o
+obj-$(CONFIG_BLK_DEV_IBNBD_SERVER) += ibnbd-server.o
+
+-include $(src)/compat/compat.mk
-- 
2.13.1
[PATCH v2 20/26] ibnbd: server: private header with server structs and functions
This header describes the main structs and functions used by the
ibnbd-server module, namely the structs for managing sessions from
different clients and mapped (opened) devices.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/ibnbd-srv.h | 100
 1 file changed, 100 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv.h

diff --git a/drivers/block/ibnbd/ibnbd-srv.h b/drivers/block/ibnbd/ibnbd-srv.h
new file mode 100644
index ..191a1650bc1d
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.h
@@ -0,0 +1,100 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBNBD_SRV_H
+#define IBNBD_SRV_H
+
+#include
+#include
+#include
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+struct ibnbd_srv_session {
+	/* Entry inside global sess_list */
+	struct list_head	list;
+	struct ibtrs_srv	*ibtrs;
+	char			sessname[NAME_MAX];
+	int			queue_depth;
+	struct bio_set		*sess_bio_set;
+
+	rwlock_t		index_lock cacheline_aligned;
+	struct idr		index_idr;
+	/* List of struct ibnbd_srv_sess_dev */
+	struct list_head	sess_dev_list;
+	struct mutex		lock;
+	u8			ver;
+};
+
+struct ibnbd_srv_dev {
+	/* Entry inside global dev_list */
+	struct list_head	list;
+	struct kobject		dev_kobj;
+	struct kobject		dev_sessions_kobj;
+	struct kref		kref;
+	char			id[NAME_MAX];
+	/* List of ibnbd_srv_sess_dev structs */
+	struct list_head	sess_dev_list;
+	struct mutex		lock;
+	int			open_write_cnt;
+	enum ibnbd_io_mode	mode;
+};
+
+/* Structure which binds N devices and N sessions */
+struct ibnbd_srv_sess_dev {
+	/* Entry inside ibnbd_srv_dev struct */
+	struct list_head	dev_list;
+	/* Entry inside ibnbd_srv_session struct */
+	struct list_head	sess_list;
+	struct ibnbd_dev	*ibnbd_dev;
+	struct ibnbd_srv_session	*sess;
+	struct ibnbd_srv_dev	*dev;
+	struct kobject		kobj;
+	struct completion	*sysfs_release_compl;
+	u32			device_id;
+	fmode_t			open_flags;
+	struct kref		kref;
+	struct completion	*destroy_comp;
+	char			pathname[NAME_MAX];
+};
+
+/* ibnbd-srv-sysfs.c */
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+			       struct block_device *bdev,
+			       const char *dir_name);
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev);
+int ibnbd_srv_create_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+void ibnbd_srv_destroy_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+int ibnbd_srv_create_sysfs_files(void);
+void ibnbd_srv_destroy_sysfs_files(void);
+
+#endif /* IBNBD_SRV_H */
-- 
2.13.1
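A note on the "binds N devices and N sessions" comment in the header: each ibnbd_srv_sess_dev is linked into two lists at once, one owned by the device and one owned by the session, so either side can enumerate its bindings. Below is a minimal userspace sketch of that idea. It is not the driver code: the struct and function names (node, session, device, sess_dev, bind_sess_dev, dev_mapped_by) are hypothetical, and a plain singly linked list stands in for struct list_head, with no locking and no refcounting.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list node; a stand-in for the kernel's struct list_head. */
struct node {
	struct node *next;
};

/* Hypothetical, stripped-down counterparts of the server structs. */
struct session {
	const char *name;
	struct node *sess_dev_list;	/* bindings that belong to this session */
};

struct device {
	const char *id;
	struct node *sess_dev_list;	/* bindings that belong to this device */
};

/* One binding: "session sess has device dev mapped". */
struct sess_dev {
	struct node dev_list;	/* entry inside device->sess_dev_list */
	struct node sess_list;	/* entry inside session->sess_dev_list */
	struct session *sess;
	struct device *dev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Link one binding into both the session's and the device's list. */
static void bind_sess_dev(struct sess_dev *sd, struct session *s,
			  struct device *d)
{
	assert(sd && s && d);
	sd->sess = s;
	sd->dev = d;
	sd->sess_list.next = s->sess_dev_list;
	s->sess_dev_list = &sd->sess_list;
	sd->dev_list.next = d->sess_dev_list;
	d->sess_dev_list = &sd->dev_list;
}

/* Number of entries on one of the lists. */
static int list_len(const struct node *n)
{
	int len = 0;

	for (; n; n = n->next)
		len++;
	return len;
}

/* Walk the device's list and recover each binding from its embedded node. */
static int dev_mapped_by(const struct device *d, const struct session *s)
{
	struct node *n;

	for (n = d->sess_dev_list; n; n = n->next) {
		struct sess_dev *sd = container_of(n, struct sess_dev, dev_list);

		if (sd->sess == s)
			return 1;
	}
	return 0;
}
```

With two sessions and two devices, three bindings are enough to show that a device can be mapped by several sessions and a session can map several devices, while each binding stays a single object.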
[PATCH v2 23/26] ibnbd: server: sysfs interface functions
This is the sysfs interface to IBNBD mapped devices on server side:

  /sys/devices/virtual/ibnbd-server/ctl/devices//
    |- block_dev
    |  *** link pointing to the corresponding block device sysfs entry
    |
    |- sessions//
    |  *** sessions directory
       |
       |- read_only
       |  *** whether the device is mapped read-only
       |
       |- mapping_path
          *** relative device path provided by the client during mapping

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/ibnbd-srv-sysfs.c | 242 ++
 1 file changed, 242 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv-sysfs.c

diff --git a/drivers/block/ibnbd/ibnbd-srv-sysfs.c b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
new file mode 100644
index ..5bf77cdb09c8
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv-sysfs.c
@@ -0,0 +1,242 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ibnbd-srv.h" + +static struct device *ibnbd_dev; +static struct class *ibnbd_dev_class; +static struct kobject *ibnbd_devs_kobj; + +static struct attribute *ibnbd_srv_default_dev_attrs[] = { + NULL, +}; + +static struct attribute_group ibnbd_srv_default_dev_attr_group = { + .attrs = ibnbd_srv_default_dev_attrs, +}; + +static struct kobj_type ktype = { + .sysfs_ops = &kobj_sysfs_ops, +}; + +int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev, + struct block_device *bdev, + const char *dir_name) +{ + struct kobject *bdev_kobj; + int ret; + + ret = kobject_init_and_add(&dev->dev_kobj, &ktype, + ibnbd_devs_kobj, dir_name); + if (ret) + return ret; + + ret = kobject_init_and_add(&dev->dev_sessions_kobj, + &ktype, + &dev->dev_kobj, "sessions"); + if (ret) + goto err; + + ret = sysfs_create_group(&dev->dev_kobj, +&ibnbd_srv_default_dev_attr_group); + if (ret) + goto err2; + + bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj; + ret = sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev"); + if (ret) + goto err3; + + return 0; + +err3: + sysfs_remove_group(&dev->dev_kobj, + &ibnbd_srv_default_dev_attr_group); +err2: + kobject_del(&dev->dev_sessions_kobj); + kobject_put(&dev->dev_sessions_kobj); +err: + kobject_del(&dev->dev_kobj); + kobject_put(&dev->dev_kobj); + return ret; +} + +void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev) +{ + sysfs_remove_link(&dev->dev_kobj, "block_dev"); + sysfs_remove_group(&dev->dev_kobj, &ibnbd_srv_default_dev_attr_group); + kobject_del(&dev->dev_sessions_kobj); + kobject_put(&dev->dev_sessions_kobj); + kobject_del(&dev->dev_kobj); + kobject_put(&dev->dev_kobj); +} + +static ssize_t ibnbd_srv_dev_session_ro_show(struct kobject *kobj, +struct kobj_attribute *attr, +char *page) +{ + struct ibnbd_srv_sess_dev *sess_dev; + + sess_dev = 
container_of(kobj, struct ibnbd_srv_sess_dev, kobj); + + return scnprintf(page, PAGE_SIZE, "%s\n", +(sess_dev->open_flags & FMODE_WRITE) ? "0" : "1"); +} + +static struct kobj_attribute ibnbd_srv_
[PATCH v2 18/26] ibnbd: client: main functionality
This is the main functionality of the ibnbd-client module, which
provides an interface to map a remote device as a local block device
/dev/ibnbd and feeds IBTRS with IO requests.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/ibnbd-clt.c | 1819 +++
 1 file changed, 1819 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt.c

diff --git a/drivers/block/ibnbd/ibnbd-clt.c b/drivers/block/ibnbd/ibnbd-clt.c
new file mode 100644
index ..06524e33e19f
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt.c
@@ -0,0 +1,1819 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *          Swapnil Ingle
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include +#include +#include +#include + +#include "ibnbd-clt.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("InfiniBand Network Block Device Client"); +MODULE_VERSION(IBNBD_VER_STRING); +MODULE_LICENSE("GPL"); + +/* + * This is for closing devices when unloading the module: + * we might be closing a lot (>256) of devices in parallel + * and it is better not to use the system_wq. + */ +static struct workqueue_struct *unload_wq; +static int ibnbd_client_major; +static DEFINE_IDA(index_ida); +static DEFINE_MUTEX(ida_lock); +static DEFINE_MUTEX(sess_lock); +static LIST_HEAD(sess_list); + +static bool softirq_enable; +module_param(softirq_enable, bool, 0444); +MODULE_PARM_DESC(softirq_enable, "finish request in softirq_fn." +" (default: 0)"); +/* + * Maximum number of partitions an instance can have. + * 6 bits = 64 minors = 63 partitions (one minor is used for the device itself) + */ +#define IBNBD_PART_BITS6 +#define KERNEL_SECTOR_SIZE 512 + +static inline bool ibnbd_clt_get_sess(struct ibnbd_clt_session *sess) +{ + return refcount_inc_not_zero(&sess->refcount); +} + +static void free_sess(struct ibnbd_clt_session *sess); + +static void ibnbd_clt_put_sess(struct ibnbd_clt_session *sess) +{ + might_sleep(); + + if (refcount_dec_and_test(&sess->refcount)) + free_sess(sess); +} + +static inline bool ibnbd_clt_dev_is_mapped(struct ibnbd_clt_dev *dev) +{ + return dev->dev_state == DEV_STATE_MAPPED; +} + +static void ibnbd_clt_put_dev(struct ibnbd_clt_dev *dev) +{ + might_sleep(); + + if (refcount_dec_and_test(&dev->refcount)) { + mutex_lock(&ida_lock); + ida_simple_remove(&index_ida, dev->clt_device_id); + mutex_unlock(&ida_lock); + kfree(dev->hw_queues); + ibnbd_clt_put_sess(dev->sess); + kfree(dev); + } +} + +static inline bool ibnbd_clt_get_dev(struct ibnbd_clt_dev *dev) +{ + return refcount_inc_not_zero(&dev->refcount); +} + +static int 
ibnbd_clt_set_dev_attr(struct ibnbd_clt_dev *dev, + const struct ibnbd_msg_open_rsp *rsp) +{ + struct ibnbd_clt_session *sess = dev->sess; + + if (unlikely(!rsp->logical_block_size)) + return -EINVAL; + + dev->device_id = le32_to_cpu(rsp->device_id); + dev->nsectors = le64_to_cpu(rsp->nsectors); + dev->logical_block_size = le16_to_cpu(rsp->logical_block_size); + dev->physical_block_size= le16_to_cpu(rsp->physical_block_size); + dev->max_write_same_sectors = le32_to_cpu(rsp->max_write_same_sectors); + dev->max_discard_sectors= le32_to_cpu(rsp->max_discard_sectors); + dev->discard_granularity= le32_to_cpu(rsp->discard_granularity); + dev->discard_alignment = le32_to_cpu(rsp->discard_alignment); + dev->secure_discard = le16_to_cpu(rsp-&g
[PATCH v2 04/26] ibtrs: private headers with IBTRS protocol structs and helpers
These are common private headers with IBTRS protocol structures, logging, sysfs and other helper functions, which are used on both client and server sides. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-log.h | 91 ++ drivers/infiniband/ulp/ibtrs/ibtrs-pri.h | 459 +++ 2 files changed, 550 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-log.h create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-pri.h diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-log.h b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h new file mode 100644 index ..f56257eabdee --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h @@ -0,0 +1,91 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBTRS_LOG_H +#define IBTRS_LOG_H + +#define P1 ) +#define P2 )) +#define P3 ))) +#define P4 +#define P(N) P ## N + +#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__) +#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__ + +#define LIST(...) 
\ + __VA_ARGS__,\ + ({ unknown_type(); NULL; }) \ + CAT(P, COUNT_ARGS(__VA_ARGS__)) \ + +#define EMPTY() +#define DEFER(id) id EMPTY() + +#define _CASE(obj, type, member) \ + __builtin_choose_expr( \ + __builtin_types_compatible_p( \ + typeof(obj), type), \ + ((type)obj)->member +#define CASE(o, t, m) DEFER(_CASE)(o,t,m) + +/* + * Below we define retrieving of sessname from common IBTRS types. + * Client or server related types have to be defined by special + * TYPES_TO_SESSNAME macro. + */ + +void unknown_type(void); + +#ifndef TYPES_TO_SESSNAME +#define TYPES_TO_SESSNAME(...) ({ unknown_type(); NULL; }) +#endif + +#define ibtrs_prefix(obj) \ + _CASE(obj, struct ibtrs_con *, sess->sessname),\ + _CASE(obj, struct ibtrs_sess *, sessname), \ + TYPES_TO_SESSNAME(obj) \ + )) + +#define ibtrs_log(fn, obj, fmt, ...) \ + fn("<%s>: " fmt, ibtrs_prefix(obj), ##__VA_ARGS__) + +#define ibtrs_err(obj, fmt, ...) \ + ibtrs_log(pr_err, obj, fmt, ##__VA_ARGS__) +#define ibtrs_err_rl(obj, fmt, ...)\ + ibtrs_log(pr_err_ratelimited, obj, fmt, ##__VA_ARGS__) +#define ibtrs_wrn(obj, fmt, ...) \ + ibtrs_log(pr_warn, obj, fmt, ##__VA_ARGS__) +#define ibtrs_wrn_rl(obj, fmt, ...) \ + ibtrs_log(pr_warn_ratelimited, obj, fmt, ##__VA_ARGS__) +#define ibtrs_info(obj, fmt, ...) \ + ibtrs_log(pr_info, obj, fmt, ##__VA_ARGS__) +#define ibtrs_info_rl(obj, fmt, ...) \ + ibtrs_log(pr_info_ratelimited, obj, fmt, ##__VA_ARGS__) + +#endif /* IBTRS_LOG_H */ diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h new file mode 100644 index ..40647f066840 --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h @@ -0,0 +1,459 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. 
+ * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This p
[PATCH v2 03/26] ibtrs: public interface header to establish RDMA connections
Introduce a public header which provides a set of API functions to
establish RDMA connections from a client to a server machine using the
IBTRS protocol, which manages RDMA connections for each session and
does multipathing and load balancing.

Main functions for client (active) side:

 ibtrs_clt_open() - Creates a set of RDMA connections encapsulated
                    in an IBTRS session and returns a pointer to
                    the IBTRS session object.
 ibtrs_clt_close() - Closes RDMA connections associated with the
                     IBTRS session.
 ibtrs_clt_request() - Requests a zero-copy RDMA transfer to/from
                       the server.

Main functions for server (passive) side:

 ibtrs_srv_open() - Starts listening for IBTRS clients on the
                    specified port and invokes IBTRS callbacks for
                    incoming RDMA requests or link events.
 ibtrs_srv_close() - Closes the IBTRS server context.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs.h | 324 +++
 1 file changed, 324 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.h b/drivers/infiniband/ulp/ibtrs/ibtrs.h
new file mode 100644
index ..08325e39a41e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.h
@@ -0,0 +1,324 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_H
+#define IBTRS_H
+
+#include
+#include
+
+struct ibtrs_tag;
+struct ibtrs_clt;
+struct ibtrs_srv_ctx;
+struct ibtrs_srv;
+struct ibtrs_srv_op;
+
+/*
+ * Here goes IBTRS client API
+ */
+
+/**
+ * enum ibtrs_clt_link_ev - Events about connectivity state of a client
+ * @IBTRS_CLT_LINK_EV_RECONNECTED	Client was reconnected.
+ * @IBTRS_CLT_LINK_EV_DISCONNECTED	Client was disconnected.
+ */
+enum ibtrs_clt_link_ev {
+	IBTRS_CLT_LINK_EV_RECONNECTED,
+	IBTRS_CLT_LINK_EV_DISCONNECTED,
+};
+
+/**
+ * Source and destination address of a path to be established
+ */
+struct ibtrs_addr {
+	struct sockaddr_storage *src;
+	struct sockaddr_storage *dst;
+};
+
+typedef void (link_clt_ev_fn)(void *priv, enum ibtrs_clt_link_ev ev);
+/**
+ * ibtrs_clt_open() - Open a session to an IBTRS client
+ * @priv:		User supplied private data.
+ * @link_ev:		Event notification for connection state changes
+ *	@priv:		user supplied data that was passed to
+ *			ibtrs_clt_open()
+ *	@ev:		Occurred event
+ * @sessname: name of the session
+ * @paths: Paths to be established defined by their src and dst addresses
+ * @path_cnt: Number of elements in the @paths array
+ * @port: port to be used by the IBTRS session
+ * @pdu_sz: Size of extra payload which can be accessed after tag allocation.
+ * @max_inflight_msg: Max. number of parallel inflight messages for the session
+ * @max_segments: Max. number of segments per IO request
+ * @reconnect_delay_sec: time between reconnect tries
+ * @max_reconnect_attempts: Number of times to reconnect on error before giving
+ *			    up, 0 for disabled, -1 for forever
+ *
+ * Starts session establishment with the ibtrs_server. The function can block
+ * up to ~2000ms until it returns.
+ *
+ * Return a valid pointer on success otherwise PTR_ERR.
+ */ +struct ibtrs_clt *ibtrs_clt_open(void *priv, link_clt_ev_fn *link_ev, +const char *sessname, +const struct ibtrs_addr *paths, +size_t path_cnt, short port, +size_t pdu_sz, u8 reconnect_delay_sec, +u16 max_segments, +s16 max_reconnect_attempts); + +/** + * ibtrs_clt_close() - Close a session + * @sess: Session handler, is freed on return + */ +void ibtrs_clt_close(struct ibtrs_clt
[PATCH v2 00/26] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
                   1432937    -6.4%
  x24    1396153   1244858   -10.8%
  x32    1215334   1066607   -12.2%
  x40    1255781   1076841   -14.2%
  x48    1240931   1066453   -14.1%
  x56    1250333   1065879   -14.8%
  x64    1229389   1064199   -13.4%

  rw=randwrite, bandwidth in Kbytes:

  jobs   IBNBD     NVMEoRDMA  Change
  x1     1416413   1181102   -16.6%
  x8     2438615   1977051   -18.9%
  x16    2436924   1854223   -23.9%
  x24    2430527   1714580   -29.5%
  x32    2425552   1641288   -32.3%
  x40    2378784   1592788   -33.0%
  x48    2202260   1511895   -31.3%
  x56    2207013   1493400   -32.3%
  x64    2098949   1432951   -31.7%

- on ConnectX-3 (MT4099)
  x40 CPUs Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz

  rw=randread, bandwidth in Kbytes:

  jobs   IBNBD     NVMEoRDMA  Change
  x1     1961216   2046572    +4.4%
  x8     4012912   4059410    +1.2%
  x16    4033837   3968410    -1.6%
  x24    3939186   3770729    -4.3%
  x32    3843434   3623869    -5.7%
  x40    3696896   3448772    -6.7%
  x48    4106259   3729201    -9.2%
  x56    4141374   3732954    -9.9%
  x64    4207317   3805638    -9.5%

  rw=randwrite, bandwidth in Kbytes:

  jobs   IBNBD     NVMEoRDMA  Change
  x1     3195637   2479068   -22.4%
  x8     4576924   4541743    -0.8%
  x16    4581528   4555459    -0.6%
  x24    4692540   4595963    -2.1%
  x32    4686968   4540456    -3.1%
  x40    4583814   4404859    -3.9%
  x48    4969587   4710902    -5.2%
  x56    4996101   4701814    -5.9%
  x64    5083460   4759663    -6.4%

The interesting observation is that on the machine with Intel CPUs and
a ConnectX-3 card the difference between IBNBD and NVME bandwidth is
significantly smaller than on AMD with ConnectX-2. I did not thoroughly
investigate that behaviour, but I suspect that the devil is in the Intel
vs AMD architecture and probably in how the NUMA nodes are organized,
i.e. Intel has 2 NUMA nodes against 8 on AMD. If someone is interested
in those results and can point me to where to dig on the NVME side, I
can investigate more deeply why exactly NVME bandwidth drops
significantly on the AMD machine with ConnectX-2.
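For anyone rechecking the tables: the Change column appears to be the relative difference of the NVMEoRDMA bandwidth against the IBNBD bandwidth, i.e. (NVMEoRDMA - IBNBD) / IBNBD * 100. That formula is not stated in the letter; it is inferred from the rows, and the helper name change_pct() below is made up for illustration. A minimal sketch:

```c
#include <assert.h>

/*
 * Relative change of NVMEoRDMA bandwidth against IBNBD bandwidth, in
 * percent. Negative means NVMEoRDMA is slower than IBNBD.
 * Assumption: this is how the "Change" column was derived.
 */
static double change_pct(double ibnbd, double nvmeordma)
{
	assert(ibnbd > 0.0);
	return (nvmeordma - ibnbd) / ibnbd * 100.0;
}
```

Feeding it the x24 row with 1396153 and 1244858 gives roughly -10.84, and the Intel randread x1 row with 1961216 and 2046572 gives roughly +4.35, which round to the -10.8% and +4.4% shown in the tables.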
Shiny graphs are here:

  https://docs.google.com/spreadsheets/d/1vxSoIvfjPbOWD61XMeN2_gPGxsxrbIUOZADk1UX5lj0

Roman Pen (26):
  rculist: introduce list_next_or_null_rr_rcu()
  sysfs: export sysfs_remove_file_self()
  ibtrs: public interface header to establish RDMA connections
  ibtrs: private headers with IBTRS protocol structs and helpers
  ibtrs: core: lib functions shared between client and server modules
  ibtrs: client: private header with client structs and functions
  ibtrs: client: main functionality
  ibtrs: client: statistics functions
  ibtrs: client: sysfs interface functions
  ibtrs: server: private header with server structs and functions
  ibtrs: server: main functionality
  ibtrs: server: statistics functions
  ibtrs: server: sysfs interface functions
  ibtrs: include client and server modules into kernel compilation
  ibtrs: a bit of documentation
  ibnbd: private headers with IBNBD protocol structs and helpers
  ibnbd: client: private header with client structs and functions
  ibnbd: client: main functionality
  ibnbd: client: sysfs interface functions
  ibnbd: server: private header with server structs and functions
  ibnbd: server: main functionality
  ibnbd: server: functionality for IO submission to file or block dev
  ibnbd: server: sysfs interface functions
  ibnbd: include client and server modules into kernel compilation
  ibnbd: a bit of documentation
  MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

 MAINTAINERS                           |   14 +
 drivers/block/Kconfig                 |    2 +
 drivers/block/Makefile                |    1 +
 drivers/block/ibnbd/Kconfig           |   22 +
 drivers/block/ibnbd/Makefile          |   13 +
 drivers/block/ibnbd/README            |  299 +++
 drivers/block/ibnbd/ibnbd-clt-sysfs.c |  669 ++
 drivers/block/ibnbd/ibnbd-clt.c       | 1818 +++
 drivers/block/ibnbd/ibnbd-clt.h       |  171 ++
 drivers/block/ibnbd/ibnbd-log.h       |   71 +
 drivers/block/ibnbd/ibnbd-proto.h     |  364 +++
 drivers/block/ibnbd/ibnbd-srv-dev.c   |  410
 drivers/block/ibnbd/ibnbd-srv-dev.h   |  149 ++
 drivers/block/ibnbd/ibnbd-srv-sysfs.c |  242 ++
 drivers/block/ibnbd/ibnbd-srv.c       |  922
 drivers/block/ibnbd/ibnbd-srv.h       |  100 +
drivers/infiniband/Kconfig
[PATCH v2 02/26] sysfs: export sysfs_remove_file_self()
The function is going to be used by the transport over RDMA module in
subsequent patches.

Signed-off-by: Roman Pen
Cc: Tejun Heo
Cc: linux-ker...@vger.kernel.org
---
 fs/sysfs/file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 5c13f29bfcdb..ff7443ac2aa7 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -444,6 +444,7 @@ bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr)
 	kernfs_put(kn);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(sysfs_remove_file_self);
 
 void sysfs_remove_files(struct kobject *kobj, const struct attribute **ptr)
 {
-- 
2.13.1
[PATCH v2 01/26] rculist: introduce list_next_or_null_rr_rcu()
The function is going to be used by the transport over RDMA module in
subsequent patches.

The function returns the next element in round-robin fashion, i.e. the
head will be skipped. NULL will be returned if the list is observed as
empty.

Signed-off-by: Roman Pen
Cc: Paul E. McKenney
Cc: linux-ker...@vger.kernel.org
---
 include/linux/rculist.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 127f534fec94..b0840d5ab25a 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -339,6 +339,25 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
 })
 
 /**
+ * list_next_or_null_rr_rcu - get next list element in round-robin fashion.
+ * @head:	the head for the list.
+ * @ptr:	the list head to take the next element from.
+ * @type:	the type of the struct this is embedded in.
+ * @memb:	the name of the list_head within the struct.
+ *
+ * Next element returned in round-robin fashion, i.e. head will be skipped,
+ * but if list is observed as empty, NULL will be returned.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ */
+#define list_next_or_null_rr_rcu(head, ptr, type, memb) \
+({ \
+	list_next_or_null_rcu(head, ptr, type, memb) ?: \
+		list_next_or_null_rcu(head, READ_ONCE((ptr)->next), type, memb); \
+})
+
+/**
  * list_for_each_entry_rcu - iterate over rcu list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
-- 
2.13.1
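
The round-robin step described in this patch can be illustrated outside the kernel. The following is only a userspace sketch under simplifying assumptions: there is no RCU and no READ_ONCE(), the list is a plain circular doubly-linked list with a sentinel head, and the names struct rr_node, rr_list_init(), rr_list_add_tail(), rr_next_or_null() and rr_next_or_null_rr() are hypothetical and do not appear in the patch. It only shows why a second lookup starting from the head gives the wrap-around behaviour.

```c
#include <stddef.h>

/* Hypothetical minimal list node; the sentinel head carries no payload. */
struct rr_node {
	struct rr_node *next;
	struct rr_node *prev;
};

static void rr_list_init(struct rr_node *head)
{
	head->next = head;
	head->prev = head;
}

static void rr_list_add_tail(struct rr_node *head, struct rr_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Plain step: NULL when the element after ptr is the sentinel head. */
static struct rr_node *rr_next_or_null(struct rr_node *head,
				       struct rr_node *ptr)
{
	return ptr->next != head ? ptr->next : NULL;
}

/*
 * Round-robin step: if stepping from ptr hits the head, step once more
 * from ptr->next (the head in that case). This yields the first element,
 * or NULL when the list is empty.
 */
static struct rr_node *rr_next_or_null_rr(struct rr_node *head,
					  struct rr_node *ptr)
{
	struct rr_node *n = rr_next_or_null(head, ptr);

	return n ? n : rr_next_or_null(head, ptr->next);
}
```

With two elements a and b on the list, stepping from b returns a again, and stepping on an empty list returns NULL, which matches the behaviour the kernel-doc comment describes.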
[PATCH v2 06/26] ibtrs: client: private header with client structs and functions
This header describes main structs and functions used by ibtrs-client module, mainly for managing IBTRS sessions, creating/destroying sysfs entries, accounting statistics on client side. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-clt.h | 315 +++ 1 file changed, 315 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h new file mode 100644 index ..0323da91ca01 --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h @@ -0,0 +1,315 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBTRS_CLT_H +#define IBTRS_CLT_H + +#include +#include "ibtrs-pri.h" + +/** + * enum ibtrs_clt_state - Client states. 
+ */ +enum ibtrs_clt_state { + IBTRS_CLT_CONNECTING, + IBTRS_CLT_CONNECTING_ERR, + IBTRS_CLT_RECONNECTING, + IBTRS_CLT_CONNECTED, + IBTRS_CLT_CLOSING, + IBTRS_CLT_CLOSED, + IBTRS_CLT_DEAD, +}; + +static inline const char *ibtrs_clt_state_str(enum ibtrs_clt_state state) +{ + switch (state) { + case IBTRS_CLT_CONNECTING: + return "IBTRS_CLT_CONNECTING"; + case IBTRS_CLT_CONNECTING_ERR: + return "IBTRS_CLT_CONNECTING_ERR"; + case IBTRS_CLT_RECONNECTING: + return "IBTRS_CLT_RECONNECTING"; + case IBTRS_CLT_CONNECTED: + return "IBTRS_CLT_CONNECTED"; + case IBTRS_CLT_CLOSING: + return "IBTRS_CLT_CLOSING"; + case IBTRS_CLT_CLOSED: + return "IBTRS_CLT_CLOSED"; + case IBTRS_CLT_DEAD: + return "IBTRS_CLT_DEAD"; + default: + return "UNKNOWN"; + } +} + +enum ibtrs_mp_policy { + MP_POLICY_RR, + MP_POLICY_MIN_INFLIGHT, +}; + +struct ibtrs_clt_stats_reconnects { + int successful_cnt; + int fail_cnt; +}; + +struct ibtrs_clt_stats_wc_comp { + u32 cnt; + u64 total_cnt; +}; + +struct ibtrs_clt_stats_cpu_migr { + atomic_t from; + int to; +}; + +struct ibtrs_clt_stats_rdma { + struct { + u64 cnt; + u64 size_total; + } dir[2]; + + u64 failover_cnt; +}; + +struct ibtrs_clt_stats_rdma_lat { + u64 read; + u64 write; +}; + +#define MIN_LOG_SG 2 +#define MAX_LOG_SG 5 +#define MAX_LIN_SG BIT(MIN_LOG_SG) +#define SG_DISTR_SZ (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 2) + +#define MAX_LOG_LAT 16 +#define MIN_LOG_LAT 0 +#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2) + +struct ibtrs_clt_stats_pcpu { + struct ibtrs_clt_stats_cpu_migr cpu_migr; + struct ibtrs_clt_stats_rdma rdma; + u64 sg_list_total; + u64 sg_list_distr[SG_DISTR_SZ]; + struct ibtrs_clt_stats_rdma_lat rdma_lat_distr[LOG_LAT_SZ]; + struct ibtrs_clt_stats_rdma_lat rdma_lat_max; + struct ibtrs_clt_stats_wc_comp wc_comp; +}; + +struct ibtrs_clt_stats { + boolenable_rdma_lat; + struct ibtrs_clt_stats_pcpu__percpu *pcpu_stats; + struct ibtrs_clt_stats_reconnects reconnects; + atomic_tinflight; +}; + +struct ibtrs_clt_con { + struct 
ibtrs_conc; + unsignedcpu; + atomic_tio_cnt; + int cm_err; +}; + +/** + * ibtrs_tag - tags the memory allocation for future RDMA operation + */ +struct ibtrs_tag { + enum ibtrs_clt_con_type con_type; + unsigned int cpu_id; + unsigned int mem_id; + unsigned int mem_off; +}; + +struct ibtrs_clt_io_req { + struct lis
[PATCH v2 05/26] ibtrs: core: lib functions shared between client and server modules
This is a set of library functions provided as the ibtrs-core module and
used by both the client and server modules. These functions mainly wrap
IB and RDMA calls and provide a slightly higher-level abstraction for
implementing the IBTRS protocol on the client or server side.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs.c | 609 +++
 1 file changed, 609 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.c b/drivers/infiniband/ulp/ibtrs/ibtrs.c
new file mode 100644
index ..39a933fe528e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.c
@@ -0,0 +1,609 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include + +#include "ibtrs-pri.h" +#include "ibtrs-log.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("IBTRS Core"); +MODULE_VERSION(IBTRS_VER_STRING); +MODULE_LICENSE("GPL"); + +struct ibtrs_iu *ibtrs_iu_alloc(u32 tag, size_t size, gfp_t gfp_mask, + struct ib_device *dma_dev, + enum dma_data_direction direction, + void (*done)(struct ib_cq *cq, +struct ib_wc *wc)) +{ + struct ibtrs_iu *iu; + + iu = kmalloc(sizeof(*iu), gfp_mask); + if (unlikely(!iu)) + return NULL; + + iu->buf = kzalloc(size, gfp_mask); + if (unlikely(!iu->buf)) + goto err1; + + iu->dma_addr = ib_dma_map_single(dma_dev, iu->buf, size, direction); + if (unlikely(ib_dma_mapping_error(dma_dev, iu->dma_addr))) + goto err2; + + iu->cqe.done = done; + iu->size = size; + iu->direction = direction; + iu->tag = tag; + + return iu; + +err2: + kfree(iu->buf); +err1: + kfree(iu); + + return NULL; +} +EXPORT_SYMBOL_GPL(ibtrs_iu_alloc); + +void ibtrs_iu_free(struct ibtrs_iu *iu, enum dma_data_direction dir, + struct ib_device *ibdev) +{ + if (!iu) + return; + + ib_dma_unmap_single(ibdev, iu->dma_addr, iu->size, dir); + kfree(iu->buf); + kfree(iu); +} +EXPORT_SYMBOL_GPL(ibtrs_iu_free); + +int ibtrs_iu_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu) +{ + struct ibtrs_sess *sess = con->sess; + struct ib_recv_wr wr, *bad_wr; + struct ib_sge list; + + list.addr = iu->dma_addr; + list.length = iu->size; + list.lkey = sess->dev->ib_pd->local_dma_lkey; + + if (WARN_ON(list.length == 0)) { + ibtrs_wrn(con, "Posting receive work request failed," + " sg list is empty\n"); + return -EINVAL; + } + + wr.next= NULL; + wr.wr_cqe = &iu->cqe; + wr.sg_list = &list; + wr.num_sge = 1; + + return ib_post_recv(con->qp, &wr, &bad_wr); +} +EXPORT_SYMBOL_GPL(ibtrs_iu_post_recv); + +int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe) +{ + struct ib_recv_wr wr, *bad_wr; + + wr.next= NULL; + 
wr.wr_cqe = cqe; + wr.sg_list = NULL; + wr.num_sge = 0; + + return ib_post_recv(con->qp, &wr, &bad_wr); +} +EXPORT_SYMBOL_GPL(ibtrs_post_recv_empty); + +int ibtrs_post_recv_empty_x2(struct ibtrs_con *con, struct ib_cqe *cqe) +{ + struct ib_recv_wr wr_arr[2], *wr, *bad_wr; + int i; + + memset(wr_arr, 0, sizeof(wr_arr)); + for (i = 0; i < ARRAY_SIZE(wr_arr); i++) { + wr = &wr_arr[i]; + wr->wr_cqe = cqe; + if (i) +
[PATCH v3 04/25] ibtrs: core: lib functions shared between client and server modules
This is a set of library functions provided as the ibtrs-core module and
used by both the client and server modules. These functions mainly wrap
IB and RDMA calls and provide a slightly higher-level abstraction for
implementing the IBTRS protocol on the client or server side.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs.c | 611 +++
 1 file changed, 611 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.c b/drivers/infiniband/ulp/ibtrs/ibtrs.c
new file mode 100644
index ..11302408b13c
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.c
@@ -0,0 +1,611 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include + +#include "ibtrs-pri.h" +#include "ibtrs-log.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("IBTRS Core"); +MODULE_VERSION(IBTRS_VER_STRING); +MODULE_LICENSE("GPL"); + +struct ibtrs_iu *ibtrs_iu_alloc(u32 tag, size_t size, gfp_t gfp_mask, + struct ib_device *dma_dev, + enum dma_data_direction direction, + void (*done)(struct ib_cq *cq, +struct ib_wc *wc)) +{ + struct ibtrs_iu *iu; + + iu = kmalloc(sizeof(*iu), gfp_mask); + if (unlikely(!iu)) + return NULL; + + iu->buf = kzalloc(size, gfp_mask); + if (unlikely(!iu->buf)) + goto err1; + + iu->dma_addr = ib_dma_map_single(dma_dev, iu->buf, size, direction); + if (unlikely(ib_dma_mapping_error(dma_dev, iu->dma_addr))) + goto err2; + + iu->cqe.done = done; + iu->size = size; + iu->direction = direction; + iu->tag = tag; + + return iu; + +err2: + kfree(iu->buf); +err1: + kfree(iu); + + return NULL; +} +EXPORT_SYMBOL_GPL(ibtrs_iu_alloc); + +void ibtrs_iu_free(struct ibtrs_iu *iu, enum dma_data_direction dir, + struct ib_device *ibdev) +{ + if (!iu) + return; + + ib_dma_unmap_single(ibdev, iu->dma_addr, iu->size, dir); + kfree(iu->buf); + kfree(iu); +} +EXPORT_SYMBOL_GPL(ibtrs_iu_free); + +int ibtrs_iu_post_recv(struct ibtrs_con *con, struct ibtrs_iu *iu) +{ + struct ibtrs_sess *sess = con->sess; + struct ib_recv_wr wr, *bad_wr; + struct ib_sge list; + + list.addr = iu->dma_addr; + list.length = iu->size; + list.lkey = sess->dev->ib_pd->local_dma_lkey; + + if (WARN_ON(list.length == 0)) { + ibtrs_wrn(con, "Posting receive work request failed," + " sg list is empty\n"); + return -EINVAL; + } + + wr.next= NULL; + wr.wr_cqe = &iu->cqe; + wr.sg_list = &list; + wr.num_sge = 1; + + return ib_post_recv(con->qp, &wr, &bad_wr); +} +EXPORT_SYMBOL_GPL(ibtrs_iu_post_recv); + +int ibtrs_post_recv_empty(struct ibtrs_con *con, struct ib_cqe *cqe) +{ + struct ib_recv_wr wr, *bad_wr; + + wr.next= NULL; + 
wr.wr_cqe = cqe; + wr.sg_list = NULL; + wr.num_sge = 0; + + return ib_post_recv(con->qp, &wr, &bad_wr); +} +EXPORT_SYMBOL_GPL(ibtrs_post_recv_empty); + +int ibtrs_post_recv_empty_x2(struct ibtrs_con *con, struct ib_cqe *cqe) +{ + struct ib_recv_wr wr_arr[2], *wr, *bad_wr; + int i; + + memset(wr_arr, 0, sizeof(wr_arr)); + for (i = 0; i < ARRAY_SIZE(wr_arr); i++) { + wr = &wr_arr[i]; + wr->wr_cqe = cqe; + if (i) +
[PATCH v3 07/25] ibtrs: client: statistics functions
This introduces a set of functions used on the client side to account
statistics of RDMA data sent/received, the number of IOs in flight,
latency, CPU migrations, etc. Almost all statistics are collected using
percpu variables.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c | 455 +
 1 file changed, 455 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
new file mode 100644
index ..af2ed05d2900
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-stats.c
@@ -0,0 +1,455 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include "ibtrs-clt.h"
+
+static inline int ibtrs_clt_ms_to_id(unsigned long ms)
+{
+	int id = ms ?
ilog2(ms) - MIN_LOG_LAT + 1 : 0; + + return clamp(id, 0, LOG_LAT_SZ - 1); +} + +void ibtrs_clt_update_rdma_lat(struct ibtrs_clt_stats *stats, bool read, + unsigned long ms) +{ + struct ibtrs_clt_stats_pcpu *s; + int id; + + id = ibtrs_clt_ms_to_id(ms); + s = this_cpu_ptr(stats->pcpu_stats); + if (read) { + s->rdma_lat_distr[id].read++; + if (s->rdma_lat_max.read < ms) + s->rdma_lat_max.read = ms; + } else { + s->rdma_lat_distr[id].write++; + if (s->rdma_lat_max.write < ms) + s->rdma_lat_max.write = ms; + } +} + +void ibtrs_clt_decrease_inflight(struct ibtrs_clt_stats *stats) +{ + atomic_dec(&stats->inflight); +} + +void ibtrs_clt_update_wc_stats(struct ibtrs_clt_con *con) +{ + struct ibtrs_clt_sess *sess = to_clt_sess(con->c.sess); + struct ibtrs_clt_stats *stats = &sess->stats; + struct ibtrs_clt_stats_pcpu *s; + int cpu; + + cpu = raw_smp_processor_id(); + s = this_cpu_ptr(stats->pcpu_stats); + s->wc_comp.cnt++; + s->wc_comp.total_cnt++; + if (unlikely(con->cpu != cpu)) { + s->cpu_migr.to++; + + /* Careful here, override s pointer */ + s = per_cpu_ptr(stats->pcpu_stats, con->cpu); + atomic_inc(&s->cpu_migr.from); + } +} + +void ibtrs_clt_inc_failover_cnt(struct ibtrs_clt_stats *stats) +{ + struct ibtrs_clt_stats_pcpu *s; + + s = this_cpu_ptr(stats->pcpu_stats); + s->rdma.failover_cnt++; +} + +static inline u32 ibtrs_clt_stats_get_avg_wc_cnt(struct ibtrs_clt_stats *stats) +{ + u32 cnt = 0; + u64 sum = 0; + int cpu; + + for_each_possible_cpu(cpu) { + struct ibtrs_clt_stats_pcpu *s; + + s = per_cpu_ptr(stats->pcpu_stats, cpu); + sum += s->wc_comp.total_cnt; + cnt += s->wc_comp.cnt; + } + + return cnt ? 
sum / cnt : 0; +} + +int ibtrs_clt_stats_wc_completion_to_str(struct ibtrs_clt_stats *stats, +char *buf, size_t len) +{ + return scnprintf(buf, len, "%u\n", +ibtrs_clt_stats_get_avg_wc_cnt(stats)); +} + +ssize_t ibtrs_clt_stats_rdma_lat_distr_to_str(struct ibtrs_clt_stats *stats, + char *page, size_t len) +{ + struct ibtrs_clt_stats_rdma_lat res[LOG_LAT_SZ]; + struct ibtrs_clt_stats_rdma_lat max; + struct ibtrs_clt_stats_pcpu *s; + + ssize_t cnt = 0; + int i, cpu; + + max.write = 0; + max.read = 0; + for_each_possible_cpu(cpu) { + s = per_cpu_ptr(stats->pcpu_stats, cpu); + + if (max.write < s->rdma_lat_max.write) +
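
The latency histogram in the client statistics patch above maps a latency in milliseconds to a power-of-two bucket via ibtrs_clt_ms_to_id(). The following userspace sketch mirrors that mapping so the bucket boundaries can be checked by hand. It reuses the MIN_LOG_LAT, MAX_LOG_LAT and LOG_LAT_SZ values from the client header; ilog2_ul() and ms_to_bucket() are hypothetical stand-ins for the kernel's ilog2() and clamp(), not code from the patch.

```c
#define MIN_LOG_LAT	0
#define MAX_LOG_LAT	16
#define LOG_LAT_SZ	(MAX_LOG_LAT - MIN_LOG_LAT + 2)

/* Stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static int ilog2_ul(unsigned long v)
{
	int log = -1;

	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

/*
 * Bucket 0 holds latencies of 0 ms, bucket k (k >= 1) holds latencies in
 * [2^(k-1), 2^k) ms, and the last bucket collects everything above.
 */
static int ms_to_bucket(unsigned long ms)
{
	int id = ms ? ilog2_ul(ms) - MIN_LOG_LAT + 1 : 0;

	/* Open-coded clamp(id, 0, LOG_LAT_SZ - 1). */
	if (id < 0)
		id = 0;
	if (id > LOG_LAT_SZ - 1)
		id = LOG_LAT_SZ - 1;
	return id;
}
```

Under these definitions a 3 ms request lands in bucket 2 and anything of 65536 ms or more is counted in the last bucket, 17.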
[PATCH v3 06/25] ibtrs: client: main functionality
This is the main functionality of the ibtrs-client module, which manages
a set of RDMA connections for each IBTRS session and does multipathing,
load balancing and failover of RDMA requests.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c | 2844 ++
 1 file changed, 2844 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
new file mode 100644
index ..dc0327a95ef6
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.c
@@ -0,0 +1,2844 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *          Swapnil Ingle
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include + +#include "ibtrs-clt.h" +#include "ibtrs-log.h" + +#define MAX_SEGMENTS 31 +#define IBTRS_CONNECT_TIMEOUT_MS 5000 + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("IBTRS Client"); +MODULE_VERSION(IBTRS_VER_STRING); +MODULE_LICENSE("GPL"); + +static ushort nr_cons_per_session; +module_param(nr_cons_per_session, ushort, 0444); +MODULE_PARM_DESC(nr_cons_per_session, "Number of connections per session." +" (default: nr_cpu_ids)"); + +static int retry_cnt = 7; +module_param_named(retry_cnt, retry_cnt, int, 0644); +MODULE_PARM_DESC(retry_cnt, "Number of times to send the message if the" +" remote side didn't respond with Ack or Nack (default: 7," +" min: " __stringify(MIN_RTR_CNT) ", max: " +__stringify(MAX_RTR_CNT) ")"); + +static int __read_mostly noreg_cnt = 0; +module_param_named(noreg_cnt, noreg_cnt, int, 0444); +MODULE_PARM_DESC(noreg_cnt, "Max number of SG entries when MR registration " +"does not happen (default: 0)"); + +static const struct ibtrs_ib_dev_pool_ops dev_pool_ops; +static struct ibtrs_ib_dev_pool dev_pool = { + .ops = &dev_pool_ops +}; +static struct workqueue_struct *ibtrs_wq; +static struct class *ibtrs_dev_class; + +static void ibtrs_rdma_error_recovery(struct ibtrs_clt_con *con); +static int ibtrs_clt_rdma_cm_handler(struct rdma_cm_id *cm_id, +struct rdma_cm_event *ev); +static void ibtrs_clt_rdma_done(struct ib_cq *cq, struct ib_wc *wc); +static void complete_rdma_req(struct ibtrs_clt_io_req *req, int errno, + bool notify, bool can_wait); +static int ibtrs_clt_write_req(struct ibtrs_clt_io_req *req); +static int ibtrs_clt_read_req(struct ibtrs_clt_io_req *req); + +bool ibtrs_clt_sess_is_connected(const struct ibtrs_clt_sess *sess) +{ + return sess->state == IBTRS_CLT_CONNECTED; +} + +static inline bool ibtrs_clt_is_connected(const struct ibtrs_clt *clt) +{ + struct ibtrs_clt_sess *sess; + bool connected = false; + 
+ rcu_read_lock(); + list_for_each_entry_rcu(sess, &clt->paths_list, s.entry) + connected |= ibtrs_clt_sess_is_connected(sess); + rcu_read_unlock(); + + return connected; +} + +static inline struct ibtrs_tag * +__ibtrs_get_tag(struct ibtrs_clt *clt, enum ibtrs_clt_con_type con_type) +{ + size_t max_depth = clt->queue_depth; + struct ibtrs_tag *tag; + int cpu, bit; + + cpu = get_cpu(); + do { + bit = find_first_zero_bit(clt->tags_map, max_depth); + if (unlikely(bit >= max_depth)) { + put_cpu(); + return NULL; + } + + } while (unlikely(test_and_set_bit_lock(bit, clt->tags_map))); + put_cpu(); + + tag = GET_TAG(clt, bit); + WARN_ON(tag->mem_id != bit); + tag->cpu_id = cpu; +
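
The tag allocation loop in __ibtrs_get_tag() above searches for a clear bit and then claims it atomically, retrying if another CPU won the race. A rough userspace analogue is sketched below, assuming C11 atomics in place of find_first_zero_bit() and test_and_set_bit_lock(), and a single word of at most MAX_TAGS bits. The names MAX_TAGS, tags_map, tag_get() and tag_put() are hypothetical and chosen only for this illustration.

```c
#include <stdatomic.h>

#define MAX_TAGS 8	/* hypothetical queue depth for the sketch */

/* One bit per tag; a set bit means the tag is in use. */
static _Atomic unsigned long tags_map;

/* Returns a free tag index in [0, MAX_TAGS), or -1 if all are taken. */
static int tag_get(void)
{
	for (;;) {
		unsigned long map = atomic_load(&tags_map);
		unsigned long mask;
		int bit = 0;

		/* Find the first zero bit in the snapshot. */
		while (bit < MAX_TAGS && (map & (1UL << bit)))
			bit++;
		if (bit >= MAX_TAGS)
			return -1;

		/*
		 * Try to claim it. fetch_or returns the previous value, so
		 * the bit being clear there means this caller now owns it.
		 * Otherwise someone else was faster and the search restarts.
		 */
		mask = 1UL << bit;
		if (!(atomic_fetch_or_explicit(&tags_map, mask,
					       memory_order_acquire) & mask))
			return bit;
	}
}

/* Releases a tag previously returned by tag_get(). */
static void tag_put(int bit)
{
	atomic_fetch_and_explicit(&tags_map, ~(1UL << bit),
				  memory_order_release);
}
```

The acquire ordering on the claim and the release ordering on the put play the role that the _lock suffix of the kernel bit operation plays in the patch; the kernel code additionally records the CPU on which the tag was taken, which this sketch leaves out.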
[PATCH v3 08/25] ibtrs: client: sysfs interface functions
This is the sysfs interface to IBTRS sessions on client side:

  /sys/devices/virtual/ibtrs-client//
    *** IBTRS session created by ibtrs_clt_open() API call
    |
    |- max_reconnect_attempts
    |  *** number of reconnect attempts for session
    |
    |- add_path
    |  *** adds another connection path into IBTRS session
    |
    |- paths//
       *** established paths to server in a session
       |
       |- disconnect
       |  *** disconnect path
       |
       |- reconnect
       |  *** reconnect path
       |
       |- remove_path
       |  *** remove current path
       |
       |- state
       |  *** retrieve current path state
       |
       |- hca_port
       |  *** HCA port number
       |
       |- hca_name
       |  *** HCA name
       |
       |- stats/
          *** current path statistics
          |
          |- cpu_migration
          |- rdma
          |- rdma_lat
          |- reconnects
          |- reset_all
          |- sg_entries
          |- wc_completions

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c | 520 +
 1 file changed, 520 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
new file mode 100644
index ..a25763a29a17
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt-sysfs.c
@@ -0,0 +1,520 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibtrs-pri.h" +#include "ibtrs-clt.h" +#include "ibtrs-log.h" + +#define MIN_MAX_RECONN_ATT -1 +#define MAX_MAX_RECONN_ATT + +static struct kobj_type ktype = { + .sysfs_ops = &kobj_sysfs_ops, +}; + +static ssize_t max_reconnect_attempts_show(struct device *dev, + struct device_attribute *attr, + char *page) +{ + struct ibtrs_clt *clt; + + clt = container_of(dev, struct ibtrs_clt, dev); + + return sprintf(page, "%d\n", ibtrs_clt_get_max_reconnect_attempts(clt)); +} + +static ssize_t max_reconnect_attempts_store(struct device *dev, + struct device_attribute *attr, + const char *buf, + size_t count) +{ + struct ibtrs_clt *clt; + int value; + int ret; + + clt = container_of(dev, struct ibtrs_clt, dev); + + ret = kstrtoint(buf, 10, &value); + if (unlikely(ret)) { + ibtrs_err(clt, "%s: failed to convert string '%s' to int\n", + attr->attr.name, buf); + return ret; + } + if (unlikely(value > MAX_MAX_RECONN_ATT || +value < MIN_MAX_RECONN_ATT)) { + ibtrs_err(clt, "%s: invalid range" + " (provided: '%s', accepted: min: %d, max: %d)\n", + attr->attr.name, buf, MIN_MAX_RECONN_ATT, + MAX_MAX_RECONN_ATT); + return -EINVAL; + } + ibtrs_clt_set_max_reconnect_attempts(clt, value); + + return count; +} + +static DEVICE_ATTR_RW(max_reconnect_attempts); + +static ssize_t mpath_policy_show(struct device *dev, +struct device_attribute *attr, +char *page) +{ + struct ibtrs_clt *clt; + + clt = container_of(dev, struct ibtrs_clt, dev); + + switch (clt->mp_policy) { + case MP_POLICY_RR: + return sprintf(page, "round-robin (RR: %d)\n", clt->mp_policy); + case MP_POLICY_MIN_INFLIGHT
[PATCH v3 11/25] ibtrs: server: statistics functions
This introduces a set of functions used on the server side to account
statistics of RDMA data sent/received.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c | 110 +
 1 file changed, 110 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
new file mode 100644
index ..5933cfc03f95
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-stats.c
@@ -0,0 +1,110 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibtrs-srv.h" + +void ibtrs_srv_update_rdma_stats(struct ibtrs_srv_stats *s, +size_t size, int d) +{ + atomic64_inc(&s->rdma_stats.dir[d].cnt); + atomic64_add(size, &s->rdma_stats.dir[d].size_total); +} + +void ibtrs_srv_update_wc_stats(struct ibtrs_srv_stats *s) +{ + atomic64_inc(&s->wc_comp.calls); + atomic64_inc(&s->wc_comp.total_wc_cnt); +} + +int ibtrs_srv_reset_rdma_stats(struct ibtrs_srv_stats *stats, bool enable) +{ + if (enable) { + struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats; + + memset(r, 0, sizeof(*r)); + return 0; + } + + return -EINVAL; +} + +ssize_t ibtrs_srv_stats_rdma_to_str(struct ibtrs_srv_stats *stats, + char *page, size_t len) +{ + struct ibtrs_srv_stats_rdma_stats *r = &stats->rdma_stats; + struct ibtrs_srv_sess *sess; + + sess = container_of(stats, typeof(*sess), stats); + + return scnprintf(page, len, "%lld %lld %lld %lld %u\n", +(s64)atomic64_read(&r->dir[READ].cnt), +(s64)atomic64_read(&r->dir[READ].size_total), +(s64)atomic64_read(&r->dir[WRITE].cnt), +(s64)atomic64_read(&r->dir[WRITE].size_total), +atomic_read(&sess->ids_inflight)); +} + +int ibtrs_srv_reset_wc_completion_stats(struct ibtrs_srv_stats *stats, + bool enable) +{ + if (enable) { + memset(&stats->wc_comp, 0, sizeof(stats->wc_comp)); + return 0; + } + + return -EINVAL; +} + +int ibtrs_srv_stats_wc_completion_to_str(struct ibtrs_srv_stats *stats, +char *buf, size_t len) +{ + return snprintf(buf, len, "%lld %lld\n", + (s64)atomic64_read(&stats->wc_comp.total_wc_cnt), + (s64)atomic64_read(&stats->wc_comp.calls)); +} + +ssize_t ibtrs_srv_reset_all_help(struct ibtrs_srv_stats *stats, +char *page, size_t len) +{ + return scnprintf(page, PAGE_SIZE, "echo 1 to reset all statistics\n"); +} + +int ibtrs_srv_reset_all_stats(struct ibtrs_srv_stats *stats, bool enable) +{ + if (enable) { + ibtrs_srv_reset_wc_completion_stats(stats, enable); + 
ibtrs_srv_reset_rdma_stats(stats, enable); + return 0; + } + + return -EINVAL; +} -- 2.13.1
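
The server statistics patch above keeps per-direction counters in atomic64 variables and formats them into a sysfs page. A loose userspace analogue is sketched below, assuming C11 atomics in place of atomic64_t and snprintf() in place of scnprintf(). The names struct rdma_stats, DIR_READ, DIR_WRITE, stats_update(), stats_reset() and stats_to_str() are hypothetical, and the inflight counter printed by the kernel version is omitted.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

enum { DIR_READ = 0, DIR_WRITE = 1 };	/* hypothetical direction indexes */

struct rdma_stats {
	struct {
		atomic_llong cnt;		/* number of transfers */
		atomic_llong size_total;	/* bytes transferred */
	} dir[2];
};

/* Accounts one transfer of the given size in direction d. */
static void stats_update(struct rdma_stats *s, size_t size, int d)
{
	atomic_fetch_add(&s->dir[d].cnt, 1);
	atomic_fetch_add(&s->dir[d].size_total, (long long)size);
}

/* Zeroes all counters; each store is atomic, the reset as a whole is not. */
static void stats_reset(struct rdma_stats *s)
{
	for (int d = 0; d < 2; d++) {
		atomic_store(&s->dir[d].cnt, 0);
		atomic_store(&s->dir[d].size_total, 0);
	}
}

/* Formats "read_cnt read_bytes write_cnt write_bytes\n" into buf. */
static int stats_to_str(struct rdma_stats *s, char *buf, size_t len)
{
	return snprintf(buf, len, "%lld %lld %lld %lld\n",
			atomic_load(&s->dir[DIR_READ].cnt),
			atomic_load(&s->dir[DIR_READ].size_total),
			atomic_load(&s->dir[DIR_WRITE].cnt),
			atomic_load(&s->dir[DIR_WRITE].size_total));
}
```

As in the patch, a reader may observe a count and a byte total that belong to slightly different moments, since the counters are updated independently; for statistics output that is usually acceptable.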
[PATCH v3 12/25] ibtrs: server: sysfs interface functions
This is the sysfs interface to IBTRS sessions on server side:

  /sys/devices/virtual/ibtrs-server//
    *** IBTRS session accepted from a client peer
    |
    |- paths//
       *** established paths from a client in a session
       |
       |- disconnect
       |  *** disconnect path
       |
       |- hca_name
       |  *** HCA name
       |
       |- hca_port
       |  *** HCA port
       |
       |- stats/
          *** current path statistics
          |
          |- rdma
          |- reset_all
          |- wc_completions

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c | 307 +
 1 file changed, 307 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
new file mode 100644
index ..91f664b7eb66
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv-sysfs.c
@@ -0,0 +1,307 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibtrs-pri.h" +#include "ibtrs-srv.h" +#include "ibtrs-log.h" + +static struct kobj_type ktype = { + .sysfs_ops = &kobj_sysfs_ops, +}; + +static ssize_t ibtrs_srv_disconnect_show(struct kobject *kobj, +struct kobj_attribute *attr, +char *page) +{ + return scnprintf(page, PAGE_SIZE, "Usage: echo 1 > %s\n", +attr->attr.name); +} + +static ssize_t ibtrs_srv_disconnect_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + struct ibtrs_srv_sess *sess; + char str[MAXHOSTNAMELEN]; + + sess = container_of(kobj, struct ibtrs_srv_sess, kobj); + if (!sysfs_streq(buf, "1")) { + ibtrs_err(sess, "%s: invalid value: '%s'\n", + attr->attr.name, buf); + return -EINVAL; + } + + sockaddr_to_str((struct sockaddr *)&sess->s.dst_addr, str, sizeof(str)); + + ibtrs_info(sess, "disconnect for path %s requested\n", str); + ibtrs_srv_queue_close(sess); + + return count; +} + +static struct kobj_attribute ibtrs_srv_disconnect_attr = + __ATTR(disconnect, 0644, + ibtrs_srv_disconnect_show, ibtrs_srv_disconnect_store); + +static ssize_t ibtrs_srv_hca_port_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *page) +{ + struct ibtrs_srv_sess *sess; + struct ibtrs_con *usr_con; + + sess = container_of(kobj, typeof(*sess), kobj); + usr_con = sess->s.con[0]; + + return scnprintf(page, PAGE_SIZE, "%u\n", +usr_con->cm_id->port_num); +} + +static struct kobj_attribute ibtrs_srv_hca_port_attr = + __ATTR(hca_port, 0444, ibtrs_srv_hca_port_show, NULL); + +static ssize_t ibtrs_srv_hca_name_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *page) +{ + struct ibtrs_srv_sess *sess; + + sess = container_of(kobj, struct ibtrs_srv_sess, kobj); + + return scnprintf(page, PAGE_SIZE, "%s\n", +sess->s.dev->ib_dev->name); +} + +static struct kobj_attribute ibtrs_srv_hca_name_attr = + __ATTR(hca_name, 0444, ibtrs_srv_hca_name_show, NULL); + 
+static ssize_t ibtrs_srv_src_addr_show(struct kobject *kobj, + struct kobj_attribute *attr, +
[PATCH v3 13/25] ibtrs: include client and server modules into kernel compilation
Add IBTRS Makefile, Kconfig and also corresponding lines into upper
layer infiniband/ulp files.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/Kconfig            |  1 +
 drivers/infiniband/ulp/Makefile       |  1 +
 drivers/infiniband/ulp/ibtrs/Kconfig  | 20 ++++++++++++++++++++
 drivers/infiniband/ulp/ibtrs/Makefile | 13 +++++++++++++
 4 files changed, 35 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/Kconfig
 create mode 100644 drivers/infiniband/ulp/ibtrs/Makefile

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 2a972ed6851b..10df5d2bb8fe 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -97,6 +97,7 @@ source "drivers/infiniband/ulp/srpt/Kconfig"
 
 source "drivers/infiniband/ulp/iser/Kconfig"
 source "drivers/infiniband/ulp/isert/Kconfig"
+source "drivers/infiniband/ulp/ibtrs/Kconfig"
 
 source "drivers/infiniband/ulp/opa_vnic/Kconfig"
 source "drivers/infiniband/sw/rdmavt/Kconfig"
diff --git a/drivers/infiniband/ulp/Makefile b/drivers/infiniband/ulp/Makefile
index 437813c7b481..1c4f10dc8d49 100644
--- a/drivers/infiniband/ulp/Makefile
+++ b/drivers/infiniband/ulp/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_INFINIBAND_SRPT)		+= srpt/
 obj-$(CONFIG_INFINIBAND_ISER)		+= iser/
 obj-$(CONFIG_INFINIBAND_ISERT)		+= isert/
 obj-$(CONFIG_INFINIBAND_OPA_VNIC)	+= opa_vnic/
+obj-$(CONFIG_INFINIBAND_IBTRS)		+= ibtrs/
diff --git a/drivers/infiniband/ulp/ibtrs/Kconfig b/drivers/infiniband/ulp/ibtrs/Kconfig
new file mode 100644
index ..eaeb8f3f6b4e
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Kconfig
@@ -0,0 +1,20 @@
+config INFINIBAND_IBTRS
+	tristate
+	depends on INFINIBAND_ADDR_TRANS
+
+config INFINIBAND_IBTRS_CLIENT
+	tristate "IBTRS client module"
+	depends on INFINIBAND_ADDR_TRANS
+	select INFINIBAND_IBTRS
+	help
+	  IBTRS client allows for simplified data transfer and connection
+	  establishment over RDMA (InfiniBand, RoCE, iWarp). Uses BIO-like
+	  READ/WRITE semantics and provides multipath capabilities.
+
+config INFINIBAND_IBTRS_SERVER
+	tristate "IBTRS server module"
+	depends on INFINIBAND_ADDR_TRANS
+	select INFINIBAND_IBTRS
+	help
+	  IBTRS server module processing connection and IO requests received
+	  from the IBTRS client module.
diff --git a/drivers/infiniband/ulp/ibtrs/Makefile b/drivers/infiniband/ulp/ibtrs/Makefile
new file mode 100644
index ..2a145f8d252a
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/Makefile
@@ -0,0 +1,13 @@
+ibtrs-client-y := ibtrs-clt.o \
+		  ibtrs-clt-stats.o \
+		  ibtrs-clt-sysfs.o
+
+ibtrs-server-y := ibtrs-srv.o \
+		  ibtrs-srv-stats.o \
+		  ibtrs-srv-sysfs.o
+
+ibtrs-core-y := ibtrs.o
+
+obj-$(CONFIG_INFINIBAND_IBTRS)		+= ibtrs-core.o
+obj-$(CONFIG_INFINIBAND_IBTRS_CLIENT)	+= ibtrs-client.o
+obj-$(CONFIG_INFINIBAND_IBTRS_SERVER)	+= ibtrs-server.o
--
2.13.1
[PATCH v3 05/25] ibtrs: client: private header with client structs and functions
This header describes main structs and functions used by ibtrs-client module, mainly for managing IBTRS sessions, creating/destroying sysfs entries, accounting statistics on client side. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-clt.h | 315 +++ 1 file changed, 315 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-clt.h diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h new file mode 100644 index ..3212a33a0bf5 --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-clt.h @@ -0,0 +1,315 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBTRS_CLT_H +#define IBTRS_CLT_H + +#include +#include "ibtrs-pri.h" + +/** + * enum ibtrs_clt_state - Client states. 
+ */ +enum ibtrs_clt_state { + IBTRS_CLT_CONNECTING, + IBTRS_CLT_CONNECTING_ERR, + IBTRS_CLT_RECONNECTING, + IBTRS_CLT_CONNECTED, + IBTRS_CLT_CLOSING, + IBTRS_CLT_CLOSED, + IBTRS_CLT_DEAD, +}; + +static inline const char *ibtrs_clt_state_str(enum ibtrs_clt_state state) +{ + switch (state) { + case IBTRS_CLT_CONNECTING: + return "IBTRS_CLT_CONNECTING"; + case IBTRS_CLT_CONNECTING_ERR: + return "IBTRS_CLT_CONNECTING_ERR"; + case IBTRS_CLT_RECONNECTING: + return "IBTRS_CLT_RECONNECTING"; + case IBTRS_CLT_CONNECTED: + return "IBTRS_CLT_CONNECTED"; + case IBTRS_CLT_CLOSING: + return "IBTRS_CLT_CLOSING"; + case IBTRS_CLT_CLOSED: + return "IBTRS_CLT_CLOSED"; + case IBTRS_CLT_DEAD: + return "IBTRS_CLT_DEAD"; + default: + return "UNKNOWN"; + } +} + +enum ibtrs_mp_policy { + MP_POLICY_RR, + MP_POLICY_MIN_INFLIGHT, +}; + +struct ibtrs_clt_stats_reconnects { + int successful_cnt; + int fail_cnt; +}; + +struct ibtrs_clt_stats_wc_comp { + u32 cnt; + u64 total_cnt; +}; + +struct ibtrs_clt_stats_cpu_migr { + atomic_t from; + int to; +}; + +struct ibtrs_clt_stats_rdma { + struct { + u64 cnt; + u64 size_total; + } dir[2]; + + u64 failover_cnt; +}; + +struct ibtrs_clt_stats_rdma_lat { + u64 read; + u64 write; +}; + +#define MIN_LOG_SG 2 +#define MAX_LOG_SG 5 +#define MAX_LIN_SG BIT(MIN_LOG_SG) +#define SG_DISTR_SZ (MAX_LOG_SG - MIN_LOG_SG + MAX_LIN_SG + 2) + +#define MAX_LOG_LAT 16 +#define MIN_LOG_LAT 0 +#define LOG_LAT_SZ (MAX_LOG_LAT - MIN_LOG_LAT + 2) + +struct ibtrs_clt_stats_pcpu { + struct ibtrs_clt_stats_cpu_migr cpu_migr; + struct ibtrs_clt_stats_rdma rdma; + u64 sg_list_total; + u64 sg_list_distr[SG_DISTR_SZ]; + struct ibtrs_clt_stats_rdma_lat rdma_lat_distr[LOG_LAT_SZ]; + struct ibtrs_clt_stats_rdma_lat rdma_lat_max; + struct ibtrs_clt_stats_wc_comp wc_comp; +}; + +struct ibtrs_clt_stats { + boolenable_rdma_lat; + struct ibtrs_clt_stats_pcpu__percpu *pcpu_stats; + struct ibtrs_clt_stats_reconnects reconnects; + atomic_tinflight; +}; + +struct ibtrs_clt_con { + struct 
ibtrs_conc; + unsignedcpu; + atomic_tio_cnt; + int cm_err; +}; + +/** + * ibtrs_tag - tags the memory allocation for future RDMA operation + */ +struct ibtrs_tag { + enum ibtrs_clt_con_type con_type; + unsigned int cpu_id; + unsigned int mem_id; + unsigned int mem_off; +}; + +struct ibtrs_clt_io_req { + struct lis
[PATCH v3 14/25] ibtrs: a bit of documentation
README with description of major sysfs entries.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/README | 390
 1 file changed, 390 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/README

diff --git a/drivers/infiniband/ulp/ibtrs/README b/drivers/infiniband/ulp/ibtrs/README
new file mode 100644
index ..d9d8cd69d44f
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/README
@@ -0,0 +1,390 @@
+
+InfiniBand Transport (IBTRS)
+
+
+IBTRS (InfiniBand Transport) is a reliable high speed transport library
+which provides support to establish an optimal number of connections
+between client and server machines using RDMA (InfiniBand, RoCE, iWarp)
+transport. It is optimized to transfer (read/write) IO blocks.
+
+In its core interface it follows the BIO semantics of providing the
+possibility to either write data from an sg list to the remote side
+or to request ("read") data transfer from the remote side into a given
+sg list.
+
+IBTRS provides I/O fail-over and load-balancing capabilities by using
+multipath I/O (see "add_path" and "mp_policy" configuration entries).
+
+IBTRS is used by the IBNBD (InfiniBand Network Block Device) modules.
+
+======================
+Client Sysfs Interface
+======================
+
+This chapter describes only the most important files of the sysfs
+interface on the client side.
+
+Entries under /sys/devices/virtual/ibtrs-client/
+================================================
+
+When a user of the IBTRS API creates a new session, a directory entry
+with the name of that session is created.
+
+Entries under /sys/devices/virtual/ibtrs-client//
+=================================================
+
+add_path (RW)
+-------------
+
+Adds a new path (connection) to an existing session. Expected format is the
+following:
+
+ <[source addr,]destination addr>
+
+ *addr ::= [ ip: | gid: ]
+
+max_reconnect_attempts (RW)
+---------------------------
+
+Maximum number of reconnect attempts the client should make before giving
+up after a connection breaks unexpectedly.
+
+mp_policy (RW)
+--------------
+
+Multipath policy specifies which path should be selected on each IO:
+
+ round-robin (0):
+ select path in per CPU round-robin manner.
+
+ min-inflight (1):
+ select path with minimum inflights.
+
+Entries under /sys/devices/virtual/ibtrs-client//paths/
+=======================================================
+
+
+Each path belonging to a given session is listed here by its source and
+destination address. When a new path is added to a session by writing to
+the "add_path" entry, a directory is created.
+
+Entries under /sys/devices/virtual/ibtrs-client//paths//
+========================================================
+
+state (R)
+---------
+
+Contains "connected" if the session is connected to the peer and fully
+functional. Otherwise the file contains "disconnected".
+
+reconnect (RW)
+--------------
+
+Write "1" to the file in order to reconnect the path.
+Operation is blocking and returns 0 if reconnect was successful.
+
+disconnect (RW)
+---------------
+
+Write "1" to the file in order to disconnect the path.
+Operation blocks until IBTRS path is disconnected.
+
+remove_path (RW)
+----------------
+
+Write "1" to the file in order to disconnect and remove the path
+from the session. Operation blocks until the path is disconnected
+and removed from the session.
+
+hca_name (R)
+------------
+
+Contains the name of the HCA the connection is established on.
+
+hca_port (R)
+------------
+
+Contains the number of the active port the traffic is going through.
+
+src_addr (R)
+------------
+
+Contains the source address of the path.
+
+dst_addr (R)
+------------
+
+Contains the destination address of the path.
+
+
+Entries under /sys/devices/virtual/ibtrs-client//paths//stats/
+==============================================================
+
+Write "0" to any file in that directory to reset corresponding statistics.
+
+reset_all (RW)
+--------------
+
+Read will return usage help, write 0 will clear all the statistics.
+
+sg_entries (RW)
+---------------
+
+Data to be transferred via RDMA is passed to IBTRS as scatter-gather
+list. A scatter-gather list can contain multiple entries.
+Scatter-gather lists with fewer entries require less processing power
+and can therefore be transferred faster. The file sg_entries outputs a
+per-CPU distribution table for the number of entries in the
+scatter-gather lists that were passed to the IBTRS API function
+ibtrs_clt_request (READ or WRITE).
+
+cpu_migration (RW)
+------------------
+
+IBTRS expects that each HCA IRQ is pinned to a separate CPU. If it's
+not the case, the processing of an I/O response could
[PATCH v3 09/25] ibtrs: server: private header with server structs and functions
This header describes main structs and functions used by ibtrs-server module, mainly for accepting IBTRS sessions, creating/destroying sysfs entries, accounting statistics on server side. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-srv.h | 177 +++ 1 file changed, 177 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.h diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h new file mode 100644 index ..b1e32136f352 --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.h @@ -0,0 +1,177 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBTRS_SRV_H +#define IBTRS_SRV_H + +#include +#include +#include "ibtrs-pri.h" + +/** + * enum ibtrs_srv_state - Server states. 
+ */ +enum ibtrs_srv_state { + IBTRS_SRV_CONNECTING, + IBTRS_SRV_CONNECTED, + IBTRS_SRV_CLOSING, + IBTRS_SRV_CLOSED, +}; + +static inline const char *ibtrs_srv_state_str(enum ibtrs_srv_state state) +{ + switch (state) { + case IBTRS_SRV_CONNECTING: + return "IBTRS_SRV_CONNECTING"; + case IBTRS_SRV_CONNECTED: + return "IBTRS_SRV_CONNECTED"; + case IBTRS_SRV_CLOSING: + return "IBTRS_SRV_CLOSING"; + case IBTRS_SRV_CLOSED: + return "IBTRS_SRV_CLOSED"; + default: + return "UNKNOWN"; + } +} + +struct ibtrs_stats_wc_comp { + atomic64_t calls; + atomic64_t total_wc_cnt; +}; + +struct ibtrs_srv_stats_rdma_stats { + struct { + atomic64_t cnt; + atomic64_t size_total; + } dir[2]; +}; + +struct ibtrs_srv_stats { + struct ibtrs_srv_stats_rdma_stats rdma_stats; + atomic_tapm_cnt; + struct ibtrs_stats_wc_comp wc_comp; +}; + +struct ibtrs_srv_con { + struct ibtrs_conc; + atomic_twr_cnt; +}; + +struct ibtrs_srv_op { + struct ibtrs_srv_con*con; + u32 msg_id; + u8 dir; + struct ibtrs_msg_rdma_read *rd_msg; + struct ib_rdma_wr *tx_wr; + struct ib_sge *tx_sg; +}; + +struct ibtrs_srv_mr { + struct ib_mr*mr; + struct sg_table sgt; +}; + +struct ibtrs_srv_sess { + struct ibtrs_sess s; + struct ibtrs_srv*srv; + struct work_struct close_work; + enum ibtrs_srv_statestate; + spinlock_t state_lock; + int cur_cq_vector; + struct ibtrs_srv_op **ops_ids; + atomic_tids_inflight; + wait_queue_head_t ids_waitq; + struct ibtrs_srv_mr *mrs; + unsigned intmrs_num; + dma_addr_t *dma_addr; + boolestablished; + unsigned intmem_bits; + struct kobject kobj; + struct kobject kobj_stats; + struct ibtrs_srv_stats stats; +}; + +struct ibtrs_srv { + struct list_headpaths_list; + int paths_up; + struct mutexpaths_ev_mutex; + size_t paths_num; + struct mutexpaths_mutex; + uuid_t paths_uuid; + refcount_t refcount; + struct ibtrs_srv_ctx*ctx; + struct list_headctx_list; + void*priv; + size_t queue_depth; + struct page **chunks; + struct device dev; + unsigneddev_ref; + struct kobject kobj_paths; +}; + +struct ibtrs_
[PATCH v3 16/25] ibnbd: client: private header with client structs and functions
This header describes main structs and functions used by ibnbd-client module, mainly for managing IBNBD sessions and mapped block devices, creating and destroying sysfs entries. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-clt.h | 172 1 file changed, 172 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-clt.h diff --git a/drivers/block/ibnbd/ibnbd-clt.h b/drivers/block/ibnbd/ibnbd-clt.h new file mode 100644 index ..c5f6f08ec338 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-clt.h @@ -0,0 +1,172 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#ifndef IBNBD_CLT_H +#define IBNBD_CLT_H + +#include +#include +#include +#include +#include + +#include "ibtrs.h" +#include "ibnbd-proto.h" +#include "ibnbd-log.h" + +#define BMAX_SEGMENTS 31 +#define RECONNECT_DELAY 30 +#define MAX_RECONNECTS -1 + +enum ibnbd_clt_dev_state { + DEV_STATE_INIT, + DEV_STATE_MAPPED, + DEV_STATE_MAPPED_DISCONNECTED, + DEV_STATE_UNMAPPED, +}; + +struct ibnbd_iu_comp { + wait_queue_head_t wait; + int errno; +}; + +struct ibnbd_iu { + union { + struct request *rq; /* for block io */ + void *buf; /* for user messages */ + }; + struct ibtrs_tag*tag; + union { + /* use to send msg associated with a dev */ + struct ibnbd_clt_dev *dev; + /* use to send msg associated with a sess */ + struct ibnbd_clt_session *sess; + }; + blk_status_tstatus; + struct scatterlist sglist[BMAX_SEGMENTS]; + struct work_struct work; + int errno; + struct ibnbd_iu_comp*comp; +}; + +struct ibnbd_cpu_qlist { + struct list_headrequeue_list; + spinlock_t requeue_lock; + unsigned intcpu; +}; + +struct ibnbd_clt_session { + struct list_headlist; + struct ibtrs_clt*ibtrs; + wait_queue_head_t ibtrs_waitq; + boolibtrs_ready; + struct ibnbd_cpu_qlist __percpu + *cpu_queues; + DECLARE_BITMAP(cpu_queues_bm, NR_CPUS); + int __percpu*cpu_rr; /* per-cpu var for CPU round-robin */ + atomic_tbusy; + int queue_depth; + u32 max_io_size; + struct blk_mq_tag_set tag_set; + struct mutexlock; /* protects state and devs_list */ + struct list_headdevs_list; /* list of struct ibnbd_clt_dev */ + refcount_t refcount; + charsessname[NAME_MAX]; + u8 ver; /* protocol version */ +}; + +/** + * Submission queues. + */ +struct ibnbd_queue { + struct list_headrequeue_list; + unsigned long in_list; + struct ibnbd_clt_dev*dev; + struct blk_mq_hw_ctx*hctx; +}; + +struct ibnbd_clt_dev { + struct ibnbd_clt_session*sess; + struct request_queue*queue; + struct ibnbd_queue *hw_queues; + u32 device_id; + /* local Idr index - used to track minor number allocations. 
*/ + u32 clt_device_id; + struct mutexlock; + enum ibnbd_clt_dev_statedev_state; + enum ibnbd_io_mode io_mode; /* user requested */ + enum ibnbd_io_mode remote_io_mode; /* server really used */ + charpathname[NAME_MAX]; + enum ibnbd_access_mode access_mode; + boolread_only; + boolrotational; + u32 max_hw_sectors; + u32
[PATCH v3 17/25] ibnbd: client: main functionality
This is main functionality of ibnbd-client module, which provides interface to map remote device as local block device /dev/ibnbd and feeds IBTRS with IO requests. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-clt.c | 1817 +++ 1 file changed, 1817 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-clt.c diff --git a/drivers/block/ibnbd/ibnbd-clt.c b/drivers/block/ibnbd/ibnbd-clt.c new file mode 100644 index ..d665e144a253 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-clt.c @@ -0,0 +1,1817 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include +#include +#include +#include + +#include "ibnbd-clt.h" + +MODULE_AUTHOR("ib...@profitbricks.com"); +MODULE_DESCRIPTION("InfiniBand Network Block Device Client"); +MODULE_VERSION(IBNBD_VER_STRING); +MODULE_LICENSE("GPL"); + +/* + * This is for closing devices when unloading the module: + * we might be closing a lot (>256) of devices in parallel + * and it is better not to use the system_wq. + */ +static struct workqueue_struct *unload_wq; +static int ibnbd_client_major; +static DEFINE_IDA(index_ida); +static DEFINE_MUTEX(ida_lock); +static DEFINE_MUTEX(sess_lock); +static LIST_HEAD(sess_list); + +static bool softirq_enable; +module_param(softirq_enable, bool, 0444); +MODULE_PARM_DESC(softirq_enable, "finish request in softirq_fn." +" (default: 0)"); +/* + * Maximum number of partitions an instance can have. + * 6 bits = 64 minors = 63 partitions (one minor is used for the device itself) + */ +#define IBNBD_PART_BITS6 +#define KERNEL_SECTOR_SIZE 512 + +static inline bool ibnbd_clt_get_sess(struct ibnbd_clt_session *sess) +{ + return refcount_inc_not_zero(&sess->refcount); +} + +static void free_sess(struct ibnbd_clt_session *sess); + +static void ibnbd_clt_put_sess(struct ibnbd_clt_session *sess) +{ + might_sleep(); + + if (refcount_dec_and_test(&sess->refcount)) + free_sess(sess); +} + +static inline bool ibnbd_clt_dev_is_mapped(struct ibnbd_clt_dev *dev) +{ + return dev->dev_state == DEV_STATE_MAPPED; +} + +static void ibnbd_clt_put_dev(struct ibnbd_clt_dev *dev) +{ + might_sleep(); + + if (refcount_dec_and_test(&dev->refcount)) { + mutex_lock(&ida_lock); + ida_simple_remove(&index_ida, dev->clt_device_id); + mutex_unlock(&ida_lock); + kfree(dev->hw_queues); + ibnbd_clt_put_sess(dev->sess); + kfree(dev); + } +} + +static inline bool ibnbd_clt_get_dev(struct ibnbd_clt_dev *dev) +{ + return refcount_inc_not_zero(&dev->refcount); +} + +static int 
ibnbd_clt_set_dev_attr(struct ibnbd_clt_dev *dev, + const struct ibnbd_msg_open_rsp *rsp) +{ + struct ibnbd_clt_session *sess = dev->sess; + + if (unlikely(!rsp->logical_block_size)) + return -EINVAL; + + dev->device_id = le32_to_cpu(rsp->device_id); + dev->nsectors = le64_to_cpu(rsp->nsectors); + dev->logical_block_size = le16_to_cpu(rsp->logical_block_size); + dev->physical_block_size= le16_to_cpu(rsp->physical_block_size); + dev->max_write_same_sectors = le32_to_cpu(rsp->max_write_same_sectors); + dev->max_discard_sectors= le32_to_cpu(rsp->max_discard_sectors); + dev->discard_granularity= le32_to_cpu(rsp->discard_granularity); + dev->discard_alignment = le32_to_cpu(rsp->discard_alignment); + dev->secure_discard = le16_to_cpu(rsp-&g
[PATCH v3 22/25] ibnbd: server: sysfs interface functions
This is the sysfs interface to IBNBD mapped devices on server side: /sys/devices/virtual/ibnbd-server/ctl/devices// |- block_dev | *** link pointing to the corresponding block device sysfs entry | |- sessions// | *** sessions directory | |- read_only | *** is devices mapped as read only | |- mapping_path *** relative device path provided by the client during mapping Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-srv-sysfs.c | 242 ++ 1 file changed, 242 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-srv-sysfs.c diff --git a/drivers/block/ibnbd/ibnbd-srv-sysfs.c b/drivers/block/ibnbd/ibnbd-srv-sysfs.c new file mode 100644 index ..5bf77cdb09c8 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-srv-sysfs.c @@ -0,0 +1,242 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ibnbd-srv.h" + +static struct device *ibnbd_dev; +static struct class *ibnbd_dev_class; +static struct kobject *ibnbd_devs_kobj; + +static struct attribute *ibnbd_srv_default_dev_attrs[] = { + NULL, +}; + +static struct attribute_group ibnbd_srv_default_dev_attr_group = { + .attrs = ibnbd_srv_default_dev_attrs, +}; + +static struct kobj_type ktype = { + .sysfs_ops = &kobj_sysfs_ops, +}; + +int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev, + struct block_device *bdev, + const char *dir_name) +{ + struct kobject *bdev_kobj; + int ret; + + ret = kobject_init_and_add(&dev->dev_kobj, &ktype, + ibnbd_devs_kobj, dir_name); + if (ret) + return ret; + + ret = kobject_init_and_add(&dev->dev_sessions_kobj, + &ktype, + &dev->dev_kobj, "sessions"); + if (ret) + goto err; + + ret = sysfs_create_group(&dev->dev_kobj, +&ibnbd_srv_default_dev_attr_group); + if (ret) + goto err2; + + bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj; + ret = sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev"); + if (ret) + goto err3; + + return 0; + +err3: + sysfs_remove_group(&dev->dev_kobj, + &ibnbd_srv_default_dev_attr_group); +err2: + kobject_del(&dev->dev_sessions_kobj); + kobject_put(&dev->dev_sessions_kobj); +err: + kobject_del(&dev->dev_kobj); + kobject_put(&dev->dev_kobj); + return ret; +} + +void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev) +{ + sysfs_remove_link(&dev->dev_kobj, "block_dev"); + sysfs_remove_group(&dev->dev_kobj, &ibnbd_srv_default_dev_attr_group); + kobject_del(&dev->dev_sessions_kobj); + kobject_put(&dev->dev_sessions_kobj); + kobject_del(&dev->dev_kobj); + kobject_put(&dev->dev_kobj); +} + +static ssize_t ibnbd_srv_dev_session_ro_show(struct kobject *kobj, +struct kobj_attribute *attr, +char *page) +{ + struct ibnbd_srv_sess_dev *sess_dev; + + sess_dev = 
container_of(kobj, struct ibnbd_srv_sess_dev, kobj); + + return scnprintf(page, PAGE_SIZE, "%s\n", +(sess_dev->open_flags & FMODE_WRITE) ? "0" : "1"); +} + +static struct kobj_attribute ibnbd_srv_
[PATCH v3 15/25] ibnbd: private headers with IBNBD protocol structs and helpers
These are common private headers with IBNBD protocol structures, logging, sysfs and other helper functions, which are used on both client and server sides. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-log.h | 71 drivers/block/ibnbd/ibnbd-proto.h | 364 ++ 2 files changed, 435 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-log.h create mode 100644 drivers/block/ibnbd/ibnbd-proto.h diff --git a/drivers/block/ibnbd/ibnbd-log.h b/drivers/block/ibnbd/ibnbd-log.h new file mode 100644 index ..489343a61171 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-log.h @@ -0,0 +1,71 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBNBD_LOG_H +#define IBNBD_LOG_H + +#include "ibnbd-clt.h" +#include "ibnbd-srv.h" + +#define ibnbd_diskname(dev) ({ \ + struct gendisk *gd = ((struct ibnbd_clt_dev *)dev)->gd; \ + gd ? gd->disk_name : "";\ +}) + +void unknown_type(void); + +#define ibnbd_log(fn, dev, fmt, ...) 
({ \ + __builtin_choose_expr( \ + __builtin_types_compatible_p( \ + typeof(dev), struct ibnbd_clt_dev *), \ + fn("<%s@%s> %s: " fmt, (dev)->pathname, \ + (dev)->sess->sessname, ibnbd_diskname(dev), \ + ##__VA_ARGS__), \ + __builtin_choose_expr( \ + __builtin_types_compatible_p(typeof(dev), \ + struct ibnbd_srv_sess_dev *), \ + fn("<%s@%s>: " fmt, (dev)->pathname,\ + (dev)->sess->sessname, ##__VA_ARGS__), \ + unknown_type())); \ +}) + +#define ibnbd_err(dev, fmt, ...) \ + ibnbd_log(pr_err, dev, fmt, ##__VA_ARGS__) +#define ibnbd_err_rl(dev, fmt, ...)\ + ibnbd_log(pr_err_ratelimited, dev, fmt, ##__VA_ARGS__) +#define ibnbd_wrn(dev, fmt, ...) \ + ibnbd_log(pr_warn, dev, fmt, ##__VA_ARGS__) +#define ibnbd_wrn_rl(dev, fmt, ...) \ + ibnbd_log(pr_warn_ratelimited, dev, fmt, ##__VA_ARGS__) +#define ibnbd_info(dev, fmt, ...) \ + ibnbd_log(pr_info, dev, fmt, ##__VA_ARGS__) +#define ibnbd_info_rl(dev, fmt, ...) \ + ibnbd_log(pr_info_ratelimited, dev, fmt, ##__VA_ARGS__) + +#endif /* IBNBD_LOG_H */ diff --git a/drivers/block/ibnbd/ibnbd-proto.h b/drivers/block/ibnbd/ibnbd-proto.h new file mode 100644 index ..050d3fa4c1bf --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-proto.h @@ -0,0 +1,364 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU
[PATCH v3 25/25] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules
Signed-off-by: Roman Pen
Cc: Danil Kipnis
Cc: Jack Wang
---
 MAINTAINERS | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ca4afd68530c..201c6c8e039e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6782,6 +6782,20 @@ IBM ServeRAID RAID DRIVER
 S: Orphan
 F: drivers/scsi/ips.*
 
+IBNBD BLOCK DRIVERS
+M: IBNBD/IBTRS Storage Team
+L: linux-block@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/block/ibnbd/
+
+IBTRS TRANSPORT DRIVERS
+M: IBNBD/IBTRS Storage Team
+L: linux-r...@vger.kernel.org
+S: Maintained
+T: git git://github.com/profitbricks/ibnbd.git
+F: drivers/infiniband/ulp/ibtrs/
+
 ICH LPC AND GPIO DRIVER
 M: Peter Tyser
 S: Maintained
-- 
2.13.1
[PATCH v3 10/25] ibtrs: server: main functionality
This is the main functionality of the ibtrs-server module, which accepts a
set of RDMA connections (a so-called IBTRS session), creates and destroys
the sysfs entries associated with an IBTRS session, and notifies the upper
layer (the user of the IBTRS API) about RDMA requests or link events.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c | 2003 ++
 1 file changed, 2003 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-srv.c

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
new file mode 100644
index ..22c965cd5c8b
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs-srv.c
@@ -0,0 +1,2003 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *          Swapnil Ingle
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include
+#include
+
+#include "ibtrs-srv.h"
+#include "ibtrs-log.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_DESCRIPTION("IBTRS Server");
+MODULE_VERSION(IBTRS_VER_STRING);
+MODULE_LICENSE("GPL");
+
+/* Must be power of 2, see mask from mr->page_size in ib_sg_to_pages() */
+#define DEFAULT_MAX_CHUNK_SIZE (128 << 10)
+#define DEFAULT_SESS_QUEUE_DEPTH 512
+#define MAX_HDR_SIZE PAGE_SIZE
+#define MAX_SG_COUNT ((MAX_HDR_SIZE - sizeof(struct ibtrs_msg_rdma_read)) \
+		      / sizeof(struct ibtrs_sg_desc))
+
+/* We guarantee to serve 10 paths at least */
+#define CHUNK_POOL_SZ 10
+
+static struct ibtrs_ib_dev_pool dev_pool;
+static mempool_t *chunk_pool;
+struct class *ibtrs_dev_class;
+
+static int retry_count = 7;
+static int __read_mostly max_chunk_size = DEFAULT_MAX_CHUNK_SIZE;
+static int __read_mostly sess_queue_depth = DEFAULT_SESS_QUEUE_DEPTH;
+
+module_param_named(max_chunk_size, max_chunk_size, int, 0444);
+MODULE_PARM_DESC(max_chunk_size,
+		 "Max size for each IO request, in bytes"
+		 " (default: " __stringify(DEFAULT_MAX_CHUNK_SIZE_KB) "KB)");
+
+module_param_named(sess_queue_depth, sess_queue_depth, int, 0444);
+MODULE_PARM_DESC(sess_queue_depth,
+		 "Number of buffers for pending I/O requests to allocate"
+		 " per session. Maximum: " __stringify(MAX_SESS_QUEUE_DEPTH)
+		 " (default: " __stringify(DEFAULT_SESS_QUEUE_DEPTH) ")");
+
+static int retry_count_set(const char *val, const struct kernel_param *kp)
+{
+	int err, ival;
+
+	err = kstrtoint(val, 0, &ival);
+	if (err)
+		return err;
+
+	if (ival < MIN_RTR_CNT || ival > MAX_RTR_CNT) {
+		pr_err("Invalid retry count value %d, has to be"
+		       " >= %d, <= %d\n", ival, MIN_RTR_CNT, MAX_RTR_CNT);
+		return -EINVAL;
+	}
+
+	retry_count = ival;
+	pr_info("QP retry count changed to %d\n", ival);
+
+	return 0;
+}
+
+static const struct kernel_param_ops retry_count_ops = {
+	.set	= retry_count_set,
+	.get	= param_get_int,
+};
+module_param_cb(retry_count, &retry_count_ops, &retry_count, 0644);
+
+MODULE_PARM_DESC(retry_count, "Number of times to send the message if the"
+		 " remote side didn't respond with Ack or Nack (default: 7,"
+		 " min: " __stringify(MIN_RTR_CNT) ", max: "
+		 __stringify(MAX_RTR_CNT) ")");
+
+static char cq_affinity_list[256] = "";
+static cpumask_t cq_affinity_mask = { CPU_BITS_ALL };
+
+static void init_cq_affinity(void)
+{
+	sprintf(cq_affinity_list, "0-%d", nr_cpu_ids - 1);
+}
+
+static int cq_affinity_list_set(const char *val, const struct kernel_param *kp)
[PATCH v3 23/25] ibnbd: include client and server modules into kernel compilation
Add the IBNBD Makefile and Kconfig, and the corresponding lines to the
upper block layer build files.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/Kconfig        |  2 ++
 drivers/block/Makefile       |  1 +
 drivers/block/ibnbd/Kconfig  | 22 ++
 drivers/block/ibnbd/Makefile | 11 +++
 4 files changed, 36 insertions(+)
 create mode 100644 drivers/block/ibnbd/Kconfig
 create mode 100644 drivers/block/ibnbd/Makefile

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index ad9b687a236a..d8c1590411c8 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -481,4 +481,6 @@ config BLK_DEV_RSXX
 	  To compile this driver as a module, choose M here: the
 	  module will be called rsxx.
 
+source "drivers/block/ibnbd/Kconfig"
+
 endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index dc061158b403..65346a1d0b1a 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX)+= mtip32xx/
 obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/
 obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk.o
 obj-$(CONFIG_ZRAM) += zram/
+obj-$(CONFIG_BLK_DEV_IBNBD)+= ibnbd/
 
 skd-y := skd_main.o
 swim_mod-y := swim.o swim_asm.o
diff --git a/drivers/block/ibnbd/Kconfig b/drivers/block/ibnbd/Kconfig
new file mode 100644
index ..b381c6c084d2
--- /dev/null
+++ b/drivers/block/ibnbd/Kconfig
@@ -0,0 +1,22 @@
+config BLK_DEV_IBNBD
+	bool
+
+config BLK_DEV_IBNBD_CLIENT
+	tristate "Network block device driver on top of IBTRS transport"
+	depends on INFINIBAND_IBTRS_CLIENT
+	select BLK_DEV_IBNBD
+	help
+	  IBNBD client allows mapping remote block devices over the IBTRS
+	  protocol from a target system where the IBNBD server is running.
+
+	  If unsure, say N.
+
+config BLK_DEV_IBNBD_SERVER
+	tristate "Network block device over RDMA Infiniband server support"
+	depends on INFINIBAND_IBTRS_SERVER
+	select BLK_DEV_IBNBD
+	help
+	  IBNBD server allows exporting local block devices to a remote client
+	  over the IBTRS protocol.
+
+	  If unsure, say N.
diff --git a/drivers/block/ibnbd/Makefile b/drivers/block/ibnbd/Makefile
new file mode 100644
index ..ac906036310e
--- /dev/null
+++ b/drivers/block/ibnbd/Makefile
@@ -0,0 +1,11 @@
+ccflags-y := -Idrivers/infiniband/ulp/ibtrs
+
+ibnbd-client-y := ibnbd-clt.o \
+		  ibnbd-clt-sysfs.o
+
+ibnbd-server-y := ibnbd-srv.o \
+		  ibnbd-srv-dev.o \
+		  ibnbd-srv-sysfs.o
+
+obj-$(CONFIG_BLK_DEV_IBNBD_CLIENT) += ibnbd-client.o
+obj-$(CONFIG_BLK_DEV_IBNBD_SERVER) += ibnbd-server.o
-- 
2.13.1
[PATCH v3 18/25] ibnbd: client: sysfs interface functions
This is the sysfs interface to IBNBD block devices on client side:

  /sys/devices/virtual/ibnbd-client/ctl/
    |- map_device
    |  *** maps remote device
    |
    |- devices/
       *** all mapped devices

  /sys/block/ibnbd/ibnbd_client/
    |- unmap_device
    |  *** unmaps device
    |
    |- state
    |  *** device state
    |
    |- session
    |  *** session name
    |
    |- mapping_path
       *** path of the dev that was mapped on server

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/ibnbd-clt-sysfs.c | 685 ++
 1 file changed, 685 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-clt-sysfs.c

diff --git a/drivers/block/ibnbd/ibnbd-clt-sysfs.c b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
new file mode 100644
index ..3d3659a74e94
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-clt-sysfs.c
@@ -0,0 +1,685 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *          Swapnil Ingle
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ibnbd-clt.h" + +static struct device *ibnbd_dev; +static struct class *ibnbd_dev_class; +static struct kobject *ibnbd_devs_kobj; + +enum { + IBNBD_OPT_ERR = 0, + IBNBD_OPT_PATH = 1 << 0, + IBNBD_OPT_DEV_PATH = 1 << 1, + IBNBD_OPT_ACCESS_MODE = 1 << 3, + IBNBD_OPT_IO_MODE = 1 << 5, + IBNBD_OPT_SESSNAME = 1 << 6, +}; + +static unsigned int ibnbd_opt_mandatory[] = { + IBNBD_OPT_PATH, + IBNBD_OPT_DEV_PATH, + IBNBD_OPT_SESSNAME, +}; + +static const match_table_t ibnbd_opt_tokens = { + { IBNBD_OPT_PATH, "path=%s" }, + { IBNBD_OPT_DEV_PATH, "device_path=%s"}, + { IBNBD_OPT_ACCESS_MODE, "access_mode=%s"}, + { IBNBD_OPT_IO_MODE, "io_mode=%s"}, + { IBNBD_OPT_SESSNAME, "sessname=%s" }, + { IBNBD_OPT_ERR, NULL}, +}; + +/* remove new line from string */ +static void strip(char *s) +{ + char *p = s; + + while (*s != '\0') { + if (*s != '\n') + *p++ = *s++; + else + ++s; + } + *p = '\0'; +} + +static int ibnbd_clt_parse_map_options(const char *buf, + char *sessname, + struct ibtrs_addr *paths, + size_t *path_cnt, + size_t max_path_cnt, + char *pathname, + enum ibnbd_access_mode *access_mode, + enum ibnbd_io_mode *io_mode) +{ + char *options, *sep_opt; + char *p; + substring_t args[MAX_OPT_ARGS]; + int opt_mask = 0; + int token; + int ret = -EINVAL; + int i; + int p_cnt = 0; + + options = kstrdup(buf, GFP_KERNEL); + if (!options) + return -ENOMEM; + + sep_opt = strstrip(options); + strip(sep_opt); + while ((p = strsep(&sep_opt, " ")) != NULL) { + if (!*p) + continue; + + token = match_token(p, ibnbd_opt_tokens, args); + opt_mask |= token; + + switch (token) { + case IBNBD_OPT_SESSNAME: + p = match_strdup(args); + if (!p) { + ret = -ENOMEM; +
[PATCH v3 20/25] ibnbd: server: main functionality
This is the main functionality of the ibnbd-server module, which handles
IBTRS events and IBNBD protocol requests, such as map (open) or unmap
(close) device. The server side is also responsible for processing incoming
IBTRS IO requests and forwarding them to the locally mapped devices.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/ibnbd-srv.c | 946
 1 file changed, 946 insertions(+)
 create mode 100644 drivers/block/ibnbd/ibnbd-srv.c

diff --git a/drivers/block/ibnbd/ibnbd-srv.c b/drivers/block/ibnbd/ibnbd-srv.c
new file mode 100644
index ..b045f8071ab0
--- /dev/null
+++ b/drivers/block/ibnbd/ibnbd-srv.c
@@ -0,0 +1,946 @@
+/*
+ * InfiniBand Network Block Driver
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
+
+#include
+#include
+
+#include "ibnbd-srv.h"
+#include "ibnbd-srv-dev.h"
+
+MODULE_AUTHOR("ib...@profitbricks.com");
+MODULE_VERSION(IBNBD_VER_STRING);
+MODULE_DESCRIPTION("InfiniBand Network Block Device Server");
+MODULE_LICENSE("GPL");
+
+#define DEFAULT_DEV_SEARCH_PATH "/"
+
+static char dev_search_path[PATH_MAX] = DEFAULT_DEV_SEARCH_PATH;
+
+static int dev_search_path_set(const char *val, const struct kernel_param *kp)
+{
+	char *dup;
+	size_t len;
+
+	if (strlen(val) >= sizeof(dev_search_path))
+		return -EINVAL;
+
+	dup = kstrdup(val, GFP_KERNEL);
+	if (!dup)
+		return -ENOMEM;
+
+	/* Drop a single trailing newline; guard against an empty string */
+	len = strlen(dup);
+	if (len && dup[len - 1] == '\n')
+		dup[len - 1] = '\0';
+
+	strlcpy(dev_search_path, dup, sizeof(dev_search_path));
+
+	kfree(dup);
+	pr_info("dev_search_path changed to '%s'\n", dev_search_path);
+
+	return 0;
+}
+
+static struct kparam_string dev_search_path_kparam_str = {
+	.maxlen	= sizeof(dev_search_path),
+	.string	= dev_search_path
+};
+
+static const struct kernel_param_ops dev_search_path_ops = {
+	.set	= dev_search_path_set,
+	.get	= param_get_string,
+};
+
+module_param_cb(dev_search_path, &dev_search_path_ops,
+		&dev_search_path_kparam_str, 0444);
+MODULE_PARM_DESC(dev_search_path, "Sets the dev_search_path."
+		 " When a device is mapped this path is prepended to the"
+		 " device path from the map device operation. If %SESSNAME%"
+		 " is specified in a path, then device will be searched in a"
+		 " session namespace."
+" (default: " DEFAULT_DEV_SEARCH_PATH ")"); + +static int def_io_mode = IBNBD_BLOCKIO; + +static int def_io_mode_set(const char *val, const struct kernel_param *kp) +{ + int io_mode, rc; + + rc = kstrtoint(val, 0, &io_mode); + if (unlikely(rc)) + return rc; + + switch (io_mode) { + case IBNBD_FILEIO: + case IBNBD_BLOCKIO: + def_io_mode = io_mode; + return 0; + default: + return -EINVAL; + } +} + +static const struct kernel_param_ops def_io_mode_ops = { + .set= def_io_mode_set, + .get= param_get_int, +}; +module_param_cb(def_io_mode, &def_io_mode_ops, &def_io_mode, 0444); +MODULE_PARM_DESC(def_io_mode, "By default, export devices in" +" blockio(" __stringify(_IBNBD_BLOCKIO) ") or" +" fileio(" __stringify(_IBNBD_FILEIO) ") mode." +" (default: " __stringify(_IBNBD_BLOCKIO) " (blockio))"); + +static DEFINE_MUTEX(sess_lock); +static DEFINE_SPINLOCK(dev_lock); + +static LIST_HEAD(sess_list); +static LIST_HEAD(dev_list); + +struct ibnbd_io_private { + struct ibtrs_srv_op *id; + struct ibnbd_srv_sess_dev *sess_dev; +}; +
[PATCH v3 24/25] ibnbd: a bit of documentation
README with description of major sysfs entries.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/block/ibnbd/README | 299 +
 1 file changed, 299 insertions(+)
 create mode 100644 drivers/block/ibnbd/README

diff --git a/drivers/block/ibnbd/README b/drivers/block/ibnbd/README
new file mode 100644
index ..bbaddd02c1c5
--- /dev/null
+++ b/drivers/block/ibnbd/README
@@ -0,0 +1,299 @@
+***
+InfiniBand Network Block Device (IBNBD)
+***
+
+Introduction
+
+
+IBNBD (InfiniBand Network Block Device) is a pair of kernel modules
+(client and server) that allow for remote access of a block device on
+the server over IBTRS protocol using the RDMA (InfiniBand, RoCE, iWarp)
+transport. After being mapped, the remote block devices can be accessed
+on the client side as local block devices.
+
+I/O is transferred between client and server by the IBTRS transport
+modules. The administration of IBNBD and IBTRS modules is done via
+sysfs entries.
+
+Requirements
+
+
+ IBTRS kernel modules
+
+Quick Start
+---
+
+Server side:
+ # modprobe ibnbd_server
+
+Client side:
+ # modprobe ibnbd_client
+ # echo "sessname=blya path=ip:10.50.100.66 device_path=/dev/ram0" > \
+/sys/devices/virtual/ibnbd-client/ctl/map_device
+
+ Where "sessname=" is a session name, a string to identify the session
+ on client and on server sides; "path=" is a destination IP address or
+ a pair of source and destination IPs, separated by a comma. Multiple
+ "path=" options can be specified in order to use multipath (see IBTRS
+ description for details); "device_path=" is the block device to be
+ mapped from the server side. After the session to the server machine is
+ established, the mapped device will appear on the client side under
+ /dev/ibnbd.
+
+
+==
+Client Sysfs Interface
+==
+
+All sysfs files that are not read-only provide the usage information on read:
+
+Example:
+ # cat /sys/devices/virtual/ibnbd-client/ctl/map_device
+
+ > Usage: echo "sessname= path=<[srcaddr,]dstaddr>
+ > [path=<[srcaddr,]dstaddr>] device_path=
+ > [access_mode=]
+ > [io_mode=]" > map_device
+ >
+ > addr ::= [ ip: | ip: | gid: ]
+
+Entries under /sys/devices/virtual/ibnbd-client/ctl/
+===
+
+map_device (RW)
+---
+
+Expected format is the following:
+
+sessname=
+path=<[srcaddr,]dstaddr> [path=<[srcaddr,]dstaddr> ...]
+device_path=
+[access_mode=]
+[io_mode=]
+
+Where:
+
+sessname: accepts a string not bigger than 256 chars, which identifies
+          a given session on the client and on the server.
+          E.g. "clt_hostname-srv_hostname" could be a natural choice.
+
+path: describes a connection between the client and the server by
+      specifying the destination and, when required, the source address.
+      The addresses are to be provided in the following format:
+
+ip:
+ip:
+gid:
+
+      for example:
+
+      path=ip:10.0.0.66
+                      The single addr is treated as the destination.
+                      The connection will be established to this
+                      server from any client IP address.
+
+      path=ip:10.0.0.66,ip:10.0.1.66
+                      First addr is the source address and the second
+                      is the destination.
+
+      If multiple "path=" options are specified, multiple connections
+      will be established and data will be sent according to
+      the selected multipath policy (see IBTRS mp_policy sysfs entry
+      description).
+
+device_path: Path to the block device on the server side. Path is specified
+             relative to the directory on server side configured in the
+             'dev_search_path' module parameter of the ibnbd_server.
+             The ibnbd_server prepends the received from client
+             with and tries to open the
+             / block device. On success,
+             a /dev/ibnbd device file, a /sys/block/ibnbd_client/ibnbd/
+             directory and an entry in /sys/devices/virtual/ibnbd-client/ctl/devices
+             will be created.
+
+             If 'dev_search_path' contains '%SESSNAME%', then each session can
+             have a different device namespace. E.g. if the server was configured
+             with the parameter "dev_search_path=/run/ibnbd-devs/%SESSNAME%" and
+             the client passes "sessname=blya device_path=sda", then the server
+             will try to open: /run/ibnbd-devs/blya/sda.
+
+access_mode: the access_mode parameter specifies if the device is to be
+             mapped as "ro" r
[PATCH v3 19/25] ibnbd: server: private header with server structs and functions
This header describes main structs and functions used by ibnbd-server module, namely structs for managing sessions from different clients and mapped (opened) devices. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-srv.h | 100 1 file changed, 100 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-srv.h diff --git a/drivers/block/ibnbd/ibnbd-srv.h b/drivers/block/ibnbd/ibnbd-srv.h new file mode 100644 index ..191a1650bc1d --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-srv.h @@ -0,0 +1,100 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */
+
+#ifndef IBNBD_SRV_H
+#define IBNBD_SRV_H
+
+#include
+#include
+#include
+
+#include "ibtrs.h"
+#include "ibnbd-proto.h"
+#include "ibnbd-log.h"
+
+struct ibnbd_srv_session {
+	/* Entry inside global sess_list */
+	struct list_head	list;
+	struct ibtrs_srv	*ibtrs;
+	char			sessname[NAME_MAX];
+	int			queue_depth;
+	struct bio_set		*sess_bio_set;
+
+	rwlock_t		index_lock cacheline_aligned;
+	struct idr		index_idr;
+	/* List of struct ibnbd_srv_sess_dev */
+	struct list_head	sess_dev_list;
+	struct mutex		lock;
+	u8			ver;
+};
+
+struct ibnbd_srv_dev {
+	/* Entry inside global dev_list */
+	struct list_head	list;
+	struct kobject		dev_kobj;
+	struct kobject		dev_sessions_kobj;
+	struct kref		kref;
+	char			id[NAME_MAX];
+	/* List of ibnbd_srv_sess_dev structs */
+	struct list_head	sess_dev_list;
+	struct mutex		lock;
+	int			open_write_cnt;
+	enum ibnbd_io_mode	mode;
+};
+
+/* Structure which binds N devices and N sessions */
+struct ibnbd_srv_sess_dev {
+	/* Entry inside ibnbd_srv_dev struct */
+	struct list_head	dev_list;
+	/* Entry inside ibnbd_srv_session struct */
+	struct list_head	sess_list;
+	struct ibnbd_dev	*ibnbd_dev;
+	struct ibnbd_srv_session *sess;
+	struct ibnbd_srv_dev	*dev;
+	struct kobject		kobj;
+	struct completion	*sysfs_release_compl;
+	u32			device_id;
+	fmode_t			open_flags;
+	struct kref		kref;
+	struct completion	*destroy_comp;
+	char			pathname[NAME_MAX];
+};
+
+/* ibnbd-srv-sysfs.c */
+
+int ibnbd_srv_create_dev_sysfs(struct ibnbd_srv_dev *dev,
+			       struct block_device *bdev,
+			       const char *dir_name);
+void ibnbd_srv_destroy_dev_sysfs(struct ibnbd_srv_dev *dev);
+int ibnbd_srv_create_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+void ibnbd_srv_destroy_dev_session_sysfs(struct ibnbd_srv_sess_dev *sess_dev);
+int ibnbd_srv_create_sysfs_files(void);
+void ibnbd_srv_destroy_sysfs_files(void);
+
+#endif /* IBNBD_SRV_H */
-- 
2.13.1
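
Note on the header above: struct ibnbd_srv_sess_dev is described as the
structure "which binds N devices and N sessions", i.e. one link object per
(session, device) pair that is chained into both owners. The following is
only a minimal userspace sketch of that idea, not the driver code. It
assumes plain singly linked lists instead of struct list_head, and the names
session, device, sess_dev, map_device(), count_devices() and
count_sessions() are hypothetical.

```c
#include <stdlib.h>

struct sess_dev;

/* Hypothetical stand-in for a client session */
struct session {
	const char *name;
	struct sess_dev *devs;   /* links for devices mapped by this session */
};

/* Hypothetical stand-in for an exported block device */
struct device {
	const char *id;
	struct sess_dev *users;  /* links for sessions that mapped this device */
};

/* One link per (session, device) pair, chained into both owners */
struct sess_dev {
	struct session *sess;
	struct device *dev;
	struct sess_dev *next_in_sess;
	struct sess_dev *next_in_dev;
};

/* Create a link and push it onto both the session and the device list */
static struct sess_dev *map_device(struct session *sess, struct device *dev)
{
	struct sess_dev *sd = calloc(1, sizeof(*sd));

	if (!sd)
		return NULL;

	sd->sess = sess;
	sd->dev = dev;
	sd->next_in_sess = sess->devs;
	sess->devs = sd;
	sd->next_in_dev = dev->users;
	dev->users = sd;

	return sd;
}

/* Number of devices mapped by one session */
static int count_devices(const struct session *sess)
{
	const struct sess_dev *sd;
	int n = 0;

	for (sd = sess->devs; sd; sd = sd->next_in_sess)
		n++;
	return n;
}

/* Number of sessions that mapped one device */
static int count_sessions(const struct device *dev)
{
	const struct sess_dev *sd;
	int n = 0;

	for (sd = dev->users; sd; sd = sd->next_in_dev)
		n++;
	return n;
}
```

With two sessions mapping the same device, count_sessions() on that device
returns 2 while each session only walks its own links, which is what keeping
a link object in two lists buys over a single global list.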
[PATCH v3 03/25] ibtrs: private headers with IBTRS protocol structs and helpers
These are common private headers with IBTRS protocol structures, logging, sysfs and other helper functions, which are used on both client and server sides. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/infiniband/ulp/ibtrs/ibtrs-log.h | 91 ++ drivers/infiniband/ulp/ibtrs/ibtrs-pri.h | 470 +++ 2 files changed, 561 insertions(+) create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-log.h create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs-pri.h diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-log.h b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h new file mode 100644 index ..f56257eabdee --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-log.h @@ -0,0 +1,91 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef IBTRS_LOG_H +#define IBTRS_LOG_H + +#define P1 ) +#define P2 )) +#define P3 ))) +#define P4 +#define P(N) P ## N + +#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__) +#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__ + +#define LIST(...) 
\ + __VA_ARGS__,\ + ({ unknown_type(); NULL; }) \ + CAT(P, COUNT_ARGS(__VA_ARGS__)) \ + +#define EMPTY() +#define DEFER(id) id EMPTY() + +#define _CASE(obj, type, member) \ + __builtin_choose_expr( \ + __builtin_types_compatible_p( \ + typeof(obj), type), \ + ((type)obj)->member +#define CASE(o, t, m) DEFER(_CASE)(o,t,m) + +/* + * Below we define retrieving of sessname from common IBTRS types. + * Client or server related types have to be defined by special + * TYPES_TO_SESSNAME macro. + */ + +void unknown_type(void); + +#ifndef TYPES_TO_SESSNAME +#define TYPES_TO_SESSNAME(...) ({ unknown_type(); NULL; }) +#endif + +#define ibtrs_prefix(obj) \ + _CASE(obj, struct ibtrs_con *, sess->sessname),\ + _CASE(obj, struct ibtrs_sess *, sessname), \ + TYPES_TO_SESSNAME(obj) \ + )) + +#define ibtrs_log(fn, obj, fmt, ...) \ + fn("<%s>: " fmt, ibtrs_prefix(obj), ##__VA_ARGS__) + +#define ibtrs_err(obj, fmt, ...) \ + ibtrs_log(pr_err, obj, fmt, ##__VA_ARGS__) +#define ibtrs_err_rl(obj, fmt, ...)\ + ibtrs_log(pr_err_ratelimited, obj, fmt, ##__VA_ARGS__) +#define ibtrs_wrn(obj, fmt, ...) \ + ibtrs_log(pr_warn, obj, fmt, ##__VA_ARGS__) +#define ibtrs_wrn_rl(obj, fmt, ...) \ + ibtrs_log(pr_warn_ratelimited, obj, fmt, ##__VA_ARGS__) +#define ibtrs_info(obj, fmt, ...) \ + ibtrs_log(pr_info, obj, fmt, ##__VA_ARGS__) +#define ibtrs_info_rl(obj, fmt, ...) \ + ibtrs_log(pr_info_ratelimited, obj, fmt, ##__VA_ARGS__) + +#endif /* IBTRS_LOG_H */ diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h new file mode 100644 index ..f56652a46a8d --- /dev/null +++ b/drivers/infiniband/ulp/ibtrs/ibtrs-pri.h @@ -0,0 +1,470 @@ +/* + * InfiniBand Transport Layer + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. 
+ * Authors: Danil Kipnis + * Roman Penyaev + * Swapnil Ingle + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This p
[PATCH v3 21/25] ibnbd: server: functionality for IO submission to file or block dev
This provides helper functions for IO submission to file or block dev. Signed-off-by: Roman Pen Signed-off-by: Danil Kipnis Cc: Jack Wang --- drivers/block/ibnbd/ibnbd-srv-dev.c | 413 drivers/block/ibnbd/ibnbd-srv-dev.h | 149 + 2 files changed, 562 insertions(+) create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.c create mode 100644 drivers/block/ibnbd/ibnbd-srv-dev.h diff --git a/drivers/block/ibnbd/ibnbd-srv-dev.c b/drivers/block/ibnbd/ibnbd-srv-dev.c new file mode 100644 index ..aefa10fcafc3 --- /dev/null +++ b/drivers/block/ibnbd/ibnbd-srv-dev.c @@ -0,0 +1,413 @@ +/* + * InfiniBand Network Block Driver + * + * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved. + * Authors: Fabian Holler + * Jack Wang + * Kleber Souza + * Danil Kipnis + * Roman Penyaev + * Milind Dumbare + * + * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved. + * Authors: Danil Kipnis + * Roman Penyaev + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see <http://www.gnu.org/licenses/>. 
+ */ + +#undef pr_fmt +#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt + +#include "ibnbd-srv-dev.h" +#include "ibnbd-log.h" + +#define IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS 0 + +struct ibnbd_dev_file_io_work { + struct ibnbd_dev*dev; + void*priv; + + sector_tsector; + void*data; + size_t len; + size_t bi_size; + enum ibnbd_io_flags flags; + + struct work_struct work; +}; + +struct ibnbd_dev_blk_io { + struct ibnbd_dev *dev; + void *priv; +}; + +static struct workqueue_struct *fileio_wq; + +int ibnbd_dev_init(void) +{ + fileio_wq = alloc_workqueue("%s", WQ_UNBOUND, + IBNBD_DEV_MAX_FILEIO_ACTIVE_WORKERS, + "ibnbd_server_fileio_wq"); + if (!fileio_wq) + return -ENOMEM; + + return 0; +} + +void ibnbd_dev_destroy(void) +{ + destroy_workqueue(fileio_wq); +} + +static inline struct block_device *ibnbd_dev_open_bdev(const char *path, + fmode_t flags) +{ + return blkdev_get_by_path(path, flags, THIS_MODULE); +} + +static int ibnbd_dev_blk_open(struct ibnbd_dev *dev, const char *path, + fmode_t flags) +{ + dev->bdev = ibnbd_dev_open_bdev(path, flags); + return PTR_ERR_OR_ZERO(dev->bdev); +} + +static int ibnbd_dev_vfs_open(struct ibnbd_dev *dev, const char *path, + fmode_t flags) +{ + int oflags = O_DSYNC; /* enable write-through */ + + if (flags & FMODE_WRITE) + oflags |= O_RDWR; + else if (flags & FMODE_READ) + oflags |= O_RDONLY; + else + return -EINVAL; + + dev->file = filp_open(path, oflags, 0); + return PTR_ERR_OR_ZERO(dev->file); +} + +struct ibnbd_dev *ibnbd_dev_open(const char *path, fmode_t flags, +enum ibnbd_io_mode mode, struct bio_set *bs, +ibnbd_dev_io_fn io_cb) +{ + struct ibnbd_dev *dev; + int ret; + + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + if (!dev) + return ERR_PTR(-ENOMEM); + + if (mode == IBNBD_BLOCKIO) { + dev->blk_open_flags = flags; + ret = ibnbd_dev_blk_open(dev, path, dev->blk_open_flags); + if (ret) + goto err; + } else if (mode == IBNBD_FILEIO) { + dev->blk_open_flags = FMODE_READ; + ret = ibnbd_dev_blk_open(dev, path, 
dev->blk_open_flags); + if (ret) + goto err; + + ret = ibnbd_dev_vfs_open(dev, path, flags); + if (ret) + goto blk_put; + } else { + ret = -EINVAL; + goto err; + } + + dev->blk_open_flags = flags; + dev->mode = mode; + dev->io_cb = io_cb; + bdevname(dev->bdev, dev->name); + dev->
[PATCH v3 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
=randread, bandwidth in Kbytes:

  jobs  IBNBD    NVMEoRDMA  Change
  x1    932951   975425     +4.6%
  x8    1543074  1504416    -2.5%
  x16   1531282  1432937    -6.4%
  x24   1396153  1244858    -10.8%
  x32   1215334  1066607    -12.2%
  x40   1255781  1076841    -14.2%
  x48   1240931  1066453    -14.1%
  x56   1250333  1065879    -14.8%
  x64   1229389  1064199    -13.4%

rw=randwrite, bandwidth in Kbytes:

  jobs  IBNBD    NVMEoRDMA  Change
  x1    1416413  1181102    -16.6%
  x8    2438615  1977051    -18.9%
  x16   2436924  1854223    -23.9%
  x24   2430527  1714580    -29.5%
  x32   2425552  1641288    -32.3%
  x40   2378784  1592788    -33.0%
  x48   2202260  1511895    -31.3%
  x56   2207013  1493400    -32.3%
  x64   2098949  1432951    -31.7%

- on ConnectX-3 (MT4099)
  x40 CPUs Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz

rw=randread, bandwidth in Kbytes:

  jobs  IBNBD    NVMEoRDMA  Change
  x1    1961216  2046572    +4.4%
  x8    4012912  4059410    +1.2%
  x16   4033837  3968410    -1.6%
  x24   3939186  3770729    -4.3%
  x32   3843434  3623869    -5.7%
  x40   3696896  3448772    -6.7%
  x48   4106259  3729201    -9.2%
  x56   4141374  3732954    -9.9%
  x64   4207317  3805638    -9.5%

rw=randwrite, bandwidth in Kbytes:

  jobs  IBNBD    NVMEoRDMA  Change
  x1    3195637  2479068    -22.4%
  x8    4576924  4541743    -0.8%
  x16   4581528  4555459    -0.6%
  x24   4692540  4595963    -2.1%
  x32   4686968  4540456    -3.1%
  x40   4583814  4404859    -3.9%
  x48   4969587  4710902    -5.2%
  x56   4996101  4701814    -5.9%
  x64   5083460  4759663    -6.4%

The interesting observation is that on the machine with Intel CPUs and a
ConnectX-3 card the difference between IBNBD and NVME bandwidth is
significantly smaller than on AMD with ConnectX-2. I did not thoroughly
investigate that behaviour, but I suspect that the devil is in the Intel vs
AMD architecture and probably in how the NUMA nodes are organized, i.e.
Intel has 2 NUMA nodes against 8 on AMD. If someone is interested in those
results and can point out where to dig on the NVME side, I can investigate
in depth why exactly NVME bandwidth drops significantly on the AMD machine
with ConnectX-2.
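
For readers checking the numbers: the "Change" column in the tables above is
consistent with the relative difference of the NVMEoRDMA figure against the
IBNBD one, so a negative value means NVMEoRDMA is slower than IBNBD. A small
sketch of that arithmetic follows; the helper names change_pct() and
print_row() are made up for illustration and are not part of the patchset or
of any benchmark tooling.

```c
#include <stdio.h>

/* Relative change of NVMEoRDMA bandwidth against the IBNBD baseline, in % */
static double change_pct(double ibnbd_kb, double nvme_kb)
{
	return (nvme_kb - ibnbd_kb) / ibnbd_kb * 100.0;
}

/* Print one table row in the same column order as the tables above */
static void print_row(const char *jobs, long ibnbd_kb, long nvme_kb)
{
	printf("%-4s %8ld %9ld %+6.1f%%\n", jobs, ibnbd_kb, nvme_kb,
	       change_pct((double)ibnbd_kb, (double)nvme_kb));
}
```

For example, the first randread row gives (975425 - 932951) / 932951, which
is about 4.55% and rounds to the reported +4.6%.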
Latest shiny graphs are here:

  https://docs.google.com/spreadsheets/d/1vxSoIvfjPbOWD61XMeN2_gPGxsxrbIUOZADk1UX5lj0

Roman Pen (25):
  sysfs: export sysfs_remove_file_self()
  ibtrs: public interface header to establish RDMA connections
  ibtrs: private headers with IBTRS protocol structs and helpers
  ibtrs: core: lib functions shared between client and server modules
  ibtrs: client: private header with client structs and functions
  ibtrs: client: main functionality
  ibtrs: client: statistics functions
  ibtrs: client: sysfs interface functions
  ibtrs: server: private header with server structs and functions
  ibtrs: server: main functionality
  ibtrs: server: statistics functions
  ibtrs: server: sysfs interface functions
  ibtrs: include client and server modules into kernel compilation
  ibtrs: a bit of documentation
  ibnbd: private headers with IBNBD protocol structs and helpers
  ibnbd: client: private header with client structs and functions
  ibnbd: client: main functionality
  ibnbd: client: sysfs interface functions
  ibnbd: server: private header with server structs and functions
  ibnbd: server: main functionality
  ibnbd: server: functionality for IO submission to file or block dev
  ibnbd: server: sysfs interface functions
  ibnbd: include client and server modules into kernel compilation
  ibnbd: a bit of documentation
  MAINTAINERS: Add maintainer for IBNBD/IBTRS modules

 MAINTAINERS                           |   14 +
 drivers/block/Kconfig                 |    2 +
 drivers/block/Makefile                |    1 +
 drivers/block/ibnbd/Kconfig           |   22 +
 drivers/block/ibnbd/Makefile          |   11 +
 drivers/block/ibnbd/README            |  299 +++
 drivers/block/ibnbd/ibnbd-clt-sysfs.c |  685 ++
 drivers/block/ibnbd/ibnbd-clt.c       | 1817 +++
 drivers/block/ibnbd/ibnbd-clt.h       |  172 ++
 drivers/block/ibnbd/ibnbd-log.h       |   71 +
 drivers/block/ibnbd/ibnbd-proto.h     |  364 +++
 drivers/block/ibnbd/ibnbd-srv-dev.c   |  413
 drivers/block/ibnbd/ibnbd-srv-dev.h   |  149 ++
 drivers/block/ibnbd/ibnbd-srv-sysfs.c
[PATCH v3 02/25] ibtrs: public interface header to establish RDMA connections
Introduce a public header which provides a set of API functions to
establish RDMA connections from a client to a server machine using the
IBTRS protocol, which manages RDMA connections for each session and does
multipathing and load balancing.

Main functions for client (active) side:

 ibtrs_clt_open() - Creates a set of RDMA connections encapsulated in an
                    IBTRS session and returns a pointer to the IBTRS
                    session object.
 ibtrs_clt_close() - Closes RDMA connections associated with the IBTRS
                     session.
 ibtrs_clt_request() - Requests zero-copy RDMA transfer to/from server.

Main functions for server (passive) side:

 ibtrs_srv_open() - Starts listening for IBTRS clients on the specified
                    port and invokes IBTRS callbacks for incoming RDMA
                    requests or link events.
 ibtrs_srv_close() - Closes IBTRS server context.

Signed-off-by: Roman Pen
Signed-off-by: Danil Kipnis
Cc: Jack Wang
---
 drivers/infiniband/ulp/ibtrs/ibtrs.h | 325 +++
 1 file changed, 325 insertions(+)
 create mode 100644 drivers/infiniband/ulp/ibtrs/ibtrs.h

diff --git a/drivers/infiniband/ulp/ibtrs/ibtrs.h b/drivers/infiniband/ulp/ibtrs/ibtrs.h
new file mode 100644
index ..24a1e18816d7
--- /dev/null
+++ b/drivers/infiniband/ulp/ibtrs/ibtrs.h
@@ -0,0 +1,325 @@
+/*
+ * InfiniBand Transport Layer
+ *
+ * Copyright (c) 2014 - 2017 ProfitBricks GmbH. All rights reserved.
+ * Authors: Fabian Holler
+ *          Jack Wang
+ *          Kleber Souza
+ *          Danil Kipnis
+ *          Roman Penyaev
+ *          Milind Dumbare
+ *
+ * Copyright (c) 2017 - 2018 ProfitBricks GmbH. All rights reserved.
+ * Authors: Danil Kipnis
+ *          Roman Penyaev
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef IBTRS_H
+#define IBTRS_H
+
+#include
+#include
+
+struct ibtrs_tag;
+struct ibtrs_clt;
+struct ibtrs_srv_ctx;
+struct ibtrs_srv;
+struct ibtrs_srv_op;
+
+/*
+ * Here goes IBTRS client API
+ */
+
+/**
+ * enum ibtrs_clt_link_ev - Events about connectivity state of a client
+ * @IBTRS_CLT_LINK_EV_RECONNECTED	Client was reconnected.
+ * @IBTRS_CLT_LINK_EV_DISCONNECTED	Client was disconnected.
+ */
+enum ibtrs_clt_link_ev {
+	IBTRS_CLT_LINK_EV_RECONNECTED,
+	IBTRS_CLT_LINK_EV_DISCONNECTED,
+};
+
+/**
+ * Source and destination address of a path to be established
+ */
+struct ibtrs_addr {
+	struct sockaddr_storage *src;
+	struct sockaddr_storage *dst;
+};
+
+typedef void (link_clt_ev_fn)(void *priv, enum ibtrs_clt_link_ev ev);
+/**
+ * ibtrs_clt_open() - Open a session to an IBTRS server
+ * @priv:		User supplied private data.
+ * @link_ev:		Event notification for connection state changes
+ *	@priv:		user supplied data that was passed to
+ *			ibtrs_clt_open()
+ *	@ev:		Occurred event
+ * @sessname: name of the session
+ * @paths: Paths to be established defined by their src and dst addresses
+ * @path_cnt: Number of elements in the @paths array
+ * @port: port to be used by the IBTRS session
+ * @pdu_sz: Size of extra payload which can be accessed after tag allocation.
+ * @max_inflight_msg: Max. number of parallel inflight messages for the session
+ * @max_segments: Max. number of segments per IO request
+ * @reconnect_delay_sec: time between reconnect tries
+ * @max_reconnect_attempts: Number of times to reconnect on error before giving
+ *			    up, 0 for disabled, -1 for forever
+ *
+ * Starts session establishment with the ibtrs_server. The function can block
+ * up to ~2000ms until it returns.
+ *
+ * Return a valid pointer on success otherwise PTR_ERR.
+ */
+struct ibtrs_clt *ibtrs_clt_open(void *priv, link_clt_ev_fn *link_ev,
+				 const char *sessname,
+				 const struct ibtrs_addr *paths,
+				 size_t path_cnt, short port,
+				 size_t pdu_sz, u8 reconnect_delay_sec,
+				 u16 max_segments,
+				 s16 max_reconnect_attempts);
+
+/**
+ * ibtrs_clt_close() - Close a session
+ * @sess: Session handler, is freed on return
+ */
+void ibtrs_clt_close(struct ibtrs_clt
[PATCH 1/1] blk-mq: reinit q->tag_set_list entry only after grace period
It is not allowed to reinit the q->tag_set_list list entry while the RCU
grace period has not completed yet, otherwise the following soft lockup in
blk_mq_sched_restart() happens:

[ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
[ 1064.254445] task: 99b912e8b900 task.stack: a6d54c758000
[ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
[ 1064.256510] Call Trace:
[ 1064.256664]
[ 1064.256824]  blk_mq_free_request+0xea/0x100
[ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
[ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
[ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
[ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
[ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
[ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
[ 1064.258007]  irq_poll_softirq+0xb7/0xe0
[ 1064.258165]  __do_softirq+0x106/0x2a2
[ 1064.258328]  irq_exit+0x92/0xa0
[ 1064.258509]  do_IRQ+0x4a/0xd0
[ 1064.258660]  common_interrupt+0x7a/0x7a
[ 1064.258818]

Meanwhile, another context frees another queue which uses the same set of
shared tags:

[ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
[ 1288.201833] bash  D  0  5910  5820 0x
[ 1288.202016] Call Trace:
[ 1288.202315]  schedule+0x32/0x80
[ 1288.202462]  schedule_timeout+0x1e5/0x380
[ 1288.203838]  wait_for_completion+0xb0/0x120
[ 1288.204137]  __wait_rcu_gp+0x125/0x160
[ 1288.204287]  synchronize_sched+0x6e/0x80
[ 1288.204770]  blk_mq_free_queue+0x74/0xe0
[ 1288.204922]  blk_cleanup_queue+0xc7/0x110
[ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
[ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
[ 1288.205548]  kernfs_fop_write+0x109/0x180
[ 1288.206328]  vfs_write+0xb3/0x1a0
[ 1288.206476]  SyS_write+0x52/0xc0
[ 1288.206624]  do_syscall_64+0x68/0x1d0
[ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

What happened is the following:

1. There are several MQ queues with shared tags.
2. One queue is about to be freed and its task is now in
   blk_mq_del_queue_tag_set().
3. Another CPU is in blk_mq_sched_restart() and loops over all queues in
   the tag list in order to find an hctx to restart.

Because the linked list entry was modified in blk_mq_del_queue_tag_set()
without properly waiting for a grace period, blk_mq_sched_restart() never
ends, spinning in list_for_each_entry_rcu_rr(), hence the soft lockup.

The fix is simple: reinit the list entry only after an RCU grace period has
elapsed.

Signed-off-by: Roman Pen
Cc: Jens Axboe
Cc: Bart Van Assche
Cc: Christoph Hellwig
Cc: Sagi Grimberg
Cc: Ming Lei
Cc: linux-block@vger.kernel.org
---
 block/blk-mq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0dc9e341c2a7..2a40d60950f4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2422,7 +2422,6 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 
 	mutex_lock(&set->tag_list_lock);
 	list_del_rcu(&q->tag_set_list);
-	INIT_LIST_HEAD(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_SHARED;
@@ -2430,8 +2429,8 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		blk_mq_update_tag_set_depth(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
-
 	synchronize_rcu();
+	INIT_LIST_HEAD(&q->tag_set_list);
 }
 
 static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
-- 
2.13.1
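
The failure mode described in the patch above can be modelled in userspace.
The following is only a simplified sketch, not kernel code: it uses a
hand-rolled circular list instead of <linux/list.h>, a single thread instead
of real RCU readers, and a reader that walks forward until it reaches the
list head rather than the round-robin list_for_each_entry_rcu_rr() macro.
All function names are hypothetical. It shows that an entry which was
unlinked but kept its next pointer still leads a reader back into the list,
while an entry that was reinitialized too early points to itself and traps
the reader.

```c
#include <stdbool.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make the entry point to itself, as INIT_LIST_HEAD() does. */
static void init_list_head(struct list_head *e)
{
	e->next = e;
	e->prev = e;
}

/* Append the entry at the tail of the circular list. */
static void list_add_tail_sketch(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/*
 * Unlink the entry but leave e->next untouched, which is the property of
 * list_del_rcu() that lets a reader standing on 'e' keep walking forward.
 */
static void list_del_keep_next(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

/*
 * Model of a reader standing on 'pos' and walking forward until it sees
 * 'stop'. Returns true if it gets there within max_steps, false if it is
 * still walking (i.e. it would spin in the real, unbounded loop).
 */
static bool reader_reaches(const struct list_head *pos,
			   const struct list_head *stop, int max_steps)
{
	for (int i = 0; i < max_steps; i++) {
		if (pos == stop)
			return true;
		pos = pos->next;
	}
	return false;
}

/* Returns true if the reader terminates, false if it is stuck. */
static bool simulate(bool reinit_before_grace_period)
{
	struct list_head head, q1, q2;

	init_list_head(&head);
	list_add_tail_sketch(&q1, &head);
	list_add_tail_sketch(&q2, &head);

	/* The reader is currently standing on q1. */
	const struct list_head *reader_pos = &q1;

	/* The writer removes q1 from the list. */
	list_del_keep_next(&q1);
	if (reinit_before_grace_period)
		init_list_head(&q1);	/* q1.next now points to q1 itself */

	return reader_reaches(reader_pos, &head, 16);
}
```

In this model simulate(false) lets the reader walk from q1 over q2 to the
head, while simulate(true) leaves it stepping from q1 to q1 until the step
limit is hit. Moving the reinit after the grace period, as the patch does,
corresponds to the first case: no reader can still be standing on the entry
by the time its pointers are overwritten.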