Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa() for sa-check
On Tue, Mar 31, 2015 at 08:51:13PM -0400, ira.weiny wrote:

> > Going forward we want to NAK stuff like this:
> >
> >   if (rdma_ib_mgmt() || rdma_opa_mgmt())
> >   if (has_sa() || has_opa_foobar())
>
> I'm confused.
>
> Is the idea that you would NAK has_sa but be in favor of has_ib_sa?

It is the || I don't want to see. Feature tests should be one thing.

> I believe it is reasonable to flag OPA MAD support as a feature set through a
> single bit. From this the MAD stack can know if OPA MAD base/class versions
> are allowed (or treated differently from future IB versions) and if it should
> process OPA SM Class DR MADs differently. I don't want devices to be
> required to supply a list of MAD Base/Class Versions they support.

That seems reasonable, but we'd gain an is_ib_smp and an is_opa_smp test to do that.

> For example, the MAD size is more generally (and perhaps better) communicated
> via an actual mad_size attribute rather than as part of the OPA MAD feature
> set.

Start with a bit and change when that makes no sense; MTUs as ints make sense.

> Another example is the question of whether it is appropriate to wrap up the
> single READ SGL entry support within the "is iwarp" flag?

Again, stick with a single bit test until a HW vendor needs more, then someone can clean it up. At least they will know what to clean and why.

Remember, this is in the kernel; we can change it when it outlives its life.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa() for sa-check
On Tue, Mar 31, 2015 at 05:12:02PM -0600, Jason Gunthorpe wrote:
> On Mon, Mar 30, 2015 at 01:02:03PM -0400, Doug Ledford wrote:
>
> > If we use something like this, then the above is all you need. Then
> > every place in the code that checks for something like has_sa or cap_sa
> > can be replaced with rdma_ib_mgmt.
>
> I don't want to see this slide back into some ill-defined feature
> test.
>
> We need to clearly define exactly what these tests are checking, and I
> think, since we have so much code sharing, the checks should be narrow
> in scope, not just an entire standard.

Somewhat agree.

> I think the basic family of checks Michael identified seemed like
> a reasonable start.
>
> Going forward we want to NAK stuff like this:
>
>   if (rdma_ib_mgmt() || rdma_opa_mgmt())
>   if (has_sa() || has_opa_foobar())

I'm confused.

Is the idea that you would NAK has_sa but be in favor of has_ib_sa? I think cap_opa_mad is reasonable, and this confuses me as to why has_opa_foobar would be NAKed.

I believe it is reasonable to flag OPA MAD support as a feature set through a single bit. From this the MAD stack can know if OPA MAD base/class versions are allowed (or treated differently from future IB versions) and if it should process OPA SM Class DR MADs differently. I don't want devices to be required to supply a list of MAD Base/Class Versions they support.

> That indicates we need a new micro feature test.

The million dollar question is how micro we should go. More specifically, which feature sets can be appropriately communicated with a single bit vs not.

For example, the MAD size is more generally (and perhaps better) communicated via an actual mad_size attribute rather than as part of the OPA MAD feature set.

Another example is the question of whether it is appropriate to wrap up the single READ SGL entry support within the "is iwarp" flag?

https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23611.html

Right now there is implicit information being gleaned from the "is iwarp" flag: that an iWarp device is limited to only 1 READ SGL entry. As this is the only device type which has this limitation _and_ that number is "well known", it works, even if it is not the best solution.

The task at hand is to clean up implicit information passing like this while not getting mired down in our own refactoring.

> A big problem here is people working on the core may not know the
> intricate details of all the families. This will only get worse when
> proprietary tech like OPA gets added. Documenting requirements via a
> narrow feature test gives a much higher chance that core stuff will be
> right via review.

Agreed, as long as we get the width right.

> > But, like I said, this is an all or nothing change, it isn't
> > something we can ease into.
>
> Well, we ease into it by introducing the micro tests and then wiping
> the legacy ones, followed by changing how the driver/core communicates
> the port and device feature set, ideally to a bitset like you've
> described.

I think this does need to be eased into. We seem to agree that the feature flags we have now are not granular and/or explicit enough. So let's start cleaning up some of these tests and see how it goes.

Ira

> Michael has tackled the core code, another series could work on the
> drivers..
>
> Jason
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for iwarp-check
On Mon, Mar 30, 2015 at 12:13:00PM -0400, Doug Ledford wrote:
> On Fri, 2015-03-27 at 16:47 +0100, Michael Wang wrote:
> > Introduce helper has_iwarp() to help us check if an IB device
> > supports the iWARP protocol.
>
> This is a needless redirection. Just stick with the original
> rdma_transport_is_iwarp().

Sticking with the original isn't really the point. The point here is to document what Tom was talking about: some ports can only support one RDMA READ SGL entry, even though they support multiple RDMA WRITE SGL entries. Today the query API assumes READ/WRITE/SEND are symmetrical.

has_rdma_read_sgl() is a good way to document that for now, and is a big flashing marker where something might need to be fixed in the future. It tells everyone reading the code and the API that when working with RDMA READ they need to be aware of this limitation.

Jason
Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa() for sa-check
On Mon, Mar 30, 2015 at 01:02:03PM -0400, Doug Ledford wrote:
> If we use something like this, then the above is all you need. Then
> every place in the code that checks for something like has_sa or cap_sa
> can be replaced with rdma_ib_mgmt.

I don't want to see this slide back into some ill-defined feature test.

We need to clearly define exactly what these tests are checking, and I think, since we have so much code sharing, the checks should be narrow in scope, not just an entire standard.

I think the basic family of checks Michael identified seemed like a reasonable start.

Going forward we want to NAK stuff like this:

  if (rdma_ib_mgmt() || rdma_opa_mgmt())
  if (has_sa() || has_opa_foobar())

That indicates we need a new micro feature test.

A big problem here is that people working on the core may not know the intricate details of all the families. This will only get worse when proprietary tech like OPA gets added. Documenting requirements via a narrow feature test gives a much higher chance that core stuff will be right via review.

> But, like I said, this is an all or nothing change, it isn't
> something we can ease into.

Well, we ease into it by introducing the micro tests and then wiping the legacy ones, followed by changing how the driver/core communicates the port and device feature set, ideally to a bitset like you've described.

Michael has tackled the core code; another series could work on the drivers..

Jason
RE: [PATCH for-next 0/9] mlx4 changes in virtual GID management
> I must disagree. You made this claim back when we submitted the IB
> core patch which adds support for signature, protection etc offloads
> (commit 1b01d33560e7 "IB/core: Introduce signature verbs API") and the
> detailed replies you got were explaining that

And I'll continue to make this claim because it is true.

> Very similar arguments would apply to the ODP submission.

Yes - ODP will _have_ to change the core. USNIC had to change the core, and had changes rejected that made perfect sense conceptually. This is part of the problem!!! If you disagree, then submit patches that don't touch the core.

> So examples please!

Sure - this is from Somnath's latest patch series:

Matan Barak (14):
  IB/core: Add RoCE GID cache
  IB/core: Add kref to IB devices
  IB/core: Add RoCE GID population
  IB/core: Add default GID for RoCE GID Cache
  net/bonding: make DRV macros private
  net: Add info for NETDEV_CHANGEUPPER event
  IB/core: Add RoCE cache bonding support
  IB/core: GID attribute should be returned from verbs API and cache API
  IB/core: Report gid_type and gid_ndev through sysfs
  IB/core: Support find sgid index using a filter function
  IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
  IB/core: Add gid_type to path and rdma_id_private
  IB/core: Add rdma_network_type to wc
  IB/cma: Add configfs for rdma_cm

Moni Shoua (13):
  IB/mlx4: Remove gid table management for RoCE
  IB/mlx4: Replace spin_lock with rw_semaphore
  IB/mlx4: Lock with RCU instead of RTNL
  net/mlx4: Postpone the registration of net_device
  IB/mlx4: Advertise RoCE support in port capabilities
  IB/mlx4: Implement ib_device callback - get_netdev
  IB/mlx4: Implement ib_device callback - modify_gid
  IB/mlx4: Configure device to work in RoCEv2
  IB/mlx4: Translate cache gid index to real index
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/cma: Join and leave multicast groups with IGMP

That's a significant number of patches that modify the core rdma layer. NFSoRDMA had 5 different ways to register memory.

I agree that Roland's response time is ridiculously slow. But he does tend to merge in new drivers and updates that only touch a single vendor's driver fairly quickly. It's the thrashing on the core that sees significant delays.
Re: [PATCH 0/9] IB/ipoib: fixup multicast locking issues
On Tue, Mar 31, 2015 at 8:04 PM, ira.weiny wrote:
> On Sun, Mar 15, 2015 at 11:52:44AM -0700, Doug Ledford wrote:
>>
>> > On Mar 13, 2015, at 1:41 AM, Or Gerlitz wrote:
>> >
>> > On Sun, Feb 22, 2015 at 2:26 AM, Doug Ledford wrote:
>> >> This is the re-ordered, squashed version of my 22 patch set that I
>> >> posted on Feb 11. There are a few minor differences between that
>> >> set and this one. They are:
>> > [...]
>> >
>> > Doug, you wrote here a very detailed listing of the changes from
>> > earlier posts and the testing the patches went through, which is
>> > excellent. It would be very good if you could also post a few lines
>> > telling the changes done by the series at a high level, so we can have
>> > this text as part of a "merge" commit that stays in the kernel logs.
>>
>> OK. I would take what I had in the original message and expand upon it then:
>>
>> This entire patchset was intended to address the issue of ipoib
>> interfaces being brought up/down in a tight loop, which will hardlock
>> a standard v3.19 kernel. It succeeds at resolving that problem.
>
> I pulled this series and did some medium weight testing on 3.19 (module
> reloads, insmod/rmmod, opensm restarts (client re-register)). IPoIB
> recovered without issue on each of the tests.
>
> Tested-by: Ira Weiny

Yep, here too. We tested upstream + this series and it works well.
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
On Tue, Mar 31, 2015 at 8:27 PM, Hefty, Sean wrote:
> I honestly think part of the issue here is the general approach being taken
> by the rdma stack, which is to expose every hardware-specific component that
> a vendor might define through the interfaces up to the consumers.

Sean, I must disagree. You made this claim back when we submitted the IB core patch which adds support for signature, protection etc offloads (commit 1b01d33560e7 "IB/core: Introduce signature verbs API") and the detailed replies you got were explaining that:

1. this allows offloading the SCSI T10 signature for RDMA block storage protocols such as SRP and iSER
2. SCSI, SRP and iSER are all industry standards
3. T10 offload is done also by competing technologies such as Fibre-Channel
4. nothing, but exactly nothing, in the verbs API is tied to a specific HCA implementation
5. etc

Very similar arguments would apply to the ODP submission.

So examples please!

Or.

> This results in unwieldy interfaces that are under constant churn, with no
> clear direction. If most submissions to the rdma stack were contained to the
> lower-level drivers, I don't know that we would have the issues that we do.
> But a significant percentage of rdma patches modify the rdma core software
> layer, which threatens the stability of the entire stack.
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
On Tue, Mar 31, 2015 at 12:13:58PM +0300, Sagi Grimberg wrote:
> We had a talk about a similar topic at LSFMM15. The Linux-scsi subsystem
> is a large scale subsystem and also has the "single maintainer with
> limited time for upstream maintenance" bottleneck.
>
> Christoph created the "scsi-queue" tree to feed James in order to
> accelerate the work process.

[..]

> One problem with that is that we need "a Christoph" to poke people for
> review, since not a lot of people are giving review.

I have tried for a few years to get enough interested/qualified people together to do this kind of system, but it just hasn't happened.

It feels like we are stuck in a loop: people and companies *do not* want to work on drivers/infiniband because too often patches just die. Asking people to work on it, and companies to fund work, gets met with the above.

I think there is broad agreement that we need to change the system here. So, let's hear from people and companies: put your name forward and consistently provide time to do serious review work on all patches.

Jason
RE: [PATCH for-next 0/9] mlx4 changes in virtual GID management
> It doesn't take a lot of time to do this work, especially if you use
> the correct tools. I can review and apply 100 patches a day, even
> when I'm on vacation. I'm an extreme example, but what you're doing
> right now, Roland, is not acceptable and is not in agreement with your
> claims about how much you care about this subsystem.

I honestly think part of the issue here is the general approach being taken by the rdma stack, which is to expose every hardware-specific component that a vendor might define through the interfaces up to the consumers. This results in unwieldy interfaces that are under constant churn, with no clear direction.

If most submissions to the rdma stack were contained to the lower-level drivers, I don't know that we would have the issues that we do. But a significant percentage of rdma patches modify the rdma core software layer, which threatens the stability of the entire stack.

- Sean
Re: [PATCH 0/9] IB/ipoib: fixup multicast locking issues
On Sun, Mar 15, 2015 at 11:52:44AM -0700, Doug Ledford wrote:
>
> > On Mar 13, 2015, at 1:41 AM, Or Gerlitz wrote:
> >
> > On Sun, Feb 22, 2015 at 2:26 AM, Doug Ledford wrote:
> >> This is the re-ordered, squashed version of my 22 patch set that I
> >> posted on Feb 11. There are a few minor differences between that
> >> set and this one. They are:
> > [...]
> >
> > Doug, you wrote here a very detailed listing of the changes from
> > earlier posts and the testing the patches went through, which is
> > excellent. It would be very good if you could also post a few lines
> > telling the changes done by the series at a high level, so we can have
> > this text as part of a "merge" commit that stays in the kernel logs.
>
> OK. I would take what I had in the original message and expand upon it then:
>
> This entire patchset was intended to address the issue of ipoib
> interfaces being brought up/down in a tight loop, which will hardlock
> a standard v3.19 kernel. It succeeds at resolving that problem.

I pulled this series and did some medium weight testing on 3.19 (module reloads, insmod/rmmod, opensm restarts (client re-register)). IPoIB recovered without issue on each of the tests.

Tested-by: Ira Weiny

> In order to accomplish this goal, it reworks how the IPOIB_MCAST_FLAG_BUSY
> flag is used. Conceptually, that flag used to be set when we started a
> multicast join, and would stay set once the join was complete. This left no
> way to tell if the multicast join was complete or still in flight. This
> allowed race conditions to develop between joining multicast groups and
> taking an interface down. A previous attempt to resolve these race
> conditions used the flag IPOIB_MCAST_JOIN_STARTED, but did not succeed at
> fully resolving the race conditions. This patchset resolves this issue, plus
> a number of related issues discovered while working on this issue.
>
> The primary fix itself is patch 6/9, and a more complete description of the
> changes to how the IPOIB_MCAST_FLAG_BUSY flag is now used can be found in
> that commit log.
>
> —
> Doug Ledford
> GPG Key ID: 0E572FDD
[PATCH v2 13/18] IB/iser: Make fastreg pool cache friendly
Memory regions are resources that are saved in the device caches. Increase the probability for a cache hit by adding the MRU descriptor to the pool head.

Signed-off-by: Sagi Grimberg
---
 drivers/infiniband/ulp/iser/iser_memory.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 17a5d70..45f5120 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -63,7 +63,7 @@ iser_reg_desc_put(struct ib_conn *ib_conn,
 	unsigned long flags;
 
 	spin_lock_irqsave(&ib_conn->lock, flags);
-	list_add_tail(&desc->list, &ib_conn->fastreg.pool);
+	list_add(&desc->list, &ib_conn->fastreg.pool);
 	spin_unlock_irqrestore(&ib_conn->lock, flags);
 }
-- 
1.7.1
[PATCH v2 07/18] IB/iser: Move memory reg/dereg routines to iser_memory.c
As memory registration/de-registration methods, let's move them to their natural location. While we're at it, make the iser_reg_page_vec routine static.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg
---
 drivers/infiniband/ulp/iser/iscsi_iser.h  |  4 -
 drivers/infiniband/ulp/iser/iser_memory.c | 88 +
 drivers/infiniband/ulp/iser/iser_verbs.c  | 87 
 3 files changed, 88 insertions(+), 91 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 5c7036c..d5e5288 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -632,10 +632,6 @@ int iser_connect(struct iser_conn *iser_conn,
 		 struct sockaddr *dst_addr,
 		 int non_blocking);
 
-int iser_reg_page_vec(struct ib_conn *ib_conn,
-		      struct iser_page_vec *page_vec,
-		      struct iser_mem_reg *mem_reg);
-
 void iser_unreg_mem_fmr(struct iscsi_iser_task *iser_task,
 			enum iser_data_dir cmd_dir);
 void iser_unreg_mem_fastreg(struct iscsi_iser_task *iser_task,
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 9c60ff1..4e0cbbb 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -362,6 +362,94 @@ static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task,
 }
 
 /**
+ * iser_reg_page_vec - Register physical memory
+ *
+ * returns: 0 on success, errno code on failure
+ */
+static
+int iser_reg_page_vec(struct ib_conn *ib_conn,
+		      struct iser_page_vec *page_vec,
+		      struct iser_mem_reg *mem_reg)
+{
+	struct ib_pool_fmr *mem;
+	u64 io_addr;
+	u64 *page_list;
+	int status;
+
+	page_list = page_vec->pages;
+	io_addr = page_list[0];
+
+	mem = ib_fmr_pool_map_phys(ib_conn->fmr.pool,
+				   page_list,
+				   page_vec->length,
+				   io_addr);
+	if (IS_ERR(mem)) {
+		status = (int)PTR_ERR(mem);
+		iser_err("ib_fmr_pool_map_phys failed: %d\n", status);
+		return status;
+	}
+
+	mem_reg->lkey = mem->fmr->lkey;
+	mem_reg->rkey = mem->fmr->rkey;
+	mem_reg->len = page_vec->length * SIZE_4K;
+	mem_reg->va = io_addr;
+	mem_reg->mem_h = (void *)mem;
+
+	mem_reg->va += page_vec->offset;
+	mem_reg->len = page_vec->data_size;
+
+	iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, "
+		 "entry[0]: (0x%08lx,%ld)] -> "
+		 "[lkey: 0x%08X mem_h: 0x%p va: 0x%08lX sz: %ld]\n",
+		 page_vec, page_vec->length,
+		 (unsigned long)page_vec->pages[0],
+		 (unsigned long)page_vec->data_size,
+		 (unsigned int)mem_reg->lkey, mem_reg->mem_h,
+		 (unsigned long)mem_reg->va, (unsigned long)mem_reg->len);
+	return 0;
+}
+
+/**
+ * Unregister (previously registered using FMR) memory.
+ * If memory is non-FMR does nothing.
+ */
+void iser_unreg_mem_fmr(struct iscsi_iser_task *iser_task,
+			enum iser_data_dir cmd_dir)
+{
+	struct iser_mem_reg *reg = &iser_task->rdma_regd[cmd_dir].reg;
+	int ret;
+
+	if (!reg->mem_h)
+		return;
+
+	iser_dbg("PHYSICAL Mem.Unregister mem_h %p\n", reg->mem_h);
+
+	ret = ib_fmr_pool_unmap((struct ib_pool_fmr *)reg->mem_h);
+	if (ret)
+		iser_err("ib_fmr_pool_unmap failed %d\n", ret);
+
+	reg->mem_h = NULL;
+}
+
+void iser_unreg_mem_fastreg(struct iscsi_iser_task *iser_task,
+			    enum iser_data_dir cmd_dir)
+{
+	struct iser_mem_reg *reg = &iser_task->rdma_regd[cmd_dir].reg;
+	struct iser_conn *iser_conn = iser_task->iser_conn;
+	struct ib_conn *ib_conn = &iser_conn->ib_conn;
+	struct fast_reg_descriptor *desc = reg->mem_h;
+
+	if (!desc)
+		return;
+
+	reg->mem_h = NULL;
+	spin_lock_bh(&ib_conn->lock);
+	list_add_tail(&desc->list, &ib_conn->fastreg.pool);
+	spin_unlock_bh(&ib_conn->lock);
+}
+
+/**
  * iser_reg_rdma_mem_fmr - Registers memory intended for RDMA,
  * using FMR (if possible) obtaining rkey and va
  *
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 7ee4926..986b5f4 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -992,93 +992,6 @@ connect_failure:
 	return err;
 }
 
-/**
- * iser_reg_page_vec - Register physical memory
- *
- * returns: 0 on success, errno code on failure
- */
-int iser_reg_page_vec(struct ib_conn *ib_conn,
-		      struct
[PATCH v2 08/18] IB/iser: Remove redundant assignments in iser_reg_page_vec
Buffer length was assigned twice, and there is no reason to set va to io_addr and then add the offset; just set va to io_addr + offset.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg
Signed-off-by: Adir Lev
---
 drivers/infiniband/ulp/iser/iser_memory.c | 7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 4e0cbbb..cb30865 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -392,12 +392,9 @@ int iser_reg_page_vec(struct ib_conn *ib_conn,
 
 	mem_reg->lkey = mem->fmr->lkey;
 	mem_reg->rkey = mem->fmr->rkey;
-	mem_reg->len = page_vec->length * SIZE_4K;
-	mem_reg->va = io_addr;
-	mem_reg->mem_h = (void *)mem;
-
-	mem_reg->va += page_vec->offset;
 	mem_reg->len = page_vec->data_size;
+	mem_reg->va = io_addr + page_vec->offset;
+	mem_reg->mem_h = (void *)mem;
 
 	iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, "
 		 "entry[0]: (0x%08lx,%ld)] -> "
-- 
1.7.1
[PATCH v2 11/18] IB/iser: Move fastreg descriptor pool get/put to helper functions
Instead of open-coding connection fastreg pool get/put, we introduce iser_reg_desc_[get|put] helpers. We aren't making these static as this will be a per-device routine later on. Also, clean up iser_unreg_mem_fastreg a bit.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg
Signed-off-by: Adir Lev
---
 drivers/infiniband/ulp/iser/iscsi_iser.h  |  5 +++
 drivers/infiniband/ulp/iser/iser_memory.c | 50 ++--
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 8133678..185d2ec 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -644,4 +644,9 @@ int iser_create_fastreg_pool(struct ib_conn *ib_conn, unsigned cmds_max);
 void iser_free_fastreg_pool(struct ib_conn *ib_conn);
 u8 iser_check_task_pi_status(struct iscsi_iser_task *iser_task,
 			     enum iser_data_dir cmd_dir, sector_t *sector);
+struct fast_reg_descriptor *
+iser_reg_desc_get(struct ib_conn *ib_conn);
+void
+iser_reg_desc_put(struct ib_conn *ib_conn,
+		  struct fast_reg_descriptor *desc);
 #endif
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 6e6b753..17a5d70 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -41,6 +41,32 @@
 
 #define ISER_KMALLOC_THRESHOLD 0x20000 /* 128K - kmalloc limit */
 
+struct fast_reg_descriptor *
+iser_reg_desc_get(struct ib_conn *ib_conn)
+{
+	struct fast_reg_descriptor *desc;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ib_conn->lock, flags);
+	desc = list_first_entry(&ib_conn->fastreg.pool,
+				struct fast_reg_descriptor, list);
+	list_del(&desc->list);
+	spin_unlock_irqrestore(&ib_conn->lock, flags);
+
+	return desc;
+}
+
+void
+iser_reg_desc_put(struct ib_conn *ib_conn,
+		  struct fast_reg_descriptor *desc)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ib_conn->lock, flags);
+	list_add_tail(&desc->list, &ib_conn->fastreg.pool);
+	spin_unlock_irqrestore(&ib_conn->lock, flags);
+}
+
 /**
  * iser_start_rdma_unaligned_sg
  */
@@ -409,17 +435,13 @@ void iser_unreg_mem_fastreg(struct iscsi_iser_task *iser_task,
 			    enum iser_data_dir cmd_dir)
 {
 	struct iser_mem_reg *reg = &iser_task->rdma_reg[cmd_dir];
-	struct iser_conn *iser_conn = iser_task->iser_conn;
-	struct ib_conn *ib_conn = &iser_conn->ib_conn;
-	struct fast_reg_descriptor *desc = reg->mem_h;
 
-	if (!desc)
+	if (!reg->mem_h)
 		return;
 
+	iser_reg_desc_put(&iser_task->iser_conn->ib_conn,
+			  reg->mem_h);
 	reg->mem_h = NULL;
-	spin_lock_bh(&ib_conn->lock);
-	list_add_tail(&desc->list, &ib_conn->fastreg.pool);
-	spin_unlock_bh(&ib_conn->lock);
 }
 
 /**
@@ -725,7 +747,6 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *iser_task,
 	struct fast_reg_descriptor *desc = NULL;
 	struct ib_sge data_sge;
 	int err, aligned_len;
-	unsigned long flags;
 
 	aligned_len = iser_data_buf_aligned_len(mem, ibdev);
 	if (aligned_len != mem->dma_nents) {
@@ -739,11 +760,7 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *iser_task,
 
 	if (mem->dma_nents != 1 ||
 	    scsi_get_prot_op(iser_task->sc) != SCSI_PROT_NORMAL) {
-		spin_lock_irqsave(&ib_conn->lock, flags);
-		desc = list_first_entry(&ib_conn->fastreg.pool,
-					struct fast_reg_descriptor, list);
-		list_del(&desc->list);
-		spin_unlock_irqrestore(&ib_conn->lock, flags);
+		desc = iser_reg_desc_get(ib_conn);
 		mem_reg->mem_h = desc;
 	}
 
@@ -799,11 +816,8 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *iser_task,
 	return 0;
 err_reg:
-	if (desc) {
-		spin_lock_irqsave(&ib_conn->lock, flags);
-		list_add_tail(&desc->list, &ib_conn->fastreg.pool);
-		spin_unlock_irqrestore(&ib_conn->lock, flags);
-	}
+	if (desc)
+		iser_reg_desc_put(ib_conn, desc);
 
 	return err;
 }
-- 
1.7.1
[PATCH v2 09/18] IB/iser: Get rid of struct iser_rdma_regd
This struct's members, other than struct iser_mem_reg, are unused, so remove it altogether.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg
Signed-off-by: Adir Lev
---
 drivers/infiniband/ulp/iser/iscsi_iser.h     | 21 +-
 drivers/infiniband/ulp/iser/iser_initiator.c | 44 ++--
 drivers/infiniband/ulp/iser/iser_memory.c    | 56 +-
 drivers/infiniband/ulp/iser/iser_verbs.c     |  2 +-
 4 files changed, 53 insertions(+), 70 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index d5e5288..8133678 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -261,23 +261,6 @@ struct iser_mem_reg {
 	void *mem_h;
 };
 
-/**
- * struct iser_regd_buf - iSER buffer registration desc
- *
- * @reg:       memory registration info
- * @virt_addr: virtual address of buffer
- * @device:    reference to iser device
- * @direction: dma direction (for dma_unmap)
- * @data_size: data buffer size in bytes
- */
-struct iser_regd_buf {
-	struct iser_mem_reg reg;
-	void *virt_addr;
-	struct iser_device *device;
-	enum dma_data_direction direction;
-	unsigned int data_size;
-};
-
 enum iser_desc_type {
 	ISCSI_TX_CONTROL,
 	ISCSI_TX_SCSI_COMMAND,
@@ -537,7 +520,7 @@ struct iser_conn {
  * @sc:           link to scsi command
  * @command_sent: indicate if command was sent
  * @dir:          iser data direction
- * @rdma_regd:    task rdma registration desc
+ * @rdma_reg:     task rdma registration desc
  * @data:         iser data buffer desc
  * @prot:         iser protection buffer desc
  */
@@ -548,7 +531,7 @@ struct iscsi_iser_task {
 	struct scsi_cmnd *sc;
 	int command_sent;
 	int dir[ISER_DIRS_NUM];
-	struct iser_regd_buf rdma_regd[ISER_DIRS_NUM];
+	struct iser_mem_reg rdma_reg[ISER_DIRS_NUM];
 	struct iser_data_buf data[ISER_DIRS_NUM];
 	struct iser_data_buf prot[ISER_DIRS_NUM];
 };
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index 0e414db..420a613 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -50,7 +50,7 @@ static int iser_prepare_read_cmd(struct iscsi_task *task)
 {
 	struct iscsi_iser_task *iser_task = task->dd_data;
 	struct iser_device *device = iser_task->iser_conn->ib_conn.device;
-	struct iser_regd_buf *regd_buf;
+	struct iser_mem_reg *mem_reg;
 	int err;
 	struct iser_hdr *hdr = &iser_task->desc.iser_header;
 	struct iser_data_buf *buf_in = &iser_task->data[ISER_DIR_IN];
@@ -78,15 +78,15 @@ static int iser_prepare_read_cmd(struct iscsi_task *task)
 		iser_err("Failed to set up Data-IN RDMA\n");
 		return err;
 	}
-	regd_buf = &iser_task->rdma_regd[ISER_DIR_IN];
+	mem_reg = &iser_task->rdma_reg[ISER_DIR_IN];
 
 	hdr->flags |= ISER_RSV;
-	hdr->read_stag = cpu_to_be32(regd_buf->reg.rkey);
-	hdr->read_va = cpu_to_be64(regd_buf->reg.va);
+	hdr->read_stag = cpu_to_be32(mem_reg->rkey);
+	hdr->read_va = cpu_to_be64(mem_reg->va);
 
 	iser_dbg("Cmd itt:%d READ tags RKEY:%#.4X VA:%#llX\n",
-		 task->itt, regd_buf->reg.rkey,
-		 (unsigned long long)regd_buf->reg.va);
+		 task->itt, mem_reg->rkey,
+		 (unsigned long long)mem_reg->va);
 
 	return 0;
 }
@@ -104,7 +104,7 @@ iser_prepare_write_cmd(struct iscsi_task *task,
 {
 	struct iscsi_iser_task *iser_task = task->dd_data;
 	struct iser_device *device = iser_task->iser_conn->ib_conn.device;
-	struct iser_regd_buf *regd_buf;
+	struct iser_mem_reg *mem_reg;
 	int err;
 	struct iser_hdr *hdr = &iser_task->desc.iser_header;
 	struct iser_data_buf *buf_out = &iser_task->data[ISER_DIR_OUT];
@@ -134,25 +134,25 @@ iser_prepare_write_cmd(struct iscsi_task *task,
 		return err;
 	}
 
-	regd_buf = &iser_task->rdma_regd[ISER_DIR_OUT];
+	mem_reg = &iser_task->rdma_reg[ISER_DIR_OUT];
 
 	if (unsol_sz < edtl) {
 		hdr->flags |= ISER_WSV;
-		hdr->write_stag = cpu_to_be32(regd_buf->reg.rkey);
-		hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz);
+		hdr->write_stag = cpu_to_be32(mem_reg->rkey);
+		hdr->write_va = cpu_to_be64(mem_reg->va + unsol_sz);
 
 		iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X "
 			 "VA:%#llX + unsol:%d\n",
-			 task->itt, regd_buf->reg.rkey,
-			 (unsigned long long)regd_buf->reg.va, un
[PATCH v2 01/18] IB/iser: Fix unload during ep_poll wrong dereference
In case the user unloaded ib_iser while ep_connect is in progress, we need to destroy the endpoint although ep_disconnect wasn't invoked (we detect this by the iser conn state != DOWN). However, if we get a REJECTED/UNREACHABLE CM event we move the connection state to DOWN, which will prevent us from destroying the endpoint in the module unload stage. Fix this by setting the connection state to TERMINATING in iser_conn_error so we can still destroy the endpoint at the unload stage. Reported-by: Ariel Nahum Signed-off-by: Sagi Grimberg --- drivers/infiniband/ulp/iser/iser_verbs.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 4065abe..070c5af 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -721,7 +721,7 @@ static void iser_connect_error(struct rdma_cm_id *cma_id) struct iser_conn *iser_conn; iser_conn = (struct iser_conn *)cma_id->context; - iser_conn->state = ISER_CONN_DOWN; + iser_conn->state = ISER_CONN_TERMINATING; } /** -- 1.7.1
[PATCH v2 12/18] IB/iser: Move PI context alloc/free to routines
Make iser_[create|destroy]_fastreg_desc shorter, more readable and easily extendable. This patch does not change any functionality. Signed-off-by: Sagi Grimberg Signed-off-by: Adir Lev --- drivers/infiniband/ulp/iser/iser_verbs.c | 118 -- 1 files changed, 63 insertions(+), 55 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 20eec09..cc2dd35 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -274,6 +274,65 @@ void iser_free_fmr_pool(struct ib_conn *ib_conn) } static int +iser_alloc_pi_ctx(struct ib_device *ib_device, struct ib_pd *pd, + struct fast_reg_descriptor *desc) +{ + struct iser_pi_context *pi_ctx = NULL; + struct ib_mr_init_attr mr_init_attr = {.max_reg_descriptors = 2, + .flags = IB_MR_SIGNATURE_EN}; + int ret = 0; + + desc->pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL); + if (!desc->pi_ctx) + return -ENOMEM; + + pi_ctx = desc->pi_ctx; + + pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(ib_device, + ISCSI_ISER_SG_TABLESIZE); + if (IS_ERR(pi_ctx->prot_frpl)) { + ret = PTR_ERR(pi_ctx->prot_frpl); + goto prot_frpl_failure; + } + + pi_ctx->prot_mr = ib_alloc_fast_reg_mr(pd, + ISCSI_ISER_SG_TABLESIZE + 1); + if (IS_ERR(pi_ctx->prot_mr)) { + ret = PTR_ERR(pi_ctx->prot_mr); + goto prot_mr_failure; + } + desc->reg_indicators |= ISER_PROT_KEY_VALID; + + pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr); + if (IS_ERR(pi_ctx->sig_mr)) { + ret = PTR_ERR(pi_ctx->sig_mr); + goto sig_mr_failure; + } + desc->reg_indicators |= ISER_SIG_KEY_VALID; + desc->reg_indicators &= ~ISER_FASTREG_PROTECTED; + + return 0; + +sig_mr_failure: + ib_dereg_mr(desc->pi_ctx->prot_mr); +prot_mr_failure: + ib_free_fast_reg_page_list(desc->pi_ctx->prot_frpl); +prot_frpl_failure: + kfree(desc->pi_ctx); + + return ret; +} + +static void +iser_free_pi_ctx(struct iser_pi_context *pi_ctx) +{ + ib_free_fast_reg_page_list(pi_ctx->prot_frpl); + ib_dereg_mr(pi_ctx->prot_mr); + 
ib_destroy_mr(pi_ctx->sig_mr); + kfree(pi_ctx); +} + +static int iser_create_fastreg_desc(struct ib_device *ib_device, struct ib_pd *pd, bool pi_enable, struct fast_reg_descriptor *desc) { @@ -297,59 +356,12 @@ iser_create_fastreg_desc(struct ib_device *ib_device, struct ib_pd *pd, desc->reg_indicators |= ISER_DATA_KEY_VALID; if (pi_enable) { - struct ib_mr_init_attr mr_init_attr = {0}; - struct iser_pi_context *pi_ctx = NULL; - - desc->pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL); - if (!desc->pi_ctx) { - iser_err("Failed to allocate pi context\n"); - ret = -ENOMEM; + ret = iser_alloc_pi_ctx(ib_device, pd, desc); + if (ret) goto pi_ctx_alloc_failure; - } - pi_ctx = desc->pi_ctx; - - pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(ib_device, - ISCSI_ISER_SG_TABLESIZE); - if (IS_ERR(pi_ctx->prot_frpl)) { - ret = PTR_ERR(pi_ctx->prot_frpl); - iser_err("Failed to allocate prot frpl ret=%d\n", -ret); - goto prot_frpl_failure; - } - - pi_ctx->prot_mr = ib_alloc_fast_reg_mr(pd, - ISCSI_ISER_SG_TABLESIZE + 1); - if (IS_ERR(pi_ctx->prot_mr)) { - ret = PTR_ERR(pi_ctx->prot_mr); - iser_err("Failed to allocate prot frmr ret=%d\n", -ret); - goto prot_mr_failure; - } - desc->reg_indicators |= ISER_PROT_KEY_VALID; - - mr_init_attr.max_reg_descriptors = 2; - mr_init_attr.flags |= IB_MR_SIGNATURE_EN; - pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr); - if (IS_ERR(pi_ctx->sig_mr)) { - ret = PTR_ERR(pi_ctx->sig_mr); - iser_err("Failed to allocate signature enabled mr err=%d\n", -ret); - goto sig_mr_failure; - } - desc->reg_indicators |= ISER_SIG_KEY_VALID; } - desc->reg_indicators &= ~ISER_FASTREG_PROTECTED; - - iser_dbg("Create fr_desc %p page_list %p\n", -desc, desc->data_frpl->page_list); retur
[PATCH v2 16/18] IB/iser: Remove code duplication for a single DMA entry
In singleton scatterlists, DMA memory registration code is taken both for Fastreg and FMR code paths. Move it to a function. This patch does not change any functionality. Signed-off-by: Sagi Grimberg Signed-off-by: Adir Lev --- drivers/infiniband/ulp/iser/iser_memory.c | 48 1 files changed, 21 insertions(+), 27 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index b4b0098..e78ce53 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -334,6 +334,24 @@ void iser_dma_unmap_task_data(struct iscsi_iser_task *iser_task, ib_dma_unmap_sg(dev, data->sg, data->size, dir); } +static int +iser_reg_dma(struct iser_device *device, struct iser_data_buf *mem, +struct iser_mem_reg *reg) +{ + struct scatterlist *sg = mem->sg; + + reg->sge.lkey = device->mr->lkey; + reg->rkey = device->mr->rkey; + reg->sge.addr = ib_sg_dma_address(device->ib_device, &sg[0]); + reg->sge.length = ib_sg_dma_len(device->ib_device, &sg[0]); + + iser_dbg("Single DMA entry: lkey=0x%x, rkey=0x%x, addr=0x%llx," +" length=0x%x\n", reg->sge.lkey, reg->rkey, +reg->sge.addr, reg->sge.length); + + return 0; +} + static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task, struct iser_data_buf *mem, enum iser_data_dir cmd_dir, @@ -461,7 +479,6 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task, int aligned_len; int err; int i; - struct scatterlist *sg; mem_reg = &iser_task->rdma_reg[cmd_dir]; @@ -477,19 +494,7 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task, /* if there a single dma entry, FMR is not needed */ if (mem->dma_nents == 1) { - sg = mem->sg; - - mem_reg->sge.lkey = device->mr->lkey; - mem_reg->rkey = device->mr->rkey; - mem_reg->sge.length = ib_sg_dma_len(ibdev, &sg[0]); - mem_reg->sge.addr = ib_sg_dma_address(ibdev, &sg[0]); - - iser_dbg("PHYSICAL Mem.register: lkey: 0x%08X rkey: 0x%08X " -"va: 0x%08lX sz: %ld]\n", -(unsigned int)mem_reg->sge.lkey, 
-(unsigned int)mem_reg->rkey, -(unsigned long)mem_reg->sge.addr, -(unsigned long)mem_reg->sge.length); + return iser_reg_dma(device, mem, mem_reg); } else { /* use FMR for multiple dma entries */ err = iser_reg_page_vec(iser_task, mem, ib_conn->fmr.page_vec, mem_reg); @@ -660,7 +665,6 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task, struct fast_reg_descriptor *desc = mem_reg->mem_h; struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn; struct iser_device *device = ib_conn->device; - struct ib_device *ibdev = device->ib_device; struct ib_mr *mr; struct ib_fast_reg_page_list *frpl; struct ib_send_wr fastreg_wr, inv_wr; @@ -668,18 +672,8 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task, int ret, offset, size, plen; /* if there a single dma entry, dma mr suffices */ - if (mem->dma_nents == 1) { - struct scatterlist *sg = mem->sg; - - reg->sge.lkey = device->mr->lkey; - reg->rkey = device->mr->rkey; - reg->sge.addr = ib_sg_dma_address(ibdev, &sg[0]); - reg->sge.length = ib_sg_dma_len(ibdev, &sg[0]); - - iser_dbg("Single DMA entry: lkey=0x%x, addr=0x%llx, length=0x%x\n", -reg->sge.lkey, reg->sge.addr, reg->sge.length); - return 0; - } + if (mem->dma_nents == 1) + return iser_reg_dma(device, mem, mem_reg); if (ind == ISER_DATA_KEY_VALID) { mr = desc->data_mr; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 15/18] IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr
Instead of passing ib_sge as output variable, we pass the mem_reg pointer to have the routines fill the rkey as well. This reduces code duplication and extra assignments. This is a preparation step to unify some registration logics together. This patch does not change any functionality. Signed-off-by: Sagi Grimberg Signed-off-by: Adir Lev --- drivers/infiniband/ulp/iser/iser_memory.c | 68 1 files changed, 29 insertions(+), 39 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 40d22d5..b4b0098 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -590,8 +590,10 @@ iser_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr) static int iser_reg_sig_mr(struct iscsi_iser_task *iser_task, - struct fast_reg_descriptor *desc, struct ib_sge *data_sge, - struct ib_sge *prot_sge, struct ib_sge *sig_sge) + struct fast_reg_descriptor *desc, + struct iser_mem_reg *data_reg, + struct iser_mem_reg *prot_reg, + struct iser_mem_reg *sig_reg) { struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn; struct iser_pi_context *pi_ctx = desc->pi_ctx; @@ -615,12 +617,12 @@ iser_reg_sig_mr(struct iscsi_iser_task *iser_task, memset(&sig_wr, 0, sizeof(sig_wr)); sig_wr.opcode = IB_WR_REG_SIG_MR; sig_wr.wr_id = ISER_FASTREG_LI_WRID; - sig_wr.sg_list = data_sge; + sig_wr.sg_list = &data_reg->sge; sig_wr.num_sge = 1; sig_wr.wr.sig_handover.sig_attrs = &sig_attrs; sig_wr.wr.sig_handover.sig_mr = pi_ctx->sig_mr; if (scsi_prot_sg_count(iser_task->sc)) - sig_wr.wr.sig_handover.prot = prot_sge; + sig_wr.wr.sig_handover.prot = &prot_reg->sge; sig_wr.wr.sig_handover.access_flags = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_WRITE; @@ -637,13 +639,14 @@ iser_reg_sig_mr(struct iscsi_iser_task *iser_task, } desc->reg_indicators &= ~ISER_SIG_KEY_VALID; - sig_sge->lkey = pi_ctx->sig_mr->lkey; - sig_sge->addr = 0; - sig_sge->length = scsi_transfer_length(iser_task->sc); 
+ sig_reg->sge.lkey = pi_ctx->sig_mr->lkey; + sig_reg->rkey = pi_ctx->sig_mr->rkey; + sig_reg->sge.addr = 0; + sig_reg->sge.length = scsi_transfer_length(iser_task->sc); - iser_dbg("sig_sge: addr: 0x%llx length: %u lkey: 0x%x\n", -sig_sge->addr, sig_sge->length, -sig_sge->lkey); + iser_dbg("sig_sge: lkey: 0x%x, rkey: 0x%x, addr: 0x%llx, length: %u\n", +sig_reg->sge.lkey, sig_reg->rkey, sig_reg->sge.addr, +sig_reg->sge.length); err: return ret; } @@ -652,7 +655,7 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task, struct iser_mem_reg *mem_reg, struct iser_data_buf *mem, enum iser_reg_indicator ind, - struct ib_sge *sge) + struct iser_mem_reg *reg) { struct fast_reg_descriptor *desc = mem_reg->mem_h; struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn; @@ -668,12 +671,13 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task, if (mem->dma_nents == 1) { struct scatterlist *sg = mem->sg; - sge->lkey = device->mr->lkey; - sge->addr = ib_sg_dma_address(ibdev, &sg[0]); - sge->length = ib_sg_dma_len(ibdev, &sg[0]); + reg->sge.lkey = device->mr->lkey; + reg->rkey = device->mr->rkey; + reg->sge.addr = ib_sg_dma_address(ibdev, &sg[0]); + reg->sge.length = ib_sg_dma_len(ibdev, &sg[0]); iser_dbg("Single DMA entry: lkey=0x%x, addr=0x%llx, length=0x%x\n", -sge->lkey, sge->addr, sge->length); +reg->sge.lkey, reg->sge.addr, reg->sge.length); return 0; } @@ -723,9 +727,10 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task, } desc->reg_indicators &= ~ind; - sge->lkey = mr->lkey; - sge->addr = frpl->page_list[0] + offset; - sge->length = size; + reg->sge.lkey = mr->lkey; + reg->rkey = mr->rkey; + reg->sge.addr = frpl->page_list[0] + offset; + reg->sge.length = size; return ret; } @@ -745,7 +750,6 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *iser_task, struct iser_data_buf *mem = &iser_task->data[cmd_dir]; struct iser_mem_reg *mem_reg = &iser_task->rdma_reg[cmd_dir]; struct fast_reg_descriptor *desc = NULL; - struct ib_sge 
data_sge; int err, aligned_len;
[PATCH v2 10/18] IB/iser: Merge build page-vec into register page-vec
No need for these two separate. Keep it in a single routine like in the fastreg case. This will also make iser_reg_page_vec closer to iser_fast_reg_mr arguments. This is a preparation step for registration flow refactor. This patch does not change any functionality. Signed-off-by: Sagi Grimberg Signed-off-by: Adir Lev --- drivers/infiniband/ulp/iser/iser_memory.c | 91 ++-- 1 files changed, 33 insertions(+), 58 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 0b8656f..6e6b753 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -280,31 +280,6 @@ static void iser_dump_page_vec(struct iser_page_vec *page_vec) iser_err("%d %lx\n",i,(unsigned long)page_vec->pages[i]); } -static void iser_page_vec_build(struct iser_data_buf *data, - struct iser_page_vec *page_vec, - struct ib_device *ibdev) -{ - int page_vec_len = 0; - - page_vec->length = 0; - page_vec->offset = 0; - - iser_dbg("Translating sg sz: %d\n", data->dma_nents); - page_vec_len = iser_sg_to_page_vec(data, ibdev, page_vec->pages, - &page_vec->offset, - &page_vec->data_size); - iser_dbg("sg len %d page_vec_len %d\n", data->dma_nents, page_vec_len); - - page_vec->length = page_vec_len; - - if (page_vec_len * SIZE_4K < page_vec->data_size) { - iser_err("page_vec too short to hold this SG\n"); - iser_data_buf_dump(data, ibdev); - iser_dump_page_vec(page_vec); - BUG(); - } -} - int iser_dma_map_task_data(struct iscsi_iser_task *iser_task, struct iser_data_buf *data, enum iser_data_dir iser_dir, @@ -367,43 +342,44 @@ static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task, * returns: 0 on success, errno code on failure */ static -int iser_reg_page_vec(struct ib_conn *ib_conn, +int iser_reg_page_vec(struct iscsi_iser_task *iser_task, + struct iser_data_buf *mem, struct iser_page_vec *page_vec, - struct iser_mem_reg *mem_reg) + struct iser_mem_reg *mem_reg) { - struct ib_pool_fmr *mem; - 
u64io_addr; - u64*page_list; - intstatus; - - page_list = page_vec->pages; - io_addr = page_list[0]; + struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn; + struct iser_device *device = ib_conn->device; + struct ib_pool_fmr *fmr; + int ret, plen; + + plen = iser_sg_to_page_vec(mem, device->ib_device, + page_vec->pages, + &page_vec->offset, + &page_vec->data_size); + page_vec->length = plen; + if (plen * SIZE_4K < page_vec->data_size) { + iser_err("page vec too short to hold this SG\n"); + iser_data_buf_dump(mem, device->ib_device); + iser_dump_page_vec(page_vec); + return -EINVAL; + } - mem = ib_fmr_pool_map_phys(ib_conn->fmr.pool, - page_list, + fmr = ib_fmr_pool_map_phys(ib_conn->fmr.pool, + page_vec->pages, page_vec->length, - io_addr); - - if (IS_ERR(mem)) { - status = (int)PTR_ERR(mem); - iser_err("ib_fmr_pool_map_phys failed: %d\n", status); - return status; + page_vec->pages[0]); + if (IS_ERR(fmr)) { + ret = PTR_ERR(fmr); + iser_err("ib_fmr_pool_map_phys failed: %d\n", ret); + return ret; } - mem_reg->lkey = mem->fmr->lkey; - mem_reg->rkey = mem->fmr->rkey; - mem_reg->len = page_vec->data_size; - mem_reg->va= io_addr + page_vec->offset; - mem_reg->mem_h = (void *)mem; - - iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, " -"entry[0]: (0x%08lx,%ld)] -> " -"[lkey: 0x%08X mem_h: 0x%p va: 0x%08lX sz: %ld]\n", -page_vec, page_vec->length, -(unsigned long)page_vec->pages[0], -(unsigned long)page_vec->data_size, -(unsigned int)mem_reg->lkey, mem_reg->mem_h, -(unsigned long)mem_reg->va, (unsigned long)mem_reg->len); + mem_reg->lkey = fmr->fmr->lkey; + mem_reg->rkey = fmr->fmr->rkey; + mem_reg->va = page_vec->pages[0] + page_vec->offset; + mem_reg->len = page_vec->data_size; + mem_reg->mem_h = fmr; + return 0; } @@ -493,8 +4
[PATCH v2 05/18] IB/iser: Remove a redundant struct iser_data_buf
No need to keep two iser_data_buf structures just in case we use mem copy. We can avoid that just by adding a pointer to the original sg. So keep only two iser_data_buf per command (data and protection) and pass the relevant data_buf to bounce buffer routine. This patch does not change any functionality. Signed-off-by: Sagi Grimberg Signed-off-by: Adir Lev --- drivers/infiniband/ulp/iser/iscsi_iser.h | 12 ++--- drivers/infiniband/ulp/iser/iser_initiator.c | 16 +++ drivers/infiniband/ulp/iser/iser_memory.c| 58 ++--- 3 files changed, 34 insertions(+), 52 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index b47aea1..5c7036c 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -218,20 +218,23 @@ enum iser_data_dir { /** * struct iser_data_buf - iSER data buffer * - * @buf: pointer to the sg list + * @sg: pointer to the sg list * @size: num entries of this sg * @data_len: total beffer byte len * @dma_nents:returned by dma_map_sg * @copy_buf: allocated copy buf for SGs unaligned *for rdma which are copied + * @orig_sg: pointer to the original sg list (in case + *we used a copy) * @sg_single:SG-ified clone of a non SG SC or *unaligned SG */ struct iser_data_buf { - void *buf; + struct scatterlist *sg; unsigned int size; unsigned long data_len; unsigned int dma_nents; + struct scatterlist *orig_sg; char *copy_buf; struct scatterlist sg_single; }; @@ -536,9 +539,7 @@ struct iser_conn { * @dir: iser data direction * @rdma_regd:task rdma registration desc * @data: iser data buffer desc - * @data_copy:iser data copy buffer desc (bounce buffer) * @prot: iser protection buffer desc - * @prot_copy:iser protection copy buffer desc (bounce buffer) */ struct iscsi_iser_task { struct iser_tx_desc desc; @@ -549,9 +550,7 @@ struct iscsi_iser_task { int dir[ISER_DIRS_NUM]; struct iser_regd_buf rdma_regd[ISER_DIRS_NUM]; struct iser_data_buf data[ISER_DIRS_NUM]; - struct 
iser_data_buf data_copy[ISER_DIRS_NUM]; struct iser_data_buf prot[ISER_DIRS_NUM]; - struct iser_data_buf prot_copy[ISER_DIRS_NUM]; }; struct iser_page_vec { @@ -621,7 +620,6 @@ void iser_free_rx_descriptors(struct iser_conn *iser_conn); void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_task *iser_task, struct iser_data_buf *mem, -struct iser_data_buf *mem_copy, enum iser_data_dir cmd_dir); int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *task, diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 76eb57b..0e414db 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -401,13 +401,13 @@ int iser_send_command(struct iscsi_conn *conn, } if (scsi_sg_count(sc)) { /* using a scatter list */ - data_buf->buf = scsi_sglist(sc); + data_buf->sg = scsi_sglist(sc); data_buf->size = scsi_sg_count(sc); } data_buf->data_len = scsi_bufflen(sc); if (scsi_prot_sg_count(sc)) { - prot_buf->buf = scsi_prot_sglist(sc); + prot_buf->sg = scsi_prot_sglist(sc); prot_buf->size = scsi_prot_sg_count(sc); prot_buf->data_len = (data_buf->data_len >> ilog2(sc->device->sector_size)) * 8; @@ -674,35 +674,31 @@ void iser_task_rdma_finalize(struct iscsi_iser_task *iser_task) /* if we were reading, copy back to unaligned sglist, * anyway dma_unmap and free the copy */ - if (iser_task->data_copy[ISER_DIR_IN].copy_buf != NULL) { + if (iser_task->data[ISER_DIR_IN].copy_buf) { is_rdma_data_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_task, &iser_task->data[ISER_DIR_IN], - &iser_task->data_copy[ISER_DIR_IN], ISER_DIR_IN); } - if (iser_task->data_copy[ISER_DIR_OUT].copy_buf != NULL) { + if (iser_task->data[ISER_DIR_OUT].copy_buf) { is_rdma_data_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_task, &iser_task->data[ISER_DIR_OUT], - &iser_task->data_c
[PATCH v2 02/18] IB/iser: Handle fastreg/local_inv completion errors
Fast registration and local invalidate work requests can also fail. We should call the error completion handler for them. Reported-by: Roi Dayan Signed-off-by: Sagi Grimberg --- drivers/infiniband/ulp/iser/iser_verbs.c | 11 ++- 1 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 070c5af..7ee4926 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -1210,6 +1210,9 @@ iser_handle_comp_error(struct ib_conn *ib_conn, iscsi_conn_failure(iser_conn->iscsi_conn, ISCSI_ERR_CONN_FAILED); + if (wc->wr_id == ISER_FASTREG_LI_WRID) + return; + if (is_iser_tx_desc(iser_conn, wr_id)) { struct iser_tx_desc *desc = wr_id; @@ -1254,13 +1257,11 @@ static void iser_handle_wc(struct ib_wc *wc) else iser_dbg("flush error: wr id %llx\n", wc->wr_id); - if (wc->wr_id != ISER_FASTREG_LI_WRID && - wc->wr_id != ISER_BEACON_WRID) - iser_handle_comp_error(ib_conn, wc); - - /* complete in case all flush errors were consumed */ if (wc->wr_id == ISER_BEACON_WRID) + /* all flush errors were consumed */ complete(&ib_conn->flush_comp); + else + iser_handle_comp_error(ib_conn, wc); } } -- 1.7.1
[PATCH v2 18/18] IB/iser: Rewrite bounce buffer code path
In some rare cases, IO operations may be not aligned to page boundaries. This prevents iser from performing fast memory registration. In order to overcome that iser uses a bounce buffer to carry the transaction. We basically allocate a buffer in the size of the transaction and perform a copy. The buffer allocation using kmalloc is too restrictive since it requires higher order (atomic) allocations for large transactions (which may result in memory exhaustion fairly fast for some workloads). We rewrite the bounce buffer code path to allocate scattered pages and perform a copy between the transaction sg and the bounce sg. Reported-by: Alex Lyakas Signed-off-by: Sagi Grimberg --- drivers/infiniband/ulp/iser/iscsi_iser.h |8 +- drivers/infiniband/ulp/iser/iser_initiator.c |8 +- drivers/infiniband/ulp/iser/iser_memory.c| 211 -- 3 files changed, 138 insertions(+), 89 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index a39645a..262ba1f 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -222,12 +222,9 @@ enum iser_data_dir { * @size: num entries of this sg * @data_len: total beffer byte len * @dma_nents:returned by dma_map_sg - * @copy_buf: allocated copy buf for SGs unaligned - *for rdma which are copied * @orig_sg: pointer to the original sg list (in case *we used a copy) - * @sg_single:SG-ified clone of a non SG SC or - *unaligned SG + * @orig_size:num entris of orig sg list */ struct iser_data_buf { struct scatterlist *sg; @@ -235,8 +232,7 @@ struct iser_data_buf { unsigned long data_len; unsigned int dma_nents; struct scatterlist *orig_sg; - char *copy_buf; - struct scatterlist sg_single; + unsigned int orig_size; }; /* fwd declarations */ diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index b2e3b77..3e2118e 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ 
b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -674,28 +674,28 @@ void iser_task_rdma_finalize(struct iscsi_iser_task *iser_task) /* if we were reading, copy back to unaligned sglist, * anyway dma_unmap and free the copy */ - if (iser_task->data[ISER_DIR_IN].copy_buf) { + if (iser_task->data[ISER_DIR_IN].orig_sg) { is_rdma_data_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_task, &iser_task->data[ISER_DIR_IN], ISER_DIR_IN); } - if (iser_task->data[ISER_DIR_OUT].copy_buf) { + if (iser_task->data[ISER_DIR_OUT].orig_sg) { is_rdma_data_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_task, &iser_task->data[ISER_DIR_OUT], ISER_DIR_OUT); } - if (iser_task->prot[ISER_DIR_IN].copy_buf) { + if (iser_task->prot[ISER_DIR_IN].orig_sg) { is_rdma_prot_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_task, &iser_task->prot[ISER_DIR_IN], ISER_DIR_IN); } - if (iser_task->prot[ISER_DIR_OUT].copy_buf) { + if (iser_task->prot[ISER_DIR_OUT].orig_sg) { is_rdma_prot_aligned = 0; iser_finalize_rdma_unaligned_sg(iser_task, &iser_task->prot[ISER_DIR_OUT], diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index e78ce53..f90d6de 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -39,7 +39,112 @@ #include "iscsi_iser.h" -#define ISER_KMALLOC_THRESHOLD 0x2 /* 128K - kmalloc limit */ +static void +iser_free_bounce_sg(struct iser_data_buf *data) +{ + struct scatterlist *sg; + int count; + + for_each_sg(data->sg, sg, data->size, count) + __free_page(sg_page(sg)); + + kfree(data->sg); + + data->sg = data->orig_sg; + data->size = data->orig_size; + data->orig_sg = NULL; + data->orig_size = 0; +} + +static int +iser_alloc_bounce_sg(struct iser_data_buf *data) +{ + struct scatterlist *sg; + struct page *page; + unsigned long length = data->data_len; + int i = 0, nents = DIV_ROUND_UP(length, PAGE_SIZE); + + sg = kcalloc(nents, sizeof(*sg), GFP_ATOMIC); + if (!sg) + goto err; + + 
sg_init_table(sg, nents); + while (length) { + u32 page_len = min_t(u32, length, PAGE_SIZE);
[PATCH v2 04/18] IB/iser: Remove redundant cmd_data_len calculation
This code was added before we had the protection data length calculation (in iser_send_command), so we needed to calculate the sg data length from the sg itself. This is not needed anymore. This patch does not change any functionality. Signed-off-by: Sagi Grimberg Signed-off-by: Adir Lev --- drivers/infiniband/ulp/iser/iser_memory.c |5 + 1 files changed, 1 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 341040b..32ccd5c 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -53,12 +53,9 @@ static int iser_start_rdma_unaligned_sg(struct iscsi_iser_task *iser_task, struct scatterlist *sgl = (struct scatterlist *)data->buf; struct scatterlist *sg; char *mem = NULL; - unsigned long cmd_data_len = 0; + unsigned long cmd_data_len = data->data_len; int dma_nents, i; - for_each_sg(sgl, sg, data->size, i) - cmd_data_len += ib_sg_dma_len(dev, sg); - if (cmd_data_len > ISER_KMALLOC_THRESHOLD) mem = (void *)__get_free_pages(GFP_ATOMIC, ilog2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT); -- 1.7.1
iser patches for kernel 4.1 [v2]
Hi Roland, This set contains bug fixes as well as code refactoring patches - Patches 1-3: Bug fixes (stable material) - Patches 4-6: Bounce buffer related code cleanups - Patches 7-12,16: Refactoring - Patches 13-15: Minor optimizations - Patch 17:Version bump - Patch 18:Bounce buffer rewrite to handle situations where large transfers are unaligned causing iser to use high order allocations Changes from v1: - Addressed review comments (patch 18) from Alex Lyakas on bounce buffer re-write code (kmap/kunmap mismatch, possible use-after-free, and double free). Sagi Grimberg (18): IB/iser: Fix unload during ep_poll wrong dereference IB/iser: Handle fastreg/local_inv completion errors IB/iser: Fix wrong calculation of protection buffer length IB/iser: Remove redundant cmd_data_len calculation IB/iser: Remove a redundant struct iser_data_buf IB/iser: Don't pass ib_device to fall_to_bounce_buff routine IB/iser: Move memory reg/dereg routines to iser_memory.c IB/iser: Remove redundant assignments in iser_reg_page_vec IB/iser: Get rid of struct iser_rdma_regd IB/iser: Merge build page-vec into register page-vec IB/iser: Move fastreg descriptor pool get/put to helper functions IB/iser: Move PI context alloc/free to routines IB/iser: Make fastreg pool cache friendly IB/iser: Modify struct iser_mem_reg members IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr IB/iser: Remove code duplication for a single DMA entry IB/iser: Bump version to 1.6 IB/iser: Rewrite bounce buffer code path drivers/infiniband/ulp/iser/iscsi_iser.h | 66 +--- drivers/infiniband/ulp/iser/iser_initiator.c | 66 ++-- drivers/infiniband/ulp/iser/iser_memory.c| 524 +++--- drivers/infiniband/ulp/iser/iser_verbs.c | 220 4 files changed, 432 insertions(+), 444 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 03/18] IB/iser: Fix wrong calculation of protection buffer length
This length miscalculation may cause silent data corruption in the DIX case and cause the device to reference an unmapped area. Fixes: d77e65350f2d ('libiscsi, iser: Adjust data_length to include protection information') Signed-off-by: Sagi Grimberg --- drivers/infiniband/ulp/iser/iser_initiator.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 20e859a..76eb57b 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -409,8 +409,8 @@ int iser_send_command(struct iscsi_conn *conn, if (scsi_prot_sg_count(sc)) { prot_buf->buf = scsi_prot_sglist(sc); prot_buf->size = scsi_prot_sg_count(sc); - prot_buf->data_len = data_buf->data_len >> -ilog2(sc->device->sector_size) * 8; + prot_buf->data_len = (data_buf->data_len >> +ilog2(sc->device->sector_size)) * 8; } if (hdr->flags & ISCSI_FLAG_CMD_READ) { -- 1.7.1
[PATCH v2 06/18] IB/iser: Don't pass ib_device to fall_to_bounce_buff routine
No need to pass that, we can take it from the task. In a later stage, this function will be invoked according to a device capability. This patch does not change any functionality. Signed-off-by: Sagi Grimberg Signed-off-by: Adir Lev --- drivers/infiniband/ulp/iser/iser_memory.c | 12 ++-- 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index beeabd0..9c60ff1 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -334,19 +334,19 @@ void iser_dma_unmap_task_data(struct iscsi_iser_task *iser_task, } static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task, - struct ib_device *ibdev, struct iser_data_buf *mem, enum iser_data_dir cmd_dir, int aligned_len) { - struct iscsi_conn*iscsi_conn = iser_task->iser_conn->iscsi_conn; + struct iscsi_conn *iscsi_conn = iser_task->iser_conn->iscsi_conn; + struct iser_device *device = iser_task->iser_conn->ib_conn.device; iscsi_conn->fmr_unalign_cnt++; iser_warn("rdma alignment violation (%d/%d aligned) or FMR not supported\n", aligned_len, mem->size); if (iser_debug_level > 0) - iser_data_buf_dump(mem, ibdev); + iser_data_buf_dump(mem, device->ib_device); /* unmap the command data before accessing it */ iser_dma_unmap_task_data(iser_task, mem, @@ -384,7 +384,7 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task, aligned_len = iser_data_buf_aligned_len(mem, ibdev); if (aligned_len != mem->dma_nents) { - err = fall_to_bounce_buf(iser_task, ibdev, mem, + err = fall_to_bounce_buf(iser_task, mem, cmd_dir, aligned_len); if (err) { iser_err("failed to allocate bounce buffer\n"); @@ -669,7 +669,7 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *iser_task, aligned_len = iser_data_buf_aligned_len(mem, ibdev); if (aligned_len != mem->dma_nents) { - err = fall_to_bounce_buf(iser_task, ibdev, mem, + err = fall_to_bounce_buf(iser_task, mem, cmd_dir, aligned_len); if (err) { 
iser_err("failed to allocate bounce buffer\n"); @@ -700,7 +700,7 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task *iser_task, mem = &iser_task->prot[cmd_dir]; aligned_len = iser_data_buf_aligned_len(mem, ibdev); if (aligned_len != mem->dma_nents) { - err = fall_to_bounce_buf(iser_task, ibdev, mem, + err = fall_to_bounce_buf(iser_task, mem, cmd_dir, aligned_len); if (err) { iser_err("failed to allocate bounce buffer\n"); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 14/18] IB/iser: Modify struct iser_mem_reg members
No need to keep lkey, va, len variables, we can keep them as struct ib_sge. This will help when we change the memory registration logic. This patch does not change any functionality. Signed-off-by: Sagi Grimberg Signed-off-by: Adir Lev --- drivers/infiniband/ulp/iser/iscsi_iser.h | 14 --- drivers/infiniband/ulp/iser/iser_initiator.c | 18 +++--- drivers/infiniband/ulp/iser/iser_memory.c| 30 +- 3 files changed, 29 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 185d2ec..5fd0963 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -247,18 +247,14 @@ struct iscsi_endpoint; /** * struct iser_mem_reg - iSER memory registration info * - * @lkey: MR local key - * @rkey: MR remote key - * @va: MR start address (buffer va) - * @len: MR length + * @sge: memory region sg element + * @rkey: memory region remote key * @mem_h:pointer to registration context (FMR/Fastreg) */ struct iser_mem_reg { - u32 lkey; - u32 rkey; - u64 va; - u64 len; - void *mem_h; + struct ib_sgesge; + u32 rkey; + void*mem_h; }; enum iser_desc_type { diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 420a613..b2e3b77 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -82,11 +82,11 @@ static int iser_prepare_read_cmd(struct iscsi_task *task) hdr->flags|= ISER_RSV; hdr->read_stag = cpu_to_be32(mem_reg->rkey); - hdr->read_va = cpu_to_be64(mem_reg->va); + hdr->read_va = cpu_to_be64(mem_reg->sge.addr); iser_dbg("Cmd itt:%d READ tags RKEY:%#.4X VA:%#llX\n", task->itt, mem_reg->rkey, -(unsigned long long)mem_reg->va); +(unsigned long long)mem_reg->sge.addr); return 0; } @@ -139,20 +139,20 @@ iser_prepare_write_cmd(struct iscsi_task *task, if (unsol_sz < edtl) { hdr->flags |= ISER_WSV; hdr->write_stag = cpu_to_be32(mem_reg->rkey); - hdr->write_va = 
cpu_to_be64(mem_reg->va + unsol_sz); + hdr->write_va = cpu_to_be64(mem_reg->sge.addr + unsol_sz); iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X " "VA:%#llX + unsol:%d\n", task->itt, mem_reg->rkey, -(unsigned long long)mem_reg->va, unsol_sz); +(unsigned long long)mem_reg->sge.addr, unsol_sz); } if (imm_sz > 0) { iser_dbg("Cmd itt:%d, WRITE, adding imm.data sz: %d\n", task->itt, imm_sz); - tx_dsg->addr = mem_reg->va; + tx_dsg->addr = mem_reg->sge.addr; tx_dsg->length = imm_sz; - tx_dsg->lkey = mem_reg->lkey; + tx_dsg->lkey = mem_reg->sge.lkey; iser_task->desc.num_sge = 2; } @@ -479,9 +479,9 @@ int iser_send_data_out(struct iscsi_conn *conn, mem_reg = &iser_task->rdma_reg[ISER_DIR_OUT]; tx_dsg = &tx_desc->tx_sg[1]; - tx_dsg->addr= mem_reg->va + buf_offset; - tx_dsg->length = data_seg_len; - tx_dsg->lkey= mem_reg->lkey; + tx_dsg->addr = mem_reg->sge.addr + buf_offset; + tx_dsg->length = data_seg_len; + tx_dsg->lkey = mem_reg->sge.lkey; tx_desc->num_sge = 2; if (buf_offset + data_seg_len > iser_task->data[ISER_DIR_OUT].data_len) { diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 45f5120..40d22d5 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -400,10 +400,10 @@ int iser_reg_page_vec(struct iscsi_iser_task *iser_task, return ret; } - mem_reg->lkey = fmr->fmr->lkey; + mem_reg->sge.lkey = fmr->fmr->lkey; mem_reg->rkey = fmr->fmr->rkey; - mem_reg->va = page_vec->pages[0] + page_vec->offset; - mem_reg->len = page_vec->data_size; + mem_reg->sge.addr = page_vec->pages[0] + page_vec->offset; + mem_reg->sge.length = page_vec->data_size; mem_reg->mem_h = fmr; return 0; @@ -479,17 +479,17 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task, if (mem->dma_nents == 1) { sg = mem->sg; - mem_reg->lkey = device->mr->lkey; + mem_reg->sge.lkey = device->mr->lkey; mem_reg->rkey = device->mr->rkey; - mem_reg->len = ib_sg_dma_len(ibdev, &sg[0]); - mem_reg->va = 
ib_sg_dma_address(ibdev, &sg[0]); + mem_reg->sge.leng
[PATCH v2 17/18] IB/iser: Bump version to 1.6
Signed-off-by: Sagi Grimberg --- drivers/infiniband/ulp/iser/iscsi_iser.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 5fd0963..a39645a 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -69,7 +69,7 @@ #define DRV_NAME "iser" #define PFXDRV_NAME ": " -#define DRV_VER"1.5" +#define DRV_VER"1.6" #define iser_dbg(fmt, arg...) \ do { \ -- 1.7.1
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
On Tue, 31 Mar 2015, David Miller wrote: > > Well this needs to be addressed yes but in order to have that done someone > > needs to step forward do the proper work, maintain git trees, do the > > review and show up for the conferences. Neither Roland nor Or was there at > > the Infiniband conferences in Monterey this year(!) and this is the prime > > venue for face to face conversation about the subsystem each year. > > I don't go to IETF meetings, ever, yet I am rather sure that nobody > uses that to question my ability to be the networking maintainer. This isn't a standards body. Openfabrics is the organization dedicated to the development of the infiniband stack.
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
From: Christoph Lameter Date: Tue, 31 Mar 2015 07:49:38 -0500 (CDT) > Well this needs to be addressed yes but in order to have that done someone > needs to step forward do the proper work, maintain git trees, do the > review and show up for the conferences. Neither Roland nor Or was there at > the Infiniband conferences in Monterey this year(!) and this is the prime > venue for face to face conversation about the subsystem each year. I don't go to IETF meetings, ever, yet I am rather sure that nobody uses that to question my ability to be the networking maintainer. I've gone several years at times without meeting other networking developers as well, and that also I am pretty sure did not harm my stature as the networking maintainer. So I absolutely disagree that these two acts are requirements for someone whose job is to steadily and reasonably review and apply patches for a subsystem.
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
On Tue, 31 Mar 2015, Hal Rosenstock wrote: > > Sean and Hal showed up for the conference. Sean is active mostly in > > focusing on userspace stuff. Both could become more involved in the kernel > > ib tree and the review process? Hal is employed by Mellanox but is not > > participating in these patch related discussions? Can Mellanox give Hal > > more time for his role as the kernel ib subsystem maintainer? Maybe you > > could get that resolved at Mellanox? You already have one of your own as > > a maintainer of the IB subsystem. > > My involvement is limited to IB management related aspects in the kernel. Need to update the entry then I think.
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
From: Or Gerlitz Date: Tue, 31 Mar 2015 14:22:50 +0300 > On Tue, Mar 31, 2015 at 6:47 AM, Roland Dreier wrote: >>> Roland, I have to genuinely agree with Or, that your handling of >>> patch integration is sub-par and really painful for anyone actually >>> trying to get real work done here. >>> >>> If you simply don't have the time to devote to constantly reviewing >>> patches as they come in, and doing so in a timely manner, please let >>> someone who is actually interested and has the time to take over. >> >> It's a fair criticism, and certainly for at least the last year or so >> I have not had the time to do enough work as a maintainer. I have >> hope that some of the things that have been keeping me busy are dying >> down and that I'll have more time to spend on handling the RDMA tree, >> but that's just talk until I actually get more done. > > Roland, well, with few variations, this goes way beyond a year. I > would say more or less half of the time you're wearing the maintainer > hat (2005-now). The practice of updating your for-next branch to rc1 > only few days/hours before the kernel is out and the merge window > opens, is an attitude, not lack of resources and will not be solved by > whatever processes people suggest. You need to act differently. Unfortunately, and no direct offense intended to you personally Roland, but I agree with Or here. If a person really cares about a subsystem they are marked as the maintainer for, they usually _make_ the time necessary to apply patches and attend to maintainership in a reasonable manner. If this "once every merge window" behavior was limited to one release cycle, I'd give the benefit of the doubt, but this has been going on for a very long time.
You cannot on the one hand say "I care about this subsystem and the long term maintainership and ABI stability of it" yet on the other hand not be willing to put forth even the _MOST MINIMAL_ amount of effort and time required to steadily review and apply patches over a period of several years. It doesn't take a lot of time to do this work, especially if you use the correct tools. I can review and apply 100 patches a day, even when I'm on vacation. I'm an extreme example, but what you're doing right now Roland is not acceptable and is not in agreement with your claims about how much you care about this subsystem. If you processed the incoming patch queue even _once_ a week, we wouldn't even be having this conversation. But you haven't even been doing that. I get sick to my stomach when a patch gets to 3 days in my patch queue, that's already too long.
Re: [PATCH v3 00/15] NFS/RDMA patches proposed for 4.1
On 03/30/2015 02:33 PM, Chuck Lever wrote: > This is a series of client-side patches for NFS/RDMA. In preparation > for increasing the transport credit limit and maximum rsize/wsize, > I've re-factored the memory registration logic into separate files, > invoked via a method API. > > The series is available in the nfs-rdma-for-4.1 topic branch at > > git://linux-nfs.org/projects/cel/cel-2.6.git Thanks! I've applied these and pushed them out to my nfs-rdma git tree. Anna > > Changes since v2: > - Rebased on 4.0-rc6 > - One minor fix squashed into 01/15 > - Tested-by tags added > > Changes since v1: > - Rebased on 4.0-rc5 > - Main optimizations postponed to 4.2 > - Addressed review comments from Anna, Sagi, and Devesh > > --- > > Chuck Lever (15): > SUNRPC: Introduce missing well-known netids > xprtrdma: Display IPv6 addresses and port numbers correctly > xprtrdma: Perform a full marshal on retransmit > xprtrdma: Byte-align FRWR registration > xprtrdma: Prevent infinite loop in rpcrdma_ep_create() > xprtrdma: Add vector of ops for each memory registration strategy > xprtrdma: Add a "max_payload" op for each memreg mode > xprtrdma: Add a "register_external" op for each memreg mode > xprtrdma: Add a "deregister_external" op for each memreg mode > xprtrdma: Add "init MRs" memreg op > xprtrdma: Add "reset MRs" memreg op > xprtrdma: Add "destroy MRs" memreg op > xprtrdma: Add "open" memreg op > xprtrdma: Handle non-SEND completions via a callout > xprtrdma: Make rpcrdma_{un}map_one() into inline functions > > > include/linux/sunrpc/msg_prot.h|8 > include/linux/sunrpc/xprtrdma.h|5 > net/sunrpc/xprtrdma/Makefile |3 > net/sunrpc/xprtrdma/fmr_ops.c | 208 +++ > net/sunrpc/xprtrdma/frwr_ops.c | 353 ++ > net/sunrpc/xprtrdma/physical_ops.c | 94 + > net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++-- > net/sunrpc/xprtrdma/transport.c| 61 ++- > net/sunrpc/xprtrdma/verbs.c| 699 > +++- > net/sunrpc/xprtrdma/xprt_rdma.h| 90 - > 10 files changed, 882 insertions(+), 726 deletions(-) > create mode 
100644 net/sunrpc/xprtrdma/fmr_ops.c > create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c > create mode 100644 net/sunrpc/xprtrdma/physical_ops.c > > -- > Chuck Lever
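The series above moves each memory-registration strategy (FMR, FRWR, physical) behind a vector of function pointers that the transport dispatches through. A minimal sketch of that pattern in plain C follows; the structure and member names here are illustrative only and do not match the actual definitions in net/sunrpc/xprtrdma/xprt_rdma.h:

```c
#include <stddef.h>

/* Hypothetical ops vector: one instance per registration strategy.
 * The real series also carries register/deregister/init/reset/destroy
 * methods; only max_payload is sketched here. */
struct memreg_ops {
	const char *name;
	size_t (*max_payload)(size_t nsegs);	/* cap on RPC payload size */
};

/* FRWR-style strategy: payload limited by fast-registration depth
 * (128 segments assumed here purely for illustration). */
static size_t frwr_max_payload(size_t nsegs)
{
	return nsegs > 128 ? 128 * 4096 : nsegs * 4096;
}

/* Physical-style strategy: one page per segment, no registration limit. */
static size_t physical_max_payload(size_t nsegs)
{
	return nsegs * 4096;
}

static const struct memreg_ops frwr_ops = { "frwr", frwr_max_payload };
static const struct memreg_ops physical_ops = { "physical", physical_max_payload };

/* The transport keeps one ops pointer, chosen at mount time, and never
 * branches on the memreg mode again. */
size_t xprt_max_payload(const struct memreg_ops *ops, size_t nsegs)
{
	return ops->max_payload(nsegs);
}
```

The benefit is that adding a new strategy means adding one file with one ops instance, rather than touching switch statements scattered through verbs.c.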
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check
On 3/31/2015 7:41 AM, Michael Wang wrote: Hi, Tom Thanks for the comments :-) On 03/31/2015 01:19 PM, Tom Talpey wrote: [oops - repeating my reply with full cc's] [snip] Actually I'm thinking about Doug's idea to use rdma_transport_is_XX() instead of the current basic helper, thus may be use rdma_transport_is_iwarp() in here could be better, since it's actually a feature of iwarp tech that RDMA Read only support one scatter-gather entry. No, you should expose an attribute to surface the maximum length of the remote gather list, which varies by adapter as well as protocol. The fact that iWARP is different from IB is not relevant, and conflates unrelated properties. To be confirmed, so your point is that the max-read-sges will be different even the transport is the same iWARP, and that depends on the capability of adapter, correct? Yes, in fact the iWARP protocol does not preclude multiple read SGEs, even though most iWARP implementations have chosen to support just one. Even for multi-SGE-capable adapters, there is a limit of SGL size, based on the adapter's work request format and other factors. So my argument is that upper layers can and should query that, not make a blanket decision based on protocol type. I currently only find this one place where infer max-read-sges from transport type, it looks more like a special case to me rather than a generic method we could exposed... and also not very related with IB management helper concept IMHO. It is most certainly not a special case, but you could decide to introduce it in many ways. I'm not commenting on that. My main concern is that you do not introduce a new and clumsy "is iWARP" rule as an adapter-specific API requirement to expose the RDMA Read SGE behavior. That's what your initial message seemed to imply?
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check
On 03/31/2015 03:42 PM, Tom Talpey wrote: > On 3/31/2015 7:41 AM, Michael Wang wrote: >> [snip] > Yes, in fact the iWARP protocol does not preclude multiple read SGEs, > even though most iWARP implementations have chosen to support just one. > > Even for multi-SGE-capable adapters, there is a limit of SGL size, based > on the adapter's work request format and other factors. So my argument > is that upper layers can and should query that, not make a blanket > decision based on protocol type. Thanks for the explanation. Sounds like some new callback on the device level, like query_device(), to acquire the right info. >> I currently only find this one place where infer max-read-sges from >> transport type, it looks more like a special case to me rather than a generic >> method we could exposed... and also not very related with IB management >> helper concept IMHO. > It is most certainly not a special case, but you could decide to > introduce it in many ways. I'm not commenting on that. > > My main concern is that you do not introduce a new and clumsy "is iWARP" > rule as an adapter-specific API requirement to expose the RDMA Read SGE > behavior. That's what your initial message seemed to imply? Yeah, I planned to just use rdma_transport_iwarp() to replace the check; it's not a functional change, just a refinement. Frankly speaking, I would prefer some driver developer to work on that part; at this point I prefer to focus on introducing the management helpers first. Maybe we could mark it as a TODO temporarily; if later we find more places using similar logic, we can collect them and do some reform. Regards, Michael Wang
Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa(), for sa-check
On 03/30/2015 07:02 PM, Doug Ledford wrote: > On Mon, 2015-03-30 at 18:42 +0200, Michael Wang wrote: >> [snip] >> >> rdma_transport_is_ib and rdma_port_ll_is_ib is actually just rough helper >> to save some code, we can get rid of them when we no longer need them, but >> currently device driver still using them a lot, I'm not sure if the new >> mechanism could take cover all these cases... > [snip] > > If we use something like this, then the above is all you need. Then > every place in the code that checks for something like has_sa or cap_sa > can be replaced with rdma_ib_mgmt. When Ira updates his patches for > this, he can check for rdma_opa_mgmt to enable jumbo MAD packets and > whatever else he needs. Every place that does transport == IB and ll == > Ethernet can become rdma_transport_is_roce. Every place that does > transport == IB and ll == INFINIBAND becomes rdma_transport_is_ib. Get your point :-) I need to investigate all those cases to see if it works; in case things are unclear, I'll rework them according to my understanding to adapt to this new mechanism, and if there are any misunderstandings, we can address them case by case during the review. > The > code in multicast.c just needs to check rdma_ib_mgmt() (which happens to > make perfect sense anyway as the code in multicast.c that is checking > that we are on an IB interface is doing so because IB requires extra > management of the multicast group joins/leaves). I'll adopt those helpers that have already been discussed, and they will be implemented with the helpers from the new mechanism (eg, rdma_ib_mgmt) in the next version. > But, like I said, this > is an all or nothing change, it isn't something we can ease into. It will be great if we can make it, let's see ;-) Regards, Michael Wang >
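For readers following the thread, the "one narrow test per feature" shape under discussion can be sketched as per-port capability bits with static inline accessors. The bit names and helper signatures below are invented for illustration; they are not the helpers that were proposed or merged:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port capability flags. A device driver would set these
 * once at port init; core code then asks one narrow question per feature
 * instead of testing transport type or link layer. */
#define CAP_IB_MGMT   (1u << 0)  /* IB-style SA/SM management */
#define CAP_OPA_MGMT  (1u << 1)  /* OPA MAD handling */
#define CAP_MCAST     (1u << 2)  /* managed multicast join/leave */

struct port_caps {
	uint32_t flags;
};

/* Each helper tests exactly one bit: no "a || b" feature checks needed
 * at the call sites, which is the point Jason raised in the thread. */
static inline bool rdma_ib_mgmt(const struct port_caps *p)
{
	return p->flags & CAP_IB_MGMT;
}

static inline bool rdma_opa_mgmt(const struct port_caps *p)
{
	return p->flags & CAP_OPA_MGMT;
}

static inline bool cap_mcast(const struct port_caps *p)
{
	return p->flags & CAP_MCAST;
}
```

A RoCE or iWARP port would simply leave CAP_MCAST clear, so multicast.c could guard with a single cap_mcast() call.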
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
On 3/31/2015 8:49 AM, Christoph Lameter wrote: > Well this needs to be addressed yes but in order to have that done someone > needs to step forward do the proper work, maintain git trees, do the > review and show up for the conferences. Neither Roland nor Or was there at > the Infiniband conferences in Monterey this year(!) and this is the prime > venue for face to face conversation about the subsystem each year. > > Just looked at the MAINTAINERS file: Surprise: We already have 3 > maintainers for the IB subsystem: > > INFINIBAND SUBSYSTEM > M: Roland Dreier > M: Sean Hefty > M: Hal Rosenstock > L: linux-rdma@vger.kernel.org > W: http://www.openfabrics.org/ > Q: http://patchwork.kernel.org/project/linux-rdma/list/ > T: git > git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git > S: Supported > F: Documentation/infiniband/ > F: drivers/infiniband/ > F: include/uapi/linux/if_infiniband.h > > Sean and Hal showed up for the conference. Sean is active mostly in > focusing on userspace stuff. Both could become more involved in the kernel > ib tree and the review process? Hal is employed by Mellanox but is not > participating in these patch related discussions? Can Mellanox give Hal > more time for his role as the kernel ib subsystem maintainer? Maybe you > could get that resolved at Mellanox? You already have one of your own as > a maintainer of the IB subsystem. My involvement is limited to IB management related aspects in the kernel. -- Hal
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
Well this needs to be addressed yes but in order to have that done someone needs to step forward do the proper work, maintain git trees, do the review and show up for the conferences. Neither Roland nor Or was there at the Infiniband conferences in Monterey this year(!) and this is the prime venue for face to face conversation about the subsystem each year. Just looked at the MAINTAINERS file: Surprise: We already have 3 maintainers for the IB subsystem: INFINIBAND SUBSYSTEM M: Roland Dreier M: Sean Hefty M: Hal Rosenstock L: linux-rdma@vger.kernel.org W: http://www.openfabrics.org/ Q: http://patchwork.kernel.org/project/linux-rdma/list/ T: git git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git S: Supported F: Documentation/infiniband/ F: drivers/infiniband/ F: include/uapi/linux/if_infiniband.h Sean and Hal showed up for the conference. Sean is active mostly in focusing on userspace stuff. Both could become more involved in the kernel ib tree and the review process? Hal is employed by Mellanox but is not participating in these patch related discussions? Can Mellanox give Hal more time for his role as the kernel ib subsystem maintainer? Maybe you could get that resolved at Mellanox? You already have one of your own as a maintainer of the IB subsystem.
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check
Hi, Tom Thanks for the comments :-) On 03/31/2015 01:19 PM, Tom Talpey wrote: > [snip] > >> >> Actually I'm thinking about Doug's idea to use rdma_transport_is_XX() >> instead of the current basic helper, thus may be use >> rdma_transport_is_iwarp() >> in here could be better, since it's actually a feature of iwarp tech >> that RDMA Read only support one scatter-gather entry. > > No, you should expose an attribute to surface the maximum length of > the remote gather list, which varies by adapter as well as protocol. > The fact that iWARP is different from IB is not relevant, and conflates > unrelated properties. To be confirmed, so your point is that the max-read-sges will be different even the transport is the same iWARP, and that depends on the capability of adapter, correct? I currently only find this one place where infer max-read-sges from transport type, it looks more like a special case to me rather than a generic method we could exposed... and also not very related with IB management helper concept IMHO. Regards, Michael Wang
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check
On 3/31/2015 3:39 AM, Michael Wang wrote: On 03/31/2015 12:35 AM, Jason Gunthorpe wrote: On Mon, Mar 30, 2015 at 05:10:12PM +0200, Michael Wang wrote: I found that actually we don't have to touch this one which only used by HW driver currently. I'm having a hard time understanding this, the code in question was in net/sunrpc/xprtrdma/svc_rdma_recvfrom.c Which is the NFS ULP, not a device driver. I'm not familiar with this part either :-P but yes, it looks like an ulp to support NFS. It's the NFS server itself. Actually I'm thinking about Doug's idea to use rdma_transport_is_XX() instead of the current basic helper, thus may be use rdma_transport_is_iwarp() in here could be better, since it's actually a feature of iwarp tech that RDMA Read only support one scatter-gather entry. No, you should expose an attribute to surface the maximum length of the remote gather list, which varies by adapter as well as protocol. The fact that iWARP is different from IB is not relevant, and conflates unrelated properties.
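Tom's suggestion amounts to letting the ULP read a per-device attribute for the RDMA Read gather-list depth instead of testing the transport type. A rough sketch of the idea follows; the structures, field name, and limits here are assumptions for illustration (in the kernel the value would come from the driver via the query_device() callback, not from hard-coded constants):

```c
#include <stdbool.h>

/* Hypothetical slice of a device-attributes structure: only the field
 * relevant to this thread is shown. */
struct device_attr {
	int max_sge_rd;		/* max scatter-gather entries per RDMA Read */
};

/* An iWARP device reporting the common single-SGE limit (illustrative)... */
static const struct device_attr iwarp_dev = { .max_sge_rd = 1 };
/* ...and an IB device reporting a deeper gather list (also illustrative). */
static const struct device_attr ib_dev = { .max_sge_rd = 30 };

/* The ULP decision is driven entirely by the attribute; no "is iWARP"
 * check anywhere, so a multi-SGE-capable iWARP adapter just works. */
static bool needs_read_chunking(const struct device_attr *attr, int sges)
{
	return sges > attr->max_sge_rd;
}
```

This keeps the protocol-specific knowledge in the driver, where it belongs, and leaves svc_rdma_recvfrom.c with a plain capacity comparison.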
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
On Tue, Mar 31, 2015 at 6:47 AM, Roland Dreier wrote: >> Roland, I have to genuinely agree with Or, that your handling of >> patch integration is sub-par and really painful for anyone actually >> trying to get real work done here. >> >> If you simply don't have the time to devote to constantly reviewing >> patches as they come in, and doing so in a timely manner, please let >> someone who is actually interested and has the time to take over. > > It's a fair criticism, and certainly for at least the last year or so > I have not had the time to do enough work as a maintainer. I have > hope that some of the things that have been keeping me busy are dying > down and that I'll have more time to spend on handling the RDMA tree, > but that's just talk until I actually get more done. Roland, well, with few variations, this goes way beyond a year. I would say more or less half of the time you're wearing the maintainer hat (2005-now). The practice of updating your for-next branch to rc1 only few days/hours before the kernel is out and the merge window opens, is an attitude, not lack of resources and will not be solved by whatever processes people suggest. You need to act differently.
Re: [PATCH for-next 3/9] net/mlx4_core: Set initial admin GUIDs for VFs
On 3/30/2015 8:16 PM, Jason Gunthorpe wrote: But, it doesn't really make sense to use eth_random_addr (which doesn't have a special OUI) and not randomize every bit. get_random_bytes(&guid, sizeof(guid)); guid &= ~(1ULL << 56); guid |= 1ULL << 57; OK, we can do that, sure. I also don't think the kernel should be generating random GUIDs. Either the SA should be consulted to do this, or the management stack should generate a cloud wide unique number. There **is** an option to consult with the SM controlled by the sm_guid_assign module param of mlx4_ib. When this mode is in action, the randomized guid would be overridden by the one assigned by the SA. In real life environments, people will use the "host assigned" option (sm_guid_assign = false) where the virtualization systems generate a unique guid to be used by the VF/VM and provision it through mlx4_ib to be provided to the VF when the binding takes place. The only place where the random value will be used is for OOTB experience for staging purposes and such, when one wants to have a working system with zero-touch. The practice of generating a locally random address for a non-provisioned VF is also used by pretty much the whole set of Ethernet VF drivers (ixgbe_vf, i40e_vf, mlx4 VF code, etc). This is a low cost solution for point cases which need that. Or.
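Jason's masking snippet is worth unpacking: assuming the GUID is held as a host-order u64 whose top byte is the first octet of the EUI-64, bit 56 is that octet's individual/group bit and bit 57 its universal/local bit. A self-contained sketch of the same operation:

```c
#include <stdint.h>

/* Apply Jason's suggested masks to random bits so the result is always a
 * valid unicast, locally-administered EUI-64. The bit positions assume
 * the first octet of the GUID sits in bits 56-63 of the u64. */
uint64_t randomize_guid(uint64_t random_bits)
{
	uint64_t guid = random_bits;

	guid &= ~(1ULL << 56);	/* clear I/G bit: individual, not group */
	guid |= 1ULL << 57;	/* set U/L bit: locally administered */
	return guid;
}
```

In the kernel the random_bits input would come from get_random_bytes(), as in Jason's snippet; the point of the masks is that no random draw can ever produce a multicast or universally-scoped identifier.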
Re: [-stable] commit 377b513485fd ("IB/core: Avoid leakage from kernel to user space")
On 03/27/2015, 01:42 PM, Yann Droneaud wrote: > Hi, > > Please add commit 377b513485fd ("IB/core: Avoid leakage from kernel to > user space") to -stable. It can be applied to v2.6.32 and later. Now pushed to 3.12. Thanks. -- js suse labs
Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management
On 3/31/2015 6:47 AM, Roland Dreier wrote: Roland, I have to genuinely agree with Or, that your handling of patch integration is sub-par and really painful for anyone actually trying to get real work done here. If you simply don't have the time to devote to constantly reviewing patches as they come in, and doing so in a timely manner, please let someone who is actually interested and has the time to take over. It's a fair criticism, and certainly for at least the last year or so I have not had the time to do enough work as a maintainer. I have hope that some of the things that have been keeping me busy are dying down and that I'll have more time to spend on handling the RDMA tree, but that's just talk until I actually get more done. I really would like to get more people involved in handling the flow of patches but I'm not sure who has not only the interest and the time but also the judgement and expertise to take over. Certainly Or has been a long time contributor who has done a lot of great things, but I still worry about things like ABI stability and backwards compatibility. But I'm open to ideas. We had a talk about a similar topic at LSFMM15. Linux-scsi is a large scale subsystem and also has the "single maintainer with limited time for upstream maintenance" bottleneck. Christoph created the "scsi-queue" tree to feed James in order to accelerate the work process. One thought was laying out a set of rules that would allow a maintainer to "just apply patches": - Obviously applies cleanly and does not produce compilation errors/warnings (patches that don't meet this will be removed) - At least two Reviewed-by tags (one of them can be Acked-by/Tested-by) One problem with that is that we need "a christoph" to poke people for review since not a lot of people are giving reviews. Another thought was to allow multiple maintainers (my understanding is that patchwork supports it). If we can find some split to allow different maintainers to feed Roland this might work.
Thoughts? Sagi.
Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check
On 03/31/2015 12:35 AM, Jason Gunthorpe wrote: > On Mon, Mar 30, 2015 at 05:10:12PM +0200, Michael Wang wrote: >> I found that actually we don't have to touch this one which >> only used by HW driver currently. > I'm having a hard time understanding this, the code in question was in > > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c > > Which is the NFS ULP, not a device driver. I'm not familiar with this part either :-P but yes, it looks like an ulp to support NFS. Actually I'm thinking about Doug's idea to use rdma_transport_is_XX() instead of the current basic helper, thus may be use rdma_transport_is_iwarp() in here could be better, since it's actually a feature of iwarp tech that RDMA Read only support one scatter-gather entry. But I need more investigation on that idea; there are some parts, especially inside the device driver, I'm not very clear on. Things could be easier if the semantic there is compatible with Doug's proposal ;-) Regards, Michael Wang > > Regards, > Jason
Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and cap_mcast() for mcast-check
On 03/31/2015 01:47 AM, ira.weiny wrote:
> On Mon, Mar 30, 2015 at 06:20:48PM +0200, Michael Wang wrote:
>> [snip]

Cc: Jason Gunthorpe
Cc: Doug Ledford
Cc: Ira Weiny
Cc: Sean Hefty
Signed-off-by: Michael Wang
---
 drivers/infiniband/core/cma.c       |  2 +-
 drivers/infiniband/core/multicast.c |  8 ++++----
 include/rdma/ib_verbs.h             | 28 ++++++++++++++++++++++++++++
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 276fb76..cbbc85b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3398,7 +3398,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
 		ib_detach_mcast(id->qp,
 				&mc->multicast.ib->rec.mgid,
 				be16_to_cpu(mc->multicast.ib->rec.mlid));
-	if (rdma_transport_is_ib(id_priv->cma_dev->device)) {
+	if (has_mcast(id_priv->cma_dev->device)) {

>
> You need a similar check in rdma_join_multicast.

Thanks for the reminder, I'll give it a rework :-)

Regards,
Michael Wang

>
> Ira
>

 		switch (rdma_port_get_link_layer(id->device, id->port_num)) {
 		case IB_LINK_LAYER_INFINIBAND:
 			ib_sa_free_multicast(mc->multicast.ib);
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index 17573ff..ffeaf27 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -780,7 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler,
 	int index;

 	dev = container_of(handler, struct mcast_device, event_handler);
-	if (!rdma_port_ll_is_ib(dev->device, event->element.port_num))
+	if (!cap_mcast(dev->device, event->element.port_num))
 		return;

 	index = event->element.port_num - dev->start_port;
@@ -807,7 +807,7 @@ static void mcast_add_one(struct ib_device *device)
 	int i;
 	int count = 0;

-	if (!rdma_transport_is_ib(device))
+	if (!has_mcast(device))
 		return;

 	dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
@@ -823,7 +823,7 @@ static void mcast_add_one(struct ib_device *device)
 	}

 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (!rdma_port_ll_is_ib(device, dev->start_port + i))
+		if (!cap_mcast(device, dev->start_port + i))
 			continue;
 		port = &dev->port[i];
 		port->dev = dev;
@@ -861,7 +861,7 @@ static void mcast_remove_one(struct ib_device *device)
 	flush_workqueue(mcast_wq);

 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
-		if (rdma_port_ll_is_ib(device, dev->start_port + i)) {
+		if (cap_mcast(device, dev->start_port + i)) {
 			port = &dev->port[i];
 			deref_port(port);
 			wait_for_completion(&port->comp);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index fa8ffa3..e796104 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1823,6 +1823,19 @@ static inline int has_sa(struct ib_device *device)
 }

 /**
+ * has_mcast - Check if a device support Multicast.
+ *
+ * @device: Device to be checked
+ *
+ * Return 0 when a device has none port to support
+ * Multicast.
+ */
+static inline int has_mcast(struct ib_device *device)
+{
+	return rdma_transport_is_ib(device);
+}
+
+/**
  * cap_smi - Check if the port of device has the capability
  * Subnet Management Interface.
  *
@@ -1852,6 +1865,21 @@ static inline int cap_sa(struct ib_device *device, u8 port_num)
 	return rdma_port_ll_is_ib(device, port_num);
 }

+/**
+ * cap_mcast - Check if the port of device has the capability
+ * Multicast.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when port of the device don't support
+ * Multicast.
+ */
+static inline int cap_mcast(struct ib_device *device, u8 port_num)
+{
+	return rdma_port_ll_is_ib(device, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid);