Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa(), for sa-check

2015-03-31 Thread Jason Gunthorpe
On Tue, Mar 31, 2015 at 08:51:13PM -0400, ira.weiny wrote:

> > Going forward we want to NAK stuff like this:
> > 
> >   if (rdma_ib_mgmt() || rdma_opa_mgmt())
> >   if (has_sa() || has_opa_foobar())
> 
> I'm confused.
> 
> Is the idea that you would NAK has_sa but be in favor of has_ib_sa?

It is the || I don't want to see. Feature tests should be one thing.

> I believe it is reasonable to flag OPA MAD support as a feature set through a
> single bit.  From this the MAD stack can know if OPA MAD base/class versions
> are allowed (or treated differently from future IB versions) and if it should
> processing OPA SM Class DR MADs differently.  I don't want devices to be
> required to supply a list of MAD Base/Class Versions they support.

That seems reasonable, but we'd gain a is_ib_smp and is_opa_smp test
to do that.

> For example, the MAD size is more generally (and perhaps better) communicated
> via an actual mad_size attribute rather than as part of the OPA MAD feature
> set.

Start with a bit and change when that makes no sense, MTUs as ints
make sense.

> Another example is the question of is it appropriate to wrap up the single 
> READ
> SGL entry support within the "is iwarp" flag?

Again, stick with a single bit test until a HW vendors needs more,
then someone can clean it. At least they now know what to clean and
why.

Remember, this is in the kernel, we can change it when it outlives its
life.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa(), for sa-check

2015-03-31 Thread ira.weiny
On Tue, Mar 31, 2015 at 05:12:02PM -0600, Jason Gunthorpe wrote:
> On Mon, Mar 30, 2015 at 01:02:03PM -0400, Doug Ledford wrote:
> 
> > If we use something like this, then the above is all you need.  Then
> > every place in the code that checks for something like has_sa or cap_sa
> > can be replaced with rdma_ib_mgmt. 
> 
> I don't want to see this slide back into some ill defined feature
> test.
> 
> We need to clearly define exactly what these tests are checking, and I
> think, since we have so much code sharing, the checks should be narrow
> in scope, not just an entire standard.

Somewhat agree.

> 
> I think the basic family of checks Michael identified seemed like
> a reasonable start.
> 
> Going forward we want to NAK stuff like this:
> 
>   if (rdma_ib_mgmt() || rdma_opa_mgmt())
>   if (has_sa() || has_opa_foobar())

I'm confused.

Is the idea that you would NAK has_sa but be in favor of has_ib_sa?

I think cap_opa_mad is reasonable.  And this confuses me why has_opa_foobar
would be NAKed.

I believe it is reasonable to flag OPA MAD support as a feature set through a
single bit.  From this the MAD stack can know if OPA MAD base/class versions
are allowed (or treated differently from future IB versions) and if it should
processing OPA SM Class DR MADs differently.  I don't want devices to be
required to supply a list of MAD Base/Class Versions they support.

> 
> That indicates we need a new micro feature test.

The million dollar question is how micro we should go.

More specifically which feature sets can be appropriately communicated with a
single bit vs not.

For example, the MAD size is more generally (and perhaps better) communicated
via an actual mad_size attribute rather than as part of the OPA MAD feature
set.

Another example is the question of is it appropriate to wrap up the single READ
SGL entry support within the "is iwarp" flag?

https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23611.html

Right now there is implicit information being gleaned from the "is iwarp" flag.
That is that an iWarp device is limited to only 1 READ SGL entry.  As this is
the only device type which has this limitation _and_ that number is "well
known" it works.  Even if not the best solution.

The task at hand is to clean up implicit information passing like this while
not getting mired down in our own refactoring.

> 
> A big problem here is people working on the core may not know the
> intricate details of all the families. This will only get worse when
> proprietary tech like OPA gets added. Documenting requirements via a
> narrow feature test gives a much higher chance that core stuff will be
> right via review.

Agreed as long as we get the width right.

> 
> > But, like I said, this is an all or nothing change, it isn't
> > something we can ease into.
> 
> Well, we ease into it by introducing the micro tests and then wiping
> the legacy ones, followed by changing how the driver/core communicates
> the port and device feature set, ideally to a bitset like you've
> described.

I think this does need to be eased into.  We seem to agree that the feature
flags we have now are not granular and/or explicit enough.  So lets start
cleaning up some of these tests and see how it goes.

Ira

> 
> Michael has tackled the core code, another series could work on the
> drivers..
> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check

2015-03-31 Thread Jason Gunthorpe
On Mon, Mar 30, 2015 at 12:13:00PM -0400, Doug Ledford wrote:
> On Fri, 2015-03-27 at 16:47 +0100, Michael Wang wrote:
> > Introduce helper has_iwarp() to help us check if an IB device
> > support IWARP protocol.
> 
> This is a needless redirection.  Just stick with the original
> rdma_transport_is_iwarp().

Sticking with the original isn't really the point.

The point here, is to document what Tom was talking about - some ports
can only support one RDMA READ SGL entry, even though they support
multiple RDMA WRITE SGL entries. Today the query API assumes
READ/WRITE/SEND are symmetrical.

has_rdma_read_sgl() is a good way to document that for now, and is a big
flashing marker where something might need to be fixed in the
future.

This tells everyone reading the code and the API that when working
with RDMA READ they need to be aware of this limitation.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa(), for sa-check

2015-03-31 Thread Jason Gunthorpe
On Mon, Mar 30, 2015 at 01:02:03PM -0400, Doug Ledford wrote:

> If we use something like this, then the above is all you need.  Then
> every place in the code that checks for something like has_sa or cap_sa
> can be replaced with rdma_ib_mgmt. 

I don't want to see this slide back into some ill defined feature
test.

We need to clearly define exactly what these tests are checking, and I
think, since we have so much code sharing, the checks should be narrow
in scope, not just an entire standard.

I think the basic family of checks Michael identified seemed like
a reasonable start.

Going forward we want to NAK stuff like this:

  if (rdma_ib_mgmt() || rdma_opa_mgmt())
  if (has_sa() || has_opa_foobar())

That indicates we need a new micro feature test.

A big problem here is people working on the core may not know the
intricate details of all the families. This will only get worse when
proprietary tech like OPA gets added. Documenting requirements via a
narrow feature test gives a much higher chance that core stuff will be
right via review.

> But, like I said, this is an all or nothing change, it isn't
> something we can ease into.

Well, we ease into it by introducing the micro tests and then wiping
the legacy ones, followed by changing how the driver/core communicates
the port and device feature set, ideally to a bitset like you've
described.

Michael has tackled the core code, another series could work on the
drivers..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Hefty, Sean
> I must disagree. You made this claim back when we submitted the IB
> core patch which adds support for signature, protection etc offloads
> (commit 1b01d33560e7 "IB/core: Introduce signature verbs API" and the
> detailed replied you got were explaining that

And I'll continue to make this claim because it is true.

> Very similar arguments would apply to the ODP submission.

Yes - ODP will _have_ to change the core.  USNIC had to change the core, and 
had changes rejected that made perfect sense conceptually.  This is part of the 
problem!!!   If you disagree, then submit patches that don't touch the core.

> So examples please!

Sure - this is from Somnath's latest patch series:

Matan Barak (14):
  IB/core: Add RoCE GID cache
  IB/core: Add kref to IB devices
  IB/core: Add RoCE GID population
  IB/core: Add default GID for RoCE GID Cache
  net/bonding: make DRV macros private
  net: Add info for NETDEV_CHANGEUPPER event
  IB/core: Add RoCE cache bonding support
  IB/core: GID attribute should be returned from verbs API and cache API
  IB/core: Report gid_type and gid_ndev through sysfs
  IB/core: Support find sgid index using a filter function
  IB/core: Modify ib_verbs and cma in order to use roce_gid_cache
  IB/core: Add gid_type to path and rdma_id_private
  IB/core: Add rdma_network_type to wc
  IB/cma: Add configfs for rdma_cm

Moni Shoua (13):
  IB/mlx4: Remove gid table management for RoCE
  IB/mlx4: Replace spin_lock with rw_semaphore
  IB/mlx4: Lock with RCU instead of RTNL
  net/mlx4: Postpone the registration of net_device
  IB/mlx4: Advertise RoCE support in port capabilities
  IB/mlx4: Implement ib_device callback - get_netdev
  IB/mlx4: Implement ib_device callback - modify_gid
  IB/mlx4: Configure device to work in RoCEv2
  IB/mlx4: Translate cache gid index to real index
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
  IB/mlx4: Create and use another QP1 for RoCEv2
  IB/cma: Join and leave multicast groups with IGMP

That's a significant number of patches that modify the core rdma layer.  
NFSoRDMA had 5 different ways to register memory.

I agree that Roland's response time is ridiculously slow.  But he does tend to 
merge in new drivers and updates that only touch a single vendor's driver 
fairly quickly.  It's the thrashing on the core that sees significant delays.


Re: [PATCH 0/9] IB/ipoib: fixup multicast locking issues

2015-03-31 Thread Or Gerlitz
On Tue, Mar 31, 2015 at 8:04 PM, ira.weiny  wrote:
> On Sun, Mar 15, 2015 at 11:52:44AM -0700, Doug Ledford wrote:
>>
>> > On Mar 13, 2015, at 1:41 AM, Or Gerlitz  wrote:
>> >
>> > On Sun, Feb 22, 2015 at 2:26 AM, Doug Ledford  wrote:
>> >> This is the re-ordered, squashed version of my 22 patch set that I
>> >> posted on Feb 11.  There are a few minor differences between that
>> >> set and this one.  They are:
>> > [...]
>> >
>> > Doug, you wrote here a very detailed listing of the changes from
>> > earlier posts and the testing the patches went through, which is
>> > excellent. It would be very good if you can also post few liner
>> > telling the changes done by the series in high level, so we can have
>> > this test as part of a "merge" commit that says in the kernel logs.
>>
>> OK.  I would take what I had in the original message and expand upon it then:
>>
>> This entire patchset was intended to address the issue of ipoib
>> interfaces being brought up/down in a tight loop, which will hardlock
>> a standard v3.19 kernel.  It succeeds at resolving that problem.
>
>
> I pulled this series and did some medium weight testing on 3.19 (module
> reloads, insmod/rmmod, opensm restarts (client re-register)).  IPoIB recovered
> without issue on each of the tests.
> Tested-by: Ira Weiny 

Yep, here too. We tested upstream + this series and it works well.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Or Gerlitz
On Tue, Mar 31, 2015 at 8:27 PM, Hefty, Sean  wrote:


> I honestly think part of the issue here is general approach being taken by 
> the rdma stack, which is to expose every hardware specific component that a 
> vendor might define through the interfaces up to the consumers.

Sean,

I must disagree. You made this claim back when we submitted the IB
core patch which adds support for signature, protection etc offloads
(commit 1b01d33560e7 "IB/core: Introduce signature verbs API" and the
detailed replied you got were explaining that

1. this allows to offload SCSI T10 signature for RDMA block storage
protocols such as SRP and iSER
2. SCSI, SRP and iSER are all industry standards
3. T10 offload is done also by competing technologies such as Fibre-Channel
4. nothing, but exactly nothing in the verbs API is tied to specific
HCA implementation
5. etc

Very similar arguments would apply to the ODP submission.

So examples please!

Or.

This results in unwieldy interfaces that are under constant churn,
with no clear direction.  If most submissions to the rdma stack were
contained the lower-level drivers, I don't know that we would have the
issues that we do.  But a significant percentage of rdma patches
modify the rdma core software layer, which threaten the stability of
the entire stack.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Jason Gunthorpe
On Tue, Mar 31, 2015 at 12:13:58PM +0300, Sagi Grimberg wrote:

> We had a talk about a similar topic at LSFMM15. Linux-scsi subsystem
> is a large scale subsystem and also has the "single maintainer with
> limited time for upstream maintenance" bottleneck.
> 
> Christoph created the "scsi-queue" tree to feed James in order to
> accelerate the work process.

[..]

> One problem with that is that we need "a christoph" to poke people for
> review since not a lot of people giving review.

I have tried for a few years to get enough interested/qualified people
together to do this kind of system, but it just hasn't happened.

It feels like we are stuck in a loop: People and companies *do not*
want to work on drivers/infiniband because too often patches just die.
Asking people to work on it, and companies to fund work, gets met with
the above.

I think there is a broad agreement we need to change the system
here. So, lets hear from people and companies: put your name forward,
consistently provide time to do serious review work on all patches.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Hefty, Sean
> It doesn't take a lot of time to do this work, especially if you use
> the correct tools.  I can review and apply 100 patches a day, even
> when I'm on vacation.  I'm an extreme example, but what you're doing
> right now Roland is not acceptable and is not in agreement with your
> claims about how much you care about this subsystem.

I honestly think part of the issue here is general approach being taken by the 
rdma stack, which is to expose every hardware specific component that a vendor 
might define through the interfaces up to the consumers.  This results in 
unwieldy interfaces that are under constant churn, with no clear direction.  If 
most submissions to the rdma stack were contained the lower-level drivers, I 
don't know that we would have the issues that we do.  But a significant 
percentage of rdma patches modify the rdma core software layer, which threaten 
the stability of the entire stack.

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/9] IB/ipoib: fixup multicast locking issues

2015-03-31 Thread ira.weiny
On Sun, Mar 15, 2015 at 11:52:44AM -0700, Doug Ledford wrote:
> 
> > On Mar 13, 2015, at 1:41 AM, Or Gerlitz  wrote:
> > 
> > On Sun, Feb 22, 2015 at 2:26 AM, Doug Ledford  wrote:
> >> This is the re-ordered, squashed version of my 22 patch set that I
> >> posted on Feb 11.  There are a few minor differences between that
> >> set and this one.  They are:
> > [...]
> > 
> > Doug, you wrote here a very detailed listing of the changes from
> > earlier posts and the testing the patches went through, which is
> > excellent. It would be very good if you can also post few liner
> > telling the changes done by the series in high level, so we can have
> > this test as part of a "merge" commit that says in the kernel logs.
> 
> OK.  I would take what I had in the original message and expand upon it then:
> 
> This entire patchset was intended to address the issue of ipoib
> interfaces being brought up/down in a tight loop, which will hardlock
> a standard v3.19 kernel.  It succeeds at resolving that problem.


I pulled this series and did some medium weight testing on 3.19 (module
reloads, insmod/rmmod, opensm restarts (client re-register)).  IPoIB recovered
without issue on each of the tests.

Tested-by: Ira Weiny 

> 
> In order to accomplish this goal, it reworks how the IPOIB_MCAST_FLAG_BUSY 
> flag is used.  Conceptually, that flag used to be set when we started a 
> multicast join, and would stay set once the join was complete.  This left no 
> way to tell if the multicast join was complete or still in flight.  This 
> allowed race conditions to develop between joining multicast groups and 
> taking an interface down.  A previous attempt to resolve these race 
> conditions used the flag IPOIB_MCAST_JOIN_STARTED, but did not succeed at 
> fully resolving the race conditions.  This patchset resolves this issue, plus 
> a number of related issues discovered while working on this issue.  The 
> primary fix itself is patch 6/9 and a more complete description of the 
> changes to how the IPOIB_MCAST_FLAG_BUSY flag is now used can be found in 
> that commit log.
> 
> —
> Doug Ledford 
>   GPG Key ID: 0E572FDD
> 
> 
> 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 13/18] IB/iser: Make fastreg pool cache friendly

2015-03-31 Thread Sagi Grimberg
Memory regions are resources that are saved
in the device caches. Increase the probability for
a cache hit by adding the MRU descriptor to pool
head.

Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iser_memory.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index 17a5d70..45f5120 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -63,7 +63,7 @@ iser_reg_desc_put(struct ib_conn *ib_conn,
unsigned long flags;
 
spin_lock_irqsave(&ib_conn->lock, flags);
-   list_add_tail(&desc->list, &ib_conn->fastreg.pool);
+   list_add(&desc->list, &ib_conn->fastreg.pool);
spin_unlock_irqrestore(&ib_conn->lock, flags);
 }
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 07/18] IB/iser: Move memory reg/dereg routines to iser_memory.c

2015-03-31 Thread Sagi Grimberg
As memory registration/de-registration methods, lets
move them to their natural location. While we're at it,
make iser_reg_page_vec routine static.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h  |4 -
 drivers/infiniband/ulp/iser/iser_memory.c |   88 +
 drivers/infiniband/ulp/iser/iser_verbs.c  |   87 
 3 files changed, 88 insertions(+), 91 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 5c7036c..d5e5288 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -632,10 +632,6 @@ int  iser_connect(struct iser_conn *iser_conn,
  struct sockaddr *dst_addr,
  int non_blocking);
 
-int  iser_reg_page_vec(struct ib_conn *ib_conn,
-  struct iser_page_vec *page_vec,
-  struct iser_mem_reg *mem_reg);
-
 void iser_unreg_mem_fmr(struct iscsi_iser_task *iser_task,
enum iser_data_dir cmd_dir);
 void iser_unreg_mem_fastreg(struct iscsi_iser_task *iser_task,
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index 9c60ff1..4e0cbbb 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -362,6 +362,94 @@ static int fall_to_bounce_buf(struct iscsi_iser_task 
*iser_task,
 }
 
 /**
+ * iser_reg_page_vec - Register physical memory
+ *
+ * returns: 0 on success, errno code on failure
+ */
+static
+int iser_reg_page_vec(struct ib_conn *ib_conn,
+ struct iser_page_vec *page_vec,
+ struct iser_mem_reg  *mem_reg)
+{
+   struct ib_pool_fmr *mem;
+   u64io_addr;
+   u64*page_list;
+   intstatus;
+
+   page_list = page_vec->pages;
+   io_addr   = page_list[0];
+
+   mem  = ib_fmr_pool_map_phys(ib_conn->fmr.pool,
+   page_list,
+   page_vec->length,
+   io_addr);
+
+   if (IS_ERR(mem)) {
+   status = (int)PTR_ERR(mem);
+   iser_err("ib_fmr_pool_map_phys failed: %d\n", status);
+   return status;
+   }
+
+   mem_reg->lkey  = mem->fmr->lkey;
+   mem_reg->rkey  = mem->fmr->rkey;
+   mem_reg->len   = page_vec->length * SIZE_4K;
+   mem_reg->va= io_addr;
+   mem_reg->mem_h = (void *)mem;
+
+   mem_reg->va   += page_vec->offset;
+   mem_reg->len   = page_vec->data_size;
+
+   iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, "
+"entry[0]: (0x%08lx,%ld)] -> "
+"[lkey: 0x%08X mem_h: 0x%p va: 0x%08lX sz: %ld]\n",
+page_vec, page_vec->length,
+(unsigned long)page_vec->pages[0],
+(unsigned long)page_vec->data_size,
+(unsigned int)mem_reg->lkey, mem_reg->mem_h,
+(unsigned long)mem_reg->va, (unsigned long)mem_reg->len);
+   return 0;
+}
+
+/**
+ * Unregister (previosuly registered using FMR) memory.
+ * If memory is non-FMR does nothing.
+ */
+void iser_unreg_mem_fmr(struct iscsi_iser_task *iser_task,
+   enum iser_data_dir cmd_dir)
+{
+   struct iser_mem_reg *reg = &iser_task->rdma_regd[cmd_dir].reg;
+   int ret;
+
+   if (!reg->mem_h)
+   return;
+
+   iser_dbg("PHYSICAL Mem.Unregister mem_h %p\n", reg->mem_h);
+
+   ret = ib_fmr_pool_unmap((struct ib_pool_fmr *)reg->mem_h);
+   if (ret)
+   iser_err("ib_fmr_pool_unmap failed %d\n", ret);
+
+   reg->mem_h = NULL;
+}
+
+void iser_unreg_mem_fastreg(struct iscsi_iser_task *iser_task,
+   enum iser_data_dir cmd_dir)
+{
+   struct iser_mem_reg *reg = &iser_task->rdma_regd[cmd_dir].reg;
+   struct iser_conn *iser_conn = iser_task->iser_conn;
+   struct ib_conn *ib_conn = &iser_conn->ib_conn;
+   struct fast_reg_descriptor *desc = reg->mem_h;
+
+   if (!desc)
+   return;
+
+   reg->mem_h = NULL;
+   spin_lock_bh(&ib_conn->lock);
+   list_add_tail(&desc->list, &ib_conn->fastreg.pool);
+   spin_unlock_bh(&ib_conn->lock);
+}
+
+/**
  * iser_reg_rdma_mem_fmr - Registers memory intended for RDMA,
  * using FMR (if possible) obtaining rkey and va
  *
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c 
b/drivers/infiniband/ulp/iser/iser_verbs.c
index 7ee4926..986b5f4 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -992,93 +992,6 @@ connect_failure:
return err;
 }
 
-/**
- * iser_reg_page_vec - Register physical memory
- *
- * returns: 0 on success, errno code on failure
- */
-int iser_reg_page_vec(struct ib_conn *ib_conn,
- struct 

[PATCH v2 08/18] IB/iser: Remove redundant assignments in iser_reg_page_vec

2015-03-31 Thread Sagi Grimberg
Buffer length was assigned twice, and no reason to set va to
io_addr and then add the offset, just set va to io_addr + offset.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iser_memory.c |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index 4e0cbbb..cb30865 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -392,12 +392,9 @@ int iser_reg_page_vec(struct ib_conn *ib_conn,
 
mem_reg->lkey  = mem->fmr->lkey;
mem_reg->rkey  = mem->fmr->rkey;
-   mem_reg->len   = page_vec->length * SIZE_4K;
-   mem_reg->va= io_addr;
-   mem_reg->mem_h = (void *)mem;
-
-   mem_reg->va   += page_vec->offset;
mem_reg->len   = page_vec->data_size;
+   mem_reg->va= io_addr + page_vec->offset;
+   mem_reg->mem_h = (void *)mem;
 
iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, "
 "entry[0]: (0x%08lx,%ld)] -> "
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 11/18] IB/iser: Move fastreg descriptor pool get/put to helper functions

2015-03-31 Thread Sagi Grimberg
Instead of open-coding connection fastreg pool get/put,
we introduce iser_reg_desc[get|put] helpers.

We aren't setting these static as this will be a per-device
routine later on. Also, cleanup iser_unreg_rdma_mem_fastreg
a bit.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h  |5 +++
 drivers/infiniband/ulp/iser/iser_memory.c |   50 ++--
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 8133678..185d2ec 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -644,4 +644,9 @@ int iser_create_fastreg_pool(struct ib_conn *ib_conn, 
unsigned cmds_max);
 void iser_free_fastreg_pool(struct ib_conn *ib_conn);
 u8 iser_check_task_pi_status(struct iscsi_iser_task *iser_task,
 enum iser_data_dir cmd_dir, sector_t *sector);
+struct fast_reg_descriptor *
+iser_reg_desc_get(struct ib_conn *ib_conn);
+void
+iser_reg_desc_put(struct ib_conn *ib_conn,
+ struct fast_reg_descriptor *desc);
 #endif
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index 6e6b753..17a5d70 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -41,6 +41,32 @@
 
 #define ISER_KMALLOC_THRESHOLD 0x2 /* 128K - kmalloc limit */
 
+struct fast_reg_descriptor *
+iser_reg_desc_get(struct ib_conn *ib_conn)
+{
+   struct fast_reg_descriptor *desc;
+   unsigned long flags;
+
+   spin_lock_irqsave(&ib_conn->lock, flags);
+   desc = list_first_entry(&ib_conn->fastreg.pool,
+   struct fast_reg_descriptor, list);
+   list_del(&desc->list);
+   spin_unlock_irqrestore(&ib_conn->lock, flags);
+
+   return desc;
+}
+
+void
+iser_reg_desc_put(struct ib_conn *ib_conn,
+ struct fast_reg_descriptor *desc)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&ib_conn->lock, flags);
+   list_add_tail(&desc->list, &ib_conn->fastreg.pool);
+   spin_unlock_irqrestore(&ib_conn->lock, flags);
+}
+
 /**
  * iser_start_rdma_unaligned_sg
  */
@@ -409,17 +435,13 @@ void iser_unreg_mem_fastreg(struct iscsi_iser_task 
*iser_task,
enum iser_data_dir cmd_dir)
 {
struct iser_mem_reg *reg = &iser_task->rdma_reg[cmd_dir];
-   struct iser_conn *iser_conn = iser_task->iser_conn;
-   struct ib_conn *ib_conn = &iser_conn->ib_conn;
-   struct fast_reg_descriptor *desc = reg->mem_h;
 
-   if (!desc)
+   if (!reg->mem_h)
return;
 
+   iser_reg_desc_put(&iser_task->iser_conn->ib_conn,
+ reg->mem_h);
reg->mem_h = NULL;
-   spin_lock_bh(&ib_conn->lock);
-   list_add_tail(&desc->list, &ib_conn->fastreg.pool);
-   spin_unlock_bh(&ib_conn->lock);
 }
 
 /**
@@ -725,7 +747,6 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task 
*iser_task,
struct fast_reg_descriptor *desc = NULL;
struct ib_sge data_sge;
int err, aligned_len;
-   unsigned long flags;
 
aligned_len = iser_data_buf_aligned_len(mem, ibdev);
if (aligned_len != mem->dma_nents) {
@@ -739,11 +760,7 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task 
*iser_task,
 
if (mem->dma_nents != 1 ||
scsi_get_prot_op(iser_task->sc) != SCSI_PROT_NORMAL) {
-   spin_lock_irqsave(&ib_conn->lock, flags);
-   desc = list_first_entry(&ib_conn->fastreg.pool,
-   struct fast_reg_descriptor, list);
-   list_del(&desc->list);
-   spin_unlock_irqrestore(&ib_conn->lock, flags);
+   desc = iser_reg_desc_get(ib_conn);
mem_reg->mem_h = desc;
}
 
@@ -799,11 +816,8 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task 
*iser_task,
 
return 0;
 err_reg:
-   if (desc) {
-   spin_lock_irqsave(&ib_conn->lock, flags);
-   list_add_tail(&desc->list, &ib_conn->fastreg.pool);
-   spin_unlock_irqrestore(&ib_conn->lock, flags);
-   }
+   if (desc)
+   iser_reg_desc_put(ib_conn, desc);
 
return err;
 }
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 09/18] IB/iser: Get rid of struct iser_rdma_regd

2015-03-31 Thread Sagi Grimberg
This struct members other than struct iser_mem_reg are unused,
so remove it altogether.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |   21 +-
 drivers/infiniband/ulp/iser/iser_initiator.c |   44 ++--
 drivers/infiniband/ulp/iser/iser_memory.c|   56 +-
 drivers/infiniband/ulp/iser/iser_verbs.c |2 +-
 4 files changed, 53 insertions(+), 70 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index d5e5288..8133678 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -261,23 +261,6 @@ struct iser_mem_reg {
void *mem_h;
 };
 
-/**
- * struct iser_regd_buf - iSER buffer registration desc
- *
- * @reg:  memory registration info
- * @virt_addr:virtual address of buffer
- * @device:   reference to iser device
- * @direction:dma direction (for dma_unmap)
- * @data_size:data buffer size in bytes
- */
-struct iser_regd_buf {
-   struct iser_mem_reg reg;
-   void*virt_addr;
-   struct iser_device  *device;
-   enum dma_data_direction direction;
-   unsigned intdata_size;
-};
-
 enum iser_desc_type {
ISCSI_TX_CONTROL ,
ISCSI_TX_SCSI_COMMAND,
@@ -537,7 +520,7 @@ struct iser_conn {
  * @sc:   link to scsi command
  * @command_sent: indicate if command was sent
  * @dir:  iser data direction
- * @rdma_regd:task rdma registration desc
+ * @rdma_reg: task rdma registration desc
  * @data: iser data buffer desc
  * @prot: iser protection buffer desc
  */
@@ -548,7 +531,7 @@ struct iscsi_iser_task {
struct scsi_cmnd *sc;
int  command_sent;
int  dir[ISER_DIRS_NUM];
-   struct iser_regd_buf rdma_regd[ISER_DIRS_NUM];
+   struct iser_mem_reg  rdma_reg[ISER_DIRS_NUM];
struct iser_data_buf data[ISER_DIRS_NUM];
struct iser_data_buf prot[ISER_DIRS_NUM];
 };
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index 0e414db..420a613 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -50,7 +50,7 @@ static int iser_prepare_read_cmd(struct iscsi_task *task)
 {
struct iscsi_iser_task *iser_task = task->dd_data;
struct iser_device  *device = iser_task->iser_conn->ib_conn.device;
-   struct iser_regd_buf *regd_buf;
+   struct iser_mem_reg *mem_reg;
int err;
struct iser_hdr *hdr = &iser_task->desc.iser_header;
struct iser_data_buf *buf_in = &iser_task->data[ISER_DIR_IN];
@@ -78,15 +78,15 @@ static int iser_prepare_read_cmd(struct iscsi_task *task)
iser_err("Failed to set up Data-IN RDMA\n");
return err;
}
-   regd_buf = &iser_task->rdma_regd[ISER_DIR_IN];
+   mem_reg = &iser_task->rdma_reg[ISER_DIR_IN];
 
hdr->flags|= ISER_RSV;
-   hdr->read_stag = cpu_to_be32(regd_buf->reg.rkey);
-   hdr->read_va   = cpu_to_be64(regd_buf->reg.va);
+   hdr->read_stag = cpu_to_be32(mem_reg->rkey);
+   hdr->read_va   = cpu_to_be64(mem_reg->va);
 
iser_dbg("Cmd itt:%d READ tags RKEY:%#.4X VA:%#llX\n",
-task->itt, regd_buf->reg.rkey,
-(unsigned long long)regd_buf->reg.va);
+task->itt, mem_reg->rkey,
+(unsigned long long)mem_reg->va);
 
return 0;
 }
@@ -104,7 +104,7 @@ iser_prepare_write_cmd(struct iscsi_task *task,
 {
struct iscsi_iser_task *iser_task = task->dd_data;
struct iser_device  *device = iser_task->iser_conn->ib_conn.device;
-   struct iser_regd_buf *regd_buf;
+   struct iser_mem_reg *mem_reg;
int err;
struct iser_hdr *hdr = &iser_task->desc.iser_header;
struct iser_data_buf *buf_out = &iser_task->data[ISER_DIR_OUT];
@@ -134,25 +134,25 @@ iser_prepare_write_cmd(struct iscsi_task *task,
return err;
}
 
-   regd_buf = &iser_task->rdma_regd[ISER_DIR_OUT];
+   mem_reg = &iser_task->rdma_reg[ISER_DIR_OUT];
 
if (unsol_sz < edtl) {
hdr->flags |= ISER_WSV;
-   hdr->write_stag = cpu_to_be32(regd_buf->reg.rkey);
-   hdr->write_va   = cpu_to_be64(regd_buf->reg.va + unsol_sz);
+   hdr->write_stag = cpu_to_be32(mem_reg->rkey);
+   hdr->write_va   = cpu_to_be64(mem_reg->va + unsol_sz);
 
iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X "
 "VA:%#llX + unsol:%d\n",
-task->itt, regd_buf->reg.rkey,
-(unsigned long long)regd_buf->reg.va, un

[PATCH v2 01/18] IB/iser: Fix unload during ep_poll wrong dereference

2015-03-31 Thread Sagi Grimberg
In case the user unloaded ib_iser while ep_connect is in
progress, we need to destroy the endpoint although ep_disconnect
wasn't invoked (we detect this by the iser conn state != DOWN).
However, if we got an REJECTED/UNREACHABLE CM event we move the
connection state to DOWN which will prevent us from destroying
the endpoint in the module unload stage. Fix this by setting the
connection state to TERMINATING in iser_conn_error so we can still
destroy the endpoint at unload stage.

Reported-by: Ariel Nahum 
Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iser_verbs.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c 
b/drivers/infiniband/ulp/iser/iser_verbs.c
index 4065abe..070c5af 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -721,7 +721,7 @@ static void iser_connect_error(struct rdma_cm_id *cma_id)
struct iser_conn *iser_conn;
 
iser_conn = (struct iser_conn *)cma_id->context;
-   iser_conn->state = ISER_CONN_DOWN;
+   iser_conn->state = ISER_CONN_TERMINATING;
 }
 
 /**
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 12/18] IB/iser: Move PI context alloc/free to routines

2015-03-31 Thread Sagi Grimberg
Make iser_[create|destroy]_fastreg_desc shorter, more
readable and easily extendable.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iser_verbs.c |  118 --
 1 files changed, 63 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c 
b/drivers/infiniband/ulp/iser/iser_verbs.c
index 20eec09..cc2dd35 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -274,6 +274,65 @@ void iser_free_fmr_pool(struct ib_conn *ib_conn)
 }
 
 static int
+iser_alloc_pi_ctx(struct ib_device *ib_device, struct ib_pd *pd,
+ struct fast_reg_descriptor *desc)
+{
+   struct iser_pi_context *pi_ctx = NULL;
+   struct ib_mr_init_attr mr_init_attr = {.max_reg_descriptors = 2,
+  .flags = IB_MR_SIGNATURE_EN};
+   int ret = 0;
+
+   desc->pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL);
+   if (!desc->pi_ctx)
+   return -ENOMEM;
+
+   pi_ctx = desc->pi_ctx;
+
+   pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(ib_device,
+   ISCSI_ISER_SG_TABLESIZE);
+   if (IS_ERR(pi_ctx->prot_frpl)) {
+   ret = PTR_ERR(pi_ctx->prot_frpl);
+   goto prot_frpl_failure;
+   }
+
+   pi_ctx->prot_mr = ib_alloc_fast_reg_mr(pd,
+   ISCSI_ISER_SG_TABLESIZE + 1);
+   if (IS_ERR(pi_ctx->prot_mr)) {
+   ret = PTR_ERR(pi_ctx->prot_mr);
+   goto prot_mr_failure;
+   }
+   desc->reg_indicators |= ISER_PROT_KEY_VALID;
+
+   pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr);
+   if (IS_ERR(pi_ctx->sig_mr)) {
+   ret = PTR_ERR(pi_ctx->sig_mr);
+   goto sig_mr_failure;
+   }
+   desc->reg_indicators |= ISER_SIG_KEY_VALID;
+   desc->reg_indicators &= ~ISER_FASTREG_PROTECTED;
+
+   return 0;
+
+sig_mr_failure:
+   ib_dereg_mr(desc->pi_ctx->prot_mr);
+prot_mr_failure:
+   ib_free_fast_reg_page_list(desc->pi_ctx->prot_frpl);
+prot_frpl_failure:
+   kfree(desc->pi_ctx);
+
+   return ret;
+}
+
+static void
+iser_free_pi_ctx(struct iser_pi_context *pi_ctx)
+{
+   ib_free_fast_reg_page_list(pi_ctx->prot_frpl);
+   ib_dereg_mr(pi_ctx->prot_mr);
+   ib_destroy_mr(pi_ctx->sig_mr);
+   kfree(pi_ctx);
+}
+
+static int
 iser_create_fastreg_desc(struct ib_device *ib_device, struct ib_pd *pd,
 bool pi_enable, struct fast_reg_descriptor *desc)
 {
@@ -297,59 +356,12 @@ iser_create_fastreg_desc(struct ib_device *ib_device, 
struct ib_pd *pd,
desc->reg_indicators |= ISER_DATA_KEY_VALID;
 
if (pi_enable) {
-   struct ib_mr_init_attr mr_init_attr = {0};
-   struct iser_pi_context *pi_ctx = NULL;
-
-   desc->pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL);
-   if (!desc->pi_ctx) {
-   iser_err("Failed to allocate pi context\n");
-   ret = -ENOMEM;
+   ret = iser_alloc_pi_ctx(ib_device, pd, desc);
+   if (ret)
goto pi_ctx_alloc_failure;
-   }
-   pi_ctx = desc->pi_ctx;
-
-   pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(ib_device,
-   ISCSI_ISER_SG_TABLESIZE);
-   if (IS_ERR(pi_ctx->prot_frpl)) {
-   ret = PTR_ERR(pi_ctx->prot_frpl);
-   iser_err("Failed to allocate prot frpl ret=%d\n",
-ret);
-   goto prot_frpl_failure;
-   }
-
-   pi_ctx->prot_mr = ib_alloc_fast_reg_mr(pd,
-   ISCSI_ISER_SG_TABLESIZE + 1);
-   if (IS_ERR(pi_ctx->prot_mr)) {
-   ret = PTR_ERR(pi_ctx->prot_mr);
-   iser_err("Failed to allocate prot frmr ret=%d\n",
-ret);
-   goto prot_mr_failure;
-   }
-   desc->reg_indicators |= ISER_PROT_KEY_VALID;
-
-   mr_init_attr.max_reg_descriptors = 2;
-   mr_init_attr.flags |= IB_MR_SIGNATURE_EN;
-   pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr);
-   if (IS_ERR(pi_ctx->sig_mr)) {
-   ret = PTR_ERR(pi_ctx->sig_mr);
-   iser_err("Failed to allocate signature enabled mr 
err=%d\n",
-ret);
-   goto sig_mr_failure;
-   }
-   desc->reg_indicators |= ISER_SIG_KEY_VALID;
}
-   desc->reg_indicators &= ~ISER_FASTREG_PROTECTED;
-
-   iser_dbg("Create fr_desc %p page_list %p\n",
-desc, desc->data_frpl->page_list);
 
retur

[PATCH v2 16/18] IB/iser: Remove code duplication for a single DMA entry

2015-03-31 Thread Sagi Grimberg
In singleton scatterlists, DMA memory registration code
is taken both for Fastreg and FMR code paths. Move it to
a function.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iser_memory.c |   48 
 1 files changed, 21 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index b4b0098..e78ce53 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -334,6 +334,24 @@ void iser_dma_unmap_task_data(struct iscsi_iser_task 
*iser_task,
ib_dma_unmap_sg(dev, data->sg, data->size, dir);
 }
 
+static int
+iser_reg_dma(struct iser_device *device, struct iser_data_buf *mem,
+struct iser_mem_reg *reg)
+{
+   struct scatterlist *sg = mem->sg;
+
+   reg->sge.lkey = device->mr->lkey;
+   reg->rkey = device->mr->rkey;
+   reg->sge.addr = ib_sg_dma_address(device->ib_device, &sg[0]);
+   reg->sge.length = ib_sg_dma_len(device->ib_device, &sg[0]);
+
+   iser_dbg("Single DMA entry: lkey=0x%x, rkey=0x%x, addr=0x%llx,"
+" length=0x%x\n", reg->sge.lkey, reg->rkey,
+reg->sge.addr, reg->sge.length);
+
+   return 0;
+}
+
 static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task,
  struct iser_data_buf *mem,
  enum iser_data_dir cmd_dir,
@@ -461,7 +479,6 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task,
int aligned_len;
int err;
int i;
-   struct scatterlist *sg;
 
mem_reg = &iser_task->rdma_reg[cmd_dir];
 
@@ -477,19 +494,7 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task 
*iser_task,
 
/* if there a single dma entry, FMR is not needed */
if (mem->dma_nents == 1) {
-   sg = mem->sg;
-
-   mem_reg->sge.lkey = device->mr->lkey;
-   mem_reg->rkey = device->mr->rkey;
-   mem_reg->sge.length = ib_sg_dma_len(ibdev, &sg[0]);
-   mem_reg->sge.addr = ib_sg_dma_address(ibdev, &sg[0]);
-
-   iser_dbg("PHYSICAL Mem.register: lkey: 0x%08X rkey: 0x%08X  "
-"va: 0x%08lX sz: %ld]\n",
-(unsigned int)mem_reg->sge.lkey,
-(unsigned int)mem_reg->rkey,
-(unsigned long)mem_reg->sge.addr,
-(unsigned long)mem_reg->sge.length);
+   return iser_reg_dma(device, mem, mem_reg);
} else { /* use FMR for multiple dma entries */
err = iser_reg_page_vec(iser_task, mem, ib_conn->fmr.page_vec,
mem_reg);
@@ -660,7 +665,6 @@ static int iser_fast_reg_mr(struct iscsi_iser_task 
*iser_task,
struct fast_reg_descriptor *desc = mem_reg->mem_h;
struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
struct iser_device *device = ib_conn->device;
-   struct ib_device *ibdev = device->ib_device;
struct ib_mr *mr;
struct ib_fast_reg_page_list *frpl;
struct ib_send_wr fastreg_wr, inv_wr;
@@ -668,18 +672,8 @@ static int iser_fast_reg_mr(struct iscsi_iser_task 
*iser_task,
int ret, offset, size, plen;
 
/* if there a single dma entry, dma mr suffices */
-   if (mem->dma_nents == 1) {
-   struct scatterlist *sg = mem->sg;
-
-   reg->sge.lkey = device->mr->lkey;
-   reg->rkey = device->mr->rkey;
-   reg->sge.addr = ib_sg_dma_address(ibdev, &sg[0]);
-   reg->sge.length = ib_sg_dma_len(ibdev, &sg[0]);
-
-   iser_dbg("Single DMA entry: lkey=0x%x, addr=0x%llx, 
length=0x%x\n",
-reg->sge.lkey, reg->sge.addr, reg->sge.length);
-   return 0;
-   }
+   if (mem->dma_nents == 1)
+   return iser_reg_dma(device, mem, mem_reg);
 
if (ind == ISER_DATA_KEY_VALID) {
mr = desc->data_mr;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 15/18] IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr

2015-03-31 Thread Sagi Grimberg
Instead of passing ib_sge as output variable, we pass the mem_reg
pointer to have the routines fill the rkey as well. This reduces
code duplication and extra assignments. This is a preparation step
to unify some registration logics together.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iser_memory.c |   68 
 1 files changed, 29 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index 40d22d5..b4b0098 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -590,8 +590,10 @@ iser_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr)
 
 static int
 iser_reg_sig_mr(struct iscsi_iser_task *iser_task,
-   struct fast_reg_descriptor *desc, struct ib_sge *data_sge,
-   struct ib_sge *prot_sge, struct ib_sge *sig_sge)
+   struct fast_reg_descriptor *desc,
+   struct iser_mem_reg *data_reg,
+   struct iser_mem_reg *prot_reg,
+   struct iser_mem_reg *sig_reg)
 {
struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
struct iser_pi_context *pi_ctx = desc->pi_ctx;
@@ -615,12 +617,12 @@ iser_reg_sig_mr(struct iscsi_iser_task *iser_task,
memset(&sig_wr, 0, sizeof(sig_wr));
sig_wr.opcode = IB_WR_REG_SIG_MR;
sig_wr.wr_id = ISER_FASTREG_LI_WRID;
-   sig_wr.sg_list = data_sge;
+   sig_wr.sg_list = &data_reg->sge;
sig_wr.num_sge = 1;
sig_wr.wr.sig_handover.sig_attrs = &sig_attrs;
sig_wr.wr.sig_handover.sig_mr = pi_ctx->sig_mr;
if (scsi_prot_sg_count(iser_task->sc))
-   sig_wr.wr.sig_handover.prot = prot_sge;
+   sig_wr.wr.sig_handover.prot = &prot_reg->sge;
sig_wr.wr.sig_handover.access_flags = IB_ACCESS_LOCAL_WRITE |
  IB_ACCESS_REMOTE_READ |
  IB_ACCESS_REMOTE_WRITE;
@@ -637,13 +639,14 @@ iser_reg_sig_mr(struct iscsi_iser_task *iser_task,
}
desc->reg_indicators &= ~ISER_SIG_KEY_VALID;
 
-   sig_sge->lkey = pi_ctx->sig_mr->lkey;
-   sig_sge->addr = 0;
-   sig_sge->length = scsi_transfer_length(iser_task->sc);
+   sig_reg->sge.lkey = pi_ctx->sig_mr->lkey;
+   sig_reg->rkey = pi_ctx->sig_mr->rkey;
+   sig_reg->sge.addr = 0;
+   sig_reg->sge.length = scsi_transfer_length(iser_task->sc);
 
-   iser_dbg("sig_sge: addr: 0x%llx  length: %u lkey: 0x%x\n",
-sig_sge->addr, sig_sge->length,
-sig_sge->lkey);
+   iser_dbg("sig_sge: lkey: 0x%x, rkey: 0x%x, addr: 0x%llx, length: %u\n",
+sig_reg->sge.lkey, sig_reg->rkey, sig_reg->sge.addr,
+sig_reg->sge.length);
 err:
return ret;
 }
@@ -652,7 +655,7 @@ static int iser_fast_reg_mr(struct iscsi_iser_task 
*iser_task,
struct iser_mem_reg *mem_reg,
struct iser_data_buf *mem,
enum iser_reg_indicator ind,
-   struct ib_sge *sge)
+   struct iser_mem_reg *reg)
 {
struct fast_reg_descriptor *desc = mem_reg->mem_h;
struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
@@ -668,12 +671,13 @@ static int iser_fast_reg_mr(struct iscsi_iser_task 
*iser_task,
if (mem->dma_nents == 1) {
struct scatterlist *sg = mem->sg;
 
-   sge->lkey = device->mr->lkey;
-   sge->addr   = ib_sg_dma_address(ibdev, &sg[0]);
-   sge->length  = ib_sg_dma_len(ibdev, &sg[0]);
+   reg->sge.lkey = device->mr->lkey;
+   reg->rkey = device->mr->rkey;
+   reg->sge.addr = ib_sg_dma_address(ibdev, &sg[0]);
+   reg->sge.length = ib_sg_dma_len(ibdev, &sg[0]);
 
iser_dbg("Single DMA entry: lkey=0x%x, addr=0x%llx, 
length=0x%x\n",
-sge->lkey, sge->addr, sge->length);
+reg->sge.lkey, reg->sge.addr, reg->sge.length);
return 0;
}
 
@@ -723,9 +727,10 @@ static int iser_fast_reg_mr(struct iscsi_iser_task 
*iser_task,
}
desc->reg_indicators &= ~ind;
 
-   sge->lkey = mr->lkey;
-   sge->addr = frpl->page_list[0] + offset;
-   sge->length = size;
+   reg->sge.lkey = mr->lkey;
+   reg->rkey = mr->rkey;
+   reg->sge.addr = frpl->page_list[0] + offset;
+   reg->sge.length = size;
 
return ret;
 }
@@ -745,7 +750,6 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task 
*iser_task,
struct iser_data_buf *mem = &iser_task->data[cmd_dir];
struct iser_mem_reg *mem_reg = &iser_task->rdma_reg[cmd_dir];
struct fast_reg_descriptor *desc = NULL;
-   struct ib_sge data_sge;
int err, aligned_len;
 

[PATCH v2 10/18] IB/iser: Merge build page-vec into register page-vec

2015-03-31 Thread Sagi Grimberg
No need for these two separate. Keep it in a single routine
like in the fastreg case. This will also make iser_reg_page_vec
closer to iser_fast_reg_mr arguments. This is a preparation
step for registration flow refactor.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iser_memory.c |   91 ++--
 1 files changed, 33 insertions(+), 58 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index 0b8656f..6e6b753 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -280,31 +280,6 @@ static void iser_dump_page_vec(struct iser_page_vec 
*page_vec)
iser_err("%d %lx\n",i,(unsigned long)page_vec->pages[i]);
 }
 
-static void iser_page_vec_build(struct iser_data_buf *data,
-   struct iser_page_vec *page_vec,
-   struct ib_device *ibdev)
-{
-   int page_vec_len = 0;
-
-   page_vec->length = 0;
-   page_vec->offset = 0;
-
-   iser_dbg("Translating sg sz: %d\n", data->dma_nents);
-   page_vec_len = iser_sg_to_page_vec(data, ibdev, page_vec->pages,
-  &page_vec->offset,
-  &page_vec->data_size);
-   iser_dbg("sg len %d page_vec_len %d\n", data->dma_nents, page_vec_len);
-
-   page_vec->length = page_vec_len;
-
-   if (page_vec_len * SIZE_4K < page_vec->data_size) {
-   iser_err("page_vec too short to hold this SG\n");
-   iser_data_buf_dump(data, ibdev);
-   iser_dump_page_vec(page_vec);
-   BUG();
-   }
-}
-
 int iser_dma_map_task_data(struct iscsi_iser_task *iser_task,
struct iser_data_buf *data,
enum iser_data_dir iser_dir,
@@ -367,43 +342,44 @@ static int fall_to_bounce_buf(struct iscsi_iser_task 
*iser_task,
  * returns: 0 on success, errno code on failure
  */
 static
-int iser_reg_page_vec(struct ib_conn *ib_conn,
+int iser_reg_page_vec(struct iscsi_iser_task *iser_task,
+ struct iser_data_buf *mem,
  struct iser_page_vec *page_vec,
- struct iser_mem_reg  *mem_reg)
+ struct iser_mem_reg *mem_reg)
 {
-   struct ib_pool_fmr *mem;
-   u64io_addr;
-   u64*page_list;
-   intstatus;
-
-   page_list = page_vec->pages;
-   io_addr   = page_list[0];
+   struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
+   struct iser_device *device = ib_conn->device;
+   struct ib_pool_fmr *fmr;
+   int ret, plen;
+
+   plen = iser_sg_to_page_vec(mem, device->ib_device,
+  page_vec->pages,
+  &page_vec->offset,
+  &page_vec->data_size);
+   page_vec->length = plen;
+   if (plen * SIZE_4K < page_vec->data_size) {
+   iser_err("page vec too short to hold this SG\n");
+   iser_data_buf_dump(mem, device->ib_device);
+   iser_dump_page_vec(page_vec);
+   return -EINVAL;
+   }
 
-   mem  = ib_fmr_pool_map_phys(ib_conn->fmr.pool,
-   page_list,
+   fmr  = ib_fmr_pool_map_phys(ib_conn->fmr.pool,
+   page_vec->pages,
page_vec->length,
-   io_addr);
-
-   if (IS_ERR(mem)) {
-   status = (int)PTR_ERR(mem);
-   iser_err("ib_fmr_pool_map_phys failed: %d\n", status);
-   return status;
+   page_vec->pages[0]);
+   if (IS_ERR(fmr)) {
+   ret = PTR_ERR(fmr);
+   iser_err("ib_fmr_pool_map_phys failed: %d\n", ret);
+   return ret;
}
 
-   mem_reg->lkey  = mem->fmr->lkey;
-   mem_reg->rkey  = mem->fmr->rkey;
-   mem_reg->len   = page_vec->data_size;
-   mem_reg->va= io_addr + page_vec->offset;
-   mem_reg->mem_h = (void *)mem;
-
-   iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, "
-"entry[0]: (0x%08lx,%ld)] -> "
-"[lkey: 0x%08X mem_h: 0x%p va: 0x%08lX sz: %ld]\n",
-page_vec, page_vec->length,
-(unsigned long)page_vec->pages[0],
-(unsigned long)page_vec->data_size,
-(unsigned int)mem_reg->lkey, mem_reg->mem_h,
-(unsigned long)mem_reg->va, (unsigned long)mem_reg->len);
+   mem_reg->lkey = fmr->fmr->lkey;
+   mem_reg->rkey = fmr->fmr->rkey;
+   mem_reg->va = page_vec->pages[0] + page_vec->offset;
+   mem_reg->len = page_vec->data_size;
+   mem_reg->mem_h = fmr;
+
return 0;
 }
 
@@ -493,8 +4

[PATCH v2 05/18] IB/iser: Remove a redundant struct iser_data_buf

2015-03-31 Thread Sagi Grimberg
No need to keep two iser_data_buf structures just in case we use
mem copy. We can avoid that just by adding a pointer to the original
sg. So keep only two iser_data_buf per command (data and protection)
and pass the relevant data_buf to bounce buffer routine.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |   12 ++---
 drivers/infiniband/ulp/iser/iser_initiator.c |   16 +++
 drivers/infiniband/ulp/iser/iser_memory.c|   58 ++---
 3 files changed, 34 insertions(+), 52 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index b47aea1..5c7036c 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -218,20 +218,23 @@ enum iser_data_dir {
 /**
  * struct iser_data_buf - iSER data buffer
  *
- * @buf:  pointer to the sg list
+ * @sg:   pointer to the sg list
  * @size: num entries of this sg
  * @data_len: total beffer byte len
  * @dma_nents:returned by dma_map_sg
  * @copy_buf: allocated copy buf for SGs unaligned
  *for rdma which are copied
+ * @orig_sg:  pointer to the original sg list (in case
+ *we used a copy)
  * @sg_single:SG-ified clone of a non SG SC or
  *unaligned SG
  */
 struct iser_data_buf {
-   void   *buf;
+   struct scatterlist *sg;
unsigned int   size;
unsigned long  data_len;
unsigned int   dma_nents;
+   struct scatterlist *orig_sg;
char   *copy_buf;
struct scatterlist sg_single;
   };
@@ -536,9 +539,7 @@ struct iser_conn {
  * @dir:  iser data direction
  * @rdma_regd:task rdma registration desc
  * @data: iser data buffer desc
- * @data_copy:iser data copy buffer desc (bounce buffer)
  * @prot: iser protection buffer desc
- * @prot_copy:iser protection copy buffer desc (bounce buffer)
  */
 struct iscsi_iser_task {
struct iser_tx_desc  desc;
@@ -549,9 +550,7 @@ struct iscsi_iser_task {
int  dir[ISER_DIRS_NUM];
struct iser_regd_buf rdma_regd[ISER_DIRS_NUM];
struct iser_data_buf data[ISER_DIRS_NUM];
-   struct iser_data_buf data_copy[ISER_DIRS_NUM];
struct iser_data_buf prot[ISER_DIRS_NUM];
-   struct iser_data_buf prot_copy[ISER_DIRS_NUM];
 };
 
 struct iser_page_vec {
@@ -621,7 +620,6 @@ void iser_free_rx_descriptors(struct iser_conn *iser_conn);
 
 void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_task *iser_task,
 struct iser_data_buf *mem,
-struct iser_data_buf *mem_copy,
 enum iser_data_dir cmd_dir);
 
 int  iser_reg_rdma_mem_fmr(struct iscsi_iser_task *task,
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index 76eb57b..0e414db 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -401,13 +401,13 @@ int iser_send_command(struct iscsi_conn *conn,
}
 
if (scsi_sg_count(sc)) { /* using a scatter list */
-   data_buf->buf  = scsi_sglist(sc);
+   data_buf->sg = scsi_sglist(sc);
data_buf->size = scsi_sg_count(sc);
}
data_buf->data_len = scsi_bufflen(sc);
 
if (scsi_prot_sg_count(sc)) {
-   prot_buf->buf  = scsi_prot_sglist(sc);
+   prot_buf->sg  = scsi_prot_sglist(sc);
prot_buf->size = scsi_prot_sg_count(sc);
prot_buf->data_len = (data_buf->data_len >>
 ilog2(sc->device->sector_size)) * 8;
@@ -674,35 +674,31 @@ void iser_task_rdma_finalize(struct iscsi_iser_task 
*iser_task)
/* if we were reading, copy back to unaligned sglist,
 * anyway dma_unmap and free the copy
 */
-   if (iser_task->data_copy[ISER_DIR_IN].copy_buf != NULL) {
+   if (iser_task->data[ISER_DIR_IN].copy_buf) {
is_rdma_data_aligned = 0;
iser_finalize_rdma_unaligned_sg(iser_task,
&iser_task->data[ISER_DIR_IN],
-   
&iser_task->data_copy[ISER_DIR_IN],
ISER_DIR_IN);
}
 
-   if (iser_task->data_copy[ISER_DIR_OUT].copy_buf != NULL) {
+   if (iser_task->data[ISER_DIR_OUT].copy_buf) {
is_rdma_data_aligned = 0;
iser_finalize_rdma_unaligned_sg(iser_task,
&iser_task->data[ISER_DIR_OUT],
-   
&iser_task->data_c

[PATCH v2 02/18] IB/iser: Handle fastreg/local_inv completion errors

2015-03-31 Thread Sagi Grimberg
Fast registration and local invalidate work requests can
also fail. We should call error completion handler for them.

Reported-by: Roi Dayan 
Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iser_verbs.c |   11 ++-
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c 
b/drivers/infiniband/ulp/iser/iser_verbs.c
index 070c5af..7ee4926 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -1210,6 +1210,9 @@ iser_handle_comp_error(struct ib_conn *ib_conn,
iscsi_conn_failure(iser_conn->iscsi_conn,
   ISCSI_ERR_CONN_FAILED);
 
+   if (wc->wr_id == ISER_FASTREG_LI_WRID)
+   return;
+
if (is_iser_tx_desc(iser_conn, wr_id)) {
struct iser_tx_desc *desc = wr_id;
 
@@ -1254,13 +1257,11 @@ static void iser_handle_wc(struct ib_wc *wc)
else
iser_dbg("flush error: wr id %llx\n", wc->wr_id);
 
-   if (wc->wr_id != ISER_FASTREG_LI_WRID &&
-   wc->wr_id != ISER_BEACON_WRID)
-   iser_handle_comp_error(ib_conn, wc);
-
-   /* complete in case all flush errors were consumed */
if (wc->wr_id == ISER_BEACON_WRID)
+   /* all flush errors were consumed */
complete(&ib_conn->flush_comp);
+   else
+   iser_handle_comp_error(ib_conn, wc);
}
 }
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 18/18] IB/iser: Rewrite bounce buffer code path

2015-03-31 Thread Sagi Grimberg
In some rare cases, IO operations may be not aligned to page
boundaries. This prevents iser from performing fast memory
registration. In order to overcome that iser uses a bounce
buffer to carry the transaction. We basically allocate a buffer
in the size of the transaction and perform a copy.

The buffer allocation using kmalloc is too restrictive since it
requires higher order (atomic) allocations for large transactions
(which may result in memory exhaustion fairly fast for some workloads).
We rewrite the bounce buffer code path to allocate scattered pages
and perform a copy between the transaction sg and the bounce sg.

Reported-by: Alex Lyakas 
Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |8 +-
 drivers/infiniband/ulp/iser/iser_initiator.c |8 +-
 drivers/infiniband/ulp/iser/iser_memory.c|  211 --
 3 files changed, 138 insertions(+), 89 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index a39645a..262ba1f 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -222,12 +222,9 @@ enum iser_data_dir {
  * @size: num entries of this sg
  * @data_len: total beffer byte len
  * @dma_nents:returned by dma_map_sg
- * @copy_buf: allocated copy buf for SGs unaligned
- *for rdma which are copied
  * @orig_sg:  pointer to the original sg list (in case
  *we used a copy)
- * @sg_single:SG-ified clone of a non SG SC or
- *unaligned SG
+ * @orig_size:num entris of orig sg list
  */
 struct iser_data_buf {
struct scatterlist *sg;
@@ -235,8 +232,7 @@ struct iser_data_buf {
unsigned long  data_len;
unsigned int   dma_nents;
struct scatterlist *orig_sg;
-   char   *copy_buf;
-   struct scatterlist sg_single;
+   unsigned int   orig_size;
   };
 
 /* fwd declarations */
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index b2e3b77..3e2118e 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -674,28 +674,28 @@ void iser_task_rdma_finalize(struct iscsi_iser_task 
*iser_task)
/* if we were reading, copy back to unaligned sglist,
 * anyway dma_unmap and free the copy
 */
-   if (iser_task->data[ISER_DIR_IN].copy_buf) {
+   if (iser_task->data[ISER_DIR_IN].orig_sg) {
is_rdma_data_aligned = 0;
iser_finalize_rdma_unaligned_sg(iser_task,
&iser_task->data[ISER_DIR_IN],
ISER_DIR_IN);
}
 
-   if (iser_task->data[ISER_DIR_OUT].copy_buf) {
+   if (iser_task->data[ISER_DIR_OUT].orig_sg) {
is_rdma_data_aligned = 0;
iser_finalize_rdma_unaligned_sg(iser_task,
&iser_task->data[ISER_DIR_OUT],
ISER_DIR_OUT);
}
 
-   if (iser_task->prot[ISER_DIR_IN].copy_buf) {
+   if (iser_task->prot[ISER_DIR_IN].orig_sg) {
is_rdma_prot_aligned = 0;
iser_finalize_rdma_unaligned_sg(iser_task,
&iser_task->prot[ISER_DIR_IN],
ISER_DIR_IN);
}
 
-   if (iser_task->prot[ISER_DIR_OUT].copy_buf) {
+   if (iser_task->prot[ISER_DIR_OUT].orig_sg) {
is_rdma_prot_aligned = 0;
iser_finalize_rdma_unaligned_sg(iser_task,
&iser_task->prot[ISER_DIR_OUT],
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index e78ce53..f90d6de 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -39,7 +39,112 @@
 
 #include "iscsi_iser.h"
 
-#define ISER_KMALLOC_THRESHOLD 0x2 /* 128K - kmalloc limit */
+static void
+iser_free_bounce_sg(struct iser_data_buf *data)
+{
+   struct scatterlist *sg;
+   int count;
+
+   for_each_sg(data->sg, sg, data->size, count)
+   __free_page(sg_page(sg));
+
+   kfree(data->sg);
+
+   data->sg = data->orig_sg;
+   data->size = data->orig_size;
+   data->orig_sg = NULL;
+   data->orig_size = 0;
+}
+
+static int
+iser_alloc_bounce_sg(struct iser_data_buf *data)
+{
+   struct scatterlist *sg;
+   struct page *page;
+   unsigned long length = data->data_len;
+   int i = 0, nents = DIV_ROUND_UP(length, PAGE_SIZE);
+
+   sg = kcalloc(nents, sizeof(*sg), GFP_ATOMIC);
+   if (!sg)
+   goto err;
+
+   sg_init_table(sg, nents);
+   while (length) {
+   u32 page_len = min_t(u32, length, PAGE_SIZE);

[PATCH v2 04/18] IB/iser: Remove redundant cmd_data_len calculation

2015-03-31 Thread Sagi Grimberg
This code was added before we had protection data length
calculation (in iser_send_command), so we needed to calc
the sg data length from the sg itself. This is not needed
anymore.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iser_memory.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index 341040b..32ccd5c 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -53,12 +53,9 @@ static int iser_start_rdma_unaligned_sg(struct 
iscsi_iser_task *iser_task,
struct scatterlist *sgl = (struct scatterlist *)data->buf;
struct scatterlist *sg;
char *mem = NULL;
-   unsigned long  cmd_data_len = 0;
+   unsigned long  cmd_data_len = data->data_len;
int dma_nents, i;
 
-   for_each_sg(sgl, sg, data->size, i)
-   cmd_data_len += ib_sg_dma_len(dev, sg);
-
if (cmd_data_len > ISER_KMALLOC_THRESHOLD)
mem = (void *)__get_free_pages(GFP_ATOMIC,
  ilog2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


iser patches for kernel 4.1 [v2]

2015-03-31 Thread Sagi Grimberg
Hi Roland,

This set contains bug fixes as well as code refactoring patches

- Patches 1-3: Bug fixes (stable material)
- Patches 4-6: Bounce buffer related code cleanups
- Patches 7-12,16: Refactoring
- Patches 13-15:   Minor optimizations
- Patch 17:Version bump
- Patch 18:Bounce buffer rewrite to handle situations where
   large transfers are unaligned causing iser to
   use high order allocations

Changes from v1:
- Addressed review comments (patch 18) from Alex Lyakas on bounce buffer
  re-write code (kmap/kunmap mismatch, possible use-after-free,
  and double free).

Sagi Grimberg (18):
  IB/iser: Fix unload during ep_poll wrong dereference
  IB/iser: Handle fastreg/local_inv completion errors
  IB/iser: Fix wrong calculation of protection buffer length
  IB/iser: Remove redundant cmd_data_len calculation
  IB/iser: Remove a redundant struct iser_data_buf
  IB/iser: Don't pass ib_device to fall_to_bounce_buff routine
  IB/iser: Move memory reg/dereg routines to iser_memory.c
  IB/iser: Remove redundant assignments in iser_reg_page_vec
  IB/iser: Get rid of struct iser_rdma_regd
  IB/iser: Merge build page-vec into register page-vec
  IB/iser: Move fastreg descriptor pool get/put to helper functions
  IB/iser: Move PI context alloc/free to routines
  IB/iser: Make fastreg pool cache friendly
  IB/iser: Modify struct iser_mem_reg members
  IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and
iser_reg_sig_mr
  IB/iser: Remove code duplication for a single DMA entry
  IB/iser: Bump version to 1.6
  IB/iser: Rewrite bounce buffer code path

 drivers/infiniband/ulp/iser/iscsi_iser.h |   66 +---
 drivers/infiniband/ulp/iser/iser_initiator.c |   66 ++--
 drivers/infiniband/ulp/iser/iser_memory.c|  524 +++---
 drivers/infiniband/ulp/iser/iser_verbs.c |  220 
 4 files changed, 432 insertions(+), 444 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 03/18] IB/iser: Fix wrong calculation of protection buffer length

2015-03-31 Thread Sagi Grimberg
This length miss-calculation may cause a silent data corruption
in the DIX case and cause the device to reference unmapped area.

Fixes: d77e65350f2d ('libiscsi, iser: Adjust data_length to include protection 
information')
Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iser_initiator.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index 20e859a..76eb57b 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -409,8 +409,8 @@ int iser_send_command(struct iscsi_conn *conn,
if (scsi_prot_sg_count(sc)) {
prot_buf->buf  = scsi_prot_sglist(sc);
prot_buf->size = scsi_prot_sg_count(sc);
-   prot_buf->data_len = data_buf->data_len >>
-ilog2(sc->device->sector_size) * 8;
+   prot_buf->data_len = (data_buf->data_len >>
+ilog2(sc->device->sector_size)) * 8;
}
 
if (hdr->flags & ISCSI_FLAG_CMD_READ) {
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 06/18] IB/iser: Don't pass ib_device to fall_to_bounce_buff routine

2015-03-31 Thread Sagi Grimberg
No need to pass that, we can take it from the task.
In a later stage, this function will be invoked
according to a device capability.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iser_memory.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index beeabd0..9c60ff1 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -334,19 +334,19 @@ void iser_dma_unmap_task_data(struct iscsi_iser_task 
*iser_task,
 }
 
 static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task,
- struct ib_device *ibdev,
  struct iser_data_buf *mem,
  enum iser_data_dir cmd_dir,
  int aligned_len)
 {
-   struct iscsi_conn*iscsi_conn = iser_task->iser_conn->iscsi_conn;
+   struct iscsi_conn *iscsi_conn = iser_task->iser_conn->iscsi_conn;
+   struct iser_device *device = iser_task->iser_conn->ib_conn.device;
 
iscsi_conn->fmr_unalign_cnt++;
iser_warn("rdma alignment violation (%d/%d aligned) or FMR not 
supported\n",
  aligned_len, mem->size);
 
if (iser_debug_level > 0)
-   iser_data_buf_dump(mem, ibdev);
+   iser_data_buf_dump(mem, device->ib_device);
 
/* unmap the command data before accessing it */
iser_dma_unmap_task_data(iser_task, mem,
@@ -384,7 +384,7 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task *iser_task,
 
aligned_len = iser_data_buf_aligned_len(mem, ibdev);
if (aligned_len != mem->dma_nents) {
-   err = fall_to_bounce_buf(iser_task, ibdev, mem,
+   err = fall_to_bounce_buf(iser_task, mem,
 cmd_dir, aligned_len);
if (err) {
iser_err("failed to allocate bounce buffer\n");
@@ -669,7 +669,7 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task 
*iser_task,
 
aligned_len = iser_data_buf_aligned_len(mem, ibdev);
if (aligned_len != mem->dma_nents) {
-   err = fall_to_bounce_buf(iser_task, ibdev, mem,
+   err = fall_to_bounce_buf(iser_task, mem,
 cmd_dir, aligned_len);
if (err) {
iser_err("failed to allocate bounce buffer\n");
@@ -700,7 +700,7 @@ int iser_reg_rdma_mem_fastreg(struct iscsi_iser_task 
*iser_task,
mem = &iser_task->prot[cmd_dir];
aligned_len = iser_data_buf_aligned_len(mem, ibdev);
if (aligned_len != mem->dma_nents) {
-   err = fall_to_bounce_buf(iser_task, ibdev, mem,
+   err = fall_to_bounce_buf(iser_task, mem,
 cmd_dir, aligned_len);
if (err) {
iser_err("failed to allocate bounce 
buffer\n");
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 14/18] IB/iser: Modify struct iser_mem_reg members

2015-03-31 Thread Sagi Grimberg
No need to keep lkey, va, len variables, we can keep
them as struct ib_sge. This will help when we change the
memory registration logic.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg 
Signed-off-by: Adir Lev 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |   14 ---
 drivers/infiniband/ulp/iser/iser_initiator.c |   18 +++---
 drivers/infiniband/ulp/iser/iser_memory.c|   30 +-
 3 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 185d2ec..5fd0963 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -247,18 +247,14 @@ struct iscsi_endpoint;
 /**
  * struct iser_mem_reg - iSER memory registration info
  *
- * @lkey: MR local key
- * @rkey: MR remote key
- * @va:   MR start address (buffer va)
- * @len:  MR length
+ * @sge:  memory region sg element
+ * @rkey: memory region remote key
  * @mem_h:pointer to registration context (FMR/Fastreg)
  */
 struct iser_mem_reg {
-   u32  lkey;
-   u32  rkey;
-   u64  va;
-   u64  len;
-   void *mem_h;
+   struct ib_sgesge;
+   u32  rkey;
+   void*mem_h;
 };
 
 enum iser_desc_type {
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index 420a613..b2e3b77 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -82,11 +82,11 @@ static int iser_prepare_read_cmd(struct iscsi_task *task)
 
hdr->flags|= ISER_RSV;
hdr->read_stag = cpu_to_be32(mem_reg->rkey);
-   hdr->read_va   = cpu_to_be64(mem_reg->va);
+   hdr->read_va   = cpu_to_be64(mem_reg->sge.addr);
 
iser_dbg("Cmd itt:%d READ tags RKEY:%#.4X VA:%#llX\n",
 task->itt, mem_reg->rkey,
-(unsigned long long)mem_reg->va);
+(unsigned long long)mem_reg->sge.addr);
 
return 0;
 }
@@ -139,20 +139,20 @@ iser_prepare_write_cmd(struct iscsi_task *task,
if (unsol_sz < edtl) {
hdr->flags |= ISER_WSV;
hdr->write_stag = cpu_to_be32(mem_reg->rkey);
-   hdr->write_va   = cpu_to_be64(mem_reg->va + unsol_sz);
+   hdr->write_va   = cpu_to_be64(mem_reg->sge.addr + unsol_sz);
 
iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X "
 "VA:%#llX + unsol:%d\n",
 task->itt, mem_reg->rkey,
-(unsigned long long)mem_reg->va, unsol_sz);
+(unsigned long long)mem_reg->sge.addr, unsol_sz);
}
 
if (imm_sz > 0) {
iser_dbg("Cmd itt:%d, WRITE, adding imm.data sz: %d\n",
 task->itt, imm_sz);
-   tx_dsg->addr   = mem_reg->va;
+   tx_dsg->addr = mem_reg->sge.addr;
tx_dsg->length = imm_sz;
-   tx_dsg->lkey   = mem_reg->lkey;
+   tx_dsg->lkey = mem_reg->sge.lkey;
iser_task->desc.num_sge = 2;
}
 
@@ -479,9 +479,9 @@ int iser_send_data_out(struct iscsi_conn *conn,
 
mem_reg = &iser_task->rdma_reg[ISER_DIR_OUT];
tx_dsg = &tx_desc->tx_sg[1];
-   tx_dsg->addr= mem_reg->va + buf_offset;
-   tx_dsg->length  = data_seg_len;
-   tx_dsg->lkey= mem_reg->lkey;
+   tx_dsg->addr = mem_reg->sge.addr + buf_offset;
+   tx_dsg->length = data_seg_len;
+   tx_dsg->lkey = mem_reg->sge.lkey;
tx_desc->num_sge = 2;
 
if (buf_offset + data_seg_len > iser_task->data[ISER_DIR_OUT].data_len) 
{
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c 
b/drivers/infiniband/ulp/iser/iser_memory.c
index 45f5120..40d22d5 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -400,10 +400,10 @@ int iser_reg_page_vec(struct iscsi_iser_task *iser_task,
return ret;
}
 
-   mem_reg->lkey = fmr->fmr->lkey;
+   mem_reg->sge.lkey = fmr->fmr->lkey;
mem_reg->rkey = fmr->fmr->rkey;
-   mem_reg->va = page_vec->pages[0] + page_vec->offset;
-   mem_reg->len = page_vec->data_size;
+   mem_reg->sge.addr = page_vec->pages[0] + page_vec->offset;
+   mem_reg->sge.length = page_vec->data_size;
mem_reg->mem_h = fmr;
 
return 0;
@@ -479,17 +479,17 @@ int iser_reg_rdma_mem_fmr(struct iscsi_iser_task 
*iser_task,
if (mem->dma_nents == 1) {
sg = mem->sg;
 
-   mem_reg->lkey = device->mr->lkey;
+   mem_reg->sge.lkey = device->mr->lkey;
mem_reg->rkey = device->mr->rkey;
-   mem_reg->len  = ib_sg_dma_len(ibdev, &sg[0]);
-   mem_reg->va   = ib_sg_dma_address(ibdev, &sg[0]);
+   mem_reg->sge.leng

[PATCH v2 17/18] IB/iser: Bump version to 1.6

2015-03-31 Thread Sagi Grimberg
Signed-off-by: Sagi Grimberg 
---
 drivers/infiniband/ulp/iser/iscsi_iser.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 5fd0963..a39645a 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -69,7 +69,7 @@
 
 #define DRV_NAME   "iser"
 #define PFXDRV_NAME ": "
-#define DRV_VER"1.5"
+#define DRV_VER"1.6"
 
 #define iser_dbg(fmt, arg...)   \
do { \
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Christoph Lameter
On Tue, 31 Mar 2015, David Miller wrote:

> > Well this needs to be addressed yes but in order to have that done someone
> > needs to step forward do the proper work, maintain git trees, do the
> > review and show up for the conferences. Neither Roland nor Or was there at
> > the Infiniband conferences in Monterey this year(!) and this is the prime
> > venue for face to face conversation about the subsystem each year.
>
> I don't go to IETF meetings, ever, yet I am rather sure that nobody
> uses that to question my ability to be the networking maintainer.

This isnt a standards body. Openfabrics is the organization dedicated to
the development of the infiniband stack.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread David Miller
From: Christoph Lameter 
Date: Tue, 31 Mar 2015 07:49:38 -0500 (CDT)

> Well this needs to be addressed yes but in order to have that done someone
> needs to step forward do the proper work, maintain git trees, do the
> review and show up for the conferences. Neither Roland nor Or was there at
> the Infiniband conferences in Monterey this year(!) and this is the prime
> venue for face to face conversation about the subsystem each year.

I don't go to IETF meetings, ever, yet I am rather sure that nobody
uses that to question my ability to be the networking maintainer.

I've gone several years at times without meeting other networking
developers as well, and that also I am pretty sure did not harm my
stature as the networking maintainer.

So I absolutely disagree that these two acts are requirements for
someone whose job is to steadily and reasonably review and apply
patches for a subsystem.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Christoph Lameter
On Tue, 31 Mar 2015, Hal Rosenstock wrote:

> > Sean and Hal showed up for the conference. Sean is active mostly in
> > focusing on userspace stuff. Both could become more involved in the kernel
> > ib tree and the review process? Hal is employed by Mellanox but is not
> > participating in these patch related discussions? Can Mellanox give Hal
> > more time for his role as the kernel ib subsystem maintainer? Maybe you
> > could get that resolved at Mellanox? You already have one of your own as
> > a maintainer of the IB subsystem.
>
> My involvement is limited to IB management related aspects in the kernel.

Need to update the entry then I think.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread David Miller
From: Or Gerlitz 
Date: Tue, 31 Mar 2015 14:22:50 +0300

> On Tue, Mar 31, 2015 at 6:47 AM, Roland Dreier  wrote:
>>> Roland, I have to genuinely agree with Or, that your handling of
>>> patch integration is sub-par and really painful for anyone actually
>>> trying to get real work done here.
>>>
>>> If you simply don't have the time to devote to constantly reviewing
>>> patches as they come in, and doing so in a timely manner, please let
>>> someone who is actually interested and has the time to take over.
>>
>> It's a fair criticism, and certainly for at least the last year or so
>> I have not had the time to do enough work as a maintainer.  I have
>> hope that some of the things that have been keeping me busy are dying
>> down and that I'll have more time to spend on handling the RDMA tree,
>> but that's just talk until I actually get more done.
> 
> Roland, well, with few variations, this goes way beyond a year. I
> would say more or less half of the time you're wearing the maintainer
> hat (2005-now). The practice of updating your for-next branch to rc1
> only few days/hours before the the kernel is out and the merge window
> opens, is an attitude, not lack of resources and will not be solved by
> whatever processes people suggest. You need to act differently.

Unfortunately, and no direct offense intend to you personally Roland,
but I agree with Or here.

If a person really cares about a subsystem they are marked at the
maintainer for, they usually _make_ the time necessary to apply
patches and attend to maintainership in a reasonable manner.

If this "once every merge window" behavior was limited to one release
cycle, I'd give the benefit of the doubt, but this has been going on
for a very long time.

You cannot on the one hand say "I care about this subsystem and the
long term maintainership and ABI stability of it" yet on the other
hand not be willing to put forth even the _MOST MINIMAL_ amount of
effort and time required to steadily review and apply patches over a
period of several years.

It doesn't take a lot of time to do this work, especially if you use
the correct tools.  I can review and apply 100 patches a day, even
when I'm on vacation.  I'm an extreme example, but what you're doing
right now Roland is not acceptable and is not in agreement with your
claims about how much you care about this subsystem.

If you processed the incoming patch queue even _once_ a week, we
wouldn't even be having this conversation.  But you haven't even been
doing that.

I get sick to my stomache when a patch gets to 3 days in my patch
queue, that's already too long.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/15] NFS/RDMA patches proposed for 4.1

2015-03-31 Thread Anna Schumaker
On 03/30/2015 02:33 PM, Chuck Lever wrote:
> This is a series of client-side patches for NFS/RDMA. In preparation
> for increasing the transport credit limit and maximum rsize/wsize,
> I've re-factored the memory registration logic into separate files,
> invoked via a method API.
> 
> The series is available in the nfs-rdma-for-4.1 topic branch at
> 
> git://linux-nfs.org/projects/cel/cel-2.6.git

Thanks!  I've applied these and pushed them out to my nfs-rdma git tree.

Anna

> 
> Changes since v2:
> - Rebased on 4.0-rc6
> - One minor fix squashed into 01/15
> - Tested-by tags added
> 
> Changes since v1:
> - Rebased on 4.0-rc5
> - Main optimizations postponed to 4.2
> - Addressed review comments from Anna, Sagi, and Devesh
> 
> ---
> 
> Chuck Lever (15):
>   SUNRPC: Introduce missing well-known netids
>   xprtrdma: Display IPv6 addresses and port numbers correctly
>   xprtrdma: Perform a full marshal on retransmit
>   xprtrdma: Byte-align FRWR registration
>   xprtrdma: Prevent infinite loop in rpcrdma_ep_create()
>   xprtrdma: Add vector of ops for each memory registration strategy
>   xprtrdma: Add a "max_payload" op for each memreg mode
>   xprtrdma: Add a "register_external" op for each memreg mode
>   xprtrdma: Add a "deregister_external" op for each memreg mode
>   xprtrdma: Add "init MRs" memreg op
>   xprtrdma: Add "reset MRs" memreg op
>   xprtrdma: Add "destroy MRs" memreg op
>   xprtrdma: Add "open" memreg op
>   xprtrdma: Handle non-SEND completions via a callout
>   xprtrdma: Make rpcrdma_{un}map_one() into inline functions
> 
> 
>  include/linux/sunrpc/msg_prot.h|8 
>  include/linux/sunrpc/xprtrdma.h|5 
>  net/sunrpc/xprtrdma/Makefile   |3 
>  net/sunrpc/xprtrdma/fmr_ops.c  |  208 +++
>  net/sunrpc/xprtrdma/frwr_ops.c |  353 ++
>  net/sunrpc/xprtrdma/physical_ops.c |   94 +
>  net/sunrpc/xprtrdma/rpc_rdma.c |   87 ++--
>  net/sunrpc/xprtrdma/transport.c|   61 ++-
>  net/sunrpc/xprtrdma/verbs.c|  699 
> +++-
>  net/sunrpc/xprtrdma/xprt_rdma.h|   90 -
>  10 files changed, 882 insertions(+), 726 deletions(-)
>  create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c
>  create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c
>  create mode 100644 net/sunrpc/xprtrdma/physical_ops.c
> 
> --
> Chuck Lever
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check

2015-03-31 Thread Tom Talpey

On 3/31/2015 7:41 AM, Michael Wang wrote:

Hi, Tom

Thanks for the comments :-)

On 03/31/2015 01:19 PM, Tom Talpey wrote:


[oops - repeating my reply with full cc's]


[snip]



Actually I'm thinking about Doug's idea to use rdma_transport_is_XX()
instead of the current basic helper, thus may be use rdma_transport_is_iwarp()
in here could be better, since it's actually a feature of iwarp tech
that RDMA Read only support one scatter-gather entry.


No, you should expose an attribute to surface the maximum length of
the remote gather list, which varies by adapter as well as protocol.
The fact that iWARP is different from IB is not relevant, and conflates
unrelated properties.


To be confirmed, so your point is that the max-read-sges will be different
even the transport is the same IWRAP, and that depends on the capability
of adapter, correct?


Yes, in fact the iWARP protocol does not preclude multiple read SGEs,
even though most iWARP implementations have chosen to support just one.

Even for multi-SGE-capable adapters, there is a limit of SGL size, based
on the adapter's work request format and other factors. So my argument
is that upper layers can and should query that, not make a blanket
decision based on protocol type.



I currently only find this one place where infer max-read-sges from
transport type, it looks more like a special case to me rather than a generic
method we could exposed... and also  not very related with IB management
helper concept IMHO.


It is most certainly not a special case, but you could decide to
introduce it in many ways. I'm not commenting on that.

My main concern is that you do not introduce a new and clumsy "is iWARP"
rule as an adapter-specific API requirement to expose the RDMA Read SGE
behavior. That's what your initial message seemed to imply?


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check

2015-03-31 Thread Michael Wang
On 03/31/2015 03:42 PM, Tom Talpey wrote:

> On 3/31/2015 7:41 AM, Michael Wang wrote:
>> [snip]
> Yes, in fact the iWARP protocol does not preclude multiple read SGEs,
> even though most iWARP implementations have chosen to support just one.
>
> Even for multi-SGE-capable adapters, there is a limit of SGL size, based
> on the adapter's work request format and other factors. So my argument
> is that upper layers can and should query that, not make a blanket
> decision based on protocol type.

Thanks for the explanation  Sounds like some new callback on device level
like query_device() to acquire the right info.

>> I currently only find this one place where infer max-read-sges from
>> transport type, it looks more like a special case to me rather than a generic
>> method we could exposed... and also  not very related with IB management
>> helper concept IMHO.
> It is most certainly not a special case, but you could decide to
> introduce it in many ways. I'm not commenting on that.
>
> My main concern is that you do not introduce a new and clumsy "is iWARP"
> rule as an adapter-specific API requirement to expose the RDMA Read SGE
> behavior. That's what your initial message seemed to imply?

Yeah I planed to just use rdma_transport_iwarp() to replace the check, it's
actually meaningless but just refine, frankly speaking I would prefer some
driver developer to work on that part, at this point I prefer to focus on
introducing the management helpers firstly 

Maybe we could mark it as a TODO temporarily, if later we found more
scenes using similar logical, we can collect them and do some reform 

Regards,
Michael Wang


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 06/11] IB/Verbs: Use management helper has_sa() and cap_sa(), for sa-check

2015-03-31 Thread Michael Wang
On 03/30/2015 07:02 PM, Doug Ledford wrote:
> On Mon, 2015-03-30 at 18:42 +0200, Michael Wang wrote:
>> [snip]
>>
>> rdma_transport_is_ib and rdma_port_ll_is_ib is actually just rough helper
>> to save some code, we can get rid of them when we no longer need them, but
>> currently device driver still using them a lot, I'm not sure if the new
>> mechanism could take cover all these cases...
> [snip]
>
> If we use something like this, then the above is all you need.  Then
> every place in the code that checks for something like has_sa or cap_sa
> can be replaced with rdma_ib_mgmt.  When Ira updates his patches for
> this, he can check for rdma_opa_mgmt to enable jumbo MAD packets and
> whatever else he needs. Every place that does transport == IB and ll ==
> Ethernet can become rdma_transport_is_roce.  Every place that does
> transport == IB and ll == INFINIBAND becomes rdma_transport_is_ib.

Get your point :-) I need to investigate all those cases see if it
works, in case if things unclear, I'll reform it according to my
understanding to adapt to this new mechanism, if there are any
misunderstanding, we can address them case by case during
the review.

>  The
> code in multicast.c just needs to check rdma_ib_mgmt() (which happens to
> make perfect sense anyway as the code in multicast.c that is checking
> that we are on an IB interface is doing so because IB requires extra
> management of the multicast group joins/leaves).

I'll adopt those helpers already been discussed, and they will be
implemented with the helpers from new mechanism
(eg, rdma_ib_mgmt) in next version.

> But, like I said, this
> is an all or nothing change, it isn't something we can ease into.

It's will be great if we can make it, let's see ;-)

Regards,
Michael Wang

>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Hal Rosenstock
On 3/31/2015 8:49 AM, Christoph Lameter wrote:
> Well this needs to be addressed yes but in order to have that done someone
> needs to step forward do the proper work, maintain git trees, do the
> review and show up for the conferences. Neither Roland nor Or was there at
> the Infiniband conferences in Monterey this year(!) and this is the prime
> venue for face to face conversation about the subsystem each year.
> 
> Just looked at the MAINTAINERS file: Surprise: We already have 3
> maintainers for the IB subsystem:
> 
> INFINIBAND SUBSYSTEM
> M:  Roland Dreier 
> M:  Sean Hefty 
> M:  Hal Rosenstock 
> L:  linux-rdma@vger.kernel.org
> W:  http://www.openfabrics.org/
> Q:  http://patchwork.kernel.org/project/linux-rdma/list/
> T:  git 
> git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
> S:  Supported
> F:  Documentation/infiniband/
> F:  drivers/infiniband/
> F:  include/uapi/linux/if_infiniband.h
> 
> Sean and Hal showed up for the conference. Sean is active mostly in
> focusing on userspace stuff. Both could become more involved in the kernel
> ib tree and the review process? Hal is employed by Mellanox but is not
> participating in these patch related discussions? Can Mellanox give Hal
> more time for his role as the kernel ib subsystem maintainer? Maybe you
> could get that resolved at Mellanox? You already have one of your own as
> a maintainer of the IB subsystem.

My involvement is limited to IB management related aspects in the kernel.

-- Hal

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Christoph Lameter
Well this needs to be addressed yes but in order to have that done someone
needs to step forward do the proper work, maintain git trees, do the
review and show up for the conferences. Neither Roland nor Or was there at
the Infiniband conferences in Monterey this year(!) and this is the prime
venue for face to face conversation about the subsystem each year.

Just looked at the MAINTAINERS file: Surprise: We already have 3
maintainers for the IB subsystem:

INFINIBAND SUBSYSTEM
M:  Roland Dreier 
M:  Sean Hefty 
M:  Hal Rosenstock 
L:  linux-rdma@vger.kernel.org
W:  http://www.openfabrics.org/
Q:  http://patchwork.kernel.org/project/linux-rdma/list/
T:  git git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
S:  Supported
F:  Documentation/infiniband/
F:  drivers/infiniband/
F:  include/uapi/linux/if_infiniband.h

Sean and Hal showed up for the conference. Sean is active mostly in
focusing on userspace stuff. Both could become more involved in the kernel
ib tree and the review process? Hal is employed by Mellanox but is not
participating in these patch related discussions? Can Mellanox give Hal
more time for his role as the kernel ib subsystem maintainer? Maybe you
could get that resolved at Mellanox? You already have one of your own as
a maintainer of the IB subsystem.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check

2015-03-31 Thread Michael Wang
Hi, Tom

Thanks for the comments :-)

On 03/31/2015 01:19 PM, Tom Talpey wrote:
> [snip]
>
>>
>> Actually I'm thinking about Doug's idea to use rdma_transport_is_XX()
>> instead of the current basic helper, thus may be use 
>> rdma_transport_is_iwarp()
>> in here could be better, since it's actually a feature of iwarp tech
>> that RDMA Read only support one scatter-gather entry.
>
> No, you should expose an attribute to surface the maximum length of
> the remote gather list, which varies by adapter as well as protocol.
> The fact that iWARP is different from IB is not relevant, and conflates
> unrelated properties.

To be confirmed, so your point is that the max-read-sges will be different
even the transport is the same IWRAP, and that depends on the capability
of adapter, correct?

I currently only find this one place where infer max-read-sges from
transport type, it looks more like a special case to me rather than a generic
method we could exposed... and also  not very related with IB management
helper concept IMHO.

Regards,
Michael Wang

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check

2015-03-31 Thread Tom Talpey

On 3/31/2015 3:39 AM, Michael Wang wrote:

On 03/31/2015 12:35 AM, Jason Gunthorpe wrote:

On Mon, Mar 30, 2015 at 05:10:12PM +0200, Michael Wang wrote:

I found that actually we don't have to touch this one which
only used by HW driver currently.

I'm having a hard time understanding this, the code in question was in

net/sunrpc/xprtrdma/svc_rdma_recvfrom.c

Which is the NFS ULP, not a device driver.


I'm not familiar with this part too :-P but yes, it looks like an ulp
to support NFS.


It's the NFS server itself.



Actually I'm thinking about Doug's idea to use rdma_transport_is_XX()
instead of the current basic helper, thus may be use rdma_transport_is_iwarp()
in here could be better, since it's actually a feature of iwarp tech
that RDMA Read only support one scatter-gather entry.


No, you should expose an attribute to surface the maximum length of
the remote gather list, which varies by adapter as well as protocol.
The fact that iWARP is different from IB is not relevant, and conflates
unrelated properties.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Or Gerlitz
On Tue, Mar 31, 2015 at 6:47 AM, Roland Dreier  wrote:
>> Roland, I have to genuinely agree with Or, that your handling of
>> patch integration is sub-par and really painful for anyone actually
>> trying to get real work done here.
>>
>> If you simply don't have the time to devote to constantly reviewing
>> patches as they come in, and doing so in a timely manner, please let
>> someone who is actually interested and has the time to take over.
>
> It's a fair criticism, and certainly for at least the last year or so
> I have not had the time to do enough work as a maintainer.  I have
> hope that some of the things that have been keeping me busy are dying
> down and that I'll have more time to spend on handling the RDMA tree,
> but that's just talk until I actually get more done.

Roland, well, with few variations, this goes way beyond a year. I
would say more or less half of the time you're wearing the maintainer
hat (2005-now). The practice of updating your for-next branch to rc1
only few days/hours before the the kernel is out and the merge window
opens, is an attitude, not lack of resources and will not be solved by
whatever processes people suggest. You need to act differently.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 3/9] net/mlx4_core: Set initial admin GUIDs for VFs

2015-03-31 Thread Or Gerlitz

On 3/30/2015 8:16 PM, Jason Gunthorpe wrote:

But, it doesn't really make sense to use eth_random_addr (which
doesn't have a special OUI) and not randomize every bit.

 get_random_bytes(&guid, sizeof(guid));
 guid &= ~(1ULL << 56);
guid |= 1ULL << 57;


OK, we can do that, sure.


I also don't think the kernel should be generating random GUIDs. Either the SA 
should be consulted to do this, or the management stack should generate a cloud 
wide unique number.


There **is** an option to consult with the SM controlled by the 
sm_guid_assign module param of mlx4_ib. When this mode is in action, the 
randomized guid would be overridden by the one assigned by the SA.


In real life environments, people will use the "host assigned" option 
(sm_guid_assign = false) where the virtualization systems generate a 
unique guid to be used by the VF/VM and provision it through mlx4_ib to 
be provided to VF when the bindin.


The only place where the random value will be used, is for OOTB 
experience for staging purposes and such, when one wants to have working 
system with zero-touch. The practice of generating locally random 
address for non-provisioned VF is also used by the pretty much the whole 
set of Ethernet VF drivers (ixgbe_vf, i40e_vf, mlx4 VF code, etc). This 
is low cost solution for point cases which need that.


Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [-stable] commit 377b513485fd ("IB/core: Avoid leakage from kernel to user space")

2015-03-31 Thread Jiri Slaby
On 03/27/2015, 01:42 PM, Yann Droneaud wrote:
> Hi,
> 
> Please add commit 377b513485fd ("IB/core: Avoid leakage from kernel to
> user space") to -stable. It can be applied to v2.6.32 and later.

Now pushed to 3.12. Thanks.

-- 
js
suse labs
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next 0/9] mlx4 changes in virtual GID management

2015-03-31 Thread Sagi Grimberg

On 3/31/2015 6:47 AM, Roland Dreier wrote:

Roland, I have to genuinely agree with Or, that your handling of
patch integration is sub-par and really painful for anyone actually
trying to get real work done here.

If you simply don't have the time to devote to constantly reviewing
patches as they come in, and doing so in a timely manner, please let
someone who is actually interested and has the time to take over.


It's a fair criticism, and certainly for at least the last year or so
I have not had the time to do enough work as a maintainer.  I have
hope that some of the things that have been keeping me busy are dying
down and that I'll have more time to spend on handling the RDMA tree,
but that's just talk until I actually get more done.

I really would like to get more people involved in handling the flow
of patches but I'm not sure who has not only the interest and the time
but also the judgement and expertise to take over.  Certainly Or has
been a long time contributor who has done a lot of great things, but I
still worry about things like ABI stability and backwards
compatibility.

But I'm open to ideas.



We had a talk about a similar topic at LSFMM15. Linux-scsi subsystem is 
a large scale subsystem and also has the "single maintainer with

limited time for upstream maintenance" bottleneck.

Christoph created the "scsi-queue" tree to feed James in order to
accelerate the work process.

One thought was laying a set of rules that would allow a maintainer
to "just apply patches":
- Obviously applies cleanly and does not produce compilation
errors/warning (patches that don't meet this will be removed)
- At least two Reviewed-by tags (one of them can be Acked-by/Tested-by)

One problem with that is that we need "a christoph" to poke people for
review since not a lot of people giving review.

Another thought was to allow multiple maintainers (my understanding is
the patchwork supports it). If we can find some split to allow different
maintainers to feed Roland this might work.

Thoughts?

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 08/11] IB/Verbs: Use management helper has_iwarp() for, iwarp-check

2015-03-31 Thread Michael Wang
On 03/31/2015 12:35 AM, Jason Gunthorpe wrote:
> On Mon, Mar 30, 2015 at 05:10:12PM +0200, Michael Wang wrote:
>> I found that actually we don't have to touch this one which
>> only used by HW driver currently.
> I'm having a hard time understanding this, the code in question was in
>
> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>
> Which is the NFS ULP, not a device driver.

I'm not familiar with this part too :-P but yes, it looks like an ulp
to support NFS.

Actually I'm thinking about Doug's idea to use rdma_transport_is_XX()
instead of the current basic helper, thus may be use rdma_transport_is_iwarp()
in here could be better, since it's actually a feature of iwarp tech
that RDMA Read only support one scatter-gather entry.

But I need more investigation on that idea, there are some part
especially inside device driver I'm not very clear, things could be more
easier if the semantic there is compatible with Doug's proposal ;-)

Regards,
Michael Wang

>
> Regards,
> Jason

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 07/11] IB/Verbs: Use management helper has_mcast() and, cap_mcast() for mcast-check

2015-03-31 Thread Michael Wang
On 03/31/2015 01:47 AM, ira.weiny wrote:
> On Mon, Mar 30, 2015 at 06:20:48PM +0200, Michael Wang wrote:
>> [snip]
 Cc: Jason Gunthorpe 
 Cc: Doug Ledford 
 Cc: Ira Weiny 
 Cc: Sean Hefty 
 Signed-off-by: Michael Wang 
 ---
  drivers/infiniband/core/cma.c   |  2 +-
  drivers/infiniband/core/multicast.c |  8 
  include/rdma/ib_verbs.h | 28 
  3 files changed, 33 insertions(+), 5 deletions(-)

 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
 index 276fb76..cbbc85b 100644
 --- a/drivers/infiniband/core/cma.c
 +++ b/drivers/infiniband/core/cma.c
 @@ -3398,7 +3398,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, 
 struct sockaddr *addr)
  ib_detach_mcast(id->qp,
  &mc->multicast.ib->rec.mgid,
  be16_to_cpu(mc->multicast.ib->rec.mlid));
 -if (rdma_transport_is_ib(id_priv->cma_dev->device)) {
 +if (has_mcast(id_priv->cma_dev->device)) {
>
> You need a similar check in rdma_join_multicast.
Thanks for the remind, I'll give it a reform :-)

Regards,
Michael Wang


>
> Ira
>
>
  switch (rdma_port_get_link_layer(id->device, 
 id->port_num)) {
  case IB_LINK_LAYER_INFINIBAND:
  ib_sa_free_multicast(mc->multicast.ib);
 diff --git a/drivers/infiniband/core/multicast.c 
 b/drivers/infiniband/core/multicast.c
 index 17573ff..ffeaf27 100644
 --- a/drivers/infiniband/core/multicast.c
 +++ b/drivers/infiniband/core/multicast.c
 @@ -780,7 +780,7 @@ static void mcast_event_handler(struct 
 ib_event_handler *handler,
  int index;
  
  dev = container_of(handler, struct mcast_device, event_handler);
 -if (!rdma_port_ll_is_ib(dev->device, event->element.port_num))
 +if (!cap_mcast(dev->device, event->element.port_num))
  return;
  
  index = event->element.port_num - dev->start_port;
 @@ -807,7 +807,7 @@ static void mcast_add_one(struct ib_device *device)
  int i;
  int count = 0;
  
 -if (!rdma_transport_is_ib(device))
 +if (!has_mcast(device))
  return;
  
  dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
 @@ -823,7 +823,7 @@ static void mcast_add_one(struct ib_device *device)
  }
  
  for (i = 0; i <= dev->end_port - dev->start_port; i++) {
 -if (!rdma_port_ll_is_ib(device, dev->start_port + i))
 +if (!cap_mcast(device, dev->start_port + i))
  continue;
  port = &dev->port[i];
  port->dev = dev;
 @@ -861,7 +861,7 @@ static void mcast_remove_one(struct ib_device *device)
  flush_workqueue(mcast_wq);
  
  for (i = 0; i <= dev->end_port - dev->start_port; i++) {
 -if (rdma_port_ll_is_ib(device, dev->start_port + i)) {
 +if (cap_mcast(device, dev->start_port + i)) {
  port = &dev->port[i];
  deref_port(port);
  wait_for_completion(&port->comp);
 diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
 index fa8ffa3..e796104 100644
 --- a/include/rdma/ib_verbs.h
 +++ b/include/rdma/ib_verbs.h
 @@ -1823,6 +1823,19 @@ static inline int has_sa(struct ib_device *device)
  }
  
  /**
 + * has_mcast - Check if a device support Multicast.
 + *
 + * @device: Device to be checked
 + *
 + * Return 0 when a device has none port to support
 + * Multicast.
 + */
 +static inline int has_mcast(struct ib_device *device)
 +{
 +return rdma_transport_is_ib(device);
 +}
 +
 +/**
   * cap_smi - Check if the port of device has the capability
   * Subnet Management Interface.
   *
 @@ -1852,6 +1865,21 @@ static inline int cap_sa(struct ib_device *device, 
 u8 port_num)
  return rdma_port_ll_is_ib(device, port_num);
  }
  
 +/**
 + * cap_mcast - Check if the port of device has the capability
 + * Multicast.
 + *
 + * @device: Device to be checked
 + * @port_num: Port number of the device
 + *
 + * Return 0 when port of the device don't support
 + * Multicast.
 + */
 +static inline int cap_mcast(struct ib_device *device, u8 port_num)
 +{
 +return rdma_port_ll_is_ib(device, port_num);
 +}
 +
  int ib_query_gid(struct ib_device *device,
   u8 port_num, int index, union ib_gid *gid);
  
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a m