Re: ibdiagnet: ibdiagnet /tmp clobbering vulnerability

2013-07-15 Thread Or Gerlitz

On 15/07/2013 08:19, Pramod Gunjikar wrote:

Sending this on behalf of Rajkumar Sivaprakasam.


you didn't copy the author


[PATCH] RDMA/cma: silence GCC warning

2013-07-15 Thread Paul Bolle
Building cma.o triggers this GCC warning:
drivers/infiniband/core/cma.c: In function ‘rdma_resolve_addr’:
drivers/infiniband/core/cma.c:465:23: warning: ‘port’ may be used 
uninitialized in this function [-Wmaybe-uninitialized]
drivers/infiniband/core/cma.c:426:5: note: ‘port’ was declared here

This is a false positive, as port will always be initialized if we reach
the "found" label.

Help GCC by initializing port to 0. Since valid port numbers
apparently start at 1, this is an invalid value. That could help in
analyzing the code and tracking down errors.

Signed-off-by: Paul Bolle pebo...@tiscali.nl
---
0) Compile tested only.

1) Perhaps a better way to silence GCC is to drop port entirely, and
assign to id_priv->id.port_num directly. Would that be acceptable?

 drivers/infiniband/core/cma.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f1c279f..c6c0a5f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -426,6 +426,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
u8 port, p;
int i;
 
+   port = 0; /* silence GCC */
cma_dev = NULL;
addr = (struct sockaddr_ib *) cma_dst_addr(id_priv);
dgid = (union ib_gid *) addr->sib_addr;
-- 
1.8.1.4
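
As a self-contained illustration of the warning pattern discussed above (not
the kernel code itself), the small program below shows how GCC can lose track
of an assignment that only happens on the path leading to a goto label;
depending on GCC version and optimization level it may emit the same
-Wmaybe-uninitialized warning, and the comments point at the two ways of
silencing it mentioned in the patch and in note (1):

#include <stdio.h>

static int lookup(int key, const int table[], int n)
{
	int port;	/* GCC may warn: 'port' may be used uninitialized */
	int i;

	for (i = 0; i < n; i++) {
		if (table[i] == key) {
			port = i + 1;	/* valid port numbers start at 1 */
			goto found;
		}
	}
	return -1;

found:
	/* Option 1 (the patch above): initialize 'port = 0' at declaration.
	 * Option 2 (note 1): drop 'port' and store into the final destination
	 * directly. */
	return port;
}

int main(void)
{
	const int table[] = { 7, 42, 13 };

	printf("%d\n", lookup(42, table, 3));
	return 0;
}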



Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable

2013-07-15 Thread Bart Van Assche

On 14/07/2013 3:43, Sagi Grimberg wrote:

On 7/3/2013 3:58 PM, Bart Van Assche wrote:

Several InfiniBand HCAs allow configuring the completion vector
per queue pair. This makes it possible to spread the workload created by IB
completion interrupts over multiple MSI-X vectors and hence over
multiple CPU cores. In other words, configuring the completion
vector properly not only helps to reduce latency on an initiator
connected to multiple SRP targets but also helps to improve
throughput.


Hey Bart,
I just wrote a small patch to allow srp_daemon to spread connections across
the HCA's completion vectors.
But re-thinking this, is it really a good idea to give the user control
over completion vectors for CQs he doesn't really own? This way the user
must retrieve the maximum number of completion vectors from the ib_device
and consider this when adding a connection, and in addition will need to
set proper IRQ affinity.

Perhaps the driver can manage this on its own without involving the user;
take the mlx4_en driver for example, it spreads its CQs across the HCA's
completion vectors without involving the user. The user that opens a
socket has no influence on the underlying cq->comp_vector assignment.

The only use-case I can think of is where the user wants to use only a
subset of the completion vectors, e.g. if the user wants to reserve some
completion vectors for native IB applications, but I don't know how
common that is.

Other than that, I think it is always better to spread the CQs across the
HCA's completion vectors, so perhaps the driver should just assign
connection CQs across comp-vecs without getting args from the user, but
simply iterate over comp_vectors.

What do you think?


Hello Sagi,

Sorry but I do not think it is a good idea to let srp_daemon assign the 
completion vector. While this might work well on single-socket systems, 
it will lead to suboptimal results on NUMA systems. For certain 
workloads on NUMA systems, and when a NUMA initiator system is connected 
to multiple target systems, the optimal configuration is to make sure 
that all processing that is associated with a single SCSI host occurs on 
the same NUMA node. This means configuring the completion vector value 
such that IB interrupts are generated on the same NUMA node where the 
associated SCSI host and applications are running.


More generally, performance tuning on NUMA systems requires system-wide 
knowledge of all applications that are running and also of which 
interrupt is processed by which NUMA node. So choosing a proper value 
for the completion vector is only possible once the system topology and 
the IRQ affinity masks are known. I don't think we should build 
knowledge of all this into srp_daemon.


Bart.



Re: [PATCH V7 libibverbs 0/7] Add extension and XRC QP support

2013-07-15 Thread Yishai Hadas

Hi Jason,

Let me clarify the place holder reservation mentioned in the cover letter.
The entries I was referring to are not proprietary vendor verbs but rather
user space verbs that are on their way to being submitted upstream for
inclusion in libibverbs/libmlx4.
Two of these verbs are the userspace API for Flow-Steering, whose kernel
part (as you know) is under review in linux-rdma. These verbs are
ibv_create_flow/ibv_destroy_flow, which serve to attach and detach QPs
to/from flows using flow specifications. The third verb deals with
registration of shared memory, details below.
Each verb consumes two pointers, one for the verbs library code and one
for the provider-specific library implementation, for a total of six
pointers.
I think we all agree that the current milestone to meet w.r.t. verbs
extensions is to have the basic extensions mechanism AND the XRC patches
merged.
On the other hand, RAW Ethernet QPs are merged upstream (kernel + user
space) but without an upstream mechanism to actually receive traffic on
them, which makes them pretty much useless.
So in that respect, it would make sense for us as a vendor to provide
customers through Mellanox OFED the means to use them. What is asked now
is to reserve the pointers, no more.
As for the Shared-MR, indeed it would have been fair to require submitting
this code prior to the pointer reservation, so for next time...

To sum up, we think it would be a constructive step to continue with this
series while reserving the six pointers, or, if this really helps, to
submit for review the libibverbs part of those verbs along with the basic
verbs extensions patches.


What do you think?
By the way I'm OOO for a week starting 16/7.

Yishai

New verb, “Register Shared Memory Region”: this verb is defined in the
IB spec, section 11.2.8.8.


“Given an existing Memory Region, a new independent Memory Region 
associated with the same physical memory locations is created by that 
new verb.
Through repeated calls to the Verb, an arbitrary number of Memory 
Regions can potentially share the same physical memory locations.
The memory region created by this verb behaves identically to memory 
regions created by the other memory registration verbs.”


In general, sharing an MR involves 2 steps:
1) Using the regular mechanism to create a Memory Region, the application
can indicate that the MR should be created with the option to be shared
later. The application supplies the allowed sharing access for that MR.
2) Using the “Register Shared MR” verb to register to the pre-allocated
shared MR.


On 7/11/2013 8:07 PM, Jason Gunthorpe wrote:

On Thu, Jul 11, 2013 at 06:04:16PM +0300, Yishai Hadas wrote:


ABI support with OFED release, added place holder for 6 function pointers
  in verbs_context to enable ABI compatibility with OFED release that already
  used 6 extended verbs before XRC.

WTF !?!

The extension mechanism is NOT A TOY.

IT IS NOT FOR VENDORS TO USE.

Extension ids must be allocated at upstream, by Roland.

If you deviate from upstream you *MUST* change the ibverbs SONAME.

PERIOD.

What version of OFA OFED did this? This is a violation of the new
upstream first policy at OFA!

So, I'm going to NAK this. It sets an exceedingly bad precedent going
forward.

Jason




Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable

2013-07-15 Thread Sagi Grimberg

On 7/15/2013 2:06 PM, Bart Van Assche wrote:

On 14/07/2013 3:43, Sagi Grimberg wrote:

On 7/3/2013 3:58 PM, Bart Van Assche wrote:

Several InfiniBand HCAs allow configuring the completion vector
per queue pair. This makes it possible to spread the workload created by IB
completion interrupts over multiple MSI-X vectors and hence over
multiple CPU cores. In other words, configuring the completion
vector properly not only helps to reduce latency on an initiator
connected to multiple SRP targets but also helps to improve
throughput.


Hey Bart,
I just wrote a small patch to allow srp_daemon to spread connections across
the HCA's completion vectors.
But re-thinking this, is it really a good idea to give the user control
over completion vectors for CQs he doesn't really own? This way the user
must retrieve the maximum number of completion vectors from the ib_device
and consider this when adding a connection, and in addition will need to
set proper IRQ affinity.

Perhaps the driver can manage this on its own without involving the user;
take the mlx4_en driver for example, it spreads its CQs across the HCA's
completion vectors without involving the user. The user that opens a
socket has no influence on the underlying cq->comp_vector assignment.

The only use-case I can think of is where the user wants to use only a
subset of the completion vectors, e.g. if the user wants to reserve some
completion vectors for native IB applications, but I don't know how
common that is.

Other than that, I think it is always better to spread the CQs across the
HCA's completion vectors, so perhaps the driver should just assign
connection CQs across comp-vecs without getting args from the user, but
simply iterate over comp_vectors.

What do you think?


Hello Sagi,

Sorry but I do not think it is a good idea to let srp_daemon assign 
the completion vector. While this might work well on single-socket 
systems, it will lead to suboptimal results on NUMA systems. For 
certain workloads on NUMA systems, and when a NUMA initiator system is 
connected to multiple target systems, the optimal configuration is to 
make sure that all processing that is associated with a single SCSI 
host occurs on the same NUMA node. This means configuring the 
completion vector value such that IB interrupts are generated on the 
same NUMA node where the associated SCSI host and applications are 
running.


More generally, performance tuning on NUMA systems requires 
system-wide knowledge of all applications that are running and also of 
which interrupt is processed by which NUMA node. So choosing a proper 
value for the completion vector is only possible once the system 
topology and the IRQ affinity masks are known. I don't think we should 
build knowledge of all this into srp_daemon.


Bart.



Hey Bart,

Thanks for your quick attention to my question.
srp_daemon is a package designated for the customer to automatically
detect targets in the IB fabric. From our experience here in Mellanox,
customers/users like automatic plug & play tools.
They are reluctant to build their own scriptology to enhance performance
and settle with srp_daemon, which is preferred over use of ibsrpdm and
manually adding new targets.
Regardless, the completion vectors assignment is meaningless without
setting proper IRQ affinity, so in the worst case where the user didn't
set his IRQ affinity, this assignment will perform like the default
completion vector assignment, as all IRQs are directed without any
masking, i.e. to core 0.

From my experiments in NUMA systems, optimal performance is gained
where all IRQs are directed to half of the cores on the NUMA node close
to the HCA, and all traffic generators share the other half of the cores
on the same NUMA node. So based on that knowledge, I thought that
srp_daemon/the srp driver would assign its CQs across the HCA's completion
vectors, and the user is encouraged to set the IRQ affinity as described
above to gain optimal performance.
Adding connections over the far NUMA node doesn't seem to benefit
performance too much...

As I mentioned, a use-case I see that may raise a problem here is if
the user would like to maintain multiple SRP connections and reserve
some completion vectors for other IB applications on the system.
In this case the user will be able to disable the srp_daemon/srp driver
completion vectors assignment.

So, this was just an idea, an easy implementation that would
potentially give the user a semi-automatic performance-optimized
configuration...

-Sagi
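
To make the idea above concrete, here is a minimal userspace sketch of
round-robin spreading across a device's completion vectors using libibverbs.
The SRP initiator itself is kernel code, so this only illustrates the general
pattern; the helper and the per-process counter are hypothetical:

#include <stdio.h>
#include <infiniband/verbs.h>

static int next_vector;	/* hypothetical round-robin state */

static struct ibv_cq *create_spread_cq(struct ibv_context *ctx, int cqe)
{
	/* num_comp_vectors is reported by the device when it is opened;
	 * comp_vector is the last argument of ibv_create_cq(). */
	int vec = next_vector++ % ctx->num_comp_vectors;

	return ibv_create_cq(ctx, cqe, NULL, NULL, vec);
}

int main(void)
{
	int num, i;
	struct ibv_device **list = ibv_get_device_list(&num);
	struct ibv_context *ctx;

	if (!list || !num)
		return 1;

	ctx = ibv_open_device(list[0]);
	if (!ctx)
		return 1;

	for (i = 0; i < 4; i++) {
		struct ibv_cq *cq = create_spread_cq(ctx, 256);

		if (cq) {
			printf("CQ %d created\n", i);
			ibv_destroy_cq(cq);
		}
	}

	ibv_close_device(ctx);
	ibv_free_device_list(list);
	return 0;
}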


Re: [PATCH V2] libibverbs: Allow arbitrary int values for MTU

2013-07-15 Thread Jeff Squyres (jsquyres)
Bump.

On Jul 10, 2013, at 8:14 AM, Jeff Squyres (jsquyres) jsquy...@cisco.com wrote:

 On Jul 8, 2013, at 1:26 PM, Jason Gunthorpe jguntho...@obsidianresearch.com 
 wrote:
 
 Jeff's patch doesn't break old binaries, old binaries, running with
 normal IB MTUs work fine. The structure layouts all stay the same,
 etc.
 
 
 FWIW, I did a simple test to confirm this.  I installed a stock git HEAD 
 libibverbs into $HOME/libibverbs-HEAD and a libibverbs with the MTU patch in 
 $HOME/libibverbs-mtu-patch.  The mlx4 driver was installed into both trees (I 
 used some fairly old Mellanox HCAs+Dell servers for this test).
 
 This is the base case:
 
 -
 [5:06] dell012:~ ❯❯❯ cd libibverbs-HEAD
 [5:07] dell012:~/libibverbs-HEAD ❯❯❯ ldd bin/ibv_rc_pingpong
   linux-vdso.so.1 =>  (0x2aacb000)
   libibverbs.so.1 => /home/jsquyres/libibverbs-HEAD/lib/libibverbs.so.1 (0x2accd000)
   libpthread.so.0 => /lib64/libpthread.so.0 (0x2aeec000)
   libdl.so.2 => /lib64/libdl.so.2 (0x2b109000)
   libc.so.6 => /lib64/libc.so.6 (0x2b30e000)
   /lib64/ld-linux-x86-64.so.2 (0x2aaab000)
 [5:07] dell012:~/libibverbs-HEAD ❯❯❯ ./bin/ibv_rc_pingpong dell011
  local address:  LID 0x0004, QPN 0x04004a, PSN 0xc08742, GID ::
  remote address: LID 0x0019, QPN 0x20004a, PSN 0x44c48e, GID ::
 8192000 bytes in 0.02 seconds = 4170.28 Mbit/sec
 1000 iters in 0.02 seconds = 15.72 usec/iter
 -
 
 Works fine.  Now let's use the same libibverbs-HEAD rc pingpong binary, but 
 with the MTU-patched libibverbs.so:
 
 -
 [5:07] dell012:~/libibverbs-HEAD ❯❯❯ mv lib/libibverbs.so.1 
 lib/libibverbs.so.1-bogus
 [5:07] dell012:~/libibverbs-HEAD ❯❯❯ export 
 LD_LIBRARY_PATH=$HOME/libibverbs-mtu-patch/lib
 [5:08] dell012:~/libibverbs-HEAD ❯❯❯ ldd bin/ibv_rc_pingpong
   linux-vdso.so.1 =>  (0x2aacb000)
   libibverbs.so.1 => /home/jsquyres/libibverbs-mtu-patch/lib/libibverbs.so.1 (0x2accd000)
   libpthread.so.0 => /lib64/libpthread.so.0 (0x2aeed000)
   libdl.so.2 => /lib64/libdl.so.2 (0x2b10a000)
   libc.so.6 => /lib64/libc.so.6 (0x2b30e000)
   /lib64/ld-linux-x86-64.so.2 (0x2aaab000)
 [5:08] dell012:~/libibverbs-HEAD ❯❯❯ ./bin/ibv_rc_pingpong dell011
  local address:  LID 0x0004, QPN 0x08004a, PSN 0x65391c, GID ::
  remote address: LID 0x0019, QPN 0x24004a, PSN 0x7d137e, GID ::
 8192000 bytes in 0.02 seconds = 4163.39 Mbit/sec
 1000 iters in 0.02 seconds = 15.74 usec/iter
 -
 
 Still works fine.


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [PATCH] libibverbs: Add the use of IBV_SEND_INLINE to example pingpong programs

2013-07-15 Thread Jeff Squyres (jsquyres)
Bump.

On Jul 10, 2013, at 4:32 PM, Jeff Squyres jsquy...@cisco.com wrote:

 If the send size is less than the cap.max_inline_data reported by the
 qp, use the IBV_SEND_INLINE flag.  This not only shows an example of
 using ibv_query_qp(), it also reduces the latency shown by the
 pingpong programs when the sends can be inlined.
 
 Signed-off-by: Jeff Squyres jsquy...@cisco.com
 ---
 examples/rc_pingpong.c  | 18 +-
 examples/srq_pingpong.c | 19 +--
 examples/uc_pingpong.c  | 17 -
 examples/ud_pingpong.c  | 18 +-
 4 files changed, 51 insertions(+), 21 deletions(-)
 
 diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c
 index 15494a1..a8637a5 100644
 --- a/examples/rc_pingpong.c
 +++ b/examples/rc_pingpong.c
 @@ -65,6 +65,7 @@ struct pingpong_context {
   struct ibv_qp   *qp;
   void*buf;
   int  size;
 + int  send_flags;
   int  rx_depth;
   int  pending;
   struct ibv_port_attr portinfo;
 @@ -319,8 +320,9 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 	if (!ctx)
 		return NULL;
 
 -	ctx->size     = size;
 -	ctx->rx_depth = rx_depth;
 +	ctx->size       = size;
 +	ctx->send_flags = IBV_SEND_SIGNALED;
 +	ctx->rx_depth   = rx_depth;
 
 	ctx->buf = memalign(page_size, size);
 	if (!ctx->buf) {
 @@ -367,7 +369,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 	}
 
 	{
 -		struct ibv_qp_init_attr attr = {
 +		struct ibv_qp_attr attr;
 +		struct ibv_qp_init_attr init_attr = {
 			.send_cq = ctx->cq,
 			.recv_cq = ctx->cq,
 			.cap = {
 @@ -379,11 +382,16 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 			.qp_type = IBV_QPT_RC
 		};
 
 -		ctx->qp = ibv_create_qp(ctx->pd, &attr);
 +		ctx->qp = ibv_create_qp(ctx->pd, &init_attr);
 		if (!ctx->qp)  {
 			fprintf(stderr, "Couldn't create QP\n");
 			goto clean_cq;
 		}
 +
 +		ibv_query_qp(ctx->qp, &attr, IBV_QP_CAP, &init_attr);
 +		if (init_attr.cap.max_inline_data >= size) {
 +			ctx->send_flags |= IBV_SEND_INLINE;
 +		}
 	}
 
 	{
 @@ -508,7 +516,7 @@ static int pp_post_send(struct pingpong_context *ctx)
 		.sg_list    = &list,
 		.num_sge    = 1,
 		.opcode     = IBV_WR_SEND,
 -		.send_flags = IBV_SEND_SIGNALED,
 +		.send_flags = ctx->send_flags,
 	};
 	struct ibv_send_wr *bad_wr;
 
 diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c
 index 6e00f8c..552a144 100644
 --- a/examples/srq_pingpong.c
 +++ b/examples/srq_pingpong.c
 @@ -68,6 +68,7 @@ struct pingpong_context {
   struct ibv_qp   *qp[MAX_QP];
   void*buf;
   int  size;
 + int  send_flags;
   int  num_qp;
   int  rx_depth;
   int  pending[MAX_QP];
 @@ -350,9 +351,10 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 	if (!ctx)
 		return NULL;
 
 -	ctx->size     = size;
 -	ctx->num_qp   = num_qp;
 -	ctx->rx_depth = rx_depth;
 +	ctx->size       = size;
 +	ctx->send_flags = IBV_SEND_SIGNALED;
 +	ctx->num_qp     = num_qp;
 +	ctx->rx_depth   = rx_depth;
 
 	ctx->buf = memalign(page_size, size);
 	if (!ctx->buf) {
 @@ -413,7 +415,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 	}
 
 	for (i = 0; i < num_qp; ++i) {
 -		struct ibv_qp_init_attr attr = {
 +		struct ibv_qp_attr attr;
 +		struct ibv_qp_init_attr init_attr = {
 			.send_cq = ctx->cq,
 			.recv_cq = ctx->cq,
 			.srq = ctx->srq,
 @@ -424,11 +427,15 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev, int size,
 			.qp_type = IBV_QPT_RC
 		};
 
 -		ctx->qp[i] = ibv_create_qp(ctx->pd, &attr);
 +		ctx->qp[i] = ibv_create_qp(ctx->pd, &init_attr);
 		if (!ctx->qp[i])  {
 			fprintf(stderr, "Couldn't create QP[%d]\n", i);
 			goto clean_qps;
 		}
 +		ibv_query_qp(ctx->qp[i], &attr, IBV_QP_CAP, &init_attr);
 +		if (init_attr.cap.max_inline_data >= size) {
 +			ctx->send_flags |= IBV_SEND_INLINE;
 +		}
 	}
 
 	for (i = 0; i < num_qp; ++i) {
 @@ -568,7 +575,7 @@ static int pp_post_send(struct 

[PATCH infiniband-diags] infiniband-diags: Eliminate unneeded clean_nodedesc calls

2013-07-15 Thread Hal Rosenstock

in ibroute and dump_fts
clean_nodedesc is called if needed within remap_node_name

Signed-off-by: Hal Rosenstock h...@mellanox.com
---
diff --git a/src/dump_fts.c b/src/dump_fts.c
index 89f0dce..5383c9d 100644
--- a/src/dump_fts.c
+++ b/src/dump_fts.c
@@ -154,7 +154,7 @@ void dump_multicast_tables(ibnd_node_t * node, unsigned startlid,
 		endlid = IB_MAX_MCAST_LID;
 	}
 
-	mapnd = remap_node_name(node_name_map, nodeguid, clean_nodedesc(nd));
+	mapnd = remap_node_name(node_name_map, nodeguid, nd);
 
 	printf("Multicast mlids [0x%x-0x%x] of switch %s guid 0x%016" PRIx64
 	       " (%s):\n", startlid, endlid, portid2str(portid), nodeguid,
@@ -289,7 +289,7 @@ int dump_lid(char *str, int str_len, int lid, int valid,
 		*last_port_lid = baselid + (1 << lmc) - 1;
 	}
 
-	mapnd = remap_node_name(node_name_map, nodeguid, clean_nodedesc(nd));
+	mapnd = remap_node_name(node_name_map, nodeguid, nd);
 
 	rc = snprintf(str, str_len, ": (%s portguid %s: '%s')",
 		      mad_dump_val(IB_NODE_TYPE_F, ntype, sizeof ntype,
@@ -331,7 +331,7 @@ void dump_unicast_tables(ibnd_node_t * node, int startlid, int endlid,
 		endlid = IB_MAX_UCAST_LID;
 	}
 
-	mapnd = remap_node_name(node_name_map, nodeguid, clean_nodedesc(nd));
+	mapnd = remap_node_name(node_name_map, nodeguid, nd);
 
 	printf("Unicast lids [0x%x-0x%x] of switch %s guid 0x%016" PRIx64
 	       " (%s):\n", startlid, endlid, portid2str(portid), nodeguid,
diff --git a/src/ibroute.c b/src/ibroute.c
index d4827c9..350a557 100644
--- a/src/ibroute.c
+++ b/src/ibroute.c
@@ -178,7 +178,7 @@ char *dump_multicast_tables(ib_portid_t * portid, unsigned startlid,
 		endlid = IB_MAX_MCAST_LID;
 	}
 
-	mapnd = remap_node_name(node_name_map, nodeguid, clean_nodedesc(nd));
+	mapnd = remap_node_name(node_name_map, nodeguid, nd);
 
 	printf("Multicast mlids [0x%x-0x%x] of switch %s guid 0x%016" PRIx64
 	       " (%s):\n", startlid, endlid, portid2str(portid), nodeguid,
@@ -309,7 +309,7 @@ int dump_lid(char *str, int strlen, int lid, int valid)
 		last_port_lid = baselid + (1 << lmc) - 1;
 	}
 
-	mapnd = remap_node_name(node_name_map, nodeguid, clean_nodedesc(nd));
+	mapnd = remap_node_name(node_name_map, nodeguid, nd);
 
 	rc = snprintf(str, strlen, ": (%s portguid %s: '%s')",
 		      mad_dump_val(IB_NODE_TYPE_F, ntype, sizeof ntype,
@@ -348,7 +348,7 @@ char *dump_unicast_tables(ib_portid_t * portid, int startlid, int endlid)
 		endlid = IB_MAX_UCAST_LID;
 	}
 
-	mapnd = remap_node_name(node_name_map, nodeguid, clean_nodedesc(nd));
+	mapnd = remap_node_name(node_name_map, nodeguid, nd);
 
 	printf("Unicast lids [0x%x-0x%x] of switch %s guid 0x%016" PRIx64
 	       " (%s):\n", startlid, endlid, portid2str(portid), nodeguid,


Re: mlx5: Add driver for Mellanox Connect-IB adapters

2013-07-15 Thread Or Gerlitz

On 10/07/2013 13:55, Dan Carpenter wrote:

Hello Eli Cohen,

The patch e126ba97dba9: mlx5: Add driver for Mellanox Connect-IB
adapters from Jul 7, 2013, leads to the following Smatch warning:
drivers/net/ethernet/mellanox/mlx5/core/cmd.c:822
mlx5_alloc_cmd_msg()
 warn: use 'flags' here instead of GFP_XXX?

811  static struct mlx5_cmd_msg *mlx5_alloc_cmd_msg(struct mlx5_core_dev *dev,
812 gfp_t flags, int size)
^^^

813  {
814  struct mlx5_cmd_mailbox *tmp, *head = NULL;
815  struct mlx5_cmd_prot_block *block;
816  struct mlx5_cmd_msg *msg;
817  int blen;
818  int err;
819  int n;
820  int i;
821
822  msg = kzalloc(sizeof(*msg), GFP_KERNEL);
 ^^
823  if (!msg)
824  return ERR_PTR(-ENOMEM);
825
826  blen = size - min_t(int, sizeof(msg->first.data), size);
827  n = (blen + MLX5_CMD_DATA_BLOCK_SIZE - 1) / MLX5_CMD_DATA_BLOCK_SIZE;
828
829  for (i = 0; i < n; i++) {
830  tmp = alloc_cmd_box(dev, flags);
  ^
There is a kmalloc() in alloc_cmd_box() that uses flags as well as a
pci_pool_alloc() that uses it.

831  if (IS_ERR(tmp)) {
832  mlx5_core_warn(dev, "failed allocating block\n");




Hi,

I think Eli has some problems with his email account for mailing lists, 
so sorry for the delay. Once this is fixed, he should be addressing your 
email.

Or.




Re: rtnl_lock deadlock on 3.10

2013-07-15 Thread Shawn Bohrer
On Wed, Jul 03, 2013 at 08:26:11PM +0300, Or Gerlitz wrote:
 On 03/07/2013 20:22, Shawn Bohrer wrote:
 On Wed, Jul 03, 2013 at 07:33:07AM +0200, Hannes Frederic Sowa wrote:
 On Wed, Jul 03, 2013 at 07:11:52AM +0200, Hannes Frederic Sowa wrote:
 On Tue, Jul 02, 2013 at 01:38:26PM +, Cong Wang wrote:
 On Tue, 02 Jul 2013 at 08:28 GMT, Hannes Frederic Sowa 
 han...@stressinduktion.org wrote:
 On Mon, Jul 01, 2013 at 09:54:56AM -0500, Shawn Bohrer wrote:
 I've managed to hit a deadlock at boot a couple times while testing
 the 3.10 rc kernels.  It seems to always happen when my network
 devices are initializing.  This morning I updated to v3.10 and made a
 few config tweaks and so far I've hit it 4 out of 5 reboots.  It looks
 like most processes are getting stuck on rtnl_lock.  Below is a boot
 log with the soft lockup prints.  Please let know if there is any
 other information I can provide:
 Could you try a build with CONFIG_LOCKDEP enabled?
 
 The problem is clear: ib_register_device() is called with rtnl_lock,
 but itself needs device_mutex, however, ib_register_client() first
 acquires device_mutex, then indirectly calls register_netdev() which
 takes rtnl_lock. Deadlock!
 
 One possible fix is always taking rtnl_lock before taking
 device_mutex, something like below:
 
 diff --git a/drivers/infiniband/core/device.c 
 b/drivers/infiniband/core/device.c
 index 18c1ece..890870b 100644
 --- a/drivers/infiniband/core/device.c
 +++ b/drivers/infiniband/core/device.c
 @@ -381,6 +381,7 @@ int ib_register_client(struct ib_client *client)
   {
   struct ib_device *device;
 + rtnl_lock();
 	mutex_lock(&device_mutex);
 	list_add_tail(&client->list, &client_list);
 @@ -389,6 +390,7 @@ int ib_register_client(struct ib_client *client)
 	client->add(device);
 	mutex_unlock(&device_mutex);
 + rtnl_unlock();
   return 0;
   }
 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
 b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 index b6e049a..5a7a048 100644
 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
 @@ -1609,7 +1609,7 @@ static struct net_device *ipoib_add_port(const char 
 *format,
   goto event_failed;
   }
 -	result = register_netdev(priv->dev);
 +	result = register_netdevice(priv->dev);
 	if (result) {
 		printk(KERN_WARNING "%s: couldn't register ipoib port %d; error %d\n",
 		       hca->name, port, result);
 Looks good to me. Shawn, could you test this patch?
 ib_unregister_device/ib_unregister_client would need the same change,
 too. I have not checked the other -add() and -remove() functions. Also
 cc'ed linux-rdma@vger.kernel.org, Roland Dreier.
 Cong's patch is missing the #include <linux/rtnetlink.h> but otherwise
 I've had 34 successful reboots with no deadlocks which is a good sign.
 It sounds like there are more paths that need to be audited and a
 proper patch submitted.  I can do more testing later if needed.
 
 Thanks,
 Shawn
 
 
 Guys, I was a bit busy today looking into that, but I don't think we
 want the IB core layer  (core/device.c) to
 use rtnl locking which is something that belongs to the network stack.

Has any more thought been put into a proper fix for this issue?

Thanks,
Shawn



Repeated Stale Connection errors on successive reconnect attempt with new cmid/QPs..

2013-07-15 Thread lokesh agarwal
Hello,
I am using rdmacm as my connection manager and I have N processes trying to
establish a complete mesh of connections over RC protocol.

I am receiving Connection rejected event with error 10.  As per the
IBTA, the Connection Server
is rejecting the connection requests thinking that it is a stale connection.

I read an earlier response from Sean stating that it is advisable to
recreate the QP and retry
the connection. That is exactly what I am doing.
Every time I get err 10, I destroy the cmid and the attached QP, and go
through the entire connection management sequence again (create a new
cmid, resolve addr/resolve route, connect).
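
For reference, a compact librdmacm sketch of the teardown-and-retry sequence
just described; event handling and error paths are omitted, and the helper
name and parameters are purely illustrative:

#include <rdma/rdma_cma.h>

static int reconnect(struct rdma_event_channel *ch, struct sockaddr *dst,
		     struct ibv_pd *pd, struct ibv_qp_init_attr *qp_attr,
		     struct rdma_conn_param *param, struct rdma_cm_id **id)
{
	/* Tear down the rejected connection: QP first, then the cm_id. */
	if (*id) {
		rdma_destroy_qp(*id);
		rdma_destroy_id(*id);
		*id = NULL;
	}

	/* Start over with a fresh cm_id. */
	if (rdma_create_id(ch, id, NULL, RDMA_PS_TCP))
		return -1;
	if (rdma_resolve_addr(*id, NULL, dst, 2000))
		return -1;
	/* ...wait for RDMA_CM_EVENT_ADDR_RESOLVED on 'ch'... */
	if (rdma_resolve_route(*id, 2000))
		return -1;
	/* ...wait for RDMA_CM_EVENT_ROUTE_RESOLVED on 'ch'... */
	if (rdma_create_qp(*id, pd, qp_attr))
		return -1;
	return rdma_connect(*id, param);
}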

The reconnect attempt succeeds until the Route resolve step, at which
point I create a new QP, and issue the connect.
I am stuck in this loop where I keep getting Rejects (err 10) after
repeated attempts.
It almost seems as if the underlying QP number is being recycled and
the remote Connection server keeps rejecting the connection requests.

As per the IBTA:
A CM may receive a REQ/REP specifying a remote QPN in
“REQ:local QPN”/”REP:local QPN” that the CM already considers connected
to a local QP. A local CM may receive such a REQ/REP if its local
QP has a stale connection, as described in section 12.4.1. When a CM
receives such a REQ/REP it shall abort the connection establishment by
issuing REJ to the REQ/REP. It shall then issue DREQ, with “DREQ:remote
QPN” set to the remote QPN from the REQ/REP, until DREP is received
or Max Retries is exceeded, and place the local QP in the TimeWait state

So, the connection Server should issue a disconnect request. But the connecting
client does not get any disconnect requests..? Should librdmacm be
sending out this
disconnect..? Could this be a bug in the kernel driver..?


I scanned the source for cm.c, and I don't see the kernel driver
issuing a disconnect when it detects a stale connection...


-Lokesh


Re: mlx5: Add driver for Mellanox Connect-IB adapters

2013-07-15 Thread Eli Cohen
Hi Dan,
thanks for catching this. I will fix this in a subsequent patch.
Currently we can consider this non-critical as we don't use commands
from interrupt context.
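
For context, the change the report points at would be roughly the following
one-liner; this is only a sketch based on the quoted code, not the actual
follow-up patch (hunk line numbers omitted):

--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ mlx5_alloc_cmd_msg()
-	msg = kzalloc(sizeof(*msg), GFP_KERNEL);
+	msg = kzalloc(sizeof(*msg), flags);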

On Wed, Jul 10, 2013 at 01:55:38PM +0300, Dan Carpenter wrote:
 Hello Eli Cohen,
 
 The patch e126ba97dba9: mlx5: Add driver for Mellanox Connect-IB
 adapters from Jul 7, 2013, leads to the following Smatch warning:
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:822
 mlx5_alloc_cmd_msg()
warn: use 'flags' here instead of GFP_XXX?
 
811  static struct mlx5_cmd_msg *mlx5_alloc_cmd_msg(struct mlx5_core_dev *dev,
812 gfp_t flags, int size)
^^^
 
813  {
814  struct mlx5_cmd_mailbox *tmp, *head = NULL;
815  struct mlx5_cmd_prot_block *block;
816  struct mlx5_cmd_msg *msg;
817  int blen;
818  int err;
819  int n;
820  int i;
821  
822  msg = kzalloc(sizeof(*msg), GFP_KERNEL);
 ^^
823  if (!msg)
824  return ERR_PTR(-ENOMEM);
825  
826  blen = size - min_t(int, sizeof(msg->first.data), size);
827  n = (blen + MLX5_CMD_DATA_BLOCK_SIZE - 1) / MLX5_CMD_DATA_BLOCK_SIZE;
828  
829  for (i = 0; i < n; i++) {
830  tmp = alloc_cmd_box(dev, flags);
  ^
 There is a kmalloc() in alloc_cmd_box() that uses flags as well as a
 pci_pool_alloc() that uses it.
 
831  if (IS_ERR(tmp)) {
832  mlx5_core_warn(dev, "failed allocating block\n");
 
 regards,
 dan carpenter
 


[PATCH 3.11-rc1] mlx5 core: Fix __udivdi3 when compiling for 32 bit arches

2013-07-15 Thread Tim Gardner
Cc: Eli Cohen e...@mellanox.com
Signed-off-by: Tim Gardner tim.gard...@canonical.com
---
 drivers/net/ethernet/mellanox/mlx5/core/debugfs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index 4273c06..9c7194b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -156,7 +156,7 @@ static ssize_t average_read(struct file *filp, char __user *buf, size_t count,
 	stats = filp->private_data;
 	spin_lock(&stats->lock);
 	if (stats->n)
-		field = stats->sum / stats->n;
+		field = div64_u64(stats->sum, stats->n);
 	spin_unlock(&stats->lock);
 	ret = snprintf(tbuf, sizeof(tbuf), "%llu\n", field);
 	if (ret > 0) {
-- 
1.7.9.5
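
For readers hitting this on 32-bit builds: GCC lowers a plain 64-by-64-bit
division into a call to the libgcc helper __udivdi3, which the kernel does not
provide, so the build fails at link time; div64_u64() from <linux/math64.h>
performs the same division portably. A minimal sketch of the pattern
(illustrative, not the driver code):

#include <linux/math64.h>

/* 'sum / n' with both operands u64 would emit __udivdi3 on 32-bit builds. */
static u64 stats_average(u64 sum, u64 n)
{
	return n ? div64_u64(sum, n) : 0;
}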



Re: ibdiagnet: ibdiagnet /tmp clobbering vulnerability

2013-07-15 Thread Pramod Gunjikar

Copying Rajkumar. Rajkumar is also in on this alias.

Thanks
   Pramod

Sending this on behalf of Rajkumar Sivaprakasam.

  Pramod
-- 



The diff of the proposed changes for fixing the CVE 2013-2561 is in
the  attachment to this mail. The diffs are based on 
~kliteyn/ibutils.git.

Please review the proposed changes for the fix. See below for the
details about the issue and the proposed fix.

Synopsis:
 CVE-2013-2561: ibdiagnet /tmp clobbering vulnerability

Problem statement:
 The ibdiagnet  utility creates many files in the /tmp/ directory.
If  any  of  these  files  exist  already,  they  are  deleted during
initialization of the command except for,

 ibdiagnet.log
 ibdiagnet.db
 ibdiagnet_ibis.log

 These files, if  present, are  truncated to zero  length  and the
relevant data written to them.

 A malicious user (non-privileged) can create a soft link with one
or more of the above three files pointing to an important  system file
(example /etc/shadow), since /tmp/ has write  access by  everyone. The
next time  ibdiagnet  is  run  as root, the command will just open the
symlink,  truncating the system file and  writing the ibdiagnet data into
those files, possibly bringing down the system.

Fix Summary:
 Remove  the  above  files  too  in a way that does not impair the
functionality.  This  breaks  any  symlink  that  was  created  by the
malicious user  preventing the data  corruption. Since these files are
always  opened  with  the  access  mode 'w'  (write,  with  file size
truncated  to  0)  it  is  safe  to delete these files and treat them
similarly to the other files.
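
As a generic illustration of the pattern this summary relies on (delete any
pre-existing file so that a planted symlink is broken before the file is
opened for write), here is a minimal C sketch; the real fix lives in the Tcl
code (ibdebug.tcl) and the path below is purely hypothetical:

#include <stdio.h>
#include <unistd.h>

/* Break any attacker-planted symlink, then create a fresh file. */
static FILE *open_log_safely(const char *path)
{
	unlink(path);	/* ignore ENOENT; the file may simply not exist */
	return fopen(path, "w");
}

int main(void)
{
	FILE *f = open_log_safely("/tmp/example_ibdiag.log");	/* hypothetical */

	if (!f)
		return 1;
	fprintf(f, "log starts here\n");
	fclose(f);
	return 0;
}

Note that unlink-then-open narrows the window but is not a complete defence
against a racing attacker; opening with O_EXCL|O_NOFOLLOW or writing under a
private directory would be stronger, but the sketch mirrors what the described
Tcl change does.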

Fix Details:
 The  ibdiagnet  utility  at  the  initialization path of the code
(StartIBDIAG() in  ibdebug.tcl) calls  DeleteOldFiles()  routine which
deletes  all  the  files  that will be created by the utility, if they
already  exist. This  routine  however  skips  the  ibdiagnet.log and
ibdiagnet.db files.

 The  ibdiagnet.log  file  is  opened  as soon as the command line
arguments  are  parsed  (in  ParseArgv())  much  before StartIBDIAG()
routine is executed. This is done to log any errors encountered during
the early  initialization phase. If the  DeleteOldFiles()  deletes the
log file  later, the  utility will  continue to write to the file but
the  entry  will  be  removed  from  the directory name space. It will
likely be deleted (inode and disk blocks) once we close the file. This
is  the  reason  DeleteOldFiles()  explicitly  skips  this  file. The
proposed  fix  deletes any pre-existing file with the same name before
the file is opened for logging in ParseArgv().

 The  ibdiagnet.db  file  contains  the IB  subnet database in tcl
source-able  array  format. This  file  can be used later to load the
subnet database without going  through the subnet discovery phase. The
sourcing of the subnet database file is done after the DeleteOldFiles()
routine  is   executed.  So  if  the   input   file  is  the default
/tmp/ibdiagnet.db  file and if we  delete it, then the sourcing of the
subnet database will fail. To  prevent this  from happening, currently
the  subnet  database  file is explicitly skipped by DeleteOldFiles().
This  opens the  window for  the symlink based system  file clobbering
when the  file is  written. The  fix is  to  delete  the ibdiagnet.db
file  just  before  opening for  write. The  file  is  deleted only if
-load_db  option is not given, since we do not want to delete the file
unless  we  have done  subnet discovery and need to write out the new
subnet database.

 The  ibdiagnet_ibis.log is the log file set up for the IB interactive
scripting  interpreter (ibis) to log its  messages. This  is done very
early in InitializeIBIS(). Also, this is not part of the list of files
maintained  by   ibdiagnet   utility.  Hence  it  is  not  deleted by
DeleteOldFiles().  The  fix  is  to  delete the file just before it is
setup by ibdiagnet for ibis.


Thanks
Raj




Re: [PATCH] IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline

2013-07-15 Thread Vu Pham

Bart Van Assche wrote:

If the transport layer is offline it is more appropriate to let
srp_abort() return FAST_IO_FAIL instead of SUCCESS.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Reported-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@purestorage.com
Cc: Vu Pham v...@mellanox.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 9d8b46e..f93baf8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1753,8 +1753,7 @@ static int srp_abort(struct scsi_cmnd *scmnd)
if (!req || !srp_claim_req(target, req, scmnd))
return FAILED;
 	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
-			      SRP_TSK_ABORT_TASK) == 0 ||
-	    target->transport_offline)
+			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
 	else if (target->transport_offline)
 		ret = FAST_IO_FAIL;
  

Hello Bart,

With this patch on top of v3, my fail-over test has run well.
Please queue it up with v4.

thanks,
-vu


Re: [PATCH] IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline

2013-07-15 Thread Vu Pham

Bart Van Assche wrote:

If the transport layer is offline it is more appropriate to let
srp_abort() return FAST_IO_FAIL instead of SUCCESS.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Reported-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@purestorage.com
Cc: Vu Pham v...@mellanox.com
---
 drivers/infiniband/ulp/srp/ib_srp.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 9d8b46e..f93baf8 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1753,8 +1753,7 @@ static int srp_abort(struct scsi_cmnd *scmnd)
if (!req || !srp_claim_req(target, req, scmnd))
return FAILED;
 	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
-			      SRP_TSK_ABORT_TASK) == 0 ||
-	    target->transport_offline)
+			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
 	else if (target->transport_offline)
 		ret = FAST_IO_FAIL;
  

Hello Bart,

With this patch applied on top of v3, my fail-over test has run well.
Please queue it up with v4.

thanks,
-vu


Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable

2013-07-15 Thread Bart Van Assche

On 15/07/2013 7:29, Sagi Grimberg wrote:

srp_daemon is a package designated for the customer to automatically
detect targets in the IB fabric. From our experience here in Mellanox,
customers/users like automatic plug & play tools.
They are reluctant to build their own scriptology to enhance performance
and settle with srp_daemon, which is preferred over use of ibsrpdm and
manually adding new targets.
Regardless, the completion vectors assignment is meaningless without
setting proper IRQ affinity, so in the worst case where the user didn't
set his IRQ affinity, this assignment will perform like the default
completion vector assignment, as all IRQs are directed without any
masking, i.e. to core 0.

From my experiments in NUMA systems, optimal performance is gained
where all IRQs are directed to half of the cores on the NUMA node close
to the HCA, and all traffic generators share the other half of the cores
on the same NUMA node. So based on that knowledge, I thought that
srp_daemon/the srp driver would assign its CQs across the HCA's completion
vectors, and the user is encouraged to set the IRQ affinity as described
above to gain optimal performance.
Adding connections over the far NUMA node doesn't seem to benefit
performance too much...

As I mentioned, a use-case I see that may raise a problem here is if
the user would like to maintain multiple SRP connections and reserve
some completion vectors for other IB applications on the system.
In this case the user will be able to disable the srp_daemon/srp driver
completion vectors assignment.

So, this was just an idea, an easy implementation that would
potentially give the user a semi-automatic performance-optimized
configuration...


Hello Sagi,

I agree with you that it would help a lot if completion vector 
assignment could be automated such that end users do not have to care 
about assigning completion vector numbers. The challenge is to find an 
approach that is general enough such that it works for all possible use 
cases. One possible approach is to let a tool that has knowledge about 
the application fill in completion vector numbers in srp_daemon.conf and 
let srp_daemon use the values generated by this tool. That approach 
would avoid that srp_daemon has to have any knowledge about the 
application but would still allow srp_daemon to assign the completion 
vector numbers.


Bart.


Re: [PATCH 3.11-rc1] mlx5 core: Fix __udivdi3 when compiling for 32 bit arches

2013-07-15 Thread Tim Gardner
On 07/15/2013 09:52 AM, Randy Dunlap wrote:
 On 07/15/13 07:56, Tim Gardner wrote:
 Cc: Eli Cohen e...@mellanox.com Signed-off-by: Tim Gardner
 tim.gard...@canonical.com
 
 I reported this last week and Eli wrote:
 
 I have this fixed in my tree and we run the driver on i386. I will
 check on Sunday why it is not in the patches submitted.
 
 Anyway, the patch works for me.
 
 Acked-by: Randy Dunlap rdun...@infradead.org Reported-by: Randy
 Dunlap rdun...@infradead.org
 

I figured someone must have seen it, but a cursory Google search on
mlx5 __udivdi3 failed to turn up anything.

rtg
-- 
Tim Gardner tim.gard...@canonical.com


RE: [PATCH] RDMA/cma: silence GCC warning

2013-07-15 Thread Hefty, Sean
 1) Perhaps a better way to silence GCC is to drop port entirely, and
 assign to id_priv->id.port_num directly. Would that be acceptable?

Yes, this would work.

- Sean
 


Re: [PATCH V7 libibverbs 0/7] Add extension and XRC QP support

2013-07-15 Thread Jason Gunthorpe
On Mon, Jul 15, 2013 at 03:40:47PM +0300, Yishai Hadas wrote:
 Hi Jason,
 
 Let me clarify the place holder reservation mentioned in the cover
 letter.  The entries I was referring to are not proprietary vendor
  verbs but rather user space verbs that are on their way to being
  submitted upstream for inclusion in libibverbs/libmlx4.

I get that.

But it doesn't make one difference. Until they reach Roland's tree
*nobody* gets to use indexes in the extension structure with the
'libibverbs.so.1' SONAME.

PERIOD. NO EXCEPTIONS.

  So in that respect, it would make sense for us as a vendor to provide
  customers through Mellanox OFED the means to use them.

You can provide whatever you like in your Mellanox software releases.

If you use the libibverbs.so.1 SONAME and mess with the ABI you get to
keep all the broken bits and angry users when upstream does something
different.

OFA has already committed to stop this nonsense and no longer ship
upstream incompatible libraries.

As a vendor, when you make ABI incompatible changes like this you have
to change the SONAME/symbols. There is no other answer, you must do
it.

 What is asked now is to reserve the pointers, no more.  As for the

The only reason you are going to care about the actual indexes is
because you are already shipping code that uses them. That means
you've already made the choice of what index has what function
signature.

So you are asking that upstream adopt the ABI you are already shipping
without alteration. Why on earth should that happen? Especially when
some of the index/signature pairs have apparently never even been
posted to the list.

NO. This is a bad precedent. Upstream sets the ABI, not the vendors.

PERIOD.

  To sum up, we think it would be a constructive step to continue with
  this series while reserving the six pointers, or, if this really
  helps, to submit for review the libibverbs part of those verbs along
  with the basic verbs extensions patches.

Frankly, this nonsense makes me question whether the extension mechanism is
even a sane idea, if this is the sort of abuse it is going to be
subject to.

AFAIK, Roland has not commented on any of these series, or the basic
extension mechanism.

I think we all need to hear his opinion.

Jason


Re: [PATCH for-next 0/4] IP based RoCE GID Addressing

2013-07-15 Thread Or Gerlitz
Devesh Sharma deves...@gmail.com wrote:
[...]
 What will happen to those devices which do not support IP based RoCE? How
 will connection management happen with those if Native RoCE support is
 removed? There may be other devices/implementations which do not support
 IP Based RoCE.


RoCE GIDs are something programmed by the low-level IB driver into the
device GID table, so whatever format is used for GIDs, the device has to
support being programmed by the driver.

 How will connection management happen on them? Will those devices always
 see a Connection Reject event from the RDMA-CM?

When a CM connection request is received by a node and the GID inside
it doesn't match any GID in this node's GID table, the IB CM sends a
reject message with an invalid GID reject reason.

Or.


Re: [PATCH V2 for-next 2/4] IB/core: Infra-structure to support verbs extensions through uverbs

2013-07-15 Thread Or Gerlitz
On Tue, Jul 9, 2013 at 6:00 PM, Tzahi Oved tza...@mellanox.com wrote:
 On 26/06/2013 16:05, Roland Dreier wrote:
   I don't understand what extended capabilities are or why we need
 to change the header format.

 First let me clarify the target of *this* patch - allowing the input and
 output arguments of new uverbs commands to be extended. Whenever a specific
 IB_USER_VERBS_CMD_XXX is extended == needs to have additional
 arguments, we will be able to add them without creating a completely
 new IB_USER_VERBS_CMD_YYY command or bumping the uverbs ABI version.
 This patch alone doesn’t provide the whole scheme, which also
 depends on adding comp_mask to the extended uverbs structs - see my
 explanation below, and do let me know if an actual use case would be
 more self-explanatory.

 On 26/06/2013 16:05, Roland Dreier wrote:
   What is the verbs extensions approach?

 Similar to the scheme built with Sean and Jason for user verbs
 extension support in libibverbs, that can be reviewed in the XRC
 patches Sean submitted, we’d like to build a scheme for the kernel
 uverbs as well.

 On 26/06/2013 16:05, Roland Dreier wrote:
   Why does the kernel need to know about it?

 The kernel uverbs are the ones to which we wish to add extension
 capability in this patch (vs. the user verbs, which exist in an
 orthogonal patch).

 On 26/06/2013 16:05, Roland Dreier wrote:
   What is different about the processing?

 Ok, so we want to allow future extension of the CMD arguments
 (ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) for an
 existing new command (= a command that supports the new uverbs command
 header format suggested in this patch) w/o bumping ABI ver and with
 maintaining backward and formward compatibility to new and old
 libibverbs versions.
 In the uverbs command we are passing both uverbs arguments and the
 provider arguments (mlx4_ib in our case), for example take the
 create_cq call:
 Uverbs gets struct ibv_create_cq, mlx4_ib gets struct mlx4_create_cq
 (which includes struct ibv_create_cq), and In_words =
 sizeof(mlx4_create_cq)/4.
 Thus ib_uverbs_cmd_hdr.in_words carry both uverbs plus mlx4_ib input
 argument sizes, where uverbs assumes it knows the size of its input
 argument - struct ibv_create_cq.
 Now, if we wish to add a variable to struct ibv_create_cq, we can add
 a comp_mask field to the struct, which is basically a bit field
 indicating which fields exist in the struct (as done for the
 libibverbs API extension), but we need a way to tell what is the total
 size of the struct and not assume the struct size is predefined (since
 we may get different struct sizes from different user libibverbs
 vers), so we know at which point the provider input argument (struct
 mlx4_create_cq ) begins. Same goes for extending the provider struct
 mlx4_create_cq. Thus we split ib_uverbs_cmd_hdr.in_words into
 ib_uverbs_cmd_hdr.in_words, which will now carry only the uverbs input
 argument struct size and ib_uverbs_cmd_hdr.provider_in_words that will
 carry the provider (mlx4_ib) input argument size. Same goes for the
 response (the uverbs CMD output argument).

 Tzahi

Roland,

Does the above help to address your questions/concerns on the matter?

Or.
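
For readers trying to follow the split described above, here is a hedged
sketch of the idea with hypothetical type and field names; the actual uapi
layout is whatever eventually lands in Roland's tree:

#include <linux/types.h>

/* Hypothetical layout, for illustration only. */
struct ib_uverbs_cmd_hdr_sketch {
	__u32 command;
	__u16 in_words;			/* uverbs-level input struct size, 32-bit words */
	__u16 out_words;		/* uverbs-level response struct size */
	__u16 provider_in_words;	/* provider (e.g. mlx4_ib) input tail size */
	__u16 provider_out_words;	/* provider response tail size */
	__u32 reserved;
};

/* An extended input struct would then start with a comp_mask whose bits
 * say which of the optional fields that follow are actually present. */
struct ibv_create_cq_ex_sketch {
	__u64 comp_mask;
	__u32 cqe;
	__u32 comp_vector;
	/* provider-specific words follow, sized by provider_in_words */
};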