Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64

2012-01-20 Thread Roland Dreier
On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim full...@gmail.com wrote:
 Just checking up on this issue. Is there any further testing or
 information we can provide to help make a fix happen?

I'm not likely to be much help on VT-d issues, but maybe it
would be useful to dump all the values in the BUG_ON if it's
going to trigger, i.e. just before

   BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1)
 >> addr_width);

add

   if (addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width)
           pr_err("VT-d BUG! addr_width %d < %d (iov_pfn 0x%lx nr_pages %ld)\n",
                  addr_width, BITS_PER_LONG, iov_pfn, nr_pages);

and report what that prints.
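As a rough illustration of what that check tests, here is the same predicate restated in user-space C. It is only a sketch. MODEL_BITS_PER_LONG stands in for the kernel's BITS_PER_LONG, the function names are hypothetical, and it assumes that addr_width counts page-frame-number bits. Under that assumption the BUG_ON fires when the last pfn of the mapping, iov_pfn + nr_pages - 1, does not fit in addr_width bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's BITS_PER_LONG. */
#define MODEL_BITS_PER_LONG ((int)(sizeof(long) * CHAR_BIT))

/* True when the last pfn of the range needs more than addr_width bits.
 * The short-circuit on addr_width < MODEL_BITS_PER_LONG also keeps the
 * shift count below the width of unsigned long, which would otherwise be
 * undefined behaviour. */
bool vtd_range_too_wide(int addr_width, unsigned long iov_pfn, long nr_pages)
{
    assert(nr_pages > 0);
    return addr_width < MODEL_BITS_PER_LONG &&
           ((iov_pfn + (unsigned long)nr_pages - 1) >> addr_width) != 0;
}

/* Hypothetical helper: print what the suggested pr_err() would show. */
void vtd_report(int addr_width, unsigned long iov_pfn, long nr_pages)
{
    if (vtd_range_too_wide(addr_width, iov_pfn, nr_pages))
        fprintf(stderr,
                "VT-d BUG! addr_width %d < %d (iov_pfn 0x%lx nr_pages %ld)\n",
                addr_width, MODEL_BITS_PER_LONG, iov_pfn, nr_pages);
}
```

For example, with addr_width = 20, a single page at pfn 0xfffff passes. Two pages starting at that pfn would trip the check.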

 - R.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


cq-event kernel panic

2012-01-20 Thread Bernd Schubert
We are still seeing kernel panics with linux-3.2, this time initiated 
from mthca_cq_event(). I'm unsure if this is somehow related to 
yesterday's cq_completion patch. In any case, I'm CCing Sean.


kernel logs sometimes show something like

ib_mthca :01:00.0: CQ access violation on CQN 2c0089

and at the same time either our FhGFS daemons, which use ibverbs, 
crash with a segmentation fault or the entire kernel crashes with the panic 
given below. My next step is to debug our FhGFS crashes to see whether 
they come from the ib libs or from a real issue in the daemon.


Below is the kernel panic. The kernel already includes the patch to 
initialize qp->usecnt.



[53904.589342] ib_mthca :01:00.0: CQ access violation on CQN 8b
[53964.464518] ib_mthca :01:00.0: CQ access violation on CQN d2009f
[53964.468302] BUG: unable to handle kernel NULL pointer dereference at 
0058
[53964.468302] IP: [a03a71a8] ib_uverbs_async_handler+0x28/0x150 
[ib_uverbs]
[53964.468302] PGD 1f8d18067 PUD 1f3904067 PMD 0
[53964.468302] Oops:  [#1] SMP
[53964.468302] CPU 1
[53964.468302] Modules linked in: nfsd ext4 mbcache jbd2 crc16 mlx4_ib 
mlx4_core ib_umad rdma_ucm rdma_cm iw_cm ib_addr ib_uverbs ib_ipoib ib_cm ib_sa 
sg ipv6 sd_mod crc_t10dif loop arcmsr md_mod pcspkr 8250_pnp ib_mthca ib_mad 
ib_core fuse af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc btrfs 
lzo_decompress lzo_compress zlib_deflate crc32c libcrc32c crypto_hash 
crypto_algapi ata_generic pata_acpi pata_amd e1000 sata_nv libata scsi_mod unix 
[last unloaded: scsi_wait_scan]
[53964.468302]
[53964.468302] Pid: 10644, comm: fhgfs-storage-u Not tainted 3.2.0+ #10 
Supermicro H8DCE/H8DCE
[53964.468302] RIP: 0010:[a03a71a8]  [a03a71a8] 
ib_uverbs_async_handler+0x28/0x150 [ib_uverbs]
[53964.468302] RSP: 0018:8801ffc039b0  EFLAGS: 00010082
[53964.468302] RAX: 8801f948e300 RBX:  RCX: 8801f948e370
[53964.468302] RDX:  RSI: 8801f948ee40 RDI: 
[53964.468302] RBP: 8801ffc039f0 R08: 8801f948e384 R09: 8142c5e0
[53964.468302] R10: 0006 R11: 000d R12: 00d2009f
[53964.468302] R13: 8800bf5aba20 R14:  R15: 8801f3a82400
[53964.468302] FS:  74ca7700() GS:8801ffc0() 
knlGS:
[53964.468302] CS:  0010 DS:  ES:  CR0: 8005003b
[53964.468302] CR2: 0058 CR3: 0001f96d4000 CR4: 06e0
[53964.468302] DR0:  DR1:  DR2: 
[53964.468302] DR3:  DR6: 0ff0 DR7: 0400
[53964.468302] Process fhgfs-storage-u (pid: 10644, threadinfo 
8809, task 8800c8139650)
[53964.468302] Stack:
[53964.468302]  8801ffc03a00 8801f948e384 a0318208 
8800bf5ab000
[53964.468302]  00d2009f 8800bf5aba20  
8801f3a82400
[53964.468302]  8801ffc03a00 a03a737b 8801ffc03a60 
a0306f77
[53964.468302] Call Trace:
[53964.468302]  IRQ
[53964.468302]  [a03a737b] ib_uverbs_cq_event_handler+0x2b/0x30 
[ib_uverbs]
[53964.468302]  [a0306f77] mthca_cq_event+0x87/0x110 [ib_mthca]
[53964.468302]  [a03062a4] mthca_eq_int+0x2d4/0x410 [ib_mthca]
[53964.468302]  [a0306544] mthca_arbel_msi_x_interrupt+0x24/0x60 
[ib_mthca]
[53964.468302]  [810b54fd] handle_irq_event_percpu+0x5d/0x210
[53964.468302]  [810b56f0] handle_irq_event+0x40/0x70
[53964.468302]  [810b8d0d] handle_edge_irq+0x6d/0x120
[53964.468302]  [810166a2] handle_irq+0x22/0x30
[53964.468302]  [81390aad] do_IRQ+0x5d/0xe0
[53964.468302]  [81385eb3] common_interrupt+0x73/0x73
[53964.468302]  [812e3f9b] ? __alloc_skb+0x4b/0x170
[53964.468302]  [8113e0fb] ? kmem_cache_alloc_node+0x3b/0x130
[53964.468302]  [8131af61] ? ip_rcv+0x201/0x2e0
[53964.468302]  [812e3f9b] __alloc_skb+0x4b/0x170
[53964.468302]  [812e457d] dev_alloc_skb+0x1d/0x40
[53964.468302]  [a0395fca] ipoib_alloc_rx_skb+0x4a/0x380 [ib_ipoib]



ib_uverbs_async_handler+0x28 translates to


Reading symbols from 
/home/schubert/src/linux/linux-stable/debian/tmp/lib/modules/3.2.0+/kernel/drivers/infiniband/core/ib_uverbs.ko...done.
(gdb) l *(ib_uverbs_async_handler+0x28)
0x11a8 is in ib_uverbs_async_handler (drivers/infiniband/core/uverbs_main.c:440).
435                                     u32 *counter)
436     {
437             struct ib_uverbs_event *entry;
438             unsigned long flags;
439
440             spin_lock_irqsave(&file->async_file->lock, flags);
441             if (file->async_file->is_closed) {
442                     spin_unlock_irqrestore(&file->async_file->lock, flags);
443                     return;
444             }



Any ideas?


Thanks,
Bernd

RE: races in ipathfs

2012-01-20 Thread Mike Marciniszyn
We are currently investigating this.

Thanks for the review on this issue!

Mike

 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Al Viro
 Sent: Thursday, January 19, 2012 3:20 PM
 To: Dept_Infinipath
 Cc: linux-rdma@vger.kernel.org; linux-kernel
 Subject: races in ipathfs

   Use of qib_super is seriously racy.  qibfs_add() (and worse,
 qibfs_remove()) can happen during qibfs_mount() and qibfs_kill_super().

   1) CPU1: qib_init_one().  The sucker is allocated and placed
 on the list.  CPU2: ipathfs is mounted, directory created.  CPU1:
 finally gets around to qibfs_add(); by now qib_super is non-NULL and
 off we go, trying to create it again.  The worst part is, that code
 doesn't even notice that dentry is there and positive; you silently
 leak the old inode.

   2) CPU1: qib_init_one().  Allocated the sucker.  CPU2: ipathfs
 is getting mounted.  Picked the first device off the list, creating
 directory for it.  CPU1: inserted new device into the head of the list,
 continued working.  Got around to qibfs_add(); qib_super is NULL, so
 we do nothing.  CPU2: walked the rest of the list, creating directories
 for all devices.  Our device is missed, since we are past that point in
 the list.  Worse, shift the timing a bit and it doesn't matter whether
 you add to the head or to the tail of the list - if qibfs_add() happens
 just before we set qib_super, we are screwed again.

   3) CPU1: qib_remove_one().  CPU2: mount ipathfs is walking that
 list and decides to try and create a directory for the device that is
 being freed.  Oops...

   4) CPU1: qib_init_one() or qib_remove_one(), doesn't matter
 which.
 CPU2: final umount of ipathfs already got through setting sb->s_root to
 NULL but still hadn't set qib_super to the same.  Oops...  And no,
 moving that "qib_super = NULL;" up prior to kill_litter_super() won't
 fix the race either, of course.

 AFAICS, the older driver (in hw/ipath) has the same problems.
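To make the four windows concrete, here is a hypothetical user-space model. It is not qib or ipath driver code, and dev_add, dev_remove, fs_mount and fs_umount are invented names. In the model, a single mutex covers both the device list and the "mounted" state, which corresponds to qib_super being non-NULL. That single lock closes all four windows:

* An add cannot slip between mount setting the flag and walking the list (races 1 and 2).
* A remove unlinks the device under the lock before it is freed (race 3).
* The final umount clears the flag in the same critical section as the teardown (race 4).

This is one possible direction under those assumptions, not the fix adopted for the driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: these names are illustrative, not qib/ipath code. */
struct dev {
    struct dev *next;
    bool has_dir;              /* "this device has a directory in ipathfs" */
};

static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dev *dev_list;   /* models the driver's device list */
static bool mounted;           /* models qib_super != NULL */

/* Idempotent: calling it twice still leaves exactly one "directory". */
static void create_dir(struct dev *d)
{
    d->has_dir = true;
}

/* Models qib_init_one() followed by qibfs_add(). */
void dev_add(struct dev *d)
{
    assert(d != NULL);
    pthread_mutex_lock(&fs_lock);
    d->next = dev_list;
    dev_list = d;
    if (mounted)
        create_dir(d);
    pthread_mutex_unlock(&fs_lock);
}

/* Models qib_remove_one(): unlink under the lock before the caller frees d. */
void dev_remove(struct dev *d)
{
    pthread_mutex_lock(&fs_lock);
    for (struct dev **p = &dev_list; *p != NULL; p = &(*p)->next) {
        if (*p == d) {
            *p = d->next;
            break;
        }
    }
    d->has_dir = false;        /* models qibfs_remove() */
    pthread_mutex_unlock(&fs_lock);
}

/* Models qibfs_mount(): setting the flag and walking the list are one step. */
void fs_mount(void)
{
    pthread_mutex_lock(&fs_lock);
    mounted = true;
    for (struct dev *d = dev_list; d != NULL; d = d->next)
        create_dir(d);
    pthread_mutex_unlock(&fs_lock);
}

/* Models qibfs_kill_super(): teardown and clearing the flag are one step. */
void fs_umount(void)
{
    pthread_mutex_lock(&fs_lock);
    mounted = false;
    for (struct dev *d = dev_list; d != NULL; d = d->next)
        d->has_dir = false;
    pthread_mutex_unlock(&fs_lock);
}
```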


This message and any attached documents contain information from QLogic 
Corporation or its wholly-owned subsidiaries that may be confidential. If you 
are not the intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission in error, please notify the 
sender immediately by reply e-mail and then delete this message.



Re: [PATCH 2/2] IB/mad: Return unsupported for MADs as appropriate

2012-01-20 Thread Hal Rosenstock
On 1/20/2012 7:12 AM, Swapna Thete wrote:
 Thank you all for your suggestions.
 However, I just wanted to understand that given this code is for the
 case of an entire class not supported, 

There's more than just class not supported which gets to that point in
the MAD code flow. For example, SMInfo can get there, and that is more
appropriately reported as method/attribute not supported rather than class
not supported. The one example cited so far is SMInfo, which is part of the
SM class, and that class is required to be supported by the SMA in every node.

 isn't this error code
 IB_MGMT_MAD_STATUS_BAD_VERSION (MAD returned with Bad Status: Unsupported 
 Class or Version) more appropriate? Also this is
 Part of IB spec error codes.

Yes, class not supported is indicated by this code.

We can either choose to discern which case it is and return the correct
error code (preferable if not too much overhead) or pick one of the two
(in the case where it's an incoming get/set). If we pick one, which one
causes less confusion when it comes to answering why that MAD status was
returned ? I don't think it would/should cause any behavioral difference
as any bad MAD status is/should be treated as an error.
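For reference, the two candidate values differ only in the 3-bit code carried in bits [4:2] of the 16-bit MAD Status field:

* Code 0x1 means bad class/version (0x0004, IB_MGMT_MAD_STATUS_BAD_VERSION in the kernel's ib_mad.h).
* Code 0x3 means unsupported method/attribute combination (0x000c, IB_MGMT_MAD_STATUS_UNSUPPORTED_METHOD_ATTRIB).

The sketch below uses local stand-in macros rather than the kernel header, and its function names are hypothetical. It converts to network (big-endian) byte order with htons(); in kernel code the usual spelling for that direction is cpu_to_be16().

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring IB_MGMT_MAD_STATUS_* from the kernel's ib_mad.h. */
#define MAD_STATUS_BAD_VERSION               0x0004 /* code 0x1: bad class/version */
#define MAD_STATUS_UNSUPPORTED_METHOD        0x0008 /* code 0x2 */
#define MAD_STATUS_UNSUPPORTED_METHOD_ATTRIB 0x000c /* code 0x3 */

/* Place a 3-bit "invalid field" code into bits [4:2] of the status. */
uint16_t mad_status_from_code(unsigned int code)
{
    assert(code <= 0x7);
    return (uint16_t)(code << 2);
}

/* The MAD header is big-endian on the wire; convert from host order. */
uint16_t mad_status_to_wire(uint16_t host_status)
{
    return htons(host_status);
}
```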

-- Hal

 Also I did not make any changes to the handling of SMA Get(SmInfo
 attribute). Let me know if I am missing something.
 
 Thanks,
 Swapna
 -Original Message-
 From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
 Sent: Thursday, January 19, 2012 6:28 PM
 To: Swapna Thete
 Cc: rol...@kernel.org; linux-rdma@vger.kernel.org; Jack Morgenstein
 Subject: Re: [PATCH 2/2] IB/mad: Return unsupported for MADs as appropriate
 
 On 1/18/2012 5:30 PM, Hal Rosenstock wrote:
 On 1/18/2012 3:43 AM, Swapna Thete wrote:
 Setup a response with appropriate error status and
 send it for the MADs that are not supported by a
 specific class/version.
 Signed-off-by: Swapna Thete swapna.th...@qlogic.com
 ---
  drivers/infiniband/core/mad.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)

 diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
 index 2fe428b..734d846 100644
 --- a/drivers/infiniband/core/mad.c
 +++ b/drivers/infiniband/core/mad.c
 @@ -1963,6 +1963,16 @@ local:
   * or via recv_handler in ib_mad_complete_recv()
   */
  recv = NULL;
 +} else {
 +memcpy(response, recv, sizeof(*response));

 Isn't this overkill as the bad MAD status precludes looking at the MAD
 data ?

 +response->header.recv_wc.wc = &response->header.wc;
 +response->header.recv_wc.recv_buf.mad = &response->mad.mad;
 +response->header.recv_wc.recv_buf.grh = &response->grh;
 +response->mad.mad.mad_hdr.method = IB_MGMT_METHOD_GET_RESP;
 +response->mad.mad.mad_hdr.status =
 +__be16_to_cpu(IB_MGMT_MAD_STATUS_BAD_VERSION);

 While this is the best status for class not supported, that's not all
 the cases that get to here. Attribute not supported (in a supported
 class) could also occur here for which unsupported method/attribute
 combination is more appropriate as a MAD status. I'm not sure it's worth
 the effort to discern that (as any invalid MAD status is treated the
 same) but I think it could be done if we want to be more precise about
 the error.
 
 The other alternative is to just return method/attribute combination not
 supported (STATUS_FIELD[4:2] = 0x3 (MAD status 0xc)) for all this as
 Jack and Or have indicated. FWIW that's my preference.
 
 -- Hal
 

 -- Hal

 +agent_send_response(&response->mad.mad, &recv->grh, wc,
 +port_priv->device, port_num, qp_info->qp->qp_num);
  }

  out:




 
 
 



RE: [PATCH 2/2] IB/mad: Return unsupported for MADs as appropriate

2012-01-20 Thread Hefty, Sean
 Thank you all for your suggestions.
 However, I just wanted to understand that given this code is for the
 case of an entire class not supported, isn't this error code
 IB_MGMT_MAD_STATUS_BAD_VERSION (MAD returned with Bad Status: Unsupported
 Class or Version) more appropriate? Also this is
 Part of IB spec error codes.

As Jason pointed out, see figure 169, but, yes, this is the correct value for 
your case.  What I was suggesting is moving these checks out into another 
function, dealing with the specific case(s) that you are interested in, and 
deferring work on other cases.  The function I'm referring to would use 
positive checks to determine if a reply should be generated and what type of 
reply it needs to be, with the default case to simply discard the MAD.  
Silently dropping these MADs is permissible under the spec, automatically 
generating a canned reply is not.
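A minimal sketch of the kind of helper described above, written under stated assumptions. The enum, function name and parameters are hypothetical. The two booleans stand in for a lookup of the port's MAD agent registrations, and method_registered approximates "this method/attribute is handled" (a real attribute-level check would need more information). Each reply is generated only after a positive check, and anything not positively identified is dropped, which the spec permits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Method values for Get and Set requests (IB_MGMT_METHOD_GET/SET). */
#define MAD_METHOD_GET 0x01
#define MAD_METHOD_SET 0x02

enum unhandled_mad_action {
    MAD_ACTION_DISCARD,                 /* default: drop silently */
    MAD_ACTION_REPLY_BAD_CLASS_VERSION, /* reply with status 0x0004 */
    MAD_ACTION_REPLY_UNSUPP_METH_ATTR,  /* reply with status 0x000c */
};

/* Hypothetical decision helper for a MAD that no agent consumed. */
enum unhandled_mad_action classify_unhandled_mad(uint8_t method,
                                                 bool class_registered,
                                                 bool method_registered)
{
    /* A method cannot be registered for a class nobody registered. */
    assert(class_registered || !method_registered);

    /* Positive check 1: only Get/Set requests expect a response. */
    if (method != MAD_METHOD_GET && method != MAD_METHOD_SET)
        return MAD_ACTION_DISCARD;

    /* Positive check 2: no agent handles this class/version at all. */
    if (!class_registered)
        return MAD_ACTION_REPLY_BAD_CLASS_VERSION;

    /* Positive check 3: class handled, but not this method/attribute. */
    if (!method_registered)
        return MAD_ACTION_REPLY_UNSUPP_METH_ATTR;

    /* Anything not positively identified above is dropped. */
    return MAD_ACTION_DISCARD;
}
```

Under this model, an SM-class Get for SMInfo on a node with no SM would land in check 3, because the SMA always registers the SM class.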

- Sean

RE: cq-event kernel panic

2012-01-20 Thread Hefty, Sean
 We are still seeing kernel panics with linux-3.2, this time initiated
 from mthca_cq_event(). I'm unsure if this is somehow related to the
 yesterdays cq_completion patch. In any case, I'm CCing Sean therefore.

Was your patch applied when testing?  mthca uses kmalloc to allocate the QP 
structure.




Re: [PATCH] core/verb.c: fix kernel panic: always initialize struct ib_qp *qp->usecnt

2012-01-20 Thread Bernd Schubert
Hmm, I think we do have a serious problem with the whole approach. While 
the patch works for the kernel side, there is a problem with user space 
libraries. I monitored our daemons and noticed that ibv_destroy_cq() 
failed. The reason again seems to be the same issue as already fixed for 
kernel QPs. So in __ibv_create_qp() (libibverbs/src/verbs.c):



__ibv_create_qp()



struct ibv_qp *qp = pd->context->ops.create_qp(pd, qp_init_attr);

if (qp) {
        qp->context    = pd->context;
        qp->qp_context = qp_init_attr->qp_context;
        qp->pd         = pd;
        qp->send_cq    = qp_init_attr->send_cq;

[...]

I *guess* the qp allocated by pd->context->ops.create_qp() does not have 
qp->usecnt initialized (nor does it know anything about it). So its 
random value will make the destruction fail later. A simple workaround that 
would work for us is to extend the patch I sent to


diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 602b1bd..fba1675 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -874,7 +874,7 @@ int ib_destroy_qp(struct ib_qp *qp)
        struct ib_srq *srq;
        int ret;

-       if (atomic_read(&qp->usecnt))
+       if (qp->qp_type == IB_QPT_XRC_TGT && atomic_read(&qp->usecnt))
                return -EBUSY;

        if (qp->real_qp != qp)



However, what about user space setting the type to IB_QPT_XRC_TGT? I 
guess this could be solved by letting the kernel zero the memory 
returned by ->ops.create_qp(pd, qp_init_attr).
Btw, I haven't figured out yet how this translates into kernel 
space at all. Does this op go directly to the device driver?


But even if we properly initialize the qp, what about 
user space mischievously trying to crash the system by manipulating 
struct ib_qp *qp?



Thanks,
Bernd




Re: cq-event kernel panic

2012-01-20 Thread Bernd Schubert

On 01/20/2012 05:04 PM, Hefty, Sean wrote:

We are still seeing kernel panics with linux-3.2, this time initiated
from mthca_cq_event(). I'm unsure if this is somehow related to the
yesterdays cq_completion patch. In any case, I'm CCing Sean therefore.


Was your patch applied when testing?  mthca uses kmalloc to allocate the QP 
structure.




Yes; that is why the kernel build name was updated to 3.2.0+. But please see 
the mail I just sent; it probably explains the underlying issue.



Re: OT: netmap - a novel framework for fast packet I/O

2012-01-20 Thread Ira Weiny
On Fri, 20 Jan 2012 06:18:44 -0800
Atchley, Scott atchle...@ornl.gov wrote:

 Interesting. It totally hijacks the NIC; all traffic is captured. You would 
 have to implement your own IP stack, Verbs stack, etc.
 

Can multiple user space processes share the card?  If so, how is security 
handled between them?

Ira

 Scott
 
 On Jan 19, 2012, at 11:50 AM, Yann Droneaud wrote:
 
  Hi,
  
  I have discovered today the netmap project[1] through an ACM Queue
  article[2].
  
  Netmap is a new interface to send and receive packets through an
  Ethernet interface (NIC). It seems to provide a raw access to network
  interface in order to process packets at high rate with a low overhead.
  
  This is another example of kernel-bypass/zero-copy, which are core
  features of InfiniBand verbs/RDMA.
  
  But unlike InfiniBand verbs/RDMA, Netmap seems to have a very small API.
  
  Such API could be enough to build an unreliable datagram messaging
  system on low cost hardware (without concerns of determinism, flow
  control, etc.).
  
  I'm asking myself if the way netmap exposes internal NIC rings could be
  applicable for IB/IBoE HCA ? e.g. beyond 10GbE NIC, is netmap relevant ?
  
  Regards.
  
  [1] http://info.iet.unipi.it/~luigi/netmap/
  
  netmap - a novel framework for fast packet I/O
  Luigi Rizzo Università di Pisa
  
  [2] http://queue.acm.org/detail.cfm?id=2103536
  
  Revisiting Network I/O APIs: The netmap Framework 
  Luigi Rizzo, 2012-01-17  
  
  -- 
  Yann Droneaud
  
  


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov


Re: OT: netmap - a novel framework for fast packet I/O

2012-01-20 Thread Atchley, Scott
On Jan 20, 2012, at 11:20 AM, Ira Weiny wrote:

 On Fri, 20 Jan 2012 06:18:44 -0800
 Atchley, Scott atchle...@ornl.gov wrote:
 
 Interesting. It totally hijacks the NIC; all traffic is captured. You would 
 have to implement your own IP stack, Verbs stack, etc.
 
 
 Can multiple user space processes share the card?  If so, how is security 
 handled between them?

It is not clear from the paper I scanned.

There does seem to be a mechanism to send selected packets up the host stack.

Scott

 


memory region limit at 32 GB?

2012-01-20 Thread Albert Strasheim
Hello all

Is there some kind of limit that would prevent me from registering
more than 32 GiB worth of memory regions with ibv_reg_mr in
libibverbs?

From strace I can see:

open("/dev/infiniband/uverbs0", O_RDWR) = 8
...
write(8, "\t\0\0\0\f\0\3\0\340o\255\35\377\177\0\0\0P\211\336\26\177\0\0\0\0\0@\0\0\0\0\0P\211\336\26\177\0\0\1\0\0\0\3\0\0\0", 48) = -1 ENOMEM (Cannot allocate memory)

when trying to register my 33rd 1 GiB buffer.

cat /proc/meminfo
MemTotal:       198075136 kB
MemFree:        186688448 kB
CommitLimit:    185892796 kB

so it doesn't look like a memory thing.

IB adapter details:

hca_id: mlx4_0
transport:  InfiniBand (0)
fw_ver: 2.8.600
vendor_id:  0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id:   MT_0FC0110009

I'm using libibverbs 1.1.6 on kernel 3.2.1-1.fc16.x86_64.

Regards

Albert


Re: memory region limit at 32 GB?

2012-01-20 Thread Roland Dreier
On Fri, Jan 20, 2012 at 9:39 AM, Albert Strasheim full...@gmail.com wrote:
 Is there some kind of limit that would prevent me from registering
 more than 32 GiB worth of memory regions with ibv_reg_mr in
 libibverbs?

Yes, by default mlx4 allocates a limited amount of adapter resources
for tracking memory regions.  I forget the exact limits but 32GB looks
reasonable...

In mlx4/main.c, there is

static struct mlx4_profile default_profile = {
        .num_qp         = 1 << 18,
        .num_srq        = 1 << 16,
        .rdmarc_per_qp  = 1 << 4,
        .num_cq         = 1 << 16,
        .num_mcg        = 1 << 13,
        .num_mpt        = 1 << 19,
        .num_mtt        = 1 << 20,
};

and I think if you bump num_mtt up a few powers of 2 (e.g.
1 << 22) then you should be able to register more.

(num_mpt controls the number of MRs but I guess 33 is
not near the limit of 512K yet ;)
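The 32 GiB figure can be checked with a little arithmetic, under two assumptions not spelled out in this message:

* Each num_mtt unit is an MTT segment of 8 entries, assumed here to be the mlx4_core default of this era.
* Each entry maps one 4 KiB page.

Then 2^20 segments x 8 entries x 4 KiB = 2^35 bytes = 32 GiB, which fits the 33rd 1 GiB registration failing. On the same assumptions, 1 << 22 would raise the ceiling to 128 GiB. The function name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 8 MTT entries per segment, one 4 KiB page per entry. */
#define MTT_ENTRIES_PER_SEG 8ULL
#define MTT_PAGE_BYTES      4096ULL

/* Bytes of memory that num_mtt MTT segments can map. */
uint64_t mtt_coverage_bytes(uint64_t num_mtt)
{
    assert(num_mtt <= UINT64_MAX / (MTT_ENTRIES_PER_SEG * MTT_PAGE_BYTES));
    return num_mtt * MTT_ENTRIES_PER_SEG * MTT_PAGE_BYTES;
}
```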

 - R.


Re: memory region limit at 32 GB?

2012-01-20 Thread Roland Dreier
By the way, I wonder if we should auto-tune num_mtt so
we have enough MTTs to cover, say, 4X of the amount of
physical memory.
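A sketch of that auto-tuning idea, assuming each num_mtt unit is a segment of 8 MTT entries mapping 4 KiB pages. It computes how many segments are needed to cover a given multiple of physical RAM, then rounds up to a power of two, as the existing profile values all are. The function name and the round-up policy are illustrative, not mlx4 code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 8 MTT entries per segment, one 4 KiB page per entry. */
#define MTT_SEG_BYTES (8ULL * 4096ULL)

/* Hypothetical: smallest power-of-two num_mtt whose MTTs cover factor x RAM. */
uint64_t autotune_num_mtt(uint64_t ram_bytes, unsigned int factor)
{
    assert(factor > 0);
    uint64_t needed = (ram_bytes * factor + MTT_SEG_BYTES - 1) / MTT_SEG_BYTES;
    uint64_t num_mtt = 1;
    while (num_mtt < needed)
        num_mtt <<= 1;
    return num_mtt;
}
```

Take the MemTotal of 198075136 kB from Albert's /proc/meminfo and a factor of 4. That needs 24,759,392 segments, so this sketch would pick 1 << 25.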

How much RAM do you have in your system?

 - R.


Re: memory region limit at 32 GB?

2012-01-20 Thread Roland Dreier
On Fri, Jan 20, 2012 at 10:30 AM, Albert Strasheim full...@gmail.com wrote:
 FYI, new Sandy Bridge motherboards will be out soon that do 512 GB and
 even 768 GB.

Yeah, Cisco UCS C260 goes up to 1TB in a 2-socket 2U already so we really
should fix this.  I'll cook something up.

The good news is that the only overhead is that we use more memory for adapter
context, and if you have 1TB of RAM you probably don't care about a few
more MB of overhead :)

 - R.


Re: [PATCH] core/verb.c: fix kernel panic: always initialize struct ib_qp *qp->usecnt

2012-01-20 Thread Roland Dreier
On Fri, Jan 20, 2012 at 8:14 AM, Bernd Schubert
bernd.schub...@itwm.fraunhofer.de wrote:
 I *guess* the qp allocated by pd->context->ops.create_qp() does not have
 qp->usecnt initialized (not does it know anything about it). So its random
 value will fail the destruction later. A simple workaround that would work
 for us, is to extend the patch I send to

 diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
 index 602b1bd..fba1675 100644
 --- a/drivers/infiniband/core/verbs.c
 +++ b/drivers/infiniband/core/verbs.c
 @@ -874,7 +874,7 @@ int ib_destroy_qp(struct ib_qp *qp)
        struct ib_srq *srq;
        int ret;

 -       if (atomic_read(&qp->usecnt))
 +       if (qp->qp_type == IB_QPT_XRC_TGT && atomic_read(&qp->usecnt))
                return -EBUSY;

        if (qp->real_qp != qp)

It looks like this is sufficient and correct without the other patch?



 However, what is is with user space setting type to IB_QPT_XRC_TGT? I guess
 this could be solved by letting the kernel zero the memory returned by
 ->ops.create_qp(pd, qp_init_attr).
 Btw, I didn't figure out yet, how this translates at all in kernel space? Is
 this op directly going to the device driver?

 But even if we are properly going to initialize the qp, what is with user
 space mischievously trying to crash the system by manipulating struct ib_qp
 *qp?

I don't follow this.  Isn't *qp completely allocated and manipulated
in the kernel?  How can userspace touch it except by having the
kernel do something via the uverbs interface?

 - R.


RE: [PATCH] core/verb.c: fix kernel panic: always initialize struct ib_qp *qp->usecnt

2012-01-20 Thread Hefty, Sean
 However, what is is with user space setting type to IB_QPT_XRC_TGT? I
 guess this could be solved by letting the kernel zero the memory
 returned by ->ops.create_qp(pd, qp_init_attr).
 Btw, I didn't figure out yet, how this translates at all in kernel
 space? Is this op directly going to the device driver?

ops.create_qp basically ends up going into the kernel, in 
ib_uverbs_create_qp().

 But even if we are properly going to initialize the qp, what is with
 user space mischievously trying to crash the system by manipulating
 struct ib_qp *qp?

There's cleanup in uverbs that ignores the return value from ib_destroy_qp(), 
basically because it shouldn't fail in those circumstances.  After calling 
ib_destroy_qp, uverbs will free some internal structures that some of the 
callback handlers expect to access.  This leads to the crashes that you're 
seeing.

I think the problem is that your first patch is incomplete.  
ib_uverbs_create_qp() will create a QP either by calling ib_create_qp() or by 
calling the device directly (device->create_qp).  qp->usecnt needs to be 
initialized in both cases.  Can you try this modification to your original 
patch?

From: Bernd Schubert bernd.schub...@itwm.fraunhofer.de

From: Sean Hefty sean.he...@intel.com

rdma/core: Fix kernel panic by always initializing qp-usecnt

We have just been investigating kernel panics related to
cq->ibcq.event_handler() completion calls.

The reason is that ib_destroy_qp() fails with -EBUSY.  Further investigation
revealed that qp->usecnt is not initialized.  This counter was introduced
in linux-3.2 by commit 0e0ec7e0638ef48e0c661873dfcc8caccab984c6
and is only initialized for IB_QPT_XRC_TGT, but it is checked in
ib_destroy_qp() for every QP type.

Signed-off-by: Bernd Schubert bernd.schub...@itwm.fraunhofer.de
Signed-off-by: Sven Breuner sven.breu...@itwm.fraunhofer.de
Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/uverbs_cmd.c |    1 +
 drivers/infiniband/core/verbs.c      |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index e26193f..e47dbf1 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1472,6 +1472,7 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
        qp->event_handler = attr.event_handler;
        qp->qp_context    = attr.qp_context;
        qp->qp_type       = attr.qp_type;
+       atomic_set(&qp->usecnt, 0);
        atomic_inc(&pd->usecnt);
        atomic_inc(&attr.send_cq->usecnt);
        if (attr.recv_cq)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 602b1bd..575b780 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -421,6 +421,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
        qp->uobject    = NULL;
        qp->qp_type    = qp_init_attr->qp_type;
 
+       atomic_set(&qp->usecnt, 0);
        if (qp_init_attr->qp_type == IB_QPT_XRC_TGT) {
                qp->event_handler = __ib_shared_qp_event_handler;
                qp->qp_context = qp;
@@ -430,7 +431,6 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
                qp->xrcd = qp_init_attr->xrcd;
                atomic_inc(&qp_init_attr->xrcd->usecnt);
                INIT_LIST_HEAD(&qp->open_list);
-               atomic_set(&qp->usecnt, 0);
 
                real_qp = qp;
                qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
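A minimal user-space model of why both creation paths need the initialization. A kmalloc()'d object, modeled here with malloc(), starts with indeterminate contents. A destroy path that refuses to proceed while usecnt is non-zero can therefore return -EBUSY on garbage unless every constructor sets the counter. The struct and function names are hypothetical, and C11 <stdatomic.h> stands in for the kernel's atomic_t.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ib_qp, reduced to the counter at issue. */
struct model_qp {
    atomic_int usecnt;   /* handles currently sharing this QP */
};

/* Every constructor must initialize usecnt: malloc(), like kmalloc(),
 * returns memory with indeterminate contents. */
struct model_qp *model_create_qp(void)
{
    struct model_qp *qp = malloc(sizeof(*qp));
    if (qp == NULL)
        return NULL;
    atomic_init(&qp->usecnt, 0);   /* the line both creation paths need */
    return qp;
}

/* Mirrors the check at the top of ib_destroy_qp(): refuse while in use. */
int model_destroy_qp(struct model_qp *qp)
{
    assert(qp != NULL);
    if (atomic_load(&qp->usecnt) != 0)
        return -EBUSY;
    free(qp);
    return 0;
}
```

If a caller ignores that -EBUSY, as the uverbs cleanup path described in this message does, it goes on to free state that event handlers still reach. That is consistent with the crashes reported in this thread.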



Re: [PATCH] core/verb.c: fix kernel panic: always initialize struct ib_qp *qp->usecnt

2012-01-20 Thread Roland Dreier
On Fri, Jan 20, 2012 at 10:40 AM, Roland Dreier rol...@purestorage.com wrote:
 On Fri, Jan 20, 2012 at 8:14 AM, Bernd Schubert
 bernd.schub...@itwm.fraunhofer.de wrote:
 I *guess* the qp allocated by pd->context->ops.create_qp() does not have
 qp->usecnt initialized (not does it know anything about it). So its random
 value will fail the destruction later. A simple workaround that would work
 for us, is to extend the patch I send to

 diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
 index 602b1bd..fba1675 100644
 --- a/drivers/infiniband/core/verbs.c
 +++ b/drivers/infiniband/core/verbs.c
 @@ -874,7 +874,7 @@ int ib_destroy_qp(struct ib_qp *qp)
        struct ib_srq *srq;
        int ret;

 -       if (atomic_read(&qp->usecnt))
 +       if (qp->qp_type == IB_QPT_XRC_TGT && atomic_read(&qp->usecnt))
                return -EBUSY;

        if (qp->real_qp != qp)

 It looks like this is sufficient and correct without the other patch?

But maybe it's cleaner to initialize qp->usecnt in
both ib_create_qp() and ib_uverbs_create_qp().

 - R.


Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64

2012-01-20 Thread Albert Strasheim
Hello

On Fri, Jan 20, 2012 at 10:23 AM, Roland Dreier rol...@purestorage.com wrote:
 On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim full...@gmail.com wrote:
 Just checking up on this issue. Is there any further testing or
 information we can provide to help make a fix happen?
 I'm not likely to be much help on VT-d issues, but maybe it
 would be useful to dump all the values in the BUG_ON if its
 going to trigger, ie just before

Just retested with 3.2.1-1.fc16.x86_64 and the bug seems to be gone.

I confirmed that my test program triggers the bug on 3.1.1-1.fc16.x86_64.

It seems a bunch of IOMMU fixes went in on 9 and 10 January, and they
appear to have fixed this problem in 3.2.

Thanks!

Regards

Albert


Re: [PATCH 2/2] IB/mad: Return unsupported for MADs as appropriate

2012-01-20 Thread Hal Rosenstock
On 1/20/2012 10:27 AM, Hefty, Sean wrote:
 As Jason pointed out, see figure 169, but, yes, this is the correct value for 
 your case.

Figure 169 is GMP Check. SMInfo is SM class so it's a different case.
When SMInfo is not handled due to no SM present, method/attribute not
supported is most appropriate as a MAD status as SMA handles SM class so
it's always present. I think there are other similar cases (when GS
class is partially supported) too. The only way to return the optimal
MAD status is to look at the port's MAD registrations to really figure
this out.

-- Hal