Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim full...@gmail.com wrote:
> Just checking up on this issue. Is there any further testing or information we can provide to help make a fix happen?

I'm not likely to be much help on VT-d issues, but maybe it would be useful to dump all the values in the BUG_ON if it's going to trigger, i.e. just before

    BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width);

add

    if (addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width)
        pr_err("VT-d BUG! addr_width %d %d (iov_pfn 0x%lx nr_pages %ld)\n",
               addr_width, BITS_PER_LONG, iov_pfn, nr_pages);

and report what that prints.

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
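The condition being discussed can be tried out in user space. The following is only a minimal sketch of that check, not the kernel's code: the helper name pfn_range_exceeds_width and the macro BITS_PER_LONG_SKETCH are made up for illustration, and the types are assumed to be a plain int width and unsigned long page frame numbers.

```c
#include <limits.h>
#include <stdbool.h>

/* Bits in an unsigned long; a stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG_SKETCH ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical helper: returns true when the last page frame number of
 * the range [iov_pfn, iov_pfn + nr_pages - 1] has bits set at or above
 * addr_width, which is the situation the BUG_ON is meant to catch.
 */
static bool pfn_range_exceeds_width(int addr_width, unsigned long iov_pfn,
                                    unsigned long nr_pages)
{
    /* A width covering the whole word can never be exceeded, and
     * shifting by the full word size would be undefined in C. */
    if (addr_width >= BITS_PER_LONG_SKETCH)
        return false;

    return ((iov_pfn + nr_pages - 1) >> addr_width) != 0;
}
```

With a 20-bit width, a range ending at pfn 0xfffff passes the check, while a range that is one page longer trips it.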
cq-event kernel panic
We are still seeing kernel panics with linux-3.2, this time initiated from mthca_cq_event(). I'm unsure whether this is somehow related to yesterday's cq_completion patch. In any case, I'm therefore CCing Sean.

The kernel logs sometimes show something like

    ib_mthca :01:00.0: CQ access violation on CQN 2c0089

and at the same time either our FhGFS daemons, which use ibverbs, crash with a segmentation fault, or the entire kernel crashes with a panic as given below. My next step is to debug our FhGFS crashes to see whether they come from the ib libs or are a real issue in the daemon.

Below is the kernel panic. The kernel already includes the patch to initialize qp->usecnt.

[53904.589342] ib_mthca :01:00.0: CQ access violation on CQN 8b
[53964.464518] ib_mthca :01:00.0: CQ access violation on CQN d2009f
[53964.468302] BUG: unable to handle kernel NULL pointer dereference at 0058
[53964.468302] IP: [a03a71a8] ib_uverbs_async_handler+0x28/0x150 [ib_uverbs]
[53964.468302] PGD 1f8d18067 PUD 1f3904067 PMD 0
[53964.468302] Oops: [#1] SMP
[53964.468302] CPU 1
[53964.468302] Modules linked in: nfsd ext4 mbcache jbd2 crc16 mlx4_ib mlx4_core ib_umad rdma_ucm rdma_cm iw_cm ib_addr ib_uverbs ib_ipoib ib_cm ib_sa sg ipv6 sd_mod crc_t10dif loop arcmsr md_mod pcspkr 8250_pnp ib_mthca ib_mad ib_core fuse af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc btrfs lzo_decompress lzo_compress zlib_deflate crc32c libcrc32c crypto_hash crypto_algapi ata_generic pata_acpi pata_amd e1000 sata_nv libata scsi_mod unix [last unloaded: scsi_wait_scan]
[53964.468302]
[53964.468302] Pid: 10644, comm: fhgfs-storage-u Not tainted 3.2.0+ #10 Supermicro H8DCE/H8DCE
[53964.468302] RIP: 0010:[a03a71a8] [a03a71a8] ib_uverbs_async_handler+0x28/0x150 [ib_uverbs]
[53964.468302] RSP: 0018:8801ffc039b0 EFLAGS: 00010082
[53964.468302] RAX: 8801f948e300 RBX: RCX: 8801f948e370
[53964.468302] RDX: RSI: 8801f948ee40 RDI:
[53964.468302] RBP: 8801ffc039f0 R08: 8801f948e384 R09: 8142c5e0
[53964.468302] R10: 0006 R11: 000d R12: 00d2009f
[53964.468302] R13: 8800bf5aba20 R14: R15: 8801f3a82400
[53964.468302] FS: 74ca7700() GS:8801ffc0() knlGS:
[53964.468302] CS: 0010 DS: ES: CR0: 8005003b
[53964.468302] CR2: 0058 CR3: 0001f96d4000 CR4: 06e0
[53964.468302] DR0: DR1: DR2:
[53964.468302] DR3: DR6: 0ff0 DR7: 0400
[53964.468302] Process fhgfs-storage-u (pid: 10644, threadinfo 8809, task 8800c8139650)
[53964.468302] Stack:
[53964.468302] 8801ffc03a00 8801f948e384 a0318208 8800bf5ab000
[53964.468302] 00d2009f 8800bf5aba20 8801f3a82400
[53964.468302] 8801ffc03a00 a03a737b 8801ffc03a60 a0306f77
[53964.468302] Call Trace:
[53964.468302] <IRQ>
[53964.468302] [a03a737b] ib_uverbs_cq_event_handler+0x2b/0x30 [ib_uverbs]
[53964.468302] [a0306f77] mthca_cq_event+0x87/0x110 [ib_mthca]
[53964.468302] [a03062a4] mthca_eq_int+0x2d4/0x410 [ib_mthca]
[53964.468302] [a0306544] mthca_arbel_msi_x_interrupt+0x24/0x60 [ib_mthca]
[53964.468302] [810b54fd] handle_irq_event_percpu+0x5d/0x210
[53964.468302] [810b56f0] handle_irq_event+0x40/0x70
[53964.468302] [810b8d0d] handle_edge_irq+0x6d/0x120
[53964.468302] [810166a2] handle_irq+0x22/0x30
[53964.468302] [81390aad] do_IRQ+0x5d/0xe0
[53964.468302] [81385eb3] common_interrupt+0x73/0x73
[53964.468302] [812e3f9b] ? __alloc_skb+0x4b/0x170
[53964.468302] [8113e0fb] ? kmem_cache_alloc_node+0x3b/0x130
[53964.468302] [8131af61] ? ip_rcv+0x201/0x2e0
[53964.468302] [812e3f9b] __alloc_skb+0x4b/0x170
[53964.468302] [812e457d] dev_alloc_skb+0x1d/0x40
[53964.468302] [a0395fca] ipoib_alloc_rx_skb+0x4a/0x380 [ib_ipoib]

ib_uverbs_async_handler+0x28 translates to

Reading symbols from /home/schubert/src/linux/linux-stable/debian/tmp/lib/modules/3.2.0+/kernel/drivers/infiniband/core/ib_uverbs.ko...done.
(gdb) l *(ib_uverbs_async_handler+0x28)
0x11a8 is in ib_uverbs_async_handler (drivers/infiniband/core/uverbs_main.c:440).
435                                     u32 *counter)
436     {
437             struct ib_uverbs_event *entry;
438             unsigned long flags;
439
440             spin_lock_irqsave(&file->async_file->lock, flags);
441             if (file->async_file->is_closed) {
442                     spin_unlock_irqrestore(&file->async_file->lock, flags);
443                     return;
444             }

Any ideas?

Thanks,
Bernd
RE: races in ipathfs
We are currently investigating this. Thanks for the review on this issue!

Mike

> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Al Viro
> Sent: Thursday, January 19, 2012 3:20 PM
> To: Dept_Infinipath
> Cc: linux-rdma@vger.kernel.org; linux-kernel
> Subject: races in ipathfs
>
> Use of qib_super is seriously racy. qibfs_add() (and worse, qibfs_remove()) can happen during qibfs_mount() and qibfs_kill_super().
>
> 1) CPU1: qib_init_one(). The sucker is allocated and placed on the list.
>    CPU2: ipathfs is mounted, directory created.
>    CPU1: finally gets around to qibfs_add(); by now qib_super is non-NULL and off we go, trying to create it again. The worst part is, that code doesn't even notice that dentry is there and positive; you silently leak the old inode.
>
> 2) CPU1: qib_init_one(). Allocated the sucker.
>    CPU2: ipathfs is getting mounted. Picked the first device off the list, creating directory for it.
>    CPU1: inserted new device into the head of the list, continued working. Got around to qibfs_add(); qib_super is NULL, so we do nothing.
>    CPU2: walked the rest of the list, creating directories for all devices. Our device is missed, since we are past that point in the list.
>    Worse, shift the timing a bit and it doesn't matter whether you add to the head or to the tail of the list - if qibfs_add() happens just before we set qib_super, we are screwed again.
>
> 3) CPU1: qib_remove_one().
>    CPU2: mount ipathfs is walking that list and decides to try and create a directory for the device that is being freed. Oops...
>
> 4) CPU1: qib_init_one() or qib_remove_one(), doesn't matter which.
>    CPU2: final umount of ipathfs already got through setting sb->s_root to NULL but still hadn't set qib_super to the same. Oops... And no, moving that qib_super = NULL; up prior to kill_litter_super() won't fix the race either, of course.
>
> AFAICS, the older driver (in hw/ipath) has the same problems.
This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
Re: [PATCH 2/2] IB/mad: Return unsupported for MADs as appropriate
On 1/20/2012 7:12 AM, Swapna Thete wrote:
> Thank you all for your suggestions. However, I just wanted to understand that given this code is for the case of an entire class not supported,

There's more than just class not supported which gets to that point in the MAD code flow. For example, SMInfo can, and that is more appropriate as method/attribute not supported rather than class not supported. The one example cited so far is SMInfo, which is part of the SM class, and that class is required to be supported by the SMA in every node.

> isn't this error code IB_MGMT_MAD_STATUS_BAD_VERSION (MAD returned with Bad Status: Unsupported Class or Version) more appropriate? Also this is part of the IB spec error codes.

Yes, class not supported is indicated by this code. We can either choose to discern which case it is and return the correct error code (preferable if not too much overhead) or pick one of the two (in the case where it's an incoming get/set). If we pick one, which one causes less confusion when it comes to answering why that MAD status was returned? I don't think it would/should cause any behavioral difference, as any bad MAD status is/should be treated as an error.

-- Hal

> Also I did not make any changes to the handling of SMA Get(SmInfo attribute). Let me know if I am missing something.
>
> Thanks,
> Swapna
>
> -----Original Message-----
> From: Hal Rosenstock [mailto:h...@dev.mellanox.co.il]
> Sent: Thursday, January 19, 2012 6:28 PM
> To: Swapna Thete
> Cc: rol...@kernel.org; linux-rdma@vger.kernel.org; Jack Morgenstein
> Subject: Re: [PATCH 2/2] IB/mad: Return unsupported for MADs as appropriate
>
> On 1/18/2012 5:30 PM, Hal Rosenstock wrote:
>> On 1/18/2012 3:43 AM, Swapna Thete wrote:
>>> Setup a response with appropriate error status and send it for the MADs that are not supported by a specific class/version.
>>>
>>> Signed-off-by: Swapna Thete swapna.th...@qlogic.com
>>> ---
>>>  drivers/infiniband/core/mad.c | 10 ++
>>>  1 files changed, 10 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>>> index 2fe428b..734d846 100644
>>> --- a/drivers/infiniband/core/mad.c
>>> +++ b/drivers/infiniband/core/mad.c
>>> @@ -1963,6 +1963,16 @@ local:
>>>   * or via recv_handler in ib_mad_complete_recv()
>>>   */
>>>  recv = NULL;
>>> +} else {
>>> +    memcpy(response, recv, sizeof(*response));
>>
>> Isn't this overkill as the bad MAD status precludes looking at the MAD data?
>>
>>> +    response->header.recv_wc.wc = &response->header.wc;
>>> +    response->header.recv_wc.recv_buf.mad = &response->mad.mad;
>>> +    response->header.recv_wc.recv_buf.grh = &response->grh;
>>> +    response->mad.mad.mad_hdr.method = IB_MGMT_METHOD_GET_RESP;
>>> +    response->mad.mad.mad_hdr.status =
>>> +        __be16_to_cpu(IB_MGMT_MAD_STATUS_BAD_VERSION);
>>
>> While this is the best status for class not supported, that's not all the cases that get to here. Attribute not supported (in a supported class) could also occur here, for which unsupported method/attribute combination is more appropriate as a MAD status. I'm not sure it's worth the effort to discern that (as any invalid MAD status is treated the same) but I think it could be done if we want to be more precise about the error.
>
> The other alternative is to just return method/attribute combination not supported (STATUS_FIELD[4:2] = 0x3 (MAD status 0xc)) for all this, as Jack and Or have indicated. FWIW that's my preference.
>
> -- Hal
>
>> -- Hal
>>
>>> +    agent_send_response(&response->mad.mad, &recv->grh, wc,
>>> +        port_priv->device, port_num, qp_info->qp->qp_num);
>>>  }
>>>
>>>  out:
RE: [PATCH 2/2] IB/mad: Return unsupported for MADs as appropriate
> Thank you all for your suggestions. However, I just wanted to understand that given this code is for the case of an entire class not supported, isn't this error code IB_MGMT_MAD_STATUS_BAD_VERSION (MAD returned with Bad Status: Unsupported Class or Version) more appropriate? Also this is part of the IB spec error codes.

As Jason pointed out, see figure 169, but, yes, this is the correct value for your case.

What I was suggesting is moving these checks out into another function, dealing with the specific case(s) that you are interested in, and deferring work on other cases. The function I'm referring to would use positive checks to determine if a reply should be generated and what type of reply it needs to be, with the default case to simply discard the MAD. Silently dropping these MADs is permissible under the spec; automatically generating a canned reply is not.

- Sean
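The "positive checks, default discard" structure suggested above could look roughly like the sketch below. It is an illustration only, not code from mad.c: struct mad_lookup, classify_unhandled_mad() and the enum names are hypothetical, and the three boolean inputs are assumed to have been derived elsewhere from the port's MAD registrations. The only facts taken from the thread are that the status code sits in bits 4:2 of the status field and that code 0x3 therefore yields MAD status 0xc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status codes carried in bits 4:2 of the MAD status field. */
enum mad_status_code {
    MAD_CODE_BAD_VERSION       = 0x1, /* class or version not supported */
    MAD_CODE_UNSUP_METHOD_ATTR = 0x3  /* method/attribute combination not supported */
};

/* What to do with a MAD that no agent claimed (hypothetical names). */
enum mad_reply {
    MAD_REPLY_DISCARD,
    MAD_REPLY_BAD_VERSION,
    MAD_REPLY_UNSUP_METHOD_ATTR
};

/* Assumed inputs, filled in from the port's registrations by the caller. */
struct mad_lookup {
    bool is_get_or_set;       /* incoming Get or Set request */
    bool class_registered;    /* some agent handles this management class */
    bool method_attr_handled; /* some agent handles this method/attribute */
};

/* Shift a status code into bits 4:2 (host byte order, not wire order). */
static uint16_t mad_status_from_code(enum mad_status_code code)
{
    return (uint16_t)((unsigned int)code << 2);
}

/* Positive checks only; anything not explicitly recognised is discarded. */
static enum mad_reply classify_unhandled_mad(const struct mad_lookup *l)
{
    if (!l->is_get_or_set)
        return MAD_REPLY_DISCARD;
    if (!l->class_registered)
        return MAD_REPLY_BAD_VERSION;
    if (!l->method_attr_handled)
        return MAD_REPLY_UNSUP_METHOD_ATTR;
    return MAD_REPLY_DISCARD;
}
```

Whether the lookup needed to fill in class_registered and method_attr_handled is cheap enough on this path is exactly the overhead question raised in the thread; the sketch does not answer it.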
RE: cq-event kernel panic
> We are still seeing kernel panics with linux-3.2, this time initiated from mthca_cq_event(). I'm unsure whether this is somehow related to yesterday's cq_completion patch. In any case, I'm therefore CCing Sean.

Was your patch applied when testing? mthca uses kmalloc to allocate the QP structure.
Re: [PATCH] core/verb.c: fix kernel panic: always initialize struct ib_qp *qp->usecnt
Hmm, I think we do have a serious problem with the whole approach. While the patch works for the kernel side, there is a problem with user space libraries. I monitored our daemons and noticed that ibv_destroy_cq() failed. The reason again seems to be the same issue as already fixed for kernel QPs. So in __ibv_create_qp() (libibverbs/src/verbs.c):

    __ibv_create_qp()
            struct ibv_qp *qp = pd->context->ops.create_qp(pd, qp_init_attr);

            if (qp) {
                    qp->context    = pd->context;
                    qp->qp_context = qp_init_attr->qp_context;
                    qp->pd         = pd;
                    qp->send_cq    = qp_init_attr->send_cq;
                    [...]

I *guess* the qp allocated by pd->context->ops.create_qp() does not have qp->usecnt initialized (nor does it know anything about it). So its random value will fail the destruction later.

A simple workaround that would work for us is to extend the patch I sent to

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 602b1bd..fba1675 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -874,7 +874,7 @@ int ib_destroy_qp(struct ib_qp *qp)
 	struct ib_srq *srq;
 	int ret;

-	if (atomic_read(&qp->usecnt))
+	if (qp->qp_type == IB_QPT_XRC_TGT && atomic_read(&qp->usecnt))
 		return -EBUSY;

 	if (qp->real_qp != qp)

However, what about user space setting the type to IB_QPT_XRC_TGT? I guess this could be solved by letting the kernel zero the memory returned by ->ops.create_qp(pd, qp_init_attr). Btw, I haven't figured out yet how this translates into kernel space at all. Is this op going directly to the device driver?

But even if we are going to initialize the qp properly, what about user space mischievously trying to crash the system by manipulating struct ib_qp *qp?

Thanks,
Bernd
Re: cq-event kernel panic
On 01/20/2012 05:04 PM, Hefty, Sean wrote:
>> We are still seeing kernel panics with linux-3.2, this time initiated from mthca_cq_event(). I'm unsure whether this is somehow related to yesterday's cq_completion patch. In any case, I'm therefore CCing Sean.
>
> Was your patch applied when testing? mthca uses kmalloc to allocate the QP structure.

Yes, which is why the kernel build name was updated to 3.2.0+. But please see the mail I just sent; it probably explains the underlying issue.
Re: OT: netmap - a novel framework for fast packet I/O
On Fri, 20 Jan 2012 06:18:44 -0800 Atchley, Scott atchle...@ornl.gov wrote:
> Interesting. It totally hijacks the NIC; all traffic is captured. You would have to implement your own IP stack, Verbs stack, etc.

Can multiple user space processes share the card? If so, how is security handled between them?

Ira

> Scott
>
> On Jan 19, 2012, at 11:50 AM, Yann Droneaud wrote:
>
>> Hi,
>>
>> I have discovered today the netmap project[1] through an ACM Queue article[2].
>>
>> Netmap is a new interface to send and receive packets through an Ethernet interface (NIC). It seems to provide raw access to the network interface in order to process packets at a high rate with low overhead.
>>
>> This is another example of kernel-bypass/zero-copy, which are core features of InfiniBand verbs/RDMA.
>>
>> But unlike InfiniBand verbs/RDMA, Netmap seems to have a very small API. Such an API could be enough to build an unreliable datagram messaging system on low cost hardware (without concerns of determinism, flow control, etc.).
>>
>> I'm asking myself if the way netmap exposes internal NIC rings could be applicable for IB/IBoE HCA? E.g. beyond 10GbE NIC, is netmap relevant?
>>
>> Regards.
>>
>> [1] http://info.iet.unipi.it/~luigi/netmap/
>>     netmap - a novel framework for fast packet I/O
>>     Luigi Rizzo, Università di Pisa
>>
>> [2] http://queue.acm.org/detail.cfm?id=2103536
>>     Revisiting Network I/O APIs: The netmap Framework
>>     Luigi Rizzo, 2012-01-17
>>
>> --
>> Yann Droneaud

--
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
Re: OT: netmap - a novel framework for fast packet I/O
On Jan 20, 2012, at 11:20 AM, Ira Weiny wrote:
> On Fri, 20 Jan 2012 06:18:44 -0800 Atchley, Scott atchle...@ornl.gov wrote:
>> Interesting. It totally hijacks the NIC; all traffic is captured. You would have to implement your own IP stack, Verbs stack, etc.
>
> Can multiple user space processes share the card? If so, how is security handled between them?

It is not clear from the paper I scanned. There does seem to be a mechanism to send selected packets up the host stack.

Scott

> Ira
>
>> Scott
>>
>> On Jan 19, 2012, at 11:50 AM, Yann Droneaud wrote:
>>
>>> Hi,
>>>
>>> I have discovered today the netmap project[1] through an ACM Queue article[2].
>>>
>>> Netmap is a new interface to send and receive packets through an Ethernet interface (NIC). It seems to provide raw access to the network interface in order to process packets at a high rate with low overhead.
>>>
>>> This is another example of kernel-bypass/zero-copy, which are core features of InfiniBand verbs/RDMA.
>>>
>>> But unlike InfiniBand verbs/RDMA, Netmap seems to have a very small API. Such an API could be enough to build an unreliable datagram messaging system on low cost hardware (without concerns of determinism, flow control, etc.).
>>>
>>> I'm asking myself if the way netmap exposes internal NIC rings could be applicable for IB/IBoE HCA? E.g. beyond 10GbE NIC, is netmap relevant?
>>>
>>> Regards.
>>>
>>> [1] http://info.iet.unipi.it/~luigi/netmap/
>>>     netmap - a novel framework for fast packet I/O
>>>     Luigi Rizzo, Università di Pisa
>>>
>>> [2] http://queue.acm.org/detail.cfm?id=2103536
>>>     Revisiting Network I/O APIs: The netmap Framework
>>>     Luigi Rizzo, 2012-01-17
>>>
>>> --
>>> Yann Droneaud
>
> --
> Ira Weiny
> Member of Technical Staff
> Lawrence Livermore National Lab
> 925-423-8008
> wei...@llnl.gov
memory region limit at 32 GB?
Hello all

Is there some kind of limit that would prevent me from registering more than 32 GiB worth of memory regions with ibv_reg_mr in libibverbs?

From strace I can see:

    open("/dev/infiniband/uverbs0", O_RDWR) = 8
    ...
    write(8, "\t\0\0\0\f\0\3\0\340o\255\35\377\177\0\0\0P\211\336\26\177\0\0\0\0\0@\0\0\0\0\0P\211\336\26\177\0\0\1\0\0\0\3\0\0\0", 48) = -1 ENOMEM (Cannot allocate memory)

when trying to register my 33rd 1 GiB buffer.

cat /proc/meminfo

    MemTotal:       198075136 kB
    MemFree:        186688448 kB
    CommitLimit:    185892796 kB

so it doesn't look like a memory thing.

IB adapter details:

    hca_id:         mlx4_0
    transport:      InfiniBand (0)
    fw_ver:         2.8.600
    vendor_id:      0x02c9
    vendor_part_id: 26428
    hw_ver:         0xB0
    board_id:       MT_0FC0110009

I'm using libibverbs 1.1.6 on kernel 3.2.1-1.fc16.x86_64.

Regards

Albert
Re: memory region limit at 32 GB?
On Fri, Jan 20, 2012 at 9:39 AM, Albert Strasheim full...@gmail.com wrote:
> Is there some kind of limit that would prevent me from registering more than 32 GiB worth of memory regions with ibv_reg_mr in libibverbs?

Yes, by default mlx4 allocates a limited amount of adapter resources for tracking memory regions. I forget the exact limits but 32GB looks reasonable...

In mlx4/main.c, there is

    static struct mlx4_profile default_profile = {
            .num_qp         = 1 << 18,
            .num_srq        = 1 << 16,
            .rdmarc_per_qp  = 1 << 4,
            .num_cq         = 1 << 16,
            .num_mcg        = 1 << 13,
            .num_mpt        = 1 << 19,
            .num_mtt        = 1 << 20,
    };

and I think if you bump num_mtt up a few powers of 2 (e.g. 1 << 22) then you should be able to register more. (num_mpt controls the number of MRs but I guess 33 is not near the limit of 512K yet ;)

 - R.
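To make the arithmetic behind the num_mtt suggestion concrete, here is a small sketch. It rests on an assumption that is not stated in the thread: that each of the num_mtt units maps 8 pages of 4 KiB. That assumption is chosen because it reproduces the 32 GiB ceiling reported above; the helper name mtt_coverage_bytes is made up for illustration.

```c
#include <stdint.h>

/*
 * Bytes of registered memory that 2^log_num_mtt MTT units can map,
 * assuming every unit maps pages_per_unit pages of page_size bytes.
 * The per-unit page count is an assumption, not a documented constant.
 */
static uint64_t mtt_coverage_bytes(unsigned int log_num_mtt,
                                   unsigned int pages_per_unit,
                                   uint64_t page_size)
{
    return ((uint64_t)1 << log_num_mtt) * pages_per_unit * page_size;
}
```

Under that assumption, 1 << 20 units cover 2^35 bytes (32 GiB), and the suggested 1 << 22 covers 2^37 bytes (128 GiB).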
Re: memory region limit at 32 GB?
By the way, I wonder if we should auto-tune num_mtt so we have enough MTTs to cover, say, 4X of the amount of physical memory. How much RAM do you have in your system?

 - R.
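One way such auto-tuning could pick a value is sketched below. This is not the mlx4 driver's logic: log_num_mtt_to_cover is a hypothetical helper, and the per-unit coverage (8 pages of 4 KiB in the usage note) is an assumption carried over from the 32 GiB observation earlier in the thread.

```c
#include <stdint.h>

/*
 * Smallest log2(num_mtt) such that 2^log units cover at least
 * multiple * phys_bytes, assuming each unit maps pages_per_unit pages
 * of page_size bytes. Capped at 62 to keep the shift well defined.
 */
static unsigned int log_num_mtt_to_cover(uint64_t phys_bytes,
                                         unsigned int multiple,
                                         unsigned int pages_per_unit,
                                         uint64_t page_size)
{
    uint64_t per_unit = (uint64_t)pages_per_unit * page_size;
    uint64_t target = phys_bytes * multiple;
    /* Round up so a partial unit still counts as a full one. */
    uint64_t units_needed = (target + per_unit - 1) / per_unit;
    unsigned int log = 0;

    while (log < 62 && ((uint64_t)1 << log) < units_needed)
        log++;
    return log;
}
```

For the machine in this thread (MemTotal 198075136 kB) and a factor of 4, the helper returns 25 under the stated assumption, i.e. 1 << 25 units.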
Re: memory region limit at 32 GB?
On Fri, Jan 20, 2012 at 10:30 AM, Albert Strasheim full...@gmail.com wrote:
> FYI, new Sandy Bridge motherboards will be out soon that do 512 GB and even 768 GB.

Yeah, Cisco UCS C260 goes up to 1TB in a 2-socket 2U already, so we really should fix this. I'll cook something up.

The good news is that the overhead is only that we use more memory for adapter context, but if you have 1TB of RAM you probably don't care about a few more MB of overhead :)

 - R.
Re: [PATCH] core/verb.c: fix kernel panic: always initialize struct ib_qp *qp->usecnt
On Fri, Jan 20, 2012 at 8:14 AM, Bernd Schubert bernd.schub...@itwm.fraunhofer.de wrote:
> I *guess* the qp allocated by pd->context->ops.create_qp() does not have qp->usecnt initialized (nor does it know anything about it). So its random value will fail the destruction later.
>
> A simple workaround that would work for us is to extend the patch I sent to
>
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 602b1bd..fba1675 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -874,7 +874,7 @@ int ib_destroy_qp(struct ib_qp *qp)
>         struct ib_srq *srq;
>         int ret;
>
> -       if (atomic_read(&qp->usecnt))
> +       if (qp->qp_type == IB_QPT_XRC_TGT && atomic_read(&qp->usecnt))
>                 return -EBUSY;
>
>         if (qp->real_qp != qp)

It looks like this is sufficient and correct without the other patch?

> However, what about user space setting the type to IB_QPT_XRC_TGT? I guess this could be solved by letting the kernel zero the memory returned by ->ops.create_qp(pd, qp_init_attr). Btw, I haven't figured out yet how this translates into kernel space at all. Is this op going directly to the device driver?
>
> But even if we are going to initialize the qp properly, what about user space mischievously trying to crash the system by manipulating struct ib_qp *qp?

I don't follow this. Isn't *qp completely allocated and manipulated in the kernel? How can userspace touch it except by having the kernel do something via the uverbs interface?

 - R.
RE: [PATCH] core/verb.c: fix kernel panic: always initialize struct ib_qp *qp->usecnt
> However, what about user space setting the type to IB_QPT_XRC_TGT? I guess this could be solved by letting the kernel zero the memory returned by ->ops.create_qp(pd, qp_init_attr). Btw, I haven't figured out yet how this translates into kernel space at all. Is this op going directly to the device driver?

ops.create_qp basically ends up going into the kernel into ib_uverbs_create_qp().

> But even if we are going to initialize the qp properly, what about user space mischievously trying to crash the system by manipulating struct ib_qp *qp?

There's cleanup in uverbs that ignores the return value from ib_destroy_qp(), basically because it shouldn't fail in those circumstances. After calling ib_destroy_qp(), uverbs will free some internal structures that some of the callback handlers expect to access. This leads to the crashes that you're seeing.

I think the problem is that your first patch is incomplete. ib_uverbs_create_qp() will create a QP by either calling ib_create_qp() or by calling the device directly (device->create_qp). qp->usecnt needs to be initialized in both cases.

Can you try this modification to your original patch?

From: Bernd Schubert bernd.schub...@itwm.fraunhofer.de
From: Sean Hefty sean.he...@intel.com

rdma/core: Fix kernel panic by always initializing qp->usecnt

We have just been investigating kernel panics related to cq->ibcq.event_handler() completion calls. The reason is that ib_destroy_qp() fails with -EBUSY.

Further investigation revealed that qp->usecnt is not initialized. This counter was introduced in linux-3.2 by commit 0e0ec7e0638ef48e0c661873dfcc8caccab984c6 and is only initialized for IB_QPT_XRC_TGT, but is also checked in ib_destroy_qp() for any QP type.

Signed-off-by: Bernd Schubert bernd.schub...@itwm.fraunhofer.de
Signed-off-by: Sven Breuner sven.breu...@itwm.fraunhofer.de
Signed-off-by: Sean Hefty sean.he...@intel.com
---
 drivers/infiniband/core/uverbs_cmd.c |    1 +
 drivers/infiniband/core/verbs.c      |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index e26193f..e47dbf1 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1472,6 +1472,7 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
 		qp->event_handler = attr.event_handler;
 		qp->qp_context    = attr.qp_context;
 		qp->qp_type       = attr.qp_type;
+		atomic_set(&qp->usecnt, 0);
 		atomic_inc(&pd->usecnt);
 		atomic_inc(&attr.send_cq->usecnt);
 		if (attr.recv_cq)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 602b1bd..575b780 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -421,6 +421,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	qp->uobject    = NULL;
 	qp->qp_type    = qp_init_attr->qp_type;
+	atomic_set(&qp->usecnt, 0);
 	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT) {
 		qp->event_handler = __ib_shared_qp_event_handler;
 		qp->qp_context = qp;
@@ -430,7 +431,6 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 		qp->xrcd = qp_init_attr->xrcd;
 		atomic_inc(&qp_init_attr->xrcd->usecnt);
 		INIT_LIST_HEAD(&qp->open_list);
-		atomic_set(&qp->usecnt, 0);
 		real_qp = qp;
 		qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
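The failure mode the patch addresses can be reproduced in miniature in user space. The sketch below is a simulation, not kernel code: struct sketch_qp and the sketch_* functions are hypothetical stand-ins, a plain int replaces the kernel's atomic_t, and the 0xa5 fill is only a deterministic stand-in for whatever stale bytes kmalloc() might hand back.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum sketch_qp_type { SKETCH_QPT_RC, SKETCH_QPT_XRC_TGT };

/* Hypothetical stand-in for struct ib_qp; plain int instead of atomic_t. */
struct sketch_qp {
    enum sketch_qp_type qp_type;
    int usecnt;
};

/* Mimics a driver allocating the QP from memory that is not zeroed. */
static struct sketch_qp *sketch_alloc_qp(enum sketch_qp_type type)
{
    struct sketch_qp *qp = malloc(sizeof(*qp));

    if (!qp)
        return NULL;
    memset(qp, 0xa5, sizeof(*qp)); /* simulated stale heap contents */
    qp->qp_type = type;
    return qp;
}

/* init_usecnt != 0 models the patched create path, 0 the unpatched one. */
static struct sketch_qp *sketch_create_qp(enum sketch_qp_type type,
                                          int init_usecnt)
{
    struct sketch_qp *qp = sketch_alloc_qp(type);

    if (qp && init_usecnt)
        qp->usecnt = 0;
    return qp;
}

/* Same shape as the check in ib_destroy_qp(): refuse while in use. */
static int sketch_destroy_qp(struct sketch_qp *qp)
{
    if (qp->usecnt)
        return -EBUSY;
    free(qp);
    return 0;
}
```

A QP created without the initialization is refused with -EBUSY and stays allocated, which is the state in which a caller that ignores the return value goes on to free structures the QP still references.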
Re: [PATCH] core/verb.c: fix kernel panic: always initialize struct ib_qp *qp->usecnt
On Fri, Jan 20, 2012 at 10:40 AM, Roland Dreier rol...@purestorage.com wrote:
> On Fri, Jan 20, 2012 at 8:14 AM, Bernd Schubert bernd.schub...@itwm.fraunhofer.de wrote:
>> I *guess* the qp allocated by pd->context->ops.create_qp() does not have qp->usecnt initialized (nor does it know anything about it). So its random value will fail the destruction later.
>>
>> A simple workaround that would work for us is to extend the patch I sent to
>>
>> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
>> index 602b1bd..fba1675 100644
>> --- a/drivers/infiniband/core/verbs.c
>> +++ b/drivers/infiniband/core/verbs.c
>> @@ -874,7 +874,7 @@ int ib_destroy_qp(struct ib_qp *qp)
>>         struct ib_srq *srq;
>>         int ret;
>>
>> -       if (atomic_read(&qp->usecnt))
>> +       if (qp->qp_type == IB_QPT_XRC_TGT && atomic_read(&qp->usecnt))
>>                 return -EBUSY;
>>
>>         if (qp->real_qp != qp)
>
> It looks like this is sufficient and correct without the other patch?

But maybe it's cleaner to initialize qp->usecnt in both ib_create_qp() and ib_uverbs_create_qp().

 - R.
Re: kernel BUG at drivers/iommu/intel-iommu.c:1767 on F16 3.1.1-2.fc16.x86_64
Hello

On Fri, Jan 20, 2012 at 10:23 AM, Roland Dreier rol...@purestorage.com wrote:
> On Thu, Jan 19, 2012 at 12:57 AM, Albert Strasheim full...@gmail.com wrote:
>> Just checking up on this issue. Is there any further testing or information we can provide to help make a fix happen?
>
> I'm not likely to be much help on VT-d issues, but maybe it would be useful to dump all the values in the BUG_ON if it's going to trigger, i.e. just before

Just retested with 3.2.1-1.fc16.x86_64 and the bug seems to be gone. I confirmed that my test program triggers the bug on 3.1.1-1.fc16.x86_64.

It seems a bunch of IOMMU fixes went in on 9 and 10 January, which appear to have fixed this problem in 3.2.

Thanks!

Regards

Albert
Re: [PATCH 2/2] IB/mad: Return unsupported for MADs as appropriate
On 1/20/2012 10:27 AM, Hefty, Sean wrote:
> As Jason pointed out, see figure 169, but, yes, this is the correct value for your case.

Figure 169 is GMP Check. SMInfo is SM class so it's a different case. When SMInfo is not handled due to no SM being present, method/attribute not supported is most appropriate as a MAD status, as the SMA handles the SM class so it's always present. I think there are other similar cases (when a GS class is partially supported) too. The only way to return the optimal MAD status is to look at the port's MAD registrations to really figure this out.

-- Hal