Re: [for-next 7/7] IB/mlx5: Implement fragmented completion queue (CQ)

2018-02-24 Thread Majd Dibbiny

> On Feb 23, 2018, at 9:13 PM, Saeed Mahameed  wrote:
> 
>> On Thu, 2018-02-22 at 16:04 -0800, Santosh Shilimkar wrote:
>> Hi Saeed
>> 
>>> On 2/21/2018 12:13 PM, Saeed Mahameed wrote:
>>> From: Yonatan Cohen 
>>> 
>>> The current implementation of create CQ requires contiguous
>>> memory; such a requirement is problematic once memory is
>>> fragmented or the system is low on memory, as it causes
>>> failures in dma_zalloc_coherent().
>>> 
>>> This patch implements new scheme of fragmented CQ to overcome
>>> this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
>>> to allocate fragmented buffers, rather than contiguous ones.
>>> 
>>> Base the Completion Queues (CQs) on this new fragmented buffer.
>>> 
>>> It fixes the following crashes:
>>> kworker/29:0: page allocation failure: order:6, mode:0x80d0
>>> CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
>>> Workqueue: ib_cm cm_work_handler [ib_cm]
>>> Call Trace:
>>> [<>] dump_stack+0x19/0x1b
>>> [<>] warn_alloc_failed+0x110/0x180
>>> [<>] __alloc_pages_slowpath+0x6b7/0x725
>>> [<>] __alloc_pages_nodemask+0x405/0x420
>>> [<>] dma_generic_alloc_coherent+0x8f/0x140
>>> [<>] x86_swiotlb_alloc_coherent+0x21/0x50
>>> [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
>>> [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
>>> [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
>>> [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
>>> [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
>>> [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
>>> 
>>> Signed-off-by: Yonatan Cohen 
>>> Reviewed-by: Tariq Toukan 
>>> Signed-off-by: Leon Romanovsky 
>>> Signed-off-by: Saeed Mahameed 
>>> ---
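
For readers who want a concrete picture of the fragmented-buffer scheme
described in the commit message above, here is a minimal user-space sketch
of the idea in plain C. The struct and function names are hypothetical
illustrations, not the driver's mlx5_frag_buf_ctrl API: instead of asking
for one large physically contiguous block, the buffer is built from
independently allocated page-sized fragments, so no high-order allocation
is needed.

#include <stdlib.h>
#include <string.h>

#define FRAG_SIZE 4096	/* one page-sized fragment per allocation */

/* Hypothetical stand-in for the driver's fragmented buffer control. */
struct frag_buf {
	void   **frags;		/* independently allocated fragments */
	size_t   nfrags;
};

/*
 * Allocate 'size' bytes as page-sized fragments instead of one
 * contiguous block, so the allocator never sees a high-order request.
 */
static int frag_buf_alloc(struct frag_buf *buf, size_t size)
{
	size_t i, n = (size + FRAG_SIZE - 1) / FRAG_SIZE;

	buf->frags = calloc(n, sizeof(*buf->frags));
	if (!buf->frags)
		return -1;

	for (i = 0; i < n; i++) {
		buf->frags[i] = aligned_alloc(FRAG_SIZE, FRAG_SIZE);
		if (!buf->frags[i])
			goto err;
		memset(buf->frags[i], 0, FRAG_SIZE);
	}
	buf->nfrags = n;
	return 0;

err:
	while (i--)
		free(buf->frags[i]);
	free(buf->frags);
	return -1;
}

static void frag_buf_free(struct frag_buf *buf)
{
	size_t i;

	for (i = 0; i < buf->nfrags; i++)
		free(buf->frags[i]);
	free(buf->frags);
}

The price is that consumers address entry i by fragment index and offset
rather than plain pointer arithmetic over one contiguous region, which is
the trade-off the fragmented CQ accepts to avoid order-6 allocations like
the one failing in the trace above.
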
>> 
>> Jason mentioned this patch to me off-list. We were
>> seeing a similar issue with SRQs & QPs, so I'm wondering whether
>> you have any plans to do a similar change for other resources
>> too, so that they don't rely on higher-order page allocations
>> for ICM tables.
>> 
> 
> Hi Santosh,
> 
> Adding Majd,
> 
> Which ULP is in question? How big are the QPs/SRQs you create that
> lead to this problem?
> 
> For ICM tables we already allocate only order-0 pages; see
> alloc_system_page() in
> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c.
> 
> But for kernel RDMA SRQ and QP buffers there is room for
> improvement.
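
To put the order numbers above in perspective: an order-n request asks the
page allocator for 2^n physically contiguous pages, so the order:6 failure
in the trace corresponds to 64 contiguous 4 KiB pages (256 KiB), while
order-0 requests are single pages that can be satisfied even from a
fragmented free list. A small, purely illustrative C sketch of the
arithmetic (the CQ geometry in the example is made up):

#include <stdio.h>

#define PAGE_SIZE 4096UL

/*
 * Smallest order whose 2^order pages cover 'size' bytes,
 * mirroring what the kernel's get_order() helper computes.
 */
static unsigned int order_for_size(unsigned long size)
{
	unsigned int order = 0;
	unsigned long pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;

	while ((1UL << order) < pages)
		order++;
	return order;
}

int main(void)
{
	/* e.g. a CQ with 64K entries of 64 bytes each = 4 MiB */
	unsigned long cq_bytes = 65536UL * 64;

	printf("contiguous alloc needs order %u (%lu KiB in one block)\n",
	       order_for_size(cq_bytes), cq_bytes / 1024);
	printf("fragmented alloc needs %lu order-0 pages instead\n",
	       cq_bytes / PAGE_SIZE);
	return 0;
}
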
> 
> Majd, do you know if we have any near future plans for this.

It’s in our plans to move all the buffers to use 0-order pages.

Santosh,

Is this RDS? Do you have a persistent failure with some configuration? Can you 
please share more information?

Thanks
> 
>> Regards,
>> Santosh


Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-10 Thread Majd Dibbiny

> On Jun 10, 2017, at 1:24 AM, Doug Ledford  wrote:
> 
>> On Wed, 2017-06-07 at 13:21 -0600, Jason Gunthorpe wrote:
>>> On Wed, Jun 07, 2017 at 10:13:43PM +0300, Saeed Mahameed wrote:
>>>  
>>> No!!
>>> I am just showing you that ib_core will eventually end up
>>> calling mlx5_core to create a QP,
>>> so mlx5_core can create the QP itself, since it is the one
>>> eventually creating QPs.
>>> We just call mlx5_core_create_qp directly.
>> 
>> Which is building a RDMA ULP inside a driver without using the core
>> code :(
> 
> Aren't the transmit/receive queues of the Ethernet netdevice on
> mlx4/mlx5 hardware QPs too?  Those bypass the RDMA subsystem entirely.
>  Just because something uses a QP on hardware that does *everything*
> via QPs doesn't necessarily mean it must go through the RDMA subsystem.
> 
> Now, the fact that the content of the packets is basically a RoCE
> packet does make things a bit fuzzier, but if their packets are
> specially crafted RoCE packets that aren't really intended to be fully
> RoCE spec compliant (maybe they don't support all the options as normal
> RoCE QPs), then I can see hiding them from the larger RoCE portion of
> the RDMA stack.
> 
>>>> This keeps getting more ugly :(
>>>> 
>>>> What about security? What if user space sends some raw packets to the
>>>> FPGA - can it reprogram the IPSEC settings or worse?
>>> 
>>> No such thing. This QP is only for internal driver/HW
>>> communications, as it is faster than the existing command
>>> interface. It is not meant to be exposed for any raw user-space
>>> usage at all without a proper standard API adapter, of course.
>> 
>> I'm not asking about the QP, I'm asking what happens after the NIC
>> part. You use RoCE packets to control the FPGA. What prevents
>> userspace from forcibly constructing RoCE packets and sending them to
>> the FPGA? How does the FPGA know for certain that the packet came
>> from the kernel QP and not someplace else?
> 
> This is a valid concern.
> 
>> This is especially true for mlx NICs, as there are many raw packet
>> bypass mechanisms available to userspace.
> 
All of the raw packet bypass mechanisms are restricted to CAP_NET_RAW, and thus 
malicious users can't simply open a Raw Packet QP and send traffic to the FPGA.
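
For what it's worth, the capability gate is easy to observe from user space:
raw packet access paths require CAP_NET_RAW, and an unprivileged process is
simply refused with EPERM. A tiny demonstration using an ordinary AF_PACKET
socket (this is not the verbs Raw Packet QP path, just an illustration of
the same CAP_NET_RAW check):

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>		/* htons() */
#include <linux/if_ether.h>	/* ETH_P_ALL */

int main(void)
{
	/*
	 * Opening a raw packet socket requires CAP_NET_RAW; without it
	 * the kernel refuses with EPERM.
	 */
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	if (fd < 0) {
		printf("raw socket denied: %s\n", strerror(errno));
		return 1;
	}
	printf("raw socket allowed (CAP_NET_RAW present)\n");
	close(fd);
	return 0;
}
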
> Right.  The question becomes: Does the firmware filter outgoing raw ETH
> QPs such that a nefarious user could not send a crafted RoCE packet
> that the bump on the wire would intercept and accept?
> 
> -- 
> Doug Ledford 
> GPG KeyID: B826A3330E572FDD
>
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
> 