On Wed, Apr 24, 2013 at 4:58 PM, Or Gerlitz wrote:
> The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs,
> for which plain Ethernet packets are used, specifically packets which
> don't carry any QPN to be matched by the receiving side.
>
> Applications using these QPs must be provided with a method to
> program some steering rule with the HW so packets arriving at
> the local port can be routed to them.
Any feedback? We've added RAW PACKET QP support back in 3.5 or 3.6,
but without RX flow steering APIs applications can only send packets,
not receive them, which is a bit of a problem for production... so
here's a concrete and working suggestion, waiting to be reviewed and
hopefully accepted.
Or.
As I wrote in the cover letter, looking at the "Network Adapter Flow
Steering" slides that Tzahi Oved presented at the annual OFA
2012 meeting could be helpful:
https://www.openfabrics.org/resources/document-downloads/presentations/doc_download/518-network-adapter-flow-steering.html
> This patch adds ib_create_flow, which allows providing a flow specification
> for a QP such that when a received packet matches the specification, it
> can be forwarded to that QP, in a similar manner to how ib_attach_multicast
> is used for IB UD multicast handling.
>
> Flow specifications are provided as instances of struct ib_flow_spec_yyy
> which describe L2, L3 and L4 headers, currently specs for Ethernet, IPv4,
> TCP, UDP and IB are defined. Flow specs are made of values and masks.
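The value/mask idea can be illustrated as a per-byte comparison: a packet field matches when it agrees with the rule value on every bit the mask selects, and masked-out bits are ignored. This is a minimal user-space sketch; struct field_match and field_matches() are hypothetical names, not the ib_flow_spec_* layouts from the patch, and on real hardware the adapter performs this match itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value/mask pair for one header field (e.g. a MAC). */
struct field_match {
	const uint8_t *value; /* bytes the rule wants to see */
	const uint8_t *mask;  /* 1 bits are compared, 0 bits are ignored */
	size_t len;
};

/* True when every masked bit of the packet field equals the
 * corresponding masked bit of the rule value. */
static bool field_matches(const struct field_match *m, const uint8_t *pkt)
{
	for (size_t i = 0; i < m->len; i++)
		if ((pkt[i] & m->mask[i]) != (m->value[i] & m->mask[i]))
			return false;
	return true;
}
```

With an all-ones mask the rule is an exact match; a mask covering only the first three bytes of a destination MAC would accept any address with that OUI prefix.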
>
> The input to ib_create_flow is an instance of struct ib_flow_attr, which
> contains a few mandatory control elements and optional flow specs.
>
> struct ib_flow_attr {
> 	enum ib_flow_attr_type type;
> 	u16 size;
> 	u16 priority;
> 	u8 num_of_specs;
> 	u8 port;
> 	u32 flags;
> 	/* Following are the optional layers according to user request
> 	 * struct ib_flow_spec_yyy
> 	 * struct ib_flow_spec_zzz
> 	 */
> };
>
> Since these specs ultimately come from user space, they are defined and
> used in a way that allows adding new spec types without a kernel/user ABI
> change, requiring only a small API enhancement that defines the newly added spec.
>
> The flow spec structures are defined in a TLV (Type-Length-Value) manner,
> which allows calling ib_create_flow with a variable-length list of
> optional specs.
>
> To process an ib_flow_attr, the driver uses the mandatory num_of_specs and
> size fields along with the TLV nature of the specs.
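How num_of_specs, size and the per-spec length work together can be sketched in user space: a caller appends specs after the attr and bumps both counters, and a parser steps from spec to spec using each spec's own length, rejecting any length that would run past the attr's total size. This is a sketch under assumptions: struct flow_attr is a simplified stand-in for struct ib_flow_attr (stdint types, enum replaced by an integer), struct flow_spec_hdr is a hypothetical common spec header, and the type values used below are arbitrary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common header assumed at the start of every spec. */
struct flow_spec_hdr {
	uint32_t type; /* which header the spec describes (L2/L3/L4) */
	uint16_t size; /* whole spec in bytes, header included */
	uint16_t reserved;
};

/* Simplified stand-in for struct ib_flow_attr; specs follow it. */
struct flow_attr {
	uint32_t type;
	uint16_t size; /* attr plus all trailing specs, in bytes */
	uint16_t priority;
	uint8_t num_of_specs;
	uint8_t port;
	uint32_t flags;
};

/* Appends a zero-filled spec of spec_size bytes (>= header size) and
 * updates size and num_of_specs; buf must have room for it. */
static void append_spec(uint8_t *buf, uint32_t type, uint16_t spec_size)
{
	struct flow_attr attr;
	struct flow_spec_hdr hdr = { .type = type, .size = spec_size };

	memcpy(&attr, buf, sizeof(attr));
	memset(buf + attr.size, 0, spec_size);
	memcpy(buf + attr.size, &hdr, sizeof(hdr));
	attr.size += spec_size;
	attr.num_of_specs++;
	memcpy(buf, &attr, sizeof(attr));
}

/* Walks num_of_specs TLV specs after the attr, recording each type.
 * Returns the spec count, or -1 if a length field is inconsistent. */
static int walk_specs(const uint8_t *buf, uint32_t *types)
{
	struct flow_attr attr;
	size_t off = sizeof(attr);
	int i;

	memcpy(&attr, buf, sizeof(attr));
	if (attr.size < sizeof(attr))
		return -1;
	for (i = 0; i < attr.num_of_specs; i++) {
		struct flow_spec_hdr hdr;

		if (attr.size - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		/* The length field lets a parser step over any spec,
		 * including types it does not interpret. */
		if (hdr.size < sizeof(hdr) || hdr.size > attr.size - off)
			return -1;
		types[i] = hdr.type;
		off += hdr.size;
	}
	return i;
}
```

The memcpy-based reads avoid alignment and aliasing assumptions about the byte buffer; a kernel implementation would more likely cast pointers directly.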
>
> Steering rules are processed in order of rule priority. The user
> sets the 12 low-order bits of the priority field and the remaining
> 4 high-order bits are set by the kernel according to the domain to which
> the application or layer that created the rule belongs. A lower
> numerical priority value means a higher priority.
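The priority split described above works out to a simple bit composition. This sketch uses a hypothetical helper, flow_effective_priority(), and assumes user bits above bit 11 are masked off; the kernel could equally reject such values.

```c
#include <stdint.h>

#define FLOW_USER_PRIO_BITS 12
#define FLOW_USER_PRIO_MASK ((1u << FLOW_USER_PRIO_BITS) - 1) /* 0x0fff */

/* Hypothetical helper: 4-bit domain in bits 15..12, user priority in
 * bits 11..0. Lower result means the rule is processed first. */
static uint16_t flow_effective_priority(unsigned int domain,
					uint16_t user_prio)
{
	return (uint16_t)(((domain & 0xfu) << FLOW_USER_PRIO_BITS) |
			  (user_prio & FLOW_USER_PRIO_MASK));
}
```

One consequence of putting the domain in the high bits: every rule from domain 0 outranks every rule from domain 1, whatever user priority either one carries, since 0x0fff < 0x1000.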
>
> ib_create_flow returns an instance of struct ib_flow, which contains a
> database pointer (handle) provided by the HW driver, to be used when
> calling ib_destroy_flow.
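To make the handle lifecycle concrete, here is a user-space mock of the create path shown in the verbs.c hunk: a device without a create_flow hook yields -ENOSYS, and a successful create takes a reference on the QP through usecnt. All names (mock_qp, mock_flow, err_ptr, is_err, drv_create_flow) are hypothetical. The driver hook is simplified to drop the flow_attr argument, and mock_destroy_flow assumes that ib_destroy_flow, whose body is not shown here, frees the driver handle and drops that reference.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-ins for ERR_PTR()/IS_ERR(): negative errno values
 * are encoded in the top MAX_ERRNO addresses. */
#define MAX_ERRNO 4095
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct mock_flow { int id; };
struct mock_qp;
/* Hypothetical driver hook; NULL means no flow steering support. */
typedef struct mock_flow *(*create_flow_fn)(struct mock_qp *qp, int domain);

struct mock_qp {
	create_flow_fn create_flow;
	int usecnt; /* plain int here; the kernel uses atomic_t */
};

/* Mirrors the logic of ib_create_flow() in the patch. */
static struct mock_flow *mock_create_flow(struct mock_qp *qp, int domain)
{
	struct mock_flow *flow;

	if (!qp->create_flow)
		return err_ptr(-ENOSYS);
	flow = qp->create_flow(qp, domain);
	if (!is_err(flow))
		qp->usecnt++; /* the rule now references the QP */
	return flow;
}

/* Assumed counterpart: release the handle, drop the QP reference. */
static void mock_destroy_flow(struct mock_qp *qp, struct mock_flow *flow)
{
	free(flow);
	qp->usecnt--;
}

/* Example driver hook that allocates a handle. */
static struct mock_flow *drv_create_flow(struct mock_qp *qp, int domain)
{
	struct mock_flow *flow = malloc(sizeof(*flow));

	(void)qp;
	if (!flow)
		return err_ptr(-ENOMEM);
	flow->id = domain;
	return flow;
}
```

Callers therefore test the result with IS_ERR() rather than against NULL, and must call ib_destroy_flow for every successful create so the QP's use count returns to zero.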
>
> Applications that offload TCP/IP traffic could also be written over IB UD
> QPs. As such, the ib_create_flow / ib_destroy_flow API is designed to support
> UD QPs too. The HW driver sets IB_DEVICE_MANAGED_FLOW_STEERING to denote
> support of flow steering.
>
> The values of enum ib_flow_attr_type relate to the use of flow steering
> for promiscuous and sniffer purposes:
>
> IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification
>
> IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
> all Ethernet traffic which isn't steered to any QP
>
> IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for
> multicast
>
> IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic
>
> The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the Ethernet
> link type.
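The four rule types and the Ethernet-only restriction can be modeled as two small predicates. This is a simplified sketch: the enum names below are hypothetical stand-ins for enum ib_flow_attr_type, "steered" means some NORMAL rule already matched the packet, and rule priorities are ignored.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the IB_FLOW_ATTR_* values in the patch. */
enum flow_attr_type {
	FLOW_ATTR_NORMAL,
	FLOW_ATTR_ALL_DEFAULT,
	FLOW_ATTR_MC_DEFAULT,
	FLOW_ATTR_SNIFFER,
};

enum link_layer { LINK_LAYER_INFINIBAND, LINK_LAYER_ETHERNET };

/* The two "default" rule types are only valid on an Ethernet port. */
static bool flow_attr_type_valid(enum flow_attr_type type,
				 enum link_layer ll)
{
	switch (type) {
	case FLOW_ATTR_NORMAL:
	case FLOW_ATTR_SNIFFER:
		return true;
	case FLOW_ATTR_ALL_DEFAULT:
	case FLOW_ATTR_MC_DEFAULT:
		return ll == LINK_LAYER_ETHERNET;
	}
	return false; /* unknown type */
}

/* Whether a rule of this type receives a packet, per the descriptions
 * above; 'matches' means the rule's own specs match the packet. */
static bool rule_receives(enum flow_attr_type type, bool is_multicast,
			  bool steered, bool matches)
{
	switch (type) {
	case FLOW_ATTR_NORMAL:
		return matches;
	case FLOW_ATTR_ALL_DEFAULT:
		return !steered;
	case FLOW_ATTR_MC_DEFAULT:
		return !steered && is_multicast;
	case FLOW_ATTR_SNIFFER:
		return true;
	}
	return false;
}
```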
>
> Signed-off-by: Hadar Hen Zion
> Signed-off-by: Or Gerlitz
> ---
> drivers/infiniband/core/verbs.c |  30 +
> include/rdma/ib_verbs.h         | 136 ++-
> 2 files changed, 164 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 22192de..932f4a7 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1254,3 +1254,33 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
> return xrcd->device->dealloc_xrcd(xrcd);
> }
> EXPORT_SYMBOL(ib_dealloc_xrcd);
> +
> +struct ib_flow *ib_create_flow(struct ib_qp *qp,
> +			       struct ib_flow_attr *flow_attr,
> +			       int domain)
> +{
> +	struct ib_flow *flow_id;
> +	if (!qp->device->create_flow)
> +		return ERR_PTR(-ENOSYS);
> +
> +	flow_id = qp->device->create_flow(qp, flow_attr, domain);
> +	if (!IS_ERR(flow_id))
> +		atomic_inc(&qp->usecnt);