Re: [PATCHv2 TRIVIAL] IB/core: ib_mad.h ib_mad_snoop_handler documentation fix
On 1/4/2016 10:44 PM, Hal Rosenstock wrote: ib_mad_snoop_handler uses send_buf rather than send_wr Signed-off-by: Hal Rosenstock Please use higher language in commit titles e.g IB/core: Documentation fix in the MAD header file --- Change since v1: Fixed typo in patch description diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h index ec9b44d..2b3573d 100644 --- a/include/rdma/ib_mad.h +++ b/include/rdma/ib_mad.h @@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent *mad_agent, /** * ib_mad_snoop_handler - Callback handler for snooping sent MADs. * @mad_agent: MAD agent that snooped the MAD. - * @send_wr: Work request information on the sent MAD. + * @send_buf: send MAD data buffer. * @mad_send_wc: Work completion information on the sent MAD. Valid * only for snooping that occurs on a send completion. * - * Clients snooping MADs should not modify data referenced by the @send_wr + * Clients snooping MADs should not modify data referenced by the @send_buf * or @mad_send_wc. */ typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent, -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB/sysfs: Fix sparse warning on attr_id
On 1/3/2016 10:44 PM, ira.we...@intel.com wrote: > From: Ira Weiny > > Attributed ID was declared as an int while the value should really be big > endian 16. > > Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c") > > Reported-by: Bart Van Assche > Signed-off-by: Ira Weiny Reviewed-by: Hal Rosenstock
Re: [PATCHv2 TRIVIAL] IB/core: ib_mad.h ib_mad_snoop_handler documentation fix
On Mon, Jan 04, 2016 at 03:44:15PM -0500, Hal Rosenstock wrote: > ib_mad_snoop_handler uses send_buf rather than send_wr > > Signed-off-by: Hal Rosenstock Reviewed-by: Ira Weiny > --- > Change since v1: > Fixed typo in patch description > > diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h > index ec9b44d..2b3573d 100644 > --- a/include/rdma/ib_mad.h > +++ b/include/rdma/ib_mad.h > @@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent > *mad_agent, > /** > * ib_mad_snoop_handler - Callback handler for snooping sent MADs. > * @mad_agent: MAD agent that snooped the MAD. > - * @send_wr: Work request information on the sent MAD. > + * @send_buf: send MAD data buffer. > * @mad_send_wc: Work completion information on the sent MAD. Valid > * only for snooping that occurs on a send completion. > * > - * Clients snooping MADs should not modify data referenced by the @send_wr > + * Clients snooping MADs should not modify data referenced by the @send_buf > * or @mad_send_wc. > */ > typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
Re: [PATCH 2/2] IB/mad: use CQ abstraction
On Mon, Jan 04, 2016 at 02:15:59PM +0100, Christoph Hellwig wrote: > Remove the local workqueue to process mad completions and use the CQ API > instead. > > Signed-off-by: Christoph Hellwig One minor nit below... > --- > drivers/infiniband/core/mad.c | 159 > + > drivers/infiniband/core/mad_priv.h | 2 +- > 2 files changed, 58 insertions(+), 103 deletions(-) > > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c > index cbe232a..286d1a9 100644 > --- a/drivers/infiniband/core/mad.c > +++ b/drivers/infiniband/core/mad.c > @@ -61,18 +61,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in > number of work requests > module_param_named(recv_queue_size, mad_recvq_size, int, 0444); > MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work > requests"); > > -/* > - * Define a limit on the number of completions which will be processed by the > - * worker thread in a single work item. This ensures that other work items > - * (potentially from other users) are processed fairly. > - * > - * The number of completions was derived from the default queue sizes above. > - * We use a value which is double the larger of the 2 queues (receive @ 512) > - * but keep it fixed such that an increase in that value does not introduce > - * unfairness. 
> - */ > -#define MAD_COMPLETION_PROC_LIMIT 1024 > - > static struct list_head ib_mad_port_list; > static u32 ib_mad_client_id = 0; > > @@ -96,6 +84,9 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req > *mad_reg_req, > u8 mgmt_class); > static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req, > struct ib_mad_agent_private *agent_priv); > +static bool ib_mad_send_error(struct ib_mad_port_private *port_priv, > + struct ib_wc *wc); > +static void ib_mad_send_done(struct ib_cq *cq, struct ib_wc *wc); > > /* > * Returns a ib_mad_port_private structure or NULL for a device/port > @@ -702,11 +693,11 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info, > } > > static void build_smp_wc(struct ib_qp *qp, > - u64 wr_id, u16 slid, u16 pkey_index, u8 port_num, > + void *wr_cqe, u16 slid, u16 pkey_index, u8 port_num, Sorry I did not catch this before but rather than void * wouldn't it be better to use struct ib_cqe? Regardless: Reviewed-by: Ira Weiny >struct ib_wc *wc) > { > memset(wc, 0, sizeof *wc); > - wc->wr_id = wr_id; > + wc->wr_cqe = wr_cqe; > wc->status = IB_WC_SUCCESS; > wc->opcode = IB_WC_RECV; > wc->pkey_index = pkey_index; > @@ -844,7 +835,7 @@ static int handle_outgoing_dr_smp(struct > ib_mad_agent_private *mad_agent_priv, > } > > build_smp_wc(mad_agent_priv->agent.qp, > - send_wr->wr.wr_id, drslid, > + send_wr->wr.wr_cqe, drslid, >send_wr->pkey_index, >send_wr->port_num, &mad_wc); > > @@ -1051,7 +1042,9 @@ struct ib_mad_send_buf * ib_create_send_mad(struct > ib_mad_agent *mad_agent, > > mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey; > > - mad_send_wr->send_wr.wr.wr_id = (unsigned long) mad_send_wr; > + mad_send_wr->mad_list.cqe.done = ib_mad_send_done; > + > + mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe; > mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list; > mad_send_wr->send_wr.wr.num_sge = 2; > mad_send_wr->send_wr.wr.opcode = IB_WR_SEND; > @@ -1163,8 +1156,9 @@ int ib_send_mad(struct ib_mad_send_wr_private 
> *mad_send_wr) > > /* Set WR ID to find mad_send_wr upon completion */ > qp_info = mad_send_wr->mad_agent_priv->qp_info; > - mad_send_wr->send_wr.wr.wr_id = (unsigned long)&mad_send_wr->mad_list; > mad_send_wr->mad_list.mad_queue = &qp_info->send_queue; > + mad_send_wr->mad_list.cqe.done = ib_mad_send_done; > + mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe; > > mad_agent = mad_send_wr->send_buf.mad_agent; > sge = mad_send_wr->sg_list; > @@ -2185,13 +2179,14 @@ handle_smi(struct ib_mad_port_private *port_priv, > return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response); > } > > -static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv, > - struct ib_wc *wc) > +static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc) > { > + struct ib_mad_port_private *port_priv = cq->cq_context; > + struct ib_mad_list_head *mad_list = > + container_of(wc->wr_cqe, struct ib_mad_list_head, cqe); > struct ib_mad_qp_info *qp_info; > struct ib_mad_private_header *mad_priv_hdr; > struct ib_mad_private *recv, *response = NULL; > - struct ib_mad_list_head *mad_list; > struct ib_mad_agent_private *mad_agent; > int port_num; >
Re: [PATCH 1/2] IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
On Mon, Jan 04, 2016 at 02:15:58PM +0100, Christoph Hellwig wrote: > Stop abusing wr_id and just pass the parameter explicitly. > > Signed-off-by: Christoph Hellwig Reviewed-by: Ira Weiny > --- > drivers/infiniband/core/cm.c | 1 + > drivers/infiniband/core/mad.c | 18 ++ > drivers/infiniband/core/sa_query.c| 7 --- > drivers/infiniband/core/user_mad.c| 1 + > drivers/infiniband/ulp/srpt/ib_srpt.c | 1 + > include/rdma/ib_mad.h | 2 ++ > 6 files changed, 19 insertions(+), 11 deletions(-) > > diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c > index e3a95d1..ad3726d 100644 > --- a/drivers/infiniband/core/cm.c > +++ b/drivers/infiniband/core/cm.c > @@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum > ib_event_type event) > EXPORT_SYMBOL(ib_cm_notify); > > static void cm_recv_handler(struct ib_mad_agent *mad_agent, > + struct ib_mad_send_buf *send_buf, > struct ib_mad_recv_wc *mad_recv_wc) > { > struct cm_port *port = mad_agent->context; > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c > index d4d2a61..cbe232a 100644 > --- a/drivers/infiniband/core/mad.c > +++ b/drivers/infiniband/core/mad.c > @@ -693,7 +693,7 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info, > > atomic_inc(&mad_snoop_priv->refcount); > spin_unlock_irqrestore(&qp_info->snoop_lock, flags); > - mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, > + mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, NULL, > mad_recv_wc); > deref_snoop_agent(mad_snoop_priv); > spin_lock_irqsave(&qp_info->snoop_lock, flags); > @@ -1994,9 +1994,9 @@ static void ib_mad_complete_recv(struct > ib_mad_agent_private *mad_agent_priv, > /* user rmpp is in effect >* and this is an active RMPP MAD >*/ > - mad_recv_wc->wc->wr_id = 0; > - > mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, > -mad_recv_wc); > + mad_agent_priv->agent.recv_handler( > + &mad_agent_priv->agent, NULL, > + mad_recv_wc); > atomic_dec(&mad_agent_priv->refcount); > } 
else { > /* not user rmpp, revert to normal behavior and > @@ -2010,9 +2010,10 @@ static void ib_mad_complete_recv(struct > ib_mad_agent_private *mad_agent_priv, > spin_unlock_irqrestore(&mad_agent_priv->lock, flags); > > /* Defined behavior is to complete response before > request */ > - mad_recv_wc->wc->wr_id = (unsigned long) > &mad_send_wr->send_buf; > - > mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, > -mad_recv_wc); > + mad_agent_priv->agent.recv_handler( > + &mad_agent_priv->agent, > + &mad_send_wr->send_buf, > + mad_recv_wc); > atomic_dec(&mad_agent_priv->refcount); > > mad_send_wc.status = IB_WC_SUCCESS; > @@ -2021,7 +2022,7 @@ static void ib_mad_complete_recv(struct > ib_mad_agent_private *mad_agent_priv, > ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc); > } > } else { > - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, > + mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, NULL, > mad_recv_wc); > deref_mad_agent(mad_agent_priv); > } > @@ -2762,6 +2763,7 @@ static void local_completions(struct work_struct *work) > IB_MAD_SNOOP_RECVS); > recv_mad_agent->agent.recv_handler( > &recv_mad_agent->agent, > + &local->mad_send_wr->send_buf, > > &local->mad_priv->header.recv_wc); > spin_lock_irqsave(&recv_mad_agent->lock, flags); > atomic_dec(&recv_mad_agent->refcount); > diff --git a/drivers/infiniband/core/sa_query.c > b/drivers/infiniband/core/sa_query.c > index e364a42..1f91b6e 100644 > --- a/drivers/infiniband/core/sa_query.c > +++ b/drivers/infiniband/core/sa_query.c > @@ -1669,14 +1669,15 @@ static void send_handler(struct ib_mad_agent *agent,
re: iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
Hello Nicholas Bellinger, The patch b8d26b3be8b3: "iser-target: Add iSCSI Extensions for RDMA (iSER) target driver" from Mar 7, 2013, leads to the following static checker warning: drivers/infiniband/ulp/isert/ib_isert.c:423 isert_device_get() error: passing non negative 1 to ERR_PTR drivers/infiniband/ulp/isert/ib_isert.c 417 418 device->ib_device = cma_id->device; 419 ret = isert_create_device_ib_res(device); 420 if (ret) { 421 kfree(device); 422 mutex_unlock(&device_list_mutex); 423 return ERR_PTR(ret); The warning here is because isert_create_device_ib_res() returns either a negative error code, zero or one. The documentation is not clear what that means. AHAHAHAHAHAHAHAH. I joke. There is no documentation. Anyway, it's definitely a bug and it leads to a NULL dereference in the caller. 424 } 425 426 device->refcount++; 427 list_add_tail(&device->dev_node, &device_list); 428 isert_info("Created a new iser device %p refcount %d\n", 429 device, device->refcount); 430 mutex_unlock(&device_list_mutex); 431 432 return device; 433 } regards, dan carpenter
[PATCHv2 TRIVIAL] IB/core: ib_mad.h ib_mad_snoop_handler documentation fix
ib_mad_snoop_handler uses send_buf rather than send_wr Signed-off-by: Hal Rosenstock --- Change since v1: Fixed typo in patch description diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h index ec9b44d..2b3573d 100644 --- a/include/rdma/ib_mad.h +++ b/include/rdma/ib_mad.h @@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent *mad_agent, /** * ib_mad_snoop_handler - Callback handler for snooping sent MADs. * @mad_agent: MAD agent that snooped the MAD. - * @send_wr: Work request information on the sent MAD. + * @send_buf: send MAD data buffer. * @mad_send_wc: Work completion information on the sent MAD. Valid * only for snooping that occurs on a send completion. * - * Clients snooping MADs should not modify data referenced by the @send_wr + * Clients snooping MADs should not modify data referenced by the @send_buf * or @mad_send_wc. */ typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
Re: [PATCH v2] staging/rdma/hfi1: check for ARMED->ACTIVE transition in receive interrupt
On Mon, Jan 04, 2016 at 11:21:19AM -0500, Jubin John wrote: > From: Jim Snow > > } else { > + /* Auto activate link on non-SC15 packet receive */ > + if (unlikely(rcd->ppd->host_link_state == > + HLS_UP_ARMED)) > + if (set_armed_to_active(rcd, packet, dd)) > + goto bail; What is the advantage of double "if" over one "if"? Something like that + if (unlikely(rcd->ppd->host_link_state == HLS_UP_ARMED) && (set_armed_to_active(rcd, packet, dd)) + goto bail; > last = process_rcv_packet(&packet, thread); > } > > @@ -984,6 +1020,42 @@ bail: > } >
Re: [PATCH TRIVIAL] IB/core: ib_mad.h ib_mad_snoop_handler documentation fix
On Mon, Jan 04, 2016 at 11:04:53AM -0500, Hal Rosenstock wrote: > ib_mad_snoop_handler ues send_buf rather than send_wr ues --> uses
Re: start moving user space visible constants to uapi headers
On 12/24/2015 8:39 AM, Christoph Hellwig wrote: Currently very little of the uverbs user interface is actually exposed in uapi headers, and it's a constant struggle to figure out what's kernel internal and what is actually exposed in public. This series starts sorting this out by creating the infrastructure for a uapi header shared between uverbs and the core IB stack, and starts moving all WR and WC constants as well as the device capability flags there. A lot more work will have to follow, and I hope others will help out as well. Series looks ok to me. Reviewed-by: Steve Wise
Re: [PATCH] svc_rdma: use local_dma_lkey
On 12/22/2015 7:11 AM, Christoph Hellwig wrote: We now always have a per-PD local_dma_lkey available. Make use of that fact in svc_rdma and stop registering our own MR. Signed-off-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Reviewed-by: Jason Gunthorpe Reviewed-by: Chuck Lever --- include/linux/sunrpc/svc_rdma.h| 2 -- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +- net/sunrpc/xprtrdma/svc_rdma_recvfrom.c| 4 ++-- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 6 ++--- net/sunrpc/xprtrdma/svc_rdma_transport.c | 36 -- 5 files changed, 10 insertions(+), 40 deletions(-) Reviewed-by: Steve Wise
[PATCH v2] staging/rdma/hfi1: check for ARMED->ACTIVE transition in receive interrupt
From: Jim Snow The link state will transition from ARMED to ACTIVE when a non-SC15 packet arrives, but the driver might not notice the change. With this fix, if the slowpath receive interrupt handler sees a non-SC15 packet while in the ARMED state, we queue work to call linkstate_active_work from process context to promote it to ACTIVE. Reviewed-by: Dean Luick Reviewed-by: Ira Weiny Reviewed-by: Mike Marciniszyn Signed-off-by: Jim Snow Signed-off-by: Brendan Cunningham Signed-off-by: Jubin John --- Changes in v2: - Fixed whitespace - Converted armed->active transition to inline function - Added comment to document reason for skipping HFI1_CTRL_CTXT in set_all_slowpath() drivers/staging/rdma/hfi1/chip.c | 5 +-- drivers/staging/rdma/hfi1/chip.h | 2 ++ drivers/staging/rdma/hfi1/driver.c | 72 ++ drivers/staging/rdma/hfi1/hfi.h| 11 ++ drivers/staging/rdma/hfi1/init.c | 1 + 5 files changed, 89 insertions(+), 2 deletions(-) diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c index f7bf902..63d5d71 100644 --- a/drivers/staging/rdma/hfi1/chip.c +++ b/drivers/staging/rdma/hfi1/chip.c @@ -7878,7 +7878,7 @@ static inline void clear_recv_intr(struct hfi1_ctxtdata *rcd) } /* force the receive interrupt */ -static inline void force_recv_intr(struct hfi1_ctxtdata *rcd) +void force_recv_intr(struct hfi1_ctxtdata *rcd) { write_csr(rcd->dd, CCE_INT_FORCE + (8 * rcd->ireg), rcd->imask); } @@ -7977,7 +7977,7 @@ u32 read_physical_state(struct hfi1_devdata *dd) & DC_DC8051_STS_CUR_STATE_PORT_MASK; } -static u32 read_logical_state(struct hfi1_devdata *dd) +u32 read_logical_state(struct hfi1_devdata *dd) { u64 reg; @@ -9952,6 +9952,7 @@ int set_link_state(struct hfi1_pportdata *ppd, u32 state) ppd->link_enabled = 1; } + set_all_slowpath(ppd->dd); ret = set_local_link_attributes(ppd); if (ret) break; diff --git a/drivers/staging/rdma/hfi1/chip.h b/drivers/staging/rdma/hfi1/chip.h index b46ef66..78ba425 100644 --- a/drivers/staging/rdma/hfi1/chip.h +++ 
b/drivers/staging/rdma/hfi1/chip.h @@ -690,6 +690,8 @@ u64 read_dev_cntr(struct hfi1_devdata *dd, int index, int vl); u64 write_dev_cntr(struct hfi1_devdata *dd, int index, int vl, u64 data); u64 read_port_cntr(struct hfi1_pportdata *ppd, int index, int vl); u64 write_port_cntr(struct hfi1_pportdata *ppd, int index, int vl, u64 data); +u32 read_logical_state(struct hfi1_devdata *dd); +void force_recv_intr(struct hfi1_ctxtdata *rcd); /* Per VL indexes */ enum { diff --git a/drivers/staging/rdma/hfi1/driver.c b/drivers/staging/rdma/hfi1/driver.c index 3218520..dd8b2c5 100644 --- a/drivers/staging/rdma/hfi1/driver.c +++ b/drivers/staging/rdma/hfi1/driver.c @@ -862,6 +862,37 @@ static inline void set_all_dma_rtail(struct hfi1_devdata *dd) &handle_receive_interrupt_dma_rtail; } +void set_all_slowpath(struct hfi1_devdata *dd) +{ + int i; + + /* HFI1_CTRL_CTXT must always use the slow path interrupt handler */ + for (i = HFI1_CTRL_CTXT + 1; i < dd->first_user_ctxt; i++) + dd->rcd[i]->do_interrupt = &handle_receive_interrupt; +} + +static inline int set_armed_to_active(struct hfi1_ctxtdata *rcd, + struct hfi1_packet packet, + struct hfi1_devdata *dd) +{ + struct work_struct *lsaw = &rcd->ppd->linkstate_active_work; + struct hfi1_message_header *hdr = hfi1_get_msgheader(packet.rcd->dd, +packet.rhf_addr); + + if (hdr2sc(hdr, packet.rhf) != 0xf) { + int hwstate = read_logical_state(dd); + + if (hwstate != LSTATE_ACTIVE) { + dd_dev_info(dd, "Unexpected link state %d\n", hwstate); + return 0; + } + + queue_work(rcd->ppd->hfi1_wq, lsaw); + return 1; + } + return 0; +} + /* * handle_receive_interrupt - receive a packet * @rcd: the context @@ -929,6 +960,11 @@ int handle_receive_interrupt(struct hfi1_ctxtdata *rcd, int thread) last = skip_rcv_packet(&packet, thread); skip_pkt = 0; } else { + /* Auto activate link on non-SC15 packet receive */ + if (unlikely(rcd->ppd->host_link_state == +HLS_UP_ARMED)) + if (set_armed_to_active(rcd, packet, dd)) + goto bail; last = 
process_rcv_packet(&packet, thread); } @@ -984,6 +1020,42 @@ bail: } /* + * We may discover in the interrupt that the hardware link
Re: [PATCH 2/2] IB/mad: use CQ abstraction
On 1/4/2016 9:16 AM, Christoph Hellwig wrote: > Remove the local workqueue to process mad completions and use the CQ API > instead. > > Signed-off-by: Christoph Hellwig Reviewed-by: Hal Rosenstock
[PATCH TRIVIAL] IB/core: ib_mad.h ib_mad_snoop_handler documentation fix
ib_mad_snoop_handler ues send_buf rather than send_wr Signed-off-by: Hal Rosenstock --- diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h index ec9b44d..2b3573d 100644 --- a/include/rdma/ib_mad.h +++ b/include/rdma/ib_mad.h @@ -424,11 +424,11 @@ typedef void (*ib_mad_send_handler)(struct ib_mad_agent *mad_agent, /** * ib_mad_snoop_handler - Callback handler for snooping sent MADs. * @mad_agent: MAD agent that snooped the MAD. - * @send_wr: Work request information on the sent MAD. + * @send_buf: send MAD data buffer. * @mad_send_wc: Work completion information on the sent MAD. Valid * only for snooping that occurs on a send completion. * - * Clients snooping MADs should not modify data referenced by the @send_wr + * Clients snooping MADs should not modify data referenced by the @send_buf * or @mad_send_wc. */ typedef void (*ib_mad_snoop_handler)(struct ib_mad_agent *mad_agent,
RE: [PATCH] staging: rdma: hfi1: diag: constify hfi1_filter_array structure
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On > Behalf Of Julia Lawall > Subject: [PATCH] staging: rdma: hfi1: diag: constify hfi1_filter_array > structure > > The hfi1_filter_array structure is never modified, so declare it as const. > > Done with the help of Coccinelle. > > Signed-off-by: Julia Lawall > Thanks for the patch! Acked-by: Mike Marciniszyn
Re: [PATCH] IB/sysfs: Fix sparse warning on attr_id
On Sun, 3 Jan 2016, ira.we...@intel.com wrote: > Attributed ID was declared as an int while the value should really be big > endian 16. Reviewed-by: Christoph Lameter
Re: [RFC contig pages support 1/2] IB: Supports contiguous memory operations
[Sorry for resending, forgot to CC Minchan] On 12/23/2015 05:30 PM, Shachar Raindel wrote: >>> >>> I completely agree, and this RFC was sent in order to start discussion >>> on this subject. >>> >>> Dear MM people, can you please advise on the subject? >>> >>> Multiple HW vendors, from different fields, ranging between embedded >> SoC >>> devices (TI) and HPC (Mellanox) are looking for a solution to allocate >>> blocks of contiguous memory to user space applications, without using >> huge >>> pages. >>> >>> What should be the API to expose such feature? >>> >>> Should we create a virtual FS that allows the user to create "files" >>> representing memory allocations, and define the contiguous level we >>> attempt to allocate using folders (similar to hugetlbfs)? >>> >>> Should we patch hugetlbfs to allow allocation of contiguous memory >> chunks, >>> without creating larger memory mapping in the CPU page tables? >>> >>> Should we create a special "allocator" virtual device, that will hand >> out >>> memory in contiguous chunks via a call to mmap with an FD connected to >> the >>> device? >> >> How much memory do you assume to be used like this? > > Depends on the use case. Most likely several MBs/core, used for interfacing > with the HW (packet rings, frame buffers, etc.). > > Some applications might want to perform calculations in such memory, to > optimize communication time, especially in the HPC market. OK. > >> Is this memory >> supposed to be swappable, migratable, etc? I.e. on LRU lists? > > Most likely not. In many of the relevant applications (embedded, HPC), > there is no swap and the application threads are pinned to specific cores > and NUMA nodes. > The biggest pain here is that these memory pages will not be eligible for > compaction, making it harder to handle fragmentations and CMA allocation > requests. 
There was a patch set to enable compaction on such pages, see https://lwn.net/Articles/650917/ Minchan was going to pick this after Gioh left, and then it should be possible. But it requires careful driver-specific cooperation, i.e. when a page can be isolated for the migration, see http://article.gmane.org/gmane.linux.kernel.mm/136457 >> Allocating a lot of memory (e.g. most of userspace memory) that's not >> LRU wouldn't be nice. But LRU operations are not prepared to work with >> such non-standard-sized allocations, regardless of what API you use. So >> I think that's the more fundamental question here. > > I agree that there are fundamental questions here. > > That being said, there is a clear need for an API allowing > allocation, to the user space, limited size of memory that > is composed of large contiguous blocks. > > What will be the best way to implement such solution? Given the likely driver-specific constraints/handling of the page migration, I'm not sure if some completely universal API is feasible. Maybe some reusable parts of the functionality in the patch in this thread could be provided by mm. > Thanks, > --Shachar
Re: [PATCH 1/2] IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
On 1/4/2016 8:15 AM, Christoph Hellwig wrote:
> Stop abusing wr_id and just pass the parameter explicitly.
>
> Signed-off-by: Christoph Hellwig

Reviewed-by: Hal Rosenstock
[PATCH 2/2] IB/mad: use CQ abstraction
Remove the local workqueue to process mad completions and use the CQ API instead. Signed-off-by: Christoph Hellwig --- drivers/infiniband/core/mad.c | 159 + drivers/infiniband/core/mad_priv.h | 2 +- 2 files changed, 58 insertions(+), 103 deletions(-) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index cbe232a..286d1a9 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -61,18 +61,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests module_param_named(recv_queue_size, mad_recvq_size, int, 0444); MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests"); -/* - * Define a limit on the number of completions which will be processed by the - * worker thread in a single work item. This ensures that other work items - * (potentially from other users) are processed fairly. - * - * The number of completions was derived from the default queue sizes above. - * We use a value which is double the larger of the 2 queues (receive @ 512) - * but keep it fixed such that an increase in that value does not introduce - * unfairness. 
- */ -#define MAD_COMPLETION_PROC_LIMIT 1024 - static struct list_head ib_mad_port_list; static u32 ib_mad_client_id = 0; @@ -96,6 +84,9 @@ static int add_nonoui_reg_req(struct ib_mad_reg_req *mad_reg_req, u8 mgmt_class); static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req, struct ib_mad_agent_private *agent_priv); +static bool ib_mad_send_error(struct ib_mad_port_private *port_priv, + struct ib_wc *wc); +static void ib_mad_send_done(struct ib_cq *cq, struct ib_wc *wc); /* * Returns a ib_mad_port_private structure or NULL for a device/port @@ -702,11 +693,11 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info, } static void build_smp_wc(struct ib_qp *qp, -u64 wr_id, u16 slid, u16 pkey_index, u8 port_num, +void *wr_cqe, u16 slid, u16 pkey_index, u8 port_num, struct ib_wc *wc) { memset(wc, 0, sizeof *wc); - wc->wr_id = wr_id; + wc->wr_cqe = wr_cqe; wc->status = IB_WC_SUCCESS; wc->opcode = IB_WC_RECV; wc->pkey_index = pkey_index; @@ -844,7 +835,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv, } build_smp_wc(mad_agent_priv->agent.qp, -send_wr->wr.wr_id, drslid, +send_wr->wr.wr_cqe, drslid, send_wr->pkey_index, send_wr->port_num, &mad_wc); @@ -1051,7 +1042,9 @@ struct ib_mad_send_buf * ib_create_send_mad(struct ib_mad_agent *mad_agent, mad_send_wr->sg_list[1].lkey = mad_agent->qp->pd->local_dma_lkey; - mad_send_wr->send_wr.wr.wr_id = (unsigned long) mad_send_wr; + mad_send_wr->mad_list.cqe.done = ib_mad_send_done; + + mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe; mad_send_wr->send_wr.wr.sg_list = mad_send_wr->sg_list; mad_send_wr->send_wr.wr.num_sge = 2; mad_send_wr->send_wr.wr.opcode = IB_WR_SEND; @@ -1163,8 +1156,9 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr) /* Set WR ID to find mad_send_wr upon completion */ qp_info = mad_send_wr->mad_agent_priv->qp_info; - mad_send_wr->send_wr.wr.wr_id = (unsigned long)&mad_send_wr->mad_list; mad_send_wr->mad_list.mad_queue = &qp_info->send_queue; + 
mad_send_wr->mad_list.cqe.done = ib_mad_send_done; + mad_send_wr->send_wr.wr.wr_cqe = &mad_send_wr->mad_list.cqe; mad_agent = mad_send_wr->send_buf.mad_agent; sge = mad_send_wr->sg_list; @@ -2185,13 +2179,14 @@ handle_smi(struct ib_mad_port_private *port_priv, return handle_ib_smi(port_priv, qp_info, wc, port_num, recv, response); } -static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv, -struct ib_wc *wc) +static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc) { + struct ib_mad_port_private *port_priv = cq->cq_context; + struct ib_mad_list_head *mad_list = + container_of(wc->wr_cqe, struct ib_mad_list_head, cqe); struct ib_mad_qp_info *qp_info; struct ib_mad_private_header *mad_priv_hdr; struct ib_mad_private *recv, *response = NULL; - struct ib_mad_list_head *mad_list; struct ib_mad_agent_private *mad_agent; int port_num; int ret = IB_MAD_RESULT_SUCCESS; @@ -2199,7 +2194,17 @@ static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv, u16 resp_mad_pkey_index = 0; bool opa; - mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id; + if (list_empty_careful(&port_priv->port_list)) + return; + + if (wc->sta
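The core of this conversion is the `wr_cqe`/`done` pattern: instead of decoding an integer `wr_id`, the poller calls a completion callback embedded in the posted request and recovers the containing structure with `container_of()`. The following is a minimal userspace sketch of that dispatch pattern, not the kernel API; the structure and function names only mirror the ones in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Each posted work request embeds a cqe carrying its completion callback. */
struct cqe {
    void (*done)(struct cqe *cqe);
};

struct mad_list_head {
    struct cqe cqe;
    int completed; /* stands in for the real completion work */
};

static void mad_send_done(struct cqe *cqe)
{
    /* Recover the enclosing structure from the embedded cqe. */
    struct mad_list_head *mad_list =
        container_of(cqe, struct mad_list_head, cqe);
    mad_list->completed = 1;
}

/* The CQ core only ever sees the embedded cqe and invokes its callback;
 * no cast from an opaque wr_id is needed anywhere. */
static void cq_dispatch(struct cqe *wr_cqe)
{
    wr_cqe->done(wr_cqe);
}
```

Because the callback is typed and the container is recovered structurally, the per-consumer completion loop (and the fairness limit it needed) disappears from mad.c.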
[PATCH 1/2] IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
Stop abusing wr_id and just pass the parameter explicitly. Signed-off-by: Christoph Hellwig --- drivers/infiniband/core/cm.c | 1 + drivers/infiniband/core/mad.c | 18 ++ drivers/infiniband/core/sa_query.c| 7 --- drivers/infiniband/core/user_mad.c| 1 + drivers/infiniband/ulp/srpt/ib_srpt.c | 1 + include/rdma/ib_mad.h | 2 ++ 6 files changed, 19 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e3a95d1..ad3726d 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3503,6 +3503,7 @@ int ib_cm_notify(struct ib_cm_id *cm_id, enum ib_event_type event) EXPORT_SYMBOL(ib_cm_notify); static void cm_recv_handler(struct ib_mad_agent *mad_agent, + struct ib_mad_send_buf *send_buf, struct ib_mad_recv_wc *mad_recv_wc) { struct cm_port *port = mad_agent->context; diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index d4d2a61..cbe232a 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -693,7 +693,7 @@ static void snoop_recv(struct ib_mad_qp_info *qp_info, atomic_inc(&mad_snoop_priv->refcount); spin_unlock_irqrestore(&qp_info->snoop_lock, flags); - mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, + mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, NULL, mad_recv_wc); deref_snoop_agent(mad_snoop_priv); spin_lock_irqsave(&qp_info->snoop_lock, flags); @@ -1994,9 +1994,9 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv, /* user rmpp is in effect * and this is an active RMPP MAD */ - mad_recv_wc->wc->wr_id = 0; - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, - mad_recv_wc); + mad_agent_priv->agent.recv_handler( + &mad_agent_priv->agent, NULL, + mad_recv_wc); atomic_dec(&mad_agent_priv->refcount); } else { /* not user rmpp, revert to normal behavior and @@ -2010,9 +2010,10 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv, 
spin_unlock_irqrestore(&mad_agent_priv->lock, flags); /* Defined behavior is to complete response before request */ - mad_recv_wc->wc->wr_id = (unsigned long) &mad_send_wr->send_buf; - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, - mad_recv_wc); + mad_agent_priv->agent.recv_handler( + &mad_agent_priv->agent, + &mad_send_wr->send_buf, + mad_recv_wc); atomic_dec(&mad_agent_priv->refcount); mad_send_wc.status = IB_WC_SUCCESS; @@ -2021,7 +2022,7 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv, ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc); } } else { - mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, + mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, NULL, mad_recv_wc); deref_mad_agent(mad_agent_priv); } @@ -2762,6 +2763,7 @@ static void local_completions(struct work_struct *work) IB_MAD_SNOOP_RECVS); recv_mad_agent->agent.recv_handler( &recv_mad_agent->agent, + &local->mad_send_wr->send_buf, &local->mad_priv->header.recv_wc); spin_lock_irqsave(&recv_mad_agent->lock, flags); atomic_dec(&recv_mad_agent->refcount); diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index e364a42..1f91b6e 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -1669,14 +1669,15 @@ static void send_handler(struct ib_mad_agent *agent, } static void recv_handler(struct ib_mad_agent *mad_agent, +struct ib_mad_send_buf *send_buf, struct ib_mad_recv_wc *mad_recv_wc) {
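The change above can be reduced to one idea: a pointer smuggled through the integer `wr_id` field becomes an explicit, typed parameter of the handler. Below is a hedged userspace sketch of the before/after contract; the structures are illustrative miniatures, not the kernel's `ib_mad` types.

```c
#include <assert.h>
#include <stdint.h>

struct send_buf { int id; };

/* Old style: the producer stores a pointer in an integer field and the
 * handler casts it back, with no type checking anywhere. */
struct wc_old { uint64_t wr_id; };

static struct send_buf *handler_old(struct wc_old *wc)
{
    return (struct send_buf *)(uintptr_t)wc->wr_id; /* fragile cast */
}

/* New style: the completion path passes the buffer explicitly, so the
 * handler signature documents the contract; NULL means "no matching
 * send", replacing the old wr_id = 0 convention. */
static struct send_buf *handler_new(struct send_buf *send_buf)
{
    return send_buf;
}
```

This is why the patch can delete both `mad_recv_wc->wc->wr_id = 0;` and the `(unsigned long)` cast: the information now travels through the extra `send_buf` argument.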
convert mad to the new CQ API
This series converts the MAD handler to the new CQ API, ensuring fairness in completion processing instead of starving other processes, while also greatly simplifying the code.
[PATCH V1 for-next 1/2] IB/core: Rename rdma_addr_find_dmac_by_grh
rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index, and a downstream patch will also add hop_limit as an output parameter; thus we rename it to rdma_addr_find_l2_eth_by_grh. Signed-off-by: Matan Barak --- drivers/infiniband/core/addr.c | 7 --- drivers/infiniband/core/verbs.c | 18 +- drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 6 +++--- include/rdma/ib_addr.h | 5 +++-- 4 files changed, 19 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 0b5f245..ce3c68e 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -540,8 +540,9 @@ static void resolve_cb(int status, struct sockaddr *src_addr, complete(&((struct resolve_cb_context *)context)->comp); } -int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgid, - u8 *dmac, u16 *vlan_id, int *if_index) +int rdma_addr_find_l2_eth_by_grh(const union ib_gid *sgid, +const union ib_gid *dgid, +u8 *dmac, u16 *vlan_id, int *if_index) { int ret = 0; struct rdma_dev_addr dev_addr; @@ -583,7 +584,7 @@ int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgi dev_put(dev); return ret; } -EXPORT_SYMBOL(rdma_addr_find_dmac_by_grh); +EXPORT_SYMBOL(rdma_addr_find_l2_eth_by_grh); int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id) { diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 072b94d..66eb498 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -467,11 +467,11 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, if (!idev) return -ENODEV; - ret = rdma_addr_find_dmac_by_grh(&dgid, &sgid, -ah_attr->dmac, -wc->wc_flags & IB_WC_WITH_VLAN ? -NULL : &vlan_id, -&if_index); + ret = rdma_addr_find_l2_eth_by_grh(&dgid, &sgid, + ah_attr->dmac, + wc->wc_flags & IB_WC_WITH_VLAN ?
+ NULL : &vlan_id, + &if_index); if (ret) { dev_put(idev); return ret; @@ -1158,10 +1158,10 @@ int ib_resolve_eth_dmac(struct ib_qp *qp, ifindex = sgid_attr.ndev->ifindex; - ret = rdma_addr_find_dmac_by_grh(&sgid, - &qp_attr->ah_attr.grh.dgid, -qp_attr->ah_attr.dmac, -NULL, &ifindex); + ret = rdma_addr_find_l2_eth_by_grh(&sgid, + &qp_attr->ah_attr.grh.dgid, + qp_attr->ah_attr.dmac, + NULL, &ifindex); dev_put(sgid_attr.ndev); } diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c index a343e03..850e0d1 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c @@ -152,9 +152,9 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr) if ((pd->uctx) && (!rdma_is_multicast_addr((struct in6_addr *)attr->grh.dgid.raw)) && (!rdma_link_local_addr((struct in6_addr *)attr->grh.dgid.raw))) { - status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid, - attr->dmac, &vlan_tag, - &sgid_attr.ndev->ifindex); + status = rdma_addr_find_l2_eth_by_grh(&sgid, &attr->grh.dgid, + attr->dmac, &vlan_tag, + &sgid_attr.ndev->ifindex); if (status) { pr_err("%s(): Failed to resolve dmac from gid." "status = %d\n", __func__, status); diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index 87156dc..73fd088 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -130,8 +130,9 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, int rdma_addr_size(struct sockaddr *addr); int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *v
[PATCH V1 for-next 0/2] Fix hop-limit for RoCE
Hi Doug, Previously, the hop limit of RoCE packets was set to IPV6_DEFAULT_HOPLIMIT. This generally works, but the RoCE stack needs to follow the IP stack rules. Therefore, this patch series uses ip4_dst_hoplimit and ip6_dst_hoplimit in order to set the correct hop limit for RoCE traffic. The first patch renames rdma_addr_find_dmac_by_grh to rdma_addr_find_l2_eth_by_grh while the second one does the actual change. Regards, Matan Changes from V0: - Hop limit in IB when using reversible path should be 0xff. Matan Barak (2): IB/core: Rename rdma_addr_find_dmac_by_grh IB/core: Use hop-limit from IP stack for RoCE drivers/infiniband/core/addr.c | 14 +++--- drivers/infiniband/core/cm.c | 1 + drivers/infiniband/core/cma.c| 12 +--- drivers/infiniband/core/verbs.c | 30 ++ drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 7 --- include/rdma/ib_addr.h | 7 +-- 6 files changed, 40 insertions(+), 31 deletions(-) -- 2.1.0
[PATCH V1 for-next 2/2] IB/core: Use hop-limit from IP stack for RoCE
Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for RoCE. Fixing that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as hop limit values. Signed-off-by: Matan Barak --- drivers/infiniband/core/addr.c | 9 - drivers/infiniband/core/cm.c | 1 + drivers/infiniband/core/cma.c| 12 +--- drivers/infiniband/core/verbs.c | 16 +++- drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 3 ++- include/rdma/ib_addr.h | 4 +++- 6 files changed, 26 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index ce3c68e..f924d90 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -252,6 +252,8 @@ static int addr4_resolve(struct sockaddr_in *src_in, if (rt->rt_uses_gateway) addr->network = RDMA_NETWORK_IPV4; + addr->hoplimit = ip4_dst_hoplimit(&rt->dst); + *prt = rt; return 0; out: @@ -295,6 +297,8 @@ static int addr6_resolve(struct sockaddr_in6 *src_in, if (rt->rt6i_flags & RTF_GATEWAY) addr->network = RDMA_NETWORK_IPV6; + addr->hoplimit = ip6_dst_hoplimit(dst); + *pdst = dst; return 0; put: @@ -542,7 +546,8 @@ static void resolve_cb(int status, struct sockaddr *src_addr, int rdma_addr_find_l2_eth_by_grh(const union ib_gid *sgid, const union ib_gid *dgid, -u8 *dmac, u16 *vlan_id, int *if_index) +u8 *dmac, u16 *vlan_id, int *if_index, +int *hoplimit) { int ret = 0; struct rdma_dev_addr dev_addr; @@ -581,6 +586,8 @@ int rdma_addr_find_l2_eth_by_grh(const union ib_gid *sgid, *if_index = dev_addr.bound_dev_if; if (vlan_id) *vlan_id = rdma_vlan_dev_vlan_id(dev); + if (hoplimit) + *hoplimit = dev_addr.hoplimit; dev_put(dev); return ret; } diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index e3a95d1..cd3d345 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -1641,6 +1641,7 @@ static int cm_req_handler(struct cm_work *work) cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]); memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, 
ETH_ALEN); + work->path[0].hop_limit = cm_id_priv->av.ah_attr.grh.hop_limit; ret = ib_get_cached_gid(work->port->cm_dev->ib_device, work->port->port_num, cm_id_priv->av.ah_attr.grh.sgid_index, diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 559ee3d..66983da 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2424,7 +2424,6 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) { struct rdma_route *route = &id_priv->id.route; struct rdma_addr *addr = &route->addr; - enum ib_gid_type network_gid_type; struct cma_work *work; int ret; struct net_device *ndev = NULL; @@ -2478,14 +2477,13 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) &route->path_rec->dgid); /* Use the hint from IP Stack to select GID Type */ - network_gid_type = ib_network_to_gid_type(addr->dev_addr.network); - if (addr->dev_addr.network != RDMA_NETWORK_IB) { - route->path_rec->gid_type = network_gid_type; + if (route->path_rec->gid_type < ib_network_to_gid_type(addr->dev_addr.network)) + route->path_rec->gid_type = ib_network_to_gid_type(addr->dev_addr.network); + if (((struct sockaddr *)&id_priv->id.route.addr.dst_addr)->sa_family != AF_IB) /* TODO: get the hoplimit from the inet/inet6 device */ - route->path_rec->hop_limit = IPV6_DEFAULT_HOPLIMIT; - } else { + route->path_rec->hop_limit = addr->dev_addr.hoplimit; + else route->path_rec->hop_limit = 1; - } route->path_rec->reversible = 1; route->path_rec->pkey = cpu_to_be16(0x); route->path_rec->mtu_selector = IB_SA_EQ; diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 66eb498..b1998bc 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -434,6 +434,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, int ret; enum rdma_network_type net_type = RDMA_NETWORK_IB; enum ib_gid_type gid_type = IB_GID_TYPE_IB; + int hoplimit = 0xff; union ib_gid dgid; union ib_gid sgid; @@ 
-471,7 +472,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, ah_attr->dmac,
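The cma.c hunk in this patch boils down to a single selection rule: IP-based transports (RoCE) take the hop limit resolved from the routing table, while native IB address families keep the fixed value of 1 used for path records. A small sketch of that decision, under the assumption that `AF_IB_SKETCH` stands in for the real `AF_IB` constant:

```c
#include <assert.h>

/* Hypothetical stand-in; the real code compares sa_family against AF_IB. */
#define AF_IB_SKETCH 27

static int choose_hop_limit(int sa_family, int stack_hoplimit)
{
    if (sa_family != AF_IB_SKETCH)
        /* RoCE: value resolved by ip4_dst_hoplimit()/ip6_dst_hoplimit() */
        return stack_hoplimit;
    /* Native IB path records keep hop_limit 1, as before this series. */
    return 1;
}
```

This replaces the old behavior where RoCE unconditionally used IPV6_DEFAULT_HOPLIMIT regardless of what the IP stack would have chosen for the same destination.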
Re: [PATCH for-next 2/2] IB/core: Use hop-limit from IP stack for RoCE
On Sun, Jan 3, 2016 at 9:03 PM, Jason Gunthorpe wrote:
> On Sun, Jan 03, 2016 at 03:59:11PM +0200, Matan Barak wrote:
>> @@ -434,6 +434,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
>>  int ret;
>>  enum rdma_network_type net_type = RDMA_NETWORK_IB;
>>  enum ib_gid_type gid_type = IB_GID_TYPE_IB;
>> + int hoplimit = grh->hop_limit;
>
>>  ah_attr->grh.flow_label = flow_class & 0xF;
>> - ah_attr->grh.hop_limit = 0xFF;
>> + ah_attr->grh.hop_limit = hoplimit;
>
> No, this is wrong for IB. Please be careful to follow the IB
> specification language for computing a hop limit on a reversible path.

You're right, this should be 0xff. Thanks.

> No idea about rocee, but I can't believe using grh->hop_limit is right
> there either.

Regarding RoCE, the hop limit is set from the routing table (in the same function):

ret = rdma_addr_find_l2_eth_by_grh(&dgid, &sgid, ah_attr->dmac,
                                   wc->wc_flags & IB_WC_WITH_VLAN ?
                                   NULL : &vlan_id,
                                   &if_index, &hoplimit);

> Jason

Regards,
Matan
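The outcome of this review thread is a two-way rule in ib_init_ah_from_wc(): native IB keeps the spec-mandated 0xff hop limit for reversible paths, and only RoCE overrides it with the routing-table value. A hedged userspace sketch of that agreed behavior (the enum is illustrative, not a kernel type):

```c
#include <assert.h>

enum transport_sketch { NET_IB, NET_ROCE };

static int ah_hop_limit(enum transport_sketch transport, int resolved_hoplimit)
{
    if (transport == NET_IB)
        return 0xff; /* IB reversible path: maximum hop limit per the spec */
    /* RoCE: value filled in by rdma_addr_find_l2_eth_by_grh() from the
     * IP routing table, per the discussion above. */
    return resolved_hoplimit;
}
```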
Re: [PATCH] IB/sysfs: Fix sparse warning on attr_id
On 01/04/2016 04:44 AM, ira.we...@intel.com wrote: From: Ira Weiny Attribute ID was declared as an int while it should really be a big-endian 16-bit value. Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c") Reported-by: Bart Van Assche Signed-off-by: Ira Weiny Reviewed-by: Bart Van Assche