Re: [ewg] [PATCH] pkey fix for ipoib - resubmission
Jason Gunthorpe wrote:
> OFED works on kernels that have compiled-in inline'd multicast map functions that do not include the pkey copy, while mainline's multicast map functions do. So to work around this there is a bit of code in OFED to overwrite the pkey in the multicast hw address. This means on OFED with those kernels ip maddr returns the wrong hw address sometimes.

Okay, got it. Anyway, since this is neither the essence of the patch nor of the discussion here, I would wait to hear what Todd and Mike think about your suggestion to apply the approach taken for the bonding problem and solution.

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Name for a new type of QP
Hi,

This message follows a discussion on the EWG mailing list.

We want to promote a patch that enables the use of a new QP type. This QP type lets the user post_send() data to its SQ and have it treated as the entire packet, including headers. An example use of this QP is sending Ethernet packets from userspace (and enjoying kernel bypass).

An open question is what we should call this QP type. The first name, IBV_QPT_RAW_ETH, seems too similar to the existing type IBV_QPT_RAW_ETY. My suggestions (posted in a different thread) are:

  IBV_QPT_FRAME
  IBV_QPT_PACKET
  IBV_QPT_NOHDR

Please comment and send your own suggestions. Once we decide on a name we will send a patch that enables the use of this QP type.

thanks
Moni
RE: Name for a new type of QP
I would prefer the name IBV_QPT_FRAME, since it is an L2-layer QP. "Packet" is reserved for L3.

Regards,
Mirek

-----Original Message-----
From: Moni Shoua [mailto:mo...@voltaire.com]
Sent: Wednesday, June 23, 2010 11:20 AM
To: linux-rdma
Cc: Walukiewicz, Miroslaw; Roland Dreier; al...@voltaire.com
Subject: Name for a new type of QP
RE: Name for a new type of QP
It is true that on the send path for such a QP you can send any sort of packet. The user of the QP is required to build the MAC header, and the NIC will wrap the post_send() data in an Ethernet type II frame, which includes the trailing CRC.

On the receive path this QP assumes a MAC header. The GID value passed to QP attach carries the multicast MAC destination address, so that the correct ingress packets are caught.

Bottom line: I think IBV_QPT_RAW_ETH is a good name. If not, I would recommend IBV_QPT_RAW_PACKET, which resembles socket(AF_PACKET, SOCK_RAW, ...).

On the other hand, I never fully understood what IBV_QPT_RAW_ETY stands for. Maybe we should change its name to better represent what the code does.

Alex Rosenbaum

-----Original Message-----
From: Walukiewicz, Miroslaw [mailto:miroslaw.walukiew...@intel.com]
Sent: Wednesday, June 23, 2010 1:21 PM
To: Moni Shoua; linux-rdma
Cc: Roland Dreier; Alex Rosenbaum
Subject: RE: Name for a new type of QP
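As an illustration of the send-path description above ("the user of the QP is required to build the MAC header"), here is a minimal sketch in plain C of laying out an Ethernet II header in front of a payload. The helper name `build_eth_frame` and the `*_SKETCH` constants are hypothetical and are not part of libibverbs; the trailing CRC is left out because the message says the NIC adds it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH 6   /* bytes in a MAC address */
#define ETH_HLEN_SKETCH 14  /* dst MAC + src MAC + EtherType */

/*
 * Hypothetical helper: write an Ethernet II header followed by the
 * payload into buf. Returns the frame length without the trailing CRC,
 * or 0 if buf is too small.
 */
static size_t build_eth_frame(uint8_t *buf, size_t buf_len,
                              const uint8_t dst[ETH_ALEN_SKETCH],
                              const uint8_t src[ETH_ALEN_SKETCH],
                              uint16_t ethertype,
                              const void *payload, size_t payload_len)
{
    if (buf_len < ETH_HLEN_SKETCH + payload_len)
        return 0;

    memcpy(buf, dst, ETH_ALEN_SKETCH);
    memcpy(buf + ETH_ALEN_SKETCH, src, ETH_ALEN_SKETCH);
    /* EtherType goes on the wire in network (big-endian) byte order. */
    buf[12] = (uint8_t)(ethertype >> 8);
    buf[13] = (uint8_t)(ethertype & 0xff);
    memcpy(buf + ETH_HLEN_SKETCH, payload, payload_len);

    return ETH_HLEN_SKETCH + payload_len;
}
```

A buffer filled this way is the kind of "entire packet, including headers" that the proposed QP type would hand to the NIC unchanged.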
[PATCH 1/3] RDMA/cxgb4: derive smac_idx from port viid.
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index f75108f..8c9b483 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1373,7 +1373,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 					pdev, 0);
 		mtu = pdev->mtu;
 		tx_chan = cxgb4_port_chan(pdev);
-		smac_idx = tx_chan << 1;
+		smac_idx = (cxgb4_port_viid(pdev) & 0x7F) << 1;
 		step = dev->rdev.lldi.ntxq / dev->rdev.lldi.nchan;
 		txq_idx = cxgb4_port_idx(pdev) * step;
 		step = dev->rdev.lldi.nrxq / dev->rdev.lldi.nchan;
@@ -1384,7 +1384,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 					dst->neighbour->dev, 0);
 		mtu = dst_mtu(dst);
 		tx_chan = cxgb4_port_chan(dst->neighbour->dev);
-		smac_idx = tx_chan << 1;
+		smac_idx = (cxgb4_port_viid(dst->neighbour->dev) & 0x7F) << 1;
 		step = dev->rdev.lldi.ntxq / dev->rdev.lldi.nchan;
 		txq_idx = cxgb4_port_idx(dst->neighbour->dev) * step;
 		step = dev->rdev.lldi.nrxq / dev->rdev.lldi.nchan;
@@ -1951,7 +1951,7 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 					pdev, 0);
 		ep->mtu = pdev->mtu;
 		ep->tx_chan = cxgb4_port_chan(pdev);
-		ep->smac_idx = ep->tx_chan << 1;
+		ep->smac_idx = (cxgb4_port_viid(pdev) & 0x7F) << 1;
 		step = ep->com.dev->rdev.lldi.ntxq /
 		       ep->com.dev->rdev.lldi.nchan;
 		ep->txq_idx = cxgb4_port_idx(pdev) * step;
@@ -1966,7 +1966,8 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 					ep->dst->neighbour->dev, 0);
 		ep->mtu = dst_mtu(ep->dst);
 		ep->tx_chan = cxgb4_port_chan(ep->dst->neighbour->dev);
-		ep->smac_idx = ep->tx_chan << 1;
+		ep->smac_idx = (cxgb4_port_viid(ep->dst->neighbour->dev) &
+				0x7F) << 1;
 		step = ep->com.dev->rdev.lldi.ntxq /
 		       ep->com.dev->rdev.lldi.nchan;
 		ep->txq_idx = cxgb4_port_idx(ep->dst->neighbour->dev) * step;
[PATCH 2/3] RDMA/cxgb4: Add module option to tweak delayed ack.
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c          |   10 +++++++++-
 drivers/infiniband/hw/cxgb4/t4fw_ri_api.h |   10 ++++++++++
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 8c9b483..fae6080 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -61,6 +61,10 @@ static char *states[] = {
 	NULL,
 };
 
+static int dack_mode;
+module_param(dack_mode, int, 0644);
+MODULE_PARM_DESC(dack_mode, "Delayed ack mode (default=0)");
+
 int c4iw_max_read_depth = 8;
 module_param(c4iw_max_read_depth, int, 0644);
 MODULE_PARM_DESC(c4iw_max_read_depth, "Per-connection max ORD/IRD (default=8)");
@@ -474,6 +478,7 @@ static int send_connect(struct c4iw_ep *ep)
 	cxgb4_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx);
 	wscale = compute_wscale(rcv_win);
 	opt0 = KEEP_ALIVE(1) |
+	       DELACK(1) |
 	       WND_SCALE(wscale) |
 	       MSS_IDX(mtu_idx) |
 	       L2T_IDX(ep->l2t->idx) |
@@ -845,7 +850,9 @@ static int update_rx_credits(struct c4iw_ep *ep, u32 credits)
 	INIT_TP_WR(req, ep->hwtid);
 	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_RX_DATA_ACK,
 						    ep->hwtid));
-	req->credit_dack = cpu_to_be32(credits);
+	req->credit_dack = cpu_to_be32(credits | RX_FORCE_ACK(1) |
+				       F_RX_DACK_CHANGE |
+				       V_RX_DACK_MODE(dack_mode));
 	set_wr_txq(skb, CPL_PRIORITY_ACK, ep->txq_idx);
 	c4iw_ofld_send(&ep->com.dev->rdev, skb);
 	return credits;
@@ -1264,6 +1271,7 @@ static void accept_cr(struct c4iw_ep *ep, __be32 peer_ip, struct sk_buff *skb,
 	cxgb4_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx);
 	wscale = compute_wscale(rcv_win);
 	opt0 = KEEP_ALIVE(1) |
+	       DELACK(1) |
 	       WND_SCALE(wscale) |
 	       MSS_IDX(mtu_idx) |
 	       L2T_IDX(ep->l2t->idx) |
diff --git a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
index fc706bd..dc193c2 100644
--- a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
+++ b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
@@ -826,4 +826,14 @@ struct ulptx_idata {
 #define S_ULPTX_NSGE    0
 #define M_ULPTX_NSGE    0x
 #define V_ULPTX_NSGE(x) ((x) << S_ULPTX_NSGE)
+
+#define S_RX_DACK_MODE    29
+#define M_RX_DACK_MODE    0x3
+#define V_RX_DACK_MODE(x) ((x) << S_RX_DACK_MODE)
+#define G_RX_DACK_MODE(x) (((x) >> S_RX_DACK_MODE) & M_RX_DACK_MODE)
+
+#define S_RX_DACK_CHANGE    31
+#define V_RX_DACK_CHANGE(x) ((x) << S_RX_DACK_CHANGE)
+#define F_RX_DACK_CHANGE    V_RX_DACK_CHANGE(1U)
+
 #endif /* _T4FW_RI_API_H_ */
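The S_/M_/V_/G_/F_ macros this patch adds follow a common pack/unpack pattern for register fields. Below is a minimal standalone sketch of how the delayed-ack mode and its "change" flag could be folded into the same 32-bit word as the credit count. It reads the shifts as `<<` and `>>` and the mask as `&`, adds `uint32_t` casts for use outside the kernel, and leaves out RX_FORCE_ACK because its definition is not part of this patch. The helper name `pack_credit_dack` is hypothetical.

```c
#include <stdint.h>

/* Shift and mask values as given in the patch. */
#define S_RX_DACK_MODE      29
#define M_RX_DACK_MODE      0x3
#define V_RX_DACK_MODE(x)   ((uint32_t)(x) << S_RX_DACK_MODE)
#define G_RX_DACK_MODE(x)   (((x) >> S_RX_DACK_MODE) & M_RX_DACK_MODE)

#define S_RX_DACK_CHANGE    31
#define V_RX_DACK_CHANGE(x) ((uint32_t)(x) << S_RX_DACK_CHANGE)
#define F_RX_DACK_CHANGE    V_RX_DACK_CHANGE(1U)

/*
 * Hypothetical helper: combine a credit count with a delayed-ack mode.
 * The mode is masked to its 2-bit field so it cannot spill into the
 * change flag in bit 31.
 */
static uint32_t pack_credit_dack(uint32_t credits, unsigned int dack_mode)
{
    return credits | F_RX_DACK_CHANGE |
           V_RX_DACK_MODE(dack_mode & M_RX_DACK_MODE);
}
```

With `dack_mode` left at its default of 0, the mode field stays zero and only the change flag is added to the credit count.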
[PATCH 3/3] RDMA/cxgb4: Obtain RDMA QID ranges from LLD/FW.
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/device.c   |    9 +++++++--
 drivers/infiniband/hw/cxgb4/resource.c |    7 ++++---
 drivers/infiniband/hw/cxgb4/t4.h       |    2 --
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index d870f9c..e047ee8 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -250,12 +250,17 @@ static int c4iw_rdev_open(struct c4iw_rdev *rdev)
 	rdev->cqshift = PAGE_SHIFT - ilog2(rdev->lldi.ucq_density);
 	rdev->cqmask = rdev->lldi.ucq_density - 1;
 	PDBG("%s dev %s stag start 0x%0x size 0x%0x num stags %d "
-	     "pbl start 0x%0x size 0x%0x rq start 0x%0x size 0x%0x\n",
+	     "pbl start 0x%0x size 0x%0x rq start 0x%0x size 0x%0x "
+	     "qp qid start %u size %u cq qid start %u size %u\n",
 	     __func__, pci_name(rdev->lldi.pdev), rdev->lldi.vr->stag.start,
 	     rdev->lldi.vr->stag.size, c4iw_num_stags(rdev),
 	     rdev->lldi.vr->pbl.start,
 	     rdev->lldi.vr->pbl.size, rdev->lldi.vr->rq.start,
-	     rdev->lldi.vr->rq.size);
+	     rdev->lldi.vr->rq.size,
+	     rdev->lldi.vr->qp.start,
+	     rdev->lldi.vr->qp.size,
+	     rdev->lldi.vr->cq.start,
+	     rdev->lldi.vr->cq.size);
 	PDBG("udb len 0x%x udb base %p db_reg %p gts_reg %p qpshift %lu "
 	     "qpmask 0x%x cqshift %lu cqmask 0x%x\n",
 	     (unsigned)pci_resource_len(rdev->lldi.pdev, 2),
diff --git a/drivers/infiniband/hw/cxgb4/resource.c b/drivers/infiniband/hw/cxgb4/resource.c
index fb195d1..83b23df 100644
--- a/drivers/infiniband/hw/cxgb4/resource.c
+++ b/drivers/infiniband/hw/cxgb4/resource.c
@@ -110,11 +110,12 @@ static int c4iw_init_qid_fifo(struct c4iw_rdev *rdev)
 
 	spin_lock_init(&rdev->resource.qid_fifo_lock);
 
-	if (kfifo_alloc(&rdev->resource.qid_fifo, T4_MAX_QIDS * sizeof(u32),
-			GFP_KERNEL))
+	if (kfifo_alloc(&rdev->resource.qid_fifo, rdev->lldi.vr->qp.size *
+			sizeof(u32), GFP_KERNEL))
 		return -ENOMEM;
 
-	for (i = T4_QID_BASE; i < T4_QID_BASE + T4_MAX_QIDS; i++)
+	for (i = rdev->lldi.vr->qp.start;
+	     i < rdev->lldi.vr->qp.start + rdev->lldi.vr->qp.size; i++)
 		if (!(i & rdev->qpmask))
 			kfifo_in(&rdev->resource.qid_fifo,
 				 (unsigned char *) &i, sizeof(u32));
diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index 97798d4..e0b4ae0 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -36,8 +36,6 @@
 #include "t4_msg.h"
 #include "t4fw_ri_api.h"
 
-#define T4_QID_BASE 1024
-#define T4_MAX_QIDS 256
 #define T4_MAX_NUM_QP (1<<16)
 #define T4_MAX_NUM_CQ (1<<15)
 #define T4_MAX_NUM_PD (1<<15)
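The change to c4iw_init_qid_fifo() swaps a fixed QID window (T4_QID_BASE, T4_MAX_QIDS) for a start/size pair reported by the lower-level driver, while still keeping only IDs whose low bits under qpmask are zero. A rough userspace sketch of that loop is shown below; it uses a plain array in place of the kernel kfifo, reads the stripped operators as `<` and `&`, and the helper name `collect_qids` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kfifo seeding loop: walk the QID range
 * [start, start + size) and keep only IDs whose bits under qpmask are
 * all zero. Returns how many IDs were written to out (at most out_len).
 */
static size_t collect_qids(uint32_t start, uint32_t size, uint32_t qpmask,
                           uint32_t *out, size_t out_len)
{
    size_t n = 0;
    uint32_t i;

    for (i = start; i < start + size; i++)
        if (!(i & qpmask) && n < out_len)
            out[n++] = i;

    return n;
}
```

Calling it with the old constants (start 1024, size 256) reproduces the previous behaviour, which is why the patch can drop the two defines from t4.h once the range comes from the LLD.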
Re: [PATCH 1/3] RDMA/cxgb4: derive smac_idx from port viid.
Hey Roland,

Please ignore these 3 patches. I forgot to run checkpatch on them and they need some cleanup. I'll re-submit as v2 of the series.

Sorry for the noise.

Steve.
[PATCH v2 1/3] RDMA/cxgb4: derive smac_idx from port viid.
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index f75108f..8c9b483 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1373,7 +1373,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 					pdev, 0);
 		mtu = pdev->mtu;
 		tx_chan = cxgb4_port_chan(pdev);
-		smac_idx = tx_chan << 1;
+		smac_idx = (cxgb4_port_viid(pdev) & 0x7F) << 1;
 		step = dev->rdev.lldi.ntxq / dev->rdev.lldi.nchan;
 		txq_idx = cxgb4_port_idx(pdev) * step;
 		step = dev->rdev.lldi.nrxq / dev->rdev.lldi.nchan;
@@ -1384,7 +1384,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 					dst->neighbour->dev, 0);
 		mtu = dst_mtu(dst);
 		tx_chan = cxgb4_port_chan(dst->neighbour->dev);
-		smac_idx = tx_chan << 1;
+		smac_idx = (cxgb4_port_viid(dst->neighbour->dev) & 0x7F) << 1;
 		step = dev->rdev.lldi.ntxq / dev->rdev.lldi.nchan;
 		txq_idx = cxgb4_port_idx(dst->neighbour->dev) * step;
 		step = dev->rdev.lldi.nrxq / dev->rdev.lldi.nchan;
@@ -1951,7 +1951,7 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 					pdev, 0);
 		ep->mtu = pdev->mtu;
 		ep->tx_chan = cxgb4_port_chan(pdev);
-		ep->smac_idx = ep->tx_chan << 1;
+		ep->smac_idx = (cxgb4_port_viid(pdev) & 0x7F) << 1;
 		step = ep->com.dev->rdev.lldi.ntxq /
 		       ep->com.dev->rdev.lldi.nchan;
 		ep->txq_idx = cxgb4_port_idx(pdev) * step;
@@ -1966,7 +1966,8 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 					ep->dst->neighbour->dev, 0);
 		ep->mtu = dst_mtu(ep->dst);
 		ep->tx_chan = cxgb4_port_chan(ep->dst->neighbour->dev);
-		ep->smac_idx = ep->tx_chan << 1;
+		ep->smac_idx = (cxgb4_port_viid(ep->dst->neighbour->dev) &
+				0x7F) << 1;
 		step = ep->com.dev->rdev.lldi.ntxq /
 		       ep->com.dev->rdev.lldi.nchan;
 		ep->txq_idx = cxgb4_port_idx(ep->dst->neighbour->dev) * step;
[PATCH] IB/qib: turn off IB latency mode
Turn off IB latency mode. This improves link quality for slower process chips.

Signed-off-by: Ralph Campbell <ralph.campb...@qlogic.com>
---
 drivers/infiniband/hw/qib/qib_iba7322.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 5eedf83..fc14ef8 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -7271,6 +7271,8 @@ static int serdes_7322_init(struct qib_pportdata *ppd)
 	ibsd_wr_allchans(ppd, 20, (4 << 13), BMASK(15, 13)); /* SDR */
 
 	data = qib_read_kreg_port(ppd, krp_serdesctrl);
+	/* Turn off IB latency mode */
+	data &= ~SYM_MASK(IBSerdesCtrl_0, IB_LAT_MODE);
 	qib_write_kreg_port(ppd, krp_serdesctrl, data |
 		SYM_MASK(IBSerdesCtrl_0, RXLOSEN));
 
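The two added lines are a read-modify-write: clear one bit in the value read from the SerDes control register, then write it back with another bit set. A small sketch of that pattern follows, assuming the stripped operator is `&=`. The bit positions `SKETCH_IB_LAT_MODE` and `SKETCH_RXLOSEN` are made up for illustration; the real masks come from SYM_MASK() and the chip's register headers.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define SKETCH_IB_LAT_MODE (1ULL << 3)
#define SKETCH_RXLOSEN     (1ULL << 0)

/*
 * Mirror the sequence in the patch: clear the latency-mode bit in the
 * value read from the register, then return the value that would be
 * written back with RXLOSEN set.
 */
static uint64_t serdes_ctrl_update(uint64_t data)
{
    data &= ~SKETCH_IB_LAT_MODE;
    return data | SKETCH_RXLOSEN;
}
```

Note that a plain assignment (`data = ~mask`) would discard every other bit that was just read, which is why the compound `&=` matters here.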
Re: How do I get scst_vdisk/IB_SRP(T) to properly handle drives w/ 4KB sectors?
Chris Worley, on 06/22/2010 08:06 PM wrote:
> When given an LBA w/ a bad boundary, the drive returns an error and the target side says:
>
> dev_vdisk: ***ERROR***: cmd 810196f58b70 returned error -22
>
> ... and the initiator:
>
> sd 8:0:0:0: SCSI error: return code = 0x0802
> sdc: Current: sense key: Medium Error
>     Add. Sense: Unrecovered read error
>
> Is there a way to tell scst that this drive requires 4KB block sizes, and pass that upstream?

I'm not sure what you mean here by "tell" and "pass upstream". Generally, such problems are outside of SCST's scope and responsibilities. With vdisk, the kernel I/O stack should make sure you use correct alignment when accessing your backend drive, and you can always choose your own 512b block size for all vdisk devices.

Vlad
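For readers unfamiliar with the alignment issue behind the error -22 (EINVAL) above: a request addressed in 512-byte logical blocks only lines up with a 4 KB physical sector when both its start and its length are multiples of 8 blocks. A minimal check, with a hypothetical helper name and no claim about how SCST or the kernel I/O stack implement it, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a request expressed in 512-byte logical blocks starts and
 * ends on a physical-sector boundary of phys_bytes (e.g. 4096).
 */
static bool io_is_aligned(uint64_t lba512, uint32_t nblocks512,
                          uint32_t phys_bytes)
{
    uint32_t ratio = phys_bytes / 512;  /* 8 for 4 KB sectors */

    return ratio != 0 &&
           lba512 % ratio == 0 &&
           nblocks512 % ratio == 0;
}
```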
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
> iWARP is just another protocol on top of TCP - like iSCSI. There is no good reason to invent another TCP port maintainer per TCP user type trying to synchronize with the kernel if the resource is host global and already maintained by the kernel.

I think the counter-argument to this is that an iWARP offload NIC is an independent TCP stack and hence should not be tied into the host stack. It's interesting that you bring up iSCSI -- as I understand things, iSCSI offload HBAs are typically configured with their own IP, through a separate mechanism. (The port collision problem is not likely to be hit with iSCSI, since the HBA is an initiator and hence does only active connections; a 4-tuple collision between connections to the iSCSI target is not likely, and a collision with other host stack traffic is extremely unlikely.)

> Since we are developing and already open sourced a full software implementation (SoftiWARP) of RDMA, our view on the optimal solution must be different. Like kernel iSCSI, we are running on top of regular kernel sockets. With that, there is no point having a connection manager blocking just the port we wanted to use for communication - SoftiWARP uses kernel sockets for data communication.

I think this is an extremely strong argument against the patch that started the thread. Breaking soft iWARP seems a fatal flaw.

 - R.
--
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
Re: [ewg] [PATCH v4] IB Core: RAW ETH support
> There is no qp type IBV_QPT_RAW_ETY in user space (at least not in the definitions coming with libibverbs). In fact, the libibverbs that comes with OFED defines (in verbs.h) a qp type called IBV_QPT_RAW_ETT, which equals 7. The patch under discussion here adds a new qp type IB_QPT_RAW_ETH and sets it to 7 to match the definition in user space. This indeed changes the value of IB_QPT_RAW_ETY to 8, but I don't see who can be affected since
> 1. No user space program that uses IB_QPT_RAW_ETY exists
> 2. kernel is compiled as one piece of code.

Why renumber the _ETY enum? Maybe it doesn't break anything serious but why risk it?
--
Roland Dreier rola...@cisco.com
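The renumbering concern comes down to how C assigns enum values: inserting a new enumerator in front of an existing one silently shifts the existing one, while appending leaves it alone. The two layouts below are purely illustrative; the enumerator names are hypothetical and the values are only meant to mirror the 7 and 8 mentioned in the message, not the real kernel or libibverbs definitions.

```c
/*
 * Layout A: the new type takes value 7 and is placed before RAW_ETY,
 * so RAW_ETY moves from 7 to 8.
 */
enum qp_type_inserted {
    A_QPT_RAW_ETH = 7,
    A_QPT_RAW_ETY      /* implicitly 8 */
};

/*
 * Layout B: RAW_ETY keeps its value and the new type is appended with
 * an explicit value, at the cost of not matching a user-space 7.
 */
enum qp_type_appended {
    B_QPT_RAW_ETY = 7,
    B_QPT_RAW_ETH = 8
};
```

Any component built against the old numbering would see layout A as a change of meaning for value 7, which is the risk being questioned.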
Re: How do I get scst_vdisk/IB_SRP(T) to properly handle drives w/ 4KB sectors?
On Wed, Jun 23, 2010 at 11:08 AM, Vladislav Bolkhovitin v...@vlnb.net wrote:
> Chris Worley, on 06/22/2010 08:06 PM wrote:
>> When given an LBA w/ a bad boundary, the drive returns an error and the target side says:
>>
>> dev_vdisk: ***ERROR***: cmd 810196f58b70 returned error -22
>>
>> ... and the initiator:
>>
>> sd 8:0:0:0: SCSI error: return code = 0x0802
>> sdc: Current: sense key: Medium Error
>>     Add. Sense: Unrecovered read error
>>
>> Is there a way to tell scst that this drive requires 4KB block sizes, and pass that upstream?
>
> I'm not sure what you mean here under tell and pass upstream. Generally, such problems are outside of SCST scope and responsibilities. With vdisk kernel I/O stack should make sure you use correct alignment accessing your backend drive and you can always choose your own 512b block size for all vdisk devices.

DOH!

Thanks,

Chris

> Vlad
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Roland Dreier wrote:
>> Since we are developing and already open sourced a full software implementation (SoftiWARP) of RDMA, our view on the optimal solution must be different. Like kernel iSCSI, we are running on top of regular kernel sockets. With that, there is no point having a connection manager blocking just the port we wanted to use for communication - SoftiWARP uses kernel sockets for data communication.
>
> I think this is an extremely strong argument against the patch that started the thread. Breaking soft iWARP seems a fatal flaw.
>
>  - R.

I agree.
Re: Name for a new type of QP
> On the other hand, I never fully understood what IBV_QPT_RAW_ETY stands for. Maybe we should change its name to better represent what the code does.

Picking names for things is not my strongest suit, and I don't have a very good suggestion, so I'll leave that out. But on the point above, RAW_ETY is for the IBA "raw ethertype" special QP type. And I think it would probably be a good idea to change the enum from IBV_QPT_RAW_ETY to something like IBV_QPT_RAW_ETHERTYPE.

 - R.
--
Roland Dreier rola...@cisco.com
Re: [PATCH 0/7] various fixes for QIB driver
> The following patches are for various bug fixes. I'm not sure what counts as a regression for code that is newly introduced. I'm hoping that all except #2 can be made for 2.6.35 whereas #2 can wait for 2.6.36 since it is actually a feature.

All except #2 look OK for 2.6.35. I'll hold #2 for 2.6.36 -- I hope it's independent?

In the future it might be cleaner to send a series 1-6 of fixes for 2.6.35 and then send the port assignment one as a 2.6.36 patch separate from the series. (No need to resend here)

 - R.
--
Roland Dreier rola...@cisco.com
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
> Roland, do you think the iSCSI approach is a good design for iWARP devices?

Well, it's a different problem since, as I said, the port collision problem is a non-issue for iSCSI anyway. But yes, having a separate interface to assign an iWARP IP address to an RNIC does seem to avoid the immediate problem.

I actually don't know what the right answer is -- having a separate IP address for iWARP does seem to lead to having to duplicate everything for configuring it. (And this is the approach for the cxgb[34] iSCSI drivers, right?)

On the other hand, trying to hook offloaded iWARP into the normal stack does seem to lead to a mess. I see DaveM's point: TCP port space is just the beginning -- filtering, queueing, etc. also have config that ultimately an offload device would want to hook too.

Maybe the sanest out of a bad set of options would be to come up with a standard way to configure independent TCP/IP stacks that share a link.

really, dunno.

 - R.
--
Roland Dreier rola...@cisco.com
RE: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
>> I think this is an extremely strong argument against the patch that started the thread. Breaking soft iWARP seems a fatal flaw.
>>
>>  - R.
>
> I agree.

The patch or SoftiWARP can be reworked to allow the whole iWARP family to coexist. It is a matter of agreeing on which path to take.

Chien
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Tung, Chien Tin wrote:
>>> I think this is an extremely strong argument against the patch that started the thread. Breaking soft iWARP seems a fatal flaw.
>>>
>>>  - R.
>>
>> I agree.
>
> The patch or SoftiWARP can be reworked to allow the whole iWARP family to coexist. It is a matter of agreeing on which path to take.

I agree with this too! :)

My only reason for stating I agree with Roland/Bernard is that reserving a port in the rdma-cm definitely breaks software iwarp, so we need to rethink this whole thing in light of software iwarp.

Steve.
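The conflict Steve describes can be reproduced in miniature with ordinary sockets: once one socket holds a TCP port, a second plain bind() to the same port fails. The sketch below is a userspace analogy only, not the rdma-cm code; the function name `second_bind_errno` is hypothetical, the first socket stands in for the port reservation and the second for a software iWARP provider that uses kernel sockets.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind one TCP socket to an ephemeral port, then try to bind a second
 * socket to the same port. Returns the errno of the second bind, 0 if
 * it unexpectedly succeeded, or -1 on a setup failure.
 */
static int second_bind_errno(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int reserve_fd, user_fd, err = -1;

    reserve_fd = socket(AF_INET, SOCK_STREAM, 0);
    user_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (reserve_fd < 0 || user_fd < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;  /* let the kernel pick a free port */

    if (bind(reserve_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(reserve_fd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* addr now carries the chosen port; binding it again should fail. */
    if (bind(user_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;
    else
        err = 0;
out:
    if (reserve_fd >= 0)
        close(reserve_fd);
    if (user_fd >= 0)
        close(user_fd);
    return err;
}
```

On Linux the second bind is expected to fail with EADDRINUSE, which is the same kind of failure a socket-based iWARP implementation would hit if the connection manager had already taken the port it wanted.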
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
> I just think the customer loses when we add iwarp-specific tools, ipaddrs, subnets, etc etc. And what about software iwarp? Will it use the host stack tools and not these new tools? So then we end up with 2 sets of tools for iwarp devices. :(

Agree -- but same prob with current iSCSI offload stuff...
--
Roland Dreier rola...@cisco.com
Re: [PATCH v2 1/3] RDMA/cxgb4: derive smac_idx from port viid.
what's smac_idx? what's port viid? hard to know what the heck this fixes :)
--
Roland Dreier rola...@cisco.com
Re: [PATCH v2 2/3] RDMA/cxgb4: Add module option to tweak delayed ack.
is this fixing anything? ie 2.6.35 or .36?
--
Roland Dreier rola...@cisco.com
Re: [PATCH v2 3/3] RDMA/cxgb4: Obtain RDMA QID ranges from LLD/FW.
again fixing anything or just cleaning up?
--
Roland Dreier rola...@cisco.com
Re: [PATCH v2 1/3] RDMA/cxgb4: derive smac_idx from port viid.
Roland Dreier wrote:
> what's smac_idx? what's port viid? hard to know what the heck this fixes :)

smac_idx == source mac index: the index into the HW source mac table.

viid = Virtual Interface ID: for virtualization, this allows having smac tables, among other things, per virtual device.

I was incorrectly computing the smac_idx in my previous code. But it worked until a recent FW change, I think.
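To put the two derivations side by side, here is a small sketch of the old and new smac_idx computations from the patch. It assumes the operators lost in the archived diff are `<<` and `&`; the helper names are hypothetical and the viid values used with them are arbitrary examples.

```c
#include <stdint.h>

/* Old derivation in the patch's removed lines: index from the TX channel. */
static unsigned int smac_idx_from_chan(unsigned int tx_chan)
{
    return tx_chan << 1;
}

/* New derivation: low 7 bits of the virtual interface ID, times two. */
static unsigned int smac_idx_from_viid(unsigned int viid)
{
    return (viid & 0x7F) << 1;
}
```

The two only agree when the low 7 bits of the viid happen to equal the channel number, which would be consistent with the old code working until the firmware changed how viids are assigned.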
Re: [PATCH v2 2/3] RDMA/cxgb4: Add module option to tweak delayed ack.
Roland Dreier wrote:
> is this fixing anything? ie 2.6.35 or .36?

2.6.36.
Re: [PATCH v2 3/3] RDMA/cxgb4: Obtain RDMA QID ranges from LLD/FW.
Roland Dreier wrote:
> again fixing anything or just cleaning up?

This one is dependent on a cxgb3 change merged into net-next.
RE: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
On the other hand trying to hook offloaded iWARP into the normal stack does seem to lead to a mess. I see DaveM's point: TCP port space is just the beginning -- filtering, queueing, etc also have config that ultimately an offload device would want to hook too. TCP port space is just the beginning but then these features didn't show up all at once in the kernel either. Instead of evolving iWARP implementation, we can't even take a baby step and fix a flaw that exists in the current kernel. Why are we replicating everything offered by the host stack instead of hooking in? It does not sound like good engineering to me. Chien
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
On Wed, Jun 23, 2010 at 01:46:47PM -0500, Steve Wise wrote: Yes. Perusing the drivers/scsi/cxgb3i code I see the iscsi ipaddr is actually stored in the port_info struct which is hung off the netdev_priv of the cxgb3 device. It is set by cxgb3i_host_set_param() which is part of the iscsi transport interface. I wonder how does neighbor discovery, routing, etc work with iscsi? I just think the customer loses when we add iwarp-specific tools, ipaddrs, subnets, etc etc. And what about software iwarp? Will it use the host stack tools and not these new tools? So then we end up with 2 sets of tools for iwarp devices. :( Well, maybe you can get netdev to agree on some way to create an interface that has all the IP services, but no TCP protocol binding? Then the configuration could be largely the same. If you could share that with the iscsi world then maybe it isn't so bad? Jason
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Jason Gunthorpe wrote: On Wed, Jun 23, 2010 at 01:46:47PM -0500, Steve Wise wrote: Yes. Perusing the drivers/scsi/cxgb3i code I see the iscsi ipaddr is actually stored in the port_info struct which is hung off the netdev_priv of the cxgb3 device. It is set by cxgb3i_host_set_param() which is part of the iscsi transport interface. I wonder how does neighbor discovery, routing, etc work with iscsi? For cxgb3i: ND is handled by initiating ND via exported kernel services (neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net events to get updated neigh entries. The host routing table is consulted via ip_route_output_flow() to map a destination ip address to a local netdev, and then if that device is T3, it will do the iscsi offload. I just think the customer loses when we add iwarp-specific tools, ipaddrs, subnets, etc etc. And what about software iwarp? Will it use the host stack tools and not these new tools? So then we end up with 2 sets of tools for iwarp devices. :( Well, maybe you can get netdev to agree on some way to create an interface that has all the IP services, but no TCP protocol binding? Then the configuration could be largely the same. If you could share that with the iscsi world then maybe it isn't so bad? Maybe. I fear this will meet the same resistance from the netdev folks. Steve.
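For readers who want the ND flow described above in concrete terms, here is a rough user-space model in Python. It is only a sketch of the pattern (the offload driver kicks off resolution through the host stack, then learns the result from an update notification); it is not kernel code, and every name in it (HostNeighTable, OffloadDriver, register_notifier and so on) is made up for illustration.

```python
from typing import Callable, Dict, List, Optional


class HostNeighTable:
    """Hypothetical stand-in for the host neighbour table plus its notifier chain."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}
        self.notifiers: List[Callable[[str, str], None]] = []
        self.pending: List[str] = []

    def register_notifier(self, cb: Callable[[str, str], None]) -> None:
        # Models registering for neighbour update events.
        self.notifiers.append(cb)

    def start_resolution(self, ip: str) -> None:
        # Models asking the host stack to resolve a next hop.
        if ip not in self.entries and ip not in self.pending:
            self.pending.append(ip)

    def reply_received(self, ip: str, mac: str) -> None:
        # The reply updates the host entry, then every registered driver is told.
        self.entries[ip] = mac
        if ip in self.pending:
            self.pending.remove(ip)
        for cb in self.notifiers:
            cb(ip, mac)


class OffloadDriver:
    """Hypothetical offload driver keeping its own copy of next-hop addresses."""

    def __init__(self, table: HostNeighTable) -> None:
        self.table = table
        self.l2_cache: Dict[str, str] = {}
        table.register_notifier(self.on_neigh_update)

    def on_neigh_update(self, ip: str, mac: str) -> None:
        self.l2_cache[ip] = mac

    def next_hop_mac(self, ip: str) -> Optional[str]:
        if ip not in self.l2_cache:
            self.table.start_resolution(ip)
        return self.l2_cache.get(ip)


table = HostNeighTable()
driver = OffloadDriver(table)
first_try = driver.next_hop_mac("192.168.1.1")      # not resolved yet
table.reply_received("192.168.1.1", "00:11:22:33:44:55")
second_try = driver.next_hop_mac("192.168.1.1")     # learned via the notifier
```

The point of the split is that the offload side never sends or parses neighbour traffic itself in this model; it only consumes what the host stack has already resolved.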
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
I wonder how does neighbor discovery, routing, etc work with iscsi? For cxgb3i: ND is handled by initiating ND via exported kernel services (neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net events to get updated neigh entries. The host routing table is consulted via ip_route_output_flow() to map a destination ip address to a local netdev, and then if that device is T3, it will do the iscsi offload. By the way, this is how iWARP works too. The ND stuff is done by the IWCM during RESOLVE_ADDR. The routing lookups are done by the iWARP devices themselves typically. Steve.
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Steve Wise wrote: I wonder how does neighbor discovery, routing, etc work with iscsi? For cxgb3i: ND is handled by initiating ND via exported kernel services (neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net events to get updated neigh entries. The host routing table is consulted via ip_route_output_flow() to map a destination ip address to a local netdev, and then if that device is T3, it will do the iscsi offload. By the way, this is how iWARP works too. The ND stuff is done by the IWCM during RESOLVE_ADDR. The routing lookups are done by the iWARP devices themselves typically. Sorry, I meant by the iWARP device drivers themselves. Steve.
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
On the other hand trying to hook offloaded iWARP into the normal stack does seem to lead to a mess. I see DaveM's point: TCP port space is just the beginning -- filtering, queueing, etc also have config that ultimately an offload device would want to hook too. TCP port space is just the beginning but then these features didn't show up all at once in the kernel either. Instead of evolving iWARP implementation, we can't even take a baby step and fix a flaw that exists in the current kernel. Why are we replicating everything offered by the host stack instead of hooking in? It does not sound like good engineering to me. Well as I said I don't particularly see a clean solution. But the point I was making was that the net stack is already very complex with many places where interface configs are controlled -- having to add hooks to pass that config on to offload devices is going to add even more complexity and also add constraints to the format of that config information. Which is not good. - R. -- Roland Dreier rola...@cisco.com
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
On Wed, Jun 23, 2010 at 02:42:43PM -0500, Steve Wise wrote: I wonder how does neighbor discovery, routing, etc work with iscsi? For cxgb3i: ND is handled by initiating ND via exported kernel services (neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net events to get updated neigh entries. The host routing table is consulted via ip_route_output_flow() to map a destination ip address to a local netdev, and then if that device is T3, it will do the iscsi offload. That is what RDMA does.. So that means that the IP used for iscsi is actually an IP assigned to the interface? Doesn't that mean the port collision problem still exists, although probably less likely? Well, maybe you can get netdev to agree on some way to create an interface that has all the IP services, but no TCP protocol binding? Then the configuration could be largely the same. If you could share that with the iscsi world then maybe it isn't so bad? Maybe. I fear this will meet the same resistance from the netdev folks. Hmm.. It kind of codifies what is already in the kernel, these offload devices rely on neighbour and routing services from netdev and provide their own TCP on top of it... But.. having a device that effectively swaps the entire TCP implementation for a proprietary version is not going to be popular either. At the very least, bringing iSCSI offload NICs into your solution broadens the applicability. Jason
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Doesn't that mean the port collision problem still exists [for iSCSI], although probably less likely? Yes, it's there, but almost impossible to hit: first of all, iSCSI HBAs never listen on a port, so that can never collide. Second, iSCSI HBAs only establish connections to iSCSI targets on the iSCSI port -- so really your only chance of a problem is if you ran an offloaded and non-offloaded iSCSI initiator on the same IP to the same target, _and_ you got unlucky on the local ports that you chose. So in practice no one will hit this. -- Roland Dreier rola...@cisco.com
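To make the collision condition above concrete: two TCP connections only clash when all four of local IP, local port, remote IP and remote port are identical. A small Python sketch of that check follows. The addresses and port numbers are invented for the example; 3260 is the well-known iSCSI target port.

```python
from typing import NamedTuple, Set


class FourTuple(NamedTuple):
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


ISCSI_PORT = 3260  # well-known iSCSI target port


def collides(host_conns: Set[FourTuple], offload_conn: FourTuple) -> bool:
    """True only if the host stack already owns the exact same 4-tuple."""
    return offload_conn in host_conns


# One software (non-offloaded) initiator connection owned by the host stack.
host_conns = {FourTuple("10.0.0.5", 40000, "10.0.0.9", ISCSI_PORT)}

# Same IP, same target, and unlucky enough to pick the same local port.
unlucky = FourTuple("10.0.0.5", 40000, "10.0.0.9", ISCSI_PORT)
# Same IP and target, but a different ephemeral port.
different_port = FourTuple("10.0.0.5", 40001, "10.0.0.9", ISCSI_PORT)
# Offload uses its own IP address, so the local port no longer matters.
private_ip = FourTuple("10.0.0.6", 40000, "10.0.0.9", ISCSI_PORT)

results = [collides(host_conns, c) for c in (unlucky, different_port, private_ip)]
```

Only the first case collides, which is why a private offload address (the third case) sidesteps the problem entirely.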
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Jason Gunthorpe wrote: On Wed, Jun 23, 2010 at 02:42:43PM -0500, Steve Wise wrote: I wonder how does neighbor discovery, routing, etc work with iscsi? For cxgb3i: ND is handled by initiating ND via exported kernel services (neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net events to get updated neigh entries. The host routing table is consulted via ip_route_output_flow() to map a destination ip address to a local netdev, and then if that device is T3, it will do the iscsi offload. That is what RDMA does.. So that means that the IP used for iscsi is actually an IP assigned to the interface? The IP address assigned for the cxgb3i iscsi device is _not_ assigned to a netdev interface via ifconfig, as far as I understand it (by looking at the cxgb3i code). So the host stack doesn't know about this address. There is an administrative requirement, I assume, that the secret iscsi ipaddr is within a subnet that is bound to the T3 ethX interface. Otherwise the routing lookup wouldn't work. Steve.
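The administrative requirement mentioned above (the private iscsi address has to sit inside a subnet configured on the ethX interface) can be checked with a few lines of Python using the standard ipaddress module. This is only an illustration of the rule as stated in the thread; the function name and the addresses are made up.

```python
import ipaddress


def in_interface_subnet(private_ip: str, iface_cidr: str) -> bool:
    """Check that private_ip falls inside the subnet configured on the interface.

    iface_cidr is the interface address with prefix length, e.g. "192.168.1.10/24".
    """
    iface = ipaddress.ip_interface(iface_cidr)
    return ipaddress.ip_address(private_ip) in iface.network


# Hypothetical ethX configuration and two candidate private iscsi addresses.
ETHX = "192.168.1.10/24"
ok = in_interface_subnet("192.168.1.77", ETHX)   # same subnet as ethX
bad = in_interface_subnet("10.1.2.3", ETHX)      # outside the ethX subnet
```

In the second case a route lookup on the host would not lead back to that interface (unless some other route happened to cover it), which matches the "otherwise the routing lookup wouldn't work" remark.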
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Roland Dreier wrote: Doesn't that mean the port collision problem still exists [for iSCSI], although probably less likely? Yes, it's there, but almost impossible to hit: first of all, iSCSI HBAs never listen on a port, so that can never collide. Second, iSCSI HBAs only establish connections to iSCSI targets on the iSCSI port -- so really your only chance of a problem is if you ran an offloaded and non-offloaded iSCSI initiator on the same IP to the same target, _and_ you got unlucky on the local ports that you chose. So in practice no one will hit this. I believe, at least for cxgb3i, the ipaddr used is not bound to an ethX interface. So the 4-tuple will never collide with host TCP connections.
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Jason Gunthorpe wrote: On Wed, Jun 23, 2010 at 03:11:27PM -0500, Steve Wise wrote: The IP address assigned for the cxgb3i iscsi device is _not_ assigned to a netdev interface via ifconfig, as far as I understand it (by looking at the cxgb3i code). So the host stack doesn't know about this address. There is an administrative requirement, I assume, that the secret iscsi ipaddr is within a subnet that is bound to the T3 ethX interface. Otherwise the routing lookup wouldn't work. So who responds to neighbor queries, and how do outgoing queries get sent with the right IP? Sounds odd... The iscsi hba is only an initiator, so it doesn't need to respond to arp queries. I guess the Source Protocol Address in the outgoing ARP request will be the ipaddr of the outgoing interface. It's ok though because what is needed is the next-hop peer's hwaddr. So the ARP reply comes in, updates the host neigh entry, and a NEIGH_EVENT callout is performed to the offload device drivers. It is a little hackish, but that's the only way the netdev maintainers would allow iscsi offload in. They originally tried to use the src address from the ethX interface for the offload iscsi connections and that was rejected. Steve.
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Steve Wise wrote: Jason Gunthorpe wrote: On Wed, Jun 23, 2010 at 03:11:27PM -0500, Steve Wise wrote: The IP address assigned for the cxgb3i iscsi device is _not_ assigned to a netdev interface via ifconfig, as far as I understand it (by looking at the cxgb3i code). So the host stack doesn't know about this address. There is an administrative requirement, I assume, that the secret iscsi ipaddr is within a subnet that is bound to the T3 ethX interface. Otherwise the routing lookup wouldn't work. So who responds to neighbor queries, and how do outgoing queries get sent with the right IP? Sounds odd... The iscsi hba is only an initiator, so it doesn't need to respond to arp queries. I guess the Source Protocol Address in the outgoing ARP request will be the ipaddr of the outgoing interface. It's ok though because what is needed is the next-hop peer's hwaddr. So the ARP reply comes in, updates the host neigh entry, and a NEIGH_EVENT callout is performed to the offload device drivers. It is a little hackish, but that's the only way the netdev maintainers would allow iscsi offload in. They originally tried to use the src address from the ethX interface for the offload iscsi connections and that was rejected. In case you're interested... Here is the tail end of the cxgb3i original submission thread showing the use of a private IP address which is unknown to the OS. http://marc.info/?l=linux-netdev&m=121944339211552
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
On Wed, Jun 23, 2010 at 03:19:31PM -0500, Steve Wise wrote: So who responds to neighbor queries, and how do outgoing queries get sent with the right IP? Sounds odd... The iscsi hba is only an initiator, so it doesn't need to respond to arp queries. Hmm.. The other side could arp you at any time, and if you don't answer stuff can go bad, so something must be generating the replies. But I guess that is separate, sounds like iSCSI is in a similar boat and they were not able to reconcile either? :( Jason
Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
Jason Gunthorpe wrote: On Wed, Jun 23, 2010 at 03:19:31PM -0500, Steve Wise wrote: So who responds to neighbor queries, and how do outgoing queries get sent with the right IP? Sounds odd... The iscsi hba is only an initiator, so it doesn't need to respond to arp queries. Hmm.. The other side could arp you at any time, and if you don't answer stuff can go bad, so something must be generating the replies. You're right! The low level driver, cxgb3, handles it. See cxgb3_arp_process(). I missed this change in the original submission of cxgb3i... But I guess that is separate, sounds like iSCSI is in a similar boat and they were not able to reconcile either? :( Jason
RE: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage
On the other hand trying to hook offloaded iWARP into the normal stack does seem to lead to a mess. I see DaveM's point: TCP port space is just the beginning -- filtering, queueing, etc also have config that ultimately an offload device would want to hook too. TCP port space is just the beginning but then these features didn't show up all at once in the kernel either. Instead of evolving iWARP implementation, we can't even take a baby step and fix a flaw that exists in the current kernel. Why are we replicating everything offered by the host stack instead of hooking in? It does not sound like good engineering to me. Well as I said I don't particularly see a clean solution. But the point I was making was that the net stack is already very complex with many places where interface configs are controlled -- having to add hooks to pass that config on to offload devices is going to add even more complexity and also add constraints to the format of that config information. Which is not good. I don't want a separate config file for L2 and iWARP as it adds more work and complexity for the user. I want it dead simple. I can see extending the config format to include information specific to offload but I don't see how it can limit the format. That has not been the case up to this point. Also, the port space patch is totally transparent to the user and config file. There is no managing of host TCP and iWARP TCP port space for the user. I'm not sure about passing config info to offload devices; if the info is outside of what the L2 driver currently picks up then sure, some work needs to be done. Hopefully everything can be pass-through from L2 to iWARP. Chien
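As a user-space analogy for sharing one port space (an assumption about the general idea, not a description of what the patch actually does in the kernel): if the offload side takes its port by binding an ordinary socket, the host stack will refuse to hand that port to anyone else for as long as the socket is held. A minimal Python sketch, using only the loopback address and hypothetical helper names:

```python
import socket
from typing import Tuple


def reserve_port(addr: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Bind a TCP socket to an ephemeral port and keep it open as a placeholder."""
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind((addr, 0))               # port 0: let the host stack choose
    return holder, holder.getsockname()[1]


def port_is_free(port: int, addr: str = "127.0.0.1") -> bool:
    """Try to bind the same address and port; failure means the port is taken."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((addr, port))
        return True
    except OSError:
        return False
    finally:
        probe.close()


holder, port = reserve_port()
free_while_held = port_is_free(port)     # the host stack refuses a second bind
holder.close()
```

Nothing here needs user configuration, which is the property being argued for above: the reservation happens through the normal host mechanism and is released when the placeholder socket is closed.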
Re: mlx4 pci device table
If the table is placed in mlx4_core (as of today in upstream), then I assume the mlx4_en and _ib aren't being probed by pci hot-plug mechanisms, correct? else if you put it in _en _ib et al files, then one has to maintain two copies of the table, but maybe this would be the correct approach? how should this work with multi-protocol mlx4 devices and/or IBoE? I think the current upstream location is correct. This matches the practice of eg iw_cxgb3 as well as cxgb3i, bnx2i etc. This does have the disadvantage that mlx4_en and mlx4_ib are not auto-loaded by PCI hotplug, but so it goes. -- Roland Dreier rola...@cisco.com
Re: When IBoE will be merged to upstream?
This is actually a continuation of the RAW_ET() issue. We want to submit the patches upstream, but there is no support for IB transport in Ethernet devices, and the mlx4_en driver version is a bit outdated: 1.4.1.1 upstream versus 1.5.1 in OFED. VLAN support that is already present in OFED is also missing. When are you planning to submit the changes from OFED to upstream?
- I do not search for more things to merge upstream. I have enough work reviewing things that are sent to me. So I will never look through OFED for changes.
- I do not handle the mlx4_en driver. Changes for mlx4_en should go to netdev and Dave Miller.
- I will try to get back to the IBoE changes when I have time, and I will admit that my time to spend as RDMA maintainer is nowhere near full time and less than it was in the past.
- I did allocate a fair amount of time to spend on IBoE recently but unfortunately the patches were not really in a suitable state to merge, and I exhausted that time slice before we reached the end. When patch sets sit outside of the upstream kernel and are shipped in OFED for months and years, it would probably make upstream merging easier if that time was used to fix the patch set.
- Specifically for the IBoE patches, shouldn't someone have realized that having a device-specific interface to do the standard mapping of GID to Ethernet address makes no sense?
-- Roland Dreier rola...@cisco.com