Re: [ewg] [PATCH] pkey fix for ipoib - resubmission

2010-06-23 Thread Or Gerlitz

Jason Gunthorpe wrote:

> OFED works on kernels that have compiled-in, inlined multicast map functions
> that do not include the pkey copy, while mainline's multicast map functions do.
> So to work around this, there is a bit of code in OFED to overwrite the pkey in
> the multicast hw address. This means that on OFED with those kernels, ip maddr
> sometimes returns the wrong hw address.

Okay, got it. Anyway, as this is neither the essence of the patch nor of
the discussion here, I would wait to hear what Todd and Mike think
about your suggestion to apply the approach taken for the bonding
problem.


Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Name for a new type of QP

2010-06-23 Thread Moni Shoua
Hi,
This message follows a discussion in the EWG mailing list.

We want to promote a patch that enables use of a new QP type.
This QP type lets the user post_send() data to its SQ and have it treated as
the entire packet, including headers.
An example use of this QP is sending Ethernet packets from userspace (and
enjoying kernel bypass).

An open question in this matter is what we should call this QP type.
The first name, IBV_QPT_RAW_ETH, seems too similar to the existing type
IBV_QPT_RAW_ETY.

My suggestions (which were posted in a different thread) are:

IBV_QPT_FRAME
IBV_QPT_PACKET
IBV_QPT_NOHDR

Please send your comments and suggestions.

When we decide on a name, we will send a patch that enables the use of this
QP type.


thanks

Moni


RE: Name for a new type of QP

2010-06-23 Thread Walukiewicz, Miroslaw
I would prefer the name IBV_QPT_FRAME, since it is an L2-layer QP. The term
"packet" is reserved for L3.

Regards,

Mirek

-----Original Message-----
From: Moni Shoua [mailto:mo...@voltaire.com] 
Sent: Wednesday, June 23, 2010 11:20 AM
To: linux-rdma
Cc: Walukiewicz, Miroslaw; Roland Dreier; al...@voltaire.com
Subject: Name for a new type of QP



RE: Name for a new type of QP

2010-06-23 Thread Alex Rosenbaum
It is true that on the send path such a QP can send any sort of
packet. The user of the QP is required to build the MAC header, and the
NIC will wrap the post_send() data in an Ethernet Type II frame, which
includes the trailing CRC.

On the receive path this QP assumes a MAC header. The GID value given to
QP attach carries the multicast destination MAC address, so that the QP
catches the correct ingress packets.

Bottom line, I think IBV_QPT_RAW_ETH is a good name.
If not, I would recommend IBV_QPT_RAW_PACKET, which bears a
resemblance to socket(AF_PACKET, SOCK_RAW, ...).

On the other hand, I never fully understood what IBV_QPT_RAW_ETY
stands for. Maybe we should change its name to better represent what the
code does.


Alex Rosenbaum


-----Original Message-----
From: Walukiewicz, Miroslaw [mailto:miroslaw.walukiew...@intel.com] 
Sent: Wednesday, June 23, 2010 1:21 PM
To: Moni Shoua; linux-rdma
Cc: Roland Dreier; Alex Rosenbaum
Subject: RE: Name for a new type of QP



[PATCH 1/3] RDMA/cxgb4: derive smac_idx from port viid.

2010-06-23 Thread Steve Wise
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb4/cm.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index f75108f..8c9b483 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1373,7 +1373,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
pdev, 0);
mtu = pdev->mtu;
tx_chan = cxgb4_port_chan(pdev);
-   smac_idx = tx_chan << 1;
+   smac_idx = (cxgb4_port_viid(pdev) & 0x7F) << 1;
step = dev->rdev.lldi.ntxq / dev->rdev.lldi.nchan;
txq_idx = cxgb4_port_idx(pdev) * step;
step = dev->rdev.lldi.nrxq / dev->rdev.lldi.nchan;
@@ -1384,7 +1384,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
dst->neighbour->dev, 0);
mtu = dst_mtu(dst);
tx_chan = cxgb4_port_chan(dst->neighbour->dev);
-   smac_idx = tx_chan << 1;
+   smac_idx = (cxgb4_port_viid(dst->neighbour->dev) & 0x7F) << 1;
step = dev->rdev.lldi.ntxq / dev->rdev.lldi.nchan;
txq_idx = cxgb4_port_idx(dst->neighbour->dev) * step;
step = dev->rdev.lldi.nrxq / dev->rdev.lldi.nchan;
@@ -1951,7 +1951,7 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
pdev, 0);
ep->mtu = pdev->mtu;
ep->tx_chan = cxgb4_port_chan(pdev);
-   ep->smac_idx = ep->tx_chan << 1;
+   ep->smac_idx = (cxgb4_port_viid(pdev) & 0x7F) << 1;
step = ep->com.dev->rdev.lldi.ntxq /
   ep->com.dev->rdev.lldi.nchan;
ep->txq_idx = cxgb4_port_idx(pdev) * step;
@@ -1966,7 +1966,8 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
ep->dst->neighbour->dev, 0);
ep->mtu = dst_mtu(ep->dst);
ep->tx_chan = cxgb4_port_chan(ep->dst->neighbour->dev);
-   ep->smac_idx = ep->tx_chan << 1;
+   ep->smac_idx = (cxgb4_port_viid(ep->dst->neighbour->dev) &
+   0x7F) << 1;
step = ep->com.dev->rdev.lldi.ntxq /
   ep->com.dev->rdev.lldi.nchan;
ep->txq_idx = cxgb4_port_idx(ep->dst->neighbour->dev) * step;



[PATCH 2/3] RDMA/cxgb4: Add module option to tweak delayed ack.

2010-06-23 Thread Steve Wise
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb4/cm.c          |   10 +++++++++-
 drivers/infiniband/hw/cxgb4/t4fw_ri_api.h |   10 ++++++++++
 2 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 8c9b483..fae6080 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -61,6 +61,10 @@ static char *states[] = {
NULL,
 };
 
+static int dack_mode;
+module_param(dack_mode, int, 0644);
+MODULE_PARM_DESC(dack_mode, "Delayed ack mode (default=0)");
+
 int c4iw_max_read_depth = 8;
 module_param(c4iw_max_read_depth, int, 0644);
 MODULE_PARM_DESC(c4iw_max_read_depth, "Per-connection max ORD/IRD (default=8)");
@@ -474,6 +478,7 @@ static int send_connect(struct c4iw_ep *ep)
cxgb4_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx);
wscale = compute_wscale(rcv_win);
opt0 = KEEP_ALIVE(1) |
+  DELACK(1) |
   WND_SCALE(wscale) |
   MSS_IDX(mtu_idx) |
   L2T_IDX(ep->l2t->idx) |
@@ -845,7 +850,9 @@ static int update_rx_credits(struct c4iw_ep *ep, u32 credits)
INIT_TP_WR(req, ep->hwtid);
OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_RX_DATA_ACK,
ep->hwtid));
-   req->credit_dack = cpu_to_be32(credits);
+   req->credit_dack = cpu_to_be32(credits | RX_FORCE_ACK(1) |
+  F_RX_DACK_CHANGE |
+  V_RX_DACK_MODE(dack_mode));
set_wr_txq(skb, CPL_PRIORITY_ACK, ep->txq_idx);
c4iw_ofld_send(&ep->com.dev->rdev, skb);
return credits;
@@ -1264,6 +1271,7 @@ static void accept_cr(struct c4iw_ep *ep, __be32 peer_ip, struct sk_buff *skb,
cxgb4_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx);
wscale = compute_wscale(rcv_win);
opt0 = KEEP_ALIVE(1) |
+  DELACK(1) |
   WND_SCALE(wscale) |
   MSS_IDX(mtu_idx) |
   L2T_IDX(ep->l2t->idx) |
diff --git a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
index fc706bd..dc193c2 100644
--- a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
+++ b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h
@@ -826,4 +826,14 @@ struct ulptx_idata {
 #define S_ULPTX_NSGE    0
 #define M_ULPTX_NSGE0x
 #define V_ULPTX_NSGE(x) ((x) << S_ULPTX_NSGE)
+
+#define S_RX_DACK_MODE    29
+#define M_RX_DACK_MODE    0x3
+#define V_RX_DACK_MODE(x) ((x) << S_RX_DACK_MODE)
+#define G_RX_DACK_MODE(x) (((x) >> S_RX_DACK_MODE) & M_RX_DACK_MODE)
+
+#define S_RX_DACK_CHANGE    31
+#define V_RX_DACK_CHANGE(x) ((x) << S_RX_DACK_CHANGE)
+#define F_RX_DACK_CHANGE    V_RX_DACK_CHANGE(1U)
+
 #endif /* _T4FW_RI_API_H_ */



[PATCH 3/3] RDMA/cxgb4: Obtain RDMA QID ranges from LLD/FW.

2010-06-23 Thread Steve Wise
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb4/device.c   |    9 +++++++--
 drivers/infiniband/hw/cxgb4/resource.c |    7 ++++---
 drivers/infiniband/hw/cxgb4/t4.h       |    2 --
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index d870f9c..e047ee8 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -250,12 +250,17 @@ static int c4iw_rdev_open(struct c4iw_rdev *rdev)
rdev->cqshift = PAGE_SHIFT - ilog2(rdev->lldi.ucq_density);
rdev->cqmask = rdev->lldi.ucq_density - 1;
PDBG("%s dev %s stag start 0x%0x size 0x%0x num stags %d "
-"pbl start 0x%0x size 0x%0x rq start 0x%0x size 0x%0x\n",
+"pbl start 0x%0x size 0x%0x rq start 0x%0x size 0x%0x "
+"qp qid start %u size %u cq qid start %u size %u\n",
 __func__, pci_name(rdev->lldi.pdev), rdev->lldi.vr->stag.start,
 rdev->lldi.vr->stag.size, c4iw_num_stags(rdev),
 rdev->lldi.vr->pbl.start,
 rdev->lldi.vr->pbl.size, rdev->lldi.vr->rq.start,
-rdev->lldi.vr->rq.size);
+rdev->lldi.vr->rq.size,
+rdev->lldi.vr->qp.start,
+rdev->lldi.vr->qp.size,
+rdev->lldi.vr->cq.start,
+rdev->lldi.vr->cq.size);
PDBG("udb len 0x%x udb base %p db_reg %p gts_reg %p qpshift %lu "
 "qpmask 0x%x cqshift %lu cqmask 0x%x\n",
 (unsigned)pci_resource_len(rdev->lldi.pdev, 2),
diff --git a/drivers/infiniband/hw/cxgb4/resource.c b/drivers/infiniband/hw/cxgb4/resource.c
index fb195d1..83b23df 100644
--- a/drivers/infiniband/hw/cxgb4/resource.c
+++ b/drivers/infiniband/hw/cxgb4/resource.c
@@ -110,11 +110,12 @@ static int c4iw_init_qid_fifo(struct c4iw_rdev *rdev)
 
spin_lock_init(&rdev->resource.qid_fifo_lock);
 
-   if (kfifo_alloc(&rdev->resource.qid_fifo, T4_MAX_QIDS * sizeof(u32),
-   GFP_KERNEL))
+   if (kfifo_alloc(&rdev->resource.qid_fifo, rdev->lldi.vr->qp.size *
+   sizeof(u32), GFP_KERNEL))
return -ENOMEM;
 
-   for (i = T4_QID_BASE; i < T4_QID_BASE + T4_MAX_QIDS; i++)
+   for (i = rdev->lldi.vr->qp.start;
+i < rdev->lldi.vr->qp.start + rdev->lldi.vr->qp.size; i++)
if (!(i & rdev->qpmask))
kfifo_in(&rdev->resource.qid_fifo,
(unsigned char *) &i, sizeof(u32));
diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index 97798d4..e0b4ae0 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -36,8 +36,6 @@
 #include "t4_msg.h"
 #include "t4fw_ri_api.h"
 
-#define T4_QID_BASE 1024
-#define T4_MAX_QIDS 256
 #define T4_MAX_NUM_QP (1<<16)
 #define T4_MAX_NUM_CQ (1<<15)
 #define T4_MAX_NUM_PD (1<<15)



Re: [PATCH 1/3] RDMA/cxgb4: derive smac_idx from port viid.

2010-06-23 Thread Steve Wise

Hey Roland,

Please ignore these 3 patches.  I forgot to run checkpatch on them and 
they need some cleanup.


I'll re-submit as v2 of the series.

Sorry for the noise.

Steve.




[PATCH v2 1/3] RDMA/cxgb4: derive smac_idx from port viid.

2010-06-23 Thread Steve Wise
Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb4/cm.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index f75108f..8c9b483 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1373,7 +1373,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
pdev, 0);
mtu = pdev->mtu;
tx_chan = cxgb4_port_chan(pdev);
-   smac_idx = tx_chan << 1;
+   smac_idx = (cxgb4_port_viid(pdev) & 0x7F) << 1;
step = dev->rdev.lldi.ntxq / dev->rdev.lldi.nchan;
txq_idx = cxgb4_port_idx(pdev) * step;
step = dev->rdev.lldi.nrxq / dev->rdev.lldi.nchan;
@@ -1384,7 +1384,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
dst->neighbour->dev, 0);
mtu = dst_mtu(dst);
tx_chan = cxgb4_port_chan(dst->neighbour->dev);
-   smac_idx = tx_chan << 1;
+   smac_idx = (cxgb4_port_viid(dst->neighbour->dev) & 0x7F) << 1;
step = dev->rdev.lldi.ntxq / dev->rdev.lldi.nchan;
txq_idx = cxgb4_port_idx(dst->neighbour->dev) * step;
step = dev->rdev.lldi.nrxq / dev->rdev.lldi.nchan;
@@ -1951,7 +1951,7 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
pdev, 0);
ep->mtu = pdev->mtu;
ep->tx_chan = cxgb4_port_chan(pdev);
-   ep->smac_idx = ep->tx_chan << 1;
+   ep->smac_idx = (cxgb4_port_viid(pdev) & 0x7F) << 1;
step = ep->com.dev->rdev.lldi.ntxq /
   ep->com.dev->rdev.lldi.nchan;
ep->txq_idx = cxgb4_port_idx(pdev) * step;
@@ -1966,7 +1966,8 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
ep->dst->neighbour->dev, 0);
ep->mtu = dst_mtu(ep->dst);
ep->tx_chan = cxgb4_port_chan(ep->dst->neighbour->dev);
-   ep->smac_idx = ep->tx_chan << 1;
+   ep->smac_idx = (cxgb4_port_viid(ep->dst->neighbour->dev) &
+   0x7F) << 1;
step = ep->com.dev->rdev.lldi.ntxq /
   ep->com.dev->rdev.lldi.nchan;
ep->txq_idx = cxgb4_port_idx(ep->dst->neighbour->dev) * step;



[PATCH] IB/qib: turn off IB latency mode

2010-06-23 Thread Ralph Campbell
Turn off IB latency mode. This improves link quality for slower
process chips.

Signed-off-by: Ralph Campbell <ralph.campb...@qlogic.com>
---

 drivers/infiniband/hw/qib/qib_iba7322.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 5eedf83..fc14ef8 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -7271,6 +7271,8 @@ static int serdes_7322_init(struct qib_pportdata *ppd)
ibsd_wr_allchans(ppd, 20, (4 << 13), BMASK(15, 13)); /* SDR */
 
data = qib_read_kreg_port(ppd, krp_serdesctrl);
+   /* Turn off IB latency mode */
+   data &= ~SYM_MASK(IBSerdesCtrl_0, IB_LAT_MODE);
qib_write_kreg_port(ppd, krp_serdesctrl, data |
SYM_MASK(IBSerdesCtrl_0, RXLOSEN));
 



Re: How do I get scst_vdisk/IB_SRP(T) to properly handle drives w/ 4KB sectors?

2010-06-23 Thread Vladislav Bolkhovitin

Chris Worley, on 06/22/2010 08:06 PM wrote:

> When given an LBA w/ a bad boundary, the drive returns an error and
> the target side says:
>
> dev_vdisk: ***ERROR***: cmd 810196f58b70 returned error -22
>
> ... and the initiator:
>
> sd 8:0:0:0: SCSI error: return code = 0x0802
> sdc: Current: sense key: Medium Error
>     Add. Sense: Unrecovered read error
>
> Is there a way to tell scst that this drive requires 4KB block sizes,
> and pass that upstream?


I'm not sure what you mean here by "tell" and "pass upstream".
Generally, such problems are outside of SCST's scope and responsibilities.
With vdisk, the kernel I/O stack should make sure you use the correct
alignment when accessing your backend drive, and you can always choose
your own 512b block size for all vdisk devices.


Vlad


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Roland Dreier
  iWARP is just another protocol on top of TCP - like iSCSI. There is
  no good reason to invent another TCP port maintainer per TCP user
  type trying to synchronize with the kernel if the resource is host
  global and already maintained by the kernel.

I think the counter-argument to this is that an iWARP offload NIC is an
independent TCP stack and hence should not be tied into the host stack.
It's interesting that you bring up iSCSI -- as I understand things,
iSCSI offload HBAs are typically configured with their own IP, through a
separate mechanism.  (The port collision problem is unlikely to be hit
with iSCSI, since the HBA is an initiator and hence makes only active
connections, and a 4-tuple collision between its connections to the iSCSI
target and other host stack traffic is extremely unlikely.)

  Since we are developing and already open sourced a full software
  implementation (SoftiWARP) of RDMA, our view on the optimal solution
  must be different. Like kernel iSCSI, we are running on top of regular
  kernel sockets. With that, there is no point having a connection manager
  blocking just the port we wanted to use for communication - SoftiWARP
  uses kernel sockets for data communication.

I think this is an extremely strong argument against the patch that
started the thread.  Breaking soft iWARP seems a fatal flaw.

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [ewg] [PATCH v4] IB Core: RAW ETH support

2010-06-23 Thread Roland Dreier
  There is no qp type IBV_QPT_RAW_ETY in user space (at least not in the
  definitions coming with libibverbs). In fact, libibverbs that comes with
  OFED defines (in verbs.h) a qp type called IBV_QPT_RAW_ETT which equals 7.
  The patch that is under discussion here adds a new qp type IB_QPT_RAW_ETH
  and sets it to 7 to match the definition in user space. This indeed
  changes the value of IB_QPT_RAW_ETY to 8, but I don't see who can be
  affected, since
  1. No user space program that uses IB_QPT_RAW_ETY exists
  2. The kernel is compiled as one piece of code.

Why renumber the _ETY enum?  Maybe it doesn't break anything serious but
why risk it?


Re: How do I get scst_vdisk/IB_SRP(T) to properly handle drives w/ 4KB sectors?

2010-06-23 Thread Chris Worley
On Wed, Jun 23, 2010 at 11:08 AM, Vladislav Bolkhovitin <v...@vlnb.net> wrote:
 Chris Worley, on 06/22/2010 08:06 PM wrote:

 When given an LBA w/ a bad boundary, the drive returns an error and
 the target side says:

 dev_vdisk: ***ERROR***: cmd 810196f58b70 returned error -22

 ... and the initiator:

 sd 8:0:0:0: SCSI error: return code = 0x0802
 sdc: Current: sense key: Medium Error
    Add. Sense: Unrecovered read error

 Is there a way to tell scst that this drive requires 4KB block sizes,
 and pass that upstream?

 I'm not sure what you mean here under tell and pass upstream. Generally,
 such problems are outside of SCST scope and responsibilities. With vdisk
 kernel I/O stack should make sure you use correct alignment accessing your
 backend drive and you can always choose your own 512b block size for all
 vdisk devices.

DOH!

Thanks,

Chris

 Vlad



Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise

Roland Dreier wrote:

  Since we are developing and already open sourced a full software
  implementation (SoftiWARP) of RDMA, our view on the optimal solution
  must be different. Like kernel iSCSI, we are running on top of regular
  kernel sockets. With that, there is no point having a connection manager
  blocking just the port we wanted to use for communication - SoftiWARP
  uses kernel sockets for data communication.

I think this is an extremely strong argument against the patch that
started the thread.  Breaking soft iWARP seems a fatal flaw.

 - R.
  


I agree.



Re: Name for a new type of QP

2010-06-23 Thread Roland Dreier
  On the other hand, I never fully understood what does IBV_QPT_RAW_ETY
  stand for? Maybe we should change its name to better represent what the
  code does.

Picking names for things is not my strongest suit, and I don't have a
very good suggestion, so I'll leave that out.  But on the point above,
RAW_ETY is for the IBA raw ethertype special QP type.  And I think it
would probably be a good idea to change the enum from IBV_QPT_RAW_ETY to
something like IBV_QPT_RAW_ETHERTYPE.

 - R.


Re: [PATCH 0/7] various fixes for QIB driver

2010-06-23 Thread Roland Dreier
  The following patches are for various bug fixes.
  I'm not sure what counts as a regression for code that is newly introduced.
  I'm hoping that all except #2 can be made for 2.6.35 whereas
  #2 can wait for 2.6.36 since it is actually a feature.

All except #2 look OK for 2.6.35.  I'll hold #2 for 2.6.36 -- I hope
it's independent?

In the future it might be cleaner to send a series 1-6 of fixes for
2.6.35 and then send the port assignment one as a 2.6.36 patch separate
from the series.  (No need to resend here)

 - R.


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Roland Dreier
  Roland, do you think the iSCSI approach is a good design for iWARP
  devices?

Well, it's a different problem since as I said the port collision
problem is a non-issue for iSCSI anyway.  But yes having a separate
interface to assign an iWARP IP address to an RNIC does seem to avoid
the immediate problem.

I actually don't know what the right answer is -- having a separate IP
address for iWARP does seem to lead to having to duplicate everything
for configuring it.  (And this is the approach for the cxgb[34] iSCSI
drivers, right?)

On the other hand trying to hook offloaded iWARP into the normal stack
does seem to lead to a mess.  I see DaveM's point: TCP port space is
just the beginning -- filtering, queueing, etc also have config that
ultimately an offload device would want to hook too.

Maybe the sanest out of a bad set of options would be to come up with a
standard way to configure independent TCP/IP stacks that share a link.

really, dunno.

 - R.


RE: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Tung, Chien Tin
  I think this is an extremely strong argument against the patch that
  started the thread.  Breaking soft iWARP seems a fatal flaw.
 
   - R.
 
 
 I agree.

The patch or SoftiWARP can be reworked to allow the whole iWARP family
to coexist.  It is a matter of agreeing on which path to take.

Chien




Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise

Tung, Chien Tin wrote:

I think this is an extremely strong argument against the patch that
started the thread.  Breaking soft iWARP seems a fatal flaw.

 - R.

  

I agree.



The patch or SoftiWARP can be reworked to allow the whole iWARP family
to coexist.  It is a matter of agreeing on which path to take.

  


I agree with this too! :)  My only reason for stating I agree with 
Roland/Bernard is that reserving a port in the rdma-cm definitely breaks 
software iwarp, so we need to rethink this whole thing in light of 
software iwarp.



Steve.


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Roland Dreier
  I just think the customer loses when we add iwarp-specific tools,
  ipaddrs, subnets, etc etc.  And what about software iwarp?  Will it
  use the host stack tools and not these new tools?  So then we end up
  with 2 sets of tools for iwarp devices. :(

Agree -- but same prob with current iSCSI offload stuff...


Re: [PATCH v2 1/3] RDMA/cxgb4: derive smac_idx from port viid.

2010-06-23 Thread Roland Dreier
what's smac_idx?  what's port viid?
hard to know what the heck this fixes :)


Re: [PATCH v2 2/3] RDMA/cxgb4: Add module option to tweak delayed ack.

2010-06-23 Thread Roland Dreier
is this fixing anything?  ie 2.6.35 or .36?


Re: [PATCH v2 3/3] RDMA/cxgb4: Obtain RDMA QID ranges from LLD/FW.

2010-06-23 Thread Roland Dreier
again fixing anything or just cleaning up?


Re: [PATCH v2 1/3] RDMA/cxgb4: derive smac_idx from port viid.

2010-06-23 Thread Steve Wise

Roland Dreier wrote:

what's smac_idx?  what's port viid?
hard to know what the heck this fixes :)
  


smac_idx == source mac index:  the index into the HW source mac table. 

viid = Virtual Interface ID: for virtualization, this allows having smac 
tables, among other things, per virtual device.  I was incorrectly 
computing the smac_idx in my previous code.  But it worked until a 
recent FW change I think.





Re: [PATCH v2 2/3] RDMA/cxgb4: Add module option to tweak delayed ack.

2010-06-23 Thread Steve Wise

Roland Dreier wrote:

is this fixing anything?  ie 2.6.35 or .36?
  


2.6.36.  


Re: [PATCH v2 3/3] RDMA/cxgb4: Obtain RDMA QID ranges from LLD/FW.

2010-06-23 Thread Steve Wise

Roland Dreier wrote:

again fixing anything or just cleaning up?
  


This one is dependent on a cxgb3 change merged into net-next.



RE: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Tung, Chien Tin
 On the other hand trying to hook offloaded iWARP into the normal stack
 does seem to lead to a mess.  I see DaveM's point: TCP port space is
 just the beginning -- filtering, queueing, etc also have config that
 ultimately an offload device would want to hook too.

TCP port space is just the beginning but then these features
didn't show up all at once in the kernel either.  Instead of
evolving the iWARP implementation, we can't even take a baby step
and fix a flaw that exists in the current kernel.  Why are we
replicating everything offered by the host stack instead of
hooking in?  It does not sound like good engineering to me.

Chien

 



Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Jason Gunthorpe
On Wed, Jun 23, 2010 at 01:46:47PM -0500, Steve Wise wrote:

 Yes.  Perusing the drivers/scsi/cxgb3i code I see the iscsi ipaddr is  
 actually stored in the port_info struct which is hung off the netdev_priv
 of the cxgb3 device.  It is set by cxgb3i_host_set_param() which is part  
 of the iscsi transport interface.

I wonder how neighbor discovery, routing, etc. work with iSCSI?

 I just think the customer loses when we add iwarp-specific tools,
 ipaddrs, subnets, etc etc.  And what about software iwarp?  Will it use  
 the host stack tools and not these new tools?  So then we end up with 2  
 sets of tools for iwarp devices. :(

Well, maybe you can get netdev to agree on some way to create an
interface that has all the IP services, but no TCP protocol binding?
Then the configuration could be largely the same. If you could share
that with the iscsi world then maybe it isn't so bad?

Jason


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise

Jason Gunthorpe wrote:

On Wed, Jun 23, 2010 at 01:46:47PM -0500, Steve Wise wrote:

  
Yes.  Perusing the drivers/scsi/cxgb3i code I see the iscsi ipaddr is  
actually stored in the port_info struct which is hung off the netdev_priv
of the cxgb3 device.  It is set by cxgb3i_host_set_param() which is part  
of the iscsi transport interface.



I wonder how neighbor discovery, routing, etc. work with iSCSI?

  


For cxgb3i:

ND is handled by initiating ND via exported kernel services 
(neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net 
events to get updated neigh entries.


The host routing table is consulted via ip_route_output_flow() to map a 
destination ip address to a local netdev, and then if that device is T3, 
it will do the iscsi offload.
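A userspace sketch of the route-then-offload decision described above may help. It models the host routing lookup as a longest-prefix match over a toy table and offloads only when the winning route's device is offload-capable. The `route` struct, the `offload_capable` flag, `pick_egress()` and `should_offload()` are hypothetical stand-ins; the real path goes through ip_route_output_flow() and driver-specific checks inside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing-table entry: an IPv4 prefix bound to a netdev. */
struct route {
    uint32_t prefix;       /* network address, host byte order */
    unsigned prefix_len;   /* 0..32 */
    const char *dev;       /* egress device name, e.g. "eth1" */
    bool offload_capable;  /* stands in for "this netdev is a T3 port" */
};

static uint32_t mask_of(unsigned len)
{
    assert(len <= 32);
    /* Shifting a 32-bit value by 32 is undefined, so special-case /0. */
    return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Longest-prefix match: the most specific matching route wins. */
static const struct route *pick_egress(const struct route *tbl, size_t n,
                                       uint32_t dst)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = mask_of(tbl[i].prefix_len);
        if ((dst & m) == (tbl[i].prefix & m) &&
            (best == NULL || tbl[i].prefix_len > best->prefix_len))
            best = &tbl[i];
    }
    return best;
}

/* Offload only if a route exists and its device can offload. */
static bool should_offload(const struct route *tbl, size_t n, uint32_t dst)
{
    const struct route *r = pick_egress(tbl, n, dst);
    return r != NULL && r->offload_capable;
}
```

Because the most specific route wins, a destination covered by a more specific route on a non-T3 interface would not be offloaded, even if a broader route points at the T3 port.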




I just think the customer loses when we add iwarp-specific tools,  
ipaddrs, subnets, etc etc.  And what about software iwarp?  Will it use  
the host stack tools and not these new tools?  So then we end up with 2  
sets of tools for iwarp devices. :(



Well, maybe you can get netdev to agree on some way to create an
interface that has all the IP services, but no TCP protocol binding?
Then the configuration could be largely the same. If you could share
that with the iscsi world then maybe it isn't so bad?
  



Maybe.  I fear this will meet the same resistance from the netdev folks.


Steve.



Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise



I wonder how neighbor discovery, routing, etc. work with iSCSI?

  


For cxgb3i:

ND is handled by initiating ND via exported kernel services 
(neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net 
events to get updated neigh entries.


The host routing table is consulted via ip_route_output_flow() to map 
a destination ip address to a local netdev, and then if that device is 
T3, it will do the iscsi offload.







By the way, this is how iWARP works too. The ND stuff is done by the 
IWCM during RESOLVE_ADDR.  The routing lookups are done by the iWARP 
devices themselves typically.



Steve.


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise

Steve Wise wrote:



I wonder how neighbor discovery, routing, etc. work with iSCSI?

  


For cxgb3i:

ND is handled by initiating ND via exported kernel services 
(neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net 
events to get updated neigh entries.


The host routing table is consulted via ip_route_output_flow() to map 
a destination ip address to a local netdev, and then if that device 
is T3, it will do the iscsi offload.







By the way, this is how iWARP works too. The ND stuff is done by 
the IWCM during RESOLVE_ADDR.  The routing lookups are done by the 
iWARP devices themselves typically.

Sorry, I meant by the iWARP device drivers themselves.

Steve.


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Roland Dreier
   On the other hand trying to hook offloaded iWARP into the normal stack
   does seem to lead to a mess.  I see DaveM's point: TCP port space is
   just the beginning -- filtering, queueing, etc also have config that
   ultimately an offload device would want to hook too.

  TCP port space is just the beginning but then these features
  didn't show up all at once in the kernel either.  Instead of
  evolving the iWARP implementation, we can't even take a baby step
  and fix a flaw that exists in the current kernel.  Why are we
  replicating everything offered by the host stack instead of
  hooking in?  It does not sound like good engineering to me.

Well as I said I don't particularly see a clean solution.  But the point
I was making was that the net stack is already very complex with many
places where interface configs are controlled -- having to add hooks to
pass that config on to offload devices is going to add even more
complexity and also add constraints to the format of that config
information.  Which is not good.

 - R.


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Jason Gunthorpe
On Wed, Jun 23, 2010 at 02:42:43PM -0500, Steve Wise wrote:
 I wonder how neighbor discovery, routing, etc. work with iSCSI?

 For cxgb3i:

 ND is handled by initiating ND via exported kernel services  
 (neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net  
 events to get updated neigh entries.

 The host routing table is consulted via ip_route_output_flow() to map a  
 destination ip address to a local netdev, and then if that device is T3,  
 it will do the iscsi offload.

That is what RDMA does.. So that means that the IP used for iscsi is
actually an IP assigned to the interface? Doesn't that mean the port
collision problem still exists, although probably less likely?

 Well, maybe you can get netdev to agree on some way to create an
 interface that has all the IP services, but no TCP protocol binding?
 Then the configuration could be largely the same. If you could share
 that with the iscsi world then maybe it isn't so bad?

 Maybe.  I fear this will meet the same resistance from the netdev folks.

Hmm.. It kind of codifies what is already in the kernel: these offload
devices rely on neighbour and routing services from netdev and provide
their own TCP on top of it...

But.. having a device that effectively swaps the entire TCP
implementation for a proprietary version is not going to be popular
either.

At the very least, bringing iSCSI offload NICs into your solution
broadens the applicability.

Jason


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Roland Dreier
  Doesn't that mean the port collision problem still exists [for iSCSI],
  although probably less likely?

Yes, it's there, but almost impossible to hit: first of all, iSCSI HBAs
never listen on a port, so that can never collide.  Second, iSCSI HBAs
only establish connections to iSCSI targets on the iSCSI port -- so
really your only chance of a problem is if you ran an offloaded and
non-offloaded iSCSI initiator on the same IP to the same target, _and_
you got unlucky on the local ports that you chose.  So in practice no
one will hit this.
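The reasoning above rests on the fact that a TCP connection is identified by its 4-tuple (local address, local port, remote address, remote port), so an offloaded connection and a host connection only clash when all four fields match. A minimal sketch of that check follows; `struct tuple4` and `tuples_collide()` are hypothetical names, and 3260 is the well-known iSCSI target port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ISCSI_PORT 3260 /* well-known iSCSI target port */

/* Hypothetical TCP connection identifier (host byte order). */
struct tuple4 {
    uint32_t laddr, raddr;
    uint16_t lport, rport;
};

/* Two connections collide only if every field of the 4-tuple matches. */
static bool tuples_collide(const struct tuple4 *a, const struct tuple4 *b)
{
    assert(a != 0 && b != 0);
    return a->laddr == b->laddr && a->lport == b->lport &&
           a->raddr == b->raddr && a->rport == b->rport;
}
```

With the remote side pinned to the iSCSI port, a clash needs the same local address and the same local port picked by both initiators, which is why it is so unlikely in practice. If the offload side uses an address the host stack never owns, the local-address comparison alone rules it out.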


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise

Jason Gunthorpe wrote:

On Wed, Jun 23, 2010 at 02:42:43PM -0500, Steve Wise wrote:
  

I wonder how neighbor discovery, routing, etc. work with iSCSI?
  

For cxgb3i:

ND is handled by initiating ND via exported kernel services  
(neigh_event_send()) and registering for NETEVENT_NEIGH_UPDATE net  
events to get updated neigh entries.


The host routing table is consulted via ip_route_output_flow() to map a  
destination ip address to a local netdev, and then if that device is T3,  
it will do the iscsi offload.



That is what RDMA does.. So that means that the IP used for iscsi is
actually an IP assigned to the interface?

  


The IP address assigned for the cxgb3i iscsi device is _not_ assigned to 
a netdev interface via ifconfig, as far as I understand it (by looking 
at the cxgb3i code).  So the host stack doesn't know about this 
address.  There is an administrative requirement, I assume, that the 
secret iscsi ipaddr is within a subnet that is bound to the T3 ethX 
interface.  Otherwise the routing lookup wouldn't work.



Steve.


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise

Roland Dreier wrote:

  Doesn't that mean the port collision problem still exists [for iSCSI],
  although probably less likely?

Yes, it's there, but almost impossible to hit: first of all, iSCSI HBAs
never listen on a port, so that can never collide.  Second, iSCSI HBAs
only establish connections to iSCSI targets on the iSCSI port -- so
really your only chance of a problem is if you ran an offloaded and
non-offloaded iSCSI initiator on the same IP to the same target, _and_
you got unlucky on the local ports that you chose.  So in practice no
one will hit this.
  


I believe, at least for cxgb3i, the ipaddr used is not bound to an ethX 
interface.  So the 4-tuple will never collide with host TCP connections.



Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise

Jason Gunthorpe wrote:

On Wed, Jun 23, 2010 at 03:11:27PM -0500, Steve Wise wrote:
  
The IP address assigned for the cxgb3i iscsi device is _not_ assigned to  
a netdev interface via ifconfig, as far as I understand it (by looking  
at the cxgb3i code).  So the host stack doesn't know about this address.  
There is an administrative requirement, I assume, that the secret iscsi 
ipaddr is within a subnet that is bound to the T3 ethX interface.  
Otherwise the routing lookup wouldn't work.



So who responds to neighbor queries, and how do outgoing queries get
sent with the right IP? Sounds odd...

  


The iscsi hba is only an initiator, so it doesn't need to respond to arp 
queries.  I guess the Source Protocol Address in the outgoing ARP 
request will be the ipaddr of the outgoing interface.   It's ok though 
because what is needed is the next-hop peer's hwaddr.   So the ARP reply 
comes in, updates the host neigh entry, and a NEIGH_EVENT callout is 
performed to the offload device drivers.  It is a little hackish, but 
that's the only way the netdev maintainers would allow iscsi offload 
in.  They originally tried to use the src address from the ethX 
interface for the offload iscsi connections and that was rejected.
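The ARP-reply-to-driver path can be sketched in userspace as a small notifier chain: the neighbour entry is updated, then every registered listener is called with the new entry. In the kernel this role is played by register_netevent_notifier() and NETEVENT_NEIGH_UPDATE; everything below (`notifier_chain`, `chain_register()`, `neigh_update()`, `offload_l2t`, `offload_on_neigh()`) is a hypothetical model, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LISTENERS 4

/* Simplified neighbour entry: next-hop IPv4 address and its MAC. */
struct neigh {
    uint32_t ip;
    uint8_t mac[6];
};

typedef void (*neigh_cb)(const struct neigh *n, void *ctx);

/* Toy stand-in for a netevent notifier chain. */
struct notifier_chain {
    neigh_cb cb[MAX_LISTENERS];
    void *ctx[MAX_LISTENERS];
    size_t count;
};

static void chain_register(struct notifier_chain *c, neigh_cb cb, void *ctx)
{
    assert(c->count < MAX_LISTENERS);
    c->cb[c->count] = cb;
    c->ctx[c->count] = ctx;
    c->count++;
}

/* Called when an ARP reply updates the host neighbour entry. */
static void neigh_update(struct notifier_chain *c, struct neigh *entry,
                         const uint8_t new_mac[6])
{
    memcpy(entry->mac, new_mac, 6);
    for (size_t i = 0; i < c->count; i++)
        c->cb[i](entry, c->ctx[i]);
}

/* Hypothetical offload-driver listener caching the next-hop MAC that it
 * would program into hardware. */
struct offload_l2t {
    uint32_t ip;
    uint8_t mac[6];
    int updates;
};

static void offload_on_neigh(const struct neigh *n, void *ctx)
{
    struct offload_l2t *l2t = ctx;
    if (n->ip != l2t->ip)
        return; /* not a next hop this driver cares about */
    memcpy(l2t->mac, n->mac, 6);
    l2t->updates++;
}
```

The driver never sends or parses ARP itself in this model; it only reacts to entries the host stack has already resolved, which matches the division of labour described above.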



Steve.





Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise

Steve Wise wrote:

Jason Gunthorpe wrote:

On Wed, Jun 23, 2010 at 03:11:27PM -0500, Steve Wise wrote:
 
The IP address assigned for the cxgb3i iscsi device is _not_ 
assigned to  a netdev interface via ifconfig, as far as I understand 
it (by looking  at the cxgb3i code).  So the host stack doesn't know 
about this address.  There is an administrative requirement, I 
assume, that the secret iscsi ipaddr is within a subnet that is 
bound to the T3 ethX interface.  Otherwise the routing lookup 
wouldn't work.



So who responds to neighbor queries, and how do outgoing queries get
sent with the right IP? Sounds odd...

  


The iscsi hba is only an initiator, so it doesn't need to respond to 
arp queries.  I guess the Source Protocol Address in the outgoing ARP 
request will be the ipaddr of the outgoing interface.   It's ok though 
because what is needed is the next-hop peer's hwaddr.   So the ARP 
reply comes in, updates the host neigh entry, and a NEIGH_EVENT 
callout is performed to the offload device drivers.  It is a little 
hackish, but that's the only way the netdev maintainers would allow 
iscsi offload in.  They originally tried to use the src address from 
the ethX interface for the offload iscsi connections and that was 
rejected.





In case you're interested, here is the tail end of the original cxgb3i
submission thread, showing the use of a private IP address which is
unknown to the OS.


http://marc.info/?l=linux-netdev&m=121944339211552




Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Jason Gunthorpe
On Wed, Jun 23, 2010 at 03:19:31PM -0500, Steve Wise wrote:

 So who responds to neighbor queries, and how do outgoing queries get
 sent with the right IP? Sounds odd...

 The iscsi hba is only an initiator, so it doesn't need to respond to arp  
 queries.  

Hmm.. The other side could arp you at any time, and if you don't answer
stuff can go bad, so something must be generating the replies.

But I guess that is separate; it sounds like iSCSI is in a similar boat
and they were not able to reconcile either? :(

Jason


Re: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Steve Wise

Jason Gunthorpe wrote:

On Wed, Jun 23, 2010 at 03:19:31PM -0500, Steve Wise wrote:

  

So who responds to neighbor queries, and how do outgoing queries get
sent with the right IP? Sounds odd...
  
The iscsi hba is only an initiator, so it doesn't need to respond to arp  
queries.  



Hmm.. The other side could arp you at any time, and if you don't answer
stuff can go bad, so something must be generating the replies.
  



You're right!  The low level driver, cxgb3, handles it.  See 
cxgb3_arp_process().   I missed this change in the original submission 
of cxgb3i...
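To make the "something must be generating the replies" point concrete, here is a userspace sketch of the decision a low-level driver has to make for a private address the host stack does not own: answer an ARP request only if its target protocol address is that private IP. It is not the code of cxgb3_arp_process(); `struct arp_pkt`, `struct port_info_sketch` and `arp_answer_private()` are hypothetical, fields are kept in host byte order, and only the RFC 826 opcodes (1 = request, 2 = reply) come from the protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ARP_OP_REQUEST 1 /* RFC 826 opcode */
#define ARP_OP_REPLY   2

/* Minimal ARP payload for IPv4 over Ethernet (host byte order). */
struct arp_pkt {
    uint16_t op;
    uint8_t sha[6];  /* sender hardware address */
    uint32_t spa;    /* sender protocol address */
    uint8_t tha[6];  /* target hardware address */
    uint32_t tpa;    /* target protocol address */
};

/* Hypothetical per-port state: private iSCSI address and port MAC. */
struct port_info_sketch {
    uint32_t private_ip;
    uint8_t mac[6];
};

/* If req asks for the private IP, build the reply and return true. */
static bool arp_answer_private(const struct port_info_sketch *pi,
                               const struct arp_pkt *req,
                               struct arp_pkt *reply)
{
    assert(pi != NULL && req != NULL && reply != NULL);
    if (req->op != ARP_OP_REQUEST || req->tpa != pi->private_ip)
        return false; /* leave everything else to the host stack */
    reply->op = ARP_OP_REPLY;
    memcpy(reply->sha, pi->mac, 6);   /* "private_ip is at our MAC" */
    reply->spa = pi->private_ip;
    memcpy(reply->tha, req->sha, 6);  /* send back to the requester */
    reply->tpa = req->spa;
    return true;
}
```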




But I guess that is separate; it sounds like iSCSI is in a similar boat
and they were not able to reconcile either? :(

Jason


RE: [PATCH v2] RDMA/CMA: fix iWARP adapter TCP port space usage

2010-06-23 Thread Tung, Chien Tin
On the other hand trying to hook offloaded iWARP into the normal stack
does seem to lead to a mess.  I see DaveM's point: TCP port space is
just the beginning -- filtering, queueing, etc also have config that
ultimately an offload device would want to hook too.
 
   TCP port space is just the beginning but then these features
   didn't show up all at once in the kernel either.  Instead of
   evolving the iWARP implementation, we can't even take a baby step
   and fix a flaw that exists in the current kernel.  Why are we
   replicating everything offered by the host stack instead of
   hooking in?  It does not sound like good engineering to me.
 
 Well as I said I don't particularly see a clean solution.  But the point
 I was making was that the net stack is already very complex with many
 places where interface configs are controlled -- having to add hooks to
 pass that config on to offload devices is going to add even more
 complexity and also add constraints to the format of that config
 information.  Which is not good.

I don't want separate config files for L2 and iWARP, as that adds more
work and complexity for the user.  I want it dead simple.  I can see
extending the config format to include information specific to offload,
but I don't see how it can limit the format.  That has not been the
case up to this point.  Also, the port space patch is totally transparent
to the user and the config file.  There is no managing of host TCP and
iWARP TCP port space by the user.
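The user-visible property described here is that a port taken by an iWARP connection should simply look "in use" to the host stack. One way to get that, assuming the patch works by holding a host TCP socket bound to the same address and port, is illustrated below with plain POSIX sockets in userspace: once one socket holds the port, a second bind() to it fails with EADDRINUSE. This only demonstrates the effect; it is not the kernel code from the patch, and `reserve_port()` is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1:port (0 lets the kernel choose).
 * Returns the fd and stores the bound port in *bound, or -1 with errno. */
static int reserve_port(uint16_t port, uint16_t *bound)
{
    assert(bound != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        int saved = errno;  /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }

    socklen_t len = sizeof(sa);
    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    *bound = ntohs(sa.sin_port);
    return fd;  /* keep open: the port stays reserved until close() */
}
```

Nothing here needs user configuration: as long as the holder socket stays open, any other bind() to that port in the host stack sees the conflict automatically.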

I'm not sure about passing config info to offload devices.  If the info
is outside of what the L2 driver currently picks up, then sure, some work
needs to be done.  Hopefully everything can be passed through from L2 to iWARP.


Chien






Re: mlx4 pci device table

2010-06-23 Thread Roland Dreier
  If the table is placed in mlx4_core (as it is today upstream), then I
  assume mlx4_en and mlx4_ib aren't probed by the PCI hotplug
  mechanisms, correct? Otherwise, if you put it in the _en, _ib, etc. files,
  then one has to maintain two copies of the table, but maybe that would
  be the correct approach? How should this work with multi-protocol mlx4
  devices and/or IBoE?

I think the current upstream location is correct.  This matches the
practice of eg iw_cxgb3 as well as cxgb3i, bnx2i etc.  This does have
the disadvantage that mlx4_en and mlx4_ib are not auto-loaded by PCI
hotplug, but so it goes.


Re: When IBoE will be merged to upstream?

2010-06-23 Thread Roland Dreier
  This is actually a continuation of the RAW_ET() issue. We want to
  submit the patches upstream, but there is no support for IB transport
  in Ethernet devices, and the upstream mlx4_en driver version is a bit
  outdated (1.4.1.1 upstream vs. 1.5.1 in OFED). VLAN support, which is
  already present in OFED, is also missing.
  When do you plan to submit the changes from OFED upstream?

 - I do not search for more things to merge upstream.  I have enough
   work reviewing things that are sent to me.  So I will never look
   through OFED for changes.

 - I do not handle the mlx4_en driver.  Changes for mlx4_en should go to
   netdev and Dave Miller.

 - I will try to get back to the IBoE changes when I have time, and I
   will admit that my time to spend as RDMA maintainer is nowhere near
   full time and less than it was in the past.

 - I did allocate a fair amount of time to spend on IBoE recently but
   unfortunately the patches were not really in a suitable state to
   merge, and I exhausted that time slice before we reached the end.
   When patch sets sit outside of the upstream kernel and are shipped in
   OFED for months and years, it would probably make upstream merging
   easier if that time was used to fix the patch set.

 - Specifically for the IBoE patches, shouldn't someone have realized
   that having a device-specific interface to do the standard mapping of
   GID to Ethernet address makes no sense?
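For reference, a device-independent form of that mapping is the IPv6 modified EUI-64 rule used for link-local GIDs: prefix fe80::/64, then the MAC with ff:fe inserted in the middle and the universal/local bit (0x02 of the first byte) flipped. The sketch below assumes that is the mapping meant and that no VLAN ID is folded into the GID; `mac_to_ll_gid()` and `ll_gid_to_mac()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Build a link-local GID (fe80::/64 + modified EUI-64) from a MAC. */
static void mac_to_ll_gid(const uint8_t mac[6], uint8_t gid[16])
{
    assert(mac != NULL && gid != NULL);
    memset(gid, 0, 16);
    gid[0] = 0xfe;
    gid[1] = 0x80;
    gid[8] = mac[0] ^ 0x02;  /* flip the universal/local bit */
    gid[9] = mac[1];
    gid[10] = mac[2];
    gid[11] = 0xff;          /* EUI-64 filler bytes */
    gid[12] = 0xfe;
    gid[13] = mac[3];
    gid[14] = mac[4];
    gid[15] = mac[5];
}

/* Recover the MAC; returns false if gid is not fe80::/64 with ff:fe. */
static bool ll_gid_to_mac(const uint8_t gid[16], uint8_t mac[6])
{
    static const uint8_t ll_prefix[8] = { 0xfe, 0x80 };
    if (memcmp(gid, ll_prefix, 8) != 0 || gid[11] != 0xff || gid[12] != 0xfe)
        return false;
    mac[0] = gid[8] ^ 0x02;
    mac[1] = gid[9];
    mac[2] = gid[10];
    mac[3] = gid[13];
    mac[4] = gid[14];
    mac[5] = gid[15];
    return true;
}
```

Since both directions depend only on the bytes of the GID and MAC, nothing in this mapping requires a per-device hook.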