Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire

2014-06-11 Thread Sagi Grimberg

On 6/11/2014 12:17 AM, Quinn Tran wrote:

SNIP


QT Instead of using existing value within cmd-data_length, can we
calculated data_length based on secstors  blocksize.

cmd-data_length = sectors * dev-dev_attrib.block_size;


We can do that I suppose...
Although it seems weird that the core discards the transport byte-count...
Just wandering if we should recalc on the DIF case only or always.



 From the QLogic perfective, the cmd-data_length is pull directly from the
wire (i.e. FCP header) when cmd is received.  In essence, it's whatever
the Initiator is set it to.  We does not have any indicator to spot DIF vs
none-DIF cmd when 1st received, unless the code do a peek.


Same for all transports I assume...



With that said, the cmd-data_length does not guarantee to contain both
data length  protection length when cmd is submit to
TCM/target_submit_cmd().  In Dif-Insert mode, data_length will only
contain the actual data(no Dif).


No, in the DOUT_INSERT/DIN_STRIP case, protect bits are off and the core 
will take the data length as is.

This case is covered.


It's best that the SBC code re-calculate the actual data length and dif
data length based on the number of sectors derived from the cmd.


Nic, what's your take on this?

Sagi.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/3] Include protection information in transport header

2014-06-11 Thread Sagi Grimberg
At the SCSI transport level, there is no distinction between
user data and protection information. Thus, iscsi header field
expected data transfer length should include protection
information.

Patch #1 introduces scsi helper scsi_transfer_length which computes
wire transfer byte count.

Patch #2 modifies iscsi initiator to set correct wire transfer length
in iscsi header data_length field (and modifies iser accordingly).

Patch #3 modifies target core to re-calculate the pure data length
in case of PI presence over the wire (and modifies loopback transport
to align with other transports).

Changes from v1:
- scsi_cmnd: Rewrite scsi_transfer_length as MKP suggested
- Target/sbc: re-calculate the data_length in case PI exists
  on the wire (instead od deacrease data -= prot)

Changes from v0:
- Introduce scsi helpers to compute correct transfer length in the
  presence of protection information.
- Modify iscsi to set correct transfer length using scsi helpers
- Modify loopback transport to set correct transfer length using
  scsi helpers

Sagi Grimberg (3):
  scsi_cmnd: Introduce scsi_transfer_length helper
  libiscsi, iser: Adjust data_length to include protection information
  TARGET/sbc,loopback: Adjust command data length in case pi exists on
the wire

 drivers/infiniband/ulp/iser/iser_initiator.c |   34 +++--
 drivers/scsi/libiscsi.c  |   18 +++---
 drivers/target/loopback/tcm_loop.c   |   15 +--
 drivers/target/target_core_sbc.c |   15 ++-
 include/scsi/scsi_cmnd.h |   17 +
 5 files changed, 61 insertions(+), 38 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/3] libiscsi, iser: Adjust data_length to include protection information

2014-06-11 Thread Sagi Grimberg
In case protection information exists over the wire
iscsi header data length is required to include it.
Use protection information aware scsi helpers to set
the correct transfer length.

In order to avoid breakage, remove iser transfer length
checks for each task as they are not always true and
somewhat redundant anyway.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
Reviewed-by: Mike Christie micha...@cs.wisc.edu
---
 drivers/infiniband/ulp/iser/iser_initiator.c |   34 +++--
 drivers/scsi/libiscsi.c  |   18 +++---
 2 files changed, 19 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
b/drivers/infiniband/ulp/iser/iser_initiator.c
index 2e2d903..8d44a40 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -41,11 +41,11 @@
 #include iscsi_iser.h
 
 /* Register user buffer memory and initialize passive rdma
- *  dto descriptor. Total data size is stored in
- *  iser_task-data[ISER_DIR_IN].data_len
+ *  dto descriptor. Data size is stored in
+ *  task-data[ISER_DIR_IN].data_len, Protection size
+ *  os stored in task-prot[ISER_DIR_IN].data_len
  */
-static int iser_prepare_read_cmd(struct iscsi_task *task,
-unsigned int edtl)
+static int iser_prepare_read_cmd(struct iscsi_task *task)
 
 {
struct iscsi_iser_task *iser_task = task-dd_data;
@@ -73,14 +73,6 @@ static int iser_prepare_read_cmd(struct iscsi_task *task,
return err;
}
 
-   if (edtl  iser_task-data[ISER_DIR_IN].data_len) {
-   iser_err(Total data length: %ld, less than EDTL: 
-%d, in READ cmd BHS itt: %d, conn: 0x%p\n,
-iser_task-data[ISER_DIR_IN].data_len, edtl,
-task-itt, iser_task-ib_conn);
-   return -EINVAL;
-   }
-
err = device-iser_reg_rdma_mem(iser_task, ISER_DIR_IN);
if (err) {
iser_err(Failed to set up Data-IN RDMA\n);
@@ -100,8 +92,9 @@ static int iser_prepare_read_cmd(struct iscsi_task *task,
 }
 
 /* Register user buffer memory and initialize passive rdma
- *  dto descriptor. Total data size is stored in
- *  task-data[ISER_DIR_OUT].data_len
+ *  dto descriptor. Data size is stored in
+ *  task-data[ISER_DIR_OUT].data_len, Protection size
+ *  is stored at task-prot[ISER_DIR_OUT].data_len
  */
 static int
 iser_prepare_write_cmd(struct iscsi_task *task,
@@ -135,14 +128,6 @@ iser_prepare_write_cmd(struct iscsi_task *task,
return err;
}
 
-   if (edtl  iser_task-data[ISER_DIR_OUT].data_len) {
-   iser_err(Total data length: %ld, less than EDTL: %d, 
-in WRITE cmd BHS itt: %d, conn: 0x%p\n,
-iser_task-data[ISER_DIR_OUT].data_len,
-edtl, task-itt, task-conn);
-   return -EINVAL;
-   }
-
err = device-iser_reg_rdma_mem(iser_task, ISER_DIR_OUT);
if (err != 0) {
iser_err(Failed to register write cmd RDMA mem\n);
@@ -417,11 +402,12 @@ int iser_send_command(struct iscsi_conn *conn,
if (scsi_prot_sg_count(sc)) {
prot_buf-buf  = scsi_prot_sglist(sc);
prot_buf-size = scsi_prot_sg_count(sc);
-   prot_buf-data_len = sc-prot_sdb-length;
+   prot_buf-data_len = data_buf-data_len 
+ilog2(sc-device-sector_size) * 8;
}
 
if (hdr-flags  ISCSI_FLAG_CMD_READ) {
-   err = iser_prepare_read_cmd(task, edtl);
+   err = iser_prepare_read_cmd(task);
if (err)
goto send_command_error;
}
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 26dc005..3f46234 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -338,7 +338,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
struct iscsi_session *session = conn-session;
struct scsi_cmnd *sc = task-sc;
struct iscsi_scsi_req *hdr;
-   unsigned hdrlength, cmd_len;
+   unsigned hdrlength, cmd_len, transfer_length;
itt_t itt;
int rc;
 
@@ -391,11 +391,11 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task 
*task)
if (scsi_get_prot_op(sc) != SCSI_PROT_NORMAL)
task-protected = true;
 
+   transfer_length = scsi_transfer_length(sc);
+   hdr-data_length = cpu_to_be32(transfer_length);
if (sc-sc_data_direction == DMA_TO_DEVICE) {
-   unsigned out_len = scsi_out(sc)-length;
struct iscsi_r2t_info *r2t = task-unsol_r2t;
 
-   hdr-data_length = cpu_to_be32(out_len);
hdr-flags |= ISCSI_FLAG_CMD_WRITE;
/*
 * Write counters:
@@ -414,18 +414,19 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task 
*task)
   

[PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper

2014-06-11 Thread Sagi Grimberg
In case protection information exists on the wire
scsi transports should include it in the transfer
byte count (even if protection information does not
exist in the host memory space). This helper will
compute the total transfer length from the scsi
command data length and protection attributes.

Signed-off-by: Sagi Grimberg sa...@mellanox.com
Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
---
 include/scsi/scsi_cmnd.h |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index dd7c998..a100c6e 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -7,6 +7,7 @@
 #include linux/types.h
 #include linux/timer.h
 #include linux/scatterlist.h
+#include scsi/scsi_device.h
 
 struct Scsi_Host;
 struct scsi_device;
@@ -306,4 +307,20 @@ static inline void set_driver_byte(struct scsi_cmnd *cmd, 
char status)
cmd-result = (cmd-result  0x00ff) | (status  24);
 }
 
+static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd)
+{
+   unsigned int xfer_len = blk_rq_bytes(scmd-request);
+   unsigned int prot_op = scsi_get_prot_op(scmd);
+   unsigned int sector_size = scmd-device-sector_size;
+
+   switch (prot_op) {
+   case SCSI_PROT_NORMAL:
+   case SCSI_PROT_WRITE_STRIP:
+   case SCSI_PROT_READ_INSERT:
+   return xfer_len;
+   }
+
+   return xfer_len + (xfer_len  ilog2(sector_size)) * 8;
+}
+
 #endif /* _SCSI_SCSI_CMND_H */
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire

2014-06-11 Thread Sagi Grimberg
In various areas of the code, it is assumed that
se_cmd-data_length describes pure data. In case
that protection information exists over the wire
(protect bits is are on) the target core re-calculates
the data length from the CDB and the backed device
block size (instead of each transport peeking in the cdb).

Modify loopback device to include protection information
in the transferred data length (like other scsi transports).

Signed-off-by: Sagi Grimberg sa...@mellanox.com
---
 drivers/target/loopback/tcm_loop.c |   15 ---
 drivers/target/target_core_sbc.c   |   15 +--
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/target/loopback/tcm_loop.c 
b/drivers/target/loopback/tcm_loop.c
index c886ad1..1f4c015 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -179,7 +179,7 @@ static void tcm_loop_submission_work(struct work_struct 
*work)
struct tcm_loop_hba *tl_hba;
struct tcm_loop_tpg *tl_tpg;
struct scatterlist *sgl_bidi = NULL;
-   u32 sgl_bidi_count = 0;
+   u32 sgl_bidi_count = 0, transfer_length;
int rc;
 
tl_hba = *(struct tcm_loop_hba **)shost_priv(sc-device-host);
@@ -213,12 +213,21 @@ static void tcm_loop_submission_work(struct work_struct 
*work)
 
}
 
-   if (!scsi_prot_sg_count(sc)  scsi_get_prot_op(sc) != SCSI_PROT_NORMAL)
+   transfer_length = scsi_transfer_length(sc);
+   if (!scsi_prot_sg_count(sc) 
+   scsi_get_prot_op(sc) != SCSI_PROT_NORMAL) {
se_cmd-prot_pto = true;
+   /*
+* loopback transport doesn't support
+* WRITE_GENERATE, READ_STRIP protection
+* information operations, go ahead unprotected.
+*/
+   transfer_length = scsi_bufflen(sc);
+   }
 
rc = target_submit_cmd_map_sgls(se_cmd, tl_nexus-se_sess, sc-cmnd,
tl_cmd-tl_sense_buf[0], tl_cmd-sc-device-lun,
-   scsi_bufflen(sc), tcm_loop_sam_attr(sc),
+   transfer_length, tcm_loop_sam_attr(sc),
sc-sc_data_direction, 0,
scsi_sglist(sc), scsi_sg_count(sc),
sgl_bidi, sgl_bidi_count,
diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index e022959..4b5716f 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -665,8 +665,19 @@ sbc_check_prot(struct se_device *dev, struct se_cmd *cmd, 
unsigned char *cdb,
 
cmd-prot_type = dev-dev_attrib.pi_prot_type;
cmd-prot_length = dev-prot_length * sectors;
-   pr_debug(%s: prot_type=%d, prot_length=%d prot_op=%d prot_checks=%d\n,
-__func__, cmd-prot_type, cmd-prot_length,
+
+   /**
+* In case protection information exists over the wire
+* we modify command data length to describe pure data.
+* The actual transfer length is data length + protection
+* length
+**/
+   if (protect)
+   cmd-data_length = sectors * dev-dev_attrib.block_size;
+
+   pr_debug(%s: prot_type=%d, data_length=%d, prot_length=%d 
+prot_op=%d prot_checks=%d\n,
+__func__, cmd-prot_type, cmd-data_length, cmd-prot_length,
 cmd-prot_op, cmd-prot_checks);
 
return true;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 for-next 02/16] RDMA/ocrdma: Query and initalize the PFC SL

2014-06-11 Thread Or Gerlitz

On 10/06/2014 17:02, Selvin Xavier wrote:

This patch implements routine to query the PFC priority from the adapter port.

Following are the changes implemented:

  * A new FW command is implemented to query the operational/admin
DCBX configuration from the FW and obtain active priority(service level).
  * Adds support for the async event reported by FW when the PFC priority 
changes.


+benet maintainers,

Any reason for all code relating to the above not to land in your 
Ethernet driver? the same FW serves  for plain Ethernet and RoCE, isn't 
that?




Service level is re-initialized during modify_qp or
create_ah, based on this event.
  * Maintain SL value in ocrdma_dev structure and refer that as and
when needed.

Signed-off-by: Devesh Sharma devesh.sha...@emulex.com
Signed-off-by: Selvin Xavier selvin.xav...@emulex.com
---
  drivers/infiniband/hw/ocrdma/ocrdma.h  |   21 
  drivers/infiniband/hw/ocrdma/ocrdma_ah.c   |2 +
  drivers/infiniband/hw/ocrdma/ocrdma_hw.c   |  172 
  drivers/infiniband/hw/ocrdma/ocrdma_hw.h   |2 +
  drivers/infiniband/hw/ocrdma/ocrdma_main.c |1 +
  drivers/infiniband/hw/ocrdma/ocrdma_sli.h  |   81 +-
  6 files changed, 278 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h 
b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 19011db..5cd65c2 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -236,6 +236,9 @@ struct ocrdma_dev {
struct rcu_head rcu;
int id;
u64 stag_arr[OCRDMA_MAX_STAG];
+   u8 sl; /* service level */
+   bool pfc_state;
+   atomic_t update_sl;
u16 pvid;
u32 asic_id;
  
@@ -518,4 +521,22 @@ static inline u8 ocrdma_get_asic_type(struct ocrdma_dev *dev)

OCRDMA_SLI_ASIC_GEN_NUM_SHIFT;
  }
  
+static inline u8 ocrdma_get_pfc_prio(u8 *pfc, u8 prio)

+{
+   return *(pfc + prio);
+}
+
+static inline u8 ocrdma_get_app_prio(u8 *app_prio, u8 prio)
+{
+   return *(app_prio + prio);
+}
+
+static inline u8 ocrdma_is_enabled_and_synced(u32 state)
+{  /* May also be used to interpret TC-state, QCN-state
+* Appl-state and Logical-link-state in future.
+*/
+   return (state  OCRDMA_STATE_FLAG_ENABLED) 
+   (state  OCRDMA_STATE_FLAG_SYNC);
+}
+
  #endif
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index d4cc01f..a023234 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -100,6 +100,8 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct 
ib_ah_attr *attr)
if (!(attr-ah_flags  IB_AH_GRH))
return ERR_PTR(-EINVAL);
  
+	if (atomic_cmpxchg(dev-update_sl, 1, 0))

+   ocrdma_init_service_level(dev);
ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
if (!ah)
return ERR_PTR(-ENOMEM);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c 
b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index bce4adf..e6463cb 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -771,6 +771,10 @@ static void ocrdma_process_grp5_aync(struct ocrdma_dev 
*dev,
OCRDMA_AE_PVID_MCQE_TAG_MASK) 
OCRDMA_AE_PVID_MCQE_TAG_SHIFT);
break;
+
+   case OCRDMA_ASYNC_EVENT_COS_VALUE:
+   atomic_set(dev-update_sl, 1);
+   break;
default:
/* Not interested evts. */
break;
@@ -2265,6 +2269,8 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
  
  	if ((ah_attr-ah_flags  IB_AH_GRH) == 0)

return -EINVAL;
+   if (atomic_cmpxchg(qp-dev-update_sl, 1, 0))
+   ocrdma_init_service_level(qp-dev);
cmd-params.tclass_sq_psn |=
(ah_attr-grh.traffic_class  OCRDMA_QP_PARAMS_TCLASS_SHIFT);
cmd-params.rnt_rc_sl_fl |=
@@ -2298,6 +2304,10 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
cmd-params.vlan_dmac_b4_to_b5 |=
vlan_id  OCRDMA_QP_PARAMS_VLAN_SHIFT;
cmd-flags |= OCRDMA_QP_PARA_VLAN_EN_VALID;
+   /* override the sl with default priority if 0 */
+   cmd-params.rnt_rc_sl_fl |=
+   (ah_attr-sl ? ah_attr-sl :
+   qp-dev-sl)  OCRDMA_QP_PARAMS_SL_SHIFT;
}
return 0;
  }
@@ -2605,6 +2615,168 @@ int ocrdma_mbx_destroy_srq(struct ocrdma_dev *dev, 
struct ocrdma_srq *srq)
return status;
  }
  
+static int ocrdma_mbx_get_dcbx_config(struct ocrdma_dev *dev, u32 ptype,

+ struct ocrdma_dcbx_cfg *dcbxcfg)
+{
+   int status = 0;
+   dma_addr_t pa;
+   struct ocrdma_mqe cmd;
+
+   struct ocrdma_get_dcbx_cfg_req *req = NULL;
+   struct 

Re: [PATCH libibverbs V1] Align Flow Steering API to the extension verbs scheme

2014-06-11 Thread Or Gerlitz

On 20/05/2014 09:59, Or Gerlitz wrote:

On 20/05/2014 04:05, Roland Dreier wrote:
On Thu, May 15, 2014 at 6:16 AM, Or Gerlitz ogerl...@mellanox.com 
wrote:

Hi Roland -- this addresses all the comments by Jason who gave a
ack/reviewed-by note for both the libibverbs and libmlx4 fixes.

So you can move ahead and conduct point releases for both libs.


OK... so I should do one more libmlx4 release before you take over?


YES, please.



Hi Roland,

Can you please go and do this joint/point release of libibverbs (1.1.9) 
and libmlx4 (1.0.7)

which fixes the issues with flow steering?!

I got distro contact which was about to take now 1.1.8/1.0.6 and asked 
them to wait few days and take the proper bits, so...


Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH libibverbs V1] Align Flow Steering API to the extension verbs scheme

2014-06-11 Thread Or Gerlitz

On 11/06/2014 12:53, Or Gerlitz wrote:

On 20/05/2014 09:59, Or Gerlitz wrote:

On 20/05/2014 04:05, Roland Dreier wrote:
On Thu, May 15, 2014 at 6:16 AM, Or Gerlitz ogerl...@mellanox.com 
wrote:

Hi Roland -- this addresses all the comments by Jason who gave a
ack/reviewed-by note for both the libibverbs and libmlx4 fixes.

So you can move ahead and conduct point releases for both libs.


OK... so I should do one more libmlx4 release before you take over?


YES, please.



Hi Roland,

Can you please go and do this joint/point release of libibverbs 
(1.1.9) and libmlx4 (1.0.7)

which fixes the issues with flow steering?!


FWIW, if it helps you, these are the relevant patchwork entries

https://patchwork.kernel.org/patch/4182461/
https://patchwork.kernel.org/patch/4174891/


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH libibverbs V4 2/5] Extend create_ah in order to pass L2 parameters to the provider

2014-06-11 Thread Or Gerlitz

On 22/05/2014 11:52, Matan Barak wrote:

On 21/5/2014 10:55 PM, Jason Gunthorpe wrote:

On Sun, May 18, 2014 at 12:38:46PM +0300, Or Gerlitz wrote:

From: Matan Barak mat...@mellanox.com

In order to support IP based addressing, one needs to pass the L2
parameters to the provider drive. This is done through a new extendable


'driver'

Please provide a man page. In this case you probably want to provide a
patch to the existing man/ibv_create_ah.3 to discuss the 2nd entry
point and new fields.



Ok, I'll add this entry to the ibv_create_ah man page.


+enum ibv_ah_attr_ex_attr_mask {
+IBV_AH_ATTR_EX_LL = 1UL  0,
+IBV_AH_ATTR_EX_VID = 1UL  1,
+};


OK


+struct ibv_ah_attr_ex {
+struct ibv_global_routegrh;
+uint16_tdlid;
+uint8_tsl;
+uint8_tsrc_path_bits;
+uint8_tstatic_rate;
+uint8_tis_global;
+uint8_tport_num;
+uint32_tcomp_mask;


OK


+union {
+struct sockaddrsa;
+struct sockaddr_storage _s;
+} ll;
+uint16_tvid;
+};


Hard to know exactly what is going on here without a man page, but I
thought one of the points was to provide an ethernet L2 MAC address?
Shouldn't there be a mechanism for that?

I see this:

+   attr_ex.ll.sa.sa_family = ARPHRD_ETHER;

Which makes no sense, sa_family should be an AF_* constant.



Kernel code *sometimes* use sa_family as ARPHRD_ETHER.


So, I think this bit needs some kind of work. If you want to setup
ethernet, then setup ethernet:

uint64_t eth_dmac
uint16_t eth_vid;

and what about all the trendy new ethernet stuff, overlay networks,
etc? Can that be expressed in there?



I don't want to make this Ethernet specific. That's why the previous 
pointer used a pointer. I don't mind creating a generic interface for 
link layer if the existing solutions aren't good enough. Any suggestions?


Hi Jason,

Can you please address  Matan's comments? we'd like this thread to 
converge...


Or.




@@ -975,6 +998,8 @@ enum verbs_context_mask {

  struct verbs_context {
  /*  grows up - new fields go here */
+struct ibv_ah * (*create_ah_ex)(struct ibv_pd *pd,
+struct ibv_ah_attr_ex *attr);


Can you check if 'attr' should be const? I see the existing API isn't
const, but I am suspecting that is a mistake?


You're probably right, but wouldn't we want to be aligned with the 
non-extended verb:
struct ibv_ah * (*create_ah)(struct ibv_pd *pd, struct 
ibv_ah_attr *attr);
Since the provider driver usually go through a common path for both 
the non-extended and the extended verb, this could cause an ugly const 
cast.




Jason



Matan



--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH libibverbs V4 2/5] Extend create_ah in order to pass L2 parameters to the provider

2014-06-11 Thread Jason Gunthorpe
On Wed, Jun 11, 2014 at 01:54:56PM +0300, Or Gerlitz wrote:

 Can you please address  Matan's comments? we'd like this thread to
 converge...

I am waiting for your analysis as I mentioned in my last message to
this thread:

https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg19925.html

That is rather a larger question, and the answer may well be we don't
need two new extended verbs to do what you want...

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] ibacm provider infrastructure

2014-06-11 Thread Hefty, Sean
A couple of months ago, I posted the start of a patch series to re-architect 
ibacm around supporting multiple providers.

http://www.spinics.net/lists/linux-rdma/msg19363.html

This more easily allows ibacm to support other mechanisms for resolving address 
and path record data, with support for ibssa (Scalable SA) as one obvious 
candidate.

A complete set of patches that completes the re-architecture is now available 
in the ibacm 'prov' branch.

http://git.openfabrics.org/?p=~shefty/ibacm.git;a=shortlog;h=refs/heads/prov

In total, there are 40-50 patches, with most of the work actually done by Kaike 
Wan.  Because of the size and limited audience interested in this work, I don't 
plan on posting these to the mail list.  But anyone interested in reviewing the 
patches should feel free to comment on them.  Likely the most interesting 
result of the work is the provider interface to ibacm, which is defined in 
include/infiniband/acm_prov.h:

http://git.openfabrics.org/?p=~shefty/ibacm.git;a=blob;f=include/infiniband/acm_prov.h;h=dcfdf5fce7373a7d57bcbe3fe54d6fe5d81935e2;hb=refs/heads/prov

- Sean
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


DMA window vs actual allocatable DMA memory

2014-06-11 Thread Bob Biloxi
Hi All,

I am having trouble understanding DMA window and actual amount of
addressable DMA memory.

I hope someone explains me. Let me put my understanding and doubts here:

Let's say I am writing code for an ethernet device driver in the
virtualisation(hypervisor) environment.

Now, if the ethernet adapter requires certain amount of DMA memory, I
need to allocate heap memory and dma map it and provide to the
adapter.

From the hardware perspective, we have a 64GB DMA window.

I am having trouble understanding this value. Does it mean i can
allocate 64GB of RAM(Heap memory) and dma map it?

As i understand there might be a table that translates bus address to
physical(RAM) addresses. Each entry of such table points to a 4KB
page. If the size of each entry is 8 bytes and there are 16M such
entries( 16M * 4K = 64GB DMA window), the size of the table comes to
around 128M

Now do I have 64GB DMA memory or 128M DMA memory?

I want to know what is the max amount of memory that I can
allocate(heap), dma map and provide it to the adapter.

I will be really thankful in all the help that I can get!!

Thanks so much


Best Regards,
Marc
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/1] IPoIB: Improve parent interface p_key handling.

2014-06-11 Thread Alex Estrin
With reference to commit c2904141696ee19551f155396f23cdd5d95e
(IPoIB: Fix pkey change flow for virtualization environments)
It was noticed that parent interface keeps sending broadcast group
join requests if p_key index 0 is invalid. That creates unnecessary
noise on a fabric:

ib0: multicast join failed for ff12:401b:8000:::::,
status -22

Proposed patch re-init resources, then brings interface
up only if p_key idx 0 is valid either on bootup or on PKEY_CHANGE event.
Original code could run multicast task regardless of p_key value,
which we want to avoid.


Reviewed-by: Ira Weiny ira.we...@intel.com
Signed-off-by: Alex Estrin alex.est...@intel.com
---

Changes from v2:
Handle pkey change event for a case when interface is down
(pointed out by Erez Shitrit ere...@dev.mellanox.co.il).

Changes from v1:
p_key check for 'Invalid' value was moved to
ipoib_pkey_dev_check_presence() that is used now in ipoib_ib_dev_open()
for p_key validation.
Whitespace and format adjusted.

 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   35 ++
 1 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 6a7003d..a2af996 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -52,6 +52,7 @@ MODULE_PARM_DESC(data_debug_level,
 #endif
 
 static DEFINE_MUTEX(pkey_mutex);
+static void ipoib_pkey_dev_check_presence(struct net_device *dev);
 
 struct ipoib_ah *ipoib_create_ah(struct net_device *dev,
 struct ib_pd *pd, struct ib_ah_attr *attr)
@@ -669,12 +670,13 @@ int ipoib_ib_dev_open(struct net_device *dev)
struct ipoib_dev_priv *priv = netdev_priv(dev);
int ret;
 
-   if (ib_find_pkey(priv-ca, priv-port, priv-pkey, priv-pkey_index)) {
-   ipoib_warn(priv, P_Key 0x%04x not found\n, priv-pkey);
-   clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
+   ipoib_pkey_dev_check_presence(dev);
+
+   if (!test_bit(IPOIB_PKEY_ASSIGNED, priv-flags)) {
+   ipoib_warn(priv, P_Key 0x%04x is %s\n, priv-pkey,
+   !(priv-pkey  0x7fff) ? Invalid : not found);
return -1;
}
-   set_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
 
ret = ipoib_init_qp(dev);
if (ret) {
@@ -712,9 +714,10 @@ dev_stop:
 static void ipoib_pkey_dev_check_presence(struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
-   u16 pkey_index = 0;
 
-   if (ib_find_pkey(priv-ca, priv-port, priv-pkey, pkey_index))
+   if (!(priv-pkey  0x7fff) ||
+   ib_find_pkey(priv-ca, priv-port, priv-pkey,
+   priv-pkey_index))
clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
else
set_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
@@ -987,15 +990,17 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv 
*priv,
up_read(priv-vlan_rwsem);
 
if (!test_bit(IPOIB_FLAG_INITIALIZED, priv-flags)) {
-   /* for non-child devices must check/update the pkey value here 
*/
-   if (level == IPOIB_FLUSH_HEAVY 
-   !test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
-   update_parent_pkey(priv);
-   ipoib_dbg(priv, Not flushing - IPOIB_FLAG_INITIALIZED not 
set.\n);
-   return;
+   if (level != IPOIB_FLUSH_HEAVY) {
+   ipoib_dbg(priv, Not flushing - IPOIB_FLAG_INITIALIZED 
not set.\n);
+   return;
+   }
}
 
if (!test_bit(IPOIB_FLAG_ADMIN_UP, priv-flags)) {
+   /* interface is down. update pkey and leave. */
+   if (level == IPOIB_FLUSH_HEAVY 
+   !test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
+   update_parent_pkey(priv);
ipoib_dbg(priv, Not flushing - IPOIB_FLAG_ADMIN_UP not 
set.\n);
return;
}
@@ -1038,8 +1043,10 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv 
*priv,
ipoib_ib_dev_down(dev, 0);
 
if (level == IPOIB_FLUSH_HEAVY) {
-   ipoib_ib_dev_stop(dev, 0);
-   ipoib_ib_dev_open(dev);
+   if (test_bit(IPOIB_FLAG_INITIALIZED, priv-flags))
+   ipoib_ib_dev_stop(dev, 0);
+   if (ipoib_ib_dev_open(dev) != 0)
+   return;
}
 
/*
-- 
1.7.0.7

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/1] IPoIB: Improve parent interface p_key handling.

2014-06-11 Thread Alex Estrin
Changes from v2:
Handle pkey change event for a case when interface is down
(pointed out by Erez Shitrit ere...@dev.mellanox.co.il).

Changes from v1:
p_key check for 'Invalid' value was moved to
ipoib_pkey_dev_check_presence() that is used now in ipoib_ib_dev_open()
for p_key validation.
Whitespace and format adjusted.

---

Alex Estrin (1):
  IPoIB: Improve parent interface p_key handling.


 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   35 +++
 1 files changed, 21 insertions(+), 14 deletions(-)

--
 
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/1] IPoIB: Improve parent interface p_key handling.

2014-06-11 Thread Alex Estrin
With reference to commit c2904141696ee19551f155396f23cdd5d95e
(IPoIB: Fix pkey change flow for virtualization environments)
It was noticed that parent interface keeps sending broadcast group
join requests if p_key index 0 is invalid. That creates unnecessary
noise on a fabric:

ib0: multicast join failed for ff12:401b:8000:::::,
status -22

Proposed patch re-init resources, then brings interface
up only if p_key idx 0 is valid either on bootup or on PKEY_CHANGE event.
Original code could run multicast task regardless of p_key value,
which we want to avoid.

Reviewed-by: Ira Weiny ira.we...@intel.com
Signed-off-by: Alex Estrin alex.est...@intel.com
---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   35 +++
 1 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 6a7003d..a2af996 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -52,6 +52,7 @@ MODULE_PARM_DESC(data_debug_level,
 #endif
 
 static DEFINE_MUTEX(pkey_mutex);
+static void ipoib_pkey_dev_check_presence(struct net_device *dev);
 
 struct ipoib_ah *ipoib_create_ah(struct net_device *dev,
 struct ib_pd *pd, struct ib_ah_attr *attr)
@@ -669,12 +670,13 @@ int ipoib_ib_dev_open(struct net_device *dev)
struct ipoib_dev_priv *priv = netdev_priv(dev);
int ret;
 
-   if (ib_find_pkey(priv-ca, priv-port, priv-pkey, priv-pkey_index)) {
-   ipoib_warn(priv, P_Key 0x%04x not found\n, priv-pkey);
-   clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
+   ipoib_pkey_dev_check_presence(dev);
+
+   if (!test_bit(IPOIB_PKEY_ASSIGNED, priv-flags)) {
+   ipoib_warn(priv, P_Key 0x%04x is %s\n, priv-pkey,
+   !(priv-pkey  0x7fff) ? Invalid : not found);
return -1;
}
-   set_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
 
ret = ipoib_init_qp(dev);
if (ret) {
@@ -712,9 +714,10 @@ dev_stop:
 static void ipoib_pkey_dev_check_presence(struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
-   u16 pkey_index = 0;
 
-   if (ib_find_pkey(priv-ca, priv-port, priv-pkey, pkey_index))
+   if (!(priv-pkey  0x7fff) ||
+   ib_find_pkey(priv-ca, priv-port, priv-pkey,
+   priv-pkey_index))
clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
else
set_bit(IPOIB_PKEY_ASSIGNED, priv-flags);
@@ -987,15 +990,17 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv 
*priv,
up_read(priv-vlan_rwsem);
 
if (!test_bit(IPOIB_FLAG_INITIALIZED, priv-flags)) {
-   /* for non-child devices must check/update the pkey value here 
*/
-   if (level == IPOIB_FLUSH_HEAVY 
-   !test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
-   update_parent_pkey(priv);
-   ipoib_dbg(priv, Not flushing - IPOIB_FLAG_INITIALIZED not 
set.\n);
-   return;
+   if (level != IPOIB_FLUSH_HEAVY) {
+   ipoib_dbg(priv, Not flushing - IPOIB_FLAG_INITIALIZED 
not set.\n);
+   return;
+   }
}
 
if (!test_bit(IPOIB_FLAG_ADMIN_UP, priv-flags)) {
+   /* interface is down. update pkey and leave. */
+   if (level == IPOIB_FLUSH_HEAVY 
+   !test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags))
+   update_parent_pkey(priv);
ipoib_dbg(priv, Not flushing - IPOIB_FLAG_ADMIN_UP not 
set.\n);
return;
}
@@ -1038,8 +1043,10 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv 
*priv,
ipoib_ib_dev_down(dev, 0);
 
if (level == IPOIB_FLUSH_HEAVY) {
-   ipoib_ib_dev_stop(dev, 0);
-   ipoib_ib_dev_open(dev);
+   if (test_bit(IPOIB_FLAG_INITIALIZED, priv-flags))
+   ipoib_ib_dev_stop(dev, 0);
+   if (ipoib_ib_dev_open(dev) != 0)
+   return;
}
 
/*

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v3 1/1] sorry for duplicate emails.

2014-06-11 Thread Estrin, Alex
Hi guys,

I'm really sorry for duplicate e-mails I have sent out.
 Somehow messages didn't get back to my list, so I assumed  emails were dropped 
.

Kind regards,
Alex Estrin.



Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire

2014-06-11 Thread Nicholas A. Bellinger
On Wed, 2014-06-11 at 10:24 +0300, Sagi Grimberg wrote:
 On 6/11/2014 12:17 AM, Quinn Tran wrote:
 
 SNIP
 
  QT Instead of using existing value within cmd-data_length, can we
  calculated data_length based on secstors  blocksize.
 
  cmd-data_length = sectors * dev-dev_attrib.block_size;
 
 We can do that I suppose...
 Although it seems weird that the core discards the transport byte-count...
 Just wandering if we should recalc on the DIF case only or always.
 

Yeah, same concern here wrt to discarding the transport length.

This effectively makes the residual handling further down in
sbc_parse_cdb() - target_cmd_size_check() always match size ==
cmd-data_length, because the latter as been recalculated by the former.

Honestly, I don't know if this is a problem for normal READ + WRITE with
prot=1 as I've never seen size != cmd-data_length for I/O related
commands, but it does seem potentially dangerous.

 
   From the QLogic perfective, the cmd-data_length is pull directly from the
  wire (i.e. FCP header) when cmd is received.  In essence, it's whatever
  the Initiator is set it to.  We does not have any indicator to spot DIF vs
  none-DIF cmd when 1st received, unless the code do a peek.
 
 Same for all transports I assume...
 

So just to confirm Quinn, the Qlogic the initiator includes the PI bytes
into the FCP header data_length for the target-side *_PASS and
DOUT_STRIP / DIN_INSERT, that is currently passed into
qla_tgt_ops-handle_cmd(), right..?

If that is the case, Sagi's v1 to cmd-data_length -= cmd-prot_length
seems like it would still do right thing for *_PASS and DOUT_STRIP /
DIN_INSERT, given that cmd-prot_length is calculated in
sbc_check_prot() based upon dev-prot_length * sectors..

 
  With that said, the cmd-data_length does not guarantee to contain both
  data length  protection length when cmd is submit to
  TCM/target_submit_cmd().  In Dif-Insert mode, data_length will only
  contain the actual data(no Dif).
 
 No, in the DOUT_INSERT/DIN_STRIP case, protect bits are off and the core 
 will take the data length as is.
 This case is covered.
 

nod

  It's best that the SBC code re-calculate the actual data length and dif
  data length based on the number of sectors derived from the cmd.
 
 Nic, what's your take on this?
 

Hard to say.  Discarding the transport length in v2 doesn't seem like a
good idea, but subtracting from cmd-prot_length in v1 is using the
sector count from the CDB anyways, so it's essentially the same tradeoff
of recalculating the transport's cmd-data_length from values within the
CDB w/ prot=1.

MKP, WDYT..?

--nab




--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] Include protection information in transport header

2014-06-11 Thread Nicholas A. Bellinger
On Wed, 2014-06-11 at 12:09 +0300, Sagi Grimberg wrote:
 At the SCSI transport level, there is no distinction between
 user data and protection information. Thus, iscsi header field
 expected data transfer length should include protection
 information.
 
 Patch #1 introduces scsi helper scsi_transfer_length which computes
 wire transfer byte count.
 
 Patch #2 modifies iscsi initiator to set correct wire transfer length
 in iscsi header data_length field (and modifies iser accordingly).
 
 Patch #3 modifies target core to re-calculate the pure data length
 in case of PI presence over the wire (and modifies loopback transport
 to align with other transports).
 
 Changes from v1:
 - scsi_cmnd: Rewrite scsi_transfer_length as MKP suggested
 - Target/sbc: re-calculate the data_length in case PI exists
   on the wire (instead od deacrease data -= prot)
 
 Changes from v0:
 - Introduce scsi helpers to compute correct transfer length in the
   presence of protection information.
 - Modify iscsi to set correct transfer length using scsi helpers
 - Modify loopback transport to set correct transfer length using
   scsi helpers
 
 Sagi Grimberg (3):
   scsi_cmnd: Introduce scsi_transfer_length helper
   libiscsi, iser: Adjust data_length to include protection information
   TARGET/sbc,loopback: Adjust command data length in case pi exists on
 the wire
 
  drivers/infiniband/ulp/iser/iser_initiator.c |   34 +++--
  drivers/scsi/libiscsi.c  |   18 +++---
  drivers/target/loopback/tcm_loop.c   |   15 +--
  drivers/target/target_core_sbc.c |   15 ++-
  include/scsi/scsi_cmnd.h |   17 +
  5 files changed, 61 insertions(+), 38 deletions(-)
 

Ok, I've applied these to for-next, but let's see what MKP recommends
for patch #3 wrt to recalculating cmd-data_length.

Thanks Sagi!

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire

2014-06-11 Thread Quinn Tran


On 6/11/14 2:30 PM, Nicholas A. Bellinger n...@linux-iscsi.org wrote:

On Wed, 2014-06-11 at 10:24 +0300, Sagi Grimberg wrote:
 On 6/11/2014 12:17 AM, Quinn Tran wrote:

 SNIP

  QT Instead of using existing value within cmd-data_length, can we
  calculated data_length based on secstors  blocksize.
 
  cmd-data_length = sectors * dev-dev_attrib.block_size;

 We can do that I suppose...
 Although it seems weird that the core discards the transport
byte-count...
 Just wandering if we should recalc on the DIF case only or always.


Yeah, same concern here wrt to discarding the transport length.

This effectively makes the residual handling further down in
sbc_parse_cdb() - target_cmd_size_check() always match size ==
cmd-data_length, because the latter as been recalculated by the former.

Honestly, I don't know if this is a problem for normal READ + WRITE with
prot=1 as I've never seen size != cmd-data_length for I/O related
commands, but it does seem potentially dangerous.

 
   From the QLogic perfective, the cmd-data_length is pull directly
from the
  wire (i.e. FCP header) when cmd is received.  In essence, it's
whatever
  the Initiator is set it to.  We does not have any indicator to spot
DIF vs
  none-DIF cmd when 1st received, unless the code do a peek.

 Same for all transports I assume...


So just to confirm Quinn, the Qlogic the initiator includes the PI bytes
into the FCP header data_length for the target-side *_PASS and
DOUT_STRIP / DIN_INSERT, that is currently passed into
qla_tgt_ops-handle_cmd(), right..?

QT
Initiator:
DOUT_STRIP/DIN_Insert:  FCP_DL = data length only (no dif length)
Dif PASS: FCP_DL = data length + Dif length.

Target:
DOUT_strip/DIN_Insert:  qla_tgt_ops-handle_cmd(data length only)

sbc_check_prot()
If (protect)
   cmd-data_length -= cmd-prot_length;   truncation


---
DIF_PASS: qla_tgt_ops-handle_cmd(data length + Dif length)



If that is the case, Sagi's v1 to cmd-data_length -= cmd-prot_length
seems like it would still do right thing for *_PASS and DOUT_STRIP /
DIN_INSERT, given that cmd-prot_length is calculated in
sbc_check_prot() based upon dev-prot_length * sectors..

 
  With that said, the cmd-data_length does not guarantee to contain
both
  data length  protection length when cmd is submit to
  TCM/target_submit_cmd().  In Dif-Insert mode, data_length will only
  contain the actual data(no Dif).

 No, in the DOUT_INSERT/DIN_STRIP case, protect bits are off and the
core
 will take the data length as is.
 This case is covered.


nod

QT agree.


  It's best that the SBC code re-calculate the actual data length and
dif
  data length based on the number of sectors derived from the cmd.

 Nic, what's your take on this?


Hard to say.  Discarding the transport length in v2 doesn't seem like a
good idea, but subtracting from cmd-prot_length in v1 is using the
sector count from the CDB anyways, so it's essentially the same tradeoff
of recalculating the transport's cmd-data_length from values within the
CDB w/ prot=1.

MKP, WDYT..?

--nab








This message and any attached documents contain information from QLogic 
Corporation or its wholly-owned subsidiaries that may be confidential. If you 
are not the intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission in error, please notify the 
sender immediately by reply e-mail and then delete this message.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper

2014-06-11 Thread Martin K. Petersen
 Sagi == Sagi Grimberg sa...@mellanox.com writes:

Sagi In case protection information exists on the wire scsi transports
Sagi should include it in the transfer byte count (even if protection
Sagi information does not exist in the host memory space). This helper
Sagi will compute the total transfer length from the scsi command data
Sagi length and protection attributes.

Looks good!

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: DMA window vs actual allocatable DMA memory

2014-06-11 Thread Bjorn Helgaas
On Wed, Jun 11, 2014 at 12:23 PM, Bob Biloxi iambobbil...@gmail.com wrote:
 Hi All,

 I am having trouble understanding DMA window and actual amount of
 addressable DMA memory.

 I hope someone explains me. Let me put my understanding and doubts here:

 Let's say I am writing code for an ethernet device driver in the
 virtualisation(hypervisor) environment.

 Now, if the ethernet adapter requires certain amount of DMA memory, I
 need to allocate heap memory and dma map it and provide to the
 adapter.

 From the hardware perspective, we have a 64GB DMA window.

 I am having trouble understanding this value. Does it mean i can
 allocate 64GB of RAM(Heap memory) and dma map it?

 As i understand there might be a table that translates bus address to
 physical(RAM) addresses. Each entry of such table points to a 4KB
 page. If the size of each entry is 8 bytes and there are 16M such
 entries( 16M * 4K = 64GB DMA window), the size of the table comes to
 around 128M

 Now do I have 64GB DMA memory or 128M DMA memory?

Documentation/DMA-API-HOWTO.txt might help answer your questions.

The bus address to RAM address translation is done by a IOMMU
hardware.  The tables you mention are I/O page tables used by the
IOMMU.  The 128M occupied by the tables is kernel bookkeeping overhead
and has nothing to do with the adapter itself.

The 64GB DMA window might be a hardware feature of the device, i.e.,
maybe it can only generate 36-bit DMA addresses.  That doesn't mean
you have to allocate memory for the whole window; I would guess
drivers would only allocate and map what they need.  I don't know how
they figure out how much to map.

 I want to know what is the max amount of memory that I can
 allocate(heap), dma map and provide it to the adapter.

 I will be really thankful in all the help that I can get!!

 Thanks so much


 Best Regards,
 Marc
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


linux-next: manual merge of the net-next tree with Linus' tree

2014-06-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in
drivers/infiniband/hw/cxgb4/cm.c between commits 11b8e22d4d09
(RDMA/cxgb4: Fix vlan support) and 9eccfe109b27 (RDMA/cxgb4: Add
support for iWARP Port Mapper user space service) from Linus' tree and
commits 92e7ae71726c (iw_cxgb4: Choose appropriate hw mtu index and
ISS for iWARP connections) and b408ff282dda (iw_cxgb4: don't truncate
the recv window size) from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc drivers/infiniband/hw/cxgb4/cm.c
index 96d7131ab974,965eaafd5851..
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@@ -533,38 -532,17 +537,49 @@@ static int send_abort(struct c4iw_ep *e
return c4iw_l2t_send(ep-com.dev-rdev, skb, ep-l2t);
  }
  
 +/*
 + * c4iw_form_pm_msg - Form a port mapper message with mapping info
 + */
 +static void c4iw_form_pm_msg(struct c4iw_ep *ep,
 +  struct iwpm_sa_data *pm_msg)
 +{
 +  memcpy(pm_msg-loc_addr, ep-com.local_addr,
 +  sizeof(ep-com.local_addr));
 +  memcpy(pm_msg-rem_addr, ep-com.remote_addr,
 +  sizeof(ep-com.remote_addr));
 +}
 +
 +/*
 + * c4iw_form_reg_msg - Form a port mapper message with dev info
 + */
 +static void c4iw_form_reg_msg(struct c4iw_dev *dev,
 +  struct iwpm_dev_data *pm_msg)
 +{
 +  memcpy(pm_msg-dev_name, dev-ibdev.name, IWPM_DEVNAME_SIZE);
 +  memcpy(pm_msg-if_name, dev-rdev.lldi.ports[0]-name,
 +  IWPM_IFNAME_SIZE);
 +}
 +
 +static void c4iw_record_pm_msg(struct c4iw_ep *ep,
 +  struct iwpm_sa_data *pm_msg)
 +{
 +  memcpy(ep-com.mapped_local_addr, pm_msg-mapped_loc_addr,
 +  sizeof(ep-com.mapped_local_addr));
 +  memcpy(ep-com.mapped_remote_addr, pm_msg-mapped_rem_addr,
 +  sizeof(ep-com.mapped_remote_addr));
 +}
 +
+ static void best_mtu(const unsigned short *mtus, unsigned short mtu,
+unsigned int *idx, int use_ts)
+ {
+   unsigned short hdr_size = sizeof(struct iphdr) +
+ sizeof(struct tcphdr) +
+ (use_ts ? 12 : 0);
+   unsigned short data_size = mtu - hdr_size;
+ 
+   cxgb4_best_aligned_mtu(mtus, hdr_size, data_size, 8, idx);
+ }
+ 
  static int send_connect(struct c4iw_ep *ep)
  {
struct cpl_act_open_req *req;
@@@ -583,14 -561,11 +598,15 @@@
int sizev6 = is_t4(ep-com.dev-rdev.lldi.adapter_type) ?
sizeof(struct cpl_act_open_req6) :
sizeof(struct cpl_t5_act_open_req6);
 -  struct sockaddr_in *la = (struct sockaddr_in *)ep-com.local_addr;
 -  struct sockaddr_in *ra = (struct sockaddr_in *)ep-com.remote_addr;
 -  struct sockaddr_in6 *la6 = (struct sockaddr_in6 *)ep-com.local_addr;
 -  struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)ep-com.remote_addr;
 +  struct sockaddr_in *la = (struct sockaddr_in *)
 +   ep-com.mapped_local_addr;
 +  struct sockaddr_in *ra = (struct sockaddr_in *)
 +   ep-com.mapped_remote_addr;
 +  struct sockaddr_in6 *la6 = (struct sockaddr_in6 *)
 + ep-com.mapped_local_addr;
 +  struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)
 + ep-com.mapped_remote_addr;
+   int win;
  
wrlen = (ep-com.remote_addr.ss_family == AF_INET) ?
roundup(sizev4, 16) :
@@@ -1796,7 -1821,8 +1862,8 @@@ static int import_ep(struct c4iw_ep *ep
step = cdev-rdev.lldi.nrxq /
cdev-rdev.lldi.nchan;
ep-rss_qid = cdev-rdev.lldi.rxq_ids[
 -  cxgb4_port_idx(n-dev) * step];
 +  cxgb4_port_idx(pdev) * step];
+   set_tcp_window(ep, (struct port_info *)netdev_priv(pdev));
  
if (clear_mpa_v1) {
ep-retry_with_mpa_v1 = 0;


signature.asc
Description: PGP signature


Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire

2014-06-11 Thread Nicholas A. Bellinger
On Wed, 2014-06-11 at 22:32 +, Quinn Tran wrote:
 
 On 6/11/14 2:30 PM, Nicholas A. Bellinger n...@linux-iscsi.org wrote:
 
 On Wed, 2014-06-11 at 10:24 +0300, Sagi Grimberg wrote:
  On 6/11/2014 12:17 AM, Quinn Tran wrote:
 
  SNIP
 
   QT Instead of using existing value within cmd-data_length, can we
   calculated data_length based on secstors  blocksize.
  
   cmd-data_length = sectors * dev-dev_attrib.block_size;
 
  We can do that I suppose...
  Although it seems weird that the core discards the transport
 byte-count...
  Just wandering if we should recalc on the DIF case only or always.
 
 
 Yeah, same concern here wrt to discarding the transport length.
 
 This effectively makes the residual handling further down in
 sbc_parse_cdb() - target_cmd_size_check() always match size ==
 cmd-data_length, because the latter as been recalculated by the former.
 
 Honestly, I don't know if this is a problem for normal READ + WRITE with
 prot=1 as I've never seen size != cmd-data_length for I/O related
 commands, but it does seem potentially dangerous.
 
  
From the QLogic perfective, the cmd-data_length is pull directly
 from the
   wire (i.e. FCP header) when cmd is received.  In essence, it's
 whatever
   the Initiator is set it to.  We does not have any indicator to spot
 DIF vs
   none-DIF cmd when 1st received, unless the code do a peek.
 
  Same for all transports I assume...
 
 
 So just to confirm Quinn, the Qlogic the initiator includes the PI bytes
 into the FCP header data_length for the target-side *_PASS and
 DOUT_STRIP / DIN_INSERT, that is currently passed into
 qla_tgt_ops-handle_cmd(), right..?
 
 QT
 Initiator:
 DOUT_STRIP/DIN_Insert:  FCP_DL = data length only (no dif length)
 Dif PASS: FCP_DL = data length + Dif length.
 
 Target:
 DOUT_strip/DIN_Insert:  qla_tgt_ops-handle_cmd(data length only)
 
 sbc_check_prot()
 If (protect)
cmd-data_length -= cmd-prot_length;   truncation
 
 

nod

 ---
 DIF_PASS: qla_tgt_ops-handle_cmd(data length + Dif length)
 
 

Ok, so Sagi's patches for legacy fabric - PI backend and
PI fabric - PI backend cases look OK, but not for the
PI fabric - legacy backend case as noted above..

Given DOUT_STRIP + DIN_INSERT have not been enabled in sbc_check_prot()
just yet, I'll go ahead and merge his v2 series to address the two
existing cases in -rc1 + v3.15.y stable code.

From there, enabling PI fabric - legacy backend support in = rc2 with
target-core will be easy, as it's just a matter of updating
sbc_set_prot_op_checks() to assign prot_ops, avoid sbc_check_prot()
cmd-data_length recalculation, and making sure PROTECTION control
feature bits are still exposed for legacy - unprotected backends.

Thanks!

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: DMA window vs actual allocatable DMA memory

2014-06-11 Thread Bob Biloxi
Thanks so much Bjorn for the reply. This is very helpful. Now my
understanding is more clear.
I was curious whether there is any limit from the operating
system(hypervisor) as to how much DMA memory can drivers map.
I'll try to investigate.

On Thu, Jun 12, 2014 at 5:42 AM, Bjorn Helgaas bhelg...@google.com wrote:
 On Wed, Jun 11, 2014 at 12:23 PM, Bob Biloxi iambobbil...@gmail.com wrote:
 Hi All,

 I am having trouble understanding DMA window and actual amount of
 addressable DMA memory.

 I hope someone explains me. Let me put my understanding and doubts here:

 Let's say I am writing code for an ethernet device driver in the
 virtualisation(hypervisor) environment.

 Now, if the ethernet adapter requires certain amount of DMA memory, I
 need to allocate heap memory and dma map it and provide to the
 adapter.

 From the hardware perspective, we have a 64GB DMA window.

 I am having trouble understanding this value. Does it mean i can
 allocate 64GB of RAM(Heap memory) and dma map it?

 As i understand there might be a table that translates bus address to
 physical(RAM) addresses. Each entry of such table points to a 4KB
 page. If the size of each entry is 8 bytes and there are 16M such
 entries( 16M * 4K = 64GB DMA window), the size of the table comes to
 around 128M

 Now do I have 64GB DMA memory or 128M DMA memory?

 Documentation/DMA-API-HOWTO.txt might help answer your questions.

 The bus address to RAM address translation is done by a IOMMU
 hardware.  The tables you mention are I/O page tables used by the
 IOMMU.  The 128M occupied by the tables is kernel bookkeeping overhead
 and has nothing to do with the adapter itself.

 The 64GB DMA window might be a hardware feature of the device, i.e.,
 maybe it can only generate 36-bit DMA addresses.  That doesn't mean
 you have to allocate memory for the whole window; I would guess
 drivers would only allocate and map what they need.  I don't know how
 they figure out how much to map.

 I want to know what is the max amount of memory that I can
 allocate(heap), dma map and provide it to the adapter.

 I will be really thankful in all the help that I can get!!

 Thanks so much


 Best Regards,
 Marc
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html