Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire
On 6/11/2014 12:17 AM, Quinn Tran wrote: SNIP QT Instead of using existing value within cmd-data_length, can we calculated data_length based on secstors blocksize. cmd-data_length = sectors * dev-dev_attrib.block_size; We can do that I suppose... Although it seems weird that the core discards the transport byte-count... Just wandering if we should recalc on the DIF case only or always. From the QLogic perfective, the cmd-data_length is pull directly from the wire (i.e. FCP header) when cmd is received. In essence, it's whatever the Initiator is set it to. We does not have any indicator to spot DIF vs none-DIF cmd when 1st received, unless the code do a peek. Same for all transports I assume... With that said, the cmd-data_length does not guarantee to contain both data length protection length when cmd is submit to TCM/target_submit_cmd(). In Dif-Insert mode, data_length will only contain the actual data(no Dif). No, in the DOUT_INSERT/DIN_STRIP case, protect bits are off and the core will take the data length as is. This case is covered. It's best that the SBC code re-calculate the actual data length and dif data length based on the number of sectors derived from the cmd. Nic, what's your take on this? Sagi. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/3] Include protection information in transport header
At the SCSI transport level, there is no distinction between user data and protection information. Thus, iscsi header field expected data transfer length should include protection information. Patch #1 introduces scsi helper scsi_transfer_length which computes wire transfer byte count. Patch #2 modifies iscsi initiator to set correct wire transfer length in iscsi header data_length field (and modifies iser accordingly). Patch #3 modifies target core to re-calculate the pure data length in case of PI presence over the wire (and modifies loopback transport to align with other transports). Changes from v1: - scsi_cmnd: Rewrite scsi_transfer_length as MKP suggested - Target/sbc: re-calculate the data_length in case PI exists on the wire (instead od deacrease data -= prot) Changes from v0: - Introduce scsi helpers to compute correct transfer length in the presence of protection information. - Modify iscsi to set correct transfer length using scsi helpers - Modify loopback transport to set correct transfer length using scsi helpers Sagi Grimberg (3): scsi_cmnd: Introduce scsi_transfer_length helper libiscsi, iser: Adjust data_length to include protection information TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire drivers/infiniband/ulp/iser/iser_initiator.c | 34 +++-- drivers/scsi/libiscsi.c | 18 +++--- drivers/target/loopback/tcm_loop.c | 15 +-- drivers/target/target_core_sbc.c | 15 ++- include/scsi/scsi_cmnd.h | 17 + 5 files changed, 61 insertions(+), 38 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/3] libiscsi, iser: Adjust data_length to include protection information
In case protection information exists over the wire iscsi header data length is required to include it. Use protection information aware scsi helpers to set the correct transfer length. In order to avoid breakage, remove iser transfer length checks for each task as they are not always true and somewhat redundant anyway. Signed-off-by: Sagi Grimberg sa...@mellanox.com Reviewed-by: Mike Christie micha...@cs.wisc.edu --- drivers/infiniband/ulp/iser/iser_initiator.c | 34 +++-- drivers/scsi/libiscsi.c | 18 +++--- 2 files changed, 19 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 2e2d903..8d44a40 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -41,11 +41,11 @@ #include iscsi_iser.h /* Register user buffer memory and initialize passive rdma - * dto descriptor. Total data size is stored in - * iser_task-data[ISER_DIR_IN].data_len + * dto descriptor. Data size is stored in + * task-data[ISER_DIR_IN].data_len, Protection size + * os stored in task-prot[ISER_DIR_IN].data_len */ -static int iser_prepare_read_cmd(struct iscsi_task *task, -unsigned int edtl) +static int iser_prepare_read_cmd(struct iscsi_task *task) { struct iscsi_iser_task *iser_task = task-dd_data; @@ -73,14 +73,6 @@ static int iser_prepare_read_cmd(struct iscsi_task *task, return err; } - if (edtl iser_task-data[ISER_DIR_IN].data_len) { - iser_err(Total data length: %ld, less than EDTL: -%d, in READ cmd BHS itt: %d, conn: 0x%p\n, -iser_task-data[ISER_DIR_IN].data_len, edtl, -task-itt, iser_task-ib_conn); - return -EINVAL; - } - err = device-iser_reg_rdma_mem(iser_task, ISER_DIR_IN); if (err) { iser_err(Failed to set up Data-IN RDMA\n); @@ -100,8 +92,9 @@ static int iser_prepare_read_cmd(struct iscsi_task *task, } /* Register user buffer memory and initialize passive rdma - * dto descriptor. Total data size is stored in - * task-data[ISER_DIR_OUT].data_len + * dto descriptor. Data size is stored in + * task-data[ISER_DIR_OUT].data_len, Protection size + * is stored at task-prot[ISER_DIR_OUT].data_len */ static int iser_prepare_write_cmd(struct iscsi_task *task, @@ -135,14 +128,6 @@ iser_prepare_write_cmd(struct iscsi_task *task, return err; } - if (edtl iser_task-data[ISER_DIR_OUT].data_len) { - iser_err(Total data length: %ld, less than EDTL: %d, -in WRITE cmd BHS itt: %d, conn: 0x%p\n, -iser_task-data[ISER_DIR_OUT].data_len, -edtl, task-itt, task-conn); - return -EINVAL; - } - err = device-iser_reg_rdma_mem(iser_task, ISER_DIR_OUT); if (err != 0) { iser_err(Failed to register write cmd RDMA mem\n); @@ -417,11 +402,12 @@ int iser_send_command(struct iscsi_conn *conn, if (scsi_prot_sg_count(sc)) { prot_buf-buf = scsi_prot_sglist(sc); prot_buf-size = scsi_prot_sg_count(sc); - prot_buf-data_len = sc-prot_sdb-length; + prot_buf-data_len = data_buf-data_len +ilog2(sc-device-sector_size) * 8; } if (hdr-flags ISCSI_FLAG_CMD_READ) { - err = iser_prepare_read_cmd(task, edtl); + err = iser_prepare_read_cmd(task); if (err) goto send_command_error; } diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 26dc005..3f46234 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -338,7 +338,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task) struct iscsi_session *session = conn-session; struct scsi_cmnd *sc = task-sc; struct iscsi_scsi_req *hdr; - unsigned hdrlength, cmd_len; + unsigned hdrlength, cmd_len, transfer_length; itt_t itt; int rc; @@ -391,11 +391,11 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task) if (scsi_get_prot_op(sc) != SCSI_PROT_NORMAL) task-protected = true; + transfer_length = scsi_transfer_length(sc); + hdr-data_length = cpu_to_be32(transfer_length); if (sc-sc_data_direction == DMA_TO_DEVICE) { - unsigned out_len = scsi_out(sc)-length; struct iscsi_r2t_info *r2t = task-unsol_r2t; - hdr-data_length = cpu_to_be32(out_len); hdr-flags |= ISCSI_FLAG_CMD_WRITE; /* * Write counters: @@ -414,18 +414,19 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
[PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper
In case protection information exists on the wire scsi transports should include it in the transfer byte count (even if protection information does not exist in the host memory space). This helper will compute the total transfer length from the scsi command data length and protection attributes. Signed-off-by: Sagi Grimberg sa...@mellanox.com Signed-off-by: Martin K. Petersen martin.peter...@oracle.com --- include/scsi/scsi_cmnd.h | 17 + 1 files changed, 17 insertions(+), 0 deletions(-) diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index dd7c998..a100c6e 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -7,6 +7,7 @@ #include linux/types.h #include linux/timer.h #include linux/scatterlist.h +#include scsi/scsi_device.h struct Scsi_Host; struct scsi_device; @@ -306,4 +307,20 @@ static inline void set_driver_byte(struct scsi_cmnd *cmd, char status) cmd-result = (cmd-result 0x00ff) | (status 24); } +static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd) +{ + unsigned int xfer_len = blk_rq_bytes(scmd-request); + unsigned int prot_op = scsi_get_prot_op(scmd); + unsigned int sector_size = scmd-device-sector_size; + + switch (prot_op) { + case SCSI_PROT_NORMAL: + case SCSI_PROT_WRITE_STRIP: + case SCSI_PROT_READ_INSERT: + return xfer_len; + } + + return xfer_len + (xfer_len ilog2(sector_size)) * 8; +} + #endif /* _SCSI_SCSI_CMND_H */ -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire
In various areas of the code, it is assumed that se_cmd-data_length describes pure data. In case that protection information exists over the wire (protect bits is are on) the target core re-calculates the data length from the CDB and the backed device block size (instead of each transport peeking in the cdb). Modify loopback device to include protection information in the transferred data length (like other scsi transports). Signed-off-by: Sagi Grimberg sa...@mellanox.com --- drivers/target/loopback/tcm_loop.c | 15 --- drivers/target/target_core_sbc.c | 15 +-- 2 files changed, 25 insertions(+), 5 deletions(-) diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c index c886ad1..1f4c015 100644 --- a/drivers/target/loopback/tcm_loop.c +++ b/drivers/target/loopback/tcm_loop.c @@ -179,7 +179,7 @@ static void tcm_loop_submission_work(struct work_struct *work) struct tcm_loop_hba *tl_hba; struct tcm_loop_tpg *tl_tpg; struct scatterlist *sgl_bidi = NULL; - u32 sgl_bidi_count = 0; + u32 sgl_bidi_count = 0, transfer_length; int rc; tl_hba = *(struct tcm_loop_hba **)shost_priv(sc-device-host); @@ -213,12 +213,21 @@ static void tcm_loop_submission_work(struct work_struct *work) } - if (!scsi_prot_sg_count(sc) scsi_get_prot_op(sc) != SCSI_PROT_NORMAL) + transfer_length = scsi_transfer_length(sc); + if (!scsi_prot_sg_count(sc) + scsi_get_prot_op(sc) != SCSI_PROT_NORMAL) { se_cmd-prot_pto = true; + /* +* loopback transport doesn't support +* WRITE_GENERATE, READ_STRIP protection +* information operations, go ahead unprotected. +*/ + transfer_length = scsi_bufflen(sc); + } rc = target_submit_cmd_map_sgls(se_cmd, tl_nexus-se_sess, sc-cmnd, tl_cmd-tl_sense_buf[0], tl_cmd-sc-device-lun, - scsi_bufflen(sc), tcm_loop_sam_attr(sc), + transfer_length, tcm_loop_sam_attr(sc), sc-sc_data_direction, 0, scsi_sglist(sc), scsi_sg_count(sc), sgl_bidi, sgl_bidi_count, diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index e022959..4b5716f 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -665,8 +665,19 @@ sbc_check_prot(struct se_device *dev, struct se_cmd *cmd, unsigned char *cdb, cmd-prot_type = dev-dev_attrib.pi_prot_type; cmd-prot_length = dev-prot_length * sectors; - pr_debug(%s: prot_type=%d, prot_length=%d prot_op=%d prot_checks=%d\n, -__func__, cmd-prot_type, cmd-prot_length, + + /** +* In case protection information exists over the wire +* we modify command data length to describe pure data. +* The actual transfer length is data length + protection +* length +**/ + if (protect) + cmd-data_length = sectors * dev-dev_attrib.block_size; + + pr_debug(%s: prot_type=%d, data_length=%d, prot_length=%d +prot_op=%d prot_checks=%d\n, +__func__, cmd-prot_type, cmd-data_length, cmd-prot_length, cmd-prot_op, cmd-prot_checks); return true; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 for-next 02/16] RDMA/ocrdma: Query and initalize the PFC SL
On 10/06/2014 17:02, Selvin Xavier wrote: This patch implements routine to query the PFC priority from the adapter port. Following are the changes implemented: * A new FW command is implemented to query the operational/admin DCBX configuration from the FW and obtain active priority(service level). * Adds support for the async event reported by FW when the PFC priority changes. +benet maintainers, Any reason for all code relating to the above not to land in your Ethernet driver? the same FW serves for plain Ethernet and RoCE, isn't that? Service level is re-initialized during modify_qp or create_ah, based on this event. * Maintain SL value in ocrdma_dev structure and refer that as and when needed. Signed-off-by: Devesh Sharma devesh.sha...@emulex.com Signed-off-by: Selvin Xavier selvin.xav...@emulex.com --- drivers/infiniband/hw/ocrdma/ocrdma.h | 21 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |2 + drivers/infiniband/hw/ocrdma/ocrdma_hw.c | 172 drivers/infiniband/hw/ocrdma/ocrdma_hw.h |2 + drivers/infiniband/hw/ocrdma/ocrdma_main.c |1 + drivers/infiniband/hw/ocrdma/ocrdma_sli.h | 81 +- 6 files changed, 278 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h index 19011db..5cd65c2 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma.h @@ -236,6 +236,9 @@ struct ocrdma_dev { struct rcu_head rcu; int id; u64 stag_arr[OCRDMA_MAX_STAG]; + u8 sl; /* service level */ + bool pfc_state; + atomic_t update_sl; u16 pvid; u32 asic_id; @@ -518,4 +521,22 @@ static inline u8 ocrdma_get_asic_type(struct ocrdma_dev *dev) OCRDMA_SLI_ASIC_GEN_NUM_SHIFT; } +static inline u8 ocrdma_get_pfc_prio(u8 *pfc, u8 prio) +{ + return *(pfc + prio); +} + +static inline u8 ocrdma_get_app_prio(u8 *app_prio, u8 prio) +{ + return *(app_prio + prio); +} + +static inline u8 ocrdma_is_enabled_and_synced(u32 state) +{ /* May also be used to interpret TC-state, QCN-state +* Appl-state and Logical-link-state in future. +*/ + return (state OCRDMA_STATE_FLAG_ENABLED) + (state OCRDMA_STATE_FLAG_SYNC); +} + #endif diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c index d4cc01f..a023234 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c @@ -100,6 +100,8 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr) if (!(attr-ah_flags IB_AH_GRH)) return ERR_PTR(-EINVAL); + if (atomic_cmpxchg(dev-update_sl, 1, 0)) + ocrdma_init_service_level(dev); ah = kzalloc(sizeof(*ah), GFP_ATOMIC); if (!ah) return ERR_PTR(-ENOMEM); diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c index bce4adf..e6463cb 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c @@ -771,6 +771,10 @@ static void ocrdma_process_grp5_aync(struct ocrdma_dev *dev, OCRDMA_AE_PVID_MCQE_TAG_MASK) OCRDMA_AE_PVID_MCQE_TAG_SHIFT); break; + + case OCRDMA_ASYNC_EVENT_COS_VALUE: + atomic_set(dev-update_sl, 1); + break; default: /* Not interested evts. */ break; @@ -2265,6 +2269,8 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp, if ((ah_attr-ah_flags IB_AH_GRH) == 0) return -EINVAL; + if (atomic_cmpxchg(qp-dev-update_sl, 1, 0)) + ocrdma_init_service_level(qp-dev); cmd-params.tclass_sq_psn |= (ah_attr-grh.traffic_class OCRDMA_QP_PARAMS_TCLASS_SHIFT); cmd-params.rnt_rc_sl_fl |= @@ -2298,6 +2304,10 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp, cmd-params.vlan_dmac_b4_to_b5 |= vlan_id OCRDMA_QP_PARAMS_VLAN_SHIFT; cmd-flags |= OCRDMA_QP_PARA_VLAN_EN_VALID; + /* override the sl with default priority if 0 */ + cmd-params.rnt_rc_sl_fl |= + (ah_attr-sl ? ah_attr-sl : + qp-dev-sl) OCRDMA_QP_PARAMS_SL_SHIFT; } return 0; } @@ -2605,6 +2615,168 @@ int ocrdma_mbx_destroy_srq(struct ocrdma_dev *dev, struct ocrdma_srq *srq) return status; } +static int ocrdma_mbx_get_dcbx_config(struct ocrdma_dev *dev, u32 ptype, + struct ocrdma_dcbx_cfg *dcbxcfg) +{ + int status = 0; + dma_addr_t pa; + struct ocrdma_mqe cmd; + + struct ocrdma_get_dcbx_cfg_req *req = NULL; + struct
Re: [PATCH libibverbs V1] Align Flow Steering API to the extension verbs scheme
On 20/05/2014 09:59, Or Gerlitz wrote: On 20/05/2014 04:05, Roland Dreier wrote: On Thu, May 15, 2014 at 6:16 AM, Or Gerlitz ogerl...@mellanox.com wrote: Hi Roland -- this addresses all the comments by Jason who gave a ack/reviewed-by note for both the libibverbs and libmlx4 fixes. So you can move ahead and conduct point releases for both libs. OK... so I should do one more libmlx4 release before you take over? YES, please. Hi Roland, Can you please go and do this joint/point release of libibverbs (1.1.9) and libmlx4 (1.0.7) which fixes the issues with flow steering?! I got distro contact which was about to take now 1.1.8/1.0.6 and asked them to wait few days and take the proper bits, so... Or. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH libibverbs V1] Align Flow Steering API to the extension verbs scheme
On 11/06/2014 12:53, Or Gerlitz wrote: On 20/05/2014 09:59, Or Gerlitz wrote: On 20/05/2014 04:05, Roland Dreier wrote: On Thu, May 15, 2014 at 6:16 AM, Or Gerlitz ogerl...@mellanox.com wrote: Hi Roland -- this addresses all the comments by Jason who gave a ack/reviewed-by note for both the libibverbs and libmlx4 fixes. So you can move ahead and conduct point releases for both libs. OK... so I should do one more libmlx4 release before you take over? YES, please. Hi Roland, Can you please go and do this joint/point release of libibverbs (1.1.9) and libmlx4 (1.0.7) which fixes the issues with flow steering?! FWIW, if it helps you, these are the relevant patchwork entries https://patchwork.kernel.org/patch/4182461/ https://patchwork.kernel.org/patch/4174891/ -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH libibverbs V4 2/5] Extend create_ah in order to pass L2 parameters to the provider
On 22/05/2014 11:52, Matan Barak wrote: On 21/5/2014 10:55 PM, Jason Gunthorpe wrote: On Sun, May 18, 2014 at 12:38:46PM +0300, Or Gerlitz wrote: From: Matan Barak mat...@mellanox.com In order to support IP based addressing, one needs to pass the L2 parameters to the provider drive. This is done through a new extendable 'driver' Please provide a man page. In this case you probably want to provide a patch to the existing man/ibv_create_ah.3 to discuss the 2nd entry point and new fields. Ok, I'll add this entry to the ibv_create_ah man page. +enum ibv_ah_attr_ex_attr_mask { +IBV_AH_ATTR_EX_LL = 1UL 0, +IBV_AH_ATTR_EX_VID = 1UL 1, +}; OK +struct ibv_ah_attr_ex { +struct ibv_global_routegrh; +uint16_tdlid; +uint8_tsl; +uint8_tsrc_path_bits; +uint8_tstatic_rate; +uint8_tis_global; +uint8_tport_num; +uint32_tcomp_mask; OK +union { +struct sockaddrsa; +struct sockaddr_storage _s; +} ll; +uint16_tvid; +}; Hard to know exactly what is going on here without a man page, but I thought one of the points was to provide an ethernet L2 MAC address? Shouldn't there be a mechanism for that? I see this: + attr_ex.ll.sa.sa_family = ARPHRD_ETHER; Which makes no sense, sa_family should be an AF_* constant. Kernel code *sometimes* use sa_family as ARPHRD_ETHER. So, I think this bit needs some kind of work. If you want to setup ethernet, then setup ethernet: uint64_t eth_dmac uint16_t eth_vid; and what about all the trendy new ethernet stuff, overlay networks, etc? Can that be expressed in there? I don't want to make this Ethernet specific. That's why the previous pointer used a pointer. I don't mind creating a generic interface for link layer if the existing solutions aren't good enough. Any suggestions? Hi Jason, Can you please address Matan's comments? we'd like this thread to converge... Or. @@ -975,6 +998,8 @@ enum verbs_context_mask { struct verbs_context { /* grows up - new fields go here */ +struct ibv_ah * (*create_ah_ex)(struct ibv_pd *pd, +struct ibv_ah_attr_ex *attr); Can you check if 'attr' should be const? I see the existing API isn't const, but I am suspecting that is a mistake? You're probably right, but wouldn't we want to be aligned with the non-extended verb: struct ibv_ah * (*create_ah)(struct ibv_pd *pd, struct ibv_ah_attr *attr); Since the provider driver usually go through a common path for both the non-extended and the extended verb, this could cause an ugly const cast. Jason Matan -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH libibverbs V4 2/5] Extend create_ah in order to pass L2 parameters to the provider
On Wed, Jun 11, 2014 at 01:54:56PM +0300, Or Gerlitz wrote: Can you please address Matan's comments? we'd like this thread to converge... I am waiting for your analysis as I mentioned in my last message to this thread: https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg19925.html That is rather a larger question, and the answer may well be we don't need two new extended verbs to do what you want... Jason -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] ibacm provider infrastructure
A couple of months ago, I posted the start of a patch series to re-architect ibacm around supporting multiple providers. http://www.spinics.net/lists/linux-rdma/msg19363.html This more easily allows ibacm to support other mechanisms for resolving address and path record data, with support for ibssa (Scalable SA) as one obvious candidate. A complete set of patches that completes the re-architecture is now available in the ibacm 'prov' branch. http://git.openfabrics.org/?p=~shefty/ibacm.git;a=shortlog;h=refs/heads/prov In total, there are 40-50 patches, with most of the work actually done by Kaike Wan. Because of the size and limited audience interested in this work, I don't plan on posting these to the mail list. But anyone interested in reviewing the patches should feel free to comment on them. Likely the most interesting result of the work is the provider interface to ibacm, which is defined in include/infiniband/acm_prov.h: http://git.openfabrics.org/?p=~shefty/ibacm.git;a=blob;f=include/infiniband/acm_prov.h;h=dcfdf5fce7373a7d57bcbe3fe54d6fe5d81935e2;hb=refs/heads/prov - Sean -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
DMA window vs actual allocatable DMA memory
Hi All, I am having trouble understanding DMA window and actual amount of addressable DMA memory. I hope someone explains me. Let me put my understanding and doubts here: Let's say I am writing code for an ethernet device driver in the virtualisation(hypervisor) environment. Now, if the ethernet adapter requires certain amount of DMA memory, I need to allocate heap memory and dma map it and provide to the adapter. From the hardware perspective, we have a 64GB DMA window. I am having trouble understanding this value. Does it mean i can allocate 64GB of RAM(Heap memory) and dma map it? As i understand there might be a table that translates bus address to physical(RAM) addresses. Each entry of such table points to a 4KB page. If the size of each entry is 8 bytes and there are 16M such entries( 16M * 4K = 64GB DMA window), the size of the table comes to around 128M Now do I have 64GB DMA memory or 128M DMA memory? I want to know what is the max amount of memory that I can allocate(heap), dma map and provide it to the adapter. I will be really thankful in all the help that I can get!! Thanks so much Best Regards, Marc -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/1] IPoIB: Improve parent interface p_key handling.
With reference to commit c2904141696ee19551f155396f23cdd5d95e (IPoIB: Fix pkey change flow for virtualization environments) It was noticed that parent interface keeps sending broadcast group join requests if p_key index 0 is invalid. That creates unnecessary noise on a fabric: ib0: multicast join failed for ff12:401b:8000:::::, status -22 Proposed patch re-init resources, then brings interface up only if p_key idx 0 is valid either on bootup or on PKEY_CHANGE event. Original code could run multicast task regardless of p_key value, which we want to avoid. Reviewed-by: Ira Weiny ira.we...@intel.com Signed-off-by: Alex Estrin alex.est...@intel.com --- Changes from v2: Handle pkey change event for a case when interface is down (pointed out by Erez Shitrit ere...@dev.mellanox.co.il). Changes from v1: p_key check for 'Invalid' value was moved to ipoib_pkey_dev_check_presence() that is used now in ipoib_ib_dev_open() for p_key validation. Whitespace and format adjusted. drivers/infiniband/ulp/ipoib/ipoib_ib.c | 35 ++ 1 files changed, 21 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 6a7003d..a2af996 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -52,6 +52,7 @@ MODULE_PARM_DESC(data_debug_level, #endif static DEFINE_MUTEX(pkey_mutex); +static void ipoib_pkey_dev_check_presence(struct net_device *dev); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, struct ib_pd *pd, struct ib_ah_attr *attr) @@ -669,12 +670,13 @@ int ipoib_ib_dev_open(struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); int ret; - if (ib_find_pkey(priv-ca, priv-port, priv-pkey, priv-pkey_index)) { - ipoib_warn(priv, P_Key 0x%04x not found\n, priv-pkey); - clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags); + ipoib_pkey_dev_check_presence(dev); + + if (!test_bit(IPOIB_PKEY_ASSIGNED, priv-flags)) { + ipoib_warn(priv, P_Key 0x%04x is %s\n, priv-pkey, + !(priv-pkey 0x7fff) ? Invalid : not found); return -1; } - set_bit(IPOIB_PKEY_ASSIGNED, priv-flags); ret = ipoib_init_qp(dev); if (ret) { @@ -712,9 +714,10 @@ dev_stop: static void ipoib_pkey_dev_check_presence(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - u16 pkey_index = 0; - if (ib_find_pkey(priv-ca, priv-port, priv-pkey, pkey_index)) + if (!(priv-pkey 0x7fff) || + ib_find_pkey(priv-ca, priv-port, priv-pkey, + priv-pkey_index)) clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags); else set_bit(IPOIB_PKEY_ASSIGNED, priv-flags); @@ -987,15 +990,17 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, up_read(priv-vlan_rwsem); if (!test_bit(IPOIB_FLAG_INITIALIZED, priv-flags)) { - /* for non-child devices must check/update the pkey value here */ - if (level == IPOIB_FLUSH_HEAVY - !test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) - update_parent_pkey(priv); - ipoib_dbg(priv, Not flushing - IPOIB_FLAG_INITIALIZED not set.\n); - return; + if (level != IPOIB_FLUSH_HEAVY) { + ipoib_dbg(priv, Not flushing - IPOIB_FLAG_INITIALIZED not set.\n); + return; + } } if (!test_bit(IPOIB_FLAG_ADMIN_UP, priv-flags)) { + /* interface is down. update pkey and leave. */ + if (level == IPOIB_FLUSH_HEAVY + !test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) + update_parent_pkey(priv); ipoib_dbg(priv, Not flushing - IPOIB_FLAG_ADMIN_UP not set.\n); return; } @@ -1038,8 +1043,10 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, ipoib_ib_dev_down(dev, 0); if (level == IPOIB_FLUSH_HEAVY) { - ipoib_ib_dev_stop(dev, 0); - ipoib_ib_dev_open(dev); + if (test_bit(IPOIB_FLAG_INITIALIZED, priv-flags)) + ipoib_ib_dev_stop(dev, 0); + if (ipoib_ib_dev_open(dev) != 0) + return; } /* -- 1.7.0.7 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/1] IPoIB: Improve parent interface p_key handling.
Changes from v2: Handle pkey change event for a case when interface is down (pointed out by Erez Shitrit ere...@dev.mellanox.co.il). Changes from v1: p_key check for 'Invalid' value was moved to ipoib_pkey_dev_check_presence() that is used now in ipoib_ib_dev_open() for p_key validation. Whitespace and format adjusted. --- Alex Estrin (1): IPoIB: Improve parent interface p_key handling. drivers/infiniband/ulp/ipoib/ipoib_ib.c | 35 +++ 1 files changed, 21 insertions(+), 14 deletions(-) -- -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/1] IPoIB: Improve parent interface p_key handling.
With reference to commit c2904141696ee19551f155396f23cdd5d95e (IPoIB: Fix pkey change flow for virtualization environments) It was noticed that parent interface keeps sending broadcast group join requests if p_key index 0 is invalid. That creates unnecessary noise on a fabric: ib0: multicast join failed for ff12:401b:8000:::::, status -22 Proposed patch re-init resources, then brings interface up only if p_key idx 0 is valid either on bootup or on PKEY_CHANGE event. Original code could run multicast task regardless of p_key value, which we want to avoid. Reviewed-by: Ira Weiny ira.we...@intel.com Signed-off-by: Alex Estrin alex.est...@intel.com --- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 35 +++ 1 files changed, 21 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 6a7003d..a2af996 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -52,6 +52,7 @@ MODULE_PARM_DESC(data_debug_level, #endif static DEFINE_MUTEX(pkey_mutex); +static void ipoib_pkey_dev_check_presence(struct net_device *dev); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, struct ib_pd *pd, struct ib_ah_attr *attr) @@ -669,12 +670,13 @@ int ipoib_ib_dev_open(struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); int ret; - if (ib_find_pkey(priv-ca, priv-port, priv-pkey, priv-pkey_index)) { - ipoib_warn(priv, P_Key 0x%04x not found\n, priv-pkey); - clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags); + ipoib_pkey_dev_check_presence(dev); + + if (!test_bit(IPOIB_PKEY_ASSIGNED, priv-flags)) { + ipoib_warn(priv, P_Key 0x%04x is %s\n, priv-pkey, + !(priv-pkey 0x7fff) ? Invalid : not found); return -1; } - set_bit(IPOIB_PKEY_ASSIGNED, priv-flags); ret = ipoib_init_qp(dev); if (ret) { @@ -712,9 +714,10 @@ dev_stop: static void ipoib_pkey_dev_check_presence(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); - u16 pkey_index = 0; - if (ib_find_pkey(priv-ca, priv-port, priv-pkey, pkey_index)) + if (!(priv-pkey 0x7fff) || + ib_find_pkey(priv-ca, priv-port, priv-pkey, + priv-pkey_index)) clear_bit(IPOIB_PKEY_ASSIGNED, priv-flags); else set_bit(IPOIB_PKEY_ASSIGNED, priv-flags); @@ -987,15 +990,17 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, up_read(priv-vlan_rwsem); if (!test_bit(IPOIB_FLAG_INITIALIZED, priv-flags)) { - /* for non-child devices must check/update the pkey value here */ - if (level == IPOIB_FLUSH_HEAVY - !test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) - update_parent_pkey(priv); - ipoib_dbg(priv, Not flushing - IPOIB_FLAG_INITIALIZED not set.\n); - return; + if (level != IPOIB_FLUSH_HEAVY) { + ipoib_dbg(priv, Not flushing - IPOIB_FLAG_INITIALIZED not set.\n); + return; + } } if (!test_bit(IPOIB_FLAG_ADMIN_UP, priv-flags)) { + /* interface is down. update pkey and leave. */ + if (level == IPOIB_FLUSH_HEAVY + !test_bit(IPOIB_FLAG_SUBINTERFACE, priv-flags)) + update_parent_pkey(priv); ipoib_dbg(priv, Not flushing - IPOIB_FLAG_ADMIN_UP not set.\n); return; } @@ -1038,8 +1043,10 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, ipoib_ib_dev_down(dev, 0); if (level == IPOIB_FLUSH_HEAVY) { - ipoib_ib_dev_stop(dev, 0); - ipoib_ib_dev_open(dev); + if (test_bit(IPOIB_FLAG_INITIALIZED, priv-flags)) + ipoib_ib_dev_stop(dev, 0); + if (ipoib_ib_dev_open(dev) != 0) + return; } /* -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v3 1/1] sorry for duplicate emails.
Hi guys, I'm really sorry for duplicate e-mails I have sent out. Somehow messages didn't get back to my list, so I assumed emails were dropped . Kind regards, Alex Estrin.
Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire
On Wed, 2014-06-11 at 10:24 +0300, Sagi Grimberg wrote: On 6/11/2014 12:17 AM, Quinn Tran wrote: SNIP QT Instead of using existing value within cmd-data_length, can we calculated data_length based on secstors blocksize. cmd-data_length = sectors * dev-dev_attrib.block_size; We can do that I suppose... Although it seems weird that the core discards the transport byte-count... Just wandering if we should recalc on the DIF case only or always. Yeah, same concern here wrt to discarding the transport length. This effectively makes the residual handling further down in sbc_parse_cdb() - target_cmd_size_check() always match size == cmd-data_length, because the latter as been recalculated by the former. Honestly, I don't know if this is a problem for normal READ + WRITE with prot=1 as I've never seen size != cmd-data_length for I/O related commands, but it does seem potentially dangerous. From the QLogic perfective, the cmd-data_length is pull directly from the wire (i.e. FCP header) when cmd is received. In essence, it's whatever the Initiator is set it to. We does not have any indicator to spot DIF vs none-DIF cmd when 1st received, unless the code do a peek. Same for all transports I assume... So just to confirm Quinn, the Qlogic the initiator includes the PI bytes into the FCP header data_length for the target-side *_PASS and DOUT_STRIP / DIN_INSERT, that is currently passed into qla_tgt_ops-handle_cmd(), right..? If that is the case, Sagi's v1 to cmd-data_length -= cmd-prot_length seems like it would still do right thing for *_PASS and DOUT_STRIP / DIN_INSERT, given that cmd-prot_length is calculated in sbc_check_prot() based upon dev-prot_length * sectors.. With that said, the cmd-data_length does not guarantee to contain both data length protection length when cmd is submit to TCM/target_submit_cmd(). In Dif-Insert mode, data_length will only contain the actual data(no Dif). No, in the DOUT_INSERT/DIN_STRIP case, protect bits are off and the core will take the data length as is. This case is covered. nod It's best that the SBC code re-calculate the actual data length and dif data length based on the number of sectors derived from the cmd. Nic, what's your take on this? Hard to say. Discarding the transport length in v2 doesn't seem like a good idea, but subtracting from cmd-prot_length in v1 is using the sector count from the CDB anyways, so it's essentially the same tradeoff of recalculating the transport's cmd-data_length from values within the CDB w/ prot=1. MKP, WDYT..? --nab -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/3] Include protection information in transport header
On Wed, 2014-06-11 at 12:09 +0300, Sagi Grimberg wrote: At the SCSI transport level, there is no distinction between user data and protection information. Thus, iscsi header field expected data transfer length should include protection information. Patch #1 introduces scsi helper scsi_transfer_length which computes wire transfer byte count. Patch #2 modifies iscsi initiator to set correct wire transfer length in iscsi header data_length field (and modifies iser accordingly). Patch #3 modifies target core to re-calculate the pure data length in case of PI presence over the wire (and modifies loopback transport to align with other transports). Changes from v1: - scsi_cmnd: Rewrite scsi_transfer_length as MKP suggested - Target/sbc: re-calculate the data_length in case PI exists on the wire (instead od deacrease data -= prot) Changes from v0: - Introduce scsi helpers to compute correct transfer length in the presence of protection information. - Modify iscsi to set correct transfer length using scsi helpers - Modify loopback transport to set correct transfer length using scsi helpers Sagi Grimberg (3): scsi_cmnd: Introduce scsi_transfer_length helper libiscsi, iser: Adjust data_length to include protection information TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire drivers/infiniband/ulp/iser/iser_initiator.c | 34 +++-- drivers/scsi/libiscsi.c | 18 +++--- drivers/target/loopback/tcm_loop.c | 15 +-- drivers/target/target_core_sbc.c | 15 ++- include/scsi/scsi_cmnd.h | 17 + 5 files changed, 61 insertions(+), 38 deletions(-) Ok, I've applied these to for-next, but let's see what MKP recommends for patch #3 wrt to recalculating cmd-data_length. Thanks Sagi! --nab -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire
On 6/11/14 2:30 PM, Nicholas A. Bellinger n...@linux-iscsi.org wrote: On Wed, 2014-06-11 at 10:24 +0300, Sagi Grimberg wrote: On 6/11/2014 12:17 AM, Quinn Tran wrote: SNIP QT Instead of using existing value within cmd-data_length, can we calculated data_length based on secstors blocksize. cmd-data_length = sectors * dev-dev_attrib.block_size; We can do that I suppose... Although it seems weird that the core discards the transport byte-count... Just wandering if we should recalc on the DIF case only or always. Yeah, same concern here wrt to discarding the transport length. This effectively makes the residual handling further down in sbc_parse_cdb() - target_cmd_size_check() always match size == cmd-data_length, because the latter as been recalculated by the former. Honestly, I don't know if this is a problem for normal READ + WRITE with prot=1 as I've never seen size != cmd-data_length for I/O related commands, but it does seem potentially dangerous. From the QLogic perfective, the cmd-data_length is pull directly from the wire (i.e. FCP header) when cmd is received. In essence, it's whatever the Initiator is set it to. We does not have any indicator to spot DIF vs none-DIF cmd when 1st received, unless the code do a peek. Same for all transports I assume... So just to confirm Quinn, the Qlogic the initiator includes the PI bytes into the FCP header data_length for the target-side *_PASS and DOUT_STRIP / DIN_INSERT, that is currently passed into qla_tgt_ops-handle_cmd(), right..? QT Initiator: DOUT_STRIP/DIN_Insert: FCP_DL = data length only (no dif length) Dif PASS: FCP_DL = data length + Dif length. Target: DOUT_strip/DIN_Insert: qla_tgt_ops-handle_cmd(data length only) sbc_check_prot() If (protect) cmd-data_length -= cmd-prot_length; truncation --- DIF_PASS: qla_tgt_ops-handle_cmd(data length + Dif length) If that is the case, Sagi's v1 to cmd-data_length -= cmd-prot_length seems like it would still do right thing for *_PASS and DOUT_STRIP / DIN_INSERT, given that cmd-prot_length is calculated in sbc_check_prot() based upon dev-prot_length * sectors.. With that said, the cmd-data_length does not guarantee to contain both data length protection length when cmd is submit to TCM/target_submit_cmd(). In Dif-Insert mode, data_length will only contain the actual data(no Dif). No, in the DOUT_INSERT/DIN_STRIP case, protect bits are off and the core will take the data length as is. This case is covered. nod QT agree. It's best that the SBC code re-calculate the actual data length and dif data length based on the number of sectors derived from the cmd. Nic, what's your take on this? Hard to say. Discarding the transport length in v2 doesn't seem like a good idea, but subtracting from cmd-prot_length in v1 is using the sector count from the CDB anyways, so it's essentially the same tradeoff of recalculating the transport's cmd-data_length from values within the CDB w/ prot=1. MKP, WDYT..? --nab This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper
Sagi == Sagi Grimberg sa...@mellanox.com writes: Sagi In case protection information exists on the wire scsi transports Sagi should include it in the transfer byte count (even if protection Sagi information does not exist in the host memory space). This helper Sagi will compute the total transfer length from the scsi command data Sagi length and protection attributes. Looks good! -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: DMA window vs actual allocatable DMA memory
On Wed, Jun 11, 2014 at 12:23 PM, Bob Biloxi iambobbil...@gmail.com wrote: Hi All, I am having trouble understanding DMA window and actual amount of addressable DMA memory. I hope someone explains me. Let me put my understanding and doubts here: Let's say I am writing code for an ethernet device driver in the virtualisation(hypervisor) environment. Now, if the ethernet adapter requires certain amount of DMA memory, I need to allocate heap memory and dma map it and provide to the adapter. From the hardware perspective, we have a 64GB DMA window. I am having trouble understanding this value. Does it mean i can allocate 64GB of RAM(Heap memory) and dma map it? As i understand there might be a table that translates bus address to physical(RAM) addresses. Each entry of such table points to a 4KB page. If the size of each entry is 8 bytes and there are 16M such entries( 16M * 4K = 64GB DMA window), the size of the table comes to around 128M Now do I have 64GB DMA memory or 128M DMA memory? Documentation/DMA-API-HOWTO.txt might help answer your questions. The bus address to RAM address translation is done by a IOMMU hardware. The tables you mention are I/O page tables used by the IOMMU. The 128M occupied by the tables is kernel bookkeeping overhead and has nothing to do with the adapter itself. The 64GB DMA window might be a hardware feature of the device, i.e., maybe it can only generate 36-bit DMA addresses. That doesn't mean you have to allocate memory for the whole window; I would guess drivers would only allocate and map what they need. I don't know how they figure out how much to map. I want to know what is the max amount of memory that I can allocate(heap), dma map and provide it to the adapter. I will be really thankful in all the help that I can get!! Thanks so much Best Regards, Marc -- To unsubscribe from this list: send the line unsubscribe linux-pci in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
linux-next: manual merge of the net-next tree with Linus' tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in drivers/infiniband/hw/cxgb4/cm.c between commits 11b8e22d4d09 (RDMA/cxgb4: Fix vlan support) and 9eccfe109b27 (RDMA/cxgb4: Add support for iWARP Port Mapper user space service) from Linus' tree and commits 92e7ae71726c (iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections) and b408ff282dda (iw_cxgb4: don't truncate the recv window size) from the net-next tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc drivers/infiniband/hw/cxgb4/cm.c index 96d7131ab974,965eaafd5851.. --- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@@ -533,38 -532,17 +537,49 @@@ static int send_abort(struct c4iw_ep *e return c4iw_l2t_send(ep-com.dev-rdev, skb, ep-l2t); } +/* + * c4iw_form_pm_msg - Form a port mapper message with mapping info + */ +static void c4iw_form_pm_msg(struct c4iw_ep *ep, + struct iwpm_sa_data *pm_msg) +{ + memcpy(pm_msg-loc_addr, ep-com.local_addr, + sizeof(ep-com.local_addr)); + memcpy(pm_msg-rem_addr, ep-com.remote_addr, + sizeof(ep-com.remote_addr)); +} + +/* + * c4iw_form_reg_msg - Form a port mapper message with dev info + */ +static void c4iw_form_reg_msg(struct c4iw_dev *dev, + struct iwpm_dev_data *pm_msg) +{ + memcpy(pm_msg-dev_name, dev-ibdev.name, IWPM_DEVNAME_SIZE); + memcpy(pm_msg-if_name, dev-rdev.lldi.ports[0]-name, + IWPM_IFNAME_SIZE); +} + +static void c4iw_record_pm_msg(struct c4iw_ep *ep, + struct iwpm_sa_data *pm_msg) +{ + memcpy(ep-com.mapped_local_addr, pm_msg-mapped_loc_addr, + sizeof(ep-com.mapped_local_addr)); + memcpy(ep-com.mapped_remote_addr, pm_msg-mapped_rem_addr, + sizeof(ep-com.mapped_remote_addr)); +} + + static void best_mtu(const unsigned short *mtus, unsigned short mtu, +unsigned int *idx, int use_ts) + { + unsigned short hdr_size = sizeof(struct iphdr) + + sizeof(struct tcphdr) + + (use_ts ? 12 : 0); + unsigned short data_size = mtu - hdr_size; + + cxgb4_best_aligned_mtu(mtus, hdr_size, data_size, 8, idx); + } + static int send_connect(struct c4iw_ep *ep) { struct cpl_act_open_req *req; @@@ -583,14 -561,11 +598,15 @@@ int sizev6 = is_t4(ep-com.dev-rdev.lldi.adapter_type) ? sizeof(struct cpl_act_open_req6) : sizeof(struct cpl_t5_act_open_req6); - struct sockaddr_in *la = (struct sockaddr_in *)ep-com.local_addr; - struct sockaddr_in *ra = (struct sockaddr_in *)ep-com.remote_addr; - struct sockaddr_in6 *la6 = (struct sockaddr_in6 *)ep-com.local_addr; - struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)ep-com.remote_addr; + struct sockaddr_in *la = (struct sockaddr_in *) + ep-com.mapped_local_addr; + struct sockaddr_in *ra = (struct sockaddr_in *) + ep-com.mapped_remote_addr; + struct sockaddr_in6 *la6 = (struct sockaddr_in6 *) + ep-com.mapped_local_addr; + struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *) + ep-com.mapped_remote_addr; + int win; wrlen = (ep-com.remote_addr.ss_family == AF_INET) ? roundup(sizev4, 16) : @@@ -1796,7 -1821,8 +1862,8 @@@ static int import_ep(struct c4iw_ep *ep step = cdev-rdev.lldi.nrxq / cdev-rdev.lldi.nchan; ep-rss_qid = cdev-rdev.lldi.rxq_ids[ - cxgb4_port_idx(n-dev) * step]; + cxgb4_port_idx(pdev) * step]; + set_tcp_window(ep, (struct port_info *)netdev_priv(pdev)); if (clear_mpa_v1) { ep-retry_with_mpa_v1 = 0; signature.asc Description: PGP signature
Re: [PATCH v1 3/3] TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire
On Wed, 2014-06-11 at 22:32 +, Quinn Tran wrote: On 6/11/14 2:30 PM, Nicholas A. Bellinger n...@linux-iscsi.org wrote: On Wed, 2014-06-11 at 10:24 +0300, Sagi Grimberg wrote: On 6/11/2014 12:17 AM, Quinn Tran wrote: SNIP QT Instead of using existing value within cmd-data_length, can we calculated data_length based on secstors blocksize. cmd-data_length = sectors * dev-dev_attrib.block_size; We can do that I suppose... Although it seems weird that the core discards the transport byte-count... Just wandering if we should recalc on the DIF case only or always. Yeah, same concern here wrt to discarding the transport length. This effectively makes the residual handling further down in sbc_parse_cdb() - target_cmd_size_check() always match size == cmd-data_length, because the latter as been recalculated by the former. Honestly, I don't know if this is a problem for normal READ + WRITE with prot=1 as I've never seen size != cmd-data_length for I/O related commands, but it does seem potentially dangerous. From the QLogic perfective, the cmd-data_length is pull directly from the wire (i.e. FCP header) when cmd is received. In essence, it's whatever the Initiator is set it to. We does not have any indicator to spot DIF vs none-DIF cmd when 1st received, unless the code do a peek. Same for all transports I assume... So just to confirm Quinn, the Qlogic the initiator includes the PI bytes into the FCP header data_length for the target-side *_PASS and DOUT_STRIP / DIN_INSERT, that is currently passed into qla_tgt_ops-handle_cmd(), right..? QT Initiator: DOUT_STRIP/DIN_Insert: FCP_DL = data length only (no dif length) Dif PASS: FCP_DL = data length + Dif length. Target: DOUT_strip/DIN_Insert: qla_tgt_ops-handle_cmd(data length only) sbc_check_prot() If (protect) cmd-data_length -= cmd-prot_length; truncation nod --- DIF_PASS: qla_tgt_ops-handle_cmd(data length + Dif length) Ok, so Sagi's patches for legacy fabric - PI backend and PI fabric - PI backend cases look OK, but not for the PI fabric - legacy backend case as noted above.. Given DOUT_STRIP + DIN_INSERT have not been enabled in sbc_check_prot() just yet, I'll go ahead and merge his v2 series to address the two existing cases in -rc1 + v3.15.y stable code. From there, enabling PI fabric - legacy backend support in = rc2 with target-core will be easy, as it's just a matter of updating sbc_set_prot_op_checks() to assign prot_ops, avoid sbc_check_prot() cmd-data_length recalculation, and making sure PROTECTION control feature bits are still exposed for legacy - unprotected backends. Thanks! --nab -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: DMA window vs actual allocatable DMA memory
Thanks so much Bjorn for the reply. This is very helpful. Now my understanding is more clear. I was curious whether there is any limit from the operating system(hypervisor) as to how much DMA memory can drivers map. I'll try to investigate. On Thu, Jun 12, 2014 at 5:42 AM, Bjorn Helgaas bhelg...@google.com wrote: On Wed, Jun 11, 2014 at 12:23 PM, Bob Biloxi iambobbil...@gmail.com wrote: Hi All, I am having trouble understanding DMA window and actual amount of addressable DMA memory. I hope someone explains me. Let me put my understanding and doubts here: Let's say I am writing code for an ethernet device driver in the virtualisation(hypervisor) environment. Now, if the ethernet adapter requires certain amount of DMA memory, I need to allocate heap memory and dma map it and provide to the adapter. From the hardware perspective, we have a 64GB DMA window. I am having trouble understanding this value. Does it mean i can allocate 64GB of RAM(Heap memory) and dma map it? As i understand there might be a table that translates bus address to physical(RAM) addresses. Each entry of such table points to a 4KB page. If the size of each entry is 8 bytes and there are 16M such entries( 16M * 4K = 64GB DMA window), the size of the table comes to around 128M Now do I have 64GB DMA memory or 128M DMA memory? Documentation/DMA-API-HOWTO.txt might help answer your questions. The bus address to RAM address translation is done by a IOMMU hardware. The tables you mention are I/O page tables used by the IOMMU. The 128M occupied by the tables is kernel bookkeeping overhead and has nothing to do with the adapter itself. The 64GB DMA window might be a hardware feature of the device, i.e., maybe it can only generate 36-bit DMA addresses. That doesn't mean you have to allocate memory for the whole window; I would guess drivers would only allocate and map what they need. I don't know how they figure out how much to map. I want to know what is the max amount of memory that I can allocate(heap), dma map and provide it to the adapter. I will be really thankful in all the help that I can get!! Thanks so much Best Regards, Marc -- To unsubscribe from this list: send the line unsubscribe linux-pci in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html