Re: [ewg] Mellanox target workaround in SRP
David Dillow wrote:
> On Mon, 2011-01-10 at 11:58 -0800, Vu Pham wrote:
>> David Dillow wrote:
>>> either. The SRP FMR mapping code is careful to mask the SG address
>>> with the FMR page mask, so we should never ask the HCA to map a page
>>> with the first_byte_offset != 0. Instead, we tell the target to
>>> request an IO virtual address appropriately offset into the first
>>> page of the FMR.
>>>
>>> Or perhaps I misunderstood you, and it's the non-zero first byte
>>> offset in the RDMA command on the wire that is the issue, and not
>>> the FMR setup in the initiator? And it only affects FMR-mapped
>>> memory, not the kernel's MR?
>>
>> It's not the kernel's MR. I suspect that the corruption happens with
>> *only* Mellanox FMR + MPT setup without fbo and the target doing RDMA
>> with an offset vaddr. I need to ask the internal hw/fw guys and
>> confirm whether it's true.
>
> Have you had any response from the HW/FW guys?

Sorry for the late response. Our hw/fw guys confirm that there is no
problem; my suspicion was wrong.

To explain clearly how the hw translates a remote RDMA address to a
physical address in the FMR's MTT:

    X          = requested/rdma_va - MPT.start + MPT.fbo
    MTT index  = X / MPT.blocksize
    MTT offset = X % MPT.blocksize
    PA         = MTT[index] + MTT offset

MPT - memory protection table
MTT - memory translation table

-vu
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
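The translation described in the message above can be sketched in a few lines of C. This is only a toy model under stated assumptions: `struct mpt_model`, its field names, and `translate()` are hypothetical names taken from the wording of the email, not from the hardware layout or the mlx4 driver, and the sketch does no bounds or permission checking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the MPT fields named in the email. */
struct mpt_model {
	uint64_t start;     /* IO virtual address at which the region starts */
	uint64_t fbo;       /* first byte offset into the first block */
	uint64_t blocksize; /* bytes covered by one MTT entry */
};

/*
 * Translate an RDMA virtual address to a physical address:
 *   X = rdma_va - MPT.start + MPT.fbo
 *   PA = MTT[X / blocksize] + X % blocksize
 * The caller must ensure rdma_va lies inside the region (no checks here).
 */
static uint64_t translate(const struct mpt_model *mpt, const uint64_t *mtt,
			  uint64_t rdma_va)
{
	uint64_t x = rdma_va - mpt->start + mpt->fbo;
	uint64_t index = x / mpt->blocksize;
	uint64_t offset = x % mpt->blocksize;

	return mtt[index] + offset;
}
```

With fbo left at 0 and a page-aligned start, an initiator that hands the target a virtual address offset into the first page still lands on the right byte, which is consistent with the hw/fw answer that the setup without fbo is not a problem.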
Re: [ewg] Mellanox target workaround in SRP
David Dillow wrote:
> On Fri, 2011-01-07 at 20:05 -0800, Roland Dreier wrote:
>>> I'm sure this was tested and shown to fix the problem; I'm just
>>> confused as to what the problem really was and if this is still
>>> relevant. Can someone please enlighten me?
>>
>> At this point I'm afraid it's all lost in the mists of time,
>
> Yep, that's my fear. And since it is a corruption bug, I've got to
> tread lightly in this area. :/

I don't recall discussing or reviewing this patch with Michael Tsirkin
when he submitted it.

>> looking at the patch, I would guess that the corruption occurred when
>> the target got an IO request that started at a non-page-aligned
>> address but that spanned more than one page.
>
> That's my thought as well, but then I'm not sure this really solved
> their problem. It may be more likely to occur in the FMR case, but the
> initiator enables clustering, so blk_rq_map_sg() could generate the
> same kinds of requests for both direct and indirect descriptors, even
> without FMR. This looks to have been true since the initiator was
> added to the kernel, though it is possible I'm misreading the code.
>
>> I don't know if the target was ever fixed, or whether that target
>> code has any relevance today.
>
> Here's hoping someone from Mellanox can shed some light.

I think that the patch is specific to the srp initiator using Mellanox
FMR. It tried to avoid indirect descriptors with a Mellanox FMR having
first-byte-offset != 0, since the low-level implementation of
mlx4/mthca_map_phys_fmr() did not create and set up an MPT for an FMR
with first_byte_offset != 0. The corruption can happen with any target.

-vu
Re: [ewg] Mellanox target workaround in SRP
Roland Dreier wrote:
>> I think that the patch is specific for srp initiator using Mellanox
>> FMR. It tried to avoid indirect desc with Mellanox FMR having
>> first-byte-offset != 0. Since the low level implementation of
>> mlx4/mthca_map_phys_fmr() did not create + setup MPT for FMR with
>> first_byte_offset != 0. The corruption can happen with any target.
>
> I don't think this could be right -- right now the workaround only
> triggers if the target has a Mellanox OUI, so if what you say is true,
> presumably everyone who is using the SRP initiator with mlx4 would be
> seeing this problem.

Yes, I'm afraid targets without a Mellanox OUI would be seeing this
problem.

> Also, the SRP initiator code that uses ib_fmr_pool_map_phys does not
> pass in any non-aligned addresses -- it doesn't try to use any first
> byte offset, it just uses the virtual address it passes to the target
> to handle the offset.

Yes, and I suspect that the corruption happens with the Mellanox
FMR/MPT setup without fbo and the target doing RDMA with an offset
vaddr. Let me ask around and confirm.
Re: [ewg] Mellanox target workaround in SRP
David Dillow wrote:
> On Mon, 2011-01-10 at 10:21 -0800, Vu Pham wrote:
>> David Dillow wrote:
>>> On Fri, 2011-01-07 at 20:05 -0800, Roland Dreier wrote:
>>>> looking at the patch, I would guess that the corruption occurred
>>>> when the target got an IO request that started at a
>>>> non-page-aligned address but that spanned more than one page.
>>>
>>> [snip]
>>>
>>> Here's hoping someone from Mellanox can shed some light.
>>
>> I think that the patch is specific for srp initiator using Mellanox
>> FMR. It tried to avoid indirect desc with Mellanox FMR having
>> first-byte-offset != 0. Since the low level implementation of
>> mlx4/mthca_map_phys_fmr() did not create + setup MPT for FMR with
>> first_byte_offset != 0. The corruption can happen with any target.
>
> Thanks for taking a look Vu --

Thanks for taking ownership of srp :)

> but I'm not sure that is the problem, either. The SRP FMR mapping code
> is careful to mask the SG address with the FMR page mask, so we should
> never ask the HCA to map a page with the first_byte_offset != 0.
> Instead, we tell the target to request an IO virtual address
> appropriately offset into the first page of the FMR.
>
> Or perhaps I misunderstood you, and it's the non-zero first byte
> offset in the RDMA command on the wire that is the issue, and not the
> FMR setup in the initiator? And it only affects FMR-mapped memory, not
> the kernel's MR?

It's not the kernel's MR. I suspect that the corruption happens with
*only* Mellanox FMR + MPT setup without fbo and the target doing RDMA
with an offset vaddr. I need to ask the internal hw/fw guys and confirm
whether it's true.

-vu
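The masking David describes in the message above can be illustrated with a small sketch. It is not the ib_srp code: the 4 KiB page size, `struct fmr_request`, and `build_request()` are assumptions made up for the example. The page-aligned part of the SG address is what would be handed to the HCA, and the low bits are carried in the virtual address given to the target.

```c
#include <assert.h>
#include <stdint.h>

#define FMR_PAGE_SIZE 4096ULL                 /* assumed page size */
#define FMR_PAGE_MASK (~(FMR_PAGE_SIZE - 1))  /* clears the in-page offset */

/* Hypothetical result of preparing one mapping request. */
struct fmr_request {
	uint64_t first_page; /* page-aligned address mapped through the FMR */
	uint64_t io_vaddr;   /* address the target is told to use for RDMA */
};

/*
 * Split an SG address into an aligned page (so first_byte_offset is never
 * needed in the mapping) and an IO virtual address that carries the offset.
 * iova_base is the virtual address assigned to the start of the mapping.
 */
static struct fmr_request build_request(uint64_t sg_addr, uint64_t iova_base)
{
	struct fmr_request r;

	r.first_page = sg_addr & FMR_PAGE_MASK;
	r.io_vaddr = iova_base + (sg_addr & ~FMR_PAGE_MASK);
	return r;
}
```

For an SG address of 0x12345678 and an iova_base of 0, the HCA sees page 0x12345000 and the target is given virtual address 0x678.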
Re: [ewg] [PATCH v1 10/10] mlx4_fc: Enable FC over Ethernet/Infiniband drivers
> depends on
> +	select LIBFC
> +	select LIBFCOE
> +	help
> +	  Fibre Channel over Ethernet/Infiniband module
> +
> +	  This is support for the Mellanox ConnectX/ConnectX-2 HCAs
> +	  The module will be called mlx4_fc
>
> End both lines above with periods (.).
>
> +
> +	  The module FIP-alike to discover BridgeX gateways in the
> +	  Infiniband fabric
>
> end sentence with "."
> What does FIP-alike mean? and fix grammar:
> This module attempts to discover BridgeX gateways in the
> Infiniband fabric. ?

Thanks for your comments.

From 54041f56fa1896388a52f503e43e21aa11cc9a2a Mon Sep 17 00:00:00 2001
From: Vu Pham <v...@vu-lt.mti.mtl.com>
Date: Tue, 17 Aug 2010 15:40:42 -0700
Subject: [PATCH 10/10] mlx4_fc: Enable FC over Ethernet/Infiniband drivers

Add entries in scsi's Kconfig and Makefile to enable mlx4_fc
(fcoe/fcoib offload driver) and mlx4_fcoib (FIP-alike discovery driver)

Signed-off-by: Vu Pham <v...@mellanox.com>
---
 drivers/scsi/Kconfig  |   23 +++
 drivers/scsi/Makefile |    2 ++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 158284f..e3d2850 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -687,6 +687,29 @@ config FCOE_FNIC
 	  <file:Documentation/scsi/scsi.txt>.
 	  The module will be called fnic.
 
+config MLX4_FC
+	tristate "Mellanox FC module"
+	select MLX4_EN
+	select LIBFC
+	select LIBFCOE
+	help
+	  Fibre Channel over Ethernet/Infiniband module
+
+	  This is support for the Mellanox ConnectX/ConnectX-2 HCAs.
+	  The module will be called mlx4_fc.
+
+config MLX4_FCOIB
+	tristate "Mellanox FCoIB discovery module"
+	depends on INFINIBAND
+	select MLX4_FC
+	help
+	  Fibre Channel over Infiniband discovery module
+
+	  The module attempts to discover BridgeX gateways in the
+	  Infiniband fabric by implementing FIP protocol (FCoE
+	  Initialization protocol) over Infiniband.
+	  The module will be called mlx4_fcoib.
+
 config SCSI_DMX3191D
 	tristate "DMX3191D SCSI support"
 	depends on PCI && SCSI
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 2a3fca2..0d0dab7 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -40,6 +40,8 @@ obj-$(CONFIG_LIBFC)		+= libfc/
 obj-$(CONFIG_LIBFCOE)		+= fcoe/
 obj-$(CONFIG_FCOE)		+= fcoe/
 obj-$(CONFIG_FCOE_FNIC)		+= fnic/
+obj-$(CONFIG_MLX4_FC)		+= mlx4_fc/
+obj-$(CONFIG_MLX4_FCOIB)	+= mlx4_fc/
 obj-$(CONFIG_ISCSI_TCP)		+= libiscsi.o	libiscsi_tcp.o iscsi_tcp.o
 obj-$(CONFIG_INFINIBAND_ISER)	+= libiscsi.o
 obj-$(CONFIG_SCSI_A4000T)	+= 53c700.o	a4000t.o
-- 
1.6.3.3
Re: [ewg] [PATCH v1 09/10] mlx4_fc: Implement fcoe/fcoib offload driver, fcoib initialization protocol driver
snip I skimmed through the patch and just noticed a few issues. I didn't do anything like a full review. I'm copying de...@open-fcoe.org, although some of them have seen this on the linux-scsi list. Thanks for your review and comments. +static int mlx4_fip_recv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *ptype, struct net_device *orig_dev) +{ +struct mfc_vhba *vhba = +container_of(ptype, struct mfc_vhba, fip_packet_type); +struct ethhdr *eh = eth_hdr(skb); + +fcoe_ctlr_recv(vhba-ctlr, skb); + +/* XXX: This is ugly */ +memcpy(vhba-dest_addr, eh-h_source, 6); Not just ugly. First of all, picking up the dest addr from the FIP packet source means you may be changing it each time you receive an advertisement from an FCF, whether its appropriate or not. Also, the skb may have been freed by fcoe_ctlr_recv(). It is responsible for it being freed eventually and this could be done before it returns. Since eh points into the skb it is garbage at this point. The gateway MAC address will be in vhba-ctlr.dest_addr. I clean this up on next revision. 
+static void mlx4_fip_send(struct fcoe_ctlr *fip, struct sk_buff *skb) +{ +skb-dev = (struct net_device *)mlx4_from_ctlr(fip)-underdev; +dev_queue_xmit(skb); +} snip + +static void mfc_flogi_resp(struct fc_seq *seq, struct fc_frame *fp, void *arg) +{ snip +mfc_update_src_mac(lport, mac); +done: +fc_lport_flogi_resp(seq, fp, lport); +} snip +static struct fc_seq *mfc_elsct_send(struct fc_lport *lport, u32 did, + struct fc_frame *fp, unsigned int op, + void (*resp) (struct fc_seq *, + struct fc_frame *, + void *), void *arg, + u32 timeout) +{ +struct mfc_vhba *vhba = lport_priv(lport); +struct fcoe_ctlr *fip = vhba-ctlr; +struct fc_frame_header *fh = fc_frame_header_get(fp); + +switch (op) { +case ELS_FLOGI: +case ELS_FDISC: +return fc_elsct_send(lport, did, fp, op, mfc_flogi_resp, + fip, timeout); +case ELS_LOGO: +/* only hook onto fabric logouts, not port logouts */ +if (ntoh24(fh-fh_d_id) != FC_FID_FLOGI) +break; +return fc_elsct_send(lport, did, fp, op, mfc_logo_resp, + lport, timeout); +} +return fc_elsct_send(lport, did, fp, op, resp, arg, timeout); A better way to pick up the assigned MAC address after FLOGI succeeds is by providing a callback in the libfc_function_template for lport_set_port_id(). That gets a copy of the original frame and fcoe_ctlr_recv_flogi() can get the granted_mac address out of that for the non-FIP case. It also gets called at LOGO when the port_id is being set to 0. See how fnic does it. That's cleaner than intercepting FLOGI and LOGO ELSes. Also, the callback for set_mac_addr() should take care of the assigned MAC address. I forget why fcoe.ko did it this way, and its OK for you to do this, too, but I think the fnic way is cleaner. I recently just synced up with latest libfc/libfcoe and used fcoe as example. Thanks for pointing this out. 
I'll take a look at fnic way + +static ssize_t mfc_sys_create(struct class *cl, struct class_attribute *attr, + const char *buf, size_t count) +{ +char ifname[IFNAMSIZ + 1]; +char *ch; +char test; +int cnt = 0; +int vlan_id = -1; +int prio = 0; +struct net_device *netdev = NULL; + +strncpy(ifname, buf, sizeof(ifname)); This doesn't guarantee a terminated string. You might want to do: ifname[sizeof(ifname) - 1] = '\0'; to force the end. Also, your optional arguments won't fit if the specified interface name is already IFNAMSIZ long. I think adding comma separated args is fine, but maybe they should be of the form name=value and fcoe can use that method, too. We could putthe arg parsing somewhere shared like libfcoe. The comma is for the priority associated particularly with that interface. If openfc-dev can formalize the format, we adapt to it. + +netdev = dev_get_by_name(init_net, ifname); +if (!netdev) { +printk(KERN_ERR Couldn't get a network device for '%s'\n, + ifname); +goto out; This should return an error, not just return count. Otherwise the user gets no indication unless they're looking at the console log. Ok +} +if (netdev-priv_flags IFF_802_1Q_VLAN) { +vlan_id = vlan_dev_vlan_id(netdev); +printk(KERN_INFO PFX vlan id %d prio %d\n, vlan_id, prio); +if (vlan_id 0) +goto out; Same here. Ok
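The review comments above on mfc_sys_create() (an unterminated ifname after strncpy, no room for the optional arguments, and errors that are not reported to the writer) can be illustrated outside the kernel. The sketch below is a user-space stand-in, not the driver code: `parse_ifname()` and `IFNAMSIZ_MODEL` are hypothetical, and it takes a different route from the suggested `ifname[sizeof(ifname) - 1] = '\0'` by copying only the part before the first comma and rejecting names that do not fit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define IFNAMSIZ_MODEL 16 /* stands in for the kernel's IFNAMSIZ (assumption) */

/*
 * Copy the interface name from a sysfs-style write buffer into ifname.
 * The name ends at the first ',' (start of optional args), newline, NUL,
 * or at count bytes. Returns 0 on success, -EINVAL if the name is empty
 * or would not fit with its terminator, so the caller can propagate an
 * error instead of silently returning count.
 */
static int parse_ifname(const char *buf, size_t count,
			char *ifname, size_t ifname_len)
{
	size_t n = 0;

	while (n < count && buf[n] != ',' && buf[n] != '\n' && buf[n] != '\0')
		n++;

	if (n == 0 || n >= ifname_len)
		return -EINVAL;

	memcpy(ifname, buf, n);
	ifname[n] = '\0'; /* always terminated, unlike a bare strncpy() */
	return 0;
}
```

Because the name is split off before any copy happens, the optional arguments after the comma never compete with the name for buffer space.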
[ewg] [PATCH v1 01/10] mlx4_core: Change fw profile and qp context to enable FC
From 5fd7e9795e085ff64b2396a5339f98e9b7021965 Mon Sep 17 00:00:00 2001
From: Vu Pham <v...@vu-lt.mti.mtl.com>
Date: Tue, 10 Aug 2010 13:55:43 -0700
Subject: [PATCH 01/10] mlx4_core: Change fw profile and qp context to enable FC

Increase num_qp, num_mpt resource in fw profile to enable FC
Add fields in qp context and qp path to enable FC qps

Signed-off-by: Oren Duer <o...@mellanox.co.il>
Signed-off-by: Vu Pham <v...@mellanox.com>
---
 drivers/net/mlx4/main.c |    4 ++--
 include/linux/mlx4/qp.h |   13 +++--
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 5102ab1..fe3be88 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -79,12 +79,12 @@ static char mlx4_version[] __devinitdata =
 	DRV_VERSION " (" DRV_RELDATE ")\n";
 
 static struct mlx4_profile default_profile = {
-	.num_qp		= 1 << 17,
+	.num_qp		= 1 << 18,
 	.num_srq	= 1 << 16,
 	.rdmarc_per_qp	= 1 << 4,
 	.num_cq		= 1 << 16,
 	.num_mcg	= 1 << 13,
-	.num_mpt	= 1 << 17,
+	.num_mpt	= 1 << 19,
 	.num_mtt	= 1 << 20,
 };
 
diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h
index 7abe643..249dacf 100644
--- a/include/linux/mlx4/qp.h
+++ b/include/linux/mlx4/qp.h
@@ -109,7 +109,7 @@ struct mlx4_qp_path {
 	__be32			tclass_flowlabel;
 	u8			rgid[16];
 	u8			sched_queue;
-	u8			snooper_flags;
+	u8			vlan_index;
 	u8			reserved3[2];
 	u8			counter_index;
 	u8			reserved4[7];
@@ -151,7 +151,16 @@ struct mlx4_qp_context {
 	u8			reserved4[2];
 	u8			mtt_base_addr_h;
 	__be32			mtt_base_addr_l;
-	u32			reserved5[10];
+	u8			VE;
+	u8			reserved5;
+	__be16			VFT_id_prio;
+	u8			reserved6;
+	u8			exch_size;
+	__be16			exch_base;
+	u8			VFT_hop_cnt;
+	u8			my_fc_id_idx;
+	__be16			reserved7;
+	u32			reserved8[7];
 };
 
 /* Which firmware version adds support for NEC (NoErrorCompletion) bit */
-- 
1.6.3.3
[ewg] [PATCH v1 0/10] Add fcoe, fcoib drivers for mlx4 device
Hi Roland,

Thanks for your feedback on the previous patch set. I reworked the
patches and rearranged their order as suggested. Please ignore the
previous patch set. Here is the new one.

The following series implements the fcoe and fcoib offload drivers for
the mlx4 device:

mlx4_fc: implements fcoe/fcoib, hooks into the scsi mid-layer to offload
scsi operations, and uses openfc's libfc to do ELS/BLS
mlx4_fcoib: driver implementing the fcoib initialization protocol to
discover IB-FC gateways/bridges
mlx4: add/enable mlx4_core, mlx4_ib, mlx4_en to support fcoe/fcoib

 drivers/infiniband/hw/mlx4/cq.c       |    4 +-
 drivers/infiniband/hw/mlx4/main.c     |   10 +-
 drivers/net/mlx4/cq.c                 |   27 +-
 drivers/net/mlx4/en_cq.c              |    2 +-
 drivers/net/mlx4/en_main.c            |    9 +
 drivers/net/mlx4/fw.c                 |   13 +
 drivers/net/mlx4/intf.c               |   19 +
 drivers/net/mlx4/main.c               |    4 +-
 drivers/net/mlx4/mlx4.h               |    1 +
 drivers/net/mlx4/mr.c                 |  140 ++-
 drivers/scsi/Kconfig                  |   22 +
 drivers/scsi/Makefile                 |    2 +
 drivers/scsi/mlx4_fc/Makefile         |    8 +
 drivers/scsi/mlx4_fc/fcoib.h          |  343 ++
 drivers/scsi/mlx4_fc/fcoib_api.h      |   61 +
 drivers/scsi/mlx4_fc/fcoib_discover.c | 1925 +++
 drivers/scsi/mlx4_fc/fcoib_main.c     | 1211
 drivers/scsi/mlx4_fc/mfc.c            | 2003 +
 drivers/scsi/mlx4_fc/mfc.h            |  666 +++
 drivers/scsi/mlx4_fc/mfc_exch.c       | 1496
 drivers/scsi/mlx4_fc/mfc_rfci.c       | 1001
 drivers/scsi/mlx4_fc/mfc_sysfs.c      |  244
 include/linux/mlx4/device.h           |   23 +-
 include/linux/mlx4/driver.h           |    9 +
 include/rdma/ib_verbs.h               |   10 +-
 25 files changed, 9222 insertions(+), 31 deletions(-)

thanks
-vu
[ewg] [PATCH v1 02/10] mlx4_core: Add mr_alloc_reserved to able create MPTs with pre-reserve range
From 7c943349148dcbcb173814a9b6800ac2a2b12626 Mon Sep 17 00:00:00 2001
From: Vu Pham <v...@vu-lt.mti.mtl.com>
Date: Tue, 10 Aug 2010 14:03:53 -0700
Subject: [PATCH 02/10] mlx4_core: Add mr_alloc_reserved to able create MPTs with pre-reserve range

As we did with QPs, a range of the MPTs is pre-reserved (the MPTs that
are mapped for FEXCHs, 2*64K of them). We need to split the operation
of allocating an MPT in two:
 . The allocation of a bit from the bitmap
 . The actual creation of the entry (and its MTT).

So, mr_alloc_reserved() is the second part, where you know which MPT
number was allocated. mr_alloc() is the one that allocates a number
from the bitmap.

Normal users keep using the original mr_alloc(). For FEXCH, when we
know the pre-reserved MPT entry, we call mr_alloc_reserved() directly.

Same with mr_free() and the corresponding mr_free_reserved(). The
first will just put back the bit; the latter will actually destroy the
entry, but will leave the bit set.

Signed-off-by: Yevgeny Petrilin <yevge...@mellanox.co.il>
Signed-off-by: Oren Duer <o...@mellanox.co.il>
Signed-off-by: Vu Pham <v...@mellanox.com>
---
 drivers/net/mlx4/mr.c       |   35 +--
 include/linux/mlx4/device.h |    4
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c
index 9c188bd..35c0af6 100644
--- a/drivers/net/mlx4/mr.c
+++ b/drivers/net/mlx4/mr.c
@@ -263,6 +263,21 @@ static int mlx4_HW2SW_MPT(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox
 			    !mailbox, MLX4_CMD_HW2SW_MPT, MLX4_CMD_TIME_CLASS_B);
 }
 
+int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd,
+			   u64 iova, u64 size, u32 access, int npages,
+			   int page_shift, struct mlx4_mr *mr)
+{
+	mr->iova       = iova;
+	mr->size       = size;
+	mr->pd	       = pd;
+	mr->access     = access;
+	mr->enabled    = 0;
+	mr->key	       = hw_index_to_key(mridx);
+
+	return mlx4_mtt_init(dev, npages, page_shift, &mr->mtt);
+}
+EXPORT_SYMBOL_GPL(mlx4_mr_alloc_reserved);
+
 int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access,
 		  int npages, int page_shift, struct mlx4_mr *mr)
 {
@@ -274,14 +289,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access,
 	if (index == -1)
 		return -ENOMEM;
 
-	mr->iova       = iova;
-	mr->size       = size;
-	mr->pd	       = pd;
-	mr->access     = access;
-	mr->enabled    = 0;
-	mr->key	       = hw_index_to_key(index);
-
-	err = mlx4_mtt_init(dev, npages, page_shift, &mr->mtt);
+	err = mlx4_mr_alloc_reserved(dev, index, pd, iova, size,
+				     access, npages, page_shift, mr);
 	if (err)
 		mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, index);
 
@@ -289,9 +298,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access,
 }
 EXPORT_SYMBOL_GPL(mlx4_mr_alloc);
 
-void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr)
+void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr)
 {
-	struct mlx4_priv *priv = mlx4_priv(dev);
 	int err;
 
 	if (mr->enabled) {
@@ -303,6 +311,13 @@ void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr)
 	}
 
 	mlx4_mtt_cleanup(dev, &mr->mtt);
+}
+EXPORT_SYMBOL_GPL(mlx4_mr_free_reserved);
+
+void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	mlx4_mr_free_reserved(dev, mr);
 	mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, key_to_hw_index(mr->key));
 }
 EXPORT_SYMBOL_GPL(mlx4_mr_free);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 7a7f9c1..66849cf 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -424,8 +424,12 @@ int mlx4_mtt_init(struct mlx4_dev *dev, int npages, int page_shift,
 void mlx4_mtt_cleanup(struct mlx4_dev *dev, struct mlx4_mtt *mtt);
 u64 mlx4_mtt_addr(struct mlx4_dev *dev, struct mlx4_mtt *mtt);
 
+int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd,
+			   u64 iova, u64 size, u32 access, int npages,
+			   int page_shift, struct mlx4_mr *mr);
 int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access,
 		  int npages, int page_shift, struct mlx4_mr *mr);
+void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr);
 void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr);
 int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr);
 int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
-- 
1.6.3.3
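The split described in the patch above (take an index from the bitmap, then create the entry at a known index) can be modeled in user space. This is a toy under stated assumptions, not the mlx4 code: the `toy_` functions, the two boolean arrays, and the table size of 8 are all made up for illustration, and "creating" an entry is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_MPT 8 /* toy table size (assumption) */

static bool toy_bitmap[TOY_NUM_MPT];  /* true = index has been handed out */
static bool toy_created[TOY_NUM_MPT]; /* true = entry has been set up */

/* Second half only: create the entry at an index the caller already owns. */
static int toy_mr_alloc_reserved(uint32_t idx)
{
	if (idx >= TOY_NUM_MPT || toy_created[idx])
		return -1;
	toy_created[idx] = true;
	return 0;
}

/* Both halves: pick a free index from the bitmap, then create the entry. */
static int toy_mr_alloc(uint32_t *idx_out)
{
	for (uint32_t i = 0; i < TOY_NUM_MPT; i++) {
		if (toy_bitmap[i])
			continue;
		toy_bitmap[i] = true;
		if (toy_mr_alloc_reserved(i) != 0) {
			toy_bitmap[i] = false; /* undo the bitmap half on failure */
			return -1;
		}
		*idx_out = i;
		return 0;
	}
	return -1;
}

/* Destroy the entry but leave the index reserved. */
static void toy_mr_free_reserved(uint32_t idx)
{
	toy_created[idx] = false;
}

/* Destroy the entry and return the index to the bitmap. */
static void toy_mr_free(uint32_t idx)
{
	toy_mr_free_reserved(idx);
	toy_bitmap[idx] = false;
}
```

A caller that owns a pre-reserved index uses only the `_reserved` pair, so the index stays out of reach of ordinary allocations for as long as the reservation lasts.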
[ewg] [PATCH v1 03/10] mlx4_core: Add mr_reserve_range to pre-reserve a range of MPTs
From b570c5df5006119ac626d96551cc0a9935037e5f Mon Sep 17 00:00:00 2001
From: Vu Pham <v...@vu-lt.mti.mtl.com>
Date: Tue, 10 Aug 2010 14:10:57 -0700
Subject: [PATCH 03/10] mlx4_core: Add mr_reserve_range to pre-reserve a range of MPTs

Add MPTs mr_reserve/free_range
Remove QPs and MPTs static reservation for FC_EXCH by default.
mlx4_fc driver will reserve them upon loading

Signed-off-by: Oren Duer <o...@mellanox.co.il>
Signed-off-by: Vu Pham <v...@mellanox.com>
---
 drivers/net/mlx4/main.c     |    4 +---
 drivers/net/mlx4/mr.c       |   22 ++
 include/linux/mlx4/device.h |    7 ++-
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index fe3be88..fbe646a 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -259,12 +259,10 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 		(1 << dev->caps.log_num_vlans) *
 		(1 << dev->caps.log_num_prios) *
 		dev->caps.num_ports;
-	dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH;
 
 	dev->caps.reserved_qps = dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] +
 		dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] +
-		dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] +
-		dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH];
+		dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR];
 
 	return 0;
 }
diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c
index 35c0af6..ba0514d 100644
--- a/drivers/net/mlx4/mr.c
+++ b/drivers/net/mlx4/mr.c
@@ -263,6 +263,28 @@ static int mlx4_HW2SW_MPT(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox
 			    !mailbox, MLX4_CMD_HW2SW_MPT, MLX4_CMD_TIME_CLASS_B);
 }
 
+int mlx4_mr_reserve_range(struct mlx4_dev *dev, int cnt, int align, u32 *base_mridx)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	u32 mridx;
+
+	mridx = mlx4_bitmap_alloc_range(&priv->mr_table.mpt_bitmap, cnt, align);
+	if (mridx == -1)
+		return -ENOMEM;
+
+	*base_mridx = mridx;
+	return 0;
+
+}
+EXPORT_SYMBOL_GPL(mlx4_mr_reserve_range);
+
+void mlx4_mr_release_range(struct mlx4_dev *dev, u32 base_mridx, int cnt)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	mlx4_bitmap_free_range(&priv->mr_table.mpt_bitmap, base_mridx, cnt);
+}
+EXPORT_SYMBOL_GPL(mlx4_mr_release_range);
+
 int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd,
 			   u64 iova, u64 size, u32 access, int npages,
 			   int page_shift, struct mlx4_mr *mr)
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 66849cf..da8ab85 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -151,7 +151,6 @@ enum mlx4_qp_region {
 	MLX4_QP_REGION_FW = 0,
 	MLX4_QP_REGION_ETH_ADDR,
 	MLX4_QP_REGION_FC_ADDR,
-	MLX4_QP_REGION_FC_EXCH,
 	MLX4_NUM_QP_REGION
 };
 
@@ -167,10 +166,6 @@ enum mlx4_special_vlan_idx {
 	MLX4_VLAN_REGULAR
 };
 
-enum {
-	MLX4_NUM_FEXCH = 64 * 1024,
-};
-
 static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor)
 {
 	return (major << 32) | (minor << 16) | subminor;
@@ -424,6 +419,8 @@ int mlx4_mtt_init(struct mlx4_dev *dev, int npages, int page_shift,
 void mlx4_mtt_cleanup(struct mlx4_dev *dev, struct mlx4_mtt *mtt);
 u64 mlx4_mtt_addr(struct mlx4_dev *dev, struct mlx4_mtt *mtt);
 
+int mlx4_mr_reserve_range(struct mlx4_dev *dev, int cnt, int align, u32 *base_mridx);
+void mlx4_mr_release_range(struct mlx4_dev *dev, u32 base_mridx, int cnt);
 int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd,
 			   u64 iova, u64 size, u32 access, int npages,
 			   int page_shift, struct mlx4_mr *mr);
-- 
1.6.3.3
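The range reservation added by the patch above relies on the driver's bitmap range allocator, which the patch does not show. As a rough picture of what reserving an aligned run of indices means, here is a user-space sketch; `toy_reserve_range()` and `toy_release_range()` are hypothetical helpers using a first-fit search over a plain boolean array, and the real allocator may behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Reserve cnt consecutive free slots whose first index is a multiple of
 * align. On success the slots are marked used, *base receives the first
 * index and 0 is returned; otherwise -1 is returned and nothing changes.
 */
static int toy_reserve_range(bool *bitmap, uint32_t size, uint32_t cnt,
			     uint32_t align, uint32_t *base)
{
	if (cnt == 0 || align == 0 || cnt > size)
		return -1;

	for (uint32_t start = 0; start + cnt <= size; start += align) {
		uint32_t i;

		for (i = 0; i < cnt; i++)
			if (bitmap[start + i])
				break;
		if (i < cnt)
			continue; /* something in this window is taken */

		for (i = 0; i < cnt; i++)
			bitmap[start + i] = true;
		*base = start;
		return 0;
	}
	return -1;
}

/* Return a previously reserved range to the bitmap. */
static void toy_release_range(bool *bitmap, uint32_t base, uint32_t cnt)
{
	for (uint32_t i = 0; i < cnt; i++)
		bitmap[base + i] = false;
}
```

Reserving the range at driver load, as the commit message describes, keeps the default footprint small for systems that never load mlx4_fc.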
[ewg] [PATCH v1 04/10] mlx4_core: Add interface to allocate fmr with pre-reserved MPTs
From 9652f29170c7daabf2e9a62acb848a05dc71db9a Mon Sep 17 00:00:00 2001 From: Vu Pham v...@vu-lt.mti.mtl.com Date: Tue, 10 Aug 2010 14:16:50 -0700 Subject: [PATCH 04/10] mlx4_core: Add interface to allocate fmr with pre-reserved MPTs As we did with MRs, the fmr_alloc() will call mr_alloc() to allocate bitmap and create MPTs entry. fmr_alloc_reserver will call mr_alloc_reserve() to create MPTs entry with pre-reserved range Signed-off-by: Oren Duer o...@mellanox.co.il Signed-off-by: Vu Pham v...@mellanox.com --- drivers/net/mlx4/mr.c | 55 +++ include/linux/mlx4/device.h |4 +++ 2 files changed, 59 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index ba0514d..67d858f 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -655,6 +655,49 @@ err_free: } EXPORT_SYMBOL_GPL(mlx4_fmr_alloc); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, + u32 pd, u32 access, int max_pages, + int max_maps, u8 page_shift, struct mlx4_fmr *fmr) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + u64 mtt_seg; + int err = -ENOMEM; + + if (page_shift (ffs(dev-caps.page_size_cap) - 1) || page_shift = 32) + return -EINVAL; + + /* All MTTs must fit in the same page */ + if (max_pages * sizeof *fmr-mtts PAGE_SIZE) + return -EINVAL; + + fmr-page_shift = page_shift; + fmr-max_pages = max_pages; + fmr-max_maps = max_maps; + fmr-maps = 0; + + err = mlx4_mr_alloc_reserved(dev, mridx, pd, 0, 0, access, max_pages, + page_shift, fmr-mr); + if (err) + return err; + + mtt_seg = fmr-mr.mtt.first_seg * dev-caps.mtt_entry_sz; + + fmr-mtts = mlx4_table_find(priv-mr_table.mtt_table, +fmr-mr.mtt.first_seg, +fmr-dma_handle); + if (!fmr-mtts) { + err = -ENOMEM; + goto err_free; + } + + return 0; + +err_free: + mlx4_mr_free_reserved(dev, fmr-mr); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_alloc_reserved); + int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -697,6 +740,18 @@ int 
mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr) } EXPORT_SYMBOL_GPL(mlx4_fmr_free); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr) +{ + if (fmr-maps) + return -EBUSY; + + fmr-mr.enabled = 0; + mlx4_mr_free_reserved(dev, fmr-mr); + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_free_reserved); + int mlx4_SYNC_TPT(struct mlx4_dev *dev) { return mlx4_cmd(dev, 0, 0, 0, MLX4_CMD_SYNC_TPT, 1000); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index da8ab85..3960033 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -474,11 +474,15 @@ void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u32 access, int max_pages, int max_maps, + u8 page_shift, struct mlx4_fmr *fmr); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, int max_maps, u8 page_shift, struct mlx4_fmr *fmr); int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr); void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u32 *lkey, u32 *rkey); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_SYNC_TPT(struct mlx4_dev *dev); -- 1.6.3.3 ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] [PATCH v1 05/10] mlx4_core: Map phys_fmr with first byte offset enabled
From 1afc1f9cfe33444c26dc027556035c7c85a9a565 Mon Sep 17 00:00:00 2001 From: Vu Pham v...@vu-lt.mti.mtl.com Date: Tue, 10 Aug 2010 14:27:23 -0700 Subject: [PATCH 05/10] mlx4_core: Map phys_fmr with first byte offset enabled map_phys_fmr_fbo() is very much like the original map_phys_fmr(): . allows setting an FBO (First Byte Offset) for the MPT . allows setting the data length for the MPT . does not increase the higher bits of the key after every map. Signed-off-by: Oren Duer o...@mellanox.co.il Signed-off-by: Vu Pham v...@mellanox.com --- drivers/net/mlx4/mr.c | 28 +++- include/linux/mlx4/device.h |3 +++ 2 files changed, 26 insertions(+), 5 deletions(-) diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index 67d858f..42527b5 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -52,7 +52,9 @@ struct mlx4_mpt_entry { __be64 length; __be32 lkey; __be32 win_cnt; - u8 reserved1[3]; + u8 reserved1; + u8 flags2; + u8 reserved2; u8 mtt_rep; __be64 mtt_seg; __be32 mtt_sz; @@ -71,6 +73,8 @@ struct mlx4_mpt_entry { #define MLX4_MPT_PD_FLAG_RAE (1 28) #define MLX4_MPT_PD_FLAG_EN_INV (3 24) +#define MLX4_MPT_FLAG2_FBO_EN (1 7) + #define MLX4_MPT_STATUS_SW 0xF0 #define MLX4_MPT_STATUS_HW 0x00 @@ -566,8 +570,9 @@ static inline int mlx4_check_fmr(struct mlx4_fmr *fmr, u64 *page_list, return 0; } -int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, - int npages, u64 iova, u32 *lkey, u32 *rkey) +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, struct mlx4_fmr *fmr, + u64 *page_list, int npages, u64 iova, u32 fbo, + u32 len, u32 *lkey, u32 *rkey, int same_key) { u32 key; int i, err; @@ -579,7 +584,8 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list ++fmr-maps; key = key_to_hw_index(fmr-mr.key); - key += dev-caps.num_mpts; + if (!same_key) + key += dev-caps.num_mpts; *lkey = *rkey = fmr-mr.key = hw_index_to_key(key); *(u8 *) fmr-mpt = MLX4_MPT_STATUS_SW; @@ -598,8 +604,10 @@ int 
mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list fmr-mpt-key= cpu_to_be32(key); fmr-mpt-lkey = cpu_to_be32(key); - fmr-mpt-length = cpu_to_be64(npages * (1ull fmr-page_shift)); + fmr-mpt-length = cpu_to_be64(len); fmr-mpt-start = cpu_to_be64(iova); + fmr-mpt-first_byte_offset = cpu_to_be32(fbo 0x001f); + fmr-mpt-flags2 = (fbo ? MLX4_MPT_FLAG2_FBO_EN : 0); /* Make MTT entries are visible before setting MPT status */ wmb(); @@ -611,6 +619,16 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list return 0; } +EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr_fbo); + +int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, + int npages, u64 iova, u32 *lkey, u32 *rkey) +{ + u32 len = npages * (1ull fmr-page_shift); + + return mlx4_map_phys_fmr_fbo(dev, fmr, page_list, npages, iova, 0, + len, lkey, rkey, 0); +} EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 3960033..ae09787 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -472,6 +472,9 @@ void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index); int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index); void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, struct mlx4_fmr *fmr, + u64 *page_list, int npages, u64 iova, u32 fbo, + u32 len, u32 *lkey, u32 *rkey, int same_key); int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, -- 1.6.3.3 ___ ewg mailing list ewg@lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
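The third bullet in the commit message above ("does not increase the higher bits of the key after every map") is easier to see with numbers. The sketch below models the key arithmetic only. The two rotation helpers are written from memory of the mlx4 driver and should be treated as an assumption, since the patch uses them without showing them; `next_map_key()` is a hypothetical name for the few lines of mlx4_map_phys_fmr_fbo() that compute the key.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the driver's helpers: rotate the index by 8 bits. */
static uint32_t hw_index_to_key(uint32_t ind)
{
	return (ind >> 24) | (ind << 8);
}

static uint32_t key_to_hw_index(uint32_t key)
{
	return (key << 24) | (key >> 8);
}

/*
 * Compute the key used for the next mapping of an FMR. Adding num_mpts
 * (a power of two) changes only the bits above the MPT index, so the key
 * differs from the previous one while still selecting the same MPT entry.
 * With same_key set, the key is reused unchanged.
 */
static uint32_t next_map_key(uint32_t cur_key, uint32_t num_mpts, int same_key)
{
	uint32_t idx = key_to_hw_index(cur_key);

	if (!same_key)
		idx += num_mpts;
	return hw_index_to_key(idx);
}
```

With num_mpts = 1 << 19 and an FMR at MPT index 5, an ordinary remap moves the key from 0x500 to 0x8000500, while same_key keeps it at 0x500; in both cases the low 19 bits of the index still select entry 5.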
[ewg] [PATCH v1 06/10] mlx4: Enable create and attach cq to the vector with least number of cqs already attached
From 2d10665af8bfa41f772bedc18592b76b96c06729 Mon Sep 17 00:00:00 2001
From: Vu Pham v...@vu-lt.mti.mtl.com
Date: Tue, 10 Aug 2010 14:33:51 -0700
Subject: [PATCH 06/10] mlx4: Enable create and attach cq to the vector with least number of cqs already attached

When the vector number passed to mlx4_cq_alloc is MLX4_LEAST_ATTACHED_VECTOR,
the driver selects the completion vector that has the fewest CQs attached
and attaches the CQ to the chosen vector.
IB_CQ_VECTOR_LEAST_ATTACHED is defined in rdma/ib_verbs.h; when the mlx4_ib
driver receives this cq vector number, it uses MLX4_LEAST_ATTACHED_VECTOR
to create the cq.

Signed-off-by: Yevgeny Petrilin yevge...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanox.com
---
 drivers/infiniband/hw/mlx4/cq.c |    4 +++-
 drivers/net/mlx4/cq.c           |   27 +++
 drivers/net/mlx4/en_cq.c        |    2 +-
 drivers/net/mlx4/mlx4.h         |    1 +
 include/linux/mlx4/device.h     |    2 ++
 include/rdma/ib_verbs.h         |   10 +-
 6 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 5a219a2..2687970 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -223,7 +223,9 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector
 	}
 
 	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar,
-			    cq->db.dma, &cq->mcq, vector, 0);
+			    cq->db.dma, &cq->mcq,
+			    vector == IB_CQ_VECTOR_LEAST_ATTACHED ?
+			    MLX4_LEAST_ATTACHED_VECTOR : vector, 0);
 	if (err)
 		goto err_dbmap;
 
diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c
index 7cd34e9..a6f03f9 100644
--- a/drivers/net/mlx4/cq.c
+++ b/drivers/net/mlx4/cq.c
@@ -187,6 +187,22 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq,
 }
 EXPORT_SYMBOL_GPL(mlx4_cq_resize);
 
+static int mlx4_find_least_loaded_vector(struct mlx4_priv *priv)
+{
+	int i;
+	int index = 0;
+	int min = priv->eq_table.eq[0].load;
+
+	for (i = 1; i < priv->dev.caps.num_comp_vectors; i++) {
+		if (priv->eq_table.eq[i].load < min) {
+			index = i;
+			min = priv->eq_table.eq[i].load;
+		}
+	}
+
+	return index;
+}
+
 int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 		  struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq,
 		  unsigned vector, int collapsed)
@@ -198,10 +214,11 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 	u64 mtt_addr;
 	int err;
 
-	if (vector >= dev->caps.num_comp_vectors)
-		return -EINVAL;
+	cq->vector = (vector == MLX4_LEAST_ATTACHED_VECTOR) ?
+		mlx4_find_least_loaded_vector(priv) : vector;
 
-	cq->vector = vector;
+	if (cq->vector >= dev->caps.num_comp_vectors)
+		return -EINVAL;
 
 	cq->cqn = mlx4_bitmap_alloc(&cq_table->bitmap);
 	if (cq->cqn == -1)
@@ -232,7 +249,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 
 	cq_context->flags	    = cpu_to_be32(!!collapsed << 18);
 	cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index);
-	cq_context->comp_eqn	    = priv->eq_table.eq[vector].eqn;
+	cq_context->comp_eqn	    = priv->eq_table.eq[cq->vector].eqn;
 	cq_context->log_page_size   = mtt->page_shift - MLX4_ICM_PAGE_SHIFT;
 
 	mtt_addr = mlx4_mtt_addr(dev, mtt);
@@ -245,6 +262,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 	if (err)
 		goto err_radix;
 
+	priv->eq_table.eq[cq->vector].load++;
 	cq->cons_index = 0;
 	cq->arm_sn     = 1;
 	cq->uar        = uar;
@@ -282,6 +300,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq)
 		mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn);
 
 	synchronize_irq(priv->eq_table.eq[cq->vector].irq);
+	priv->eq_table.eq[cq->vector].load--;
 
 	spin_lock_irq(&cq_table->lock);
 	radix_tree_delete(&cq_table->tree, cq->cqn);
diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c
index 21786ad..f3dc8b7 100644
--- a/drivers/net/mlx4/en_cq.c
+++ b/drivers/net/mlx4/en_cq.c
@@ -56,7 +56,7 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv,
 		cq->vector   = ring % mdev->dev->caps.num_comp_vectors;
 	} else {
 		cq->buf_size = sizeof(struct mlx4_cqe);
-		cq->vector   = 0;
+		cq->vector   = MLX4_LEAST_ATTACHED_VECTOR;
 	}
 
 	cq->ring = ring;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 0da5bb7..d1112a8 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -137,6 +137,7 @@ struct mlx4_eq {
 	u16			irq;
 	u16			have_irq;
 	int			nent;
+	int			load;
 	struct mlx4_buf_list   *page_list;
 	struct mlx4_mtt		mtt;
 };
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index ae09787..8afac02 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -166,6 +166,8 @@ enum mlx4_special_vlan_idx {
 	MLX4_VLAN_REGULAR
 };
 
+#define MLX4_LEAST_ATTACHED_VECTOR	0x
+
 static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor)
 {
 	return (major << 32) | (minor << 16) | subminor;
diff --git a/include/rdma/ib_verbs.h b/include
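The selection policy this patch describes (pick the completion vector with the fewest CQs attached, bump its counter on attach, drop it on free) can be tried outside the kernel with a small sketch. The names below are hypothetical stand-ins, not the driver's own structures, and the sketch does no locking; whether concurrent callers need it in the driver is not something the patch text settles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-EQ "load" counter added by the patch. */
struct sketch_eq {
	int load; /* number of CQs currently attached to this vector */
};

/* Return the index of the vector with the fewest attached CQs (lowest index wins ties). */
static size_t sketch_least_loaded(const struct sketch_eq *eq, size_t n)
{
	size_t i, index = 0;

	assert(n > 0);
	for (i = 1; i < n; i++)
		if (eq[i].load < eq[index].load)
			index = i;
	return index;
}

/* Attach a CQ: pick the least-loaded vector and account for it. */
static size_t sketch_attach(struct sketch_eq *eq, size_t n)
{
	size_t v = sketch_least_loaded(eq, n);

	eq[v].load++;
	return v;
}

/* Detach a CQ from vector v. */
static void sketch_detach(struct sketch_eq *eq, size_t v)
{
	assert(eq[v].load > 0);
	eq[v].load--;
}
```

Because ties go to the lowest index, repeated attaches fill the vectors round-robin once their loads are equal.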
[ewg] [PATCH v1 07/10] mlx4_core: Enable T11 support bit in mlx4 device
From 85efcd1d8e1fae3bd5293d923eaf1aa50d54cbce Mon Sep 17 00:00:00 2001
From: Vu Pham v...@vu-lt.mti.mtl.com
Date: Fri, 13 Aug 2010 10:01:31 -0700
Subject: [PATCH 07/10] mlx4_core: Enable T11 support bit in mlx4 device

Enable T11 support bit in mlx4 device
Add bool parameter to enable/disable pre_t11 support
Add interface to query the mode

Signed-off-by: Oren Duer o...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanox.com
---
 drivers/net/mlx4/fw.c       |   13 +
 include/linux/mlx4/device.h |    5 -
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c
index 04f42ae..20bee0f 100644
--- a/drivers/net/mlx4/fw.c
+++ b/drivers/net/mlx4/fw.c
@@ -51,6 +51,10 @@ static int enable_qos;
 module_param(enable_qos, bool, 0444);
 MODULE_PARM_DESC(enable_qos, "Enable Quality of Service support in the HCA (default: off)");
 
+static int mlx4_pre_t11;
+module_param_named(pre_t11_mode, mlx4_pre_t11, bool, 0644);
+MODULE_PARM_DESC(pre_t11_mode, "For FCoXX, enable pre-t11 mode (default: off)");
+
 #define MLX4_GET(dest, source, offset)				      \
 	do {							      \
 		void *__p = (char *) (source) + (offset);	      \
@@ -792,6 +796,8 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param)
 
 	MLX4_PUT(inbox, (u8) (PAGE_SHIFT - 12), INIT_HCA_UAR_PAGE_SZ_OFFSET);
 	MLX4_PUT(inbox, param->log_uar_sz,      INIT_HCA_LOG_UAR_SZ_OFFSET);
 
+	if (!mlx4_pre_t11 && dev->caps.flags & (u32) MLX4_DEV_CAP_FLAG_FC_T11)
+		*(inbox + INIT_HCA_FLAGS_OFFSET / 4) |= cpu_to_be32(1 << 10);
 	err = mlx4_cmd(dev, mailbox->dma, 0, 0, MLX4_CMD_INIT_HCA, 1);
 
@@ -890,3 +896,10 @@ int mlx4_NOP(struct mlx4_dev *dev)
 	/* Input modifier of 0x1f means finish as soon as possible. */
 	return mlx4_cmd(dev, 0, 0x1f, 0, MLX4_CMD_NOP, 100);
 }
+
+void mlx4_get_fc_t11_settings(struct mlx4_dev *dev, int *pre_t11, int *t11)
+{
+	*pre_t11 = mlx4_pre_t11;
+	*t11 = dev->caps.flags & MLX4_DEV_CAP_FLAG_FC_T11;
+}
+EXPORT_SYMBOL_GPL(mlx4_get_fc_t11_settings);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 8afac02..46966bb 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -67,7 +67,8 @@ enum {
 	MLX4_DEV_CAP_FLAG_ATOMIC	= 1 << 18,
 	MLX4_DEV_CAP_FLAG_RAW_MCAST	= 1 << 19,
 	MLX4_DEV_CAP_FLAG_UD_AV_PORT	= 1 << 20,
-	MLX4_DEV_CAP_FLAG_UD_MCAST	= 1 << 21
+	MLX4_DEV_CAP_FLAG_UD_MCAST	= 1 << 21,
+	MLX4_DEV_CAP_FLAG_FC_T11	= 1 << 31
 };
 
 enum {
@@ -491,4 +492,6 @@ int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_SYNC_TPT(struct mlx4_dev *dev);
 
+void mlx4_get_fc_t11_settings(struct mlx4_dev *dev, int *pre_t11, int *t11);
+
 #endif /* MLX4_DEVICE_H */
-- 
1.6.3.3
[ewg] [PATCH v1 08/10] mlx4: Add API to query protocol device of specific port on mlx4_device
From f29fe7ee6b8563eb362e22b8276e817e0337f048 Mon Sep 17 00:00:00 2001
From: Vu Pham v...@vu-lt.mti.mtl.com
Date: Mon, 16 Aug 2010 10:16:42 -0700
Subject: [PATCH 08/10] mlx4: Add API to query protocol device of specific port on mlx4_device

Add new fields to mlx4_interface to set the protocol type (IB, EN, ...)
and an interface to query the underlying protocol device of a specific
port on the given mlx4_device.

Signed-off-by: Oren Duer o...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanx.com
---
 drivers/infiniband/hw/mlx4/main.c |   10 +-
 drivers/net/mlx4/en_main.c        |    9 +
 drivers/net/mlx4/intf.c           |   19 +++
 include/linux/mlx4/driver.h       |    9 +
 4 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 4e94e36..e071229 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -58,6 +58,12 @@ static const char mlx4_ib_version[] =
 	DRV_NAME ": Mellanox ConnectX InfiniBand driver v"
 	DRV_VERSION " (" DRV_RELDATE ")\n";
 
+static void *get_ibdev(struct mlx4_dev *dev, void *ctx, u8 port)
+{
+	struct mlx4_ib_dev *mlxibdev = ctx;
+	return &mlxibdev->ib_dev;
+}
+
 static void init_query_mad(struct ib_smp *mad)
 {
 	mad->base_version  = 1;
@@ -749,7 +755,9 @@ static void mlx4_ib_event(struct mlx4_dev *dev, void *ibdev_ptr,
 static struct mlx4_interface mlx4_ib_interface = {
 	.add	= mlx4_ib_add,
 	.remove	= mlx4_ib_remove,
-	.event	= mlx4_ib_event
+	.event	= mlx4_ib_event,
+	.get_prot_dev = get_ibdev,
+	.protocol	= MLX4_PROT_IB
 };
 
 static int __init mlx4_ib_init(void)
diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c
index 97934f1..03a4c3e 100644
--- a/drivers/net/mlx4/en_main.c
+++ b/drivers/net/mlx4/en_main.c
@@ -281,10 +281,19 @@ err_free_res:
 	return NULL;
 }
 
+static void *get_netdev(struct mlx4_dev *dev, void *ctx, u8 port)
+{
+	struct mlx4_en_dev *mdev = ctx;
+
+	return (port <= MLX4_MAX_PORTS) ? mdev->pndev[port] : NULL;
+}
+
 static struct mlx4_interface mlx4_en_interface = {
 	.add	= mlx4_en_add,
 	.remove	= mlx4_en_remove,
 	.event	= mlx4_en_event,
+	.get_prot_dev = get_netdev,
+	.protocol	= MLX4_PROT_EN
 };
 
 static int __init mlx4_en_init(void)
diff --git a/drivers/net/mlx4/intf.c b/drivers/net/mlx4/intf.c
index 5550678..10e18e4 100644
--- a/drivers/net/mlx4/intf.c
+++ b/drivers/net/mlx4/intf.c
@@ -80,6 +80,25 @@ static void mlx4_remove_device(struct mlx4_interface *intf, struct mlx4_priv *pr
 	}
 }
 
+void *mlx4_get_prot_dev(struct mlx4_dev *dev, enum mlx4_prot proto, int port)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	struct mlx4_device_context *dev_ctx;
+	unsigned long flags;
+	void *result = NULL;
+
+	spin_lock_irqsave(&priv->ctx_lock, flags);
+	list_for_each_entry(dev_ctx, &priv->ctx_list, list)
+		if (dev_ctx->intf->protocol == proto && dev_ctx->intf->get_prot_dev) {
+			result = dev_ctx->intf->get_prot_dev(dev, dev_ctx->context, port);
+			break;
+		}
+	spin_unlock_irqrestore(&priv->ctx_lock, flags);
+
+	return result;
+}
+EXPORT_SYMBOL_GPL(mlx4_get_prot_dev);
+
 int mlx4_register_interface(struct mlx4_interface *intf)
 {
 	struct mlx4_priv *priv;
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 53c5fdb..370e90a 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -44,15 +44,24 @@ enum mlx4_dev_event {
 	MLX4_DEV_EVENT_PORT_REINIT,
 };
 
+enum mlx4_prot {
+	MLX4_PROT_IB,
+	MLX4_PROT_EN,
+};
+
 struct mlx4_interface {
 	void *			(*add)	 (struct mlx4_dev *dev);
 	void			(*remove)(struct mlx4_dev *dev, void *context);
 	void			(*event) (struct mlx4_dev *dev, void *context,
 					  enum mlx4_dev_event event, int port);
+	void *			(*get_prot_dev) (struct mlx4_dev *dev,
+						 void *context, u8 port);
+	enum mlx4_prot		protocol;
 	struct list_head	list;
 };
 
 int mlx4_register_interface(struct mlx4_interface *intf);
 void mlx4_unregister_interface(struct mlx4_interface *intf);
+void *mlx4_get_prot_dev(struct mlx4_dev *dev, enum mlx4_prot protocol, int port);
 
 #endif /* MLX4_DRIVER_H */
-- 
1.6.3.3
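The lookup added by this patch is a dispatch over registered interfaces: find the first one whose protocol matches and that implements the callback, then ask it for its device. A userspace sketch of that pattern, under the assumption that a flat array stands in for the driver's locked context list, might look as follows. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_prot { SKETCH_PROT_IB, SKETCH_PROT_EN };

/* Hypothetical, trimmed-down analogue of struct mlx4_interface. */
struct sketch_interface {
	enum sketch_prot protocol;
	void *(*get_prot_dev)(void *context, int port); /* may be NULL */
	void *context;
};

/*
 * Walk the registered interfaces and ask the first one that matches
 * the protocol (and implements the callback) for its device object.
 */
static void *sketch_get_prot_dev(const struct sketch_interface *intf, size_t n,
				 enum sketch_prot proto, int port)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (intf[i].protocol == proto && intf[i].get_prot_dev)
			return intf[i].get_prot_dev(intf[i].context, port);
	return NULL;
}

/* Example callback: context is an array of two per-port names, ports numbered from 1. */
static void *sketch_get_netdev(void *context, int port)
{
	const char **names = context;

	assert(names != NULL);
	return (port >= 1 && port <= 2) ? (void *)names[port - 1] : NULL;
}
```

Unlike get_netdev() in the patch, which indexes pndev[] directly by the 1-based port number, the sketch keeps a 0-based array and checks both bounds before subtracting one.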
[ewg] [PATCH v1 10/10] mlx4_fc: Enable FC over Ethernet/Infiniband drivers
From 17b0c5412e085e1f288c05cb667ec36e4a05c59b Mon Sep 17 00:00:00 2001
From: Vu Pham v...@vu-lt.mti.mtl.com
Date: Mon, 16 Aug 2010 14:48:01 -0700
Subject: [PATCH 10/10] mlx4_fc: Enable FC over Ethernet/Infiniband drivers

Add entries in scsi's Kconfig and Makefile to enable mlx4_fc (fcoe/fcoib
offload driver) and mlx4_fcoib (FIP-alike discovery driver)

Signed-off-by: Vu Pham v...@mellanox.com
---
 drivers/scsi/Kconfig  |   22 ++
 drivers/scsi/Makefile |    2 ++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 158284f..3573cee 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -687,6 +687,28 @@ config FCOE_FNIC
 	  file:Documentation/scsi/scsi.txt.
 	  The module will be called fnic.
 
+config MLX4_FC
+	tristate "Mellanox FC module"
+	depends on MLX4_EN
+	select LIBFC
+	select LIBFCOE
+	help
+	  Fibre Channel over Ethernet/Infiniband module
+
+	  This is support for the Mellanox ConnectX/ConnectX-2 HCAs
+	  The module will be called mlx4_fc
+
+config MLX4_FCOIB
+	tristate "Mellanox FCoIB discovery module"
+	depends on INFINIBAND
+	select MLX4_FC
+	help
+	  Fibre Channel over Infiniband discovery module
+
+	  The module FIP-alike to discover BridgeX gateways in the
+	  Infiniband fabric
+	  The module will be called mlx4_fc
+
 config SCSI_DMX3191D
 	tristate "DMX3191D SCSI support"
 	depends on PCI && SCSI
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 2a3fca2..0d0dab7 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -40,6 +40,8 @@ obj-$(CONFIG_LIBFC)		+= libfc/
 obj-$(CONFIG_LIBFCOE)		+= fcoe/
 obj-$(CONFIG_FCOE)		+= fcoe/
 obj-$(CONFIG_FCOE_FNIC)		+= fnic/
+obj-$(CONFIG_MLX4_FC)		+= mlx4_fc/
+obj-$(CONFIG_MLX4_FCOIB)	+= mlx4_fc/
 obj-$(CONFIG_ISCSI_TCP) 	+= libiscsi.o	libiscsi_tcp.o iscsi_tcp.o
 obj-$(CONFIG_INFINIBAND_ISER) 	+= libiscsi.o
 obj-$(CONFIG_SCSI_A4000T)	+= 53c700.o	a4000t.o
-- 
1.6.3.3
[ewg] [PATCH 00/10] Add fcoe, fcoib drivers for mlx4 device
Hi Roland,

The following series implements the fcoe and fcoib offload drivers for the mlx4 device:

mlx4_fc: implements fcoe/fcoib, hooks into the scsi mid-layer to offload scsi operations, and uses openfc's libfc to do ELS/BLS
mlx4_fcoib: implements the fcoib initialization protocol to discover IB-FC gateways/bridges

Yevgeny Petrilin:
      Pre-reserve MPTs for FC
      Attach cq to the completion vector with the fewest cqs attached
      Enable T11 bit support in fw
      Add API to query the steer capabilities of mlx4 device

Oren Duer:
      Add APIs to mlx4_en/mlx4_ib driver to query interfaces for given internal device
      Add MPT reserve/release_range APIs

Vu Pham:
      Enable vlan support in qp path
      Enable mlx4_fc/mlx4_fcoib driver in scsi Kconfig/Makefile
      Add mlx4_fc/mlx4_fcoib drivers

 drivers/infiniband/hw/mlx4/cq.c       |    4 +-
 drivers/infiniband/hw/mlx4/main.c     |   10 +-
 drivers/net/mlx4/cq.c                 |   27 +-
 drivers/net/mlx4/en_cq.c              |    2 +-
 drivers/net/mlx4/en_main.c            |   14 +
 drivers/net/mlx4/fw.c                 |   13 +
 drivers/net/mlx4/intf.c               |   50 +
 drivers/net/mlx4/main.c               |   10 +-
 drivers/net/mlx4/mlx4.h               |    2 +
 drivers/net/mlx4/mr.c                 |   29 +-
 drivers/scsi/Kconfig                  |   14 +
 drivers/scsi/Makefile                 |    2 +
 drivers/scsi/mlx4_fc/Makefile         |    8 +
 drivers/scsi/mlx4_fc/fcoib.h          |  561 +
 drivers/scsi/mlx4_fc/fcoib_api.h      |  102 ++
 drivers/scsi/mlx4_fc/fcoib_discover.c | 2003 +
 drivers/scsi/mlx4_fc/fcoib_main.c     | 1340 ++
 drivers/scsi/mlx4_fc/mfc.c            | 1992
 drivers/scsi/mlx4_fc/mfc.h            |  662 +++
 drivers/scsi/mlx4_fc/mfc_exch.c       | 1502
 drivers/scsi/mlx4_fc/mfc_rfci.c       |  990
 drivers/scsi/mlx4_fc/mfc_sysfs.c      |  243
 include/linux/mlx4/device.h           |   20 +-
 include/linux/mlx4/driver.h           |   17 +
 include/linux/mlx4/qp.h               |    2 +-
 include/rdma/ib_verbs.h               |   10 +-
 26 files changed, 9606 insertions(+), 23 deletions(-)
[ewg] [PATCH 01/10] pre-reserve MPTs for FC
From: Yevgeny Petrilin yevge...@mellanox.co.il
Date: Sun, 16 Nov 2008 10:25:59 +0200
Subject: [PATCH] mlx4: Fibre Channel support

As we did with QPs, some of the MPTs are pre-reserved
(the MPTs that are mapped for FEXCHs, 2*64K of them).
So we needed to split the operation of allocating an MPT in two:
	The allocation of a bit from the bitmap
	The actual creation of the entry (and its MTT).
So, mr_alloc_reserved() is the second part, where you know which
MPT number was allocated. mr_alloc() is the one that allocates a
number from the bitmap.
Normal users keep using the original mr_alloc().
For FEXCH, when we know the pre-reserved MPT entry, we call
mr_alloc_reserved() directly.

Same with the mr_free() and corresponding mr_free_reserved().
The first will just put back the bit, the latter will actually
destroy the entry, but will leave the bit set.

map_phys_fmr_fbo() is very much like the original map_phys_fmr(), except that it:
	allows setting an FBO (First Byte Offset) for the MPT
	allows setting the data length for the MPT
	does not increase the higher bits of the key after every map.

Signed-off-by: Yevgeny Petrilin yevge...@mellanox.co.il
Signed-off-by: Oren Duer o...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanx.com

 drivers/net/mlx4/main.c     |    4 +-
 drivers/net/mlx4/mr.c       |  128 +-
 include/linux/mlx4/device.h |   21 +++-
 include/linux/mlx4/qp.h     |   11 +++-
 4 files changed, 144 insertions(+), 20 deletions(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index e3e0d54..38fbf01 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -79,12 +79,12 @@ static char mlx4_version[] __devinitdata =
 	DRV_VERSION " (" DRV_RELDATE ")\n";
 
 static struct mlx4_profile default_profile = {
-	.num_qp		= 1 << 17,
+	.num_qp		= 1 << 18,
 	.num_srq	= 1 << 16,
 	.rdmarc_per_qp	= 1 << 4,
 	.num_cq		= 1 << 16,
 	.num_mcg	= 1 << 13,
-	.num_mpt	= 1 << 17,
+	.num_mpt	= 1 << 19,
 	.num_mtt	= 1 << 20,
 };
 
diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c
index 3dc69be..7185c17 100644
--- a/drivers/net/mlx4/mr.c
+++ b/drivers/net/mlx4/mr.c
@@ -52,7 +52,9 @@ struct mlx4_mpt_entry {
 	__be64 length;
 	__be32 lkey;
 	__be32 win_cnt;
-	u8	reserved1[3];
+	u8	reserved1;
+	u8	flags2;
+	u8	reserved2;
 	u8	mtt_rep;
 	__be64 mtt_seg;
 	__be32 mtt_sz;
@@ -71,6 +73,8 @@ struct mlx4_mpt_entry {
 #define MLX4_MPT_PD_FLAG_RAE	    (1 << 28)
 #define MLX4_MPT_PD_FLAG_EN_INV	    (3 << 24)
 
+#define MLX4_MPT_FLAG2_FBO_EN	    (1 << 7)
+
 #define MLX4_MPT_STATUS_SW		0xF0
 #define MLX4_MPT_STATUS_HW		0x00
 
@@ -263,6 +267,21 @@ static int mlx4_HW2SW_MPT(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox
 			    !mailbox, MLX4_CMD_HW2SW_MPT, MLX4_CMD_TIME_CLASS_B);
 }
 
+int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd,
+			   u64 iova, u64 size, u32 access, int npages,
+			   int page_shift, struct mlx4_mr *mr)
+{
+	mr->iova       = iova;
+	mr->size       = size;
+	mr->pd	       = pd;
+	mr->access     = access;
+	mr->enabled    = 0;
+	mr->key	       = hw_index_to_key(mridx);
+
+	return mlx4_mtt_init(dev, npages, page_shift, &mr->mtt);
+}
+EXPORT_SYMBOL_GPL(mlx4_mr_alloc_reserved);
+
 int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access,
 		  int npages, int page_shift, struct mlx4_mr *mr)
 {
@@ -274,14 +293,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access,
 	if (index == -1)
 		return -ENOMEM;
 
-	mr->iova       = iova;
-	mr->size
[ewg] [PATCH 02/10] api to query mlx4_en device for given mlx4 device
mlx4_en: Add API to query interfaces for given internal device

Updated mlx4_en interface to provide a query function for its
internal net_device structure.

Signed-off-by: Oren Duer o...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanx.com

 drivers/net/mlx4/en_main.c  |   14 ++
 drivers/net/mlx4/intf.c     |   30 ++
 include/linux/mlx4/driver.h |    7 +++
 3 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c
index cbabf14..6fce433 100644
--- a/drivers/net/mlx4/en_main.c
+++ b/drivers/net/mlx4/en_main.c
@@ -262,10 +262,24 @@ err_free_res:
 	return NULL;
 }
 
+enum mlx4_query_reply mlx4_en_query(void *endev_ptr, void *int_dev)
+{
+	struct mlx4_en_dev *mdev = endev_ptr;
+	struct net_device *netdev = int_dev;
+	int p;
+
+	for (p = 1; p <= MLX4_MAX_PORTS; ++p)
+		if (mdev->pndev[p] == netdev)
+			return p;
+
+	return MLX4_QUERY_NOT_MINE;
+}
+
 static struct mlx4_interface mlx4_en_interface = {
 	.add	= mlx4_en_add,
 	.remove	= mlx4_en_remove,
 	.event	= mlx4_en_event,
+	.query	= mlx4_en_query
 };
 
 static int __init mlx4_en_init(void)
diff --git a/drivers/net/mlx4/intf.c b/drivers/net/mlx4/intf.c
index 5550678..beeed80 100644
--- a/drivers/net/mlx4/intf.c
+++ b/drivers/net/mlx4/intf.c
@@ -114,6 +114,36 @@ void mlx4_unregister_interface(struct mlx4_interface *intf)
 }
 EXPORT_SYMBOL_GPL(mlx4_unregister_interface);
 
+struct mlx4_dev *mlx4_query_interface(void *int_dev, int *port)
+{
+	struct mlx4_priv *priv;
+	struct mlx4_device_context *dev_ctx;
+	enum mlx4_query_reply r;
+	unsigned long flags;
+
+	mutex_lock(&intf_mutex);
+
+	list_for_each_entry(priv, &dev_list, dev_list) {
+		spin_lock_irqsave(&priv->ctx_lock, flags);
+		list_for_each_entry(dev_ctx, &priv->ctx_list, list) {
+			if (!dev_ctx->intf->query)
+				continue;
+			r = dev_ctx->intf->query(dev_ctx->context, int_dev);
+			if (r != MLX4_QUERY_NOT_MINE) {
+				*port = r;
+				spin_unlock_irqrestore(&priv->ctx_lock, flags);
+				mutex_unlock(&intf_mutex);
+				return &priv->dev;
+			}
+		}
+		spin_unlock_irqrestore(&priv->ctx_lock, flags);
+	}
+
+	mutex_unlock(&intf_mutex);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(mlx4_query_interface);
+
 void mlx4_dispatch_event(struct mlx4_dev *dev, enum mlx4_dev_event type, int port)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 53c5fdb..55b45a6 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -44,15 +44,22 @@ enum mlx4_dev_event {
 	MLX4_DEV_EVENT_PORT_REINIT,
 };
 
+enum mlx4_query_reply {
+	MLX4_QUERY_NOT_MINE	= -1,
+	MLX4_QUERY_MINE_NOPORT	= 0
+};
+
 struct mlx4_interface {
 	void *			(*add)	 (struct mlx4_dev *dev);
 	void			(*remove)(struct mlx4_dev *dev, void *context);
 	void			(*event) (struct mlx4_dev *dev, void *context,
 					  enum mlx4_dev_event event, int port);
+	enum mlx4_query_reply	(*query) (void *context, void *);
 	struct list_head	list;
 };
 
 int mlx4_register_interface(struct mlx4_interface *intf);
 void mlx4_unregister_interface(struct mlx4_interface *intf);
+struct mlx4_dev *mlx4_query_interface(void *, int *port);
 
 #endif /* MLX4_DRIVER_H */
[ewg] [PATCH 03/10] attach cq to least cqs attached completion vector
When the vector number passed to mlx4_cq_alloc is MLX4_LEAST_ATTACHED_VECTOR,
the driver selects the completion vector that has the fewest CQs attached
to it and attaches the CQ to the chosen vector.
IB_CQ_VECTOR_LEAST_ATTACHED is defined in rdma/ib_verbs.h; when the mlx4_ib
driver receives this cq vector number, it uses MLX4_LEAST_ATTACHED_VECTOR.

Signed-off-by: Yevgeny Petrilin yevge...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanx.com

 drivers/infiniband/hw/mlx4/cq.c |    4 +++-
 drivers/net/mlx4/cq.c           |   27 +++
 drivers/net/mlx4/en_cq.c        |    2 +-
 drivers/net/mlx4/mlx4.h         |    1 +
 include/linux/mlx4/device.h     |    2 ++
 include/rdma/ib_verbs.h         |   10 +-
 6 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 5a219a2..2687970 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -223,7 +223,9 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector
 	}
 
 	err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar,
-			    cq->db.dma, &cq->mcq, vector, 0);
+			    cq->db.dma, &cq->mcq,
+			    vector == IB_CQ_VECTOR_LEAST_ATTACHED ?
+			    MLX4_LEAST_ATTACHED_VECTOR : vector, 0);
 	if (err)
 		goto err_dbmap;
 
diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c
index 7cd34e9..a6f03f9 100644
--- a/drivers/net/mlx4/cq.c
+++ b/drivers/net/mlx4/cq.c
@@ -187,6 +187,22 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq,
 }
 EXPORT_SYMBOL_GPL(mlx4_cq_resize);
 
+static int mlx4_find_least_loaded_vector(struct mlx4_priv *priv)
+{
+	int i;
+	int index = 0;
+	int min = priv->eq_table.eq[0].load;
+
+	for (i = 1; i < priv->dev.caps.num_comp_vectors; i++) {
+		if (priv->eq_table.eq[i].load < min) {
+			index = i;
+			min = priv->eq_table.eq[i].load;
+		}
+	}
+
+	return index;
+}
+
 int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 		  struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq,
 		  unsigned vector, int collapsed)
@@ -198,10 +214,11 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 	u64 mtt_addr;
 	int err;
 
-	if (vector >= dev->caps.num_comp_vectors)
-		return -EINVAL;
+	cq->vector = (vector == MLX4_LEAST_ATTACHED_VECTOR) ?
+		mlx4_find_least_loaded_vector(priv) : vector;
 
-	cq->vector = vector;
+	if (cq->vector >= dev->caps.num_comp_vectors)
+		return -EINVAL;
 
 	cq->cqn = mlx4_bitmap_alloc(&cq_table->bitmap);
 	if (cq->cqn == -1)
@@ -232,7 +249,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 
 	cq_context->flags	    = cpu_to_be32(!!collapsed << 18);
 	cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index);
-	cq_context->comp_eqn	    = priv->eq_table.eq[vector].eqn;
+	cq_context->comp_eqn	    = priv->eq_table.eq[cq->vector].eqn;
 	cq_context->log_page_size   = mtt->page_shift - MLX4_ICM_PAGE_SHIFT;
 
 	mtt_addr = mlx4_mtt_addr(dev, mtt);
@@ -245,6 +262,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 	if (err)
 		goto err_radix;
 
+	priv->eq_table.eq[cq->vector].load++;
 	cq->cons_index = 0;
 	cq->arm_sn     = 1;
 	cq->uar        = uar;
@@ -282,6 +300,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq)
 		mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn);
 
 	synchronize_irq(priv->eq_table.eq[cq->vector].irq);
+	priv->eq_table.eq[cq->vector].load--;
 
 	spin_lock_irq(&cq_table->lock);
 	radix_tree_delete(&cq_table->tree, cq->cqn);
diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c
index 21786ad..f3dc8b7 100644
--- a/drivers/net/mlx4/en_cq.c
+++ b/drivers/net/mlx4/en_cq.c
@@ -56,7 +56,7 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv,
 		cq->vector   = ring % mdev->dev->caps.num_comp_vectors;
 	} else {
 		cq->buf_size = sizeof(struct mlx4_cqe);
-		cq->vector   = 0;
+		cq->vector   = MLX4_LEAST_ATTACHED_VECTOR;
 	}
 
 	cq->ring = ring;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 13343e8..416aeca 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -138,6 +138,7 @@ struct mlx4_eq {
 	u16			irq;
 	u16			have_irq;
 	int			nent;
+	int			load;
 	struct
[ewg] [PATCH 04/10] remove default reservation of fexch qps and mpts
mlx4_core: removed reservation of FEXCH QPs and MPTs

mlx4_fc module will reserve them upon loading.
Added mpt reserve_range and release_range functions.

Signed-off-by: Oren Duer o...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanx.com

 drivers/net/mlx4/main.c     |    4 +---
 drivers/net/mlx4/mr.c       |   29 +++--
 include/linux/mlx4/device.h |    7 ++-
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 38fbf01..bbf773d 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -259,12 +259,10 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 		(1 << dev->caps.log_num_vlans) *
 		(1 << dev->caps.log_num_prios) *
 		dev->caps.num_ports;
-	dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH;
 
 	dev->caps.reserved_qps = dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] +
 		dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] +
-		dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] +
-		dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH];
+		dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR];
 
 	return 0;
 }
diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c
index 7185c17..5f07e0c 100644
--- a/drivers/net/mlx4/mr.c
+++ b/drivers/net/mlx4/mr.c
@@ -267,6 +267,28 @@ static int mlx4_HW2SW_MPT(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox
 			    !mailbox, MLX4_CMD_HW2SW_MPT, MLX4_CMD_TIME_CLASS_B);
 }
 
+int mlx4_mr_reserve_range(struct mlx4_dev *dev, int cnt, int align, u32 *base_mridx)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	u32 mridx;
+
+	mridx = mlx4_bitmap_alloc_range(&priv->mr_table.mpt_bitmap, cnt, align);
+	if (mridx == -1)
+		return -ENOMEM;
+
+	*base_mridx = mridx;
+	return 0;
+
+}
+EXPORT_SYMBOL_GPL(mlx4_mr_reserve_range);
+
+void mlx4_mr_release_range(struct mlx4_dev *dev, u32 base_mridx, int cnt)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	mlx4_bitmap_free_range(&priv->mr_table.mpt_bitmap, base_mridx, cnt);
+}
+EXPORT_SYMBOL_GPL(mlx4_mr_release_range);
+
 int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd,
 			   u64 iova, u64 size, u32 access, int npages,
 			   int page_shift, struct mlx4_mr *mr)
@@ -486,13 +508,8 @@ int mlx4_init_mr_table(struct mlx4_dev *dev)
 	if (!is_power_of_2(dev->caps.num_mpts))
 		return -EINVAL;
 
-	dev->caps.num_fexch_mpts =
-		2 * dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH];
-	dev->caps.reserved_fexch_mpts_base = dev->caps.num_mpts -
-		dev->caps.num_fexch_mpts;
 	err = mlx4_bitmap_init(&mr_table->mpt_bitmap, dev->caps.num_mpts,
-			       ~0, dev->caps.reserved_mrws,
-			       dev->caps.reserved_fexch_mpts_base);
+			       ~0, dev->caps.reserved_mrws, 0);
 	if (err)
 		return err;
 
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 4664d1d..8afac02 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -151,7 +151,6 @@ enum mlx4_qp_region {
 	MLX4_QP_REGION_FW = 0,
 	MLX4_QP_REGION_ETH_ADDR,
 	MLX4_QP_REGION_FC_ADDR,
-	MLX4_QP_REGION_FC_EXCH,
 	MLX4_NUM_QP_REGION
 };
 
@@ -167,10 +166,6 @@ enum mlx4_special_vlan_idx {
 	MLX4_VLAN_REGULAR
 };
 
-enum {
-	MLX4_NUM_FEXCH = 64 * 1024,
-};
-
 #define MLX4_LEAST_ATTACHED_VECTOR 0x
 
 static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor)
@@ -426,6 +421,8 @@ int mlx4_mtt_init(struct mlx4_dev *dev, int npages, int page_shift,
 void mlx4_mtt_cleanup(struct mlx4_dev *dev, struct mlx4_mtt *mtt);
 u64 mlx4_mtt_addr(struct mlx4_dev *dev, struct mlx4_mtt *mtt);
 
+int mlx4_mr_reserve_range(struct mlx4_dev *dev, int cnt, int align, u32 *base_mridx);
+void mlx4_mr_release_range(struct mlx4_dev *dev, u32 base_mridx, int cnt);
 int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd,
 			   u64 iova, u64 size, u32 access, int npages,
 			   int page_shift, struct mlx4_mr *mr);
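As a rough illustration of what the reserve_range/release_range pair in this patch is for, here is a minimal userspace sketch of an aligned, contiguous range allocator. It is not the mlx4 bitmap code: the `toy_*` names, the table size and the byte-per-entry layout are all assumptions made for the example, and it only mirrors the behaviour suggested by the patch (reserve `cnt` consecutive indices at an aligned base, fail when no room is left, release them later).

```c
#include <stdint.h>
#include <string.h>

#define TOY_NUM_MPTS 64 /* hypothetical table size, not an mlx4 value */

/* One byte per entry keeps the sketch simple; the real driver uses a bitmap. */
struct toy_bitmap {
	uint8_t used[TOY_NUM_MPTS];
};

/*
 * Find cnt free consecutive entries whose base index is a multiple of
 * align, mark them used and return the base through *base.
 * Returns 0 on success, -1 when no suitable range exists.
 */
int toy_reserve_range(struct toy_bitmap *bm, int cnt, int align, uint32_t *base)
{
	if (cnt <= 0 || align <= 0)
		return -1;

	for (int start = 0; start + cnt <= TOY_NUM_MPTS; start += align) {
		int i;

		/* Count how many entries from start onwards are free. */
		for (i = 0; i < cnt && !bm->used[start + i]; i++)
			;
		if (i == cnt) {
			memset(&bm->used[start], 1, (size_t)cnt);
			*base = (uint32_t)start;
			return 0;
		}
	}
	return -1;
}

/* Give back a range previously handed out by toy_reserve_range(). */
void toy_release_range(struct toy_bitmap *bm, uint32_t base, int cnt)
{
	memset(&bm->used[base], 0, (size_t)cnt);
}
```

In the patch itself the caller would be the mlx4_fc module, which per the commit message reserves its MPTs when it loads instead of having mlx4_core set them aside unconditionally.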
[ewg] [PATCH 05/10] query ib device from given mlx4 device
Adding API to query ib_device with mlx4_dev

Signed-off-by: Oren Duer o...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanx.com

 drivers/infiniband/hw/mlx4/main.c |   10 +-
 drivers/net/mlx4/intf.c           |   20
 drivers/net/mlx4/main.c           |   10 +++---
 drivers/net/mlx4/mlx4.h           |    1 +
 include/linux/mlx4/driver.h       |   10 ++
 7 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 4e94e36..e071229 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -58,6 +58,12 @@ static const char mlx4_ib_version[] =
 	DRV_NAME ": Mellanox ConnectX InfiniBand driver v"
 	DRV_VERSION " (" DRV_RELDATE ")\n";
 
+static void *get_ibdev(struct mlx4_dev *dev, void *ctx, u8 port)
+{
+	struct mlx4_ib_dev *mlxibdev = ctx;
+	return &mlxibdev->ib_dev;
+}
+
 static void init_query_mad(struct ib_smp *mad)
 {
 	mad->base_version = 1;
@@ -749,7 +755,9 @@ static void mlx4_ib_event(struct mlx4_dev *dev, void *ibdev_ptr,
 static struct mlx4_interface mlx4_ib_interface = {
 	.add	= mlx4_ib_add,
 	.remove	= mlx4_ib_remove,
-	.event	= mlx4_ib_event
+	.event	= mlx4_ib_event,
+	.get_prot_dev = get_ibdev,
+	.protocol = MLX4_PROT_IB
 };
 
 static int __init mlx4_ib_init(void)
diff --git a/drivers/net/mlx4/intf.c b/drivers/net/mlx4/intf.c
index beeed80..f8f97f9 100644
--- a/drivers/net/mlx4/intf.c
+++ b/drivers/net/mlx4/intf.c
@@ -191,3 +191,23 @@ void mlx4_unregister_device(struct mlx4_dev *dev)
 
 	mutex_unlock(&intf_mutex);
 }
+
+void *mlx4_find_get_prot_dev(struct mlx4_dev *dev, enum mlx4_prot proto, int port)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	struct mlx4_device_context *dev_ctx;
+	unsigned long flags;
+	void *result = NULL;
+
+	spin_lock_irqsave(&priv->ctx_lock, flags);
+
+	list_for_each_entry(dev_ctx, &priv->ctx_list, list)
+		if (dev_ctx->intf->protocol == proto && dev_ctx->intf->get_prot_dev) {
+			result = dev_ctx->intf->get_prot_dev(dev, dev_ctx->context, port);
+			break;
+		}
+
+	spin_unlock_irqrestore(&priv->ctx_lock, flags);
+
+	return result;
+}
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 38fbf01..f14f0d6 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -105,6 +105,12 @@ static int log_mtts_per_seg = ilog2(MLX4_MTT_ENTRY_PER_SEG);
 module_param_named(log_mtts_per_seg, log_mtts_per_seg, int, 0444);
 MODULE_PARM_DESC(log_mtts_per_seg, "Log2 number of MTT entries per segment (1-5)");
 
+void *mlx4_get_prot_dev(struct mlx4_dev *dev, enum mlx4_prot proto, int port)
+{
+	return mlx4_find_get_prot_dev(dev, proto, port);
+}
+EXPORT_SYMBOL(mlx4_get_prot_dev);
+
 int mlx4_check_port_params(struct mlx4_dev *dev,
 			   enum mlx4_port_type *port_type)
 {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 416aeca..9c62019 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -364,6 +364,7 @@ int mlx4_restart_one(struct pci_dev *pdev);
 int mlx4_register_device(struct mlx4_dev *dev);
 void mlx4_unregister_device(struct mlx4_dev *dev);
 void mlx4_dispatch_event(struct mlx4_dev *dev, enum mlx4_dev_event type, int port);
+void *mlx4_find_get_prot_dev(struct mlx4_dev *dev, enum mlx4_prot proto, int port);
 
 struct mlx4_dev_cap;
 struct mlx4_init_hca_param;
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 55b45a6..94c9617 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -49,17 +49,27 @@ enum mlx4_query_reply {
 	MLX4_QUERY_MINE_NOPORT	= 0
 };
 
+enum mlx4_prot {
+	MLX4_PROT_IB,
+	MLX4_PROT_EN,
+};
+
 struct mlx4_interface {
 	void *			(*add)	 (struct mlx4_dev *dev);
 	void			(*remove)(struct mlx4_dev *dev, void *context);
 	void			(*event) (struct mlx4_dev *dev, void *context,
 					  enum mlx4_dev_event event, int port);
+	void *			(*get_prot_dev) (struct mlx4_dev *dev, void *context, u8 port);
+	enum mlx4_prot		protocol;
+
 	enum mlx4_query_reply	(*query) (void *context, void *);
 	struct list_head	list;
 };
 
 int mlx4_register_interface(struct mlx4_interface *intf);
 void mlx4_unregister_interface(struct mlx4_interface *intf);
 
+void *mlx4_get_prot_dev(struct mlx4_dev *dev, enum mlx4_prot proto, int port);
+
 struct mlx4_dev *mlx4_query_interface(void *, int *port);
 
 #endif /* MLX4_DRIVER_H */
[ewg] [PATCH 08/10] enable vlan support in mlx4 qp path
Enable vlan support in qp path, allow traffic to be encapsulated in tagged vlan frames.

Signed-off-by: Vu Pham v...@mellanox.com

diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h
index 7abe643..1e53d45 100644
--- a/include/linux/mlx4/qp.h
+++ b/include/linux/mlx4/qp.h
@@ -109,7 +109,7 @@ struct mlx4_qp_path {
 	__be32			tclass_flowlabel;
 	u8			rgid[16];
 	u8			sched_queue;
-	u8			snooper_flags;
+	u8			vlan_index;
 	u8			reserved3[2];
 	u8			counter_index;
 	u8			reserved4[7];
[ewg] [PATCH 06/10] enable T11 bit for mlx4 device
Enable T11 bit support on mlx4 device

Signed-off-by: Oren Duer o...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanx.com

 drivers/net/mlx4/fw.c       |   13 +
 include/linux/mlx4/device.h |    5 -
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c
index 04f42ae..1286b72 100644
--- a/drivers/net/mlx4/fw.c
+++ b/drivers/net/mlx4/fw.c
@@ -51,6 +51,10 @@ static int enable_qos;
 module_param(enable_qos, bool, 0444);
 MODULE_PARM_DESC(enable_qos, "Enable Quality of Service support in the HCA (default: off)");
 
+static int mlx4_pre_t11_mode = 0;
+module_param_named(enable_pre_t11_mode, mlx4_pre_t11_mode, int, 0644);
+MODULE_PARM_DESC(enable_pre_t11_mode, "For FCoXX, enable pre-t11 mode if non-zero (default: 0)");
+
 #define MLX4_GET(dest, source, offset)				      \
 	do {							      \
 		void *__p = (char *) (source) + (offset);	      \
@@ -792,6 +796,8 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param)
 	MLX4_PUT(inbox, (u8) (PAGE_SHIFT - 12), INIT_HCA_UAR_PAGE_SZ_OFFSET);
 	MLX4_PUT(inbox, param->log_uar_sz,      INIT_HCA_LOG_UAR_SZ_OFFSET);
 
+	if (!mlx4_pre_t11_mode && dev->caps.flags & (u32) MLX4_DEV_CAP_FLAG_FC_T11)
+		*(inbox + INIT_HCA_FLAGS_OFFSET / 4) |= cpu_to_be32(1 << 10);
 	err = mlx4_cmd(dev, mailbox->dma, 0, 0, MLX4_CMD_INIT_HCA, 1);
 
@@ -890,3 +896,10 @@ int mlx4_NOP(struct mlx4_dev *dev)
 	/* Input modifier of 0x1f means finish as soon as possible. */
 	return mlx4_cmd(dev, 0, 0x1f, 0, MLX4_CMD_NOP, 100);
 }
+
+void mlx4_get_fc_t11_settings(struct mlx4_dev *dev, int *enable_pre_t11, int *t11_supported)
+{
+	*enable_pre_t11 = mlx4_pre_t11_mode;
+	*t11_supported = dev->caps.flags & MLX4_DEV_CAP_FLAG_FC_T11;
+}
+EXPORT_SYMBOL_GPL(mlx4_get_fc_t11_settings);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 8afac02..d173008 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -67,7 +67,8 @@ enum {
 	MLX4_DEV_CAP_FLAG_ATOMIC	= 1 << 18,
 	MLX4_DEV_CAP_FLAG_RAW_MCAST	= 1 << 19,
 	MLX4_DEV_CAP_FLAG_UD_AV_PORT	= 1 << 20,
-	MLX4_DEV_CAP_FLAG_UD_MCAST	= 1 << 21
+	MLX4_DEV_CAP_FLAG_UD_MCAST	= 1 << 21,
+	MLX4_DEV_CAP_FLAG_FC_T11	= 1 << 31
 };
 
 enum {
@@ -491,4 +492,6 @@ int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_SYNC_TPT(struct mlx4_dev *dev);
 
+void mlx4_get_fc_t11_settings(struct mlx4_dev *dev, int *enable_pre_t11, int *t11_supported);
+
 #endif /* MLX4_DEVICE_H */
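The INIT_HCA hunk in this patch ORs bit 10 into a big-endian 32-bit word of the command mailbox. A minimal userspace sketch of that idea follows, written byte-wise so it behaves the same on any host. The helper names are made up for the example, and any offset passed in is a placeholder chosen by the caller, not the driver's INIT_HCA_FLAGS_OFFSET (whose value is not shown in this thread).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Set bit `bit` (0 = least significant, must be < 32) of the big-endian
 * 32-bit word that starts at byte_offset in buf. In big-endian order the
 * least significant byte is the last of the four, hence the "3 - bit / 8".
 */
void set_be32_bit(uint8_t *buf, size_t byte_offset, unsigned int bit)
{
	buf[byte_offset + 3 - bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/* Read the same big-endian word back as a host-order integer. */
uint32_t get_be32(const uint8_t *buf, size_t byte_offset)
{
	return ((uint32_t)buf[byte_offset] << 24) |
	       ((uint32_t)buf[byte_offset + 1] << 16) |
	       ((uint32_t)buf[byte_offset + 2] << 8) |
	       (uint32_t)buf[byte_offset + 3];
}
```

Setting bit 10 this way touches the third byte of the word with mask 0x04, which is the same memory image that `|= cpu_to_be32(1 << 10)` produces on a 32-bit-word pointer.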
[ewg] [PATCH 07/10] query the steer capabilities of mlx4 device
Add API to query the steer capabilities of mlx4 device

Signed-off-by: Oren Duer o...@mellanox.co.il
Signed-off-by: Vu Pham v...@mellanx.com

diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 4408b96..1777965 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -396,6 +394,14 @@ struct mlx4_init_port_param {
 	u64			si_guid;
 };
 
+static inline void mlx4_query_steer_cap(struct mlx4_dev *dev, int *log_mac,
+					int *log_vlan, int *log_prio)
+{
+	*log_mac = dev->caps.log_num_macs;
+	*log_vlan = dev->caps.log_num_vlans;
+	*log_prio = dev->caps.log_num_prios;
+}
+
 #define mlx4_foreach_port(port, dev, type)				\
 	for ((port) = 1; (port) <= (dev)->caps.num_ports; (port)++)	\
 		if (((type) == MLX4_PORT_TYPE_IB ? (dev)->caps.port_mask : \
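Note that the helper in this patch hands back log2 values, not counts. A caller presumably has to shift to get the actual number of MAC, VLAN and priority entries. The sketch below shows that conversion in plain userspace C; `struct toy_caps` and both function names are stand-ins invented for the example, not mlx4 definitions.

```c
/* Hypothetical stand-in for the three capability fields the helper reads. */
struct toy_caps {
	int log_num_macs;
	int log_num_vlans;
	int log_num_prios;
};

/* Same shape as the helper in the patch: copy out the log2 values. */
void toy_query_steer_cap(const struct toy_caps *caps, int *log_mac,
			 int *log_vlan, int *log_prio)
{
	*log_mac = caps->log_num_macs;
	*log_vlan = caps->log_num_vlans;
	*log_prio = caps->log_num_prios;
}

/* Number of (mac, vlan, prio) combinations implied by the log2 values. */
unsigned long toy_steer_combinations(const struct toy_caps *caps)
{
	int log_mac, log_vlan, log_prio;

	toy_query_steer_cap(caps, &log_mac, &log_vlan, &log_prio);
	return (1UL << log_mac) * (1UL << log_vlan) * (1UL << log_prio);
}
```

For example, log values of 7, 7 and 3 correspond to 128 MACs, 128 VLANs and 8 priorities, or 131072 combinations.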
Re: [ewg] nfsrdma fails to write big file,
Tom,

Some more info on the problem:
1. Running with memreg=4 (FMR) I cannot reproduce the problem
2. I also see a different error on the client

Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name 'nobody' does not map into domain 'localdomain'
Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send returned -12 cq_init 48 cq_count 32
Feb 22 12:17:00 mellanox-2 kernel: RPC: rpcrdma_event_process: send WC status 5, vend_err F5
Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to 13.20.1.9:20049 closed (-103)

-vu

-Original Message-
From: Tom Tucker [mailto:t...@opengridcomputing.com]
Sent: Monday, February 22, 2010 10:49 AM
To: Vu Pham
Cc: linux-r...@vger.kernel.org; Mahesh Siddheshwar; ewg@lists.openfabrics.org
Subject: Re: [ewg] nfsrdma fails to write big file,

Vu Pham wrote:

Setup:
1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2 QDR HCAs fw 2.7.8-6, RHEL 5.2.
2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.

Running vdbench on a 10g file or *dd if=/dev/zero of=10g_file bs=1M count=1*, the operation fails, the connection gets dropped, and the client cannot re-establish the connection to the server. After rebooting only the client, I can mount again.

It happens with both Solaris and linux nfsrdma servers.

For linux client/server, I run memreg=5 (FRMR); I don't see the problem with memreg=6 (global dma key)

Awesome. This is the key I think.

Thanks for the info Vu,
Tom

On Solaris server snv 130, we see a problem decoding a write request of 32K. The client sends two read chunks (32K and 16-byte), the server fails to do the rdma read on the 16-byte chunk (cqe.status = 10, i.e. IB_WC_REM_ACCESS_ERR); therefore, the server terminates the connection. We don't see this problem with nfs version 3 on Solaris. The Solaris server runs in normal memory registration mode.

On the linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR

I added these notes in bug #1919 (bugs.openfabrics.org) to track the issue.

thanks,
-vu
Re: [ewg] MLX4 Strangeness
Hi Tom,

Status 12 = IB_WC_RETRY_EXC_ERR
Vendor_err = 129 -- Timeout and transport error counter exceeded

This indicates that we lost connection to the client ie. something went wrong on client side (bad operation cause QP error...) please try to catch any error on the client (qp async event, cq error status and vendor_err...)

Today I just run vdbench on big file and get error right away (lost connection and nfsrdma cannot recover from there)

Thanks,
-vu

-Original Message-
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Tom Tucker
Sent: Wednesday, February 17, 2010 10:07 AM
To: Tziporet Koren
Cc: linux-r...@vger.kernel.org; ewg@lists.openfabrics.org
Subject: Re: [ewg] MLX4 Strangeness

Hi Tziporet:

Here is a trace with the data for WR failing with status 12. The vendor error is 129.

Feb 17 12:27:33 vic10 kernel: rpcrdma_event_process:154 wr_id status 12 opcode 0 vendor_err 129 byte_len 0 qp 81002a13ec00 ex src_qp wc_flags, 0 pkey_index
Feb 17 12:27:33 vic10 kernel: rpcrdma_event_process:154 wr_id 81002878d800 status 5 opcode 0 vendor_err 244 byte_len 0 qp 81002a13ec00 ex src_qp wc_flags, 0 pkey_index
Feb 17 12:27:33 vic10 kernel: rpcrdma_event_process:167 wr_id 81002878d800 status 5 opcode 0 vendor_err 244 byte_len 0 qp 81002a13ec00 ex src_qp wc_flags, 0 pkey_index

Any thoughts?
Tom

Tom Tucker wrote:
Tom Tucker wrote:
Tziporet Koren wrote:
On 2/15/2010 10:24 PM, Tom Tucker wrote:

Hello,

I am seeing some very strange behavior on my MLX4 adapters running 2.7 firmware and the latest OFED 1.5.1. Two systems are involved and each have dual ported MTHCA DDR adapter and MLX4 adapters.

The scenario starts with NFSRDMA stress testing between the two systems running bonnie++ and iozone concurrently. The test completes and there is no issue. Then 6 minutes pass and the server times out the connection and shuts down the RC connection to the client.

From this point on, using the RDMA CM, a new RC QP can be brought up and moved to RTS, however, the first RDMA_SEND to the NFS SERVER system fails with IB_WC_RETRY_EXC_ERR. I have confirmed:

- that arp completed successfully and the neighbor entries are populated on both the client and server
- that the QP are in the RTS state on both the client and server
- that there are RECV WR posted to the RQ on the server and they did not error out
- that no RECV WR completed successfully or in error on the server
- that there are SEND WR posted to the QP on the client
- the client side SEND_WR fails with error 12 as mentioned above

I have also confirmed the following with a different application (i.e. rping):

server# rping -s
client# rping -c -a 192.168.80.129

fails with the exact same error, i.e.

client# rping -c -a 192.168.80.129
cq completion failed status 12
wait for RDMA_WRITE_ADV state 10
client DISCONNECT EVENT...

However, if I run rping the other way, it works fine, that is,

client# rping -s
server# rping -c -a 192.168.80.135

It runs without error until I stop it.

Does anyone have any ideas on how I might debug this?

Tom

What is the vendor syndrome error when you get a completion with error?

Feb 16 15:08:29 vic10 kernel: rpcrdma: connection to 192.168.80.129:20049 closed (-103)
Feb 16 15:51:27 vic10 kernel: rpcrdma: connection to 192.168.80.129:20049 on mlx4_0, memreg 5 slots 32 ird 16
Feb 16 15:52:01 vic10 kernel: rpcrdma_event_process:160 wr_id 81002879a000 status 5 opcode 0 vendor_err 244 byte_len 0 qp 81003c9e3200 ex src_qp wc_flags, 0 pkey_index
Feb 16 15:52:06 vic10 kernel: rpcrdma: connection to 192.168.80.129:20049 closed (-103)
Feb 16 15:52:06 vic10 kernel: rpcrdma: connection to 192.168.80.129:20049 on mlx4_0, memreg 5 slots 32 ird 16
Feb 16 15:52:40 vic10 kernel: rpcrdma_event_process:160 wr_id 81002879a000 status 5 opcode 0 vendor_err 244 byte_len 0 qp 81002f2d8400 ex src_qp wc_flags, 0 pkey_index

Repeat forever

So the vendor err is 244.

Please ignore this. This log skips the failing WR (:-\). I need to do another trace.

Does the issue occurs only on the ConnectX cards (mlx4) or also on the InfiniHost cards (mthca)

Tziporet
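The numeric completion statuses quoted in this thread (5, 10 and 12 across the nfsrdma messages) are values of the verbs work-completion status enum. As a reading aid, here is a small lookup sketch; the names and ordering follow the first entries of `enum ib_wc_status` as I understand the kernel's ib_verbs.h, while the table and function names themselves are made up for the example. It says nothing about the vendor_err codes, which are device specific.

```c
#include <stddef.h>

/* First 14 values of the work-completion status enum, indexed by value. */
static const char *const toy_wc_status_names[] = {
	"IB_WC_SUCCESS",           /* 0 */
	"IB_WC_LOC_LEN_ERR",       /* 1 */
	"IB_WC_LOC_QP_OP_ERR",     /* 2 */
	"IB_WC_LOC_EEC_OP_ERR",    /* 3 */
	"IB_WC_LOC_PROT_ERR",      /* 4 */
	"IB_WC_WR_FLUSH_ERR",      /* 5 */
	"IB_WC_MW_BIND_ERR",       /* 6 */
	"IB_WC_BAD_RESP_ERR",      /* 7 */
	"IB_WC_LOC_ACCESS_ERR",    /* 8 */
	"IB_WC_REM_INV_REQ_ERR",   /* 9 */
	"IB_WC_REM_ACCESS_ERR",    /* 10 */
	"IB_WC_REM_OP_ERR",        /* 11 */
	"IB_WC_RETRY_EXC_ERR",     /* 12 */
	"IB_WC_RNR_RETRY_EXC_ERR", /* 13 */
};

/* Map a numeric status from a log line to its name, or "unknown". */
const char *toy_wc_status_name(int status)
{
	size_t n = sizeof(toy_wc_status_names) / sizeof(toy_wc_status_names[0]);

	if (status < 0 || (size_t)status >= n)
		return "unknown";
	return toy_wc_status_names[status];
}
```

Read this way, the repeated "status 5" lines are flush errors, which are typically a consequence of the QP already being in the error state, so the status 12 completion is the one worth chasing.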
Re: [ewg] [GIT PULL ofed-1.4.1] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.
Steve,

You should have the same fix for the server side as well

Here is the patch

-vu

A bad cast causes the iova_start, which in this case is a DMA bus address, to be truncated on 32b systems. No cast is needed.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---
 net/sunrpc/xprtrdma/verbs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 3b21e0c..3a374f5 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1489,7 +1489,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
 	memset(&frmr_wr, 0, sizeof frmr_wr);
 	frmr_wr.opcode = IB_WR_FAST_REG_MR;
 	frmr_wr.send_flags = 0;			/* unsignaled */
-	frmr_wr.wr.fast_reg.iova_start = (unsigned long)seg1->mr_dma;
+	frmr_wr.wr.fast_reg.iova_start = seg1->mr_dma;
 	frmr_wr.wr.fast_reg.page_list = seg1->mr_chunk.rl_mw->r.frmr.fr_pgl;
 	frmr_wr.wr.fast_reg.page_list_len = i;
 	frmr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;

--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c	2009-03-03 17:05:22.0 -0800
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c	2009-04-24 13:38:36.0 -0700
@@ -1228,7 +1244,7 @@
 	memset(&fastreg_wr, 0, sizeof fastreg_wr);
 	fastreg_wr.opcode = IB_WR_FAST_REG_MR;
 	fastreg_wr.send_flags = IB_SEND_SIGNALED;
-	fastreg_wr.wr.fast_reg.iova_start = (unsigned long)frmr->kva;
+	fastreg_wr.wr.fast_reg.iova_start = frmr->kva;
 	fastreg_wr.wr.fast_reg.page_list = frmr->page_list;
 	fastreg_wr.wr.fast_reg.page_list_len = frmr->page_list_len;
 	fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
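To make the truncation described in the commit message concrete, here is a small userspace sketch. It assumes the usual case where `unsigned long` is 32 bits wide on a 32-bit kernel while the DMA bus address is 64 bits; `ulong32_t` is a stand-in typedef so the effect is visible on any host, and the function names are invented for the example.

```c
#include <stdint.h>

/* Stand-in for `unsigned long` on a 32-bit build (assumption for the demo). */
typedef uint32_t ulong32_t;

/* Mimics the pre-patch assignment: the address goes through a 32-bit cast. */
uint64_t iova_with_cast(uint64_t dma_addr)
{
	return (ulong32_t)dma_addr;
}

/* Mimics the patched assignment: 64-bit to 64-bit, nothing is lost. */
uint64_t iova_without_cast(uint64_t dma_addr)
{
	return dma_addr;
}
```

With a bus address of 0x123456000 the cast version yields 0x23456000, so the registration would describe a different address than the one the buffer was mapped at. Addresses below 4 GB survive the cast unchanged, which may be why the problem only shows up on some 32-bit systems.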
Re: [ewg] Re: cannot install SCST with SRP support
Doron Shoham wrote:
Vu Pham wrote:

In order to run with *scst_disk* you need to patch and rebuild the kernel. Please follow SCST readme to do this task

If you don't want to patch and rebuild the kernel, you can run with *scst_vdisk* and get the same performance. I attach here an example to setup *scst_vdisk*

-vu

Can you please explain to me what the differences between scst_disk and scst_vdisk are?

scst_disk: scsi pass through module driver, using scsi mid-level interfaces to send commands

scst_vdisk: simulating file and block devices as scsi luns. Using file system interfaces (fileIO) and block interfaces (blockio) to send commands

Is there any way to config SCST to work with RAM device (instead of working with a real scsi device)?

Yes. There are two ways

1. NULLIO or memory IO
   a. dd if=/dev/zero of=/tmp/tempfile bs=64k count=100
   b. echo "open vdisk0 /tmp/tempfile NULLIO" > /proc/scsi_tgt/vdisk/vdisk
   c. echo "add vdisk0 0" > /proc/scsi_tgt/groups/Default/devices

2. Using ramdisk
   a. echo "open vdisk1 /dev/ram0 BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
   b. echo "add vdisk1 1" > /proc/scsi_tgt/groups/Default/devices

Lun 0 with NULLIO will perform better
Re: [ewg] Re: cannot install SCST with SRP support
I got:
a. modprobe scst
b. modprobe scst_disk

In order to run with *scst_disk* you need to patch and rebuild the kernel. Please follow SCST readme to do this task

If you don't want to patch and rebuild the kernel, you can run with *scst_vdisk* and get the same performance. I attach here an example to setup *scst_vdisk*

-vu

FATAL: Error inserting scst_disk (/lib/modules/2.6.16.21-0.8-smp/extra/dev_handlers/scst_disk.ko): No such device

c. cat /proc/scsi_tgt/scsi_tgt

Device (host:ch:id:lun or name)    Device handler
0:0:0:0                            none
1:0:0:0                            none
2:0:1:0                            none
4:0:1:0                            none
2:0:1:1                            none
4:0:1:1                            none
2:0:1:2                            none
4:0:1:2                            none
2:0:1:3                            none
4:0:1:3                            none
2:0:1:4                            none
4:0:1:4                            none
2:0:1:5                            none
4:0:1:5                            none
2:0:1:6                            none
4:0:1:6                            none
4:0:1:7                            none
2:0:1:7                            none
4:0:1:8                            none
2:0:1:8                            none

Doesn't the Device handler need to be dev_disk?

-vu

Thanks,
Doron

#!/bin/sh

modprobe scst
modprobe scst_vdisk

echo "open vdisk0 /dev/sdb BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk1 /dev/sdc BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk2 /dev/sdd BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk3 /dev/sde BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk4 /dev/sdf BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk5 /dev/sdg BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk6 /dev/sdh BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk7 /dev/sdi BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk8 /dev/sdj BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk9 /dev/sdk BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk10 /dev/sdl BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk11 /dev/sdm BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk12 /dev/sdn BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk13 /dev/sdo BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk
echo "open vdisk14 /dev/sdp BLOCKIO" > /proc/scsi_tgt/vdisk/vdisk

echo "add vdisk0 0" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk1 1" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk2 2" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk3 3" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk4 4" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk5 5" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk6 6" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk7 7" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk8 8" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk9 9" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk10 10" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk11 11" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk12 12" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk13 13" > /proc/scsi_tgt/groups/Default/devices
echo "add vdisk14 14" > /proc/scsi_tgt/groups/Default/devices

modprobe ib_srpt

echo "add mgmt" > /proc/scsi_tgt/trace_level
echo "add mgmt_dbg" > /proc/scsi_tgt/trace_level
echo "add out_of_mem" > /proc/scsi_tgt/trace_level
[ewg][PATCH][0/2] SRP multipath failover within 60 seconds,
The following patches assist SRP/dm-multipath to fail over within 60 seconds (bugzilla #577) without data corruption or read/write errors

1. srp_disconnect_without_wait.patch - srp sends the disconnect request without waiting for the CM timewait exit event, since srp currently does not re-use the cm_id and qp/cq of a connection (patch srp_1_recreate_at_reconnect.patch, already in kernel_patches/fixes, recreates the cm_id and qp/cq for a connection at reconnect)

2. srp_qp_in_err_timer_reconnect_target.patch - when detecting a post_send/post_receive error, srp sets qp_in_error, sets a timer to reconnect to the target, returns SCSI_MLQUEUE_HOST_BUSY to block the queue, and returns DID_NO_CONNECT when the target state is DEAD or REMOVED

Here is my multipath.conf

defaults {
        udev_dir                /dev
        polling_interval        5
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           5
        user_friendly_names     no
}

I also set srp_daemon.sh to rescan the fabric every 60 seconds (instead of the default 300 secs)

I ran a data integrity test to the /dev/mapper/ devices and {disable path 1, sleep 90, enable path 1, sleep 60, disable path 2, sleep 90, enable path 2, sleep 60} in a loop

RHEL5, 5.1 work very well (no data corruption, no read/write failures reported)

For SLES 10 sp1, it works well as long as I run *multipath* every 60 secs. I think that I mis-configured multipathd somehow (Here is how I set it up: using the same multipath.conf above, chkconfig boot.multipath on and chkconfig multipathd on)

-vu
[ewg][PATCH][1/2] SRP multipath failover within 60 seconds,
srp_disconnect_without_wait.patch - srp send disconnect request without waiting for CM timewait exit event since srp current does not re-use the cm_id and qp/cq of a connection (patch srp_1_recreate_at_reconnect.patch already in kernel_patches/fixes recreate the cmid, qp/cq for a connection at reconnect)

Signed-off-by: Vu Pham [EMAIL PROTECTED]

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..45a2533 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -400,7 +400,6 @@
 		printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
 		return;
 	}
-	wait_for_completion(&target->done);
 }
 
 static void srp_remove_work(struct work_struct *work)
@@ -1266,7 +1294,6 @@
 	case IB_CM_TIMEWAIT_EXIT:
 		printk(KERN_ERR PFX "connection closed\n");
 
-		comp = 1;
 		target->status = 0;
 		break;
 
[ewg][PATCH][2/2] SRP multipath failover within 60 seconds,
srp_qp_in_err_timer_reconnect_target.patch - when detecting a post_send/post_receive error, srp set qp_in_error, set a timer to reconnect to target, return SCSI_MLQUEUE_HOST_BUSY to lock the queue, and return DID_NO_CONNECT when target state is DEAD or REMOVED

Signed-off-by: Vu Pham [EMAIL PROTECTED]

--- ofa_kernel-1.3.configured/drivers/infiniband/ulp/srp/ib_srp.c	2008-02-05 11:18:16.0 -0800
+++ ofa_kernel-1.3/drivers/infiniband/ulp/srp/ib_srp.c	2008-02-05 15:18:33.0 -0800
@@ -885,6 +884,26 @@
 				 DMA_FROM_DEVICE);
 }
 
+static void srp_reconnect_work(struct work_struct *work)
+{
+	struct srp_target_port *target =
+		container_of(work, struct srp_target_port, work);
+
+	srp_reconnect_target(target);
+}
+
+static void srp_qp_in_err_timer(unsigned long data)
+{
+	struct srp_target_port *target = (struct srp_target_port *)data;
+
+	spin_lock_irq(target->scsi_host->host_lock);
+	INIT_WORK(&target->work, srp_reconnect_work);
+	schedule_work(&target->work);
+	spin_unlock_irq(target->scsi_host->host_lock);
+
+	del_timer(&target->qp_err_timer);
+}
+
 static void srp_completion(struct ib_cq *cq, void *target_ptr)
 {
 	struct srp_target_port *target = target_ptr;
@@ -896,7 +915,16 @@
 			printk(KERN_ERR PFX "failed %s status %d\n",
 			       wc.wr_id & SRP_OP_RECV ? "receive" : "send",
 			       wc.status);
-			target->qp_in_error = 1;
+			if (!target->qp_in_error) {
+				target->qp_in_error = 1;
+				if (!timer_pending(&target->qp_err_timer)) {
+					setup_timer(&target->qp_err_timer,
+						    srp_qp_in_err_timer,
+						    (unsigned long)target);
+					target->qp_err_timer.expires = 10 * HZ + jiffies;
+					add_timer(&target->qp_err_timer);
+				}
+			}
 			break;
 		}
 
@@ -1004,12 +1032,13 @@
 	struct ib_device *dev;
 	int len;
 
-	if (target->state == SRP_TARGET_CONNECTING)
+	if (target->state == SRP_TARGET_CONNECTING ||
+	    target->qp_in_error)
 		goto err;
 
 	if (target->state == SRP_TARGET_DEAD ||
 	    target->state == SRP_TARGET_REMOVED) {
-		scmnd->result = DID_BAD_TARGET << 16;
+		scmnd->result = DID_NO_CONNECT << 16;
 		done(scmnd);
 		return 0;
 	}
--- ofa_kernel-1.3.configured/drivers/infiniband/ulp/srp/ib_srp.h	2008-02-05 11:18:16.0 -0800
+++ ofa_kernel-1.3/drivers/infiniband/ulp/srp/ib_srp.h	2008-02-05 11:20:49.0 -0800
@@ -160,6 +160,7 @@
 	int			status;
 	enum srp_target_state	state;
 	int			qp_in_error;
+	struct timer_list	qp_err_timer;
 };
 
 struct srp_iu {
Re: [ewg][PATCH][1/2] SRP multipath failover within 60 seconds,
Roland Dreier wrote:

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 950228f..45a2533 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -400,7 +400,6 @@
 		printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
 		return;
 	}
-	wait_for_completion(&target->done);
 }
 
 static void srp_remove_work(struct work_struct *work)
@@ -1266,7 +1294,6 @@
 	case IB_CM_TIMEWAIT_EXIT:
 		printk(KERN_ERR PFX "connection closed\n");
 
-		comp = 1;
 		target->status = 0;
 		break;

Seems like this would leak the cm_id?

I said in my [0/2] email, this patch should be applied on top of srp_1_recreate_at_reconnect.patch which is already in ofed_1_3.git tree kernel_patches/fixes/ directory

I attached it here

Hello, Roland!
Please consider the following for 2.6.19.

---

From: Ishai Rabinovitz [EMAIL PROTECTED]

For some reason (could be a firmware problem) I got a CQ overrun in SRP. Because of that there was a QP FATAL. Since in srp_reconnect_target we are not destroying the QP, the QP FATAL persists after the reconnect. In order to be able to recover from such situation I suggest we destroy the CQ and the QP in every reconnect.

This also corrects a minor spec non-compliance - when srp_reconnect_target is called, srp destroys the CM ID and resets the QP, the new connection will be retried with the same QPN which could theoretically lead to stale packets (for strict spec compliance I think QPN should not be reused till all stale packets are flushed out of the network).

---

IB/srp: destroy/re-create QP and CQ on each reconnect.

This makes SRP more robust in presence of hardware errors and is closer to behaviour suggested by IB spec, reducing chance of stale packets.

Signed-off-by: Ishai Rabinovitz [EMAIL PROTECTED]
Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-08-31 12:23:52.0 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-08-31 12:30:48.0 +0300
@@ -495,10 +495,10 @@ static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct ib_cm_id *new_cm_id;
-	struct ib_qp_attr qp_attr;
 	struct srp_request *req, *tmp;
-	struct ib_wc wc;
 	int ret;
+	struct ib_cq *old_cq;
+	struct ib_qp *old_qp;
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	if (target->state != SRP_TARGET_LIVE) {
@@ -522,17 +522,17 @@
 	ib_destroy_cm_id(target->cm_id);
 	target->cm_id = new_cm_id;
 
-	qp_attr.qp_state = IB_QPS_RESET;
-	ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE);
-	if (ret)
-		goto err;
-
-	ret = srp_init_qp(target, target->qp);
-	if (ret)
+	old_qp = target->qp;
+	old_cq = target->cq;
+	ret = srp_create_target_ib(target);
+	if (ret) {
+		target->qp = old_qp;
+		target->cq = old_cq;
 		goto err;
+	}
 
-	while (ib_poll_cq(target->cq, 1, &wc) > 0)
-		; /* nothing */
+	ib_destroy_qp(old_qp);
+	ib_destroy_cq(old_cq);
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	list_for_each_entry_safe(req, tmp, &target->req_queue, list)

-- 
MST

___
openib-general mailing list [EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [ewg][PATCH][1/2] SRP multipath failover within 60 seconds,
Roland Dreier wrote:
> > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> > index 950228f..45a2533 100644
> > --- a/drivers/infiniband/ulp/srp/ib_srp.c
> > +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> > @@ -400,7 +400,6 @@
> >  		printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
> >  		return;
> >  	}
> > -	wait_for_completion(&target->done);
> >  }
> >
> >  static void srp_remove_work(struct work_struct *work)
> > @@ -1266,7 +1294,6 @@
> >  	case IB_CM_TIMEWAIT_EXIT:
> >  		printk(KERN_ERR PFX "connection closed\n");
> >
> > -		comp = 1;
> >  		target->status = 0;
> >  		break;
>
> Seems like this would leak the cm_id?

As I said in my [0/2] email, this patch should be applied on top of srp_1_recreate_at_reconnect.patch, which is already in the kernel_patches/fixes/ directory of the ofed_1_3.git tree. I have attached it here:

Hello, Roland!
Please consider the following for 2.6.19.

---

From: Ishai Rabinovitz [EMAIL PROTECTED]

For some reason (could be a firmware problem) I got a CQ overrun in SRP. Because of that there was a QP FATAL. Since in srp_reconnect_target we are not destroying the QP, the QP FATAL persists after the reconnect. In order to be able to recover from such a situation I suggest we destroy the CQ and the QP in every reconnect.

This also corrects a minor spec non-compliance: when srp_reconnect_target is called, srp destroys the CM ID and resets the QP, so the new connection will be retried with the same QPN, which could theoretically lead to stale packets (for strict spec compliance I think the QPN should not be reused till all stale packets are flushed out of the network).

---

IB/srp: destroy/re-create QP and CQ on each reconnect.

This makes SRP more robust in presence of hardware errors and is closer to behaviour suggested by IB spec, reducing chance of stale packets.

Signed-off-by: Ishai Rabinovitz [EMAIL PROTECTED]
Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-08-31 12:23:52.0 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-08-31 12:30:48.0 +0300
@@ -495,10 +495,10 @@
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct ib_cm_id *new_cm_id;
-	struct ib_qp_attr qp_attr;
 	struct srp_request *req, *tmp;
-	struct ib_wc wc;
 	int ret;
+	struct ib_cq *old_cq;
+	struct ib_qp *old_qp;
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	if (target->state != SRP_TARGET_LIVE) {
@@ -522,17 +522,17 @@
 	ib_destroy_cm_id(target->cm_id);
 	target->cm_id = new_cm_id;
 
-	qp_attr.qp_state = IB_QPS_RESET;
-	ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE);
-	if (ret)
-		goto err;
-
-	ret = srp_init_qp(target, target->qp);
-	if (ret)
+	old_qp = target->qp;
+	old_cq = target->cq;
+	ret = srp_create_target_ib(target);
+	if (ret) {
+		target->qp = old_qp;
+		target->cq = old_cq;
 		goto err;
+	}
 
-	while (ib_poll_cq(target->cq, 1, &wc) > 0)
-		; /* nothing */
+	ib_destroy_qp(old_qp);
+	ib_destroy_cq(old_cq);
 
 	spin_lock_irq(target->scsi_host->host_lock);
 	list_for_each_entry_safe(req, tmp, &target->req_queue, list)

--
MST
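For readers who want the shape of the reconnect patch without the verbs API, here is a minimal userspace sketch of the same idea: create the replacement resources first, and destroy the old ones only once that has succeeded. Everything in it (`struct res`, `struct conn`, `create_pair`, `reconnect`, the `fail_next_create` knob) is hypothetical and only stands in for the QP/CQ handling; it is not the ib_srp code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a QP or CQ; not the kernel's ib_qp/ib_cq. */
struct res { int id; };

struct conn {
	struct res *qp;
	struct res *cq;
};

static int next_id = 1;
static int fail_next_create;	/* test knob: force the next create to fail */

/* Allocate a fresh pair and store it in c. Returns 0 on success, -1 on failure. */
static int create_pair(struct conn *c)
{
	struct res *qp, *cq;

	if (fail_next_create) {
		fail_next_create = 0;
		return -1;
	}
	qp = malloc(sizeof(*qp));
	cq = malloc(sizeof(*cq));
	if (!qp || !cq) {
		free(qp);
		free(cq);
		return -1;
	}
	qp->id = next_id++;
	cq->id = next_id++;
	c->qp = qp;
	c->cq = cq;
	return 0;
}

/* Reconnect: build the new pair first, release the old pair only on success. */
static int reconnect(struct conn *c)
{
	struct res *old_qp, *old_cq;
	int ret;

	assert(c != NULL);
	old_qp = c->qp;
	old_cq = c->cq;

	ret = create_pair(c);
	if (ret) {
		/*
		 * Restore the saved pointers. create_pair() in this sketch never
		 * leaves c half-updated, so this is defensive; it mirrors the
		 * restore step in the patch.
		 */
		c->qp = old_qp;
		c->cq = old_cq;
		return ret;
	}

	free(old_qp);
	free(old_cq);
	return 0;
}
```

A failed `reconnect()` leaves the connection holding its previous pair, which is what lets the caller retry later instead of ending up with dangling pointers.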
[ewg][GIT PULL] please pull srpt ofed_1_3.git
Hi Vlad,

Please pull from git://git.openfabrics.org/~vu/ofed_1_3.git

This pulls a fix for srpt in ofed-1.3.

GIT COMMIT COMMENTS

srpt: avoid disconnecting/removing a connection again when it is already in the disconnecting state. Manipulate the connection list with spinlock_irq

Signed-off-by: Vu Pham [EMAIL PROTECTED]

thanks,
-vu
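The commit text describes a guard against tearing down a connection twice. A rough userspace sketch of that kind of guard follows, assuming a two-state connection and using a pthread mutex as a stand-in for the kernel's spin_lock_irq(). The state names, `struct conn` and both functions are invented for illustration; the actual srpt code may be organized differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical connection states; the real srpt state names may differ. */
enum conn_state { CONN_LIVE, CONN_DISCONNECTING };

struct conn {
	pthread_mutex_t lock;	/* userspace stand-in for spin_lock_irq() */
	enum conn_state state;
	int teardowns;		/* counts how often teardown actually ran */
};

/*
 * Move the connection to DISCONNECTING exactly once.
 * Returns true only for the caller that performed the transition.
 */
static bool conn_begin_disconnect(struct conn *c)
{
	bool first;

	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	first = (c->state == CONN_LIVE);
	if (first)
		c->state = CONN_DISCONNECTING;
	pthread_mutex_unlock(&c->lock);
	return first;
}

/* Safe to call repeatedly; the teardown work runs at most once. */
static void conn_disconnect(struct conn *c)
{
	if (!conn_begin_disconnect(c))
		return;		/* already being torn down elsewhere */
	c->teardowns++;		/* placeholder for the real teardown work */
}
```

Doing the state check and the state change under one lock is the point: two callers racing into `conn_disconnect()` cannot both see `CONN_LIVE`.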
[ewg] Re: Distributing the SRP target source code
Bart Van Assche wrote:
> On Jan 28, 2008 6:07 PM, Vu Pham [EMAIL PROTECTED] wrote:
> > In the srpt readme file, the prerequisite is to install SCST BEFORE ofed-1.3 or, as Vlad warned, to recompile ofed if you install scst after installing ofed.
>
> This is what will happen if someone installs Linux kernel headers + SCST + OFED in this order:
> 1. Linux kernel headers matching the running kernel are installed in /usr/src/linux-.../include or equivalent, and a symbolic link to the kernel headers is created in /lib/modules/$(uname -r)/build/include.
> 2. By building and installing SCST, SCST modules are installed in /lib/modules/$(uname -r)/extra and SCST kernel headers are installed in /usr/local/include, among others SCST's scsi_tgt.h header file, the interface between SCST and mid-level SCSI drivers.
> 3. Next, OFED kernel modules are being built. During this process the SRP target module is compiled with the header file drivers/infiniband/ulp/srpt/scsi_tgt.h. The version of this file distributed with OFED 1.3 is incompatible with the one distributed with the latest version of SCST.
>
> Or: the kernel will probably crash as soon as one starts using the SRP target module, even if he or she followed the above outlined official build procedure. Including /usr/local/include/scsi_tgt.h in the SRP target module is not an option -- kernel modules must not include userspace headers, except for the well known exceptions like stdarg.h.

There are two include paths. The first one is /usr/local/include/scst and the second one is drivers/infiniband/ulp/srpt. Therefore, building srpt in ofed will always use the /usr/local/include/scst path first, and if you have already installed scst then there won't be any problem.

As you already know, /usr/local/include/scst/scsi_tgt.h is not a userspace header. SCST is not part of the kernel yet; srpt is also not part of the kernel.

> All this trouble can be avoided by distributing the SRP target code with SCST instead of with OFED.

The same problem would appear if someone uses different ofed versions.

> Furthermore, all kernel headers that define inter-module interfaces should reside in <kernel source root dir>/include/<subdir>/... The SRP target breaks this convention by having a private copy of an inter-module interface in a local directory (drivers/infiniband/ulp/srpt/scsi_tgt.h).

Once again, srpt is not part of the kernel; therefore it breaks certain kernel rules. We'll fix it if scst becomes an official part of the kernel.

> > Here is one of the reasons srpt is part of ofed and not scst:
> > SCST is GPL
> > ofed + srpt is GPL or BSD
>
> This is not an issue -- if you have a look at the Linux kernel, you will see that all source files are licensed under at least the GPLv2 and some source files are licensed under GPLv2 + one or more other licenses, e.g. BSD.

I know that; however, I don't know whether SCST is OK with dual licensing or not.

-vu
[ewg] Re: [Scst-devel] [ofa-general] OFED 1.3 RC2 release is available
Bart Van Assche wrote:
> On Jan 28, 2008 12:47 PM, Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:
> > Bart Van Assche wrote:
> > > Apparently OFED 1.3 includes SRP target support? Although I consider SRP target support as a very valuable contribution, it should not be included in the OFED distribution but in the SCST distribution. The reason is that the SRP target relies on SCST interfaces that can potentially change with each new SCST release. Consider e.g. the scsi_tgt.h header file, which defines the interface between SCST core and SCST mid-level modules. The version of this file included with git://git.openfabrics.org/~vu/ofed_1_3.git (0.9.6-pre3) is incompatible with the latest scsi_tgt.h file from the SCST project (0.9.6-rc1). This may cause kernel crashes for OFED 1.3 SRP target users who combine OFED 1.3 with the latest SCST version.
> >
> > No it won't crash, it will refuse to run. I've recently added in SCST protection against attempts running mixed SCST and target driver versions. BTW, there is a
> >
> > *!! BIG FAT WARNING ABOUT MIXED VERSIONS PROBLEM !!*
>
> Hello Vladislav,
>
> I did not test the above scenario -- what I wrote was the result of source reading. It is very good that interface versions are checked inside SCST before mid-level drivers are used. Even with interface version checking in place, my opinion is that the SRP target code should be included in the SCST project and not in the OFED project.
>
> Bart.

Hi Bart,

In the srpt readme file, the prerequisite is to install SCST BEFORE ofed-1.3 or, as Vlad warned, to recompile ofed if you install scst after installing ofed.

Here is one of the reasons srpt is part of ofed and not scst:
SCST is GPL
ofed + srpt is GPL or BSD

-vu
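The "refuse to run" behaviour Vladislav describes can be pictured as a version check at registration time. The sketch below is only an illustration of that idea: `CORE_INTERFACE_VERSION`, `struct target_template` and `register_target_template()` are made-up names, and the way SCST actually encodes and compares its interface version is not shown in this thread. The two version strings are the ones mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical version string that the core's interface header would define. */
#define CORE_INTERFACE_VERSION "0.9.6-rc1"

/* Hypothetical template a target driver hands to the core. */
struct target_template {
	const char *name;
};

/*
 * Register a driver only if it was built against the same interface
 * version as the core. Returns 0 on success, -1 on mismatch or bad input.
 */
static int register_target_template(const struct target_template *t,
				    const char *built_against)
{
	if (!t || !built_against)
		return -1;
	assert(t->name != NULL);	/* a nameless template is a programming error */

	if (strcmp(built_against, CORE_INTERFACE_VERSION) != 0) {
		fprintf(stderr, "%s: built against interface %s, core is %s\n",
			t->name, built_against, CORE_INTERFACE_VERSION);
		return -1;
	}
	return 0;
}
```

A driver compiled against the 0.9.6-pre3 header would pass that string as `built_against` and be rejected with an error instead of running against mismatched structures.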
[ewg][GIT PULL] please pull srpt ofed_1_3.git
Hi Vlad,

Please pull from git://git.openfabrics.org/~vu/ofed_1_3.git

This pulls some fixes for srpt in ofed-1.3.

GIT COMMIT COMMENTS
---

srpt: Fix compilation error in 2.6.24-rc2

Using sg_set_page() in 2.6.24 instead of accessing sg fields directly
Relying on scatterlist.h in backport header to support older kernels

Signed-off-by: Vu Pham [EMAIL PROTECTED]

srpt: Fix data corruption of using local mempool without lock

Change the local buffer allocator to use a spin-lock protected linked list instead of an array of atomic_t used/free variables. The atomic_t code was open to a multi-thread race between test and set. This has been observed with the result that the same data buffer was used for more than one SCSI operation, either writing the wrong data to the disk or sending the wrong data to the initiator.

Signed-off-by: Robert Pearson [EMAIL PROTECTED]
Signed-off-by: David A. McMillen [EMAIL PROTECTED]

srpt: Add qp and port asynchronous events handling

Port error: set port lid, sm_lid to zero
Port active/lid change/pkey change/sm change: refresh port for lid/sm_lid
QP event on last wqe reached to clean up connection/session

Signed-off-by: Vu Pham [EMAIL PROTECTED]

srpt: Return the correct scsi status

Return correct scsi error status in srp response, set expected data len, and call scst_tgt_cmd_done for aborted command; srp response for aborted command will be sent in srpt_task_mgmt callback

Signed-off-by: Vu Pham [EMAIL PROTECTED]

srpt: Fix bug in connection tear down to allow clean module unload

Fix bug in connection tear down to allow clean module unload or multiple connect/disconnect/reconnect from initiators. Fix race condition between rtu schedule work and ib_cm_notify. Thanks to Robert H.B Netzer [EMAIL PROTECTED] for debugging and fixing them

Signed-off-by: Vu Pham [EMAIL PROTECTED]

srpt: Avoid calling ib_unregister_client twice when fail to register with scst

Signed-off-by: Vu Pham [EMAIL PROTECTED]

srpt: Add and maintain active scst_cmd list for error handling cases.

Rename other lists for better readability, i.e. completion list, wait command list...

Signed-off-by: Vu Pham [EMAIL PROTECTED]

srpt: Code to have a clean round up to power of 2 for mem_pool element

Signed-off-by: David A. McMillen [EMAIL PROTECTED]

srpt: Using full initiator port ID as session name

Signed-off-by: Vu Pham [EMAIL PROTECTED]

thanks,
-vu
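The mempool commit above replaces per-buffer used/free flags with a lock-protected free list, and a later commit rounds the element size up to a power of two. A small userspace sketch of both ideas follows, assuming a pthread mutex in place of the kernel spinlock. The names `round_up_pow2`, `struct pool`, `pool_init`, `pool_get` and `pool_put` are hypothetical and do not come from the srpt source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Round n up to the next power of two: 8 stays 8, 9 becomes 16.
 * Assumes 1 <= n <= SIZE_MAX / 2 + 1 so the shift cannot overflow.
 */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Free buffers are chained through their own first bytes. */
struct pool_node { struct pool_node *next; };

struct pool {
	pthread_mutex_t lock;	/* userspace stand-in for a kernel spinlock */
	struct pool_node *free;	/* head of the free list */
	void *mem;		/* backing allocation, released by the owner */
};

/* Carve count elements out of one allocation. Returns 0 or -1. */
static int pool_init(struct pool *p, size_t elem_size, size_t count)
{
	size_t i, sz;

	if (elem_size < sizeof(struct pool_node))
		elem_size = sizeof(struct pool_node);
	sz = round_up_pow2(elem_size);

	p->mem = malloc(sz * count);
	if (!p->mem)
		return -1;
	pthread_mutex_init(&p->lock, NULL);
	p->free = NULL;
	for (i = 0; i < count; i++) {
		struct pool_node *n = (struct pool_node *)((char *)p->mem + i * sz);

		n->next = p->free;
		p->free = n;
	}
	return 0;
}

/* Pop under the lock, so the check and the removal cannot interleave. */
static void *pool_get(struct pool *p)
{
	struct pool_node *n;

	pthread_mutex_lock(&p->lock);
	n = p->free;
	if (n)
		p->free = n->next;
	pthread_mutex_unlock(&p->lock);
	return n;		/* NULL when the pool is exhausted */
}

/* Push a buffer back onto the free list. */
static void pool_put(struct pool *p, void *buf)
{
	struct pool_node *n = buf;

	assert(buf != NULL);
	pthread_mutex_lock(&p->lock);
	n->next = p->free;
	p->free = n;
	pthread_mutex_unlock(&p->lock);
}
```

With separate test and set steps on a flag array, two threads can both see the same slot as free before either marks it used, which matches the failure the commit reports. Here the lookup and the unlink happen inside one critical section, so a buffer is handed to exactly one caller.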