Re: IB/iSER major problems with Linux 3.0 and Solaris targets
On 1/12/2012 11:23 AM, Sebastian Riemer wrote:
> We are running iSER directly on the host. KVM is compiled in but there aren't any VMs on our iSER test server. It is a diskless SuperMicro server with NFS root. On productive servers we have a live-image and KVM uses the iSER driven block devices for storage.
>
> This is the IB HCA (mlx4): Mellanox MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s]
>
> We've updated the firmware lately on all servers. How can I find out the firmware version? With tvflash or mstflint?

If you have built the kernel IB user space support (uverbs) and the IB libs, run ibv_devinfo; if not, just ossi cat /sys/class/infiniband/mlx4_0/* and send the output.

To be clear, iser does work for you on the productive servers but not on this server?

> The storage has the same IB HCA, and they are connected via a switch. I'll ask someone of the SysOps which one it is and if they have the latest firmware on it. Perhaps this could be the problem.

As it's a local protection error on TX, I don't see how this could relate to the target node.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: IB/iSER major problems with Linux 3.0 and Solaris targets
On 12/01/12 10:29, Or Gerlitz wrote:
> If you have build the kernel IB user space support (uverbs) and the IB libs, do ibv_devinfo if not, just ossi cat /sys/class/infiniband/mlx4_0/* and send the output.
>
> To be clear, iser does work for you on the productive servers but not on this server?

Yes, we've got consistent OFED-1.5.4 user-space. ibv_devinfo reports a mismatch between the kernel and the userspace libraries - kernel does not support XRC. ibverbs-driver-mlx4 is at version 1.0.1-1.20.g6771d22 and libibverbs is at version 1.1.4-1.24.gb89d4d7. But O.K., the other method shows firmware version 2.9.1000.

iSER only works on productive servers, because we use the OFA kernel modules from OFED for them at the moment (with 3.0 ported *iscsi* drivers). But there the IPoIB traffic is too slow for us. We connect customer VMs with IPv6 between different servers via IB.

And yes, we could also test kernel 3.2 on our iSER test server.

Regards,
Sebastian
ibv_req_notify_cq and multithreading
I'm trying to have N threads reading from the same completion channel, bound to M completion queues. I would like to have N M, and to ensure that only a single thread at a time can call ibv_poll_cq() on a given queue, to process the events in the same order they were put in the queue.

I can't understand how to properly achieve this, since:

1- If I call ibv_req_notify_cq() before ibv_poll_cq(), I might end up with two threads polling the same queue.
2- If I call ibv_req_notify_cq() after ibv_poll_cq(), I could end up with events in the cq not being notified in the channel (I read this on the IBTA 11.4.2.2, and I *think* I actually experienced this under load).

I can use option 1 with an additional lock before ibv_req_notify_cq(), but I would like to know if there is a simpler way which I can't see.

Thanks
Flavio
endian question about struct srp_direct_buf
Sparse complains because len in struct srp_direct_buf is declared as big endian but it's used throughout as CPU endian. struct srp_indirect_buf has the same thing. It's declared one way but used the other way.

$ grep -w len drivers/scsi -R | grep -w md
drivers/scsi/ibmvscsi/ibmvfc.c: md[i].len = sg_dma_len(sg);
drivers/scsi/ibmvscsi/ibmvstgt.c: mlen = min(rest, md[i].len);
drivers/scsi/libsrp.c: md->len, scsi_sg_count(sc));
drivers/scsi/libsrp.c: len = min(scsi_bufflen(sc), md->len);
drivers/scsi/libsrp.c: len = md->len;
drivers/scsi/libsrp.c: err = rdma_io(sc, sg, nsg, md, 1, dir, len);
drivers/scsi/libsrp.c: md = dma_alloc_coherent(iue->target->dev, id->table_desc.len,
drivers/scsi/libsrp.c: sg_init_one(&dummy, md, id->table_desc.len);
drivers/scsi/libsrp.c: err = rdma_io(sc, sg, nsg, md, nmd, dir, len);
drivers/scsi/libsrp.c: dma_free_coherent(iue->target->dev, id->table_desc.len, md, token);
drivers/scsi/libsrp.c: len = md->len;

Probably we should just change the declaration to u32?

regards,
dan carpenter
Re: endian question about struct srp_direct_buf
On Thu, Jan 12, 2012 at 12:41 PM, Dan Carpenter dan.carpen...@oracle.com wrote:
> Sparse complains because len in struct srp_direct_buf is declared as big endian but it's used throughout as CPU endian. struct srp_indirect_buf has the same thing. It's declared one way but used the other way.
>
> $ grep -w len drivers/scsi -R | grep -w md
> drivers/scsi/ibmvscsi/ibmvfc.c: md[i].len = sg_dma_len(sg);
> drivers/scsi/ibmvscsi/ibmvstgt.c: mlen = min(rest, md[i].len);
> drivers/scsi/libsrp.c: md->len, scsi_sg_count(sc));
> drivers/scsi/libsrp.c: len = min(scsi_bufflen(sc), md->len);
> drivers/scsi/libsrp.c: len = md->len;
> drivers/scsi/libsrp.c: err = rdma_io(sc, sg, nsg, md, 1, dir, len);
> drivers/scsi/libsrp.c: md = dma_alloc_coherent(iue->target->dev, id->table_desc.len,
> drivers/scsi/libsrp.c: sg_init_one(&dummy, md, id->table_desc.len);
> drivers/scsi/libsrp.c: err = rdma_io(sc, sg, nsg, md, nmd, dir, len);
> drivers/scsi/libsrp.c: dma_free_coherent(iue->target->dev, id->table_desc.len, md, token);
> drivers/scsi/libsrp.c: len = md->len;
>
> Probably we should just change the declaration to u32?

(resending as plain text)

No. The SRP spec says that that field is big endian and the ib_srp driver uses that field as a big endian field. The output above (libsrp + ibmvstgt) is code that is used by the ibmvstgt driver only, and the reason that driver works fine without endianness conversion is because it is only used on PowerPC systems.

Bart.
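For readers following the endianness point, here is a minimal userspace sketch of what "declared big endian, converted on access" looks like. It is only an illustration: the struct and helper names are hypothetical, and it uses htonl()/ntohl() as stand-ins for the kernel's cpu_to_be32()/be32_to_cpu(). On a big-endian host such as PowerPC both conversions are no-ops, which is why code that skips them can still work there.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical stand-in for a descriptor whose fields are big endian
 * on the wire; not the kernel's struct srp_direct_buf. */
struct direct_buf_sketch {
	uint64_t va;
	uint32_t key;
	uint32_t len;	/* stored big endian */
};

/* Store a CPU-endian length into the big-endian field. */
static void sketch_set_len(struct direct_buf_sketch *d, uint32_t cpu_len)
{
	d->len = htonl(cpu_len);
}

/* Read the big-endian field back as a CPU-endian value. */
static uint32_t sketch_get_len(const struct direct_buf_sketch *d)
{
	return ntohl(d->len);
}
```

Accessing d->len directly, without the helpers, gives the right value only on big-endian hosts.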
[PATCH] opensm: fixed segfault in osm_destroy
Fixed segfault in osm_destroy() when hop_weights_file, port_search_ordering_file or io_guid_file are configured. The segfault was introduced by d71a924736707400bed47a3c69395cf864c970bb.

Signed-off-by: Alex Netes ale...@mellanox.com
---
 opensm/main.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/opensm/main.c b/opensm/main.c
index 3edc52f..c75d220 100644
--- a/opensm/main.c
+++ b/opensm/main.c
@@ -724,13 +724,13 @@ int main(int argc, char *argv[])
 			break;

 		case 'w':
-			opt.hop_weights_file = optarg;
+			SET_STR_OPT(opt.hop_weights_file, optarg);
 			printf(" Hop Weights File = %s\n",
 			       opt.hop_weights_file);
 			break;

 		case 'O':
-			opt.port_search_ordering_file = optarg;
+			SET_STR_OPT(opt.port_search_ordering_file, optarg);
 			printf(" Port Search Ordering/Dimension Ports File = %s\n",
 			       opt.port_search_ordering_file);
 			break;
@@ -959,7 +959,7 @@ int main(int argc, char *argv[])
 			break;

 		case 'G':
-			opt.io_guid_file = optarg;
+			SET_STR_OPT(opt.io_guid_file, optarg);
 			printf(" I/O Node Guid File: %s\n", opt.io_guid_file);
 			break;
 		case 11:
--
1.7.1

-- Alex
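A note on why this kind of fix helps, as a sketch under an assumption: the patch does not show the definition of SET_STR_OPT or what osm_destroy() does with these fields. A plausible reading is that the destroy path frees the option strings, so they must be heap copies rather than pointers into argv (which is what optarg is). The helper below is hypothetical and only illustrates the copy-on-set idea.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replace *dst with a heap-allocated copy of val,
 * so that a later free(*dst) is always valid. Returns 0 on success,
 * -1 if allocation fails (in which case *dst is left untouched). */
static int set_str_opt(char **dst, const char *val)
{
	char *copy = NULL;

	if (val) {
		size_t n = strlen(val) + 1;

		copy = malloc(n);
		if (!copy)
			return -1;
		memcpy(copy, val, n);
	}
	free(*dst);	/* free(NULL) is a no-op */
	*dst = copy;
	return 0;
}
```

With a plain assignment such as opt = optarg, a later free(opt) would be handed memory that malloc never returned, which is undefined behaviour and commonly shows up as a segfault.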
Re: IB/iSER major problems with Linux 3.0 and Solaris targets
On 12/01/12 11:16, Sebastian Riemer wrote:
> On 12/01/12 10:29, Or Gerlitz wrote:
>> If you have build the kernel IB user space support (uverbs) and the IB libs, do ibv_devinfo if not, just ossi cat /sys/class/infiniband/mlx4_0/* and send the output.
>>
>> To be clear, iser does work for you on the productive servers but not on this server?
>
> Yes, we've got consistent OFED-1.5.4 user-space. ibv_devinfo reports a mismatch between the kernel and the userspace libraries - kernel does not support XRC. ibverbs-driver-mlx4 is at version 1.0.1-1.20.g6771d22 and libibverbs is at version 1.1.4-1.24.gb89d4d7. But O.K., the other method shows firmware version 2.9.1000.

I've found out that we have two single port MHQH19B-XTR InfiniBand HCAs.

lspci output:
03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
04:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

The first one is ib1. And the second is ib0.
/sys/devices/pci:00/:00:0c.0/:03:00.0/net/ib1
/sys/devices/pci:00/:00:0b.0/:04:00.0/net/ib0

The iSER traffic is on ib1 (the HCA which reported the error) and ib0 is for IPoIB traffic. I don't know if the mlx4 driver has a problem with that hardware config.

Here is the requested data:

mlx4_0:
board_id        MT_0D90110009
fw_ver          2.9.1000
hca_type        MT26428
hw_rev          b0
node_desc       pserver214 HCA-1 (mlx4_0 - MT26428)
node_guid       0002:c903:000f:5f76
node_type       1: CA
sys_image_guid  0002:c903:000f:5f79
uevent          NAME=mlx4_0

mlx4_1:
board_id        MT_0D90110009
fw_ver          2.9.1000
hca_type        MT26428
hw_rev          b0
node_desc       pserver214 HCA-2 (mlx4_1 - MT26428)
node_guid       0002:c903:000f:5f26
node_type       1: CA
sys_image_guid  0002:c903:000f:5f29
uevent          NAME=mlx4_1

Both are connected to the storage but in different subnets and without multipathing.

How do I find out if ib1 is on mlx4_1 or mlx4_0?

Cheers,
Sebastian
RE: ibv_req_notify_cq and multithreading
> I'm trying to have N threads reading from the same completion channel, bounded to M completion queues. I would like to have N M, and to ensure that only a single thread at time can call ibv_poll_cq() on a given queue, to process the events in the same order they were put in the queue.
>
> I can't understand how to properly achieve this, since:
>
> 1- If I call ibv_req_notify_cq() before ibv_poll_cq(), I might end up with two threads polling the same queue.
> 2- If I call ibv_req_notify_cq() after ibv_poll_cq(), I could end up with events in the cq not being notified in the channel (I read this on the IBTA 11.4.2.2, and I *think* I actually experienced this under load).
>
> I can use option 1 with an additional lock before ibv_req_notify_cq(), but I would like to know if there is a simpler way which I can't see.

I can't think of a simpler way. You just don't have any idea which CQ will be returned from the completion channel. Would it work for your traffic pattern to create N completion channels and distribute the CQs among them?
Re: Send with immediate data completion
On Jan 11, 2012, at 5:22 PM, Hefty, Sean wrote:
> I'm still waiting on feedback from the IBTA, but they are looking into the matter. The intent is for immediate data only to be provided on receive work completions. The IBTA will clarify the spec on this. I'll submit patches that remove setting the wc flag, which may help avoid this confusion some.

Sean,

Thanks for looking into this.

Scott
RE: [PATCH] IB/qib: detour pcie_caps for certain chip sets
Should whatever this issue is be a general PCI fixup? Like broken MSI, etc.

Can you point me to some details on this?

Might be nice to include what 0x51 tunes in the commit to aid other people with the broken chipset :)

Isn't it necessary to check the PCI vendor as well as the devid?

Will do both of these in a V2.

Mike

This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
Re: IB/iSER major problems with Linux 3.0 and Solaris targets
On 1/12/2012 5:18 PM, Sebastian Riemer wrote:
> How do I find out if ib1 is on mlx4_1 or mlx4_0

Run ip addr show and compare the result with /sys/class/infiniband/mlx4_*/ports/1/gids/0.

You didn't send the kernel logs from the failure after turning on the iser (debug_level=2) and libiscsi (debug_libiscsi_session=1 debug_libiscsi_conn=1) debug prints.
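The comparison suggested above can be sketched in code. An IPoIB link-layer address (the link/infiniband value in ip addr show) is 20 bytes: 4 bytes of flags and QPN followed by the 16-byte port GID, so the interface belongs to the device whose port GID equals the last 16 bytes. The helper names and the sample addresses used to exercise them are made up for illustration; they are not taken from Sebastian's machines.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse pairs of hex digits from s into out[], skipping ':' separators
 * and newlines. Returns the number of bytes parsed, or -1 on error. */
static int parse_hex_bytes(const char *s, unsigned char *out, size_t max)
{
	size_t n = 0;
	int have_hi = 0;
	unsigned int hi = 0;

	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;
		unsigned int v;

		if (c == ':' || c == '\n')
			continue;
		if (!isxdigit(c))
			return -1;
		v = isdigit(c) ? (unsigned int)(c - '0')
			       : (unsigned int)(tolower(c) - 'a' + 10);
		if (!have_hi) {
			hi = v;
			have_hi = 1;
		} else {
			if (n == max)
				return -1;
			out[n++] = (unsigned char)((hi << 4) | v);
			have_hi = 0;
		}
	}
	return have_hi ? -1 : (int)n;
}

/* Return 1 if the GID embedded in a 20-byte IPoIB hardware address
 * equals the given 16-byte GID string, 0 otherwise. */
static int ipoib_addr_matches_gid(const char *hwaddr, const char *gid)
{
	unsigned char hw[20], g[16];

	if (parse_hex_bytes(hwaddr, hw, sizeof hw) != 20)
		return 0;
	if (parse_hex_bytes(gid, g, sizeof g) != 16)
		return 0;
	return memcmp(hw + 4, g, 16) == 0;
}
```

In practice one would read the hardware address of ib1 and the port GID of each mlx4 device from the system and call ipoib_addr_matches_gid() once per device.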
Re: IB/iSER major problems with Linux 3.0 and Solaris targets
On 1/11/2012 10:09 PM, Or Gerlitz wrote:
> [...] I'll give 3.0.15 a try tomorrow, however, the error you're getting
>
> iser_drain_tx_cq:tx id 88402391f898 status 4 vend_err 57
>
> means that iser got local protection error (=4) on the first buffer we used with IB (the connection handshake buffers belong to the IB CM, not to the ULP) which is the login request. Sounds like something is broken maybe dma mapping wise, for this reason I think its likely that the problem might not hit me on my testbed [...]

okay, I've tried 3.0.15 with your .config slightly changed for my local SATA disk, will send you copy of my .config, and, iser works for me... so you need to try a bit harder and send me your logs... I'm using iscsi-initiator-utils-6.2.0.872-21.el6.x86_64

Or.

My board ID is MT_0D81120009 which is a bit different but the HCA is ConnectX b0 as yours, I'm using non GA firmware, but I find it hard to believe this is the reason for your failure

# ibstat
CA 'mlx4_0'
	CA type: MT26428
	Number of ports: 2
	Firmware version: 2.9.4270
	Hardware version: b0
	Node GUID: 0x0002c9030010c6e8
	System image GUID: 0x0002c9030010c6eb
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 10
		LMC: 0
		SM lid: 6
		Capability mask: 0x02510868
		Port GUID: 0x0002c9030010c6e9

[ 134.869036] iscsi: registered transport (tcp)
[ 134.987553] iscsi: registered transport (iser)
[ 136.075198] iser: iser_connect:connecting to: 192.168.20.19, port 0xbc0c
[ 136.100162] iser: iser_cma_handler:event 0 status 0 conn 88020eb7ba80 id 8802252aec00
[ 136.58] iser: iser_cma_handler:event 2 status 0 conn 88020eb7ba80 id 8802252aec00
[ 136.130923] iser: iser_create_ib_conn_res:setting conn 88020eb7ba80 cma_id 8802252aec00: fmr_pool 880224c17880 qp 8802154a4600
[ 136.150646] iser: iser_cma_handler:event 9 status 0 conn 88020eb7ba80 id 8802252aec00
[ 136.332263] iser: iscsi_iser_ep_poll:ib conn 88020eb7ba80 rc = 1
[ 136.338710] scsi3 : iSCSI Initiator over iSER, v.0.1
[ 136.346240] iser: iscsi_iser_conn_bind:binding iscsi/iser conn 880225294ab8 880225294cc8 to ib_conn 88020eb7ba80
[ 136.609277] scsi 3:0:0:0: RAID Mellanox vsa 1PQ: 0 ANSI: 5
[ 136.617604] scsi 3:0:0:0: Attached scsi generic sg3 type 12
[ 136.623454] scsi 3:0:0:1: Direct-Access Mellanox VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
[ 136.631820] sd 3:0:0:1: Attached scsi generic sg4 type 0
[ 136.631848] sd 3:0:0:1: [sdc] 2147483648 512-byte logical blocks: (1.09 TB/1.00 TiB)
[ 136.645040] sd 3:0:0:1: [sdc] Write Protect is off
[ 136.649880] sd 3:0:0:1: [sdc] Mode Sense: 49 00 00 08
[ 136.649975] sd 3:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 136.659680] sdc: unknown partition table
[ 136.664071] sd 3:0:0:1: [sdc] Attached SCSI disk
[ 211.020096] sd 3:0:0:1: [sdc] Synchronizing SCSI cache
[ 211.526075] iser: iscsi_iser_ep_disconnect:ib conn 88020eb7ba80 state 2
[ 211.534048] iser: iser_cma_handler:event 10 status 0 conn 88020eb7ba80 id 8802252aec00
[ 211.542750] iser: iser_free_ib_conn_res:freeing conn 88020eb7ba80 cma_id 8802252aec00 fmr pool 880224c17880 qp 8802154a4600
[ 211.556053] iser: iser_device_try_release:device 880225b30480 refcount 0
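As a reading aid for the "status 4" in the quoted iser error line, here is a small lookup sketch. It covers only the first few work-completion status codes, in the order they appear in the kernel's enum ib_wc_status (libibverbs uses the same numbering with an IBV_ prefix). The table and function names are made up for illustration.

```c
#include <stddef.h>

/* First entries of the work completion status enum, indexed by value.
 * Deliberately partial: later codes are reported as "unknown". */
static const char *const wc_status_names[] = {
	"IB_WC_SUCCESS",		/* 0 */
	"IB_WC_LOC_LEN_ERR",		/* 1 */
	"IB_WC_LOC_QP_OP_ERR",		/* 2 */
	"IB_WC_LOC_EEC_OP_ERR",		/* 3 */
	"IB_WC_LOC_PROT_ERR",		/* 4: local protection error */
	"IB_WC_WR_FLUSH_ERR",		/* 5 */
};

static const char *wc_status_name(int status)
{
	size_t n = sizeof wc_status_names / sizeof wc_status_names[0];

	if (status < 0 || (size_t)status >= n)
		return "unknown";
	return wc_status_names[status];
}
```

A local protection error is raised by the local HCA when a posted buffer does not match a valid memory registration, which fits Or's remark that the problem is on the initiator side rather than on the target.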
Re: Upstream support for multicast IBoE
On Wed, Jan 11, 2012 at 09:49:25PM +0200, Or Gerlitz wrote:
> Shawn Bohrer sboh...@rgmadvisors.com wrote:
>> Is there any estimate on when we might see something like this upstream?
>
> Could you elaborate a little on your use case for multicast IBoE traffic? e.g how the setup looks like and how are your Ethernet switches act to route that traffic.

I'm not sure exactly what you are asking here. We do what I would imagine is a typical one to many UD multicast. We code directly to libibverbs and librdmacm, and everything is sent IBoE. The hosts are in a spine/leaf configuration and all traffic is sent over vlans. My understanding is that the multicast IBoE traffic is simply sent as broadcast and that the adapters do the necessary filtering.

Really from my point of view OFED already does what we want, but I would really like to see this supported upstream.

Thanks,
Shawn

---
This email, along with any attachments, is confidential. If you believe you received this message in error, please contact the sender immediately and delete all copies of the message. Thank you.
Re: [PATCH] opensm: fixed segfault in osm_destroy
On Thu, 12 Jan 2012, Alex Netes wrote:
> Fixed segfault in osm_destroy() when hop_weights_file, port_search_ordering_file or io_guid_file are configured. The segfault introduced by d71a924736707400bed47a3c69395cf864c970bb.
>
> Signed-off-by: Alex Netes ale...@mellanox.com
> ---
>  opensm/main.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)

The fix looks good, and works too! Thanks.

Dale
RE: [PATCH] IB/qib: detour pcie_caps for certain chip sets
> Does this work on systems where the broken chipset might not be the immediate parent of the qib device (ie there are some PCIe switches in between)?

The code figures this out at the top of the routine and returns, changing nothing.
Re: [PATCH] opensm: Get correct guid in case of multiple ports
Hi Goldwyn,

On 10:02 Wed 11 Jan     , Goldwyn Rodrigues wrote:
> Hi Alex,
>
> Let me start with how we encountered the problem: This problem came up when our customer was using a 2 port card with only one of the port active. opensm could not get the guid of the port that was active in daemon mode.

I guess it's because your customer runs opensm with -g 0 -B in command line.

>> On 13:36 Wed 05 Oct     , Goldwyn Rodrigues wrote:
>>> In case of multiple ports and running in daemon mode, the active port is not selected because opt.guid is set to INVALID_GUID in main() but the check in get_port_guid is done against zero: if (port_guid == 0) {
>>
>> opt.guid is set to 0 by default. opt.guid is set to INVALID_GUID if a user used -g WRONG_GUID command line option when executing the SM.
>
> What happens when -g 0 -B is specified? Check the getopt code. It sets guid to INVALID_GUID. Consider /etc/sysconfig/opensm as well.

You are correct. Setting argument -g 0 will set port_guid to INVALID_GUID. From OpenSM man page:

    -g, --guid GUID in hex
    This option specifies the local port GUID value with which OpenSM should bind. OpenSM may be bound to 1 port at a time. If GUID given is 0, OpenSM displays a list of possible port GUIDs and waits for user input. Without -g, OpenSM tries to use the default port.

So I guess the behavior of running OpenSM with -g 0 -B is undefined. I think it's better to exit than execute OpenSM with wrong parameter. Moreover, there is no problem when you set guid 0 in the opensm.conf and run opensm as a daemon (actually this is the default).

>> What happens when you provide -g WRONG_GUID -B?
>
> I think in this case, -B should take priority and set with the first active port available.

I think that in that case, a user intended to bind OpenSM on specific port and it could be a major issue if OpenSM will automatically binds to a different port. In that case, when SM runs not in daemon mode, SM prompts the user to choose available port GUID out of available range. In case when SM runs in daemon mode, it can't prompt the user so it just exits.

> On second thoughts, passing port_guid is worthless because this function is called only when no guid is supplied at the command prompt. So, removed the port_guid parameter from the function altogether. If not in daemon mode, it would show the list of ports as intended. Also added error message if no ports are found.
>
> Signed-off-by: Goldwyn Rodrigues rgold...@suse.de
>
> diff --git a/opensm/main.c b/opensm/main.c
> index 51c8291..a236859 100644
> --- a/opensm/main.c
> +++ b/opensm/main.c
> @@ -403,7 +403,7 @@ static void show_usage(void)
>  	exit(2);
>  }
>
> -static ib_net64_t get_port_guid(IN osm_opensm_t * p_osm, uint64_t port_guid)
> +static ib_net64_t get_port_guid(IN osm_opensm_t *p_osm)
>  {
>  	ib_port_attr_t attr_array[MAX_LOCAL_IBPORTS];
>  	uint32_t num_ports = MAX_LOCAL_IBPORTS;
> @@ -436,21 +436,19 @@ static ib_net64_t get_port_guid(IN osm_opensm_t * p_osm, uint64_t port_guid)
>  		       cl_hton64(attr_array[0].port_guid));
>  		return attr_array[0].port_guid;
>  	}
> -	/* If port_guid is 0 - use the first connected port */
> -	if (port_guid == 0) {
> +	/* If in daemon mode autoselect first available port */
> +	if (p_osm->subn.opt.daemon) {
>  		for (i = 0; i < num_ports; i++)
>  			if (attr_array[i].link_state > IB_LINK_DOWN)
>  				break;
> +		/* No port found which is available */
>  		if (i == num_ports)
> -			i = 0;
> +			return 0;
>  		printf("Using default GUID 0x%" PRIx64 "\n",
>  		       cl_hton64(attr_array[i].port_guid));
>  		return attr_array[i].port_guid;
>  	}
>
> -	if (p_osm->subn.opt.daemon)
> -		return 0;
> -
>  	/* More than one possible port - list all ports and let the user
>  	 * to choose. */
>  	while (1) {
> @@ -1106,10 +1104,12 @@ int main(int argc, char *argv[])
>  	   then get a port GUID value with which to bind.
>  	 */
>  	if (opt.guid == 0 || cl_hton64(opt.guid) == CL_HTON64(INVALID_GUID))
> -		opt.guid = get_port_guid(&osm, opt.guid);
> +		opt.guid = get_port_guid(&osm);
>
> -	if (opt.guid == 0)
> +	if (opt.guid == 0) {
> +		printf("\nError: No available ports\n");
>  		goto Exit;
> +	}
>
>  	status = osm_opensm_bind(&osm, opt.guid);
>  	if (status != IB_SUCCESS) {
>
> --
> Goldwyn

-- Alex
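The selection rule the patch proposes for daemon mode (take the first port whose link is not down, otherwise report failure with 0) can be sketched in isolation. The types and constants below are hypothetical stand-ins rather than the OpenSM definitions. The numeric states follow the usual IB port-state numbering (1 = Down, 4 = Active), which is an assumption as far as the patch text goes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ib_port_attr_t and the IB_LINK_* states. */
enum sketch_link_state {
	SKETCH_LINK_DOWN = 1,
	SKETCH_LINK_INIT = 2,
	SKETCH_LINK_ARMED = 3,
	SKETCH_LINK_ACTIVE = 4,
};

struct sketch_port {
	uint64_t guid;
	int link_state;
};

/* Return the GUID of the first port whose link is above DOWN,
 * or 0 when no such port exists (the caller treats 0 as an error). */
static uint64_t pick_first_up_port(const struct sketch_port *ports, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ports[i].link_state > SKETCH_LINK_DOWN)
			return ports[i].guid;
	return 0;
}
```

The behavioural change under discussion is the fallback: the old code fell back to port 0 when nothing was up, whereas the patch returns 0 so that main() can print an error and exit.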
[PATCH V1 1/6] IB: use central enum for speed instead of hard-coded values
The kernel IB stack uses one enumeration for IB speed, which wasn't explicitly specified in the verbs header file. Add that enum, and use it all over the code.

Note that the IB speed/width notation is also used by iWARP and IBoE hw drivers, which apply the convention of rate = speed X width to advertise their port link rate.

Signed-off-by: Or Gerlitz ogerl...@mellanox.com
---
changes from v0: fixed typo in the enum type name (was ib_port_seed instead of ib_port_speed)

 drivers/infiniband/core/sysfs.c              |   15 +++-
 drivers/infiniband/core/uverbs_cmd.c         |    3 ++
 drivers/infiniband/core/verbs.c              |    1 +
 drivers/infiniband/hw/amso1100/c2_provider.c |    2 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |    2 +-
 drivers/infiniband/hw/cxgb4/provider.c       |    2 +-
 drivers/infiniband/hw/ehca/ehca_hca.c        |    2 +-
 drivers/infiniband/hw/mlx4/main.c            |   10
 drivers/infiniband/hw/mlx4/qp.c              |   31 +
 drivers/infiniband/hw/nes/nes_verbs.c        |    2 +-
 include/rdma/ib_verbs.h                      |   11 -
 11 files changed, 59 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index c61bca3..9ce70ca 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -189,21 +189,24 @@ static ssize_t rate_show(struct ib_port *p, struct port_attribute *unused,
 	rate = (25 * attr.active_speed) / 10;

 	switch (attr.active_speed) {
-	case 2:
+	case IB_SPEED_SDR:
+		speed = " SDR";
+		break;
+	case IB_SPEED_DDR:
 		speed = " DDR";
 		break;
-	case 4:
+	case IB_SPEED_QDR:
 		speed = " QDR";
 		break;
-	case 8:
+	case IB_SPEED_FDR10:
 		speed = " FDR10";
 		rate = 10;
 		break;
-	case 16:
+	case IB_SPEED_FDR:
 		speed = " FDR";
 		rate = 14;
 		break;
-	case 32:
+	case IB_SPEED_EDR:
 		speed = " EDR";
 		rate = 25;
 		break;
@@ -214,7 +217,7 @@ static ssize_t rate_show(struct ib_port *p, struct port_attribute *unused,
 		return -EINVAL;

 	return sprintf(buf, "%d%s Gb/sec (%dX%s)\n",
-		       rate, (attr.active_speed == 1) ? ".5" : "",
+		       rate, (attr.active_speed == IB_SPEED_SDR) ? ".5" : "",
 		       ib_width_enum_to_int(attr.active_width), speed);
 }

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b930da4..8722e96 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1399,6 +1399,9 @@ ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file,
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;

+	if (cmd.qp_type == IB_QPT_RAW_PACKET && !capable(CAP_NET_RAW))
+		return -EPERM;
+
 	INIT_UDATA(&udata, buf + sizeof cmd,
 		   (unsigned long) cmd.response + sizeof resp,
 		   in_len - sizeof cmd, out_len - sizeof resp);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 602b1bd..f73e15b 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -479,6 +479,7 @@ static const struct {
 				[IB_QPT_UD]  = (IB_QP_PKEY_INDEX |
 						IB_QP_PORT |
 						IB_QP_QKEY),
+				[IB_QPT_RAW_PACKET] = IB_QP_PORT,
 				[IB_QPT_UC]  = (IB_QP_PKEY_INDEX |
 						IB_QP_PORT |
 						IB_QP_ACCESS_FLAGS),
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index 12f923d..07eb3a8 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -94,7 +94,7 @@ static int c2_query_port(struct ib_device *ibdev,
 	props->pkey_tbl_len = 1;
 	props->qkey_viol_cntr = 0;
 	props->active_width = 1;
-	props->active_speed = 1;
+	props->active_speed = IB_SPEED_SDR;

 	return 0;
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 37c224f..0bdf09a 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1227,7 +1227,7 @@ static int iwch_query_port(struct ib_device *ibdev,
 	props->gid_tbl_len = 1;
 	props->pkey_tbl_len = 1;
 	props->active_width = 2;
-	props->active_speed = 2;
+	props->active_speed = IB_SPEED_DDR;
 	props->max_msg_sz =
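To make the "rate = speed X width" convention concrete, here is a small standalone sketch. The enum values mirror the literals the patch replaces (1, 2, 4, 8, 16, 32). The per-lane rates follow the numbers in rate_show() above, expressed in tenths of Gb/s so that SDR's 2.5 Gb/s stays an integer. The names are hypothetical and this is not the kernel's code.

```c
/* Hypothetical mirror of the speed enumeration the patch introduces. */
enum sketch_speed {
	SKETCH_SPEED_SDR   = 1,
	SKETCH_SPEED_DDR   = 2,
	SKETCH_SPEED_QDR   = 4,
	SKETCH_SPEED_FDR10 = 8,
	SKETCH_SPEED_FDR   = 16,
	SKETCH_SPEED_EDR   = 32,
};

/* Per-lane rate in units of 0.1 Gb/s, or -1 for an unknown speed. */
static int lane_rate_tenths(int speed)
{
	switch (speed) {
	case SKETCH_SPEED_SDR:   return 25;
	case SKETCH_SPEED_DDR:   return 50;
	case SKETCH_SPEED_QDR:   return 100;
	case SKETCH_SPEED_FDR10: return 100;
	case SKETCH_SPEED_FDR:   return 140;
	case SKETCH_SPEED_EDR:   return 250;
	default:                 return -1;
	}
}

/* Port rate = lane rate x width (number of lanes), in 0.1 Gb/s units. */
static int port_rate_tenths(int speed, int width)
{
	int lane = lane_rate_tenths(speed);

	return lane < 0 ? -1 : lane * width;
}
```

For example, a 4X QDR port comes out at 400 tenths, that is 40 Gb/s, matching the "Rate: 40" that ibstat reports for the ConnectX port earlier in this digest.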
Re: [PATCH] opensm: Get correct guid in case of multiple ports
Hi Alex,

On Thu, Jan 12, 2012 at 07:23:30PM +0200, Alex Netes wrote:
> Hi Goldwyn,
>
> On 10:02 Wed 11 Jan     , Goldwyn Rodrigues wrote:
>> Hi Alex,
>>
>> Let me start with how we encountered the problem: This problem came up when our customer was using a 2 port card with only one of the port active. opensm could not get the guid of the port that was active in daemon mode.
>
> I guess it's because your costumer runs opensm with -g 0 -B in command line.
>
>>> On 13:36 Wed 05 Oct     , Goldwyn Rodrigues wrote:
>>>> In case of multiple ports and running in daemon mode, the active port is not selected because opt.guid is set to INVALID_GUID in main() but the check in get_port_guid is done against zero: if (port_guid == 0) {
>>>
>>> opt.guid is set to 0 by default. opt.guid is set to INVALID_GUID if a user used -g WRONG_GUID command line option when executing the SM.
>>
>> What happens when -g 0 -B is specified? Check the getopt code. It sets guid to INVALID_GUID. Consider /etc/sysconfig/opensm as well.
>
> You are correct. Setting argument -g 0 will set port_guid to INVALID_GUID. From OpenSM man page:
>
>     -g, --guid GUID in hex
>     This option specifies the local port GUID value with which OpenSM should bind. OpenSM may be bound to 1 port at a time. If GUID given is 0, OpenSM displays a list of possible port GUIDs and waits for user input. Without -g, OpenSM tries to use the default port.
>
> So I guess the behavior of running OpenSM with -g 0 -B is undefined. I think it's better to exit than execute OpenSM with wrong parameter.

Think from a user POV instead of a programmer's POV. A user will be confused when he attempts to start the daemon and the daemon just exits. Could opensm at least complain about it, saying that the options are incompatible or that it does not want to use the available guids?

> Moreover, there is no problem when you set guid 0 in the opensm.conf and run opensm as a daemon (actually this is the default).

Have you tried it with multi-port? For 1 port, get_port_guid() selects the default one because num_ports is 1 and the daemon will not exit, even if you supply -g 0 -B.

BTW, we are using SLES 11.

>>> What happens when you provide -g WRONG_GUID -B?
>>
>> I think in this case, -B should take priority and set with the first active port available.
>
> I think that in that case, a user intended to bind OpenSM on specific port and it could be a major issue if OpenSM will automatically binds to a different port. In that case, when SM runs not in daemon mode, SM prompts the user to choose available port GUID out of available range. In case when SM runs in daemon mode, it can't prompt the user so it just exits.
>
>> On second thoughts, passing port_guid is worthless because this function is called only when no guid is supplied at the command prompt. So, removed the port_guid parameter from the function altogether. If not in daemon mode, it would show the list of ports as intended. Also added error message if no ports are found.
>>
>> Signed-off-by: Goldwyn Rodrigues rgold...@suse.de
>>
>> diff --git a/opensm/main.c b/opensm/main.c
>> index 51c8291..a236859 100644
>> [...]
[PATCH] RDS: Remove some unused iWARP code
From: Roland Dreier <rol...@purestorage.com>

rds_iw_flush_goal() just returns a count, but it is only called in one
place and its return value is ignored there. So delete all the dead
code.

Signed-off-by: Roland Dreier <rol...@purestorage.com>
---
 net/rds/iw_rdma.c | 15 +--
 1 files changed, 1 insertions(+), 14 deletions(-)

diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index 4e1de17..a817705 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -477,17 +477,6 @@ void rds_iw_sync_mr(void *trans_private, int direction)
 	}
 }

-static inline unsigned int rds_iw_flush_goal(struct rds_iw_mr_pool *pool, int free_all)
-{
-	unsigned int item_count;
-
-	item_count = atomic_read(&pool->item_count);
-	if (free_all)
-		return item_count;
-
-	return 0;
-}
-
 /*
  * Flush our pool of MRs.
  * At a minimum, all currently unused MRs are unmapped.
@@ -500,7 +489,7 @@ static int rds_iw_flush_mr_pool(struct rds_iw_mr_pool *pool, int free_all)
 	LIST_HEAD(unmap_list);
 	LIST_HEAD(kill_list);
 	unsigned long flags;
-	unsigned int nfreed = 0, ncleaned = 0, unpinned = 0, free_goal;
+	unsigned int nfreed = 0, ncleaned = 0, unpinned = 0;
 	int ret = 0;

 	rds_iw_stats_inc(s_iw_rdma_mr_pool_flush);
@@ -514,8 +503,6 @@ static int rds_iw_flush_mr_pool(struct rds_iw_mr_pool *pool, int free_all)
 	list_splice_init(&pool->clean_list, &kill_list);
 	spin_unlock_irqrestore(&pool->list_lock, flags);

-	free_goal = rds_iw_flush_goal(pool, free_all);
-
 	/* Batched invalidate of dirty MRs.
 	 * For FMR based MRs, the mappings on the unmap list are
 	 * actually members of an ibmr (ibmr->mapping). They either
-- 
1.7.8.3
Re: [PATCH V1 1/6] IB: use central enum for speed instead of hard-coded values
Seems to have the raw packet QP stuff mixed in now?
Re: [PATCH V1 1/6] IB: use central enum for speed instead of hard-coded values
On Thu, Jan 12, 2012 at 9:30 PM, Roland Dreier <rol...@kernel.org> wrote:
> Seems to have the raw packet QP stuff mixed in now?

sorry, my bad, will fix and resend

Or.
[PATCH V2 1/6] IB: use central enum for speed instead of hard-coded values
The kernel IB stack uses one enumeration for IB speed, which wasn't
explicitly specified in the verbs header file. Add that enum, and use
it all over the code.

Note that the IB speed/width notation is also used by iWARP and IBoE hw
drivers who apply the convention of rate = speed X width, to advertize
their port link rate.

Signed-off-by: Or Gerlitz <ogerl...@mellanox.com>
---
changes from v0: fixed typo in the enum type name (was ib_port_seed
instead of ib_port_speed)

changes from v1: removed raw qp code which went in by mistake

 drivers/infiniband/core/sysfs.c              | 15 +--
 drivers/infiniband/hw/amso1100/c2_provider.c |  2 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |  2 +-
 drivers/infiniband/hw/cxgb4/provider.c       |  2 +-
 drivers/infiniband/hw/ehca/ehca_hca.c        |  2 +-
 drivers/infiniband/hw/mlx4/main.c            | 10 +-
 drivers/infiniband/hw/nes/nes_verbs.c        |  2 +-
 include/rdma/ib_verbs.h                      |  9 +
 8 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index c61bca3..9ce70ca 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -189,21 +189,24 @@ static ssize_t rate_show(struct ib_port *p, struct port_attribute *unused,
 	rate = (25 * attr.active_speed) / 10;

 	switch (attr.active_speed) {
-	case 2:
+	case IB_SPEED_SDR:
+		speed = " SDR";
+		break;
+	case IB_SPEED_DDR:
 		speed = " DDR";
 		break;
-	case 4:
+	case IB_SPEED_QDR:
 		speed = " QDR";
 		break;
-	case 8:
+	case IB_SPEED_FDR10:
 		speed = " FDR10";
 		rate = 10;
 		break;
-	case 16:
+	case IB_SPEED_FDR:
 		speed = " FDR";
 		rate = 14;
 		break;
-	case 32:
+	case IB_SPEED_EDR:
 		speed = " EDR";
 		rate = 25;
 		break;
@@ -214,7 +217,7 @@ static ssize_t rate_show(struct ib_port *p, struct port_attribute *unused,
 		return -EINVAL;

 	return sprintf(buf, "%d%s Gb/sec (%dX%s)\n",
-		       rate, (attr.active_speed == 1) ? ".5" : "",
+		       rate, (attr.active_speed == IB_SPEED_SDR) ? ".5" : "",
 		       ib_width_enum_to_int(attr.active_width), speed);
 }

diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index 12f923d..07eb3a8 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -94,7 +94,7 @@ static int c2_query_port(struct ib_device *ibdev,
 	props->pkey_tbl_len = 1;
 	props->qkey_viol_cntr = 0;
 	props->active_width = 1;
-	props->active_speed = 1;
+	props->active_speed = IB_SPEED_SDR;

 	return 0;
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 37c224f..0bdf09a 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1227,7 +1227,7 @@ static int iwch_query_port(struct ib_device *ibdev,
 	props->gid_tbl_len = 1;
 	props->pkey_tbl_len = 1;
 	props->active_width = 2;
-	props->active_speed = 2;
+	props->active_speed = IB_SPEED_DDR;
 	props->max_msg_sz = -1;

 	return 0;
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 247fe70..be1c18f 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -329,7 +329,7 @@ static int c4iw_query_port(struct ib_device *ibdev, u8 port,
 	props->gid_tbl_len = 1;
 	props->pkey_tbl_len = 1;
 	props->active_width = 2;
-	props->active_speed = 2;
+	props->active_speed = IB_SPEED_DDR;
 	props->max_msg_sz = -1;

 	return 0;
diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 73edc36..9ed4d25 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -233,7 +233,7 @@ int ehca_query_port(struct ib_device *ibdev,
 		props->phys_state = 5;
 		props->state = rblock->state;
 		props->active_width = IB_WIDTH_12X;
-		props->active_speed = 0x1;
+		props->active_speed = IB_SPEED_SDR;
 	}

 query_port1:
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 7b445df..6ff6bdf 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -215,16 +215,16 @@ static int ib_link_query_port(struct ib_device *ibdev, u8 port,

 	switch (ext_active_speed) {
 	case 1:
-		props->active_speed = 16; /* FDR */
+
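The "rate = speed X width" convention from the commit message can be illustrated with a short stand-alone sketch. It is not the kernel header: the enum values are assumed to equal the hard-coded case labels that the sysfs hunk replaces (1, 2, 4, 8, 16, 32), the per-lane rates are the ones visible in rate_show(), and the names `SPEED_*`, `lane_rate_tenths()` and `link_rate_tenths()` are hypothetical.

```c
/* Assumed values, taken from the hard-coded case labels in the diff. */
enum port_speed_sketch {
	SPEED_SDR   = 1,
	SPEED_DDR   = 2,
	SPEED_QDR   = 4,
	SPEED_FDR10 = 8,
	SPEED_FDR   = 16,
	SPEED_EDR   = 32
};

/* Per-lane rate in tenths of Gb/sec, or -1 for an unknown speed value. */
int lane_rate_tenths(int speed)
{
	switch (speed) {
	case SPEED_SDR:   return 25;   /* 2.5 Gb/sec */
	case SPEED_DDR:   return 50;   /* 5 Gb/sec */
	case SPEED_QDR:   return 100;  /* 10 Gb/sec */
	case SPEED_FDR10: return 100;  /* 10 Gb/sec */
	case SPEED_FDR:   return 140;  /* 14 Gb/sec */
	case SPEED_EDR:   return 250;  /* 25 Gb/sec */
	default:          return -1;
	}
}

/* Link rate in tenths of Gb/sec: per-lane rate times the lane count. */
int link_rate_tenths(int speed, int width)
{
	int lane = lane_rate_tenths(speed);

	if (lane < 0 || width <= 0)
		return -1;
	return lane * width;
}
```

Working in tenths of Gb/sec keeps the 2.5 Gb/sec SDR lane rate in integer arithmetic. Under these assumptions a 4X QDR link comes out as 400, i.e. 40 Gb/sec.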
Re: [PATCH] IB/qib: detour pcie_caps for certain chip sets
On Thu, Jan 12, 2012 at 08:02:52AM -0800, Mike Marciniszyn wrote:
> > Should whatever this issue is be a general PCI fixup? Like broken
> > MSI, etc.
>
> Can you point me to some details on this?

I can explain the broken MSI stuff, as an example. As I noted I'm not
sure what you are working around here, but if there are limits imposed
on otherwise correct values in the PCI capabilities block then I think
it is broadly applicable to handle this in core code...

There are flags in pci.h like:

	PCI_BUS_FLAGS_NO_MSI = (__force pci_bus_flags_t) 1,

Which are quirk things.. Look in drivers/pci/quirks.c to see how it is
set.

So broadly you'd make a new appropriate bus flag to control whatever
you are working around and then test and set it in quirks, and provide
core code to traverse the bus path from a device to ensure nothing in
the path sets that quirk.

Really depends what the problem actually is.

Jason
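The approach outlined above, a per-bus quirk flag plus a walk from the device up to the root, might look roughly like the following user-space model. Nothing here is kernel API: `struct bus_node`, `BUS_FLAG_LIMIT_PAYLOAD` and `path_has_flag()` are hypothetical names chosen for illustration, and the flag itself stands in for whatever the real workaround would need.

```c
#include <stddef.h>

/* Hypothetical quirk flag, set on a bus by a chipset-specific fixup. */
#define BUS_FLAG_LIMIT_PAYLOAD 0x1u

/* Hypothetical, simplified bus node: just a parent link and flags. */
struct bus_node {
	struct bus_node *parent;	/* NULL at the root */
	unsigned int flags;
};

/*
 * Walk from the bus a device sits on up to the root and report
 * whether any bus on the path carries the given flag.
 */
int path_has_flag(const struct bus_node *bus, unsigned int flag)
{
	for (; bus != NULL; bus = bus->parent)
		if (bus->flags & flag)
			return 1;
	return 0;
}
```

Because the walk covers the whole path, the check still fires when the flagged chipset is not the immediate parent of the device, for example when PCIe switches sit in between.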
Re: [PATCH] IB/qib: detour pcie_caps for certain chip sets
On Thu, Jan 12, 2012 at 9:17 AM, Mike Marciniszyn
<mike.marcinis...@qlogic.com> wrote:
> > Does this work on systems where the broken chipset might not be the
> > immediate parent of the qib device (ie there are some PCIe switches
> > in between)?
>
> The code figures this out at the top of routine and returns, changing
> nothing.

OIC. Also I see

	if (parent->vendor != 0x8086)
		return 1;

so I guess you don't need another vendor check. Although this might be
better written as PCI_VENDOR_ID_INTEL instead of 0x8086.

I guess this is OK, although as Jason said it would be much better if
the PCI core knew about these chipset errata.

 - R.
RE: [PATCH] IB/qib: detour pcie_caps for certain chip sets
> 	if (parent->vendor != 0x8086)
> 		return 1;
>
> so I guess you don't need another vendor check.

Actually, Jason is right. The vendor check you reference here is in
qib_tune_pcie_coalesce() and not the routine being patched.

A bit of background here is that the issue was noted with the indicated
Harpertown root complex chip sets as follows:
- The BIOS set the root complex MaxPayLoad to 128, but rc capabilities
  indicate 256 is possible
- To get the best performance we tried going to 256 on the rc and our
  card and noted the Poisoned TLP
- The patch is an effort to avoid having to use set pcie_caps at all as
  well as avoiding issues with the problematic chip sets
- The module parameter can still be used to experiment

We have never seen the issue with AMD or other Intel chipsets.

The problematic device ids are not in fixup.c in lib.

I can reissue a v2 with:
- the vendor check
- define use when available

We probably need to do something, since the current 3.2 rc has the
above risk.

Mike

This message and any attached documents contain information from QLogic
Corporation or its wholly-owned subsidiaries that may be confidential.
If you are not the intended recipient, you may not read, copy,
distribute, or use this information. If you have received this
transmission in error, please notify the sender immediately by reply
e-mail and then delete this message.
Re: [PATCH] IB/qib: detour pcie_caps for certain chip sets
On Thu, Jan 12, 2012 at 02:14:12PM -0800, Mike Marciniszyn wrote:
> Actually, Jason is right. The vendor check you reference here is in
> qib_tune_pcie_coalesce() and not the routine being patched.
>
> A bit of background here is that the issue was noted with the
> indicated Harpertown root complex chip sets as follows:
> - The BIOS set the root complex MaxPayLoad to 128, but rc capabilities
>   indicate 256 is possible
> - To get the best performance we tried going to 256 on the rc and our
>   card and noted the Poisoned TLP

I don't think it is appropriate for a driver to modify the pci
configuration of the root complex.. What if other drivers also try and
modify this configuration? Chaos.

It doesn't seem to me like this has any place in the quirks thing
either. Things seem to be working properly, the MaxPayLoad of 128 is
clearly the highest the system will support correctly.

Jason
RE: [PATCH] IB/qib: detour pcie_caps for certain chip sets
> It doesn't seem to me like this has any place in the quirks thing
> either. Things seem to be working properly, the MaxPayLoad of 128 is
> clearly the highest the system will support correctly.
>
> Jason

Probably the best thing to do is to unwind the module parameter default
in 8d4548f2b, which would change the initial value back to 0.

That's the way the file has always been and that won't change the rc.

Mike
[PATCH] IB/qib: unwind pcie change
Commit 8d4548f2b (IB/qib: Default some module parameters optimally)
introduced an issue with older root complexes. They cannot handle the
pcie_caps of 0x51 (MaxReadReq 4096, MaxPayload=256). A typical
diagnostic in this situation reported by syslog contains the text:

[PCIe Poisoned TLP][Send DMA memory read]

Restore the module parameter default to zero, which will avoid any
changes in the root complex.

Reviewed-by: Mark Debbage <mark.debb...@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marcinis...@qlogic.com>
---
 drivers/infiniband/hw/qib/qib_pcie.c |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_pcie.c b/drivers/infiniband/hw/qib/qib_pcie.c
index 0de55c0..790646e 100644
--- a/drivers/infiniband/hw/qib/qib_pcie.c
+++ b/drivers/infiniband/hw/qib/qib_pcie.c
@@ -577,7 +577,7 @@ static int qib_tune_pcie_coalesce(struct qib_devdata *dd)
  * BIOS may not set PCIe bus-utilization parameters for best performance.
  * Check and optionally adjust them to maximize our throughput.
  */
-static int qib_pcie_caps = 0x51;
+static int qib_pcie_caps;
 module_param_named(pcie_caps, qib_pcie_caps, int, S_IRUGO);
 MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: Payload (0..3), ReadReq (4..7)");

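For readers decoding the 0x51 value, here is a small stand-alone sketch. It assumes that bits 0..3 and 4..7 each hold a PCIe-style size code n meaning 128 << n bytes, which is consistent with the commit message (0x51 giving MaxPayload=256 and MaxReadReq 4096) but is not taken from the driver source. `struct pcie_caps_sketch` and `decode_pcie_caps()` are hypothetical names.

```c
/* Hypothetical holder for the two decoded sizes, in bytes. */
struct pcie_caps_sketch {
	unsigned int max_payload;
	unsigned int max_read_req;
};

/*
 * Split a pcie_caps value into its two fields, assuming each field
 * is a size code n that stands for 128 << n bytes.
 */
struct pcie_caps_sketch decode_pcie_caps(unsigned int caps)
{
	struct pcie_caps_sketch out;

	out.max_payload = 128u << (caps & 0xf);		/* bits 0..3 */
	out.max_read_req = 128u << ((caps >> 4) & 0xf);	/* bits 4..7 */
	return out;
}
```

With this decoding, 0x51 splits into a payload code of 1 (256 bytes) and a read request code of 5 (4096 bytes). How the driver interprets the restored default of 0 is not covered by this sketch.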
Re: [PATCH] RDS: Remove some unused iWARP code
From: Roland Dreier <rol...@kernel.org>
Date: Thu, 12 Jan 2012 10:57:56 -0800

> From: Roland Dreier <rol...@purestorage.com>
>
> rds_iw_flush_goal() just returns a count, but it is only called in one
> place and its return value is ignored there. So delete all the dead
> code.
>
> Signed-off-by: Roland Dreier <rol...@purestorage.com>

Applied.