RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
Hi Chuck, I have validated these set of patches with ocrdma device, iozone passes with these. -Regards Devesh -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Anna Schumaker Sent: Friday, March 27, 2015 12:10 AM To: Chuck Lever; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1 Hey Chuck, I didn't see anything that needs to be fixed up in these patches. Are they ready for me? Anna On 03/24/2015 04:30 PM, Chuck Lever wrote: This is a series of client-side patches for NFS/RDMA. In preparation for increasing the transport credit limit and maximum rsize/wsize, I've re-factored the memory registration logic into separate files, invoked via a method API. The two main optimizations in v1 of this series have been dropped. Sagi Grimberg didn't like the complexity of the solution, and there isn't enough time to rework it, test the new version, and get it reviewed before the 4.1 merge window opens. I'm going to prepare these for 4.2. Fixes suggested by reviewers have been included before the refactoring patches to make it easier to backport them to previous kernels. The series is available in the nfs-rdma-for-4.1 topic branch at git://linux-nfs.org/projects/cel/cel-2.6.git Changes since v1: - Rebased on 4.0-rc5 - Main optimizations postponed to 4.2 - Addressed review comments from Anna, Sagi, and Devesh --- Chuck Lever (15): SUNRPC: Introduce missing well-known netids xprtrdma: Display IPv6 addresses and port numbers correctly xprtrdma: Perform a full marshal on retransmit xprtrdma: Byte-align FRWR registration xprtrdma: Prevent infinite loop in rpcrdma_ep_create() xprtrdma: Add vector of ops for each memory registration strategy xprtrdma: Add a max_payload op for each memreg mode xprtrdma: Add a register_external op for each memreg mode xprtrdma: Add a deregister_external op for each memreg mode xprtrdma: Add init MRs memreg op xprtrdma: Add reset MRs memreg op xprtrdma: Add destroy MRs memreg op xprtrdma: Add open memreg op xprtrdma: Handle non-SEND completions via a callout xprtrdma: Make rpcrdma_{un}map_one() into inline functions include/linux/sunrpc/msg_prot.h|8 net/sunrpc/xprtrdma/Makefile |3 net/sunrpc/xprtrdma/fmr_ops.c | 208 +++ net/sunrpc/xprtrdma/frwr_ops.c | 353 ++ net/sunrpc/xprtrdma/physical_ops.c | 94 + net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++-- net/sunrpc/xprtrdma/transport.c| 61 ++- net/sunrpc/xprtrdma/verbs.c| 699 +++- net/sunrpc/xprtrdma/xprt_rdma.h| 90 - 9 files changed, 882 insertions(+), 721 deletions(-) create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c create mode 100644 net/sunrpc/xprtrdma/physical_ops.c -- Chuck Lever -- To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
-Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Devesh Sharma Sent: Friday, March 27, 2015 11:13 AM To: Anna Schumaker; Chuck Lever; linux-rdma@vger.kernel.org; linux- n...@vger.kernel.org Subject: RE: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1 Hi Chuck, I have validated these set of patches with ocrdma device, iozone passes with these. Thanks to Meghna. -Regards Devesh -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Anna Schumaker Sent: Friday, March 27, 2015 12:10 AM To: Chuck Lever; linux-rdma@vger.kernel.org; linux-...@vger.kernel.org Subject: Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1 Hey Chuck, I didn't see anything that needs to be fixed up in these patches. Are they ready for me? Anna On 03/24/2015 04:30 PM, Chuck Lever wrote: This is a series of client-side patches for NFS/RDMA. In preparation for increasing the transport credit limit and maximum rsize/wsize, I've re-factored the memory registration logic into separate files, invoked via a method API. The two main optimizations in v1 of this series have been dropped. Sagi Grimberg didn't like the complexity of the solution, and there isn't enough time to rework it, test the new version, and get it reviewed before the 4.1 merge window opens. I'm going to prepare these for 4.2. Fixes suggested by reviewers have been included before the refactoring patches to make it easier to backport them to previous kernels. The series is available in the nfs-rdma-for-4.1 topic branch at git://linux-nfs.org/projects/cel/cel-2.6.git Changes since v1: - Rebased on 4.0-rc5 - Main optimizations postponed to 4.2 - Addressed review comments from Anna, Sagi, and Devesh --- Chuck Lever (15): SUNRPC: Introduce missing well-known netids xprtrdma: Display IPv6 addresses and port numbers correctly xprtrdma: Perform a full marshal on retransmit xprtrdma: Byte-align FRWR registration xprtrdma: Prevent infinite loop in rpcrdma_ep_create() xprtrdma: Add vector of ops for each memory registration strategy xprtrdma: Add a max_payload op for each memreg mode xprtrdma: Add a register_external op for each memreg mode xprtrdma: Add a deregister_external op for each memreg mode xprtrdma: Add init MRs memreg op xprtrdma: Add reset MRs memreg op xprtrdma: Add destroy MRs memreg op xprtrdma: Add open memreg op xprtrdma: Handle non-SEND completions via a callout xprtrdma: Make rpcrdma_{un}map_one() into inline functions include/linux/sunrpc/msg_prot.h|8 net/sunrpc/xprtrdma/Makefile |3 net/sunrpc/xprtrdma/fmr_ops.c | 208 +++ net/sunrpc/xprtrdma/frwr_ops.c | 353 ++ net/sunrpc/xprtrdma/physical_ops.c | 94 + net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++-- net/sunrpc/xprtrdma/transport.c| 61 ++- net/sunrpc/xprtrdma/verbs.c| 699 +++- net/sunrpc/xprtrdma/xprt_rdma.h| 90 - 9 files changed, 882 insertions(+), 721 deletions(-) create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c create mode 100644 net/sunrpc/xprtrdma/physical_ops.c -- Chuck Lever -- To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html 칻 ~ +- ݶ w ˛ m b kvf ^n r z h G h ( 階 ݢj m z ޖ f h ~ m N�r��yb�X��ǧv�^�){.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj��!�i
Questions about inline-receive in Mellanox cards
Hi everyone. I have been trying to get inline-receive functionality to work on Connect-IB cards. However, it seems like libmlx5 (from Mellanox OFED 2.4) does not support inline-receive. libmlx4 seems to support it, but I believe libmlx4 is not compatible with Connect-IB cards. Further, the Mellanox OFED user manual says that Connect-IB cards provide inline-receive - which makes it sound like ConnectX-3 cards do not, but then libmlx4 supports it... Does this mean that ConnectX-3 cards also provide inline-receive? Can someone please help me with this confusion? Also, is inline-receive supported with UD transport? Thanks! Anuj Kalia -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer
On 03/26/2015 05:27 PM, Doug Ledford wrote: On Thu, 2015-03-26 at 17:04 +0100, Michael Wang wrote: [snip] Few more questions here is: 1. when to setup? (maybe inside ib_register_device() before doing client-add() callback?) I don't think we can set it up here. The driver's have to set it up. After all, the mlx4 driver will have to decide for itself what the port transport is and tell us, we can't tell it. 2. how to setup? (still infer from the transport and link layer like we currently do?) Find each point in each driver where they currently set the link layer and transport fields today, and replace that with setting the new transport bitmask instead. Forgive me but I'm not very familiar with the process of such changing... So shall we do this for all the vendors? or provide a transition method and asking them to do this later by themselves? The questions is just wondering how the transition method could be, but if we have to do the changes for vendor, that sounds like a tough job... 3. in case if a device has ports with different link layer type (please correct me if this will never happen), then only one bitmask may not be enough to present the transport of all the ports? (maybe create a bitmask per port?) [snip] That preserves the user space ABI and all user programs keep working, while we update to an internal representation that makes more sense for how things have evolved. I get your point :-) to reassemble the information in old style, maybe we can temporarily reserve the old mechanism, and do reform for the user space in another separate patch set, also cleanup the old stuff too. Regards, Michael Wang Regards, Michael Wang If we do this, then the only thing we have to fix up to preserve ABI with user space is to make sure that any time we export an ibv_device struct and any time we import the same, we convert from our new internal representation to the old representation that user space expects. And we also need to make a few changes in the sysfs code to display the properties as things expect. But, that would allow us to fix up what I see as a problem right now, which is that we hide the information we need to know what sort of device we are working on in two different fields: the transport and the link layer. Instead, just use one field with enough variants that we can store all of the relevant information we need in that one field. This has the benefit that any comparisons that happen in hot paths will now always be a single bitwise comparison and will no longer need to hit two separate variables for two separate compares. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer
On Thu, 2015-03-26 at 17:04 +0100, Michael Wang wrote: Hi, Doug Thanks for the excellent comments :-) On 03/26/2015 03:09 PM, Doug Ledford wrote: On Wed, 2015-03-25 at 16:09 +0100, Michael Wang wrote: [snip] [snip] So, I would suggest that we fix things up thusly: enum transport { TRANSPORT_IB=1, TRANSPORT_IWARP=2, TRANSPORT_ROCE=4, TRANSPORT_OPA=8, TRANSPORT_USNIC=10, }; #define HAS_SA(ibdev) ((ibdev)-transport (TRANSPORT_IB|TRANSPORT_OPA)) #define HAS_JUMBO_SA(ibdev) ((ibdev)-transport TRANSPORT_OPA)) or possibly static bool ib_dev_has_sa(struct ibv_device *ibdev) { return ibdev-transport (TRANSPORT_IB | TRANSPORT_OPA); } The idea sounds interesting, and here my silly questions come :-P So are you suggesting that we add a new bitmask 'transport' into 'struct ib_device' in kernel, and setup it at very beginning? Few more questions here is: 1. when to setup? (maybe inside ib_register_device() before doing client-add() callback?) I don't think we can set it up here. The driver's have to set it up. After all, the mlx4 driver will have to decide for itself what the port transport is and tell us, we can't tell it. 2. how to setup? (still infer from the transport and link layer like we currently do?) Find each point in each driver where they currently set the link layer and transport fields today, and replace that with setting the new transport bitmask instead. 3. in case if a device has ports with different link layer type (please correct me if this will never happen), then only one bitmask may not be enough to present the transport of all the ports? (maybe create a bitmask per port?) Correct, a bitmask per port. And we can remove the existing transport and link layer elements of the struct and replace it with just the new transport. Then, whenever we need to copy a struct to user space, we have a helper that looks something like this: static void inline ib_set_user_transport(struct ib_device *ibdev, struct user_ibv_device *uibdev) { switch(ibdev-port[port]-transport) { case TRANSPORT_IB: case TRANSPORT_OPA: uibdev-port[port]-link_layer = INFINIBAND; uibdev-port[port]-transport = INFINIBAND; break; case TRANSPORT_IWARP: uibdev-port[port]-link_layer = INFINIBAND; uibdev-port[port]-transport = IWARP; break; case TRANSPORT_ROCE: uibdev-port[port]-link_layer = ETHERNET; uibdev-port[port]-transport = INFINIBAND; break; case TRANSPORT_USNIC: uibdev-port[port]-link_layer = ETHERNET; uibdev-port[port]-transport = whatever USNIC uses today; break; default: pr_err(ibdev, unknown transport type %x\n, ibdev-port[port]-transport); } } That preserves the user space ABI and all user programs keep working, while we update to an internal representation that makes more sense for how things have evolved. Regards, Michael Wang If we do this, then the only thing we have to fix up to preserve ABI with user space is to make sure that any time we export an ibv_device struct and any time we import the same, we convert from our new internal representation to the old representation that user space expects. And we also need to make a few changes in the sysfs code to display the properties as things expect. But, that would allow us to fix up what I see as a problem right now, which is that we hide the information we need to know what sort of device we are working on in two different fields: the transport and the link layer. Instead, just use one field with enough variants that we can store all of the relevant information we need in that one field. This has the benefit that any comparisons that happen in hot paths will now always be a single bitwise comparison and will no longer need to hit two separate variables for two separate compares. -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD signature.asc Description: This is a digitally signed message part
Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
On Mar 26, 2015, at 1:39 PM, Anna Schumaker anna.schuma...@netapp.com wrote: Hey Chuck, I didn't see anything that needs to be fixed up in these patches. Are they ready for me? Thanks for the review. IMO we can go one of two routes: - Wait for HCA vendors to test this latest version of the series, or - Merge it now, and simply apply any needed fixes on top before the 4.1 window opens. What do you prefer? Is it possible to get this series in front of the zero-day test folks before you merge? Anna On 03/24/2015 04:30 PM, Chuck Lever wrote: This is a series of client-side patches for NFS/RDMA. In preparation for increasing the transport credit limit and maximum rsize/wsize, I've re-factored the memory registration logic into separate files, invoked via a method API. The two main optimizations in v1 of this series have been dropped. Sagi Grimberg didn't like the complexity of the solution, and there isn't enough time to rework it, test the new version, and get it reviewed before the 4.1 merge window opens. I'm going to prepare these for 4.2. Fixes suggested by reviewers have been included before the refactoring patches to make it easier to backport them to previous kernels. The series is available in the nfs-rdma-for-4.1 topic branch at git://linux-nfs.org/projects/cel/cel-2.6.git Changes since v1: - Rebased on 4.0-rc5 - Main optimizations postponed to 4.2 - Addressed review comments from Anna, Sagi, and Devesh --- Chuck Lever (15): SUNRPC: Introduce missing well-known netids xprtrdma: Display IPv6 addresses and port numbers correctly xprtrdma: Perform a full marshal on retransmit xprtrdma: Byte-align FRWR registration xprtrdma: Prevent infinite loop in rpcrdma_ep_create() xprtrdma: Add vector of ops for each memory registration strategy xprtrdma: Add a max_payload op for each memreg mode xprtrdma: Add a register_external op for each memreg mode xprtrdma: Add a deregister_external op for each memreg mode xprtrdma: Add init MRs memreg op xprtrdma: Add reset MRs memreg op xprtrdma: Add destroy MRs memreg op xprtrdma: Add open memreg op xprtrdma: Handle non-SEND completions via a callout xprtrdma: Make rpcrdma_{un}map_one() into inline functions include/linux/sunrpc/msg_prot.h|8 net/sunrpc/xprtrdma/Makefile |3 net/sunrpc/xprtrdma/fmr_ops.c | 208 +++ net/sunrpc/xprtrdma/frwr_ops.c | 353 ++ net/sunrpc/xprtrdma/physical_ops.c | 94 + net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++-- net/sunrpc/xprtrdma/transport.c| 61 ++- net/sunrpc/xprtrdma/verbs.c| 699 +++- net/sunrpc/xprtrdma/xprt_rdma.h| 90 - 9 files changed, 882 insertions(+), 721 deletions(-) create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c create mode 100644 net/sunrpc/xprtrdma/physical_ops.c -- Chuck Lever -- To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever chuck[dot]lever[at]oracle[dot]com -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/15] NFS/RDMA patches proposed for 4.1
Hey Chuck, I didn't see anything that needs to be fixed up in these patches. Are they ready for me? Anna On 03/24/2015 04:30 PM, Chuck Lever wrote: This is a series of client-side patches for NFS/RDMA. In preparation for increasing the transport credit limit and maximum rsize/wsize, I've re-factored the memory registration logic into separate files, invoked via a method API. The two main optimizations in v1 of this series have been dropped. Sagi Grimberg didn't like the complexity of the solution, and there isn't enough time to rework it, test the new version, and get it reviewed before the 4.1 merge window opens. I'm going to prepare these for 4.2. Fixes suggested by reviewers have been included before the refactoring patches to make it easier to backport them to previous kernels. The series is available in the nfs-rdma-for-4.1 topic branch at git://linux-nfs.org/projects/cel/cel-2.6.git Changes since v1: - Rebased on 4.0-rc5 - Main optimizations postponed to 4.2 - Addressed review comments from Anna, Sagi, and Devesh --- Chuck Lever (15): SUNRPC: Introduce missing well-known netids xprtrdma: Display IPv6 addresses and port numbers correctly xprtrdma: Perform a full marshal on retransmit xprtrdma: Byte-align FRWR registration xprtrdma: Prevent infinite loop in rpcrdma_ep_create() xprtrdma: Add vector of ops for each memory registration strategy xprtrdma: Add a max_payload op for each memreg mode xprtrdma: Add a register_external op for each memreg mode xprtrdma: Add a deregister_external op for each memreg mode xprtrdma: Add init MRs memreg op xprtrdma: Add reset MRs memreg op xprtrdma: Add destroy MRs memreg op xprtrdma: Add open memreg op xprtrdma: Handle non-SEND completions via a callout xprtrdma: Make rpcrdma_{un}map_one() into inline functions include/linux/sunrpc/msg_prot.h|8 net/sunrpc/xprtrdma/Makefile |3 net/sunrpc/xprtrdma/fmr_ops.c | 208 +++ net/sunrpc/xprtrdma/frwr_ops.c | 353 ++ net/sunrpc/xprtrdma/physical_ops.c | 94 + net/sunrpc/xprtrdma/rpc_rdma.c | 87 ++-- net/sunrpc/xprtrdma/transport.c| 61 ++- net/sunrpc/xprtrdma/verbs.c| 699 +++- net/sunrpc/xprtrdma/xprt_rdma.h| 90 - 9 files changed, 882 insertions(+), 721 deletions(-) create mode 100644 net/sunrpc/xprtrdma/fmr_ops.c create mode 100644 net/sunrpc/xprtrdma/frwr_ops.c create mode 100644 net/sunrpc/xprtrdma/physical_ops.c -- Chuck Lever -- To unsubscribe from this list: send the line unsubscribe linux-nfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache
Hi Matan/Moni, Could either of you please respond to both of Bart's queries? Thanks Somnath -Original Message- From: Bart Van Assche [mailto:bart.vanass...@sandisk.com] Sent: Thursday, March 26, 2015 5:13 AM To: Somnath Kotur; rol...@kernel.org Cc: linux-rdma@vger.kernel.org; Matan Barak Subject: Re: [PATCH v3 for-next 01/33] IB/core: Add RoCE GID cache On 03/25/2015 02:19 PM, Somnath Kotur wrote: + if (cache-data_vec[ix].attr.ndev + cache-data_vec[ix].attr.ndev != old_net_dev) A few lines earlier the memory old_net_dev points at was freed. If two instances of this function run concurrently, what prevents that the old_net_dev memory has been reallocated and hence that attr.ndev == old_net_dev although both pointers refer(red) to different network devices ? + ACCESS_ONCE(cache-data_vec[ix].seq) = orig_seq; Invoking write_gid() is only safe if the caller serializes write_gid() calls. Apparently the cache-lock mutex is used for that purpose. So why is it necessary to use ACCESS_ONCE() here ? Why is it needed to prevent that the compiler coalesces this write with another write into the same structure ? + /* Make sure the sequence number we remeber was read This looks like a typo - shouldn't the above read remember ? BTW, the style of that comment is recommended only for networking code and not for IB code. Have you verified this patch with checkpatch ? + mutex_lock(cache-lock); + + for (ix = 0; ix cache-sz; ix++) + if (cache-data_vec[ix].attr.ndev == ndev) + write_gid(ib_dev, port, cache, ix, zgid, zattr); + + mutex_unlock(cache-lock); + return 0; The traditional Linux kernel coding style is one blank line before mutex_lock() and after mutex_unlock() but not after mutex_lock() nor before mutex_unlock(). + orig_seq = ACCESS_ONCE(cache-data_vec[index].seq); + /* Make sure we read the sequence number before copying the +* gid to local storage. */ + smp_rmb(); Please use READ_ONCE() instead of ACCESS_ONCE() as recommended in linux/compiler.h. +static void free_roce_gid_cache(struct ib_device *ib_dev, u8 port) { + int i; + struct ib_roce_gid_cache *cache = + ib_dev-cache.roce_gid_cache[port - 1]; + + if (!cache) + return; + + for (i = 0; i cache-sz; ++i) { + if (memcmp(cache-data_vec[i].gid, zgid, + sizeof(cache-data_vec[i].gid))) + write_gid(ib_dev, port, cache, i, zgid, zattr); + } + kfree(cache-data_vec); + kfree(cache); +} Overwriting data just before it is freed is not useful. Please use CONFIG_SLUB_DEBUG=y to debug use-after-free issues instead of such code. Bart. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer
On Thu, Mar 26, 2015 at 05:58:20PM +0100, Michael Wang wrote: The questions is just wondering how the transition method could be, but if we have to do the changes for vendor, that sounds like a tough job... I would see changing how the information is represented in the struct as a follow on issue. The first patch should go through and replace all direct access to the link layer/transport/etc with an appropriately narrow is_XX() test like Doug was suggesting. That means looking at each code site and determining what it needs, making a is_XX for it and a kdoc describing exactly what is needed for the test to return true. The follow on patch can then rework the is_XX and drop the link layer/transport stuff.. Some ideas for is_XX: IB compatible SA QP0 SMP mechanism IB SMP format OPA SMP format QP1 GMP mechanism IB compatible CM GID addressing IP/IPv6 addressing Ethernet VLAN ... Jason -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer
On Wed, 2015-03-25 at 16:09 +0100, Michael Wang wrote: My sincerely apologies for the corrupted mails, and thanks for Dan's kindly remind :-) There are too many lengthy code to check the transport type of IB device, or the link layer type of it's port, this patch set try to use some helper to refine and save us some code. TODO: Currently we inferred from the transport type and link layer type to identify the way of management, it will be better if we can directly get the indicator from vendor. Sean proposed one suggestion: https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg23339.html It may need a big work to adapt current implementation to utilize these flags elegantly. Also the performance concern on query_port() need to be addressed, may be some new callback like query_mgmt() could works. Michael Wang (2): [PATCH 1/2] IB/Verbs: Use helpers to check transport and link layer [PATCH 2/2] IB/Verbs: Use helpers to check IBoE technology --- drivers/infiniband/core/agent.c |2 - drivers/infiniband/core/cm.c |2 - drivers/infiniband/core/cma.c | 33 -- drivers/infiniband/core/mad.c |6 ++--- drivers/infiniband/core/multicast.c | 11 +++--- drivers/infiniband/core/sa_query.c| 14 ++-- drivers/infiniband/core/ucm.c |3 -- drivers/infiniband/core/user_mad.c|2 - drivers/infiniband/core/verbs.c |5 +--- drivers/infiniband/hw/mlx4/ah.c |2 - drivers/infiniband/hw/mlx4/cq.c |4 --- drivers/infiniband/hw/mlx4/mad.c | 14 +++- drivers/infiniband/hw/mlx4/main.c |8 ++- drivers/infiniband/hw/mlx4/mlx4_ib.h |2 - drivers/infiniband/hw/mlx4/qp.c | 21 ++- drivers/infiniband/hw/mlx4/sysfs.c|6 + drivers/infiniband/ulp/ipoib/ipoib_main.c |6 ++--- include/rdma/ib_verbs.h | 30 +++ net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |3 -- 19 files changed, 87 insertions(+), 87 deletions(-) I think this is a step in the right direction. However, as I brought up in a different thread, I think it would be best if we start to clear up some of the funny things in this space, such as calling iWARP link layer InfiniBand. One thing we didn't discuss before is that, even if changing these items around in user space would break user space, changing them around in the kernel won't break anything except maybe Lustre (which we can inform the authors about the change so they can anticipate it and do the right thing in their code) because we can fix up our return values we pass to user space with no impact to them as it's not on a hot path. We can then emulate the old way of setting all these variables to user space while fixing them up in kernel space. So, I would suggest that we fix things up thusly: enum transport { TRANSPORT_IB=1, TRANSPORT_IWARP=2, TRANSPORT_ROCE=4, TRANSPORT_OPA=8, TRANSPORT_USNIC=10, }; #define HAS_SA(ibdev) ((ibdev)-transport (TRANSPORT_IB|TRANSPORT_OPA)) #define HAS_JUMBO_SA(ibdev) ((ibdev)-transport TRANSPORT_OPA)) or possibly static bool ib_dev_has_sa(struct ibv_device *ibdev) { return ibdev-transport (TRANSPORT_IB | TRANSPORT_OPA); } If we do this, then the only thing we have to fix up to preserve ABI with user space is to make sure that any time we export an ibv_device struct and any time we import the same, we convert from our new internal representation to the old representation that user space expects. And we also need to make a few changes in the sysfs code to display the properties as things expect. But, that would allow us to fix up what I see as a problem right now, which is that we hide the information we need to know what sort of device we are working on in two different fields: the transport and the link layer. Instead, just use one field with enough variants that we can store all of the relevant information we need in that one field. This has the benefit that any comparisons that happen in hot paths will now always be a single bitwise comparison and will no longer need to hit two separate variables for two separate compares. -- Doug Ledford dledf...@redhat.com GPG KeyID: 0E572FDD signature.asc Description: This is a digitally signed message part