Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Wed, Apr 10, 2013 at 12:32:31AM -0400, Michael R. Hines wrote:
> On 04/09/2013 11:24 PM, Michael S. Tsirkin wrote:
> >Which mechanism do you refer to? Your patches still seem to pin
> >each page in guest memory at some point, which will break all COW.
> >In particular any pagemap tricks to detect duplicates on source
> >that I suggested won't work.
>
> Sorry, I misspoke. I'm referring to dynamic server page registration.
>
> Of course it does not eliminate pinning - but it does mitigate the
> footprint of the VM as a feature that was requested.
>
> I have implemented it and documented it.
>
> - Michael

Okay, but GIFT is supposed to be used on the send side: it's only allowed with local/remote read access, and serves to reduce memory usage on the send side. For example, disable zero page detection and look at memory usage on the send side before and after migration.

Dynamic registration on the receive side is nice but seems completely unrelated ...

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
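The send-side rule described above (GIFT is only allowed with local/remote read access) amounts to a simple mask check at registration time. The sketch below is a minimal illustration, assuming the low access bits mirror libibverbs' `enum ibv_access_flags`; the `EX_` names, the helper `ex_gift_access_ok()`, and the GIFT bit position are all hypothetical, since the thread does not give the flag's value.

```c
#include <assert.h>
#include <stdbool.h>

/* Low bits mirror libibverbs' enum ibv_access_flags. */
enum {
    EX_ACCESS_LOCAL_WRITE   = 1 << 0,
    EX_ACCESS_REMOTE_WRITE  = 1 << 1,
    EX_ACCESS_REMOTE_READ   = 1 << 2,
    EX_ACCESS_REMOTE_ATOMIC = 1 << 3,
    /* Hypothetical bit for the proposed GIFT flag; not a real libibverbs value. */
    EX_ACCESS_GIFT          = 1 << 30,
};

/* Returns true if the access mask is acceptable under the rule that GIFT
 * may only be combined with read access (no write or atomic bits). */
static bool ex_gift_access_ok(unsigned int access)
{
    const unsigned int write_like = EX_ACCESS_LOCAL_WRITE |
                                    EX_ACCESS_REMOTE_WRITE |
                                    EX_ACCESS_REMOTE_ATOMIC;

    /* The hypothetical GIFT bit must not overlap any write-like bit. */
    assert((EX_ACCESS_GIFT & write_like) == 0);

    if (!(access & EX_ACCESS_GIFT))
        return true;    /* the rule only constrains GIFT registrations */
    return (access & write_like) == 0;
}
```

With this check, `EX_ACCESS_GIFT | EX_ACCESS_REMOTE_READ` (the combination used in the QEMU test later in the thread) is accepted, while any GIFT request that also asks for write access is rejected.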
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On 04/09/2013 11:24 PM, Michael S. Tsirkin wrote:
> Which mechanism do you refer to? Your patches still seem to pin each page in guest memory at some point, which will break all COW. In particular any pagemap tricks to detect duplicates on source that I suggested won't work.

Sorry, I misspoke. I'm referring to dynamic server page registration.

Of course it does not eliminate pinning - but it does mitigate the footprint of the VM as a feature that was requested.

I have implemented it and documented it.

- Michael

On 04/09/2013 03:03 PM, Michael S. Tsirkin wrote:
> presumably is_dup_page reads the page, so should not break COW ...
>
> I'm not sure about the cgroups swap limit - you might have too many non COW pages so attempting to fault them all in makes you exceed the limit. You really should look at what is going on in the pagemap, to see if there's measurable gain from the patch.
>
> On Fri, Apr 05, 2013 at 05:32:30PM -0400, Michael R. Hines wrote:
>> Well, I have the "is_dup_page()" commented out...when RDMA is activated.
>>
>> Is there something else in QEMU that could be touching the page that I don't know about?
>>
>> - Michael
>>
>> On 04/05/2013 05:03 PM, Roland Dreier wrote:
>>> On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines wrote:
>>>> Sorry, I was wrong. Ignore the comments about cgroups. That's still broken. (i.e. trying to register RDMA memory while using a cgroup swap limit causes the process to get killed).
>>>>
>>>> But the GIFT flag patch works (my understanding is that GIFT flag allows the adapter to transmit stale memory information, it does not have anything to do with cgroups specifically).
>>> The point of the GIFT patch is to avoid triggering copy-on-write so that memory doesn't blow up during migration. If that doesn't work then there's no point to the patch.
>>>
>>> - R.
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Tue, Apr 09, 2013 at 09:26:59PM -0400, Michael R. Hines wrote:
> With respect, I'm going to offload testing this patch back to the author =)
> because I'm trying to address all of Paolo's other minor issues
> with the RDMA patch before we can merge.

Fair enough, this likely means it won't happen anytime soon though.

> Since dynamic page registration (as you requested) is now fully
> implemented, this patch is less urgent since we now have a
> mechanism in place to avoid page pinning on both sides of the migration.
>
> - Michael
>

Which mechanism do you refer to? Your patches still seem to pin each page in guest memory at some point, which will break all COW. In particular any pagemap tricks to detect duplicates on source that I suggested won't work.

> On 04/09/2013 03:03 PM, Michael S. Tsirkin wrote:
> >presumably is_dup_page reads the page, so should not break COW ...
> >
> >I'm not sure about the cgroups swap limit - you might have
> >too many non COW pages so attempting to fault them all in
> >makes you exceed the limit. You really should look at
> >what is going on in the pagemap, to see if there's
> >measurable gain from the patch.
> >
> >
> >On Fri, Apr 05, 2013 at 05:32:30PM -0400, Michael R. Hines wrote:
> >>Well, I have the "is_dup_page()" commented out...when RDMA is
> >>activated.
> >>
> >>Is there something else in QEMU that could be touching the page that
> >>I don't know about?
> >>
> >>- Michael
> >>
> >>
> >>On 04/05/2013 05:03 PM, Roland Dreier wrote:
> >>>On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines
> >>> wrote:
> >>>>Sorry, I was wrong. Ignore the comments about cgroups. That's still
> >>>>broken.
> >>>>(i.e. trying to register RDMA memory while using a cgroup swap limit causes
> >>>>the process to get killed).
> >>>>
> >>>>But the GIFT flag patch works (my understanding is that GIFT flag allows
> >>>>the
> >>>>adapter to transmit stale memory information, it does not have anything to
> >>>>do with cgroups specifically).
> >>>The point of the GIFT patch is to avoid triggering copy-on-write so
> >>>that memory doesn't blow up during migration. If that doesn't work
> >>>then there's no point to the patch.
> >>>
> >>> - R.
> >>>
RE: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
> -----Original Message-----
> From: Hefty, Sean
> Sent: Tuesday, April 09, 2013 6:30 PM
> To: Weiny, Ira; Jeff Squyres (jsquyres)
> Cc: Hal Rosenstock; Roland Dreier; linux-rdma@vger.kernel.org; Upinder
> Malhi (umalhi)
> Subject: RE: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
>
> > If the IBTA were to release new MTU enumerations which values would
> > you recommend then?
>
> I don't think there's a great solution here. We're mixing IBTA encoded values
> with non-IBTA values. We could reserve the 6-bit encoded values for IB, and
> use direct values for others (or at least jump beyond the 6-bit range). Or we
> can stop matching new IBTA MTU encodings (e.g. IB_MTU_1500 = 6). Or we
> go back in time and make mtu an int.
>

I thought reserving the 6 bits for IB and allowing the enum values to match the MTU was a pretty good compromise. Especially since PathRecord is defined in sa.h, which is provided by libibverbs. That allows that IB MTU enum to be used there.

OTOH, now that we have moved toward decent defines in the libibumad library, we could define the MTU enum there. But then we again go down the path of defining things in multiple places and confusing the users... :-(

As an aside, I like the use of RDMA_MTU_* for these values, again to distinguish them from the IBTA values. But I know that is poor form.

Ira
RE: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
> If the IBTA were to release new MTU enumerations which values would you
> recommend then?

I don't think there's a great solution here. We're mixing IBTA encoded values with non-IBTA values. We could reserve the 6-bit encoded values for IB, and use direct values for others (or at least jump beyond the 6-bit range). Or we can stop matching new IBTA MTU encodings (e.g. IB_MTU_1500 = 6). Or we go back in time and make mtu an int.

- Sean
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
With respect, I'm going to offload testing this patch back to the author =) because I'm trying to address all of Paolo's other minor issues with the RDMA patch before we can merge.

Since dynamic page registration (as you requested) is now fully implemented, this patch is less urgent since we now have a mechanism in place to avoid page pinning on both sides of the migration.

- Michael

On 04/09/2013 03:03 PM, Michael S. Tsirkin wrote:
> presumably is_dup_page reads the page, so should not break COW ...
>
> I'm not sure about the cgroups swap limit - you might have too many non COW pages so attempting to fault them all in makes you exceed the limit. You really should look at what is going on in the pagemap, to see if there's measurable gain from the patch.
>
> On Fri, Apr 05, 2013 at 05:32:30PM -0400, Michael R. Hines wrote:
>> Well, I have the "is_dup_page()" commented out...when RDMA is activated.
>>
>> Is there something else in QEMU that could be touching the page that I don't know about?
>>
>> - Michael
>>
>> On 04/05/2013 05:03 PM, Roland Dreier wrote:
>>> On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines wrote:
>>>> Sorry, I was wrong. Ignore the comments about cgroups. That's still broken. (i.e. trying to register RDMA memory while using a cgroup swap limit causes the process to get killed).
>>>>
>>>> But the GIFT flag patch works (my understanding is that GIFT flag allows the adapter to transmit stale memory information, it does not have anything to do with cgroups specifically).
>>> The point of the GIFT patch is to avoid triggering copy-on-write so that memory doesn't blow up during migration. If that doesn't work then there's no point to the patch.
>>>
>>> - R.
RE: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
> -----Original Message-----
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Subject: Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
>
> On Apr 8, 2013, at 6:16 PM, "Hefty, Sean" wrote:
>
> > Why can't IB_MTU_1500 = 1500?
>

Sean,

If the IBTA were to release new MTU enumerations which values would you recommend then?

Ira

>
> It certainly could. Additionally, since Roland was a little concerned about the
> "IB" prefix (since 1500 and 9000 are not IBTA-sanctioned MTUs), they could
> have a different prefix -- perhaps RDMA_MTU_1500.
>
> Although I admit that it would be weird to have an enum that contains values
> with different prefixes:
>
> enum ib_mtu {
>         IB_MTU_256    = 1,
>         IB_MTU_512    = 2,
>         IB_MTU_1024   = 3,
>         IB_MTU_2048   = 4,
>         IB_MTU_4096   = 5,
>         RDMA_MTU_1500 = 1500,
>         RDMA_MTU_9000 = 9000
> };
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
On Apr 9, 2013, at 4:10 PM, "Weiny, Ira" wrote:

>> Just to re-state: our issue is that there does not seem to be any other way to
>> get the max UD message size without knowing the actual MTU (are we
>> incorrect about that?). Hence, using the IB-defined values is not really
>> sufficient.
>
> I guess I am confused. Is this patch trying to support RoCE or a VNIC?

Both, actually.

The RoCE driver lies about its MTU (IIRC, it claims IB_MTU_1024, even if the MTU is actually 1500). So AFAIK, there's no way to know what the UD max message size is on RoCE, because the max message size attribute on the port refers to RC, not UD.

--
Jeff Squyres
jsquy...@cisco.com
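The RoCE behaviour Jeff describes can be illustrated by quantizing an Ethernet MTU down to the largest IBTA code that still fits after per-packet headers. This is a sketch under stated assumptions: the 80-byte overhead (GRH 40 + BTH 12 + 28 bytes for an extended transport header) and the `ex_` helpers are hypothetical and not taken from any driver discussed in the thread.

```c
#include <assert.h>

/* IBTA path MTU codes, same values as enum ib_mtu. */
enum ex_ib_mtu {
    EX_IB_MTU_256  = 1,
    EX_IB_MTU_512  = 2,
    EX_IB_MTU_1024 = 3,
    EX_IB_MTU_2048 = 4,
    EX_IB_MTU_4096 = 5
};

/* Assumed per-packet overhead: GRH (40) + BTH (12) + 28 bytes for the
 * largest extended transport header. Illustrative only. */
#define EX_ROCE_HDR_OVERHEAD (40 + 12 + 28)

/* Byte size of an IBTA MTU code: 1 -> 256, ..., 5 -> 4096. */
static int ex_ib_mtu_bytes(int code)
{
    assert(code >= EX_IB_MTU_256 && code <= EX_IB_MTU_4096);
    return 256 << (code - 1);
}

/* Largest IBTA code whose payload fits in eth_mtu after the assumed
 * overhead, or 0 if not even 256 bytes fit. */
static int ex_eth_to_ib_mtu(int eth_mtu)
{
    int room = eth_mtu - EX_ROCE_HDR_OVERHEAD;

    for (int code = EX_IB_MTU_4096; code >= EX_IB_MTU_256; code--)
        if (room >= ex_ib_mtu_bytes(code))
            return code;
    return 0;
}
```

Under these assumptions a 1500-byte Ethernet MTU leaves 1420 bytes of room and maps to IB_MTU_1024, so the remaining 396 bytes are invisible to anyone who only sees the enum, which is the information Jeff needs for the UD max message size.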
Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
On Apr 8, 2013, at 6:16 PM, "Hefty, Sean" wrote:

> Why can't IB_MTU_1500 = 1500?

It certainly could. Additionally, since Roland was a little concerned about the "IB" prefix (since 1500 and 9000 are not IBTA-sanctioned MTUs), they could have a different prefix -- perhaps RDMA_MTU_1500.

Although I admit that it would be weird to have an enum that contains values with different prefixes:

enum ib_mtu {
        IB_MTU_256    = 1,
        IB_MTU_512    = 2,
        IB_MTU_1024   = 3,
        IB_MTU_2048   = 4,
        IB_MTU_4096   = 5,
        RDMA_MTU_1500 = 1500,
        RDMA_MTU_9000 = 9000
};

--
Jeff Squyres
jsquy...@cisco.com
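If a mixed enum like the one above were adopted, consumers would have to decode both kinds of value. A minimal sketch, relying on Roland's observation in this thread that no real MTU is <= 5 bytes, so small values can only be IBTA codes; the enum tag `ib_mtu_mixed` and the helper `mtu_mixed_to_bytes()` are hypothetical names.

```c
#include <assert.h>

/* The mixed enum floated in the thread: IBTA codes plus raw byte counts. */
enum ib_mtu_mixed {
    IB_MTU_256    = 1,
    IB_MTU_512    = 2,
    IB_MTU_1024   = 3,
    IB_MTU_2048   = 4,
    IB_MTU_4096   = 5,
    RDMA_MTU_1500 = 1500,
    RDMA_MTU_9000 = 9000
};

/* Convert either kind of value to a size in bytes. Values 1..5 are treated
 * as IBTA codes (256 << (code - 1)); anything larger is a byte count. */
static int mtu_mixed_to_bytes(int v)
{
    assert(v > 0);
    if (v <= IB_MTU_4096)
        return 256 << (v - 1);
    return v;
}
```

The same decoder would keep working under Sean's variant (reserve the 6-bit range for IBTA codes) only as long as no new IBTA code is added above 5; a future code such as 6 would need its own mapping.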
Re: [PATCH V4 for-next 1/5] IB/core: Add RSS and TSS QP groups
> This patch introduces the concept of RSS and TSS QP groups which
> allows for implementing them by low level drivers and using it
> by IPoIB and later also by user space ULPs.
>
> A QP group is a set of QPs consisting of a parent QP and two disjoint sets
> of RSS and TSS QPs. The creation of a QP group is a two stage process:
>
> In the 1st stage, the parent QP is created.
>
> In the 2nd stage the children QPs of the parent are created.
>
> Each child QP indicates if it's an RSS or TSS QP. Both the TSS
> and RSS sets of QPs should have contiguous QP numbers.
>
> It is forbidden to modify parent QP state before all RSS/TSS children
> were created. In the same manner it is disallowed to destroy the parent
> QP unless all RSS/TSS children were destroyed.
>
> A few new elements/concepts are introduced to support this:
>
> Three new device capabilities that can be set by the low level driver:
>
> - IB_DEVICE_QPG which is set to indicate QP groups are supported.
>
> - IB_DEVICE_UD_RSS which is set to indicate that the device supports
> RSS, that is applying hash function on incoming TCP/UDP/IP packets and
> dispatching them to multiple "rings" (child QPs).
>
> - IB_DEVICE_UD_TSS which is set to indicate that the device supports
> "HW TSS" which means that the HW is capable of over-riding the source
> UD QPN present in sent IB datagram header (DTH) with the parent's QPN.
>
> Low level drivers not supporting HW TSS could still support QP groups; such
> a combination is referred to as "SW TSS". In this case, the low level
> driver fills in the qpg_tss_mask_sz field of struct ib_qp_cap returned from
> ib_create_qp, so that this mask can be used to retrieve the parent QPN from
> incoming packets carrying a child QPN (as of the contiguous QP numbers
> requirement).
>
> - max rss table size device attribute, which is the maximal size of the RSS
> indirection table supported by the device
>
> - qp group type attribute for qp creation saying whether this is a parent QP
> or rx/tx (rss/tss) child QP or none of the above for non rss/tss QPs.
>
> - per qp group type, another attribute is added, for parent QPs, the number
> of rx/tx child QPs and for child QPs pointer to the parent.
>
> - IB_QP_GROUP_RSS attribute mask, which should be used when modifying
> the parent QP state from reset to init

On Tue, Apr 9, 2013 at 8:06 PM, Hefty, Sean wrote:
> I have no issue with RSS/TSS. But the 'qp group' interface to using this
> seems kludgy.

Let's try to be more specific.

> On a node, this is multiple send/receive queues grouped together to form a larger
> construct. On the wire, this is a single QP - maybe? I'm still not clear on
> that. From what's written, all the send queues appear as a single QPN. The receive
> queues appear as different QPNs.

Starting with RSS QP groups: it's a group made of one parent QP and N RSS child QPs.

On the wire everything is sent to the RSS parent QP; however, when the HW receives a packet for which this QP/QPN is the destination, it applies a hash function on the packet header and, subject to the hash result, dispatches the packet to one of the N child QPs.

The design applies to IB UD QPs and Raw Ethernet Packet QP types. Under IB, the QPN of the parent is on the wire. Under Eth, there are no QPNs on the wire, but the HW has some "steering rule" which causes certain packets to be steered to that RSS parent, and the RSS parent in turn makes a further dispatching decision (hashing) to determine which of the child RSS QPs will actually receive that packet.

With IPoIB, the remote side is provided with the RSS parent QPN as part of the IPoIB HW address provided in the ARP reply payload, so packets are sent to that QPN.
With RAW Packet Eth QPs, the remote side isn't aware of QPNs at all; everything goes through a steering rule that directs packets to the RSS parent. You can send packets over the RSS packet QP but not receive packets.

So for RSS, the remote side isn't aware of that QP group at all. Makes sense?

As for TSS QP groups, basically and generally speaking, the only cases that really matter are applications/drivers that care about the source QPN of a packet. But let's get there after hopefully agreeing on what an RSS QP group is.

Or.

>
> Signed-off-by: Shlomo Pongratz
> ---
>  drivers/infiniband/core/uverbs_cmd.c         |   1 +
>  drivers/infiniband/core/verbs.c              | 118 ++
>  drivers/infiniband/hw/amso1100/c2_provider.c |   3 +
>  drivers/infiniband/hw/cxgb3/iwch_provider.c  |   2 +
>  drivers/infiniband/hw/cxgb4/qp.c             |   3 +
>  drivers/infiniband/hw/ehca/ehca_qp.c         |   3 +
>  drivers/infiniband/hw/ipath/ipath_qp.c       |   3 +
>  drivers/infiniband/hw/mlx4/qp.c              |   3 +
>  drivers/infiniband/hw/mthca/mthca_provider.c |   3 +
>  drivers/infiniband/hw/nes/nes_verbs.c        |   3 +
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c  |   5 +
>  drivers/infiniband/hw/qib/qib_qp.c           |   5 +
>  include/r
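A rough model of the two mechanisms described in this message and the patch text: RSS picks one of N contiguous child QPNs from a packet hash, and SW TSS recovers the parent QPN from a child QPN via `qpg_tss_mask_sz`. This sketch assumes a plain modulo in place of whatever hash and indirection table the hardware uses, and assumes the group occupies a block of QPNs aligned to a power of two with the parent at its base; both helpers are hypothetical and not part of the proposed verbs API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RSS dispatch: map a flow hash onto one of n contiguous
 * child QPNs. A plain modulo stands in for the hardware's hash and
 * indirection table. */
static uint32_t ex_rss_child_qpn(uint32_t first_child_qpn, uint32_t n_children,
                                 uint32_t flow_hash)
{
    assert(n_children > 0);
    return first_child_qpn + flow_hash % n_children;
}

/* Hypothetical SW TSS: recover the parent QPN from a child QPN by clearing
 * the low mask_sz bits. Assumes an aligned block with the parent at its base. */
static uint32_t ex_tss_parent_qpn(uint32_t child_qpn, unsigned int mask_sz)
{
    assert(mask_sz < 24);   /* QPNs are 24-bit values */
    return child_qpn & ~((UINT32_C(1) << mask_sz) - 1);
}
```

Under these assumptions the remote side only ever needs the parent QPN: incoming traffic addressed to it is spread over the children, and a receiver seeing a child QPN as the source can map it back to the parent with a single mask.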
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Tue, Apr 09, 2013 at 11:34:09PM +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 05, 2013 at 04:17:36PM -0400, Michael R. Hines wrote:
> > The userland part of the patch was missing (IBV_ACCESS_GIFT).
> >
> > I added that flag to /usr/include in addition to this patch and did
> > a test RDMA migrate and it seems to work without any problems.
> >
> > I also removed the IBV_*_WRITE flags on the sender-side and
> > activated cgroups with the "memory.memsw.limit_in_bytes" activated
> > and the migration with RDMA also succeeded without any problems
> > (both with *and* without GIFT also worked).
> >
> > Any additional tests you would like?
> >
> >
> > - Michael
>
> RDMA can't really work with swap so not sure how that's relevant.
>
> Please check memory.usage_in_bytes - is it lower with
> the GIFT flag? I think this is what we really care about.

Oh, and there's no reason to set memsw.limit_in_bytes, I think.

> --
> MST
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Fri, Apr 05, 2013 at 04:17:36PM -0400, Michael R. Hines wrote:
> The userland part of the patch was missing (IBV_ACCESS_GIFT).
>
> I added that flag to /usr/include in addition to this patch and did
> a test RDMA migrate and it seems to work without any problems.
>
> I also removed the IBV_*_WRITE flags on the sender-side and
> activated cgroups with the "memory.memsw.limit_in_bytes" activated
> and the migration with RDMA also succeeded without any problems
> (both with *and* without GIFT also worked).
>
> Any additional tests you would like?
>
>
> - Michael

RDMA can't really work with swap so not sure how that's relevant.

Please check memory.usage_in_bytes - is it lower with the GIFT flag? I think this is what we really care about.

--
MST
Re: [PATCH V3 for-next 0/5] IB/IPoIB: Add multi-queue TSS and RSS support
On Tue, Apr 9, 2013 at 8:06 PM, Hefty, Sean wrote:
> I have no issue with RSS/TSS. But the 'qp group' interface to using this
> seems kludgy.

OK, so let's take it over to the patch that has the QP group description.

> On a node, this is multiple send/receive queues grouped together to form a larger
> construct. On the wire, this is a single QP - maybe? I'm still not clear on
> that. From what's written, all the send queues appear as a single QPN. The receive
> queues appear as different QPNs.
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Fri, Apr 05, 2013 at 01:43:49PM -0700, Roland Dreier wrote:
> On Fri, Apr 5, 2013 at 1:17 PM, Michael R. Hines
> wrote:
> > I also removed the IBV_*_WRITE flags on the sender-side and activated
> > cgroups with the "memory.memsw.limit_in_bytes" activated and the migration
> > with RDMA also succeeded without any problems (both with *and* without GIFT
> > also worked).
>
> Not sure I'm interpreting this correctly. Are you saying that things
> worked without actually setting the GIFT flag? In which case why are
> we adding this flag?
>
> - R.

We are adding the flag to reduce memory usage when there are lots of COW pages.

There's no guarantee there will be COW pages, so I expect things to work both with and without breaking COW, just using much more memory when we break COW.

--
MST
RE: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> Subject: Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
>
> On Apr 4, 2013, at 1:57 PM, "Weiny, Ira" wrote:
>
> >> In hindsight, the user space API never should have exposed the mtu as
> >> an enum...
> >>
> >> Since an enum is an int, and we're never going to have anything with
> >> an mtu <= 5 bytes, couldn't we just store all new mtu values directly
> >> as their byte value?
> >
> > That seems like a pretty good idea.
>
>
> Agreed, but changing to an int would seem to have some fairly serious
> backwards compatibility issues.
>
> What is the right way to move forward here?
>
> Just to re-state: our issue is that there does not seem to be any other way to
> get the max UD message size without knowing the actual MTU (are we
> incorrect about that?). Hence, using the IB-defined values is not really
> sufficient.

I guess I am confused. Is this patch trying to support RoCE or a VNIC?

Ira
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Fri, Apr 05, 2013 at 02:03:33PM -0700, Roland Dreier wrote:
> On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines
> wrote:
> > Sorry, I was wrong. Ignore the comments about cgroups. That's still broken.
> > (i.e. trying to register RDMA memory while using a cgroup swap limit causes
> > the process to get killed).
> >
> > But the GIFT flag patch works (my understanding is that GIFT flag allows the
> > adapter to transmit stale memory information, it does not have anything to
> > do with cgroups specifically).
>
> The point of the GIFT patch is to avoid triggering copy-on-write so
> that memory doesn't blow up during migration. If that doesn't work
> then there's no point to the patch.
>
> - R.

Absolutely.

Checking whether an OOM gets triggered looks like a heavy-handed approach to testing the feature, though. It's relevant, but there could be many other reasons for it to trigger. See Documentation/cgroups/memory.txt, section "Troubleshooting".

It's easier to just check whether this patch reduces the memory consumption; that's the point really.

--
MST
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
Presumably is_dup_page reads the page, so should not break COW ...

I'm not sure about the cgroups swap limit - you might have too many non COW pages so attempting to fault them all in makes you exceed the limit. You really should look at what is going on in the pagemap, to see if there's measurable gain from the patch.

On Fri, Apr 05, 2013 at 05:32:30PM -0400, Michael R. Hines wrote:
> Well, I have the "is_dup_page()" commented out...when RDMA is
> activated.
>
> Is there something else in QEMU that could be touching the page that
> I don't know about?
>
> - Michael
>
>
> On 04/05/2013 05:03 PM, Roland Dreier wrote:
> >On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines
> > wrote:
> >>Sorry, I was wrong. Ignore the comments about cgroups. That's still broken.
> >>(i.e. trying to register RDMA memory while using a cgroup swap limit causes
> >>the process to get killed).
> >>
> >>But the GIFT flag patch works (my understanding is that GIFT flag allows the
> >>adapter to transmit stale memory information, it does not have anything to
> >>do with cgroups specifically).
> >The point of the GIFT patch is to avoid triggering copy-on-write so
> >that memory doesn't blow up during migration. If that doesn't work
> >then there's no point to the patch.
> >
> > - R.
>
>
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Tue, Apr 09, 2013 at 01:56:00PM -0400, Michael R. Hines wrote:
> On 04/09/2013 12:39 PM, Michael S. Tsirkin wrote:
> >On Fri, Apr 05, 2013 at 04:54:39PM -0400, Michael R. Hines wrote:
> >>To be more specific, here's what I did:
> >>
> >>1. apply kernel module patch - re-insert module
> >>2. QEMU does: ibv_reg_mr(IBV_ACCESS_GIFT | IBV_ACCESS_REMOTE_READ)
> >>3. Start the RDMA migration
> >>4. Migration completes without any errors
> >>
> >>This test does *not* work with a cgroup swap limit, however. The
> >>process gets killed. (Both with and without GIFT)
> >>
> >>- Michael
> >Try to attach a debugger and see where it is when it gets killed?
> >
>
> It's killed by cgroups - not a CPU exception.
>
> The same test works fine using TCP migration with cgroups -
> everything is fine there.
>
> The memory that RDMA attempted to register hits some kind of cgroups policy
> which results in a kernel message saying that the cgroup swap limit was hit
> and then it goes ahead and kills the process altogether.
>
> It's not a QEMU problem - it seems to be a kernel bug.

Maybe cgroup swap limit really is buggy. That's interesting, but not really related to this patch. What's interesting is whether we save lots of memory by using this patch.

Couldn't you dump the pagemap for the qemu process and calculate real memory usage before and after applying the patch?

--
MST
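The pagemap measurement suggested above can be sketched in a few lines of C. This is a minimal sketch, assuming the pagemap layout documented by the kernel (one 64-bit entry per virtual page, bit 63 = present in RAM, bit 62 = swapped); the helper names `count_present()` and `read_pagemap()` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag bits in each 64-bit /proc/<pid>/pagemap entry. */
#define PM_PRESENT (UINT64_C(1) << 63)   /* page is resident in RAM */
#define PM_SWAPPED (UINT64_C(1) << 62)   /* page is in swap */

/* Count resident pages among n pagemap entries. */
static size_t count_present(const uint64_t *entries, size_t n)
{
    size_t present = 0;

    for (size_t i = 0; i < n; i++)
        if (entries[i] & PM_PRESENT)
            present++;
    return present;
}

/* Read the pagemap entries covering npages pages starting at virtual
 * address start in process pid. Returns entries read, or -1 on error. */
static long read_pagemap(int pid, uintptr_t start, size_t npages, uint64_t *out)
{
    char path[64];
    long pagesz = sysconf(_SC_PAGESIZE);

    if (pagesz <= 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%d/pagemap", pid);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)(start / (uintptr_t)pagesz) * (off_t)sizeof(uint64_t);
    ssize_t got = pread(fd, out, npages * sizeof(uint64_t), off);
    close(fd);
    return got < 0 ? -1 : (long)(got / (ssize_t)sizeof(uint64_t));
}
```

Summing `count_present()` over the guest RAM mapping of the QEMU process (its address range can be read from `/proc/<pid>/maps`) before and after migration gives a resident-page count to compare with and without GIFT. Telling shared COW pages apart from private copies would additionally need page frame numbers, which newer kernels may hide from unprivileged readers.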
Re: [RFC/PATCH v3] IPoIB: Leave space in skb linear buffer for IP headers
>
>-IPOIB_UD_HEAD_SIZE = IB_GRH_BYTES + IPOIB_ENCAP_LEN,
>+/* add 128 bytes of tailroom for IP/TCP headers */
>+IPOIB_UD_HEAD_SIZE = IB_GRH_BYTES + IPOIB_ENCAP_LEN + 128,

Hello,

version 3 of the patch finally works. I can see the performance gains but I cannot feel them (in real life). Here are the results of my testbed:

Test 1: netperf/netserver, message size 16K

kernel 3.5 default      :  5.1 GBit/s
kernel 3.5 + patch v3   :  7.7 GBit/s
kernel 3.5 + max MTU 3K : 10.8 GBit/s

Test 2: Disk write performance
VM with disk mounted on IB async NFS server

block size | default  | patch v3 | max MTU 3K
-----------+----------+----------+-----------
      1 KB |  10 MB/s |  10 MB/s |  10 MB/s
      2 KB |  20 MB/s |  21 MB/s |  20 MB/s
      4 KB |  40 MB/s |  40 MB/s |  43 MB/s
      8 KB |  68 MB/s |  70 MB/s |  78 MB/s
     16 KB | 105 MB/s | 105 MB/s | 120 MB/s
     32 KB | 150 MB/s | 150 MB/s | 170 MB/s
     64 KB | 200 MB/s | 210 MB/s | 260 MB/s
    128 KB | 270 MB/s | 290 MB/s | 400 MB/s
    256 KB | 300 MB/s | 310 MB/s | 430 MB/s
    512 KB | 305 MB/s | 320 MB/s | 470 MB/s
   1024 KB | 310 MB/s | 325 MB/s | 500 MB/s
   2048 KB | 310 MB/s | 325 MB/s | 510 MB/s
   4096 KB | 370 MB/s | 325 MB/s | 510 MB/s
   8192 KB | 400 MB/s | 325 MB/s | 520 MB/s

As you can see, netperf throughput increases while NFS does not even care about the optimizations. Maybe it does not work well with fragmented SKBs. The max MTU 3K values once again are forced through a hack inside ipoib_main.c.

Out of curiosity I changed the block splitting in your v3 patch from small head with large fragment to large head with small fragment in this line:

    IPOIB_UD_HEAD_SIZE = IB_GRH_BYTES + IPOIB_ENCAP_LEN + 3072

In my 2044 MTU case this brings the netperf & NFS throughput to the same levels as the dirty hack. Of course this no longer reflects a head but is more or less equivalent to a new constant such as IPOIB_UD_FIXED_SKB_SIZE. I guess 4K MTU will not see any further gains, but avoiding the skb_pull calls should improve speed as well. Maybe a final adaptation could put the cherry on the cake.
Markus
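The effect of the two head sizes Markus compares can be checked with simple arithmetic. The sketch below uses IB_GRH_BYTES = 40 and IPOIB_ENCAP_LEN = 4, and assumes a simplified model in which a full-sized UD receive fills the linear head first and spills the remainder into a page fragment; `ud_rx_split()` is a hypothetical helper, not code from the patch.

```c
#include <assert.h>

#define IB_GRH_BYTES    40  /* GRH space at the start of every UD receive buffer */
#define IPOIB_ENCAP_LEN  4  /* IPoIB encapsulation header */

/* Simplified model: split a full-sized UD receive (GRH + IPoIB header +
 * ipoib_mtu bytes of payload) into the part that fits in a linear head of
 * head_size bytes and the remainder that would go into a page fragment. */
static void ud_rx_split(int ipoib_mtu, int head_size, int *head, int *frag)
{
    assert(ipoib_mtu > 0 && head_size > 0);
    int total = IB_GRH_BYTES + IPOIB_ENCAP_LEN + ipoib_mtu;

    *head = total < head_size ? total : head_size;
    *frag = total - *head;
}
```

For the 2044-byte IPoIB MTU used in the tests, the v3 head (40 + 4 + 128 = 172 bytes) leaves 1916 bytes in the fragment, while the +3072 variant holds the whole 2088-byte receive in the head. That is consistent with the observation that the change behaves like a fixed-size linear skb rather than a small head.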
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On 04/09/2013 12:39 PM, Michael S. Tsirkin wrote:
> On Fri, Apr 05, 2013 at 04:54:39PM -0400, Michael R. Hines wrote:
>> To be more specific, here's what I did:
>>
>> 1. apply kernel module patch - re-insert module
>> 2. QEMU does: ibv_reg_mr(IBV_ACCESS_GIFT | IBV_ACCESS_REMOTE_READ)
>> 3. Start the RDMA migration
>> 4. Migration completes without any errors
>>
>> This test does *not* work with a cgroup swap limit, however. The process gets killed. (Both with and without GIFT)
>>
>> - Michael
> Try to attach a debugger and see where it is when it gets killed?

It's killed by cgroups - not a CPU exception.

The same test works fine using TCP migration with cgroups - everything is fine there.

The memory that RDMA attempted to register hits some kind of cgroups policy which results in a kernel message saying that the cgroup swap limit was hit, and then it goes ahead and kills the process altogether.

It's not a QEMU problem - it seems to be a kernel bug.
Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag
On Fri, Apr 05, 2013 at 04:54:39PM -0400, Michael R. Hines wrote:
> To be more specific, here's what I did:
>
> 1. apply kernel module patch - re-insert module
> 2. QEMU does: ibv_reg_mr(IBV_ACCESS_GIFT | IBV_ACCESS_REMOTE_READ)
> 3. Start the RDMA migration
> 4. Migration completes without any errors
>
> This test does *not* work with a cgroup swap limit, however. The
> process gets killed. (Both with and without GIFT)
>
> - Michael

Try to attach a debugger and see where it is when it gets killed?

> On 04/05/2013 04:43 PM, Roland Dreier wrote:
> >On Fri, Apr 5, 2013 at 1:17 PM, Michael R. Hines
> > wrote:
> >>I also removed the IBV_*_WRITE flags on the sender-side and activated
> >>cgroups with the "memory.memsw.limit_in_bytes" activated and the migration
> >>with RDMA also succeeded without any problems (both with *and* without GIFT
> >>also worked).
> >Not sure I'm interpreting this correctly. Are you saying that things
> >worked without actually setting the GIFT flag? In which case why are
> >we adding this flag?
> >
> > - R.
>
>
RE: [PATCH V3 for-next 0/5] IB/IPoIB: Add multi-queue TSS and RSS support
> any feedback? I have no issue with RSS/TSS. But the 'qp group' interface for using it seems kludgy. On a node, this is a set of send/receive queues grouped together to form a larger construct. On the wire, is this a single QP? Maybe; I'm still not clear on that. From what's written, all the send queues appear as a single QPN, while the receive queues appear as different QPNs.
Re: [RFC/PATCH v3] IPoIB: Leave space in skb linear buffer for IP headers
On Tue, Apr 9, 2013 at 6:13 AM, Luick, Dean wrote: > Can you go through the "else" of the first if (page is NULL), then enter the > second if? If so, isn't the page lost? Thanks, good catch. I'll fix that up.
Re: tune ib stack
On 09.04.2013 16:23, Hal Rosenstock wrote: >> So these values are exactly the same as in "ibv_devinfo" and can be set >> in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu. >> >> I've found the PortInfo with the command >> "smpquery portinfo -C mlx4_0 3 1" >> where I'm using the first HCA to contact the SM. I tell the SM the >> destination LID ('3' here in my case) and the destination port ('1'). >> >> Is there another method to set the max MTU? > > That doesn't set max MTU (MTUCap) but merely reads it (for that port). Sorry, copy and paste error. I meant the mlx4 file: /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu But you've answered that with "vendor specific". Thanks for the valuable information! What interests us most is whether the MTU can be changed live without any service disruption. It looks like the mlx4 driver can't provide that. Perhaps switches can. Cheers, Sebastian
Re: tune ib stack
On 4/9/2013 9:56 AM, Sebastian Riemer wrote: > On 09.04.2013 15:34, Hal Rosenstock wrote: >> On 4/9/2013 9:16 AM, Sebastian Riemer wrote: >>> On 09.04.2013 14:49, Hal Rosenstock wrote: On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote: > Hello. I have some servers, with mellanox ConnectX-3 and have some > questions: > Why max_mtu differs with active_mtu? What does peer port say for max MTU ? > How can i set active mtu? SM sets active MTU to min of peer ports max MTUs. >>> >>> So with "peer port max MTU" do you mean this file?: >>> >>> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu >> >> I meant NeighborMTU from PortInfo as active MTU and MTUCap there is >> supported MTU. > > So these values are exactly the same as in "ibv_devinfo" and can be set > in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu. > > I've found the PortInfo with the command > "smpquery portinfo -C mlx4_0 3 1" > where I'm using the first HCA to contact the SM. I tell the SM the > destination LID ('3' here in my case) and the destination port ('1'). > > Is there another method to set the max MTU? That doesn't set max MTU (MTUCap) but merely reads it (for that port). > I know that switches can also set the max MTU for their switch ports > where most of them use 2048 as default. You would need to contact your CA and/or switch vendor(s) (see below). > How to change these switch port MTUs for unmanaged switches? > > On managed switches this can be done over the web front-end. Yes. MTUCap is read-only as far as the SM is concerned, so the only ways to change it are "out of band" mechanisms, which are vendor specific, such as a web front end. -- Hal > Cheers, > Sebastian
Re: [PATCH V3 for-next 0/5] IB/IPoIB: Add multi-queue TSS and RSS support
On 03/04/2013 23:12, Hefty, Sean wrote: Hi Sean, Ping. You had concerns about the suggested concept; we want to know whether we addressed them. Can you comment? I'm in meetings this week until tomorrow. I'll try to take a look at the updated patches then or Friday. Any feedback?
Re: tune ib stack
On 09.04.2013 15:34, Hal Rosenstock wrote: > On 4/9/2013 9:16 AM, Sebastian Riemer wrote: >> On 09.04.2013 14:49, Hal Rosenstock wrote: >>> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote: Hello. I have some servers, with mellanox ConnectX-3 and have some questions: Why max_mtu differs with active_mtu? >>> >>> What does peer port say for max MTU ? >>> How can i set active mtu? >>> >>> SM sets active MTU to min of peer ports max MTUs. >> >> So with "peer port max MTU" do you mean this file?: >> >> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu > > I meant NeighborMTU from PortInfo as active MTU and MTUCap there is > supported MTU. So these values are exactly the same as in "ibv_devinfo" and can be set in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu. I've found the PortInfo with the command "smpquery portinfo -C mlx4_0 3 1" where I'm using the first HCA to contact the SM. I tell the SM the destination LID ('3' here in my case) and the destination port ('1'). Is there another method to set the max MTU? I know that switches can also set the max MTU for their switch ports where most of them use 2048 as default. How to change these switch port MTUs for unmanaged switches? On managed switches this can be done over the web front-end. Cheers, Sebastian
Re: tune ib stack
On 4/9/2013 9:16 AM, Sebastian Riemer wrote: > On 09.04.2013 14:49, Hal Rosenstock wrote: >> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote: >>> Hello. I have some servers, with mellanox ConnectX-3 and have some >>> questions: >>> Why max_mtu differs with active_mtu? >> >> What does peer port say for max MTU ? >> >>> How can i set active mtu? >> >> SM sets active MTU to min of peer ports max MTUs. > > So with "peer port max MTU" do you mean this file?: > > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu I meant NeighborMTU in PortInfo, which is the active MTU, and MTUCap there, which is the supported MTU. -- Hal
Re: tune ib stack
On 09.04.2013 14:49, Hal Rosenstock wrote: > On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote: >> Hello. I have some servers, with mellanox ConnectX-3 and have some questions: >> Why max_mtu differs with active_mtu? > > What does peer port say for max MTU ? > >> How can i set active mtu? > > SM sets active MTU to min of peer ports max MTUs. So with "peer port max MTU" do you mean this file?: /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu I've seen that it can be set as well. I've got two ConnectX-2 machines connected back-to-back. Normally both have 4K max and active MTU. So let's try something:

    Host1:
    $ echo 2048 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
    # Port is not active, let's reactivate it.
    $ echo 1 > /sys/class/infiniband/mlx4_0/device/enable

    ibv_devinfo
    Host1: max_mtu: 2048 (4)  active_mtu: 2048 (4)
    Host2: max_mtu: 4096 (5)  active_mtu: 2048 (4)

Both showed "4096 (5)" everywhere before. So is that the recommended way to reduce the MTU? I've heard that reducing the MTU in a fabric can help with congestion issues. As congestion control doesn't work yet, could this help against congestion? Cheers, Sebastian
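The numbers in parentheses in the ibv_devinfo output are IB MTU codes (4 = 2048 bytes, 5 = 4096 bytes), and the back-to-back result matches the quoted rule that the SM sets the active MTU to the minimum of the two peers' max MTUs. A minimal sketch of that arithmetic, assuming codes 1 through 5 map to 256 through 4096 bytes (as in libibverbs' enum ibv_mtu); the sketch_* names are hypothetical.

```c
#include <assert.h>

/* IB MTU codes as printed by ibv_devinfo, e.g. "2048 (4)". */
enum sketch_ib_mtu {
    SKETCH_MTU_256  = 1,
    SKETCH_MTU_512  = 2,
    SKETCH_MTU_1024 = 3,
    SKETCH_MTU_2048 = 4,
    SKETCH_MTU_4096 = 5,
};

/* Convert an MTU code to bytes: code 1 is 256 bytes and each step
 * doubles the size, so bytes = 128 << code. */
static unsigned int sketch_mtu_bytes(enum sketch_ib_mtu code)
{
    assert(code >= SKETCH_MTU_256 && code <= SKETCH_MTU_4096);
    return 128u << code;
}

/* Active MTU of a link: the smaller of the two ports' max MTUs. */
static enum sketch_ib_mtu sketch_active_mtu(enum sketch_ib_mtu local_max,
                                            enum sketch_ib_mtu peer_max)
{
    return local_max < peer_max ? local_max : peer_max;
}
```

With Host1 lowered to code 4 and Host2 left at code 5, sketch_active_mtu yields code 4, i.e. 2048 bytes on both ends, which is what ibv_devinfo reported.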
RE: [RFC/PATCH v3] IPoIB: Leave space in skb linear buffer for IP headers
> From: Roland Dreier
> +    if (wc->byte_len > IPOIB_UD_HEAD_SIZE) {
> +        page = priv->rx_ring[wr_id].page;
> +        priv->rx_ring[wr_id].page = NULL;
> +    } else {
> +        page = NULL;
> +    }
> +
>      /*
>       * If we can't allocate a new RX buffer, dump
>       * this packet and reuse the old buffer.
>       */
>      if (unlikely(!ipoib_alloc_rx_skb(dev, wr_id))) {
>          ++dev->stats.rx_dropped;
> +        priv->rx_ring[wr_id].page = page;
>          goto repost;
>      }

Can you go through the "else" of the first if (page is NULL), then enter the second if? If so, isn't the page lost?

Dean
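One way to close the hole Dean describes is to restore the ring slot only when a page was actually detached; in the small-packet case the slot still owns its page and must not be overwritten with NULL. The sketch below models only that control flow in user space. sketch_rx_slot, sketch_handle_rx, SKETCH_HEAD_SIZE (standing in for IPOIB_UD_HEAD_SIZE), and the alloc_ok parameter (standing in for the result of ipoib_alloc_rx_skb()) are hypothetical, and this is not necessarily the fix that went into the next revision of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IPOIB_UD_HEAD_SIZE. */
#define SKETCH_HEAD_SIZE 128u

struct sketch_rx_slot {
    void *page;   /* payload page owned by this ring slot, or NULL */
};

/*
 * Models the receive path from the quoted patch. Returns the page handed
 * up with the packet (NULL for small or dropped packets).
 * alloc_ok stands in for the result of ipoib_alloc_rx_skb().
 */
static void *sketch_handle_rx(struct sketch_rx_slot *slot,
                              unsigned int byte_len, bool alloc_ok,
                              unsigned long *rx_dropped)
{
    void *page = NULL;

    assert(slot != NULL && rx_dropped != NULL);

    if (byte_len > SKETCH_HEAD_SIZE) {
        /* Large packet: detach the payload page from the slot. */
        page = slot->page;
        slot->page = NULL;
    }

    if (!alloc_ok) {
        ++*rx_dropped;
        /* Put back only a page we detached above; for a small packet
         * the slot still owns its page, and writing NULL would leak it. */
        if (page)
            slot->page = page;
        return NULL;      /* packet dropped, old buffer reused */
    }

    return page;          /* detached page now travels with the packet */
}
```

The guard `if (page)` is the whole fix in this model; an equivalent alternative would be to detach the page only after the allocation has succeeded.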
Re: tune ib stack
On 4/9/2013 8:15 AM, Sebastian Riemer wrote: > On 09.04.2013 13:51, Vasiliy Tolstov wrote: >>> Something like this: >>> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu >> >> After doing this all srp connections down and port is down. I need to >> restart openibd > > Sorry for that! It's much easier to set the IP MTU. Managed switches > support setting the RDMA MTU. So it could be possible that it is a > setting in the SM config. But I'm not sure. IP MTU is different from link MTU. In UD mode, the IPoIB MTU is the link MTU minus 4 (the IPoIB encapsulation header). In RC (connected) mode it can be much larger than the link MTU, because the HCA does the segmentation/reassembly down to the path MTU. > $ man opensm > says that it can be set in the partitions.conf Yes, the MTU for the IPoIB interface is set in the partition file. It would need to be configured for the larger (4K) MTU, assuming all ports support the 4K MTU. If not, some ports won't be able to join the IPoIB broadcast (or other) IB multicast groups, and IPoIB won't work. -- Hal
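As a worked example of the UD-mode rule above, the IPoIB interface MTU is the link MTU minus the 4-byte IPoIB encapsulation header, so a 2048-byte link MTU gives 2044 and a 4096-byte link MTU gives 4092. The helper name below is hypothetical. Connected (RC) mode is deliberately not modeled, since there the limit is set by HCA segmentation rather than by the link MTU.

```c
#include <assert.h>

/* Size of the IPoIB encapsulation header prepended to each datagram. */
#define SKETCH_IPOIB_ENCAP_LEN 4u

/* UD-mode IPoIB MTU for a given IB link MTU in bytes. */
static unsigned int sketch_ipoib_ud_mtu(unsigned int link_mtu)
{
    assert(link_mtu > SKETCH_IPOIB_ENCAP_LEN);
    return link_mtu - SKETCH_IPOIB_ENCAP_LEN;
}
```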
Re: tune ib stack
On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote: > Hello. I have some servers, with mellanox ConnectX-3 and have some questions: > Why max_mtu differs with active_mtu? What does the peer port say for max MTU? > How can i set active mtu? The SM sets the active MTU to the minimum of the peer ports' max MTUs. > Why ibstatus says that i have only 10 Gb/s ? Because the link negotiated at 10 Gb/s. > All cables support 40 Gb/s. Do the ports also support 40 Gb/s? What do the peer ports say for supported and enabled link speeds? -- Hal > Thanks for any help. > > Linux xen28 3.8.6-1-xen #1 SMP Fri Apr 5 18:48:02 UTC 2013 (713918b) > x86_64 x86_64 x86_64 GNU/Linux > > Infiniband device 'mlx4_0' port 1 status: > default gid: fe80::::0025:90ff:ff17:9b25 > base lid:0x34 > sm lid: 0x4 > state: 4: ACTIVE > phys state: 5: LinkUp > rate:10 Gb/sec (4X) > link_layer: InfiniBand > > > 06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3] > hca_id: mlx4_0 > transport: InfiniBand (0) > fw_ver: 2.10.700 > node_guid: 0025:90ff:ff17:9b24 > sys_image_guid: 0025:90ff:ff17:9b27 > vendor_id: 0x02c9 > vendor_part_id: 4099 > hw_ver: 0x0 > board_id: SM_218101000 > phys_port_cnt: 1 > port: 1 > state: PORT_ACTIVE (4) > max_mtu:4096 (5) > active_mtu: 2048 (4) > sm_lid: 4 > port_lid: 52 > port_lmc: 0x00 > link_layer: IB > > > -- > Vasiliy Tolstov, > e-mail: v.tols...@selfip.ru > jabber: v...@selfip.ru
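The "10 Gb/sec (4X)" figure can be decoded as link width times per-lane signaling rate: 4 lanes at SDR (2.5 Gb/s per lane) give 10 Gb/s, while 4 lanes at QDR (10 Gb/s per lane) give the 40 Gb/s the cables are rated for. So the port appears to have come up 4X at SDR speed. A sketch of that arithmetic, limited to SDR/DDR/QDR signaling rates; the sketch_* names are hypothetical.

```c
#include <assert.h>

/* Per-lane signaling rates for the first IB speed generations. */
enum sketch_lane_speed { SKETCH_SDR, SKETCH_DDR, SKETCH_QDR };

static double sketch_lane_gbps(enum sketch_lane_speed speed)
{
    switch (speed) {
    case SKETCH_SDR: return 2.5;
    case SKETCH_DDR: return 5.0;
    case SKETCH_QDR: return 10.0;
    }
    assert(!"unknown lane speed");
    return 0.0;
}

/* Link signaling rate: number of lanes (1X, 4X, 8X, 12X) times the
 * per-lane rate. */
static double sketch_link_gbps(unsigned int width, enum sketch_lane_speed speed)
{
    assert(width == 1 || width == 4 || width == 8 || width == 12);
    return width * sketch_lane_gbps(speed);
}
```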
Re: tune ib stack
On 09.04.2013 13:51, Vasiliy Tolstov wrote: >> Something like this: >> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu > > After doing this all srp connections down and port is down. I need to > restart openibd Sorry for that! It's much easier to set the IP MTU. Managed switches support setting the RDMA MTU. So it could be possible that it is a setting in the SM config. But I'm not sure. $ man opensm says that it can be set in the partitions.conf >> You should see "40 Gb/sec (4X QDR)" here. Perhaps the OFED is too old so >> that FDR and ConnectX 3 aren't supported, yet. "10 Gb/sec (4X)" seems to >> be the default case if a rate isn't supported. > > Yes, in older card with ConnecX i see this, but in case of ConnectX-3 only 10 Gb The kernel version is okay; it depends on the user space. There is a support note in OFED 3.5: - ConnectX-3 (fw-ConnectX3 Rev 2.11.0500) (FDR and FDR10 Modes are Supported) Before OFED 3.5, these HCAs aren't supported. A look at the related source code might be worthwhile. Cheers, Sebastian
Re: tune ib stack
2013/4/9 Sebastian Riemer : > Because 2048 is the default and 4096 is the max. supported MTU by the > hardware. > >> How can i set active mtu? > > Something like this: > echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu After doing this, all SRP connections went down and the port is down. I need to restart openibd. 06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3] Subsystem: Mellanox Technologies Device 0017 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- > Could be a bug. Which OFED/Kernel (if using in-tree IB modules) do you use? > Mine says with ConnectX2 QDR: "40 Gb/sec (4X QDR)" I'm using a stock 3.8.6 kernel with Xen patches on top, and the IB modules provided with the kernel (only ib_srp comes from Bart's GitHub repo). > You should see "40 Gb/sec (4X QDR)" here. Perhaps the OFED is too old so > that FDR and ConnectX 3 aren't supported, yet. "10 Gb/sec (4X)" seems to > be the default case if a rate isn't supported. Yes, with the older ConnectX card I see this, but with the ConnectX-3 I get only 10 Gb. -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru jabber: v...@selfip.ru
Re: tune ib stack
On 09.04.2013 13:12, Vasiliy Tolstov wrote: > Hello. I have some servers, with mellanox ConnectX-3 and have some questions: > Why max_mtu differs with active_mtu? Because 2048 is the default, and 4096 is the maximum MTU supported by the hardware. > How can i set active mtu? Something like this: echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu > Why ibstatus says that i have only 10 Gb/s ? Could be a bug. Which OFED/kernel (if using in-tree IB modules) do you use? Mine says with ConnectX-2 QDR: "40 Gb/sec (4X QDR)" > All cables support 40 Gb/s. > > Thanks for any help. > > Linux xen28 3.8.6-1-xen #1 SMP Fri Apr 5 18:48:02 UTC 2013 (713918b) > x86_64 x86_64 x86_64 GNU/Linux > > Infiniband device 'mlx4_0' port 1 status: > default gid: fe80::::0025:90ff:ff17:9b25 > base lid:0x34 > sm lid: 0x4 > state: 4: ACTIVE > phys state: 5: LinkUp > rate:10 Gb/sec (4X) > link_layer: InfiniBand You should see "40 Gb/sec (4X QDR)" here. Perhaps the OFED is too old to support FDR and ConnectX-3 yet. "10 Gb/sec (4X)" seems to be the default case if a rate isn't supported. Cheers, Sebastian
tune ib stack
Hello. I have some servers with Mellanox ConnectX-3 adapters and have some questions: Why does max_mtu differ from active_mtu? How can I set the active MTU? Why does ibstatus say that I have only 10 Gb/s? All cables support 40 Gb/s. Thanks for any help.

    Linux xen28 3.8.6-1-xen #1 SMP Fri Apr 5 18:48:02 UTC 2013 (713918b) x86_64 x86_64 x86_64 GNU/Linux

    Infiniband device 'mlx4_0' port 1 status:
        default gid:  fe80::::0025:90ff:ff17:9b25
        base lid:     0x34
        sm lid:       0x4
        state:        4: ACTIVE
        phys state:   5: LinkUp
        rate:         10 Gb/sec (4X)
        link_layer:   InfiniBand

    06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

    hca_id: mlx4_0
        transport:      InfiniBand (0)
        fw_ver:         2.10.700
        node_guid:      0025:90ff:ff17:9b24
        sys_image_guid: 0025:90ff:ff17:9b27
        vendor_id:      0x02c9
        vendor_part_id: 4099
        hw_ver:         0x0
        board_id:       SM_218101000
        phys_port_cnt:  1
            port: 1
                state:      PORT_ACTIVE (4)
                max_mtu:    4096 (5)
                active_mtu: 2048 (4)
                sm_lid:     4
                port_lid:   52
                port_lmc:   0x00
                link_layer: IB

-- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru jabber: v...@selfip.ru