Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On 17:29 Tue 01 Mar , Jason Gunthorpe wrote: This value must match the portGUID, so it needs to vary on a per port basis like portGUID and not simply reflect the value opensm acquired during the sweep. Prior to this patch opensm returns the same value for localPortNum for all ports on a HCA, now it returns the correct localPortNum for the portGUID. Also fixes query matching in the same way Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com Applied. Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On 10:11 Wed 09 Mar , Jason Gunthorpe wrote: On Wed, Mar 09, 2011 at 05:47:44PM +0200, Alex Netes wrote: Hi Jason, On 17:29 Tue 01 Mar , Jason Gunthorpe wrote: This value must match the portGUID, so it needs to vary on a per port basis like portGUID and not simply reflect the value opensm acquired during the sweep. Prior to this patch opensm returns the same value for localPortNum for all ports on a HCA, now it returns the correct localPortNum for the portGUID. Also fixes query matching in the same way Spec defines the SA NodeRecord.LocalPortNum field as the number of the link port which received this SMP (14.2.5.3). This definition doesn't make any sense when it comes to SA. I think the spec is pretty sound on this point. The SA is expected to return SA queries for SMP records that match what is present in the fabric if the requestor did the SMP query itself. The constraints IBA places on a NodeInfo SMP query are such that localPortNum and portGUID must *always* match. You cannot enter on port 1 and query the NodeInfo for port 2 for a CA. You are right about this. There is no case I can think of, where PortGUID can be different than localPortNum, when it comes to CAs. However for switches there is no such coupling. For NodeInfo SMP localPortNum can be any port, but the portGUID remain the same. All I'm saying is that section 15.2.5.2 or 14.2.5.3 should be more clear about the definition of LocalPortNum field when it comes to SA. So the current SA behavior of returning garbage for localPortNum is essentially returning a NodeInfo that can never be returned by a raw SMP query, which is clearly not aligned with the intent of the spec. There is same definition for localPortNum in PortInfoRecord, however the correct localPortNum is retrieved. So I guess NodeInfoRecord could act same way you suggested. 
I misspoke a bit in the commit message, opensm returns the localPortNum it acquired during the sweep *for a random port*, e.g. it collects and stores the NodeInfo for the first port it sees, then stores portGUID and localPortNum separately. When it generates the SA reply it correctly replaces the port GUID but uses the random old localPortNum. This is clearly incorrect. I understand your motivation for the patch and the fact that current LocalPortNum query matching in the SM doesn't sound right. So maybe IBA spec fine tuning is needed, before applying the patch, so this field won't be open to free interpretations. I don't think there is really any free interpretation here. For everything but a switch localPortNum must always reflect the port that portGUID is associated with. This is what is defined to happen when the NodeRecord SMP is generated by the SMA; the SA must do the same. This causes real problems: without an accurate localPortNumber it is impossible to associate the portGUID with a port number when doing SA queries. What about using PortInfoRecord to get portGUID-localPortNumber binding? One more minor thing, you defined port_num/match_port_num as unsigned int. I think it can be uint8_t, right?
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
I think the spec is pretty sound on this point. The SA is expected to return SA queries for SMP records that match what is present in the fabric if the requestor did the SMP query itself. The constraints IBA places on a NodeInfo SMP query are such that localPortNum and portGUID must *always* match. You cannot enter on port 1 and query the NodeInfo for port 2 for a CA. You are right about this. There is no case I can think of, where PortGUID can be different than localPortNum, when it comes to CAs. However for switches there is no such coupling. For NodeInfo SMP localPortNum can be any port, but the portGUID remain the same. All I'm saying is that section 15.2.5.2 or 14.2.5.3 should be more clear about the definition of LocalPortNum field when it comes to SA. If you read the spec carefully it is saying that the SA should return the SMP it would get back from the device if it made the query. Under that condition the value for localPortNum is pretty clear.. So the current SA behavior of returning garbage for localPortNum is essentially returning a NodeInfo that can never be returned by a raw SMP query, which is clearly not aligned with the intent of the spec. There is same definition for localPortNum in PortInfoRecord, however the correct localPortNum is retrieved. So I guess NodeInfoRecord could act same way you suggested. Hal is correct to point out that PortInfoRecord can return other localPortNums due to the attribute modifier. Only NodeInfo has the fixed relationship between port number and portGUID. What about using PortInfoRecord to get portGUID-localPortNumber binding? That could work, but it is just working around a bug in opensm, and requires more network traffic. For the application where I found this bug the SAPortInfoRecord is never retrieved. One more minor thing, you defined port_num/match_port_num as unsigned int. I think it can be uint8_t, right? Yes, that is a style choice, it won't affect correctness. 
Using 'unsigned int' generates slightly better assembly. Jason
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
Hi Jason,

On 17:29 Tue 01 Mar , Jason Gunthorpe wrote:

This value must match the portGUID, so it needs to vary on a per port basis like portGUID and not simply reflect the value opensm acquired during the sweep. Prior to this patch opensm returns the same value for localPortNum for all ports on a HCA, now it returns the correct localPortNum for the portGUID. Also fixes query matching in the same way

Spec defines the SA NodeRecord.LocalPortNum field as the number of the link port which received this SMP (14.2.5.3). This definition doesn't make any sense when it comes to SA. I understand your motivation for the patch and the fact that current LocalPortNum query matching in the SM doesn't sound right. So maybe IBA spec fine tuning is needed, before applying the patch, so this field won't be open to free interpretations.

Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com
---
 opensm/opensm/osm_sa_node_record.c |   22 +++---
 1 files changed, 15 insertions(+), 7 deletions(-)

Eg corrected output:

$ ibtool saquery nr nodeInfo.localPortNum=1
NodeRecord dump:
    LID.....................8
    Reserved_16.............0
    Base version............1
    Class version...........1
    node_type...............1
    num_ports...............2
    sys_guid................0002:c903::14a7
    node_guid...............0002:c903::14a4
    port_guid...............0002:c903::14a5
    partition_cap...........128
    device_id...............0x634a
    Revision................0x00a0
    port_num................1
    vendor_id...............0x0002c9
    NodeDescription.........MT25408 ConnectX Mellanox Technologies

$ ibtool saquery nr nodeInfo.localPortNum=2
NodeRecord dump:
    LID.....................10
    Reserved_16.............0
    Base version............1
    Class version...........1
    node_type...............1
    num_ports...............2
    sys_guid................0002:c903::14a7
    node_guid...............0002:c903::14a4
    port_guid...............0002:c903::14a6
    partition_cap...........128
    device_id...............0x634a
    Revision................0x00a0
    port_num................2
    vendor_id...............0x0002c9
    NodeDescription.........MT25408 ConnectX Mellanox Technologies

diff --git a/opensm/opensm/osm_sa_node_record.c b/opensm/opensm/osm_sa_node_record.c
index 87f00fd..ff08219 100644
--- a/opensm/opensm/osm_sa_node_record.c
+++ b/opensm/opensm/osm_sa_node_record.c
@@ -70,7 +70,8 @@ typedef struct osm_nr_search_ctxt {
 static ib_api_status_t nr_rcv_new_nr(osm_sa_t * sa,
 				     IN const osm_node_t * p_node,
 				     IN cl_qlist_t * p_list,
-				     IN ib_net64_t port_guid, IN ib_net16_t lid)
+				     IN ib_net64_t port_guid, IN ib_net16_t lid,
+				     IN unsigned int port_num)

port_num can be just uint8_t, right?

 {
 	osm_nr_item_t *p_rec_item;
 	ib_api_status_t status = IB_SUCCESS;
@@ -97,6 +98,9 @@ static ib_api_status_t nr_rcv_new_nr(osm_sa_t * sa,

 	p_rec_item->rec.node_info = p_node->node_info;
 	p_rec_item->rec.node_info.port_guid = port_guid;
+	p_rec_item->rec.node_info.port_num_vendor_id =
+		(p_rec_item->rec.node_info.port_num_vendor_id & IB_NODE_INFO_VEND_ID_MASK) |
+		((port_num << IB_NODE_INFO_PORT_NUM_SHIFT) & IB_NODE_INFO_PORT_NUM_MASK);
 	memcpy(&(p_rec_item->rec.node_desc), &(p_node->node_desc),
 	       IB_NODE_DESCRIPTION_SIZE);
 	cl_qlist_insert_tail(p_list, &p_rec_item->list_item);
@@ -110,6 +114,7 @@ static void nr_rcv_create_nr(IN osm_sa_t * sa, IN osm_node_t * p_node,
 			     IN cl_qlist_t * p_list,
 			     IN ib_net64_t const match_port_guid,
 			     IN ib_net16_t const match_lid,
+			     IN unsigned int const match_port_num,
 			     IN const osm_physp_t * p_req_physp,
 			     IN const ib_net64_t comp_mask)
 {
@@ -173,7 +178,11 @@ static void nr_rcv_create_nr(IN osm_sa_t * sa, IN osm_node_t * p_node,
 			continue;
 		}
-		nr_rcv_new_nr(sa, p_node, p_list, port_guid, base_lid);
+		if ((comp_mask & IB_NR_COMPMASK_PORTNUM) &&
+		    (port_num != match_port_num))
+			continue;
+
+		nr_rcv_new_nr(sa, p_node, p_list, port_guid, base_lid, port_num);
 	}
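For readers following the second hunk of the patch: the new assignment overwrites only the port-number bits of the combined port_num_vendor_id field and leaves the vendor ID alone. Below is a minimal sketch of that masking pattern. It assumes a host-byte-order layout with the port number in the top 8 bits and the vendor ID in the low 24 bits; the EX_* macros and ex_* helpers are hypothetical names for illustration, not opensm's IB_NODE_INFO_* definitions, which operate on the network-order field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-order layout, for illustration only:
 * bits 31..24 hold localPortNum, bits 23..0 hold the vendor ID. */
#define EX_PORT_NUM_SHIFT 24u
#define EX_PORT_NUM_MASK  0xFF000000u
#define EX_VEND_ID_MASK   0x00FFFFFFu

/* Replace the port-number byte and keep the vendor ID bits untouched,
 * following the same mask/shift/or pattern as the patch. */
static uint32_t ex_set_port_num(uint32_t port_num_vendor_id,
				unsigned int port_num)
{
	assert(port_num <= 0xFFu);	/* localPortNum is an 8-bit field */
	return (port_num_vendor_id & EX_VEND_ID_MASK) |
	       (((uint32_t)port_num << EX_PORT_NUM_SHIFT) & EX_PORT_NUM_MASK);
}

/* Extract the port number from the combined field. */
static unsigned int ex_get_port_num(uint32_t port_num_vendor_id)
{
	return port_num_vendor_id >> EX_PORT_NUM_SHIFT;
}

/* Extract the 24-bit vendor ID from the combined field. */
static uint32_t ex_get_vendor_id(uint32_t port_num_vendor_id)
{
	return port_num_vendor_id & EX_VEND_ID_MASK;
}
```

With a stored value carrying port 1 and vendor ID 0x0002c9, calling ex_set_port_num(value, 2) changes only the top byte, which is the per-port substitution the patch performs before the record is queued.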
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On Wed, Mar 09, 2011 at 05:47:44PM +0200, Alex Netes wrote: Hi Jason, On 17:29 Tue 01 Mar , Jason Gunthorpe wrote: This value must match the portGUID, so it needs to vary on a per port basis like portGUID and not simply reflect the value opensm acquired during the sweep. Prior to this patch opensm returns the same value for localPortNum for all ports on a HCA, now it returns the correct localPortNum for the portGUID. Also fixes query matching in the same way Spec defines the SA NodeRecord.LocalPortNum field as the number of the link port which received this SMP (14.2.5.3). This definition doesn't make any sense when it comes to SA. I think the spec is pretty sound on this point. The SA is expected to return SA queries for SMP records that match what is present in the fabric if the requestor did the SMP query itself. The constraints IBA places on a NodeInfo SMP query are such that localPortNum and portGUID must *always* match. You cannot enter on port 1 and query the NodeInfo for port 2 for a CA. So the current SA behavior of returning garbage for localPortNum is essentially returning a NodeInfo that can never be returned by a raw SMP query, which is clearly not aligned with the intent of the spec. I misspoke a bit in the commit message, opensm returns the localPortNum it acquired during the sweep *for a random port*, e.g. it collects and stores the NodeInfo for the first port it sees, then stores portGUID and localPortNum separately. When it generates the SA reply it correctly replaces the port GUID but uses the random old localPortNum. This is clearly incorrect. I understand your motivation for the patch and the fact that current LocalPortNum query matching in the SM doesn't sound right. So maybe IBA spec fine tuning is needed, before applying the patch, so this field won't be open to free interpretations. I don't think there is really any free interpretation here. For everything but a switch localPortNum must always reflect the port that portGUID is associated with. 
This is what is defined to happen when the NodeRecord SMP is generated by the SMA; the SA must do the same. This causes real problems: without an accurate localPortNumber it is impossible to associate the portGUID with a port number when doing SA queries. Jason
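The behaviour described in this message (one NodeInfo cached per node, with portGUID substituted per port but localPortNum left stale) can be modelled in a few lines. The sketch below is a simplified stand-in, not opensm code: the ex_* types, the two-port sample node and its GUID values are all hypothetical. It emits one record per port, takes both portGUID and localPortNum from the port being reported, and applies an optional port-number filter in the spirit of the patch's comp_mask check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for opensm's node and port objects. */
struct ex_port {
	uint64_t port_guid;
	unsigned int port_num;
};

struct ex_node {
	unsigned int cached_port_num;	/* from the first NodeInfo seen in the sweep */
	struct ex_port ports[2];
	size_t num_ports;
};

struct ex_record {
	uint64_t port_guid;
	unsigned int local_port_num;
};

/* Emit one record per port. When match_set is non-zero, only ports whose
 * number equals match_port_num are reported. Returns the record count. */
static size_t ex_build_records(const struct ex_node *node, int match_set,
			       unsigned int match_port_num,
			       struct ex_record *out, size_t out_cap)
{
	size_t n = 0;

	assert(node != NULL && out != NULL);
	for (size_t i = 0; i < node->num_ports && n < out_cap; i++) {
		const struct ex_port *p = &node->ports[i];

		if (match_set && p->port_num != match_port_num)
			continue;
		out[n].port_guid = p->port_guid;
		/* The stale behaviour would copy node->cached_port_num here,
		 * giving every port of the HCA the same localPortNum. */
		out[n].local_port_num = p->port_num;
		n++;
	}
	return n;
}

/* A made-up two-port HCA; the GUIDs are placeholders, not real values. */
static struct ex_node ex_sample_hca(void)
{
	struct ex_node hca = {
		.cached_port_num = 1,
		.ports = { { 0x1001u, 1 }, { 0x1002u, 2 } },
		.num_ports = 2,
	};
	return hca;
}
```

Filtering the sample node on port number 2 yields a single record whose GUID belongs to port 2, which is the portGUID-to-port-number association the message says is otherwise impossible to recover from SA NodeRecord queries alone.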
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On Wed, Mar 9, 2011 at 12:11 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 05:47:44PM +0200, Alex Netes wrote: Hi Jason, On 17:29 Tue 01 Mar , Jason Gunthorpe wrote: This value must match the portGUID, so it needs to vary on a per port basis like portGUID and not simply reflect the value opensm acquired during the sweep. Prior to this patch opensm returns the same value for localPortNum for all ports on a HCA, now it returns the correct localPortNum for the portGUID. Also fixes query matching in the same way Spec defines the SA NodeRecord.LocalPortNum field as the number of the link port which received this SMP (14.2.5.3). This definition doesn't make any sense when it comes to SA. I think the spec is pretty sound on this point. The SA is expected to return SA queries for SMP records that match what is present in the fabric if the requestor did the SMP query itself. The constraints IBA places on a NodeInfo SMP query are such that localPortNum and portGUID must *always* match. You cannot enter on port 1 and query the NodeInfo for port 2 for a CA. So the current SA behavior of returning garbage for localPortNum is essentially returning a NodeInfo that can never be returned by a raw SMP query, which is clearly not aligned with the intent of the spec. I misspoke a bit in the commit message, opensm returns the localPortNum it acquired during the sweep *for a random port*, e.g. it collects and stores the NodeInfo for the first port it sees, then stores portGUID and localPortNum separately. When it generates the SA reply it correctly replaces the port GUID but uses the random old localPortNum. This is clearly incorrect. I understand your motivation for the patch and the fact that current LocalPortNum query matching in the SM doesn't sound right. So maybe IBA spec fine tuning is needed, before applying the patch, so this field won't be open to free interpretations. I don't think there is really any free interpretation here. 
For everything but a switch localPortNum must always reflect the port that portGUID is associated with. Unfortunately, this is not always true due to the funky cross (CA and router) port PortInfo queries (but I don't think that query style is used by OpenSM for CA or router ports but allowed in IBA)... -- Hal This is what is defined to happen when the NodeRecord SMP is generated by the SMA; the SA must do the same. This causes real problems: without an accurate localPortNumber it is impossible to associate the portGUID with a port number when doing SA queries. Jason
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On Wed, Mar 09, 2011 at 01:26:26PM -0500, Hal Rosenstock wrote: I don't think there is really any free interpretation here. For everything but a switch localPortNum must always reflect the port that portGUID is associated with. Unfortunately, this is not always true due to the funky cross (CA and router) port PortInfo queries (but I don't think that query style is used by OpenSM for CA or router ports but allowed in IBA)... What part of the spec are you referring to here? Thanks, Jason
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On Wed, Mar 9, 2011 at 1:34 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 01:26:26PM -0500, Hal Rosenstock wrote: I don't think there is really any free interpretation here. For everything but a switch localPortNum must always reflect the port that portGUID is associated with. Unfortunately, this is not always true due to the funky cross (CA and router) port PortInfo queries (but I don't think that query style is used by OpenSM for CA or router ports but allowed in IBA)... What part of the spec are you referring to here? IBA 1.2.1 v1 p.830 line 9 the otherwise sentence. Thanks, Jason
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On Wed, Mar 09, 2011 at 01:38:27PM -0500, Hal Rosenstock wrote: On Wed, Mar 9, 2011 at 1:34 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 01:26:26PM -0500, Hal Rosenstock wrote: I don't think there is really any free interpretation here. For everything but a switch localPortNum must always reflect the port that portGUID is associated with. Unfortunately, this is not always true due to the funky cross (CA and router) port PortInfo queries (but I don't think that query style is used by OpenSM for CA or router ports but allowed in IBA)... What part of the spec are you referring to here? IBA 1.2.1 v1 p.830 line 9 the otherwise sentence. That is talking about PortInfo. NodeInfo does not support that attribute modifier language, my patch affects SANodeRecord. Jason
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On Wed, Mar 9, 2011 at 1:40 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 01:38:27PM -0500, Hal Rosenstock wrote: On Wed, Mar 9, 2011 at 1:34 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 01:26:26PM -0500, Hal Rosenstock wrote: I don't think there is really any free interpretation here. For everything but a switch localPortNum must always reflect the port that portGUID is associated with. Unfortunately, this is not always true due to the funky cross (CA and router) port PortInfo queries (but I don't think that query style is used by OpenSM for CA or router ports but allowed in IBA)... What part of the spec are you referring to here? IBA 1.2.1 v1 p.830 line 9 the otherwise sentence. That is talking about PortInfo. NodeInfo does not support that attribute modifier language, my patch affects SANodeRecord. My bad :-( I was thinking of PortInfoRecord, not NodeRecord, for some reason... -- Hal Jason
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On Wed, Mar 09, 2011 at 01:47:48PM -0500, Hal Rosenstock wrote: On Wed, Mar 9, 2011 at 1:40 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 01:38:27PM -0500, Hal Rosenstock wrote: On Wed, Mar 9, 2011 at 1:34 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 01:26:26PM -0500, Hal Rosenstock wrote: I don't think there is really any free interpretation here. For everything but a switch localPortNum must always reflect the port that portGUID is associated with. Unfortunately, this is not always true due to the funky cross (CA and router) port PortInfo queries (but I don't think that query style is used by OpenSM for CA or router ports but allowed in IBA)... What part of the spec are you referring to here? IBA 1.2.1 v1 p.830 line 9 the otherwise sentence. That is talking about PortInfo. NodeInfo does not support that attribute modifier language, my patch affects SANodeRecord. My bad :-( I was thinking of PortInfoRecord, not NodeRecord, for some reason... No worries, what do you think about my patch, does it align with what IBA intends? Thanks, Jason
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On Wed, Mar 9, 2011 at 1:51 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 01:47:48PM -0500, Hal Rosenstock wrote: On Wed, Mar 9, 2011 at 1:40 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 01:38:27PM -0500, Hal Rosenstock wrote: On Wed, Mar 9, 2011 at 1:34 PM, Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Wed, Mar 09, 2011 at 01:26:26PM -0500, Hal Rosenstock wrote: I don't think there is really any free interpretation here. For everything but a switch localPortNum must always reflect the port that portGUID is associated with. Unfortunately, this is not always true due to the funky cross (CA and router) port PortInfo queries (but I don't think that query style is used by OpenSM for CA or router ports but allowed in IBA)... What part of the spec are you referring to here? IBA 1.2.1 v1 p.830 line 9 the otherwise sentence. That is talking about PortInfo. NodeInfo does not support that attribute modifier language, my patch affects SANodeRecord. My bad :-( I was thinking of PortInfoRecord, not NodeRecord, for some reason... No worries, what do you think about my patch, does it align with what IBA intends? Yes from my brief scan/understanding of what it's trying to accomplish. I haven't probed/analyzed the actual fix itself. I guess LocalPortNum in NodeRecord was not so important to anyone before now. -- Hal Thanks, Jason
Re: [PATCH/opensm] Fix SANodeRecord.nodeInfo.localPortNum
On Wed, Mar 09, 2011 at 02:00:16PM -0500, Hal Rosenstock wrote: IBA 1.2.1 v1 p.830 line 9 the otherwise sentence. That is talking about PortInfo. NodeInfo does not support that attribute modifier language, my patch affects SANodeRecord. My bad :-( I was thinking of PortInfoRecord, not NodeRecord, for some reason... No worries, what do you think about my patch, does it align with what IBA intends? Yes from my brief scan/understanding of what it's trying to accomplish. I haven't probed/analyzed the actual fix itself. I guess LocalPortNum in NodeRecord was not so important to anyone before now. Right, plus this only comes up if you connect both ports on a HCA into the same subnet, otherwise opensm works correctly.. Alex? Jason