Hi Sean,
The problem arises when an active Solaris client sends a connection
request to a passive OFED server instance. Solaris sets the hop_limit field
to 0xFF but does not expect or enable GRH routing. The RC messages exchanged
afterwards are therefore silently dropped, since one side expects GRH traffic
and the other doesn't.
The active side seems to work OK for subnet-local connections, so nothing
needs to be changed there.
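To make the mismatch concrete, here is a minimal sketch. It assumes that the passive side treats a path as routed (GRH enabled) whenever the path record's hop_limit is greater than 1, which is the behavior Sean's question about ib_init_ah_from_path points at. The function names `passive_uses_grh` and `grh_mismatch` are hypothetical and exist only for this illustration; they are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption for this sketch: the passive side enables GRH when the
 * hop limit stored in its path record is greater than 1.
 */
bool passive_uses_grh(uint8_t path_hop_limit)
{
	return path_hop_limit > 1;
}

/*
 * The two sides disagree when exactly one of them uses GRH.
 * active_uses_grh is false for the Solaris client described above.
 */
bool grh_mismatch(bool active_uses_grh, uint8_t path_hop_limit)
{
	return active_uses_grh != passive_uses_grh(path_hop_limit);
}
```

Under that assumption, `grh_mismatch(false, 0xFF)` is true (hop_limit copied verbatim from the REQ), while `grh_mismatch(false, 1)` is false (hop_limit forced to 1 for a subnet-local REQ).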
Here is an updated patch:
Signed-off-by: Jim L Hall <[EMAIL PROTECTED]>
---
drivers/infiniband/core/cm.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d446998..25a77ec 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1095,7 +1095,10 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 	primary_path->dlid = req_msg->primary_local_lid;
 	primary_path->slid = req_msg->primary_remote_lid;
 	primary_path->flow_label = cm_req_get_primary_flow_label(req_msg);
-	primary_path->hop_limit = req_msg->primary_hop_limit;
+	if (cm_req_get_primary_subnet_local(req_msg) == 1)
+		primary_path->hop_limit = 1;
+	else
+		primary_path->hop_limit = req_msg->primary_hop_limit;
 	primary_path->traffic_class = req_msg->primary_traffic_class;
 	primary_path->reversible = 1;
 	primary_path->pkey = req_msg->pkey;
@@ -1116,7 +1119,10 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 		alt_path->dlid = req_msg->alt_local_lid;
 		alt_path->slid = req_msg->alt_remote_lid;
 		alt_path->flow_label = cm_req_get_alt_flow_label(req_msg);
-		alt_path->hop_limit = req_msg->alt_hop_limit;
+		if (cm_req_get_alt_subnet_local(req_msg) == 1)
+			alt_path->hop_limit = 1;
+		else
+			alt_path->hop_limit = req_msg->alt_hop_limit;
 		alt_path->traffic_class = req_msg->alt_traffic_class;
 		alt_path->reversible = 1;
 		alt_path->pkey = req_msg->pkey;
Thanks,
- Jim H.
----- Original Message -----
From: Sean Hefty
To: 'Jim Hall' ; [email protected]
Sent: Monday, September 17, 2007 12:10 PM
Subject: RE: [PATCH] core/cm: improve request message interpretation of
subnet local fields
(I don't think this made it to the mailing list, so re-posting.)
I don't disagree with the concept here, but can you explain the problem that
you're seeing? Is it that the path is assumed to be routed based on the
hop_limit (set in ib_init_ah_from_path)? Are any changes needed for active
side processing?
Btw, I'd prefer something more like:
	if (cm_req_get_primary_subnet_local(...))
		primary_path->hop_limit = 1;
	else
		primary_path->hop_limit = req_msg->primary_hop_limit;
(or '? :' equivalent), versus setting hop_limit, then overriding it in the
common case. And I'm fine if we don't keep the comment.
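For reference, the '? :' equivalent mentioned above could look like the following sketch. The helper name `pick_hop_limit` and its parameters are hypothetical; in cm.c the values would come from cm_req_get_primary_subnet_local(req_msg) and req_msg->primary_hop_limit rather than from function arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the hop limit to store in the path record.
 * A subnet-local request gets 1; otherwise the value from the REQ is kept.
 */
uint8_t pick_hop_limit(bool subnet_local, uint8_t req_hop_limit)
{
	return subnet_local ? 1 : req_hop_limit;
}
```

Either form assigns hop_limit exactly once, which avoids setting it and then overriding it in the common case.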
- Sean
------------------------------------------------------------------------------
From: Jim Hall [mailto:[EMAIL PROTECTED]
Sent: Monday, September 17, 2007 7:38 AM
To: [EMAIL PROTECTED]
Cc: Hefty, Sean
Subject: [PATCH] core/cm: improve request message interpretation of subnet
local fields
When parsing a CMA connect request message, if the subnet local field is 1
(both nodes on the same subnet), then explicitly set the hop limit in the
corresponding path record to 1. This avoids a Global/Local misconfiguration
problem with Solaris InfiniBand CMA sessions.
Signed-off-by: Jim L Hall <[EMAIL PROTECTED]>
---
drivers/infiniband/core/cm.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d446998..3d8740c 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1109,6 +1109,14 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 		cm_req_get_primary_local_ack_timeout(req_msg);
 	primary_path->packet_life_time -= (primary_path->packet_life_time > 0);
+	if (cm_req_get_primary_subnet_local(req_msg) == 1) {
+
+		/* At this point we know that both sides are on the same
+		 * subnet, any hop limits above 1 don't make much sense
+		 */
+		primary_path->hop_limit = 1;
+	}
+
 	if (req_msg->alt_local_lid) {
 		memset(alt_path, 0, sizeof *alt_path);
 		alt_path->dgid = req_msg->alt_local_gid;
@@ -1129,6 +1137,14 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 		alt_path->packet_life_time =
 			cm_req_get_alt_local_ack_timeout(req_msg);
 		alt_path->packet_life_time -= (alt_path->packet_life_time > 0);
+
+		if (cm_req_get_alt_subnet_local(req_msg) == 1) {
+
+			/* At this point we know that both sides are on the same
+			 * subnet, any hop limits above 1 don't make much sense
+			 */
+			alt_path->hop_limit = 1;
+		}
 	}
 }