Re: [ewg] DHCP over InfiniBand Update
Hi Hal,

On Tue, 31 Aug 2010 16:09:19 -0400 Hal Rosenstock <hal.rosenst...@gmail.com> wrote:

> Hi,
>
> There appear to be two basic approaches to supporting DHCP (over InfiniBand) in Linux. There's LPF support (4.1.1 based) and older (3.0.4 based) socket support.
>
> The 4.1.1 LPF patches are:
> http://lists.openfabrics.org/pipermail/ewg/2010-May/015265.html
> http://lists.openfabrics.org/pipermail/ewg/2010-May/015266.html
> http://lists.openfabrics.org/pipermail/ewg/2010-May/015264.html
>
> The last being Matthieu Hautreux's <matthieu.hautreux at cea.fr> improved XID generation (same as https://lists.isc.org/mailman/htdig/dhcp-hackers/2009-January/001773.html).
>
> AFAIT an LPF based approach will only work on older kernels (due to elimination of CONFIG_FILTER support). Is this accurate ?

Where have you seen that the LPF approach does not work on recent kernels? AFAICR, the CONFIG_FILTER disappeared a long time ago. Unless I'm missing something, you only need the CONFIG_PACKET option.

Sébastien.

> OFED has two patches for 3.0.4 for a socket approach in http://www.openfabrics.org/git/?p=~tziporet/docs.git;a=tree;f=dhcp;h=aec68a2905559c8ed91f1157fa11d78cccb266cd;hb=ofed_1_5
>   dhcp-3.0.4.patch
>   0001-Make-DHCP-server-print-HW-info.patch
>
> I've been upporting those to a 4.x based DHCP and have a fundamental question which occurs even with the 3.0.4 socket based version.
>
> On the client machine, the DHCPOFFER in response to the DHCPDISCOVER is received (seen with tcpdump) but never seems to make it to the dhclient application. I can't see any kernel stack error counters incremented so I'm mystified as to what could be going wrong. I've also tried this on a number of different kernels. Any idea on why this might be or how to figure out where that packet is going ?
>
> I do see the dhcp client port with netstat -a --udp -n
> udp        0      0 0.0.0.0:68        0.0.0.0:*
> udp        0      0 0.0.0.0:68        0.0.0.0:*
>
> Any idea on what I'm missing ?
>
> Also, is any of this work making its way into a released DHCP ? What's the process for this ? Is there some branch in a source repository where this work is available ?
>
> Thanks in advance for any pointers on all this.
>
> -- Hal
> ___
> ewg mailing list
> e...@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ewg] DHCP over InfiniBand Update
Hi Sébastien,

On Mon, Sep 6, 2010 at 9:03 AM, sebastien dugue <sebastien.du...@bull.net> wrote:
> Hi Hal,
>
> On Tue, 31 Aug 2010 16:09:19 -0400 Hal Rosenstock <hal.rosenst...@gmail.com> wrote:
> > [...]
> > AFAIT an LPF based approach will only work on older kernels (due to elimination of CONFIG_FILTER support). Is this accurate ?
>
> Where have you seen that the LPF approach does not work on recent kernels? AFAICR, the CONFIG_FILTER disappeared a long time ago. Unless I'm missing something, you only need the CONFIG_PACKET option.

The question was based on the README and some code in lpf.c, but it sounds like those comments about CONFIG_FILTER relate to 2.4 and not to 2.6 based kernels. All that is needed with a 2.6 kernel is CONFIG_PACKET so the PF_PACKET socket can be created and used by lpf. Right ?

Out of curiosity, why did you choose a PF_PACKET rather than a UDP socket based approach ? The UDP socket approach seems simpler but maybe has some other pitfalls.

Thanks again.

-- Hal

> Sébastien.
>
> > [...]
Re: [ewg] DHCP over InfiniBand Update
On Mon, 6 Sep 2010 09:15:50 -0400 Hal Rosenstock <hal.rosenst...@gmail.com> wrote:

> Hi Sébastien,
>
> On Mon, Sep 6, 2010 at 9:03 AM, sebastien dugue <sebastien.du...@bull.net> wrote:
> > [...]
> > Where have you seen that the LPF approach does not work on recent kernels? AFAICR, the CONFIG_FILTER disappeared a long time ago. Unless I'm missing something, you only need the CONFIG_PACKET option.
>
> The question was based on the README and some code in lpf.c, but it sounds like those comments about CONFIG_FILTER relate to 2.4 and not to 2.6 based kernels. All that is needed with a 2.6 kernel is CONFIG_PACKET so the PF_PACKET socket can be created and used by lpf. Right ?

Absolutely right.

> Out of curiosity, why did you choose a PF_PACKET rather than a UDP socket based approach ? The UDP socket approach seems simpler but maybe has some other pitfalls.

For unknown reasons, which I did not have time to investigate at the time, I could not make it work using a plain UDP socket, which I agree should be straightforward. So I did it the LPF way.

Sébastien.

> Thanks again.
>
> -- Hal
>
> [...]
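For readers comparing the two approaches discussed in this thread, here is a minimal sketch of what "UDP socket" versus "PF_PACKET socket" means at the system-call level. It is not taken from the ISC DHCP sources or from the OFED patches; the function names are hypothetical, and the choice of SOCK_DGRAM with ETH_P_IP for the packet socket is an assumption made for illustration. Binding to the real DHCP client port (68) and opening a packet socket both normally need elevated privileges.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <linux/if_ether.h>	/* ETH_P_IP */
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * UDP flavour (hypothetical helper): an ordinary datagram socket bound
 * to the given port on all interfaces. A DHCP client would pass 68;
 * passing 0 lets the kernel pick an unprivileged port.
 * Returns the descriptor, or -1 with errno set.
 */
int open_udp_dhcp_socket(uint16_t port)
{
	struct sockaddr_in addr;
	int on = 1;
	int saved;
	int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

	if (fd < 0)
		return -1;

	/* DHCP replies may be broadcast, and several clients may share the port */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
	    setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on)) < 0)
		goto fail;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;

	return fd;

fail:
	saved = errno;		/* close() may overwrite errno */
	close(fd);
	errno = saved;
	return -1;
}

/*
 * Packet flavour (hypothetical helper): needs CONFIG_PACKET in the
 * kernel and CAP_NET_RAW for the caller. With SOCK_DGRAM the kernel
 * strips the link-layer header, so the caller sees IP packets and has
 * to pick out the UDP/DHCP ones itself.
 * Returns the descriptor, or -1 with errno set.
 */
int open_packet_dhcp_socket(void)
{
	return socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
}
```

A caller would typically try one of the two helpers and check errno on failure; EPERM or EACCES from the packet variant simply means the process lacks CAP_NET_RAW.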
hang when exiting opensm or diagnet
We have a situation where ibdiagnet does not succeed in exiting. (OpenSM displays the same behavior.)

Performing a sysrq_trigger generates the following stack trace for ibdiagnet:

kernel: ibdiagnet D 810001004420 0 7004 6854
kernel: 81066899fbc8 0046 81036716 810367160310
kernel: 81066899fbb8 0009 81066dfac860 80301ae0
kernel: 008a75579daa 8686 81066dfaca48
kernel: Call Trace:
kernel: [80063167] wait_for_completion+0x79/0xa2
kernel: [8008cabc] default_wake_function+0x0/0xe
kernel: [8839034d] :ib_mad:ib_cancel_rmpp_recvs+0xd0/0x113
kernel: [8838d52e] :ib_mad:ib_unregister_mad_agent+0x30d/0x424
kernel: [8022296d] sk_free+0xc3/0x105
kernel: [883d63fd] :ib_umad:ib_umad_close+0x9d/0xd6
kernel: [800129df] __fput+0xae/0x198
kernel: [80023a17] filp_close+0x5c/0x64
kernel: [80038f1a] put_files_struct+0x63/0xae

Does anyone have any ideas as to what could be causing this?

-Jack
Re: [PATCH] infiniband-diags: Do not exit when unexpected node found
On 11:28 Sun 05 Sep , Eli Dorfman (Voltaire) wrote:
> Show error message but do not exit when unexpected node is found.
>
> Signed-off-by: Eli Dorfman <e...@voltaire.com>

Applied. Thanks.

Sasha
Re: [PATCH] infiniband-diags: Support Voltaire switch ISR4200
On 11:30 Sun 05 Sep , Eli Dorfman (Voltaire) wrote:
> Support Voltaire switch (ISR4200) grouping.
>
> Signed-off-by: Eli Dorfman <e...@voltaire.com>

Applied. Thanks.

Sasha
Re: Multicast group pre-creation for TopSpin old stack
Hi Yevgeny,

On 12:07 Sun 05 Sep , Yevgeny Kliteynik wrote:
> There's a hack in the SM to deal with TopSpin's non-compliant join compmask for IPoIB v4 multicast group ff12:401b:pkey::1
> The group is pre-defined and maintained as a well-known multicast group.
>
> opensm/osm_prtn.c (lines 232-246):
>
>	/* workaround for TS */
>	/* FIXME: remove this upon TS fixes */
>	mc_rec.mgid = osm_ts_ipoib_mgid;
>	memcpy(&mc_rec.mgid.raw[4], &pkey, sizeof(pkey));
>	/* Scope in MCMemberRecord (if present) needs to be consistent with MGID */
>	mc_rec.scope_state = ib_member_set_scope_state(scope, IB_MC_REC_STATE_FULL_MEMBER);
>	ib_mgid_set_scope(&mc_rec.mgid, scope);
>
>	status = osm_mcmr_rcv_find_or_create_new_mgrp(p_sa, comp_mask, &mc_rec,
>						      &p_mgrp);
>	if (p_mgrp) {
>		p_mgrp->well_known = TRUE;
>		if (!p->mgrp)
>			p->mgrp = p_mgrp;
>	}
>
> As far as I can tell, this was added before git history.
> Any idea if it's still needed?

I don't know. I just remember that we asked about this during partition manager development (sometime in 2006) and got an answer along the lines of "it is probably needed". Personally, I've never seen this used. Maybe Hal remembers more.

Sasha
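The MGID layout under discussion (ff12:401b:pkey::1) can be illustrated with a small standalone sketch. This is not OpenSM code: the helper names are hypothetical, the P_Key is taken in host byte order for simplicity, and setting the full-membership bit (0x8000) before embedding the P_Key is an assumption of this sketch. Printing reuses inet_ntop(AF_INET6, ...), which works because a GID is 16 bytes like an IPv6 address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill out[16] with the pre-created TopSpin IPoIB group MGID:
 * ff12:401b:<pkey>::1. Bytes 4 and 5 carry the P_Key in network order.
 */
void build_ts_ipoib_mgid(uint16_t pkey, uint8_t out[16])
{
	static const uint8_t template_mgid[16] = {
		0xff,			/* multicast field */
		0x12,			/* non-permanent bit, link-local scope */
		0x40, 0x1b,		/* IPv4 signature */
		0xff, 0xff,		/* P_Key placeholder */
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
		0x00, 0x00, 0x00, 0x01,	/* low 32 bits */
	};

	assert(out != NULL);
	memcpy(out, template_mgid, sizeof(template_mgid));

	pkey |= 0x8000;			/* assumed: full-membership bit set */
	out[4] = (uint8_t)(pkey >> 8);
	out[5] = (uint8_t)(pkey & 0xff);
}

/* Format a 16-byte MGID as text; returns buf, or NULL on error. */
const char *mgid_to_str(const uint8_t mgid[16], char *buf, size_t len)
{
	return inet_ntop(AF_INET6, mgid, buf, (socklen_t)len);
}
```

For the default partition (P_Key 0x7fff) this yields bytes ff:ff in positions 4 and 5, so the group is ff12:401b:ffff::1.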
Re: [PATCH v2] opensm/osm_qos_policy.c: change a log message
On 18:24 Sun 05 Sep , Yevgeny Kliteynik wrote:
> Print multicast group GID rather than LID - MGID is more relevant than MLID in this context.
>
> Signed-off-by: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>

Applied with tiny change (see below). Thanks.

> ---
>
> V2: fixed the log message
>
>  opensm/opensm/osm_qos_policy.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/opensm/opensm/osm_qos_policy.c b/opensm/opensm/osm_qos_policy.c
> index 72df6c8..fa04e10 100644
> --- a/opensm/opensm/osm_qos_policy.c
> +++ b/opensm/opensm/osm_qos_policy.c
> @@ -48,6 +48,7 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <ctype.h>
> +#include <arpa/inet.h>
>  #include <opensm/osm_log.h>
>  #include <opensm/osm_node.h>
>  #include <opensm/osm_port.h>
> @@ -772,6 +773,7 @@ static void __qos_policy_validate_pkey(
>  	uint8_t sl;
>  	uint32_t flow;
>  	uint8_t hop;
> +	char gid_str[INET6_ADDRSTRLEN];

I moved this string declaration to be under related 'if'...

>
>  	if (!p_qos_policy || !p_qos_match_rule || !p_prtn)
>  		return;
> @@ -801,9 +803,10 @@ static void __qos_policy_validate_pkey(
>  				  &sl, &flow, &hop);
>  	if (sl != p_prtn->sl) {

there...

Sasha

>  		OSM_LOG(&p_qos_policy->p_subn->p_osm->log, OSM_LOG_DEBUG,
> -			"Updating MCGroup (MLID 0x%04x) SL to "
> +			"Updating MCGroup (MGID %s) SL to "
>  			"match partition SL (%u)\n",
> -			cl_hton16(p_prtn->mgrp->mcmember_rec.mlid),
> +			inet_ntop(AF_INET6, p_prtn->mgrp->mcmember_rec.mgid.raw,
> +				  gid_str, sizeof gid_str),
>  			p_prtn->sl);
>  		p_prtn->mgrp->mcmember_rec.sl_flow_hop =
>  			ib_member_set_sl_flow_hop(p_prtn->sl, flow, hop);
> --
> 1.6.2.4
Re: trouble sending mail to mailing list
Hi Tziporet.

On Friday, Owen Media (cc'd) cut over to a new server (not sofa.openfabrics.org - I still don't understand this), and contacted GoDaddy to move our domain.

ping openfabrics.org
PING openfabrics.org (198.63.41.212) 56(84) bytes of data.
64 bytes from open11.tempdomainname.com (198.63.41.212): icmp_seq=1 ttl=47 time=99.1 ms
64 bytes from open11.tempdomainname.com (198.63.41.212): icmp_seq=2 ttl=47 time=95.6 ms

So it doesn't seem to be at John Companies any more. I assume the builds etc., are still happening at John Companies, but I'm not sure. I'm pretty sure the domain move is the root of the list problems.

Anyway, Paul, Jeff etc., stuff seems to be broken right now. Please fix. Thanks.

-jeff

On 09/06/2010 02:47 AM, Tziporet Koren wrote:
> Yes - all OFA web is not working fine
> I will send mails to other people too
>
> Tziporet
>
> -----Original Message-----
> From: Jonathan Perkins [mailto:perki...@cse.ohio-state.edu]
> Sent: Monday, September 06, 2010 7:53 AM
> To: Jeff Becker; Tziporet Koren
> Cc: Dhabaleswar Panda
> Subject: trouble sending mail to mailing list
>
> Hi, I'm not sure if this is just happening to me or not but I'm unable to send mail to the e...@lists.openfabrics.org or e...@openfabrics.org. I'm getting messages back stating that the servers cannot be found. Is this expected with the transitioning of the administration of the new servers?
>
> I also cannot log into www.openfabrics.org at this time but I just used the ip address of the old server to circumvent this issue.
>
> The original message was received at Mon, 6 Sep 2010 00:44:11 -0400 (EDT)
> from mail-vw0-f50.google.com [209.85.212.50]
>
> ----- The following addresses had permanent fatal errors -----
> e...@lists.openfabrics.org
>     (reason: 550 Host unknown)
>
> ----- Transcript of session follows -----
> 550 5.1.2 e...@lists.openfabrics.org... Host unknown (Name server: lists.openfabrics.org: host not found)
>
> Final-Recipient: RFC822; e...@lists.openfabrics.org
> Action: failed
> Status: 5.1.2
> Remote-MTA: DNS; lists.openfabrics.org
> Diagnostic-Code: SMTP; 550 Host unknown
> Last-Attempt-Date: Mon, 6 Sep 2010 00:44:11 -0400 (EDT)
>
> ---------- Forwarded message ----------
> From: Jonathan Perkins <perki...@cse.ohio-state.edu>
> To: Vladimir Sokolovsky <v...@mellanox.co.il>, e...@lists.openfabrics.org
> Date: Mon, 6 Sep 2010 00:44:10 -0400
> Subject: mvapich2 srpm uploaded
>
> Hi all:
>
> I've uploaded a pre-release of mvapich2-1.5.1 to the (old) openfabrics server at ~perkinjo/ofed_1_5/mvapich2-1.5.1-0.1.20100905svn4167.src.rpm. It's also indicated by the ~perkinjo/ofed_1_5/latest.txt file.
>
> This tarball contains bug fixes and several enhancements compared to the mvapich2-1.5 release. Our formal release will follow soon.
>
> FYI, I'm unable to log into the new openfabrics server (www.openfabrics.org). Is it expected to log into 69.55.239.13 using its ip address?
Re: [PATCH] opensm/osm_qos_policy.c: fix SL for TS-precreated mcast group
On 06-Sep-10 7:56 PM, Sasha Khapyorsky wrote:
> On 13:01 Sun 05 Sep , Yevgeny Kliteynik wrote:
> > Assuming we still need to support TopSpin's non-compliant join compmask for IPoIB v4 multicast group ff12:401b:pkey::1
>
> Wouldn't it be better to understand a use case of this group? It is probably obsolete and such an addition is not needed.

Sure, I'd trade this patch for another patch that removes the mcast group pre-creation. If no one objects, I'll remove that TS hack.

-- Yevgeny

> Sasha
>
> > The group is pre-defined and maintained as a well-known multicast group, hence its SL needs to be fixed too in accordance with QoS policy configuration, same as for ff12:401b:pkey::: MGID.
> >
> > Signed-off-by: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
> > ---
> >  opensm/opensm/osm_qos_policy.c |   35 +++++++++++++++++++++++++++++++++++
> >  1 files changed, 35 insertions(+), 0 deletions(-)
> >
> > diff --git a/opensm/opensm/osm_qos_policy.c b/opensm/opensm/osm_qos_policy.c
> > index ac49ab3..bdd27d0 100644
> > --- a/opensm/opensm/osm_qos_policy.c
> > +++ b/opensm/opensm/osm_qos_policy.c
> > @@ -764,6 +764,20 @@ static osm_qos_port_group_t *__qos_policy_get_port_group_by_name(
> >
> >  /***************************************************
> >   ***************************************************/
> > +/*
> > + * HACK: Until TS resolves their noncompliant join compmask,
> > + * we have to fix SL for this pre-defined the MGID too
> > + */
> > +static const ib_gid_t osm_ts_ipoib_mgid = {
> > +	{
> > +	 0xff,	/* multicast field */
> > +	 0x12,	/* non-permanent bit, link local scope */
> > +	 0x40, 0x1b,	/* IPv4 signature */
> > +	 0xff, 0xff,	/* 16 bits of P_Key (to be filled in) */
> > +	 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,	/* 48 bits of zeros */
> > +	 0x00, 0x00, 0x00, 0x01,	/* 32 bit IPv4 broadcast address */
> > +	},
> > +};
> >
> >  static void __qos_policy_validate_pkey(
> >  			osm_qos_policy_t * p_qos_policy,
> > @@ -773,6 +787,9 @@ static void __qos_policy_validate_pkey(
> >  	uint8_t sl;
> >  	uint32_t flow;
> >  	uint8_t hop;
> > +	ib_gid_t mgid;
> > +	ib_net16_t pkey;
> > +	osm_mgrp_t * mgrp;
> >  	char gid_str[INET6_ADDRSTRLEN];
> >
> >  	if (!p_qos_policy || !p_qos_match_rule || !p_prtn)
> > @@ -810,6 +827,24 @@ static void __qos_policy_validate_pkey(
> >  		p_prtn->mgrp->mcmember_rec.sl_flow_hop =
> >  			ib_member_set_sl_flow_hop(p_prtn->sl, flow, hop);
> >  	}
> > +
> > +	/* workaround for TS */
> > +	/* FIXME: remove this upon TS fixes */
> > +	mgid = osm_ts_ipoib_mgid;
> > +	pkey = p_prtn->pkey | cl_hton16(0x8000);
> > +	memcpy(&mgid.raw[4], &pkey, sizeof(pkey));
> > +	mgrp = osm_get_mgrp_by_mgid(p_qos_policy->p_subn, &mgid);
> > +	if (mgrp) {
> > +		OSM_LOG(&p_qos_policy->p_subn->p_osm->log, OSM_LOG_DEBUG,
> > +			"TS workaround: Updating MCGroup (MGID %s) SL to "
> > +			"match partition SL (%u)\n",
> > +			inet_ntop(AF_INET6, mgid.raw, gid_str, sizeof gid_str),
> > +			p_prtn->sl);
> > +		ib_member_get_sl_flow_hop(mgrp->mcmember_rec.sl_flow_hop,
> > +					  &sl, &flow, &hop);
> > +		mgrp->mcmember_rec.sl_flow_hop =
> > +			ib_member_set_sl_flow_hop(p_prtn->sl, flow, hop);
> > +	}
> >  }
> >
> >  /***************************************************
> > --
> > 1.6.2.4
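The patch above reads the group's packed sl_flow_hop field, keeps the flow label and hop limit, and writes back only a new SL. A standalone sketch of that get/modify/set pattern follows. The function names are hypothetical stand-ins (not the OpenSM helpers), and the bit layout used here, 4 bits of SL, 20 bits of flow label and 8 bits of hop limit in a 32-bit network-order word, is stated as an assumption about the MCMemberRecord field rather than copied from the OpenSM headers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Pack SL (4 bits), flow label (20 bits) and hop limit (8 bits)
 * into one 32-bit word in network byte order. */
uint32_t pack_sl_flow_hop(uint8_t sl, uint32_t flow, uint8_t hop)
{
	uint32_t host;

	assert(sl <= 0x0f);	/* SL is a 4-bit quantity */
	host = ((uint32_t)sl << 28) | ((flow & 0xfffffu) << 8) | hop;
	return htonl(host);
}

/* Split a network-order word back into its three fields. */
void unpack_sl_flow_hop(uint32_t net, uint8_t *sl, uint32_t *flow,
			uint8_t *hop)
{
	uint32_t host = ntohl(net);

	*sl = (uint8_t)(host >> 28);
	*flow = (host >> 8) & 0xfffffu;
	*hop = (uint8_t)(host & 0xffu);
}

/* Replace only the SL, leaving flow label and hop limit untouched;
 * this mirrors the get-then-set sequence in the patch. */
uint32_t replace_sl(uint32_t net, uint8_t new_sl)
{
	uint8_t sl, hop;
	uint32_t flow;

	unpack_sl_flow_hop(net, &sl, &flow, &hop);
	return pack_sl_flow_hop(new_sl, flow, hop);
}
```

Doing the update through an unpack/pack round trip, rather than masking bits in place, keeps the byte-order conversion in exactly two functions.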
Re: Multicast group pre-creation for TopSpin old stack
On Mon, Sep 6, 2010 at 12:43 PM, Sasha Khapyorsky <sas...@voltaire.com> wrote:
> Hi Yevgeny,
>
> On 12:07 Sun 05 Sep , Yevgeny Kliteynik wrote:
> > There's a hack in the SM to deal with TopSpin's non-compliant join compmask for IPoIB v4 multicast group ff12:401b:pkey::1
> > The group is pre-defined and maintained as a well-known multicast group.
> >
> > opensm/osm_prtn.c (lines 232-246):
> >
> >	/* workaround for TS */
> >	/* FIXME: remove this upon TS fixes */
> >	mc_rec.mgid = osm_ts_ipoib_mgid;
> >	memcpy(&mc_rec.mgid.raw[4], &pkey, sizeof(pkey));
> >	/* Scope in MCMemberRecord (if present) needs to be consistent with MGID */
> >	mc_rec.scope_state = ib_member_set_scope_state(scope, IB_MC_REC_STATE_FULL_MEMBER);
> >	ib_mgid_set_scope(&mc_rec.mgid, scope);
> >
> >	status = osm_mcmr_rcv_find_or_create_new_mgrp(p_sa, comp_mask, &mc_rec,
> >						      &p_mgrp);
> >	if (p_mgrp) {
> >		p_mgrp->well_known = TRUE;
> >		if (!p->mgrp)
> >			p->mgrp = p_mgrp;
> >	}
> >
> > As far as I can tell, this was added before git history.
> > Any idea if it's still needed?
>
> I don't know, just remember that we have asked about this during partition manager development (somewhere in 2006), and have got an answer like it is probably needed. Personally I've never seen this used. Maybe Hal may remember more.

It's been carried forward just in case anyone was still running the old TS stack. It's always hard to prove a negative (that no one is using it any longer with the latest OpenSM), but it may be high time to deprecate this. Is anyone still using the old TS stack ? If so, is it used with OpenSM ? Maybe Roland has insight into whether the old TS stack might be being used with (the latest) OpenSM.

-- Hal

> Sasha