My problem with the back-to-back plug case is that the same symptom is fatal when it is found in a fabric where no such plug is in use. So we will need some sort of user input to say whether it is OK or not.
A port that moves in the middle of a sweep is easy to detect: instead of reporting an error right away, perform a second check of the original DR (direct route) where the same GUID was first found. If the GUID is no longer there, the port has moved and is not a real duplicate.

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:[EMAIL PROTECTED]
> Sent: Friday, July 27, 2007 4:07 AM
> To: Eitan Zahavi
> Cc: Hal Rosenstock; OpenFabrics General; Yevgeny Kliteynik
> Subject: Re: OpenSM detection of duplicated GUIDs on loopback
>
> On 09:25 Thu 26 Jul , Eitan Zahavi wrote:
> > > Hi Eitan, Hal,
> > >
> > > On 20:44 Wed 25 Jul , Eitan Zahavi wrote:
> > > >
> > > > I am not following you.
> > > > Why do a user need to run -y if a simple legal cable connector is
> > > > plugged?
> > >
> > > Because duplicated GUIDs detector can aborts OpenSM when regular
> > > port is reconnected to another location during hard sweep. This
> > > issue is not related to loopback plug at all.
> > I think we should handle the case of "migrated port" in a more global
> > sense:
> > If a port "moved" during the sweep we have to do a new sweep anyway.
>
> Another option is just to use recently discovered port
> location. In case of CA it could work, switch migration can
> be more complicated.
>
> > Maybe we could delay the 'abort' to the second sweep.
> >
> > So practically I propose:
> > 1. Add state flag "was duplicated" on the port saying it was reported
> >    as duplicate GUID.
> > 2. Set the variable controlling a forced second sweep (similar to the
> >    one used if we got Set error)
>
> We even can catch this yet before drop_manager and just rediscover.
>
> > 3. Repeat the sweep - if we find a port where it is a duplicate and
> >    the "was duplicated" flag is set - abort.
> >
> > A refinement for the user who is doing many changes continuously might
> > be to keep a counter.
> > And have the abort happen after the Nth iteration.
>
> It is better approach than what we have today.
>
> > > > The issue is only if a "loop back" plug connecting a port to itself is
> > > > plugged.
> > >
> > > No, not only. Now there are two completely separate known issues
> > > with duplicated GUIDs detector:
> > >
> > > 1. Port moving
> > > 2. Loopback plug
> > >
> > > And I think that _both_ should be solved. And if just using '-y'
> > > could be suitable for (2) because it is esoteric (although perfectly
> > > legal) use, it is not acceptable solution for (1).
> > >
> > > I think we need to improve GUIDs duplication detector instead. For
> > > example we could add NodeInfo comparison there, and only in case if
> > > it is different drop GUIDs duplication error. Also I think this
> > > should not be fatal error and should not abort OpenSM, just logging
> > > (probably via syslog too) should be sufficient - non-working port is
> > > good reason to look at logs. Another ideas?
> > The problem is that the SM will sort of figure out the network but
> > will create a completely bogus routing etc.
>
> Right. But it is not so with back-to-back (when loopback plug
> could be interpreted as back-to-back duplicated GUID). So no
> need to abort in this (back-to-back/loopback) case. Agreed?
>
> Sasha
>
> > >
> > > Sasha
> > >
> > > > Do users use these plugs? For what sake?
> > > >
> > > > Eitan Zahavi
> > > > Senior Engineering Director, Software Architect
> > > > Mellanox Technologies LTD
> > > > Tel:+972-4-9097208
> > > > Fax:+972-4-9593245
> > > > P.O. Box 586 Yokneam 20692 ISRAEL
> > > >
> > > > > -----Original Message-----
> > > > > From: Sasha Khapyorsky [mailto:[EMAIL PROTECTED]
> > > > > Sent: Wednesday, July 25, 2007 3:19 AM
> > > > > To: Eitan Zahavi
> > > > > Cc: Hal Rosenstock; OpenFabrics General; Yevgeny Kliteynik
> > > > > Subject: Re: OpenSM detection of duplicated GUIDs on loopback
> > > > >
> > > > > On 23:25 Tue 24 Jul , Eitan Zahavi wrote:
> > > > > >
> > > > > > On 7/24/07, Eitan Zahavi <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Maybe avoid the log if -y is provided?
> > > > > >
> > > > > > That avoids the spew but the duplicated GUID is important to know so
> > > > > > IMO something in the "middle" is needed where duplicated GUIDs are
> > > > > > logged but not continually the same ones.
> > > > > > [EZ]
> > > > > > OK so in -y mode only we track which ones were reported and do not
> > > > > > repeat the log?
> > > > >
> > > > > And how port moving problem should be solved?
> > > > >
> > > > > We cannot ask an user to run OpenSM with '-y' if in her/his plans to
> > > > > reconnect some ports in a future and just decrease logging.
> > > > >
> > > > > Sasha

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
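To make the two ideas discussed in this thread concrete (re-checking the original DR before declaring a duplicate, and delaying the abort until the same GUID is still duplicated on the Nth consecutive sweep), here is a rough sketch in Python. It is not OpenSM code: `classify_duplicate`, `DupTracker`, `query_guid_at` and `max_sweeps` are hypothetical names made up for illustration, and the sketch assumes the caller can re-query a direct route and gets back the GUID that answers there (or None).

```python
from dataclasses import dataclass, field


def classify_duplicate(guid, original_dr, query_guid_at):
    """Second check of the original direct route (DR).

    query_guid_at is a hypothetical callable: it takes a DR and returns
    the GUID currently answering there, or None if nothing answers.
    """
    if query_guid_at(original_dr) == guid:
        return "duplicate"  # same GUID still visible at both locations
    return "moved"          # the old location no longer reports this GUID


@dataclass
class DupTracker:
    """Per-GUID "was duplicated" state with a counter (hypothetical)."""
    max_sweeps: int = 2  # N: abort on the Nth consecutive duplicated sweep
    counts: dict = field(default_factory=dict)

    def end_of_sweep(self, duplicates):
        """duplicates: set of GUIDs still classified "duplicate" this sweep.

        Returns "ok", "resweep" (force another sweep) or "abort".
        """
        # GUIDs that came up clean in this sweep lose their flag.
        self.counts = {g: self.counts.get(g, 0) + 1 for g in duplicates}
        if not self.counts:
            return "ok"
        if max(self.counts.values()) >= self.max_sweeps:
            return "abort"
        return "resweep"
```

With `max_sweeps=2` this corresponds to steps 1 to 3 of the proposal: the first sighting only sets the flag and forces a second sweep, and the abort happens only if the same GUID is duplicated again. A larger `max_sweeps` gives the "Nth iteration" refinement for a user who keeps re-cabling. How the back-to-back/loopback case should be exempted is left out here, since that is still under discussion.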
