Title: RE: [openib-general] IPoIB still not working

Forgive me for not following the entire thread.
But I did take a look at the log files:

The 64-bit version shows the following multicast activity:
1. Port 0x0002c9010ad258f1 joining MLID 0xC000 -> success.
   Note that MLID 0xC000 is predefined (IPoIB).
            MGID....................0xff12401bffff0000 : 0x00000000ffffffff
            PortGid.................0xfe80000000000000 : 0x0002c9010ad258f1
            qkey....................0x0
            Mlid....................0x0
            ScopeState..............0x1
            Rate....................0x0
            Mtu.....................0x0
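The MGID above can be split into its fields for reference. The Python
sketch below assumes the IPoIB MGID layout from RFC 4391 (0xff prefix,
4-bit flags, 4-bit scope, 16-bit signature, 16-bit P_Key, then group bits).
parse_mgid is a hypothetical helper, not an opensm function. For this MGID
it gives scope 0x2 (link-local), signature 0x401b (IPv4), P_Key 0xffff and
low 32 bits 0xffffffff, i.e. the IPv4 broadcast group.

```python
def parse_mgid(hi, lo):
    """Split an IPoIB MGID, given as the two 64-bit halves opensm prints,
    into its RFC 4391 fields (layout assumed, see lead-in)."""
    return {
        "prefix": (hi >> 56) & 0xFF,        # 0xff for every multicast GID
        "flags": (hi >> 52) & 0xF,          # 0x1 (transient) in these logs
        "scope": (hi >> 48) & 0xF,          # 0x2 = link-local
        "signature": (hi >> 32) & 0xFFFF,   # 0x401b = IPv4, 0x601b = IPv6
        "pkey": (hi >> 16) & 0xFFFF,
        "low32": lo & 0xFFFFFFFF,           # 0xffffffff for IPv4 broadcast
    }

fields = parse_mgid(0xff12401bffff0000, 0x00000000ffffffff)
for name, value in fields.items():
    print(f"{name:10s} 0x{value:x}")
```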

2. Port 0x0002c9010ad258f1 joining MLID 0xC000 (again).
            MGID....................0xff12401bffff0000 : 0x00000000ffffffff
            PortGid.................0xfe80000000000000 : 0x0002c9010ad258f1
            qkey....................0x1B0B0000
            Mlid....................0xC000
            ScopeState..............0x11
            Rate....................0x3
            Mtu.....................0x4
    -> considered as an update to the scope state.
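In the reply above, Rate and Mtu are IBA enumerated codes rather than raw
values. A minimal lookup, assuming the standard IBA encodings (only a few
common codes are listed, and describe_record is a hypothetical name), reads
them as a 2048-byte MTU at 10 Gb/s.

```python
# IBA enumerated values for the Mtu and Rate record components.
# Only a few common codes are listed; anything else is reported as unknown.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}
IB_RATE = {2: "2.5 Gb/s", 3: "10 Gb/s", 4: "30 Gb/s"}

def describe_record(rate, mtu):
    """Turn the raw Rate/Mtu codes from an opensm log dump into text."""
    mtu_str = (f"{IB_MTU_BYTES[mtu]} bytes" if mtu in IB_MTU_BYTES
               else f"code {mtu:#x} (unknown)")
    rate_str = IB_RATE.get(rate, f"code {rate:#x} (unknown)")
    return f"MTU {mtu_str}, rate {rate_str}"

print(describe_record(rate=0x3, mtu=0x4))  # MTU 2048 bytes, rate 10 Gb/s
```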

3. Request to join:
            MGID....................0xff12601bffff0000 : 0x0000000000000016
            PortGid.................0xfe80000000000000 : 0x0002c9010ad258f1
            qkey....................0x0
            Mlid....................0x0
            ScopeState..............0x1
            Rate....................0x0
            Mtu.....................0x0
Result: ERR 1B10: Provided Join State != FullMember - required for create.
You cannot create a group if you are not a full member.
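The create rule can be pictured as a small decision function. This is a
hypothetical sketch, not opensm's code: handle_join and the groups
dictionary are made up for illustration, and the JoinState bit values
(FullMember 0x1, NonMember 0x2, SendOnlyNonMember 0x4) are assumed from the
IBA MCMemberRecord definition.

```python
# JoinState bits of an MCMemberRecord (assumed from the IBA definition).
FULL_MEMBER = 0x1
NON_MEMBER = 0x2
SEND_ONLY_NON_MEMBER = 0x4

CREATE_ERR = "ERR 1B10: Provided Join State != FullMember - required for create"

def handle_join(groups, mgid, port_guid, join_state):
    """Hypothetical SA-side handling of a join request.

    groups maps an MGID (hi, lo) tuple to {port_guid: join_state}.
    """
    if mgid in groups:
        # Existing group: merge the new JoinState bits into the port's entry.
        members = groups[mgid]
        members[port_guid] = members.get(port_guid, 0) | join_state
        return "joined"
    if not (join_state & FULL_MEMBER):
        # A join that would create the group must come from a full member.
        return CREATE_ERR
    groups[mgid] = {port_guid: join_state}
    return "created"

# The IPoIB broadcast group (MLID 0xC000) is predefined, so joining it
# never takes the create path.
broadcast = (0xff12401bffff0000, 0x00000000ffffffff)
groups = {broadcast: {}}
port = 0x0002c9010ad258f1
print(handle_join(groups, broadcast, port, FULL_MEMBER))  # joined
print(handle_join(groups, (0xff12601bffff0000, 0x16), port,
                  SEND_ONLY_NON_MEMBER))  # ERR 1B10: ... required for create
```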

4. A sequence of requests arrives to create MGRPs with several MGIDs:
MGID 0xff12601bffff0000:0x0000000000000002
MGID 0xff12601bffff0000:0x0000000000000016
MGID 0xff12601bffff0000:0x00000001ffd258f1
All fail due to the same join state issue.
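Assuming the RFC 4391 mapping from IPv6 multicast addresses to IPoIB MGIDs
(ff1<scope>:601b:<P_Key> followed by the low 80 bits of the IPv6 group
address), these MGIDs can be reproduced as below. The IPv6 addresses
ff02::2, ff02::16 and ff02::1:ffd2:58f1 are inferred from the MGIDs, not
taken from the logs, and ipv6_mcast_to_mgid is a hypothetical helper. The
last group's low 24 bits (d258f1) match the low 24 bits of port GUID
0x0002c9010ad258f1, consistent with a solicited-node address derived from
that port's link-local address.

```python
import ipaddress

def ipv6_mcast_to_mgid(addr, pkey=0xFFFF):
    """Map an IPv6 multicast address to an IPoIB MGID (RFC 4391 layout
    assumed): ff | flags 1 | scope | 601b | P_Key | low 80 bits of addr."""
    a = int(ipaddress.IPv6Address(addr))
    if a >> 120 != 0xFF:
        raise ValueError(f"{addr} is not a multicast address")
    scope = (a >> 112) & 0xF               # 0x2 for ff02::/16 (link-local)
    mgid = ((0xFF << 120)
            | ((0x10 | scope) << 112)      # flags fixed at 1, as in the logs
            | (0x601B << 96)               # IPv6 signature
            | (pkey << 80)
            | (a & ((1 << 80) - 1)))       # low 80 bits of the group address
    # Print as the two 64-bit halves, the way opensm logs an MGID.
    return f"0x{mgid >> 64:016x} : 0x{mgid & ((1 << 64) - 1):016x}"

for group in ("ff02::2", "ff02::16", "ff02::1:ffd2:58f1"):
    print(f"{group:20s} -> {ipv6_mcast_to_mgid(group)}")
```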

Inspecting the 32-bit version, I see only one join request (port
0x0002c90107fc5be1 joining MLID 0xC000), and it succeeds.

Hope this helps.    

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


-----Original Message-----
From: Woodruff, Robert J [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 08, 2004 3:12 AM
To: Roland Dreier
Cc: [EMAIL PROTECTED]
Subject: RE: [openib-general] IPoIB still not working

 
Here are some log files.

The first file, mcast-64.log, is the /var/log/messages output from the
64-bit system with the patch you sent.

The next file, osm-64bit.log, is the opensm log file for the 64-bit system.

The last file, osm-32-bit.log, is the opensm log file when running the
32-bit node.


In the passing case, ipoib sends 2 MCM messages and opensm has no
complaints; search for "MCMember Record" in osm-32-bit.log.

In the failing case, ipoib sends 2 MCM messages that look similar, with no
errors reported. However, ipoib then continues to send MCM messages that
opensm rejects. There are a couple of differences in the failing case.
First, the lower 32 bits of the MGID appear to be 0xffffffff in the passing
case and something else when it fails. Second, opensm may be rejecting the
messages because of a bug where the scope and join fields are reversed when
extracted from the MAD. In the passing case, since the lower 32 bits of the
MGID are 0xffffffff, you never get to the code that checks the join member.
Someone who understands opensm should look at this, but Sean I think it may
be wrong.
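The reversed-field theory can be checked against the ScopeState values in
the 64-bit log. This sketch assumes the IBA layout, with Scope in the high
nibble of the byte and JoinState in the low nibble; both function names are
hypothetical and neither is opensm's extraction code. Read in spec order,
0x1 carries the FullMember bit; read with the nibbles swapped it decodes to
join state 0, which would trigger "Join State != FullMember". The 0x11 seen
in the second join decodes the same either way.

```python
FULL_MEMBER = 0x1  # JoinState bit 0 (assumed IBA value)

def split_scope_state(scope_state):
    """Spec order: Scope in the high nibble, JoinState in the low nibble."""
    return (scope_state >> 4) & 0x0F, scope_state & 0x0F

def split_scope_state_swapped(scope_state):
    """The suspected bug: the same byte with the two nibbles exchanged."""
    return scope_state & 0x0F, (scope_state >> 4) & 0x0F

for raw in (0x1, 0x11):  # ScopeState values from the 64-bit opensm log
    _, join_spec = split_scope_state(raw)
    _, join_swapped = split_scope_state_swapped(raw)
    print(f"ScopeState {raw:#04x}: join state {join_spec:#x} in spec order, "
          f"{join_swapped:#x} swapped; "
          f"full member when swapped: {bool(join_swapped & FULL_MEMBER)}")
```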

This, however, does not explain why ipoib keeps trying to join the
multicast group in the failing case, unless it is having difficulties after
trying to join the group and decides to retry, with the subsequent retries
being rejected by opensm.

_______________________________________________
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
