Re: [ewg] Update from September OpenFabrics Interoperability Event at UNH-IOL

2009-09-05 Thread Bob Noseworthy

Yes indeed; please refer to the Logo List published in June 09 for details:
http://www.iol.unh.edu/services/testing/ofa/interoplist/09jun/#rnic

Best Regards,
- Bob Noseworthy
 Chief Engineer / Technical Sherpa
 +1-909-891-0090 {unified phone number for office, cell, etc}
 +1-603-862-0090 {IOL Main number-associate this with any shipments}
 University of New Hampshire's InterOperability Laboratory (UNH-IOL)

pandit ib wrote:

Has there been any new interoperability testing between the iWARP
vendors since Oct 08?

Ranjit



Re: [ewg] Update from September OpenFabrics Interoperability Event at UNH-IOL

2009-09-04 Thread pandit ib
Has there been any new interoperability testing between the iWARP
vendors since Oct 08?

Ranjit



[ewg] Update from September OpenFabrics Interoperability Event at UNH-IOL

2008-10-21 Thread Bob Noseworthy

Greetings EWG members,
 A bug for the observed IPoIB issue was logged last Friday,  and 
updated yesterday confirming that RC3 still demonstrates the issue. 
This is logged as #1287 --  
https://bugs.openfabrics.org/show_bug.cgi?id=1287


Further issues/observations from the recent OFA Interoperability Logo 
Group's September Interoperability Event are at the end of this email.  


Summary of reported IPoIB issue:
If IPoIB datagram mode is enabled, IP frames of 8K or larger are sent,
and no ARP entry exists for the destination, then the first IP frame is
always lost (tested with ping), no matter how high the timeout is set
(values as high as 15 s were tried).
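
A minimal sketch of how the conditions above might be set up for a
reproduction attempt, assuming a Linux host with iproute2 and iputils
ping. The interface name ib0 and the peer address are placeholders, not
values from the event. The script only builds the command lines so they
can be reviewed before being run as root.

```python
import shlex


def ipoib_repro_commands(peer, iface="ib0", payload=8192, timeout_s=15):
    """Return the command lines for one reproduction attempt.

    peer and iface are placeholders (assumptions). payload is the ICMP
    payload size in bytes, so the resulting IP frame is slightly larger.
    """
    return [
        # Show whether the interface is in datagram or connected mode.
        ["cat", f"/sys/class/net/{iface}/mode"],
        # Drop cached neighbour (ARP) entries so the first ping must resolve.
        ["ip", "neigh", "flush", "dev", iface],
        # One echo request with an 8K payload, waiting up to timeout_s seconds.
        ["ping", "-c", "1", "-s", str(payload), "-W", str(timeout_s), peer],
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address used purely as a placeholder.
    for cmd in ipoib_repro_commands("192.0.2.10"):
        print(shlex.join(cmd))
```

If the behaviour described above is present, the single echo request
would be expected to go unanswered, while a second ping sent right
afterwards (with the neighbour entry now populated) would succeed.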



The following is a short summary of various updates from the September 
OpenFabrics Interoperability Event.  Due to confidentiality reasons, 
many details are occluded.  Per the request of the IWG on Oct 14, this 
information is being shared with the EWG.


==


Below are rough notes from our testers, principally Nick Wood and Mike 
Hagen.

IB update;

1. An SDP issue was observed once and not reproduced. It is suspected to
have been caused by starting testing too soon after netserver was
started, while all three SDP tests were running simultaneously. On
retest, the tests were not run simultaneously and no issues were seen.


2. An SRP issue was observed once and not reproduced. A vendor's SRP
target was seen to become unresponsive when srp_sg_tablesize was
increased to 255. Subsequent testing did not reproduce this behavior,
but it is still being pursued.


2a.  A vendor's HCA was seen to perform slowly on SRP transfers. This
was traced to the default srp_sg_tablesize of 16, which had to be
increased to 131 for reasonable performance. Reminder - performance
is outside the scope of the Logo program.
Tziporet - this default value could perhaps be increased as recommended,
unless there is a reason 16 is preferred.
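
For anyone wanting to try the larger value, here is a rough sketch that
reads the current setting and renders a modprobe options line. It
assumes the parameter belongs to the ib_srp module and is exposed under
/sys/module on the host. The 1 to 255 bound is an assumption taken from
the largest value mentioned in item 2, not a documented limit.

```python
from pathlib import Path

# Assumed sysfs location of the ib_srp module parameter on a Linux host.
PARAM_PATH = Path("/sys/module/ib_srp/parameters/srp_sg_tablesize")


def read_sg_tablesize(path=PARAM_PATH):
    """Return the current srp_sg_tablesize, or None if the file is absent."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def modprobe_option_line(value):
    """Render a modprobe options line that sets srp_sg_tablesize at load time."""
    # Assumed bound: 255 is the largest value mentioned in these notes.
    if not 1 <= value <= 255:
        raise ValueError("srp_sg_tablesize is expected to be between 1 and 255")
    return f"options ib_srp srp_sg_tablesize={value}"


if __name__ == "__main__":
    print("current:", read_sg_tablesize())
    print(modprobe_option_line(131))
```

The rendered line would go into the modprobe configuration used by the
distribution, and ib_srp would need to be reloaded for it to take effect.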




3. There is a link issue between two vendors' HCA cards. The fix
that was introduced allowed the link indication light to come up;
however, ibdiagnet never completes (it hangs at the IPoIB subnets check)
and had to be killed. Ibdiagnet also reports the following error:



-I---
-I- PM Counters Info
-I---
-E- Could not get PM info:
  pmGetPortCounters 0x 1 failed 4 consecutive times.
-E- Could not get PM info:
  pmGetPortCounters 0x 1 failed 4 consecutive times.
-I- No illegal PM counters values were found

  This happens with both VendorA cards when linked to any speed card 
from VendorB *without* an sm running. If there is an sm running and the 
fix is in place on the machines housing the VendorA cards then 
everything works flawlessly when linked with any speed VendorB card.


  Upon removal of the cable from the VendorA card, with the fix in place
and an sm running, that card is put into a bad state: the sm does not
activate the newly established link. This happened with VendorA
cards to any VendorB card. OpenSM also reports an error on screen:
"OpenSM: SM port is down". Reestablishing the connection that was in
place when the opensm instance was started restores the active state.


  One final bit of information that I have been able to glean: it does
not appear to matter whether you restore the original connection that
the opensm was started on. The only connection that brings the card back
to an active state is a link to a QDR HCA, even if that connection
was not the original. If you then attempt to restore the original
connection, the active state will not be restored.

Currently this issue is presumed to be principally a vendor matter, but 
if evidence points to additional issues with ibdiagnet, or other OFED 
matters, then bugs will be filed.
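
As an aid for comparing runs, here is a small sketch that tallies the
severity tags in ibdiagnet output such as the PM Counters excerpt quoted
in item 3. It assumes report lines start with -I-, -W- or -E- and that
the detail of an error sits on the following untagged line, as it does
in that excerpt. The sample text is copied from the excerpt unchanged.

```python
import re
from collections import Counter

# Assumed line format: a severity tag (-I-, -W-, -E-) followed by the message.
TAG_RE = re.compile(r"^\s*-([IWE])-\s?(.*)$")


def summarize(log_text):
    """Count lines per severity tag and collect (message, detail) for errors."""
    counts = Counter()
    errors = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        m = TAG_RE.match(line)
        if not m:
            continue
        tag, message = m.groups()
        counts[tag] += 1
        if tag == "E":
            # Take the next line as detail only if it carries no tag itself.
            detail = ""
            if i + 1 < len(lines) and not TAG_RE.match(lines[i + 1]):
                detail = lines[i + 1].strip()
            errors.append((message.strip(), detail))
    return counts, errors


SAMPLE = """\
-I---
-I- PM Counters Info
-I---
-E- Could not get PM info:
  pmGetPortCounters 0x 1 failed 4 consecutive times.
-E- Could not get PM info:
  pmGetPortCounters 0x 1 failed 4 consecutive times.
-I- No illegal PM counters values were found
"""

if __name__ == "__main__":
    counts, errors = summarize(SAMPLE)
    print(dict(counts))
    for message, detail in errors:
        print(message, detail)
```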



4.  Similar to the above issue, it was observed that two vendors' HCAs
that should link at DDR when directly connected were actually linking at
SDR speeds, regardless of the cable used. This is a known issue; however,
it seems to be a failure of the Link Init test procedure, as the highest
common speed is not achieved.
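
The Link Init expectation in item 4, that two directly connected ports
should come up at the fastest speed both support, can be sketched as a
check on the LinkSpeedSupported bit masks of the two ports. The bit
values below follow the PortInfo encoding as I understand it (1 = 2.5,
2 = 5.0, 4 = 10.0 Gb/s). The function names are illustrative and are not
part of any OFED tool.

```python
# PortInfo LinkSpeedSupported bits (assumed encoding, see lead-in).
SPEED_BITS = {1: "SDR", 2: "DDR", 4: "QDR"}


def highest_common_speed(supported_a, supported_b):
    """Return the fastest speed name both masks contain, or None."""
    common = supported_a & supported_b
    for bit in sorted(SPEED_BITS, reverse=True):
        if common & bit:
            return SPEED_BITS[bit]
    return None


def link_init_ok(active_speed, supported_a, supported_b):
    """True if the link came up at the fastest speed common to both ports."""
    return active_speed == highest_common_speed(supported_a, supported_b)


if __name__ == "__main__":
    # Two DDR-capable ports (mask 3 = SDR + DDR) that linked at SDR.
    print(highest_common_speed(3, 3))
    print(link_init_ok("SDR", 3, 3))
```

With both ports advertising SDR and DDR, a link that settles at SDR
fails this check, which matches the behavior described in item 4.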


5. An issue with ibdiagnet was discovered by a vendor and bugs were
submitted (unrelated to issue 3 above).


==

iWARP update;

1. dapltest -T P will not work between two vendors' cards. Each vendor
has implemented a different peer2peer protocol to ensure that the client
does a transfer before the server, to overcome the limitation in the
iWARP standard that says a client must send first data or the connection
must be torn down.


2. The section in the IWG test suite covering dapl must be updated to
include at least some reference to /etc/dat.conf, which must be
configured in order to use any dapl-based application, including many
MPIs and dapltest.   (This was being addressed by Arlin Davis)
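
Until the test suite text is updated, a quick sanity check of
/etc/dat.conf may help. The sketch below assumes the usual eight-field
entry layout (interface adapter name, API version, thread safety,
default flag, library path, provider version, adapter parameters,
platform parameters). The sample entry and the device name eth2 are
illustrative only and were not taken from the event.

```python
import shlex

# Assumed field order of one dat.conf entry (see lead-in).
FIELDS = ("ia_name", "api_version", "threadsafety", "default",
          "lib_path", "provider_version", "ia_params", "platform_params")


def parse_dat_conf(text):
    """Parse dat.conf text into a list of dicts, skipping comments and blanks."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # shlex keeps quoted parameters such as "eth2 0" together.
        parts = shlex.split(line)
        if len(parts) != len(FIELDS):
            raise ValueError(f"unexpected field count in: {raw!r}")
        entries.append(dict(zip(FIELDS, parts)))
    return entries


# Illustrative entry only; the device and library names are assumptions.
SAMPLE = '''
# sample entry
OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""
'''

if __name__ == "__main__":
    for entry in parse_dat_conf(SAMPLE):
        print(entry["ia_name"], "->", entry["ia_params"])
```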


3. dapl2.0 and dapltest2.0 do not work with iWARP devices.  From the 
base OFED1.4 install dapl2.0-utils must be uninstalled and