date:20100205

Re: [RFC Patch] net: reserve ports for applications using fixed port numbers

2010-02-05 Thread Tetsuo Handa

Cong Wang wrote:
 The problem is that some existing applications use fixed port
 numbers, and we have no chance to change them. Making them work is
 desired, so people want to reserve those ports for those applications.
 
 For example, suppose I have an application which uses port 4, but
 before it starts, another application gets that port number via
 bind() with port 0 (i.e. a port chosen by the kernel). In this case,
 the first application will fail to start. Again, we don't have any
 chance to change the source code of that application.
 
And there is a utility called portreserved (port reserve daemon).
http://fedoraproject.org/wiki/Features/Portreserve

But that utility cannot close the race window between the moment portreserved
stops reserving local port numbers and the moment applications start using the
port numbers portreserved was reserving.

Thus, I think people want a port reservation mechanism inside the kernel
(if it has little impact).
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC Patch] net: reserve ports for applications using fixed port numbers

2010-02-05 Thread Octavian Purdila

On Friday 05 February 2010 06:45:38 you wrote:

 Again, using a bitmap is not a problem, and it's better; the
 problem is the sysctl interface. How would you plan to interact with
 users via sysctl/proc if you use a bitmap to handle this? I would like
 to hear more details about this.
 

We could use something like positive values for setting and negative for reset 
(e.g. 3 would set the port in the bitmap and -3 would reset it).

But we would need new sysctl and proc handlers to handle the bitmap case (e.g. 
sysctl_bitmap, proc_dobitmap_minmax).
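Octavian's signed-value encoding can be modeled in a few lines of C. This is a hypothetical userspace sketch, not kernel code: `port_bitmap_write`, `port_is_reserved` and the one-bit-per-port layout are illustrative assumptions, and the sysctl/proc plumbing (the proposed `sysctl_bitmap`/`proc_dobitmap_minmax` handlers) is left out entirely.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace model of a reserved-ports bitmap (not kernel code). */
#define NPORTS        65536
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static unsigned long reserved_ports[NPORTS / BITS_PER_WORD];

/* Apply one value written through the (hypothetical) sysctl:
 * +N reserves port N, -N releases it.  Returns 0 on success, -1 if the
 * value is out of range.  0 is rejected because "-0" cannot express a reset. */
static int port_bitmap_write(long value)
{
    if (value == 0 || value >= NPORTS || value <= -NPORTS)
        return -1;

    unsigned long port = (unsigned long)(value > 0 ? value : -value);
    unsigned long mask = 1UL << (port % BITS_PER_WORD);

    if (value > 0)
        reserved_ports[port / BITS_PER_WORD] |= mask;   /* set */
    else
        reserved_ports[port / BITS_PER_WORD] &= ~mask;  /* reset */
    return 0;
}

/* In this model, an automatic (port 0) bind would skip any port for which
 * this returns true. */
static bool port_is_reserved(unsigned int port)
{
    assert(port < NPORTS);
    return (reserved_ports[port / BITS_PER_WORD] >> (port % BITS_PER_WORD)) & 1UL;
}
```

Writing 4 reserves port 4 and writing -4 releases it. One quirk of the signed encoding shows up immediately: port 0 has no negative form, so the sketch rejects 0; that costs nothing here, since bind() with port 0 already means "let the kernel choose".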

Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-05 Thread Hal Rosenstock

On Thu, Feb 4, 2010 at 7:13 PM, Ira Weiny wei...@llnl.gov wrote:
 On Thu, 4 Feb 2010 15:01:32 -0500
 Hal Rosenstock hal.rosenst...@gmail.com wrote:

 On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny wei...@llnl.gov wrote:
  On Thu, 4 Feb 2010 09:19:39 -0500
  Hal Rosenstock hal.rosenst...@gmail.com wrote:
 
  On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny wei...@llnl.gov wrote:
   Sasha,
  

 [snip]

  
   real    0m2.249s
   user    0m1.244s
   sys     0m0.936s
  
   14:40:59  time ./ibnetdiscover -o 4 --node-name-map /etc/opensm/ib-node-name-map -g > new
  
   real    0m2.170s
   user    0m1.160s
   sys     0m0.933s
  
   14:41:10  /usr/sbin/ibqueryerrors  -s RcvErrors,SymbolErrors,RcvSwRelayErrors,XmtWait -r --data
   Suppressing: RcvErrors SymbolErrors RcvSwRelayErrors XmtWait
   Errors for 0x66a00d90006fb SW19
     GUID 0x66a00d90006fb port 9: [VL15Dropped == 3] [XmtData == 25187379] [RcvData == 25196688] [XmtPkts == 349861] [RcvPkts == 349954]
         Link info:    139   9[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==  0x0002c9030001d736    864    1[  ] hyperion1 ( )
  
   Note that there were no additional VL15Dropped packets on the fabric.  
   I think 4 seems to be a good compromise.  I have not tested when there 
   are errors on the fabric.  (Right now things seem to be good!)
 
  Is this just with the SM doing light sweeping ?
 
  Yes.

 That's not a lot of SMP stress from the SM side. SMP consumers are SM,
 diags, and the unsolicited traps.

 Agreed.  I hope to test this more next week.

 
 
  Is there a speedup with 4 rather than 2 ?
 
  There is a bit of a speed up (~0.5 to 1.0 sec).  But my main reason to want
  to go to 4 is that if there are issues on the fabric, unresponsive nodes etc.;
  4 will give us better parallelism to get around these issues.  I have not had
  the chance to test this condition with the new algorithm but the original
  ibnetdiscover would slow way down when there are nodes which have
  unresponsive SMA's.  If there are only 2 outstanding this will not give us
  much speed up.  This was the main motivation I had for improving the library
  in this way.
 
  Also, I think you are correct that we should increase OpenSM's default from
  4 to 8.  For the same reason as above.  Some of our clusters have worked
  better with 8 when we are having issues.  But right now we are still running
  with 4.

 I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
 I've seen a number of clusters with SMP dropping with the current
 lower defaults.

 So OpenSM is seeing dropped packets?

OpenSM is seeing timeouts and there are VL15 drops in the subnet.

 With 4 SMP's on the wire?

Yes.

 I do see some
 VL15Dropped errors (maybe 2-3 a day) but I did not think that would be an
 issue.  What kind of rate are you seeing?

 The other question is; do people regularly run the tools which are using
 libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?

These tools are being used (at least ibnetdiscover and ibqueryerrors).

 We do.  If others
 are not then I would say this change would have less impact as they would want
 the diags to have some priority for debugging.  The other option is to change
 the patch to be a default of 2 and allow user to change it depending on what
 they are trying to do.  If you think that is best I will change the patch.

FWIW I think 2 is better until we have more exhaustive experience with
4. The other alternative would be to make it 4 and then see if people
start noticing (more) VL15 drops and possibly other issues.

-- Hal

 Ira


 -- Hal

  Ira
 
 
  -- Hal
 
  
   The first patch converts the algorithm and the second adds the 
   ibnd_set_max_smps_on_wire call.
  
   Let me know what you think.  Because the algorithm changed so much 
   testing this is a bit difficult because the order of the node discovery 
   is different.  However, I have done some extensive diffing of the 
   output of ibnetdiscover and things look good.
  
   Ira
  
   --
   Ira Weiny
   Math Programmer/Computer Scientist
   Lawrence Livermore National Lab
   925-423-8008
   wei...@llnl.gov

Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-05 Thread Hal Rosenstock

On Thu, Feb 4, 2010 at 9:18 PM, Ira Weiny wei...@llnl.gov wrote:
 On Thu, 4 Feb 2010 16:13:25 -0800
 Ira Weiny wei...@llnl.gov wrote:

 On Thu, 4 Feb 2010 15:01:32 -0500
 Hal Rosenstock hal.rosenst...@gmail.com wrote:

  On Thu, Feb 4, 2010 at 1:00 PM, Ira Weiny wei...@llnl.gov wrote:
   On Thu, 4 Feb 2010 09:19:39 -0500
   Hal Rosenstock hal.rosenst...@gmail.com wrote:
  
   On Tue, Feb 2, 2010 at 7:45 PM, Ira Weiny wei...@llnl.gov wrote:
Sasha,
   

 [snip]

 [snip]

  
   Is there a speedup with 4 rather than 2 ?
  
   There is a bit of a speed up (~0.5 to 1.0 sec).  But my main reason to want
   to go to 4 is that if there are issues on the fabric, unresponsive nodes
   etc.; 4 will give us better parallelism to get around these issues.  I have
   not had the chance to test this condition with the new algorithm but the
   original ibnetdiscover would slow way down when there are nodes which have
   unresponsive SMA's.  If there are only 2 outstanding this will not give us
   much speed up.
   This was the main motivation I had for improving the library in this way.

 Ok, I found a fabric with just 2 nodes which were unresponsive...  A quick
 test shows...

 Original ibnetdiscover:

 18:12:29  time ./ibnetdiscover > foo
 ibwarn: [26993] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,1,24,11,9)
 src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,11,9) failed, skipping port
 ibwarn: [26993] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,1,24,24,18,7,6)
 src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,24,18,7,6) failed, skipping port

 real    0m9.073s
 user    0m0.137s
 sys     0m0.172s

 18:12:43  time ./ibnetdiscover > foo
 ibwarn: [3] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,1,24,11,9)
 src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,11,9) failed, skipping port
 ibwarn: [3] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,1,24,24,18,7,6)
 src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,1,24,24,18,7,6) failed, skipping port

 real    0m9.103s
 user    0m0.046s
 sys     0m0.046s


 *New* ibnetdiscover with different outstanding SMP's.

 18:12:14  time ./ibnetdiscover -o 2 > foo
 src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) bad status 110; Connection timed out
 src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 0x11:0) bad status 110; Connection timed out

 real    0m9.746s
 user    0m6.559s
 sys     0m3.156s

 18:13:00  time ./ibnetdiscover -o 4 > foo
 src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) bad status 110; Connection timed out
 src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 0x11:0) bad status 110; Connection timed out

 real    0m4.668s
 user    0m3.043s
 sys     0m1.601s

 18:13:10  time ./ibnetdiscover -o 8 > foo
 src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,11,9 Attr 0x11:0) bad status 110; Connection timed out
 src/query_smp.c:185; umad (DR path slid 0; dlid 0; 0,1,13,13,7,7,6 Attr 0x11:0) bad status 110; Connection timed out

 real    0m4.360s
 user    0m2.891s
 sys     0m1.451s


 Note that 2 does not give much speedup, whereas 4 does.  Obviously this could
 have to do with the fact that there were 2 bad nodes (so if you had
 hundreds of unresponsive nodes, a higher value might be worth using)

It depends on whether the number of unresponsive nodes is the same as or
higher than the number of outstanding/parallel SMPs. In a sense, the number of
outstanding SMPs is a measure of how many unresponsive nodes one is
willing to tolerate before slowing down/waiting for timeouts. In some
environments, unresponsive nodes are a normal case.
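Hal's point can be illustrated with a toy timing model. This is an assumption, not libibnetdisc's actual scheduler: queries are issued in order, at most `window` SMPs are outstanding, a responsive node answers after `rtt_ms`, and an unresponsive one holds its slot for `timeout_ms`. The model ignores the fact that real discovery is breadth-first and later queries depend on earlier replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WINDOW 64 /* arbitrary cap for this toy model */

/* Toy model (hypothetical, not libibnetdisc code): issue queries in order,
 * keeping at most `window` SMPs outstanding.  Each query goes to whichever
 * slot frees up first; a responsive node occupies its slot for rtt_ms,
 * an unresponsive one for timeout_ms.  Returns the time the last query ends. */
static double discovery_time_ms(const bool *unresponsive, size_t nqueries,
                                size_t window, double rtt_ms, double timeout_ms)
{
    double slot_free_at[MAX_WINDOW] = {0.0};
    double finish = 0.0;

    assert(window >= 1 && window <= MAX_WINDOW);
    for (size_t i = 0; i < nqueries; i++) {
        size_t s = 0;
        for (size_t j = 1; j < window; j++)   /* earliest-free slot */
            if (slot_free_at[j] < slot_free_at[s])
                s = j;
        slot_free_at[s] += unresponsive[i] ? timeout_ms : rtt_ms;
        if (slot_free_at[s] > finish)
            finish = slot_free_at[s];
    }
    return finish;
}
```

With 10 queries, 1 ms replies, 1000 ms timeouts and the first two nodes unresponsive, the model gives 2008 ms for a window of 1, 1004 ms for 2 and 1000 ms for 4. With the first four unresponsive, a window of 2 pays two timeouts back to back (2003 ms) while a window of 4 still absorbs them in parallel (1002 ms), which is the "tolerate N unresponsive nodes" effect described above.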

-- Hal

 but as a
 default compromise I think 4 is good.

 Ira

  
   Also, I think you are correct that we should increase OpenSM's default from
   4 to 8.  For the same reason as above.  Some of our clusters have worked
   better with 8 when we are having issues.  But right now we are still
   running with 4.
 
  I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
  I've seen a number of clusters with SMP dropping with the current
  lower defaults.

 So OpenSM is seeing dropped packets?  With 4 SMP's on the wire?  I do see
 some VL15Dropped errors (maybe 2-3 a day) but I did not think that would be
 an issue.  What kind of rate are you seeing?

 The other question is; do people regularly run the tools which are using
 libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?  We do.  If others
 are not then I would say this change would have less impact as they would
 want the diags to have some priority for debugging.  The other option is to
 change the patch to be a default of 2 and allow user to change it depending
 on what they are trying to do.  If you think that is best I will change the
 patch.

 Ira

 
  -- Hal
 
   Ira

Re: [PATCH 0/8] ib/iser: major face lift of the data path code

2010-02-05 Thread Vladislav Bolkhovitin


Or Gerlitz, on 02/04/2010 05:21 PM wrote:

Bart Van Assche wrote:

Sounds really interesting. Do you have numbers available about how
much these patches improve throughput or decrease latency ?


Yes, generally speaking, after the patches the initiator peaks at about 300-400K
IOPS with latency under such load of 20-30us; before the patches the initiator
was doing up to 200K IOPS with latency under such load of 50-100us. See some
data I got today on my test bed. Being focused on the initiator, I was using a
NULL device at the target side.


Also, what kind of test did you do?


AFTER

 r  b   swpd    free   buff  cache   si   so     bi    bo     in     cs us sy id wa st
11 11      0 6212936 260552 367356    0    0 308671     0 285596 492842  4 64  0 31  0
 7 13      0 6212936 260552 367356    0    0 309628    24 285138 496537  5 61  0 33  0
10 12      0 6212936 260552 367356    0    0 308222     0 277868 489261  4 65  0 30  0
 8 13      0 6212936 260552 367356    0    0 310724     0 282151 493868  4 67  0 29  0
12 11      0 6212936 260552 367356    0    0 308209     0 278753 489797  5 66  0 29  0

Linux 2.6.33-rc4 (cto-1)    02/04/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.62    0.00   66.29   28.71    0.00    0.37

Device:  rrqm/s  wrqm/s       r/s   w/s     rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sdd        0.00    0.00  88905.94  0.00  177811.88    0.00     2.00     2.82   0.03   0.01  98.61
sdf        0.00    0.00  64021.78  0.00  128045.54    0.00     2.00     2.55   0.04   0.02  96.24
sdh        0.00    0.00  88922.77  0.00  177845.54    0.00     2.00     2.85   0.03   0.01  99.01
sdj        0.00    0.00  64662.38  0.00  129324.75    0.00     2.00     2.69   0.04   0.02  97.82

BEFORE

 r  b   swpd    free   buff  cache   si   so     bi    bo     in     cs us sy id wa st
 6 14      0 6211804 260684 368584    0    0 195551     0 195997 557463  3 56  4 37  0
 7 13      0 6211804 260684 368584    0    0 191347     0 192311 525823  3 58  3 36  0
 6 15      0 6211804 260692 368584    0    0 187135    16 190875 503739  3 58  3 35  0
 8 14      0 6211804 260692 368584    0    0 193745     0 193921 556821  3 55  4 38  0
 8 16      0 6211804 260692 368584    0    0 191233     0 191549 536499  3 58  4 35  0

Linux 2.6.33-rc4 (cto-1)    02/04/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.24    0.00   58.16   35.87    0.00    3.74

Device:  rrqm/s  wrqm/s       r/s   w/s     rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sdd        0.00    0.00  33964.00  0.00   67928.00    0.00     2.00     3.36   0.10   0.03 100.00
sdf        0.00    0.00  33456.00  0.00   66912.00    0.00     2.00     3.34   0.10   0.03  99.60
sdh        0.00    0.00  63176.00  0.00  126352.00    0.00     2.00     3.40   0.05   0.02 100.00
sdj        0.00    0.00  62973.00  0.00  125946.00    0.00     2.00     3.38   0.05   0.02 100.40
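The headline IOPS numbers can be cross-checked against the iostat output with simple arithmetic. A small C sketch with hypothetical helper names: it sums the per-device r/s column (w/s is 0 on all four devices) and converts avgrq-sz from 512-byte sectors to KiB.

```c
#include <assert.h>
#include <stddef.h>

/* Aggregate IOPS across devices: the sum of the iostat r/s column
 * (all listed devices show w/s == 0, so reads are the whole load). */
static double aggregate_iops(const double *reads_per_sec, size_t ndev)
{
    double total = 0.0;

    assert(reads_per_sec != NULL || ndev == 0);
    for (size_t i = 0; i < ndev; i++)
        total += reads_per_sec[i];
    return total;
}

/* iostat reports avgrq-sz in 512-byte sectors; 2.00 sectors = 1 KiB. */
static double request_size_kib(double avgrq_sz_sectors)
{
    return avgrq_sz_sectors * 512.0 / 1024.0;
}
```

Fed the AFTER r/s values, it gives about 306.5K reads/s; fed the BEFORE values, 193,569 reads/s, a ratio of roughly 1.58. Since every request is 1 KiB and vmstat counts bi in 1 KiB blocks, the bi column (around 308K after, around 190K before) should roughly track the same rate.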


Re: [PATCH v2] opensm: bug in trap report for MC create(66) and delete(67) traps

2010-02-05 Thread Eli Dorfman

On Thu, Feb 4, 2010 at 10:52 PM, Hal Rosenstock
hal.rosenst...@gmail.com wrote:
 On Thu, Feb 4, 2010 at 12:43 PM, Eli Dorfman (Voltaire)
 dorfman@gmail.com wrote:

 Subject: [PATCH] Wrong handling of MC create and delete traps

 For these traps the GID in the data details is the MGID and
 not the source port gid.
 So the SM should check that subscriber port has the pkey of the MC group.
 There was also an error in comparing the subnet prefix and guid due to
 host/network order mismatch.

 Signed-off-by: Eli Dorfman e...@voltaire.com
 ---
  opensm/opensm/osm_inform.c |  151 
 ---
  1 files changed, 98 insertions(+), 53 deletions(-)

 diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c
 index 8108213..ae4fe71 100644
 --- a/opensm/opensm/osm_inform.c
 +++ b/opensm/opensm/osm_inform.c
 @@ -341,6 +341,103 @@ Exit:
        return status;
  }

 +static int is_access_permitted( osm_infr_t *p_infr_rec,
 +                               osm_infr_match_ctxt_t *p_infr_match )
 +{
 +       cl_list_t *p_infr_to_remove_list = p_infr_match->p_remove_infr_list;
 +       ib_inform_info_t *p_ii = &(p_infr_rec->inform_record.inform_info);
 +       ib_mad_notice_attr_t *p_ntc = p_infr_match->p_ntc;
 +       uint16_t trap_num = cl_ntoh16(p_ntc->g_or_v.generic.trap_num);
 +       osm_subn_t *p_subn = p_infr_rec->sa->p_subn;
 +       osm_log_t *p_log = p_infr_rec->sa->p_log;
 +       char gid_str[INET6_ADDRSTRLEN];
 +       osm_mgrp_t *p_mgrp;
 +       ib_gid_t source_gid;
 +       osm_port_t *p_src_port;
 +       osm_port_t *p_dest_port;
 +
 +       /* In case of GID_IN(64) or GID_OUT(65) traps the source gid
 +          comparison should be done on the trap source (saved as the gid in the
 +          data details field).
 +          For traps MC_CREATE(66) or MC_DELETE(67) the data details gid is
 +          the MGID. We need to check whether subscriber has the pky of


                   typo  

                           pkey


 +          the MC group.

 Shouldn't this be the subscriber has a compatible pkey with MC group ?
 The MC group has a full member PKey and the members can be full or
 limited.

I accept the correction.
Sasha, can you please change this in the commit (only if there are no
other remarks).

BTW, there is no explicit reference in the IB spec for the MC group
create/delete traps (at least I didn't find it).


 +          In all other cases the issuer gis is the trap source.

                                               typo  ^^^
                                                       gid


and this typo of course.

Thanks,
Eli
 -- Hal

 +       */
 +       if (trap_num >= 64 && trap_num <= 67)
 +               /* The issuer of these traps is the SM so source_gid
 +                  is the gid saved on the data details */
 +               source_gid = p_ntc->data_details.ntc_64_67.gid;
 +       else
 +               source_gid = p_ntc->issuer_gid;
 +
 +       p_dest_port =
 +           cl_ptr_vector_get(&p_subn->port_lid_tbl,
 +                             cl_ntoh16(p_infr_rec->report_addr.dest_lid));
 +       if (!p_dest_port) {
 +               OSM_LOG(p_log, OSM_LOG_INFO,
 +                       "Cannot find destination port with LID:%u\n",
 +                       cl_ntoh16(p_infr_rec->report_addr.dest_lid));
 +               goto Exit;
 +       }
 +
 +       switch (trap_num) {
 +               case 66:
 +               case 67:
 +                       p_mgrp = osm_get_mgrp_by_mgid(p_subn, source_gid);
 +                       if (!p_mgrp) {
 +                               OSM_LOG(p_log, OSM_LOG_INFO,
 +                                       "Cannot find MGID %s\n",
 +                                       inet_ntop(AF_INET6, source_gid.raw, gid_str, sizeof gid_str));
 +                               goto Exit;
 +                       }
 +
 +                       if (!osm_physp_has_pkey(p_log,
 +                                               p_mgrp->mcmember_rec.pkey,
 +                                               p_dest_port->p_physp)) {
 +                               OSM_LOG(p_log, OSM_LOG_INFO,
 +                                       "MGID %s and port GUID:0x%016" PRIx64 " do not share same pkey\n",
 +                                       inet_ntop(AF_INET6, source_gid.raw, gid_str, sizeof gid_str),
 +                                       cl_ntoh64(p_dest_port->guid));
 +                               goto Exit;
 +                       }
 +                       break;
 +
 +               default:
 +                       p_src_port =
 +                           osm_get_port_by_guid(p_subn, source_gid.unicast.interface_id);
 +                       if (!p_src_port) {
 +                               OSM_LOG(p_log, OSM_LOG_INFO,
 +                                       "Cannot find source port with GUID:0x%016" PRIx64 "\n",
 +                                       cl_ntoh64(source_gid.unicast.interface_id));
 +

Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise


Jeff Squyres (jsquyres) wrote:


Note that it is highly unlikely that we will release open mpi 1.4.2 in 
time for ofed 1.5.1.




Jeff, there is no way to handle high priority bug fixes in the current 
released stream?


Also note that trying to bind rdma cm to all interface ip addresses 
was the way that we were advised by openfabrics to figure out which 
devices are rdma-capable.


As such, it is highly desirable to get the fix transparently in rdmacm 
and preserve the old semantic. More specifically, it seems undesirable 
to change this semantic in a minor ofed point release.




I agree that we should probably not allow 127.0.0.1 binds in ofed-1.5.1
at all because it regresses OpenMPI.  Even with IB systems, if the bind
to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is bound to that
rdma interface and advertises this address to its peer as an address
to which that peer can rdma connect!  This will break IB clusters too,
not just T3/iWARP clusters.   While I think OpenMPI needs to skip
127.0.0.1 in its logic, I think we should probably defer allowing
127.0.0.1 binds until ofed-1.6.


But Jeff, note that if someone uses the upstream kernel and OpenMPI, it's
busted...


So I recommend:

1) Don't allow 127.0.0.1 binds in ofed-1.5.1

2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm 
connect address (get it in ofed-1.5.2 or ofed-1.6).
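Recommendation 2 amounts to filtering the local address list before it is sent to peers. A minimal C sketch, assuming IPv4 addresses held as strings; `is_advertisable` and `filter_advertisable` are hypothetical names, not OpenMPI functions. It drops the whole 127.0.0.0/8 block rather than only 127.0.0.1.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper (not OpenMPI code): an IPv4 address is worth
 * advertising to a remote peer only if it parses and lies outside
 * 127.0.0.0/8, which on the peer would mean the peer itself. */
static bool is_advertisable(const char *addr)
{
    struct in_addr in;

    if (inet_pton(AF_INET, addr, &in) != 1)
        return false;                       /* not a dotted-quad IPv4 address */
    return (ntohl(in.s_addr) >> 24) != 127; /* drop the loopback network */
}

/* Compact addrs[] in place, keeping only advertisable entries.
 * Returns the number of entries kept. */
static size_t filter_advertisable(const char **addrs, size_t n)
{
    size_t kept = 0;

    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (is_advertisable(addrs[i]))
            addrs[kept++] = addrs[i];
    return kept;
}
```

Filtering at advertisement time keeps the local bind probe intact: the process may still learn that 127.0.0.1 is bindable, it just never tells a remote peer to connect there.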




Steve.

RE: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty

Also note that trying to bind rdma cm to all interface ip addresses was the way
that we were advised by openfabrics to figure out which devices are rdma-
capable.

As such, it is highly desirable to get the fix transparently in rdmacm and
preserve the old semantic. More specifically, it seems undesirable to change
this semantic in a minor ofed point release.

I think the issue is larger than just the rdma_cm.

First, it sounds like openmpi tries to bind to 127.0.0.1, which now works.  If
openmpi uses shared memory for connections on the same machine, I'm not sure why
this is a problem, unless it is passing that address to another machine to use
for a connection.  If this is the case, then that is a bug in openmpi.

Second, I still don't understand whether iwarp is limited to 'loopback'
connections that are not bound to 127.0.0.1.  For instance, if the RDMA device
is associated with 192.168.0.1, then can it handle a connection from 192.168.0.1
to 192.168.0.1?  If it can't, then the rdma_cm can't help in this case when
bind is called.  The failure has to come during connect, which sounds like the
behavior that's seen today with 127.0.0.1.

So, while the rdma_cm can fail binds to 127.0.0.1 if the RDMA device doesn't
support loopback, I'm still not sure how much of a fix this is.

- Sean


Re: [PATCH v2] opensm: bug in trap report for MC create(66) and delete(67) traps

2010-02-05 Thread Hal Rosenstock

On Fri, Feb 5, 2010 at 9:18 AM, Eli Dorfman dorfman@gmail.com wrote:
 On Thu, Feb 4, 2010 at 10:52 PM, Hal Rosenstock
 hal.rosenst...@gmail.com wrote:
 On Thu, Feb 4, 2010 at 12:43 PM, Eli Dorfman (Voltaire)
 dorfman@gmail.com wrote:

 Subject: [PATCH] Wrong handling of MC create and delete traps

 For these traps the GID in the data details is the MGID and
 not the source port gid.
 So the SM should check that subscriber port has the pkey of the MC group.
 There was also an error in comparing the subnet prefix and guid due to
 host/network order mismatch.

 Signed-off-by: Eli Dorfman e...@voltaire.com
 ---
  opensm/opensm/osm_inform.c |  151 
 ---
  1 files changed, 98 insertions(+), 53 deletions(-)

 diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c
 index 8108213..ae4fe71 100644
 --- a/opensm/opensm/osm_inform.c
 +++ b/opensm/opensm/osm_inform.c
 @@ -341,6 +341,103 @@ Exit:
        return status;
  }

 +static int is_access_permitted( osm_infr_t *p_infr_rec,
 +                               osm_infr_match_ctxt_t *p_infr_match )
 +{
 +       cl_list_t *p_infr_to_remove_list = p_infr_match->p_remove_infr_list;
 +       ib_inform_info_t *p_ii = &(p_infr_rec->inform_record.inform_info);
 +       ib_mad_notice_attr_t *p_ntc = p_infr_match->p_ntc;
 +       uint16_t trap_num = cl_ntoh16(p_ntc->g_or_v.generic.trap_num);
 +       osm_subn_t *p_subn = p_infr_rec->sa->p_subn;
 +       osm_log_t *p_log = p_infr_rec->sa->p_log;
 +       char gid_str[INET6_ADDRSTRLEN];
 +       osm_mgrp_t *p_mgrp;
 +       ib_gid_t source_gid;
 +       osm_port_t *p_src_port;
 +       osm_port_t *p_dest_port;
 +
 +       /* In case of GID_IN(64) or GID_OUT(65) traps the source gid
 +          comparison should be done on the trap source (saved as the gid in the
 +          data details field).
 +          For traps MC_CREATE(66) or MC_DELETE(67) the data details gid is
 +          the MGID. We need to check whether subscriber has the pky of


                   typo  

                           pkey


 +          the MC group.

 Shouldn't this be the subscriber has a compatible pkey with MC group ?
 The MC group has a full member PKey and the members can be full or
 limited.

 I accept the correction.

Doesn't this require a code change for handling trap cases 66-67 ?
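Hal's full-versus-limited point can be made concrete. An InfiniBand P_Key is 16 bits: the top bit is the membership bit (full when set, limited when clear) and the low 15 bits identify the partition, and two limited members of the same partition cannot talk to each other. A minimal C sketch of a compatibility test in that spirit; `pkeys_compatible` is a hypothetical helper, not OpenSM's API, and rejecting a base of 0 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_MEMBERSHIP_BIT 0x8000u /* 1 = full member, 0 = limited */
#define PKEY_BASE_MASK      0x7FFFu

/* Hypothetical helper, host byte order (not OpenSM API).  Two P_Keys are
 * compatible when their 15-bit bases are equal and at least one of them is
 * a full member; a base of 0 is treated as invalid (assumption). */
static bool pkeys_compatible(uint16_t a, uint16_t b)
{
    uint16_t base = a & PKEY_BASE_MASK;

    if (base == 0 || base != (b & PKEY_BASE_MASK))
        return false;
    return ((a | b) & PKEY_MEMBERSHIP_BIT) != 0;
}
```

So an MC group with the full-member P_Key 0xFFFF is compatible with a subscriber that holds only the limited-member 0x7FFF, while an exact-equality check would reject that subscriber.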

 Sasha, can you please change this in the commit (only if there are no
 other remarks).

Is that what you are asking Sasha to do (beyond the typos) ?


 BTW, there is no explicit reference in the IB spec for MC group
 create/delete trap (at least I didn't find it).

Not sure what you mean by this. What didn't you find ?

-- Hal



 +          In all other cases the issuer gis is the trap source.

                                               typo  ^^^
                                                       gid


 and this typo of course.

 Thanks,
 Eli
 -- Hal

 +       */
 +       if (trap_num >= 64 && trap_num <= 67)
 +               /* The issuer of these traps is the SM so source_gid
 +                  is the gid saved on the data details */
 +               source_gid = p_ntc->data_details.ntc_64_67.gid;
 +       else
 +               source_gid = p_ntc->issuer_gid;
 +
 +       p_dest_port =
 +           cl_ptr_vector_get(&p_subn->port_lid_tbl,
 +                             cl_ntoh16(p_infr_rec->report_addr.dest_lid));
 +       if (!p_dest_port) {
 +               OSM_LOG(p_log, OSM_LOG_INFO,
 +                       "Cannot find destination port with LID:%u\n",
 +                       cl_ntoh16(p_infr_rec->report_addr.dest_lid));
 +               goto Exit;
 +       }
 +
 +       switch (trap_num) {
 +               case 66:
 +               case 67:
 +                       p_mgrp = osm_get_mgrp_by_mgid(p_subn, source_gid);
 +                       if (!p_mgrp) {
 +                               OSM_LOG(p_log, OSM_LOG_INFO,
 +                                       "Cannot find MGID %s\n",
 +                                       inet_ntop(AF_INET6, source_gid.raw, gid_str, sizeof gid_str));
 +                               goto Exit;
 +                       }
 +
 +                       if (!osm_physp_has_pkey(p_log,
 +                                               p_mgrp->mcmember_rec.pkey,
 +                                               p_dest_port->p_physp)) {
 +                               OSM_LOG(p_log, OSM_LOG_INFO,
 +                                       "MGID %s and port GUID:0x%016" PRIx64 " do not share same pkey\n",
 +                                       inet_ntop(AF_INET6, source_gid.raw, gid_str, sizeof gid_str),
 +                                       cl_ntoh64(p_dest_port->guid));
 +                               goto Exit;
 +                       }
 +                       break;
 +
 +               default:
 +                       p_src_port =
 +                           osm_get_port_by_guid(p_subn, source_gid.unicast.interface_id);
 +                       if (!p_src_port) {

Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise


Sean Hefty wrote:

Also note that trying to bind rdma cm to all interface ip addresses was the way
that we were advised by openfabrics to figure out which devices are rdma-
capable.

As such, it is highly desirable to get the fix transparently in rdmacm and
preserve the old semantic. More specifically, it seems undesirable to change
this semantic in a minor ofed point release.



I think the issue is larger than just the rdma_cm.

First, it sounds like openmpi tries to bind to 127.0.0.1, which now works.  If
openmpi uses shared memory for connections on the same machine, I'm not sure why
this is a problem, unless it is passing that address to another machine to use
for a connection.  If this is the case, then that is a bug in openmpi.
  


Yes, OpenMPI incorrectly advertises 127.0.0.1 as a valid address
to which the peer can connect. This needs to be fixed.




Second, I still don't understand whether iwarp is limited to 'loopback'
connections that are not bound to 127.0.0.1.  For instance, if the RDMA device
is associated with 192.168.0.1, then can it handle a connection from 192.168.0.1
to 192.168.0.1?  If it can't, then the rdma_cm can't help in this case when
bind is called.  The failure has to come during connect, which sounds like the
behavior that's seen today with 127.0.0.1.
  


It's not iWARP-specific.  A device may or may not support hw loopback.
The IB spec mandates this support, but the iWARP spec doesn't.
Ammasso and Chelsio T3 rnics do not support HW loopback.  They will fail
if you try to connect to a local address.  The rdma-cm shouldn't allow
binds to 127.0.0.1 for these devices, since such a bind necessarily implies
that the connection will require hw loopback for that device.



So, while the rdma_cm can fail binds to 127.0.0.1 if the RDMA device doesn't
support loopback, I'm still not sure how much of a fix this is.
  


My concern is breaking an existing working OpenMPI in a point release 
because we changed semantics of the rdma-cm in an ofed point release...


BTW:  Was this change an artifact of rebasing ofed-1.5.1 on a new kernel 
version?


Steve.


- Sean


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise



I agree that we should probably not allow 127.0.0.1 binds in
ofed-1.5.1 at all because it regresses OpenMPI.  Even with IB systems,
if the bind to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is
bound to that rdma interface and advertises this address to its peer
as an address to which that peer can rdma connect!  This will break IB
clusters too, not just T3/iWARP clusters.   While I think OpenMPI needs
to skip 127.0.0.1 in its logic, I think we should probably defer
allowing 127.0.0.1 binds until ofed-1.6.


But Jeff, note that if someone uses the upstream kernel and OpenMPI,
it's busted...


So I recommend:

1) Don't allow 127.0.0.1 binds in ofed-1.5.1

2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm 
connect address (get it in ofed-1.5.2 or ofed-1.6).


Also, there is a good argument for never allowing 127.0.0.1 for rdma 
anyway.  It implies a _software_ loopback.  It should NEVER be bound to 
a real NIC interface and thus rdma binds shouldn't be allowed to it 
since there is no software rdma loopback support...


Unless someone implements software rdma loopback...  ;)



RE: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty

My concern is breaking an existing working OpenMPI in a point release
because we changed semantics of the rdma-cm in an ofed point release...

OFED can call this release a point release, but in reality, the content makes it
a major release...

BTW:  Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
version?

apparently


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise



Sean Hefty wrote:

My concern is breaking an existing working OpenMPI in a point release
because we changed semantics of the rdma-cm in an ofed point release...



OFED can call this release a point release, but in reality, the content makes it
a major release...

  

BTW:  Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
version?



apparently

  


Well as it stands now:  OpenMPI on ofed-1.5.1 is broken for IB clusters that 
use the rdma-cm for connection setup, and for all iWARP clusters, which require 
the rdma-cm connect method. 



Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Roland Dreier

  But Jeff, note that if someone uses the upstream kernel and OpenMPI,
  it's busted...

Is the issue 6f8372b6 (RDMA/cm: fix loopback address support)?  This
just went in for 2.6.33, which is still at -rc6, so if we can quickly
reach a consensus, there is still time to get a fix in for 2.6.33.

 - R.
-- 
Roland Dreier rola...@cisco.com
Cisco.com - http://www.cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres

On Feb 5, 2010, at 11:16 AM, Steve Wise wrote:

  Note that it is highly unlikely that we will release Open MPI 1.4.2 in
  time for ofed 1.5.1.
 
 Jeff, there is no way to handle high priority bug fixes in the current
 released stream?

We have 1.4.2 cooking, but it's not ready yet.  

I'll take it back to the OMPI community to see if they want to do a 
high-priority release, but I'm not excited about it (see below).

  Also note that trying to bind rdma cm to all interface ip addresses
  was the way that we were advised by openfabrics to figure out which
  devices are rdma-capable.
 
  As such, it is highly desirable to get the fix transparently in rdmacm
  and preserve the old semantic. More specifically, it seems undesirable
  to change this semantic in a minor ofed point release.
 
 I agree that we should probably not allow 127.0.0.1 binds in ofed-1.5.1
 at all because it regresses OpenMPI.  Even with IB systems, if the bind
 to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is bound to that
 rdma interface and advertises this address to its peer as an address
 to which that peer can rdma connect!  This will break IB clusters too,
 not just T3/iWARP clusters.   While I think OpenMPI needs to skip
 127.0.0.1 in its logic, I think we should probably defer allowing
 127.0.0.1 binds until ofed-1.6.

I agree that Open MPI should not advertise 127.0.0.1 to peers.  However, the 
logic that we were advised to use was to try to RDMA CM bind to each IP 
address.  If the bind succeeds, then it's an RDMA-capable device and therefore 
it's advertisable.  The rationale was that 127.0.0.1 (really, any loopback 
address) is *not* an RDMA device and therefore the RDMA CM bind should *never* 
succeed on it.  Hence, it wasn't necessary to add an "is this a loopback 
address?" check in the logic.

I guess I don't understand why that rationale is now incorrect -- 127.0.0.1 is 
still not an RDMA-capable device, right?

 But Jeff, note that if someone uses the upstream kernel and OpenMPI, it's
 busted...
 
 So I recommend:
 
 1) Don't allow 127.0.0.1 binds in ofed-1.5.1
 
 2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm
 connect address (get it in ofed-1.5.2 or ofed-1.6).

We can add this logic (because I understand that some upstream kernels now 
allow binding to loopback addresses), but I'm still confused (in principle) as 
to why it should be necessary.
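A minimal sketch of the probe-then-filter logic discussed here, assuming IPv4 only. The bind attempt is abstracted as a callback (bind_probe_fn, a hypothetical name) standing in for "create an rdma_cm_id and call rdma_bind_addr() on this address", and demo_probe is a made-up probe that, like the newer kernels described in this thread, lets 127.0.0.1 bind. With an explicit loopback check, the result no longer depends on how the kernel treats 127.0.0.0/8.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical probe type: stands in for "create an rdma_cm_id and try
 * rdma_bind_addr() on this address". Returns true if the bind worked. */
typedef bool (*bind_probe_fn)(const char *ip);

/* True if ip is a valid dotted-quad IPv4 address inside 127.0.0.0/8. */
static bool is_ipv4_loopback(const char *ip)
{
    struct in_addr a;
    if (inet_pton(AF_INET, ip, &a) != 1)
        return false;
    return (ntohl(a.s_addr) >> 24) == 127;
}

/* Store in out[] each address worth advertising to peers: it must not be
 * loopback AND the bind probe must succeed. Returns how many were stored. */
static size_t select_advertisable(const char *const *ips, size_t n,
                                  bind_probe_fn probe,
                                  const char **out, size_t out_cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < out_cap; i++) {
        if (is_ipv4_loopback(ips[i]))
            continue;           /* never hand 127.x.x.x to a remote peer */
        if (probe(ips[i]))
            out[count++] = ips[i];
    }
    return count;
}

/* Made-up probe for demonstration: accepts 127.0.0.1 (as the newer
 * kernels discussed here do) and any 10.x address. */
static bool demo_probe(const char *ip)
{
    return strcmp(ip, "127.0.0.1") == 0 || strncmp(ip, "10.", 3) == 0;
}
```

For example, given 127.0.0.1, 172.29.218.165, 10.10.30.165 and 10.10.20.165, only the two 10.10.x addresses are selected, even though demo_probe accepts 127.0.0.1.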

Can you clarify which kernel versions allow binding LOOPBACK addresses with RDMA 
CM?

-- 
Jeff Squyres jsquy...@cisco.com
Cisco.com - http://www.cisco.com


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres

On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:

   But Jeff, note that if someone uses the upstream kernel and OpenMPI,
   it's busted...
 
 Is the issue 6f8372b6 (RDMA/cm: fix loopback address support)?  This
 just went in for 2.6.33, which is still at -rc6, so if we can quickly
 reach a consensus, there is still time to get a fix in for 2.6.33.

Oh oh oh!  Yes, that would be fabulous...

Thanks!

-- 
Jeff Squyres jsquy...@cisco.com
Cisco.com - http://www.cisco.com


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise


Jeff Squyres wrote:

On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:

  

  But Jeff, note that if someone uses the upstream kernel and OpenMPI,
  it's busted...

Is the issue 6f8372b6 (RDMA/cm: fix loopback address support)?  This
just went in for 2.6.33, which is still at -rc6, so if we can quickly
reach a consensus, there is still time to get a fix in for 2.6.33.



Oh oh oh!  Yes, that would be fabulous...

Thanks!

  


I think we should remove the feature of allowing binds to 127.0.0.1 
altogether based on Jeff's arguments and my assertion that 127.0.0.1 is 
a sw-loopback mechanism anyway...


I'm not sure if that commit does more or not...

Steve.


RE: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty

Is the issue 6f8372b6 (RDMA/cm: fix loopback address support)?  This
just went in for 2.6.33, which is still at -rc6, so if we can quickly
reach a consensus, there is still time to get a fix in for 2.6.33.

That should be the patch in question.  I'm not sure about reaching consensus. :)
If the other changes to the rdma_cm aren't closely tied to that change, we may
be able to back that one patch out until we can get whatever other fix may be
needed.

In my view, openmpi has a bug in that it can pass a loopback address to a remote
peer and expect it to be used to establish a connection.  Steve seems to agree
with this.

My original intent was to allow the use of the loopback address with the
rdma_cm.  I.e. 127.0.0.1 meant 'this host', and not 'software loopback'.  I just
had Arlin run a quick test with OFED 1.4 over IB, and it allows binding to
127.0.0.1, but never forms connections.  I.e. ucmatose -b 127.0.0.1 succeeds in
listening, but ucmatose -s 127.0.0.1 fails to connect because of a route error.
(Hmm... I'm still confused about what openmpi is doing then.)

Even if an application were to use non-loopback IP addresses, there's no
guarantee of forming a connection if those addresses map to an iwarp device.
So, even if the rdma_cm fails binding to 127.0.0.1 unless there's some RDMA
device (software or hardware - not sure why we care) capable of supporting it,
an application would need to also deal with failures from rdma_resolve_addr.

Indicating loopback through a device capability flag seems like the right
approach, and the rdma_cm can use this to fail rdma_bind_addr/rdma_resolve_addr
calls.  That's probably not a trivial patch however.
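A toy model of the capability-flag approach, assuming a per-device flag that the rdma_cm could consult before binding a loopback address. HYPO_DEV_CAP_LOOPBACK, struct hypo_rdma_dev and both functions are hypothetical; no such flag exists in the kernel code quoted in this thread. The sketch only shows that a loopback bind can fail cleanly (here with -ENODEV, an arbitrary choice) when no device claims loopback support.

```c
#include <errno.h>
#include <stddef.h>

/* HYPOTHETICAL flag and device type modelling a "supports loopback"
 * capability bit; neither exists in the real rdma_cm. */
#define HYPO_DEV_CAP_LOOPBACK (1u << 0)

struct hypo_rdma_dev {
    const char *name;
    unsigned int caps;          /* bitmask of HYPO_DEV_CAP_* flags */
};

/* Return the first device that advertises loopback support, or NULL. */
static const struct hypo_rdma_dev *
pick_loopback_dev(const struct hypo_rdma_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].caps & HYPO_DEV_CAP_LOOPBACK)
            return &devs[i];
    return NULL;
}

/* Model of binding 127.0.0.1: 0 on success with *bound set, or -ENODEV
 * when no device can loop back, which is what an address probe such as
 * Open MPI's would then observe. */
static int hypo_bind_loopback(const struct hypo_rdma_dev *devs, size_t n,
                              const struct hypo_rdma_dev **bound)
{
    const struct hypo_rdma_dev *d = pick_loopback_dev(devs, n);
    if (!d)
        return -ENODEV;
    *bound = d;
    return 0;
}
```

An application would still need to handle failures from rdma_resolve_addr separately; the flag only moves the loopback decision to bind time.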

- Sean


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Roland Dreier

  I think we should remove the feature of allowing binds to 127.0.0.1
  altogether based on Jeff's arguments and my assertion that 127.0.0.1
  is a sw-loopback mechanism anyway...

Well, someone propose a patch please.
-- 
Roland Dreier rola...@cisco.com
Cisco.com - http://www.cisco.com


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jason Gunthorpe

On Fri, Feb 05, 2010 at 12:32:51PM -0600, Steve Wise wrote:

 I think we should remove the feature of allowing binds to 127.0.0.1  
 altogether based on Jeff's arguments and my assertion that 127.0.0.1 is  
 a sw-loopback mechanism anyway...

I don't agree; the kernel should be free to provide a loopback
service any way it likes, and if that means using one of the HW
adaptors to accelerate the work, then fine. If the RDMAoE (soft RDMA)
patches land, it would be reasonable for all kernels to support RDMA
on the loopback.

At a minimum, RDMA CM is an IP service, so whatever logic you use to
determine addresses for TCP must also be done after determining a list
of valid RDMA IPs. Trying to do RDMA CM bind just gives you the list
of candidate addresses, no different than netlink does for TCP.

One of those steps must, at a minimum, filter out 127.0.0.0/8. The user
should also be able to have some input into the IP filter; software
RDMAoE, for instance, really makes this important.

Jason

Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Roland Dreier

   That should be the patch in question.  I'm not sure about reaching consensus. :)
   If the other changes to the rdma_cm aren't closely tied to that change, we may
   be able to back that one patch out until we can get whatever other fix may be
   needed.

  I'd like to do this approach.  Then re-submit once we come to consensus...

That makes sense to me.  Someone please send me a tested revert.
-- 
Roland Dreier rola...@cisco.com
Cisco.com - http://www.cisco.com


[PATCH] dapl-2.0: Cleanup CM object lock before freeing CM object memory

2010-02-05 Thread Davis, Arlin R

 
Running Windows Application Verifier for uDAPL validation
for all 3 providers. Clean up the memory lock leaks found
by the verifier.

Signed-off-by: Arlin Davis arlin.r.da...@intel.com
---
 dapl/openib_cma/cm.c |4 
 dapl/openib_scm/cm.c |2 ++
 dapl/openib_ucm/cm.c |4 
 3 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/dapl/openib_cma/cm.c b/dapl/openib_cma/cm.c
index 9928239..cfa6ede 100644
--- a/dapl/openib_cma/cm.c
+++ b/dapl/openib_cma/cm.c
@@ -167,6 +167,7 @@ dp_ib_cm_handle_t dapls_ib_cm_create(DAPL_EP *ep)
 
/* create CM_ID, bind to local device, create QP */
if (rdma_create_id(g_cm_events, &cm_id, (void *)conn, RDMA_PS_TCP)) {
+   dapl_os_lock_destroy(&conn->lock);
dapl_os_free(conn, sizeof(*conn));
return NULL;
}
@@ -221,6 +222,7 @@ void dapls_ib_cm_free(dp_ib_cm_handle_t conn, DAPL_EP *ep)
rdma_destroy_id(conn->cm_id);
}
 
+   dapl_os_lock_destroy(&conn->lock);
dapl_os_free(conn, sizeof(*conn));
 }
 
@@ -686,6 +688,7 @@ dapls_ib_setup_conn_listener(IN DAPL_IA * ia_ptr,
/* create CM_ID, bind to local device, create QP */
if (rdma_create_id
(g_cm_events, &conn->cm_id, (void *)conn, RDMA_PS_TCP)) {
+   dapl_os_lock_destroy(&conn->lock);
dapl_os_free(conn, sizeof(*conn));
return (dapl_convert_errno(errno, setup_listener));
}
@@ -734,6 +737,7 @@ dapls_ib_setup_conn_listener(IN DAPL_IA * ia_ptr,
 
   bail:
rdma_destroy_id(conn->cm_id);
+   dapl_os_lock_destroy(&conn->lock);
dapl_os_free(conn, sizeof(*conn));
return dat_status;
 }
diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index db2821a..8e9be4d 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -317,6 +317,7 @@ void dapls_ib_cm_free(dp_ib_cm_handle_t cm_ptr, DAPL_EP *ep)
closesocket(cm_ptr->socket);
}
dapl_os_unlock(&cm_ptr->lock);
+   dapl_os_lock_destroy(&cm_ptr->lock);
dapl_os_free(cm_ptr, sizeof(*cm_ptr));
return;
}
@@ -1761,6 +1762,7 @@ void cr_thread(void *arg)
shutdown(cr->socket, SHUT_RDWR);
closesocket(cr->socket);
}
+   dapl_os_lock_destroy(&cr->lock);
dapl_os_free(cr, sizeof(*cr));
continue;
}
diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c
index b5aba64..c0da589 100644
--- a/dapl/openib_ucm/cm.c
+++ b/dapl/openib_ucm/cm.c
@@ -728,6 +728,7 @@ void dapls_ib_cm_free(dp_ib_cm_handle_t cm, DAPL_EP *ep)
/* cleanup, never made it to work queue */
if (cm->state == DCM_INIT) {
dapl_os_unlock(&cm->lock);
+   dapl_os_lock_destroy(&cm->lock);
dapl_os_free(cm, sizeof(*cm));
return;
}
@@ -1701,6 +1702,7 @@ dapls_ib_remove_conn_listener(IN DAPL_IA *ia, IN DAPL_SP *sp)
cm->state = DCM_DESTROY;
dapl_os_unlock(&cm->lock);
ucm_dequeue_listen(cm);
+   dapl_os_lock_destroy(&cm->lock);
dapl_os_free(cm, sizeof(*cm));
}
return DAT_SUCCESS;
@@ -1981,6 +1983,7 @@ void cm_thread(void *arg)
dapl_llist_remove_entry(&hca->ib_trans.list,
(DAPL_LLIST_ENTRY *)&cm->entry);
dapl_os_unlock(&cm->lock);
+   dapl_os_lock_destroy(&cm->lock);
dapl_os_free(cm, sizeof(*cm));
continue;
}
@@ -2052,6 +2055,7 @@ void cm_thread(void *arg)
&hca->ib_trans.list,
(DAPL_LLIST_ENTRY *)&cm->entry);
dapl_os_unlock(&cm->lock);
+   dapl_os_lock_destroy(&cm->lock);
dapl_os_free(cm, sizeof(*cm));
continue;
}
-- 
1.5.2.5
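The rule this patch applies, sketched with POSIX threads: every path that frees an object owning a lock destroys the lock first, including early error paths. Modelling dapl_os_lock_init()/dapl_os_lock_destroy() with pthread mutexes is an assumption made for illustration (the real DAPL OS layer may use other primitives, e.g. on Windows), and setup_step() is a hypothetical stand-in for a call such as rdma_create_id().

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical CM object owning a lock, standing in for the DAPL CM
 * handle structures touched by the patch. */
struct cm_obj {
    pthread_mutex_t lock;
    int id;
};

/* Hypothetical setup step that can fail, e.g. rdma_create_id(). */
static int setup_step(int should_fail) { return should_fail ? -1 : 0; }

static struct cm_obj *cm_obj_create(int fail_setup)
{
    struct cm_obj *o = calloc(1, sizeof(*o));
    if (!o)
        return NULL;
    if (pthread_mutex_init(&o->lock, NULL) != 0) {
        free(o);                            /* lock was never initialized */
        return NULL;
    }
    if (setup_step(fail_setup) != 0) {
        pthread_mutex_destroy(&o->lock);    /* destroy before free */
        free(o);
        return NULL;
    }
    return o;
}

/* Normal teardown: unlock, destroy the lock, then free the memory. */
static int cm_obj_free(struct cm_obj *o)
{
    pthread_mutex_lock(&o->lock);
    /* ... release per-connection state while holding the lock ... */
    pthread_mutex_unlock(&o->lock);
    int rc = pthread_mutex_destroy(&o->lock);
    free(o);
    return rc;
}
```

Destroying a mutex while it is still locked is undefined behavior in POSIX, which is why the unlock comes before the destroy, matching the order in the patched dapls_ib_cm_free() paths.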


[PATCH] dapl-2.0: undefined symbol: dapls_print_cm_list

2010-02-05 Thread Davis, Arlin R



Make the call prototype dependent on the DAPL_COUNTERS definition.

Signed-off-by: Arlin Davis arlin.r.da...@intel.com
---
 dapl/openib_cma/dapl_ib_util.h |2 ++
 dapl/openib_scm/dapl_ib_util.h |3 +++
 dapl/openib_ucm/dapl_ib_util.h |3 +++
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h
index 6b43c91..451a967 100755
--- a/dapl/openib_cma/dapl_ib_util.h
+++ b/dapl/openib_cma/dapl_ib_util.h
@@ -125,9 +125,11 @@ void dapli_cq_event_cb(struct _ib_hca_transport *tp);
 dp_ib_cm_handle_t dapls_ib_cm_create(DAPL_EP *ep);
 void dapls_ib_cm_free(dp_ib_cm_handle_t cm, DAPL_EP *ep);
 
+#ifdef DAPL_COUNTERS
 STATIC _INLINE_ void dapls_print_cm_list(IN DAPL_IA * ia_ptr)
 {
return;
 }
+#endif
 
 #endif /*  _DAPL_IB_UTIL_H_ */
diff --git a/dapl/openib_scm/dapl_ib_util.h b/dapl/openib_scm/dapl_ib_util.h
index 138a3dd..831084f 100644
--- a/dapl/openib_scm/dapl_ib_util.h
+++ b/dapl/openib_scm/dapl_ib_util.h
@@ -113,6 +113,9 @@ void dapli_cq_event_cb(struct _ib_hca_transport *tp);
 DAT_RETURN dapli_socket_disconnect(dp_ib_cm_handle_t cm_ptr);
 dp_ib_cm_handle_t dapls_ib_cm_create(DAPL_EP *ep);
 void dapls_ib_cm_free(dp_ib_cm_handle_t cm, DAPL_EP *ep);
+
+#ifdef DAPL_COUNTERS
 void dapls_print_cm_list(IN DAPL_IA *ia_ptr);
+#endif
 
 #endif /*  _DAPL_IB_UTIL_H_ */
diff --git a/dapl/openib_ucm/dapl_ib_util.h b/dapl/openib_ucm/dapl_ib_util.h
index 6273459..d7844c6 100644
--- a/dapl/openib_ucm/dapl_ib_util.h
+++ b/dapl/openib_ucm/dapl_ib_util.h
@@ -119,7 +119,10 @@ void ucm_async_event(struct dapl_hca *hca);
 void dapli_cq_event_cb(struct _ib_hca_transport *tp);
 dp_ib_cm_handle_t dapls_ib_cm_create(DAPL_EP *ep);
 void dapls_ib_cm_free(dp_ib_cm_handle_t cm, DAPL_EP *ep);
+
+#ifdef DAPL_COUNTERS
 void dapls_print_cm_list(IN DAPL_IA *ia_ptr);
+#endif
 
 #endif /*  _DAPL_IB_UTIL_H_ */
 
-- 
1.5.2.5


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres

On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:

  I think we should remove the feature of allowing binds to 127.0.0.1 
  altogether based on Jeff's arguments and my assertion that 127.0.0.1 is 
  a sw-loopback mechanism anyway...
 
 I don't agree, the kernel should be free to provide a loop back
 service any way it likes, and if that means using one of the HW

Ok, fine.  Should we push back OFED 1.5.1 until Open MPI can get 1.4.2 out?  I 
don't know when that will be.

In short: you're breaking backward compatibility with zero warning.  There is 
real software out there that will break if people upgrade their 
kernel/OFED/RDMA CM/whatever (e.g., Open MPI).  Isn't this supposed to be the 
Enterprise distribution (meaning: stability)?  (trying to keep the frustration 
out of my voice...)

This is a terrible, terrible idea.

How about this: back out the change for now.  Give everyone time to upgrade.  
If nothing else, ***give those of us who are involved in this community*** time 
to upgrade.  Then put the feature back in after adequate time has passed.

-- 
Jeff Squyres jsquy...@cisco.com
Cisco.com - http://www.cisco.com


RE: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty

Ammasso and Chelsio T3 rnics do not support HW loopback.

It looks like the NES driver doesn't support 127.0.0.1, but does support
loopback connections (gurgle).  Here's an untested patch for 2.6.33
(not even compile tested) for consideration then.  I'll be testing
this shortly unless there's disagreement.


rdma/cm: disallow loopback address for iwarp devices

From: Sean Hefty sean.he...@intel.com

The current RDMA iWarp devices cannot be used to establish
connections using the loopback address.  Prevent rdma_bind_addr
from associating the loopback address with an iWarp device.

This fixes an issue with openmpi, where it tries to identify which
IP addresses map to RDMA devices by calling rdma_bind_addr on
each address and seeing if the bind succeeds.  Prior to patch
6f8372b6 ("RDMA/cm: fix loopback address support"), this process
worked.  But the rdma_cm now allows rdma_bind_addr to bind to an
RDMA device using the loopback address, and attaches the rdma_cm_id
to the RDMA device as part of the bind.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..5850411 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1739,6 +1739,9 @@ err:
 }
 EXPORT_SYMBOL(rdma_resolve_route);
 
+/*
+ * Only IB devices support loopback connections.
+ */
 static int cma_bind_loopback(struct rdma_id_private *id_priv)
 {
struct cma_device *cma_dev;
@@ -1753,11 +1756,16 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
ret = -ENODEV;
goto out;
}
-   list_for_each_entry(cma_dev, &dev_list, list)
+   list_for_each_entry(cma_dev, &dev_list, list) {
+   if (rdma_node_get_transport(cma_dev->device->node_type) !=
+   RDMA_TRANSPORT_IB)
+   continue;
+
for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
if (!ib_query_port(cma_dev->device, p, &port_attr) &&
port_attr.state == IB_PORT_ACTIVE)
goto port_found;
+   }
 
p = 1;
cma_dev = list_entry(dev_list.next, struct cma_device, list);
@@ -1771,9 +1779,7 @@ port_found:
if (ret)
goto out;
 
-   id_priv->id.route.addr.dev_addr.dev_type =
-   (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB) ?
-   ARPHRD_INFINIBAND : ARPHRD_ETHER;
+   id_priv->id.route.addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
 
rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
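To make the control flow of the patched loop easier to follow, here is a simplified userspace model of the device/port selection in cma_bind_loopback(). The types and names are stand-ins invented for this sketch (no ib_query_port(), no locking, no GID lookup); it keeps only the transport check added by the patch and the existing fallback to the first listed device, port 1, when no active IB port is found.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types; these are not kernel structures. */
enum model_transport { MODEL_TRANSPORT_IB, MODEL_TRANSPORT_IWARP };

struct model_dev {
    const char *name;
    enum model_transport transport;
    int phys_port_cnt;          /* ports are numbered 1..phys_port_cnt */
    const bool *port_active;    /* port_active[p - 1] describes port p */
};

struct model_choice {
    size_t dev;                 /* index into the device list */
    int port;
};

/* Returns false only for an empty list (the kernel returns -ENODEV). */
static bool model_pick(const struct model_dev *devs, size_t n,
                       struct model_choice *out)
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].transport != MODEL_TRANSPORT_IB)
            continue;                   /* the check added by the patch */
        for (int p = 1; p <= devs[i].phys_port_cnt; ++p) {
            if (devs[i].port_active[p - 1]) {
                out->dev = i;
                out->port = p;
                return true;
            }
        }
    }
    /* Fallback from the unchanged context lines: first device, port 1. */
    out->dev = 0;
    out->port = 1;
    return true;
}
```

For example, with an iWARP device listed before an IB device whose port 2 is active, the model picks the IB device, port 2; with only the iWARP device present, the loop finds nothing and the fallback selects device 0, port 1.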




Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jason Gunthorpe

On Fri, Feb 05, 2010 at 03:08:10PM -0500, Jeff Squyres wrote:
 On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:
 
   I think we should remove the feature of allowing binds to 127.0.0.1 
   altogether based on Jeff's arguments and my assertion that 127.0.0.1 is 
   a sw-loopback mechanism anyway...
  
  I don't agree, the kernel should be free to provide a loop back
  service any way it likes, and if that means using one of the HW
 
 Ok, fine.  Should we push back OFED 1.5.1 until Open MPI can get 1.4.2 out?  
 I don't know when that will be.
 
 In short: you're breaking backward compatibility with zero warning.
 There is real software out there that will break if people upgrade
 their kernel/OFED/RDMA CM/whatever (e.g., Open MPI).  Isn't this
 supposed to be the Enterprise distribution (meaning: stability)?
 (trying to keep the frustration out of my voice...)

Well, I think you are right. This kind of change seems appropriate to
me for mainline, but OFED/RHEL should carry a responsibility to manage
an identified incompatibility, either patch their kernel, patch their
OMPI, or publish an errata. That is the role of a distribution.

 How about this: back out the change for now.  Give everyone time to
 upgrade.  If nothing else, ***give those of us who are involved in
 this community*** time to upgrade.  Then put the feature back in
 after adequate time has passed.

I've seen this approach go badly too :( If it isn't actually in a
mainline kernel, userspace devs tend to ignore it...

Sounds like this is taken care of for now anyhow; Sean's patch to remove
it for iWARP, since it doesn't work today with any iWARP drivers, does
obscure the problem. But it does seem like rdma_cm mode for IB
networks will still be broken in OMPI with the new kernels.

Jason

Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres

On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:

 Well, I think you are right. This kind of change seems appropriate to
 me for mainline, but OFED/RHEL should carry a responsibility to manage
 an identified incompatibility, either patch their kernel, patch their
 OMPI, or publish an errata. That is the role of a distribution.

RHEL has said, multiple times, that they rely on OpenFabrics to do the Right 
Thing.  They don't do a lot of testing, validating, etc.

 Sounds like this is taken care of for now anyhow; Sean's patch to remove
 it for iWARP, since it doesn't work today with any iWARP drivers, does
 obscure the problem. But it does seem like rdma_cm mode for IB
 networks will still be broken in OMPI with the new kernels.

Correct.

So why not back off putting this in the kernel that's coming out now now now?  
Why not put it in *next* kernel?  (or even better, the one after that)

Is there a rush / need to have this in *now*?

-- 
Jeff Squyres
jsquy...@cisco.com


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise


Jeff Squyres wrote:

On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:

  

Well, I think you are right. This kind of change seems appropriate to
me for mainline, but OFED/RHEL should carry a responsibility to manage
an identified incompatibility, either patch their kernel, patch their
OMPI, or publish an errata. That is the role of a distribution.



RHEL has said, multiple times, that they rely on OpenFabrics to do the Right 
Thing.  They don't do a lot of testing, validating, etc.

  

Sounds like this is taken care of for now anyhow; Sean's patch to remove
it for iWARP, since it doesn't work today with any iWARP drivers, does
obscure the problem. But it does seem like rdma_cm mode for IB
networks will still be broken in OMPI with the new kernels.



Correct.

So why not back off putting this in the kernel that's coming out now now now?  
Why not put it in *next* kernel?  (or even better, the one after that)

Is there a rush / need to have this in *now*?

  


There is still some inconsistency here.   Sean, you claimed binds to 
127.0.0.1 succeed in ofed-1.4 for IB devices.  If so, then folks running 
IB/openmpi/rdmacm should be seeing issues.  We need to dig a little more...




RE: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty

There is still some inconsistency here.   Sean, you claimed binds to
127.0.0.1 succeed in ofed-1.4 for IB devices.  If so, then folks running
IB/openmpi/rdmacm should be seeing issues.  We need to dig a little more...

You can verify this by running ucmatose -b 127.0.0.1 and seeing whether the test
enters the listening state.

Can you also try testing iwarp with the patch that I sent? 

- Sean


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres

On Feb 5, 2010, at 4:53 PM, Steve Wise wrote:

 There is still some inconsistency here.   Sean, you claimed binds to
 127.0.0.1 succeed in ofed-1.4 for IB devices.  If so, then folks running
 IB/openmpi/rdmacm should be seeing issues.  We need to dig a little more...

FWIW, I can run Open MPI v1.4.2beta on my OFED 1.4.1 cluster over IB devices 
using RDMA CM with no problems.  

I added some debug statements in OMPI showing which rdma_bind_addr() calls it attempts, 
just to be sure.  Here's a run across 2 nodes, each with a single 2-port mthca 
(each port connected to a different IB subnet, not that that matters):

$ mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring
[svbu-mpi025:05592] FAILED to bind to 127.0.0.1
[svbu-mpi025:05592] FAILED to bind to 172.29.218.165
[svbu-mpi025:05592] SUCCEEDED to bind to 10.10.30.165
[svbu-mpi025:05592] SUCCEEDED to bind to 10.10.20.165
[svbu-mpi026:05529] FAILED to bind to 127.0.0.1
[svbu-mpi026:05529] FAILED to bind to 172.29.218.166
[svbu-mpi026:05529] SUCCEEDED to bind to 10.10.30.166
[svbu-mpi026:05529] SUCCEEDED to bind to 10.10.20.166
...

The 172.x address is my gigE device (eth0).

-- 
Jeff Squyres
jsquy...@cisco.com


Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise


Sean Hefty wrote:

There is still some inconsistency here.   Sean, you claimed binds to
127.0.0.1 succeed in ofed-1.4 for IB devices.  If so, then folks running
IB/openmpi/rdmacm should be seeing issues.  We need to dig a little more...



You can verify this by running ucmatose -b 127.0.0.1 and seeing whether the test
enters the listening state.
  
Well ofed-1.4.1 with openmpi gets failures when binding to 127.0.0.1 on 
mthca devs.  Jeff will post the results soon.


Are you sure ucmatose is really binding to that address? :)

Can you also try testing iwarp with the patch that I sent? 

  


I will soon.  Can't do it right now.  I'll try tonight or tomorrow.



[PATCH] librdmacm: transition QP to RTS before sending reply

2010-02-05 Thread Sean Hefty

In order to handle a race condition where the passive side of
a connection can receive data on a QP before the connection established
event has been received, transition the QP to RTS before sending the reply.
This allows a user to send a response to any received message immediately,
rather than waiting until the connection established event has been
processed.

A similar fix was applied to the kernel rdma_cm a while ago.
Simply duplicate the fix in the user space library.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 src/cma.c |   27 +--
 1 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/src/cma.c b/src/cma.c
index efad6ae..59e89dd 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -652,7 +652,8 @@ static int ucma_modify_qp_rtr(struct rdma_cm_id *id,
return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask);
 }
 
-static int ucma_modify_qp_rts(struct rdma_cm_id *id)
+static int ucma_modify_qp_rts(struct rdma_cm_id *id,
+ struct rdma_conn_param *conn_param)
 {
struct ibv_qp_attr qp_attr;
int qp_attr_mask, ret;
@@ -662,6 +663,8 @@ static int ucma_modify_qp_rts(struct rdma_cm_id *id)
if (ret)
return ret;
 
+   if (conn_param)
+   qp_attr.max_rd_atomic = conn_param->initiator_depth;
return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask);
 }
 
@@ -929,6 +932,10 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
ret = ucma_modify_qp_rtr(id, conn_param);
if (ret)
return ret;
+
+   ret = ucma_modify_qp_rts(id, conn_param);
+   if (ret)
+   return ret;
}
 
CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_ACCEPT, size);
@@ -1212,7 +1219,7 @@ static int ucma_process_conn_resp(struct cma_id_private *id_priv)
if (ret)
goto err;
 
-   ret = ucma_modify_qp_rts(id_priv-id);
+   ret = ucma_modify_qp_rts(id_priv-id, NULL);
if (ret)
goto err;
 
@@ -1231,17 +1238,6 @@ err:
return ret;
 }
 
-static int ucma_process_establish(struct rdma_cm_id *id)
-{
-   int ret;
-
-   ret = ucma_modify_qp_rts(id);
-   if (ret)
-   ucma_modify_qp_err(id);
-
-   return ret;
-}
-
 static int ucma_process_join(struct cma_event *evt)
 {
evt-mc-mgid = evt-event.param.ud.ah_attr.grh.dgid;
@@ -1367,11 +1363,6 @@ retry:
}
 
ucma_copy_conn_event(evt, resp-param.conn);
-   evt-event.status = ucma_process_establish(evt-id_priv-id);
-   if (evt-event.status) {
-   evt-event.event = RDMA_CM_EVENT_CONNECT_ERROR;
-   evt-id_priv-connect_error = 1;
-   }
break;
case RDMA_CM_EVENT_REJECTED:
if (evt-id_priv-connect_error) {




Re: bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Roland Dreier

   Well, I think you are right. This kind of change seems appropriate to
   me for mainline, but OFED/RHEL should carry the responsibility to manage
   an identified incompatibility: either patch their kernel, patch their
   OMPI, or publish errata. That is the role of a distribution.
  
  RHEL has said, multiple times, that they rely on OpenFabrics to do the Right 
  Thing.  They don't do a lot of testing, validating, etc.

In that case OFED plays the role of distribution.
-- 
Roland Dreier rola...@cisco.com
Cisco.com - http://www.cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

Re: [PATCH 0/2] Using multi-smps on the wire in libibnetdisc

2010-02-05 Thread Ira Weiny

On Fri, 5 Feb 2010 07:27:05 -0500
Hal Rosenstock hal.rosenst...@gmail.com wrote:

 
 
  Note that 2 does not give much speedup, whereas 4 does.  Obviously this could
  have to do with the fact that there were 2 bad nodes (so if you had hundreds
  of unresponsive nodes, a higher value might be worth using)
 
 It depends on whether the number of unresponsive nodes is the same as or
 higher than the number of outstanding/parallel SMPs. In a sense, the number
 of outstanding SMPs is a measure of how many unresponsive nodes one is
 willing to tolerate before slowing down/waiting for timeouts. In some
 environments, unresponsive nodes are the normal case.

Agreed, but where should we set the default?  I don't think 4 is a bad default.
I don't think it makes the diags overly aggressive compared with OpenSM.
Sasha, I guess this is your call.

Just tell me where to set it and I will make the patch.  Basically, with the
user option it can always be changed on a run-by-run basis.

Ira

 
 -- Hal
 
  but as a default compromise I think 4 is good.
 
  Ira
 
   
Also, I think you are correct that we should increase OpenSM's default from 4
to 8, for the same reason as above.  Some of our clusters have worked better
with 8 when we have had issues.  But right now we are still running with 4.
  
   I'm concerned about just increasing ibnetdiscover to 4 rather than 2.
   I've seen a number of clusters with SMP dropping with the current
   lower defaults.
 
  So OpenSM is seeing dropped packets?  With 4 SMPs on the wire?  I do see some
  VL15Dropped errors (maybe 2-3 a day) but I did not think that would be an
  issue.  What kind of rate are you seeing?
 
  The other question is: do people regularly run the tools which use
  libibnetdisc (ibqueryerrors, iblinkinfo, ibnetdiscover)?  We do.  If others
  do not, then I would say this change would have less impact, as they would
  want the diags to have some priority for debugging.  The other option is to
  change the patch to use a default of 2 and allow users to change it depending
  on what they are trying to do.  If you think that is best I will change the
  patch.
 
  Ira
 
  
   -- Hal
  
Ira
   
   
-- Hal
   

 The first patch converts the algorithm and the second adds the 
 ibnd_set_max_smps_on_wire call.

 Let me know what you think.  Because the algorithm changed so much,
 testing this is a bit difficult: the order of node discovery is
 different.  However, I have done extensive diffing of the output of
 ibnetdiscover and things look good.

 Ira

 --
 Ira Weiny
 Math Programmer/Computer Scientist
 Lawrence Livermore National Lab
 925-423-8008
 wei...@llnl.gov


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
