"Soon."

:-)


On Oct 7, 2008, at 1:30 PM, Matt Burgess wrote:

Pasha,

That's great, thanks for the help. When exactly do you expect that 1.2.8 will be released?

Thanks,
Matt

On Tue, Oct 7, 2008 at 1:29 PM, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il > wrote:
Matt,
For all 1.2.X versions you should use btl_openib_ib_pkey_val.
In the ongoing 1.3 version the parameter was renamed to btl_openib_of_pkey_val.

BTW, we plan to release version 1.2.8 very soon, and it will include the partition bug fix.

Regards,
Pasha

Matt Burgess wrote:
Pasha,

With your patch and parameter suggestion, it works! So to be clear btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for 1.2.7?

Thanks again,
Matt

On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote:

   Matt,
   Can you please run "cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/*"
   on your d2-ib and d3-ib nodes? I would like to check the partition
   configuration.

   Oh, BTW, I see that the command line in the previous email was wrong.
   Please use the following command line (the parameter name should be
   "btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts
   HEX/DEC values):

   /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1

   Ompi 1.2.6 should work OK with this patch.


   Thanks,
   Pasha
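
Editorial sketch: the message above says the patched parameter accepts both hex and decimal values. The snippet below is one way such parsing could look. It is not the Open MPI implementation (the body of `intify` is not shown in this thread), and `parse_pkey` is a hypothetical name introduced only for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Open MPI: parse a pkey given either
 * as hex ("0x8109") or as decimal ("33033").
 * Returns 0 on success and stores the value in *out; returns -1 on error. */
int parse_pkey(const char *text, unsigned int *out)
{
    assert(text != NULL && out != NULL);

    /* Skip leading whitespace. */
    while (isspace((unsigned char) *text)) {
        ++text;
    }

    /* A leading "0x"/"0X" selects hex; everything else is read as decimal. */
    int base = (text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) ? 16 : 10;

    char *end = NULL;
    errno = 0;
    unsigned long v = strtoul(text, &end, base);

    /* Reject empty input, trailing junk, overflow, and values above 16 bits. */
    if (errno != 0 || end == text || *end != '\0' || v > 0xffffUL) {
        return -1;
    }
    *out = (unsigned int) v;
    return 0;
}
```

With this sketch, "0x8109" and "33033" parse to the same value, so either spelling could be passed on the mpirun command line.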

   Matt Burgess wrote:

       Pasha,

       Thanks for the patch. Unfortunately, it doesn't seem like that
       fixed the problem. I realized earlier I didn't mention what
       version of OpenMPI I was trying - it's 1.2.6. Should I be
       trying 1.2.7 with this patch?

       Thanks,
       Matt

       2008/10/7 Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il>


          Matt,
          Can you please try the attached patch? I guess it will resolve
          this issue.

          Thanks,
          Pasha

          Matt Burgess wrote:

              Lenny,

              Thanks for the info. It still doesn't seem to be working.
              My command line is:

              /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self -mca btl_openib_of_pkey_val 33033 /cluster/pallas/x86_64-ib/IMB-MPI1

              I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/"
              but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
              Its contents are:

              0 106 114 122 16 24 32 40 49 57 65 73 81 9 98 1
              107 115 123 17 25 33 41 5 58 66 74 82 90 99 10 108
              116 124 18 26 34 42 50 59 67 75 83 91 100 109 117 125
              19 27 35 43 51 6 68 76 84 92 101 11 118 126 2 28
              36 44 52 60 69 77 85 93 102 110 119 127 20 29 37 45
              53 61 7 78 86 94 103 111 12 13 21 3 38 46 54 62
              70 79 87 95 104 112 120 14 22 30 39 47 55 63 71 8
              88 96 105 113 121 15 23 31 4 48 56 64 72 80 89 97

              We aren't using OpenSM, but Voltaire's SM on a 2012 switch.

              Thanks again,
              Matt


              On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote:

                 Hi Matt,

                 It seems that the right way to do it is the following:

                 -mca btl openib,self -mca btl_openib_ib_pkey_val 33033

                 where the value is the decimal form of the pkey (in your
                 case 0x8109 = 33033); there is no need for the
                 btl_openib_ib_pkey_ix value.

                 ex.
                 mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
                 LT (2) (size min max avg) 1 3.511429 3.511429 3.511429

                 If it's not working, check
                 cat /sys/class/infiniband/mthca0/ports/1/pkeys/*
                 for the pkeys and check the SM; maybe it's a setup issue.

                 Pasha is currently checking this issue.

                 Best regards,

                 Lenny.
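
Editorial sketch: the message above relies on 0x8109 and 33033 being the same number, and the original report calls the partition "0x109". In InfiniBand the top bit of the 16-bit P_Key is the membership bit (full versus limited) and the low 15 bits are the partition number. The helper names below are hypothetical and appear nowhere in Open MPI.

```c
#include <stdint.h>

#define PKEY_MEMBERSHIP_BIT 0x8000u  /* top bit: 1 = full member, 0 = limited member */
#define PKEY_BASE_MASK      0x7fffu  /* low 15 bits: the partition number */

/* Hex and decimal spellings name the same pkey; checked at compile time. */
_Static_assert(0x8109 == 33033, "0x8109 and 33033 are the same value");
_Static_assert(0x8001 == 32769, "0x8001 and 32769 are the same value");

/* Hypothetical helper: strip the membership bit, leaving the partition number. */
static inline uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t) (pkey & PKEY_BASE_MASK);
}

/* Hypothetical helper: nonzero when the pkey carries full membership. */
static inline int pkey_is_full_member(uint16_t pkey)
{
    return (pkey & PKEY_MEMBERSHIP_BIT) != 0;
}
```

Under these definitions, pkey 0x8109 is partition 0x109 with full membership, which matches the ib0.8109 interface name used elsewhere in the thread.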





                 On 10/7/08, Jeff Squyres <jsquy...@cisco.com> wrote:

                     FWIW, if this configuration is for all of your users, you
                     might want to specify these MCA params in the default MCA
                     param file, or the environment, etc., just so that you
                     don't have to specify them on every mpirun command line.

                     See http://www.open-mpi.org/faq/?category=tuning#setting-mca-params .



On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:

                         Sorry, misunderstood the question,

                         thanks to Pasha, the right command line will be

                         -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109 -mca btl_openib_of_pkey_ix 1

                         ex.

                         #mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1 ./mpi_p1_4_TRUNK -t lt
                         LT (2) (size min max avg) 1 3.443480 3.443480 3.443480


                         Best regards

                         Lenny.


                         On 10/6/08, Jeff Squyres <jsquy...@cisco.com> wrote:

                         On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:

                         you should probably use -mca tcp,self  -mca
                         btl_openib_if_include ib0.8109


                         Really?  I thought we only took OpenFabrics device names
                         in the openib_if_include MCA param...?  It looks like
                         ib0.8109 is an IPoIB device name.



                         Lenny.



                         On 10/3/08, Matt Burgess <burgess.m...@gmail.com> wrote:
                         Hi,


                         I'm trying to get openmpi working over openib partitions.
                         On this cluster, the partition number is 0x109. The ib
                         interfaces are pingable over the appropriate ib0.8109
                         interface:

                         d2:/opt/openmpi-ib # ifconfig ib0.8109
                         ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
                                   inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
                                   inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
                                   UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
                                   RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
                                   TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
                                   collisions:0 txqueuelen:256
                                   RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)


                         I have tried the following:

                         /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl openib,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 /cluster/pallas/x86_64-ib/IMB-MPI1

                         but I just get a RETRY EXCEEDED ERROR. Is there an MCA
                         parameter I am missing?

                         I was successful using tcp only:

                         /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile -mca btl tcp,self -mca btl_openib_max_btls 1 -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1



                         Thanks,
                         Matt Burgess

                         _______________________________________________
                         users mailing list
                         us...@open-mpi.org
                         http://www.open-mpi.org/mailman/listinfo.cgi/users



                         --            Jeff Squyres
                         Cisco Systems





                     --        Jeff Squyres
                     Cisco Systems




------------------------------------------------------------------------

              _______________________________________________
              devel mailing list
              de...@open-mpi.org
              http://www.open-mpi.org/mailman/listinfo.cgi/devel



          --    --
          Pavel Shamis (Pasha)
          Mellanox Technologies LTD.


          Index: ompi/mca/btl/openib/btl_openib_component.c
          ===================================================================
          --- ompi/mca/btl/openib/btl_openib_component.c  (revision 19490)
          +++ ompi/mca/btl/openib/btl_openib_component.c  (working copy)
          @@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
                   goto dealloc_pd;
               }

          -    ret = OMPI_SUCCESS;
          +    ret = OMPI_SUCCESS;
               /* Note ports are 1 based hence j = 1 */
               for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
                   struct ibv_port_attr ib_port_attr;
          @@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
                           uint16_t pkey,j;
                           for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
                               ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
          -                    pkey=ntohs(pkey);
          +                    pkey=ntohs(pkey) & 0x7fff;
                               if(pkey == mca_btl_openib_component.ib_pkey_val){
                                   ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
                                   break;
          Index: ompi/mca/btl/openib/btl_openib_ini.c
          ===================================================================
          --- ompi/mca/btl/openib/btl_openib_ini.c        (revision 19490)
          +++ ompi/mca/btl/openib/btl_openib_ini.c        (working copy)
          @@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
           static void reset_section(bool had_previous_value, parsed_section_values_t *s);
           static void reset_values(ompi_btl_openib_ini_values_t *v);
           static int save_section(parsed_section_values_t *s);
          -static int intify(char *string);
          -static int intify_list(char *str, uint32_t **values, int *len);
           static inline void show_help(const char *topic);


          @@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
                  all whitespace at the beginning and ending of the value. */

               if (0 == strcasecmp(key_buffer, "vendor_id")) {
          -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
          +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
                                                          &sv->vendor_ids_len))) {
                       return ret;
                   }
               }

               else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
          -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
          +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
                                                          &sv->vendor_part_ids_len))) {
                       return ret;
                   }
          @@ -379,13 +377,13 @@ static int parse_line(parsed_section_val

               else if (0 == strcasecmp(key_buffer, "mtu")) {
                   /* Single value */
          -        sv->values.mtu = (uint32_t) intify(value);
          +        sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
                   sv->values.mtu_set = true;
               }

               else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
                   /* Single value */
          -        sv->values.use_eager_rdma = (uint32_t) intify(value);
          +        sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
                   sv->values.use_eager_rdma_set = true;
               }

          @@ -547,7 +545,7 @@ static int save_section(parsed_section_v
           /*
            * Do string-to-integer conversion, for both hex and decimal numbers
            */
          -static int intify(char *str)
          +int ompi_btl_openib_ini_intify(char *str)
           {
               while (isspace(*str)) {
                   ++str;
          @@ -568,7 +566,7 @@ static int intify(char *str)
           /*
            * Take a comma-delimited list and infity them all
            */
          -static int intify_list(char *value, uint32_t **values, int *len)
          +int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
           {
               char *comma;
               char *str = value;
          @@ -584,7 +582,7 @@ static int intify_list(char *value, uint
                   if (NULL == *values) {
                       return OMPI_ERR_OUT_OF_RESOURCE;
                   }
          -        *values[0] = (uint32_t) intify(str);
          +        *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
                   *len = 1;
               } else {
                   /* If we found a comma, loop over all the values. Be a
          @@ -594,7 +592,7 @@ static int intify_list(char *value, uint
                   do {
                       *comma = '\0';
                       *values = realloc(*values, sizeof(uint32_t) * (*len + 2));
          -            (*values)[*len] = (int32_t) intify(str);
          +            (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
                       ++(*len);
                       str = comma + 1;
                       comma = strchr(str, ',');
          @@ -602,7 +600,7 @@ static int intify_list(char *value, uint
                   /* Get the last value (i.e., the value after the last
                      comma, because it won't have been snarfed in the
                      loop) */
          -        (*values)[*len] = (uint32_t) intify(str);
          +        (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
                   ++(*len);
               }

          Index: ompi/mca/btl/openib/btl_openib_ini.h
          ===================================================================
          --- ompi/mca/btl/openib/btl_openib_ini.h        (revision 19490)
          +++ ompi/mca/btl/openib/btl_openib_ini.h        (working copy)
          @@ -49,6 +49,9 @@ extern "C" {
                */
               int ompi_btl_openib_ini_finalize(void);

          +    int ompi_btl_openib_ini_intify(char *string);
          +    int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
          +
           #if defined(c_plusplus) || defined(__cplusplus)
           }
           #endif
          Index: ompi/mca/btl/openib/btl_openib_mca.c
          ===================================================================
          --- ompi/mca/btl/openib/btl_openib_mca.c        (revision 19490)
          +++ ompi/mca/btl/openib/btl_openib_mca.c        (working copy)
          @@ -27,6 +27,7 @@
           #include "opal/mca/base/mca_base_param.h"
           #include "btl_openib.h"
           #include "btl_openib_mca.h"
          +#include "btl_openib_ini.h"

           /*
            * Local flags
          @@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
            */
           int btl_openib_register_mca_params(void)
           {
          -    char *msg, *str;
          +    char *msg, *str, *pkey;
               int ival, ival2, ret, tmp;

               ret = OMPI_SUCCESS;
          @@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
                             0, &ival, REGINT_GE_ZERO));
               mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;

          -    CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
          +    CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
                             "(must be > 0 and < 0xffff)",
          -                  0, &ival, REGINT_GE_ZERO));
          -    if (ival > 0xffff) {
          +                  "0", &pkey, 0));
          +    mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
          +    if (mca_btl_openib_component.ib_pkey_val > 0xffff ||
          +            mca_btl_openib_component.ib_pkey_val < 0) {
                   ret = OMPI_ERR_BAD_PARAM;
               }
          -    mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
          +    free(pkey);

               CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
                             "(must be >= 0)",
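
Editorial sketch: the key change in the patch above is that both the configured value and each entry of the port's pkey table are masked with 0x7fff before they are compared, so the membership bit no longer prevents a match. The standalone snippet below imitates that comparison without libibverbs. The function name `find_pkey_index` and the plain-array table are assumptions made for illustration; in the real code the entries come from `ibv_query_pkey()`, which returns them in network byte order.

```c
#include <arpa/inet.h>  /* ntohs (POSIX) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scan a pkey table whose entries are in network byte
 * order and return the index of the first entry whose 15-bit partition
 * number equals that of `wanted`, or -1 when no entry matches.
 * The membership bit (0x8000) is ignored on both sides, as in the patch. */
int find_pkey_index(const uint16_t *table_be, size_t n, uint16_t wanted)
{
    uint16_t wanted_base = (uint16_t) (wanted & 0x7fff);

    for (size_t i = 0; i < n; i++) {
        uint16_t entry_base = (uint16_t) (ntohs(table_be[i]) & 0x7fff);
        if (entry_base == wanted_base) {
            return (int) i;
        }
    }
    return -1;
}
```

Because of the masking, asking for 0x8109 or for 0x0109 selects the same table entry in this sketch.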





   --    --
   Pavel Shamis (Pasha)
   Mellanox Technologies LTD.




--
--
Pavel Shamis (Pasha)
Mellanox Technologies LTD.




--
Jeff Squyres
Cisco Systems
