Pasha,

Thanks for the patch. Unfortunately, it doesn't seem to have fixed the
problem. I realized I hadn't mentioned which version of Open MPI I'm
using: it's 1.2.6. Should I be trying 1.2.7 with this patch?

Thanks,
Matt

2008/10/7 Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il>

> Matt,
> Can you please try the attached patch? I guess it will resolve this issue.
>
> Thanks,
> Pasha
>
> Matt Burgess wrote:
>
>> Lenny,
>>
>> Thanks for the info. It still doesn't seem to be working. My command
>> line is:
>>
>> /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self
>> -mca btl_openib_of_pkey_val 33033 /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>> I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/" but I do have
>> "/sys/class/infiniband/mlx4_0/ports/1/pkeys/". Its contents are:
>>
>> 0    106  114  122  16   24   32   40   49   57   65   73   81   9    98
>> 1    107  115  123  17   25   33   41   5    58   66   74   82   90   99
>> 10   108  116  124  18   26   34   42   50   59   67   75   83   91
>> 100  109  117  125  19   27   35   43   51   6    68   76   84   92
>> 101  11   118  126  2    28   36   44   52   60   69   77   85   93
>> 102  110  119  127  20   29   37   45   53   61   7    78   86   94
>> 103  111  12   13   21   3    38   46   54   62   70   79   87   95
>> 104  112  120  14   22   30   39   47   55   63   71   8    88   96
>> 105  113  121  15   23   31   4    48   56   64   72   80   89   97
>>
>> We aren't using OpenSM, but Voltaire's SM on a 2012 switch.
>>
>> Thanks again,
>> Matt
>>
>>
>> On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky
>> <lenny.verkhov...@gmail.com> wrote:
>>
>>    Hi Matt,
>>
>>    It seems that the right way to do it is the following:
>>
>>    -mca btl openib,self -mca btl_openib_ib_pkey_val 33033
>>
>>    where the value is the pkey written in decimal (in your case
>>    0x8109 = 33033); there is no need for a btl_openib_ib_pkey_ix value.
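
The hex-to-decimal conversion quoted above can be checked from a shell. This is only a sketch: the helper names `pkey_to_dec` and `pkey_to_hex` are made up for illustration and are not part of Open MPI; they rely on the shell's own arithmetic to read a hexadecimal literal.

```sh
#!/bin/sh
# pkey_to_dec VALUE: print a pkey (hex such as 0x8109, or decimal) in decimal.
# Hypothetical helper, for illustration only.
pkey_to_dec() {
    printf '%d\n' "$(( $1 ))"
}

# pkey_to_hex VALUE: print a pkey as 0x-prefixed, four-digit hex.
# Hypothetical helper, for illustration only.
pkey_to_hex() {
    printf '0x%04x\n' "$(( $1 ))"
}

pkey_to_dec 0x8109   # prints 33033
pkey_to_dec 0x8001   # prints 32769
pkey_to_hex 33033    # prints 0x8109
```
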
>>
>>    ex.
>>    mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
>>    btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
>>    LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
>>
>>    If it's not working, run cat
>>    /sys/class/infiniband/mthca0/ports/1/pkeys/* to check the pkeys,
>>    and check the SM; maybe it's a setup issue.
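
A rough sketch of that check as a script, rather than reading the `cat` output by eye. It assumes each file in the `pkeys` directory is named after a table index and holds one pkey as a hex string such as `0x8109`; the function name `find_pkey_index` is hypothetical, and the `mlx4_0` path is the one Matt reports elsewhere in this thread.

```sh
#!/bin/sh
# find_pkey_index DIR VALUE
# Print the index (file name) of every entry under DIR whose contents equal
# VALUE. VALUE may be given in hex (0x8109) or decimal (33033).
# Hypothetical helper, for illustration only.
find_pkey_index() {
    dir=$1
    want=$(( $2 ))
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        val=$(cat "$f")
        [ -n "$val" ] || continue          # skip empty entries
        if [ "$(( $val ))" -eq "$want" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Only runs on a host that actually has this device; silently skipped elsewhere.
pkeys_dir=/sys/class/infiniband/mlx4_0/ports/1/pkeys
if [ -d "$pkeys_dir" ]; then
    find_pkey_index "$pkeys_dir" 0x8109
fi
```

If nothing is printed on a host that does have the device, the requested pkey is not in that port's table, which would point at the subnet manager's partition setup rather than at the MCA parameters.
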
>>
>>    Pasha is currently checking this issue.
>>
>>    Best regards,
>>
>>    Lenny.
>>
>>
>>
>>
>>
>>    On 10/7/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>>        FWIW, if this configuration is for all of your users, you
>>        might want to specify these MCA params in the default MCA
>>        param file, or the environment, ...etc.  Just so that you
>>        don't have to specify it on every mpirun command line.
>>
>>        See
>>        http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.
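
As a sketch of the environment route mentioned above: Open MPI reads any MCA parameter from an environment variable named `OMPI_MCA_<parameter>`. The values below are simply the ones used elsewhere in this thread, not a recommendation.

```sh
#!/bin/sh
# Supply the MCA parameters through the environment instead of -mca options.
# Values are taken from the command lines in this thread.
export OMPI_MCA_btl=openib,self
export OMPI_MCA_btl_openib_ib_pkey_val=33033

# mpirun then needs no -mca options for these two parameters, e.g.:
#   mpirun -np 2 -H d2-ib,d3-ib /cluster/pallas/x86_64-ib/IMB-MPI1
```

The same settings can go in a parameter file as `name = value` lines (for example `btl = openib,self`), either per user in `$HOME/.openmpi/mca-params.conf` or system-wide in `openmpi-mca-params.conf` under the installation's `etc` directory.
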
>>
>>
>>
>>        On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:
>>
>>            Sorry, misunderstood the question,
>>
>>            thanks to Pasha, the right command line will be
>>
>>            -mca btl openib,self -mca btl_openib_of_pkey_val 0x8109
>>            -mca btl_openib_of_pkey_ix 1
>>
>>            ex.
>>
>>            #mpirun -np 2 -H witch2,witch3 -mca btl openib,self -mca
>>            btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1
>>            ./mpi_p1_4_TRUNK -t lt
>>            LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
>>
>>
>>            Best regards
>>
>>            Lenny.
>>
>>
>>            On 10/6/08, Jeff Squyres <jsquy...@cisco.com> wrote:
>>
>>            On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
>>
>>            you should probably use -mca tcp,self  -mca
>>            btl_openib_if_include ib0.8109
>>
>>
>>            Really?  I thought we only took OpenFabrics device names
>>            in the openib_if_include MCA param...?  It looks like
>>            ib0.8109 is an IPoIB device name.
>>
>>
>>
>>            Lenny.
>>
>>
>>
>>            On 10/3/08, Matt Burgess <burgess.m...@gmail.com> wrote:
>>            Hi,
>>
>>
>>            I'm trying to get openmpi working over openib partitions.
>>            On this cluster, the partition number is 0x109. The ib
>>            interfaces are pingable over the appropriate ib0.8109
>>            interface:
>>
>>            d2:/opt/openmpi-ib # ifconfig ib0.8109
>>            ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
>>                    inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
>>                    inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
>>                    UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>>                    RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
>>                    TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
>>                    collisions:0 txqueuelen:256
>>                    RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
>>
>>
>>            I have tried the following:
>>
>>            /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile
>>            machinefile -mca btl openib,self -mca btl_openib_max_btls
>>            1 -mca btl_openib_ib_pkey_val 0x8109 -mca
>>            btl_openib_ib_pkey_ix 1 /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>>            but I just get a RETRY EXCEEDED ERROR. Is there an MCA
>>            parameter I am missing?
>>
>>            I was successful using tcp only:
>>
>>            /opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile
>>            machinefile -mca btl tcp,self -mca btl_openib_max_btls 1
>>            -mca btl_openib_ib_pkey_val 0x8109
>>            /cluster/pallas/x86_64-ib/IMB-MPI1
>>
>>
>>
>>            Thanks,
>>            Matt Burgess
>>
>>            _______________________________________________
>>            users mailing list
>>            us...@open-mpi.org
>>            http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>            --
>>            Jeff Squyres
>>            Cisco Systems
>>
>>
>>
>>
>>        --
>>        Jeff Squyres
>>        Cisco Systems
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies LTD.
>
>
> Index: ompi/mca/btl/openib/btl_openib_component.c
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_component.c  (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_component.c  (working copy)
> @@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
>          goto dealloc_pd;
>     }
>
> -    ret = OMPI_SUCCESS;
> +    ret = OMPI_SUCCESS;
>     /* Note ports are 1 based hence j = 1 */
>     for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
>         struct ibv_port_attr ib_port_attr;
> @@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
>                 uint16_t pkey,j;
>                 for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
>                     ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
> -                    pkey=ntohs(pkey);
> +                    pkey=ntohs(pkey) & 0x7fff;
>                     if(pkey == mca_btl_openib_component.ib_pkey_val){
>                         ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
>                         break;
> Index: ompi/mca/btl/openib/btl_openib_ini.c
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_ini.c        (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_ini.c        (working copy)
> @@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
>  static void reset_section(bool had_previous_value, parsed_section_values_t *s);
>  static void reset_values(ompi_btl_openib_ini_values_t *v);
>  static int save_section(parsed_section_values_t *s);
> -static int intify(char *string);
> -static int intify_list(char *str, uint32_t **values, int *len);
>  static inline void show_help(const char *topic);
>
>
> @@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
>        all whitespace at the beginning and ending of the value. */
>
>     if (0 == strcasecmp(key_buffer, "vendor_id")) {
> -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
> +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
>                                                &sv->vendor_ids_len))) {
>             return ret;
>         }
>     }
>
>     else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
> -        if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
> +        if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
>                                                &sv->vendor_part_ids_len))) {
>             return ret;
>         }
> @@ -379,13 +377,13 @@ static int parse_line(parsed_section_val
>
>     else if (0 == strcasecmp(key_buffer, "mtu")) {
>         /* Single value */
> -        sv->values.mtu = (uint32_t) intify(value);
> +        sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
>         sv->values.mtu_set = true;
>     }
>
>     else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
>         /* Single value */
> -        sv->values.use_eager_rdma = (uint32_t) intify(value);
> +        sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
>         sv->values.use_eager_rdma_set = true;
>     }
>
> @@ -547,7 +545,7 @@ static int save_section(parsed_section_v
>  /*
>  * Do string-to-integer conversion, for both hex and decimal numbers
>  */
> -static int intify(char *str)
> +int ompi_btl_openib_ini_intify(char *str)
>  {
>     while (isspace(*str)) {
>         ++str;
> @@ -568,7 +566,7 @@ static int intify(char *str)
>  /*
>  * Take a comma-delimited list and infity them all
>  */
> -static int intify_list(char *value, uint32_t **values, int *len)
> +int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
>  {
>     char *comma;
>     char *str = value;
> @@ -584,7 +582,7 @@ static int intify_list(char *value, uint
>         if (NULL == *values) {
>             return OMPI_ERR_OUT_OF_RESOURCE;
>         }
> -        *values[0] = (uint32_t) intify(str);
> +        *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
>         *len = 1;
>     } else {
>         /* If we found a comma, loop over all the values.  Be a
> @@ -594,7 +592,7 @@ static int intify_list(char *value, uint
>         do {
>             *comma = '\0';
>             *values = realloc(*values, sizeof(uint32_t) * (*len + 2));
> -            (*values)[*len] = (int32_t) intify(str);
> +            (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
>             ++(*len);
>             str = comma + 1;
>             comma = strchr(str, ',');
> @@ -602,7 +600,7 @@ static int intify_list(char *value, uint
>         /* Get the last value (i.e., the value after the last
>            comma, because it won't have been snarfed in the
>            loop) */
> -        (*values)[*len] = (uint32_t) intify(str);
> +        (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
>         ++(*len);
>     }
>
> Index: ompi/mca/btl/openib/btl_openib_ini.h
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_ini.h        (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_ini.h        (working copy)
> @@ -49,6 +49,9 @@ extern "C" {
>      */
>     int ompi_btl_openib_ini_finalize(void);
>
> +    int ompi_btl_openib_ini_intify(char *string);
> +    int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
> +
>  #if defined(c_plusplus) || defined(__cplusplus)
>  }
>  #endif
> Index: ompi/mca/btl/openib/btl_openib_mca.c
> ===================================================================
> --- ompi/mca/btl/openib/btl_openib_mca.c        (revision 19490)
> +++ ompi/mca/btl/openib/btl_openib_mca.c        (working copy)
> @@ -27,6 +27,7 @@
>  #include "opal/mca/base/mca_base_param.h"
>  #include "btl_openib.h"
>  #include "btl_openib_mca.h"
> +#include "btl_openib_ini.h"
>
>  /*
>  * Local flags
> @@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
>  */
>  int btl_openib_register_mca_params(void)
>  {
> -    char *msg, *str;
> +    char *msg, *str, *pkey;
>     int ival, ival2, ret, tmp;
>
>     ret = OMPI_SUCCESS;
> @@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
>                   0, &ival, REGINT_GE_ZERO));
>     mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;
>
> -    CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
> +    CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
>                   "(must be > 0 and < 0xffff)",
> -                  0, &ival, REGINT_GE_ZERO));
> -    if (ival > 0xffff) {
> +                  "0", &pkey, 0));
> +    mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
> +    if (mca_btl_openib_component.ib_pkey_val > 0xffff ||
> +            mca_btl_openib_component.ib_pkey_val < 0) {
>         ret = OMPI_ERR_BAD_PARAM;
>     }
> -    mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
> +    free(pkey);
>
>     CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
>                   "(must be >= 0)",
>
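
To make the effect of the two `& 0x7fff` masks in the patch concrete, here is a small sketch in shell arithmetic. It only mirrors the comparison as it reads in the diff (both the pkey read from the port table and the user-supplied value lose their top bit, the membership bit, before being compared); the helper names `pkey_base` and `pkey_matches` are hypothetical.

```sh
#!/bin/sh
# pkey_base VALUE: drop the top (membership) bit, as "& 0x7fff" does in the
# patch, and print the result as four-digit hex.
# Hypothetical helper, for illustration only.
pkey_base() {
    printf '0x%04x\n' "$(( $1 & 0x7fff ))"
}

# pkey_matches TABLE_ENTRY REQUESTED: succeed when the two values agree once
# the top bit has been masked off on each side.
# Hypothetical helper, for illustration only.
pkey_matches() {
    [ "$(( $1 & 0x7fff ))" -eq "$(( $2 & 0x7fff ))" ]
}

pkey_base 0x8109                                            # prints 0x0109
pkey_matches 0x8109 33033  && echo "33033 matches 0x8109"
pkey_matches 0x8109 0x109  && echo "0x109 matches 0x8109"
pkey_matches 0x8109 0x8001 || echo "0x8001 does not match 0x8109"
```

Under that reading, a request for 0x8109, 33033, or the bare partition number 0x109 should all select the same table entry once the patch is applied.
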
>
