Pasha,
That's great, thanks for the help. When exactly do you expect that
1.2.8 will be released?
Thanks,
Matt
On Tue, Oct 7, 2008 at 1:29 PM, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote:
Matt,
For all 1.2.X versions you should use btl_openib_ib_pkey_val.
In the ongoing 1.3 version the parameter was renamed to
btl_openib_of_pkey_val.
BTW, we plan to release version 1.2.8 very soon, and it will include
the partition bug fix.
Regards,
Pasha
Matt Burgess wrote:
Pasha,
With your patch and parameter suggestion, it works! So to be clear
btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is
for 1.2.7?
Thanks again,
Matt
On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote:
Matt,
Can you please run "cat /sys/class/infiniband/mlx4_0/ports/1/pkeys/*"
on your d2-ib and d3-ib nodes?
I would like to check the partition configuration.
Oh, BTW, I see that the command line in the previous email was wrong.
Please use the following command line (the parameter name should be
"btl_openib_ib_pkey_val" for ompi-1.2.6, and my patch accepts
HEX/DEC values):
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self \
    -mca btl_openib_ib_pkey_val 0x8109 /cluster/pallas/x86_64-ib/IMB-MPI1
Ompi 1.2.6 should work OK with this patch.
Thanks,
Pasha
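To illustrate what "accepts HEX/DEC values" means in practice, here is a minimal C sketch of a parser that takes a pkey written either as 0x8109 or as 33033. It is not the Open MPI intify() routine from the patch; parse_pkey is a hypothetical helper, and it assumes the standard strtoul() with base 0 is an acceptable stand-in (which also means a value with a bare leading 0 would be read as octal).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (NOT the Open MPI intify() routine): parse a pkey
 * written as decimal ("33033") or hex ("0x8109").
 * Returns 0 and stores the 16-bit value in *out on success,
 * returns -1 on malformed or out-of-range input. */
static int parse_pkey(const char *text, uint16_t *out)
{
    char *end = NULL;
    unsigned long value;

    if (text == NULL || *text == '\0') {
        return -1;
    }

    errno = 0;
    /* Base 0: a "0x" prefix selects base 16, otherwise base 10
     * (a bare leading "0" would select octal). */
    value = strtoul(text, &end, 0);

    /* Reject conversion errors, trailing junk, and values wider than 16 bits. */
    if (errno != 0 || end == text || *end != '\0' || value > 0xffffUL) {
        return -1;
    }

    *out = (uint16_t) value;
    return 0;
}
```

With this sketch, parse_pkey("0x8109", &v) and parse_pkey("33033", &v) both store the same value in v, which is why either spelling can be passed on the mpirun command line once the parameter is read as a string.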
Matt Burgess wrote:
Pasha,
Thanks for the patch. Unfortunately, it doesn't seem like that
fixed the problem. I realized earlier I didn't mention what
version of OpenMPI I was trying: it's 1.2.6.
Should I be trying 1.2.7 with this patch?
Thanks,
Matt
2008/10/7 Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il>
Matt,
Can you please try the attached patch? I guess it will resolve
this issue.
Thanks,
Pasha
Matt Burgess wrote:
Lenny,
Thanks for the info. It still doesn't seem to be working.
My command line is:
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -H d2-ib,d3-ib -mca btl openib,self \
    -mca btl_openib_of_pkey_val 33033 /cluster/pallas/x86_64-ib/IMB-MPI1
I don't have a "/sys/class/infiniband/mthca0/ports/1/pkeys/"
but I do have "/sys/class/infiniband/mlx4_0/ports/1/pkeys/".
Its contents are:
0 106 114 122 16 24 32 40 49 57
65 73 81
9 98
1 107 115 123 17 25 33 41 5 58
66 74 82
90 99
10 108 116 124 18 26 34 42 50 59
67 75 83
91 100 109 117 125 19 27 35 43 51
6 68 76 84 92 101 11 118 126 2 28
36 44 52 60
69 77 85 93 102 110 119 127 20 29
37 45 53 61 7 78 86 94 103 111 12
13 21 3 38
46 54 62 70 79 87 95 104 112 120
14 22 30 39 47 55 63 71 8 88 96
105
113 121 15
23 31 4 48 56 64 72 80 89 97
We aren't using OpenSM, but Voltaire's SM on a 2012 switch.
Thanks again,
Matt
On Tue, Oct 7, 2008 at 9:37 AM, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote:
Hi Matt,
It seems that the right way to do it is the following:
-mca btl openib,self -mca btl_openib_ib_pkey_val 33033
where the value is the decimal number of the pkey (in your case
0x8109 = 33033), and there is no need for the
btl_openib_ib_pkey_ix value.
ex.
mpirun -np 2 -H witch2,witch3 -mca btl openib,self \
    -mca btl_openib_ib_pkey_val 32769 ./mpi_p1_4_1_2 -t lt
LT (2) (size min max avg) 1 3.511429 3.511429 3.511429
If it's not working, check
"cat /sys/class/infiniband/mthca0/ports/1/pkeys/*" for
the pkeys and the SM; maybe it's a setup issue.
Pasha is currently checking this issue.
Best regards,
Lenny.
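The arithmetic behind "0x8109 = 33033" can be spelled out in a short C sketch. In InfiniBand, a P_Key is 16 bits: the top bit (0x8000) is the membership bit and the low 15 bits are the partition number, so 0x8109 is partition 0x109 with full membership. The macro and function names below are hypothetical, chosen only for illustration; the 0x7fff mask is the same one the attached patch applies.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define PKEY_MEMBERSHIP_BIT 0x8000u /* top bit: 1 = full member, 0 = limited */
#define PKEY_BASE_MASK      0x7fffu /* low 15 bits: the partition number */

/* Partition number with the membership bit stripped. */
static uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t) (pkey & PKEY_BASE_MASK);
}

/* Non-zero when the membership bit is set. */
static int pkey_is_full_member(uint16_t pkey)
{
    return (pkey & PKEY_MEMBERSHIP_BIT) != 0;
}

/* Compare two P_Keys while ignoring the membership bit. */
static int pkey_same_partition(uint16_t a, uint16_t b)
{
    return pkey_base(a) == pkey_base(b);
}
```

Under these definitions pkey_base(0x8109) is 0x109, the partition number quoted in the original question, and pkey_same_partition(0x8109, 0x0109) is true, so a comparison done this way does not depend on which form of the value is supplied.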
On 10/7/08, Jeff Squyres <jsquy...@cisco.com> wrote:
FWIW, if this configuration is for all of your users, you
might want to specify these MCA params in the default MCA
param file, or the environment, etc., just so that you
don't have to specify them on every mpirun command line.
See http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
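As a rough illustration of the two alternatives mentioned above, the C sketch below formats an MCA parameter the way each mechanism expects it: an environment variable named OMPI_MCA_<param>, or a "<param> = <value>" line for an MCA parameter file such as $HOME/.openmpi/mca-params.conf. The helper names are hypothetical, and the pkey value is only the one used in this thread; check the FAQ entry for the file locations on a given installation.

```c
#include <stdio.h>

/* Hypothetical helper: build the environment variable name
 * "OMPI_MCA_<param>" into buf. Returns 0 on success, -1 if buf is too small. */
static int mca_env_name(const char *param, char *buf, size_t len)
{
    int n = snprintf(buf, len, "OMPI_MCA_%s", param);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Hypothetical helper: build a "<param> = <value>" line as used in an
 * MCA parameter file. Returns 0 on success, -1 if buf is too small. */
static int mca_conf_line(const char *param, const char *value,
                         char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s = %s", param, value);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

For the 1.2.x parameter discussed here, the first helper yields OMPI_MCA_btl_openib_ib_pkey_val and the second yields "btl_openib_ib_pkey_val = 0x8109".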
On Oct 7, 2008, at 5:43 AM, Lenny Verkhovsky wrote:
Sorry, I misunderstood the question.
Thanks to Pasha, the right command line will be:
-mca btl openib,self -mca btl_openib_of_pkey_val 0x8109 \
    -mca btl_openib_of_pkey_ix 1
ex.
#mpirun -np 2 -H witch2,witch3 -mca btl openib,self \
    -mca btl_openib_of_pkey_val 0x8001 -mca btl_openib_of_pkey_ix 1 \
    ./mpi_p1_4_TRUNK -t lt
LT (2) (size min max avg) 1 3.443480 3.443480 3.443480
Best regards
Lenny.
On 10/6/08, Jeff Squyres <jsquy...@cisco.com> wrote:
On Oct 5, 2008, at 1:22 PM, Lenny Verkhovsky wrote:
you should probably use -mca tcp,self -mca btl_openib_if_include ib0.8109
Really? I thought we only took OpenFabrics device names
in the openib_if_include MCA param...? It looks like
ib0.8109 is an IPoIB device name.
Lenny.
On 10/3/08, Matt Burgess <burgess.m...@gmail.com> wrote:
Hi,
I'm trying to get openmpi working over openib partitions.
On this cluster, the partition number is 0x109. The ib
interfaces are pingable over the appropriate ib0.8109 interface:
d2:/opt/openmpi-ib # ifconfig ib0.8109
ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
          inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
I have tried the following:
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile \
    -mca btl openib,self -mca btl_openib_max_btls 1 \
    -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 \
    /cluster/pallas/x86_64-ib/IMB-MPI1
but I just get a RETRY EXCEEDED ERROR. Is there an MCA
parameter I am missing?
I was successful using tcp only:
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile \
    -mca btl tcp,self -mca btl_openib_max_btls 1 \
    -mca btl_openib_ib_pkey_val 0x8109 \
    /cluster/pallas/x86_64-ib/IMB-MPI1
Thanks,
Matt Burgess
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
-- Jeff Squyres
Cisco Systems
------------------------------------------------------------------------
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
-- --
Pavel Shamis (Pasha)
Mellanox Technologies LTD.
Index: ompi/mca/btl/openib/btl_openib_component.c
===================================================================
--- ompi/mca/btl/openib/btl_openib_component.c (revision 19490)
+++ ompi/mca/btl/openib/btl_openib_component.c (working copy)
@@ -558,7 +558,7 @@ static int init_one_hca(opal_list_t *btl
goto dealloc_pd;
}
- ret = OMPI_SUCCESS;
+ ret = OMPI_SUCCESS;
/* Note ports are 1 based hence j = 1 */
for(i = 1; i <= hca->ib_dev_attr.phys_port_cnt; i++){
struct ibv_port_attr ib_port_attr;
@@ -580,7 +580,7 @@ static int init_one_hca(opal_list_t *btl
uint16_t pkey,j;
for (j=0; j < hca->ib_dev_attr.max_pkeys; j++) {
ibv_query_pkey(hca->ib_dev_context, i, j, &pkey);
- pkey=ntohs(pkey);
+ pkey=ntohs(pkey) & 0x7fff;
if(pkey == mca_btl_openib_component.ib_pkey_val){
ret = init_one_port(btl_list, hca, i, j, &ib_port_attr);
break;
Index: ompi/mca/btl/openib/btl_openib_ini.c
===================================================================
--- ompi/mca/btl/openib/btl_openib_ini.c (revision 19490)
+++ ompi/mca/btl/openib/btl_openib_ini.c (working copy)
@@ -90,8 +90,6 @@ static int parse_line(parsed_section_val
static void reset_section(bool had_previous_value, parsed_section_values_t *s);
static void reset_values(ompi_btl_openib_ini_values_t *v);
static int save_section(parsed_section_values_t *s);
-static int intify(char *string);
-static int intify_list(char *str, uint32_t **values, int *len);
static inline void show_help(const char *topic);
@@ -364,14 +362,14 @@ static int parse_line(parsed_section_val
all whitespace at the beginning and ending of the value. */
if (0 == strcasecmp(key_buffer, "vendor_id")) {
- if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_ids,
+ if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_ids,
&sv->vendor_ids_len))) {
return ret;
}
}
else if (0 == strcasecmp(key_buffer, "vendor_part_id")) {
- if (OMPI_SUCCESS != (ret = intify_list(value, &sv->vendor_part_ids,
+ if (OMPI_SUCCESS != (ret = ompi_btl_openib_ini_intify_list(value, &sv->vendor_part_ids,
&sv->vendor_part_ids_len))) {
return ret;
}
@@ -379,13 +377,13 @@ static int parse_line(parsed_section_val
else if (0 == strcasecmp(key_buffer, "mtu")) {
/* Single value */
- sv->values.mtu = (uint32_t) intify(value);
+ sv->values.mtu = (uint32_t) ompi_btl_openib_ini_intify(value);
sv->values.mtu_set = true;
}
else if (0 == strcasecmp(key_buffer, "use_eager_rdma")) {
/* Single value */
- sv->values.use_eager_rdma = (uint32_t) intify(value);
+ sv->values.use_eager_rdma = (uint32_t) ompi_btl_openib_ini_intify(value);
sv->values.use_eager_rdma_set = true;
}
@@ -547,7 +545,7 @@ static int save_section(parsed_section_v
/*
* Do string-to-integer conversion, for both hex and decimal numbers
*/
-static int intify(char *str)
+int ompi_btl_openib_ini_intify(char *str)
{
while (isspace(*str)) {
++str;
@@ -568,7 +566,7 @@ static int intify(char *str)
/*
* Take a comma-delimited list and infity them all
*/
-static int intify_list(char *value, uint32_t **values, int *len)
+int ompi_btl_openib_ini_intify_list(char *value, uint32_t **values, int *len)
{
char *comma;
char *str = value;
@@ -584,7 +582,7 @@ static int intify_list(char *value, uint
if (NULL == *values) {
return OMPI_ERR_OUT_OF_RESOURCE;
}
- *values[0] = (uint32_t) intify(str);
+ *values[0] = (uint32_t) ompi_btl_openib_ini_intify(str);
*len = 1;
} else {
/* If we found a comma, loop over all the values. Be a
@@ -594,7 +592,7 @@ static int intify_list(char *value, uint
do {
*comma = '\0';
*values = realloc(*values, sizeof(uint32_t) * (*len + 2));
- (*values)[*len] = (int32_t) intify(str);
+ (*values)[*len] = (int32_t) ompi_btl_openib_ini_intify(str);
++(*len);
str = comma + 1;
comma = strchr(str, ',');
@@ -602,7 +600,7 @@ static int intify_list(char *value, uint
/* Get the last value (i.e., the value after the last
comma, because it won't have been snarfed in the loop) */
- (*values)[*len] = (uint32_t) intify(str);
+ (*values)[*len] = (uint32_t) ompi_btl_openib_ini_intify(str);
++(*len);
}
Index: ompi/mca/btl/openib/btl_openib_ini.h
===================================================================
--- ompi/mca/btl/openib/btl_openib_ini.h (revision 19490)
+++ ompi/mca/btl/openib/btl_openib_ini.h (working copy)
@@ -49,6 +49,9 @@ extern "C" {
*/
int ompi_btl_openib_ini_finalize(void);
+ int ompi_btl_openib_ini_intify(char *string);
+ int ompi_btl_openib_ini_intify_list(char *str, uint32_t **values, int *len);
+
#if defined(c_plusplus) || defined(__cplusplus)
}
#endif
Index: ompi/mca/btl/openib/btl_openib_mca.c
===================================================================
--- ompi/mca/btl/openib/btl_openib_mca.c (revision 19490)
+++ ompi/mca/btl/openib/btl_openib_mca.c (working copy)
@@ -27,6 +27,7 @@
#include "opal/mca/base/mca_base_param.h"
#include "btl_openib.h"
#include "btl_openib_mca.h"
+#include "btl_openib_ini.h"
/*
* Local flags
@@ -97,7 +98,7 @@ static inline int reg_int(const char* pa
*/
int btl_openib_register_mca_params(void)
{
- char *msg, *str;
+ char *msg, *str, *pkey;
int ival, ival2, ret, tmp;
ret = OMPI_SUCCESS;
@@ -192,13 +193,15 @@ int btl_openib_register_mca_params(void)
0, &ival, REGINT_GE_ZERO));
mca_btl_openib_component.ib_pkey_ix = (uint32_t) ival;
- CHECK(reg_int("ib_pkey_val", "InfiniBand pkey value"
+ CHECK(reg_string("ib_pkey_val", "InfiniBand pkey value"
"(must be > 0 and < 0xffff)",
- 0, &ival, REGINT_GE_ZERO));
- if (ival > 0xffff) {
+ "0", &pkey, 0));
+ mca_btl_openib_component.ib_pkey_val = ompi_btl_openib_ini_intify(pkey) & 0x7fff;
+ if (mca_btl_openib_component.ib_pkey_val > 0xffff ||
+ mca_btl_openib_component.ib_pkey_val < 0) {
ret = OMPI_ERR_BAD_PARAM;
}
- mca_btl_openib_component.ib_pkey_val = (uint32_t) ival;
+ free(pkey);
CHECK(reg_int("ib_psn", "InfiniBand packet sequence starting number "
"(must be >= 0)",