Re: [OMPI users] setsockopt() fails with EINVAL on solaris

2012-07-30 Thread TERRY DONTJE
Do you know what r# of 1.6 you were trying to compile?  Is this via the 
tarball or svn?


thanks,

--td

On 7/30/2012 9:41 AM, Daniel Junglas wrote:

Hi,

I compiled Open MPI 1.6 on a 64-bit Solaris UltraSPARC machine.
Compilation and installation worked without a problem. However,
when trying to run an application with mpirun I always got
this error:

[hostname:14798] [[50433,0],0] rmcast:init: setsockopt() failed on MULTICAST_IF
 for multicast network xxx.xxx.xxx.xxx interface xxx.xxx.xxx.xxx
 Error: Invalid argument (22)
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 825
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 744
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 193
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../openmpi-1.6/orte/mca/rmcast/base/rmcast_base_select.c at line 56
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/ess/hnp/ess_hnp_module.c at line 233
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

   orte_rmcast_base_select failed
   -->  Returned value Error (-1) instead of ORTE_SUCCESS


After some digging I found that the following patch seems to fix the
problem (at least the application seems to run correctly now):

--- a/orte/mca/rmcast/udp/rmcast_udp.c  Tue Apr  3 16:30:29 2012
+++ b/orte/mca/rmcast/udp/rmcast_udp.c  Mon Jul 30 15:12:02 2012
@@ -936,9 +936,16 @@
  }
  } else {
  /* on the xmit side, need to set the interface */
+void const *addrptr;
  memset(&inaddr, 0, sizeof(inaddr));
  inaddr.sin_addr.s_addr = htonl(chan->interface);
+#ifdef __sun
+addrlen = sizeof(inaddr.sin_addr);
+addrptr = (void *)&inaddr.sin_addr;
+#else
  addrlen = sizeof(struct sockaddr_in);
+addrptr = (void *)&inaddr;
+#endif

  OPAL_OUTPUT_VERBOSE((2, orte_rmcast_base.rmcast_output,
   "setup:socket:xmit interface %03d.%03d.%03d.%03d",
@@ -945,7 +952,7 @@
   OPAL_IF_FORMAT_ADDR(chan->interface)));

  if ((setsockopt(target_sd, IPPROTO_IP, IP_MULTICAST_IF,
-(void *)&inaddr, addrlen)) < 0) {
+addrptr, addrlen)) < 0) {
  opal_output(0, "%s rmcast:init: setsockopt() failed on MULTICAST_IF\n"
  "\tfor multicast network %03d.%03d.%03d.%03d interface %03d.%03d.%03d.%03d\n\tError: %s (%d)",
  ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),

Can anybody confirm that the patch is good/correct? In particular
that the '__sun' part is the right thing to do?

Thanks,

Daniel
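
For readers following the patch question above, here is a minimal standalone sketch of the call in question. It is not Open MPI code. The option IP_MULTICAST_IF is conventionally documented as taking a struct in_addr, so the sketch passes that structure and its size on every platform rather than branching on __sun. Whether the unconditional form would have been acceptable for rmcast_udp.c is an assumption that this thread does not confirm. The names set_multicast_if and if_addr_host are hypothetical and chosen only for illustration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of Open MPI: select the outgoing
 * multicast interface for UDP socket sd. if_addr_host is the IPv4
 * address of the interface in host byte order (an assumption that
 * mirrors the htonl() call in the patch).
 * Returns 0 on success, -1 on failure. */
static int set_multicast_if(int sd, uint32_t if_addr_host)
{
    struct in_addr ifaddr;

    memset(&ifaddr, 0, sizeof(ifaddr));
    ifaddr.s_addr = htonl(if_addr_host);

    /* Pass the 4-byte struct in_addr itself, not an enclosing
     * struct sockaddr_in, and report errno on failure. */
    if (setsockopt(sd, IPPROTO_IP, IP_MULTICAST_IF,
                   &ifaddr, sizeof(ifaddr)) < 0) {
        fprintf(stderr, "setsockopt(IP_MULTICAST_IF) failed: %s (%d)\n",
                strerror(errno), errno);
        return -1;
    }
    return 0;
}
```

A caller would create a datagram socket with socket(AF_INET, SOCK_DGRAM, 0) and pass the interface address, or INADDR_ANY to let the system pick its default multicast interface.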


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] setsockopt() fails with EINVAL on solaris

2012-07-30 Thread Jeff Squyres
Ralph actually suggests that we just remove rmcast from 1.6.1.


On Jul 30, 2012, at 10:15 AM, TERRY DONTJE wrote:

> Do you know what r# of 1.6 you were trying to compile?  Is this via the 
> tarball or svn?
> 
> thanks,
> 
> --td


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] setsockopt() fails with EINVAL on solaris

2012-07-30 Thread Ralph Castain
FWIW: the rmcast framework shouldn't be in 1.6. Jeff and I are testing removal 
and should have it out of there soon.

In the meantime, the best solution is to configure with "--enable-mca-no-build=rmcast"

On Jul 30, 2012, at 7:15 AM, TERRY DONTJE wrote:

> Do you know what r# of 1.6 you were trying to compile?  Is this via the 
> tarball or svn?
> 
> thanks,
> 
> --td



Re: [OMPI users] setsockopt() fails with EINVAL on solaris

2012-07-30 Thread Daniel Junglas
I built from a tarball, not svn. In the VERSION file I have
  svn_r=r26429
Is that the information you asked for?

Daniel

users-boun...@open-mpi.org wrote on 07/30/2012 04:15:45 PM:
> 
> Do you know what r# of 1.6 you were trying to compile?  Is this via 
> the tarball or svn?
> 
> thanks,
> 
> --td


Re: [OMPI users] setsockopt() fails with EINVAL on solaris

2012-07-31 Thread Daniel Junglas
Thanks,

configuring with '--enable-mca-no-build=rmcast' did the trick for me.

Daniel

users-boun...@open-mpi.org wrote on 07/30/2012 04:21:13 PM:
> FWIW: the rmcast framework shouldn't be in 1.6. Jeff and I are 
> testing removal and should have it out of there soon.
> 
> In the meantime, the best solution is to configure with "--enable-mca-no-build=rmcast"