Re: [OMPI devel] Duplicated modex issue.

2012-12-21 Thread Victor Kocheganov
Actually, if I reuse ids in equivalent calls like this:

...
'modex' block;
'modex' block;
'modex' block;
...

or

...
'barrier' block;
'barrier' block;
'barrier' block;
...

there is no hang. The hang only occurs when this "reuse" follows the use of
another collective id, in the way I wrote in my first letter:

...
'modex' block;
'barrier' block;
'modex' block; <- hangs
...

or in this way

...
'barrier' block;
'modex' block;
'barrier' block; <- hangs
...


If I use a different collective id when calling modex (1, 2, ..., anything but
0 == orte_process_info.peer_modex), that also doesn't work, unfortunately.



On Thu, Dec 20, 2012 at 10:39 PM, Ralph Castain  wrote:

> Yeah, that won't work. The ids cannot be reused, so you'd have to assign
> a different one in each case.
>
> On Dec 20, 2012, at 9:12 AM, Victor Kocheganov <
> victor.kochega...@itseez.com> wrote:
>
> In every 'modex' block I use the coll->id = orte_process_info.peer_modex;
> id, and in every 'barrier' block I use the coll->id =
> orte_process_info.peer_init_barrier; id.
>
> P.S. In general (as I wrote in my first letter), I use the term 'modex' for
> the following code:
> coll = OBJ_NEW(orte_grpcomm_collective_t);
> coll->id = orte_process_info.peer_modex;
> if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
> error = "orte_grpcomm_modex failed";
> goto error;
> }
> /* wait for modex to complete - this may be moved anywhere in mpi_init
>  * so long as it occurs prior to calling a function that needs
>  * the modex info!
>  */
> while (coll->active) {
> opal_progress();  /* block in progress pending events */
> }
> OBJ_RELEASE(coll);
>
> and 'barrier' for this:
>
> coll = OBJ_NEW(orte_grpcomm_collective_t);
> coll->id = orte_process_info.peer_init_barrier;
> if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
> error = "orte_grpcomm_barrier failed";
> goto error;
> }
> /* wait for barrier to complete */
> while (coll->active) {
> opal_progress();  /* block in progress pending events */
> }
> OBJ_RELEASE(coll);
>
> On Thu, Dec 20, 2012 at 8:57 PM, Ralph Castain  wrote:
>
>>
>> On Dec 20, 2012, at 8:29 AM, Victor Kocheganov <
>> victor.kochega...@itseez.com> wrote:
>>
>> Thanks for fast answer, Ralph.
>>
>> In my example I use different collective objects. I mean in every
>> mentioned block I call coll = OBJ_NEW(orte_grpcomm_collective_t);
>> and OBJ_RELEASE(coll); , so all the grpcomm operations use unique
>> collective objects.
>>
>>
>> How are the procs getting the collective id for those new calls? They all
>> have to match
>>
>>
>>
>> On Thu, Dec 20, 2012 at 7:48 PM, Ralph Castain  wrote:
>>
>>> Absolutely it will hang as the collective object passed into any grpcomm
>>> operation (modex or barrier) is only allowed to be used once - any attempt
>>> to reuse it will fail.
>>>
>>>
>>> On Dec 20, 2012, at 6:57 AM, Victor Kocheganov <
>>> victor.kochega...@itseez.com> wrote:
>>>
>>>   Hi.
>>>
>>> I have an issue with understanding the ompi_mpi_init() logic. Could you
>>> please tell me if you have any guesses about the following behavior.
>>>
>>> If I understand right, there is a block in the ompi_mpi_init() function
>>> for exchanging proc information between processes (denote this block
>>> 'modex'):
>>>
>>> coll = OBJ_NEW(orte_grpcomm_collective_t);
>>> coll->id = orte_process_info.peer_modex;
>>> if (ORTE_SUCCESS != (ret = orte_grpcomm.modex(coll))) {
>>> error = "orte_grpcomm_modex failed";
>>> goto error;
>>> }
>>> /* wait for modex to complete - this may be moved anywhere in
>>> mpi_init
>>>  * so long as it occurs prior to calling a function that needs
>>>  * the modex info!
>>>  */
>>> while (coll->active) {
>>> opal_progress();  /* block in progress pending events */
>>> }
>>> OBJ_RELEASE(coll);
>>>
>>> and several instructions after this there is a block for processes
>>> synchronization (denote this block 'barrier'):
>>>
>>> coll = OBJ_NEW(orte_grpcomm_collective_t);
>>> coll->id = orte_process_info.peer_init_barrier;
>>> if (ORTE_SUCCESS != (ret = orte_grpcomm.barrier(coll))) {
>>> error = "orte_grpcomm_barrier failed";
>>> goto error;
>>> }
>>> /* wait for barrier to complete */
>>> while (coll->active) {
>>> opal_progress();  /* block in progress pending events */
>>> }
>>> OBJ_RELEASE(coll);
>>>
>>> So, initially ompi_mpi_init() has the following structure:
>>>
>>> ...
>>> 'modex' block;
>>> ...
>>> 'barrier' block;
>>> ...
>>>
>>> I made several experiments with this code, and the following one is of
>>> interest: if I add a sequence of two additional blocks, 'barrier' and
>>> 'modex', right after the 'modex' block, then ompi_mpi_init() hangs in
>>> opal_progress() of the last 'modex' block.
>>>
>>> ...
>>> 'modex' block;
>>> 'barrier' block;
>>> 'modex' block; <- hangs
>>> ...

Re: [OMPI devel] Duplicated modex issue.

2012-12-21 Thread Ralph Castain
Don't know how many times I can repeat it, but I'll try again: you are not 
allowed to reuse a collective id. If it happens to work, it's by accident.

If you want to implement multiple modex/barrier operations, they each need to 
have their own unique collective id.


On Dec 20, 2012, at 9:28 PM, Victor Kocheganov wrote:

> Actually, if I reuse ids in equivalent calls like this:
> ...
> 'modex' block;
> 'modex' block;
> 'modex' block;
> ...
> or 
> ...
> 'barrier' block;
> 'barrier' block;
> 'barrier' block;
> ...
> there is no hang. The hang only occurs when this "reuse" follows the use of
> another collective id, in the way I wrote in my first letter:
> ...
> 'modex' block;
> 'barrier' block;
> 'modex' block; <- hangs
> ...
> or in this way
> ...
> 'barrier' block;
> 'modex' block;
> 'barrier' block; <- hangs
> ...
> 
> If I use a different collective id when calling modex (1, 2, ..., anything but
> 0 == orte_process_info.peer_modex), that also doesn't work, unfortunately.

[OMPI devel] openmpi-1.9a1r27710 on cygwin: patch and questions

2012-12-21 Thread marco atzeri

Hi,
in addition to the patches used for building openmpi-1.7rc5 on cygwin, a
new one is needed to build openmpi-1.9a1r27710. See the attached patch
for the statfs usage.

As configure parameters, I added "if-windows,shmem-windows" to:

--enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv,if-windows,shmem-windows


Question 1:
instead of a platform check, wouldn't it be better to check whether
statvfs or statfs is implemented on the platform?

Question 2:
is there any specific reason the shared library version numbers were
reset?

On openmpi-1.9a1r27710
./usr/bin/cygmpi-0.dll
./usr/bin/cygmpi_cxx-0.dll
./usr/bin/cygmpi_mpifh-0.dll
./usr/bin/cygmpi_usempi-0.dll
./usr/bin/cygopen-pal-0.dll
./usr/bin/cygopen-rte-0.dll
./usr/lib/openmpi/cygompi_dbg_msgq.dll

On openmpi-1.7rc5
./usr/bin/cygmpi-1.dll
./usr/bin/cygmpi_cxx-1.dll
./usr/bin/cygmpi_mpifh-2.dll
./usr/bin/cygmpi_usempi-1.dll
./usr/bin/cygopen-pal-5.dll
./usr/bin/cygopen-rte-5.dll
./usr/lib/openmpi/cygompi_dbg_msgq.dll

Question 3:
 is there an alternative way to exclude all the "*-windows" mca
 components, instead of
--enable-mca-no-build=installdirs-windows,timer-windows,if-windows,shmem-windows ?


Regards
Marco
--- origsrc/openmpi-1.9a1r27710/opal/util/path.c	2012-12-20 03:00:25.0 +0100
+++ src/openmpi-1.9a1r27710/opal/util/path.c	2012-12-21 14:34:15.432823000 +0100
@@ -547,7 +547,7 @@
 #if defined(__SVR4) && defined(__sun)
 struct statvfs buf;
 #elif defined(__linux__) || defined (__BSD) || \
-  (defined(__APPLE__) && defined(__MACH__))
+  (defined(__APPLE__) && defined(__MACH__)) || defined(__CYGWIN__)
 struct statfs buf;
 #endif

@@ -560,7 +560,7 @@
 #if defined(__SVR4) && defined(__sun)
 rc = statvfs(path, &buf);
 #elif defined(__linux__) || defined (__BSD) || \
-  (defined(__APPLE__) && defined(__MACH__))
+  (defined(__APPLE__) && defined(__MACH__)) || defined(__CYGWIN__)
 rc = statfs(path, &buf);
 #endif
 err = errno;


Re: [OMPI devel] Duplicated modex issue.

2012-12-21 Thread Victor Kocheganov
Thanks for the help. It all works as you said.

On Fri, Dec 21, 2012 at 7:11 PM, Ralph Castain  wrote:

> Don't know how many times I can repeat it, but I'll try again: you are not
> allowed to reuse a collective id. If it happens to work, it's by accident.
>
> If you want to implement multiple modex/barrier operations, they each need
> to have their own unique collective id.

[OMPI devel] Open MPI planned outage

2012-12-21 Thread Jeff Squyres
Our Indiana U. hosting providers will be doing some maintenance over the 
holiday break.

All Open MPI services -- web, Trac, SVN, etc. -- will be down on Wednesday,
December 26th, 2012 during the following time period:

- 5:00am-11:00am Pacific US time
- 6:00am-12:00pm Mountain US time
- 7:00am-01:00pm Central US time
- 8:00am-02:00pm Eastern US time
- 11:00am-05:00pm GMT

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] Open MPI planned outage

2012-12-21 Thread Jeff Squyres
Oops!  The times that were sent were wrong.  Here are the correct times:

- 3:00am-09:00am Pacific US time
- 4:00am-10:00am Mountain US time
- 5:00am-11:00am Central US time
- 6:00am-12:00pm Eastern US time
- 11:00am-05:00pm GMT


On Dec 21, 2012, at 12:44 PM, Jeff Squyres wrote:

> Our Indiana U. hosting providers will be doing some maintenance over the 
> holiday break.
> 
> All Open MPI services -- web, trac, SVN, ...etc. -- will be down on 
> Wednesday, December 26th, 2012 during the following time period:
> 
> - 5:00am-11:00am Pacific US time
> - 6:00am-12:00pm Mountain US time
> - 7:00am-01:00pm Central US time
> - 8:00am-02:00pm Eastern US time
> - 11:00am-05:00pm GMT
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/