Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Chris Samuel

- "Ralph Castain"  wrote:

> Hi Chris

Hiya,

> The devel trunk has all of this in it - you can get that tarball from 
> the OMPI web site (take the nightly snapshot).

OK, grabbed that (1.4a1r21825). Configured with:

./configure --prefix=$FOO --with-openib --with-tm=/usr/local/torque/latest --enable-static --enable-shared

It built & installed OK, but when running a trivial example
with it I don't see evidence for that code getting called.
Perhaps I'm not passing the correct options?

$ mpiexec -bysocket -bind-to-socket -mca odls_base_report_bindings 99 -mca odls_base_verbose 7 ./cpi-1.4
[tango095.vpac.org:16976] mca:base:select:( odls) Querying component [default]
[tango095.vpac.org:16976] mca:base:select:( odls) Query of component [default] set priority to 1
[tango095.vpac.org:16976] mca:base:select:( odls) Selected component [default]
[tango095.vpac.org:16976] [[36578,0],0] odls:launch: spawning child [[36578,1],0]
[tango095.vpac.org:16976] [[36578,0],0] odls:launch: spawning child [[36578,1],1]
[tango095.vpac.org:16976] [[36578,0],0] odls:launch: spawning child [[36578,1],2]
[tango095.vpac.org:16976] [[36578,0],0] odls:launch: spawning child [[36578,1],3]
Process 0 on tango095.vpac.org
Process 1 on tango095.vpac.org
Process 2 on tango095.vpac.org
Process 3 on tango095.vpac.org
^Cmpiexec: killing job...

Increasing odls_base_verbose only seems to add the environment being
passed to the child processes. :-(

I'm pretty sure I've got the right code as ompi_info -a
reports the debug setting from the patch:

MCA odls: parameter "odls_base_report_bindings" (current value: <0>, data source: default value)

> I plan to work on cpuset support beginning Tues morning.

Great. If there's anything I can help with, please let me know;
I'll be back from leave by then.

All the best,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Eugene Loh

Chris Samuel wrote:

> OK, grabbed that (1.4a1r21825). Configured with:
>
> ./configure --prefix=$FOO --with-openib --with-tm=/usr/local/torque/latest --enable-static --enable-shared
>
> It built & installed OK, but when running a trivial example
> with it I don't see evidence for that code getting called.
> Perhaps I'm not passing the correct options?
>
> $ mpiexec -bysocket -bind-to-socket -mca odls_base_report_bindings 99 -mca odls_base_verbose 7 ./cpi-1.4

Ah, you're missing the third secret safety switch that prevents hapless 
mortals from using this stuff accidentally!  :^)


I think you need to add

   --mca opal_paffinity_alone 1

a name that not even Ralph himself likes!


Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Chris Samuel

- "Eugene Loh"  wrote:

> Ah, you're missing the third secret safety switch that prevents
> hapless mortals from using this stuff accidentally!  :^)

Sounds good to me. :-)

> I think you need to add
> 
> --mca opal_paffinity_alone 1


Yup, looks like that's it: with that added, it fails to launch.


$ mpiexec --mca opal_paffinity_alone 1 -bysocket -bind-to-socket -mca odls_base_report_bindings 99 -mca odls_base_verbose 7 ./cpi-1.4
[tango095.vpac.org:18548] mca:base:select:( odls) Querying component [default]
[tango095.vpac.org:18548] mca:base:select:( odls) Query of component [default] set priority to 1
[tango095.vpac.org:18548] mca:base:select:( odls) Selected component [default]
[tango095.vpac.org:18548] [[33990,0],0] odls:launch: spawning child [[33990,1],0]
[tango095.vpac.org:18548] [[33990,0],0] odls:launch: spawning child [[33990,1],1]
[tango095.vpac.org:18548] [[33990,0],0] odls:default:fork binding child [[33990,1],0] to socket 0 cpus 000f
[tango095.vpac.org:18548] [[33990,0],0] odls:default:fork binding child [[33990,1],1] to socket 1 cpus 00f0
--
An attempt to set processor affinity has failed - please check to
ensure that your system supports such functionality. If so, then
this is probably something that should be reported to the OMPI developers.
--
--
mpiexec was unable to start the specified application as it encountered an error
on node tango095.vpac.org. More information may be available above.
--
4 total processes failed to start


This is most likely because it's getting an error from the
kernel when trying to bind to a socket it's not permitted
to access.
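The binding report above gives each target as a hex CPU mask (`000f` for socket 0, `00f0` for socket 1), where the rightmost hex digit covers CPUs 0-3. A minimal C sketch of decoding such a mask into a Linux `cpu_set_t`, assuming glibc's `CPU_*` macros from `<sched.h>`; `hex_mask_to_cpuset` and `print_cpus` are hypothetical helpers for illustration, not Open MPI functions:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode a hex CPU mask such as "000f" or "00f0"
 * (as printed in the odls binding report) into a cpu_set_t.
 * The rightmost hex digit holds CPUs 0-3, the next one CPUs 4-7, etc.
 * Returns 0 on success, -1 on a non-hex character or an oversized mask. */
int hex_mask_to_cpuset(const char *hex, cpu_set_t *set)
{
    size_t len = strlen(hex);
    CPU_ZERO(set);
    for (size_t i = 0; i < len; i++) {
        char c = hex[len - 1 - i];          /* least significant digit first */
        int nibble;
        if (c >= '0' && c <= '9')      nibble = c - '0';
        else if (c >= 'a' && c <= 'f') nibble = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') nibble = c - 'A' + 10;
        else return -1;
        for (int bit = 0; bit < 4; bit++) {
            size_t cpu = i * 4 + (size_t)bit;
            if (cpu >= (size_t)CPU_SETSIZE)
                return -1;                  /* mask wider than cpu_set_t */
            if (nibble & (1 << bit))
                CPU_SET(cpu, set);
        }
    }
    return 0;
}

/* Hypothetical helper: print the CPU numbers contained in a set. */
void print_cpus(const cpu_set_t *set)
{
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, set))
            printf("%d ", cpu);
    printf("\n");
}
```

Decoding `00f0` this way yields CPUs 4-7, the CPUs the report assigns to socket 1.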

cheers,
Chris


Re: [OMPI devel] Heads up on new feature to 1.3.4

2009-08-18 Thread Chris Samuel

- "Chris Samuel"  wrote:

> This is most likely because it's getting an error from the
> kernel when trying to bind to a socket it's not permitted
> to access.

This is what strace reports:

18561 sched_setaffinity(18561, 8,  { f0 } 
18561 <... sched_setaffinity resumed> ) = -1 EINVAL (Invalid argument)

so that would appear to be it.
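Per the Linux `sched_setaffinity(2)` man page, `EINVAL` is returned when the requested mask contains no CPU that is both present on the system and permitted by the caller's cpuset, which would fit the `{ f0 }` (CPUs 4-7) call failing if the job's cpuset excluded CPUs 4-7. A minimal sketch, assuming Linux/glibc, of checking a requested mask before binding; `mask_is_usable` and `bind_if_permitted` are hypothetical names, and using the process's current affinity as the permitted set is a conservative stand-in for reading the cpuset itself:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>

/* Nonzero if the requested mask shares at least one CPU with the allowed
 * mask. An empty intersection is the situation in which the kernel rejects
 * sched_setaffinity() with EINVAL. */
int mask_is_usable(const cpu_set_t *requested, const cpu_set_t *allowed)
{
    cpu_set_t both;
    CPU_AND(&both, requested, allowed);
    return CPU_COUNT(&both) > 0;
}

/* Hypothetical helper: bind the calling process to `requested`, but first
 * compare it with the current affinity mask, which is normally no wider
 * than the cpuset a batch system placed the job in. This is conservative:
 * a process that already narrowed its own affinity is rejected more often
 * than strictly necessary. */
int bind_if_permitted(const cpu_set_t *requested)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    if (!mask_is_usable(requested, &allowed)) {
        fprintf(stderr, "requested CPUs lie outside this job's allowed set\n");
        errno = EINVAL;
        return -1;
    }
    return sched_setaffinity(0, sizeof(*requested), requested);
}
```

A launcher doing something along these lines could report which CPUs were out of bounds instead of only the generic affinity failure.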

cheers,
Chris


Re: [OMPI devel] XML request

2009-08-18 Thread Ashley Pittman
On Mon, 2009-08-17 at 21:16 -0600, Ralph Castain wrote:
> Should be done on trunk with r21826 - would you please give it a try  
> and let me know if that meets requirements? If so, I'll move it to  
> 1.3.4.

Is there somewhere these xml changes are documented?  I don't work with
OMPI xml but I do work with valgrind xml and the tools need to be
updated for every change to the specification.

Having good documentation of at least the bits which have changed, and
when, is essential to be able to make a version-independent tool.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk



Re: [OMPI devel] XML request

2009-08-18 Thread Ralph Castain
Hmmm...well, actually, no. To the best of my knowledge, the only
ones using this interface are the Eclipse folks, who are the ones
requesting the changes.


Is anyone else out there using it? If so, please let us know and we'll  
be more careful about procedure/docs.



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] XML request

2009-08-18 Thread Greg Watson

Hi Ralph,

I'm seeing something strange. When I run "mpirun -mca
orte_show_resolved_nodenames 1 -xml -display-map...", I see:

...


but when I run "ssh localhost mpirun -mca
orte_show_resolved_nodenames 1 -xml -display-map...", I see:

...

Any ideas?

Thanks,
Greg

On Aug 17, 2009, at 11:16 PM, Ralph Castain wrote:

Should be done on trunk with r21826 - would you please give it a try  
and let me know if that meets requirements? If so, I'll move it to  
1.3.4.


Thanks
Ralph

On Aug 17, 2009, at 6:42 AM, Greg Watson wrote:


Hi Ralph,

Yes, you'd just need to issue the start tag before any other XML
output, then the end tag once all other XML output is guaranteed
to have been sent.


Greg

On Aug 17, 2009, at 7:44 AM, Ralph Castain wrote:


All things are possible - some just a tad more painful than others.

It looks like you want the mpirun tags to flow around all output  
during the run - i.e., there is only one pair of mpirun tags that  
surround anything that might come out of the job. True?


If so, that would be trivial.

On Aug 14, 2009, at 9:25 AM, Greg Watson wrote:


Ralph,

Would it be possible to get mpirun to issue start and end tags if  
the -xml option is used? Currently there is no way to determine  
when the output starts and finishes, which makes parsing the XML  
tricky, particularly if something else generates output (e.g. the  
shell). Something like this would be ideal:




...

...
...


If we could get it in 1.3.4 even better. :-)

Thanks,
Greg
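On the consuming side, a pair of start and end tags lets a tool skip whatever the shell prints around mpirun's output. A minimal C sketch, assuming for illustration that the wrapper tags are literally `<mpirun>` and `</mpirun>` (the actual tag names are not preserved in this archive); `find_mpirun_xml` is a hypothetical helper:

```c
#include <assert.h>
#include <string.h>

/* Locate mpirun's XML inside captured output that may also contain
 * unrelated text (login banners, shell messages, ...).
 * ASSUMPTION: the wrapper tags are "<mpirun>" and "</mpirun>".
 * On success, *start points at the opening tag and *len spans through the
 * closing tag; returns 0. Returns -1 if either tag is missing. */
int find_mpirun_xml(const char *buf, const char **start, size_t *len)
{
    static const char open_tag[]  = "<mpirun>";
    static const char close_tag[] = "</mpirun>";

    const char *s = strstr(buf, open_tag);
    if (s == NULL)
        return -1;                         /* no XML section at all */

    const char *e = strstr(s + strlen(open_tag), close_tag);
    if (e == NULL)
        return -1;                         /* truncated: end tag not seen */

    *start = s;
    *len = (size_t)(e - s) + strlen(close_tag);
    return 0;
}
```

Returning -1 when the end tag is missing lets the caller tell a truncated or still-running job apart from a complete one.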




Re: [OMPI devel] XML request

2009-08-18 Thread Ralph Castain
Hmmm...let me try adding an fflush after the  output to force it
out. My best guess is that you are seeing a small race condition: the map
output is coming over stderr, while the  tag is coming over stdout.
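One general C stdio behavior that may be relevant: `stdout` is normally line-buffered on a terminal but fully buffered when it is a pipe or file (as it is when mpirun runs under ssh), while `stderr` is unbuffered. Text written through two different paths (two `FILE` streams, or `printf` on one side and a raw `write(2)` on the other) is therefore much more likely to come out of order in the non-interactive case, even when both end up in the same place. A minimal sketch demonstrating the effect and the `fflush` fix; `emit`, `capture` and the tag names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a start tag to `out`, a map line to `err`, then the end tag to
 * `out`. If `flush_first` is nonzero, `out` is flushed before `err` is
 * written, which is the fix under discussion. */
void emit(FILE *out, FILE *err, int flush_first)
{
    fputs("<mpirun>\n", out);
    if (flush_first)
        fflush(out);                 /* push the tag out of the stdio buffer */
    fputs("<map/>\n", err);
    fputs("</mpirun>\n", out);
    fflush(out);
}

/* Run emit() with both streams writing to one shared file: `out` fully
 * buffered (like stdout on a pipe), `err` unbuffered (like stderr).
 * Copies what actually landed in the file into `dst`. */
int capture(int flush_first, char *dst, size_t cap)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    /* dup() shares the file offset, so writes appear in the order the
     * underlying write() calls happen. */
    FILE *out = fdopen(dup(fileno(tmp)), "w");
    FILE *err = fdopen(dup(fileno(tmp)), "w");
    if (out == NULL || err == NULL) {   /* error handling kept minimal */
        if (out) fclose(out);
        if (err) fclose(err);
        fclose(tmp);
        return -1;
    }
    setvbuf(out, NULL, _IOFBF, BUFSIZ);  /* stdout when not a terminal */
    setvbuf(err, NULL, _IONBF, 0);       /* stderr: unbuffered */

    emit(out, err, flush_first);
    fclose(out);
    fclose(err);

    rewind(tmp);
    size_t n = fread(dst, 1, cap - 1, tmp);
    dst[n] = '\0';
    fclose(tmp);
    return 0;
}
```

Without the flush, the captured text starts with `<map/>` ahead of `<mpirun>`; with it, the start tag comes first.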





Re: [OMPI devel] XML request

2009-08-18 Thread Greg Watson

Ralph,

Not sure that's it because all XML output should be via stdout.

Greg





Re: [OMPI devel] XML request

2009-08-18 Thread Ralph Castain
True...yet the two outputs do come through separate code paths, so that
could be the issue. I honestly can't think of any other reason, as the
printf for the mpirun tag comes well before any mapping occurs.
I'm not sure why ssh would invert that order, nor how it could.


Let's try the fflush and see if it fixes the problem...





Re: [OMPI devel] XML request

2009-08-18 Thread Ralph Castain

Give r21836 a try and see if it still gets out of order.

Ralph

