Re: [OMPI users] totalview and message queue, empty windows

2010-02-04 Thread Terry Dontje
I figured out my issue.  Building on SLES 9 was causing the debug 
information for the type fields not to be generated in the .o files.  So 
even though the type symbols could be found, the field descriptions for 
those types could not, because they were never generated.  Using SLES 10 
as the OS to compile and link on corrected the issue.


I am not sure whether this issue also happens with other compilers, but 
at least the Pathscale compiler runs into it.


Note: one of the things that helped me debug this issue was a SunStudio 
utility named dwarfdump, which lets you actually see the debugging 
symbols.


Unfortunately, I still haven't tracked down Ashley's issue, which I think 
probably has more to do with the OMPI code than with the debugging 
information not being generated.


--td







Re: [OMPI users] totalview and message queue, empty windows

2010-02-02 Thread Terry Dontje
Hi DevL, what compiler and options are you using to build OMPI?  I am 
seeing something similar (warning messages, and the Message Queue window 
showing bizarre values) when building with the Pathscale compiler, but I 
don't see this with SunStudio, gcc, Intel or PGI.


However, I do see pending receives, though there is no specific 
information on the actual communicators (name, size, rank).  It looks 
like some of the type symbols are not being kept in the .so.


--td




Re: [OMPI users] totalview and message queue, empty windows

2010-01-29 Thread Ashley Pittman

On 28 Jan 2010, at 21:04, DevL wrote:

> Hi,
> it looks like there is an issue with TotalView and
> Open MPI:
>
> the message queue is just empty and the output shows:
> WARNING: Field mtc_ndims_or_nnodes of type mca_topo_base_comm_1_0_0_t not found!
> WARNING: Field mtc_dims_or_index of type mca_topo_base_comm_1_0_0_t not found!
> WARNING: Field mtc_periods_or_edges of type mca_topo_base_comm_1_0_0_t not found!
> WARNING: Field mtc_reorder of type mca_topo_base_comm_1_0_0_t not found!
> WARNING: Field mtc_ndims_or_nnodes of type mca_topo_base_comm_1_0_0_t not found!
> WARNING: Field mtc_dims_or_index of type mca_topo_base_comm_1_0_0_t not found!
> WARNING: Field mtc_periods_or_edges of type mca_topo_base_comm_1_0_0_t not found!
> WARNING: Field mtc_reorder of type mca_topo_base_comm_1_0_0_t not found!
> [
> (Open MPI) 1.4a1r21427
> and
> totalview.8.7.0-7/linux-x86-64
>
> is this a known issue?

I've not seen it before, but I do know of problems with the 
mca_topo_base_comm_1_0_0_t type and the debugger plugin (which TotalView is 
calling).

> and if so, how can it be overcome?

I'm afraid I don't know.

The debugger plugin looks up the type (it's a struct) and then looks up the 
offsets of some fields within the struct.  I've seen it fail to find the 
struct entirely, whereas this error says it can't find the fields within the 
struct.  Perhaps the difference is that I found the problem using padb and you 
are using TotalView.

You could try the attached patch, which allows the code to continue if the type 
isn't found.  If you are seeing a different symptom of the same error, it 
might work for you.



ompi-topo-type.patch


As to the cause, I've no idea.  I've only seen it once or twice in the last six 
months, and not on installations I set up myself.  I've never been able to 
find the underlying cause, or why some machines report this error and others 
don't.

Ashley,

-- 
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk