Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
Turning off enable_picky, I get it to compile with the following warnings:

pget_elements_x_f.c:70: warning: no previous prototype for 'ompi_get_elements_x_f'
pstatus_set_elements_x_f.c:70: warning: no previous prototype for 'ompi_status_set_elements_x_f'
ptype_get_extent_x_f.c:69: warning: no previous prototype for 'ompi_type_get_extent_x_f'
ptype_get_true_extent_x_f.c:69: warning: no previous prototype for 'ompi_type_get_true_extent_x_f'
ptype_size_x_f.c:69: warning: no previous prototype for 'ompi_type_size_x_f'

I also found that OpenSHMEM is still building by default. Is that intended? I thought you were only going to build it if --with-shmem (or whatever the option is) was given.

Looks like some cleanup is required.

On Aug 10, 2013, at 8:54 PM, Ralph Castain wrote:

> FWIW, I couldn't get it to build - this is on a simple Xeon-based system under CentOS 6.2:
>
> cc1: warnings being treated as errors
> spml_yoda_getreq.c: In function 'mca_spml_yoda_get_completion':
> spml_yoda_getreq.c:98: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
> spml_yoda_getreq.c:98: error: signed and unsigned type in conditional expression
> cc1: warnings being treated as errors
> spml_yoda_putreq.c: In function 'mca_spml_yoda_put_completion':
> spml_yoda_putreq.c:81: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
> spml_yoda_putreq.c:81: error: signed and unsigned type in conditional expression
> make[2]: *** [spml_yoda_getreq.lo] Error 1
> make[2]: *** Waiting for unfinished jobs
> make[2]: *** [spml_yoda_putreq.lo] Error 1
> cc1: warnings being treated as errors
> spml_yoda.c: In function 'mca_spml_yoda_put_internal':
> spml_yoda.c:725: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
> spml_yoda.c:725: error: signed and unsigned type in conditional expression
> spml_yoda.c: In function 'mca_spml_yoda_get':
> spml_yoda.c:1107: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
> spml_yoda.c:1107: error: signed and unsigned type in conditional expression
> make[2]: *** [spml_yoda.lo] Error 1
> make[1]: *** [all-recursive] Error 1
>
> Only configure arguments:
>
> enable_picky=yes
> enable_debug=yes
>
> gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
>
> On Aug 10, 2013, at 7:21 PM, "Barrett, Brian W" wrote:
>
>> On 8/6/13 10:30 AM, "Joshua Ladd" wrote:
>>
>>> Dear OMPI Community,
>>>
>>> Please find on Bitbucket the latest round of OSHMEM changes based on community feedback. Please git and test at your leisure.
>>>
>>> https://bitbucket.org/jladd_math/mlnx-oshmem.git
>>
>> Josh -
>>
>> In general, I think everything looks ok. However, the "right" thing doesn't happen if the CM PML is used (at least, when using the Portals 4 MTL). When configured with:
>>
>> ./configure --enable-mca-no-build=pml-ob1,pml-bfo,pml-v,btl,bml,mpool
>>
>> the resulting build segfaults trying to run a SHMEM program:
>>
>> mpirun -np 2 ./bcast
>> [shannon:90397] *** Process received signal ***
>> [shannon:90397] Signal: Segmentation fault (11)
>> [shannon:90397] Signal code: Address not mapped (1)
>> [shannon:90397] Failing at address: (nil)
>> [shannon:90398] *** Process received signal ***
>> [shannon:90398] Signal: Segmentation fault (11)
>> [shannon:90398] Signal code: Address not mapped (1)
>> [shannon:90398] Failing at address: (nil)
>> [shannon:90397] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>> [shannon:90397] *** End of error message ***
>> [shannon:90398] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>> [shannon:90398] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 1 with PID 90398 on node shannon exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> Brian
>>
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
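
[Editor's note on the signedness errors above: opal_atomic_add_32() takes a volatile int32_t * (per the compiler note pointing at opal/include/opal/sys/amd64/atomic.h), while the yoda completion counters are apparently declared uint32_t. Below is a minimal standalone sketch of the mismatch and the two usual fixes. The stub atomic, the "toy_request" structure, and all field names are invented for illustration and do not reflect the real spml_yoda code; the stub's prototype is only approximate (argument 1 matches the compiler note, the rest is assumed).]

    #include <stdint.h>

    /* Stand-in for OPAL's opal_atomic_add_32(); the real one lives in
     * opal/include/opal/sys/<arch>/atomic.h.  Argument 1 matches the
     * compiler note above; the return type is assumed here. */
    static int32_t opal_atomic_add_32(volatile int32_t *addr, int delta)
    {
        return __sync_add_and_fetch(addr, delta);
    }

    /* Hypothetical request object -- not the real mca_spml_yoda request. */
    struct toy_request {
        uint32_t active_puts;   /* unsigned counter: triggers the warning */
        int32_t  active_gets;   /* signed counter: matches the atomic API */
    };

    static void toy_completion(struct toy_request *req)
    {
        /* The warned-about pattern would be:
         *     opal_atomic_add_32(&req->active_puts, -1);
         * Fix 1: declare the counter int32_t so the pointer types agree. */
        opal_atomic_add_32(&req->active_gets, -1);

        /* Fix 2: keep the field unsigned and cast explicitly at the call. */
        opal_atomic_add_32((volatile int32_t *)&req->active_puts, -1);
    }

    int main(void)
    {
        struct toy_request req = { 1, 1 };
        toy_completion(&req);
        return 0;
    }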
Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
Ralph -

I think those warnings are just because of when they last synced with the trunk; it looks like they haven't updated in the last week, when those (and some usnic fixes) went in.

More concerning is the --enable-picky stuff and the disabling of SHMEM in the right places.

Brian

On 8/11/13 11:24 AM, "Ralph Castain" wrote:

> Turning off enable_picky, I get it to compile with the following warnings:
>
> pget_elements_x_f.c:70: warning: no previous prototype for 'ompi_get_elements_x_f'
> pstatus_set_elements_x_f.c:70: warning: no previous prototype for 'ompi_status_set_elements_x_f'
> ptype_get_extent_x_f.c:69: warning: no previous prototype for 'ompi_type_get_extent_x_f'
> ptype_get_true_extent_x_f.c:69: warning: no previous prototype for 'ompi_type_get_true_extent_x_f'
> ptype_size_x_f.c:69: warning: no previous prototype for 'ompi_type_size_x_f'
>
> I also found that OpenSHMEM is still building by default. Is that intended? I thought you were only going to build it if --with-shmem (or whatever the option is) was given.
>
> Looks like some cleanup is required.
>
> [...]

--
Brian W. Barrett
Scalable System Software Group
Sandia National Laboratories
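
[Editor's note: the "no previous prototype" messages come from gcc's -Wmissing-prototypes, which the picky flags enable: an externally visible function is defined without a prototype having been seen first. A toy illustration of the pattern and the usual fix follows; the file and function names are invented and are not the actual OMPI Fortran-binding sources, where the declaration would normally come from the owning header.]

    /* toy_get_elements_x_f.c -- illustrative only; names are invented.
     *
     * gcc -Wmissing-prototypes warns when an extern function definition is
     * not preceded by a prototype.  Declaring it first -- normally by
     * including the header that owns the declaration -- silences it. */

    void toy_get_elements_x_f(int *count, int *ierr);   /* usually lives in a header */

    void toy_get_elements_x_f(int *count, int *ierr)
    {
        *count = 0;
        *ierr  = 0;
    }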
Re: [OMPI devel] Bad header guard in /opal/memoryhooks/memory.h
Thanks! Fixed in trunk and CMRd for 1.7.3.

On Aug 9, 2013, at 1:07 AM, Michael Schlottke wrote:

> Hi there,
>
> I don't know if this is the right place to post this, but it seems like the header guard in /opal/memoryhooks/memory.h does not work as intended: the header guard is written as
>
> #ifndef OPAL_MEMORY_MEMORY_H
> #define OPAl_MEMORY_MEMORY_H
>
> where in the second line it probably should read "OPAL_…" and not "OPAl_…". This is openmpi-1.7.2.
>
> Regards,
>
> Michael
>
> --
> Michael Schlottke
>
> SimLab Highly Scalable Fluids & Solids Engineering
> Jülich Aachen Research Alliance (JARA-HPC)
> RWTH Aachen University
> Wüllnerstraße 5a
> 52062 Aachen
> Germany
>
> Phone: +49 (241) 80 95188
> Fax: +49 (241) 80 92257
> Mail: m.schlot...@aia.rwth-aachen.de
> Web: http://www.jara.org/jara-hpc
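
[Editor's note: for completeness, a sketch of the corrected guard, with the body of the header elided. With the original lowercase-l typo, OPAL_MEMORY_MEMORY_H was never defined, so the guard never prevented re-inclusion.]

    /* opal/memoryhooks/memory.h -- corrected include guard; file body elided.
     * The #define must spell the same macro as the #ifndef ("OPAL_", not
     * "OPAl_"); otherwise the macro is never defined and the header is
     * re-processed on every inclusion. */
    #ifndef OPAL_MEMORY_MEMORY_H
    #define OPAL_MEMORY_MEMORY_H

    /* ... declarations ... */

    #endif /* OPAL_MEMORY_MEMORY_H */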
Re: [OMPI devel] [slurm-dev] slurm-dev Memory accounting issues with mpirun (was Re: Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun)
I can't speak to what you get from sacct, but I can say that things will definitely be different when launched directly via srun vs. indirectly thru mpirun. The reason is that mpirun uses srun to launch the ORTE daemons, which then fork/exec all the application processes under them (as opposed to launching those app procs thru srun). This means two things:

1. Slurm has no direct knowledge of or visibility into the application procs themselves when launched by mpirun - Slurm only sees the ORTE daemons. I'm sure that Slurm rolls up all the resources used by those daemons and their children, so the totals should include them.

2. Since all Slurm can do is roll everything up, the resources shown in sacct will include those used by the daemons and mpirun as well as the application procs. Slurm doesn't include its own daemons or the slurmctld in its accounting, so the two numbers will be significantly different. If you are attempting to limit overall resource usage, you may need to leave some slack for the daemons and mpirun.

You should also see an extra "step" in the mpirun-launched job, as mpirun itself generally takes the first step and the launch of the daemons occupies a second step.

As for the strange numbers you are seeing, it looks to me like you are hitting a mismatch of unsigned vs signed values. When adding them up, that could cause all kinds of erroneous behavior.

On Aug 6, 2013, at 11:55 PM, Christopher Samuel wrote:

> On 07/08/13 16:19, Christopher Samuel wrote:
>
>> Anyone seen anything similar, or any ideas on what could be going on?
>
> Sorry, this was with:
>
> # ACCOUNTING
> JobAcctGatherType=jobacct_gather/linux
> JobAcctGatherFrequency=30
>
> Since those initial tests we've started enforcing memory limits (the system is not yet in full production) and found that this causes jobs to get killed.
>
> We tried the cgroups gathering method, but jobs still die with mpirun, and now the numbers don't seem to be right for mpirun or srun either:
>
> mpirun (killed):
>
> [samuel@barcoo-test Mem]$ sacct -j 94564 -o JobID,MaxRSS,MaxVMSize
>        JobID     MaxRSS  MaxVMSize
> ------------ ---------- ----------
> 94564
> 94564.batch    -523362K          0
> 94564.0         394525K          0
>
> srun:
>
> [samuel@barcoo-test Mem]$ sacct -j 94565 -o JobID,MaxRSS,MaxVMSize
>        JobID     MaxRSS  MaxVMSize
> ------------ ---------- ----------
> 94565
> 94565.batch        998K          0
> 94565.0          88663K          0
>
> All the best,
> Chris
>
> --
> Christopher Samuel        Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/      http://twitter.com/vlsci
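
[Editor's note on the signed-vs-unsigned point: below is a standalone illustration (not Slurm or Open MPI code) of how folding a signed delta into an unsigned running total, and then reinterpreting that total as signed, produces values like the -523362K MaxRSS shown above. The variable names and numbers are invented, chosen only to mirror that output; the negative reinterpretation shown is what happens on typical two's-complement systems.]

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t total_kb = 100000;     /* running memory total, kept unsigned */
        int32_t  delta_kb = -623362;    /* a signed decrement from one sample  */

        /* The signed value is converted to unsigned and wraps modulo 2^32. */
        total_kb += (uint32_t)delta_kb;

        printf("as unsigned: %" PRIu32 " K\n", total_kb);          /* 4294443934 K */
        printf("as signed:   %" PRId32 " K\n", (int32_t)total_kb); /* -523362 K    */
        return 0;
    }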