Re: [Pw_forum] Memory usage estimate of the calculation

2017-01-05 Thread Paolo Giannozzi
Yes, those are "megabytes", not "megabits".

Paolo
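
For anyone scripting around these reports: the per-process figure can be pulled out of a pw.x output file and rounded up with a one-liner (a sketch; the file name is a placeholder and the pattern assumes the exact wording printed by QE 6.0, shown below):

```shell
# Fake a pw.x output line for demonstration (placeholder file name):
printf '%s\n' '     Estimated max dynamical RAM per process >    3207.98Mb' > /tmp/pw_demo.out

# Extract the per-process estimate and round it up to whole megabytes.
awk '/Estimated max dynamical RAM per process/ {
       mb = $NF + 0      # "3207.98Mb" -> 3207.98 (awk drops the suffix)
       printf "request at least %d MB per process\n", mb + 1
     }' /tmp/pw_demo.out
# -> request at least 3208 MB per process
```

Since the estimate covers only dynamically allocated arrays (the executable, libraries, and buffers come on top), treating it as a lower bound and adding a safety margin when requesting memory is a prudent assumption rather than an official rule.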

On Thu, Jan 5, 2017 at 12:25 AM, Jun Jiang  wrote:

> Dear All,
>
> I was running QE 6.0 (pw.x) and found the following memory usage
> estimate in the output file:
> "
>
> Estimated max dynamical RAM per process > 3207.98Mb
> Estimated total allocated dynamical RAM > 153983.19Mb
> "
> To make full use of the RAM and CPU, does this mean that if I allocate
> 3208 MB per CPU and 153984 MB in total for the job, it will be enough for
> this calculation?
> If the real RAM usage is larger than that, how much should I add to this
> estimate, or how can I estimate the remaining part?
>
> PS: Does this "Mb" in the code mean megabytes (MB) or megabits (Mb)? I
> think it should be megabytes (MB).
>
> Thanks,
> Jun Jiang
>
> ___
> Pw_forum mailing list
> Pw_forum@pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>



-- 
Paolo Giannozzi, Dip. Scienze Matematiche Informatiche e Fisiche,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222

[Pw_forum] Memory usage estimate of the calculation

2017-01-04 Thread Jun Jiang
Dear All,

I was running QE 6.0 (pw.x) and found the following memory usage estimate
in the output file:
"

Estimated max dynamical RAM per process > 3207.98Mb
Estimated total allocated dynamical RAM > 153983.19Mb
"
To make full use of the RAM and CPU, does this mean that if I allocate
3208 MB per CPU and 153984 MB in total for the job, it will be enough for
this calculation?
If the real RAM usage is larger than that, how much should I add to this
estimate, or how can I estimate the remaining part?

PS: Does this "Mb" in the code mean megabytes (MB) or megabits (Mb)? I
think it should be megabytes (MB).

Thanks,
Jun Jiang

[Pw_forum] Memory usage by pw.x

2012-08-30 Thread Guido Fratesi
I'm sorry for a previous incomplete message.

>   sed -ri "s/(^ *)(allocate.*$)/\1\2\n\1  CALL mem_whatever()/i" $(find
> /where/is/espresso -name \*.f90)

That was very useful, thank you.

I now get 1.9GB out of 2.4, which starts giving some usable estimate, 
but I do understand that getting the accurate value is very complex.

To my purpose, I'll monitor the memory occupancy by a script, which 
follows below in case someone finds it useful...

Guido



command=pw.x

maxsecs=$((60*60*24))
delay=1
nsteps=$((maxsecs/delay))

echo "#when  RAM  PID (0 for all $command instances)"

for ((i=0;((i<nsteps));i++)); do
   #timer=`date +%H:%M:%S`
   #timer=`date +"%H:%M:%S %s.%N"`
   timer=`date +"%s.%N"`
   ps -eo comm,rss,pid |
     awk -v comm=$command -v timer=$timer '
       BEGIN {tot=0}
       ($1==comm) {
         tot+=$2;
         print timer, $2, $3;
       }
       END {if (tot) print timer, tot, 0}
     '
   sleep $delay
done

[Pw_forum] Memory usage by pw.x

2012-08-30 Thread Guido Fratesi
This was indeed very useful, thank you.
I got 1.9 GB out of about 2.4 GB.

On 08/30/2012 04:34 PM, Lorenzo Paulatto wrote:
> sed -ri "s/(^ *)(allocate.*$)/\1\2\n\1  CALL mem_whatever()/i" $(find
> /where/is/espresso -name \*.f90)

-- 
Guido Fratesi

Dipartimento di Scienza dei Materiali
Universita` degli Studi di Milano-Bicocca
via Cozzi 53, 20125 Milano, Italy

Phone: +39 02 6448 5183
email: fratesi at mater.unimib.it


[Pw_forum] Memory usage by pw.x

2012-08-30 Thread Simon Binnie
On Thu, 30 Aug 2012 16:34:03 +0200, Lorenzo Paulatto wrote:

> On 30 August 2012 15:54, Guido Fratesi  wrote:
>
>> Yet in my test, the max memory printed by top is 2.3GB within the first
>> step of the SCF cycle, but the standard call to memstat in electrons.f90
>> returned 744.1 Mb and the one tracked as described above 1168.572 Mb
>> (maximum reached earlier than that 744.1 Mb).
>>
>>
> Measuring the amount of memory in the "clock" subroutines is far from
> optimal, as they are usually called before temporary variables are
> allocated (start_clock) and after they are deallocated (stop_clock).
>
> With a command like this:
>  sed -ri "s/(^ *)(allocate.*$)/\1\2\n\1  CALL mem_whatever()/i" $(find
> /where/is/espresso -name \*.f90)
>
> You can add a call to mem_whatever after *every* allocate in the entire
> code. This will also modify all your f90 files, so I suggest making a
> backup first.
> This should result in a quite accurate report of memory consumption (at  
> the
> cost of a certain performance hit, I guess).

You could use valgrind with the 'massif' tool. This should be able to tell
you in which routine the memory usage peaks, and thus where it is best to
put your memory check. Running QE under valgrind can be a very slow
process, though...
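
Spelled out, the massif workflow looks roughly like this (a command-line sketch: the input/output file names are placeholders, and a large slowdown is to be expected):

```shell
# Run pw.x under valgrind's heap profiler (placeholder file names).
valgrind --tool=massif ./pw.x -in pw.scf.in > pw.scf.out

# Summarize the heap profile; massif writes massif.out.<pid>.
ms_print massif.out.*
```

The ms_print summary shows a timeline of heap usage with the allocation call trees at each snapshot, which is what points you at the routine where the peak occurs.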

Simon

-- 
Simon Binnie | Post Doc, Condensed Matter Sector
Scuola Internazionale di Studi Avanzati (SISSA)
Via Bonomea 256 | 34100 Trieste | sbinnie at sissa.it


[Pw_forum] Memory usage by pw.x

2012-08-30 Thread Lorenzo Paulatto
On 30 August 2012 15:54, Guido Fratesi  wrote:

> Yet in my test, the max memory printed by top is 2.3GB within the first
> step of the SCF cycle, but the standard call to memstat in electrons.f90
> returned 744.1 Mb and the one tracked as described above 1168.572 Mb
> (maximum reached earlier than that 744.1 Mb).
>
>
Measuring the amount of memory in the "clock" subroutines is far from
optimal, as they are usually called before temporary variables are
allocated (start_clock) and after they are deallocated (stop_clock).

With a command like this:
 sed -ri "s/(^ *)(allocate.*$)/\1\2\n\1  CALL mem_whatever()/i" $(find
/where/is/espresso -name \*.f90)

You can add a call to mem_whatever after *every* allocate in the entire
code. This will also modify all your f90 files, so I suggest making a
backup first.
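
A scratch-copy demo of that allocate-tracking rewrite (a sketch: /tmp/qe_sed_demo and the example file are placeholders, mem_whatever is the subroutine you must still provide, and GNU sed is assumed; run it on a backup, never on your only copy of the sources):

```shell
# Make a scratch directory with a tiny Fortran file containing an allocate.
mkdir -p /tmp/qe_sed_demo
cat > /tmp/qe_sed_demo/example.f90 <<'EOF'
subroutine foo(n)
  integer :: n
  real, allocatable :: a(:)
  allocate(a(n))
end subroutine foo
EOF

# The same substitution as above, restricted to the scratch directory:
# after every line starting with "allocate", insert an equally indented
# CALL mem_whatever().
sed -ri "s/(^ *)(allocate.*$)/\1\2\n\1  CALL mem_whatever()/i" $(find /tmp/qe_sed_demo -name \*.f90)

# Verify the call was inserted right after the allocate statement.
grep -n "mem_whatever" /tmp/qe_sed_demo/example.f90
```

Note that the captured leading whitespace (`\1`) keeps the inserted call at the same indentation as the allocate it follows.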
This should result in a quite accurate report of memory consumption (at the
cost of a certain performance hit, I guess). Possible allocations in C
files and external libraries could still escape detection; e.g. if the FFT
library decides to allocate a temporary 1 GB array, you will not see it in
the final report.

I suggest putting the subroutine mem_whatever somewhere in flib/ and not
inside a module, otherwise you'll also be forced to add a USE statement to
every file, which can be annoying.

bests

-- 
Lorenzo Paulatto IdR @ IMPMC/CNRS & Université Paris 6
phone: +33 (0)1 44275 084 / skype: paulatz
www:   http://www-int.impmc.upmc.fr/~paulatto/
mail:  23-24/4?16 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05


[Pw_forum] Memory usage by pw.x

2012-08-30 Thread Paolo Giannozzi
On Thu, 2012-08-30 at 15:54 +0200, Guido Fratesi wrote:

> Yet in my test, the max memory printed by top is 2.3GB within the first 
> step of the SCF cycle, but the standard call to memstat in electrons.f90 
> returned 744.1 Mb and the one tracked as described above 1168.572 Mb 

my (limited) understanding is that the internal call to "memstat"
reports only the dynamically allocated memory; "top" reports all 
memory taken by the process, including shared libraries and whatnot. 
It seems to me that the difference, (2.3-1.2) GB = 1.1 GB, is a lot
of memory, but I have no idea how to figure out where all this 
memory comes from (or goes to). This is stuff for OS wizards.
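
The gap is visible even for a trivial process: the resident set size that top reports includes code, shared libraries, and buffers, while a heap-only counter in the style of memstat would be smaller. A portable sketch (POSIX ps options; the numbers will of course vary):

```shell
# Resident set size (in kB) of the current shell, i.e. the figure "top"
# would show in its RES column; a heap-only counter like memstat's
# typically reports less, for the reasons discussed above.
rss_kb=$(ps -o rss= -p $$)
echo "RSS of this shell: ${rss_kb} kB"
```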

P.
-- 
Paolo Giannozzi, IOM-Democritos and University of Udine, Italy




[Pw_forum] Memory usage by pw.x

2012-08-30 Thread Guido Fratesi
> What about tracking the maximum and the minimum recorded within an
> entire SCF loop by sampling the memory occupancy where a clock (start or
> stop) is triggered?

I tried this possibility, defining the routines below in clocks.f90 and 
the variable max_ram_kb (stored for laziness in "mytime"). Then, I call 
set_max_tracked_ram at every start/stop of a clock. I would expect this 
to work, since for example h_psi is called within the diagonalization 
cycles where workspaces and (I expect) most arrays have been already 
allocated.
Yet in my test, the max memory printed by top is 2.3GB within the first 
step of the SCF cycle, but the standard call to memstat in electrons.f90 
returned 744.1 Mb and the one tracked as described above 1168.572 Mb 
(maximum reached earlier than that 744.1 Mb).

I report here the subroutines I used, they are trivial but...

   SUBROUTINE set_max_tracked_ram ()
     USE mytime,    ONLY : max_ram_kb
     IMPLICIT NONE
     INTEGER :: kilobytes
     CALL memstat ( kilobytes )
     IF ( kilobytes > max_ram_kb+50 ) THEN
       max_ram_kb = kilobytes
       CALL write_max_tracked_ram ()
     END IF
   END SUBROUTINE set_max_tracked_ram
   !
   SUBROUTINE write_max_tracked_ram ()
     USE io_global, ONLY : stdout
     USE mytime,    ONLY : max_ram_kb
     IMPLICIT NONE
     WRITE( stdout, 9001 ) max_ram_kb/1000.0
9001 FORMAT(/' XXX per-process dynamical memory: ',f7.1,' Mb' )
   END SUBROUTINE write_max_tracked_ram

Guido

-- 
Guido Fratesi

Dipartimento di Scienza dei Materiali
Universita` degli Studi di Milano-Bicocca
via Cozzi 53, 20125 Milano, Italy



[Pw_forum] Memory usage by pw.x

2012-08-30 Thread Axel Kohlmeyer
guido,

On Thu, Aug 30, 2012 at 12:04 PM, Guido Fratesi wrote:
> I'm sorry for a previous incomplete message.
>
>>   sed -ri "s/(^ *)(allocate.*$)/\1\2\n\1  CALL mem_whatever()/i" $(find
>> /where/is/espresso -name \*.f90)
>
> That was very useful, thank you.
>
> I now get 1.9GB out of 2.4, which starts giving some usable estimate,
> but I do understand that getting the accurate value is very complex.

quantifying memory usage precisely on unix/linux machines with
virtual memory management and memory sharing is almost
impossible. you have multiple components to worry about:

- address space (memory reserved to be used, but initially all
  mapped to the same copy-on-write location)
- resident set size (actual physical memory used)
- shared memory (it is not memory that is shared, but
   more a measure for how much sharing is going on)
- swap space.
- device memory (from infiniband cards for example)
- pinned memory (allocated memory that cannot be swapped,
   usually used to back device memory)

so what the real memory usage is, is difficult to determine.
address space (VMEM) is usually too large; resident set size
(RSS) does not include memory that is swapped out, so it
is often too small. using many MPI tasks drives up the address
space for device memory (which doesn't increase real memory
usage, but does require more pinned memory, which makes
swapping more likely). multi-threading results in a lot of sharing.

...and tracking allocations in the code only handles explicit
allocations, not those incurred by the fortran language on the
stack or otherwise.

so for all practical purposes you can say that memory use is
usually somewhere between VMEM and RSS, but that can be
pretty far apart.
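
The VMEM-vs-RSS spread is easy to see on any process (a sketch using POSIX ps; VSZ is the address space and RSS the resident part, both in kB):

```shell
# Address space (VSZ) vs resident set (RSS) of the current shell, in kB.
# Real memory use lies somewhere between these two numbers, and on busy
# machines they can be far apart.
ps -o vsz=,rss= -p $$
```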

axel.

> To my purpose, I'll monitor the memory occupancy by a script, which
> follows below in case someone finds it useful...



> Guido
>
> 
>
> command=pw.x
>
> maxsecs=$((60*60*24))
> delay=1
> nsteps=$((maxsecs/delay))
>
> echo "#when  RAM  PID (0 for all $command instances)"
>
> for ((i=0;((i<nsteps));i++)); do
>#timer=`date +%H:%M:%S`
>#timer=`date +"%H:%M:%S %s.%N"`
>timer=`date +"%s.%N"`
>ps -eo comm,rss,pid |
>  awk -v comm=$command -v timer=$timer '
>BEGIN {tot=0}
>($1==comm) {
>  tot+=$2;
>  print timer, $2, $3;
>}
>END {if (tot) print timer, tot, 0}
>  '
>sleep 1
> done
>
>
>
> --
> Guido Fratesi
>
> Dipartimento di Scienza dei Materiali
> Universita` degli Studi di Milano-Bicocca
> via Cozzi 53, 20125 Milano, Italy
>



-- 
Dr. Axel Kohlmeyer  akohlmey at gmail.com  http://goo.gl/1wk0
International Centre for Theoretical Physics, Trieste. Italy.


[Pw_forum] Memory usage by pw.x

2012-08-29 Thread Filippo Spiga
On Aug 29, 2012, at 3:24 PM, Guido Fratesi  wrote:
>  Let me guess that
> calling memstat again later on, e.g. in c_bands, could provide one with a
> more precise estimate, but maybe you can suggest a better approach, or
> correct me if I'm completely wrong.


I think you need to put several calls in several parts of the code to
understand how much memory is going to be allocated. It will likely happen
that the memory occupancy increases, then decreases, then increases again,
and so on. It is possible to incorporate memory monitoring within the clock
module, BUT the amount of output that can be printed is huge!

What about tracking the maximum and the minimum recorded within an entire SCF 
loop by sampling the memory occupancy where a clock (start or stop) is 
triggered?

Cheers,
Filippo

--
Mr. Filippo SPIGA, M.Sc., Ph.D. Candidate 
CADMOS - Chair of Numerical Algorithms and HPC (ANCHP)
École Polytechnique Fédérale de Lausanne (EPFL)
http://anchp.epfl.ch ~ http://filippospiga.me ~ skype: filippo.spiga

"Nobody will drive us out of Cantor's paradise." ~ David Hilbert



[Pw_forum] Memory usage by pw.x

2012-08-29 Thread Paolo Giannozzi
Hi Guido

> I'm trying to quantify the memory usage by pw.x

good luck. Understanding how much memory a code really uses is
highly nontrivial, due to the way modern operating systems work
(shared libraries, files kept in RAM, ...).

> at the beginning of the SCF cycle the message "per-process
> dynamical memory:" reports the memory allocated at that time
> (clib/memstat.c), but this value is significantly less than the one
> I can see by monitoring the process by the "top" command, as
> more memory is allocated afterwards. Let me guess that calling
> memstat again later on, e.g. in c_bands, could provide one with a more
> precise estimate, but maybe you can suggest a better approach

>

I cannot

P.
---
Paolo Giannozzi, Dept of Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222






[Pw_forum] Memory usage by pw.x

2012-08-29 Thread Guido Fratesi
Dear all,

I'm trying to quantify the memory usage by pw.x: at the beginning of the 
SCF cycle the message "per-process dynamical memory:" reports the memory 
allocated at that time (clib/memstat.c), but this value is significantly 
less than the one I can see by monitoring the process by the "top" 
command, as more memory is allocated afterwards. Let me guess that 
calling memstat again later on, e.g. in c_bands, could provide one with a 
more precise estimate, but maybe you can suggest a better approach, or 
correct me if I'm completely wrong.

Thank you in advance,
Guido

-- 
Guido Fratesi

Dipartimento di Scienza dei Materiali
Universita` degli Studi di Milano-Bicocca
via Cozzi 53, 20125 Milano, Italy


[Pw_forum] Memory usage in Quantum Espresso

2011-03-11 Thread Paolo Giannozzi

On Mar 10, 2011, at 22:29 , Krukau, Aliaksandr wrote:

> The user guide mentions that old compilers can reduce the size
> of the system that PWSCF can treat. But I use relatively new
> version 9.0 of Portland group Fortran compilers.

Either your compiler is buggy, or it requires some specific options
to handle a relatively large code like PWscf. If you paid real money
for the PGI compiler, complain to the vendor. In the meantime,
try a recent version of gfortran or a non-buggy version of the Intel
compiler (the latest v.11 should be OK).

P.
---
Paolo Giannozzi, Dept of Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222






[Pw_forum] Memory usage in Quantum Espresso

2011-03-10 Thread Krukau, Aliaksandr
  Dear QE users,
   My desktop has an Intel Core 2 Quad processor and 4 GB of RAM. However, 
when I run Quantum ESPRESSO (QE) 4.1, the biggest PWSCF jobs that I 
manage to run use about 200 MB of RAM. For bigger systems or tighter 
cutoffs, the calculations crash with a segmentation violation. 
Moreover, if I use QE 4.2.1, the calculations crash even when I use more 
than ~100 MB. So I am restricted to quite small systems, especially 
if I use the latest 4.2.1 version. The user guide mentions that old 
compilers can reduce the size of the system that PWSCF can treat. But I 
use the relatively new version 9.0 of the Portland Group Fortran 
compilers. If I run the 'limit' command, it shows 'memoryuse unlimited; 
stacksize unlimited'.
   I apologize for the naive questions, but why can PWSCF use only a 
small fraction of my total RAM? Or does the line "per-process dynamical 
memory" in the output file not show the total required memory? Is 
there a way to run bigger calculations without using parallel execution 
(maybe via compiler flags, etc.)?
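
A related check, under the assumption (not a diagnosis from this thread) that stack exhaustion from large automatic Fortran arrays is the culprit: 'limit' is the csh builtin, and in sh/bash the counterpart is ulimit, which must be set in the shell that actually launches pw.x:

```shell
# Show the current soft stack limit, then try to raise it; large Fortran
# automatic arrays can overflow a small stack and segfault long before
# the machine's RAM is exhausted.
ulimit -s
ulimit -s unlimited 2>/dev/null || true
ulimit -s
```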
  Best regards,
  Alex Krukau,
  Indiana University



[Pw_forum] memory usage

2007-02-16 Thread Paolo Giannozzi
On Feb 16, 2007, at 12:27 , Marcel Mohr wrote:

> how can i print or estimate the memory a calculation would need?

estimate: there is a subsection "Memory requirements" in the user's
guide. It is not very detailed, but it gives you an idea. Print: there is a
routine "memstat" that can be called after the initialization phase,
once the largest arrays have been allocated (you could try to call it
after the line
   WRITE( stdout, 9000 ) get_clock( 'PWSCF' )
in electrons.f90). It calculates (on Linux, AIX, and a few other OSes) the
size of the dynamically allocated memory only. It is not the maximum
memory size, because a nonnegligible amount of memory is allocated
later, during self-consistency. Keeping track of how much memory is
really used is a nontrivial task.

Paolo
---
Paolo Giannozzi, Democritos and University of Udine, Italy





[Pw_forum] memory usage

2007-02-16 Thread Marcel Mohr
Dear list-members

I have the suspicion that all of you have supercomputers where memory 
usage is incidental.
OK, not really, but:
how can I print or estimate the memory a calculation would need?
(In older versions there was memory.x, and later the routine show_memory.)

Kind regards
Marcel