Nizamov Shawkat wrote:
> Dear Meep users!
>
> Lately I have been trying to implement a Pythonic interface for libmeep. My
> current goal is enabling the MPI feature of libmeep in Python scripts.
> While enabling MPI is as straightforward as calling mpi_init with the
> correct arguments, the actual MPI acceleration does not meet my
> expectations. I also ran some tests with meep-mpi and C++ tasks and
> found that the results are essentially the same: MPI efficiency is
> lower than expected. With MPICH, the overall calculation speed on a
> dual-core Pentium was actually slower than on a single core (the MPI
> interconnect took far too much time). With OpenMPI, things got
> better. My simple tests, performed with meep-mpi (Scheme), the compiled
> C++ test files (available in the Meep source) and the Pythonic interface,
> all show roughly 20-30% acceleration (at best 40%, in the time-stepping
> code, see below) when comparing the dual-core Pentium to a single core on
> the same PC. My expectation for such an SMP system was much higher: about
> 75-80%.
>
> So the first question is: what MPI efficiency do you observe on your
> systems? Can someone provide a simple benchmark (ctl preferred) that
> does not take hours to run on a desktop PC and clearly demonstrates the
> advantage of MPI?
>
> The next question: there is a special argument, num_chunks, in the
> structure class. Is it supposed to be the number of available processor
> cores, so that the calculation domain is split optimally among the nodes?
>
> And the last one: are there any hints for using Meep with MPI support?
>
> With best regards
> Nizamov Shawkat
>
> Supplementary:
>
> > cat 1.ctl
> (set! geometry-lattice (make lattice (size 16 8 no-size)))
> (set! geometry (list
>                 (make block (center 0 0) (size infinity 1 infinity)
>                       (material (make dielectric (epsilon 12))))))
> (set! sources (list
>                (make source
>                  (src (make continuous-src (frequency 0.15)))
>                  (component Ez)
>                  (center -7 0))))
> (set! pml-layers (list (make pml (thickness 1.0))))
> (set! resolution 10)
> (run-until 2000
>            (at-beginning output-epsilon)
>            (at-end output-efield-z))
>
>
> > mpirun -np 1 /usr/bin/meep-mpi 1.ctl
> Using MPI version 2.0, 1 processes                                  
> -----------                                                         
> Initializing structure...                                           
> Working in 2D dimensions.                                           
>      block, center = (0,0,0)                                        
>           size (1e+20,1,1e+20)                                      
>           axes (1,0,0), (0,1,0), (0,0,1)                            
>           dielectric constant epsilon = 12                          
> time for set_epsilon = 0.054827 s                                   
> -----------                                                         
> creating output file "./1-eps-000000.00.h5"...                      
> Meep progress: 230.95/2000.0 = 11.5% done in 4.0s, 30.6s to go
> on time step 4625 (time=231.25), 0.000865026 s/step
> Meep progress: 468.55/2000.0 = 23.4% done in 8.0s, 26.2s to go
> on time step 9378 (time=468.9), 0.000841727 s/step
> Meep progress: 705.8/2000.0 = 35.3% done in 12.0s, 22.0s to go
> on time step 14123 (time=706.15), 0.000843144 s/step
> Meep progress: 943.35/2000.0 = 47.2% done in 16.0s, 17.9s to go
> on time step 18874 (time=943.7), 0.000841985 s/step
> Meep progress: 1181.4/2000.0 = 59.1% done in 20.0s, 13.9s to go
> on time step 23635 (time=1181.75), 0.00084028 s/step
> Meep progress: 1418.85/2000.0 = 70.9% done in 24.0s, 9.8s to go
> on time step 28384 (time=1419.2), 0.000842386 s/step
> Meep progress: 1654.05/2000.0 = 82.7% done in 28.0s, 5.9s to go
> on time step 33088 (time=1654.4), 0.000850374 s/step
> Meep progress: 1891.5/2000.0 = 94.6% done in 32.0s, 1.8s to go
> on time step 37837 (time=1891.85), 0.000842369 s/step
> creating output file "./1-ez-002000.00.h5"...
> run 0 finished at t = 2000.0 (40000 timesteps)
>
> Elapsed run time = 33.9869 s
>
> > mpirun -np 2 /usr/bin/meep-mpi 1.ctl
> Using MPI version 2.0, 2 processes                                  
> -----------                                                         
> Initializing structure...                                           
> Working in 2D dimensions.                                           
>      block, center = (0,0,0)                                        
>           size (1e+20,1,1e+20)                                      
>           axes (1,0,0), (0,1,0), (0,0,1)                            
>           dielectric constant epsilon = 12                          
> time for set_epsilon = 0.0299381 s                                  
> -----------                                                         
> creating output file "./1-eps-000000.00.h5"...                      
> Meep progress: 328.85/2000.0 = 16.4% done in 4.0s, 20.4s to go      
> on time step 6577 (time=328.85), 0.00060946 s/step                  
> Meep progress: 656.1/2000.0 = 32.8% done in 8.0s, 16.4s to go       
> on time step 13123 (time=656.15), 0.000611187 s/step                
> Meep progress: 985.9/2000.0 = 49.3% done in 12.0s, 12.4s to go      
> on time step 19719 (time=985.95), 0.000606462 s/step                
> Meep progress: 1315.1/2000.0 = 65.8% done in 16.0s, 8.3s to go      
> on time step 26302 (time=1315.1), 0.000608525 s/step                
> Meep progress: 1644.95/2000.0 = 82.2% done in 20.0s, 4.3s to go     
> on time step 32911 (time=1645.55), 0.000605277 s/step               
> Meep progress: 1975.0/2000.0 = 98.8% done in 24.0s, 0.3s to go      
> on time step 39512 (time=1975.6), 0.000606022 s/step                
> creating output file "./1-ez-002000.00.h5"...                       
> run 0 finished at t = 2000.0 (40000 timesteps)                      
>
> Elapsed run time = 24.57 s
>
>
> For the Python script:
> (skipped harmless warnings like   [ubuntu:24234] mca: base:
> component_find: unable to open osc pt2pt: file not found (ignored))
>
>
> > mpirun -np 1 ./test-tut1-mpi.py
> Using MPI version 2.0, 1 processes
> Count processors: 1 My rank is 0
> time for set_epsilon = 0.696046 s
> creating output file "./eps-000000.00.h5"...
> on time step 1880 (time=94), 0.00212839 s/step
> on time step 3781 (time=189.05), 0.00210516 s/step
> on time step 5682 (time=284.1), 0.00210487 s/step
> on time step 7585 (time=379.25), 0.00210279 s/step
> on time step 9486 (time=474.3), 0.00210501 s/step
> Field time usage:
>     connnecting chunks: 0.0260548 s
>          time stepping: 17.3853 s 
>          communicating: 3.30172 s 
>      outputting fields: 0.0123442 s
>     Fourier transforming: 0.0178975 s
>        everything else: 0.362632 s  
>
> > mpirun -np 2 ./test-tut1-mpi.py
> Using MPI version 2.0, 2 processes
> Count processors: 2 My rank is 0
> time for set_epsilon = 0.37543 s
> creating output file "./eps-000000.00.h5"...
> on time step 2293 (time=114.65), 0.00174445 s/step
> on time step 4558 (time=227.9), 0.00176638 s/step
> on time step 6874 (time=343.7), 0.00172781 s/step
> on time step 9174 (time=458.7), 0.00173953 s/step
> Field time usage:
>     connnecting chunks: 0.0244552 s
>          time stepping: 12.9744 s
>          communicating: 4.15587 s
>      outputting fields: 0.00916502 s
>     Fourier transforming: 0.0171199 s
>        everything else: 0.328006 s
>
> > mpirun -np 2 ./test-tut1-mpi.py
> (class structure created with num_chunks=2)
>
> Using MPI version 2.0, 2 processes
> Count processors: 2 My rank is 0
> time for set_epsilon = 0.365422 s
> creating output file "./eps-000000.00.h5"...
> on time step 2066 (time=103.3), 0.00193695 s/step
> on time step 4152 (time=207.6), 0.00191805 s/step
> on time step 6205 (time=310.25), 0.00194897 s/step
> on time step 8307 (time=415.35), 0.00190318 s/step
> Field time usage:
>     connnecting chunks: 0.024951 s
>          time stepping: 14.0125 s
>          communicating: 4.72859 s
>      outputting fields: 0.00907009 s
>     Fourier transforming: 0.0204064 s
>        everything else: 0.438375 s
>
>
>
Hi, Nizamov Shawkat,

For your first question, as Benjamin has noted, your simulation size is
too small. When MPI splits the cell into several chunks, each chunk
should be large enough that the number of points on its surface (which
must be exchanged with neighboring processes every time step) is much
smaller than the number of points in its interior.
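As a rough back-of-the-envelope illustration using the 16x8 ctl file
above at resolution 10 (my own estimate, not a measurement): the whole
cell is only about 160 x 80 = 12800 grid points, so each of two chunks
holds roughly 6400 points, yet it must still exchange its ~80-point
boundary and pay the fixed MPI synchronization cost on every one of the
40000 time steps. Your own Python timing output reflects this: about
4 s of "communicating" against 13 s of "time stepping" in the
two-process run. With a much larger cell the same per-step overhead
becomes negligible.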
A simple benchmark could be a bash script like

    for mynp in 1 2 4; do
      for size in 100 1000 10000; do
        mpirun -np $mynp meep-mpi size=$size called.ctl
      done
    done

where called.ctl uses a single

    (define-param size 10)

to set the problem size, and runs for only about 10 or 100 FDTD steps.
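A minimal sketch of what such a called.ctl could look like (the 2D cell,
source placement, resolution, and run time below are illustrative
assumptions on my part, not something fixed by the benchmark idea):

    ; called.ctl -- parameterized benchmark sketch
    (define-param size 10)  ; problem size; override with size=... on the command line
    (set! geometry-lattice (make lattice (size size size no-size)))  ; 2D cell, size x size
    (set! sources (list
                   (make source
                     (src (make continuous-src (frequency 0.15)))
                     (component Ez)
                     (center 0 0))))
    (set! pml-layers (list (make pml (thickness 1.0))))
    (set! resolution 10)
    ; with the default Courant factor the time step is 0.05,
    ; so running to t = 5 is about 100 FDTD steps
    (run-until 5)

Comparing the "Elapsed run time" (or the s/step figures) between the
-np 1, 2 and 4 runs for each size should then show the speedup improving
as the cell grows, because the per-step communication cost becomes
relatively smaller.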


For your second question: the actual number of chunks can be larger than
your core count, because different kinds of simulation regions are also
placed in different chunks; for example, PML regions and normal FDTD
regions do not share a chunk.

yours
Zheng Li
2009-1-13
_______________________________________________
meep-discuss mailing list
meep-discuss@ab-initio.mit.edu
http://ab-initio.mit.edu/cgi-bin/mailman/listinfo/meep-discuss
