Hi,
 
 Nizamov Shawkat, thank you for your reply and for confirming the non-1:1
scaling of Meep time stepping under MPI on your setup.

 I've checked how the scaling depends on problem size on two systems. The
results are in LaTeX format at the end of this message; you can also download
them in several formats from http://194.85.163.135/meep-bench.[pdf,tex,ods]
(the files should stay available for several months).

 Short summary: the speedup from using more cores goes down as the problem
size grows. On both tested systems the best gain (1.85 on 8 Xeon cores, 1.25
on 4 i5 cores, relative to the single-core run) was obtained at 50 points per
unit (model size 15x15 units), with a common trend towards smaller gains for
larger problems (at resolution 400 the i5 gain is about 1.18, at resolution
200 the Xeon gain is about 1.54).
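
 To be explicit about how the ratios in the table below are computed (the
notation $t_{\mathrm{step}}$, $S_{\mathrm{step}}$ is mine, introduced here
only for clarity):
\[
  S_{\mathrm{step}}(N,\mathit{res}) =
  \frac{t_{\mathrm{step}}(1,\mathit{res})}{t_{\mathrm{step}}(N,\mathit{res})},
  \qquad \mbox{e.g. }
  S_{\mathrm{step}}(8,50) = \frac{0.0256\,\mbox{s}}{0.0137\,\mbox{s}}
  \approx 1.87 \mbox{ on the Xeon node,}
\]
and the same formula with the set-epsilon times for the set eps column.
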
  Short note: one 4-core i5-2400 @ 3.1 GHz is 3-4 times faster than two
4-core Xeon E5345 @ 2.33 GHz with meep-1.1.1.

 Any ideas? More benchmarking? 

 I have a home-made 2D FDTD code for invisibility-cloak simulations; I will
try to parallelize it with MPI in a few weeks, benchmark it on the same
systems, and if I get something different I will post the new results.
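
 In case it is useful, here is a minimal sketch of the kind of slab
decomposition I have in mind for that code: plain C + MPI, a 2D TMz update
with one ghost-row exchange per half step. The grid size, number of steps
and the point source below are placeholders, not the actual cloak model.

/* fdtd2d_mpi.c -- sketch of a 1-D slab decomposition for a 2D TMz FDTD loop */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define NX 300          /* global rows, decomposed across ranks */
#define NY 300          /* columns */
#define STEPS 200
#define C 0.5           /* Courant number c*dt/dx, < 1/sqrt(2) in 2D */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int nx   = NX / size;                  /* local rows, assume NX % size == 0 */
    int up   = (rank + 1 < size) ? rank + 1 : MPI_PROC_NULL;
    int down = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;

    /* rows 1..nx are owned, rows 0 and nx+1 are ghost rows */
    double (*ez)[NY] = calloc(nx + 2, sizeof *ez);
    double (*hx)[NY] = calloc(nx + 2, sizeof *hx);
    double (*hy)[NY] = calloc(nx + 2, sizeof *hy);

    for (int t = 0; t < STEPS; t++) {
        /* the Hy update needs Ez from the first owned row of the rank above */
        MPI_Sendrecv(ez[1],      NY, MPI_DOUBLE, down, 0,
                     ez[nx + 1], NY, MPI_DOUBLE, up,   0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        for (int i = 1; i <= nx; i++)
            for (int j = 0; j < NY - 1; j++) {
                hx[i][j] -= C * (ez[i][j + 1] - ez[i][j]);
                hy[i][j] += C * (ez[i + 1][j] - ez[i][j]);
            }

        /* the Ez update needs Hy from the last owned row of the rank below */
        MPI_Sendrecv(hy[nx], NY, MPI_DOUBLE, up,   1,
                     hy[0],  NY, MPI_DOUBLE, down, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        for (int i = 1; i <= nx; i++)
            for (int j = 1; j < NY - 1; j++)
                ez[i][j] += C * (hy[i][j] - hy[i - 1][j]
                               - hx[i][j] + hx[i][j - 1]);

        /* soft point source, applied by the rank that owns the grid centre */
        int gi = NX / 2, li = gi - rank * nx + 1;
        if (li >= 1 && li <= nx)
            ez[li][NY / 2] += sin(0.1 * t);
    }

    if (rank == 0) printf("done on %d ranks\n", size);
    free(ez); free(hx); free(hy);
    MPI_Finalize();
    return 0;
}

(It should build and run with the usual "mpicc fdtd2d_mpi.c -o fdtd2d_mpi -lm"
and "mpirun -np 4 ./fdtd2d_mpi"; the file name is of course arbitrary.)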

WBR,
Kostya

\begin{table}[h]
  \centering
  \begin{tabular}{c|c|c|c|c|c}    
    res & procs, $N$ &
    set eps time $t_{\mathrm{eps}}(N,\mathit{res})$, s & step time $t_{\mathrm{step}}(N,\mathit{res})$, s &
    set eps speedup $\frac{t_{\mathrm{eps}}(1,\mathit{res})}{t_{\mathrm{eps}}(N,\mathit{res})}$ & step speedup $\frac{t_{\mathrm{step}}(1,\mathit{res})}{t_{\mathrm{step}}(N,\mathit{res})}$\\
    \hline \hline    
    i5-2400 @ 3.1 GHz &  &  &  &  & \\
    \hline
    50 & 4 & 8.3 & 0.0051 & 4.05 & 1.27 \\
    & 2 & 16.1 & 0.0053 & 2.09 & 1.23 \\
    & 1 & 33.6 & 0.0065 &  &  \\
    \hline
    100 & 4 & 32.8 & 0.0185 & 3.86 & 1.21 \\
    & 2 & 65.0 & 0.0186 & 1.95 & 1.20 \\
    & 1 & 126.6 & 0.0223 &  &  \\
    \hline
    200 & 4 & 133.1 & 0.0707 & 3.77 & 1.20 \\
    & 2 & 255.7 & 0.0705 & 1.96 & 1.20 \\
    & 1 & 502 & 0.0845 &  &  \\
    \hline
    400 & 4 & 526.2 & 0.2771 & 3.82 & 1.17 \\
    & 2 & 1024.7 & 0.2756 & 1.96 & 1.18 \\
    & 1 & 2008.3 & 0.3254 &  &  \\
    \hline
    \hline
    Xeon E5345 @ 2.33 GHz &  &  &  &  &  \\
    \hline
    50 & 8 & 6.5 & 0.0137 & 8.86 & 1.87 \\
    & 4 & 14.2 & 0.0138 & 4.06 & 1.86 \\
    & 2 & 28.7 & 0.0192 & 2.01 & 1.33 \\
    & 1 & 57.6 & 0.0256 &  &  \\
    \hline
    100 & 8 & 26.4 & 0.0608 & 8.57 & 1.54 \\
    & 4 & 56.2 & 0.0615 & 4.02 & 1.53 \\
    & 2 & 113.8 & 0.0658 & 1.99 & 1.43 \\
    & 1 & 226.2 & 0.0939 &  &  \\
    \hline
    200 & 8 & 104.3 & 0.235 & 8.76 & 1.56 \\
    & 4 & 225.4 & 0.239 & 4.05 & 1.54 \\
    & 2 & 456.2 & 0.2537 & 2.00 & 1.45 \\
    & 1 & 913.2 & 0.3676 &  & 
  \end{tabular}
  \caption{Single-node performance vs.\ model size (the resolution, in points
    per unit length, is varied; cell size 15x15 units); speedup relative to
    the single-core run at the same resolution.}
\end{table}


> ------------------------------
> 
> Message: 2
> Date: Wed, 5 Oct 2011 09:41:29 +0200
> From: Nizamov Shawkat <nizamov.shaw...@gmail.com>
> To: Konstantin Ladutenko <fisik2...@mail.ru>
> Cc: meep-discuss@ab-initio.mit.edu
> Subject: Re: [Meep-discuss] about meep-mpi
> Message-ID:
>       <cafwq1ssvdc0azsfadblbqkabtzmupf86dhkpelmdak_yyvy...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> 2011/10/4 Konstantin Ladutenko <fisik2...@mail.ru>:
> > Hi,
> >
> > I've continued benchmarking meep 1.1.1, now on a single 2-CPU (4 cores 
> > each) host to exclude my network setup...
> >
> > Simulating drop filter.
> >
> > procs --- set_eps --- set_eps_1 /procs* --- step --- step_time --- 
> > step_time_1/procs*
> > 8 --- 14,80 --- 15,98* --- 0,01630 --- 67,16 --- 14,20*
> > 4 --- 31,80 --- 31,95* --- 0,01600 --- 70,33 --- 28,40*
> > 2 --- 65,00 --- 63,91* --- 0,02080 --- 88,70 --- 56,80*
> > 1 --- 127,81 --- 127,81* --- 0,02680 --- 113,60 --- 113,60*
> >
> > * expected values
> >
> > So just the same. Set_epsilon is scaling, time stepping is not. Any ideas? 
> > My problem with scaling on Xeon?
> > I will try to check just the same on my home i5 in few days...
> >
> 
> Hello!
> 
> Your results nevertheless show that your simulation does accelerate when
> you add cores. True, it is not 1-to-1 acceleration, and with 8 cores the
> time stepping becomes only about twice as fast (67.16 vs 113.60). My guess
> is that your simulation is not large enough. Setting the epsilon values is
> a truly parallel operation, because each chunk is independent of the
> others, so there you see good scaling.  The time-stepping code, on the
> other hand, is not completely parallelized: calculating the fields on the
> chunk boundaries requires interprocess communication over MPI, and such
> overheads can actually slow the simulation down. How much one gains from
> parallelizing the field updates inside the chunks, versus how much one
> loses on the boundary calculations, depends on the size of the simulation
> cell, the number of processor cores used, etc. Increase the resolution a
> few times and you should see better scaling.
> 
> One more thing to consider is the MPI implementation itself: if I
> remember correctly, it is generally recommended to use OpenMPI over
> MPICH, especially on SMP machines.
> 
> Another, more hardware-related issue is the processor cache. The more
> processes actively use the memory, the less effective the cache is and
> the more stress there is on the memory bus. If you look at how the
> epsilon population is performed, you will see that it is a relatively
> short function that spends a lot of time on computation and only slowly
> fills the memory array. Time stepping, on the other hand, does less
> computation but uses memory aggressively (the memory bus speed may
> become a bottleneck), which makes the cache much less effective. In my
> own experience, on my 8-core system it was better to run the simulation
> on 4 cores.
> 
> The bottom line is that what you see is quite normal, and you have to
> play with your settings to obtain the best result. There is no single,
> universal recipe.
> 
> Hope it helps,
> S.Nizamov
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 5 Oct 2011 12:45:31 +0400
> From: Alexey <nad.co...@gmail.com>
> To: meep-discuss@ab-initio.mit.edu
> Subject: [Meep-discuss] Green's function calculation
> Message-ID:
>       <CAJYE85OSsdJv0EVsxwbdSV8O4W7fOxbg4do0uciwYdhkcv=q...@mail.gmail.com>
> Content-Type: text/plain; charset=KOI8-R
> 
> Hi all!
> 
> I need to calculate the Green's function G(r,w) of a metamaterial system
> with a CW local electric-current source.
> I can already calculate the field patterns E(r,t) and H(r,t). What do
> I need to do now to calculate Re(G(r,w)) and Im(G(r,w)) for such a
> system?
> Sorry if this is a stupid question :)
> 
> --
> Thanks,
> Alexey
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Wed, 5 Oct 2011 15:58:50 +0000
> From: Eguiluz Madariaga Lur <lur.eguiluz.madari...@aalto.fi>
> To: "meep-discuss@ab-initio.mit.edu" <meep-discuss@ab-initio.mit.edu>
> Subject: [Meep-discuss] Non-physical results in hole array group delay
>       calculation
> Message-ID:
>       <bd0c715223815240b762595ea2e29f141bb6b...@exmdb06.org.aalto.fi>
> Content-Type: text/plain; charset="us-ascii"
> 
> Hi all,
> 
> I am using meep to simulate the behavior of an infinite hole array in a metal 
> when it is hit by a gaussian pulsed plane wave. I obtain this infinite hole 
> array by simulating one hole and using periodic boundary conditions in the x 
> and y directions, and pml in the z direction. Taking the time domain data of 
> the electric field at the center of the hole, and making a Fourier transform 
> with Matlab, I calculate the spectral amplitude for each of the frequencies 
> contained in the pulse. The problem is that in the range of wavelengths from 
> 500 to 650nm, I obtain a plot full of spikes that look completely 
> non-physical, while out of this range the curve is smooth and seems to be 
> normal. I've attached a typical result to this message. I have checked the 
> results with the evaluation version of a commercial FDTD software package, 
> using the same values for the metal as I do in MEEP, and these peaks don't 
> appear. I would enormously appreciate if somebody could give me some idea 
> about how to get rid of the peaks. So far I have tried:
> 
> - Increasing the resolution
> - Increasing the pml layer thickness by an odd factor in order to rule out 
> multiple reflections
> - Changing the pml-profile to cubic
> - Different sizes for the geometry: increased distance between the hole array 
> and the pml layer by an odd factor (at least one wavelength).
> - Slightly different periodicities
> - A source with a smaller range of frequencies in the pulse, concentrated 
> around the problematic area.
> 
> None of these changes had an appreciable influence on the amplitude and all 
> the results were more or less as shown in the attached figure.
> 
> Any help would be greatly appreciated.
> Thank you for your time
> 
> Lur Eguiluz
> 
> The code I used is:
> 
> ;Definition of the structural parameters
> (define d_glass 1000)
> (define d_air 1000)
> (define d_metal 200.2)
> (define dpml 1000)
> (define periodx 410)
> (define periody 410)
> (define boxsize_z (+ d_glass d_metal d_air))
> (define x_hole 180)
> (define y_hole 180)
> 
> ;Metal characteristics
> (define metal
>       (make dielectric (epsilon 5.3894024200841582)
>           (polarizations
>             (make polarizability
>             (omega 5.7239396145741243e-07 ) (gamma 2.1739597464408151e-05 ) 
> (sigma 1.4274791830538642e+008 ) )
>             (make polarizability
>               (omega 0.0023282216492433606 ) (gamma 0.00040132006132541213 ) 
> (sigma 1.4316008270417622 ) )
>            )))
> 
> ;Lattice calculation and other settings
> (set! geometry-lattice (make lattice (size periodx periody (+ boxsize_z dpml 
> dpml) )))
> (set-param! resolution (/ 1 10))
> (set! eps-averaging? false)
> (define monitor_center (vector3 0 0 0) )
> 
> ;Boundary conditions (periodicity in x,y and pml in z)
> (set-param! k-point (vector3 0) )
> (set! pml-layers (list (make pml (thickness dpml) (direction Z) (side ALL) )))
> 
> ;Definition of the source characteristics
> 
> (define-param fmax 1/500 )
> (define-param fmin 1/1000)
> (define cfreq (* 0.5 (+ fmin fmax) ))
> (define-param freqwidth (- fmax fmin))
> 
> (set! sources (list (make source
>                     (src (make gaussian-src (frequency cfreq) (fwidth 
> freqwidth)))
>                     (component Ey) (center 0 0 (* (+ d_glass (/ d_metal 2)) 
> -1)) (size periodx periody 0))))
> 
> ;Definition of the geometry
> (set! geometry (list
>               (make block (center 0 0 0) (size infinity infinity d_metal) 
> (material metal));metal
>               (make block (center 0 0 (* -1 (/ (+ d_metal d_glass) 2)))
>                           (size infinity infinity d_glass) (material (make 
> dielectric (index 1.5))));glass
>               (make block (center 0 0 0) (size x_hole y_hole d_metal) 
> (material nothing));hole
> ))
> ;Initialize the fields, run the sources and store the field
> (run-sources+ (stop-when-fields-decayed 2000 Ey monitor_center 1e-7)
>   (in-point (vector3 0 0 (/ d_metal 2)) (to-appended "e_point" 
> output-efield)))
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: Spectral amplitude vs wavelength.jpg
> Type: image/jpeg
> Size: 19383 bytes
> Desc: Spectral amplitude vs wavelength.jpg
> URL: 
> <http://ab-initio.mit.edu/pipermail/meep-discuss/attachments/20111005/9b7c10a2/attachment.jpg>
> 
_______________________________________________
meep-discuss mailing list
meep-discuss@ab-initio.mit.edu
http://ab-initio.mit.edu/cgi-bin/mailman/listinfo/meep-discuss
