Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
On Wednesday 12 November 2008 06:18:14, vivek sharma wrote:

> Everybody, thanks for your useful suggestions. What do you mean by the %
> imbalance reported in the log file? I don't know how to assign a specific
> load to PME, but I can see that around 37% of the computation is being
> used by PME. I am not assigning PME nodes separately. I have no idea what
> dynamic load balancing is or how to use it.

The switch is -dlb. Normally it's turned on automatically (auto), but you
can turn it on/off with this switch as well. -dds might be of interest as
well...

Best
   Martin
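As a sketch of those mdrun switches in practice (not from the original
message; the MPI launcher and file names are placeholders, and the defaults
noted in the comments should be checked against mdrun -h for your build):

    # Dynamic load balancing is "auto" by default; force it on or off:
    mpirun -np 32 mdrun -s topol.tpr -deffnm run_dlb   -dlb yes
    mpirun -np 32 mdrun -s topol.tpr -deffnm run_nodlb -dlb no

    # -dds is the minimum allowed scaling of a domain-decomposition cell
    # under dynamic load balancing (default 0.8); a smaller value gives
    # the load balancer more room to shift work between cells:
    mpirun -np 32 mdrun -s topol.tpr -deffnm run_dds -dlb yes -dds 0.6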
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
vivek sharma wrote:

> Hi Carsten,
> I have also tried scaling GROMACS over a number of nodes but was not able
> to get it to scale beyond 20 processors, i.e. 20 nodes with 1 processor
> per node. I am not getting the point of optimizing PME for the number of
> nodes: is it that we can change the PME parameters for the MD simulation,
> or use some other coulomb type instead? Please explain, and suggest a way
> to do it.
>
> With Thanks,
> Vivek

This is something I played with for a while; see the thread I started here:

http://www.gromacs.org/pipermail/gmx-users/2008-October/036856.html

I got some great advice there. A big factor is the PME/PP balance, which
grompp will estimate for you. For simple rectangular boxes, the goal is to
shoot for 0.25 for the PME load (this is printed out by grompp). In the
thread above, Berk shared some tips on how to get this to happen. Then you
should be able to set -npme with mdrun to however many processors is
appropriate. I believe mdrun will try to guess, but I'm in the habit of
specifying it myself, just for my own satisfaction :)

-Justin

--
Justin A. Lemkul
Graduate Research Assistant
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
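As a sketch of Justin's recipe (not from the thread; commands and file names
are placeholders):

    # 1. grompp reports an estimate of the relative computational load of
    #    the PME mesh part while writing the run input file; for a simple
    #    rectangular box, aim for roughly 0.25:
    grompp -f md.mdp -c conf.gro -p topol.top -o topol.tpr

    # 2. dedicate about that fraction of the processes to PME, e.g. 8 of
    #    32 ranks as PME-only nodes:
    mpirun -np 32 mdrun -s topol.tpr -npme 8 -deffnm md

    # 3. or omit -npme and let mdrun choose the split itself.

Steering the grompp estimate itself toward 0.25 presumably means adjusting
the real-space cutoff and the Fourier grid together, as discussed in the
linked thread.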
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
A page on the wiki with further information and hints would be nice.
Topic: "Improving performance with GMX 4" or "Pimp my GMX 4" ;-)
The beta man page of mdrun (version 4) is not very comprehensible/user
friendly in my eyes.

- Christian

On Tue, 2008-11-11 at 09:12 -0500, Justin A. Lemkul wrote:

> vivek sharma wrote:
>> Hi Martin,
>> I am using Infiniband here, with a speed of more than 10 Gbps. Can you
>> suggest some options to scale better in this case?
>
> What % imbalance is being reported in the log file? What fraction of the
> load is being assigned to PME, from grompp? How many processors are you
> assigning to the PME calculation? Are you using dynamic load balancing?
>
> All of these factors affect performance.
>
> -Justin

--
M. Sc. Christian Seifert
Department of Biophysics
University of Bochum
ND 04/67
44780 Bochum
Germany
Tel: +49 (0)234 32 28363
Fax: +49 (0)234 32 14626
E-Mail: [EMAIL PROTECTED]
Web: http://www.bph.rub.de
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
Hi Carsten,
I have also tried scaling GROMACS over a number of nodes but was not able
to get it to scale beyond 20 processors, i.e. 20 nodes with 1 processor per
node. I am not getting the point of optimizing PME for the number of nodes:
is it that we can change the PME parameters for the MD simulation, or use
some other coulomb type instead? Please explain, and suggest a way to do it.

With Thanks,
Vivek

2008/11/10 Carsten Kutzner [EMAIL PROTECTED]:

> Hi,
> most likely the Ethernet is the problem here. I compiled some numbers for
> the DPPC benchmark in the paper "Speeding up parallel GROMACS on
> high-latency networks",
> http://www3.interscience.wiley.com/journal/114205207/abstract?CRETRY=1SRETRY=0
> which are for version 3.3, but PME will behave similarly. If you did not
> already use separate PME nodes, this is worth a try, since on Ethernet
> the performance will drastically depend on the number of nodes involved
> in the FFT. I also have a tool which finds the optimal PME settings for a
> given number of nodes by varying the number of PME nodes and the Fourier
> grid settings. I can send it to you if you want.
>
> Carsten
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
On Tuesday 11 November 2008 12:06:06, vivek sharma wrote:

> I have also tried scaling GROMACS over a number of nodes but was not able
> to get it to scale beyond 20 processors, i.e. 20 nodes with 1 processor
> per node.

As mentioned before, performance strongly depends on the type of
interconnect you're using between your processes: shared memory, Ethernet,
Infiniband, NumaLink, whatever... I assume you're using Ethernet (100/1000
MBit?); you can tune here to some extent as described in:

Kutzner, C.; van der Spoel, D.; Fechner, M.; Lindahl, E.; Schmitt, U. W.;
de Groot, B. L.; Grubmüller, H. "Speeding up parallel GROMACS on
high-latency networks", Journal of Computational Chemistry, 2007

...but be aware that the principal limitations of Ethernet remain. To get
around this, you might consider investing in the interconnect. If you can
get by with 16 cores, shared-memory nodes will give you the biggest bang
for the buck.

Best
   Martin
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
vivek sharma wrote:

> Hi All,
> One thing I forgot to mention: I am getting around 6 ns/day here, for a
> protein of around 2600 atoms.

Much more relevant is how much water... You can also be rate-limited by I/O
if you have poor hardware and/or are writing to disk excessively.

Mark
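As a sketch of the "writing to disk excessively" point: output frequency is
controlled in the .mdp file. The fragment below is illustrative only (these
values are not recommendations from this thread), using the standard
GROMACS 4 output-frequency options:

    ; how often (in steps) mdrun writes each kind of output; larger values
    ; mean less I/O per simulated nanosecond
    nstxout    = 50000    ; coordinates to the .trr trajectory
    nstvout    = 50000    ; velocities to the .trr trajectory
    nstfout    = 0        ; no forces
    nstxtcout  = 5000     ; compressed coordinates to the .xtc trajectory
    nstenergy  = 5000     ; energies to the .edr file
    nstlog     = 5000     ; energy summary to the .log file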
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
Hi Martin,
I am using Infiniband here, with a speed of more than 10 Gbps. Can you
suggest some options to scale better in this case?

With Thanks,
Vivek

2008/11/11 Martin Höfling [EMAIL PROTECTED]:

> As mentioned before, performance strongly depends on the type of
> interconnect you're using between your processes: shared memory,
> Ethernet, Infiniband, NumaLink, whatever... I assume you're using
> Ethernet (100/1000 MBit?); you can tune here to some extent as described
> in Kutzner et al., "Speeding up parallel GROMACS on high-latency
> networks", Journal of Computational Chemistry, 2007, but be aware that
> the principal limitations of Ethernet remain.
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
Hi All,
One thing I forgot to mention: I am getting around 6 ns/day here, for a
protein of around 2600 atoms.

With Thanks,
Vivek

2008/11/11 vivek sharma [EMAIL PROTECTED]:

> Hi Martin,
> I am using Infiniband here, with a speed of more than 10 Gbps. Can you
> suggest some options to scale better in this case?
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
vivek sharma wrote:

> Hi Martin,
> I am using Infiniband here, with a speed of more than 10 Gbps. Can you
> suggest some options to scale better in this case?

What % imbalance is being reported in the log file? What fraction of the
load is being assigned to PME, from grompp? How many processors are you
assigning to the PME calculation? Are you using dynamic load balancing?

All of these factors affect performance.

-Justin

--
Justin A. Lemkul
Graduate Research Assistant
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
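For readers unsure where those numbers come from: the load-balance and PME
statistics are summarized in the md.log written by mdrun. The exact wording
of the log lines may differ between versions, so the searches below are only
a hedged starting point:

    # domain-decomposition load imbalance, reported in the accounting
    # section near the end of the run log
    grep -i "imbalance" md.log

    # time spent in the PME mesh part, from the same section
    grep -i "pme" md.log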
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
2008/11/11 Justin A. Lemkul [EMAIL PROTECTED]:

> vivek sharma wrote:
>> Hi Martin,
>> I am using Infiniband here, with a speed of more than 10 Gbps. Can you
>> suggest some options to scale better in this case?
>
> What % imbalance is being reported in the log file? What fraction of the
> load is being assigned to PME, from grompp? How many processors are you
> assigning to the PME calculation? Are you using dynamic load balancing?
>
> All of these factors affect performance.
>
> -Justin

Everybody, thanks for your useful suggestions. What do you mean by the %
imbalance reported in the log file? I don't know how to assign a specific
load to PME, but I can see that around 37% of the computation is being used
by PME. I am not assigning PME nodes separately. I have no idea what
dynamic load balancing is or how to use it.

Looking forward to answers...

With Thanks,
Vivek
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
vivek sharma wrote:

> 2008/11/11 Justin A. Lemkul [EMAIL PROTECTED]:
>
>> What % imbalance is being reported in the log file? What fraction of
>> the load is being assigned to PME, from grompp? How many processors are
>> you assigning to the PME calculation? Are you using dynamic load
>> balancing?
>
> Everybody, thanks for your useful suggestions. What do you mean by the %
> imbalance reported in the log file? I don't know how to assign a specific
> load to PME, but I can see that around 37% of the computation is being
> used by PME. I am not assigning PME nodes separately. I have no idea what
> dynamic load balancing is or how to use it.

For the moment, you just need to look for the information in your grompp
output and your log file, like Justin asked. To find out how to use these
things, check out mdrun -h and the version 4 manual. There's some relevant
discussion in this thread:

http://www.gromacs.org/pipermail/gmx-users/2008-October/036856.html

Mark
Re: [gmx-users] Gromacs 4 Scaling Benchmarks...
Hi,

most likely the Ethernet is the problem here. I compiled some numbers for
the DPPC benchmark in the paper "Speeding up parallel GROMACS on
high-latency networks",
http://www3.interscience.wiley.com/journal/114205207/abstract?CRETRY=1SRETRY=0
which are for version 3.3, but PME will behave similarly. If you did not
already use separate PME nodes, this is worth a try, since on Ethernet the
performance will drastically depend on the number of nodes involved in the
FFT. I also have a tool which finds the optimal PME settings for a given
number of nodes by varying the number of PME nodes and the Fourier grid
settings. I can send it to you if you want.

Carsten

On Nov 9, 2008, at 10:30 PM, Yawar JQ wrote:

> I was wondering if anyone could comment on these benchmark results for
> the d.dppc benchmark?
>
>   Nodes   Cutoff (ns/day)   PME (ns/day)
>     4          1.331           0.797
>     8          2.564           1.497
>    16          4.5             1.92
>    32          8.308           0.575
>    64         13.5             0.275
>   128         20.093             -
>   192         21.6               -
>
> It seems to scale relatively well up to 32-64 nodes without PME. This
> seems slightly better than the benchmark results for GROMACS 3 on
> www.gromacs.org. Can someone comment on the magnitude of the performance
> hit? The lack of scaling with PME is worrying me.
>
> For the PME runs, I set rlist, rvdw, rcoulomb = 1.2 and the rest to the
> defaults. I can try some other settings, e.g. larger spacing for the
> grid, but I'm not sure how much more that would help. Is there a more
> standardized system I should use for testing PME scaling?
>
> This is with GNU compilers and parallelization with OpenMPI 1.2. I'm not
> sure what we're using for FFTW. The compute nodes are Dell M600 blades
> with 16 GB of RAM and dual quad-core Intel Xeon 3 GHz processors. I
> believe it's all Ethernet interconnects.
>
> Thanks,
> YQ
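Carsten's tuning tool itself isn't posted in this thread, but the two knobs
he mentions can be scanned by hand. As a rough sketch (values and file names
are illustrative only; the rcoulomb/fourierspacing trade-off is a paraphrase
of the usual PME tuning advice, not text from this thread), one could first
coarsen the PME grid in the .mdp, which shifts work from the network-bound
FFT to the real-space part:

    fourierspacing  = 0.14    ; grompp default is 0.12 nm
    rcoulomb        = 1.4     ; scaled up in the same proportion, to keep
                              ; the electrostatics accuracy roughly constant
    pme_order       = 4

and then, for a fixed total number of ranks, try a few PME-node counts:

    for npme in 8 12 16 20; do
        mpirun -np 64 mdrun -s dppc.tpr -npme $npme -deffnm scan_npme$npme
    done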
RE: [gmx-users] Gromacs 4 Scaling Benchmarks...
The FFTW used during compilation was FFTW 3.1.2, compiled using the GNU
compilers.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yawar JQ
Sent: Sunday, November 09, 2008 3:31 PM
To: gmx-users@gromacs.org
Subject: [gmx-users] Gromacs 4 Scaling Benchmarks...

> I was wondering if anyone could comment on these benchmark results for
> the d.dppc benchmark? [...] For the PME runs, I set rlist, rvdw,
> rcoulomb = 1.2 and the rest to the defaults. I can try some other
> settings, e.g. larger spacing for the grid, but I'm not sure how much
> more that would help. Is there a more standardized system I should use
> for testing PME scaling?
>
> This is with GNU compilers and parallelization with OpenMPI 1.2. I'm not
> sure what we're using for FFTW. The compute nodes are Dell M600 blades
> with 16 GB of RAM and dual quad-core Intel Xeon 3 GHz processors. I
> believe it's all Ethernet interconnects.
>
> Thanks,
> YQ
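For reference, the PME settings YQ describes would correspond to an .mdp
fragment along these lines. This is a reconstruction for illustration only:
the actual input files were not posted, and the options not mentioned in the
thread are simply assumed to sit at the grompp defaults:

    coulombtype     = PME
    rcoulomb        = 1.2
    rvdw            = 1.2
    rlist           = 1.2
    fourierspacing  = 0.12   ; assumed: grompp default, since it was not set
    pme_order       = 4      ; assumed: default interpolation order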