Another FYI: On our cluster here at Clemson University, we have turned off hyperthreading on every machine with Intel processors. We found that MPI applications perform poorly on a true multi-core system when hyperthreading is enabled.
Do any of your compute nodes have hyperthreading enabled?

Becky

On Tue, Apr 2, 2013 at 12:44 PM, Becky Ligon <[email protected]> wrote:

> Just FYI: What we have seen with the high CPU utilization is that once you have more processes running than cores per machine, performance slows down. We have seen this problem with the client core as well as with the PVFS library (which ROMIO accesses). We have not been able to recreate the problem systematically and thus have not been able to resolve the issue.
>
> On Tue, Apr 2, 2013 at 12:15 PM, Matthieu Dorier <[email protected]> wrote:
>
>> To answer Phil's question: just restarting IOR is enough, yes. Not PVFS. For the rest, I'll do some experiments when I have the chance and get back to you.
>>
>> Thanks all
>>
>> Matthieu
>>
>> ------------------------------
>>
>> From: "Becky Ligon" <[email protected]>
>> To: "Matthieu Dorier" <[email protected]>
>> Cc: "Rob Latham" <[email protected]>, "pvfs2-users" <[email protected]>, "ofs-support" <[email protected]>
>> Sent: Tuesday, April 2, 2013 17:22:17
>> Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>>
>> Matthieu:
>>
>> Are you seeing any 100% CPU utilization on the client? We have seen this with the client core (which you are not using) on a multicore system; however, both the client core and the PVFS interface use the same request structures, etc.
>>
>> Becky
>>
>> On Tue, Apr 2, 2013 at 11:11 AM, Becky Ligon <[email protected]> wrote:
>>
>>> Matthieu:
>>>
>>> I have asked Phil Carns to help you since he is more familiar with the benchmark and MPI-IO. I think Rob Latham or Rob Ross may be helping too. I will continue to look at your data in the meantime.
>>>
>>> Becky
>>>
>>> Phil/Rob:
>>>
>>> Thanks so much for helping Matthieu. I am digging into the matter, but MPI is still new to me and I'm not familiar with the PVFS interface that accompanies ROMIO.
>>>
>>> Becky
>>>
>>> PS. Can we keep this on the pvfs2-users list so I can see how things progress?
>>>
>>> On Tue, Apr 2, 2013 at 10:47 AM, Matthieu Dorier <[email protected]> wrote:
>>>
>>>> Hi Rob and Phil,
>>>>
>>>> This thread moved to the ofs-support mailing list (probably because the first person to answer was on that team), but I didn't get much of an answer to my problem, so I'll summarize here what I have done.
>>>>
>>>> First, to answer Phil: the PVFS config file is attached, and here is the script file used for IOR:
>>>>
>>>> IOR START
>>>> testFile = pvfs2:/mnt/pvfs2/testfileA
>>>> filePerProc=0
>>>> api=MPIIO
>>>> repetitions=100
>>>> verbose=2
>>>> blockSize=4m
>>>> transferSize=4m
>>>> collective=1
>>>> writeFile=1
>>>> interTestDelay=60
>>>> readFile=0
>>>> RUN
>>>> IOR STOP
>>>>
>>>> Besides the tests described in my first mail, I also ran the same experiments on another cluster, first with TCP over IB and then over Ethernet, with 336 and 672 clients, and with 2, 4 and 8 storage servers. In every case this behavior appears.
>>>>
>>>> I benchmarked the local disk attached to the storage servers and got 42 MB/s, so the high throughput of over 2 GB/s that I see obviously benefits from some caching mechanism, and the periodic behavior observed at high output frequency could be explained by that. Yet this does not explain why, overall, the performance decreases over time.
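
As a rough sanity check on the caching point above: even in the largest configuration mentioned (8 storage servers), 8 x 42 MB/s is only about 336 MB/s of raw disk bandwidth, and the original 2-server setup tops out around 84 MB/s. Aggregate write rates above 2 GB/s must therefore be absorbed almost entirely by server-side memory (e.g. the OS page cache) rather than by the disks themselves, consistent with the caching explanation given here.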
>>>>
>>>> I attach a set of graphs summarizing the experiments (the x axis is the iteration number and the y axis is the aggregate throughput obtained for that iteration; 100 consecutive iterations are performed). The performance seems to follow the law D = a*T + b, where D is the duration of the write, T is the wall-clock time since the beginning of the experiment, and "a" and "b" are constants.
>>>>
>>>> When I stop IOR and immediately restart it, I get the good performance back; it does not continue at the reduced level where the previous instance finished.
>>>>
>>>> I also thought it could come from the fact that the same file is rewritten at every iteration, and I tried the multiFile=1 option to create a new file at every iteration instead, but this didn't help.
>>>>
>>>> Last thing I can mention: I'm using MPICH 3.0.2, compiled with PVFS support.
>>>>
>>>> Matthieu
>>>>
>>>> ----- Original Message -----
>>>> > From: "Rob Latham" <[email protected]>
>>>> > To: "Matthieu Dorier" <[email protected]>
>>>> > Cc: "pvfs2-users" <[email protected]>
>>>> > Sent: Tuesday, April 2, 2013 15:57:54
>>>> > Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>>>> >
>>>> > On Sat, Mar 23, 2013 at 03:31:22PM +0100, Matthieu Dorier wrote:
>>>> > > I've installed PVFS (OrangeFS 2.8.7) on a small cluster (2 PVFS nodes, 28 compute nodes of 24 cores each, everything connected through InfiniBand but using an IP stack on top of it, so the protocol for PVFS is TCP), and I see some strange performance behavior with IOR (using ROMIO compiled against PVFS, no kernel support):
>>>> > >
>>>> > > IOR is started on 336 processes (14 nodes), writing 4 MB/process into a single shared file using MPI-I/O (4 MB transfer size as well). It completes 100 iterations.
>>>> >
>>>> > OK, so you have one PVFS client per core, and all of these are talking to two servers.
>>>> >
>>>> > > First, every time I start an instance of IOR, the first I/O operation is extremely slow. I'm guessing this is because ROMIO has to initialize everything, get the list of PVFS servers, etc. Is there a way to speed this up?
>>>> >
>>>> > ROMIO isn't doing a whole lot here, but there is one thing different about ROMIO's first call vs. the Nth call: on the first call (the first time any pvfs2 file is opened or deleted), ROMIO calls the function PVFS_util_init_defaults().
>>>> >
>>>> > If you have 336 clients banging away on just two servers, I bet that could explain some slowness. In the old days, the PVFS server had to service these requests one at a time.
>>>> >
>>>> > I don't think this restriction has been relaxed. Since it is a read-only operation, though, it sure seems like one could just have the servers shovel out pvfs2 configuration information as fast as possible.
>>>> >
>>>> > > Then, I set some delay between each iteration, to better reflect the behavior of an actual scientific application.
>>>> >
>>>> > Fun! This is kind of like what MADNESS does: it "computes" by sleeping for a bit. I think Phil's questions will help us understand the highly variable performance.
>>>> >
>>>> > Can you experiment with IOR's collective I/O? By default, collective I/O will select one client per node as an "I/O aggregator". The IOR workload will not benefit from ROMIO's two-phase optimization, but you've got 336 clients banging away on two servers. When I last studied PVFS scalability, 100x more clients than servers wasn't a big deal, but 5-6 years ago nodes did not have 24-way parallelism.
>>>> >
>>>> > ==rob
>>>> >
>>>> > --
>>>> > Rob Latham
>>>> > Mathematics and Computer Science Division
>>>> > Argonne National Lab, IL USA
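
As a concrete illustration of Rob's suggestion, ROMIO's collective buffering can be steered through MPI-IO hints such as romio_cb_write and cb_nodes. The sketch below is only illustrative, not part of the original thread: the file path is the one from the IOR script above, and the value "14" simply mirrors one aggregator per node for the 14-node run; actual defaults depend on the MPICH/ROMIO build.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Info_create(&info);

    /* Force collective buffering on writes and cap the number of I/O
       aggregators; "14" is just an example value (one per node). */
    MPI_Info_set(info, "romio_cb_write", "enable");
    MPI_Info_set(info, "cb_nodes", "14");

    /* Same pvfs2: path as in the IOR script above. */
    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/testfileA",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... collective writes, e.g. MPI_File_write_at_all() ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

IOR also has its own mechanism for passing MPI-IO hints to ROMIO, which may be more convenient than building a separate test program.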

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
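
Relatedly, the one-time cost that Rob attributes to PVFS_util_init_defaults() behind ROMIO's first PVFS operation can be separated from the steady-state open cost with a small timing sketch like the one below. It is only a sketch, reusing the placeholder path from the IOR script and assuming an MPICH build with ROMIO's PVFS driver enabled.

#include <mpi.h>
#include <stdio.h>

/* Time a single collective MPI_File_open()/close() pair on the given path. */
static double time_open(const char *path)
{
    MPI_File fh;
    double t0 = MPI_Wtime();
    MPI_File_open(MPI_COMM_WORLD, path,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    double elapsed = MPI_Wtime() - t0;
    MPI_File_close(&fh);
    return elapsed;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The first open triggers ROMIO's one-time PVFS client setup;
       the second should show the steady-state cost. */
    double first  = time_open("pvfs2:/mnt/pvfs2/testfileA");
    double second = time_open("pvfs2:/mnt/pvfs2/testfileA");

    if (rank == 0)
        printf("first open: %.3f s, second open: %.3f s\n", first, second);

    MPI_Finalize();
    return 0;
}

If the first open is consistently far slower than the second, the startup penalty is dominated by client initialization rather than by the write path itself.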
_______________________________________________ Pvfs2-users mailing list [email protected] http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
