Hi all,

Here are the answers to your earlier questions (experiments were done with PVFS on 4 nodes, IOR on 384 cores):

- When IOR uses a file-per-process approach, performance becomes very unstable, ranging from 5MB/s to 400MB/s depending on the iteration, so there is no way to tell whether performance decreases overall or not.
- Setting TroveSyncData to yes gives a constant 50MB/s aggregate throughput on every iteration, with no performance decrease.
- CPU utilization is not 100% (around 30% on average).

So the problem seems to come from caching. The questions are: where is this cache implemented, how can its size be controlled, and when is it synced?
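For reference, this is the knob I was toggling. Below is a minimal sketch of the block in question; I am assuming, from the generated fs.conf, that TroveSyncData sits in a <StorageHints> section inside the file system section, and the TroveSyncMeta line is just what the generated config already had, not something I changed:

    # Storage hints for the file system (sketch, not my full config):
    <StorageHints>
        # TroveSyncData yes -> every write is flushed to disk before the
        #                      server completes it (the flat 50MB/s case above)
        # TroveSyncData no  -> writes stay in the server-side cache and are
        #                      flushed later (the unstable, much higher numbers)
        TroveSyncData yes
        TroveSyncMeta yes
    </StorageHints>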
Matthieu

----- Original Message -----
> From: "Becky Ligon" <[email protected]>
> To: "Matthieu Dorier" <[email protected]>
> Cc: "Rob Latham" <[email protected]>, "pvfs2-users" <[email protected]>, "ofs-support" <[email protected]>
> Sent: Tuesday, April 2, 2013 19:19:07
> Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>
> Another FYI: On our cluster here at Clemson University, we have turned off hyperthreading on any machine with Intel processors. We found that MPI applications perform badly on a true multi-core system when hyperthreading is enabled.
>
> Do any of your compute nodes have hyperthreading enabled?
>
> Becky
>
> On Tue, Apr 2, 2013 at 12:44 PM, Becky Ligon <[email protected]> wrote:
>
> > Just FYI: What we have seen with the high CPU utilization is that once you have more processes running than cores per machine, the performance slows down. And we have seen this problem with the client core as well as the PVFS library (which ROMIO accesses). We have not been able to recreate the problem systematically and thus have not been able to resolve the issue.
> >
> > On Tue, Apr 2, 2013 at 12:15 PM, Matthieu Dorier <[email protected]> wrote:
> >
> > > To answer Phil's question: just restarting IOR is enough, yes. Not PVFS.
> > >
> > > For the rest, I'll do some experiments when I have the chance and get back to you.
> > >
> > > Thanks all
> > >
> > > Matthieu
> > >
> > > > From: "Becky Ligon" <[email protected]>
> > > > To: "Matthieu Dorier" <[email protected]>
> > > > Cc: "Rob Latham" <[email protected]>, "pvfs2-users" <[email protected]>, "ofs-support" <[email protected]>
> > > > Sent: Tuesday, April 2, 2013 17:22:17
> > > > Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
> > > >
> > > > Matthieu:
> > > >
> > > > Are you seeing any 100% CPU utilizations on the client? We have seen this with the client core (which you are not using) on a multicore system; however, both the client core and the PVFS interface do use the same request structures, etc.
> > > >
> > > > Becky
> > > >
> > > > On Tue, Apr 2, 2013 at 11:11 AM, Becky Ligon <[email protected]> wrote:
> > > >
> > > > > Matthieu:
> > > > >
> > > > > I have asked Phil Carns to help you since he is more familiar with the benchmark and MPI-IO. I think Rob Latham or Rob Ross may be helping too. I continue to look at your data in the meantime.
> > > > >
> > > > > Becky
> > > > >
> > > > > Phil/Rob:
> > > > >
> > > > > Thanks so much for helping Matthieu. I am digging into the matter, but MPI is still new to me and I'm not familiar with the PVFS interface that accompanies ROMIO.
> > > > >
> > > > > Becky
> > > > >
> > > > > PS. Can we keep this on the pvfs2-users list so I can see how things progress?
> > > > >
> > > > > On Tue, Apr 2, 2013 at 10:47 AM, Matthieu Dorier <[email protected]> wrote:
> > > > >
> > > > > > Hi Rob and Phil,
> > > > > >
> > > > > > This thread moved to the ofs-support mailing list (probably because the first person to answer was part of this team), but I didn't get much of an answer to my problem, so I'll try to summarize here what I have done.
> > > > > >
> > > > > > First, to answer Phil, here is the PVFS config file attached, and here is the script file used for IOR:
> > > > > >
> > > > > > IOR START
> > > > > > testFile = pvfs2:/mnt/pvfs2/testfileA
> > > > > > filePerProc=0
> > > > > > api=MPIIO
> > > > > > repetitions=100
> > > > > > verbose=2
> > > > > > blockSize=4m
> > > > > > transferSize=4m
> > > > > > collective=1
> > > > > > writeFile=1
> > > > > > interTestDelay=60
> > > > > > readFile=0
> > > > > > RUN
> > > > > > IOR STOP
> > > > > >
> > > > > > Besides the tests I described in my first mail, I also ran the same experiments on another cluster, also with TCP over IB and then over Ethernet, with 336 and 672 clients, and with 2, 4 and 8 storage servers. In every case, this behavior appears.
> > > > > >
> > > > > > I benchmarked the local disk attached to the storage servers and got 42MB/s, so the high throughput of over 2GB/s I get obviously benefits from some caching mechanism, and the periodic behavior observed at high output frequency could be explained by that. Yet this does not explain why, overall, the performance decreases over time.
> > > > > >
> > > > > > I attach a set of graphs summarizing the experiments (the x axis is the iteration number, the y axis is the aggregate throughput obtained for that iteration; 100 consecutive iterations are performed).
> > > > > >
> > > > > > It seems that the performance follows the law D = a*T + b, where D is the duration of the write, T is the wallclock time since the beginning of the experiment, and "a" and "b" are constants.
> > > > > >
> > > > > > When I stop IOR and immediately restart it, I get the good performance back; it does not continue at the reduced performance the previous instance finished with.
> > > > > >
> > > > > > I also thought it could come from the fact that the same file is rewritten at every iteration, and tried the multiFile=1 option to have one new file per iteration instead, but this didn't help.
> > > > > >
> > > > > > Last thing I can mention: I'm using mpich 3.0.2, compiled with PVFS support.
> > > > > >
> > > > > > Matthieu
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "Rob Latham" <[email protected]>
> > > > > > > To: "Matthieu Dorier" <[email protected]>
> > > > > > > Cc: "pvfs2-users" <[email protected]>
> > > > > > > Sent: Tuesday, April 2, 2013 15:57:54
> > > > > > > Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
> > > > > > >
> > > > > > > On Sat, Mar 23, 2013 at 03:31:22PM +0100, Matthieu Dorier wrote:
> > > > > > > > I've installed PVFS (OrangeFS 2.8.7) on a small cluster (2 PVFS nodes, 28 compute nodes of 24 cores each, everything connected through InfiniBand but using an IP stack on top of it, so the protocol for PVFS is TCP), and I witness some strange performance behaviors with IOR (using ROMIO compiled against PVFS, no kernel support):
> > > > > > > >
> > > > > > > > IOR is started on 336 processes (14 nodes), writing 4MB/process in a single shared file using MPI-I/O (4MB transfer size also). It completes 100 iterations.
> > > > > > >
> > > > > > > OK, so you have one pvfs client per core. All these are talking to two servers.
> > > > > > >
> > > > > > > > First, every time I start an instance of IOR, the first I/O operation is extremely slow. I'm guessing this is because ROMIO has to initialize everything, get the list of PVFS servers, etc. Is there a way to speed this up?
> > > > > > >
> > > > > > > ROMIO isn't doing a whole lot here, but there is one thing different about ROMIO's 1st call vs the Nth call. On the 1st call (the first time any pvfs2 file is opened or deleted), ROMIO will call the function PVFS_util_init_defaults().
> > > > > > >
> > > > > > > If you have 336 clients banging away on just two servers, I bet that could explain some slowness. In the old days, the PVFS server had to service these requests one at a time.
> > > > > > >
> > > > > > > I don't think this restriction has been relaxed? Since it is a read-only operation, though, it sure seems like one could just have servers shovel out pvfs2 configuration information as fast as possible.
> > > > > > >
> > > > > > > > Then, I set some delay between each iteration, to better reflect the behavior of an actual scientific application.
> > > > > > >
> > > > > > > Fun! This is kind of like what MADNESS does: it "computes" by sleeping for a bit. I think Phil's questions will help us understand the highly variable performance.
> > > > > > >
> > > > > > > Can you experiment with IOR's collective I/O? By default, collective I/O will select one client per node as an "I/O aggregator". The IOR workload will not benefit from ROMIO's two-phase optimization, but you've got 336 clients banging away on two servers. When I last studied PVFS scalability, 100x more clients than servers wasn't a big deal, but 5-6 years ago nodes did not have 24-way parallelism.
> > > > > > >
> > > > > > > ==rob
> > > > > > >
> > > > > > > --
> > > > > > > Rob Latham
> > > > > > > Mathematics and Computer Science Division
> > > > > > > Argonne National Lab, IL USA
>
> --
> Becky Ligon
> OrangeFS Support and Development
> Omnibond Systems
> Anderson, South Carolina
