Hi all, 

Here are the answers to your earlier questions (experiments were done with PVFS 
on 4 nodes and IOR on 384 cores): 

- When IOR uses a file-per-process approach, the performance becomes very 
unstable, ranging from 5MB/s to 400MB/s depending on the iteration, so it is 
impossible to tell whether there is an overall performance decrease or not. 

- Setting TroveSyncData to yes makes every iteration reach a constant 50MB/s 
aggregate throughput, with no performance decrease over time. 

- CPU utilization is not 100% (30% on average). 

So the problem seems to come from caching. The questions are: where is this 
cache implemented, how can its size be controlled, and when is it synced to 
disk? 
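
In case it helps frame the discussion, this is what I plan to look at on the 
storage servers. The config file path is an assumption (use whatever 
pvfs2-genconfig produced for your installation), and the sysctl values are 
only illustrative, not a tuning recommendation: 

    # how PVFS itself is told to sync its data and metadata (server side)
    grep -i 'TroveSync' /etc/pvfs2-fs.conf

    # how much dirty data the Linux page cache on a server is holding,
    # and the knobs that decide when the kernel writes it back
    grep -i dirty /proc/meminfo
    sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs

    # example only: make the kernel flush dirty pages earlier and more often
    sysctl -w vm.dirty_background_ratio=1
    sysctl -w vm.dirty_expire_centisecs=500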

Matthieu 

----- Original Message -----

> From: "Becky Ligon" <[email protected]>
> To: "Matthieu Dorier" <[email protected]>
> Cc: "Rob Latham" <[email protected]>, "pvfs2-users"
> <[email protected]>, "ofs-support"
> <[email protected]>
> Sent: Tuesday, April 2, 2013 19:19:07
> Subject: Re: [Pvfs2-users] Strange performance behavior with IOR

> Another FYI: On our cluster here at Clemson University, we have turned
> off hyperthreading on any machine with Intel processors. We found that
> MPI applications perform badly on a true multi-core system when
> hyperthreading is enabled.

> Do any of your compute nodes have hyperthreading enabled?

> Becky
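
(For reference, hyperthreading can be checked on a Linux compute node with 
standard commands such as the ones below; nothing here is specific to this 
cluster:)

    lscpu | grep -i 'thread(s) per core'
    # or: "siblings" greater than "cpu cores" in /proc/cpuinfo also
    # indicates that hyperthreading is enabled
    grep -E 'siblings|cpu cores' /proc/cpuinfo | sort -u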

> On Tue, Apr 2, 2013 at 12:44 PM, Becky Ligon <[email protected]> wrote:

> > Just FYI: What we have seen with the high CPU utilization is that once
> > you have more processes running than cores per machine, the performance
> > slows down. And, we have seen this problem with the client core as well
> > as the pvfs library (which ROMIO accesses). We have not been able to
> > recreate the problem systematically and thus have not been able to
> > resolve the issue.

> > On Tue, Apr 2, 2013 at 12:15 PM, Matthieu Dorier
> > <[email protected]> wrote:

> > > To answer Phil's question: just restarting IOR is enough, yes. Not
> > > PVFS.

> > > For the rest, I'll do some experiments when I have the chance and get
> > > back to you.

> > > Thanks all

> > > Matthieu

> > > > From: "Becky Ligon" <[email protected]>
> > > > To: "Matthieu Dorier" <[email protected]>
> > > > Cc: "Rob Latham" <[email protected]>, "pvfs2-users"
> > > > <[email protected]>, "ofs-support" <[email protected]>
> > > > Sent: Tuesday, April 2, 2013 17:22:17
> > > > Subject: Re: [Pvfs2-users] Strange performance behavior with IOR

> > > > Matthieu:

> > > > Are you seeing any 100% CPU utilization on the client? We have seen
> > > > this with the client core (which you are not using) on a multicore
> > > > system; however, both the client core and the PVFS interface use the
> > > > same request structures, etc.

> > > > Becky

> > > > On Tue, Apr 2, 2013 at 11:11 AM, Becky Ligon
> > > > <[email protected]> wrote:

> > > > > Matthieu:

> > > > > I have asked Phil Carns to help you since he is more familiar with
> > > > > the benchmark and MPI-IO. I think Rob Latham or Rob Ross may be
> > > > > helping too. I will continue to look at your data in the meantime.

> > > > > Becky

> > > > > Phil/Rob:

> > > > > Thanks so much for helping Matthieu. I am digging into the matter,
> > > > > but MPI is still new to me and I'm not familiar with the PVFS
> > > > > interface that accompanies ROMIO.

> > > > > Becky

> > > > > PS: Can we keep this on the pvfs2-users list so I can see how
> > > > > things progress?

> > > > > On Tue, Apr 2, 2013 at 10:47 AM, Matthieu Dorier
> > > > > <[email protected]> wrote:

> > > > > > Hi Rob and Phil,

> > > > > > This thread moved to the ofs-support mailing list (probably
> > > > > > because the first person to answer was part of that team), but I
> > > > > > didn't get much of an answer to my problem, so I'll try to
> > > > > > summarize here what I have done.

> > > > > > First, to answer Phil: the PVFS config file is attached, and here
> > > > > > is the script file used for IOR:

> > > > > > IOR START
> > > > > > testFile = pvfs2:/mnt/pvfs2/testfileA
> > > > > > filePerProc=0
> > > > > > api=MPIIO
> > > > > > repetitions=100
> > > > > > verbose=2
> > > > > > blockSize=4m
> > > > > > transferSize=4m
> > > > > > collective=1
> > > > > > writeFile=1
> > > > > > interTestDelay=60
> > > > > > readFile=0
> > > > > > RUN
> > > > > > IOR STOP
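
(For reference, a script like the one above is typically passed to IOR with
its -f flag; the hostfile name, process count and paths below are
placeholders, not the exact command used in this thread:)

    # mpich's mpiexec; IOR's -f flag points at the script file above
    mpiexec -f hosts.txt -n 336 /path/to/IOR -f ior_script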

> > > > > > Besides the tests I described in my first mail, I also ran the
> > > > > > same experiments on another cluster, first with TCP over IB and
> > > > > > then over Ethernet, with 336 clients and 672 clients, and with 2,
> > > > > > 4 and 8 storage servers. In every case, this behavior appears.

> > > > > > I benchmarked the local disk attached to the storage servers and
> > > > > > got 42MB/s, so the high throughput of over 2GB/s I get obviously
> > > > > > benefits from some caching mechanism, and the periodic behavior
> > > > > > observed at high output frequency could be explained by that. Yet
> > > > > > this does not explain why, overall, the performance decreases
> > > > > > over time.
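
(For reference, a direct-I/O write such as the following is one way to measure
raw disk speed without the page cache; it is not necessarily how the 42MB/s
figure above was obtained, and the target path is only a placeholder for a
server's storage directory:)

    dd if=/dev/zero of=/path/to/pvfs2-storage/ddtest bs=4M count=256 oflag=direct conv=fsync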

> > > > > > I attach a set of graphs summarizing the experiments (the x axis
> > > > > > is the iteration number and the y axis is the aggregate
> > > > > > throughput obtained for that iteration; 100 consecutive
> > > > > > iterations are performed).

> > > > > > It seems that the performance follows the law D = a*T + b, where
> > > > > > D is the duration of the write, T is the wallclock time since the
> > > > > > beginning of the experiment, and "a" and "b" are constants.

> > > > > > When I stop IOR and immediately restart it, I get the good
> > > > > > performance back; it does not continue at the reduced performance
> > > > > > the previous instance finished with.

> > > > > > I also thought it could come from the fact that the same file is
> > > > > > re-written at every iteration, and tried with the multiFile=1
> > > > > > option to have one new file at every iteration instead, but this
> > > > > > didn't help.

> > > > > > Last thing I can mention: I'm using mpich 3.0.2, compiled with
> > > > > > PVFS support.

> > > > > > Matthieu

> > > > > > ----- Original Message -----

> > > > > > > From: "Rob Latham" <[email protected]>
> > > > > > > To: "Matthieu Dorier" <[email protected]>
> > > > > > > Cc: "pvfs2-users" <[email protected]>
> > > > > > > Sent: Tuesday, April 2, 2013 15:57:54
> > > > > > > Subject: Re: [Pvfs2-users] Strange performance behavior with IOR

> > > > > > > On Sat, Mar 23, 2013 at 03:31:22PM +0100, Matthieu Dorier wrote:
> > > > > > > > I've installed PVFS (OrangeFS 2.8.7) on a small cluster (2
> > > > > > > > PVFS nodes, 28 compute nodes of 24 cores each, everything
> > > > > > > > connected through InfiniBand but using an IP stack on top of
> > > > > > > > it, so the protocol for PVFS is TCP), and I witness some
> > > > > > > > strange performance behaviors with IOR (using ROMIO compiled
> > > > > > > > against PVFS, no kernel support):
> > > > > > > > IOR is started on 336 processes (14 nodes), writing
> > > > > > > > 4MB/process in a single shared file using MPI-I/O (4MB
> > > > > > > > transfer size also). It completes 100 iterations.
> > > > > > > OK, so you have one pvfs client per core. All these are
> > > > > > > talking to two servers.
> > > > > > > > First, every time I start an instance of IOR, the first I/O
> > > > > > > > operation is extremely slow. I'm guessing this is because
> > > > > > > > ROMIO has to initialize everything, get the list of PVFS
> > > > > > > > servers, etc. Is there a way to speed this up?
> > > > > > > ROMIO isn't doing a whole lot here, but there is one thing
> > > > > > > different about ROMIO's 1st call vs the Nth call. On the 1st
> > > > > > > call (the first time any pvfs2 file is opened or deleted),
> > > > > > > ROMIO will call the function PVFS_util_init_defaults().

> > > > > > > If you have 336 clients banging away on just two servers, I
> > > > > > > bet that could explain some slowness. In the old days, the
> > > > > > > PVFS server had to service these requests one at a time.

> > > > > > > I don't think this restriction has been relaxed? Since it is a
> > > > > > > read-only operation, though, it sure seems like one could just
> > > > > > > have servers shovel out pvfs2 configuration information as
> > > > > > > fast as possible.
> > > > > > > > Then, I set some delay between each iteration, to better
> > > > > > > > reflect the behavior of an actual scientific application.
> > > > > > > Fun! This is kind of like what MADNESS does: it "computes" by
> > > > > > > sleeping for a bit. I think Phil's questions will help us
> > > > > > > understand the highly variable performance.
> > > > > > > Can you experiment with IOR's collective I/O? By default,
> > > > > > > collective I/O will select one client per node as an "I/O
> > > > > > > aggregator". The IOR workload will not benefit from ROMIO's
> > > > > > > two-phase optimization, but you've got 336 clients banging
> > > > > > > away on two servers. When I last studied pvfs scalability,
> > > > > > > 100x more clients than servers wasn't a big deal, but 5-6
> > > > > > > years ago nodes did not have 24-way parallelism.
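
(A practical note for trying Rob's suggestion: ROMIO reads hints from a file
named by the ROMIO_HINTS environment variable, so the number of aggregators
can be changed without touching the IOR script. A minimal sketch, where the
value 14 -- one aggregator per node in this setup -- is only an example:)

    cat > romio_hints <<EOF
    romio_cb_write enable
    cb_nodes 14
    EOF
    export ROMIO_HINTS=$PWD/romio_hints
    # ROMIO picks these hints up when the file is opened
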
> > > > > > > ==rob

> > > > > > > --
> > > > > > > Rob Latham
> > > > > > > Mathematics and Computer Science Division
> > > > > > > Argonne National Lab, IL USA


> --
> Becky Ligon
> OrangeFS Support and Development
> Omnibond Systems
> Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
