Another FYI:  On our cluster here at Clemson University, we have turned off
hyperthreading on any machine with Intel processors.  We found that MPI
applications perform poorly on a true multi-core system when hyperthreading
is enabled.

Do any of your compute nodes have hyperthreading enabled?
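
(In case it helps to check: below is a rough sketch that reads /proc/cpuinfo
on a node and compares logical CPUs against physical cores; if the first
number is larger, hyperthreading/SMT is enabled. It assumes a standard Linux
/proc layout.)

#!/usr/bin/env python
# Rough hyperthreading/SMT check for a Linux node (reads /proc/cpuinfo).
# If the logical CPU count exceeds the physical core count, SMT is on.

def count_cpus(cpuinfo_path="/proc/cpuinfo"):
    logical = 0
    physical_cores = set()
    phys_id = None
    with open(cpuinfo_path) as f:
        for line in f:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "processor":
                logical += 1
            elif key == "physical id":
                phys_id = value
            elif key == "core id":
                physical_cores.add((phys_id, value))
    return logical, len(physical_cores)

if __name__ == "__main__":
    logical, physical = count_cpus()
    print("logical CPUs: %d, physical cores: %d, SMT enabled: %s"
          % (logical, physical, logical > physical))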

Becky


On Tue, Apr 2, 2013 at 12:44 PM, Becky Ligon <[email protected]> wrote:

> Just FYI:  What we have seen with the high CPU utilization is that once
> you have more processes running than cores per machine, performance
> slows down.  We have seen this problem with the client core as well as
> the PVFS library (which ROMIO accesses).  We have not been able to
> recreate the problem systematically and thus have not been able to
> resolve the issue.
>
>
> On Tue, Apr 2, 2013 at 12:15 PM, Matthieu Dorier <[email protected]
> > wrote:
>
>> To answer Phil's question: just restarting IOR is enough, yes. Not PVFS.
>> For the rest, I'll do some experiments when I have the chance and get
>> back to you.
>>
>> Thanks all
>>
>> Matthieu
>>
>> ------------------------------
>>
>> From: "Becky Ligon" <[email protected]>
>> To: "Matthieu Dorier" <[email protected]>
>> Cc: "Rob Latham" <[email protected]>, "pvfs2-users" <[email protected]>,
>> "ofs-support" <[email protected]>
>> Sent: Tuesday, April 2, 2013 17:22:17
>>
>> Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>>
>> Matthieu:
>>
>> Are you seeing any 100% CPU utilization on the client?  We have seen
>> this with the client core (which you are not using) on a multicore system;
>> however, both the client core and the PVFS interface do use the same
>> request structures, etc.
>>
>> Becky
>>
>> On Tue, Apr 2, 2013 at 11:11 AM, Becky Ligon <[email protected]> wrote:
>>
>>> Matthieu:
>>>
>>> I have asked Phil Carns to help you since he is more familiar with the
>>> benchmark and MPI-IO.  I think Rob Latham or Rob Ross may be helping too.
>>> I will continue to look at your data in the meantime.
>>>
>>> Becky
>>>
>>> Phil/Rob:
>>>
>>> Thanks so much for helping Matthieu.  I am digging into the matter but
>>> MPI is still new to me and I'm not familiar with the PVFS interface that
>>> accompanies ROMIO.
>>>
>>> Becky
>>>
>>> PS.  Can we keep this on the pvfs2-users list so I can see how things
>>> progress?
>>>
>>>
>>> On Tue, Apr 2, 2013 at 10:47 AM, Matthieu Dorier <
>>> [email protected]> wrote:
>>>
>>>> Hi Rob and Phil,
>>>>
>>>> This thread moved to the ofs-support mailing list (probably because the
>>>> first person to answer was part of that team), but I didn't get much of an
>>>> answer to my problem, so I'll try to summarize here what I have done.
>>>>
>>>> First, to answer Phil: the PVFS config file is attached, and here is the
>>>> script file used for IOR:
>>>>
>>>> IOR START
>>>>   testFile = pvfs2:/mnt/pvfs2/testfileA
>>>>   filePerProc=0
>>>>   api=MPIIO
>>>>   repetitions=100
>>>>   verbose=2
>>>>   blockSize=4m
>>>>   transferSize=4m
>>>>   collective=1
>>>>   writeFile=1
>>>>   interTestDelay=60
>>>>   readFile=0
>>>>   RUN
>>>> IOR STOP
>>>>
>>>> Besides the tests I described in my first mail, I also ran the same
>>>> experiments on another cluster, also with TCP over IB, and then on
>>>> Ethernet, with 336 clients and 672 clients, and with 2, 4 and 8 storage
>>>> servers. In every case, this behavior appears.
>>>>
>>>> I benchmarked the local disk attached to the storage servers and got
>>>> 42MB/s, so the throughput of over 2GB/s that I get obviously benefits from
>>>> some caching mechanism, and the periodic behavior observed at high output
>>>> frequency could be explained by that. Yet this does not explain why,
>>>> overall, the performance decreases over time.
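>>>>
>>>> (A quick back-of-the-envelope check of those numbers, assuming the 42MB/s
>>>> figure applies to each of the two storage servers; the values below are
>>>> just the ones quoted above:)
>>>>
>>>> # Rough numbers for the 336-client / 2-server runs.
>>>> clients       = 336
>>>> transfer_mb   = 4            # blockSize = transferSize = 4m
>>>> servers       = 2
>>>> disk_mb_s     = 42           # measured local disk throughput per server
>>>> observed_mb_s = 2048         # roughly the 2GB/s aggregate reported above
>>>>
>>>> per_iter_mb   = clients * transfer_mb          # data written per iteration
>>>> per_server_mb = per_iter_mb / float(servers)   # share landing on each server
>>>> raw_agg_mb_s  = servers * disk_mb_s            # ceiling if writes hit disk
>>>>
>>>> print("per iteration: %d MB total, %d MB per server" % (per_iter_mb, per_server_mb))
>>>> print("disk-bound time per iteration: %.1f s" % (per_iter_mb / float(raw_agg_mb_s)))
>>>> print("time at the observed rate:     %.1f s" % (per_iter_mb / float(observed_mb_s)))
>>>>
>>>> Each iteration writes about 1344MB (roughly 672MB per server), which fits
>>>> easily in server memory, while the raw disks could only sustain about
>>>> 84MB/s aggregate.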
>>>>
>>>> I attach a set of graphs summarizing the experiments (the x axis is the
>>>> iteration number and the y axis is the aggregate throughput obtained for
>>>> that iteration; 100 consecutive iterations are performed).
>>>> It seems that the write duration follows the law D = a*T + b, where D is
>>>> the duration of the write, T is the wallclock time since the beginning of
>>>> the experiment, and "a" and "b" are constants.
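>>>>
>>>> (If it is useful, a small fitting sketch along these lines gives "a" and
>>>> "b" from the per-iteration timings; iteration_times.txt is just a
>>>> placeholder for a two-column file of T and D in seconds:)
>>>>
>>>> # Least-squares fit of D = a*T + b from per-iteration timings.
>>>> import numpy as np
>>>>
>>>> data = np.loadtxt("iteration_times.txt")   # columns: T (start), D (duration)
>>>> T, D = data[:, 0], data[:, 1]
>>>>
>>>> a, b = np.polyfit(T, D, 1)                 # slope and intercept
>>>> residuals = D - (a * T + b)
>>>>
>>>> print("a = %.6f, b = %.3f s" % (a, b))
>>>> print("rms residual = %.3f s" % np.sqrt(np.mean(residuals ** 2)))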
>>>>
>>>> When I stop IOR and immediately restart it, I get the good performance
>>>> back; it does not continue at the reduced performance level at which the
>>>> previous instance finished.
>>>>
>>>> I also thought it could come from the fact that the same file is
>>>> re-written at every iteration, so I tried the multiFile=1 option to write
>>>> one new file at every iteration instead, but this didn't help.
>>>>
>>>> Last thing I can mention: I'm using MPICH 3.0.2, compiled with PVFS
>>>> support.
>>>>
>>>> Matthieu
>>>>
>>>> ----- Original Message -----
>>>> > From: "Rob Latham" <[email protected]>
>>>> > To: "Matthieu Dorier" <[email protected]>
>>>> > Cc: "pvfs2-users" <[email protected]>
>>>> > Sent: Tuesday, April 2, 2013 15:57:54
>>>> > Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>>>> >
>>>> > On Sat, Mar 23, 2013 at 03:31:22PM +0100, Matthieu Dorier wrote:
>>>> > > I've installed PVFS (OrangeFS 2.8.7) on a small cluster (2 PVFS
>>>> > > nodes, 28 compute nodes of 24 cores each, everything connected
>>>> > > through InfiniBand but using an IP stack on top of it, so the
>>>> > > protocol for PVFS is TCP), and I am seeing some strange performance
>>>> > > behavior with IOR (using ROMIO compiled against PVFS, no kernel
>>>> > > support):
>>>> >
>>>> > > IOR is started on 336 processes (14 nodes), writing 4MB/process in a
>>>> > > single shared file using MPI-I/O (4MB transfer size also). It
>>>> > > completes 100 iterations.
>>>> >
>>>> > OK, so you have one pvfs client per core.  All these are talking to
>>>> > two servers.
>>>> >
>>>> > > First, every time I start an instance of IOR, the first I/O operation
>>>> > > is extremely slow. I'm guessing this is because ROMIO has to
>>>> > > initialize everything, get the list of PVFS servers, etc. Is there a
>>>> > > way to speed this up?
>>>> >
>>>> > ROMIO isn't doing a whole lot here, but there is one thing different
>>>> > about ROMIO's 1st call vs the Nth call.  On the 1st call (the first time
>>>> > any pvfs2 file is opened or deleted), ROMIO will call the function
>>>> > PVFS_util_init_defaults().
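>>>> >
>>>> > (One way to see that one-time cost in isolation is to time the first and
>>>> > second opens separately; a minimal mpi4py sketch, where the file name is
>>>> > just a throwaway under the same pvfs2: mount:)
>>>> >
>>>> > from mpi4py import MPI
>>>> >
>>>> > comm = MPI.COMM_WORLD
>>>> > amode = MPI.MODE_CREATE | MPI.MODE_WRONLY
>>>> >
>>>> > # The first open is where ROMIO's one-time PVFS initialization happens;
>>>> > # the second should be much cheaper.
>>>> > for attempt in range(2):
>>>> >     comm.Barrier()
>>>> >     t0 = MPI.Wtime()
>>>> >     fh = MPI.File.Open(comm, "pvfs2:/mnt/pvfs2/opencost.tmp", amode)
>>>> >     fh.Close()
>>>> >     comm.Barrier()
>>>> >     if comm.rank == 0:
>>>> >         print("open #%d took %.3f s" % (attempt + 1, MPI.Wtime() - t0))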
>>>> >
>>>> > If you have 336 clients banging away on just two servers, I bet that
>>>> > could explain some slowness.  In the old days, the PVFS server had to
>>>> > service these requests one at a time.
>>>> >
>>>> > I don't think this restriction has been relaxed?  Since it is a
>>>> > read-only operation, though, it sure seems like one could just have
>>>> > servers shovel out pvfs2 configuration information as fast as
>>>> > possible.
>>>> >
>>>> >
>>>> > > Then, I set some delay between each iteration, to better reflect the
>>>> > > behavior of an actual scientific application.
>>>> >
>>>> > Fun! This is kind of like what MADNESS does: it "computes" by sleeping
>>>> > for a bit.  I think Phil's questions will help us understand the
>>>> > highly variable performance.
>>>> >
>>>> > Can you experiment with IOR's collective I/O?  By default, collective
>>>> > I/O will select one client per node as an "I/O aggregator".  The IOR
>>>> > workload will not benefit from ROMIO's two-phase optimization, but
>>>> > you've got 336 clients banging away on two servers.  When I last
>>>> > studied pvfs scalability, 100x more clients than servers wasn't a big
>>>> > deal, but 5-6 years ago nodes did not have 24-way parallelism.
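>>>> >
>>>> > (If it helps, the number of aggregators can also be pinned explicitly
>>>> > through ROMIO hints; here is a minimal mpi4py sketch, assuming the same
>>>> > 14-node, 4MB-per-process setup as the IOR runs:)
>>>> >
>>>> > from mpi4py import MPI
>>>> >
>>>> > comm = MPI.COMM_WORLD
>>>> >
>>>> > # ROMIO hints: force collective buffering, one aggregator per node.
>>>> > info = MPI.Info.Create()
>>>> > info.Set("romio_cb_write", "enable")
>>>> > info.Set("cb_nodes", "14")
>>>> >
>>>> > amode = MPI.MODE_CREATE | MPI.MODE_WRONLY
>>>> > fh = MPI.File.Open(comm, "pvfs2:/mnt/pvfs2/testfileA", amode, info)
>>>> >
>>>> > buf = bytearray(4 * 1024 * 1024)            # 4MB per process, as in IOR
>>>> > fh.Write_at_all(comm.rank * len(buf), buf)  # collective write
>>>> >
>>>> > fh.Close()
>>>> > info.Free()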
>>>> >
>>>> > ==rob
>>>> >
>>>> > --
>>>> > Rob Latham
>>>> > Mathematics and Computer Science Division
>>>> > Argonne National Lab, IL USA
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Becky Ligon
>>> OrangeFS Support and Development
>>> Omnibond Systems
>>> Anderson, South Carolina
>>>
>>>
>>
>>
>> --
>> Becky Ligon
>> OrangeFS Support and Development
>> Omnibond Systems
>> Anderson, South Carolina
>>
>>
>>
>
>
> --
> Becky Ligon
> OrangeFS Support and Development
> Omnibond Systems
> Anderson, South Carolina
>
>


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
