Hi all:

I've been trying to track down some performance problems with my pvfs2
system on my HPC cluster.  Here's my system arch:

I have 3 dedicated I/O nodes, each an identical Dell PowerEdge 1950
with a PERC 6/E card attached to a 15-disk MD1000 that provides about
9.8TB of storage after RAID-6.
Each I/O node has both of its Gig-E interfaces connected to the
cluster's switch and bonded together (bond0).  The systems are running
CentOS 5 and the RAID is formatted with XFS.
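
For concreteness, a two-port bond on CentOS 5 is typically set up along
these lines (the bonding mode, IPs, and interface names below are just
placeholders for illustration, not a dump from my boxes):

   # /etc/modprobe.conf
   alias bond0 bonding
   options bond0 mode=balance-alb miimon=100

   # /etc/sysconfig/network-scripts/ifcfg-bond0
   DEVICE=bond0
   BOOTPROTO=none
   ONBOOT=yes
   IPADDR=10.1.1.10
   NETMASK=255.255.255.0

   # /etc/sysconfig/network-scripts/ifcfg-eth0
   # (ifcfg-eth1 is the same with DEVICE=eth1)
   DEVICE=eth0
   BOOTPROTO=none
   ONBOOT=yes
   MASTER=bond0
   SLAVE=yes
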
bonnie++ run locally on one of the RAID arrays reports:

Version  1.94       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pvfs2-io-0-0.loc 8G   596  99 177257  35 93649  21   851  97 353346  35 353.2   9
Latency             42912us     733ms    1016ms   65897us     436ms     118ms
Version  1.94       ------Sequential Create------ --------Random Create--------
pvfs2-io-0-0.local  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1752  32 +++++ +++  4639  26  1932  37 +++++ +++  4378  23
Latency               135ms     104us     688ms   80075us     136us     214ms
1.93c,1.94,pvfs2-io-0-0.local,1,1241623639,8G,,596,99,177257,35,93649,21,851,97,353346,35,353.2,9,16,,,,,1752,32,+++++,+++,4639,26,1932,37,+++++,+++,4378,23,42912us,733ms,1016ms,65897us,436ms,118ms,135ms,104us,688ms,80075us,136us,214ms

I then have 24 compute nodes running ROCKS 5.1 (all PVFS2 support
added by me, NOT using the ROCKS pvfs2 roll), each connected via a
single Gig-E port to the same switch.
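
For reference, the clients mount the volume through the kernel module
in the standard PVFS2 way; the server name, port, fs name, and mount
point below are illustrative defaults rather than my exact configuration:

   /sbin/modprobe pvfs2
   /usr/sbin/pvfs2-client -p /usr/sbin/pvfs2-client-core
   mount -t pvfs2 tcp://pvfs2-io-0-0:3334/pvfs2-fs /mnt/pvfs2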

I have one head node, a Dell 2950 with a RAID-5 for user home
directories but otherwise identical to the compute nodes.  It has one
Gig-E interface on the "outside world" and one on the cluster switch.
There is also a Gig-E fiber link to a second set of compute nodes
located off site, but we'll ignore those for now (I haven't done any
performance testing on them yet).

I started with what I thought would be the base case: testing I/O
performance on the head node.  This is a valid use case, as users are
moving data sets into or out of pvfs2 through the head node; they may
also be running some single-threaded analysis on their data or some
post-processing.  I'd say that currently, about half (maybe a bit
more) of all I/O to pvfs2 happens in this way.

Using pvfs2-cp -t, I get about 60 MB/s with large-file I/O (moving a
10GB file to or from the PVFS volume).  While not stellar, it works.
However, any I/O that goes through the filesystem interface
deteriorates rapidly.  Performing the same copy, but this time with
time cp ... (i.e., using the native / kernel filesystem hooks), I get
only 2.97 MB/s.
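
Concretely, the two measurements were along these lines (the file
names and paths are placeholders):

   # direct, library-level copy with timing; this gives ~60 MB/s
   pvfs2-cp -t /scratch/bigfile-10G /mnt/pvfs2/bigfile-10G

   # same copy through the kernel module / VFS mount; this gives ~2.97 MB/s
   time cp /scratch/bigfile-10G /mnt/pvfs2/bigfile-10G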

After 2 hours, I have not yet been able to complete a single run of
bonnie++ using the filesystem interface.
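
(That attempt was just bonnie++ pointed at the mounted volume with the
same parameters as the local run above, something like:

   bonnie++ -d /mnt/pvfs2/bonnie-test -s 8G -n 16

with /mnt/pvfs2 standing in for wherever the volume is mounted.)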

There's got to be something wrong.  How do I go about fixing it?

(BTW: I have seen a number of problems with the filesystem / kernel
module.  For example, this morning I found about 5GB of RAM "missing",
and it appears that it got "lost" in the kernel.  While I can't pin it
on pvfs2, this doesn't happen if I don't have the pvfs2 module loaded.
I haven't been able to reproduce it easily, but I think it has to do
with all my nodes running updatedb at the same time.  I realize the
solution in this case is to tell updatedb not to scan pvfs, but why
does the pvfs kernel module lose 5GB of RAM when this happens?  It
should either work slowly or fail outright, but it should NOT crash
the system or lose large quantities of RAM permanently (I have to
reboot to reclaim it).)
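
For the updatedb side of it, I assume the workaround is to add pvfs2
to the PRUNEFS line in /etc/updatedb.conf on every node (assuming
mlocate's updatedb, as on stock CentOS 5), e.g.:

   # /etc/updatedb.conf -- append pvfs2 to whatever list is already there
   PRUNEFS = "pvfs2 nfs nfs4 iso9660"

but that only avoids triggering the problem; it doesn't explain where
the RAM goes.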

--Jim
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
