Thank you for your reply!

So, first, on my testing:

I've got a terminal open on each of 3 of my cluster's compute nodes.
I then execute the dd line originally mentioned on all of them as
close to simultaneously as possible (I'd say no more than a 2 sec
delay from first start to last).  I realize this isn't very precise,
but each dd writes about 11GB, so the non-overlapping portion of the
runs is small relative to the total, and for timing I let each dd
report its own numbers.
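
For anyone repeating this, a tighter way to launch all three at once
would be something like the following (the node names and mount path
here are placeholders for my actual setup):

  # start one dd per node in the background, then wait for all of them
  for n in compute-0-0 compute-0-1 compute-0-2; do
      ssh $n "dd if=/dev/zero of=/mnt/pvfs2/test-$n.out bs=4000K count=2800" &
  done
  wait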

I did not have full jumbo frames enabled on the switch (its max frame
size was set to 5,000-something; I upped it to the switch's max, a bit
over 9,000).  That change alone did not affect the results.

With respect to TCP kernel tuning, I'm running kernel 2.6.16.1.15.el5,
and I've verified that both sender and receiver have kernel autotuning
turned on.  According to the page linked from Sam's email on kernel
TCP tuning, all the values are sane, and my kernel is new enough that
the advice there is not to mess with the TCP parameters.
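
For the record, this is how I checked (these are the standard
autotuning sysctls, so the same check should work on other boxes):

  sysctl net.ipv4.tcp_moderate_rcvbuf   # 1 = receive autotuning enabled
  sysctl net.ipv4.tcp_rmem              # min/default/max receive buffer
  sysctl net.ipv4.tcp_wmem              # min/default/max send buffer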

I did experiment with setting my TCP congestion control algorithm to
highspeed, but saw no noticeable difference.
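
For reference, I switched algorithms like this (the modprobe may not
be needed if the algorithm is built into your kernel):

  modprobe tcp_highspeed                               # load it if modular
  sysctl -w net.ipv4.tcp_congestion_control=highspeed  # switch from default
  sysctl net.ipv4.tcp_available_congestion_control     # list the options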

Towards the end of my testing, I realized that Linux still had the MTU
set to 1500.  I wasn't sure whether I needed to change this as well
(ifconfig eth0 mtu 9000), so I gave it a try.  On my servers with
bond0, I changed eth0, eth1, and bond0 all to an MTU of 9000.  At
first I forgot to change my client, which gave very poor performance
(a single workstation to my servers achieved 34.4MB/s, versus 97MB/s
before the MTU change).  After setting my workstation to an MTU of
9000 as well, I got 103MB/s from a single system (which is within the
tolerance of my original, unmodified tests).
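
If keeping the 9000 MTU turns out to be the right call, I assume
making it persist across reboots on CentOS just means adding an MTU
line to each interface's config file, along these lines:

  # /etc/sysconfig/network-scripts/ifcfg-eth0
  # (likewise ifcfg-eth1 and ifcfg-bond0 on the bonded servers)
  MTU=9000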

So:

1) Should I be setting the MTU on Linux (via ifconfig) on all my
systems to 9000 (or some other specific value)?  It was 1500 before I
started.
2) Should I be setting my TCP congestion control algorithm to highspeed?
3) On my PVFS servers, should I be using ALB Ethernet bonding for
better throughput?  (How I'm checking the current mode is shown below.)
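
For reference on question 3, I'm confirming the servers' current
bonding mode with:

  cat /proc/net/bonding/bond0   # reports the mode and per-slave status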

I have not modified my strip size.  Can this be changed without
breaking my existing PVFS setup or having to do a migration?  Can I do
it in the servers' conf file?  Do I need to make any changes to the
clients for it to take effect?
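
From the FAQ link in Rob's message below, it looks like the
server-side default would be a Distribution block in the filesystem
config, roughly like this (untested on my setup, so treat the syntax
as a guess):

  <Distribution>
         Name simple_stripe
         Param strip_size
         Value 4194304
  </Distribution>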

--Jim

On Wed, Jul 22, 2009 at 5:17 PM, Rob Ross <[email protected]> wrote:
> Hi Jim,
>
> Sorry things aren't behaving as you would expect.
>
> A few questions related to your tests:
> - How are you running multiple clients?
> - How are you timing the tests?
>
> Related to configuration:
> - Do you have jumbo frames enabled?
> - Have you adjusted the default strip size in PVFS (looks like no from your
> configuration file, but you could have done this via an attribute)?
>  See:
> http://www.pvfs.org/cvs/pvfs-2-8-branch.build/doc/pvfs2-faq/pvfs2-faq.php#SECTION00076200000000000000
>  Crank this to 4194304 or so
> - Have you read Kyle's email about flow buffer adjustments? That can also be
> helpful.
>
> Maybe you just have a junk switch; see this:
> http://www.pdl.cmu.edu/Incast/index.html
>
> Regards,
>
> Rob
>
> On Jul 22, 2009, at 6:34 PM, Jim Kusznir wrote:
>
>> Hi:
>>
>> I performed some basic tests today with pvfs2 2.8.1.  All tests were
>> run through the kernel connector (i.e., a traditional filesystem
>> mount).
>>
>> My topology is as follows:
>>
>> 3 dedicated pvfs2 servers, each serving both I/O and metadata
>> (although all clients are given the address of the first server in
>> their URI).  Each I/O server has 2 gig-e connections into the
>> cluster's gigabit network switch and is running ALB Ethernet load
>> balancing, so in theory each server now has 2Gbps of throughput
>> potential.  For disks, all my servers use Dell PERC 6/e cards with
>> MD1000 arrays of 15 750GB SATA hard drives in hardware RAID-6.  Each
>> pvfs server is responsible for just under 10TB of disk storage.
>> Using the test command below, the local disk on a pvfs server came
>> out to 373MB/s.
>>
>> All of my clients have a single Gig-E connection into the same
>> gigabit switch, on the same network.  My network is made up of ROCKS
>> 5.1 (CentOS 5.2) nodes and some plain old CentOS 5 servers.
>>
>> The test command was:  dd if=/dev/zero of=<file>.out bs=4000K count=2800
>>
>> First test: single machine to pvfs storage: 95.6MB/s
>> Second test: two cluster machines to pvfs storage: 80.2MB/s each
>> Third test: 3 machines: 53.2MB/s each
>> Fourth test: 4 machines: 44.7MB/s each
>>
>> These results surprised me greatly.  My understanding was that the
>> big benefit of pvfs is scalability; that with 3 I/O servers, I
>> should reasonably expect at least 3x the single-server bandwidth.
>> Given this, I have a theoretical 6 Gbps to my storage, yet my actual
>> throughput barely scaled at all: my initial single-machine test came
>> out at a bit under 1Gbps, and my 4-machine aggregate came to about
>> 1.2Gbps.  Each time I added a machine, the throughput of every
>> machine went down.  What gives?  My actual local disk throughput on
>> my I/O servers is 373MB/s and the pvfs2 servers' system load never
>> broke 1.0, so that wasn't the bottleneck.
>>
>> Here's my pvfs2-fs.conf:
>>
>> <Defaults>
>>        UnexpectedRequests 50
>>        EventLogging none
>>        LogStamp datetime
>>        BMIModules bmi_tcp
>>        FlowModules flowproto_multiqueue
>>        PerfUpdateInterval 1000
>>        ServerJobBMITimeoutSecs 30
>>        ServerJobFlowTimeoutSecs 30
>>        ClientJobBMITimeoutSecs 300
>>        ClientJobFlowTimeoutSecs 300
>>        ClientRetryLimit 5
>>        ClientRetryDelayMilliSecs 2000
>>        StorageSpace /mnt/pvfs2
>>        LogFile /var/log/pvfs2-server.log
>> </Defaults>
>>
>> <Aliases>
>>        Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>        Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>        Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>> </Aliases>
>>
>> <Filesystem>
>>        Name pvfs2-fs
>>        ID 62659950
>>        RootHandle 1048576
>>        <MetaHandleRanges>
>>                Range pvfs2-io-0-0 4-715827885
>>                Range pvfs2-io-0-1 715827886-1431655767
>>                Range pvfs2-io-0-2 1431655768-2147483649
>>        </MetaHandleRanges>
>>        <DataHandleRanges>
>>                Range pvfs2-io-0-0 2147483650-2863311531
>>                Range pvfs2-io-0-1 2863311532-3579139413
>>                Range pvfs2-io-0-2 3579139414-4294967295
>>        </DataHandleRanges>
>>        <StorageHints>
>>                TroveSyncMeta yes
>>                TroveSyncData no
>>        </StorageHints>
>> </Filesystem>
>>
>>
>> --Jim
>
>

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
