Mike:

From the data you just sent, we see spikes in both the touch and the remove
times, with the spikes on the removes being more frequent.

For example, in the rm data there is a spike of about two orders of
magnitude (100x) roughly every 10 operations.  Even though most operations
finish quite quickly, that is enough to slow the average down by about 10x
(e.g., nine 10 ms removes plus one 1 s remove average out to roughly 110 ms
per op).  We do not normally see this, and we don't see it on our systems
here, so we are trying to determine what might be causing it so we can
direct our efforts.
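If it is useful, here is a rough way to summarize the per-op latency from
the output you captured (just a sketch; it assumes the timings were saved to
a file, called rm.log here, in bash's default "real 0mX.XXXs" time format):

    # extract the "real" times, convert to seconds, and report count/mean/max
    grep '^real' rm.log \
      | sed -e 's/^real[[:space:]]*//' -e 's/s$//' \
      | awk -F'm' '{t = $1*60 + $2; sum += t; if (t > max) max = t; n++}
                   END {printf "ops=%d  mean=%.3fs  max=%.3fs\n", n, sum/n, max}'

A max that sits far above the mean is a quick way to confirm the spikes
without eyeballing every line.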

At this point, we are trying to further diagnose the problem.  Would it be
possible for us to log onto your system to look around and possibly run
some more tests?

I am sorry for the inconvenience this is causing, but rest assured, several
of our developers are trying to figure out the difference in performance
between your system and ours.  (We haven't been able to recreate your
problem yet.)


Becky



On Fri, May 31, 2013 at 2:34 PM, Michael Robbert <[email protected]> wrote:

> My terminal buffers weren't big enough to copy and paste all of that
> output, but hopefully the attached will have enough info for you to get an
> idea of what I'm seeing.
> I am beginning to feel like we're just running around in circles here. I
> can do these kinds of tests with and without cache until I'm blue in the
> face, but nothing is going to change until we figure out why uncached
> metadata access is so slow. What are we doing to track that down?
>
> Thanks,
> Mike
>
>
> On 5/31/13 12:05 PM, Becky Ligon wrote:
>
>> Mike:
>>
>> There is something going on with your system, as I am able to touch 500
>> files in 12.5 seconds and delete them in 8.8 seconds on our cluster.
>>
>> Did you remove all of the ATTR entries from your conf file and restart the
>> servers?
>>
>> If not, please do so and then capture the output from the following and
>> send it to me:
>>
>> for i in `seq 1 500`; do time touch myfile${i}; done
>>
>> and then
>>
>> for i in myfile*; do time rm -f ${i}; done
>>
>>
>> Thanks,
>> Becky
>>
>>
>> On Fri, May 31, 2013 at 12:02 PM, Michael Robbert <[email protected]> wrote:
>>
>>     top - 09:54:53 up 6 days, 19:11,  1 user,  load average: 0.00, 0.00, 0.00
>>     Tasks: 156 total,   1 running, 155 sleeping,   0 stopped,   0 zombie
>>     Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>>     Mem:  12289220k total,  1322196k used, 10967024k free,    85820k buffers
>>     Swap:  2104432k total,      232k used,  2104200k free,   965636k cached
>>
>>     They all look very similar to this. 232k swap used on all of them
>>     throughout a touch/rm of 100 files. Ganglia doesn't show any change
>>     over time with cache on or off.
>>
>>     Mike
>>
>>
>>     On 5/31/13 9:30 AM, Becky Ligon wrote:
>>
>>         Michael:
>>
>>         Can you send me a screen shot of "top" from your servers when the
>>         metadata is running on the local disk?  I'd like to see how much
>>         memory
>>         is available.  I'm wondering if 1GB for your DB cache is too high,
>>         possibly causing excessive swapping.
>>
>>         Becky
>>
>>
>>         On Fri, May 24, 2013 at 6:06 PM, Michael Robbert
>>         <[email protected]> wrote:
>>
>>              We recently noticed a performance problem with our OrangeFS
>>         server.
>>
>>              Here are the server stats:
>>              3 servers, all built with identical hardware
>>
>>              [root@orangefs02 ~]# /usr/sbin/pvfs2-server --version
>>              2.8.7-orangefs (mode: aio-threaded)
>>
>>              [root@orangefs02 ~]# uname -r
>>              2.6.18-308.16.1.el5.584g0000
>>
>>              4 core E5603 1.60GHz
>>              12GB of RAM
>>
>>              OrangeFS is being served to clients using bmi_tcp over DDR
>>              Infiniband. Backend storage is PanFS with 2x10Gig connections
>>              on the servers. Performance to the backend looks fine using
>>              bonnie++: >100MB/sec write and ~250MB/s read to each stack.
>>              ~300 creates/sec.
>>
>>              The OrangeFS clients are running kernel version
>>              2.6.18-238.19.1.el5.
>>
>>              The biggest problem I have right now is that deletes are taking
>>              a long time. Almost 1 sec per file.
>>
>>              [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_____truncerr1e-11]# find N2/|wc -l
>>              137
>>              [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_____truncerr1e-11]# time rm -rf N2
>>
>>              real    1m31.096s
>>              user    0m0.000s
>>              sys     0m0.015s
>>
>>              Similar results for file creates:
>>
>>              [root@fatcompute-11-32 ]# date;for i in `seq 1 50`;do touch file${i};done;date
>>              Fri May 24 16:04:17 MDT 2013
>>              Fri May 24 16:05:05 MDT 2013
>>
>>              What else do you need to know? Which debug flags? What should
>>              we be looking at?
>>              I don't see any load on the servers, and I've restarted the
>>              servers and rebooted the server nodes.
>>
>>              Thanks for any pointers,
>>              Mike Robbert
>>              Colorado School of Mines
>>
>>
>>
>>
>>
>>
>>         --
>>         Becky Ligon
>>         OrangeFS Support and Development
>>         Omnibond Systems
>>         Anderson, South Carolina
>>
>>
>>
>>
>>
>> --
>> Becky Ligon
>> OrangeFS Support and Development
>> Omnibond Systems
>> Anderson, South Carolina
>>
>>


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
