Mike
On 6/2/13 8:50 PM, Becky Ligon wrote:
All:

The area of the code where we thought more time was being spent than seemed reasonable was in the metafile dspace create and the local datafile dspace create contained in the create state machine. In both of these operations, the code executes a function called dbpf_dspace_create_store_handle, which does the following:

1. db->get against BDB to see if the new handle already has a dspace entry...which it shouldn't and doesn't.
2. Issues a system call to "access", which tells us whether the bstream file for the given handle already exists...which it doesn't.
3. db->put against BDB to store the dspace entry for the new handle.
4. Inserts into the attribute cache.

In reviewing a more detailed debug log of these functions, I discovered that most of the time these four operations execute in less than 0.5ms. When the time is greater than that, the culprit is always the "access" call alone or the "access" call along with interrupts from the job_timer state machine.

At this point, I am thinking that there may be a problem with the version of Linux running on the machines. As noted in my previous email, 2.6.18-308.16.1.el5 is known to have issues with the kernel dcache mechanism, which leads me to believe there could be other issues as well. In the morning, I will run the same tests on a newer kernel (RHEL 6.3) and compare "access" times between the two kernels.

Becky

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
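(A minimal sketch of the access() comparison described above, for anyone wanting to reproduce it on both kernels: time repeated access() calls against unique, nonexistent paths, mirroring step 2 of dbpf_dspace_create_store_handle. The probe path, iteration count, and output format are illustrative assumptions, not code from the OrangeFS tree; on RHEL 5, compile with gcc probe.c -lrt.)

#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 10000

int main(void)
{
    struct timespec t0, t1;
    double us, total_us = 0.0, worst_us = 0.0;
    char path[64];
    int i;

    for (i = 0; i < ITERATIONS; i++) {
        /* Each path is unique and absent, so access() should fail fast
         * with ENOENT; a long tail here implicates the kernel (dcache/VFS),
         * not OrangeFS. */
        snprintf(path, sizeof(path), "/tmp/bstream-probe-%d", i);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        access(path, F_OK);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
        total_us += us;
        if (us > worst_us)
            worst_us = us;
    }

    printf("access(): avg %.2f us, worst %.2f us over %d calls\n",
           total_us / ITERATIONS, worst_us, ITERATIONS);
    return 0;
}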
On Fri, May 31, 2013 at 7:22 PM, Becky Ligon <[email protected]> wrote:

Thanks, Mike! I ran some more tests hoping that the null-aio trove method would eliminate disk issues, but null-aio, as I just discovered, still allows files to be created. Doh! So, I will be looking more in depth at our file creation process, which includes metadata updates and file creation on the disk.

BTW: I noticed that you are running 2.6.18-308.16.1.el5.584g0000 on your servers, and there is a known Linux bug concerning dcache processing that creates a kernel panic when OrangeFS is unmounted. This bug affects other software, too, not just ours. Have you had any problems along these lines? Our recommendation for those who want to stay on RHEL 5 is to use 2.6.18-308.

Becky

On Fri, May 31, 2013 at 6:33 PM, Michael Robbert <[email protected]> wrote:

Yes, please do. You have free rein on the nodes that I listed in my email to you until this problem is solved.

Thanks,
Mike

On 5/31/13 4:23 PM, Becky Ligon wrote:

Mike:

Thanks for letting us onto your system. We ran some more tests, and it seems that file creation during the touch command is taking more time than it should, while metadata ops seem okay. I dumped some more OFS debug data and will be looking at it over the weekend. I want to pinpoint the precise places in the code that I *think* are taking time and then rerun more tests. This may mean putting up a new copy of OFS with more specific debugging in it, if that is okay with you. I also have more ideas on other tests that we can run to verify where the problem is occurring. Is it okay if I log onto your system over the weekend?

Becky

On Fri, May 31, 2013 at 3:24 PM, Becky Ligon <[email protected]> wrote:

Mike:

From the data you just sent, we see spikes in the touches as well as the removes, with the removes being more frequent. For example, on the rm data, there is a spike of about two orders of magnitude (100x) about every 10 ops, which can result in a 10x average slowdown even though most of the operations finish quite quickly (for instance, nine removes at 1ms plus one at 100ms average out to about 10.9ms per op). We do not normally see this, and we don't see it on our systems here, so we are trying to decide what might cause it so we can direct our efforts.

At this point, we are trying to further diagnose the problem. Would it be possible for us to log onto your system to look around and possibly run some more tests? I am sorry for the inconvenience this is causing, but rest assured, several of us developers are trying to figure out the difference in performance between your system and ours. (We haven't been able to recreate your problem as of yet.)

Becky

On Fri, May 31, 2013 at 2:34 PM, Michael Robbert <[email protected]> wrote:

My terminal buffers weren't big enough to copy and paste all of that output, but hopefully the attached will have enough info for you to get an idea of what I'm seeing.

I am beginning to feel like we're just running around in circles here. I can do these kinds of tests with and without cache until I'm blue in the face, but nothing is going to change until we figure out why un-cached metadata access is so slow. What are we doing to track that down?

Thanks,
Mike

On 5/31/13 12:05 PM, Becky Ligon wrote:

Mike:

There is something going on with your system, as I am able to touch 500 files in 12.5 seconds and delete them in 8.8 seconds on our cluster. Did you remove all of the ATTR entries from your conf file and restart the servers? If not, please do so and then capture the output from the following and send it to me:

for i in `seq 1 500`; do time touch myfile${i}; done

and then

for i in myfile*; do time rm -f ${i}; done

Thanks,
Becky
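(Because `time touch` includes a fork/exec of the touch binary for every file, per-op latencies come out cleaner from a single process. A rough C equivalent of the two loops above, assuming it is run from a directory on the OrangeFS mount; the file-name pattern and count mirror the shell loops but are otherwise illustrative:)

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define NFILES 500

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    struct timespec t0, t1;
    char name[32];
    int i, fd;

    /* Time each create individually, so spikes stand out. */
    for (i = 0; i < NFILES; i++) {
        snprintf(name, sizeof(name), "myfile%d", i);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fd = open(name, O_CREAT | O_WRONLY, 0644);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (fd >= 0)
            close(fd);
        printf("create %s: %.3f ms\n", name, elapsed_ms(t0, t1));
    }

    /* Then time each remove. */
    for (i = 0; i < NFILES; i++) {
        snprintf(name, sizeof(name), "myfile%d", i);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        unlink(name);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("remove %s: %.3f ms\n", name, elapsed_ms(t0, t1));
    }
    return 0;
}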
On Fri, May 31, 2013 at 12:02 PM, Michael Robbert <[email protected]> wrote:

top - 09:54:53 up 6 days, 19:11, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12289220k total, 1322196k used, 10967024k free, 85820k buffers
Swap: 2104432k total, 232k used, 2104200k free, 965636k cached

They all look very similar to this. 232k swap used on all of them throughout a touch/rm of 100 files. Ganglia doesn't show any change over time with cache on or off.

Mike

On 5/31/13 9:30 AM, Becky Ligon wrote:

Michael:

Can you send me a screen shot of "top" from your servers when the metadata is running on the local disk? I'd like to see how much memory is available. I'm wondering if 1GB for your DB cache is too high, possibly causing excessive swapping.

Becky

On Fri, May 24, 2013 at 6:06 PM, Michael Robbert <[email protected]> wrote:

We recently noticed a performance problem with our OrangeFS server. Here are the server stats: 3 servers, built identically with identical hardware.

[root@orangefs02 ~]# /usr/sbin/pvfs2-server --version
2.8.7-orangefs (mode: aio-threaded)
[root@orangefs02 ~]# uname -r
2.6.18-308.16.1.el5.584g0000

4-core E5603 @ 1.60GHz, 12GB of RAM. OrangeFS is being served to clients using bmi_tcp over DDR InfiniBand. Backend storage is PanFS with 2x10Gig connections on the servers. Performance to the backend looks fine using bonnie++: >100MB/s write and ~250MB/s read to each stack, and ~300 creates/sec. The OrangeFS clients are running kernel version 2.6.18-238.19.1.el5.

The biggest problem I have right now is that deletes are taking a long time, almost 1 sec per file:

[root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_________truncerr1e-11]# find N2/ | wc -l
137
[root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_________truncerr1e-11]# time rm -rf N2
real    1m31.096s
user    0m0.000s
sys     0m0.015s

Similar results for file creates:

[root@fatcompute-11-32 ]# date; for i in `seq 1 50`; do touch file${i}; done; date
Fri May 24 16:04:17 MDT 2013
Fri May 24 16:05:05 MDT 2013

What else do you need to know? Which debug flags? What should we be looking at? I don't see any load on the servers, and I've restarted the servers and rebooted the server nodes.

Thanks for any pointers,
Mike Robbert
Colorado School of Mines
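(On the swapping question raised earlier in the thread: a single top screenshot can miss transient swap activity, so one option is to sample the kernel's cumulative swap counters around a whole test run. A sketch using the standard pswpin/pswpout fields of /proc/vmstat; the prompt-and-wait flow is illustrative:)

#include <stdio.h>
#include <string.h>

/* Read cumulative swap-in/swap-out page counts from /proc/vmstat. */
static int read_swap_counters(long *pswpin, long *pswpout)
{
    char key[64];
    long value;
    FILE *f = fopen("/proc/vmstat", "r");

    if (!f)
        return -1;
    *pswpin = *pswpout = -1;
    while (fscanf(f, "%63s %ld", key, &value) == 2) {
        if (strcmp(key, "pswpin") == 0)
            *pswpin = value;
        else if (strcmp(key, "pswpout") == 0)
            *pswpout = value;
    }
    fclose(f);
    return 0;
}

int main(void)
{
    long in0, out0, in1, out1;

    if (read_swap_counters(&in0, &out0) != 0)
        return 1;
    printf("run the touch/rm test now, then press Enter...\n");
    getchar();
    if (read_swap_counters(&in1, &out1) != 0)
        return 1;
    printf("pages swapped in: %ld, out: %ld during the test\n",
           in1 - in0, out1 - out0);
    return 0;
}

(Run on a server while the touch/rm loop runs on a client; deltas near zero would rule out swapping, consistent with the 232k swap figure reported above.)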
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
