Mike
On 6/2/13 8:50 PM, Becky Ligon wrote:
All:

The area of the code where we thought more time was being spent than seemed reasonable was in the metafile dspace create and the local datafile dspace create contained in the create state machine. In both of these operations, the code executes a function called dbpf_dspace_create_store_handle, which does the following:

1. db->get against BDB to see if the new handle already has a dspace entry...which it shouldn't and doesn't.
2. Issues a system call to "access", which tells us whether the bstream file for the given handle already exists...which it doesn't.
3. db->put against BDB to store the dspace entry for the new handle.
4. Inserts into the attribute cache.

In reviewing a more detailed debug log of these functions, I discovered that most of the time these four operations execute in less than 0.5ms. When the time is greater than that, the culprit is always the "access" call alone or the "access" call along with interrupts from the job_timer state machine.

At this point, I am thinking that there may be a problem with the version of Linux running on the machines. As noted in my previous email, 2.6.18-308.16.1.el5 is known to have issues with the kernel dcache mechanism, which leads me to believe there could be other issues as well. In the morning, I will run the same tests on a newer kernel (RHEL 6.3) and compare "access" times between the two kernels.

Becky

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
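(A minimal sketch of the access() comparison described above, for anyone wanting to reproduce it on both kernels: time repeated access() calls against unique, nonexistent paths, mirroring step 2 of dbpf_dspace_create_store_handle. The probe path, iteration count, and output format are illustrative assumptions, not code from the OrangeFS tree; on RHEL 5, compile with gcc probe.c -lrt.)

#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 10000

int main(void)
{
    struct timespec t0, t1;
    double us, total_us = 0.0, worst_us = 0.0;
    char path[64];
    int i;

    for (i = 0; i < ITERATIONS; i++) {
        /* Each path is unique and absent, so access() should fail fast
         * with ENOENT; a long tail here implicates the kernel (dcache/VFS),
         * not OrangeFS. */
        snprintf(path, sizeof(path), "/tmp/bstream-probe-%d", i);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        access(path, F_OK);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
        total_us += us;
        if (us > worst_us)
            worst_us = us;
    }

    printf("access(): avg %.2f us, worst %.2f us over %d calls\n",
           total_us / ITERATIONS, worst_us, ITERATIONS);
    return 0;
}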
On Fri, May 31, 2013 at 7:22 PM, Becky Ligon <[email protected]> wrote:

Thanks, Mike! I ran some more tests hoping that the null-aio trove method would eliminate disk issues, but null-aio, as I just discovered, still allows files to be created. Doh! So, I will be looking more in depth at our file creation process, which includes metadata updates and file creation on the disk.

BTW: I noticed that you are running 2.6.18-308.16.1.el5.584g0000 on your servers, and there is a known Linux bug concerning dcache processing that creates a kernel panic when OrangeFS is unmounted. This bug affects other software, too, not just ours. Have you had any problems along these lines? Our recommendation for those who want to stay on RHEL 5 is to use 2.6.18-308.

Becky

On Fri, May 31, 2013 at 6:33 PM, Michael Robbert <[email protected]> wrote:

Yes, please do. You have free rein on the nodes that I listed in my email to you until this problem is solved.

Thanks,
Mike

On 5/31/13 4:23 PM, Becky Ligon wrote:

Mike:

Thanks for letting us onto your system. We ran some more tests, and it seems that file creation during the touch command is taking more time than it should, while metadata ops seem okay. I dumped some more OFS debug data and will be looking at it over the weekend. I want to pinpoint the precise places in the code that I *think* are taking time and then rerun more tests. This may mean putting up a new copy of OFS with more specific debugging in it, if that is okay with you. I also have more ideas on other tests that we can run to verify where the problem is occurring. Is it okay if I log onto your system over the weekend?

Becky

On Fri, May 31, 2013 at 3:24 PM, Becky Ligon <[email protected]> wrote:

Mike:

From the data you just sent, we see spikes in the touches as well as the removes, with the removes being more frequent. For example, on the rm data, there is a spike of about two orders of magnitude (100x) about every 10 ops, which can result in a 10x average slowdown even though most of the operations finish quite quickly (for instance, nine removes at 1ms plus one at 100ms average out to about 10.9ms per op). We do not normally see this, and we don't see it on our systems here, so we are trying to decide what might cause it so we can direct our efforts.

At this point, we are trying to further diagnose the problem. Would it be possible for us to log onto your system to look around and possibly run some more tests? I am sorry for the inconvenience this is causing, but rest assured, several of us developers are trying to figure out the difference in performance between your system and ours. (We haven't been able to recreate your problem as of yet.)

Becky

On Fri, May 31, 2013 at 2:34 PM, Michael Robbert <[email protected]> wrote:

My terminal buffers weren't big enough to copy and paste all of that output, but hopefully the attached will have enough info for you to get an idea of what I'm seeing.

I am beginning to feel like we're just running around in circles here. I can do these kinds of tests with and without cache until I'm blue in the face, but nothing is going to change until we figure out why un-cached metadata access is so slow. What are we doing to track that down?

Thanks,
Mike

On 5/31/13 12:05 PM, Becky Ligon wrote:

Mike:

There is something going on with your system, as I am able to touch 500 files in 12.5 seconds and delete them in 8.8 seconds on our cluster. Did you remove all of the ATTR entries from your conf file and restart the servers? If not, please do so and then capture the output from the following and send it to me:

for i in `seq 1 500`; do time touch myfile${i}; done

and then

for i in myfile*; do time rm -f ${i}; done

Thanks,
Becky
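(Because `time touch` includes a fork/exec of the touch binary for every file, per-op latencies come out cleaner from a single process. A rough C equivalent of the two loops above, assuming it is run from a directory on the OrangeFS mount; the file-name pattern and count mirror the shell loops but are otherwise illustrative:)

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define NFILES 500

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    struct timespec t0, t1;
    char name[32];
    int i, fd;

    /* Time each create individually, so spikes stand out. */
    for (i = 0; i < NFILES; i++) {
        snprintf(name, sizeof(name), "myfile%d", i);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fd = open(name, O_CREAT | O_WRONLY, 0644);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (fd >= 0)
            close(fd);
        printf("create %s: %.3f ms\n", name, elapsed_ms(t0, t1));
    }

    /* Then time each remove. */
    for (i = 0; i < NFILES; i++) {
        snprintf(name, sizeof(name), "myfile%d", i);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        unlink(name);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("remove %s: %.3f ms\n", name, elapsed_ms(t0, t1));
    }
    return 0;
}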
On Fri, May 31, 2013 at 12:02 PM, Michael Robbert <[email protected]> wrote:

top - 09:54:53 up 6 days, 19:11, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12289220k total, 1322196k used, 10967024k free, 85820k buffers
Swap: 2104432k total, 232k used, 2104200k free, 965636k cached

They all look very similar to this. 232k swap used on all of them throughout a touch/rm of 100 files. Ganglia doesn't show any change over time with cache on or off.

Mike

On 5/31/13 9:30 AM, Becky Ligon wrote:

Michael:

Can you send me a screen shot of "top" from your servers when the metadata is running on the local disk? I'd like to see how much memory is available. I'm wondering if 1GB for your DB cache is too high, possibly causing excessive swapping.

Becky

On Fri, May 24, 2013 at 6:06 PM, Michael Robbert <[email protected]> wrote:

We recently noticed a performance problem with our OrangeFS server. Here are the server stats: 3 servers, built identically with identical hardware.

[root@orangefs02 ~]# /usr/sbin/pvfs2-server --version
2.8.7-orangefs (mode: aio-threaded)
[root@orangefs02 ~]# uname -r
2.6.18-308.16.1.el5.584g0000

4-core E5603 @ 1.60GHz, 12GB of RAM. OrangeFS is being served to clients using bmi_tcp over DDR InfiniBand. Backend storage is PanFS with 2x10Gig connections on the servers. Performance to the backend looks fine using bonnie++: >100MB/s write and ~250MB/s read to each stack, and ~300 creates/sec. The OrangeFS clients are running kernel version 2.6.18-238.19.1.el5.

The biggest problem I have right now is that deletes are taking a long time, almost 1 sec per file:

[root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_________truncerr1e-11]# find N2/ | wc -l
137
[root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_________truncerr1e-11]# time rm -rf N2
real    1m31.096s
user    0m0.000s
sys     0m0.015s

Similar results for file creates:

[root@fatcompute-11-32 ]# date; for i in `seq 1 50`; do touch file${i}; done; date
Fri May 24 16:04:17 MDT 2013
Fri May 24 16:05:05 MDT 2013

What else do you need to know? Which debug flags? What should we be looking at? I don't see any load on the servers, and I've restarted the servers and rebooted the server nodes.

Thanks for any pointers,
Mike Robbert
Colorado School of Mines
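(On the swapping question raised earlier in the thread: a single top screenshot can miss transient swap activity, so one option is to sample the kernel's cumulative swap counters around a whole test run. A sketch using the standard pswpin/pswpout fields of /proc/vmstat; the prompt-and-wait flow is illustrative:)

#include <stdio.h>
#include <string.h>

/* Read cumulative swap-in/swap-out page counts from /proc/vmstat. */
static int read_swap_counters(long *pswpin, long *pswpout)
{
    char key[64];
    long value;
    FILE *f = fopen("/proc/vmstat", "r");

    if (!f)
        return -1;
    *pswpin = *pswpout = -1;
    while (fscanf(f, "%63s %ld", key, &value) == 2) {
        if (strcmp(key, "pswpin") == 0)
            *pswpin = value;
        else if (strcmp(key, "pswpout") == 0)
            *pswpout = value;
    }
    fclose(f);
    return 0;
}

int main(void)
{
    long in0, out0, in1, out1;

    if (read_swap_counters(&in0, &out0) != 0)
        return 1;
    printf("run the touch/rm test now, then press Enter...\n");
    getchar();
    if (read_swap_counters(&in1, &out1) != 0)
        return 1;
    printf("pages swapped in: %ld, out: %ld during the test\n",
           in1 - in0, out1 - out0);
    return 0;
}

(Run on a server while the touch/rm loop runs on a client; deltas near zero would rule out swapping, consistent with the 232k swap figure reported above.)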
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
