On Tue, Feb 15, 2011 at 09:07:31PM +0100, Marc Grimme wrote:
> Hi Steve,
> I think lately I observed a very similar behavior with RHEL5 and gfs2.
> It was a gfs2 filesystem that had about 2Mio files with sum of 2GB in a
> directory. When I did a du -shx . in this directory it took about 5 Minutes
> (noatime mountoption given). Independently on how much nodes took part in the
> cluster (in the end I only tested with one node). This was only for the first
> time running all later executed du commands were much faster.
> When I mounted the exact same filesystem with lockproto=lock_nolock it took
> about 10-20 seconds to proceed with the same command.
>
> Next I started to analyze this with oprofile and observed the following
> result:
>
> opreport --long-file-names:
> CPU: AMD64 family10, speed 2900.11 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
> mask of 0x00 (No unit mask) count 100000
> samples  %        symbol name
> 200569   46.7639  search_rsb_list
> 118905   27.7234  create_lkb

Hi Marc, thanks for sending this again. I remember that you pointed
these out a long time ago, but I had forgotten just how bad those
searches were. I really do need to do some optimizing there.

> This very much reminded me on a similar test we've done years ago with
> gfs (see
> http://www.open-sharedroot.org/Members/marc/blog/blog-on-dlm/red-hat-dlm-__find_lock_by_id/profile-data-with-diffrent-table-sizes).
>
> Does this not show that during the du command 46% of the time the kernel
> stays in the dlm:search_rsb_list function while looking out for locks.
> It still looks like the hashtable for the lock in dlm is much too small
> and searching inside the hashmap is not constant anymore?

We should definitely check whether the default hash table sizes should
be increased.

Dave

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
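
To make the hash table point above concrete, here is a rough sketch of
why a fixed-size chained hash table stops giving constant-time lookups
once the number of entries far exceeds the number of buckets. This is a
toy model, not the dlm's rsb table: the function names, the key count,
and both bucket counts are made up for illustration.

```python
# Illustrative only: a toy chained hash table, NOT the dlm's rsb table.
# All names and sizes here are assumptions chosen for the sketch.

def build_table(keys, num_buckets):
    """Distribute keys into num_buckets chains, as a chained hash table does."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[hash(key) % num_buckets].append(key)
    return buckets


def comparisons_for_lookup(buckets, key):
    """Count the key comparisons needed to find key by walking its chain."""
    chain = buckets[hash(key) % len(buckets)]
    for steps, candidate in enumerate(chain, start=1):
        if candidate == key:
            return steps
    return len(chain)  # miss: the whole chain was walked


def average_comparisons(num_keys, num_buckets):
    """Average comparisons per successful lookup over all stored keys."""
    keys = list(range(num_keys))
    buckets = build_table(keys, num_buckets)
    total = sum(comparisons_for_lookup(buckets, k) for k in keys)
    return total / num_keys


if __name__ == "__main__":
    # Same number of entries, two hypothetical table sizes.
    for num_buckets in (256, 4096):
        avg = average_comparisons(20480, num_buckets)
        print(f"{num_buckets:5d} buckets: {avg:.1f} comparisons per lookup")
```

With n entries spread evenly over m buckets, each chain holds about n/m
entries, so the per-lookup cost grows linearly with n when m stays
fixed. Making the table larger shortens the chains in proportion, which
is the effect a larger default table size would be expected to have.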