On Mon, 22 Sep 2014 13:59:03 +0200, David Sterba wrote:
> On Fri, Sep 19, 2014 at 01:34:38PM +0000, Holger Hoffstätte wrote:
>>
>> I'd also love a technical explanation why this happens and how it could
>> be fixed. Maybe it's just a consequence of how the metadata tree(s)
>> are laid out on disk.
>
> The stat() call is the most severe slowdown factor in combination with
> fragmentation and random order of the inodes that are being stat()ed.
>
> A 'ls -f' (that does not stat) goes only through the DIR_INDEX_KEY items
> in the b-tree that are usually packed together and reading is sequential
> and fast.
>
> A 'ls' that calls stat() in turn will have to seek for the INODE_ITEM
> item that can be far from all the currently read metadata blocks.
Thanks Dave - that confirms everything I (unscientifically ;) observed
so far. I also tried to use "find" to warm up the cache (in the hope it
would pull in the relevant metadata blocks), but running it under strace
showed that it does - of course! - not call stat() on each inode, and
just quickly reads the directory entry list (via getdents()). This meant
that even after a full "find", a subsequent "du" would still be slow(er).

Both a cold "find" and a cold "du" also *sound* noticeably different in
terms of disk head scratching; find is significantly less seeky.

Interesting that you also mention the readahead. I've run the "du"
warmup under Brendan Gregg's iosnoop and it shows that most stat()-heavy
I/O is done in 16k blocks, while ext4 only seems to use 4k.

Time to look at the trees in detail.. :)

-h
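For anyone following along, the syscall difference discussed above is easy to reproduce outside of find/du. Here is a minimal, purely illustrative Python sketch (function names are my own): the first walk reads only directory entries (getdents() under the hood, like 'find' or 'ls -f'), the second additionally calls stat() on every file (like 'du' or a plain 'ls'), which on btrfs forces the extra INODE_ITEM lookups:

```python
import os

def list_only(path):
    """Walk a tree reading only directory entries.

    os.walk() is built on os.scandir(), which uses getdents() and the
    d_type field, so no per-file stat() is needed just to list names --
    this is the fast, sequential DIR_INDEX_KEY path."""
    names = []
    for root, dirs, files in os.walk(path):
        names.extend(files)
    return names

def list_and_stat(path):
    """Walk the same tree but stat() every entry.

    Each os.stat() may require seeking to an INODE_ITEM far from the
    directory blocks just read -- the seek-heavy pattern of 'du'."""
    sizes = {}
    for root, dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            sizes[full] = os.stat(full).st_size  # extra metadata lookup per file
    return sizes
```

Running both under strace (e.g. `strace -c -e trace=getdents64,statx python warmup.py`) shows the first variant issuing only getdents-style calls, while the second adds one stat-family call per file.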