On Mon, 22 Sep 2014 13:59:03 +0200, David Sterba wrote:

> On Fri, Sep 19, 2014 at 01:34:38PM +0000, Holger Hoffstätte wrote:
>> 
>> I'd also love a technical explanation why this happens and how it could
>> be fixed. Maybe it's just a consequence of how the metadata tree(s)
>> are laid out on disk.
> 
> The stat() call is the most severe slowdown factor in combination with
> fragmentation and random order of the inodes that are being stat()ed.
> 
> A 'ls -f' (that does not stat) goes only through the DIR_INDEX_KEY items
> in the b-tree that are usually packed together and reading is sequential
> and fast.
> 
> A 'ls' that calls stat() in turn will have to seek for the INODE_ITEM
> item that can be far from all the currently read metadata blocks.
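The difference shows up directly in the syscall counts. A quick sketch (strace assumed installed; the directory is a throwaway created just for the test — in the strace summaries, look for the stat-family rows, which are lstat/newfstatat/statx depending on your coreutils and kernel):

```shell
# Sketch: compare the syscalls behind 'ls -f' and 'ls -l' on a throwaway
# directory. Degrades gracefully if strace is not installed.
if command -v strace >/dev/null; then
    d=$(mktemp -d)
    touch "$d"/file{1..100}
    # readdir only: expect getdents64 calls, no per-entry stat
    strace -c ls -f "$d" > /dev/null
    # stat() per entry: expect ~100 stat-family calls on top
    strace -c ls -l "$d" > /dev/null
    rm -rf "$d"
else
    echo "strace not installed"
fi
```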

Thanks Dave - that confirms everything I (unscientifically ;) observed so
far. I had also tried to use "find" to warm up the cache (in the hope that
it would pull in the relevant metadata blocks), but running it under strace
showed that it does - of course! - not call stat() on each inode, and just
quickly reads the directory entry lists via getdents().
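One way to make the warmup actually touch each inode would be to ask find for something that getdents()/d_type cannot answer - a size test, for example, should force a stat() per entry. A sketch, not verified on btrfs:

```shell
# Sketch: '-size' needs st_size, which only stat() provides, so find
# must stat every entry -- a plain 'find' can get by on d_type alone.
d=$(mktemp -d)
touch "$d/zero"              # empty file
printf 'x' > "$d/full"       # 1-byte file
find "$d" > /dev/null        # enumeration only, no stat needed for files
find "$d" -size +0c          # stats each entry; prints only non-empty ones
rm -rf "$d"
```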

This meant that even after a full "find", a subsequent "du" would still be
slow(er). A cold "find" and a cold "du" also *sound* noticeably different
in terms of disk head scratching: find is significantly less seeky.

Interesting that you also mention the readahead. I've run the "du" warmup
under Brendan Gregg's iosnoop and it shows that most stat()-heavy I/O is
done in 16k blocks, while ext4 only seems to use 4k.
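For reference, the measurement looked roughly like this (iosnoop is from Brendan Gregg's perf-tools and needs root; /mnt/btrfs stands in for the real mount point, and the BYTES column position is an assumption - verify it against the header line of your capture):

```shell
# Rough reproduction of the measurement: capture block I/O while du runs,
# then histogram the request sizes. Degrades gracefully without iosnoop.
if command -v iosnoop >/dev/null; then
    iosnoop > /tmp/du.io &        # one line per block I/O request
    pid=$!
    du -s /mnt/btrfs > /dev/null 2>&1
    kill "$pid" 2>/dev/null
    # BYTES is assumed to be column 6 of iosnoop's default output.
    awk 'NR > 1 { hist[$6]++ } END { for (b in hist) print b, hist[b] }' /tmp/du.io
else
    echo "iosnoop not installed"
fi
```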

Time to look at the trees in detail... :)

-h
