On Saturday, 27 December 2014, 16:06:13, Robert White wrote:
> >
> >> I also don't know what kind of tool you are using, but it might be
> >> repeatedly trying and failing to fallocate the file as a single
> >> extent or something equally dumb.
> >
> > Userspace doesn't as far as I know, get to make that decision. I've
> > just read the fallocate(2) man page, and it says nothing at all about
> > the contiguity of the extent(s) storage allocated by the call.
>
> Yep, my bad. But as soon as I saw that "fio" was starting two threads,
> one doing random read/write and another doing sequential read/write,
> both on the same file, it set off my "not just creating a file" mindset.
> Given the delayed write into/through the cache normally done by casual
> file io, It seemed likely that fio would be doing something more
> aggressive (like using O_DIRECT or repeated fdatasync() which could get
> very tit-for-tat).
Robert, please get to know fio, or *ask*, before jumping to conclusions.

I used this:

  [global]
  bs=4k
  #ioengine=libaio
  #iodepth=4
  size=4g
  #direct=1
  runtime=120
  filename=ssd.test.file

  #[seq-write]
  #rw=write
  #stonewall

  [rand-write]
  rw=randwrite
  stonewall

In the first test I still ran seq-write as well, but did you notice the "stonewall" parameter? It *separates* the two jobs from one another. That is, fio may start two threads, since I think it prepares all threads in advance, yet it executed only *one* job at a time.

From the fio manpage:

  stonewall, wait_for_previous
         Wait for preceding jobs in the job file to exit before starting
         this one. stonewall implies new_group.

(That said, the first stonewall isn't even needed, but I removed the read jobs from the ssd-test.fio example job file I based this one on, and I forgot to remove that statement.)

Thanks a lot for your input. I learned something from it. For example, that the trees used for data handling are stored in the metadata section. And now I am very clear that btrfs fi df does not display any trees, only chunk reservation and usage. I think I knew this before, but somehow I thought it was combined with the trees. It isn't, at least not in place; the trees are stored in the metadata chunks. I'd still not call these extents, though, because as far as I know that's a file-based concept.

I'll skip theorizing about algorithms here. I prefer to let measurements speak and try to understand them. The best approach to understanding the ones I made is, I think, what Hugo suggested: a developer looks at the sysrq-t output. So I personally won't speculate any further about possible algorithmic limitations of BTRFS.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7