Chris Mason wrote:
On Thu, Jul 23, 2009 at 01:35:21PM -0500, Steven Pratt wrote:
I have re-run the raid tests, re-creating the fileset between each of the random write workloads, and performance now matches the previous newformat results. The bad news is that the huge gain I had attributed to the newformat release does not really exist. All of the previous results (except for the newformat run) were not re-creating the fileset, so the gain in performance was due only to having a fresh set of files, not to any code changes.

Thanks for doing all of these runs.  This is still a little different
from what I have here: my initial runs are very fast, and after 10 or
so they level out to relatively low random write performance.  With
nodatacow, it stays even.

Right, I do not see this problem with nodatacow.

So, I have done two new sets of runs to look into this further. One is a 3-hour run of single-threaded random writes to the RAID system, which I have compared to ext3. Performance results are here: http://btrfs.boxacle.net/repository/raid/longwrite/longwrite/Longrandomwrite.html

and graphing of all the iostat data can be found here:

http://btrfs.boxacle.net/repository/raid/longwrite/summary.html
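
For reference, the workload is essentially a single thread doing random overwrites of a preallocated fileset. A minimal sketch of the idea in Python follows; the path, file size, block size, and fsync interval are illustrative placeholders, not the actual benchmark parameters:

import os
import random

# Illustrative parameters only -- the real fileset is 100GB on a 2.1TB
# filesystem; this just captures the shape of the workload
# (single-threaded random overwrites of existing data).
PATH = "/mnt/btrfs/testfile"   # hypothetical mount point and file
FILE_SIZE = 1 << 30            # 1GB for the sketch
BLOCK_SIZE = 4096              # 4KB random writes
NUM_WRITES = 100000

def preallocate(path, size):
    # Lay the file out sequentially first, so every later write is an
    # overwrite of existing data: no appends, no holes.
    buf = b"\0" * (1 << 20)
    with open(path, "wb") as f:
        for _ in range(size // len(buf)):
            f.write(buf)

def random_overwrite(path, size):
    block = os.urandom(BLOCK_SIZE)
    with open(path, "r+b") as f:
        for i in range(NUM_WRITES):
            offset = random.randrange(size // BLOCK_SIZE) * BLOCK_SIZE
            f.seek(offset)
            f.write(block)
            if i % 1000 == 999:
                # Flush periodically so writes actually reach the device.
                os.fsync(f.fileno())

if __name__ == "__main__":
    preallocate(PATH, FILE_SIZE)
    random_overwrite(PATH, FILE_SIZE)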

The iostat graphs for btrfs are interesting for a number of reasons. First, it takes about 3000 seconds (or 50 minutes) for btrfs to reach steady state. Second, if you look at write throughput from the device view vs. the btrfs/application view, we see that an application throughput of 21.5MB/sec requires 63MB/sec of actual disk writes. That is roughly 3MB written to disk for every 1MB of application data, versus essentially no overhead for ext3. Also, looking at the change in iops vs. MB/sec, we see that while btrfs starts out with reasonably sized IOs, it quickly deteriorates to an average IO size of only 13KB. Remember, the starting fileset is only 100GB on a 2.1TB filesystem, every write is an overwrite of existing data, and the workload is single-threaded, so there is no reason this should fragment. It seems like the allocator is having a problem doing sequential allocations.
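
The arithmetic falls straight out of the iostat data. A quick check, using only the figures quoted above (the implied request rate is derived from them, not read directly off the graphs):

# Numbers quoted from the run above.
app_mb_s  = 21.5    # throughput seen by the application (MB/s)
disk_mb_s = 63.0    # actual device writes from iostat (MB/s)
avg_io_kb = 13.0    # average I/O size btrfs settles into (KB)

# Write amplification: physical bytes written per byte of application data.
amplification = disk_mb_s / app_mb_s
print("write amplification: %.1f : 1" % amplification)   # ~2.9 : 1

# Implied request rate at the device for that average I/O size.
implied_iops = disk_mb_s * 1024 / avg_io_kb
print("implied IOPS: %.0f" % implied_iops)               # ~4960

# ext3 in the same run writes roughly what the application writes,
# i.e. amplification of about 1:1 (the "overhead of ~0" above).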

There are two things happening.  First, the default allocation scheme
isn't very well suited to this; mount -o ssd will perform better.
Second, over the long term, random overwrites to the file cause a lot
of writes to the extent allocation tree.  That's really what -o
nodatacow is saving us.  There are optimizations we can do, but we're
holding off on those in favor of enospc and other pressing things.

Well, I have -o ssd data that I can upload, but it was worse than without it. I do understand about timing and priorities.
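
To make the mechanism concrete, here is a back-of-envelope toy model of where the extra writes come from. Every parameter below is a made-up illustrative value, not btrfs's real on-disk geometry; it only shows why COW'ing the extent tree multiplies the writes:

# Toy model of COW write amplification -- all values are illustrative.
data_block     = 4096   # bytes per random overwrite
node_size      = 4096   # hypothetical tree node size
batch          = 1000   # overwrites batched into one transaction
dirty_leaves   = 800    # distinct extent tree leaves dirtied per batch
dirty_interior = 40     # shared upper-level nodes dirtied per batch

# With datacow, the transaction writes the new data blocks plus every
# dirtied extent tree node (the tree itself is COW'd too).
data_bytes = batch * data_block
tree_bytes = (dirty_leaves + dirty_interior) * node_size
print("cow amplification: %.1f : 1" % ((data_bytes + tree_bytes) / data_bytes))

# With nodatacow, blocks are rewritten in place and the extent tree is
# left alone, so amplification stays near 1:1.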

But, with all of that said, Josef has some really important allocator
improvements.  I've put them out along with our pending patches into the
experimental branch of the btrfs-unstable tree.  Could you please give
this branch a try both with and without the ssd mount option?
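
A sketch of how the with/without-ssd comparison might be scripted; the device, mount point, and benchmark driver are placeholders for this environment, and only mkfs.btrfs, mount, and umount are assumed as real commands:

import subprocess

# Placeholders -- substitute the real device, mount point, and
# benchmark invocation for the setup under test.
DEVICE = "/dev/sdb"
MOUNTPOINT = "/mnt/btrfs"
BENCHMARK = ["./run_random_write_benchmark.sh"]   # hypothetical driver script

# Mount option sets to compare: default allocator vs. -o ssd.
CONFIGS = {"default": [], "ssd": ["-o", "ssd"]}

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

for name, opts in CONFIGS.items():
    # Recreate the filesystem each time so every run starts from a
    # fresh fileset (the earlier results showed this matters a lot).
    run(["mkfs.btrfs", DEVICE])
    run(["mount"] + opts + [DEVICE, MOUNTPOINT])
    try:
        run(BENCHMARK + [name])   # tag results with the config name
    finally:
        run(["umount", MOUNTPOINT])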

Sure, will try to get to it tomorrow.

Steve

-chris

