Chris Mason wrote:
On Thu, Jul 23, 2009 at 01:35:21PM -0500, Steven Pratt wrote:
I have re-run the raid tests, re-creating the fileset between each of the random write workloads, and performance now matches the previous newformat results. The bad news is that the huge gain I had attributed to the newformat release does not really exist. All of the previous results (except for the newformat run) were not re-creating the fileset, so the gain in performance was due only to having a fresh set of files, not to any code changes.

Thanks for doing all of these runs.  This is still a little different
from what I have here: my initial runs are very fast, and after 10 or
so they level out to relatively low random write performance.  With
nodatacow, it stays even.

Right, I do not see this problem with nodatacow.

So, I have done two new sets of runs to look into this further. One is a 3-hour run of single-threaded random writes to the RAID system, which I have compared to ext3. Performance results are here: http://btrfs.boxacle.net/repository/raid/longwrite/longwrite/Longrandomwrite.html

and graphing of all the iostat data can be found here:

http://btrfs.boxacle.net/repository/raid/longwrite/summary.html
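
For reference, the workload is essentially a single thread doing random overwrites of a preallocated fileset. A minimal sketch of the idea in Python follows; the path, file size, block size, and fsync interval are illustrative placeholders, not the actual benchmark parameters:

import os
import random

# Illustrative parameters only -- the real fileset is 100GB on a 2.1TB
# filesystem; this just captures the shape of the workload
# (single-threaded random overwrites of existing data).
PATH = "/mnt/btrfs/testfile"   # hypothetical mount point and file
FILE_SIZE = 1 << 30            # 1GB for the sketch
BLOCK_SIZE = 4096              # 4KB random writes
NUM_WRITES = 100000

def preallocate(path, size):
    # Lay the file out sequentially first, so every later write is an
    # overwrite of existing data: no appends, no holes.
    buf = b"\0" * (1 << 20)
    with open(path, "wb") as f:
        for _ in range(size // len(buf)):
            f.write(buf)

def random_overwrite(path, size):
    block = os.urandom(BLOCK_SIZE)
    with open(path, "r+b") as f:
        for i in range(NUM_WRITES):
            offset = random.randrange(size // BLOCK_SIZE) * BLOCK_SIZE
            f.seek(offset)
            f.write(block)
            if i % 1000 == 999:
                # Flush periodically so writes actually reach the device.
                os.fsync(f.fileno())

if __name__ == "__main__":
    preallocate(PATH, FILE_SIZE)
    random_overwrite(PATH, FILE_SIZE)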

The iostat graphs for btrfs are interesting for a number of reasons. First, it takes about 3000 seconds (or 50 minutes) for btrfs to reach steady state. Second, if you look at write throughput from the device view vs. the btrfs/application view, we see that an application throughput of 21.5MB/sec requires 63MB/sec of actual disk writes. That is roughly 3MB written to disk for every 1MB of application data, versus essentially no overhead for ext3. Also, looking at the change in iops vs. MB/sec, we see that while btrfs starts out with reasonably sized IOs, it quickly deteriorates to an average IO size of only 13KB. Remember, the starting fileset is only 100GB on a 2.1TB filesystem, every write is an overwrite of existing data, and the workload is single-threaded, so there is no reason this should fragment. It seems like the allocator is having a problem doing sequential allocations.
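
The arithmetic falls straight out of the iostat data. A quick check, using only the figures quoted above (the implied request rate is derived from them, not read directly off the graphs):

# Numbers quoted from the run above.
app_mb_s  = 21.5    # throughput seen by the application (MB/s)
disk_mb_s = 63.0    # actual device writes from iostat (MB/s)
avg_io_kb = 13.0    # average I/O size btrfs settles into (KB)

# Write amplification: physical bytes written per byte of application data.
amplification = disk_mb_s / app_mb_s
print("write amplification: %.1f : 1" % amplification)   # ~2.9 : 1

# Implied request rate at the device for that average I/O size.
implied_iops = disk_mb_s * 1024 / avg_io_kb
print("implied IOPS: %.0f" % implied_iops)               # ~4960

# ext3 in the same run writes roughly what the application writes,
# i.e. amplification of about 1:1 (the "overhead of ~0" above).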

There are two things happening.  First, the default allocation scheme
isn't very well suited to this; mount -o ssd will perform better.
Second, over the long term, random overwrites to the file cause a lot
of writes to the extent allocation tree.  That's really what -o
nodatacow is saving us.  There are optimizations we can do, but we're
holding off on those in favor of enospc and other pressing things.

Well, I have -o ssd data that I can upload, but it was worse than without it. I do understand about timing and priorities.
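
To make the mechanism concrete, here is a back-of-envelope toy model of where the extra writes come from. Every parameter below is a made-up illustrative value, not btrfs's real on-disk geometry; it only shows why COW'ing the extent tree multiplies the writes:

# Toy model of COW write amplification -- all values are illustrative.
data_block     = 4096   # bytes per random overwrite
node_size      = 4096   # hypothetical tree node size
batch          = 1000   # overwrites batched into one transaction
dirty_leaves   = 800    # distinct extent tree leaves dirtied per batch
dirty_interior = 40     # shared upper-level nodes dirtied per batch

# With datacow, the transaction writes the new data blocks plus every
# dirtied extent tree node (the tree itself is COW'd too).
data_bytes = batch * data_block
tree_bytes = (dirty_leaves + dirty_interior) * node_size
print("cow amplification: %.1f : 1" % ((data_bytes + tree_bytes) / data_bytes))

# With nodatacow, blocks are rewritten in place and the extent tree is
# left alone, so amplification stays near 1:1.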

But, with all of that said, Josef has some really important allocator
improvements.  I've put them out along with our pending patches into the
experimental branch of the btrfs-unstable tree.  Could you please give
this branch a try both with and without the ssd mount option?
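
A sketch of how the with/without-ssd comparison might be scripted; the device, mount point, and benchmark driver are placeholders for this environment, and only mkfs.btrfs, mount, and umount are assumed as real commands:

import subprocess

# Placeholders -- substitute the real device, mount point, and
# benchmark invocation for the setup under test.
DEVICE = "/dev/sdb"
MOUNTPOINT = "/mnt/btrfs"
BENCHMARK = ["./run_random_write_benchmark.sh"]   # hypothetical driver script

# Mount option sets to compare: default allocator vs. -o ssd.
CONFIGS = {"default": [], "ssd": ["-o", "ssd"]}

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

for name, opts in CONFIGS.items():
    # Recreate the filesystem each time so every run starts from a
    # fresh fileset (the earlier results showed this matters a lot).
    run(["mkfs.btrfs", DEVICE])
    run(["mount"] + opts + [DEVICE, MOUNTPOINT])
    try:
        run(BENCHMARK + [name])   # tag results with the config name
    finally:
        run(["umount", MOUNTPOINT])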

Sure, will try to get to it tomorrow.

Steve

-chris

