On Wed, Oct 12, 2016 at 05:10:18PM -0400, Zygo Blaxell wrote:
> On Wed, Oct 12, 2016 at 09:55:28PM +0200, Adam Borowski wrote:
> > On Wed, Oct 12, 2016 at 01:19:37PM -0400, Zygo Blaxell wrote:
> > > I had been thinking that we could inject "plug" extents to fill up
> > > RAID5 stripes.
> > Your idea sounds good, but there's one problem: most real users don't
> > balance.  Ever.  Contrary to the tribal wisdom here, this actually works
> > fine, unless you had a pathological load skewed to either data or metadata
> > on the first write and then filled the disk to near-capacity with a load
> > skewed the other way.
> > Most usage patterns produce a mix of transient and persistent data (and at
> > write time you don't know which file is which), meaning that over time every
> > stripe will contain a smidge of cold data plus a fill of plug extents.
> Yes, it'll certainly reduce storage efficiency.  I think all the
> RMW-avoidance strategies have this problem.  The alternative is to risk
> losing data or the entire filesystem on disk failure, so any of the
> RMW-avoidance strategies is probably a worthwhile tradeoff.  Big RAID5/6
> arrays tend to be used mostly for storing large, sequentially accessed
> files, which are less susceptible to this kind of problem.
> If the pattern is lots of small random writes then performance on raid5
> will be terrible anyway (though it may even be improved by using plug
> extents, since RMW stripe updates would be replaced with pure CoW).
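A minimal sketch of the plug-extent idea, assuming 16KB blocks and a 64KB data stripe (the sizes used in the example below).  The names `plan_commit` and `StripeWrite` are hypothetical, not btrfs code; the point is only that each commit is laid out in whole stripes, with the unused tail of the last stripe filled by plug blocks, so no stripe is ever read-modify-written.

```python
from dataclasses import dataclass

LEAF = 16 * 1024          # assumed block size (16KB, as in the example below)
STRIPE_DATA = 64 * 1024   # assumed data width of one RAID5 stripe


@dataclass
class StripeWrite:
    data_blocks: int  # blocks carrying real, newly written data
    plug_blocks: int  # blocks filled with a (hypothetical) plug extent


def plan_commit(new_bytes: int) -> list[StripeWrite]:
    """Lay out one commit so that every stripe is written whole (pure CoW).

    Instead of leaving a partial stripe to be filled later by RMW, the tail
    of the last stripe is padded with plug blocks.
    """
    per_stripe = STRIPE_DATA // LEAF
    blocks = -(-new_bytes // LEAF)  # ceiling division: partial blocks round up
    plan = []
    while blocks > 0:
        used = min(blocks, per_stripe)
        plan.append(StripeWrite(used, per_stripe - used))
        blocks -= used
    return plan
```

For example, `plan_commit(16 * 1024)` yields a single stripe with 1 data block and 3 plug blocks, and `plan_commit(80 * 1024)` yields one full stripe plus one stripe with 1 data block and 3 plugs.  The cost is exactly the storage-efficiency loss discussed above: the plug blocks hold no data until the stripe is compacted.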

I've looked at some simple scenarios, and it appears that, with your scheme,
the total amount of I/O would increase, but it would not hinder performance,
as the increase happens only when the disk would otherwise be idle.  There's
also a latency win and a fragmentation win, all while fixing the write hole.

Let's assume a leaf size of 16KB and a stripe size of 64KB.  The disk has four
stripes, each 75% full and 25% deleted.  '*' marks cold data, '.' deleted/plug
space, 'x' new data.  I'm not drawing entirely empty stripes.
The user wants to write 64KB of data.
RMW needs to read 12 leaves and write 16, regardless of whether the data comes
in one commit or four.
Latency: 28 (one big commit) or 7 per commit (small commits); total I/O: 28.
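The RMW figures can be reproduced with a simple cost model.  The model is an assumption that matches the numbers above: to fill a stripe's free slot, RMW reads that stripe's live leaves and then writes the stripe's full data width, with everything counted in leaves and parity I/O ignored.  `rmw_io` is a hypothetical helper.

```python
def rmw_io(stripes: int, slots_per_stripe: int, live_per_stripe: int) -> tuple[int, int]:
    """Return (reads, writes), in leaves, for filling the free slots of
    `stripes` partially full stripes by read-modify-write.

    Assumed model: read every live leaf of each touched stripe, then write
    the stripe's whole data width.  Parity I/O is not counted.
    """
    reads = stripes * live_per_stripe
    writes = stripes * slots_per_stripe
    return reads, writes


# One big commit: all four holes are filled at once, and the commit waits for all of it.
reads, writes = rmw_io(stripes=4, slots_per_stripe=4, live_per_stripe=3)
big_commit_latency = reads + writes              # 12 + 16 = 28

# Four small commits of one 16KB leaf each: each commit touches one stripe.
r1, w1 = rmw_io(stripes=1, slots_per_stripe=4, live_per_stripe=3)
small_commit_latency = r1 + w1                   # 3 + 4 = 7
small_commits_total = 4 * small_commit_latency   # 28
```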

The plug extents scheme requires compaction (partial balance):
I/O so far 24.
Big commit:
Latency 4, total I/O 28.
If we had to compact on demand, the latency would be 28 (assuming we can do
a stripe-sized balance).
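Under the same assumed model (leaves counted, parity ignored; helper names hypothetical), compaction followed by a full-stripe write gives the big-commit numbers.  The small-commit cases are not modeled here.

```python
def compaction_io(stripes: int, live_per_stripe: int) -> int:
    """Leaves read plus leaves written to repack the live data of `stripes`
    partially full stripes into full ones.

    Assumed: each live leaf is read once and written once; parity not counted.
    """
    live = stripes * live_per_stripe
    return live + live


def full_stripe_commit_io(new_leaves: int) -> int:
    """Writes for a commit that lands in an already-empty stripe: pure CoW,
    so nothing has to be read first."""
    return new_leaves


compaction = compaction_io(stripes=4, live_per_stripe=3)  # 12 + 12 = 24
commit = full_stripe_commit_io(new_leaves=4)              # 4
total_io = compaction + commit                            # 28

latency_background = commit              # compaction already ran while the disk was idle
latency_on_demand = compaction + commit  # the commit must wait for compaction first
```

The 12 live leaves fill exactly three stripes, which frees the fourth for the new data; that is why the commit itself needs no reads.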

Small commits, no concurrent writes:
Latency 1 per commit, I/O so far 28, need another compaction:
Total I/O 32.

Small I/O, concurrent writes that peg the disk:
Total I/O 28 (not counting concurrent writes).

Other scenarios I've analyzed give similar results.

I'm not sure if my thinking is correct, but if it is, the outcome is quite
surprising: no performance loss even though we had to rewrite the stripes!

> > Thus, while the plug extents idea doesn't suffer from the problems of big
> > sectors you just mentioned, we'd need some kind of auto-balance.
> Another way to approach the problem is to relocate the blocks in
> partially filled RMW stripes so they can effectively become CoW stripes;
> however, the requirement to do full extent relocations leads to some
> nasty write amplification and performance ramifications.  Balance is a
> hugely heavy I/O load, and there are good reasons not to incur it at
> unexpected times.

We don't need balance in the btrfs sense; it's enough to compact stripes,
i.e., to do something akin to balance but at the stripe level rather than the
allocation-block level.
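A minimal sketch of stripe-level compaction, assuming a stripe is modeled as a fixed-width list of slots where `None` marks a free (deleted or plug) slot.  `compact` is a hypothetical helper: it only shows the repacking step, and a real implementation would also have to choose which stripes to compact and update every reference to the moved blocks.

```python
from typing import List, Optional

Stripe = List[Optional[str]]  # None marks a free (deleted or plug) slot


def compact(stripes: List[Stripe], width: int) -> List[Stripe]:
    """Repack the live blocks of partially filled stripes into full stripes.

    Returns the newly written stripes; the source stripes become entirely
    free.  If the live blocks don't divide evenly, the last new stripe is
    padded with None (plug) slots.
    """
    live = [block for stripe in stripes for block in stripe if block is not None]
    out: List[Stripe] = []
    for i in range(0, len(live), width):
        chunk = live[i:i + width]
        out.append(chunk + [None] * (width - len(chunk)))
    return out
```

With the four 75%-full stripes from the example above, `compact` produces three full stripes and frees all four originals, so one whole stripe becomes available for CoW writes.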

As for write amplification, the F2FS guys solved the issue by having two types
of cleaning (balancing):
* on demand (when there is no free space and thus it needs to be done NOW)
* in the background (done only on cold data)

The on-demand clean goes for the juiciest targets first (least live data per
stripe); the background clean, on the other hand, uses a formula that takes
into account both the amount of space to reclaim and the age of the stripe.
If the data is hot, it shouldn't be cleaned yet: it's likely to be deleted or
modified soon.
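A sketch of the two victim-selection policies.  For the background case it uses the textbook cost-benefit score from the original log-structured filesystem (LFS) design, (1 - u) * age / (1 + u), where u is the fraction of the stripe still live; F2FS's actual code differs in details, so treat this as an illustration of the idea rather than its implementation.  `StripeInfo` and the `pick_*` functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeInfo:
    stripe_id: int
    live_fraction: float  # u: fraction of the stripe still holding live data (0..1)
    age: float            # time since the stripe's data was last modified


def pick_on_demand(stripes: List[StripeInfo]) -> StripeInfo:
    # Greedy: the stripe with the least live data frees the most space
    # per unit of data that has to be moved.
    return min(stripes, key=lambda s: s.live_fraction)


def cost_benefit(s: StripeInfo) -> float:
    # Benefit: space freed (1 - u), weighted by age so hot data is left alone.
    # Cost: reading the stripe (1) plus rewriting its live data (u).
    u = s.live_fraction
    return (1 - u) * s.age / (1 + u)


def pick_background(stripes: List[StripeInfo]) -> StripeInfo:
    return max(stripes, key=cost_benefit)
```

With a hot stripe (u = 0.25, age 1) and a cold one (u = 0.75, age 100), the on-demand policy picks the hot stripe because it frees the most space right now, while the background policy scores the hot stripe at 0.6 and the cold one at about 14.3, so it leaves the hot data alone.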

A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol, 1kg
raspberries, 0.4kg sugar; put into a big jar for 1 month.  Filter out and
throw away the fruits (can dump them into a cake, etc), let the drink age
at least 3-6 months.
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
