On 10/03/13 15:04, Hugo Mills wrote:
> On Sat, Mar 09, 2013 at 09:41:50PM -0800, Roger Binns wrote:
>> The only constraints that matter are surviving N device failures, and
>> data not lost if at least N devices are still present.  Under the
>> hood the best way of meeting those can be heuristically determined,
>> and I'd expect things like overhead to dynamically adjust as storage
>> fills up or empties.
> 
> That's really not going to work happily -- you'd have to run the 
> restriper in the background automatically as the device fills up.

Which is the better approach: an administrator sitting there adjusting
various parameters after doing some difficult calculations, and redoing it
all every time data or devices increase or decrease - or a computer with
billions of bytes of memory and billions of CPU cycles per second just
figuring it out based on experience? :-)

> Given that this is going to end up rewriting *all* of the data on the 
> FS,

Why does all data have to be rewritten?  Why does every piece of data have
to have exactly the same storage parameters in terms of
redundancy/performance/striping options?

I can easily imagine the final implementation being informed by hot data
tracking.  There is absolutely no need for data that is rarely read to be
using the maximum striping/performance/overhead options.

There is no need to rewrite everything anyway - if a filesystem with 1GB
of data is heading towards 2GB of data, then only enough readjustments need
to be made to release the overhead for that additional 1GB.
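
To put rough numbers on that, here is a small Python sketch of the kind
of incremental readjustment I mean: convert only enough rarely-used
block groups from a 2-copy profile to a parity profile to free the raw
space the extra 1GB of data needs.  The block-group sizes, overhead
factors and helper names are all invented for illustration - this is
not how btrfs does its accounting today.

    # Illustrative sketch only: free just enough raw space for 1GiB more
    # data by converting cold block groups from a 2-copy profile (2.0x
    # raw usage) to a 4-device parity profile (~1.33x raw usage).
    GIB = 1024 ** 3

    def raw_needed(extra_data_bytes, overhead_factor=2.0):
        return extra_data_bytes * overhead_factor

    def plan_conversions(cold_block_groups, raw_to_free,
                         old_factor=2.0, new_factor=4 / 3):
        """cold_block_groups: list of (name, data_bytes), coldest first."""
        plan, freed = [], 0
        for name, data_bytes in cold_block_groups:
            if freed >= raw_to_free:
                break
            # Converting frees the difference between the profiles' raw use.
            freed += int(data_bytes * (old_factor - new_factor))
            plan.append(name)
        return plan, freed

    # 16 cold block groups of 256MiB each, coldest first (made-up data).
    cold = [("bg-%d" % i, 256 * 1024 ** 2) for i in range(16)]
    plan, freed = plan_conversions(cold, raw_needed(1 * GIB))
    print("convert %d block groups, freeing ~%.2f GiB raw"
          % (len(plan), freed / GIB))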

I also assume that the probability of all devices being exactly the same
size and having exactly the same performance characteristics is going to
decrease.  Many will expect that they can add an SSD to the soup, and over
time add/update devices.  I.e. the homogeneous case that regular RAID
implicitly assumes will become increasingly rare.

> If you want maximum storage (with some given redundancy), regardless of
> performance, then you might as well start with the parity-based levels
> and just leave it at that.

In the short term it would certainly make sense to have an online
calculator or mkfs helper where you specify the device sizes and
redundancy requirements together with how much data you have, and it then
spits out the string of numbers and letters to use for mkfs/balance.
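
Something along these lines, say - a tiny Python sketch of such a
helper.  The -d/-m profile names are the real mkfs.btrfs ones, but the
selection logic, thresholds and device names are my own guesses rather
than anything that exists today:

    # Hypothetical helper: suggest mkfs.btrfs profiles from device sizes,
    # how many device failures must be survivable, and the expected data.
    def suggest_profiles(device_sizes_gib, failures_to_survive, data_gib):
        total = sum(device_sizes_gib)
        ndev = len(device_sizes_gib)
        if failures_to_survive == 0:
            data_profile = "raid0" if ndev > 1 else "single"
        elif failures_to_survive == 1:
            # raid1 costs 2x raw space; fall back to raid5 if it won't fit.
            data_profile = "raid1" if data_gib * 2 <= total else "raid5"
        else:
            data_profile = "raid6"  # survives two device failures
        meta_profile = "raid1" if ndev > 1 else "dup"
        return "mkfs.btrfs -d %s -m %s %s" % (
            data_profile, meta_profile,
            " ".join("/dev/disk%d" % i for i in range(ndev)))

    print(suggest_profiles([500, 500, 1000, 2000],
                           failures_to_survive=1, data_gib=1500))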

> Thinking about it, specifying a (redundancy, acceptable_wastage) pair
> is fairly pointless in controlling the performance levels,

I don't think there is merit in specifying acceptable wastage - the answer
is obvious in that any unused space is acceptable for use.  That also
means it changes over time as storage is used and freed.

> There's not much else a heuristic can do, without effectively exposing
> all the config options to the admin, in some obfuscated form.

There is a lot heuristics can do.  At the simplest level btrfs can monitor
device performance characteristics and use that as a first pass.  One
database that I use has an interesting approach for queries - rather than
trying to work out the single best execution strategy (e.g. which indices
in which order), it actually tries them all out concurrently and picks the
quickest.  That is then used for future similar queries, with the
performance being monitored.  Once response times no longer match the
strategy, it tries them all again to pick a new winner.
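
In Python-ish terms the "race and remember" idea looks something like
this (the class, the slack factor and running the strategies in turn
rather than concurrently are all simplifications of mine):

    import time

    # Illustrative "race and remember" picker.  Strategies are plain
    # callables; in the database case they would be query plans.
    class AdaptivePicker:
        def __init__(self, strategies, slack=1.5):
            self.strategies = strategies   # dict: name -> callable
            self.best = None               # (name, baseline seconds)
            self.slack = slack             # allowed drift before re-racing

        def run(self, *args):
            if self.best:
                name, baseline = self.best
                start = time.monotonic()
                result = self.strategies[name](*args)
                if time.monotonic() - start <= baseline * self.slack:
                    return result          # winner still performs as recorded
                self.best = None           # response times drifted: race again
            results, timings = {}, {}
            for name, fn in self.strategies.items():
                start = time.monotonic()
                results[name] = fn(*args)
                timings[name] = time.monotonic() - start
            winner = min(timings, key=timings.get)
            self.best = (winner, timings[winner])
            return results[winner]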

There is no reason btrfs can't try a similar approach.  When presented
with a pile of heterogeneous storage with different sizes and performance
characteristics, use all reasonable approaches and monitor the resulting
read/write performance.  Then start biasing towards what works best.  Use
hot data tracking to determine which data would most benefit from having
its approach changed to more optimal values.
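
As a correspondingly rough sketch of what that could look like for
storage - weight new allocations towards devices that have been observed
to perform well, and use read counts to nominate hot data sitting on
slow devices for relocation.  The device names, throughput figures and
weighting scheme are all made up:

    import random

    # Illustrative only - not btrfs code.
    observed_mb_per_s = {"ssd": 450.0, "hdd1": 120.0, "hdd2": 80.0}
    read_counts = {"extent-a": 9000, "extent-b": 12, "extent-c": 3400}
    extent_location = {"extent-a": "hdd2", "extent-b": "ssd",
                       "extent-c": "hdd1"}

    def pick_device_for_new_data():
        # Bias placement towards the devices measured to be fastest.
        devices = list(observed_mb_per_s)
        weights = [observed_mb_per_s[d] for d in devices]
        return random.choices(devices, weights=weights, k=1)[0]

    def best_relocation_candidate():
        # Hot extents on slow devices gain the most from being moved.
        def gain(extent):
            return read_counts[extent] / observed_mb_per_s[extent_location[extent]]
        return max(read_counts, key=gain)

    print("place new data on:", pick_device_for_new_data())
    print("relocate first:", best_relocation_candidate())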

Roger