Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?

2014-02-13 Thread Hugo Mills
On Thu, Feb 13, 2014 at 11:32:03AM -0500, Jim Salter wrote:
 That is FANTASTIC news.  Thank you for wielding the LART gently. =)

   No LART necessary. :) Nobody knows everything, and it's not a
particularly heavily-documented or written-about feature at the moment
(mostly because it only exists in Chris's local git repo).

 I do a fair amount of public speaking and writing about next-gen
 filesystems (example: 
 http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/)
 and I will be VERY sure to talk about the upcoming divorce of stripe
 size from array size in future presentations.  This makes me
 positively giddy.
 
 FWIW, after writing the above article I got contacted by a
 proprietary storage vendor who wanted to tell me all about his
 midmarket/enterprise product, and he was pretty audibly flummoxed
 when I explained how btrfs-RAID1 distributes data and redundancy -
 his product does something similar (to be fair, his product also
 does a lot of other things btrfs doesn't inherently do, like
 clustered storage and synchronous dedup), and he had no idea that
 anything freely available did anything vaguely like it.

   That's quite entertaining for the bogglement factor. Although,
again, see my comment above...

   Hugo.

 I have a feeling the storage world - even the relatively
 well-informed part of it that's aware of ZFS - has little to no
 inkling of how gigantic a splash btrfs is going to make when it
 truly hits the mainstream.
 
 This could be a pretty powerful setup IMO - if you implemented
 something like this, you'd be able to arbitrarily define your
 storage efficiency (percentage of parity blocks / data blocks) and
 your fault-tolerance level (how many drives you can afford to lose
 before failure) WITHOUT tying it directly to your underlying disks,
 or necessarily needing to rebalance as you add more disks to the
 array.  This would be a heck of a lot more flexible than ZFS'
 approach of adding more immutable vdevs.
 
 Please feel free to tell me why I'm dumb for either 1. not realizing
 the obvious flaw in this idea or 2. not realizing it's already being
 worked on in exactly this fashion. =)
 The latter. :)
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Nothing right in my left brain. Nothing left in --- 
 my right brain. 




Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?

2014-02-13 Thread Goffredo Baroncelli
Hi Jim,
On 02/13/2014 05:13 PM, Jim Salter wrote:
 This might be a stupid question but...

There are no stupid questions, only stupid answers...

 
 Are there any plans to make parity RAID levels in btrfs similar to
 the current implementation of btrfs-raid1?
 
 It took me a while to realize how different and powerful btrfs-raid1
 is from traditional raid1.  The ability to string together virtually
 any combination of mutt hard drives in arbitrary ways and
 yet maintain redundancy is POWERFUL, and is seriously going to be a
 killer feature advancing btrfs adoption in small environments.
 
 The one real drawback to btrfs-raid1 is that you're committed to n/2
 storage efficiency, since you're using pure redundancy rather than
 parity on the array.  I was thinking about that this morning, and
 suddenly it occurred to me that you ought to be able to create a
 striped parity array in much the same way as a btrfs-raid1 array.
 
 Let's say you have five disks, and you arbitrarily want to define a
 stripe length of four data blocks plus one parity block per stripe.

How is it different from a raid5 setup (which is supported by btrfs)?

 Right now, what you're looking at effectively amounts to a RAID3
 array, like FreeBSD used to use.  But, what if we add two more disks?
 Or three more disks? Or ten more?  Is there any reason we can't keep
 our stripe length of four blocks + one parity block, and just
 distribute them relatively ad-hoc in the same way btrfs-raid1
 distributes redundant data blocks across an ad-hoc array of disks?
 
 This could be a pretty powerful setup IMO - if you implemented
 something like this, you'd be able to arbitrarily define your storage
 efficiency (percentage of parity blocks / data blocks) and your
 fault-tolerance level (how many drives you can afford to lose before
 failure) WITHOUT tying it directly to your underlying disks

Maybe it is a good idea, but what would be the advantage of 
using fewer drives than the available ones for a RAID?

Regarding the fault tolerance level, a few weeks ago there was a 
posting about a kernel library which would provide a generic
RAID framework capable of several degrees of fault tolerance 
(raid 5, 6, 7...) [have a look at 
[RFC v4 2/3] fs: btrfs: Extends btrfs/raid56 to 
support up to six parities, 2014/1/25]. This would definitely be a
big leap forward.

BTW, the raid5/raid6 support in BTRFS is only for testing purposes. 
However, Chris Mason said a few weeks ago that he will work on these
issues.

[...]
 necessarily needing to rebalance as you add more disks to the array.
 This would be a heck of a lot more flexible than ZFS' approach of
 adding more immutable vdevs.

There is no need to re-balance if you add more drives. The next 
chunk allocation will span all the available drives anyway. It is only 
required when you want to spread the data already written across all the drives.

Regards
Goffredo


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?

2014-02-13 Thread Hugo Mills
On Thu, Feb 13, 2014 at 09:22:07PM +0100, Goffredo Baroncelli wrote:
 Hi Jim,
 On 02/13/2014 05:13 PM, Jim Salter wrote:
  Let's say you have five disks, and you arbitrarily want to define a
  stripe length of four data blocks plus one parity block per stripe.
 
 How is it different from a raid5 setup (which is supported by btrfs)?

   With what's above, yes, that's the current RAID-5 code.

  Right now, what you're looking at effectively amounts to a RAID3
  array, like FreeBSD used to use.  But, what if we add two more disks?
  Or three more disks? Or ten more?  Is there any reason we can't keep
  our stripe length of four blocks + one parity block, and just
  distribute them relatively ad-hoc in the same way btrfs-raid1
  distributes redundant data blocks across an ad-hoc array of disks?
  
  This could be a pretty powerful setup IMO - if you implemented
  something like this, you'd be able to arbitrarily define your storage
  efficiency (percentage of parity blocks / data blocks) and your
  fault-tolerance level (how many drives you can afford to lose before
  failure) WITHOUT tying it directly to your underlying disks
 
 Maybe it is a good idea, but what would be the advantage of 
 using fewer drives than the available ones for a RAID?

   Performance, plus the ability to handle different sized drives.
Hmm... maybe I should do an optimise option for the space planner...

 Regarding the fault tolerance level, a few weeks ago there was a 
 posting about a kernel library which would provide a generic
 RAID framework capable of several degrees of fault tolerance 
 (raid 5, 6, 7...) [have a look at 
 [RFC v4 2/3] fs: btrfs: Extends btrfs/raid56 to 
 support up to six parities, 2014/1/25]. This would definitely be a
 big leap forward.
 
 BTW, the raid5/raid6 support in BTRFS is only for testing purposes. 
 However, Chris Mason said a few weeks ago that he will work on these
 issues.
 
 [...]
  necessarily needing to rebalance as you add more disks to the array.
  This would be a heck of a lot more flexible than ZFS' approach of
  adding more immutable vdevs.
 
 There is no need to re-balance if you add more drives. The next 
 chunk allocation will span all the available drives anyway. It is only 
 required when you want to spread the data already written across all the drives.

   The balance opens up more usable space, unless the new device is
smaller than (some nasty function of) the remaining free space on the
other drives. It's not necessarily about spanning the data, although
that's an effect, too.
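
   A toy example of what I mean, with made-up numbers and a very
simplified model of the raid1 allocator (not btrfs code):

def raid1_usable(free):
    """Rough model: most data storable with 2 copies on different
    devices, given per-device free space."""
    total, largest = sum(free), max(free)
    return min(total // 2, total - largest)

# Three 1 TB drives, nearly full (~10 GB free each); add an empty 1 TB drive.
print(raid1_usable([10, 10, 10, 1000]))  # ~30 GB: new chunks can only pair
                                         # the new drive with the scraps left
                                         # on the old ones.

# After a full balance the ~1485 GB of existing data is spread evenly
# over all four drives, so far more of the raw space becomes usable:
print(raid1_usable([1000] * 4) - 1485)   # ~515 GB free for new data.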

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- It used to take a lot of talent and a certain type of ---  
upbringing to be perfectly polite and have filthy manners
at the same time. Now all it needs is a computer.




btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?

2014-02-13 Thread Jim Salter

This might be a stupid question but...

Are there any plans to make parity RAID levels in btrfs similar to the 
current implementation of btrfs-raid1?


It took me a while to realize how different and powerful btrfs-raid1 is 
from traditional raid1.  The ability to string together virtually any 
combination of mutt hard drives in arbitrary ways and yet 
maintain redundancy is POWERFUL, and is seriously going to be a killer 
feature advancing btrfs adoption in small environments.


The one real drawback to btrfs-raid1 is that you're committed to n/2 
storage efficiency, since you're using pure redundancy rather than 
parity on the array.  I was thinking about that this morning, and 
suddenly it occurred to me that you ought to be able to create a striped 
parity array in much the same way as a btrfs-raid1 array.


Let's say you have five disks, and you arbitrarily want to define a 
stripe length of four data blocks plus one parity block per stripe.  
Right now, what you're looking at effectively amounts to a RAID3 array, 
like FreeBSD used to use.  But, what if we add two more disks? Or three 
more disks? Or ten more?  Is there any reason we can't keep our stripe 
length of four blocks + one parity block, and just distribute them 
relatively ad-hoc in the same way btrfs-raid1 distributes redundant data 
blocks across an ad-hoc array of disks?
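
To make that concrete, here's a toy sketch in Python of the allocation 
policy I'm imagining - purely illustrative (the device names, free-space 
numbers, and the greedy "most free space first" selection are my own 
assumptions, not actual btrfs code):

import heapq

DATA_BLOCKS, PARITY_BLOCKS = 4, 1
STRIPE_WIDTH = DATA_BLOCKS + PARITY_BLOCKS

def place_stripe(free):
    """Place one 4+1 stripe on the 5 devices with the most free space.
    free: dict mapping device name -> free blocks."""
    candidates = [d for d, f in free.items() if f > 0]
    if len(candidates) < STRIPE_WIDTH:
        return None                     # not enough devices left with space
    chosen = heapq.nlargest(STRIPE_WIDTH, candidates, key=lambda d: free[d])
    for d in chosen:
        free[d] -= 1                    # one block of the stripe per device
    return chosen

# Seven mixed-size drives: the stripe shape stays 4 data + 1 parity no
# matter how many devices exist, just as raid1 keeps exactly 2 copies.
free = {'a': 5, 'b': 5, 'c': 3, 'd': 3, 'e': 2, 'f': 2, 'g': 1}
stripe = place_stripe(free)
while stripe:
    print(sorted(stripe))
    stripe = place_stripe(free)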


This could be a pretty powerful setup IMO - if you implemented something 
like this, you'd be able to arbitrarily define your storage efficiency 
(percentage of parity blocks / data blocks) and your fault-tolerance 
level (how many drives you can afford to lose before failure) WITHOUT 
tying it directly to your underlying disks, or necessarily needing to 
rebalance as you add more disks to the array.  This would be a heck of a 
lot more flexible than ZFS' approach of adding more immutable vdevs.
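
Purely as arithmetic: with d data blocks and p parity blocks per 
stripe, storage efficiency is d/(d+p) and you survive the loss of up to 
p drives, independent of how many drives are in the array.  A quick 
illustration (my own example numbers):

for d, p in [(4, 1), (8, 2), (10, 3)]:
    print(f"{d}+{p}: {d/(d+p):.0%} usable, survives {p} lost drive(s)")
# 4+1: 80% usable, survives 1 lost drive(s)
# 8+2: 80% usable, survives 2 lost drive(s)
# 10+3: 77% usable, survives 3 lost drive(s)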


Please feel free to tell me why I'm dumb for either 1. not realizing the 
obvious flaw in this idea or 2. not realizing it's already being worked 
on in exactly this fashion. =)



Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?

2014-02-13 Thread Hugo Mills
On Thu, Feb 13, 2014 at 11:13:58AM -0500, Jim Salter wrote:
 This might be a stupid question but...
 
 Are there any plans to make parity RAID levels in btrfs similar to
 the current implementation of btrfs-raid1?

   Yes.

 It took me a while to realize how different and powerful btrfs-raid1
 is from traditional raid1.  The ability to string together virtually
 any combination of mutt hard drives together in arbitrary ways and
 yet maintain redundancy is POWERFUL, and is seriously going to be a
 killer feature advancing btrfs adoption in small environments.
 
 The one real drawback to btrfs-raid1 is that you're committed to n/2
 storage efficiency, since you're using pure redundancy rather than
 parity on the array.  I was thinking about that this morning, and
 suddenly it occurred to me that you ought to be able to create a
 striped parity array in much the same way as a btrfs-raid1 array.
 
 Let's say you have five disks, and you arbitrarily want to define a
 stripe length of four data blocks plus one parity block per
 stripe.  Right now, what you're looking at effectively amounts to
 a RAID3 array, like FreeBSD used to use.  But, what if we add two
 more disks? Or three more disks? Or ten more?  Is there any reason
 we can't keep our stripe length of four blocks + one parity block,
 and just distribute them relatively ad-hoc in the same way
 btrfs-raid1 distributes redundant data blocks across an ad-hoc array
 of disks?

   None whatsoever.

 This could be a pretty powerful setup IMO - if you implemented
 something like this, you'd be able to arbitrarily define your
 storage efficiency (percentage of parity blocks / data blocks) and
 your fault-tolerance level (how many drives you can afford to lose
 before failure) WITHOUT tying it directly to your underlying disks,
 or necessarily needing to rebalance as you add more disks to the
 array.  This would be a heck of a lot more flexible than ZFS'
 approach of adding more immutable vdevs.
 
 Please feel free to tell me why I'm dumb for either 1. not realizing
 the obvious flaw in this idea or 2. not realizing it's already being
 worked on in exactly this fashion. =)

   The latter. :)

   One of the (many) existing problems with the parity RAID
implementation as it stands is that with large numbers of devices, it
becomes quite inefficient to write data when you (may) need to modify
dozens of devices. It's been Chris's stated intention for a while now
to allow a bound to be placed on the maximum number of devices per
stripe, which allows a degree of control over the space-yield vs.
performance knob.
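
   Back-of-the-envelope, assuming single parity and equal-sized
devices (and a hypothetical per-stripe device cap -- that knob doesn't
exist in btrfs yet):

n_devices = 20
for cap in (4, 8, 12, 20):
    width = min(cap, n_devices)          # devices one full stripe touches
    efficiency = (width - 1) / width     # data blocks / total blocks
    print(f"stripe width {width:2}: {efficiency:.0%} of raw space usable; "
          f"a small rewrite may touch up to {width} devices")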

   It's one of the reasons that the usage tool [1] has a maximum
stripes knob on it -- so that you can see the behaviour of the FS
once that feature's in place.

   Hugo.

[1] http://carfax.org.uk/btrfs-usage/

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Nothing right in my left brain. Nothing left in --- 
 my right brain. 




Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?

2014-02-13 Thread Jim Salter

That is FANTASTIC news.  Thank you for wielding the LART gently. =)

I do a fair amount of public speaking and writing about next-gen 
filesystems (example: 
http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/) 
and I will be VERY sure to talk about the upcoming divorce of stripe 
size from array size in future presentations.  This makes me positively 
giddy.


FWIW, after writing the above article I got contacted by a proprietary 
storage vendor who wanted to tell me all about his midmarket/enterprise 
product, and he was pretty audibly flummoxed when I explained how 
btrfs-RAID1 distributes data and redundancy - his product does something 
similar (to be fair, his product also does a lot of other things btrfs 
doesn't inherently do, like clustered storage and synchronous dedup), 
and he had no idea that anything freely available did anything vaguely 
like it.


I have a feeling the storage world - even the relatively well-informed 
part of it that's aware of ZFS - has little to no inkling of how 
gigantic a splash btrfs is going to make when it truly hits the 
mainstream.



This could be a pretty powerful setup IMO - if you implemented
something like this, you'd be able to arbitrarily define your
storage efficiency (percentage of parity blocks / data blocks) and
your fault-tolerance level (how many drives you can afford to lose
before failure) WITHOUT tying it directly to your underlying disks,
or necessarily needing to rebalance as you add more disks to the
array.  This would be a heck of a lot more flexible than ZFS'
approach of adding more immutable vdevs.

Please feel free to tell me why I'm dumb for either 1. not realizing
the obvious flaw in this idea or 2. not realizing it's already being
worked on in exactly this fashion. =)

The latter. :)

