On 2018-02-01 18:46, Edmund Nadolski wrote:
On 02/01/2018 01:12 AM, Anand Jain wrote:
On 02/01/2018 01:26 PM, Edmund Nadolski wrote:
On 1/31/18 7:36 AM, Anand Jain wrote:
On 01/31/2018 09:42 PM, Nikolay Borisov wrote:
So usually this should be functionality handled by the raid/san
controller I guess, but given that btrfs is playing the role of a
controller here, at what point are we drawing the line of not
implementing block-level functionality in the filesystem?
Don't worry, this is not invading the block layer. How
can you even build this functionality in the block layer?
The block layer won't even know that the disks are mirrored. RAID
does, or BTRFS in our case.
By block layer I guess I meant the storage driver of a particular raid
card. Because what is currently happening is re-implementing
functionality that will generally sit in the driver. So my question was
more generic and high-level - at what point do we draw the line of
implementing features that are generally implemented in hardware devices
(be it their drivers or firmware).
Not all HW configs use RAID-capable HBAs. A server connected to a SATA
JBOD using a SATA HBA without MD will rely on BTRFS to provide all the
features and capabilities that would otherwise have been provided by
such a presumed HW config.
That does sort of sound like it means implementing some portion of the
HBA features/capabilities in the filesystem.
To me it seems this could be workable at the fs level, provided it
deals just with policies and remains hardware-neutral.
Thanks. Ok.
However most
of the use cases appear to involve some hardware-dependent knowledge
or assumptions.
What happens when someone sets this on a virtual disk,
or say a (persistent) memory-backed block device?
Do you have any policy in particular?
No, this is your proposal ;^)
You've said cases #3 thru #6 are illustrative only. However they make
assumptions about the underlying storage, and/or introduce potential for
unexpected behaviors. Plus they could end up replicating functionality
from other layers as Nikolay pointed out. Seems unlikely these would be
practical to implement.
The I/O one would actually be rather nice to have and wouldn't really be
duplicating anything (at least, not duplicating anything we consistently
run on top of). The pid-based selector works fine for cases where the
only thing on the disks is a single BTRFS filesystem. When there's more
than that, it can very easily result in highly asymmetrical load on the
disks because it doesn't account for current I/O load when picking a
copy to read. Last I checked, both MD and DM-RAID have at least the
option to use I/O load in determining where to send reads for RAID1
setups, and they do a far better job than BTRFS at balancing load in
these cases.
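
To make the difference concrete, here is a minimal sketch contrasting the
two selection strategies. It is not the kernel code: the Mirror type, the
inflight counter and both function names are hypothetical, and the pid
modulo number-of-copies rule is an assumption about how a pid-based
selector behaves.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    devid: int
    inflight: int  # hypothetical count of requests outstanding on this device


def pick_by_pid(mirrors, pid):
    # Pid-based selection (assumed rule): reader's pid modulo number of copies.
    # The device's current load plays no part in the choice.
    return mirrors[pid % len(mirrors)]


def pick_by_load(mirrors):
    # Load-based selection: the copy whose device has the fewest requests
    # in flight, with devid as a tie-breaker so the result is deterministic.
    return min(mirrors, key=lambda m: (m.inflight, m.devid))


# devid 1 is busy with unrelated I/O (e.g. another filesystem on the same
# disk), while devid 2 is nearly idle.
mirrors = [Mirror(devid=1, inflight=40), Mirror(devid=2, inflight=2)]

by_pid = pick_by_pid(mirrors, pid=1000)  # 1000 % 2 == 0, so the busy disk
by_load = pick_by_load(mirrors)          # the nearly idle disk

print(by_pid.devid, by_load.devid)
```

With these made-up numbers the pid-based choice keeps sending the reader to
the already loaded disk, while the load-based choice goes to the idle one.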
Case #2 seems concerning if it exposes internal,
implementation-dependent filesystem data into a de facto user-level
interface. (Do we ensure the devid is unique, and cannot get changed or
re-assigned internally to a different device, etc?)
The devid gets assigned when a device is added to a filesystem; it's a
monotonically increasing number that gets incremented for every new
device, and never changes for a given device as long as it remains in
the filesystem (it will change if you remove the device and then re-add
it). The only exception to this is that the replace command will assign
the new device the same devid that the device it is replacing had (which
I would argue leads to consistent behavior here). Given that, I think
it's sufficiently safe to use it for something like this.
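
The rules above can be modelled in a few lines. This is only a toy model of
the behaviour as described in this mail, not the kernel implementation; the
Filesystem class and its method names are hypothetical.

```python
class Filesystem:
    """Toy model of devid assignment as described above (hypothetical API)."""

    def __init__(self):
        self.next_devid = 1
        self.devices = {}  # devid -> device path

    def add(self, path):
        # A new device always gets the next number; numbers are never reused.
        devid = self.next_devid
        self.next_devid += 1
        self.devices[devid] = path
        return devid

    def remove(self, devid):
        del self.devices[devid]

    def replace(self, devid, new_path):
        # The replacement inherits the devid of the device it replaces.
        self.devices[devid] = new_path
        return devid


fs = Filesystem()
sda = fs.add("/dev/sda")                  # first device
sdb = fs.add("/dev/sdb")                  # second device
fs.remove(sda)
sda_again = fs.add("/dev/sda")            # re-added device gets a new devid
replaced = fs.replace(sdb, "/dev/sdc")    # replacement keeps the old devid

print(sda, sdb, sda_again, replaced)
```

In this model a devid stays attached to one slot in the filesystem for as
long as that slot exists, which is the property a devid-based policy would
depend on.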