So usually this should be functionality handled by the raid/san
controller I guess, but given that btrfs is playing the role of a
controller here, at what point are we drawing the line of not
implementing block-level functionality in the filesystem?
Don't worry, this is not intruding into the block layer. How
can you even build this functionality in the block layer?
The block layer won't even know that the disks are mirrored. RAID
does, or BTRFS in our case.
By block layer I guess I meant the storage driver of a particular
raid card. Because what is currently happening is re-implementing
functionality that will generally sit in the driver. So my
question was more generic and high-level: at what point do we draw
the line of implementing features that are generally implemented in
hardware devices (be it their drivers or firmware)?
Not all HW configs use RAID-capable HBAs. A server connected to
a SATA JBOD using a SATA HBA without MD will rely on BTRFS to provide
all the features and capabilities that would otherwise have been
provided by such a RAID-capable HW config.
That does sort of sound like it means implementing some portion of the
HBA features/capabilities in the filesystem.
To me it seems this could be workable at the fs level, provided it
deals just with policies and remains hardware-neutral.
Thanks. Ok.
However most
of the use cases appear to involve some hardware-dependent knowledge
or assumptions.
What happens when someone sets this on a virtual disk,
or say a (persistent) memory-backed block device?
Do you have any policy in particular ?
No, this is your proposal ;^)
Policy added here:
devid
It is about the devid, which is assigned by btrfs.
Future policy:
LBA/ssd/io/Tims-heuristic
They aren't hardware dependent, though ssd says use the ssd
disk for reading if available. LBA is to divide the read
IO access based on the sector #. The logic is quite simple:
read-sector < FS-SIZE/2 ? mirror1 : mirror2;
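As a rough illustration of the LBA rule above, here is a minimal Python
sketch. It is not btrfs code: the function name pick_mirror_lba, its
parameters, and the 0/1 mirror indices are hypothetical, and it assumes a
two-copy mirror with both values expressed in sectors.

```python
def pick_mirror_lba(read_sector: int, fs_size_sectors: int) -> int:
    """Return 0 for mirror1 or 1 for mirror2 (hypothetical indices)."""
    if fs_size_sectors <= 0:
        raise ValueError("fs_size_sectors must be positive")
    if not 0 <= read_sector < fs_size_sectors:
        raise ValueError("read_sector is outside the filesystem")
    # Integer division mirrors the C-style expression FS-SIZE/2:
    # the lower half of the address space reads from mirror1,
    # the upper half from mirror2.
    return 0 if read_sector < fs_size_sectors // 2 else 1
```

The choice depends only on the sector number and the filesystem size, so
no knowledge of the underlying hardware is needed to evaluate it.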
You've said cases #3 thru #6 are illustrative only. However they make
assumptions about the underlying storage, and/or introduce potential for
unexpected behaviors.
The assumption I am making is that the user will understand their
storage and tune this parameter accordingly, and there is a heuristic
(which Tim wrote) to do things automatically. Sometimes manual settings
provide better performance than a heuristic.
Plus they could end up replicating functionality
from other layers as Nikolay pointed out. Seems unlikely these would be
practical to implement.
The I/O one would actually be rather nice to have and wouldn't really be
duplicating anything (at least, not duplicating anything we consistently
run on top of). The pid-based selector works fine for cases where the
only thing on the disks is a single BTRFS filesystem. When there's more
than that, it can very easily result in highly asymmetrical load on the
disks because it doesn't account for current I/O load when picking a
copy to read. Last I checked, both MD and DM-RAID have at least the
option to use I/O load in determining where to send reads for RAID1
setups, and they do a far better job than BTRFS at balancing load in
these cases.
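To make the difference concrete, here is a small Python sketch contrasting
the two approaches. It is only a model: both function names are
hypothetical, the pid-based selector is assumed to reduce to the pid modulo
the number of copies, and "I/O load" is assumed to mean the count of
requests currently in flight on each device.

```python
from typing import Sequence


def pick_mirror_pid(pid: int, num_copies: int) -> int:
    # Static choice: a given process always reads from the same copy,
    # regardless of how busy the underlying device is.
    return pid % num_copies


def pick_mirror_io(inflight: Sequence[int]) -> int:
    # Dynamic choice: read from the copy whose device currently has the
    # fewest requests in flight. Ties go to the lowest index.
    return min(range(len(inflight)), key=lambda i: inflight[i])
```

With in-flight counts of [7, 2], an even pid still lands on copy 0 under
the pid rule, while the load-based rule picks copy 1.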
Yeah.. some enterprise FS and storage communicate performance
tunability automatically between each other. We will be there too.
Case #2 seems concerning if it exposes internal,
implementation-dependent filesystem data into a de facto user-level
interface. (Do we ensure the devid is unique, and cannot get changed or
re-assigned internally to a different device, etc?)
The devid gets assigned when a device is added to a filesystem, it's a
monotonically increasing number that gets incremented for every new
device, and never changes for a given device as long as it remains in
the filesystem (it will change if you remove the device and then re-add
it). The only exception to this is that the replace command will assign
the new device the same devid that the device it is replacing had (which
I would argue leads to consistent behavior here). Given that, I think
it's sufficiently safe to use it for something like this.
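The rules above can be modelled in a few lines of Python. This is a toy
model and not the kernel's data structures: the class FsDevices and its
methods are hypothetical, and starting the counter at 1 is an assumption
made for the example.

```python
class FsDevices:
    """Toy model of devid assignment within one filesystem."""

    def __init__(self) -> None:
        self.next_devid = 1   # assumed starting value
        self.devices = {}     # devid -> device path

    def add(self, path: str) -> int:
        # Every added device gets the next, never reused, devid.
        devid = self.next_devid
        self.next_devid += 1
        self.devices[devid] = path
        return devid

    def remove(self, devid: int) -> None:
        del self.devices[devid]

    def replace(self, devid: int, new_path: str) -> int:
        # The replacement device inherits the devid of the old one.
        if devid not in self.devices:
            raise KeyError(devid)
        self.devices[devid] = new_path
        return devid
```

In this model, removing a device and adding it back yields a new devid,
while replace keeps the devid stable, so a policy keyed on devid keeps
pointing at the same slot in the filesystem.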
Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html