On 2017-11-03 03:42, Kai Krakow wrote:
On Tue, 31 Oct 2017 07:28:58 -0400,
"Austin S. Hemmelgarn" <ahferro...@gmail.com> wrote:
On 2017-10-31 01:57, Marat Khalili wrote:
On 31/10/17 00:37, Chris Murphy wrote:
But off hand it sounds like the hardware was sabotaging the expected
write ordering. A way to test a given hardware setup for that is, I
think, really overdue. It affects literally every file system and
Linux storage technology.
It sounds to me like something other than the supers is being
overwritten too soon, and that's why none of the backup roots can
find a valid root tree: all four possible root trees either haven't
actually been written yet (still), or they've been overwritten even
though the super was updated. But again, that's speculation; we
don't actually know why your system was no longer mountable.
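The failure mode being speculated about here can be sketched abstractly. Below is a toy Python model (all names and the boolean 'valid' flag are invented for illustration; real btrfs validates checksums, generations, and tree structure in the kernel): mount tries the super's current root first, then falls back through the four backup root slots, and the filesystem is unmountable only when every one of them is bad:

```python
# Hypothetical sketch of the backup-root fallback described above.
# A "root" is modeled as a dict with a generation number and a
# 'valid' flag; real btrfs does actual on-disk validation instead.

def try_mount(current_root, backup_roots):
    """Return the first usable root, trying backups newest-first."""
    candidates = [current_root] + sorted(
        backup_roots, key=lambda r: r["generation"], reverse=True
    )
    for root in candidates:
        if root["valid"]:
            return root
    # This is the failure mode from the thread: the super was updated,
    # but every root it points at was overwritten or discarded.
    raise OSError("no usable root tree: filesystem unmountable")

current = {"generation": 100, "valid": False}   # overwritten too soon
backups = [
    {"generation": 99, "valid": False},  # already overwritten
    {"generation": 98, "valid": True},   # still intact
    {"generation": 97, "valid": False},
    {"generation": 96, "valid": False},
]
print(try_mount(current, backups)["generation"])  # -> 98
```

If all four backup slots are gone as well (e.g. discarded, as discussed below), the fallback has nothing left to try.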
Just a detached view: I know hardware should respect
ordering/barriers and such, but how hard would it really be to avoid
overwriting at least one complete metadata tree for half an hour
(or, even better, yet another one for a day)? Just metadata, not
data extents.
If you're running on an SSD (or thinly provisioned storage, or
anything else that supports discards) and have the 'discard' mount
option enabled, then there is no backup metadata tree, because it
has already been discarded (this issue was mentioned on the list a
while ago, but nobody ever replied). Ideally this should be
addressed (we need some sort of discard queue for handling in-line
discards), but it's not easy to do.
Otherwise, it becomes a question of space usage on the filesystem,
which is just another reason to keep some extra slack space on the
FS (while that doesn't help _much_, it does help). This could in
theory be addressed, but it probably can't be made to work across
mounts of a filesystem without an on-disk format change.
Well, maybe inline discard is working at the wrong level. It should
kick in once the last backup root referencing a block has dropped
it, not as soon as the current tree instance drops it.
Indeed.
Without knowledge of the internals, I guess discards could be added to
a queue within a new tree in btrfs, and only added to that queue when
dropped from the last backup root referencing it. But this will
probably add some bad performance spikes.
Inline discards can already cause bad performance spikes.
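Kai's queued-discard idea can be sketched in a few lines of illustrative Python (the class and its API are hypothetical, not btrfs code): a block's discard is deferred until the last backup root referencing it drops its reference, and the deferred discards are then issued in a batch, which is also where the feared performance spikes would come from:

```python
from collections import deque

class DeferredDiscardQueue:
    """Toy model of the proposed scheme (invented API): defer a
    block's discard until no backup root references it anymore."""

    def __init__(self):
        self.refcount = {}       # block -> number of roots referencing it
        self.pending = deque()   # blocks now safe to discard

    def add_ref(self, block):
        self.refcount[block] = self.refcount.get(block, 0) + 1

    def drop_ref(self, block):
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            del self.refcount[block]
            self.pending.append(block)   # only now is a discard safe

    def flush(self):
        """Issue all queued discards at once (the spike risk)."""
        issued = list(self.pending)
        self.pending.clear()
        return issued

q = DeferredDiscardQueue()
for _ in range(3):           # current root + two backup roots reference block 7
    q.add_ref(7)
q.drop_ref(7)                # current root drops it: still referenced
q.drop_ref(7)
assert q.flush() == []       # not discardable yet
q.drop_ref(7)                # last backup root rotates out
assert q.flush() == [7]      # only now does the discard go out
```

Persisting such a queue in its own tree, as suggested, would be what makes it survive across mounts.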
I wonder how a regular fstrim run through cron applies to this problem?
You still functionally lose any old (freed) trees; they just stay
around on disk, potentially recoverable, until you actually run
fstrim.
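The trade-off between inline discard and a periodic fstrim, as described above, in a tiny illustrative model (hypothetical Python, not real device or btrfs behavior): with inline discard a freed tree block is gone immediately, while with batched trimming it merely sits in the free pool, still readable, until the next trim run:

```python
# Toy model of the discard-timing trade-off discussed above.

class Device:
    def __init__(self, inline_discard):
        self.inline_discard = inline_discard
        self.blocks = {}     # block -> payload still readable on disk
        self.freed = set()   # freed but not yet discarded

    def write(self, block, payload):
        self.blocks[block] = payload

    def free(self, block):
        if self.inline_discard:
            self.blocks.pop(block, None)   # gone immediately
        else:
            self.freed.add(block)          # recoverable until fstrim

    def fstrim(self):
        for block in self.freed:
            self.blocks.pop(block, None)
        self.freed.clear()

    def recoverable(self, block):
        return block in self.blocks

inline = Device(inline_discard=True)
batched = Device(inline_discard=False)
for dev in (inline, batched):
    dev.write(1, "old root tree")
    dev.free(1)

print(inline.recoverable(1))   # False: inline discard ate the old tree
print(batched.recoverable(1))  # True: survives until the scheduled fstrim
batched.fstrim()
print(batched.recoverable(1))  # False
```

So a cron'd (or timer-driven) fstrim keeps a recovery window open that the 'discard' mount option closes instantly.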