On Tue, 2021-01-12 at 11:27 +0000, Filipe Manana wrote: > You get all these issues because you are using an incremental send in > a way it's not supposed to. > You are using two subvolumes that are completely unrelated. Yes, I am aware of that and know that this is not a designed for use case.
> My surprise here is that we actually allow a user to try that, > instead > of giving an error complaining that subvol1 and subvol2 aren't > snapshots of the same subvolume. This could be one way of solving it. But this is already harder than it sounds, since the same issue may also happen *even when* the subvolumes share a parent/are related, here an example reproducer: btrfs subvolume create subvol1 btrfs subvolume snapshot subvol1 subvol2 mkdir subvol1/a echo foo > subvol2/a btrfs property set subvol1 ro true btrfs property set subvol2 ro true btrfs send -p subvol1 subvol2 | btrfs receive --dump This will produce a stream that tries to write data into the cloned directory inode. So we not only had to check the parent relationships but also ensure that the descendant snapshot was not modified, i.e. was never set to ro=false. As far as I know there is no flag tracking this on-disk? So this would need additional work, too. > An incremental is supposed to work on snapshots of the same subvolume > (hence, have the same parent uuid). > That's what the entire code relies on to work correctly, and that's > what makes sense - to compute and send the differences between two > points in time of a subvolume. > That's why the code base assumes that inodes with the same number and > same generation must refer to the same inode in both the parent and > the send root. > > What I think that needs to be answered is: > > 1) Are there actually people using incremental sends in that way? > (It's the first time I see such use case) Well, I did (: But admittedly in a kind of experimental setup testing out the limits of btrfs. > 2) If so, why? That is completely unreliable, not only it can lead to > failure to apply the streams, but can result in all sorts of > weirdness > (logical inconsistencies, etc) if applying such streams doesn't cause > an error. In my case I was moving around subvolumes between multiple disks and deduplicating as much as possible. Trying to preserve already done deduplication and purging some intermediate subvolumes I ended up using "unrelated" subvolumes as parents. Thinking of it, maybe just using them as clone sources would have just worked? That would then of course produce much larger streams since *all* meta data had to be transfered. > 3) Making sure such use cases work reliably would require many, many > changes to the send implementation, as it goes against what it > currently expects. > Snapshot a subvolume, change the subvolume, snapshot it again, > then use both snapshots for the incremental send, that's the expected > scenario. I actually don't think that it is really that much work since besides from some edge cases it already *does* work - I tried ;) > In other words, what I think we should have is a check that forbids > using two roots for an incremental send that are not snapshots of the > same subvolume (have different parent uuids). I'd like to argue against that: 1. I don't think allowing this requires that much work 2. explicitly forbiding it requires work, too (and maybe even changes to the on-disk format?) 3. fixing bugs for this unexpected use case will probably also fix bugs for the expected scenario which may only happen in very rare and extremly unlikely - though still possible - cases and thus make the code overall more resilient: For example the assumption that inodes with the same number must refer to the same inode doesn't even hold for direct snapshots - almost always it does but in some rare conditions it doesn't, which is why there already is an additional check for that. This already caused bugs before. And the bug I hit with my unexpected use of btrfs-send and originally set out to find you had just fixed a few weeks before [1], i.e. if you hadn't fixed it I would - because of the unexpected use. The bug my patch fixes I only discovered by reading the code and having my scenario in mind. [1] https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/commit/?id=0b3f407e6728d990ae1630a02c7b952c21c288d3 That's why I think the work fixing arising bugs for that case is better invested than in trying to just block it completely. But given its very rare occurence I would agree to not put any priority on it. > Thanks. >