On 29.03.2021 22:14, Claudius Heine wrote:
> Hi Andrei,
> 
> On 2021-03-29 18:30, Andrei Borzenkov wrote:
>> On 29.03.2021 16:16, Claudius Heine wrote:
>>> Hi,
>>>
>>> I am currently investigating the possibility to use `btrfs-stream` files
>>> (generated by `btrfs send`) for deploying an image-based update to
>>> systems (probably embedded ones).
>>>
>>> One of the issues I encountered here is that btrfs-send does not use any
>>> diff algorithm on files that have changed from one snapshot to the next.
>>>
>>
>> btrfs send works on block level. It sends blocks that differ between two
>> snapshots.
> 
> Are you sure?
> 

Yes.

> I did a test with a 32MiB random file. I created one snapshot, then
> changed (not deleted or added) one byte in that file and then created a
> snapshot again. `btrfs send` created a >32MiB `btrfs-stream` file. If it
> would be only block based, then I would have expected that it would just
> contain the changed block, not the whole file. And if I use a smaller
> file on the same file system, then the `btrfs-stream` is smaller as well.
> 
> I looked into those `btrfs-stream` files using [1] and also [2] as well
> as the code. While I haven't understood everything there yet, it
> currently looks to me like it is file based.
> 

btrfs send is not a pure block-based image, because that would require
two absolutely identical filesystems. It needs to replicate the
filesystem structure, so of course it needs to know which files are
created or deleted. But for each file it sends only the parts that
changed since the previous snapshot. This only works if both snapshots
refer to the *same* file.
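
As a toy model of that (not btrfs code; the one-extent-per-block layout
and all function names are assumptions made up for illustration), treat
a file as a list of extent ids. A snapshot shares the ids, a
copy-on-write update allocates a new id for the touched block, and the
delta is whatever the child references that the parent does not:

```python
import itertools

# Hypothetical allocator: every newly written block gets a fresh extent id.
_next_extent = itertools.count(1)


def new_file(n_blocks):
    """Toy file: one extent id per block, all freshly allocated."""
    return [next(_next_extent) for _ in range(n_blocks)]


def snapshot(file_map):
    """A snapshot shares the extents of the file, so it copies only the ids."""
    return list(file_map)


def write_block(file_map, index):
    """Copy-on-write update: the touched block now points at a new extent."""
    file_map[index] = next(_next_extent)


def delta(parent, child):
    """Block indexes of `child` whose extent is not shared with `parent`."""
    return [
        i
        for i, extent in enumerate(child)
        if i >= len(parent) or parent[i] != extent
    ]
```

In this model a file updated in place between two snapshots yields a
delta of only the touched blocks, while a file that was written out
again from scratch shares no extent with the old one, so the delta
covers every block.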

As was already mentioned, you need to understand how your files are
changed. In particular, standard tools for software updates do not
rewrite files in place - they create new files with new content. From
the btrfs perspective these are completely different files; two files
with the same name in two snapshots do not share a single byte. When you
compute the delta between two snapshots you get instructions to delete
the old file and create a new file with the new content (which is then
renamed to the name of the deleted old file). By necessity this also
sends the full new content.
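
The two update styles can be sketched like this in Python (the function
names and the `.tmp` suffix are made up for illustration, this is not
what any particular update tool does). The first keeps the same file,
the second leaves a different file under the old name:

```python
import os


def update_in_place(path, offset, data):
    """Overwrite bytes inside the existing file; the inode stays the same."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def update_by_replace(path, offset, data):
    """Write a complete new file, then rename it over the old name."""
    with open(path, "rb") as f:
        content = bytearray(f.read())
    content[offset:offset + len(data)] = data
    tmp = path + ".tmp"  # hypothetical temporary name
    with open(tmp, "wb") as f:
        f.write(content)
    os.replace(tmp, path)  # atomic rename; the old inode is dropped


def inode(path):
    """Inode number, used here to tell whether it is still the same file."""
    return os.stat(path).st_ino
```

Comparing `inode(path)` before and after each call shows the
difference: it is unchanged after `update_in_place` and different after
`update_by_replace`, even though the name and almost all of the content
are identical.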

So yes, btrfs replication is block based; similarity is determined by
how much physical data is shared between two files. What you expect is
file-based replication, where file names determine whether two files
are considered the same, and changes are computed between files with
the same name.

>>
>>> One way to implement this would be to add some sort of 'patch' command
>>> to the `btrfs-stream` format.
>>>
>>
>> This would require reading complete content of both snapshots instead of
>> just computing block diff using metadata. Unless I misunderstand what
>> you mean.
> I think I should only need access to the old snapshot as well as the
> `btrfs-stream` file. But I currently don't have a complete PoC of this
> ready.
> 
> regards,
> Claudius
> 
> [1] https://github.com/sysnux/btrfs-snapshots-diff
> [2] https://btrfs.wiki.kernel.org/index.php/Design_notes_on_Send/Receive
