On 11/23/22 at 10:27pm, Allan McRae wrote:
> The idea of package deltas just won't go away...  However, binary diffs
> really are not ideal with pacman verifying the compressed package - that
> means we need to reconstruct the package on the users system to verify. Also
> our old approach using xdelta3 somewhat died when moving packages away from
> gz (or xz?) compression.  Other binary diff approaches really suffered the
> same issue.  In general, I find the approach of reconstructing the full
> package to be suboptimal.  I also don't particularly want to verify
> uncompressed packages.
> 
> 
> I wondered if this was a case of perfect being the enemy of good, so I have
> investigated a different, very lazy approach.  Instead of taking a binary
> diff, we could just provide the files that have changed between package
> versions.  This is super easy to do as we have checksums for all files in
> the mtree file.  We could then extract this "diff" package directly, and use
> the mtree file to adjust timestamps/permissions/etc(?) on kept files, and it
> would be just like the full package had been installed.
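
To make the file-selection step concrete, here is a minimal Python sketch of how the contents of such a diff package could be derived from the old and new mtree data. It assumes a simplified, already-decompressed mtree text (only `/set`, `/unset` and `key=value` fields, no escaped paths) and compares `sha256digest` values for regular files only. The helper names (`parse_mtree`, `diff_manifests`, `metadata_fixups`) and the sample entries and digests are hypothetical, not anything that exists in pacman or makepkg.

```python
# Metadata that would have to be reapplied from the new mtree to kept files.
METADATA_KEYS = ("time", "mode", "uid", "gid")


def parse_mtree(text):
    """Parse a simplified, uncompressed mtree listing into {path: attributes}.

    Assumption: only '/set', '/unset' and 'key=value' fields appear; real
    .MTREE files are gzip-compressed and may escape special characters.
    """
    defaults = {}
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment such as '#mtree'
        fields = line.split()
        if fields[0] == "/set":
            defaults.update(kv.split("=", 1) for kv in fields[1:])
        elif fields[0] == "/unset":
            for key in fields[1:]:
                if key == "all":
                    defaults.clear()
                else:
                    defaults.pop(key, None)
        else:
            attrs = dict(defaults)  # per-entry keys override /set defaults
            attrs.update(kv.split("=", 1) for kv in fields[1:])
            entries[fields[0]] = attrs
    return entries


def diff_manifests(old, new):
    """Split the new package's regular files into changed and kept lists."""
    changed, kept = [], []
    for path, attrs in sorted(new.items()):
        if attrs.get("type", "file") != "file":
            continue  # this sketch ignores directories and symlinks
        old_digest = old.get(path, {}).get("sha256digest")
        new_digest = attrs.get("sha256digest")
        if new_digest is not None and old_digest == new_digest:
            kept.append(path)  # same checksum: leave the installed copy alone
        else:
            changed.append(path)  # new or modified: ship it in the diff package
    removed = sorted(p for p in old if p not in new)
    return changed, kept, removed


def metadata_fixups(old, new, kept):
    """For kept files, return only the metadata that differs in the new mtree."""
    fixups = {}
    for path in kept:
        delta = {k: new[path].get(k) for k in METADATA_KEYS
                 if new[path].get(k) != old[path].get(k)}
        if delta:
            fixups[path] = delta
    return fixups


# Hypothetical sample data with placeholder digests.
OLD_MTREE = """#mtree
/set type=file uid=0 gid=0 mode=644
./usr/bin/bash time=1 mode=755 sha256digest=b1
./usr/share/doc/bash/README time=1 sha256digest=c1
./usr/share/doc/bash/OLD time=1 sha256digest=d1
"""

NEW_MTREE = """#mtree
/set type=file uid=0 gid=0 mode=644
./usr/bin/bash time=2 mode=755 sha256digest=b2
./usr/share/doc/bash/README time=2 sha256digest=c1
./usr/share/doc/bash/NEWS time=2 sha256digest=e1
"""

old, new = parse_mtree(OLD_MTREE), parse_mtree(NEW_MTREE)
changed, kept, removed = diff_manifests(old, new)
fixups = metadata_fixups(old, new, kept)
```

Here only `bash` and the new `NEWS` file would go into the diff package, `README` would stay on disk with its timestamp updated from the mtree, and `OLD` would be removed. A real implementation would also have to recreate directories and symlinks from their mtree entries.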

As I understand your intended approach, operations using a diff package would
be fundamentally different from those involving a full package.  Files changed
on the system but unchanged in the package would not be restored.  Once
upgraded, the cached diff package would be useless for
reinstallation/downgrading without first downgrading to the previous version
and then upgrading again using the diff. `pacman -S foo` to reinstall would no
longer work without downloading the full package.
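
Those caveats can be illustrated with a small, hypothetical model in which plain Python dicts stand in for the filesystem; `DiffPackage` and `apply_diff` are made up for this example and are not pacman structures. Extracting a diff only touches the paths it carries, so a locally modified file that did not change between versions keeps its local contents, and the cached diff cannot be reapplied once its base version is no longer installed.

```python
from dataclasses import dataclass, field


@dataclass
class DiffPackage:
    """Hypothetical in-memory model of a diff package (not a real pacman type)."""
    name: str
    base_version: str  # the only installed version this diff can be applied to
    version: str  # the version the system ends up at
    changed: dict  # path -> new contents, for files whose checksum changed
    removed: list = field(default_factory=list)  # paths dropped in the new version


def apply_diff(installed_version, system_files, diff):
    """Return (new_version, new_files) after extracting a diff package.

    Only paths listed in the diff are touched, so files that are unchanged in
    the package but modified locally keep their local contents.
    """
    if installed_version != diff.base_version:
        raise ValueError(
            f"{diff.name}: diff requires {diff.base_version} installed, "
            f"found {installed_version}"
        )
    result = dict(system_files)
    for path in diff.removed:
        result.pop(path, None)
    result.update(diff.changed)
    return diff.version, result


diff = DiffPackage(
    name="foo",
    base_version="1.0-1",
    version="1.0-2",
    changed={"/usr/bin/foo": "foo binary 1.0-2"},
    removed=["/usr/share/foo/obsolete"],
)
system = {
    "/usr/bin/foo": "foo binary 1.0-1",
    "/usr/share/foo/data": "locally edited",  # unchanged in the package, edited by the user
    "/usr/share/foo/obsolete": "old file",
}
version, system = apply_diff("1.0-1", system, diff)
# version == "1.0-2" and "/usr/share/foo/data" is still "locally edited"

reinstall_error = None
try:
    apply_diff(version, system, diff)  # attempt a "reinstall" from the cached diff
except ValueError as err:
    reinstall_error = str(err)
```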

It's not ideal, but I think those are reasonable caveats.  People generally
shouldn't be messing with non-backup files anyway and as long as they manage
their cache properly, reinstallation and downgrading using the cache are still
possible.

> I ran some numbers to see if this was worthwhile.  The results for the last
> bunch of updates for bash, coreutils, qt5-base and systemd are given here:
> https://wiki.archlinux.org/title/User:Allan/Pkgdiff
> 
> On major version updates, this approach is a waste of time.  But for
> minor updates, the bash download would average 25% of the size, coreutils
> about 36% (though it was ~1% for simple rebuilds!), qt5-base about 40% and
> systemd 60%.  Not shown, but worth noting: when Arch changes gcc/binutils
> versions or updates CFLAGS etc., this can stop any binary diff from being as
> useful.
> 
> 
> If we implemented using these diffs but only allowed it for updates from the
> previous package version (i.e. no diffs to package (current - 2) or earlier,
> or diff chaining), then this would be rather simple to implement (at least
> from the pacman side...).

I agree with no diff chaining; keeping them as separate partial packages
instead of reconstructing a full package would make chaining a little
complicated.  I'm not sure about the previous-version-only rule though.  The db
is going to have to know the base version for the partial package either way,
so the cost of supporting multiple bases seems low as far as we're concerned;
just a simple search through the available partial files for one based on the
currently installed version.
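
A sketch of that lookup, assuming the sync db lists each partial package with its base and target versions (the `DiffEntry` record, the `.pkgdiff` filenames and `choose_diff` are hypothetical). With the previous-version-only rule the list would simply never hold more than one entry, so supporting several bases costs no extra logic.

```python
from typing import List, NamedTuple, Optional


class DiffEntry(NamedTuple):
    """Hypothetical sync-db record describing one partial package."""
    filename: str
    base_version: str
    version: str


def choose_diff(installed: Optional[str], target: str,
                diffs: List[DiffEntry]) -> Optional[DiffEntry]:
    """Pick a diff built on the installed version, or None to use the full package."""
    if installed is None or installed == target:
        # Fresh install or reinstall: a diff only moves from its base to a newer version.
        return None
    for entry in diffs:
        if entry.version == target and entry.base_version == installed:
            return entry
    # No diff against this base (and no chaining): download the full package.
    return None


available = [
    DiffEntry("foo-1.0-1-to-1.0-3.pkgdiff", "1.0-1", "1.0-3"),
    DiffEntry("foo-1.0-2-to-1.0-3.pkgdiff", "1.0-2", "1.0-3"),
]
```

For example, `choose_diff("1.0-2", "1.0-3", available)` returns the second entry, while an installed `0.9-1` or a reinstall of `1.0-3` returns `None` and falls back to the full package.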
