On Sun, 2021-01-03 at 16:16 -0500, Colin Walters wrote:
> 
> On Sat, Jan 2, 2021, at 10:03 AM, Zbigniew Jędrzejewski-Szmek wrote:
> 
> > I fail to see why this would be significantly better...
> 
> I don't claim that the "separate temporary directory of unpacked
> content" is *better* - just that it's as easy to implement *and*
> doesn't require an RPM format change (with all the consequent pain)
> or support for reflinks from the underlying filesystem.
> 
> >  The logic to
> > handle the split rpm contents would seem to be more complicated
> > than the
> > rewrite with /usr/bin/rpm2extents. Other comments?
> 
> Hard to really say for sure I guess without trying to write
> both.  Probably the biggest impediment is that changes like that
> would end up needing to be split across the librpm + zypper/rpm-
> ostree/dnf tools.  It wasn't an accident really that for rpm-ostree
> /usr/bin/rpm is read-only - we effectively squash those layers
> together and can thus make deep changes as a single unit.
> 
> Anyways, none of this really *requires* reflinks in any way and so
> calling the Change "RPMCoW" is misleading from that
> perspective.  "DnfParallelUnpack" would probably be a better title,
> with a dependency on "RPMFormatCowReady" or something.  And then my
> point is that one could do "DnfParallelUnpack" without changing the
> RPM format without much more complexity, if any.

Early on in this project I looked at creating all the files in a
temporary directory during download. It would work, and it is more
filesystem-agnostic. If moving decompression to an earlier step were
the sole goal, that would be a reasonable approach.

The goal of RPMCoW is to write once, and re-use data multiple times.
This comes up in a number of circumstances for this proposal:

1. Reflinking allows for de-duplication of file content. Today this
   works only within a single rpm. I am looking at changing
   rpm2extents to reuse data across (cached) rpms to achieve something
   like delta rpms. That is: if you already have file X, you don't
   write it again, you clone it from any other rpm that contains it.
2. Reflinking allows sharing of file contents without side effects
   from the installed copy. Each copy is a real, distinct file that
   can be deleted and/or modified. Only the differences cost space,
   and 99% of rpm files don't get modified. The net result is that
   the rpm cache costs very little.
3. If you can keep an rpm cache, you can reuse the data very quickly,
   either to build a new rootfs in a subdir/subvolume with the same or
   different packages, or to provide files for containers. This sounds
   similar to using snapshots, but with snapshots you're operating on
   a whole filesystem at a time, and you can only go backwards. Here
   you can decide what you want, and you get maximum reuse
   automatically.
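
To make the "write once, reuse many times" idea in the three points
above a bit more concrete, here is a rough sketch in Python. It is not
how rpm2extents or rpm actually work; ContentCache and clone_or_copy
are hypothetical names made up for illustration. The only real
interface used is the Linux FICLONE ioctl (the same one behind
"cp --reflink"), and the sketch assumes a plain copy is an acceptable
fallback on filesystems without reflink support.

```python
import fcntl
import hashlib
import os
import shutil

# Linux ioctl request for a whole-file reflink clone. This is the value
# on x86 and arm; a few architectures encode ioctl numbers differently.
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Reflink src to dst if possible, otherwise fall back to a copy.

    Returns True when the data is shared via reflink, False when copied.
    """
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            # Filesystem without reflink support, or src and dst are on
            # different filesystems: pay for a real copy instead.
            shutil.copyfileobj(s, d)
            return False


class ContentCache:
    """Hypothetical index from content digest to one cached file."""

    def __init__(self):
        self.by_digest = {}

    def add(self, path):
        """Register an already-unpacked file; the first copy wins."""
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        self.by_digest.setdefault(digest, path)
        return digest

    def materialize(self, digest, dst):
        """Create dst from cached data, with no re-fetch or re-decompress."""
        parent = os.path.dirname(dst)
        if parent:
            os.makedirs(parent, exist_ok=True)
        return clone_or_copy(self.by_digest[digest], dst)
```

With something like this, building a second rootfs is a loop of
materialize() calls over the digests of the files you want. On a
reflink-capable filesystem each call shares extents with the cache,
and modifying the installed file afterwards does not touch the cached
copy.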

By contrast, "DnfParallelUnpack" by itself, without CoW, is less useful
because you will need to re-fetch and re-decompress the data.

Lastly, I'd like to emphasize that I'm not trying to change the "normal
rpm format". Doing so would orphan every previously built and signed
rpm, and would present a serious backward compatibility problem. I aim
to change only how rpms are downloaded, stored in the local cache, and
consumed by rpm itself, and only on hosts that can (and do) enable
this.

- Matthew
