Hi!

On Mon, 2023-04-03 at 14:02:13 +0200, Helmut Grohne wrote:
> I have been looking into the aliasing problems in dpkg on behalf of
> Freexian's Debian funding. To that end I proposed a possible way forward
> last year (https://lists.debian.org/debian-dpkg/2022/11/msg00007.html),
> but the feedback I got was not particularly helpful in determining
> consensus.

I thought my reply was rather clear, and that we had further clarified
it privately, so at the time I thought no other answer was required, as
(AFAIR) you stated you'd be digging further into it. I did mention I'd
try to reply to the list, but that didn't feel urgent given those
clarifications and the timing during the freeze.

> A little later, Simon Richter also looked into the problem
> (https://lists.debian.org/debian-dpkg/2022/12/msg00023.html), but
> remained silent after the initial post. Little happened since then. Now
> Raphael Hertzog proposed to use the DEP process to get this thing
> unstuck

Sigh, a DEP(!?), for a dpkg change? It feels more like a way to exert
pressure over this than anything else TBH…

> and with the help of Emilio Pozuelo Monfort I created a draft
> for discussion. I allocate number 17 via debian-project@l.d.o.  What
> follows is the draft text. Please consider it to be a piece of best
> intentions at reconciling feedback wherever I could.

I'm unlikely to discuss this topic on debian-devel, given previous
nastiness and abuse.

The text includes most (but not all) of what I've been saying publicly,
and what I've tried to further clarify to you and Emilio in private.
But I think it ignores the essence of what I've been repeating all along.

> Introduction
> ============
> 
> At its core, `dpkg` assumes that every filename uniquely refers to a
> file on disk.  The situation where two distinct filenames refer to the
> same file on disk is referred to as aliasing.

(To be precise, I think this describes hardlinks. Aliasing occurs when
different pathnames, none of whose last component is a symlink, all
refer to the same filename in the same directory. But I don't think this
matters much.)

> Proposal
> ========
> 
> In order to handle aliasing efficiently, `dpkg` gains new options
> `--add-alias <symlink>`, `--remove-alias <symlink>` and
> `--list-aliases`.  When creating symbolic links that cause aliasing
> effects, the creating entity is supposed to inform `dpkg` using an
> appropriate invocation.  Doing so records the aliasing information in a
> new mapping inside its administrative directory.  No existing
> administrative files are modified as a result of this operation.  When
> `dpkg` operates on paths, it can compute a canonicalized version using a
> pure function without the need to `stat()` files on disk thus greatly
> improving performance.  Canonicalized paths are only needed when
> determining whether a file conflict exists.  In all other cases,
> original paths continue to be used as symbolic links will be followed by
> filesystem operations.  The `--add-alias` operation records the target
> of the symbolic link that must exist prior to invocation.  The
> `--remove-alias` operation fails if any files are still installed in the
> aliased location.

I already mentioned this in my reply to the thread you reference. So,
let me repeat and possibly expand on it to avoid any future doubt. I
already considered and discarded something like this (except for using a
config option instead of a new command, but that does not really change
the substance of the problems).
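
To make concrete what is being discussed: the canonicalization in the
quoted proposal could be sketched roughly as below. This is only an
illustration, not dpkg code; the alias map format and the names
`canonicalize`, `conflicts` and `ALIASES` are assumptions made up for
this sketch.

```python
import posixpath

# Hypothetical recorded mapping: aliasing symlink -> directory it points to.
ALIASES = {"/bin": "/usr/bin", "/lib": "/usr/lib"}


def canonicalize(path, aliases):
    """Rewrite a pathname through the alias map without touching the disk."""
    path = posixpath.normpath(path)
    # Bound the number of rewrites so a cyclic map cannot loop forever.
    for _ in range(len(aliases) + 1):
        for link, target in aliases.items():
            # Match whole path components only ("/bin" must not match "/binary").
            if path == link or path.startswith(link + "/"):
                path = target + path[len(link):]
                break
        else:
            # No alias applied, the pathname is canonical.
            return path
    raise ValueError("alias map contains a cycle")


def conflicts(path_a, path_b, aliases):
    """Two pathnames conflict if they canonicalize to the same pathname."""
    return canonicalize(path_a, aliases) == canonicalize(path_b, aliases)
```

With such a map, `/bin/sh` and `/usr/bin/sh` compare as the same
pathname, but only as long as the map matches what is actually on disk.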

Let's also get back to the very basics. dpkg manages objects shipped
in binary packages, on the filesystem. It assumes this managing role
exclusively; it will, for example, overwrite unmanaged files. It preserves
admin changes with interfaces specifically provided for that (diversions,
statoverrides, conffile changes) or the unfortunate symlink redirects.
These shipped objects define the filesystem layout (not the other way
around). Because fsys metadata tracking is missing, dpkg does not always
have all such metadata at hand when it needs it (it might only have the
metadata for the .deb currently being unpacked), so it might use
heuristics or check the filesystem for such metadata, because it does
not have anything else. But that should not be taken to mean that the
filesystem is the source of truth, as most of those checks will be
unnecessary once it has such metadata at hand.

So the reason this proposal is still conceptually wrong is manifold:

* dpkg cannot safely and atomically perform such switches (and I don't
  see it ever being able to portably do so, so I don't see ever
  supporting that).
* No package ships those symlinks (and none should! as that would
  currently imply having the same pathname contain different file types
  on the same system, introducing ordering issues and file type
  conflicts).
* This introduces a series of commands to let dpkg know that a
  filesystem change that was not shipped in any .deb (even though that
  should have been the way to do it), has been done, which:
  - Switches the source of truth from the .deb to the fsys.
  - Conflates admin-initiated changes with distro-initiated ones.
* Wants to be a generic change but it is really targeted at this
  specific mess. We have been doing similar aliasing transitions for
  many doc dirs, by stopping shipping files within, shipping that
  pathname as a symlink and then switching the directories to symlinks
  to match (via the dpkg-maintscript-helper hack because we lack fsys
  metadata). This means we'd need to then register all these directories
  too? Meh.
* This information can get out of sync with reality, as it adds an
  additional source of truth, unconnected to anything else, that dpkg
  cannot do anything about if it diverges (in contrast to diversions
  or statoverrides f.ex.). This can never happen when that information
  comes from the real source of truth (the fsys metadata via the .deb).
* This also adds undue complexity, by supporting those as admin aliases.
  The admin generated redirecting symlinks are already annoying, I'd rather
  not add further to that pile. I don't really want to support admins doing
  this (dpkg-divert does not even support diverting a directory).

  [ As an aside, I think ideally eventually nothing distro provided should
    be allowed to be installed within an aliased dir, and dpkg should
    eventually just error out in those cases, which eventually would get
    rid of the aliasing problems and any such complexity (I'm not sure how
    or when that would be feasible though, but obviously in Debian at
    least not until nothing ships files there). ]
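
The out-of-sync point above can be illustrated with a small sketch. It
assumes a hypothetical recorded alias table (symlink pathname mapped to
the expected `readlink()` value); nothing here is an existing dpkg
interface, and `stale_aliases` is a name invented for the example.

```python
import os


def stale_aliases(recorded, root="/"):
    """Return the recorded aliases that no longer match the filesystem."""
    stale = []
    for link, target in recorded.items():
        on_disk = os.path.join(root, link.lstrip("/"))
        # The record is stale if the pathname is no longer a symlink,
        # or if the symlink now points somewhere else.
        if not os.path.islink(on_disk) or os.readlink(on_disk) != target:
            stale.append(link)
    return stale
```

Such a check can only report the divergence after the fact. Nothing ties
the recorded table to the symlinks themselves, so anything that replaces
or removes a symlink without updating the table leaves the record wrong.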

So this still looks like a terrible interface, like it did at the time
it was discarded: founded on a hack, an interface that seems to want to
be kind of a file-type override but cannot be, and cannot even properly
act as a record tracker, etc…

> Rejected proposals
> ==================
> 
> Hardcoding aliases into dpkg
> ----------------------------
> 
> It was suggested to include a static aliasing mapping into the `dpkg`
> source code.  Since `dpkg` is used by multiple projects in different
> ways (not necessarily Debian-derivatives), this approach would break
> other consumers.  Also note that Debian's `dpkg` can be used to operate
> on an installation using different aliases via the `--root` flag.  As
> such the alias mapping needs to be a property of the installation.

Yes.

> Modifying package lists in place
> --------------------------------
> 
> `dpkg` could rewrite the extracted `.list` files from `control.tar` and
> store paths in canonicalized form.  Canonicalization would happen
> when a `control.tar` is extracted.  It would also happen either as a
> one-time conversion during the upgrade of `dpkg` or whenever a `.list`
> file is read.  Given canonicalized list files, string comparison on
> files would support conflict detection.  Other pieces to be updated in a
> similar way include `alternatives`, `diversions`, `statoverride`, and
> `triggers`.
> 
> This would affect the output of `dpkg -S`, which would then output
> canonicalized paths.  Packages generated by `dpkg-repack` would have
> their contents canonicalized as well.

This is an interface-breaking change, as it introduces
change-at-a-distance for packages themselves, and reproducibility
issues that depend on the system at hand. It is also founded on an
invented filesystem view, as those remapped packages never shipped
those pathnames.
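
A tiny sketch of the system-dependence problem, under made-up
assumptions (the `canonical_list` helper, the example pathnames and the
two alias maps are purely illustrative): the same shipped file list ends
up recorded differently depending on which aliases the local system has.

```python
def canonical_list(shipped, aliases):
    """Rewrite a shipped file list through a system's alias map (one pass)."""
    out = []
    for path in shipped:
        for link, target in aliases.items():
            # Match whole path components only.
            if path == link or path.startswith(link + "/"):
                path = target + path[len(link):]
                break
        out.append(path)
    return out


# Pathnames exactly as a hypothetical .deb ships them.
SHIPPED = ["/bin/sh", "/usr/share/doc/dash/copyright"]

# Two systems with different (hypothetical) alias setups.
on_aliased_system = canonical_list(SHIPPED, {"/bin": "/usr/bin"})
on_plain_system = canonical_list(SHIPPED, {})
```

Here `on_plain_system` matches what the package shipped, while
`on_aliased_system` contains `/usr/bin/sh`, a pathname the package never
shipped.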

> Managing the aliasing mapping using a control file
> --------------------------------------------------
> 
> It was suggested that the mapping could be managed via a special control
> file `canonical`.  Given that aliasing is not a common operation, the
> benefit of handling it declaratively is minor.  Beyond that, aliasing
> can also happen as a customization issued by an administrator.
> Therefore, a command line based approach is preferred.

As long as the package does not provide the symlinks, shipping this
type of information declaratively would also be conceptually wrong.
And it is just a distraction from the fsys metadata stuff, with all
the drawbacks of the CLI commands.

> Having dpkg move files and create symbolic links
> ------------------------------------------------
> 
> When instructed with `--add-alias`, `dpkg` could also create the
> corresponding symbolic links and move the affected files to their new
> location.  While that would be convenient, doing so is non-trivial in an
> atomic way.  Sometimes, the underlying filesystem does not fully conform
> to POSIX (e.g. `overlayfs`) and such corner cases need to be managed
> individually.  Since such an implementation already exists outside
> `dpkg` and its complexity is non-trivial, the moving of files shall
> remain external.  In case aliases are setup in a bootstrap setting, no
> moves are necessary.

dpkg has several requirements on filesystem semantics; if a filesystem
does not provide them, then it is not supported for dpkg to manage
objects on it.

dpkg cannot guarantee atomicity and safety for this kind of aliasing
switch, and I don't see it ever being able to support performing such a
switch, as that can break the system.

> Implement aliasing after metadata tracking
> ------------------------------------------
> 
> The [metadata
> tracking](https://wiki.debian.org/Teams/Dpkg/Spec/MetadataTracking)
> feature enhances `dpkg` with knowledge about filesystem metadata for
> installed files.  This includes knowledge of symbolic links, which would
> help with tracking aliasing.  Unfortunately, progress on this is fairly
> slow and we think that aliasing support is more urgent.

I thought it would be clear that if there is stuff that depends on
any of these kinds of changes to dpkg, relying on those changes in
Debian would not be possible until after trixie+1. Of course there is
always the route of piling further onto the Jenga tower of hacks,
for example by adding huge amounts of Pre-Depends…

So given the above, I don't see the reason for the apparent rush here.
And as I've mentioned many times now, I'm planning to continue working
on the fsys metadata stuff for 1.22.x, probably at the cost of database
duplication if necessary, if current blockers have not adapted by then.
But as I've mentioned before, that might not guarantee this support is
sufficient to support fixing this mess. And all the other proposed
changes to dpkg I've seen flying around are just conceptually wrong in
one way or another.

Regards,
Guillem
