Hi!

On Mon, 2023-04-03 at 14:02:13 +0200, Helmut Grohne wrote:
> I have been looking into the aliasing problems in dpkg on behalf of
> Freexian's Debian funding. To that end I proposed a possible way forward
> last year (https://lists.debian.org/debian-dpkg/2022/11/msg00007.html),
> but the feedback I got was not particularly helpful in determining
> consensus.

I thought my reply was rather clear, and that we had further clarified
it privately, so at the time I thought no other answer was required, as
(AFAIR) you stated you'd be digging further into it. I also mentioned
I'd try to reply to the list, but it didn't feel urgent given those
clarifications, nor given the timing during the freeze.

> A little later, Simon Richter also looked into the problem
> (https://lists.debian.org/debian-dpkg/2022/12/msg00023.html), but
> remained silent after the initial post. Little happened since then. Now
> Raphael Hertzog proposed to use the DEP process to get this thing
> unstuck

Sigh, a DEP(!?), for a dpkg change? It feels more like a way to exert
pressure over this than anything else TBH…

> and with the help of Emilio Pozuelo Monfort I created a draft
> for discussion. I allocate number 17 via debian-project@l.d.o. What
> follows is the draft text. Please consider it to be a piece of best
> intentions at reconciling feedback wherever I could.

I'm unlikely to discuss this topic on debian-devel, given previous
nastiness and abuse.

The text includes most (but not all) of what I've been saying publicly,
and what I've tried to further clarify to you and Emilio in private.
But I think it ignores the essence of what I've been repeating all along.

> Introduction
> ============
>
> At its core, `dpkg` assumes that every filename uniquely refers to a
> file on disk. The situation where two distinct filenames refer to the
> same file on disk is referred to as aliasing.

(To be precise, I think this describes hardlinks. Aliasing occurs when
different pathnames, whose last component is not a symlink, all refer
to the same filename in the same directory. But I don't think this
matters much.)

> Proposal
> ========
>
> In order to handle aliasing efficiently, `dpkg` gains new options
> `--add-alias <symlink>`, `--remove-alias <symlink>` and
> `--list-aliases`.
> When creating symbolic links that cause aliasing
> effects, the creating entity is supposed to inform `dpkg` using an
> appropriate invocation. Doing so records the aliasing information in a
> new mapping inside its administrative directory. No existing
> administrative files are modified as a result of this operation. When
> `dpkg` operates on paths, it can compute a canonicalized version using a
> pure function without the need to `stat()` files on disk thus greatly
> improving performance. Canonicalized paths are only needed when
> determining whether a file conflict exists. In all other cases,
> original paths continue to be used as symbolic links will be followed by
> filesystem operations. The `--add-alias` operation records the target
> of the symbolic link that must exist prior to invocation. The
> `--remove-alias` operation fails if any files are still installed in the
> aliased location.

I already mentioned this in my reply to the thread you reference. So,
let me repeat and possibly expand to avoid any future doubt. I already
considered and discarded something like this (except for using a config
option instead of a new command, but that does not really change the
substance of the problems).

Let's also get back to the very basics. dpkg manages objects shipped in
binary packages, on the filesystem. It assumes this managing role
exclusively; it will, for example, overwrite unmanaged files. It
preserves admin changes with interfaces specifically provided for that
(diversions, statoverrides, conffile changes) or the unfortunate symlink
redirects. These shipped objects define the filesystem layout (not the
other way around).
Because the fsys metadata is missing, dpkg does not always have all
such metadata at hand when necessary (it might only have that of the
.deb currently being unpacked), so it might use heuristics or check the
filesystem for it, because it has nothing else. That should not be
taken to mean that the filesystem is the source of truth, as most of
those checks will be unnecessary once it has such metadata at hand.

So the reasons this proposal is still conceptually wrong are manifold:

  * dpkg cannot safely and atomically perform such switches (and I don't
    see it ever being able to portably do so, so I don't see ever
    supporting that).
  * No package ships those symlinks (and none should! as that would
    currently imply having the same pathname contain different file
    types on the same system, introducing ordering issues and file type
    conflicts).
  * This introduces a series of commands to let dpkg know that a
    filesystem change that was not shipped in any .deb (even though that
    should have been the way to do it), has been done, which:
    - Switches the source of truth from the .deb to the fsys.
    - Confuses admin-initiated changes with distro-initiated ones.
  * Wants to be a generic change but it is really targeted at this
    specific mess. We have been doing similar aliasing transitions for
    many doc dirs, by stopping shipping files within, shipping that
    pathname as a symlink and then switching the directories to symlinks
    to match (via the dpkg-maintscript-helper hack because we lack fsys
    metadata). This means we'd need to then register all these
    directories too? Meh.
  * This information can get out of sync with reality, as it adds an
    additional source of truth, unconnected to anything else, that dpkg
    cannot do anything about if it diverges (in contrast to diversions
    or statoverrides f.ex.). This can never happen when that information
    comes from the real source of truth (the fsys metadata via the
    .deb).
  * This also adds undue complexity, by supporting those as admin
    aliases.
    The admin-generated redirecting symlinks are already annoying, I'd
    rather not add further to that pile. I don't really want to support
    admins doing this (dpkg-divert does not even support diverting a
    directory).

[ As an aside, I think ideally eventually nothing distro provided should
  be allowed to be installed within an aliased dir, and dpkg should
  eventually just error out in those cases, which eventually would get
  rid of the aliasing problems and any such complexity (I'm not sure how
  or when that would be feasible though, but obviously in Debian at
  least not until nothing ships files there). ]

So this still looks like a terrible interface, like it did at the time
it was discarded; founded on a hack, an interface that seems to want to
be a kind of file-type override but cannot be, and cannot even properly
act as a record tracker, etc…

> Rejected proposals
> ==================
>
> Hardcoding aliases into dpkg
> ----------------------------
>
> It was suggested to include a static aliasing mapping into the `dpkg`
> source code. Since `dpkg` is used by multiple projects in different
> ways (not necessarily Debian-derivatives), this approach would break
> other consumers. Also note that Debian's `dpkg` can be used to operate
> on an installation using different aliases via the `--root` flag. As
> such the alias mapping needs to be a property of the installation.

Yes.

> Modifying package lists in place
> --------------------------------
>
> `dpkg` could rewrite the extracted `.list` files from `control.tar` and
> store paths in canonicalized form. Canonicalization would happen as
> when a `control.tar` is extracted. It would also happen either as a
> one-time conversion during the upgrade of `dpkg` or whenever a `.list`
> file is read. Given canonicalized list files, string comparison on
> files would support conflict detection. Other pieces to be updated in a
> similar way include `alternatives`, `diversions`, `statoverride`, and
> `triggers`.
>
> This would affect the output of `dpkg -S`, which would then output
> canonicalized paths. Packages generated by `dpkg-repack` would have
> their contents canonicalized as well.

This is an interface-breaking change, as it introduces
change-at-a-distance for packages themselves, and reproducibility issues
that depend on the system at hand. It is also built on the foundation of
an invented filesystem view, as those remapped packages never shipped
those pathnames.

> Managing the aliasing mapping using a control file
> --------------------------------------------------
>
> It was suggested that the mapping could be managed via a special control
> file `canonical`. Given that aliasing is not a common operation, the
> benefit of handling it declaratively is minor. Beyond that, aliasing
> can also happen as an customization issued by an administrator.
> Therefore, a command line based approach is preferred.

As long as the package does not provide the symlinks, shipping this type
of information declaratively would also be conceptually wrong. And it is
just a distraction from the fsys metadata stuff, with all the drawbacks
of the CLI commands.

> Having dpkg move files and create symbolic links
> ------------------------------------------------
>
> When instructed with `--add-alias`, `dpkg` could also create the
> corresponding symbolic links and move the affected files to their new
> location. While that would be convenient, doing so is non-trivial in an
> atomic way. Sometimes, the underlying filesystem does not fully conform
> to POSIX (e.g. `overlayfs`) and such corner cases need to be managed
> individually. Since such an implementation already exists outside
> `dpkg` and its complexity is non-trivial, the moving of files shall
> remain external. In case aliases are setup in a bootstrap setting, no
> moves are necessary.

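(A rough illustration of the atomicity point in the quoted paragraph,
not dpkg code: a minimal Python sketch, assuming a POSIX-like system.
The directory layout and the `bin.dpkg-new` temporary name are made up
for the example. It only shows that a single rename() cannot replace a
populated directory with a symlink, so such a switch necessarily goes
through several steps with intermediate states.)

```python
import os
import tempfile


def try_single_rename_switch(root):
    """Try to turn a populated directory into a symlink with one rename().

    Returns the name of the exception raised by the attempt, or None if
    the rename unexpectedly succeeded. All paths are hypothetical.
    """
    target = os.path.join(root, "usr", "bin")
    aliased = os.path.join(root, "bin")
    os.makedirs(target)
    os.makedirs(aliased)

    # A file that is still installed in the directory to be aliased.
    with open(os.path.join(aliased, "sh"), "w") as fh:
        fh.write("placeholder\n")

    # Prepare the symlink under a temporary name (illustrative only).
    tmp_link = os.path.join(root, "bin.dpkg-new")
    os.symlink("usr/bin", tmp_link)

    try:
        # POSIX rename() refuses to replace a directory with a
        # non-directory, so this cannot be done as one atomic step.
        os.rename(tmp_link, aliased)
    except OSError as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print("rename attempt:", try_single_rename_switch(root))
        print("bin is still a real directory:",
              os.path.isdir(os.path.join(root, "bin"))
              and not os.path.islink(os.path.join(root, "bin")))
```

Anything that gets around this (moving the files first, removing the
directory, then placing the symlink) leaves a window where the pathname
is missing or incomplete. Linux has renameat2() with RENAME_EXCHANGE,
but that is not portable.
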
dpkg has several requirements on filesystem semantics; filesystems that
do not provide them are not supported for dpkg to manage objects on.

dpkg cannot guarantee atomicity and safety for this kind of aliasing
switch, and I don't see it ever being able to support performing such a
switch, as that can break the system.

> Implement aliasing after metadata tracking
> ------------------------------------------
>
> The [metadata
> tracking](https://wiki.debian.org/Teams/Dpkg/Spec/MetadataTracking)
> feature enhances `dpkg` with knowledge about filesystem metadata for
> installed files. This includes knowledge of symbolic links, which would
> help with tracking aliasing. Unfortunately, progress on this is fairly
> slow and we think that aliasing support is more urgent.

I thought it would be clear that if there is stuff that depends on any
of these kinds of changes to dpkg, relying on those changes in Debian
would not be possible until after trixie+1. Of course there is always
the route of piling further onto the Jenga tower of hacks, by for
example adding huge amounts of Pre-Depends…

So given the above, I don't see the reason for the apparent rush here.
And as I've mentioned many times now, I'm planning to continue working
on the fsys metadata stuff for 1.22.x, probably at the cost of database
duplication if necessary, if current blockers have not adapted by then.
But as I've mentioned before, that might not guarantee this support is
sufficient to fix this mess. But all the other proposed changes to dpkg
I've seen flying around are just conceptually wrong in one way or
another.

Regards,
Guillem