Hi Raphaël,

On Fri, Apr 21, 2023 at 03:03:10PM +0200, Raphael Hertzog wrote:
> Here you are considering all files, but for the purpose of our issue,
> we can restrict ourselves to the directories known by dpkg. We really
> only care about directories that have been turned into symlinks (or
> packaged symlinks that are pointing to directories). That's a much lower
> number of paths that we would have to check.
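
To make sure we mean the same thing, the scan described in the quoted text could be sketched roughly as follows. This is only an illustration in Python, not how dpkg works or would work: the function names, the `root` parameter and the idea of passing in a precomputed list of dpkg-known directories are assumptions made for the example.

```python
import os
import stat
import time


def find_aliased_dirs(known_dirs, root="/"):
    """Return the paths from known_dirs that are symlinks on disk.

    known_dirs is a hypothetical input: absolute paths that the package
    database believes to be directories. How that list is obtained is
    deliberately left out of this sketch.
    """
    aliased = []
    for path in known_dirs:
        full = os.path.join(root, path.lstrip("/"))
        try:
            # lstat() does not follow a symlink in the final component,
            # so a directory replaced by a symlink shows up as a link.
            st = os.lstat(full)
        except OSError:
            continue  # path is missing or unreachable: nothing to report
        if stat.S_ISLNK(st.st_mode):
            aliased.append(path)
    return aliased


def timed_scan(known_dirs, root="/"):
    """Run the scan and also return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = find_aliased_dirs(known_dirs, root)
    return result, time.perf_counter() - start
```

The cost is one lstat() per known directory, so the run time scales with the number of directories in the database rather than with the number of files.
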

Considering just sid amd64 main, I count around 140000 directories, which
is clearly fewer than millions. A typical installation will only have a
fraction of that, probably fewer than 50000. I think this is the number of
stat() calls we'd have to do. I timed this on a reasonably fast system
(admittedly using Python, but I think the overhead is not huge) and it
completes in around 0.1s (with a hot vfs cache). So depending on the cache
invalidation strategy, this may or may not be viable.

That is looking at it from a performance point of view. Guillem also raised
that this changes the source of truth from the dpkg database to the actual
filesystem. Guillem considers that wrong, and I vaguely agree.

> We don't add any new public interface to dpkg, but we also have the
> possibility to remove /var/lib/dpkg/aliases to force a new scan
> (some sort of "dpkg --refresh-aliases" without an official name).

Can I rephrase this as follows: your cache invalidation strategy is that
any external entity (such as a maintainer script) introducing aliases
should explicitly invalidate the cache?

> It might still be cleaner to have that "dpkg --refresh-aliases" command
> so that we can invoke it for example in "dpkg-maintscript-helper
> symlink_to_dir/dir_to_symlink" when we are voluntarily turning a directory
> into a symlink (or vice-versa).

If you put it this way, it is not that different from the
--add-alias/--remove-alias proposal. It is a different interface to dpkg,
but the semantics are roughly the same: in both cases, something external
to dpkg is responsible for performing the moves and creating the symbolic
links, and then for informing dpkg about the alias (explicitly, or
implicitly via scanning directories). Would you agree with me that this is
a minor adaptation of DEP17? In essence, what changes is the way that a
user communicates aliases to dpkg, but the assumption that a user must
communicate aliases to dpkg is not affected.
I'd be fine with changing this aspect in principle, but I still consider
this a new public interface to dpkg with much the same effects on
long-term maintenance.

> In any case, now that you have a database of aliases, you can do the other
> modifications to detect conflicting files and avoid file losses.
>
> How does that sound?

To me, it sounds much the same as DEP17 in a different color. I hope I got
it right. What I tried to rule out as a naive solution is eliminating the
need to tell dpkg about aliasing changes: we'd then have to incur this 0.1s
delay after every maintainer script invocation, which would amount to 5
minutes of stat()ing on a typical dist-upgrade, assuming a hot vfs cache on
a fast x86 CPU.

> The proposal I made above is not a real database in the sense that we
> don't record what was shipped by the .deb when we installed the files...
> it's rather the opposite, it analyzes the system to detect possible
> conflicts with dpkg's view of the system.

I think that Guillem considers this a bad property: as he expressed in his
reply on debian-dpkg, .debs should be the source of truth.

> It can be seen as complementary to it. In any case, I don't see how
> implementing metadata tracking would help to solve the problem that we
> have today. dpkg would know that all .deb have /bin as a directory and
> not as a symlink, and it would be able to conclude that the directory
> has been replaced by a symlink by something external, but that's it.

Let me put it slightly differently. As we currently do not ship the
aliasing symbolic links in any data.tar, metadata tracking will not tell
dpkg about the aliasing, and therefore metadata tracking cannot help
resolve the current situation (as a singular measure).
We can only add the symbolic links to a data.tar after the aliasing has
been resolved (see Simon Richter's mails on how dpkg resolves directory vs
symlink) and thus metadata tracking can only help with resolving the
situation after we have fully resolved the situation. I don't see a way to
resolve this vicious circle and shall update the DEP17 text.

Helmut