On Friday, 25 July 2025 at 0:16:53 CEST, Aoife Moloney via devel-announce wrote:
> Wiki - https://fedoraproject.org/wiki/Changes/Hardlink_identical_files_in_packages_by_default
> Discussion thread -
> https://discussion.fedoraproject.org/t/f43-change-proposal-hardlink-identical-files-in-packages-by-default-self-contained/160769
>
> This is a proposed Change for Fedora Linux.
> This document represents a proposed Change. As part of the Changes
> process, proposals are publicly announced in order to receive
> community feedback. This proposal will only be implemented if approved
> by the Fedora Engineering Steering Committee.
>
> == Summary ==
> A post-build step is added to the package build macros to
> automatically hardlink all identical files under `/usr`. Previously,
> this was done in some packages and now it's done everywhere by
> default.
>
> == Owner ==
> * Name: [[User:zbyszek|Zbigniew Jędrzejewski-Szmek]]
> * Email: zbyszek at in.waw.pl
>
>
> == Detailed Description ==
> Files can be hardlinked at the end of the `%install` step in package
> builds. rpm supports this and will preserve those links in the binary
> rpm and during installation. This makes the installation a bit more
> efficient. Hardlinking of read-only files is generally transparent to
> the user, but has some small benefits: the files are not duplicated in
> the file system; backup, copy, and search programs will usually make
> use of the link information and not process the same inode twice.
> Thus, it's good to hardlink as many packaged files as possible.
>
> Previously, hardlinking was done automatically for a subset of files
> in Python packages (via the `%__os_install_post_python` macro), and
> explicitly in some packages with lots of similar files (usually via
> the `hardlink` program).
>
> The `%__os_install_post` is extended to automatically hardlink all
> identical files under `%{buildroot}%{_prefix}`, i.e. the `/usr`
> directory in packages. This calls a new helper binary (part of the
> `add-determinism` package) that does the linking.
>
> Hard links may be confusing if the file is ''modified''. In
> particular, all links to the same inode share the same ownership and
> permissions, and obviously the same contents. Thus, we want to apply
> hardlinking only to files under `/usr`, which are generally read-only
> in packages.
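
To make the "confusing if modified" point concrete, here is a minimal Python sketch (not part of the proposal; the file names are made up for illustration) showing that two hard links share contents and permissions:

```python
import os
import tempfile


def demo_shared_inode() -> dict:
    """Create two hard links to one file and report what they share."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")  # hypothetical file names
        b = os.path.join(tmp, "b.txt")
        with open(a, "w") as f:
            f.write("original\n")
        os.link(a, b)  # b becomes a second name for the same inode

        # An in-place edit and a chmod through one name affect the other too.
        with open(b, "a") as f:
            f.write("edited via b\n")
        os.chmod(b, 0o600)

        st_a, st_b = os.stat(a), os.stat(b)
        with open(a) as f:
            content_a = f.read()
        return {
            "same_inode": st_a.st_ino == st_b.st_ino,
            "nlink": st_a.st_nlink,
            "mode_a": st_a.st_mode & 0o777,
            "content_a": content_a,
        }
```

Note that this only applies to in-place writes. A tool that saves by writing a new file and renaming it over the old name replaces just that one name, and the other links keep the old contents.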

My /usr directories are NOT read-only, and I have no intention of making
that switch :shrug:. I frequently modify files under /usr when I'm
debugging or applying temporary fixes.

I think we need at least a knob to disable this feature for certain
installations, and a tool that would safely "unlink" those files if
necessary, in case hardlinks inadvertently reach those systems through
released Fedora images.

Pavel

> When files are hardlinked, mtime (the modification timestamp) is taken
> into account. Only files with identical mtime, owner, group, and mode
> are subject to linking. The new program written to do the linking
> takes `$SOURCE_DATE_EPOCH` into account, and will clamp mtimes to it
> before comparing.
>
> Note: rpm correctly handles the case where a hardlink is between files
> in two different subpackages. Thus, we can hardlink everything under
> `%{buildroot}`, and rpm will store the files as hardlinked if they are
> in the same output package, adjusting the hardlink counts as
> appropriate.
>
> == Feedback ==
> <!-- Summarize the feedback from the community and address why you
> chose not to accept proposed alternatives. This section is optional
> for all change proposals but is strongly suggested. Incorporating
> feedback here as it is raised gives FESCo a clearer view of your
> proposal and leaves a good record for the future. If you get no
> feedback, that is useful to note in this section as well. For
> innovative or possibly controversial ideas, consider collecting
> feedback before you file the change proposal. -->
>
> == Benefit to Fedora ==
> As mentioned in the Summary, hardlinking deduplicates the data in rpms
> and in installations. Backup, copy, and search programs will usually
> make use of the link information and not process the same inode twice.
> Thus, by hardlinking files in the packages we make things a bit more
> efficient. (The impact is small, because rpms generally don't have
> large duplicated files.)
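
The matching rule quoted above (identical mtime, owner, group, and mode, with mtimes clamped to `$SOURCE_DATE_EPOCH` before comparing) could look roughly like the following Python sketch. This is an illustration only, not the actual helper from `add-determinism`. The function name, the SHA-256 content comparison, and the temporary file suffix are my own assumptions.

```python
import hashlib
import os
import stat
from collections import defaultdict
from typing import Optional


def link_identical(root: str, source_date_epoch: Optional[int] = None) -> int:
    """Hardlink identical regular files under root; return the number of links made.

    Hypothetical sketch: mtimes are clamped to source_date_epoch (or
    $SOURCE_DATE_EPOCH if set) before they are compared.
    """
    if source_date_epoch is None:
        env = os.environ.get("SOURCE_DATE_EPOCH")
        source_date_epoch = int(env) if env else None

    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, etc.
            mtime = int(st.st_mtime)
            if source_date_epoch is not None:
                mtime = min(mtime, source_date_epoch)  # clamp, then compare
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            key = (mtime, st.st_uid, st.st_gid,
                   stat.S_IMODE(st.st_mode), st.st_size, digest)
            groups[key].append((path, st.st_ino))

    made = 0
    for key, entries in groups.items():
        if len(entries) < 2:
            continue
        entries.sort()  # deterministic choice of the file that is kept
        keep_path, keep_ino = entries[0]
        os.utime(keep_path, (key[0], key[0]))  # store the clamped mtime
        for path, ino in entries[1:]:
            if ino == keep_ino:
                continue  # already linked to the kept file
            tmp = path + ".hardlink-tmp"  # assumed not to collide
            os.link(keep_path, tmp)
            os.replace(tmp, path)  # atomically swap the name to the shared inode
            made += 1
    return made
```

Doing the clamp inside the comparison is what makes the result independent of how fast the build machine wrote the files: two identical files written a second apart end up linked as long as both are newer than the epoch.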
>
> Hardlinking of files was previously done in some packages explicitly,
> but it required adding a `BuildRequires` line and invoking a script,
> so it wasn't done very often. By handling this automatically, we'll be
> able to simplify those packages.
>
> Another caveat that needs to be taken into account when doing
> hardlinking as part of the package build is that newer `hardlink`
> versions use reflinks instead of hardlinks by default. (With a
> hardlink, one inode is connected to the file system tree in two or
> more places. With a reflink, some blocks of an inode are shared with
> another inode, ''inside'' of the file system, and the two inodes
> retain their separate identities.) rpm has no knowledge of reflinks,
> so those reflinks created during package build have no effect on the
> binary package and the payload is duplicated. Invocations of
> `hardlink` would have to be annotated with `--reflink=never` to retain
> the intended effect. By removing that step from packages we avoid this
> issue.
>
> The [https://docs.fedoraproject.org/en-US/reproducible-builds/
> Reproducible Builds] effort reported that some packages that use
> hardlinking are not reproducible, see
> [https://pagure.io/fedora-reproducible-builds/project/issue/22
> irreproducibility#22]. When files are created in the package build,
> depending on how fast the build machine is, some files might or might
> not have identical timestamps. The tools that were used to compare
> files for hardlinking were general tools that did not "know" that we'd
> clamp the mtimes to `$SOURCE_DATE_EPOCH` in a subsequent step, so the
> results of the mtime comparisons were unstable. The tool that is added
> as part of this Change does the mtime clamping internally for
> reproducible results. Fixing this issue was the initial motivation for
> this change.
>
> == Scope ==
> * Proposal owners:
> ** extend the `add-determinism` package with a little helper that does
> file comparisons and hardlinks identical files. The helper takes
> `$SOURCE_DATE_EPOCH` into account.
> ** open pull request for `redhat-rpm-config` to insert a call to the
> helper in `%__os_install_post`.
> ** open pull request for `python-srpm-macros` to drop their hardlinking step.
> * Other developers:
> ** merge pull request
> ** report issues if the hardlinking has unforeseen consequences or
> does not work correctly.
> ** drop explicit calls to `hardlink` in their packages.
>
> * Release engineering:
>
> * Policies and guidelines: not needed, AFAICT.
>
> * Trademark approval: N/A (not needed for this Change)
>
>
> * Alignment with the Fedora Strategy:
>
>
> == Upgrade/compatibility impact ==
> No impact.
>
> == Early Testing (Optional) ==
> Build package with an invocation of the new helper.
>
> == How To Test ==
> Install packages rebuilt with the helper.
>
> == User Experience ==
> Not visible to users.
>
> == Dependencies ==
>
> == Contingency Plan ==
> * Contingency mechanism:
> ** if hardlinking causes a problem in some specific packages, they can
> be trivially modified to skip the hardlinking step by setting a macro.
> ** if there is a general problem, we can easily drop the macro in
> `redhat-rpm-config`.
> * Contingency deadline: any time, even after release. Any affected
> packages would have to be rebuilt.
> * Blocks release? No.
>
> == Documentation ==
> The invocation of the helper will be documented inline in the macros
> files. Other documentation is not needed.
>
> == Release Notes ==
> Package builds automatically hardlink identical files. This reduces
> the installation footprint a bit and also makes package builds more
> reproducible.
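
For the "How To Test" step, and for the "unlink" request raised in the reply above, a rough Python sketch of how one might list hardlinked files under a tree and turn one link back into a private copy. Nothing here exists in Fedora tooling as far as I know. Both function names and the temporary suffix are hypothetical.

```python
import os
import shutil
import stat
from collections import defaultdict


def find_hardlink_groups(root):
    """Return sorted path groups under root that share an inode."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Links whose other names live outside root show up as single paths; drop them.
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


def break_hardlink(path):
    """Replace path with a private copy of its data; return True if it was linked."""
    st = os.lstat(path)
    if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
        return False
    tmp = path + ".unlink-tmp"  # hypothetical suffix, assumed not to collide
    shutil.copy2(path, tmp)     # copies data, mode, and timestamps
    try:
        os.chown(tmp, st.st_uid, st.st_gid)  # restore ownership where permitted
    except PermissionError:
        pass  # unprivileged users may not be allowed to set the original owner
    os.replace(tmp, path)       # only this name now points at the new inode
    return True
```

Running `break_hardlink` on the one file to be edited leaves the other names untouched. Whether a later package update would restore the link is not something this sketch addresses.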
>
>
>
> --
> Aoife Moloney
>
> Fedora Operations Architect
>
> Fedora Project
>
> Matrix: @amoloney:fedora.im
>
> IRC: amoloney