On Friday, 25 July 2025 0:16:53 Central European Summer Time, Aoife Moloney via
devel-announce wrote:
> Wiki -
> https://fedoraproject.org/wiki/Changes/Hardlink_identical_files_in_packages_by_default
> Discussion thread -
> https://discussion.fedoraproject.org/t/f43-change-proposal-hardlink-identical-files-in-packages-by-default-self-contained/160769
> 
> This is a proposed Change for Fedora Linux.
> This document represents a proposed Change. As part of the Changes
> process, proposals are publicly announced in order to receive
> community feedback. This proposal will only be implemented if approved
> by the Fedora Engineering Steering Committee.
> 
> == Summary ==
> A post-build step is added to the package build macros to
> automatically hardlink all identical files under `/usr`. Previously,
> this was done in some packages and now it's done everywhere by
> default.
> 
> == Owner ==
> * Name: [[User:zbyszek|Zbigniew Jędrzejewski-Szmek]]
> * Email: zbyszek at in.waw.pl
> 
> 
> == Detailed Description ==
> Files can be hardlinked at the end of the `%install` step in package
> builds. rpm supports this and will preserve those links in the binary
> rpm and during installation. This makes the installation a bit more
> efficient. Hardlinking of read-only files is generally transparent to
> the user, but has some small benefits: the files are not duplicated in
> the file system; backup, copy, and search programs will usually make
> use of the link information and not process the same inode twice.
> Thus, it's good to hardlink as many packaged files as possible.
> 
> Previously, hardlinking was done automatically for a subset of files
> in Python packages (via the `%__os_install_post_python` macro), and
> explicitly in some packages with lots of similar files (usually via
> the `hardlink` program).
> 
> The `%__os_install_post` is extended to automatically hardlink all
> identical files under `%{buildroot}%{_prefix}`, i.e. the `/usr`
> directory in packages. This calls a new helper binary (part of the
> `add-determinism` package) that does the linking.
> 
> Hard links may be confusing if the file is ''modified''. In
> particular, all links to the same inode share the same ownership and
> permissions, and obviously the same contents. Thus, we want to apply
> hardlinking only to files under `/usr`, which are generally read-only
> in packages.

My /usr directories are NOT read-only, and I have no intention of switching
to a read-only /usr :shrug:.  I frequently modify files under /usr when I'm
debugging or temporarily fixing issues.
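
To illustrate the concern, here is a minimal Python sketch (not part of the
proposal; the file names and the function name are made up for the example).
It assumes the edit writes into the existing inode, as an append or an
in-place overwrite does. Tools that write a new file and rename it over the
old name would behave differently.

```python
import os
import tempfile


def demo_in_place_edit():
    """Edit one hardlinked name in place and report what the other name sees."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.conf")  # hypothetical packaged file
        b = os.path.join(d, "b.conf")  # hypothetical identical sibling
        with open(a, "w") as f:
            f.write("original\n")
        os.link(a, b)  # b is now a second name for the same inode

        # A "temporary local fix" that writes into the existing inode.
        with open(a, "a") as f:
            f.write("debug tweak\n")

        with open(b) as f:
            content_b = f.read()
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        return content_b, same_inode
```

With this setup the change made through `a.conf` is also visible through
`b.conf`, because both names refer to one inode.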

I think we need at least a knob to disable this feature for certain
installations, and a tool that can safely "unlink" those files if
necessary, in case hardlinks inadvertently reach those systems through
released Fedora images.
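
A rough sketch of what such an "unlink" step could look like, in Python. This
is only an assumption about one possible approach (copy to a temporary file,
then rename over the original name); `break_hardlink` is a hypothetical name,
not an existing tool, and the sketch ignores details such as SELinux labels
that cannot be copied.

```python
import os
import shutil
import stat
import tempfile


def break_hardlink(path):
    """Give `path` a private inode if it shares one with other names.

    Returns True if a private copy was made, False if nothing was needed.
    """
    st = os.lstat(path)
    # Only regular files with more than one link need any work.
    if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
        return False
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".unlink-")
    os.close(fd)
    try:
        shutil.copy2(path, tmp)  # copies data, mode bits and timestamps
        tmp_st = os.stat(tmp)
        if (tmp_st.st_uid, tmp_st.st_gid) != (st.st_uid, st.st_gid):
            os.chown(tmp, st.st_uid, st.st_gid)  # needs privileges
        os.replace(tmp, path)  # atomic rename over the old name
    except BaseException:
        os.unlink(tmp)
        raise
    return True


def demo_break_hardlink():
    """Link two names, split one off, edit it, and inspect the other."""
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a"), os.path.join(d, "b")
        with open(a, "w") as f:
            f.write("same\n")
        os.link(a, b)
        changed = break_hardlink(b)
        with open(b, "a") as f:
            f.write("local fix\n")
        with open(a) as f:
            return changed, f.read(), os.stat(a).st_nlink
```

After `break_hardlink(b)`, editing `b` no longer touches `a`, and the link
count of `a` drops back to 1.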

Pavel

> When files are hardlinked, mtime (the modification timestamp) is taken
> into account. Only files with identical mtime, owner, group, and mode
> are subject to linking. The new program written to do the linking
> takes `$SOURCE_DATE_EPOCH` into account, and will clamp mtimes to it
> before comparing.
> 
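
The comparison described above can be sketched roughly as follows. This is a
Python illustration under stated assumptions, not the helper from
`add-determinism`: the function name, the use of SHA-256 for content
comparison, and the choice to write the clamped mtime back to the kept inode
are all assumptions of the sketch.

```python
import hashlib
import os
import stat
import tempfile


def link_identical(root, source_date_epoch):
    """Hardlink regular files under `root` that match in content, clamped
    mtime, owner, group, and mode. Returns the number of links created."""
    groups = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue
            # Clamp the mtime before comparing, so results do not depend
            # on how fast the build machine wrote the files.
            mtime = min(int(st.st_mtime), source_date_epoch)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).digest()
            key = (digest, mtime, st.st_uid, st.st_gid,
                   stat.S_IMODE(st.st_mode))
            groups.setdefault(key, []).append((path, st))

    linked = 0
    for key, members in groups.items():
        if len(members) < 2:
            continue
        members.sort(key=lambda m: m[0])  # deterministic choice of kept file
        first_path, first_st = members[0]
        os.utime(first_path, (key[1], key[1]))  # assumption: store the clamp
        for path, st in members[1:]:
            if (st.st_dev, st.st_ino) == (first_st.st_dev, first_st.st_ino):
                continue  # already the same inode
            tmp = path + ".hardlink-tmp"
            os.link(first_path, tmp)
            os.replace(tmp, path)  # swap the new link in atomically
            linked += 1
    return linked


def demo_link_identical():
    """Two identical files with different mtimes, plus one different file."""
    with tempfile.TemporaryDirectory() as d:
        paths = [os.path.join(d, n) for n in ("a", "b", "c")]
        for p, text, mtime in zip(paths, ("x", "x", "y"), (2000, 3000, 2000)):
            with open(p, "w") as f:
                f.write(text)
            os.utime(p, (mtime, mtime))
        late = link_identical(d, source_date_epoch=5000)   # mtimes stay distinct
        early = link_identical(d, source_date_epoch=1000)  # mtimes clamp equal
        same = os.stat(paths[0]).st_ino == os.stat(paths[1]).st_ino
        return late, early, same, os.stat(paths[2]).st_nlink
```

In a real build the epoch would come from the environment, for example
`int(os.environ["SOURCE_DATE_EPOCH"])`. The demo shows why the clamp matters:
the two identical files are linked only once their mtimes clamp to the same
value.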
> Note: rpm correctly handles the case where a hardlink is between files
> in two different subpackages. Thus, we can hardlink everything under
> `%{buildroot}`, and rpm will store the files as hardlinked if they are
> in the same output package, adjusting the hardlink counts as
> appropriate.
> 
> == Feedback ==
> <!-- Summarize the feedback from the community and address why you
> chose not to accept proposed alternatives. This section is optional
> for all change proposals but is strongly suggested. Incorporating
> feedback here as it is raised gives FESCo a clearer view of your
> proposal and leaves a good record for the future. If you get no
> feedback, that is useful to note in this section as well. For
> innovative or possibly controversial ideas, consider collecting
> feedback before you file the change proposal. -->
> 
> == Benefit to Fedora ==
> As mentioned in the Detailed Description, hardlinking deduplicates the
> data in rpms and in installations. Backup, copy, and search programs
> will usually make use of the link information and not process the same
> inode twice.
> Thus, by hardlinking files in the packages we make things a bit more
> efficient. (The impact is small, because rpms generally don't have
> large duplicated files.)
> 
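
As a rough illustration of how a program can use the link information, the
Python sketch below counts file sizes once per inode by tracking
`(st_dev, st_ino)` pairs. The function names are made up for the example, and
real backup or search tools may work differently.

```python
import os
import stat
import tempfile


def usage_by_inode(root):
    """Return (naive_bytes, deduplicated_bytes) for regular files under root."""
    seen = set()
    naive = unique = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue
            naive += st.st_size
            ident = (st.st_dev, st.st_ino)  # identifies the inode itself
            if ident not in seen:
                seen.add(ident)
                unique += st.st_size
    return naive, unique


def demo_usage_by_inode():
    """One 10-byte file with two names, plus a separate 4-byte file."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        with open(a, "w") as f:
            f.write("0123456789")
        os.link(a, os.path.join(d, "b"))
        with open(os.path.join(d, "c"), "w") as f:
            f.write("abcd")
        return usage_by_inode(d)
```

Here the naive total counts the 10-byte file twice, while the per-inode total
counts it once.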
> Hardlinking of files was previously done in some packages explicitly,
> but it required adding a `BuildRequires` line and invoking a script,
> so it wasn't done very often. By handling this automatically, we'll be
> able to simplify those packages.
> 
> Another caveat that needs to be taken into account when doing
> hardlinking as part of the package build is that newer `hardlink`
> versions use reflinks instead of hardlinks by default. (With a
> hardlink, one inode is connected to the file system tree in two or
> more places. With a reflink, some blocks of an inode are shared with
> another inode, ''inside'' of the file system, and the two inodes
> retain their separate identities.)  rpm has no knowledge of reflinks,
> so those reflinks created during package build have no effect on the
> binary package and the payload is duplicated. Invocations of
> `hardlink` would have to be annotated with `--reflink=never` to retain
> the intended effect. By removing that step from packages we avoid this
> issue.
> 
> The [https://docs.fedoraproject.org/en-US/reproducible-builds/
> Reproducible Builds] effort reported that some packages that use
> hardlinking are not reproducible, see
> [https://pagure.io/fedora-reproducible-builds/project/issue/22
> irreproducibility#22]. When files are created in the package build,
> depending on how fast the build machine is, some files might or might
> not have identical timestamps. The tools that were used to compare
> files for hardlinking were general tools that did not "know" that we'd
> clamp the mtimes to `$SOURCE_DATE_EPOCH` in a subsequent step, so the
> results of the mtime comparisons were unstable. The tool that is added
> as part of this Change does the mtime clamping internally for
> reproducible results. Fixing this issue was the initial motivation for
> this change.
> 
> == Scope ==
> * Proposal owners:
> ** extend the `add-determinism` package with a little helper that does
> file comparisons and hardlinks identical files. The helper takes
> `$SOURCE_DATE_EPOCH` into account.
> ** open pull request for `redhat-rpm-config` to insert a call to the
> helper in `%__os_install_post`.
> ** open pull request for `python-srpm-macros` to drop their hardlinking step.
> * Other developers:
> ** merge pull request
> ** report issues if the hardlinking has unforeseen consequences or
> does not work correctly.
> ** drop explicit calls to `hardlink` in their packages.
> 
> * Release engineering:
> 
> * Policies and guidelines: not needed, AFAICT.
> 
> * Trademark approval: N/A (not needed for this Change)
> 
> 
> * Alignment with the Fedora Strategy:
> 
> 
> == Upgrade/compatibility impact ==
> No impact.
> 
> == Early Testing (Optional) ==
> Build package with an invocation of the new helper.
> 
> == How To Test ==
> Install packages rebuilt with the helper.
> 
> == User Experience ==
> Not visible to users.
> 
> == Dependencies ==
> 
> == Contingency Plan ==
> * Contingency mechanism:
> ** if hardlinking causes a problem in some specific packages, they can
> be trivially modified to skip the hardlinking step by setting a macro.
> ** if there is a general problem, we can easily drop the macro in
> `redhat-rpm-config`.
> * Contingency deadline: any time, even after release. Any affected
> packages would have to be rebuilt.
> * Blocks release? No.
> 
> == Documentation ==
> The invocation of the helper will be documented inline in the macros
> files. Other documentation is not needed.
> 
> == Release Notes ==
> Package builds automatically hardlink identical files. This reduces
> the installation footprint a bit and also makes package builds more
> reproducible.
> 
> 
> 
> -- 
> Aoife Moloney
> 
> Fedora Operations Architect
> 
> Fedora Project
> 
> Matrix: @amoloney:fedora.im
> 
> IRC: amoloney
> 
> 



