Re: [gentoo-dev] [pre-GLEP r3] Gentoo binary package container format
On 2018.11.30 17:06, Michał Górny wrote:
> On Mon, 2018-11-26 at 21:43 +, Roy Bamford wrote:
> > On 2018.11.26 18:58, Michał Górny wrote:
> > > Here's the newest version.
> > >
> > > Changes:
> > >
> > > - added explicit notion of parent directory (missing in previous GLEP
> > >   but present in implementation),
> > >
> > > - explicitly named GNU tar format with list of permitted extensions,
> > >
> > > - changed volume label to 'gpkg-1.txt' file to improve portability;
> > >   made it explicit version identifier as well,
> > >
> > > - added info on other package formats to rationale.
> > >
> >
> > [snip]
> >
> > The image archive stores all the files to be installed by the binary
> > package. It should be included as the last of the files in the binary
> > package container.
> >
> > [snip]
> >
> > > --
> > > Best regards,
> > > Michał Górny
> > >
> >
> > It's a nit today but that says that any future extensions, none
> > yet planned, should be placed before the image archive.
>
> Yes.
>
> > The specification needs to avoid the use of relative references.
> >
>
> I don't understand. Could you be more specific what you expect instead?
>
> --
> Best regards,
> Michał Górny
>

Michał,

Enumerate the elements in the preferred order, which you have already
done. There is no need, in a specification that is intended to be easily
extensible, to specify that any element should be last. That constrains
extensions.

To build on an example extension given earlier: suppose an extension
came along to add the ebuild, required eclasses and sources. The present
wording says that they should be included before the image archive.
Implementations may be capable of working with partial downloads, so why
force the download of elements that may not be required to get the
payload?

The overhead of the presently defined elements is small compared to the
image, and it's useful to be able to check the metadata to determine if
the image is really what is required. Placing the image 'last' works
with the presently defined elements but may not be so good in the years
to come.

It's a subtle difference between 'last', which means always at the end,
no matter what, and 'fifth', which is last today but might not be in the
future.

--
Regards,

Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods
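
The ordering argument above can be illustrated with a small Python sketch. It is not taken from the GLEP or from Portage: the member names `metadata.tar` and `image.tar`, the directory `foo-1.0`, and the helpers `build_archive`, `CountingReader` and `read_until` are all hypothetical. It only shows that a streaming reader consumes every byte that precedes the member it wants and none of the bytes that follow, which is why member order matters for partial downloads.

```python
import io
import tarfile


def build_archive(members):
    """Build an uncompressed tar in memory from (name, data) pairs, in order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


class CountingReader:
    """Wrap a file object and count how many bytes the consumer pulled."""

    def __init__(self, raw):
        self.raw = raw
        self.count = 0

    def read(self, size=-1):
        data = self.raw.read(size)
        self.count += len(data)
        return data


def read_until(fileobj, wanted):
    """Stream the archive and return (names_seen, data) for member `wanted`."""
    seen = []
    # "r|" is tarfile's non-seekable streaming mode for uncompressed archives.
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            seen.append(member.name)
            if member.name == wanted:
                return seen, tar.extractfile(member).read()
    return seen, None


if __name__ == "__main__":
    # Hypothetical layout: small metadata first, large payload last.
    archive = build_archive([
        ("foo-1.0/gpkg-1.txt", b""),
        ("foo-1.0/metadata.tar", b"small metadata"),
        ("foo-1.0/image.tar", b"\0" * (1024 * 1024)),
    ])
    reader = CountingReader(io.BytesIO(archive))
    seen, data = read_until(reader, "foo-1.0/metadata.tar")
    print(seen, len(data), reader.count, len(archive))
```

With the metadata ahead of the payload, the reader stops long before the image is touched. Any extension placed ahead of the image would have to be pulled through the same stream before the payload could be reached.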
Re: [gentoo-dev] [pre-GLEP r3] Gentoo binary package container format
On Mon, 2018-11-26 at 21:43 +, Roy Bamford wrote:
> On 2018.11.26 18:58, Michał Górny wrote:
> > Here's the newest version.
> >
> > Changes:
> >
> > - added explicit notion of parent directory (missing in previous GLEP
> >   but present in implementation),
> >
> > - explicitly named GNU tar format with list of permitted extensions,
> >
> > - changed volume label to 'gpkg-1.txt' file to improve portability;
> >   made it explicit version identifier as well,
> >
> > - added info on other package formats to rationale.
> >
>
> [snip]
>
> The image archive stores all the files to be installed by the binary
> package. It should be included as the last of the files in the binary
> package container.
>
> [snip]
>
> > --
> > Best regards,
> > Michał Górny
> >
>
> It's a nit today but that says that any future extensions, none
> yet planned, should be placed before the image archive.

Yes.

> The specification needs to avoid the use of relative references.
>

I don't understand. Could you be more specific what you expect instead?

--
Best regards,
Michał Górny
Re: [gentoo-dev] [pre-GLEP r3] Gentoo binary package container format
On 2018.11.26 18:58, Michał Górny wrote:
> Here's the newest version.
>
> Changes:
>
> - added explicit notion of parent directory (missing in previous GLEP
>   but present in implementation),
>
> - explicitly named GNU tar format with list of permitted extensions,
>
> - changed volume label to 'gpkg-1.txt' file to improve portability;
>   made it explicit version identifier as well,
>
> - added info on other package formats to rationale.
>

[snip]

The image archive stores all the files to be installed by the binary
package. It should be included as the last of the files in the binary
package container.

[snip]

> --
> Best regards,
> Michał Górny
>

It's a nit today but that says that any future extensions, none
yet planned, should be placed before the image archive.

The specification needs to avoid the use of relative references.

--
Regards,

Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods
Re: [gentoo-dev] [pre-GLEP r3] Gentoo binary package container format
On Mon, 2018-11-26 at 20:17 +0100, Ulrich Mueller wrote:
> > > > > > On Mon, 26 Nov 2018, Michał Górny wrote:
> > Specification
> > =============
> > The container format
> > --------------------
> >
> > The gpkg package container is an uncompressed .tar archive whose filename
> > should use ``.gpkg.tar`` suffix. This archive contains the following
> > members, all placed in a single directory whose name matches
> > the basename of the package file, in order:
>
> I see no value in adding another directory indirection, and it will add
> more overhead.

Tar bomb is not a good design. Given tar padding, there will be no
overhead unless the full path exceeds ustar limits, which is unlikely.

> Also, AFAICS the tar|tar pipeline that you previously
> suggested won't work any more (or would at least require additional
> arguments).

I'm pretty sure the tar pipeline was actually written to account for
the directory.

>
> > 1. The package identifier file ``gpkg-1.txt`` (required).
> > [...]
> > The implementations must include a package identifier file named
> > ``gpkg-1.txt``. The filename includes package format version;
> > implementations should reject packages which do not contain this file
> > as unsupported format.
> > The file can have any contents. Normally, it should be empty.
>
> If the file is empty, why is it named gpkg-1.txt (instead of just
> gpkg-1)?
>

*shrug*. I can make it 'gpkg-1' or 'gpkg.1' or whatever you want ;-).

--
Best regards,
Michał Górny
Re: [gentoo-dev] [pre-GLEP r3] Gentoo binary package container format
> On Mon, 26 Nov 2018, Michał Górny wrote:
> Specification
> =============
> The container format
> --------------------
>
> The gpkg package container is an uncompressed .tar archive whose filename
> should use ``.gpkg.tar`` suffix. This archive contains the following
> members, all placed in a single directory whose name matches
> the basename of the package file, in order:

I see no value in adding another directory indirection, and it will add
more overhead. Also, AFAICS the tar|tar pipeline that you previously
suggested won't work any more (or would at least require additional
arguments).

> 1. The package identifier file ``gpkg-1.txt`` (required).
> [...]
> The implementations must include a package identifier file named
> ``gpkg-1.txt``. The filename includes package format version;
> implementations should reject packages which do not contain this file
> as unsupported format.
> The file can have any contents. Normally, it should be empty.

If the file is empty, why is it named gpkg-1.txt (instead of just
gpkg-1)?

Ulrich
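
The container rules quoted above (an uncompressed tar, a single directory named after the package file, a required ``gpkg-1.txt``) can be sketched as a check in Python. This is an illustration only, not the reference implementation. The function names are hypothetical, and the sketch assumes that "basename of the package file" means the file name with the ``.gpkg.tar`` suffix removed. It does not verify member order.

```python
import io
import os
import tarfile

SUFFIX = ".gpkg.tar"
IDENTIFIER = "gpkg-1.txt"


def check_container(path, fileobj):
    """Raise ValueError unless the archive looks like a gpkg-1 container."""
    filename = os.path.basename(path)
    # Assumption: the directory name is the file name minus the suffix.
    # The suffix itself is only a "should" in the draft, so it is not enforced.
    if filename.endswith(SUFFIX):
        dirname = filename[: -len(SUFFIX)]
    else:
        dirname = filename
    # "r:" opens the archive for reading without any decompression.
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        names = tar.getnames()
    if dirname + "/" + IDENTIFIER not in names:
        raise ValueError("unsupported format: missing " + IDENTIFIER)
    for name in names:
        if name != dirname and not name.startswith(dirname + "/"):
            raise ValueError("member outside package directory: " + name)


def build_example(dirname, members):
    """Build a small in-memory GNU-format tar with empty members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in members:
            tar.addfile(tarfile.TarInfo(dirname + "/" + name))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    good = build_example("foo-1.0", [IDENTIFIER])
    check_container("/tmp/foo-1.0.gpkg.tar", good)
    print("container accepted")
```

Because the version is carried in the identifier file name, a reader written for format 1 rejects an archive that only carries a different identifier without parsing anything else.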
Re: [gentoo-dev] [pre-GLEP r3] Gentoo binary package container format
Here's the newest version.

Changes:

- added explicit notion of parent directory (missing in previous GLEP
  but present in implementation),

- explicitly named GNU tar format with list of permitted extensions,

- changed volume label to 'gpkg-1.txt' file to improve portability; made
  it explicit version identifier as well,

- added info on other package formats to rationale.

---
GLEP:
Title: Gentoo binary package container format
Author: Michał Górny
Type: Standards Track
Status: Draft
Version: 1
Created: 2018-11-15
Last-Modified: 2018-11-26
Post-History: 2018-11-17
Content-Type: text/x-rst
---

Abstract
========

This GLEP proposes a new binary package container format for Gentoo.
The current tbz2/XPAK format is briefly described, and its deficiencies
are explained. Accordingly, the requirements for a new format are set
and a gpkg format satisfying them is proposed. The rationale for
the design decisions is provided.

Motivation
==========

The current Portage binary package format
-----------------------------------------

The historical ``.tbz2`` binary package format used by Portage is
a concatenation of two distinct formats: header-oriented compressed .tar
format (used to hold package files) and trailer-oriented custom XPAK
format (used to hold metadata) [#MAN-XPAK]_.

The format has already been extended incompatibly twice.

The first time, support for storing multiple successive builds of
a binary package for a single ebuild version was added. This feature
relies on appending an additional hyphen, followed by an integer, to
the package filename. It is disabled by default (preserving backwards
compatibility) and controlled by the ``binpkg-multi-instance`` feature.

The second time, support for additional compression formats was added.
When a format other than bzip2 is used, the ``.tbz2`` suffix is replaced
by ``.xpak`` and Portage relies on magic bytes to detect the compression
used. For backwards compatibility, Portage still defaults to using
bzip2; the compression program can be switched using the
``BINPKG_COMPRESS`` configuration variable.
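
As an illustration of detecting the compression from magic bytes, here is a minimal Python sketch. It is not Portage's code: the function name `detect_compression` and the lookup table are hypothetical, and the table covers only four compressors. The byte signatures themselves are the standard leading bytes of bzip2, gzip, xz and zstd streams.

```python
import bz2
import gzip
import lzma

# Leading magic bytes of common compressed streams.
MAGIC = {
    b"BZh": "bzip2",
    b"\x1f\x8b": "gzip",
    b"\xfd\x37\x7a\x58\x5a\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zstd",
}


def detect_compression(head):
    """Guess the compressor from the first bytes of a file, or return None."""
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    payload = b"example package contents"
    for compress in (bz2.compress, gzip.compress, lzma.compress):
        # Only the first few bytes are needed to make the decision.
        print(detect_compression(compress(payload)[:8]))
```

The file suffix plays no part in the decision here, which matches the situation described above where ``.xpak`` no longer says which compressor was used.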
Additionally, there have been minor changes to the stored metadata
and file storage policies. In particular, behavior regarding
``INSTALL_MASK``, controllable file compression and stripping has
changed over time.

The advantages of tbz2/XPAK format
----------------------------------

The tbz2/XPAK format used by Portage has three interesting features:

1. **Each binary package is fully contained within a single file.**
   While this might seem unnecessary, it makes it easier for the user to
   transfer binary packages without having to be concerned about finding
   all the necessary files to transfer.

2. **The binary packages are compatible with regular compressed
   tarballs, most of the time.** With notable exceptions of historical
   versions of pbzip2 and the recent zstd compressor, tbz2/XPAK packages
   can be extracted using regular tar utility with a compressor
   implementation that discards trailing garbage.

3. **The metadata is uncompressed, and can be efficiently accessed
   without decompressing package contents.** This includes
   the possibility of rewriting it (e.g. as a result of package moves)
   without the necessity of repacking the files.

Transparency problem with the current binary package format
-----------------------------------------------------------

Notwithstanding its advantages, the tbz2/XPAK format has a significant
design fault that consists of two issues:

1. **The XPAK format is a custom binary format with explicit use
   of binary-encoded file offsets and field lengths.** As such, it is
   non-trivial to read or edit without specialized tools. Such tools are
   currently implemented separately from the package manager, as part of
   the portage-utils toolkit, written in C [#PORTAGE-UTILS]_.

2. **The tarball compatibility feature relies on the obscure feature of
   ignoring trailing garbage in compressed files**. While this is
   implemented consistently in most of the compressors, this feature
   is not really a part of the specification but rather traditional
   behavior. Given that the original reasons for this no longer apply,
   new compressor implementations are likely to miss support for this.

Both of the issues make the format hard to use without dedicated tools,
or when the tools misbehave. This impacts the following scenarios:

A. **Using binary packages for system recovery.** In case of serious
   breakage, it is really preferable that the format depends on as few
   tools as possible, and especially not on Gentoo-specific tools.

B. **Inspecting binary packages in detail exceeding standard package
   manager facilities.**

C. **Modifying binary packages in ways not predicted by the package
   manager authors.** A real-life example of this is working around
   broken ``pkg_*`` phases which prevent the package from being
   installed.

OpenPGP extensibility problem
-----------------------------

There are at