Re: Decreasing packaging overhead
At Sun, 1 Nov 2015 12:33:19 -0800, Josh Triplett wrote:
> Thomas Goirand wrote:
> > But good luck to teach good practices upstream. See Ross's reply: 120
> > packages are depending on this.
>
> It's more than that. Given tooling that doesn't have excessive overhead
> for small packages, why call such packages "bad practices" in the first
> place?

The total number of lines across all the files in the git repository is 161, of which 5 are lines of code, so the overhead is 3220%. Or if you want to measure in bytes, total files are 4515 bytes and index.js is 150 bytes, which results in an overhead of 3010%. In my opinion that is excessive overhead.

And that's just the overhead in bits. This package probably won't need any changes in the future, but packages with a few more lines of code might. What happens when the maintainer goes MIA and something needs to be fixed? Do we then get forks of libraries that have only 30 lines of code, everybody has to update their dependencies to get the fixed version, etc.? That is also overhead you wouldn't have with a standard library maintained by a group of developers.

Kind regards,

Jeroen Dekkers
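For anyone who wants to check the two percentages above, the arithmetic can be sketched in a few lines of Python. The helper name `overhead_percent` is made up for illustration, and the sketch assumes the convention used in the message: the repository total expressed as a percentage of the actual code.

```python
def overhead_percent(total: float, payload: float) -> float:
    """Repository total as a percentage of the code itself (hypothetical helper)."""
    return total / payload * 100


# Figures taken from the message above.
by_lines = overhead_percent(161, 5)     # 161 lines in the repository, 5 lines of code
by_bytes = overhead_percent(4515, 150)  # 4515 bytes in total, 150 bytes in index.js

print(f"{by_lines:.0f}% by lines, {by_bytes:.0f}% by bytes")
```

Note that this counts the code itself as part of the total; subtracting the payload first would give figures that are 100 percentage points lower.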
Re: Decreasing packaging overhead
On 2015-11-02 22:55, Thomas Goirand wrote:
> It's not the package which is a bad practice, here, the maintainer is
> only dealing with upstream.
>
> What's a bad practice is creating a library for 2 lines of code.
> Upstream should have tried to integrate this function into a bigger
> library with more functionality to make it more useful.

i resent the notion that either is bad practice.

the problem merely reflects that Debian's concept of packages does not map well to other communities' concepts of packages (and i think i'm in line here with josh).

our tried and tested concept of packages/libraries has been working for decades. young and emerging software development processes (might) have different needs.

fgmasdr
IOhannes
Re: Decreasing packaging overhead
Hi,

[just picking a few random bits]

On Sun, Nov 01, 2015 at 12:33:19PM -0800, Josh Triplett wrote:
> Files, Checksums-Sha1, and Checksums-Sha256 are clearly redundant; has
> it been long enough that we can drop the first two yet?

apt/jessie should be fine with that, but as mentioned the last few times we had this dropping MD5/SHA1 discussion: It's not totally unrealistic that there are still tools which need changes. If it hasn't changed since then, jigdo would be an example.

Using either of these hashes is 'no' problem if you take it just for intermediate steps and verify the result at the end more heavily. It's how pdiffs work at the moment, for example (but we are working on changing it [0]).

What is clearly missing here is someone working on getting this forward. Just waiting isn't going to do it. apt waited >10 years before having the radical idea of wanting to deprecate repositories without a Release file. It took merely hours before the first complaints[1] trickled in.

[0] https://lists.debian.org/debian-dak/2015/10/msg00010.html
[1] No pointers, just the obvious xkcd#1172 reference

> Now that we use a secure hash, do we really need the sizes in those
> fields?

Once upon a time even MD5 was considered secure. Now it's relatively easy to find collisions, a little harder to do pre-image, but adding a same-size requirement makes it harder. Also, checking if you got "too much" data based on size is important to prevent denial of service attacks, as an attacker can otherwise fill up your disk. Oh, and people love progress reports.

> Furthermore, we could generate the filenames from the source
> name and version.

Filenames with or without epoch? (yes, that is a trick question) There are also v3 additional orig tarballs and other lovely things to worry about.

For binary packages it might make sense though to move the info in the Release file with a field containing enough variables to make that fly. I considered that briefly for Changelog: (see thread-start of [0] above), but then decided that this is too complicated for this. That could surely be done if someone would get behind this.

> In the Packages files for binaries, we could eliminate a *massive*
> amount of redundancy by having a dedicated Packages file for "all", to
> avoid duplicating entries into every architecture's Packages file.
> That should not significantly increase overhead for end-users, and for
> any user of multiarch it'll decrease overhead. A quick check on amd64
> shows that splitting out "all" into a separate Packages file would not
> change the combined uncompressed size at all, should not change the
> pdiff size at all, and would increase the combined compressed
> full-download size by 94k, from 9957k to 10051k, an increase of less
> than 1%. That seems reasonable in exchange for eliminating 12
> duplicate copies of the 4396k used for "all" Packages files, times
> suites (oldstable/stable/testing/unstable/experimental), and that
> doesn't even count unofficial architectures, or snapshot.debian.org.

You are a few days too late for suggesting that idea, as Johannes already pointed out. Still, that will be a bunch of work, so if anyone wants to help…

> Ditto for translated descriptions, except that there, we should share
> descriptions across architectures by default, even for arch-specific
> packages. Almost no packages have descriptions that vary by
> architecture.

We already share descriptions, see i18n/ … or what do you mean?

> For translated descriptions, Package and Description-md5 seem redundant.

Well, Package + the md5 of the original description as identifier was chosen because versions change way more often compared to descriptions. Only doing it based on package name is dangerous in terms of packages changing greatly between versions, both of which, if you are unlucky, still exist in different architectures.

A rarely noticed side effect of having -md5 is btw that translations can be shared across repositories, so that e.g. security.d.o (or experimental or your random bikeshed) uses the translated descriptions of the main archive. That isn't possible anymore if you go for package name only.

What could have been done back then would have been using a shorter hash I guess. It seems a bit too late to change that now, but if someone feels like working on it I am not going to complain…

Anyway, a giant list of things which could potentially be done isn't going to change anything as the problem isn't that we have too few tasks for the giant contributor armies working on the tools which need to be changed for something to happen…

Best regards

David Kalnischkies
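The point made earlier in this message about keeping sizes next to the hashes can be illustrated with a small Python sketch. It is not how apt implements its checks; `verify_stream` and its parameters are hypothetical names, and the sketch only shows the idea of aborting as soon as more data arrives than the index announced, before checking the hash at the end.

```python
import hashlib
import io


def verify_stream(stream, expected_size: int, expected_sha256: str,
                  chunk: int = 65536) -> bool:
    """Hash a stream, giving up as soon as it exceeds the announced size."""
    digest = hashlib.sha256()
    seen = 0
    while True:
        block = stream.read(chunk)
        if not block:
            break
        seen += len(block)
        if seen > expected_size:
            # More data than announced: stop instead of filling the disk.
            return False
        digest.update(block)
    # Both the size and the hash have to match.
    return seen == expected_size and digest.hexdigest() == expected_sha256


payload = b"example payload"
checksum = hashlib.sha256(payload).hexdigest()

ok = verify_stream(io.BytesIO(payload), len(payload), checksum)
oversized = verify_stream(io.BytesIO(payload + b"x" * 1000), len(payload), checksum)
truncated = verify_stream(io.BytesIO(payload[:-1]), len(payload), checksum)
print(ok, oversized, truncated)
```

A hash alone would also reject the oversized stream, but only after all of it had been read; the known size is what allows the early exit (and the progress report).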
Re: Decreasing packaging overhead
On Sun, Nov 01, 2015 at 12:33:19PM -0800, Josh Triplett wrote:
> In the Packages files for binaries, we could eliminate a *massive*
> amount of redundancy by having a dedicated Packages file for "all", to
> avoid duplicating entries into every architecture's Packages file.

See [1]. However there is additional logic in dak that hides newer arch:all packages if the corresponding binary has not been built yet. (Or adds the older arch:all binary in addition?) That would no longer be possible.

Kind regards
Philipp Kern

[1] http://ftp.de.debian.org/debian/dists/sid/main/binary-all/Packages.gz
    Interesting if you need to know which arch:all need building given the constraint above.
Re: Decreasing packaging overhead
On 11/01/2015 09:33 PM, Josh Triplett wrote:
> Thomas Goirand wrote:
>> But good luck to teach good practices upstream. See Ross's reply: 120
>> packages are depending on this.
>
> It's more than that. Given tooling that doesn't have excessive overhead
> for small packages, why call such packages "bad practices" in the first
> place?

It's not the package which is a bad practice, here, the maintainer is only dealing with upstream.

What's a bad practice is creating a library for 2 lines of code. Upstream should have tried to integrate this function into a bigger library with more functionality to make it more useful.

>> Though it is also my view that packaging tiny stuff shouldn't be a
>> problem. If it is, then we should fix whatever it is that is problematic
>> in Debian infra.
>
> Agreed.
>
> Let's consider what overhead exists for a Debian package [...]

IMO, the reasoning should start from the *infra* part, i.e., what is taking a toll on dak / britney2 [/ others?], and what part of the infra is too slow. In some cases, rethinking these could work; in others, just throwing more compute power at it could also do... I don't know the Debian infra well enough to be able to tell. Though where I work (i.e. nearly unlimited resources from the cloud) every resource issue is fixable...

Cheers,

Thomas Goirand (zigo)
Re: Decreasing packaging overhead
Hi,

Quoting Josh Triplett (2015-11-01 21:33:19)
> "Binary" seems a bit excessive for several reasons. First, it seems
> redundant with the "Source" entries in Packages files; we don't
> necessarily need a two-way cross-reference at all here. And second, we
> could assume that a missing entry means "same as Package". That rule
> (source equals binary) would work for 13364 of 24097 packages in Debian
> today, and potentially more if other single-binary packages ensured
> their source and binary names matched.
>
> For that matter, Binary and Package-List seem redundant. (And
> Package-List doesn't seem like end-user metadata; it seems like
> something only the Debian infrastructure needs.)

You can read about the original purpose of the Package-List field here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=619131

It has recently been extended to also carry the build profile formulas of the binary packages a source package builds, and there are talks to also let it contain non-architecture-specific, unversioned binary Provides information.

You would probably call this yet more duplication because all this information can be retrieved from the Packages file. But adding this information to the Sources file is useful because in a bootstrapping scenario your Packages file is empty.

> Do we really need fields like Build-Depends, Testsuite, or Standards-Version
> pulled out of the package itself and placed into the Sources file? Why do we
> need to read those without the source package? (Note that tools that form
> part of Debian infrastructure could work from UDD or similar; the question is
> why those fields are needed on an end-user system that downloaded the Sources
> file.)

Why does an "end-user system" have a deb-src line in its /etc/apt/sources.list in the first place? I think I don't understand yet what use case you want to optimize for. And what do you consider "Debian infrastructure"?

I like the Sources file precisely because it can be downloaded by anybody (in contrast to UDD for which one has to use a mirror right now). If information like Build-Depends or the Package-List field gets removed from the Sources file, then it should remain in a place that is as easy to access as the Sources file is right now.

What exactly are you proposing? Suppose we'd have a Sources.full and a Sources.minimal. The latter would only carry the fields necessary for apt to be able to download a dsc while the former would be like today's Sources file. Would that not again mean lots of duplication because all information in Sources.minimal is part of Sources.full? So this would not help our mirrors (they'd actually now have to store more) but it would help our users who would now have a few MB less on their systems. Or you could just not have Sources.full on the mirrors but only distribute Sources.minimal. But if you do that, please, please, please make Sources.full just as easily accessible as Sources is right now, including getting snapshotted and available for all suites, architectures and ports.

> In the Packages files for binaries, we could eliminate a *massive* amount of
> redundancy by having a dedicated Packages file for "all", to avoid
> duplicating entries into every architecture's Packages file. That should not
> significantly increase overhead for end-users, and for any user of multiarch
> it'll decrease overhead. A quick check on amd64 shows that splitting out
> "all" into a separate Packages file would not change the combined
> uncompressed size at all, should not change the pdiff size at all, and would
> increase the combined compressed full-download size by 94k, from 9957k to
> 10051k, an increase of less than 1%. That seems reasonable in exchange for
> eliminating 12 duplicate copies of the 4396k used for "all" Packages files,
> times suites (oldstable/stable/testing/unstable/experimental), and that
> doesn't even count unofficial architectures, or snapshot.debian.org.

There is a thread about this on debian-dak@l.d.o: http://lists.debian.org/20151030145625.GB14516@crossbow

cheers, josch
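To make the Sources.full / Sources.minimal thought experiment from this message a bit more concrete, here is a rough Python sketch that derives a reduced stanza from a full one. Neither file exists in the archive; the names `MINIMAL_FIELDS`, `parse_stanza` and `minimal_stanza` are invented here, and the field set is only a guess at what a download-only client might need, not a statement about what apt actually requires.

```python
# Assumed field set for a download-only index; purely illustrative.
MINIMAL_FIELDS = ("Package", "Version", "Directory", "Checksums-Sha256")


def parse_stanza(text: str) -> dict[str, str]:
    """Split one control-file stanza into fields, keeping continuation lines."""
    fields: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        if line.startswith((" ", "\t")) and name is not None:
            fields[name] += "\n" + line        # continuation of the previous field
        elif ":" in line:
            name, _, value = line.partition(":")
            fields[name] = value               # raw value, leading space included
    return fields


def minimal_stanza(text: str) -> str:
    """Keep only the fields listed in MINIMAL_FIELDS, in that order."""
    fields = parse_stanza(text)
    return "\n".join(f"{key}:{fields[key]}" for key in MINIMAL_FIELDS if key in fields)


# Abbreviated version of the node-defined Sources entry from this thread.
SAMPLE = """\
Package: node-defined
Binary: node-defined
Version: 1.0.0-1
Build-Depends: debhelper (>= 9), dh-buildinfo, nodejs
Standards-Version: 3.9.6
Checksums-Sha256:
 4aa2a079bc7119678c58643def268e4789b56a6a40b2931601de527244a1def8 1997 node-defined_1.0.0-1.dsc
Directory: pool/main/n/node-defined
"""

reduced = minimal_stanza(SAMPLE)
print(reduced)
```

Everything in the reduced stanza is also present in the full one, which is exactly the duplication concern raised above if both files were to be published side by side.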
Decreasing packaging overhead
Thomas Goirand wrote:
> But good luck to teach good practices upstream. See Ross's reply: 120
> packages are depending on this.

It's more than that. Given tooling that doesn't have excessive overhead for small packages, why call such packages "bad practices" in the first place?

> Though it is also my view that packaging tiny stuff shouldn't be a
> problem. If it is, then we should fix whatever it is that is problematic
> in Debian infra.

Agreed.

Let's consider what overhead exists for a Debian package, and what we could potentially reduce or remove, using node-defined as an example. (Obviously any such changes to metadata may require a full Debian release to propagate changes to tools like apt and dpkg.) To make redundancy more evident, I'll include everything first before discussing any of it.

First, an entry in Sources that looks like this, for each Debian suite (unstable/testing/stable/oldstable):

Package: node-defined
Binary: node-defined
Version: 1.0.0-1
Maintainer: Debian Javascript Maintainers
Uploaders: Ross Gammon
Build-Depends: debhelper (>= 9), dh-buildinfo, nodejs
Architecture: all
Standards-Version: 3.9.6
Format: 3.0 (quilt)
Files:
 43ab019e6b53b9f4d4ff338027cb351d 1997 node-defined_1.0.0-1.dsc
 978d30ee28482aa7812f74f812b1899f 2334 node-defined_1.0.0.orig.tar.gz
 557f4bcec8a449608e50d09ba69bd224 2416 node-defined_1.0.0-1.debian.tar.xz
Vcs-Browser: https://anonscm.debian.org/cgit/pkg-javascript/node-defined.git
Vcs-Git: git://anonscm.debian.org/pkg-javascript/node-defined.git
Checksums-Sha1:
 02cb2027e3218b93fd856a5e3b68134fe01e47c1 1997 node-defined_1.0.0-1.dsc
 eff888bf76f9cfcca2b94e39c470a6c1441b3f03 2334 node-defined_1.0.0.orig.tar.gz
 7237a9a8aee2add44a9d8bb0dae382c3f0a923cf 2416 node-defined_1.0.0-1.debian.tar.xz
Checksums-Sha256:
 4aa2a079bc7119678c58643def268e4789b56a6a40b2931601de527244a1def8 1997 node-defined_1.0.0-1.dsc
 d953e6e9fe9277cc6e68e5bb36a299d8f3505f8facd3468ab7edc7d6858d293a 2334 node-defined_1.0.0.orig.tar.gz
 56ede623ee7929fcb334fa7459c3e3f43b529bf2b585866d5ebc9ee06cc3d03d 2416 node-defined_1.0.0-1.debian.tar.xz
Homepage: https://github.com/substack/defined
Package-List:
 node-defined deb web optional arch=all
Testsuite: autopkgtest
Directory: pool/main/n/node-defined
Priority: extra
Section: misc

Second, an entry in *each architecture's* Packages file like this, for each Debian suite:

Package: node-defined
Version: 1.0.0-1
Installed-Size: 19
Maintainer: Debian Javascript Maintainers
Architecture: all
Depends: nodejs
Description: return the first argument that is `!== undefined`
Homepage: https://github.com/substack/defined
Description-md5: b4200f8f2e989c1354c3c1cb3677e663
Section: web
Priority: optional
Filename: pool/main/n/node-defined/node-defined_1.0.0-1_all.deb
Size: 3292
MD5sum: d5a08f2219b4128a49be206caeb5b8b4
SHA1: 115317d45d5028203269d84aa07c447d7c12ea7b
SHA256: 5be875d209afc69aa2d6be10bbed3c514e75f0a5e8d5a769a6461f42ab6db581

(Note that a source package with multiple binary packages would have multiple such entries.)

Third, an entry in Translation-en (and every other translation), for each Debian suite:

Package: node-defined
Description-md5: b4200f8f2e989c1354c3c1cb3677e663
Description-en: return the first argument that is `!== undefined`
 Most of the time when you chain together ||s, you actually just want the
 first item that is not undefined, not the first non-falsy item.
 .
 This module is like the defined-or (//) operator in perl 5.10+.
 .
 Node.js is an event-based server-side JavaScript engine.
Fourth, the source package .dsc file:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 3.0 (quilt)
Source: node-defined
Binary: node-defined
Architecture: all
Version: 1.0.0-1
Maintainer: Debian Javascript Maintainers
Uploaders: Ross Gammon
Homepage: https://github.com/substack/defined
Standards-Version: 3.9.6
Vcs-Browser: https://anonscm.debian.org/cgit/pkg-javascript/node-defined.git
Vcs-Git: git://anonscm.debian.org/pkg-javascript/node-defined.git
Testsuite: autopkgtest
Build-Depends: debhelper (>= 9), dh-buildinfo, nodejs
Package-List:
 node-defined deb web optional arch=all
Checksums-Sha1:
 eff888bf76f9cfcca2b94e39c470a6c1441b3f03 2334 node-defined_1.0.0.orig.tar.gz
 7237a9a8aee2add44a9d8bb0dae382c3f0a923cf 2416 node-defined_1.0.0-1.debian.tar.xz
Checksums-Sha256:
 d953e6e9fe9277cc6e68e5bb36a299d8f3505f8facd3468ab7edc7d6858d293a 2334 node-defined_1.0.0.orig.tar.gz
 56ede623ee7929fcb334fa7459c3e3f43b529bf2b585866d5ebc9ee06cc3d03d 2416 node-defined_1.0.0-1.debian.tar.xz
Files:
 978d30ee28482aa7812f74f812b1899f 2334 node-defined_1.0.0.orig.tar.gz
 557f4bcec8a449608e50d09ba69bd224 2416 node-defined_1.0.0-1.debian.tar.xz

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJWKj8IAAoJEPNPCXROn13ZrhwP/1+FQtC5NIM1SAWj8capx3Sm