Re: Decreasing packaging overhead

2015-11-03 Thread Jeroen Dekkers
At Sun, 1 Nov 2015 12:33:19 -0800,
Josh Triplett wrote:
> 
> Thomas Goirand wrote:
> > But good luck to teach good practices upstream. See Ross's reply: 120
> > packages are depending on this.
> 
> It's more than that.  Given tooling that doesn't have excessive overhead
> for small packages, why call such packages "bad practices" in the first
> place?

The total number of lines across all the files in the git repository is
161, of which 5 are lines of code, so the overhead is 3220%. Or, if you
want to measure in bytes, the files total 4515 bytes and index.js is 150
bytes, which results in an overhead of 3010%. In my opinion that is
excessive overhead.
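
A quick check of those percentages, as a minimal sketch. The helper name
overhead_percent is made up for illustration, and it follows the convention
used above of dividing the total by the payload:

```python
def overhead_percent(total, payload):
    """Total size as a percentage of the payload (convention used above)."""
    return 100 * total / payload

by_lines = overhead_percent(161, 5)     # repository lines vs. lines of code
by_bytes = overhead_percent(4515, 150)  # repository bytes vs. bytes in index.js
print(f"{by_lines:.0f}% by lines, {by_bytes:.0f}% by bytes")
```

Counting only the non-code part instead, (161 - 5) / 5, would give 3120%,
which does not change the conclusion.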

And that's just the overhead in bits. This package probably won't need
any changes in the future, but packages with a few more lines of code
might. What happens when the maintainer goes MIA and something needs
to be fixed? Do we then get forks of libraries that have only 30 lines
of code, everybody has to update their dependencies to get the fixed
version, etc.? That is also overhead you wouldn't have with a standard
library maintained by a group of developers.


Kind regards,

Jeroen Dekkers



Re: Decreasing packaging overhead

2015-11-03 Thread Debian/GNU
On 2015-11-02 22:55, Thomas Goirand wrote:
> It's not the package which is a bad practice, here, the maintainer is
> only dealing with upstream.
> 
> What's a bad practice is creating a library for 2 lines of code.
> Upstream should have tried to integrate this function into a bigger
> library with more functionality to make it more useful.

i resent the notion that either is bad practice.

the problem merely reflects that Debian's concept of packages does not
map well to other communities' concepts of packages (and i think i'm in
line here with josh).

our tried and tested concept of packages/libraries has been working for
decades. young and emerging software development processes (might) have
different needs.

fgmasdr
IOhannes



Re: Decreasing packaging overhead

2015-11-02 Thread David Kalnischkies
Hi,

[just picking a few random bits]

On Sun, Nov 01, 2015 at 12:33:19PM -0800, Josh Triplett wrote:
> Files, Checksums-Sha1, and Checksums-Sha256 are clearly redundant; has
> it been long enough that we can drop the first two yet?

apt/jessie should be fine with that, but as mentioned the last few times
we had this dropping MD5/SHA1 discussion: It's not totally unrealistic
that there are still tools which need changes. If it hasn't changed
since then, jigdo would be an example. Using either of these hashes is
'no' problem if you take it just for intermediate steps and verify the
result at the end more heavily. It's how pdiffs work at the moment, for
example (but we are working on changing it [0]).

What is clearly missing here is someone working on getting this forward.
Just waiting isn't going to do it. apt waited >10 years before having
the radical idea of wanting to deprecate repositories without a Release
file. It took merely hours before the first complaints[1] trickled in.

[0] https://lists.debian.org/debian-dak/2015/10/msg00010.html
[1] No pointers, just the obvious xkcd#1172 reference


> Now that we use a secure hash, do we really need the sizes in those
> fields?

Once upon a time even MD5 was considered secure. Now it's relatively easy
to find collisions and a little harder to find a pre-image, but adding
a same-size requirement makes both harder. Also, checking whether you got
"too much" data based on size is important to prevent denial-of-service
attacks, as an attacker can otherwise fill up your disk. Oh, and people
love progress reports.
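
A minimal sketch of that size check, assuming the expected size and SHA256
come from an already trusted index. The function name read_verified and its
parameters are hypothetical and are not apt's actual implementation:

```python
import hashlib
import io

def read_verified(stream, expected_size, expected_sha256, chunk_size=65536):
    """Read a stream, enforcing the advertised size and SHA256 digest."""
    digest = hashlib.sha256()
    chunks = []
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        received += len(chunk)
        if received > expected_size:
            # Abort early: the sender delivers more than was advertised.
            raise ValueError("more data than the advertised size")
        digest.update(chunk)
        chunks.append(chunk)
    if received != expected_size:
        raise ValueError("less data than the advertised size")
    if digest.hexdigest() != expected_sha256:
        raise ValueError("SHA256 mismatch")
    return b"".join(chunks)

payload = b"example payload"
checked = read_verified(io.BytesIO(payload), len(payload),
                        hashlib.sha256(payload).hexdigest())
```

The size is checked while reading, so an oversized response is rejected
before it can fill the disk; the hash alone could only be checked at the end.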


> Furthermore, we could generate the filenames from the source
> name and version.

Filenames with or without epoch? (yes, that is a trick question) There
are also v3 additional orig tarballs and other lovely things to worry
about. For binary packages it might make sense, though, to move the info
into the Release file with a field containing enough variables to make
that fly. I considered that briefly for Changelog: (see thread-start of
[0] above), but then decided that this is too complicated for this.

That could surely be done if someone would get behind this.
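
To illustrate why the epoch question is a trick one, here is a minimal
sketch. The helper names are hypothetical, the version "1:1.0.0-1" is an
invented example, and the sketch assumes that archive filenames leave the
epoch out:

```python
def strip_epoch(version):
    """Drop a leading 'epoch:' from a Debian version string, if present."""
    return version.split(":", 1)[1] if ":" in version else version

def dsc_filename(source, version):
    """Derive a .dsc name from source name and version, without the epoch."""
    return f"{source}_{strip_epoch(version)}.dsc"

without_epoch = dsc_filename("node-defined", "1.0.0-1")
with_epoch = dsc_filename("node-defined", "1:1.0.0-1")
print(without_epoch, with_epoch)
```

Both calls yield the same name, so a filename derived this way cannot tell
two versions apart if they differ only in epoch. Deriving the orig tarball
names would additionally require knowing the compression and any additional
tarballs, which this sketch does not attempt.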


> In the Packages files for binaries, we could eliminate a *massive*
> amount of redundancy by having a dedicated Packages file for "all", to
> avoid duplicating entries into every architecture's Packages file.
> That should not significantly increase overhead for end-users, and for
> any user of multiarch it'll decrease overhead.  A quick check on amd64
> shows that splitting out "all" into a separate Packages file would not
> change the combined uncompressed size at all, should not change the
> pdiff size at all, and would increase the combined compressed
> full-download size by 94k, from 9957k to 10051k, an increase of less
> than 1%.  That seems reasonable in exchange for eliminating 12
> duplicate copies of the 4396k used for "all" Packages files, times
> suites (oldstable/stable/testing/unstable/experimental), and that
> doesn't even count unofficial architectures, or snapshot.debian.org.

You are a few days too late for suggesting that idea, as Johannes
already pointed out. Still, that will be a bunch of work, so if anyone
wants to help…
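
For reference, the arithmetic behind the quoted figures, as a rough sketch.
Multiplying by five suites assumes that the "all" entries are about the same
size in every suite, which the quoted message does not state:

```python
before_k = 9957    # combined compressed full download today (quoted)
after_k = 10051    # with "all" split into its own Packages file (quoted)
increase_k = after_k - before_k
increase_pct = 100 * increase_k / before_k

all_k = 4396       # size quoted for the "all" Packages data
copies = 12        # duplicate copies quoted above
suites = 5         # oldstable/stable/testing/unstable/experimental
saved_per_suite_k = all_k * copies
saved_total_k = saved_per_suite_k * suites

print(f"download grows by {increase_k}k ({increase_pct:.2f}%)")
print(f"duplicates removed: {saved_per_suite_k}k per suite, "
      f"{saved_total_k}k over {suites} suites")
```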


> Ditto for translated descriptions, except that there, we should share
> descriptions across architectures by default, even for arch-specific
> packages.  Almost no packages have descriptions that vary by
> architecture.

We already share descriptions, see i18n/ … or what do you mean?


> For translated descriptions, Package and Description-md5 seem redundant.

Well, Package + the md5 of the original description as identifier was
chosen because versions change way more often compared to descriptions.
Only doing it based on package name is dangerous in terms of packages
changing greatly between versions, which if you are unlucky both still
exist in different architectures. A rarely noticed side effect of having
-md5 is btw that translations can be shared across repositories, so that
e.g. security.d.o (or experimental or your random bikeshed) uses the
translated descriptions of the main archive.  That isn't possible
anymore if you go for package name only.
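
A minimal sketch of that keying scheme. The function names and the
placeholder translation are hypothetical, and the exact bytes the archive
feeds into the hash are an assumption here, so the digest computed below is
not claimed to match the Description-md5 shown elsewhere in this thread:

```python
import hashlib

def description_key(package, original):
    """Key a translation by package name plus MD5 of the original text."""
    digest = hashlib.md5(original.encode("utf-8")).hexdigest()
    return (package, digest)

original = "return the first argument that is `!== undefined`\n"

# One shared table, independent of version and of repository.
translations = {
    description_key("node-defined", original): "placeholder translated text",
}

def lookup(package, description):
    """Return the translation for this exact description, or None."""
    return translations.get(description_key(package, description))

same_text = lookup("node-defined", original)
changed_text = lookup("node-defined", "a completely rewritten description\n")
```

A new version with an unchanged description still finds its translation,
while a rewritten description under the same package name does not pick up
a stale one.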

What could have been done back then would have been using a shorter hash
I guess. It seems a bit too late to change that now, but if someone
feels like working on it I am not going to complain…


Anyway, a giant list of things which could potentially be done isn't
going to change anything as the problem isn't that we have too few tasks
for the giant contributor armies working on the tools which need to be
changed for something to happen…


Best regards

David Kalnischkies



Re: Decreasing packaging overhead

2015-11-02 Thread Philipp Kern
On Sun, Nov 01, 2015 at 12:33:19PM -0800, Josh Triplett wrote:
> In the Packages files for binaries, we could eliminate a *massive*
> amount of redundancy by having a dedicated Packages file for "all", to
> avoid duplicating entries into every architecture's Packages file.

See [1]. However there is additional logic in dak that hides newer
arch:all packages if the corresponding binary has not been built yet.
(Or adds the older arch:all binary in addition?) That would no longer
be possible.

Kind regards
Philipp Kern

[1] http://ftp.de.debian.org/debian/dists/sid/main/binary-all/Packages.gz
Interesting if you need to know which arch:all need building given
the constraint above.



Re: Decreasing packaging overhead

2015-11-02 Thread Thomas Goirand
On 11/01/2015 09:33 PM, Josh Triplett wrote:
> Thomas Goirand wrote:
>> But good luck to teach good practices upstream. See Ross's reply: 120
>> packages are depending on this.
> 
> It's more than that.  Given tooling that doesn't have excessive overhead
> for small packages, why call such packages "bad practices" in the first
> place?

It's not the package which is a bad practice, here, the maintainer is
only dealing with upstream.

What's a bad practice is creating a library for 2 lines of code.
Upstream should have tried to integrate this function into a bigger
library with more functionality to make it more useful.

>> Though it is also my view that packaging tiny stuff shouldn't be a
>> problem. If it is, then we should fix whatever it is that is problematic
>> in Debian infra.
> 
> Agreed.
> 
> Let's consider what overhead exists for a Debian package [...]

IMO, the reasoning should start from the *infra* part, ie, what is
taking a toll on dak / britney2 [/ others?], and what part of the infra
is too slow. In some cases, rethinking these could work; in others, just
throwing more compute power at it could also do... I don't know the
Debian infra enough to be able to tell. Though where I work (ie: nearly
unlimited resources from the cloud) every resource issue is fixable...

Cheers,

Thomas Goirand (zigo)



Re: Decreasing packaging overhead

2015-11-02 Thread Johannes Schauer
Hi,

Quoting Josh Triplett (2015-11-01 21:33:19)
> "Binary" seems a bit excessive for several reasons.  First, it seems
> redundant with the "Source" entries in Packages files; we don't
> necessarily need a two-way cross-reference at all here.  And second, we
> could assume that a missing entry means "same as Package".  That rule
> (source equals binary) would work for 13364 of 24097 packages in Debian
> today, and potentially more if other single-binary packages ensured
> their source and binary names matched.
> 
> For that matter, Binary and Package-List seem redundant.  (And
> Package-List doesn't seem like end-user metadata; it seems like
> something only the Debian infrastructure needs.)

You can read about the original purpose of the Package-List field here:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=619131

It has recently been extended to also carry the build profile formulas of the
binary packages that a source package builds, and there are talks to also let
it contain non-architecture specific, unversioned binary Provides information.

You would probably call this yet more duplication because all this information
can be retrieved from the Packages file. But adding this information to the
Sources file is useful because in a bootstrapping scenario your Packages file
is empty.
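
For reference, the defaulting rule from the quoted proposal (a missing Binary
field means "same as Package") could look like the sketch below. The function
name is hypothetical, and the stanza is assumed to be already parsed into a
dict:

```python
def binary_names(stanza):
    """Binary package names for a Sources stanza given as a dict."""
    if "Binary" in stanza:
        # Binary is a comma-separated list of binary package names.
        return [name.strip() for name in stanza["Binary"].split(",")]
    # Proposed default: a single binary named like the source package.
    return [stanza["Package"]]

# Share of packages the rule would cover, from the quoted numbers.
share_pct = 100 * 13364 / 24097
print(f"{share_pct:.1f}% of packages could omit the field")
```

Going by the quoted numbers, a bit more than half of the stanzas could drop
the field under this rule.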

> Do we really need fields like Build-Depends, Testsuite, or Standards-Version
> pulled out of the package itself and placed into the Sources file?  Why do we
> need to read those without the source package? (Note that tools that form
> part of Debian infrastructure could work from UDD or similar; the question is
> why those fields are needed on an end-user system that downloaded the Sources
> file.)

Why does an "end-user system" have a deb-src line in its /etc/apt/sources.list
in the first place? I think I don't understand yet what use case you want to
optimize for.

And what do you consider "Debian infrastructure"? I like the Sources file
precisely because it can be downloaded by anybody (in contrast to UDD for which
one has to use a mirror right now). If information like Build-Depends or the
Package-List field gets removed from the Sources file, then it should remain in
a place that is as easy to access as the Sources file is right now.

What exactly are you proposing?

Suppose we'd have a Sources.full and a Sources.minimal. The latter would only
carry the fields necessary for apt to be able to download a dsc while the
former would be like today's Sources file. Would that not again mean lots of
duplication because all information in Sources.minimal is part of Sources.full?
So this would not help our mirrors (they'd actually now have to store more) but
it would help our users who would now have a few MB less on their systems.

Or you could just not have Sources.full on the mirrors but only distribute
Sources.minimal. But if you do that, please, please, please make Sources.full
just as easily accessible as Sources is right now, including getting
snapshotted and available for all suites, architectures and ports.
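
As a rough sketch of the split discussed above: which fields apt would
really need in a Sources.minimal is an assumption here (MINIMAL_FIELDS is a
guess for illustration, not a specification), as is the function name:

```python
# Assumed minimal field set: enough to locate and verify a .dsc.
MINIMAL_FIELDS = ("Package", "Version", "Directory", "Checksums-Sha256")

def minimal_stanza(full):
    """Reduce a full Sources stanza (as a dict) to the minimal fields."""
    return {key: value for key, value in full.items() if key in MINIMAL_FIELDS}

full = {
    "Package": "node-defined",
    "Version": "1.0.0-1",
    "Build-Depends": "debhelper (>= 9), dh-buildinfo, nodejs",
    "Standards-Version": "3.9.6",
    "Testsuite": "autopkgtest",
    "Directory": "pool/main/n/node-defined",
    "Checksums-Sha256":
        "4aa2a079bc7119678c58643def268e4789b56a6a40b2931601de527244a1def8"
        " 1997 node-defined_1.0.0-1.dsc",
}
minimal = minimal_stanza(full)
print(sorted(minimal))
```

Every field of the minimal stanza is also present in the full one, which is
the duplication concern raised above if both files are kept on the mirrors.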

> In the Packages files for binaries, we could eliminate a *massive* amount of
> redundancy by having a dedicated Packages file for "all", to avoid
> duplicating entries into every architecture's Packages file.  That should not
> significantly increase overhead for end-users, and for any user of multiarch
> it'll decrease overhead.  A quick check on amd64 shows that splitting out
> "all" into a separate Packages file would not change the combined
> uncompressed size at all, should not change the pdiff size at all, and would
> increase the combined compressed full-download size by 94k, from 9957k to
> 10051k, an increase of less than 1%.  That seems reasonable in exchange for
> eliminating 12 duplicate copies of the 4396k used for "all" Packages files,
> times suites (oldstable/stable/testing/unstable/experimental), and that
> doesn't even count unofficial architectures, or snapshot.debian.org.

There is a thread about this on debian-dak@l.d.o:

http://lists.debian.org/20151030145625.GB14516@crossbow

cheers, josch



Decreasing packaging overhead

2015-11-01 Thread Josh Triplett
Thomas Goirand wrote:
> But good luck to teach good practices upstream. See Ross's reply: 120
> packages are depending on this.

It's more than that.  Given tooling that doesn't have excessive overhead
for small packages, why call such packages "bad practices" in the first
place?

> Though it is also my view that packaging tiny stuff shouldn't be a
> problem. If it is, then we should fix whatever it is that is problematic
> in Debian infra.

Agreed.

Let's consider what overhead exists for a Debian package, and what we
could potentially reduce or remove, using node-defined as an example.
(Obviously any such changes to metadata may require a full Debian
release to propagate changes to tools like apt and dpkg.)  To make
redundancy more evident, I'll include everything first before discussing
any of it.

First, an entry in Sources that looks like this, for each Debian suite
(unstable/testing/stable/oldstable):

Package: node-defined
Binary: node-defined
Version: 1.0.0-1
Maintainer: Debian Javascript Maintainers
Uploaders: Ross Gammon 
Build-Depends: debhelper (>= 9), dh-buildinfo, nodejs
Architecture: all
Standards-Version: 3.9.6
Format: 3.0 (quilt)
Files:
 43ab019e6b53b9f4d4ff338027cb351d 1997 node-defined_1.0.0-1.dsc
 978d30ee28482aa7812f74f812b1899f 2334 node-defined_1.0.0.orig.tar.gz
 557f4bcec8a449608e50d09ba69bd224 2416 node-defined_1.0.0-1.debian.tar.xz
Vcs-Browser: https://anonscm.debian.org/cgit/pkg-javascript/node-defined.git
Vcs-Git: git://anonscm.debian.org/pkg-javascript/node-defined.git
Checksums-Sha1:
 02cb2027e3218b93fd856a5e3b68134fe01e47c1 1997 node-defined_1.0.0-1.dsc
 eff888bf76f9cfcca2b94e39c470a6c1441b3f03 2334 node-defined_1.0.0.orig.tar.gz
 7237a9a8aee2add44a9d8bb0dae382c3f0a923cf 2416 node-defined_1.0.0-1.debian.tar.xz
Checksums-Sha256:
 4aa2a079bc7119678c58643def268e4789b56a6a40b2931601de527244a1def8 1997 node-defined_1.0.0-1.dsc
 d953e6e9fe9277cc6e68e5bb36a299d8f3505f8facd3468ab7edc7d6858d293a 2334 node-defined_1.0.0.orig.tar.gz
 56ede623ee7929fcb334fa7459c3e3f43b529bf2b585866d5ebc9ee06cc3d03d 2416 node-defined_1.0.0-1.debian.tar.xz
Homepage: https://github.com/substack/defined
Package-List: 
 node-defined deb web optional arch=all
Testsuite: autopkgtest
Directory: pool/main/n/node-defined
Priority: extra
Section: misc

Second, an entry in *each architecture's* Packages file like this, for each
Debian suite:

Package: node-defined
Version: 1.0.0-1
Installed-Size: 19
Maintainer: Debian Javascript Maintainers
Architecture: all
Depends: nodejs
Description: return the first argument that is `!== undefined`
Homepage: https://github.com/substack/defined
Description-md5: b4200f8f2e989c1354c3c1cb3677e663
Section: web
Priority: optional
Filename: pool/main/n/node-defined/node-defined_1.0.0-1_all.deb
Size: 3292
MD5sum: d5a08f2219b4128a49be206caeb5b8b4
SHA1: 115317d45d5028203269d84aa07c447d7c12ea7b
SHA256: 5be875d209afc69aa2d6be10bbed3c514e75f0a5e8d5a769a6461f42ab6db581

(Note that a source package with multiple binary packages would have multiple
such entries.)
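
To make the repetition between the two stanzas concrete, here is a small
sketch. parse_stanza is a deliberately simplified, hypothetical parser (not
a full deb822 implementation), and the stanzas are abbreviated copies of the
ones above:

```python
def parse_stanza(text):
    """Parse 'Field: value' lines; indented lines continue the last field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and key is not None:
            fields[key] += "\n" + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            fields[key] = value.strip()
    return fields

sources = parse_stanza("""Package: node-defined
Binary: node-defined
Version: 1.0.0-1
Architecture: all
Homepage: https://github.com/substack/defined
Section: misc
""")

packages = parse_stanza("""Package: node-defined
Version: 1.0.0-1
Architecture: all
Homepage: https://github.com/substack/defined
Section: web
""")

# Fields that carry exactly the same value in both stanzas.
repeated = sorted(k for k in sources if packages.get(k) == sources[k])
print(repeated)
```

For these abbreviated stanzas the repeated fields are Architecture,
Homepage, Package and Version; Section differs between the two.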

Third, an entry in Translation-en (and every other translation), for each
Debian suite:

Package: node-defined
Description-md5: b4200f8f2e989c1354c3c1cb3677e663
Description-en: return the first argument that is `!== undefined`
 Most of the time when you chain together ||s, you actually just want the
 first item that is not undefined, not the first non-falsy item.
 .
 This module is like the defined-or (//) operator in perl 5.10+.
 .
 Node.js is an event-based server-side JavaScript engine.

Fourth, the source package .dsc file:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 3.0 (quilt)
Source: node-defined
Binary: node-defined
Architecture: all
Version: 1.0.0-1
Maintainer: Debian Javascript Maintainers
Uploaders: Ross Gammon 
Homepage: https://github.com/substack/defined
Standards-Version: 3.9.6
Vcs-Browser: https://anonscm.debian.org/cgit/pkg-javascript/node-defined.git
Vcs-Git: git://anonscm.debian.org/pkg-javascript/node-defined.git
Testsuite: autopkgtest
Build-Depends: debhelper (>= 9), dh-buildinfo, nodejs
Package-List:
 node-defined deb web optional arch=all
Checksums-Sha1:
 eff888bf76f9cfcca2b94e39c470a6c1441b3f03 2334 node-defined_1.0.0.orig.tar.gz
 7237a9a8aee2add44a9d8bb0dae382c3f0a923cf 2416 node-defined_1.0.0-1.debian.tar.xz
Checksums-Sha256:
 d953e6e9fe9277cc6e68e5bb36a299d8f3505f8facd3468ab7edc7d6858d293a 2334 node-defined_1.0.0.orig.tar.gz
 56ede623ee7929fcb334fa7459c3e3f43b529bf2b585866d5ebc9ee06cc3d03d 2416 node-defined_1.0.0-1.debian.tar.xz
Files:
 978d30ee28482aa7812f74f812b1899f 2334 node-defined_1.0.0.orig.tar.gz
 557f4bcec8a449608e50d09ba69bd224 2416 node-defined_1.0.0-1.debian.tar.xz

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJWKj8IAAoJEPNPCXROn13ZrhwP/1+FQtC5NIM1SAWj8capx3Sm