Re: A new metric for source package importance in ports
Hi,

Quoting Steven Chamberlain (2013-11-28 01:04:56)
> On 27/11/13 17:58, Johannes Schauer wrote:
> > http://mister-muffin.de/p/Gid8.txt
> >
> > One can see that now the amount of source packages which is needed to build the rest of the archive is only 383.
>
> So, there are 383 packages that share the same, maximum value (in this case 11657) in the second column?

Correct.

    $ curl -s http://mister-muffin.de/p/Gid8.txt | awk '{ if ($2==11657) print $0 }' | wc -l
    383

In this particular graph the maximum value of the second column (11657) is less than the total number of source packages (in contrast to the first graph) because this latter graph assumes that arch:all, essential:yes and build-essential packages do not have to be rebuilt. Therefore, lots of source packages do not have to be compiled at all.

> > Does anybody see enough value in these numbers for source package importance in the light of bootstrapping Debian (either for a new port or for rebuilding the archive from scratch)?
>
> I find the list of 383 packages interesting, at least. I think this closure is what I had in mind[0] for regular testing of ports' toolchains and reproducibility of builds.

In that email you wrote:

> Some people have been trying to identify small sets of essential packages already, in the context of bootstrapping an architecture[1]. I wonder if that's likely to overlap with this? It encompasses toolchain and essential arch-specific packages. I imagine a healthy port should be able to bootstrap itself with only current package versions. If this was being tested regularly it could let porters know if circular dependencies are introduced

Yes, if you omit the necessity to rebuild arch:all packages, then these 383 source packages are about what you were talking about: the set of source packages which makes a port able to bootstrap itself. Note, though, that this number (383) is only a lower bound because it was deduced using strong dependencies only. You can see the upper bound in the column that was calculated using the closure graph, which gives 457 source packages.

If you also want to rebuild arch:all packages, then you have to look at the first graph, and the number quickly climbs to a minimum of 1194 and a maximum of 1424 source packages.

> Does the list vary by architecture? I see many odd things in here such as 'systemd' and 'redhat-cluster' which would be unavailable if trying to bootstrap a non-Linux port, for example.

Yes, it does vary by architecture because dependencies can have architecture qualifiers. Here, I used amd64 as an example.

> I also find it interesting to see openjdk-7 listed but not gcj; or even gcc-4.8. Was this computed for jessie or sid?

Using Debian Sid as of yesterday.

cheers, josch

--
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20131128080012.2752.32993@hoothoot
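As an illustration of the remark above that the list varies by architecture because dependencies can carry architecture qualifiers, here is a minimal Python sketch. It is not the code used to produce the lists in this thread. The package names in BUILD_DEPENDS are hypothetical, and the matcher is a simplification that assumes an architecture name is either a bare CPU (implying Linux) or an "os-cpu" pair, with "any" as a wildcard on either side.

```python
# Toy sketch: evaluate architecture qualifiers on build dependencies.
# All package names below are hypothetical and only serve the illustration.

def split_arch(arch):
    """Split an architecture name into (os, cpu); bare names imply Linux."""
    if "-" in arch:
        os_name, cpu = arch.split("-", 1)
        return os_name, cpu
    return "linux", arch


def matches(spec, host):
    """True if one spec (possibly a wildcard such as linux-any) matches host."""
    if spec == "any":
        return True
    spec_os, spec_cpu = split_arch(spec)
    host_os, host_cpu = split_arch(host)
    return spec_os in ("any", host_os) and spec_cpu in ("any", host_cpu)


def applies(qualifier, host):
    """Evaluate a qualifier list such as ['linux-any'] or ['!hurd-i386']."""
    if not qualifier:
        return True  # no qualifier: the dependency applies everywhere
    negated = [q[1:] for q in qualifier if q.startswith("!")]
    if negated:
        # Negated list: applies unless the host matches one of the entries.
        return not any(matches(q, host) for q in negated)
    return any(matches(q, host) for q in qualifier)


# (dependency name, architecture qualifier list)
BUILD_DEPENDS = [
    ("helper-tools", []),
    ("libinit-linux-dev", ["linux-any"]),
    ("libkvm-bsd-dev", ["kfreebsd-any"]),
    ("libjit-dev", ["!hurd-i386"]),
]


def build_deps_for(host):
    """Return the build dependencies that remain for the given host."""
    return [name for name, qual in BUILD_DEPENDS if applies(qual, host)]


if __name__ == "__main__":
    for host in ("amd64", "kfreebsd-amd64", "hurd-i386"):
        print(host, build_deps_for(host))
```

Because the set of effective build dependencies differs per host, any importance ranking derived from it would differ as well, which is consistent with the Linux-only packages showing up in the amd64 list.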
Re: A new metric for source package importance in ports
Hi,

Quoting Dmitrijs Ledkovs (2013-11-28 01:15:06)
> On 28 November 2013 00:04, Steven Chamberlain ste...@pyro.eu.org wrote:
> > I also find it interesting to see openjdk-7 listed but not gcj; or even gcc-4.8. Was this computed for jessie or sid?
>
> I guess implicit relationships are not considered: build-essential build-dependencies, and essential dependencies. I would expect packages in those two sets to have the highest rank, since, hypothetically, all packages in Debian build-depend or depend on those.

Steven was looking at the second graph which (in contrast to the first graph) makes the assumption that essential:yes and build-essential packages are already available somehow (for example by having cross-compiled them) and thus do not need to be recompiled to bootstrap the port. gcj and gcc-4.8 are part of the packages which are drawn in by creating a co-installation set of essential:yes and build-essential packages. Therefore they do not appear in the second graph. Since this co-installation set is an input to the algorithm that creates the second graph, they implicitly receive the highest rank.

For the same reason you will also see them being assigned the highest rank in the first graph, which makes no such assumption and therefore includes them.

Implicit dependency relationships are considered by both algorithms to calculate the strong dependencies and the dependency closure of source and binary packages. My code uses dose3 to do the required calculations.

cheers, josch

Archive: http://lists.debian.org/20131128080700.2752.98455@hoothoot
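The difference between strong dependencies and the dependency closure mentioned above can be shown on a toy graph. This is a sketch under stated assumptions, not dose3's algorithm: the package names and the DEPS table are invented, conflicts are ignored, and strong dependencies are found by brute force over all package subsets, which only works for tiny examples. A package q is treated as a strong dependency of p when every valid installation set containing p also contains q. The closure simply follows every alternative.

```python
from itertools import combinations

# Hypothetical toy repository. Each package maps to a list of clauses;
# a clause is a list of alternatives (like "mta-a | mta-b").
DEPS = {
    "app": [["libfoo"], ["mta-a", "mta-b"]],
    "libfoo": [["libc"]],
    "mta-a": [["libc"]],
    "mta-b": [["libc"], ["libdb"]],
    "libc": [],
    "libdb": [],
}


def closure(pkg):
    """All packages reachable from pkg when every alternative is followed."""
    seen, todo = set(), [pkg]
    while todo:
        cur = todo.pop()
        if cur in seen:
            continue
        seen.add(cur)
        for clause in DEPS[cur]:
            todo.extend(clause)
    return seen - {pkg}


def is_valid(pkgs):
    """True if every clause of every package in pkgs is satisfied within pkgs."""
    return all(
        any(alt in pkgs for alt in clause)
        for p in pkgs
        for clause in DEPS[p]
    )


def strong_deps(pkg):
    """Packages present in every valid installation set that contains pkg."""
    others = [p for p in DEPS if p != pkg]
    common = None
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            candidate = {pkg, *extra}
            if is_valid(candidate):
                common = candidate if common is None else common & candidate
    return (common or set()) - {pkg}


if __name__ == "__main__":
    print("strong :", sorted(strong_deps("app")))
    print("closure:", sorted(closure("app")))
```

In this toy, neither alternative of the disjunction is a strong dependency, yet both are in the closure. The strong-dependency set is therefore always contained in the closure, which matches the lower bound and upper bound reading of the two columns discussed in the thread.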
Re: A new metric for source package importance in ports
peter green wrote:
> Johannes Schauer wrote:
> > Hi,
> >
> > the following is a report of a successful implementation of what I have been talking about with Niels Thykier during debconf13. The question was how important it is for a source package to be compilable or exist in the first place given an incomplete port which is in the process of being bootstrapped.
> >
> > This work is solving a different purpose than the identification of key packages by Lucas Nussbaum [1]. Instead of attaching a binary value to each source package, this method is associating integer values to them. Once bootstrapping of the whole archive becomes more important or even possible in real life through an implementation of build profiles, this heuristic could be used to further extend the meaning of key packages as well.
>
> One problem with these metrics is that you get source packages whose importance is artificially inflated because of the way our source packages work. If anything in a source package needs x then the whole source package has to build-depend on x, even if x is only needed for some peripheral functionality that could easily be removed in the event that x was unavailable (either on a particular port or in general). In the case of libraries there may be a binary dependency too for rarely used functionality. For example some of the mesa libraries drag in libwayland0 which means wayland ends up with a very high importance even though afaict hardly anyone uses it right now.

There are also issues where an individual package makes questionable assertions about its dependencies. For example, both elvis and nmap have the option of a GUI frontend, which means that as soon as they're installed onto a headless system they force a significant amount of X stuff to be loaded even if this is not desired.

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]

Archive: http://lists.debian.org/52970f16.1050...@telemetry.co.uk
Re: A new metric for source package importance in ports
Hi josch!

On 27/11/13 17:58, Johannes Schauer wrote:
> http://mister-muffin.de/p/Gid8.txt
>
> One can see that now the amount of source packages which is needed to build the rest of the archive is only 383.

So, there are 383 packages that share the same, maximum value (in this case 11657) in the second column?

> Does anybody see enough value in these numbers for source package importance in the light of bootstrapping Debian (either for a new port or for rebuilding the archive from scratch)?

I find the list of 383 packages interesting, at least. I think this closure is what I had in mind[0] for regular testing of ports' toolchains and reproducibility of builds. Because each Debian port depends in some indirect way on the authenticity of these packages. And likewise any toolchain bugs are most critical here. I just didn't think there would be so many packages.

Does the list vary by architecture? I see many odd things in here such as 'systemd' and 'redhat-cluster' which would be unavailable if trying to bootstrap a non-Linux port, for example.

I also find it interesting to see openjdk-7 listed but not gcj; or even gcc-4.8. Was this computed for jessie or sid?

[0]: http://lists.debian.org/5266df9d.9040...@pyro.eu.org

Regards,
--
Steven Chamberlain
ste...@pyro.eu.org

Archive: http://lists.debian.org/529688a8.8080...@pyro.eu.org
Re: A new metric for source package importance in ports
Johannes Schauer wrote:
> Hi,
>
> the following is a report of a successful implementation of what I have been talking about with Niels Thykier during debconf13. The question was how important it is for a source package to be compilable or exist in the first place given an incomplete port which is in the process of being bootstrapped.
>
> This work is solving a different purpose than the identification of key packages by Lucas Nussbaum [1]. Instead of attaching a binary value to each source package, this method is associating integer values to them. Once bootstrapping of the whole archive becomes more important or even possible in real life through an implementation of build profiles, this heuristic could be used to further extend the meaning of key packages as well.

One problem with these metrics is that you get source packages whose importance is artificially inflated because of the way our source packages work. If anything in a source package needs x then the whole source package has to build-depend on x, even if x is only needed for some peripheral functionality that could easily be removed in the event that x was unavailable (either on a particular port or in general). In the case of libraries there may be a binary dependency too for rarely used functionality. For example some of the mesa libraries drag in libwayland0 which means wayland ends up with a very high importance even though afaict hardly anyone uses it right now.

Another big example is languages. Lots of packages build language bindings for lots of languages, dragging those languages into the important set.

So these metrics are a good guide to which packages are unimportant, but determining whether a package is really important or just pseudo-important still requires human judgement.

Archive: http://lists.debian.org/52968a89.6050...@p10link.net
Re: A new metric for source package importance in ports
On 28 November 2013 00:04, Steven Chamberlain ste...@pyro.eu.org wrote:
> Hi josch!
>
> On 27/11/13 17:58, Johannes Schauer wrote:
> > http://mister-muffin.de/p/Gid8.txt
> >
> > One can see that now the amount of source packages which is needed to build the rest of the archive is only 383.
>
> So, there are 383 packages that share the same, maximum value (in this case 11657) in the second column?
>
> > Does anybody see enough value in these numbers for source package importance in the light of bootstrapping Debian (either for a new port or for rebuilding the archive from scratch)?
>
> I find the list of 383 packages interesting, at least. I think this closure is what I had in mind[0] for regular testing of ports' toolchains and reproducibility of builds. Because each Debian port depends in some indirect way on the authenticity of these packages. And likewise any toolchain bugs are most critical here. I just didn't think there would be so many packages.
>
> Does the list vary by architecture? I see many odd things in here such as 'systemd' and 'redhat-cluster' which would be unavailable if trying to bootstrap a non-Linux port, for example.
>
> I also find it interesting to see openjdk-7 listed but not gcj; or even gcc-4.8. Was this computed for jessie or sid?

I guess implicit relationships are not considered: build-essential build-dependencies, and essential dependencies. I would expect packages in those two sets to have the highest rank, since, hypothetically, all packages in Debian build-depend or depend on those.

Regards,

Dmitrijs.

Archive: http://lists.debian.org/CANBHLUiifmR+_keS3eSQa_b3_CfZ_56o9vBRR8p2SeY=hy9...@mail.gmail.com
Re: A new metric for source package importance in ports
Instead of dwelling on this discovery, which is not productive, why not concentrate on what to do to improve Debian? The analysis has shown faults. Has Debian stopped working? Has the world crashed?

The problems have been identified, and the patches to address the issues are being evaluated and planned for retesting. By January 15, 2014, Debian, Ubuntu, SUSE 13.1, Fedora, RedHat, and probably every distribution that has an old or recent kernel will be corrected.

So, what's the next topic?

Regards
Leslie

Mr. Leslie Satenstein
An experienced Information Technology specialist.
Yesterday was a good day, today is a better day, and tomorrow will be even better.
lsatenst...@yahoo.com
SENT FROM MY OPEN SOURCE LINUX SYSTEM.

From: Dmitrijs Ledkovs x...@debian.org
To: Steven Chamberlain ste...@pyro.eu.org
Cc: Johannes Schauer j.scha...@email.de; Debian Release debian-rele...@lists.debian.org; debian-po...@lists.debian.org
Sent: Wednesday, November 27, 2013 7:15 PM
Subject: Re: A new metric for source package importance in ports

On 28 November 2013 00:04, Steven Chamberlain ste...@pyro.eu.org wrote:
> Hi josch!
>
> On 27/11/13 17:58, Johannes Schauer wrote:
> > http://mister-muffin.de/p/Gid8.txt
> >
> > One can see that now the amount of source packages which is needed to build the rest of the archive is only 383.
>
> So, there are 383 packages that share the same, maximum value (in this case 11657) in the second column?
>
> > Does anybody see enough value in these numbers for source package importance in the light of bootstrapping Debian (either for a new port or for rebuilding the archive from scratch)?
>
> I find the list of 383 packages interesting, at least. I think this closure is what I had in mind[0] for regular testing of ports' toolchains and reproducibility of builds. Because each Debian port depends in some indirect way on the authenticity of these packages. And likewise any toolchain bugs are most critical here. I just didn't think there would be so many packages.
>
> Does the list vary by architecture? I see many odd things in here such as 'systemd' and 'redhat-cluster' which would be unavailable if trying to bootstrap a non-Linux port, for example.
>
> I also find it interesting to see openjdk-7 listed but not gcj; or even gcc-4.8. Was this computed for jessie or sid?

I guess implicit relationships are not considered: build-essential build-dependencies, and essential dependencies. I would expect packages in those two sets to have the highest rank, since, hypothetically, all packages in Debian build-depend or depend on those.

Regards,

Dmitrijs.
Re: A new metric for source package importance in ports
Hi,

Quoting peter green (2013-11-28 01:12:57)
> One problem with these metrics is that you get source packages whose importance is artificially inflated because of the way our source packages work. If anything in a source package needs x then the whole source package has to build-depend on x, even if x is only needed for some peripheral functionality that could easily be removed in the event that x was unavailable (either on a particular port or in general). In the case of libraries there may be a binary dependency too for rarely used functionality. For example some of the mesa libraries drag in libwayland0 which means wayland ends up with a very high importance even though afaict hardly anyone uses it right now.
>
> Another big example is languages. Lots of packages build language bindings for lots of languages, dragging those languages into the important set.
>
> So these metrics are a good guide to which packages are unimportant, but determining whether a package is really important or just pseudo-important still requires human judgement.

Correct. The situation can be greatly improved once build profiles allow build dependencies to be marked as less important or non-essential.

cheers, josch

Archive: http://lists.debian.org/20131128074506.2752.10616@hoothoot
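To make the inflation effect and the build-profile remedy discussed above concrete, here is a small Python sketch. It is an assumption-laden stand-in, not the metric computed in this thread: importance is approximated as the number of packages whose transitive build dependencies include the target, source and binary packages are not distinguished, and both the package names and the "optional" flag (standing in for a dependency that a build profile could drop) are hypothetical.

```python
# Hypothetical toy data: package -> list of (build dependency, optional?) pairs.
# "optional" marks a dependency that a build profile could switch off.
BUILD_DEPS = {
    "gfxlib": [("toolchain", False), ("compositor", True)],
    "viewer": [("gfxlib", False)],
    "editor": [("gfxlib", False)],
    "compositor": [("toolchain", False)],
    "toolchain": [],
}


def build_closure(pkg, drop_optional=False):
    """Transitive build dependencies of pkg, optionally skipping optional ones."""
    seen, todo = set(), [pkg]
    while todo:
        cur = todo.pop()
        for dep, optional in BUILD_DEPS[cur]:
            if optional and drop_optional:
                continue
            if dep not in seen:
                seen.add(dep)
                todo.append(dep)
    return seen


def needed_by(target, drop_optional=False):
    """Toy importance: how many other packages need target to be built."""
    return sum(
        1
        for pkg in BUILD_DEPS
        if pkg != target and target in build_closure(pkg, drop_optional)
    )


if __name__ == "__main__":
    print("compositor, all deps      :", needed_by("compositor"))
    print("compositor, profile active:", needed_by("compositor", drop_optional=True))
    print("toolchain, profile active :", needed_by("toolchain", drop_optional=True))
```

In this toy, a single optional build dependency in one widely used library is enough to make "compositor" look as important as the library itself. Once the optional edge can be dropped, its score falls while the score of the genuinely required "toolchain" stays the same.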