Re: A new metric for source package importance in ports
Hi,

Quoting peter green (2013-11-28 01:12:57)
> One problem with these metrics is that you get source packages whose
> importance is artificially inflated because of the way our source packages
> work. If anything in a source package needs x then the whole source package
> has to build-depend on x, even if x is only needed for some peripheral
> functionality that could easily be removed in the event that x was
> unavailable (either on a particular port or in general). In the case of
> libraries there may be a binary dependency too for rarely used functionality.
>
> For example some of the mesa libraries drag in libwayland0, which means
> wayland ends up with a very high importance even though afaict hardly
> anyone uses it right now.
>
> Another big example is languages. Lots of packages build language
> bindings for lots of languages, dragging those languages into the
> "important set".
>
> So these metrics are a good guide to what packages are unimportant
> but to determine whether a package is really important or just
> pseudo-important still requires human judgement.

Correct. The situation can be greatly improved once build profiles allow build dependencies to be marked as "less important" or "non-essential".

cheers, josch

-- 
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20131128074506.2752.10616@hoothoot
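The inflation effect discussed in this message can be illustrated with a small sketch. Everything in it is hypothetical: the package names, the toy dependency data, and the idea of modelling a "non-essential" build dependency as an edge that can simply be dropped. It is not how build profiles are implemented, only a way to see how one optional edge changes the count.

```python
def transitive_dependents(build_deps, target):
    """Return the packages that directly or indirectly build-depend on target."""
    found = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for pkg, deps in build_deps.items():
            if current in deps and pkg not in found:
                found.add(pkg)
                frontier.append(pkg)
    return found


def without_optional(build_deps, optional):
    """Drop every (package, dependency) edge that is marked as optional."""
    return {
        pkg: {dep for dep in deps if (pkg, dep) not in optional}
        for pkg, deps in build_deps.items()
    }


# Hypothetical toy archive: one library pulls in a rarely used dependency.
build_deps = {
    "mesa-like": {"wayland-like", "libc-like"},
    "toolkit": {"mesa-like"},
    "app": {"toolkit"},
    "wayland-like": {"libc-like"},
    "libc-like": set(),
}
# Assumption: this edge is only needed for peripheral functionality.
optional = {("mesa-like", "wayland-like")}

full = transitive_dependents(build_deps, "wayland-like")
reduced = transitive_dependents(without_optional(build_deps, optional), "wayland-like")
print(len(full), len(reduced))
```

With the optional edge in place, everything that needs the library also counts towards the rarely used package; once the edge is dropped, nothing does.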
Re: A new metric for source package importance in ports
Instead of dwelling on this discovery, which is not productive, why not concentrate on what to do to improve Debian? The analysis has shown faults. Has Debian stopped working? Has the world crashed? The problems have been identified, and the patches to address the issues are being evaluated and planned for retesting. By January 15, 2014, Debian, Ubuntu, SUSE 13.1, Fedora, RedHat, and probably every distribution that has an old or recent kernel will be corrected.

So, what's the next topic?

Regards
Leslie

Mr. Leslie Satenstein
An experienced Information Technology specialist.
Yesterday was a good day, today is a better day, and tomorrow will be even better.
lsatenst...@yahoo.com
SENT FROM MY OPEN SOURCE LINUX SYSTEM.

> From: Dmitrijs Ledkovs
> To: Steven Chamberlain
> Cc: Johannes Schauer; Debian Release; debian-po...@lists.debian.org
> Sent: Wednesday, November 27, 2013 7:15 PM
> Subject: Re: A new metric for source package importance in ports
>
> On 28 November 2013 00:04, Steven Chamberlain wrote:
>> Hi josch!
>>
>> On 27/11/13 17:58, Johannes Schauer wrote:
>>> http://mister-muffin.de/p/Gid8.txt
>>>
>>> One can see that now the amount of source packages which is needed to build
>>> the rest of the archive is only 383.
>>
>> So, there are 383 packages that share the same, maximum value (in this
>> case 11657) in the second column?
>>
>>> Does anybody see enough value in these numbers for source package
>>> importance in the light of bootstrapping Debian (either for a new port or
>>> for rebuilding the archive from scratch)?
>>
>> I find the list of 383 packages interesting, at least. I think this
>> closure is what I had in mind[0] for regular testing of ports'
>> toolchains and reproducibility of builds. Because each Debian port
>> depends in some indirect way on the authenticity of these packages. And
>> likewise any toolchain bugs are most critical here. I just didn't think
>> there would be so many packages.
>>
>> Does the list vary by architecture? I see many odd things in here such
>> as 'systemd' and 'redhat-cluster' which would be unavailable if trying
>> to bootstrap a non-Linux port, for example.
>>
>> I also find it interesting to see openjdk-7 listed but not gcj; or even
>> gcc-4.8. Was this computed for jessie or sid?
>
> I guess implicit relationships are not considered: build-essential
> build-dependencies, and essential dependencies. I would expect
> packages in those two sets to have the highest rank, since,
> hypothetically, all packages in debian build-depend & depend on those.
>
> Regards,
>
> Dmitrijs.
Re: A new metric for source package importance in ports
On 28 November 2013 00:04, Steven Chamberlain wrote:
> Hi josch!
>
> On 27/11/13 17:58, Johannes Schauer wrote:
>> http://mister-muffin.de/p/Gid8.txt
>>
>> One can see that now the amount of source packages which is needed to build
>> the rest of the archive is only 383.
>
> So, there are 383 packages that share the same, maximum value (in this
> case 11657) in the second column?
>
>> Does anybody see enough value in these numbers for source package importance
>> in the light of bootstrapping Debian (either for a new port or for rebuilding
>> the archive from scratch)?
>
> I find the list of 383 packages interesting, at least. I think this
> closure is what I had in mind[0] for regular testing of ports'
> toolchains and reproducibility of builds. Because each Debian port
> depends in some indirect way on the authenticity of these packages. And
> likewise any toolchain bugs are most critical here. I just didn't think
> there would be so many packages.
>
> Does the list vary by architecture? I see many odd things in here such
> as 'systemd' and 'redhat-cluster' which would be unavailable if trying
> to bootstrap a non-Linux port, for example.
>
> I also find it interesting to see openjdk-7 listed but not gcj; or even
> gcc-4.8. Was this computed for jessie or sid?

I guess implicit relationships are not considered: build-essential build-dependencies, and essential dependencies. I would expect packages in those two sets to have the highest rank, since, hypothetically, all packages in Debian build-depend and depend on those.

Regards,

Dmitrijs.
Re: A new metric for source package importance in ports
Johannes Schauer wrote:
> Hi, the following is a report of a successful implementation of what I have
> been talking about with Niels Thykier during debconf13. The question was how
> important it is for a source package to be compilable or exist in the first
> place given an incomplete port which is in the process of being bootstrapped.
> This work is solving a different purpose than the identification of "key
> packages" by Lucas Nussbaum [1]. Instead of attaching a binary value to each
> source package, this method is associating integer values to them. Once
> bootstrapping of the whole archive becomes more important or even possible in
> real life through an implementation of build profiles, this heuristic could
> be used to further extend the meaning of "key packages" as well.

One problem with these metrics is that you get source packages whose importance is artificially inflated because of the way our source packages work. If anything in a source package needs x then the whole source package has to build-depend on x, even if x is only needed for some peripheral functionality that could easily be removed in the event that x was unavailable (either on a particular port or in general). In the case of libraries there may be a binary dependency too for rarely used functionality.

For example some of the mesa libraries drag in libwayland0, which means wayland ends up with a very high importance even though afaict hardly anyone uses it right now.

Another big example is languages. Lots of packages build language bindings for lots of languages, dragging those languages into the "important set".

So these metrics are a good guide to what packages are unimportant, but determining whether a package is really important or just pseudo-important still requires human judgement.
Re: A new metric for source package importance in ports
Hi josch!

On 27/11/13 17:58, Johannes Schauer wrote:
> http://mister-muffin.de/p/Gid8.txt
>
> One can see that now the amount of source packages which is needed to build
> the rest of the archive is only 383.

So, there are 383 packages that share the same, maximum value (in this case 11657) in the second column?

> Does anybody see enough value in these numbers for source package importance
> in the light of bootstrapping Debian (either for a new port or for rebuilding
> the archive from scratch)?

I find the list of 383 packages interesting, at least. I think this closure is what I had in mind[0] for regular testing of ports' toolchains and reproducibility of builds. Because each Debian port depends in some indirect way on the authenticity of these packages. And likewise any toolchain bugs are most critical here. I just didn't think there would be so many packages.

Does the list vary by architecture? I see many odd things in here such as 'systemd' and 'redhat-cluster' which would be unavailable if trying to bootstrap a non-Linux port, for example.

I also find it interesting to see openjdk-7 listed but not gcj; or even gcc-4.8. Was this computed for jessie or sid?

[0]: http://lists.debian.org/5266df9d.9040...@pyro.eu.org

Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org
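The question about how many packages share the maximum value in the second column can be checked mechanically. The following is a minimal sketch, assuming the data is tab-delimited with the package name in the first column and an integer in the second, as described in the original post. The function name `packages_at_maximum` and the sample rows are made up for illustration and are not taken from the published files.

```python
import csv
import io


def packages_at_maximum(tsv_text, column=1):
    """Return the largest value in the given column and the packages that share it."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    top = max(int(row[column]) for row in rows)
    names = sorted(row[0] for row in rows if int(row[column]) == top)
    return top, names


# Hypothetical sample in the five-column layout of the published data.
sample = (
    "pkg-a\t4\t4\t165\t165\n"
    "pkg-b\t4\t4\t165\t165\n"
    "pkg-c\t1\t2\t5\t15\n"
    "pkg-d\t0\t2\t0\t15\n"
)

top, names = packages_at_maximum(sample)
print(top, len(names), names)
```

To run it against the real data, the text of one of the published files would be passed in instead of `sample`; the column index is zero-based, so `column=2` would select the third column.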
A new metric for source package importance in ports
Hi,

the following is a report of a successful implementation of what I have been talking about with Niels Thykier during debconf13. The question was how important it is for a source package to be compilable, or to exist in the first place, given an incomplete port which is in the process of being bootstrapped.

This work serves a different purpose than the identification of "key packages" by Lucas Nussbaum [1]. Instead of attaching a binary value to each source package, this method associates integer values with them. Once bootstrapping of the whole archive becomes more important or even possible in real life through an implementation of build profiles, this heuristic could be used to further extend the meaning of "key packages" as well.

This heuristic attaches to each source package A the number of source packages which need A to be compilable so that they become compilable themselves. The dependency graph which is needed to extract this information is conveniently created by the service I run as http://bootstrap.debian.net - I'm using a simple Python script to walk this graph to extract the information.

In fact that Python script uses two different graphs. Since dependencies contain disjunctions, there exist different choices for the packages which have to be available for something to be compilable or installable. To not make this choice arbitrary, I calculate the minimum number of dependencies that have to be available (strong dependencies) and the maximum number that have to be available (dependency closure). Therefore each source package A is associated with two numbers: the minimum number of source packages which depend on A being compilable and the maximum number of source packages which depend on A being compilable.

To create more than syntactic meaning I also added popcon information to the output. I associate with each source package A the sum of all popcon values of the source packages which depend on A being compilable.
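The counting described above can be sketched as a reverse reachability walk. This is not the script behind the published numbers, only a minimal illustration under stated assumptions: each graph is given as a plain dict mapping a source package to the set of source packages it needs, the toy graphs `strong` and `closure` and the `popcon` values are invented, and a package counts towards its own number only if it sits on a dependency cycle.

```python
def count_dependents(needs):
    """For each source package A, collect the source packages that need A
    to be compilable before they become compilable themselves."""
    # Invert the edges so we can walk from a package to whatever needs it.
    needed_by = {pkg: set() for pkg in needs}
    for pkg, deps in needs.items():
        for dep in deps:
            needed_by.setdefault(dep, set()).add(pkg)

    dependents = {}
    for start in needed_by:
        seen = set()
        stack = [start]
        while stack:
            current = stack.pop()
            for parent in needed_by[current]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        dependents[start] = seen
    return dependents


def popcon_sum(dependents, popcon):
    """Sum the popcon values of all packages that depend on each package."""
    return {
        pkg: sum(popcon.get(dep, 0) for dep in deps)
        for pkg, deps in dependents.items()
    }


# Hypothetical toy graphs: "strong" stands in for the strong dependencies
# (minimum), "closure" for the dependency closure (maximum).
strong = {"a": {"b"}, "b": {"a"}, "c": {"a"}, "d": {"c"}}
closure = {"a": {"b"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
popcon = {"a": 100, "b": 50, "c": 10, "d": 5}

min_deps = count_dependents(strong)
max_deps = count_dependents(closure)
min_pop = popcon_sum(min_deps, popcon)
max_pop = popcon_sum(max_deps, popcon)

# One tab-delimited row per package: name, min count, max count, min popcon, max popcon.
for pkg in sorted(strong):
    print(pkg, len(min_deps[pkg]), len(max_deps[pkg]), min_pop[pkg], max_pop[pkg], sep="\t")
```

In the toy data, `a` and `b` form a cycle that everything else depends on, so both end up with a count equal to the total number of packages.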
Again, this is done for the minimum as well as the maximum.

So here is the (tab-delimited) data in no particular order:

http://mister-muffin.de/p/pVxb.txt

1st column: the name of the source package
2nd column: minimum number of source packages which need this source package to be compilable
3rd column: maximum number of source packages which need this source package to be compilable
4th column: minimum sum of popcon values
5th column: maximum sum of popcon values

Do you see any obvious error?

When sorting the data by the second column, you will see that there are 1194 source packages with the same value: 19554. This value corresponds to the total number of source packages. It means: everything else depends on these 1194 source packages being compilable. If those 1194 source packages are not compilable then neither is the rest. Remember that this is only true during a bootstrapping scenario. These 1194 source packages are also all part of the same strongly connected component of the strong srcgraph and roughly correspond to the smallest set of packages which are needed for a self-hosting Debian system. We call a set of binary and source packages self-hosting if all binary packages can be created from the source packages and all source packages can be compiled with just the available binary packages.

In my opinion it would make sense to add all packages which are at minimum required to make Debian self-hosting to the set of "key packages" by extending the definition by Lucas Nussbaum at [1].

The amount of source packages which are needed to bootstrap themselves and all the rest of Debian is that high because it includes source packages which are only included because of the arch:all binary packages they build, because of the essential:yes packages they build or because of the build-essential packages they build. While it is important to include these for rebuilds of the whole archive, they are not important in a real bootstrap situation.
Arch:all binary packages already exist and do not need to be bootstrapped, and to start compiling packages natively, a minimal build system (essential:yes + build-essential) is required in the first place. Therefore I created a different graph which takes into account that arch:all packages as well as the packages of the minimal build system do not need to be rebuilt:

http://mister-muffin.de/p/Gid8.txt

One can see that now the amount of source packages which is needed to build the rest of the archive is only 383. It is important that these source packages remain compilable (in addition to essential:yes + build-essential being cross-able) because otherwise a bootstrap of any new architecture cannot be done. The service at http://bootstrap.debian.net will indicate that an architecture is not bootstrappable at all if this is the case.

Does anybody see enough value in these numbers for source package importance in the light of bootstrapping Debian (either for a new port or for rebuilding the archive from scratch)? If so, then I can generate