Re: A new metric for source package importance in ports

2013-11-28 Thread Johannes Schauer
Hi,

Quoting Steven Chamberlain (2013-11-28 01:04:56)
 On 27/11/13 17:58, Johannes Schauer wrote:
  http://mister-muffin.de/p/Gid8.txt
  
  One can see that now the amount of source packages which is needed to build
  the rest of the archive is only 383.
 
 So, there are 383 packages that share the same, maximum value (in this
 case 11657) in the second column?

Correct.

$ curl -s http://mister-muffin.de/p/Gid8.txt | awk '{ if ($2==11657) print $0 }' | wc -l
383
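
The same count can be made without hard-coding 11657. The following is only a minimal sketch: it assumes the file is whitespace-separated with an integer in the second column, and the function name and sample rows are made up for illustration.

```python
from typing import Tuple


def count_rows_at_max(text: str, column: int = 1) -> Tuple[int, int]:
    """Return (maximum value, number of rows holding it) for one column.

    `column` is zero-based, so the default of 1 is the second column.
    Rows that are too short or hold a non-integer there are skipped.
    """
    values = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= column:
            continue
        try:
            values.append(int(fields[column]))
        except ValueError:
            continue
    if not values:
        raise ValueError("no integer values found in the requested column")
    top = max(values)
    return top, values.count(top)


# Hypothetical sample rows, not taken from Gid8.txt.
SAMPLE = "src-a 3 5\nsrc-b 7 7\nsrc-c 7 9\n"
print(count_rows_at_max(SAMPLE))  # (7, 2)
```

Run against the downloaded file contents, this should report the maximum together with the number of rows sharing it, provided the format assumption holds.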

In this particular graph the maximum value of the second column (11657) is less
than the total number of source packages (in contrast to the first graph)
because this latter graph assumes that arch:all, essential:yes and
build-essential packages do not have to be rebuilt. Therefore, lots of source
packages do not have to be compiled at all.

  Does anybody see enough value in these numbers for source package
  importance in the light of bootstrapping Debian (either for a new port or
  for rebuilding the archive from scratch)?
 
 I find the list of 383 packages interesting, at least.  I think this
 closure is what I had in mind[0] for regular testing of ports' toolchains and
 reproducibility of builds.

In that email you wrote:

 Some people have been trying to identify small sets of essential packages
 already, in the context of bootstrapping an architecture[1].  I wonder if
 that's likely to overlap with this?  It encompasses toolchain and essential
 arch-specific packages.
 
 I imagine a healthy port should be able to bootstrap itself with only current
 package versions.  If this was being tested regularly it could let porters
 know if circular dependencies are introduced

Yes, if you omit the necessity to rebuild arch:all packages, then these 383
source packages are about what you were talking about: the set of source
packages which makes a port able to bootstrap itself. Note, though, that this
number (383) is only a lower bound because it was derived using strong
dependencies only. You can see the upper bound in the column that was
calculated using the closure graph, which would be 457 source packages.
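
To illustrate why one figure is a lower bound and the other an upper bound, here is a minimal sketch on made-up data. The package names and the `reachable` helper are hypothetical. Treating only dependency clauses without alternatives as "forced" is a simplification: it finds a subset of the true strong dependencies, not everything dose3 would compute.

```python
from typing import Dict, List, Set

# Each package maps to a list of dependency clauses; a clause is a list of
# alternatives, so ["mta-a", "mta-b"] stands for "mta-a | mta-b".
Deps = Dict[str, List[List[str]]]

# Hypothetical toy data for illustration only.
TOY_DEPS: Deps = {
    "app": [["libc"], ["mta-a", "mta-b"]],
    "mta-a": [["libssl"]],
    "mta-b": [],
    "libc": [],
    "libssl": [],
}


def reachable(deps: Deps, start: str, only_forced: bool) -> Set[str]:
    """Collect the packages reachable from `start`.

    With only_forced=True, only clauses with a single alternative are
    followed (needed whichever alternatives get chosen elsewhere).
    With only_forced=False, every alternative is followed (the closure).
    """
    seen: Set[str] = set()
    stack = [start]
    while stack:
        pkg = stack.pop()
        for clause in deps.get(pkg, []):
            if only_forced and len(clause) != 1:
                continue
            for alt in clause:
                if alt not in seen:
                    seen.add(alt)
                    stack.append(alt)
    seen.discard(start)  # do not report the package itself
    return seen


lower = reachable(TOY_DEPS, "app", only_forced=True)
upper = reachable(TOY_DEPS, "app", only_forced=False)
print(sorted(lower))  # ['libc']
print(sorted(upper))  # ['libc', 'libssl', 'mta-a', 'mta-b']
```

Whatever choice is made between the alternatives, the set actually needed lies between these two results, which is the sense in which 383 and 457 bracket the real number.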

If you also want to rebuild arch:all packages, then you have to look at the
first graph and then the number quickly climbs to 1194 source packages minimum
and 1424 source packages maximum.

 Does the list vary by architecture?  I see many odd things in here such as
 'systemd' and 'redhat-cluster' which would be unavailable if trying to
 bootstrap a non-Linux port, for example.

Yes it does vary by architecture because dependencies can have architecture
qualifiers. Here, I used amd64 as an example.

 I also find it interesting to see openjdk-7 listed but not gcj;  or even
 gcc-4.8.  Was this computed for jessie or sid?

Using Debian Sid as of yesterday.

cheers, josch


--
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20131128080012.2752.32993@hoothoot



Re: A new metric for source package importance in ports

2013-11-28 Thread Johannes Schauer
Hi,

Quoting Dmitrijs Ledkovs (2013-11-28 01:15:06)
 On 28 November 2013 00:04, Steven Chamberlain ste...@pyro.eu.org wrote:
  I also find it interesting to see openjdk-7 listed but not gcj;  or even
  gcc-4.8.  Was this computed for jessie or sid?
 
 I guess implicit relationships are not considered: build-essential
 build-dependencies, and essential dependencies. I would expect
 packages in those two sets to have the highest rank, since,
 hypothetically, all packages in Debian build-depend and depend on those.

Steven was looking at the second graph which (in contrast to the first graph)
makes the assumption that essential:yes and build-essential are already
available somehow (for example by having cross compiled them) and thus do not
need to be recompiled to bootstrap the port.

gcj and gcc-4.8 are part of the packages which are drawn in by creating a
co-installation set of essential:yes and build-essential packages. Therefore
they do not appear in the second graph.

Since this co-installation set is an input to the algorithm of creating the
second graph, they implicitly receive the highest rank. For the same reason you
will also see them being assigned the highest rank in the first graph which
does not assume that essential:yes and build-essential do not have to be
recompiled.

Implicit dependency relationships are considered by both algorithms to
calculate the strong dependencies and the dependency closure of source and
binary packages. My code uses dose3 to do the required calculations.

cheers, josch


Archive: http://lists.debian.org/20131128080700.2752.98455@hoothoot



Re: A new metric for source package importance in ports

2013-11-28 Thread Mark Morgan Lloyd

peter green wrote:

Johannes Schauer wrote:

Hi,

the following is a report of a successful implementation of what I have been
talking about with Niels Thykier during debconf13. The question was how
important it is for a source package to be compilable or exist in the first
place given an incomplete port which is in the process of being bootstrapped.

This work is solving a different purpose than the identification of key
packages by Lucas Nussbaum [1]. Instead of attaching a binary value to each
source package, this method is associating integer values to them. Once
bootstrapping of the whole archive becomes more important or even possible in
real life through an implementation of build profiles, this heuristic could be
used to further extend the meaning of key packages as well.

One problem with these metrics is that you get source packages whose
importance is artificially inflated because of the way our source
packages work. If anything in a source package needs x then the whole
source package has to build-depend on x, even if x is only needed for
some peripheral functionality that could easily be removed in the event
that x was unavailable (either on a particular port or in general). In
the case of libraries there may be a binary dependency too for rarely
used functionality.

For example some of the mesa libraries drag in libwayland0 which means
wayland ends up with a very high importance even though afaict hardly
anyone uses it right now.


There are also issues where an individual package makes questionable
assertions about its dependencies. For example, both elvis and nmap have
the option of a GUI frontend, which means that as soon as they're
installed onto a headless system they force a significant amount of X
stuff to be loaded even if this is not desired.


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Archive: http://lists.debian.org/52970f16.1050...@telemetry.co.uk



Re: A new metric for source package importance in ports

2013-11-27 Thread Steven Chamberlain
Hi josch!

On 27/11/13 17:58, Johannes Schauer wrote:
 http://mister-muffin.de/p/Gid8.txt
 
 One can see that now the amount of source packages which is needed to build
 the rest of the archive is only 383.

So, there are 383 packages that share the same, maximum value (in this
case 11657) in the second column?

 Does anybody see enough value in these numbers for source package importance
 in the light of bootstrapping Debian (either for a new port or for rebuilding
 the archive from scratch)?

I find the list of 383 packages interesting, at least.  I think this
closure is what I had in mind[0] for regular testing of ports'
toolchains and reproducibility of builds.  Because each Debian port
depends in some indirect way on the authenticity of these packages.  And
likewise any toolchain bugs are most critical here.  I just didn't think
there would be so many packages.

Does the list vary by architecture?  I see many odd things in here such
as 'systemd' and 'redhat-cluster' which would be unavailable if trying
to bootstrap a non-Linux port, for example.

I also find it interesting to see openjdk-7 listed but not gcj;  or even
gcc-4.8.  Was this computed for jessie or sid?

[0]: http://lists.debian.org/5266df9d.9040...@pyro.eu.org

Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org


Archive: http://lists.debian.org/529688a8.8080...@pyro.eu.org



Re: A new metric for source package importance in ports

2013-11-27 Thread peter green

Johannes Schauer wrote:

Hi,

the following is a report of a successful implementation of what I have been
talking about with Niels Thykier during debconf13. The question was how
important it is for a source package to be compilable or exist in the first
place given an incomplete port which is in the process of being bootstrapped.
This work is solving a different purpose than the identification of key
packages by Lucas Nussbaum [1]. Instead of attaching a binary value to each
source package, this method is associating integer values to them. Once
bootstrapping of the whole archive becomes more important or even possible in
real life through an implementation of build profiles, this heuristic could be
used to further extend the meaning of key packages as well.

One problem with these metrics is that you get source packages whose
importance is artificially inflated because of the way our source
packages work. If anything in a source package needs x then the whole
source package has to build-depend on x, even if x is only needed for
some peripheral functionality that could easily be removed in the event
that x was unavailable (either on a particular port or in general). In
the case of libraries there may be a binary dependency too for rarely
used functionality.

For example some of the mesa libraries drag in libwayland0 which means
wayland ends up with a very high importance even though afaict hardly
anyone uses it right now.

Another big example is languages. Lots of packages build language
bindings for lots of languages dragging those languages into the
important set.

So these metrics are a good guide to what packages are unimportant,
but to determine whether a package is really important or just
pseudo-important still requires human judgement.


Archive: http://lists.debian.org/52968a89.6050...@p10link.net



Re: A new metric for source package importance in ports

2013-11-27 Thread Dmitrijs Ledkovs
On 28 November 2013 00:04, Steven Chamberlain ste...@pyro.eu.org wrote:
 Hi josch!

 On 27/11/13 17:58, Johannes Schauer wrote:
 http://mister-muffin.de/p/Gid8.txt

 One can see that now the amount of source packages which is needed to build
 the rest of the archive is only 383.

 So, there are 383 packages that share the same, maximum value (in this
 case 11657) in the second column?

 Does anybody see enough value in these numbers for source package importance
 in the light of bootstrapping Debian (either for a new port or for rebuilding
 the archive from scratch)?

 I find the list of 383 packages interesting, at least.  I think this
 closure is what I had in mind[0] for regular testing of ports'
 toolchains and reproducibility of builds.  Because each Debian port
 depends in some indirect way on the authenticity of these packages.  And
 likewise any toolchain bugs are most critical here.  I just didn't think
 there would be so many packages.

 Does the list vary by architecture?  I see many odd things in here such
 as 'systemd' and 'redhat-cluster' which would be unavailable if trying
 to bootstrap a non-Linux port, for example.

 I also find it interesting to see openjdk-7 listed but not gcj;  or even
 gcc-4.8.  Was this computed for jessie or sid?

I guess implicit relationships are not considered: build-essential
build-dependencies, and essential dependencies. I would expect
packages in those two sets to have the highest rank, since,
hypothetically, all packages in Debian build-depend and depend on those.

Regards,

Dmitrijs.


Archive: http://lists.debian.org/CANBHLUiifmR+_keS3eSQa_b3_CfZ_56o9vBRR8p2SeY=hy9...@mail.gmail.com



Re: A new metric for source package importance in ports

2013-11-27 Thread Leslie S Satenstein
Instead of dwelling on this discovery, which is not productive, why not
concentrate on what to do to improve Debian?

The analysis has shown faults. Has Debian stopped working? Has the world
crashed?

The problems have been identified, and the patches to address the issues are
being evaluated and planned for retesting.

By January 15, 2014, Debian, Ubuntu, SUSE13.1, Fedora, RedHat, and probably
every distribution that has an old or recent kernel will be corrected.

So, what's the next topic?


 
Regards 

 Leslie

Mr. Leslie Satenstein
An experienced Information Technology specialist.
Yesterday was a good day, today is a better day,
and tomorrow will be even better.
lsatenst...@yahoo.com
SENT FROM MY OPEN SOURCE LINUX SYSTEM.











Re: A new metric for source package importance in ports

2013-11-27 Thread Johannes Schauer
Hi,

Quoting peter green (2013-11-28 01:12:57)
 One problem with these metrics is that you get source packages whose
 importance is artificially inflated because of the way our source packages
 work. If anything in a source package needs x then the whole source package
 has to build-depend on x, even if x is only needed for some peripheral
 functionality that could easily be removed in the event that x was
 unavailable (either on a particular port or in general). In the case of
 libraries there may be a binary dependency too for rarely used functionality.
 
 For example some of the mesa libraries drag in libwayland0 which means
 wayland ends up with a very high importance even though afaict hardly
 anyone uses it right now.
 
 Another big example is languages. Lots of packages build language
 bindings for lots of languages dragging those languages into the
 important set.
 
 So these metrics are a good guide to what packages are unimportant,
 but to determine whether a package is really important or just
 pseudo-important still requires human judgement.

Correct.

The situation can be greatly improved once build profiles allow build
dependencies to be marked as less important or non-essential.

cheers, josch


Archive: http://lists.debian.org/20131128074506.2752.10616@hoothoot