Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-29 Thread Guillem Jover
On Wed, 2012-02-15 at 16:41:21 +, Ian Jackson wrote:
 Guillem Jover writes (Re: Multiarch file overlap summary and proposal (was: 
 Summary: dpkg shared / reference counted files and version match)):
   [...]  But trying to work around this by coming
  up with stacks of hacked up solutions  [...]
 
 I disagree with your tendentious phrasing.  The refcnt feature is not
 a hacked up solution (nor a stack of them).  It is entirely normal
 in Debian core tools (as in any substantial piece of software serving
 a lot of diverse needs) to have extra code to make it easier to deploy,
 or simpler to use in common cases.

All along this thread, when referring to the additional complexity and
the additional hacks, I have not been talking about the refcnt'ing itself
at all, but about all the other fixes needed to make it a workable solution.

regards,
guillem


-- 
To UNSUBSCRIBE, email to debian-dpkg-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20120229195152.ga4...@gaara.hadrons.org



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-15 Thread Ian Jackson
Guillem Jover writes (Re: Multiarch file overlap summary and proposal (was: 
Summary: dpkg shared / reference counted files and version match)):
 On Tue, 2012-02-14 at 14:28:58 +, Ian Jackson wrote:
  I think the refcounting approach is very worthwhile because it
  eliminates unnecessary work (by human maintainers) in many simple
  cases.
 
 Aside from what I said on my other reply, I just wanted to note that
 this seems to be a recurring point of tension in the project when it
 comes to archive wide source package changes, where supposed short
 term convenience (with its usually long term harmful effects) appears
 to initially seduce people over what seems to be the cleaner although
 slightly more laborious solution.

The refcnt doesn't just eliminate unnecessary multiarch
conversion work.  It also eliminates unnecessary maintenance effort.
Maintaining a split package will be more work than maintaining an
unsplit one.

I think that over the lifetime of the multiarch deployment this extra
packaging work will far outweigh the extra maintenance and
documentation burden of the refcnt feature.

  [...]  But trying to work around this by coming
 up with stacks of hacked up solutions  [...]

I disagree with your tendentious phrasing.  The refcnt feature is not
a hacked up solution (nor a stack of them).  It is entirely normal
in Debian core tools (as in any substantial piece of software serving
a lot of diverse needs) to have extra code to make it easier to deploy,
or simpler to use in common cases.

Ian.


-- 
Archive: http://lists.debian.org/20283.57393.237949.649...@chiark.greenend.org.uk



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Raphael Hertzog
On Tue, 14 Feb 2012, Philipp Kern wrote:
 On 2012-02-14, Raphael Hertzog hert...@debian.org wrote:
  So my suggestion is to extend dpkg-parsechangelog to provide
  the required logic to split the changelog in its bin-nmu part and its
  usual content.
 
  dpkg-parsechangelog --split-binnmu binnmu-part-file remaining-part-file
 
  Then dh_installchangelogs could try to use this (and if it fails, fall back
  to the standard changelog installation).
 
  Does that sound sane? If yes, I can have a look at implementing this.
 
 In theory sbuild could also offload this to dpkg-buildpackage by passing
 something like --binnmu-version 2 --binnmu-changelog 'Rebuild for libfoo
 transition'.  The only thing that would be annoying is checking if the old
 style or the new style must be used.  (I.e. there must be some sort of feature
 query first.)

Yes, but that doesn't change the fact that dpkg-dev should not
install files in the generated .deb. So we still need some interaction
with dh_installchangelogs... but your suggestion led me to another
proposal.

dpkg-buildpackage --binary-version ver --binary-changelog 'foo'
could create debian/changelog.build with the given changelog version and
changelog entry.

dpkg-parsechangelog could be taught to read debian/changelog.build
before debian/changelog so that dpkg-parsechangelog continues to do the
right thing (when called from debian/rules).

And dh_installchangelogs can be taught to install debian/changelog.build
as /usr/share/doc/foo/changelog.Debian.build-$arch.

dpkg-buildpackage would clean up debian/changelog.build if it wasn't
passed the proper option. dpkg-source would learn to not include it in
generated source packages, too.
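A minimal Python sketch of the reader side of this proposal, for illustration only: the `debian/changelog.build` handling and the `--binary-version`/`--binary-changelog` options are only being proposed here, and the helper names `effective_changelog` and `installed_build_changelog` are hypothetical. It shows the intended ordering (build-only entry first, then the source changelog) and the suggested install path.

```python
import os

# Hypothetical file name taken from the proposal above.
BUILD_CHANGELOG = "changelog.build"


def effective_changelog(debian_dir: str) -> str:
    """Return the changelog text a build-time reader would see under the
    proposal: the build-only entry, if present, placed before debian/changelog."""
    parts = []
    build_path = os.path.join(debian_dir, BUILD_CHANGELOG)
    if os.path.exists(build_path):
        with open(build_path, encoding="utf-8") as f:
            # Keep exactly one blank line between the two entries.
            parts.append(f.read().rstrip("\n") + "\n\n")
    with open(os.path.join(debian_dir, "changelog"), encoding="utf-8") as f:
        parts.append(f.read())
    return "".join(parts)


def installed_build_changelog(pkg: str, arch: str) -> str:
    """Installed location suggested for the build-only entry."""
    return f"/usr/share/doc/{pkg}/changelog.Debian.build-{arch}"
```

Because the source changelog itself is never modified, a tree without the build file parses exactly as before.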

This looks rather appealing to me. What do you think?

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Pre-order a copy of the Debian Administrator's Handbook and help
liberate it: http://debian-handbook.info/liberation/


-- 
Archive: http://lists.debian.org/20120214131720.gd11...@rivendell.home.ouaza.com



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Guillem Jover
On Mon, 2012-02-13 at 22:43:04 -0800, Russ Allbery wrote:
 If this is comprehensive, then I propose the following path forward, which
 is a mix of the various solutions that have been discussed:

 * dpkg re-adds the refcounting implementation for multiarch, but along
   with a Policy requirement that packages that are multiarch must only
   contain files in classes 1 and 2 above.

 * All packages that want to be multiarch: same have to move all generated
   documentation into a separate package unless the maintainer has very
   carefully checked that the generated documentation will be byte-for-byte
   identical even across minor updates of the documentation generation
   tools and when run at different times.

If packages have to be split anyway to cope with the other cases, then
the number of new packages which might not be needed otherwise will be
even smaller than the predicted amount, at which point it makes even
less sense to support refcnt'ing.

It also requires maintainers to carefully consider if the (doc, etc)
toolchains will generate predictable output.

Your proposal still requires papering over the other corner-cases.

 * Policy prohibits arch-varying data files in multiarch: same packages
   except in arch-qualified paths.

Well, there's no escape from this any way you look at it, regardless of
refcnt'ing or not.

 * The binNMU process is changed to add the binNMU changelog entry to an
   arch-qualified file (changelog.Debian.arch, probably).  We need to
   figure out what this means if the package being binNMU'd has a
   /usr/share/doc/package symlink to another package, though; it's not
   obvious what to do here.

IMO this requires a multitude of hacks, when the simplest and most obvious
solution, an arch-qualified pkgname, solves this cleanly and allows debhelper
to automatically deal with it. It also lets tools simply change where they
always look for those files in the M-A:same case, regardless of whether the
package is being binNMUed or not.

This still does not solve the other issues I listed, namely binNMUs
have to be performed in lock-step, and more complicated transitions /
upgrades. It also introduces different solutions for different problems,
while my proposal is generic for all cases.

So this is still pretty much unconvincing, and seems like clinging
to the refcnt'ing “solution” even though it makes things more
complicated overall, will introduce inconsistency and uncertainty for
maintainers, needs way more global changes to keep it going, etc.

What I'd change in the proposal from my summary mail is that arch-indep
files might be considered for splitting at maintainers discretion,
when it actually seems worth it, in the same way we've handled
splitting arch-indep files from arch:any up to now. So for example a
couple of headers could be kept on the -dev package, or Ian's case on
essential and data files could also be kept on the same lib package,
as long as their paths are arch-qualified either through a pkgname:arch
or the multiarch triplet. This would reduce even more the amount of
newly split packages.

regards,
guillem


-- 
Archive: http://lists.debian.org/20120214140138.ga23...@gaara.hadrons.org



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Josselin Mouette
On Monday 13 February 2012 at 22:43 -0800, Russ Allbery wrote:
 There's been a lot of discussion of this, but it seems to have been fairly
 inconclusive.  We need to decide what we're doing, if anything, for wheezy
 fairly soon, so I think we need to try to drive this discussion to some
 concrete conclusions.

Thank you very much for your constructive work.

 3. Generated documentation.  Here's where I think refcounting starts
    failing.

So we need to move a lot of documentation generated with gtk-doc or
doxygen from -dev packages to -doc packages. But that really seems an
acceptable tradeoff between the amount of work required and the
cleanness of the solution.

 Does this seem comprehensive to everyone?  Am I missing any cases?

Are there any cases of configuration files in /etc that vary across
architectures? Think of stuff like ld.so.conf, where some plugins or
library path is coded in a configuration file.

-- 
 .''`.  Josselin Mouette
: :' :
`. `'
  `-


--
Archive: http://lists.debian.org/1329230441.3297.378.camel@pi0307572



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Jakub Wilk

* Raphael Hertzog hert...@debian.org, 2012-02-14, 14:17:
 dpkg-buildpackage --binary-version ver --binary-changelog 'foo' could
 create debian/changelog.build with the given changelog version and
 changelog entry.

 dpkg-parsechangelog could be taught to read debian/changelog.build
 before debian/changelog so that dpkg-parsechangelog continues to do the
 right thing (when called from debian/rules).

 And dh_installchangelogs can be taught to install
 debian/changelog.build as
 /usr/share/doc/foo/changelog.Debian.build-$arch.

 dpkg-buildpackage would clean up debian/changelog.build if it wasn't
 passed the proper option. dpkg-source would learn to not include it in
 generated source packages, too.

 This looks rather appealing to me. What do you think?

Yes, it does look appealing. But...

Are we sure that no existing package uses debian/changelog.build for
its own purposes?

Are we sure that all existing packages (and helpers) that parse
debian/changelog use dpkg-parsechangelog?

--
Jakub Wilk


--
Archive: http://lists.debian.org/20120214144341.ga3...@jwilk.net



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Raphael Hertzog
Hi,

On Tue, 14 Feb 2012, Guillem Jover wrote:
  * All packages that want to be multiarch: same have to move all generated
    documentation into a separate package unless the maintainer has very
    carefully checked that the generated documentation will be byte-for-byte
    identical even across minor updates of the documentation generation
    tools and when run at different times.
 
 If packages have to be split anyway to cope with the other cases, then
 the number of new packages which might not be needed otherwise will be
 even smaller than the predicted amount, at which point it makes even
 less sense to support refcnt'ing.

Why are you so opposed to the refcnt'ing?

It's not such a big deal to maintain this feature in dpkg. And even if the
current implementation is not perfect, it can be improved later, once dpkg
itself stores checksums of the files it installs.
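To make the refcounting idea concrete, here is a toy Python model, assuming a per-path checksum store of the kind mentioned above (the class and method names are hypothetical, and this is not dpkg's actual implementation): identical copies from several M-A: same instances share one entry, differing content is reported as a conflict, and the file is only deleted when its last owner goes away.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SharedFile:
    digest: str                               # checksum of the first copy installed
    owners: set = field(default_factory=set)  # e.g. {"libfoo-dev:amd64", "libfoo-dev:i386"}


class RefcountedFiles:
    """Toy model of reference-counted file ownership keyed by installed path."""

    def __init__(self) -> None:
        self.files = {}

    def install(self, path: str, content: bytes, owner: str) -> None:
        digest = hashlib.sha256(content).hexdigest()
        entry = self.files.get(path)
        if entry is None:
            self.files[path] = SharedFile(digest, {owner})
        elif entry.digest == digest:
            # Byte-identical copy: just take another reference.
            entry.owners.add(owner)
        else:
            # Differing content is a genuine file conflict.
            raise ValueError(f"{path}: differs from copy owned by {sorted(entry.owners)}")

    def remove(self, path: str, owner: str) -> bool:
        """Drop one reference; return True when the file itself should be deleted."""
        entry = self.files[path]
        entry.owners.discard(owner)
        if not entry.owners:
            del self.files[path]
            return True
        return False
```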

To me it looks like you don't like refcnt'ing and you're trying to find
some reasons to make it unacceptable.

 It also requires maintainers to carefully consider if the (doc, etc)
 toolchains will generate predictable output.

If the maintainer has to install files in a non-standard path (because of
the need to arch-qualify them), it will also require maintainers to carefully
consider how to ensure that this move doesn't break anything.

It's not a white/black situation. You're trading one potential problem for
another. And the differing files are likely to be much easier to spot
than other behaviour changes that might be implied by the move of some
files to arch qualified paths.

 Your proposal still requires papering over the other corner-cases.

Can you be explicit about which corner cases you're referring to?

 This still does not solve the other issues I listed, namely binNMUs
 have to be performed in lock-step

Can you explain why? If the binnmu changelog is in an arch-specific file,
then we're free to bin-nmu packages separately.

dpkg must just ensure that all M-A: same packages have the same source
version (instead of the binary version as currently).
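A rough sketch of that check, assuming binNMU versions use the conventional `+bN` suffix (a real implementation could presumably read the source version from the binary package's `Source:` field instead; the function names are hypothetical).

```python
import re

# Assumption: binNMUs append the conventional "+bN" suffix to the binary version.
BINNMU_SUFFIX = re.compile(r"\+b\d+$")


def source_version(binary_version: str) -> str:
    """Strip a trailing binNMU suffix, e.g. '1.0-1+b2' -> '1.0-1'."""
    return BINNMU_SUFFIX.sub("", binary_version)


def same_source_version(versions_by_arch: dict) -> bool:
    """True if all installed instances of an M-A: same package share one source version."""
    return len({source_version(v) for v in versions_by_arch.values()}) <= 1
```

With this rule, a binNMU on amd64 alone (`1.0-1+b1` next to `1.0-1` on i386) stays co-installable, while a sourceful upload on one architecture does not.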

 , and more complicated transitions / upgrades.

We have no experience with this. It's a bit early to say whether those
constraints are going to be problematic or not.

 It also introduces different solutions for different problems, while my
 proposal is generic for all cases.

There is no such thing as a generic solution. You still have to decide whether
you move files to a -common package or if you arch qualify them and keep
them in the M-A: same package. And in both cases, you have to evaluate the
implications, in terms of package installation ordering in one case, in
terms of modifications to do to properly support the arch-qualified files
in the other one.

While it may sound cleaner from a theoretical point of view, I'm
not convinced that it's better than the approach outlined by Russ.

Also, you completely ignore the fact that what you're proposing is an
important change for multi-arch packages that have already been converted
both in Debian and in Ubuntu. You're pushing the work back to package
maintainers when there's no reason not to deal with this at the build
infrastructure level.

To reduce some of the downsides associated with compressed files in M-A:
same packages, we could/should investigate how to not compress files
in such packages instead of duplicating them needlessly.

 So this is still pretty much unconvincing, and seems like clinging
 to the refcnt'ing “solution” even though it makes things more
 complicated overall, will introduce inconsistency and uncertainty for
 maintainers, needs way more global changes to keep it going, etc.

This is not a fair characterization of the situation. IMO, global changes are
better than lots of maintainers having to do busy-work splitting their
packages.

You see inconsistency in Russ's proposal but you don't see
inconsistency/incertainty when you change the standard location of
changelog files.

As for it being more complicated, that might be true at the dpkg level,
but I don't believe it's true from the maintainers' point of view.

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer



-- 
Archive: http://lists.debian.org/20120214151318.ga14...@rivendell.home.ouaza.com



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Raphael Hertzog
On Tue, 14 Feb 2012, Jakub Wilk wrote:
 Are we sure that no existing package uses debian/changelog.build for
 its own purposes?

No, but with debian/changelog.dpkg-build we should be safe.

 Are we sure that all existing packages (and helpers) that parse
 debian/changelog use dpkg-parsechangelog?

No, but I would consider anything else as a bug and we would notice
relatively quickly (we could even do a full rebuild to try to verify
pro-actively).

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer



-- 
Archive: http://lists.debian.org/20120214152757.gc14...@rivendell.home.ouaza.com



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Guillem Jover
On Tue, 2012-02-14 at 14:28:58 +, Ian Jackson wrote:
 Guillem Jover writes (Re: Multiarch file overlap summary and proposal (was: 
 Summary: dpkg shared / reference counted files and version match)):
  On Mon, 2012-02-13 at 22:43:04 -0800, Russ Allbery wrote:
   * The binNMU process is changed to add the binNMU changelog entry to an
     arch-qualified file (changelog.Debian.arch, probably).  We need to
     figure out what this means if the package being binNMU'd has a
     /usr/share/doc/package symlink to another package, though; it's not
     obvious what to do here.
  
  IMO this requires a multitude of hacks, when the simplest and most obvious
  solution, an arch-qualified pkgname, solves this cleanly and allows debhelper
  to automatically deal with it. It also lets tools simply change where they
  always look for those files in the M-A:same case, regardless of whether the
  package is being binNMUed or not.
 
 I agree that it would be nice to always arch-qualify the changelog
 filename.  But that would involve a lot of changes to
 changelog-reading tools which we perhaps don't want to do right now.

I've never proposed to arch-qualify the filename for the stuff under
/usr/share/doc/pkgname/, I've proposed to arch-qualify the pkgname in
the path (/usr/share/doc/pkgname:arch/), but only for M-A:same packages,
which are the only ones needing the disambiguation. This is how dpkg
handles pkgname output, or how it stores their data in the db too.

And it should be easy to ask a multiarch enabled dpkg-query for example
to normalize the pkgname output to be used on those paths, or otherwise
do it by hand:

  if M-A == same
    pkgname:arch
  else
    pkgname
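The same rule as a small runnable Python sketch (the function name `doc_dir` is hypothetical; the `multi_arch` values are those of the `Multi-Arch` control field, with `no` as the default when the field is absent).

```python
def doc_dir(pkg: str, arch: str, multi_arch: str = "no") -> str:
    """Doc directory under this proposal: only M-A: same packages get the
    arch-qualified 'pkgname:arch' form; everything else keeps 'pkgname'."""
    name = f"{pkg}:{arch}" if multi_arch == "same" else pkg
    return f"/usr/share/doc/{name}"
```

Since the qualification depends only on the package's own control data, debhelper could in principle compute it without any per-package configuration.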

 Note that even if we decide to always arch-qualify, we will still have
 lots of old packages so all changelog-reading tools will need to look
 in both places.

 For most changelog-reading tools it won't be very troublesome if they
 accidentally don't spot a binNMU entry.  So Russ's proposal is a good
 step towards your proposal.  And if we decide we don't need to go all
 the way then it's good enough for now.

How many tools are there that actually read the binary package changelog
file anyway? I only know of packages.d.o. Any other tool reading from
the installed path, cannot really rely on it being present at all
anyway, per policy.

And in addition, binNMU split changelogs are going to be there forever,
and as such their possible double locations. While the possible double
location for M-A:same packages using pkgname:arch qualified pathnames
would only be temporary and disappear once the packages have been rebuilt
with a new debhelper which automatically installs them in the correct
place.

  So this is still pretty much unconvincing, and seems like clinging
  to the refcnt'ing “solution” even though it makes things more
  complicated overall, will introduce inconsistency and uncertainty for
  maintainers, needs way more global changes to keep it going, etc.
 
 I think the refcounting approach is very worthwhile because it
 eliminates unnecessary work (by human maintainers) in many simple
 cases.

As I mentioned in my reply to Riku, the number of packages that would need
splitting that would otherwise not be needed should be even less than
before (which was predicted at around 700), also as I mentioned there
too, nothing prevents us from arch-qualifying paths (with Debian arch
or multiarch triplet depending on the case) if that's more convenient
or safer (as per your essential data example), and is what we've been
doing anyway for arch-indep data shipped in arch:any packages all along.
Given the amount of hacks or special casing piling up to make refcnt'ing
workable, when all that's really needed is a one time handling (or a
possible additional change for already converted packages, for things
that debhelper might not be able to handle) of moving qualifying paths
or splitting into new packages, it really does not seem worth it, no.

regards,
guillem


-- 
Archive: http://lists.debian.org/20120214164015.ga27...@gaara.hadrons.org



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Guillem Jover
On Tue, 2012-02-14 at 14:28:58 +, Ian Jackson wrote:
 I think the refcounting approach is very worthwhile because it
 eliminates unnecessary work (by human maintainers) in many simple
 cases.

Aside from what I said on my other reply, I just wanted to note that
this seems to be a recurring point of tension in the project when it
comes to archive wide source package changes, where supposed short
term convenience (with its usually long term harmful effects) appears
to initially seduce people over what seems to be the cleaner although
slightly more laborious solution.

Other recent-ish incarnations of this tension are the build-arch /
build-indep targets and the build flag settings. The former was
recently resolved so that the right thing to do is for *all* packages
to eventually support those targets; the latter got switched from the
seemingly more convenient solution to the more laborious but correct
one, that is, *all* packages need to set those build flags by
themselves.

This is a fundamental issue with how our source packages are handled:
the freedom and power to experiment and implement them in whatever
way the maintainer wants comes at the price that some archive-wide
changes are sometimes more costly than changing something centrally
and being done with it. But trying to work around this by coming
up with stacks of hacked up solutions will not solve that fundamental
issue, and this kind of tension will keep coming up again and again
as long as the foundation is not reworked. Either that, or the project
needs to accept that fact and learn to live with this kind of change,
with patience.

regards,
guillem


-- 
Archive: http://lists.debian.org/20120215011510.ga15...@gaara.hadrons.org



Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Joey Hess
Guillem Jover wrote:
 Aside from what I said on my other reply, I just wanted to note that
 this seems to be a recurring point of tension in the project when it
 comes to archive wide source package changes, where supposed short
 term convenience (with its usually long term harmful effects) appears
 to initially seduce people over what seems to be the cleaner although
 slightly more laborious solution.
 
 Other recent-ish incarnations of this tension are the build-arch /
 build-indep targets and the build flag settings. The former was
 recently resolved so that the right thing to do is for *all* packages
 to eventually support those targets; the latter got switched from the
 seemingly more convenient solution to the more laborious but correct
 one, that is, *all* packages need to set those build flags by
 themselves.
 
 This is a fundamental issue with how our source packages are handled:
 the freedom and power to experiment and implement them in whatever
 way the maintainer wants comes at the price that some archive-wide
 changes are sometimes more costly than changing something centrally
 and being done with it. But trying to work around this by coming
 up with stacks of hacked up solutions will not solve that fundamental
 issue, and this kind of tension will keep coming up again and again
 as long as the foundation is not reworked. Either that, or the project
 needs to accept that fact and learn to live with this kind of change,
 with patience.

Very interesting mail. While I certainly agree with your examples, it's
worth remembering the counterexample of the /usr/doc transition which
took approximately 5 years to complete[1], and probably could have been
accomplished quickly and without pain with a simple hack to dpkg.

Anyway, my worry about the refcounting approach (or perhaps M-A: same in
general) is not the details of the implementation in dpkg, but the added
mental complexity of dpkg now being able to have multiple distinct
packages installed under the same name. I had a brief exposure to rpm,
which can install multiple versions of the same package, and that was
the main cause of much confusing behavior in rpm. While dpkg's invariant
that all co-installable package names be unique (and have unique files)
has certainly led to lots of ugly package names, it's kept the users'
and developers' mental models quite simple.

I worry that we have barely begun to scratch the surface of the added
complexity of losing this invariant.

-- 
see shy jo

[1] To the extent it was ever completed.. master.debian.org still has
a vestigial /usr/doc/




Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-14 Thread Raphael Hertzog
On Tue, 14 Feb 2012, Guillem Jover wrote:
 I've never proposed to arch-qualify the filename for the stuff under
 /usr/share/doc/pkgname/, I've proposed to arch-qualify the pkgname in
 the path (/usr/share/doc/pkgname:arch/), but only for M-A:same packages,
 which are the only ones needing the disambiguation. This is how dpkg
 handles pkgname output, or how it stores their data in the db too.
[...]
 How many tools are there that actually read the binary package changelog
 file anyway?

There's surely apt-listchanges, and probably a bunch of others that are
less well known.

I don't know if it's worth it, but if we go down that route, and if we
want to keep /usr/share/doc/pkgname on user's systems we could create
a new command in dpkg-maintscript-helper to manage that path as
a symlink to the native M-A: same package (if possible, otherwise
to any installed arch). That dpkg-maintscript-helper call could be
auto-enabled by debhelper for M-A: same packages.
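A sketch of the target selection such a helper might perform. The dpkg-maintscript-helper command is only a suggestion here, so the function below is hypothetical, and picking the alphabetically first architecture when the native one is absent is an arbitrary but deterministic stand-in for "any installed arch".

```python
from typing import List, Optional


def doc_symlink_target(pkg: str, native_arch: str, installed_arches: List[str]) -> Optional[str]:
    """Pick the pkgname:arch directory that /usr/share/doc/<pkg> would point to:
    the native instance if installed, otherwise the alphabetically first one."""
    if not installed_arches:
        return None  # no instance left, so the symlink would be removed
    if native_arch in installed_arches:
        arch = native_arch
    else:
        arch = sorted(installed_arches)[0]
    return f"{pkg}:{arch}"
```

The helper would presumably need to re-run this choice whenever an instance is installed or removed, so the link never dangles.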

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer



-- 
Archive: http://lists.debian.org/20120215074019.gc24...@rivendell.home.ouaza.com



Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-13 Thread Russ Allbery
There's been a lot of discussion of this, but it seems to have been fairly
inconclusive.  We need to decide what we're doing, if anything, for wheezy
fairly soon, so I think we need to try to drive this discussion to some
concrete conclusions.

First, Steve's point here is very good:

Steve Langasek vor...@debian.org writes:

 I guess we're looking at the same data, yet we seem to have reached
 opposite conclusions.

  - Riku reports that 33 out of 82k files have different compression when
    using current gzip vs. 10-year-old gzip.  I'd be surprised if any of
    those binary packages hadn't been superseded long ago.  It's not a
    guarantee, but I think the risks, and ultimate cost, of relying on gzip
    output to not change often and to just do sourceful rebuilds when it
    isn't are a lot smaller than if we go about manually splitting our
    packages further.

  - The cases where gzip output has been reported to not be reproducible
    seem to all boil down to a single issue with gzip being passed
    different arguments due to the unreproducible nature of *find*'s
    output.  A patch has been made available already on the bug, and this
    patch seems to address the instances of the problem that we've hit so
    far in the Ubuntu archive.

 Now, it's worth following up with gzip upstream about our concerns, but
 even without that, I just don't see this being problematic.

It isn't the end of the world if we have some conflicts provided that we
can detect them and can do something consistent to fix them.  I'm rather
nervous about relying on reproducibility of gzip because of Joey's
experience with pristine-tar, where he does find a lot of variation in
practice, but it is true that, for the purposes of multiarch, Debian *can*
possibly construct things such that we only need to worry about our own
gzip, which does simplify the situation.
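Part of the reproducibility concern is the gzip header itself, which can record a file name and a timestamp (`gzip -n` omits both). A minimal Python 3.8+ sketch of deterministic compression with the standard `gzip` module; it does not address the argument-ordering issue quoted above, which concerns the gzip command-line tool, and byte-identical output is only expected from the same Python/zlib build.

```python
import gzip


def compress_reproducibly(data: bytes) -> bytes:
    """gzip-compress with a fixed header timestamp (mtime=0).
    gzip.compress() stores no file name, so equal input gives equal output
    within the same Python/zlib build."""
    return gzip.compress(data, compresslevel=9, mtime=0)
```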

However, as we've subsequently discussed, those are not the only issues
with file overlaps between packages.  So I'm going to try to summarize and
propose some possible solutions for the different issues.  I'm going to
discuss these issues in order from the most consistent with a refcounting
solution to the least consistent.

1. Uncompressed files that we know are absolutely identical between
   different architectures.  These include arch-independent header files
   that are just copied verbatim from the upstream source and data files
   in textual formats or arch-independent binary formats that aren't
   compressed and whose generation doesn't vary.  (Symlinks are a special
   case of this.)  Reference counting works great for these.  These also
   resolve most of the file overlaps between -dev packages, and many of
   the harder cases for interpackage dependencies if we split everything
   out.  I think it makes a lot of sense to use refcounting for these
   files.

2. Files like the above but that are compressed.  This is most common in
   the doc directory for things like README or the upstream changelog.
   Upstream man pages written directly in *roff fall into this category as
   well, for -dev packages.  With Steve's point above about gzip, I think
   we're probably okay using refcounting for this as well.

3. Generated documentation.  Here's where I think refcounting starts
   failing.  Man pages generated from POD may change if the version of
   Perl used to generate them changes, if Pod::Simple or Pod::Man have had
   a new release.  Doxygen-generated HTML documentation is even more
   likely to change.  Many documentation generation systems will include
   timestamps or other information that changes, or (even more likely)
   will have minor changes in their output and formatting even if there is
   nothing as obvious as a version number or timestamp.

   I don't think we can use refcounting for generated documentation
   produced as part of the package build process.  If there is
   Doxygen-generated documentation, generated man pages, or the like, I
   think those have to be split into a separate arch: all package.  Even
   if it's just a couple of man pages.  This is rather annoying, but I
   think trying to use refcounting here is just too fragile.

4. Lintian overrides.  I believe these should be qualified with the
   architecture on any multiarch: same package so that the overrides can
   vary by architecture, since this is a semi-frequent use case for
   Lintian.

5. Data files that vary by architecture.  This includes big-endian
   vs. little-endian issues.  These are simply incompatible with multiarch
   as currently designed, and incompatible with the obvious variations
   that I can think of, and will have to either be moved into
   arch-qualified directories (with corresponding patches to the paths
   from which the libraries load the data) or these packages can't be made
   multiarch.

6. Debian changelogs.  The actual content of these files change with
   binNMUs, so these obviously can't be refcounted at all right now.  We
   have to do 

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

2012-02-13 Thread Raphael Hertzog
On Mon, 13 Feb 2012, Russ Allbery wrote:
 There's been a lot of discussion of this, but it seems to have been fairly
 inconclusive.  We need to decide what we're doing, if anything, for wheezy
 fairly soon, so I think we need to try to drive this discussion to some
 concrete conclusions.

Thanks for this.

 2. Files like the above but that are compressed.  This is most common in
    the doc directory for things like README or the upstream changelog.
    Upstream man pages written directly in *roff fall into this category as
    well, for -dev packages.  With Steve's point above about gzip, I think
    we're probably okay using refcounting for this as well.

Yes, but I would still document at the policy level that, when feasible
without downsides, it's best to move compressed files into a shared package.

Also it might be wise to relax the policy rules on compression for
multi-arch: same and to let dh_compress not compress (some) files in such
packages.

 Does this seem comprehensive to everyone?  Am I missing any cases?

It's a good summary, yes.

 If this is comprehensive, then I propose the following path forward, which
 is a mix of the various solutions that have been discussed:

I agree with this plan.

 * The binNMU process is changed to add the binNMU changelog entry to an
   arch-qualified file (changelog.Debian.arch, probably).  We need to
   figure out what this means if the package being binNMU'd has a
   /usr/share/doc/package symlink to another package, though; it's not
   obvious what to do here.

I wonder what's the proper way to handle this. In theory, it would be nice
to deal with that at the dpkg-dev level but dpkg-dev is not at all
involved in installing the changelog. And I believe that the bin-nmu
process just adds a top-level entry to debian/changelog.

So the code should go to dh_installchangelogs... but it doesn't seem to be
a good idea to put the bin-nmu logic there in particular since we might
extend it (see #440094).

So my suggestion is to extend dpkg-parsechangelog to provide
the required logic to split the changelog in its bin-nmu part and its
usual content.

dpkg-parsechangelog --split-binnmu binnmu-part-file remaining-part-file

Then dh_installchangelogs could try to use this (and if it fails, fall back
to the standard changelog installation).
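A minimal Python sketch of what such a split could look like, assuming the top entry is a binNMU exactly when its version ends in the conventional `+bN` suffix. `split_binnmu` is a hypothetical name, the `--split-binnmu` option is only being proposed at this point, and the header regex is a simplification of dpkg's real changelog parser.

```python
import re
from typing import Tuple

# Simplified changelog entry header: "source (version) distribution; options".
HEADER = re.compile(r"^(?P<source>[a-z0-9][a-z0-9+.-]+) \((?P<version>[^)]+)\)")
# Assumption: binNMU versions carry the conventional "+bN" suffix.
BINNMU = re.compile(r"\+b\d+$")


def split_binnmu(changelog: str) -> Tuple[str, str]:
    """Return (binnmu_part, remaining_part); binnmu_part is empty when the
    top entry is not a binNMU."""
    lines = changelog.splitlines(keepends=True)
    headers = [i for i, line in enumerate(lines) if HEADER.match(line)]
    if not headers:
        return "", changelog
    top = HEADER.match(lines[headers[0]])
    if not BINNMU.search(top.group("version")):
        return "", changelog
    # The binNMU entry runs up to the next entry header (or to the end).
    cut = headers[1] if len(headers) > 1 else len(lines)
    return "".join(lines[:cut]), "".join(lines[cut:])
```

Concatenating the two parts gives back the original text, so a caller that fails to use the split can always fall back to the unsplit changelog.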

Does that sound sane? If yes, I can have a look at implementing this.

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer



-- 
Archive: http://lists.debian.org/20120214073923.ga...@rivendell.home.ouaza.com