On Tue, Jun 05, 2007 at 06:28:53PM +0900, Charles Plessy wrote:
> Le Tue, Jun 05, 2007 at 10:09:07AM +0200, Michael Hanke a écrit :
> > My question is now: Is it reasonable to provide this rather huge amount
> > of data in a package in the archive?
> > An alternative to a dedicated package would be to provide a
> > download/install script for the data (like the msttcorefonts package)
> > that is called at package postinst.
> I recently had a heretic idea that I did not dare to submit yet: we
> could port fink to Debian, and use it to build .debs from info files
> shipped in Debian packages in main, and sources downloaded from
> upstream's FTP sites.

Some thoughts on constraints:

        * it's better to have stuff distributed by Debian than sourced
          elsewhere; we're a distribution, distributing is What We Do

        * it's better for users to have stuff in .deb's, so they don't
          have to worry about different ways of managing different stuff
          on their system

        * some large data sets are just "compiled" -- it can be good to
          distribute a small amount of source in a .deb and compile
          it on the user's machine.

        * some large data sets are "compiled" but it takes long enough that
          we don't want to do it on users' machines, so we have the usual
          source/deb situation here, and that's fairly easy too.

        * (***) many data sets don't fit those patterns though; they're
          just a bunch of data that needs to be shipped to users.
          doubling that by duplicating it in both a .orig.tar.gz and
          an _all.deb is less than ideal

        * some data sets have large raw data and large compiled versions,
          so need a large source _and_ a large .deb containing different
          info. nothing much to be done in that case, though

        * (###) having .deb's generated on a user's system means they
          can't use aptitude or apt-get to install them easily; having
          .deb's generated on mirrors requires smart mirroring software
          rather than just rsync or similar; having .deb's generated by
          the maintainer or buildds requires both the source and .deb
          to be mirrored separately; having .deb's be the source format
          requires converting from the upstream source format, adding
          complexity and making it harder to trace how the packaging
          worked

For the ***'d case, it seems like having a debian.org mirror network
that distributes unprocessed data tarballs, which are converted into debs
and installed on users' systems, would be workable.

I don't see how we could resolve that with the ###'d concern though.

If we were to resolve the ###'d concern by changing apt etc, we could
conceivably add foobar_1.0.7-1_data.tar.bz2 files to the archive in the
existing sections, for instance, and provide some form of "Packages.gz"
file for them.
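To make that concrete, an index entry for such a data tarball might look
something like the stanza below -- this is just a sketch in the style of
an ordinary Packages.gz stanza; the package name, size and checksum are
made-up placeholders, and apt would need changes to understand a
.tar.bz2 in the Filename: field:

```
Package: foobar-data
Version: 1.0.7-1
Architecture: all
Filename: pool/main/f/foobar/foobar_1.0.7-1_data.tar.bz2
Size: 1073741824
MD5sum: 0123456789abcdef0123456789abcdef
Description: raw upstream data for foobar
 Shipped as an unprocessed tarball rather than a .deb, to avoid
 duplicating the data in the archive.
```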

I guess an evil solution to *** that doesn't cause problems with ###
would be to create a dummy source package that Build-Depends: on the
exact version of the package it builds, so that uploads include a
basically empty .tar.gz that just has instructions on how to download
new versions of the data, and an unprocessed copy of the actual data
converted to _all.deb form. That'd give the correct behaviour for all
the tools we've got, avoid unnecessarily duplicating the data, and maybe
not be *too* confusing.
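A sketch of what that dummy source package's debian/control might look
like (names and versions hypothetical) -- the deliberately circular
Build-Depends: pins the exact version of the _all.deb that gets uploaded
alongside the near-empty source:

```
Source: foobar-data
Section: science
Priority: optional
Maintainer: A. Maintainer <someone@debian.org>
Build-Depends: debhelper (>= 5), foobar-data (= 1.0.7-1)
Standards-Version: 3.7.2

Package: foobar-data
Architecture: all
Description: large upstream data set for foobar
 The source package contains only instructions for fetching new
 upstream data; the binary package carries the data itself.
```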

Hrm, actually, I kind-of like that approach...

I'm not sure if avoiding duplicating the data (1G of data is bad, but
1G of the same data in a .orig.tar.gz _and_ a .deb is absurd) is enough
to just use the existing archive and mirror network, or if it'd still be
worth setting up a separate apt-able archive under debian.org somewhere
for _really_ big data.

Bug#38902 for hysterical interest, btw.

Cheers,
aj
