On Tue, Jun 05, 2007 at 06:28:53PM +0900, Charles Plessy wrote:
> On Tue, Jun 05, 2007 at 10:09:07AM +0200, Michael Hanke wrote:
> > My question is now: is it reasonable to provide this rather huge
> > amount of data in a package in the archive? An alternative to a
> > dedicated package would be to provide a download/install script
> > for the data (like the msttcorefonts package) that is called at
> > package postinst.
> I recently had a heretical idea that I did not dare to submit yet:
> we could port fink to Debian, and use it to build .debs from info
> files shipped in Debian packages in main, and sources downloaded
> from upstream's FTP sites.
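(The msttcorefonts approach mentioned above boils down to a postinst
that fetches the data at install time instead of shipping it in the
.deb. A minimal sketch -- not msttcorefonts' actual code; the URL,
checksum and target path below are all made up:)

  #!/bin/sh
  # Sketch of a download-at-install postinst.  The .deb itself ships
  # no data; it fetches and verifies the upstream tarball when the
  # package is configured.
  set -e

  case "$1" in
    configure)
      URL="http://data.example.org/foobar/foobar-data-1.0.7.tar.bz2"
      # placeholder checksum -- a real script would pin the actual
      # sha256 of the upstream tarball
      SUM="0000000000000000000000000000000000000000000000000000000000000000"
      TMP=$(mktemp -d)
      trap 'rm -rf "$TMP"' EXIT
      wget -q -O "$TMP/data.tar.bz2" "$URL"
      echo "$SUM  $TMP/data.tar.bz2" | sha256sum -c -
      mkdir -p /usr/share/foobar
      tar -xjf "$TMP/data.tar.bz2" -C /usr/share/foobar
      ;;
  esac

  #DEBHELPER#

One downside worth noting up front: with that approach the data never
touches Debian's mirrors at all, which cuts against the first
constraint below.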
Some thoughts on constraints:

 * it's better to have stuff distributed by Debian than sourced
   elsewhere; we're a distribution, distributing is What We Do

 * it's better for users to have stuff in .debs, so they don't have
   to worry about different ways of managing different stuff on their
   system

 * some large data sets are just "compiled" -- it can be good to
   distribute a small amount of source in a .deb and compile it on
   the user's machine

 * some large data sets are "compiled" but take long enough that we
   don't want to do it on users' machines, so we have the usual
   source/deb situation there, and that's fairly easy too

 * (***) many data sets don't fit those patterns though, and are
   instead just a bunch of data that needs to be shipped to users;
   doubling that by duplicating it in a .orig.tar.gz and an _all.deb
   is less than ideal

 * some data sets have large raw data and large compiled versions, so
   they need a large source _and_ a large .deb containing different
   info; nothing much to be done in that case, though

 * (###) having .debs generated on a user's system means they can't
   use aptitude or apt-get to install them easily; having .debs
   generated on mirrors requires smart mirroring software rather than
   just rsync or similar; having .debs generated by the maintainer or
   buildds requires both the source and .deb to be mirrored
   separately; and having .debs be the source format requires
   converting from the upstream source format, adding complexity and
   making it harder to trace how the packaging worked

For the ***'d case, a debian.org mirror network that distributes
unprocessed data tarballs, which are then converted into debs and
installed on users' systems, seems workable. I don't see how to
reconcile that with the ###'d concern, though.

If we were to resolve the ###'d concern by changing apt etc., we
could conceivably add foobar_1.0.7-1_data.tar.bz2 files to the
archive in the existing sections, for instance, and provide some form
of "Packages.gz" file for them.

I guess an evil solution to *** that doesn't cause problems with ###
would be to create a dummy source package that Build-Depends: on the
exact version of the package it builds, so that uploads include a
basically empty .tar.gz that just has instructions on how to download
new versions of the data, plus an unprocessed copy of the actual data
converted to _all.deb form. That'd give the correct behaviour for all
the tools we've got, avoid unnecessarily duplicating the data, and
maybe not be *too* confusing. Hrm, actually, I kind-of like that
approach...

I'm not sure if avoiding duplicating the data (1G of data is bad, but
1G of the same data in a .orig.tar.gz _and_ a .deb is absurd) is
enough to just use the existing archive and mirror network, or if
it'd still be worth setting up a separate apt-able archive under
debian.org somewhere for _really_ big data.

Bug#38902 for hysterical interest, btw.

Cheers,
aj
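P.S. To make the Build-Depends trick concrete, the dummy source
package's debian/control could look something like this (names,
version and maintainer all hypothetical; just a sketch):

  Source: foobar-data
  Section: science
  Priority: optional
  Maintainer: J. Random Maintainer <jrandom@debian.org>
  # the evil bit: build-depend on the exact binary this source
  # "builds", so the data only ever exists in the _all.deb
  Build-Depends: debhelper (>= 5), foobar-data (= 1.0.7-1)
  Standards-Version: 3.7.2

  Package: foobar-data
  Architecture: all
  Description: large upstream data set for foobar
   The accompanying .tar.gz is essentially empty: it only documents
   how to fetch and repackage new upstream data.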
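P.P.S. And for the foobar_1.0.7-1_data.tar.bz2 idea, the "some form
of Packages.gz" could be stanzas much like the regular ones -- field
names entirely invented here:

  Package: foobar-data
  Version: 1.0.7-1
  Architecture: all
  Data-Filename: pool/main/f/foobar/foobar_1.0.7-1_data.tar.bz2
  Size: 1073741824
  SHA256: <checksum of the data tarball>
  Description: large upstream data set for foobar

apt would need to learn to fetch such entries and hand them to a
local deb-builder, which is exactly the "changing apt etc." caveat
above.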