Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Laszlo Kajan
Hello Andreas!

On 23/04/13 12:23, Andreas Tille wrote:
> On Tue, Apr 23, 2013 at 11:48:05AM +0200, Laszlo Kajan wrote:
>>
>> This email is to continue the discussion about free packages that depend on 
>> big (e.g. >400MB) free data outside 'main'.
> 
> In your practical case, is this data, say, <500MB?  Are we talking about
> compressed or uncompressed data (i.e. >400MB on the user's hard disk, or
> on all Debian mirrors world-wide)?

It is around 404MB, gzip compressed [1]. I do not think it is
architecture-independent: the BLAST databases (the main bulk of the tar.gz)
appear to be sensitive to the size of int and to endianness.

[1] ftp://rostlab.org/metastudent/metastudent-data_1.0.0.tar.gz

> We do actually have examples of >500MB binary packages:
> 
> udd@ullmann:/srv/mirrors/debian$ find . -type f -size +500M -name "*.deb"
> ./pool/main/f/freefoam/freefoam-dev-doc_0.1.0+dfsg-1_all.deb
> ./pool/main/libr/libreoffice/libreoffice-dbg_4.0.3~rc1-3_amd64.deb
> ./pool/main/libr/libreoffice/libreoffice-dbg_4.0.3~rc1-3_kfreebsd-amd64.deb
> ./pool/main/libr/libreoffice/libreoffice-dbg_4.0.3~rc1-2_amd64.deb
> ./pool/main/libr/libreoffice/libreoffice-dbg_4.0.3~rc1-2_kfreebsd-amd64.deb
> ./pool/main/n/ns3/ns3-doc_3.16+dfsg1-1_all.deb
> ./pool/main/n/ns3/ns3-doc_3.15+dfsg-1_all.deb
> ./pool/main/w/webkitgtk/libwebkit2gtk-3.0-0-dbg_1.11.91-1_amd64.deb
> ./pool/non-free/r/redeclipse-data/redeclipse-data_1.4-1_all.deb
> 
> Even if the topic should be clarified in general, because we will
> certainly have larger data sets than this in the future, I could imagine
> that packaging this particular data set in your case should not be the
> main problem under the current circumstances, as long as no better
> solution is found.
> 
> I would even go so far as to say that it might make sense to package
> these data and upload them, to demonstrate that we should *really* create
> a solution for such cases if data packages keep increasing in number and
> size.

All right, we will package and upload the big data if no one comes up with a
better solution and the discussion dies down in, say, a week.

Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Benjamin Drung
Am Dienstag, den 23.04.2013, 13:51 +0200 schrieb Laszlo Kajan:
> Hello Andreas!
> 
> On 23/04/13 12:23, Andreas Tille wrote:
> > On Tue, Apr 23, 2013 at 11:48:05AM +0200, Laszlo Kajan wrote:
> >>
> >> This email is to continue the discussion about free packages that
> >> depend on big (e.g. >400MB) free data outside 'main'.
> > 
> > In your practical case, is this data, say, <500MB?  Are we talking about
> > compressed or uncompressed data (i.e. >400MB on the user's hard disk, or
> > on all Debian mirrors world-wide)?
> 
> It is around 404MB, gzip compressed [1]. I do not think it is
> architecture-independent: the BLAST databases (the main bulk of the tar.gz)
> appear to be sensitive to the size of int and to endianness.
> 
> [1] ftp://rostlab.org/metastudent/metastudent-data_1.0.0.tar.gz

You can use xz for the source and binary package to reduce the size. The
default compression level for xz reduces the size of the source tarball
from 415 MB to 272 MB:

$ ls -1s --si metastudent-data_1.0.0.tar*
823M metastudent-data_1.0.0.tar
381M metastudent-data_1.0.0.tar.bz2
415M metastudent-data_1.0.0.tar.gz
272M metastudent-data_1.0.0.tar.xz
$ ls -1sh metastudent-data_1.0.0.tar*
784M metastudent-data_1.0.0.tar
363M metastudent-data_1.0.0.tar.bz2
396M metastudent-data_1.0.0.tar.gz
259M metastudent-data_1.0.0.tar.xz
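
Switching is usually a small change; roughly like this (untested sketch, the
exact invocation depends on your packaging and debhelper usage):

# use xz for the source package
$ echo 'compression = "xz"' >> debian/source/options

# use xz for the binary package (debian/rules, debhelper)
override_dh_builddeb:
	dh_builddeb -- -Zxz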

-- 
Benjamin Drung
Debian & Ubuntu Developer





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Laszlo Kajan
Hello Benjamin!

On 23/04/13 15:13, Benjamin Drung wrote:
> Am Dienstag, den 23.04.2013, 13:51 +0200 schrieb Laszlo Kajan:
>> Hello Andreas!
>>
>> On 23/04/13 12:23, Andreas Tille wrote:
>>> On Tue, Apr 23, 2013 at 11:48:05AM +0200, Laszlo Kajan wrote:

>>>> This email is to continue the discussion about free packages that
>>>> depend on big (e.g. >400MB) free data outside 'main'.
>>>
>>> In your practical case, is this data, say, <500MB?  Are we talking about
>>> compressed or uncompressed data (i.e. >400MB on the user's hard disk, or
>>> on all Debian mirrors world-wide)?
>>
>> It is around 404MB, gzip compressed [1]. I do not think it is
>> architecture-independent: the BLAST databases (the main bulk of the tar.gz)
>> appear to be sensitive to the size of int and to endianness.
>>
>> [1] ftp://rostlab.org/metastudent/metastudent-data_1.0.0.tar.gz
> 
> You can use xz for the source and binary package to reduce the size. The
> default compression level for xz reduces the size of the source tarball
> from 415 MB to 272 MB:
> 
> $ ls -1s --si metastudent-data_1.0.0.tar*
> 823M metastudent-data_1.0.0.tar
> 381M metastudent-data_1.0.0.tar.bz2
> 415M metastudent-data_1.0.0.tar.gz
> 272M metastudent-data_1.0.0.tar.xz
> $ ls -1sh metastudent-data_1.0.0.tar*
> 784M metastudent-data_1.0.0.tar
> 363M metastudent-data_1.0.0.tar.bz2
> 396M metastudent-data_1.0.0.tar.gz
> 259M metastudent-data_1.0.0.tar.xz

Ah great! Thanks for checking this. A lesson for the future. We will switch to 
xz. Best regards,

Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Laszlo Kajan
Dear Russ!

Thank you for getting back to me.

On 23/04/13 18:48, Russ Allbery wrote:
> Laszlo Kajan  writes:
> 
>> This email is to continue the discussion about free packages that depend
>> on big (e.g. >400MB) free data outside 'main'. These packages apparently
>> violate policy 2.2.1 [0] for inclusion in 'main' because they require
>> software outside the 'main' area to function. They do not violate point
>> #1 of the social contract [1], which requires non-dependency on non-free
>> components. For these big data packages, policy seems to be overly
>> restrictive compared to the social contract, leading to seemingly
>> unfounded rejection from 'main'.
> 
>> * In case the social contract indeed allows such packages to be in
>> 'main' (and policy is overly restrictive), how could it be ensured that
>> the packages are accepted?
> 
> Yes, I agree.  Although we should probably talk with ftp-master about
> whether they would like the data to just be packaged and uploaded as a
> regular package.

The ftp-master team was included in the initial thread [1], but they have not
(yet) responded, and I started to feel that it might be impolite to flood
their inbox with an issue like this, since perhaps they alone cannot decide
it. So yes, ftp-master is included in this mail once again.

[1] 
http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/019282.html

>> * What is the procedure within Debian to elicit a decision about the
>> handling of such packages in terms of archive area? Discussion on
>> d-devel, followed by policy change? Asking the policy team to clarify
>> policy for such packages? Technical committee?
> 
> Discussing it on debian-devel seems right, but I would also draw it to
> ftp-master's attention (since they're the people who have to worry about
> archive size).  We can easily move on to modifying Policy if there's a
> consensus to let packages like that pull the data down from some external
> source.

How to gauge that consensus?

Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Russ Allbery
Laszlo Kajan  writes:
> On 23/04/13 18:48, Russ Allbery wrote:

>> Discussing it on debian-devel seems right, but I would also draw it to
>> ftp-master's attention (since they're the people who have to worry
>> about archive size).  We can easily move on to modifying Policy if
>> there's a consensus to let packages like that pull the data down from
>> some external source.

> How to gauge that consensus?

Generally the way it works is that if no one objects to the idea, we bring
it up on the Policy list, where we have a somewhat more formal process
that involves seconds or objections.

I think the ideal from a usability standpoint would be to just upload the
data directly to the Debian archive, though.  It's just a question of how
large a package we want to handle through the mirror network, or whether
it's worth the effort to create a separate archive for huge data packages.

-- 
Russ Allbery (r...@debian.org)   





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-23 Thread Olivier Sallou

On 04/23/2013 11:48 AM, Laszlo Kajan wrote:
> Dear Russ, Debian Med Team, Charles!
>
> (Please keep Tobias Hamp in replies.)
>
> @Russ: Please allow me to include you in a discussion about a few 
> bioinformatics packages that depend on big, but free data [2]. I have cited
> your opinion [3] in this discussion before. You are on the technical 
> committee and on the policy team, so you, together with Charles, can help
> substantially here.
>
> [2] 
> http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/thread.html
> [3] https://lists.debian.org/debian-vote/2013/03/msg00279.html
>
> This email is to continue the discussion about free packages that depend on 
> big (e.g. >400MB) free data outside 'main'. These packages
> apparently violate policy 2.2.1 [0] for inclusion in 'main' because they 
> require software outside the 'main' area to function. They do not
> violate point #1 of the social contract [1], which requires non-dependency on 
> non-free components. For these big data packages, policy seems to
> be overly restrictive compared to the social contract, leading to seemingly 
> unfounded rejection from 'main'.
Indeed, many bioinformatics programs rely on external data. But I am
afraid that if we start to add such data packages, we will open an
endless door: bioinformatics datasets are large, and they are becoming
huge and numerous.
Their size will be an issue for the Debian mirrors (especially if some
indexed data are system-dependent), but it will also be a pain for the
user if installing a program (just to have a look) downloads gigabytes of
dependent packaged data. It may be really slow and fill the user's disk
(and I am not even talking about package updates).

Should these data dependencies not be clearly stated somewhere alongside the
software package, together with a script to fetch them?
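
Something as small as this, shipped with the package, might be enough (a
sketch only; URL, destination path and checksum are placeholders, not the
real ones):

#!/bin/sh
# fetch and verify the external data set (illustrative)
set -e
URL=http://example.org/metastudent-data_1.0.0.tar.gz   # placeholder
DEST=/var/lib/metastudent                               # placeholder
mkdir -p "$DEST"
wget -O "$DEST/data.tar.gz" "$URL"
echo "EXPECTED_SHA256SUM  $DEST/data.tar.gz" | sha256sum -c -
tar -xzf "$DEST/data.tar.gz" -C "$DEST"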

Olivier
>
> [0] http://www.debian.org/doc/debian-policy/ch-archive.html
> [1] http://www.debian.org/social_contract
>
> * In case the social contract indeed allows such packages to be in 'main' 
> (and policy is overly restrictive), how could it be ensured that the
> packages are accepted?
>
> * What is the procedure within Debian to elicit a decision about the handling 
> of such packages in terms of archive area? Discussion on d-devel,
> followed by policy change? Asking the policy team to clarify policy for such 
> packages? Technical committee?
>
>  + Charles suggested such packages could go into 'main' [4], with a clear 
> indication of the large data dependency of the package in the long
> description.
>When possible, providing the scripts for generating the large data as well.
>
>  [4] 
> http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/019292.html
>
> My goal as a Debian Developer and a packager is to get packages into Debian 
> (so 'main') that are allowed in there, in reasonably short time. I
> would like to resolve this issue properly, because I believe it may pop up 
> more often in bioinformatics software. For example, imagine a protein
> folding tool that would require a very large database to search for 
> homologues for contact prediction, and using the contacts it would predict
> protein three-dimensional structure. This has been done before [5], and such 
> a tool would be (is) immensely useful for bioinformatics. This tool
> would depend on gigabytes of data we would not package. Yet, by all means, I 
> would want the tool to be part of the distribution.
>
> [5] http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0028766
>
> Thank you for your opinion and advice.
>
> Best regards,
> Laszlo
>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-24 Thread Ole Streicher
Olivier Sallou  writes:
> Indeed, many bioinformatics programs rely on external data. But I am afraid
> that if we start to add such data packages, we will open an endless door:
> bioinformatics datasets are large, and they are becoming huge and numerous.
> Their size will be an issue for the Debian mirrors (especially if some
> indexed data are system-dependent), but it will also be a pain for the user
> if installing a program (just to have a look) downloads gigabytes of
> dependent packaged data. It may be really slow and fill the user's disk
> (and I am not even talking about package updates).

Without having any solution, I'd like to mention that this is not a
Debian Med specific problem. I am working on a number of packages for
astrophysics that require/suggest up to some hundreds of megabytes of
calibration data. Given that these packages are of quite specific
interest, it makes IMO no sense to pollute all Debian mirrors with these
files.

Best regards

Ole





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-24 Thread Laszlo Kajan
Hi Olivier!

On 24/04/13 08:20, Olivier Sallou wrote:
> 
> On 04/23/2013 11:48 AM, Laszlo Kajan wrote:
>> Dear Russ, Debian Med Team, Charles!
>>
>> (Please keep Tobias Hamp in replies.)
>>
>> @Russ: Please allow me to include you in a discussion about a few 
>> bioinformatics packages that depend on big, but free data [2]. I have cited
>> your opinion [3] in this discussion before. You are on the technical 
>> committee and on the policy team, so you, together with Charles, can help
>> substantially here.
>>
>> [2] 
>> http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/thread.html
>> [3] https://lists.debian.org/debian-vote/2013/03/msg00279.html
>>
>> This email is to continue the discussion about free packages that depend on 
>> big (e.g. >400MB) free data outside 'main'. These packages
>> apparently violate policy 2.2.1 [0] for inclusion in 'main' because they 
>> require software outside the 'main' area to function. They do not
>> violate point #1 of the social contract [1], which requires non-dependency 
>> on non-free components. For these big data packages, policy seems to
>> be overly restrictive compared to the social contract, leading to seemingly 
>> unfounded rejection from 'main'.
> Indeed, many bioinformatics programs rely on external data. But I am
> afraid that if we start to add such data packages, we will open an
> endless door: bioinformatics datasets are large, and they are becoming
> huge and numerous.
> Their size will be an issue for the Debian mirrors (especially if some
> indexed data are system-dependent), but it will also be a pain for the
> user if installing a program (just to have a look) downloads gigabytes of
> dependent packaged data. It may be really slow and fill the user's disk
> (and I am not even talking about package updates).
> 
> Should these data dependencies not be clearly stated somewhere alongside
> the software package, together with a script to fetch them?

Yes, the former (clearly stating the large external data dependency in the
long package description) is exactly what Charles Plessy recommended. And
your idea of a script to fetch the data is exactly what we implemented for
this 'metastudent' package. So we are clearly thinking along the same
lines... Now we just have to discuss it with the FTP master team as well, to
see whether this is acceptable to them (or whether they prefer to have the
data in the archive).
Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-24 Thread Laszlo Kajan
Hello Didier!

On 24/04/13 09:32, Didier 'OdyX' Raboud wrote:
> Le mardi, 23 avril 2013 12.23:23, Andreas Tille a écrit :
>> I would even go so far as to say that it might make sense to package
>> these data and upload them, to demonstrate that we should *really* create
>> a solution for such cases if data packages keep increasing in number and
>> size.
> 
> Isn't that what data.debian.org is supposed to be(come) ?
> 
>   * http://ftp-master.debian.org/wiki/projects/data/
>   * http://lists.debian.org/debian-devel/2010/09/msg00692.html

Thanks for pointing this out, I didn't know about it. This would work very
well for the 'metastudent' data (and other data of the same kind). A policy
change (point 'We need to change policy.' of [1]) could be initiated, as
Russ Allbery noted before [2]. But data.debian.org does not (yet) exist,
does it?

[1] http://ftp-master.debian.org/wiki/projects/data/
[2] 
http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/019320.html

Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-24 Thread Olivier Sallou

On 04/24/2013 04:02 PM, Laszlo Kajan wrote:
> Hello Didier!
>
> On 24/04/13 09:32, Didier 'OdyX' Raboud wrote:
>> Le mardi, 23 avril 2013 12.23:23, Andreas Tille a écrit :
>>> I would even go so far as to say that it might make sense to package
>>> these data and upload them, to demonstrate that we should *really* create
>>> a solution for such cases if data packages keep increasing in number and
>>> size.
>> Isn't that what data.debian.org is supposed to be(come) ?
>>
>>  * http://ftp-master.debian.org/wiki/projects/data/
>>  * http://lists.debian.org/debian-devel/2010/09/msg00692.html
> Thanks for pointing this out, I didn't know about it. This would work very
> well for the 'metastudent' data (and other data of the same kind). A policy
> change (point 'We need to change policy.' of [1]) could be initiated, as
> Russ Allbery noted before [2]. But data.debian.org does not (yet) exist,
> does it?
Sounds like the idea is quite old (2009) but did not progress. This could be
the opportunity to relaunch it.
> [1] http://ftp-master.debian.org/wiki/projects/data/
> [2] 
> http://lists.alioth.debian.org/pipermail/debian-med-packaging/2013-April/019320.html
>
> Laszlo
>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-26 Thread Laszlo Kajan
Dear FTP Masters!

On 23/04/13 15:13, Benjamin Drung wrote:
[...]
> You can use xz for the source and binary package to reduce the size. The
> default compression level for xz reduces the size of the source tarball
> from 415 MB to 272 MB:
> 
> $ ls -1s --si metastudent-data_1.0.0.tar*
> 823M metastudent-data_1.0.0.tar
> 381M metastudent-data_1.0.0.tar.bz2
> 415M metastudent-data_1.0.0.tar.gz
> 272M metastudent-data_1.0.0.tar.xz
> $ ls -1sh metastudent-data_1.0.0.tar*
> 784M metastudent-data_1.0.0.tar
> 363M metastudent-data_1.0.0.tar.bz2
> 396M metastudent-data_1.0.0.tar.gz
> 259M metastudent-data_1.0.0.tar.xz

Following Benjamin's suggestion and the data.debian.org document [1], we have
prepared a 'metastudent-data' arch:all package that is ~130MB (xz compressed).
The package builds the required architecture-dependent databases in its
postinst script. The purpose of this is to save the archive space that each
architecture-dependent variant would otherwise take up.
The arch:all package is almost identical to the source package.
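
Roughly, the postinst does something like the following (a sketch only;
paths are illustrative, and the real script uses the package's actual
database-building tool and handles errors properly):

#!/bin/sh
set -e
if [ "$1" = "configure" ]; then
    # build the architecture-dependent BLAST databases from the
    # architecture-independent FASTA files shipped in the package
    for f in /usr/share/metastudent-data/*.fasta; do      # illustrative path
        makeblastdb -in "$f" -dbtype prot -out "${f%.fasta}"
    done
fi
#DEBHELPER#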

* Please comment on this solution. If you like it, we will upload it (targeting 
the 'main' area), and have 'metastudent' (also in main) depend
on it.

[1] http://ftp-master.debian.org/wiki/projects/data/

Thank you for commenting.

Best regards,
Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-26 Thread Ben Hutchings
On Fri, Apr 26, 2013 at 03:21:43PM +0200, Laszlo Kajan wrote:
> Dear FTP Masters!
> 
> On 23/04/13 15:13, Benjamin Drung wrote:
> [...]
> > You can use xz for the source and binary package to reduce the size. The
> > default compression level for xz reduces the size of the source tarball
> > from 415 MB to 272 MB:
> > 
> > $ ls -1s --si metastudent-data_1.0.0.tar*
> > 823M metastudent-data_1.0.0.tar
> > 381M metastudent-data_1.0.0.tar.bz2
> > 415M metastudent-data_1.0.0.tar.gz
> > 272M metastudent-data_1.0.0.tar.xz
> > $ ls -1sh metastudent-data_1.0.0.tar*
> > 784M metastudent-data_1.0.0.tar
> > 363M metastudent-data_1.0.0.tar.bz2
> > 396M metastudent-data_1.0.0.tar.gz
> > 259M metastudent-data_1.0.0.tar.xz
> 
> Following Benjamin's suggestion and the data.debian.org document [1], we have 
> prepared a 'metastudent-data' arch:all package that is ~130MB (xz
> compressed).
> The package builds required architecture dependent databases in the postinst 
> script. The purpose of this is to save space in the archive that
> each architecture dependent version would take up.
[...]

Does this mean that installing the package results in having two
uncompressed copies of the data on disk?  If so, wouldn't it be
better to do:

1. Compress the database (with xz).
2. Build the package without compression (contents are already
   compressed so re-compressing would be a waste of time).
3. In postinst, decompress and convert the database to native.
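
Concretely, that might look something like this (an untested sketch; the
paths and names are only illustrative):

# debian/rules: ship the pre-compressed files, skip dpkg-deb's own compression
override_dh_builddeb:
	dh_builddeb -- -Znone

# postinst fragment
set -e
unxz /usr/share/metastudent-data/database.xz    # illustrative path
# ...then convert the decompressed file to the native on-disk format
# with whatever tool the package already uses.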

However, I would expect the vast majority of installations to be on
amd64, so if you always generate a 64-bit little-endian database
and avoid duplicating when installing on such a machine then it
would be better for most users (not so nice for others).

(Incidentally, arch:all packages generating arch-specific data have
interesting interactions with multi-arch.  I doubt many people with
multi-arch systems would want this package to generate multiple
versions of the database, but you never know...)

Ben.

-- 
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
  - Albert Camus





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-27 Thread Laszlo Kajan
Dear Ben!

On 27/04/13 00:46, Ben Hutchings wrote:
> On Fri, Apr 26, 2013 at 03:21:43PM +0200, Laszlo Kajan wrote:
>> Dear FTP Masters!
>>
>> On 23/04/13 15:13, Benjamin Drung wrote:
>> [...]
>>> You can use xz for the source and binary package to reduce the size. The
>>> default compression level for xz reduces the size of the source tarball
>>> from 415 MB to 272 MB:
>>>
>>> $ ls -1s --si metastudent-data_1.0.0.tar*
>>> 823M metastudent-data_1.0.0.tar
>>> 381M metastudent-data_1.0.0.tar.bz2
>>> 415M metastudent-data_1.0.0.tar.gz
>>> 272M metastudent-data_1.0.0.tar.xz
>>> $ ls -1sh metastudent-data_1.0.0.tar*
>>> 784M metastudent-data_1.0.0.tar
>>> 363M metastudent-data_1.0.0.tar.bz2
>>> 396M metastudent-data_1.0.0.tar.gz
>>> 259M metastudent-data_1.0.0.tar.xz
>>
>> Following Benjamin's suggestion and the data.debian.org document [1], we 
>> have prepared a 'metastudent-data' arch:all package that is ~130MB (xz
>> compressed).
>> The package builds required architecture dependent databases in the postinst 
>> script. The purpose of this is to save space in the archive that
>> each architecture dependent version would take up.
> [...]
> 
> Does this mean that installing the package results in having two
> uncompressed copies of the data on disk?  If so, wouldn't it be
> better to do:

Indeed: the original arch:all version and the native one. The arch:all
version is no longer needed after the conversion and could be removed.
Thanks for drawing my attention to this.

> 1. Compress the database (with xz).
> 2. Build the package without compression (contents are already
>compressed so re-compressing would be a waste of time).
> 3. In postinst, decompress and convert the database to native.
> 
> However, I would expect the vast majority of installations to be on
> amd64, so if you always generate a 64-bit little-endian database
> and avoid duplicating when installing on such a machine then it
> would be better for most users (not so nice for others).
> 
> (Incidentally, arch:all packages generating arch-specific data have
> interesting interactions with multi-arch.  I doubt many people with
> multi-arch systems would want this package to generate multiple
> versions of the database, but you never know...)

I see. According to [1], Arch:all with Multi-Arch:same is an error.
[1] https://wiki.ubuntu.com/MultiarchSpec

So at this point I see one way forward:

1: Move the postinst script into a new Arch:any package that depends on 
'metastudent-data'. This Arch:any package would build the native
database in postinst (with no multiarch support for now).
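
In debian/control this would look roughly like the following (a sketch;
names, fields and descriptions are not final):

Package: metastudent-data
Architecture: all
Depends: ${misc:Depends}
Description: data files for metastudent (architecture-independent)
 FASTA files and other architecture-independent data.

Package: metastudent-data-native
Architecture: any
Depends: ${misc:Depends}, metastudent-data (= ${source:Version})
Description: native databases built from metastudent-data
 Builds the architecture-dependent databases in its postinst.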

What do you think?

Best regards,
Laszlo





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-27 Thread Adam Borowski
On Fri, Apr 26, 2013 at 11:46:44PM +0100, Ben Hutchings wrote:
> On Fri, Apr 26, 2013 at 03:21:43PM +0200, Laszlo Kajan wrote:
> > On 23/04/13 15:13, Benjamin Drung wrote:
> > [...]
> > > You can use xz for the source and binary package to reduce the size. The
> > > default compression level for xz reduces the size of the source tarball
> > > from 415 MB to 272 MB:
> > 
> > Following Benjamin's suggestion and the data.debian.org document [1], we 
> > have prepared a 'metastudent-data' arch:all package that is ~130MB (xz
> > compressed).
> > The package builds required architecture dependent databases in the 
> > postinst script. The purpose of this is to save space in the archive that
> > each architecture dependent version would take up.
> [...]
> 
> Does this mean that installing the package results in having two
> uncompressed copies of the data on disk?  If so, wouldn't it be
> better to do:
> 
> 1. Compress the database (with xz).
> 2. Build the package without compression (contents are already
>compressed so re-compressing would be a waste of time).
> 3. In postinst, decompress and convert the database to native.

If it's never going to be recompressed, you really want to compress it up
the wazoo:
xz  |      size | compression (amd64) | compression (armhf) | decompression (armhf)
-0  | 407076744 |             1:49.77 |             6:14.47 |               1:23.31
-6  | 271088012 |            14:56.38 |            47:40.23 |               1:02.37
-9e | 195223672 |            19:38.15 |             1:06:50 |                 48.01

Far less space taken, _and_ it decompresses faster.
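
(For reference, -9e is just plain xz with level 9 and --extreme; and if the
data stays inside the .deb rather than being pre-compressed, I believe a
recent dpkg-deb accepts the same settings, e.g.:

  dh_builddeb -- -Zxz -z9 -Sextreme

though check your dpkg version for -S/--compression-strategy support.)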

> However, I would expect the vast majority of installations to be on
> amd64, so if you always generate a 64-bit little-endian database
> and avoid duplicating when installing on such a machine then it
> would be better for most users (not so nice for others).

Looks like we're getting armhf machines with 539375329859372 cores per
blade, but you have a point.  I find it quite strange that the on-disk
format would ever care about word width, though: if the data fits in 32
bits, there's lots of waste for no gain -- mmap or not.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ





Re: [Debian-med-packaging] Question about proper archive area for packages that require big data for operation

2013-04-28 Thread Ben Hutchings
On Sat, 2013-04-27 at 10:13 +0200, Laszlo Kajan wrote:
> Dear Ben!
> 
> On 27/04/13 00:46, Ben Hutchings wrote:
[...]
> > However, I would expect the vast majority of installations to be on
> > amd64, so if you always generate a 64-bit little-endian database
> > and avoid duplicating when installing on such a machine then it
> > would be better for most users (not so nice for others).
> > 
> > (Incidentally, arch:all packages generating arch-specific data have
> > interesting interactions with multi-arch.  I doubt many people with
> > multi-arch systems would want this package to generate multiple
> > versions of the database, but you never know...)
> 
> I see. According to [1], Arch:all with Multi-Arch:same is an error.
> [1] https://wiki.ubuntu.com/MultiarchSpec
>
> So at this point I see one way forward:
> 
> 1: Move the postinst script into a new Arch:any package that depends on 
> 'metastudent-data'. This Arch:any package would build the native
> database in postinst (with no multiarch support for now).
> 
> What do you think?

This might work, but be careful.  Consider a multiarch system with armhf
as primary and armel as additional architecture.  Both architectures are
little-endian, 32-bit.  If you install metastudent-data-native/armel and
metastudent-data-native/armhf, they should only generate one native copy
of the data (right?).  But if you remove one and leave the other, the
native copy should stay.
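
One way to keep that property is to key the maintainer scripts off the data
format that is actually on disk rather than off the package (a sketch; the
format tag and paths are invented for illustration):

# postinst of metastudent-data-native; NATIVE_FMT (e.g. "le32", "be64")
# would be substituted at build time for the architecture being built
set -e
stamp=/var/lib/metastudent/native.format
if [ "$(cat "$stamp" 2>/dev/null)" != "NATIVE_FMT" ]; then
    # (re)build the native database here with the package's usual tool
    echo "NATIVE_FMT" > "$stamp"
fi

and the postrm would only remove the generated files on purge, when no
metastudent-data-native package of any architecture remains installed.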

Ben.

-- 
Ben Hutchings
Knowledge is power.  France is bacon.

