Manoj Srivastava writes:
> I think we should look at the possibility of not including the
> information in either the Packages file or the available file. The
> Du files should be separately kept on the archives, and they may be
> compressed with gzip (bzip2?); and downloaded and kept in
> /var/lib/dpkg/DU.gz or something on the user's machine; and they need
> only be downloaded if required by the user. Keeping this information
> separate makes using this optional.
>
> I see no technical advantage encoding this in Packages files
> and available file.

Hmm...

$ for x in main contrib non-free; do
> gzip -cd Packages.hamm.$x.du.gz \
> | sed -n '/^Package:/p;/^Du:/,/^$/p' \
> | gzip -9n > Sizes.hamm.$x.gz
> done
$ gzip -l Sizes.hamm.*.gz
compressed  uncompr. ratio uncompressed_name
    105201    795450 86.7% Sizes.hamm.contrib
     66294    402982 83.5% Sizes.hamm.main
     11446     63766 82.0% Sizes.hamm.non-free
    182941   1262198 85.5% (totals)
$ gzip -cd Sizes.hamm.main.gz | head -15
Package: 2utf
Du: 3 etc
 1 usr
 111 usr/bin
 1 usr/doc
 8 usr/doc/2utf
 25 usr/doc/2utf/examples
 1 usr/man
 5 usr/man/man1
 1 var
 12 var/lib

Package: 3dchess
Du: 1 usr
 1 usr/doc

Looks reasonable... In practice, this information is only going to be
used while installing packages, and 180K isn't much anyway. We could
save far more space by compressing the available, available-old,
status and status-old files (2.5Mb on my system).

> We have conflicting data here. Mrvn says that the total du
> data is only 76k. Charles says that the data is about 400k (which is
> way more in line with my off the cuff calculations).

The 400K was for the normal hamm Packages files with the Du data
added to them. That makes my numbers far closer to Mrvn's. Also,
weren't Mrvn's figures for main only?

> I am inclined to believe the 400k figures. I would, for
> scalability reasons, advocate that we re-run our scripts on a _full_
> i386 mirror (which I do not have at the moment -- ran out of space).

I generated my data from unix.hensa.ac.uk's mirror.

> I also would strongly advocate *NOT* stuffing this data into
> the Packages or the Available files, but keeping this apart on the
> archive and when downloaded on the user's disk.

I'm now with you on this one. Given the sizes involved, I don't think
we even need to go to the trouble of generating the "top N levels"
versions. Using those would make it difficult to take symlinks into
account.
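
As a rough illustration of how a separately kept Sizes file might be
consulted at install time, here is a minimal sketch. The function name
du_total is made up for this example, and it assumes that each Du entry
is a "size path" pair with the size in kilobytes; neither is anything
dpkg actually provides.

```shell
# du_total FILE PACKAGE
# Hypothetical helper: print the sum of the sizes listed in PACKAGE's
# Du: field in the gzip-compressed Sizes file FILE.
# Assumes "size path" entries, sizes in KB (an assumption, not a spec).
du_total() {
    gzip -cd < "$1" | awk -v pkg="$2" '
        # A Package: line starts a new stanza; note whether it is ours.
        /^Package:/ { in_pkg = ($2 == pkg); next }
        # Strip the field name so an entry on the Du: line itself counts.
        in_pkg && /^Du:/ { sub(/^Du:/, "") }
        # Any remaining "size path" line contributes its size.
        in_pkg && NF >= 2 { total += $1 }
        END { print total + 0 }
    '
}
```

Something like "du_total Sizes.hamm.main.gz 2utf" would then add up
the ten entries listed for 2utf above, which come to 168. Summing per
directory rather than per package would be needed to check individual
filesystems, and symlinks are not handled at all here.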

-- 
Charles Briscoe-Smith
White pages entry, with PGP key: <URL:http://alethea.ukc.ac.uk/wp?95cpb4>
PGP public keyprint: 74 68 AB 2E 1C 60 22 94 B8 21 2D 01 DE 66 13 E2