On Thu, 14 Dec 2006, Peter Tribble wrote:

> However, I can imagine an alternative format could be beneficial.
> Such a format could be more compact and easier to parse -
> and actually processing the contents file is quite expensive
> (in terms of both cpu and memory - there's quite a lot of memory
> pressure coming from the package tools, as the internal representation
> of the contents file is several times the size of the contents file
> itself). As a simple example, just storing the filename and not the
> full pathname for files (the file is sorted, so the directory path
> is the last directory you saw) could save 40% of the file size.
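A minimal sketch of Peter's suggestion, in Python. Because the contents file is sorted, consecutive entries usually share a parent directory, so the parent only needs to be written when it changes. The `D`/`F` record layout and the function names are hypothetical, not an existing format. The encoder doesn't depend on sort order for correctness; sorting only makes the output more compact.

```python
import posixpath
from typing import Iterable, Iterator, List


def encode_paths(paths: Iterable[str]) -> List[str]:
    """Encode absolute paths, emitting a directory header only when the
    parent directory changes. Hypothetical record layout:
    'D <dir>' starts a directory group, 'F <name>' is an entry in it."""
    out: List[str] = []
    current_dir = None
    for path in paths:
        parent, name = posixpath.split(path)
        if parent != current_dir:
            out.append("D " + parent)
            current_dir = parent
        out.append("F " + name)
    return out


def decode_paths(records: Iterable[str]) -> Iterator[str]:
    """Rebuild full paths from the records produced by encode_paths."""
    current_dir = ""
    for rec in records:
        kind, value = rec[0], rec[2:]
        if kind == "D":
            current_dir = value
        else:
            yield posixpath.join(current_dir, value)
```

For example, `encode_paths(["/usr/bin/cat", "/usr/bin/ls"])` yields `["D /usr/bin", "F cat", "F ls"]`.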

As a test, I tried breaking a contents file apart into a SQLite 2 database.
I chose that platform only because it comes with Solaris 10.
Unfortunately, it's still a little rough, and it's missing a couple of
necessary fields.

The schema so far is:

create table c_files (fileid integer primary key, filename text);
create table c_match (fileid integer, pkgid integer);
create table c_pkgs (pkgid integer primary key, pkgname text);
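For illustration, here is the schema above loaded with Python's `sqlite3` module, plus a lookup of which packages deliver a given path. Python's module links against SQLite 3 rather than the SQLite 2 that ships with Solaris 10, but the SQL is plain enough that it should carry over. The sample package names and ids are made up, and the `c_match_fileid` index is an addition that is not in the original schema.

```python
import sqlite3

SCHEMA = """
create table c_files (fileid integer primary key, filename text);
create table c_match (fileid integer, pkgid integer);
create table c_pkgs (pkgid integer primary key, pkgname text);
-- not in the original schema: lets joins on fileid avoid a full scan
create index c_match_fileid on c_match (fileid);
"""


def owners(conn: sqlite3.Connection, filename: str) -> list:
    """Return the names of the packages that deliver `filename`."""
    rows = conn.execute(
        """select p.pkgname
             from c_files f
             join c_match m on m.fileid = f.fileid
             join c_pkgs p on p.pkgid = m.pkgid
            where f.filename = ?
            order by p.pkgname""",
        (filename,),
    )
    return [name for (name,) in rows]


# Made-up sample data: /usr is shared by two packages, /usr/bin/ls by one.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("insert into c_pkgs values (?, ?)",
                 [(1, "SUNWcsr"), (2, "SUNWcsu")])
conn.executemany("insert into c_files values (?, ?)",
                 [(1, "/usr/bin/ls"), (2, "/usr")])
conn.executemany("insert into c_match values (?, ?)",
                 [(1, 2), (2, 1), (2, 2)])
```

With that data, `owners(conn, "/usr")` returns both package names, while an unknown path returns an empty list.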

There are no attributes or install checksums yet. Most of that data (but
not the class name, and maybe not the type) could be expressed as integers.
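A sketch of converting the attribute fields to integers, assuming the usual contents(4) layout for regular files (`path type class mode owner group size cksum modtime pkg...`) and directories (`path type class mode owner group pkg...`). Other entry types (symlinks, devices, and so on) are deliberately rejected, and the sample lines in the test are invented. The mode is parsed as octal; type, class, owner, and group stay as text.

```python
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    path: str
    ftype: str               # entry type letter, kept as text
    pkgclass: str            # class name, kept as text
    mode: int                # octal permission string stored as an integer
    owner: str
    group: str
    size: Optional[int]      # None for directories
    cksum: Optional[int]
    modtime: Optional[int]
    pkgs: Tuple[str, ...]


def parse_line(line: str) -> Entry:
    """Parse one f/e/v or d entry; the field layout is an assumption."""
    fields = line.split()
    ftype = fields[1]
    if ftype not in ("f", "e", "v", "d"):
        raise ValueError("entry type not handled in this sketch: " + ftype)
    path, _, pkgclass, mode, owner, group = fields[:6]
    if ftype == "d":
        size = cksum = modtime = None
        pkgs = tuple(fields[6:])
    else:
        size, cksum, modtime = (int(x) for x in fields[6:9])
        pkgs = tuple(fields[9:])
    return Entry(path, ftype, pkgclass, int(mode, 8), owner, group,
                 size, cksum, modtime, pkgs)
```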

Importing a 25955 line contents file (2.1 megs) resulted in a 2.3 meg
database.  The c_files table had 25955 records, c_pkgs had 193 records,
and c_match had 25759 records.  Theoretically, c_match should have had
25955 records, so I'm not 100% sure what happened.  I'll have to do some
more digging in my code.
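Rather than comparing totals, it may be quicker to ask the database directly. Assuming the importer writes one `c_match` row per (file, package) pair, these two queries list files that ended up with no package at all and files matched to several packages. Entries that list more than one package (shared directories often do) would push the count above 25955, so the two effects can partly cancel out in the totals.

```python
import sqlite3


def unmatched_files(conn: sqlite3.Connection) -> list:
    """c_files rows that have no c_match row at all."""
    return conn.execute(
        """select f.fileid, f.filename
             from c_files f
             left join c_match m on m.fileid = f.fileid
            where m.fileid is null
            order by f.fileid""").fetchall()


def shared_files(conn: sqlite3.Connection) -> list:
    """(fileid, package count) for files matched to more than one package."""
    return conn.execute(
        """select fileid, count(*)
             from c_match
            group by fileid
           having count(*) > 1
            order by fileid""").fetchall()
```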

Breaking the paths down into directories and filenames, as Peter
suggested, would reduce the number of entries. Changing the representation
of symlinks might save some space as well.
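One way to break the paths down, sketched in Python with `sqlite3`: each directory is stored once in a hypothetical `c_dirs` table, files keep only a basename and a directory id, and symlink targets move to a hypothetical `c_links` table. The table and function names are illustrative only, and this `c_files` layout replaces the `filename` column from the schema above.

```python
import posixpath
import sqlite3

# Hypothetical normalized layout: each directory path is stored once and
# files refer to it by id; symlink targets live in their own table.
SCHEMA = """
create table c_dirs  (dirid integer primary key, dirname text unique);
create table c_files (fileid integer primary key, dirid integer, basename text);
create table c_links (fileid integer primary key, target text);
"""


def dir_id(conn: sqlite3.Connection, dirname: str) -> int:
    """Return the id for dirname, inserting it the first time it is seen."""
    conn.execute("insert or ignore into c_dirs (dirname) values (?)", (dirname,))
    return conn.execute("select dirid from c_dirs where dirname = ?",
                        (dirname,)).fetchone()[0]


def add_entry(conn: sqlite3.Connection, path: str, target=None) -> int:
    """Store one path; target is the symlink destination, if any."""
    parent, name = posixpath.split(path)
    cur = conn.execute("insert into c_files (dirid, basename) values (?, ?)",
                       (dir_id(conn, parent), name))
    if target is not None:
        conn.execute("insert into c_links values (?, ?)", (cur.lastrowid, target))
    return cur.lastrowid


def full_paths(conn: sqlite3.Connection) -> list:
    """Rebuild (path, symlink target or None) pairs, sorted by path."""
    return conn.execute(
        """select case d.dirname when '/' then '/' || f.basename
                                 else d.dirname || '/' || f.basename end,
                  l.target
             from c_files f
             join c_dirs d on d.dirid = f.dirid
             left join c_links l on l.fileid = f.fileid
            order by 1""").fetchall()
```

The root directory is special-cased when rebuilding paths so that entries directly under `/` don't come back with a doubled slash.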

This format might result in quicker read/write operations, but I suspect
that it would really just require more complex in-memory representations
in each package management application.

> --
> -Peter Tribble
> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
>

--------------------
Christopher Josephes
cpj1 at visi.com
