On 12/14/06, Stephen Potter <spp at unixsa.net> wrote:
>
>
> I think you need to go a little lower first.  A size limit isn't useful
> if the contents need to be sorted.  So, an easy first question is, does
> the file need to be sorted?


Currently, yes. It's then possible to locate an entry in it using
a binary chop. Not every tool that could use this does (if pkgchk
did, it could go *way* faster), but it's a significant optimization.
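For illustration, a binary chop on a sorted contents file can work directly
on byte offsets, so a lookup only reads a handful of lines. What follows is
a minimal Python sketch, not the package tools' code. It assumes each line
begins with the pathname followed by a space and that lines are sorted by
pathname in byte order. The names find_entry and SAMPLE, and the abbreviated
sample entries, are hypothetical.

```python
import io

# Hypothetical, abbreviated entries sorted by pathname (real lines carry more fields).
SAMPLE = (
    b"/usr d none 0755 root sys\n"
    b"/usr/bin d none 0755 root bin\n"
    b"/usr/bin/cat f none 0555 root bin\n"
    b"/usr/bin/ls f none 0555 root bin\n"
)


def _key(line: bytes) -> bytes:
    # Assumption: the pathname is the first space-separated field.
    return line.split(b" ", 1)[0].rstrip(b"\n")


def find_entry(f, path: bytes):
    """Binary chop over a seekable binary file whose lines are sorted by
    pathname; return the matching line (without newline) or None."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()
    # Invariant: lo is a line start; lines starting before lo sort below
    # path; lines starting at or after hi sort at or above path.
    while lo < hi:
        mid = (lo + hi) // 2
        if mid == lo:
            start = lo
        else:
            f.seek(mid - 1)
            f.readline()            # skip to the first line start >= mid
            start = f.tell()
        if start >= hi:             # no line starts in [mid, hi)
            hi = mid
            continue
        f.seek(start)
        line = f.readline()
        if _key(line) < path:
            lo = f.tell()           # this line and all before it are too small
        else:
            hi = start              # this line and all after it are >= path
    f.seek(lo)
    line = f.readline()
    if line and _key(line) == path:
        return line.rstrip(b"\n")
    return None


if __name__ == "__main__":
    with io.BytesIO(SAMPLE) as f:
        print(find_entry(f, b"/usr/bin/ls"))
        print(find_entry(f, b"/usr/bin/cp"))
```

Each probe seeks to the middle of the remaining byte range and skips forward
to the next line start, so the number of lines read grows with log(n) rather
than with the size of the file.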

> If not, then does the file need to be ASCII
> or would a DB file be better?
>

I'm not convinced by the DB idea. One snag is that you can end
up requiring lots of random I/O rather than the sequential I/O
you get with the contents file.

However, I can imagine an alternative format being beneficial.
Such a format could be more compact and easier to parse. Processing
the contents file is actually quite expensive in terms of both CPU
and memory: the package tools put a lot of pressure on memory,
because the internal representation of the contents file is several
times the size of the contents file itself. As a simple example,
storing just the filename rather than the full pathname for files
(the file is sorted, so the directory path is the last directory
you saw) could save 40% of the file size.
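The directory-elision idea can be sketched as a simple reversible encoding:
an entry whose parent directory matches the current context is written as
just its basename, and any other entry keeps its full path. This is a
hypothetical Python sketch, not an existing format. The names encode,
decode and SAMPLE, the rule that a "d" entry becomes the new context, and
the abbreviated sample lines are all assumptions.

```python
import posixpath

# Hypothetical, abbreviated contents entries: "<path> <type> <other fields>".
SAMPLE = [
    "/usr d none 0755 root sys",
    "/usr/bin d none 0755 root bin",
    "/usr/bin/cat f none 0555 root bin",
    "/usr/bin/ls f none 0555 root bin",
    "/usr/lib d none 0755 root bin",
]


def _next_context(path: str, rest: str) -> str:
    # Assumption: a directory entry ("d") becomes the context for what follows;
    # any other entry leaves the context at its own parent directory.
    return path if rest.startswith("d ") else posixpath.dirname(path)


def encode(lines):
    """Replace a full path by its basename when its parent is the current context."""
    context = None
    out = []
    for line in lines:
        path, rest = line.split(" ", 1)
        if posixpath.dirname(path) == context:
            out.append(posixpath.basename(path) + " " + rest)
        else:
            out.append(line)  # context changed: keep the full path
        context = _next_context(path, rest)
    return out


def decode(lines):
    """Inverse of encode: relative names are resolved against the context."""
    context = None
    out = []
    for line in lines:
        name, rest = line.split(" ", 1)
        path = name if name.startswith("/") else posixpath.join(context, name)
        out.append(path + " " + rest)
        context = _next_context(path, rest)
    return out


if __name__ == "__main__":
    packed = encode(SAMPLE)
    for line in packed:
        print(line)
    assert decode(packed) == SAMPLE
    print(sum(map(len, SAMPLE)), "->", sum(map(len, packed)), "characters")
```

Because the decoder updates its context with the same rule as the encoder,
decode(encode(lines)) reproduces the input exactly, and a relative name can
never be mistaken for a full path since only full paths start with "/".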

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/