On 12/18/06, James Carlson <james.d.carlson at sun.com> wrote:
>
> Stephen Potter writes:
> > Could we break it on FS hierarchy, something like contents.etc,
> > contents.usr, contents.var, etc?  Or, perhaps as a directory contents/etc,
> > contents/usr, contents/var?  It may need to go to two levels somehow, as on
> > my system 142000 of the 147000 lines of contents are for files in usr.  Maybe
>
> It occurs to me that a better way to do this would be to hash the
> string, and break it out into separate files based on the hash.  I
> think you're much more likely to end up with a nice distribution of
> entries over files that way.
>
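The two splitting schemes quoted above (by top-level directory, and by a hash of the pathname) can be sketched roughly as follows. This is only an illustration under stated assumptions: the pathname is taken to be the first whitespace-separated field of a contents line, with link entries written as path=target; the function names and the contents.N piece names for the hash scheme are hypothetical, and zlib.crc32 merely stands in for whatever hash would actually be chosen.

```python
import zlib
from collections import defaultdict


def pathname(contents_line: str) -> str:
    """Return the pathname an entry describes.

    Assumption: the pathname is the first whitespace-separated field, and
    link entries are written as path=target, so only the part before '='
    is kept.
    """
    return contents_line.split()[0].split("=", 1)[0]


def bucket_by_hierarchy(path: str) -> str:
    """Name the piece after the top-level directory, e.g. contents.usr."""
    top = path.lstrip("/").split("/", 1)[0]
    return "contents." + (top or "root")


def bucket_by_hash(path: str, nbuckets: int = 10) -> str:
    """Name the piece after a stable hash of the pathname (hypothetical scheme).

    zlib.crc32 is used rather than Python's built-in hash(), which is salted
    per process for str and would not give the same piece from run to run.
    """
    return "contents.%d" % (zlib.crc32(path.encode("utf-8")) % nbuckets)


def split(lines, bucket):
    """Group contents lines by the piece they would be written to."""
    pieces = defaultdict(list)
    for line in lines:
        pieces[bucket(pathname(line))].append(line)
    return dict(pieces)
```

Calling split(lines, bucket_by_hierarchy) reproduces the contents.etc / contents.usr / contents.var idea, and on the system described above nearly all lines would land in contents.usr. Calling split(lines, lambda p: bucket_by_hash(p, 10)) spreads entries evenly, but a package's files then scatter across the pieces too.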

But do you want a nice distribution? Doesn't that just make it more likely
that you have to handle all the separate files?

On a couple of my systems the averages are 125 and 132 files
per package. (Actually more, as I haven't allowed for duplicates;
I just divided the number of lines in the contents file by the number
of packages.) I have a SUNWCrnet-based machine that's down at 90
files per package, but handling the contents file on that machine
isn't such a problem anyway. Splitting it into, say, 10 files means
a typical package operation ends up manipulating all 10 files; split
it into many more and the cost of manipulating the separate files
becomes significant.
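A quick back-of-the-envelope check of the point above, assuming an ideal hash that places each of a package's entries independently and uniformly into one of the pieces (real paths, shared directories and the chosen hash will deviate from this). A given piece is missed by all k entries with probability (1 - 1/N)^k, so the expected number of pieces a package touches is N(1 - (1 - 1/N)^k). The function name is hypothetical.

```python
def expected_pieces_touched(entries: int, pieces: int) -> float:
    """Expected number of distinct pieces hit when `entries` pathnames land
    independently and uniformly at random in one of `pieces` files.

    A given piece is missed by every entry with probability
    (1 - 1/pieces) ** entries, so by linearity of expectation the expected
    number of pieces touched is pieces * (1 - that probability).
    """
    return pieces * (1.0 - (1.0 - 1.0 / pieces) ** entries)


for entries, pieces in [(125, 10), (132, 10), (125, 100), (90, 1000)]:
    print(f"{entries} entries over {pieces} pieces: "
          f"~{expected_pieces_touched(entries, pieces):.1f} touched")
```

Under these assumptions a 125-file package is expected to touch essentially all 10 of 10 pieces, about 72 of 100, and even a 90-file package still touches roughly 86 of 1000, so a hashed split rarely confines an operation to a small subset of the data.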

Splitting the contents file up only makes sense if we can find a split
that guarantees there is a reasonable set of operations that only need
to touch a subset of the data.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/