On 12/18/06, James Carlson <james.d.carlson at sun.com> wrote:
>
> Stephen Potter writes:
> > Could we break it on FS hierarchy, something like contents.etc, contents.usr,
> > contents.var, etc? Or, perhaps as a directory contents/etc, contents/usr,
> > contents/var? It may need to go to two levels somehow, as on my system 142000
> > of the 147000 lines of contents are for files in usr. Maybe
>
> It occurs to me that a better way to do this would be to hash the
> string, and break it out into separate files based on the hash. I
> think you're much more likely to end up with a nice distribution of
> entries over files that way.
>

But do you want a nice distribution? Doesn't that just make it more
likely that you have to handle all the separate files?

On a couple of my systems there are an average of 125 and 132 files per
package. (Actually more, as I haven't allowed for duplicates - just
divided the number of lines in the contents file by the number of
packages.) I have a SUNWCrnet based machine that's down at 90 files per
package but handling the contents file on that machine isn't such a
problem anyway.

Splitting up into say 10 files means we end up manipulating all 10
files; split into many more and the cost of manipulating the separate
files becomes significant.

Splitting the contents file up only makes sense if we can find a split
that guarantees there is a reasonable set of operations that only need
to touch a subset of the data.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
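
A rough way to check the claim that a hashed split makes a package operation touch nearly every file is sketched below. This is only an illustration under stated assumptions: the function names, the use of SHA-256, and the uniform-hash model are all hypothetical and are not taken from any packaging tool. It estimates how many of N hashed bucket files a single package with k entries would land in.

```python
import hashlib


def bucket_of(path, n_buckets):
    """Map a path to a bucket index with a stable hash (SHA-256 is an assumption)."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


def buckets_touched(paths, n_buckets):
    """Count the distinct bucket files that a package's paths fall into."""
    return len({bucket_of(p, n_buckets) for p in paths})


def expected_buckets_touched(n_buckets, files_per_package):
    """Expected number of distinct buckets hit, assuming a uniform hash.

    Each bucket is missed by all k files with probability (1 - 1/N)**k.
    """
    miss = (1.0 - 1.0 / n_buckets) ** files_per_package
    return n_buckets * (1.0 - miss)


if __name__ == "__main__":
    # Hypothetical package with 125 entries, the per-package average quoted above.
    paths = ["/usr/lib/example/file%d" % i for i in range(125)]
    for n in (10, 100, 1000):
        print(n, buckets_touched(paths, n),
              round(expected_buckets_touched(n, 125), 2))
```

Under this model a 125-file package is expected to touch essentially all of 10 buckets, and with far more buckets it still touches on the order of one bucket file per entry, which is the cost trade-off described above.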
